I have a bunch of profiles from my application (200+ profiles per week), and I want to analyze them to get some performance information.
I don't know what information can be extracted from these files; maybe the slowest functions or the hot-spot code across different release versions?
I have no relevant experience in this field. Has anyone done something like this?
Well, it depends on the kind of profiles you have; you can capture a number of different kinds of profiles.
In general, most people are interested in the CPU profile and the heap profile, since they let you see which functions/lines use the most CPU and allocate the most memory.
There are a number of ways you can visualize the data and/or drill into it. You should take a look at the help output of go tool pprof and the pprof blog article to get you going.
As for the number of profiles: in general, people analyze them one by one. If you want an average over multiple samples, you can merge them with pprof-merge.
You can subtract one profile from another to see relative changes, by using the -diff_base and -base flags.
Yet another approach is to extract the percentage per function for each profile and see whether it changes over time. You can do this by parsing the output of go tool pprof -top {profile}. You can correlate this data with known events, like software updates or high demand, and see if one function starts taking up more resources than the others (it might not scale very well).
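For illustration, here is a rough Python sketch of that idea (the profile paths, the -nodecount value and the 1-percentage-point threshold are just placeholders, and it assumes go tool pprof is available on PATH): it shells out to go tool pprof -top for two profiles and reports functions whose flat% shifted between releases.

```python
import subprocess

def flat_percent_per_function(profile_path):
    """Run `go tool pprof -top` on one profile and return {function: flat percent}."""
    out = subprocess.run(
        ["go", "tool", "pprof", "-top", "-nodecount=50", profile_path],
        capture_output=True, text=True, check=True,
    ).stdout

    percents = {}
    in_table = False
    for line in out.splitlines():
        if "flat%" in line:              # header row of the table
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        # rows look like: "1.50s 50.00% 50.00%  1.50s 50.00%  main.hotFunction"
        fields = line.split(None, 5)
        if len(fields) == 6:
            percents[fields[5]] = float(fields[1].rstrip("%"))
    return percents

# Placeholder paths for two releases:
old = flat_percent_per_function("profiles/v1.2/cpu.pb.gz")
new = flat_percent_per_function("profiles/v1.3/cpu.pb.gz")
for fn in sorted(set(old) | set(new)):
    delta = new.get(fn, 0.0) - old.get(fn, 0.0)
    if abs(delta) > 1.0:                 # only report shifts above 1 percentage point
        print(f"{fn}: {old.get(fn, 0.0):.2f}% -> {new.get(fn, 0.0):.2f}% ({delta:+.2f} pp)")
```

Run this over your weekly batch and you get a per-function time series you can plot or correlate with your release dates.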
If you ever get to the point where you want to develop custom tools for analysis, you can use the pprof library to parse the profiles for you.
I am building a package that uses the Google Analytics API for Python.
But in several cases, when I have multiple dimensions, the extraction by day is sampled.
I know that if I use sampling_level = LARGE, it will use a more accurate sample.
But does somebody know if there is a way to reduce the request so that you can extract one day without sampling?
Grateful for any help.
Setting the sampling level to LARGE is the only method we have to influence the amount of sampling, but as you already know, this doesn't prevent it.
The only way to reduce the chance of sampling is to request less data. A reduced number of dimensions and metrics, as well as a shorter date range, are the best ways to ensure that you don't get sampled data.
This is probably not the answer you want to hear, but one way of getting unsampled data from Google Analytics is to use unsampled reports. However, this requires that you sign up for Google Marketing Platform. With those you can create an unsampled-report request using the API or the UI.
There is also a way to export the data to BigQuery, but then you lose the analysis that Google provides and will have to do that yourself. This too requires that you sign up for Google Marketing Platform.
There are several tactics for building unsampled reports; the most popular is splitting your report into shorter time ranges, down to hours. Mark Edmondson did great work on anti-sampling in his R package, so you might find it useful. You may start with this blog post: https://code.markedmondson.me/anti-sampling-google-analytics-api/
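To illustrate the time-splitting tactic with the Python client mentioned in the question, here is a hedged sketch (the key file, view ID, metric and dimension are placeholders; it assumes google-api-python-client with the Analytics Reporting API v4): it issues one request per day and checks whether the response reports sampling.

```python
from datetime import date, timedelta

from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]
creds = service_account.Credentials.from_service_account_file("key.json", scopes=SCOPES)
analytics = build("analyticsreporting", "v4", credentials=creds)

def fetch_day(view_id, day):
    """Request a single day; the shorter the range, the lower the chance of sampling."""
    body = {
        "reportRequests": [{
            "viewId": view_id,
            "dateRanges": [{"startDate": day.isoformat(), "endDate": day.isoformat()}],
            "metrics": [{"expression": "ga:sessions"}],       # placeholder metric
            "dimensions": [{"name": "ga:sourceMedium"}],      # placeholder dimension
            "samplingLevel": "LARGE",
        }]
    }
    report = analytics.reports().batchGet(body=body).execute()["reports"][0]
    # A sampled result carries samplesReadCounts / samplingSpaceSizes in its data section.
    return report, "samplesReadCounts" in report["data"]

start, end = date(2019, 1, 1), date(2019, 1, 31)              # placeholder range
day = start
while day <= end:
    report, sampled = fetch_day("12345678", day)              # placeholder view ID
    print(day, "sampled" if sampled else "unsampled")
    day += timedelta(days=1)
```

If a single day still comes back sampled, the next step is to split further (by hour, via the ga:hour dimension) or to reduce the number of dimensions.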
Background -
I am not a programmer.
I do trade spot forex on an intraday basis.
I am willing to learn programming.
Specific Query -
I would like to know how to export 'top of book' price and volume data into Excel in real time, as displayed on the LMAX level 2 widget/frame at -
https://s3-eu-west-1.amazonaws.com/lmax-widget/website-widget-quote-prof-flex.html?a=rTWcS34L5WRQkHtC
In essence I am looking to export:
1) price and volume data where the coloured flashes occur;
2) price and volume data for when the coloured flashes do not occur.
I understand that 1) and 2) will encompass all the top-of-book prices and volume. However, I would like to keep 1) and 2) separate/distinguished as far as data collection is concerned.
Time period for which the collected data is intended to be stored: 2-3 hours.
What kind of languages do I need to know to do the above?
I understand that I need to be an advanced Excel user too.
Long term goals -
I intend to use the above information to make discretionary intraday trading decisions.
In the long run I will get more involved with creating an algo or indicator to help with the decision making process, which would include the information above.
I have understood that one needs to know coding to get involved in activities such as the above. Hence I have started learning C++, mostly to get a hang/feel for coding.
I have been searching all over the web for where to start in this endeavor. However, I am quite confused and overwhelmed by all the information.
Hence apart from the specific data export query, any additional guidelines would also be helpful.
As of now I use MT4 to trade. Hence I believe that to do the above I will need more than just MT4.
Any help would be highly appreciated.
Yes, MetaTrader4 is still not able ( in spite of all the white-label-ed Terminals' OrderBook Add-On(s) marketing and PR efforts ) to provide OrderBook-L2/DoM-data to your MQL4 / NewMQL4 algorithm for any decision making. Third-party software integration is needed to make the MQL4 code aware of the real-time L2/DoM-data.
The LMAX widget has an impressive look & feel; however, for your Excel export it would take a lot of programming effort to re-use it as an automated scanner producing the data for 1) and 2), and there may be further, non-technical trouble from legal / operational restrictions on operating an automated scanner against such a data-source. To give an example, one data-publisher's policy restricts automated Options-pricing scanners for options on { FTSE | CAC | AMS | DAX }: such a scanner may re-visit the online published data-sources no more than once a quarter of an hour, and gets blocked / black-listed otherwise. So care and proper data-source engineering are needed here.
The size of the data collection is another issue. Excel has restrictions on the number of rows/columns that can be imported, and the larger the data-files, the more likely CSV-imports are to hit these limits. L2/DoM-data collected for 2-3 hours for just one single FX Major may go beyond such a limit, as there are many records per second ( tens, if not hundreds, with just a few milliseconds between them ). A static file of collected data-records typically takes several minutes just to get written to disk, so a proper distributed-processing data-flow design and non-blocking file-IO engineering is a must.
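To put rough numbers on that (the update rate is an assumption; the 1,048,576-row figure is the per-worksheet cap of current Excel versions):

```python
# Back-of-the-envelope: how fast L2/DoM capture approaches Excel's row limit.
EXCEL_ROW_LIMIT = 1_048_576          # rows per worksheet in current Excel

updates_per_second = 50              # assumption: a moderately busy FX Major
hours_collected = 3

rows = updates_per_second * 3600 * hours_collected
print(f"{rows:,} rows collected")                    # 540,000 rows
print(f"{rows / EXCEL_ROW_LIMIT:.0%} of one worksheet used")

# At a few hundred updates per second during busy periods, a single
# 2-3 hour session exceeds the worksheet limit outright.
```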
Real-time system design is the right angle from which to view the problem and its solution, rather than treating it as just a programming-language exercise. Having mastered some programming language is a great move; nevertheless, robust real-time system design, and Trading software is such a domain, requires, with all respect, a lot more insight and hands-on experience than making MQL4 code run multi-threaded & multi-processed with a few DLL services for a Cloud/Grid-based distributed processing system.
How much real-time traffic is expected to be there?
For just a raw idea of what the Market can produce per second, per millisecond, per microsecond, consider a NYNEX traffic analysis for one instrument: a single second can have a wild relief, and a 5-msec sampling shows even finer structure.
How to export
Check if the data-source owner legally permits your automated processing.
Create your own real-time DataPump software, independent of the HTML-wrapped Widget.
Create your own 'DB-store' to efficiently off-load the scanned data-records from the real-time DataPump.
Test the live data-source >> DataPump >> DB-store performance & robustness on being able to serve an error-free 24/6 duty for several FX Majors in parallel.
Integrate your DataPump-fed DB-store as a local data-source for on-line/off-line interactions with your preferred { MT4 | Excel | quantitative-analytics } package.
Integrate monitoring of any production-environment irregularity in your real-time processing pipeline, which may range from network issues, VPN / hosting issues and data-source availability issues to an unexpected change in the scanned data-source's format / access conditions.
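As a minimal sketch of the DataPump / DB-store split above (the feed client is entirely hypothetical, since real data access depends on what the data-source owner permits): a pump loop pushes quote records onto a queue, and a separate writer thread batches them into SQLite, so slow disk writes never block the capture path.

```python
import queue
import sqlite3
import threading
import time

records = queue.Queue()              # hands records from the pump to the writer

def data_pump(feed):
    """Capture loop: read top-of-book quotes from a (hypothetical) feed client
    and enqueue them without doing any disk I/O itself."""
    for quote in feed:               # assumed to yield dicts with bid/ask fields
        records.put((time.time(), quote["bid"], quote["bid_qty"],
                     quote["ask"], quote["ask_qty"]))

def db_writer(path="quotes.db", batch=500):
    """Writer loop: drain the queue and commit in batches, so the pump never waits on disk."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS quotes ("
               "ts REAL, bid REAL, bid_qty REAL, ask REAL, ask_qty REAL)")
    buf = []
    while True:
        try:
            buf.append(records.get(timeout=1.0))
        except queue.Empty:
            pass
        if len(buf) >= batch or (buf and records.empty()):
            db.executemany("INSERT INTO quotes VALUES (?, ?, ?, ?, ?)", buf)
            db.commit()
            buf.clear()

threading.Thread(target=db_writer, daemon=True).start()
# data_pump(some_licensed_feed)     # plug in whatever feed / API you are permitted to use
```

From the SQLite file you can then export 2-3 hour slices to Excel, or query it directly from your analytics package.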
We (my company) run a website which has lots of data recorded, like user registrations, visits, clicks, what users post, etc., but so far we don't have a tool to monitor the whole thing or to find patterns in it, so that we can understand what kind of information we can get from it and management can take decisions based on it. In short, we want something similar to what people at Amazon or Google do with the data they collect.
Now, after the intro, I would like to know what this technology is called: is it Data Mining, Machine Learning, or something else? Where should we start to convert meaningless data into useful information?
I think what you need falls into the "realm" of parsing data, creating graphs, showing statistics about certain elements, etc.
There is no "easy" answer; I can only answer parts of your question.
There are no premade magical analytical tools; big companies have their own backend tools tuned to parse large amounts of data and spit out summaries that are then used to build graphs or for statistical analysis.
I think the domain you are searching for is statistical data analysis. But there are many parts that go together here.
The best advice I can give you is to set up specific goals for your analysis and then try to see what the best solution is; your question is too open.
For example, if you are interested in visit/click/website-related statistics, Google Analytics is a great tool, and very easy to use.
I would like to know what we mean when we say an "optimized" security system (physical or logical security system).
Does it mean something like a system which can monitor the performance of services, SQL, DB maintenance, logs, etc.?
Thanks
Optimized is a general term; you will have to get specific in defining what you need in order to consider it optimized to an "acceptable" level. Plus, there are different kinds of "optimization", such as for speed, memory usage, maintainability, etc.
Are you trying to figure out some criteria so that you can market your product as "optimized" and be able to explain it if someone asks what you mean?
If so, you need to figure out what your customers (or potential customers) actually care about. If they care about video resolution and disk space usage (how much the system can store before having to archive elsewhere), then you need to make your application smart (optimized! :) in those areas.
THEN, you could be more specific in your marketing and say, "optimized to use XYZ resolution and store up to 2 weeks of video on a standard hardware setup!" - which would actually mean something tangible to your customers, and show them that you care about what they care about.
For an ecommerce website how do you measure if a change to your site actually improved usability? What kind of measurements should you gather and how would you set up a framework for making this testing part of development?
Multivariate testing and reporting is a great way to actually measure these kinds of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.
Use methods similar to the ones you used to identify the usability problems to begin with: usability testing. Typically you identify your use cases and then run a lab study evaluating how users go about accomplishing certain goals. Lab testing typically works well with 8-10 people.
The more data-driven methodology we have adopted to understand our users is anonymous data collection (you may need user permission; make your privacy policies clear, etc.). This simply means evaluating which buttons/navigation menus users click on and how users delete something (i.e. when changing quantity, are more users entering 0 and updating the quantity, or hitting X?). This is a bit more complex to set up; you have to develop an infrastructure to hold this data (which is really just counters, i.e. "Times clicked x: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.
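As a toy illustration of that counter infrastructure (the event names are invented for the example), the storage side can start as little more than an increment keyed by event name, persisted periodically:

```python
import json
from collections import Counter

class EventCounters:
    """Minimal anonymous-usage counter store: one integer per named UI event."""

    def __init__(self, path="ui_counters.json"):
        self.path = path
        try:
            with open(path) as f:
                self.counts = Counter(json.load(f))
        except FileNotFoundError:
            self.counts = Counter()

    def record(self, event):
        self.counts[event] += 1

    def flush(self):
        with open(self.path, "w") as f:
            json.dump(self.counts, f)

# Hypothetical events wired into the UI handlers:
counters = EventCounters()
counters.record("cart.delete.via_x_button")
counters.record("cart.delete.via_zero_quantity")
counters.flush()
```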
To push the measurement of an improvement from a UI change upstream from the end-user (where the data gathering could take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario smaller? (If yes, then it has improved.) Measurement: # of steps reduced / added.
Does the change reduce the number of kinds of input devices used (even if the # of steps is the same)? By this I mean: if you take something that relied on both the mouse and the keyboard and change it to rely only on the mouse or only on the keyboard, then you have improved usability. Measurement: change in # of devices used.
Does the change make different parts of the website consistent? E.g. if one part of the e-Commerce site loses changes made while you are not logged on and another part does not, this is inconsistent. Changing them so that they have the same behavior improves usability (preferably toward the more fault-tolerant behavior, please!). Measurement: make a graph (a flow chart, really) mapping the ways a particular action could be done. Improvement is a reduction in the # of edges in the graph.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
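As a hedged sketch of how those heuristics can be turned into plain numbers for a single change (all the scenario data below is invented for illustration):

```python
# Before/after description of one scenario, using the heuristics above.
before = {
    "steps": 7,
    "devices": {"mouse", "keyboard"},
    "flow_edges": 12,   # edges in the flow chart of all ways to complete the action
}
after = {
    "steps": 5,
    "devices": {"mouse"},
    "flow_edges": 9,
}

improvements = {
    "steps_removed": before["steps"] - after["steps"],
    "devices_removed": len(before["devices"]) - len(after["devices"]),
    "flow_edges_removed": before["flow_edges"] - after["flow_edges"],
}
print(improvements)     # positive numbers approximate a usability gain
```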
Once you have these design-level approximations of usability improvement and then gather longer-term data, you can see whether they have any predictive ability for the end-user reaction (for example: over the last 10 projects, we've seen an average of 1% quicker scenarios for each action removed, with a range of 0.25% and a standard deviation of 0.32%).
The first way can be fully subjective or partly quantified: user complaints and positive feedback. The problem with this is that you may have some strong biases when it comes to filtering that feedback, so you had better make it as quantitative as possible. Having a ticketing system to file every report from the users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference in a questionnaire about the interface taken by end-users. Answers to each question should come from a set of discrete values, and then again you can gather statistics for each version of the interface.
The latter way may be much harder to set up (designing a questionnaire, and possibly the controlled environment for it, as well as the guidelines to interpret the results, is a craft by itself), but the former makes it unpleasantly easy to mess up the measurements. For example, you have to consider the fact that the number of tickets you get for each version depends on the time it has been in use, and that all time ranges are not equal (e.g. a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets during the first days of use, even if they find issues, etc.).
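For the ticketing route, a tiny sketch of the normalization this paragraph warns about (the figures are invented): compare complaint rates per unit of exposure rather than raw counts.

```python
# Invented numbers: (tickets filed, user-weeks the version was in production)
versions = {
    "ui-v1": (120, 400),
    "ui-v2": (80, 150),   # fewer tickets, but also far less exposure
}

for name, (tickets, user_weeks) in versions.items():
    print(f"{name}: {tickets / user_weeks:.2f} tickets per user-week")
# ui-v1: 0.30, ui-v2: 0.53 -- the raw counts would have suggested the opposite.
```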
Torial stole my answer. Although, if there is a measure of how long it takes to do a certain task, and the time is reduced while the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, then that would work too.