When reading through The DevOps Handbook by Gene Kim, I came across an interesting tidbit of information:
Apparently there is native functionality within Grafana that allows you to perform the K-S (Kolmogorov-Smirnov) test on different datasets. Looking through the documentation, I haven't been able to find anything confirming this.
Does anyone know how to perform this test on datasets? Do I need to do it within a query as a function, or is it something simpler that I am missing?
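For context, here is a minimal sketch of the test I am trying to reproduce, done outside Grafana with Python and SciPy (the data here is just placeholder noise):

```python
# Minimal sketch of a two-sample Kolmogorov-Smirnov test outside Grafana,
# using SciPy. The arrays are placeholders standing in for two datasets
# (e.g. a metric sampled before and after a deployment).
import numpy as np
from scipy import stats

baseline = np.random.normal(loc=100, scale=10, size=500)
candidate = np.random.normal(loc=105, scale=12, size=500)

statistic, p_value = stats.ks_2samp(baseline, candidate)
print(f"K-S statistic = {statistic:.3f}, p-value = {p_value:.4f}")
# A small p-value suggests the two samples were not drawn from the same distribution.
```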
Thanks!
I am building a package that uses the Google Analytics API for Python.
But in several cases, when I have multiple dimensions, the extraction by day is sampled.
I know that if I use sampling_level = LARGE, a more accurate sample will be used.
But does anybody know if there is a way to reduce a request so that you can extract one day without sampling?
I'd be grateful for any help.
Setting the sampling level to LARGE is the only method we have to influence the amount of sampling, but as you already know, this doesn't prevent it.
The only way to reduce the chances of sampling is to request less data. A reduced number of dimensions and metrics, as well as a shorter date range, are the best ways to ensure that you don't get sampled data.
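To illustrate, a reduced request might look like this with the Analytics Reporting API v4 in Python (a sketch; the view ID, dates, and fields are placeholders, and `credentials` is assumed to be an already-authorized credentials object):

```python
# Sketch of a "small" Reporting API v4 request: one day, one dimension,
# one metric, and samplingLevel set to LARGE. The view ID and the
# authorized credentials object are placeholders/assumptions.
from googleapiclient.discovery import build

analytics = build("analyticsreporting", "v4", credentials=credentials)

body = {
    "reportRequests": [{
        "viewId": "12345678",                        # placeholder view ID
        "dateRanges": [{"startDate": "2019-01-01",
                        "endDate": "2019-01-01"}],   # a single day
        "dimensions": [{"name": "ga:date"}],         # keep dimensions minimal
        "metrics": [{"expression": "ga:sessions"}],
        "samplingLevel": "LARGE",
    }]
}
response = analytics.reports().batchGet(body=body).execute()
```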
This is probably not the answer you want to hear, but one way of getting unsampled data from Google Analytics is to use unsampled reports. However, this requires that you sign up for Google Marketing Platform. With these you can create an unsampled report request using the API or the UI.
There is also a way to export the data to BigQuery, but you lose the analysis that Google provides and will have to do that yourself. This too requires that you sign up for Google Marketing Platform.
There are several tactics for building unsampled reports; the most popular is splitting your report into shorter time ranges, down to hours. Mark Edmondson did great work on anti-sampling in his R package, so you might find it useful. You may start with this blog post: https://code.markedmondson.me/anti-sampling-google-analytics-api/
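The same splitting idea in Python would look roughly like this (a sketch reusing the kind of v4 request shown above; the view ID, metric, and `analytics` service object are assumptions):

```python
# Rough sketch of time-splitting: one report request per day so each
# request covers less data and is less likely to be sampled. The view ID,
# metric, and authorized `analytics` service object are assumptions.
from datetime import date, timedelta

def fetch_day(analytics, view_id, day):
    body = {"reportRequests": [{
        "viewId": view_id,
        "dateRanges": [{"startDate": day.isoformat(), "endDate": day.isoformat()}],
        "dimensions": [{"name": "ga:date"}],
        "metrics": [{"expression": "ga:sessions"}],
        "samplingLevel": "LARGE",
    }]}
    return analytics.reports().batchGet(body=body).execute()

start, end = date(2019, 1, 1), date(2019, 1, 31)
reports, day = [], start
while day <= end:
    reports.append(fetch_day(analytics, "12345678", day))
    day += timedelta(days=1)
# If a day is still sampled, its response contains "samplesReadCounts"
# under reports[0]["data"], so you can detect it and split further.
```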
I have two variants of my web page, but I'm monitoring a bunch of variables. So what I'm able to extract from my website monitoring experiment is as follows:
Original solution: Visitors, body link click-visitors, most popular click-visitors, share-visitors.
Solution with some change: Visitors, body link click-visitors, most popular click-visitors, share-visitors.
I was wondering about a simple two-sample proportion test: take each of the monitored variables and compute a proportion test for the original and the changed solution.
I don't know if it tells me something about the overall result - whether the original solution is better than the changed one or not.
Is there something better that I can use for this purpose? I'd appreciate any advice.
Sounds to me like you're confusing two things: the business metric of interest and the test for statistical significance. The former is some business measurement that you would like to improve. This could be sales, conversion, subscription rate, or many others. See e.g. this paper for a good discussion on the perils of using the wrong metric. Statistical significance is a test that tells you whether the number of measurements you've seen so far is enough to substantiate a claim that the difference between the two experiences is very unlikely to be random. See e.g. this paper for a good discussion.
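For the significance side, the two-sample proportion test mentioned in the question would look roughly like this per metric (a sketch assuming Python with statsmodels; the counts are made up):

```python
# Sketch of a two-sample proportion (z) test for one monitored metric,
# e.g. body-link click-visitors out of all visitors, original vs. changed
# page. The counts below are made up for illustration.
from statsmodels.stats.proportion import proportions_ztest

clicks = [320, 365]       # body-link click-visitors: original, changed
visitors = [5000, 5100]   # total visitors: original, changed

z_stat, p_value = proportions_ztest(count=clicks, nobs=visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
# Run one such test per metric, but decide on a single primary business
# metric up front; testing many metrics separately inflates the chance
# that one of them looks "significant" purely by chance.
```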
I'm eager to try out some more with Microsoft Azure Machine Learning and would like to find a data set to build a use case concerning predictive manufacturing. Microsoft already offers a data set (semiconductor) for a use case like this, but I would like to try out some more. Does anybody know where I can find another data set similar to the one provided by MS?
Basically I'm looking for a bunch of sensor data from a manufacturing process and a classification of whether a failure occurred or not.
Your help would be greatly appreciated ;)
Thanks,
Clemens
This blog post contains interesting sources of data that you can use with Azure ML. From the post:
Data.gov – http://www.data.gov/
Kaggle - http://www.kaggle.com/
UCI Machine Learning Repository - http://archive.ics.uci.edu/ml/
Specifically, you could check out the manufacturing data set on data.gov.
We have a Java application which essentially performs ETL - reading from and writing to files/databases with transformation rules applied in the middle.
I've started looking into automating acceptance testing for the application; however, I am struggling to apply the frameworks I've looked at so far (Concordion, Cucumber, etc.). They seem very easy to implement for simple applications like those shown in their tutorials, but I basically have to have tests saying "I have this input file and expect this output file (or result in a DB table)" - with each file having hundreds of fields.
I could fake it so that the input values are read from an HTML table (as per the Concordion tutorial); however, that isn't really a true test.
Has anyone come across a framework that could help, or been able to use Concordion for such a purpose?
Many thanks
Who is the audience of the test? If this is purely a technical exercise and there are no non-technical business owners that need to interact with the test, then just doing it with your favorite unit testing framework is fine. FitNesse works best when there is collaboration on the acceptance criteria with non-techies.
So no, just "file input 'a' produces file output 'b'" probably is not enough to warrant the overhead of FitNesse. I would only move it to tables if someone was going to change it on a regular basis and that person was not comfortable editing a file directly.
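For what it's worth, such a "file in, file out" acceptance test can stay very small in any unit testing framework. Here is a sketch of the shape (in Python/pytest for brevity; the ETL entry point, paths, and normalisation step are all assumptions, and the same pattern translates directly to JUnit):

```python
# Sketch of a "golden file" acceptance test: run the ETL job on a known
# input and diff the produced output against an expected (approved) file.
# The command, paths, and normalisation rule are placeholders/assumptions.
import subprocess
from pathlib import Path

def normalise(text: str) -> str:
    # Strip run-dependent noise (e.g. comment lines with timestamps) before comparing.
    return "\n".join(line for line in text.splitlines() if not line.startswith("#"))

def test_customer_feed_is_transformed_as_expected(tmp_path):
    output_file = tmp_path / "customers.out"
    subprocess.run(
        ["java", "-jar", "etl.jar",               # placeholder entry point
         "--input", "testdata/customers.in",
         "--output", str(output_file)],
        check=True,
    )
    expected = normalise(Path("testdata/customers.expected").read_text())
    actual = normalise(output_file.read_text())
    assert actual == expected
```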
At a leading bank in the Netherlands we have set up test automation with FitNesse and ETL fixtures. It is an Agile project, and for our ETL solution we use Informatica PowerCenter and Oracle DB. For us, our test automation/specification in FitNesse is of great value now.
We have SLIM fixtures for truncating tables, inserting records into tables, checking table records against expected values, updating records, and calling our FitNesse workflows.
Did you try JBehave? You can read more about it at http://www.qatestingtools.com/jbehave
I don't know where to start on this one, so hopefully you guys can clear up my question. I have a project where email will be searched for specific words/patterns and the results stored in a structured manner - something like what TripIt does.
The article states that they developed a DataMapper
The DataMapper is responsible for taking inbound email messages
addressed to plans [at] tripit.com and transforming them from the
semi-structured format you see in your mail reader into a highly
structured XML document.
There is a comment that also states
If you're looking to build this yourself, reading a little bit about
Wrappers and Wrapper Induction might be helpful
I Googled and read about wrapper induction, but it was just too broad a definition and didn't help me understand how one would go about solving such a problem.
Is there some open source project out there that does similar things?
There are a couple of different ways you can accomplish this.
The first part, which involves getting access to the email content, I'll not answer here. Basically, I'll assume that you have access to the text of the emails; if you don't, there are libraries that allow you to connect Java to a mailbox, such as Apache Camel (http://camel.apache.org/mail.html).
So now you've got the email - then what?
A handy thing that could help is that LingPipe (http://alias-i.com/lingpipe/) has an entity recognizer that you can populate with your own terms. Specifically, look at some of their extraction tutorials and their dictionary extractor (http://alias-i.com/lingpipe/demos/tutorial/ne/read-me.html). Inside the LingPipe dictionary extractor (http://alias-i.com/lingpipe/docs/api/com/aliasi/dict/ExactDictionaryChunker.html) you'd simply import the terms you're interested in and use that to associate labels with an email.
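If it helps to see the shape of the idea without LingPipe, here is a rough dictionary-matching sketch in Python (the terms and labels are invented; LingPipe's ExactDictionaryChunker does this far more robustly):

```python
# Rough sketch of the dictionary-extractor idea (not LingPipe itself):
# match a hand-built dictionary of terms against an email body and
# record which label each match belongs to. Terms/labels are made up.
import re

DICTIONARY = {
    "delta air lines": "AIRLINE",
    "united": "AIRLINE",
    "hilton": "HOTEL",
    "confirmation number": "BOOKING_FIELD",
}

def chunk_email(text):
    chunks = []
    for term, label in DICTIONARY.items():
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            chunks.append((match.start(), match.end(), match.group(0), label))
    return sorted(chunks)

email_body = "Your United confirmation number is ABC123. Hilton check-in at 3pm."
for start, end, term, label in chunk_email(email_body):
    print(f"{label:>14}: '{term}' at [{start}:{end}]")
```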
You might also find the following question helpful: Dictionary-Based Named Entity Recognition with zero edit distance: LingPipe, Lucene or what?
Really a very broad question, but I can try to give you some general ideas, which might be enough to get started. Basically, it sounds like you're talking about an elaborate parsing problem - scanning through the text and looking to apply meaning to specific chunks. Depending on what exactly you're looking for, you might get some good mileage out of a few regular expressions to start - things like phone numbers, email addresses, and dates have fairly standard structures that should be matchable. Other data points might benefit from some indicator words - the phrase "departing from" might indicate that what follows is an address. The natural language processing community also has a large tool set available for text processing - check out things like part-of-speech taggers and semantic analyzers if they're appropriate to what you're trying to do.
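To make that concrete, a few starter patterns might look like this (a sketch; the patterns are deliberately simplistic and the "departing from" rule is just one example of an indicator phrase):

```python
# Starter sketch of the regex approach: standard-looking chunks (emails,
# phone numbers, dates) plus an indicator phrase ("departing from") whose
# following text is captured. Patterns are deliberately simplistic.
import re

PATTERNS = {
    "email":     re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "phone":     re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "date":      re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "departure": re.compile(r"departing from\s+([A-Z][\w ,-]+)", re.IGNORECASE),
}

def extract(text):
    found = {}
    for name, pattern in PATTERNS.items():
        # group(1) for patterns with a capture group, the whole match otherwise
        found[name] = [m.group(m.lastindex or 0) for m in pattern.finditer(text)]
    return found

sample = ("Hi, your flight on 03/14/2019 is departing from San Francisco Intl. "
          "Questions? Call +1 415-555-0100 or reply to support@example.com.")
print(extract(sample))
```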
Armed with those techniques, you can follow a basic iterative development process: For each data point in your expected output structure, define some simple rules for how to capture it. Then, run the application over a batch of test data and see which samples didn't capture that datum. Look at the samples and revise your rules to catch those samples. Repeat until the extractor reaches an acceptable level of accuracy.
Depending on the specifics of your problem, there may be machine learning techniques that can automate much of that process for you.