avg_over_time of a max - promql

I have a gauge metric badness which goes up when my service is performing poorly. There is one gauge per instance of the service and I have many instances.
I can take a max over all instances so that I can see how bad the worst instance is:
max(badness)
This graph is noisy because the identity of the worst instance, and how bad it is, changes frequently. I would like to smooth it out by applying a moving average. However, this doesn't work (I get a PromQL syntax error):
avg_over_time(max(badness)[1m])
How can I apply avg_over_time() to a timeseries that has already been aggregated with the max() operator?
My backend is VictoriaMetrics so I can use either MetricsQL or pure PromQL.

The avg_over_time(max(process_resident_memory_bytes)[5m]) query works without issues in VictoriaMetrics. It may fail if you use promxy in front of VictoriaMetrics, since promxy doesn't support MetricsQL - see this issue for details.
The query can be fixed so that it also works in Prometheus and promxy - just add a colon after 5m inside the square brackets:
avg_over_time(max(process_resident_memory_bytes)[5m:])
This is called a subquery in the Prometheus world. See more details about subquery specifics in VictoriaMetrics and Prometheus in this article.
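Applied back to the original badness metric, a sketch of the same fix would be the line below. The window and the optional resolution after the colon are illustrative values; if the resolution is omitted, the default evaluation interval is used:
avg_over_time(max(badness)[1m:15s])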

Related

Sort results in Azure Maps Search

I'm using Azure Maps Search and I'm trying to retrieve all POIs (points of interest) in a location, but I can't find anywhere in the documentation how to sort my results, for example by distance.
Does anyone have the same problem?
https://atlas.microsoft.com/search/poi/json?subscription-key=key&api-version=1.0&query=restaurant&lat=45&lon=9
I don't think the current Search POI API provides sorting as part of the API itself, so you'll have to do that in-memory afterwards. The results are sorted by "score" (relevancy) by default.
There is no way to order the POI results, which I guess is what you're looking for here. As per best practices, you could use nearby search:
https://atlas.microsoft.com/search/address/json?subscription-key={subscription-key}&api-version=1&query=400%20Broad%20Street%2C%20Seattle%2C%20WA&countrySet=US
If you would like straight-line distances you can loop through the results and calculate the distances using the haversine formula. If using the Azure Maps Web SDK, you can use the atlas.math.getDistanceTo function instead. Once you calculate a distance to each point, you can sort accordingly.
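As a rough Python sketch of that straight-line approach (the results/position field names are assumed from the Search POI JSON response, so verify them against your actual payload):

import math

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two points on Earth, in kilometres
    r = 6371.0
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2)) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def sort_by_distance(results, origin_lat, origin_lon):
    # sort the POI results by straight-line distance from the query point
    return sorted(
        results,
        key=lambda poi: haversine_km(origin_lat, origin_lon,
                                     poi["position"]["lat"], poi["position"]["lon"]),
    )

# usage: nearest_first = sort_by_distance(response_json["results"], 45, 9)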
If you want to get the driving distance to each point, there are two approaches you can take:
Use the Route Matrix API. This is fairly easy to use, would be less error prone than the second option below, and the response is easy enough to work with. The only negative with this approach is that you will need the S1 pricing tier to access this service, and each cell would generate a transaction, which can get expensive fast.
Use the Route Directions API with a large number of waypoints that go from your origin to each destination and back (A->B->A->C....). This will be a bit more work to understand the results, and if any leg of the route is unrouteable for any reason, the whole route calculation would fail. However, this would be significantly cheaper than option one, as you can use the S0 pricing tier, which has free limits, and this would only generate one transaction in most cases (if you have a large number of locations then you might need to break them up and spread them across a few calls). Because this calculates the route from the origin to each destination and back, twice as many calculations are made as you need, which could make this slower than approach 1. When parsing the response you would look at the odd-indexed route legs, as those go from the origin to each destination (see the sketch after this answer). In some scenarios it might be desirable to know the travel time from the destinations to the origin (i.e. how long it would take all employees to get to work), in which case the even-numbered legs are what you would want to use.
Again, once you have the distance, or better yet, travel time, you can then sort the results accordingly.
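For approach 2, a rough Python sketch of pulling out the outbound legs follows. The routes/legs/summary field names are assumed from the Route Directions JSON response and should be checked against your actual payload; note that Python indexes legs from 0, so the "odd indexed" legs described above sit at even 0-based indices here:

def outbound_leg_summaries(route_json):
    # Route built as origin->dest1->origin->dest2->origin..., so the outbound
    # (origin -> destination) legs are at 0-based indices 0, 2, 4, ...
    legs = route_json["routes"][0]["legs"]
    return [
        (i // 2,  # destination number
         leg["summary"]["travelTimeInSeconds"],
         leg["summary"]["lengthInMeters"])
        for i, leg in enumerate(legs)
        if i % 2 == 0
    ]

# destinations sorted by drive time:
# sorted(outbound_leg_summaries(route_json), key=lambda t: t[1])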

Nools and Drools

I was really happy to see a rules engine in Node, and I was also looking at Drools in the Java world. While reading the documentation (specifically: http://docs.jboss.org/drools/release/6.1.0.Final/drools-docs/html_single/index.html#PHREAK) I found that Drools 6.0 has evolved and now uses the PHREAK method for rule matching. The specific paragraph that is of interest is:
Each successful join attempt in RETE produces a tuple (or token, or partial match) that will be propagated to the child nodes. For this reason it is characterised as a tuple oriented algorithm. For each child node that it reaches it will attempt to join with the other side of the node, again each successful join attempt will be propagated straight away. This creates a descent recursion effect. Thrashing the network of nodes as it ripples up and down, left and right from the point of entry into the beta network to all the reachable leaf nodes.
For complex rules and rules over a certain limit, the above quote says that the RETE-based method thrashes the network quite a lot, and so it evolved into PHREAK.
Since nools is based on the Rete algorithm, is the above valid? Are there any optimizations done similar to PHREAK? Have any comparisons been done w.r.t. Drools?
The network thrashing is only an issue when you want to try and apply concurrency and parallelism, which requires locking in areas. As NodeJS is single threaded, that won't be an issue. We haven't attempted to solve this area in Drools yet either - but the Phreak work was preparation with this in mind, learning from the issues we found with our Rete implementation. On a separate note, Rete has used partition algorithms in the past for parallelism, and this work is in the same area in terms of the problem it's trying to solve.
For single threaded machines, lazy rule evaluation is much more interesting. However, as the document notes, a single rule of joins will not differ in performance between Phreak and Rete. As you add lots of rules, the lazy nature avoids potential work, thus saving overall CPU cycles. The algorithm is also more forgiving of a larger number of badly written rules, and should degrade less in performance. For instance, it doesn't need the traditional Rete root "Context" object that is used to drive rule selection and short-circuit wasteful matching - this would be seen as an anti-pattern in Phreak and may actually slow it down, as you blow away matches it might use again in the future.
http://www.dzone.com/links/rip_rete_time_to_get_phreaky.html
Also the collection oriented propagation is relevant when multiple subnetworks are used in rules, such as with multiple accumulates.
http://blog.athico.com/2014/02/drools-6-performance-with-phreak.html
I also did a follow up on the backward chaining and stack evaluation infrastructure:
http://blog.athico.com/2014/01/drools-phreak-stack-based-evaluations.html
Mark (Creator of Phreak)

How much request volume can Microsoft Web Ngram API handle?

I currently have a list of 200 words from which I need to create semantically correct permutations. Unfortunately, permuting through a list of that size will lead to something like a trillion permutations.
What I am planning to do is utilize the Microsoft Web Ngram service and a yield function to find ngrams within my permutations that have joint scores above a certain threshold. My hope here is that by filtering based on score, I will be left with only semantically correct permutations.
My question is regarding the Microsoft Ngram API: with a list of 200 words, there will be A LOT of permutations to go through using this method -- can someone give me a sense of whether the API will be able to handle that volume of requests?
Thanks!
There is no limit on the number of queries you can make. However, the terms of use disallow threaded access, and the server response is relatively slow (between 0.12 and 0.22 s per query). So you could get at most 720k queries in a 24 hour period. I'm using PHP's file_get_contents(...). There may be a faster way.
In my application I've chopped up a library such that portions are updated with n-gram data as needed. It does slow down my code but it is at least tenable.
http://kkava.com/vocab/?ngram=on&imp=on&v=
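For the yield-based filtering the question describes, a minimal Python sketch might look like the following. Here get_joint_logprob is a hypothetical wrapper around a single HTTP request to the service (the answer above uses PHP's file_get_contents; urllib.request.urlopen would be a Python equivalent), and the threshold is whatever cutoff you calibrate for "semantically correct":

from itertools import permutations

def plausible_permutations(words, length, score_fn, threshold):
    # Lazily yield only the word orderings whose joint n-gram score clears the
    # threshold. Queries stay sequential, since the terms of use disallow
    # threaded access, so throughput is bounded by the per-request latency.
    for combo in permutations(words, length):
        phrase = " ".join(combo)
        if score_fn(phrase) >= threshold:
            yield phrase

# e.g.
# for phrase in plausible_permutations(word_list, 3, get_joint_logprob, -12.0):
#     print(phrase)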

How to estimate search application's efficiency?

I hope it belongs here.
Can anyone please tell me whether there is any method to compare different search applications working in the same domain with the same dataset?
The problem is that they are quite different - one is a web application which looks up a database where items are grouped in categories, and the other is a rich client which searches by keywords.
Are there any standard test guides for that purpose?
There are testing methods. You may use e.g. Precision/Recall or the F-beta measure to compute a score that captures the "efficiency". However, you need to build a reference set yourself. That means you will not really measure the efficiency in the domain, but rather the efficiency compared to your own judgement.
All the more reason to make sure that your reference set is representative of the data you have.
In most cases common-sense reasoning will also give you the result.
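As a concrete reference for the metrics mentioned above, here is a small Python sketch that scores one query against a hand-built reference set of relevant items:

def precision_recall_fbeta(retrieved, relevant, beta=1.0):
    # retrieved: items the search application returned for a query
    # relevant: the hand-built reference set of items it should have returned
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta

# e.g. precision_recall_fbeta(app_results, reference_set, beta=2.0)
# beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.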
If you want to measure performance in terms of speed, you need to formulate a set of assumed queries against the search and query your search engine with these at a given rate. That's doable with every common load-testing tool.

Generic graphing and charting solutions

I'm looking for a generic charting solution, ideally not a hosted one, that provides the following features:
Charting a tuple of values where the values are:
1) A service identifier (e.g. CPU usage)
2) A client identifier within that service (e.g. server IP)
3) A value
4) A timestamp with millisecond/second resolution.
Optional:
I'd also like to extend the concept of a client identifier further: taking the above example, I'd like to store statistics for each core separately, so another identifier would be Core 1/Core 2, etc.
Now, to make sure I'm clearly stating my problem: I don't want a utility that collects these statistics. I'd like something that stores them, but this is also not mandatory; I can always store them in MySQL, or such.
What I'm looking for is something that takes values such as these and charts them nicely, in a multitude of ways (timelines, motion, and the usual ones [pie, bar..]). Essentially, a nice visualization package that allows me to make use of all this data. I'd be collecting data from multiple services and multiple applications, and the datapoints will be of varying resolution. Some of the data will include multiple layers of nesting, some none. (For example, CPU would go down to Server IP, CPU#, whereas memory would only be Server IP, but would include a different identifier, i.e. free/used/cached, as the "secondary" identifier. Something like average request latency might not have a secondary identifier at all, in the case of ping.) What I'm trying to get across is that having multiple layers of identifiers would be great. To add one final example of where multiple identifiers would be great: adding an extra identifier on top of IP/CPU#, namely, process name. I think the advantages of that are obvious.
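To make the shape of such a datapoint concrete, a minimal Python sketch (the field and identifier names are purely illustrative) could be:

from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class DataPoint:
    service: str                  # e.g. "cpu_usage"
    identifiers: Tuple[str, ...]  # nested client identifiers, outermost first
    value: float
    timestamp_ms: int             # millisecond resolution

# CPU for one process on one core of one server:
# DataPoint("cpu_usage", ("10.0.0.5", "core2", "nginx"), 37.5, 1700000000000)
# Ping latency with no secondary identifier:
# DataPoint("request_latency_ms", ("10.0.0.5",), 12.3, 1700000000000)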
For some applications, we might collect data at a very narrow scope, focusing on every aspect, in other cases, it might be a more general statistic. When stuff goes wrong, both come in useful, the first to quickly say "something just went wrong", and the second to say "why?".
Further, it would be a nice thing if the charting application threw out "bad" values; that is, if for some reason our monitoring program started to report values of 300% CPU used on a single core for 10 seconds, it'd be nice if the charts themselves didn't reflect it in the long run. Some sort of smoothing, maybe? This could obviously be done at the data layer though, so it's not a requirement at all.
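If you do end up handling this at the data layer, a simple rolling-median filter is one option, sketched below in Python; a short-lived spike (like a bogus 300% reading) barely moves the median:

from collections import deque
from statistics import median

def rolling_median(values, window=9):
    # Replace each value with the median of its trailing window.
    buf = deque(maxlen=window)
    smoothed = []
    for v in values:
        buf.append(v)
        smoothed.append(median(buf))
    return smoothed

# rolling_median([40, 42, 41, 300, 43, 41, 40]) -> the 300 spike is flattened out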
Finally, comparing two points in time, or comparing two different client identifiers of the same service etc without too much effort would be great.
I'm not partial to any specific language, although I'd prefer something in (one of the following) PHP, Python, C/C++, C#, as these are languages I'm familiar with. It doesn't have to be open source, it doesn't have to be a library, I'm open to using whatever fits my purpose the best.
More of a P.S. than a requirement: I'd like to have pretty charts that are easy for non-technical people to understand, act upon, and enjoy looking at.
I'm open to clarifying, and, in advance, thanks for your time!
I am pretty sure that Protovis meets all your requirements, but it has a bit of a learning curve. You are meant to learn by example, and there are plenty of examples to work from. It makes some pretty nice graphs by default. Every value can be a function, so you can do things like get rid of your "bad" values.
