Include a label for metadata purposes without aggregating on it - promql

I have a Gauge metric that indicates the error status of a variable number of project replications to mirrors:
project A -> project A Mirror 1 -> project A Mirror 2
project B -> project B Mirror 1 -> project B Mirror 2
...
There is one value per mirror, per project, where 1 is a successful mirror and 0 is a failure. A status label includes a variable error message string if there was a failure, and a type label differentiates mirrors.
The full data for this metric at a single point in time might look something like this:
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectB",status="Something went wrong: full error message",type="a"} 0
mirror_info{instance="",job="mirror-status",path="projectB",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectC",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectC",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectD",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectD",status="Something different went wrong: full error message",type="b"} 0
mirror_info{instance="",job="mirror-status",path="projectE",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectE",status="ok",type="b"} 1
I want to be able to show a table like this:
| project | status a | status b |
| ------- | -------- | --------- |
| projectA | ok | ok |
| projectB | Something went wrong: full error message | ok |
| ... | ... | ... |
| projectD | ok | Something different went wrong: full error message |
The issue I'm running into is that I can't aggregate on path without losing the status, and I can't include the status without getting a different entry for every single error message variant.
I'm too much of a beginner at PromQL to know whether such a thing is even possible, and I'm fully aware that Prometheus may not be the right tool for this; however, that particular requirement is beyond my control in this case.
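To make the requested shape concrete, here is a plain-Python sketch (not PromQL) of the pivot the table describes: one row per path, one column per type, and the status label as the cell value. It parses a few of the exposition lines shown above with a simple regex that assumes label values contain no escaped quotes; SERIES, LABEL_RE and pivot are hypothetical names.

```python
import re

# A few of the series from the question, in the Prometheus text exposition format.
SERIES = '''\
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="a"} 1
mirror_info{instance="",job="mirror-status",path="projectA",status="ok",type="b"} 1
mirror_info{instance="",job="mirror-status",path="projectB",status="Something went wrong: full error message",type="a"} 0
mirror_info{instance="",job="mirror-status",path="projectB",status="ok",type="b"} 1
'''

# Matches name="value" pairs; assumes no escaped quotes inside values.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def pivot(text):
    """Return {path: {type: status}} from one sample line per series."""
    table = {}
    for line in text.splitlines():
        labels_part = line[line.index("{") + 1 : line.rindex("}")]
        labels = dict(LABEL_RE.findall(labels_part))
        table.setdefault(labels["path"], {})[labels["type"]] = labels["status"]
    return table


# Print the pivot as rows of the desired table.
for path, statuses in sorted(pivot(SERIES).items()):
    print(f"| {path} | {statuses.get('a', '')} | {statuses.get('b', '')} |")
```

The point of the sketch is that the status string is carried along as a cell value rather than used as a grouping key, which is exactly what an aggregation over path alone cannot do.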

Related

Pandas: Sliding window, summing up 14-day data

I wonder how to make sliding windows in Pandas.
I have a dataframe with three columns.
Country | Number | DayOfTheYear
===================================
No | 50 | 0
No | 20 | 1
No | 37 | 2
I would love to see 14-day chunks for every country and day combination.
The country thing can be ignored for the moment, since I can filter those manually in some way. But imagine there is only one country: is there a smart way to get some sort of summed-up sliding window, resulting in something like the following?
Country | Sum | DatesOftheYear
===================================
No | 504 | 0-13
No | 207 | 1-14
No | 337 | 2-15
I would also accept it if they were disjoint, i.e. only 0-13, 14-27, etc.
But I just cannot get along with Pandas. I know an old SQL solution, but does anybody have a nice idea for Pandas?
If you want a rolling window over your dataframe, you can simply use the .rolling method of pandas: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
In your case: df["Number"].rolling(14).sum()
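A slightly fuller sketch of that answer, assuming the rows are sorted by DayOfTheYear with exactly one row per country and day (rolling(14) counts rows, not calendar days). It computes the sliding sum per country and also the disjoint 0-13, 14-27, ... chunks; the sample data and the Sum14/Chunk/FirstDay/LastDay column names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one country, 28 consecutive days, Number equal to the day index.
df = pd.DataFrame({
    "Country": ["No"] * 28,
    "Number": range(28),
    "DayOfTheYear": range(28),
})

# Sliding window per country: each row holds the sum of itself and the 13 rows
# before it. The first 13 rows of each country are NaN (window not yet full).
df["Sum14"] = df.groupby("Country")["Number"].transform(lambda s: s.rolling(14).sum())

# Disjoint chunks: days 0-13 -> chunk 0, days 14-27 -> chunk 1, and so on.
chunks = (
    df.assign(Chunk=df["DayOfTheYear"] // 14)
      .groupby(["Country", "Chunk"], as_index=False)
      .agg(Sum=("Number", "sum"),
           FirstDay=("DayOfTheYear", "min"),
           LastDay=("DayOfTheYear", "max"))
)
print(chunks)
```

Note that rolling labels each window by its last row, so the 0-13 sum sits on the row for day 13; if you prefer it on the row for day 0, shift the column with .shift(-13).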

Getting multiple readings from .txt into Excel

I'm not sure if this is the correct place to ask this, but basically I have a .txt file containing values that came from 2 separate sensors.
Example of some data:
{"t":3838202,"s":0,"n":"x1","v":-1052}
{"t":3838203,"s":0,"n":"y1","v":44}
{"t":3838204,"s":0,"n":"z1","v":-84}
{"t":3838435,"s":0,"n":"x1","v":-1052}
{"t":3838436,"s":0,"n":"y1","v":36}
{"t":3838437,"s":0,"n":"z1","v":-80}
{"t":3838670,"s":0,"n":"x1","v":-1056}
{"t":3838671,"s":0,"n":"y1","v":52}
{"t":3838672,"s":0,"n":"z1","v":-88}
{"t":3838902,"s":0,"n":"x1","v":-1052}
{"t":3838903,"s":0,"n":"y1","v":48}
{"t":3838904,"s":0,"n":"z1","v":-80}
{"t":3839136,"s":0,"n":"x1","v":-1056}
{"t":3839137,"s":0,"n":"y1","v":40}
{"t":3839138,"s":0,"n":"z1","v":-80}
x2:-944
y2:108
z2:-380
{"t":3839841,"s":0,"n":"x1","v":-1052}
{"t":3839842,"s":0,"n":"y1","v":44}
{"t":3839843,"s":0,"n":"z1","v":-80}
x2:-948
y2:100
z2:-380
{"t":3840541,"s":0,"n":"x1","v":-1052}
{"t":3840542,"s":0,"n":"y1","v":40}
{"t":3840543,"s":0,"n":"z1","v":-84}
{"t":3840774,"s":0,"n":"x1","v":-1052}
{"t":3840775,"s":0,"n":"y1","v":40}
{"t":3840776,"s":0,"n":"z1","v":-84}
x2:-948
y2:108
z2:-368
I'm trying to get the data into Excel so that, for each "chunk" of x1/y1/z1 data, I keep only the last recorded set, discard the rest, and "pair" it with the next set of x2/y2/z2 data. I don't think I'm explaining it very well, but basically I want to take that text file and get this in Excel:
+---------+-------+----+-----+------+-----+------+
| t | x1 | y1 | z1 | x2 | y2 | z2 |
+---------+-------+----+-----+------+-----+------+
| 3839138 | -1056 | 40 | -80 | -944 | 108 | -380 |
| 3839843 | -1052 | 44 | -80 | -948 | 100 | -380 |
| 3840776 | -1052 | 40 | -84 | -948 | 108 | -368 |
+---------+-------+----+-----+------+-----+------+
I'm really stuck as to where I should even start.
I think like a programmer, so I would approach this problem in steps. If you are not a programmer, this might not be so helpful to you, and I am sorry for that.
First, define the data: how is each line of data read and understood?
Second, write a parsing utility: a piece of code that interprets the data as it is read and stores it in the form you want for your output.
Third, import data into Excel.
So, based on the limited data you provided, I am not sure how you are able to determine the x1,y1,z1,x2,y2,z2 for each t, but I assume that the values enclosed in curly braces have something to do with that based on the values for s, n, and v I'm seeing in there. So, first of all you need to clearly determine the way you read the data. Take it one line at a time, and determine how you would build your output table based on each line of data. I assume you would treat the lines enclosed in curly braces differently from the lines with standalone x/y/z values for example.
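Following those steps, a minimal Python sketch of the parsing utility, assuming every sensor-1 line is a JSON object with "t", "n" and "v" keys, every sensor-2 line has the form name:value, and a z2 line always closes a sensor-2 block. Because each sensor-1 line overwrites the previous value for its axis, only the last x1/y1/z1 set before each block survives. It writes a CSV file, which Excel can open directly; parse_readings, write_csv and the file names are hypothetical.

```python
import csv
import json

FIELDS = ["t", "x1", "y1", "z1", "x2", "y2", "z2"]


def parse_readings(lines):
    """Pair the last x1/y1/z1 set before each x2/y2/z2 block with that block."""
    rows = []
    sensor1 = {}  # latest x1/y1/z1 values, plus the t of the most recent line
    sensor2 = {}  # x2/y2/z2 values of the block currently being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("{"):
            record = json.loads(line)
            sensor1[record["n"]] = record["v"]
            sensor1["t"] = record["t"]
        else:
            name, value = line.split(":", 1)
            sensor2[name] = int(value)
            if name == "z2":  # end of a sensor-2 block: emit one output row
                rows.append({**sensor1, **sensor2})
                sensor2 = {}
    return rows


def write_csv(rows, out):
    """Write the rows with a header line to an open text file."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# Usage (hypothetical file names):
# with open("readings.txt") as src, open("readings.csv", "w", newline="") as dst:
#     write_csv(parse_readings(src), dst)
```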
I hope this points you in the right direction.

Spark: count events based on two columns

I have a table with events which are grouped by a uid. All rows have the columns uid, visit_num and event_num.
visit_num is an arbitrary counter that occasionally increases. event_num is the counter of interactions within the visit.
I want to merge these two counters into a single interaction counter that keeps increasing by 1 for each event and continues to increase when the next visit has started.
As I only look at the relative distance between events, it's fine if I don't start the counter at 1.
|uid |visit_num|event_num|interaction_num|
| 1 | 1 | 1 | 1 |
| 1 | 1 | 2 | 2 |
| 1 | 2 | 1 | 3 |
| 1 | 2 | 2 | 4 |
| 2 | 1 | 1 | 500 |
| 2 | 2 | 1 | 501 |
| 2 | 2 | 2 | 502 |
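I can achieve this by repartitioning the data and using monotonically_increasing_id like this:
from pyspark.sql import functions as fn
# Note: sort() is a global sort, so Spark range-partitions the rows by
# (visit_num, event_num) again; rows of one uid can end up in different partitions.
df.repartition("uid") \
    .sort("visit_num", "event_num") \
    .withColumn("iid", fn.monotonically_increasing_id())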
However, the documentation states:
The generated ID is guaranteed to be monotonically increasing and unique, but not consecutive. The current implementation puts the partition ID in the upper 31 bits, and the record number within each partition in the lower 33 bits. The assumption is that the data frame has less than 1 billion partitions, and each partition has less than 8 billion records.
As the ID seems to increase monotonically within each partition, this seems fine. However:
I am close to reaching the 1 billion partition/uid threshold.
I don't want to rely on the current implementation not changing.
Is there a way I can start each uid with 1 as the first interaction num?
Edit
After testing this some more, I notice that some of the users don't seem to have consecutive iid values using the approach described above.
Edit 2: Windowing
Unfortunately, there are some (rare) cases where more than one row has the same visit_num and event_num. I've tried using the window function below, but because it assigns the same rank to two identical rows, this is not really an option.
from pyspark.sql import Window
from pyspark.sql import functions as fn
iid_window = Window.partitionBy("uid").orderBy("visit_num", "event_num")
df_sample_iid = df_sample.withColumn("iid", fn.rank().over(iid_window))
The best solution is the Windowing function with rank, as suggested by Jacek Laskowski.
iid_window = Window.partitionBy("uid").orderBy("visit_num", "event_num")
df_sample_iid = df_sample.withColumn("iid", fn.rank().over(iid_window))
In my specific case, some more data cleaning was required, but in general this should work.

Can I use BDD for testing low-abstraction-level code?

I checked several (real-world) BDD examples, but all I found were e2e tests using Selenium. Is it possible to write unit tests with BDD? If so, what should such a unit test look like in Gherkin? I have a hard time imagining what to write in the feature and scenario descriptions and how to use them to generate documentation, for example for the Java collections framework.
edit
I have found an example here: http://jonkruger.com/blog/2010/12/13/using-cucumber-for-unit-tests-why-not/comment-page-1/
features:
Feature: Checkout
  Scenario Outline: Checking out individual items
    Given that I have not checked anything out
    When I check out <item>
    Then the total price should be the <unit price> of that item
    Examples:
      | item | unit price |
      | "A"  | 50         |
      | "B"  | 30         |
      | "C"  | 20         |
      | "D"  | 15         |
  Scenario Outline: Checking out multiple items
    Given that I have not checked anything out
    When I check out <multiple items>
    Then the total price should be the <expected total price> of those items
    Examples:
      | multiple items | expected total price | notes                |
      | "AAA"          | 130                  | 3 for 130            |
      | "BB"           | 45                   | 2 for 45             |
      | "CCC"          | 60                   |                      |
      | "DDD"          | 45                   |                      |
      | "BBB"          | 75                   | (2 for 45) + 30      |
      | "BABBAA"       | 205                  | order doesn't matter |
      | ""             | 0                    |                      |
  Scenario Outline: Rounding money
    When rounding "<amount>" to the nearest penny
    Then it should round it using midpoint rounding to "<rounded amount>"
    Examples:
      | amount | rounded amount |
      | 1      | 1              |
      | 1.225  | 1.23           |
      | 1.2251 | 1.23           |
      | 1.2249 | 1.22           |
      | 1.22   | 1.22           |
step definitions (ruby):
require 'spec_helper'
# CheckOut and Rounding are the code under test; they are not shown in the example.
describe "Given that I have not checked anything out" do
  before :each do
    @check_out = CheckOut.new
  end
  [["A", 50], ["B", 30], ["C", 20], ["D", 15]].each do |item, unit_price|
    describe "When I check out an individual item" do
      it "The total price should be the unit price of that item" do
        @check_out.scan(item)
        expect(@check_out.total).to eq(unit_price)
      end
    end
  end
  [["AAA", 130],    # 3 for 130
   ["BB", 45],      # 2 for 45
   ["CCC", 60],
   ["DDD", 45],
   ["BBB", 75],     # (2 for 45) + 30
   ["BABBAA", 205], # order doesn't matter
   ["", 0]].each do |items, expected_total_price|
    describe "When I check out multiple items" do
      it "The total price should be the expected total price of those items" do
        individual_items = items.chars
        individual_items.each { |item| @check_out.scan(item) }
        expect(@check_out.total).to eq(expected_total_price)
      end
    end
  end
end
class RoundingTester
  include Rounding
end
[[1, 1],
 [1.225, 1.23],
 [1.2251, 1.23],
 [1.2249, 1.22],
 [1.22, 1.22]].each do |amount, rounded_amount|
  describe "When rounding an amount of money to the nearest penny" do
    it "Should round the amount using midpoint rounding" do
      expect(RoundingTester.new.round_money(amount)).to eq(rounded_amount)
    end
  end
end
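The specs above exercise a CheckOut class and a round_money helper that the example doesn't show. For comparison, a minimal Python sketch of what the code under test might look like, with pricing rules inferred from the Examples tables (A: 50, 3 for 130; B: 30, 2 for 45; C: 20; D: 15); PRICES, CheckOut and round_money are hypothetical names. Rounding goes through decimal with ROUND_HALF_UP because binary floats cannot represent most decimal fractions exactly (the Python docs note that round(2.675, 2) gives 2.67).

```python
from collections import Counter
from decimal import Decimal, ROUND_HALF_UP

# Pricing rules inferred from the Examples tables:
# item -> (unit price, (offer quantity, offer price) or None)
PRICES = {
    "A": (50, (3, 130)),
    "B": (30, (2, 45)),
    "C": (20, None),
    "D": (15, None),
}


class CheckOut:
    """Scan items one at a time, then ask for the total."""

    def __init__(self, prices=PRICES):
        self.prices = prices
        self.items = Counter()

    def scan(self, item):
        self.items[item] += 1

    @property
    def total(self):
        total = 0
        for item, count in self.items.items():
            unit_price, offer = self.prices[item]
            if offer:
                offer_qty, offer_price = offer
                bundles, count = divmod(count, offer_qty)  # full offers, leftovers
                total += bundles * offer_price
            total += count * unit_price
        return total


def round_money(amount):
    """Round to the nearest penny, with halves rounded up (1.225 -> 1.23)."""
    # str() keeps the decimal digits as written, e.g. "1.225".
    return float(Decimal(str(amount)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

Scan order does not matter because the items are counted before pricing, which is what the "BABBAA" example checks.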
I don't know a way of generating documentation based on this. It is not hopeless, though: for example, it is easy to map Feature: Checkout to the CheckOut class. Maybe something similar can be done at the method level. Another possible solution is to write helpers specific to this task.
A key idea here is understanding the difference between describing behaviour and testing. In this context, describing behaviour is:
more abstract
easy to read by a wider audience
more focused on what you are doing and why you are doing it
less focused on 'how' you are doing something
less exhaustive, we use examples, we don't cover everything
Testing tends to be:
precise
detailed
exhaustive
technical
When you use a BDD tool, e.g. Cucumber, to write unit tests, you tend to end up with tests that are:
verbose
full of technical detail which only a few people can appreciate
very expensive to run
difficult to maintain
So we have different tools and different techniques for different sorts of 'testing'. You get the most bang for your buck by using the right tool for the right job.
Whilst the idea of using one tool for all your testing seems very appealing, in the end it's about as sensible as using one tool to fix your car: try pumping up your tyres with a hammer!
BDD describes systems as a black box. If you have any words in there related to the implementation, it's no longer BDD. Inf3rno posted an example with the correct abstraction.
I always ask myself, if the user interface was gone, would I be able to keep the same feature files? If the use cases were to be carried out over voice commands, would the steps still make sense?
Another way to think about it: step statements should be facts about a system, not instructions on how to manually test it.
good step definition
Given An account "FOO" with username <username> and password <password>
bad step definition (only applies to ui)
Given I am at the login page
And I enter <username> as the username
And I enter <password> as the password
Full example
Given An account "FOO" with username <username> and password <password>
When Creating an account with username <username> and password BAR
Then An error "Username already in use" is returned
Note that I could implement this last example against the user interface or against the API, but I could also implement it over voice commands ;)
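To illustrate implementing the last example against an API rather than a UI, a minimal Python sketch in which each function corresponds to one step and none of them mentions pages or input fields. AccountService, UsernameInUse and the step functions are hypothetical stand-ins, not part of any BDD framework; a real setup would bind the steps through a tool such as Cucumber.

```python
class UsernameInUse(Exception):
    """Raised when an account with the requested username already exists."""


class AccountService:
    """Hypothetical in-memory service standing in for the API the steps drive."""

    def __init__(self):
        self._accounts = {}

    def create_account(self, username, password):
        if username in self._accounts:
            raise UsernameInUse("Username already in use")
        self._accounts[username] = password


# Given An account "FOO" with username <username> and password <password>
def given_account(service, username, password):
    service.create_account(username, password)


# When Creating an account with username <username> and password BAR
def when_creating_account(service, username, password):
    """Return the error message, or None if the account was created."""
    try:
        service.create_account(username, password)
        return None
    except UsernameInUse as err:
        return str(err)


def test_duplicate_username():
    service = AccountService()
    given_account(service, "alice", "secret")
    error = when_creating_account(service, "alice", "BAR")
    # Then An error "Username already in use" is returned
    assert error == "Username already in use"
```

Swapping AccountService for a UI driver or a voice-command client would leave the step wording untouched, which is the test the answer proposes.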

Web parts, dynamically created controls and eventhandlers

What is the best way to display, in a web part, dynamic tables where each cell can cause a postback to display a different set of data?
For example, imagine some financial data:
Table 1: Quarters in year
|          | Q1 | Q2 | Q3 | Q4 |
| Things 1 | 23 | 34 | 44 | 32 |
| Things 2 | 24 | 76 | 67 | 98 |
Clicking the value for Q2, Things 1 (34), should display a second table instead of Table 1:
Table 2: Weeks in Quarter
|             | W1  | W2 | W3 | W4  | W5  | W6 | W7 |
| SubThings 1 | 231 | 22 | 44 | 22  | 344 | 86 | 12 |
| SubThings 2 | 14  | 75 | 47 | 108 | 344 | 86 | 12 |
With my current approach, I create Table 1 in CreateChildControls, so the events for all the LinkButtons in the cells are wired up correctly.
However, on postback I need to create Table 1 in CreateChildControls again so that the event handlers are wired up, and because the events fire after CreateChildControls, I only find out that I need to change the table after CreateChildControls has run.
As a result, wherever I create Table 2 (since it's after CreateChildControls), I can't get its events wired up correctly.
Any thoughts?
Regards
Moo
Edit: Solved it.
What you need to do is check in OnPreRender whether any event handlers were called, set any flags you need, and then call this.CreateChildControls manually so the new table is created and everything is wired up correctly.
It looks like you are talking about a master/detail situation here. Could you not create two web parts and use a web part connection to communicate the required information from Table 1 in the first web part to Table 2 in the second web part?
J
Just add 2 tables to your web part, hide the second until the first has an element clicked, then set the second table's datasource in the OnClick event handler, set the second grid to visible and the first to hidden...
At Alex's suggestion, here is the answer:
The events need to be wired up before they are raised, so you need to create the same controls in CreateChildControls, let the event fire, and then set everything up again afterwards.
To do this, first build the controls in CreateChildControls exactly as on the previous request, then check in OnPreRender whether any event handlers have been called, set any flags you need, and then call this.CreateChildControls manually with the new setup information so the new table is created and everything is wired up correctly.
