Alertmanager to send aggregated/consolidated alert to Webhook - prometheus-alertmanager

We have some alerts posted by Prometheus to our Alertmanager based on metrics.
Currently the Alertmanager has the below firing alerts posted to our Slack integration:
AlertNo.1 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:K8308, timestamp:2021-08-11 00:46:18
AlertNo.2 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:D3010, timestamp:2021-08-11 00:46:18
AlertNo.3 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2008, timestamp:2021-08-11 00:46:18
AlertNo.4 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2005, timestamp:2021-08-11 00:46:18
AlertNo.5 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2202, timestamp:2021-08-11 00:46:18
Our Alertmanager notifies five different alerts because of the five unique device names posted by Prometheus. We want to know how we can post only one aggregated alert, with just the cluster_name or site_name label value, to a specific webhook based on the above firing data. Is there a way to post a single alert to a specific webhook based on a specific label, even though there are multiple alerts due to unique values in the other alerting labels?
Expected:
to slack:
<as-above-posted>
to a 3rd-party webhook:
<only-one-alert-as-below>
AlertNo.1 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, timestamp:2021-08-11 00:46:18

This can be achieved using the group_by parameter in conjunction with group_wait and group_interval in your alertmanager.yml.
From the docs:
# To aggregate by all possible labels use the special value '...' as the sole label name, for example:
# group_by: ['...']
# This effectively disables aggregation entirely, passing through all
# alerts as-is. This is unlikely to be what you want, unless you have
# a very low alert volume or your upstream notification system performs
# its own grouping.
[ group_by: '[' <labelname>, ... ']' ]
# How long to initially wait to send a notification for a group
# of alerts. Allows to wait for an inhibiting alert to arrive or collect
# more initial alerts for the same group. (Usually ~0s to few minutes.)
[ group_wait: <duration> | default = 30s ]
# How long to wait before sending a notification about new alerts that
# are added to a group of alerts for which an initial notification has
# already been sent. (Usually ~5m or more.)
[ group_interval: <duration> | default = 5m ]
In your case, try something like:
group_by: ['cluster_name', 'site_name']
group_wait: 10s
group_interval: 1m
group_by specifies the labels to aggregate alerts upon.
group_wait specifies how long to wait for further alerts with matching labels before sending the initial notification for a group. In your case, it looks like the alerts come in at the same time, so keeping this value low should be alright, but you can experiment with it to see what works best for you.
group_interval specifies how long to wait before sending a notification about new alerts added to a group for which an initial notification has already been sent.
Doing so will aggregate your alerts by the specified labels, cluster_name and site_name, resulting in one fired alert with the payload containing the list of alerts in the alerts section.
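Putting it together, a minimal routing sketch might look like the following (the receiver names, Slack channel, and webhook URL are placeholders, not taken from the question):

```yaml
route:
  receiver: slack-notifications            # default receiver
  routes:
    - receiver: slack-notifications        # per-device alerts, as posted today
      group_by: ['alertname', 'device']
      continue: true                       # keep evaluating, so the next route also matches
    - receiver: third-party-webhook        # one aggregated notification per cluster/site
      group_by: ['cluster_name', 'site_name']
      group_wait: 10s
      group_interval: 1m

receivers:
  - name: slack-notifications
    slack_configs:
      - channel: '#alerts'                 # placeholder channel
  - name: third-party-webhook
    webhook_configs:
      - url: 'https://example.com/alert-hook'   # placeholder URL
```

With this routing, the webhook receives a single notification whose payload lists all five device alerts in its alerts array, grouped under the common cluster_name/site_name labels.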

Use One TradingView Strategy for Multiple Coins connected to Bot

I'm a newbie to TradingView and have been learning a lot. I'm developing a strategy with backtesting using Pine Script; however, what confuses me is how to use the same strategy for multiple coins.
The strategy is mainly developed for Binance Futures trading; I'm not sure if it's possible to apply it to other exchanges.
So I want to set up an alerting system for multiple coins connected to a 3Commas bot or Finandy to execute a trade based on the setup parameters.
My questions are.
If I want to use certain candle types like Heikin Ashi, should that be included in the code, or do I just select it on the chart and it will be read by the strategy automatically?
Should I include the coins in the script, or should I select them one by one on the chart and then set up an alert per coin?
Should I create one alert per coin per chart, or should I have multiple charts, one per coin, to set up the alerts?
Should the timeframe also be defined in the code, or can the chart do the job?
Sorry for many questions, I'm trying to understand the process well.
For multiple coins, the easiest way is to attach your strategy to each and every coin on TradingView you want to trade. This way you can backtest each on its respective chart.
If you create a strategy for, say, BINANCE:BTCUSDT and think of using it on a different exchange, you can do that, but first I suggest testing it on BINANCE:BTCPERP and seeing for yourself how the same strategy can show a wildly different result (even though BTCUSDT and BTCPERP should be moving the same).
For a complex solution you can create a single script that uses multiple securities, but you won't be able to backtest that with a simple approach; you would have to write your own gain/loss calculator, and you are not there yet.
I was going down the same road; my suggestions are:
create an input for the coin you want to trade (that will go into an input variable)
abstract the alert message away from the strategy.entry() command, that is, construct the alert message in a way that lets you replace values in it with variables (like the coin selected above)
3Commas needs a Bot ID to start/stop a bot; abstract that away as well, and you will have a good boilerplate you can reuse many times
as a good practice (stolen from Kubernetes), besides the human-readable name I give a five-letter identifier to every one of my bots, for easy recognition
A few examples. The below will create a selector for a coin and the Bot ID that is used to trade that coin. The names like 'BIN:GMTPERP - osakr' are entirely my making; they act as a key (for a key/value pair):
symbol_choser = input.string(title='Ticker symbol', defval='BTC-PERP - aktqw', options=['FTX_Multi', 'FTX:MOVE Single - pdikr', 'BIN:GMTPERP - osakr', 'BIN:GMTPERP - rkwif', 'BTC-PERP - aktqw', 'BTC-PERP - ikrtl', 'BTC-PERP - cbdwe', 'BTC-PERP', 'BAL-PERP', 'RUNE-PERP', 'Paper Multi - fjeur', 'Paper Single - ruafh'], group = 'Bot settings')
exchange_symbol = switch symbol_choser // if you use Single Pair bots on 3Commas, the value should be an empty string
    'BIN:GMTPERP - osakr' => 'USDT_GMTUSDT'
    'BTC-PERP - cbdwe' => 'USD_BTC-PERP'
    'Paper Multi - fjeur' => 'USDT_ADADOWN'
bot_id = switch symbol_choser
    'BIN:GMTPERP - osakr' => '8941983'
    'BTC-PERP - cbdwe' => '8669136'
    'Paper Multi - fjeur' => '8246237'
And now you can combine the above parts into two Alerts, for starting/stopping the bot:
alertMessage_Enter = '{"message_type": "bot", "bot_id": ' + bot_id + ', "email_token": "12345678-4321-abcd-xyzq-132435465768", "delay_seconds": 0, "pair": "' + exchange_symbol + '"}'
alertMessage_Exit = '{"action": "close_at_market_price", "message_type": "bot", "bot_id": ' + bot_id + ', "email_token": "12345678-4321-abcd-xyzq-132435465768", "delay_seconds": 0, "pair": "' + exchange_symbol + '"}'
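The concatenation above can be sanity-checked outside Pine Script. The Python sketch below (using the example bot ID, pair, and dummy token from the snippet) just verifies that the assembled string parses as the JSON a bot webhook expects:

```python
import json

def build_enter_message(bot_id: str, exchange_symbol: str, email_token: str) -> str:
    # Mirrors the Pine Script string concatenation for the start-bot alert
    return ('{"message_type": "bot", "bot_id": ' + bot_id +
            ', "email_token": "' + email_token +
            '", "delay_seconds": 0, "pair": "' + exchange_symbol + '"}')

msg = build_enter_message('8941983', 'USDT_GMTUSDT',
                          '12345678-4321-abcd-xyzq-132435465768')
payload = json.loads(msg)   # raises ValueError if the concatenation broke the JSON
print(payload['pair'])      # → USDT_GMTUSDT
```

Note that bot_id is concatenated without quotes, so it becomes a JSON number; the pair and token become JSON strings.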
exchange_symbol is the proper exchange symbol you need to provide to your bot; you can get help on 3Commas' bot page (they have pre-crafted the HTTP requests you need to use for certain actions).
bot_id is the ID of your Bot, that is straightforward.
The above solution does not handle Single-coin bots; their trigger message has a different structure.
Whenever you can, use Multi-coin bots, as they can act as a Single bot, with two exceptions:
if you have a long-spanning strategy and, when you start a bot, you should already be in a trade: you can manually start a Single bot, but you cannot start a Multi-coin bot (as there is no way to provide the coin info on which to start the trade)
if you are trading a derivative like FTX's MOVE contracts and your script is attached to the underlying BTC futures: MOVE contracts change names every day (the date is in their name, like BTC-MOVE-0523), so you would need to delete the alert, update it, and reapply it every day. Instead, if your script is on BTC-PERP, you can use a Single-coin bot, which does not expect a coin name in the alert message and so will start/stop the bot on whatever coin it is connected to; then you only need to change the coin name every day in the bot settings and never touch the alert.
To summarize your questions:
Do not include the chart type in the code (that is not even embeddable data); just apply your code to whatever chart you want to use. Hint: never use Heikin Ashi for trading. You can, but you will pay for it dearly (everyone tries, even against the warnings, no worries).
Set them up one by one, so you can backtest each of them.
No, set the timeframe on the chart. Later, when you are more experienced, you will be able to abstract the current timeframe away (whatever it is) and write code that is timeframe-agnostic. But that's hard and makes your code less readable.

How to create a power bi measure which looks at distinct rows and then filters

I'm having trouble getting my Power BI measure to work.
I will say the report has a number of filters set for the page, plus [Call Direction] (same table) and [Region] (another table).
I'm looking at a list of calls coming into the company via Cisco UCCX. It has a table which logs every call that comes in; the [Node ID - Session ID - Sequence No] can be listed more than once if the call is not answered by an agent the first time.
I can easily create a measure which distinct-counts [Node ID - Session ID - Sequence No] to give me one entry per call:
measure = CALCULATE(DISTINCTCOUNT('contactcalldetail'[Node ID - Session ID - Sequence No]))
That works fine and gives the expected number; however, I wish to only see those calls that were abandoned. If I try to break those out, either as below or using the filter option:
abandoned calls = CALCULATE(DISTINCTCOUNT('contactcalldetail'[Node ID - Session ID - Sequence No]), 'contactcalldetail'[CallOutcome]=1 || 'contactcalldetail'[CallOutcome]=4)+0
I then get a count of all abandoned call rows, not just one per [Node ID - Session ID - Sequence No].
Is anyone able to help me as to where I'm going wrong?
I managed to fix this myself. I did it by counting the distinct values in the sub-table that has a row for each phone call that hit a CSQ:
Abandoned Calls = CALCULATE(
    DISTINCTCOUNT('contactqueuedetail'[Node ID - Session ID - Sequence No]) + 0,
    FILTER('contactcalldetail', [CallOutcome] = 1 || [CallOutcome] = 4)
)
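The distinction between the two counts can be illustrated in plain Python (the rows below are made up, not real UCCX data): filtering to abandoned outcomes first and then counting distinct call keys gives one count per call, however many rows the call generated:

```python
# (call_key, call_outcome) pairs — a call retried by the router appears twice
calldetail = [
    ("N1-S1-1", 1), ("N1-S1-1", 4),   # one abandoned call, logged twice
    ("N1-S2-1", 2),                   # answered call
    ("N1-S3-1", 4),                   # another abandoned call
]

abandoned_rows = [key for key, outcome in calldetail if outcome in (1, 4)]
abandoned_calls = set(abandoned_rows)  # distinct keys, like DISTINCTCOUNT

print(len(abandoned_rows))   # → 3  (rows: the unwanted count)
print(len(abandoned_calls))  # → 2  (calls: the desired count)
```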

Creating an alert for long running pipelines

I currently have an alert set up for Data Factory that sends an email if a pipeline runs longer than 120 minutes, following this tutorial: https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/. So when a pipeline does in fact run longer than the expected time, I do receive an alert; however, I am also getting additional, unexpected alerts.
My query looks like:
ADFPipelineRun
| where Status =="InProgress" // Pipeline is in progress
| where RunId !in (( ADFPipelineRun | where Status in ("Succeeded","Failed","Cancelled") | project RunId ) ) // Subquery, pipeline hasn't finished
| where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes
I received an alert email on September 28th saying a pipeline was running longer than 120 minutes, but when I try to find that pipeline in the Azure Data Factory pipeline runs, nothing shows up. In the alert email there is a button that says "View the alert in Azure monitor", and when I go there I can press "View Query Results" above the shown query. There I can re-enter the query above, filter the date to show all pipelines running longer than 120 minutes since September 27th, and it returns 3 pipelines.
Something I noticed about these pipelines is the End time column.
I'm thinking that at some point the UTC time is not properly configured, and maybe for that reason the alert is triggered? Is there something I am doing wrong, or a better way to do this to avoid a bunch of false alarms?
To create preemptive warnings for long-running jobs:
Create the activity.
Click on a blank space.
Follow the path: Settings > Elapsed time metric.
Refer to Operationalize Data Pipelines - Azure Data Factory.
I'm not sure if you're seeing false alerts. What you've shown here looks like the correct behavior.
You need to keep in mind:
Duration threshold should be offset by the time it takes for the logs to appear in Azure Monitor.
The email alert takes you to the query that triggered the event. Your query is only showing "InProgress" statuses, and so the End property is not set/updated. You'll need to extend your query to look at one of the other statuses to see the actual duration.
Run another query with the RunId of the suspect runs to inspect the durations.
ADFPipelineRun
| where RunId == 'bf461c8b-0b1e-43c4-9cdf-7d9f7ccc6f06'
| distinct TimeGenerated, OperationName, RunId, Start, End, Status
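As a sketch, an extended query over completed runs might look like the following (the column names are as I recall them from the ADFPipelineRun schema; verify against your workspace):

```kusto
ADFPipelineRun
| where Status in ("Succeeded", "Failed", "Cancelled")   // terminal statuses carry a real End time
| extend DurationMinutes = datetime_diff('minute', End, Start)
| where DurationMinutes > 120
| project RunId, PipelineName, Start, End, DurationMinutes
```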

Description for stopped containers

I added a rule into the rules.yml in order to get an alert whenever a container stops.
In order to get an alert for each stopped container with the suffix "dev-23", I used this rule:
- alert: ContainerKilled
  expr: absent(container_start_time_seconds{name=~".*dev-23"})
  for: 0m
  labels:
    severity: 'critical'
  annotations:
    summary: 'Container killed (instance {{ $labels.instance }})'
    description: 'A container has disappeared\n VALUE = {{ $value }}\n LABELS: {{ $labels }}'
This indeed works, and I get an alert whenever a container whose name ends with "dev-23" stops. However, the summary and description of the received alert do not tell me the name of the stopped container.
In the alert I get this description:
description = A container has disappeared\n VALUE = 1\n LABELS: map[]
summary = Container killed (instance )
What should I use in order to get the exact name of the stopped container?
The issue is that the metric created by the absent() function does not have any labels, and its value is always 1 (if the queried metric is not present).
You can leave it like that, without any labels, if you just want to check that there is no matching container running at all, without any details about its previous state, and that is enough information to detect the issue.
Or you can use the up metric, which has the labels but can potentially disappear, e.g. when you have dynamic service discovery.
In general it is good practice to have an alert both on up{...} == 0 and on absent({...}).
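A sketch of that pairing in rules.yml (the job matcher and alert names below are illustrative placeholders, not taken from the question):

```yaml
groups:
  - name: container-alerts
    rules:
      # Fires per target, with full labels, while the series still exists
      - alert: ContainerDown
        expr: up{job="cadvisor"} == 0          # placeholder job label
        for: 1m
        labels:
          severity: 'critical'
        annotations:
          summary: 'Container down (instance {{ $labels.instance }})'
      # Catches the case where the series vanished entirely (no labels available)
      - alert: ContainerMetricAbsent
        expr: absent(container_start_time_seconds{name=~".*dev-23"})
        for: 0m
        labels:
          severity: 'critical'
        annotations:
          summary: 'No container matching .*dev-23 is reporting metrics'
```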

Azure Stream Analytics job triggers False Positives missing assets on job start

On starting my Azure Stream Analytics (ASA) job I get several false positives (FP), and I want to know what causes this.
I am trying to implement asset tracking in ASA, as discussed in another question. My specific use case is that I want to trigger events when an asset has not sent a signal in the last 70 minutes. This works fine while the ASA job is running, but it triggers false positives on starting the job.
For example, when starting the ASA job at 2017-11-07T09:30:00Z, the job gives an entry with MostRecentSignalInWindow: 1510042968 (= 2017-11-07T08:22:48Z) for name 'A', while I am sure that the event hub contains another event for name 'A' at '2017-11-07T08:52:49Z' and one at '2017-11-07T09:22:49Z'.
Some events arrive late due to the event ordering policy:
Late: 5 seconds
Out-of-order: 5 seconds
Action: adjust
I use the below query:
WITH
Missing AS (
    SELECT
        PreviousSignal.name,
        PreviousSignal.time
    FROM
        [signal-eventhub] PreviousSignal
    TIMESTAMP BY
        time
    LEFT OUTER JOIN
        [signal-eventhub] CurrentSignal
    TIMESTAMP BY
        time
    ON
        PreviousSignal.name = CurrentSignal.name
        AND DATEDIFF(second, PreviousSignal, CurrentSignal) BETWEEN 1 AND 4200
    WHERE CurrentSignal.name IS NULL
),
EventsInWindow AS (
    SELECT
        name,
        max(DATEDIFF(second, '1970-01-01 00:00:00Z', time)) MostRecentSignalInWindow
    FROM
        Missing
    GROUP BY
        name,
        TumblingWindow(minute, 1)
)
SELECT name, MostRecentSignalInWindow FROM EventsInWindow -- output step; sink omitted here
For anyone reading this: this was a confirmed bug in Azure Stream Analytics and has since been resolved.
