I have the two questions below:
Is there a specific key in the Logstash API response which can identify that an (incremental) pipeline completed successfully? I checked the pipeline stats, which give event counts (in, filtered, out), but did not find anything which clearly says that a pipeline completed successfully.
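For reference, the pipeline stats I checked come from the node stats API; a sketch of the call (9600 is the default monitoring port):

curl -XGET 'http://localhost:9600/_node/stats/pipelines?pretty'

The response holds per-pipeline counters such as events.in, events.filtered, and events.out, but no field that flags completion.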
In the Logstash logs for a particular run I get the result below; what does (seconds) denote here?
[2018-07-14T01:00:05,117][INFO ][logstash.inputs.jdbc ] (4.753178s) SELECT a.*,....
[2018-07-14T02:00:45,221][INFO ][logstash.inputs.jdbc ] (42.543719s) SELECT pkey_ AS job_id,.....
Related
I'm inserting data into Elasticsearch (index A) every minute for the health check of some endpoints. I want to read index A every minute for the last events it received, and if the state of any endpoint changes (from healthy to unhealthy or from unhealthy to healthy), insert that event into index B.
How would I achieve that? If possible, can someone provide sample code, please? I tried the elasticsearch filter plugin but couldn't get the desired result.
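One rough sketch of an approach (the index names index-a/index-b, the state field, and the endpoint field are all assumptions; plugin options may differ between versions) is a scheduled elasticsearch input combined with the elasticsearch filter to look up the previous state:

input {
  # Poll index A every minute for the health-check events of the last minute
  elasticsearch {
    hosts    => ["localhost:9200"]
    index    => "index-a"
    query    => '{ "query": { "range": { "@timestamp": { "gte": "now-1m" } } } }'
    schedule => "* * * * *"
  }
}

filter {
  # Fetch the most recent earlier event for the same endpoint from index A
  elasticsearch {
    hosts  => ["localhost:9200"]
    index  => "index-a"
    query  => "endpoint:%{[endpoint]}"
    sort   => "@timestamp:desc"
    fields => { "state" => "previous_state" }
  }
}

output {
  # Only forward events whose state differs from the previous one
  if [state] != [previous_state] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "index-b"
    }
  }
}

Note that the lookup may match the event itself once it is indexed, so the query or sort may need tightening in practice.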
I currently have an alert set up for Data Factory that sends an email if a pipeline runs longer than 120 minutes, following this tutorial: https://www.techtalkcorner.com/long-running-azure-data-factory-pipelines/. So when a pipeline does in fact run longer than the expected time, I do receive an alert; however, I am also getting additional, unexpected alerts.
My query looks like:
ADFPipelineRun
| where Status =="InProgress" // Pipeline is in progress
| where RunId !in (( ADFPipelineRun | where Status in ("Succeeded","Failed","Cancelled") | project RunId ) ) // Subquery, pipeline hasn't finished
| where datetime_diff('minute', now(), Start) > 120 // It has been running for more than 120 minutes
I received an alert email on September 28th saying a pipeline was running longer than 120 minutes, but when I try to find that pipeline in the Azure Data Factory pipeline runs, nothing shows up. The alert email has a button that says "View the alert in Azure monitor", and from there I can press "View Query Results" above the shown query. Here I can re-enter the query above, filter the date to show all pipelines running longer than 120 minutes since September 27th, and it returns 3 pipelines.
Something I noticed about these pipelines is the end time column.
I'm thinking that at some point the UTC time is not properly configured, and maybe that is why the alert is triggered? Is there something I am doing wrong, or a better way to do this that avoids a bunch of false alarms?
To create preemptive warnings for long-running jobs:
Create the activity.
Click on the blank space.
Follow the path: Settings > Elapsed time metric
Refer to Operationalize Data Pipelines - Azure Data Factory
I'm not sure if you're seeing false alerts. What you've shown here looks like the correct behavior.
You need to keep in mind:
Duration threshold should be offset by the time it takes for the logs to appear in Azure Monitor.
The email alert takes you to the query that triggered the event. Your query only shows "InProgress" statuses, so the End property is not set/updated. You'll need to extend your query to look at one of the other statuses to see the actual duration.
Run another query with the RunId of the suspect runs to inspect the durations, for example:
ADFPipelineRun
| where RunId == 'bf461c8b-0b1e-43c4-9cdf-7d9f7ccc6f06'
| distinct TimeGenerated, OperationName, RunId, Start, End, Status
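A possible way to tighten the original alert query (a sketch, not part of the answer above; it assumes the same ADFPipelineRun table) is to keep only the latest record per run, so stale "InProgress" rows are superseded by the final status:

ADFPipelineRun
| summarize arg_max(TimeGenerated, Status, Start) by RunId // keep the newest log record per run
| where Status == "InProgress" // still running according to that record
| where datetime_diff('minute', now(), Start) > 120 // running for more than 120 minutes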
I'm triggering an Azure Logic App from an https webhook for a docker image in Azure Container Registry.
The workflow is roughly:
When a HTTP request is received
Queue a new build
Delay until
FinishTime of Queue a new build
See: Workflow image
The Delay until action doesn't work, in that the queried FinishTime is 0001-01-01T00:00:00.
It complains about the wrong format, so I manually added a Z after the FinishTime keyword.
Now the time stamp is in the right format, however, the timestamp 0001-01-01T00:00:00Z obviously doesn't make sense and subsequent steps are executed without delay.
Is there anything I am missing?
edit: Queue a new build queues an Azure Pipelines build, i.e. the FinishTime property comes from the pipeline.
You need to set a timestamp in the future; the timestamp 0001-01-01T00:00:00Z you passed to the "Delay until" action is not a future time. If you set a timestamp such as 2020-04-02T07:30:00Z, the "Delay until" action will take effect.
Update:
I don't think "Delay until" can do what you expect, but maybe you can refer to the operations below. Just add a "Condition" action to judge whether the FinishTime is greater than the current time.
The expression in the "Condition" is:
sub(ticks(variables('FinishTime')), ticks(utcNow()))
In short: if the FinishTime is greater than the current time, do the "Delay until" action; if the FinishTime is less than the current time, do whatever else you want. (By the way, you need to pay attention to the time zone of your timestamps; you may need to convert them all to UTC.)
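For instance, the full condition could compare that difference to zero (a sketch; the FinishTime variable name is carried over from above):

@greater(sub(ticks(variables('FinishTime')), ticks(utcNow())), 0)

If this evaluates to true, the FinishTime is still in the future and the "Delay until" branch can run.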
I've been in touch with an Azure support engineer, who confirmed that the Delay until action should work as I intended to use it, but that the FinishTime property will not hold a value that I can use.
In the meantime, I have found a workaround, where I'm using some logic and quite a few additional steps. Inconvenient but at least it does what I want.
Here are the most important steps that are executed after the workflow gets triggered from a webhook (docker base image update in Azure Container Registry).
Essentially, I'm initializing the following variables and queuing a new build:
buildStatusCompleted: String value containing the target value completed
jarsBuildStatus: String value containing the initial value notStarted
jarsBuildResult: String value containing the default value failed
Then, I'm using an Until action to monitor when the jarsBuildStatus value switches to completed.
In the Until action, I'm repeating the following steps until jarsBuildStatus changes its value to buildStatusCompleted:
Delay for 15 seconds
HTTP request to Azure DevOps build, authenticating with personal access token
Parse JSON body of previous raw HTTP output for status and result keywords
Set jarsBuildStatus = status
After breaking out of the Until action (loop), the jarsBuildResult is set to the parsed result.
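The HTTP request in the loop is roughly this call to the Azure DevOps REST API (organization, project, and build id are placeholders):

GET https://dev.azure.com/{organization}/{project}/_apis/build/builds/{buildId}?api-version=6.0

Its JSON response carries a status field (notStarted, inProgress, completed) and, once the build has finished, a result field (succeeded, partiallySucceeded, failed, canceled), which feed the variables above.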
All these steps are part of a larger build orchestration workflow, where I'm repeating the given steps multiple times for several different Azure DevOps build pipelines.
The final action in the workflow is sending all the status, result and other relevant data as a build summary to Azure DevOps.
To me, this is only a workaround and I'll leave this question open to see if others have suggestions as well or in case the Azure support engineers can give more insight into the Delay until action.
Here's an image of the final workflow (at least, the part where I implemented the Delay until action):
edit: Turns out I can simplify the workflow, because there's a dedicated Azure DevOps action in Logic Apps called Send an HTTP request to Azure DevOps, which omits the need for manual authentication (the Azure support engineer pointed this out).
The workflow now looks like this:
That is, I can query the build status directly and set the jarsBuildStatus as
@{body('Send_an_HTTP_request_to_Azure_DevOps:_jar''s')['status']}
The code snippet above is automagically converted to a value for the Set variable action. Thus, no need to use an additional Parse JSON action.
Is there a way to reference the output of an executed pipeline in the activity "Execute pipeline"?
I.e.: the master pipeline executes 2 pipelines in sequence. The first pipeline generates its own run_id that needs to be forwarded as a parameter to the second pipeline.
I've read the documentation and checked that the master pipeline logs the output of the first pipeline, but it looks like this is not directly possible?
Until now we've used only 2 pipelines without a master pipeline, but we want to re-use the logic more. Currently we have 1 pipeline that calls the next pipeline and forwards the run_id.
The Execute Pipeline activity currently cannot pass anything from its insides to its output. You can only get the runId or name.
For some weird reason, the output of Execute Pipeline is returned not as a JSON object but as a string. So if you try to select a property of output like this: @activity('ExecutePipelineActivityName').output.something, then you get this error:
Property selection is not supported on values of type 'String'
I found that I had to use the following to get the run ID:
@json(activity('ExecutePipelineActivityName').output).pipelineRunId
The execute pipeline activity is just another activity with outputs that can be captured by other activities. https://learn.microsoft.com/en-us/azure/data-factory/control-flow-execute-pipeline-activity#type-properties
If you want to use the runId of the pipeline executed previously, it would look like this:
@activity('ExecutePipelineActivityName').output.pipeline.runId
Hope this helped!
On starting my Azure Stream Analytics (ASA) job I get several false positives (FP) and I want to know what causes this.
I am trying to implement asset tracking in ASA, as discussed in another question. My specific use case is that I want to trigger events when an asset has not sent a signal in the last 70 minutes. This works fine while the ASA job is running, but it triggers false positives on starting the job.
For example, when starting the ASA job at 2017-11-07T09:30:00Z, the ASA job gives an entry with MostRecentSignalInWindow: 1510042968 (= 2017-11-07T08:22:48Z) for name 'A', while I am sure that there is another event for name 'A' with time '2017-11-07T08:52:49Z' and one at '2017-11-07T09:22:49Z' in the event hub.
Some events arrive late due to the event ordering policy:
Late: 5 seconds
Out-of-order: 5 seconds
Action: adjust
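For reference, in ARM-template terms that policy corresponds roughly to these streaming-job properties (names from the Microsoft.StreamAnalytics schema):

"properties": {
  "eventsLateArrivalMaxDelayInSeconds": 5,
  "eventsOutOfOrderMaxDelayInSeconds": 5,
  "eventsOutOfOrderPolicy": "adjust"
}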
I use the below query:
WITH
Missing AS (
    SELECT
        PreviousSignal.name,
        PreviousSignal.time
    FROM
        [signal-eventhub] PreviousSignal
    TIMESTAMP BY
        time
    LEFT OUTER JOIN
        [signal-eventhub] CurrentSignal
    TIMESTAMP BY
        time
    ON
        PreviousSignal.name = CurrentSignal.name
        AND DATEDIFF(second, PreviousSignal, CurrentSignal) BETWEEN 1 AND 4200
    WHERE CurrentSignal.name IS NULL
),
EventsInWindow AS (
    SELECT
        name,
        MAX(DATEDIFF(second, '1970-01-01 00:00:00Z', time)) AS MostRecentSignalInWindow
    FROM
        Missing
    GROUP BY
        name,
        TumblingWindow(minute, 1)
)
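The WITH clause is then presumably consumed by a final statement along these lines (the output name [missing-signal-output] is hypothetical):

SELECT
    name,
    MostRecentSignalInWindow
INTO
    [missing-signal-output]
FROM
    EventsInWindow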
For anyone reading this, this was a confirmed bug in Azure Stream Analytics and has now been resolved.