I have a data pipeline running on GCP and I am wondering about the best ways to handle errors. The pipeline looks like this:
read_from_pubsub --> business_logic_ParDo() --> write_to_bigquery
While testing, I noticed the ParDo getting stuck. I was eventually able to resolve the issue, but while it was stuck it blocked the whole pipeline. What is the best approach to handle this?
What should my ParDo function do if the business logic fails? I don't want to write partial data to BigQuery.
I can't think of any other error scenarios.
I would recommend the dead-letter pattern for handling unrecoverable errors in the business logic. As for aborting stuck elements, you could try something like func-timeout, but that could be expensive to apply to every element.
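For illustration, here is a minimal sketch of that dead-letter pattern using a multi-output ParDo in Beam's Java SDK (the Python SDK has the same mechanism via pvalue.TaggedOutput); the transform body, tag names, and error routing are placeholders for your own business logic:

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

public class BusinessLogicFn extends DoFn<String, TableRow> {
    // One tag per branch: rows that are safe to load, and raw payloads that failed.
    public static final TupleTag<TableRow> MAIN = new TupleTag<TableRow>() {};
    public static final TupleTag<String> DEAD_LETTER = new TupleTag<String>() {};

    @ProcessElement
    public void processElement(ProcessContext c) {
        try {
            c.output(applyBusinessLogic(c.element()));
        } catch (Exception e) {
            // Nothing partial reaches BigQuery; the raw message goes to the side output.
            c.output(DEAD_LETTER, c.element());
        }
    }

    private TableRow applyBusinessLogic(String message) {
        // Placeholder for the real transformation; throw on unrecoverable input.
        return new TableRow().set("payload", message);
    }

    // Wiring: split the ParDo output into a main and a dead-letter PCollection.
    public static PCollectionTuple split(PCollection<String> messages) {
        return messages.apply("BusinessLogic",
                ParDo.of(new BusinessLogicFn())
                     .withOutputTags(MAIN, TupleTagList.of(DEAD_LETTER)));
    }
}

The main branch (results.get(BusinessLogicFn.MAIN)) feeds write_to_bigquery as before, while the dead-letter branch can be written to a separate table, a GCS bucket, or an "errors" Pub/Sub topic for later inspection, so a bad element never causes a partial BigQuery write.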
I've been tasked with reducing the monitoring overhead of a data lake (~80 TiB) with multiple ADF pipelines running (~2k daily). Currently we log failed pipeline runs by querying ADFPipelineRun. I do not own these pipelines, nor do I know the inner workings of existing and future ones, so I cannot make assumptions about how to filter them with custom logic in my queries. The team is currently experiencing alert fatigue with these, as most failed pipeline runs are resolved by their reruns.
How can I filter these failures so they don't show up when a rerun succeeds?
The logs expose a few IDs that initially look interesting, like Id, PipelineRunId, CorrelationId, and RunId, but none of them links a failed run to its successful rerun.
The logs do, however, show an interesting column, UserProperties, which apparently can be populated dynamically during the pipeline run. There may be a solution to be found here, but it would require time and friction to reconfigure all existing factories.
Are there any obvious solutions I have overlooked here? Preferably Azure-native ones. I can see that reruns and failures are linked inside ADF Studio, but I cannot see a way to query that relationship externally.
After a discussion with the owner of the ADF pipelines, we realized that their naming convention would allow me to filter out the noisy failing pipelines that later succeed. It's not a universal solution, but it will work for us because the naming convention is enforced across the business unit I am supporting.
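For what it's worth, here is a rough sketch of the kind of query this turned into, wrapped in the Azure Monitor Query client for Java. The PL_RERUN_ prefix, the workspace ID placeholder, and the projected columns are made-up stand-ins for our actual convention, so adjust them to your own logs:

import com.azure.identity.DefaultAzureCredentialBuilder;
import com.azure.monitor.query.LogsQueryClient;
import com.azure.monitor.query.LogsQueryClientBuilder;
import com.azure.monitor.query.models.LogsQueryResult;
import com.azure.monitor.query.models.LogsTableRow;
import com.azure.monitor.query.models.QueryTimeInterval;

import java.time.Duration;

public class FailedRunReport {
    public static void main(String[] args) {
        // Keep only failed runs whose pipeline name does NOT match the
        // "will be rerun automatically" naming convention (prefix is hypothetical).
        String kql = "ADFPipelineRun"
                + " | where Status == 'Failed'"
                + " | where PipelineName !startswith 'PL_RERUN_'"
                + " | project TimeGenerated, PipelineName, RunId, Status";

        LogsQueryClient client = new LogsQueryClientBuilder()
                .credential(new DefaultAzureCredentialBuilder().build())
                .buildClient();

        LogsQueryResult result = client.queryWorkspace(
                "<log-analytics-workspace-id>", kql, new QueryTimeInterval(Duration.ofDays(1)));

        for (LogsTableRow row : result.getTable().getRows()) {
            System.out.println(row.getRow());
        }
    }
}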
I am trying to automate saving data from email attachments using Azure Logic Apps, but I am getting the above error. Could anyone please help me solve this?
The following suggestions are based on our research; please consider whether they are useful for your situation.
There might be a situation where multiple Logic App instances run at the same time, in parallel. Make sure you enable Concurrency Control.
If you are still seeing the same error after that, you can do the error handling through Azure Monitor, which catches most issues such as malformed data.
You can also check "Handling error codes in your application", which lists some common errors, and then troubleshoot and diagnose workflow failures accordingly.
I have a v1 Azure Function that is triggered by a message being written to the Azure Storage message queue.
The Azure Function needs to perform multiple updates to SharePoint Online. Occasionally these operations fail. This results in the message being returned to the queue and being reprocessed.
When I developed the function, I didn't consider that it might partially complete and then restart. I've done a little research and it sounds like I need to modify it to be re-entrant.
Is there a design pattern that I should follow to cater for this without having to add a lot of checks to determine whether an operation has already been carried out by a previous execution? Alternatively, is there any Azure functionality that can help (beyond the existing message retries and poison queue)?
It sounds like you will need to do some re-engineering. Our team had a similar issue and wrote a home-grown solution years ago. But we eventually scrapped our solution and went with Azure Durable Functions.
Not gonna lie - this framework has some complexity and it took me a bit to wrap my head around it. Check out the function chaining pattern.
We have processing that requires multiple steps that must all complete. We span multiple data stores (Cosmos DB, Azure SQL, Blob Storage, etc.), so there's no support for distributed transactions across the different PaaS offerings. Durable Functions lets you break your process up into discrete steps: if a step fails, the orchestrator re-runs that step based on a retry policy.
So in a nutshell, we use Durable Task activity functions to attempt each step. If a step fails with what we think is a transient error, we retry; if it's an unrecoverable error, we don't.
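To make the shape of this concrete, here is a minimal sketch using Durable Functions for Java. The function names, input type, and retry numbers are invented for illustration, the queue-triggered starter that schedules the orchestration is omitted, and you should double-check the exact callActivity overloads against the durabletask library version you are on:

import com.microsoft.azure.functions.annotation.FunctionName;
import com.microsoft.durabletask.RetryPolicy;
import com.microsoft.durabletask.TaskOptions;
import com.microsoft.durabletask.TaskOrchestrationContext;
import com.microsoft.durabletask.azurefunctions.DurableActivityTrigger;
import com.microsoft.durabletask.azurefunctions.DurableOrchestrationTrigger;

import java.time.Duration;

public class SharePointUpdateOrchestration {

    @FunctionName("UpdateSharePointOrchestrator")
    public void runOrchestrator(
            @DurableOrchestrationTrigger(name = "ctx") TaskOrchestrationContext ctx) {
        // Retry a failed step up to 3 times, starting 5 seconds apart.
        TaskOptions retry = new TaskOptions(new RetryPolicy(3, Duration.ofSeconds(5)));

        // Function chaining: each step runs after the previous one succeeds, and a
        // replayed orchestration does not re-execute steps that already completed.
        String itemId = ctx.getInput(String.class);
        ctx.callActivity("UpdateListItem", itemId, retry).await();
        ctx.callActivity("UploadDocument", itemId, retry).await();
        ctx.callActivity("NotifyOwner", itemId, retry).await();
    }

    @FunctionName("UpdateListItem")
    public String updateListItem(@DurableActivityTrigger(name = "itemId") String itemId) {
        // Do one SharePoint Online update here; throw for errors you consider transient
        // so the retry policy kicks in, and handle unrecoverable ones without throwing.
        return itemId;
    }
}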
I am learning and evaluating Spark and Flink before picking one of them for a project I've been given.
In my evaluation I came up with the following simple task, which I can't figure out how to implement in either framework.
Let's say that:
1. I have a stream of events that simply indicate that some item has changed somewhere in a database.
2. For each of those events, I need to query the DB to get the new version of the item,
3. apply some transformation,
4. and connect to another DB to write the result.
My question is as follows:
Using Flink or Spark, how can one make sure that the calls to the databases are handled asynchronously, to avoid thread starvation?
I come from Scala/Akka, where we typically avoid making blocking calls and use futures all the way in this kind of situation. Akka Streams allows that fine-grained level of control for stream processing, for instance in "Integrating stream with external service". This avoids thread starvation: while I wait on my I/O operation, the thread can be used for something else.
In short, I don't see how to work with futures in either framework, but I believe this can somehow be reproduced in both of them.
Can anyone please explain how this is supposed to be handled in Flink or Spark?
If it is not supported out of the box, does anyone have experience with getting it incorporated somehow?
Since version 1.2.0, Flink has provided the Async I/O API to achieve exactly this.
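Here is a minimal, self-contained sketch of the Async I/O API with the DataStream Java API on a recent Flink version. The job and class names are invented, and the DB lookup is faked with a thread pool around a placeholder method, which is also the usual fallback when your database client has no truly asynchronous interface:

import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.AsyncDataStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.async.ResultFuture;
import org.apache.flink.streaming.api.functions.async.RichAsyncFunction;

import java.util.Collections;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class AsyncEnrichmentJob {

    /** Looks up the latest version of an item without blocking the Flink task thread. */
    public static class FetchLatestVersion extends RichAsyncFunction<String, String> {
        private transient ExecutorService dbExecutor;

        @Override
        public void open(Configuration parameters) {
            // Pool used to run the (possibly blocking) DB client off the task thread.
            dbExecutor = Executors.newFixedThreadPool(10);
        }

        @Override
        public void asyncInvoke(String itemId, ResultFuture<String> resultFuture) {
            CompletableFuture
                    .supplyAsync(() -> lookUpInDb(itemId), dbExecutor)
                    .whenComplete((item, err) -> {
                        if (err != null) {
                            resultFuture.completeExceptionally(err);
                        } else {
                            resultFuture.complete(Collections.singleton(item));
                        }
                    });
        }

        @Override
        public void close() {
            dbExecutor.shutdown();
        }

        // Placeholder for the real query; a true async client would return a future itself.
        private String lookUpInDb(String itemId) {
            return "item-" + itemId + "-v2";
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        DataStream<String> changeEvents = env.fromElements("1", "2", "3");

        // At most 100 requests in flight per subtask, each timing out after 1 second.
        DataStream<String> enriched = AsyncDataStream.unorderedWait(
                changeEvents, new FetchLatestVersion(), 1000, TimeUnit.MILLISECONDS, 100);

        enriched.print();
        env.execute("async-db-enrichment");
    }
}

The capacity argument (100 here) bounds the number of in-flight requests per subtask, and the timeout turns a lookup that hangs into a failure instead of silently stalling the stream.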
I'm completely new to the Windows Azure and Windows Workflow side of things.
But basically, what I'm trying to implement is a cloud web app that will be responsible for pushing tile update/badge/toast notifications down to my Windows 8 application.
The code that sends down the tile notifications etc. is fine, but it needs to be executed every hour or so.
I decided the most straightforward approach was to make an MVC application with a Web API; this Web API is responsible for receiving the ChannelURI that the modern application sends to it, which is then stored in SQL Azure.
There will then be a class with a static method that contains the logic for gathering the new data and generating a new tile/badge/toast.
I've created a simple Activity workflow that has a Sequence with a DoWhile(true) activity. The body of this DoWhile contains a Sequence with an InvokeMethod and a Delay: the InvokeMethod calls the class containing my static method, and the Delay is set to one hour.
So that all seems to be okay. I then start this Activity from Application_Start in Global.asax with the following lines:
this.ActivityInvoker = new WorkflowInvoker(new NotificationActivity());
this.ActivityInvoker.InvokeAsync();
So I just tested it with that and it seems to be running my custom static method at the set interval.
That's all good, but now I have three questions about this approach:
1) Is this the correct/best approach? If not, what other ways should I look into?
2) If a new instance is spun up on Azure, how do I ensure that the workflows running on the two instances won't step on each other's toes? That is, how do I make sure the InvokeMethod won't run twice; I only want it to run once an hour regardless of how many instances there are.
3) How do I ensure that if an instance crashes or goes down, the state of the workflow is maintained?
Any help, guidance, etc is much appreciated.
A couple of good questions that I would love to answer; however, doing them justice on a forum like this is difficult. But let's give it a crack, to start with at least.
1) There is nothing wrong with your approach for implementing a scheduled task. I can think of a few other ways of doing it, like running a simple Worker Role with a Do { Thread.Sleep(); ... } loop: simple, but effective. There are more complex/elegant ways too, including external libraries and frameworks for scheduling tasks in Azure.
2) You would need to implement some sort of singleton pattern in your workflow/job processing engine. You could, for instance, acquire a lease on a 1 KB blob when your job starts and not allow another instance to start while that lease is held.
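To illustrate point 2), here is a rough sketch of that blob-lease singleton idea using the Azure Storage Java client purely to show the flow (the container and blob names are made up, and in a .NET Worker Role you would use the equivalent calls from the .NET storage SDK):

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import com.azure.storage.blob.models.BlobStorageException;
import com.azure.storage.blob.specialized.BlobLeaseClient;
import com.azure.storage.blob.specialized.BlobLeaseClientBuilder;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class SchedulerLock {

    /** Returns true only on the one instance that should run the hourly job. */
    public static boolean tryBecomeScheduler(String connectionString) {
        BlobContainerClient container = new BlobContainerClientBuilder()
                .connectionString(connectionString)
                .containerName("locks")               // hypothetical container name
                .buildClient();
        if (!container.exists()) {
            container.create();
        }

        // The tiny marker blob whose lease acts as the cluster-wide lock.
        BlobClient marker = container.getBlobClient("scheduler-lock");
        if (!marker.exists()) {
            byte[] bytes = "lock".getBytes(StandardCharsets.UTF_8);
            marker.upload(new ByteArrayInputStream(bytes), bytes.length, true);
        }

        BlobLeaseClient lease = new BlobLeaseClientBuilder()
                .blobClient(marker)
                .buildClient();
        try {
            // 60 seconds is the longest finite lease; the winner must keep renewing it,
            // e.g. from the same loop that fires the hourly InvokeMethod.
            lease.acquireLease(60);
            return true;
        } catch (BlobStorageException e) {
            // 409 Conflict: another instance already holds the lease, so stand down.
            return false;
        }
    }
}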
For more detailed answers, I suggest we take this offline, have a Skype call, and discuss your requirements in detail. You know how to get hold of me via email :) Look forward to it.