Azure Storage Account file details in a table in databricks - azure

I am loading data via pipelines into an ADLS Gen2 container.
Now I want to create a table that records when each pipeline run started and completed, with fields like the ones below,
where
startts - start time of the job
endts - end time of the job
but extractts is the extraction time, which is what I want to capture. Is there any approach I can use to create such a table? Any help will be really appreciated.

It might not be exactly what you need, but you can add columns inside a Copy activity with the expression @pipeline().TriggerTime to get your "startts" field. Go to the Additional columns option at the bottom of the Source tab, select New, and for the value choose "Add dynamic content".
Here is an example.
For your "extractts" you could use the activity output property "executionDuration", which gives you the time, in seconds, that ADF took to run the activity. Again, use dynamic content and get this: @activity('ReadFile').output.executionDuration, then store this value wherever you need.
In this example I'm storing the value in a variable, but you could use it anywhere.
I don't understand your need for an "endts" field; I would simply compute startts + extractts to get it.
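For reference, a minimal sketch of how such an additional column shows up in the Copy activity's JSON definition (the activity name, source/sink types and column name here are placeholders; your own dataset types will differ):
[json]
{
  "name": "CopyFromADLS",
  "type": "Copy",
  "typeProperties": {
    "source": {
      "type": "DelimitedTextSource",
      "additionalColumns": [
        {
          "name": "startts",
          "value": { "value": "@pipeline().TriggerTime", "type": "Expression" }
        }
      ]
    },
    "sink": { "type": "DelimitedTextSink" }
  }
}
[/json]
The "extractts" value can then be captured after the copy step, for example in a Set Variable activity whose value is @activity('CopyFromADLS').output.executionDuration (the activity name being the placeholder from the sketch above).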

Related

How to create datadog 'change alerts' using terraform?

I am trying to create a change monitor using Terraform, to check that over time a count stays at 0, for example every day (the value will occasionally go up to one and then come back to 0).
I found in the UI the ability to create a change alert.
I can't seem to find a way to define the configuration for this monitor type. Does Terraform support only a subset of the monitor types, or does the query need to be changed in some specific way that I can't find documentation for?
I've stumbled upon this as well. I just figured out that you have to manually create the monitor using "change alerts", then go to the "Manage Monitors" page, open the one you just created, and you'll see the query, which starts with change(...). Copy the whole query into the query field of your Terraform config.
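As a rough sketch of where that copied query ends up, assuming the current datadog_monitor resource schema (the metric name, timeframes, threshold and message below are placeholders to replace with the query you copied from the UI):
[hcl]
resource "datadog_monitor" "count_change_alert" {
  name    = "Count moved away from 0"
  type    = "query alert"
  message = "The count changed over the last day. Notify: @my-team"

  # change(aggregation(timeframe), timeframe_to_compare_against):metric{scope} > threshold
  query = "change(sum(last_1d),last_1d):sum:my_app.my_count{*} > 0"
}
[/hcl]
After that, terraform plan should show the monitor; adjust the timeframes and threshold to match what the UI generated.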

LogicApp that returns newly generated ID back to original source

Hello, I am trying to create a Logic App that first:
Extracts data from CosmosDB, using a query
Loops over the results
Pushes the results data into CRM
Sidenote: once this data is pushed into CRM, CRM automatically generates an ID for each record.
My biggest dilemma is figuring out how to return the newly generated ID back to the original CosmosDB container it was pulled from.
I have started on this and these are my questions:
Does this first part look correct in what I am doing? I must use this SQL query for this:
a. I am simply telling the Logic App to extract data from a particular container inside CosmosDB
b. Then I would like to loop over the results I just obtained, and push this to CRM
c. My dilemma is:
Once data is pushed to CRM, CRM then automatically generates an ID for each record that is uploaded. How would I then return the updated ID to Cosmos?
Should I create a variable that stores the IDs, then replace the old IDs with the new ones?
I am not sure how to construct/write this logic within LogicApps and have been researching examples of this.
Any help is greatly appreciated.
Thanks
If the call to your CRM system returns the ID you are talking about, then I would just add one additional action inside your loop in the Azure Logic App to update the record that was read from Azure Cosmos DB. Given that you are doing a SELECT * from the container, you should have the whole original document.
Add the 'Create or update document' action as a step, with a reference to the THFeature container along with your database ID, and then provide the new values for the document. I pasted an example below.
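A very rough sketch of the loop in the workflow's code view, just to show the shape: the CRM call is stood in by a generic HTTP action, and the connection name, paths, database/container names and the 'value' output property are placeholders that will differ in your app. The key idea is using setProperty() to write the CRM-generated ID back onto the original document before upserting it:
[json]
"For_each_queried_document": {
  "type": "Foreach",
  "foreach": "@body('Query_documents')?['value']",
  "actions": {
    "Create_CRM_record": {
      "type": "Http",
      "inputs": {
        "method": "POST",
        "uri": "https://example-crm.invalid/api/records",
        "body": "@items('For_each_queried_document')"
      }
    },
    "Create_or_update_document": {
      "type": "ApiConnection",
      "runAfter": { "Create_CRM_record": [ "Succeeded" ] },
      "inputs": {
        "host": { "connection": { "name": "@parameters('$connections')['documentdb']['connectionId']" } },
        "method": "post",
        "path": "/dbs/@{encodeURIComponent('MyDatabase')}/colls/@{encodeURIComponent('THFeature')}/docs",
        "headers": { "x-ms-documentdb-is-upsert": true },
        "body": "@setProperty(items('For_each_queried_document'), 'crmId', body('Create_CRM_record')?['id'])"
      }
    }
  }
}
[/json]
With the upsert behaviour enabled, Cosmos DB replaces the existing document (matched on its id) rather than creating a new one, so there is no need for a separate variable holding the IDs.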
BTW, your SELECT query looks strange; you should avoid slow cross-partition queries if you can.

Mapping columns from JSON in an Azure SQL Data Flow task

I am attempting a simple SELECT action on a Source JSON dataset in an Azure Data Factory data flow, but I am getting an error message that none of the columns from my source are valid. I use exactly the same configuration as the video, except that instead of a CSV file, I use a JSON file.
In the video, at 1:12, you can see that after configuring the source dataset, the source projection shows all of the columns from the source schema. Below is a screen shot from the tutorial video:
And below is a screen shot from my attempt:
(I blurred the column names because they match column names from a vendor app)
Note that in my projection I am unable to modify the data types or the format. I'm not sure why not, but I don't need to modify either, so I moved on. I did try with a CSV and I was able to modify the data types. I'm assuming this is a JSON thing, but I'm noting it here just in case there is some configuration I should take a look at.
At 6:48 in the video, you'll see the user add a select task, exactly as I have done. Below is a screen shot of the select task in the tutorial immediately following adding the task:
Notice the source columns all appear. Below is a screen shot of my select task:
I'm curious why the column names are missing. If I type them in manually, I get an error: "Column not found".
For reference, below are screen shots of my Data Source setup. I'm using a Data Lake Storage Gen2 Linked Service connected via Managed Identity and the AutoResolvingIntegrationRuntime.
Note that I tried to do this with a CSV as well. I was able to edit the datatype and format on a CSV, but I get the same column not found error on the next step.
Try doing this in a different browser or clear your browser cache. It may just be a formatting thing in the auto-generated JSON. This has happened to me before.

How to pass Tumbling Window parameters to a Data Factory pipeline in the Data Factory UI?

I have defined a pipeline in Azure Data Factory with a Tumbling Window trigger, as seen below:
I would like my activities to receive the Tumbling Window parameters (trigger().outputs.windowStartTime and trigger().outputs.windowEndTime); however, I did not find any examples in the documentation showing how to do this in the UI.
Question
How can I pass the Tumbling Window parameters to a Data Factory pipeline in the Data Factory UI?
Assuming the pipeline that you're triggering is already parameterised, then you're nearly there.
When adding the trigger you'll see a second screen for passing parameters from the trigger.
You can then add your functions prefixed with @. So:
@trigger().outputs.windowStartTime
@trigger().outputs.windowEndTime
If you need to call a function on the parameter before you pass it, you can do that too:
@addHours(trigger().outputs.windowEndTime,1)
This answer is out of date. Parameters can be added directly in the UI - see my answer above.
Note: You cannot pass the Tumbling Windows parameters to a Data Factory pipeline in the ADF UI.
You need to pass the Tumbling Window parameters by following these steps:
First, create a Tumbling Window trigger as per your requirement.
In the bottom left corner you will find the "Triggers" tab => click on Triggers, select the created trigger, click on "Code", and replace the parameters (see the sketch below).
To use the WindowStart and WindowEnd system variable values in the pipeline definition, use your "MyWindowStart" and "MyWindowEnd" parameters accordingly.
For more details, refer to the "MSDN" thread, which addresses a similar issue.
Hope this helps.
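As a rough sketch of what that trigger code can look like once the parameters are mapped (the pipeline name, parameter names, frequency and start time here are placeholders, and the plain-string form of the expressions is one way to write them):
[json]
{
  "name": "MyTumblingWindowTrigger",
  "properties": {
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 1,
      "startTime": "2021-01-01T00:00:00Z",
      "maxConcurrency": 1
    },
    "pipeline": {
      "pipelineReference": {
        "referenceName": "MyPipeline",
        "type": "PipelineReference"
      },
      "parameters": {
        "MyWindowStart": "@trigger().outputs.windowStartTime",
        "MyWindowEnd": "@trigger().outputs.windowEndTime"
      }
    }
  }
}
[/json]
Inside the pipeline itself, activities can then reference @pipeline().parameters.MyWindowStart and @pipeline().parameters.MyWindowEnd.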
This is because you set @trigger().outputs.windowStartTime and @trigger().outputs.windowEndTime in the variables. In fact, you should set them in the parameters, like this:
Please let me know if you are still facing the issue.

pipeline fails for stored Procedure called in Copy Activity - Azure data factory V2

We have a SQL Server stored procedure which returns the incremental records. If there are no changes to the table, then nothing is returned. The stored procedure does what is expected.
We're invoking the above stored procedure via a Copy activity in Azure Data Factory. It works fine for all cases except when nothing (an empty result) is returned.
We are looking for an option where, when nothing (empty) is returned from the stored procedure, the pipeline skips that step and proceeds further, and also marks the whole pipeline as successful rather than failed.
Thanks
Your stored procedure needs to end with a SELECT so that it returns something, including an empty set if there are no rows to return.
However, to skip the pipeline if there are no rows, DraganB's last answer is pretty relevant; I had to do that a couple of times on my current project.
As @DraganB said in the comment, activities can be chained in the flow, so you could do stored procedure activity --> If Condition activity --> Copy activity. If the output of the stored procedure step is empty, then don't run the Copy activity and end the pipeline. A sketch of this pattern is below.
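A rough sketch of that pattern in pipeline JSON. It assumes a Lookup activity is used to call the stored procedure (the Stored Procedure activity itself does not expose the returned rows); the dataset, procedure and activity names are placeholders and the Copy activity is abbreviated:
[json]
{
  "activities": [
    {
      "name": "LookupChanges",
      "type": "Lookup",
      "typeProperties": {
        "source": {
          "type": "SqlServerSource",
          "sqlReaderStoredProcedureName": "dbo.GetIncrementalRecords"
        },
        "dataset": { "referenceName": "SourceSqlDataset", "type": "DatasetReference" },
        "firstRowOnly": false
      }
    },
    {
      "name": "IfAnyRows",
      "type": "IfCondition",
      "dependsOn": [ { "activity": "LookupChanges", "dependencyConditions": [ "Succeeded" ] } ],
      "typeProperties": {
        "expression": {
          "value": "@greater(activity('LookupChanges').output.count, 0)",
          "type": "Expression"
        },
        "ifTrueActivities": [
          { "name": "CopyIncrementalData", "type": "Copy" }
        ]
      }
    }
  ]
}
[/json]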
Another idea: maybe you could look into an Azure Function with an external table trigger. You could add a status column to your table, such as needToBeCopied; every insert or update operation would change that column. Then filter on the data which needs to be copied when running the copy activity.
It got resolved. The real issue was that the Copy activity wasn't returning the correct error message. There was an issue with access control.
Grant the VIEW CHANGE TRACKING permission on the table to the user:
[sql]
GRANT VIEW CHANGE TRACKING ON OBJECT::dbo.databaselog TO username;
[/sql]
Our incremental loading approach works by enabling Change Tracking on the SQL Server database and on the required tables.
Azure Data Factory should have logged the error as 'Insufficient permissions on such-and-such table'. Instead, it failed the whole pipeline with the error message 'Stored procedure might be invalid or stored procedure doesn't return any output'.
Anyway, we assigned the right permissions and the issue got resolved. Now it creates an empty file with just the header record in it when there's no output returned from the stored procedure, as in Data Factory Avoiding creation of empty files.
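For context, a minimal sketch of the Change Tracking setup and the kind of incremental query the stored procedure can wrap (the database, table and key column names are placeholders):
[sql]
-- Enable Change Tracking on the database and on the tracked table.
ALTER DATABASE MyDb SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
ALTER TABLE dbo.databaselog ENABLE CHANGE_TRACKING;

-- Return only the rows changed since the last synced version
-- (the version from the previous pipeline run is assumed to be stored somewhere).
DECLARE @last_sync_version BIGINT = 0;
SELECT d.*
FROM CHANGETABLE(CHANGES dbo.databaselog, @last_sync_version) AS ct
JOIN dbo.databaselog AS d
    ON d.Id = ct.Id;  -- Id is assumed to be the table's primary key
[/sql]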
