Dynamic sqlReaderQuery in Azure Data Factory using PowerShell

I am new to Data Factory and PowerShell. I am looking for a way to provide user input to the sqlReaderQuery as a WHERE clause,
so that the user can select a subset of data from SQL Server and push it to Azure SQL.
I can see the parameters for date and time values, but I am looking to provide an ID along with the date.
Is there a way to write PowerShell to pass these values to the pipeline?
Any help is highly appreciated!!

The sqlReaderQuery in Azure Data Factory is unfortunately not very dynamic; the only variables that are really available are SliceStart, SliceEnd, WindowStart, and WindowEnd. You can tweak these with functions like AddDays and so on, but I don't think that will really do what you want.
One option with PowerShell is to generate a new pipeline JSON file based on your user's input (sketched below) and use New-AzureRmDataFactoryPipeline to add that JSON to your data factory as a new pipeline. Of course, this means you will end up with a lot of pipelines unless you clean them up with Remove-AzureRmDataFactoryPipeline.
Another option would be to use a Stored Procedure activity. Your user input could be saved in the database and the stored procedure would then dynamically create the extract to another table.
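To make the first option a little more concrete, here is a minimal sketch of the JSON-generation step, written in Python for brevity (the same string templating could be done directly in PowerShell). The template file name, the {where_clause} placeholder, and the ID/date values are all assumptions for illustration; the generated file would then be deployed with New-AzureRmDataFactoryPipeline as described above.

# Minimal sketch: splice a user-supplied filter into a copy of a pipeline JSON template.
# "pipeline-template.json" and the {where_clause} token are hypothetical names.
import json

def build_pipeline_json(template_path, output_path, user_id, start_date):
    with open(template_path) as f:
        template = f.read()

    # Build the WHERE clause from the user's input; int() keeps the ID numeric
    # and is a crude guard against injecting arbitrary SQL into the query.
    where_clause = f"WHERE Id = {int(user_id)} AND ModifiedDate >= '{start_date}'"

    pipeline_text = template.replace("{where_clause}", where_clause)
    json.loads(pipeline_text)  # sanity check that the result is still valid JSON

    with open(output_path, "w") as f:
        f.write(pipeline_text)

build_pipeline_json("pipeline-template.json", "pipeline-user42.json", 42, "2017-01-01")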

Related

Python function to get data from an Excel file which has 1000 columns

I have a very large Excel file, and it crashes every time I try to find some data in it, so now I'm planning to store the file in Azure Blob Storage or another database and to write an Azure Function in Python so that I can fetch the data from Blob Storage/the DB.
I have 1000 columns and need a dynamic query so that the end user can pick any of the 1000 columns and get that data from the file stored in Blob Storage/the DB.
Could someone please help me with a solution for this? Which database would be best, and which Python library can I use?
I will trigger the Azure Function from Azure API Management.
Have a look at the Microsoft example docs for querying Azure Cosmos DB using the Python SDK here. Once we know how to hardcode an Azure Cosmos query with Python, we only need to make it dynamic, meaning we let the user specify the column name.
One way you could do that is by creating an Azure Function that performs the database query. You would allow the user to specify the column name in the URL query parameters.
As an example, imagine we have an Azure function that is hosted at
https://my-function-app.azurewebsites.com/api/httptrigger1
We would let the user specify which column they would like data from using the URL query parameters. They could add the column name at the end of the URL. For example:
https://my-function-app.azurewebsites.com/api/httptrigger1?MyColumn=Column1
An Azure Function can read the URL query parameters; they are part of the incoming request object. So you could use them to dynamically build your SQL statement:
var myColumn = req.Query["MyColumn"];          // get the column name from the URL query params
var query = $"SELECT {myColumn} FROM MyTable"; // build the dynamic query
This would allow your user to select their own column using the URL!
Note: be sure to sanitize and parameterize those user inputs to protect against SQL injection.
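Since the question asks specifically about Python, here is a minimal sketch of the same idea as a Python HTTP-triggered Azure Function querying Cosmos DB. The account settings, database/container names, and the column allowlist are assumptions for illustration; column names can't be bound as query parameters in Cosmos SQL, so validating them against an allowlist stands in for parameterization here.

# Minimal sketch (Python HTTP-triggered Azure Function). Connection settings,
# database/container names, and the allowed column set are assumptions.
import json
import os

import azure.functions as func
from azure.cosmos import CosmosClient  # pip install azure-cosmos

ALLOWED_COLUMNS = {"Column1", "Column2", "Column3"}  # allowlist guards against SQL injection


def main(req: func.HttpRequest) -> func.HttpResponse:
    column = req.params.get("MyColumn")      # read ?MyColumn=... from the URL
    if column not in ALLOWED_COLUMNS:        # column names can't be parameterized,
        return func.HttpResponse("Unknown column", status_code=400)  # so validate them

    client = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
    container = client.get_database_client("exceldata").get_container_client("rows")

    query = f"SELECT c.{column} FROM c"      # safe: column came from the allowlist
    items = list(container.query_items(query, enable_cross_partition_query=True))
    return func.HttpResponse(json.dumps(items), mimetype="application/json")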

How to insert an item into Cosmos DB (SQL API) using an Azure Data Factory activity

I have an ADF pipeline which iterates over a set of files and performs various operations, and I have an Azure Cosmos DB (SQL API) instance where I would like to insert the name of each file and a timestamp, mainly to keep track of which files have already been processed and which have not, but in the future I might want to add some other bits of data related to each file.
This is what my Cosmos DB looks like.
Currently I am trying to utilize the Copy Data activity for the insert part.
One problem I have is that this particular activity expects a source, while at this point I only have the filename. In theory it was an option to use the Blob Storage from which I read the file at the beginning, but since that Blob Storage dataset is set to store binary files, I got the following error when I tried to use it as a source:
Because of that I created a dummy CosmosDB Linked service, but I have several issues with this approach:
Generally, the idea of a dummy source is not very appealing to me.
I haven't found a lot of information on the topic, but it seems that if I want to use something in the sink I need to SELECT it from the source.
Even though I have selected a value for the id, the item is not saved with the value selected in the source query; as you can see from the first screenshot, I got a GUID, and only the name is as I want it.
So my questions are two. I have only just started learning ADF, and this approach doesn't look like the proper way to insert an item into Cosmos DB from an activity, so a better/more common approach would be appreciated. If there is no better proposal, how can I at least apply my own value for the id column? If I create the item in the Cosmos DB GUI and save it from there, I am able to use the filename as the id, which for now seems like a good idea to me, but I wasn't able to set a custom value (string or int) through the activity, so how can I achieve this?
This is how my sink looks:

Create Data Factory Dataset in a specific Azure DevOps branch rather than directly in Data Factory

While trying to build an ADF pipeline that generates datasets within Data Factory, I ran into an interesting issue. Or maybe I misunderstand some components completely, in which case I'd happily be educated.
I basically read some meta data from a SQL Database table which determines which source system, schema and tables I should pull new data from. The meta data is stored within a bunch of variables, which then feed a Web Request that attempts to generate a new Data Source as per the MS documentation. Yes, I'm trying to use Azure Data Factory to generate Azure Data Factory components.
The URL to create the DataSet and the JSON body for the request are both generated using #Concat and a number of the variables. The resulting DataSet is a very straightforward file that does not contain references to the columns, just the table schema and table name. I generated these manually before, and that all seems to work brilliantly. I basically have a dataset connected to the source system, referencing the table from the metadata.
The code runs, but the resulting dataset is directly published, as opposed to being added in my working branch. While this should not be a big issue once I manage to properly test everything, ideally the object would be created in my working branch (using Azure DevOps, thus a local file).
My next thought was to set up a linked service to my local PC and simply write the same contents as above there. My challenge seems to be that I am essentially creating a file out of nothing. I am trying to use a Copy Data component, and I added an empty placeholder file to act as a source.
I configured the sink with dynamic content for the copy behavior and attempted to add the JSON contents there. This gets the file created, but it is unfortunately empty. I also attempted to add a new column to the source with the same contents as the data.
However, since the file to be used as a sink doesn't exist, a mapping error occurs. Apart from this, I'd rather not have a column header written; just the dynamically created contents.
I'm not sure how to continue with this. I feel I'm very close to achieving my goal, but cannot seem to take this final hurdle.
Any hints or suggestions would be very welcome.

Azure Data Factory dynamic output path based on source dataset payload

I have a stream analytics job which constantly dumps data in Cosmos DB. The payload has a property "Type" which determines the payload itself. i.e. which columns are included in the payload. It is an integer value of either 1 or 2.
I'm using Azure Data Factory V2 to copy data from Cosmos DB to Data Lake. I've created a pipeline with an activity that does this job. I'm setting the output path folder name using:
#concat('datafactoryingress/rawdata/',dataset().productFilter,'/',formatDateTime(utcnow(),'yyyy'),'/')
What I want in Data Factory is to identify the payload itself, i.e. determine whether the Type is 1 or 2, and then route the data into folder 1 or folder 2 accordingly. I want to iterate over the data from Cosmos DB, determine the message type, segregate the data based on that Type, and set the folder paths dynamically.
Is there a way to do that? Can I check the Cosmos DB document to find out the message type and then how do I set the folder path dynamically based on that?
Unfortunately, based on the doc, dynamic content from the source dataset is not supported by ADF so far. You can't use fields in the source data as dynamic parameters for the sink output path. Based on your situation, I suggest setting up two separate pipelines that transfer data according to the Type field.
If the Type field varies more than that and you do want to differentiate the output path, ADF may not be the suitable choice for you. You could write your own logic to fulfill your needs, along the lines of the sketch below.
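For completeness, here is a minimal sketch of that "write your own logic" route in Python: read the documents from Cosmos DB and land each one under a folder chosen from its Type field. The account settings, database/container names, and the output path layout are all assumptions for illustration.

# Minimal sketch: route Cosmos DB documents into per-Type folders in blob storage.
# Connection settings, database/container names, and the path layout are assumptions.
import json
import os
from datetime import datetime, timezone

from azure.cosmos import CosmosClient              # pip install azure-cosmos
from azure.storage.blob import BlobServiceClient   # pip install azure-storage-blob

cosmos = CosmosClient(os.environ["COSMOS_URL"], credential=os.environ["COSMOS_KEY"])
source = cosmos.get_database_client("telemetry").get_container_client("events")

blobs = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONN"])
ingress = blobs.get_container_client("datafactoryingress")

year = datetime.now(timezone.utc).strftime("%Y")
for doc in source.query_items("SELECT * FROM c", enable_cross_partition_query=True):
    msg_type = doc.get("Type", 1)                                  # payload Type: 1 or 2
    path = f"rawdata/type{msg_type}/{year}/{doc['id']}.json"       # folder chosen per Type
    ingress.upload_blob(path, json.dumps(doc), overwrite=True)     # write the document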

Stream Analytics Query (Select * into output)(Exclude specific columns)

I have a query like:
SELECT
    *
INTO
    [documentdb]
FROM
    [iothub]
TIMESTAMP BY eventenqueuedutctime
I need to use * because the data is dynamic and doesn't have a specific schema. The problem is that the IoT Hub system information is written to documentdb by this query. Is there any way to exclude the IoT Hub system information data?
Thanks.
This is not possible currently, but it will be possible with job compatibility level 1.2 in the near future. For now, one workaround is to create a post-create trigger in Cosmos DB to remove this property from the document.
To answer your question, the Azure Stream Analytics service doesn't have built-in support for excluding columns from dynamic data (the IoT Hub information). But we can achieve this by using a UDF; here is more info on UDFs.
A UDF can help us delete the column from the input data and return the updated JSON.
There are basically two steps to achieve this:
Create a JavaScript UDF.
Go to Functions in the left-hand navigation (below Inputs).
Click Add --> JavaScript UDF.
Give it the function alias removeiothubinfo.
Keep the output type as Any.
Copy and paste the following code into the function definition:
function main(input) {
    delete input['IoTHub'];   // drop the IoT Hub system-properties field
    return input;             // return the remaining payload unchanged
}
Click Save.
Update the query.
Go to the query editor and copy and paste the following query:
WITH NewInput AS
(
    SELECT
        udf.removeiothubinfo(iothub) AS UpdatedJson
    FROM
        [iothub]
)
SELECT
    UpdatedJson.*
INTO
    [documentdb]
FROM
    NewInput
Click Save.
I suggest you test your query before running the job by uploading a sample file with a similar JSON structure.
Edit:
Also, even in job compatibility level 1.2 there is no additional functionality to achieve this. Check this out for more info.
As @chetangm said in his answer, no such filtering mechanism is supported in ASA so far. Yes, you could create a trigger in Cosmos DB; however, it needs to be invoked explicitly from SDK code or the REST API. It won't be fired automatically.
I can offer another workaround: an Azure Function with a Cosmos DB trigger. It is executed when data is added to or changed in Azure Cosmos DB, and you just need to remove the fields you don't want in the function code, along the lines of the sketch below.
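A minimal sketch of that function, assuming the Python programming model: the Cosmos DB trigger, the lease container, and a Cosmos DB output binding named cleaned (pointing at a separate container so the function doesn't re-trigger itself) are all configured in function.json and are assumptions here; only the field-removal logic is shown.

# Minimal sketch: strip the IoTHub system-properties field from changed documents.
# The trigger/output bindings (and their containers) are assumed to be set up in function.json.
import azure.functions as func


def main(documents: func.DocumentList, cleaned: func.Out[func.DocumentList]):
    out_docs = func.DocumentList()
    for doc in documents:                              # documents delivered by the change feed
        data = dict(doc)                               # copy the document's fields
        data.pop("IoTHub", None)                       # remove the IoT Hub system information
        out_docs.append(func.Document.from_dict(data))
    cleaned.set(out_docs)                              # write cleaned documents to the output container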
