Can you use dynamic/run-time outputs with Azure Stream Analytics?

I am trying to get aggregate data sent to different table storage outputs based on a column name in the SELECT query. I am not sure if this is possible with Stream Analytics.
I've looked through the Stream Analytics docs and various forums, but so far I haven't found any leads. I am looking for something like
SELECT tableName, COUNT(DISTINCT records)
INTO tableName
FROM inputStream
I hope this makes it clear what I'm trying to achieve: I am trying to insert aggregated data into table storage (defined as outputs), and I want to take the output stream/table storage name from the SELECT query. Any idea how that could be done?

I am trying to get aggregate data sent to different table storage
outputs based on a column name in select query.
If I don't misunderstand your requirement, you want a CASE...WHEN... or IF...ELSE... structure in the ASA SQL so that you can send data to different table outputs based on some condition. If so, I'm afraid that cannot be implemented at the moment. Every destination in ASA has to be specific; dynamic output is not supported in ASA.
However, as a workaround, you could use an Azure Function as the output. You could pass the columns to the Azure Function, then do the switching in code inside the function to save the data into different table storage destinations. For more details, please refer to this official doc: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-with-azure-functions
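For illustration, here is a minimal sketch of what that function could look like, assuming an HTTP-triggered Python Azure Function, the azure-data-tables package, a STORAGE_CONNECTION app setting, and hypothetical column names (tableName, windowEnd, count) produced by the ASA query; it is a sketch of the routing idea, not a production implementation.

import os

import azure.functions as func
from azure.data.tables import TableServiceClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Stream Analytics posts a JSON array of output rows to the function
    records = req.get_json()
    service = TableServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION"])

    for record in records:
        # The destination table comes from the data itself (hypothetical column name)
        table_name = record["tableName"]
        table = service.create_table_if_not_exists(table_name)
        table.upsert_entity({
            "PartitionKey": table_name,
            "RowKey": str(record.get("windowEnd", "")),  # choose keys that fit your data
            "recordCount": record.get("count", 0),
        })

    # Return 200 so Stream Analytics treats the batch as delivered
    return func.HttpResponse(status_code=200)

With this pattern the ASA job needs only one output (the function), and the per-table routing lives in code, where dynamic destinations are allowed.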

Related

ADF - Iterate through blob container containing JSON files, partition and push to various SQL tables

I have a blob storage container in Azure Data Factory that contains many JSON files. Each of these JSON files contains data from an API. The data needs to be split into ~30 tables in my Azure DWH.
I am hoping someone can provide some clarity on the best way to achieve this (I am new to the field of Data Engineering and trying to develop my skills through projects).
At present, I have written 1 stored procedure which contains code to extract data and insert data into 1 of the 30 tables. Is this the right approach? And if so, could you please advise how best to design my pipeline?
Thanks in advance.
I am assuming that you also have 30 folders in the container and the data from each folder will end up in one table, or maybe the files are named in such a way that you can derive which table each lands in. Please read about the Get Metadata activity; it should give you an idea of how to get the file name/folder name: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-get-metadata-activity
Once you have that, I think you should read about parameterizing a dataset: Use parameterized dataset for DataFlow, and also this: https://www.youtube.com/watch?v=9XSJih4k-l8

Delta Logic implementation using SnapLogic

Is there any snap available in SnapLogic to do the following?
Connect with Snowflake and get data via SELECT * FROM VIEW
Connect with Azure Blob Storage and get the data from a csv file: FILENAME_YYYYMMDD.csv
Take only the data that is available in 1 but NOT available in 2, and write this delta back to Azure Blob Storage: FILENAME_YYYYMMDD.csv
Is In-Memory Look-Up useful for this?
No, In-Memory Lookup snap is used for cases where you need to look up the value corresponding to the value in a certain field of the incoming records. For example, say you want to look up a country name against the country ISO code. This snap generally fetches the lookup table once and stores it in memory. Then it uses this stored lookup table to provide data corresponding to the incoming records.
In your case, you have to use the Join snap and configure it to an inner join.
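For clarity, the delta the question describes ("available in 1 but NOT available in 2") is a set difference on some key. Outside SnapLogic, that logic looks roughly like the Python/pandas sketch below, where the key column id and the file names are hypothetical placeholders.

import pandas as pd

# Stand-ins for the Snowflake SELECT * FROM VIEW result and the existing blob CSV
view_df = pd.read_csv("snowflake_view_extract.csv")
blob_df = pd.read_csv("FILENAME_YYYYMMDD.csv")

# Keep only rows from the view whose key does not appear in the blob file (anti-join)
delta = view_df[~view_df["id"].isin(blob_df["id"])]

delta.to_csv("FILENAME_YYYYMMDD_delta.csv", index=False)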

Azure Data Factory join two lookup output

I'm trying to create a pipeline which:
Gets some info from an Azure SQL Database table with a LookUp activity.
Gets similar info from an Oracle Database table with a similar LookUp activity.
Join both tables somehow
Then process this joined data with a ForEach activity.
It seems pretty simple but I'm not able to find a solution that suits my requirements.
I want to join those tables with a simple join, as both tables share a column with the same type and value. I've tried some approaches:
Tried a Filter activity with items like #union(activity('Get Tables from oracle config').output.value, activity('Get tables from azure config').output.value) and condition #or(equals(item().status, 'READY'), equals(item().owner,'OWNer')), but it fails because some records don't have a status field and others don't have the owner field, and I don't know how to bypass this error
Tried a Data Flow activity approach, which should be the right one, but the Oracle connection is not compatible with the Data Flow activity, just the Azure SQL Server one
I've tried to put all the records into an array and then process it through a Databricks activity, but Databricks doesn't accept an array as a job input through their widgets.
I've tried a ForEach loop for appending the Oracle result to the Azure SQL one but no luck at all.
So, I'm totally blocked on how to proceed. Any suggestions? Thanks in advance.

Azure Data Factory Input dataset with sqlReaderQuery as source

We are creating an Azure Data Factory pipeline using the .NET API. Here we are providing the input data source using a sqlReaderQuery. This means the query can use multiple tables.
So the problem is that we can't extract a single table from this query to provide as the tableName type property in the Dataset, as shown below:
"typeProperties": {
"tableName": "?"
}
While creating the dataset it throws an exception because tableName is mandatory. We don't want to provide tableName in this case. Is there an alternative way of doing this?
We are also providing the structure in the dataset.
Unfortunately you can't do that natively. You need to deploy a Dataset for each table. Azure Data Factory produces slices for every activity ahead of execution time. Without knowing the table name, Data Factory would fail when producing these input slices.
If you want to read from multiple tables, then use a stored procedure as the input to the data set. Do your joins and input shaping in the stored procedure.
You could also get around this by building a dynamic custom activity that operates, say, at the database level. When doing this you would use a dummy input dataset and a generic output data set and control most of the process yourself.
It is a bit of a nuisance that this property is mandatory, particularly if you have provided a ...ReaderQuery. For Oracle copies I have used sys.dual as the table name; this is a sort of built-in dummy table in Oracle. In SQL Server you could use one of the system views or set up a dummy table.

Is it possible to Upsert using Stream Analytics

I am using Stream Analytics to insert data into table storage. This works when all I want to do is add new rows. However, I now want to insert or update existing rows. Is this possible with Stream Analytics/Table storage?
The current implementation of Stream Analytics output to Azure Table uses the InsertOrReplace API. So as long as your new data is cumulative (not just deltas), it should simply work.
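To make the InsertOrReplace semantics concrete, here is a minimal sketch using the azure-data-tables Python SDK (the table name, keys and connection setting are placeholders, not part of the original answer): writing an entity with the same PartitionKey/RowKey a second time replaces the earlier row instead of failing.

import os

from azure.data.tables import TableServiceClient, UpdateMode

service = TableServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION"])
table = service.create_table_if_not_exists("aggregates")

entity = {"PartitionKey": "deviceA", "RowKey": "2024-01-01T00:00", "count": 10}
table.upsert_entity(entity, mode=UpdateMode.REPLACE)   # first write: insert

entity["count"] = 25
table.upsert_entity(entity, mode=UpdateMode.REPLACE)   # same keys: the row is replaced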
On the other hand, if you would like only upsert (insert or update), you could consider the DocumentDB output.
If you would like something more customized, you could also consider a trigger on your SQL table output.
cheers
Chetan
In short, no. Stream Analytics isn't an ETL tool.
However, you might be able to pass the output to a downstream SQLDB table. Then have a second stream job and query that joins the first to the table using left/right and inner joins. Just an idea, not tested, and not recommended.
OR
Maybe output the streamed data to a SQL DB landing table or Data Lake Store. Then perform a merge there before producing the output dataset. This would be a more natural approach.
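As a sketch of that merge step (the table, column and connection names below are hypothetical, not from the answer), the landing table could be merged into a target table from Python with pyodbc:

import pyodbc

# T-SQL MERGE from the streaming landing table into the target table
MERGE_SQL = """
MERGE INTO dbo.DeviceAggregates AS target
USING dbo.StreamLanding AS source
    ON target.DeviceId = source.DeviceId
WHEN MATCHED THEN
    UPDATE SET target.EventCount = source.EventCount,
               target.WindowEnd  = source.WindowEnd
WHEN NOT MATCHED THEN
    INSERT (DeviceId, EventCount, WindowEnd)
    VALUES (source.DeviceId, source.EventCount, source.WindowEnd);
"""

conn = pyodbc.connect("DSN=AzureSqlDb")   # placeholder connection
cursor = conn.cursor()
cursor.execute(MERGE_SQL)
conn.commit()
conn.close()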
