I'm trying to create a pipeline which:
Gets some info from an Azure SQL Database table with a LookUp activity.
Gets similar info from an Oracle Database table with a similar LookUp activity.
Join both tables somehow
Then process this joined data with a ForEach activity.
It seems pretty simple, but I haven't been able to find a solution that suits my requirements.
I want to join those tables with a simple join, as both tables share a column with the same type and values. I've tried a few approaches:
Tried a Filter activity with Items like @union(activity('Get Tables from oracle config').output.value, activity('Get tables from azure config').output.value) and the Condition @or(equals(item().status, 'READY'), equals(item().owner,'OWNer')), but it fails because some records don't have a status field and others don't have an owner field, and I don't know how to get around this error (the exact Filter settings are laid out after this list of attempts).
Tried a Data Flow activity, which seems like it should be the right approach, but the Oracle connection is not compatible with Data Flow activities, only the Azure SQL one.
I've tried to put all the records into an array and then process it through a Databricks activity, but Databricks doesn't accept an array as a job input through its widgets.
I've tried a ForEach loop to append the Oracle results to the Azure SQL ones, but no luck at all.
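For reference, the Filter activity from the first attempt was configured roughly like this (the same expressions as above, just laid out on separate lines as entered in the pipeline UI):

Items:     @union(activity('Get Tables from oracle config').output.value, activity('Get tables from azure config').output.value)
Condition: @or(equals(item().status, 'READY'), equals(item().owner, 'OWNer'))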
So, I'm totally blocked on how to proceed. Any suggestions? Thanks in advance.
Here's a pipeline overview:
Microsoft recently launched the Snowflake connector for Data Flows in ADF. Is there any way to turn on pushdown optimization in ADF so that, when both my source and target are Snowflake, it triggers a query inside Snowflake to do the work instead of pulling the data out of the Snowflake environment? Like a normal ELT process instead of ETL.
Let me know if you need some more clarification.
As I understand it, the intent here is to fire a query from ADF on Snowflake data so that, possibly, the data can be scrubbed (or something similar). I see that the Lookup activity also supports Snowflake, and that should probably help you. My knowledge of Snowflake is limited, but I know that you can call a proc/query from the Lookup activity, and that should help.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
"Lookup activity reads and returns the content of a configuration file or table. It also returns the result of executing a query or stored procedure. The output from Lookup activity can be used in a subsequent copy or transformation activity if it's a singleton value. The output can be used in a ForEach activity if it's an array of attributes."
So I continue to rewrite my lovely SSIS packages as ADF Data Flows. However, there are a lot of cases where I have an OLE DB source with a quite complicated SQL statement followed by other transformations.
Let's say there is a SQL statement that joins 10 different tables. As far as I know, I can execute a SQL statement only on my sink. So to get the very same dataset that is used later, I would have to create 10 different sources and 10 join operations. Is that correct?
It is possible, but it doesn't seem very efficient. The only other thing that comes to mind is to rethink our whole DWH logic, but that would be a lot of added work, so I would rather avoid it.
Thank you in advance!
Actually, it's possible to execute a SQL query on the Source (only a SQL query is supported there).
For example, here I run a SQL query against an Azure SQL Database source.
Here's the data in my tables test4 and test6:
Don't specify a table in the Source dataset:
Data Flow Source settings:
Source options: execute a SQL query that joins the two tables:
select a.id, a.tname, b.tt from test6 as a left join test4 as b on a.id = b.id
Import the schema of the query result:
Data Preview:
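The same pattern should cover the 10-table case from the question: the whole join statement can go into the query box of a single source instead of creating 10 sources and separate join transformations. Roughly like this (table and column names are just placeholders):

select t1.id, t1.col_a, t2.col_b, t3.col_c
from table1 as t1
left join table2 as t2 on t2.id = t1.id
left join table3 as t3 on t3.id = t1.id
-- and so on for the remaining tables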
Hope this helps.
I am trying to get aggregated data sent to different Table storage outputs based on a column name in the SELECT query. I am not sure if this is possible with Stream Analytics.
I've looked through the Stream Analytics docs and various forums, but so far I haven't found any leads. I am looking for something like:
SELECT tableName, COUNT(DISTINCT records)
INTO tableName
FROM inputStream
I hope this makes it clear what I'm trying to achieve: I want to insert aggregated data into Table storage (defined as outputs), and I want to grab the output/Table storage name from the SELECT query. Any idea how that could be done?
"I am trying to get aggregate data sent to different table storage outputs based on a column name in select query."
If I don't misunderstand your requirement, you want a CASE...WHEN... or IF...ELSE... structure in the ASA SQL so that you can send data to different table outputs based on some conditions. If so, I'm afraid that can't be implemented so far. Every destination in ASA has to be specific; dynamic output is not supported in ASA.
However, as a workaround, you could use an Azure Function as the output. You could pass the columns into the Azure Function, then do the switching in code inside the function to save the data into different Table storage destinations. For more details, please refer to this official doc: https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-with-azure-functions
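A rough sketch of what the ASA query could look like with a Function output (the output alias ToTableRouterFunction is just a placeholder, the column names are taken from your example, and the actual routing to the different tables happens inside the function code):

SELECT
    tableName,
    COUNT(DISTINCT records) AS recordCount
INTO
    ToTableRouterFunction
FROM
    inputStream
GROUP BY
    tableName, TumblingWindow(minute, 5)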
Is there a way to bind a SharePoint list to Azure SQL DB tables to fetch updates dynamically? Example: I have a SharePoint list with 2 columns and an Azure SQL DB table with 2 columns. I would like to bind them together so that when an update happens in a DB column, the respective SharePoint list column data is updated.
I have tried writing a Spring Boot job to do this, but it is a lot of code to maintain. We also need to manage the real-time sync on our own.
I am expecting there might be some out-of-the-box connector in Microsoft Flow, Azure Logic Apps, or some other automation that will help me automate this.
I would suggest you check BCS (Business Connectivity Services) so your DB data can sync with a SharePoint external list.
https://learn.microsoft.com/en-us/sharepoint/make-external-list
Here is another thread with a demo:
https://www.c-sharpcorner.com/article/integrate-azure-sql-db-with-sharepoint-online-as-an-external-list-using-business/
There is a SQL Server connector; I suppose this is what you want. You could use the trigger "When an item is created" or "When an item is modified" to get the details of the SQL updates.
The output would look like the pic below shows.
For more information, refer to this doc: SQL Server. Note: there are some known limitations when invoking the triggers (a sketch of a qualifying table follows the list):
A ROWVERSION column is required for OnUpdatedItems
An IDENTITY column is required for OnNewItems
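For example, a source table shaped like this (names are purely illustrative) would satisfy both requirements:

-- the IDENTITY column is what the "When an item is created" (OnNewItems) trigger needs,
-- and the ROWVERSION column is what "When an item is modified" (OnUpdatedItems) needs
CREATE TABLE dbo.SharePointSyncSource
(
    Id      INT IDENTITY(1,1) PRIMARY KEY,
    ColumnA NVARCHAR(100) NULL,
    ColumnB NVARCHAR(100) NULL,
    RowVer  ROWVERSION
);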
After the trigger, you could use the table details to update the SharePoint list.
Hope this could help you.
Not really sure if this is an explicit question or just a request for input. I'm looking at Azure Data Factory to implement a data migration operation. What I'm trying to do is the following:
I have a NoSQL DB with two collections. These collections are associated via a common property.
I have an MS SQL Server DB whose data is related to the data within the NoSQL DB collections via an attribute/column.
One of the NoSQL DB collections will be updated on a regular basis, the other one less often.
What I want to do is prepare a Data Factory pipeline that will grab the data from all 3 DB locations and combine it based on the common attributes, resulting in a new dataset. Then, from this dataset, push the data to another SQL Server DB.
I'm a bit unclear on how this is to be done within Data Factory. There is a Copy activity, but it only works on a single input dataset, so I can't use that directly. I see that there is a concept of data transformation activities that look like they are meant for massaging input datasets to produce new datasets, but I'm not clear on which ones would be relevant to what I want to do.
I did find that there is a special activity called a Custom Activity, which is in effect a user-defined activity that can be developed to do whatever you want. This looks the closest to what I need, but I'm not sure it is the optimal solution.
On top of that, I am also unclear how merging the 3 data sources would work. I don't know how you would join data from the 3 different sources if the datasets are just snapshots of the originating source data, which makes me think data could end up missing. I'm not sure whether publishing some of the data somewhere would be required, but that seems like it would in effect mean maintaining two stores for the same data.
Any input on this would be helpful.
There are a lot of things you are trying to do.
I don't know if you have experience with SSIS, but what you are trying to do is fairly common for either of these integration tools.
Your ADF diagram should look something like:
1. You define your 3 data sources as ADF Datasets on top of a corresponding Linked Service.
2. Then you build a pipeline that brings the information from SQL Server into a temporary data source (an Azure Table, for example).
3. Next you build 2 pipelines that each take one of your NoSQL datasets and run a function to update the temporary data source, which is the output.
4. Finally you build a pipeline that brings all your data from the temporary data source into your other SQL Server.
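Just to illustrate what that builds up to (all names here are invented, since I don't know your schemas), the temporary data source that steps 2 and 3 populate effectively holds the result of a join on the common attribute, something like:

select s.common_key,
       s.some_sql_server_column,
       c1.some_collection_one_property,
       c2.some_collection_two_property
from sql_server_data as s
left join nosql_collection_one as c1 on c1.common_key = s.common_key
left join nosql_collection_two as c2 on c2.common_key = s.common_key

Step 4 then simply copies that combined result into the target SQL Server.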
Steps 2 and 3 could be switched depending on which source is the master.
ADF can run multiple tasks one after another or concurrently. Simply break the task down into logical jobs and you should have no problem coming up with a solution.