I have a requirement to move XML documents stored as an XML column in a SQL Server table to Azure Blob Storage. I haven't done something like this in ADF before. Any help would be appreciated.
In this repro, I created a sample table with two records that contain XML data and copied each XML record into its own Blob storage file (for every XML record in the SQL table, one file is created in Blob).
A table is created in SQL Server with the following script:
drop table if exists XmlValuesTable;
CREATE TABLE XmlValuesTable (pkid varchar(10) PRIMARY KEY, v XML NOT NULL );
INSERT INTO XmlValuesTable
VALUES ('a1','<note><float>123.456</float><time>01:23:45.789</time></note>');
INSERT INTO XmlValuesTable
VALUES ('b1','<note><float>4.0000000000</float><time>01:23:45Z</time></note>');
In the ADF pipeline, a Lookup activity is added and its query is given as select pkid from XmlValuesTable.
Then a ForEach activity is added, and in its settings @activity('Lookup1').output.value is given in the Items box.
Inside the ForEach activity, a Copy activity is added. In the source, the query is given as select v from XmlValuesTable where pkid='@{item().pkid}'.
In the sink, a delimited text (CSV) dataset is added and the primary key value pkid is used as the file name.
Once the pipeline is run, the files are created in Blob storage.
Output: two files are created, one per pkid (a1 and b1).
In this way, we can copy the XML data stored in SQL Server to Blob storage.
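For reference, the key dynamic-content settings used above, consolidated (the sink file-name expression is an assumption; the setup above only states that the pkid value is used as the file name):
Lookup1 query:            select pkid from XmlValuesTable
ForEach Items:            @activity('Lookup1').output.value
Copy source query:        select v from XmlValuesTable where pkid='@{item().pkid}'
Sink dataset file name:   @{item().pkid}.xml   (extension assumed)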
I have a table in SQL and it is copied to ADLS. After copying, new rows were inserted into the SQL table, and I want to pick up only those new rows.
I tried to use a join transformation, but I couldn't get the output. What is the way to achieve this?
Refer to this link. Using this approach you can get the newly added rows from SQL into Data Lake Storage. I reproduced this on my side and was able to get the newly added records from the pipeline.
Created two tables in the SQL database, named data_source_table and watermarktable.
data_source_table is the one that holds the data, and watermarktable is used for tracking new records based on a date (watermark) value.
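A minimal sketch of the two tables, following the layout in the linked tutorial (the PersonID and Name columns are illustrative assumptions; only LastModifytime is required by the pattern):
create table data_source_table
(
    PersonID int,
    Name varchar(255),
    LastModifytime datetime
);

create table watermarktable
(
    TableName varchar(255),
    WatermarkValue datetime
);

-- seed the watermark with an old initial value so the first run copies everything
insert into watermarktable
values ('data_source_table', '2010-01-01 00:00:00');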
Created the pipeline as shown below.
In Lookup1, read the previously recorded watermark value from watermarktable.
In Lookup2, give the query as follows:
select MAX(LastModifytime) as NewWatermarkvalue from data_source_table;
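For reference, Lookup1 can read the previously recorded watermark with a query like this (a minimal sketch, assuming the watermarktable layout above):
select TableName, WatermarkValue from watermarktable where TableName = 'data_source_table';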
Then, in the Copy activity, the source and sink are configured as shown in the images below.
SOURCE:
Query in Source:
select * from data_source_table where LastModifytime > '@{activity('Lookup1').output.firstRow.WatermarkValue}' and LastModifytime <= '@{activity('Lookup2').output.firstRow.NewWatermarkvalue}'
SINK:
The pipeline ran successfully and the data in the SQL table was loaded into a Data Lake Storage file.
I then inserted new rows into data_source_table and was able to pick up only those records on the next run.
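After each successful copy, the watermark also has to be advanced so the next run only picks up newer rows. In the linked tutorial this is done by a stored procedure activity after the Copy activity; a minimal sketch of the update it performs (table and column names assumed from the tutorial):
UPDATE watermarktable
SET WatermarkValue = '<NewWatermarkvalue returned by Lookup2>'
WHERE TableName = 'data_source_table';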
I need to pick a timestamp value from a column 'created on' in a CSV file in ADLS. Later I want to query Azure SQL DB in ADF, like delete from table where created on = '<time stamp>'. Please help on how this could be achieved.
Here I repro'd fetching a selected row from the CSV in ADLS.
Create a linked service and dataset for the source file.
Read the data with a Lookup activity from the source path.
A ForEach activity iterates over the values from the output of the Lookup: @activity('Lookup1').output.value
Inside the ForEach activity, use an Append Variable activity to append the 'created on' value of each item to an array variable (Date_COL3 here).
The array index is then used to pick the required value.
Use a Script activity to run the query against the database:
Delete FROM dbo.test_table where Created_on = '@{variables('Date_COL3')[4]}'
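For reference, a minimal sketch of the dynamic content involved (the column header 'created on', the variable name Date_COL3 and the table dbo.test_table come from the question and the query above; adjust them to your own names):
Append Variable (array variable Date_COL3), Value:   @item()['created on']
Script activity, Query:   Delete FROM dbo.test_table where Created_on = '@{variables('Date_COL3')[4]}'
The index [4] picks the fifth appended value, since ADF array indexes are zero-based.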
I'm using the Copy Data activity from Microsoft Azure Data Factory for mapping JSON file attributes in my storage account (source) to MySQL database columns (sink).
Question:
Is it possible to get the Blob URLs of the JSON files within Copy Data and send them to my database? I know it's possible to get the file name in the source tab with "$$FILEPATH", but I'd like to get the complete URL.
OK, since it's a constant value, you can add it as a column to your JSON data and write it into the SQL DB.
Here is a quick demo that I created:
Created a Dataflow activity.
In the Dataflow activity, added a JSON file as a source.
Flattened the JSON (you have to flatten the JSON in order to write it into a SQL DB, because you can't write a complex data type to SQL DB).
Added a derived column 'BlobUrl'; here I mapped a new column and added a string value to the data (see the screenshot attached below). You can ingest a variable here instead of a constant value; please check it out here: Passing a variable from pipeline to a dataflow in azure data factory.
Saved the result in SQL DB as a sink.
DataFlow Activities:
Derived Column Activity:
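A minimal sketch of the derived column setup, assuming a data flow string parameter named blobUrl is passed in from the pipeline (per the linked post); the account, container and file names are placeholders:
Data flow parameter (String):   blobUrl
Derived column expression:      BlobUrl = $blobUrl
Constant alternative:           BlobUrl = 'https://<storageaccount>.blob.core.windows.net/<container>/<file>.json'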
I am trying to copy different CSV files in Blob storage into their very own SQL tables (I want to auto-create these tables). I've seen a lot of questions, but I haven't seen any that answer this.
Currently I have a Get Metadata activity that grabs a list of child items to get the names of the files, and a ForEach loop, but from there I don't know how to have them sent to different tables per file.
Updated:
When I run it a second time, it will add new rows into the table.
I created a simple test and it works well. This is my CSV file stored in Azure Data Lake.
Then we can use a pipeline to copy this CSV file into an Azure SQL table (auto-creating the tables).
1. At the Get Metadata1 activity, set the dataset to the folder containing the CSV files,
and select First row as header at the dataset.
2. At the ForEach1 activity, iterate over the file list via the expression @activity('Get Metadata1').output.childItems.
3. Inside the ForEach1 activity, use a Copy data1 activity with the same data source as the Get Metadata1 activity. At the source tab, type in the dynamic content @item().name to get the file name.
At the sink tab, select Auto create table.
In the Azure SQL dataset, type in the schema name and the dynamic content @replace(item().name,'.csv','') as the table name, because this information is needed to create the table dynamically (see the sketch below).
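For reference, a sketch of the per-file settings inside ForEach1 (the dataset parameter names fileName and tableName, and the dbo schema, are assumptions for illustration):
Source dataset parameter fileName, value:      @item().name
Sink dataset schema:                           dbo (typed in)
Sink dataset table name parameter tableName:   @replace(item().name,'.csv','')
Copy data1 sink, Table option:                 Auto create table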
The debug result is as follows:
I'm trying to use Azure Data Factory to move the contents of an Azure SQL table that holds photo (JPEG) data into JPEG files held in Azure Blob storage. There does not seem to be a way to create binary files in Blob storage using ADF without the binary file being in a specific format like Avro or Parquet. I need to create 'raw' binary blobs.
I've been able to create Parquet files for each row in the SQL table, where the Parquet file contains columns for the Id, ImageType and Data (the varbinary that came from the SQL row). I cannot work out how to get the Data column directly into a binary file called "{id}.jpeg".
So far I have a Lookup activity that queries the SQL Photos table to get the Ids of the rows I want, which feeds a ForEach that executes a pipeline for every id. That pipeline uses the Id to query the Id, ImageType and Data from SQL and writes a Parquet file containing those 3 columns into a blob dataset.