I want to add a new column in Azure Synapse Analytics

I want to add a new datetime column to a file.
I would like to use the "Additional columns" feature in the copy activity source of the Synapse pipeline to store the date the file was copied.
However, when I use the datetime functions in Synapse, the return value is a string, so in the copy activity mapping the column can only be mapped as a string type. I want to store this as a datetime type.
Is there a better way to do this?
The error message is "The column was not found".
Also, the pipeline copies data SFTP > Raw > Processed > Published, and Raw/Processed/Published are Blob Storage.
When copying Raw > Processed it converts csv > Parquet.
The function set for the additional column is as follows:
@concat(formatDateTime(addToTime(utcNow(),9,'Hour'), 'yyyy-MM-dd'))
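As a side note (not from the original thread), the concat wrapper is redundant, and including the time portion keeps the full copy timestamp. Since copy activity additional columns surface as strings, one common pattern is to cast the value downstream, for example with a derived column in a mapping data flow. A minimal sketch, where copied_date is an assumed column name:

Additional column value (pipeline expression):
@formatDateTime(addToTime(utcNow(), 9, 'Hour'), 'yyyy-MM-dd HH:mm:ss')

Derived column in a mapping data flow (data flow expression):
toTimestamp(copied_date, 'yyyy-MM-dd HH:mm:ss')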

Related

How to copy the data from Append variable activity to a csv file using Azure Data Factory

I have an array of file names stored in an Append Variable activity. I want to store all these file names inside a .CSV file in the data lake location.
For more info, refer to this:
how to compare the file names that are inside a folder (Datalake) using ADF
In this repro, variable V1 (array type) is taken with values as in the image below.
A new variable v2 of string type is taken and its value is given as @join(variables('V1'),decodeUriComponent('%0A'))
This step joins all the strings of the array using \n (line feed).
Then a Copy activity is added, with a dummy source dataset that returns one row.
In Source, +New is selected and the value is given as the dynamic content @variables('v2').
A sink dataset is created for the CSV file.
In Mapping, Import schemas is clicked and, other than col1, all other columns are deleted.
Then the pipeline is debugged, and the values get loaded into the csv file.
Edited
Variable v2 stores all the missing file names (from the False activities of the If Condition).
After the ForEach, a Set Variable activity is added and variable v3 (string type) is set as
@join(variables('v2'),decodeUriComponent('%0A'))
Then, in the copy activity, a +New column is added in the source.
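A minimal sketch of that additional column (the column name is assumed, not from the original):

Additional column name:  missing_files
Additional column value: @variables('v3')

In Mapping, this new column is then mapped to the single column of the CSV sink, as in the repro above.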

Can we fetch a single row data from csv file in ADLS using Azure data factory

I need to pick a timestamp value from a column ‘created on’ in a csv file in ADLS. Later I want to query Azure SQL DB like delete from table where created on = ‘time stamp’ in ADF. Please help on how this could be achieved.
Here I repro'd fetching a selected row from the CSV in ADLS.
Create a Linked service and Dataset for the source file.
Read the data with a Lookup activity from the source path.
A ForEach activity iterates over the values from the output of the Lookup: @activity('Lookup1').output.value
Inside the ForEach activity, use an Append Variable activity and set the variable value from the ForEach item records.
The appended array variable is later accessed by index.
Use a Script activity to run the query and apply it to the data.
DELETE FROM dbo.test_table WHERE Created_on = '@{variables('Date_COL3')[4]}'
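Putting the steps above together, a minimal sketch of the expressions involved (the variable name comes from the answer, the column name from the question, and the [4] index just picks one row as in this repro):

ForEach items:          @activity('Lookup1').output.value
Append Variable value:  @item()['created on']
Script activity query:  DELETE FROM dbo.test_table WHERE Created_on = '@{variables('Date_COL3')[4]}'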

ADF - The data type SqlBigDecimal is not supported when writing from Money to Decimal column

I have a copy data activity in Azure Data Factory that takes the output of a stored procedure and then writes to a CSV file. I have Money columns (Precision: 19, Scale: 4) in the source that are converted into Decimal columns in the CSV sink. I'm getting an error that SqlBigDecimal is not supported, but the mapping looks good and it should convert the data to Decimal from Money, not BigDecimal.
I used to have the same problem when writing to a Parquet file. That issue got resolved by itself somehow; I don't know what exactly I did to resolve it.
"Failure happened on 'Sink' side. ErrorCode=DataTypeNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The data type SqlBigDecimal is not supported.,Source=Microsoft.DataTransfer.Common,'"
I created a simple test to sink money type column into a csv.
First, I created a table in Azure SQL
create table dbo.test(
id int,
salary money
);
insert into dbo.test values (1,3500.1234);
insert into dbo.test values (2,3465.1234);
Query the rows via a stored procedure in the Copy Activity. I didn't set any mapping or import any schema.
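For reference, a minimal sketch of such a stored procedure (the procedure name is assumed, not from the original):

CREATE PROCEDURE dbo.usp_get_test
AS
BEGIN
    -- return the money column as-is and let the copy activity handle the CSV sink conversion
    SELECT id, salary FROM dbo.test;
    -- if the SqlBigDecimal error persists, casting explicitly is another option to try:
    -- SELECT id, CAST(salary AS decimal(19, 4)) AS salary FROM dbo.test;
END;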
Then sink to an empty csv file.
It will create the csv file.
After I run debug, we can see the result:

ADF: Dataflow sink activity file format for adls

I wanted to copy multiple tables' data from an Azure SQL database to ADLS Gen2. I created a pipeline which takes table names as dynamic input values. Later I used a dataflow activity which copies the data to ADLS. I used the sink type delta. Now a few of my tables are getting copied to ADLS properly in snappy.parquet format, but a few are giving an error that the column names are invalid for delta format.
How can we deal with this error and get data copied from all tables?
Also, for knowledge, I wanted to know whether the files generated in the destination folder in ADLS are parquet files by default, or whether there is an option to change that.
Delta format is parquet underneath. You cannot use characters like " ,;{}()\n\t=" and have to replace them with _ or another character.
Dataflow has easy ways to rename column names in Derive or Select transforms.
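For example, a Select transform with a rule-based mapping can rename every column in one pass; a sketch (the character class mirrors the list above and can be adjusted):

Matching condition:  true()
Name as:             regexReplace($$, '[ ,;{}()\n\t=]', '_')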

Getting "Error converting data type VARCHAR to DATETIM"E while copying data from Azure blob to Azure DW through Polybase

I am new to the Azure environment and I am using Data Factory while trying to copy data present in a CSV file on Azure blob storage, which has three columns (id, age, birth date), to a table in Azure Data Warehouse. The birth date is of the format "MM/dd/yyyy" and I am using PolyBase to copy the data from blob to my table in Azure DW. The columns of the table are defined as (int, int, datetime).
I can copy my data if I use the "Bulk Insert" option in Data Factory, but it gives me an error when I choose the PolyBase copy. Also, changing the date format in the pipeline does not do any good either.
PolyBase copies successfully if I change the date format in my file to "yyyy/MM/dd".
Is there a way I can copy data from my blob to my table without having to change the date format in the source file to "yyyy/MM/dd"?
I assume you have created an external file format which you reference in your external table?
The CREATE EXTERNAL FILEFORMAT has an option to define how a date is represented: DATE_FORMAT, and you set that to how your source data represents datetime.
So something like so:
CREATE EXTERNAL FILE FORMAT your_format
WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = '|',
DATE_FORMAT = 'MM/dd/yyyy' )
);
You can find more about this at: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15
Seems like this error is resolved now. I was giving the date format as 'MM/dd/yyyy' whereas Data Factory expected it to be just MM/dd/yyyy, without any quotes.
So, as per my understanding, I will summarize what I learned while copying data from Azure blob to Azure SQL Data Warehouse with an MM/dd/yyyy date format, in a few points here:
1) If you are using the Azure portal to copy data from blob to Azure SQL Data Warehouse using the Data Factory copy option:
Create a copy data pipeline using Data Factory.
Specify your input data source and your destination data store.
Under field mappings, choose datetime for the column that contains the date, click the little icon on its right to bring up the custom date format field, and enter your date format without quotes, e.g. MM/dd/yyyy as in my case.
Run your pipeline and it should complete successfully.
2) You can use PolyBase directly by:
Creating an external data source that specifies the location of your input file, e.g. a csv file on blob storage in my case.
An external file format that specifies the delimiter and the custom date format, e.g. MM/dd/yyyy, in your input file.
An external table that defines all the columns present in your source file and uses the external data source and file format defined above.
You can then create your custom tables as select from the external table (CTAS), something which Niels stated in his answer above. I used Microsoft SQL Server Management Studio for this process.
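For completeness, a hedged T-SQL sketch of those PolyBase objects (object names, the storage location, and the credential are placeholders; the column list follows the three columns described in the question):

-- external data source pointing at the blob container that holds the CSV
CREATE EXTERNAL DATA SOURCE blob_source
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://yourcontainer@youraccount.blob.core.windows.net',
    CREDENTIAL = blob_credential  -- omit if the container allows anonymous access
);

-- external table over the CSV, reusing the file format with DATE_FORMAT = 'MM/dd/yyyy'
CREATE EXTERNAL TABLE dbo.ext_people (
    id int,
    age int,
    birth_date datetime
)
WITH (
    LOCATION = '/input/',
    DATA_SOURCE = blob_source,
    FILE_FORMAT = your_format
);

-- CTAS into a regular distributed table in the data warehouse
CREATE TABLE dbo.people
WITH (DISTRIBUTION = ROUND_ROBIN)
AS SELECT * FROM dbo.ext_people;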
