Getting an error when storing the filename in an Azure Data Flow?

I receive an Excel file in the data lake and I am exporting it into an Azure SQL database using a Data Flow in ADF.
I need to store the filename as a column in my data. I am following the steps below:
I am giving the column name "filename" in the "Column to store file name" section.
I am able to see all the columns, including my new column "filename", in the Projection and Inspect sections. However, when I try to preview the data, I get the error below.
I am not sure what the issue is. I changed the column name but had no success. Could anyone advise what the issue is?

As the error message states, there must already be a column named "filename" in the source file schema.
Your settings look correct. If you are still facing the same error after changing the column name, try refreshing your dataset, or remove and re-add the source dataset. This refreshes the source schema to the latest version.
If the file is inside a subfolder, the filename column stores the full file path, including subfolders and the filename (e.g. /subfolder/filename).
In this case, extract only the filename from the filename column using a derived column transformation, for example as sketched below.
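A minimal sketch of such a derived column expression, assuming the path column is named filename, the separator is /, and noting that array indexes in the data flow expression language are 1-based (so size(...) selects the last segment):

split(filename, '/')[size(split(filename, '/'))]

This keeps only the last path segment, i.e. the bare file name.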

Related

Add file name to Copy activity in Azure Data Factory

I want to copy data from a CSV file (source) on Blob storage to an Azure SQL Database table (sink) via a regular Copy activity, but I also want to copy the file name alongside every entry into the table. I am new to ADF, so the solution is probably easy, but I have not been able to find the answer in the documentation or on the internet so far.
My mapping currently looks like this (I have created an output table with a file name column, but this data is not explicitly defined at the column level in the CSV file, therefore I need to extract it from the metadata and map it to the column):
At first, I thought I would put dynamic content in there and solve the problem that way, but there is no option to use dynamic content in each individual box, so I do not know how to implement the solution. My next thought was to use a pre-copy script, but I have not seen how I could use it for this purpose. What is the best way to solve this issue?
In the mapping columns of the copy activity, you cannot add dynamic content from the Get Metadata activity.
First give the source CSV dataset to the Get Metadata activity, then chain it to the copy activity as below.
You can add the file name column via Additional columns in the copy activity source itself, by giving the dynamic content of the Get Metadata activity (which uses the same source CSV dataset):
@activity('Get Metadata1').output.itemName
If you are sure about the data types of your data, there is no need to touch the mapping; you can execute your pipeline directly.
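In the copy activity source JSON, the additional column would look roughly like this (a sketch; the source type, the activity name Get Metadata1 and the column name filename are assumptions based on the setup above):

"source": {
    "type": "DelimitedTextSource",
    "additionalColumns": [
        {
            "name": "filename",
            "value": {
                "value": "@activity('Get Metadata1').output.itemName",
                "type": "Expression"
            }
        }
    ]
}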
Here I am copying the contents of the samplecsv.csv file to a SQL table named output.
My output for your reference:

Create list of files in Azure Storage and send it to sql table using ADF

I need to copy the file names of the Excel files that sit in my Azure Storage as blobs and then put these names into a SQL Server table using ADF. A file path would also do as the name of a file, but the hard part is that in the dataset that takes all the files from one specific folder I have to select a sheet name, and these sheet names differ per file, so it returns an error. Is there a way to create a collective dataset without indicating the sheet name?
So, if I understand your question correctly, you are looking for a way to write all Excel file names to a SQL database using ADF.
You can use the generic Get Metadata activity with a binary dataset as the source. Select Child items as a field to retrieve; this will list all files in the folder. Then add a Filter activity to keep only the Excel file types, for example as sketched below.
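A minimal sketch of that Filter activity, assuming the Get Metadata activity is named Get Metadata1 and the workbooks use the .xlsx extension:

Items:     @activity('Get Metadata1').output.childItems
Condition: @endswith(item().name, '.xlsx')

The filtered list of names can then be written to the SQL table.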
Hope that this gets you on the right track.

Split the file by transaction date through ADF

Using ADF, we unloaded data from an on-premises SQL Server to a data lake folder as a single Parquet file for the full load.
Then, for the delta load, we are keeping the current day's folder in a yyyy/mm/dd structure going forward.
But I want the full load file also separated into the respective transaction day's folder. For example, the full load file has 3 years of data; I want the data split by transaction day into separate folders, like 2019/01/01, 2019/01/02, ..., 2020/01/01, instead of a single file.
Is there a way to achieve this in ADF, or can we get this folder structure for the full load during the unload itself?
Hi @Kumar AK, after a period of exploration, I found the answer. I think we need to use an Azure data flow to achieve that.
My source file is a CSV file that contains a transaction_date column.
Set this CSV as the source in the data flow.
In the DerivedColumn1 transformation, we can generate a new column FolderName from the transaction_date column. FolderName will be used as the folder structure.
In the sink1 transformation, select "Name file as column data" as the File name option and select the FolderName column as the column data, for example as sketched below.
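A minimal sketch of the DerivedColumn1 expression for FolderName, assuming transaction_date is a date or timestamp column and part.csv is just a placeholder file name:

concat(toString(transaction_date, 'yyyy'), '/', toString(transaction_date, 'MM'), '/', toString(transaction_date, 'dd'), '/part.csv')

Because the sink names files from this column, the / separators in the value produce the yyyy/MM/dd folder structure, and rows sharing a transaction date end up in the same file.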
That's all. The rows of the CSV file will be split into files in different folders. The debug result is as follows:

Add a date column in Azure Data Factory

I am wondering if it is possible to add a date column to each file uploaded.
For example, each month a CSV is produced. I want to add, for example, "December 2020" to each row, and then for the next month's upload add "January 2021" to every row in the CSV file, before copying this into a SQL database.
E.g. for the file name "Latest Rating December 2020" I would want 'December 2020' as a column with the same value for all rows. The naming convention will be the same for each month's upload.
Thanks
I've created a test to add a column to the csv file.
The result is as follows:
1. We can get the file name via Child items in the Get Metadata activity. The dataset points to the container in ADLS.
2. Then we can declare a variable FileName to store the file name via the expression @activity('Get Metadata1').output.childItems[0].name.
3. We can use an additional column in the Copy activity, with the expression @concat(split(variables('FileName'),' ')[2],' ',split(variables('FileName'),' ')[3]) to get the value we need. Note that the quoted string ' ' contains a space (see the worked example below).
4. In the dataset, we need to key in the dynamic content @variables('FileName') to specify which file is to be copied.
5. The sink is the same as the source in the Copy activity.
Then we can run debug to confirm it.
I think we can also copy into the SQL table directly by setting the sink to a SQL table.
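As a worked example of how that expression resolves, assume the blob is named "Latest Rating December 2020.csv" (an assumed name following the question's convention):

split(variables('FileName'), ' ')        returns ["Latest", "Rating", "December", "2020.csv"]
split(variables('FileName'), ' ')[2]     returns "December" (indexes are zero-based)
split(variables('FileName'), ' ')[3]     returns "2020.csv"

If the blob name carries a .csv extension like this, it ends up in the last piece; wrapping the file name in replace(variables('FileName'), '.csv', '') before splitting strips it, so the additional column becomes "December 2020".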

Issue with CSV as a source in Data Factory

I have a CSV
"Heading","Heading","Heading",LF
"Data1","Data2","Data3",LF
"Data4","Data5","Data6",LF
For the above CSV the row delimiter is LF.
The issue is the last comma. When I try to preview data after setting first row as header and skip rows as 0 in the source of the copy activity in Data Factory, it throws an error stating the last column is null.
If I remove the last comma, i.e.
"Heading","Heading","Heading"LF
"Data1","Data2","Data3"LF
"Data4","Data5","Data6"LF
It will work fine.
It's not possible to edit the CSVs manually, as each CSV may contain 500k records.
How to solve this?
Additional details (screenshots omitted):
CSV I am uploading
My Azure portal settings
Error message on preview data
If I remove the first row as header, I can see an empty column
Please try to set the Row delimiter to Line Feed (\n), for example as in the sketch below.
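In the DelimitedText dataset JSON, that setting corresponds to something like the following (a sketch showing only the relevant typeProperties):

"typeProperties": {
    "columnDelimiter": ",",
    "rowDelimiter": "\n",
    "firstRowAsHeader": true
}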
I tested your sample csv file and it works fine.
output:
I tried to create the same file as you and reproduced your issue. It seems to be a check mechanism of ADF. You need to clear the "first row as header" selection to escape this check. If you do not want to do that, you have to preprocess your CSV files.
I suggest the two workarounds below.
1. Use an Azure Function HTTP trigger. You could pass the CSV file name as a parameter into the Azure Function, then use the Azure Blob Storage SDK to process your CSV file and cut the last comma.
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook
2. Use Azure Stream Analytics. You could configure your blob storage as input and create another container as output, then use a SQL query to process your CSV data.
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal
