DelimitedTextMoreColumnsThanDefined in Azure Data Factory

After several successful pipelines which move .txt files from our Azure file share to our Azure SQL Server, I am experiencing problems with moving one specific file to a SQL Server table. I get the following error code:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'chauf_afw.txt' with row number 2: found more columns than expected column count 5.,Source=Microsoft.DataTransfer.Common,'
Azure Data Factory sees 5 columns on both the source and sink side. On both sides (source and sink) I have a schema with 5 columns; the final schema mapping looks like the following (schema screenshot).
The .txt file contains 6 columns when counting the tabs.
The source file is a UTF-8 .txt file with tab-separated data, nothing special, and in the same format as the other successfully imported files.
Regarding the delimiter, the file uses tabs; in Notepad++ it looks like this.
I am afraid I am missing something, but I can't find the cause of the error code.
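To pin down where the extra column comes from, it can help to count the tab-separated fields per row outside ADF first. Below is a minimal Python sketch (not part of the pipeline; it assumes chauf_afw.txt is available locally and that 5 columns are expected) that flags every row whose field count differs:

```python
# count_fields.py - flag rows whose tab-separated field count differs from the expected 5
import csv

EXPECTED = 5  # the column count defined in the ADF dataset

with open("chauf_afw.txt", encoding="utf-8", newline="") as f:
    for line_no, row in enumerate(csv.reader(f, delimiter="\t"), start=1):
        if len(row) != EXPECTED:
            print(f"row {line_no}: {len(row)} fields -> {row}")
```

Rows reported here typically contain an extra tab, either a trailing one or a tab embedded in a value, which is exactly what makes ADF see more columns than the defined 5.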

Related

Add column to CSV File from another CSV File (Azure Data Factory)

For example:
Persons.csv
name, last_name
-----------------------
jack, jack_lastName
luc, luc_lastname

FileExample.csv
id
243
123

Result:
name, last_name, exampleId
-------------------------------
jack, jack_lastName, 243
luc, luc_lastname, 123
I want to append any number of columns from another data source and insert the final result into a file or a database table.
I have tried many approaches, but I can't get it to work.
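For reference, the result above is a purely positional, row-by-row pairing rather than a key-based join. A minimal Python sketch of that pairing (it assumes both files hold just a header plus data rows in matching order; the dashed lines above are only part of the illustration):

```python
import csv

# Pair each row of Persons.csv with the id on the same row of FileExample.csv.
with open("Persons.csv", newline="") as persons_file, \
     open("FileExample.csv", newline="") as ids_file, \
     open("Result.csv", "w", newline="") as out:
    persons = csv.DictReader(persons_file, skipinitialspace=True)
    ids = csv.DictReader(ids_file, skipinitialspace=True)
    writer = csv.writer(out)
    writer.writerow(["name", "last_name", "exampleId"])
    for person, example in zip(persons, ids):
        writer.writerow([person["name"], person["last_name"], example["id"]])
```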
You can try to make use of the merge files copy behavior in an Azure Data Factory pipeline to merge two CSV files.
Select a Copy Data activity and, in the source, use the wildcard entry *.csv to pick up the CSV files in storage (configure the linked storage service to ADF in this process).
Then create an output CSV in the same container if required, as in my case, to merge the files into, naming it something like examplemerge.csv.
Check the first row as header option.
Validate and try to debug.
Then you should be able to see the merged data in the resulting merged file in the output folder.
You can check the reference video Merge Multiple CSV files to single CSV for more details, and also the video Load Multiple CSV Files to a Table in Azure Data Factory if required.
But if you want to join the files, there must be some common column to join on.
Also check this thread from Microsoft Q&A: Azure Data Factory merge 2 csv files with different schema.
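Note that the wildcard merge described above stacks files with the same schema row-wise; it does not add columns side by side. As a rough local illustration of what that merge produces (file and folder names are only placeholders), something like:

```python
import csv
import glob

# Append all CSVs with the same schema into one file, keeping a single header row.
with open("examplemerge.csv", "w", newline="") as out:
    writer = csv.writer(out)
    header_written = False
    for path in sorted(glob.glob("*.csv")):
        if path == "examplemerge.csv":
            continue  # do not re-read the output file
        with open(path, newline="") as src:
            reader = csv.reader(src)
            header = next(reader, None)
            if header is None:
                continue  # skip empty files
            if not header_written:
                writer.writerow(header)
                header_written = True
            writer.writerows(reader)  # assumes every file shares the same column order
```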

Split the file by transaction date through ADF

Using ADF, we unloaded data from an on-premises SQL Server to a data lake folder as a single Parquet file for the full load.
Then, for the delta load, we are keeping the current day's folder in a yyyy/mm/dd structure going forward.
But I want the full load file to also be separated into folders by the respective transaction day. For example: the full load file has 3 years of data, and I want the data split by transaction day into separate folders, like 2019/01/01, 2019/01/02, ..., 2020/01/01, instead of a single file.
Is there a way to achieve this in ADF, or can we get this folder structure for the full load while unloading itself?
Hi @Kumar AK, after a period of exploration I found the answer. I think we need to use an Azure data flow to achieve that.
My source file is a CSV file, which contains a transaction_date column.
Set this CSV as the source in the data flow.
In the DerivedColumn1 activity, we can generate a new column FolderName from the transaction_date column. FolderName will be used as the folder structure.
In the sink1 activity, select Name file as column data as the File name option, and select the FolderName column as Column data.
That's all. The rows of the CSV file will be split into files in different folders. The debug result is as follows:
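To make the effect of the DerivedColumn + sink settings concrete, here is a rough Python sketch of the same partitioning done locally (the input file name, output root and the yyyy-mm-dd format of transaction_date are assumptions; everything is held in memory, which is fine for a sketch):

```python
import csv
import os
from collections import defaultdict
from datetime import datetime

SOURCE = "full_load.csv"   # assumed input file
OUT_ROOT = "partitioned"   # assumed output root folder

# Group rows by their transaction day, mirroring the FolderName derived column.
rows_by_day = defaultdict(list)
with open(SOURCE, newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    fieldnames = reader.fieldnames
    for row in reader:
        day = datetime.strptime(row["transaction_date"], "%Y-%m-%d")
        rows_by_day[day.strftime("%Y/%m/%d")].append(row)

# Write one file per yyyy/mm/dd folder, e.g. partitioned/2019/01/01/part-000.csv.
for day_folder, rows in rows_by_day.items():
    folder = os.path.join(OUT_ROOT, day_folder)
    os.makedirs(folder, exist_ok=True)
    with open(os.path.join(folder, "part-000.csv"), "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```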

Data Factory cannot copy CSV with comma after last column to SQL Data Warehouse

I have CSV files that I want to copy from a blob to the DW; the CSV files have a comma after the last column (see the example below). Using ADF, I tried to copy the CSV files to a SQL table in the DW. However, I got this error, which I think is because of the last comma (as I have 15 columns):
A few rows of the CSV file:
Code,Last Trading Date,Bid Price,Bid Size,Ask Price,Ask Size,Last Price,Traded Volume,Open Price,High Price,Low Price,Settlement Price,Settlement Date,Implied Volatility,Last Trade Time,
BNH2021F,31/03/2021,37.750000,1,38.000000,1,,0,,,,37.750000,29/03/2021,,,
BNM2021F,30/06/2021,44.500000,6,44.700000,2,44.400000,4,44.300000,44.400000,44.300000,44.500000,29/03/2021,,15-55-47.000,
BNU2021F,30/09/2021,46.250000,2,47.000000,1,47.490000,2,47.490000,47.490000,47.490000,46.920000,29/03/2021,,15-59-10.000,
Note that the CSVs are the original files and I can't change them. I also tried different quote and escape characters in the dataset, and it didn't work.
Also, I want to do this using ADF, not Azure Functions.
I couldn't find any solution to this; please help.
Update:
It's interesting that the dataset preview works:
I think you can use a data flow to achieve that.
Azure Data Factory will interpret the last comma as a column with a null value, so we can use a Select activity to filter out the last column.
Set the mapping manually at the sink.
Then we can sink into our DW or SQL table.
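For clarity, the Select step is doing nothing more than dropping that trailing empty column before the sink. A standalone Python sketch of the same transformation (file names are placeholders; in ADF the fix stays inside the data flow as described above):

```python
import csv

# Keep only the first 15 fields; the 16th is the empty field created by the trailing comma.
with open("prices_raw.csv", newline="") as src, \
     open("prices_clean.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row[:15])
```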
The trailing comma means your file is parsed as 16 columns while your destination is expecting 15. Add another column to your DW table or remove the extra one from the CSV.
There is a simple solution to this.
Step 1: Uncheck the "First row as header" option in your source dataset.
Step 2: Sink it first to another CSV file. In the sink CSV dataset, import the schema as shown below. The Copy activity will create a new CSV file with a clean 15 columns, i.e. the last extra comma will not be present in the new CSV file.
Step 3: Copy from the newly created CSV file with "First row as header" checked and sink it to the DW.

Issue with CSV as a source in Data Factory

I have a CSV
"Heading","Heading","Heading",LF
"Data1","Data2","Data3",LF
"Data4","Data5","Data6",LF
For the above CSV, the row delimiter is LF.
The issue is the last comma. When I try to preview the data after setting the first row as header and skip rows as 0 in the source of the copy activity in Data Factory, it throws an error stating the last column is null.
If I remove the last comma, i.e.
"Heading","Heading","Heading"LF
"Data1","Data2","Data3"LF
"Data4","Data5","Data6"LF
It will work fine.
It's not possible to edit the CSVs, as each CSV may contain 500k records.
How can I solve this?
Additional details:
The CSV I am uploading (screenshot).
My Azure portal settings (screenshot).
Error message on preview data (screenshot).
If I remove the first row as header, I can see an empty column (screenshot).
Please try to set the Row delimiter to Line feed (\n).
I tested your sample CSV file and it works fine.
Output:
I tried to create the same file as you and reproduce your issue. It seems to be a check mechanism of ADF. You need to remove the First row as header selection to escape this check. If you do not want to do that, you have to preprocess your CSV files.
I suggest the two workarounds below.
1. Use an Azure Function HTTP trigger. You could pass the CSV file name as a parameter into the Azure Function, then use the Azure Blob Storage SDK to process your CSV file and cut the last comma (a rough sketch follows after these options).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook
2. Use Azure Stream Analytics. You could configure your blob storage as input and create another container as output, then use a SQL query to process your CSV data.
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal
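For the first workaround, a possible shape of the function body is sketched below in Python (the container names input and cleaned, the use of the AzureWebJobsStorage connection string, and the query parameter name are all assumptions, not a tested implementation):

```python
import os

import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    # The CSV blob name is passed as a query parameter, e.g. ?name=myfile.csv
    blob_name = req.params.get("name")
    if not blob_name:
        return func.HttpResponse("Pass the CSV blob name as ?name=...", status_code=400)

    service = BlobServiceClient.from_connection_string(os.environ["AzureWebJobsStorage"])
    source = service.get_blob_client(container="input", blob=blob_name)    # assumed container
    target = service.get_blob_client(container="cleaned", blob=blob_name)  # assumed container

    text = source.download_blob().readall().decode("utf-8")
    # Strip the trailing comma from every line so the last (empty) column disappears.
    cleaned = "\n".join(line.rstrip("\r").rstrip(",") for line in text.split("\n"))
    target.upload_blob(cleaned.encode("utf-8"), overwrite=True)

    return func.HttpResponse(f"Cleaned {blob_name}", status_code=200)
```

The copy to the DW would then read from the cleaned container instead of the original one.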

How to transfer data from Excel spreadsheet to flat file without column headers?

I've got a simple data flow in SSIS with an Excel source and a flat file destination. The headers in Excel are on the first row, and in SSIS I have 'first row has headers' ticked in the Excel connection manager.
In my flat file the data is being loaded and all the data looks correct, except that the headers from the Excel file are missing.
When I set up my flat file connection manager (FFCM), it was using the comma-delimited setting for the columns.
I checked the columns in the FFCM and all the columns were there.
After a few runs I noticed that I had not ticked 'Column names in the first data row' in the flat file connection manager. Now that I have done this, I get an error:
TITLE: Package Validation Error
ADDITIONAL INFORMATION:
Error at Data Flow Task [DTS.Pipeline]: "component "Flat File Destination" (487)" failed >validation and returned validation status "VS_NEEDSNEWMETADATA".
Error at Data Flow Task [DTS.Pipeline]: One or more component failed validation.
Error at Data Flow Task: There were errors during task validation.
(Microsoft.DataTransformationServices.VsIntegration)
So I unticked that again, but it made no difference.
I checked the columns in the FFCM and they are now set to column0, column1, column2, etc.
Also, when I run it, it puts out a number of lines of commas corresponding to the rows in the Excel sheet:
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
,,,,,,,,,,,,
I seem to be getting into a bit of a pickle and need some advice about what the problem may be.
It seems that you have lost the field mappings between your Excel Source and Flat File Destination since you last configured the values.
Unchecking and checking the box Column names in the first data row on the Flat File Connection Manager has renamed the actual column names of the flat file destination. These new columns should now be re-mapped on the Flat File Destination component.
If you notice the warning sign on your Flat File Destination, double-click it. You will receive a message similar to the one shown below.
On the Flat File Destination, you will notice the warning message "Map the column on the Mappings page" if the field mappings have been lost.
You will notice that the field mappings have been lost, and you need to click the Mappings page to configure the field mappings between source and destination.
I believe that this is the issue you are facing.
