Issue with CSV as a source in Data Factory (Azure)

I have a CSV:
"Heading","Heading","Heading",LF
"Data1","Data2","Data3",LF
"Data4","Data5","Data6",LF
For the above CSV, the row delimiter is LF.
The issue is the last comma. When I try to preview the data after setting the first row as header and skip rows to 0 in the source of the copy activity in Data Factory, it throws an error stating that the last column is null.
If I remove the last comma, i.e.
"Heading","Heading","Heading"LF
"Data1","Data2","Data3"LF
"Data4","Data5","Data6"LF
it works fine.
It's not possible to edit the CSVs, as each may contain 500k records.
How can I solve this?
Additional details:
The CSV I am uploading: [screenshot]
My Azure portal settings: [screenshot]
Error message on preview data: [screenshot]
If I remove the "first row as header" setting, I can see an empty column: [screenshot]
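For what it's worth, the trailing comma before each LF is itself a field separator, so a standard CSV parser yields a fourth, empty column, which is what ADF reports as null. A minimal Python sketch mirroring the sample above:

```python
import csv
from io import StringIO

# The comma before each LF acts as a field separator, so every row
# parses as four fields with an empty last value.
sample = '"Heading","Heading","Heading",\n"Data1","Data2","Data3",\n'
for row in csv.reader(StringIO(sample)):
    print(len(row), row)
# 4 ['Heading', 'Heading', 'Heading', '']
# 4 ['Data1', 'Data2', 'Data3', '']
```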

Please try setting the row delimiter to Line Feed (\n).
I tested your sample CSV file and it works fine.
Output: [screenshot]
I tried to create the same file as yours and reproduce your issue. It seems to be a validation mechanism of ADF. You need to clear the "first row as header" selection to escape this check. If you do not want to do that, you have to preprocess your CSV files.
I suggest the two workarounds below.
1. Use an Azure Function with an HTTP trigger. You could pass the CSV file name as a parameter into the Azure Function, then use the Azure Blob Storage SDK to process your CSV file and cut the last comma (a sketch of this approach follows the two links below).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook
2. Use Azure Stream Analytics. You could configure your blob storage as input and create another container as output, then use a SQL query to process your CSV data.
https://learn.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal
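A minimal sketch of workaround 1, assuming the Azure Functions Python model and the azure-storage-blob SDK; the connection string, container name, and query parameter are placeholders:

```python
import azure.functions as func
from azure.storage.blob import BlobServiceClient

# Placeholder settings; substitute your own storage account values.
CONNECTION_STRING = "<storage-connection-string>"
CONTAINER_NAME = "csv-container"

def main(req: func.HttpRequest) -> func.HttpResponse:
    # The CSV blob name arrives as a query parameter, e.g. ?name=data.csv
    blob_name = req.params.get("name")
    if not blob_name:
        return func.HttpResponse("Missing 'name' parameter.", status_code=400)

    service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
    blob = service.get_blob_client(container=CONTAINER_NAME, blob=blob_name)

    # The question states the row delimiter is LF, so splitting on "\n"
    # and stripping the trailing comma from each line is enough here.
    text = blob.download_blob().readall().decode("utf-8")
    cleaned = "\n".join(line.rstrip(",") for line in text.split("\n"))
    blob.upload_blob(cleaned.encode("utf-8"), overwrite=True)

    return func.HttpResponse(f"Cleaned {blob_name}.", status_code=200)
```

Note this loads the whole blob into memory; for files much larger than 500k rows a streaming rewrite would be safer.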

Related

DelimitedTextMoreColumnsThanDefined in Azure Data Factory

After several successful pipelines which move .txt files from my Azure file share to my Azure SQL Server, I am experiencing problems with moving one specific file to a SQL Server table. I get the following error code:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'chauf_afw.txt' with row number 2: found more columns than expected column count 5.,Source=Microsoft.DataTransfer.Common,'
Azure Data Factory sees 5 columns on both the sink and the source side. On both sides (source and sink) I have a schema with 5 columns; the final schema looks like the following: [schema screenshot]
The .txt file contains 6 columns when counting the tabs.
The source file is a UTF-8 .txt file with tab-separated data, nothing special, and in the same format as the other successfully imported files.
Regarding the delimiter: the file uses tabs, and in Notepad++ it looks like this: [screenshot]
I am afraid I am missing something, but I can't find the cause of the error code.
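For reference, a quick diagnostic sketch for counting the tab-separated fields per row (chauf_afw.txt is the file named in the error message):

```python
# Count tab-separated fields per row to find the rows that exceed
# the 5-column schema.
with open("chauf_afw.txt", encoding="utf-8") as f:
    for row_number, line in enumerate(f, start=1):
        field_count = line.rstrip("\r\n").count("\t") + 1
        if field_count != 5:
            print(f"row {row_number}: {field_count} fields")
```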

Cannot convert excel to csv : Azure Synapse Analytics

I want to convert Excel to CSV in Azure Synapse Analytics, but I got an error.
The error message is "Invalid excel header with empty value".
The Excel file I want to convert looks like this (created for the question), and I need to remove the blank column A when converting to CSV.
I have never used ADF before, so I don't know how to do this.
Can someone please tell me how?
Any help would be appreciated.
sample.excel: [screenshot]
You have to use data flows to do that in ADF.
First, create a linked service for your source dataset.
Then create a linked service for your target folder.
My input looks like this (taken from your attached sheet): [screenshot]
Go to the Author tab of Data Factory and select New data flow.
The source settings should look like this: [screenshot]
Source options: point to the location where you have stored the Excel sheet and select the sheet name; in my case it is sheet1 (for this example I have used Azure Blob Storage). [screenshot]
Keep the rest of the tabs at their defaults and add a sink to your data flow.
The sink settings should look like this: [screenshot]
Point to the target location where you want to store your CSV file (I have used Azure Blob Storage) and keep the rest of the settings at their defaults.
Create a new pipeline, pull a Data Flow activity onto the canvas, and trigger your data flow.
My output in CSV looks like this: [screenshot]
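If you ever need the same conversion outside ADF, a minimal pandas sketch would do it (hypothetical file and sheet names; the all-blank column A is dropped before writing):

```python
import pandas as pd

# Hypothetical names: adjust the file path and sheet name to your own.
df = pd.read_excel("sample.xlsx", sheet_name="sheet1")
df = df.dropna(axis="columns", how="all")  # drop entirely blank columns, e.g. column A
df.to_csv("sample.csv", index=False)
```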

Data Factory cannot copy `csv` with comma after last column to sql data warehouse

I have CSV files that I want to copy from a blob to DW. The CSV files have a comma after the last column (see the example below). Using ADF, I tried to copy the CSV files to a SQL table in DW. However, I got this error, which I think is because of the last comma (as I have 15 columns):
A few rows of the CSV file:
Code,Last Trading Date,Bid Price,Bid Size,Ask Price,Ask Size,Last Price,Traded Volume,Open Price,High Price,Low Price,Settlement Price,Settlement Date,Implied Volatility,Last Trade Time,
BNH2021F,31/03/2021,37.750000,1,38.000000,1,,0,,,,37.750000,29/03/2021,,,
BNM2021F,30/06/2021,44.500000,6,44.700000,2,44.400000,4,44.300000,44.400000,44.300000,44.500000,29/03/2021,,15-55-47.000,
BNU2021F,30/09/2021,46.250000,2,47.000000,1,47.490000,2,47.490000,47.490000,47.490000,46.920000,29/03/2021,,15-59-10.000,
Note that the CSVs are the original files and I can't change them. I also tried different quote and escape characters in the dataset, and it didn't work.
Also, I want to do this using ADF, not Azure Functions.
I couldn't find any solution to this; please help.
Update:
It's interesting that the dataset preview works:
I think you can use a data flow to achieve that.
Azure Data Factory will interpret the last comma as a column with a null value, so we can use a Select transformation to filter out that last column.
Set the mapping manually at the sink.
Then we can sink to our DW or SQL table.
Because of the trailing comma, your source is effectively supplying 16 columns while your destination expects 15. Add another column to your DW or remove the extra column from the mapping.
There is a simple solution to this.
Step 1: Uncheck the "First row as header" option in your source dataset.
Step 2: Sink it first to another CSV file. In the sink CSV dataset, import the schema as shown below. The copy activity will create a new CSV file with all 15 columns clean, i.e. the last extra comma will not be present in the new file.
Step 3: Copy from the newly created CSV file with "First row as header" checked, and sink it to DW.

deleting rows in azure data flow

I am trying to clean a data frame in Azure Data Flow using the alter row operation. I have created a blob linked service with a CSV file (5 columns), then created a data flow as follows (please refer to the attached image): [screenshot]
As you can see in the third image, AlterRow still contains zero columns and is not picking up the columns from the source file. Can anyone tell me why this is happening?
As Mark Kromer suggests, you can delete your AlterRow1 and add a new AlterRow. If that doesn't work, try doing this in a different browser or clearing your browser cache. It looks the same as this question.

Azure Stream Analytics no output header in CSV required

We have an ASA job which outputs data every hour to a CSV blob file. However, it also writes a header row, which we do not require.
How do I configure the ASA job so that the header row is not generated in the output CSV blob file?
From the official documentation, there is no option to configure CSV output without a column header row in Azure Stream Analytics. Is there a special requirement behind removing the header row from your CSV file? Could you change your output format to JSON or Avro? If you provide your detailed requirements, we can better help you.
