I need to pick a timestamp value from a column 'created on' in a CSV file in ADLS. Later, from ADF, I want to query Azure SQL DB with something like delete from table where created on = 'time stamp'. Please help with how this could be achieved.
Here is how I reproduced this to fetch a selected value from the CSV in ADLS.
Create a linked service and a dataset for the source file.
Read the data from the source path with a Lookup activity.
A ForEach activity iterates over the values from the output of the Lookup: @activity('Lookup1').output.value
Inside the ForEach activity, use an Append Variable activity and append the value from the current ForEach item to an array variable.
That array variable is then addressed by index.
Use a Script activity to run the query against the database:
DELETE FROM dbo.test_table WHERE Created_on = '@{variables('Date_COL3')[4]}'
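As a variation on the same idea (not part of the original steps), the Script activity could also be placed inside the ForEach itself so that one row is deleted per iteration instead of indexing into the array variable afterwards. The column name 'created on' below is taken from the question and is an assumption about the actual CSV header:
-- Hedged sketch: ADF resolves the @{...} interpolation before the text reaches Azure SQL DB,
-- so the database only ever sees a literal timestamp string.
DELETE FROM dbo.test_table
WHERE Created_on = '@{item()['created on']}';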
I have a scenario where I copy data from an Azure storage account (a pipe-delimited CSV source file) to Azure Synapse using the ADF Copy activity. However, the pipeline is failing because three of the records have the special character "Ã" in one of the character fields. I tried different encodings (UTF-8, UTF-16 and Windows-1252), but none of them resolved the issue. I have also tried the direct copy route within Azure Synapse (COPY INTO) and get the same error. I am able to insert those three records manually with an INSERT INTO statement.
Is there a better way to handle this without manual inserts, for example by pre-converting that character before the copy, or through any available setting in ADF?
Please re-check the format settings for the source CSV dataset as given in this Microsoft documentation.
I reproduced this in my environment, and I am able to copy the CSV data with special characters into Synapse with the default UTF-8 encoding.
This is my source csv with special characters:
I have created a table named mytable in Synapse.
create table mytable (
firstname VARCHAR(32),
lastname VARCHAR(32)
)
WITH
(
DISTRIBUTION = HASH (firstname),
CLUSTERED COLUMNSTORE INDEX
)
GO
In the source, set the format settings as per the documentation above.
Here I have used the Copy command to do the copy. If you want the table to be created automatically, you can enable that in the sink.
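For the direct Synapse route mentioned in the question, a COPY INTO statement along these lines should behave the same way. This is only a sketch: the storage URL, container, and file name are placeholders, and the field terminator matches the pipe-delimited source described above.
-- Sketch only: placeholder storage account, container, and file name.
COPY INTO dbo.mytable (firstname, lastname)
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/source.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',   -- the question describes a pipe-delimited file
    ENCODING = 'UTF8',       -- keep the load encoding consistent with the file's actual encoding
    FIRSTROW = 2             -- skip the header row
);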
Copied data in Synapse table:
I'm looking for an effective way to upload the following array to a BigQuery table in this format:
BigQuery columns (example)
event_type: video_screen
event_label: click_on_screen
is_ready:false
time:202011231958
long:1
high:43
lenght:0
Array object:
[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]
I have thought of several options, but none of them seems to be the best practice.
Option A: Upload the file to Google Cloud Storage and then create a table on top of that bucket. This did not work because of the file format: BigQuery can't parse the array from the bucket.
Option B: Use my backend (Node.js) to change the file structure to CSV and upload it directly to BigQuery. This failed because of latency (the real array is much longer than my example).
Option C: Use Google Apps Script to get the array object and insert it into BigQuery. I didn't find simple code for this, and Google Cloud Storage has no API connected to Apps Script.
Has anyone dealt with such a case and can share their solution? What is the best practice for this case? If you have code for this, that would be great.
Load the file from GCS to BigQuery into a table with one single string column, so you get 100K rows and a single column.
Essentially you will have a table that holds a JSON value in a string.
Use JSON_EXTRACT_ARRAY to split the JSON array into elements,
then extract each position into its corresponding variable/column and write it to a table.
Here is a demo:
with t as (
  select '[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]' as s
),
elements as (
  -- one row per inner array, each still a JSON string
  select e from t, unnest(json_extract_array(t.s)) e
)
select
  json_extract_scalar(e, '$[0]') as event_type,
  json_extract_scalar(e, '$[1]') as event_label
  -- positions $[2] .. $[6] yield the remaining columns the same way
from elements
the output is:
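To make the first step concrete as well, here is a hedged sketch of loading the raw file into a single-column staging table and then materializing all seven columns from the question. The dataset, table, and bucket names are placeholders, and the field delimiter is simply a character assumed never to occur in the data, so each line lands in one STRING column.
-- Sketch only: load each line of the GCS file into one STRING column (placeholder names).
load data into mydataset.raw_events (s STRING)
from files (
  format = 'CSV',
  field_delimiter = '\x10',  -- assumed never to appear in the data
  quote = '',                -- keep the JSON text verbatim
  uris = ['gs://my-bucket/events.json']
);
-- Sketch only: write the parsed positions to a destination table (column names from the question).
create or replace table mydataset.events as
select
  json_extract_scalar(e, '$[0]') as event_type,
  json_extract_scalar(e, '$[1]') as event_label,
  json_extract_scalar(e, '$[2]') as is_ready,
  json_extract_scalar(e, '$[3]') as time,
  json_extract_scalar(e, '$[4]') as long,
  json_extract_scalar(e, '$[5]') as high,
  json_extract_scalar(e, '$[6]') as lenght
from mydataset.raw_events,
unnest(json_extract_array(s)) as e;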
I am new to the Azure environment, and I am using Data Factory to copy data from a CSV file on Azure Blob Storage, which has three columns (id, age, birth date), to a table in Azure SQL Data Warehouse. The birth date is in the format "MM/dd/yyyy", I am using PolyBase to copy the data from the blob to my table in Azure DW, and the columns of the table are defined as (int, int, datetime).
I can copy my data if I use the "Bulk Insert" option in Data Factory, but it gives me an error when I choose the PolyBase copy. Changing the date format in the pipeline does not do any good either.
PolyBase copies successfully if I change the date format in my file to "yyyy/MM/dd".
Is there a way I can copy data from my blob to my table without having to change the date format in the source file to "yyyy/MM/dd"?
I assume you have created an external file format which you reference in your external table?
CREATE EXTERNAL FILE FORMAT has an option, DATE_FORMAT, to define how a date is represented; set it to the way your source data represents datetime values.
So, something like this:
CREATE EXTERNAL FILE FORMAT your_format
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',
        DATE_FORMAT = 'MM/dd/yyyy'
    )
);
You can find more about this at: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15
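For completeness, here is a hedged sketch of an external table that consumes this file format, using the three columns from the question. The data source name and location are placeholders that would need to exist in your environment.
-- Sketch only: my_blob_source and the location are placeholders.
CREATE EXTERNAL TABLE ext_people
(
    id INT,
    age INT,
    birth_date DATETIME
)
WITH
(
    LOCATION = '/input/',
    DATA_SOURCE = my_blob_source,
    FILE_FORMAT = your_format
);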
It seems this error is resolved now. I was giving the date format as 'MM/dd/yyyy', whereas Data Factory expected it to be just MM/dd/yyyy, without any quotes.
As per my understanding, I will summarize in a few points what I learned while copying data from Azure Blob to Azure SQL Data Warehouse with an MM/dd/yyyy date format:
1) If you are using the Azure portal to copy data from blob storage to Azure SQL Data Warehouse using the Data Factory copy option:
Create a copy data pipeline using Data Factory.
Specify your input data source and your destination data store.
Under field mappings, choose datetime for the column that contains the date, click the little icon on its right to bring up the custom date format field, and enter your date format without quotes, e.g. MM/dd/yyyy as in my case.
Run your pipeline and it should complete successfully.
2) You can use PolyBase directly by creating:
An external data source that specifies the location of your input file, e.g. a CSV file on blob storage in my case.
An external file format that specifies the delimiter and the custom date format, e.g. MM/dd/yyyy, of your input file.
An external table that defines all the columns present in your source file and uses the external data source and file format defined above.
You can then create your own tables as a select from the external table (CTAS), which is what Niels stated in his answer above; a rough sketch of that step is shown below. I used Microsoft SQL Server Management Studio for this process.
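A rough sketch of that CTAS step, building on the external table sketched earlier (names are placeholders):
-- Sketch only: load the external table into a regular distributed table via CTAS.
CREATE TABLE dbo.people
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM ext_people;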
I am trying to load the ORC file format via PolyBase, but I am facing the problems below.
Problem 1:
I have a CSV file, and the code below converts it to ORC format, but it selects data from a permanent table. If I remove "as select * from dbo.test", then CREATE EXTERNAL TABLE does not work. The permanent table contains 0 records.
CREATE EXTERNAL TABLE test_orc
WITH
(
    LOCATION = '/TEST/',
    DATA_SOURCE = SIMPLE,
    FILE_FORMAT = TEST_ORC
)
AS SELECT * FROM dbo.test -- permanent table
Problem 2:
If I select data from test_orc, I get an "invalid postscript" error, so I removed my .csv file from the TEST directory. Is there any way to write the ORC conversion of the CSV to a different directory, like TEST2?
Problem 3:
If I select data from test_orc, the count is zero and I do not get any error.
SELECT COUNT(*) FROM test_orc -- count is zero
Expectation
I need to load the ORC files in the TEST directory into the dbo.test table.
Please share your thoughts on this issue.
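For what it's worth, here is a minimal, hedged sketch of the direction described under Expectation: once /TEST/ actually contains ORC files with data, the external table created by the CETAS above can be read like any other table. Note that CETAS only writes the result of its SELECT out to the location; it does not convert files already sitting there, which is why an empty dbo.test produces an empty ORC output.
-- Sketch only: read the ORC-backed external table and load it into the permanent table.
INSERT INTO dbo.test
SELECT * FROM test_orc;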