I need to pick a timestamp value from a column 'created on' in a CSV file in ADLS. Later, from ADF, I want to query Azure SQL DB with something like delete from table where created on = 'time stamp'. Please help with how this could be achieved.
Here is how I reproduced this to fetch a selected value from the CSV in ADLS.
Create a linked service and a dataset for the source file.
Read the data from the source path with a Lookup activity.
A ForEach activity iterates over the values from the output of the Lookup: @activity('Lookup1').output.value
Inside the ForEach activity, use an Append Variable activity and append the value from the current ForEach item to an array variable.
That array variable is then addressed by index.
Use a Script activity to run the query against the database:
DELETE FROM dbo.test_table WHERE Created_on = '@{variables('Date_COL3')[4]}'
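As a variation on the same idea (not part of the original steps), the Script activity could also be placed inside the ForEach itself so that one row is deleted per iteration instead of indexing into the array variable afterwards. The column name 'created on' below is taken from the question and is an assumption about the actual CSV header:
-- Hedged sketch: ADF resolves the @{...} interpolation before the text reaches Azure SQL DB,
-- so the database only ever sees a literal timestamp string.
DELETE FROM dbo.test_table
WHERE Created_on = '@{item()['created on']}';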
I have a scenario where I copy data from an Azure storage account (a pipe-delimited CSV source file) to Azure Synapse using the ADF Copy activity. However, the pipeline is failing because three of the records have the special character "Ã" in one of the character fields. I tried different encodings (UTF-8, UTF-16 and Windows-1252), but none of them resolved the issue. I have also tried the direct copy route within Azure Synapse (COPY INTO) and get the same error. I am able to insert those three records manually with an INSERT INTO statement.
Is there a better way to handle this without manual inserts, for example by pre-converting that character before the copy, or through any available setting in ADF?
Please re-check the format settings for the source CSV dataset as given in this Microsoft documentation.
I reproduced this in my environment, and I am able to copy the CSV data with special characters into Synapse with the default UTF-8 encoding.
This is my source csv with special characters:
I have created a table named mytable in Synapse.
create table mytable (
firstname VARCHAR(32),
lastname VARCHAR(32)
)
WITH
(
DISTRIBUTION = HASH (firstname),
CLUSTERED COLUMNSTORE INDEX
)
GO
In the source, set the format settings as per the documentation above.
Here I have used the Copy command to do the copy. If you want the table to be created automatically, you can enable that in the sink.
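For the direct Synapse route mentioned in the question, a COPY INTO statement along these lines should behave the same way. This is only a sketch: the storage URL, container, and file name are placeholders, and the field terminator matches the pipe-delimited source described above.
-- Sketch only: placeholder storage account, container, and file name.
COPY INTO dbo.mytable (firstname, lastname)
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/source.csv'
WITH (
    FILE_TYPE = 'CSV',
    FIELDTERMINATOR = '|',   -- the question describes a pipe-delimited file
    ENCODING = 'UTF8',       -- keep the load encoding consistent with the file's actual encoding
    FIRSTROW = 2             -- skip the header row
);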
Copied data in Synapse table:
I'm looking for an effective way to upload the following array to a BigQuery table in this format:
BigQuery columns (example)
event_type: video_screen
event_label: click_on_screen
is_ready:false
time:202011231958
long:1
high:43
lenght:0
Array object:
[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]
I have thought of several options, but none of them seems to be the best practice.
Option A: Upload the file to Google Cloud Storage and then create a table on top of that bucket. This did not work because of the file format: BigQuery can't parse the array from the bucket.
Option B: Use my backend (Node.js) to change the file structure to CSV and upload it directly to BigQuery. This failed because of latency (the real array is much longer than my example).
Option C: Use Google Apps Script to get the array object and insert it into BigQuery. I didn't find simple code for this, and Google Cloud Storage has no API connected to Apps Script.
Has anyone dealt with such a case and can share their solution? What is the best practice for this case? If you have code for this, that would be great.
Load the file from GCS to BigQuery into a table with one single string column, so you get 100K rows and a single column.
Essentially you will have a table that holds a JSON value in a string.
Use JSON_EXTRACT_ARRAY to split the JSON array into elements,
then extract each position into its corresponding variable/column and write it to a table.
Here is a demo:
with t as (
  select '[["video_screen","click_on_screen","false","202011231958","1","43","0"],["buy","error","2","202011231807","1","6","0"],["sign_in","enter","user_details","202011231220","2","4","0"]]' as s
),
elements as (
  -- one row per inner array, each still a JSON string
  select e from t, unnest(json_extract_array(t.s)) e
)
select
  json_extract_scalar(e, '$[0]') as event_type,
  json_extract_scalar(e, '$[1]') as event_label
  -- positions $[2] .. $[6] yield the remaining columns the same way
from elements
the output is:
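To make the first step concrete as well, here is a hedged sketch of loading the raw file into a single-column staging table and then materializing all seven columns from the question. The dataset, table, and bucket names are placeholders, and the field delimiter is simply a character assumed never to occur in the data, so each line lands in one STRING column.
-- Sketch only: load each line of the GCS file into one STRING column (placeholder names).
load data into mydataset.raw_events (s STRING)
from files (
  format = 'CSV',
  field_delimiter = '\x10',  -- assumed never to appear in the data
  quote = '',                -- keep the JSON text verbatim
  uris = ['gs://my-bucket/events.json']
);
-- Sketch only: write the parsed positions to a destination table (column names from the question).
create or replace table mydataset.events as
select
  json_extract_scalar(e, '$[0]') as event_type,
  json_extract_scalar(e, '$[1]') as event_label,
  json_extract_scalar(e, '$[2]') as is_ready,
  json_extract_scalar(e, '$[3]') as time,
  json_extract_scalar(e, '$[4]') as long,
  json_extract_scalar(e, '$[5]') as high,
  json_extract_scalar(e, '$[6]') as lenght
from mydataset.raw_events,
unnest(json_extract_array(s)) as e;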
I am new to the Azure environment, and I am using Data Factory to copy data from a CSV file on Azure Blob Storage, which has three columns (id, age, birth date), to a table in Azure SQL Data Warehouse. The birth date is in the format "MM/dd/yyyy", I am using PolyBase to copy the data from the blob to my table in Azure DW, and the columns of the table are defined as (int, int, datetime).
I can copy my data if I use the "Bulk Insert" option in Data Factory, but it gives me an error when I choose the PolyBase copy. Changing the date format in the pipeline does not do any good either.
PolyBase copies successfully if I change the date format in my file to "yyyy/MM/dd".
Is there a way I can copy data from my blob to my table without having to change the date format in the source file to "yyyy/MM/dd"?
I assume you have created an external file format which you reference in your external table?
CREATE EXTERNAL FILE FORMAT has an option, DATE_FORMAT, to define how a date is represented; set it to the way your source data represents datetime values.
So, something like this:
CREATE EXTERNAL FILE FORMAT your_format
WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = '|',
        DATE_FORMAT = 'MM/dd/yyyy'
    )
);
You can find more about this at: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-external-file-format-transact-sql?view=sql-server-ver15
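For completeness, here is a hedged sketch of an external table that consumes this file format, using the three columns from the question. The data source name and location are placeholders that would need to exist in your environment.
-- Sketch only: my_blob_source and the location are placeholders.
CREATE EXTERNAL TABLE ext_people
(
    id INT,
    age INT,
    birth_date DATETIME
)
WITH
(
    LOCATION = '/input/',
    DATA_SOURCE = my_blob_source,
    FILE_FORMAT = your_format
);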
It seems this error is resolved now. I was giving the date format as 'MM/dd/yyyy', whereas Data Factory expected it to be just MM/dd/yyyy, without any quotes.
As per my understanding, I will summarize in a few points what I learned while copying data from Azure Blob to Azure SQL Data Warehouse with an MM/dd/yyyy date format:
1) If you are using the Azure portal to copy data from blob storage to Azure SQL Data Warehouse using the Data Factory copy option:
Create a copy data pipeline using Data Factory.
Specify your input data source and your destination data store.
Under field mappings, choose datetime for the column that contains the date, click the little icon on its right to bring up the custom date format field, and enter your date format without quotes, e.g. MM/dd/yyyy as in my case.
Run your pipeline and it should complete successfully.
2) You can use PolyBase directly by creating:
An external data source that specifies the location of your input file, e.g. a CSV file on blob storage in my case.
An external file format that specifies the delimiter and the custom date format, e.g. MM/dd/yyyy, of your input file.
An external table that defines all the columns present in your source file and uses the external data source and file format defined above.
You can then create your own tables as a select from the external table (CTAS), which is what Niels stated in his answer above; a rough sketch of that step is shown below. I used Microsoft SQL Server Management Studio for this process.
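A rough sketch of that CTAS step, building on the external table sketched earlier (names are placeholders):
-- Sketch only: load the external table into a regular distributed table via CTAS.
CREATE TABLE dbo.people
WITH
(
    DISTRIBUTION = ROUND_ROBIN,
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM ext_people;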
I am trying to load the ORC file format via PolyBase, but I am facing the problems below.
Problem 1:
I have a CSV file, and the code below converts it to ORC format, but it selects data from a permanent table. If I remove "as select * from dbo.test", then CREATE EXTERNAL TABLE does not work. The permanent table contains 0 records.
CREATE EXTERNAL TABLE test_orc
WITH
(
    LOCATION = '/TEST/',
    DATA_SOURCE = SIMPLE,
    FILE_FORMAT = TEST_ORC
)
AS SELECT * FROM dbo.test -- permanent table
Problem 2:
If I select data from test_orc, I get an "invalid postscript" error, so I removed my .csv file from the TEST directory. Is there any way to write the ORC conversion of the CSV to a different directory, like TEST2?
Problem 3:
If I select data from test_orc, the count is zero and I do not get any error.
SELECT COUNT(*) FROM test_orc -- count is zero
Expectation
I need to load the ORC files in the TEST directory into the dbo.test table.
Please share your thoughts on this issue.
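For what it's worth, here is a minimal, hedged sketch of the direction described under Expectation: once /TEST/ actually contains ORC files with data, the external table created by the CETAS above can be read like any other table. Note that CETAS only writes the result of its SELECT out to the location; it does not convert files already sitting there, which is why an empty dbo.test produces an empty ORC output.
-- Sketch only: read the ORC-backed external table and load it into the permanent table.
INSERT INTO dbo.test
SELECT * FROM test_orc;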