In an Azure Synapse pipeline I am setting up loading from a stage table into a DWH on an Azure Synapse SQL dedicated pool.
I have a source table where one column stores XML data as text (nvarchar(max)).
I need to parse this XML from every row into a set of columns and rows and load the result into the Azure Synapse SQL dedicated pool. The XML functions don't apply here because Azure Synapse does not support the xml data type.
The source table looks something like this:
source table
I need a result table like this:
needed result set
or like this:
needed result set another
I tried to use the Azure Synapse pipeline Data Flow Parse transformation, but for the third row it returns only the last element from the XML (only the row where CUSTOMERNO is 122).
Could someone please tell me how to parse the XML text into a set of rows?
Thanks in advance.
I created a table with an xml data type column in an Azure SQL database and inserted values into it.
students:
CREATE TABLE dbo.students (
studentId INT NOT NULL PRIMARY KEY,
studentName VARCHAR(20) NOT NULL,
request XML NOT NULL
)
insert into students (studentId, studentName, request)
values
(1, 'XXX', '<Customers><row><CUSTOMERNO>100</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>bill.gates@microsoft.com</EMAIL></row></Customers>'),
(2, 'YYY', '<Customers><row><CUSTOMERNO>101</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>bill.gates@microsoft.com</EMAIL></row></Customers>'),
(3, 'ZZZ', '<Customers><row><CUSTOMERNO>12</CUSTOMERNO><OPERATION>INSERT</OPERATION><EMAIL>bill.gates@microsoft.com</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>UPDATE</OPERATION><EMAIL>steve.jobs@apple.com</EMAIL></row><row><CUSTOMERNO>947</CUSTOMERNO><OPERATION>DELETE</OPERATION><EMAIL>steve.jobs@apple.com</EMAIL></row></Customers>');
Image for reference:
I created another table and retrieved the data from the xml data type column into that table using the code below:
CREATE TABLE dbo.studentTable (
studentId INT NOT NULL,
studentName VARCHAR(20) NOT NULL,
customerno INT NOT NULL,
operation VARCHAR(20) NOT NULL,
email VARCHAR(100) NOT NULL
)
INSERT INTO dbo.studentTable ( studentId,studentName, customerno, operation, email )
SELECT
s.studentId,
s.studentName,
c.c.value( '(CUSTOMERNO/text())[1]', 'INT' ) customerno,
c.c.value( '(OPERATION/text())[1]', 'VARCHAR(20)' ) operation,
c.c.value( '(EMAIL/text())[1]', 'VARCHAR(100)' ) email
FROM dbo.students s
CROSS APPLY s.request.nodes('Customers/row') c(c)
Output of table:
I did this in Azure SQL Database because the Azure Synapse dedicated SQL pool does not support the xml data type.
We can copy the above table to the Azure Synapse dedicated SQL pool using an Azure Synapse pipeline Copy activity.
I created a dedicated SQL pool in Azure Synapse, created a pipeline, and performed the Copy activity using the procedure below.
I created an Azure SQL Database dataset using an Azure SQL Database linked service.
Azure SQL Database linked service:
Azure SQL Database dataset:
Source of the Copy activity:
I created a table in the SQL pool using the code below:
CREATE TABLE dbo.studentTable (
studentId INT NOT NULL,
studentName VARCHAR(20) NOT NULL,
customerno INT NOT NULL,
operation VARCHAR(20) NOT NULL,
email VARCHAR(100) NOT NULL
)
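As a side note, tables in a dedicated SQL pool also accept an optional WITH clause for the distribution and index type; the sketch below is just the same DDL with the defaults (round-robin distribution, clustered columnstore index) spelled out explicitly, so it is optional.
-- optional variant of the same DDL with the dedicated pool defaults made explicit
CREATE TABLE dbo.studentTable (
    studentId INT NOT NULL,
    studentName VARCHAR(20) NOT NULL,
    customerno INT NOT NULL,
    operation VARCHAR(20) NOT NULL,
    email VARCHAR(100) NOT NULL
)
WITH ( DISTRIBUTION = ROUND_ROBIN, CLUSTERED COLUMNSTORE INDEX );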
Creating the dedicated SQL pool linked service:
Search for Azure Synapse dedicated SQL pool in the linked service options.
Image for reference:
Select it, click Continue, fill in the required details, and click OK.
Image for reference:
I set the sink to the Synapse dedicated SQL pool dataset and enabled the Bulk insert option.
I debugged the pipeline and it ran successfully.
Image for reference:
The table was copied successfully into the dedicated pool.
Image for reference:
I think what @Amar has called out may work if you want to use Azure SQL, but otherwise, since you are already using Synapse, you will have to explore Spark.
https://kontext.tech/article/1091/extract-value-from-xml-column-in-pyspark-dataframe
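For completeness, here is a minimal Spark SQL sketch of that approach (assuming the stage table has been loaded into a Spark table or view named students, with the XML text in a request column, mirroring the example tables above); the built-in xpath function returns one array element per <row>, and posexplode keeps the three arrays aligned by position:
-- sketch for a Synapse Spark pool notebook (%%sql); table and column names are assumptions
SELECT s.studentId,
       s.studentName,
       CAST(c.customerno AS INT)                                 AS customerno,
       xpath(s.request, 'Customers/row/OPERATION/text()')[c.pos] AS operation,
       xpath(s.request, 'Customers/row/EMAIL/text()')[c.pos]     AS email
FROM students s
LATERAL VIEW posexplode(xpath(s.request, 'Customers/row/CUSTOMERNO/text()')) c AS pos, customerno;
The flattened result can then be written to the dedicated SQL pool, for example with the Synapse Spark connector or a Copy activity.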
Related
I am trying to do a simple copy of data from a CSV in ADLS to Azure Synapse in an ADF pipeline. I have also set a pre-copy script to turn IDENTITY_INSERT ON, but the Copy activity still fails with "An explicit value for the identity column in table can only be specified when a column list is used and IDENTITY_INSERT is ON".
I know that if IDENTITY_INSERT is ON, Synapse expects the column names to be listed, but how do we do that in the Copy activity of ADF?
Or how can data be copied from a CSV to a table with an identity column in Azure Synapse?
When you are copying data into a table with an identity column, ADF automatically takes care of switching IDENTITY_INSERT on and off; there is no need to execute anything explicitly.
So bulk insert will work.
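For reference, the sketch below shows roughly what the error message is describing when you load by hand in T-SQL rather than through the Copy activity; dbo.TargetTable, dbo.StagingTable and their columns are hypothetical names, and an explicit column list is required together with IDENTITY_INSERT when you supply identity values yourself.
-- hypothetical tables; only needed when inserting explicit identity values manually
SET IDENTITY_INSERT dbo.TargetTable ON;

INSERT INTO dbo.TargetTable (Id, FirstName, LastName)  -- the column list is what the error asks for
SELECT Id, FirstName, LastName
FROM dbo.StagingTable;

SET IDENTITY_INSERT dbo.TargetTable OFF;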
Can someone let me know how to create a table in Azure Databricks from a table that exists on an Azure SQL server? (Assuming Databricks already has a JDBC connection to the SQL server.)
For example, the following will create a table if it doesn't exist from a location in my datalake.
CREATE TABLE IF NOT EXISTS newDB.MyTable USING delta LOCATION
'/mnt/dblake/BASE/Public/Adventureworks/delta/SalesLT.Product/'
I would like to do the same, but with the table existing on SQL Server.
Here is the basic solution for creating an external table over an Azure SQL table.
You can take the URL (connection string) from the Azure Portal.
create table if not exists mydb.mytable
using jdbc
options (
  url = 'jdbc:sqlserver://mysqlserver.database.windows.net:1433;database=mydb;user=myuser;password=mypassword;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;',
  dbtable = 'dbo.mytable'
)
Check the following links for additional options
https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-datasource.html
https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
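If a local Delta copy is preferred over querying SQL Server on every read, one option (a hedged sketch; the table name and path below are hypothetical) is to materialize the JDBC-backed table with a CTAS, mirroring the Delta example in the question:
-- sketch: copies the JDBC-backed table into an external Delta table
CREATE TABLE IF NOT EXISTS mydb.mytable_delta
USING delta
LOCATION '/mnt/dblake/BASE/Public/mytable_delta/'
AS SELECT * FROM mydb.mytable;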
I tried following the documentation and was able to migrate data from Azure Table storage to local storage, but when migrating the data from local storage to the Cosmos DB Table API I'm facing issues with the destination endpoint of the Table API. Does anyone know which destination endpoint to use? Right now I'm using the Table API endpoint from the Overview section.
cmd error
The problem I see here is that you are not using the table name correctly. TablesDB is not the table name. Please check the screenshot below for what should be used as the table name (in this case, mytable1 is the table name). So your command should be something like:
/Source:C:\myfolder\ /Dest:https://xxxxxxxx.table.cosmos.azure.com:443/mytable1/
Just reiterating that I followed the steps below and was able to migrate successfully:
Export from Azure Table storage to a local folder using the article below. The table name should match the name of the table in the storage account:
AzCopy /Source:https://xxxxxxxxxxx.table.core.windows.net/myTable/ /Dest:C:\myfolder\ /SourceKey:key
Export data from Table storage
Import from the local folder to the Azure Cosmos DB Table API using the command below, where the table name is the one we created in the Azure Cosmos DB Table API account, DestKey is the primary key, and the destination is copied exactly from the connection string with the table name appended:
AzCopy /Source:C:\myfolder\ /Dest:https://xxxxxxxx.table.cosmos.azure.com:443/mytable1/ /DestKey:key /Manifest:"myaccount_mytable_20140103T112020.manifest" /EntityOperation:InsertOrReplace
Output:
I have a basic Logic App that inserts records with SQL and it works fine, but here I need to truncate the table before inserting records.
Is this possible with Azure Logic Apps? If so, how can I achieve it?
Note: I am accessing the DB using an on-premises data gateway and the database is a Microsoft SQL Server one.
I need to truncate the table before inserting records. Is this possible with Azure Logic Apps?
Yes, you can do it in a Logic App without writing any stored procedure.
There is a built-in connector action under SQL called Execute a SQL query in the Logic App designer. You have to use that action.
If so, how can I achieve this?
You can write a raw SQL command there. See the example below.
I have a table like below:
CREATE TABLE AzureSqlTable(
[Id] [int] PRIMARY KEY IDENTITY(1,1) NOT NULL,
[FirstName] [nvarchar](max) NULL,
[LastName] [nvarchar](max) NULL,
[Email] [nvarchar](max) NULL
)
GO
Your Case:
In your case, I tried to INSERT data using the Execute a SQL query action; before the INSERT operation I just TRUNCATE the table, like below:
TRUNCATE TABLE AzureSqlTable
INSERT INTO AzureSqlTable VALUES('PassFirstNameParam','PassLastNameParam','PassEmailParam')
I have defined an HTTP Request JSON schema with my parameter values and pass them to the SQL query action. See the screenshot below:
On-premises operation:
For on-premises query execution you have to configure it like below:
Note: for the on-premises connection configuration you can refer to the official docs.
This is how you can TRUNCATE your table before the INSERT operation. As you already know how to configure the SQL connection, I haven't included that part.
Write a stored procedure on the SQL Server that truncates the table, and call it from Logic Apps via the SQL connector through the data gateway. After that, you can insert the data.
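A hedged sketch of what that stored procedure could look like (dbo.MyTable and the procedure name are placeholders); in the Logic App it would then be called with the SQL connector's Execute stored procedure action:
-- placeholder names; empties the target table before the Logic App inserts rows
CREATE PROCEDURE dbo.usp_TruncateMyTable
AS
BEGIN
    TRUNCATE TABLE dbo.MyTable;
END;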
Is there a way to create a Hive external table with its location pointing to Azure Storage? We actually want to connect SAP HANA (SDA) to Blob storage, so it seems the only way is to first create an external Hive table that points to Azure Blob storage and then use the Hive ODBC connector/Spark connector to connect it to SAP HANA. Does anyone have any idea how to achieve that?
You can create external tables in Hive or Spark on Azure. There are several options available:
Azure HDInsight
Azure Databricks (via Spark)
Hadoop distros supporting Azure Blob Storage (e.g. HDP)
External table creation would reference the data in the Blob storage account. See the following example for a Hive table created in HDInsight (wasb is used in the location):
CREATE EXTERNAL TABLE IF NOT EXISTS <database name>.<external textfile table name>
(
field1 string,
field2 int,
...
fieldN date
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '<field separator>'
LINES TERMINATED BY '<line separator>'
STORED AS TEXTFILE
LOCATION 'wasb:///<directory in Azure blob>'
TBLPROPERTIES("skip.header.line.count"="1");
or in Azure Databricks:
CREATE EXTERNAL TABLE IF NOT EXISTS my_table (name STRING, age INT)
COMMENT 'This table is created with existing data'
LOCATION 'wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory>'
See also:
HDInsight Documentation
Azure Databricks Documentation
I don't know what SAP supports. ODBC access is possible to all of the solutions.