Azure Blob to Azure SQL tables creation

I am trying to load blob files into Azure SQL Database tables using BULK INSERT.
Here is the reference from Microsoft:
https://azure.microsoft.com/en-us/updates/preview-loading-files-from-azure-blob-storage-into-sql-database/
My data in the CSV file looks like this:
100,"37415B4EAF943043E1111111A05370E","ONT","000","S","ABCDEF","AB","001","000002","001","04","20110902","11111111","20110830152048.1837780","",""
My blob container has a public access level.
Step 1: Created a storage credential using a shared access signature (SAS token) that I had generated.
CREATE DATABASE SCOPED CREDENTIAL Abablobv1BlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2017-07-29&ss=bfqt&srt=sco&sp=rwdlacup&se=2018-04-10T18:05:55Z&st=2018-04-09T10:05:55Z&sip=141.6.1.0-141.6.1.255&spr=https&sig=XIFs1TWafAakQT3Ig%3D';
GO
Step 2: Created an EXTERNAL DATA SOURCE that references the storage credential.
CREATE EXTERNAL DATA SOURCE Abablobv1BlobStorage
WITH ( TYPE = BLOB_STORAGE, LOCATION = 'https://abcd.blob.core.windows.net/', CREDENTIAL = Abablobv1BlobStorageCredential );
GO
Step 3: Ran the BULK INSERT statement using the external data source and the target table.
BULK INSERT dbo.TWCS
FROM 'TWCSSampleData.csv'
WITH ( DATA_SOURCE = 'Abablobv1BlobStorage', FORMAT = 'CSV');
GO
I am facing this error:
Bad or inaccessible location specified in external data source
"Abablobv1BlobStorage".
Does anyone have any idea about this?
I changed the LOCATION of the external data source to 'abcd.blob.core.windows.net/invoapprover/SampleData.csv'. Now I get: Cannot bulk load because the file "SampleData.csv" could not be opened. Operating system error code 5 (Access is denied.). This happens for both BULK INSERT and OPENROWSET. I was not sure which access should be changed, because the file is in an Azure blob and not on my machine. Any ideas on this?

Please try the following query
SELECT * FROM OPENROWSET(
BULK 'TWCSSampleData.csv',
DATA_SOURCE = 'Abablobv1BlobStorage',
SINGLE_CLOB) AS DataFile;
Check whether the file is located inside a container in the blob storage. In that case you need to specify the container in the LOCATION parameter of the external data source. If you have a container named "files", then the location should be 'https://abcd.blob.core.windows.net/files'.
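For example, if the CSV actually sits in a container named invoapprover, as the path in the question suggests, the data source and BULK INSERT might look like the sketch below (the data source name here is made up, and remember the SAS secret must be supplied without the leading '?'):
CREATE EXTERNAL DATA SOURCE Abablobv1BlobStorageV2
WITH ( TYPE = BLOB_STORAGE,
LOCATION = 'https://abcd.blob.core.windows.net/invoapprover', -- container only, no trailing slash
CREDENTIAL = Abablobv1BlobStorageCredential );
GO
-- the FROM clause is then just the blob path relative to that container
BULK INSERT dbo.TWCS
FROM 'SampleData.csv'
WITH ( DATA_SOURCE = 'Abablobv1BlobStorageV2', FORMAT = 'CSV');
GO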
More examples of bulk import are available in the Microsoft documentation.

Related

Error when creating view on pipeline (problem with BULK path)

Good morning everybody!
My team and I managed to create part of an Azure Synapse pipeline which selects the database and creates a data source named 'files'.
Now we want to create a view in the same pipeline using a Script activity. However, this error comes up:
Error message here
Even if we hardcode the folder names and the file name in the path, the pipeline won't recognise the existence of the file in question.
This is our query. If we run it manually as a script in the Develop section, everything works smoothly:
CREATE VIEW query here
We expected to get every file with the ".parquet" extension inside every folder available in our data source named 'files'. However, running this query from the Azure Synapse pipeline doesn't work, while running it as a script in the Develop section works perfectly. We want to achieve that result from the pipeline.
Could anyone help us out?
Thanks in advance!
I tried to reproduce the same thing in my environment and got the same error.
The cause of the error can be that your Synapse service principal, or the user who is accessing the storage account, does not have the Storage Blob Data Contributor role assigned, or that your external data source has some issue. Try creating a new external data source with a SAS token.
Sample code:
CREATE DATABASE SCOPED CREDENTIAL SasToken
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'SAS token';
GO
CREATE EXTERNAL DATA SOURCE mysample1
WITH ( LOCATION = 'storage account',
CREDENTIAL = SasToken
)
CREATE VIEW [dbo].[View4] AS SELECT [result].filepath(1) as [YEAR], [result].filepath(2) as [MONTH], [result].filepath(3) as [DAY], *
FROM
OPENROWSET(
BULK 'fsn2p/*-*-*.parquet',
DATA_SOURCE = 'mysample1',
FORMAT = 'PARQUET'
) AS [result]
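Once the view exists, a quick sanity check from the same serverless pool (a hypothetical query using the names above) would be:
SELECT TOP 10 * FROM [dbo].[View4];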

Azure Data Factory Exception while reading table from Synapse and using staging for Polybase

I'm using Data Flow in Data Factory and I need to join a table from Synapse with my flow of data.
When I added the new source in Azure Data Flow I had to add a Staging linked service (as the label said: "For SQL DW, please specify a staging location for PolyBase.")
So I specified a path in Azure Data Lake Gen2 in which PolyBase can create its temp dir.
Nevertheless, I'm getting this error:
{"StatusCode":"DFExecutorUserError","Message":"Job failed due to reason: at Source 'keyMapCliente': shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.","Details":"shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException: CREATE EXTERNAL TABLE AS SELECT statement failed as the path name 'abfss://MyContainerName#mystorgaename.dfs.core.windows.net/Raw/Tmp/e3e71c102e0a46cea0b286f17cc5b945/' could not be used for export. Please ensure that the specified path is a directory which exists or can be created, and that files can be created in that directory.\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerException.makeFromDatabaseError(SQLServerException.java:262)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.getNextResult(SQLServerStatement.java:1632)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement.doExecuteStatement(SQLServerStatement.java:872)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.SQLServerStatement$StmtExecCmd.doExecute(SQLServerStatement.java:767)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jdbc.TDSCommand.execute(IOBuffer.java:7418)\n\tat shaded.msdataflow.com.microsoft.sqlserver.jd"}
The following are the Azure Data Flow Settings:
This is the added source inside the data flow:
Any help is appreciated
I have repro'd this and was able to enable the staging location as an Azure Data Lake Gen2 storage account for PolyBase and connect the Synapse table data successfully.
Create your database scoped credential with the Azure storage account key as the secret.
Create an external data source and an external table with the scoped credential you created (see the sketch below).
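A minimal T-SQL sketch of those two steps, with placeholder names for the credential, data source, container, storage account, and key (the external table additionally needs an external file format):
CREATE DATABASE SCOPED CREDENTIAL AdlsGen2Credential
WITH IDENTITY = 'user', -- any string works when the secret is a storage account key
SECRET = '<storage account access key>';
GO
CREATE EXTERNAL DATA SOURCE AdlsGen2Source
WITH ( TYPE = HADOOP,
LOCATION = 'abfss://<container>@<storageaccount>.dfs.core.windows.net',
CREDENTIAL = AdlsGen2Credential );
GO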
In Azure Data Factory:
Enable staging and connect to the Azure Data Lake Gen2 storage account with the account key authentication type.
In the data flow, connect your source to the Synapse table and enable the staging property in the source options.

How to migrate data from local storage to CosmosDB Table API?

I tried following the documentation and was able to migrate data from Azure Table storage to local storage, but when I then try migrating the data from local storage to the Cosmos DB Table API, I'm facing issues with the destination endpoint of the Table API. Does anyone have an idea of which destination endpoint to use? Right now I'm using the Table API endpoint from the overview section.
cmd error
The problem I see here is that you are not using the table name correctly. TablesDB is not the table name. Please check the screenshot below for what we should use for the table name (in this case, mytable1 is the table name). So your command should be something like:
/Source:C:\myfolder\ /Dest:https://xxxxxxxx.table.cosmos.azure.com:443/mytable1/
Just reiterating that I followed the steps below and was able to migrate successfully:
Export from Azure Table storage to a local folder using the article below. The table name should match the name of the table in the storage account:
AzCopy /Source:https://xxxxxxxxxxx.table.core.windows.net/myTable/ /Dest:C:\myfolder\ /SourceKey:key
Export data from Table storage
Import from the local folder to the Azure Cosmos DB Table API using the command below, where the table name is the one we created in the Azure Cosmos DB Table API, DestKey is the primary key, and the source is copied exactly from the connection string with the table name appended:
AzCopy /Source:C:\myfolder\ /Dest:https://xxxxxxxx.table.cosmos.azure.com:443/mytable1/ /DestKey:key /Manifest:"myaccount_mytable_20140103T112020.manifest" /EntityOperation:InsertOrReplace

Error while trying to query external tables in Serverless SQL pools

I am trying to run SELECT * FROM [dbo].[<table-name>]; queries through the JDBC driver on an external table I created in my serverless SQL pool in Azure Synapse, using ADLS Gen2 as the storage, but I'm getting this error:
External table 'dbo' is not accessible because location does not exist or it is used by another process.
I get the same error with SELECT * FROM [<table-name>]; as well. I've tried granting all the required permissions on the storage account as mentioned here: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand#query-execution, but I'm still getting the same error.
Can someone please help me out with this?
This type of error (if you're sure the file exists and it's not being used by another process) usually means one of two things:
As the documentation you linked says, serverless SQL pool can't access the file because your Azure Active Directory identity doesn't have rights to access it, or because a firewall is blocking access to the file. By default, serverless SQL pool tries to access the file using your Azure Active Directory identity. To resolve this issue, you need proper rights to access the file. The easiest way is to grant yourself the 'Storage Blob Data Contributor' role on the storage account you're trying to query.
You have done something wrong when creating the external table. For example, if you have defined your external table on a folder that doesn't actually exist in your storage, you will receive the same error, so check for that (see the SQL script example below for a correct creation of the external table).
-- Create external data source for the storage
CREATE EXTERNAL DATA SOURCE [Test] WITH
(
LOCATION = 'https://<StorageAccount>.blob.core.windows.net/<Container>'
)
-- Create external file format for csv
CREATE EXTERNAL FILE FORMAT [csv] WITH
(
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (
FIELD_TERMINATOR = ',')
)
-- Create external table
CREATE EXTERNAL TABLE [dbo].[TableTest]
(
[Col1] VARCHAR(2),
[Col2] INT,
[Col3] INT
)
WITH
(
LOCATION = '<FolderIfExist>/<FileName>.csv',
DATA_SOURCE = [Test],
FILE_FORMAT = [csv]
)
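With the LOCATION pointing at a path that actually exists, a simple query should confirm access (a hypothetical check using the names above):
SELECT TOP 10 * FROM [dbo].[TableTest];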

Create External table in Azure databricks

I am new to Azure Databricks and am trying to create an external table pointing to an Azure Data Lake Storage (ADLS) Gen2 location.
From a Databricks notebook I have tried to set the Spark configuration for ADLS access. Still, I am unable to execute the DDL I created.
Note: One solution that works for me is mounting the ADLS account to the cluster and then using the mount location in the external table's DDL. But I needed to check whether it is possible to create an external table DDL with an ADLS path without a mount location.
# Using Principal credentials
spark.conf.set("dfs.azure.account.auth.type", "OAuth")
spark.conf.set("dfs.azure.account.oauth.provider.type", "ClientCredential")
spark.conf.set("dfs.azure.account.oauth2.client.id", "client_id")
spark.conf.set("dfs.azure.account.oauth2.client.secret", "client_secret")
spark.conf.set("dfs.azure.account.oauth2.client.endpoint",
"https://login.microsoftonline.com/tenant_id/oauth2/token")
DDL
create external table test(
id string,
name string
)
partitioned by (pt_batch_id bigint, pt_file_id integer)
STORED as parquet
location 'abfss://container@account_name.dfs.core.windows.net/dev/data/employee'
Error Received
Error in SQL statement: AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.contracts.exceptions.ConfigurationPropertyNotFoundException Configuration property account_name.dfs.core.windows.net not found.);
I need help knowing whether it is possible to refer to the ADLS location directly in the DDL.
Thanks.
Sort of, if you can use Python (or Scala).
Start by making the connection:
TenantID = "blah"
def connectLake():
    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", dbutils.secrets.get(scope = "LIQUIX", key = "lake-sp"))
    spark.conf.set("fs.azure.account.oauth2.client.secret", dbutils.secrets.get(scope = "LIQUIX", key = "lake-key"))
    spark.conf.set("fs.azure.account.oauth2.client.endpoint", "https://login.microsoftonline.com/"+TenantID+"/oauth2/token")

connectLake()
lakePath = "abfss://liquix@mystorageaccount.dfs.core.windows.net/"
Using Python you can register a table using:
spark.sql("CREATE TABLE DimDate USING PARQUET LOCATION '"+lakePath+"/PRESENTED/DIMDATE/V1'")
You can now query that table if you have executed the connectLake() function - which is fine in your current session/notebook.
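For example, a quick check from a SQL cell in the same session (assuming DimDate data exists at that path):
SELECT * FROM DimDate LIMIT 10;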
The problem now is that if a new session comes in and they try SELECT * from that table, it will fail unless they run the connectLake() function first. There is no way around that limitation, as you have to provide credentials to access the lake.
You may want to consider ADLS Gen2 credential passthrough: https://docs.azuredatabricks.net/spark/latest/data-sources/azure/adls-passthrough.html
Note that this requires using a High Concurrency cluster.
