Error when creating a view in a pipeline (problem with BULK path) - Azure

Good morning everybody!
My team and I managed to create part of an Azure Synapse pipeline that selects the database and creates a data source named 'files'.
Now we want to create a view in the same pipeline using a Script activity. However, this error comes up:
Error message here
Even if we hardcode the folder names and the file name in the path, the pipeline won't recognise the existence of the file in question.
This is our query. If we run it manually in a script in the Develop section, everything works smoothly:
CREATE VIEW query here
We expect to get every file with the ".parquet" extension inside every folder available on our data source named 'files'. Running this query from the Azure Synapse pipeline doesn't work, even though the same query runs perfectly in a script in the Develop section, and that is the result we want to reproduce.
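For context, the general shape of what we are running is roughly this (the view name and wildcard pattern are illustrative placeholders, not our actual objects):
-- Illustrative sketch only: one folder level shown, adjust the wildcards to the real folder depth
CREATE VIEW [dbo].[AllParquetFiles] AS
SELECT *
FROM OPENROWSET(
    BULK '*/*.parquet',
    DATA_SOURCE = 'files',
    FORMAT = 'PARQUET'
) AS [result];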
Could anyone help us out?
Thanks in advance!

I tried to reproduce the same scenario in my environment and got the same error.
The cause of the error can be that your Synapse service principal (or the user accessing the storage account) does not have the Storage Blob Data Contributor role assigned, or that your external data source has some issue. Try creating a new external data source with a SAS token.
Sample code:
-- Create a database scoped credential from the SAS token
CREATE DATABASE SCOPED CREDENTIAL SasToken
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'SAS token';
GO

-- Create an external data source pointing at the storage account, using the credential
CREATE EXTERNAL DATA SOURCE mysample1
WITH (
    LOCATION = 'storage account',
    CREDENTIAL = SasToken
);
GO

-- Create the view over the parquet files, exposing the wildcard matches (year, month, day) as columns
CREATE VIEW [dbo].[View4] AS
SELECT
    [result].filepath(1) AS [YEAR],
    [result].filepath(2) AS [MONTH],
    [result].filepath(3) AS [DAY],
    *
FROM OPENROWSET(
    BULK 'fsn2p/*-*-*.parquet',
    DATA_SOURCE = 'mysample1',
    FORMAT = 'PARQUET'
) AS [result];
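If the new data source is healthy, a quick check such as the following (reusing the view name from the sample above) should return rows when executed from the pipeline's Script activity:
-- Smoke test: the view should return rows if the SAS-based data source resolves the path
SELECT TOP 10 * FROM [dbo].[View4];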

Related

Error while trying to query external tables in Serverless SQL pools

I am trying to run SELECT * from [dbo].[<table-name>]; queries using the JDBC driver against an external table I created in my serverless SQL pool in Azure Synapse, with ADLS Gen2 as the storage, but I am getting this error:
External table 'dbo' is not accessible because location does not exist or it is used by another process.
I get the same error with SELECT * from [<table-name>]; as well. I've tried giving all the required permissions in the storage account as mentioned here: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/resources-self-help-sql-on-demand#query-execution but I am still getting the same error.
Can someone please help me out with this?
This type of error (if you're sure the file exists and it isn't being used by another process) usually means one of the following:
1. As the documentation you linked says, serverless SQL pool can't access the file because your Azure Active Directory identity doesn't have rights to access the file, or because a firewall is blocking access to the file. By default, serverless SQL pool tries to access the file using your Azure Active Directory identity. To resolve this issue, you need proper rights to access the file. The easiest way is to grant yourself the 'Storage Blob Data Contributor' role on the storage account you're trying to query.
2. Something went wrong when you created the external table. For example, if you defined your external table on a folder that doesn't actually exist in your storage, you will receive the same error, so check for that (see the SQL script example below for a correct creation of the external table).
-- Create external data source for the storage
CREATE EXTERNAL DATA SOURCE [DSTest] WITH
(
    LOCATION = 'https://<StorageAccount>.blob.core.windows.net/<Container>'
)

-- Create external file format for csv
CREATE EXTERNAL FILE FORMAT [csv] WITH
(
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (
        FIELD_TERMINATOR = ',')
)

-- Create external table
CREATE EXTERNAL TABLE [dbo].[TableTest]
(
    [Col1] VARCHAR(2),
    [Col2] INT,
    [Col3] INT
)
WITH
(
    LOCATION = '<FolderIfExist>/<FileName>.csv',
    DATA_SOURCE = [DSTest],
    FILE_FORMAT = [csv]
)
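Once the external table points at a location that really exists, a simple query like the one below (using the table name from the sketch above) should return rows instead of the 'location does not exist' error:
-- Verify that the external table's location resolves
SELECT TOP 10 * FROM [dbo].[TableTest];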

Azure Data Factory to Azure Blob Storage Permissions

I'm connecting ADF to blob storage v2 using a managed identity following this doc: Doc1
When it comes to testing the connection with my first dataset, the test against the linked service succeeds. When I test by file path and enter "testfolder" (which exists in the blob), it fails with the generic forbidden error displayed at the end of this post.
However, when I opt to "browse" the folders in the dataset portal, the folder "testfolder" does show up. But when I select it, it will not show me anything within that folder.
The Data Factory managed identity is given the role of Contributor, granting full access to manage all resources. Is there some other hidden issue or a possible way to narrow down the problem? My instinct is that this is something within the blob container, since I can view the containers but not their contents.
Error message:
It seems that you haven't assigned a role on the Azure Blob Storage account.
Please follow these steps:
1. Click IAM on the Azure Blob Storage account, navigate to Role assignments and add a role assignment.
2. Choose a role according to your need and select your data factory.
3. A few minutes later, you can retry choosing the file path.
Hope this can help you.

Loading file from Azure Blob storage into Azure SQL DB: error code 86 The specified network password is not correct

I've been trying to run the following script to read the file from Azure Blob Storage.
--------------------------------------------
-- CREATING CREDENTIAL (shared access signature)
--------------------------------------------
CREATE DATABASE SCOPED CREDENTIAL dlcred
with identity='SHARED ACCESS SIGNATURE',
SECRET = 'sv=2018-03-28&ss=bfqt&srt=sco&sp=rwdlacup&se=2019-12-01T07:28:58Z&st=2019-08-31T23:28:58Z&spr=https,http&sig=<signature from storage account>';
--------------------------------------------
-- CREATING SOURCE
--------------------------------------------
CREATE EXTERNAL DATA SOURCE datalake
WITH (
TYPE =  BLOB_STORAGE,
LOCATION='https://<storageaccount>.blob.core.windows.net/<blob>',
CREDENTIAL = dlcred
);
Originally, the script worked just fine, but later on it started giving the following error when running the test query below: Cannot bulk load because the file "test.txt" could not be opened. Operating system error code 86 (The specified network password is not correct.)
--TEST
--------------------------------------------
SELECT CAST(BulkColumn AS XML)
FROM OPENROWSET
(
 BULK 'test.xml',
 DATA_SOURCE = 'datalake', 
 SINGLE_BLOB
) as xml_import
The same error happens if I create the credential with a service principal or an access key.
I've tried literally everything and logged a ticket with Azure support; however, they are struggling to replicate this error.
I feel like it's an issue outside of the storage account and SQL Server - Azure has a whole bunch of services that can be activated/deactivated against a subscription, and I feel like it's one of these that's preventing us from successfully mapping the storage account.
Has anyone encountered this error? And if so, how did you solve it?
I was able to get this issue resolved with Microsoft support. Per section F of the documentation, I granted Storage Blob Data Contributor access to the managed identity of the SQL Server instance, then ran the SQL statements from the managed identity example shown there: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-ver15#f-importing-data-from-a-file-in-azure-blob-storage.
Preserving the code solution below:
CREATE DATABASE SCOPED CREDENTIAL msi_cred WITH IDENTITY = 'Managed Identity';
GO
CREATE EXTERNAL DATA SOURCE MyAzureBlobStorage
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://****************.blob.core.windows.net/curriculum',
    CREDENTIAL = msi_cred
);
BULK INSERT Sales.Invoices
FROM 'inv-2017-12-08.csv'
WITH (DATA_SOURCE = 'MyAzureBlobStorage');
In order to do this, the SQL server instance requires a managed identity to be assigned to it. This can be done at creation time with the --assign-identity flag.
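Once the permissions and identity are in place, a quick row count (table name taken from the snippet above) confirms the bulk load actually landed:
-- Confirm rows were loaded from the blob file
SELECT COUNT(*) AS LoadedRows FROM Sales.Invoices;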

Azure Blob to Azure SQL tables creation

I am trying to load blob files into SQL DB tables in Azure using BULK INSERT.
Here is the reference from Microsoft:
https://azure.microsoft.com/en-us/updates/preview-loading-files-from-azure-blob-storage-into-sql-database/
My data in CSV looks like this:
100,"37415B4EAF943043E1111111A05370E","ONT","000","S","ABCDEF","AB","001","000002","001","04","20110902","11111111","20110830152048.1837780","",""
My blob container's access level is set to public.
Step 1: Created a storage credential. I generated a shared access signature (SAS token).
CREATE DATABASE SCOPED CREDENTIAL Abablobv1BlobStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'sv=2017-07-29&ss=bfqt&srt=sco&sp=rwdlacup&se=2018-04-10T18:05:55Z&st=2018-04-09T10:05:55Z&sip=141.6.1.0-141.6.1.255&spr=https&sig=XIFs1TWafAakQT3Ig%3D';
GO
Step 2: Created an EXTERNAL DATA SOURCE referencing the storage credential
CREATE EXTERNAL DATA SOURCE Abablobv1BlobStorage
WITH ( TYPE = BLOB_STORAGE, LOCATION = 'https://abcd.blob.core.windows.net/', CREDENTIAL = Abablobv1BlobStorageCredential );
GO
Step 3: BULK INSERT statement using the external data source and the DB table
BULK INSERT dbo.TWCS
FROM 'TWCSSampleData.csv'
WITH ( DATA_SOURCE = 'Abablobv1BlobStorage', FORMAT = 'CSV');
GO
I am facing this error:
Bad or inaccessible location specified in external data source
"Abablobv1BlobStorage".
Does anyone have some idea about this?
I changed the LOCATION of the EXTERNAL DATA SOURCE to Location = abcd.blob.core.windows.net/invoapprover/SampleData.csv. Now I get: Cannot bulk load because the file "SampleData.csv" could not be opened. Operating system error code 5 (Access is denied.). This happens for both statements, BULK INSERT and OPENROWSET. I was not sure which access should be changed, because the file is in Azure blob storage, not on my machine. Any ideas on this?
Please try the following query
SELECT * FROM OPENROWSET(
BULK 'TWCSSampleData.csv',
DATA_SOURCE = 'Abablobv1BlobStorage',
SINGLE_CLOB) AS DataFile;
Also check whether the file is located inside a container in the blob storage. In that case you need to specify the container in the LOCATION parameter of the external data source. If you have a container named "files", then the location should be like 'https://abcd.blob.core.windows.net/files'.
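For example, if the CSV sits in a container named "files", the data source would look something like this (the container and account names are illustrative, reusing the credential from step 1):
-- Recreate the external data source with the container included in the location
CREATE EXTERNAL DATA SOURCE Abablobv1BlobStorageFiles
WITH (
    TYPE = BLOB_STORAGE,
    LOCATION = 'https://abcd.blob.core.windows.net/files',
    CREDENTIAL = Abablobv1BlobStorageCredential
);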
More examples of bulk import here.

Restoring Database from Azure Blob Storage failing from SSMS while using RESTORE FILELISTONLY

I am trying to restore a SQL 2016 database backup file which is in Azure Blob Storage from SSMS using the below T-SQL command:
RESTORE FILELISTONLY
FROM URL = 'https://.blob.core.windows.net//.bak'
GO
It works fine with my normal Azure subscription, but when I use a CSP account, I get the below error:
Cannot open backup device 'https://.blob.core.windows.net//.bak'. Operating system error 86(The specified network password is not correct.).
Any help on fixing this issue is greatly appreciated.
Following the steps below, you should be able to get the file list.
First you need to create a credential, e.g.:
create credential [cmbackupprd-sqlbackup]
with
identity = '<storageaccountname>',
secret = 'long-and-lengthy-storageaccountkey'
Now you can use this credential to connect to your storage-account.
restore filelistonly
from URL = 'https://yourstorageaccount.blob.core.windows.net/path/to/backup.bak'
with credential = 'cmbackupprd-sqlbackup'
Note: I'm assuming the backup was made directly from SQL Server to Azure blob storage. Otherwise you might need to check the blob type.
