Can anyone help me with this error - azure

I am trying to load some data from Azure Data Lake into Azure SQL Data Warehouse, but I am unable to do it. I have followed this documentation:
https://learn.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-load-from-azure-data-lake-store
But I get the error below when I try to create an external table. I have created another web/API app but was still not able to access the application. Here is the error I am facing:
EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:
GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [0ec4b8e0-b16d-470e-9c98-37818176a188][2017-08-14T02:30:58.9795172-07:00]: Error [GETFILESTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [0ec4b8e0-b16d-470e-9c98-37818176a188][2017-08-14T02:30:58.9795172-07:00]] occurred while accessing external file.'
Here is the script I am trying to get to work:
CREATE DATABASE SCOPED CREDENTIAL ADLCredential2
WITH
IDENTITY = '2ec11315-5a30-4bea-9428-e511bf3fa8a1#https://login.microsoftonline.com/24708086-c2ce-4b77-8d61-7e6fe8303971/oauth2/token',
SECRET = '3Htr2au0b0wvmb3bwzv1FekK88YQYZCUrJy7OB3NzYs='
;
CREATE EXTERNAL DATA SOURCE AzureDataLakeStore11
WITH (
TYPE = HADOOP,
LOCATION = 'adl://test.azuredatalakestore.net/',
CREDENTIAL = ADLCredential2
);
CREATE EXTERNAL FILE FORMAT TextFileFormat
WITH
( FORMAT_TYPE = DELIMITEDTEXT
, FORMAT_OPTIONS ( FIELD_TERMINATOR = '|'
, DATE_FORMAT = 'yyyy-MM-dd HH:mm:ss.fff'
, USE_TYPE_DEFAULT = FALSE
)
);
CREATE EXTERNAL TABLE [extccsm].[external_medication]
(
person_id varchar(4000),
encounter_id varchar(4000),
fin varchar(4000),
mrn varchar(4000),
icd_code varchar(4000),
icd_description varchar(300),
priority integer,
optional1 varchar(4000),
optional2 varchar(4000),
optional3 varchar(4000),
load_identifier varchar(4000),
upload_time datetime2,
xx_person_id varchar(4000),--Person ID is the ID that we will use to represent the person uniquely throughout the process. This requires initial analysis to determine how to set it
xx_encounter_id varchar(4000),--Encounter ID is the ID that will represent the encounter uniquely throughout the process. This requires initial analysis to determine how to set it based on client data
mod_optional1 varchar(4000),
mod_optional2 varchar(4000),
mod_optional3 varchar(4000),
mod_optional4 varchar(4000),
mod_optional5 varchar(4000),
mod_loadidentifier datetime2
)
WITH
(
LOCATION='\testfiles\procedure_azure.txt000\',
DATA_SOURCE = AzureDataLakeStore11, --DATA SOURCE THE BLOB STORAGE
FILE_FORMAT = TextFileFormat, --TYPE OF FILE FORMAT
REJECT_TYPE = percentage,
REJECT_VALUE = 1,
REJECT_SAMPLE_VALUE = 0
);
Please tell me what's wrong here.

I can reproduce this but it's hard to narrow down exactly. I think it's to do with permissions. From the Azure portal:
Data Lake Store > yourDataLakeAccount > your folder > Access
From there, make sure your AD application has Read, Write and Execute permission on the relevant files and folders. Start with one file initially. I can reproduce the error by assigning and unassigning the Execute permission, but I need to repeat the steps to confirm. I'll retrace my steps, but for now concentrate your search here. In my example, my Azure Active Directory application is called adwAndPolybase, and I've given it Read, Write and Execute. I also experimented with the Advanced and 'Apply to children' options.
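To make the permission check above easier to walk through, here is a rough Python sketch that lists which entries to inspect for a given LOCATION. It assumes Data Lake Store's POSIX-style ACL model, where reaching an item needs Execute on every parent folder, reading a file needs Read, and listing a folder needs Read plus Execute. The function name required_acl_entries is made up for illustration and is not part of any Azure SDK.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def required_acl_entries(path: str, is_folder: bool = True) -> List[Tuple[str, str]]:
    """Return (path, minimum permission) pairs needed to read `path`.

    Hypothetical helper: it only does path arithmetic and never calls Azure.
    """
    # Normalise backslashes and stray slashes, then anchor the path at the root.
    target = PurePosixPath("/") / path.replace("\\", "/").strip("/")
    # `parents` is ordered nearest-first, so reverse it to walk down from the root.
    ancestors = list(reversed(target.parents))
    # Every folder on the way down needs Execute so it can be traversed.
    entries = [(str(folder), "--x") for folder in ancestors]
    # The target itself needs Read, plus Execute if it is a folder being listed.
    entries.append((str(target), "r-x" if is_folder else "r--"))
    return entries


if __name__ == "__main__":
    for entry in required_acl_entries("/testfiles/procedure_azure.txt000/"):
        print(entry)
```

Checking each printed path in the Access blade for the AD application is one way to narrow down where the Forbidden error comes from. Files inside the target folder would need Read as well.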

Related

Synapse Dedicated SQL Pool - Copy Into Failing With Odd error - Python

I'm getting an error when attempting to insert from a temp table into a table that exists in Synapse. Here is the relevant code:
def load_adls_data(self, schema: str, table: str, environment: str, filepath: str, columns: list) -> str:
    if self.exists_schema(schema):
        if self.exists_table(schema, table):
            if environment.lower() == 'prod':
                schema = "lvl0"
            else:
                schema = f"{environment.lower()}_lvl0"
            temp_table = self.generate_temp_create_table(schema, table, columns)
            sql0 = """
            IF OBJECT_ID('tempdb..#CopyDataFromADLS') IS NOT NULL
            BEGIN
                DROP TABLE #CopyDataFromADLS;
            END
            """
            sql1 = """
            {}
            COPY INTO #CopyDataFromADLS FROM
            '{}'
            WITH
            (
                FILE_TYPE = 'CSV',
                FIRSTROW = 1
            )
            INSERT INTO {}.{}
            SELECT *, GETDATE(), '{}' from #CopyDataFromADLS
            """.format(temp_table, filepath, schema, table, Path(filepath).name)
            print(sql1)
            conn = pyodbc.connect(self._synapse_cnx_str)
            conn.autocommit = True
            with conn.cursor() as db:
                db.execute(sql0)
                db.execute(sql1)
If I get rid of the insert statement and just do a select from the temp table in the script:
SELECT * FROM #CopyDataFromADLS
I get the same error in either case:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Not able to validate external location because The remote server returned an error: (409) Conflict. (105215) (SQLExecDirectW)')
I've run the generated code for both the insert and the select in Synapse and they ran perfectly. Google has no real info on this so could someone assist with this? Thanks
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Not able to validate external location because The remote server returned an error: (409) Conflict. (105215) (SQLExecDirectW)')
This error is usually caused by an authentication or access problem.
Make sure you have the Storage Blob Data Contributor role on the storage account.
In the COPY INTO script, add the authentication key for the blob storage, unless it is a public blob storage.
I tried to reproduce this using a COPY INTO statement without authentication and got the same error.
After adding authentication using a SAS key, the data was copied successfully.
Refer to the Microsoft documentation for the permissions required for bulk loading with COPY INTO statements.
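As a minimal sketch of the SAS approach, the statement could be built in Python with a CREDENTIAL clause added. The helper name build_copy_into and its parameters are invented for this example, and the SAS token is a placeholder. The CREDENTIAL syntax follows my understanding of the COPY INTO documentation, so check it against your environment.

```python
def build_copy_into(target_table: str, source_url: str, sas_token: str, first_row: int = 1) -> str:
    """Build a COPY INTO statement that authenticates with a SAS token.

    Hypothetical helper: it returns the T-SQL text and does not execute anything.
    """
    # Double any single quotes so the values stay inside their T-SQL string literals.
    url = source_url.replace("'", "''")
    secret = sas_token.replace("'", "''")
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{url}'\n"
        "WITH (\n"
        "    FILE_TYPE = 'CSV',\n"
        f"    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '{secret}'),\n"
        f"    FIRSTROW = {first_row}\n"
        ")"
    )


if __name__ == "__main__":
    # Placeholder URL and token, for illustration only.
    print(build_copy_into(
        "#CopyDataFromADLS",
        "https://accountName.blob.core.windows.net/containerName/file.csv",
        "<your-sas-token>",
    ))
```

The returned string can replace the COPY INTO part of sql1 in the question. In real code the token should be kept out of logs, so the print(sql1) call would be worth removing or masking.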

Azure Synapse Serverless CETAS error "External table location is not valid"

I'm using Synapse Serverless Pool and get the following error trying to use CETAS
Msg 15860, Level 16, State 5, Line 3
External table location path is not valid. Location provided: 'https://accountName.blob.core.windows.net/ontainerName/test/'
My workspace managed identity should have all the correct ACL and RBAC roles on the storage account. I'm able to query the files I have there but am unable to execute the CETAS command.
CREATE DATABASE SCOPED CREDENTIAL WorkspaceIdentity WITH IDENTITY = 'Managed Identity'
GO
CREATE EXTERNAL DATA SOURCE MyASDL
WITH ( LOCATION = 'https://accountName.blob.core.windows.net/containerName'
,CREDENTIAL = WorkspaceIdentity)
GO
CREATE EXTERNAL FILE FORMAT CustomCSV
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (ENCODING = 'UTF8')
);
GO
CREATE EXTERNAL TABLE Test.dbo.TestTable
WITH (
LOCATION = 'test/',
DATA_SOURCE = MyASDL,
FILE_FORMAT = CustomCSV
) AS
WITH source AS
(
SELECT
jsonContent
, JSON_VALUE (jsonContent, '$.zipCode') AS ZipCode
FROM
OPENROWSET(
BULK '/customer-001-100MB.json',
FORMAT = 'CSV',
FIELDQUOTE = '0x00',
FIELDTERMINATOR ='0x0b',
ROWTERMINATOR = '\n',
DATA_SOURCE = 'MyASDL'
)
WITH (
jsonContent varchar(1000) COLLATE Latin1_General_100_BIN2_UTF8
) AS [result]
)
SELECT ZipCode, COUNT(*) as Count
FROM source
GROUP BY ZipCode
;
I've tried everything in the LOCATION parameter of the CETAS command, but nothing seems to work: folder paths, file paths, with and without a leading or trailing /, etc.
The CTE select statement works without the CETAS.
Can't I use the same data source for both reading and writing? Or is it something else?
The issue was with my data source definition.
I had used https:// in the LOCATION. It worked once I changed this to wasbs://, as per the T-SQL CREATE EXTERNAL DATA SOURCE documentation.
That page describes that you have to use wasbs, abfs or adl depending on whether your data source is a V2 storage account, a V2 data lake or a V1 data lake.
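To illustrate the prefix choice, here is a small Python sketch that builds the LOCATION string for each storage type. The function name external_location and the kind labels are made up for this example. The URI layouts, including the secure abfss variant of abfs, are what I understand the CREATE EXTERNAL DATA SOURCE documentation to list, so treat them as assumptions to verify.

```python
def external_location(kind: str, account: str, container: str = "") -> str:
    """Return a LOCATION value for CREATE EXTERNAL DATA SOURCE.

    Hypothetical helper: `kind` is an illustrative label, not an Azure term.
    """
    if kind == "blob":
        # V2 storage account (Blob Storage): wasbs://<container>@<account>.blob.core.windows.net
        return f"wasbs://{container}@{account}.blob.core.windows.net"
    if kind == "adls_gen2":
        # V2 data lake: abfss://<container>@<account>.dfs.core.windows.net
        return f"abfss://{container}@{account}.dfs.core.windows.net"
    if kind == "adls_gen1":
        # V1 data lake: adl://<account>.azuredatalakestore.net (no container part)
        return f"adl://{account}.azuredatalakestore.net"
    raise ValueError(f"unknown storage kind: {kind!r}")


if __name__ == "__main__":
    print(external_location("blob", "accountName", "containerName"))
```

With the names from the question, the blob case gives wasbs://containerName@accountName.blob.core.windows.net, which would go in place of the https:// location in the data source definition.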

Azure Synapse TSQL

I am new to Azure Synapse and have a question about how the files are set up on Azure when creating an external table from a select. Would the files be overwritten, or would one need to truncate the files every time a create external table script is run? For example, if I run the following script
CREATE EXTERNAL TABLE [dbo].[PopulationCETAS] WITH (
LOCATION = 'populationParquet/',
DATA_SOURCE = [MyDataSource],
FILE_FORMAT = [ParquetFF]
) AS
SELECT
*
FROM
OPENROWSET(
BULK 'csv/population-unix/population.csv',
DATA_SOURCE = 'sqlondemanddemo',
FORMAT = 'CSV', PARSER_VERSION = '2.0'
) WITH (
CountryCode varchar(4),
CountryName varchar(64),
Year int,
PopulationCount int
) AS r;
Would the file created
LOCATION = 'populationParquet/',
DATA_SOURCE = [MyDataSource],
FILE_FORMAT = [ParquetFF]
be overwritten every time the script is run? Can this be specified at the time of setup or within the query options?
I would love to be able to drop the files in storage with a DELETE or TRUNCATE operation but this feature doesn’t currently exist within T-SQL. Please vote for this feature.
In the meantime you will need to use outside automation like an Azure Data Factory pipeline.

Error while connecting DB2/IDAA using ADFV2

I am trying to connect to DB2/IDAA using ADF V2. While executing a simple query, "select * from table", I get the error below:
Operation on target Copy data from IDAA failed: An error has occurred on the Source side. 'Type = Microsoft.HostIntegration.DrdaClient.DrdaException, Message = Exception or type' Microsoft.HostIntegration.Drda.Common.DrdaException 'was thrown. SQLSTATE = HY000 SQLCODE = -343, Source = Microsoft.HostIntegration.Drda.Requester, '
I checked a lot and tried various options, but it's still an issue.
I tried the query "select * from table with ur" to make the call read-only, but still got the above result.
If I use a query like select * from table; commit; then the activity succeeds but no records are fetched.
Does anyone have a solution?
I have my linked service set up like this. The additional connection properties value is: SET CURRENT QUERY ACCELERATION = ALL

I want to use SQL_VARIANT datatype in external table Azure SQL and I get the "Index was out of range error."

I have two SQL Azure databases - DatabaseA and DatabaseB on a server hosted in Azure.
I need to access a view on DatabaseA from DatabaseB - namely I need the sys.identity_columns in DatabaseA to be available to me on DatabaseB. So I am creating an external table on DatabaseB that links to this information like this (I didn't include all the columns but I included the one causing the problem)
CREATE EXTERNAL TABLE [SOURCE_SYS].[identity_columns](
[object_id] int not null
,[name] nvarchar(128) null
,[column_id] int not null
,[system_type_id] tinyint not null
,[seed_value] sql_variant null
)
WITH
(
DATA_SOURCE = MyElasticDBQueryDataSrc,
SCHEMA_NAME = 'sys',
OBJECT_NAME = 'identity_columns'
);
When I run this - it works. But when I try to use the result - select * from [SOURCE_SYS].[identity_columns] - I get this error:
Msg 46823, Level 16, State 1, Line 50
Error retrieving data from MyServer.database.windows.net.DatabaseA. The underlying error message received was: 'Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index'.
If I comment out the fields in this table that have the sql_variant datatypes - it works fine but I do need the information in that field and the other two sql_variant fields that exist in the same table. MyElasticDBQueryDataSrc works fine on other similar tables without the sql_variant type.
Can anyone suggest what I might be doing wrong? Or suggest a workaround? I tried using bigints as it is mostly seed values that are either integers or null but that didn't work because it told me it wasn't the same datatype.
Any help much appreciated.
Well - after a weekend of sleep I figured out the answer!
If you use nvarchar(30) in the external table definition, you can then convert it to a bigint in any query you use it in:
CREATE EXTERNAL TABLE [SOURCE_SYS].[identity_columns](
[object_id] int not null
,[name] nvarchar(128) null
,[column_id] int not null
,[system_type_id] tinyint not null
,[seed_value] nvarchar(30) null
)
WITH
(
DATA_SOURCE = MyElasticDBQueryDataSrc,
SCHEMA_NAME = 'sys',
OBJECT_NAME = 'identity_columns'
);
Now I can access the value like this:
select cast(isnull([seed_value], 0) as bigint) from SOURCE_SYS.identity_columns
Beware that you will need to query the variant columns separately from the rest of the query. If you do a select * from the table, you'll get this error:
Msg 46825, Level 16, State 1, Line 58
The data type of the column 'seed_value' in the external table is different than the column's data type in the underlying standalone or sharded table present on the external source.
Hope this is helpful to someone!
