Reading data from Azure Blob Storage into Azure Databricks using /mnt/ - azure

I've successfully mounted my blob storage to Databricks, and I can see the defined mount point when running dbutils.fs.ls("/mnt/"). It shows size=0 - it's not clear whether this is expected or not.
When I try to run dbutils.fs.ls("/mnt/<mount-name>"), I get this error:
java.io.FileNotFoundException: / is not found
When I try to write a simple file to my mounted blob with dbutils.fs.put("/mnt/<mount-name>/1.txt", "Hello, World!", True), I get the following error (shortened for readability):
ExecutionError: An error occurred while calling z:com.databricks.backend.daemon.dbutils.FSUtils.put. : shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
...
Caused by: com.microsoft.azure.storage.StorageException: The specified resource does not exist.
All the data is in the root of the Blob container, so I have not defined any folder structures in the dbutils.fs.mount code.

The solution here is to make sure you are using the 'correct' part of your Shared Access Signature (SAS). When the SAS is generated, you'll find there are several different parts of it that you can use - it's likely sent to you as one long connection string, e.g.:
BlobEndpoint=https://<storage-account>.blob.core.windows.net/;QueueEndpoint=https://<storage-account>.queue.core.windows.net/;FileEndpoint=https://<storage-account>.file.core.windows.net/;TableEndpoint=https://<storage-account>.table.core.windows.net/;SharedAccessSignature=sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>
When you define your mount point, use only the value of the SharedAccessSignature key, e.g.:
sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>
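For reference, a minimal mount call that passes only that SharedAccessSignature value might look like the sketch below (to run in a Databricks notebook; the account, container and mount-point names are placeholders):

# Minimal sketch - substitute your own account, container, mount point and SAS token.
storage_account = "<storage-account>"
container = "<container-name>"
sas_token = "sv=<date>&ss=nwrt&srt=sco&sp=rsdgrtp&se=<datetime>&st=<datetime>&spr=https&sig=<long-string>"  # SharedAccessSignature value only

dbutils.fs.mount(
    source=f"wasbs://{container}@{storage_account}.blob.core.windows.net/",
    mount_point="/mnt/<mount-name>",
    extra_configs={
        f"fs.azure.sas.{container}.{storage_account}.blob.core.windows.net": sas_token
    },
)

# Quick sanity check after mounting:
display(dbutils.fs.ls("/mnt/<mount-name>"))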

Related

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: An exception while trying to list a directory after mounting

I am getting the exception below:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details
First I mounted the directory in DBFS like below:
dbutils.fs.mount(
  source = f"wasbs://{containerName}@{storageAccount}.blob.core.windows.net/",
  mount_point = "/mnt/a",
  extra_configs = {f"fs.azure.sas.{containerName}.{storageAccount}.blob.core.windows.net": sasKey}
)
Then I did:
dbutils.fs.ls("/mnt/a")
and I see the reason below:
Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
at hadoop_azure_shaded.com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
at shaded.databricks.org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:158)
at shaded.databricks.org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.listInternal(AzureNativeFileSystemStore.java:2444)
... 41 more
Caused by: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: This request is not authorized to perform this operation using this permission.
at hadoop_azure_shaded.com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
at hadoop_azure_shaded.com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:305)
at hadoop_azure_shaded.com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:196)
at hadoop_azure_shaded.com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
Could someone please help me on this?
This happened due to a wrong SAS key configuration that did not have all the permissions needed for the container. The issue was resolved after supplying the right SAS key with all permissions.
The real error is this: "This request is not authorized to perform this operation using this permission". The most probable cause is that you don't have the "Storage Blob Data Contributor" role, which is different from the "Contributor" role that is set when you create a storage account.
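If you generate the SAS yourself, make sure it includes at least the read, write and list permissions - listing the mount point runs a container enumeration, which is exactly what fails with a too-narrow SAS. As a rough sketch (not from the original answer; the account, container and key values are placeholders), a suitable container SAS could be produced with the azure-storage-blob Python package like this:

# Illustrative only: generate a container SAS that includes the list permission.
from datetime import datetime, timedelta
from azure.storage.blob import generate_container_sas, ContainerSasPermissions

sas_token = generate_container_sas(
    account_name="<storage-account>",
    container_name="<container-name>",
    account_key="<account-key>",
    permission=ContainerSasPermissions(read=True, write=True, delete=True, list=True),
    expiry=datetime.utcnow() + timedelta(hours=8),
)
# Use sas_token as the value of
# fs.azure.sas.<container>.<storage-account>.blob.core.windows.net in extra_configs.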

Azure DataLake (ADLS) BulkDownload Bad Request

I am trying to download a file from ADLS using the BulkDownload method, but I am getting a Bad Request response as below:
Error in getting metadata for path cc-
adl://testaccount.azuredatalakestore.net//HelloWorld//test.txt
Operation: GETFILESTATUS failed with HttpStatus:BadRequest Error: Uexpected
error in JSON parsing.
Last encountered exception thrown after 1 tries. [Uexpected error in JSON
parsing]
[ServerRequestId:]
However, if I try to download the file through azure client shell it works.
I am using BulkDownload as follows:
client.BulkDownload(
    srcPath,
    dstPath);
Is anyone else facing the same issue for BulkDownload call?
I got this fixed: srcPath should be the relative path ("/HelloWorld/test.txt") within the Azure Data Lake store; previously I was using the absolute path ("adl://testaccount.azuredatalakestore.net//HelloWorld/test.txt").
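The same rule applies outside the .NET SDK: pass a path relative to the store root rather than the full adl:// URI. Purely as an illustration (this sketch uses the azure-datalake-store Python package, not the client from the question, and the credential values are placeholders):

# Illustrative sketch with the azure-datalake-store Python package.
from azure.datalake.store import core, lib, multithread

token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<client-id>",
                 client_secret="<client-secret>")
adls = core.AzureDLFileSystem(token, store_name="testaccount")

# rpath is relative to the store root - no adl://... prefix.
multithread.ADLDownloader(adls,
                          rpath="/HelloWorld/test.txt",
                          lpath="test.txt",
                          nthreads=4,
                          overwrite=True)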

EXTERNAL TABLE access failed due to internal error: 'Java exception raised on call to HdfsBridge_IsDirExist. Java exception message:

I am trying to create an external table through PolyBase with the below syntax in Visual Studio 2015. It's giving me the error below. Can someone please help with this?
CREATE EXTERNAL TABLE dbo.DimDate2External (
DateId INT NOT NULL,
CalendarQuarter TINYINT NOT NULL,
FiscalQuarter TINYINT NOT NULL
)
WITH (
LOCATION='/textfiles/DimDate2.txt',
DATA_SOURCE=AzureStorage,
FILE_FORMAT=TextFile
);
CREATE EXTERNAL DATA SOURCE AzureStorage
WITH (
TYPE = HADOOP,
LOCATION = 'wasbs://<blob_container_name>@<azure_storage_account_name>.blob.core.windows.net',
CREDENTIAL = AzureStorageCredential
);
CREATE EXTERNAL FILE FORMAT TextFile WITH ( FORMAT_TYPE = DelimitedText, FORMAT_OPTIONS (FIELD_TERMINATOR = ',') );
EXTERNAL TABLE access failed due to internal error:
'Java exception raised on call to HdfsBridge_IsDirExist. Java
exception message: com.microsoft.azure.storage.StorageException:
Server failed to authenticate the request. Make sure the value of
Authorization header is formed correctly including the signature.:
Error [com.microsoft.azure.storage.StorageException: Server failed to
authenticate the request. Make sure the value of Authorization header
is formed correctly including the signature.] occurred while accessing
external file.'
In the 'LOCATION' syntax I mistakenly swapped the Blob container and Storage account, which caused this error. Now it's fixed:
CREATE EXTERNAL DATA SOURCE AzureStorage WITH ( TYPE = HADOOP, LOCATION = 'wasbs://<blob_container_name>@<azure_storage_account_name>.blob.core.windows.net', CREDENTIAL = AzureStorageCredential )
I can reproduce this error if the Azure Storage account element of your external data source is incorrect (XXX in my example):
CREATE EXTERNAL DATA SOURCE eds_dummy
WITH (
TYPE = Hadoop,
LOCATION = 'wasbs://dummy@XXX.blob.core.windows.net',
CREDENTIAL = sc_tpch
);
If the blob container name is incorrect (dummy in my example) but the storage account is correct, you get a very specific error message when trying to create the table:
Msg 105002, Level 16, State 1, Line 27 EXTERNAL TABLE access failed
because the specified path name '/test.txt' does not exist. Enter a
valid path and try again.
There appears to be some kind of validation on the blob container. However, if the Azure Storage Account name is incorrect, you do not get an error when you create the external data source, only when you try to create the table:
Msg 105019, Level 16, State 1, Line 35 EXTERNAL TABLE access failed
due to internal error: 'Java exception raised on call to
HdfsBridge_IsDirExist. Java exception message:
com.microsoft.azure.storage.StorageException: The server encountered
an unknown failure: : Error
[com.microsoft.azure.storage.StorageException: The server encountered
an unknown failure: ] occurred while accessing external file.'
To correct, please make sure the Azure Storage Account and Blob container exist.
The easiest way to do this is to copy the URL of your file or folder from the portal and fix it up for external tables, i.e. from this:
https://yourStorageAccountName.blob.core.windows.net/yourBlobContainerName
to this:
wasbs://yourBlobContainerName@yourStorageAccountName.blob.core.windows.net
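If you need to do this often, the rewrite is easy to script; here is a small illustrative Python helper (not part of the original answer):

# Illustrative helper: rewrite an https blob URL as a wasbs:// URI.
from urllib.parse import urlparse

def https_to_wasbs(url):
    # https://<account>.blob.core.windows.net/<container>[/path]
    # -> wasbs://<container>@<account>.blob.core.windows.net[/path]
    parsed = urlparse(url)
    container, _, path = parsed.path.lstrip("/").partition("/")
    wasbs = f"wasbs://{container}@{parsed.netloc}"
    return f"{wasbs}/{path}" if path else wasbs

print(https_to_wasbs(
    "https://yourStorageAccountName.blob.core.windows.net/yourBlobContainerName"))
# -> wasbs://yourBlobContainerName@yourStorageAccountName.blob.core.windows.net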
Good luck.

Streaming through .NET application in Azure

I have a .NET executable through which I want to stream data in Pig on my Azure HDInsight cluster. I've uploaded it to my container, but when I try to stream data through it, I get the following error:
<line 1, column 393> Failed to generate logical plan. Nested exception: java.io.IOException: Invalid ship specification: '/util/myStreamApp.exe' does not exist!
I define and use my action as follows:
DEFINE myApp `myStreamApp.exe` SHIP('/util/myStreamApp.exe');
outputData = STREAM inputData THROUGH myApp;
I tried with and without the leading /, tried qualifying it as wasb:///util/myStreamApp.exe, and tried fully qualifying it as wasb://myContainer@myAccount.blob.core.windows.net/util/myStreamApp.exe, but in every case I get the message that my file doesn't exist.
This page on uploading to HDInsight indicates you can refer to the Azure Blob Storage path wasb:///example/data/davinci.txt in HDInsight as /example/data/davinci.txt, which suggests to me that there shouldn't be a problem with the paths.
It turns out the problem was that I wasn't declaring a dependency on the caller's side. I've got a console app that creates the Pig job:
var job = new PigJobCreateParameters()
{
    Query = myPigQuery,
    StatusFolder = myStatusFolder
};
But I needed to add to the job.Files collection a dependency upon my file:
job.Files.Add("wasbs://myContainer@myAccount.blob.core.windows.net/util/myStreamApp.exe");

How to handle DNS-lookup failure to Azure Blob Storage

(I'm quite new to Windows Azure development, so I hope I'm using the right terms.)
We have an Azure Worker Role that is supposed to fetch data stored in Blob Storage.
Somehow we occasionally get the following error message:
Microsoft.WindowsAzure.StorageClient.StorageServerException: The server encountered an unknown failure: The remote name could not be resolved: 'XXX.blob.core.windows.net' ---> System.Net.WebException: The remote name could not be resolved: 'XXX.blob.core.windows.net'
This seems strange, since requests only a second before and/or after work as expected.
If I understand things correctly, the CloudBlob class has internal retry functionality, but it seems that this is not considered a "retryable" error. Is this perhaps handled by the Transient Fault Handling Application Block (Topaz), or do we have to handle this specific error in some other way?
