I am trying to copy several csv files from an Azure Storage Blob to an Azure Data Lake Storage Gen1 using AdlCopy in Standalone Mode. The Data Lake Storage folder I am trying to move the data to is currently empty. Here is my CMD script that I am using:
C:\Users\username\Applications>AdlCopy /source https://myblobstorage.blob.core.windows.net/Folder/SubFolder/SubFolder/ /dest swebhdfs://mydatalakestorage.azuredatalakestore.net/Folder/ /sourcekey mysourcekey
When I run the script I am prompted for my credentials. After I enter them this is the error that I get:
Initializing Copy.
Invalid JSON primitive: .
Copy Failed.
Has anyone else had any experience with this error? I have yet to see any documentation on how to handle this error or what might be causing it. I appreciate any help or guidance!
Related
My batch processing pipeline in Azure has the following scenario: I am using the copy activity in Azure Data Factory to unzip thousands of zip files, stored in a blob storage container. These zip files are stored in a nested folder structure inside the container, e.g.
zipContainer/deviceA/component1/20220301.zip
The resulting unzipped files will be stored in another container, preserving the hierarchy in the sink's copy behavior option, e.g.
unzipContainer/deviceA/component1/20220301.zip/measurements_01.csv
I enabled the logging of the copy activity as:
And then provided the folder path to store the generated logs (in txt format), which have the following structure:
Timestamp
Level
OperationName
OperationItem
Message
2022-03-01 15:14:06.9880973
Info
FileWrite
"deviceA/component1/2022.zip/measurements_01.csv"
"Complete writing file. File is successfully copied."
I want to read the content of these logs in an R notebook in Azure DataBricks, in order to get the complete paths for these csv files for processing. The command I used, read.df is part of SparkR library:
Logs <- read.df(log_path, source = "csv", header="true", delimiter=",")
The following exception is returned:
Exception: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual APPEND_BLOB.
The generated logs from the copy activity is of append blob type. read.df() can read block blobs without any issue.
From the above scenario, how can I read these logs successfully into my R session in DataBricks ?
According to this Microsoft documentation, Azure Databricks and Hadoop Azure WASB implementations do not support reading append blobs.
https://learn.microsoft.com/en-us/azure/databricks/kb/data-sources/wasb-check-blob-types
And when you try to read this log file of append blob type, it gives error saying that Exception: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual APPEND_BLOB.
So, you cannot read the log file of append blob type from blob storage account. A solution to this would be to use an azure datalake gen2 storage container for logging. When you run the pipeline using ADLS gen2 for logs, it creates log file of block blob type. You can now read this file without any issue from dataricks.
Using blob storage for logging:
Using ADLS gen2 for logging:
Need help with this error.
ADF copy activity, Moving data from snowflake to azure blob storage delimited text.
I am able to preview the snowflake source data. I am also able to browse the containers via sink browse. This doesn't look like an issue with permissions.
ErrorCode=SnowflakeExportCopyCommandValidationFailed,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Snowflake Export Copy Command validation failed:
'The Snowflake copy command payload is invalid.
Cannot specify property: column mapping,
Source=Microsoft.DataTransfer.ClientLibrary,'
Thanks for your help
Clear the mapping from copy activity, it worked.
ErrorCode=SnowflakeExportCopyCommandValidationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Snowflake Export Copy Command validation failed: 'The Snowflake copy command payload is invalid. Cannot specify property: column mapping,Source=Microsoft.DataTransfer.ClientLibrary,'"
To resolve the above error, please check the following:
Please check the copy command you are using, it means the copy command you are using is not valid.
To know more in detail about the error, try like below:
COPY INTO MYTABLE VALIDATION_MODE = 'RETURN_ERRORS' FILES=('result.csv');
Check if you have any issue with the data files before loading the data again.
Try checking the storage account you are currently using and note that Snowflake doesn't support Data Lake Storage Gen1.
Use the COPY INTO command to copy the data from the Snowflake database table into Azure blob storage container.
Note:
Use the blob.core.windows.net endpoint for all supported types of Azure blob storage accounts, including Data Lake Storage Gen2.
Make sure to have either ACCOUNTADMIN role or a role with the global CREATE INTEGRATION privilege to run the below sample command:
copy into 'azure://myaccount.blob.core.windows.net/mycontainer/unload/' from mytable storage_integration = myint;
By using the above command, you no need to include credentials to access the storage.
For more in detail, please refer below links:
Unloading into Microsoft Azure — Snowflake Documentation
COPY INTO < table> — Snowflake Documentation
I sent my azure databricks logs to storage account & Microsoft by default contains those entry in append_blob. I tried to read the Json data with access key I got a error ( shaded.databricks.org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: Incorrect Blob type, please use the correct Blob type to access a blob on the server. Expected BLOCK_BLOB, actual APPEND_BLOB).
Error -
Is there any way to read directly that data path(insights-logs-jobs.mdd.blob.core.windows.net/resourceId=/SUBSCRIPTIONS/xxxxxx-xxxx-xxxxx/RESOURCEGROUPS/ssd--RG/PROVIDERS/MICROSOFT.DATABRICKS/WORKSPACES/addd-PROCESS-xx-ADB/y%3D2021/m%3D12/d%3D07/h%3D00/m%3D00/PT1H.json")
Second way I tired to copy data to other container where data comes in block_blob & tried to read using databricks it works. But need to automate the copy data from multiple container to another. As seen in diagram.
I have a problem using azure AzCopy. Here my scenario.
I have 2 storage accounts, which I am gonna name storage1 and storage2.
Storage1 contains some important data in multiple tables, what I want to do..is to be able to copy all the tables in storage1 to storage2 (having a backup).
I tried 2 different approaches:
AzCopy
Azure Data Factory
With Azure Data Factory I didn't have any particular problem to make it work, I was able to move all the blobs from storage1 to the data Factory but I I couldn't move the tables and have no clue if this is possible to do it with python.
with AzCopy I had zero luck. I gave myself permission in IAM Blob Storage Data contributor and from the terminal when I run this command:
azcopy cp 'https://storage1.table.core.windows.net/Table1' 'https://storage2[...]-Key'
I got the permission error.
In this specific scenario I would love to be able to use AzCopy as is way more simple than data factory as all what I need is to move those table from one storage to the other.
Anyone who can help me out to understand what am I doing wrong with azCopy please?
EDIT:
This is the error when I try to copy the table using azcopy
INFO: The parameters you supplied were Source: 'https://storage1.table.core.windows.net/[SAS]' of type Local, and Destination: 'https://storage2.table.core.windows.net/[SAS]' of type Local
INFO: Based on the parameters supplied, a valid source-destination combination could not automatically be found. Please check the parameters you supplied. If they are correct, please specify an exact source and destination type using the --from-to switch. Valid values are two-word phases of the form BlobLocal, LocalBlob etc. Use the word 'Blob' for Blob Storage, 'Local' for the local file system, 'File' for Azure Files, and 'BlobFS' for ADLS Gen2. If you need a combination that is not supported yet, please log an issue on the AzCopy GitHub issues list.
failed to parse user input due to error: the inferred source/destination combination could not be identified, or is currently not supported
If you want to copy all the tables which is present is abc container to xyz container. Use simple copy acitivty and while creating dataset just give folder path that copy all the content i.e all the tables to your xyz container.
I would like to watch below video from the 30th mins. It will help in your scenario.
https://youtu.be/m6wyB-Hm3j0
I am using the following code in Databricks to upload files to DBFS. The files are showing when I do dbutils.fs.ls(path). However when I try to read, I am getting a file not found error (see further down). Also, the file sizes are showing as zero?
def WriteFileToDbfs(file_path,test_folder_file_path,target_test_file_name):
df = spark.read.format("delta").load(file_path)
df2 = df.limit(1000)
df2.write.mode("overwrite").parquet(test_folder_file_path+target_test_file_name)
Here is the error:
AnalysisException: Path does not exist: dbfs:/tmp/qa_test/test-file.parquet;
Here are the files listed but with zero sizes:
In Azure Databricks, this is expected behavior.
For Files, it displays the actual file size.
For Directories, it displays the size=0
For Corrupted files displays the size=0
You can get more details using Azure Databricks CLI:
You can get more details using Databricks Explorer:
DBFS Explorer was created as a quick way to upload and download files to the Databricks filesystem (DBFS). This will work with both AWS and Azure instances of Databricks. You will need to create a bearer token in the web interface in order to connect.