How to create an empty folder in Azure Blob from Azure Databricks

I have a scenario where I want to list all the folders inside a directory in Azure Blob. If no folders are present, I want to create a new folder with a certain name.
I am trying to list the folders using dbutils.fs.ls(path).
The problem with the above command is that it fails if the path doesn't exist, which is a valid scenario for me.
If my program runs for the first time, the path will not exist and the dbutils.fs.ls command will fail.
Is there any way I can handle this scenario dynamically from Databricks?
It would also work for me if I could create an empty folder in Azure Blob from Databricks before executing my job.
I have tried running the below command from a Databricks notebook:
%sh mkdir -p /mnt/<mountName>/path/folderName
The command runs successfully, but even though my container in Azure Blob is mounted, it doesn't create the folder.
Sorry for such a long post. Any help is much appreciated. Thanks in advance.

dbutils.fs.mkdirs("/mnt/<mountName>/path/folderName")
I found this was able to create a folder on mounted blob storage.
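A minimal sketch of the full flow, assuming a Databricks notebook (where dbutils and display are available by default) and the placeholder mount path from the question:

# dbutils.fs.mkdirs is a no-op when the folder already exists, so the
# subsequent ls call can no longer fail because the path is missing.
path = "/mnt/<mountName>/path/folderName"  # placeholder path from the question
dbutils.fs.mkdirs(path)          # creates the empty folder on the first run
folders = dbutils.fs.ls(path)    # now safe to list
display(folders)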

Related

Azure Cloud Shell PowerShell Copy Blob between Containers

I set up a storage account (Blob, v2) with two containers. I uploaded a test Excel file into one of the containers. Now I would like to use Azure Cloud Shell PowerShell to copy that file from one of the containers and insert it into the other.
Does anyone know what command(s) I've got to type in there? (command, src-format, dest-format)
Thanks in advance
PS:
cp https://...blob... https://...blob...
returns "cannot stat 'https://...blob...': no such file or directory"
Glad that @T1B resolved the issue. Thank you @holger for the workaround that helped fix it. Posting the discussion and a few points here so that they will be beneficial for other community members.
To copy files between containers, we can use the below command after azcopy login, as mentioned in this MICROSOFT DOCUMENT:
azcopy copy 'https://staccount.blob.core.windows.net/test1/Stack Overflow.xlsx' 'https://destStaccount.blob.core.windows.net/test2/Stack Overflow.xlsx' --recursive
To do the above, make sure you have sufficient permissions on the storage account, such as the Storage Blob Data Contributor or Owner role.
For more information, please refer to this similar SO THREAD: How to copy files from one container to other containers so they fit equally in all destination containers according to size using PowerShell.
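If azcopy or PowerShell is not a hard requirement, a hedged sketch of the same copy with the azure-storage-blob Python SDK (v12); the connection string is a placeholder and the container/blob names are taken from the example above:

# Server-side copy of a blob from one container to another.
from azure.storage.blob import BlobServiceClient
service = BlobServiceClient.from_connection_string("<connection-string>")  # placeholder
src = service.get_blob_client(container="test1", blob="Stack Overflow.xlsx")
dst = service.get_blob_client(container="test2", blob="Stack Overflow.xlsx")
# If the source lives in a different storage account, append a SAS token to src.url.
dst.start_copy_from_url(src.url)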

How to download the latest file in Azure Storage to a local system using azcopy

I am new to Azure Storage. In Azure Storage I have a container, and inside the container I have multiple directories and subdirectories.
A subdirectory contains multiple files. I need to download the latest file from the subdirectory.
As of now I use the command:
azcopy cp "https://storageforecast.blob.core.windows.net/test/pollo/pollo1/pollo2/?si=plus&sv=2019-12-12&srMAVZhkpCwrXs1" "E:\111" --recursive
test - container
pollo - directory
pollo1 - subdirectory1
pollo2 - subdirectory2
I have multiple files inside pollo2 and I need to download the latest one. How can I do that? Can someone please help me?
If you aren't explicitly looking for a command-line solution, you can download and install Azure Storage Explorer and connect to your storage accounts. The explorer gives you the option to order by Last Modified Date. You can simply right-click and download the blobs in your containers after ordering by Last Modified Date.
Link to download Azure Storage Explorer: https://azure.microsoft.com/en-us/features/storage-explorer/
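If a scripted approach is still preferred, here is a hedged Python sketch using the azure-storage-blob v12 SDK instead of azcopy; the SAS token is a placeholder and the container/paths follow the question:

# Find the most recently modified blob under pollo/pollo1/pollo2/ and download it to E:\111.
import os
from azure.storage.blob import ContainerClient
container = ContainerClient.from_container_url(
    "https://storageforecast.blob.core.windows.net/test?<sas-token>"  # placeholder SAS
)
blobs = container.list_blobs(name_starts_with="pollo/pollo1/pollo2/")
latest = max(blobs, key=lambda b: b.last_modified)  # newest by Last Modified
local_path = os.path.join(r"E:\111", latest.name.split("/")[-1])
with open(local_path, "wb") as f:
    f.write(container.download_blob(latest.name).readall())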

YAML configuration to mount an Azure Blob container share

How do I configure an Azure Blob Storage container in YAML?
- name: scripts-file-share
  azureFile:
    secretName: dev-blobstorage-secret
    shareName: logs
    readOnly: false
The above configures the logs file share in YAML.
But what if I need to mount a blob container? How do I configure that?
Instead of azureFile, do I need to use azureBlob?
And what configuration do I need under azureBlob? Please help.
After the responses I got on the above post, and after going through articles online, I see there is no option to mount Azure Blob storage on Azure AKS for my problem, except to use azcopy or REST API integration, considering the limitations I have in my environment.
So, after a little research and taking references from the articles below, I was able to create a Docker image.
1.) Created the Docker image per the reference article. But I also needed to run a bash script, since I run the azcopy command from a bash file, so I copied the azcopy tool to /usr/bin.
2.) Created SAS tokens for the Azure File Share and Azure Blob. (Make sure you give only the required access permissions.)
3.) Created a bash file that runs the below command (a Python sketch of this copy is shown after these steps):
azcopy copy <FileShareSASTokenConnectionUrl> <BlobSASTokenConnectionUrl> --recursive=true
4.) Created a deployment YAML that runs on AKS and added the command to run the bash file to it.
This gave me the ability to copy files from Azure File Share folders to an Azure Blob container.
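For reference, a hypothetical Python equivalent of the copy in step 3, using the azure-storage-blob SDK to do a server-side copy of a single file from the file share into the blob container; every URL below is a placeholder:

# The Copy Blob operation accepts an Azure Files URL (with SAS) as the source,
# which is essentially what azcopy does for this transfer.
from azure.storage.blob import BlobClient
source_file_url = "https://<account>.file.core.windows.net/<share>/<path>/<file>?<file-sas>"
dest_blob = BlobClient.from_blob_url(
    "https://<account>.blob.core.windows.net/<container>/<file>?<blob-sas>"
)
dest_blob.start_copy_from_url(source_file_url)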
References:
1.) https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#obtain-a-static-download-link
2.) https://github.com/Azure/azure-storage-azcopy/issues/423

How to ship Airflow logs to Azure Blob Store

I'm having trouble following this guide, section 3.6.5.3, "Writing Logs to Azure Blob Storage".
The documentation states you need an active hook to Azure Blob Storage. I'm not sure how to create this. Some sources say you need to create the hook in the UI, and some say you can use an environment variable. Either way, none of my logs are getting written to blob storage and I'm at my wit's end.
An Azure Blob Store hook (or any hook, for that matter) tells Airflow how to write to Azure Blob Store. This is already included in recent versions of Airflow as wasb_hook.
You will need to make sure that the hook is able to write to Azure Blob Store, and that the REMOTE_BASE_LOG_FOLDER container is named like wasb-xxx. Once you take care of these two things, the instructions work without a hitch.
I achieved writing logs to blob storage using the steps below:
Create a folder named config inside the airflow folder
Create empty __init__.py and log_config.py files inside the config folder
Search for airflow_local_settings.py on your machine
/home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.py
/home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.pyc
Run
cp /home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.py config/log_config.py
Edit the airflow.cfg [core] section:
remote_logging = True
remote_log_conn_id = log_sync
remote_base_log_folder = wasb://airflow-logs@storage-account.blob.core.windows.net/logs/
logging_config_class = log_config.DEFAULT_LOGGING_CONFIG
Add a log_sync connection object (a programmatic sketch is shown after these steps)
Install the Airflow Azure dependency:
pip install apache-airflow[azure]
Restart the webserver and scheduler
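A hypothetical sketch of creating the log_sync connection programmatically instead of through the Admin UI; the account name and key are placeholders, and the exact fields may vary between Airflow versions:

# Register a `log_sync` connection of type `wasb` so the hook can authenticate.
from airflow import settings
from airflow.models import Connection
conn = Connection(
    conn_id="log_sync",
    conn_type="wasb",
    login="<storage-account-name>",    # placeholder
    password="<storage-account-key>",  # placeholder
)
session = settings.Session()
session.add(conn)
session.commit()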

MarkLogic - Forest data folder & Azure Blob

Technical Stack
MarkLogic 9.0
CentOS Linux
Azure Blob
Blobfuse
To make sure we do not have to worry about data disk size for the MarkLogic forest, we have mounted Azure Blob storage to a folder on the Linux machine, so disk size is not a concern.
There are a few things I noticed:
Need to create a folder in Linux
Create a folder and point it to the above folder
Then configure Blobfuse, otherwise we get permission denied while creating the forest
Use the below command to give permission to all:
chmod 777 -R
Now when we started importing using MarkLogic Content Pump (MLCP), we got:
19/03/15 17:01:19 ERROR mapreduce.ContentWriter: SVC-FILSTAT: File status error: stat64 '/mnt/mycontainer/Forests/forest-01/000043e5': Permission denied
So if you look at the image below: we first tried with mycontainer, but as soon as we map it to Azure Blob it does not show green the way azureblob does. We still need to map azureblob to the "azureblob" folder.
It seems I am missing something here. Is it anything to do with Azure Blob security settings?
From my test, when you mount Azure Blob storage on Linux, for example Ubuntu 18.04 (which I'm using), if you want to allow other users to use the mount directory, you can add the parameter -o allow_other when you execute the blobfuse command.
To allow access to all users, you can mount via the option -o allow_other.
Also, I think you should give other users permission through the chown command. For more details, see How to mount Blob storage as a file system with blobfuse.
First I would like to thank Charles for his efforts and extended help on this issue. Thanks, Charles :). I am sure this will help me sometime, somewhere.
I found a link on how to set up MarkLogic on Azure.
On page 27 there are steps for configuring MarkLogic for Azure Blob Storage.
In summary:
Create a storage account in Azure
Create a blob container
Go to the MarkLogic server (http://localhost:8001)
Go to Security -> Credentials
Provide the storage account name and Azure storage key
While creating the MarkLogic forest, mention the container path in the data directory:
azure://mycontainer/mydirectory/myfile
And you are done. No Blobfuse, no drive mount, just a configuration in MarkLogic.
Awesome!!
It's working like a dream :)
