How to ship Airflow logs to Azure Blob Storage

I'm having trouble following section 3.6.5.3 of the guide, "Writing Logs to Azure Blob Storage".
The documentation states you need an active hook to Azure Blob Storage. I'm not sure how to create this. Some sources say you need to create the hook in the UI, and some say you can use an environment variable. Either way, none of my logs are getting written to blob storage and I'm at my wit's end.

An Azure Blob Storage hook (or any hook, for that matter) tells Airflow how to write into Azure Blob Storage. Recent versions of Airflow already include one: wasb_hook.
You will need to make sure that the hook is able to write to Azure Blob Storage, and that REMOTE_BASE_LOG_FOLDER starts with wasb (i.e. a value like wasb-xxx), because that prefix is what selects the wasb log handler. Once you take care of these two things, the instructions work without a hitch.
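To verify that the hook can actually write before wiring it into logging, you can run a quick check from a Python shell on the Airflow host. A minimal sketch, assuming the connection id log_sync and container airflow-logs used in the next answer, and the Airflow 1.x import path (adjust both for your setup):
# Sanity check: can the wasb hook write a blob into the log container at all?
from airflow.contrib.hooks.wasb_hook import WasbHook  # on Airflow 2.x: airflow.providers.microsoft.azure.hooks.wasb

hook = WasbHook(wasb_conn_id="log_sync")
hook.load_string("connectivity test",              # tiny payload just to prove write access
                 container_name="airflow-logs",
                 blob_name="connectivity-check.txt")
If this fails, fix the connection before touching the logging config.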

I got logs writing to blob storage using the steps below.
Create a folder named config inside the airflow folder.
Create empty __init__.py and log_config.py files inside the config folder.
Search for airflow_local_settings.py on your machine; you should find something like:
/home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.py
/home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.pyc
Copy it into the config folder as log_config.py:
cp /home/user/env/lib/python2.7/site-packages/airflow/config_templates/airflow_local_settings.py config/log_config.py
Edit the [core] section of airflow.cfg:
remote_logging = True
remote_log_conn_id = log_sync
remote_base_log_folder = wasb://airflow-logs@storage-account.blob.core.windows.net/logs/
logging_config_class = log_config.DEFAULT_LOGGING_CONFIG
Add a log_sync connection object holding your storage account credentials, either in the Airflow UI or through an environment variable (a sketch of the environment-variable form follows these steps).
Install the Airflow Azure dependency:
pip install apache-airflow[azure]
Restart the webserver and scheduler.
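If you prefer the environment-variable route mentioned in the question, Airflow also picks connections up from AIRFLOW_CONN_<CONN_ID> variables, so the log_sync connection can be defined without touching the UI. A minimal sketch, assuming the wasb hook reads the storage account name from the connection login and the access key from the password (which is how the 1.x WasbHook builds its client; verify for your version, and URL-encode the key):
export AIRFLOW_CONN_LOG_SYNC='wasb://<storage-account-name>:<url-encoded-access-key>@'
The variable has to be present in the environment of the scheduler, the workers, and the webserver, since all of them resolve the connection when writing or reading remote logs.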

Access S3 files from Azure Synapse Notebook

Goal:
Move a large number of files from AWS S3 to ADLS Gen2 as fast as possible using Azure Synapse, driving it from a Synapse notebook with a parameterized regex for the filename pattern.
What I tried so far:
I know that to access ADLS Gen2 we can use
mssparkutils.fs.ls('abfss://container_name@storage_account_name.blob.core.windows.net/foldername')
but what is the equivalent for accessing S3?
I used mssparkutils.credentials.getSecret('AKV name','secretname') and mssparkutils.credentials.getSecret('AKV name','secret key id') to fetch the secret details in the Synapse notebook, but I am unable to configure S3 access from Synapse.
Question: do I have to use the existing linked service via the credentials.getFullConnectionString(LinkedService) API?
In short, my question is: how do I configure connectivity to S3 from within a Synapse notebook?
Answering my own question here: AzCopy worked. Below are the links that helped me finish the task. The steps are as follows.
Install AzCopy on your machine.
Go to your terminal, change to the directory where the executable is installed, and run "azcopy login"; sign in with your Azure Active Directory credentials in the browser using the link from the terminal message, and enter the code provided in the terminal.
Authorize with S3 by setting the AWS credentials as environment variables:
set AWS_ACCESS_KEY_ID=
set AWS_SECRET_ACCESS_KEY=
For ADLS Gen2 you are already authorized from step 2.
Use the commands (whichever suits your need) from the links below; an example copy follows them.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-s3
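For example, a single recursive copy from an S3 bucket into the destination account can be sketched as below (placeholder bucket, account, container and folder names; the AWS variables from step 3 authorize the source, the azcopy login from step 2 authorizes the destination, and the second link documents the S3 URL forms AzCopy accepts):
azcopy copy "https://s3.amazonaws.com/<bucket-name>/<folder>" "https://<storage-account>.blob.core.windows.net/<container>/<folder>" --recursive=true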

Delete images from a folder created by a Google Cloud Run API

I have a Flask API running on Google Cloud Run. For the sake of the question, let it be called https://objdetect-bbzurq6giq-as.a.run.app/objdetect.
Using this API, a person uploads an image, the API highlights objects in the image, and then stores the new image in a folder called static. The location of that folder is https://objdetect-bbzurq6giq-as.a.run.app/static/.
Now that I am testing the API on tons of images, the server is running out of capacity. I want to delete all the images from the static folder.
I tried the Python script below, but it didn't work for me; maybe that's not the right solution:
from google.cloud import storage
import os

os.environ["GCLOUD_PROJECT"] = "my-project-1234"
bucket_name = 'https://objdetect-bbzurq6giq-as.a.run.app/objdetect'
directory_name = 'https://objdetect-bbzurq6giq-as.a.run.app/static/'
client = storage.Client()
bucket = client.get_bucket(bucket_name)
# list all objects in the directory
blobs = bucket.list_blobs(prefix=directory_name)
for blob in blobs:
    blob.delete()
Is there a way to achieve this using a python script?
Cloud Run is not Cloud Storage. Use the Linux file system APIs to delete files stored in Cloud Run.
Use the function os.unlink()
import os

path = '/static'
with os.scandir(path) as it:
    for entry in it:
        if entry.is_file():
            os.unlink(os.path.join(path, entry.name))
Cloud Run and Cloud Storage are two different services.
Cloud Run runs your code inside a container, essentially a stateless machine/VM: if the images are created within the container, they are deleted once the container shuts down.
Cloud Storage is a file store, and the files within GCS persist until they are explicitly deleted or removed by a lifecycle rule.
So files created within Cloud Run are not stored in Cloud Storage but on the local filesystem of the container the service runs in. If you want to delete the files inside Cloud Run, you need to delete them from code (Python in your case).
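Since only code running inside the same container can touch those files, one practical option is to add a small cleanup route to the existing Flask app and call it whenever the disk fills up. A minimal sketch, assuming the service writes the processed images into a local static/ directory next to the app file; the /cleanup route name is made up for illustration:
import os
from flask import Flask, jsonify

app = Flask(__name__)
# Assumed output folder; point this at wherever your service actually writes the images.
STATIC_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "static")

@app.route("/cleanup", methods=["POST"])
def cleanup():
    # Delete every regular file in the static folder of this container instance.
    removed = 0
    if os.path.isdir(STATIC_DIR):
        for entry in os.scandir(STATIC_DIR):
            if entry.is_file():
                os.unlink(entry.path)
                removed += 1
    return jsonify({"removed": removed})
Keep in mind that each Cloud Run instance has its own filesystem, so this only clears the instance that happens to serve the request; if you need the output images to be durable or shared, writing them to a Cloud Storage bucket instead is the usual design.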

How to create an empty folder in Azure Blob Storage from Azure Databricks

I have a scenario where I want to list all the folders inside a directory in Azure Blob Storage, and if no folders are present, create a new folder with a certain name.
I am trying to list the folders using dbutils.fs.ls(path).
But the problem with the above command is it fails if the path doesn't exist, which is a valid scenario for me.
If my program runs for the first time the path will not exist and dbutils.fs.ls command will fail.
Is there any way I can handle this scenario dynamically from Databricks?
It will also work for me if I can create an empty folder in Azure Blob from Databricks before executing my job.
I have tried running the below command from a Databricks notebook:
%sh mkdir -p /mnt/<mountName>/path/folderName
The command runs successfully, but even though my Azure Blob container is mounted, it doesn't create the folder.
Sorry for such a long post. Any help is much appreciated. Thanks in advance.
I found that dbutils.fs.mkdirs was able to create a folder on the mounted blob storage:
dbutils.fs.mkdirs("/mnt/<mountName>/path/folderName")
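To handle the "path may not exist yet" case dynamically, the ls call can be wrapped so that a missing path simply triggers the mkdirs. A minimal sketch for a Databricks notebook (dbutils is only available there; the mount point and folder name are the same placeholders as above):
path = "/mnt/<mountName>/path/folderName"   # placeholder mount point and folder name

try:
    folders = dbutils.fs.ls(path)           # raises an exception if the path does not exist yet
except Exception:
    dbutils.fs.mkdirs(path)                 # create the folder on the mounted container
    folders = dbutils.fs.ls(path)           # now returns an empty listing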

YAML configuration to mount an Azure Blob container share

How do I configure an Azure Blob Storage container in YAML?
- name: scripts-file-share
  azureFile:
    secretName: dev-blobstorage-secret
    shareName: logs
    readOnly: false
The above configures the logs file share in YAML.
But what if I need to mount a blob container? How do I configure it?
Instead of azureFile, do I need to use azureBlob?
And what configuration do I need under azureBlob? Please help.
After the responses I got on this post, and after going through articles online, I see there is no option to mount Azure Blob storage on Azure AKS for my problem, given the limitations of my environment, except to use azcopy or a REST API integration.
So, after a little research, and taking the articles below as references, I was able to create a Docker image.
1.) Created the Docker image following the reference article. I also needed to run a bash script, since I run the azcopy command from a bash file, so I copied the azcopy tool to /usr/bin.
2.) Created SAS tokens for the Azure file share and the Azure Blob container. (Make sure you grant only the required access permissions.)
3.) Created a bash file that runs the below command.
azcopy <FileShareSASTokenConnectionUrl> <BlobSASTokenConnectionUrl> --recursive=true
4.) Created a deployment YAML that runs on AKS and added the command to run the bash file to it.
This gave me the ability to copy the files from Azure file share folders to the Azure Blob container.
References:
1.) https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10?toc=%2fazure%2fstorage%2fblobs%2ftoc.json#obtain-a-static-download-link
2.) https://github.com/Azure/azure-storage-azcopy/issues/423

Unable to locate a repository cloned from git using Azure Cloud Shell

I opened Azure Cloud Shell and, once the command prompt was ready, I ran git clone https://github.com/Azure-Samples/python-docs-hello-world and it was cloned successfully. However, I am unable to locate where the cloned files are. I need help with the process of locating them using Azure Cloud Shell.
The Azure Cloud shell stores the files in a file share within a storage account that you either specified or Azure created for you.
When you use basic settings and select only a subscription, Cloud Shell creates three resources on your behalf in the supported region that's nearest to you:
Resource group: cloud-shell-storage-<region>
Storage account: cs<uniqueGuid>
File share: cs-<user>-<domain>-com-<uniqueGuid>
Source.
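That file share is what persists your Cloud Shell files, but to find the clone you don't need to dig into it: git clone puts the repository in a subdirectory of whatever directory you ran it from, which in Cloud Shell is your home directory by default. From the Cloud Shell prompt:
ls ~                            # the cloned repository shows up as a folder here
cd ~/python-docs-hello-world    # move into the clone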
