Access blob file using time stamp in Azure

I want to access a blob file that is generated by an Azure ML web service along with the ilearner and CSV files. The problem is that the file is generated automatically with a GUID as its name, and no response mentions that the file exists. I know the file is being generated because I can see it in the Azure portal. I would like to access the file automatically, and the only possibility I can see is to use the timestamp of another file created at the same instant. Is there any API or method for accessing blobs created at a particular instant by timestamp instead of by file name?

According to your description, I guess you used the Export Data module.
For your requirements, I recommend replacing Export Data with the Execute Python Script module in Azure Machine Learning, which lets you customize the blob file name.
For an introduction to Execute Python Script, refer to the official documentation here.
Follow these steps to implement it:
Step 1: Use Python virtualenv to create an isolated Python environment (for the specific steps, see https://virtualenv.pypa.io/en/stable/userguide/), then use the pip install command to download the Azure Storage packages.
Compress all of the files in the Lib/site-packages folder into a zip package (I'm calling it azure-storage-package here).
Step 2: Upload the zip package to the Azure Machine Learning workspace as a dataset.
For the specific steps, refer to the Technical Notes.
After the upload succeeds, the package appears in the dataset list; drag it to the third input port of the Execute Python Script module.
Step 3: In the Python script, set the blob file name to the timestamp. You can also append a GUID to the end of the file name to ensure uniqueness.
Here is a simple code snippet:
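import time

import pandas as pd
from azure.storage.blob import BlockBlobService

def azureml_main(dataframe1=None, dataframe2=None):
    myaccount = '****'
    mykey = '****'
    block_blob_service = BlockBlobService(account_name=myaccount, account_key=mykey)
    # Name the blob after the current Unix timestamp, e.g. 1700000000.txt
    blob_name = str(int(time.time())) + '.txt'
    # Arguments: container name, blob name, text content
    block_blob_service.create_blob_from_text('test', blob_name, 'upload image test')
    # Azure ML expects a sequence of DataFrames, hence the trailing comma
    return dataframe1,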
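The naming scheme from Step 3 (timestamp plus a GUID for uniqueness) could be sketched as follows. This is only an illustration: the helper name make_blob_name and the "timestamp-guid" layout are assumptions, not part of any Azure SDK.

```python
import time
import uuid


def make_blob_name(extension=".txt", now=None):
    """Return a name like '1700000000-<32 hex chars>.txt' (hypothetical layout)."""
    # Unix timestamp in whole seconds; `now` can be injected for testing.
    timestamp = int(time.time() if now is None else now)
    # uuid4().hex is a random 32-character hex string, so two blobs
    # written in the same second still get different names.
    return f"{timestamp}-{uuid.uuid4().hex}{extension}"


name = make_blob_name(now=1700000000)
print(name)
```

Because the timestamp comes first, the names sort chronologically, and a later lookup can find a blob by its name prefix.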
Also, you could refer to the SO thread Access Azure blog storage from within an Azure ML experiment.
Hope it helps you.
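If replacing the module is not an option, the idea from the question (finding the blob written at about the same time as a known file) can be approximated on the client side by listing the container and comparing last-modified times. The sketch below works on plain (name, last_modified) pairs so that it runs without an Azure account. The helper name, the 30-second tolerance, and the sample data are assumptions; the comment showing how the pairs might be obtained assumes the legacy BlockBlobService SDK used in the answer.

```python
from datetime import datetime, timedelta, timezone


def blobs_modified_near(blobs, reference_time, tolerance=timedelta(seconds=30)):
    """Return names of blobs whose last-modified time is within
    `tolerance` of `reference_time`.

    `blobs` is an iterable of (name, last_modified) pairs with
    timezone-aware datetimes.
    """
    return [
        name
        for name, last_modified in blobs
        if abs(last_modified - reference_time) <= tolerance
    ]


# Assumption: with the legacy SDK, the pairs could be built as
#   [(b.name, b.properties.last_modified)
#    for b in block_blob_service.list_blobs('test')]
# Hypothetical sample data stands in for the listing here.
listing = [
    ("model.ilearner", datetime(2020, 1, 1, 12, 0, 0, tzinfo=timezone.utc)),
    ("3f2a9c0e-guid", datetime(2020, 1, 1, 12, 0, 5, tzinfo=timezone.utc)),
    ("older.csv", datetime(2019, 12, 31, 8, 0, 0, tzinfo=timezone.utc)),
]
reference = listing[0][1]
matches = blobs_modified_near(listing, reference)
print(matches)
```

The reference file itself is included in the result, so the caller would normally drop the names it already knows.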

Related

Azure Synapse: Upload directory of py files in Spark job reference files

I am trying to pass a whole directory of Python files that are referenced in the main Python file of an Azure Synapse Spark Job Definition, but the files do not appear at the location and I get a ModuleNotFoundError. I am trying to upload them like this:
abfss://[directory path in data lake]/*
You have to trick the Spark job definition by exporting it, editing it as a JSON, and importing it back.
After the export, open the file in a text editor and add the following:
"conf": {
    "spark.submit.pyFiles": "path-to-abfss/module1.zip, path-to-abfss/module2.zip"
},
Now, import the JSON back.
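The zip archives listed under spark.submit.pyFiles have to be built before they are uploaded. A minimal sketch using only the standard library is shown here; the function name zip_python_dir and the demo module names are made up for illustration, and uploading the archive to the abfss path is left out.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_python_dir(source_dir, zip_path):
    """Pack every .py file under source_dir into zip_path, keeping relative paths."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for py_file in sorted(source_dir.rglob("*.py")):
            # Store paths relative to source_dir so imports resolve from the zip root.
            archive.write(py_file, py_file.relative_to(source_dir).as_posix())
    return zip_path


# Demo with a throwaway directory (hypothetical module names).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "helpers").mkdir(parents=True)
    (src / "helpers" / "__init__.py").write_text("")
    (src / "helpers" / "io_utils.py").write_text("def load():\n    return 1\n")
    out = zip_python_dir(src, Path(tmp) / "module1.zip")
    with zipfile.ZipFile(out) as archive:
        names = sorted(archive.namelist())
print(names)
```

Keeping the package folder (here helpers/) at the root of the archive is what lets the main file import it as helpers.io_utils once the zip is on the Python path.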
The way to achieve this on Synapse is to package your Python files into a wheel package and upload the wheel to a specific location in the Azure Data Lake Storage, from which your Spark pool loads it every time it starts. This makes the custom Python packages available to all jobs and notebooks that use that Spark pool.
You can find more details on the official documentation: https://learn.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-manage-python-packages#install-wheel-files

Use Azure Data Factory to copy files and place a csv of files copied

I am trying to implement the following flow in an Azure Data Factory pipeline:
1. Copy files from an SFTP to a local folder.
2. Create a comma-separated file in the local folder with the list of files and their sizes.
The first step was easy enough, using a 'Copy Data' step with 'SFTP' as source and 'File System' as sink.
The files are being copied, but in the output of this step, I don't see any file information.
I also don't see an option to create a file using data from a previous step.
Maybe I'm using the wrong technology?
One of the reasons I'm using Azure Data Factory is the integration runtime, which allows us to have a single fixed IP to connect to the external SFTP (easier firewall configuration).
Is there a way to implement step 2?
Thanks for any insight!
There is no built-in feature to achieve this.
You need to combine ADF with another service. I suggest first using an Azure Function to check the files, and then doing the copy.
The structure should be like this:
In the function, you can get the sizes of the files and save them to a CSV file:
Get the sizes of the files (Python):
How to fetch sizes of all SFTP files in a directory through Paramiko
Save the results as CSV with pandas (Python):
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
Writing a pandas DataFrame to CSV file
A simple HTTP trigger for an Azure Function (Python):
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=python
(Put the processing logic in the body of the Azure Function. You can do almost anything there, apart from graphical interfaces and a few unsupported things, and you can choose the language you are familiar with. In short, though, ADF has no feature that does what you want.)
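As a rough sketch of the "list of files and their sizes" step, the function below writes one CSV row per file. It uses the standard csv module instead of pandas to stay dependency-free, and it expects entries with .filename and .st_size attributes, which is the shape of the objects Paramiko's SFTPClient.listdir_attr() returns. The function name, the column names, and the sample entries are assumptions for illustration.

```python
import csv
import io
from types import SimpleNamespace


def write_size_csv(entries, out):
    """Write one 'filename,size_bytes' row per entry to the text stream `out`.

    Each entry needs `.filename` and `.st_size` attributes.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["filename", "size_bytes"])
    for entry in entries:
        writer.writerow([entry.filename, entry.st_size])


# Hypothetical stand-ins for the attributes an SFTP listing would return.
entries = [
    SimpleNamespace(filename="a.txt", st_size=120),
    SimpleNamespace(filename="report, final.csv", st_size=2048),
]
buffer = io.StringIO()
write_size_csv(entries, buffer)
print(buffer.getvalue())
```

The csv writer quotes file names that contain commas, which a hand-built string join would not do.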

External Properties File in Azure Databricks

We have a full-fledged Spark application that takes a lot of parameters from a properties file. Now we want to move the application to the Azure notebook format. The entire code works fine and gives the expected result with hard-coded parameters. But is it possible to use an external properties file in an Azure Databricks notebook as well? If so, where do we need to place the properties file?
You can use the Databricks DBFS FileStore; Azure Databricks notebooks can access users' files from there.
To upload your properties file, you have two options.
Using wget:
import subprocess
# Download to the driver's local disk, then copy the file into DBFS.
subprocess.run(["wget", "-P", "/tmp/", "http://<your-repo>/<path>/app1.properties"], check=True)
dbutils.fs.cp("file:/tmp/app1.properties", "dbfs:/FileStore/configs/app1/")
Using dbutils.fs.put (this may be a one-time activity to create the file):
dbutils.fs.put("dbfs:/FileStore/configs/app1/app1.properties", "prop1=val1\nprop2=val2")
To import the properties file values:
properties = dict(line.strip().split('=', 1) for line in open('/dbfs/FileStore/configs/app1/app1.properties') if '=' in line)
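A one-line parser like this is fragile when the file contains comments, blank lines, or values with an "=" in them. A slightly more defensive sketch is shown below. The function name and the sample lines are hypothetical, and it does not cover the full Java .properties syntax (for example ":" separators or line continuations).

```python
def parse_properties(lines):
    """Parse simple key=value lines into a dict.

    Skips blank lines and lines starting with '#' or '!', and splits on the
    first '=' only, so values may themselves contain '='.
    """
    properties = {}
    for raw_line in lines:
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        properties[key.strip()] = value.strip()
    return properties


sample = [
    "# connection settings\n",
    "\n",
    "prop1=val1\n",
    "jdbc.url = jdbc:sqlserver://host;database=db\n",
]
props = parse_properties(sample)
print(props)
```

In a notebook, the same function can be given an open file object, for example the file under /dbfs/FileStore/configs/app1/.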
Hope this helps!!
You can also pass arguments to a notebook, and return values from it, through the Databricks Jobs REST API. More information can be found here, for example: https://docs.databricks.com/dev-tools/api/latest/examples.html#jobs-api-example

SAP Commerce Cloud Hot Folder local setup

We are trying to use the cloud hot folder functionality, and to do so we are modifying our existing hot folder implementation, which was not originally built for use in the cloud.
Following the steps on this help page:
https://help.sap.com/viewer/0fa6bcf4736c46f78c248512391eb467/SHIP/en-US/4abf9290a64f43b59fbf35a3d8e5ba4d.html
We are trying to test the cloud functionality locally. I have an Azurite Docker container running on my machine and I have modified the mentioned properties in the local.properties file, but Hybris does not seem to pick up the files in any of the cases we have tried.
First, in our local Azurite storage we have a blob container called hybris. Within this container we have the folders master>hotfolder, and according to the docs, uploading a sample.csv file there should trigger a hot folder import.
We also have a mapping for our hot folder import that scans the files within this folder: #{baseDirectory}/${tenantId}/sample/classifications. {baseDirectory} is configured using a property like so: ${HYBRIS_DATA_DIR}/sample/import
Can we keep these mappings within our hot folder xml definitions, or do we need to change them?
How should the blob container be named in order for it to be accessible to hybris?
Thank you very much,
I would be very happy to provide any further information.
In the end I did manage to run cloud hot folder imports on my local machine.
It was a matter of correctly configuring a number of properties that are used by cloudhotfolder and azurecloudhotfolder extensions.
Simply use the following properties to set the desired behaviour of the system:
cluster.node.groups=integration,yHotfolderCandidate
azure.hotfolder.storage.account.connection-string=DefaultEndpointsProtocol=http;AccountName=devstoreaccount1;AccountKey=Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==;BlobEndpoint=http://127.0.0.1:32770/devstoreaccount1;
azure.hotfolder.storage.container.hotfolder=${tenantId}/your/path/here
cloud.hotfolder.default.mapping.file.name.pattern=^(customer|product|url_media|sampleFilePattern|anotherFileNamePattern)-\\d+.*
cloud.hotfolder.default.images.root.url=http://127.0.0.1:32785/devstoreaccount1/${azure.hotfolder.storage.container.name}/master/path/to/media/folder
cloud.hotfolder.default.mapping.header.catalog=YourProductCatalog
And that is it. If there are existing routings for the traditional hot folder import, these can also be used, but their mappings should be in the value of the cloud.hotfolder.default.mapping.file.name.pattern property.
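To see which file names the pattern above would accept, it can be tried out in isolation. The sketch below uses Python's re module as a stand-in for the Java regex engine (the two should agree for this simple pattern) and assumes the pattern is applied to the bare file name. The sample file names are made up.

```python
import re

# The value of cloud.hotfolder.default.mapping.file.name.pattern, with the
# properties-file escape "\\d" written as the regex "\d".
PATTERN = re.compile(
    r"^(customer|product|url_media|sampleFilePattern|anotherFileNamePattern)-\d+.*"
)

candidates = ["product-001.csv", "customer-42-delta.csv", "sample.csv", "product.csv"]
results = {name: bool(PATTERN.match(name)) for name in candidates}
for name, accepted in results.items():
    print(name, accepted)
```

Under these assumptions a file called sample.csv is not matched, because the pattern requires one of the listed prefixes followed by a hyphen and at least one digit.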
I am trying to do the same: set up a local dev environment to test the cloud hot folder. It seems that you have had some success. Can you say where you found the azurecloudhotfolder extension, which is called out here: https://help.sap.com/viewer/0fa6bcf4736c46f78c248512391eb467/SHIP/en-US/4abf9290a64f43b59fbf35a3d8e5ba4d.html
Thanks

How can we save or upload .py file on dbfs/filestore

We have a few .py files on my local machine that need to be stored on the FileStore path on DBFS. How can I achieve this?
I tried the dbUtils.fs module's copy actions.
I tried the code below but it did not work. I know something is not right with my source path. Or is there a better way of doing this? Please advise.
'''
dbUtils.fs.cp ("c:\\file.py", "dbfs/filestore/file.py")
'''
It sounds like you want to copy a file from your local machine to the DBFS path on the Azure Databricks servers. However, because the Azure Databricks notebook is a browser-based interactive interface running in the cloud, code in it cannot directly access files on your local machine.
So here are some solutions you can try.
1. As @Jon said in the comment, you can follow the official document Databricks CLI to install the Databricks CLI locally with the Python command pip install databricks-cli, and then copy a file to DBFS.
2. Follow the official document Accessing Data to import the files via "Drop files into or browse to files" in the Import & Explore Data box on the landing page, although using the CLI is still recommended.
3. Upload the files to Azure Blob Storage, then follow the official document Data sources / Azure Blob Storage to perform operations such as dbutils.fs.cp.
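For the CLI route, the copy can also be scripted from the local machine. The sketch below only builds and prints the command line for the legacy databricks-cli "databricks fs cp" command; the helper names are hypothetical, and actually running the command requires the CLI to be installed and configured with a workspace token.

```python
import subprocess


def dbfs_copy_command(local_path, dbfs_path, overwrite=False):
    """Build the argument list for `databricks fs cp`."""
    if not dbfs_path.startswith("dbfs:/"):
        raise ValueError("the destination should start with 'dbfs:/'")
    command = ["databricks", "fs", "cp"]
    if overwrite:
        command.append("--overwrite")
    command += [local_path, dbfs_path]
    return command


def copy_to_dbfs(local_path, dbfs_path, overwrite=False):
    """Run the copy; needs the databricks CLI on PATH (not called in this demo)."""
    subprocess.run(dbfs_copy_command(local_path, dbfs_path, overwrite), check=True)


command = dbfs_copy_command(r"c:\file.py", "dbfs:/FileStore/file.py")
print(command)
```

Note that the destination uses the dbfs:/ scheme, unlike the relative "dbfs/filestore/file.py" path in the question.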
Hope it helps.
