Azure ML Workbench: loading a data source file from Blob storage

When trying to reference or load a .dsource or .dprep file generated from a data source backed by blob storage, I receive the error "No files for given path(s)".
Tested with .py and .ipynb files. Here's the code:
# Use the Azure Machine Learning data source package
from azureml.dataprep import datasource
df = datasource.load_datasource('POS.dsource')  # Error generated here
# Remove this line and add code that uses the DataFrame
df.head(10)
Please let me know what other information would be helpful. Thanks!

I encountered the same issue, and it took some research to figure out!
Currently, data source files from blob storage are only supported for two cluster types: Azure HDInsight PySpark and Docker (Linux VM) PySpark.
To get this to work, you need to follow the instructions in Configuring Azure Machine Learning Experimentation Service.
I also ran az ml experiment prepare -c <compute_name> to install all dependencies on the cluster before submitting the first command, since that deployment takes quite a bit of time (at least 10 minutes for my D12 v2 cluster).
I got the .py files to run on an HDInsight PySpark compute cluster (for data stored in Azure blobs), but .ipynb files are still not working on my local Jupyter server: the cells never finish.

I'm from the Azure Machine Learning team; sorry you are having issues with the Jupyter notebook. Have you tried running the notebook from the CLI? If you run from the CLI you should see stderr/stdout; the IFrame in Workbench swallows the actual error messages. This might help you troubleshoot.

Related

runOutput isn't appearing even after using dbutils.notebook.exit in ADF

I am using the code below to return some information from an Azure Databricks notebook, but runOutput isn't appearing even after the Notebook activity completes successfully.
Code that I used:
import json
dbutils.notebook.exit(json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name
}))
The Databricks notebook exited properly, but the Notebook activity isn't showing runOutput.
Can someone please help me figure out what is wrong here?
When I tried the above in my environment, it worked fine for me.
[Screenshots: linked service configuration and the resulting run output]
I suggest you try troubleshooting steps such as changing the notebook, recreating the Databricks workspace, or using an existing cluster in the linked service.
If it still gives the same result, it's better to raise a support ticket for your issue.
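For reference, dbutils.notebook.exit accepts a single string, which is why the dictionary is serialized with json.dumps. A minimal, self-contained version of the cell above would look roughly like the sketch below; the count and table name are placeholder values, and dbutils is predefined inside a Databricks notebook:

import json

# Placeholder values standing in for the real dest_count / table_name
# computed earlier in the notebook.
dest_count = 42
table_name = "sales_destination"

# dbutils.notebook.exit takes a single string, so the payload is serialized
# to JSON before being returned to the calling ADF pipeline as runOutput.
dbutils.notebook.exit(json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name
}))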

No details in KqlError when I try to use KqlMagic

I'm trying to connect to an Azure Data Explorer cluster, but I keep getting a non-descriptive error. I'm following this tutorial:
https://learn.microsoft.com/en-us/sql/azure-data-studio/notebooks/notebooks-kqlmagic?view=sql-server-ver16
Has anyone seen this?
I was trying to connect to Azure Data Explorer from Azure Machine Learning Studio notebooks. I also tried it in Jupyter notebooks with an Anaconda environment and got the same error.
However, the command %reload_ext Kqlmagic worked for me.
Maybe it's because that Azure login has access to multiple directories?
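For anyone comparing against a working setup, a typical notebook flow looks roughly like the sketch below. The cluster, database, and table names are placeholders, and the exact connection-string syntax can vary by Kqlmagic version:

# Load (or reload) the Kqlmagic extension in the notebook.
%reload_ext Kqlmagic

# Connect to Azure Data Explorer using device-code sign-in; replace the
# placeholder cluster and database names with your own.
%kql azureDataExplorer://code;cluster='<cluster-name>';database='<database-name>'

# Run a simple query to confirm the connection works (replace <TableName>).
%kql <TableName> | take 10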

Unable to upload workspace packages and requirements.txt files to an Azure Synapse Analytics Spark pool

When trying to import Python libraries at the Spark pool level by applying an uploaded requirements.txt file and custom packages, I get the following error with no other details:
CreateOrUpdateSparkComputeFailed
Error occured while processing the request
It was working perfectly fine a few days back; the last successful upload was on 12/3/2021.
Also, the SystemReservedJob-LibraryManagement application job is not getting triggered.
Environment Details:
Azure Synapse Analytics
Apache Spark pool - 3.1
We tried the following:
increased the vCore size up to 200
uploaded the same packages to a resource in a different subscription, where it works fine
increased the Spark pool size
Please suggest a solution.
Thank you
Make sure you have the required packages in your requirements.txt.
Before that, we need to check which packages are installed and which are not. You can list all installed packages by running the lines of code below, identify which packages are missing, and add them:
import pkg_resources
for d in pkg_resources.working_set:
    print(d)
Install the missing libraries via requirements.txt.
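To make that comparison concrete, a short sketch like the following can list which entries of the uploaded requirements.txt are not satisfied in the current session; the file path is an assumption and may differ in your setup:

import pkg_resources

# Read the uploaded requirements file, skipping blanks and comments.
with open("requirements.txt") as f:
    requirements = [
        line.strip() for line in f
        if line.strip() and not line.startswith("#")
    ]

# pkg_resources.require raises if a requirement is missing or mismatched.
missing = []
for requirement in requirements:
    try:
        pkg_resources.require(requirement)
    except (pkg_resources.DistributionNotFound, pkg_resources.VersionConflict):
        missing.append(requirement)

print("Missing or mismatched packages:", missing)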
I faced a similar use case and found good information and a step-by-step procedure in the MS Docs on handling workspace libraries; have a look at it.

Execute Databricks magic commands from the PyCharm IDE

With databricks-connect we can successfully run code written for Databricks or Databricks notebooks from many IDEs. Databricks has also created many magic commands to support multi-language cells, using commands like %sql or %md. One issue I am currently facing when I try to execute Databricks notebooks in PyCharm is as follows:
How do I execute a Databricks-specific magic command from PyCharm?
E.g., importing a script or notebook is done in Databricks using this command:
%run './FILE_TO_IMPORT'
whereas in the IDE, from FILE_TO_IMPORT import XYZ works.
Also, every time I download a Databricks notebook, it comments out the magic commands, which makes them impossible to use anywhere outside the Databricks environment.
It's really inefficient to convert all Databricks magic commands every time I want to do any development.
Is there any configuration I could set that automatically detects Databricks-specific magic commands?
Any solution to this would be helpful. Thanks in advance!
Unfortunately, as of databricks-connect version 6.2.0, we cannot use magic commands outside the Databricks environment directly. This would require creating custom functions, and even then it will only work for Jupyter, not PyCharm.
Also, since importing .py files requires the %run magic command, this becomes a major issue. One solution is to convert the set of files to be imported into a Python package, add it to the cluster via the Databricks UI, and then import and use it in PyCharm, but this is a very tedious process.
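As a workaround, one pattern (a sketch, not an official databricks-connect feature) is to keep the import as plain Python and only rely on %run when the code actually runs inside a Databricks notebook, for example:

# Detect whether the code is running inside a Databricks notebook, where
# dbutils is predefined; fall back to a normal import otherwise.
# FILE_TO_IMPORT and XYZ are the hypothetical module and symbol from the question.
try:
    dbutils  # noqa: F821 - only defined inside a Databricks notebook
    RUNNING_IN_DATABRICKS = True
except NameError:
    RUNNING_IN_DATABRICKS = False

if not RUNNING_IN_DATABRICKS:
    # In PyCharm (or any plain Python environment) a standard import works;
    # inside Databricks the equivalent is %run './FILE_TO_IMPORT'.
    from FILE_TO_IMPORT import XYZ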

ADF V2 Spark Activity with Scala throwing error code 2312

Using Azure Data Factory Version 2, we have created a Spark activity (a simple Hello World example), but it throws an error with error code 2312.
Our configuration is an HDInsight cluster with Azure Data Lake as primary storage.
We also tried spinning up an HDInsight cluster with Azure Blob Storage as primary storage, and we face the same issue there as well.
We further tried replacing the Scala code with a Python script (a simple hello world example), but we face the same issue.
Has anyone encountered this issue? Are we missing any basic setting?
Thanks in advance
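(For context, the kind of minimal PySpark hello-world script described above would look something like the sketch below; this is not the poster's actual code, and the application name is arbitrary.)

# Minimal PySpark "hello world" of the kind referenced in the question.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("HelloWorldSparkActivity").getOrCreate()
print("Hello, World!")
spark.stop()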
Maybe it's too late and you have already solved your issue; however, you can try the following.
Use Azure Databricks: create a new Databricks instance and run your sample hello world in a notebook. If it works in the notebook, then call the same notebook from ADF.
Hope it helps.
@Yogesh, have you tried debugging the issue through ADF using the Debug option shown in the screenshot? That might help you get the exact root cause. I would also suggest trying spark-submit with the jar on the Linux box to find the exact cause.
Also, you can find more info at https://learn.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide#error-code-2312
