Azure Synapse Spark LIVY_JOB_STATE_ERROR

I'm experiencing the following error when executing any cell in my notebook:
LIVY_JOB_STATE_ERROR: Livy session has failed. Session state: Killed. Error code: LIVY_JOB_STATE_ERROR. [(my.synapse.spark.pool.name) WorkspaceType: CCID:<(hexcode)>] [Monitoring] Livy Endpoint=[https://hubservice1.westeurope.azuresynapse.net:8001/api/v1.0/publish/8dda5837-2f37-4a5d-97b9-0994b59e17f0]. Livy Id=[3] Job failed during run time with state=[error]. Source: Dependency.
My notebook was working fine until yesterday. The only thing I changed is the Spark pool, which was upgraded from Spark 2.4 to Spark 3.2 (preview). That change was made by a Terraform template deployment. Could this be the source of the issue? If so, how can I prevent it?

The issue was fixed by deleting and recreating my Spark pool via the Azure portal. I'm still not sure which configuration in my Terraform template caused the issue, but at least this fixes the problem for now.

Related

runOutput isn't appearing even after using dbutils.notebook.exit in ADF

I am using the code below to return some information from an Azure Databricks notebook, but runOutput isn't appearing even after the notebook activity completes successfully.
Here is the code I used:
import json
dbutils.notebook.exit(json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name
}))
The Databricks notebook exited properly, but the Notebook activity isn't showing runOutput.
Can someone please help me understand what is wrong here?
When I tried the above in my environment with my linked service configuration, it worked fine for me and the Notebook activity showed the runOutput.
I suggest you try troubleshooting steps such as changing the notebook, trying a new Databricks workspace, or using an existing cluster in the linked service.
If it still gives the same result, it's better to raise a support ticket for your issue.
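As a side note on the notebook code itself, here is a minimal sketch (not the original notebook: the table name is hypothetical, and dbutils/spark are provided by the Databricks runtime). json.dumps raises a TypeError on NumPy-style numeric types (for example a count coming out of a pandas aggregation), and if the exit call never executes, ADF receives no runOutput at all; casting to plain Python types avoids that failure mode.
import json

table_name = "my_db.my_table"                      # hypothetical source table
dest_count = int(spark.table(table_name).count())  # plain int keeps json.dumps happy

payload = json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name,
})

# ADF surfaces exactly this string as the Notebook activity's runOutput,
# readable downstream with an expression like @activity('Notebook1').output.runOutput
# (activity name hypothetical).
dbutils.notebook.exit(payload)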

Unable to upload workspace packages and requirements.txt files on Azure Synapse Analytics Spark pool

When trying to import Python libraries at the Spark pool level by applying an uploaded requirements.txt file and custom packages, I get the following error with no other details:
CreateOrUpdateSparkComputeFailed
Error occured while processing the request
It was working perfectly fine a few days back; the last upload was successful on 12/3/2021.
Also, the SystemReservedJob-LibraryManagement application job is not getting triggered.
Environment Details:
Azure Synapse Analytics
Apache Spark pool - 3.1
We tried the following:
increased the vCore size up to 200
uploaded the same packages to a resource in a different subscription, where it works fine
increased the Spark pool size
Please suggest.
Thank you
Make sure you have the required packages in your requirements.txt.
Before that, we need to check which packages are installed and which are not. You can get the details of all installed packages by running the lines of code below, conclude which packages are missing, and then add them:
import pkg_resources
for d in pkg_resources.working_set:
    print(d)
Install the missing libraries with requirements.txt.
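If it helps, here is a rough sketch (the requirements.txt path and its plain "name==version" line format are assumptions) that diffs the packages already on the pool against the file, so you only add what is actually missing:
# Compare installed packages with a requirements.txt (assumed "name==version" lines).
import pkg_resources

installed = {dist.project_name.lower() for dist in pkg_resources.working_set}

with open("requirements.txt") as f:  # hypothetical path on the driver
    wanted = {
        line.split("==")[0].strip().lower()
        for line in f
        if line.strip() and not line.lstrip().startswith("#")
    }

print("Missing packages:", sorted(wanted - installed))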
I faced a similar use case and found good information and a step-by-step procedure in the MS docs; have a look at it to handle workspace libraries.

ADF V2 Spark Activity with Scala throwing error with error code 2312

Using Azure Data Factory version 2, we have created a Spark activity (a simple Hello World example), but it throws an error with error code 2312.
Our configuration is an HDInsight cluster with Azure Data Lake as primary storage.
We also tried spinning up an HDInsight cluster with Azure Blob Storage as primary storage, and we face the same issue there as well.
We further tried replacing the Scala code with a Python script (a simple Hello World example), but we face the same issue.
Has anyone encountered this issue? Are we missing any basic setting?
Thanks in advance.
Maybe it's too late and you have already solved your issue. However, you can try the below:
Use Azure Databricks. Create a new Databricks instance and run your sample Hello World in a notebook. If it works in the notebook, then call the same notebook from ADF.
Hope it helps.
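For example, a hypothetical notebook as small as this is enough to verify the Databricks-from-ADF path before porting the real Scala/Python job (dbutils is provided by the Databricks runtime):
# Trivial test notebook: print something and return a value ADF can see as runOutput.
print("hello world from Databricks")
dbutils.notebook.exit("hello world")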
@Yogesh, have you tried debugging the issue through ADF using the Debug option? That might help you get the exact root cause. I would also suggest running spark-submit with the jar on a Linux box to find out the exact cause.
Also, you can find more info at https://learn.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide#error-code-2312

Azure ML Workbench File from Blob

When trying to reference/load a dsource or dprep file generated with a data source file from blob storage, I receive the error "No files for given path(s)".
Tested with .py and .ipynb files. Here's the code:
# Use the Azure Machine Learning data source package
from azureml.dataprep import datasource
df = datasource.load_datasource('POS.dsource') #Error generated here
# Remove this line and add code that uses the DataFrame
df.head(10)
Please let me know what other information would be helpful. Thanks!
Encountered the same issue and it took some research to figure out!
Currently, data source files from blob storage are only supported for two cluster types: Azure HDInsight PySpark and Docker (Linux VM) PySpark
In order to get this to work, it's necessary to follow instructions in Configuring Azure Machine Learning Experimentation Service.
I also ran az ml experiment prepare -c <compute_name> to install all dependencies on the cluster before submitting the first command, since that deployment takes quite a bit of time (at least 10 minutes for my D12 v2 cluster).
Got the .py files to run with the HDInsight PySpark compute cluster (for data stored in Azure blobs). But .ipynb files are still not working on my local Jupyter server - the cells never finish.
I'm from the Azure Machine Learning team - sorry you are having issues with Jupyter notebook. Have you tried running the notebook from the CLI? If you run from the CLI you should see the stderr/stdout. The IFrame in WB swallows the actual error messages. This might help you troubleshoot.

When trying to register a UDF using Python I get an error about Spark BUILD with HIVE

Exception: ("You must build Spark with Hive. Export 'SPARK_HIVE=true' and run build/sbt assembly", Py4JJavaError(u'An error occurred while calling None.org.apache.spark.sql.hive.HiveContext.\n', JavaObject id=o54))
This happens whenever I create a UDF on a second notebook in Jupyter on IBM Bluemix Spark as a Service.
If you are using IBM Bluemix Spark as a Service, execute the following command in a cell of the Python notebook:
!rm -rf /gpfs/global_fs01/sym_shared/YPProdSpark/user/spark_tenant_id/notebook/notebooks/metastore_db/*.lck
Replace spark_tenant_id with the actual one. You can find the tenant id using the following command in a cell of the notebook:
!whoami
I've run into these errors as well. Only the first notebook you launch will have access to the Hive context. As explained elsewhere:
By default, Hive(Context) uses embedded Derby as a metastore. It is intended mostly for testing and supports only one active user.
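For context, here is a Spark 1.x-style sketch (the UDF itself is hypothetical) of the pattern that trips over this: registering a UDF forces a HiveContext, and a second notebook then cannot take the lock on the embedded Derby metastore.
# Creating/using a HiveContext touches the embedded Derby metastore under
# metastore_db/, and Derby allows only one active user, so a second notebook
# doing the same thing hits the lock error.
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.types import StringType

sc = SparkContext.getOrCreate()
sqlContext = HiveContext(sc)

# Hypothetical UDF registration; this is the call that initializes Hive support.
sqlContext.registerFunction("to_upper", lambda s: s.upper() if s else None, StringType())
sqlContext.sql("SELECT to_upper('hello')").show()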
