ADF V2 Spark activity with Scala fails with error code 2312

Using Azure Data Factory V2, we created a Spark activity (a simple Hello World example), but it fails with error code 2312.
Our configuration is an HDInsight cluster with Azure Data Lake as primary storage.
We also tried spinning up an HDInsight cluster with Azure Blob Storage as primary storage, and we hit the same issue there.
We also tried replacing the Scala code with a Python script (a simple Hello World example), but the issue persists.
Has anyone encountered this issue? Are we missing a basic setting?
Thanks in advance.

Maybe it's too late and you have already solved your issue. However, you can try the following:
Use Azure Databricks. Create a new Databricks instance and run your sample Hello World in a notebook. If it works in the notebook, call the same notebook from ADF.
Hope it helps.

@Yogesh, have you tried debugging the issue in ADF using the Debug option? That might help you find the exact root cause. I would also suggest running spark-submit with the jar directly on the cluster's Linux head node to find out the exact cause.
Also, you can find more info at https://learn.microsoft.com/en-us/azure/data-factory/data-factory-troubleshoot-guide#error-code-2312
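As a rough sketch of the spark-submit suggestion, the helper below assembles the command to run over SSH on the head node. The jar path and class name are hypothetical placeholders. It assumes YARN as the master (as on HDInsight Spark clusters) and defaults to client deploy mode, so the driver's stdout/stderr show up directly in your terminal instead of only in the YARN logs.

```python
import shlex


def build_spark_submit(app_jar, main_class, master="yarn", deploy_mode="client", app_args=()):
    """Return the spark-submit argument list for a Scala/Java jar with the given main class."""
    cmd = [
        "spark-submit",
        "--master", master,            # HDInsight Spark clusters schedule jobs through YARN
        "--deploy-mode", deploy_mode,  # "client" keeps the driver (and its output) in your SSH session
        "--class", main_class,         # fully qualified name of the object that defines main()
        app_jar,
    ]
    cmd.extend(app_args)               # arguments passed through to the application's main()
    return cmd


if __name__ == "__main__":
    # Hypothetical jar path and class name; replace them with your own.
    print(shlex.join(build_spark_submit("/tmp/helloworld.jar", "com.example.HelloWorld")))
```

If the job fails here as well, the driver output usually names the underlying problem (missing class, storage access, and so on) more precisely than the ADF error code does.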

Related

Additional columns cause a validation error with an Azure SQL sink in Azure Data Factory

Validation Error
I've got a weird issue where validation fails on 'additional columns' for my Azure SQL data sink, fed from a Blob Storage source, in the Azure Data Factory GUI. No matter how many times we recreate the dataset (or specify another, new dataset), we can't get past this validation issue.
The irony is that we deploy these pipelines from code, and when we run them, we get no errors at all. This issue has made developing pipelines much harder, because we have to do everything in code; we can't use the pipeline publish option.
Here are some screen grabs of the pipeline so you can see the flow (images not preserved): Pipeline; inside copyCustomer: Source, Mapping, Sink.
Any ideas on how to fix this validation error would be greatly appreciated.
For what it's worth, we have recreated the dataset multiple times (clone and new) to rule out the dataset model not being the latest, as documented here: https://learn.microsoft.com/en-us/azure/data-factory/copy-activity-overview#add-additional-columns-during-copy
Sometimes setting the sink table to auto-create makes the validation appear 'fixed', but when we go to publish, it errors out again.
This is expected behavior when your Azure SQL dataset was created a long time ago and still uses an outdated dataset model that does not support additional columns.
As per the official Microsoft documentation, to resolve this issue, follow the error message: create a new Azure SQL dataset and use it as the copy sink.
I followed the error message, created a new dataset, and it works fine for me.
I suspect your sink dataset type is incorrect. I reproduced the same setup at my end, and it works fine. Make sure you create the sink dataset with the Azure SQL Database connector type only.
If it still doesn't work, feel free to share your sink dataset type and connector details, along with screenshots.
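To check a sink dataset without clicking through the GUI, you can inspect its JSON (for example, from the dataset's code view in ADF Studio). The sketch below flags two things: a dataset type other than `AzureSqlTable`, and use of `tableName` without `schema`/`table`. The second check is only a heuristic and an assumption on my part: the Azure SQL Database connector docs describe `tableName` as kept for backward compatibility and recommend `schema` and `table` for new workloads, but this script does not reproduce ADF's own validation. The function name and the sample dataset are hypothetical.

```python
import json


def check_sql_sink_dataset(dataset):
    """Return warnings for an ADF dataset definition intended as an Azure SQL Database sink."""
    warnings = []
    props = dataset.get("properties", {})
    if props.get("type") != "AzureSqlTable":
        warnings.append(f"dataset type is {props.get('type')!r}, expected 'AzureSqlTable'")
    type_props = props.get("typeProperties", {})
    # Heuristic (assumption): 'tableName' without 'schema'/'table' hints at an older dataset;
    # the connector docs keep 'tableName' only for backward compatibility.
    if "tableName" in type_props and not ("schema" in type_props and "table" in type_props):
        warnings.append("uses legacy 'tableName'; consider recreating the dataset")
    return warnings


if __name__ == "__main__":
    # Hypothetical dataset JSON, e.g. copied from the dataset's code view in ADF Studio.
    sample = json.loads("""
    {"name": "CustomerSink",
     "properties": {"type": "AzureSqlTable",
                    "typeProperties": {"tableName": "[dbo].[Customer]"}}}
    """)
    for w in check_sql_sink_dataset(sample):
        print(w)
```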

runOutput isn't appearing even after using dbutils.notebook.exit in ADF

I am using the code below to return some information from an Azure Databricks notebook, but runOutput doesn't appear even after the Notebook activity completes successfully.
Code that I used:
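import json

# dest_count and table_name are computed earlier in the notebook.
# dbutils exists only inside Databricks; exit() ends the run and hands this string back to the caller (here, the ADF Notebook activity).
dbutils.notebook.exit(json.dumps({
    "num_records": dest_count,
    "source_table_name": table_name
}))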
The Databricks notebook exited properly, but the Notebook activity isn't showing runOutput.
Can someone please help me figure out what is wrong here?
When I tried the above in my environment, it worked fine for me with my linked service configuration.
I suggest trying troubleshooting steps such as using a different notebook, switching to a new Databricks workspace, or using an existing cluster in the linked service.
If it still behaves the same, it's better to raise a support ticket for your issue.
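To narrow down whether the value is lost on the notebook side or in how ADF reads it, you can copy the Notebook activity's output JSON from the ADF monitoring view and inspect it. In a pipeline, the documented way to read it is an expression such as `@activity('Notebook1').output.runOutput` (the activity name `Notebook1` is a placeholder). The sketch below is a hypothetical helper, not an ADF API: it builds the exit payload the way the question's notebook does, and reads `runOutput` from an output dict, whether it arrives as a JSON string or as an already-parsed object.

```python
import json


def build_exit_value(dest_count, table_name):
    """Serialize the values the notebook hands to dbutils.notebook.exit()."""
    # int()/str() coerce values json.dumps might reject (e.g. numpy integers).
    return json.dumps({"num_records": int(dest_count), "source_table_name": str(table_name)})


def read_run_output(activity_output):
    """Return runOutput from a Notebook activity's output JSON, or None if it is missing."""
    value = activity_output.get("runOutput")
    if isinstance(value, str):
        try:
            return json.loads(value)  # JSON string returned by the notebook
        except json.JSONDecodeError:
            return value              # plain (non-JSON) string
    return value                      # already an object, or None when absent
```

If `read_run_output` returns None for the output you copied, ADF did not receive a value at all, which points at the notebook run or linked service rather than at the expression used downstream.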

No details in KqlError when I try to use KqlMagic

I'm trying to connect to Azure Data Explorer, but I keep getting a non-descriptive error. I'm following this tutorial:
https://learn.microsoft.com/en-us/sql/azure-data-studio/notebooks/notebooks-kqlmagic?view=sql-server-ver16
Has anyone seen this?
I was trying to connect to Azure Data Explorer from Azure Machine Learning Studio notebooks. I also tried it in Jupyter notebooks with an Anaconda environment and got the same error.
However, the command %reload_ext Kqlmagic worked for me.
Maybe it's because that Azure login has multiple directories?
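If multiple directories are the cause, one thing to try is pinning the tenant in the connection string. The sketch below builds a string in the tutorial's form (`AzureDataExplorer://code;cluster='help';database='Samples'`) and optionally prepends a `tenant` setting. The `tenant` key follows my reading of the Kqlmagic README and should be treated as an assumption; the function name and the tenant value are hypothetical.

```python
def adx_connection_string(cluster, database, tenant=None):
    """Build a Kqlmagic connection string for Azure Data Explorer with device-code sign-in."""
    parts = []
    if tenant:
        # Assumption: pinning the Azure AD tenant avoids signing in to the wrong
        # directory when the account belongs to several.
        parts.append(f"tenant='{tenant}'")
    parts += ["code", f"cluster='{cluster}'", f"database='{database}'"]
    return "AzureDataExplorer://" + ";".join(parts)


if __name__ == "__main__":
    # Public 'help' cluster and 'Samples' database from the tutorial;
    # the tenant value is a hypothetical placeholder.
    print(adx_connection_string("help", "Samples", tenant="contoso.onmicrosoft.com"))
```

In a notebook you would run `%reload_ext Kqlmagic` and then `%kql` followed by the printed string.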

Azure Synapse Spark LIVY_JOB_STATE_ERROR

I'm getting the following error when executing any cell in my notebook:
LIVY_JOB_STATE_ERROR: Livy session has failed. Session state: Killed. Error code: LIVY_JOB_STATE_ERROR. [(my.synapse.spark.pool.name) WorkspaceType: CCID:<(hexcode)>] [Monitoring] Livy Endpoint=[https://hubservice1.westeurope.azuresynapse.net:8001/api/v1.0/publish/8dda5837-2f37-4a5d-97b9-0994b59e17f0]. Livy Id=[3] Job failed during run time with state=[error]. Source: Dependency.
My notebook was working fine until yesterday. The only thing I changed is the Spark pool, which went from Spark 2.4 to Spark 3.2 (preview). That change was made by a Terraform template deployment. Could this be the source of the issue? If so, how can I prevent it?
The issue was fixed by deleting and recreating my Spark pool via the Azure portal. I'm still not sure which configuration in my Terraform template caused the issue, but at least this fixes the problem for now.
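One way to find which setting the Terraform template got wrong is to compare the definition of a Terraform-created pool with a portal-created one. For example, save the output of `az synapse spark pool show --name <pool> --workspace-name <workspace> --resource-group <rg>` for each pool to a JSON file, then diff the two. The sketch below is a generic recursive dict diff; the file names in the usage comment are hypothetical.

```python
import json
import sys


def diff_configs(first, second, path=""):
    """Recursively list the keys whose values differ between two JSON-like dicts."""
    diffs = []
    for key in sorted(set(first) | set(second)):
        where = f"{path}.{key}" if path else key
        if key not in first:
            diffs.append(f"{where}: only in second")
        elif key not in second:
            diffs.append(f"{where}: only in first")
        elif isinstance(first[key], dict) and isinstance(second[key], dict):
            diffs.extend(diff_configs(first[key], second[key], where))
        elif first[key] != second[key]:
            diffs.append(f"{where}: {first[key]!r} != {second[key]!r}")
    return diffs


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diff_pools.py terraform_pool.json portal_pool.json
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        for line in diff_configs(json.load(f1), json.load(f2)):
            print(line)
```

Expect identity fields such as the name and resource ID to differ; the interesting lines are settings like the Spark version, node size, library requirements, or Spark configuration properties.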

Azure ML Workbench File from Blob

When trying to reference/load a dsource or dprep file generated from a data source file in blob storage, I receive the error "No files for given path(s)".
I tested with .py and .ipynb files. Here's the code:
# Use the Azure Machine Learning data source package
from azureml.dataprep import datasource
df = datasource.load_datasource('POS.dsource')  # "No files for given path(s)" is raised here
# Remove this line and add code that uses the DataFrame
df.head(10)
Please let me know what other information would be helpful. Thanks!
I encountered the same issue, and it took some research to figure out!
Currently, data source files from blob storage are supported on only two compute types: Azure HDInsight PySpark and Docker (Linux VM) PySpark.
To get this to work, follow the instructions in Configuring Azure Machine Learning Experimentation Service.
I also ran az ml experiment prepare -c <compute_name> to install all dependencies on the cluster before submitting the first command, since that deployment takes quite a bit of time (at least 10 minutes for my D12 v2 cluster).
I got the .py files to run on an HDInsight PySpark compute cluster (for data stored in Azure blobs), but the .ipynb files still don't work on my local Jupyter server; the cells never finish.
I'm from the Azure Machine Learning team; sorry you are having issues with Jupyter notebooks. Have you tried running the notebook from the CLI? If you run from the CLI, you should see the stderr/stdout. The IFrame in Workbench swallows the actual error messages. This might help you troubleshoot.
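If you script the CLI run, capturing stdout and stderr explicitly makes sure nothing gets swallowed. A minimal sketch with a hypothetical helper: pass it the Workbench command you would otherwise type, for example `["az", "ml", "experiment", "submit", "-c", "<compute_name>", "<your_script>.py"]` as used for .py scripts. Whether the same command accepts an .ipynb file is not confirmed here. The self-check below runs the current Python interpreter so it works without the Azure CLI installed.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run a command and return (exit code, stdout, stderr) instead of hiding the output."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Self-check with the current interpreter; swap in the az command when the
    # Azure ML Workbench CLI is available.
    code, out, err = run_and_capture([sys.executable, "--version"])
    print(code, out or err)
```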
