Long time installing azureml-sdk[databricks] on Databricks

I am trying to install "azureml-sdk[databricks]" on my cluster in Databricks using PyPI, but it is taking a very long time and the status is always "Pending". I have waited around 2-3 hours, but the package is still not installed.
I can install some other packages easily using the same procedure, as you can see in the picture.
Could anybody tell me what the problem could be?
Thanks!

Yes, you are correct, it seems like a bug. I tried to reproduce the same issue in my environment, and it also took a very long time.
You can reach out to Azure Support or raise a GitHub issue.
Alternatively, you can follow the approach below. I successfully installed azureml-sdk[databricks] using the 10.5 ML runtime version.
Azure Databricks Runtime Version: 10.5 ML (includes Apache Spark 3.2.1, Scala 2.12)
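If it helps, here is a minimal sketch of doing the same install from a notebook on that runtime instead of the cluster Libraries UI (notebook-scoped install; the version print is only a sanity check and not part of the original answer):

```python
# Notebook-scoped install, run in a cell on a Databricks Runtime 10.5 ML cluster.
%pip install azureml-sdk[databricks]

# Sanity check that the SDK is importable once the install finishes.
import azureml.core
print(azureml.core.VERSION)
```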

Related

How to Upgrade Azure Data Explorer Python Plugin Sandbox Anaconda and Python Version?

I'm using the Python Sandbox in Azure Data Explorer to do inference on my data tables.
However, for some of my Python code I need to upgrade the python() sandbox (e.g., my models are TensorFlow 2 models, while the sandbox provides TF 1 with Python 3.6 and Anaconda 5.2).
I was looking online but didn't find any good solution for how to upgrade the Azure Data Explorer Python sandbox Anaconda and Python versions.
I've tried to follow the documentation on the Azure Data Explorer Python Sandbox Policy, but there's no mention of upgrading the Python and Anaconda versions.
Another piece of information I've found is that the dependency versions are listed on the Anaconda page.
If anyone knows of a solution or has any information on how to run this Python sandbox upgrade, that'd be really great.
Or, if it's not possible, can anyone suggest another Azure solution I could use to run inference on my data tables?
Thanks and have a great day!
There is a new version based on Python 3.10.8 + the latest packages. This version is still in preview; in a few weeks it will be GA. Currently this upgrade is not yet self-service, so you should contact ADX support to perform it. You can also email me (adieldar#microsoft.com), specifying your cluster name, and I can take it from there.
thanks,
Adi

Which Spark should I download?

I'm new to Spark and am trying to build a Spark + Hadoop + Hive environment.
I've downloaded the latest version of Hive, and according to the [Version Compatibility] section at https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark%3A+Getting+Started, I should download Spark 2.3.0. On the page https://archive.apache.org/dist/spark/spark-2.3.0/ I found several different packages, such as spark-2.3.0-bin-hadoop2.7.tgz, spark-2.3.0-bin-without-hadoop.tgz, SparkR_2.3.0.tar.gz, and so on.
Now I'm confused! I don't know which version of Spark I need to download. If I download spark-2.3.0-bin-hadoop2.7.tgz, does that mean I don't need to download Hadoop? And what's the difference between SparkR_2.3.0.tar.gz and spark-2.3.0-bin-without-hadoop.tgz?
thanks
You should download the latest version that includes Hadoop, since that's what you want to set up. That would be Spark 3.x, not 2.3.
If you already have a Hadoop environment (HDFS/YARN), download the one without Hadoop.
If you're not going to write R code, don't download the SparkR version.
AFAIK, the "Hive on Spark" execution engine is no longer being worked on. Spark Thrift Server can be used in place of running your own HiveServer.
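To illustrate the first point, here is a minimal smoke test (my own sketch, not part of the answer; the path and data are purely illustrative) showing that a pre-built "bin-hadoopX.Y" package is self-sufficient for local use, with no separate Hadoop installation:

```python
from pyspark.sql import SparkSession

# Local mode: the Hadoop client jars bundled in a "pre-built for Hadoop" package are
# enough to read and write the local filesystem; no HDFS/YARN cluster is required.
spark = SparkSession.builder.master("local[2]").appName("smoke-test").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("/tmp/spark-smoke-test")   # local filesystem path
print(spark.read.parquet("/tmp/spark-smoke-test").count())    # expected output: 2

spark.stop()
```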

Different Spark versions when building from source code vs. using a pre-built version

I have downloaded the Spark source code (branch 2.4) and built the jars using the build instructions for Hadoop 2.7.4. I have also downloaded a pre-built version of Spark 2.4.4 (pre-built for Hadoop 2.7).
When I start spark-shell I see two different versions of Spark, as shown in the pictures below:
In the first picture, the version is 3.0.0 for the jars built from the branch 2.4 source code. The second picture is from the pre-built version available from the Apache Spark website. Not only that, the plans use a RelationV2 node in the first case and a Relation logical node in the second.
Can anyone explain why is there such a difference?
Pretty sure you got mixed up, as 3.0.0 is the default choice when downloading the source or a pre-built version. Maybe I am mistaken, but, as per my comment, carefully check what version you have built.
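If it helps, here is a minimal way to check which version a session is actually running (my own sketch, assuming the Spark build in question is the one on SPARK_HOME/PYTHONPATH; the `_jvm` access is only for the optional Hadoop check):

```python
from pyspark.sql import SparkSession

# Start a throwaway local session against whichever Spark build is on the path.
spark = SparkSession.builder.master("local[1]").appName("version-check").getOrCreate()

print(spark.version)  # should print 2.4.x for a branch-2.4 build, 3.0.0 for a master build
# Optional: the Hadoop client version bundled with this build, via the py4j JVM gateway.
print(spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion())

spark.stop()
```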

Spark integration in KNIME

I am planning to execute Spark from the KNIME Analytics Platform. For this I need to install the KNIME Spark Executor in the KNIME Analytics Platform.
Can anyone please let me know how to install the KNIME Spark Executor in the KNIME Analytics Platform for the Hadoop distribution CDH 5.10.x?
I am referring to the installation guide at the link below:
https://www.knime.org/knime-spark-executor
I was able to successfully configure/integrate Spark in KNIME.
I did it on CDH 5.7.
I followed these steps:
1. Downloaded knime-full_3.3.2.linux.gtk.x86_64.tar.gz.
2. Extract the above-mentioned package and run the KNIME installation.
3. After KNIME is installed, go to File -> Install KNIME Extensions -> Install Big Data extensions (check all the Spark-related extensions and proceed).
Follow this link:
https://tech.knime.org/installation-instructions#download
4. At this point only the Big Data extensions have been installed, but they need a license to be functional.
5. A license needs to be purchased. However, a free 30-day trial is available, after which the license must be purchased.
Follow this link:
https://www.knime.org/knime-spark-executor
6. After the plugins are installed, we need to configure spark-job-server.
For that we need to download the version of spark-job-server compatible with the Hadoop version we have.
Follow this link for the spark-job-server versions and their compatibility:
https://www.knime.org/knime-spark-executor
I'm pretty sure it's as easy as registering for the free trial (and buying the license if you need it for longer than 30 days) and then installing the software from the Help -> Install New Software menu.
As of KNIME 3.6 (the latest version), it should be possible to connect to Spark via Livy, with no specific executor deployment on a KNIME Server. It is still in preview, but it should do the job.
https://www.knime.com/whats-new-in-knime-36

Upgrade Apache Spark version from 1.6 to 2.0

Currently I have Spark version 1.6.2 installed.
I want to upgrade the Spark version to the newest 2.0.1. How do I do this without losing the existing configurations?
Any help would be appreciated.
If it's a Maven or sbt application, you simply change the Spark dependency version and migrate your code to 2.0, so you will not lose your configurations. For the Spark binaries, you can take a backup of the conf folder.
There is not much change related to configuration; some method signatures have changed. The major changes I observed were the mapPartitions method signature and some changes to the metrics/listener API, apart from new features.
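As a rough illustration of the code-side migration (my own sketch, not from the answer; the file path and app name are only examples): in Spark 2.0 the SparkSession became the unified entry point, replacing the separate SQLContext/HiveContext, while the old SparkContext remains reachable from it.

```python
from pyspark.sql import SparkSession

# Spark 2.x style: one SparkSession instead of SparkContext + SQLContext/HiveContext.
# Settings from an existing conf/spark-defaults.conf are still picked up.
spark = (SparkSession.builder
         .appName("upgraded-app")
         .enableHiveSupport()        # only if the 1.6 code used HiveContext
         .getOrCreate())

sc = spark.sparkContext              # the old SparkContext is still available
df = spark.read.json("events.json")  # sqlContext.read.json(...) becomes spark.read.json(...)
df.show()
```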
