databricks-connect failed to connect to Databricks cluster runtime 8.4 - databricks

I have the following setup
Databricks Cluster:
8.4 ML (includes Apache Spark 3.1.2, Scala 2.12)
Client side:
databricks==0.2
databricks-cli==0.14.3
databricks-connect==8.1.10 (the latest version as of 2021-08-13)
When I run databricks-connect test to test the connection, I get this error message:
java.lang.IllegalArgumentException:
The cluster is running server version `dbr-8.4` but this client only supports Set(dbr-8.1).
You can find a list of client releases at https://pypi.org/project/databricks-connect/#history, and install the right client version with `pip install -U databricks-connect==<version>`.
For example, to install the latest 5.1 release, use `pip install -U databricks-connect==5.1.*`.
To ignore this error and continue, set DEBUG_IGNORE_VERSION_MISMATCH=1.
According to the error message, I understand there is a compatibility issue between the client and server versions. However, I am already using the latest client.
Does this mean the client doesn't yet support server version 8.4?
Is there any way to circumvent this issue?

Unfortunately, that is the latest version of databricks-connect for DBR 8.x; clients for 8.2-8.4 haven't been released. It may still work if you set DEBUG_IGNORE_VERSION_MISMATCH=1 before running databricks-connect test or other commands such as pyspark, spark-submit, etc.
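For example, in a bash shell (a minimal sketch; on Windows, set the environment variable with your shell's equivalent):
# Bypass the dbr-8.4 vs dbr-8.1 version check for a single command
DEBUG_IGNORE_VERSION_MISMATCH=1 databricks-connect test
# Or export it for the whole session so that pyspark, spark-submit, etc. pick it up too
export DEBUG_IGNORE_VERSION_MISMATCH=1
pyspark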

Related

Problem starting cluster on Azure Databricks with version 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11)

I have configured a cluster on Azure Databricks which uses runtime version 6.4 (includes Apache Spark 2.4.5, Scala 2.11). This version is deprecated.
This cluster is configured to run 2 scripts when it starts and everything is working correctly.
I need to configure the same cluster in another Azure environment. I created everything just like the working cluster, but because that runtime version is deprecated, I configured it with the "new" available version, 6.4 Extended Support (includes Apache Spark 2.4.5, Scala 2.11). With this runtime change, I get an error when the cluster starts and the init scripts run.
LOG INIT_SCRIPTS_FINISHED
The file "dbfs:/FileStore/tables/init_install.sh" is as follows:
apt update -y
apt-get install libav-tools libavcodec-extra -y --fix-missing
I researched the changes between these two runtime versions and found this page:
https://docs.databricks.com/release-notes/runtime/6.4x.html
Can someone help me figure out what I need to change in this file to make it compatible with the new runtime version?
Thanks.
It seems this issue is not related to the runtime version. Since you are creating the cluster in another Azure environment, the corresponding init script location is not available there: DBFS locations are workspace-level, so the same path will not exist in the other Azure environment. That is why this error occurs. Create the init script file in the target Azure environment and attach it to the cluster through the cluster UI.
How to create the init script:
dbutils.fs.put("/FileStore/tables/init_install.sh","""
apt update -y
apt-get install libav-tools libavcodec-extra -y --fix-missing""", True)
Then update the init script location in the cluster configuration:
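For example, with the Databricks CLI configured against the new workspace, you can confirm the script now exists and see roughly what the cluster's init-script entry should reference (a hedged sketch; the path matches the dbutils.fs.put call above):
# Verify the init script exists in the new workspace's DBFS
databricks fs ls dbfs:/FileStore/tables/
# In the cluster configuration (UI: Advanced Options > Init Scripts, or the Clusters API),
# the entry should point at the same path, roughly:
#   "init_scripts": [ { "dbfs": { "destination": "dbfs:/FileStore/tables/init_install.sh" } } ]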

spark-cassandra connector issue

I am using Spark 1.6.2 with Scala version 2.10.5.
I have installed Cassandra locally and downloaded spark-cassandra-connector_2.10-1.6.2.jar from https://spark-packages.org/package/datastax/spark-cassandra-connector
But when I try to fire up the Spark shell with the connector, I get this error.
Can someone please tell me whether I downloaded the wrong version of the connector, or whether there is some other issue?
Just put : between spark-cassandra-connector and 1.6.2 instead of _, and remove the ; character after the connector version:
spark-shell --packages datastax:spark-cassandra-connector:1.6.2-s_2.10
But it's better to use the latest 1.6.x release, 1.6.11, instead of 1.6.2.
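For example, to start the shell against a local Cassandra node (a sketch: 127.0.0.1 and the Maven-coordinate form of the 1.6.11 release are assumptions; adjust to your setup):
# Resolve the connector from Maven Central and point it at the local Cassandra node
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.11 \
  --conf spark.cassandra.connection.host=127.0.0.1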

How to connect Zeppelin to Spark 1.5 built from the sources?

I pulled the latest source from the Spark repository and built locally. It works great from an interactive shell like spark-shell or spark-sql.
Now I want to connect Zeppelin to my Spark 1.5, according to this install manual. I published the custom Spark build to the local Maven repository and set the custom Spark version in the Zeppelin build command. The build process finished successfully, but when I try to run basic things like sc inside a notebook, it throws:
akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
Version 2.3.4 is set in pom.xml and spark/pom.xml, but simply changing them won’t even let me get a build.
If I rebuild Zeppelin with the standard -Dspark.version=1.4.1, everything works.
Update 2016-01
Spark 1.6 support has landed to master and is available under -Pspark-1.6 profile.
Update 2015-09
Spark 1.5 support has landed to master and is available under -Pspark-1.5 profile.
Work on supporting Spark 1.5 in Apache Zeppelin (incubating) was done under PR apache/incubator-zeppelin#269, which will land in master soon.
For now, building from the Spark_1.5 branch with -Pspark-1.5 should do the trick.
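A hedged sketch of such a build (the spark.version value is a placeholder for whatever version you published to your local Maven repository; Hadoop profiles and flags may need adjusting for your environment):
# Build Zeppelin with the Spark 1.5 profile against a locally published Spark build
mvn clean package -Pspark-1.5 -Dspark.version=1.5.0-SNAPSHOT -DskipTests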

Do I need Hadoop on my Windows machine to connect to HBase running on Linux?

Do I need Hadoop on my Windows machine to connect to HBase running on Ubuntu with Hadoop?
My HBase is running fine on my Ubuntu machine, and I am able to connect with Eclipse on the same machine (I am using Kundera to connect to HBase). Now I want to connect to HBase from my Windows 7 Eclipse IDE. Do I need to install Hadoop on Windows to connect to the remote HBase on Ubuntu? When I tried, I got something like this:
Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
All you need are the Hadoop and HBase jars, plus a Configuration object initialized with:
1. hbase.zookeeper.quorum (the ZooKeeper quorum of the remote cluster) and other connection details
2. hbase.zookeeper.property.clientPort
3. zookeeper.znode.parent
Then get a connection with the above config object, as sketched below.
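A minimal Scala sketch of that setup (the hostname is a placeholder for the Ubuntu machine running ZooKeeper; only the HBase/Hadoop client jars need to be on the Windows classpath):
// Point the client at the remote HBase's ZooKeeper; no local Hadoop installation is required for this
import org.apache.hadoop.hbase.HBaseConfiguration

val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "ubuntu-host")        // host(s) running ZooKeeper
conf.set("hbase.zookeeper.property.clientPort", "2181")  // 2181 is the default client port
conf.set("zookeeper.znode.parent", "/hbase")             // default parent znode
// pass `conf` (or the equivalent Kundera properties) when creating your connection or table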
This problem usually occurs with Hadoop 2.x. One option is to build a Windows distribution of Hadoop.
Refer to this link:
http://www.srccodes.com/p/article/38/build-install-configure-run-apache-hadoop-2.2.0-microsoft-windows-os
But before building, try the zip file given in this link:
http://www.srccodes.com/p/article/39/error-util-shell-failed-locate-winutils-binary-hadoop-binary-path
Extract the zip file and copy the files under hadoop-common-2.2.0/bin into your $HADOOP_HOME/bin directory.
Note: for me this works even with Hadoop 2.5.

Error - Apache Cassandra 2.0.5 & Datastax OpsCenter 4.1

I am using Cassandra 2.0.5 on CentOS 6.5, and OpsCenter 4 worked fine until I updated OpsCenter to version 4.1. I access the OpsCenter page, click on "manage existing cluster" and give the IP address of my node (127.0.0.1), and it gives me the following: "Error creating cluster: max() arg is an empty sequence".
Any clues?
The bug is in 4.1.0 and affects those running Python 2.6. The complete fix is 4.1.1 (http://www.datastax.com/dev/blog/opscenter-4-1-1-now-available). To work around the issue on 4.1.0, disable the auto-update feature and manually re-populate the latest definitions; this only needs to be done once. It isn't needed on 4.1.1, and upgrading is the best fix. See the known issues section of the release notes (http://www.datastax.com/documentation/opscenter/4.1/opsc/release_notes/opscReleaseNotes410.html).
Add the following to opscenterd.conf to disable auto-update:
[definitions]
auto_update = False
Manually download the definition files
for tarball installs:
cd ./conf/definitions
for package installs:
cd /etc/opscenter/definitions
Apply the latest definitions
curl https://opscenter.datastax.com/definitions/4.1.0/definition_files.tgz | tar xz
Restart opscenterd
I just had the same problem as you today. I downloaded an older version of OpsCenter (specifically version 4.0.2) from http://rpm.datastax.com/community/noarch/ and the error was gone.
I am also using the same Cassandra version, also on CentOS.
