Apache Spark 2.3 over Apache HBase 2.0

I need to use the Spark connector over HBase with:
Spark version: 2.3.1
HBase version: 2.0.0
I am getting the exception below:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getCurrentUserCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.hbase.spark.HBaseContext.<init>(HBaseContext.scala:68)
at org.apache.hadoop.hbase.spark.JavaHBaseContext.<init>(JavaHBaseContext.scala:46)
at com.cloud.databaseroot.hbase.spark.JavaHBaseBulkPutExample.main(JavaHBaseBulkPutExample.java:60)
Snippet from pom.xml:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-spark</artifactId>
<version>2.0.0-alpha4</version>
<exclusions>
<exclusion>
<artifactId>jackson-module-scala_2.10</artifactId>
<groupId>com.fasterxml.jackson.module</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.17.Final</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.11</artifactId>
<version>2.8.8</version>
</dependency>
Let me know where I am going wrong.

It seems that hbase-spark 2.0.0-alpha4 is not compatible with Spark 2.3.1.
The SparkHadoopUtil.getCurrentUserCredentials method is only available in Spark versions up to 2.2. Either downgrade Spark, or build hbase-spark against Spark 2.3.1, which may require some changes to its code.
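To confirm which side of that boundary a given classpath is on, a small reflection probe can help. This is only a sketch: `MethodProbe` and `hasPublicMethod` are hypothetical names introduced here, and the class and method names in `main` are copied from the stack trace in the question. Run it with the same classpath as the failing application.

```java
import java.lang.reflect.Method;

/** Reports whether a class on the current classpath exposes a public method with a given name. */
final class MethodProbe {

    static boolean hasPublicMethod(String className, String methodName) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> type = Class.forName(className, false, MethodProbe.class.getClassLoader());
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class is absent, or one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Names taken from the NoSuchMethodError in the question.
        String cls = "org.apache.spark.deploy.SparkHadoopUtil";
        String method = "getCurrentUserCredentials";
        System.out.println(cls + "#" + method + " present: " + hasPublicMethod(cls, method));
    }
}
```

If this prints `false` with Spark on the classpath, the hbase-spark JAR was compiled against a Spark API that the runtime no longer provides.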

Related

Cassandra driver 3.4 is not compatible with Guava 30

We have a standalone Java 8 application that reads from Cassandra tables. The client version we're currently using is 3.4.0. The application should also support reading from Google Cloud Storage, but once we added the GCS dependencies to the pom file we started seeing exceptions when reading from Cassandra. It seems that the 3.4 driver uses Guava 19 while GCS uses Guava 30. Is it possible to make them both live together in the same Java process? Trying to exclude Guava from cassandra-driver-core 3.4 causes the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/util/concurrent/FutureFallback
at com.datastax.driver.core.GuavaCompatibility.selectImplementation(GuavaCompatibility.java:136)
at com.datastax.driver.core.GuavaCompatibility.<clinit>(GuavaCompatibility.java:52)
at com.datastax.driver.core.Cluster.<clinit>(Cluster.java:68)
at com.myorg.infra.cassandra.CassandraConnector.basicBuilder(CassandraConnector.java:32)
at com.myorg.infra.cassandra.CassandraConnector.connect(CassandraConnector.java:61)
at com.myorg.aggregator.cassandra.analytics.repository.CategoryDetailsRepository.main(CategoryDetailsRepository.java:56)
Caused by: java.lang.ClassNotFoundException: com.google.common.util.concurrent.FutureFallback
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
Cassandra dependencies:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.4.0</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-mapping</artifactId>
<version>3.4.0</version>
<exclusions>
<exclusion>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
</exclusion>
</exclusions>
</dependency>
GCS dependencies:
<dependencyManagement>
<dependencies>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>libraries-bom</artifactId>
<version>20.1.0</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependency>
<groupId>com.google.cloud</groupId>
<artifactId>google-cloud-storage</artifactId>
</dependency>
I had a similar issue and found that upgrading the Cassandra driver to 3.11.0 was the best solution. It works with Guava 30 while keeping most of the driver interfaces.
Note that Cassandra driver 4.0+ is not binary compatible, meaning you cannot drop it in and hope it works. It requires a complete rewrite of the application code.
As to your question,
Is it possible to make them both live together in the same Java process?
It's possible with multiple classloaders, but you may not want to do that.
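For a sense of what the classloader route involves, here is a minimal sketch of an isolated loader. `IsolatedLoaderDemo`, `isolatedLoader`, and `canLoad` are hypothetical names; in a real setup the URL array would point at the driver JAR plus the Guava version it needs, which is an assumption and not something shown in the question.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Sketch: a class loader that sees only the JARs it is given, not the application classpath. */
final class IsolatedLoaderDemo {

    /** Builds a loader whose parent is the bootstrap loader, so application JARs are invisible to it. */
    static URLClassLoader isolatedLoader(URL... jars) {
        return new URLClassLoader(jars, null);
    }

    /** Returns true if the given loader can resolve the class name. */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // No JARs are passed here, so only bootstrap (JDK) classes are visible.
        try (URLClassLoader loader = isolatedLoader()) {
            System.out.println("JDK class visible: " + canLoad(loader, "java.lang.String"));
            System.out.println("App class visible: "
                    + canLoad(loader, IsolatedLoaderDemo.class.getName()));
        }
    }
}
```

The cost is that everything loaded this way has to be called through reflection or through interfaces shared via a common parent loader, which is why upgrading the driver is usually the simpler option.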

Jackson version issue in spark structured streaming

When using Spark Structured Streaming with spark-sql-kafka-0-10_2.11 I was seeing NoSuchMethodError exceptions. Based on another question, "Cannot run queries in SQLContext from Apache Spark SQL 1.5.2, getting java.lang.NoSuchMethodError",
I tried to explicitly set the Jackson version.
Versions 2.9.6, 2.4.3, and 2.9.0 have been tried. Version 2.4.3 reports "Jackson version too old". The other versions report
Caused by: com.fasterxml.jackson.databind.JsonMappingException:
Incompatible Jackson version: 2.9.0
Here is the full stack trace for 2.9.0:
19/05/10 11:30:18 ERROR MicroBatchExecution: Query [id = dbd581ba-42d7-4496-9fde-fe04dab6e7b4, runId = b5b023df-cb39-4048-90dc-e9a57cce4883] terminated with error
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
..
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: com.fasterxml.jackson.databind.JsonMappingException:
Incompatible Jackson version: 2.9.0
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
Note also that I do have exclusions in place in the pom.xml:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.databind.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
And a similar exclusion for AWS:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
Any thoughts on what might fix the jackson versioning issues here?
I found the answer by looking in the $SPARK_HOME/jars directory and searching for jackson-databind:
$ ll *jackson-databind*
-rw-r--r--# 1 steve staff 1165323 Mar 26 17:13 jackson-databind-2.6.7.1.jar
Updating the pom.xml to use
<jackson.databind.version>2.6.7</jackson.databind.version>
resolved the issue.
For Scala projects built with sbt, add this to your build.sbt file:
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7"
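The same lookup can be scripted so that the version Spark bundles is read directly from its jars directory. A rough sketch, assuming JAR files follow the usual `artifact-version.jar` naming; `BundledJarVersion` and its methods are hypothetical helper names:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Reads the version of a bundled artifact from JAR file names such as jackson-databind-2.6.7.1.jar. */
final class BundledJarVersion {

    /** Returns the version part of "artifact-version.jar", or empty if the name does not match. */
    static Optional<String> versionFromFileName(String artifact, String fileName) {
        String prefix = artifact + "-";
        String suffix = ".jar";
        if (!fileName.startsWith(prefix) || !fileName.endsWith(suffix)
                || fileName.length() <= prefix.length() + suffix.length()) {
            return Optional.empty();
        }
        String version = fileName.substring(prefix.length(), fileName.length() - suffix.length());
        // Require a leading digit so a longer artifact name is not mistaken for a version.
        return Character.isDigit(version.charAt(0)) ? Optional.of(version) : Optional.empty();
    }

    /** Lists the versions of the artifact found in the given directory. */
    static List<String> versionsIn(Path jarsDir, String artifact) throws IOException {
        List<String> versions = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(jarsDir, artifact + "-*.jar")) {
            for (Path jar : jars) {
                versionFromFileName(artifact, jar.getFileName().toString()).ifPresent(versions::add);
            }
        }
        return versions;
    }

    public static void main(String[] args) throws IOException {
        String sparkHome = System.getenv("SPARK_HOME");
        if (sparkHome == null) {
            System.err.println("SPARK_HOME is not set");
            return;
        }
        System.out.println(versionsIn(Paths.get(sparkHome, "jars"), "jackson-databind"));
    }
}
```

Whatever version this reports is the one to pin in the build, since the Jackson Scala module that Spark ships checks for a matching jackson-databind version at startup.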

NoClassDefFoundError: io/netty/handler/timeout/IdleStateHandler Datastax dse java driver

I am trying to connect to a DSE 5.0 server on Ubuntu (with graph enabled) from my Java code, but I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/handler/timeout/IdleStateHandler
at com.datastax.driver.core.Connection$Initializer.<init>(Connection.java:1409)
at com.datastax.driver.core.Connection.initAsync(Connection.java:144)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:796)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:253)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:201)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1473)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:330)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:305)
at com.datastax.driver.core.Cluster.connect(Cluster.java:247)
at com.datastax.driver.core.DelegatingCluster.connect(DelegatingCluster.java:71)
at com.datastax.driver.dse.DseCluster.connect(DseCluster.java:351)
As the error says, the Netty library is probably missing.
I added netty-all to my pom.xml but still got the same error.
pom.xml:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.1-beta1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.netty/netty-all -->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.6.Final</version>
</dependency>
Thanks for help..!
The Java driver is built and tested against Netty 4.0 (see JAVA-1241 for 4.1 support). It's possible that some incompatibility prevents this from working (although I do see IdleStateHandler at that path in Netty 4.1).
If you need to use a different version of Netty in your project, consider using the shaded classifier of the driver, which bundles its own version of Netty under its own package structure. Since you are using the DSE driver, you'll also need to exclude the core driver from its dependency definition (this will be less complicated in the future):
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.1.3</version>
<classifier>shaded</classifier>
<!-- Because the shaded JAR uses the original POM, you still need
to exclude this dependency explicitly: -->
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.1-beta1</version>
<exclusions>
<exclusion>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
</exclusion>
</exclusions>
</dependency>
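Before and after changing the dependencies, it can be useful to check whether the class named in the error is loadable at all, and from which JAR. A minimal sketch, with `ClassOrigin` and `describe` as hypothetical names and the class name copied from the NoClassDefFoundError above:

```java
import java.security.CodeSource;

/** Prints where a class is loaded from on the current classpath, or why it cannot be loaded. */
final class ClassOrigin {

    static String describe(String className) {
        try {
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader typically report no code source.
            return source == null || source.getLocation() == null
                    ? "loaded from the JDK runtime"
                    : "loaded from " + source.getLocation();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error in the question.
        System.out.println(describe("io.netty.handler.timeout.IdleStateHandler"));
    }
}
```

Run it with the application's runtime classpath. A "not loadable" result means the Netty JAR is not reaching the runtime classpath at all, which points at packaging or launch configuration and not at a version mismatch.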

org.apache.spark.sql.Row cannot be resolved in Spark 2.0 Preview

I am already using Spark 1.6.1 and am now evaluating the Spark 2.0 Preview, but I am not able to find org.apache.spark.sql.Row.
I need it because I am migrating my DataFrame code from 1.6.1 to 2.0-preview. Am I missing something here? My Maven dependencies are pasted below:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//jars//spark-core_2.11-2.0.0-preview.jar</systemPath>
</dependency>
<dependency>
<groupId>com.oracle</groupId>
<artifactId>ojdbc7</artifactId>
<version>12.1.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//jars//spark-sql_2.11-2.0.0-preview.jar</systemPath>
</dependency>
In Spark 2.0.0, Row has moved to another JAR file.
Add this to your Maven dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//jars//spark-catalyst_2.11-2.0.0-preview.jar</systemPath>
</dependency>
Use this; it worked for me when I moved from Spark 1.6.1 to 2.0.
For Maven:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_2.11</artifactId>
<version>2.0.0-preview</version>
</dependency>
For SBT:
libraryDependencies += "org.apache.spark" % "spark-tags_2.11" % "2.0.0-preview"
For Gradle:
compile group: 'org.apache.spark', name: 'spark-tags_2.11', version: '2.0.0-preview'
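When a class seems to have moved between releases, scanning the distribution's JARs shows which one actually contains it. The sketch below assumes a directory of JAR files such as the `jars` folder of the Spark download; `JarClassFinder` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Finds which JARs in a directory contain a given class. */
final class JarClassFinder {

    /** Converts "org.apache.spark.sql.Row" to the JAR entry name "org/apache/spark/sql/Row.class". */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns every *.jar file in dir that has an entry for the class. */
    static List<Path> jarsContaining(Path dir, String className) throws IOException {
        String entry = entryNameFor(className);
        List<Path> hits = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                try (JarFile jarFile = new JarFile(jar.toFile())) {
                    if (jarFile.getEntry(entry) != null) {
                        hits.add(jar);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassFinder <jars-directory> <fully.qualified.ClassName>");
            return;
        }
        jarsContaining(Paths.get(args[0]), args[1]).forEach(System.out::println);
    }
}
```

For the question above, the arguments would be the Spark `jars` directory and `org.apache.spark.sql.Row`; the JAR that is printed is the artifact to declare as a dependency.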

NoClassDefFoundError - datastax java driver for Cassandra

I am currently unable to connect to my cassandra database using the datastax driver. I am getting the following error:
com.datastax.driver.core.TransportException: [/127.0.0.1] Unexpected exception triggered (java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:556)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
Caused by: java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;
at com.datastax.driver.core.DataType.<clinit>(DataType.java:144)
at com.datastax.driver.core.Codec.<clinit>(Codec.java:31)
However, I have included the guava artefact in my pom.xml as follows:
<!-- Datastax driver -->
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>1.0.4</version>
</dependency>
<!-- Cassandra -->
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
<version>1.2.9</version>
</dependency>
<!-- guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
Full pom.xml: http://pastebin.ubuntu.com/6358603/
Am I missing a dependency?
According to its POM, version 1.0.4 of cassandra-driver-core uses Guava 14.0.1, not 15.0. I'm guessing you are seeing a version clash. Even if that version difference is not the cause of this problem, it might cause other problems.
You do not usually need to declare transitive dependencies in your POM; Maven takes care of them for you. Or does your own code use Guava itself?
Based on the advice in the question "no such method error: ImmutableList.copyOf()",
I had to exclude the google-collections JAR:
<dependency>
<groupId>org.zkoss.zk</groupId>
<artifactId>zkspring-core</artifactId>
<version>3.1</version>
<exclusions>
<exclusion>
<groupId>com.google.collections</groupId>
<artifactId>google-collections</artifactId>
</exclusion>
</exclusions>
</dependency>
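A clash like this can be spotted by asking the classloader for every location that provides the class from the error message. This is a sketch only; `DuplicateClassCheck` and `providersOf` are hypothetical names, and the class name in `main` is taken from the NoSuchMethodError in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Lists every classpath location that provides a given class. */
final class DuplicateClassCheck {

    static List<URL> providersOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            // getResources returns all matches, not only the one that wins at load time
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<URL> providers = providersOf("com.google.common.collect.ImmutableSet");
        providers.forEach(System.out::println);
        if (providers.size() > 1) {
            System.out.println("More than one JAR provides this class; the first one listed usually wins.");
        }
    }
}
```

If both a guava JAR and a google-collections JAR show up, the older copy can shadow the newer one, which would explain a missing `copyOf` overload.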

Resources