org.apache.spark.sql.Row cannot be resolved in Spark 2.0 Preview - apache-spark

I am already using Spark 1.6.1 and am now evaluating Spark 2.0 Preview, but I cannot find org.apache.spark.sql.Row.
I need it because I am migrating my DataFrame code from 1.6.1 to 2.0-preview. Am I missing something here? My Maven dependencies are pasted below:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//jars//spark-core_2.11-2.0.0-preview.jar</systemPath>
</dependency>
<dependency>
<groupId>com.oracle</groupId>
<artifactId>ojdbc7</artifactId>
<version>12.1.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//jars//spark-sql_2.11-2.0.0-preview.jar</systemPath>
</dependency>

In Spark 2.0.0, Row has moved to another JAR file.
Add this to your Maven dependencies:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-catalyst_2.11</artifactId>
<version>2.0.0-preview</version>
<scope>system</scope>
<systemPath>C://spark-2.0.0-preview-bin-hadoop2.7//spark-catalyst_2.11-2.0.0-preview.jar</systemPath>
</dependency>

Use this; it works for me. I moved from Spark 1.6.1 to 2.0.
For Maven:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-tags_2.11</artifactId>
<version>2.0.0-preview</version>
</dependency>
For SBT:
libraryDependencies += "org.apache.spark" % "spark-tags_2.11" % "2.0.0-preview"
For Gradle:
compile group: 'org.apache.spark', name: 'spark-tags_2.11', version: '2.0.0-preview'

Related

Jackson version issue in spark structured streaming

When using Spark Structured Streaming with spark-sql-kafka-0-10_2.11, I was seeing MethodNotFoundErrors. Based on another question, "Cannot run queries in SQLContext from Apache Spark SQL 1.5.2, getting java.lang.NoSuchMethodError",
I tried to explicitly set the Jackson version.
I have tried versions 2.9.6, 2.4.3, and 2.9.0. Version 2.4.3 reports "Jackson version too old". The other versions report:
Caused by: com.fasterxml.jackson.databind.JsonMappingException:
Incompatible Jackson version: 2.9.0
Here is the full stack trace for 2.9.0:
19/05/10 11:30:18 ERROR MicroBatchExecution: Query [id = dbd581ba-42d7-4496-9fde-fe04dab6e7b4, runId = b5b023df-cb39-4048-90dc-e9a57cce4883] terminated with error
java.lang.ExceptionInInitializerError
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2783)
at org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:537)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
at org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
..
at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: com.fasterxml.jackson.databind.JsonMappingException:
Incompatible Jackson version: 2.9.0
at com.fasterxml.jackson.module.scala.JacksonModule$class.setupModule(JacksonModule.scala:64)
at com.fasterxml.jackson.module.scala.DefaultScalaModule.setupModule(DefaultScalaModule.scala:19)
at com.fasterxml.jackson.databind.ObjectMapper.registerModule(ObjectMapper.java:751)
at org.apache.spark.rdd.RDDOperationScope$.<init>(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDDOperationScope$.<clinit>(RDDOperationScope.scala)
Note also that I do have exclusions in place in the pom.xml:
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.databind.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_${scala.binary.version}</artifactId>
<version>${spark.version}</version>
<scope>compile</scope>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
And a similar exclusion for AWS:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.7.4</version>
<exclusions>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
</exclusion>
<exclusion>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
</exclusion>
</exclusions>
</dependency>
Any thoughts on what might fix the jackson versioning issues here?
Found the answer by looking in the $SPARK_HOME/jars directory and searching for jackson-databind:
$ ll *jackson-databind*
-rw-r--r--@ 1 steve staff 1165323 Mar 26 17:13 jackson-databind-2.6.7.1.jar
Updating the pom.xml property to match that version:
<jackson.databind.version>2.6.7</jackson.databind.version>
resolved the issue.
For a Scala project built with SBT, add this to your build.sbt file:
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.6.7"
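Before pinning a version, it can help to confirm which JAR a conflicting class is actually loaded from at runtime. The following is a minimal sketch using only the JDK; the class name `ClassOrigin` and its `locate` method are hypothetical helpers made up for illustration, not part of Spark or Jackson, and the default class name in `main` assumes jackson-databind is on your classpath.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper: reports where a class was loaded from. */
final class ClassOrigin {

    /** Returns the code-source location of the named class, or a short note if it is unavailable. */
    static String locate(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                // Typical for classes that ship with the JDK itself
                return "(no code source)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumption: pass the class to inspect as the first argument
        String name = args.length > 0 ? args[0] : "com.fasterxml.jackson.databind.ObjectMapper";
        System.out.println(name + " -> " + locate(name));
    }
}
```

Running this inside the same application (or with the same classpath) should show whether the class comes from the JAR in $SPARK_HOME/jars or from a different copy pulled in by your build.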

Apache spark 2.3 over Apache HBase 2.0

I need to use the Spark connector over HBase with the following versions:
Spark version: 2.3.1
HBase Version: 2.0.0
I am getting the exception below:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.deploy.SparkHadoopUtil.getCurrentUserCredentials()Lorg/apache/hadoop/security/Credentials;
at org.apache.hadoop.hbase.spark.HBaseContext.<init>(HBaseContext.scala:68)
at org.apache.hadoop.hbase.spark.JavaHBaseContext.<init>(JavaHBaseContext.scala:46)
at com.cloud.databaseroot.hbase.spark.JavaHBaseBulkPutExample.main(JavaHBaseBulkPutExample.java:60)
Snippet from pom.xml:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-spark</artifactId>
<version>2.0.0-alpha4</version>
<exclusions>
<exclusion>
<artifactId>jackson-module-scala_2.10</artifactId>
<groupId>com.fasterxml.jackson.module</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.17.Final</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.module</groupId>
<artifactId>jackson-module-scala_2.11</artifactId>
<version>2.8.8</version>
</dependency>
Please let me know where I am going wrong.
It seems that hbase-spark version 2.0.0-alpha4 is not compatible with Spark 2.3.1.
The SparkHadoopUtil.getCurrentUserCredentials method is only available in Spark versions <= 2.2. Either downgrade Spark, or build hbase-spark against Spark 2.3.1, which may require some code changes in hbase-spark.
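One way to check this kind of mismatch up front is to probe for the method by reflection against the Spark JARs you actually deploy. This is only a sketch using the JDK's reflection API; `MethodProbe` and `hasPublicMethod` are hypothetical names, and the default arguments in `main` assume the method is public and takes no parameters, as the descriptor in the error message suggests.

```java
/** Hypothetical diagnostic helper: checks whether a public method exists on the runtime classpath. */
final class MethodProbe {

    /**
     * Returns true if the class can be loaded and exposes a public method
     * (declared or inherited) with the given name and parameter types.
     */
    static boolean hasPublicMethod(String className, String methodName, Class<?>... parameterTypes) {
        try {
            // initialize=false: do not run static initializers just to inspect the class
            Class<?> cls = Class.forName(className, false, MethodProbe.class.getClassLoader());
            cls.getMethod(methodName, parameterTypes);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: run with the same Spark JARs on the classpath as the failing job
        String cls = "org.apache.spark.deploy.SparkHadoopUtil";
        String method = "getCurrentUserCredentials";
        System.out.println(cls + "." + method + "() present: " + hasPublicMethod(cls, method));
    }
}
```

If the probe reports the method as absent, the connector was compiled against a different Spark API than the one on the classpath, which matches the NoSuchMethodError above.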

NoClassDefFoundError: io/netty/handler/timeout/IdleStateHandler Datastax dse java driver

I am trying to connect to a DSE 5.0 server on Ubuntu (with graph enabled) from my Java code, but I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/handler/timeout/IdleStateHandler
at com.datastax.driver.core.Connection$Initializer.<init>(Connection.java:1409)
at com.datastax.driver.core.Connection.initAsync(Connection.java:144)
at com.datastax.driver.core.Connection$Factory.open(Connection.java:796)
at com.datastax.driver.core.ControlConnection.tryConnect(ControlConnection.java:253)
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:201)
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:79)
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1473)
at com.datastax.driver.core.Cluster.init(Cluster.java:159)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:330)
at com.datastax.driver.core.Cluster.connectAsync(Cluster.java:305)
at com.datastax.driver.core.Cluster.connect(Cluster.java:247)
at com.datastax.driver.core.DelegatingCluster.connect(DelegatingCluster.java:71)
at com.datastax.driver.dse.DseCluster.connect(DseCluster.java:351)
As the error says, the Netty library is probably missing.
I added netty-all to my pom.xml, but I still get the same error.
pom.xml:
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.1-beta1</version>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.1-beta1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/io.netty/netty-all -->
<dependency>
<groupId>io.netty</groupId>
<artifactId>netty-all</artifactId>
<version>4.1.6.Final</version>
</dependency>
Thanks for the help!
The Java driver is built and tested against Netty 4.0 (see JAVA-1241 for 4.1 support). It's possible that some incompatibility prevents this from working (although I do see IdleStateHandler at that path in Netty 4.1).
If you need to use a different version of Netty in your project, consider using the shaded classifier of the driver, which bundles its own copy of Netty under its own package structure. Since you are using the DSE driver, you'll also need to exclude the core driver from its dependency definition (this will be less complicated in the future):
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>3.1.3</version>
<classifier>shaded</classifier>
<!-- Because the shaded JAR uses the original POM, you still need
to exclude this dependency explicitly: -->
<exclusions>
<exclusion>
<groupId>io.netty</groupId>
<artifactId>*</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>dse-driver</artifactId>
<version>1.1.1-beta1</version>
<exclusions>
<exclusion>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
</exclusion>
</exclusions>
</dependency>

What's happening in HyperJaxb3 project?

I believe the Hyperjaxb3 library would be quite useful for my project. After reading descriptions on several sites, I decided to embed it into my Spring-Hibernate project.
I have found a reference to Hyperjaxb3 in https://jaxb.java.net/, which looks pretty official, but the hyperlink - http://confluence.highsource.org/display/HJ3/Home - doesn't open.
I found some old POM examples, included them in my project, tracked down some references to old versions, and tried to eliminate them. However, I now seem to be hitting a dependency on an old Hibernate version. The error looks like this:
java.util.ServiceConfigurationError: com.sun.tools.xjc.Plugin: Provider org.jvnet.hyperjaxb3.hibernate.plugin.HibernatePlugin could not be instantiated: java.lang.NoClassDefFoundError: org/hibernate/type/MutableType
I am wondering whether there is a better Maven entry, whether the project is still alive, and how to use it with a modern Hibernate.
This is my pom excerpt about Hyperjaxb3, where I exclude some outdated links and specify the latest versions of other dependencies:
<dependency>
<groupId>org.glassfish.jaxb</groupId>
<artifactId>jaxb-core</artifactId>
<version>${jaxb-version}</version>
</dependency>
<dependency>
<groupId>org.glassfish.jaxb</groupId>
<artifactId>jaxb-xjc</artifactId>
<version>${jaxb-version}</version>
</dependency>
<!--<dependency>
<groupId>org.jvnet.hyperjaxb3</groupId>
<artifactId>hyperjaxb3</artifactId>
<version>0.6.1</version>
</dependency> -->
<dependency>
<groupId>org.jvnet.hyperjaxb3</groupId>
<artifactId>hyperjaxb3-hibernate-plugin</artifactId>
<version>0.1</version>
<exclusions>
<exclusion>
<groupId>hsqldb</groupId>
<artifactId>hsqldb</artifactId>
</exclusion>
<exclusion>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
</exclusion>
<exclusion>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
</exclusion>
<exclusion>
<groupId>net.sf.saxon</groupId>
<artifactId>saxon</artifactId>
</exclusion>
<exclusion>
<groupId>net.sf.saxon</groupId>
<artifactId>saxon-dom</artifactId>
</exclusion>
<exclusion>
<groupId>org.hibernate</groupId>
<artifactId>hibernate</artifactId>
</exclusion>
<exclusion>
<groupId>org.springframework</groupId>
<artifactId>spring</artifactId>
</exclusion>
<exclusion>
<groupId>com.sun.xml.bind</groupId>
<artifactId>jaxb-xjc</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>asm</groupId>
<artifactId>asm</artifactId>
<version>3.3.1</version>
</dependency>
<dependency>
<groupId>asm</groupId>
<artifactId>asm-attrs</artifactId>
<version>2.2.3</version>
</dependency>
<dependency>
<groupId>cglib</groupId>
<artifactId>cglib</artifactId>
<version>3.2.1</version>
</dependency>
<dependency>
<groupId>commons-beanutils</groupId>
<artifactId>commons-beanutils</artifactId>
<version>1.9.2</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.2</version>
</dependency>
I am not currently trying to generate annotated Hibernate entity classes, only POJOs from the PurchaseOrder example. This is what I currently do:
public void initializeModel(String name, InputStream src, String dir) throws IOException, URISyntaxException {
    dir = Paths.get(new URL(dir).toURI()).toString();
    File directory = new File(dir);
    directory.mkdirs();
    SchemaCompiler sc = XJC.createSchemaCompiler();
    sc.setDefaultPackageName(this.getClass().getPackage().getName() + ".generated");
    InputSource is = new InputSource(src);
    is.setSystemId(name);
    sc.parseSchema(is);
    S2JJAXBModel model = sc.bind();
    JCodeModel codeModel = model.generateCode(null, null);
    CodeWriter cw = new FileCodeWriter(directory);
    codeModel.build(cw);
}
Disclaimer: I'm the author of Hyperjaxb3.
The project is hosted on GitHub:
https://github.com/highsource/hyperjaxb3
The latest version 0.6.1 is functional, works as it should.
However, I don't develop it actively anymore.
Will it work with the current version of hibernate?
Version 0.6.1 was tested with Hibernate 4.1.7. HJ3 is just a code generator that produces standard JPA-annotated classes, so chances are pretty good that it will work with the latest versions of Hibernate.
I just can't get the 0.6.1 jar from Maven. Seems like it was eliminated from maven repositories.
Really? Still there.
http://repo1.maven.org/maven2/org/jvnet/hyperjaxb3/hyperjaxb3-ejb-plugin/0.6.1/
And I don't understand whether I need the "hyperjaxb3-hibernate-plugin" 0.1 from 2011.
You definitely don't.
Or are you aware of any fork or equivalent?
Unfortunately, nothing comes close.

NoClassDefFoundError - datastax java driver for Cassandra

I am currently unable to connect to my cassandra database using the datastax driver. I am getting the following error:
com.datastax.driver.core.TransportException: [/127.0.0.1] Unexpected exception triggered (java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;)
at com.datastax.driver.core.Connection$Dispatcher.exceptionCaught(Connection.java:556)
at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:122)
Caused by: java.lang.NoSuchMethodError: com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;
at com.datastax.driver.core.DataType.<clinit>(DataType.java:144)
at com.datastax.driver.core.Codec.<clinit>(Codec.java:31)
However, I have included the guava artefact in my pom.xml as follows:
<!-- Datastax driver -->
<dependency>
<groupId>com.datastax.cassandra</groupId>
<artifactId>cassandra-driver-core</artifactId>
<version>1.0.4</version>
</dependency>
<!-- Cassandra -->
<dependency>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
<version>1.2.9</version>
</dependency>
<!-- guava -->
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>15.0</version>
</dependency>
Full pom.xml: http://pastebin.ubuntu.com/6358603/
Am I missing a dependency?
According to its POM, version 1.0.4 of cassandra-driver-core uses version 14.0.1 of Guava, not version 15.0. I'm guessing you are seeing a version clash. Even if that version difference is not the cause of this problem, it might cause other problems.
You do not usually need to declare transitive dependencies in POMs; Maven takes care of them for you. Or does your own code use Guava directly?
Based on the advice of this question: no such method error: ImmutableList.copyOf()
I had to exclude the google collections jar:
<dependency>
<groupId>org.zkoss.zk</groupId>
<artifactId>zkspring-core</artifactId>
<version>3.1</version>
<exclusions>
<exclusion>
<groupId>com.google.collections</groupId>
<artifactId>google-collections</artifactId>
</exclusion>
</exclusions>
</dependency>
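To see whether two JARs on the classpath both ship the same class (for example, guava and the older google-collections both contain com.google.common.collect.ImmutableSet), you can list every location the class file is visible from. This is a rough sketch with the JDK only; `DuplicateClassFinder` and `copiesOf` are hypothetical names introduced here for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic helper: lists every classpath location that provides a given class file. */
final class DuplicateClassFinder {

    /** Returns the URLs of all copies of the class file visible to this class's class loader. */
    static List<String> copiesOf(String className) {
        // Class files are looked up as resources, e.g. com/google/common/collect/ImmutableSet.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = DuplicateClassFinder.class.getClassLoader();
        List<String> locations = new ArrayList<>();
        try {
            for (URL url : Collections.list(loader.getResources(resource))) {
                locations.add(url.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return locations;
    }

    public static void main(String[] args) {
        // Assumption: pass the class to inspect as the first argument
        String name = args.length > 0 ? args[0] : "com.google.common.collect.ImmutableSet";
        List<String> copies = copiesOf(name);
        System.out.println(name + " found in " + copies.size() + " location(s):");
        for (String location : copies) {
            System.out.println("  " + location);
        }
    }
}
```

More than one location usually means two artifacts provide the same class, and the one listed first by the class loader is likely the one that wins. That is the situation an exclusion like the one above is meant to resolve.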
