spark-cassandra connector issue - apache-spark

I am using Spark 1.6.2 with Scala 2.10.5.
I have installed Cassandra locally and downloaded spark-cassandra-connector_2.10-1.6.2.jar from https://spark-packages.org/package/datastax/spark-cassandra-connector
But when I try to start the Spark shell with the connector, I get an error.
Can someone please tell me whether I downloaded the wrong version of the connector, or whether there is some other issue?

Put a : between spark-cassandra-connector and 1.6.2 instead of the _, and remove the ; character after the connector version:
spark-shell --packages datastax:spark-cassandra-connector:1.6.2-s_2.10
It is better, though, to use the latest 1.6.x release (1.6.11) instead of 1.6.2.
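The value given to --packages is a coordinate of the form group:artifact:version. As a minimal sketch, assuming the Spark Packages naming scheme seen in the command above (connector version followed by -s_ and the Scala version), the pieces can be put together like this. The helper names are hypothetical and not part of Spark or the connector.

```python
import shlex


def connector_coordinate(connector_version, scala_version):
    """Build a group:artifact:version coordinate for --packages."""
    # The Scala binary version goes into the version suffix ("-s_2.10"),
    # mirroring the command shown in the answer.
    return "datastax:spark-cassandra-connector:{}-s_{}".format(
        connector_version, scala_version
    )


def spark_shell_command(connector_version, scala_version):
    """Return the spark-shell invocation as an argument list."""
    return [
        "spark-shell",
        "--packages",
        connector_coordinate(connector_version, scala_version),
    ]


if __name__ == "__main__":
    # Print the command line for the release suggested in the answer.
    args = spark_shell_command("1.6.11", "2.10")
    print(" ".join(shlex.quote(a) for a in args))
```

Note that the separators are colons throughout and that nothing follows the version string.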

Related

pyspark compatible hadoop aws and aws sdk for version 2.4.4

I am trying to read from and write to S3 buckets using PySpark with the help of these two libraries from Maven, https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/2.7.7 and https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk/1.7.4, which are really old. I tried different combinations of hadoop-aws and aws-java-sdk, but none of them works with PySpark 2.4.4. Does anyone know which versions of Hadoop and the AWS Java SDK are compatible with Spark 2.4.4?
I am using the following:
Spark: 2.4.4
Hadoop: 2.7.3
Hadoop-AWS: hadoop-aws-2.7.3.jar
AWS-JAVA-SDK: aws-java-sdk-1.7.3.jar
Scala: 2.11
This works for me; use paths of the form s3a://bucket-name/
Note: for PySpark I used aws-java-sdk-1.7.4.jar, because otherwise I wasn't able to use
df.write.csv(path=path, mode="overwrite", compression="None")
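The combination above can also be set up from inside a PySpark script. The following is a minimal sketch, not a verified recipe. It assumes the hadoop-aws 2.7.3 and aws-java-sdk 1.7.4 pairing reported in this answer, lets Spark resolve the jars through spark.jars.packages instead of copying them by hand, and takes the credentials as plain arguments. The function names are hypothetical.

```python
def s3a_conf(access_key, secret_key):
    """Spark settings for S3A access (versions taken from the answer)."""
    return {
        # Ask Spark to download the jars from Maven at startup.
        "spark.jars.packages": ",".join([
            "org.apache.hadoop:hadoop-aws:2.7.3",
            "com.amazonaws:aws-java-sdk:1.7.4",
        ]),
        # Keys prefixed with "spark.hadoop." are passed on to Hadoop.
        "spark.hadoop.fs.s3a.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
    }


def build_session(conf, app_name="s3a-example"):
    """Create a SparkSession with the given settings (needs pyspark)."""
    # Imported here so that s3a_conf stays usable without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With a session from build_session(s3a_conf(...)), reads and writes would then use s3a:// paths, for example spark.read.csv("s3a://bucket-name/input/") followed by the df.write.csv call shown above.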

How/where to get the source jar of datastax-cassandra-connector-2.3.0 version?

I am new to Cassandra and am trying to connect to it from Spark using the spark-cassandra-connector 2.3.0 jar. When I call saveToCassandra, it fails with an error that points to localhost instead of the Cassandra cluster. I am using Scala IDE, and while debugging it asks for the source jar of spark-cassandra-connector 2.3.0. I can't find it on Maven. Where can I download the source jar?
The sources are available at the URL below:
http://central.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.11/2.3.0/
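In a standard Maven repository layout, the source jar sits in the same directory as the main jar and carries a -sources suffix. As a rough sketch, assuming that layout applies here (the source does not confirm the exact file name, and the function name is hypothetical), the expected URL can be derived from the coordinates:

```python
def sources_jar_url(base, group_id, artifact_id, version):
    """URL of the -sources.jar under a standard Maven repository layout."""
    # Dots in the group id become path segments.
    group_path = group_id.replace(".", "/")
    file_name = "{}-{}-sources.jar".format(artifact_id, version)
    return "/".join(
        [base.rstrip("/"), group_path, artifact_id, version, file_name]
    )


if __name__ == "__main__":
    # Repository host as cited in the answer.
    print(sources_jar_url(
        "http://central.maven.org/maven2",
        "com.datastax.spark",
        "spark-cassandra-connector_2.11",
        "2.3.0",
    ))
```

The downloaded jar can then be attached as the source attachment for the connector library in the IDE.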

Spark 2.2 interpreter on Zeppelin 0.7.0 running on HDP 2.6

I am simply trying to test a Zeppelin interpreter that runs Spark 2.2 on YARN with Zeppelin 0.7.0 (HDP 2.6), but I repeatedly get:
java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
All I am running is
%spark2
sc.version
With the same Spark 2.2 I can run spark-submit and spark-shell operations on YARN (locally and remotely), but I can't make Zeppelin pick up this new version of Spark. Does Zeppelin on HDP only support Spark 2.1 and 1.6? (My Spark 2.2 is a custom installation.)
The only thing that makes me believe this is a line I can see in the logs when testing the Zeppelin notebook:
Added JAR file:/usr/hdp/current/zeppelin-server/interpreter/spark/zeppelin-spark_2.10-0.7.0.2.6.0.3-8.jar
which appears to be an HDP-specific Zeppelin JAR.
Please help.
Yes, you are right. I hit a similar issue while running Zeppelin 0.7.0 and Spark 2.2.0 on Mesos. In fact, have a look at this commit:
https://github.com/apache/zeppelin/commit/28310c2b95785d8b9e63bc0adc5a26df8b3c9dec
The support seems to have been added in 0.7.3, so try upgrading Zeppelin. I built Zeppelin from the master branch and it worked for me, but the tag v0.7.3 should work fine as well.

How to safely remove Spark 2.2.0 and install Spark 2.1.0 instead?

I recently installed Spark 2.2.0 and started configuring it, but unfortunately 2.2.0 does not support the sparklyr library yet. What is the safest way to replace it with 2.1.0?

Spark notebook in Hue 3.11

I am trying to set up the Spark notebook in Hue (version 3.11) with Spark 2.0.0 using Livy 0.2.0.
With Spark 1.6.1 the notebook is working perfectly fine.
Livy only supports Scala 2.10 builds of Spark, so I built Spark 2.0.0 with Scala 2.10.6. When I open spark-shell (2.0.0), it clearly says "Using Scala version 2.10.6".
But the Spark notebook does not work with this build. When I enter 1+1 in the Spark notebook and execute it, it gives the following error.
What could be wrong here? Below is the exception in the logs:
"java.util.concurrent.ExecutionException: com.cloudera.livy.rsc.rpc.RpcException: java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.render(Lorg/json4s/JsonAST$JValue;)Lorg/json4s/JsonAST$JValue;\ncom.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)\ncom.cloudera.livy.repl.ReplDriver$$anonfun$handle$2.apply(ReplDriver.scala:78)\nscala.Option.map(Option.scala:145)\ncom.cloudera.livy.repl.ReplDriver.handle(ReplDriver.scala:78)\nsun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\nsun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)\nsun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\njava.lang.reflect.Method.invoke(Method.java:606)\ncom.cloudera.livy.rsc.rpc.RpcDispatcher.handleCall(RpcDispatcher.java:130)\ncom.cloudera.livy.rsc.rpc.RpcDispatcher.channelRead0(RpcDispatcher.java:77)\nio.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)\nio.netty.handler.codec.ByteToMessageCodec.channelRead(ByteToMessageCodec.java:103)\nio.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)\nio.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)\nio.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)\nio.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)\nio.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)\nio.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)\nio.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)\nio.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)\nio.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)\njava.lang.Thread.run(Thread.java:745)" (error 500)
This solved my problem.
Download the latest Livy code from GitHub and build it with the Maven command below:
mvn clean package -DskipTests -Dspark-2.0 -Dscala-2.11
I'm not sure that this is even possible.
According to the release notes, Hue 3.11 does not work with Spark 2.0 (it works with Spark 1.6).
