Unable to find CassandraSQLContext - apache-spark

I am using Spark 1.6 prebuilt with Hadoop 2.6
Spark Cassandra Connector 1.6
Cassandra 2.1.12
I wrote a simple Scala program that runs a simple select count(*) query against Cassandra. Here is my code:
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra.CassandraSQLContext
import org.apache.spark.sql._

object Hi {
  def main(args: Array[String]) {
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "172.16.4.196")
    val sc = new SparkContext("spark://naresh-pc:7077", "test", conf)
    val csc = new CassandraSQLContext(sc)
    val rdd1 = csc.sql("SELECT count(*) from cw.testdata")
    println(rdd1.count)
    println(rdd1.first)
  }
}
It builds successfully with sbt assembly and creates the jar. When I submit it using spark-submit, it gives the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/cassandra/CassandraSQLContext
at Hi$.main(trySpark.scala:15)
at Hi.main(trySpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.cassandra.CassandraSQLContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Any help on this? Moreover, when I run it with spark-shell it works fine.
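A likely explanation (not confirmed by the poster's build file) is that the connector classes compile fine but never make it into the assembly jar or onto the driver classpath at submit time; spark-shell works because it is presumably started with the connector jar already on its classpath. As a hedged sketch, assuming sbt-assembly and connector 1.6.0 (adjust versions and the Scala binary version to your setup), the build.sbt would keep Spark itself provided and bundle the connector:

// build.sbt - illustrative sketch, versions are assumptions
scalaVersion := "2.10.6"
libraryDependencies ++= Seq(
  // Spark is supplied by the cluster, so keep it out of the fat jar
  "org.apache.spark" %% "spark-core" % "1.6.1" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.1" % "provided",
  // the connector must NOT be "provided", otherwise sbt assembly leaves it out of the jar
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.6.0"
)

Alternatively, the connector can be pulled in at submit time with spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.10:1.6.0 instead of bundling it.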

Related

pyspark configuration for connecting google cloud platform

I can't connect to Google Cloud Platform via PySpark; can anyone help?
I am not using Dataproc, just a local Spark instance.
Background:
I have downloaded all the jar files into $SPARK_HOME/jars, including:
google-api-client-2.0.0.jar
google-auth-library-credentials-1.12.1.jar
google-auth-library-oauth2-http-1.12.1.jar
google-http-client-1.42.2.jar
gcs-connector-hadoop3-2.2.8.jar
guava-14.0.1.jar
guava-31.1-jre.jar
I am using the Docker image jupyter-notebook.
Code:
from pyspark.sql import SparkSession

builder = SparkSession.builder.appName('GCSFilesRead') \
    .config("google.cloud.auth.service.account.enable", "true") \
    .config("google.cloud.auth.service.account.json.keyfile", "/home/jovyan/work/gcs_admin.json") \
    .config('fs.gs.auth.type', 'SERVICE_ACCOUNT_JSON_KEYFILE')
spark = builder.getOrCreate()

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = '/home/jovyan/work/gcs_admin.json'

spark._jsc.hadoopConfiguration().set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
spark._jsc.hadoopConfiguration().set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")

bucket_name = "mybucket"
path = f"gs://{bucket_name}/my_file.csv"

df = spark.read.option("header", True).csv(path, header=True)
df.show()
Py4JJavaError: An error occurred while calling o59.csv.
: java.lang.NoClassDefFoundError: com/google/api/client/auth/oauth2/Credential
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1012)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:150)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:862)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:760)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:681)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:639)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:467)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2625)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2590)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2686)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3431)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:53)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:537)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:568)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: com.google.api.client.auth.oauth2.Credential
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:520)
... 40 more

Py4JJavaError An error occurred while calling o37.load. : net.snowflake.client.jdbc.SnowflakeSQLException: JDBC driver encountered communication error

I am trying to connect PySpark to Snowflake and keep getting this error.
I am using PySpark 3.1, and the JDBC and Spark-Snowflake jar files are also placed on the classpath. Not sure why I am getting the following error.
Code:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

sc = SparkContext("local", "Test App")
spark = SQLContext(sc)
spark_conf = SparkConf().setMaster('local').setAppName('Testing Spark SF')

sfOptions = {
    "sfURL": "<account_identifier>.snowflakecomputing.com",
    "sfUser": "<user_name>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>"
}

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

df = spark.read.format(SNOWFLAKE_SOURCE_NAME) \
    .options(**sfOptions) \
    .option("query", "select 1 as my_num union all select 2 as my_num") \
    .load()

df.show()
Error:
Py4JJavaError: An error occurred while calling o37.load. :
net.snowflake.client.jdbc.SnowflakeSQLException: JDBC driver encountered communication error. Message: Exception encountered for HTTP request: sun.security.validator.ValidatorException: No trusted certificate found.
at net.snowflake.client.jdbc.RestRequest.execute(RestRequest.java:284)
at net.snowflake.client.core.HttpUtil.executeRequestInternal(HttpUtil.java:639)
at net.snowflake.client.core.HttpUtil.executeRequest(HttpUtil.java:584)
at net.snowflake.client.core.HttpUtil.executeGeneralRequest(HttpUtil.java:551)
at net.snowflake.client.core.SessionUtil.newSession(SessionUtil.java:587)
at net.snowflake.client.core.SessionUtil.openSession(SessionUtil.java:285)
at net.snowflake.client.core.SFSession.open(SFSession.java:446)
at net.snowflake.client.jdbc.DefaultSFConnectionHandler.initialize(DefaultSFConnectionHandler.java:104)
at net.snowflake.client.jdbc.DefaultSFConnectionHandler.initializeConnection(DefaultSFConnectionHandler.java:79)
at net.snowflake.client.jdbc.SnowflakeConnectionV1.initConnectionWithImpl(SnowflakeConnectionV1.java:116)
at net.snowflake.client.jdbc.SnowflakeConnectionV1.<init>(SnowflakeConnectionV1.java:96)
at net.snowflake.client.jdbc.SnowflakeDriver.connect(SnowflakeDriver.java:172)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at net.snowflake.spark.snowflake.JDBCWrapper.getConnector(SnowflakeJDBCWrapper.scala:209)
at net.snowflake.spark.snowflake.SnowflakeRelation.$anonfun$schema$1(SnowflakeRelation.scala:60)
at net.snowflake.spark.snowflake.SnowflakeRelation$$Lambda$866/22415031.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at net.snowflake.spark.snowflake.SnowflakeRelation.schema$lzycompute(SnowflakeRelation.scala:57)
at net.snowflake.spark.snowflake.SnowflakeRelation.schema(SnowflakeRelation.scala:56)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:449)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader$$Lambda$858/5135046.apply(Unknown Source)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: No trusted certificate found
at sun.security.ssl.Alerts.getSSLException(Alerts.java:198)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1958)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:322)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:316)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1526)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:215)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1024)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:954)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1065)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1384)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1412)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1396)
at net.snowflake.client.jdbc.internal.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:436)
It's obvious you have an issue with the SSL certificate. You can override that temporarily:
sfOptions = {
    ...
    "sfSSL": "false",
}
However, check whether you access Snowflake through a proxy. If so, you will need to import the proxy's certificate into the cacerts truststore of the Java version you are running; by default it is located inside the Java home directory under lib/security:
keytool -import -trustcacerts -alias cert_ssl -file proxy.cer -noprompt -storepass changeit -keystore cacerts

Cannot connect to hive through scala in spark sql

I tried to connect to Hive and select some test data using Spark SQL in Scala, but failed. The code I use is as follows:
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

object LoadHive {
  def main(args: Array[String]) {
    if (args.length < 2) {
      println("Usage: [sparkmaster] [tablename]")
      sys.exit(1)
    }
    val master = args(0)
    val tableName = args(1)
    val sc = new SparkContext(master, "LoadHive", System.getenv("SPARK_HOME"))
    val hiveCtx = new HiveContext(sc)
    val input = hiveCtx.sql("FROM src SELECT key, value")
    val data = input.map(_.getInt(0))
    println(data.collect().toList)
  }
}
And the exception log is as follows:
16/11/28 07:25:34 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:204)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:249)
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:327)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:237)
at org.apache.spark.sql.hive.HiveContext.setConf(HiveContext.scala:441)
at org.apache.spark.sql.hive.HiveContext.defaultOverrides(HiveContext.scala:226)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:229)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at com.oreilly.learningsparkexamples.scala.LoadHive$.main(LoadHive.scala:19)
at com.oreilly.learningsparkexamples.scala.LoadHive.main(LoadHive.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
... 26 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
... 32 more
Caused by: java.lang.NoSuchMethodError: org.apache.thrift.EncodingUtils.setBit(BIZ)B
at org.apache.hadoop.hive.metastore.api.PrivilegeGrantInfo.setCreateTimeIsSet(PrivilegeGrantInfo.java:245)
at org.apache.hadoop.hive.metastore.api.PrivilegeGrantInfo.<init>(PrivilegeGrantInfo.java:163)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles_core(HiveMetaStore.java:675)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultRoles(HiveMetaStore.java:645)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:462)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
... 37 more
I am very confused by this. Has anybody run into this issue? Do I need to build Spark with Hive to solve it? I thought Spark already supported Hive. The Spark version I use is 1.6.1. Any help is appreciated!
To verify that Spark is working with Hive, run
./bin/spark-sql --master local[*]
and execute any SQL query (e.g. select count(1) from src;).
Note that Spark has to be built with Hive support and that hive-site.xml is on the classpath.
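Roughly the same check can be scripted from a small Scala program. Here is a minimal sketch, assuming Spark 1.6 and that hive-site.xml is on the application classpath (the class and app names are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveSmokeTest {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveSmokeTest"))
    val hiveCtx = new HiveContext(sc)
    // If this reaches the metastore configured in hive-site.xml, Hive support is wired up
    hiveCtx.sql("show databases").collect().foreach(println)
    sc.stop()
  }
}

If this fails with the same thrift NoSuchMethodError as above, the problem is more likely a conflicting libthrift/Hive version on the classpath than missing Hive support.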

Get a java.lang.LinkageError: ClassCastException when use spark sql hivesql on yarn

This is the driver I upload to yarn-cluster:
package com.baidu.spark.forhivetest

import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.hive._
import org.apache.spark.SparkContext

object ForTest {
  def main(args: Array[String]) {
    val sc = new SparkContext()
    val sqlc = new SQLContext(sc)
    val hivec = new HiveContext(sc)
    hivec.sql("CREATE TABLE IF NOT EXISTS newtest (time TIMESTAMP, word STRING, current_city_name STRING, content_src_name STRING, content_name STRING)")
    val schema = hivec.table("newtest").schema
    println(schema)
  }
}
In the Hive config file I set hive.metastore.uris and hive.metastore.warehouse.dir.
On spark-submit I added these jars:
datanucleus-api-jdo-3.2.6.jar
datanucleus-core-3.2.10.jar
datanucleus-rdbms-3.2.9.jar
Even after adding mysql-connector-java-5.1.38-bin.jar and spark-1.6.0-bin-hadoop2.6/lib/guava-14.0.1.jar, I still get this error!
But when I run this Spark job from the IDE it works successfully!
Hope someone can help me, thanks a lot!
This is the error information:
java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/mnt/hadoop/yarn/local/filecache/18/spark-assembly-1.6.0-hadoop2.6.0.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/mnt/hadoop/yarn/local/filecache/18/spark-assembly-1.6.0-hadoop2.6.0.jar!/javax/ws/rs/ext/RuntimeDelegate.class
at javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:116)
at javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
at javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
at com.sun.jersey.core.header.MediaTypes.<clinit>(MediaTypes.java:64)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:182)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:175)
at com.sun.jersey.core.spi.factory.MessageBodyFactory.init(MessageBodyFactory.java:162)
at com.sun.jersey.api.client.Client.init(Client.java:342)
at com.sun.jersey.api.client.Client.access$000(Client.java:118)
at com.sun.jersey.api.client.Client$1.f(Client.java:191)
at com.sun.jersey.api.client.Client$1.f(Client.java:187)
at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
at com.sun.jersey.api.client.Client.<init>(Client.java:187)
at com.sun.jersey.api.client.Client.<init>(Client.java:170)
at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:268)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.hive.ql.hooks.ATSHook.<init>(ATSHook.java:67)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:374)
at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1347)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:484)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:473)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:268)
at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:473)
at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:463)
at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:605)
at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
at com.baidu.spark.forhivetest.ForTest$.main(ForTest.scala:12)
at com.baidu.spark.forhivetest.ForTest.main(ForTest.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:542)
16/03/22 17:04:32 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/mnt/hadoop/yarn/local/filecache/18/spark-assembly-1.6.0-hadoop2.6.0.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/mnt/hadoop/yarn/local/filecache/18/spark-assembly-1.6.0-hadoop2.6.0.jar!/javax/ws/rs/ext/RuntimeDelegate.class)
This has to do with your class path. Try not to build a fat jar.
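One way to act on that advice with sbt-assembly, as a hedged sketch (coordinates and versions are illustrative, not taken from the question), is to mark everything the YARN cluster already ships as provided, so the assembly stays thin and does not drag in a second copy of classes (such as the javax.ws.rs API) that the cluster already provides:

// build.sbt - illustrative sketch only
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-sql"    % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-hive"   % "1.6.0" % "provided",
  "org.apache.hadoop" % "hadoop-client" % "2.6.0" % "provided"
)

With Spark and Hadoop marked provided, the assembled jar contains only your own classes and genuinely third-party dependencies.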

Spark Tutorial for Avro

I've started with Spark, and my use case is to read an Avro file (the data source) and perform ETL based on rules. As a start I just wanted to try reading the Avro file and creating an RDD. Based on a recommendation in one of the Stack Overflow posts, I tried the following:
import org.apache.avro.generic.GenericRecord
import org.apache.hadoop.mapred.JobConf
import org.apache.spark.{SparkConf, SparkContext}

object abc {
  def main(args: Array[String]): Unit = {
    //val master = Properties.envOrElse("MASTER", args(0))
    val path = args(0)
    val sparkContext = new SparkContext(new SparkConf().setAppName("My-spark-app"))
    val jobConf = new JobConf(sparkContext.hadoopConfiguration)
    val rdd = sparkContext.hadoopFile(
      path,
      classOf[org.apache.avro.mapred.AvroInputFormat[GenericRecord]],
      classOf[org.apache.avro.mapred.AvroWrapper[GenericRecord]],
      classOf[org.apache.hadoop.io.NullWritable],
      10)
    println(rdd.first)
  }
}
My environment is CDH 5.1.3. I'm getting the following error.
15/03/17 08:53:58 INFO YarnClientClusterScheduler: YarnClientClusterScheduler.postStartHook done
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/avro/mapred/AvroInputFormat
at com.scif.afw.abc$.main(abc.scala:30)
at com.scif.afw.abc.main(abc.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.avro.mapred.AvroInputFormat
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
I've built the project using Maven; my POM has the Avro jar, and I can see the class in the jar.
Appreciate any help.
If you are using a YARN cluster, an Avro jar may already be present via yarn.application.classpath. The NoClassDefFoundError could be caused by multiple instances of the same class on the classpath (one from your jar and a second from the default YARN application classpath).
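The poster builds with Maven, but the idea is the same in either build tool. Two hedged options, purely illustrative and not part of the original answer: if the cluster's Avro jar really does contain org.apache.avro.mapred, mark your own Avro dependency as provided so only the cluster's copy is used; or keep bundling Avro but relocate it with sbt-assembly shading so the two copies cannot clash:

// build.sbt - illustrative sketch; versions are assumptions
libraryDependencies += "org.apache.avro" % "avro-mapred" % "1.7.7" % "provided"

// or keep bundling Avro but relocate it inside the assembly jar
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.avro.**" -> "shaded.org.apache.avro.@1").inAll
)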
