JMeter 3.3 connecting to Spark 2.2.1 fails with "Cannot create PoolableConnectionFactory (Method not supported)" - apache-spark

Using this list of jars, I can successfully connect SQuirreL SQL to Spark 2.2.1:
commons-logging-1.1.3.jar
hadoop-common-2.7.3.jar
hive-exec-1.2.1.spark2.jar
hive-jdbc-1.2.1.spark2.jar
hive-metastore-1.2.1.spark2.jar
http-client-1.0.4.jar
httpclient-4.5.2.jar
httpcore-4.4.4.jar
libfb303-0.9.3.jar
libthrift-0.9.3.jar
log4j-1.2.17.jar
slf4j-api-1.7.16.jar
slf4j-log4j12-1.7.16.jar
spark-hive-thriftserver_2.11-2.2.1.jar
spark-hive_2.11-2.2.1.jar
spark-network-common_2.11-2.2.1.jar
I think the above jars are more than necessary. But when trying to connect JMeter 3.3 to the same Spark 2.2.1 Thrift Server with them, I get the error message below:
Cannot create PoolableConnectionFactory (Method not supported)
The JDBC configuration is here:
The full response in JMeter is here:
Thread Name: test 1-1
Sample Start: 2018-04-03 13:34:43 CST
Load time: 511
Connect Time: 510
Latency: 0
Size in bytes: 62
Sent bytes:0
Headers size in bytes: 0
Body size in bytes: 62
Sample Count: 1
Error Count: 1
Data type ("text"|"bin"|""): text
Response code: null 0
Response message: java.sql.SQLException: Cannot create PoolableConnectionFactory (Method not supported)
Response headers:
SampleResult fields:
ContentType: text/plain
DataEncoding: UTF-8
I also tried the newer Hive JDBC driver 2.3.0, but it obviously does not work with Spark 2.2.1 either, whether from beeline or anything else, including JMeter.
The error message when using beeline with the Hive JDBC driver 2.3.0 is:
$ beeline -u jdbc:hive2://<hostip>:10000/tpch_sf100_orc -n rxxxds
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-2.3.0-bin/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-tez-0.9.0-bin/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://<hostip>:10000/tpch_sf100_orc
18/04/03 13:41:58 [main]: ERROR jdbc.HiveConnection: Error opening session
org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=tpch_sf100_orc})
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:79) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:168) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hive.service.rpc.thrift.TCLIService$Client.OpenSession(TCLIService.java:155) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:680) [hive-jdbc-2.3.0.jar:2.3.0]
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:200) [hive-jdbc-2.3.0.jar:2.3.0]
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:107) [hive-jdbc-2.3.0.jar:2.3.0]
at java.sql.DriverManager.getConnection(DriverManager.java:664) [?:1.8.0_112]
at java.sql.DriverManager.getConnection(DriverManager.java:208) [?:1.8.0_112]
at org.apache.hive.beeline.DatabaseConnection.connect(DatabaseConnection.java:145) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.DatabaseConnection.getConnection(DatabaseConnection.java:209) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.Commands.connect(Commands.java:1641) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.Commands.connect(Commands.java:1536) [hive-beeline-2.3.0.jar:2.3.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hive.beeline.ReflectiveCommandHandler.execute(ReflectiveCommandHandler.java:56) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.execCommandWithPrefix(BeeLine.java:1274) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1313) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.connectUsingArgs(BeeLine.java:867) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.initArgs(BeeLine.java:776) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:1010) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:519) [hive-beeline-2.3.0.jar:2.3.0]
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501) [hive-beeline-2.3.0.jar:2.3.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_112]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_112]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_112]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_112]
at org.apache.hadoop.util.RunJar.run(RunJar.java:239) [hadoop-common-2.9.0.jar:?]
at org.apache.hadoop.util.RunJar.main(RunJar.java:153) [hadoop-common-2.9.0.jar:?]
18/04/03 13:41:58 [main]: WARN jdbc.HiveConnection: Failed to connect to <hostip>:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://<hostip>:10000/tpch_sf100_orc: Could not establish connection to jdbc:hive2://<hostip>:10000/tpch_sf100_orc: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null, configuration:{set:hiveconf:hive.server2.thrift.resultset.default.fetch.size=1000, use:database=tpch_sf100_orc}) (state=08S01,code=0)
Beeline version 2.3.0 by Apache Hive
What else can be done to connect JMeter to Spark?

Most probably this is due to a client/server library mismatch; you need to use the same version as on the server.
So identify which version of Hive is running on your server, download the appropriate hive-jdbc jar and all of its dependencies (you can retrieve them using, for example, the Maven Dependency Plugin), and replace the ones in the JMeter classpath with the correct versions.
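For example, if the server turns out to run Spark's bundled Hive 1.2.1 fork (which the hive-jdbc-1.2.1.spark2.jar in your working SQuirreL setup suggests), something like the following pulls the driver and its transitive dependencies into your local Maven repository, from where you can copy them into JMeter's lib directory. This is only a sketch; the coordinates are an assumption and should match whatever version the server actually reports:
mvn dependency:get -Dartifact=org.spark-project.hive:hive-jdbc:1.2.1.spark2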
It might be better to use a "clean" JMeter installation for this to avoid possible jar hell, so it is a good time to upgrade to JMeter 4.0.

You could also build your own "JDBC sampler" and add/adapt just a few lines of code from the original to make it work. I had to do that for Hive and Phoenix support four years ago and it worked; the stock sampler called a timeout method that the driver did not implement back then.
Here are the sources:
https://github.com/csalperwyck/JMeterJDBCSamplerWithOutTimeOut
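The gist of such a change, in a hypothetical sketch (not the code from that repository), is to guard the optional JDBC calls that Hive's driver rejects with "Method not supported":
public final class LenientJdbcCalls {
    // Older Hive JDBC drivers reject optional JDBC features such as query timeouts
    // with SQLException("Method not supported"); tolerate that instead of failing the sample.
    public static void setQueryTimeoutIfSupported(java.sql.Statement stmt, int seconds) {
        try {
            stmt.setQueryTimeout(seconds);
        } catch (java.sql.SQLException e) {
            // The driver does not implement this optional method; run the query without a timeout.
        }
    }
}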

Related

Ignite Yarn and Hortonworks

I'm trying to deploy Ignite so that I can use the shared RDD/DataFrame cache for my Spark cluster. I've followed the Spark install instructions and chose to deploy into my existing YARN cluster running Spark. I'm using HDP to deploy Spark.
I've already verified that the Resource Manager and History Server are listening on the ports below and I can telnet to each port. What am I doing wrong? Am I not deploying this the way it is intended?
I'm running:
yarn jar ignite-yarn-2.6.0.jar ./ignite-yarn-2.6.0.jar ../../../cluster.properties
Error below:
18/09/24 22:13:38 INFO client.RMProxy: Connecting to ResourceManager at dev01clus02.dna.local/172.31.31.5:8050
18/09/24 22:13:38 INFO client.AHSProxy: Connecting to Application History server at dev01clus02.dna.local/172.31.31.5:10200
Exception in thread "main" java.lang.RuntimeException: Failed update ignite.
at org.apache.ignite.yarn.IgniteProvider.updateIgnite(IgniteProvider.java:243)
at org.apache.ignite.yarn.IgniteProvider.getIgnite(IgniteProvider.java:93)
at org.apache.ignite.yarn.IgniteYarnClient.getIgnite(IgniteYarnClient.java:194)
at org.apache.ignite.yarn.IgniteYarnClient.main(IgniteYarnClient.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:209)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:675)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1569)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480)
at org.apache.ignite.yarn.IgniteProvider.updateIgnite(IgniteProvider.java:220)
... 9 more
It looks like a piece of infrastructure, provided by GridGain for the Apache Ignite project, does not work currently. I'll raise the issue.
In the meantime, you can provide the IGNITE_PATH property (in the config, system properties, or environment) pointing to an unzipped Apache Ignite 2.6 distribution directory to avoid the download attempts altogether.
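For example, with the distribution unzipped on the node you run yarn jar from (the path below is only an assumption):
export IGNITE_PATH=/opt/apache-ignite-2.6.0-bin
yarn jar ignite-yarn-2.6.0.jar ./ignite-yarn-2.6.0.jar ../../../cluster.properties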

How to integrate Ganglia for Spark 2.1 Job metrics, Spark ignoring Ganglia metrics

I am trying to integrate Spark 2.1 job metrics with Ganglia.
My spark-defaults.conf looks like this:
*.sink.ganglia.class org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name Name
*.sink.ganglia.host $MASTERIP
*.sink.ganglia.port $PORT
*.sink.ganglia.mode unicast
*.sink.ganglia.period 10
*.sink.ganglia.unit seconds
When I submit my job I see these warnings:
Warning: Ignoring non-spark config property: *.sink.ganglia.host=host
Warning: Ignoring non-spark config property: *.sink.ganglia.name=Name
Warning: Ignoring non-spark config property: *.sink.ganglia.mode=unicast
Warning: Ignoring non-spark config property: *.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
Warning: Ignoring non-spark config property: *.sink.ganglia.period=10
Warning: Ignoring non-spark config property: *.sink.ganglia.port=8649
Warning: Ignoring non-spark config property: *.sink.ganglia.unit=seconds
My environment details are:
Hadoop: Amazon 2.7.3 - emr-5.7.0
Spark: 2.1.1
Ganglia: 3.7.2
If you have any input or any other alternative to Ganglia, please reply.
According to the Spark docs:
The metrics system is configured via a configuration file that Spark expects to be present at $SPARK_HOME/conf/metrics.properties. A custom file location can be specified via the spark.metrics.conf configuration property.
So instead of having these settings in spark-defaults.conf, move them to $SPARK_HOME/conf/metrics.properties.
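If you prefer to keep the file with the job rather than under $SPARK_HOME/conf, the spark.metrics.conf property mentioned in the docs can point at it instead. A minimal sketch (class name, jar, and path are placeholders):
spark-submit \
  --files /path/to/metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --class com.example.MyJob my-job.jar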
For EMR specifically, you'll need to put these settings in /etc/spark/conf/metrics.properties on the master node.
Spark on EMR does include the Ganglia library:
$ ls -l /usr/lib/spark/external/lib/spark-ganglia-lgpl_*
-rw-r--r-- 1 root root 28376 Mar 22 00:43 /usr/lib/spark/external/lib/spark-ganglia-lgpl_2.11-2.3.0.jar
In addition, your example is missing the equals sign (=) between the config names and values - unsure if that's an issue. Below is an example config that worked successfully for me.
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name=AMZN-EMR
*.sink.ganglia.host=$MASTERIP
*.sink.ganglia.port=8649
*.sink.ganglia.mode=unicast
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
From this page:
https://spark.apache.org/docs/latest/monitoring.html
Spark also supports a Ganglia sink which is not included in the default build due to licensing restrictions:
GangliaSink: Sends metrics to a Ganglia node or multicast group.
**To install the GangliaSink you’ll need to perform a custom build of Spark**. Note that by embedding this library you will include LGPL-licensed code in your Spark package. For sbt users, set the SPARK_GANGLIA_LGPL environment variable before building. For Maven users, enable the -Pspark-ganglia-lgpl profile. In addition to modifying the cluster’s Spark build, user applications will need to link to the spark-ganglia-lgpl artifact.
I don't know if anyone still needs this, but you have to provide the full Ganglia configuration:
# Ganglia conf
*.sink.ganglia.class=org.apache.spark.metrics.sink.GangliaSink
*.sink.ganglia.name=AMZN-EMR
*.sink.ganglia.host=$MASTERIP
*.sink.ganglia.port=8649
*.sink.ganglia.mode=unicast
*.sink.ganglia.period=10
*.sink.ganglia.unit=seconds
# Enable JvmSource for instance master, worker, driver and executor
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Even with the full configuration, I'm running into this issue on AWS EMR 5.33.0:
21/05/26 14:18:20 ERROR org.apache.spark.metrics.MetricsSystem: Source class org.apache.spark.metrics.source.JvmSource cannot be instantiated
java.lang.ClassNotFoundException: org.apache.spark.metrics.source.JvmSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSources$1.apply(MetricsSystem.scala:184)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSources$1.apply(MetricsSystem.scala:181)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at org.apache.spark.metrics.MetricsSystem.registerSources(MetricsSystem.scala:181)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
21/05/26 14:18:20 ERROR org.apache.spark.metrics.MetricsSystem: Sink class org.apache.spark.metrics.sink.GangliaSink cannot be instantiated
21/05/26 14:18:20 ERROR org.apache.spark.SparkContext: Error initializing SparkContext.
java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.GangliaSink
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:239)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:200)
at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:196)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:130)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:130)
at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:196)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:104)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
It's weird, because AWS EMR should provide this dependency (org.apache.spark:spark-core_2.11:2.4.7), and I would hope that the Spark distribution on AWS EMR is compiled with the Ganglia option. Forcing this jar via the --packages or --jars Spark options doesn't help either.
If someone manages to get Ganglia working with Spark on AWS EMR, with driver/executor JVM monitoring, please do tell me how.

Apache Kafka 0.8 log4j2.xml appender getting timeout error

I need to configure a Kafka appender using log4j2.xml. The setup worked fine locally, but on the server I am getting the error below.
Local Kafka: kafka_2.11-0.10.2.0
Kafka Appender: 0.9.0.0
This setup worked fine locally.
Server Kafka: 0.8
Kafka Appender: 0.9.0.0
On the server I got the following error:
2017-03-09 21:19:18,255 main ERROR Unable to write to Kafka [kafkaAppender] for appender [kafkaAppender]. java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
at org.apache.kafka.clients.producer.KafkaProducer$FutureFailure.<init>(KafkaProducer.java:730)
at org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:483)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:430)
at org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:353)
at org.apache.logging.log4j.core.appender.mom.kafka.KafkaManager.send(KafkaManager.java:81)
at org.apache.logging.log4j.core.appender.mom.kafka.KafkaAppender.append(KafkaAppender.java:85)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:155)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:128)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:119)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:390)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:375)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:359)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:349)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
at org.apache.logging.slf4j.Log4jLogger.log(Log4jLogger.java:376)
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.info(SLF4JLocationAwareLog.java:155)
at org.springframework.boot.SpringApplication.logStartupProfileInfo(SpringApplication.java:665)
at org.springframework.boot.SpringApplication.prepareContext(SpringApplication.java:353)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:313)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1186)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1175)
at com.charter.kafka.proxy.app.ProxyApplication.main(ProxyApplication.java:21)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)
Caused by: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
My bad: I wrote the value of the bootstrap.servers property with an extra space in log4j2.xml; after removing the space, I could connect to Kafka. Thanks.
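In other words, the broker list must not contain stray whitespace. The relevant part of a log4j2.xml Kafka appender looks roughly like this (a sketch; appender name, topic, pattern, and broker list are placeholders):
<Kafka name="kafkaAppender" topic="app-logs">
  <PatternLayout pattern="%d %p %c{1.} [%t] %m%n"/>
  <!-- no leading or trailing spaces inside the value -->
  <Property name="bootstrap.servers">broker1:9092,broker2:9092</Property>
</Kafka>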
You are unable to get the metadata; check that the Kafka cluster is up and running and that there is no connectivity issue.
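For a quick connectivity check from the application host, the tools bundled with Kafka can help, for example (a sketch; host names and topic are placeholders):
bin/kafka-topics.sh --list --zookeeper <zookeeper-host>:2181
bin/kafka-console-producer.sh --broker-list <broker-host>:9092 --topic test-topic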

Unable to bring up Spark 2.1.0 manually on HDP 2.5.3

I was testing my Spark code on Spark 2.0.0 and hit bug SPARK-17463, so I wanted to use Spark 2.1.0 since the bug is fixed in that version.
However, I am unable to bring up spark-shell in yarn-client mode with Spark 2.1.0.
I need to get 2.1.0 working on an HDP 2.5.3 cluster.
It throws an exception:
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/15 14:28:46 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
java.lang.NoClassDefFoundError: com/sun/jersey/api/client/config/ClientConfig
at org.apache.hadoop.yarn.client.api.TimelineClient.createTimelineClient(TimelineClient.java:55)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createTimelineClient(YarnClientImpl.java:181)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.serviceInit(YarnClientImpl.java:168)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:151)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:156)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95)
... 47 elided
Caused by: java.lang.ClassNotFoundException: com.sun.jersey.api.client.config.ClientConfig
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 61 more
<console>:14: error: not found: value spark
import spark.implicits._
^
<console>:14: error: not found: value spark
import spark.sql
^
I had the same issue with spark-submit on EMR; after adding jersey-bundle-1.19.1.jar to $SPARK_HOME/jars, the issue was resolved.
You can download it from here: http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.1/jersey-bundle-1.19.1.jar
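For example (adjust SPARK_HOME to your Spark 2.1.0 installation):
wget http://repo1.maven.org/maven2/com/sun/jersey/jersey-bundle/1.19.1/jersey-bundle-1.19.1.jar
cp jersey-bundle-1.19.1.jar $SPARK_HOME/jars/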
The YARN timeline service is not compatible with the libraries provided by Spark. Please disable the timeline service by setting spark.hadoop.yarn.timeline-service.enabled=false.
For more details please visit https://issues.apache.org/jira/browse/SPARK-15343
Add the parameter below to spark-defaults.conf and restart the Spark History Server.
spark.hadoop.yarn.timeline-service.enabled false
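The same setting can also be passed directly on the command line when launching the shell, for example (a sketch):
spark-shell --master yarn --deploy-mode client --conf spark.hadoop.yarn.timeline-service.enabled=false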

Why does starting spark-shell fail with "we couldn't find any external IP address!" on Windows?

I am having trouble starting spark-shell on my Windows computer. The version of Spark I am using is 1.5.2, pre-built for Hadoop 2.4 or later. I thought spark-shell.cmd could be run directly without any configuration since it is pre-built, but I cannot figure out what is preventing me from starting Spark correctly.
Aside from the error message printed out, I can still execute some basic Scala commands on the command line, but apparently something is going wrong here.
Here is the error log from cmd:
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's repl log4j profile: org/apache/spark/log4j-defaults-repl.properties
To adjust logging level use sc.setLogLevel("INFO")
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.2
      /_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_25)
Type in expressions to have them evaluated.
Type :help for more information.
15/11/18 17:51:32 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
Spark context available as sc.
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-core-3.2.10.jar."
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-rdbms-3.2.9.jar."
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar."
15/11/18 17:51:39 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/18 17:51:40 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/11/18 17:51:46 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
15/11/18 17:51:46 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
15/11/18 17:51:47 WARN : Your hostname, Lenovo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:297a:e76d:828:59dc%wlan2, but we couldn't find any external IP address!
java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:171)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:162)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:160)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:167)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
at $iwC.<init>(<console>:18)
at <init>(<console>:20)
at .<init>(<console>:24)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:132)
at org.apache.spark.repl.SparkILoopInit$$anonfun$initializeSpark$1.apply(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkIMain.beQuietDuring(SparkIMain.scala:324)
at org.apache.spark.repl.SparkILoopInit$class.initializeSpark(SparkILoopInit.scala:124)
at org.apache.spark.repl.SparkILoop.initializeSpark(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1$$anonfun$apply$mcZ$sp$5.apply$mcV$sp(SparkILoop.scala:974)
at org.apache.spark.repl.SparkILoopInit$class.runThunks(SparkILoopInit.scala:159)
at org.apache.spark.repl.SparkILoop.runThunks(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoopInit$class.postInitialization(SparkILoopInit.scala:108)
at org.apache.spark.repl.SparkILoop.postInitialization(SparkILoop.scala:64)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:991)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:674)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1012)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:739)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:722)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1097)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:559)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:534)
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 56 more
<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
^
There are a couple of issues. You're on Windows, and things are different on this OS compared to POSIX-compliant OSes.
Start by reading the Problems running Hadoop on Windows document and see if "missing WINUTILS.EXE" is the issue. Make sure you run spark-shell in a console with admin rights.
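If the missing winutils.exe is indeed the problem, the usual workaround is to place a winutils.exe matching your Hadoop version under a directory of your choosing and point HADOOP_HOME at it before launching spark-shell. A sketch for cmd.exe (C:\hadoop is an assumed location; the Spark path is taken from the log above):
set HADOOP_HOME=C:\hadoop
set PATH=%HADOOP_HOME%\bin;%PATH%
C:\spark-1.5.2-bin-hadoop2.4\bin\spark-shell.cmd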
You may also want to read the answers to a similar question Why does starting spark-shell fail with NullPointerException on Windows?
Also, you may have started spark-shell inside the bin subdirectory, hence errors like:
15/11/18 17:51:39 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/spark-1.5.2-bin-hadoop2.4/bin/../lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/spark-1.5.2-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar."
And the last issue:
15/11/18 17:51:47 WARN : Your hostname, Lenovo-PC resolves to a loopback/non-reachable address: fe80:0:0:0:297a:e76d:828:59dc%wlan2, but we couldn't find any external IP address!
One workaround is to set SPARK_LOCAL_HOSTNAME to some resolvable host name and be done with it.
SPARK_LOCAL_HOSTNAME is the custom host name that overrides any other candidate for the hostname when the driver, master, workers, and executors are created.
In your case, using spark-shell, just execute the following:
SPARK_LOCAL_HOSTNAME=localhost ./bin/spark-shell
You can also use:
./bin/spark-shell -c spark.driver.host=localhost
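Since the report above comes from Windows, the cmd.exe equivalent is roughly (a sketch):
set SPARK_LOCAL_HOSTNAME=localhost
bin\spark-shell.cmd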
Refer also to Environment Variables in the official documentation of Spark.
