Enabling Spark logs in IntelliJ - apache-spark

I am running my Spark jobs from IntelliJ and am getting the following error:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/Users/sanchay/brazil-pkg-cache/packages/Log4j-slf4j/Log4j-slf4j-2.x.698.1338/RHEL5_64/DEV.STD.PTHREAD/build/lib/log4j-slf4j-impl-2.8.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/sanchay/brazil-pkg-cache/packages/Slf4j_log4j/Slf4j_log4j-1.7.2463.17539/RHEL5_64/DEV.STD.PTHREAD/build/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/Users/sanchay/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
However, when I run the same jar/class using spark-submit, I am able to obtain all the INFO logs I need:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/08 00:17:27 INFO SparkContext: Running Spark version 2.2.0
17/11/08 00:17:27 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/08 00:17:28 INFO SparkContext: Submitted application: JavaSparkPi
17/11/08 00:17:28 INFO SecurityManager: Changing view acls to: sanchay
17/11/08 00:17:28 INFO SecurityManager: Changing modify acls to: sanchay
17/11/08 00:17:28 INFO SecurityManager: Changing view acls groups to:
17/11/08 00:17:28 INFO SecurityManager: Changing modify acls groups to:
17/11/08 00:17:28 INFO SecurityManager: SecurityManager: authentication dis
How can I switch to the default logging configuration in IntelliJ, so that I can obtain information regarding the Spark UI etc. via the INFO logs?

I assume that you manage your code with Maven. Run mvn dependency:tree and check which versions of the logging libraries are pulled in. https://www.slf4j.org/legacy.html has a very good chart showing which libraries should be present; all others should be excluded.
How do you build the jar that you submit to Spark? I assume that you did not include all dependencies in it, and as a result the extra logging libraries were not present there.
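As a concrete sketch (an assumption, not the only way): since the actual binding picked up in IntelliJ is the log4j 2 one and no log4j 2 configuration is found, either give log4j 2 its own configuration or exclude the log4j-slf4j-impl binding so the slf4j-log4j12 one wins. In the latter case, putting a log4j.properties that mirrors Spark's bundled defaults on the IntelliJ run classpath (for example in src/main/resources) brings the INFO logs back:
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
To see which dependency drags in which binding, something like mvn dependency:tree -Dincludes=org.slf4j,log4j,org.apache.logging.log4j lists the candidates to exclude.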

Related

How to start Spark History Server pointing to Google Cloud?

I'm using the following command to start the server:
sh /usr/local/Cellar/apache-spark/3.2.1/libexec/sbin/start-history-server.sh --properties-file /usr/local/Cellar/apache-spark/3.2.1/custom-configs/cloud.properties
cloud.properties file contents
spark.history.fs.logDirectory=gs://bucket/spark-history-server
I got the following error:
Spark Command: /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -cp /usr/local/Cellar/apache-spark/3.2.1/libexec/conf/:/usr/local/Cellar/apache-spark/3.2.1/libexec/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer --properties-file /usr/local/Cellar/apache-spark/3.2.1/custom-configs/cloud.properties
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/03 00:58:50 INFO HistoryServer: Started daemon with process name: XXXX
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for TERM
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for HUP
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for INT
22/05/03 00:58:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/05/03 00:58:50 INFO SecurityManager: Changing view acls to: xxx
22/05/03 00:58:50 INFO SecurityManager: Changing modify acls to: xxx
22/05/03 00:58:50 INFO SecurityManager: Changing view acls groups to:
22/05/03 00:58:50 INFO SecurityManager: Changing modify acls groups to:
22/05/03 00:58:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set()
22/05/03 00:58:50 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:305)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:116)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:88)
... 6 more
How to set the FileSystem configuration here?
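One likely cause of "No FileSystem for scheme gs" is that the Google Cloud Storage connector is not on the history server's classpath. A sketch of a fix (the connector jar name and key-file path below are placeholders): copy Google's gcs-connector jar into /usr/local/Cellar/apache-spark/3.2.1/libexec/jars/ and declare the filesystem implementation in cloud.properties along these lines:
spark.history.fs.logDirectory=gs://bucket/spark-history-server
spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
spark.hadoop.google.cloud.auth.service.account.enable=true
spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/service-account.json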

Apache Spark on IPv6

I am trying to install Spark on IPv6. The Spark master comes up on the IPv6 DNS hostname, but the Spark worker node doesn't start; whether I pass an IP or a DNS name, the error is the same. I need to use LOCAL_WORKER_IP=127.0.0.1 to make the Spark worker start.
Spark worker log:
*fd74:ca9b:3a09:868c:172:18:0:462a spark-master.t253-u000265.svc.cluster.local
21/08/09 16:41:40 INFO Worker: Started daemon with process name: 10#v4-virtio-spark-worker-zjgxs
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for TERM
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for HUP
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for INT
21/08/09 16:41:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
21/08/09 16:41:41 INFO SecurityManager: Changing view acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing view acls groups to:
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls groups to:
21/08/09 16:41:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/08/09 16:41:42 INFO Utils: Successfully started service 'sparkWorker' on port 37816
21/08/09 16:41:42 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.lang.AssertionError: assertion failed: Expected hostname (not IP) but got fd74:ca9b:3a09:868c:172:18:0:488e
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.util.Utils$.checkHost(Utils.scala:1014)
at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:60)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:811)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:779)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
21/08/09 16:41:42 INFO ShutdownHookManager: Shutdown hook called*
Does anyone know whether the Spark worker can be configured on IPv6 using any configuration?
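The assertion comes from Spark's checkHost, which the raw IPv6 address the worker picks up for itself fails. A sketch of a workaround (assuming each worker pod has a resolvable DNS name; SPARK_LOCAL_HOSTNAME is a standard Spark environment variable, the names below are illustrative): force the worker to identify itself by hostname instead of by address, e.g.
export SPARK_LOCAL_HOSTNAME=$(hostname -f)
./sbin/start-slave.sh spark://spark-master.t253-u000265.svc.cluster.local:7077
(start-slave.sh is start-worker.sh on newer releases). Note that IPv6 support in Spark has historically been incomplete, so this only addresses the hostname assertion, not every IPv6 issue.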

spark-submit: unable to get driver status

I'm running a job on a test Spark standalone in cluster mode, but I'm finding myself unable to monitor the status of the driver.
Here is a minimal example using spark-2.4.3 (master and one worker running on the same node, started by running sbin/start-all.sh on a freshly unarchived installation with the default conf, no conf/slaves set), executing spark-submit from the node itself:
$ spark-submit --master spark://ip-172-31-15-245:7077 --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
/home/ubuntu/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/06/27 09:08:28 INFO SecurityManager: Changing view acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing view acls groups to:
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls groups to:
19/06/27 09:08:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
19/06/27 09:08:28 INFO Utils: Successfully started service 'driverClient' on port 36067.
19/06/27 09:08:28 INFO TransportClientFactory: Successfully created connection to ip-172-31-15-245/172.31.15.245:7077 after 29 ms (0 ms spent in bootstraps)
19/06/27 09:08:28 INFO ClientEndpoint: Driver successfully submitted as driver-20190627090828-0008
19/06/27 09:08:28 INFO ClientEndpoint: ... waiting before polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: ... polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: State of driver-20190627090828-0008 is RUNNING
19/06/27 09:08:33 INFO ClientEndpoint: Driver running on 172.31.15.245:41057 (worker-20190627083412-172.31.15.245-41057)
19/06/27 09:08:33 INFO ShutdownHookManager: Shutdown hook called
19/06/27 09:08:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-34082661-f0de-4c56-92b7-648ea24fa59c
> spark-submit --master spark://ip-172-31-15-245:7077 --status driver-20190627090828-0008
19/06/27 09:09:27 WARN RestSubmissionClient: Unable to connect to server spark://ip-172-31-15-245:7077.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:165)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:148)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.requestSubmissionStatus(RestSubmissionClient.scala:148)
at org.apache.spark.deploy.SparkSubmit.requestStatus(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:88)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: No response from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:285)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$get(RestSubmissionClient.scala:195)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:152)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:278)
... 13 more
Spark is in good health (I'm able to run other jobs after the one above), and driver-20190627090828-0008 appears as "FINISHED" in the web UI.
Is there something I am missing?
UPDATE:
in the master log, all I get is:
19/07/01 09:40:24 INFO master.Master: 172.31.15.245:42308 got disassociated, removing it.
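One thing worth checking (a sketch, not a confirmed diagnosis): --status and --kill go through the standalone master's REST submission gateway, which listens on its own port (6066 by default) rather than the legacy RPC port 7077, and in recent releases it is disabled unless spark.master.rest.enabled is set. Something along these lines may work:
# on the master, in conf/spark-defaults.conf (then restart the master)
spark.master.rest.enabled true
# query the REST port instead of 7077
spark-submit --master spark://ip-172-31-15-245:6066 --status driver-20190627090828-0008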

Exception in thread “main” java.lang.NoClassDefFoundError: org/apache/spark/Logging

I'm new to Spark. I attempted to run a Spark app (.jar) on CDH 5.8.0-0 on Oracle VirtualBox 5.1.4r110228 which leveraged Spark Streaming to perform sentiment analysis on Twitter. I have my Twitter account created and all four required tokens generated. I was blocked by the NoClassDefFoundError exception.
I've been googling around for a couple of days. The best advice I found so far was in the URL below but apparently my environment is still missing something.
http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html#ixzz4Ia99dsp0
What does it mean that a library showed up at compile time but was missing at runtime? How can we fix this?
Which library provides Logging? I came across an article stating that this Logging is subject to deprecation. Besides that, I do see log4j in my environment.
In my CDH 5.8, I'm running these versions of software:
Spark-2.0.0-bin-hadoop2.7 / spark-core_2.10-2.0.0
jdk-8u101-linux-x64 / jre-8u101-linux-x64
I appended the details of the exception at the end. Here is the procedure I performed to execute the app and some verification I did after hitting the exception:
Unzip twitter-streaming.zip (the Spark app)
cd twitter-streaming
run ./sbt/sbt assembly
Update env.sh with your Twitter account
$ cat env.sh
export SPARK_HOME=/home/cloudera/spark-2.0.0-bin-hadoop2.7
export CONSUMER_KEY=<my_consumer_key>
export CONSUMER_SECRET=<my_consumer_secret>
export ACCESS_TOKEN=<my_twitterapp_access_token>
export ACCESS_TOKEN_SECRET=<my_twitterapp_access_token>
The submit.sh script wraps the spark-submit command with the required credential info from env.sh:
$ cat submit.sh
source ./env.sh
$SPARK_HOME/bin/spark-submit --class "TwitterStreamingApp" --master local[*] ./target/scala-2.10/twitter-streaming-assembly-1.0.jar $CONSUMER_KEY $CONSUMER_SECRET $ACCESS_TOKEN $ACCESS_TOKEN_SECRET
The log of the assembly process:
[cloudera#quickstart twitter-streaming]$ ./sbt/sbt assembly
Launching sbt from sbt/sbt-launch-0.13.7.jar
[info] Loading project definition from /home/cloudera/workspace/twitter-streaming/project
[info] Set current project to twitter-streaming (in build file:/home/cloudera/workspace/twitter-streaming/)
[info] Including: twitter4j-stream-3.0.3.jar
[info] Including: twitter4j-core-3.0.3.jar
[info] Including: scala-library.jar
[info] Including: unused-1.0.0.jar
[info] Including: spark-streaming-twitter_2.10-1.4.1.jar
[info] Checking every *.class/*.jar file's SHA-1.
[info] Merging files...
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'first'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.properties' with strategy 'first'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.xml' with strategy 'first'
[warn] Merging 'log4j.properties' with strategy 'discard'
[warn] Merging 'org/apache/spark/unused/UnusedStubClass.class' with strategy 'first'
[warn] Strategy 'discard' was applied to 2 files
[warn] Strategy 'first' was applied to 4 files
[info] SHA-1: 69146d6fdecc2a97e346d36fafc86c2819d5bd8f
[info] Packaging /home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Aug 27, 2016 11:58:03 AM
I'm not sure exactly what it means, but everything looked good when I ran the Hadoop native check:
$ hadoop checknative -a
16/08/27 13:27:22 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/08/27 13:27:22 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
Here is the console log of my exception:
$ ./submit.sh
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/28 20:13:23 INFO SparkContext: Running Spark version 2.0.0
16/08/28 20:13:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 20:13:24 WARN Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/08/28 20:13:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/28 20:13:24 INFO SecurityManager: Changing view acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing view acls groups to:
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls groups to:
16/08/28 20:13:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set()
16/08/28 20:13:25 INFO Utils: Successfully started service 'sparkDriver' on port 37550.
16/08/28 20:13:25 INFO SparkEnv: Registering MapOutputTracker
16/08/28 20:13:25 INFO SparkEnv: Registering BlockManagerMaster
16/08/28 20:13:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37a0492e-67e3-4ad5-ac38-40448c25d523
16/08/28 20:13:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/28 20:13:25 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/28 20:13:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/28 20:13:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
16/08/28 20:13:25 INFO SparkContext: Added JAR file:/home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.1.jar at spark://10.0.2.15:37550/jars/twitter-streaming-assembly-1.1.jar with timestamp 1472440405882
16/08/28 20:13:26 INFO Executor: Starting executor ID driver on host localhost
16/08/28 20:13:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41264.
16/08/28 20:13:26 INFO NettyBlockTransferService: Server created on 10.0.2.15:41264
16/08/28 20:13:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41264 with 366.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
at TwitterStreamingApp.main(TwitterStreamingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
16/08/28 20:13:26 INFO SparkContext: Invoking stop() from shutdown hook
16/08/28 20:13:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/08/28 20:13:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/28 20:13:26 INFO MemoryStore: MemoryStore cleared
16/08/28 20:13:26 INFO BlockManager: BlockManager stopped
16/08/28 20:13:26 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/28 20:13:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/28 20:13:26 INFO SparkContext: Successfully stopped SparkContext
16/08/28 20:13:26 INFO ShutdownHookManager: Shutdown hook called
16/08/28 20:13:26 INFO ShutdownHookManager: Deleting directory /tmp/spark-5e29c3b2-74c2-4d89-970f-5be89d176b26
I understand my post was lengthy. Your advice or insights are highly appreciated!!
-jsung8
Use: spark-core_2.11-1.5.2.jar
I had the same problem described by @jsung8 and tried to find the .jar suggested by @youngstephen but could not. However, linking in spark-core_2.11-1.5.2.jar instead of spark-core_2.11-1.5.2.logging.jar resolved the exception in the way @youngstephen suggested.
org/apache/spark/Logging was removed after Spark 1.5.2.
Since your spark-core version is 2.0, the simplest solution is:
download a single spark-core_2.11-1.5.2.logging.jar and put it in the jars directory under your Spark root directory.
Anyway, it solved my problem; hope it helps.
One possible cause of this problem is a library and class conflict.
I faced this problem and solved it using some Maven exclusions:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
You're using an old version of the Spark Twitter connector. This class from your stack trace hints at that:
org.apache.spark.streaming.twitter.TwitterUtils
Spark removed that integration in version 2.0. You're using the one from an old Spark version that references the old Logging class which moved to a different package in Spark 2.0.
If you want to use Spark 2.0, you'll need to use the Twitter connector from the Bahir project.
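Since the question's build uses sbt assembly, the corresponding change there (a sketch; version 2.0.0 is chosen to match the Spark 2.0.0 in use, other Spark 2.x lines may need a different Bahir version) is to drop the old connector and pull in Bahir's in build.sbt:
// replaces "org.apache.spark" %% "spark-streaming-twitter" % "1.4.1"
libraryDependencies += "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0"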
One option is to downgrade the Spark core version to 1.5 because of the error below:
java.lang.NoClassDefFoundError: org/apache/spark/Logging
However, http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/ provides a better solution for this. By adding the dependency below, my issue was resolved:
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>spark-streaming-twitter_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

Running Spark on Yarn

I am trying to run Spark on YARN in the Cloudera QuickStart VM. It already has Spark 1.3 and Hadoop 2.6.0-cdh5.4.0 installed. (I am not using spark-submit since I want to run a different version of Spark.)
I am able to run Spark 1.3 on YARN but get the error below for Spark 1.4.
The log shows it is running Spark 1.4, yet it still fails on a method that is present in 1.4 and not in 1.3. Even the fat jar contains the 1.4 class files.
As far as running on YARN goes, the installed Spark version shouldn't matter, yet it still seems to be picking up the other version.
Hadoop Version:
Hadoop 2.6.0-cdh5.4.0
Subversion http://github.com/cloudera/hadoop -r c788a14a5de9ecd968d1e2666e8765c5f018c271
Compiled by jenkins on 2015-04-21T19:18Z
Compiled with protoc 2.5.0
From source with checksum cd78f139c66c13ab5cee96e15a629025
This command was run using /usr/lib/hadoop/hadoop-common-2.6.0-cdh5.4.0.jar
Error:
LogType:stderr
Log Upload Time:Tue Oct 20 21:58:56 -0700 2015
LogLength:2334
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/lib/zookeeper/lib/slf4j-log4j12- 1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/filecache/10/simple-yarn-app-1.1.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
15/10/20 21:58:50 INFO spark.SparkContext: Running Spark version 1.4.0
15/10/20 21:58:53 INFO spark.SecurityManager: Changing view acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: Changing modify acls to: yarn
15/10/20 21:58:53 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn); users with modify permissions: Set(yarn)
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.network.util.JavaUtils.timeStringAsSec(Ljava/lang/String;)J
at org.apache.spark.util.Utils$.timeStringAsSeconds(Utils.scala:1027)
at org.apache.spark.SparkConf.getTimeAsSeconds(SparkConf.scala:194)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:68)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:245)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:52)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:247)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:188)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:267)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:424)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
at com.hortonworks.simpleyarnapp.HelloWorld.main(HelloWorld.java:50)
15/10/20 21:58:53 INFO util.Utils: Shutdown hook called
Please help
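A likely explanation, offered as a guess rather than a confirmed answer: the YARN container classpath set up for the application (including CDH's installed Spark 1.3 jars) comes before the fat jar, so the 1.3 copy of JavaUtils shadows the 1.4 one even though the fat jar bundles the 1.4 classes. A quick way to verify from inside the driver code (a sketch in Scala; the same call chain works from Java):
// print the jar that the conflicting class was actually loaded from
println(classOf[org.apache.spark.network.util.JavaUtils]
  .getProtectionDomain.getCodeSource.getLocation)
If that points at a CDH 1.3 jar, reordering or trimming the container classpath so the 1.4 fat jar wins (or removing the installed Spark jars from it) should resolve the NoSuchMethodError.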
