Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging - apache-spark

I'm new to Spark. I attempted to run a Spark app (.jar) on CDH 5.8.0-0 on Oracle VirtualBox 5.1.4r110228 which leveraged Spark Streaming to perform sentiment analysis on Twitter. I created my Twitter account and generated all four required tokens. I was blocked by the NoClassDefFoundError exception.
I've been googling around for a couple of days. The best advice I found so far is at the URL below, but apparently my environment is still missing something.
http://javarevisited.blogspot.com/2011/06/noclassdeffounderror-exception-in.html#ixzz4Ia99dsp0
What does it mean for a library to show up at compile time but be missing at runtime? How can we fix this?
Which library provides Logging? I came across an article stating that this Logging class is slated for deprecation. Besides that, I do see log4j in my environment.
In my CDH 5.8, I'm running these versions of software:
Spark-2.0.0-bin-hadoop2.7 / spark-core_2.10-2.0.0
jdk-8u101-linux-x64 / jre-8u101-linux-x64
I appended the details of the exception at the end. Here is the procedure I followed to execute the app, along with some verification I did after hitting the exception:
Unzip twitter-streaming.zip (the Spark app)
cd twitter-streaming
run ./sbt/sbt assembly
Update env.sh with your Twitter account
$ cat env.sh
export SPARK_HOME=/home/cloudera/spark-2.0.0-bin-hadoop2.7
export CONSUMER_KEY=<my_consumer_key>
export CONSUMER_SECRET=<my_consumer_secret>
export ACCESS_TOKEN=<my_twitterapp_access_token>
export ACCESS_TOKEN_SECRET=<my_twitterapp_access_token_secret>
The submit.sh script wraps the spark-submit command with the required credential info from env.sh:
$ cat submit.sh
source ./env.sh
$SPARK_HOME/bin/spark-submit --class "TwitterStreamingApp" --master local[*] ./target/scala-2.10/twitter-streaming-assembly-1.0.jar $CONSUMER_KEY $CONSUMER_SECRET $ACCESS_TOKEN $ACCESS_TOKEN_SECRET
The log of the assembly process:
[cloudera@quickstart twitter-streaming]$ ./sbt/sbt assembly
Launching sbt from sbt/sbt-launch-0.13.7.jar
[info] Loading project definition from /home/cloudera/workspace/twitter-streaming/project
[info] Set current project to twitter-streaming (in build file:/home/cloudera/workspace/twitter-streaming/)
[info] Including: twitter4j-stream-3.0.3.jar
[info] Including: twitter4j-core-3.0.3.jar
[info] Including: scala-library.jar
[info] Including: unused-1.0.0.jar
[info] Including: spark-streaming-twitter_2.10-1.4.1.jar
[info] Checking every *.class/*.jar file's SHA-1.
[info] Merging files...
[warn] Merging 'META-INF/LICENSE.txt' with strategy 'first'
[warn] Merging 'META-INF/MANIFEST.MF' with strategy 'discard'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.properties' with strategy 'first'
[warn] Merging 'META-INF/maven/org.spark-project.spark/unused/pom.xml' with strategy 'first'
[warn] Merging 'log4j.properties' with strategy 'discard'
[warn] Merging 'org/apache/spark/unused/UnusedStubClass.class' with strategy 'first'
[warn] Strategy 'discard' was applied to 2 files
[warn] Strategy 'first' was applied to 4 files
[info] SHA-1: 69146d6fdecc2a97e346d36fafc86c2819d5bd8f
[info] Packaging /home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.0.jar ...
[info] Done packaging.
[success] Total time: 6 s, completed Aug 27, 2016 11:58:03 AM
Not sure exactly what it means, but everything looked good when I ran hadoop checknative:
$ hadoop checknative -a
16/08/27 13:27:22 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
16/08/27 13:27:22 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib/hadoop/lib/native/libsnappy.so.1
lz4: true revision:10301
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
Here is the console log of my exception:
$ ./submit.sh
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/28 20:13:23 INFO SparkContext: Running Spark version 2.0.0
16/08/28 20:13:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/28 20:13:24 WARN Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/08/28 20:13:24 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/08/28 20:13:24 INFO SecurityManager: Changing view acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls to: cloudera
16/08/28 20:13:24 INFO SecurityManager: Changing view acls groups to:
16/08/28 20:13:24 INFO SecurityManager: Changing modify acls groups to:
16/08/28 20:13:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); groups with view permissions: Set(); users with modify permissions: Set(cloudera); groups with modify permissions: Set()
16/08/28 20:13:25 INFO Utils: Successfully started service 'sparkDriver' on port 37550.
16/08/28 20:13:25 INFO SparkEnv: Registering MapOutputTracker
16/08/28 20:13:25 INFO SparkEnv: Registering BlockManagerMaster
16/08/28 20:13:25 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-37a0492e-67e3-4ad5-ac38-40448c25d523
16/08/28 20:13:25 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
16/08/28 20:13:25 INFO SparkEnv: Registering OutputCommitCoordinator
16/08/28 20:13:25 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/08/28 20:13:25 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.0.2.15:4040
16/08/28 20:13:25 INFO SparkContext: Added JAR file:/home/cloudera/workspace/twitter-streaming/target/scala-2.10/twitter-streaming-assembly-1.1.jar at spark://10.0.2.15:37550/jars/twitter-streaming-assembly-1.1.jar with timestamp 1472440405882
16/08/28 20:13:26 INFO Executor: Starting executor ID driver on host localhost
16/08/28 20:13:26 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41264.
16/08/28 20:13:26 INFO NettyBlockTransferService: Server created on 10.0.2.15:41264
16/08/28 20:13:26 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMasterEndpoint: Registering block manager 10.0.2.15:41264 with 366.3 MB RAM, BlockManagerId(driver, 10.0.2.15, 41264)
16/08/28 20:13:26 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.0.2.15, 41264)
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/Logging
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.streaming.twitter.TwitterUtils$.createStream(TwitterUtils.scala:44)
at TwitterStreamingApp$.main(TwitterStreamingApp.scala:42)
at TwitterStreamingApp.main(TwitterStreamingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.Logging
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 23 more
16/08/28 20:13:26 INFO SparkContext: Invoking stop() from shutdown hook
16/08/28 20:13:26 INFO SparkUI: Stopped Spark web UI at http://10.0.2.15:4040
16/08/28 20:13:26 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/08/28 20:13:26 INFO MemoryStore: MemoryStore cleared
16/08/28 20:13:26 INFO BlockManager: BlockManager stopped
16/08/28 20:13:26 INFO BlockManagerMaster: BlockManagerMaster stopped
16/08/28 20:13:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/08/28 20:13:26 INFO SparkContext: Successfully stopped SparkContext
16/08/28 20:13:26 INFO ShutdownHookManager: Shutdown hook called
16/08/28 20:13:26 INFO ShutdownHookManager: Deleting directory /tmp/spark-5e29c3b2-74c2-4d89-970f-5be89d176b26
I understand my post was lengthy. Your advice or insights are highly appreciated!!
-jsung8

Use: spark-core_2.11-1.5.2.jar
I had the same problem described by @jsung8 and tried to find the .jar suggested by @youngstephen, but could not. However, linking in spark-core_2.11-1.5.2.jar instead of spark-core_2.11-1.5.2.logging.jar resolved the exception in the way @youngstephen suggested.

org/apache/spark/Logging was removed after Spark 1.5.2.
Since your spark-core version is 2.0, the simplest workaround is:
download a single spark-core_2.11-1.5.2.logging.jar and put it in the jars directory under your Spark root directory.
Anyway, it solved my problem; hope it helps.
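For example, assuming the jar has already been downloaded to the current directory, the copy step is just (in Spark 2.0, $SPARK_HOME/jars is where the runtime jars live):
cp spark-core_2.11-1.5.2.logging.jar $SPARK_HOME/jars/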

One possible cause of this problem is a library/class conflict.
I faced this problem and solved it using some Maven exclusions:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
    <exclusions>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
    <version>2.0.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
        </exclusion>
    </exclusions>
</dependency>

You're using an old version of the Spark Twitter connector. This class from your stack trace hints at that:
org.apache.spark.streaming.twitter.TwitterUtils
Spark removed that integration in version 2.0. You're using the connector from an older Spark version, and it references the old Logging class, which moved to a different package in Spark 2.0.
If you want to use Spark 2.0, you'll need to use the Twitter connector from the Bahir project.
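For example, with an sbt build the connector can be pulled in as a managed dependency roughly like this (a sketch, not taken from the original project; it assumes a Scala 2.11 build, since Bahir 2.0.0 publishes _2.11 artifacts, and the Scala suffix must match whatever version you compile with):
libraryDependencies ++= Seq(
  // Spark itself is provided by spark-submit at runtime
  "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided",
  // Bahir's Twitter connector for Spark 2.x, replacing the old
  // spark-streaming-twitter_2.10-1.4.1 artifact bundled into the assembly above
  "org.apache.bahir" %% "spark-streaming-twitter" % "2.0.0"
)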

The Spark core version would otherwise have to be downgraded to 1.5 because of the error below:
java.lang.NoClassDefFoundError: org/apache/spark/Logging
However, http://bahir.apache.org/docs/spark/2.0.0/spark-streaming-twitter/ provides a better solution. Adding the dependency below resolved my issue.
<dependency>
    <groupId>org.apache.bahir</groupId>
    <artifactId>spark-streaming-twitter_2.11</artifactId>
    <version>2.0.0</version>
</dependency>

Related

Error when running spark-submit: java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
I ran the command above, but I don't know why I got this error. I spent hours trying to fix it but could not.
I am using Spark 2.4.4 and Scala 2.13.0. I tried to set spark.executor.memory and spark.driver.memory in my Spark configuration file, but I still could not solve the problem.
Here is the error:
(tutorial-env) (base) harry@harry-badass:~/Desktop/twitter_project$ spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
19/12/14 14:27:23 WARN Utils: Your hostname, harry-badass resolves to a loopback address: 127.0.1.1; using 220.149.84.46 instead (on interface enp4s0)
19/12/14 14:27:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/14 14:27:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/14 14:27:24 INFO SparkContext: Running Spark version 2.4.4
19/12/14 14:27:24 INFO SparkContext: Submitted application: PythonStreamingDirectKafkaWordCount
19/12/14 14:27:24 INFO SecurityManager: Changing view acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing view acls groups to:
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls groups to:
19/12/14 14:27:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(harry); groups with view permissions: Set(); users with modify permissions: Set(harry); groups with modify permissions: Set()
19/12/14 14:27:24 INFO Utils: Successfully started service 'sparkDriver' on port 41699.
19/12/14 14:27:24 INFO SparkEnv: Registering MapOutputTracker
19/12/14 14:27:24 INFO SparkEnv: Registering BlockManagerMaster
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/14 14:27:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2067d2bb-4b7c-49d8-8f02-f20e8467b21e
19/12/14 14:27:24 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/12/14 14:27:24 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/14 14:27:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/12/14 14:27:24 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/12/14 14:27:24 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://220.149.84.46:4041
19/12/14 14:27:24 INFO SparkContext: Added JAR file:///home/harry/Desktop/twitter_project/spark-streaming-kafka-0-8_2.11-2.4.4.jar at spark://220.149.84.46:41699/jars/spark-streaming-kafka-0-8_2.11-2.4.4.jar with timestamp 1576301244901
19/12/14 14:27:24 INFO Executor: Starting executor ID driver on host localhost
19/12/14 14:27:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46637.
19/12/14 14:27:25 INFO NettyBlockTransferService: Server created on 220.149.84.46:46637
19/12/14 14:27:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/14 14:27:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMasterEndpoint: Registering block manager 220.149.84.46:46637 with 434.4 MB RAM, BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 220.149.84.46, 46637, None)
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3139)
at java.base/java.lang.Class.privateGetPublicMethods(Class.java:3164)
at java.base/java.lang.Class.getMethods(Class.java:1861)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
... 12 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/home/harry/Desktop/twitter_project/direct_approach.py", line 9, in <module>
kvs = KafkaUtils.createDirectStream(ssc, [topic],{"metadata.broker.list": brokers})
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o24.createDirectStreamWithoutMessageHandler
19/12/14 14:27:25 INFO SparkContext: Invoking stop() from shutdown hook
19/12/14 14:27:25 INFO SparkUI: Stopped Spark web UI at http://220.149.84.46:4041
19/12/14 14:27:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/14 14:27:25 INFO MemoryStore: MemoryStore cleared
19/12/14 14:27:25 INFO BlockManager: BlockManager stopped
19/12/14 14:27:25 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/14 14:27:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/14 14:27:25 INFO SparkContext: Successfully stopped SparkContext
19/12/14 14:27:25 INFO ShutdownHookManager: Shutdown hook called
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-8e271f94-bec9-4f7e-aad0-1f3b651e9b29
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394/pyspark-83cc90cc-1aaa-4dea-b364-4b66487be18f
Memory settings don't help with a missing class. You need to download the kafka-clients JAR as well.
Note: you can use --packages instead of downloading JARs manually.
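For example, something like the sketch below lets spark-submit resolve the connector and its transitive dependencies (including kafka-clients) from Maven Central at submit time; the _2.11 suffix matches the Scala version of the PySpark 2.4.4 jars shown in the log above, not the separate Scala 2.13 install mentioned in the question:
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.4 \
  direct_approach.py localhost:9092 new_topic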

java.sql.SQLException: ERROR 2007 (INT09): Outdated jars

I am new to Spark and Kafka. We have a requirement to integrate Kafka + Spark + HBase (with Phoenix).
ERROR:
Exception in thread "main" java.sql.SQLException: ERROR 2007 (INT09): Outdated jars. The following servers require an updated phoenix.jar to be put in the classpath of HBase:
I ended up with the above error. Could anybody please help me resolve this issue?
Below is the error log:
jdbc:phoenix:localhost.localdomain:2181:/hbase-unsecure
testlocalhost.localdomain:6667
18/03/05 16:18:52 INFO Metrics: Initializing metrics system: phoenix
18/03/05 16:18:52 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties
18/03/05 16:18:52 INFO MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
18/03/05 16:18:52 INFO MetricsSystemImpl: phoenix metrics system started
18/03/05 16:18:52 INFO ConnectionManager$HConnectionImplementation: Closing master protocol: MasterService
18/03/05 16:18:52 INFO ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x161f6fc5e4800a3
18/03/05 16:18:52 INFO ZooKeeper: Session: 0x161f6fc5e4800a3 closed
18/03/05 16:18:52 INFO ClientCnxn: EventThread shut down
Exception in thread "main" java.sql.SQLException: ERROR 2007 (INT09): Outdated jars. The following servers require an updated phoenix.jar to be put in the classpath of HBase: region=SYSTEM.CATALOG,,1519831518459.b16e566d706c68469922eba74844a444., hostname=localhost,16020,1520282812066, seqNum=59
at org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:476)
at org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:150)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.checkClientServerCompatibility(ConnectionQueryServicesImpl.java:1272)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.ensureTableCreated(ConnectionQueryServicesImpl.java:1107)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.createTable(ConnectionQueryServicesImpl.java:1429)
at org.apache.phoenix.schema.MetaDataClient.createTableInternal(MetaDataClient.java:2574)
at org.apache.phoenix.schema.MetaDataClient.createTable(MetaDataClient.java:1024)
at org.apache.phoenix.compile.CreateTableCompiler$2.execute(CreateTableCompiler.java:212)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:358)
at org.apache.phoenix.jdbc.PhoenixStatement$2.call(PhoenixStatement.java:341)
at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
at org.apache.phoenix.jdbc.PhoenixStatement.executeMutation(PhoenixStatement.java:339)
at org.apache.phoenix.jdbc.PhoenixStatement.executeUpdate(PhoenixStatement.java:1492)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2437)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:2382)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2382)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:255)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:149)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:221)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:270)
at com.spark.kafka.PhoenixJdbcClient.getConnection(PhoenixJdbcClient.scala:41)
at com.spark.kafka.PhoenixJdbcClient.currentTableSchema(PhoenixJdbcClient.scala:595)
at com.spark.kafka.SparkHBaseClient$.main(SparkHBaseClient.scala:47)
at com.spark.kafka.SparkHBaseClient.main(SparkHBaseClient.scala)
18/03/05 16:18:52 INFO SparkContext: Invoking stop() from shutdown hook
18/03/05 16:18:52 INFO SparkUI: Stopped Spark web UI at http://192.168.1.103:4040
18/03/05 16:18:53 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/03/05 16:18:53 INFO MemoryStore: MemoryStore cleared
18/03/05 16:18:53 INFO BlockManager: BlockManager stopped
18/03/05 16:18:53 INFO BlockManagerMaster: BlockManagerMaster stopped
18/03/05 16:18:53 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/03/05 16:18:53 INFO SparkContext: Successfully stopped SparkContext
18/03/05 16:18:53 INFO ShutdownHookManager: Shutdown hook called
18/03/05 16:18:53 INFO ShutdownHookManager: Deleting directory /tmp/spark-c8dd26fc-74dd-40fb-a339-8c5dda36b973
We are using Ambari Server 2.6.1.3 with HDP-2.6.3.0 and the components below:
Hbase-1.1.2
kafka-0.10.1
spark-2.2.0
phoenix
Below are the POM artifacts I have added for HBase and Phoenix.
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-protocol</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.phoenix</groupId>
    <artifactId>phoenix-spark</artifactId>
    <version>4.10.0-HBase-1.2</version>
</dependency>
<dependency>
Try the following (a rough sketch of the commands is below):
1. Copy the Phoenix server jar to all HBase region servers (the HBase lib folder).
2. Restart the HBase master.
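For example, the copy step might look roughly like this; the paths are assumptions for a typical HDP 2.6 layout (verify the actual locations on your cluster), and the restart can be done from Ambari:
# hypothetical HDP paths; adjust to your installation
for host in regionserver1 regionserver2; do
  scp /usr/hdp/current/phoenix-client/phoenix-*-server.jar \
      $host:/usr/hdp/current/hbase-regionserver/lib/
done
# then restart the HBase master (step 2 above), e.g. from Ambari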

Spark SBT compilation issue

In my build, even though I am placing the Twitter jar files in the src/main/resources folder, SBT is not picking them up: it compiles and packages without errors, but at runtime I get a "class not found twitterUtils" error.
My question is: why is SBT not including the jar files from the resources folder in the compilation?
People are telling me to do all these complex steps of getting the Git utility and then doing an sbt assembly, which I did, but since I am behind a proxy, Git is not working even though http_proxy is set up.
I have also tried putting these Twitter jar files on the CLASSPATH, with no luck.
I am stuck on this issue, so any help is highly appreciated.
Please see the details below.
[root@hadoop1 TwitterPopularTags]# pwd
/root/TwitterPopularTags
[root@hadoop1 TwitterPopularTags]# sbt compile
[info] Set current project to TwitterPopularTags (in build file:/root/TwitterPopularTags/)
[info] Updating {file:/root/TwitterPopularTags/}twitterpopulartags...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 2 Scala sources to /root/TwitterPopularTags/target/scala-2.11/classes...
[success] Total time: 14 s, completed Sep 16, 2016 9:55:20 AM
[root@hadoop1 TwitterPopularTags]# sbt package
[info] Set current project to TwitterPopularTags (in build file:/root/TwitterPopularTags/)
[info] Packaging /root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed Sep 16, 2016 9:56:20 AM
[root@hadoop1 TwitterPopularTags]# spark-submit /root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar
16/09/16 09:57:06 INFO SparkContext: Running Spark version 1.6.2
16/09/16 09:57:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/16 09:57:06 INFO SecurityManager: Changing view acls to: root
16/09/16 09:57:06 INFO SecurityManager: Changing modify acls to: root
16/09/16 09:57:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/09/16 09:57:07 INFO Utils: Successfully started service 'sparkDriver' on port 53967.
16/09/16 09:57:07 INFO Slf4jLogger: Slf4jLogger started
16/09/16 09:57:07 INFO Remoting: Starting remoting
16/09/16 09:57:07 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.100.44.17:57877]
16/09/16 09:57:07 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 57877.
16/09/16 09:57:07 INFO SparkEnv: Registering MapOutputTracker
16/09/16 09:57:07 INFO SparkEnv: Registering BlockManagerMaster
16/09/16 09:57:07 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-47a89077-0926-447c-ada7-fdb4a9aa1b83
16/09/16 09:57:07 INFO MemoryStore: MemoryStore started with capacity 511.5 MB
16/09/16 09:57:07 INFO SparkEnv: Registering OutputCommitCoordinator
16/09/16 09:57:08 INFO Server: jetty-8.y.z-SNAPSHOT
16/09/16 09:57:08 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/09/16 09:57:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/09/16 09:57:08 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.100.44.17:4040
16/09/16 09:57:08 INFO HttpFileServer: HTTP File server directory is /tmp/spark-d56628b6-fdbf-4d89-bbd2-a96603000607/httpd-ee499eb3-00ae-4276-b163-423e3b81f0b4
16/09/16 09:57:08 INFO HttpServer: Starting HTTP Server
16/09/16 09:57:08 INFO Server: jetty-8.y.z-SNAPSHOT
16/09/16 09:57:08 INFO AbstractConnector: Started SocketConnector@0.0.0.0:56067
16/09/16 09:57:08 INFO Utils: Successfully started service 'HTTP file server' on port 56067.
16/09/16 09:57:08 INFO SparkContext: Added JAR file:/root/TwitterPopularTags/target/scala-2.11/twitterpopulartags_2.11-1.0.jar at http://10.100.44.17:56067/jars/twitterpopulartags_2.11-1.0.jar with timestamp 1474034228091
16/09/16 09:57:08 INFO Executor: Starting executor ID driver on host localhost
16/09/16 09:57:08 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 49715.
16/09/16 09:57:08 INFO NettyBlockTransferService: Server created on 49715
16/09/16 09:57:08 INFO BlockManagerMaster: Trying to register BlockManager
16/09/16 09:57:08 INFO BlockManagerMasterEndpoint: Registering block manager localhost:49715 with 511.5 MB RAM, BlockManagerId(driver, localhost, 49715)
16/09/16 09:57:08 INFO BlockManagerMaster: Registered BlockManager
16/09/16 09:57:08 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/09/16 09:57:08 INFO EventLoggingListener: Logging events to hdfs:///spark-history/local-1474034228122
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/streaming/twitter/TwitterUtils$
at dot.state.fl.us.PrintTweets$.main(PrintTweets.scala:29)
at dot.state.fl.us.PrintTweets.main(PrintTweets.scala)
My question is: why is SBT not including the jar files from the resources folder in the compilation?
Because that's not what the resources folder is for. If you want to manage the dependencies manually, put them into the lib folder instead. But in that case you also need to do the same with all dependencies of those dependencies, their dependencies, and so on. Using managed dependencies, as described in the linked documentation, is a much better idea in general; see the sketch below.
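A minimal build.sbt along those lines might look like this (an illustration only, assuming Spark 1.6.2 and Scala 2.11 as in the log above; in the 1.x line the Twitter connector was still published by Spark itself):
name := "TwitterPopularTags"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // provided by spark-submit at runtime, needed only to compile
  "org.apache.spark" %% "spark-core"      % "1.6.2" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.2" % "provided",
  // pulled in (with its twitter4j dependencies) for compiling and packaging;
  // it still has to reach the cluster, e.g. via sbt assembly or --packages
  "org.apache.spark" %% "spark-streaming-twitter" % "1.6.2"
)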

spark-cassandra java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/CassandraJavaUtil

16/04/26 16:58:46 DEBUG ProtobufRpcEngine: Call: complete took 3ms
Exception in thread "main" java.lang.NoClassDefFoundError: com/datastax/spark/connector/japi/CassandraJavaUtil
at com.baitic.mcava.lecturahdfssaveincassandra.TratamientoCSV.main(TratamientoCSV.java:123)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.japi.CassandraJavaUtil
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
16/04/26 16:58:46 INFO SparkContext: Invoking stop() from shutdown hook
16/04/26 16:58:46 INFO SparkUI: Stopped Spark web UI at http://10.128.0.5:4040
16/04/26 16:58:46 INFO SparkDeploySchedulerBackend: Shutting down all executors
16/04/26 16:58:46 INFO SparkDeploySchedulerBackend: Asking each executor to shut down
16/04/26 16:58:46 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/04/26 16:58:46 INFO MemoryStore: MemoryStore cleared
16/04/26 16:58:46 INFO BlockManager: BlockManager stopped
16/04/26 16:58:46 INFO BlockManagerMaster: BlockManagerMaster stopped
16/04/26 16:58:46 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/04/26 16:58:46 INFO SparkContext: Successfully stopped SparkContext
16/04/26 16:58:46 INFO ShutdownHookManager: Shutdown hook called
16/04/26 16:58:46 INFO ShutdownHookManager: Deleting directory /srv/spark/tmp/spark-2bf57fa2-a2d5-4f8a-980c-994e56b61c44
16/04/26 16:58:46 DEBUG Client: stopping client from cache: org.apache.hadoop.ipc.Client@3fb9a67f
16/04/26 16:58:46 DEBUG Client: removing client from cache: org.apache.hadoop.ipc.Client@3fb9a67f
16/04/26 16:58:46 DEBUG Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@3fb9a67f
16/04/26 16:58:46 DEBUG Client: Stopping client
16/04/26 16:58:46 DEBUG Client: IPC Client (2107841088) connection to mcava-master/10.128.0.5:54310 from baiticpruebas2: closed
16/04/26 16:58:46 DEBUG Client: IPC Client (2107841088) connection to mcava-master/10.128.0.5:54310 from baiticpruebas2: stopped, remaining connections 0
16/04/26 16:58:46 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
I wrote this simple code:
// String pathDatosTratados = "hdfs://mcava-master:54310/srv/hadoop/data/spark/DatosApp/medidasSensorTratadas.txt";
String jarPath = "hdfs://mcava-master:54310/srv/hadoop/data/spark/original-LecturaHDFSsaveInCassandra-1.0-SNAPSHOT.jar";
String jar = "hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-assembly-1.6.0-M1-4-g6f01cfe.jar";
String jar2 = "hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar";
String[] jars = new String[3];
jars[0] = jarPath;
jars[2] = jar;
jars[1] = jar2;
SparkConf conf = new SparkConf().setAppName("TratamientoCSV").setJars(jars);
conf.set("spark.cassandra.connection.host", "10.128.0.5");
conf.set("spark.kryoserializer.buffer.max", "512");
conf.set("spark.kryoserializer.buffer", "256");
// conf.setJars(jars);
JavaSparkContext sc = new JavaSparkContext(conf);
JavaRDD<String> input = sc.textFile(pathDatos);
I also put the path to the Cassandra driver in spark-defaults.conf:
spark.driver.extraClassPath hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar
spark.executor.extraClassPath hdfs://mcava-master:54310/srv/hadoop/data/spark/spark-cassandra-connector-java-assembly-1.6.0-M1-4-g6f01cfe.jar
I also passed the --jars flag with the path to the driver, but I always get the same error, and I do not understand why.
I am working on Google Compute Engine.
Try adding the package when you submit your app:
$SPARK_HOME/bin/spark-submit --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.11 ....
I added this argument to solve the problem: --packages datastax:spark-cassandra-connector:1.6.0-M2-s_2.10.
At least for the 3.0+ Spark Cassandra connector, the official assembly jar works well for me. It has all the necessary dependencies.
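For example (a sketch only; I believe these are the coordinates of the 3.0.0 assembly artifact, but check that the connector version matches your Spark and Scala build before using it):
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector-assembly_2.12:3.0.0 ....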
I solved the problem: I made a fat jar with all the dependencies, so it is not necessary to reference the Cassandra connector, only the fat jar.
I used Spark in my Java program and had the same issue.
The problem was that I didn't include spark-cassandra-connector in my project's Maven dependencies.
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.7</version> <!-- Check the actual version in the Maven repo -->
</dependency>
After that I built a fat jar with all my dependencies, and it worked!
Maybe it will help someone.

Spark: how can I add more storage memory?

Hi,
I get this error many times when I use a bigger dataset with MLlib (ALS).
The dataset has 3 columns (user, movie, and rating) and 1,200,000 rows.
WARN TaskSetManager: Stage 0 contains a task of very large size (116722 KB). The maximum recommended task size is 100 KB.
Exception in thread "dispatcher-event-loop-3" java.lang.OutOfMemoryError: Java heap space
My machine has now 8 cores, 240Gb RAM and 100GB Disk (50Gb free)
I want to add more storage memory and more executors, so I set the following (I'm using the Spyder IDE):
conf = SparkConf()
conf.set("spark.executor.memory", "40g")
conf.set("spark.driver.memory","20g")
conf.set("spark.executor.cores","8")
conf.set("spark.num.executors","16")
conf.set("spark.python.worker.memory","40g")
conf.set("spark.driver.maxResultSize","0")
sc = SparkContext(conf=conf)
But I still have this:
What did I do wrong?
How I'm launching Spark (PySpark - Spyder IDE)
import sys
import os
from py4j.java_gateway import JavaGateway
gateway = JavaGateway()
os.environ['SPARK_HOME']="C:/Apache/spark-1.6.0"
sys.path.append("C:/Apache/spark-1.6.0/python/")
from pyspark import SparkContext
from pyspark import SparkConf
from pyspark.sql import SQLContext
conf = SparkConf()
conf.set("spark.executor.memory", "25g")
conf.set("spark.driver.memory","10g")
conf.set("spark.executor.cores","8")
conf.set("spark.python.worker.memory","30g")
conf.set("spark.driver.maxResultSize","0")
sc = SparkContext(conf=conf)
Result
16/02/12 18:37:47 INFO SparkContext: Running Spark version 1.6.0
16/02/12 18:37:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/02/12 18:37:48 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:355)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:370)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:363)
at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:104)
at org.apache.hadoop.security.Groups.<init>(Groups.java:86)
at org.apache.hadoop.security.Groups.<init>(Groups.java:66)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:280)
at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:271)
at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:248)
at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:763)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:748)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:621)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2136)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2136)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:322)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:214)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Unknown Source)
16/02/12 18:37:48 INFO SecurityManager: Changing view acls to: rmalveslocal
16/02/12 18:37:48 INFO SecurityManager: Changing modify acls to: rmalveslocal
16/02/12 18:37:48 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(rmalveslocal); users with modify permissions: Set(rmalveslocal)
16/02/12 18:37:48 INFO Utils: Successfully started service 'sparkDriver' on port 64280.
16/02/12 18:37:49 INFO Slf4jLogger: Slf4jLogger started
16/02/12 18:37:49 INFO Remoting: Starting remoting
16/02/12 18:37:49 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.10.5.105:64293]
16/02/12 18:37:49 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 64293.
16/02/12 18:37:49 INFO SparkEnv: Registering MapOutputTracker
16/02/12 18:37:49 INFO SparkEnv: Registering BlockManagerMaster
16/02/12 18:37:49 INFO DiskBlockManager: Created local directory at C:\Users\rmalveslocal\AppData\Local\Temp\1\blockmgr-4bd2f97f-8b4d-423d-a4e3-06f08ecdeca9
16/02/12 18:37:49 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/02/12 18:37:49 INFO SparkEnv: Registering OutputCommitCoordinator
16/02/12 18:37:50 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/02/12 18:37:50 INFO SparkUI: Started SparkUI at http://10.10.5.105:4040
16/02/12 18:37:50 INFO Executor: Starting executor ID driver on host localhost
16/02/12 18:37:50 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 64330.
16/02/12 18:37:50 INFO NettyBlockTransferService: Server created on 64330
16/02/12 18:37:50 INFO BlockManagerMaster: Trying to register BlockManager
16/02/12 18:37:50 INFO BlockManagerMasterEndpoint: Registering block manager localhost:64330 with 511.1 MB RAM, BlockManagerId(driver, localhost, 64330)
16/02/12 18:37:50 INFO BlockManagerMaster: Registered BlockManager
You didn't specify the running mode (standalone, YARN, Mesos) you are using, but I assume you are using standalone mode (for a single server).
There are three concepts at play here:
Worker node - a host that runs one or more executors
Executor - a container that hosts tasks
Task - a unit of work that runs in an executor (parts of stages that together form a job; both of these terms are not important for this discussion)
The default in standalone mode is to allocate all available cores to an executor. In your case you also set it to 8, which equals all your cores. The result is that you have one executor that uses all the cores, and since you also set the executor memory to 40G, you're only using a fraction of your memory for it (40/240).
You can either increase the memory for the executor to allow more tasks to run in parallel (and have more memory each), or set the number of cores to 1 so that you'd be able to host 8 executors (in which case you'd probably want to set the memory to a smaller number, since 8*40=320 would exceed your RAM). A sketch of the second option follows.
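For instance, a rough sketch of the second option (the numbers are illustrative for a 240 GB / 8-core standalone setup and are not taken from the question; it assumes a standalone master, as this answer does):
from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.set("spark.executor.cores", "1")     # 1 core per executor -> room for 8 executors on 8 cores
conf.set("spark.cores.max", "8")          # cap the total cores the application may use
conf.set("spark.executor.memory", "25g")  # 8 executors * 25g = 200g, below the 240g of RAM
conf.set("spark.driver.memory", "20g")
sc = SparkContext(conf=conf)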
