Lots of ERROR ErrorMonitor: AssociationError on spark startup - apache-spark

I am using Spark on Mesos (with Kafka and Cassandra).
On startup I get a lot of errors (~100), and then everything works fine.
The errors are of this type:
[Stage 0:=======> (24 + 26) / 50][Stage 3:> (0 + 24) / 24]15/09/17 09:48:35 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@10.131.xx.xxx:58325] <- [akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441]: Error [Shut down address: akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
What could be the reason for these errors, and how can I fix them?

These log messages appear to be harmless noise.
The same issue occurs with Spark 1.5.0 in CDH 5.5.0, and Cloudera documents it as a known issue:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_spark_ki.html
When using Spark on YARN, the driver reports misleading error messages
The Spark driver reports misleading error messages such as:
ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@...] ->
[akka.tcp://sparkExecutor@...]: Error [Association failed with [akka.tcp://sparkExecutor@...]]
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@...]]
Workaround: Add the following property to the Spark log4j configuration file: log4j.logger.org.apache.spark.rpc.akka.ErrorMonitor=FATAL. See Configuring Spark Application Logging Properties.
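For reference, the workaround boils down to a single line in the driver's log4j configuration (the file location varies by distribution; in a plain Spark install it is typically conf/log4j.properties):

# Silence the misleading Akka association errors reported during shutdown
log4j.logger.org.apache.spark.rpc.akka.ErrorMonitor=FATAL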

Related

Error on starting worker nodes in spark standalone cluster

I am trying to set up a Spark standalone cluster with 3 nodes. The Linux server configurations are below:
master node with 2 core and 25GB memory
worker node 1 with 4 core and 21GB memory
worker node 2 with 8 core and 19GB memory
I have started the master node successfully, and its URL is spark://IP:7077.
When I start either of the worker nodes with the command ./sbin/start-worker.sh spark://IP:7077, I get the error message below:
22/11/25 09:13:58 INFO Worker: Connecting to master IP:7077...
22/11/25 09:14:54 ERROR RpcOutboxMessage: Ask terminated before connecting successfully
22/11/25 09:14:54 WARN NettyRpcEnv: Ignored failure: java.io.IOException: Connecting to /IP:7077 timed out (120000 ms)
22/11/25 09:14:54 WARN Worker: Failed to connect to master IP:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
OpenJDK 11.0.17 is the Java version installed on all three nodes.
Any suggestions for resolving this issue would be appreciated.
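Since the worker log shows a plain connection timeout, one quick sanity check (before digging into Spark configuration) is whether port 7077 on the master is reachable from the worker host at all. A minimal sketch, assuming Python is available on the worker; the hostname below is a placeholder for the real master address:

import socket

MASTER_HOST = "IP"   # placeholder: the address from spark://IP:7077
MASTER_PORT = 7077

# Open a TCP connection the same way the worker would.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.settimeout(10)
    try:
        s.connect((MASTER_HOST, MASTER_PORT))
        print("master port is reachable")
    except OSError as exc:
        print(f"cannot reach {MASTER_HOST}:{MASTER_PORT}: {exc}")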

Apache PySpark - Failed to connect to master 7077

I set up Spark and HDFS after watching this video. The only difference is that I did it on a server (Ubuntu) rather than in a VM.
On the server everything works perfectly. Now I want to access it from my local machine (Windows) with PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("spark://ubuntu-spark:7077").appName("test").getOrCreate()
spark.stop()
However, here I get the following error messages:
22/11/12 10:38:35 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see
https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
setLogLevel(newLevel).
22/11/12 10:38:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
22/11/12 10:38:37 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master
ubuntu-spark:7077
org.apache.spark.SparkException: Exception thrown in awaitResult: ...
According to other posts, DNS should be set up correctly. I got this from the Spark master web UI (on port 8080):
URL: spark://ubuntu-spark:7077
Alive Workers: 1
Cores in use: 2 Total, 0 Used
Memory in use: 6.8 GiB Total, 0.0 B Used
Resources in use:
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
The ports are open. I also don't understand the following message: "HADOOP_HOME and hadoop.home.dir are unset." Hadoop is configured on the server, so why should I have to do the same thing locally again? My expectation was that I could use Spark like an API, or am I wrong?
Thank you very much for your help. If you need any configuration files I can provide them.
Hadoop should not be necessary for the code shown, since you're not using HDFS, but the log says Spark is looking for those settings on your Windows machine.
DNS needs to work between your Windows machine and wherever your server is running (a VM can still be a server, so it's unclear where you're running this). Start debugging with ping ubuntu-spark to check; you should also be able to open ubuntu-spark:8080 from a browser on Windows.
If you only want to run Spark code and don't care whether it's distributed, you could just use Docker on Windows - https://github.com/jupyter/docker-stacks
Or set up PyCharm entirely locally for the same purpose.
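As a concrete version of that check, from the Windows machine you can confirm that the master hostname resolves and that the web UI answers over HTTP; a minimal sketch using only the standard library, with the hostname taken from the question:

import urllib.request

# If DNS and routing are fine, this should print 200 (the master web UI's HTTP status).
# A URLError here points at name resolution or firewall issues rather than Spark itself.
print(urllib.request.urlopen("http://ubuntu-spark:8080", timeout=5).status)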

Too many HashedWheelTimer error in Spark application

I ran into this error while running a Spark Streaming application. It has happened only once, and the application has been running for over 500 hours.
Here are some dependencies and their versions:
Spark: 1.5.1
Spark-Cassandra-Connector: 1.5.0-M3
Thanks

Spark mllib svd gives: Java OutOfMemory Error

I am using the SVD routine in MLlib to do some dimensionality reduction on a big matrix:
the data is about 20 GB,
and the Spark memory is 60 GB.
I get the following warning and error messages:
WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemARPACK
WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefARPACK
WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:2766)
    at org.apache.spark.mllib.linalg.EigenValueDecomposition$.symmetricEigs(EigenValueDecomposition.scala:128)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:258)
    at org.apache.spark.mllib.linalg.distributed.RowMatrix.computeSVD(RowMatrix.scala:190)
To resolve the warnings, I built Spark (1.2) locally with the -Pnetlib-lgpl profile, and the warnings disappeared when I tested locally. The log showed that the netlib library was loaded properly:
15/03/05 20:07:03 INFO JniLoader: successfully loaded
/tmp/jniloader7217840327264308862netlib-native_system-linux-x86_64.so
15/03/05 20:07:11 INFO JniLoader: already loaded
netlib-native_system-linux-x86_64.so
Then I installed this Spark 1.2 build, compiled with -Pnetlib-lgpl, on AWS EMR, but the warnings and the error still show up.
In case my local build was the problem, I also compiled Spark 1.2 on AWS EC2 and installed that on EMR, but the warnings and the error still appear.
Could anyone tell me how to solve this problem? Much appreciated!
The problem is partially solved. Thanks for the comments by @SeanOwen.
The reason I was getting the Java memory error is that the top eigenvectors are computed on the driver, so I need to make sure there is enough memory on the driver node.
Running spark-submit with --driver-memory 5G solved the OutOfMemoryError.
The warning messages, however, still appear.
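For reference, the fix amounts to reserving more heap for the driver JVM at submit time; the command ends up looking something like this (class and jar names are placeholders):

spark-submit --driver-memory 5G --class com.example.SVDJob svd-job.jar   # placeholders for your own class/jar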

spark-submit cluster mode is not working

I am getting an error when launching a standalone Spark driver in cluster mode. According to the documentation, cluster mode is supported as of the Spark 1.2.1 release, but it is not working for me. Please help me fix whatever is preventing Spark from working properly.
I have a 3-node Spark cluster: node1, node2, and node3.
I run the command below on node1 to deploy the driver:
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
The driver gets launched on node2 in the cluster, but node2 throws an exception because it is trying to bind to node1's IP:
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
Kindly suggest.
Thanks
It is not possible to bind to port 0, so there is an error in your Spark configuration. Specifically, look at
spark.ui.port
It is probably set to 0.
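If so, a minimal illustrative fix is to pin it to a real port in conf/spark-defaults.conf (4040 is the usual default for the UI):

spark.ui.port    4040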
