Too many HashedWheelTimer error in Spark application - apache-spark

I encountered this error while running a Spark Streaming application. It has happened only once, and the application has been running for over 500 hours.
Here are the relevant dependencies and their versions:
Spark: 1.5.1
Spark-Cassandra-Connector: 1.5.0-M3
Thanks

Related

Error when running spark application with zeppelin

When I run the above Spark application with Zeppelin on a YARN cluster in cluster mode, I get the following error:
Where might the problem be? Thanks

apache_beam spark runner with python can't be implemented on remote spark cluster?

I am following the Python guide for the Beam Spark runner. The Beam pipeline can submit a job to a local job server, which is launched by ./gradlew :runners:spark:job-server:runShadow with a local Spark, or, with the additional parameter -PsparkMasterUrl=spark://localhost:7077, to a pre-deployed Spark.
But I have a Spark cluster on YARN. I set the launch command to ./gradlew :runners:spark:job-server:runShadow -PsparkMasterUrl=yarn (I also tried yarn-client), but I only get org.apache.spark.SparkException: Could not parse Master URL: 'yarn'.
The source code of the Spark runner (beam\sdks\python\apache_beam\runners\portability\spark_runner.py) shows that:
parser.add_argument('--spark_master_url',
                    default='local[4]',
                    help='Spark master URL (spark://HOST:PORT). '
                         'Use "local" (single-threaded) or "local[*]" '
                         '(multi-threaded) to start a local cluster for '
                         'the execution.')
It doesn't mention 'yarn', and the provided SparkContext and StreamingListeners are not supported on the Spark portable runner. So does that mean the apache_beam Spark runner with Python can't be used on a remote Spark cluster (mostly YARN) and can only be tested locally? Or maybe I can set job_endpoint to the remote job server URL of my Spark cluster?
Also, every ./gradlew command blocks at 98%, but the job server starts with output like this:
19/11/28 13:47:48 INFO org.apache.beam.runners.fnexecution.jobsubmission.JobServerDriver: JobService started on localhost:8099
<============-> 98% EXECUTING [16s]
> IDLE
> :runners:spark:job-server:runShadow
> IDLE
So does that mean the apache_beam Spark runner with Python can't be used on a remote Spark cluster (mostly YARN)?
We've recently added portable Spark jars, which can be submitted via spark-submit. However, this feature isn't scheduled to be included in a Beam release until 2.19.0.
I created a JIRA ticket to track the status of YARN support, in case there are other related issues that need to be addressed.
And every ./gradlew command blocks at 98%
That's expected behavior. The job server will stay running until canceled.
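For reference, here is a minimal sketch of pointing a Beam Python pipeline at an already-running job server through the portable runner. The endpoint localhost:8099 and the LOOPBACK environment type are illustrative assumptions, not taken from the question:

# Minimal sketch: submit a Beam Python pipeline to a separately started job server
# (e.g. the one launched by ./gradlew :runners:spark:job-server:runShadow).
# localhost:8099 and LOOPBACK are illustrative assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',  # address the job server prints on startup
    '--environment_type=LOOPBACK',    # run the SDK harness in the submitting process
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(['hello', 'beam'])
     | beam.Map(print))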

%spark.r interpreter is not working in Zeppelin 0.6.1

I have a Spark 1.6.2 cluster with Hadoop YARN and Oozie. I have installed Zeppelin 0.6.1 (the binary package with all interpreters: zeppelin-0.6.1-bin-all.tgz). When I try to use a SparkR script with the %spark.r interpreter,
%spark.r
# Creating the SparkContext and connecting to the Cloudant DB
sc1 <- sparkR.init(sparkEnv = list("cloudant.host"="host_name", "cloudant.username"="user_name", "cloudant.password"="password", "jsonstore.rdd.schemaSampleSize"="-1"))
# Database to be connected to extract the data
database <- "sensordata"
# Creating the Spark SQL context
sqlContext <- sparkRSQL.init(sc1)
# Creating a DataFrame for the "sensordata" Cloudant DB
sensorDataDF <- read.df(sqlContext, database, header='true', source = "com.cloudant.spark", inferSchema='true')
# Get basic information about the DataFrame (sensorDataDF)
printSchema(sensorDataDF)
I am getting the following error (log):
ERROR [2016-08-25 03:28:37,336] ({Thread-77} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:373)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:111)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:296)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:281)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:370)
... 3 more
Help would be much appreciated.
I faced a similar issue after migrating to 0.6.1. The issue is that Zeppelin 0.6.1 is built with Scala 2.11, while Apache Spark 1.6.2 is built with Scala 2.10.
You need to build Spark 1.6.x with Scala 2.11, or migrate your Spark code to Spark 2.0.0.
Setting the master to local[2] in the interpreter section fixed my issues. This was originally suggested by vgunnu:
"Try setting spark master as local[2], if that works, you might be missing few environmental variables in env file – vgunnu Aug 25 at 4:37"

Lots of ERROR ErrorMonitor: AssociationError on spark startup

I am using Spark on Mesos (with Kafka and Cassandra).
On startup, I get a lot of errors (~100), and then everything works fine.
The errors are of this type:
[Stage 0:=======> (24 + 26) / 50][Stage 3:> (0 + 24) / 24]15/09/17 09:48:35 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@10.131.xx.xxx:58325] <- [akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441]: Error [Shut down address: akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441] [
akka.remote.ShutDownAssociation: Shut down address: akka.tcp://driverPropsFetcher@10.131.xx.xxx:59441
Caused by: akka.remote.transport.Transport$InvalidAssociationException: The remote system terminated the association because it is shutting down.
]
What could be the reason for this error? How can I solve it?
The log seems to be noise.
This issue also happens with Spark 1.5.0 in CDH 5.5.0, and Cloudera says the following:
http://www.cloudera.com/content/www/en-us/documentation/enterprise/latest/topics/cdh_rn_spark_ki.html
When using Spark on YARN, the driver reports misleading error messages
The Spark driver reports misleading error messages such as:
ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@...] ->
[akka.tcp://sparkExecutor@...]: Error [Association failed with [akka.tcp://sparkExecutor@...]]
[akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@...]]
Workaround: Add the following property to the Spark log4j configuration file: log4j.logger.org.apache.spark.rpc.akka.ErrorMonitor=FATAL. See Configuring Spark Application Logging Properties.

spark-ec2 and Tachyon hadoop version disparity

I am trying to use spark-ec2 to launch an EC2 cluster with Hadoop version 2.x, so I tried:
./spark-ec2 -k spark -i ~/.ssh/spark.pem -s 1 --hadoop-major-version=2 launch my-spark-cluster
Then I found errors in the Tachyon setup process:
Setting up tachyon
RSYNC'ing /root/tachyon to slaves...
ec2-52-1-147-16.compute-1.amazonaws.com
ec2-52-1-147-16.compute-1.amazonaws.com: Formatting Tachyon Worker @ ip-172-31-21-86.ec2.internal
ec2-52-1-147-16.compute-1.amazonaws.com: Removing local data under folder: /mnt/ramdisk/tachyonworker/
Formatting Tachyon Master @ ec2-52-1-14-186.compute-1.amazonaws.com
Formatting JOURNAL_FOLDER: /root/tachyon/libexec/../journal/
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at tachyon.util.CommonUtils.runtimeException(CommonUtils.java:246)
at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:73)
at tachyon.UnderFileSystemHdfs.getClient(UnderFileSystemHdfs.java:53)
at tachyon.UnderFileSystem.get(UnderFileSystem.java:53)
at tachyon.Format.main(Format.java:54)
Caused by: org.apache.hadoop.ipc.RemoteException: Server IPC version 7 cannot communicate with client version 4
at org.apache.hadoop.ipc.Client.call(Client.java:1070)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at tachyon.UnderFileSystemHdfs.<init>(UnderFileSystemHdfs.java:69)
... 3 more
I searched some related questions, and it seems that "Server IPC version 7 cannot communicate with client version 4" means the server is using Hadoop 2.x while the client is using Hadoop 1.x. However, I built my Spark with Hadoop 2.4.0, and I also tried the official Spark builds pre-built for Hadoop 2.4.0 and later; both lead to the same error.
By the way, the Hadoop version installed by setting --hadoop-major-version=2 is Hadoop 2.0.0-cdh4.2.0. Is this a problem? I tried using 2.4 or 2.4.0 here, but neither is recognized as a valid Hadoop version.
