Livy session dies on creation - apache-spark

I added a package to my Spark config (in spark-defaults.conf), but when I create a new session with Livy it causes a problem (see the error below) and the session dies.
PS: when I remove this package, everything works fine.
20/05/04 00:17:35 WARN RSCClient: Error stopping RPC.
io.netty.util.concurrent.BlockingOperationException: DefaultChannelPromise#6d493840(uncancellable)
at io.netty.util.concurrent.DefaultPromise.checkDeadLock(DefaultPromise.java:394)
at io.netty.channel.DefaultChannelPromise.checkDeadLock(DefaultChannelPromise.java:157)
...........
Exception in thread "Thread-32" java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:170)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:283)
........
at org.apache.livy.utils.LineBufferedStream$$anon$1.run(LineBufferedStream.scala:46)
20/05/04 00:17:36 WARN ContextLauncher: Child process exited with code 143.
20/05/04 00:17:36 ERROR SparkProcApp: job was killed by user
20/05/04 00:17:36 INFO InteractiveSession: Stopped InteractiveSession 0.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleting InteractiveSession 0 because it was inactive for more than 3600000.0 ms.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleting session 0
20/05/04 00:28:17 INFO InteractiveSession: Stopping InteractiveSession 0...
20/05/04 00:28:17 INFO InteractiveSession: Stopped InteractiveSession 0.
20/05/04 00:28:17 INFO InteractiveSessionManager: Deleted session 0
I use:
Cloudera HDP 2.6.5:
Spark 2.3
Livy 0.7.0
Hadoop 2.7
the unsupervise spark-tss library (https://github.com/unsupervise/spark-tss)
Steps:
livy.conf => livy.spark.master yarn-cluster
spark-defaults.conf => spark.jars.repositories https://dl.bintray.com/unsupervise/maven/
spark-defaults.conf => spark.jars.packages com.github.unsupervise:spark-tss:0.1.1

Please also try adding this to spark-defaults:
spark.jars.repositories https://dl.bintray.com/unsupervise/maven/
Alternatively (just to rule out issues with the spark-defaults.conf file), try including these configs in the Livy request body when submitting the Spark job (refer to the Livy API).
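A minimal sketch of such a request, assuming Livy is listening on localhost:8998 (the package and repository values are the ones from the question):
import json, requests

# Sketch: create an interactive Livy session and pass the Spark configuration
# in the request body instead of relying on spark-defaults.conf.
host = 'http://localhost:8998'  # adjust to your Livy server
data = {
    'kind': 'spark',
    'conf': {
        'spark.jars.packages': 'com.github.unsupervise:spark-tss:0.1.1',
        'spark.jars.repositories': 'https://dl.bintray.com/unsupervise/maven/'
    }
}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
print(r.json())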

Related

How do I use the portable runner and spark-submit to submit beams wordcount python example to a remote spark cluster on EMR running yarn?

I am trying to submit Beam's wordcount Python example to a remote Spark cluster on EMR running YARN as its resource manager. According to the Spark documentation, this needs to be done using the portable runner.
Following the portable runner instructions, I have started the job service endpoint, and it appears to start correctly:
$ docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://*.***.***.***:7077
20/08/31 12:13:08 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: ArtifactStagingService started on localhost:8098
20/08/31 12:13:08 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: Java ExpansionService started on localhost:8097
20/08/31 12:13:08 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: JobService started on localhost:8099
20/08/31 12:13:08 INFO org.apache.beam.runners.jobsubmission.JobServerDriver: Job server now running, terminate with Ctrl+C
Now I try to submit the job using spark-submit; the input is a plain-text version of Sherlock Holmes:
$ spark-submit --master=yarn --deploy-mode=cluster wordcount.py --input data/sherlock.txt --output output --runner=PortableRunner --job_endpoint=localhost:8099 --environment_type=DOCKER --environment_config=apachebeam/python3.7_sdk
20/08/31 12:19:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/08/31 12:19:40 INFO RMProxy: Connecting to ResourceManager at ip-***-**-**-***.ec2.internal/***.**.**.***:8032
20/08/31 12:19:40 INFO Client: Requesting a new application from cluster with 2 NodeManagers
20/08/31 12:19:40 INFO Configuration: resource-types.xml not found
20/08/31 12:19:40 INFO ResourceUtils: Unable to find 'resource-types.xml'.
20/08/31 12:19:40 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (6144 MB per container)
20/08/31 12:19:40 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
20/08/31 12:19:40 INFO Client: Setting up container launch context for our AM
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: /usr/lib/spark/python/lib/pyspark.zip not found; cannot run pyspark application in YARN mode.
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.deploy.yarn.Client.$anonfun$findPySparkArchives$2(Client.scala:1167)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.deploy.yarn.Client.findPySparkArchives(Client.scala:1163)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:858)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:178)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1134)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1526)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:853)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:937)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
20/08/31 12:19:40 INFO ShutdownHookManager: Shutdown hook called
20/08/31 12:19:40 INFO ShutdownHookManager: Deleting directory /tmp/spark-ee751413-e29d-4b1f-8a16-fb8650b1ca10
It appears to want PySpark to be installed. I am fairly new to submitting Beam jobs to a Spark cluster: is there a reason why PySpark would need to be installed when submitting a Beam job? I have a feeling my spark-submit command is wrong, but I am having a hard time finding more concrete documentation on how to do what I am trying to do.

Livy session stuck on starting after successful spark context creation

I've been trying to create a new Spark session with a Livy 0.7 server running on Ubuntu 18.04.
On the same machine I have a running Spark cluster with 2 workers, and I'm able to create a normal Spark session.
My problem is that after running the following request to the Livy server, the session stays stuck in the starting state:
import json, pprint, requests, textwrap
host = 'http://localhost:8998'
data = {'kind': 'spark'}
headers = {'Content-Type': 'application/json'}
r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
r.json()
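A quick way to watch this (a sketch, assuming the session id returned above is 0) is to poll Livy's state endpoint:
import requests

# Sketch: GET /sessions/{id}/state returns e.g. {'id': 0, 'state': 'starting'}
r = requests.get('http://localhost:8998/sessions/0/state')
print(r.json())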
I can see from the session log that the session is starting and that the Spark session was created:
20/06/03 13:52:31 INFO SparkEntries: Spark context finished initialization in 5197ms
20/06/03 13:52:31 INFO SparkEntries: Created Spark session.
20/06/03 13:52:46 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (xxx.xx.xx.xxx:1828) with ID 0
20/06/03 13:52:47 INFO BlockManagerMasterEndpoint: Registering block manager xxx.xx.xx.xxx:1830 with 434.4 MB RAM, BlockManagerId(0, xxx.xx.xx.xxx, 1830, None)
and also from the Spark master UI. After the livy.rsc.server.idle-timeout is reached, the session log then outputs:
20/06/03 14:28:04 WARN RSCDriver: Shutting down RSC due to idle timeout (10m).
20/06/03 14:28:04 INFO SparkUI: Stopped Spark web UI at http://172.17.52.209:4040
20/06/03 14:28:04 INFO StandaloneSchedulerBackend: Shutting down all executors
20/06/03 14:28:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
20/06/03 14:28:04 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/03 14:28:04 INFO MemoryStore: MemoryStore cleared
20/06/03 14:28:04 INFO BlockManager: BlockManager stopped
20/06/03 14:28:04 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/03 14:28:04 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/03 14:28:04 INFO SparkContext: Successfully stopped SparkContext
20/06/03 14:28:04 INFO SparkContext: SparkContext already stopped.
and after that the session dies :(
I already tried increasing the driver timeout with no luck, and didn't find any known issues like this.
My guess is that it has something to do with the Spark driver's connectivity to the RSC, but I have no idea where to configure that.
Does anyone know the reason/solution for this?
We faced a similar problem in one of our environments. The only difference between the working and non-working environments was the Spark master setting in the livy.conf file.
I removed the config livy.spark.master=yarn from livy.conf and set this value from the code itself:
// pass master as yarn
public static JavaSparkContext getSparkContext(final String master, final String appName) {
    LOGGER.info("Creating spark context");
    SparkConf conf = new SparkConf().setAppName(appName);
    if (Strings.isNullOrEmpty(master)) {
        LOGGER.warn("No spark master found setting local!!");
        conf.setMaster("local");
    } else {
        conf.setMaster(master);
    }
    conf.set("spark.submit.deployMode", "client");
    return new JavaSparkContext(conf);
}
This worked for me.
It would help if anyone could point out why this worked for me.

Why does Spark exit with exitCode: 16?

I am using Spark 2.0.0 with Hadoop 2.7 in yarn-cluster mode. Every time, I get the following error:
17/01/04 11:18:04 INFO spark.SparkContext: Successfully stopped SparkContext
17/01/04 11:18:04 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
17/01/04 11:18:04 INFO util.ShutdownHookManager: Shutdown hook called
17/01/04 11:18:04 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-hduser/nm-local-dir/usercache/harry/appcache/application_1475261544699_0833/spark-42e40ac3-279f-4c3f-ab27-9999d20069b8
17/01/04 11:18:04 INFO spark.SparkContext: SparkContext already stopped.
However, I do get the correct printed output.
The same code works fine on Spark 1.4.0 with Hadoop 2.4.0, where I do not see any exit codes.
This issue, ".sparkStaging not cleaned if application exited incorrectly" (https://issues.apache.org/jira/browse/SPARK-17340), started after Spark 1.4 (affected versions: 1.5.2, 1.6.1, 2.0.0).
The issue is: when running Spark (YARN, cluster mode) and killing the application, .sparkStaging is not cleaned.
When this issue happens, exitCode 16 is raised in Spark 2.0.x:
ERROR ApplicationMaster: RECEIVED SIGNAL TERM
INFO ApplicationMaster: Final app status: FAILED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
Is it possible that something in your code is killing the application?
If so, it shouldn't be seen in Spark 1.4, but will be seen in Spark 2.0.0.
Please search your code for "exit" (if you have such a call in your code, the error won't be shown in Spark 1.4, but will be in Spark 2.0.0).
It seems like excess memory is being used by the JVM. Try setting the property
yarn.nodemanager.vmem-check-enabled to false in your yarn-site.xml.
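A minimal yarn-site.xml excerpt (a sketch; set this on the NodeManager hosts and restart YARN for it to take effect):
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>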

How to autostart an Apache Spark cluster using Supervisord?

Starting an Apache Spark cluster is usually done through the shell scripts provided by the code base. However, the problem is that every time the cluster shuts down and starts again, you need to execute those shell scripts to start the Spark cluster.
Supervisord is great for managing processes and seems like a good candidate for starting the spark processes automatically after reboot.
However, after starting the master process via
command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080
and the worker process by
command=/usr/lib/jvm/java-1.7.0-openjdk.x86_64/bin/java -cp :/path/spark-1.3.0-bin-cdh4/sbin/../conf:/path/spark-1.3.0-bin-cdh4/lib/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.2.0.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-api-jdo-3.2.6.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-core-3.2.10.jar:/path/spark-1.3.0-bin-cdh4/lib/datanucleus-rdbms-3.2.9.jar:etc/hadoop/conf -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
I end up with the following error after I submit my spark application:
15/06/05 17:16:25 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 0
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 1
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 2
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 3
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 4
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 5
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 6
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 7
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 8
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 9
15/06/05 17:16:32 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: Master removed our application: FAILED
15/06/05 17:16:32 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: Master removed our application: FAILED
Does anyone know how to manage the spark processes through supervisord?
I'm also open to alternative solutions.
The Spark master can be run in the foreground with
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080
and the worker with
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
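A minimal supervisord program section wrapping those commands might look like this (a sketch; the paths and hostname are the ones from the question, and the log file locations are illustrative):
[program:spark-master]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.master.Master --ip master.mydomain.com --port 7077 --webui-port 18080
autostart=true
autorestart=true
stopsignal=TERM
redirect_stderr=true
stdout_logfile=/var/log/supervisor/spark-master.log

[program:spark-worker]
command=/path/spark-1.3.0-bin-cdh4/sbin/../bin/spark-class org.apache.spark.deploy.worker.Worker spark://master.mydomain.com:7077
autostart=true
autorestart=true
stopsignal=TERM
redirect_stderr=true
stdout_logfile=/var/log/supervisor/spark-worker.log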

Can't spark-submit to analytics node on DataStax Enterprise

I have a 6-node cluster, one of which is Spark-enabled.
I also have a Spark job that I would like to submit to the cluster / that node, so I enter the following command:
spark-submit --class VDQConsumer --master spark://node-public-ip:7077 target/scala-2.10/vdq-consumer-assembly-1.0.jar
It launches the Spark UI on that node, but eventually ends up here:
15/05/14 14:19:55 INFO SparkContext: Added JAR file:/Users/cwheeler/dev/git/vdq-consumer/target/scala-2.10/vdq-consumer-assembly-1.0.jar at http://node-ip:54898/jars/vdq-consumer-assembly-1.0.jar with timestamp 1431627595602
15/05/14 14:19:55 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:19:55 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#node-ip:7077] has failed, address is now gated for [5000] ms. Reason is: [Disassociated].
15/05/14 14:20:15 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:20:35 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#node-ip:7077/user/Master...
15/05/14 14:20:55 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/05/14 14:20:55 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
15/05/14 14:20:55 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
Does anyone have any idea what just happened?
