We get an exception when setting up Flink on an Azure HDInsight cluster.
./bin/yarn-session.sh -n 4 -jm 1024m -tm 4096m
Throws:
org.apache.flink.client.deployment.ClusterDeploymentException: Couldn't deploy Yarn session cluster
Caused by:
Failing this attempt. Diagnostics:
[2018-10-24 00:41:17.703] java.io.IOException: Resource wasb://../.flink/application_1539730571763_0057/application_1539730571763_0057-flink-conf.yaml8158650202504017094.tmp changed on src filesystem (expected 1540341676000, was 1540341677000)
at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:273)
It seems to be because WASB blob storage does not preserve the original timestamps of copied files, which breaks the HDFS API abstraction on top of WASB.
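One rough way to check that hypothesis is to watch whether the reported modification time of a freshly written WASB file drifts (the path below is just a placeholder):
# create a file on wasb and read its mtime twice; %Y prints mtime in milliseconds
hadoop fs -touchz wasb:///tmp/ts-test
hadoop fs -stat "%Y" wasb:///tmp/ts-test
sleep 5
hadoop fs -stat "%Y" wasb:///tmp/ts-test   # a different value here would match the error above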
Any workarounds for this?
The only other thread I could find was "Oozie/yarn: resource changed on src filesystem".
We worked with MS Azure support, and they confirmed that it only works with Flink 1.4.x, not 1.5.x or 1.6.x.
We downgraded to 1.4.x, and it is working fine now on the Azure cluster.
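For the record, the downgrade amounts to running the same session command from a 1.4.x distribution; a sketch (the archive file name is illustrative, and note that 1.4.x takes the memory flags as plain MB values):
# fetch a Flink 1.4.x build and start the YARN session from it
wget https://archive.apache.org/dist/flink/flink-1.4.2/flink-1.4.2-bin-hadoop27-scala_2.11.tgz
tar xzf flink-1.4.2-bin-hadoop27-scala_2.11.tgz
cd flink-1.4.2 && ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096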
I am trying to run the spark-submit command on my Hadoop cluster. Here is a summary of my Hadoop cluster:
The cluster is built from 5 VirtualBox VMs connected on an internal network.
There is 1 namenode and 4 datanodes.
All the VMs were built from the Bitnami Hadoop Stack VirtualBox image.
I am trying to run one of the Spark examples using the following spark-submit command:
spark-submit --class org.apache.spark.examples.SparkPi $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 10
I get the following error:
[2022-07-25 13:32:39.253]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
I get the same error when trying to run a script with PySpark.
I have tried/verified the following:
Environment variables HADOOP_HOME, SPARK_HOME, and HADOOP_CONF_DIR have been set in my .bashrc file
SPARK_DIST_CLASSPATH and HADOOP_CONF_DIR have been defined in spark-env.sh
Added spark.master yarn, spark.yarn.stagingDir hdfs://hadoop-namenode:8020/user/bitnami/sparkStaging and spark.yarn.jars hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/ in spark-defaults.conf
I have uploaded the jars into hdfs (i.e. hadoop fs -put $SPARK_HOME/jars/* hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/ )
The logs accessible via the web interface (i.e. http://hadoop-namenode:8042 ) do not provide any further details about the error.
This section of the Spark documentation seems relevant to the error, since the YARN libraries should be included by default, but only if you've installed the appropriate Spark distribution:
For with-hadoop Spark distribution, since it contains a built-in Hadoop runtime already, by default, when a job is submitted to Hadoop Yarn cluster, to prevent jar conflict, it will not populate Yarn’s classpath into Spark. To override this behavior, you can set spark.yarn.populateHadoopClasspath=true. For no-hadoop Spark distribution, Spark will populate Yarn’s classpath by default in order to get Hadoop runtime. For with-hadoop Spark distribution, if your application depends on certain library that is only available in the cluster, you can try to populate the Yarn classpath by setting the property mentioned above. If you run into jar conflict issue by doing so, you will need to turn it off and include this library in your application jar.
https://spark.apache.org/docs/latest/running-on-yarn.html#preparations
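If you are on the with-hadoop build, the switch from the quoted passage can be tried on a single submission (this command simply mirrors the one from the question):
# let YARN's classpath through, per the quoted docs
spark-submit --class org.apache.spark.examples.SparkPi \
  --conf spark.yarn.populateHadoopClasspath=true \
  $SPARK_HOME/examples/jars/spark-examples_2.12-3.0.3.jar 10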
Otherwise, yarn.application.classpath in yarn-site.xml refers to local filesystem paths on each of the YARN servers where JARs are available to all YARN applications (spark.yarn.jars or extra packages get layered on top of this).
Another problem could be file permissions. You probably shouldn't put Spark jars into an HDFS user folder if they're meant to be used by all users. Typically, I'd put them under hdfs:///apps/spark/<version>, then make that world-readable (directories need the execute bit for other users to traverse them, so 755 rather than 744).
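A sketch of that layout (the version directory is illustrative):
# stage the jars in a shared, versioned location and open read access
hdfs dfs -mkdir -p /apps/spark/3.0.3
hdfs dfs -put $SPARK_HOME/jars/* /apps/spark/3.0.3/
hdfs dfs -chmod -R 755 /apps/spark/3.0.3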
The Spark / YARN UI should show the complete classpath of the application for further debugging.
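If the UI is hard to reach, the launch context sometimes shows up in the aggregated logs as well (the application id is a placeholder, and whether the classpath appears depends on your log settings):
# search the aggregated container logs for the classpath
yarn logs -applicationId application_1234567890123_0001 | grep -iA2 classpath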
I figured out why I was getting this error. It turns out that I made an error while specifying spark.yarn.jars in spark-defaults.conf.
The value of this property must be
hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/*
instead of
hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/
In other words, the value of this property must point at the jar files themselves (hence the trailing glob), not at the folder containing them.
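For reference, the corrected spark-defaults.conf entries from the setup above would read:
spark.master            yarn
spark.yarn.stagingDir   hdfs://hadoop-namenode:8020/user/bitnami/sparkStaging
spark.yarn.jars         hdfs://hadoop-namenode:8020/user/bitnami/spark/jars/*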
I am trying to set up a Spark cluster on k8s. I've managed to create and set up a cluster with three nodes by following this article:
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
After that, when I tried to deploy Spark on the cluster, it failed at the spark-submit step.
I used this command:
~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
--master k8s://https://206.189.126.172:6443 \
--deploy-mode cluster \
--name word-count \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
--conf spark.kubernetes.driver.pod.name=word-count \
local:///opt/spark/examples/jars/spark-examples_2.11-2.3.0.jar
And it gives me this error:
Exception in thread "main" org.apache.spark.SparkException: The Kubernetes mode does not yet support referencing application dependencies in the local file system.
at org.apache.spark.deploy.k8s.submit.DriverConfigOrchestrator.getAllConfigurationSteps(DriverConfigOrchestrator.scala:122)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:229)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:227)
at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2585)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:227)
at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:192)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Shutdown hook called
2018-06-04 10:58:24 INFO ShutdownHookManager:54 - Deleting directory /private/var/folders/lz/0bb8xlyd247cwc3kvh6pmrz00000gn/T/spark-3967f4ae-e8b3-428d-ba22-580fc9c840cd
Note: I followed this article for installing Spark on k8s.
https://spark.apache.org/docs/latest/running-on-kubernetes.html
The error message comes from commit 5d7c4ba4d73a72f26d591108db3c20b4a6c84f3f and includes a reference to the page you mention, "Running Spark on Kubernetes", with the comment you indicate:
// TODO(SPARK-23153): remove once submission client local dependencies are supported.
if (existSubmissionLocalFiles(sparkJars) || existSubmissionLocalFiles(sparkFiles)) {
throw new SparkException("The Kubernetes mode does not yet support referencing application " +
"dependencies in the local file system.")
}
This is described in SPARK-18278:
it wouldn't accept running a local: jar file, e.g. local:///opt/spark/examples/jars/spark-examples_2.11-2.2.0-k8s-0.5.0.jar, on my spark docker image (allowsMixedArguments and isAppResourceReq booleans in SparkSubmitCommandBuilder.java get in the way).
And this is linked to kubernetes issue 34377
The issue SPARK-22962 "Kubernetes app fails if local files are used" mentions:
This is the resource staging server use-case. We'll upstream this in the 2.4.0 timeframe.
In the meantime, that error message was introduced in PR 20320.
It includes the comment:
The manual tests I did actually use a main app jar located on gcs and http.
To be specific and for record, I did the following tests:
Using a gs:// main application jar and a http:// dependency jar. Succeeded.
Using a https:// main application jar and a http:// dependency jar. Succeeded.
Using a local:// main application jar. Succeeded.
Using a file:// main application jar. Failed.
Using a file:// dependency jar. Failed.
That issue should have been fixed by now, and the OP garfiny confirms in the comments:
I used the newest spark-kubernetes jar to replace the one in spark-2.3.0-bin-hadoop2.7 package. The exception is gone.
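A sketch of that workaround, assuming a newer spark-kubernetes artifact is at hand (file names are illustrative):
# back up the bundled jar and drop in the newer build
cd ~/opt/spark/spark-2.3.0-bin-hadoop2.7/jars
mv spark-kubernetes_2.11-2.3.0.jar spark-kubernetes_2.11-2.3.0.jar.bak
cp /path/to/newer/spark-kubernetes_2.11.jar .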
According to the mentioned documentation:
Dependency Management
If your application’s dependencies are all hosted in remote locations like HDFS or HTTP servers, they may be referred to by their appropriate remote URIs. Also, application dependencies can be pre-mounted into custom-built Docker images. Those dependencies can be added to the classpath by referencing them with local:// URIs and/or setting the SPARK_EXTRA_CLASSPATH environment variable in your Dockerfiles. The local:// scheme is also required when referring to dependencies in custom-built Docker images in spark-submit.
Note that using application dependencies from the submission client’s local file system is currently not yet supported.
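Following that documentation, one way around the limitation is to reference the application jar by a remote URI instead of a submission-local path (the https URL below is a placeholder):
# same submission, but the app jar is fetched from a remote location
~/opt/spark/spark-2.3.0-bin-hadoop2.7/bin/spark-submit \
  --master k8s://https://206.189.126.172:6443 \
  --deploy-mode cluster \
  --name word-count \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=docker.io/garfiny/spark:v2.3.0 \
  https://example.com/jars/spark-examples_2.11-2.3.0.jar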
I'm using the AWS CLI, and I launch a cluster with the following command:
aws emr create-cluster --name "Config1" --release-label emr-5.0.0 --applications Name=Spark --use-default-role --ec2-attributes KeyName=ChiaveEMR --log-uri 's3://aws-logs-813591802533-us-west-2/elasticmapreduce/' --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium
After that, I put a file onto the master node:
aws emr put --cluster-id j-NSGFSP57255P --key-pair-file "ChiaveEMR.pem" --src "./configS3.txt"
The file is located in /home/hadoop/configS3.txt.
Then I launch a step:
aws emr add-steps --cluster-id ID_CLUSTER --region us-west-2 --steps Type=Spark,Name=SparkSubmit,Args=[--deploy-mode,cluster,--master,yarn,--executor-memory,1G,--class,Traccia2014,s3://tracceale/params/traccia-22-ottobre_2.11-1.0Ale.jar,/home/hadoop/configS3.txt,30,300,2,"s3a://tracceale/Tempi1"],ActionOnFailure=CONTINUE
But I get this error:
17/02/23 14:49:51 ERROR ApplicationMaster: User class threw exception: java.io.FileNotFoundException: /home/hadoop/configS3.txt (No such file or directory)
java.io.FileNotFoundException: /home/hadoop/configS3.txt (No such file or directory)
This is probably due to the fact that configS3.txt is located on the master and not on the slaves.
How could I pass configS3.txt to the spark-submit script? I've tried it from S3 too, but it doesn't work. Any solutions? Thanks in advance.
Since you are using --deploy-mode cluster, the driver runs on a CORE/TASK instance rather than the MASTER instance. So yes, it's because you uploaded the file to the MASTER instance, but the code that's trying to access the file is not running on the MASTER instance.
Given that the error you are encountering is a FileNotFoundException, it sounds like your application code is trying to open the file directly, which means that of course you can't simply use the S3 path: you can't do something like new File("s3://bucket/key"), because Java has no idea how to handle that. My assumption could be wrong, though, because you have not included your application code or explained what you are doing with this configS3.txt file.
Maurizio: you're still trying to fix your previous problem.
On a distributed system, you need files that are visible on all machines (which the s3:// filestore delivers) and an API that works with data in the distributed filesystem, which SparkContext.hadoopRDD() delivers. You aren't going to get anywhere by trying to work out how to get a file onto the local disk of every VM, because that's not the problem you need to fix: it's how to get your code to read data from the shared object store.
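As a quick sanity check from any node, the shared copy can be read through that same filesystem layer (the exact S3 key is a placeholder; put the config wherever suits your bucket):
# confirm the object is visible through the s3:// filesystem on EMR
hadoop fs -cat s3://tracceale/params/configS3.txt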
Sorry
I'm working on integration between Mesos and Spark. For now, I can start the SlaveMesosDispatcher in a Docker container, and I'd also like to run the Spark executor in a Mesos Docker container. I did the following configuration for it, but I got an error; any suggestions?
Configuration:
Spark: conf/spark-defaults.conf
spark.mesos.executor.docker.image ubuntu
spark.mesos.executor.docker.volumes /usr/bin:/usr/bin,/usr/local/lib:/usr/local/lib,/usr/lib:/usr/lib,/lib:/lib,/home/test/workshop/spark:/root/spark
spark.mesos.executor.home /root/spark
#spark.executorEnv.SPARK_HOME /root/spark
spark.executorEnv.MESOS_NATIVE_LIBRARY /usr/local/lib
NOTE: Spark is installed in /home/test/workshop/spark, and all dependencies are installed.
After submitting SparkPi to the dispatcher, the driver job starts but fails. The error message is:
I1015 11:10:29.488456 18697 exec.cpp:134] Version: 0.26.0
I1015 11:10:29.506619 18699 exec.cpp:208] Executor registered on slave b7e24114-7585-40bc-879b-6a1188cb65b6-S1
WARNING: Your kernel does not support swap limit capabilities, memory limited without swap.
/bin/sh: 1: ./bin/spark-submit: not found
Does anyone know how to map/set the Spark home in Docker for this case?
I think the issue you're seeing here is that the container's current working directory isn't where Spark is installed. When you specify a Docker image for Spark to use with Mesos, it expects the default working directory of the container to be inside $SPARK_HOME, where it can find ./bin/spark-submit.
You can see that logic here.
It doesn't look like you can configure the working directory through the Spark configuration itself, which means you'll need to build a custom image on top of ubuntu that simply sets WORKDIR /root/spark.
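A minimal sketch of such an image (base tag and path follow the configuration above):
# Dockerfile: start the container inside the mounted Spark home
FROM ubuntu
WORKDIR /root/spark
You would then point spark.mesos.executor.docker.image at the resulting image tag instead of plain ubuntu.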
We are running certain Spark jobs, and we see the .sparkStaging directory in HDFS persisting after job completion.
Is there any parameter we need to set to delete the staging directory after job completion?
spark.yarn.preserve.staging.files is false by default, so we have not set it explicitly.
We are running Spark 1.2 on YARN with the Hortonworks distribution.
Regards,
Manju
Please check for the following log events in the job completion console output to get more insights into what's going on:
ApplicationMaster: Deleting staging directory .sparkStaging/application_xxxxxx_xxxx - this means that the application was able to successfully clean up the staging directory
ApplicationMaster: Staging directory is null - this means that the application was not able to find the staging directory for this application
ApplicationMaster: Failed to cleanup staging dir .sparkStaging/application_xxxxxx_xxxx - this means something went wrong deleting the staging directory
Could you also double-check these settings on the cluster, which can affect the scenario you mentioned: spark.yarn.preserve.staging.files and the SPARK_YARN_STAGING_DIR environment variable.
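To dig further, the leftover directories and the cleanup messages can be inspected directly (user name and application id are placeholders):
# list leftover staging dirs for the submitting user
hdfs dfs -ls /user/<user>/.sparkStaging
# search the aggregated logs for the cleanup lines listed above
yarn logs -applicationId application_xxxxxx_xxxx | grep -i staging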