Unable to see output or error messages for Spark on Kubernetes - apache-spark

I'm trying to run a simple Spark application using Kubernetes as the master, but I don't get the intended output/processing, nor do I see any error messages. The final pod phase is 'Failed' and the exit code is 101. The pod logs show the usual log4j warnings, but nothing else.
I'm running minikube v1.0.1 on Windows (amd64) on my office laptop using Hyper-V. I have already increased the number of CPUs and the memory on the minikube VM to 3 and 4 GB, as recommended.
I made sure that the applications run fine with Spark Standalone. The first application, 'Hello', is supposed to print a 'Hello' message. The second application, 'Calculate Monthly Revenue', is supposed to read data from Teradata over JDBC, aggregate it and write the result back to a Teradata table over JDBC.
I also made sure that 'hello minikube' works fine.
In all the code snippets below, ... indicates portions omitted for brevity and >>> indicates the command prompt.
>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://hello_2.12-0.1.0-SNAPSHOT.jar
log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/05/20 16:59:09 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: hello-1558351748442-driver
...
phase: Pending
status: []
...
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: hello-1558351748442-driver
...
phase: Failed
status: [ContainerStatus(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, image=rahulvkulkarni/default:spark-td-run, imageID=docker-pullable://rahulvkulkarni/default@sha256:1de9951c4ac9f0b5f26efa3949e1effa779b0605066f2043738402ce20e8179b, lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}), name=spark-kubernetes-driver, ready=false, restartCount=0, state=ContainerState(running=null, terminated=ContainerStateTerminated(containerID=docker://464c9c0e23d543f20954d373218c9cefefc31107711cbd2ada4d93bb31ce4d80, exitCode=101, finishedAt=2019-05-17T18:26:41Z, message=null, reason=Error, signal=null, startedAt=2019-05-17T18:26:40Z, additionalProperties={}), waiting=null, additionalProperties={}), additionalProperties={})]
19/05/20 16:59:13 INFO LoggingPodStatusWatcherImpl: Container final statuses:
Container name: spark-kubernetes-driver
Container image: rahulvkulkarni/default:spark-td-run
Container state: Terminated
Exit code: 101
19/05/20 16:59:13 INFO Client: Application Hello finished.
...
>>> kubectl logs hello-1558351748442-driver
++ id -u
...
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=172.17.0.5 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class Hello spark-internal
19/05/17 18:26:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
What does exit code 101 mean? How do I find the actual error?
Then I tried to configure log4j for detailed logging as described in How to stop INFO messages displaying on spark console?. I renamed and used the log4j.properties template provided in the conf directory. But spark-submit is not able to find the log4j.properties file that I had already included in the Docker build.
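(For reference, the rename was done in the Spark conf directory before building the image; the only line I then changed in the file was log4j.rootCategory, e.g. from INFO to DEBUG. Roughly:)
>>> ren log4j.properties.template log4j.properties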
>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --files /opt/spark/conf/log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/opt/spark/conf/log4j.properties" --name "Calculate Monthly Revenue" --class mthRev --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local://mthrev_2.10-0.1-SNAPSHOT.jar <username> <password> <server name>
19/05/20 20:02:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/05/20 20:02:52 INFO LoggingPodStatusWatcherImpl: State changed, new state:
pod name: calculate-monthly-revenue-1558362771110-driver
...
Container name: spark-kubernetes-driver
Container image: rahulvkulkarni/default:spark-td-run
Container state: Terminated
Exit code: 1
>>> kubectl logs -c spark-kubernetes-driver calculate-monthly-revenue-1558362771110-driver
++ id -u
...
log4j:ERROR Could not read configuration file from URL [file:/opt/spark/conf/log4j.properties].
java.io.FileNotFoundException: /opt/spark/conf/log4j.properties (No such file or directory)
...
log4j:ERROR Ignoring configuration file [file:/opt/spark/conf/log4j.properties].
19/05/17 21:30:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$$resolveGlobPath(DependencyUtils.scala:192)
at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
at org.apache.spark.deploy.DependencyUtils$$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$4.apply(SparkSubmit.scala:355)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 2: C:
at java.net.URI$Parser.fail(URI.java:2848)
at java.net.URI$Parser.failExpecting(URI.java:2854)
at java.net.URI$Parser.parse(URI.java:3057)
at java.net.URI.<init>(URI.java:746)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 23 more
[INFO tini (1)] Main child exited normally (with status '1')
I tried several variations of specifying the log4j.properties file: as a local file on my Windows laptop (file:///C$/Users//spark-2.4.3-bin-hadoop2.7/conf/log4j.properties and file:///C:/Users//spark-2.4.3-bin-hadoop2.7/conf/log4j.properties) and as a local file in the Linux container (file:///opt/spark/conf/log4j.properties). But I keep getting the message:
log4j:ERROR Could not read configuration file from URL [file:/C$/Users/<my-username>/spark-2.4.3-bin-hadoop2.7/conf/log4j.properties].
The IllegalArgumentException went away when I tried the path without the colon (C:), i.e. either the Linux path or the Windows path with C$.
But I still don't get the desired output of my program, and I don't know whether there is an error or what it is!

There was a typo in the spark-submit command in the specification of the application jar: I was using only two forward slashes, local://hello_2.12-0.1.0-SNAPSHOT.jar. Because of this, Spark was not able to locate it and (I think) silently ignored it, so the driver had nothing to run and produced no message. I'd expect it to give at least a warning.
Changed it to three slashes and it moved ahead:
local:///hello_2.12-0.1.0-SNAPSHOT.jar
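So the submit command from above becomes (everything else unchanged):
>>> spark-submit --master k8s://https://153.65.225.219:8443 --deploy-mode cluster --name Hello --class Hello --conf spark.executor.instances=1 --conf spark.kubernetes.container.image=rahulvkulkarni/default:spark-td-run --conf spark.kubernetes.container.image.pullSecrets=regcred local:///hello_2.12-0.1.0-SNAPSHOT.jar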
I now have another issue related to Kubernetes RBAC, which I will solve separately. The log4j issue still remains, but is not a concern for me now.
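For anyone who hits the same RBAC error: the usual fix, per the Spark on Kubernetes documentation, is to create a service account bound to the edit role and point the driver at it. A sketch (the account name 'spark' and the default namespace are just examples):
>>> kubectl create serviceaccount spark
>>> kubectl create clusterrolebinding spark-role --clusterrole=edit --serviceaccount=default:spark --namespace=default
and then add --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark to the spark-submit command.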

I solved this by deploying the config file to Azure Blob Storage:
https://$(container_name).blob.core.windows.net/jars/log4jconfig1
and passing its URL to spark-submit:
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=https://<container_name>.blob.core.windows.net/jars/log4jconfig1" \

Related

Spark Submit: You have not specified a krb5.conf file locally or via a ConfigMap

I'm using spark-submit directly to submit a Spark application to a Kubernetes cluster, using the command below:
Submit Spark
./bin/spark-submit --master k8s://https://localhost:8443
--deploy-mode cluster --name spark-pi
--class com.xxx.Main
--conf spark.executor.instances=5
--conf spark.kubernetes.container.image=spark:latest
local:///home/SparkJob_jar spark.kubernetes.file.upload.path=/opt/xxx
But I'm getting the below exception:
20/09/25 11:33:57 WARN Utils: Your hostname, XXXX resolves to a loopback address: 127.0.1.1; using xxx.yyy.zzz instead (on interface wlp59s0)
20/09/25 11:33:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/09/25 11:33:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/09/25 11:33:58 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/09/25 11:33:58 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
20/09/25 11:33:58 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:326)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:269)
The post https://issues.apache.org/jira/browse/SPARK-31800 suggested adding spark.kubernetes.file.upload.path to solve the problem. I added it to the submit command, but the issue is still there. I'd appreciate any help in figuring out the root cause of this issue.
You might have to add the following configuration flags to make the certificates available:
--conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
--conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \

spark-submit custom jar on kubernetes fails with: failed to load class [duplicate]

This question already has answers here:
How to deploy Spark application jar file to Kubernetes cluster?
(2 answers)
Closed 2 years ago.
I cannot run a custom Spark application on Kubernetes.
I have followed the setup at https://spark.apache.org/docs/latest/running-on-kubernetes.html and the steps in examples like this one: https://towardsdatascience.com/how-to-build-spark-from-source-and-deploy-it-to-a-kubernetes-cluster-in-60-minutes-225829b744f9, and I can run the spark-pi example.
I even recreated the Spark image and had it contain my xxx.jar in both /opt/spark/examples/jars and /opt/spark/jars, but I still get the 'failed to load class' issue. Any ideas what I may have missed? This is extra baffling to me because I checked that the jar is part of the image, right next to the example jars, and those work fine.
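For what it's worth, that check amounted to something like this (assuming a local Docker daemon and the image tag used in the command below):
docker run --rm --entrypoint ls spark2:latest -l /opt/spark/examples/jars/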
I run spark-submit like this:
bin/spark-submit
--master k8s://https://localhost:6443
--deploy-mode cluster
--conf spark.executor.instances=3
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark
--conf spark.kubernetes.container.image=spark2:latest
--conf spark.kubernetes.container.image.pullPolicy=Never
--class com.xxx.Application
--name myApp
local:///opt/spark/examples/jars/xxx.jar
Update: Added stacktrace:
spark.driver.bindAddress=10.1.0.68 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class com.xxx.Application local:///opt/spark/examples/jars/xxx.jar
20/05/13 11:02:09 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Error: Failed to load class com.xxx.Application.
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
Thanks!
Verify that the jar you are referring to contains the Application.class file at com/xxx/Application.class, using jar xf <jar name>.jar.
Replace
local:///opt/spark/examples/jars/xxx.jar
with
/opt/spark/examples/jars/xxx.jar
in your spark-submit command
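A non-destructive way to do the same check is to simply list the jar's contents (standard JDK jar tool; the grep pattern is just an example for this class name):
jar tf xxx.jar | grep 'com/xxx/Application.class'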

Exception in thread "main" java.lang.IllegalStateException: Cannot retrieve files with 'spark' scheme without an active SparkEnv

I'm very new to Spark and Cassandra. I got a sample from GitHub and tried to run the application from the link below:
spark-on-cassandra-quickstart
After the jar file was generated, I tried executing it with the syntax below:
C:\Users\user\Desktop\softwares\spark-2.4.3-bin-hadoop2.7\spark-2.4.3-bin-hadoop2.7\bin>spark-submit --class com.github.boneill42.JavaDemo --master spark://localhost:7077
C:\Users\user\git\spark-on-cassandra-quickstart\target/spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar spark://localhost:7077 localhost
Below is the issue I'm facing
19/06/08 22:59:49 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.IllegalStateException: Cannot retrieve files with 'spark' scheme without an active SparkEnv.
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:690)
at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:137)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:367)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:366)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Please help me resolve the issue.
In your case, it seems you want to run in standalone mode:
spark://HOST:PORT Connect to the given Spark standalone cluster master.
The port must be whichever one your master is configured to use, which is 7077 by default.
Did you start the Spark master and worker first?
Launch the master:
./sbin/start-master.sh
Launch a worker:
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://localhost:7077 -c 1 -m 512M
After starting the master and worker, you can submit your job again.
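A quick way to confirm both daemons are actually up before resubmitting (jps ships with the JDK; the standalone master's web UI at http://localhost:8080 shows the same information):
jps
The output should include a Master and a Worker process.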

AWS EMR using spark steps in cluster mode. Application application_ finished with failed status

I'm trying to launch a cluster using the AWS CLI. I use the following command:
aws emr create-cluster --name "Config1" --release-label emr-5.0.0 --applications Name=Spark --use-default-role --log-uri 's3://aws-logs-813591802533-us-west-2/elasticmapreduce/' --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium
The cluster is created successfully. Then I add a step with this command:
aws emr add-steps --cluster-id ID_CLUSTER --region us-west-2 --steps Name=SparkSubmit,Jar="command-runner.jar",Args=[spark-submit,--deploy-mode,cluster,--master,yarn,--executor-memory,1G,--class,Traccia2014,s3://tracceale/params/scalaProgram.jar,s3://tracceale/params/configS3.txt,30,300,2,"s3a://tracceale/Tempi1"],ActionOnFailure=CONTINUE
After some time, the step failed. This is the log file:
17/02/22 11:00:07 INFO RMProxy: Connecting to ResourceManager at ip-172-31-31-190.us-west-2.compute.internal/172.31.31.190:8032
17/02/22 11:00:08 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/02/22 11:00:08 INFO Client: Verifying our application has not requested
Exception in thread "main" org.apache.spark.SparkException: Application application_1487760984275_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/22 11:01:02 INFO ShutdownHookManager: Shutdown hook called
17/02/22 11:01:02 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-27baeaa9-8b3a-4ae6-97d0-abc1d3762c86
Command exiting with ret '1'
Locally (on the Hortonworks HDP 2.5 Sandbox) I run:
./spark-submit --class Traccia2014 --master local[*] --executor-memory 2G /usr/hdp/current/spark2-client/ScalaProjects/ScripRapportoBatch2.1/target/scala-2.11/traccia-22-ottobre_2.11-1.0.jar "/home/tracce/configHDFS.txt" 30 300 3
and everything works fine.
I've already read something related to my problem, but I can't figure it out.
UPDATE
Checking the Application Master logs, I get this error:
17/02/22 15:29:54 ERROR ApplicationMaster: User class threw exception: java.io.FileNotFoundException: s3:/tracceale/params/configS3.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at Traccia2014$.main(Rapporto.scala:40)
at Traccia2014.main(Rapporto.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
17/02/22 15:29:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.FileNotFoundException: s3:/tracceale/params/configS3.txt (No such file or directory))
I pass the S3 path "s3://tracceale/params/configS3.txt" to the function 'fromFile' like this:
for(line <- scala.io.Source.fromFile(logFile).getLines())
How could I solve it? Thanks in advance.
Because you are using cluster deploy mode, the logs you have included are not useful at all. They just say that the application failed but not why it failed. To figure out why it failed, you at least need to look at the Application Master logs, since that is where the Spark driver runs in cluster deploy mode, and it will probably give a better hint as to why the application failed.
Since you have configured your cluster with a --log-uri, you will find the logs for the Application Master underneath s3://aws-logs-813591802533-us-west-2/elasticmapreduce/<CLUSTER ID>/containers/<YARN Application ID>/ where the YARN Application ID is (based on the logs you included above) application_1487760984275_0001, and the container ID should be something like container_1487760984275_0001_01_000001. (The first container for an application is the Application Master.)
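For example, once EMR has pushed the logs to S3 (this happens with some delay), you can list them with the AWS CLI, keeping the cluster ID placeholder as above:
aws s3 ls s3://aws-logs-813591802533-us-west-2/elasticmapreduce/<CLUSTER ID>/containers/application_1487760984275_0001/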
What you have there is a URL to an object store, reachable from the Hadoop filesystem APIs, and a stack trace coming from java.io.File, which can't read it because it doesn't refer to anything on the local disk.
Use SparkContext.hadoopRDD() to convert the path into an RDD instead of reading it with java.io APIs.
There is also a chance that the file is missing from the expected location. You may be able to see it after SSHing into the EMR cluster, but the step itself still cannot find it and throws that file-not-found exception.
In this scenario, what I did was:
Step 1: Check that the file exists in the project directory that was copied to EMR.
For example, mine was in `//usr/local/project_folder/`
Step 2: Copy the script that you expect to run on EMR.
For example, I copied from `//usr/local/project_folder/script_name.sh` to `/home/hadoop/`
Step 3: Execute the script from /home/hadoop/ by passing the absolute path to command-runner.jar (a sketch of the full add-steps call follows below):
command-runner.jar bash /home/hadoop/script_name.sh
Thus I got my script running. Hope this helps someone.
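Wired up as an EMR step, that would look roughly like the add-steps call from the question, e.g. (the step name and ActionOnFailure value here are arbitrary):
aws emr add-steps --cluster-id ID_CLUSTER --region us-west-2 --steps Name=RunScript,Jar="command-runner.jar",Args=[bash,/home/hadoop/script_name.sh],ActionOnFailure=CONTINUE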

"Bad substitution" when submitting spark job to yarn-cluster

I am doing a smoke test against a yarn cluster using yarn-cluster as the master with the SparkPi example program. Here is the command line:
$SPARK_HOME/bin/spark-submit --master yarn-cluster
--executor-memory 8G --executor-cores 240 --class org.apache.spark.examples.SparkPi
examples/target/scala-2.11/spark-examples-1.4.1-hadoop2.7.1.jar
YARN accepts the job but then complains about a "bad substitution". Maybe it is related to hdp.version?
15/09/01 21:54:05 INFO yarn.Client: Application report for application_1441066518301_0013 (state: ACCEPTED)
15/09/01 21:54:05 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1441144443866
final status: UNDEFINED
tracking URL: http://yarnmaster-8245.lvs01.dev.ebayc3.com:8088/proxy/application_1441066518301_0013/
user: stack
15/09/01 21:54:06 INFO yarn.Client: Application report for application_1441066518301_0013 (state: ACCEPTED)
15/09/01 21:54:10 INFO yarn.Client: Application report for application_1441066518301_0013 (state: FAILED)
15/09/01 21:54:10 INFO yarn.Client:
client token: N/A
diagnostics: Application application_1441066518301_0013 failed 2 times due to AM Container for appattempt_1441066518301_0013_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://yarnmaster-8245.lvs01.dev.ebayc3.com:8088/cluster/app/application_1441066518301_0013Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_e03_1441066518301_0013_02_000001
Exit code: 1
Exception message: /mnt/yarn/nm/local/usercache/stack/appcache/
application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/
launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:
/usr/hdp/current/hadoop-client/*::$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-.6.0.${hdp.version}.jar:
/etc/hadoop/conf/secure: bad substitution
Stack trace: ExitCodeException exitCode=1: /mnt/yarn/nm/local/usercache/stack/appcache/application_1441066518301_0013/container_e03_1441066518301_0013_02_000001/launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Of note here is:
/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-.6.0.${hdp.version}.jar:
/etc/hadoop/conf/secure: bad substitution
The "sh" is linked to bash:
$ ll /bin/sh
lrwxrwxrwx 1 root root 4 Sep 1 05:48 /bin/sh -> bash
It is caused by hdp.version not getting substituted correctly. You have to set hdp.version in the file java-opts under $SPARK_HOME/conf.
And you have to set
spark.driver.extraJavaOptions -Dhdp.version=XXX
spark.yarn.am.extraJavaOptions -Dhdp.version=XXX
in spark-defaults.conf under $SPARK_HOME/conf, where XXX is your installed HDP version.
If you are using Spark with HDP, then you have to do the following:
Add these entries in $SPARK_HOME/conf/spark-defaults.conf
spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 (your installed HDP version)
Create a file called java-opts in $SPARK_HOME/conf and add the installed HDP version to that file like this:
-Dhdp.version=2.2.0.0-2041 (your installed HDP version)
To figure out which hdp version is installed, please run this command in the cluster:
hdp-select status hadoop-client
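Concretely, with the example version above, the two configuration steps amount to something like this (run on the node where you launch spark-submit, with SPARK_HOME pointing at your Spark installation):
echo 'spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041' >> $SPARK_HOME/conf/spark-defaults.conf
echo 'spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041' >> $SPARK_HOME/conf/spark-defaults.conf
echo '-Dhdp.version=2.2.0.0-2041' > $SPARK_HOME/conf/java-opts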
I had the same issue:
launch_container.sh: line 24: $PWD:$PWD/__hadoop_conf__:$PWD/__spark__.jar:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*::$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure: bad substitution
As I couldn't find any /usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo* file, I just edited mapred-site.xml and removed
"/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:"
Go to Ambari YARN.
Click on Configs -> Advanced -> Custom yarn-site -> Add Property ...
Add hdp.version as the key and your HDP version as the value.
You can get the HDP version with the command below:
hdp-select versions
e.g. 2.5.3.0-37
Now add your property as
hdp.version=2.5.3.0-37
Otherwise, replace ${hdp.version} with your HDP version (2.5.3.0-37) in yarn-site.xml and yarn-env.sh.
I also had this issue using BigInsights 4.2.0.0 with YARN, Spark and MapReduce 2, and what was causing it was iop.version. To fix it, you have to add the iop.version variable to mapred-site, which can be done with the following steps:
In Ambari Server go to:
MAPREDUCE2
Configs (tab)
Advanced (tab)
Click into Custom mapred-site
Add Property...
Add iop.version as the key and your BigInsights version as the value.
Restart all services.
This has fixed it.
This may be caused by /bin/sh being linked to dash instead of bash, which often happens on Debian-based systems.
To fix it, run sudo dpkg-reconfigure dash and select No.
