I am trying to use JFrog as an authenticated Artifactory with Apache Spark, using the https://user:pswd@url scheme suggested here: https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management, but it is not working with the following configuration:
"jars":"https://user:pswd#test.jfrog.io/artifactory/default-generic-local/spark-sql-kafka-0-10_2.12-3.0.3.jar"
It fails with:
22/04/27 15:27:21 INFO SecurityManager - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(1000690000); groups with view permissions: Set(); users with modify permissions: Set(1000690000); groups with modify permissions: Set()
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://user:pswd#test.jfrog.io/artifactory/default-generic-local/spark-sql-kafka-0-10_2.12-3.0.3.jar
Yet the same credentials work from a console:
curl -v "https://user:pswd@test.jfrog.io/artifactory/default-generic-local/"
A non-authenticated JFrog artifact also works fine. Any advice on using an authenticated artifact from Spark?
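One workaround sketch, not from the original post: fetch the jar out of band with explicit basic auth and hand Spark a local path instead of the authenticated URL (the local path and the remaining spark-submit arguments are placeholders):

# download with curl, which handles basic auth, then submit a local jar
curl -u user:pswd -o /tmp/spark-sql-kafka-0-10_2.12-3.0.3.jar \
  "https://test.jfrog.io/artifactory/default-generic-local/spark-sql-kafka-0-10_2.12-3.0.3.jar"
spark-submit --jars /tmp/spark-sql-kafka-0-10_2.12-3.0.3.jar ...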
Related
I am trying to run Spark on IPv6. The spark-master comes up on the IPv6 DNS hostname, but the spark worker node does not start; whether I pass an IP or a DNS name, the error is the same. I have to set LOCAL_WORKER_IP=127.0.0.1 to make the spark worker start.
SPARK Worker log:
fd74:ca9b:3a09:868c:172:18:0:462a spark-master.t253-u000265.svc.cluster.local
21/08/09 16:41:40 INFO Worker: Started daemon with process name: 10@v4-virtio-spark-worker-zjgxs
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for TERM
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for HUP
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for INT
21/08/09 16:41:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/08/09 16:41:41 INFO SecurityManager: Changing view acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing view acls groups to:
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls groups to:
21/08/09 16:41:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/08/09 16:41:42 INFO Utils: Successfully started service 'sparkWorker' on port 37816
21/08/09 16:41:42 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.lang.AssertionError: assertion failed: Expected hostname (not IP) but got fd74:ca9b:3a09:868c:172:18:0:488e
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.util.Utils$.checkHost(Utils.scala:1014)
at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:60)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:811)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:779)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
21/08/09 16:41:42 INFO ShutdownHookManager: Shutdown hook called
Does anyone know whether the Spark worker can be configured to run on IPv6?
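One hedged workaround sketch, not from the original post: the checkHost assertion wants a hostname rather than a raw IPv6 address, so forcing the worker to identify itself by a resolvable DNS name via Spark's standard SPARK_LOCAL_HOSTNAME variable may get past the assertion. The hostnames below are placeholders:

# SPARK_LOCAL_HOSTNAME is a standard Spark env var; use a name that resolves to the worker's IPv6 address
export SPARK_LOCAL_HOSTNAME=spark-worker-0.t253-u000265.svc.cluster.local
# start-worker.sh is named start-slave.sh on older Spark releases
${SPARK_HOME}/sbin/start-worker.sh spark://spark-master.t253-u000265.svc.cluster.local:7077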
TLDR;
Is it possible to configure the Beam portable runner with Spark configurations? More precisely, is it possible to set spark.driver.host in the Portable Runner?
Motivation
Currently we have Airflow deployed in a Kubernetes cluster, and since we aim to use TensorFlow Extended we need Apache Beam. For our use case, Spark is the appropriate runner, and as Airflow and TensorFlow are written in Python we need to use Apache Beam's Portable Runner (https://beam.apache.org/documentation/runners/spark/#portability).
The problem
The Portable Runner creates the Spark context inside its own container and leaves no room for configuring the driver DNS, which makes the executors inside the worker pods unable to communicate with the driver (the job server).
Setup
Following the Beam documentation, the job server was deployed in the same pod as Airflow so these two containers can use the local network.
Job server config:
- name: beam-spark-job-server
  image: apache/beam_spark_job_server:2.27.0
  args: ["--spark-master-url=spark://spark-master:7077"]
Job server/airflow service:
apiVersion: v1
kind: Service
metadata:
  name: airflow-scheduler
  labels:
    app: airflow-k8s
spec:
  type: ClusterIP
  selector:
    app: airflow-scheduler
  ports:
    - port: 8793
      protocol: TCP
      targetPort: 8793
      name: scheduler
    - port: 8099
      protocol: TCP
      targetPort: 8099
      name: job-server
    - port: 7077
      protocol: TCP
      targetPort: 7077
      name: spark-master
    - port: 8098
      protocol: TCP
      targetPort: 8098
      name: artifact
    - port: 8097
      protocol: TCP
      targetPort: 8097
      name: java-expansion
Ports 8097, 8098 and 8099 belong to the job server, 8793 to Airflow, and 7077 to the Spark master.
Development/Errors
When testing a simple Beam example, python -m apache_beam.examples.wordcount --output ./data_test/ --runner=PortableRunner --job_endpoint=localhost:8099 --environment_type=LOOPBACK, from the airflow container, I get the following response in the airflow pod:
Defaulting container name to airflow-scheduler.
Use 'kubectl describe pod/airflow-scheduler-local-f685b5bc7-9d7r6 -n airflow-main-local' to see all of the containers in this pod.
airflow@airflow-scheduler-local-f685b5bc7-9d7r6:/opt/airflow$ python -m apache_beam.examples.wordcount --output ./data_test/ --runner=PortableRunner --job_endpoint=localhost:8099 --environment_type=LOOPBACK
INFO:apache_beam.internal.gcp.auth:Setting socket default timeout to 60 seconds.
INFO:apache_beam.internal.gcp.auth:socket default timeout is 60.0 seconds.
INFO:oauth2client.client:Timeout attempting to reach GCE metadata service.
WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
Connecting anonymously.
INFO:apache_beam.runners.worker.worker_pool_main:Listening for workers at localhost:35837
WARNING:root:Make sure that locally built Python SDK docker image has Python 3.7 interpreter.
INFO:root:Default Python SDK image for environment is apache/beam_python3.7_sdk:2.27.0
INFO:apache_beam.runners.portability.portable_runner:Environment "LOOPBACK" has started a component necessary for the execution. Be sure to run the pipeline using
with Pipeline() as p:
p.apply(..)
This ensures that the pipeline finishes before this program exits.
INFO:apache_beam.runners.portability.portable_runner:Job state changed to STOPPED
INFO:apache_beam.runners.portability.portable_runner:Job state changed to STARTING
INFO:apache_beam.runners.portability.portable_runner:Job state changed to RUNNING
And the worker log:
21/02/19 19:50:00 INFO Worker: Asked to launch executor app-20210219194804-0000/47 for BeamApp-root-0219194747-7d7938cf_51452c51-dffe-4c61-bcb7-60c7779e3256
21/02/19 19:50:00 INFO SecurityManager: Changing view acls to: root
21/02/19 19:50:00 INFO SecurityManager: Changing modify acls to: root
21/02/19 19:50:00 INFO SecurityManager: Changing view acls groups to:
21/02/19 19:50:00 INFO SecurityManager: Changing modify acls groups to:
21/02/19 19:50:00 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/02/19 19:50:00 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=44447" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@airflow-scheduler-local-f685b5bc7-9d7r6:44447" "--executor-id" "47" "--hostname" "172.18.0.3" "--cores" "1" "--app-id" "app-20210219194804-0000" "--worker-url" "spark://Worker@172.18.0.3:35837"
21/02/19 19:50:02 INFO Worker: Executor app-20210219194804-0000/47 finished with state EXITED message Command exited with code 1 exitStatus 1
21/02/19 19:50:02 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 47
21/02/19 19:50:02 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210219194804-0000, execId=47)
21/02/19 19:50:02 INFO Worker: Asked to launch executor app-20210219194804-0000/48 for BeamApp-root-0219194747-7d7938cf_51452c51-dffe-4c61-bcb7-60c7779e3256
21/02/19 19:50:02 INFO SecurityManager: Changing view acls to: root
21/02/19 19:50:02 INFO SecurityManager: Changing modify acls to: root
21/02/19 19:50:02 INFO SecurityManager: Changing view acls groups to:
21/02/19 19:50:02 INFO SecurityManager: Changing modify acls groups to:
21/02/19 19:50:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/02/19 19:50:02 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=44447" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@airflow-scheduler-local-f685b5bc7-9d7r6:44447" "--executor-id" "48" "--hostname" "172.18.0.3" "--cores" "1" "--app-id" "app-20210219194804-0000" "--worker-url" "spark://Worker@172.18.0.3:35837"
21/02/19 19:50:04 INFO Worker: Executor app-20210219194804-0000/48 finished with state EXITED message Command exited with code 1 exitStatus 1
21/02/19 19:50:04 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 48
21/02/19 19:50:04 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210219194804-0000, execId=48)
21/02/19 19:50:04 INFO Worker: Asked to launch executor app-20210219194804-0000/49 for BeamApp-root-0219194747-7d7938cf_51452c51-dffe-4c61-bcb7-60c7779e3256
21/02/19 19:50:04 INFO SecurityManager: Changing view acls to: root
21/02/19 19:50:04 INFO SecurityManager: Changing modify acls to: root
21/02/19 19:50:04 INFO SecurityManager: Changing view acls groups to:
21/02/19 19:50:04 INFO SecurityManager: Changing modify acls groups to:
21/02/19 19:50:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/02/19 19:50:04 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=44447" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@airflow-scheduler-local-f685b5bc7-9d7r6:44447" "--executor-id" "49" "--hostname" "172.18.0.3" "--cores" "1" "--app-id" "app-20210219194804-0000" "--worker-url" "spark://Worker@172.18.0.3:35837"
...
As we can see, the executors keep exiting; as far as I know, this is caused by the missing communication between the executor and the driver (the job server in this case). Also, the "--driver-url" is resolved to the driver pod name together with the random port from "-Dspark.driver.port".
Since we cannot define the name of the service, the worker tries to use the original hostname from the driver and a randomly generated port. Because this configuration comes from the driver, changing the default conf files on the worker/master has no effect.
Using this answer as an example, I tried setting the environment variable SPARK_PUBLIC_DNS on the job server, but this did not change anything in the worker logs.
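For reference, this is roughly how that attempt would be expressed in the job server container spec (a sketch only; the value is a placeholder and, as noted above, it had no visible effect):

- name: beam-spark-job-server
  image: apache/beam_spark_job_server:2.27.0
  args: ["--spark-master-url=spark://spark-master:7077"]
  env:
    - name: SPARK_PUBLIC_DNS
      value: airflow-scheduler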
Observation
Running a Spark job directly in Kubernetes with
kubectl run spark-base --rm -it --labels="app=spark-client" --image bde2020/spark-base:2.4.5-hadoop2.7 -- bash ./spark/bin/pyspark --master spark://spark-master:7077 --conf spark.driver.host=spark-client
having the service:
apiVersion: v1
kind: Service
metadata:
  name: spark-client
spec:
  selector:
    app: spark-client
  clusterIP: None
I get a fully working pyspark shell. If I omit the --conf parameter, I get the same behavior as in the first setup (executors exiting indefinitely):
21/02/19 20:21:02 INFO Worker: Executor app-20210219202050-0002/4 finished with state EXITED message Command exited with code 1 exitStatus 1
21/02/19 20:21:02 INFO ExternalShuffleBlockResolver: Clean up non-shuffle files associated with the finished executor 4
21/02/19 20:21:02 INFO ExternalShuffleBlockResolver: Executor is not registered (appId=app-20210219202050-0002, execId=4)
21/02/19 20:21:02 INFO Worker: Asked to launch executor app-20210219202050-0002/5 for Spark shell
21/02/19 20:21:02 INFO SecurityManager: Changing view acls to: root
21/02/19 20:21:02 INFO SecurityManager: Changing modify acls to: root
21/02/19 20:21:02 INFO SecurityManager: Changing view acls groups to:
21/02/19 20:21:02 INFO SecurityManager: Changing modify acls groups to:
21/02/19 20:21:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/02/19 20:21:02 INFO ExecutorRunner: Launch command: "/usr/local/openjdk-8/bin/java" "-cp" "/opt/spark/conf/:/opt/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=46161" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@spark-base:46161" "--executor-id" "5" "--hostname" "172.18.0.20" "--cores" "1" "--app-id" "app-20210219202050-0002" "--worker-url" "spark://Worker@172.18.0.20:45151"
There are three solutions to choose from, depending on your deployment requirements, in order of difficulty:
Use the Spark "uber jar" job server. This starts an embedded job server inside the Spark master, instead of using a standalone job server in a container. This would simplify your deployment a lot, since you would not need to start the beam_spark_job_server container at all.
python -m apache_beam.examples.wordcount \
--output ./data_test/ \
--runner=SparkRunner \
--spark_submit_uber_jar \
--spark_master_url=spark://spark-master:7077 \
--environment_type=LOOPBACK
You can pass the properties through a Spark configuration file. Create the Spark configuration file, and add spark.driver.host and whatever other properties you need. In the docker run command for the job server, mount that configuration file to the container, and set the SPARK_CONF_DIR environment variable to point to that directory.
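A minimal sketch of that approach, assuming airflow-scheduler is the hostname you want executors to reach back to (paths and the mount point are placeholders):

mkdir -p ./spark-conf
printf 'spark.driver.host airflow-scheduler\n' > ./spark-conf/spark-defaults.conf
docker run \
  -v "$(pwd)/spark-conf:/spark-conf" \
  -e SPARK_CONF_DIR=/spark-conf \
  apache/beam_spark_job_server:2.27.0 \
  --spark-master-url=spark://spark-master:7077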
If neither of those works for you, you can alternatively build your own customized version of the job server container. Pull the Beam source from GitHub. Check out the release branch you want to use (e.g. git checkout origin/release-2.28.0). Modify the entrypoint spark-job-server.sh and set -Dspark.driver.host=x there. Then build the container using ./gradlew :runners:spark:job-server:container:docker -Pdocker-repository-root="your-repo" -Pdocker-tag="your-tag".
Let me revise the answer. The job server needs to be able to communicate with the workers, and vice versa; the executors keep exiting because of this. You need to configure the deployment so that they can communicate, and a Kubernetes headless service can solve this.
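As an illustration, a headless service fronting the job server pod might look like the following (names and labels are placeholders; the service name must be the hostname the executors resolve as the driver):

apiVersion: v1
kind: Service
metadata:
  name: beam-spark-job-server
spec:
  clusterIP: None          # headless: DNS resolves directly to the pod IP
  selector:
    app: beam-spark-job-server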
A working example is available at https://github.com/cometta/python-apache-beam-spark. If it is useful to you, please consider starring the repository.
I have set up a Spark 2.1.1 cluster (1 master, 2 slaves) in standalone mode, following http://paxcel.net/blog/how-to-setup-apache-spark-standalone-cluster-on-multiple-machine/.
I do not have a pre-existing Hadoop setup on any of the machines. I want to start the Spark history server.
I run it as follows:
roshan@bolt:~/spark/spark_home/sbin$ ./start-history-server.sh
and in the spark-defaults.conf I set this:
spark.eventLog.enabled true
But it fails with the error:
17/06/29 22:59:03 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(roshan); groups with view permissions: Set(); users with modify permissions: Set(roshan); groups with modify permissions: Set()
17/06/29 22:59:03 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.io.FileNotFoundException: Log directory specified does not exist: file:/tmp/spark-events Did you configure the correct one through spark.history.fs.logDirectory?
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:214)
What should I set spark.history.fs.logDirectory and spark.eventLog.dir to?
Update 1:
spark.eventLog.enabled true
spark.history.fs.logDirectory file:////home/roshan/spark/spark_home/logs
spark.eventLog.dir file:////home/roshan/spark/spark_home/logs
but I am always getting this error:
java.lang.IllegalArgumentException: Codec [1] is not available. Consider setting spark.io.compression.codec=snappy at org.apache.spark.io.Co
By default, Spark defines file:/tmp/spark-events as the log directory for the history server, and your log clearly says spark.history.fs.logDirectory is not configured.
First of all, you would need to create a spark-events folder in /tmp (which is not a good idea, as /tmp is wiped every time the machine is rebooted) and then add spark.history.fs.logDirectory to spark-defaults.conf to point to that directory. But I suggest you create another folder that the Spark user has access to and update spark-defaults.conf accordingly.
You need to define two more variables in the spark-defaults.conf file:
spark.eventLog.dir file:path to where you want to store your logs
spark.history.fs.logDirectory file:same path as above
Suppose you want to store the logs in /opt/spark-events, which the Spark user has access to; the above parameters in spark-defaults.conf would then be:
spark.eventLog.enabled true
spark.eventLog.dir file:/opt/spark-events
spark.history.fs.logDirectory file:/opt/spark-events
You can find more information in Monitoring and Instrumentation
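To make this concrete, a minimal sequence might look like the following (a sketch assuming the Spark user is roshan and the install path from the question; adjust both as needed):

sudo mkdir -p /opt/spark-events
sudo chown roshan:roshan /opt/spark-events
# restart the history server so it picks up the new spark-defaults.conf
~/spark/spark_home/sbin/stop-history-server.sh
~/spark/spark_home/sbin/start-history-server.sh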
Try setting
spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
in spark-defaults.conf
I have logged one of my Apache Spark jobs on the university cluster using the following snippet:
from pyspark import SparkConf

conf = SparkConf() \
    .setAppName("Ex").set("spark.eventLog.enabled", "true") \
    .set("spark.eventLog.dir", "log")
After the job completed, I copied the log file app-20170416171823-0000 onto my local system and ran the following command to browse the recorded Spark Web UI:
sbin/start-history-server.sh ~/Downloads/log/app-20170416171823-0000
But the history server terminated with the following error:
failed to launch: nice -n 0 /usr/local/Cellar/apache-spark/2.1.0/libexec/bin/spark-class org.apache.spark.deploy.history.HistoryServer /Users/sk/Downloads/log/app-20170416171823-0000 at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:77) ... 6 more full log in /usr/local/Cellar/apache-spark/2.1.0/libexec/logs/spark-sk-org.apache.spark.deploy.history.HistoryServer-1-Sk-MacBook-Pro.local.out
Contents of the output of the history server:
17/04/16 17:44:52 INFO SecurityManager: Changing modify acls groups to:
17/04/16 17:44:52 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(skaran); groups with view permissions: Set(); users with modify permissions: Set(skaran); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:278)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: java.lang.IllegalArgumentException: Logging directory specified is not a directory: file:/Users/sk/Downloads/log/app-20170416171823-0000
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$startPolling(FsHistoryProvider.scala:198)
at org.apache.spark.deploy.history.FsHistoryProvider.initialize(FsHistoryProvider.scala:153)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:149)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:77)
... 6 more
It seems that the argument should be the folder containing the log, not the log file itself.
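For example (a sketch based on the paths in the question), point the history server at the parent directory instead of the single application log:

sbin/start-history-server.sh ~/Downloads/log
# or, equivalently, set in conf/spark-defaults.conf:
# spark.history.fs.logDirectory file:/Users/sk/Downloads/log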
I am trying to deploy Spark 2.0 Streaming on Amazon EMR 5.0.
It seems that the application gets stuck in an endless loop, logging "INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)" repeatedly, and then exits.
Here is how I am trying to submit it from the command line:
aws emr add-steps --cluster-id --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--class,,s3://.jar]
Any ideas?
Thanks,
Eran
16/08/30 15:43:27 INFO SecurityManager: Changing view acls to: hadoop
16/08/30 15:43:27 INFO SecurityManager: Changing modify acls to: hadoop
16/08/30 15:43:27 INFO SecurityManager: Changing view acls groups to:
16/08/30 15:43:27 INFO SecurityManager: Changing modify acls groups to:
16/08/30 15:43:27 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set()
16/08/30 15:43:27 INFO Client: Submitting application application_14111979683_1111 to ResourceManager
16/08/30 15:43:27 INFO YarnClientImpl: Submitted application application_14111979683_1111
16/08/30 15:43:28 INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)
16/08/30 15:43:28 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1472571807467
final status: UNDEFINED
tracking URL: http://xxxxxx:20888/proxy/application_14111979683_1111/
user: hadoop
16/08/30 15:43:29 INFO Client: Application report for application_14111979683_1111 (state: ACCEPTED)
and this is the exception thrown:
16/08/31 08:14:48 INFO Client:
client token: N/A
diagnostics: Application application_1472630652740_0001 failed 2 times due to AM Container for appattempt_1472630652740_0001_000002 exited with exitCode: 13
For more detailed output, check application tracking page:http://ip-10-0-0-8.eu-west-1.compute.internal:8088/cluster/app/application_1472630652740_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1472630652740_0001_02_000001
Exit code: 13
Stack trace: ExitCodeException exitCode=13:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
at org.apache.hadoop.util.Shell.run(Shell.java:456)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
EMR is actually a wrapper around YARN, so we need to add "--master yarn" as an argument to the deployment command line.
Example:
aws emr add-steps --cluster-id j-XXXXXXXXX --steps Type=Spark,Name="Spark Program",ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--class,com.xxx.MyMainClass,s3://]
Another thing that is needed is removing sparkConf.setMaster("local[*]") from the initialization of the Spark conf.
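As a sketch (shown in PySpark; the original job may well be Scala or Java, where the same idea applies), the conf initialization should leave the master to be supplied by spark-submit/EMR:

from pyspark import SparkConf, SparkContext

# Do not call .setMaster("local[*]") here; let spark-submit / EMR supply --master yarn.
conf = SparkConf().setAppName("MyStreamingApp")  # app name is a placeholder
sc = SparkContext(conf=conf)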