How to start Spark History Server pointing to Google Cloud?

I'm using the following command to start the server:
sh /usr/local/Cellar/apache-spark/3.2.1/libexec/sbin/start-history-server.sh --properties-file /usr/local/Cellar/apache-spark/3.2.1/custom-configs/cloud.properties
cloud.properties file contents:
spark.history.fs.logDirectory=gs://bucket/spark-history-server
I got the following error:
Spark Command: /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home/bin/java -cp /usr/local/Cellar/apache-spark/3.2.1/libexec/conf/:/usr/local/Cellar/apache-spark/3.2.1/libexec/jars/* -Xmx1g org.apache.spark.deploy.history.HistoryServer --properties-file /usr/local/Cellar/apache-spark/3.2.1/custom-configs/cloud.properties
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/05/03 00:58:50 INFO HistoryServer: Started daemon with process name: XXXX
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for TERM
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for HUP
22/05/03 00:58:50 INFO SignalUtils: Registering signal handler for INT
22/05/03 00:58:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/05/03 00:58:50 INFO SecurityManager: Changing view acls to: xxx
22/05/03 00:58:50 INFO SecurityManager: Changing modify acls to: xxx
22/05/03 00:58:50 INFO SecurityManager: Changing view acls groups to:
22/05/03 00:58:50 INFO SecurityManager: Changing modify acls groups to:
22/05/03 00:58:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xxx); groups with view permissions: Set(); users with modify permissions: Set(xxx); groups with modify permissions: Set()
22/05/03 00:58:50 INFO FsHistoryProvider: History server ui acls disabled; users with admin permissions: ; groups with admin permissions:
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:305)
at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala)
Caused by: org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:3443)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3466)
at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:116)
at org.apache.spark.deploy.history.FsHistoryProvider.<init>(FsHistoryProvider.scala:88)
... 6 more
How do I set the FileSystem configuration here?
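The "No FileSystem for scheme "gs"" error means the History Server's JVM has no handler registered for gs:// URLs, i.e. the GCS connector jar is not on its classpath. A minimal sketch of the usual fix (jar name, download source, and keyfile path below are assumptions, not taken from the question):
# 1. Put the shaded gcs-connector jar (from GoogleCloudDataproc/hadoop-connectors)
#    where the daemon can see it, e.g. Spark's jars/ directory
cp gcs-connector-hadoop3-latest.jar /usr/local/Cellar/apache-spark/3.2.1/libexec/jars/
# 2. Register the gs:// filesystem and credentials in cloud.properties
spark.history.fs.logDirectory=gs://bucket/spark-history-server
spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS
spark.hadoop.google.cloud.auth.service.account.json.keyfile=/path/to/service-account.json
Then restart the History Server with the same --properties-file flag as before.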

Related

Apache Spark on IPv6

I am trying to run Spark on IPv6. The spark-master comes up on the IPv6 DNS hostname, but the spark worker node doesn't start; whether I pass an IP or a DNS name, the error is the same. I have to set LOCAL_WORKER_IP=127.0.0.1 to make the spark worker start.
SPARK Worker log:
fd74:ca9b:3a09:868c:172:18:0:462a spark-master.t253-u000265.svc.cluster.local
21/08/09 16:41:40 INFO Worker: Started daemon with process name: 10@v4-virtio-spark-worker-zjgxs
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for TERM
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for HUP
21/08/09 16:41:40 INFO SignalUtils: Registered signal handler for INT
21/08/09 16:41:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
21/08/09 16:41:41 INFO SecurityManager: Changing view acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls to: root
21/08/09 16:41:41 INFO SecurityManager: Changing view acls groups to:
21/08/09 16:41:41 INFO SecurityManager: Changing modify acls groups to:
21/08/09 16:41:41 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/08/09 16:41:42 INFO Utils: Successfully started service 'sparkWorker' on port 37816
21/08/09 16:41:42 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.lang.AssertionError: assertion failed: Expected hostname (not IP) but got fd74:ca9b:3a09:868c:172:18:0:488e
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.util.Utils$.checkHost(Utils.scala:1014)
at org.apache.spark.deploy.worker.Worker.<init>(Worker.scala:60)
at org.apache.spark.deploy.worker.Worker$.startRpcEnvAndEndpoint(Worker.scala:811)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:779)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
21/08/09 16:41:42 INFO ShutdownHookManager: Shutdown hook called
Does anyone know whether the Spark worker can be configured for IPv6 through any settings?
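One avenue worth trying: the assertion comes from Spark's Utils.checkHost, which rejects raw IP literals where a hostname is expected. A sketch, assuming the pods can resolve each other by DNS (the hostname below is a placeholder):
# conf/spark-env.sh on the worker
export SPARK_LOCAL_HOSTNAME=spark-worker.t253-u000265.svc.cluster.local
export SPARK_WORKER_OPTS="-Djava.net.preferIPv6Addresses=true"
SPARK_LOCAL_HOSTNAME makes the worker identify itself by name rather than by the fd74:... address that trips the assertion; the JVM flag makes name resolution prefer IPv6 addresses.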

Network error log on spark docker(bitnami/spark) cluster

Server 1 : Master, Slave Node
Server 2 : Slave Node
Server 3 : Slave Node
When I run the pi.py example against the master node, many executors exit with code 1. The worker node log shows the same error, as below. I don't know the exact reason; could you give me some advice?
20/03/12 13:21:54 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 7571@803acbaf5fbf
20/03/12 13:21:54 INFO SignalUtils: Registered signal handler for TERM
20/03/12 13:21:54 INFO SignalUtils: Registered signal handler for HUP
20/03/12 13:21:54 INFO SignalUtils: Registered signal handler for INT
20/03/12 13:21:55 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/12 13:21:55 INFO SecurityManager: Changing view acls to: spark,root
20/03/12 13:21:55 INFO SecurityManager: Changing modify acls to: spark,root
20/03/12 13:21:55 INFO SecurityManager: Changing view acls groups to:
20/03/12 13:21:55 INFO SecurityManager: Changing modify acls groups to:
20/03/12 13:21:55 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:285)
at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.io.IOException: Failed to connect to 67f75f899bac:43487
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 67f75f899bac
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
at io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
at io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:202)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:48)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:182)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:168)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:577)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:551)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:490)
at io.netty.util.concurrent.DefaultPromise.setValue0(DefaultPromise.java:615)
at io.netty.util.concurrent.DefaultPromise.setSuccess0(DefaultPromise.java:604)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:84)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:985)
at io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:505)
at io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:416)
at io.netty.channel.AbstractChannel$AbstractUnsafe$1.run(AbstractChannel.java:475)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
This means that the worker nodes are not able to reach the master node. Because you are running Spark inside Docker, the worker and master containers must be able to communicate with each other. Update /etc/hosts on all nodes with the correct IP addresses.
You can also update the Docker host's /etc/hosts and mount it as a volume with -v /etc/hosts:/etc/hosts:rw inside the container.
Add --network host to your run command so the containers share the Docker host's network, and change the hostname of the containers.
Add -e SPARK_MASTER_URL=spark://YOUR_HOST:7077 to your run command, as in the sketch below.
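A minimal sketch combining those suggestions (the image tag and YOUR_HOST are placeholders, not taken from the question):
# master, on server 1
docker run -d --network host \
  -v /etc/hosts:/etc/hosts:rw \
  -e SPARK_MODE=master \
  bitnami/spark:latest
# workers, on servers 1, 2 and 3
docker run -d --network host \
  -v /etc/hosts:/etc/hosts:rw \
  -e SPARK_MODE=worker \
  -e SPARK_MASTER_URL=spark://YOUR_HOST:7077 \
  bitnami/spark:latest
With host networking the executor advertises the host's name instead of a container ID like 67f75f899bac, which is exactly the name the driver failed to resolve.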
I found this thread with a similar problem.
Could you check your ACLs?

Spark on Yarn error : Yarn application has already ended! It might have been killed or unable to launch application master

While starting spark-shell --master yarn --deploy-mode client, I am getting the error:
Yarn application has already ended! It might have been killed or
unable to launch application master.
Here is the complete log from Yarn:
19/08/28 00:54:55 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
Container: container_1566921956926_0010_01_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:523
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:5597
Log Contents:
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:19 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:19 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:20 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:21 INFO ApplicationMaster:54 - ApplicationAttemptId: appattempt_1566921956926_0010_000001
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Waiting for Spark driver to be reachable.
2019-08-28 00:46:21 INFO ApplicationMaster:54 - Driver now available: rhel7-cloudera-dev:34872
2019-08-28 00:46:21 INFO TransportClientFactory:267 - Successfully created connection to rhel7-cloudera-dev/192.168.56.112:34872 after 107 ms (0 ms spent in bootstraps)
2019-08-28 00:46:22 INFO ApplicationMaster:54 -
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/*<CPS>$HADOOP_COMMON_HOME/share/hadoop/common/lib/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/*<CPS>$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/*<CPS>$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>/etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/*:/etc/hadoop-2.6.0/share/hadoop/common/*:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/etc/hadoop-2.6.0/share/hadoop/hdfs/*:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop-2.6.0/share/hadoop/yarn/*:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/etc/hadoop-2.6.0/share/hadoop/mapreduce/*:/etc/hadoop-2.6.0/contrib/capacity-scheduler/*.jar<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_DIST_CLASSPATH -> /etc/hadoop-2.6.0/etc/hadoop:/etc/hadoop-2.6.0/share/hadoop/common/lib/*:/etc/hadoop-2.6.0/share/hadoop/common/*:/etc/hadoop-2.6.0/share/hadoop/hdfs:/etc/hadoop-2.6.0/share/hadoop/hdfs/lib/*:/etc/hadoop-2.6.0/share/hadoop/hdfs/*:/etc/hadoop-2.6.0/share/hadoop/yarn/lib/*:/etc/hadoop-2.6.0/share/hadoop/yarn/*:/etc/hadoop-2.6.0/share/hadoop/mapreduce/lib/*:/etc/hadoop-2.6.0/share/hadoop/mapreduce/*:/etc/hadoop-2.6.0/contrib/capacity-scheduler/*.jar
SPARK_YARN_STAGING_DIR -> *********(redacted)
SPARK_USER -> *********(redacted)
SPARK_CONF_DIR -> /etc/spark/conf
SPARK_HOME -> /etc/spark
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx1024m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=34872' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@rhel7-cloudera-dev:34872 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1566921956926_0010 \
--user-class-path \
file:$PWD/__app__.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
__spark_libs__ -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/__spark_libs__5634501618166443611.zip" } size: 232107209 timestamp: 1566933362350 type: ARCHIVE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "hdfs" host: "rhel7-cloudera-dev" port: 9000 file: "/user/rhel/.sparkStaging/application_1566921956926_0010/__spark_conf__.zip" } size: 208377 timestamp: 1566933365411 type: ARCHIVE visibility: PRIVATE
===============================================================================
2019-08-28 00:46:22 INFO RMProxy:98 - Connecting to ResourceManager at /0.0.0.0:8030
2019-08-28 00:46:22 INFO YarnRMClient:54 - Registering the ApplicationMaster
2019-08-28 00:46:22 INFO YarnAllocator:54 - Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
2019-08-28 00:46:22 INFO YarnAllocator:54 - Submitted 2 unlocalized container requests.
2019-08-28 00:46:22 INFO ApplicationMaster:54 - Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
2019-08-28 00:46:22 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM
2019-08-28 00:46:23 INFO ApplicationMaster:54 - Final app status: UNDEFINED, exitCode: 16, (reason: Shutdown hook called before final status was reported.)
2019-08-28 00:46:23 INFO ShutdownHookManager:54 - Shutdown hook called
Container: container_1566921956926_0010_02_000001 on rhel7-cloudera-dev_33917
===============================================================================
LogType:stderr
Log Upload Time:28-Aug-2019 00:46:31
LogLength:3576
Log Contents:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/yarn/local/usercache/rhel/filecache/26/__spark_libs__5634501618166443611.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/etc/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException; Host Details : local host is: "rhel7-cloudera-dev/192.168.56.112"; destination host is: "rhel7-cloudera-dev":9000;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:772)
at org.apache.hadoop.ipc.Client.call(Client.java:1474)
at org.apache.hadoop.ipc.Client.call(Client.java:1401)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1977)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1118)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:235)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7$$anonfun$apply$3.apply(ApplicationMaster.scala:232)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:232)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$7.apply(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:800)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:799)
at org.apache.spark.deploy.yarn.ApplicationMaster.<init>(ApplicationMaster.scala:197)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:823)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:854)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
Caused by: java.io.IOException
at org.apache.hadoop.ipc.Client$Connection.waitForWork(Client.java:935)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:967)
Caused by: java.lang.InterruptedException
... 2 more
LogType:stdout
Log Upload Time:28-Aug-2019 00:46:31
LogLength:975
Log Contents:
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for TERM
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for HUP
2019-08-28 00:46:26 INFO SignalUtils:54 - Registered signal handler for INT
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls to: yarn,rhel
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing view acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - Changing modify acls groups to:
2019-08-28 00:46:27 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, rhel); groups with view permissions: Set(); users with modify permissions: Set(yarn, rhel); groups with modify permissions: Set()
2019-08-28 00:46:28 INFO ApplicationMaster:54 - Preparing Local resources
2019-08-28 00:46:28 ERROR ApplicationMaster:43 - RECEIVED SIGNAL TERM
Any suggestions to resolve this issue?
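Nothing in the log pins down the root cause, but the second attempt dies while reaching HDFS at "rhel7-cloudera-dev":9000 from that same machine, so a hostname/IP mismatch is a plausible first thing to rule out (the expected mapping below is an assumption based on the IP shown in the log):
# on rhel7-cloudera-dev: the hostname should map to the real interface,
# not only to a loopback entry
grep rhel7-cloudera-dev /etc/hosts
# expected along the lines of:
# 192.168.56.112   rhel7-cloudera-dev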

spark-submit: unable to get driver status

I'm running a job on a test Spark standalone in cluster mode, but I'm finding myself unable to monitor the status of the driver.
Here is a minimal example using spark-2.4.3 (master and one worker running on the same node, started by running sbin/start-all.sh on a freshly unarchived installation with the default conf, no conf/slaves set), executing spark-submit from the node itself:
$ spark-submit --master spark://ip-172-31-15-245:7077 --deploy-mode cluster \
--class org.apache.spark.examples.SparkPi \
/home/ubuntu/spark/examples/jars/spark-examples_2.11-2.4.3.jar 100
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.NativeCodeLoader).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/06/27 09:08:28 INFO SecurityManager: Changing view acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls to: ubuntu
19/06/27 09:08:28 INFO SecurityManager: Changing view acls groups to:
19/06/27 09:08:28 INFO SecurityManager: Changing modify acls groups to:
19/06/27 09:08:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(ubuntu); groups with view permissions: Set(); users with modify permissions: Set(ubuntu); groups with modify permissions: Set()
19/06/27 09:08:28 INFO Utils: Successfully started service 'driverClient' on port 36067.
19/06/27 09:08:28 INFO TransportClientFactory: Successfully created connection to ip-172-31-15-245/172.31.15.245:7077 after 29 ms (0 ms spent in bootstraps)
19/06/27 09:08:28 INFO ClientEndpoint: Driver successfully submitted as driver-20190627090828-0008
19/06/27 09:08:28 INFO ClientEndpoint: ... waiting before polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: ... polling master for driver state
19/06/27 09:08:33 INFO ClientEndpoint: State of driver-20190627090828-0008 is RUNNING
19/06/27 09:08:33 INFO ClientEndpoint: Driver running on 172.31.15.245:41057 (worker-20190627083412-172.31.15.245-41057)
19/06/27 09:08:33 INFO ShutdownHookManager: Shutdown hook called
19/06/27 09:08:33 INFO ShutdownHookManager: Deleting directory /tmp/spark-34082661-f0de-4c56-92b7-648ea24fa59c
> spark-submit --master spark://ip-172-31-15-245:7077 --status driver-20190627090828-0008
19/06/27 09:09:27 WARN RestSubmissionClient: Unable to connect to server spark://ip-172-31-15-245:7077.
Exception in thread "main" org.apache.spark.deploy.rest.SubmitRestConnectionException: Unable to connect to server
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:165)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:148)
at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732)
at org.apache.spark.deploy.rest.RestSubmissionClient.requestSubmissionStatus(RestSubmissionClient.scala:148)
at org.apache.spark.deploy.SparkSubmit.requestStatus(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:88)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.spark.deploy.rest.SubmitRestConnectionException: No response from server
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:285)
at org.apache.spark.deploy.rest.RestSubmissionClient.org$apache$spark$deploy$rest$RestSubmissionClient$$get(RestSubmissionClient.scala:195)
at org.apache.spark.deploy.rest.RestSubmissionClient$$anonfun$requestSubmissionStatus$3.apply(RestSubmissionClient.scala:152)
... 11 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at org.apache.spark.deploy.rest.RestSubmissionClient.readResponse(RestSubmissionClient.scala:278)
... 13 more
Spark is in good health (I'm able to run other jobs after the one above), and driver-20190627090828-0008 appears as "FINISHED" in the web UI.
Is there something I'm missing?
UPDATE:
on the master log all I get is
19/07/01 09:40:24 INFO master.Master: 172.31.15.245:42308 got disassociated, removing it.
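Possibly relevant: in standalone mode --status goes through the REST submission gateway, which listens on port 6066 rather than 7077 and is disabled by default since Spark 2.4.0. A sketch, assuming the default REST port:
# conf/spark-defaults.conf on the master, then restart the master
spark.master.rest.enabled  true
# poll the REST port instead of the legacy RPC port
spark-submit --master spark://ip-172-31-15-245:6066 --status driver-20190627090828-0008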

Spark fails to register multiple workers to master

I have been working on creating a Spark cluster using 1 master and 4 workers on Linux.
It works fine with one worker. When I try to add more than one worker, only the first worker gets registered to the master, while the rest fail with the error below:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/08/06 14:17:39 INFO Worker: Started daemon with process name: 24104@barracuda5
18/08/06 14:17:39 INFO SignalUtils: Registered signal handler for TERM
18/08/06 14:17:39 INFO SignalUtils: Registered signal handler for HUP
18/08/06 14:17:39 INFO SignalUtils: Registered signal handler for INT
18/08/06 14:17:39 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/08/06 14:17:39 INFO SecurityManager: Changing view acls to: barracuda5
18/08/06 14:17:39 INFO SecurityManager: Changing modify acls to: barracuda5
18/08/06 14:17:39 INFO SecurityManager: Changing view acls groups to:
18/08/06 14:17:39 INFO SecurityManager: Changing modify acls groups to:
18/08/06 14:17:39 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(barracuda5); groups with view permissions: Set(); users with modify permissions: Set(barracuda5); groups with modify permissions: Set()
18/08/06 14:17:40 INFO Utils: Successfully started service 'sparkWorker' on port 46635.
18/08/06 14:17:40 INFO Worker: Starting Spark worker 10.0.6.6:46635 with 4 cores, 14.7 GB RAM
18/08/06 14:17:40 INFO Worker: Running Spark version 2.1.0
18/08/06 14:17:40 INFO Worker: Spark home: /usr/lib/spark/spark-2.1.0-bin-hadoop2.7
18/08/06 14:17:40 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
18/08/06 14:17:40 INFO WorkerWebUI: Bound WorkerWebUI to 0.0.0.0, and started at http://10.0.6.6:8081
18/08/06 14:17:40 INFO Worker: Connecting to master Cudatest.533gwuzexxzehbkoeqpn4rgs4d.ux.internal.cloudapp.net:7077...
18/08/06 14:17:40 WARN Worker: Failed to connect to master Cudatest.533gwuzexxzehbkoeqpn4rgs4d.ux.internal.cloudapp.net:7077
org.apache.spark.SparkException: Exception thrown in awaitResult
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.worker.Worker$$anonfun$org$apache$spark$deploy$worker$Worker$$tryRegisterAllMasters$1$$anon$1.run(Worker.scala:218)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed to connect to Cudatest.533gwuzexxzehbkoeqpn4rgs4d.ux.internal.cloudapp.net:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
... 4 more
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:205)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1226)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:550)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:535)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:517)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:970)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:215)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:408)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:455)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:140)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
Let me know if I have missed something here, or if anyone knows what the solution might be.
Thanks
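The final "Caused by: java.nio.channels.UnresolvedAddressException" says the failing workers cannot resolve the master's hostname at all. A sketch of the usual fix (the 10.0.6.4 address is a placeholder for your master's private IP):
# on each worker that fails to register
echo "10.0.6.4 Cudatest.533gwuzexxzehbkoeqpn4rgs4d.ux.internal.cloudapp.net" | sudo tee -a /etc/hosts
# or bypass DNS and register against the master's IP directly (Spark 2.1)
./sbin/start-slave.sh spark://10.0.6.4:7077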
