I have a four-node Hadoop/Spark cluster running in AWS. I can submit and run jobs perfectly in local mode:
spark-submit --master local[*] myscript.py
But when I attempt to run the script in cluster mode, it fails. I'm just trying the cluster equivalent of "hello world":
spark-submit spark-yarn.py
The script is the one that was recommended:
from pyspark import SparkConf
from pyspark import SparkContext
conf = SparkConf()
conf.setMaster('yarn')
conf.setAppName('spark-yarn')
sc = SparkContext(conf=conf)
def mod(x):
    import numpy as np
    return (x, np.mod(x, 2))
rdd = sc.parallelize(range(1000)).map(mod).take(10)
print(rdd)
I've spent days looking at every log I can find and reading everything I can online, but nothing has helped me get to the root of why it's not working. Before I tear down all the servers and start over, I'm hoping someone can point me in the right direction to get this working.
Here's the output in the terminal:
20/02/25 12:59:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/25 13:00:11 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
20/02/25 13:00:11 ERROR YarnClientSchedulerBackend: Diagnostics message: Application application_1582603840719_0002 failed 2 times due to AM Container for appattempt_1582603840719_0002_000002 exited with exitCode: -103
Failing this attempt.Diagnostics: [2020-02-25 13:00:11.601]Container [pid=3124,containerID=container_1582603840719_0002_02_000001] is running beyond virtual memory limits. Current usage: 328.7 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1582603840719_0002_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3128 3124 3124 3124 (java) 504 34 2359349248 83396 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg ip-172-31-7-96.ec2.internal:43275 --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/__spark_conf__/__spark_conf__.properties
|- 3124 3122 3124 3124 (bash) 0 0 13635584 760 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'ip-172-31-7-96.ec2.internal:43275' --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0002/container_1582603840719_0002_02_000001/__spark_conf__/__spark_conf__.properties 1> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001/stdout 2> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0002/container_1582603840719_0002_02_000001/stderr
[2020-02-25 13:00:11.651]Container killed on request. Exit code is 143
[2020-02-25 13:00:11.658]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0002 Then click on links to logs of each attempt.
. Failing the application.
20/02/25 13:00:11 ERROR TransportClient: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:11 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1316)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:11 ERROR Utils: Uncaught exception in thread YARN application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:574)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:98)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:164)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:653)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2042)
at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
Caused by: java.io.IOException: Failed to send RPC RPC 6867152665638655473 to /172.31.9.94:57526: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:122)
at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:987)
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:869)
at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1316)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:00:12 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Traceback (most recent call last):
File "/home/ubuntu/server/spark-yarn.py", line 7, in <module>
sc = SparkContext(conf=conf)
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
If I change 'yarn' to 'yarn-client' as the master, it gives a slightly different error:
20/02/25 13:07:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/02/25 13:07:46 ERROR TransportClient: Failed to send RPC RPC 5381013595535555066 to /172.31.5.228:39748: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:46 ERROR YarnScheduler: Lost executor 1 on ip-172-31-5-228.ec2.internal: Slave lost
20/02/25 13:07:51 ERROR YarnClientSchedulerBackend: YARN application has exited unexpectedly with state FAILED! Check the YARN application logs for more details.
20/02/25 13:07:51 ERROR YarnClientSchedulerBackend: Diagnostics message: Application application_1582603840719_0003 failed 2 times due to AM Container for appattempt_1582603840719_0003_000002 exited with exitCode: -103
Failing this attempt.Diagnostics: [2020-02-25 13:07:51.067]Container [pid=3223,containerID=container_1582603840719_0003_02_000001] is running beyond virtual memory limits. Current usage: 320.8 MB of 1 GB physical memory used; 2.2 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1582603840719_0003_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3227 3223 3223 3223 (java) 489 32 2355855360 81352 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg ip-172-31-7-96.ec2.internal:40963 --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/__spark_conf__/__spark_conf__.properties
|- 3223 3221 3223 3223 (bash) 0 0 13635584 767 /bin/bash -c /usr/lib/jvm/java-8-openjdk-amd64/bin/java -server -Xmx512m -Djava.io.tmpdir=/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/tmp -Dspark.yarn.app.container.log.dir=/home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001 org.apache.spark.deploy.yarn.ExecutorLauncher --arg 'ip-172-31-7-96.ec2.internal:40963' --properties-file /tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1582603840719_0003/container_1582603840719_0003_02_000001/__spark_conf__/__spark_conf__.properties 1> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001/stdout 2> /home/ubuntu/server/hadoop-2.9.2/logs/userlogs/application_1582603840719_0003/container_1582603840719_0003_02_000001/stderr
[2020-02-25 13:07:51.089]Container killed on request. Exit code is 143
[2020-02-25 13:07:51.090]Container exited with a non-zero exit code 143.
For more detailed output, check the application tracking page: http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0003 Then click on links to logs of each attempt.
. Failing the application.
20/02/25 13:07:51 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
20/02/25 13:07:52 ERROR TransportClient: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:52 ERROR YarnSchedulerBackend$YarnSchedulerEndpoint: Sending RequestExecutors(0,0,Map(),Set()) to AM was unsuccessful
java.io.IOException: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
20/02/25 13:07:52 ERROR Utils: Uncaught exception in thread YARN application state monitor
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:574)
at org.apache.spark.scheduler.cluster.YarnSchedulerBackend.stop(YarnSchedulerBackend.scala:98)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.stop(YarnClientSchedulerBackend.scala:164)
at org.apache.spark.scheduler.TaskSchedulerImpl.stop(TaskSchedulerImpl.scala:653)
at org.apache.spark.scheduler.DAGScheduler.stop(DAGScheduler.scala:2042)
at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1949)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1948)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:121)
Caused by: java.io.IOException: Failed to send RPC RPC 8397804982944513692 to /172.31.0.102:39468: java.nio.channels.ClosedChannelException
at org.apache.spark.network.client.TransportClient$RpcChannelListener.handleFailure(TransportClient.java:362)
at org.apache.spark.network.client.TransportClient$StdChannelListener.operationComplete(TransportClient.java:339)
at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
Traceback (most recent call last):
File "/home/ubuntu/server/spark-yarn.py", line 7, in <module>
sc = SparkContext(conf=conf)
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 136, in __init__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 198, in _do_init
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/pyspark.zip/pyspark/context.py", line 306, in _initialize_context
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
File "/home/ubuntu/server/spark-2.4.4-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.IllegalStateException: Spark context stopped while waiting for backend
at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:818)
at org.apache.spark.scheduler.TaskSchedulerImpl.postStartHook(TaskSchedulerImpl.scala:196)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:560)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
It mentions checking the logs at:
http://ec2-34-200-223-235.compute-1.amazonaws.com:8088/cluster/app/application_1582603840719_0003
But clicking on any of the log links on that page gives an error:
Firefox can’t establish a connection to the server at ip-172-31-0-102.ec2.internal:8042.
(That's probably unrelated.)
Grepping for warnings, I see the following:
2020-02-25 13:07:38,904 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The specific max attempts: 0 for application: 3 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2020-02-25 13:07:51,241 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=ubuntu OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1582603840719_0003 failed 2 times due to AM Container for appattempt_1582603840719_0003_000002 exited with exitCode: -103
2020-02-25 13:07:40,367 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint done. New Image Size: 12086
No errors were generated.
Querying YARN from the command line shows me the jobs:
ubuntu@ip-172-31-7-96:~/server$ yarn application -list -appStates ALL
20/02/25 13:31:44 INFO client.RMProxy: Connecting to ResourceManager at ec2-34-200-223-235.compute-1.amazonaws.com/172.31.7.96:8032
Total number of applications (application-types: [], states: [NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED] and tags: []):3
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1582603840719_0001 spark-yarn SPARK ubuntu default FINISHED UNDEFINED 100% N/A
But asking for the logs fails:
ubuntu@ip-172-31-7-96:~/server$ yarn logs -applicationId application_1582603840719_0001
20/02/25 13:32:48 INFO client.RMProxy: Connecting to ResourceManager at ec2-34-200-223-235.compute-1.amazonaws.com/172.31.7.96:8032
fs.AbstractFileSystem.ec2-34-200-223-235.compute-1.amazonaws.com.impl=null: No AbstractFileSystem configured for scheme: ec2-34-200-223-235.compute-1.amazonaws.com
Can not find any log file matching the pattern: [ALL] for the application: application_1582603840719_0001
Can not find the logs for the application: application_1582603840719_0001 with the appOwner: ubuntu
Again, if someone could direct me to next trouble-shooting steps, I'd really appreciate it. I've spent days on this and don't seem to be making progress.
Two changes ended up solving this issue.
First, I added the following properties to the yarn-site.xml file on every node:
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
Next, I changed my spark-submit command to include the following options, giving the driver and executors more memory:
spark-submit --master yarn \
--deploy-mode client \
--driver-memory 6g \
--executor-memory 6g \
--executor-cores 2 \
--num-executors 10 \
my_app.py
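For reference, the executor settings can also live in the script's SparkConf instead of on the command line, the same way the original spark-yarn.py sets the master. This is only a sketch using the same example values as above (size them to what your NodeManagers can actually allocate); note that spark.driver.memory generally has to stay on the spark-submit line in client mode, because the driver JVM is already running by the time the conf is read:
from pyspark import SparkConf, SparkContext

conf = SparkConf()
conf.setMaster('yarn')
conf.setAppName('spark-yarn')
# Mirrors of the --executor-* flags above; example values, not cluster-verified.
conf.set('spark.executor.memory', '6g')
conf.set('spark.executor.cores', '2')
conf.set('spark.executor.instances', '10')

sc = SparkContext(conf=conf)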
The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
19/05/17 10:11:06 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
19/05/17 10:11:06 WARN MetricsSystem: Stopping a MetricsSystem that is not running
19/05/17 10:11:06 WARN SparkContext: Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:238)
py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
py4j.GatewayConnection.run(GatewayConnection.java:238)
java.lang.Thread.run(Thread.java:748)
19/05/17 10:11:06 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
19/05/17 10:11:10 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
19/05/17 10:11:10 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1558064260263_0002 failed 2 times due to AM Container for appattempt_1558064260263_0002_000002 exited with exitCode: -1000
Failing this attempt.Diagnostics: [2019-05-17 10:11:09.626]File file:/home/hadoop/.sparkStaging/application_1558064260263_0002/pyspark.zip does not exist
Add these lines to your .bashrc:
function snotebook ()
{
    # Spark path (based on your computer)
    SPARK_PATH=$SPARK_HOME
    export PYSPARK_DRIVER_PYTHON="jupyter"
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
    # For Python 3 users, you have to add the line below or you will get an error
    export PYSPARK_PYTHON=/home/anaconda3/bin/python
    $SPARK_PATH/bin/pyspark --master yarn
}
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
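If it's unclear whether those variables are actually visible to the shell that runs spark-submit, a quick sanity check (just a diagnostic sketch, not part of the fix itself) is:
import os

# The YARN client needs these to locate the ResourceManager and HDFS; if they
# are unset, staging files such as pyspark.zip can end up on the local
# filesystem, which matches the "does not exist" error above.
for var in ('HADOOP_CONF_DIR', 'YARN_CONF_DIR'):
    print(var, '=', os.environ.get(var, '<not set>'))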
When I run the following Python program
from pyspark.ml.classification import LinearSVC
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Sparkmodel").getOrCreate()
data = spark.read.format("libsvm").load("/usr/local/spark/data/mllib/sample_libsvm_data.txt")
model = LinearSVC().fit(data)
model.save("mymodel")
LinearSVC.load("mymodel")
the load fails with a "java.lang.NoSuchMethodException".
/anaconda3/envs/scratch/bin/python /Users/billmcn/src/toy/sparkmodel/sparkmodel/little.py
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/11/12 13:23:05 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/12 13:23:06 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
17/11/12 13:23:06 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
17/11/12 13:23:17 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
17/11/12 13:23:17 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
Traceback (most recent call last):
File "/Users/billmcn/src/toy/sparkmodel/sparkmodel/little.py", line 9, in <module>
LinearSVC.load("mymodel")
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/ml/util.py", line 257, in load
return cls.read().load(path)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/ml/util.py", line 197, in load
java_obj = self._jread.load(path)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/pyspark/sql/utils.py", line 63, in deco
return f(*a, **kw)
File "/anaconda3/envs/scratch/lib/python3.6/site-packages/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o64.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.classification.LinearSVCModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Process finished with exit code 1
The "mymodel" directory is created and its contents appear to be valid.
I am running Spark 2.2.0 and pyspark 2.2.0. I have the following mllib jars in my installation.
> ll /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib*
-rw-r--r--@ 1 billmcn admin 6501535 Jun 30 18:09 /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib_2.11-2.2.0.jar
-rw-r--r--@ 1 billmcn admin 182887 Jun 30 18:09 /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib-local_2.11-2.2.0.jar
And spark-mllib_2.11-2.2.0.jar contains the class I want:
jar tf /usr/local/spark.versions/spark-2.2.0-bin-hadoop2.7/jars/spark-mllib_2.11-2.2.0.jar | grep LinearSVCModel
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$Data.class
org/apache/spark/ml/classification/LinearSVCModel$.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCReader.class
org/apache/spark/ml/classification/LinearSVCModel$$anonfun$11.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$$typecreator1$1.class
org/apache/spark/ml/classification/LinearSVCModel$LinearSVCWriter$Data$.class
org/apache/spark/ml/classification/LinearSVCModel.class
The same problem happens on two different machines.
What am I doing wrong?
I was using the wrong class to load the model. The following works:
from pyspark.ml.classification import LinearSVCModel

model = LinearSVCModel.load(model_path)
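For context, here is the question's example with that one change applied, as a minimal sketch (same paths and app name as the question, not re-verified on that exact setup):
from pyspark.ml.classification import LinearSVC, LinearSVCModel
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Sparkmodel").getOrCreate()
data = spark.read.format("libsvm").load("/usr/local/spark/data/mllib/sample_libsvm_data.txt")

# Fit with the estimator (LinearSVC), but reload with the fitted-model class.
model = LinearSVC().fit(data)
model.save("mymodel")
reloaded = LinearSVCModel.load("mymodel")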
This looks like a version mismatch. The most likely scenario is:
Your Python (PySpark) installation uses Spark 2.2,
while the JVM jars were compiled against an earlier Spark version that didn't include LinearSVCModel.
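One quick way to see which versions the two sides actually report (a small diagnostic sketch, not part of the original answer):
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("version-check").getOrCreate()
print("PySpark (Python side):", pyspark.__version__)
print("Spark (JVM side):", spark.sparkContext.version)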
I am running a Kafka server. (When I use the command bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning, it gives me all my data in the topic.)
When I want to test the example JavaDirectKafkaWordCount in Spark in order to understand how it works, I get the following error:
$ ./run-example streaming.JavaDirectKafkaWordCount localhost:2181 test
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/08/17 11:19:33 INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
16/08/17 11:19:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/17 11:19:33 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.66.212.132 instead (on interface enp5s0)
16/08/17 11:19:33 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Exception in thread "main" org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$checkErrors$1.apply(KafkaCluster.scala:366)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaCluster$.checkErrors(KafkaCluster.scala:365)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:222)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:607)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I would like to know what the error means and how I could possibly solve it.
Thank you very much for your attention and your help.
Joe
I have a Hortonworks Sandbox 2.4 with Spark 1.6 set up. Then I created an IntelliJ Spark development environment on Windows using the HDP Spark jar and Scala 2.10.5, so both the Spark and Scala versions match between my Windows and HDP environments, as indicated here. My IntelliJ dev environment works with local as the master.
Now I'm trying to connect to HDP from Windows using:
val sparkConf = new SparkConf()
.setAppName("spark-word-count")
.setMaster("spark://10.33.241.160:7077")
I get the error below and have no clue how to resolve it. Please help!
16/03/21 16:27:40 INFO SparkUI: Started SparkUI at http://10.33.240.126:4040
16/03/21 16:27:40 INFO AppClient$ClientEndpoint: Connecting to master spark://10.33.241.160:7077...
16/03/21 16:27:41 WARN AppClient$ClientEndpoint: Failed to connect to master 10.33.241.160:7077
java.io.IOException: Failed to connect to /10.33.241.160:7077
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.ConnectException: Connection refused: no further information: /10.33.241.160:7077
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:692)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
16/03/21 16:28:40 ERROR MapOutputTrackerMaster: Error communicating with MapOutputTracker
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1325)
at scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77)
at org.apache.spark.MapOutputTracker.askTracker(MapOutputTracker.scala:110)
at org.apache.spark.MapOutputTracker.sendTracker(MapOutputTracker.scala:120)
at org.apache.spark.MapOutputTrackerMaster.stop(MapOutputTracker.scala:462)
at org.apache.spark.SparkEnv.stop(SparkEnv.scala:93)
at org.apache.spark.SparkContext$$anonfun$stop$12.apply$mcV$sp(SparkContext.scala:1756)
at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1229)
at org.apache.spark.SparkContext.stop(SparkContext.scala:1755)
at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:127)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:264)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2$$anonfun$run$1.apply$mcV$sp(AppClient.scala:134)
at org.apache.spark.util.Utils$.tryOrExit(Utils.scala:1163)
at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anon$2.run(AppClient.scala:129)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
It turns out I need to set up my Hortonworks Spark as the master server every time the server restarts, then use my IntelliJ dev environment to connect to the HDP master. Just run ./sbin/start-master.sh on HDP, as described in this link.