I want to run spark with yarn, but I get a java.net.ConnectException error - apache-spark

I want to run spark on yarn
root#server01:/export/server/spark# bin/spark-shell --master yarn
But it goes to error like this
root#server01:/export/server/spark# bin/spark-shell --master yarn
22/12/06 07:51:42 WARN Utils: Your hostname, server01 resolves to a loopback address: 127.0.1.1; using 192.168.40.133 instead (on interface ens33)
22/12/06 07:51:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/12/06 07:51:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/06 07:57:58 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
22/12/06 07:57:58 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1670312235175_0001 failed 2 times due to Error launching appattempt_1670312235175_0001_000002. Got exception: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:41647 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://node1:8020/sparklog/
spark.eventLog.compress true
# spark-yarn jar package
spark.yarn.jars hdfs://node1:8020/spark/jars/*
# spark and yarn history server
spark.yarn.historyServer.address node1:18080
yarn-site.xml
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
I verify the spark.yarn.historyServer to node1:19888 and restart spark,but it doesn't work.

Related

I cant get hive to run jobs with spark

Running hive with spark keeps giving me this error. I have tried many different versions of both hive and spark.
Running hadoop 2.7.3 in standalone mode
it is a diy implementation
right now using spark 2.2 with hive 2.3.5
I tried different versions of hive and spark, I am not sure what exactly the problem is or how to debug this:
0: jdbc:hive2://192.168.71.62:10000> select count(*) from traffic;
Getting log thread is interrupted, since query is done!
Error: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Failed to create spark client.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create spark client.
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:64)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionManagerImpl.getSession(SparkSessionManagerImpl.java:115)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1232)
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:255)
... 11 more
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: Cancel client '9be4c047-285d-4578-a934-7bd51294d240'. Error: Child process exited before connecting back with error log Warning: Ignoring non-spark config property: hive.spark.client.server.connect.timeout=90000
Warning: Ignoring non-spark config property: hive.spark.client.rpc.threads=8
Warning: Ignoring non-spark config property: hive.spark.client.connect.timeout=1000
Warning: Ignoring non-spark config property: hive.spark.client.secret.bits=256
Warning: Ignoring non-spark config property: hive.spark.client.rpc.max.size=52428800
19/05/20 12:39:42 WARN util.Utils: Your hostname, suypc183-OptiPlex-3020 resolves to a loopback address: 127.0.0.1; using 192.168.71.62 instead (on interface enp2s0)
19/05/20 12:39:42 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
19/05/20 12:39:43 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
19/05/20 12:39:43 INFO yarn.Client: Requesting a new application from cluster with 1 NodeManagers
19/05/20 12:39:43 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
19/05/20 12:39:43 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
19/05/20 12:39:43 INFO yarn.Client: Setting up container launch context for our AM
19/05/20 12:39:43 INFO yarn.Client: Setting up the launch environment for our AM container
19/05/20 12:39:43 INFO yarn.Client: Preparing resources for our AM container
19/05/20 12:39:44 INFO yarn.Client: Deleted staging directory hdfs://localhost:9000/user/anonymous/.sparkStaging/application_1558334426394_0004
Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from an empty string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:134)
at org.apache.hadoop.fs.Path.<init>(Path.java:93)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:369)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$$distribute$1(Client.scala:490)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:529)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:882)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:171)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1167)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1226)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:744)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.hive.spark.client.SparkClientImpl.<init>(SparkClientImpl.java:125)
at org.apache.hive.spark.client.SparkClientFactory.createClient(SparkClientFactory.java:80)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.createRemoteClient(RemoteHiveSparkClient.java:101)
at org.apache.hadoop.hive.ql.exec.spark.RemoteHiveSparkClient.<init>(RemoteHiveSparkClient.java:97)
at org.apache.hadoop.hive.ql.exec.spark.HiveSparkClientFactory.createHiveSparkClient(HiveSparkClientFactory.java:73)
at org.apache.hadoop.hive.ql.exec.spark.session.SparkSessionImpl.open(SparkSessionImpl.java:62)
... 22 more
hive site configs are the following
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>Scratch space for Hive jobs</description>
</property>
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>spark.master</name>
<value>yarn</value>
</property>
<property>
<name>spark.executor.memory</name>
<value>2048</value>
</property>
<property>
<name>spark.yarn.archive</name>
<value>hdfs://localhost:8088/user/jars/</value>
</property>
<property>
<name>spark.home</name>
<value>/home/danielphingston/spark</value>
</property>
The most typical problem of connection between Hive and Spark is making sure that Spark knows the HADOOP CONF DIRECTORY. We solve that by provide below statements in the spark-env.sh file within Spark:
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-$SPARK_CONF_DIR/yarn-conf}
HIVE_CONF_DIR=${HIVE_CONF_DIR:-/etc/hive/conf}
if [ -d "$HIVE_CONF_DIR" ]; then
HADOOP_CONF_DIR="$HADOOP_CONF_DIR:$HIVE_CONF_DIR"
fi
export HADOOP_CONF_DIR
This enables Spark to see where Hadoop directories are present in the file system.

Spark on Yarn Client Mode - HADOOP_CONF_DIR is ignored

I have Spark 2.1.1 installed on top of Hadoop 2.8.1.
I have already specified HADOOP_CONF_DIR on spark-env.sh. I also have the following setting on spark-defaults.sh
spark.yarn.access.namenodes hdfs://hadoop-node0:55555/
But when I execute spark-shell with the following command for instance,
sparkuser#hadoop-node0:/home/apps/spark-2.1.1-bin-hadoop2.7$ bin/spark-shell --master yarn --deploy-mode client
HADOOP_CONF_DIR setting seems to be ignored so it does not retrieve the settings on core-site.xml and hdfs-site.xml, as I always get the following error:
17/07/25 10:15:24 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: spark
When I added "spark" on my /etc/hosts as alternative for localhost, I always get the following error:
17/07/25 10:17:15 ERROR spark.SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From XXXX/XXX.XXX.XXX.XXX to spark:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
So it always tries to reach 127.0.0.1:8020 which of course does not work as nothing is listening on it.
What do you think I missed specifying in the config files?
Thanks a lot in advance.
Kind regards,
Anto

Continuously getting error in spark-submit job

I am continuously getting this error:
16/02/29 14:49:40 WARN BlockManager: Block input-0-1456737579500 replicated to only 0 peer(s) instead of 1 peers
while running the spark-submit job.
./spark-submit --jars jar_names--driver-class-path --packages --executor-memory 6g --executor-cores 4 --master local[4] script.py
I am getting this error as soon as it is started: Error says:
WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN Utils: Your hostname, host_name resolves to a loopback address: 127.0.0.1; using ip instead (on interface eth0)
WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
I have also tried using --master[4] but still getting the error.

spark-submit cluster mode is not working

I am getting an error in launching the standalone Spark driver in cluster mode. As per the documentation, it is noted that cluster mode is supported in the Spark 1.2.1 release. However, it is currently not working properly for me. Please help me in fixing the issue(s) that are preventing the proper functioning of Spark.
I have 3 node spark cluster node1 , node2 and node 3
I running below command on node 1 for deploying driver
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
driver gets launched on node 2 in cluster. but getting exception on node 2 that it is trying to bind to node 1 ip.
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
kindly suggest
Thanks`enter code here`
It is not possible to bind to port 0. There is/are errors in your spark configuration. Specifically look at the
spark.webui.port
It is probably set to 0.

Spark 1.2.1 standalone cluster mode spark-submit is not working

I have 3 node spark cluster
node1 , node2 and node 3
I running below command on node 1 for deploying driver
/usr/local/spark-1.2.1-bin-hadoop2.4/bin/spark-submit --class com.fst.firststep.aggregator.FirstStepMessageProcessor --master spark://ec2-xx-xx-xx-xx.compute-1.amazonaws.com:7077 --deploy-mode cluster --supervise file:///home/xyz/sparkstreaming-0.0.1-SNAPSHOT.jar /home/xyz/config.properties
driver gets launched on node 2 in cluster. but getting exception on node 2 that it is trying to bind to node 1 ip.
2015-02-26 08:47:32 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:32 INFO Slf4jLogger:80 - Slf4jLogger started
2015-02-26 08:47:33 ERROR NettyTransport:65 - failed to bind to ec2-xx.xx.xx.xx.compute-1.amazonaws.com/xx.xx.xx.xx:0, shutting down Netty transport
2015-02-26 08:47:33 WARN Utils:71 - Service 'Driver' could not bind on port 0. Attempting port 1.
2015-02-26 08:47:33 DEBUG AkkaUtils:63 - In createActorSystem, requireCookie is: off
2015-02-26 08:47:33 ERROR Remoting:65 - Remoting error: [Startup failed] [
akka.remote.RemoteTransportException: Startup failed
at akka.remote.Remoting.akka$remote$Remoting$$notifyError(Remoting.scala:136)
at akka.remote.Remoting.start(Remoting.scala:201)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1765)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1756)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:33)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.jboss.netty.channel.ChannelException: Failed to bind to: ec2-xx-xx-xx.compute-1.amazonaws.com/xx.xx.xx.xx:0
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
at scala.util.Success$$anonfun$map$1.apply(Try.scala:206)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Success.map(Try.scala:206)
kindly suggest
Thanks
after spending lot more time.i got the answer.i did below changes
remove entry of SPARK_LOCAL_IP and SPARK_MASTER_IP
add host name and private ip address of each other nodes in etc/hosts.
use --deploy-mode cluster --supervise
thats all and it works perfectly with fully HA components(Master,Slaves and Driver)
Thanks
Cluster mode is not supported in EC2 1.2 instances where it creates a standalone cluster. Hence you can try removing
--deploy-mode cluster --supervise

Resources