Spark on YARN Client Mode - HADOOP_CONF_DIR is ignored

I have Spark 2.1.1 installed on top of Hadoop 2.8.1.
I have already specified HADOOP_CONF_DIR in spark-env.sh. I also have the following setting in spark-defaults.conf:
spark.yarn.access.namenodes hdfs://hadoop-node0:55555/
But when I execute spark-shell with the following command, for instance:
sparkuser@hadoop-node0:/home/apps/spark-2.1.1-bin-hadoop2.7$ bin/spark-shell --master yarn --deploy-mode client
the HADOOP_CONF_DIR setting seems to be ignored, so Spark does not pick up the settings in core-site.xml and hdfs-site.xml, and I always get the following error:
17/07/25 10:15:24 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: spark
When I added "spark" to my /etc/hosts as an alias for localhost, I got the following error instead:
17/07/25 10:17:15 ERROR spark.SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From XXXX/XXX.XXX.XXX.XXX to spark:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
So it always tries to reach 127.0.0.1:8020, which of course does not work, as nothing is listening on that port.
What do you think I missed specifying in the config files?
Thanks a lot in advance.
Kind regards,
Anto
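
For reference, in yarn-client mode spark-shell finds the ResourceManager and NameNode through the files in whatever directory HADOOP_CONF_DIR points to, so the export has to be visible to the process launching bin/spark-shell. A minimal sketch of conf/spark-env.sh, where the Hadoop path is an assumption to be adjusted to the real install:

# conf/spark-env.sh -- the path below is an assumption; point it at the directory
# that actually contains core-site.xml, hdfs-site.xml and yarn-site.xml
export HADOOP_CONF_DIR=/home/apps/hadoop-2.8.1/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR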

Related

I want to run spark with yarn, but I get a java.net.ConnectException error

I want to run spark on yarn
root@server01:/export/server/spark# bin/spark-shell --master yarn
But it fails with errors like this:
root@server01:/export/server/spark# bin/spark-shell --master yarn
22/12/06 07:51:42 WARN Utils: Your hostname, server01 resolves to a loopback address: 127.0.1.1; using 192.168.40.133 instead (on interface ens33)
22/12/06 07:51:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/12/06 07:51:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/06 07:57:58 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
22/12/06 07:57:58 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1670312235175_0001 failed 2 times due to Error launching appattempt_1670312235175_0001_000002. Got exception: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:41647 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://node1:8020/sparklog/
spark.eventLog.compress true
# spark-yarn jar package
spark.yarn.jars hdfs://node1:8020/spark/jars/*
# spark and yarn history server
spark.yarn.historyServer.address node1:18080
yarn-site.xml
<property>
  <name>yarn.log.server.url</name>
  <value>http://node1:19888/jobhistory/logs</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
I tried setting spark.yarn.historyServer.address to node1:19888 and restarting Spark, but it doesn't help.
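
For what it's worth, the loopback warning at the top of that log ("server01 resolves to a loopback address: 127.0.1.1") matches the failed call to localhost:41647: the addresses being handed around resolve back to loopback. A sketch of the /etc/hosts shape usually behind this symptom; the addresses are taken from the log and the exact entries are an assumption about the actual machine:

# /etc/hosts (illustrative)
# 127.0.1.1       server01     <- an entry like this makes the hostname resolve to loopback
192.168.40.133    server01     # map the hostname to the real interface address instead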

Apache Spark 2.3.0: Unable to start spark-shell on Windows

While starting spark-shell, I get the following error.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/04/25 07:18:41 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master 10.250.54.201:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:109)
Try the steps below to resolve the issue (a combined command sketch follows the list):
1. Go to the %SPARK_HOME%\bin folder in a command prompt.
2. Run spark-class org.apache.spark.deploy.master.Master to start the master. This will give you a URL of the form spark://ip:port.
3. Run spark-class org.apache.spark.deploy.worker.Worker spark://ip:port to start the worker. Make sure you use the URL you obtained in step 2.
4. Run spark-shell --master spark://ip:port to connect an application to the newly created cluster.
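Put together, the sequence looks roughly like the sketch below. Run each command in its own command prompt, and treat the IP and port as illustrative placeholders for whatever URL the master actually prints:

cd %SPARK_HOME%\bin
spark-class org.apache.spark.deploy.master.Master
REM the master log contains a URL like spark://192.168.1.10:7077 - reuse that URL below
spark-class org.apache.spark.deploy.worker.Worker spark://192.168.1.10:7077
spark-shell --master spark://192.168.1.10:7077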

spark-submit fails with wrong IP reference

We are trying to run a sample Python file through spark-submit, but it fails with the error below.
We have also set SPARK_MASTER_IP and SPARK_LOCAL_IP to 10.1.1.127 in the spark-env.sh file,
but somewhere there is still a reference to ip-10-1-1-127:
16/04/26 12:21:01 ERROR SparkContext: Error initializing SparkContext.
java.net.UnknownHostException: ip-10-1-1-127: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1295)
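
For context, the stack trace is a plain name-resolution failure: the driver cannot resolve the EC2-style hostname ip-10-1-1-127 at all, and the SPARK_MASTER_IP / SPARK_LOCAL_IP exports by themselves do not change that. Shown only as an illustration, a hosts entry that would let the lookup succeed (the address is the one given in the question):

# /etc/hosts (illustrative)
10.1.1.127    ip-10-1-1-127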

java.net.ConnectException (on port 9000) while submitting a spark job

On running this command:
~/spark/bin/spark-submit --class [class-name] --master [spark-master-url]:7077 [jar-path]
I am getting
java.lang.RuntimeException: java.net.ConnectException: Call to ec2-[ip].compute-1.amazonaws.com/[internal-ip]:9000 failed on connection exception: java.net.ConnectException: Connection refused
Using spark version 1.3.0.
How do I resolve it?
When Spark is run in cluster mode, all input files are expected to be on HDFS (otherwise the workers would have no way to read the master's local files). But in this case, Hadoop wasn't running, so it was throwing this exception.
Starting HDFS resolved this.
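Concretely, the fix amounts to bringing HDFS up and confirming something answers on the NameNode port from the error (9000). A quick sketch, assuming a standard Hadoop 2.x layout with HADOOP_HOME set:

$HADOOP_HOME/sbin/start-dfs.sh   # start the NameNode and DataNodes
jps                              # NameNode should now appear in the list
hdfs dfs -ls /                   # only succeeds if the client can reach the NameNode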

error when starting the spark shell

I just downloaded the latest version of spark and when I started the spark shell I got the following error:
java.net.BindException: Failed to bind to: /192.168.1.254:0: Service 'sparkDriver' failed after 16 retries!
at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:393)
at akka.remote.transport.netty.NettyTransport$$anonfun$listen$1.apply(NettyTransport.scala:389)
...
...
java.lang.NullPointerException
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:193)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:71)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
at org.apache.spark.repl.SparkILoop.createSQLContext(SparkILoop.scala:1028)
at $iwC$$iwC.<init>(<console>:9)
...
...
<console>:10: error: not found: value sqlContext
import sqlContext.implicits._
^
<console>:10: error: not found: value sqlContext
import sqlContext.sql
^
Is there something that I missed in setting up spark?
Try setting the Spark env variable SPARK_LOCAL_IP to a local IP address.
In my case, I was running Spark on an Amazon EC2 Linux instance. spark-shell stopped working with an error message similar to yours. I was able to fix it by adding a setting like the following to the Spark config file conf/spark-env.sh:
export SPARK_LOCAL_IP=172.30.43.105
You could also set it in ~/.profile or ~/.bashrc.
Also check the host settings in /etc/hosts.
See SPARK-8162.
It looks like it only affects 1.4.1 and 1.5.0, so you're probably best off running the latest stable release (1.4.0 at the time of writing).
I was experiencing the same issue. First, go to .bashrc and add
export SPARK_LOCAL_IP=172.30.43.105
then go to
cd $HADOOP_HOME/bin
and run the following command:
hdfs dfsadmin -safemode leave
This just turns the NameNode's safe mode off.
Then delete the metastore_db folder from the Spark home folder or its bin directory; it is generally in the folder from which you start a Spark session.
Then I ran spark-shell like this:
spark-shell --master "spark://localhost:7077"
and voilà, I no longer got the sqlContext.implicits._ error.
