spark-submit fails with wrong IP reference - apache-spark

We are trying to run a sample python file through spark-submit , but it fails with the error as below.
we have also set SPARK_MASTER_IP and SPARK_LOCAL_IP to (10.1.1.127) in spark-env.sh file.
but some where still there is some refer to (ip-10-1-1-127)
16/04/26 12:21:01 ERROR SparkContext: Error initializing SparkContext.
java.net.UnknownHostException: ip-10-1-1-127: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1295)

Related

I want to run spark with yarn, but I get a java.net.ConnectException error

I want to run spark on yarn
root#server01:/export/server/spark# bin/spark-shell --master yarn
But it goes to error like this
root#server01:/export/server/spark# bin/spark-shell --master yarn
22/12/06 07:51:42 WARN Utils: Your hostname, server01 resolves to a loopback address: 127.0.1.1; using 192.168.40.133 instead (on interface ens33)
22/12/06 07:51:42 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
22/12/06 07:51:51 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
22/12/06 07:57:58 ERROR YarnClientSchedulerBackend: The YARN application has already ended! It might have been killed or the Application Master may have failed to start. Check the YARN application logs for more details.
22/12/06 07:57:58 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Application application_1670312235175_0001 failed 2 times due to Error launching appattempt_1670312235175_0001_000002. Got exception: java.net.ConnectException: Call From localhost/127.0.0.1 to localhost:41647 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
spark-defaults.conf
spark.eventLog.enabled true
spark.eventLog.dir hdfs://node1:8020/sparklog/
spark.eventLog.compress true
# spark-yarn jar package
spark.yarn.jars hdfs://node1:8020/spark/jars/*
# spark and yarn history server
spark.yarn.historyServer.address node1:18080
yarn-site.xml
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
I verify the spark.yarn.historyServer to node1:19888 and restart spark,but it doesn't work.

ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up

im new to pyspark and i wanna lunch spark standalone cluster i lunched the spark-master using bin\spark-class2.cmd org.apache.spark.deploy.master.Master it worked well i checked on http://localhost:8080/ .
i wanted to lunch spark-shell using spark-shell --master spark://192.168.43.78:7077 <--- the URL of spark master and i got this error:
ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
22/05/24 20:28:59 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:92)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:577)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2589)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:937)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:931)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:106)
at $line3.$read$$iw$$iw.<init>(<console>:15)
...
How can i fix that ?

Spark on Yarn Client Mode - HADOOP_CONF_DIR is ignored

I have Spark 2.1.1 installed on top of Hadoop 2.8.1.
I have already specified HADOOP_CONF_DIR on spark-env.sh. I also have the following setting on spark-defaults.sh
spark.yarn.access.namenodes hdfs://hadoop-node0:55555/
But when I execute spark-shell with the following command for instance,
sparkuser#hadoop-node0:/home/apps/spark-2.1.1-bin-hadoop2.7$ bin/spark-shell --master yarn --deploy-mode client
HADOOP_CONF_DIR setting seems to be ignored so it does not retrieve the settings on core-site.xml and hdfs-site.xml, as I always get the following error:
17/07/25 10:15:24 ERROR spark.SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.UnknownHostException: spark
When I added "spark" on my /etc/hosts as alternative for localhost, I always get the following error:
17/07/25 10:17:15 ERROR spark.SparkContext: Error initializing SparkContext.
java.net.ConnectException: Call From XXXX/XXX.XXX.XXX.XXX to spark:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
So it always tries to reach 127.0.0.1:8020 which of course does not work as nothing is listening on it.
What do you think I missed specifying in the config files?
Thanks a lot in advance.
Kind regards,
Anto

AWS EMR using spark steps in cluster mode. Application application_ finished with failed status

I'm trying to launch a cluster using AWS Cli. I use the following command:
aws emr create-cluster --name "Config1" --release-label emr-5.0.0 --applications Name=Spark --use-default-role --log-uri 's3://aws-logs-813591802533-us-west-2/elasticmapreduce/' --instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m1.medium InstanceGroupType=CORE,InstanceCount=2,InstanceType=m1.medium
The cluster is created successfully. Then I add this command:
aws emr add-steps --cluster-id ID_CLUSTER --region us-west-2 --steps Name=SparkSubmit,Jar="command-runner.jar",Args=[spark-submit,--deploy-mode,cluster,--master,yarn,--executor-memory,1G,--class,Traccia2014,s3://tracceale/params/scalaProgram.jar,s3://tracceale/params/configS3.txt,30,300,2,"s3a://tracceale/Tempi1"],ActionOnFailure=CONTINUE
After some time, the step failed. This is the LOG file:
17/02/22 11:00:07 INFO RMProxy: Connecting to ResourceManager at ip-172-31- 31-190.us-west-2.compute.internal/172.31.31.190:8032
17/02/22 11:00:08 INFO Client: Requesting a new application from cluster with 2 NodeManagers
17/02/22 11:00:08 INFO Client: Verifying our application has not requested
Exception in thread "main" org.apache.spark.SparkException: Application application_1487760984275_0001 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1132)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1175)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
17/02/22 11:01:02 INFO ShutdownHookManager: Shutdown hook called
17/02/22 11:01:02 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-27baeaa9-8b3a-4ae6-97d0-abc1d3762c86
Command exiting with ret '1'
Locally (on SandBox Hortonworks HDP 2.5) I run:
./spark-submit --class Traccia2014 --master local[*] --executor-memory 2G /usr/hdp/current/spark2-client/ScalaProjects/ScripRapportoBatch2.1/target/scala-2.11/traccia-22-ottobre_2.11-1.0.jar "/home/tracce/configHDFS.txt" 30 300 3
and everything works fine.
I've already read something related to my problem, but I can't figure it out.
UPDATE
Checked into Application Master, I get this error:
17/02/22 15:29:54 ERROR ApplicationMaster: User class threw exception: java.io.FileNotFoundException: s3:/tracceale/params/configS3.txt (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at scala.io.Source$.fromFile(Source.scala:91)
at scala.io.Source$.fromFile(Source.scala:76)
at scala.io.Source$.fromFile(Source.scala:54)
at Traccia2014$.main(Rapporto.scala:40)
at Traccia2014.main(Rapporto.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
17/02/22 15:29:55 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.io.FileNotFoundException: s3:/tracceale/params/configS3.txt (No such file or directory))
I pass the path mentioned "s3://tracceale/params/configS3.txt" from S3 to the function 'fromFile' like this:
for(line <- scala.io.Source.fromFile(logFile).getLines())
How could I solve it? Thanks in advance.
Because you are using cluster deploy mode, the logs you have included are not useful at all. They just say that the application failed but not why it failed. To figure out why it failed, you at least need to look at the Application Master logs, since that is where the Spark driver runs in cluster deploy mode, and it will probably give a better hint as to why the application failed.
Since you have configured your cluster with a --log-uri, you will find the logs for the Application Master underneath s3://aws-logs-813591802533-us-west-2/elasticmapreduce/<CLUSTER ID>/containers/<YARN Application ID>/ where the YARN Application ID is (based on the logs you included above) application_1487760984275_0001, and the container ID should be something like container_1487760984275_0001_01_000001. (The first container for an application is the Application Master.)
What you have there is a URL to an object store, reachable from the Hadoop filesystem APIs, and a stack trace coming from java.io.File, which can't read it because it doesn't refer to anything in the local disk.
Use SparkContext.hadoopRDD() as the operation to convert the path into an RDD
There is a probability of file missing in the location, may be you can see it after ssh into EMR cluster but still the steps command wouldn't be able to figure out by itself and starts throwing that file not found exception.
In this scenario what I did is :
Step 1: Checked for the file existence in the project directory which we copied to EMR.
for example mine was in `//usr/local/project_folder/`
Step 2: Copy the script which you're expecting to run on the EMR.
for example I copied from `//usr/local/project_folder/script_name.sh` to `/home/hadoop/`
Step 3: Then executed the script from /home/hadoop/ by passing the absolute path to the command-runner.jar
command-runner.jar bash /home/hadoop/script_name.sh
Thus I found my script running. Hope this may be helpful to someone

SparkR and Pyspark throw Java.net.Bindexception on launch, but Spark-Shell does not?

I have already tried setting SPARK_LOCAL_IP to "127.0.0.1" and checking if the port is occupied. Here is the full error text:
Launching java with spark-submit command /usr/hdp/2.4.0.0-
169/spark/bin/spark-submit "sparkr-shell" /tmp/RtmpZo44il/backend_port998540c56917
/usr/hdp/2.4.0.0-169/spark/bin/load-spark-env.sh: line 72: export: `load-spark-env.sh': not a valid identifier
16/06/13 11:28:24 ERROR RBackend: Server shutting down: failed with exception
java.net.BindException: Cannot assign requested address
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:433)
at sun.nio.ch.Net.bind(Net.java:425)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:745)
Error in SparkR::sparkR.init() : JVM is not ready after 10 seconds
Above error is when launching ./bin/sparkR. Again Spark-shell will execute normally.
Some more information. Spark-shell when launched will automatically search through ports until it has resolved one that doesn't have a bind exception. Even when I set the default SparkR backend port to an unused port it will fail.
I found the issue. Another user had deleted my etc/hosts file. I reconfigured the file with localhost and it seems to run sparkR. I am still curious how spark-shell could run with the file though.

Resources