Spark standalone mode: failed on connection exception

I am running a Spark (1.2.1) standalone cluster on my virtual machines (Ubuntu 12.04). I can run examples such as als.py and pi.py successfully, but I can't run the wordcount.py example because a connection error occurs.
bin/spark-submit --master spark://192.168.1.211:7077 /examples/src/main/python/wordcount.py ~/Documents/Spark_Examples/wordcount.py
The error message is as below:
15/03/13 22:26:02 INFO BlockManagerMasterActor: Registering block manager a12:45594 with 267.3 MB RAM, BlockManagerId(0, a12, 45594)
15/03/13 22:26:03 INFO Client: Retrying connect to server: a11/192.168.1.211:9000. Already tried 4 time(s).
......
Traceback (most recent call last):
File "/home/spark/spark/examples/src/main/python/wordcount.py", line 32, in <module>
.reduceByKey(add)
File "/home/spark/spark/lib/spark-assembly-1.2.1 hadoop1.0.4.jar/pyspark/rdd.py", line 1349, in reduceByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1559, in combineByKey
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 1942, in _defaultReducePartitions
File "/home/spark/spark/lib/spark-assembly-1.2.1-hadoop1.0.4.jar/pyspark/rdd.py", line 297, in getNumPartitions
......
py4j.protocol.Py4JJavaError: An error occurred while calling o23.partitions.
java.lang.RuntimeException: java.net.ConnectException: Call to a11/192.168.1.211:9000 failed on connection exception: java.net.ConnectException: Connection refused
......
I am not using YARN or ZooKeeper, and all the virtual machines can connect to each other via passwordless SSH. I also set SPARK_LOCAL_IP for the master and workers.

I think the wordcount.py example is accessing HDFS to read the lines of a file (and then count the words).
Something like:
sc.textFile("hdfs://<master-hostname>:9000/path/to/whatever")
Port 9000 is usually used for HDFS.
Please make sure that the file is accessible, or do not use HDFS for that example :).
I hope this helps.
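For illustration, here is a minimal sketch of the same word count reading from the local filesystem instead of HDFS; the file:// path is just an example, not taken from the original post:

from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="PythonWordCount")

# A file:// URI makes Spark read from the local filesystem, so no HDFS
# NameNode on port 9000 is needed. On a cluster, the file must exist at
# this path on every worker.
lines = sc.textFile("file:///home/spark/Documents/Spark_Examples/some_text_file.txt")

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(add))

for word, count in counts.collect():
    print("%s: %i" % (word, count))

sc.stop()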

Related

hbase-spark connector for spark2.1.0

I am using the stack below:
Hadoop-2.7.7
spark-2.4.5
Hbase-2.1.0
zk-3.5.9
I want to read and write data to HBase using Spark with the spark-submit command, but I was unable to do so.
I have successfully started all the services and also searched for a connector, but I couldn't find one.
I tried to build a connector using this link: https://github.com/LucaCanali/Miscellaneous/blob/master/Spark_Notes/Spark_HBase_Connector.md
The connector build kept failing, but I somehow managed to get connector jars from the internet and tried with them.
When I launch spark-submit with the command below, my application fails:
spark-submit --jars /home/bigdata/downloads/hbase-spark-1.0.0.jar --packages org.apache.hbase:hbase-shaded-mapreduce:2.1.0 /home/bigdata/hbasefload.py
Error:
Traceback (most recent call last):
File "/home/bigdata/hbasefload.py", line 35, in <module>
.option("hbase.zookeeper.quorum", "node2.ellicium.com:2181")\
File "/opt/spark/spark245/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 73 7, in save
File "/opt/spark/spark245/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/spark/spark245/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/spark/spark245/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328 , in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o60.save.
: java.util.NoSuchElementException: key not found: catalog
When I try to write to HBase using spark-shell with the above jars, it executes successfully, but it fails with spark-submit.
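For what it's worth, "key not found: catalog" usually means the data source was never told which HBase table and columns to write to. Below is a minimal sketch of such a write with the Apache hbase-spark connector, adapted from the notes linked above; the DataFrame, the table name "testtable", and the column family "cf" are purely illustrative assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hbase-write-sketch").getOrCreate()

# Illustrative data; replace with your real DataFrame.
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# The connector needs a mapping from DataFrame columns to HBase cells;
# ":key" marks the row key and "cf" is a hypothetical column family.
(df.write.format("org.apache.hadoop.hbase.spark")
   .option("hbase.columns.mapping", "id INT :key, name STRING cf:name")
   .option("hbase.table", "testtable")
   .option("hbase.zookeeper.quorum", "node2.ellicium.com:2181")
   .option("hbase.spark.use.hbasecontext", False)
   .save())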

Opscenter not loading. Life cycle manager not connecting to the cluster.

My OpsCenter always gets stuck at "Loading OpsCenter...". By the way, this is my first installation, and so far I have not gotten OpsCenter to run.
All three of these run normally.
nodetool status
service dse
service datastax-agent
I can reproduce it in both Google Chrome and Mozilla Firefox, both remotely and running on localhost.
opscenterd.log :
2017-04-12 15:20:15,877 [myclustername] WARN: These nodes reported this message, Nodes: ['10.35.21.207'] Message: HTTP request http://10.35.21.207:61621/connection-status? failed:
An error occurred while connecting: 107: Transport endpoint is not connected. (MainThread)
When using Lifecycle Manager, it sees the cluster name I picked but cannot connect. Here's what the log looks like when I attempt to start managing the unmanaged cluster.
[opscenterd] ERROR: Problem while calling ImportClusterIntoLifecycleManagerController (AgentCommunicationFailure): Cluster Import Failure: Unable to determine the DSE version for the specified cluster. Please verify that the Agents for this cluster are properly communicating with Opscenter.
File "/usr/share/opscenter/lib/py/twisted/internet/defer.py", line 1122, in _inlineCallbacks
result = result.throwExceptionIntoGenerator(g)
File "/usr/share/opscenter/lib/py/twisted/python/failure.py", line 389, in throwExceptionIntoGenerator
return g.throw(self.type, self.value, self.tb)
File "/usr/share/opscenter/jython/Lib/site-packages/opscenterd/WebServer.py", line 2598, in ImportClusterIntoLifecycleManagerController

module error in multi-node spark job on google cloud cluster

This code runs perfectly when I set the master to localhost. The problem occurs when I submit it to a cluster with two worker nodes.
All the machines have the same version of Python and the same packages. I have also set the path to point to the desired Python version, i.e. 3.5.1. When I submit my Spark job in the master SSH session, I get the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, .c..internal): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/hadoop/yarn/nm-local-dir/usercache//appcache/application_1469113139977_0011/container_1469113139977_0011_01_000004/pyspark.zip/pyspark/worker.py", line 98, in main
command = pickleSer._read_with_length(infile)
File "/hadoop/yarn/nm-local-dir/usercache//appcache/application_1469113139977_0011/container_1469113139977_0011_01_000004/pyspark.zip/pyspark/serializers.py", line 164, in _read_with_length
return self.loads(obj)
File "/hadoop/yarn/nm-local-dir/usercache//appcache/application_1469113139977_0011/container_1469113139977_0011_01_000004/pyspark.zip/pyspark/serializers.py", line 419, in loads
return pickle.loads(obj, encoding=encoding)
File "/hadoop/yarn/nm-local-dir/usercache//appcache/application_1469113139977_0011/container_1469113139977_0011_01_000004/pyspark.zip/pyspark/mllib/init.py", line 25, in
import numpy
ImportError: No module named 'numpy'
I saw other posts where people did not have access to their worker nodes; I do. I get the same message for the other worker node. Not sure if I am missing some environment setting. Any help will be much appreciated.
Not sure if this qualifies as a solution. I submitted the same job using Dataproc on Google Cloud Platform and it worked without any problem. I believe the best way to run jobs on a Google cluster is via the utilities offered on the platform; the Dataproc utility seems to iron out any environment-related issues.
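As a side note, a common fix for "No module named 'numpy'" on workers is to make every node launch the same Python interpreter that has numpy installed. A minimal sketch, assuming that interpreter lives at /usr/bin/python3.5 on all nodes (the path and app name are illustrative):

import os
from pyspark import SparkConf, SparkContext

# Must be set before the SparkContext is created; Spark launches the
# Python workers on the executors with this interpreter as well.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3.5"

sc = SparkContext(conf=SparkConf().setAppName("numpy-sanity-check"))

# Sanity check: import numpy inside a task on every partition.
def numpy_version(_):
    import numpy
    return [numpy.__version__]

print(sc.parallelize(range(4), 4).mapPartitions(numpy_version).collect())
sc.stop()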

apache spark "Py4JError: Answer from Java side is empty"

I get this error every time...
I use Sparkling Water...
My conf file:
***"spark.driver.memory 65g
spark.python.worker.memory 65g
spark.master local[*]"***
The amount of data is about 5 GB.
There is no other information about this error...
Does anybody know why it happens? Thank you!
***"ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
Have you tried setting spark.executor.memory and spark.driver.memory in your Spark configuration file?
See https://stackoverflow.com/a/22742982/5453184 for more info.
Usually, you'll see this error when the Java process gets silently killed by the OOM Killer.
The OOM Killer (Out of Memory Killer) is a Linux process that kicks in when the system becomes critically low on memory. It selects a process based on its "badness" score and kills it to reclaim memory.
Read more on OOM Killer here.
Increasing the spark.executor.memory and/or spark.driver.memory values will only make things worse in this case, i.e. you may want to do the opposite, as sketched after the lists below!
Other options would be to:
increase the number of partitions if you're working with very big data sources;
increase the number of worker nodes;
add more physical memory to worker/driver nodes;
Or, if you're running your driver/workers using docker:
increase docker memory limit;
set --oom-kill-disable on your containers, but make sure you understand possible consequences!
Read more on --oom-kill-disable and other docker memory settings here.
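For instance, here is a minimal sketch combining lower memory settings with more input partitions; all values and paths are illustrative assumptions, not tuned recommendations:

from pyspark import SparkConf, SparkContext

# Leave RAM headroom for the OS so the OOM Killer has no reason to act,
# instead of claiming 65g for the driver on a smaller machine.
conf = (SparkConf()
        .setMaster("local[*]")
        .set("spark.driver.memory", "16g")
        .set("spark.python.worker.memory", "2g"))
sc = SparkContext(conf=conf)

# Spread the ~5 GB input over more, smaller partitions so each Python
# worker holds less data at a time.
rdd = sc.textFile("/path/to/input", minPartitions=200)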
Another point to note if you are on WSL2 using PySpark: ensure that your WSL2 config file has increased memory.
# Settings apply across all Linux distros running on WSL 2
[wsl2]
# Limits VM memory used by WSL 2; this can be set as whole numbers using GB or MB
memory=12GB # This was originally set to 3GB, which caused my jobs to fail since spark.executor.memory and spark.driver.memory could only reach a maximum of 3GB regardless of how high I set them.
# Sets the VM to use eight virtual processors
processors=8
For reference, your .wslconfig config file should be located in C:\Users\USERNAME.

Spark on EMR Yarn - EOF Error

We are running some PySpark processes on YARN. When the datasets increase in size, we get this error in the YARN log:
Traceback (most recent call last):
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 157, in manager
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/daemon.py", line 61, in worker
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/worker.py", line 136, in main
if read_int(infile) == SpecialLengths.END_OF_STREAM:
File "/home/hadoop/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 544, in read_int
raise EOFError
java.net.SocketException: Socket is closed
at java.net.Socket.shutdownOutput(Socket.java:1496)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3$$anonfun$apply$2.apply$mcV$sp(PythonRDD.scala:256)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3$$anonfun$apply$2.apply(PythonRDD.scala:256)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3$$anonfun$apply$2.apply(PythonRDD.scala:256)
at org.apache.spark.util.Utils$.tryLog(Utils.scala:1785)
at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$3.apply(PythonRDD.scala:256)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1772)
at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:208)
We are running on an EMR setup of 3 m3.xlarge instances, each with 4 vCPUs, 15 GiB of RAM, and 2x40 GB of storage.
The job is executed with the following sh script:
export SPARK_HOME=/home/hadoop/spark
JARS="/home/hadoop/avro-1.7.7.jar,/home/hadoop/spark-avro-master/target/scala-2.10/spark-avro_2.10-1.0.0.jar”
$SPARK_HOME/bin/spark-submit --master yarn-cluster --py-files deploy.zip --jars $JARS main.py
where deploy.zip contains some utility methods and lambda functions
No other configuration changes were made to the cluster.
Looking at the UI, it seems that all the jobs finish with a SUCCESS status; nevertheless, we would like to get rid of this issue, or at the very least understand what's causing it.
Would you have any idea on what it might be the origin of the error?
Thanks!
