Why does submitting a Spark application to Mesos fail with 'Failed to load native Mesos library'? - apache-spark

I'm getting the following exception when I'm trying to submit a Spark application to a Mesos cluster:
/home/knoldus/application/spark-2.2.0-rc4/conf/spark-env.sh: line 40: export: `/usr/local/lib/libmesos.so': not a valid identifier
/home/knoldus/application/spark-2.2.0-rc4/conf/spark-env.sh: line 41: export: `hdfs://spark-2.2.0-bin-hadoop2.7.tgz': not a valid identifier
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/09/30 14:17:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/30 14:17:31 WARN Utils: Your hostname, knoldus resolves to a loopback address: 127.0.1.1; using 192.168.0.111 instead (on interface wlp6s0)
17/09/30 14:17:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Failed to load native Mesos library from
java.lang.UnsatisfiedLinkError: Expecting an absolute path of the library:
at java.lang.Runtime.load0(Runtime.java:806)
at java.lang.System.load(System.java:1086)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:159)
at org.apache.mesos.MesosNativeLibrary.load(MesosNativeLibrary.java:188)
at org.apache.mesos.MesosSchedulerDriver.<clinit>(MesosSchedulerDriver.java:61)
at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$class.createSchedulerDriver(MesosSchedulerUtils.scala:104)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.createSchedulerDriver(MesosCoarseGrainedSchedulerBackend.scala:49)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.start(MesosCoarseGrainedSchedulerBackend.scala:170)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
... 47 elided
I have built spark using
./build/mvn -Pmesos -DskipTests clean package
I have set the following properties in spark-env.sh:
export MESOS_NATIVE_JAVA_LIBRARY= /usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI= hdfs://spark-2.2.0-bin-hadoop2.7.tgz
And in spark-defaults.conf:
spark.executor.uri hdfs://spark-2.2.0-bin-hadoop2.7.tgz

I have resolved the issue.
The problem is the space after the equals sign in the export lines:
export MESOS_NATIVE_JAVA_LIBRARY= /usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI= hdfs://spark-2.2.0-bin-hadoop2.7.tgz
For example, with
export foo = bar
the shell interprets that as a request to export three names: foo, = and bar. Since = isn't a valid variable name, the command fails. In the lines above, the space after = means MESOS_NATIVE_JAVA_LIBRARY is exported with an empty value, and the shell then tries to export /usr/local/lib/libmesos.so itself as a variable name, which is exactly the "not a valid identifier" error shown at the top. The variable name, the equals sign and its value must not be separated by spaces for the line to be processed as a single assignment and export.
Remove the spaces.
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://spark-2.2.0-bin-hadoop2.7.tgz
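To double-check that the fix took effect, you can source the file and print the variables. This is only a sketch, reusing the spark-env.sh path from the error output above:
source /home/knoldus/application/spark-2.2.0-rc4/conf/spark-env.sh
echo "$MESOS_NATIVE_JAVA_LIBRARY"   # should print /usr/local/lib/libmesos.so
echo "$SPARK_EXECUTOR_URI"          # should print hdfs://spark-2.2.0-bin-hadoop2.7.tgz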

Related

Pyspark not creating SparkContext (Yarn). bad gateway or network traffic blocked?

Here is some context on my pyspark installation.
In my company, we use a Cloudera Data Science Workbench (CDSW). When we create a session for a new project, I believe it spins up an image built from a specific Dockerfile, and that Dockerfile installs the CDH binaries and configuration.
Now I want to use those configurations outside CDSW. I have a Kubernetes cluster where I deploy web apps, and I would like to use Spark in YARN mode to allocate very small resources for the web apps.
What I did was tar.gz all the binaries and configuration from /opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072 and /var/lib/cdsw/client-config/,
then extract them inside a container or a WSL2 instance.
Instead of unpacking everything under /var/ or /opt/ as I probably should, I put them in $HOME/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/* and $USER/etc/client-config/*. Why did I do this? Because I might want to use a mounted volume in Kubernetes and share the binaries between containers.
I used sed to modify all the configuration files and adapt the paths:
spark-env.sh
topology.py
Any *.txt, *.sh, *.py
With that, I managed to run beeline, hadoop, hdfs and hbase by pointing them at the hadoop-conf folder. I can use pyspark, but only in local mode. What I really want is to use pyspark with YARN.
So I set a bunch of env variables to make this work:
export HADOOP_CONF_DIR=$HOME/etc/client-config/spark-conf/yarn-conf
export SPARK_CONF_DIR=$HOME/etc/client-config/spark-conf/yarn-conf
export JAVA_HOME=/usr/local
export BIN_DIR=$HOME/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/bin
export PATH=$BIN_DIR:$JAVA_HOME/bin:$PATH
export PYSPARK_PYTHON=python3.6
export PYSPARK_DRIVER_PYTHON=python3.6
export OPENBLAS_NUM_THREADS=1
export MKL_NUM_THREADS=1
export SPARK_HOME=/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/spark
export PYSPARK_ARCHIVES_PATH=$(ZIPS=("$CDH_DIR"/lib/spark/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYSPARK_ARCHIVES_PATH
export SPARK_DIST_CLASSPATH=$HOME/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/hadoop/client/accessors-smart-1.2.jar:<ALL OTHER JARS FOR EVERY BINARIES>
Anyway, all of these paths exist and work, and since I've sed-edited every config file, they reference the same paths as the exported variables.
I launch my pyspark binary like this:
pyspark --conf "spark.master=yarn" --properties-file $HOME/etc/client-config/spark-conf/spark-defaults.conf --verbose
FYI, this is pyspark 2.4.0, and I've installed Java(TM) SE Runtime Environment (build 1.8.0_131-b11), the same one I found on the CDSW instance. I added the keystore with the company's public certificate, and I also generated a keytab for the Kerberos auth. Both of them work, since I can use hdfs with HADOOP_CONF_DIR=$HOME/etc/client-config/hadoop-conf.
In verbose mode I can see all the details and configuration from Spark. When I compare it with the CDSW session, they are nearly identical (apart from the modified paths), for example:
Using properties file: /home/docker4sg/etc/client-config/spark-conf/spark-defaults.conf
Adding default property: spark.lineage.log.dir=/var/log/spark/lineage
Adding default property: spark.port.maxRetries=250
Adding default property: spark.serializer=org.apache.spark.serializer.KryoSerializer
Adding default property: spark.driver.log.persistToDfs.enabled=true
Adding default property: spark.yarn.jars=local:/home/docker4sg/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/spark/jars/*,local:/home/docker4sg/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/spark/hive/*
...
After a few seconds it fails to create a SparkSession:
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
2022-02-22 14:44:14 WARN Client:760 - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
2022-02-22 14:44:14 ERROR SparkContext:94 - Error initializing SparkContext.
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 12: pyspark.zip:
...
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 12: pyspark.zip:
...
2022-02-22 14:44:15 WARN YarnSchedulerBackend$YarnSchedulerEndpoint:69 - Attempted to request executors before the AM has registered!
2022-02-22 14:44:15 WARN MetricsSystem:69 - Stopping a MetricsSystem that is not running
2022-02-22 14:44:15 WARN SparkContext:69 - Another SparkContext is being constructed (or threw an exception in its constructor). This may indicate an error, since only one SparkContext may be running in this JVM (see SPARK-2243). The other SparkContext was created at:
org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58
From what I understand, it fails for a reason I'm not sure of, then tries to fall back to another mode, which fails too.
The configuration file spark-conf/yarn-conf/yarn-site.xml specifies a ZooKeeper quorum:
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>corporate.machine.node1.name.net:9999,corporate.machine.node2.name.net:9999,corporate.machine.node3.name.net:9999</value>
</property>
Could it be that the YARN cluster does not accept traffic from an arbitrary IP (a Kubernetes pod IP or a personal IP from my computer)? The IP I'm working from is not on the whitelist, and at the moment I cannot ask to have it added. How can I know for sure that I'm looking in the right direction?
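One rough way to check this is to probe reachability from the machine or pod you submit from. This is only a sketch, using the ZooKeeper quorum host and port from the yarn-site.xml snippet below and the client config directory already exported above:
nc -vz corporate.machine.node1.name.net 9999                                 # refused or hanging suggests blocked traffic
yarn --config $HOME/etc/client-config/spark-conf/yarn-conf application -list  # works only if the ResourceManager is reachable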
Edit 1:
As said in the comments, the URI of pyspark.zip was wrong. I've modified my PYSPARK_ARCHIVES_PATH to point at the real location of pyspark.zip.
PYSPARK_ARCHIVES_PATH=local:$HOME/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/spark/python/lib/py4j-0.10.7-src.zip,local:$HOME/opt/cloudera/parcels/CDH-6.3.4-1.cdh6.3.4.p4484.8795072/lib/spark/python/lib/pyspark.zip
Now I get an UnknownHostException:
org.apache.spark.SparkException: Uncaught exception: org.apache.spark.SparkException: Exception thrown in awaitResult
...
Caused by: java.io.IOException: Failed to connect to <HOSTNAME>:13250
...
Caused by: java.net.UnknownHostException: <HOSTNAME>
...

Spark Submit: You have not specified a krb5.conf file locally or via a ConfigMap

I'm using spark-submit directly to submit a Spark application to a Kubernetes cluster, using the command below:
Submit Spark
./bin/spark-submit --master k8s://https://localhost:8443
--deploy-mode cluster --name spark-pi
--class com.xxx.Main
--conf spark.executor.instances=5
--conf spark.kubernetes.container.image=spark:latest
local:///home/SparkJob_jar spark.kubernetes.file.upload.path=/opt/xxx
But I'm getting the below exception:
20/09/25 11:33:57 WARN Utils: Your hostname, XXXX resolves to a loopback address: 127.0.1.1; using xxx.yyy.zzz instead (on interface wlp59s0)
20/09/25 11:33:57 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.0.1.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/09/25 11:33:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/09/25 11:33:58 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/09/25 11:33:58 INFO KerberosConfDriverFeatureStep: You have not specified a krb5.conf file locally or via a ConfigMap. Make sure that you have the krb5.conf locally on the driver image.
20/09/25 11:33:58 WARN WatchConnectionManager: Exec Failure
javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:131)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:326)
at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:269)
The post https://issues.apache.org/jira/browse/SPARK-31800 suggested adding spark.kubernetes.file.upload.path to solve the problem. I added it to the submit command, but the issue is still there. I'd appreciate any help figuring out the root cause of this issue.
You might have to add the following configuration flags to make the certificates available:
--conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
--conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
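For reference, here is a sketch of the full command with those flags folded in, keeping the paths from the question. It assumes the service-account files actually exist where the submission runs (they are normally only present inside a pod), and passes spark.kubernetes.file.upload.path as a --conf, as SPARK-31800 suggests:
./bin/spark-submit \
  --master k8s://https://localhost:8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class com.xxx.Main \
  --conf spark.executor.instances=5 \
  --conf spark.kubernetes.container.image=spark:latest \
  --conf spark.kubernetes.file.upload.path=/opt/xxx \
  --conf spark.kubernetes.authenticate.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  --conf spark.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
  local:///home/SparkJob_jar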

How to run an interactive spark application from spark-shell/spark-submit

I have a Spark app that reads a large dataset, loads it into memory, and sets everything up so the user can query the in-memory DataFrame multiple times. Once a query is done, the user is prompted on the console to either continue with a new set of input or quit the application.
I can do this very well in the IDE. However, can I run this interactive Spark app from spark-shell?
I've used Spark Job Server before to run multiple interactive queries on a DataFrame loaded in memory, but not from a shell. Any pointers?
Thanks!
UPDATE 1:
Here is what the project jar looks like; it is packaged with all the other dependencies.
jar tf target/myhome-0.0.1-SNAPSHOT.jar
META-INF/MANIFEST.MF
META-INF/
my_home/
my_home/myhome/
my_home/myhome/App$$anonfun$foo$1.class
my_home/myhome/App$.class
my_home/myhome/App.class
my_home/myhome/Constants$.class
my_home/myhome/Constants.class
my_home/myhome/RecommendMatch$$anonfun$1.class
my_home/myhome/RecommendMatch$$anonfun$2.class
my_home/myhome/RecommendMatch$$anonfun$3.class
my_home/myhome/RecommendMatch$.class
my_home/myhome/RecommendMatch.class
and ran spark-shell with the following options
spark-shell -i my_home/myhome/RecommendMatch.class --master local --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar
but the shell throws the following message on startup. The jars are loaded, as shown in the environment at localhost:4040.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/16 10:10:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/16 10:10:06 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.0.101:4040
Spark context available as 'sc' (master = local, app id = local-1494909601904).
Spark session available as 'spark'.
That file does not exist
Welcome to
...
UPDATE 2 (using spark-submit)
I tried with the full path to the jar. Next, I tried copying the project jar to the bin location.
pwd
/usr/local/Cellar/apache-spark/2.1.0/bin
spark-submit --master local —-class my_home.myhome.RecommendMatch.class --jars myhome-0.0.1-SNAPSHOT.jar
Error: Cannot load main class from JAR file:/usr/local/Cellar/apache-spark/2.1.0/bin/—-class
Try the -i <path_to_file> option to run the Scala code in your file, or the Scala shell's :load <path_to_file> command.
Relevant Q&A: Spark : how to run spark file from spark shell
The following command works to run an interactive spark application.
spark-submit /usr/local/Cellar/apache-spark/2.1.0/bin/myhome-0.0.1-SNAPSHOT.jar
Note that this is an uber jar built with the main class as the entry point and all dependent libraries bundled in. Check out http://maven.apache.org/plugins/maven-shade-plugin/
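If the jar's manifest does not declare a main class, you can pass it explicitly with --class. This is only a sketch, assuming my_home.myhome.RecommendMatch (from the jar listing above) is the main class; note the two plain hyphens and the fully qualified class name without a .class suffix:
spark-submit \
  --master local \
  --class my_home.myhome.RecommendMatch \
  /usr/local/Cellar/apache-spark/2.1.0/bin/myhome-0.0.1-SNAPSHOT.jar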

trouble in adding spark-csv package in Cloudera VM

I am using the Cloudera QuickStart VM to test out some pyspark work. For one task, I need to add the spark-csv package. Here is what I did:
PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0
pyspark started up fine; however, I did get warnings such as:
16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address: 127.0.0.1; using 10.0.2.15 instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
then I ran my code in pyspark:
yelp_df = sqlCtx.load(
source="com.databricks.spark.csv",
header = 'true',
inferSchema = 'true',
path = 'file:///directory/file.csv')
But I am getting an error message:
Py4JJavaError: An error occurred while calling o19.load.: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv at scala.sys.package$.error(package.scala:27)
What could have gone wrong? Thanks in advance for your help.
Try this:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0
Note that there is no space between -- and packages; the extra space in your original command is the typo.

Can't access hadoop with ip address

I'm following this guide for installing Hadoop on CentOS.
Everything works normally when I run Hadoop, and it matches the guide, but when I try to access it by IP address, e.g. 192.168.0.1:50070, nothing works.
Here is the output when I run Hadoop:
bash-4.2$ start-dfs.sh
14/10/15 16:28:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
14/10/15 16:29:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Do you think I have to configure an IP somewhere to access them? My configuration is exactly the same as in the link above, even the XML files...
Did you try disabling the firewall or adding an iptables rule for it on the master/slaves?
For CentOS 6.5, try:
service iptables stop
to disable the firewall (on CentOS 6 the firewall service is iptables, not firewall). If things then work properly, you just need to add an allow rule to your iptables configuration.
Also, CentOS ships with SELinux. I would advise turning it off and checking whether the error persists.
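A minimal sketch of those checks on CentOS 6.5, assuming the NameNode web UI is on port 50070 as in the question:
service iptables status                           # is the firewall running, and what does it allow?
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT  # allow the NameNode web UI instead of stopping the firewall entirely
service iptables save                             # persist the rule across restarts
getenforce                                        # check the current SELinux mode
setenforce 0                                      # temporarily switch SELinux to permissive while testing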
