I'm following this guide for installing Hadoop on CentOS.
Everything looks normal when I start Hadoop, and the output matches the guide, but when I try to access my instance by IP address, nothing works.
Here is the output when I start Hadoop:
14/10/15 16:28:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Starting secondary namenodes [] starting secondarynamenode, logging to /home/hadoop/hadoop/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
14/10/15 16:29:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Do you think I have to configure the IP somewhere to access them? My configuration is exactly the same as in the guide linked above, even the XML files.

Did you try disabling the firewall, or adding an iptables rule for the Hadoop ports, on the master and slaves?
On CentOS 6.5, try:
service iptables stop
to disable the firewall. If everything then works, you only need to add an allow rule for those ports to iptables.
Also, CentOS ships with SELinux. I would advise turning it off temporarily and checking whether the error persists.
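One way to tell a firewall problem from a Hadoop problem is to test, from the client machine, whether the ports accept TCP connections at all. Here is a minimal Python sketch of that check. It assumes the Hadoop 2.x default web UI ports (50070 for the NameNode, 8088 for the ResourceManager); the function name port_is_reachable and the address in the comment are made up for illustration.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example use (placeholder address, assumed default ports):
# for port in (50070, 8088):
#     print(port, port_is_reachable("192.0.2.10", port))
```

If the check fails from another machine but succeeds when run on the server itself, a firewall rule is the likely cause.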


Apache PySpark - Failed to connect to master 7077

I set up Spark and HDFS after watching this video. The only difference is that I did it on a server (Ubuntu) and not on a VM.
On the server, everything works perfectly. Now I want to access it from my local machine (Windows) with PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("spark://ubuntu-spark:7077").appName("test").getOrCreate()
However, here I get the following error messages:
22/11/12 10:38:35 WARN Shell: Did not find winutils.exe: HADOOP_HOME and hadoop.home.dir are unset. -see
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use
22/11/12 10:38:35 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
22/11/12 10:38:37 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master
org.apache.spark.SparkException: Exception thrown in awaitResult: ...
According to other posts, the DNS should be correct. I got this from the Spark Master website (at port 8080):
URL: spark://ubuntu-spark:7077
Alive Workers: 1
Cores in use: 2 Total, 0 Used
Memory in use: 6.8 GiB Total, 0.0 B Used
Resources in use:
Applications: 0 Running, 0 Completed
Drivers: 0 Running, 0 Completed
Status: ALIVE
The ports are open. I also don't understand the following message: "HADOOP_HOME and hadoop.home.dir are unset." Hadoop is configured on the server. Why should I do the same thing locally again? My expectation would be that I can use Spark like an API or am I wrong?
Thank you very much for your help. If you need any configuration files I can provide them.
Hadoop should not be necessary for the code shown since you're not using HDFS, but the log is saying it's looking on your Windows machine for those settings.
DNS needs to work between your Windows machine and wherever your server is running (a VM can still be a server, so it's unclear where you're running this). Start debugging with ping ubuntu-spark to check; you should also be able to open ubuntu-spark:8080 from a browser on the Windows machine.
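The name lookup can also be checked from Python on the Windows machine, before Spark gets involved. This is a rough sketch, not part of PySpark: resolve_host is a hypothetical helper, and ubuntu-spark is the master hostname taken from the question.

```python
import socket
from typing import List


def resolve_host(hostname: str) -> List[str]:
    """Return the sorted unique IP addresses a hostname resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        # The name could not be resolved by DNS or the local hosts file.
        return []
    return sorted({info[4][0] for info in infos})


# Example use:
# print(resolve_host("ubuntu-spark"))
```

An empty list means the Windows machine cannot resolve the name at all, in which case the spark:// URL cannot work until DNS or the local hosts file is fixed.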
If you only want to run Spark code, and don't care whether it's distributed, you could just use Docker on Windows, or set up PyCharm entirely locally for the same purpose.

Why does submitting a Spark application to Mesos fail with 'Failed to load native Mesos library'?

I'm getting the following exception when I'm trying to submit a Spark application to a Mesos cluster:
/home/knoldus/application/spark-2.2.0-rc4/conf/ line 40: export: `/usr/local/lib/': not a valid identifier
/home/knoldus/application/spark-2.2.0-rc4/conf/ line 41: export: `hdfs://spark-2.2.0-bin-hadoop2.7.tgz': not a valid identifier
Using Spark's default log4j profile: org/apache/spark/
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/09/30 14:17:31 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/09/30 14:17:31 WARN Utils: Your hostname, knoldus resolves to a loopback address:; using instead (on interface wlp6s0)
17/09/30 14:17:31 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Failed to load native Mesos library from
java.lang.UnsatisfiedLinkError: Expecting an absolute path of the library:
at java.lang.Runtime.load0(
at java.lang.System.load(
at org.apache.mesos.MesosNativeLibrary.load(
at org.apache.mesos.MesosNativeLibrary.load(
at org.apache.mesos.MesosSchedulerDriver.<clinit>(
at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$class.createSchedulerDriver(MesosSchedulerUtils.scala:104)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.createSchedulerDriver(MesosCoarseGrainedSchedulerBackend.scala:49)
at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.start(MesosCoarseGrainedSchedulerBackend.scala:170)
at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:173)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:509)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:909)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:901)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:901)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:103)
... 47 elided
I have built spark using
./build/mvn -Pmesos -DskipTests clean package
I have set the following properties in
export MESOS_NATIVE_JAVA_LIBRARY= /usr/local/lib/
export SPARK_EXECUTOR_URI= hdfs://spark-2.2.0-bin-hadoop2.7.tgz
And in spark-defaults.conf :
spark.executor.uri hdfs://spark-2.2.0-bin-hadoop2.7.tgz
I have resolved the issue.
The problem is that there must be no space on either side of the equals sign when exporting a variable. These lines are wrong:
export MESOS_NATIVE_JAVA_LIBRARY= /usr/local/lib/
export SPARK_EXECUTOR_URI= hdfs://spark-2.2.0-bin-hadoop2.7.tgz
Here the shell assigns an empty value to each variable and then tries to export the path or URI as a second variable name, which is what produces the "not a valid identifier" messages. Similarly, with
export foo = bar
the shell interprets that as a request to export three names: foo, = and bar. = isn't a valid variable name, so the command fails. The variable name, equals sign and its value must not be separated by spaces for them to be processed as a simultaneous assignment and export.
Remove the spaces:
export MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/
export SPARK_EXECUTOR_URI=hdfs://spark-2.2.0-bin-hadoop2.7.tgz
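This kind of mistake is easy to scan for before starting Spark. The following is a heuristic sketch only: find_bad_exports is a made-up helper, and the regular expression covers just the simple export NAME=value form, so it can misjudge unusual lines such as an empty assignment followed by a comment.

```python
import re

# Matches `export NAME = value`, `export NAME =value` and `export NAME= value`.
_BAD_EXPORT = re.compile(r"^\s*export\s+[A-Za-z_][A-Za-z0-9_]*(\s+=|=\s+\S)")


def find_bad_exports(text):
    """Return (line_number, line) pairs for export lines with spaces around '='."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        if _BAD_EXPORT.match(line):
            bad.append((number, line.strip()))
    return bad


# Example use:
# with open("path/to/your/env/script") as handle:
#     for number, line in find_bad_exports(handle.read()):
#         print(number, line)
```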

error initialising SparkContext, running Spark on bash on ubuntu on windows

I am attempting to install and configure Spark 2.0.1 on Bash on Ubuntu on Windows. I followed the instructions at Apache Spark - Installation and everything seemed to install OK; however, when I run spark-shell this happens:
16/11/06 11:25:47 ERROR SparkContext: Error initializing SparkContext. Invalid argument
at Method)
at io.netty.bootstrap.AbstractBootstrap$
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(
at io.netty.util.concurrent.SingleThreadEventExecutor$
Immediately prior to that error I see a warning which may or may not be related:
16/11/06 11:25:47 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/11/06 11:25:47 WARN Utils: Your hostname, DESKTOP-IKGIG97 resolves to a loopback address:; using instead (on interface wifi0)
16/11/06 11:25:47 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
I must admit I'm a bit of a noob with Linux, so I'm a bit clueless as to what to do next. In case it matters, here are the contents of /etc/hosts:
localhost DESKTOP-IKGIG97
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
Hoping someone here can identify my issue. What do I need to do to investigate and fix this error?
As the warning suggests, set SPARK_LOCAL_IP in the conf/ script in the directory where Spark is installed.
Environment Variables
Certain Spark settings can be configured through environment variables, which are read from the conf/ script in the directory where Spark is installed (or conf/spark-env.cmd on Windows). In Standalone and Mesos modes, this file can give machine specific information such as hostnames. It is also sourced when running local Spark applications or submission scripts.
Note that conf/ does not exist by default when Spark is installed. However, you can copy conf/ to create it. Make sure you make the copy executable.
The following variables can be set in
SPARK_LOCAL_IP IP address of the machine to bind to.
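To see whether a machine is in the situation the warning describes, you can check what its hostname resolves to. This sketch only mirrors the idea behind the warning and does not reproduce Spark's own logic; both function names are hypothetical.

```python
import ipaddress
import socket


def is_loopback_address(address):
    """Return True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def hostname_is_loopback(hostname=None):
    """Return True if the hostname (default: this machine's) resolves to loopback."""
    name = hostname or socket.gethostname()
    try:
        address = socket.gethostbyname(name)  # IPv4 lookup only
    except OSError:
        # The name does not resolve at all, so it is not a loopback address.
        return False
    return is_loopback_address(address)


# Example use:
# print(hostname_is_loopback())
```

If this returns True, setting SPARK_LOCAL_IP to the address of the interface you actually want Spark to bind to is the documented way to override it.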
If that doesn't solve your problem, please share the output of the following Unix command:

trouble in adding spark-csv package in Cloudera VM

I am using the Cloudera QuickStart VM to test out some PySpark work. For one task, I need to add the spark-csv package. Here is what I did:
PYSPARK_DRIVER_PYTHON=ipython pyspark -- packages com.databricks:spark-csv_2.10:1.3.0
pyspark started up fine, however I did get warnings as:
16/02/09 17:41:22 WARN util.Utils: Your hostname, quickstart.cloudera resolves to a loopback address:; using instead (on interface eth0)
16/02/09 17:41:22 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
16/02/09 17:41:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
then I ran my code in pyspark:
yelp_df = sqlCtx.load(
    header='true',
    inferSchema='true',
    path='file:///directory/file.csv')
But I am getting an error message:
Py4JJavaError: An error occurred while calling o19.load.: java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv at scala.sys.package$.error(package.scala:27)
What could have gone wrong?? Thanks in advance for your help.
Try this
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.3.0
There's a typo in your command: there must be no space between -- and packages.

No start of Hadoop components on the slave machine

I am trying to set up a multi-node Hadoop cluster on my two laptops, following Michael Noll's tutorial. The OS on both machines is Ubuntu 14.04.
I managed to set up single-node clusters on each of the two laptops, but when I try to start the multi-node cluster (after all the necessary modifications as instructed in the tutorial) using sbin/ on my master, the slave does not react at all. All five components start on the master, but not a single one starts on the slave.
My /etc/hosts looks like this on both PCs:
localhost master slave
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
(Furthermore, in /usr/local/hadoop/etc/hadoop there was no file called master, so I created it using: touch /usr/local/hadoop/etc/hadoop/master)
Then, when I run sbin/, I see the following:
hduser@master:/usr/local/hadoop$ sbin/
This script is Deprecated. Instead use and
15/05/17 21:21:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hduser-namenode-master.out
localhost: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hduser-datanode-master.out
Starting secondary namenodes [] starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hduser-secondarynamenode-master.out
15/05/17 21:21:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hduser-resourcemanager-master.out
localhost: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hduser-nodemanager-master.out
hduser@master:/usr/local/hadoop$ jps
3716 DataNode
3915 SecondaryNameNode
4522 Jps
3553 NameNode
4210 NodeManager
4073 ResourceManager
Interestingly, line 6 of the output says localhost. Shouldn't it be master?
I can connect to the slave from the master without a password using ssh slave, and control the slave machine, but sbin/ still does not start any Hadoop components on the slave.
Very interestingly, if I run sbin/ on the slave, it starts NameNode on the master (!!!) and starts NodeManager and ResourceManager on the slave itself.
Can someone help me to properly start the multi-node cluster?
P.S.: I looked at this, but in my case the location of the Hadoop home is identical on both machines.
It could be several things:
Check that you can also connect with passwordless ssh from slave to master. Here is a link that explains how to do it.
Is the hostname of each machine correct?
Is the /etc/hosts file identical on both master and slave?
Have you checked the IP of both machines with ifconfig -a? Are they the ones you expected?
Have you changed all the configuration files on the slave machine so that they say the master's hostname instead of localhost? You should search for localhost and similar values in all the files under your $HADOOP_HOME directory, because there are several configuration files and it's very easy to miss one. Something like this: sudo grep -Ril "localhost" /usr/local/hadoop/etc/hadoop
Check the same on the master, so that its files say its own hostname instead of localhost.
You should remove the localhost entry from the /etc/hosts file on the slave machine. That entry, so typical of Hadoop tutorials, can sometimes lead to problems.
On the slave host, the masters and slaves files should say only "slave"; on the master host, the masters file should say "master" and the slaves file should say "slave".
You should format the filesystem on both nodes before starting Hadoop.
Those are all the problems I remember running into when I did what you are doing now. Check whether any of them helps!
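The search for leftover localhost entries can also be scripted, which helps if you want to run the same check on both machines. This is a rough Python equivalent of the grep -Ril command above; files_mentioning is a hypothetical helper, and the directory in the example is the one from the question.

```python
import os


def files_mentioning(root, needle="localhost"):
    """Return sorted paths under root whose contents contain needle, ignoring case."""
    needle = needle.lower()
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "r", errors="ignore") as handle:
                    if needle in handle.read().lower():
                        hits.append(path)
            except OSError:
                # Skip files that cannot be opened (permissions, broken links).
                continue
    return sorted(hits)


# Example use:
# for path in files_mentioning("/usr/local/hadoop/etc/hadoop"):
#     print(path)
```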
