Trying to run a spark-submit job on a YARN cluster but I keep getting the following warning. How do I fix the issue?

WARN YarnClusterScheduler: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
resources.
I have looked through similar questions and tried everything else that was mentioned. When I look through the yarn-nodemanager log I see the following warning, which might be causing the error. How do I fix these warnings?
2017-09-13 14:29:52,640 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The
Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class
org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'.
Because these are not the same tools trying to send ServiceData and read
Service Meta Data may have issues unless the refer to the name in the config.
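A quick way to cross-check both warnings from the command line is sketched below. This is only a sketch: the yarn-site.xml path is taken from the classpath listed further down, and the class-mapping property mentioned in the comments is a stock-Hadoop assumption rather than something confirmed for this cluster.
# Confirm that NodeManagers are registered and advertise non-zero memory/vcores;
# if none show up, the scheduler warning above is expected.
yarn node -list -all

# Inspect the aux-service wiring. A companion property that stock Hadoop setups
# usually carry (an assumption here, not taken from this question's config) maps
# the service name to the shuffle handler class:
#   yarn.nodemanager.aux-services.mapreduce_shuffle.class =
#     org.apache.hadoop.mapred.ShuffleHandler
grep -A 1 'yarn.nodemanager.aux-services' /usr/local/hadoop/etc/hadoop/yarn-site.xml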
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop/etc/hadoop,
/usr/local/hadoop/share/hadoop/common/*,
/usr/local/hadoop/share/hadoop/common/lib/*,
/usr/local/hadoop/share/hadoop/hdfs/*,
/usr/local/hadoop/share/hadoop/hdfs/lib/*,
/usr/local/hadoop/share/hadoop/mapreduce/*,
/usr/local/hadoop/share/hadoop/mapreduce/lib/*,
/usr/local/hadoop/share/hadoop/yarn/*,
/usr/local/hadoop/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<description>
Number of seconds after an application finishes before the nodemanager's
DeletionService will delete the application's localized file directory
and log directory.
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://sandbox:9000</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>true</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Please let me know if I am looking for a solution to my initial warning in the wrong place, because the application keeps running but no data is sent to HDFS. Thank you!
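For context, a spark-submit invocation that stays inside the 2 vcores configured above looks roughly like the sketch below; the script name and the memory figures are placeholders, not values from this cluster.
# Request fewer resources than a single NodeManager advertises, so the
# ApplicationMaster and one executor can both be placed.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 1g \
  --driver-memory 1g \
  job.py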

cannot stat '/user/hadoop/logs/datanode-cluster

I am trying to run a multi-step job where one of the steps is a script that uses PySpark/Apache Spark. I have a 4-node computer cluster with a SLURM job scheduler and am wondering how I can run them together. Currently, I have Spark on all the nodes (with the head node acting as the "master" and the remaining 3 compute nodes as "slaves") and Hadoop (with the head node as the namenode and secondary namenode, and the remaining 3 compute nodes as datanodes).
However, when I start Hadoop on the head node with start-all.sh, I only see a single datanode, and when I try to start it I get an error saying:
localhost: mv: cannot stat '/user/hadoop/logs/datanode-cluster-n1.out.4': No such file or directory
localhost: mv: cannot stat '/user/hadoop/logs/datanode-cluster-n1.out.3': No such file or directory
localhost: mv: cannot stat '/user/hadoop/logs/datanode-cluster-n1.out.2': No such file or directory
localhost: mv: cannot stat '/user/hadoop/logs/datanode-cluster-n1.out.1': No such file or directory
localhost: mv: cannot stat '/user/hadoop/logs/datanode-cluster-n1.out': No such file or directory
However, these files exist and seem to be readable/writable. Spark starts fine and the 3 slave nodes can be started from the head node. Because of the error mentioned before, when I submit my job to SLURM it throws the error above. I would appreciate any advice on this issue and on the architecture of my process.
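Since the mv failures come from rotating old .out files, a useful first step is to look at the permissions of the logs directory and at the actual datanode log on a worker. A minimal sketch; the /user/hadoop/logs path and the datanode-cluster-n1.out file name are taken from the error above.
# On a worker node (e.g. cluster-n1): check who owns the log directory and
# whether the hadoop user can write to it.
ls -ld /user/hadoop/logs
ls -l /user/hadoop/logs | head

# Then read the most recent datanode output file for the real startup error;
# the mv warnings themselves only concern log rotation.
tail -n 50 /user/hadoop/logs/datanode-cluster-n1.out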
Edit 1: Hadoop config files
core-site.xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://cluster-hn:9000</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permission</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/s1/snagaraj/hadoop/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/s1/snagaraj/hadoop/dataNode</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
<description>The https port where namenode binds</description>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>0</value>
</property>
</configuration>
workers file:
localhost
cluster-n1
cluster-n2
cluster-n3
I have been facing this same issue. I could fix it by giving 775 permissions to the logs directory recursively, i.e., in my case: chmod -R 775 /home/admin/hadoop/logs
Now the "mv: cannot stat ... .out': No such file or directory" error is gone.

How to configure the Hive cli when using the Spark execution engine?

I have set hive.execution.engine to spark and am also using a Spark-enabled queue. Spark SQL is able to access the Hive tables, and so is beeline from a directly connected cluster machine.
But the Hive CLI seems to need additional steps. So far the following have been done:
** Copy the scala libraries to the $HIVE_HOME/libs dir (or we get ClassNotFoundException)
** Run the following at the start of the hive script (or in .hiverc)
set hive.execution.engine=spark;
set mapred.job.queue.name=root.spark.sbg.hos;
However, the following error now happens: Failed to create spark client.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
hive (default)> insert into sb.test2 values (1,'ab');
Query ID = sboesch_20171030175629_dc310c9a-519e-4f84-a632-f3a44f1df8c3
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Has anyone managed to connect to the Spark backend for Hive? I am connecting via vanilla Hive (not Cloudera, Hortonworks, or MapR).
You have to start the Hive Metastore server separately to access Hive tables through Spark.
Try hive --service metastore in a new terminal; you will get a response like Starting Hive Metastore Server.
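For completeness, a minimal sketch of that sequence (the nohup/log redirection is just a convenience, not a requirement):
# Start the metastore service in the background and keep its output in a log.
nohup hive --service metastore > metastore.log 2>&1 &

# Then, in another terminal, retry the insert through the Hive CLI.
hive -e "set hive.execution.engine=spark; insert into sb.test2 values (1,'ab');"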
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>**mysql metastore username**</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>**mysql metastore DB password**</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hivequerylogs/${user.name}</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/apache-hive-2.1.1-bin/lib/hive-hbase-handler-2.1.1.jar,file:///usr/local/hive/apache-hive-2.1.1-bin/lib/zookeeper-3.4.6.jar</value>
<description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>PAM</value>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>org.apache.hive.service.auth.PamAuthenticationProvider</value>
</property>
<property>
<name>hive.server2.authentication.pam.services</name>
<value>sshd,sudo</value>
</property>
<property>
<name>hive.stats.dbclass</name>
<value>jdbc:mysql</value>
</property>
<property>
<name>hive.stats.jdbcdriver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.session.history.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.insert.dest.volume</name>
<value>false</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive/${user.name}</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
<description/>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
<description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateConstraints</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateColumns</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
</property>
</configuration>
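Beyond the metastore, Hive on Spark also needs to find the Spark runtime and to know which master to submit to. The sketch below shows what that can look like; the Spark 2.x jar layout and the yarn master value are assumptions, not details taken from this question.
# Make the Spark/Scala jars visible to Hive (Spark 2.x jar layout assumed).
ln -s $SPARK_HOME/jars/scala-library-*.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-core_*.jar $HIVE_HOME/lib/
ln -s $SPARK_HOME/jars/spark-network-common_*.jar $HIVE_HOME/lib/

# Point Hive at a Spark master before running queries (could also go in .hiverc).
hive -e "
set hive.execution.engine=spark;
set spark.master=yarn;
set mapred.job.queue.name=root.spark.sbg.hos;
insert into sb.test2 values (1,'ab');
"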

Yarn nodemanager not starting up. Getting no errors

I have Hadoop 2.7.4 installed on Ubuntu 16.04. I'm trying to run it in Pseudo Mode.
I have a '/hadoop' partition mounted for all my hadoop files, NameNode and DataNode files.
My core-site.xml is:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
My hdfs-site.xml is:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/hadoop/nodes/namenode</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/hadoop/nodes/datanode</value>
</property>
</configuration>
My mapred-site.xml is:
<configuration>
<property>
<name>Map-Reduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
My yarn-site.xml is:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>Map-Reduce_shuffle</value>
</property>
</configuration>
After running
$ start-dfs.sh
$ start-yarn.sh
$ jps
I get the following daemons running.
2800 ResourceManager
2290 NameNode
4242 Jps
2440 DataNode
2634 SecondaryNameNode
start-yarn.sh gives me:
$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /hadoop/hadoop-2.7.4/logs/yarn-abdy-resourcemanager-abdy-hadoop.out
localhost: starting nodemanager, logging to /hadoop/hadoop-2.7.4/logs/yarn-abdy-nodemanager-abdy-hadoop.out
The nodemanager daemon does not seem to start at all.
I've tried for 2 days to fix this issue but I cannot seem to find a fix. Can someone please guide me?
If you're going to start the Hadoop daemons for the first time, you first have to format your namenode. Before formatting the namenode, make sure you delete the existing /hadoop/nodes/namenode and /hadoop/nodes/datanode folders. Then execute:
hadoop namenode -format
Once formatting of the namenode is done, execute the following commands:
start-dfs.sh
start-yarn.sh
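If the NodeManager still refuses to come up after a clean format, its own log usually says why. A short verification sketch; the .log file name is inferred from the .out path that start-yarn.sh printed above, so treat it as an assumption.
# Check whether a NodeManager JVM appeared at all.
jps | grep NodeManager

# If not, read the NodeManager log/out files that start-yarn.sh pointed to.
tail -n 100 /hadoop/hadoop-2.7.4/logs/yarn-abdy-nodemanager-abdy-hadoop.log
tail -n 50  /hadoop/hadoop-2.7.4/logs/yarn-abdy-nodemanager-abdy-hadoop.out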

HMaster automatically stops in HBase

I installed and configured Hadoop (version 2.7.0) and HBase (version 1.2.3) in pseudo-distributed mode. I tested Hadoop with a test program (Word Count) and everything is OK. Before I enter the HBase shell and list the tables, HMaster is running, but when I enter the hbase shell and list the tables (or create a table), I see this error:
hbase(main):001:0> list
TABLE
ERROR: Can't get master address from ZooKeeper; znode data == null
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
And when I come back from the HBase shell and run jps, I see that there is no HMaster running, but HRegionServer and ZooKeeper are still running.
Here is my hbase-site.xml configuration:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
Here is my core-site.xml configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
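Since the shell only reports the missing znode, the HMaster log and the znode itself are the usual places to look next. A minimal sketch; the log path assumes a default $HBASE_HOME/logs layout, which is not stated in the question.
# The master usually logs the reason it aborted (for example a mismatch between
# hbase.rootdir and fs.default.name, or a stale ZooKeeper state).
tail -n 100 $HBASE_HOME/logs/hbase-*-master-*.log

# Open the ZooKeeper CLI bundled with HBase and inspect the /hbase znode
# (run "ls /hbase" at the prompt; the master address should live under it).
hbase zkcli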

Hadoop: each namenode and datanode only lasts for a moment

Using CentOS 5.4.
Three virtual machines (using VMware Workstation): master, slave1, slave2. master is used for the namenode, and slave1 and slave2 are used for the datanodes.
The Hadoop version is hadoop-0.20.1.tar.gz. I have configured all the relevant files and closed the firewall as the root user using the command /sbin/service iptables stop. Then I tried to format the namenode and start Hadoop on the master (namenode) virtual machine with the following commands; no error was reported.
bin/hadoop namenode -format
bin/start-all.sh
Then I typed the command "jps" on the master machine right away, and found the expected result:
5144 JobTracker
4953 NameNode
5079 SecondaryNameNode
5216 Jps
But after several seconds, when I typed the "jps" command again, all the virtual machines had only one process: Jps. The following is the result displayed on the namenode (master):
5236 Jps
What's the matter? Or how can I find out what caused it? Does it mean that it cannot find any namenode or datanode? Thank you.
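When the daemons exit a few seconds after start-all.sh, the per-daemon log files are usually where the reason shows up. A minimal sketch, assuming the default logs directory inside the Hadoop install; the exact file names depend on the user and host names.
# On the master: look for the fatal exception in the namenode log.
ls $HADOOP_HOME/logs/
tail -n 50 $HADOOP_HOME/logs/hadoop-*-namenode-*.log

# On a slave: do the same for the datanode log.
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log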
Attachment: all the places I have modified:
hadoop-env.sh:
# set java environment
export JAVA_HOME=/usr/jdk1.6.0_13/
core-site.xml:
<configuration>
<property>
<name>master.node</name>
<value>namenode_master</value>
<description>master</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>local dir</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://${master.node}:9000</value>
<description> </description>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>${hadoop.tmp.dir}/hdfs/name</value>
<description>local dir</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>${hadoop.tmp.dir}/hdfs/data</value>
<description> </description>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>${master.node}:9001</value>
<description> </description>
</property>
<property>
<name>mapred.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description> </description>
</property>
<property>
<name>mapred.system.dir</name>
<value>/tmp/mapred/system</value>
<description>hdfs dir</description>
</property>
</configuration>
master:
master
slaves:
slave1
slave2
/etc/hosts:
192.168.190.133 master
192.168.190.134 slave1
192.168.190.135 slave2
From the log files, I found that I should change namenode_master to master in the file core-site.xml. Now it works.
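In other words, the ${master.node} placeholder in fs.default.name has to resolve to a reachable host name. A sketch of verifying the change; the grep targets are taken from the files above, and the conf directory assumes the Hadoop 0.20 layout.
# master.node should now be "master", which /etc/hosts maps to 192.168.190.133.
grep -A 1 'master.node' $HADOOP_HOME/conf/core-site.xml
grep master /etc/hosts

# Restart and confirm the daemons stay up this time.
bin/stop-all.sh && bin/start-all.sh
jps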