I installed and configured Hadoop (version 2.7.0) and HBase (version 1.2.3) in pseudo-distributed mode. I tested Hadoop with a test program (WordCount) and everything works fine. Before I enter the HBase shell, HMaster is running, but when I enter the HBase shell and list the tables (or create a table), I see this error:
hbase(main):001:0> list
TABLE
ERROR: Can't get master address from ZooKeeper; znode data == null
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
And when I exit the HBase shell and run jps, I see that HMaster is no longer running, but HRegionServer and ZooKeeper are still running.
Here is my hbase-site.xml configuration:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
Here is my core-site.xml configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>
I am trying to access a Hive cluster without having Hive downloaded on my machine. I read here that I just need a JDBC client to do so. I have a URL, username, and password for the Hive cluster. I have tried making a hive-site.xml with these, as well as doing it programmatically, although the programmatic approach does not seem to have a place to input a username and password. No matter what I do, the following error keeps me from accessing Hive:
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
From the answers to this error online, I feel like this is because I do not have Hive downloaded on my computer. What exactly do I need to do to access it without Hive downloaded, or do I actually have to download it? Here is my code for reference:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("interfacing spark sql to hive metastore without configuration file") \
    .config("hive.metastore.uris", "https://prod-fmhdinsight-eu.azurehdinsight.net") \
    .enableHiveSupport() \
    .getOrCreate()

data = [('First', 1), ('Second', 2), ('Third', 3), ('Fourth', 4), ('Fifth', 5)]
df = spark.createDataFrame(data)

# see the frame created
df.show()

# write the frame
df.write.mode("overwrite").saveAsTable("t4")
and the hive-site.xml:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>https://prod-fmhdinsight-eu.azurehdinsight.net</value>
</property>
<!--
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
-->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>https://prod-fmhdinsight-eu.azurehdinsight.net</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>username</value>
<description>user name for connecting to mysql server
</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password for connecting to mysql server
</description>
</property>
</configuration>
tl;dr Use the spark.sql.hive.metastore.jars configuration property with the maven value to let Spark SQL download the required jars.
The other options are builtin (which simply assumes Hive 1.2.1) and a classpath of the Hive JARs (e.g. spark.sql.hive.metastore.jars="/Users/jacek/dev/apps/hive/lib/*").
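For example, a minimal spark-shell invocation might look like this (a sketch; the 2.1 version below is an assumed placeholder, match it to your actual metastore version):

$SPARK_HOME/bin/spark-shell \
  --conf spark.sql.hive.metastore.version=2.1 \
  --conf spark.sql.hive.metastore.jars=maven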
If your Hive metastore is available remotely via the thrift protocol, you may want to create $SPARK_HOME/conf/hive-site.xml as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
A nice feature of Hive is that configuration properties can be defined as Java system properties, so the above would look as follows:
$SPARK_HOME/bin/spark-shell \
--driver-java-options="-Dhive.metastore.uris=thrift://localhost:9083"
You may want to add the following to conf/log4j.properties for more low-level logging:
log4j.logger.org.apache.spark.sql.hive.HiveUtils$=ALL
log4j.logger.org.apache.spark.sql.internal.SharedState=ALL
I have set hive.execution.engine to spark and am also using a Spark-enabled queue. Spark SQL is able to access the Hive tables, and so is Beeline from a directly connected cluster machine.
But the Hive CLI seems to need additional steps. So far the following have been done:
- Copy the Scala libraries to the $HIVE_HOME/libs dir (or we get a ClassNotFoundException)
- Run the following at the start of the hive script (or in .hiverc):
set hive.execution.engine=spark;
set mapred.job.queue.name=root.spark.sbg.hos;
However, the following error now happens: Failed to create spark client.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
hive (default)> insert into sb.test2 values (1,'ab');
Query ID = sboesch_20171030175629_dc310c9a-519e-4f84-a632-f3a44f1df8c3
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Has anyone managed to connect to the Spark backend for Hive? I am connecting via vanilla Hive (not Cloudera, Hortonworks, or MapR).
You have to start the Hive metastore server separately to access Hive tables through Spark.
Try hive --service metastore in a new terminal; you will get a response like Starting Hive Metastore Server.
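For example, a minimal sketch (assuming the metastore's default thrift port 9083, which your hive.metastore.uris setting may override):

# start the metastore service in the background (thrift, port 9083 by default)
hive --service metastore &

# check that it is listening (adjust the port if your setup differs)
netstat -ln | grep 9083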
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>**mysql metastore username**</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>**mysql metastore DB password**</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hivequerylogs/${user.name}</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/apache-hive-2.1.1-bin/lib/hive-hbase-handler-2.1.1.jar,file:///usr/local/hive/apache-hive-2.1.1-bin/lib/zookeeper-3.4.6.jar</value>
<description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>PAM</value>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>org.apache.hive.service.auth.PamAuthenticationProvider</value>
</property>
<property>
<name>hive.server2.authentication.pam.services</name>
<value>sshd,sudo</value>
</property>
<property>
<name>hive.stats.dbclass</name>
<value>jdbc:mysql</value>
</property>
<property>
<name>hive.stats.jdbcdriver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.session.history.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.insert.dest.volume</name>
<value>false</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive/${user.name}</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
<description/>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
<description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateConstraints</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateColumns</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
</property>
</configuration>
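If the MySQL metastore schema behind a configuration like this has never been initialized, a one-time schematool run is usually needed first (a hedged sketch; schematool ships under $HIVE_HOME/bin in Hive 2.x):

# one-time initialization of the metastore schema in MySQL
$HIVE_HOME/bin/schematool -dbType mysql -initSchema

# later, verify that the schema version matches your Hive release
$HIVE_HOME/bin/schematool -dbType mysql -info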
WARN YarnClusterScheduler: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
resources.
I have looked through similar questions and tried everything else that was mentioned. When I look through the yarn-nodemanager log on HDFS, I see the following warning that might be causing the error. How do I fix these warnings?
2017-09-13 14:29:52,640 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The
Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class
org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'.
Because these are not the same tools trying to send ServiceData and read
Service Meta Data may have issues unless the refer to the name in the config.
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<description>
Number of seconds after an application finishes before the nodemanager's
DeletionService will delete the application's localized file directory
and log directory.
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://sandbox:9000</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>true</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Please let me know if I am looking for a solution to my initial warning in the wrong direction, because the application keeps running but no data is sent to HDFS. Thank you!
I have installed:
1) scala-2.10.3
2) spark-1.0.0
Changed spark-env.sh with the variables below:
export SCALA_HOME=$HOME/scala-2.10.3
export SPARK_WORKER_MEMORY=16g
I can see the Spark master.
3) shark-0.9.1-bin-hadoop1
Changed shark-env.sh with the variables below:
export SHARK_MASTER_MEM=1g
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export HIVE_HOME=/usr/share/dse/hive
export HIVE_CONF_DIR="/etc/dse/hive"
export SPARK_HOME=/home/ubuntu/spark-1.0.0
export SPARK_MEM=16g
source $SPARK_HOME/conf/spark-env.sh
4) In DSE, the Hive version is Hive 0.11.
The existing hive-site.xml is:
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.mode.local.auto</name>
<value>false</value>
<description>Let hive determine whether to run in local mode automatically</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>cfs:///user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}</description>
</property>
<property>
<name>hive.metastore.rawstore.impl</name>
<value>com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore</value>
<description>Use the Apache Cassandra Hive RawStore implementation</description>
</property>
<property>
<name>hadoop.bin.path</name>
<value>${dse.bin}/dse hadoop</value>
</property>
<!-- Set this to true to enable auto-creation of Cassandra keyspaces as Hive Databases -->
<property>
<name>cassandra.autoCreateHiveSchema</name>
<value>true</value>
</property>
</configuration>
5) While running the Shark shell, I get this error:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
And
6) While running the Shark shell with -skipRddReload, I'm able to get the Shark shell, but I am not able to connect to Hive or execute any commands.
shark> DESCRIBE mykeyspace;
and I get the error message:
FAILED: Error in metastore: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Please provide details on how to configure Spark/Shark on DataStax Enterprise (Cassandra).
Using CentOS 5.4.
Three virtual machines (using VMware Workstation): master, slave1, and slave2. master is used for the namenode, and slave1 and slave2 are used for the datanodes.
The Hadoop version is hadoop-0.20.1.tar.gz. I have configured all the relevant files and stopped the firewall as the root user using the command /sbin/service iptables stop. Then I formatted the namenode and started Hadoop on the master (namenode) virtual machine with the following commands; no error was reported.
bin/hadoop namenode -format
bin/start-all.sh
Then I typed the command jps on the master machine right away, and found the expected result:
5144 JobTracker
4953 NameNode
5079 SecondaryNameNode
5216 Jps
But after several seconds, when I typed the jps command again, all the virtual machines had only one process: Jps. The following is the result displayed on the namenode (master):
5236 Jps
What's the matter? How can I find out what caused it? Does it mean that it cannot find any namenode or datanode? Thank you.
Attachment: all the files I have modified:
hadoop-env.sh:
# set java environment
export JAVA_HOME=/usr/jdk1.6.0_13/
core-site.xml:
<configuration>
<property>
<name>master.node</name>
<value>namenode_master</value>
<description>master</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>local dir</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://${master.node}:9000</value>
<description> </description>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>${hadoop.tmp.dir}/hdfs/name</value>
<description>local dir</description>
</property>
<property>
<name>dfs.data.dir</name>
<value>${hadoop.tmp.dir}/hdfs/data</value>
<description> </description>
</property>
</configuration>
mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>${master.node}:9001</value>
<description> </description>
</property>
<property>
<name>mapred.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description> </description>
</property>
<property>
<name>mapred.system.dir</name>
<value>/tmp/mapred/system</value>
<description>hdfs dir</description>
</property>
</configuration>
master:
master
slaves:
slave1
slave2
/etc/hosts:
192.168.190.133 master
192.168.190.134 slave1
192.168.190.135 slave2
From the log files, I found that I should change namenode_master to master in the file core-site.xml. Now it works.
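As a quick sanity check for this class of problem, you can verify that the hostname substituted into fs.default.name actually resolves before formatting the namenode (a sketch using standard Linux tools; master matches the /etc/hosts entries above):

# should print 192.168.190.133 per the /etc/hosts above
getent hosts master

# a failed lookup here would explain a namenode that dies right after startup
ping -c 1 master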