Cannot find a constructor for class org.apache.htrace.impl.ZipkinSpanReceiver which takes an HTraceConfiguration (Zipkin)

This is what I see in my HBase logs after following the instructions in this document:
http://hbase.apache.org/book.html#tracing
2017-03-02 09:47:12,851 ERROR [main] htrace.SpanReceiverBuilder: SpanReceiverBuilder cannot find a constructor for class org.apache.htrace.impl.ZipkinSpanReceiverwhich takes an HTraceConfiguration. Disabling span receiver.
Here's my entry in hbase-site.xml:
<property>
<name>hbase.trace.spanreceiver.classes</name>
<value>org.apache.htrace.impl.ZipkinSpanReceiver</value>
</property>
<property>
<name>hbase.htrace.zipkin.collector-hostname</name>
<value>jesaremi-svpc</value>
</property>
<property>
<name>hbase.htrace.zipkin.collector-port</name>
<value>9410</value>
</property>
I also have the following additional jar files in the HBase lib folder:
zipkin-1.20.1.jar
htrace-core-3.1.0-incubating.jar
htrace-zipkin-4.2.0-incubating.jar

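The HTraceConfiguration class lives in a different package in each of the two jars, so the Zipkin receiver's constructor does not take the type that htrace-core 3.1.0 looks for:
(org.apache.htrace.HTraceConfiguration -> htrace-core-3.1.0-incubating.jar)
(org.apache.htrace.core.HTraceConfiguration -> htrace-zipkin-4.2.0-incubating.jar)
I solved the problem by using htrace-zipkin version 3.1.0 instead.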
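A quick way to confirm this kind of mismatch is to list which jar provides which HTraceConfiguration class. The following is a minimal sketch, not part of the original answer: the function name find_class_in_jars and the lib directory in the usage note are assumptions made for illustration, and it only inspects jar entry names with Python's standard zipfile module.

```python
import zipfile
from pathlib import Path


def find_class_in_jars(lib_dir, simple_name):
    """Map each jar in lib_dir to the fully qualified classes named simple_name."""
    suffix = "/" + simple_name + ".class"
    hits = {}
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            # Jar entries look like org/apache/htrace/HTraceConfiguration.class
            names = [
                entry[: -len(".class")].replace("/", ".")
                for entry in zf.namelist()
                if entry.endswith(suffix)
            ]
        if names:
            hits[jar.name] = names
    return hits
```

For example, calling find_class_in_jars("/usr/local/hbase/lib", "HTraceConfiguration") (the path is hypothetical) returns a dict keyed by jar file name. If two jars report the class under different packages, the versions are probably incompatible.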

How to configure the Hive cli when using the Spark execution engine?

I have set hive.execution.engine to spark and am using a Spark-enabled queue. Spark SQL is able to access the Hive tables, and so is Beeline from a directly connected cluster machine.
But the Hive CLI seems to need additional steps. So far I have done the following:
** Copied the Scala libraries to the $HIVE_HOME/libs dir (otherwise we get ClassNotFoundException)
** Ran the following at the start of the Hive script (or in .hiverc):
set hive.execution.engine=spark;
set mapred.job.queue.name=root.spark.sbg.hos;
However, the following error now occurs: Failed to create spark client.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
hive (default)> insert into sb.test2 values (1,'ab');
Query ID = sboesch_20171030175629_dc310c9a-519e-4f84-a632-f3a44f1df8c3
Total jobs = 3
Launching Job 1 out of 3
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Failed to execute spark task, with exception 'org.apache.hadoop.hive.ql.metadata.HiveException(Failed to create spark client.)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
Has anyone managed to connect Hive to a Spark backend? I am using vanilla Hive (not Cloudera, Hortonworks, or MapR).
You have to start the Hive metastore server separately to access Hive tables through Spark.
Run hive --service metastore in a new terminal; you should get a response like Starting Hive Metastore Server.
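Once the metastore service has been started, it can help to confirm that something is actually listening before retrying the query. The sketch below is an illustration rather than part of the original answer. It assumes the metastore uses its default Thrift port 9083 on localhost; adjust the host and port if hive.metastore.uris or hive.metastore.port is set differently. The helper name port_is_open is made up for this example.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

Calling port_is_open("localhost", 9083) only shows that some process accepts connections on that port. It does not prove that the process is a healthy metastore.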
hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>**mysql metastore username**</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>**mysql metastore DB password**</value>
</property>
<property>
<name>hive.querylog.location</name>
<value>/tmp/hivequerylogs/${user.name}</value>
</property>
<property>
<name>hive.aux.jars.path</name>
<value>file:///usr/local/hive/apache-hive-2.1.1-bin/lib/hive-hbase-handler-2.1.1.jar,file:///usr/local/hive/apache-hive-2.1.1-bin/lib/zookeeper-3.4.6.jar</value>
<description>A comma separated list (with no spaces) of the jar files required for Hive-HBase integration</description>
</property>
<property>
<name>hive.support.concurrency</name>
<value>false</value>
</property>
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>PAM</value>
</property>
<property>
<name>hive.server2.custom.authentication.class</name>
<value>org.apache.hive.service.auth.PamAuthenticationProvider</value>
</property>
<property>
<name>hive.server2.authentication.pam.services</name>
<value>sshd,sudo</value>
</property>
<property>
<name>hive.stats.dbclass</name>
<value>jdbc:mysql</value>
</property>
<property>
<name>hive.stats.jdbcdriver</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.session.history.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.sort.dynamic.partition</name>
<value>false</value>
</property>
<property>
<name>hive.optimize.insert.dest.volume</name>
<value>false</value>
</property>
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive/${user.name}</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: ${hive.exec.scratchdir}/<username> is created, with ${hive.scratch.dir.permission}.</description>
</property>
<property>
<name>datanucleus.fixedDatastore</name>
<value>true</value>
<description/>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>datanucleus.autoCreateSchema</name>
<value>false</value>
<description>creates necessary schema on a startup if one doesn't exist. set this to false, after creating it once</description>
</property>
<property>
<name>datanucleus.schema.autoCreateAll</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateConstraints</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateColumns</name>
<value>true</value>
</property>
<property>
<name>datanucleus.schema.validateTables</name>
<value>true</value>
</property>
</configuration>

I am trying to run a spark-submit job on a YARN cluster, but I keep getting the following warning. How do I fix the issue?

WARN YarnClusterScheduler: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
resources.
I have looked through similar questions and tried everything else that was mentioned. When I look through the yarn-nodemanager log on hdfs I see the following warning that might be causing the error. How do I fix these warnings?
2017-09-13 14:29:52,640 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: The
Auxilurary Service named 'mapreduce_shuffle' in the configuration is for class
org.apache.hadoop.mapred.ShuffleHandler which has a name of 'httpshuffle'.
Because these are not the same tools trying to send ServiceData and read
Service Meta Data may have issues unless the refer to the name in the config.
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.application.classpath</name>
<value>/usr/local/hadoop/etc/hadoop, /usr/local/hadoop/share/hadoop/common/*, /usr/local/hadoop/share/hadoop/common/lib/*, /usr/local/hadoop/share/hadoop/hdfs/*, /usr/local/hadoop/share/hadoop/hdfs/lib/*, /usr/local/hadoop/share/hadoop/mapreduce/*, /usr/local/hadoop/share/hadoop/mapreduce/lib/*, /usr/local/hadoop/share/hadoop/yarn/*, /usr/local/hadoop/share/hadoop/yarn/lib/*</value>
</property>
<property>
<name>nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<description>
Number of seconds after an application finishes before the nodemanager's
DeletionService will delete the application's localized file directory
and log directory.
core-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://sandbox:9000</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader</name>
<value>true</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Please let me know if I am looking in the wrong direction for a solution to my initial warning; the application keeps running, but no data is written to HDFS. Thank you!
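One detail in the yarn-site.xml above may be worth checking: it declares nodemanager.resource.cpu-vcores, whereas the YARN property is named yarn.nodemanager.resource.cpu-vcores. Hadoop does not reject unknown property names, so a misspelled one is silently ignored. Whether that relates to the warning here is not established. The sketch below is a rough heuristic, assuming that properties in yarn-site.xml normally start with "yarn."; the function names are invented for this example.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Parse Hadoop-style configuration XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def names_without_prefix(props, prefixes=("yarn.",)):
    """Return property names that start with none of the given prefixes."""
    return sorted(n for n in props if not n.startswith(tuple(prefixes)))
```

A name flagged by names_without_prefix is not necessarily wrong, since other Hadoop keys can legitimately appear in the file. It is only a prompt to compare the name against the documented defaults.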

HMaster stops automatically in HBase

I installed and configured Hadoop (version 2.7.0) and HBase (version 1.2.3) in pseudo-distributed mode. I tested Hadoop with a test program (Word Count) and everything is OK. Before I enter the HBase shell, HMaster is running, but when I enter the HBase shell and list the tables (or create a table), I see this error:
hbase(main):001:0> list
TABLE
ERROR: Can't get master address from ZooKeeper; znode data == null
Here is some help for this command:
List all tables in hbase. Optional regular expression parameter could
be used to filter the output. Examples:
hbase> list
hbase> list 'abc.*'
hbase> list 'ns:abc.*'
hbase> list 'ns:.*'
When I exit the HBase shell and run jps, I see that HMaster is no longer running, but HRegionServer and ZooKeeper are still running.
Here is my hbase-site.xml configuration:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://localhost:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>localhost</value>
</property>
</configuration>
Here is my core-site.xml configuration:
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop/tmp</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:54310</value>
</property>
</configuration>

Setting MySQL as the metastore for built-in Spark Hive

I have a Scala sbt project that uses Spark. I need to create multiple HiveContexts, which the built-in Derby metastore for Spark's Hive support does not allow. Can someone help me set up MySQL as the metastore instead of Derby, which is the default database? I don't have Hive or Spark installed; I use sbt dependencies for Spark and Hive.
Copy the hive-site.xml file into Spark's conf directory and change the following properties in that file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
Shakti
You need to have the conf files on the class path. I'm using Hadoop, Hive, and Spark with IntelliJ. In IntelliJ I have file:/usr/local/spark/conf/, file:/usr/local/hadoop/etc/hadoop/, and file:/usr/local/hive/conf/ on my class path. You can use the following to print your runtime class path:
// The cast assumes the system class loader is a URLClassLoader, as on Java 8.
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
I hope this helps if you haven't already found a fix.
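To check whether hive-site.xml is actually reachable from a given class path, one option is to test each directory entry for the file. This is a small sketch, not something from the answers above. It assumes the class path is a plain string of paths joined by the platform separator, such as the value of the JVM's java.class.path system property, and the name dirs_containing is made up for illustration.

```python
import os


def dirs_containing(classpath, filename="hive-site.xml"):
    """Return the class path directories that contain filename, in class path order."""
    found = []
    for entry in classpath.split(os.pathsep):
        # Skip empty entries; jar entries never match because they are not directories.
        if entry and os.path.isfile(os.path.join(entry, filename)):
            found.append(entry)
    return found
```

An empty result suggests that no configuration directory with hive-site.xml is on the class path, in which case Spark would probably fall back to its default Derby metastore.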

Error in configuring Spark/Shark on DSE

I have installed:
1) scala-2.10.3
2) spark-1.0.0
I changed spark-env.sh to set the variables below:
export SCALA_HOME=$HOME/scala-2.10.3
export SPARK_WORKER_MEMORY=16g
I can see Spark master.
3) shark-0.9.1-bin-hadoop1
I changed shark-env.sh to set the variables below:
export SHARK_MASTER_MEM=1g
SPARK_JAVA_OPTS=" -Dspark.local.dir=/tmp "
SPARK_JAVA_OPTS+="-Dspark.kryoserializer.buffer.mb=10 "
SPARK_JAVA_OPTS+="-verbose:gc -XX:-PrintGCDetails -XX:+PrintGCTimeStamps "
export SPARK_JAVA_OPTS
export HIVE_HOME=/usr/share/dse/hive
export HIVE_CONF_DIR="/etc/dse/hive"
export SPARK_HOME=/home/ubuntu/spark-1.0.0
export SPARK_MEM=16g
source $SPARK_HOME/conf/spark-env.sh
4) In DSE, the Hive version is 0.11.
The existing hive-site.xml is:
<configuration>
<!-- Hive Execution Parameters -->
<property>
<name>hive.exec.mode.local.auto</name>
<value>false</value>
<description>Let hive determine whether to run in local mode automatically</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>cfs:///user/hive/warehouse</value>
<description>location of default database for the warehouse</description>
</property>
<property>
<name>hive.hwi.war.file</name>
<value>lib/hive-hwi.war</value>
<description>This sets the path to the HWI war file, relative to ${HIVE_HOME}</description>
</property>
<property>
<name>hive.metastore.rawstore.impl</name>
<value>com.datastax.bdp.hadoop.hive.metastore.CassandraHiveMetaStore</value>
<description>Use the Apache Cassandra Hive RawStore implementation</description>
</property>
<property>
<name>hadoop.bin.path</name>
<value>${dse.bin}/dse hadoop</value>
</property>
<!-- Set this to true to enable auto-creation of Cassandra keyspaces as Hive Databases -->
<property>
<name>cassandra.autoCreateHiveSchema</name>
<value>true</value>
</property>
</configuration>
5) While running the Shark shell, I get this error:
Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
6) When running the Shark shell with -skipRddReload, I can get to the Shark shell, but I cannot connect to Hive or execute any commands.
shark> DESCRIBE mykeyspace;
and I get this error message:
FAILED: Error in metastore: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
Please provide details on how to configure Spark/Shark on DataStax Enterprise (Cassandra).
