spark 1.6.1 -- hive-site.xml -- not connecting to mysql [duplicate] - apache-spark

This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
(11 answers)
Closed 2 years ago.
The following are the versions that we have
Spark 1.6.1
Hadoop 2.6.2
Hive 1.1.0
I have the hive-site.xml in $SPARK_HOME/conf directory. The hive.metastore.uris property is also configured properly.
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host.domain.com:3306/metastore</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>*****</value>
<description>password for connecting to mysql server </description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://host.domain.com:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
Unfortunately, Spark is creating a temporary Derby database instead of connecting to the MySQL metastore.
I need Spark to connect to the MySQL metastore, as that is the central store for all metadata. Please help.
Regards
Bala

Can you try passing the hive-site.xml (via --files) to spark-submit when running in cluster mode?
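As a rough sketch (the master, main class, and application jar below are hypothetical placeholders):

```shell
# Ship hive-site.xml to the driver and executors in cluster mode,
# so it is read even when the driver node has no $SPARK_HOME/conf copy.
# Master, class name, and jar are placeholders for illustration only.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --files /path/to/hive-site.xml \
  --class com.example.MyApp \
  my-app.jar
```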

Related

hive on spark :org.apache.hadoop.hive.ql.metadata.HiveException

Buddy!
I get a problem while using Hive (version 3.1.2) on Spark (version 3.2.1) on my Mac (Catalina 10.15.7). Hadoop and Hive run on my Mac in local mode and both work well (I can insert records into a Hive table and select them out when I set hive.execution.engine=mr).
After I made the configuration listed below in Hive's hive-site.xml,
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.home</name>
<value>/Users/admin/spark/spark-3.2.1-bin-hadoop3.2/</value>
</property>
<property>
<name>spark.master</name>
<value>spark://localhost:7077</value>
</property>
Then I run Hive in the terminal and it works well (I can run select statements).
But when I insert records into a table, I get an exception:
FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for
Spark session b9704c3a-6932-44b3-a011-600ae80f39e1

Configure Hive metastore in my local Spark shell

I need to configure Hive metastore for use with Spark SQL in spark-shell.
I copied my hive-site.xml to the spark/conf folder, but it didn't work.
Then I tried, in the spark shell:
spark.conf.set("hive.metastore.uris","jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true")
spark.conf.set("spark.sql.catalogImplementation","hive")
But got an error:
scala> spark.conf.set("spark.sql.catalogImplementation","hive")
org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config: spark.sql.catalogImplementation;
at org.apache.spark.sql.RuntimeConfig.requireNonStaticConf(RuntimeConfig.scala:155)
at org.apache.spark.sql.RuntimeConfig.set(RuntimeConfig.scala:41)
... 49 elided
I tried opening the spark shell using
spark-shell --conf spark.sql.catalogImplementation=hive hive.metastore.uris=jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true
but I am still not able to read the table.
Error:
scala> spark.sql("select * from car_test.car_data_table").show
org.apache.spark.sql.AnalysisException: Table or view not found: `car_test`.`car_data_table`
Hive metastore is not getting attached to spark sql.
My hive-site.xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/user/raptor/tmp/hive/warehouse/</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>HIVE_USER</value>
</property>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>HIVE_PASSWORD</value>
</property>
</configuration>
spark-env.sh
#!/usr/bin/env bash
export JAVA_HOME=/home/user/Softwares/jdk1.8.0_221/
export SPARK_LOCAL_IP=127.0.1.1
export HADOOP_HOME=/home/user/Softwares/hadoop-2.7.3/
export HIVE_HOME=/home/user/Softwares/apache-hive-2.1.1-bin/
export SPARK_LOCAL_IP=127.0.1.1
export SPARK_DIST_CLASSPATH=$HADOOP_HOME/bin/hadoop
#SPARK_DIST_CLASSPATH=$HADOOP_HOME/share/hadoop/common/lib:$HADOOP_HOME/share/hadoop/common:$HADOOP_HOME/share/hadoop/mapreduce:$HADOOP_HOME/share/hadoop/mapreduce/lib:$HADOOP_HOME/share/hadoop/yarn:$HADOOP_HOME/share/hadoop/yarn/lib:$HADOOP_HOME/share/hadoop/hdfs:$HADOOP_HOME/share/hadoop/hdfs/lib
export SPARK_HOME=/home/user/Softwares/spark-2.4.4-bin-hadoop-2.7-scala-2.12
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export SPARK_LOG_DIR=${SPARK_HOME}/logs
I got help from this post:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
One other thing: Spark 2.x only supports Hive metastore versions from Hive 0.12.0 to Hive 1.2.0 out of the box. I was using Hive 2.1.1; that was the issue.
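When hive-site.xml placement is in doubt, one way to sanity-check what Spark will actually read is to pull the metastore URI out of the file with plain shell tools. A minimal sketch (here with an inline sample file for demonstration; in practice you would point SITE at $SPARK_HOME/conf/hive-site.xml):

```shell
# Write a minimal sample hive-site.xml to a temp file for demonstration.
SITE=$(mktemp)
cat > "$SITE" <<'EOF'
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://host.domain.com:9083</value>
  </property>
</configuration>
EOF
# Grab the <value> line that follows the hive.metastore.uris <name> line.
uri=$(grep -A1 '<name>hive.metastore.uris</name>' "$SITE" |
  sed -n 's:.*<value>\(.*\)</value>.*:\1:p')
echo "$uri"   # thrift://host.domain.com:9083
```

If this prints a jdbc: URL rather than a thrift:// URI, Spark is pointed at the database rather than the metastore service, which is a common cause of the silent fallback to Derby.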

How to connect to remote Hive server without hive downloaded?

I am trying to access a Hive cluster without Hive downloaded on my machine. I read on here that I just need a jdbc client to do so. I have a url, username and password for the hive cluster. I have tried making a hive-site.xml with these, as well as doing it programmatically, although this method does not seem to have a place to input username and password. No matter what I do, it seems that the following error is keeping me from accessing hive:
Unable to instantiate
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
I feel like this is because I do not have Hive downloaded on my computer from the answers to this error online. What exactly do I need to do here to access it without hive downloaded, or do I actually have to download it? Here is my code for reference:
spark = SparkSession \
    .builder \
    .appName("interfacing spark sql to hive metastore without configuration file") \
    .config("hive.metastore.uris", "https://prod-fmhdinsight-eu.azurehdinsight.net") \
    .enableHiveSupport() \
    .getOrCreate()
data = [('First', 1), ('Second', 2), ('Third', 3), ('Fourth', 4), ('Fifth', 5)]
df = spark.createDataFrame(data)
# see the frame created
df.show()
# write the frame
df.write.mode("overwrite").saveAsTable("t4")
and the hive-site.xml:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>https://prod-fmhdinsight-eu.azurehdinsight.net</value>
</property>
<!--
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
-->
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>https://prod-fmhdinsight-eu.azurehdinsight.net</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>username</value>
<description>user name for connecting to mysql server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>password</value>
<description>password for connecting to mysql server</description>
</property>
</configuration>
tl;dr Use spark.sql.hive.metastore.jars configuration property with maven to let Spark SQL download the required jars.
The other options are builtin (that simply assumes Hive 1.2.1) and a classpath of the Hive JARs (e.g. spark.sql.hive.metastore.jars="/Users/jacek/dev/apps/hive/lib/*").
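For example, a sketch of the maven option (the metastore version here is a placeholder you would match to your actual Hive installation):

```shell
# Let Spark SQL download Hive metastore jars of the matching version
# from Maven at startup; 2.1 is a hypothetical example version.
spark-shell \
  --conf spark.sql.hive.metastore.version=2.1 \
  --conf spark.sql.hive.metastore.jars=maven
```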
If your Hive metastore is available remotely via thrift protocol you may want to create $SPARK_HOME/conf/hive-site.xml as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://localhost:9083</value>
</property>
</configuration>
A nice feature of Hive is that configuration properties can be set as Java system properties, so the above could instead be passed on the command line:
$SPARK_HOME/bin/spark-shell \
--driver-java-options="-Dhive.metastore.uris=thrift://localhost:9083"
You may want to add the following to conf/log4j.properties for more low-level logging:
log4j.logger.org.apache.spark.sql.hive.HiveUtils$=ALL
log4j.logger.org.apache.spark.sql.internal.SharedState=ALL

How to configure spark thrift user and password

I am trying to connect to the Spark Thrift server via beeline, and I started the Spark Thrift server as below:
start-thriftserver.sh --master yarn-client --num-executors 2 --conf spark.driver.memory=2g --executor-memory 3g
and the Spark conf/hive-site.xml is as below:
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:node001:3306/hive?useSSL=false</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>NONE</value>
</property>
<property>
<name>hive.server2.thrift.client.user</name>
<value>root</value>
</property>
<property>
<name>hive.server2.thrift.client.password</name>
<value>123456</value>
</property>
<property>
<name>hive.server2.thrift.port</name>
<value>10001</value>
</property>
<property>
<name>hive.security.authorization.enabled</name>
<value>true</value>
<description>enable or disable the hive client authorization</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123456</value>
</property>
</configuration>
When I use the beeline CLI to access the Spark Thrift server, it prompts me for a username and password, but even when I don't input anything and just press Enter (not entering the username and password configured in hive-site.xml), I can still access Spark. How can I make the configuration take effect?
Thanks :)
beeline> !connect jdbc:hive2://node001:10001
Connecting to jdbc:hive2://node001:10001
Enter username for jdbc:hive2://node001:10001:
Enter password for jdbc:hive2://node001:10001:
18/12/06 16:13:44 INFO Utils: Supplied authorities: node001:10001
18/12/06 16:13:44 INFO Utils: Resolved authority: node001:10001
18/12/06 16:13:45 INFO HiveConnection: Will try to open client transport with JDBC Uri: jdbc:hive2://node001:10001
Connected to: Spark SQL (version 2.4.0)
Driver: Hive JDBC (version 1.2.1.spark2)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://node001:10001> show databases;
+-------------------------+--+
| databaseName |
+-------------------------+--+
| default |
| test |
+-------------------------+--+
7 rows selected (0.142 seconds)
0: jdbc:hive2://node001:10001>

Setting MySQL as the metastore for built in spark hive

I have a Scala sbt project using Spark. I need to create multiple HiveContexts, which is not allowed by the built-in Derby metastore for Spark Hive. Can someone help me set up MySQL as the metastore instead of Derby, which is the default DB? I don't have Hive or Spark actually installed; I use sbt dependencies for Spark and Hive.
Copy the hive-site.xml file into Spark's conf directory and change the following properties in that file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
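Note that Spark also needs the MySQL JDBC driver on its classpath to reach a MySQL-backed metastore. A sketch (the jar path and connector version are placeholders):

```shell
# Make hive-site.xml and the MySQL connector visible to Spark;
# the connector path and version below are hypothetical.
cp hive-site.xml "$SPARK_HOME/conf/"
spark-shell --jars /path/to/mysql-connector-java-5.1.49.jar
```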
Shakti
You need to have the conf files on the class path. I'm using Hadoop, Hive, and Spark with IntelliJ. In IntelliJ I have file:/usr/local/spark/conf/, file:/usr/local/hadoop/etc/hadoop/, and file:/usr/local/hive/conf/ on my class path. You can use the following to print your runtime class path:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
I hope this helps if you haven't already found a fix.
