presto hive: Query failed: Partition location does not exist: - linux

I have a Presto server (Linux) with config.properties:
coordinator=false
datasources=hive
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
And catalog/hive.properties:
connector.name=hive-cdh4
hive.metastore.uri=thrift://another-machine:port
Hive is running on another machine (Windows(!)).
A DB client works correctly with Hive: tables can be created, updated, etc.
But when I use the Presto client (presto.jar on Linux), I get this error:
Query 20160426_133746_00003_8iq7y failed: Partition location does not exist: file:/C:/tmp/user/hive/warehouse/test
The path is correct and exists.
The hive-site.xml config contains:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///C:/tmp/user/hive/warehouse</value>
<description>location of the warehouse directory</description>
</property>
I need help resolving this problem. Thanks.
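For context, here is a minimal sketch of what seems to be going wrong, assuming the Hive connector resolves the partition location it gets from the metastore against the filesystem of the machine Presto runs on. This is only an illustration using the Hadoop FileSystem API, not Presto's actual code, and the PartitionLocationCheck object name is made up:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object PartitionLocationCheck {
  def main(args: Array[String]): Unit = {
    // The location string returned by the metastore for the table.
    val location = new Path("file:/C:/tmp/user/hive/warehouse/test")
    // Resolve it with the local (file://) filesystem of the current host.
    val fs = FileSystem.get(location.toUri, new Configuration())
    // On the Linux Presto host there is no C: drive, so this prints false,
    // which matches the "Partition location does not exist" failure.
    println(fs.exists(location))
  }
}

In other words, a warehouse kept on the Windows machine's local disk (file:///C:/...) is not visible from the Linux node, even though the path exists on Windows.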

Related

hive on spark: org.apache.hadoop.hive.ql.metadata.HiveException

Buddy!
I get a problem while using Hive (version 3.1.2) on Spark (version 3.2.1) on my Mac (Catalina 10.15.7). My Hadoop and Hive run on my Mac in local mode and they both work well (I can insert records into a Hive table and select them out when I set hive.execution.engine=mr).
After I made the configuration listed below in Hive's hive-site.xml:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.home</name>
<value>/Users/admin/spark/spark-3.2.1-bin-hadoop3.2/</value>
</property>
<property>
<name>spark.master</name>
<value>spark://localhost:7077</value>
</property>
Then I run Hive in the terminal and it works well (I can run SELECT statements).
But when I insert records into a table, I get an exception:
FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for
Spark session b9704c3a-6932-44b3-a011-600ae80f39e1
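One thing worth checking first (this is a suggestion, not part of the original post): whether a plain Spark job can reach the same spark://localhost:7077 master configured above, before digging into Hive's Spark client. A minimal sketch; the object and app names are made up:

import org.apache.spark.sql.SparkSession

object MasterCheck {
  def main(args: Array[String]): Unit = {
    // Connect to the same standalone master that hive-site.xml points at.
    val spark = SparkSession.builder()
      .appName("master-check")
      .master("spark://localhost:7077")
      .getOrCreate()
    // Prints 55.0 if the cluster accepts and runs the job.
    println(spark.sparkContext.parallelize(1 to 10).sum())
    spark.stop()
  }
}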

Unable to write data to Hive using Spark

I am using Spark 1.6. I am creating a HiveContext using the Spark context. When I save data into Hive, it gives an error. I am using the Cloudera VM. My Hive is inside the Cloudera VM and Spark is on my system. I can access the VM using its IP. I have started the Thrift server and HiveServer2 on the VM. I have used the Thrift server URI for hive.metastore.uris.
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
............
............
df.write.mode(SaveMode.Append).insertInto("test")
I get the following error:
FAILED: SemanticException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Probably hive-site.xml is not available inside the Spark conf folder; I have added the details below.
Add hive-site.xml to the Spark configuration folder by creating a symlink that points to hive-site.xml in the Hive configuration folder:
sudo ln -s /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/hive-site.xml
After the above steps, restarting spark-shell should help.
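For reference, a minimal end-to-end sketch of the write path from the question, assuming hive-site.xml is now visible to Spark (Spark 1.6 HiveContext API; the DataFrame here is a stand-in for the real one, and the target table "test" is taken from the question):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SaveMode
import org.apache.spark.sql.hive.HiveContext

object WriteToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("write-to-hive"))
    val hiveContext = new HiveContext(sc)
    // Same metastore setting as in the question; "IP" is still a placeholder.
    hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
    // Stand-in DataFrame; replace with the real df from the job.
    val df = hiveContext.sql("SELECT 1 AS id, 'a' AS name")
    // Appends into the existing Hive table "test" (its schema must match).
    df.write.mode(SaveMode.Append).insertInto("test")
    sc.stop()
  }
}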

sqlContext.tableNames not fetching any hive tables [duplicate]

I have a Hive 0.13 installation and have created custom databases. I have a Spark 1.1.0 single-node cluster built using the Maven Hive option.
I want to access the tables in this database from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have copied hive-site.xml into the spark/conf directory.
Do I need to do any other configuration?
Step 1:
Set up Spark with the latest version:
$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
Executing this downloads some jar files and adds them by default, so there is no need to add them yourself.
Step 2:
Copy hive-site.xml from your Hive cluster into your $SPARK_HOME/conf/ directory, then edit the XML file and add the properties listed below:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>XXXXXXXX</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>XXXXXXXX</value>
<description>Password to use against metastore database</description>
</property>
Step 3: Download the MySQL JDBC connector and add it to the Spark classpath.
Open the script bin/compute-classpath.sh
and add the line below to it:
CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
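To verify the setup, here is a small sketch in the spirit of the question's title (Spark 1.x HiveContext API; the object name is made up and "default" is only an example database): once hive-site.xml points at the MySQL-backed metastore, tableNames should return the Hive tables instead of those of an empty local Derby metastore.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ListHiveTables {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("list-hive-tables"))
    val sqlContext = new HiveContext(sc)
    // Should print the tables registered in the external Hive metastore.
    sqlContext.tableNames("default").foreach(println)
    sc.stop()
  }
}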
How to retrieve data from Hive into Spark:
Step 1:
Start all daemons with the following command:
start-all.sh
Step 2:
Start HiveServer2 with the following command:
hive --service hiveserver2 &
Step 3:
Start the Spark server with the following command:
start-spark.sh
And finally, check whether these have started by running the jps command; the output should list:
RunJar
ResourceManager
Master
NameNode
SecondaryNameNode
Worker
Jps
JobHistoryServer
DataNode
NodeManager
Step 4:
Start the master with the following command:
./sbin/start-master.sh
To stop the master, use the command below:
./sbin/stop-master.sh
Step 5:
Open a new terminal.
Start Beeline from the following path:
hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline
When it asks for input, pass the connection string listed below:
!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver
After that, configure Spark with the following commands.
Note: set these configurations in a conf file so you don't need to run them every time.
set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g; // set the memory depending on your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
When it asks for input, pass the query you want to run to retrieve the data. Then open a browser and go to localhost:8080; you can see the Running Jobs and Completed Jobs there.
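The same !connect step can also be done programmatically; here is a hedged sketch using the Hive JDBC driver from Scala (requires hive-jdbc and its dependencies on the classpath; the object name is made up and the table name "test" is only a placeholder):

import java.sql.DriverManager

object BeelineFromScala {
  def main(args: Array[String]): Unit = {
    // Same driver, URL, and user as the beeline !connect line above.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000", "hadoop", "")
    val stmt = conn.createStatement()
    stmt.execute("set hive.execution.engine=spark")
    stmt.execute("set spark.master=spark://localhost:7077")
    // Placeholder query; replace it with the query you actually want to run.
    val rs = stmt.executeQuery("SELECT count(*) FROM test")
    while (rs.next()) println(rs.getLong(1))
    conn.close()
  }
}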

spark 1.6.1 -- hive-site.xml -- not connecting to mysql [duplicate]

This question already has answers here:
How to connect Spark SQL to remote Hive metastore (via thrift protocol) with no hive-site.xml?
The following are the versions we have:
Spark 1.6.1
Hadoop 2.6.2
Hive 1.1.0
I have hive-site.xml in the $SPARK_HOME/conf directory. The hive.metastore.uris property is also configured properly:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://host.domain.com:3306/metastore</value>
<description>metadata is stored in a MySQL server</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>MySQL JDBC driver class</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>user name for connecting to mysql server </description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>*****</value>
<description>password for connecting to mysql server </description>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://host.domain.com:9083</value>
<description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
Unfortunately, Spark is creating a temporary Derby DB instead of connecting to the MySQL metastore.
I need Spark to connect to the MySQL metastore, as that is the central store for all metadata. Please help.
Regards
Bala
Can you try passing hive-site.xml (--files) with spark-submit when running in cluster mode?
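Another option, in line with the linked duplicate, is to point Spark at the remote metastore programmatically so the file does not have to be shipped at all. A minimal sketch (Spark 1.6 HiveContext API; the object name is made up and the thrift URI is the one from the question's hive-site.xml):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object RemoteMetastore {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("remote-metastore"))
    val hiveContext = new HiveContext(sc)
    // Must be set before the first metastore access.
    hiveContext.setConf("hive.metastore.uris", "thrift://host.domain.com:9083")
    // Lists the databases known to the MySQL-backed metastore.
    hiveContext.sql("SHOW DATABASES").show()
    sc.stop()
  }
}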

Setting MySQL as the metastore for built in spark hive

I have a Scala sbt project using Spark. I need to create multiple HiveContexts, which is not allowed by the built-in Derby metastore for Spark Hive. Can someone help me set up MySQL as the metastore instead of Derby, which is the default DB? I don't have Hive or Spark actually installed; I use sbt dependencies for Spark and Hive.
Copy a hive-site.xml file into Spark's conf directory and change some properties in that file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
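Once that file is on the classpath, here is a minimal sketch of the original goal (the object name is made up; it assumes a MySQL server reachable at localhost:3306 as configured above and the MySQL JDBC driver on the classpath):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object MysqlMetastoreDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mysql-metastore").setMaster("local[*]"))
    // Both HiveContexts now talk to the shared MySQL metastore rather than a
    // single-user embedded Derby directory.
    val hc1 = new HiveContext(sc)
    val hc2 = new HiveContext(sc)
    hc1.sql("CREATE TABLE IF NOT EXISTS demo (id INT)")
    hc2.sql("SHOW TABLES").show() // the table created through hc1 is visible here
    sc.stop()
  }
}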
You need to have the conf files on the classpath. I'm using Hadoop, Hive, and Spark with IntelliJ. In IntelliJ I have file:/usr/local/spark/conf/, file:/usr/local/hadoop/etc/hadoop/, and file:/usr/local/hive/conf/ on my classpath. You can use the following to print your runtime classpath:
// Prints every URL on the JVM's system classpath (works on JVMs where the
// system class loader is a URLClassLoader, e.g. Java 8).
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
I hope this helps if you haven't already found a fix.
