sqlContext.tableNames not fetching any hive tables [duplicate] - apache-spark

I have a Hive 0.13 installation and have created custom databases. I have a Spark 1.1.0 single-node cluster built using the mvn -Phive option.
I want to access the tables in these databases from a Spark application using HiveContext, but HiveContext always reads the local metastore created in the Spark directory. I have copied hive-site.xml into the spark/conf directory.
Do I need to do any other configuration?
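A minimal spark-shell check (a sketch; mydb stands in for one of the custom databases) that shows whether HiveContext is reading the external metastore or a local Derby one:
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)                          // sc is the SparkContext provided by spark-shell
hiveContext.sql("show databases").collect().foreach(println)   // only "default" here means the local Derby metastore is still in use
hiveContext.sql("use mydb")                                    // "mydb" is a placeholder for one of the custom databases
hiveContext.sql("show tables").collect().foreach(println)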

Step 1:
Set up Spark with the latest version....
$ cd $SPARK_HOME; ./sbt/sbt -Phive assembly
$ cd $SPARK_HOME; ./sbt/sbt -Phive-thriftserver assembly
Running these downloads the required jar files and adds them to the assembly by default, so nothing needs to be added manually....
Step 2:
Copy hive-site.xml from your Hive cluster to your $SPARK_HOME/conf/ directory, then edit that XML file and add the properties listed below:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://MYSQL_HOST:3306/hive_{version}</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>XXXXXXXX</value>
<description>Username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>XXXXXXXX</value>
<description>Password to use against metastore database</description>
</property>
Step 3: Download the MySQL JDBC connector and add it to the Spark classpath.
Edit the script bin/compute-classpath.sh
and add the line below to it:
CLASSPATH="$CLASSPATH:$PATH_TO_mysql-connector-java-5.1.10.jar"
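Alternatively (a sketch; adjust the jar path to wherever you downloaded the connector), the jar can be passed directly when launching the shell instead of editing compute-classpath.sh:
$ ./bin/spark-shell --driver-class-path /path/to/mysql-connector-java-5.1.10.jar --jars /path/to/mysql-connector-java-5.1.10.jar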
How to retrieve data from Hive into Spark....
Step 1:
Start all Hadoop daemons with the following command....
start-all.sh
Step 2:
Start HiveServer2 with the following command....
hive --service hiveserver2 &
Step 3:
Start spark server by the following command....
start-spark.sh
Finally, check whether these have started by running the jps command; the output should list processes like the following....
RunJar
ResourceManager
Master
NameNode
SecondaryNameNode
Worker
Jps
JobHistoryServer
DataNode
NodeManager
Step 4:
Start the master by the following command....
./sbin/start-master.sh
To stop the master use the below command.....
./sbin/stop-master.sh
Step 5:
Open a new terminal....
Start Beeline from the Hive bin directory....
hadoop@localhost:/usr/local/hadoop/hive/bin$ beeline
At the Beeline prompt, connect with the command listed below....
!connect jdbc:hive2://localhost:10000 hadoop "" org.apache.hive.jdbc.HiveDriver
After that, configure Spark with the following commands....
Note: put these settings in a conf file so you do not need to run them every time....
set spark.master=spark://localhost:7077;
set hive.execution.engine=spark;
set spark.executor.memory=2g; -- set the memory depending on your server
set spark.serializer=org.apache.spark.serializer.KryoSerializer;
set spark.io.compression.codec=org.apache.spark.io.LZFCompressionCodec;
At the prompt, run the query you want to retrieve data with, then open a browser at localhost:8080 and you can see the Running Jobs and Completed Jobs in the Spark UI....
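For example, at the Beeline prompt (a sketch; employee is a hypothetical table in your Hive warehouse):
0: jdbc:hive2://localhost:10000> show tables;
0: jdbc:hive2://localhost:10000> select count(*) from employee;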

Related

hive on spark :org.apache.hadoop.hive.ql.metadata.HiveException

Buddy!
I hit a problem while using Hive (version 3.1.2) on Spark (version 3.2.1) on my Mac (Catalina 10.15.7). My Hadoop and Hive run on my Mac in local mode and they both work well (I can insert records into a Hive table and select them back out when I set hive.execution.engine=mr).
After I added the configuration listed below to Hive's hive-site.xml:
<property>
<name>hive.execution.engine</name>
<value>spark</value>
</property>
<property>
<name>spark.home</name>
<value>/Users/admin/spark/spark-3.2.1-bin-hadoop3.2/</value>
</property>
<property>
<name>spark.master</name>
<value>spark://localhost:7077</value>
</property>
Then I run Hive in a terminal and it works well (I can run select statements).
But when I insert records into a table, I get this exception:
FAILED: SemanticException Failed to get a spark session:
org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create Spark client for
Spark session b9704c3a-6932-44b3-a011-600ae80f39e1

how to write data into hive table using spark remotely?

I'm new to the Hadoop world. I have installed Spark 2.3.1 on my Windows machine and installed Cloudera inside a VM on the same machine. I'm doing some data transformation in the form of a DataFrame using the Spark shell. Now I want to put this data into Hive, which is in Cloudera, using Spark. I googled around and did the following steps.
1) Copied all the files in /etc/hive/conf and pasted them into spark/conf on my Windows machine.
2) In the Windows spark/conf, opened "hive-site.xml" and changed the property as below:
<configuration>
<property>
<name>hive.metastore.uris</name>
<value>thrift://MyclouderaIP:9083</value>
</property>
</configuration>
3) Put a host entry in the Windows system file C:\Windows\System32\drivers\etc\hosts,
for example: MyclouderaIP quickstart.cloudera
4) In the Cloudera VM, opened "/etc/hive/conf/hdfs-site.xml" and changed the property as below:
<property>
<name>dfs.client.use.datanode.hostname</name>
<value>true</value>
</property>
After finishing all the steps I am facing the issues below.
scala> val Main = sc.textFile("D:\\Windows\\CompanyData.txt")
scala> Main.collect
Error :
java.lang.IllegalArgumentException: Pathname /D:/Windows/CompanyData.txt from hdfs://quickstart.cloudera:8020/D:/Windows/CompanyData.txt is not a valid DFS filename.
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:197)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:106)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
I removed "core-site.xml" from spark/conf and then it is able to read the text file on Windows, but Spark is not able to communicate with Cloudera while inserting a record.
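An alternative that avoids removing core-site.xml is to read the local file with an explicit file:/// URI, so HDFS can stay the default filesystem (a sketch using the same path as above):
scala> val main = sc.textFile("file:///D:/Windows/CompanyData.txt")   // explicit local-file scheme instead of the default hdfs://
scala> main.take(5).foreach(println)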
scala> import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> sqlContext.sql("insert into TestTable select 1")
Error:
org.apache.hadoop.ipc.RemoteException(java.io.IOException):
File /user/hive/warehouse/TestTable/.hive-staging_hive_2018-10-17_00-03-48_369_2112774544260501723-1/-ext-10000/_temporary/0/_temporary/attempt_20181017000351_0000_m_000000_0/part-00000-8fcba81b-8a51-48a6-9c47-ac5f1c9dafdb-c000
could only be replicated to 0 nodes instead of minReplication (=1).
There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Please can somebody help me?

Unable to write data on hive using spark

I am using Spark 1.6 and creating a HiveContext from the SparkContext. When I save the data into Hive it gives an error. I am using the Cloudera VM: Hive is inside the Cloudera VM and Spark is on my system, and I can access the VM using its IP. I have started the Thrift server and HiveServer2 on the VM, and I have used the Thrift server URI for hive.metastore.uris.
val hiveContext = new HiveContext(sc)
hiveContext.setConf("hive.metastore.uris", "thrift://IP:9083")
............
............
df.write.mode(SaveMode.Append).insertInto("test")
I get the following error:
FAILED: SemanticException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
Probably hive-site.xml is not available inside the Spark conf folder; I have added the details below.
Add hive-site.xml to the Spark configuration folder by creating a symlink that points to hive-site.xml in the Hive configuration folder:
sudo ln -s /usr/lib/hive/conf/hive-site.xml /usr/lib/spark/conf/hive-site.xml
After the above steps, restarting spark-shell should help.
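After restarting, a quick way to confirm that the remote metastore is actually being picked up (a minimal sketch for Spark 1.6; test is the target table from the question):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
hiveContext.sql("show databases").show()   // should list the databases from the Cloudera metastore, not just "default"
hiveContext.sql("describe test").show()    // "test" is the table targeted by insertInto above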

presto hive: Query failed: Partition location does not exist:

I have a Presto server (Linux) with this config.properties:
coordinator=false
datasources=hive
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://localhost:8080
And catalog/hive.properties:
connector.name=hive-cdh4
hive.metastore.uri=thrift://another-machine:port
Hive is running on another machine (Windows(!)).
A DB client works correctly with Hive; tables can be created, updated, etc.
But when I use the Presto client (presto.jar on Linux):
ERROR:
Query 20160426_133746_00003_8iq7y failed: Partition location does not exist: file:/C:/tmp/user/hive/warehouse/test
The path is correct and exists.
The hive-site.xml config contains:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>file:///C:/tmp/user/hive/warehouse</value>
<description>location of the warehouse directory</description>
</property>
I need help to resolve this problem. Thanks.

Setting MySQL as the metastore for built in spark hive

I have a Scala sbt project using Spark. I need to create multiple HiveContexts, which is not allowed by the built-in Derby metastore used for Spark's Hive support. Can someone help me set up MySQL as the metastore instead of Derby, which is the default DB? I don't have Hive or Spark actually installed; I use sbt dependencies for Spark and Hive.
Copy the hive-site.xml file into Spark's conf directory and change these properties in that file:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
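If everything runs from an sbt project with no installed Spark or Hive, the same properties can also be set programmatically on the HiveContext instead of (or in addition to) shipping a hive-site.xml on the classpath. Below is a minimal sketch for a Spark 1.x HiveContext, reusing the URL and credentials from the XML above and assuming the MySQL connector is an sbt dependency; whether these settings take effect can depend on when the metastore client is first initialized, so hive-site.xml on the classpath remains the more reliable route:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("mysql-metastore-sketch").setMaster("local[*]")
val sc = new SparkContext(conf)

val hiveContext = new HiveContext(sc)
// mysql-connector-java must be on the classpath, e.g. via an sbt libraryDependencies entry for "mysql" % "mysql-connector-java"
hiveContext.setConf("javax.jdo.option.ConnectionURL",
  "jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true")
hiveContext.setConf("javax.jdo.option.ConnectionDriverName", "com.mysql.jdbc.Driver")
hiveContext.setConf("javax.jdo.option.ConnectionUserName", "hive")
hiveContext.setConf("javax.jdo.option.ConnectionPassword", "hive")

hiveContext.sql("show databases").show()   // should now hit the MySQL-backed metastore rather than a local Derby one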
You need to have the conf files on the class path. I'm using Hadoop, Hive, and Spark with IntelliJ. In IntelliJ I have file:/usr/local/spark/conf/, file:/usr/local/hadoop/etc/hadoop/, and file:/usr/local/hive/conf/ on my class path. You can use the following to print your runtime class path:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
I hope this helps if you haven't already found a fix.
