Cannot find class 'org.apache.hadoop.hive.druid.DruidStorageHandler' - linux

The jar file for the Druid Hive handler is present: hive-druid-handler-3.1.2.jar is in the Hive lib folder. The clients table already exists in Hive with data.
I get the following error when I try to create a table in Hive for Druid:
FAILED: SemanticException Cannot find class 'org.apache.hadoop.hive.druid.DruidStorageHandler'
Here is the SQL.
CREATE TABLE ssb_druid_hive
STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler'
TBLPROPERTIES (
  "druid.segment.granularity" = "MONTH",
  "druid.query.granularity" = "DAY")
AS
SELECT
  cast(clients.first_name as int) first_name,
  cast(clients.last_name as int) last_name
FROM clients;
What could be the reason?

I found some people having a similar problem; here's the link to the external forum.
In short, you may have to reinstall a newer version for it to work, i.e. download the latest version of Hive. If you have downloaded Hive 1, download Hive 2 and it should work.
Here's a PDF copy of that page (in case the original is taken down):
https://drive.google.com/file/d/1-LgtgJa6FPgULeG09qbFNIYA2EgUCJK9/view?usp=sharing

I faced the same issue while creating an external table in Hive.
You need to add hive-druid-handler-3.1.2.jar to your Hive server.
To add it temporarily:
1. Download hive-druid-handler-3.1.2.jar from here.
2. Copy the .jar to S3 or blob storage.
3. Go to the Hive CLI and run ADD JAR s3://your-bucket/hive-druid-handler-3.1.2.jar;
To add it permanently:
1. Copy hive-druid-handler-3.1.2.jar into the Hive lib folder:
hdfs dfs -copyToLocal s3://your-bucket/hive-druid-handler-3.1.2.jar /usr/hdp/4.1.4.8/hive/lib/
2. Restart the Hive server.

Related

Failed to open HDFS file after loading data from Spark

I'm using Java-Spark.
I'm loading Parquet data into a Hive table as follows:
ds.write().mode("append").format("parquet").save(path);
Then I run
spark.catalog().refreshTable("mytable"); // mytable is an external table
When I then try to read the data from Impala, I get the following exception:
Failed to open HDFS file
No such file or directory. root cause: RemoteException: File does not exist
After I run REFRESH mytable in Impala, I can see the data.
How can I issue that refresh command from Spark?
I also tried
spark.sql("msck repair table mytable");
and it still does not work for me.
Any suggestions?
Thanks.
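Note that spark.catalog().refreshTable only refreshes Spark's own metadata cache; Impala keeps its own catalog, so one common workaround is to send the REFRESH statement to Impala over JDBC from the Spark driver. A minimal sketch (in Scala rather than Java, assuming the Impala JDBC driver is on the classpath; the host, port and database are placeholders):
import java.sql.DriverManager

// Placeholder connection string; requires the Impala JDBC driver on the classpath.
val conn = DriverManager.getConnection("jdbc:impala://impala-host:21050/default")
try {
  val stmt = conn.createStatement()
  // Ask Impala to pick up the files Spark just wrote.
  stmt.execute("REFRESH mytable")
  stmt.close()
} finally {
  conn.close()
}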

Spark returns Empty DataFrame but Populated in Hive

I have a table in Hive:
db.table_name
When I run the following in Hive, I get results back:
SELECT * FROM db.table_name;
When I run the following in a spark-shell:
spark.read.table("db.table_name").show
it shows nothing. Similarly,
sql("SELECT * FROM db.table_name").show
also shows nothing. Selecting specific columns before the show also displays nothing, and a count reports that the table has 0 rows.
Running the same queries works against other tables in the same database.
Spark version: 2.2.0.cloudera1
The table is created using
table.write.mode(SaveMode.Overwrite).saveAsTable("db.table_name")
and if I read the Parquet files directly, it works:
spark.read.parquet(<path-to-files>).show
EDIT:
I'm currently working around this by describing the table, getting its location, and reading that path with spark.read.parquet.
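A minimal Scala sketch of that workaround (assuming the DESCRIBE FORMATTED output in this Spark version exposes a "Location" row in its col_name/data_type columns):
import org.apache.spark.sql.functions.col

// Look up the table's storage location in the metastore, then read the
// Parquet files at that path directly, bypassing the table definition.
val location = spark.sql("DESCRIBE FORMATTED db.table_name")
  .filter(col("col_name") === "Location")
  .select("data_type")
  .collect()(0)
  .getString(0)

spark.read.parquet(location).show()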
Have you refreshed the table metadata? You may need to refresh the table to see the new data:
spark.catalog.refreshTable("my_table")
I solved the problem by using
query_result.write.mode(SaveMode.Overwrite).format("hive").saveAsTable("table")
which stores the results as a text file.
There is probably some incompatibility with Hive's Parquet format.
I also found a Cloudera note about it (in the CDH Release Notes): they recommend creating the Hive table manually and then loading the data from a temporary table or by query.
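A rough Scala sketch of that recommendation, as I understand it (the table name, column names and Parquet format below are illustrative assumptions, not taken from the Cloudera note):
// Register the DataFrame as a temporary view, create the Hive table
// explicitly, then load it with an INSERT ... SELECT.
query_result.createOrReplaceTempView("tmp_results")  // query_result is the DataFrame from above

spark.sql("""
  CREATE TABLE IF NOT EXISTS db.table_name (
    col1 STRING,
    col2 STRING
  )
  STORED AS PARQUET
""")

spark.sql("INSERT OVERWRITE TABLE db.table_name SELECT col1, col2 FROM tmp_results")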

Unable to query Hive Parquet-based EXTERNAL table from spark-sql

We have an external Hive table stored as Parquet. I am not the owner of the schema that this Hive/Parquet table lives in, so I don't have much information about it.
The problem is that when I try to query the table from the spark-sql> shell prompt (not through Scala like spark.read.parquet("path")), I get 0 records and the message "Unable to infer schema". But when I created a managed table with CTAS in my personal schema, just for testing, I was able to query it from the spark-sql> shell prompt.
When I try it from spark-shell> via spark.read.parquet("../../00000_0").show(10), I can see the data.
So this suggests that something is wrong in the combination
external Hive table - Parquet - spark-sql (shell)
If locating the schema were the issue, it should behave the same when accessing the files through the Spark session (spark.read.parquet("")).
I am using MapR 5.2 and Spark version 2.1.0.
Please suggest what the issue could be.
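One thing that may be worth checking (my assumption, not something stated in the post) is whether the behaviour differs between Spark's built-in Parquet reader and the Hive SerDe path for this table, and whether stale metadata is involved. A small Scala sketch to compare the two paths (the table name is a placeholder):
// Refresh cached metadata for the external table first.
spark.sql("REFRESH TABLE schema_name.table_name")

// Query through Spark's native Parquet conversion (the default)...
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "true")
spark.sql("SELECT * FROM schema_name.table_name LIMIT 10").show()

// ...and again through the Hive SerDe path, to see whether the result differs.
spark.conf.set("spark.sql.hive.convertMetastoreParquet", "false")
spark.sql("SELECT * FROM schema_name.table_name LIMIT 10").show()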

PrestoDB - Where are Parquet files stored?

I have Presto installed alongside AWS EMR. I've created a table in Presto from a Hive table.
CREATE TABLE temp_table
WITH (format = 'PARQUET')
AS
SELECT * FROM <hive_table>;
Where are the Parquet files stored?
Or, where are any of the files stored when a CREATE TABLE statement is executed?
The data is stored in the Hive warehouse, visible from the master node:
hdfs://ip-###-###-###-###.ec2.internal:8020/user/hive/warehouse/<table_name>/
You can list it with the following command:
hadoop fs -ls hdfs://ip-###-###-###-###.ec2.internal:8020/user/hive/warehouse/<table_name>/

Does Presto support Parquet format?

Running a CDH4 cluster with Impala, I created a Parquet table, and after adding the Parquet jar files to Hive I can query the table using Hive.
I added the same set of jars to /opt/presto/lib and restarted the coordinator and workers:
parquet-avro-1.2.4.jar
parquet-cascading-1.2.4.jar
parquet-column-1.2.4.jar
parquet-common-1.2.4.jar
parquet-encoding-1.2.4.jar
parquet-format-1.0.0.jar
parquet-generator-1.2.4.jar
parquet-hadoop-1.2.4.jar
parquet-hive-1.2.4.jar
parquet-pig-1.2.4.jar
parquet-scrooge-1.2.4.jar
parquet-test-hadoop2-1.2.4.jar
parquet-thrift-1.2.4.jar
I still get this error when running a SELECT query against the Parquet table from Presto:
> select * from test_pq limit 2;
Query 20131116_144258_00002_d3sbt failed : org/apache/hadoop/hive/serde2/SerDe
Presto now supports Parquet out of the box.
Try adding the jars to the Presto plugin directory instead of the Presto lib directory. Presto automatically loads jars from its plugin directories.
