As I am trying to get familiar with HBase, I created a table called example in the HBase shell. I reformatted the NameNode for Hadoop (because I didn't shut it down properly before my computer ran out of battery) and restarted Hadoop and HBase. But now when I try to create the example table I get the following error:
ERROR: Table already exists: example!
and when I try to disable it and drop it I get the following:
ERROR: Table example does not exist.
When I try to list the tables, nothing is listed. I even removed the hbase directory from HDFS, but the problem persists.
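One thing worth checking in this situation is ZooKeeper: after a NameNode reformat, HBase can be left with a stale znode for the table, so CREATE fails while disable/drop see nothing. Below is a rough sketch of removing that stale entry with the plain ZooKeeper client; the quorum address and the znode path /hbase/table/example are assumptions that vary by HBase version and distribution, so verify them first (for example with hbase zkcli) and restart HBase afterwards.

import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

object DropStaleTableZnode {
  def main(args: Array[String]): Unit = {
    // Assumption: ZooKeeper listens on localhost:2181 and HBase's parent znode is /hbase.
    val zk = new ZooKeeper("localhost:2181", 30000, new Watcher {
      override def process(event: WatchedEvent): Unit = () // no-op watcher
    })
    val znode = "/hbase/table/example" // assumed location of the stale descriptor for 'example'
    if (zk.exists(znode, false) != null) {
      zk.delete(znode, -1) // -1 skips the version check
      println(s"Deleted stale znode $znode; restart HBase before recreating the table")
    }
    zk.close()
  }
}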
Related
Is there a way to truncate/overwrite an external Hive table that has HDFS snapshots, using Spark?
I've already tried
spark.sql("TRUNCATE TABLE xyz");
and
df.limit(0).write.mode("overwrite").insertInto("xyz")
but every time I get something like:
The directory /path/to/table/xyz cannot be deleted since /path/to/table/xyz is snapshottable and already has snapshots
I noticed that in Hive it works; apparently this was implemented years ago (https://issues.apache.org/jira/browse/HIVE-11667).
I'm really surprised Spark can't handle that. Any ideas?
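As far as I know Spark will not delete a snapshottable directory for you, so the snapshots have to be removed first (by you or an HDFS admin) and only then can the table be truncated or overwritten. A rough sketch of doing that from the same Spark application is below, using the Hadoop FileSystem API; the table path is a placeholder for the table's actual LOCATION, and deleting snapshots needs the appropriate HDFS permissions.

import org.apache.hadoop.fs.{FileSystem, Path}

// Placeholder: replace with the table's actual LOCATION on HDFS.
val tablePath = new Path("/path/to/table/xyz")
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

// Existing snapshots show up under the .snapshot subdirectory of a snapshottable dir.
val snapshotDir = new Path(tablePath, ".snapshot")
if (fs.exists(snapshotDir)) {
  fs.listStatus(snapshotDir).foreach { s =>
    fs.deleteSnapshot(tablePath, s.getPath.getName) // drop each snapshot by name
  }
}

// With the snapshots gone, the truncate/overwrite should no longer be blocked.
spark.sql("TRUNCATE TABLE xyz")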
I have an issue with the way Zeppelin caches tables. We update the data in the Glue Data Catalog in real time, so when we use Spark to query a partition that was recently updated, we sometimes get the following error:
org.apache.spark.sql.execution.datasources.FileDownloadException: Failed to download file path: s3://bucket/prefix/partition.snappy.parquet, range: 0-16165503, partition values: [empty row], isDataPresent: false, eTag: 53ea26b5ecc9a194efe5163f3c297800-1
This can be solved by issuing REFRESH TABLE <table_name> or by restarting the Spark interpreter from the Zeppelin UI, but it might just as well be the retry that fixes it rather than the cache being cleared.
One solution might be a scheduled query that refreshes all tables at a given time, but this would be highly inefficient.
Thanks!
Please try spark.sql("REFRESH TABLE {db}.{table}") before querying.
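For example, a small sketch of wrapping the query so it refreshes the table and retries once when the first attempt fails; the table name and query below are placeholders, and spark is the interpreter's active SparkSession.

// Refresh the table's cached file listing and retry once if the
// first attempt fails on stale metadata.
def queryWithRefresh(table: String, query: String) =
  try {
    spark.sql(query).collect()
  } catch {
    case _: Exception =>
      spark.catalog.refreshTable(table) // invalidates cached files/metadata for the table
      spark.sql(query).collect()
  }

queryWithRefresh("db.events", "SELECT count(*) FROM db.events WHERE dt = '2021-01-01'")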
I am trying to run a Spark job written in Java on the Spark cluster to load records as a DataFrame into a Hive table I created.
df.write().mode("overwrite").insertInto("dbname.tablename");
Although the table and database exist in Hive, it throws the error below:
org.apache.spark.sql.AnalysisException: Table or view not found: dbname.tablename, the database dbname doesn't exist.;
I also tried reading from a different, existing Hive table, thinking there might have been an issue with my table creation.
I also checked that my user has permission on the HDFS folder where Hive stores the data.
It all looks fine; I'm not sure what the issue could be.
Please suggest.
Thanks
I think it is searching for that table in Spark's default catalog instead of in Hive.
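That matches what I have seen: without Hive support, the session resolves names against Spark's default in-memory catalog, so the Hive database looks like it doesn't exist. A minimal sketch is below (in Scala for brevity; the Java builder chain is the same), assuming hive-site.xml is on the classpath; the app name and input path are placeholders.

import org.apache.spark.sql.SparkSession

// Build the session with Hive support so dbname.tablename is resolved
// against the Hive metastore instead of Spark's default catalog.
val spark = SparkSession.builder()
  .appName("load-into-hive") // placeholder app name
  .enableHiveSupport()
  .getOrCreate()

val df = spark.read.parquet("/path/to/input") // placeholder input
df.write.mode("overwrite").insertInto("dbname.tablename")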
I am having an issue with the PySpark SQL module. After multiple transformations, I ran a Spark job that saved the data as Parquet files into a partitioned Hive table.
The data loads into Hive successfully and I can query it there, but when I try to query the same data from Spark it says the file path doesn't exist:
java.io.FileNotFoundException: File hdfs://localhost:8020/data/path/of/partition partition=15f244ee8f48a2f98539d9d319d49d9c does not exist
The partition mentioned in the error above is an old partition value that doesn't even exist any more.
I have since run the Spark job, which populates a new partition value.
I searched for solutions, but all I can find is people saying there was no such issue in Spark 1.4 and that there is one in 1.6.
Can someone please suggest a solution for this problem?
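For what it's worth, the stale-metadata remedies discussed above usually apply here too: refresh the table, and if the partition list itself is out of date, repair it, before querying again. A sketch is below (shown in Scala to match the other snippets; the PySpark calls are the same), with a placeholder table name.

// Drop Spark's cached file listing for the table, then re-sync the
// partition list with what is actually on HDFS before querying again.
spark.sql("REFRESH TABLE mydb.mytable")
spark.sql("MSCK REPAIR TABLE mydb.mytable") // or spark.catalog.recoverPartitions in newer versions

spark.sql("SELECT * FROM mydb.mytable LIMIT 10").show()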
Hi, I am relatively new to Hive and HDFS, so apologies in advance if I am not wording this correctly.
I have used Microsoft Azure to create a virtual machine, which I then log into using PuTTY and the Ambari sandbox.
In Ambari I am using Hive; all is working fine, but I am having major issues with memory allocation.
When I drop a table in Hive, I then go into my 'Hive View' and delete the table from the trash folder. However, this frees up no memory within HDFS.
The table is now gone from my Hive database and also from the trash folder, but no memory has been freed.
Is there somewhere else I should be deleting the table from?
Thanks in advance.
Based on your description, as @DuduMarkovitz said, I also don't know what HDFS memory you mean, but I think what you are referring to is the table's data files on HDFS.
In my experience, the table you dropped in Hive is probably an external table, not an internal table. You can see the relevant behaviour below, quoted from the Hive official documentation on external tables.
External Tables
The EXTERNAL keyword lets you create a table and provide a LOCATION so that Hive does not use a default location for this table. This comes in handy if you already have data generated. When dropping an EXTERNAL table, data in the table is NOT deleted from the file system.
For the difference between internal and external tables, you can refer to here.
So if you want to reclaim the external table's data from HDFS after dropping the table, you need to remove it manually with the command below.
hadoop fs -rm -f -r <your-hdfs-path-url>/apps/hive/warehouse/<database name>/<table-name>
Hope it helps.
Try the DESCRIBE FORMATTED <table_name> command. It should show you the table's location in HDFS. Check whether that location is empty.
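If you want to check that location from code rather than by eye, a rough sketch using the Hadoop FileSystem API is below; the path is a placeholder, so use the Location value that DESCRIBE FORMATTED reports.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

// Placeholder: use the Location reported by DESCRIBE FORMATTED.
val location = new Path("hdfs:///apps/hive/warehouse/mydb.db/mytable")
val fs = FileSystem.get(location.toUri, new Configuration())

// List whatever is still sitting under the table's directory.
val files: Array[FileStatus] =
  if (fs.exists(location)) fs.listStatus(location) else Array.empty
println(s"${files.length} entries under $location")
files.foreach(f => println(s"${f.getLen} bytes  ${f.getPath}"))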