I am getting the following exception while executing Spark jobs:
org.datanucleus.exceptions.NucleusDataStoreException: Exception thrown obtaining schema column information from datastore
This is caused by:
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Table 'hive_metastore.DELETEME1530184568175' doesn't exist
I had the same problem:
22/04/10 04:10:30 ERROR Datastore: Error thrown executing CREATE TABLE `DELETEME1649563830414`
(
`UNUSED` INTEGER NOT NULL
) ENGINE=INNODB : CREATE command denied to user 'someuser'@'xx.xx.xx.xx' for table 'DELETEME1649563830414'
Since it was for development, I just granted all privileges:
mysql> GRANT ALL PRIVILEGES ON *.* TO 'someuser'@'%';
While loading data from a Sybase DB in AWS Glue, I encounter an error:
Py4JJavaError: An error occurred while calling o261.load.
: java.sql.SQLException: The identifier that starts with '__SPARK_GEN_JDBC_SUBQUERY_NAME' is too long. Maximum length is 30.
The code I use is:
df = (spark.read.format("jdbc")
    .option("driver", "net.sourceforge.jtds.jdbc.Driver")
    .option("url", jdbc_url)
    .option("query", query)
    .option("user", db_username)
    .option("password", db_password)
    .load())
Is there any way to set this identifier to a custom, shorter one? Interestingly, I am able to load all the data from a particular table by replacing the query option with option("dbtable", table), but invoking a custom query that way is impossible.
Best Regards
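A possible workaround, sketched below and not verified against Sybase/jTDS: the dbtable option also accepts a parenthesized subquery, and supplying your own short alias avoids the generated __SPARK_GEN_JDBC_SUBQUERY_NAME identifier.
# Sketch of a possible workaround (untested against Sybase): pass the query
# through "dbtable" as a subquery with a short explicit alias, so Spark does
# not generate the long __SPARK_GEN_JDBC_SUBQUERY_NAME alias itself.
df = (spark.read.format("jdbc")
      .option("driver", "net.sourceforge.jtds.jdbc.Driver")
      .option("url", jdbc_url)
      .option("dbtable", f"({query}) q")  # alias "q" stays under the 30-char limit
      .option("user", db_username)
      .option("password", db_password)
      .load())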
I have a parquet table test_table created in Hive. The location of one of its partitions is '/user/hive/warehouse/prod.db/test_table/date_id=20210701'.
I created a view based on this table:
create view prod.test_table_vw as
select date_id, src, telephone_number, action_date, duration from prod.test_table
Then I granted the SELECT privilege to a role:
GRANT SELECT ON TABLE prod.test_table_vw TO ROLE data_analytics;
Then a user with this role tries to query this data using spark.sql() from pyspark:
sql = spark.sql(f"""SELECT DISTINCT telephone_number
FROM prod.test_table_vw
WHERE date_id=20210701""")
sql.show(5)
This code returns permission denied error:
Py4JJavaError: An error occurred while calling o138.collectToPython.
: org.apache.hadoop.security.AccessControlException: Permission denied: user=keytabuser, access=READ_EXECUTE, inode="/user/hive/warehouse/prod.db/test_table/date_id=20210701":hive:hive:drwxrwx--x
I can't grant SELECT rights on the table itself due to some sensitive fields in it; that's why I created a view with a limited list of columns.
Questions:
Why does Spark ignore Hive's SELECT permission on the view, and how can this be resolved?
This is a Spark limitation:
When a Spark job accesses a Hive view, Spark must have privileges to read the data files in the underlying Hive tables. Currently, Spark cannot use fine-grained privileges based on the columns or the WHERE clause in the view definition. If Spark does not have the required privileges on the underlying data files, a SparkSQL query against the view returns an empty result set, rather than an error.
Reference link.
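One workaround, offered here only as a sketch (prod.test_table_safe is a placeholder name, not part of the original setup): since Spark enforces access at the file level, materialize the view's non-sensitive columns into a separate table whose files the role is allowed to read, then grant SELECT on that table from Hive as before.
# Hedged sketch: copy only the non-sensitive columns into a table whose
# underlying files the data_analytics role can read. The copy must be
# refreshed whenever the base table changes.
spark.sql("""
    CREATE TABLE prod.test_table_safe STORED AS PARQUET AS
    SELECT date_id, src, telephone_number, action_date, duration
    FROM prod.test_table
""")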
Environment
Spark 3.0.0
Hive metastore (standalone) 3.0.0
MySQL 8 as the metastore DB
Problem
Every time I try to drop a database in the metastore via Spark, I get an AnalysisException, and I don't know what is causing it or whether the drop operation is succeeding in its entirety.
Example
spark.sql(f"CREATE DATABASE IF NOT EXISTS myDb LOCATION 'shared-metastore-location/myDb.db'")
################
# DB creation succeeds and I can view the db in the metastore, add tables etc
################
spark.sql(f"DROP DATABASE myDb CASCADE")
################
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Unable to clean up java.sql.SQLException: The table does not comply with the requirements by an external plugin.\n\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)\n\tat com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:97)\n\tat com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)\n\tat com.mysql.cj.jdbc.StatementImpl.executeUpdateInternal(StatementImpl.java:1335)\n\tat com.mysql.cj.jdbc.StatementImpl.executeLargeUpdate(StatementImpl.java:2108)\n\tat com.mysql.cj.jdbc.StatementImpl.executeUpdate(StatementImpl.java:1245)\n\tat com.zaxxer.hikari.pool.ProxyStatement.executeUpdate(ProxyStatement.java:117)\n\tat com.zaxxer.hikari.pool.HikariProxyStatement.executeUpdate(HikariProxyStatement.java)\n\tat org.apache.hadoop.hive.metastore.txn.TxnHandler.cleanupRecords(TxnHandler.java:2741)\n\tat org.apache.hadoop.hive.metastore.AcidEventListener.onDropDatabase(AcidEventListener.java:52)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier$21.notify(MetaStoreListenerNotifier.java:85)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:264)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:326)\n\tat org.apache.hadoop.hive.metastore.MetaStoreListenerNotifier.notifyEvent(MetaStoreListenerNotifier.java:364)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database_core(HiveMetaStore.java:1537)\n\tat org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_database(HiveMetaStore.java:1575)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)\n\tat org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)\n\tat com.sun.proxy.$Proxy32.drop_database(Unknown Source)\n\tat org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:14352)\n\tat org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$drop_database.getResult(ThriftHiveMetastore.java:14336)\n\tat org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:111)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:107)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)\n\tat org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:119)\n\tat org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)\n\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\n\tat java.lang.Thread.run(Thread.java:748)\n);'
Despite the exception, the database does disappear after I run this code. And if I try to run the drop command a second time, I get a different exception saying that the database doesn't exist. But I have no idea whether the operation has succeeded in its entirety or whether it's leaving a mess behind. I'm not familiar enough with Hive to know what should be deleted in the metastore to completely remove a DB.
I seem to get the same result whether I have tables in the DB or not. I've also tried dropping the database without CASCADE, with the same result.
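One way to check for leftovers, sketched under the assumption of direct JDBC access to the metastore MySQL database (the URL and credentials below are placeholders): query the standard Hive 3 metastore tables such as DBS and TBLS for rows that still reference the dropped database.
# Hedged sketch: inspect the metastore's own MySQL tables for leftovers.
# Connection details are placeholders; DBS is a standard Hive 3 metastore
# table, and the metastore stores database names in lowercase.
metastore_url = "jdbc:mysql://metastore-host:3306/metastore"  # placeholder

leftover_dbs = (spark.read.format("jdbc")
                .option("url", metastore_url)
                .option("dbtable", "(SELECT DB_ID, NAME FROM DBS WHERE NAME = 'mydb') q")
                .option("user", "metastore_user")      # placeholder
                .option("password", "metastore_pass")  # placeholder
                .load())
leftover_dbs.show()  # an empty result means the database entry itself is gone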
When I run select * from hive.information_schema.columns; in the Presto client, I get this error:
Query 20170208_085534_00061_ny9tu failed: outputFormat should not be accessed from a null StorageFormat
However, it succeeds when I select from other tables in information_schema, such as select * from hive.information_schema.tables;
Can anybody help?
Thanks.
This is a bug caused by a table that has metadata we aren't expecting.
I scanned all the tables in Hive and found that some tables' InputFormat/OutputFormat is null. Presto returns the same error if I run DESCRIBE TABLENAME on any table whose InputFormat/OutputFormat is null.
reference
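A sketch of that scan, assuming a Spark session pointed at the same Hive metastore (the database name default is a placeholder): DESCRIBE FORMATTED exposes InputFormat/OutputFormat rows for Hive tables, so looping over the tables flags the broken ones.
# Hedged sketch: flag Hive tables whose InputFormat/OutputFormat is missing.
# "default" is a placeholder database name; native (non-Hive) Spark tables
# may not list these rows at all and would also be flagged.
for row in spark.sql("SHOW TABLES IN default").collect():
    table = f"default.{row.tableName}"
    details = spark.sql(f"DESCRIBE FORMATTED {table}").collect()
    formats = {r.col_name.strip(): r.data_type for r in details}
    if not formats.get("InputFormat") or not formats.get("OutputFormat"):
        print(f"{table} has a null InputFormat/OutputFormat")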
I have created the following schema to represent the association between a user and a set of threads ordered by their last message (which threads the user has read and which ones he has not):
CREATE TABLE table(user_id bigint, message_id bigint, thread_id bigint, read boolean, PRIMARY KEY(user_id, message_id)) WITH CLUSTERING ORDER BY (message_id DESC);
CREATE INDEX ON table(read);
After inserting some values, I try to run this query to get the most recent read or unread threads for a user:
SELECT thread_id, message_id FROM table WHERE user_id = ? AND message_id < ? AND read = ? LIMIT ?
The query works when run via cqlsh. However, when run through the DataStax client, we get a timeout exception on the client side, and on the server side the Cassandra log shows this exception:
ERROR [ReadStage:4190] 2013-12-10 13:18:03,579 CassandraDaemon.java (line 187) Exception in thread Thread[ReadStage:4190,5,main]
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1940)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at org.apache.cassandra.db.filter.SliceQueryFilter.start(SliceQueryFilter.java:261)
at org.apache.cassandra.db.index.composites.CompositesSearcher.makePrefix(CompositesSearcher.java:66)
at org.apache.cassandra.db.index.composites.CompositesSearcher.getIndexedIterator(CompositesSearcher.java:101)
at org.apache.cassandra.db.index.composites.CompositesSearcher.search(CompositesSearcher.java:53)
at org.apache.cassandra.db.index.SecondaryIndexManager.search(SecondaryIndexManager.java:537)
at org.apache.cassandra.db.ColumnFamilyStore.search(ColumnFamilyStore.java:1669)
at org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:109)
at org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1423)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1936)
... 3 more
Does anyone know what the problem is? Thanks!
This bug can now be tracked at https://issues.apache.org/jira/browse/CASSANDRA-6470
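Until the fix is available, one workaround sketch (using the DataStax Python driver; the contact point, keyspace, table name user_threads, and bind values are all placeholders, since the question's table name is clearly redacted): slice the partition by message_id alone, skipping the secondary-index path that triggers the bug, and filter the read flag client-side, over-fetching to compensate for the dropped predicate.
# Hedged workaround sketch: avoid the "read = ?" index lookup entirely and
# filter the boolean in the application. Over-fetch because LIMIT applies
# before the client-side read/unread filter.
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("my_keyspace")  # placeholders
stmt = session.prepare(
    "SELECT thread_id, message_id, read FROM user_threads "
    "WHERE user_id = ? AND message_id < ? LIMIT ?")
rows = session.execute(stmt, (42, 10000, 500))  # placeholder bind values
unread = [r for r in rows if not r.read][:50]   # keep the first 50 unread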