couldn't save table in azure databricks - azure

During saving dataframe to tables in Azure Databricks I get error,
val employeesDf = Seq(
("Rafferty", Some(31)), ("Jones", Some(33)), ("Heisenberg", Some(33)),
("Robinson", Some(34)), ("Smith", Some(34)), ("Williams", null)
).toDF("LastName","DepartmentID").write.format("parquet").mode("overwrite").saveAsTable("employ ees_table")
org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:javax.jdo.JDOUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. <div
class="ansiout"> at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:549)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) ...
Additionally, I am also getting an error during run an example notebook from databricks, in which create tables from path on dbfs
%sql
DROP TABLE IF EXISTS diamonds;
CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-
001/csv/ggplot2/diamonds.csv", header "true")
Error in SQL statement: AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:javax.jdo.JDOUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:549)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:558)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy34.createTable(Unknown Source) at
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613) at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:528)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:322)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:266)
at
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
at
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:286)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.createTable(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:554)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:461)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:325)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:141)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:104)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:139)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:345)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:331)
at
com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:137)
at
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:99)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:349)
at
com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:144)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:118)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:81)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3424)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3419)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:99)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:228)
at
org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:85)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:158)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3419)
at org.apache.spark.sql.Dataset.(Dataset.scala:205) at
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) at
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:696) at
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:707) at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:87)
at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:33)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392) at
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296) at
com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:33)
at
com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:136)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303)
at
com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at
com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47)
at
com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47)
at
com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at scala.util.Try$.apply(Try.scala:192) at
com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586)
at
com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477)
at
com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330)
at
com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216)
at java.lang.Thread.run(Thread.java:748) NestedThrowablesStackTrace:
Table "partition_keys" has been specified with a primary-key to
include column "TBL_ID" but this column is not found in the table.
Please check your column specification.
org.datanucleus.exceptions.NucleusUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. at
org.datanucleus.store.rdbms.table.ElementContainerTable.applyUserPrimaryKeySpecification(ElementContainerTable.java:217)
at
org.datanucleus.store.rdbms.table.CollectionTable.initialize(CollectionTable.java:240)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3283)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3185)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
at
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
at
org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2045)
at
org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1365)
at
org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3827)
at
org.datanucleus.state.JDOStateManager.setIdentity(JDOStateManager.java:2571)
at
org.datanucleus.state.JDOStateManager.initialiseForPersistentNew(JDOStateManager.java:513)
at
org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:232)
at
org.datanucleus.ExecutionContextImpl.newObjectProviderForPersistentNew(ExecutionContextImpl.java:1414)
at
org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2218)
at
org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:2065)
at
org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1913)
at
org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:727)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:558)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy34.createTable(Unknown Source) at
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613) at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:528)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:322)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:266)
at
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
at
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:286)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.createTable(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:554)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:461)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:325)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:141)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:104)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:139)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:345)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:331)
at
com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:137)
at
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:99)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:349)
at
com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:144)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:118)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:81)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3424)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3419)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:99)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:228)
at
org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:85)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:158)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3419)
at org.apache.spark.sql.Dataset.(Dataset.scala:205) at
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) at
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:696) at
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:707) at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:87)
at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:33)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392) at
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296) at
com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:33)
at
com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:136)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303)
at
com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at
com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47)
at
com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47)
at
com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at scala.util.Try$.apply(Try.scala:192) at
com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586)
at
com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477)
at
com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330)
at
com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216)
at java.lang.Thread.run(Thread.java:748) );

I'm able to run the same query without any issue.
This issue looks strange. For a deeper investigation and immediate assistance on this issue, if you have a support plan you may file a support ticket.
For more details, refer "Azure Databricks Quickstart guide".

This error is due to incorrect configuration of the external metastore

Related

Spark read delta table, getting NoSuchObjectException(message:There is no database named delta) error

Reading delta format data using spark
spark.sql("select * from delta.`/mnt/data/test`").createOrReplaceTempView("test")
test view creates in spark program and I can use this view in joining. Program works fine. I can get the count of view
spark.sql("select count(*) from test").show(false)
+--------+
|count(1)|
+--------+
|551 |
+--------+
But I am also getting below error logs
21/08/14 13:55:52 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named delta)
at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:487)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy47.getDatabase(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy48.get_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:949)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy49.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1165)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1154)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply$mcZ$sp(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:331)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:239)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:231)
at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:280)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:231)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:314)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:411)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:144)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:111)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:142)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:372)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:358)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:140)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:278)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:78)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:265)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:767)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:692)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:730)
I don't know why this logs are getting? How to get rid of it?
platform : Azure Databricks
Databricks Runtime Version: 6.4 Extended support
Thanks

Special characters in Dataproc partition columns

I'm using Spark 3.1.2 on Goole Dataproc image version 2.0.15-debian10 with Dataproc managed metastore version 3.1.2. The following snippet works fine with a GCS backed table mydb.mytable:
from pyspark.sql import Row
spark.createDataFrame([Row(x='a', y=1)]).write.saveAsTable('mydb.mytable',
mode='overwrite',
partitionBy=['x'])
However, when adding a special character to partition column:
spark.createDataFrame([Row(x='ก', y=1)]).write.saveAsTable('mydb.mytable',
mode='overwrite',
partitionBy=['x'])
the operation fails with the following exception:
21/08/03 10:44:27 ERROR hive.ql.metadata.Hive: MetaException(message:Exception thrown when executing query : SELECT DISTINCT 'org.apache.hadoop.hive.metastore.model.MPartition' AS `NUCLEUS_TYPE`,`A0`.`CREATE_TIME`,`A0`.`LAST_ACCESS_TIME`,`A0`.`PART_NAME`,`A0`.`PART_ID` FROM `PARTITIONS` `A0` LEFT OUTER JOIN `TBLS` `B0` ON `A0`.`TBL_ID` = `B0`.`TBL_ID` LEFT OUTER JOIN `DBS` `C0` ON `B0`.`DB_ID` = `C0`.`DB_ID` WHERE `B0`.`TBL_NAME` = ? AND `C0`.`NAME` = ? AND `A0`.`PART_NAME` = ? AND `C0`.`CTLG_NAME` = ?)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result$add_partitions_req_resultStandardScheme.read(ThriftHiveMetastore.java)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$add_partitions_req_result.read(ThriftHiveMetastore.java)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1911)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1898)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:627)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:173)
at com.sun.proxy.$Proxy46.add_partitions(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2336)
at com.sun.proxy.$Proxy46.add_partitions(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:2097)
at org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:555)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createPartitions$1(HiveClientImpl.scala:609)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:291)
at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:224)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:223)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:273)
at org.apache.spark.sql.hive.client.HiveClientImpl.createPartitions(HiveClientImpl.scala:602)
at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createPartitions$1(HiveExternalCatalog.scala:1007)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
at org.apache.spark.sql.hive.HiveExternalCatalog.createPartitions(HiveExternalCatalog.scala:989)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createPartitions(ExternalCatalogWithListener.scala:201)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createPartitions(SessionCatalog.scala:1050)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.$anonfun$addPartitions$1(ddl.scala:792)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.$anonfun$addPartitions$1$adapted(ddl.scala:774)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.addPartitions(ddl.scala:774)
at org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand.run(ddl.scala:672)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:192)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:108)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:106)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:131)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.createTable(DataFrameWriter.scala:753)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:727)
at org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:626)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
The data gets written correctly to GCS but updating the metastore fails. My guess is this is caused by using latin1 encoding in the managed metastore. Also, using spark.write.parquet(...) works (as the metastore is not used).
Is it possible to configure Dataproc to properly handle any UTF-8 values in partition columns? I would like to avoid using any encoding (like URL encoding) in the application logic.
The issue is that Hive Metastore's MySQL schema does not support that character:
PART_NAME varchar(767) CHARACTER SET latin1 COLLATE latin1_bin DEFAULT NULL,
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-schema-2.3.0.mysql.sql#L211
You could reach out to the Hive community to check whether a larger character set is OK: https://github.com/apache/hive#useful-mailing-lists

Zeppelin - Spark Interpreter Unable to create hive table by using CTAS (Create Table as Select ...) statement

I am using Zeppelin and trying to create a hive table from another hive table by using CTAS statement
But my query ends up with error always so the table is not getting created. Have found out few posts which says to modify zeppelin configuration but I cannot change any configuration as I don't have permission to do so.
The query which I had executed and the error that I get are given below :
%sql
create table student as select * from student_score
org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter
table. Invalid method name: 'alter_table_with_cascade' at
org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:500) at
org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484) at
org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668) at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.spark.sql.hive.client.Shim_v0_14.loadTable(HiveShim.scala:716)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply$mcV$sp(HiveClientImpl.scala:672)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:672)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$loadTable$1.apply(HiveClientImpl.scala:672)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:283)
at
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:229)
at
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:272)
at
org.apache.spark.sql.hive.client.HiveClientImpl.loadTable(HiveClientImpl.scala:671)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply$mcV$sp(HiveExternalCatalog.scala:741)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:739)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$loadTable$1.apply(HiveExternalCatalog.scala:739)
at
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:95)
at
org.apache.spark.sql.hive.HiveExternalCatalog.loadTable(HiveExternalCatalog.scala:739)
at
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult$lzycompute(InsertIntoHiveTable.scala:323)
at
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.sideEffectResult(InsertIntoHiveTable.scala:170)
at
org.apache.spark.sql.hive.execution.InsertIntoHiveTable.doExecute(InsertIntoHiveTable.scala:347)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at
org.apache.spark.sql.hive.execution.CreateHiveTableAsSelectCommand.run(CreateHiveTableAsSelectCommand.scala:92)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at
org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.(Dataset.scala:185) at
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64) at
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592) ... 47
elided Caused by: org.apache.thrift.TApplicationException: Invalid
method name: 'alter_table_with_cascade' at
org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at
org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_cascade(ThriftHiveMetastore.java:1374)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_cascade(ThriftHiveMetastore.java:1358)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:340)
at
org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.alter_table(SessionHiveMetaStoreClient.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
at com.sun.proxy.$Proxy25.alter_table(Unknown Source) at
org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:496)
... 93 more

Spark AVRO S3 read not working for partitioned data

When I read a specific file it works:
val filePath= "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/000000"
val df = spark.read.avro(filePath)
But if I point to a folder to read date partitioned data it fails:
val filePath="s3n://bucket_name/f1/f2/avro/dt=2016-10-19/"
I get this error:
Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/f1%2Ff2%2Favro%2Fdt%3D2016-10-19' - ResponseCode=403, ResponseMessage=Forbidden
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:374)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:364)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:364)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
at BasicS3Avro$.main(BasicS3Avro.scala:55)
at BasicS3Avro.main(BasicS3Avro.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Am I missing anything?
what does the newer, maintained, s3a client report?

titan-1.0.0-hadoop1 compatibilty issues with apache-cassandra-3.2

I am trying to connect titan-1.0.0-hadoop1 with apache-cassandra-3.2.
I am using following " titan-cassandra-myfile.properties ":
storage.cassandra-config-dir=titan-1.0.0- hadoop1/conf/cassandra/cassandra.yaml
storage.backend=cassandra
storage.hostname=localhost
When I use gremlin to execute following command:
g=TitanFactory.open("/path to titan/ titan-1.0.0-hadoop1/conf/titan-cassandra-myfile.properties");
I am getting following error:
java.lang.IllegalArgumentException: Could not instantiate implementation: com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:55)
at com.thinkaurelius.titan.diskstorage.Backend.getImplementationClass(Backend.java:473)
at com.thinkaurelius.titan.diskstorage.Backend.getStorageManager(Backend.java:407)
at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.(GraphDatabaseConfiguration.java:1320)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:94)
at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:62)
at com.thinkaurelius.titan.core.TitanFactory$open.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:110)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:122)
at groovysh_evaluate.run(groovysh_evaluate:3)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.codehaus.groovy.tools.shell.Interpreter.evaluate(Interpreter.groovy:69)
at org.codehaus.groovy.tools.shell.Groovysh.execute(Groovysh.groovy:185)
at org.codehaus.groovy.tools.shell.Shell.leftShift(Shell.groovy:119)
at org.codehaus.groovy.tools.shell.ShellRunner.work(ShellRunner.groovy:94)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$work(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.work(InteractiveShellRunner.groovy:123)
at org.codehaus.groovy.tools.shell.ShellRunner.run(ShellRunner.groovy:58)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.super$2$run(InteractiveShellRunner.groovy)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:90)
at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:324)
at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1207)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuperN(ScriptBytecodeAdapter.java:130)
at org.codehaus.groovy.runtime.ScriptBytecodeAdapter.invokeMethodOnSuper0(ScriptBytecodeAdapter.java:150)
at org.codehaus.groovy.tools.shell.InteractiveShellRunner.run(InteractiveShellRunner.groovy:82)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.console.Console.(Console.groovy:144)
at org.codehaus.groovy.vmplugin.v7.IndyInterface.selectMethod(IndyInterface.java:215)
at org.apache.tinkerpop.gremlin.console.Console.main(Console.groovy:303)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at com.thinkaurelius.titan.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:44)
... 42 more
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: Temporary failure in storage backend
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:572)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.(AstyanaxStoreManager.java:291)
... 47 more
Caused by: com.netflix.astyanax.connectionpool.exceptions.PoolTimeoutException: PoolTimeoutException: [host=localhost(127.0.0.1):9160, latency=10000(10000), attempts=1]Timed out waiting for connection
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.waitForConnection(SimpleHostConnectionPool.java:231)
at com.netflix.astyanax.connectionpool.impl.SimpleHostConnectionPool.borrowConnection(SimpleHostConnectionPool.java:198)
at com.netflix.astyanax.connectionpool.impl.RoundRobinExecuteWithFailover.borrowConnection(RoundRobinExecuteWithFailover.java:84)
at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:117)
at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:352)
at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateKeyspace(ThriftClusterImpl.java:321)
at com.netflix.astyanax.thrift.ThriftClusterImpl.addKeyspace(ThriftClusterImpl.java:294)
at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.AstyanaxStoreManager.ensureKeyspaceExists(AstyanaxStoreManager.java:567)
... 48 more
```
When I use same configurations with apache-cassandra-1.2.5 and titan-all-0.3.1 , the above config works and it creates a graph.
Please suggest some solution on this , whether is this a compatibility issue or I am doing something wrong.
Thanks,
Laxmikant
Titan 1.0 has been released using Cassandra 2.1.9. I'm almost sure that it's not compatible with Cassandra 3.x because there was a storage engine re-write
See here: https://github.com/thinkaurelius/titan/blob/titan10/pom.xml#L65

Resources