I am trying to load data into Spark from HBase 1.1 using phoenix-spark 4.9.0 on Spark 2.1, but it fails with the following error:
>>> df = spark.read.format('org.apache.phoenix.spark').option('table', 'namespace.table').option('zkUrl', '10.0.1.1:2181').load()
: java.sql.SQLException: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2432)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2352)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:76)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2352)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:232)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:147)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:98)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:57)
at org.apache.phoenix.mapreduce.util.ConnectionUtil.getInputConnection(ConnectionUtil.java:45)
at org.apache.phoenix.mapreduce.util.PhoenixConfigurationUtil.getSelectColumnMetadataList(PhoenixConfigurationUtil.java:279)
at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:114)
at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:60)
at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
Any clues, anyone?
Reading delta format data using spark
spark.sql("select * from delta.`/mnt/data/test`").createOrReplaceTempView("test")
The test view is created in the Spark program and I can use it in joins. The program works fine, and I can get the row count of the view:
spark.sql("select count(*) from test").show(false)
+--------+
|count(1)|
+--------+
|551 |
+--------+
But I am also getting the error logs below:
21/08/14 13:55:52 ERROR RetryingHMSHandler: NoSuchObjectException(message:There is no database named delta)
at org.apache.hadoop.hive.metastore.ObjectStore.getMDatabase(ObjectStore.java:487)
at org.apache.hadoop.hive.metastore.ObjectStore.getDatabase(ObjectStore.java:498)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy47.getDatabase(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:796)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at com.sun.proxy.$Proxy48.get_database(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getDatabase(HiveMetaStoreClient.java:949)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy49.getDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.getDatabase(Hive.java:1165)
at org.apache.hadoop.hive.ql.metadata.Hive.databaseExists(Hive.java:1154)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply$mcZ$sp(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$databaseExists$1.apply(HiveClientImpl.scala:412)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:331)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:239)
at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:231)
at org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:280)
at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:231)
at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:314)
at org.apache.spark.sql.hive.client.HiveClientImpl.databaseExists(HiveClientImpl.scala:411)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:279)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:144)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:111)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:142)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:372)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:358)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:140)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:278)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:78)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:265)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:767)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:692)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:730)
I don't know why these logs are being written. How can I get rid of them?
Platform: Azure Databricks
Databricks Runtime Version: 6.4 Extended Support
Thanks
When saving a DataFrame as a table in Azure Databricks, I get an error:
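// None (instead of null) keeps the DepartmentID column typed as Option[Int].
// Outside a Databricks notebook this also needs: import spark.implicits._
val employeesDf = Seq(
("Rafferty", Some(31)), ("Jones", Some(33)), ("Heisenberg", Some(33)),
("Robinson", Some(34)), ("Smith", Some(34)), ("Williams", None)
).toDF("LastName","DepartmentID")
employeesDf.write.format("parquet").mode("overwrite").saveAsTable("employees_table")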
org.apache.spark.sql.AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:javax.jdo.JDOUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:549)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) ...
Additionally, I get the same error when running an example notebook from Databricks that creates a table from a path on DBFS:
%sql
DROP TABLE IF EXISTS diamonds;
CREATE TABLE diamonds
USING csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true")
Error in SQL statement: AnalysisException:
org.apache.hadoop.hive.ql.metadata.HiveException:
MetaException(message:javax.jdo.JDOUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. at
org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:549)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:732)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:558)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy34.createTable(Unknown Source) at
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613) at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:528)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:322)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:266)
at
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
at
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:286)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.createTable(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:554)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:461)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:325)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:141)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:104)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:139)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:345)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:331)
at
com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:137)
at
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:99)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:349)
at
com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:144)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:118)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:81)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3424)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3419)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:99)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:228)
at
org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:85)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:158)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3419)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:205) at
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) at
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:696) at
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:707) at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:87)
at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:33)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392) at
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296) at
com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:33)
at
com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:136)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303)
at
com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at
com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47)
at
com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47)
at
com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at scala.util.Try$.apply(Try.scala:192) at
com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586)
at
com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477)
at
com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330)
at
com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216)
at java.lang.Thread.run(Thread.java:748) NestedThrowablesStackTrace:
Table "partition_keys" has been specified with a primary-key to
include column "TBL_ID" but this column is not found in the table.
Please check your column specification.
org.datanucleus.exceptions.NucleusUserException: Table
"partition_keys" has been specified with a primary-key to include
column "TBL_ID" but this column is not found in the table. Please
check your column specification. at
org.datanucleus.store.rdbms.table.ElementContainerTable.applyUserPrimaryKeySpecification(ElementContainerTable.java:217)
at
org.datanucleus.store.rdbms.table.CollectionTable.initialize(CollectionTable.java:240)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.initializeClassTables(RDBMSStoreManager.java:3283)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3185)
at
org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
at
org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
at
org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
at
org.datanucleus.store.rdbms.RDBMSStoreManager.getPropertiesForGenerator(RDBMSStoreManager.java:2045)
at
org.datanucleus.store.AbstractStoreManager.getStrategyValue(AbstractStoreManager.java:1365)
at
org.datanucleus.ExecutionContextImpl.newObjectId(ExecutionContextImpl.java:3827)
at
org.datanucleus.state.JDOStateManager.setIdentity(JDOStateManager.java:2571)
at
org.datanucleus.state.JDOStateManager.initialiseForPersistentNew(JDOStateManager.java:513)
at
org.datanucleus.state.ObjectProviderFactoryImpl.newForPersistentNew(ObjectProviderFactoryImpl.java:232)
at
org.datanucleus.ExecutionContextImpl.newObjectProviderForPersistentNew(ExecutionContextImpl.java:1414)
at
org.datanucleus.ExecutionContextImpl.persistObjectInternal(ExecutionContextImpl.java:2218)
at
org.datanucleus.ExecutionContextImpl.persistObjectWork(ExecutionContextImpl.java:2065)
at
org.datanucleus.ExecutionContextImpl.persistObject(ExecutionContextImpl.java:1913)
at
org.datanucleus.ExecutionContextThreadedImpl.persistObject(ExecutionContextThreadedImpl.java:217)
at
org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:727)
at
org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:752)
at
org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:719)
at sun.reflect.GeneratedMethodAccessor441.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:108)
at com.sun.proxy.$Proxy32.createTable(Unknown Source) at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1261)
at
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1294)
at sun.reflect.GeneratedMethodAccessor439.invoke(Unknown Source) at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105)
at
com.sun.proxy.$Proxy33.create_table_with_environment_context(Unknown
Source) at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:558)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:547)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy34.createTable(Unknown Source) at
org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:613) at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply$mcV$sp(HiveClientImpl.scala:528)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createTable$1.apply(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:322)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:230)
at
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$retryLocked$1.apply(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.synchronizeOnObject(HiveClientImpl.scala:266)
at
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:222)
at
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:305)
at
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:526)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:286)
at
org.apache.spark.sql.hive.client.PoolingHiveClient$$anonfun$createTable$1.apply(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.withHiveClient(PoolingHiveClient.scala:112)
at
org.apache.spark.sql.hive.client.PoolingHiveClient.createTable(PoolingHiveClient.scala:285)
at
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:554)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$createDataSourceTable(HiveExternalCatalog.scala:461)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply$mcV$sp(HiveExternalCatalog.scala:325)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createTable$1.apply(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:141)
at
org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:104)
at
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:139)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:345)
at
com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:331)
at
com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:34)
at
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:137)
at
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:298)
at
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:99)
at
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:349)
at
com.databricks.sql.DatabricksSessionCatalog.createTable(DatabricksSessionCatalog.scala:144)
at
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:118)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:72)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:70)
at
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:81)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:205)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3424)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:3419)
at
org.apache.spark.sql.execution.SQLExecution$$anonfun$withCustomExecutionEnv$1.apply(SQLExecution.scala:99)
at
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:228)
at
org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:85)
at
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:158)
at
org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$withAction(Dataset.scala:3419)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:205) at
org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89) at
org.apache.spark.sql.SparkSession.sql(SparkSession.scala:696) at
org.apache.spark.sql.SQLContext.sql(SQLContext.scala:707) at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:87)
at
com.databricks.backend.daemon.driver.SQLDriverLocal$$anonfun$1.apply(SQLDriverLocal.scala:33)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.immutable.List.foreach(List.scala:392) at
scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.immutable.List.map(List.scala:296) at
com.databricks.backend.daemon.driver.SQLDriverLocal.executeSql(SQLDriverLocal.scala:33)
at
com.databricks.backend.daemon.driver.SQLDriverLocal.repl(SQLDriverLocal.scala:136)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:323)
at
com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$8.apply(DriverLocal.scala:303)
at
com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at
com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:47)
at
com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at
com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:47)
at
com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:303)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at
com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:591)
at scala.util.Try$.apply(Try.scala:192) at
com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:586)
at
com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:477)
at
com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:544)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:383)
at
com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:330)
at
com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:216)
at java.lang.Thread.run(Thread.java:748) );
I'm able to run the same query without any issue.
This issue looks strange. For a deeper investigation and immediate assistance on this issue, if you have a support plan you may file a support ticket.
For more details, refer to the "Azure Databricks Quickstart guide".
This error is due to an incorrect configuration of the external metastore.
I am querying Apache Drill from within Apache Spark. My question is: how can I send SQL commands other than "select * from" from Spark to Drill? By default, Spark wraps whatever I pass inside a "select * from". Also, when I query a schema other than dfs, I get a NullPointerException. Please help!
My Spark version is 2.2.0.
Here is my code:
1. schema = dfs:
dataframe_mysql = spark.read.format("jdbc").option("url", "jdbc:drill:zk=%s;schema=%s;" % (foreman,schema)).option("driver","org.apache.drill.jdbc.Driver").option("dbtable","\"/user/titanic_data/test.csv\"").load()
2. schema = MySQL:
dataframe_mysql = spark.read.format("jdbc").option("url", "jdbc:drill:zk=%s;schema=MySQL;" % (foreman)).option("driver","org.apache.drill.jdbc.Driver").option("dbtable","MySQL.\"spark3\"").load()
This is the complete error:
Name: org.apache.toree.interpreter.broker.BrokerException
Message: Py4JJavaError: An error occurred while calling o40.load.
: java.sql.SQLException: Failed to create prepared statement: SYSTEM ERROR: NullPointerException
[Error Id: d1e4b310-f4df-4e7c-90ae-983cc5c89f94 on inpunpclx1825e.kih.kmart.com:31010]
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newServerPreparedStatement(DrillJdbc41Factory.java:147)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newPreparedStatement(DrillJdbc41Factory.java:108)
at org.apache.drill.jdbc.impl.DrillJdbc41Factory.newPreparedStatement(DrillJdbc41Factory.java:50)
at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:278)
at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareStatement(DrillConnectionImpl.java:389)
at oadd.org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:119)
at org.apache.drill.jdbc.impl.DrillConnectionImpl.prepareStatement(DrillConnectionImpl.java:422)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:113)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:47)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError('An error occurred while calling o40.load.\n', JavaObject id=o41), <traceback object at 0x7f00106d6488>)
StackTrace: org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
org.apache.toree.interpreter.broker.BrokerState$$anonfun$markFailure$1.apply(BrokerState.scala:163)
scala.Option.foreach(Option.scala:257)
org.apache.toree.interpreter.broker.BrokerState.markFailure(BrokerState.scala:162)
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
py4j.Gateway.invoke(Gateway.java:280)
py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
py4j.commands.CallCommand.execute(CallCommand.java:79)
py4j.GatewayConnection.run(GatewayConnection.java:214)
java.lang.Thread.run(Thread.java:748)
I have changed the default Drill identifier quote from backticks (``) to double quotes ("") so that there is no identifier-quoting mismatch between Spark and Drill.
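On the first part of the question: Spark's JDBC source treats the dbtable value as a table expression, and it accepts a parenthesised subquery with an alias there, so that is one way to send something other than a bare table name. The sketch below only builds the option values. The helper name drill_jdbc_options, the alias "t", and the host "zk-host:2181" are hypothetical, and whether Drill accepts the wrapped statement for a given storage plugin is an assumption to verify.

```python
def drill_jdbc_options(foreman, schema, query, alias="t"):
    """Build options for spark.read.format("jdbc") using a subquery as dbtable.

    Hypothetical helper: the URL and driver come from the question above;
    the "(query) AS alias" form is what Spark's JDBC source accepts as dbtable.
    """
    return {
        "url": "jdbc:drill:zk=%s;schema=%s;" % (foreman, schema),
        "driver": "org.apache.drill.jdbc.Driver",
        # Strip a trailing semicolon so the wrapped statement stays valid.
        "dbtable": "(%s) AS %s" % (query.strip().rstrip(";"), alias),
    }


opts = drill_jdbc_options(
    "zk-host:2181",  # placeholder ZooKeeper address
    "dfs",
    'SELECT * FROM "/user/titanic_data/test.csv" LIMIT 10',
)
# With a live session: df = spark.read.format("jdbc").options(**opts).load()
```

Spark still wraps this expression in its own SELECT when resolving the schema, so this changes what is inside the wrapper rather than removing it.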
When I read a specific file it works:
val filePath= "s3n://bucket_name/f1/f2/avro/dt=2016-10-19/hr=19/000000"
val df = spark.read.avro(filePath)
But if I point to a folder to read date-partitioned data, it fails:
val filePath="s3n://bucket_name/f1/f2/avro/dt=2016-10-19/"
I get this error:
Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/f1%2Ff2%2Favro%2Fdt%3D2016-10-19' - ResponseCode=403, ResponseMessage=Forbidden
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.handleServiceException(Jets3tNativeFileSystemStore.java:245)
at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.retrieveMetadata(Jets3tNativeFileSystemStore.java:119)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at org.apache.hadoop.fs.s3native.$Proxy7.retrieveMetadata(Unknown Source)
at org.apache.hadoop.fs.s3native.NativeS3FileSystem.getFileStatus(NativeS3FileSystem.java:414)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:374)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:364)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:364)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
at com.databricks.spark.avro.package$AvroDataFrameReader$$anonfun$avro$2.apply(package.scala:34)
at BasicS3Avro$.main(BasicS3Avro.scala:55)
at BasicS3Avro.main(BasicS3Avro.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Am I missing anything?
What does the newer, maintained s3a client report?
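A minimal sketch of trying that: rewrite the path scheme from s3n:// to s3a:// and read again. The helper name to_s3a is hypothetical. It assumes the hadoop-aws module is on the classpath and that credentials are supplied through the usual fs.s3a.access.key and fs.s3a.secret.key Hadoop properties; it does not by itself fix a permissions problem.

```python
def to_s3a(path):
    """Rewrite an s3n:// URI to the s3a:// scheme; leave other paths unchanged."""
    prefix = "s3n://"
    if path.startswith(prefix):
        return "s3a://" + path[len(prefix):]
    return path


# The folder path from the question, switched to the s3a client.
file_path = to_s3a("s3n://bucket_name/f1/f2/avro/dt=2016-10-19/")
```

If s3a also returns 403 for the folder but not for the file, that would suggest the credentials can read objects but cannot list or HEAD the prefix.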
Installing cassandra-spark-connector, but getting an error when creating the SparkContext
Please help. I am following the guide - https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md
Env - Spark 1.0.1, Scala 2.10.4
But I get the following error message when I create the SparkContext. The last line says all masters are unresponsive, giving up. The master is still running.
My steps are:
./sbin/start-all.sh - starts all workers successfully
MASTER=spark://spark-master-hostname:7077 ./bin/spark-shell - to launch Spark on the master
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext
scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._
scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host","cassandra-host-ip")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@9f073
scala> val sc = new SparkContext("spark://spark-master-ipaddress:7077", "test", conf)
14/07/29 12:18:23 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191)
at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:99)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:223)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:97)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $line15.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $line15.$read$$iwC$$iwC$$iwC.<init>(<console>:26)
at $line15.$read$$iwC$$iwC.<init>(<console>:28)
at $line15.$read$$iwC.<init>(<console>:30)
at $line15.$read.<init>(<console>:32)
at $line15.$read$.<init>(<console>:36)
at $line15.$read$.<clinit>(<console>)
at $line15.$eval$.<init>(<console>:7)
at $line15.$eval$.<clinit>(<console>)
at $line15.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/07/29 12:18:23 WARN AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@dd53c8a: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191)
at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:99)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:223)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:97)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $line15.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $line15.$read$$iwC$$iwC$$iwC.<init>(<console>:26)
at $line15.$read$$iwC$$iwC.<init>(<console>:28)
at $line15.$read$$iwC.<init>(<console>:30)
at $line15.$read.<init>(<console>:32)
at $line15.$read$.<init>(<console>:36)
at $line15.$read$.<clinit>(<console>)
at $line15.$eval$.<init>(<console>:7)
at $line15.$eval$.<clinit>(<console>)
at $line15.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4353d65f
scala> 14/07/29 12:19:24 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/07/29 12:19:24 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Step 1 : To load the connector into the Spark Shell, start the shell with this command:
../bin/spark-shell --jars ~/apps/spark-1.2/jars/spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar
Step 2 : Connect the Spark Context to the Cassandra cluster. First, stop the default context:
sc.stop
Step 3 : Import the necessary classes.
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
Step 4 : Make a new SparkConf with the Cassandra connection details:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
Step 5 : Create a new Spark Context:
val sc = new SparkContext(conf)
You now have a new SparkContext which is connected to your Cassandra cluster.
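The "Address already in use" warning on port 4040 in the question is consistent with the shell's default context still holding the Spark UI port, which is why Step 2 stops it first. As a rough diagnostic sketch (not part of the connector; the helper name port_in_use is hypothetical), this checks from Python whether something is already listening on a local port:

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return s.connect_ex((host, port)) == 0


# Example: port_in_use(4040) reports whether a Spark UI (or anything else)
# is already bound to the default UI port on this machine.
```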
Have you tried using spark-packages?
Spark Cassandra Connector on spark-packages.org
It boils down to:
$SPARK_HOME/bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.10
where you need to use the correct version for your version of Spark. This should set up everything needed automatically.
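To make the version choice explicit, here is a small sketch that assembles the --packages coordinate from a connector version and a Scala binary version. The naming pattern is inferred from the single example above, and the helper name connector_package is hypothetical; check the spark-packages listing for the releases that actually exist.

```python
def connector_package(connector_version, scala_binary_version):
    """Build a spark-packages coordinate such as the one shown above."""
    return "datastax:spark-cassandra-connector:%s-s_%s" % (
        connector_version,
        scala_binary_version,
    )


coord = connector_package("2.0.0-M2", "2.10")
command = "$SPARK_HOME/bin/spark-shell --packages " + coord
```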