COSMOS DB write issue from Databricks Notebook - apache-spark

As per the Databricks docs (https://docs.databricks.com/data/data-sources/azure/cosmosdb-connector.html), I've downloaded the latest azure-cosmosdb-spark library (azure-cosmosdb-spark_2.4.0_2.11-2.1.2-uber.jar) and placed it in the libraries location on DBFS.
When trying to write data from a DataFrame to a Cosmos DB container, I get the error below. Any help would be much appreciated.
My Databricks runtime version is: 7.0 (includes Apache Spark 3.0.0, Scala 2.12)
Imports from the notebook:
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
import org.apache.spark.sql.types.{StructField, _}
import org.apache.spark.sql.functions._
import com.microsoft.azure.cosmosdb.spark.schema._
import com.microsoft.azure.cosmosdb.spark.CosmosDBSpark
import com.microsoft.azure.cosmosdb.spark.config._
val dtcCANWrite = Config(Map(
  "Endpoint" -> "NOT DISPLAYED",
  "Masterkey" -> "NOT DISPLAYED",
  "Database" -> "NOT DISPLAYED",
  "Collection" -> "NOT DISPLAYED",
  "preferredRegions" -> "NOT DISPLAYED",
  "Upsert" -> "true"
))
distinctCANDF.write.mode(SaveMode.Append).cosmosDB(dtcCANWrite)
Error:
at com.microsoft.azure.cosmosdb.spark.config.CosmosDBConfigBuilder.<init>(CosmosDBConfigBuilder.scala:31)
at com.microsoft.azure.cosmosdb.spark.config.Config$.apply(Config.scala:259)
at com.microsoft.azure.cosmosdb.spark.config.Config$.apply(Config.scala:240)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:7)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:69)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:71)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:73)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:75)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:77)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:79)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:81)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:83)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:85)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:87)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:89)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw$$iw.<init>(command-3649834446724317:91)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw$$iw.<init>(command-3649834446724317:93)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw$$iw.<init>(command-3649834446724317:95)
at line6d80624d7a774601af6eb962eb59453253.$read$$iw.<init>(command-3649834446724317:97)
at line6d80624d7a774601af6eb962eb59453253.$read.<init>(command-3649834446724317:99)
at line6d80624d7a774601af6eb962eb59453253.$read$.<init>(command-3649834446724317:103)
at line6d80624d7a774601af6eb962eb59453253.$read$.<clinit>(command-3649834446724317)
at line6d80624d7a774601af6eb962eb59453253.$eval$.$print$lzycompute(<notebook>:7)
at line6d80624d7a774601af6eb962eb59453253.$eval$.$print(<notebook>:6)
at line6d80624d7a774601af6eb962eb59453253.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:745)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1021)
at scala.tools.nsc.interpreter.IMain.$anonfun$interpret$1(IMain.scala:574)
at scala.reflect.internal.util.ScalaClassLoader.asContext(ScalaClassLoader.scala:41)
at scala.reflect.internal.util.ScalaClassLoader.asContext$(ScalaClassLoader.scala:37)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:41)
at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:600)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:570)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:215)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.$anonfun$repl$1(ScalaDriverLocal.scala:202)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:714)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:667)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:202)
at com.databricks.backend.daemon.driver.DriverLocal.$anonfun$execute$10(DriverLocal.scala:396)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:238)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:233)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:49)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:275)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:49)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:373)
at com.databricks.backend.daemon.driver.DriverWrapper.$anonfun$tryExecutingCommand$1(DriverWrapper.scala:653)
at scala.util.Try$.apply(Try.scala:213)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:645)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:486)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:598)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:391)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:337)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:219)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: scala.Product$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at com.microsoft.azure.cosmosdb.spark.config.CosmosDBConfigBuilder.<init>(CosmosDBConfigBuilder.scala:31)
at com.microsoft.azure.cosmosdb.spark.config.Config$.apply(Config.scala:259)
at com.microsoft.azure.cosmosdb.spark.config.Config$.apply(Config.scala:240)
... 60 more

Thanks for Rayan Ral's suggestion. The problem comes from a version conflict: the connector JAR (azure-cosmosdb-spark_2.4.0_2.11-2.1.2-uber.jar) is built for Spark 2.4 / Scala 2.11, while Databricks Runtime 7.0 runs Spark 3.0.0 / Scala 2.12. The ClassNotFoundException for scala.Product$class is the telltale symptom: those $class implementation classes exist only in Scala 2.11 bytecode.
The solution is to align the versions. In this case, downgrade the cluster from Spark 3.0.0 / Scala 2.12 to Spark 2.4.4 / Scala 2.11 to match the library.
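A quick sanity check before attaching any Scala library is to confirm that the cluster's Spark and Scala versions match the artifact's suffixes (the 2.4.0 and _2.11 in the JAR name). A minimal notebook cell for this, assuming only the built-in spark session:
import scala.util.Properties
// Both must agree with the attached connector artifact:
// azure-cosmosdb-spark_2.4.0_2.11-... needs Spark 2.4.x and Scala 2.11.x
println(s"Spark: ${spark.version}")
println(s"Scala: ${Properties.versionNumberString}")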

Related

Partitioned delta table failing to write checkpoints?

I'm using PySpark with Spark 3.2.1 and the Delta table package ("io.delta:delta-core_2.12:2.1.0").
I have partitioned my table by Code, Year and Month like below:
table.write.partitionBy("Code", "Year", "Month").format('delta').save(path)
When I run a merge on this table:
deltaTable.alias("t0").merge(
df.alias("t1"),
" t0.Code= t1.CodeAND "
" t0.Year = t1.Year AND "
" t0.Month = t1.Month AND "
" t0.Date = t1.Date"
).whenMatchedUpdate(
set={
...
}
).execute()
I got the following warning:
WARN DAGScheduler: Broadcasting large task binary with size 1861.8 KiB
Then I got the following error after some minutes:
py4j.protocol.Py4JJavaError: An error occurred while calling o2070.execute.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.ElementAt$.apply$default$3()Lscala/Option;
at org.apache.spark.sql.delta.CheckpointV2$.$anonfun$extractPartitionValues$1(Checkpoints.scala:727)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at org.apache.spark.sql.types.StructType.foreach(StructType.scala:102)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at org.apache.spark.sql.types.StructType.map(StructType.scala:102)
at org.apache.spark.sql.delta.CheckpointV2$.extractPartitionValues(Checkpoints.scala:724)
at org.apache.spark.sql.delta.Checkpoints$.buildCheckpoint(Checkpoints.scala:689)
at org.apache.spark.sql.delta.Checkpoints$.$anonfun$writeCheckpoint$1(Checkpoints.scala:531)
at org.apache.spark.sql.delta.metering.DeltaLogging.withDmqTag(DeltaLogging.scala:143)
at org.apache.spark.sql.delta.metering.DeltaLogging.withDmqTag$(DeltaLogging.scala:142)
at org.apache.spark.sql.delta.Checkpoints$.withDmqTag(Checkpoints.scala:460)
at org.apache.spark.sql.delta.Checkpoints$.writeCheckpoint(Checkpoints.scala:487)
at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles(Checkpoints.scala:361)
at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles$(Checkpoints.scala:359)
at org.apache.spark.sql.delta.DeltaLog.writeCheckpointFiles(DeltaLog.scala:63)
at org.apache.spark.sql.delta.Checkpoints.checkpointAndCleanUpDeltaLog(Checkpoints.scala:346)
at org.apache.spark.sql.delta.Checkpoints.checkpointAndCleanUpDeltaLog$(Checkpoints.scala:344)
at org.apache.spark.sql.delta.DeltaLog.checkpointAndCleanUpDeltaLog(DeltaLog.scala:63)
at org.apache.spark.sql.delta.Checkpoints.$anonfun$checkpoint$2(Checkpoints.scala:318)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:139)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:137)
at org.apache.spark.sql.delta.DeltaLog.recordFrameProfile(DeltaLog.scala:63)
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:132)
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
at org.apache.spark.sql.delta.DeltaLog.recordOperation(DeltaLog.scala:63)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:131)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:121)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:109)
at org.apache.spark.sql.delta.DeltaLog.recordDeltaOperation(DeltaLog.scala:63)
at org.apache.spark.sql.delta.Checkpoints.$anonfun$checkpoint$1(Checkpoints.scala:314)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.delta.metering.DeltaLogging.withDmqTag(DeltaLogging.scala:143)
at org.apache.spark.sql.delta.metering.DeltaLogging.withDmqTag$(DeltaLogging.scala:142)
at org.apache.spark.sql.delta.DeltaLog.withDmqTag(DeltaLog.scala:63)
at org.apache.spark.sql.delta.Checkpoints.checkpoint(Checkpoints.scala:313)
at org.apache.spark.sql.delta.Checkpoints.checkpoint$(Checkpoints.scala:312)
at org.apache.spark.sql.delta.DeltaLog.checkpoint(DeltaLog.scala:63)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.postCommit(OptimisticTransaction.scala:1097)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.postCommit$(OptimisticTransaction.scala:1092)
at org.apache.spark.sql.delta.OptimisticTransaction.postCommit(OptimisticTransaction.scala:101)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.liftedTree1$1(OptimisticTransaction.scala:750)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.$anonfun$commit$1(OptimisticTransaction.scala:691)
at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:139)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:137)
at org.apache.spark.sql.delta.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:101)
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:132)
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
at org.apache.spark.sql.delta.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:101)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:131)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:121)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:109)
at org.apache.spark.sql.delta.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:101)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.commit(OptimisticTransaction.scala:688)
at org.apache.spark.sql.delta.OptimisticTransactionImpl.commit$(OptimisticTransaction.scala:686)
at org.apache.spark.sql.delta.OptimisticTransaction.commit(OptimisticTransaction.scala:101)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$2(MergeIntoCommand.scala:363)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$2$adapted(MergeIntoCommand.scala:319)
at org.apache.spark.sql.delta.DeltaLog.withNewTransaction(DeltaLog.scala:221)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.$anonfun$run$1(MergeIntoCommand.scala:319)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:139)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:137)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordFrameProfile(MergeIntoCommand.scala:215)
at org.apache.spark.sql.delta.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:132)
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordOperation(MergeIntoCommand.scala:215)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:131)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:121)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:109)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.recordDeltaOperation(MergeIntoCommand.scala:215)
at org.apache.spark.sql.delta.commands.MergeIntoCommand.run(MergeIntoCommand.scala:317)
at io.delta.tables.DeltaMergeBuilder.$anonfun$execute$1(DeltaMergeBuilder.scala:230)
at org.apache.spark.sql.delta.util.AnalysisHelper.improveUnsupportedOpError(AnalysisHelper.scala:104)
at org.apache.spark.sql.delta.util.AnalysisHelper.improveUnsupportedOpError$(AnalysisHelper.scala:90)
at io.delta.tables.DeltaMergeBuilder.improveUnsupportedOpError(DeltaMergeBuilder.scala:122)
at io.delta.tables.DeltaMergeBuilder.execute(DeltaMergeBuilder.scala:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.lang.Thread.run(Thread.java:748)
This error only happens when the table is partitioned! If I don't use partitions on my folder, my code runs normally.
I'm using this code inside a for loop, performing multiple updates on that table.
So my question is: why is this happening when the table is partitioned?
You're using an incompatible version of Delta Lake: as you can see in the docs, version 2.1.0 is compatible with Spark 3.3.0, so you can't use it with 3.2.1. You need to take 2.0.1 instead.
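For illustration, a minimal Scala sketch of a session pinned to the matching package (the app name is a placeholder; in PySpark the same config keys apply when building the SparkSession):
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-merge-example") // placeholder
  // Delta 2.0.x is the line built for Spark 3.2.x; 2.1.x targets 3.3.x
  .config("spark.jars.packages", "io.delta:delta-core_2.12:2.0.1")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()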

spark giving requirement failed: Literal must have a corresponding value to string, but class String found

I have Spark 2.4.6 with a DataFrame write as follows:
df.select(struct(df.columns.map(column): _*).alias("value"))
  .write
  .format("kafka")
  .options(........)
  .save
But after building a jar and running spark-submit on version 3.0.2, it gives:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: Literal must have a corresponding value to string, but class String found.
at scala.Predef$.require(Predef.scala:281)
at org.apache.spark.sql.catalyst.expressions.Literal$.validateLiteralValue(literals.scala:215)
at org.apache.spark.sql.catalyst.expressions.Literal.<init>(literals.scala:292)
at org.apache.spark.sql.kafka010.KafkaWriter$.$anonfun$validateQuery$2(KafkaWriter.scala:55)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.kafka010.KafkaWriter$.validateQuery(KafkaWriter.scala:50)
at org.apache.spark.sql.kafka010.KafkaWriter$.write(KafkaWriter.scala:86)
at org.apache.spark.sql.kafka010.KafkaSourceProvider.createRelation(KafkaSourceProvider.scala:255)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:127)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:126)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:962)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:764)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:962)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:414)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:398)
at part4integrations.IntegratingKafka$.nimbleWrite(IntegratingKafka.scala:145)
at part4integrations.IntegratingKafka$.main(IntegratingKafka.scala:154)
at part4integrations.IntegratingKafka.main(IntegratingKafka.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
On checking the sources, the validation function exists in 3.0.2:
https://github.com/apache/spark/blob/648457905c4ea7d00e3d88048c63f360045f0714/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala#L292
but not in the previous version:
https://github.com/apache/spark/blob/v2.4.6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/literals.scala
Is there any resolution to this without upgrading the Spark version? Rebuilding the jar with version 3.0.2 works fine.
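Since the stack trace points at binary incompatibility (a jar compiled against Spark 2.4.6 hitting a Literal whose validation changed in 3.0.x), the usual remedy is the one the question already observes: compile against the Spark version that will run the job. A hedged sbt sketch with illustrative version numbers:
// build.sbt: compile against the cluster's Spark (3.0.2 here) and mark it
// Provided so spark-submit supplies the runtime jars instead of the fat jar
scalaVersion := "2.12.10"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.0.2" % Provided,
  "org.apache.spark" %% "spark-sql-kafka-0-10" % "3.0.2"
)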

scala> val sqlContext = new HiveContext(sc) java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveVariableSource

I am trying to access a Hive table over Spark.
I am currently using CDH 5.4; the Hive version is 1.1 and Spark is 1.6.
I have already copied the hive-site.xml file to the Spark conf folder.
When I try to run the code below, I get the error.
I am trying to get a HiveContext so that I can access Hive tables in the Spark shell.
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext
val sparkConf = new SparkConf().setAppName("WordCount")
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
I get the error on the line below:
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveVariableSource
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
at $iwC$$iwC$$iwC.<init>(<console>:50)
at $iwC$$iwC.<init>(<console>:52)
at $iwC.<init>(<console>:54)
at <init>(<console>:56)
at .<init>(<console>:60)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:35)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveVariableSource
Update, based on the reply about HADOOP_CLASSPATH:
I am using CDH 5.4, and it looks like HADOOP_CLASSPATH is not set:
[cloudera@quickstart ~]$ echo "Classpath=$HADOOP_CLASSPATH"
Classpath=
But the classpath command below gives me some output. Do I need to set HADOOP_CLASSPATH, or is this fine?
[cloudera@quickstart ~]$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/.//*:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/.//*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/.//*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/.//*
Try this:
val sc = SparkContext.getOrCreate(sparkConf)
Then use your piece of code to create the HiveContext.
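Putting it together, a minimal sketch of the sequence against Spark 1.6-era APIs (reusing "WordCount" from the question as the app name):
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sparkConf = new SparkConf().setAppName("WordCount")
// Reuse the shell's existing context instead of constructing a second one
val sc = SparkContext.getOrCreate(sparkConf)
val sqlContext = new HiveContext(sc)
Note this only helps once the Hive classes are on the classpath; the NoClassDefFoundError itself means org.apache.hadoop.hive.conf.HiveVariableSource is missing from it.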

SparkSQL key not found: scale

Spark version is 1.6.0.
I am trying to run a simple SQL query against a remote Oracle 11g DB using Spark SQL.
Of course the ojdbc driver is added to the classpath, and the DB responds to ping.
SparkConf conf = new SparkConf().setAppName(APP_NAME).setMaster("yarn-client");
JavaSparkContext jsc = new JavaSparkContext(conf);
SQLContext sqlContext = new SQLContext(jsc);
Map<String, String> connectionProperties = new HashMap<>();
connectionProperties.put("user", username);
connectionProperties.put("password", password);
connectionProperties.put("url", url);
connectionProperties.put("dbtable", "(SELECT * FROM tableName)");
connectionProperties.put("driver", "oracle.jdbc.OracleDriver");
DataFrame result = sqlContext.read().format("jdbc").options(connectionProperties).load();
The error arises at the last line, in the .load() method.
The resulting stack trace is:
Exception in thread "main" java.util.NoSuchElementException: key not found: scale
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at org.apache.spark.sql.types.Metadata.get(Metadata.scala:108)
at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
at org.apache.spark.sql.jdbc.OracleDialect$.getCatalystType(OracleDialect.scala:33)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:140)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at myapp.dfComparator.entity.OriginalSourceTable.load(OriginalSourceTable.java:74)
at myapp.dfComparator.Program.main(Program.java:74)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I have no idea what is wrong.
EDIT 01/27/2017
Additional information:
Hadoop version is 2.6.0-cdh5.8.3
Spark version is 1.6.0 with Scala version 2.10.5
I tried to reproduce the code above in Scala and execute it using spark-shell:
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:system/system@db-host:1521:orcl",
      "dbtable" -> "schema_name.table_name",
      "driver" -> "oracle.jdbc.OracleDriver",
      "username" -> "user",
      "password" -> "pwd")).load()
The result of this code is a similar stack trace:
java.util.NoSuchElementException: key not found: scale
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:58)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:58)
at org.apache.spark.sql.types.Metadata.get(Metadata.scala:108)
at org.apache.spark.sql.types.Metadata.getLong(Metadata.scala:51)
at org.apache.spark.sql.jdbc.OracleDialect$.getCatalystType(OracleDialect.scala:33)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:140)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.execution.datasources.jdbc.DefaultSource.createRelation(DefaultSource.scala:57)
at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:158)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:119)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC.<init>(<console>:40)
at $iwC.<init>(<console>:42)
at <init>(<console>:44)
at .<init>(<console>:48)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
So I have a strong suspicion that there is some mistake in the configuration of Spark and/or Hadoop.
EDIT 02/01/2017
I found that the problem arises only when the column type in the Oracle table is NUMBER. For example, if I cast the id column (whose type is NUMBER) to VARCHAR in the select statement, then everything works fine:
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:system/system@db-host:1521:orcl",
      "dbtable" -> "(SELECT CAST(id AS varchar(3)) FROM tableName)",
      "driver" -> "oracle.jdbc.OracleDriver",
      "username" -> "user",
      "password" -> "pwd")).load()
In more detail, the stack trace shows that the problem appears in the org.apache.spark.sql.types.Metadata.get method. Investigating this method in the sources, we can see (or assume) that for the NUMBER type it tries to cast the value to a Long and cannot find its scale.
That's why I now think the main issue is in the Cloudera distribution of Apache Spark.
At this time I received an answer from Cloudera: in CDH 5.8.3 with Spark 1.6 this is a known issue, which is resolved in Spark version 2.0.
To overcome the problem we have 3 options:
In the select query, CAST any numeric type to VARCHAR. Instead of CAST we can also use the TO_CHAR function.
Load the data using RDD/JavaRDD (instead of a DataFrame) and afterwards convert it to a DataFrame (which is much faster).
Use CDH 5.9.x.
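Applying option 1 to the spark-shell reproduction above, a hedged sketch (host, credentials, and the id column are the placeholders from the question):
val jdbcDF = sqlContext.read.format("jdbc").options(
  Map("url" -> "jdbc:oracle:thin:system/system@db-host:1521:orcl",
      // Push TO_CHAR into the dbtable subquery so Spark never sees a raw NUMBER
      "dbtable" -> "(SELECT TO_CHAR(id) AS id FROM schema_name.table_name)",
      "driver" -> "oracle.jdbc.OracleDriver",
      "user" -> "user",
      "password" -> "pwd")).load()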

installing cassandra-spark-connector - but getting error creating SparkContext

Please help. I am following the guide - https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md
Env - Spark 1.0.1, Scala 2.10.4
But I get the following error message when I reach the point of creating the SparkContext. The last line says all masters are unresponsive, giving up. The master is still running.
My steps are:
./sbin/start-all.sh - starts all workers successfully
MASTER=spark://spark-master-hostname:7077 ./bin/spark-shell - to launch Spark on the master
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext
scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._
scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host","cassandra-host-ip")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@9f073
scala> val sc = new SparkContext("spark://spark-master-ipaddress:7077", "test", conf)
14/07/29 12:18:23 WARN AbstractLifeCycle: FAILED SelectChannelConnector@0.0.0.0:4040: java.net.BindException: Address already in use
java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.eclipse.jetty.server.nio.SelectChannelConnector.open(SelectChannelConnector.java:187)
at org.eclipse.jetty.server.AbstractConnector.doStart(AbstractConnector.java:316)
at org.eclipse.jetty.server.nio.SelectChannelConnector.doStart(SelectChannelConnector.java:265)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.eclipse.jetty.server.Server.doStart(Server.java:293)
at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:64)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply$mcV$sp(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at org.apache.spark.ui.JettyUtils$$anonfun$1.apply(JettyUtils.scala:192)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.ui.JettyUtils$.connect$1(JettyUtils.scala:191)
at org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:205)
at org.apache.spark.ui.WebUI.bind(WebUI.scala:99)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:223)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:97)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:17)
at $line15.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:22)
at $line15.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $line15.$read$$iwC$$iwC$$iwC.<init>(<console>:26)
at $line15.$read$$iwC$$iwC.<init>(<console>:28)
at $line15.$read$$iwC.<init>(<console>:30)
at $line15.$read.<init>(<console>:32)
at $line15.$read$.<init>(<console>:36)
at $line15.$read$.<clinit>(<console>)
at $line15.$eval$.<init>(<console>:7)
at $line15.$eval$.<clinit>(<console>)
at $line15.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:788)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1056)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
14/07/29 12:18:23 WARN AbstractLifeCycle: FAILED org.eclipse.jetty.server.Server@dd53c8a: java.net.BindException: Address already in use
java.net.BindException: Address already in use
(stack trace identical to the one above)
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@4353d65f
scala> 14/07/29 12:19:24 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/07/29 12:19:24 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Step 1: To load the connector into the Spark shell, start the shell with this command:
../bin/spark-shell --jars ~/apps/spark-1.2/jars/spark-cassandra-connector-assembly-1.1.1-SNAPSHOT.jar
Step 2: Connect the Spark Context to the Cassandra cluster. Stop the default context:
sc.stop
Step 3: Import the necessary classes:
import com.datastax.spark.connector._, org.apache.spark.SparkContext, org.apache.spark.SparkContext._, org.apache.spark.SparkConf
Step 4: Make a new SparkConf with the Cassandra connection details:
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "localhost")
Step 5: Create a new Spark Context:
val sc = new SparkContext(conf)
You now have a new SparkContext which is connected to your Cassandra cluster.
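To verify the connector is actually wired up, a small hedged check (the keyspace and table names below are placeholders for ones that exist in your cluster):
import com.datastax.spark.connector._ // brings cassandraTable into scope on SparkContext

// Read a table through the connector and count its rows
val rdd = sc.cassandraTable("test_keyspace", "test_table") // placeholder names
println(rdd.count)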
Have you tried using spark-packages?
Spark Cassandra Connector on spark-packages.org
It boils down to
$SPARK_HOME/bin/spark-shell --packages datastax:spark-cassandra-connector:2.0.0-M2-s_2.10
where you need to use the correct version for your version of Spark. This should set up everything needed automatically.
