Why can't the external scheduler be instantiated when running Spark on minikube/Kubernetes? - apache-spark

I'm trying to run Spark on Kubernetes (using minikube with the VirtualBox or Docker driver; I tested both) and I'm now hitting an error that I don't know how to solve.
The error is a "SparkException: External scheduler cannot be instantiated". I'm new to the Kubernetes world, so I don't know whether this is a newbie mistake, but I haven't been able to resolve it on my own.
Please help me.
The command and the resulting error follow.
I use this spark-submit command:
spark-submit --master k8s://https://192.168.99.102:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=2 \
--executor-memory 1024m \
--conf spark.kubernetes.container.image=spark:latest \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
And I get this error in the driver pod:
20/06/23 15:24:56 INFO SparkContext: Submitted application: Spark Pi
20/06/23 15:24:56 INFO SecurityManager: Changing view acls to: 185,luan
20/06/23 15:24:56 INFO SecurityManager: Changing modify acls to: 185,luan
20/06/23 15:24:56 INFO SecurityManager: Changing view acls groups to:
20/06/23 15:24:56 INFO SecurityManager: Changing modify acls groups to:
20/06/23 15:24:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(185, luan); groups with view permissions: Set(); users with modify permissions: Set(185, luan); groups with modify permissions: Set()
20/06/23 15:24:57 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
20/06/23 15:24:57 INFO SparkEnv: Registering MapOutputTracker
20/06/23 15:24:57 INFO SparkEnv: Registering BlockManagerMaster
20/06/23 15:24:57 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/23 15:24:57 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/23 15:24:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/06/23 15:24:57 INFO DiskBlockManager: Created local directory at /var/data/spark-4f7b787b-ec75-4ae5-b703-f9f90ef130cb/blockmgr-1ef6d02a-48f6-4bd7-9d7d-fe2518850f5e
20/06/23 15:24:57 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
20/06/23 15:24:57 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/23 15:24:57 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/23 15:24:57 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-a8278472e1c83236-driver-svc.default.svc:4040
20/06/23 15:24:57 INFO SparkContext: Added JAR local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar at file:/opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar with timestamp 1592925897650
20/06/23 15:24:57 WARN SparkContext: The jar local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar has been added already. Overwriting of added jars is not supported in the current version.
20/06/23 15:24:57 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
20/06/23 15:24:58 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2934)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-a8278472e1c83236-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-a8278472e1c83236-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:505)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:395)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:376)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:845)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:214)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:168)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:59)
at scala.Option.map(Option.scala:230)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:58)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:113)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2928)
... 19 more
20/06/23 15:24:58 INFO SparkUI: Stopped Spark web UI at http://spark-pi-a8278472e1c83236-driver-svc.default.svc:4040
20/06/23 15:24:58 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/23 15:24:58 INFO MemoryStore: MemoryStore cleared
20/06/23 15:24:58 INFO BlockManager: BlockManager stopped
20/06/23 15:24:58 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/23 15:24:58 WARN MetricsSystem: Stopping a MetricsSystem that is not running
20/06/23 15:24:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/23 15:24:58 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2934)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2555)
at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$1(SparkSession.scala:930)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/default/pods/spark-pi-a8278472e1c83236-driver. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-a8278472e1c83236-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:568)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:505)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:471)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:430)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:395)
at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:376)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:845)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:214)
at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:168)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:59)
at scala.Option.map(Option.scala:230)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:58)
at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:113)
at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2928)
... 19 more
20/06/23 15:24:58 INFO ShutdownHookManager: Shutdown hook called
20/06/23 15:24:58 INFO ShutdownHookManager: Deleting directory /var/data/spark-4f7b787b-ec75-4ae5-b703-f9f90ef130cb/spark-616edc5e-b42d-4c77-9f11-8465b4d69642
20/06/23 15:24:58 INFO ShutdownHookManager: Deleting directory /tmp/spark-71e3bd59-3b7d-4d72-a442-b0ad0c7092fb
Thank you!
P.S.: I'm using Spark 3.0 (the new version) and minikube 1.11.0.

Based on the log file:
Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. pods "spark-pi-a8278472e1c83236-driver" is forbidden: User "system:serviceaccount:default:default" cannot get resource "pods" in API group "" in the namespace "default".
It looks like the default service account in the default namespace doesn't have edit permissions. You can run the following to create a ClusterRoleBinding that grants them:
$ kubectl create clusterrolebinding default \
--clusterrole=edit --serviceaccount=default:default --namespace=default
You can take a look at this cheat sheet.
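Alternatively, here is a sketch following the RBAC section of the Spark-on-Kubernetes documentation (not part of the original answer): create a dedicated service account for Spark instead of widening permissions on the default one, and point the driver at it.
# create a service account dedicated to Spark and grant it the edit role
$ kubectl create serviceaccount spark
$ kubectl create clusterrolebinding spark-role \
  --clusterrole=edit --serviceaccount=default:spark --namespace=default
Then tell the driver to use that service account in the spark-submit command from the question:
spark-submit --master k8s://https://192.168.99.102:8443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.executor.instances=2 \
--executor-memory 1024m \
--conf spark.kubernetes.container.image=spark:latest \
local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar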

Related

ERROR SparkContext: Failed to add None to Spark environment

I first submit a Spark job like this from a PySpark file:
os.system(f'spark-submit --master local --jars ./examples/lib/app.jar app.py')
Then in the submitted app.py file, I create a new SparkSession like this:
spark = SparkSession.builder.appName(appName) \
    .config('spark.jars') \
    .getOrCreate()
Error message:
23/01/17 11:02:52 INFO SparkContext: Running Spark version 3.3.0
23/01/17 11:02:52 INFO ResourceUtils: ==============================================================
23/01/17 11:02:52 INFO ResourceUtils: No custom resources configured for spark.driver.
23/01/17 11:02:52 INFO ResourceUtils: ==============================================================
23/01/17 11:02:52 INFO SparkContext: Submitted application: symbolic_test
23/01/17 11:02:52 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/01/17 11:02:52 INFO ResourceProfile: Limiting resource is cpu
23/01/17 11:02:53 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/01/17 11:02:53 INFO SecurityManager: Changing view acls to: annie
23/01/17 11:02:53 INFO SecurityManager: Changing modify acls to: annie
23/01/17 11:02:53 INFO SecurityManager: Changing view acls groups to:
23/01/17 11:02:53 INFO SecurityManager: Changing modify acls groups to:
23/01/17 11:02:53 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(annie); groups with view permissions: Set(); users with modify permissions: Set(annie); groups with modify permissions: Set()
23/01/17 11:02:53 INFO Utils: Successfully started service 'sparkDriver' on port 42141.
23/01/17 11:02:53 INFO SparkEnv: Registering MapOutputTracker
23/01/17 11:02:53 INFO SparkEnv: Registering BlockManagerMaster
23/01/17 11:02:53 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/01/17 11:02:53 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/01/17 11:02:53 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/01/17 11:02:53 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-e4cc3b01-a6d5-4454-ad2d-4d0f42066479
23/01/17 11:02:53 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
23/01/17 11:02:53 INFO SparkEnv: Registering OutputCommitCoordinator
23/01/17 11:02:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/01/17 11:02:53 ERROR SparkContext: Failed to add None to Spark environment
java.io.FileNotFoundException: Jar /home/annie/exampleApp/example/None not found
at org.apache.spark.SparkContext.addLocalJarFile$1(SparkContext.scala:1949)
at org.apache.spark.SparkContext.addJar(SparkContext.scala:2004)
at org.apache.spark.SparkContext.$anonfun$new$12(SparkContext.scala:507)
at org.apache.spark.SparkContext.$anonfun$new$12$adapted(SparkContext.scala:507)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:507)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
When creating the Spark session through PySpark, I get these error messages, which only arise when I add .config('spark.jars').
I've set my $SPARK_HOME variable correctly...
Any help will be appreciated!
If your code sample is accurate, you do not assign any value to the spark.jars key while creating the Spark session, so Spark tries to add the string None as a jar. Assigning the jar path as the value should solve the error:
spark = SparkSession.builder.appName(appName) \
    .config('config_key', config_value) \
    .getOrCreate()
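For example, with the jar path already used in the question (a sketch; adjust the path to whatever your application actually needs):
from pyspark.sql import SparkSession

# supply the jar path as the value of spark.jars instead of leaving it empty
spark = SparkSession.builder.appName(appName) \
    .config('spark.jars', './examples/lib/app.jar') \
    .getOrCreate()
Alternatively, since the jar is already passed with --jars on the spark-submit command line, simply removing the .config('spark.jars') call should also avoid the "Failed to add None" error.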

Getting NoClassDefFoundError using Spark with spark-cassandra-connector 3.1.0

I've been trying to submit a Spark application but I get the following exception:
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/jars/spark-unsafe_2.12-3.2.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
21/11/13 13:17:42 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2021-11-13T13:17:46+0330 - INFO - Great Expectations logging enabled at 20 level by JupyterUX module.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/11/13 13:17:47 INFO SparkContext: Running Spark version 3.2.0
21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
21/11/13 13:17:47 INFO ResourceUtils: No custom resources configured for spark.driver.
21/11/13 13:17:47 INFO ResourceUtils: ==============================================================
21/11/13 13:17:47 INFO SparkContext: Submitted application: examstat
21/11/13 13:17:47 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
21/11/13 13:17:47 INFO ResourceProfile: Limiting resource is cpu
21/11/13 13:17:47 INFO ResourceProfileManager: Added ResourceProfile id: 0
21/11/13 13:17:47 INFO SecurityManager: Changing view acls to: alisaberi
21/11/13 13:17:47 INFO SecurityManager: Changing modify acls to: alisaberi
21/11/13 13:17:47 INFO SecurityManager: Changing view acls groups to:
21/11/13 13:17:47 INFO SecurityManager: Changing modify acls groups to:
21/11/13 13:17:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(alisaberi); groups with view permissions: Set(); users with modify permissions: Set(alisaberi); groups with modify permissions: Set()
21/11/13 13:17:47 INFO Utils: Successfully started service 'sparkDriver' on port 62135.
21/11/13 13:17:47 INFO SparkEnv: Registering MapOutputTracker
21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMaster
21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/11/13 13:17:47 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/11/13 13:17:47 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
21/11/13 13:17:47 INFO DiskBlockManager: Created local directory at /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/blockmgr-e6d2444c-2aa6-4690-ac82-7a4ab1d86b6b
21/11/13 13:17:47 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
21/11/13 13:17:47 INFO SparkEnv: Registering OutputCommitCoordinator
21/11/13 13:17:47 INFO Utils: Successfully started service 'SparkUI' on port 4040.
21/11/13 13:17:47 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.1.3:4040
21/11/13 13:17:47 INFO SparkContext: Added JAR file:///Users/alisaberi/Desktop/test-great-expectations/spark-cassandra-connector-assembly_2.12-3.1.0.jar at spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
21/11/13 13:17:47 INFO Executor: Starting executor ID driver on host 192.168.1.3
21/11/13 13:17:47 INFO Executor: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar with timestamp 1636796867038
21/11/13 13:17:47 INFO TransportClientFactory: Successfully created connection to /192.168.1.3:62135 after 42 ms (0 ms spent in bootstraps)
21/11/13 13:17:47 INFO Utils: Fetching spark://192.168.1.3:62135/jars/spark-cassandra-connector-assembly_2.12-3.1.0.jar to /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/fetchFileTemp11862606911562884947.tmp
21/11/13 13:17:48 INFO Executor: Adding file:/private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/userFiles-89f4f184-ba26-4a28-b83f-52cec85d7563/spark-cassandra-connector-assembly_2.12-3.1.0.jar to class loader
21/11/13 13:17:48 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 62138.
21/11/13 13:17:48 INFO NettyBlockTransferService: Server created on 192.168.1.3:62138
21/11/13 13:17:48 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/11/13 13:17:48 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.1.3:62138 with 434.4 MiB RAM, BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.1.3, 62138, None)
21/11/13 13:17:48 WARN SparkSession: Cannot use com.datastax.spark.connector.CassandraSparkExtensions to configure session extensions.
java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:576)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:468)
at org.apache.spark.util.Utils$.classForName(Utils.scala:216)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1(SparkSession.scala:1194)
at org.apache.spark.sql.SparkSession$.$anonfun$applyExtensions$1$adapted(SparkSession.scala:1192)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$applyExtensions(SparkSession.scala:1192)
at org.apache.spark.sql.SparkSession.<init>(SparkSession.scala:104)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 33 more
21/11/13 13:17:48 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir.
21/11/13 13:17:48 INFO SharedState: Warehouse path is 'file:/Users/alisaberi/Desktop/test-great-expectations/spark-warehouse'.
/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/context.py:77: FutureWarning: Deprecated in 3.0.0. Use SparkSession.builder.getOrCreate() instead.
Traceback (most recent call last):
File "/Users/alisaberi/Desktop/test-great-expectations/test.py", line 33, in <module>
sqlContext.read\
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 164, in load
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1309, in __call__
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/pyspark.zip/pyspark/sql/utils.py", line 111, in deco
File "/Users/alisaberi/Desktop/test-great-expectations/spark-3.2.0-bin-hadoop3.2/python/lib/py4j-0.10.9.2-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o56.load.
: java.lang.NoClassDefFoundError: com/datastax/spark/connector/util/Logging
at java.base/java.lang.ClassLoader.defineClass1(Native Method)
at java.base/java.lang.ClassLoader.defineClass(ClassLoader.java:1016)
at java.base/java.security.SecureClassLoader.defineClass(SecureClassLoader.java:151)
at java.base/jdk.internal.loader.BuiltinClassLoader.defineClass(BuiltinClassLoader.java:825)
at java.base/jdk.internal.loader.BuiltinClassLoader.findClassOnClassPathOrNull(BuiltinClassLoader.java:723)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClassOrNull(BuiltinClassLoader.java:646)
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:604)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
at org.apache.spark.sql.cassandra.DefaultSource.getTable(DefaultSource.scala:55)
at org.apache.spark.sql.cassandra.DefaultSource.inferSchema(DefaultSource.scala:72)
at org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.getTableFromProvider(DataSourceV2Utils.scala:81)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:233)
at scala.Option.map(Option.scala:230)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:174)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:832)
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.util.Logging
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:606)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:168)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
... 28 more
21/11/13 13:17:49 INFO SparkContext: Invoking stop() from shutdown hook
21/11/13 13:17:49 INFO SparkUI: Stopped Spark web UI at http://192.168.1.3:4040
21/11/13 13:17:49 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/11/13 13:17:49 INFO MemoryStore: MemoryStore cleared
21/11/13 13:17:49 INFO BlockManager: BlockManager stopped
21/11/13 13:17:49 INFO BlockManagerMaster: BlockManagerMaster stopped
21/11/13 13:17:49 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/11/13 13:17:49 INFO SparkContext: Successfully stopped SparkContext
21/11/13 13:17:49 INFO ShutdownHookManager: Shutdown hook called
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-ef03b69b-8170-49e1-a24f-af46ff8ada7d
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb/pyspark-42c7c117-c948-4b16-82a6-39017769cff9
21/11/13 13:17:49 INFO ShutdownHookManager: Deleting directory /private/var/folders/4q/qc3xhr1x6qx5jr9604nl91w40000gn/T/spark-3961cb18-dacf-4940-a5ff-36d1bbc2c3bb
The application uses spark-cassandra-connector to read from Cassandra. Here is the code:
from pyspark.sql import SQLContext, SparkSession
from pyspark.context import SparkContext

spark = SparkSession \
    .builder \
    .appName("Test") \
    .master('local[*]') \
    .config('spark.cassandra.connection.host', 'localhost') \
    .getOrCreate()

spark.read \
    .format("org.apache.spark.sql.cassandra") \
    .options(table="gps", keyspace="test") \
    .load().show()
I've tried two different approaches to submit the application:
$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.1.0 ./test.py
$SPARK_HOME/bin/spark-submit --jars /Full/Path/to/spark-cassandra-connector-assembly_2.12-3.1.0.jar
Also, when I run the same code in the pyspark shell, it works fine.
Spark 3.2.0
spark-cassandra-connector 3.1.0
cassandra 4.0.1
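No answer is included for this related question here, but one variant worth sketching (an assumption, not a confirmed fix): declare the connector as a package inside the script via spark.jars.packages, so the dependency is resolved the same way whether the code runs under spark-submit or in the pyspark shell:
from pyspark.sql import SparkSession

# resolve the Cassandra connector from Maven when the session starts
spark = SparkSession \
    .builder \
    .appName("Test") \
    .master('local[*]') \
    .config('spark.jars.packages', 'com.datastax.spark:spark-cassandra-connector_2.12:3.1.0') \
    .config('spark.cassandra.connection.host', 'localhost') \
    .getOrCreate()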

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`product`' given input columns: [jsontostructs(message)];

C:\Users\sorun\.jdks\openjdk-14.0.1\bin\java.exe "-javaagent:D:\Intellij IDEA\IntelliJ IDEA 2020.1.1\lib\idea_rt.jar=50945:D:\Intellij IDEA\IntelliJ IDEA 2020.1.1\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\sorun\IdeaProjects\spark-streaming-kafka\target\classes;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sql_2.11\2.2.0\spark-sql_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\com\univocity\univocity-parsers\2.2.1\univocity-parsers-2.2.1.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sketch_2.11\2.2.0\spark-sketch_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-core_2.11\2.2.0\spark-core_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;C:\Users\sorun\.m2\repository\com\thoughtworks\paranamer\paranamer\2.3\paranamer-2.3.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-compress\1.4.1\commons-compress-1.4.1.jar;C:\Users\sorun\.m2\repository\org\tukaani\xz\1.0\xz-1.0.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-mapred\1.7.7\avro-mapred-1.7.7-hadoop2.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7-tests.jar;C:\Users\sorun\.m2\repository\com\twitter\chill_2.11\0.8.0\chill_2.11-0.8.0.jar;C:\Users\sorun\.m2\repository\com\esotericsoftware\kryo-shaded\3.0.3\kryo-shaded-3.0.3.jar;C:\Users\sorun\.m2\repository\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;C:\Users\sorun\.m2\repository\org\objenesis\objenesis\2.1\objenesis-2.1.jar;C:\Users\sorun\.m2\repository\com\twitter\chill-java\0.8.0\chill-java-0.8.0.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-client\2.6.5\hadoop-client-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-common\2.6.5\hadoop-common-2.6.5.jar;C:\Users\sorun\.m2\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Users\sorun\.m2\repository\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;C:\Users\sorun\.m2\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;C:\Users\sorun\.m2\repository\commons-io\commons-io\2.4\commons-io-2.4.jar;C:\Users\sorun\.m2\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;C:\Users\sorun\.m2\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;C:\Users\sorun\.m2\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;C:\Users\sorun\.m2\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;C:\Users\sorun\.m2\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;C:\Users\sorun\.m2\repository\commons-beanutils\commons-beanutils-core\1.8.0\commons-beanutils-core-1.8.0.jar;C:\Users\sorun\.m2\repository\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-auth\2.6.5\hadoop-auth-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\directory\server\apacheds-kerberos-codec\2.0.0-M15\apacheds-kerberos-codec-2.0.0-M15.jar;C:\Users\sorun\.m2\repository\org\apache\directory\server\apacheds-i18n\2.0.0-M15\apacheds-i18n-2.0.0-M15.jar;C:\Users\sorun\.m2\repository\org\apache\directory\api\api-asn1-api\1.0.0-M20\api-asn1-api-1.0.0-M20.jar;C:\Users\sorun\.m2\repository\org\apache\directory\api\api-util\1.0.0-M20\api-util-1.0.0-M20.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-client\2.6.0\curator-client-2.6.0.jar;C:\Users\sorun\.m2\repository\org\htrace\htrace-core\3.0.4\htrace-core-3.0.4.jar;C:\Users\sorun\.m2\
repository\org\apache\hadoop\hadoop-hdfs\2.6.5\hadoop-hdfs-2.6.5.jar;C:\Users\sorun\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\sorun\.m2\repository\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;C:\Users\sorun\.m2\repository\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.6.5\hadoop-mapreduce-client-app-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.6.5\hadoop-mapreduce-client-common-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-client\2.6.5\hadoop-yarn-client-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-server-common\2.6.5\hadoop-yarn-server-common-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.6.5\hadoop-mapreduce-client-shuffle-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-api\2.6.5\hadoop-yarn-api-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.6.5\hadoop-mapreduce-client-core-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-common\2.6.5\hadoop-yarn-common-2.6.5.jar;C:\Users\sorun\.m2\repository\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;C:\Users\sorun\.m2\repository\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-jaxrs\1.9.13\jackson-jaxrs-1.9.13.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-xc\1.9.13\jackson-xc-1.9.13.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.6.5\hadoop-mapreduce-client-jobclient-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-annotations\2.6.5\hadoop-annotations-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-launcher_2.11\2.2.0\spark-launcher_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-network-common_2.11\2.2.0\spark-network-common_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-network-shuffle_2.11\2.2.0\spark-network-shuffle_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-unsafe_2.11\2.2.0\spark-unsafe_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\net\java\dev\jets3t\jets3t\0.9.3\jets3t-0.9.3.jar;C:\Users\sorun\.m2\repository\org\apache\httpcomponents\httpcore\4.3.3\httpcore-4.3.3.jar;C:\Users\sorun\.m2\repository\org\apache\httpcomponents\httpclient\4.3.6\httpclient-4.3.6.jar;C:\Users\sorun\.m2\repository\javax\activation\activation\1.1.1\activation-1.1.1.jar;C:\Users\sorun\.m2\repository\mx4j\mx4j\3.0.2\mx4j-3.0.2.jar;C:\Users\sorun\.m2\repository\javax\mail\mail\1.4.7\mail-1.4.7.jar;C:\Users\sorun\.m2\repository\org\bouncycastle\bcprov-jdk15on\1.51\bcprov-jdk15on-1.51.jar;C:\Users\sorun\.m2\repository\com\jamesmurty\utils\java-xmlbuilder\1.0\java-xmlbuilder-1.0.jar;C:\Users\sorun\.m2\repository\net\iharder\base64\2.3.8\base64-2.3.8.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-recipes\2.6.0\curator-recipes-2.6.0.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-framework\2.6.0\curator-framework-2.6.0.jar;C:\Users\sorun\.m2\repository\org\apache\zookeeper\zookeeper\3.4.6\zookeeper-3.4.6.jar;C:\Users\sorun\.m2\repository\com\google\guava\guava\16.0.1\guava-16.0.1.jar;C:\Users\sorun\.m2\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;C:\Users\sorun\.m2\repository
\org\apache\commons\commons-lang3\3.5\commons-lang3-3.5.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;C:\Users\sorun\.m2\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;C:\Users\sorun\.m2\repository\org\slf4j\slf4j-api\1.7.16\slf4j-api-1.7.16.jar;C:\Users\sorun\.m2\repository\org\slf4j\jul-to-slf4j\1.7.16\jul-to-slf4j-1.7.16.jar;C:\Users\sorun\.m2\repository\org\slf4j\jcl-over-slf4j\1.7.16\jcl-over-slf4j-1.7.16.jar;C:\Users\sorun\.m2\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\Users\sorun\.m2\repository\org\slf4j\slf4j-log4j12\1.7.16\slf4j-log4j12-1.7.16.jar;C:\Users\sorun\.m2\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;C:\Users\sorun\.m2\repository\org\xerial\snappy\snappy-java\1.1.2.6\snappy-java-1.1.2.6.jar;C:\Users\sorun\.m2\repository\net\jpountz\lz4\lz4\1.3.0\lz4-1.3.0.jar;C:\Users\sorun\.m2\repository\org\roaringbitmap\RoaringBitmap\0.5.11\RoaringBitmap-0.5.11.jar;C:\Users\sorun\.m2\repository\commons-net\commons-net\2.2\commons-net-2.2.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-library\2.11.8\scala-library-2.11.8.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-jackson_2.11\3.2.11\json4s-jackson_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-core_2.11\3.2.11\json4s-core_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-ast_2.11\3.2.11\json4s-ast_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scalap\2.11.0\scalap-2.11.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-compiler\2.11.0\scala-compiler-2.11.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\modules\scala-xml_2.11\1.0.1\scala-xml_2.11-1.0.1.jar;C:\Users\sorun\.m2\repository\org\scala-lang\modules\scala-parser-combinators_2.11\1.0.1\scala-parser-combinators_2.11-1.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-client\2.22.2\jersey-client-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\ws\rs\javax.ws.rs-api\2.0.1\javax.ws.rs-api-2.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-api\2.4.0-b34\hk2-api-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-utils\2.4.0-b34\hk2-utils-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\external\aopalliance-repackaged\2.4.0-b34\aopalliance-repackaged-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\external\javax.inject\2.4.0-b34\javax.inject-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-locator\2.4.0-b34\hk2-locator-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\javassist\javassist\3.18.1-GA\javassist-3.18.1-GA.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-common\2.22.2\jersey-common-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\annotation\javax.annotation-api\1.2\javax.annotation-api-1.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\bundles\repackaged\jersey-guava\2.22.2\jersey-guava-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\osgi-resource-locator\1.0.1\osgi-resource-locator-1.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-server\2.22.2\jersey-server-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\media\jersey-media-jaxb\2.22.2\jersey-media-jaxb-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\validation\validation-api\1.1.0.Final\validation-api-1.1.0.Final.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\containers\jersey-container-servlet\2.22.2\jersey-container-servlet-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\containers\jer
sey-container-servlet-core\2.22.2\jersey-container-servlet-core-2.22.2.jar;C:\Users\sorun\.m2\repository\io\netty\netty-all\4.0.43.Final\netty-all-4.0.43.Final.jar;C:\Users\sorun\.m2\repository\io\netty\netty\3.9.9.Final\netty-3.9.9.Final.jar;C:\Users\sorun\.m2\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-core\3.1.2\metrics-core-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-jvm\3.1.2\metrics-jvm-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-json\3.1.2\metrics-json-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-graphite\3.1.2\metrics-graphite-3.1.2.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\module\jackson-module-scala_2.11\2.6.5\jackson-module-scala_2.11-2.6.5.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\module\jackson-module-paranamer\2.6.5\jackson-module-paranamer-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;C:\Users\sorun\.m2\repository\oro\oro\2.0.8\oro-2.0.8.jar;C:\Users\sorun\.m2\repository\net\razorvine\pyrolite\4.13\pyrolite-4.13.jar;C:\Users\sorun\.m2\repository\net\sf\py4j\py4j\0.10.4\py4j-0.10.4.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-crypto\1.0.0\commons-crypto-1.0.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-catalyst_2.11\2.2.0\spark-catalyst_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-reflect\2.11.8\scala-reflect-2.11.8.jar;C:\Users\sorun\.m2\repository\org\codehaus\janino\janino\3.0.0\janino-3.0.0.jar;C:\Users\sorun\.m2\repository\org\codehaus\janino\commons-compiler\3.0.0\commons-compiler-3.0.0.jar;C:\Users\sorun\.m2\repository\org\antlr\antlr4-runtime\4.5.3\antlr4-runtime-4.5.3.jar;C:\Users\sorun\.m2\repository\commons-codec\commons-codec\1.10\commons-codec-1.10.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-tags_2.11\2.2.0\spark-tags_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-column\1.8.2\parquet-column-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-common\1.8.2\parquet-common-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-encoding\1.8.2\parquet-encoding-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-hadoop\1.8.2\parquet-hadoop-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-format\2.3.1\parquet-format-2.3.1.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-jackson\1.8.2\parquet-jackson-1.8.2.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.11\jackson-mapper-asl-1.9.11.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.11\jackson-core-asl-1.9.11.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.6.5\jackson-databind-2.6.5.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.6.0\jackson-annotations-2.6.0.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.6.5\jackson-core-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\xbean\xbean-asm5-shaded\4.4\xbean-asm5-shaded-4.4.jar;C:\Users\sorun\.m2\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sql-kafka-0-10_2.11\2.2.0\spark-sql-kafka-0-10_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\kafka\kafka-clients\0.10.0.1\kafka-clients-0.10.0.1.jar;C:\Users\sorun\.m2\repository\com\google\code\gson\gson\2.8.3\gson-2.8.3.jar 
StreamingConsumer
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/19 12:39:42 INFO SparkContext: Running Spark version 2.2.0
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/Users/sorun/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.5/hadoop-auth-2.6.5.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/06/19 12:39:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/19 12:39:44 INFO SparkContext: Submitted application: Streaming-kafka
20/06/19 12:39:44 INFO SecurityManager: Changing view acls to: OZAN-OKAN
20/06/19 12:39:44 INFO SecurityManager: Changing modify acls to: OZAN-OKAN
20/06/19 12:39:44 INFO SecurityManager: Changing view acls groups to:
20/06/19 12:39:44 INFO SecurityManager: Changing modify acls groups to:
20/06/19 12:39:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(OZAN-OKAN); groups with view permissions: Set(); users with modify permissions: Set(OZAN-OKAN); groups with modify permissions: Set()
20/06/19 12:39:45 INFO Utils: Successfully started service 'sparkDriver' on port 50966.
20/06/19 12:39:45 INFO SparkEnv: Registering MapOutputTracker
20/06/19 12:39:45 INFO SparkEnv: Registering BlockManagerMaster
20/06/19 12:39:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/19 12:39:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/19 12:39:45 INFO DiskBlockManager: Created local directory at C:\Users\sorun\AppData\Local\Temp\blockmgr-0794380e-6e2b-4559-bf6c-7d10c2074bc8
20/06/19 12:39:45 INFO MemoryStore: MemoryStore started with capacity 1040.4 MB
20/06/19 12:39:45 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/19 12:39:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/19 12:39:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.1:4040
20/06/19 12:39:46 INFO Executor: Starting executor ID driver on host localhost
20/06/19 12:39:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50975.
20/06/19 12:39:46 INFO NettyBlockTransferService: Server created on 192.168.56.1:50975
20/06/19 12:39:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/19 12:39:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:50975 with 1040.4 MB RAM, BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/sorun/IdeaProjects/spark-streaming-kafka/spark-warehouse/').
20/06/19 12:39:46 INFO SharedState: Warehouse path is 'file:/C:/Users/sorun/IdeaProjects/spark-streaming-kafka/spark-warehouse/'.
20/06/19 12:39:47 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/06/19 12:39:47 INFO CatalystSqlParser: Parsing command: string
20/06/19 12:39:49 INFO SparkSqlParser: Parsing command: CAST(value AS STRING) message
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`product`' given input columns: [jsontostructs(message)];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$10.apply(TreeNode.scala:323)
at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:311)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:25)
at scala.collection.TraversableViewLike$class.force(TraversableViewLike.scala:88)
at scala.collection.IterableLike$$anon$1.force(IterableLike.scala:311)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:279)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:289)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:298)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:298)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:256)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:206)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:170)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:61)
at org.apache.spark.sql.Dataset.as(Dataset.scala:380)
at StreamingConsumer.main(StreamingConsumer.java:24)
20/06/19 12:39:50 INFO SparkContext: Invoking stop() from shutdown hook
20/06/19 12:39:50 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4040
20/06/19 12:39:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/19 12:39:50 INFO MemoryStore: MemoryStore cleared
20/06/19 12:39:50 INFO BlockManager: BlockManager stopped
20/06/19 12:39:50 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/19 12:39:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/19 12:39:50 INFO SparkContext: Successfully stopped SparkContext
20/06/19 12:39:50 INFO ShutdownHookManager: Shutdown hook called
20/06/19 12:39:50 INFO ShutdownHookManager: Deleting directory C:\Users\sorun\AppData\Local\Temp\spark-b70ecbcc-e6cf-4328-9069-97cc41cc72d7
Process finished with exit code 1
CODE
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '`product`' given input columns: [jsontostructs(message)];
The exception message above says that the column you are selecting is not available in the DataFrame; the only input column is jsontostructs(message). Rename (alias) that column, for example to json, and select your fields through it.
And if you have a "message" field in your model, add it to the schema StructType as well:
StructType schema = new StructType()
        .add("product", "string")
        .add("time", DataTypes.TimestampType)
        .add("message", DataTypes.StringType);
Then pass that schema into from_json(...).as("json"):
Dataset<SearchProductModel> data = load.selectExpr("CAST(value AS STRING) as message")
        .select(functions.from_json(functions.col("message"), schema).as("json"))
        .select("json.*")
        .as(Encoders.bean(SearchProductModel.class));

Error when running spark-submit: java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
I ran the command above, but I don't know why I got this error. I have spent hours trying to fix it but cannot.
I am using Spark 2.4.4 and Scala 2.13.0. I tried to set spark.executor.memory and spark.driver.memory in my Spark configuration file, but I still could not solve the problem.
Here is the error:
(tutorial-env) (base) harry@harry-badass:~/Desktop/twitter_project$ spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
19/12/14 14:27:23 WARN Utils: Your hostname, harry-badass resolves to a loopback address: 127.0.1.1; using 220.149.84.46 instead (on interface enp4s0)
19/12/14 14:27:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/14 14:27:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/14 14:27:24 INFO SparkContext: Running Spark version 2.4.4
19/12/14 14:27:24 INFO SparkContext: Submitted application: PythonStreamingDirectKafkaWordCount
19/12/14 14:27:24 INFO SecurityManager: Changing view acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing view acls groups to:
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls groups to:
19/12/14 14:27:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(harry); groups with view permissions: Set(); users with modify permissions: Set(harry); groups with modify permissions: Set()
19/12/14 14:27:24 INFO Utils: Successfully started service 'sparkDriver' on port 41699.
19/12/14 14:27:24 INFO SparkEnv: Registering MapOutputTracker
19/12/14 14:27:24 INFO SparkEnv: Registering BlockManagerMaster
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/14 14:27:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2067d2bb-4b7c-49d8-8f02-f20e8467b21e
19/12/14 14:27:24 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/12/14 14:27:24 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/14 14:27:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/12/14 14:27:24 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/12/14 14:27:24 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://220.149.84.46:4041
19/12/14 14:27:24 INFO SparkContext: Added JAR file:///home/harry/Desktop/twitter_project/spark-streaming-kafka-0-8_2.11-2.4.4.jar at spark://220.149.84.46:41699/jars/spark-streaming-kafka-0-8_2.11-2.4.4.jar with timestamp 1576301244901
19/12/14 14:27:24 INFO Executor: Starting executor ID driver on host localhost
19/12/14 14:27:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46637.
19/12/14 14:27:25 INFO NettyBlockTransferService: Server created on 220.149.84.46:46637
19/12/14 14:27:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/14 14:27:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMasterEndpoint: Registering block manager 220.149.84.46:46637 with 434.4 MB RAM, BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 220.149.84.46, 46637, None)
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3139)
at java.base/java.lang.Class.privateGetPublicMethods(Class.java:3164)
at java.base/java.lang.Class.getMethods(Class.java:1861)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
... 12 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/home/harry/Desktop/twitter_project/direct_approach.py", line 9, in <module>
kvs = KafkaUtils.createDirectStream(ssc, [topic],{"metadata.broker.list": brokers})
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o24.createDirectStreamWithoutMessageHandler
19/12/14 14:27:25 INFO SparkContext: Invoking stop() from shutdown hook
19/12/14 14:27:25 INFO SparkUI: Stopped Spark web UI at http://220.149.84.46:4041
19/12/14 14:27:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/14 14:27:25 INFO MemoryStore: MemoryStore cleared
19/12/14 14:27:25 INFO BlockManager: BlockManager stopped
19/12/14 14:27:25 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/14 14:27:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/14 14:27:25 INFO SparkContext: Successfully stopped SparkContext
19/12/14 14:27:25 INFO ShutdownHookManager: Shutdown hook called
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-8e271f94-bec9-4f7e-aad0-1f3b651e9b29
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394/pyspark-83cc90cc-1aaa-4dea-b364-4b66487be18f
Changing memory settings won't fix a missing class. You need to download the kafka-clients JAR as well and pass it with --jars.
Note: you can use --packages instead of downloading JARs; it resolves transitive dependencies for you.
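For example, a minimal sketch of the same submit using --packages instead of a local jar (the coordinate mirrors the Spark/Scala versions in the jar name; transitive Kafka dependencies such as kafka-clients are resolved automatically):
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.4.4 \
  direct_approach.py localhost:9092 new_topic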

Spark Standalone on Kubernetes - application got finished after consecutive master then driver failure

I am trying to achieve high availability of the Spark master using ZooKeeper, with Spark driver resiliency via metadata checkpointing to GlusterFS.
Some information:
Using Spark 2.2.0 (prebuilt binary)
Submitting a streaming app with --deploy-mode cluster and --supervise from a separate Spark client pod
Spark components on Kubernetes are StatefulSets for dynamic volume provisioning (previously ReplicationController/Deployment)
Created 3 GlusterFS shared PVCs: spark-master-pvc, spark-worker-pvc, spark-ckp-pvc
I have successfully handled these scenarios: master failure only, driver failure only, consecutive master and driver failure, and driver failure followed by master failure. But the scenario submit a job -> master failure (recovers fine) -> driver failure (i.e. worker pod failure) does not work.
NEW ALIVE MASTER's log -
18/06/11 10:23:16 INFO ZooKeeperLeaderElectionAgent: We have gained leadership
18/06/11 10:23:16 INFO Master: I have been elected leader! New state: RECOVERING
18/06/11 10:23:16 INFO Master: Trying to recover app: app-20180611102123-0001
18/06/11 10:23:16 INFO Master: Trying to recover worker: worker-20180611101834-10.1.53.142-36203
18/06/11 10:23:16 INFO Master: Trying to recover worker: worker-20180611102123-10.1.170.85-39447
18/06/11 10:23:16 INFO Master: Trying to recover worker: worker-20180611101834-10.1.185.87-38235
18/06/11 10:23:16 INFO TransportClientFactory: Successfully created connection to /10.1.53.142:36203 after 7 ms (0 ms spent in bootstraps)
18/06/11 10:23:16 INFO TransportClientFactory: Successfully created connection to /10.1.185.87:38235 after 3 ms (0 ms spent in bootstraps)
18/06/11 10:23:16 INFO TransportClientFactory: Successfully created connection to /10.1.53.142:38994 after 12 ms (0 ms spent in bootstraps)
18/06/11 10:23:16 INFO TransportClientFactory: Successfully created connection to /10.1.170.85:39447 after 7 ms (0 ms spent in bootstraps)
18/06/11 10:23:16 INFO Master: Application has been re-registered: app-20180611102123-0001
18/06/11 10:23:16 INFO Master: Worker has been re-registered: worker-20180611102123-10.1.170.85-39447
18/06/11 10:23:16 INFO Master: Worker has been re-registered: worker-20180611101834-10.1.53.142-36203
18/06/11 10:23:16 INFO Master: Worker has been re-registered: worker-20180611101834-10.1.185.87-38235
18/06/11 10:23:16 INFO Master: Recovery complete - resuming operations!
18/06/11 10:24:37 INFO Master: Received unregister request from application app-20180611102123-0001
18/06/11 10:24:37 INFO Master: Removing app app-20180611102123-0001
18/06/11 10:24:37 INFO Master: 10.1.53.142:38994 got disassociated, removing it.
18/06/11 10:24:37 INFO Master: 10.1.53.142:38994 got disassociated, removing it.
18/06/11 10:24:37 WARN Master: Got status update for unknown executor app-20180611102123-0001/0
18/06/11 10:24:37 WARN Master: Got status update for unknown executor app-20180611102123-0001/1
18/06/11 10:24:38 INFO Master: 10.1.53.142:36203 got disassociated, removing it.
18/06/11 10:24:38 INFO Master: Removing worker worker-20180611101834-10.1.53.142-36203 on 10.1.53.142:36203
18/06/11 10:24:38 INFO Master: Re-launching driver-20180611102017-0000
18/06/11 10:24:38 INFO Master: Launching driver driver-20180611102017-0000 on worker worker-20180611101834-10.1.185.87-38235
18/06/11 10:24:38 INFO Master: 10.1.53.142:59142 got disassociated, removing it.
18/06/11 10:24:38 INFO Master: 10.1.53.142:36203 got disassociated, removing it.
18/06/11 10:24:38 INFO Master: 10.1.53.142:36203 got disassociated, removing it.
18/06/11 10:24:43 INFO Master: Registering worker 10.1.53.143:35156 with 8 cores, 30.3 GB RAM
The driver remains in a halted state. Driver error log -
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
18/06/11 19:32:14 INFO SecurityManager: Changing view acls to: root
18/06/11 19:32:14 INFO SecurityManager: Changing modify acls to: root
18/06/11 19:32:14 INFO SecurityManager: Changing view acls groups to:
18/06/11 19:32:14 INFO SecurityManager: Changing modify acls groups to:
18/06/11 19:32:14 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/06/11 19:32:15 INFO Utils: Successfully started service 'Driver' on port 40594.
18/06/11 19:32:15 INFO WorkerWatcher: Connecting to worker spark://Worker#10.1.185.87:38235
18/06/11 19:32:15 INFO TransportClientFactory: Successfully created connection to /10.1.185.87:38235 after 44 ms (0 ms spent in bootstraps)
18/06/11 19:32:15 INFO WorkerWatcher: Successfully connected to spark://Worker#10.1.185.87:38235
18/06/11 19:32:15 INFO CheckpointReader: Checkpoint files found: file:/ckp/checkpoint-1528712675000,file:/ckp/checkpoint-1528712675000.bk,file:/ckp/checkpoint-1528712670000,file:/ckp/checkpoint-1528712670000.bk,file:/ckp/checkpoint-1528712665000,file:/ckp/checkpoint-1528712665000.bk,file:/ckp/checkpoint-1528712660000,file:/ckp/checkpoint-1528712660000.bk,file:/ckp/checkpoint-1528712655000,file:/ckp/checkpoint-1528712655000.bk
18/06/11 19:32:15 INFO CheckpointReader: Attempting to load checkpoint from file file:/ckp/checkpoint-1528712675000
18/06/11 19:32:15 INFO Checkpoint: Checkpoint for time 1528712675000 ms validated
18/06/11 19:32:15 INFO CheckpointReader: Checkpoint successfully loaded from file file:/ckp/checkpoint-1528712675000
18/06/11 19:32:15 INFO CheckpointReader: Checkpoint was generated at time 1528712675000 ms
18/06/11 19:32:15 INFO SparkContext: Running Spark version 2.2.0
18/06/11 19:32:15 INFO SparkContext: Submitted application: SparkStreamingWithCheckPointAndZK
18/06/11 19:32:15 INFO SecurityManager: Changing view acls to: root
18/06/11 19:32:15 INFO SecurityManager: Changing modify acls to: root
18/06/11 19:32:15 INFO SecurityManager: Changing view acls groups to:
18/06/11 19:32:15 INFO SecurityManager: Changing modify acls groups to:
18/06/11 19:32:15 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/06/11 19:32:15 INFO Utils: Successfully started service 'sparkDriver' on port 46544.
18/06/11 19:32:15 INFO SparkEnv: Registering MapOutputTracker
18/06/11 19:32:15 INFO SparkEnv: Registering BlockManagerMaster
18/06/11 19:32:15 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
18/06/11 19:32:15 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
18/06/11 19:32:16 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-623c4b9e-8045-4a19-a746-96a3b23c1184
18/06/11 19:32:16 INFO MemoryStore: MemoryStore started with capacity 366.3 MB
18/06/11 19:32:16 INFO SparkEnv: Registering OutputCommitCoordinator
18/06/11 19:32:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
18/06/11 19:32:16 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.1.185.87:4040
18/06/11 19:32:16 INFO SparkContext: Added JAR file:///opt/spark/jars/spark-0.0.1-SNAPSHOT.jar at spark://10.1.185.87:46544/jars/spark-0.0.1-SNAPSHOT.jar with timestamp 1528745536460
18/06/11 19:32:16 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://10.1.170.81:7077...
18/06/11 19:32:36 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://10.1.170.81:7077...
18/06/11 19:32:56 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://10.1.170.81:7077...
18/06/11 19:33:16 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
18/06/11 19:33:16 WARN StandaloneSchedulerBackend: Application ID is not initialized yet.
18/06/11 19:33:16 INFO SparkUI: Stopped Spark web UI at http://10.1.185.87:4040
18/06/11 19:33:16 INFO StandaloneSchedulerBackend: Shutting down all executors
18/06/11 19:33:16 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46323.
18/06/11 19:33:16 INFO NettyBlockTransferService: Server created on 10.1.185.87:46323
18/06/11 19:33:16 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/06/11 19:33:16 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
18/06/11 19:33:16 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.1.185.87, 46323, None)
18/06/11 19:33:16 WARN StandaloneAppClient$ClientEndpoint: Drop UnregisterApplication(null) because has not yet connected to master
18/06/11 19:33:16 INFO BlockManagerMasterEndpoint: Registering block manager 10.1.185.87:46323 with 366.3 MB RAM, BlockManagerId(driver, 10.1.185.87, 46323, None)
18/06/11 19:33:16 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.1.185.87, 46323, None)
18/06/11 19:33:16 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.1.185.87, 46323, None)
18/06/11 19:33:16 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
18/06/11 19:33:16 INFO MemoryStore: MemoryStore cleared
18/06/11 19:33:16 INFO BlockManager: BlockManager stopped
18/06/11 19:33:16 INFO BlockManagerMaster: BlockManagerMaster stopped
18/06/11 19:33:16 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
18/06/11 19:33:16 ERROR SparkContext: Error initializing SparkContext.
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:141)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:829)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:829)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:829)
at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at org.merlin.spark.SparkKafkaStreamingWithGluster.main(SparkKafkaStreamingWithGluster.java:42)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
18/06/11 19:33:16 INFO SparkContext: SparkContext already stopped.
Exception in thread "main" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.Predef$.require(Predef.scala:224)
at org.apache.spark.metrics.MetricsSystem.getServletHandlers(MetricsSystem.scala:91)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:524)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2509)
at org.apache.spark.streaming.StreamingContext.<init>(StreamingContext.scala:141)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:829)
at org.apache.spark.streaming.StreamingContext$$anonfun$getOrCreate$1.apply(StreamingContext.scala:829)
at scala.Option.map(Option.scala:146)
at org.apache.spark.streaming.StreamingContext$.getOrCreate(StreamingContext.scala:829)
at org.apache.spark.streaming.api.java.JavaStreamingContext$.getOrCreate(JavaStreamingContext.scala:626)
at org.apache.spark.streaming.api.java.JavaStreamingContext.getOrCreate(JavaStreamingContext.scala)
at org.merlin.spark.SparkKafkaStreamingWithGluster.main(SparkKafkaStreamingWithGluster.java:42)
... 6 more
Am I choosing the right resource controller, i.e. Kubernetes StatefulSets, for Spark?
I'm new to this environment; any help will be highly appreciated.
It seems your driver is not able to reach the master node. Here is the relevant log line:
18/06/11 19:33:16 ERROR StandaloneSchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Try to telnet the master's IP and port from your client machine to verify connectivity.
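For example, using the master address the driver was trying to reach in the log above (substitute your own master service address):
telnet 10.1.170.81 7077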
