How to pass data from Kafka to Spark Streaming? - apache-spark

I am trying to pass data from kafka to spark streaming.
This is what I've done till now:
Installed both kafka and spark
Started zookeeper with default properties config
Started kafka server with default properties config
Started kafka producer
Started kafka consumer
Sent message from producer to consumer. Works fine.
Wrote kafka-spark.py to receive messages from kafka to spark.
I try running ./bin/spark-submit examples/src/main/python/kafka-spark.py
I get an error.
kafka-spark.py -
from __future__ import print_function
import sys
from pyspark.streaming import StreamingContext
from pyspark import SparkContext,SparkConf
from pyspark.streaming.kafka import KafkaUtils
if __name__ == "__main__":
#conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
conf = SparkConf().setAppName("Kafka-Spark")
#sc = SparkContext(appName="KafkaSpark")
sc = SparkContext(conf=conf)
stream=StreamingContext(sc,1)
map1={'spark-kafka':1}
kafkaStream = KafkaUtils.createStream(stream, 'localhost:9092', "name", map1) #tried with localhost:2181 too
print("kafkastream=",kafkaStream)
sc.stop()
Full Log including the Error on running spark-kafka.py:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/18 13:05:33 INFO SparkContext: Running Spark version 1.6.0
16/01/18 13:05:33 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/18 13:05:33 INFO SecurityManager: Changing view acls to: username
16/01/18 13:05:33 INFO SecurityManager: Changing modify acls to: username
16/01/18 13:05:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(username); users with modify permissions: Set(username)
16/01/18 13:05:33 INFO Utils: Successfully started service 'sparkDriver' on port 54446.
16/01/18 13:05:34 INFO Slf4jLogger: Slf4jLogger started
16/01/18 13:05:34 INFO Remoting: Starting remoting
16/01/18 13:05:34 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#127.0.0.1:50386]
16/01/18 13:05:34 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 50386.
16/01/18 13:05:34 INFO SparkEnv: Registering MapOutputTracker
16/01/18 13:05:34 INFO SparkEnv: Registering BlockManagerMaster
16/01/18 13:05:34 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-f5490271-cdb7-467d-a915-4f5ccab57f0e
16/01/18 13:05:34 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/18 13:05:34 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/18 13:05:34 INFO Server: jetty-8.y.z-SNAPSHOT
16/01/18 13:05:34 INFO AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/01/18 13:05:34 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/18 13:05:34 INFO SparkUI: Started SparkUI at http://127.0.0.1:4040
Java HotSpot(TM) Server VM warning: You have loaded library /tmp/libnetty-transport-native-epoll561240765619860252.so which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
16/01/18 13:05:34 INFO Utils: Copying ~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/kafka-spark.py to /tmp/spark-18227081-a1c8-43f2-8ca7-cfc4751f023f/userFiles-e93fc252-0ba1-42b7-b4fa-2e46f3a0601e/kafka-spark.py
16/01/18 13:05:34 INFO SparkContext: Added file file:~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/kafka-spark.py at file:~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/kafka-spark.py with timestamp 1453118734892
16/01/18 13:05:35 INFO Executor: Starting executor ID driver on host localhost
16/01/18 13:05:35 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 58970.
16/01/18 13:05:35 INFO NettyBlockTransferService: Server created on 58970
16/01/18 13:05:35 INFO BlockManagerMaster: Trying to register BlockManager
16/01/18 13:05:35 INFO BlockManagerMasterEndpoint: Registering block manager localhost:58970 with 511.1 MB RAM, BlockManagerId(driver, localhost, 58970)
16/01/18 13:05:35 INFO BlockManagerMaster: Registered BlockManager
________________________________________________________________________________________________
Spark Streaming's Kafka libraries not found in class path. Try one of the following.
1. Include the Kafka library and its dependencies with in the
spark-submit command as
$ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka:1.6.0 ...
2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-assembly, Version = 1.6.0.
Then, include the jar in the spark-submit command as
$ bin/spark-submit --jars <spark-streaming-kafka-assembly.jar> ...
________________________________________________________________________________________________
Traceback (most recent call last):
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/examples/src/main/python/kafka-spark.py", line 33, in <module>
kafkaStream = KafkaUtils.createStream(stream, 'localhost:9092', "name", map1)
File "~/Dropbox/Work/ITNow/spark/spark-1.6.0/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 80, in createStream
py4j.protocol.Py4JJavaError: An error occurred while calling o22.loadClass.
: java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaUtilsPythonHelper
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
at py4j.Gateway.invoke(Gateway.java:259)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:209)
at java.lang.Thread.run(Thread.java:745)
16/01/18 13:05:35 INFO SparkContext: Invoking stop() from shutdown hook
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/api,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/threadDump,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/rdd,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/storage,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/pool,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/job,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs/json,null}
16/01/18 13:05:35 INFO ContextHandler: stopped o.e.j.s.ServletContextHandler{/jobs,null}
16/01/18 13:05:35 INFO SparkUI: Stopped Spark web UI at http://127.0.0.1:4040
16/01/18 13:05:35 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/01/18 13:05:35 INFO MemoryStore: MemoryStore cleared
16/01/18 13:05:35 INFO BlockManager: BlockManager stopped
16/01/18 13:05:35 INFO BlockManagerMaster: BlockManagerMaster stopped
16/01/18 13:05:35 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/01/18 13:05:35 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
16/01/18 13:05:35 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
16/01/18 13:05:35 INFO SparkContext: Successfully stopped SparkContext
16/01/18 13:05:35 INFO ShutdownHookManager: Shutdown hook called
16/01/18 13:05:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-18227081-a1c8-43f2-8ca7-cfc4751f023f
16/01/18 13:05:35 INFO ShutdownHookManager: Deleting directory /tmp/spark-18227081-a1c8-43f2-8ca7-cfc4751f023f/pyspark-fcd47a97-57ef-46c3-bb16-357632580334
EDIT
On running ./bin/spark-submit --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar examples/src/main/python/kafka-spark.py I get the HEXADECIMAL location instead of the actual string:
kafkastream= <pyspark.streaming.dstream.TransformedDStream object at 0x7fd6c4dad150>
Any idea what am I doing wrong? I'm really new to kakfa and spark so I need some help here. Thanks!

You need to submit spark-streaming-kafka-assembly_*.jar with your job:
spark-submit --jars spark-streaming-kafka-assembly_2.10-1.5.2.jar ./spark-kafka.py

Alternatively, if you want to also specify resources to be allocated at the same time:
spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 20g --jars spark-streaming-kafka-assembly_2.10-1.6.0.jar ./spark-kafka.py
If you wanna run your code in a Jupyter-notebook, then this could be helpful:
from __future__ import print_function
import sys
from pyspark.streaming import StreamingContext
from pyspark import SparkContext,SparkConf
from pyspark.streaming.kafka import KafkaUtils
if __name__ == "__main__":
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell' #note that the "pyspark-shell" part is very important!!.
#conf = SparkConf().setAppName("Kafka-Spark").setMaster("spark://127.0.0.1:7077")
conf = SparkConf().setAppName("Kafka-Spark")
#sc = SparkContext(appName="KafkaSpark")
sc = SparkContext(conf=conf)
stream=StreamingContext(sc,1)
map1={'spark-kafka':1}
kafkaStream = KafkaUtils.createStream(stream, 'localhost:9092', "name", map1) #tried with localhost:2181 too
print("kafkastream=",kafkaStream)
sc.stop()
Note the introduction of the following line in __main__:
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars spark-streaming-kafka-assembly_2.10-1.6.0.jar pyspark-shell'
Sources: https://github.com/jupyter/docker-stacks/issues/154

To print a DStream, spark provides a method pprint for Python. So you'll use
kafkastream.pprint()

Related

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`product`' given input columns: [jsontostructs(message)];

C:\Users\sorun\.jdks\openjdk-14.0.1\bin\java.exe "-javaagent:D:\Intellij IDEA\IntelliJ IDEA 2020.1.1\lib\idea_rt.jar=50945:D:\Intellij IDEA\IntelliJ IDEA 2020.1.1\bin" -Dfile.encoding=UTF-8 -classpath C:\Users\sorun\IdeaProjects\spark-streaming-kafka\target\classes;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sql_2.11\2.2.0\spark-sql_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\com\univocity\univocity-parsers\2.2.1\univocity-parsers-2.2.1.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sketch_2.11\2.2.0\spark-sketch_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-core_2.11\2.2.0\spark-core_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro\1.7.7\avro-1.7.7.jar;C:\Users\sorun\.m2\repository\com\thoughtworks\paranamer\paranamer\2.3\paranamer-2.3.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-compress\1.4.1\commons-compress-1.4.1.jar;C:\Users\sorun\.m2\repository\org\tukaani\xz\1.0\xz-1.0.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-mapred\1.7.7\avro-mapred-1.7.7-hadoop2.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7.jar;C:\Users\sorun\.m2\repository\org\apache\avro\avro-ipc\1.7.7\avro-ipc-1.7.7-tests.jar;C:\Users\sorun\.m2\repository\com\twitter\chill_2.11\0.8.0\chill_2.11-0.8.0.jar;C:\Users\sorun\.m2\repository\com\esotericsoftware\kryo-shaded\3.0.3\kryo-shaded-3.0.3.jar;C:\Users\sorun\.m2\repository\com\esotericsoftware\minlog\1.3.0\minlog-1.3.0.jar;C:\Users\sorun\.m2\repository\org\objenesis\objenesis\2.1\objenesis-2.1.jar;C:\Users\sorun\.m2\repository\com\twitter\chill-java\0.8.0\chill-java-0.8.0.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-client\2.6.5\hadoop-client-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-common\2.6.5\hadoop-common-2.6.5.jar;C:\Users\sorun\.m2\repository\commons-cli\commons-cli\1.2\commons-cli-1.2.jar;C:\Users\sorun\.m2\repository\xmlenc\xmlenc\0.52\xmlenc-0.52.jar;C:\Users\sorun\.m2\repository\commons-httpclient\commons-httpclient\3.1\commons-httpclient-3.1.jar;C:\Users\sorun\.m2\repository\commons-io\commons-io\2.4\commons-io-2.4.jar;C:\Users\sorun\.m2\repository\commons-collections\commons-collections\3.2.2\commons-collections-3.2.2.jar;C:\Users\sorun\.m2\repository\commons-lang\commons-lang\2.6\commons-lang-2.6.jar;C:\Users\sorun\.m2\repository\commons-configuration\commons-configuration\1.6\commons-configuration-1.6.jar;C:\Users\sorun\.m2\repository\commons-digester\commons-digester\1.8\commons-digester-1.8.jar;C:\Users\sorun\.m2\repository\commons-beanutils\commons-beanutils\1.7.0\commons-beanutils-1.7.0.jar;C:\Users\sorun\.m2\repository\commons-beanutils\commons-beanutils-core\1.8.0\commons-beanutils-core-1.8.0.jar;C:\Users\sorun\.m2\repository\com\google\protobuf\protobuf-java\2.5.0\protobuf-java-2.5.0.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-auth\2.6.5\hadoop-auth-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\directory\server\apacheds-kerberos-codec\2.0.0-M15\apacheds-kerberos-codec-2.0.0-M15.jar;C:\Users\sorun\.m2\repository\org\apache\directory\server\apacheds-i18n\2.0.0-M15\apacheds-i18n-2.0.0-M15.jar;C:\Users\sorun\.m2\repository\org\apache\directory\api\api-asn1-api\1.0.0-M20\api-asn1-api-1.0.0-M20.jar;C:\Users\sorun\.m2\repository\org\apache\directory\api\api-util\1.0.0-M20\api-util-1.0.0-M20.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-client\2.6.0\curator-client-2.6.0.jar;C:\Users\sorun\.m2\repository\org\htrace\htrace-core\3.0.4\htrace-core-3.0.4.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-hdfs\2.6.5\hadoop-hdfs-2.6.5.jar;C:\Users\sorun\.m2\repository\org\mortbay\jetty\jetty-util\6.1.26\jetty-util-6.1.26.jar;C:\Users\sorun\.m2\repository\xerces\xercesImpl\2.9.1\xercesImpl-2.9.1.jar;C:\Users\sorun\.m2\repository\xml-apis\xml-apis\1.3.04\xml-apis-1.3.04.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-app\2.6.5\hadoop-mapreduce-client-app-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-common\2.6.5\hadoop-mapreduce-client-common-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-client\2.6.5\hadoop-yarn-client-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-server-common\2.6.5\hadoop-yarn-server-common-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-shuffle\2.6.5\hadoop-mapreduce-client-shuffle-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-api\2.6.5\hadoop-yarn-api-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-core\2.6.5\hadoop-mapreduce-client-core-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-yarn-common\2.6.5\hadoop-yarn-common-2.6.5.jar;C:\Users\sorun\.m2\repository\javax\xml\bind\jaxb-api\2.2.2\jaxb-api-2.2.2.jar;C:\Users\sorun\.m2\repository\javax\xml\stream\stax-api\1.0-2\stax-api-1.0-2.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-jaxrs\1.9.13\jackson-jaxrs-1.9.13.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-xc\1.9.13\jackson-xc-1.9.13.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-mapreduce-client-jobclient\2.6.5\hadoop-mapreduce-client-jobclient-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\hadoop\hadoop-annotations\2.6.5\hadoop-annotations-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-launcher_2.11\2.2.0\spark-launcher_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-network-common_2.11\2.2.0\spark-network-common_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\fusesource\leveldbjni\leveldbjni-all\1.8\leveldbjni-all-1.8.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-network-shuffle_2.11\2.2.0\spark-network-shuffle_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-unsafe_2.11\2.2.0\spark-unsafe_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\net\java\dev\jets3t\jets3t\0.9.3\jets3t-0.9.3.jar;C:\Users\sorun\.m2\repository\org\apache\httpcomponents\httpcore\4.3.3\httpcore-4.3.3.jar;C:\Users\sorun\.m2\repository\org\apache\httpcomponents\httpclient\4.3.6\httpclient-4.3.6.jar;C:\Users\sorun\.m2\repository\javax\activation\activation\1.1.1\activation-1.1.1.jar;C:\Users\sorun\.m2\repository\mx4j\mx4j\3.0.2\mx4j-3.0.2.jar;C:\Users\sorun\.m2\repository\javax\mail\mail\1.4.7\mail-1.4.7.jar;C:\Users\sorun\.m2\repository\org\bouncycastle\bcprov-jdk15on\1.51\bcprov-jdk15on-1.51.jar;C:\Users\sorun\.m2\repository\com\jamesmurty\utils\java-xmlbuilder\1.0\java-xmlbuilder-1.0.jar;C:\Users\sorun\.m2\repository\net\iharder\base64\2.3.8\base64-2.3.8.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-recipes\2.6.0\curator-recipes-2.6.0.jar;C:\Users\sorun\.m2\repository\org\apache\curator\curator-framework\2.6.0\curator-framework-2.6.0.jar;C:\Users\sorun\.m2\repository\org\apache\zookeeper\zookeeper\3.4.6\zookeeper-3.4.6.jar;C:\Users\sorun\.m2\repository\com\google\guava\guava\16.0.1\guava-16.0.1.jar;C:\Users\sorun\.m2\repository\javax\servlet\javax.servlet-api\3.1.0\javax.servlet-api-3.1.0.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-lang3\3.5\commons-lang3-3.5.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-math3\3.4.1\commons-math3-3.4.1.jar;C:\Users\sorun\.m2\repository\com\google\code\findbugs\jsr305\1.3.9\jsr305-1.3.9.jar;C:\Users\sorun\.m2\repository\org\slf4j\slf4j-api\1.7.16\slf4j-api-1.7.16.jar;C:\Users\sorun\.m2\repository\org\slf4j\jul-to-slf4j\1.7.16\jul-to-slf4j-1.7.16.jar;C:\Users\sorun\.m2\repository\org\slf4j\jcl-over-slf4j\1.7.16\jcl-over-slf4j-1.7.16.jar;C:\Users\sorun\.m2\repository\log4j\log4j\1.2.17\log4j-1.2.17.jar;C:\Users\sorun\.m2\repository\org\slf4j\slf4j-log4j12\1.7.16\slf4j-log4j12-1.7.16.jar;C:\Users\sorun\.m2\repository\com\ning\compress-lzf\1.0.3\compress-lzf-1.0.3.jar;C:\Users\sorun\.m2\repository\org\xerial\snappy\snappy-java\1.1.2.6\snappy-java-1.1.2.6.jar;C:\Users\sorun\.m2\repository\net\jpountz\lz4\lz4\1.3.0\lz4-1.3.0.jar;C:\Users\sorun\.m2\repository\org\roaringbitmap\RoaringBitmap\0.5.11\RoaringBitmap-0.5.11.jar;C:\Users\sorun\.m2\repository\commons-net\commons-net\2.2\commons-net-2.2.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-library\2.11.8\scala-library-2.11.8.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-jackson_2.11\3.2.11\json4s-jackson_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-core_2.11\3.2.11\json4s-core_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\json4s\json4s-ast_2.11\3.2.11\json4s-ast_2.11-3.2.11.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scalap\2.11.0\scalap-2.11.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-compiler\2.11.0\scala-compiler-2.11.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\modules\scala-xml_2.11\1.0.1\scala-xml_2.11-1.0.1.jar;C:\Users\sorun\.m2\repository\org\scala-lang\modules\scala-parser-combinators_2.11\1.0.1\scala-parser-combinators_2.11-1.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-client\2.22.2\jersey-client-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\ws\rs\javax.ws.rs-api\2.0.1\javax.ws.rs-api-2.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-api\2.4.0-b34\hk2-api-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-utils\2.4.0-b34\hk2-utils-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\external\aopalliance-repackaged\2.4.0-b34\aopalliance-repackaged-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\external\javax.inject\2.4.0-b34\javax.inject-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\hk2-locator\2.4.0-b34\hk2-locator-2.4.0-b34.jar;C:\Users\sorun\.m2\repository\org\javassist\javassist\3.18.1-GA\javassist-3.18.1-GA.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-common\2.22.2\jersey-common-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\annotation\javax.annotation-api\1.2\javax.annotation-api-1.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\bundles\repackaged\jersey-guava\2.22.2\jersey-guava-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\hk2\osgi-resource-locator\1.0.1\osgi-resource-locator-1.0.1.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\core\jersey-server\2.22.2\jersey-server-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\media\jersey-media-jaxb\2.22.2\jersey-media-jaxb-2.22.2.jar;C:\Users\sorun\.m2\repository\javax\validation\validation-api\1.1.0.Final\validation-api-1.1.0.Final.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\containers\jersey-container-servlet\2.22.2\jersey-container-servlet-2.22.2.jar;C:\Users\sorun\.m2\repository\org\glassfish\jersey\containers\jersey-container-servlet-core\2.22.2\jersey-container-servlet-core-2.22.2.jar;C:\Users\sorun\.m2\repository\io\netty\netty-all\4.0.43.Final\netty-all-4.0.43.Final.jar;C:\Users\sorun\.m2\repository\io\netty\netty\3.9.9.Final\netty-3.9.9.Final.jar;C:\Users\sorun\.m2\repository\com\clearspring\analytics\stream\2.7.0\stream-2.7.0.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-core\3.1.2\metrics-core-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-jvm\3.1.2\metrics-jvm-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-json\3.1.2\metrics-json-3.1.2.jar;C:\Users\sorun\.m2\repository\io\dropwizard\metrics\metrics-graphite\3.1.2\metrics-graphite-3.1.2.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\module\jackson-module-scala_2.11\2.6.5\jackson-module-scala_2.11-2.6.5.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\module\jackson-module-paranamer\2.6.5\jackson-module-paranamer-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\ivy\ivy\2.4.0\ivy-2.4.0.jar;C:\Users\sorun\.m2\repository\oro\oro\2.0.8\oro-2.0.8.jar;C:\Users\sorun\.m2\repository\net\razorvine\pyrolite\4.13\pyrolite-4.13.jar;C:\Users\sorun\.m2\repository\net\sf\py4j\py4j\0.10.4\py4j-0.10.4.jar;C:\Users\sorun\.m2\repository\org\apache\commons\commons-crypto\1.0.0\commons-crypto-1.0.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-catalyst_2.11\2.2.0\spark-catalyst_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\scala-lang\scala-reflect\2.11.8\scala-reflect-2.11.8.jar;C:\Users\sorun\.m2\repository\org\codehaus\janino\janino\3.0.0\janino-3.0.0.jar;C:\Users\sorun\.m2\repository\org\codehaus\janino\commons-compiler\3.0.0\commons-compiler-3.0.0.jar;C:\Users\sorun\.m2\repository\org\antlr\antlr4-runtime\4.5.3\antlr4-runtime-4.5.3.jar;C:\Users\sorun\.m2\repository\commons-codec\commons-codec\1.10\commons-codec-1.10.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-tags_2.11\2.2.0\spark-tags_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-column\1.8.2\parquet-column-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-common\1.8.2\parquet-common-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-encoding\1.8.2\parquet-encoding-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-hadoop\1.8.2\parquet-hadoop-1.8.2.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-format\2.3.1\parquet-format-2.3.1.jar;C:\Users\sorun\.m2\repository\org\apache\parquet\parquet-jackson\1.8.2\parquet-jackson-1.8.2.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-mapper-asl\1.9.11\jackson-mapper-asl-1.9.11.jar;C:\Users\sorun\.m2\repository\org\codehaus\jackson\jackson-core-asl\1.9.11\jackson-core-asl-1.9.11.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-databind\2.6.5\jackson-databind-2.6.5.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-annotations\2.6.0\jackson-annotations-2.6.0.jar;C:\Users\sorun\.m2\repository\com\fasterxml\jackson\core\jackson-core\2.6.5\jackson-core-2.6.5.jar;C:\Users\sorun\.m2\repository\org\apache\xbean\xbean-asm5-shaded\4.4\xbean-asm5-shaded-4.4.jar;C:\Users\sorun\.m2\repository\org\spark-project\spark\unused\1.0.0\unused-1.0.0.jar;C:\Users\sorun\.m2\repository\org\apache\spark\spark-sql-kafka-0-10_2.11\2.2.0\spark-sql-kafka-0-10_2.11-2.2.0.jar;C:\Users\sorun\.m2\repository\org\apache\kafka\kafka-clients\0.10.0.1\kafka-clients-0.10.0.1.jar;C:\Users\sorun\.m2\repository\com\google\code\gson\gson\2.8.3\gson-2.8.3.jar StreamingConsumer
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/06/19 12:39:42 INFO SparkContext: Running Spark version 2.2.0
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/C:/Users/sorun/.m2/repository/org/apache/hadoop/hadoop-auth/2.6.5/hadoop-auth-2.6.5.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/06/19 12:39:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/06/19 12:39:44 INFO SparkContext: Submitted application: Streaming-kafka
20/06/19 12:39:44 INFO SecurityManager: Changing view acls to: OZAN-OKAN
20/06/19 12:39:44 INFO SecurityManager: Changing modify acls to: OZAN-OKAN
20/06/19 12:39:44 INFO SecurityManager: Changing view acls groups to:
20/06/19 12:39:44 INFO SecurityManager: Changing modify acls groups to:
20/06/19 12:39:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(OZAN-OKAN); groups with view permissions: Set(); users with modify permissions: Set(OZAN-OKAN); groups with modify permissions: Set()
20/06/19 12:39:45 INFO Utils: Successfully started service 'sparkDriver' on port 50966.
20/06/19 12:39:45 INFO SparkEnv: Registering MapOutputTracker
20/06/19 12:39:45 INFO SparkEnv: Registering BlockManagerMaster
20/06/19 12:39:45 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/06/19 12:39:45 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/06/19 12:39:45 INFO DiskBlockManager: Created local directory at C:\Users\sorun\AppData\Local\Temp\blockmgr-0794380e-6e2b-4559-bf6c-7d10c2074bc8
20/06/19 12:39:45 INFO MemoryStore: MemoryStore started with capacity 1040.4 MB
20/06/19 12:39:45 INFO SparkEnv: Registering OutputCommitCoordinator
20/06/19 12:39:45 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/06/19 12:39:46 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.1:4040
20/06/19 12:39:46 INFO Executor: Starting executor ID driver on host localhost
20/06/19 12:39:46 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 50975.
20/06/19 12:39:46 INFO NettyBlockTransferService: Server created on 192.168.56.1:50975
20/06/19 12:39:46 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/06/19 12:39:46 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:50975 with 1040.4 MB RAM, BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.1, 50975, None)
20/06/19 12:39:46 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/C:/Users/sorun/IdeaProjects/spark-streaming-kafka/spark-warehouse/').
20/06/19 12:39:46 INFO SharedState: Warehouse path is 'file:/C:/Users/sorun/IdeaProjects/spark-streaming-kafka/spark-warehouse/'.
20/06/19 12:39:47 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
20/06/19 12:39:47 INFO CatalystSqlParser: Parsing command: string
20/06/19 12:39:49 INFO SparkSqlParser: Parsing command: CAST(value AS STRING) message
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`product`' given input columns: [jsontostructs(message)];
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:88)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4$$anonfun$apply$10.apply(TreeNode.scala:323)
at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
at scala.collection.MapLike$MappedValues$$anonfun$iterator$3.apply(MapLike.scala:246)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.IterableLike$$anon$1.foreach(IterableLike.scala:311)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.MapBuilder.$plus$plus$eq(MapBuilder.scala:25)
at scala.collection.TraversableViewLike$class.force(TraversableViewLike.scala:88)
at scala.collection.IterableLike$$anon$1.force(IterableLike.scala:311)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:331)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$transformExpressionsUp$1.apply(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:279)
at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$recursiveTransform$1(QueryPlan.scala:289)
at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$6.apply(QueryPlan.scala:298)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:298)
at org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsUp(QueryPlan.scala:268)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:85)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:78)
at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:78)
at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.resolveAndBind(ExpressionEncoder.scala:256)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:206)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:170)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:61)
at org.apache.spark.sql.Dataset.as(Dataset.scala:380)
at StreamingConsumer.main(StreamingConsumer.java:24)
20/06/19 12:39:50 INFO SparkContext: Invoking stop() from shutdown hook
20/06/19 12:39:50 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4040
20/06/19 12:39:50 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/06/19 12:39:50 INFO MemoryStore: MemoryStore cleared
20/06/19 12:39:50 INFO BlockManager: BlockManager stopped
20/06/19 12:39:50 INFO BlockManagerMaster: BlockManagerMaster stopped
20/06/19 12:39:50 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/06/19 12:39:50 INFO SparkContext: Successfully stopped SparkContext
20/06/19 12:39:50 INFO ShutdownHookManager: Shutdown hook called
20/06/19 12:39:50 INFO ShutdownHookManager: Deleting directory C:\Users\sorun\AppData\Local\Temp\spark-b70ecbcc-e6cf-4328-9069-97cc41cc72d7
Process finished with exit code 1
CODE
Exception in thread "main" org.apache.spark.sql.AnalysisException:
cannot resolve '`product`' given input columns: [jsontostructs(message)];
Above exception message says the column which you are selecting is not available in DataFrame, rename the column jsontostructs(message) to product & use this column in select.
And if you have "message" field in your model,
add it to schema struct type
StructType schema = new StructType().add("product","string").add("time", DataTypes.TimestampType).add("message", DataTypes.StringType);
Change schema).as("json"))
Dataset<SearchProductModel> data = load.selectExpr("CAST(value AS STRING) as message")
.select(functions.from_json(functions.col("message"), schema).as("json"))
.select("json.*")
.as(Encoders.bean(SearchProductModel.class));

Error when running spark-submit: java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition

spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
I ran the code above but I don't know why I got this error. I spent hours to fix but I cannot.
I am using Spark 2.4.4 and Scala 2.13.0. I tried to set spark.executor.memory and spark.driver.memory in my Spark configuration file but i still could not solve the problem.
Here is the error:
(tutorial-env) (base) harry#harry-badass:~/Desktop/twitter_project$ spark-submit --jars spark-streaming-kafka-0-8_2.11-2.4.4.jar direct_approach.py localhost:9092 new_topic
19/12/14 14:27:23 WARN Utils: Your hostname, harry-badass resolves to a loopback address: 127.0.1.1; using 220.149.84.46 instead (on interface enp4s0)
19/12/14 14:27:23 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.4.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/12/14 14:27:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/12/14 14:27:24 INFO SparkContext: Running Spark version 2.4.4
19/12/14 14:27:24 INFO SparkContext: Submitted application: PythonStreamingDirectKafkaWordCount
19/12/14 14:27:24 INFO SecurityManager: Changing view acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls to: harry
19/12/14 14:27:24 INFO SecurityManager: Changing view acls groups to:
19/12/14 14:27:24 INFO SecurityManager: Changing modify acls groups to:
19/12/14 14:27:24 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(harry); groups with view permissions: Set(); users with modify permissions: Set(harry); groups with modify permissions: Set()
19/12/14 14:27:24 INFO Utils: Successfully started service 'sparkDriver' on port 41699.
19/12/14 14:27:24 INFO SparkEnv: Registering MapOutputTracker
19/12/14 14:27:24 INFO SparkEnv: Registering BlockManagerMaster
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/12/14 14:27:24 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/12/14 14:27:24 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-2067d2bb-4b7c-49d8-8f02-f20e8467b21e
19/12/14 14:27:24 INFO MemoryStore: MemoryStore started with capacity 434.4 MB
19/12/14 14:27:24 INFO SparkEnv: Registering OutputCommitCoordinator
19/12/14 14:27:24 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/12/14 14:27:24 INFO Utils: Successfully started service 'SparkUI' on port 4041.
19/12/14 14:27:24 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://220.149.84.46:4041
19/12/14 14:27:24 INFO SparkContext: Added JAR file:///home/harry/Desktop/twitter_project/spark-streaming-kafka-0-8_2.11-2.4.4.jar at spark://220.149.84.46:41699/jars/spark-streaming-kafka-0-8_2.11-2.4.4.jar with timestamp 1576301244901
19/12/14 14:27:24 INFO Executor: Starting executor ID driver on host localhost
19/12/14 14:27:25 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46637.
19/12/14 14:27:25 INFO NettyBlockTransferService: Server created on 220.149.84.46:46637
19/12/14 14:27:25 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/12/14 14:27:25 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMasterEndpoint: Registering block manager 220.149.84.46:46637 with 434.4 MB RAM, BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 220.149.84.46, 46637, None)
19/12/14 14:27:25 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 220.149.84.46, 46637, None)
Exception in thread "Thread-5" java.lang.NoClassDefFoundError: kafka/common/TopicAndPartition
at java.base/java.lang.Class.getDeclaredMethods0(Native Method)
at java.base/java.lang.Class.privateGetDeclaredMethods(Class.java:3139)
at java.base/java.lang.Class.privateGetPublicMethods(Class.java:3164)
at java.base/java.lang.Class.getMethods(Class.java:1861)
at py4j.reflection.ReflectionEngine.getMethodsByNameAndLength(ReflectionEngine.java:345)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:305)
at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
at py4j.Gateway.invoke(Gateway.java:274)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:844)
Caused by: java.lang.ClassNotFoundException: kafka.common.TopicAndPartition
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:466)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:563)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:496)
... 12 more
ERROR:root:Exception while sending command.
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
response = connection.send_command(command)
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
"Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
File "/home/harry/Desktop/twitter_project/direct_approach.py", line 9, in <module>
kvs = KafkaUtils.createDirectStream(ssc, [topic],{"metadata.broker.list": brokers})
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 146, in createDirectStream
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/home/harry/tutorial-env/lib/python3.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o24.createDirectStreamWithoutMessageHandler
19/12/14 14:27:25 INFO SparkContext: Invoking stop() from shutdown hook
19/12/14 14:27:25 INFO SparkUI: Stopped Spark web UI at http://220.149.84.46:4041
19/12/14 14:27:25 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
19/12/14 14:27:25 INFO MemoryStore: MemoryStore cleared
19/12/14 14:27:25 INFO BlockManager: BlockManager stopped
19/12/14 14:27:25 INFO BlockManagerMaster: BlockManagerMaster stopped
19/12/14 14:27:25 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
19/12/14 14:27:25 INFO SparkContext: Successfully stopped SparkContext
19/12/14 14:27:25 INFO ShutdownHookManager: Shutdown hook called
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-8e271f94-bec9-4f7e-aad0-1f3b651e9b29
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394
19/12/14 14:27:25 INFO ShutdownHookManager: Deleting directory /tmp/spark-747cc9ca-bca4-42a7-ad82-d6a055727394/pyspark-83cc90cc-1aaa-4dea-b364-4b66487be18f
Memory doesn't help find a missing class. You need to download the kafka-clients JAR as well
Note: You can use --packages instead of downloading jars

Cannot connect kafka server to spark

I try to stream data from kafka server to spark. As you guess I failed. I use spark 2.2.0 and kafka_2.11-0.11.0.1. I loaded jars to eclipse and runned below code.
package com.defne
import java.nio.ByteBuffer
import scala.util.Random
import org.apache.spark._
import org.apache.spark.streaming.dstream._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.storage.StorageLevel
import org.apache.log4j.Level
import java.util.regex.Pattern
import java.util.regex.Matcher
import kafka.serializer.StringDecoder
import Utilities._
import org.apache.spark.streaming.kafka
import org.apache.spark.streaming.kafka.KafkaUtils
object KafkaExample {
def main(args: Array[String]) {
val ssc = new StreamingContext("local[*]", "KafkaExample", Seconds(1))
val kafkaParams = Map("metadata.broker.list" -> "kafkaIP:9092", "group.id" -> "console-consumer-9526", "zookeeper.connect" -> "localhost:2181")
val topics = List("logstash_log").toSet
val lines = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics).map(_._2)
lines.print()
ssc.checkpoint("C:/checkpoint/")
ssc.start()
ssc.awaitTermination()
}
}
And I got below output. Interesting thing is no error exists but somehow I cannot connect to the kafka server.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/11/01 10:16:55 INFO SparkContext: Running Spark version 2.2.0
17/11/01 10:16:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/11/01 10:16:56 INFO SparkContext: Submitted application: KafkaExample
17/11/01 10:16:56 INFO SecurityManager: Changing view acls to: user
17/11/01 10:16:56 INFO SecurityManager: Changing modify acls to: user
17/11/01 10:16:56 INFO SecurityManager: Changing view acls groups to:
17/11/01 10:16:56 INFO SecurityManager: Changing modify acls groups to:
17/11/01 10:16:56 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user); groups with view permissions: Set(); users with modify permissions: Set(user); groups with modify permissions: Set()
17/11/01 10:16:58 INFO Utils: Successfully started service 'sparkDriver' on port 53749.
17/11/01 10:16:59 INFO SparkEnv: Registering MapOutputTracker
17/11/01 10:16:59 INFO SparkEnv: Registering BlockManagerMaster
17/11/01 10:16:59 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
17/11/01 10:16:59 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
17/11/01 10:16:59 INFO DiskBlockManager: Created local directory at C:\Users\user\AppData\Local\Temp\blockmgr-2fa455d5-ef26-4fb9-ba4b-caf9f2fa3a68
17/11/01 10:16:59 INFO MemoryStore: MemoryStore started with capacity 897.6 MB
17/11/01 10:16:59 INFO SparkEnv: Registering OutputCommitCoordinator
17/11/01 10:16:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
17/11/01 10:17:00 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.56.1:4040
17/11/01 10:17:00 INFO Executor: Starting executor ID driver on host localhost
17/11/01 10:17:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 53770.
17/11/01 10:17:00 INFO NettyBlockTransferService: Server created on 192.168.56.1:53770
17/11/01 10:17:00 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
17/11/01 10:17:00 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 192.168.56.1, 53770, None)
17/11/01 10:17:00 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.56.1:53770 with 897.6 MB RAM, BlockManagerId(driver, 192.168.56.1, 53770, None)
17/11/01 10:17:00 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 192.168.56.1, 53770, None)
17/11/01 10:17:00 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, 192.168.56.1, 53770, None)
17/11/01 10:17:01 INFO VerifiableProperties: Verifying properties
17/11/01 10:17:01 INFO VerifiableProperties: Property group.id is overridden to console-consumer-9526
17/11/01 10:17:01 INFO VerifiableProperties: Property zookeeper.connect is overridden to localhost:2181
17/11/01 10:17:02 INFO SimpleConsumer: Reconnect due to error:
java.lang.NoSuchMethodError: org.apache.kafka.common.network.NetworkSend.<init>(Ljava/lang/String;Ljava/nio/ByteBuffer;)V
at kafka.network.RequestOrResponseSend.<init>(RequestOrResponseSend.scala:41)
at kafka.network.RequestOrResponseSend.<init>(RequestOrResponseSend.scala:44)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:114)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:88)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:86)
at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:114)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$getPartitionMetadata$1.apply(KafkaCluster.scala:126)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$getPartitionMetadata$1.apply(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:346)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at com.defne.KafkaExample$.main(KafkaExample.scala:44)
at com.defne.KafkaExample.main(KafkaExample.scala)
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.kafka.common.network.NetworkSend.<init>(Ljava/lang/String;Ljava/nio/ByteBuffer;)V
at kafka.network.RequestOrResponseSend.<init>(RequestOrResponseSend.scala:41)
at kafka.network.RequestOrResponseSend.<init>(RequestOrResponseSend.scala:44)
at kafka.network.BlockingChannel.send(BlockingChannel.scala:114)
at kafka.consumer.SimpleConsumer.liftedTree1$1(SimpleConsumer.scala:101)
at kafka.consumer.SimpleConsumer.kafka$consumer$SimpleConsumer$$sendRequest(SimpleConsumer.scala:86)
at kafka.consumer.SimpleConsumer.send(SimpleConsumer.scala:114)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$getPartitionMetadata$1.apply(KafkaCluster.scala:126)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$getPartitionMetadata$1.apply(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:346)
at org.apache.spark.streaming.kafka.KafkaCluster$$anonfun$org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers$1.apply(KafkaCluster.scala:342)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
at org.apache.spark.streaming.kafka.KafkaCluster.org$apache$spark$streaming$kafka$KafkaCluster$$withBrokers(KafkaCluster.scala:342)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitionMetadata(KafkaCluster.scala:125)
at org.apache.spark.streaming.kafka.KafkaCluster.getPartitions(KafkaCluster.scala:112)
at org.apache.spark.streaming.kafka.KafkaUtils$.getFromOffsets(KafkaUtils.scala:211)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:484)
at com.defne.KafkaExample$.main(KafkaExample.scala:44)
at com.defne.KafkaExample.main(KafkaExample.scala)
17/11/01 10:17:02 INFO SparkContext: Invoking stop() from shutdown hook
17/11/01 10:17:02 INFO SparkUI: Stopped Spark web UI at http://192.168.56.1:4040
17/11/01 10:17:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17/11/01 10:17:02 INFO MemoryStore: MemoryStore cleared
17/11/01 10:17:02 INFO BlockManager: BlockManager stopped
17/11/01 10:17:02 INFO BlockManagerMaster: BlockManagerMaster stopped
17/11/01 10:17:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17/11/01 10:17:02 INFO SparkContext: Successfully stopped SparkContext
17/11/01 10:17:02 INFO ShutdownHookManager: Shutdown hook called
17/11/01 10:17:02 INFO ShutdownHookManager: Deleting directory C:\Users\user\AppData\Local\Temp\spark-a584950c-10ca-422b-990e-fd1980e2260c
Any help will be greatly appreciated.
NOTE
Add hostname:port for Kafka brokers, not Zookeeper
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
val ssc = new StreamingContext(new SparkConf, Seconds(60))
// hostname:port for Kafka brokers, not Zookeeper
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092,anotherhost:9092")
val topics = Set("sometopic", "anothertopic")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

Kafka message consumption with spark

I am using HDP-2.3 sandbox for Consuming kafka messages by running SPARK submit job.
i am putting some messages in kafka as below:
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic webevent
OR
kafka-console-producer.sh --broker-list sandbox.hortonworks.com:6667 --topic test --new-producer < myfile.txt
Now i need to consume above messages from spark job as shown below:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:2181 webevent 10
Where 2181 is a zookeeper port
I am getting Error as shown(Guide me how to consume that message from Kafka):
16/05/02 15:21:30 INFO SparkContext: Running Spark version 1.3.1
16/05/02 15:21:30 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/02 15:21:31 INFO SecurityManager: Changing view acls to: root
16/05/02 15:21:31 INFO SecurityManager: Changing modify acls to: root
16/05/02 15:21:31 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/05/02 15:21:31 INFO Slf4jLogger: Slf4jLogger started
16/05/02 15:21:31 INFO Remoting: Starting remoting
16/05/02 15:21:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver#sandbox.hortonworks.com:53950]
16/05/02 15:21:32 INFO Utils: Successfully started service 'sparkDriver' on port 53950.
16/05/02 15:21:32 INFO SparkEnv: Registering MapOutputTracker
16/05/02 15:21:32 INFO SparkEnv: Registering BlockManagerMaster
16/05/02 15:21:32 INFO DiskBlockManager: Created local directory at /tmp/spark-c70b08b9-41a3-42c8-9d83-bc4258e299c6/blockmgr-c2d86de6-34a7-497c-8018-d3437a100e87
16/05/02 15:21:32 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
16/05/02 15:21:32 INFO HttpFileServer: HTTP File server directory is /tmp/spark-a8f7ade9-292c-42c4-9e54-43b3b3495b0c/httpd-65d36d04-1e2a-4e69-8d20-295465100070
16/05/02 15:21:32 INFO HttpServer: Starting HTTP Server
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SocketConnector#0.0.0.0:37014
16/05/02 15:21:32 INFO Utils: Successfully started service 'HTTP file server' on port 37014.
16/05/02 15:21:32 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/02 15:21:32 INFO Server: jetty-8.y.z-SNAPSHOT
16/05/02 15:21:32 INFO AbstractConnector: Started SelectChannelConnector#0.0.0.0:4040
16/05/02 15:21:32 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/02 15:21:32 INFO SparkUI: Started SparkUI at http://sandbox.hortonworks.com:4040
16/05/02 15:21:33 INFO SparkContext: Added JAR file:/usr/hdp/2.3.0.0-2130/spark/lib/spark-examples-1.4.1-hadoop2.4.0.jar at http://192.168.255.150:37014/jars/spark-examples-1.4.1-hadoop2.4.0.jar with timestamp 1462202493866
16/05/02 15:21:34 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster#192.168.255.150:7077/user/Master...
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20160502152134-0000
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor added: app-20160502152134-0000/0 on worker-20160502150437-sandbox.hortonworks.com-36920 (sandbox.hortonworks.com:36920) with 1 cores
16/05/02 15:21:34 INFO SparkDeploySchedulerBackend: Granted executor ID app-20160502152134-0000/0 on hostPort sandbox.hortonworks.com:36920 with 1 cores, 512.0 MB RAM
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now RUNNING
16/05/02 15:21:34 INFO AppClient$ClientActor: Executor updated: app-20160502152134-0000/0 is now LOADING
16/05/02 15:21:34 INFO NettyBlockTransferService: Server created on 43440
16/05/02 15:21:34 INFO BlockManagerMaster: Trying to register BlockManager
16/05/02 15:21:34 INFO BlockManagerMasterActor: Registering block manager sandbox.hortonworks.com:43440 with 265.4 MB RAM, BlockManagerId(<driver>, sandbox.hortonworks.com, 43440)
16/05/02 15:21:34 INFO BlockManagerMaster: Registered BlockManager
16/05/02 15:21:35 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
16/05/02 15:21:35 INFO VerifiableProperties: Verifying properties
16/05/02 15:21:35 INFO VerifiableProperties: Property group.id is overridden to
16/05/02 15:21:35 INFO VerifiableProperties: Property zookeeper.connect is overridden to
16/05/02 15:21:35 INFO SimpleConsumer: Reconnect due to socket error: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
Error: application failed with exception
org.apache.spark.SparkException: java.io.EOFException: Received -1 when reading from channel, socket has likely been closed.
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at scala.util.Either.fold(Either.scala:97)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:415)
at org.apache.spark.streaming.kafka.KafkaUtils$.createDirectStream(KafkaUtils.scala:532)
at org.apache.spark.streaming.kafka.KafkaUtils.createDirectStream(KafkaUtils.scala)
at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:71)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:577)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:174)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
OR
wen i use this:
./bin/spark-submit --master spark://192.168.255.150:7077 --executor-memory 512m --class org.apache.spark.examples.streaming.JavaDirectKafkaWordCount lib/spark-examples-1.4.1-hadoop2.4.0.jar 192.168.255.150:6667 webevent 10
where 6667 is a Kafka’s message producing port, i am getting this error:
16/05/02 15:27:26 INFO SimpleConsumer: Reconnect due to socket error: java.nio.channels.ClosedChannelException
Error: application failed with exception
org.apache.spark.SparkException: java.nio.channels.ClosedChannelException
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
at org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:416)
i dont know if this can help:
./bin/spark-submit --class consumer.kafka.client.Consumer --master spark://192.168.255.150:7077 --executor-memory 1G lib/kafka-spark-consumer-1.0.6.jar 10

Submitting a job to Apache Spark Error

I have the following settings for my Apache Spark instance that runs locally on my machine:
export SPARK_HOME=/Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6
export SPARK_MASTER_IP=127.0.0.1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_LOCAL_DIRS=$SPARK_HOME/work
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=1G
export SPARK_EXECUTOR_INSTANCES=2
export SPARK_DAEMON_MEMORY=384m
I have a spark streaming consumer that I would like to submit to Spark. This streaming consumer is just a jar file that I submit like this:
$SPARK_HOME/bin/spark-submit --class com.my.job.MetricsConsumer --master spark://127.0.0.1:7077 /Users/joe/Sandbox/jaguar/spark-kafka-consumer/target/scala-2.11/spark-kafka-consumer-0.1.0-SNAPAHOT.jar
I get the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/01/13 10:30:06 INFO SparkContext: Running Spark version 1.6.0
16/01/13 10:30:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/01/13 10:30:06 INFO SecurityManager: Changing view acls to: joe
16/01/13 10:30:06 INFO SecurityManager: Changing modify acls to: joe
16/01/13 10:30:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(joe); users with modify permissions: Set(joe)
16/01/13 10:30:07 INFO Utils: Successfully started service 'sparkDriver' on port 65528.
16/01/13 10:30:07 INFO Slf4jLogger: Slf4jLogger started
16/01/13 10:30:08 INFO Remoting: Starting remoting
16/01/13 10:30:08 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem#172.22.0.104:65529]
16/01/13 10:30:08 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 65529.
16/01/13 10:30:08 INFO SparkEnv: Registering MapOutputTracker
16/01/13 10:30:08 INFO SparkEnv: Registering BlockManagerMaster
16/01/13 10:30:08 INFO DiskBlockManager: Created local directory at /Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6/work/blockmgr-cee3388d-ecfc-42a7-a76c-8738401db0c9
16/01/13 10:30:08 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
16/01/13 10:30:08 INFO SparkEnv: Registering OutputCommitCoordinator
16/01/13 10:30:08 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/01/13 10:30:08 INFO SparkUI: Started SparkUI at http://172.22.0.104:4040
16/01/13 10:30:08 INFO HttpFileServer: HTTP File server directory is /Users/joe/Softwares/apache/spark/spark-1.6.0-bin-hadoop2.6/work/spark-10d7d880-7d1d-4234-88d4-d80558c8051a/httpd-40f80936-7508-4b6c-bb90-411aa37d7e93
16/01/13 10:30:08 INFO HttpServer: Starting HTTP Server
16/01/13 10:30:08 INFO Utils: Successfully started service 'HTTP file server' on port 65530.
16/01/13 10:30:09 INFO SparkContext: Added JAR file:/Users/joe/Sandbox/jaguar/spark-kafka-consumer/target/scala-2.11/spark-kafka-consumer-0.1.0-SNAPAHOT.jar at http://172.22.0.104:65530/jars/spark-kafka-consumer-0.1.0-SNAPAHOT.jar with timestamp 1452677409966
16/01/13 10:30:10 INFO AppClient$ClientEndpoint: Connecting to master spark://myhost:7077...
16/01/13 10:30:10 WARN AppClient$ClientEndpoint: Failed to connect to master myhost:7077
java.io.IOException: Failed to connect to myhost:7077
export MAVEN_OPTS="-Xmx512m -XX:MaxPermSize=128m"
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:200)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:183)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:101)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
... 1 more
I have checked my firewall settings and eveything seems to be Ok. Why would I get this error? Any ideas?

Resources