Version of Kafka Connector For Use in Spark Streaming

The latest version of Kafka available for download is Kafka 2.1.0. But to use Kafka in Spark Streaming or Spark Structured Streaming, we use the following connectors, respectively:
spark-streaming-kafka-0-10_2.11
spark-sql-kafka-0-10_2.11
My question is that the connectors seem to be for Kafka version 0.10.0.0, since their names include 0-10. Is there something I don't understand here, or are we really using connectors built for much older versions of Kafka?

For Spark Structured Streaming 2.4, Kafka client 2.0 is used.
0-10 means the connector is compatible with Kafka brokers on version 0.10 or above.
You can check this in the pom.xml of the Spark project: https://github.com/apache/spark/blob/branch-2.4/external/kafka-0-10-sql/pom.xml#L33
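For example, pulling the Structured Streaming connector into an sbt build for Spark 2.4 might look like this (a minimal sketch; 2.4.5 is an assumed Spark version, and %% appends your Scala version, e.g. _2.11):

```scala
// build.sbt — a sketch; match the version to your Spark release
libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.4.5"
```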

Related

Spark Streaming with Spark 2 and Kafka 2.1

I'm upgrading a Java project from Cloudera 5.10 to Cloudera 6.2. We have Spark Streaming reading data from Kafka, processing it, and writing the results elsewhere. During the upgrade, Spark goes from v1.6 to v2.1, and Kafka from v0.8 to v2.1.
To do the stream processing, we were connecting to Kafka using KafkaUtils.createStream(...), but that method is no longer available in the newer Kafka integration modules. However, I can't seem to find any Spark Streaming + Kafka example or documentation in Java that doesn't use this method.
Is there something I'm missing? What is the best way to connect both worlds in these versions?
The module was renamed to spark-streaming-kafka-0-10
https://mvnrepository.com/artifact/org.apache.spark/spark-streaming-kafka-0-10
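A minimal sketch of the replacement API from that module (the broker address, group id, and topic name here are placeholders):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

val conf = new SparkConf().setAppName("kafka-0-10-direct-stream")
val ssc = new StreamingContext(conf, Seconds(5))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",           // placeholder broker
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                     // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// createStream is gone; the 0-10 module only offers the direct stream
val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("my-topic"), kafkaParams) // placeholder topic
)

stream.map(record => (record.key, record.value)).print()
ssc.start()
ssc.awaitTermination()
```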
However, you should consider using Structured Streaming instead.

Fetch kafka headers in spark 2.4.X

How to get Kafka header fields (which were introduced in Kafka 0.11+) in Spark Structured Streaming?
I see the headers implementation is added in Spark 3.0 but not in 2.4.5.
And I see that, by default, spark-sql-kafka-0-10 uses kafka-clients 2.0.
If it is not possible to read Kafka headers using Spark then can you suggest any alternative?
I haven't found a way to do it in Spark 2.x. You can use a Kafka Connect SMT (Single Message Transform) if the use case is simple.
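For reference, once you are on Spark 3.0 the headers can be requested from the Kafka source via the includeHeaders option; a minimal sketch (broker and topic are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("kafka-headers").getOrCreate()

// Spark 3.0+ only: includeHeaders adds a headers column to the source schema
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "my-topic")                     // placeholder topic
  .option("includeHeaders", "true")
  .load()

// headers is an array of structs with a string key and a binary value
df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")
```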

Can I use spark 2.3.0 and pyspark to do stream processing from Kafka?

I am going to do stream processing with pyspark and use Kafka as a data source.
I see that the Kafka 0.10 connector is not supported in the Spark Python API.
Can I use the Kafka 0.8 connector in Spark 2.3.0 even though it is deprecated?
It's deprecated, but not deleted. You can use it.
However, you may be interested in Structured Streaming, which does have Kafka 0.10 support in Python. This is the new streaming API in Spark, and it will replace DStreams.
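As a sketch of what the Structured Streaming route looks like (Scala shown here; the PySpark calls follow the same readStream / format("kafka") shape, and the broker and topic names are placeholders):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("structured-kafka").getOrCreate()

// Requires spark-sql-kafka-0-10 on the classpath
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("subscribe", "my-topic")                     // placeholder topic
  .load()

// Kafka delivers key and value as binary; cast before use
val query = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination()
```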

Can I use spark streaming 1.5.1 with kafka 0.10.0?

Can I use Spark Streaming 1.5.1 with Kafka 0.10.0?
The spark.apache.org site says that Spark Streaming works with Kafka 0.8.2.
I just want to know what happens if I use Kafka 0.10.0 with Spark 1.5.1.
Can someone please help?
I think it's possible, but there are some limitations. For instance, Apache Spark 1.5.x does not support features like streaming in a Kerberized environment.
The spark-streaming-kafka-0-10 API is still experimental.
You can use it, but you might get some surprises, as it is not yet stable.
Please follow the link below:
https://github.com/jerryshao/spark-streaming-kafka-0-10-connector
This is a Kafka 0.10 connector for Spark 1.x Streaming. Hope it will work.

spark streaming + kafka compatibility issue

Will Spark Streaming be compatible with Kafka versions above 0.8.2.1?
Is writing a custom receiver the only option to make Spark Streaming use a Kafka version above 0.9?
I just added inter.broker.protocol.version=CURRENT_KAFKA_VERSION (e.g. 0.8.2 or 0.9.0.0) to the server.properties file. That lets the old 0.8.2.1 consumer receive data from newer versions of the Kafka brokers.
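Concretely, that broker-side setting goes in server.properties (the version value here is an example; use whatever protocol version your old clients speak):

```properties
# server.properties on each broker — pin the protocol version during the upgrade
inter.broker.protocol.version=0.8.2
```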
