Is there an official Apache Spark Streaming connector for RabbitMQ? If not, is there an alternative way to do it without using external libraries?
Related
I want to create multiple Kafka topics at runtime in my Spark Structured Streaming application. I found that the Java API offers various methods for this, but I couldn't find any in Spark Structured Streaming.
Please let me know whether there is a way to do this, or whether I need to use the Java library.
My Apache Spark version is 2.4.4 and the Kafka library dependency is spark-sql-kafka-0-10_2.12.
AFAIK, Spark doesn't create topics.
You can call the same Java APIs you've already found, and do so before initializing your SparkSession.
spark-sql-kafka includes kafka-clients, so you have the AdminClient class available.
How to create a Topic in Kafka through Java
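If the driver happens to be written in Python rather than Java or Scala, the same idea (create the topics first, then start Spark) could be sketched as below. This is a minimal sketch and not the AdminClient from kafka-clients mentioned in the answer: it assumes the third-party kafka-python package is installed, and the function names `build_topic_specs` and `create_topics` are hypothetical helpers made up for illustration.

```python
def build_topic_specs(names, num_partitions=1, replication_factor=1):
    """Return one plain-dict spec per unique topic name, preserving order."""
    seen = set()
    specs = []
    for name in names:
        if name in seen:
            continue  # skip duplicates so the same topic is not requested twice
        seen.add(name)
        specs.append({
            "name": name,
            "num_partitions": num_partitions,
            "replication_factor": replication_factor,
        })
    return specs


def create_topics(bootstrap_servers, names, num_partitions=1, replication_factor=1):
    """Create the topics on the cluster; call this before building the SparkSession."""
    # Assumption: kafka-python is installed. Imported lazily so that
    # build_topic_specs stays usable without it.
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        for spec in build_topic_specs(names, num_partitions, replication_factor):
            try:
                admin.create_topics(new_topics=[NewTopic(**spec)])
            except TopicAlreadyExistsError:
                pass  # topic is already there, nothing to do
    finally:
        admin.close()
```

A call such as `create_topics("localhost:9092", ["orders", "payments"], num_partitions=3)` would then run once at startup; the broker address and topic names here are placeholders.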
I am trying to build a connector from Apache Cassandra to Apache Ignite. Basically, I want to write all the new incoming data from Cassandra to Ignite. Is there any connector or something which can be helpful?
N.B - Stream data from Cassandra to Ignite
Ignite provides such integration out of the box, see here: https://apacheignite-mix.readme.io/docs/ignite-with-apache-cassandra
I want to use Spark Structured Streaming to aggregate data that is consumed from RabbitMQ.
I know there is an official Spark Structured Streaming integration with Apache Kafka, and I was wondering whether a similar integration exists for RabbitMQ.
Since I'm not able to switch the existing messaging system (RabbitMQ), I thought of using Kafka Connect to move the data between the messaging systems (RabbitMQ to Kafka) and then using Spark Structured Streaming.
Does anyone know a better solution?
This custom RabbitMQ receiver seems to be available if you're open to exploring Spark Streaming rather than Structured Streaming.
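For the Kafka Connect route described in the question, the Spark side might look roughly like the sketch below. It assumes the RabbitMQ messages already land in a Kafka topic, that the spark-sql-kafka-0-10 package is on the classpath, and that counting identical message bodies per one-minute window is an acceptable stand-in for the real aggregation. The helper names `kafka_source_options` and `start_windowed_counts` are hypothetical.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Build the option map for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_windowed_counts(spark, bootstrap_servers, topic):
    """Count identical message bodies per one-minute window and print them."""
    # Imported lazily so the option helper works without PySpark installed.
    from pyspark.sql.functions import col, window

    messages = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
        # Kafka values arrive as bytes; cast them to strings for grouping.
        .select(col("value").cast("string").alias("body"), col("timestamp"))
    )
    counts = messages.groupBy(window(col("timestamp"), "1 minute"), col("body")).count()
    # Console sink is for experimentation only; the returned query keeps running.
    return counts.writeStream.outputMode("complete").format("console").start()
```

Calling `start_windowed_counts(spark, "localhost:9092", "rabbit-events").awaitTermination()` would keep the job alive; the broker address and topic name are placeholders.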
I am looking for a way to load streaming data from Kafka directly into HDFS using Spark Streaming, without using Flume.
I have already tried it using Flume (Kafka source and HDFS sink).
Thanks in Advance!
There is an HDFS connector for Kafka Connect. Confluent's documentation has more information.
This is a pretty basic function of Spark Streaming. Look at the Spark Streaming Kafka integration documentation that matches the versions of Spark and Kafka you are using. Saving to HDFS is as easy as rdd.saveAsTextFile("hdfs:///directory/filename"), as long as each batch is written to a path that does not exist yet.
Spark/Kafka integration guide for latest versions
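As a rough illustration of the Spark Streaming approach, the sketch below reads from Kafka with the direct stream and writes each non-empty micro-batch to its own HDFS directory. It assumes Spark 2.4 or earlier with the spark-streaming-kafka-0-8 integration available, since `pyspark.streaming.kafka` was removed in Spark 3.0. The helper names `batch_output_path` and `start_kafka_to_hdfs` are hypothetical.

```python
def batch_output_path(base_dir, batch_label):
    """One directory per micro-batch, because saveAsTextFile will not overwrite."""
    return "{}/batch-{}".format(base_dir.rstrip("/"), batch_label)


def start_kafka_to_hdfs(sc, brokers, topic, base_dir, batch_seconds=10):
    """Start a streaming job that copies Kafka message values into HDFS text files."""
    # Imported lazily so the path helper works without PySpark installed.
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    ssc = StreamingContext(sc, batch_seconds)
    stream = KafkaUtils.createDirectStream(
        ssc, [topic], {"metadata.broker.list": brokers}
    )

    def save(batch_time, rdd):
        # PySpark passes the batch time as a datetime; skip empty batches
        # to avoid creating directories that contain no data.
        if not rdd.isEmpty():
            label = batch_time.strftime("%Y%m%d-%H%M%S")
            # Each record is a (key, value) pair; keep only the value.
            rdd.map(lambda kv: kv[1]).saveAsTextFile(batch_output_path(base_dir, label))

    stream.foreachRDD(save)
    ssc.start()
    return ssc
```

Something like `start_kafka_to_hdfs(sc, "localhost:9092", "events", "hdfs:///data/events").awaitTermination()` would run it; the broker, topic, and directory are placeholders.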
I've integrated Kafka and Spark Streaming after downloading them from the Apache website. However, I wanted to use DataStax for my big data solution, and I saw that you can easily integrate Cassandra and Spark.
But I can't see any Kafka modules in the latest version of DataStax Enterprise. How do I integrate Kafka with Spark Streaming here?
What I want to do is basically:
Start necessary brokers and servers
Start kafka producer
Start kafka consumer
Connect spark streaming to kafka broker and receive the messages from there
However, after a quick Google search, I can't find any indication that Kafka has been incorporated into DataStax Enterprise.
How can I achieve this? I'm really new to DataStax and Kafka, so I need some advice. Language preference: Python.
Thanks!
Good question. DSE does not incorporate Kafka out of the box; you must set up Kafka yourself and then set up your Spark Streaming job to read from Kafka. Since DSE does bundle Spark, use DSE Spark to run your Spark Streaming job.
You can use either the direct Kafka API or Kafka receivers; more details here on the tradeoffs. TL;DR: the direct API does not require a WAL or ZooKeeper for HA.
Here is an example of how you can configure Kafka to work with DSE by Cary Bourgeois:
https://github.com/CaryBourgeois/DSE-Spark-Streaming/tree/master