Is there any API/ way to integrate Spark Streaming with JMS. I am able to integrate with Kafka and Sockets but to integrate with Jms queue or topic I am unable to.
I think you should try calling reciever api in spark. You need to create custom receiver
http://spark.apache.org/docs/latest/streaming-custom-receivers.html
Also check rely from tathagat das who is spark contributor from
www.apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-JMS-td5371.html
If you need help in detail let me know
I know this is an old post . Since I am working on something similar. You can use spark jms receiver
Spark JMS receiver
Related
We use Apache Nifi to get data from multiple sources like Twitter and Reddit in specific interval (for example 30s). Then we would like to send it to Apache Kafka and probably it should somehow group both Twitter and Reddit messages into 1 topic so that Spark would always receive data from both sources for given interval at once.
Is there any way to do that?
#Sebastian What you describe is basic NiFI routing. You would just route both Twitter and Redis to the same downstream Kafka Producer and same Topic. After you get data into NiFi from each service, you should run it to UpdateAttribute and set attribute topicName to what you want for each source. If there are additional steps per Data Source do them after Update Attribute and before PublishKafka.
If you code all the upstream routes as above, you could route all the different Data Sources to PublishKafka processor using ${topicName} dynamically.
I'm working with NodeJs MS, so far they communicate through Kafka Consumer/Producer. Now I need to buiid a Loggger MS which must record all the messages and do some processing (parse and save to db), but I'm not sure if the current approach could be improved using Kafka Stream or if I should continue using Consumers
The Streams API is a higher level abstraction that sits on top of the Consumer/Producer APIs. The Streams API allows you to filter and transform messages, and build a topology of processing steps.
For what you're describing, if you're just picking up a messages and doing a single processing step, the Consumer API is probably fine. That said, you could do the same thing with the Streams API too and not use the other features.
buiid a Loggger MS which must record all the messages and do some processing (parse and save to db)
I would suggest using something like Streams API or Nodejs Producer + Consumer to parse and write back to Kafka.
From your parsed/filtered/sanitized messages, you can run a Kafka Connect cluster to sink your data into a DB
could be improved using Kafka Stream or if I should continue using Consumers
Ultimately, depends what you need. The peek and foreach methods of Streams DSL are functionally equivalent to a Consumer
Is there a way to send some custom messages from Executor to Driver In Apache Spark? It is quite evident from driver and executor logs that there is a a lot of framework level communication is happening however I did not find any API to send custom messages between processes. Please advise.
I think you can use Accumulators!
read about it: https://www.tutorialspoint.com/apache_spark/advanced_spark_programming.htm
You can not send messages, but you can aggregate data at executors and use it at the driver after the parallel computation ends.
I might need to work with Kafka and I am absolutely new to it. I understand that there are Kafka producers which will publish the logs(called events or messages or records in Kafka) to the Kafka topics.
I will need to work on reading from Kafka topics via consumer. Do I need to set up consumer API first then I can stream using SparkStreaming Context(PySpark) or I can directly use KafkaUtils module to read from kafka topics?
In case I need to setup the Kafka consumer application, how do I do that? Please can you share links to right docs.
Thanks in Advance!!
Spark provide internal kafka stream in which u dont need to create custom consumer there is 2 approach to connect with kafka 1 with receiver 2. direct approach.
For more detail go through this link http://spark.apache.org/docs/latest/streaming-kafka-integration.html
There's no need to set up kafka consumer application,Spark itself creates a consumer with 2 approaches. One is Reciever Based Approach which uses KafkaUtils class and other is Direct Approach which uses CreateDirectStream Method.
Somehow, in any case of failure ion Spark streaming,there's no loss of data, it starts from the offset of data where you left.
For more details,use this link: http://spark.apache.org/docs/latest/streaming-kafka-integration.html
We previously used to have a Spring Integration flow (XML configuration-based) where we would do an update in a database after sending a message to a JMS queue. To achieve this, the SI flow was configured with a publish-subscribe queue channel as an input to a JMS Outbound Channel Adapter (order 0) and a Service Activator (order 1). The idea here being that after a successful JMS send, the service activator would be called thus, updating the data in the database.
We are now in the process of updating our flows to work with spring-integration:4.0.x APIs and wanted to use this opportunity to see if the described flow pattern is still a good/recommended way of doing a database update after a successful JMS send or if there is now a simpler/better way of achieving this? As a side note, our flows are now being implemented using spring-integration-java-dsl:1.0.0.M3 APIs.
Thanks in advance for any input on this,
PM.
publish-subscribe queue channel
There's no such thing as a pub-sub queue channel; by definition, it's a subscribable channel; so I assume that's what you mean.
It is one of the ways to do what you need, and perfectly fine; you can also achieve the same result with a RecipientListRouter. The dsl syntax is quite nice, especially with Java 8; see the SpringOne demo app for an example.