Apache Cassandra integration with RabbitMQ

I want to use MQTT as a communication protocol with the RabbitMQ message broker, but on the RabbitMQ website I found this paragraph:
These implementations are suitable for development but sometimes won't be for production needs. MQTT 3.1 specification does not define consistency or replication requirements for retained message stores, therefore RabbitMQ allows for custom ones to meet the consistency and availability needs of a particular environment. For example, stores based on Riak and Cassandra would be suitable for most production environments as those data stores provide tunable consistency.
https://www.rabbitmq.com/mqtt.html
So, according to this paragraph, I should use Cassandra as a database for RabbitMQ, but I couldn't find anything about integrating Cassandra as a database for RabbitMQ.
Can you help me by pointing me to something that would make this possible?
NB: I'm a newbie with RabbitMQ.

The paragraph refers specifically to the "retained messages" part of the MQTT spec, i.e. the messages you want to keep for a long time, like a "last known configuration" that you may want to apply to any MQTT subscriber, regardless of whether it was online and subscribed at the moment the message was published.
It's a very particular situation, and unless you need that feature you don't have to worry about using RabbitMQ as an MQTT broker. For regular messages, the built-in RabbitMQ replication options are perfectly suitable and production-ready.
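For illustration, here's a minimal sketch of publishing a retained message from Node.js, assuming the mqtt npm package and a broker on localhost (both the broker address and the topic are assumptions, not from the question):

```typescript
import * as mqtt from "mqtt";

// hypothetical broker address, for illustration only
const client = mqtt.connect("mqtt://localhost:1883");

client.on("connect", () => {
  // retain: true asks the broker to keep this message and deliver it to
  // every future subscriber of the topic, even ones not connected right now
  client.publish(
    "devices/42/config",
    JSON.stringify({ mode: "eco" }),
    { retain: true, qos: 1 },
    () => client.end()
  );
});
```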

As of now, RabbitMQ doesn't support this feature,
so it's not possible to use another database in place of the built-in Mnesia database.

Related

Many ordered queues - how to auto-rebalance streams between app instances?

Problem description
I want to deploy a distributed, ordered-queues solution for my project, but I have questions/problems:
Which tool/solution should I use? Which would be the easiest to implement/learn and cost the least in infrastructure? RabbitMQ, Kafka, Redis Streams?
How do I implement auto-rebalancing of topics/streams across consumers when one fails or when a new topic/stream is added to the system?
In other words, I want to realize something like this:
[image: distributed queues]
...but if one of my application instances fails, the other instances should take over all the traffic that is left, with proper distribution (equal load).
Note that my code is written in Node.js v10 (TypeScript) and my infrastructure is based on Azure, so besides self-hosted solutions (like RabbitMQ), Azure-based solutions (like Azure Service Bus) are also possible; but the less vendor lock-in, the better the solution is for me.
My current architecture
Now I provide a more detailed background of my system:
I have 100,000 vehicle tracker devices (different ones, from many manufacturers, speaking many protocols), each of which communicates with one of my custom apps called a decoder. This small microservice decodes and unifies the payload from a tracker and sends it to a distributed queue. Each tracker sends a message every 10-30 seconds.
Note that I must keep the order of messages from a single device; this is very important!
In the next step, I have a processing app microservice which I want to scale (forking / clustering) depending on the number of tracker devices. Each fork of this app should subscribe to some of the topics/consumer groups to process messages from devices while keeping order. Processing each message takes about 1-3 seconds.
Note that at any moment in time I can add or remove tracker devices; this information should auto-propagate to the forks of the processing app, and those instances should be able to automatically rebalance the traffic from the queue.
The question is how to do that with as few lines of (Node.js) code as possible, while keeping the solution easy, clean, and cheap? :)
As you can see in the picture above, if fork no. 3 fails, the system must decide which of the working forks should get the "blue" messages. Also, if fork no. 3 comes back, rebalancing is needed again.
My own research
I read about Apache Kafka with consumer groups, but Kafka is difficult for me to learn and implement.
I read about RabbitMQ and consumer groups / many topics, but I don't know how to write the auto-rebalancing feature, nor how I should configure RabbitMQ (which plugins? which settings / configurations? there are so many options...).
I read about Azure Service Bus with message sessions, but it has vendor lock-in (Azure cloud), it costs a lot, and, like the other solutions, it doesn't provide full auto-rebalancing out of the box.
I read about Redis Streams (with consumer groups), but it's a new feature (there's a lack of libraries for Node.js) and it also doesn't provide auto-rebalancing.
1. Message Broker
For the first question, you should look for a mature M2M protocol broker which will give you freedom in designing your own intelligent data-switching algorithms.
2. Load Balancer
For the second question, you must employ a well-performing load balancer to handle such a huge number (100,000) of connected cars. My suggestion is to use Azure API Gateway or an Nginx load balancer.
Now let's look at some connected-car solutions and analyze how AWS IoT or Azure IoT does the job nicely.
OpenSource IoT Solution
Nginx or an API gateway is used for load-balancing purposes, while the event processing is done in Kafka. Using Kafka you can implement your own rule engine for intelligent data switching (a sketch of a consumer-group setup follows below). Similarly, any message broker acting as an IoT bridge would do. If I were you, I would use VerneMQ to get MQTT v5 features and data routing; in that case a separate queue is not required.
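Here is a minimal consumer-group sketch using the kafkajs npm client (the broker address, topic, and group names are assumptions for illustration):

```typescript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "processing-app", brokers: ["localhost:9092"] });
const consumer = kafka.consumer({ groupId: "tracker-processors" });

async function run(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topic: "tracker-events" });
  // Kafka assigns partitions to the members of the group and automatically
  // rebalances them whenever a consumer joins, leaves, or crashes.
  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      // producing with the device ID as the message key pins each device
      // to one partition, so per-device ordering is preserved
      console.log(partition, message.key?.toString(), message.value?.toString());
    },
  });
}

run().catch(console.error);
```

Because messages with the same key always land in the same partition, keying by device ID gives per-device ordering plus automatic rebalancing, which is essentially the behavior the question asks for.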
Again, if you want to use an Azure queue, you have to concentrate on managing the queue's forking and preemption. To control the queue seamlessly you have to write an Azure Queue-triggered serverless function, so your goal of not being vendor-locked would be impossible to achieve.
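For reference, such a queue-triggered function is quite small in Node.js/TypeScript (a sketch using the @azure/functions types; the queue binding itself is configured in function.json, which is not shown):

```typescript
import { AzureFunction, Context } from "@azure/functions";

// runs once per queue item; scaling and dispatch are handled by the platform
const queueTrigger: AzureFunction = async function (
  context: Context,
  queueItem: string
): Promise<void> {
  context.log("Processing queue item:", queueItem);
};

export default queueTrigger;
```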
In a word, VerneMQ (an MQTT v5 implementation) with Nginx would be great to implement, but since all of these are open-source products, you must be strong in implementation and troubleshooting; otherwise your business operation could end up without support.
It's better to use professional IoT cloud services for a solution with thousands of connected cars. This pays off, as the SLA of such services is of a very high standard and little effort is needed for system operations management.
Azure IoT Solution
If you go with an Azure solution, you would be using IoT Hub, where you don't have to worry about load balancing. Using the Azure device SDK you can connect all the cars (via mobile LTE SIM, OBD plugin, etc.) to the cloud. Then Azure Functions can handle the event processing, and so on.
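A minimal device-side sketch with the Azure IoT device SDK for Node.js (the connection string is a placeholder you would obtain from your IoT Hub device registry):

```typescript
import { Client, Message } from "azure-iot-device";
import { Mqtt } from "azure-iot-device-mqtt";

// placeholder connection string, for illustration only
const connectionString =
  "HostName=<hub>.azure-devices.net;DeviceId=<id>;SharedAccessKey=<key>";
const client = Client.fromConnectionString(connectionString, Mqtt);

// send one telemetry message over MQTT; IoT Hub handles load balancing for you
const telemetry = new Message(JSON.stringify({ speed: 72, fuelLevel: 0.6 }));
client.sendEvent(telemetry, (err) => {
  if (err) console.error("send failed:", err);
  else console.log("telemetry sent");
});
```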
AWS IoT Solution
Like Azure's IoT device SDK, AWS IoT has SDKs for devices; but in this architecture we wanted to complete the connected-car project a little differently. For the sake of synchronizing the thing shadow with the actual device status, we used the AWS Greengrass Core solution on the edge side. Together with serverless IoT event processing, that settled the whole connected-car solution.
Similarly, Azure IoT Edge could be used to feed all the CAN information to the device twin and synchronize the actual car with its twin.
Hope this gives you a clear idea of how to implement it, and of the cost/benefit trade-off between the vendor-locked and unlocked situations.
Thank you.

Hazelcast vs Redis(or RedisLabs) for NodeJS application

I have an application with more than 2 TB of data to be stored in a cache; the data will be accessed via Node.js APIs. For a Node.js app, which would be the better choice, Hazelcast or Redis (or Redis Labs)? Consider the following criteria:
Node.js API support, including connection pooling. It looks like Hazelcast doesn't have a Node.js API.
I understand that in benchmarks Hazelcast is faster due to its multithreaded implementation, and it's scalable as well. But can we effectively leverage these features from Node.js (I need the Set data structure)? Lastly, we can have multiple shards in Redis Labs, which is like having multiple threads or processes working on their respective chunks of data; in that case I believe Hazelcast's multithreading edge would hold against plain Redis but not against Redis Labs. Any comments on this?
The Hazelcast Node.js client does in fact exist and currently provides the following features (a usage sketch follows after the list):
implementation of the Open Client Binary Protocol (Redis, by contrast, uses a text-based protocol)
Map
Get
Put
Remove
Smart Client - clients connect to each cluster node.
Since data partitions are assigned using a well-known, consistent hashing algorithm, each client can send an operation directly to the relevant cluster node, which increases overall throughput and efficiency. The client doesn't need to be restarted when nodes are added to or removed from the cluster.
Distributed Object Listener
Lifecycle Service
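A minimal sketch of the Map operations from Node.js, assuming the hazelcast-client npm package and a cluster member on the default localhost:5701 (exact API details vary slightly between client versions):

```typescript
import { Client } from "hazelcast-client";

async function main(): Promise<void> {
  // connects to localhost:5701 by default when no config is given
  const client = await Client.newHazelcastClient();
  const map = await client.getMap<string, string>("vehicle-state");

  await map.put("device-42", JSON.stringify({ lat: 52.23, lon: 21.01 }));
  console.log(await map.get("device-42"));
  await map.remove("device-42");

  await client.shutdown();
}

main().catch(console.error);
```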
In terms of comparing Hazelcast and Redis server-side features, you can find a comprehensive doc here.
Thank you
Well, I would suggest that if you are doing very complex data handling/processing you should go with Hazelcast; but Node.js is by nature single-threaded, so if you are using the cache just to store key-value pairs, don't go with it.
There is an official API you can use for Node.js + Hazelcast, but with very limited functionality; it only supports key-value usage.
If you are just using the cache as a key-value store, Redis is good, fast, and FREE!, and it can handle huge data sets as well with some extra setup (a naive client-side sharding sketch follows below); take a look at
http://redis.io/topics/partitioning
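To make the partitioning idea concrete, here is a naive client-side sharding sketch with the ioredis package (the node addresses and the hash function are illustrative assumptions; real setups would use Redis Cluster or consistent hashing):

```typescript
import Redis from "ioredis";

// two hypothetical shard nodes; in production these would be separate hosts
const shards = [new Redis(6379, "127.0.0.1"), new Redis(6380, "127.0.0.1")];

// naive partitioning: hash the key and pick a shard; note that adding or
// removing a shard remaps most keys, which is why real deployments prefer
// consistent hashing
function shardFor(key: string): Redis.Redis {
  let hash = 0;
  for (const ch of key) hash = (hash * 31 + ch.charCodeAt(0)) | 0;
  return shards[Math.abs(hash) % shards.length];
}

async function main(): Promise<void> {
  await shardFor("device-42").set("device-42", "some value");
  console.log(await shardFor("device-42").get("device-42"));
}

main().catch(console.error);
```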
In terms of support pricing, Redis Labs is less costly, and if you use Redisson with Redis it can give you all the data structures Hazelcast has :)
BitSet, Set, Multimap, SortedSet, Map, List, Queue, BlockingQueue, Deque, BlockingDeque, Semaphore, Lock, AtomicLong, CountDownLatch, Publish / Subscribe, Bloom filter, Remote service, Spring cache, Executor service, Live Object service, Scheduler service
Redis Labs has a much bigger user base and more contributors; Hazelcast is a bit smaller if you compare users. So if your data is 2 TB and it's just key-value, Redis would be best.

Managing RabbitMQ

I'm new to RabbitMQ and MQs in general. I'm using the rabbit.js Node.js module to interface with RabbitMQ, so my application layer is going to be mainly in Node.js. What I'm wondering is: how do I manage RabbitMQ? How can I see everything that's going on with it, from what messages are left in a queue to general configuration and administration?
I'm looking for something visual, but more importantly, easy to use and simple.
RabbitMQ has a web interface (part of the rabbitmq_management plugin which ships with RabbitMQ, but needs to be enabled) that allows you to see the servers, exchanges, queues, etc.
It's pretty easy to use. One thing I would recommend is to set the time interval on the graphs to 10 minutes; I find that if you set them to longer, say an hour or more, the information gets a bit wonky (due to the way it's bucketed, I think).
Check out this link for more info: https://www.rabbitmq.com/management.html
There is also a JSON API that can be used to programmatically determine, for example, how many items are in a particular queue.
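For example, a plain HTTP GET against the management API returns queue statistics as JSON. A sketch assuming the default guest/guest credentials and a hypothetical queue named my-queue in the default vhost:

```typescript
import * as http from "http";

// %2F is the URL-encoded name of the default vhost "/"
http.get(
  {
    host: "localhost",
    port: 15672,
    path: "/api/queues/%2F/my-queue",
    auth: "guest:guest", // default credentials; change these in production
  },
  (res) => {
    let body = "";
    res.on("data", (chunk) => (body += chunk));
    res.on("end", () => {
      const queue = JSON.parse(body);
      console.log(`ready: ${queue.messages_ready}, total: ${queue.messages}`);
    });
  }
);
```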
There's also a command-line tool called rabbitmqadmin (https://www.rabbitmq.com/management-cli.html), which can come in really handy for things like setting up RabbitMQ test environments via a bash script and things of that nature.
Check out the JXM.io sources (an open-source messaging backend for Node.js / JXcore) that use RabbitMQ for multi-server integration; there is also a nice article showing how to cluster RabbitMQ: http://jxm.io/multi-server-messaging-backend-installation/

Which one to choose: STOMP or AMQP?

I am using Node.js as a client to a JMS topic. There are two protocols available to make the connection to the topic: STOMP and AMQP. I read briefly about them at http://en.wikipedia.org/wiki/Advanced_Message_Queuing_Protocol and http://en.wikipedia.org/wiki/Streaming_Text_Oriented_Messaging_Protocol. Both seem to be wire-level protocols, i.e. the data is sent across the network as a stream of octets. I can't find any concrete reason why one should be preferred. If someone can shed light on this, it would be helpful.
Another point: both protocols take pride in calling themselves interoperable. Does "interoperable" mean that if someone wants to swap out a specific message broker implementation, say Apache ActiveMQ, and plug in WebSphere MQ instead, the transition will be smooth (provided both support AMQP/STOMP or some other wire-level protocol)?
You may see a difference in performance (refer to this benchmark), based upon many factors including message size and persistence requirements for the queue entries.
As is often the case, there are other factors to consider as well, especially if your message size/count/etc. doesn't produce a clear winner in terms of performance and neither protocol meets your functional requirements in a way the other does not.
This article in particular hints that there may be more fragmentation among the different STOMP broker implementations. Quoting from that article:
STOMP...uses a SEND semantic with a “destination” string. The broker must map onto something that it understands internally such as a topic, queue, or exchange. Consumers then SUBSCRIBE to those destinations. Since those destinations are not mandated in the specification, different brokers may support different flavours of destination. So, it’s not always straightforward to port code between brokers.
At least with AMQP (which touts interoperability as one of its most important advantages) the only issues you should have with switching providers/languages are those inherent in setting up said new providers. For example, I've read ZeroMQ is likely to take more configuration work on your part than RabbitMQ, but that's not really due to any attributes specific to AMQP.
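To see what that explicitness looks like in practice, here is a minimal AMQP round trip from Node.js using the amqplib package (the broker URL and queue name are assumptions):

```typescript
import * as amqp from "amqplib";

async function main(): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();

  // AMQP makes the broker's model explicit: the queue is declared by name,
  // so the same code ports across AMQP brokers without remapping "destinations"
  await ch.assertQueue("demo-queue");
  ch.sendToQueue("demo-queue", Buffer.from("hello"));

  await ch.consume("demo-queue", (msg) => {
    if (msg) {
      console.log(msg.content.toString());
      ch.ack(msg);
    }
  });
}

main().catch(console.error);
```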

Messaging bus + event storage + PubSub

I'm looking at building an application which has many data sources, each of which put events into my system. Events have a well defined data structure and could be encoded using JSON or XML.
I would like to be able to guarantee that events are saved persistently, and that the events are used as a part of a publish/subscribe bus with multiple subscribers possible per event.
For the database, availability is very important even as it scales to multiple nodes, and partition tolerance is important so that I can scale the number of places which can store my events. Eventual consistency is good enough for me.
I was thinking of using a JMS enterprise messaging bus (e.g. Mule) or an AMQP enterprise messaging bus (such as RabbitMQ or ZeroMQ).
But for my application, it seems that if I could set up a publish/subscribe system with CouchDB or something similar, it would solve my problem without having to integrate an enterprise messaging bus and a persistent storage system.
Which would work better: CouchDB + scaling + load balancing + some kind of pub/sub mechanism, or an explicit pub/sub messaging system with attached eventually-consistent, available, partition-tolerant storage? Which one is easier to set up, administer, and operate? Which solution will have higher throughput for a given cost? Why?
Also, are there any more questions I should ask before selecting my technologies? (BTW, Java is the server-side and client-side language).
I am using a CouchDB message queue in production. (It is not pub/sub, so I do not consider this answer complete.)
Currently (June 2011), CouchDB has huge potential as a messaging substrate:
Good data persistence
Well-poised for clustering (on a LAN, using BigCouch or Lounge)
Well-poised for distribution (between data centers, world-wide)
Good platform. Despite the shortcomings listed below, I love CouchDB because I can re-use my DB and it works from Erlang, NodeJS, and every web browser.
The _changes query
Continuous feeds, instant delivery without polling
Network going down is no problem; just retry later from the previous position (see the sketch after this list)
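A rough sketch of following the continuous feed over plain HTTP (the database name and port are assumptions; a production consumer would also buffer partial lines across chunks):

```typescript
import * as http from "http";

let since: string | number = 0; // persist this to resume after restarts

function follow(): void {
  const path = `/events/_changes?feed=continuous&include_docs=true&since=${since}`;
  http.get({ host: "localhost", port: 5984, path }, (res) => {
    res.on("data", (chunk: Buffer) => {
      // the continuous feed emits one JSON object per line
      for (const line of chunk.toString().split("\n")) {
        if (!line.trim()) continue; // blank lines are heartbeats
        const change = JSON.parse(line);
        since = change.seq; // remember our position in the feed
        console.log("changed doc:", change.id);
      }
    });
    // if the connection drops, reconnect from the last seen sequence
    res.on("end", () => setTimeout(follow, 1000));
  });
}

follow();
```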
Still, even a low-volume message system in CouchDB requires careful planning and maintenance. CouchDB is potentially a great messaging server. (It is inspired by Lotus Notes, which handles high email volume.)
However, these are the challenges with CouchDB:
Append-only database files grow fast
Be mindful about disk capacity
Be mindful about disk i/o. Compaction will read and re-write all live documents
Deleted documents are not really deleted. They are marked deleted=true and kept forever, even after compaction! This is in fact uniquely good about CouchDB, because the deleted action will propagate through the cluster, even if the network goes down for a time.
Propagating (replicating) deletes is great, but what about the buildup of deleted docs? Eventually it will outstrip everything else. The solution is to purge them, which actually removes them from disk. Unfortunately, if you do 2 or more purges before querying a map/reduce view, the view will completely rebuild itself. That may take too much time, depending on your needs.
As usual, we hear NoSQL databases shouting "free lunch!", "free lunch!" while CouchDB says "you are going to have to work for this."
Unfortunately, unless you have compelling pressure to re-use CouchDB, I would use a dedicated messaging platform. I had a good experience with ejabberd as a messaging platform and for communicating to/from Google App Engine.
I think that the best solution would be CouchDB + Jabber/XMPP server (ejabberd) + book: http://professionalxmpp.com
JSON is the natural storing mechanism for CouchDB
Jabber/XMPP server includes pubsub support
The book is a must read
While you can use a database as an alternative to a message queueing system, no database is a message queueing system, not even CouchDB. A message queueing system like an AMQP broker provides more than just persistence of messages; in fact, with RabbitMQ, persistence is just an invisible service under the hood that takes care of all the challenges you would otherwise have to deal with yourself on CouchDB.
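As an illustration of how little of that machinery is visible to the application, here is what opting into persistence looks like with the amqplib Node.js client (the queue name is an assumption):

```typescript
import * as amqp from "amqplib";

async function main(): Promise<void> {
  const conn = await amqp.connect("amqp://localhost");
  const ch = await conn.createChannel();

  // durable: the queue definition survives a broker restart
  await ch.assertQueue("events", { durable: true });

  // persistent: the broker writes the message to disk; journaling and
  // recovery are handled entirely by RabbitMQ, not by application code
  ch.sendToQueue("events", Buffer.from(JSON.stringify({ type: "created" })), {
    persistent: true,
  });

  await ch.close();
  await conn.close();
}

main().catch(console.error);
```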
Take a good look at the RabbitMQ website where there is lots of information about AMQP and how to make use of it. They have done a great job of collecting together articles and blogs about message queueing.
