Nodejs Cluster with MQTT client [closed]

I have a Node.js engine that uses MQTT to subscribe to messages from many IoT devices. As the number of IoT devices has increased, I want to run the engine in cluster mode, but this results in every cluster worker receiving all the MQTT messages. Is there a way to avoid this, so that each MQTT message is received only once and the load of servicing the messages is distributed evenly?
Setup:
Nodejs Engine with MQTT client running via pm2 in an EC2 instance.
MQTT broker running in another EC2 instance.

You need to use a broker that supports shared subscriptions.
This allows multiple clients to connect to the broker and all subscribe to the same topic(s), and the broker will deliver each message to only one client in the group, in a round-robin fashion.
Shared subscriptions are an optional part of the MQTT v5 spec, and some brokers have non-standard implementations of them in their MQTT v3 brokers.
You can read more about shared subscriptions here
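For illustration, a minimal sketch of a shared subscription with the MQTT.js client; the broker URL, the group name ("workers"), and the topic are placeholder assumptions:

```js
// A minimal sketch with the MQTT.js client (npm package "mqtt"); the broker
// URL, group name, and topic filter are illustrative assumptions.
const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://broker.example.com', {
  protocolVersion: 5, // shared subscriptions require MQTT v5
});

client.on('connect', () => {
  // "$share/<group>/<topic>": the broker delivers each message matching
  // "devices/+/telemetry" to only ONE subscriber in the "workers" group.
  client.subscribe('$share/workers/devices/+/telemetry', { qos: 1 });
});

client.on('message', (topic, payload) => {
  console.log(`handled by pid ${process.pid}: ${topic} ${payload}`);
});
```

Every pm2/cluster worker would run this same code; because they all subscribe under the same $share group, the broker round-robins each message to exactly one of them.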

Related

Can an Azure VM handle the traffic of 100K or more devices if I use an Azure VM as a TCP socket server? [closed]

I have written a TCP socket program through which OBD-II devices connect, and now I want to host my TCP socket code on an Azure VM. This is my first time using Azure, so two questions arise:
Will an Azure VM be able to handle 100K devices or more?
If not, which other Azure services should I use for this kind of problem?
Please help me figure it out.
Thanks
For TCP listeners, you should use Azure VMs. In the past it was also possible to work with Worker Roles (Cloud Services), but those are now deprecated. In terms of the workload, we can't tell whether a single VM is enough, but you should have more than one for high-availability reasons.
In your shoes, I would do a proof of concept with two virtual machines, then run a load test to check how it behaves, and add more VMs if needed.
PS: Try to use Azure Virtual Machine Scale Sets, as they will help you scale the architecture when/if needed.
According to the VM network throughput documentation on GitHub, Azure VMs support up to 500,000 inbound and 500,000 outbound flows. That's just from a networking perspective.
The next question is: can your server application handle the load induced by that many connections? That would seem to be a function of the number of concurrent requests and the processing resources required to handle them.
So, the answer to question 1 above is "definitely maybe". You would need to benchmark in order to answer it definitively.
If the answer you determine for your load pattern and application is "no", the answer to question 2 might be to spin up multiple VMs, which you could do in a Virtual Machine Scale Set or by spinning them up manually, and then put an Azure Load Balancer in front of them.
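For that benchmark, a minimal sketch of a Node.js TCP listener that echoes data and reports its concurrent connection count; the port and logging interval are arbitrary choices, not anything Azure-specific:

```js
// A minimal TCP listener sketch for load testing. The port number and
// logging interval are illustrative assumptions.
const net = require('net');

let connections = 0;

const server = net.createServer((socket) => {
  connections++;
  socket.on('close', () => { connections--; });
  socket.on('error', () => {}); // ignore resets from dropped devices
  socket.on('data', (chunk) => {
    // echo back so a load generator can measure round-trip latency
    socket.write(chunk);
  });
});

server.listen(5000, () => console.log('listening on :5000'));
setInterval(() => console.log(`open connections: ${connections}`), 10000);
```

Pointing a load generator at this while watching CPU, memory, and the connection count gives a concrete per-VM capacity figure to plan the scale set around.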

Design suggestion for RabbitMQ Producer & Consumer [closed]

I have a RabbitMQ design question. Let's say I have 6 exchanges and 10 queues, split up as below:
5 exchanges of type 'fanout', with 5 queues bound to them
1 exchange of type 'topic', routed to 5 different queues based on the routing key.
I have a microservice application which runs in Kubernetes at a scale of 25, and each instance acquires 1 RabbitMQ connection per process, so 25 RabbitMQ connections act as producers.
I have another application which also runs in Kubernetes at a scale of 1 and acquires 1 RabbitMQ connection, so 1 RabbitMQ connection acts as the consumer.
Numbers: let's say every exchange gets 100k messages per day.
Tech stack: Node.js + amqplib
Questions:
How many channels should the producer create for publishing messages to the exchanges?
How many channels should the consumer create for consuming messages from the queues?
Is it a good approach to have one application act as a consumer that consumes messages from all the queues?
How can I scale the consumers automatically based on the queue size in Kubernetes?
Is it possible to build priorities into consumers? Let's say that under heavy load I would like the consumers to stop consuming from a couple of queues and focus all resources on the rest of the queues.
How many connections should the producer and consumer create?
Thanks in advance :)
Semantically, there will be publishing and consuming components in your system. Each should use its own channel, primarily because error reporting and handling are channel-scoped; for example:
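A minimal sketch with amqplib (which the question already uses): one connection per process, with separate channels for publishing and consuming. The exchange name, queue name, routing keys, and prefetch value are illustrative assumptions:

```js
// One connection per process, one channel per role. Exchange/queue names
// and the prefetch value are placeholders.
const amqp = require('amqplib');

async function main() {
  const conn = await amqp.connect('amqp://localhost');

  // Publisher channel: used only for publishing.
  const pubCh = await conn.createChannel();
  await pubCh.assertExchange('events', 'topic', { durable: true });
  pubCh.publish('events', 'device.created', Buffer.from('{"id":1}'));

  // Consumer channel: its own channel, so a consume error does not
  // tear down the publisher, and prefetch applies to it alone.
  const subCh = await conn.createChannel();
  await subCh.assertQueue('events.audit', { durable: true });
  await subCh.bindQueue('events.audit', 'events', 'device.*');
  await subCh.prefetch(50); // cap unacknowledged deliveries per consumer
  await subCh.consume('events.audit', (msg) => {
    if (msg) {
      console.log(msg.content.toString());
      subCh.ack(msg);
    }
  });
}

main().catch(console.error);
```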
Whether a single application should consume from "all" queues depends entirely on how you structure your services.
The same goes for controlling which consumers consume from which queues. Usually queues and consumers have semantic "types" and serve certain purposes.
Simply adding more consumers and increasing prefetch will only work up to a point; a single queue has a realistic throughput limit.
Scaling application instances based on queue length (messages in the Ready state, specifically) involves monitoring individual queue metrics. That only works with a small number of queues (with, say, 100K queues, collecting all metrics from all of them becomes really expensive).
A small application that monitors the metrics of an individual queue (or the totals) and updates the number of replicas of an app in a deployment using the Kubernetes API should do; a sketch follows below.
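A crude sketch of such a monitor, assuming Node 18+ (for the global fetch), a reachable RabbitMQ management API, and kubectl configured in the monitor's environment; the URL, credentials, queue and deployment names, and the messages-per-replica target are all assumptions for illustration:

```js
// Polls the RabbitMQ management API for ready messages and shells out to
// kubectl to resize a Deployment. All names, URLs, credentials, and
// thresholds below are illustrative assumptions.
const { execSync } = require('child_process');

const MGMT_URL = 'http://rabbitmq:15672/api/queues/%2F/events.audit';
const AUTH = 'Basic ' + Buffer.from('guest:guest').toString('base64');
const PER_REPLICA = 1000; // target ready messages per consumer replica

async function tick() {
  const res = await fetch(MGMT_URL, { headers: { Authorization: AUTH } });
  const queue = await res.json();
  const ready = queue.messages_ready || 0;
  // Clamp between 1 and 10 replicas to avoid flapping to zero or runaway scale.
  const replicas = Math.min(10, Math.max(1, Math.ceil(ready / PER_REPLICA)));
  execSync(`kubectl scale deployment consumer --replicas=${replicas}`);
  console.log(`ready=${ready} -> replicas=${replicas}`);
}

setInterval(() => tick().catch(console.error), 30000);
```

In production you would more likely use the Kubernetes API client (or an off-the-shelf queue-based autoscaler) rather than shelling out, but the control loop is the same: read queue depth, compute a replica count, apply it.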

How do you determine the connection pool size of e.g. a Node.js app, based on the connection limit? [closed]

Both apps are hosted in AWS Elastic Beanstalk and share the same code base. Should I:
divide the connection limit across apps, e.g. with two instances, split it 50/50 between them,
OR
set the connection limit equal to the connection pool size?
I have an API accessed by two kinds of users over a monolithic architecture, and I expect 10k+ unique users. The users are event organizers and participants. I'm using a pooling mechanism in each Node.js instance.
My question is: how should I distribute the connection pool given that my database has a limit of, say, 500 connections? Do I adjust the number of Node.js replicas to the limit of my database, or just set every replica's pool to 500 connections?
It seems like you have several application server instances connecting to a single database.
Then you should install pgBouncer on the database server, because connection pooling on the application server alone would not be effective. You should set the pool size to something very small like 20 - 50, depending on the number of cores on the database and the number of concurrent I/O requests the database storage can handle. It does not really matter how many connections to pgBouncer you allow.
Make sure your database transactions are short and use transaction pooling mode, and you will be able to handle a big workload.
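On the application side, a minimal sketch with node-postgres (pg), pointing a deliberately small pool at pgBouncer instead of Postgres directly; the host, credentials, and pool size are illustrative assumptions:

```js
// The app pool points at pgBouncer (conventionally port 6432) rather than
// Postgres (5432). Host, credentials, and "max" are illustrative assumptions.
const { Pool } = require('pg');

const pool = new Pool({
  host: 'db.example.com',
  port: 6432,            // pgBouncer, not Postgres
  database: 'app',
  user: 'app',
  password: 'secret',
  max: 20,               // small per-instance pool; pgBouncer does the real pooling
  idleTimeoutMillis: 10000,
});

async function getParticipant(id) {
  // Keep transactions short so pgBouncer's transaction pooling mode
  // can reuse server connections aggressively.
  const { rows } = await pool.query(
    'SELECT * FROM participants WHERE id = $1',
    [id]
  );
  return rows[0];
}
```

With this layout it matters little how many app replicas you run: each holds only a small client-side pool, and pgBouncer multiplexes them all onto the 20-50 real server connections.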

How to set the partition in Azure Event Hub consumer Java code [closed]

I want to know the purpose of the hostname in EventProcessorHost and how to set the partition on the consumer side. Right now I am able to get data from the consumer group, but all partitions go to the output stream.
Questions:
1. How to set the partition via Java code?
2. What is the use of hostname in EventProcessorHost?
3. An example of multiple consumers, each with its own partition, in Java code.
I highly appreciate any help.
There is a complete Java example; see the docs.
You don't need to set a partition when you use an EventProcessorHost. Instead, each instance will lease a partition to work on. So if you created the event hub with, say, 4 partitions, you should instantiate 4 EventProcessorHost instances to get better throughput. See the linked docs as well:
This tutorial uses a single instance of EventProcessorHost. To increase throughput, it is recommended that you run multiple instances of EventProcessorHost, preferably on separate machines. This provides redundancy as well. In those cases, the various instances automatically coordinate with each other in order to load balance the received events.
Leases are given out for a specific time only. After that another receiver can take over that lease. If you give it a while you should notice all instances will retrieve data.
About the hostname:
When receiving events from different machines, it might be useful to specify names for EventProcessorHost instances based on the machines (or roles) in which they are deployed.

How to Scale Node.js WebSocket Redis Server?

I'm writing a chat server for Acani, and I have some questions about scaling Node.js and WebSockets with a load balancer.
What exactly does it mean to load balance Node.js? Does that mean there will be n independent versions of my server application running, each on a separate server?
To allow one client to broadcast a message to all the others, I store a set of all the webSocketConnections opened on the server. But, if I have n independent versions of my server application running, each on a separate server, then will I have n different sets of webSocketConnections?
If the answers to 1 & 2 are affirmative, then how do I store a universal set of webSocketConnections (across all servers)? One way I think I could do this is use Redis Pub/Sub and just have every webSocketConnection subscribe to a channel on Redis.
But, then, won't the single Redis server become the bottleneck? How would I then scale Redis? What does it even mean to scale Redis? Does that mean I have m independent versions of Redis running on different servers? Is that even possible?
I heard Redis doesn't scale. Why would someone say that? What does that mean? If that's true, is there a better solution for pub/sub and/or storing a list of all broadcast messages?
Note: If your answer is that Acani would never have to scale, even if each of all seven billion people (and growing) on Earth were to broadcast a message every second to everyone else on earth, then please give a valid explanation.
Well, a few answers to your questions:
To load balance Node.js means exactly what you thought it does, except that you don't really need separate servers; you can run more than one process of your Node server on the same machine.
Each server/process of your Node server will have its own connections. The default store for WebSockets (for example, in Socket.IO) is MemoryStore, which means all the connections are stored in the machine's memory; you have to use RedisStore in order to use Redis as a connection store.
Redis Pub/Sub is a good way to achieve this task; see the sketch below.
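A minimal sketch of cross-server broadcast using the "redis" npm client (v4 API) and the "ws" WebSocket library; the channel name and port are illustrative assumptions:

```js
// Each server instance publishes incoming chat messages to Redis and relays
// everything it receives from Redis to its own sockets. Channel name and
// port are illustrative assumptions.
const { createClient } = require('redis');
const { WebSocketServer } = require('ws');

async function main() {
  const pub = createClient();
  const sub = pub.duplicate(); // subscriber connections are dedicated in Redis
  await pub.connect();
  await sub.connect();

  const wss = new WebSocketServer({ port: 8080 });

  // Fan messages from Redis out to every socket on THIS server.
  await sub.subscribe('chat', (message) => {
    for (const client of wss.clients) {
      if (client.readyState === 1 /* OPEN */) client.send(message);
    }
  });

  // Publish incoming messages so every server instance sees them.
  wss.on('connection', (ws) => {
    ws.on('message', (data) => pub.publish('chat', data.toString()));
  });
}

main().catch(console.error);
```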
You are right about what you said here: Redis doesn't scale at this moment, and running a lot of processes/connections against Redis can make Redis a bottleneck.
Redis doesn't scale, that is correct, but according to this presentation, cluster development is a top priority for Redis, and Redis does have a cluster; it's just not stable yet (taken from http://redis.io/download):
Where's Redis Cluster?
Redis development is currently focused on Redis 2.6 that will bring you support for Lua scripting and many other improvements. This is our current priority, however the unstable branch already contains most of the fundamental parts of Redis Cluster. After the 2.6 release we'll focus our energies on turning the current Redis Cluster alpha in a beta product that users can start to seriously test.
It is hard to make forecasts since we'll release Redis Cluster as stable only when we feel it is rock solid and useful for our customers, but we hope to have a reasonable beta for summer 2012, and to ship the first stable release before the end of 2012.
See the presentation here: http://redis.io/presentation/Redis_Cluster.pdf
2) Using Redis might not work for storing connections: Redis stores data in string format, and if the connection object has circular references (e.g., in Engine.IO) you won't be able to serialize it.
3) Creating a new Redis client for each connected client might not be a good approach, so avoid that trap if you can.
Consider using the ZMQ Node library to have processes communicate with each other over TCP (or IPC if they are clustered, as in master-worker); a sketch follows below.
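A rough sketch with the "zeromq" npm package (v6 promise API): a master process pushes work to worker processes over TCP. The endpoint and message format are illustrative assumptions:

```js
// Master pushes messages; each worker pulls them. ZeroMQ's Push/Pull pair
// distributes messages round-robin across connected workers. The endpoint
// and payload shape are illustrative assumptions.
const zmq = require('zeromq');

async function master() {
  const push = new zmq.Push();
  await push.bind('tcp://127.0.0.1:5555');
  for (let i = 0; ; i++) {
    await push.send(JSON.stringify({ seq: i }));
    await new Promise((r) => setTimeout(r, 1000));
  }
}

async function worker() {
  const pull = new zmq.Pull();
  pull.connect('tcp://127.0.0.1:5555');
  for await (const [msg] of pull) {
    console.log(`worker ${process.pid} got:`, msg.toString());
  }
}

// Run as "node app.js" for the master, "node app.js worker" for a worker.
if (process.argv[2] === 'worker') worker(); else master();
```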
