Designing a message processing system - node.js

I have been asked to create a message processing system as follows. As I am not sure if this is the right place to post this, feel free to move it to any other appropriate SC group.
The server has about 100 to 500 clients connected at any moment. When a client connects, the server loads part of the client's data and caches it in memory for faster access. The server will receive between 200 and 1000 messages per second across all clients. These messages are relatively small (about 500 bytes). Any changes to data in the cache should be saved to disk as soon as possible. When a client disconnects, all of its data is saved to disk and removed from the cache. Each message contains some instruction and a text message, which will be saved as a file. Instructions should be executed as fast as possible (near instant), and all clients using that file should get the update. Only writing the modified message to disk can be delayed.
Here is my solution in a diagram
My solution consists of a web server (HTTP or socket), a message queue, and two or more instances of a file server and an instruction server.
The web server grabs client messages and, if there is a message available for a client in the message queue, pushes it back to that client.
The instruction processor grabs instructions from the queue, creates the necessary message to be processed by the file server (get/set file), waits for the file to become available in the queue, and then does more processing to create another message for the client.
The file server only provides the files, either from the cache or from the physical file, depending on the type of file.
There are peak times when the total number of connected clients might go over 10000 at once and the total messages received from clients increase to 10~15K.
I should be able to clear the queue and go back to the normal state as soon as possible (by processing the requests, obviously).
I should be able to add extra instruction processors and file servers on the fly without having to shut down the other instances.
If a file server crashes, it shouldn't lose files, so it has to write files to disk as soon as there are any changes and processing time is available.
The file system should be in B+ tree format so that some applications (local reporting apps) can easily access files without having to go through the queue server.
My Solution
I am thinking of using node.js for the socket/web server, and maybe a NoSQL database for the file server and a queue server such as RabbitMQ, or Redis with Node_Redis.
Is there a better way of structuring this system?
What are my other options for components of this system?
Is it possible to run all the instances on the same server machine, or even in the same application (in different threads)?

You have a couple of holes here, mostly around the web server "pushing" the message back to the client. That doesn't really work in a web-based world. You can try and use websockets, but generally, this ends up being polling based.
I don't know what the "instructions" to be executed are, but saving 1000 500-byte messages is trivial. Many NoSQL solutions boast a capacity of a million or more writes per second, especially if you let committing to disk lag.
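To make the "let committing to disk lag" idea concrete, here is a minimal write-behind cache sketch. It is written in Python purely for illustration (the question targets node.js), and the `WriteBehindCache` class and its `persist` callback are hypothetical names, not part of any library mentioned in this thread.

```python
from typing import Callable, Dict, Set


class WriteBehindCache:
    """Serves reads and writes from memory; disk writes happen later in flush()."""

    def __init__(self, persist: Callable[[str, str], None]) -> None:
        self._persist = persist          # hypothetical: writes one file to disk
        self._data: Dict[str, str] = {}
        self._dirty: Set[str] = set()    # keys changed since the last flush

    def set(self, key: str, text: str) -> None:
        # The in-memory copy is updated immediately, so clients see the change at once.
        self._data[key] = text
        self._dirty.add(key)

    def get(self, key: str) -> str:
        return self._data[key]

    def flush(self) -> int:
        # Persist only the latest version of each changed key.
        dirty, self._dirty = sorted(self._dirty), set()
        for key in dirty:
            self._persist(key, self._data[key])
        return len(dirty)


disk: Dict[str, str] = {}                # stand-in for the real storage layer
cache = WriteBehindCache(disk.__setitem__)
cache.set("a.txt", "v1")
cache.set("a.txt", "v2")                 # overwrites v1 before it ever hits disk
cache.set("b.txt", "x")
written = cache.flush()                  # in practice, run on a timer or when idle
```

The trade-off is the one the question already names: anything still marked dirty is lost if the process crashes before the next flush, so the flush interval bounds how much work can be lost.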
Don't bother with the queue for the return of the file. A good NoSQL solution will scale better. Build out a Cassandra cluster and load test it until it can handle your peak load.
This simplifies your architecture into one or more web servers, clients polling those servers for file updates, a queue for submitting "messages" to the "instruction server" (also known as an application server in web-developer terms), and a NoSQL database for the instruction server to write files to.
This makes scaling easy: you can always add more web servers, and with a decent cluster size for your NoSQL database, you should be able to scale horizontally there as well. Your only real bottleneck is your instruction server queue, which you could always throw more instruction servers at.


SocketIO scaling architecture and large rooms requirements

We are using socketIO on a large chat application.
At some points we want to dispatch "presence" (user availability) to all other users:
'room1').emit('availability:update', {userid: 'xxx', isAvailable: false});
room1 may contain a lot of users (500 max). We observe a significant rise in our NodeJS load when many availability updates are triggered.
The idea was to use something similar to the Redis store with Socket IO, and have web browser clients connect to different NodeJS servers.
When we want to emit to a room, we dispatch the "emit to room1" payload to all other NodeJS processes using Redis PubSub, ZeroMQ, or even RabbitMQ for persistence. Each process will itself call its own 'room1').emit to target its subset of connected users.
One concern with this setup is that the inter-process communication may become quite busy, and I was wondering if that may become a problem in the future.
Here is the architecture I have in mind.
Could you batch changes and only distribute them every 5 seconds or so? In other words, on each node server, simply take a 'snapshot' every X seconds of the current state of all users (e.g. 'connected', 'idle', etc.) and then send that to the other relevant servers in your cluster.
Each server then does the same: every 5 seconds or so, it sends the same message, containing only the changes in user state, as one batch object array to all connected clients.
Right now, I'm rather surprised you are attempting to send information about each user as a packet. Batching seems like it would solve your problem quite well, as it would also make better use of standard packet sizes that are normally transmitted via routers and switches.
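The batching suggestion could look roughly like the sketch below. It is Python for illustration only (the thread is about NodeJS), and `PresenceBatcher` and its `send` callback are made-up names; `send` stands in for whatever emits one payload to the room.

```python
import threading
from typing import Callable, Dict, List


class PresenceBatcher:
    """Collects presence changes and emits them as one batch per flush."""

    def __init__(self, send: Callable[[List[dict]], None]) -> None:
        self._send = send                       # hypothetical: emits one batch to a room
        self._pending: Dict[str, bool] = {}     # userid -> latest availability
        self._lock = threading.Lock()

    def update(self, userid: str, is_available: bool) -> None:
        # A later update for the same user overwrites the earlier one,
        # so only the most recent state is ever sent.
        with self._lock:
            self._pending[userid] = is_available

    def flush(self) -> int:
        # Swap out the pending map, then send everything as a single array.
        with self._lock:
            pending, self._pending = self._pending, {}
        if not pending:
            return 0
        batch = [{"userid": u, "isAvailable": a} for u, a in pending.items()]
        self._send(batch)
        return len(batch)


sent: List[List[dict]] = []
batcher = PresenceBatcher(sent.append)
batcher.update("alice", True)
batcher.update("bob", False)
batcher.update("alice", False)   # replaces the earlier state for alice
batcher.flush()                  # in practice, call this from a timer every ~5 seconds
```

With this shape, three updates become one emitted payload with two entries, which is where the load reduction would come from when many users flip state within the same interval.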
You are looking for this library:
Which can be used with this emitter:
As for the available-users function, I think there are two alternatives: you can create a "users queue" that holds "public data" from connected users, or you can use the exchanges' binding information to show which users are connected. If you use a "users queue", it will be the same for each "room", and you could update it when a user leaves by "popping" that user's state message from the queue (although you will have to "reorganize" all of the queue's messages to do so).
Nevertheless, I think that RabbitMQ is designed for asynchronous communication, and using it to keep a register of user presence is not a very useful approach. I think it is better suited to applications where you don't know when the user will receive the message or what the user's "real availability" is ("fire and forget" architectures). ZeroMQ requires more work from scratch, but you could implement something more specific to your situation with better performance.
A publish/subscribe example from the RabbitMQ site could be a good starting point for a new design like yours, where a message is sent to several users at the same time. In summary, I would create two queues per user (one for receiving and one for sending messages) and use a specific exchange for each chat room, tracking which users are in each room through the exchange's binding information. You always have two queues per user, and you create exchanges to bind them to one or more chat rooms.
I hope this answer is useful to you; sorry for my bad English.
This is the common approach for sharing data across several processes. You have done well, so far, with a single process and a single thread. I could lamely assume that you could pick any of the mentioned technologies for communicating shared data without hitting any performance issues.
If all you need is IPC, you could perhaps have a look at Faye. If, however, you need to have some data persisted, you could start a Redis cluster with as many Redis masters as you have CPUs, though this will add minor networking noise for Pub/Sub.

Cloud Architecture On Azure for Internet of Things

I'm working on a server architecture for sending/receiving messages from remote embedded devices, which will be hosted on Windows Azure. The front-facing servers are going to be maintaining persistent TCP connections with these devices, and I need a way to communicate with them on the backend.
Problem facts:
Devices: ~10,000
Frequency of messages device is sending up to servers: 1/min
Frequency of messages originating server side (e.g. from user actions, scheduled triggers, etc.): 100/day
Average size of message payload: 64 bytes
Upward communication
The devices send up messages very frequently (sensor readings). The constraints for that data are not very strong, due to the fact that we can aggregate/insert those sensor readings in a batched manner, and that they don't require in-order guarantees. I think the best way of handling them is to put them in a Storage Queue, and have a worker process poll the queue at intervals and dump that data. Of course, I'll have to be careful about making sure the worker process does this frequently enough so that the queue doesn't infinitely back up. The max batch size of Azure Storage Queues is 32, but I'm thinking of potentially pulling in more than that: something like publishing to the data store every 1,000 readings or 30 seconds, whichever comes first.
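The "every 1,000 readings or 30 seconds, whichever comes first" rule can be sketched as below. This is a hedged illustration in Python, not Azure SDK code: `ReadingBatcher`, `publish`, and the injectable `clock` are hypothetical names, and pulling messages off the Storage Queue is left out entirely.

```python
import time
from typing import Callable, List


class ReadingBatcher:
    """Buffers readings and publishes when the batch is big enough or old enough."""

    def __init__(self, publish: Callable[[List[str]], None],
                 max_size: int = 1000, max_age: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._publish = publish      # hypothetical: bulk-inserts into the data store
        self._max_size = max_size
        self._max_age = max_age
        self._clock = clock
        self._buffer: List[str] = []
        self._started = 0.0          # time the oldest buffered reading arrived

    def add(self, reading: str) -> None:
        if not self._buffer:
            self._started = self._clock()
        self._buffer.append(reading)
        self._maybe_flush()

    def tick(self) -> None:
        # Call periodically so a quiet period still flushes after max_age.
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self._buffer:
            return
        too_big = len(self._buffer) >= self._max_size
        too_old = self._clock() - self._started >= self._max_age
        if too_big or too_old:
            batch, self._buffer = self._buffer, []
            self._publish(batch)


now = [0.0]                          # fake clock so the example is deterministic
published: List[List[str]] = []
batcher = ReadingBatcher(published.append, max_size=3, max_age=30.0,
                         clock=lambda: now[0])
for reading in ("r1", "r2", "r3"):
    batcher.add(reading)             # third add hits the size limit
batcher.add("r4")
now[0] = 31.0
batcher.tick()                       # age limit reached for the leftover reading
```

The periodic `tick` matters: without it, a partially filled batch would sit in memory until the next reading arrives, however long that takes.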
Downward communication
The server sends down updates and notifications much less frequently. This is a slightly harder problem, as I can see two viable paradigms here (with some blending in between). I could either:
Create a Service Bus Queue for each device (or one queue with thousands of subscriptions - the limit for the number of queues is 10,000)
Have a state table housed in a DB that contains the latest "state" of a specific message type that will get sent to the devices
With option 1, the application server simply enqueues a message in a fire-and-forget manner. On the front-end servers, however, quite a few things have to happen. Concerns I can see include:
Monitoring 10k queues (or many subscriptions off of a queue - the Azure SDK apparently reuses connections for subscriptions to the same
Connection Management
Should no longer monitor a queue if device disconnects.
Need to expire messages if device is disconnected for an extended period of time (so that queue isn't backed up)
Need to enable some type of "refresh" mechanism to update device's complete state when it goes back online
The good news is that Service Bus queues are durable and, with sessions, can deliver messages in FIFO order.
With option 2, the DB would house a table that would maintain state for all of the devices. This table would be checked periodically by the front-facing servers (every few seconds or so) for state changes written to it by the application server. The front-facing servers would then dispatch to the devices. This removes the requirement for FIFO queueing, the reasoning being that this message contains the latest state and doesn't have to compete with other messages destined for the same device. The message is ephemeral: if it fails, it will be resent when the device reconnects and requests to be refreshed, or at the next check interval of the front-facing server.
In this scenario, the need for queues seems to be removed, but the DB becomes the bottleneck here, and I fear it's not as scalable.
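For what it's worth, option 2 could be sketched as below, using an in-memory SQLite database as a stand-in for the real DB. The table layout, the global `version` counter, and the function names `write_state` and `changes_since` are all assumptions for illustration; a real deployment would need a concurrency-safe way to assign versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE device_state (
        device_id    TEXT,
        message_type TEXT,
        payload      TEXT,
        version      INTEGER,
        PRIMARY KEY (device_id, message_type)
    )
""")


def write_state(device_id: str, message_type: str, payload: str) -> int:
    """Application server side: overwrite the latest state and bump the version."""
    (latest,) = db.execute(
        "SELECT COALESCE(MAX(version), 0) FROM device_state").fetchone()
    db.execute("INSERT OR REPLACE INTO device_state VALUES (?, ?, ?, ?)",
               (device_id, message_type, payload, latest + 1))
    db.commit()
    return latest + 1


def changes_since(version: int) -> list:
    """Front-facing server side: fetch only rows changed after the last poll."""
    return db.execute(
        "SELECT device_id, message_type, payload, version FROM device_state "
        "WHERE version > ? ORDER BY version", (version,)).fetchall()


write_state("dev1", "config", "a")
write_state("dev1", "config", "b")     # replaces "a"; only the latest state is kept
write_state("dev2", "config", "c")
first_poll = changes_since(0)          # everything a fresh front-end needs
second_poll = changes_since(2)         # only what changed after version 2
```

Each poll is a single indexed range query rather than a scan per device, which is one way to keep the DB from becoming the bottleneck the question worries about.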
These are both viable approaches, and I feel this question is already becoming too large (although I can provide more descriptions if necessary). Just wanted to get a feel for what's possible, what's usually done, if there's something fundamental I'm missing, and what things in the cloud can I take advantage of to not reinvent the wheel.
If you can identify the device (maybe by device ID, IMEI, or MAC address) from the message it sends, then you can reduce the number of queues from 10,000 to 1 and not need 10,000 subscriptions either. This could also help you with the downward communication, as you will be able to identify the device and send the message to the appropriate socket.
Since, as you mentioned, the connections are long-lived, you could deliver the command to a device that is connected and decide separately what to do with commands for devices that are not connected.
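A rough sketch of that single-queue idea: the front-end keeps a map from device ID to the open connection and routes each dequeued command through it. This is illustrative Python only; `DeviceRouter` and the `send` callables are hypothetical stand-ins for the real TCP connections and the Service Bus consumer.

```python
from typing import Callable, Dict, List, Tuple


class DeviceRouter:
    """Routes commands from one shared queue to the right device connection."""

    def __init__(self) -> None:
        self._connections: Dict[str, Callable[[bytes], None]] = {}
        self.undelivered: List[Tuple[str, bytes]] = []   # commands for offline devices

    def connect(self, device_id: str, send: Callable[[bytes], None]) -> None:
        self._connections[device_id] = send

    def disconnect(self, device_id: str) -> None:
        self._connections.pop(device_id, None)

    def dispatch(self, device_id: str, payload: bytes) -> bool:
        send = self._connections.get(device_id)
        if send is None:
            # Policy decision left open in the answer: here we just park the command.
            self.undelivered.append((device_id, payload))
            return False
        send(payload)
        return True


received: List[bytes] = []
router = DeviceRouter()
router.connect("dev-1", received.append)
delivered_online = router.dispatch("dev-1", b"reboot")
delivered_offline = router.dispatch("dev-2", b"reboot")   # dev-2 never connected
```

What happens to the parked commands (expire them, replay them on reconnect, or replace them with a full state refresh) is exactly the decision the answer leaves to you.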
Hope it helps

Measure perf - components of a roundtrip MDX query

I would like to decompose the performance of a round-trip MDX query from a client to Analysis Services and back. In particular, I'm looking to identify/distinguish individual queries and record the time each query takes for:
the XMLA over HTTP message from client to IIS
the XMLA over TCP/IP message from the Data Pump to Analysis Services
the response from Analysis Services to the Data Pump
the response from IIS to the client
I am open to other data-points that would be beneficial to identify bottlenecks in the lifecycle of a query.
My company has tested a mix of software including: Periodic SSAS DMV data collection, PerfMon, Flight Recorder, Splunk and SQL Sentry. We are having trouble tying it all together.
One of the main problems you have is that there are probably overlaps in time: msmdpump in IIS can start sending the first bytes to the AS server as soon as the first few bytes of the XMLA from the HTTP request are available, and vice versa: it probably starts sending the message as soon as the first few bytes of the response from the AS server are available.
Actually, the communication between msmdpump and the AS server is a binary version of the XML that is sent between msmdpump and the client, and hence easy to translate without knowing information later in the message. See for some details about the protocol.
To track the times, my approach would be a low-level one: I would run Wireshark on the computer running IIS, and filter to only the HTTP frames between the client and IIS and the frames between the IIS computer and the AS server. The contents of the frames would be more or less irrelevant, but you could see the timestamps of the first and last packets of a request, giving you a rough estimate of the durations of the different communications. And staying on one computer for all network traffic logging avoids the need to have the clocks of all computers exactly synchronized.
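Once the capture is exported, turning first/last packet timestamps into per-leg durations is simple arithmetic. The sketch below assumes you have already reduced the capture to `(timestamp, leg)` pairs; the leg labels and the timestamps are invented for illustration and do not come from a real trace.

```python
from typing import Dict, Iterable, Tuple


def leg_durations(frames: Iterable[Tuple[float, str]]) -> Dict[str, float]:
    """Return last-minus-first timestamp (seconds) for each communication leg."""
    first: Dict[str, float] = {}
    last: Dict[str, float] = {}
    for timestamp, leg in frames:
        first[leg] = min(timestamp, first.get(leg, timestamp))
        last[leg] = max(timestamp, last.get(leg, timestamp))
    return {leg: last[leg] - first[leg] for leg in first}


# Made-up timestamps for one query, all taken on the IIS machine's clock.
frames = [
    (0.0, "client->IIS"), (0.5, "client->IIS"),
    (0.25, "IIS->AS"), (0.75, "IIS->AS"),
    (1.0, "AS->IIS"), (3.0, "AS->IIS"),
    (1.5, "IIS->client"), (3.5, "IIS->client"),
]
durations = leg_durations(frames)
```

Note how the legs overlap in this invented data (the pump starts forwarding at 0.25 while the request is still arriving), so the per-leg durations do not simply add up to the round-trip time.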

Architecture and performance issue

I have a question about architecture/performance. I'm talking about a SIP server that processes multiple client requests concurrently. I suppose that each request is handled in a dedicated thread. At the end of processing, the thread concerned logs request-specific information to a file. I want to optimize this last part of the processing; that is, I want to know what alternatives you would propose instead of logging this information to a file. Why? Because writing to a file after processing uses resources that I could use to process other incoming requests.
First, what do you think about the question? And if you think it is a "true" question (I mean that an alternative may improve performance), what do you propose?
I thought about logging the data to a queue and using another process ON ANOTHER MACHINE that would read from the queue and write to a file.
Thanks for your suggestions
If it is NOT a requirement that the log is written before the request returns - i.e. the logging is not part of the atomic response - then you have the option of returning the response and just initiating the logging action.
Putting the logging data in a queue in memory seems reasonable. You can read that queue and write to disk either on the same machine or another. I would start with a thread in your app as this is easiest to implement and since the disk I/O is going to be the limiting factor, it shouldn't impact your server much.
If the log is required to be written BEFORE the response is returned, you still have the option of using a reliable queue like MSMQ.
I suspect that the network overhead involved in moving the logging to another machine is probably going to create more problems than it solves. I would go with #Nicholas' solution: queue off the logs to one thread on the same machine. The queue allows slack so that occasional disk latency is mitigated, and the logging thread can make its own optimizations, e.g. waiting until it has a cluster-size of logs before writing. Other things, like opening a new log file every day or whenever the log file reaches a limiting size, are also much easier and do not affect the performance of the main server.
Even if you log on another machine, you should still queue off the logging to mitigate network latency.
If the log objects on the queue contain, say, a 'request' enumeration (e.g. ElogWrite, ElogNewFile, ElogPath, ElogShutdown), you could try both: you could queue up a request for the log thread to close its current log file and open a path to a file on a networked machine at runtime, and the queue buffer would absorb the delay of doing this.
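Here is a minimal sketch of that queue-plus-logging-thread design, with a small request enumeration loosely modelled on the ElogWrite / ElogNewFile / ElogShutdown idea. It is Python for illustration (the SIP server's language is not stated), and `LogRequest`, `log_worker`, and the file names are assumptions.

```python
import enum
import os
import queue
import tempfile
import threading


class LogRequest(enum.Enum):
    WRITE = 1       # append one line to the current log file
    NEW_FILE = 2    # close the current file and switch to another path
    SHUTDOWN = 3    # close the file and stop the logging thread


def log_worker(log_queue: queue.Queue, path: str) -> None:
    """Runs in its own thread; the only code that ever touches the log file."""
    out = open(path, "a", encoding="utf-8")
    try:
        while True:
            kind, arg = log_queue.get()      # blocks until a request arrives
            if kind is LogRequest.WRITE:
                out.write(arg + "\n")
            elif kind is LogRequest.NEW_FILE:
                out.close()
                out = open(arg, "a", encoding="utf-8")
            elif kind is LogRequest.SHUTDOWN:
                break
    finally:
        out.close()


log_dir = tempfile.mkdtemp()
first = os.path.join(log_dir, "a.log")
second = os.path.join(log_dir, "b.log")

log_queue: queue.Queue = queue.Queue()
worker = threading.Thread(target=log_worker, args=(log_queue, first))
worker.start()

# Request threads only enqueue; they never wait on disk I/O.
log_queue.put((LogRequest.WRITE, "request 1 done"))
log_queue.put((LogRequest.NEW_FILE, second))
log_queue.put((LogRequest.WRITE, "request 2 done"))
log_queue.put((LogRequest.SHUTDOWN, None))
worker.join()
```

Because requests are processed strictly in queue order, the file switch lands exactly between the two log lines, and the request threads never notice how long the reopen took.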

Linux: need to design pre-fetcher to cache files from NAS into system memory

I am designing a server for the following scenario:
a series of single images is stored on a NAS, let's say 100 of them
a client connects to the server over TCP socket and requests image39
server reads image39 from NAS and sends back to client over socket
it is likely that the client will also request other images from the series, so:
I would like to launch a thread that iterates through the images, reads them, and does a cat image39 > /dev/null to force cache into memory on server
thread will fetch images as follows: image38, image40, image37, image41, etc.
already fetched images are ignored
if client now requests image77, I want to reset the fetch thread to fetch: image76, image78, etc.
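The fetch order described in the list above (nearest neighbours first, alternating below and above the requested image) can be generated as follows. This is only a sketch of the ordering, in Python; it assumes images are numbered 1 to `count`, and `prefetch_order` is a made-up name. The actual reading from the NAS is not shown.

```python
from typing import Iterator, List, Set


def prefetch_order(requested: int, count: int) -> Iterator[int]:
    """Yield image numbers outward from `requested`: r-1, r+1, r-2, r+2, ..."""
    for distance in range(1, count):
        for candidate in (requested - distance, requested + distance):
            if 1 <= candidate <= count:      # skip numbers outside the series
                yield candidate


fetched: Set[int] = {39}                     # image39 was already served
plan_39: List[int] = []
for image in prefetch_order(39, 100):
    if image in fetched:                     # already fetched images are ignored
        continue
    fetched.add(image)
    plan_39.append(image)
    if len(plan_39) == 4:                    # client now asks for image77: stop here
        break

# Resetting is just starting a new generator around the new request.
plan_77 = [i for i in prefetch_order(77, 100) if i not in fetched][:4]
```

Since the generator holds no state beyond the requested index, "resetting the fetch thread" amounts to dropping the old generator and creating a new one, which keeps the per-client bookkeeping small.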
This has to scale to many series and clients, probably on the order of 1000 concurrent prefetches. I understand that threads can cause a performance hit if there are too many. Would it be better to fork a new process instead? Is there a more efficient way than threads or processes?
This is premature optimization. Try implementing your system without tricks to "force" the cache, and see how it works. I bet it'll be fine--and you won't then need to worry about nasty surprises if it turns out your tricks don't play nice with other things on the system.
