Syncing app state with clients using socketio - node.js

I'm running a node server with SocketIO which keeps a large object (app state) that is updated regularly.
All clients receive the object after connecting to the server and should keep it updated in real-time using the socket (read-only).
Here's what I have considered:
1. Emit a delta of the changes to the clients after each update, computed with a diff (this requires dealing with reliability of delivery and lost updates).
2. Use the diffsync package (however, it allows clients to push changes to the server, and I need updates to be unidirectional, i.e. server --> clients).
I'm confident there should be a readily available solution to deal with this but I was not able to find a definitive answer.

The solution is very easy. You must modify the server so that it accepts updates only from trusted clients.
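let Server = require('diffsync').Server;
// Keep a reference to the original handler so it can still be called
let receiveEdit = Server.prototype.receiveEdit;
// Override on the prototype so every Server instance uses the guarded version
Server.prototype.receiveEdit = function(connection, editMessage, sendToClient){
  // checkIsTrustedClient is your own function; edits from other clients are dropped
  if(checkIsTrustedClient(connection))
    receiveEdit.call(this, connection, editMessage, sendToClient);
}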
but note this fragment from the diffsync source:
// TODO: implement backup workflow
// has a low priority since `packets are not lost` - but don't quote me on that :P
console.log('error', 'patch rejected!!', edit.serverVersion, '->',
clientDoc.shadow.serverVersion, ':',
edit.localVersion, '->', clientDoc.shadow.localVersion);
A second option is to look for another solution based on jsondiffpatch.
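For option 1 in the question (one-way deltas from the server), one way to cope with lost updates is to number every delta and have the client ask for a fresh snapshot when it sees a gap. The following is a minimal sketch of that idea, not the diffsync or jsondiffpatch API: shallowDiff, applyDelta and receive are hypothetical helpers, and the flat-object diff only stands in for a real diffing library.

```javascript
// Hypothetical helpers; a shallow, flat-object diff stands in for a real diff library.

// Server side: compute which top-level keys changed or disappeared.
function shallowDiff(prev, next) {
  const set = {};
  const unset = [];
  for (const key of Object.keys(next)) {
    if (JSON.stringify(prev[key]) !== JSON.stringify(next[key])) set[key] = next[key];
  }
  for (const key of Object.keys(prev)) {
    if (!(key in next)) unset.push(key);
  }
  return { set, unset };
}

// Client side: apply a delta without mutating the previous state.
function applyDelta(state, delta) {
  const result = { ...state, ...delta.set };
  for (const key of delta.unset) delete result[key];
  return result;
}

// Client side: accept an update only if it directly follows the local version.
// Otherwise leave the replica untouched and tell the caller to request a snapshot.
function receive(replica, update) {
  if (update.version !== replica.version + 1) {
    return { replica, needsResync: true }; // out of sequence: a delta was probably lost
  }
  return {
    replica: { version: update.version, state: applyDelta(replica.state, update.delta) },
    needsResync: false,
  };
}
```

With Socket.IO, the server would emit an object such as { version, delta } after each update, and a client that gets needsResync back would ask the server for the full state again, exactly as it does on first connect.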

Related

How should one properly handle opening and closing of Azure Service Bus/Topic Clients in an Azure Function?

I'm not sure of the proper way to manage the lifespans of the various clients necessary to interact with the Azure Service Bus. From my understanding there are three different but similar clients to manage: ServiceBusClient, a Topic/Queue/Subscription Service, and then a Sender of some sort. In my case, it's TopicService and a Sender. Should I close the sender after every message? After a certain amount of downtime? And the same for all the others? I feel like I should keep the ServiceBusClient open until the function is entirely complete, so that probably carries over to the Topic Client as well. There are just so many ways to skin this one that I'm not sure where to draw the line. I'm pretty sure it's not this extreme:
function sendMessage(message: SendableMessageInfo) {
let client=createServiceBusClientFromConnectionString(connectionString)
let tClient = createTopicClient(client);
const sender = tClient.createSender();
sender.send(message);
sender.close();
tClient.close();
client.close();
}
But leaving everything open all the time seems like a memory leak waiting to happen. Should I handle this all through error handling? Try-catch, then close everything in a finally block?
I could also just use the Azure Function binding, correct me if I'm wrong:
const productChanges: AzureFunction = async function (context: Context, products: product[]): Promise<void> {
context.bindings.product_changes = []
for (let product of products) {
if(product.updated) {
let message = this.createMessage(product)
context.bindings.product_changes.push(message)
}
}
context.done();
}
I can't work out from the docs or source which would be better (both in terms of performance and finances) for an extremely high throughput Topic (at surge, ~100,000 requests/sec).
Any advice would be appreciated!
In my opinion, it is better to use the Azure Functions binding or to make the client static, rather than creating the client on every call. With the binding, you do not have to think about closing the sender at all; a static client works fine too. Both solutions perform well, and there is no difference in cost between the two (see the Service Bus pricing page: https://azure.microsoft.com/en-us/pricing/details/service-bus/). Hope this helps.
I know this is a late reply, but I'll try to explain the concepts behind the clients below in case someone lands here looking for answers.
Version 1
_ ServiceBusClient (maintains the connection)
|_ TopicClient
|_ Sender (sender link)
Version 7
_ ServiceBusClient (maintains the connection)
|_ ServiceBusSender (sender link)
In both version 1 and version 7 of the @azure/service-bus SDK, when you use the sendMessages method or the equivalent send method for the first time, a connection is created on the ServiceBusClient if there was none and the new sender link is created.
The sender link remains active for a while and is cleared on its own (by the SDK) if there is no activity. Even if it is closed by inactivity, a subsequent send call, even after waiting for a long duration, works just fine since it creates a new sender link.
Once you're done using the ServiceBusClient, you can close the client and all the internal senders, receivers are also closed with this if they are not already closed individually.
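The lifecycle described above (one long-lived client, a sender that is created once and reused, a single close at the very end) can be sketched as follows. This is an illustration under assumptions rather than an official pattern: createTopicPublisher is a hypothetical helper, and client is assumed to be a version 7 ServiceBusClient, from which only createSender, sendMessages and close are used.

```javascript
// Sketch only: `client` is assumed to be a ServiceBusClient from @azure/service-bus v7,
// created once per process, e.g. `new ServiceBusClient(connectionString)`.
function createTopicPublisher(client, topicName) {
  let sender = null; // created lazily on first publish, then reused across calls

  return {
    // Send one message or an array of messages through the shared sender.
    async publish(messages) {
      if (sender === null) {
        sender = client.createSender(topicName);
      }
      await sender.sendMessages(messages);
    },
    // Call once, when the process is completely done sending.
    async close() {
      if (sender !== null) {
        await sender.close();
        sender = null;
      }
      await client.close();
    },
  };
}
```

Keeping the publisher at module scope means every function invocation in the same process shares it, instead of paying for a new connection and sender link per message as in the "extreme" version from the question.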
The latest version 7.0.0 of @azure/service-bus has been released recently.
@azure/service-bus - 7.0.0
Samples for 7.0.0
Guide to migrate from @azure/service-bus v1 to v7

TypeScript: Large memory consumption while using ZeroMQ ROUTER/DEALER

We have recently started working with TypeScript for an application where queued communication is expected between a server and one or more clients.
To achieve the queued communication, we are trying to use the ZeroMQ library version 4.6.0 as an npm package: npm install -g zeromq and npm install -g @types/zeromq.
The exact scenario :
The client is going to send thousands of messages to the server over ZeroMQ. The server in-turn will be responding with some acknowledgement message per incoming message from the client. Based on the acknowledgement message, the client will send next message.
ZeroMQ pattern used :
The ROUTER/DEALER pattern (we cannot use any other pattern).
Client side code :
import Zmq = require('zeromq');
let clientSocket : Zmq.Socket;
let messageQueue = [];
export class ZmqCommunicator
{
    constructor(connString : string)
    {
        clientSocket = Zmq.socket('dealer');
        clientSocket.connect(connString);
        clientSocket.on('message', this.ReceiveMessage);
    }
    // Called for every incoming message; an 'ack' releases the next queued message.
    public ReceiveMessage = (msg) => {
        var json = JSON.parse(msg.toString('utf8'));
        if(json.type != "error" && json.type == 'ack'){
            if(messageQueue.length > 0){
                this.Dispatch(messageQueue.splice(0, 1)[0]);
            }
        }
    }
    public Dispatch(message) {
        clientSocket.send(JSON.stringify(message));
    }
    public SendMessage(msg: Message, isHandshakeMessage : boolean){
        // The if branch runs only once, for the first handshake message. All other messages take the else branch.
        if(isHandshakeMessage == true){
            clientSocket.send(JSON.stringify(msg));
        }
        else{
            messageQueue.push(msg);
        }
    }
}
On the server side, we already have a ROUTER socket configured.
The above code is pretty straightforward. The SendMessage() function is called for thousands of messages, and the code works, but with heavy memory consumption.
Problem :
Because ZeroMQ is asynchronous, the client has to wait for the ReceiveMessage() callback before it can send a new message to the ZeroMQ ROUTER (as is evident from the flow into the Dispatch method).
Based on our limited knowledge of TypeScript and of using ZeroMQ from TypeScript, the problem is this: the default thread running the TypeScript code (which creates the 1000+ messages and passes them to SendMessage()) continues executing after sending the first message (essentially the handshake message), creating and passing on more messages. SendMessage() does not send the data but queues it, because we want to interpret the acknowledgement sent by the ROUTER socket and only then send the next message. Until all 1000+ messages have been created and passed to SendMessage(), control never reaches the ReceiveMessage() callback.
That is, ReceiveMessage() is called only after the default thread has finished creating the 1000+ messages and calling SendMessage() for each of them, and has nothing else left to do.
Because ZeroMQ does not provide any synchronous mechanism of sending/receiving data using the ROUTER/DEALER, we had to utilize the queue as per the above code using a messageQueue object.
This mechanism loads a huge messageQueue (with 1000+ messages) into memory, and dequeuing starts only once the default thread gets to the ReceiveMessage() call at the end. The situation will only get worse with, say, 10000+ messages or more.
Questions :
We have validated this behavior certainly. So we are sure of the understanding that we have explained above. Is there any gap in our understanding of either/or TypeScript or ZeroMQ usage?
Is there any concept like a blocking queue or size-limited array in TypeScript that accepts a limited number of entries and blocks new additions until existing ones are dequeued (which essentially implies that the default thread pauses until the ReceiveMessage() callback is called and dequeues entries)?
Is there any synchronous ZeroMQ methodology (we have used one in a similar setup for C#, where we poll on ZeroMQ and receive the data synchronously)?
Any leads on using multi-threading for such a scenario? Not sure if Typescript supports multi threading to a good extent.
Note : We have searched on many forums and have not got any leads any where. The above description may have multiple questions inside one question (against the rules of stackoverflow forum); but for us all of these questions are interlinked to using ZeroMQ effectively in Typescript.
Looking forward to getting some leads from the community.
Welcome to ZeroMQ
If this is your first read about ZeroMQ, feel free to first take a five-second read about the main conceptual differences in the [ZeroMQ hierarchy in less than five seconds] section.
1 ) ... Is there any gap in our understanding of either/or TypeScript or ZeroMQ usage ?
While I cannot speak to the TypeScript part, let me mention a few details that may help you move forward. ZeroMQ is principally a broker-less, asynchronous signalling/messaging framework, but it has many flavours of use, and there are tools to enforce both synchronous and asynchronous cooperation between the application code and the ZeroMQ Context()-instance, which is the cornerstone of all the services design.
The native API provides means to define whether a given call ought to block until the message processing across the Context()-instance's boundary has completed or, on the contrary, whether the call ought to obey ZMQ_DONTWAIT and return control to the caller asynchronously, irrespective of whether the operation has completed.
As additional tricks, one may opt to configure ZMQ_SNDHWM + ZMQ_RCVHWM and other related .setsockopt()-options, so as to obtain a specific blocking / silent-dropping behaviour.
Because ZeroMQ does not provide any synchronous mechanism of sending/receiving data
Well, ZeroMQ API does provide means for a synchronous call to .send()/.recv() methods, where the caller is blocked until any feasible message could get delivered into / from a Context()-engine's domain of control.
Obviously, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
3 ) Is there any synchronous ZeroMQ methodology (We have used it in similar setup for C# where we poll on ZeroMQ and received the data synchronously) ?
Yes, there are several such :
- the native API, if not instructed by a ZMQ_DONTWAIT flag, blocks until a message can get served
- the native API provides a Poller()-object, that can .poll(), if given a -1 as a long duration specifier to wait for sought for events, blocking the caller until any such event comes and appears to the Poller()-instance.
Again, the TypeScript language binding/wrapper is responsible for exposing these native API services to your hands.
... Large memory consumption ...
Well, this may signal poor resource management. ZeroMQ messages, once allocated, ought to be freed again where appropriate. Check your TypeScript code and the sources of the TypeScript language binding/wrapper to see whether resources are systematically disposed of and freed from memory.
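On the memory side of the question, one option that needs no blocking call at all is to stop building the 1000+ messages up front and instead produce each message only when the previous acknowledgement arrives. The sketch below is an assumption-laden illustration, not part of any ZeroMQ binding: createAckDrivenSender and makeMessages are hypothetical helpers, send stands for whatever function hands a string to the DEALER socket (for example a wrapper around clientSocket.send from the question), and the message source can be any iterable, such as a generator.

```javascript
// Hypothetical helper: sends the next message only after the previous one was
// acknowledged, so messages are created one at a time instead of being queued.
function createAckDrivenSender(messages, send) {
  const iterator = messages[Symbol.iterator]();
  let sent = 0;

  function sendNext() {
    const next = iterator.next(); // the message is created here, on demand
    if (next.done) return false;
    send(JSON.stringify(next.value));
    sent += 1;
    return true;
  }

  return {
    // Sends the first (handshake) message.
    start: sendNext,
    // Wire this to the socket's 'message' event.
    onMessage(raw) {
      const reply = JSON.parse(raw.toString('utf8'));
      if (reply.type === 'ack') sendNext();
    },
    get sentCount() { return sent; },
  };
}

// Messages are generated lazily instead of being pushed into an array.
function* makeMessages(count) {
  for (let i = 0; i < count; i++) yield { id: i };
}
```

Because the generator body only runs when sendNext() asks for the next value, the producing code no longer races ahead of the acknowledgements, and memory use stays flat regardless of how many messages there are in total.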

Meteor MongoDB subscription delivering data in 10 second intervals instead of live

I believe this is more of a MongoDB question than a Meteor question, so don't get scared if you know a lot about mongo but nothing about meteor.
Running Meteor in development mode, but connecting it to an external Mongo instance instead of using Meteor's bundled one, results in the same problem. This leads me to believe this is a Mongo problem, not a Meteor problem.
The actual problem
I have a Meteor project which continuously gets data added to the database and displays it live in the application. It works perfectly in development mode, but behaves strangely when built and deployed to production. It works as follows:
A tiny script running separately collects broadcast UDP packages and shoves them into a mongo collection
The Meteor application then publishes a subset of this collection so the client can use it
The client subscribes and live-updates its view
The problem here is that the subscription appears to get data only about every 10 seconds, while these UDP packages arrive and get shoved into the database several times per second. This makes the application behave weirdly.
It is most noticeable on the collection of UDP messages, but not limited to it. It happens with every collection which is subscribed to, even those not populated by the external script
Querying the database directly, either through the mongo shell or through the application, shows that the documents are indeed added and updated as they are supposed to. The publication just fails to notice and appears to default to querying on a 10 second interval
Meteor uses oplog tailing on the MongoDB to find out when documents are added/updated/removed and update the publications based on this
Anyone with a bit more Mongo experience than me who might have a clue about what the problem is?
For reference, this is the dead simple publication function
/**
* Publishes a custom part of the collection. See {@link https://docs.meteor.com/api/collections.html#Mongo-Collection-find} for args
*
* @returns {Mongo.Cursor} A cursor to the collection
*
* @private
*/
function custom(selector = {}, options = {}) {
return udps.find(selector, options);
}
and the code subscribing to it:
Tracker.autorun(() => {
// Params for the subscription
const selector = {
"receivedOn.port": port
};
const options = {
limit,
sort: {"receivedOn.date": -1},
fields: {
"receivedOn.port": 1,
"receivedOn.date": 1
}
};
// Make the subscription
const subscription = Meteor.subscribe("udps", selector, options);
// Get the messages
const messages = udps.find(selector, options).fetch();
doStuffWith(messages); // Not actual code. Just for demonstration
});
Versions:
Development:
node 8.9.3
mongo 3.2.15
Production:
node 8.6.0
mongo 3.4.10
Meteor uses two modes of operation to provide real-time updates on top of MongoDB, which has no built-in real-time features: poll-and-diff and oplog tailing.
1 - Oplog-tailing
It works by reading the mongo database’s replication log that it uses to synchronize secondary databases (the ‘oplog’). This allows Meteor to deliver realtime updates across multiple hosts and scale horizontally.
It's more complicated, and provides real-time updates across multiple servers.
2 - Poll and diff
The poll-and-diff driver works by repeatedly running your query (polling) and computing the difference between new and old results (diffing). The server will re-run the query every time another client on the same server does a write that could affect the results. It will also re-run periodically to pick up changes from other servers or external processes modifying the database. Thus poll-and-diff can deliver realtime results for clients connected to the same server, but it introduces noticeable lag for external writes.
(The default polling interval is 10 seconds, and this is what you are experiencing.)
This may or may not be detrimental to the application UX, depending on the application (eg, bad for chat, fine for todos).
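To make the poll-and-diff idea concrete, here is a minimal sketch of what "re-run the query and diff against the previous result" amounts to. It is illustrative only and is not Meteor's implementation: diffResults and createPoller are hypothetical names, documents are compared by _id, and runQuery is assumed to return a plain array synchronously.

```javascript
// Illustrative only: compares two query results by _id, the way a poll-and-diff
// loop would after re-running the query. Not Meteor's actual implementation.
function diffResults(oldDocs, newDocs) {
  const oldById = new Map(oldDocs.map((doc) => [doc._id, doc]));
  const newIds = new Set(newDocs.map((doc) => doc._id));
  const added = [];
  const changed = [];
  const removed = [];
  for (const doc of newDocs) {
    const before = oldById.get(doc._id);
    if (before === undefined) added.push(doc);
    else if (JSON.stringify(before) !== JSON.stringify(doc)) changed.push(doc);
  }
  for (const doc of oldDocs) {
    if (!newIds.has(doc._id)) removed.push(doc._id);
  }
  return { added, changed, removed };
}

// Returns a poll() function; each call re-runs the query and reports differences.
function createPoller(runQuery, onChange) {
  let previous = [];
  return function poll() {
    const current = runQuery();
    const diff = diffResults(previous, current);
    previous = current;
    if (diff.added.length || diff.changed.length || diff.removed.length) {
      onChange(diff);
    }
  };
}
```

Scheduling poll with setInterval(poll, 10000) reproduces the symptom from the question: a document written by an external process just after a poll is not noticed until the next poll, up to 10 seconds later.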
This approach is simple and delivers easy-to-understand scaling characteristics. However, it does not scale well with lots of users and lots of data. Because each change causes all results to be refetched, CPU time and network bandwidth scale O(N²) with users. Meteor automatically de-duplicates identical queries, though, so if each user does the same query the results can be shared.
You can tune poll-and-diff by changing values of pollingIntervalMs and pollingThrottleMs.
You have to use the disableOplog: true option to opt out of oplog tailing on a per-query basis.
Meteor.publish("udpsPub", function (selector) {
return udps.find(selector, {
disableOplog: true,
pollingThrottleMs: 10000,
pollingIntervalMs: 10000
});
});
Additional links:
https://medium.baqend.com/real-time-databases-explained-why-meteor-rethinkdb-parse-and-firebase-dont-scale-822ff87d2f87
https://blog.meteor.com/tuning-meteor-mongo-livedata-for-scalability-13fe9deb8908
How to use pollingThrottle and pollingInterval?
It's a DDP (WebSocket) heartbeat configuration.
Meteor's real-time communication and live updates are performed using DDP (a JSON-based protocol which Meteor implemented on top of SockJS).
It lets the client and server change data and react to each other's changes.
The DDP (WebSocket) protocol implements so-called PING/PONG messages (heartbeats) to keep WebSockets alive. The server sends a PING message to the client through the WebSocket, and the client then replies with PONG.
By default, heartbeatInterval is configured at a little more than 17 seconds (17500 milliseconds).
Check here: https://github.com/meteor/meteor/blob/d6f0fdfb35989462dcc66b607aa00579fba387f6/packages/ddp-client/common/livedata_connection.js#L54
You can configure heartbeat time in milliseconds on server by using:
Meteor.server.options.heartbeatInterval = 30000;
Meteor.server.options.heartbeatTimeout = 30000;
Other Link:
https://github.com/meteor/meteor/blob/0963bda60ea5495790f8970cd520314fd9fcee05/packages/ddp/DDP.md#heartbeats

Pusher Account over quota

We use Pusher in our application in order to have real-time updates.
Something very strange happens: while Google Analytics says that we have around 200 simultaneous connections, Pusher says that we have 1500.
I would like to monitor Pusher connections in real time but could not find any method to do so. Can anybody help?
Currently there's no way to get realtime stats on the number of connections you currently have open for your app. However, it is something that we're investigating currently.
In terms of why the numbers vary between Pusher and Google Analytics, it's usually down to the fact that Google Analytics uses different methods of tracking whether or not a user is on the site. We're confident that our connection counting is correct, however, that's not to say that there isn't a potentially unexpected reason for your count to be high.
A connection is counted as a WebSocket connection to Pusher. When using the Pusher JavaScript library a new WebSocket connection is created when you create a new Pusher instance.
var pusher = new Pusher('APP_KEY');
Channel subscriptions are created over the existing WebSocket connection (known as multiplexing), and do not count towards your connection quota (there is no limit on the number allowed per connection).
var channel1 = pusher.subscribe('ch1');
var channel2 = pusher.subscribe('ch2');
// All done over a single connection
// more subscriptions
// ...
var channel100 = pusher.subscribe('ch100');
// Still just 1 connection
Common reasons why connections are higher than expected
Users open multiple tabs
If a user has multiple tabs open to the same application, multiple instances of Pusher will be created and therefore multiple connections will be used e.g. 2 tabs open will mean 2 connections are established.
Incorrectly coded applications
As mentioned above, a new connection is created every time a new Pusher object is instantiated. It is therefore possible to create many connections in the same page.
Using an older version of one of our libraries
Our connection strategies have improved over time, and we recommend that you keep up to date with the latest versions.
Specifically, in newer versions of our JS library, we carry out ping-pong requests between server and client to verify that the client is still around.
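For the "incorrectly coded applications" case above, one common guard is to route every use of the client through a single shared instance per page. The sketch below is an assumption, not part of the Pusher library: createClientRegistry is a hypothetical helper that takes the client constructor as an argument (in a real page that would be Pusher) and caches one instance per app key.

```javascript
// Hypothetical helper: hands out one shared client per app key instead of
// constructing a new one (and therefore a new connection) on every call.
function createClientRegistry(ClientClass) {
  const instances = new Map();
  return function getClient(appKey) {
    if (!instances.has(appKey)) {
      instances.set(appKey, new ClientClass(appKey));
    }
    return instances.get(appKey);
  };
}
```

Usage would look like var getPusher = createClientRegistry(Pusher); followed by getPusher('APP_KEY').subscribe('ch1') wherever a channel is needed, so that repeated calls from different widgets on the same page reuse one connection. This does not help with multiple tabs, which still open one connection each.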
Other remedies
While our efforts are always to keep a connection going indefinitely to an application, it is possible to disconnect manually if you feel this works in your scenario. It can be achieved by calling disconnect() on your Pusher instance, i.e. pusher.disconnect(). Below is some example code:
var pusher = new Pusher("APP_KEY");
var timeoutId = null;
function startInactivityCheck() {
timeoutId = window.setTimeout(function(){
pusher.disconnect();
}, 5 * 60 * 1000); // called after 5 minutes
};
// called by something that detects user activity
function userActivityDetected(){
if(timeoutId !== null) {
window.clearTimeout(timeoutId);
}
startInactivityCheck();
};
How this disconnection is transmitted to the user is up to you but you may consider prompting them to let them know that they will not receive any further real-time updates due to a long period of inactivity. If they wish to start receiving real-time updates again they should click a button.

Rabbitmq understanding channel, consumer and connection statistics

My problem refers to the statistics that are displayed in the management plugin. When not in use, the RabbitMQ stats look like this:
I am using rabbitmq to create a REQ/REP socket. For each connected client a new queue is created. So we have 4 queues now:
However I don't understand the other numbers.
Why are there 8 exchanges initially? (after fresh install)
Why are there 2 queues initially? (after fresh install)
Why did the other numbers jump from 0 to 4 while I have just 2 clients?
Is this because of the REQ/REP?
Update: I have two application communicating with each other. On the one side I have
var context = require('rabbit.js').createContext('amqp://localhost');
var rep = context.socket('REP', {
prefetch: 1,
persistent: false
});
rep.connect(someIdentifier);
rep.setEncoding('utf8');
rep.on('data', function(data) {
//got a request
});
And on the other:
var context = require('rabbit.js').createContext('amqp://localhost');
var req = context.socket('REQ');
req.setEncoding('utf8');
req.connect(sameIdAsAbove);
req.on('data', function(data) {
//got a response
});
The 6 default exchanges are one of each exchange type plus their alias exchanges (see the Exchanges and Exchange Types section in AMQP 0-9-1 Model Explained).
The next 2 exchanges are amq.rabbitmq.trace (topic type), used by the Firehose Tracer, and amq.rabbitmq.log (also topic type), from which you can consume log entries during debugging (just bind with the # key, for example).
These exchanges are created in every vhost, by the way. The amq prefix comes from the AMQP convention of naming AMQP-related entities with an amq prefix. The rabbitmq part stands for RabbitMQ-specific features.
So it is all about conventions.
As for the 2 default queues, it really depends on your installation type, since the default config may vary. A vanilla RabbitMQ installation gives you no queues.
If you have 4 active consumers (processes waiting for a new message to appear in a queue) that stay connected, they will use at least one connection each and one channel per connection.
Why your number of queues changes is hard to say without seeing the actual code.
Update:
The 4 connections and 4 channels (to communicate with an AMQP broker you need to open at least one channel; this is described in section 4.3, Connection Multiplexing, of the AMQP protocol) arise because the underlying implementation creates a duplex stream for each application instance, which probably uses two connections so that read and write events can happen independently.
P.S.: actually, a fresh install may import a pre-defined config and configure many other options, from access policy, vhosts, users, exchanges, queues and bindings to HA, clustering and more.
