Ejabberd custom module memory leak

I'm not very experienced in Erlang, but I need to solve the following task.
We are using ejabberd for messaging, and we want to send a push notification to a user's mobile device when the user is offline. For the DB we are using MySQL.
A user can have several devices, and it is a common case that a user is online on one device but offline on another; we want the offline device to receive the push notification. We have our own push server and call its push API from ejabberd.
I created a new module which processes IQs in the urn:xmpp:gcm:0 namespace, in which the app sends its push token. I write the user, resource, token and user status (online/offline) into a DB table,
and when a user session is closed I update the status for that resource to offline.
I added a hook on the "user_send_packet" event: I load all offline resources of the destination user from the DB and send a push notification to each of them.
I created two modules: mod_notification.erl for message parsing and handling,
and mod_notification_sql.erl for SQL queries and updates.
These are the hooks implemented in the module:
%% Handle IQs in the push-token namespace in the calling process (no_queue)
gen_iq_handler:add_iq_handler(ejabberd_sm, Host, <<?NS_GCM>>, ?MODULE, iq, no_queue),
%% Inspect every outgoing packet so offline resources can be pushed to
ejabberd_hooks:add(user_send_packet, Host, ?MODULE, user_send_packet, 500),
%% Track resources coming online and going offline
ejabberd_hooks:add(sm_register_connection_hook, Host, ?MODULE, user_online, 100),
ejabberd_hooks:add(sm_remove_connection_hook, Host, ?MODULE, user_offline, 100),
This module is causing a memory leak, but I can't find the cause.
The communication with the database is done very similarly to mod_mam_sql, basically just with different queries and variables.
I can post the complete code, but to start with: can someone tell me what the possible reasons for a memory leak could be? I'm not using any caching or any direct memory allocation.
Update: the IQ processing is causing the leak; when I comment out the IQ handler registration, the leak is gone. Are there any rules about what should or should not be done in an IQ handler?
Thanks in Advance,
Laslo

Related

Correct way to build an in-app notification service?

Background
I have a monolithic Node.js + PostgreSQL app that, among other things, needs to provide real-time in-app notifications to end users.
It is currently implemented in the following way:
there's a db table notifications which has state (pending/sent), userid (id of the notification receiver), isRead (did the user read the notification), and type and body (the notification data).
once specific resources get created or specific events occur, a varying number of users should receive in-app notifications. When a notification is created, it gets persisted to the db and sent to the user using WebSockets. Notifications can also be created by a cron job.
when a user receives N notifications of the same type, they get collapsed into one single notification. This is done via a db trigger that deletes the repeated notifications and inserts a new one.
usually it works fine. But when the number of receivers exceeds several thousand, the app lags, other requests get blocked, or not all notifications get sent via WebSockets.
Examples of notifications
Article published
A user is awarded with points
A user logged in multiple times but didn't perform some action
One user sends a friend request to another
One user sent a message to another
if a user receives 3+ Article published notifications, they get collapsed into a single N articles published notification (N gets updated as new notifications of the same type arrive).
What I currently have doesn't seem to work very well. For example, for the Article created event, the API endpoint that handles the creation also handles the notification send-out (which is maybe not a good approach: it creates ~5-6k notifications and sends them to users via websockets).
Question
How to correctly design such functionality?
Should I stay with a node.js + db approach or add a queuing service? Redis Pub/Sub? RabbitMQ?
We deploy to a k8s cluster, so adding another service is not a problem. The more important question is: is it really needed in my case?
I would love some general advice or resources to read on this topic.
I've read several articles on messaging/queuing/notifications system design but still don't quite get if this fits my case.
Should the queue store the notifications or should they be in the db? What's the correct way to notify thousands of users in real-time (websockets? SSE?)?
Also, the more I read about queues and message brokers, the more it feels like I'm overcomplicating things and getting more confused.
Consider using the Temporal open source project. It would allow modeling each user's lifecycle as a separate program. Temporal makes the code fully fault-tolerant and preserves its full state (including local variables and blocking await calls) across process restarts.
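To make that concrete, here is a minimal sketch of the idea, assuming Temporal's JavaScript SDK (@temporalio/workflow); the workflow, signal and activity names are illustrative, not something from the question's codebase:

// user-notifications.js - one workflow instance per user: it receives
// notification signals, collapses repeated types, and delegates delivery
// to an activity, which Temporal retries on failure.
import { proxyActivities, defineSignal, setHandler, condition } from '@temporalio/workflow';

// The activity would do the db write + websocket push (hypothetical name)
const { deliverNotification } = proxyActivities({ startToCloseTimeout: '1 minute' });

export const notifySignal = defineSignal('notify');

export async function userNotificationWorkflow(userId) {
  const pending = [];
  setHandler(notifySignal, (notification) => pending.push(notification));

  for (;;) {
    await condition(() => pending.length > 0); // blocks until a signal arrives
    const batch = pending.splice(0);
    // Collapse repeated types ("N articles published") before delivery
    const counts = {};
    for (const n of batch) counts[n.type] = (counts[n.type] || 0) + 1;
    for (const [type, count] of Object.entries(counts)) {
      await deliverNotification(userId, type, count);
    }
  }
}

With something like this, the API endpoint only signals workflows, so a burst of 5-6k notifications becomes 5-6k queued signals instead of 5-6k synchronous send-outs inside one HTTP request.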

How to build scalable realtime chat messaging with WebSockets?

I'm trying to build a realtime (private) chat between users of a video game with 25K+ concurrent connections. We currently run 32 nodes that users can connect to through a load balancer. The problem I'm trying to solve is how to route messages to each user.
Currently, we are using socket.io & socket.io-redis, where each websocket joins a room named after its user ID, and we emit each message a user should receive to that room. The problem with this design is that we are reaching the limits of Redis pub/sub and of Socket.io, which doesn't scale well (socket.io emits each message to all nodes, which then check whether the user is connected; this is not viable).
Our current stack is composed of Postgres, Redis & RabbitMQ. I have been thinking about this problem a lot and have come up with 3 different solutions:
Route all messages with RabbitMQ. When a user connects, we create a fanout exchange named after the user ID and a queue per websocket connection (we have to handle multiple connections per user). When we want to emit to that user, we simply publish to that exchange. The problem with that approach is that we have to create a lot of queues, and I have heard that this may not be very efficient.
Create a queue for each node in RabbitMQ. When a user connects, we save the node & socket ID in a Redis set, so that when we have to send a message to that specific user, we first get the list of nodes and emit to each node's queue, which then handles routing to the specific client in the app. The problem with that approach is that in the case of a node failure, we may record that a user is connected when that is not the case. To fix that, we would need to expire the user's Redis entry, but this is not a perfect fix. Also, if we later want to implement group chat, it would mean we have to send duplicate messages to Rabbit, which is not ideal.
Go all in with Firebase Cloud Messaging. We have a mobile app, and we plan to use it for push notifications when the user isn't connected, but would it be a good fit even if the user is connected?
What do you think is the best fit for our use case? Do you have any other idea?
I found a better solution: create a binding for each user but use only one queue on each node, then route each message to the right user.
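For what it's worth, a minimal sketch of that final design, assuming amqplib, a direct exchange named chat, and one exclusive queue per node (all names are illustrative):

// node-consumer.js - one queue per node, one binding per connected user
const amqp = require('amqplib');

async function start(nodeId) {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertExchange('chat', 'direct', { durable: false });

  // One exclusive queue for this node; it is deleted when the node dies,
  // which also drops its bindings (no stale "user is here" state)
  const { queue } = await ch.assertQueue('node.' + nodeId, { exclusive: true });

  // Everything routed to this node's users arrives here
  await ch.consume(queue, (msg) => {
    const userId = msg.fields.routingKey;
    // ...look up the local websocket(s) for userId and emit the payload...
    ch.ack(msg);
  });

  return {
    // call these when a user connects to / disconnects from this node
    addUser: (userId) => ch.bindQueue(queue, 'chat', userId),
    removeUser: (userId) => ch.unbindQueue(queue, 'chat', userId),
    // publishing from any node reaches whichever node(s) the user is on
    send: (userId, payload) => ch.publish('chat', userId, Buffer.from(JSON.stringify(payload))),
  };
}

Because the queue is exclusive, a crashed node takes its bindings down with it, which addresses the stale-presence problem from option 2, and a direct exchange delivers to every queue bound with a matching key, so multiple devices (or group-chat members) are just extra bindings rather than duplicated publishes.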

How to keep information synchronized between the cloud and the device

I'm working on a project where the cloud needs to control device operation, and I want to keep information in sync between them.
The cloud needs to know the state of the device, e.g. when the network connection is interrupted and when it is restored.
When the network is restored, information modified on the cloud side should be synchronized to the device.
Does anyone have an idea of what my approach should look like? Any tips?
I intended to run resident background programs at both ends to detect this, but in practice the cloud in this project connects to more than one device, and multiple apps may run on one device, which makes this very tedious. Is there a simple component that implements this functionality?
I want both control information and data to stay synchronized between the cloud and the device.
Based on your tag, I'm assuming that you are using MQTT as a messaging protocol for your system. If so, to address your need for tracking the device-cloud connection state, MQTT specifies a feature called "Last Will and Testament".
From the MQTT 3.1.1 Standard Section 3.1.2.5:
If the Will Flag is set to 1 this indicates that, if the Connect request is accepted, a Will Message MUST be stored on the Server and associated with the Network Connection. The Will Message MUST be published when the Network Connection is subsequently closed unless the Will Message has been deleted by the Server on receipt of a DISCONNECT Packet [MQTT-3.1.2-8].
This can be leveraged to let the remote MQTT client on the cloud know when the device is connected and when it disconnects: publish an online payload to a topic (for example, device/conn_status) after a successful connection, and register a Last Will offline message on the same topic. Now, whenever the device client goes offline, the broker will publish the offline payload on its behalf, and the cloud client can act accordingly.
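For illustration, the device side might look like this with the MQTT.js client (broker URL, topic and payloads are just examples):

// device.js - register a Last Will so the broker announces our disconnect for us
const mqtt = require('mqtt');

const client = mqtt.connect('mqtt://broker.example.com', {
  will: {
    topic: 'device/conn_status', // same topic the cloud client subscribes to
    payload: 'offline',          // published by the broker if we vanish
    qos: 1,
    retain: true,                // late subscribers still see the last state
  },
});

client.on('connect', () => {
  // we made it online: overwrite the retained status ourselves
  client.publish('device/conn_status', 'online', { qos: 1, retain: true });
});

The retain flag is optional but useful here: a cloud client that subscribes later immediately receives the device's last known status instead of waiting for the next transition.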

How are Node.js+Socket.io+MongoDB webapps truly asynchronous?

I have a good old-style LAMP webapp. A week ago I needed to add a push notification mechanism to it.
Therefore, what I did was add node.js + socket.io on the server and poll the MySQL database every 10 seconds using node.js to check whether there were new items; if so, I sent them to the client(s) with socket.io.
I was pretty happy with the result, even though it is not proper realtime notification (as there is a lag of up to 10 seconds).
Now, I am about to build a new webapp which will need push notifications, too. I am wondering whether to go with the same approach as the first one (that I believe is more stable and mature) or to go totally Node.js, without PHP and Apache. As for the database, I have already decided to go for MongoDB.
Finally, my question is: if I go for Node.js+Socket.io+MongoDB will I get a truly near-real-time webapp? I mean, as soon as a new record is inserted into MongoDB, will there be some sort of event triggered that I can catch via node.js, do some checking on it and, if relevant, send the notification to the client? Or will there be anyway some sort of polling on the db server-side and lag, as with my first LAMP webapp?
A related question: can you build a realtime webapp on MySQL without doing any polling, as I did with my first app? Or do you need MongoDB (or Redis)?
I hope this question is not too silly - sorry, I am just starting with Node.js and co.
Thanks.
I understand your problem because I switched to node.js from php/apache/mysql too.
Generally node.js is stable; modules and your own scripts are the main sources of errors.
Real-time has nothing to do with the database; it's all about the client and the server. You can query as much data as you want in your requests and push it to the other client.
Choosing node.js is very wise, but it's harder to implement.
When you insert a new record into your db, the event is the request itself; you make a push event along with the database query, something like:
// Please note this is not real code, just an example of the idea
app.get('/query', function(request, response){
  // Query your database
  db.query('SELECT * FROM users', function(err, rows){
    // Push notification to dan
    socket.emit('database_query_executed', 'to_dan', rows);
    // End request
    response.end('success');
  })
})
Of course you can use MySQL! Or any database you want; as I said, real-time has nothing to do with the database, because the database is in the middle of the process and it's totally optional.
If you want to use node.js for push notifications and php/apache for mysql, then you will need to create two requests, one to each server, something like:
// this is javascript
ajax('http://node.yoursite.com/push', node_options)
ajax('http://php.yoursite.com/mysql_query', php_options)
or, if you want just one request, or you want to use a form, you can call your php, and inside php create an http or net request to node.js, something like:
// this is php
new HttpRequest('http://node.yoursite.com/push', HttpRequest::METH_GET);
Using:
A regular MongoDB Collection as the Store,
A MongoDB Capped Collection with Tailable Cursors as the Queue,
A Node worker with Socket.IO watching the Queue as the Worker,
A Node server to serve the page with the Socket.IO client, and to receive POSTed data (or however else the data gets added) as the Server
It goes like this:
The new data gets sent to the Server,
The Server puts the data in the Store,
The Server adds the data's ObjectID to the Queue,
The Queue will send the newly arrived ObjectID to the open Tailable Cursor on the Worker,
The Worker goes and gets the actual data in the ObjectID from the Store,
The Worker emits the data through the socket,
The client receives the data from the socket.
This is 'push' from the initial addition of the data all the way to receipt at the client - no polling, so as real-time as you can get given the processing time at each step.
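A minimal sketch of the Worker, assuming the official mongodb Node driver and Socket.IO (the collection names and the dataId field are illustrative):

// worker.js - tail the capped Queue collection and emit new data via Socket.IO
const { MongoClient } = require('mongodb');
const { Server } = require('socket.io');

const io = new Server(3000);

async function run() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const db = client.db('app');
  const store = db.collection('store'); // regular collection (the Store)
  const queue = db.collection('queue'); // capped collection (the Queue)

  // A tailable, awaitData cursor blocks on the capped collection and
  // yields new documents as the Server inserts them (no polling)
  const cursor = queue.find({}, { tailable: true, awaitData: true });
  for await (const entry of cursor) {
    const doc = await store.findOne({ _id: entry.dataId }); // fetch the real data
    if (doc) io.emit('notification', doc);                  // push to clients
  }
}

run().catch(console.error);

One caveat: a tailable cursor dies immediately on an empty capped collection, so the Queue should be seeded with a dummy document (or the cursor re-opened in a loop).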
Re: triggers in MongoDB - please see this answer: https://stackoverflow.com/a/12405093/1651408
There are much more convenient triggers in MySQL, but to call Node.js from them would require a bit of work with MySQL UDFs (user-defined functions), for instance pushing data through a Unix socket. Please note that this is necessary only when other applications (besides your Node.js process) are updating the database, and be sure to choose InnoDB as storage in this case (row- vs. table-level locking).
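For the Node.js side of that, a minimal sketch (assuming the hypothetical UDF writes one JSON document per line to a Unix domain socket at a path of your choosing):

// listen on a Unix domain socket for rows pushed from MySQL triggers/UDFs
const net = require('net');

const server = net.createServer((conn) => {
  conn.setEncoding('utf8');
  let buffered = '';
  conn.on('data', (chunk) => {
    buffered += chunk;
    const lines = buffered.split('\n');
    buffered = lines.pop(); // keep any partial line for the next chunk
    for (const line of lines.filter(Boolean)) {
      const row = JSON.parse(line);
      // ...check whether the row is relevant and emit it through socket.io...
    }
  });
});

server.listen('/tmp/mysql-events.sock');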
I can see no big problem with your technology choice of socket.io; even if client-side WebSockets aren't supported, you'll fall back (gracefully, I hope) to polling.
Finally, your question is not silly at all, since push technology is definitely superior to a flood of polling requests - it scales better. EDIT: However, I would not describe either technology as real-time.
Another EDIT: for a quite well-known and successful setup of this kind please read this: http://blog.fogcreek.com/the-trello-tech-stack/
Have you discovered Chole? It works separately from your web server and interfaces with it using HTTP POSTs. That way you can code your web app any way you want.
Actually, using push technology like Socket.IO helps you use
the server's resources efficiently, and it also covers everything from old browsers to modern ones by making a websocket or websocket-like connection.
Polling every 10 seconds means a full HTTP request each time, which is expensive, especially when a lot of users are present.
Unlike polling, push technology is relatively cheap: the user's client opens a dedicated socket (i.e. a websocket) to listen for the server's push notifications,
and usually your client-side JavaScript performs some actions when a push notification is received.
Using your LAMP stack and Socket.IO on a different port (other than 80) will be good enough to implement what you need.
But using Node.js + MongoDB + Socket.IO actually helps you manage your server's resources much more efficiently,
because all three are non-blocking by nature.
If you understand the non-blocking concept correctly and implement your app appropriately,
an identical app - same features, but a different language and database - would be able to handle a lot more requests than a typical LAMP stack.
(There is a well-known chart comparing the non-blocking way, as in Nginx, with the thread way, as in Apache, of handling concurrency.)
MySQL is a great database. I believe you won't need joins and transactions for realtime notifications.
MongoDB does not have those two features unless you implement something similar yourself.
Because it lacks those two, and because of some characteristics of its own, MongoDB can store and fetch data much faster than traditional SQL databases.
Switching from MySQL to MongoDB will decrease the time taken to insert and fetch data.
With JS you can open a socket to your server (not in old browsers); the server will have an ad-hoc program (on an ad-hoc port, so you need permission to open a port and run a program on your server) that will send data (almost) in real time to and from the client, without the HTTP protocol's overhead. Old browsers will just fall back to the polling mechanism.
I can't see another way to do this (though there are probably already "cooked" frameworks that do this).

How to get Node.js processes to communicate with one another

I have a Node.js chat app where multiple clients connect to a common chat room using socket.io. I want to scale this to multiple node processes, possibly on different machines. However, clients that connect to the same room are not guaranteed to hit the same node process. For example, user 1 may hit node process A and user 2 may hit node process B. They are in the same room, so if user 1 sends a message, user 2 should get it. What's the best way to make this happen, given that their connections are managed by different processes?
I thought about just having the node processes connect to Redis. That at least solves the problem that process A will know there's another user, user 2, in the room, but it still can't send to user 2 because process B controls that connection. Is there a way to register a "value changed" callback for Redis?
I'm in a server environment where I can't control any of the routing or load balancing.
Both node.js processes can be subscribed to some channel through redis pub/sub and listen to messages which you pass to this channel. For example, when user 1 connects to process A on the first machine, you can store in redis information about this user along with the information which process on which machine manages it. Then when user 2, which is connected to process B on the second machine, sends a message to user 1, you can publish it to this channel and check which process on which machine is responsible for managing communication with user 1 and respond accordingly.
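A minimal sketch of that idea, assuming the node-redis v4 client (channel and payload shapes are illustrative):

// each node process runs this; Redis fans every room message out to all processes
const { createClient } = require('redis');

async function start() {
  const pub = createClient();
  const sub = pub.duplicate(); // a subscribed connection can't issue other commands
  await pub.connect();
  await sub.connect();

  // every process hears every message for the room and delivers
  // it to whichever sockets it manages locally
  await sub.subscribe('room:lobby', (raw) => {
    const { from, text } = JSON.parse(raw);
    // ...emit to the local socket.io clients that joined "lobby"...
  });

  // called when a locally connected user sends a message
  return (from, text) => pub.publish('room:lobby', JSON.stringify({ from, text }));
}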
I have done some research on this. Below are my findings:
As yojimbo87 said, first just use Redis pub/sub (it is very optimized).
http://comments.gmane.org/gmane.comp.lang.javascript.nodejs/22348
Tim Caswell wrote:
It's been my experience that the bottleneck is the serialization and de-serialization of the data, not the actual channel. I'm pretty sure you can use named pipes, but I'm not sure what the API is. msgpack seems like a good format for the data interchange. There are a few libraries out there that implement msgpack or ipc frameworks on top of it.
But when serialization/deserialization becomes your bottleneck, I would try https://github.com/pgriess/node-msgpack. I would also like to test this out, because I think the sooner you have this the better.
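For illustration, swapping JSON for msgpack with that library is a one-line change at each end (assuming its pack/unpack API):

// compare JSON and msgpack encodings of the same payload
const msgpack = require('msgpack'); // https://github.com/pgriess/node-msgpack

const payload = { room: 'lobby', user: 42, text: 'hello' };

const asJson = Buffer.from(JSON.stringify(payload)); // baseline
const asMsgpack = msgpack.pack(payload);             // binary, typically smaller

console.log(asJson.length, asMsgpack.length);
console.log(msgpack.unpack(asMsgpack)); // round-trips back to the object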
