Should I spawn a new node process per game room?

I am making a card game on node.js and I am thinking about spawning a new process per game room. I plan on doing the connection using fork(). After quite some research, I found that this isn't the best approach, because I should have one node process per physical core. But isn't my approach better for scalability and modularity? If, say, a game room crashes, it wouldn't crash the rest of them. Can someone help me analyse the situation a bit better? I plan on running the game on AWS EC2 instances and expect a maximum of 1500 concurrent users, playing in rooms of 4 people and communicating with socket.io messages.

A single Node instance can handle that type of load, as one of NodeJS's strong points is real-time communication with many concurrent connections.
Regarding crashes, you need to plan for those. Some initial tips:
Catch errors and log error messages so that your Node instance does not fail outright. Often an error will stop that particular function chain from finishing correctly, but will not kill your process (see the sketch after these tips).
Persist your game state to another service, like a database, so things (like connections) can recover. Use-case example: "User loses connection and logs back in; they are reconnected to the room and can see the game in its current state."
You can auto-recover your Node process by running something like forever or PM2 (there are others). These will monitor and restart on process failure (though that shouldn't happen too often).
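As a hedged sketch of the first tip (the handlers and the logging destination are up to you; a process manager still covers genuine crashes):

process.on('uncaughtException', (err) => {
  // Log instead of silently dying; for some errors it is safer to
  // exit anyway and let forever/PM2 restart the process.
  console.error('Uncaught exception:', err);
});

process.on('unhandledRejection', (reason) => {
  console.error('Unhandled rejection:', reason);
});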

Related

Node.js multiplayer game multi threading and multi core

I'm working on a web multiplayer game, using Docker, node, and DigitalOcean with Postgres and Redis for data. My Docker machine has many CPUs, but node.js is only using a single CPU. Because of this, I'm nearing the limit of how far this game can scale.
I want to be able to leverage the other CPUs. Some of the methods I've heard and read about are:
Using child processes
Running multiple node servers in 1 droplet
Running multiple node servers in different droplets
I prefer not to run more node servers, if I can get away with it. The game structure is kind of like this:
Lobby (Player info, all online players etc..)
Rooms (Limited amount of players)
Games (Limited amount of players, 20-40m play time)
Lobby <=> Room => Games
I do not have a way of measuring (open to suggestions here too), but I'm guessing most of the CPU usage comes from individual games. I'm hoping for a solution that keeps the Lobby & Rooms in the main thread, while each game is created either in a randomly chosen thread or in the thread with the fewest active games.
Some possible problems I foresee are messages and player state. A player can have multiple browser tabs open at the same time, which means a single player object can have, say, 4 sessions & socket connections: 1 inside the lobby, 3 inside games. A player might want to send a private message from his game to a player inside the lobby. Also, when a game finishes, player state will get updated.
According to all these info, what would be my best way to proceed? Any docs, blogs, links, keywords, videos are appreciated.
The typical solution for this problem is to use Redis, which it sounds like you are already doing. The idea is that you program your game server to be "stateless", meaning that all of the state is stored in Redis.
Then, you can spawn N instances of your game server, where N is equal to the number of CPUs on your VPS droplet minus 1. (Let the OS have 1 CPU for good measure.)
This is a much cleaner solution than using NodeJS Web Workers, because you don't have to worry about IPC and the lifecycle of the processes.
Furthermore, updating your game servers becomes very easy. In a server that keeps state internally, your users would face downtime whenever you updated the server, meaning they would have to end any games they are currently playing so that you can reboot everything. But if all the state is in Redis, you can simply shut down all your instances and start new ones with no real downtime.
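As a minimal sketch of what "stateless" looks like in practice, assuming ioredis and an invented game:<id> key scheme:

const Redis = require('ioredis');
const redis = new Redis(); // defaults to localhost:6379

// Because the state lives in Redis, any instance can pick up any game.
async function saveGame(gameId, state) {
  await redis.set('game:' + gameId, JSON.stringify(state));
}

async function loadGame(gameId) {
  const raw = await redis.get('game:' + gameId);
  return raw ? JSON.parse(raw) : null;
}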
Have you tried worker threads? The worker_threads module allows you to use threads in Node.js, and its API is finally stable.
You can see a few examples in this blog post by Rich Trott.
Here's a minimal example:
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // This code is executed in the main thread and not in the worker.
  // Create the worker.
  const worker = new Worker(__filename);
  // Listen for messages from the worker and print them.
  worker.on('message', (msg) => { console.log(msg); });
} else {
  // This code is executed in the worker and not in the main thread.
  // Send a message to the main thread.
  parentPort.postMessage('Hello world!');
}

Distributing topics between worker instances with minimum overlap

I'm working on a Twitter project, using their streaming API, built on Heroku with Node.js.
I have a collection of topics that my app needs to process, which are pulled from MongoDB. I need to track each of these topics via the API, however it needs to be done such that each topic is tracked only once. As each worker process expires after approximately 1 hour, when a worker receives SIGTERM it needs to untrack each topic assigned, and release it back to the pool again.
I've been using RabbitMQ to communicate between app and worker processes, however with this I'm a little stuck. Are there any good examples, or advice you can offer on the correct way to do this?
Couldn't the worker just send a message via the message queue to the application when it receives a SIGTERM? According to the Heroku docs on shutdown, the process is allowed about ten seconds before it will be forcefully killed.
So you can do something like this:
// Listen for the SIGTERM sent by Heroku.
process.on('SIGTERM', function () {
  // Notify the app that this worker is shutting down, and only exit
  // once delivery has completed (the callback stands in for whatever
  // confirmation your queue client provides), so the message isn't lost.
  messageQueue.sendSomeMessageAboutShuttingDown(function () {
    process.exit();
  });
});
Alternatively, you could break up your work into much smaller chunks and have workers only 'take' work that will run for a couple of minutes, or even seconds, at most. Your main application should be the bookkeeper: if a process doesn't complete its task within a specified time, assume it has gone missing and make the task available for another process to handle. You can probably also implement this behavior using message acknowledgements in RabbitMQ, as in the sketch below.
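A rough sketch of that idea with amqplib (the queue name and the trackTopic handler are invented for illustration). With noAck: false, any message a dead worker never acknowledged is redelivered to another consumer:

const amqp = require('amqplib');

async function startWorker() {
  const conn = await amqp.connect('amqp://localhost');
  const ch = await conn.createChannel();
  await ch.assertQueue('topics', { durable: true });
  ch.prefetch(1); // take one topic at a time
  ch.consume('topics', async (msg) => {
    await trackTopic(msg.content.toString()); // hypothetical handler
    ch.ack(msg); // unacked messages go back to the queue if we die
  }, { noAck: false });
}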
RabbitMQ won't do this for you.
It will allow you to distribute the work to another process and/or computer, but it won't provide the kind of mechanism you need to prevent more than one process / computer from working on a particular topic.
What you want is a semaphore - a way to control access to a particular "resource" from multiple processes... a way to ensure only one process is working on a particular resource at a given time. In your case the "resource" will be the topic... but it will still be the resource that you want to control access to.
FWIW, there has been discussion of using RabbitMQ to implement a distributed semaphore in the past:
https://www.rabbitmq.com/blog/2014/02/19/distributed-semaphores-with-rabbitmq/
https://aphyr.com/posts/315-call-me-maybe-rabbitmq
but the general consensus is that this is a bad idea: there are too many edge cases and scenarios in which RabbitMQ will fail to work as a proper semaphore.
There are some node.js semaphore libraries available. I would recommend looking at them, and using one of them. Have a single process manage the semaphore and decide which other process can / cannot work on which topic.
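For illustration, here is roughly what that single managing process boils down to (plain in-memory bookkeeping, not any particular library):

// topic -> workerId; owned by the one manager process only
const assignments = new Map();

function acquire(topic, workerId) {
  if (assignments.has(topic)) return false; // already being tracked
  assignments.set(topic, workerId);
  return true;
}

function release(topic, workerId) {
  // Only the current holder may release (e.g. on its SIGTERM message).
  if (assignments.get(topic) === workerId) assignments.delete(topic);
}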

Node/Express: running specific CPU-instensive tasks in the background

I have a site that makes the standard data-bound calls, but it also has a few CPU-intensive tasks which are run a few times per day, mainly by the admin.
These tasks include grabbing data from the db, running a few different time-consuming algorithms, then re-uploading the data. What would be the best method for making these calls and having them run without blocking the event loop?
I definitely want to keep the calculations on the server so web workers wouldn't work here. Would a child process be enough here? Or should I have a separate thread running in the background handling all /api/admin calls?
The basic answer to this scenario in Node.js land is to use the core cluster module - https://nodejs.org/docs/latest/api/cluster.html
It is an acceptable API to:
easily launch worker node.js instances on the same machine (each instance will have its own event loop)
keep a live communication channel for short messages between instances
This way, any work done in the child instance will not block your master event loop.
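A minimal sketch of that pattern (the expensive function is a stand-in for your admin algorithms):

const cluster = require('cluster');

if (cluster.isMaster) {
  // Master: fork a worker and hand it a task over the IPC channel.
  const worker = cluster.fork();
  worker.on('message', (msg) => console.log('task finished:', msg));
  worker.send({ task: 'recalculate' });
} else {
  // Worker: its own process and event loop, so heavy work here
  // does not block the master.
  process.on('message', (msg) => {
    const result = runExpensiveAlgorithm(msg.task); // hypothetical CPU-heavy job
    process.send({ done: result });
  });
}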

Develop a clock and workers in node.js on heroku

I'm working on a service that needs to analyze data from social media networks every five minutes for different users. I'm developing it in node.js and I will implement it on Heroku.
According to this article on the Heroku website, the best way to do that is to separate the logic of the scheduler from the logic of the worker. The idea is to have one dyno dedicated to scheduling tasks, to avoid duplication. This dyno instructs a farm of workers (n dynos, as needed) to do the tasks.
Here is the Procfile for this architecture:
web: node web.js
worker: node worker.js
clock: node clock.js
The problem is how to implement this in node.js. I googled it, and the suggestion is to use a message queue system (like IronMQ, RabbitMQ or CloudAMQP). But I'm trying to keep my code and app simple, with as little need for add-ons as possible.
The question is: is there a way to communicate directly from my scheduler (clock) to the worker dynos?
Thanks for your answers.
Heroku dynos do not have fixed IP addresses, so there is no way to open a direct connection between them. That's why you need to create a separate server instance with a static IP or other fixed endpoint that acts as a go-between.
You have at least two viable options: a RabbitMQ-type message queue, or a stripped down version using a pub-sub redis feed. I generally use the latter because it's quick, simple, and sufficiently robust for all my needs (e.g. if a message gets lost every once in a blue moon, it's no big deal). If, however, it is essential that you never lose a message, you should use a full-blown message queue like RabbitMQ.
Setting up the redis implementation is very straightforward. There are several redis add-ons (I use RedisCloud) with free and inexpensive plans. When you provision them, you get an endpoint to connect to and a password. Then you just connect your web dyno(s) and worker dyno(s) to your redis instance such that your web app publishes tasks to a channel and the worker subscribes to that channel.
If you need the web app to communicate with the client after task completion, you just create another channel for the worker to publish task completion messages and the web app to listen for them.
You'll never get duplication of tasks as long as the channel behaves like a queue, with each task popped off by exactly one worker. Note that with a plain pub/sub channel every subscriber sees every message, so for distributing tasks a Redis list used as a queue (RPUSH on the producing side, BLPOP in the workers) is the safer fit; a sketch follows.
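A hedged sketch of that queue variant, assuming ioredis and an invented 'tasks' key (processTask is a stand-in):

const Redis = require('ioredis');

// Clock dyno: push a task onto the list.
async function enqueue(task) {
  const redis = new Redis(process.env.REDIS_URL);
  await redis.rpush('tasks', JSON.stringify(task));
}

// Worker dyno: block until a task arrives; only one worker gets each task.
async function workLoop() {
  const redis = new Redis(process.env.REDIS_URL);
  for (;;) {
    const [, raw] = await redis.blpop('tasks', 0); // 0 = wait forever
    await processTask(JSON.parse(raw)); // hypothetical handler
  }
}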
If I understood this correctly, you want to run the clock as one app, and then run the workers as separate apps? Sure, there is a direct way: open a connection from the clock app towards the worker app.
For example, have every worker open a client socket connection to the clock. Then the clock can communicate with them and relay orders.
Or use WebRTC. That way the workers will talk to the clock, but they can also talk to each other.
Or make an (authenticated) HTTP(s) REST endpoint on the worker where it will receive tasks. Like, POST /tasks will create a task on the worker. If the task is short, it can reply right away, so that the clock knows the job is done. Or if it's a longer task, it can acknowledge it, but later call an endpoint on the clock to say it's done, something like PUT /tasks/32.
Or even more directly, open a direct net connection towards the clock, for example on worker start (and the other way around). Use dgram and send UDP messages between worker and clock.
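For illustration, a minimal sketch of the HTTP REST variant above, assuming Express and a hypothetical runTask helper on the worker:

const express = require('express');
const app = express();
app.use(express.json());

// The clock POSTs tasks here.
app.post('/tasks', (req, res) => {
  res.status(202).end(); // acknowledge right away for longer tasks
  runTask(req.body); // hypothetical long-running job; when it finishes,
  // call back the clock (e.g. PUT /tasks/:id) to report completion
});

app.listen(3000);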
In any case, I also believe the people suggesting an MQ like RabbitMQ are right: it is much better to just push jobs/tasks onto a queue. The queue can then distribute tasks as needed, and based on the unacked count on the job queue, you can spin up more workers when needed.
But your question is very broad, so to get more detailed answers, you could provide a few more details.
This might be helpful.
http://blog.andyjiang.com/intermediate-cron-jobs-with-heroku/
Basically you require the worker file directly into the clock file.
I solved it in an easy way with the following three steps:
Set credit card information in the Heroku account;
Install the Heroku Scheduler add-on (you can use the command heroku addons:create scheduler:standard --app <yourAppName>);
Set up the script to run as a scheduled job.

PUB/SUB with short-lived publisher and long-lived subscribers

Context: OS: Linux (Ubuntu), language: C (actually Lua, but this should not matter).
I would prefer a ZeroMQ-based solution, but will accept anything sane enough.
Note: For technical reasons I can not use POSIX signals here.
I have several identical long-living processes on a single machine ("workers").
From time to time I need to deliver a control message to each of processes via a command-line tool. Example:
$ command-and-control worker-type run-collect-garbage
Each of workers on this machine should receive a run-collect-garbage message. Note: it would be perfect if the solution would somehow work for all workers on all machines in the cluster, but I can write that part myself.
This is easily done if I store some information about the running workers. For example, keep their PIDs in a known location and open a control Unix domain socket on a known path with the PID somewhere in it. Or open a TCP socket and store the host and port somewhere.
But this would require careful management of the stored information, e.g. what if a worker process suddenly dies? (Nothing unmanageable, but, still, extra fuss.) Also, the information needs to be stored somewhere, adding an extra bit of complexity.
Is there a good way to do this in PUB/SUB style? That is, workers are subscribers, command-and-control tool is a publisher, and all they know is a single "channel url", so to say, on which to come for messages.
Additional requirements:
Messages to the control channel must wake up workers from their poll (select, whatever) loop.
Message delivery must be guaranteed, and it must reach each and every worker that is listening.
Worker should have a way to monitor for messages without blocking — ideally by the poll/select/whatever loop mentioned above.
Ideally, the worker process should be a "server" in a sense: it should not have to bother with keeping connections to the "channel server" (if any) persistent, etc., or this should be done transparently by the framework.
Usually such a pattern requires a proxy for the publisher, i.e. you send to the proxy, which immediately accepts delivery and then reliably forwards to the end subscriber workers. The ZeroMQ guide covers a few different methods of implementing this.
http://zguide.zeromq.org/page:all
Given your requirements, Steve's suggestion does seem the simplest: run a daemon which listens on two known sockets; the workers connect to one, the command tool pushes to the other, and the daemon redistributes messages to the connected workers.
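As a rough sketch of that daemon, here in Node.js with the zeromq package to match the rest of this page (the question's C/Lua bindings follow the same shape; the socket paths are invented):

const zmq = require('zeromq');

// Daemon: the command tool pushes to one socket; every connected
// worker receives a copy from the other.
async function runProxy() {
  const pull = new zmq.Pull();
  const pub = new zmq.Publisher();
  await pull.bind('ipc:///tmp/cnc-in');
  await pub.bind('ipc:///tmp/cnc-out');
  for await (const [msg] of pull) {
    await pub.send(msg); // fan out to all subscribers
  }
}

// Worker side: the async iterator integrates with the event loop,
// so waiting for control messages does not block other work.
async function listenForCommands() {
  const sub = new zmq.Subscriber();
  sub.connect('ipc:///tmp/cnc-out');
  sub.subscribe(); // all topics
  for await (const [msg] of sub) {
    console.log('control message:', msg.toString());
  }
}

Note that plain PUB/SUB alone does not give the guaranteed delivery asked for; the guide's reliable pub/sub patterns layer that on top.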
You could do something complicated that would probably work, by effectively nominating one of the workers. For example, on startup workers attempt to bind() a PUB ipc:// socket somewhere accessible, like /tmp. The one that wins bind()s a second IPC as a PULL socket and acts as a forwarder device on top of its normal duties; the others connect() to the original IPC. The command line tool connect()s to the second IPC and pushes its message. The risk there is that the winner dies, leaving a locked file. You could identify this in the command line tool, rebind, then sleep (to allow the connections to be established). Still, that's all a little bit complex; I think I'd go with a proxy!
I think what you're describing would fit well with a gearmand/supervisord implementation.
Gearman is a great task queue manager and supervisord would allow you to make sure that the process(es) are all running. It's TCP based too so you could have clients/workers on different machines.
http://gearman.org/
http://supervisord.org/
I recently set something up with multiple gearmand nodes, linked to multiple workers, so that there's no single point of failure.
edit: Sorry - my bad, I just re-read and saw that this might not be ideal.
Redis has some nice and simple looking pub/sub functionality that I've not used yet but sounds promising.
Use a multicast PUB/SUB. You'll have to make sure the pgm option is compiled into your ZeroMQ distribution (man 7 zmq_pgm).
