Wt (Witty) + Apache + Fastcgi - multithreading

I'm trying to understand how sessions are managed by Wt when using it through the fastcgi connector and apache, as a dynamic fastcgi script. My wt_config.xml is the default one. I changed nothing, so, the current Wt options about processes and threads management are the default ones:
<server>
<application-settings location="*">
<session-management>
<!--
<dedicated-process>
<max-num-sessions>100</max-num-sessions>
</dedicated-process>
-->
<shared-process>
<num-processes>1</num-processes>
</shared-process>
// Others....
</session-management>
<connector-fcgi>
<num-threads>1</num-threads>
// Others....
</connector-fcgi>
// Others...
</application-settings>
</server>
With such a configuration, how many process will I have? Just one for all the sessions? Or only one for each new session? What does shared-process exactly mean here?
If I would use my app, for example, as a chat, using server pushes to refresh contents, wouldn't I get problems if I have more than one processes running at the same time? (because different clients could connect to different processes).

When a Wt app works under Apache with FastCGI connector, first the Wt FastCGI dispatching process is launched. That process receives FastCGI requests from the web server and forwards them to worker processes which actually do your app job. Number of worker processes is controlled by shared-process option. It's called shared because the number of active Wt sessions can eventually exceed the number of worker processes. In this case the dispatching process tries to distribute new Wt sessions evenly to working processes.
Unlike shared-process option, when you use dedicated-process, a new worker process will be launched for every new Wt session.
Push events sent by WServer::post or WServer::postAll will be delivered only to Wt sessions within the current process. If you want to post events to all Wt sessions in all process, you should implement some custom inter-process communication mechanism.
I've discussed it here: http://redmine.webtoolkit.eu/issues/2897

Related

nodejs cluster distributing connection

In nodejs api doc, it says
The cluster module supports two methods of distributing incoming
connections.
The first one (and the default one on all platforms except Windows),
is the round-robin approach, where the master process listens on a
port, accepts new connections and distributes them across the workers
in a round-robin fashion, with some built-in smarts to avoid
overloading a worker process.
The second approach is where the master process creates the listen
socket and sends it to interested workers. The workers then accept
incoming connections directly.
The second approach should, in theory, give the best performance. In
practice however, distribution tends to be very unbalanced due to
operating system scheduler vagaries. Loads have been observed where
over 70% of all connections ended up in just two processes, out of a
total of eight.
I know PM2 is using the first one, but why it doesn't use the second? Just because of unbalnced distribution? thanks.
The second may add CPU load when every child process is trying to 'grab' the socket master sent.

How can I prevent similar queues from running at the same time?

We currently process a set of tasks using Queue workers in Laravel. When I am using multiple threads of php artisan queue:work jobs end up running together (async). We are using Beanstalkd as the queue driver.
The issue is that in the queue work we are polling an API that only allows one concurrent session for a particular agent_id. That is, only one API call with the same agent_id can run at a time.
We thought of spinning up multiple php artisan queue:work threads with a filter on the queue_name matching the agent_id but we have over 500 agents therefore we would need 500 threads so this is not ideal.
Is there anyway to implement a lock style feature for each agent_id so that if a job is already running for a particular agent_id it will send it back to the queue? Or are there any features of beanstalkd that would allow for this?
The other option could also be to gracefully handle the rejection from the API when the user is already logged in (and send the job back to the queue). But this could get messy and could clutter the logs.
You could either run only a single worker that is capable of running the fetch-from-API job, or use some sort of external marshalling/lock service.
The options for that, may be either an internal rate limiting system, or some kind of common atomically locking system. A memcached or redis server where a worker tries to set a lock-key, and only the agent that successfully sets it, gets to work on the task. An advantage of that may be that as soon as the API request has been completed, you can remove the lock, and then while the worker processes the results, a different worker can make a new request.

Node.js - child process handle urls related to itself

I'm trying to scale a chatter app using socket.io + cluster. Is it possible for child processes to handle incoming request belong to its process id (assigned when fork)?
For example:
http://mydomain/calculate?process=1
The above request is only handled by process 1, other processes will ignore it. In this way, I want to make sure requests of the same room are handled by same process, so I may don't have to use RedisStore as socket.io backend.
I also wonder how RedisStore works, because when using it, I found io.sockets.manager.rooms data are not accurate in all processes.
Edit:
Put it another way: can cluster master process dispatch request to different child processes based on the querystring?
The answer is no. The OS takes care of load balancing in this situation and in order to process query string you already have to be connected to a web server ( in your case child process ).
From my experience I find cluster a bit useless. It is a lot easier to spawn multiple NodeJS processes ( on multiple ports ) and put a proxy ( nginx? ) in front of them. It is easy and scalable.
As for socket.io: I don't think it works correctly with cluster ( because of sharing global variables, which causes issues ). Again: spawning separate NodeJS processes should fix the problem. Also it will be useful once you reach the point when you will have to scale to multiple machines. Any tricks with cluster won't help you at that point.
One last note: socket.io does not scale well. I suggest writing your own WebSocket server ( based on WS for example ) and implement your own scaling mechanism. For example based on all-to-all UDP pinging, which should scale well when dealing with small amount of servers ( 50? 100? ).

How does the cluster module work in Node.js?

Can someone explain in detail how the core cluster module works in Node.js?
How the workers are able to listen to a single port?
As far as I know that the master process does the listening, but how it can know which ports to listen since workers are started after the master process? Do they somehow communicate that back to the master by using the child_process.fork communication channel? And if so how the incoming connection to the port is passed from the master to the worker?
Also I'm wondering what logic is used to determine to which worker an incoming connection is passed?
I know this is an old question, but this is now explained at nodejs.org here:
The worker processes are spawned using the child_process.fork method,
so that they can communicate with the parent via IPC and pass server
handles back and forth.
When you call server.listen(...) in a worker, it serializes the
arguments and passes the request to the master process. If the master
process already has a listening server matching the worker's
requirements, then it passes the handle to the worker. If it does not
already have a listening server matching that requirement, then it
will create one, and pass the handle to the worker.
This causes potentially surprising behavior in three edge cases:
server.listen({fd: 7}) -
Because the message is passed to the master,
file descriptor 7 in the parent will be listened on, and the handle
passed to the worker, rather than listening to the worker's idea of
what the number 7 file descriptor references.
server.listen(handle) -
Listening on handles explicitly will cause the
worker to use the supplied handle, rather than talk to the master
process. If the worker already has the handle, then it's presumed that
you know what you are doing.
server.listen(0) -
Normally, this will cause servers to listen on a
random port. However, in a cluster, each worker will receive the same
"random" port each time they do listen(0). In essence, the port is
random the first time, but predictable thereafter. If you want to
listen on a unique port, generate a port number based on the cluster
worker ID.
When multiple processes are all accept()ing on the same underlying
resource, the operating system load-balances across them very
efficiently. There is no routing logic in Node.js, or in your program,
and no shared state between the workers. Therefore, it is important to
design your program such that it does not rely too heavily on
in-memory data objects for things like sessions and login.
Because workers are all separate processes, they can be killed or
re-spawned depending on your program's needs, without affecting other
workers. As long as there are some workers still alive, the server
will continue to accept connections. Node does not automatically
manage the number of workers for you, however. It is your
responsibility to manage the worker pool for your application's needs.
NodeJS uses a round-robin decision to make load balancing between the child processes. It will give the incoming connections to an empty process, based on the RR algorithm.
The children and the parent do not actually share anything, the whole script is executed from the beginning to end, that is the main difference between the normal C fork. Traditional C forked child would continue executing from the instruction where it was left, not the beginning like NodeJS. So If you want to share anything, you need to connect to a cache like MemCache or Redis.
So the code below produces 6 6 6 (no evil means) on the console.
var cluster = require("cluster");
var a = 5;
a++;
console.log(a);
if ( cluster.isMaster){
worker = cluster.fork();
worker = cluster.fork();
}
Here is a blog post that explains this
As an update to #OpenUserX03's answer, nodejs has no longer use system load-balances but use a built in one. from this post:
To fix that Node v0.12 got a new implementation using a round-robin algorithm to distribute the load between workers in a better way. This is the default approach Node uses since then including Node v6.0.0

nodeJS multi node Web server

I need to create multi node web server that will be allow to control number of nodes in real time and change process UID and GUID.
For example at start server starts 5 workers and pushes them into workers pool.
When the server gets the new request it searches for free workers, sets UID or GUID if needed, and gives it the request to proces. In case if there is no free workers, server will create new one, set GUID or UID, also pushes it into pool and so on.
Can you suggest me how it can be implemented?
I've tried this example http://nodejs.ru/385 but it doesn't allow to control the number of workers, so I decided that there must be other solution but I can't find it.
If you have some examples or links that will help me to resolve this issue write me please.
I guess you are looking for this: http://learnboost.github.com/cluster/
I don't think cluster will do it for you.
What you want is to use one process per request.
Have in mind that this can be very innefficient, and node is designed to work around those types of worker processing, but if you really must do it, then you must do it.
On the other hand, node is very good at handling processes, so you need to keep a process pool, which is easily accomplished by using node internal child_process.spawn API.
Also, you will need a way for you to communicate to the worker process.
I suggest opening a unix-domain socket and sending the client connection file descriptor, so you can delegate that connection into the new worker.
Also, you will need to handle edge-cases for timeouts, etc.
https://github.com/pgte/fugue I use this.

Resources