How to access to worker's queued requests? - node.js

I'm implementing a web server using nodejs which must serve a lot of concurrent requests. As nodejs processes the requests one by one, it keeps them in an internal queue (in libuv, I guess).
I also want to run my web server using cluster module, so there will be one requests queue per worker.
Questions:
If any worker dies, how can I retrieve its queued
requests?
How can I put retrieved requests into other workers' queues?
Is there any API to access to alive workers' requests queue?
By No. 3 I want to keep queued requests somewhere such as Redis (if possible), so in case of server crash, failure or even hardware restart I can retrieve them.

As you mentioned in the tags that you are-already-using/want-to-use redis, you can use queue-manager based on redis to do all the work for you.
Checkout https://github.com/OptimalBits/bull (or it's alternatives).
bull has a concept of queue. you add jobs to the queue and listen to the same queue from different processes/vms. bull will send the same job to only one listener and you have the ability to control how many jobs each listener is processing at the same time (concurrency-level).
In addition, if one of the jobs fails to run (in other words, the listener of the queue threw an error), bull will try to give the same job to different listener.

Related

Background processes in the aiohttp event loop

I have a web service that accepts post requests. A post request specifies a specific job to be executed in the background, that modifies a database used for later analysis. The sender of the request does not care about the result, and only needs to receive a 202 acknowledgment from the web service.
How it was implemented so far:
Flask Web service will get the http request , and add the necessary parameters to the task queue (rq workers), and return back an acknowledgement. A separate rq worker process listens on the queue and processes the job.
We have now switched to aiohttp, and realized that the web service can now schedule the actual job request in its own event loop, by using the aiohttp.ensure_future() method.
This however blurs the lines between the web-server and the task queue. On the positive side, it eliminates the need of having to manage the rq workers.
Is this considered a good practice?
If your tasks are not CPU heavy - yes, it is good practice.
But if so, then you need to move them to separate service or use run_in_executor(). In other case your aiohttp event loop will be blocked by this tasks and server will not be able to accept new requests.

How to manage Managed Executor Service

I'm using Managed Executor Service to implement a process manager which will process tasks in the background upon receiving an JMS message event. Normally, there will be a small number of tasks running (maybe 10 max) but what if something happens and my application starts getting hundred of JMS message events. How do I handle such event?
My thought is to limit the number of threads if possible and save all the other messages to database and will be run when thread available. Thanks in advance.
My thought is to limit the number of threads if possible and save all the other messages to database and will be run when thread available.
The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES), this is the number of worker threads for your thread pool.
Say you have a 10 worker threads, and you get flooded with 100 requests all at once, the MES will keep a queue of requests that are backlogged, and the worker threads will take work off the queue whenever they finish work until the queue is empty.
Now, it's fine if work goes to the queue sometimes but if overall your work queue increases more quickly than your worker threads can take work off the queue, you will run into problems. The solution to this is to increase your thread pool size otherwise the backlog will get overrun and your server will run out of memory.
what if something happens and my application starts getting hundred of JMS message events. How do I handle such event?
If the load on your server will be so sporadic that tasks need to be saved to a database, it seems that the best approach would be to either:
increase thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)

Distributing topics between worker instances with minimum overlap

I'm working on a Twitter project, using their streaming API, built on Heroku with Node.js.
I have a collection of topics that my app needs to process, which are pulled from MongoDB. I need to track each of these topics via the API, however it needs to be done such that each topic is tracked only once. As each worker process expires after approximately 1 hour, when a worker receives SIGTERM it needs to untrack each topic assigned, and release it back to the pool again.
I've been using RabbitMQ to communicate between app and worker processes, however with this I'm a little stuck. Are there any good examples, or advice you can offer on the correct way to do this?
Couldn't the worker just send a message via the messagequeue to the application when it receives a SIGTERM? According to the heroku docs on shutdown the process is allowed a couple of seconds (10) before it will be forecefully killed.
So you can do something like this:
// listen for SIGTERM sent by heroku
process.on('SIGTERM', function () {
// - notify app that this worker is shutting down
messageQueue.sendSomeMessageAboutShuttingDown();
// - shutdown process (might need to wait for async completion
// of message delivery to not prevent it from being delivered)
process.exit()
});
Alternatively you could break up your work in much smaller chunks and have workers only 'take' work that will run for a couple of minutes or even seconds max. Your main application should be the bookkeeper and if a process doesn't complete its task within a specified time assume it has gone missing and make the task available for another process to handle. You can probably also implement this behavior using confirms in rabbitmq.
RabbitMQ won't do this for you.
It will allow you to distribute the work to another process and/or computer, but it won't provide the kind of mechanism you need to prevent more than one process / computer from working on a particular topic.
What you want is a semaphore - a way to control access to a particular "resource" from multiple processes... a way to ensure only one process is working on a particular resource at a given time. In your case the "resource" will be the topic... but it will still be the resource that you want to control access to.
FWIW, there has been discussion of using RabbitMQ to implement a distributed semaphore in the past:
https://www.rabbitmq.com/blog/2014/02/19/distributed-semaphores-with-rabbitmq/
https://aphyr.com/posts/315-call-me-maybe-rabbitmq
but the general consensus is that this is a bad idea. there are too many edge cases and scenarios in which RabbitMQ will fail to work as proper semaphore.
There are some node.js semaphore libraries available. I would recommend looking at them, and using one of them. Have a single process manage the semaphore and decide which other process can / cannot work on which topic.

How can I prevent similar queues from running at the same time?

We currently process a set of tasks using Queue workers in Laravel. When I am using multiple threads of php artisan queue:work jobs end up running together (async). We are using Beanstalkd as the queue driver.
The issue is that in the queue work we are polling an API that only allows one concurrent session for a particular agent_id. That is, only one API call with the same agent_id can run at a time.
We thought of spinning up multiple php artisan queue:work threads with a filter on the queue_name matching the agent_id but we have over 500 agents therefore we would need 500 threads so this is not ideal.
Is there anyway to implement a lock style feature for each agent_id so that if a job is already running for a particular agent_id it will send it back to the queue? Or are there any features of beanstalkd that would allow for this?
The other option could also be to gracefully handle the rejection from the API when the user is already logged in (and send the job back to the queue). But this could get messy and could clutter the logs.
You could either run only a single worker that is capable of running the fetch-from-API job, or use some sort of external marshalling/lock service.
The options for that, may be either an internal rate limiting system, or some kind of common atomically locking system. A memcached or redis server where a worker tries to set a lock-key, and only the agent that successfully sets it, gets to work on the task. An advantage of that may be that as soon as the API request has been completed, you can remove the lock, and then while the worker processes the results, a different worker can make a new request.

gearman and retrying workers with unreliable external dependencies

I'm using gearman to queue a variety of different jobs, some which can always be serviced immediately, and some which can "fail", because they require an unreliable external service. (For example, sending email might require an SMTP server that's frequently unavailable.)
If an external service goes down, I'd like to keep all jobs which require that service on the queue, and retry one job occasionally (every few minutes, say) until the service becomes available again. (Perhaps optionally sending email if the service has not been available for hours.)
However I'd like jobs that don't require a failed service to be passed on to workers as soon as possible. How can this be achieved? (I'm happy to put some of the logic in the workers if necessary, although it seems to be a bit "late" to throttle on the worker side.)
Gearman should already be handle this. As long as you have some workers which specialise in handling jobs with unreliable dependancies and don't handle other jobs, along with some workers that either do all jobs, or just jobs without unreliable dependencies.
All you would need to do it add some code the unreliable dependancy workers so that they only accept jobs once that have checked that the dependent service is running, if the service is down then just have them wait a bit and retest the service (and continue ad infinitum), once the service is up then have them join the gearmand server, do job, return work, retest service, etc etc.
While the dependent service is down, the workers that don't handle jobs that need the service will keep on trundling through the job queue for the other jobs. Gearmand won't block an entire job queue (or worker) on one job type if there are workers available to handle other job types.
The key is to be sensible about how you define your job types and workers.
EDIT--
Ah-ha, I knew my thinking was a little out, (I wrote my gearman system about a year ago and haven't really touched it since). My solution to this type of issue was to have all the workers that normally handle dependent-job unregister their dependent job handling capability with the gearmand server once a failure was detected with the dependent service. (and any workers that are currently trying to complete that job should return a failure.) Once the service is backup - get those same workers to reregister their ability to handle that job. Do note this does require another channel of communications for the workers to be notified of the status of the dependent services.
Hope this helps

Resources