Can a Python Gearman Worker accept multiple tasks at once? - gearman

For example:
I have a task named "URLDownload"; the task's function is to download a large file from the internet.
Now I have one Worker Process running, but about 1000 files to download.
It is easy for a Client Process to create 1000 tasks and send them to the Gearman Server.
My question is: will the Worker Process do the tasks one by one, or will it accept multiple tasks at once?
If the Worker Process can accept multiple tasks, how can I limit the task pool size in the Worker Process?

Workers process one request at a time. You have a few options:
1) You can run multiple workers (this is the most common method). Workers sit in poll() when they aren't processing so this model works pretty well.
2) Write a fork() implementation around the worker. This way you can fire up a set number of worker processes, but don't have to monitor multiple processes yourself (see the sketch below).
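
A minimal sketch of option 2 in Python, assuming the python-gearman package, a job server on localhost:4730, and the "URLDownload" task from the question (the pool size and addresses are just placeholders):

    # Rough sketch: fork a fixed number of worker processes, each running its
    # own blocking Gearman worker loop. POOL_SIZE is effectively the limit on
    # how many downloads run concurrently.
    import multiprocessing

    import gearman


    def handle_url_download(worker, job):
        url = job.data  # the client sends the URL as the job payload
        # ... download the file to disk here ...
        return 'done: %s' % url


    def run_worker():
        worker = gearman.GearmanWorker(['localhost:4730'])
        worker.register_task('URLDownload', handle_url_download)
        worker.work()  # blocks, polling the job server for new jobs


    if __name__ == '__main__':
        POOL_SIZE = 10  # the "task pool size" from the question
        procs = [multiprocessing.Process(target=run_worker) for _ in range(POOL_SIZE)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

Each process still handles one job at a time, so the pool size is the maximum number of downloads in flight.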

Related

What does [Max tasks per child setting] exactly mean in Celery?

The doc says:
With this option you can configure the maximum number of tasks a worker can execute before it’s replaced by a new process.
Under what conditions will a worker be replaced by a new process? Does this setting mean that a worker, even with multiple processes, can only process one task at a time?
It means that when Celery has executed more tasks than the limit on one worker (the "worker" is a process if you use the default process pool), it will restart that worker automatically.
Say you use Celery for database manipulation and you forget to close the database connection; the auto-restart mechanism will help you close all pending connections.
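
For reference, a minimal sketch of where that limit is set (the app and broker names here are only examples):

    # Celery 4+ setting name; older releases call it CELERYD_MAX_TASKS_PER_CHILD.
    from celery import Celery

    app = Celery('proj', broker='redis://localhost:6379/0')

    # Replace each pooled worker process after it has executed 100 tasks.
    app.conf.worker_max_tasks_per_child = 100

    # Equivalent on the command line:
    #   celery -A proj worker --max-tasks-per-child=100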

What does it take to run background tasks in node?

If what I understand is correct, processing background tasks is a good way to free the main thread of CPU-bound tasks.
What I don't get is what systems like Bull or Kue use to run the tasks off the main thread.
Do they use threads? Do they require an entire Node process fork? Do they spawn child processes?
Bull is based on Redis, which handles the queuing of data for these jobs in its own process.
It is a lightweight, robust and fast job-processing queue. It uses Redis for persistence, so the queue is not lost if the server goes down for any reason.
The internal implementation of the Job can be seen here.
The same goes for the Kue module, which is a priority job queue backed by Redis and built for Node.js. The background tasks are powered by Redis.
This means that these modules depend on an external Redis process that enables creating background jobs.
Job-specific events are fired on the Job instances via Redis pubsub:
enqueue - the job is now queued
promotion - the job is promoted from the delayed state to queued
progress - the job's progress, ranging from 0-100
failed attempt - the job has failed, but has remaining attempts yet
failed - the job has failed and has no remaining attempts
complete - the job has completed
remove - the job has been removed
Delayed jobs are powered by the Redis queue, which notifies/triggers the callbacks back in the module.
That's not how Node.js works. Node.js internally uses an event loop to handle requests (thus making it an event-driven framework).
This entire event loop runs in a single thread. When you execute a long-running command (such as an I/O or network operation), the request is enqueued to the loop and the process doesn't block. When the operation completes, it triggers a callback in your code.

How to manage Managed Executor Service

I'm using a Managed Executor Service to implement a process manager which will process tasks in the background upon receiving a JMS message event. Normally, there will be a small number of tasks running (maybe 10 max), but what if something happens and my application starts getting hundreds of JMS message events? How do I handle such an event?
My thought is to limit the number of threads if possible, save all the other messages to a database, and run them when a thread becomes available. Thanks in advance.
My thought is to limit the number of threads if possible, save all the other messages to a database, and run them when a thread becomes available.
The detailed answer to this question depends on which Java EE app server you choose to run on, since they all have slightly different configuration.
Any Java EE app server will allow you to configure the thread pool size of your Managed Executor Service (MES); this is the number of worker threads for your thread pool.
Say you have 10 worker threads and you get flooded with 100 requests all at once: the MES will keep a queue of backlogged requests, and the worker threads will take work off the queue whenever they finish, until the queue is empty.
Now, it's fine if work goes to the queue sometimes, but if your work queue grows more quickly than your worker threads can drain it, you will run into problems. The solution is to increase your thread pool size; otherwise the backlog will get overrun and your server will run out of memory.
What if something happens and my application starts getting hundreds of JMS message events? How do I handle such an event?
If the load on your server will be so sporadic that tasks need to be saved to a database, it seems that the best approach would be one of the following:
increase the thread pool size
have the server immediately reject incoming tasks when the task backlog queue is full (see the sketch after this list)
have clients do a blocking wait for the server task queue to be not full (I would only advise this option if client task submission is in no way connected to user experience)
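
The bounded-backlog idea behind the second option is framework-agnostic; here is a rough illustration in plain Python (not the Managed Executor Service API, which is configured in the app server rather than in code):

    # Illustration only: a fixed pool of worker threads draining a bounded
    # backlog queue, with new work rejected as soon as the backlog is full.
    import queue
    import threading

    POOL_SIZE = 10      # analogous to the MES thread pool size
    BACKLOG_SIZE = 100  # analogous to the task backlog queue size

    backlog = queue.Queue(maxsize=BACKLOG_SIZE)


    def worker_loop():
        while True:
            task = backlog.get()
            try:
                task()  # process one unit of work
            finally:
                backlog.task_done()


    for _ in range(POOL_SIZE):
        threading.Thread(target=worker_loop, daemon=True).start()


    def submit(task):
        try:
            backlog.put_nowait(task)
            return True
        except queue.Full:
            # caller can persist the message to the database and retry later
            return False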

How can I prevent similar queues from running at the same time?

We currently process a set of tasks using queue workers in Laravel. When I run multiple php artisan queue:work processes, jobs end up running together (asynchronously). We are using Beanstalkd as the queue driver.
The issue is that in the queued jobs we are polling an API that only allows one concurrent session for a particular agent_id. That is, only one API call with the same agent_id can run at a time.
We thought of spinning up multiple php artisan queue:work processes with a filter on the queue_name matching the agent_id, but we have over 500 agents, so we would need 500 processes, which is not ideal.
Is there any way to implement a lock-style feature for each agent_id so that if a job is already running for a particular agent_id it will be sent back to the queue? Or are there any features of Beanstalkd that would allow for this?
The other option could also be to gracefully handle the rejection from the API when the user is already logged in (and send the job back to the queue). But this could get messy and could clutter the logs.
You could either run only a single worker that is capable of running the fetch-from-API job, or use some sort of external marshalling/lock service.
The options for that may be either an internal rate-limiting system, or some kind of shared atomic locking system: a Memcached or Redis server where a worker tries to set a lock key, and only the worker that successfully sets it gets to work on the task. An advantage of that is that as soon as the API request has completed, you can remove the lock, and then while that worker processes the results, a different worker can make a new request.
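
A rough sketch of that lock-key pattern, shown in Python with redis-py purely to illustrate the idea (the key names and timeout are made up; in Laravel you would do the equivalent with its Redis/cache facilities):

    # Only the worker that manages to SET the key (NX = only if absent) may call
    # the API for this agent_id; any other worker releases the job back to the queue.
    import redis

    r = redis.Redis(host='localhost', port=6379)


    def try_acquire_agent_lock(agent_id, ttl_seconds=60):
        return bool(r.set('lock:agent:%s' % agent_id, '1', nx=True, ex=ttl_seconds))


    def release_agent_lock(agent_id):
        r.delete('lock:agent:%s' % agent_id)


    def process_job(agent_id, call_api):
        if not try_acquire_agent_lock(agent_id):
            return False  # lock held elsewhere: requeue the job
        try:
            result = call_api()
        finally:
            # release as soon as the API call is done, so the next job for this
            # agent can start while this worker processes the result
            release_agent_lock(agent_id)
        # ... process `result` here ...
        return True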

Fork NodeJS clusters as the workload changes

I am trying to fork worker clusters up to a maximum of 10, and only if the workload increases. Can it be done?
I have tried strong-cluster-control's setSize, but I can't find an easy way of forking automatically (fork if many requests are coming in, for example), or of closing/"suiciding" forks (maybe with a timeout if nothing is being done, like in this answer).
This is my repo's main file at GitHub
Thank you in advance!!
I assume that you already have some idea as to how you would like to spread your load, so I will not include details about that and will instead focus on the interprocess communication required for this.
Notifying the master
To send arbitrary data to the master, you can use process.send() from a worker. The way I would go about this is probably something along these steps:
The application is started
Minimum amount of workers are spawned
Each worker will send the master a request message every time it receives a new request, via process.send()
The master keeps track of all the request events from all workers
If the amount of request events increases above a predefined threshold (e.g. > 100 requests/s), it spawns a new worker
If the amount of request events decreases below a predefined threshold it asks one of the workers to stop processing new requests and close itself gracefully (note that it should not simply kill the process to avoid interrupting ongoing requests)
The main point is: do not focus on time - focus on rate. In an application that is supposed to handle tens to thousands of requests per second, your setTimeout() (the task of which might be to kill the worker if it has been idle for too long) will never fire, because Node.js evenly distributes your load across your workers - you could start with one worker, but once you reach your maximum you will never drop to one worker again under continuous load, even if there is only one request per second.
It should be noted that it is counterproductive to spawn more workers than the amount of CPU cores you have at your disposal. It might, however, be beneficial to start with a single worker and incrementally increase the amount to all cores as load increases.
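
The plumbing in the answer is Node-specific (cluster.fork() and process.send()), but the rate-based bookkeeping is generic. Here is a rough sketch of the same idea using Python's multiprocessing, with made-up thresholds, just to make the steps concrete:

    # Workers report each handled request to the master via a queue; the master
    # counts events per one-second window and scales the pool within min/max bounds.
    import multiprocessing as mp
    import queue
    import time


    def worker(events):
        while True:
            time.sleep(0.01)        # stand-in for accepting and handling a request
            events.put('request')   # equivalent of process.send() in the answer


    def master(min_workers=1, max_workers=10, spawn_above=100, shrink_below=10):
        events = mp.Queue()
        workers = [mp.Process(target=worker, args=(events,)) for _ in range(min_workers)]
        for w in workers:
            w.start()

        while True:
            window_end = time.time() + 1.0   # one-second measurement window
            count = 0
            while time.time() < window_end:
                try:
                    events.get(timeout=0.1)
                    count += 1
                except queue.Empty:
                    pass
            if count > spawn_above and len(workers) < max_workers:
                w = mp.Process(target=worker, args=(events,))
                w.start()
                workers.append(w)
            elif count < shrink_below and len(workers) > min_workers:
                w = workers.pop()
                w.terminate()  # a real server should let in-flight requests finish first
                w.join()


    if __name__ == '__main__':
        master()

In Node.js the worker would call process.send() and the master would listen for those messages and call cluster.fork() or gracefully disconnect a worker, exactly as the steps above describe.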
