From the little I know of Ruby's inner workings, it only releases memory once the process is finished.
I'm debugging a very long-running Sidekiq process on Heroku with memory issues, and I'm trying to figure out the boundaries of a background process.
So... say I have Sidekiq using 3 threads, running 3 jobs at a time, and I have 100 jobs total. That means it's likely there will never be a break between jobs, right? Does that mean memory won't be released until all 100 jobs are done? I'm assuming yes, it won't be released until the 100 are finished.
But say I switch to 1 thread, so the 100 jobs run one after another. Will each job now be its own process, or will Sidekiq treat all 100 queued jobs as a single process (even though it's only running one at a time)?
If so... is there a way to make each job its own individual process which releases memory after it's done?
Ruby uses garbage collection to reclaim memory, and it releases memory much sooner than you think. See also https://github.com/mperham/sidekiq/wiki/Problems-and-Troubleshooting#memory-bloat
Tech stack: Celery 5.0.5, Flask, Python, Windows OS (8 CPUs).
To give some background, my use case requires spawning one worker and one queue per country, as per the request payload.
I am using celery.control.inspect().active() to get the list of active workers and check whether a worker named {country}_worker exists in that list. If not, I spawn a new worker using:
subprocess.Popen(f'celery -A main.celery worker --loglevel=info -Q {queue_name} --logfile=logs\\{queue_name}.log --concurrency=1 -n {worker_name}')
This basically starts a new celery worker and a new queue.
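A minimal sketch of that check-and-spawn flow, assuming the Celery app object is named celery in main.py (as implied by -A main.celery); the helper name ensure_worker and the queue naming are made up for illustration:

import subprocess

from main import celery  # the Celery app object referenced by "-A main.celery"

def ensure_worker(country):
    # Spawn a dedicated worker and queue for this country if none is running yet.
    worker_name = f"{country}_worker"
    queue_name = country  # assumed naming scheme, purely illustrative

    # active() returns {"<node_name>@<host>": [tasks...]} or None when no worker replies
    active = celery.control.inspect().active() or {}
    if any(node.split('@')[0] == worker_name for node in active):
        return  # a worker for this country already exists

    # The string form of Popen matches the Windows-style command above
    subprocess.Popen(
        f'celery -A main.celery worker --loglevel=info -Q {queue_name} '
        f'--logfile=logs\\{queue_name}.log --concurrency=1 -n {worker_name}'
    )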
My initial understanding was that we can only spawn n workers, where n is cpu_count(). While testing my code with that assumption, I expected that when my 9th worker was spawned it would wait for one of the previous 8 workers to finish execution before taking up a task; instead, as soon as it was spawned it started consuming from its queue while the other 8 workers were still executing, and the same happened when I spawned more workers (15 in total).
This brings me to my question: is the --concurrency argument of a Celery worker responsible for parallel execution within that worker? If I spawn 15 independent workers, does that mean 15 different processes can execute in parallel?
Any help is appreciated in understanding this concept.
Edit: I also noticed that each new task received by the corresponding worker spawns a new python.exe process (as per Task Manager), and the previously spawned python process remains in memory unused. This does not happen when I spawn the worker as "solo" rather than "prefork". The problem with using solo: celery.inspect().active() does not return anything while the workers are executing something, and only responds when no tasks are in progress.
If your tasks are I/O bound, and it seems they are, then perhaps you should change the concurrency type to Eventlet. Then you can in theory have concurrency set even to 1000. However, it is a different execution model so you need to write your tasks carefully to avoid deadlocks.
If the tasks are CPU-bound, then I suggest you have concurrency set to N-1, where N is number of cores, unless you want to overutilise, in which case you can pick a slightly bigger number.
PS. You CAN spawn many worker processes, but since they all run concurrently (as separate processes in this case), their individual CPU utilisation would be low, so it really makes no sense to go above the number of available cores.
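As a rough sketch in the same Popen style as the question (the queue and worker names here are made up, and the eventlet pool requires the eventlet package to be installed):

import subprocess
from multiprocessing import cpu_count

# I/O-bound tasks: one worker with the eventlet pool and a high concurrency
subprocess.Popen(
    'celery -A main.celery worker --pool=eventlet --concurrency=1000 '
    '-Q io_queue -n io_worker'
)

# CPU-bound tasks: keep the prefork pool and size it to N-1 cores
prefork_concurrency = max(1, cpu_count() - 1)
subprocess.Popen(
    f'celery -A main.celery worker --pool=prefork --concurrency={prefork_concurrency} '
    '-Q cpu_queue -n cpu_worker'
)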
The idle task (a.k.a. the swapper task) is chosen to run when there are no more runnable tasks in the run queue at the point of task scheduling. But what is the purpose of this special task? Another question: why can't I find this thread/process in the "ps aux" output (PID = 0) from userland?
The reasons are historical and programmatic. The idle task is the task that runs when no other task is runnable, as you said. It has the lowest possible priority, which is why it only runs when no other task is runnable.
Programmatic reason: it simplifies process scheduling a lot, because you don't have to handle the special case "What happens if no task is runnable?"; there is always at least one runnable task, the idle task. It also lets you account CPU time per task: without the idle task, which task would the otherwise-unused CPU time be accounted to?
Historical reason: before CPUs could step down or enter power-saving modes, the CPU HAD to run at full speed at all times, so the idle task executed a series of NOP instructions when no tasks were runnable. Today, scheduling the idle task usually steps the CPU down by using HLT (halt) instructions, so power is saved. So the idle task still provides some functionality these days.
In Windows you can see the idle task in the process list: it's the System Idle Process.
The Linux kernel maintains wait lists of processes that are "blocked" on I/O, mutexes, etc. If there is no runnable process, the idle process is run until it is preempted by a task coming out of a wait queue.
The reason it exists as a task is so that you can measure (approximately) how much time the kernel is wasting due to blocking on I/O, locks, etc. Additionally, it makes the kernel code simpler, because the idle task is context-switched like every other task, rather than being a special case that would make changing kernel behaviour more difficult.
There is actually one idle task per CPU, but it's not held in the main task list; instead it's stored in the CPU's "struct rq" runqueue struct, as a struct task_struct *.
It gets activated by the scheduler whenever there is nothing better to do (on that CPU) and executes some architecture-specific code to idle the CPU in a low-power state.
You can use ps -ef to list the running processes. On some traditional Unix systems the first line shows PID 0, the swapper task; on Linux, PID 0 does not appear in the ps output at all.
I've heard that if a thread does not consume the entire time-slice allocated by the OS's thread scheduler, the remainder is wasted: e.g. if the time-slice is 10 ms and the thread finishes after only 5 ms, the remaining 5 ms are lost.
So if you have a lot of small, fast tasks that always take less time than the allocated time-slice, the waste could be significant system-wide.
If this is true, I guess that with standard workloads the impact is negligible, and it would only be a concern for specific use cases like servers running a single type of task.
Can you confirm this?
Do you have more information?
"I've heard that if a thread does not consume the entire time-slice allocated by the OS's thread-scheduler the remainder is wasted"
I don't think that's the case. On Linux, a running task goes into the terminated state when it exits, thus freeing the processor.
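As a small, purely illustrative sketch (Linux only), you can watch a child that has exited sit in the terminated/zombie state (Z) until its parent reaps it:

import os
import time

pid = os.fork()
if pid == 0:
    os._exit(0)                      # the child exits immediately
time.sleep(0.1)                      # give the child time to terminate
with open(f"/proc/{pid}/stat") as f:
    # /proc/<pid>/stat looks like: "<pid> (<comm>) <state> ..."
    state = f.read().split(')')[-1].split()[0]
print(f"child {pid} state after exit: {state}")   # prints 'Z' (terminated, not yet reaped)
os.waitpid(pid, 0)                   # reap the child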
... but if the OS's scheduler only "wakes up" at fixed times (e.g. with a frequency of 10ms/100 times per second)
The scheduler is invoked whenever a task needs to be scheduled. That happens when the time allocated to the running task has expired (which doesn't necessarily mean a fixed frequency), but also on I/O, events, exit, and other scenarios.
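A rough way to see this from user space (illustrative Python; exact numbers will vary by machine and interpreter): a thread blocked on an event is rescheduled as soon as the event fires, not at the next 10 ms tick:

import threading
import time

evt = threading.Event()
set_time = 0.0

def waker():
    global set_time
    time.sleep(0.05)                 # let the main thread block on the event first
    set_time = time.perf_counter()
    evt.set()                        # wakes the waiting thread immediately

threading.Thread(target=waker).start()
evt.wait()                           # the main thread blocks (is descheduled) here
wakeup_us = (time.perf_counter() - set_time) * 1e6
print(f"woken ~{wakeup_us:.0f} microseconds after the event was set")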
I have a process scheduled using Timer and TimerTask that runs nightly. Currently it takes about an hour to finish. Considering there are only 6000 records to loop through, upper management feels it is a very inefficient job. So I wanted to know if I could spawn multiple threads of the same job with different datasets, probably with each thread processing only 500 records at a time.
If I am hitting the same table for reads, inserts, and updates using multiple threads, would that be OK?
If so, how do I run multiple threads within a TimerTask? I suppose I could just create the threads and run them, but how do I ensure they run simultaneously rather than sequentially?
I am using Java 1.4, this runs on JBoss 2.4, and I make use of EJB 1.1 session beans in the process to read/update/add data.
There isn't enough info in your post for a surefire answer, but I'll share some thoughts:
It depends. Generally you can do reads in parallel, but not writes. If you're doing much more reading than writing, you're probably ok, but you may find yourself dealing with frustrating race conditions.
It depends. You are never guaranteed to have threads run in parallel. That's up to the CPU/kernel/JVM to decide. You just create threads to tell the machine that it's allowed to execute them in parallel.