CFS sysctl_sched_latency kernel parameter - linux

We have the following kernel parameters:
sysctl_sched_min_granularity = 0.75 ms
sysctl_sched_latency = 6 ms
sched_nr_latency = 8
As I understand it (I don't know if correctly), sysctl_sched_latency says that every task in the runqueue should get to run within a 6 ms window.
So if a task arrives at time X, it should run at least once no later than X + 6 ms.
The function check_preempt_tick is executed periodically by task_tick_fair.
At the beginning of this function it checks whether delta_exec is greater than ideal_runtime:
ideal_runtime = sched_slice(cfs_rq, curr);
delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
if (delta_exec > ideal_runtime)
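For reference, here is a simplified, standalone sketch of how I understand sched_slice() derives that ideal_runtime (equal priorities assumed; the real code in kernel/sched/fair.c additionally scales each slice by the task's load weight, so this is only an approximation):

#include <stdio.h>

/* The three tunables from above, in milliseconds. */
static const double sysctl_sched_latency = 6.0;
static const double sysctl_sched_min_granularity = 0.75;
static const unsigned sched_nr_latency = 8;

/* Sketch of __sched_period(): if there are more runnable tasks than
 * sched_nr_latency, the period is stretched so that every task still
 * gets at least sysctl_sched_min_granularity. */
static double sched_period(unsigned nr_running)
{
    if (nr_running > sched_nr_latency)
        return nr_running * sysctl_sched_min_granularity;
    return sysctl_sched_latency;
}

int main(void)
{
    unsigned nr;
    /* With equal weights each task's slice is simply period / nr_running,
     * which reproduces the 1.5 ms / 2 ms / 3 ms numbers in the example below. */
    for (nr = 1; nr <= 10; nr++)
        printf("nr_running=%2u  period=%5.2f ms  slice=%5.2f ms\n",
               nr, sched_period(nr), sched_period(nr) / nr);
    return 0;
}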
For instance:
if we have 4 tasks (all the same priority), ideal_runtime should be 1.5 ms for the first task.
Let's assume this task runs for 1.5 ms and then exits.
So we have 3 tasks in the queue right now.
The next calculation, for task2, goes like this:
we have 3 tasks in the queue, so ideal_runtime should be 2 ms.
Task2 runs for 2 ms and exits.
Again, task3's slice is calculated: with 2 tasks in the queue it should run for 3 ms.
So in summary, task4 only gets to run after 1.5 ms (task1) + 2 ms (task2) + 3 ms (task3) = 6.5 ms,
so sysctl_sched_latency (6 ms) is exceeded.
How does CFS ensure that all tasks in the queue run within sysctl_sched_latency when the queue can change dynamically all the time?
Thanks,

Related

How do I return a worker back to the worker pool in Go

I am implementing a worker pool which can take jobs from a channel. After it kept timing out, I realised that when a panic occurs within a worker function, even though I have implemented a recovery mechanism, the worker still does not return to the pool.
In the golang playground, I was able to replicate the issue:
Worker Pool Reference
Modified code for the playground:
package main

import "fmt"
import "time"
import "log"

func recovery(id int, results chan<- int) {
    if r := recover(); r != nil {
        log.Print("IN RECOVERY FUNC - Failed worker: ", id)
        results <- 0
    }
}

func worker(id int, jobs <-chan int, results chan<- int) {
    for j := range jobs {
        defer recovery(id, results)
        if id == 1 {
            panic("TEST")
        }
        fmt.Println("worker", id, "started job", j)
        time.Sleep(time.Second)
        fmt.Println("worker", id, "finished job", j)
        results <- j * 2
    }
}

func main() {
    jobs := make(chan int, 100)
    results := make(chan int, 100)
    for w := 1; w <= 3; w++ {
        go worker(w, jobs, results)
    }
    for j := 1; j <= 10; j++ {
        jobs <- j
    }
    close(jobs)
    for a := 1; a <= 10; a++ {
        <-results
    }
}
For testing, I have implemented a panic when worker 1 is used. When run, the function panics as expected and goes into recovery as expected (it does not push a value into the channel either); however, worker 1 never seems to come back.
Output without panic:
worker 3 started job 1
worker 1 started job 2
worker 2 started job 3
worker 1 finished job 2
worker 1 started job 4
worker 3 finished job 1
worker 3 started job 5
worker 2 finished job 3
worker 2 started job 6
worker 3 finished job 5
worker 3 started job 7
worker 1 finished job 4
worker 1 started job 8
worker 2 finished job 6
worker 2 started job 9
worker 1 finished job 8
worker 1 started job 10
worker 3 finished job 7
worker 2 finished job 9
worker 1 finished job 10
Output with panic:
worker 3 started job 1
2009/11/10 23:00:00 RECOVERY Failed worker: 1
worker 2 started job 3
worker 2 finished job 3
worker 2 started job 4
worker 3 finished job 1
worker 3 started job 5
worker 3 finished job 5
worker 3 started job 6
worker 2 finished job 4
worker 2 started job 7
worker 2 finished job 7
worker 2 started job 8
worker 3 finished job 6
worker 3 started job 9
worker 3 finished job 9
worker 3 started job 10
worker 2 finished job 8
worker 3 finished job 10
How do I return worker 1 back to the pool after recovery (or in the recovery process)?
If you care about the errors, you could have an errors channel passed into the worker functions, and if they encounter an error, send it down the channel and then continue. The main loop could process those errors.
Or, if you don't care about the error, simply continue to skip that job.
The continue statement basically stops processing that iteration of the loop, and continues with the next.
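The deeper reason the worker disappears is that the deferred recovery only runs while the worker goroutine is already unwinding out of its for ... range loop, so after recovering there is nothing left for that goroutine to do. One way to apply the advice above (a sketch, not the only possible fix) is to wrap each job in its own function with its own deferred recover, so a panic only aborts that single job and the loop continues with the next one:

package main

import (
    "fmt"
    "log"
    "time"
)

// process runs a single job; its deferred recover only unwinds this one
// call, so the worker's range loop keeps running afterwards.
func process(id, j int, results chan<- int) {
    defer func() {
        if r := recover(); r != nil {
            log.Print("recovered worker ", id, " on job ", j, ": ", r)
            results <- 0 // or send to a dedicated errors channel instead
        }
    }()
    if id == 1 {
        panic("TEST")
    }
    fmt.Println("worker", id, "started job", j)
    time.Sleep(time.Second)
    fmt.Println("worker", id, "finished job", j)
    results <- j * 2
}

func worker(id int, jobs <-chan int, results chan<- int) {
    for j := range jobs {
        process(id, j, results) // a panic in one job no longer kills the worker
    }
}

func main() {
    jobs := make(chan int, 100)
    results := make(chan int, 100)
    for w := 1; w <= 3; w++ {
        go worker(w, jobs, results)
    }
    for j := 1; j <= 10; j++ {
        jobs <- j
    }
    close(jobs)
    for a := 1; a <= 10; a++ {
        <-results
    }
}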

Is the performance of notify_one() really this bad?

For the measurement below I've been using x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP running on the AMD FX(tm)-8150 Eight-Core Processor (which has a 64 byte cache-line size).
The full source code can be obtained here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx
which is independent of other libraries. Just compile with:
g++ -pthread -std=c++11 -O3 condition_variable_test.cxx
What I really tried to do here is measure how long it takes to execute a call to notify_one() when one or more threads are actually waiting, relative to how long that takes when no thread is waiting on the condition_variable used.
To my astonishment I found that both cases are in the microsecond range: when 1 thread is waiting it takes about 14 to 20 microseconds; when no thread is waiting it takes apparently less, but still at least 1 microsecond.
In other words, if you have a producer/consumer scenario where the consumer calls wait() whenever there is nothing for it to do, and the producer calls notify_one() every time it writes something new to the queue, assuming that the implementation of std::condition_variable is smart enough not to spend much time when no thread is waiting in the first place... then, oh horror, your application will become a lot slower than with the code that I wrote to TEST how long a call to notify_one() takes when a thread is waiting!
It seems that the code that I used is a must to speed up such scenarios. And that confuses me: why on earth isn't the code that I wrote already part of std::condition_variable?
The code in question is, instead of doing:
// Producer thread:
add_something_to_queue();
cv.notify_one();

// Consumer thread:
if (queue.empty())
{
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk);
}
You can gain a speed up of a factor of 1000 by doing:
// Producer thread:
add_something_to_queue();
int waiting;
while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
{
    if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
        continue;
    std::unique_lock<std::mutex> lk(m);
    cv.notify_one();
    break;
}

// Consumer thread:
if (queue.empty())
{
    std::unique_lock<std::mutex> lk(m);
    s_idle.fetch_add(1, std::memory_order_relaxed);
    cv.wait(lk);
}
Am I making some horrible mistake here? Or are my findings correct?
Edit:
I forgot to add the output of the benchmark program (DIRECT=0):
All started!
Thread 1 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.039ns
The average time spend on calling notify_one() (726141 calls) was: 17995.5 - 21070.1 ns.
Thread 1 finished.
Thread Thread Thread 5 finished.
8 finished.
7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.9ns, min: 1.7ns, max: 2.1ns, stddev: 0.088ns
The average time spend on calling notify_one() (726143 calls) was: 17207.3 - 22278.5 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.9ns, min: 1.8ns, max: 2ns, stddev: 0.055ns
The average time spend on calling notify_one() (726143 calls) was: 17910.1 - 21626.5 ns.
Thread 2 finished.
Thread 4 statistics: avg: 1.9ns, min: 1.6ns, max: 2ns, stddev: 0.092ns
The average time spend on calling notify_one() (726143 calls) was: 17337.5 - 22567.8 ns.
Thread 4 finished.
All finished!
And with DIRECT=1:
All started!
Thread 4 statistics: avg: 1.2e+03ns, min: 4.9e+02ns, max: 1.4e+03ns, stddev: 2.5e+02ns
The average time spend on calling notify_one() (0 calls) was: 1156.49 ns.
Thread 4 finished.
Thread 5 finished.
Thread 8 finished.
Thread 7 finished.
Thread 6 finished.
Thread 3 statistics: avg: 1.2e+03ns, min: 5.9e+02ns, max: 1.5e+03ns, stddev: 2.4e+02ns
The average time spend on calling notify_one() (0 calls) was: 1164.52 ns.
Thread 3 finished.
Thread 2 statistics: avg: 1.2e+03ns, min: 1.6e+02ns, max: 1.4e+03ns, stddev: 2.9e+02ns
The average time spend on calling notify_one() (0 calls) was: 1166.93 ns.
Thread 2 finished.
Thread 1 statistics: avg: 1.2e+03ns, min: 95ns, max: 1.4e+03ns, stddev: 3.2e+02ns
The average time spend on calling notify_one() (0 calls) was: 1167.81 ns.
Thread 1 finished.
All finished!
The '0 calls' in the latter output are actually around 20000000 calls.

setTimeout introducing delay in long running functions

The code below executes a long-running function (a sleep, to keep it simple), then schedules itself again using setTimeout. I am using Node.js 5.1.0.
var start_time = Math.ceil(new Date() / 1000);

function sleep(delay) {
    var start = new Date().getTime();
    while (new Date().getTime() < start + delay);
}

function iteration() {
    console.log("sleeping at :", Math.ceil(new Date() / 1000 - start_time));
    sleep(10000); // sleep 10 secs
    console.log("waking up at:", Math.ceil(new Date() / 1000 - start_time));
    setTimeout(iteration, 0);
}

iteration();
At the moment setTimeout is called, node is not doing anything else (event loop empty, in theory?), so I expect the callback to be executed immediately. In other words, I expect the program to sleep again immediately after it wakes up.
However here is what happens:
>iteration();
sleeping at : 0
waking up at: 10
undefined
> sleeping at : 10 //Expected: Slept immediately after waking up
waking up at: 20
sleeping at : 30 //Not expected: Slept 10 secs after waking up
waking up at: 40
sleeping at : 50 //Not expected: Slept 10 secs after waking up
waking up at: 60
sleeping at : 70 //Not expected. Slept 10 secs after waking up
waking up at: 80
sleeping at : 90 //Not expected: Slept 10 secs after waking up
waking up at: 100
sleeping at : 110 //Not expected: Slept 10 secs after waking up
As you can see, the program sleeps immediately after it wakes up the first time (which is what I expect), but not after that. In fact, it waits exactly another 10 seconds between waking up and sleeping.
If I use setImmediate things work fine. Any idea what I am missing?
EDIT:
Notes:
1- In my real code, I have a complex long algorithm and not a simple sleep function. The code above just reproduces the problem in a simpler use case.
That means I can't use setTimeout(iteration, 10000).
2- The reason why I use setTimeout with a delay of 0 is that my code blocks nodejs for so long that other, smaller tasks get delayed. By doing a setTimeout, nodejs gets back to the event loop to execute those small tasks before continuing to work on my main long-running code.
3- I completely understand that setTimeout(function, 0) will run as soon as possible rather than immediately; however, that does not explain why it runs with a 10-second delay given that nodejs is not doing anything else.
4- My code in Chrome runs without any problems. Nodejs bug?
First, look at the Node.js documentation:
It is important to note that your callback will probably not be called
in exactly delay milliseconds - Node.js makes no guarantees about the
exact timing of when the callback will fire, nor of the ordering
things will fire in. The callback will be called as close as possible
to the time specified.
To make your function more asynchronous, you could do something like this:
function iteration(callback) {
    console.log("sleeping at :", Math.ceil(new Date() / 1000));
    sleep(10000); // sleep 10 secs
    console.log("waking up at:", Math.ceil(new Date() / 1000));
    // Pass a function to setTimeout; calling callback(iteration) directly here
    // would execute it synchronously instead of deferring it.
    return setTimeout(function () { callback(iteration); }, 0);
}

iteration(iteration);
The next problem is your sleep function. Its while loop blocks the whole Node.js process in order to do something that node's own setTimeout would do for you.
function sleep(delay) {
    var start = new Date().getTime();
    while (new Date().getTime() < start + delay);
}
What happens now is that your sleep function blocks the process. That happens not only when you call the function directly, but also when Node.js invokes the function after the asynchronous setTimeout call.
And that is not a bug; it has to be like that, because the code inside your while loop could be something your iteration function needs to know about, especially when it is called after a setTimeout. So the sleep runs first, and the next callback is then invoked as soon as possible.
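To illustrate that last point on the simplified example only (this is just a sketch of letting setTimeout provide the delay; it does not apply to the real long-running computation mentioned in the question):

var start_time = Math.ceil(new Date() / 1000);

function iteration() {
    console.log("sleeping at :", Math.ceil(new Date() / 1000 - start_time));
    // Let the event loop provide the 10 second delay instead of busy-waiting,
    // so the process stays responsive in between.
    setTimeout(function () {
        console.log("waking up at:", Math.ceil(new Date() / 1000 - start_time));
        setTimeout(iteration, 0);
    }, 10000);
}

iteration();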
Edited answer after question change

Efficient use of Future

Is there any difference between these two approaches?
val upload = for {
  done <- Future {
    println("uploadingStart")
    uploadInAmazonS3 // take 10 to 12 sec
    println("uploaded")
  }
} yield done
println("uploadingStart")
val upload = for {
done <- Future {
uploadInAmazonS3 //take 10 to 12 sec
}
}yield done
println("uploadingStart")
I want to know the difference in terms of thread blocking.
Is a thread blocked here while executing these three lines:
println("uploadingStart")
uploadInAmazonS3 //take 10 to 12 sec
println("uploaded")
and in the other one no thread is blocked, is that so?
Or is a thread equally busy in both cases?
The code within the Future will be executed by some thread from the ExecutionContext (thread pool).
Yes, the thread which executes this part
println("uploadingStart")
uploadInAmazonS3 //take 10 to 12 sec
println("uploaded")
will be blocked, but not the calling thread (the main thread).
In the second case both println statements are executed by the main thread. Since the main thread simply proceeds after creating the Future, the println statements are executed without any delay.
The difference is that in the former code the printlns are executed when the Future is actually run, whereas in the second one the printlns are executed when the Future is declared (prepared, but not yet executed).
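A small self-contained sketch that makes the difference visible; uploadInAmazonS3 is stubbed out with a Thread.sleep and the global ExecutionContext is used, both of which are assumptions for the sake of the example:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureDemo extends App {
  def uploadInAmazonS3(): Unit = Thread.sleep(2000) // stand-in for the real upload

  def log(msg: String): Unit =
    println(s"[${Thread.currentThread.getName}] $msg")

  // Variant 1: both printlns run on the pool thread, around the upload.
  val upload1 = Future {
    log("uploadingStart (inside Future)")
    uploadInAmazonS3()
    log("uploaded (inside Future)")
  }

  // Variant 2: the println runs immediately on the main thread;
  // only the upload itself runs on the pool thread.
  log("uploadingStart (outside Future)")
  val upload2 = Future {
    uploadInAmazonS3()
  }

  // Wait so the program doesn't exit before the futures finish.
  Await.result(upload1, 10.seconds)
  Await.result(upload2, 10.seconds)
}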

Seeking help with an MT design pattern

I have a queue of 1000 work items and an n-proc machine (assume n = 4). The main thread spawns n (= 4) worker threads at a time (250 outer iterations) and waits for all threads to complete before processing the next n (= 4) items, until the entire queue is processed.
for (i = 0 to queue.Length / numprocs)
{
    for (j = 0 to numprocs)
    {
        CreateThread(WorkerThread, WorkItem)
    }
    WaitForMultipleObjects(threadHandle[])
}
The work done by each (worker) thread is not homogeneous. Therefore in one batch (of n), if thread 1 spends 1000 s doing work and the remaining 3 threads only 1 s, the above design is inefficient, because after 1 second the other 3 processors are idling. Besides, there is no pooling: 1000 distinct threads are being created.
How do I use the NT thread pool (I am not familiar enough with it, hence the long-winded question) and QueueUserWorkItem to achieve the above? The following constraints should hold:
The main thread requires that all work items are processed before it can proceed, so I would think that a wait-all construct like the one above is required.
I want to create only as many threads as there are processors (i.e. not 1000 threads at a time).
Also, I don't want to create 1000 distinct events, pass them to the worker threads, and wait on all the events, whether using the QueueUserWorkItem API or otherwise.
Existing code is in C++. I'd prefer C++ because I don't know C#.
I suspect that the above is a very common pattern and was looking for input from you folks.
I'm not a C++ programmer, so I'll give you some halfway pseudocode for it:
tcount = 0
maxproc = 4

while queue_item = queue.get_next()   # depends on the implementation of queue;
                                      # may well be: for i = 0; i < queue.length; i++
    while tcount == maxproc
        wait 0.1 seconds              # or some other interval that isn't as CPU intensive
                                      # as continuously running the loop
    tcount += 1                       # must be atomic (reading the value and writing the new
                                      # one must happen consecutively, without interruption
                                      # from other threads; I think ++tcount would handle
                                      # that in C++)
    new thread(worker, queue_item)

function worker(item)
    # ...do stuff with item here...
    tcount -= 1                       # must be atomic
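And for completeness, a hedged C++ sketch of the same idea using the Win32 thread pool via QueueUserWorkItem: every item is queued up front, the pool decides how many threads actually run, and the main thread waits on a single event that the last finished item signals. WorkItem and ProcessWorkItem are placeholders for the real work; error handling is omitted.

#include <windows.h>
#include <vector>

struct WorkItem {
    int id;                     // placeholder payload
};

struct BatchState {
    volatile LONG remaining;    // items still outstanding
    HANDLE allDone;             // manual-reset event, signalled when remaining hits 0
};

struct Task {
    WorkItem item;
    BatchState* batch;
};

static void ProcessWorkItem(const WorkItem& item)
{
    // ... the real (non-homogeneous) work goes here ...
}

static DWORD WINAPI WorkerCallback(LPVOID param)
{
    Task* task = static_cast<Task*>(param);
    ProcessWorkItem(task->item);
    if (InterlockedDecrement(&task->batch->remaining) == 0)
        SetEvent(task->batch->allDone);     // the last item wakes the main thread
    delete task;
    return 0;
}

int main()
{
    std::vector<WorkItem> queue(1000);
    for (size_t i = 0; i < queue.size(); ++i)
        queue[i].id = static_cast<int>(i);

    BatchState batch;
    batch.remaining = static_cast<LONG>(queue.size());
    batch.allDone = CreateEvent(NULL, TRUE, FALSE, NULL);

    // Hand every item to the NT thread pool; it sizes itself to the machine,
    // so slow and fast items just interleave instead of idling whole batches.
    for (size_t i = 0; i < queue.size(); ++i) {
        Task* task = new Task;
        task->item = queue[i];
        task->batch = &batch;
        QueueUserWorkItem(WorkerCallback, task, WT_EXECUTEDEFAULT);
    }

    WaitForSingleObject(batch.allDone, INFINITE);   // one wait, no 1000 events
    CloseHandle(batch.allDone);
    return 0;
}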
