Excessive Linux Latency - linux

Do you think that a latency of 50 msec is normal on a Linux system?
I have a program with many threads; one thread controls the movement of an object with a motor and photocells.
I have tried many things to get minimum latency, but I always get 50 msec, which causes a position error in the object.
Things I did:
- nice value set to -20
- Priority of the photocell control thread: SCHED_FIFO, 99
- Kernel configuration: CONFIG_PREEMPT=y
- mlockall (MCL_CURRENT | MCL_FUTURE);
Many times I lose 50 msec waiting for a photocell. I think the problem is not one of my other threads, but a process in the kernel.
Is it possible to reduce this latency? Is it possible to find out what is taking this extra 50 msec?
The thread that controls the photocells makes many "read" calls. Can this cause problems?
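For reference, a minimal sketch of the setup described in the list above (the helper name, error handling, and exact call order are illustrative):

#include <sched.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Apply the real-time settings listed above to the calling process/thread. */
static int setup_realtime(void)
{
    struct sched_param sp = { .sched_priority = 99 };

    if (setpriority(PRIO_PROCESS, 0, -20) != 0)        /* nice -20 */
        perror("setpriority");
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) { /* SCHED_FIFO, priority 99 */
        perror("sched_setscheduler");
        return -1;
    }
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {     /* avoid page faults */
        perror("mlockall");
        return -1;
    }
    return 0;
}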
/**********/
The situation now is:
There is only one thread running an infinite empty loop, only reading the time at the start and at the end of each iteration.
No access to disk, no access to GPIO, no serial ports, nothing.
The loop often takes 50 milliseconds.
I have not set CPU affinity; my processor has only one core.
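A simplified sketch of such a test loop (the 10 ms reporting threshold is just for illustration):

#include <stdio.h>
#include <time.h>

int main(void)
{
    struct timespec start, end;

    for (;;) {
        clock_gettime(CLOCK_MONOTONIC, &start);
        /* nothing here: no disk, no GPIO, no serial ports */
        clock_gettime(CLOCK_MONOTONIC, &end);

        long delta_us = (end.tv_sec - start.tv_sec) * 1000000L
                      + (end.tv_nsec - start.tv_nsec) / 1000L;
        if (delta_us > 10000)   /* report anything above 10 ms */
            printf("iteration took %ld us\n", delta_us);
    }
}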

I have been running tests on my program.
This is the code in the main function, before the program starts the threads, that causes the 50 msec latency:
struct sched_param lsPrio;
lsPrio.sched_priority = 1;
if (sched_setscheduler (0, SCHED_FIFO, &lsPrio) != 0)
    printf ("FALLO sched_set\n");
If I comment out these lines, the latency is reduced to about 1 msec.
Why do these lines cause latency?

Related

Precise Throughput Timer stuck with simple setup

I have a similar issue as Synchronizing timer hangs with simple setup, but with the Precise Throughput Timer, which is supposed to replace the Synchronizing Timer:
Certain cases might be solved via Synchronizing Timer, however Precise Throughput Timer has native way to issue requests in packs. This behavior is disabled by default, and it is controlled with "Batched departures" settings
Number of threads in the batch (threads). Specifies the number of samples in a batch. Note the overall number of samples will still be in line with Target Throughput
Delay between threads in the batch (ms). For instance, if set to 42, and the batch size is 3, then threads will depart at x, x+42ms, x+84ms
I'm setting the number of threads to 10, ramp-up to 1, and loop count to 1.
I'm adding only 1 HTTP Request (with a response time of less than 1 second), and before it a Test Action with a Precise Throughput Timer as a child, with the following setup:
Threads get stuck after 5 threads have succeeded.
EDIT 1
Following Dimitri T's solution:
I changed Duration to 100 and added the line to the logging configuration, and got 5 errors:
2018-03-12 15:43:42,330 INFO o.a.j.t.JMeterThread: Stopping Thread: org.apache.jorphan.util.JMeterStopThreadException: The thread is scheduled to stop in -99886 ms and the throughput timer generates a delay of 20004077. JMeter (as of 4.0) does not support interrupting of sleeping threads, thus terminating the thread manually.
EDIT 2
Following Dimitri T's solution, setting "Loop Count" to -1 executed 10 threads, but if I change the number of threads in the batch from 2 to 5, it executes only 3 threads and stops:
INFO o.a.j.t.JMeterThread: Stopping Thread: org.apache.jorphan.util.JMeterStopThreadException: The thread is scheduled to stop in -89233 ms and the throughput timer generates a delay of 19999450. JMeter (as of 4.0) does not support interrupting of sleeping threads, thus terminating the thread manually.
Set "Duration (seconds)" in your Thread Group to something non-zero (i.e. to 100)
Depending on what you're trying to achieve you might also want to set "Loop Count" to -1
You can also add the following line to log4j2.xml file:
<Logger name="org.apache.jmeter.timers" level="debug" />
This way you will be able to see what's going on with your timer(s) in the jmeter.log file

Finding requests per second for a distributed system - a textbook query

Found a question in Pradeep K Sinha's book
From my understanding it is safe to assume that any number of threads is available. But how do we compute the time?
Single-threaded:
We want to figure out how many requests per second the system can support. This is represented as n below.
1 second
= 1000 milliseconds
= 0.7n(20) + 0.3n(100)
Since 70% of the requests hit the cache, we represent the time spent handling requests that hit the cache with 0.7n(20). We represent the requests that miss the cache with 0.3n(100). Since the thread goes to sleep when there is a cache miss and it contacts the file server, we don't need to worry about interleaving the handling for the next request with the current one.
Solving for n:
1000
= 0.7n(20) + 0.3n(100)
= 0.7n(20) + 1.5n(20)
= 2.2n(20)
= 44n
=> n = 1000/44 ≈ 22.73.
Therefore, a single thread can handle about 22.73 requests per second.
Multi-threaded:
The question does not give much detail about the multi-threaded state, apart from the context switch cost. The answer to this question depends on several factors:
How many cores does the computer have?
How many threads can exist at once?
When there is a cache miss, how much time does the computer spend servicing the request and how much time does the computer spend sleeping?
I am going to make the following assumptions:
There is 1 core.
There is no bound on how many threads can exist at once.
On a cache miss, the computer spends 20 milliseconds servicing the request (e.g. checking the cache, contacting the file server, and forwarding the response to the client) and 80 milliseconds sleeping.
I can now solve for n:
1000 milliseconds
= 0.7n(20) + 0.3n(20).
On a cache miss, a thread spends 20 milliseconds doing work and 80 milliseconds sleeping. When the thread is sleeping, another thread can run and do useful work. Thus, on a cache miss, the thread only uses the CPU for 20 milliseconds, whereas when the process was single-threaded, the next request was blocked from being serviced for 100 milliseconds.
Solving for n:
1000 milliseconds
= 0.7n(20) + 0.3n(20)
= 1.0n(20)
= 20n
=> n = 1000/20 = 50.
Therefore, a multi-threaded process can handle 50 requests per second given the assumptions above.
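The same arithmetic written out as a small check (values taken from the assumptions above), in C:

#include <stdio.h>

int main(void)
{
    /* Single-threaded: every request occupies the thread for its full
       service time (20 ms on a cache hit, 100 ms on a miss). */
    double single = 1000.0 / (0.7 * 20 + 0.3 * 100);  /* ~22.73 req/s */

    /* Multi-threaded: on a miss, the thread only uses the CPU for 20 ms;
       the 80 ms of sleeping overlaps with other threads' work. */
    double multi = 1000.0 / (0.7 * 20 + 0.3 * 20);    /* 50 req/s */

    printf("single-threaded: %.2f requests per second\n", single);
    printf("multi-threaded:  %.2f requests per second\n", multi);
    return 0;
}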

How can I improve performance with FutureTasks

The problem seems simple: I have a (huge) number of operations to run, and the main thread can only proceed when all of those operations have returned their results. With a single thread, each operation took roughly 2 to 10 seconds, and the whole run took about 2.5 minutes. I then tried FutureTasks and submitted them all to an ExecutorService. They were all processed at once, but each one took roughly 40 to 150 seconds, and in the end the full process took about 2.1 minutes.
If I'm right, the threads were nothing but a way of executing everything at once while sharing the processor's power, whereas what I expected was the processor working hard to execute all the tasks at the same time, each taking about as long as it takes when executed on a single thread.
The question is: is there a way I can achieve this? (Maybe not with FutureTasks, maybe with something else, I don't know.)
Detail: I don't need them to run at exactly the same time; that doesn't really matter to me. What really matters is the performance.
You might have created way too many threads. As a consequence, the CPU was constantly switching between them, generating a noticeable overhead.
You probably need to limit the number of running threads; then you can simply submit your tasks and they will execute concurrently.
Something like:
ExecutorService es = Executors.newFixedThreadPool(8);
List<Future<?>> futures = new ArrayList<>(runnables.size());
for (Runnable r : runnables) {
    futures.add(es.submit(r)); // keep the Future so we can wait for it
}
// wait until they all finish:
for (Future<?> f : futures) {
    f.get();
}
// all done

Performance when calling fsync on multiple files vs one file

I have multiple threads, each accepting requests, doing some processing, storing the result in a commit log, and returning the results. In order to guarantee that at most x seconds' worth of data is lost, this commit log needs to be fsync'd every x seconds.
I would like to avoid synchronization between threads, which means they each need to have their own commit log rather than a shared log - is it possible to fsync all these different commit logs regularly in a performant way?
This is on Linux, ext4 (or ext3)
(Note: due to the nature of the code, even during normal processing the threads need to re-read some of their own recent data from the commit log (but never other threads' commit log data), so I believe it would be impractical to use a shared log, since many threads would need to read/write to it.)
If you only need flushing to happen every few seconds, do you need to fsync() at all? I.e. the OS should do it for you fairly regularly (unless the system is under heavy load and disk I/O is in short supply).
Otherwise, have your threads do something like:
if (high_resolution_time() % n == 0) {
fsync();
}
Where n is a value that would be e.g. 3 if high_resolution_time() returned Unix epoch time (which is expressed in seconds). This would make the thread flush the file every 3 seconds.
The problem, of course, is that you need much higher clock resolution to avoid having a thread that passes this code section several times per second flush its file multiple times in quick succession. I don't know what programming language you use, but in C on Linux you could use
gettimeofday:
struct timeval tv;
gettimeofday(&tv, NULL);
long long x = (long long)tv.tv_sec * 1000000LL + tv.tv_usec; // current time in microseconds
if (x % 3000000 == 0) { // fsync every 3 seconds
    fsync(fd);          // fd is the commit log's file descriptor
}
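An alternative to the exact modulo check, and closer to what the question describes, is to have each thread remember when it last synced its own log and flush once the interval has elapsed. A minimal sketch, assuming each thread owns the file descriptor of its own commit log:

#include <sys/time.h>
#include <unistd.h>

/* Call this from the thread's processing loop after appending to its log.
   Flushes the log if at least interval_usec microseconds have passed
   since the last flush. */
static void maybe_fsync(int fd, long long *last_sync_usec, long long interval_usec)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    long long now = (long long)tv.tv_sec * 1000000LL + tv.tv_usec;

    if (now - *last_sync_usec >= interval_usec) {
        fsync(fd); /* or fdatasync(fd) if the file metadata is not needed */
        *last_sync_usec = now;
    }
}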

context switch measure time

I wonder if any of you know how to use the function get_timer()
to measure the time of a context switch.
How do I find the average?
When should I display it?
Could someone help me out with this?
Is there any expert who knows this?
One fairly straightforward way would be to have two threads communicating through a pipe. One thread would do (pseudo-code):
for (n = 1000; n--;) {
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    write(pipe, now);
    sleep(1msec); // to make sure that the other thread blocks again on pipe read
}
Another thread would do:
context_switch_times[1000];
for (n = 1000; n--;) {
    time = read(pipe);
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    context_switch_times[n] = now - time;
}
That is, it would measure the time duration between when the data was written into the pipe by one thread and the time when the other thread woke up and read that data. A histogram of context_switch_times array would show the distribution of context switch times.
The times would include the overhead of the pipe read and write and of getting the time; however, it gives a good sense of how big the context switch times are.
In the past I did a similar test using stock Fedora 13 kernel and real-time FIFO threads. The minimum context switch times I got were around 4-5 usec.
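A more complete sketch of the pipe-based measurement above, assuming POSIX threads and Linux's CLOCK_MONOTONIC_RAW (error handling omitted):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000

static int pipefd[2];

static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Writer thread: timestamps each message just before writing it into the pipe. */
static void *writer(void *arg)
{
    (void)arg;
    for (int n = 0; n < ITERATIONS; n++) {
        int64_t t = now_ns();
        write(pipefd[1], &t, sizeof t);
        usleep(1000); /* make sure the reader blocks on the pipe again */
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int64_t total = 0;

    pipe(pipefd);
    pthread_create(&tid, NULL, writer, NULL);

    /* Reader: the delta between the timestamp in the message and the time we
       wake up approximates one pipe wakeup plus a context switch. */
    for (int n = 0; n < ITERATIONS; n++) {
        int64_t sent;
        read(pipefd[0], &sent, sizeof sent);
        total += now_ns() - sent;
    }

    pthread_join(tid, NULL);
    printf("average wakeup latency: %lld ns\n", (long long)(total / ITERATIONS));
    return 0;
}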
I don't think we can actually measure this time from user space, since you never know when the kernel will pick your process up again after its time slice expires. So whatever you get in user space includes scheduling delays as well. From user space you can get a close measurement, but not always an exact one. Even a jiffy of delay matters.
I believe LTTng can be used to capture detailed traces of context switch timings, among other things.

Resources