I am using both the ostream::write and ostream::flush operations in a multithreaded application, in the following sequence:
// <<-- start time measurement
{
    ostream::write();
    ostream::flush();
}
// <<-- end time measurement
The issue is that when I measure the time for the above sequence, I get a very short time (~10 msec), yet the time between thread entrances becomes very large (~400 msec), purely because of adding the ostream::write and ostream::flush calls.
Only once in a while does the measured time become larger, and I am not sure whether that is because of a context switch.
I am testing on a Linux machine with a dual-core CPU.
This confuses me. I had assumed that both of these functions are blocking; or is the writing actually done only after the flush?
EDIT:
Only one thread does the writing to the file.
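For reference, a minimal self-contained sketch of this kind of measurement (the std::ofstream, std::chrono clock, file name, and 64 KB payload are assumptions for illustration):

#include <chrono>
#include <fstream>
#include <iostream>
#include <vector>

int main() {
    std::ofstream out("out.bin", std::ios::binary);   // placeholder file name
    std::vector<char> buf(64 * 1024, 'x');            // placeholder payload

    auto start = std::chrono::steady_clock::now();    // <<-- start time measurement
    out.write(buf.data(), static_cast<std::streamsize>(buf.size()));
    out.flush();
    auto end = std::chrono::steady_clock::now();      // <<-- end time measurement

    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()
              << " ms\n";
}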
I understand the notion of update_rq_clock: it updates the run queue clock periodically on the system tick. But this function calls update_rq_clock_task(). What is the purpose of that function?
Within update_rq_clock, the difference between the current CPU timestamp and the run queue clock is calculated (the rq->clock variable holds the last clock value read from the CPU). That difference is added to rq->clock and, through update_rq_clock_task, also to rq->clock_task (which is rq->clock minus the time spent on interrupts and the stolen time).
There are a couple of code paths within the function that you can activate with kernel build options, but basically it boils down to:
...
rq->clock_task += delta;
...
update_rq_clock_pelt(rq, delta);
...
So together, the two functions update the run queue clock and the run queue's task clock, the latter excluding interrupts and stolen time (provided you activated that accounting through the kernel options), i.e. the time that the tasks actually used.
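If it helps, here is a rough, self-contained paraphrase of that combined logic. The field and function names follow the kernel, but this is only a sketch of the idea, not the real source; the IRQ/steal-time subtraction and the PELT update are only indicated by comments.

#include <chrono>
#include <cstdint>

// Stand-in for the per-CPU scheduler clock; the real kernel uses sched_clock_cpu().
static int64_t cpu_clock_ns() {
    return std::chrono::steady_clock::now().time_since_epoch().count();
}

struct rq {
    int64_t clock;       // last CPU timestamp seen by this run queue
    int64_t clock_task;  // the same, minus IRQ and stolen time when that accounting is enabled
};

static void update_rq_clock_task(rq *q, int64_t delta) {
    // With CONFIG_IRQ_TIME_ACCOUNTING / CONFIG_PARAVIRT_TIME_ACCOUNTING, the time
    // spent in interrupts or stolen by the hypervisor would be subtracted from delta here.
    q->clock_task += delta;             // the time actually available to the tasks
    // update_rq_clock_pelt(q, delta);  // the same delta also feeds load tracking (PELT)
}

static void update_rq_clock(rq *q) {
    int64_t delta = cpu_clock_ns() - q->clock;  // time elapsed since the last update
    if (delta < 0)
        return;
    q->clock += delta;
    update_rq_clock_task(q, delta);
}

int main() {
    rq q{cpu_clock_ns(), 0};
    update_rq_clock(&q);
}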
Apologies for posting this question here; I have read a few similar threads, but things are still not clear to me.
As we know, both processes and threads are independent sequences of execution. The typical difference is that threads (of the same process) run in a shared memory space, while processes run in separate memory spaces. (quoted from this answer)
The above explanation is not enough for me to visualize what is actually going on. It would be better if someone could explain, with examples, what a process is and how it differs from a thread.
Suppose I start MS Paint or some accounting program. Can we say that the accounting program is a process? I guess not; an accounting app may consist of multiple processes, and each process can start multiple threads.
I want to visualize which part can be called a process when we run an application. Please explain with an example, and also explain how a process and a thread are not the same. Thanks.
Suppose I start MS Paint or some accounting program. Can we say that the accounting program is a process?
Yes. Or rather, the currently running instance of it is.
I guess not; an accounting app may consist of multiple processes, and each process can start multiple threads.
It is possible for a process to start another process, but that is relatively unusual with windowed software.
A process is a running instance of a given executable; a windowed application, a console application and a background application would each involve a running process.
Process refers to the space in which the application runs. With a simple program like Notepad, if you open it twice so that you have two Notepad windows open, you have two Notepad processes. (This is also true of more complicated applications, but note that some do their own work to keep things down to one process: if you have Firefox open and you run Firefox again, there will briefly be two Firefox processes, but the second one will tell the first to open a new window before exiting, and the number of processes returns to one. Having a single process makes communication within that application simpler, for reasons we'll get to now.)
Now, each process has to have at least one thread. This thread contains information about just what it is trying to do (typically in a stack, though that is certainly not the only possible approach). Consider this simple C# program:
using System;

class Program
{
    static int DoAdd(int a, int b)
    {
        return a + b;
    }

    static void Main()
    {
        int x = 2;
        int y = 3;
        int z = DoAdd(x, y);
        Console.WriteLine(z);
    }
}
With this simple program, first 2 and 3 are stored in places on the stack (corresponding to the labels x and y). Then they are pushed onto the stack again and the thread moves to DoAdd. In DoAdd they are popped and added, and the result is pushed onto the stack. Then that result is stored on the stack (corresponding to the label z). Then it is pushed again and the thread moves to Console.WriteLine. That does its thing and the thread moves back to Main. Then it leaves and the thread dies. As it was the only foreground thread running, its death leads to the process also ending.
(I'm simplifying here, and I don't think there's a need to nitpick all of those simplifications right now; I'm just presenting a reasonable mental model).
There can be more than one thread. For example:
using System;
using System.Threading;

class Program
{
    static int DoAdd(int a, int b)
    {
        return a + b;
    }

    static void PrintTwoMore(object num)
    {
        Thread.Sleep(new Random().Next(0, 500));
        Console.WriteLine(DoAdd(2, (int)num));
    }

    static void Main()
    {
        for (int i = 0; i != 10; ++i)
            new Thread(PrintTwoMore).Start(i);
    }
}
Here the first thread creates ten more threads. Each of these pauses for a different length of time (just to demonstrate that they are independent) and then does a task similar to the one done by the first example's only thread.
The first thread dies after creating the 10th new thread and setting it going. The last of these 10 threads still running will be the last foreground thread, and so when it dies, so does the process.
Each of these threads can "see" the same methods and can "see" any data stored in the application, though there are limits on how likely they are to stamp over each other that I won't get into now.
A process can also start a new process and communicate with it. This is very common in command-line programs, but less so in windowed programs. In the case of windowed programs, it's also more common on *nix than on Windows.
One example of this is when Geany does a find-in-directory operation. Geany doesn't have its own find-in-directory functionality; rather, it runs the program grep and then interprets the results. So we start with one process (Geany) with its own threads running, and then one of those threads causes the grep program to run, which means we've also got a grep process running with its threads. Geany's threads and grep's threads cannot communicate with each other as easily as threads in the same process can, but when grep outputs results, a thread in Geany can read that output and use it to display those results.
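To make that pattern concrete, here is a small sketch in C++ (rather than C#) of one process launching grep and reading its output through popen; the grep pattern and path are only placeholders:

#include <stdio.h>
#include <iostream>

int main() {
    // Run grep in a child process and interpret its output, the way Geany's
    // find-in-directory feature does. The pattern and path are placeholders.
    FILE *child = popen("grep -rn \"TODO\" .", "r");
    if (!child)
        return 1;

    char line[4096];
    while (fgets(line, sizeof line, child))
        std::cout << "match: " << line;   // the parent re-presents the child's results

    pclose(child);
}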
The problem seems simple: I have a huge number of operations to run, and the main thread can only proceed once all of those operations have returned their results. I tried running them in a single thread; each operation took, let's say, from 2 to 10 seconds at most, and in the end the whole run took about 2.5 minutes. Then I tried future tasks and submitted them all to an ExecutorService. They were all processed at once, but each of them took, let's say, from 40 to 150 seconds, and at the end of the day the full run took about 2.1 minutes.
If I'm right, the threads were nothing but a way to execute everything at once while sharing the processor's power, whereas what I thought I would get was the processor working heavily to execute all the tasks at the same time, each taking about the same time it takes when executed alone in a single thread.
The question is: is there a way I can achieve this? (Maybe not with future tasks, maybe with something else, I don't know.)
Detail: I don't need them to run at exactly the same time; that actually doesn't matter to me. What really matters is the performance.
You might have created way too many threads. As a consequence, the CPU was constantly switching between them, generating a noticeable overhead.
You probably need to limit the number of running threads; then you can simply submit your tasks, and they will execute concurrently.
Something like:
ExecutorService es = Executors.newFixedThreadPool(8);
List<Future<?>> futures = new ArrayList<>(runnables.size());
for (Runnable r : runnables) {
    futures.add(es.submit(r));   // keep the Future so we can wait for it below
}
// wait until they all finish:
for (Future<?> f : futures) {
    f.get();
}
es.shutdown();
// all done
I have read that a forever-running process such as a daemon should have a sleep() in its while(1) or for(;;) loop. The reasoning given is that otherwise the process is always in the run queue and the kernel will always run it, which will block the other processes. I don't agree that it will block the other processes completely: if there is time slicing, the other processes will still execute. But it will certainly steal time from them and delay them, since this process is always in the runnable state.

By default, Linux schedules round-robin. The first task is swapd (process id 0), and then come the other tasks; this is a circular linked list with swapd as the first task. I believe this is still time-sliced, with a particular time slot for each process. These tasks are nothing but process descriptors, and I believe this list is maintained by the init process. Please do correct me here if I am wrong.

My other question is: if we do need to add a sleep(), what should its value be? How can we determine the sleep value to get the best results?
If your program has useful things to do, don't throttle it. A program can move out of the run queue by doing blocking stuff like IO and waiting.
If you are writing a polling loop that can spin an arbitrary number of times you probably want to throttle it a bit with sleep because spinning too often has little value.
That said, polling loops are a means of last resort. Normally, programs perform useful work with every instruction, so they don't sleep at all.
Sleep is almost certainly the wrong solution.
Usually what you do is call a blocking function that wakes you up when there's something for you to do.
For example, if you're a network service you'd want to remain inactive until a request arrives.
In other words, the core of your daemon should not look like this:
while (1)
{
    if (checkIfSomethingToDo())
        doSomething();
    else
        sleep(1);
}
but rather a little like this:
while (1)
{
    int ret = poll(fds, nfds, -1);
    if (ret > 0)
        doSomething();
}
Have the kernel put you to sleep until there's actual work to do. It's not hard to implement; you'll be a lot more efficient (not stealing CPU time from others only to waste it doing no actual work), and your response latency will go down too.
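For completeness, here is a self-contained sketch of that idea. The listening TCP socket and port 8080 are purely illustrative assumptions, and error handling is omitted:

#include <poll.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    // Illustrative setup: a single listening TCP socket on port 8080.
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(listener, (sockaddr *)&addr, sizeof addr);
    listen(listener, 16);

    pollfd fds[1] = {{ listener, POLLIN, 0 }};
    while (1) {
        int ret = poll(fds, 1, -1);   // sleeps in the kernel; uses no CPU while idle
        if (ret > 0 && (fds[0].revents & POLLIN)) {
            int client = accept(listener, nullptr, nullptr);
            // doSomething(client);   // handle the request here
            close(client);
        }
    }
}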
A sleep forces the OS to pass execution to another thread and is therefore helpful, or at least fair. Start with a sleep of one second; that should be OK.
I wonder if any of you know how to use the function get_timer() to measure the time of a context switch.
How do I find the average?
When should I display it?
Could someone help me out with this? Is there any expert who knows this?
One fairly straightforward way would be to have two threads communicating through a pipe. One thread would do (pseudo-code):
for (n = 1000; n--;) {
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    write(pipe, now);
    sleep(1msec); // to make sure that the other thread blocks again on the pipe read
}
Another thread would do:
context_switch_times[1000];
for (n = 1000; n--;) {
    time = read(pipe);
    now = clock_gettime(CLOCK_MONOTONIC_RAW);
    context_switch_times[n] = now - time;
}
That is, it would measure the time between when the data was written into the pipe by one thread and when the other thread woke up and read that data. A histogram of the context_switch_times array would show the distribution of context switch times.
The times would include the overhead of the pipe read and write and of getting the time; however, it gives a good sense of how big the context switch times are.
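Here is a runnable sketch of the same idea, written with std::thread rather than real-time threads; the sample count, the 1 ms writer delay, and the plain printout are arbitrary choices, and error handling is omitted:

#include <ctime>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <unistd.h>

static int64_t now_ns() {
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts);
    return int64_t(ts.tv_sec) * 1000000000 + ts.tv_nsec;
}

int main() {
    constexpr int samples = 1000;
    static int64_t context_switch_times[samples];
    int fds[2];
    pipe(fds);

    std::thread reader([&] {
        for (int n = 0; n < samples; ++n) {
            int64_t sent;
            read(fds[0], &sent, sizeof sent);            // blocks until the writer posts a timestamp
            context_switch_times[n] = now_ns() - sent;   // wake-up latency, incl. read/write overhead
        }
    });

    std::thread writer([&] {
        for (int n = 0; n < samples; ++n) {
            int64_t t = now_ns();
            write(fds[1], &t, sizeof t);
            usleep(1000);                                // let the reader block on the pipe again
        }
    });

    writer.join();
    reader.join();

    for (int n = 0; n < samples; ++n)
        std::printf("%lld\n", (long long)context_switch_times[n]);
}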
In the past I did a similar test using a stock Fedora 13 kernel and real-time FIFO threads. The minimum context switch times I got were around 4-5 usec.
I don't think we can actually measure this time from user space, since in the kernel you never know when your process will be picked up again after its time slice expires. So whatever you get in user space includes scheduling delays as well. From user space you can get a close measurement, but not always an exact one; even a jiffy of delay matters.
I believe LTTng can be used to capture detailed traces of context switch timings, among other things.