Multiprocessors and multithreading - Operating Systems - multithreading

I was going through topics of Operating Systems using the text book by Galvin (the 9th edition). In Chapter 4 on multi-threading, I came across problem 14 which is as follows:
A system with two dual-core processors has four processors available for scheduling. A CPU-intensive application is running on this system. All input is performed at program start-up, when a single file must be opened. Similarly, all output is performed just before the program terminates, when the program results must be written to a single file. Between startup and termination, the program is entirely CPU-bound. Your task is to improve the performance of this application by multithreading it. The application runs on a system that uses the one-to-one threading model (each user thread maps to a kernel thread).
• How many threads will you create to perform the input and output? Explain.
• How many threads will you create for the CPU-intensive portion of the application? Explain.
For the first part, I think we could create 4 threads, both for reading the input from a file and for writing the output to a file. This is because no data is being updated during either input or output.
For the second part, the nature of the operation to be carried out on the data is not known. For example, it could be (1) printing the average of all the data, or (2) printing the average of the first and last data points, then the average of the second and second-to-last data points, and so on.
Therefore, for the second part, one thread could be employed to handle the operation.
But I am not sure that the answer I gave here is right, so I would be very grateful if you could let me know the right answer.

The question is testing if you understand some principles about parallelizing work to increase speed. Some of these principles are:
In the usual case, reading and writing a single file cannot be sped up using multiple cores. The speed of file I/O is determined by the properties of where and how the file is stored. Throwing more threads at it is not going to help, because those threads are just going to be waiting for the I/O to complete.
How many threads you use for CPU intensive portion depends entirely on what is being computed. If the program is generating imagery for a movie, use 4 threads because that is completely parallel. If the workload is entirely serial, use 1 thread because adding more threads won't help (by definition).
Computing the averages in your example is almost completely parallel, so you should use four threads, not one.
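As a rough illustration of why the second averaging example parallelizes well, the sketch below splits the pairs (first with last, second with second-to-last, and so on) into independent slices, one per worker thread. The function names and the default of four workers are assumptions made for illustration; they are not part of the textbook problem. Note also that in CPython the global interpreter lock keeps pure-Python threads from running CPU-bound code truly in parallel, so this shows how the work decomposes rather than demonstrating an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def pair_averages_slice(data, start, stop):
    """Average data[i] with its mirror element for i in [start, stop)."""
    n = len(data)
    return [(data[i] + data[n - 1 - i]) / 2 for i in range(start, stop)]


def pair_averages(data, workers=4):
    """Compute all first/last pair averages using up to `workers` threads."""
    pairs = (len(data) + 1) // 2       # a middle element pairs with itself
    if pairs == 0:
        return []
    step = -(-pairs // workers)        # ceiling division: pairs per slice
    bounds = [(s, min(s + step, pairs)) for s in range(0, pairs, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: pair_averages_slice(data, *b), bounds))
    # The slices never touch each other's data, so no locking is needed;
    # the partial results only have to be concatenated in order.
    return [avg for part in parts for avg in part]
```

No slice reads a value that another slice writes, which is what makes the computation "almost completely parallel": the only serial step is joining the partial results at the end.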

Related

Do Rust threads run at the same time in parallel? Documentation sounds like it does not [duplicate]

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptible process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have higher priority than the other? What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Toolkit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the topic. Books addressing them would fill a small library, and there are thousands of links. Not surprisingly, within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
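One common way to make that last point concrete is Amdahl's law. The answer above does not mention it, but it is the usual back-of-the-envelope model: if only a fraction p of the run time can be parallelized, then n cores give a speedup of at most 1 / ((1 - p) + p / n). A minimal sketch, with a hypothetical function name:

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part takes the same time no matter how many cores exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / cores)
```

Under this model, a program that is half serial gains at most 1.6x from four cores and can never exceed 2x, however many cores are added.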
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
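If thread synchronisation primitives like mutexes are used, then a thread that tries to take a lock already held by another thread will block until the lock is released, so the two do not run in parallel during that time.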
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know how it can appear to execute more than one thread at the same time.
In single-processor systems, only a single thread of execution runs at a given instant, because single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other, is incredibly complex. In general there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but that's only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
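The experiment described above could be sketched roughly as follows. Here time.sleep stands in for a slow, blocking system call (an assumption; any call that blocks in the kernel would do), and the function and thread names are made up for the example. Instead of printing, the threads record their messages in a list so the interleaving can be inspected afterwards.

```python
import threading
import time


def run_demo(block_seconds=0.5, ticks=5):
    """Run a blocking thread next to a busy one; return the recorded events."""
    events = []
    lock = threading.Lock()

    def record(message):
        with lock:                     # keep appends from the two threads tidy
            events.append(message)

    def blocker():
        record("blocker: start")
        time.sleep(block_seconds)      # stands in for a slow system call
        record("blocker: done")

    def ticker():
        for i in range(ticks):
            record(f"ticker: {i}")
            time.sleep(block_seconds / (ticks * 4))

    thread_a = threading.Thread(target=blocker)
    thread_b = threading.Thread(target=ticker)
    thread_a.start()
    thread_b.start()
    thread_a.join()
    thread_b.join()
    return events
```

Calling run_demo() and printing the returned list should show the ticker messages landing between the blocker's start and done entries: the thread that is waiting does not hold up the other one.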
Yes, a program can run two threads at the same time.
It is called multithreading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
If you want, you can make thread B wait until thread A completes, or the reverse.
Two threads can run truly in parallel only on a multi-core system. With only one core, only one thread runs at any given moment, and the other threads wait in a queue for their turn on the processor.

Why do we need semaphores on single cpu?

I have read that we use semaphores inside the Linux kernel, and I have read that semaphores have advantages even on a single CPU (where we can run only one process\thread at a time). Can anyone please give me an example of a problem that a semaphore solves (inside the kernel)?
In my view, there can be a problem only if we have more than one CPU, because two processes may call system calls that use the same data structure, and probably cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections at once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet, the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web server does multiple things at once is part of its interface: the I/O for the connections is interleaved, if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource, etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
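To make the hazard concrete, here is a minimal user-space sketch (not kernel code) in which several threads perform a read-modify-write on shared data. A context switch between the read and the write could lose an update even on a single core; the binary semaphore rules that out. The function name and the dictionary used as shared state are assumptions for illustration.

```python
import threading


def deposit_many(times, workers=4):
    """Increment a shared balance from several threads, guarded by a semaphore."""
    balance = {"value": 0}
    guard = threading.Semaphore(1)     # binary semaphore: one holder at a time

    def worker():
        for _ in range(times):
            with guard:                # acquire; released on leaving the block
                current = balance["value"]        # read
                balance["value"] = current + 1    # modify and write back

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance["value"]
```

With the semaphore in place the result is always workers * times. Removing the guard makes the outcome depend on where the scheduler happens to switch threads, and that is true whether the machine has one core or many.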
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work on one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.

single file reader/multiple consumer model: good idea for multithreaded program?

I have a simple task that is easily parallelizable. Basically, the same operation must be performed repeatedly on each line of a (large, several Gb) input file. While I've made a multithreaded version of this, I noticed my I/O was the bottleneck. I decided to build a utility class that involves a single "file reader" thread that simply goes and reads straight ahead as fast as it can into a circular buffer. Then, multiple consumers can call this class and get their 'next line'. Given n threads, each thread i's starting line is line i in the file, and each subsequent line for that thread is found by adding n. It turns out that locks are not needed for this, a couple key atomic ops are enough to preserve invariants.
I've tested the code and it seems faster, but upon second thought, I'm not sure why. Wouldn't it be just as fast to divide the large file into n input files ( you can 'seek' ahead into the same file to achieve the same thing, minimal preprocessing ), and then have each process simply call iostream::readLine on its own chunk? ( since iostream reads into its own buffer as well ). It doesn't seem that sharing a single buffer amongst multiple threads has any inherent advantage, since the workers are not actually operating on the same lines of data. Plus, there's no good way I don't think to parallelize so that they do work on the same lines. I just want to understand the performance gain I'm seeing, and know whether it is 'flukey' or scalable/reproducible across platforms...
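For comparison, the "seek ahead into the same file" alternative mentioned above might look roughly like this. It is only a sketch under assumptions: the helper names are hypothetical, the file is assumed to be newline-delimited, and each worker gets one contiguous byte range (not the interleaved line i, i + n, ... assignment used by the circular-buffer design).

```python
import os


def chunk_offsets(path, n):
    """Split a file into up to n byte ranges that each start on a line boundary."""
    size = os.path.getsize(path)
    starts = [0]
    with open(path, "rb") as f:
        for i in range(1, n):
            f.seek(size * i // n)      # jump to the approximate split point
            f.readline()               # move past the line we landed in
            pos = f.tell()
            if starts[-1] < pos < size:
                starts.append(pos)     # skip duplicates caused by long lines
    return list(zip(starts, starts[1:] + [size]))


def read_chunk(path, start, stop):
    """Return the lines that lie in the byte range [start, stop)."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(stop - start).splitlines()
```

Each worker would open the file itself and call read_chunk on its own range. Whether this is as fast as a single sequential reader depends on the storage device: several readers seeking in different regions can hurt on a spinning disk, which may be part of the gain observed with one reader thread.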
When you are I/O limited, you can get a good speedup by using two threads, one reading the file and the second doing the processing. This way the reading never waits for the processing (except for the very last line), and the reader is kept busy 100% of the time.
The buffer should be large enough to give the consumer thread enough work in one go, which most often means it should hold multiple lines (I would recommend at least 4000 characters, and probably even more). This keeps the cost of thread context switching from becoming impractically high.
Single threaded:
read 1
process 1
read 2
process 2
read 3
process 3
Double threaded:
read 1
process 1/read 2
process 2/read 3
process 3
On some platforms you can get the same speedup also without threads, using overlapped I/O, but using threads can be often clearer.
Using more than one consumer thread will bring no benefit as long as you are really I/O bound.
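A minimal sketch of the two-thread arrangement follows, assuming a bounded queue in place of the hand-written circular buffer and handing lines over in batches to keep the per-item hand-off cost low. The function name, batch size, and queue depth are illustrative assumptions, not measured recommendations.

```python
import queue
import threading


def pipeline(lines, process, batch_size=1000, max_batches=8):
    """Overlap reading and processing with one reader and one consumer thread."""
    batches = queue.Queue(maxsize=max_batches)   # bounded, like a circular buffer
    results = []

    def reader():
        batch = []
        for line in lines:             # `lines` can be an open file object
            batch.append(line)
            if len(batch) >= batch_size:
                batches.put(batch)     # blocks while the buffer is full
                batch = []
        if batch:
            batches.put(batch)         # flush the final partial batch
        batches.put(None)              # sentinel: no more input

    def consumer():
        while True:
            batch = batches.get()      # blocks while the buffer is empty
            if batch is None:
                break
            results.extend(process(line) for line in batch)

    threads = [threading.Thread(target=reader), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, pipeline(open(path), len) would collect the length of every line while the next batch is being read. Error handling is left out: if process raises, the consumer stops and the reader can block on a full queue, so real code would need to deal with that.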
In your case, there are at least two resources that your program competes for, the CPU and the harddisk. In a single-threaded approach, you request data then wait with an idle CPU for the HD to deliver it. Then, you handle the data, while the HD is idle. This is bad, because one of the two resources is always idle. This changes a bit if you have multiple CPUs or multiple HDs. Also, in some cases the memory bandwidth (i.e. the RAM connection) is also a limiting resource.
Now, your solution is right: you use one thread to keep the HD busy. If this thread blocks waiting for the HD, the OS just switches to a different thread that handles some data. If that thread doesn't have any data, it will wait for some. That way, CPU and HD will work in parallel, at least some of the time, increasing the overall throughput. Note that you can't increase the throughput with more than two threads, unless you also have multiple CPUs and the CPU is the limiting factor and not the HD. If you are writing back some data, too, you could improve performance with a third thread that writes to a second harddisk. Otherwise, you don't get any advantage from more threads.

Two processes on two CPUs -- is it possible that they complete at exactly the same moment?

This is sort of a strange question that's been bothering me lately. In our modern world of multi-core CPUs and multi-threaded operating systems, we can run many processes with true hardware concurrency. Let's say I spawn two instances of Program A in two separate processes at the same time. Disregarding OS-level interference which may alter the execution time for either or both processes, is it possible for both of these processes to complete at exactly the same moment in time? Is there any specific hardware/operating-system mechanism that may prevent this?
Now before the pedants grill me on this, I want to clarify my definition of "exactly the same moment". I'm not talking about time in the cosmic sense, only as it pertains to the operation of a computer. So if two processes complete at the same time, that means that they complete with a time difference that is so small, the computer cannot tell the difference.
EDIT : by "OS-level interference" I mean things like interrupts, various techniques to resolve resource contention that the OS may use, etc.
Actually, thinking about time in the "cosmic sense" is a good way to think about time in a distributed system (including multi-core systems). Not all systems (or cores) advance their clocks at exactly the same rate, making it hard to actually tell which events happened first (going by wall clock time). Because of this inability to agree, systems tend to measure time by logical clocks. Two events happen concurrently (i.e., "exactly at the same time") if they are not ordered by sharing data with each other or otherwise coordinating their execution.
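One kind of logical clock that can express "not ordered" directly is the vector clock. The answer above only says "logical clocks", so treating it as a vector clock is an assumption made for this sketch, and the class and function names are hypothetical. Each process keeps one counter per process, and an event precedes another only if its timestamp is less than or equal in every position and strictly less in at least one.

```python
class Process:
    """A process that stamps its events with a vector clock."""

    def __init__(self, index, total):
        self.index = index
        self.clock = [0] * total

    def local_event(self):
        self.clock[self.index] += 1
        return tuple(self.clock)

    def send(self):
        return self.local_event()      # the timestamp travels with the message

    def receive(self, stamp):
        # Merge what the sender knew, then count the receive as a local event.
        self.clock = [max(a, b) for a, b in zip(self.clock, stamp)]
        return self.local_event()


def happened_before(a, b):
    """True if the event stamped a causally precedes the event stamped b."""
    pairs = list(zip(a, b))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def concurrent(a, b):
    """True if neither event could have influenced the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)
```

Two processes that each record a local event without exchanging messages get stamps such as (1, 0) and (0, 1), which are concurrent in this sense. Once one sends a message that the other receives, the send is ordered before the receive.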
Also, you need to define when exactly a process has "exited." Thinking in Linux, is it when it prints an "exiting" message to the screen? When it returns from main()? When it executes the exit() system call? When its process state is set to "exiting" in the kernel? When the process's parent receives a SIGCHLD?
So getting back to your question (with a precise definition for "exactly at the same time"), the two processes can end (or do any other event) at exactly the same time as long as nothing coordinates their exiting (or other event). What counts as coordination depends on your architecture and its memory model, so some of the "exited" conditions listed above might always be ordered at a low level or by synchronization in the OS.
You don't even need "exactly" at the same time. Sometimes you can be close enough to seem concurrent. Even on a single core with no true concurrency, two processes could appear to exit at the same time if, for instance, two child processes exited before their parent was next scheduled. It doesn't matter which one really exited first; the parent will see that in an instant while it wasn't running, both children died.
So if two processes complete at the same time, that means that they complete with a time difference that is so small, the computer cannot tell the difference.
Sure, why not? Except for shared memory (and other resources, see below), they're operating independently.
Is there any specific hardware/operating-system mechanism that may prevent this?
Anything that is a resource contention:
memory access
disk access
network access
explicit concurrency management via locks/semaphores/mutexes/etc.
To be more specific: these are separate CPU cores. That means they have computing circuitry implemented in separate logic circuits. From the wikipedia page:
The fact that each core can have its own memory cache means that it is quite possible for most of the computation to occur as interaction of each core with its own cache. Once you have that, it's just a matter of probability. That's not to say that algorithms take a nondeterministic amount of time, but their inputs may come from a probabilistic distribution and the amount of time it takes to run is unlikely to be completely independent of input data unless the algorithm has been carefully designed to take the same amount of time.
Well I'm going to go with I doubt it:
Internally any sensible OS maintains a list of running processes.
It therefore seems sensible for us to define the moment that the process completes as the moment that it is removed from this list.
It also strikes me as fairly unlikely (but not impossible) that a typical OS will go to the effort to construct this list in such a way that two threads can independently remove an item from this list at exactly the same time (processes don't terminate that frequently and removing an item from a list is relatively inexpensive - I can't see any real reason why they wouldn't just lock the entire list instead).
Therefore for any two terminating processes A and B (where A terminates before B), there will always be a reasonably large time period (in a cosmic sense) where A has terminated and B has not.
That said it is of course possible to produce such a list, and so in reality it depends on the OS.
Also I don't really understand the point of this question, in particular what do you mean by
the computer cannot tell the difference
In order for the computer to tell the difference, it has to be able to check the running process table at a point where A has terminated and B has not. If the OS schedules removing process B from the process table immediately after process A, then it could very easily be that no such code gets a chance to execute, and so by some definitions it isn't possible for the computer to tell the difference. This situation holds true even on a single-core CPU.
Yes, without any OS scheduling interference they could finish at the same time, if they don't have any resource contention (shared memory, external I/O, system calls). When either of them has a lock on a resource, it will force the other to stall, waiting for the resource to free up.

Multithreading vs virtual process

There are three types of control flow model:
single threaded, virtual process, and multithreaded process.
Here's what is written in the PowerPoint slides I am studying from:
Virtual processes. This is based on a single threaded model but gives the appearance of concurrent execution. A controller component schedules the execution of the other components and gives them control. The scheduling can be performed periodically or based on events. This model is based on a logical decomposition of activities in simple steps whose execution requires only short intervals of time.
I couldn't understand it, and couldn't understand the difference between a multithreaded process and a virtual process.
Can someone help?
EDIT: here is the chapter of the book that the section quoted above comes from
http://www.mediafire.com/?ru82i0nvp12qw6t
This term "virtual process" is unusual, but based on your description I can give a real-world example for each model. For multithreading, imagine you have a lot of data in memory and want to perform some calculations on it. You can split that data up and have separate threads (one per CPU core, ideally) simultaneously working on different chunks of the data. This way, the calculations finish faster, up to the number of cores available. For "virtual process", imagine you need to retrieve 20 files from remote servers. Most of the CPU "work" involved in this is just sitting around waiting for bytes to arrive from the remote network. Creating separate threads to download each of these files would not make the files arrive any faster. If anything, having extra threads that the OS needs to constantly switch between would slow things down (and it will switch a LOT, because most of the time each thread will just say "I'm still waiting" and then cede control). So, in this case it's better to have a single thread doing all of the downloading, cycling internally between the download tasks to read incoming data off of their buffers.
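The single-thread, cycle-between-tasks arrangement can be sketched with generators: a controller gives each task one short step in turn, which matches the description of a controller component scheduling the other components. The names below are hypothetical, and the "download" is simulated with a list of chunk sizes instead of real network I/O.

```python
from collections import deque


def download(name, chunks):
    """A task broken into short steps; each `yield` hands control back."""
    received = 0
    for size in chunks:                # pretend each chunk arrived from the network
        received += size
        yield f"{name}: {received} bytes"


def run_round_robin(tasks):
    """Controller: give each task one short step in turn until all finish."""
    ready = deque(tasks)
    log = []
    while ready:
        task = ready.popleft()
        try:
            log.append(next(task))     # run the task up to its next yield
            ready.append(task)         # not finished: back of the queue
        except StopIteration:
            pass                       # finished: drop it
    return log
```

Everything here runs on one thread, so no locks are needed. The price is that each step must be short: a task that never yields would starve all the others, which is why the model relies on decomposing activities into simple steps.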
Your virtual process looks to me like event-driven programming. Google for e.g. 'threads vs events'; the first link you get is quite a fine comparison.
EDIT: Here's another comparison I've found in bookmarks.
