Concurrent execution/Re-entrant /ThreadSafe/? - multithreading

I read many answers given here for questions related to thread safety, re-entrancy, but when i think about them, some more questions came to mind, hence this question/s.
1.) I have one executable program say some *.exe. If i run this program on command prompt, and while it is executing, i run the same program on another command prompt, then in what conditions the results could be corrupted, i.e. should the code of this program be re-entrant or it should be thread safe alone?
2.) While defining re-entrancy, we say that the routine can be re-entered while it is already running, in what situations the function can be re-entered (apart from being recursive routine, i am not talking recursive execution here). There has to be some thread to execute the same code again, or how can that function be entered again?
3.) In a practical case, will two threads execute same code, i.e. perform same functionality. I thought the idea of multi-threading is to execute different functionality, concurrently(on different cores/processors).
Sorry if these queries seem different, but they all occured to me, same time when i read about the threadsafe Vs reentrant post on SO, hence i put them together.
Any pointers, reading material will be appreciated.
thanks,
-AD.

I'll try to explain these, in order:
Each program runs in its own process, and gets its own isolated memory space. You don't have to worry about thread safety in this situation. (However, if the processes are both accessing some other shared resource, such as a file, you may have different issues. For example, process 1 may "lock" the data file, preventing process 2 from being able to open it).
The idea here is that two threads may try to run the same routine at the same time. This is not always valid - it takes special care to define a class or a process in a way that multiple threads can use the same instance of the same class, or the same static function, without errors occurring. This typically requires synchronization in the class.
Two threads often execute the same code. There are two different conceptual ways to parition your work when threading. You can either think in terms of tasks - ie: one thread does task A while another does task B. Alternatively, you can think in terms of decomposing the the problem based on data. In this case, you work with a large collection, and each element is processed using the same routine, but the processing happens in parallel. For more info, you can read this blog post I wrote on Decomposition for Parallelism.

Two processes cannot share memory. So thread-safety is moot here.
Re-entrancy means that a method can be safely executed by two threads at the same time. This doesn't require recursion - threads are separate units of execution, and there is nothing keeping them both from attempting to run the same method simultaneously.
The benefits to threading can happen in two ways. One is when you perform different types of operations concurrently (like running cpu-intensive code and I/O-intensive code at the ame time). The other is when you can divide up a long-running operation among multiple processors. In this latter case, two threads may be executing the same function at the same time on different input data sets.

First of all, I strongly suggest you to look at some basic stuffs of computer system, especially how a process/thread is executing on CPU and scheduled by operating system. For example, virtual address, context switching, process/thread concepts(e.g., each thread has its own stack and register vectors while heap is shared by threads. A thread is an execution and scheduling unit, so it maintains control flow of code..) and so on. All of the questions are related to understanding how your program is actually working on CPU
1) and 2) are already answered.
3) Multithreading is just concurrent execution of any arbitrary thread. The same code can be executed by multiple threads. These threads can share some data, and even can make data races which are very hard to find. Of course, many times threads are executing separate code(we say it as thread-level parallelism).
In this context, I have used concurrent as two meaning: (a) in a single processor, multiple threads are sharing a single physical processor, but operating system gives a sort of illusion that threads are running concurrently. (b) In a multicore, yes, physically two or more threads can be executed concurrently.
Having concrete understanding of concurrent/parallel execution takes quite long time. But, you already have a solid understanding!

Related

Do Rust threads run at the same time in parallel? Documentation sounds like it does not [duplicate]

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.

Why do we need semaphores on single cpu?

I have read that we use semaphores inside the linux kerenl,and i have read that semaphores has advantages even in one single cpu (we can run only one process\thread). Can anyone please give me an example of a problem that semaphore solves(inside the kernel)?
In my view, there can be a problem only if we have more than one cpu, because two process may call system calls that use the same data structure, and probablly cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections and once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet, the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web-server does multiple things at once is part of its interface: the I/O for the connections are interleaved, if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource, etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.

Does multithreading actually work in uniprocessor environment

Suppose we have a process with multiple threads in a uniprocessor.
Now I know that if we have several processes, only one of them will be processed at a time in a uniprocessor and hence the processes are not concurrent.
If my understanding is correct, similarly each thread will be processed at a time and not concurrent in a uniprocessor. Is this statement true? If so then does multithreading mean having more than one thread in a process and does not mean running multiple threads at a time? And does that mean there's no benefit of creating user threads in a uniprocessor environment?
TL;DR: threads are switching more often than processes and in real time we have an effect of concurrency because it is happens really fast.
when you wrote:
each thread will be processed at a time and not concurrent in a uni processor
Notice the word "concurrent", there is no real concurrency in uni processor, there is only effect of that thanks to the multiple number of context switches between processes.
Let's clarify something here, the single core of the CPU can handle one thread at a given time, each process has a main thread and (if needed) more threads running together. If a process A is now running and it has 3 threads: A1(main thread), A2, A3 all three will be running as long as process A is being processed by the CPU core. When a context switch occur process A is no longer running and now process B will run with his threads.
About this statement:
there's no benefit of creating user threads in a uni processor environment
That is not true. there is a benefit in creating threads, they are easier to create ("spawn" as in the books) and shearing the process heap memory. Creating a sub process ("child" as in the books) is a overhead comparing to a thread because a process need to have his own memory. For example each google chrome tab is a process not a thread, but this tab has multiple threads running concurrency with little responsibility.
If you are still somehow running a computer with just one, single-core, CPU, then you would be correct to observe that only one thread can be physically executing at one time. But that does not negate the value of breaking up the application into multiple threads and/or processes.
The essential benefit is concurrency. When one thread is waiting (e.g. for an input/output operation to complete), there is something else for the CPU to be doing in the meantime: it can be running a different thread that isn't waiting. With a carefully designed application, you can get much better utilization of every part of the hardware, more parallelism, and thus, more throughput.
My favorite go-to example is a fast food restaurant. About a dozen workers, each one doing different things, cooperate to bring your order to you. Even if one of them (say, "the fry guy") is standing around, someone else always has something to do. Several orders are in-process at once. This overlap, this "concurrency," is what you are shooting for – regardless of how many CPUs you have.
Multithreading is also commonly used with GUI applications that also need to do some kind of "heavy lifting." One thread handles the GUI interaction (and has no other real responsibilities) while other threads, with a slightly inferior priority (or "niceness") do the lifting. When a GUI event comes in, the GUI thread pre-empts the others and responds to it immediately, then of course goes right back to sleep again. But in this way the GUI always remains very responsive – even though the other threads are doing "heavy lifting" things, GUI messages are still handled very promptly. (I scooped-up about a 25% performance improvement by re-tooling an older application to use this approach, because the application was no longer "polling" for GUI events.)
The first question I ask about any thread is, "what does it wait for?" To me, a thread is defined by what event it waits for and what it does when that event happens.
Threads were in wide-spread use for at least a decade before multi-processor computers became commercially available. They are useful when you want to write a program that has to respond to un-synchronized events that come from multiple different sources. There's a few different ways to model a program like that. One way is to have a different thread to wait on each different event source. The next most popular is an event driven architecture in which there's a main loop that waits for all events and calls different event handler functions for each of the different kinds of event.
The multi-threaded style of program often is easier to read* because there's usually different activities going on inside the program, and the state of each activity can be implicit in the context (i.e., registers and call stack) of the thread that's driving it, while in the event-driven model, each activity's state must be explicitly encoded in some object.
The implicit-in-the-context way of keeping the state is much closer to the procedural style of coding a single activity that we learn as beginners.
*Easier to read does not mean that the code is easy to write without making bad and non-obvious mistakes!!
The main impetus for developing threads was Ada compliance. Prior to that, different operating systems had their own ways of handing multiple things at once. In eunuchs, the way to do more than one thing was to spin off a new process. In VMS, software interrupts (aka Asynchronous System Traps or Asynchronous Procedure Calls in Windoze). In those days (1970's) multiprocessor systems were rare.
One of the goals of Ada was to have a system independent way of doing things. It adopted the "task" which is effectively a thread. In order to support Ada, compiler developers had to include task (thread) libraries.
With the rise of multiprocessors, operating systems started to make threads (rather than processes) the basic schedulable unit in a system.
Threads then give a way for programs to handle multiple things simultaneously, even if there is only one processor. Sadly, support for threads in programming languages has been woefully lacking. Ada is the only major language I can think of that has real support for threads (tasks). Thread support in Java, for example, is a complete, sick joke. The result is threads are not as effective in practice as they could be.

Running two threads at the same time

I want to know if a program can run two threads at the same time (that is basically what it is used for correct?). But if I were to do a system call in one function where it runs on thread A, and have some other tasks running in another function where it runs on thread B, would they both be able to run at the same time or would my second function wait until the system call finishes?
Add-on to my original question: Now would this process still be an uninterruptable process while the system call is going on? I am talking about using any system call on UNIX/LINUX.
Multi-threading and parallel processing are two completely different topics, each worthy of its own conversation, but for the sake of introduction...
Threading:
When you launch an executable, it is running in a thread within a process. When you launch another thread, call it thread 2, you now have 2 separately running execution chains (threads) within the same process. On a single core microprocessor (uP), it is possible to run multiple threads, but not in parallel. Although conceptually the threads are often said to run at the same time, they are actually running consecutively in time slices allocated and controlled by the operating system. These slices are interleaved with each other. So, the execution steps of thread 1 do not actually happen at the same time as the execution steps of thread 2. These behaviors generally extend to as many threads as you create, i.e. packets of execution chains all working within the same process and sharing time slices doled out by the operating system.
So, in your system call example, it really depends on what the system call is as to whether or not it would finish before allowing the execution steps of the other thread to proceed. Several factors play into what will happen: Is it a blocking call? Does one thread have more priority than the other. What is the duration of the time slices?
Links relevant to threading in C:
SO Example
POSIX
ANSI C
Parallel Processing:
When multi-threaded program execution occurs on a multiple core system (multiple uP, or multiple multi-core uP) threads can run concurrently, or in parallel as different threads may be split off to separate cores to share the workload. This is one example of parallel processing.
Again, conceptually, parallel processing and threading are thought to be similar in that they allow things to be done simultaneously. But that is concept only, they are really very different, in both target application and technique. Where threading is useful as a way to identify and split out an entire task within a process (eg, a TCP/IP server may launch a worker thread when a new connection is requested, then connects, and maintains that connection as long as it remains), parallel processing is typically used to send smaller components of the same task (eg. a complex set of computations that can be performed independently in separate locations) off to separate resources (cores, or uPs) to be completed simultaneously. This is where multiple core processors really make a difference. But parallel processing also takes advantage of multiple systems, popular in areas such as genetics and MMORPG gaming.
Links relevant to parallel processing in C:
OpenMP
More OpenMP (examples)
Gribble Labs - Introduction to OpenMP
CUDA Tookit from NVIDIA
Additional reading on the general topic of threading and architecture:
This summary of threading and architecture barely scratches the surface. There are many parts to the the topic. Books to address them would fill a small library, and there are thousands of links. Not surprisingly within the broader topic some concepts do not seem to follow reason. For example, it is not a given that simply having more cores will result in faster multi-threaded programs.
Yes, they would, at least potentially, run "at the same time", that's exactly what threads are for; of course there are many details, for example:
If both threads run system calls that e.g. write to the same file descriptor they might temporarily block each other.
If thread synchronisation primitives like mutexes are used then the parallel execution will be blocked.
You need a processor with at least two cores in order to have two threads truly run at the same time.
It's a very large and very complex subject.
If your computer has only a single CPU, you should know, how it can execute more than one thread at the same time.
In single-processor systems, only a single thread of execution occurs at a given instant. because Single-processor systems support logical concurrency, not physical concurrency.
On multiprocessor systems, several threads do, in fact, execute at the same time, and physical concurrency is achieved.
The important feature of multithreaded programs is that they support logical concurrency, not whether physical concurrency is actually achieved.
The basics are simple, but the details get complex real quickly.
You can break a program into multiple threads (if it makes sense to do so), and each thread will run "at its own pace", such that if one must wait for, eg, some file I/O that doesn't slow down the others.
On a single processor multiple threads are accommodated by "time slicing" the processor somehow -- either on a simple clock basis or by letting one thread run until it must wait (eg, for I/O) and then "switching" to the next thread. There is a whole art/science to doing this for maximum efficiency.
On a multi-processor (such as most modern PCs which have from 2 to 8 "cores") each thread is assigned to a separate processor, and if there are not enough processors then they are shared as in the single processor case.
The whole area of assuring "atomicity" of operations by a single thread, and assuring that threads don't somehow interfere with each other is incredibly complex. In general a there is a "kernel" or "nucleus" category of system call that will not be interrupted by another thread, but thats only a small subset of all system calls, and you have to consult the OS documentation to know which category a particular system call falls into.
They will run at the same time, for one thread is independent from another, even if you perform a system call.
It's pretty easy to test it though, you can create one thread that prints something to the console output and perform a system call at another thread, that you know will take some reasonable amount of time. You will notice that the messages will continue to be printed by the other thread.
Yes, A program can run two threads at the same time.
it is called Multi threading.
would they both be able to run at the same time or would my second function wait until the system call finishes?
They both are able to run at the same time.
if you want, you can make thread B wait until Thread A completes or reverse
Two thread can run concurrently only if it is running on multiple core processor system, but if it has only one core processor then two threads can not run concurrently. So only one thread run at a time and if it finishes its job then the next thread which is on queue take the time.

Thread and Process

What is the best definition of a thread and what is a process?
If I call a function, how do I know that a thread is calling it or a process (or am I not understanding it??!). This is in a multi-core system (quadcore).
From http://wiki.answers.com/Q/What_is_the_difference_between_a_computer_process_and_thread:
A single process can have multiple threads that share global data and address space with other threads running in the same process, and therefore can operate on the same data set easily. Processes do not share address space and a different mechanism must be used if they are to share data.
If we consider running a word processing program to be a process, then the auto-save and spell check features that occur in the background are different threads of that process which are all operating on the same data set (your document).
One thing to add is how does a multi-core processor handle this. Think of a thread as the sequential execution of your code.
A core in a CPU can only execute one thread at a time. So if this thread is blocked because the program is waiting for an I/O operation to finish, the process is blocked (very simplified example: Word not responding). Multi-threading allows us to execute multiple code paths at the same time. "Same time" is a bit of a lie, since only one thread can actually execute at a time in a core, but the CPU gives some small chunk of time to each thread, so it appears as if all these threads are executing at the same time. A good example here is the spell checker in Word.
If you have multiple cores, the only difference is that in an N-Core CPU you can have N threads executing at the same time. To simplify a lot, it doesn't matter what process the threads belong to. To simply even further, you'd expect a N times performance increase. :-D
In every modern OS I know of, everything runs in a thread, which runs in a process.
The OS can keep track of multiple processes, and each process can host an arbitrary number of threads. So all code is executed within a thread and within a process (since the thread runs in a process).
The main distinction between the two is that each process has its own virtual address space. Separate processes do not have access to each others' data, file handles or anything else, and are essentially not aware that other processes exist.
On the other hand, every thread in a process share the same address space, and all threads can therefore inspect or modify each others' data, call the same functions and everything else.
It is often (but not always) the cases that one program consists of one process and a number of threads.
A process is composed of one or more threads (one by default for most environments). A process can create additional threads though.
Like the previous answer says, each Process has its own memory space (each can have a pointer to 0x12345, with that memory location having different values for each process), while all the Threads of a process would actually point to the exact same memory location, since they're all in the same memory space.
When calling a function, it's almost always called on the same thread that the caller is running on. In Objective-C, there are exceptions (performSelectorOnMainThread), and there might be for other languages as well, but that sort of functionality is necessary only in special cases.
From a user's point of view, the main distinction is that threads share memory with each other, while processes do not. That means you can easily share data between threads, while processes require some kind of OS call to do so.
Some call this a benifit of threads, but sharing data between multiple threads of control is fraught with danger, so it can be argued that processes lead to more reliable code.
There's a lot more to it, particularly if you are an OS person.

Resources