Can a single thread do everything that multiple threads can do?

My 1st Question: As per the title.
I am asking this because I came across a StackExchange question: What can multiple threads do that a single thread cannot?
One of the answers at that link states that whatever multiple threads can do, a single thread can do as well.
However, I don't think this is true. My argument is this: suppose we build a simple chat program with socket programming and run it via the command console. If the chat program is single-threaded, it is effectively half-duplex: we cannot listen and talk concurrently, so at any moment only one party can talk while the other has to listen. For both parties to be able to talk and receive messages concurrently, we have to implement it with multiple threads.
My 2nd Question: Is my argument correct? Or did I miss some points here, and can a single thread therefore still do everything multiple threads do?

Let's consider the computer as a whole, and more precisely that your chat application is bound together with the kernel (or the whole OS) into a piece we would call "the software".
Now consider that this "software" runs on a single core (say, an i386).
Then you can see that, even if you wrote your chat application using threads (which is probably overkill), the software as a whole runs on a single CPU core, which means that at any given moment it performs one single thing, even if there seem to be parallel things happening.
This is nothing more than a Turing machine (using a single tape): https://en.wikipedia.org/wiki/Turing_machine
The parallelism is an illusion created by the kernel, because it can switch between tasks fast enough. It's just like a film, which seems to be a continuous picture on screen when actually it is just 24 frames per second, and that is enough to fool our brain.
So I would say that anything a multithreaded program does, a single-threaded one could do.
Nevertheless, we now all use multi-core CPUs, which can be seen to a certain extent as multiple computers running at the same time (parallel computing), so you can probably find software that works on multiple cores but would not run on a single thread.
A good example is device drivers (in the kernel). With a poor implementation on a non-preemptive kernel, you can create a busy loop that waits for an event indefinitely. This usually deadlocks on a single core (you prevent the kernel from scheduling another task, and thus prevent the event from ever being sent), but it can work on a multi-core machine, as the event is usually eventually sent by the other thread running on another core (hopefully).

I want to amend the existing answer (+1):
You absolutely can run multiple parallel IOs on a single thread. An IO is nothing more than a kernel data structure. When you start the IO, the OS talks to the hardware and tells it to do something. Then the CPU is free to do whatever it wants. When the hardware is done, it calls back into the OS: it issues an interrupt, which hijacks a CPU core to process the completion notification.
This is called async IO, and all mainstream OSes provide it.
In fact, this is how socket programs with many connections run. They use async IO to multiplex large numbers of connections onto a small pool of threads.
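For illustration, here is roughly what that multiplexing looks like, as a minimal sketch using Python's selectors module (the address and port are arbitrary, and a real server would also handle partial writes). One thread serves every connection, and data can be in flight in both directions on each socket at once:

    import selectors, socket

    sel = selectors.DefaultSelector()

    def accept(srv):
        conn, _ = srv.accept()
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ, echo)   # watch the new client

    def echo(conn):
        data = conn.recv(4096)
        if data:
            conn.sendall(data)        # echo back; sends and receives interleave freely
        else:
            sel.unregister(conn)      # client closed the connection
            conn.close()

    srv = socket.socket()
    srv.bind(("127.0.0.1", 9999))
    srv.listen()
    srv.setblocking(False)
    sel.register(srv, selectors.EVENT_READ, accept)

    while True:                       # the whole server is this one loop, one thread
        for key, _ in sel.select():   # block until some socket is ready
            key.data(key.fileobj)     # dispatch to accept() or echo()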

The core reason why this argument is incorrect is subtle. While it's true that a single thread, a single core, or a single network interface can only be handling a send or a receive at any given moment, as long as that component is not the critical path, it does not make sense to describe the overall system as half-duplex.
Consider a network link that is full-duplex and takes 1 ms to move a chunk of data from one end to the other. Now imagine we have a device that puts data on the link or removes data from the link but cannot do both at the same time. So long as it takes much less than 1 ms to process a send or a receive, this single-file path that data in both directions must pass through does not somehow make the link half-duplex. There will still be data moving in both directions at the same time.
In any realistic chat application, the CPU will not be the limiting factor, so its inability to do more than one thing at a time can't make the system half-duplex. There can still be data moving in both directions at the same time.
For a typical chat application under typical load, the behavior of the system will not be significantly different whether the implementation uses a single thread or multiple threads with infinite CPU resources. The CPU just won't be the limiting factor.


Why do we need semaphores on a single CPU?

I have read that we use semaphores inside the Linux kernel, and I have read that semaphores are useful even on a single CPU (where only one process/thread can run at a time). Can anyone please give me an example of a problem that a semaphore solves (inside the kernel)?
In my view, there can only be a problem if we have more than one CPU, because then two processes may call system calls that use the same data structure and probably cause problems.
Thank you for your help!
You don't really need more than one CPU for concurrency. The multiple CPUs are really "an implementation detail," a piece of hardware quirkiness that you can abstract away from. Concurrency is a logical property of programs. You can have concurrency without multiple CPUs, and use multiple CPUs without "real concurrency".
Consider a web server. It has to be "concurrent," in the sense that it must serve multiple clients at once, hold information about multiple connections at once, and process multiple requests at once. You can have it literally do this, by having multiple CPUs all working at the same time. Yet the program only has to appear to do multiple things at once. It could just as well be running on one CPU and context switching to fairly service all the work put to it. The fact that a web server does multiple things at once is part of its interface: the I/O for the connections is interleaved; if a request has exclusively locked a resource, another request won't start trying to manipulate that same resource; etc. Writing a web server without concurrency produces a program that is wrong.
Semaphores help you with concurrency, by letting you control the way processes access resources. You asked, if you had one process running, how another could run at the same time with only a single core. Well, as I said, concurrency doesn't need multiple cores. The first process can be paused, and the second one started while the first one is still unfinished. This is just an implementation detail; logically, to the program writer, the two processes are running simultaneously, whether there are multiple cores or not. If the program was written without semaphores (or had broken concurrency in some other way), it would be wrong, even on a single core. Physically, this will be because context switching can abruptly pause one computation and start another at any time, and, without semaphores, the newly live thread won't know what resources it can and cannot access. Logically, this will be because the processes are running simultaneously, once you abstract yourself away from the implementation, and, in general, processes running simultaneously can walk over each other if not properly synchronized.
For an example applicable to an OS kernel, consider that every process is logically running concurrently with every other process. A kernel provides the implementation that makes this concurrency work. A resource that two processes may want simultaneously is a hard drive. A semaphore might be used in the kernel to track whether a given drive is currently busy with a read or write. A process trying to read or write to the same disk will ask the kernel to do so, and the kernel can check the semaphore to see that the disk is still busy and force the offending process to wait. Now, an operating system does count as low level code, so in some places, yes, you might want to omit some otherwise vital concurrency safeguards when running on a single CPU, because your job is to handle such implementation details, but higher level parts may still use them.
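To make that concrete, here is a minimal sketch in Python, with threading.Semaphore standing in for the kernel's semaphore and a sleep standing in for the disk transfer (all names here are illustrative). Even on one core, a context switch could otherwise interleave two "disk" operations:

    import threading, time

    disk = threading.Semaphore(1)      # one permit: the disk serves one request at a time

    def disk_io(task_id):
        with disk:                     # block here while the disk is busy
            # critical section: the scheduler may switch threads at any moment,
            # but no other thread can enter until we release the semaphore
            print("task", task_id, "using disk")
            time.sleep(0.1)            # pretend read/write
            print("task", task_id, "done")

    threads = [threading.Thread(target=disk_io, args=(i,)) for i in range(3)]
    for t in threads: t.start()
    for t in threads: t.join()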
In contrast, consider a number-crunching program. Let's say it's processing each element of a huge array of data into an equal-sized array of modified data (a functional map operation). It can use multiple CPUs to do this more quickly, but it can also work one CPU. The observable behavior of the program is the same, and you never get any idea that it's doing multiple things at once from its behavior. Numbers go in, numbers come out, who cares what happens in the middle? Writing such a program without the ability to do multiple things at once does not produce a logically incorrect program, just a slow one. Such a program probably does not need semaphores when running on a single CPU, because it didn't need concurrency in the first place.

What's the point of multi-threading on a single core?

I've been playing with the Linux kernel recently and diving back into the days of OS courses from college.
Just like back then, I'm playing around with threads and the like. All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores.
So what's the point of multi-threading on a single core? The only example I can think of is from college when writing a client/server program but that seems like a weak point.
"All this time I had been assuming that threads were automatically running concurrently on multiple cores but I've recently discovered that you actually have to explicitly code for handling multiple cores."
The above is incorrect for any widely used, modern OS. All of Linux's schedulers, for example, will automatically schedule threads on different cores and even automatically move threads from one core to another when necessary to maximize core usage. There are some APIs that allow you to modify the schedulers' behavior, but these APIs are generally used to disable automatic thread-to-core scheduling, not to enable it.
So what's the point of multi-threading on a single core?
Imagine you have a GUI program whose purpose is to execute an expensive computation (for example, render a 3D image or a Mandelbrot set) and then display the result. Let's say this computation takes 30 seconds to complete on this particular CPU. If you implement that program the obvious way, and use only a single thread, then the user's GUI controls will be unresponsive for 30 seconds while the calculation is executing -- the user will be unable to do anything with your program, and possibly unable to do anything with his computer at all. Since users expect GUI controls to be responsive at all times, that would be a poor user experience.
If you implement that program with two threads (one GUI thread and one rendering thread), on the other hand, the user will be able to click buttons, resize the window, quit the program, choose menu items, etc, even while the computation is executing, because the OS is able to wake up the GUI thread and allow it to handle mouse/keyboard events when necessary.
Of course, it is possible to write this program with a single thread and keep its GUI responsive, by writing your single thread to do just a few milliseconds worth of computation, then check to see if there are GUI events available to process, handling them, then going back to do a bit more computation, etc. But if you code your app this way, you are essentially writing your own (very primitive) thread scheduler inside your app anyway, so why reinvent the wheel?
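As a minimal sketch of the two-thread shape described above (Python, with a polling loop standing in for a real GUI event loop; every name is illustrative):

    import threading, queue, time

    results = queue.Queue()

    def render():                      # the expensive computation
        time.sleep(3)                  # stand-in for 30 seconds of rendering
        results.put("image data")

    threading.Thread(target=render, daemon=True).start()

    while True:                        # the "GUI" thread stays responsive throughout
        try:
            image = results.get(timeout=0.05)
            print("render finished:", image)
            break
        except queue.Empty:
            pass                       # handle mouse/keyboard events here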
The first versions of MacOS were designed to run on a single core, but had no real concept of multithreading. This forced every application developer to correctly implement some manual thread management -- even if their app did not have any extended computations, they had to explicitly indicate when they were done using the CPU, e.g. by calling WaitNextEvent. This lack of multithreading made early (pre-MacOS-X) versions of MacOS famously unreliable at multitasking, since just one poorly written application could bring the whole computer to a grinding halt.
First, a program does not only compute; it also waits for input/output, and that waiting happens on separate I/O hardware. So even a single-core machine is effectively a multi-processor machine, and employing multithreading is justified.
Second, a task can be divided into several threads for the sake of modularity.
Multithreading is not only for taking advantage of multiple cores.
You need multiple processes for multitasking. For a similar reason you are allowed to have multiple threads, which are lightweight compared with processes.
You probably don't want to spawn a new process every time for things like blocking I/O. That would be overkill.
And there are fibers, which are even more lightweight. So we have processes, threads, and fibers for different levels of need.
Well, when you say multithreading on a single core, there are things you need to consider. For example, is the thread API you are using user-level or kernel-level? From your question, I believe you are most probably using user-level threads.
Now, user-level threads, depending on the host OS or the API itself, may map to a single kernel thread or to multiple ones. Several mappings are possible: 1:1, many:1, or many:many.
Now, even with a single core, your OS can still provide several kernel-level threads, which behave like multiple processes to the CPU. In that case, the OS gives you time-slicing (and multiprogramming) across the kernel threads, leading to very fast context switches, and via the user-level API your code will appear multithreaded.
Also note that even though your processor is a single core, depending on the model it can be hyper-threaded and have deep pipelines, allowing kernel threads to run concurrently with very low overhead.
For references: check the Intel/AMD architecture manuals and how various OSes provide kernel threads.

Programming with threads, what is the benefit? [closed]

Given a single core CPU, what is the benefit to coding using threads?
At least with the Java implementation (and it seems natural to extend this to any other language, given the single-core restriction), you may have several threads performing various actions, but only one runs at a time: the threads are time-sliced and switched.
Given process A and process B:
What is the benefit of performing half of process A, finishing process B, and then finishing the second half of process A, versus performing process A and then B?
It seems that switching between the threads would introduce delays that prolong the overall completion time of both processes, versus not switching and just completing A and then B.
The reason to use threads on a single-core system is simply to allow processes that would otherwise use all the CPU to be preempted by other tasks that need to get done sooner. The most common reason to make a system multi-threaded is to have a responsive user interface even while performing long calculations.
Of course, any operation can take a long time (reading a file, accessing a database, resizing a photo, recalculating a spreadsheet), and those operations can be performed on a separate thread to allow the thread responding to user input to operate the whole time.
Twenty years ago, for example, it was rare to have a multi-CPU system or an OS that allowed multi-threading, so nearly every program was single-threaded and there were many frameworks created to allow systems to have UIs and still do I/O. The standard mechanism for this is an event loop, where all events (UI, network, timers, etc.) are processed in a big loop.
This type of system means that the UI is held up during things like file I/O and calculations. In order to not hold up the UI too much, you have to do the I/O in chunks (say, read the file 4k at a time), processing any incoming UI events between chunks. This is really just a hack to keep the system running, but it's hard to make the system run smoothly like this because you don't know how often you need to process events.
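That hack looks roughly like this (a Python sketch with made-up event and work sources; the chunk size of 4096 is arbitrary):

    import collections

    events = collections.deque(["click", "keypress"])  # pretend UI events land here
    work = iter(range(1_000_000))                      # the long computation, one piece at a time

    done = False
    while not done:
        while events:                                  # service the UI first...
            print("handled event:", events.popleft())
        for _ in range(4096):                          # ...then do one small chunk of work
            try:
                next(work)
            except StopIteration:
                done = True
                break
    print("computation finished")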
The solution is to have a separate thread to recalculate your spreadsheet or write your file. That way the OS can give those threads fair timeslices while still preempting them to run the UI, allowing the UI to always be responsive.
An executing thread is not necessarily doing anything useful. The canonical example is reading from disk -- that data isn't going to be there for another few milliseconds, during which time the processor would be sitting unused. Threads allow one piece of the program to use the CPU while other pieces of the program are waiting for operations to complete.
There are many reasons. Wikipedia gives a decent overview on its page about threads.
Here are a few off the top of my head:
I/O bound tasks benefit from threading (especially network applications).
Hyperthreaded processors may speed up multithreaded applications even on a single core.
Threads can be instructed to wait (block) and wake up on specific events, enabling responsive event-driven programming.
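Illustrating that last point, a thread can sleep on an event without burning any CPU until another thread wakes it. A minimal Python sketch (the half-second delay just simulates data arriving later):

    import threading, time

    data_ready = threading.Event()

    def worker():
        print("worker: waiting for data...")
        data_ready.wait()              # blocks without consuming CPU
        print("worker: woke up, processing")

    threading.Thread(target=worker).start()
    time.sleep(0.5)                    # pretend the data arrives later
    data_ready.set()                   # wake the waiting thread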
If your program has to do several things "at the same time" then threads are a good way to go, particularly if some of those tasks are quite long-running. Otherwise you find yourself writing code that looks like an operating system scheduler inside your program, which is always a waste of time if the OS underneath you has a perfectly good one already. You'd find that your source code was mostly 'scheduler' and not much 'program', which is very inelegant. A good threaded program can be very elegant and economical in source code, which makes oneself look good and saves time.
Some run times get/got it wrong. In the early days of Ada the runtime environment would do its own thread scheduling, and it was never very satisfactory. That was partly due to the fact that whilst the Ada language spec included the concept of threads, the OSes we had back then quite often didn't provide them. Ada got a lot better when the compiler writers started using the underlying OS threads instead.
Similarly Python doesn't really properly use the underlying OS threads; it spoils it with the Global Interpreter Lock. Python has sidestepped the whole issue by going for multiprocessing instead (not necessarily a good thing on Windows hosts...).
Early versions of Windows didn't do threads either; they did cooperative multitasking. This depended on every process in the machine calling an OS routine every now and then. Each OS routine would first consult the 'scheduler' to see if anything else was waiting to run before getting on with whatever it was supposed to be doing on behalf of the program. There were many terrible programs back then that wouldn't play ball and hogged the entire machine. You couldn't get on with playing a game of Solitaire when something else embarked on a lengthy calculation.
What's the mental model of your program?
IF it depends on multiple external inputs that can happen in unpredictable orders, and if what you want to do in response to those inputs is not simple and can overlap in time ...
THEN it makes sense to devote a separate thread to each input request, and have that thread perform the response needed by that request.
So, for example, if your program is waiting for input requests from an external channel, and each request must trigger its own protocol of outgoing and incoming messages, it can very much simplify the code to create a new thread (or re-use an old one) for each request.
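That thread-per-request shape, sketched in Python with an echo exchange standing in for the real protocol (the address and port are placeholders):

    import socket, threading

    def handle(conn, addr):
        # this thread owns the whole conversation with one client, so the
        # protocol logic reads top-to-bottom with no scheduler code mixed in
        with conn:
            while (data := conn.recv(4096)):
                conn.sendall(data)     # stand-in for the real request/response protocol

    srv = socket.socket()
    srv.bind(("127.0.0.1", 9000))
    srv.listen()
    while True:
        conn, addr = srv.accept()
        threading.Thread(target=handle, args=(conn, addr), daemon=True).start()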
Somehow people seem to enter the workforce thinking that threads are only there for speed (through parallelism).
That's one use, provided it allows multiple CPU chips to get cranking, but it is by no means the only use.

Would handling each TCP connection in a separate thread improve latency?

I have an FTP server, implemented on top of QTcpServer and QTcpSocket.
I take advantage of the signals and slots mechanism to support multiple TCP connections simultaneously, even though I have a single thread. My code returns as soon as possible to the event loop, it doesn't block (no wait functions), and it doesn't use nested event loops anywhere. That way I already have cooperative multitasking, like Win3.1 applications had.
But a lot of other FTP servers are multithreaded. Now I'm wondering if using a separate thread for handling each TCP connection would improve performance, and especially latency.
On one hand, threads add to latency because you need to start a new thread for each new connection, but on the other, with my cooperative multitasking, other TCP connections have to wait until I've returned to the main loop before their readyRead()/bytesWritten() signals can be handled.
In your current system, and ignoring file I/O time, one processor is always doing something useful if there's something useful to be done, and waiting ready-to-go if there's nothing to do. If this were a single-processor (single-core) system, you would have maximized throughput. This is often a very good design -- particularly for an FTP server, where you don't usually have a human waiting on a packet-by-packet basis.
You have also minimized average latency (for a single-processor system). What you do not have is consistent latency. Measuring your system's performance is likely to show a lot of jitter -- a lot of variation in the time it takes to handle a packet. Again, because this is FTP and not real-time process control or human interaction, jitter may not be a problem.
Now, however, consider that there is probably more than one processor available on your system and that it may be possible to overlap I/O time and processing time.
To take full advantage of a multi-processor(core) system you need some concurrency.
This normally translates to using multiple threads, but it may be possible to achieve concurrency via asynchronous (non-blocking) file reads and writes.
However, adding multiple threads to a program opens up a huge can-of-worms.
If you do decide to go the MT route, I'd suggest that you consider depending on a thread-aware I/O library. Qt may provide that for you (I'm not sure). If not, take a look at boost::asio (or ACE for an older, but still solid, solution). You'll discover that using the MT capabilities of such a library involves a considerable investment in learning time; however, as it turns out, adding multithreading "by hand" and getting it right takes even longer.
So I'd say stay with your existing solution unless you are worried about unused processor cycles and/or jitter, in which case start learning Qt's multithreading support or boost::asio.
Do you need to start a new thread for each new connection? Could you not just have a pool of threads that acts on requests as and when they arrive? This should reduce some of the latency. I have to say that, in general, a multi-threaded FTP server should be more responsive than a single-threaded one. Is it possible to have an event-based FTP server?
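For what a pool looks like in practice, here is a minimal sketch (Python's concurrent.futures, used purely for illustration; the worker count, port, and echo handler are placeholders, not FTP logic). The threads are created once, so accepting a connection never pays thread start-up latency:

    import socket
    from concurrent.futures import ThreadPoolExecutor

    def handle(conn):
        with conn:
            while (data := conn.recv(4096)):
                conn.sendall(data)            # placeholder for real command handling

    pool = ThreadPoolExecutor(max_workers=8)  # threads are reused across connections
    srv = socket.socket()
    srv.bind(("127.0.0.1", 2121))
    srv.listen()
    while True:
        conn, _ = srv.accept()
        pool.submit(handle, conn)             # hand the connection to an idle pool thread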

Many threads or as few threads as possible?

As a side project I'm currently writing a server for an age-old game I used to play. I'm trying to make the server as loosely coupled as possible, but I am wondering what would be a good design decision for multithreading. Currently I have the following sequence of actions:
Startup (creates) ->
Server (listens for clients, creates) ->
Client (listens for commands and sends periodic data)
I'm assuming an average of 100 clients, as that was the max at any given time for the game. What would be the right decision as for threading of the whole thing? My current setup is as follows:
1 thread on the server which listens for new connections, on new connection create a client object and start listening again.
Client object has one thread, listening for incoming commands and sending periodic data. This is done using a non-blocking socket, so it simply checks if there's data available, deals with that and then sends messages it has queued. Login is done before the send-receive cycle is started.
One thread (for now) for the game itself, as I consider that to be separate from the whole client-server part, architecturally speaking.
This would result in a total of 102 threads. I am even considering giving the client 2 threads, one for sending and one for receiving. If I do that, I can use blocking I/O on the receiver thread, which means that thread will be mostly idle in an average situation.
My main concern is that by using this many threads I'll be hogging resources. I'm not worried about race conditions or deadlocks, as that's something I'll have to deal with anyway.
My design is setup in such a way that I could use a single thread for all client communications, no matter if it's 1 or 100. I've separated the communications logic from the client object itself, so I could implement it without having to rewrite a lot of code.
The main question is: is it wrong to use over 200 threads in an application? Does it have advantages? I'm thinking about running this on a multi-core machine, would it take a lot of advantage of multiple cores like this?
Thanks!
Most of these threads will usually be blocked. I don't expect more than 5 new connections per minute. Commands from the client will come in infrequently, I'd say 20 per minute on average.
Going by the answers I get here (the context switching was the performance hit I was thinking about, but I didn't know that until you pointed it out, thanks!) I think I'll go for the approach with one listener, one receiver, one sender, and some miscellaneous stuff ;-)
use an event stream/queue and a thread pool to maintain the balance; this will adapt better to other machines, which may have more or fewer cores
in general, many more active threads than you have cores will waste time context-switching
if your game consists of a lot of short actions, a circular/recycling event queue will give better performance than a fixed number of threads
To answer the question simply, it is entirely wrong to use 200 threads on today's hardware.
Each thread typically takes up 1 MB of memory for its stack, so you're taking up 200 MB of page file before you even start doing anything useful.
By all means break your operations up into little pieces that can be safely run on any thread, but put those operations on queues and have a fixed, limited number of worker threads servicing those queues.
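A minimal sketch of that queue-plus-fixed-workers shape (Python; the worker count of 4 and the print jobs are arbitrary stand-ins):

    import queue, threading

    tasks = queue.Queue()

    def worker():
        while True:
            op = tasks.get()           # sleeps while the queue is empty; no polling
            if op is None:             # shutdown sentinel
                break
            op()                       # run one small piece of work
            tasks.task_done()

    workers = [threading.Thread(target=worker) for _ in range(4)]
    for w in workers:
        w.start()

    for i in range(10):
        tasks.put(lambda i=i: print("processed piece", i))
    tasks.join()                       # wait for every piece to finish
    for _ in workers:
        tasks.put(None)                # stop the workers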
Update: Does wasting 200MB matter? On a 32-bit machine, it's 10% of the entire theoretical address space for a process - no further questions. On a 64-bit machine, it sounds like a drop in the ocean of what could be theoretically available, but in practice it's still a very big chunk (or rather, a large number of pretty big chunks) of storage being pointlessly reserved by the application, and which then has to be managed by the OS. It has the effect of surrounding each client's valuable information with lots of worthless padding, which destroys locality, defeating the OS and CPU's attempts to keep frequently accessed stuff in the fastest layers of cache.
In any case, the memory wastage is just one part of the insanity. Unless you have 200 cores (and an OS capable of utilizing them) then you don't really have 200 parallel threads. You have (say) 8 cores, each frantically switching between 25 threads. Naively you might think that as a result of this, each thread experiences the equivalent of running on a core that is 25 times slower. But it's actually much worse than that - the OS spends more time taking one thread off a core and putting another one on it ("context switching") than it does actually allowing your code to run.
Just look at how any well-known successful design tackles this kind of problem. The CLR's thread pool (even if you're not using it) serves as a fine example. It starts off assuming just one thread per core will be sufficient. It allows more to be created, but only to ensure that badly designed parallel algorithms will eventually complete. It refuses to create more than 2 threads per second, so it effectively punishes thread-greedy algorithms by slowing them down.
I write in .NET and I'm not sure if the way I code is due to .NET limitations and their API design or if this is a standard way of doing things, but this is how I've done this kind of thing in the past:
A queue object that will be used for processing incoming data. Access to it should be synchronized between the queuing thread and the worker thread to avoid race conditions.
A worker thread for processing data in the queue. The thread that fills the queue uses a semaphore to notify this thread to process items in the queue. This thread will start itself before any of the other threads and contain a continuous loop that runs until it receives a shutdown request. The first instruction in the loop checks a flag to pause/continue/terminate processing. The flag will initially be set to pause, so that the thread sits in an idle state (instead of looping continuously) while there is no processing to be done. The queuing thread will change the flag when there are items in the queue to be processed. This thread will then process a single item in the queue on each iteration of the loop. When the queue is empty it will set the flag back to pause, so that on the next iteration of the loop it will wait until the queuing process notifies it that there is more work to be done.
One connection listener thread which listens for incoming connection requests and passes these off to...
A connection processing thread that creates the connection/session. Having this separate from your connection listener thread means you reduce the potential for missed connection requests while resources are tied up processing an earlier request.
An incoming data listener thread that listens for incoming data on the current connection. All data is passed off to a queuing thread to be queued up for processing. Your listener threads should do as little as possible outside of basic listening and passing the data off for processing.
A queuing thread that queues up the data in the right order so everything can be processed correctly; this thread signals the semaphore so the processing thread knows there is data to be processed. Having this thread separate from the incoming data listener means that you're less likely to miss incoming data.
Some session object which is passed between methods so that each user's session is self contained throughout the threading model.
This keeps the threading down to as simple but as robust a model as I've figured out. I would love to find a simpler model than this, but I've found that if I try to reduce the threading model any further, I start missing data on the network stream or missing connection requests.
It also assists with TDD (Test-Driven Development), in that each thread processes a single task and is much easier to write tests for. Having hundreds of threads can quickly become a resource allocation nightmare, while having a single thread becomes a maintenance nightmare.
It's far simpler to keep one thread per logical task, the same way you would have one method per task in a TDD environment, and you can logically separate what each should be doing. It's easier to spot potential problems and far easier to fix them.
What's your platform? If Windows then I'd suggest looking at async operations and thread pools (or I/O Completion Ports directly if you're working at the Win32 API level in C/C++).
The idea is that you have a small number of threads that deal with your I/O and this makes your system capable of scaling to large numbers of concurrent connections because there's no relationship between the number of connections and the number of threads used by the process that is serving them. As expected, .Net insulates you from the details and Win32 doesn't.
The challenge of using async I/O and this style of server is that the processing of client requests becomes a state machine on the server, and the data arriving triggers changes of state. Sometimes this takes some getting used to, but once you do, it's really rather marvellous ;)
I've got some free code that demonstrates various server designs in C++ using IOCP here.
If you're using unix or need to be cross platform and you're in C++ then you might want to look at boost ASIO which provides async I/O functionality.
I think the question you should be asking is not whether 200 threads in general is good or bad, but rather how many of those threads are going to be active at any given moment.
If only several of them are active at any given moment, while all the others are sleeping or waiting or whatnot, then you're fine. Sleeping threads, in this context, cost you nothing.
However if all of those 200 threads are active, you're going to have your CPU wasting so much time doing thread context switches between all those ~200 threads.