Thread Block Operation - multithreading

I asked two questions (1, 2) about reading a file from a thread so that the main thread doesn't get blocked. My problem wasn't so much writing and starting a thread; my problem was that I didn't understand which operations are blocking. I've been told before that reading from a file is blocking, and that the for loop in my second example is blocking, but I don't really understand why, or how to spot it when looking at a piece of code.
So my question obviously is, how do you spot or determine when an operation is blocking a thread, and how do you fix it?

So my question obviously is, how do you spot or determine when an
operation is blocking a thread
There's no magic way to do it; in general you have to read the documentation for whatever functions you call in order to get an idea about whether they are guaranteed to return quickly or whether they might block for an extended period of time.
If you're looking at a running program and want to know what its threads are currently doing, you can either watch them using a debugger, or insert print statements at various locations so that you can tell (by seeing what text gets printed to stdout and what text doesn't) roughly where the thread is at and what it is doing.
, and how do you fix it?
Blocking is not "broken", so there's nothing to fix. Blocking is intentional behavior, so that e.g. when you call a function that reads from disk, it can give you back some useful data when it returns. (Consider the alternative of a non-blocking read, which would always return immediately, but in most cases wouldn't be able to provide you with any data, since the hard drive hasn't had time to load any data yet -- not terribly useful.)
That said, for network operations you can set your socket(s) to non-blocking mode so that calls to send(), recv(), etc are guaranteed never to block (they will return an error code instead). That only works for networking though; most OS's don't support non-blocking I/O for disk access.

Related

Is the cancellation of blocked file/socket functions guaranteed when a fileno is closed?

I read in a lot of place (and sometimes see in code) the idea that closing a file number in some thread of a process will immediately cancel and unblock any I/O blocking call on that file (regular file, socket, or something else) made by another thread.
However, I couldn't find any normative reference guaranteeing that this will happen, or even Linux-specific documentation stating that it works this way by design and not by accident.
Is the practice rooted in simple tradition, and inherently unsafe in all cases, or is it somehow guaranteed to work?
(Anyhow, the idea is inherently unsafe in practical multi-threaded code with any part you don't control fully, as you can't know when other threads might re-use that given fileno. I'm not recommending it.)

Will non-blocking I/O be put to sleep during copying data from kernel to user?

I ask this question because I am looking at multiplexing I/O in Go, which uses epoll_wait.
When a socket is ready, a goroutine will be woken up and begin to read the socket in non-blocking mode. If the read system call can still block while copying data from kernel to user space, I assume the kernel thread the goroutine is attached to will be put to sleep as well.
I am not sure of that, hoping someone can help correct me if I am wrong.
I fail to quite parse what you've written.
I'll try to make a sheer guess and conjecture that you might be overlooking the fact that the write(2) and read(2) syscalls (and those of their ilk such as send(2) and recv(2)) on sockets put into non-blocking mode are free to consume (and return, respectively) less data than requested.
In other words, a write(2) call on a non-blocking socket told to write 1 megabyte of data will consume only as much data as currently fits into the associated kernel buffer and return immediately, signalling that it consumed only that much data. The next immediate call to write(2) will likely return EWOULDBLOCK.
The same goes for the read(2) call: if you pass it a buffer large enough to hold 1 megabyte of data, and tell it to read that number of bytes, the call will only drain the contents of the kernel buffer and return immediately, signaling how much data it actually copied. The next immediate call to read(2) will likely return EWOULDBLOCK.
So, any attempt to get data from or put data to the socket returns almost immediately: either after the data has been shoveled between the kernel's buffer and the user space, or right away with the EAGAIN error code.
Sure, there's supposedly a possibility for an OS thread to be suspended right in the middle of performing such a syscall, but this does not count as "blocking in a syscall."
Update to the original answer in response to the following comment of the OP:
<…>
This is what I see in the book
"UNIX Network Programming" (Volume 1, 3rd edition), chapter 6.2:
A synchronous I/O operation causes the requesting process
to be blocked until that I/O operation completes. Using these
definitions, the first four I/O models—blocking, nonblocking, I/O
multiplexing, and signal-driven I/O—are all synchronous because the
actual I/O operation (recvfrom) blocks the process.
It uses "blocks" to describe a nonblocking I/O operation. That makes me confused.
I still don't understand why the book uses "blocks the process" if the process is actually not blocked.
I can only guess that the book's author intended to highlight that the process is indeed blocked since entering a syscall and until returning from it. Reads from and writes to a non-blocking socket do block to transfer the data, if available, between the kernel and the user space. We colloquially say this does not block because we mean "it does not block waiting and doing nothing for an indeterminate amount of time".
The book's author might contrast this to the so-called asynchronous I/O (called "overlapped" on Windows™)—where you basically give the kernel a buffer with/for data and ask it to handle it completely in parallel with your code—in the sense that the relevant syscall returns right away and the I/O is carried out in the background (with regard to your user-space code).
To my knowledge, Go does not use the kernel's async I/O facilities on any platform it supports. You might watch that space for developments regarding Linux and its contemporary io_uring subsystem.
Oh, and one more point. The book might (at that point through the narrative at least) be discussing a simplified "classic" scheme where there are no in-process threads, and the sole unit of concurrency is the process (with a single thread of execution). In this scheme, any syscall obviously blocks the whole process. In contrast, Go works only on kernels which support threads, so in a Go program a syscall never blocks the whole process—only the thread it's called on.
Let me take yet another stab at explaining the problem as—I perceive—the OP stated it.
The problem of serving multiple client requests is not new—one of the more visible first statements of it is "The C10k problem".
To quickly recap it, a single threaded server with blocking operations on the sockets it manages is only realistically able to handle a single client at a time.
To solve it, there exist two straightforward approaches:
Fork a copy of the server process to handle each incoming client connection.
On an OS which supports threads, fork a new thread inside the same process to handle each incoming client.
They have their pros and cons but they both suck with regard to resource usage, and—which is more important—they do not play well with the fact most clients have relatively low rate and bandwidth of I/O they perform with regard to the processing resources available on a typical server.
In other words, when serving a typical TCP/IP exchange with a client, the serving thread most of the time sleeps in the write(2) and read(2) calls on the client socket.
This is what most people mean when talking about "blocking operations" on sockets: if a socket is blocking, an operation on it will block until it can actually be carried out, and the originating thread will be put to sleep for an indeterminate amount of time.
Another important thing to note is that when the socket becomes ready, the amount of work done is typically minuscule compared to the amount of time slept between the wakeups.
While the thread sleeps, its resources (such as memory) are effectively wasted, as they cannot be used to do any other work.
Enter "polling". It combats the problem of wasted resources by noticing that the points of readiness of networked sockets are relatively rare and far between, so it makes sense to have lots of such sockets served by a single thread: this keeps the thread almost as busy as theoretically possible, and also allows you to scale out when needed: if a single thread is unable to cope with the data flow, add another thread, and so on.
This approach is definitely cool but it has a downside: the code which reads and writes data must be re-written to use callback style instead of the original plain sequential style. Writing with callbacks is hard: you usually have to implement intricate buffer management and state machines to deal with this.
The Go runtime solves this problem by adding another layer of scheduling for its execution flow units—goroutines: for goroutines, operations on the sockets are always blocking, but when a goroutine is about to block on a socket, this is transparently handled by suspending only the goroutine itself—until the requested operation will be able to proceed—and using the thread the goroutine was running on to do other work¹.
This allows you to have the best of both approaches: the programmer may write classic no-brainer sequential callback-free networking code, but the threads used to handle networking requests are fully utilized².
As to the original question of blocking, both the goroutine and the thread it runs on are indeed blocked when the data transfer on a socket is happening, but since what happens is data shoveling between a kernel and a user-space buffer, the delay is most of the time small, and is no different to the classic "polling" case.
Note that performing syscalls—including I/O on non-pollable descriptors—in Go (at least up until, and including, Go 1.14) does block both the calling goroutine and the thread it runs on, but is handled differently from those on pollable descriptors: when a special monitoring thread notices a goroutine has spent more than a certain amount of time in a syscall (20 µs, IIRC), the runtime pulls the so-called "processor" (a runtime thing which runs goroutines on OS threads) from under the goroutine and tries to make it run another goroutine on another OS thread; if there is a goroutine wanting to run but no free OS thread, the Go runtime creates another one.
Hence "normal" blocking I/O is still blocking in Go in both senses: it blocks both goroutines and OS threads, but the Go scheduler makes sure the program as a whole is still able to make progress.
This could arguably be a perfect case for using true asynchronous I/O provided by the kernel, but it's not there yet.
¹ See this classic essay for more info.
² The Go runtime is certainly not the first one to pioneer this idea. For instance, look at the State Threads library (and the more recent libtask) which implement the same approach in plain C; the ST library has superb docs which explain the idea.

Blocking and Non-Blocking I/O

What is the difference between blocking I/O and non-blocking I/O in Unix-like systems? Can anyone explain these concepts with a real-world scenario? I have already gone through the references available online and in books, but I am still not able to understand the use of non-blocking I/O. Can anyone summarize what you know about it, instead of pointing to theory that is already written up elsewhere?
Often one has a process that does more than one single task. Some of those tasks may depend on external data.
Now imagine one of those tasks has to listen to some client that might make a request, and to handle that request. For this the process must open a socket and listen for requests. With a blocking socket the process would now hang until a request actually comes in. This means that none of the other tasks the process has to handle can be handled until a request comes in! With a non-blocking socket, however, the socket call returns immediately if no request is pending. So the process can handle other tasks and come back to check for client requests on a regular basis.
The same applies to files being read as input, though not as frequently: if a file is read while another process is still writing into it, a blocking read access will hang. A non-blocking access again allows you to do other stuff in the meantime and return to reading the file later or on a regular basis. This is very important for log file processing, for example: files that by definition keep having data appended to them.
Other approaches exists to handle this issue. But blocking and non-blocking modes for file-/socket operations are a practical thing to keep complexity low.
Roughly.
When you buy a new house that is yet to be constructed, you use non-blocking behavior: you buy it and do not wait (non-block) on site until the end of the construction. You just continue living your life, and some time later either the builder calls to tell you that your new house is ready (signal, interrupt - passive wait), or you call him regularly to ask about the progress of the construction (poll - active wait).
When you go to a restaurant, you use blocking behavior: you place your order and then wait (block) until you are served.
In general, when you need something because you can't go further without what you need, you use a blocking scenario. When you need something but can do something else if what you need is not available now, you use a non-blocking scenario.

Why do we need both threading and asynchronous calls

My understanding is that threads exist as a way of doing several things in parallel that share the same address space but each has its individual stack. Asynchronous programming is basically a way of using fewer threads. I don't understand why it's undesirable to just have blocking calls and a separate thread for each blocked command?
For example, suppose I want to scrape a large part of the web. A presumably uncontroversial implementation might be to have a large number of asynchronous loops. Each loop would ask for a webpage, wait to avoid overloading the remote server, then ask for another webpage from the same website until done. The loops would then be executed on a much smaller number of threads (which is fine because they mostly wait). So to restate the question, I don't see why it's any cheaper to e.g. maintain a threadpool within the language runtime than it would be to just have one (mostly blocked) OS thread per loop and let the operating system deal with the complexity of scheduling the work? After all, if piling two different schedulers on top of each other is so good, it can still be implemented that way in the OS :-)
It seems obvious the answer is something like "threads are expensive". However, a thread just needs to keep track of where it has got to before it was interrupted the last time. This is pretty much exactly what an asynchronous command needs to know before it blocks (perhaps representing what happens next by storing a callback). I suppose one difference is that a blocking asynchronous command does so at a well defined point whereas a thread can be interrupted anywhere. If there really is a substantial difference in terms of the cost of keeping the state, where does it come from? I doubt it's the cost of the stack since that wastes at most a 4KB page, so it's a non-issue even for 1000s of blocked threads.
Many thanks, and sorry if the question is very basic. It might be I just cannot figure out the right incantation to type into Google.
Threads consume memory, as they need to have their state preserved even if they aren't doing anything. If you make an asynchronous call on a single thread, it's literally (aside from registering that the call was made somewhere) not consuming any resources until it needs to be resolved, because if it isn't actively being processed you don't care about it.
If the architecture of your application is written in a way that the resources it needs scale linearly (or worse) with the number of users / traffic you receive, then that is going to be a problem. You can watch this talk about node.js if you want to watch someone talk at length about this.
https://www.youtube.com/watch?v=ztspvPYybIY

Why can a signal handler not run during system calls?

In Linux a process that is waiting for IO can be in either of the states TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
The latter, for example, is the case when the process waits until the read from a regular file completes. If this read takes very long or forever (for whatever reasons), it leads to the problem that the process cannot receive any signals during that time.
The former is the case if, for example, one waits for a read from a socket (which may take an unbounded amount of time). If such a read call gets interrupted, however, it leads to the problem that it requires the programmer/user to handle the errno == EINTR condition correctly, which he might forget.
My question is: Wouldn't it be possible to allow all system calls to be interrupted by signals and, moreover, in such a way that after the signal handler ran, the original system call simply continues its business? Wouldn't this solve both problems? If it is not possible, why?
The two halves of the question are related but I may have digested too much spiked eggnog to determine if there is so much a one-to-one as opposed to a one-to-several relationship here.
On the kernel side I think a TASK_KILLABLE state was added a couple of years back. The consequence of that was that while the process blocked in a system call (for whatever reason), if the kernel encountered a signal that was going to kill the process anyway, it would let nature take its course. That reduces the chances of a process being permanently stuck in an uninterruptible state. What progress has been made in changing individual system calls to avail themselves of that option, I do not know.
On the user side, SA_RESTART and its convenience function cousin siginterrupt go some distance to alleviating the application programmer pain. But the fact that these are specified per signal and aren't honored by all system calls is a pretty good hint that there are reasons why a blanket interruption scheme like you propose can't be implemented, or at least easily implemented. Just thinking about the possible interactions of the matrices of signals, individual system calls, calls that have historically expected and backwardly supported behavior (and possibly multiple historical semantics based on their system bloodline!), is a bit dizzying. But maybe that is the eggnog.
