asyncio only processes one file at a time

I'm working on a program to upload (large) files to a remote SFTP server while also calculating each file's SHA256. The uploads are slow, so the program is supposed to open multiple SFTP connections.
Here is the main code:
async def bwrite(fd, buf):
    log.debug('Writing %d bytes to %s', len(buf), fd)
    fd.write(buf)
async def digester(digest, buf):
    log.debug('Updating digest %s with %d more bytes', digest, len(buf))
    digest.update(buf)
async def upload(fNames, SFTP, rename):
    for fName in fNames:
        inp = open(fName, "rb", 655360)
        log.info('Opened local %s', fName)
        digest = hashlib.sha256()
        rName = rename % os.path.splitext(fName)[0]
        out = SFTP.open(rName, "w", 655360)
        ...
        while True:
            buf = inp.read(bsize)
            if not buf:
                break
            await bwrite(out, buf)
            await digester(digest, buf)
        inp.close()
        out.close()
...
for i in range(0, len(clients)):
    fNames = args[(i * chunk):((i + 1) * chunk)]
    log.debug('Connection %s: %d files: %s',
        clients[i], len(fNames), fNames)
    uploads.append(upload(fNames, clients[i], Rename))
log.info('%d uploads initiated, awaiting completion', len(uploads))
results = asyncio.gather(*uploads)
loop = asyncio.get_event_loop()
loop.run_until_complete(results)
loop.close()
The idea is for multiple upload coroutines to run "in parallel" -- each using its own separate SFTP-connection -- pushing out one or more files to the server.
It even works -- but only a single upload is running at any time. I expected multiple ones to get control -- while their siblings await the bwrite and/or the digester. What am I doing wrong?
(Using Python-3.6 on FreeBSD-11, if that matters... The program is supposed to run on RHEL7 eventually...)

If no awaiting is involved, then no parallelism can occur. bwrite and digester, while declared async, perform no async operations (not launching or creating coroutines that can be awaited); if you removed the async in their name and removed the await where they're called, the code would behave identically.
The only time asyncio can get you benefits is when:
1. There is a blocking operation involved, and
2. Said blocking operation is designed for asyncio use (or involves a file descriptor that can be rewrapped for said purposes)
Your bwrite isn't doing that (it's doing normal blocking I/O on SFTP objects, which doesn't appear to be async-friendly; if it was async, your failure to either return the future it produced, or await it yourself, would usually mean it does nothing, barring the off-chance it self-scheduled a task internally), your reads from the input file aren't either (which is fine; normally, changing blocking I/O to asyncio for local file access isn't beneficial; it's all being buffered, at user level and kernel level, so you almost never block on the writes). Nor is your digester (hashing operations are CPU bound; it never makes sense to make them async unless they're being done with actual async stuff).
Since the two awaits in upload are effectively synchronous (they don't return anything that, when awaited, would actually block on actual asynchronous tasks), upload itself is effectively synchronous (it will never, under any circumstances, return control to the event loop before it completes). So even though all the other tasks are in the event loop queue, raring to go, the event loop itself has to wait until the running task blocks in an async-friendly way (with await on something that actually does background work while blocked), which never happens, and the tasks just get run sequentially, one after the other.
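To see the effect in isolation, here is a minimal sketch (not the asker's program; the worker and helper names are made up for illustration). It uses asyncio.run, which needs Python 3.7+; on 3.6 you would go through loop.run_until_complete instead. One pair of workers awaits a coroutine that never suspends, the other awaits asyncio.sleep(0), which really does hand control back to the event loop:

```python
import asyncio

order = []

async def fake_step(name, i):
    # Declared async, but it never awaits anything that suspends,
    # so awaiting it does not give control back to the event loop.
    order.append((name, i))

async def worker_blocking(name):
    for i in range(3):
        await fake_step(name, i)

async def worker_yielding(name):
    for i in range(3):
        order.append((name, i))
        await asyncio.sleep(0)  # a real suspension point: other tasks get a turn

async def run(worker):
    order.clear()
    await asyncio.gather(worker("a"), worker("b"))
    return list(order)

sequential = asyncio.run(run(worker_blocking))
interleaved = asyncio.run(run(worker_yielding))
print(sequential)   # all of "a" first, then all of "b"
print(interleaved)  # "a" and "b" alternate
```

The first run behaves like the upload coroutines in the question: each task runs to completion before the next one starts.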
If an async-friendly version of your SFTP module exists, that might allow you to gain some benefit. But without it, you're probably better off using concurrent.futures.ThreadPoolExecutor or multiprocessing.pool.ThreadPool to do preemptive multitasking (which will swap out threads whenever they release the GIL, forcibly swapping between bytecodes if they don't release the GIL for a while). That will get you parallelism on any blocking I/O (async-friendly or not), and, if the data is large enough, on the hashing work as well (hashlib is one of the only Python built-ins I know of that releases the GIL for CPU-bound work, if the data to be hashed is large enough; extension modules releasing the GIL is the only way multithreaded CPython can do more than one core's worth of CPU-bound work in a single process, even on multicore systems).
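A rough sketch of the thread-pool approach, under some loud assumptions: upload_one is a hypothetical helper, and in-memory io.BytesIO objects stand in for both the local files and the SFTP file handles so that the example runs anywhere. With a real SFTP client, each worker would use its own connection.

```python
import hashlib
import io
from concurrent.futures import ThreadPoolExecutor

def upload_one(name, src, dst, bsize=65536):
    """Copy src to dst in chunks while computing SHA-256 (plain blocking I/O)."""
    digest = hashlib.sha256()
    while True:
        buf = src.read(bsize)
        if not buf:
            break
        dst.write(buf)      # stand-in for a blocking SFTP write
        digest.update(buf)
    return name, digest.hexdigest()

# Hypothetical payloads and sinks, in memory only.
payloads = {"a.bin": b"a" * 200000, "b.bin": b"b" * 300000}
sinks = {name: io.BytesIO() for name in payloads}

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [
        pool.submit(upload_one, name, io.BytesIO(data), sinks[name])
        for name, data in payloads.items()
    ]
    # result() re-raises any exception that happened in the worker thread.
    results = dict(f.result() for f in futures)

print(results)
```

No async/await is needed here; the threads are switched preemptively whenever one of them blocks on I/O.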

Related

Doubts about event loop, multi-threading, order of executing readFile() in Node.js?

fs.readFile("./large.txt", "utf8", (err, data) => {
    console.log('It is a large file')
    //this file has many words (11X MB).
    //It takes 1-2 seconds to finish reading (usually 1)
});
fs.readFile("./small.txt","utf8", (err, data) => {
    for(let i=0; i<99999 ;i++)
        console.log('It is a small file');
    //This file has just one word.
    //It always takes 0 second
});
Result:
The console will always first print "It is a small file" 99999 times (it takes around 3 seconds to finish printing).
Then, after they are all printed, the console does not immediately print "It is a large file". (It is always printed after 1 or 2 seconds).
My thought:
So, it seems that the first readFile() and the second readFile() do not run in parallel. If they did, I would expect that by the time "It is a small file" had been printed 99999 times, the first readFile() would long since have finished reading (it needs just 1 second), and the console would immediately print the output of its callback (i.e. "It is a large file").
My questions are :
(1a) Does this mean that the first readFile() will start to read file only after the callback of second readFile() has done its work?
(1b) To my understanding, in nodeJs, event loop passes the readFile() to Libuv multi-thread. However, I wonder in what order they are passed. If these two readFile() functions do not run in parallel, why is the second readFile() function always executed first?
(2) By default, Libuv has four threads for Node.js. So, here, do these two readFile() run in the same thread? Among these four threads, I am not sure whether there is only one for readFile().
Thank you very much for spending your time! Appreciate!

I couldn't possibly believe that node would delay the large file read until the callback for the small file read had completed, and so I did a little more instrumentation of your example:
const fs = require('fs');
const readLarge = process.argv.includes('large');
const readSmall = process.argv.includes('small');
if (readLarge) {
    console.time('large');
    fs.readFile('./large.txt', 'utf8', (err, data) => {
        console.timeEnd('large');
        if (readSmall) {
            console.timeEnd('large (since busy wait)');
        }
    });
}
if (readSmall) {
    console.time('small');
    fs.readFile('./small.txt', 'utf8', (err, data) => {
        console.timeEnd('small');
        var stop = new Date().getTime();
        while(new Date().getTime() < stop + 3000) { ; } // busy wait
        console.time('large (since busy wait)');
    });
}
(Note that I replaced your loop of console.logs with a 3s busy wait).
Running this against node v8.15.0 I get the following results:
$ node t small # read small file only
small: 0.839ms
$ node t large # read large file only
large: 447.348ms
$ node t small large # read both files
small: 3.916ms
large: 3252.752ms
large (since busy wait): 247.632ms
These results seem sane; the large file took ~0.5s to read on its own, but when the busy waiting callback interfered for 3s, it completed relatively soon (~1/4s) thereafter. Tweaking the length of the busy wait keeps this relatively consistent, so I'd be willing to just say this was some kind of scheduling overhead and not necessarily a sign that the large file I/O had not been running during the busy wait.
But then I ran the same program against node 10.16.3, and here's what I got:
$ node t small
small: 1.614ms
$ node t large
large: 1019.047ms
$ node t small large
small: 3.595ms
large: 4014.904ms
large (since busy wait): 1009.469ms
Yikes! Not only did the large file read time more than double (to ~1s), it certainly appears as if no I/O at all had been completed before the busy wait concluded! i.e., it sure looks like the busy wait in the main thread prevented any I/O at all from happening on the large file.
I suspect that this change from 8.x to 10.x is a result of this "optimization" in Node 10: https://github.com/nodejs/node/pull/17054. This change, which splits the read of large files into multiple operations, seems to be appropriate for smoothing performance of the system in general purpose cases, but it is likely exacerbated by the unnatural long main thread processing / busy wait in this scenario. Presumably, without the main thread yielding, the I/O is not getting a chance to advance to the next range of bytes in the large file to be read.
It would seem that, with Node 10.x, it is important to have a responsive main thread (i.e., one that yields frequently, and doesn't busy wait like in this example) in order to sustain I/O performance of large file reads.

(1a) Does this mean that the first readFile() will start to read file only after the callback of second readFile() has done its work?
No. Each readFile() actually consists of multiple steps (open file, read chunk, read chunk ... close file). The logic flow between steps is controlled by Javascript code in the node.js fs library. But, a portion of each step is implemented by native threaded code in libuv using a thread pool.
So, the first step of the first readFile() will be initiated and then control is returned back to the JS interpreter. Then, the first step of the second readFile() will be initiated and then control returned back to the JS interpreter. It can ping pong back and forth between progress in the two readFile() operations as long as the JS interpreter isn't kept busy. But, if the JS interpreter does get busy for a while, it will stall further progress when the current step that's proceeding in the background completes. There's a full step-by-step chronology at the end of the answer if you want to follow the details of each step.
(1b) To my understanding, in nodeJs, event loop passes the readFile() to Libuv multi-thread. However, I wonder in what order they are passed. If these two readFile() functions do not run in parallel, why is the second readFile() function always executed first?
fs.readFile() itself is not implemented in libuv. It's implemented as a series of individual steps in node.js Javascript. Each individual step (open file, read chunk, close file) is implemented in libuv, but Javascript in the fs library controls the sequencing between steps. So, think of fs.readFile() as a series of calls to libuv. When you have two fs.readFile() operations in flight at the same time, each will have some libuv operation going at any given time and one step for each fs.readFile() can be proceeding in parallel due to the thread pool implementation in libuv. But, between each step in the process, control comes back to the JS interpreter. So, if the interpreter gets busy for some extended portion of time, then further progress in scheduling the next step of the other fs.readFile() operation is stalled.
(2) By default, Libuv has four threads for Node.js. So, here, do these two readFile() run in the same thread? Among these four threads, I am not sure whether there is only one for readFile().
I think this is covered in the previous two explanations. readFile() itself is not implemented in native code of libuv. Instead, it's written in Javascript with calls to open, read, close operations that are written in native code and use libuv and the thread pool.
Here's a full accounting of what's going on. To fully understand, one needs to know about these:
Main Concepts
The single threaded, non-pre-emptive nature of node.js running your Javascript (assuming no WorkerThreads are manually coded here - which they aren't).
The multi-threaded, native code of the fs module's file I/O and how that works.
How native code asynchronous operations communicate completion via the event queue and how event loop scheduling works when the JS interpreter is busy doing something.
Asynchronous, Non-Blocking
I presume you know that fs.readFile() is asynchronous and non-blocking. That means when you call it, all it does is initiate an operation to read the file and then it goes right onto the next line of code at the top level after the fs.readFile() (not the code inside the callback you pass to it).
So, a condensed version of your code is basically this:
fs.readFile(x, funcA);
fs.readFile(y, funcB);
If we added some logging to this:
function funcA() {
    console.log("funcA");
}
function funcB() {
    console.log("funcB");
}
function spin(howLong) {
    let finishTime = Date.now() + howLong;
    // spin until howLong ms passes
    while (Date.now() < finishTime) {}
}
console.log("1");
fs.readFile(x, funcA);
console.log("2");
fs.readFile(y, funcB);
console.log("3");
spin(30000); // spin for 30 seconds
console.log("4");
You would see either this order:
1
2
3
4
funcA
funcB
or this order:
1
2
3
4
funcB
funcA
Which of the two it was would just depend upon the indeterminate race between the two fs.readFile() operations. Either could happen. Also, notice that 1, 2, 3 and 4 are all logged before any asynchronous completion events can occur. This is because the single-threaded, non-pre-emptive JS interpreter main thread is busy executing Javascript. It won't pull the next event out of the event queue until it's done executing this piece of Javascript.
Libuv Thread Pool
As you appear to already know, the fs module uses a libuv thread pool for running file I/O. That's independent of the main JS thread so those read operations can proceed independently from further JS execution. Using native code, file I/O will communicate with the event queue when they are done to schedule their completion callback.
Indeterminate Race Between Two Asynchronous Operations
So, you've just created an indeterminate race between the two fs.readFile() operations that are likely each running in their own thread. A small file is much more likely to complete first before the larger file because the larger file has a lot more data to read from the disk.
Whichever fs.readFile() finishes first will insert its callback into the event queue first. When the JS interpreter is free, it will pick the next event out of the event queue. Whichever one finishes first gets to run its callback first. Since the small file is likely to finish first (which is what you are reporting), it gets to run its callback. Now, when it is running its callback, this is just Javascript and even though the large file may finish and insert its callback into the event queue, that callback can't run until the callback from the small file finishes. So, it finishes and THEN the callback from the large file gets to run.
In general, you should never write code like this unless you don't care at all what order the two asynchronous operations finish in because it's an indeterminate race and you cannot count on which one will finish first. Because of the asynchronous non-blocking nature of fs.readFile(), there is no guarantee that the first file operation initiated will finish first. It's no different than firing off two separate http requests one after the other. You don't know which one will complete first.
Step By Step Chronology
Here's a step by step chronology of what happens:
You call fs.readFile("./large.txt", ...);
In Javascript code, that initiates opening the large.txt file by calling native code and then returns. The opening of the actual file is handled by libuv in native code and when that is done, an event will be inserted into the JS event queue.
Immediately after that operation is initiated, that first fs.readFile() returns (not done yet, still processing internally).
Now the JS interpreter picks up at the next line of code and runs fs.readFile("./small.txt", ...);
In Javascript code, that initiates opening the small.txt file by calling native code and then returns. The opening of the actual file is handled by libuv in native code and when that is done, an event will be inserted into the JS event queue.
Immediately after that operation is initiated, that second fs.readFile() returns (not done yet, still processing internally).
The JS interpreter is actually free to run any following code or process any incoming events.
Then, some time later, one of the two fs.readFile() operations finishes its first step (opening the file), an event is inserted into the JS event queue and when the JS interpreter has time, a callback is called. Since opening each file is about the same operation time, it's likely that the open operation for the large.txt file finishes first, but that isn't guaranteed.
After the file open succeeds, it initiates an asynchronous operation to read the first chunk from the file. This again is asynchronous and is handled by libuv so as soon as this is initiated, it returns control back to the JS interpreter.
The second file open likely finishes next and it does the same thing as the first, initiates reading the first chunk of data from disk and returns control back to the JS interpreter.
Then, one of these two chunk reads finishes and inserts an event into the event queue and when the JS interpreter is free, a callback is called to process that. At this point, this could be either the large or small file, but for purposes of simplicity of explanation, let's assume the first chunk of the large file finishes first. It will buffer that chunk, see that there is more data to read and will initiate another asynchronous read operation and then return control back to the JS interpreter.
Then, the other first chunk read finishes. It will buffer that chunk and see that there is no more data to read. At this point, it will issue a file close operation which is again handled by libuv and control is returned back to the JS interpreter.
One of the two previous operations completes (a second block read from large.txt or a file close of small.txt) and its callback is called. Since the close operation doesn't have to actually touch the disk (it just goes into the OS), let's assume the close operation finishes first for purposes of explanation. That close triggers the end of the fs.readFile() for small.txt and calls the completion callback for that.
So, at this point, small.txt is done and large.txt has read one chunk from its file and is awaiting completion of the second chunk to read.
Your code now executes the for loop that takes whatever time that takes.
By the point that finishes and the JS interpreter is free again, the second chunk read from large.txt is probably done, so the JS interpreter finds its event in the event queue and executes a callback to do some more processing on reading more chunks from that file.
The process of reading a chunk, returning control back to the interpreter, waiting for the next chunk completion event and then buffering that chunk continues until all the data has been read.
Then a close operation is initiated for large.txt.
When that close operation is done, the callback for the fs.readFile() for large.txt is called and your code that is timing large.txt will measure completion.
So, because the logic of fs.readFile() is implemented in Javascript with a number of discrete asynchronous steps with each one ultimately handled by libuv (open file, read chunk - N times, close file), there will be an interleaving of the work between the two files. The reading of the smaller file will finish first just because it has fewer and smaller read operations. When it finishes, the large file will still have multiple more chunks to read and a close operation left. Because the multiple steps of fs.readFile() are controlled through Javascript, when you do the long for loop in the small.txt completion, you are stalling the fs.readFile() operation for the large.txt file too. Whatever chunk read was in progress when that loop happened will complete in the background, but the next chunk read won't get issued until that small file callback completes.
It appears that there would be an opportunity for node.js to improve the responsiveness of fs.readFile() in competitive circumstances like this if that operation was rewritten entirely in native code so that one native code operation could read the contents of the whole file rather than all these transitions back and forth between the single threaded main JS thread and libuv. If this was the case, the big for loop wouldn't stall the progress of large.txt because it would be progressing entirely in a libuv thread rather than waiting for some cycles from the JS interpreter in order to get to its next step.
We can theorize that if both files were able to be read in one chunk, then not much would get stalled by the long for loop. Both files would get opened (which should take approximately the same time for each). Both operations would initiate a read of their first chunk. The read for the smaller file would likely complete first (less data to read), but actually this depends upon both OS and disk controller logic. Because the actual reads are handed off to threaded code, both reads will be pending at the same time. Assuming the smaller read finishes first, it would fire completion and then during the busy loop the large read would finish, inserting an event in the event queue. When the busy loop finishes, the only thing left to do on the larger file (but still something that could be read in one chunk) would be to close the file, which is a faster operation.
But, when the larger file can't be read in one chunk and needs multiple chunks of reading, that's why its progress really gets stalled by the busy loop because a chunk finishes, but the next chunk doesn't get scheduled until the busy loop is done.
Testing
So, let's test out all this theory. I created two files. small.txt is 558 bytes. large.txt is 255,194,500 bytes.
Then, I wrote the following program to time these and allow us to optionally do a 3 second spin loop after the small one finishes.
const fs = require('fs');
let doSpin = false; // -s will set this to true
let fname = "./large.txt";
for (let i = 2; i < process.argv.length; i++) {
    let arg = process.argv[i];
    console.log(`"${arg}"`);
    if (arg.startsWith("-")) {
        switch(arg) {
            case "-s":
                doSpin = true;
                break;
            default:
                console.log(`Unknown arg ${arg}`);
                process.exit(1);
                break;
        }
    } else {
        fname = arg;
    }
}
function padDecimal(num, n = 3) {
    let str = num.toFixed(n);
    let index = str.indexOf(".");
    if (index === -1) {
        str += ".";
        index = str.length - 1;
    }
    let zeroesToAdd = n - (str.length - index);
    while (zeroesToAdd-- >= 0) {
        str += "0";
    }
    return str;
}
let startTime;
function log(msg) {
    if (!startTime) {
        startTime = Date.now();
    }
    let diff = (Date.now() - startTime) / 1000; // in seconds
    console.log(padDecimal(diff), ":", msg)
}
function funcA(err, data) {
    if (err) {
        log("error on large");
        log(err);
        return;
    }
    log("large completed");
}
function funcB(err, data) {
    if (err) {
        log("error on small");
        log(err);
        return;
    }
    log("small completed");
    if (doSpin) {
        spin(3000);
        log("spin completed");
    }
}
function spin(howLong) {
    let finishTime = Date.now() + howLong;
    // spin until howLong ms passes
    while (Date.now() < finishTime) {}
}
log("start");
fs.readFile(fname, funcA);
log("large initiated");
fs.readFile("./small.txt", funcB);
log("small initiated");
Then (using node v12.13.0), I ran it both with and without the 3 second spin. Without the spin, I get this output:
0.000 : start
0.015 : large initiated
0.016 : small initiated
0.021 : small completed
0.240 : large completed
This shows a 0.219 second delta between the time to complete small and large (while running both at the same time).
Then, inserting the 3 second delay, we get this output:
0.000 : start
0.003 : large initiated
0.004 : small initiated
0.009 : small completed
3.010 : spin completed
3.229 : large completed
We have the exact same 0.219 second delta, this time between the end of the spin and the completion of the large file. This shows that the large fs.readFile() essentially made no progress during the 3 second spin. Its progress was completely blocked. As we've theorized in the previous explanation, this is apparently because the progression from one chunked read to the next is written in Javascript and while the spin loop is running, that progression to the next chunk is blocked so it can't make any further progress.
How Big A File Makes Big File Finish Second?
If you look in the code for fs.readFile() in the source for node v12.13.0, you can find that the chunk size it reads is 512 * 1024 which is 512k. So, in theory, it's possible that the larger file might finish first if it can be read in one chunk. Whether that actually happens or not depends upon some OS and disk implementation details, but I thought I'd try it on my laptop running a current version of Windows 10 with an SSD drive.
I found that, for a 255k "large" file, it does finish before the small file (essentially in execution order). So, because the large file read is started before the small file read, even though it has more data to read, it will still finish before the small file.
0.000 : start
0.003 : large initiated
0.003 : small initiated
0.007 : large completed
0.008 : small completed
Keep in mind, this is OS and disk dependent so this is not guaranteed.

File I/O in Node.js runs in a separate thread. But this does not matter. Node.js always executes all callbacks in the main thread. I/O callbacks are never executed in a separate thread (the file read operation is done in a separate thread, which, when it is finished, signals the main thread to run your callback). This essentially makes node.js single-threaded because all the code you write runs in the main thread (we're of course ignoring the worker_threads module/API which allows you to manually execute code in separate threads).
But the bytes in the files are read in parallel (or as parallel as your hardware allows - depending on the number of free DMA channels, which disk each file is from etc). What is parallel is the wait. Asynchronous I/O in any language (node.js, Java, C++, Python etc.) is basically an API that allows you to wait in parallel but handle events in a single thread. There is a word for this kind of parallel: concurrent. It is essentially parallel wait (while data is handled in parallel by your hardware) but not parallel code execution.
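Since Python is on that list, here is a small sketch of the same idea in asyncio (an illustration only, not Node.js code; wait_then_handle is a made-up name and asyncio.sleep stands in for real I/O). Five 0.2 s waits overlap, so the whole run takes roughly 0.2 s rather than 1 s, yet every piece of "handling" code runs on the same thread:

```python
import asyncio
import threading
import time

handler_threads = set()

async def wait_then_handle(delay):
    await asyncio.sleep(delay)                  # the waiting overlaps with the other waits
    handler_threads.add(threading.get_ident())  # the handling runs on the loop's thread
    return delay

async def main():
    started = time.monotonic()
    done = await asyncio.gather(*(wait_then_handle(0.2) for _ in range(5)))
    return done, time.monotonic() - started

done, elapsed = asyncio.run(main())
print(len(handler_threads), "thread(s) ran the handlers in", round(elapsed, 2), "s")
```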

I think you already understand the behavior of the event loop and libuv, so don't lose your way.
My answers:
1a) The two file reads are of course executed in two different threads. I tried running your code with the large file replaced by a small one, and the output was:
It is a large file
It is a small file
1b) In your case the second call simply finishes first, so its callback is invoked first.
2) As you said, libuv has four threads by default, but make sure that the default has not been changed by setting the env variable UV_THREADPOOL_SIZE ( http://docs.libuv.org/en/v1.x/threadpool.html )
I tried working with a large and a small file: my PC takes 23-25 ms to read the large file and 8-10 ms to read the small one.
When I read both, the process terminates in 26-27 ms, which demonstrates that the two file reads are executed in parallel.
Try measuring the time your code takes from the small file callback to the large file callback:
console.log(process.env.UV_THREADPOOL_SIZE)
const fs = require('fs')
const start = Date.now()
let smallFileEnd
fs.readFile("./alphabet.txt", "utf8", (err, data) => {
    console.log('It is a large file')
    console.log(`From the start to now are passed ${Date.now() - start} ms`)
    console.log(`From the small file end to now are passed ${Date.now() - smallFileEnd} ms`)
    //this file has many words (11X MB).
    //It takes 1-2 seconds to finish reading (usually 1)
    // 18ms to execute
});
fs.readFile("./stackoverflow.js","utf8", (err, data) => {
    for(let i=0; i<99999 ;i++)
        if(i === 99998){
            smallFileEnd = Date.now()
            console.log('is a small file ')
            console.log(`From the start to now are passed ${Date.now() - start} ms`)
            // 4/7 ms to execute
        }
});

Wrapping synchronous requests into asyncio (async/await)?

I am writing a tool in Python 3.6 that sends requests to several APIs (with various endpoints) and collects their responses to parse and save them in a database.
The API clients that I use have a synchronous version of requesting a URL, for instance they use
urllib.request.Request('...
Or they use Kenneth Reitz' Requests library.
Since my API calls rely on synchronous versions of requesting a URL, the whole process takes several minutes to complete.
Now I'd like to wrap my API calls in async/await (asyncio). I'm using python 3.6.
All the examples / tutorials that I found want me to change the synchronous URL calls / requests to an async version of it (for instance aiohttp). Since my code relies on API clients that I haven't written (and I can't change) I need to leave that code untouched.
So is there a way to wrap my synchronous requests (blocking code) in async/await to make them run in an event loop?
I'm new to asyncio in Python. This would be a no-brainer in NodeJS. But I can't wrap my head around this in Python.

The solution is to wrap your synchronous code in a thread and run it that way. I used that exact system to make my asyncio code run boto3 (note: remove inline type-hints if running < python3.6):
async def get(self, key: str) -> bytes:
    s3 = boto3.client("s3")
    loop = asyncio.get_event_loop()
    try:
        response: typing.Mapping = \
            await loop.run_in_executor(  # type: ignore
                None, functools.partial(
                    s3.get_object,
                    Bucket=self.bucket_name,
                    Key=key))
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchKey":
            raise base.KeyNotFoundException(self, key) from e
        elif e.response["Error"]["Code"] == "AccessDenied":
            raise base.AccessDeniedException(self, key) from e
        else:
            raise
    return response["Body"].read()
Note that this will work because the vast majority of the time in the s3.get_object() code is spent waiting for I/O, and (generally) while waiting for I/O Python releases the GIL (the GIL is the reason that threads in Python are generally not a good idea for CPU-bound work).
The first argument None in run_in_executor means that we run in the default executor. This is a threadpool executor, but it may make things more explicit to explicitly assign a threadpool executor there.
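A minimal sketch of the explicit variant, with assumptions stated up front: blocking_call is a hypothetical stand-in for a synchronous API client call (time.sleep plays the part of network latency), and the code uses asyncio.run and asyncio.get_running_loop, which need Python 3.7+; on 3.6 you would use get_event_loop and run_until_complete as in the snippet above.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(n):
    # Hypothetical stand-in for a synchronous API client call.
    time.sleep(0.2)
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    # An explicitly sized pool instead of the default executor (None).
    with ThreadPoolExecutor(max_workers=4) as pool:
        started = time.monotonic()
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_call, n) for n in range(4))
        )
        elapsed = time.monotonic() - started
    return results, elapsed

results, elapsed = asyncio.run(main())
print(results, round(elapsed, 2))
```

With four workers the four calls overlap, so the total is close to a single call's 0.2 s rather than 0.8 s; with max_workers=2 the same calls would run in two rounds.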
Note that, where using pure async I/O you could easily have thousands of connections open concurrently, using a threadpool executor means that each concurrent call to the API needs a separate thread. Once you run out of threads in your pool, the threadpool will not schedule your new call until a thread becomes available. You can obviously raise the number of threads, but this will eat up memory; don't expect to be able to go over a couple of thousand.
Also see the python ThreadPoolExecutor docs for an explanation and some slightly different code on how to wrap your sync call in async code.

Will Go block the current thread when doing I/O inside a goroutine?

I am confused over how Go handles non-blocking I/O. Go's APIs look mostly synchronous to me, and when watching presentations on Go, it's not uncommon to hear comments like "and the call blocks".
Is Go using blocking I/O when reading from files or the network? Or is there some kind of magic that re-writes the code when used from inside a goroutine?
Coming from a C# background, this feels very unintuitive, as in C# we have the await keyword when consuming async APIs, which clearly communicates that the API can yield the current thread and continue later inside a continuation.
TL;DR: will Go block the current thread when doing I/O inside a goroutine, or will it be transformed into a C#-like async/await state machine using continuations?

Go has a scheduler that lets you write synchronous code, and does context switching on its own and uses async I/O under the hood. So if you're running several goroutines, they might run on a single system thread, and when your code is blocking from the goroutine's view, it's not really blocking. It's not magic, but yes, it masks all this stuff from you.
The scheduler will allocate system threads when they're needed, including during operations that are really blocking (file I/O is blocking, for example, as is calling C code). But if you're writing a simple HTTP server, you can have thousands and thousands of goroutines actually using only a handful of "real threads".
You can read more about the inner workings of Go here.
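As a rough analogy only (Python's asyncio, not Go's scheduler), the sketch below runs a few thousand concurrent handlers on a single OS thread. The names handler, serve and demo are made up for this illustration. Unlike Go, the points where a task can be suspended are explicit (await), which is closer to the C# model the question mentions:

```python
import asyncio
import threading


async def handler(i, threads_seen):
    # Looks like a pause, but it hands control back to the event loop.
    await asyncio.sleep(0.01)
    threads_seen.add(threading.get_ident())
    return i


async def serve(n):
    threads_seen = set()
    results = await asyncio.gather(*(handler(i, threads_seen) for i in range(n)))
    # Number of completed handlers, number of distinct OS threads used.
    return len(results), len(threads_seen)


def demo(n=5000):
    return asyncio.run(serve(n))


if __name__ == "__main__":
    print(demo())
```

Every handler records the same thread id, because the event loop multiplexes all of them onto the thread that called asyncio.run().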

You should read @Not_a_Golfer's answer first, and the link he provided, to understand how goroutines are scheduled. My answer is more of a deeper dive into network IO specifically. I assume you understand how Go achieves cooperative multitasking.
Go can and does use only blocking calls because everything runs in goroutines and they're not real OS threads. They're green threads. So you can have many of them all blocking on IO calls and they will not eat all of your memory and CPU like OS threads would.
File IO is just syscalls. Not_a_Golfer already covered that. Go will use a real OS thread to wait on a syscall and will unblock the goroutine when it returns. Here you can see the file read implementation for Unix.
Network IO is different. The runtime uses a "network poller" to determine which goroutine should unblock from an IO call. Depending on the target OS, it will use the available asynchronous APIs to wait for network IO events. Calls look like blocking but inside everything is done asynchronously.
For example, when you call read on a TCP socket, the goroutine will first try to read using a syscall. If nothing has arrived yet, it will block and wait to be resumed. By blocking here I mean parking, which puts the goroutine in a queue where it awaits resuming. That's how a "blocked" goroutine yields execution to other goroutines when you use network IO.
func (fd *netFD) Read(p []byte) (n int, err error) {
    if err := fd.readLock(); err != nil {
        return 0, err
    }
    defer fd.readUnlock()
    if err := fd.pd.PrepareRead(); err != nil {
        return 0, err
    }
    for {
        n, err = syscall.Read(fd.sysfd, p)
        if err != nil {
            n = 0
            if err == syscall.EAGAIN {
                if err = fd.pd.WaitRead(); err == nil {
                    continue
                }
            }
        }
        err = fd.eofError(n, err)
        break
    }
    if _, ok := err.(syscall.Errno); ok {
        err = os.NewSyscallError("read", err)
    }
    return
}
https://golang.org/src/net/fd_unix.go?s=#L237
When data arrives, the network poller will return the goroutines that should be resumed. You can see here the findrunnable function, which searches for goroutines that can be run. It calls the netpoll function, which returns the goroutines that can be resumed. You can find the kqueue implementation of netpoll here.
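The "try the read, and on EAGAIN wait for readiness, then retry" loop in Read can be sketched in Python with a non-blocking socket and the selectors module. This is only an illustration of the loop's shape: read_with_poller and demo are hypothetical names, and where Go parks the goroutine and lets the thread run others, this sketch simply blocks the calling thread in select():

```python
import selectors
import socket
import threading
import time


def read_with_poller(sock, nbytes, timeout=5.0):
    """Try a non-blocking read; if nothing is ready, wait for readiness and retry."""
    sock.setblocking(False)
    with selectors.DefaultSelector() as sel:
        sel.register(sock, selectors.EVENT_READ)
        while True:
            try:
                return sock.recv(nbytes)      # plays the role of syscall.Read
            except BlockingIOError:           # plays the role of EAGAIN
                if not sel.select(timeout):   # plays the role of fd.pd.WaitRead()
                    raise TimeoutError("no data arrived")


def demo():
    a, b = socket.socketpair()

    def late_writer():
        time.sleep(0.1)  # make sure the first read attempt finds nothing
        b.sendall(b"hello")

    t = threading.Thread(target=late_writer)
    t.start()
    try:
        return read_with_poller(a, 1024)
    finally:
        t.join()
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

The first recv() fails because no data has been sent yet, the selector waits until the socket becomes readable, and the second recv() succeeds.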
As for async/await in C#: async network IO will also use asynchronous APIs (IO completion ports on Windows). When something arrives, the OS will execute a callback on one of the thread pool's completion port threads, which will put the continuation on the current SynchronizationContext. In a sense, there are some similarities (parking/unparking does look like calling continuations, but at a much lower level), but these models are very different, not to mention the implementations. Goroutines by default are not bound to a specific OS thread; they can be resumed on any one of them, it doesn't matter. There are no UI threads to deal with. Async/await is specifically made for the purpose of resuming the work on the same OS thread using SynchronizationContext. And because there are no green threads or a separate scheduler, async/await has to split your function into multiple callbacks that get executed on the SynchronizationContext, which is basically an infinite loop that checks a queue of callbacks that should be executed. You can even implement it yourself, it's really easy.

Serial Dispatch Queue with Asynchronous Blocks

Is there ever any reason to add blocks to a serial dispatch queue asynchronously as opposed to synchronously?
As I understand it, a serial dispatch queue only starts executing the next task in the queue once the preceding task has finished executing. If this is the case, I can't see what you would gain by submitting some blocks asynchronously: the act of submission may not block the thread (since it returns straight away), but the task won't be executed until the last task finishes, so it seems to me that you don't really gain anything.
This question has been prompted by the following code - taken from a book chapter on design patterns. To prevent the underlying data array from being modified simultaneously by two separate threads, all modification tasks are added to a serial dispatch queue. But note that returnToPool adds tasks to this queue asynchronously, whereas getFromPool adds its tasks synchronously.
class Pool<T> {
    private var data = [T]();
    // Create a serial dispatch queue
    private let arrayQ = dispatch_queue_create("arrayQ", DISPATCH_QUEUE_SERIAL);
    private let semaphore:dispatch_semaphore_t;
    init(items:[T]) {
        // Reserve room for the incoming items (data is still empty here)
        data.reserveCapacity(items.count);
        for item in items {
            data.append(item);
        }
        semaphore = dispatch_semaphore_create(items.count);
    }
    func getFromPool() -> T? {
        var result:T?;
        if (dispatch_semaphore_wait(semaphore, DISPATCH_TIME_FOREVER) == 0) {
            dispatch_sync(arrayQ, {() in
                result = self.data.removeAtIndex(0);
            })
        }
        return result;
    }
    func returnToPool(item:T) {
        dispatch_async(arrayQ, {() in
            self.data.append(item);
            dispatch_semaphore_signal(self.semaphore);
        });
    }
}

Because there's no need to make the caller of returnToPool() block. It could perhaps continue on doing other useful work.
The thread which called returnToPool() is presumably not just working with this pool. It presumably has other stuff it could be doing. That stuff could be done simultaneously with the work in the asynchronously-submitted task.
Typical modern computers have multiple CPU cores, so a design like this improves the chances that CPU cores are utilized efficiently and useful work is completed sooner. The question isn't whether tasks submitted to the serial queue operate simultaneously — they can't because of the nature of serial queues — it's whether other work can be done simultaneously.

Yes, there are reasons why you'd add tasks to a serial queue asynchronously. It's actually extremely common.
The most common example would be when you're doing something in the background and want to update the UI. You'll often dispatch that UI update asynchronously back to the main queue (which is a serial queue). That way the background thread doesn't have to wait for the main thread to perform its UI update, but rather it can carry on processing in the background.
Another common example is as you've demonstrated, when using a GCD queue to synchronize interaction with some object. If you're dealing with immutable objects, you can dispatch these updates asynchronously to this synchronization queue (i.e. why have the current thread wait, but rather instead let it carry on). You'll do reads synchronously (because you're obviously going to wait until you get the synchronized value back), but writes can be done asynchronously.
(You actually see this latter example frequently implemented with the "reader-writer" pattern and a custom concurrent queue, where reads are performed synchronously on a concurrent queue with dispatch_sync, but writes are performed asynchronously with a barrier using dispatch_barrier_async. But the idea is equally applicable to serial queues, too.)
The choice of synchronous vs. asynchronous dispatch has nothing to do with whether the destination queue is serial or concurrent. It's simply a question of whether you have to block the current queue until that other one finishes its task or not.
Regarding your code sample, that is correct. getFromPool should dispatch synchronously (because you have to wait for the synchronization queue to actually return the value), but returnToPool can safely dispatch asynchronously. Obviously, I'm wary of seeing code waiting for semaphores if that might be called from the main thread (so make sure you don't call getFromPool from the main thread!), but with that one caveat, this code should achieve the desired purpose, offering reasonably efficient synchronization of this pool object, with a getFromPool that will block if the pool is empty until something is added to the pool.
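The same sync-versus-async choice can be sketched outside GCD. The Python version below is only an analogy: a single-worker ThreadPoolExecutor stands in for the serial queue, submit() without waiting plays the role of dispatch_async, and submit(...).result() plays the role of dispatch_sync. The method names mirror the question's Pool class but are otherwise made up for this example:

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Pool:
    def __init__(self, items):
        self._data = list(items)
        # One worker thread: tasks run one at a time, in submission order.
        self._serial = ThreadPoolExecutor(max_workers=1)
        self._available = threading.Semaphore(len(self._data))

    def get_from_pool(self):
        self._available.acquire()  # blocks while the pool is empty
        # "Sync" submission: the caller needs the value, so it waits for it.
        return self._serial.submit(self._data.pop, 0).result()

    def return_to_pool(self, item):
        def task():
            self._data.append(item)
            self._available.release()

        # "Async" submission: the caller has nothing to wait for.
        self._serial.submit(task)

    def close(self):
        self._serial.shutdown(wait=True)


if __name__ == "__main__":
    pool = Pool(["a", "b"])
    first = pool.get_from_pool()
    pool.return_to_pool(first)  # returns immediately
    pool.close()
```

return_to_pool comes back to the caller straight away, while get_from_pool has to wait because it returns a value that only exists once the queued task has run.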

What is a blocking function?

What is a blocking function or a blocking call?
This is a term I see again and again when referring to Node.js or realtime processing languages.

A function that stops script execution until it ends.
For example, if I had a function in my language that was used to write to a file, like so:
fwrite(file, "Contents");
print("Wrote to file!");
The print statement would only be executed once the file has been written to the disk. The whole program is halted on this instruction. This isn't noticeable for small enough writes, but imagine I had a huge blob to write to the file, one that took many seconds:
fwrite(file, blob);
print("Wrote to file!");
The print statement would only be executed after a few seconds of writing, and the whole program would be stopped for that time. In Node.js, this stuff is done asynchronously, using events and callbacks. Our example would become:
fwrite(file, blob, function() {
print("Wrote to file!");
});
print("Do other stuff");
Where the third parameter is a function to be called once the file has been written. The print statement located after the write function would be called immediately after, whether or not the file has been written yet. So if we were to write a huge enough blob, the output might look like this:
Do other stuff
Wrote to file!
This makes applications very fast because you're not waiting on a client message, a file write, or similar. You can keep on processing other data in the meantime. This is considered by many to be one of the strengths of Node.js.
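For comparison, here is a minimal sketch of the same ordering in Python's asyncio rather than Node.js. The names write_blob, main and demo are hypothetical, and the blocking write is pushed to a worker thread with run_in_executor so that the event loop stays free:

```python
import asyncio
import os
import tempfile


def write_blob(path, blob):
    """Ordinary blocking write: returns only after the file has been written and closed."""
    with open(path, "wb") as f:
        f.write(blob)


async def main(path, blob):
    events = []
    loop = asyncio.get_running_loop()

    async def write_then_report():
        # The blocking write runs on a worker thread; this coroutine resumes afterwards.
        await loop.run_in_executor(None, write_blob, path, blob)
        events.append("Wrote to file!")

    task = asyncio.create_task(write_then_report())
    events.append("Do other stuff")  # recorded before the write is reported
    await task
    return events


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "blob.bin")
        events = asyncio.run(main(path, b"x" * 1_000_000))
        return events, os.path.getsize(path)


if __name__ == "__main__":
    print(demo())
```

The "Do other stuff" entry is recorded first because the write task only starts once main reaches its await.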

var block = function _block() {
    while(true) {
        readInputs();
        compute();
        drawToScreen();
    }
}
A blocking function like this one basically computes forever; that's what is meant by blocking.
Other blocking functions wait for IO to occur.
In a non-blocking IO system, a function starts an IO action, goes idle, and then handles the result of the IO action when it arrives.
It's basically the difference between a thread idling and sleeping.

A blocking call is one that doesn't allow processing to continue until it returns to the calling thread; this is also referred to as a synchronous call. Asynchronous, on the other hand, means that threads (and code) can execute at the same time (concurrently).
