PyGObject: How to detect the end of a Gio.DataInputStream? - python-3.x

I'm building a GTK 3 application in Python (3.6, to be precise) which launches an external binary in a Gio.Subprocess and reads its output from a Gio.DataInputStream. I'm following this recipe to read the output line by line using asynchronous operations. This happens in a loop of queue_read() and the _on_data() callback, which is interrupted only by self.cancellable triggered from _on_finished(), or by an exception raised while reading from the stream.
Depending on the parameters used to call the external process, there is some arbitrary delay between each line of the output (each line is a unit of measured values, which should be displayed as soon as it is available). It can also happen that my spawned process terminates well before everything has been read from the stream. Therefore, I cannot cancel reading in _on_finished() as in the linked code - I need to read until I reach the stream's end-of-file and only then stop scheduling the next read.
How can I test for EOF on a Gio.DataInputStream?
C++ iostreams, for example, have a flag indicating this condition. The Gio.DataInputStream also raises GLib.Error in certain situations. But no such flag seems to be available here, and no error is raised in this case. The only condition I have observed is that the async read operation (the _on_data() callback) completes immediately with line set to None once the subprocess has terminated and no more data is available from the stream. If this is the way to go, is it documented somewhere?
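For context, here is a trimmed-down sketch of my read loop (class and method names here are illustrative, not my real code); it treats a None line returned by read_line_finish_utf8() as end-of-stream, which is the behaviour I observed but could not find documented:

from gi.repository import Gio, GLib

class OutputReader:
    def __init__(self, stream):
        self.stream = stream                 # Gio.DataInputStream over the subprocess's stdout pipe
        self.cancellable = Gio.Cancellable()

    def queue_read(self):
        # Schedule the next asynchronous line read.
        self.stream.read_line_async(
            GLib.PRIORITY_DEFAULT, self.cancellable, self._on_data, None)

    def _on_data(self, stream, result, user_data):
        try:
            line, _length = stream.read_line_finish_utf8(result)
        except GLib.Error:
            return                           # read error or cancellation - stop reading
        if line is None:
            # No flag and no error: a None line is the only end-of-file
            # indication I have found, so stop scheduling further reads here.
            return
        self.handle_line(line)               # display the unit of measured values
        self.queue_read()                    # keep the read loop going

    def handle_line(self, line):
        print(line)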

Related

Node.js pipe console error to another program (make it async)

From the Express.js documentation:
To keep your app purely asynchronous, you’d still want to pipe
console.err() to another program
Questions:
Is it enough to run my node app with stdout and stderr redirected so they don't block the event loop? Like this: node app 2>&1 | tee logFile?
If the answer to (1) is yes, then how do I achieve non-blocking logging while using Winston or Bunyan? Do they have some built-in mechanism for this, or do they just save data to a specific file, wasting CPU time of the current Node.js process? Or, to achieve truly async logging, should they pipe data to a child process that performs the "save to file" (and is that still a performance win)? Can anyone explain, or correct me if my way of thinking is just wrong?
Edit: I can assume that piping data from processes A, B, etc. to process L is cheaper for those specific processes (A, B, ...) than writing it to a file (or sending it over the network).
To the point:
I am designing a logger for an application that uses the Node.js cluster module.
Briefly: one of the processes (L) will handle data streams from the others (A, B, ...).
Process L will queue messages (for example line by line, or split on some other special separator) and log them one by one to a file, a database, or anywhere else.
The advantage of this approach is reducing the load on the other processes, which can then spend more time doing their actual job.
One more thing: the goal is to keep usage of this library simple, so the user only includes the logger without any additional interaction (stream redirection) via the shell.
Do you think this solution makes sense? Maybe you know a library that already does this?
Let's lay some groundwork first...
Writing to a terminal screen (console.log() etc.), writing to a file (fs.writeFile(), fs.writeFileSync() etc.) or sending data to a stream (process.stdout.write(data) etc.) will always "block the event loop". Why? Because some part of those functions is always written in JavaScript. The minimum amount of work needed by these functions would be to take the input and hand it over to some native code, but some JS will always be executed.
And since JS is involved, it will inevitably "block" the event loop because JavaScript code is always executed on a single thread no matter what.
Is this a bad thing...?
No. The amount of time required to process some log data and send it over to a file or a stream is quite low and does not have a significant impact on performance.
When would this be a bad thing, then...?
You can hurt your application by doing something generally called a "synchronous" I/O operation - that is, writing to a file and actually not executing any other JavaScript code until that write has finished. When you do this, you hand all the data to the underlying native code and while theoretically being able to continue doing other work in JS space, you intentionally decide to wait until the native code responds back to you with the results. And that will "block" your event loop, because these I/O operations can take much much longer than executing regular code (disks/networks tend to be the slowest part of a computer).
Now, let's get back to writing to stdout/stderr.
From Node.js' docs:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
I am assuming we are working with POSIX systems below.
In practice, this means that when your Node.js process's output streams are not piped and are sent directly to the TTY, writing something to the console will block the event loop until the whole chunk of data is sent to the screen. However, if we redirect the output streams to something else (a pipe to another process, for instance), then when we write something to the console, Node.js will not wait for the operation to complete and will continue executing other JavaScript code while the data is written to that output stream.
In practice, we get to execute more JavaScript in the same time period.
With this information you should be able to answer all your questions yourself now:
You do not need to redirect the stdout/stderr of your Node.js process if you do not write anything to the console, or you can redirect only one of the streams if you do not write anything to the other one. You may redirect them anyway, but if you do not use them you will not gain any performance benefit.
If you configure your logger to write the log data to a stream then it will not block your event loop too much (unless some heavy processing is involved).
If you care this much about your app's performance, do not use Winston or Bunyan for logging - they are extremely slow. Use pino instead - see the benchmarks in their readme.
To answer (1), we can dive into the Express documentation: it links to the Node.js documentation for Console, which in turn links to the Node documentation on process I/O. There it describes how process.stdout and process.stderr behave:
process.stdout and process.stderr differ from other Node.js streams in important ways:
They are used internally by console.log() and console.error(), respectively.
They cannot be closed (end() will throw).
They will never emit the 'finish' event.
Writes may be synchronous depending on what the stream is connected to and whether the system is Windows or POSIX:
Files: synchronous on Windows and POSIX
TTYs (Terminals): asynchronous on Windows, synchronous on POSIX
Pipes (and sockets): synchronous on Windows, asynchronous on POSIX
With that we can try to understand what will happen with node app 2>&1 | tee logFile:
stdout and stderr are piped to the tee process.
tee writes to both the terminal and the file logFile.
The important part here is that stdout and stderr are piped to a process, which means the writes should be asynchronous.
Regarding (2), it depends on how you configure Bunyan or Winston:
Winston has the concept of Transports, which essentially lets you configure where the logs will go. If you want asynchronous logs, you should use any transport other than the Console transport. Using the File transport should be OK, as it creates a file stream object for this, which is asynchronous and won't block the Node process.
Bunyan has a similar configuration option: Streams. According to their docs, it can accept any stream interface. As long as you avoid the process.stdout and process.stderr streams here, you should be OK.

Is it safe to use write() multiple times on the same file without regard for concurrency as long as each write is to a different region of the file?

According to the docs for fs:
Note that it is unsafe to use fs.write multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream is strongly recommended.
I am downloading a file in chunks (4 chunks downloading at a time concurrently). I know the full size of the file beforehand (I use truncate after opening the file to allocate the space upfront) and also the size and ultimate location in the file (byte offset from beginning of file) of each chunk. Once a chunk is finished downloading, I call fs.write to put that chunk of data into the file at its proper place. Each call to fs.write includes the position where the data should be written. I am not using the internal pointer at all. No two chunks will overlap.
I assume that the docs indicate that calling fs.write multiple times without waiting for the callback is unsafe because you can't know where the internal pointer is. Since I'm not using that, is there any problem with my doing this?
No, it's not safe. Simply because you don't know whether the first call to write has succeeded by the time you execute the second call.
Imagine the second call succeeded, but the first and third didn't, while the fifth and sixth succeeded as well.
The result is perfect chaos.
Plus, Node.js has a different execution model than other interpreters: you have no guarantee of when specific parts of the code will be executed, or in which order.

Determine when(/after which input) a python subprocess crashes

I have a Python subprocess that runs an arbitrary C++ program (student assignments, if it matters) via Popen. The structure is such that I write a series of inputs to stdin, and at the end I read all of stdout and parse it for the responses to each input.
Of course, given that these are student assignments, they may crash after certain inputs. What I need to know is after which specific input their program crashed.
So far I know that when a runtime exception is thrown in the C++ program, it is printed to stderr. So right now I can read stderr after the fact and see that it did in fact crash. But I haven't found a way to read stderr while the program is still running, so that I can infer that the error is in response to the latest input. Every SO question or article I have run into seems to use subprocess.communicate(), but communicate() blocks until the subprocess returns; that hasn't been working for me because I need to keep sending inputs to the program afterwards if it hasn't crashed.
What I need to know is after which specific input their program crashed.
Call process.stdin.flush() after process.stdin.write(b'your input'). If the process is already dead, then either .write() or .flush() will raise an exception (the specific exception may depend on the system, e.g., BrokenPipeError on POSIX).
Unrelated: if you are redirecting all three standard streams (stdin=PIPE, stdout=PIPE, stderr=PIPE), then make sure to consume the stdout and stderr pipes concurrently while you are writing the input; otherwise the child process may hang if it generates enough output to fill the OS pipe buffer. You could use threads or async I/O to do it - a rough sketch using a thread follows.
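Something along these lines (an illustrative sketch only - the binary name and the test inputs are made up, adapt them to your grading setup):

import subprocess
import threading

# Hypothetical student binary; replace with the actual command under test.
proc = subprocess.Popen(
    ["./student_program"],
    stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

stderr_chunks = []

def drain_stderr():
    # Consume stderr concurrently so the child never blocks on a full pipe buffer.
    for raw_line in proc.stderr:
        stderr_chunks.append(raw_line)

reader = threading.Thread(target=drain_stderr, daemon=True)
reader.start()

inputs = [b"first input\n", b"second input\n", b"third input\n"]   # made-up inputs
for index, data in enumerate(inputs):
    try:
        proc.stdin.write(data)
        proc.stdin.flush()            # push the data through the pipe right away
    except OSError:                   # e.g. BrokenPipeError on POSIX
        print("child was already dead before input #%d could be delivered" % index)
        break

proc.stdin.close()
stdout_data = proc.stdout.read()      # read and parse the responses after the fact
proc.wait()
reader.join()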

File getting blank during write operation node js

I have seen this issue in the following two situations:
When there is lots of free space on the server.
When there is no space available on the server.
I am reading a particular JSON file using the following:
fs.readFileSync(_file_path, 'utf-8');
and after some manipulation of the received data, I am writing the updated data back to the same file using the following:
fs.writeFileSync(_file_path, {stringified-json});
During this operation the file sometimes becomes empty. I am trying to reproduce this issue locally but have not been able to.
fs.writeFileSync() will throw if there was an error, so make sure you have the code in a try/catch block and that your catch is not simply swallowing the error. (For example, an empty catch block will cause the exception to be swallowed or, in other words, ignored.)
If this is a script or some other process that might get invoked multiple times simultaneously (e.g., multiple processes or you're using workers), then you need to use file locking or some other mechanism to make sure you don't have a race condition. If process A opens the file for writing (thus emptying it) and then process B opens the file for reading before process A is finished with the file, that could result in an empty file if process B reads the empty file and the code is written such that it will write an empty file as a result.
Without more information (e.g., error logs), any answer is likely to be pure guesswork. But those are the two things I'd check.

Run a process as a synchronous operation from a Win32 application

I have an existing utility application; let's call it util.exe. It's a command-line tool that takes inputs from the command line and creates a file on disk, let's say an image file.
I want to use this within another application by running util.exe. However, it needs to be synchronous so that the file is known to exist when processing continues.
e.g. (pseudocode)
bool CreateImageFile(params)
{
    // ret is util.exe's exit code
    int ret = runprocess("util.exe", params);
    return ret == 0;
}
Is there a single Win32 API call that will run the process and wait until it ends? I looked at CreateProcess, but it returns as soon as the process is started; I looked at ShellExecute, but that seems a bit ugly even if it were synchronous.
There's no single API, but this is actually a more interesting general question for Win32 apps. You can use CreateProcess or ShellExecuteEx and then WaitForSingleObject on the process handle. GetExitCodeProcess at that point will give you the program's exit code. See here for simple sample code.
However, this blocks your main thread completely and can give you serious deadlock problems under some Win32 messaging scenarios. Let's say the spawned exe does a broadcast SendMessage. It can't proceed until all windows have processed the message - but you can't proceed because you're blocked waiting for it. Deadlock. Since you're using purely command-line programs, this issue probably doesn't apply to you, though. Do you care if a command-line program hangs for a while?
The best general solution for normal apps is probably to split a process launch-and-wait off onto a thread and post a message back to your main window when the thread runs to completion. When you receive the message, you know it is safe to continue, and there are no deadlock issues.
A process handle is a waitable object, AFAIK. That is exactly what you need.
However, I'd recommend against doing anything like that. Starting a process on Windows may be slow, and it will block your UI. Consider a PeekMessage loop with 50 ms wait timeouts to do it from a Windows application.
