How to make the `write()` system call on Linux immediately effective?

I am writing a REPL (read-eval-print loop) for C. I maintain a header file so that I can define new functions in terms of previously defined ones. Whenever I define a new function, I generate a new temporary source file like this:
#include "/tmp/header.h"
int foo() {
    return func() * func();
}
And /tmp/header.h looks like this:
int func();
int foo();
where func() is a previously defined function.
So I need to call write() on header_fileno again and again. What concerns me is this: is it possible that after I call write(header_fileno, buf, wrsize), the contents of buf are stored in some kernel buffer instead of being written to the actual file? If that happens, I cannot count on the header to provide up-to-date declarations. I have the same concern about the source file. And if that can happen, is there a way to make the write immediately effective?

You may safely assume that any process, including the current one, which makes a read() call after you've called write() will see the updated file, even if the data is still in a kernel buffer and not yet fully written to disk. POSIX mandates this behavior:
If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes.
Having said that, this doesn't apply if you use the stdio functions, which may buffer data before writing. It also doesn't guarantee that your data won't be lost or corrupted if your system crashes; if you need that guarantee, you must use fsync() or open the file with O_SYNC.
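As a concrete illustration, here is a minimal C sketch (the path and the declarations are just placeholders): once write() returns, any subsequent read() of the file will see the new contents; the fsync() call is only needed if the data must also survive a crash.

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *decls = "int func();\nint foo();\n";  /* example declarations */
    int fd = open("/tmp/header.h", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }
    /* After write() returns, a read() of this file by any process sees the data. */
    if (write(fd, decls, strlen(decls)) == -1) { perror("write"); return 1; }
    /* fsync() only matters for surviving a system crash, not for visibility. */
    if (fsync(fd) == -1) { perror("fsync"); return 1; }
    close(fd);
    return 0;
}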

According to the Linux man page, a successful return from write() does not guarantee that your data has been committed to disk. Use fsync() if you need that guarantee (note that closing the file does not by itself force the data onto the disk). lseek() might also be useful if you want to reuse just one file descriptor.
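If you do keep a single descriptor open for the header, the rewrite step might look roughly like this sketch (the helper name rewrite_header is made up for illustration):

#include <string.h>
#include <unistd.h>

/* Replace the whole contents of an already-open header file descriptor. */
int rewrite_header(int fd, const char *text) {
    if (lseek(fd, 0, SEEK_SET) == (off_t)-1) return -1;  /* rewind to the start */
    if (ftruncate(fd, 0) == -1) return -1;               /* drop the old contents */
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t)len) return -1;
    return fsync(fd);  /* optional: force the data to disk */
}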
However, if you are going to be modifying the contents of the files frequently, as you might expect in a REPL, you might want to store them in memory instead, so you can manipulate them more easily.
Similarly, rather than storing live REPL code as text, you might want to store its information in another form, so you can access and change it more easily.

Related

Is there a system call or some way to know the type of file descriptor in Linux (e.g. regular file fd, socket fd, signal fd, timer fd)?

As I keep discovering, there is a variety of file descriptors: almost everything is abstracted behind a file descriptor, for example regular files, sockets, signals, and timers. All file descriptors are merely integers.
Given a file descriptor, is it possible to know what type it is? For example, it would be nice to have a system call such as getFdType(fd).
If an epoll_wait() call wakes up because multiple file descriptors are ready, each descriptor has to be processed according to its type. That is why I need the type.
Of course, I can maintain this info separately myself but it would be more convenient to have the system support it.
Also, are all file descriptors, irrespective of the type, sequential? I mean, if you open a regular data file, then create a timer file descriptor, then a signal file descriptor, are they all guaranteed to be numbered sequentially?
As "that other guy" mentioned, the most obvious such call is fstat. The st_mode member contains bits to distinguish between regular files, devices, sockets, pipes, etc.
But in practice, you will almost certainly need to keep track yourself of which fd is which. Knowing it's a regular file doesn't help too much when you have several different regular files open. So since you have to maintain this information somewhere in your code anyway, then referring back to that record would seem to be the most robust way to go.
(It's also going to be much faster to check some variables within your program than to make one or several additional system calls.)
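For reference, here is a rough sketch of the fstat() approach; it only distinguishes the classic file types that st_mode encodes, which is another reason to keep your own records for special descriptors:

#include <stdio.h>
#include <sys/stat.h>

/* Describe what kind of object a file descriptor refers to. */
const char *fd_type(int fd) {
    struct stat st;
    if (fstat(fd, &st) == -1) return "unknown (fstat failed)";
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISDIR(st.st_mode))  return "directory";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISFIFO(st.st_mode)) return "pipe or FIFO";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISBLK(st.st_mode))  return "block device";
    return "something else";
}

int main(void) {
    printf("fd 0 is a %s\n", fd_type(0));  /* standard input */
    return 0;
}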
Also, are all file descriptors, irrespective of the type, sequential? I mean, if you open a regular data file, then create a timer file descriptor, then a signal file descriptor, are they all guaranteed to be numbered sequentially?
Not really.
As far as I know, calls that create a new fd will always return the lowest-numbered available fd. There are old programs that rely on this behavior; before dup2 existed, I believe the accepted way to move standard input to a new file was close(0); open("myfile", ...);.
However, it's hard to really be sure what fds are available. For example, the user may have run your program as /usr/bin/prog 5>/some/file/somewhere and then it will appear that fd 5 gets skipped, because /some/file/somewhere is already open on fd 5. As such, if you open a bunch of files in succession, you cannot really be sure that you will get sequential fds, unless you have just closed all those fds yourself and are sure that all lower-numbered fds are already in use. And doing that seems much more of a hassle (and a source of potential problems) than just keeping track of the fds in the first place.
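For what it's worth, the pre-dup2 idiom above is a direct consequence of the lowest-available-fd rule; a small sketch (the file name is a placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    close(0);                            /* free descriptor 0 (standard input) */
    int fd = open("myfile", O_RDONLY);   /* the lowest free descriptor is reused */
    if (fd == -1) { perror("open"); return 1; }
    printf("new descriptor: %d\n", fd);  /* prints 0 if descriptor 0 was free */
    return 0;
}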

Is it safe to use write() multiple times on the same file without regard for concurrency as long as each write is to a different region of the file?

According to the docs for fs:
Note that it is unsafe to use fs.write multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream is strongly recommended.
I am downloading a file in chunks (4 chunks downloading at a time concurrently). I know the full size of the file beforehand (I use truncate after opening the file to allocate the space upfront) and also the size and ultimate location in the file (byte offset from beginning of file) of each chunk. Once a chunk is finished downloading, I call fs.write to put that chunk of data into the file at its proper place. Each call to fs.write includes the position where the data should be written. I am not using the internal pointer at all. No two chunks will overlap.
I assume that the docs indicate that calling fs.write multiple times without waiting for the callback is unsafe because you can't know where the internal pointer is. Since I'm not using that, is there any problem with my doing this?
No, it's not safe, simply because you don't know whether the first call to write has succeeded by the time you execute the second call.
Imagine the second call succeeds, but the first and third fail, while the fifth and sixth succeed as well.
The result is chaos.
Also, Node.js executes callbacks asynchronously, so you have no guarantee of when specific parts of your code will run, or in which order.

Lua 4.0.1 appendto

Could someone please explain the proper way to use the appendto function?
I am trying to use it to write debug text to a file. I want it written immediately when I call the function, but for some reason the program waits until it exits, and then writes everything at once.
Am I using the right function? Do I need to open, then write, then close the file each time I write to it instead?
Thanks.
Looks like you are having an issue with buffering (this is also a common question in other languages, by the way). The data you want to write to the file is being held in a memory buffer and is only written to disk at a later time (writes are batched together like this for better performance).
One possibility is to open and close the file as you already suggested. Closing a file handle will flush the contents of the buffer to disk.
A second possibility is to use the flush function to explicitly request that the data be written to disk. In Lua 4.0.1, you can either call flush passing a file handle
-- If you have opened your file with openfile:
local myfile = openfile("myfile.txt", "a")
flush(myfile)
-- If you used appendto the output file handle is in the _OUTPUT global variable
appendto("myfile.txt")
flush(_OUTPUT)
or you can call flush with no arguments, in which case it will flush all the files you have currently open.
flush()
For details, see the reference manual: http://www.lua.org/manual/4.0/manual.html#6.

fs.createWriteStream over several processes

How can I implement a system where multiple Node.js processes write to the same file with fs.createWriteStream, such that they don't overwrite data? It looks like the default setup for fs.createWriteStream is that the file is cleared out when that method is called. My goal is to clear out the file once, and then have all other subsequent writers only append data.
Should I use fs.createWriteStream and then fs.appendFile? Or is there a way to open up a stream for each process, not just for the first process to open the file?
Should I use fs.createWriteStream and then fs.appendFile?
You can use either.
With fs.createWriteStream you have to change the flags option, like this:
fs.createWriteStream('your_file', {
    flags: 'a+', // default is 'w' (just 'a' might be enough here, I'm not sure)
})
This should create the file if it doesn't exist, or open it with write access if it does, and position the pointer at the end (append mode).
How to use fs.appendFile should be clear; it does pretty much the same thing.
Now for the problem of multiple processes accessing the same file: the writes have to be coordinated so that only one process writes to the file at a time, otherwise their data can interleave or overwrite each other.
Therefore you need to wait for the file to be released while another process is writing to it. You will probably need a library for that.
this one for example: https://www.npmjs.com/package/lockup
or this one: https://github.com/Perennials/mutex-node
you can also find a lot more here: https://www.npmjs.com/browse/keyword/lock
or here: https://www.npmjs.com/browse/keyword/mutex
I have not tried any of those libraries, but the ones I posted and several others on the list should do exactly what you need.
Writing to a single file from multiple processes while ensuring data integrity is a fairly complex operation that you can orchestrate using file locking.
However, you have two simpler approaches: have each process write to its own temporary file and concatenate the files at the end of the operation, or transmit what you need to write to a single dedicated process and delegate the writing to it. Keep in mind that sending messages between processes can be expensive.
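To give a feel for what the file-locking approach mentioned above amounts to at the system-call level, here is a C sketch using flock(), an advisory lock (the file name is a placeholder, and this is not Node-specific; every cooperating process has to take the same lock):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/file.h>
#include <unistd.h>

int main(void) {
    int fd = open("shared.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd == -1) { perror("open"); return 1; }
    /* Block until this process holds an exclusive advisory lock on the file. */
    if (flock(fd, LOCK_EX) == -1) { perror("flock"); return 1; }
    const char *line = "one record from this process\n";
    if (write(fd, line, strlen(line)) == -1) perror("write");
    flock(fd, LOCK_UN);  /* release the lock so other processes may write */
    close(fd);
    return 0;
}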

Node JS is async Read/Write safe?

Probably a dumb question, but if the program is asynchronously writing to a file, and you access that file while it's still writing, are the contents messed up?
In fact, it does not matter whether you are accessing a file synchronously or asynchronously: if some other process (yours or someone else's) modifies the file while you are in the middle of reading it, you will get inconsistent results.
The exact kind of inconsistency you'll see depends on how the file is written and when reading starts.
In node's default mode (w), a file's existing contents are truncated when the file is opened.
An in-flight read will stop early (without erroring), meaning you'll only get part of the original file.
A read started after the write begins will read up to the last written byte. Depending on how far along and fast the write is, and how you read the file, the read may or may not see the complete file.
If the file is written in r+ mode, the contents are not truncated when the file is opened for writing. This means a read will see part of the old data and part of the new data. Things are further muddied if the write changes the file size.
This is all true regardless of whether you use streams (i.e. createReadStream), readFile, or even readFileSync. Any part of the file on disk can be changed while node is in the process of buffering the file into memory. (The only notable exception is if you use writeFileSync and then readFileSync in the same process, since the write call prevents the read from starting until the write is complete. However, this still doesn't prevent other processes from changing the file mid-read, and you shouldn't be using the sync methods anyway.)
In other words, reading and writing a file is non-atomic. To avoid inconsistency, you should write the file with a temporary name and then rename it when the write is complete.
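At the system-call level, that pattern looks roughly like the following C sketch (file names are placeholders); rename() replaces the target atomically, so a reader sees either the complete old file or the complete new one:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    const char *data = "complete new contents of the file\n";
    /* Write everything to a temporary name first. */
    int fd = open("data.txt.tmp", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) { perror("open"); return 1; }
    if (write(fd, data, strlen(data)) == -1) { perror("write"); return 1; }
    fsync(fd);   /* make sure the new contents are on disk before the swap */
    close(fd);
    /* Atomically replace the visible file. */
    if (rename("data.txt.tmp", "data.txt") == -1) { perror("rename"); return 1; }
    return 0;
}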
