Lua 4.0.1 appendto - io

Could someone please explain the proper way to use the appendto function?
I am trying to use it to write debug text to a file. I want it written immediately when I call the function, but for some reason the program waits until it exits, and then writes everything at once.
Am I using the right function? Do I need to open, then write, then close the file each time I write to it instead?
Thanks.

Looks like you are having an issue with buffering (this is also a common question in other languages, btw). The data you want to write to the file is being held in a memory buffer and is only written to disk at a later time (this is done to batch writes to disk together, for better performance).
One possibility is to open and close the file as you already suggested. Closing a file handle will flush the contents of the buffer to disk.
A second possibility is to use the flush function to explicitly request that the data be written to disk. In Lua 4.0.1, you can either call flush with a file handle:
-- If you have opened your file with openfile:
local myfile = openfile("myfile.txt", "a")
flush(myfile)
-- If you used appendto, the current output file handle is in the _OUTPUT global variable:
appendto("myfile.txt")
flush(_OUTPUT)
or you can call flush with no arguments, in which case it will flush all the files you have currently open.
flush()
For details, see the reference manual: http://www.lua.org/manual/4.0/manual.html#6.

Related

NodeJS is appendFileSync buffered?

Is the function appendFileSync buffered? For example, if I do 1000 calls to the function, with only a few characters to be written each time, will the function buffer my characters and only actually write to the file when the buffer is full, or will it write to the file every time I call the function?
I'm curious to know if I need to implement buffering for it or if it's already built in.
According to the source code, the flow applied when using appendFileSync is the following:
- appendFileSync
- writeFileSync
- openSync
- writeSync
- closeSync
Since the file is opened and closed every time, there is no buffering logic: the data is written out on every single call.
Edit: On closer inspection, it appears that if you provide a file descriptor yourself, it will not open and close the file for you, so in that case this might be what you are looking for.
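For illustration, a rough sketch of that descriptor-based approach might look like this (the file name and the loop are made up for the example):

const fs = require('fs');

// Open the file once in append mode and keep the descriptor around.
const fd = fs.openSync('debug.log', 'a');

for (let i = 0; i < 1000; i++) {
  // When given a descriptor, appendFileSync skips the openSync/closeSync steps,
  // but each call still issues its own write - there is still no buffering.
  fs.appendFileSync(fd, 'line ' + i + '\n');
}

fs.closeSync(fd);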

How to make `write()` system call on Linux immediately effective?

I am writing a REPL (read-execute-print loop) for C. I am trying to maintain a header file so that I can define new functions based on previously defined ones. Whenever I define a new function, I get a new temporary file like this:
#include "/tmp/header.h"
int foo() {
    return func() * func();
}
And the /tmp/header.h is like:
int func();
int foo();
where func() is a previously defined function.
So I need to call write() on header_fileno again and again. My concern is this: is it possible that after I call write(header_fileno, buf, wrsize), the contents of buf are stored in some kernel buffer instead of being written to the actual file? Because if that happens, I cannot count on the header to give up-to-date declarations. I have the same concern when it comes to the source file. And if that can happen, is there a way to make the write immediately effective?
You may safely assume that any process, including the current one, which makes a read() call after you've called write() will see the updated file, even if the file is still in a kernel buffer and not fully written to disk. POSIX mandates this behavior:
If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes.
Having said that, this doesn't apply if you use the stdio functions, which may buffer data before writing. It also doesn't guarantee that your data won't be lost or corrupted if your system crashes; if you need that guarantee, you must use fsync() or open the file with O_SYNC.
According to the Linux man page, there is no guarantee that write will commit your data to the physical disk. Use fsync if you need the written data flushed to the disk (simply closing the file does not guarantee this). (lseek might also be useful if you want to use just one file descriptor.)
However, if you are going to be modifying the contents of the files frequently, as you might expect in a REPL, you might want to store them in memory instead, so you can manipulate them more easily.
Similarly, rather than storing live REPL code as text, you might want to store its information in another form, so you can access and change it more easily.

How to reliably read data from a file which is being continuously written by another process?

So, I am in a situation where one process is continuously (every few seconds) writing data to a file (not appending). The data is in the form of JSON. Another process has to read this file at regular intervals, and it could happen that the reading process reads the file while the writing process is still writing to it.
A solution to this problem that I can think of is for the writer process to also write a corresponding checksum file. The reader process would then read both the file and its checksum file. If the calculated checksum doesn't match, the reader would repeat the process until it does. That way it would know that it has read the correct data.
Or maybe a better solution is to read the file twice, a short time apart (much less than the writing interval of the writing process), and see if the two reads match.
The third way could be to write some magic data at the end of the file, so that the reading process knows that it has read the whole file once it has encountered that magic data at the end.
What do you think? Are these solutions viable, or are there better methods to achieve this?
Create an entirely new file each time, and rename() the new file once it's been completely written:
If newpath already exists, it will be atomically replaced, so that
there is no point at which another process attempting to access
newpath will find it missing. ...
Some copy of the file will always be there, and it will always be complete and correct:
So, instead of
writeDataFile( "/path/to/data/file.json" );
and then trying to figure out what to do in the reader process(es), you simply do
writeDataFile( "/path/to/data/file.json.new" );
rename( "/path/to/data/file.json.new", "/path/to/data/file.json" );
No locking is necessary, nor any reading of the file and computing checksums and hoping it's correct.
The only issue is that any reader process has to open() the file each time it needs to read the latest copy - it can't keep an open file descriptor on the file and try to read new contents, because the rename() call unlinks the original file and replaces it with an entirely new file.
If you want to guarantee that the reader always gets all the data, consider using a named pipe.
mkfifo ./jsonoutput
Then set one program to write to and the other program to read from this file ./jsonoutput.
So long as the writer is regularly closing and reopening the file after writing each JSON, the reader will get an EOF and process the input.
However, if that isn't the case, the reader will just keep reading and the writer will just keep writing. If the programs aren't designed to handle streams of data like that, then they might just never process the data and the programs will hang.
If that's the case then you could write a program that reads from one named pipe until it gets a complete JSON and then flushes it through a second named pipe to the final program.
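As a rough Node.js sketch of that setup (assuming the FIFO ./jsonoutput already exists and that the writer produces exactly one complete JSON document per open/close cycle), the writer side might look like:

// writer.js
const fs = require('fs');
setInterval(() => {
  // writeFileSync opens the FIFO, writes one document, and closes it,
  // which is what gives the reader an EOF after each complete document.
  fs.writeFileSync('./jsonoutput', JSON.stringify({ time: Date.now() }));
}, 5000);

and the reader side might look like:

// reader.js
const fs = require('fs');
while (true) {
  // Opening the FIFO blocks until the writer opens its end; the read
  // returns once the writer closes, i.e. after a complete document.
  const doc = JSON.parse(fs.readFileSync('./jsonoutput', 'utf8'));
  console.log(doc);
}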

fs.createWriteStream over several processes

How can I implement a system where multiple Node.js processes write to the same file with fs.createWriteStream, such that they don't overwrite data? It looks like the default setup for fs.createWriteStream is that the file is cleared out when that method is called. My goal is to clear out the file once, and then have all other subsequent writers only append data.
Should I use fs.createWriteStream and then fs.appendFile? Or is there a way to open up a stream for each process, not just for the first process to open the file?
Should I use fs.createWriteStream and then fs.appendFile?
You can use either.
With fs.createWriteStream you have to change the flags option, like this:
fs.createWriteStream('your_file', {
  flags: 'a+' // default is 'w' (just 'a' might be enough here, I'm not sure)
});
This should create the file if it doesn't exist, or open it with write access if it does, and position the write pointer at the end (append mode).
fs.appendFile does pretty much the same thing, and how to use it should be clear.
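For completeness, a minimal fs.appendFile call could look like this (reusing the placeholder file name from above):

const fs = require('fs');

// Opens 'your_file' in append mode ('a'), writes the data, and closes it again.
fs.appendFile('your_file', 'some data to append\n', (err) => {
  if (err) throw err;
});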
Now for the problem of multiple processes accessing the same file: only one process should have write access to the file at any given time, otherwise their writes can interleave and corrupt the data.
Therefore you need to wait for the file to be released if another process currently has write access. You will probably need a library for that.
This one, for example: https://www.npmjs.com/package/lockup
or this one: https://github.com/Perennials/mutex-node
You can also find a lot more here: https://www.npmjs.com/browse/keyword/lock
or here: https://www.npmjs.com/browse/keyword/mutex
I have not tried any of those libraries, but the ones I posted and several others on those lists should do exactly what you need.
Writing to a single file from multiple processes while ensuring data integrity is a fairly complex operation that you can orchestrate using file locking.
However, you have two simpler approaches:
- Writing to a temporary file from each process, and then concatenating the files at the end of the operation.
- Transmitting what you need to write to a dedicated, single process and delegating the actual writing to it (a rough sketch of this follows below). Keep in mind that sending messages between processes can be expensive.
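As a sketch of the second approach using Node's cluster module (the file name and worker count are invented for the example; older Node versions spell cluster.isPrimary as cluster.isMaster):

const cluster = require('cluster');
const fs = require('fs');

if (cluster.isPrimary) {
  // Only the primary process ever touches the file.
  const out = fs.createWriteStream('shared.log', { flags: 'a' });

  for (let i = 0; i < 4; i++) {
    const worker = cluster.fork();
    // Every line a worker sends is appended by this single process,
    // so writes from different workers cannot interleave mid-line.
    worker.on('message', (line) => out.write(line + '\n'));
  }
} else {
  // Workers never open the file; they just hand their output to the primary.
  process.send('hello from worker ' + process.pid, () => process.exit(0));
}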

Node JS is async Read/Write safe?

Probably a dumb question, but if the program is asynchronously writing to a file, and you access that file while it's still writing, are the contents messed up?
In fact, it does not matter whether you are accessing a file synchronously or asynchronously: if some other process (yours or someone else's) modifies the file while you are in the middle of reading, you will get inconsistent results.
The exact kind of inconsistency you'll see depends on how the file is written and when reading starts.
In Node's default write mode (w), a file's existing contents are truncated when the file is opened.
An in-flight read will stop early (without erroring), meaning you'll only have a portion of the original file.
A read started after the write begins will read up to the last written byte. Depending on how far along and fast the write is, and how you read the file, the read may or may not see the complete file.
If the file is written in r+ mode, the contents are not truncated when the file is opened for writing. This means a read will see part of the old data and part of the new data. Things are further muddied if the write changes the file size.
This is all true regardless of whether you use streams (i.e. createReadStream), readFile, or even readFileSync. Any part of the file on disk can be changed while Node is in the process of buffering the file into memory. (The only notable exception here is if you use writeFileSync and then readFileSync in the same process, since the write call would prevent the read from starting until after the write is complete. However, this still doesn't prevent other processes from changing the file mid-read, and you shouldn't be using the sync methods anyway.)
In other words, reading and writing a file is non-atomic. To avoid inconsistency, you should write the file with a temporary name and then rename it when the write is complete.
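A minimal sketch of that pattern (the file names are placeholders; keeping the temporary file on the same filesystem as the target keeps the rename atomic):

const fs = require('fs');

const finalPath = 'data.json';
const tempPath = finalPath + '.tmp';

fs.writeFile(tempPath, JSON.stringify({ hello: 'world' }), (err) => {
  if (err) throw err;
  // rename atomically replaces data.json, so a reader sees either the old
  // complete file or the new complete file, never a half-written one.
  fs.rename(tempPath, finalPath, (err) => {
    if (err) throw err;
  });
});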
