Probably a dumb question, but if the program is asynchronously writing to a file, and you access that file while it's still writing, are the contents messed up?
In fact, it does not matter whether you are accessing the file synchronously or asynchronously: if some other process (yours or someone else's) modifies the file while you are in the middle of reading it, you will get inconsistent results.
The exact kind of inconsistency you'll see depends on how the file is written and when reading starts.
In Node's default write mode ('w'), a file's existing contents are truncated when the file is opened for writing.
An in-flight read will stop early (without erroring), meaning you'll only get part of the original file.
A read started after the write begins will read up to the last written byte. Depending on how far along and fast the write is, and how you read the file, the read may or may not see the complete file.
If the file is written in r+ mode, the contents are not truncated when the file is opened for writing. This means a read may see part of the old data and part of the new data. Things are further muddied if the write changes the file size.
This is all true regardless of whether you use streams (i.e., createReadStream), readFile, or even readFileSync. Any part of the file on disk can be changed while node is in the process of buffering the file into memory. (The only notable exception here is if you use writeFileSync and then readFileSync in the same process, since the write call would prevent the read from starting until after the write is complete. However, this still doesn't prevent other processes from changing the file mid-read, and you shouldn't be using the sync methods anyway.)
In other words, reading and writing a file is non-atomic. To avoid inconsistency, you should write the file with a temporary name and then rename it when the write is complete.
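For example, a minimal Node.js sketch of that pattern (the path and data here are illustrative):

const fs = require('fs');

const data = { hello: 'world' }; // whatever you want to persist

// Write to a temporary name first...
fs.writeFileSync('/path/to/data/file.json.tmp', JSON.stringify(data));

// ...then atomically swap it into place. Readers see either the old
// contents or the new contents, never a half-written file.
// (rename is only atomic if both paths are on the same filesystem.)
fs.renameSync('/path/to/data/file.json.tmp', '/path/to/data/file.json');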
Related
Is the function appendFileSync buffered? For example, if I do 1000 calls to the function, each with only a few characters to be written, will the function buffer my characters and only actually write to the file when the buffer is full, or will it write to the file every time I call the function?
I'm curious to know if I need to implement buffering for it or if it's already built in.
According to the source code, the flow applied when using appendFileSync is the following:
- appendFileSync
- writeFileSync
- openSync
- writeSync
- closeSync
Given that the file is opened and closed every time, there is no buffering logic: each call issues its own write, so the data is written out every time you call the function.
Edit: After looking closer, it appears that if you provide the file descriptor yourself, it will not open and close the file for you, so in that case this might be what you are looking for.
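For example, a rough sketch of the file-descriptor variant (the file name and data are illustrative):

const fs = require('fs');

// Open once in append mode and keep the descriptor around.
const fd = fs.openSync('debug.log', 'a');

for (let i = 0; i < 1000; i++) {
  // With a descriptor, appendFileSync skips the open/close on every call,
  // but each call still issues its own write: there is no user-space buffer.
  fs.appendFileSync(fd, `line ${i}\n`);
}

fs.closeSync(fd);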
So, I am in a situation where one process is continuously (every few seconds) writing data to a file (not appending). The data is in the form of JSON. Another process has to read this file at regular intervals, and it could happen that the reading process reads the file while the writing process is still writing to it.
A solution to this problem that I can think of is for the writer process to also write a corresponding checksum file. The reader process would then have to read both the data file and its checksum file. If the calculated checksum doesn't match, the reader would retry until it does; that way it would know that it has read the correct data.
Or maybe a better solution is to read the file twice, a short time apart (much less than the writing interval of the writing process), and check whether the two reads match.
A third way could be to write some magic data at the end of the file, so that the reading process knows it has read the whole file if it encounters that magic data at the end.
What do you think? Are these solutions viable, or are there better methods to achieve this?
Create an entirely new file each time, and rename() the new file once it's been completely written:
If newpath already exists, it will be atomically replaced, so that
there is no point at which another process attempting to access
newpath will find it missing. ...
Some copy of the file will always be there, and it will always be complete and correct:
So, instead of
writeDataFile( "/path/to/data/file.json" );
and then trying to figure out what to do in the reader process(es), you simply do
writeDataFile( "/path/to/data/file.json.new" );
rename( "/path/to/data/file.json.new", "/path/to/data/file.json" );
No locking is necessary, nor any reading of the file and computing checksums and hoping it's correct.
The only issue is that any reader process has to open() the file each time it needs the latest copy - it can't keep an open file descriptor on the file and try to read new contents, because the rename() call unlinks the original file and replaces it with an entirely new one.
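In Node.js terms, the reader side would look roughly like this (the path and polling interval are illustrative):

const fs = require('fs');

setInterval(() => {
  // Re-open the file on every poll. A descriptor kept from before the
  // rename() would still point at the old, unlinked file.
  const text = fs.readFileSync('/path/to/data/file.json', 'utf8');
  const data = JSON.parse(text);
  // ... use data ...
}, 1000);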
If you want to guarantee that the reader always gets all data, consider using a named pipe.
mkfifo ./jsonoutput
Then set one program to write to and the other program to read from this file ./jsonoutput.
So long as the writer is regularly closing and reopening the file after writing each JSON, the reader will get an EOF and process the input.
However, if that isn't the case, the reader will just keep reading and the writer will just keep writing. If the programs aren't designed to handle streams of data like that, then they might just never process the data and the programs will hang.
If that's the case then you could write a program that reads from one named pipe until it gets a complete JSON and then flushes it through a second named pipe to the final program.
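A rough sketch of the open-write-close pattern on the writer side (Linux, assuming the FIFO was created with mkfifo ./jsonoutput and a reader has the other end open; otherwise the write blocks):

const fs = require('fs');

function publish(obj) {
  // Opening, writing, and closing the FIFO gives the reader an EOF after
  // each document, so it knows it has received a complete JSON.
  fs.writeFileSync('./jsonoutput', JSON.stringify(obj) + '\n');
}

setInterval(() => publish({ ts: Date.now() }), 5000);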
Could someone please explain the proper way to use the appendto function?
I am trying to use it to write debug text to a file. I want it written immediately when I call the function, but for some reason the program waits until it exits, and then writes everything at once.
Am I using the right function? Do I need to open, then write, then close the file each time I write to it instead?
Thanks.
Looks like you are having an issue with buffering (this is also a common question in other languages, btw). The data you want to write to the file is being held in a memory buffer and is only written to disk at a later time (this is done to batch disk writes together, for better performance).
One possibility is to open and close the file as you already suggested. Closing a file handle will flush the contents of the buffer to disk.
A second possibility is to use the flush function to explicitly request that the data be written to disk. In Lua 4.0.1, you can either call flush passing a file handle
-- If you have opened your file with open:
local myfile = open("myfile.txt", "a")
flush(myfile)
-- If you used appendto the output file handle is in the _OUTPUT global variable
appendto("myfile.txt")
flush(_OUTPUT)
or you can call flush with no arguments, in which case it will flush all the files you have currently open.
flush()
For details, see the reference manual: http://www.lua.org/manual/4.0/manual.html#6.
How can I implement a system where multiple Node.js processes write to the same file with fs.createWriteStream, such that they don't overwrite data? It looks like the default setup for fs.createWriteStream is that the file is cleared out when that method is called. My goal is to clear out the file once, and then have all other subsequent writers only append data.
Should I use fs.createWriteStream and then fs.appendFile? Or is there a way to open up a stream for each process, not just for the first process to open the file?
Should I use fs.createWriteStream and then fs.appendFile?
You can use either.
With fs.createWriteStream, you have to change the flags option like this:
fs.createWriteStream('your_file', {
  flags: 'a+', // default is 'w' (just 'a' might be enough here, I'm not sure)
})
This should create the file if it doesn't exist, or open it with write access if it does, and position the pointer at the end (append mode).
fs.appendFile does pretty much the same, and how to use it should be clear from the docs.
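For completeness, an fs.appendFile call looks like this (the file name and data are illustrative):

const fs = require('fs');

// Opens the file in append mode, writes, and closes it for you.
fs.appendFile('your_file', 'one more line\n', (err) => {
  if (err) throw err;
});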
Now, the problem with multiple processes accessing the same file: to keep their writes from clobbering or interleaving with each other, only one process should have the file open for writing at any given time.
Therefore each process needs to wait for the file to be released if another process currently holds write access. You will probably need a library for that.
this one for example: https://www.npmjs.com/package/lockup
or this one: https://github.com/Perennials/mutex-node
You can also find a lot more here: https://www.npmjs.com/browse/keyword/lock
or here: https://www.npmjs.com/browse/keyword/mutex
I have not tried any of those libraries but the one I posted and several others on the list should do exactly what you need.
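Whichever library you pick, the general pattern looks roughly like this (acquireLock and release are placeholders here, not the actual API of any of the packages above - check their docs):

const fs = require('fs');

// Hypothetical lock API, for illustration only.
acquireLock('your_file.lock', (err, release) => {
  if (err) throw err;
  fs.appendFile('your_file', 'a line from this process\n', (writeErr) => {
    release(); // let the next process have its turn
    if (writeErr) throw writeErr;
  });
});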
Writing to a single file from multiple processes while ensuring data integrity is a fairly complex operation that you can orchestrate using file locking.
However, you have two simpler approaches:
- Writing to a temporary file from each process, and then concatenating the files at the end of the operations.
- Transmitting what you need to write to a single, dedicated process and delegating the writing to it (see the sketch below). Keep in mind that sending messages between processes can be expensive.
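A minimal sketch of the second approach using the built-in cluster module (the file name and worker count are illustrative; on newer Node versions isMaster is called isPrimary):

const cluster = require('cluster');
const fs = require('fs');

if (cluster.isMaster) {
  // One process owns the file, so its writes never interleave with anyone else's.
  const out = fs.createWriteStream('combined.log', { flags: 'a' });
  for (let i = 0; i < 4; i++) {
    cluster.fork().on('message', (line) => out.write(line + '\n'));
  }
} else {
  // Workers never touch the file; they just send what they want written.
  process.send(`hello from worker ${process.pid}`);
}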
I have found this issue in the following two situations:
- When there is lots of free space on the server.
- When there is no space available on the server.
I am reading a particular JSON file using the following:
fs.readFileSync(_file_path, 'utf-8');
and after some manipulation of the received data, I am writing the updated data to the same file using the following:
fs.writeFileSync(_file_path, stringifiedJson);
During this operation my file sometimes ends up empty. I am trying to reproduce this issue locally but am not able to reproduce it.
fs.writeFileSync() will throw if there was an error, so make sure you have the code in a try/catch block and that your catch is not simply swallowing the error. (For example, an empty catch block will cause the exception to be swallowed or, in other words, ignored.)
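For example, a sketch using the names from the question (_file_path and updatedData stand in for the real path and the manipulated data):

const fs = require('fs');

try {
  fs.writeFileSync(_file_path, JSON.stringify(updatedData));
} catch (err) {
  // Don't leave this block empty: at least log the error so that ENOSPC,
  // EACCES, etc. don't silently result in a missing or truncated file.
  console.error('write failed:', err);
  throw err;
}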
If this is a script or some other process that might get invoked multiple times simultaneously (e.g., multiple processes or you're using workers), then you need to use file locking or some other mechanism to make sure you don't have a race condition. If process A opens the file for writing (thus emptying it) and then process B opens the file for reading before process A is finished with the file, that could result in an empty file if process B reads the empty file and the code is written such that it will write an empty file as a result.
Without more information (e.g., error logs), any answer is likely to be pure guess work. But those are two things I'd check.