fs.writeFile corrupting json files - node.js

I have a program that writes a lot of files very quickly, and I noticed that sometimes there will be extra brackets or text in the JSON files.
Here is how the program works:
There is an array of emojis with some more information, and if that emoji doesn't already have a file for itself, it creates a new one. If there already is an existing file of that name, it will edit it.
Code to write file:
function writeToFile(fileName, file) {
  return new Promise(function (resolve, reject) {
    fs.writeFile(fileName, JSON.stringify(file, null, 2), 'utf8', function (err) {
      if (err) reject(err);
      else resolve();
    });
  });
}
I have tried using fs and graceful-fs and both have had this issue every couple hundred files, with no visible patterns.
Examples of messed-up JSON:
...
],
"trade_times": []
}
]
}ade_times": []
}
]
}
That second "ade_times" shouldn't be there, and I have no idea why it is appearing.
Other times it just looks like this:
{
...
}}
with extra closing brackets for no reason.
Not sure if this is an issue with my code, with fs, or something with my PC. If you need any more information I can provide it (more code, Node.js version, etc.).
Thank you for your time :)

I don't know why your code is buggy, but I have created an alternative writeToFile function for you:
// Async functions are functions that return a promise in a more concise way.
async function writeToFile(filename, file) {
  let fj = JSON.stringify(file, undefined, 2)
  // If fs.writeFileSync throws, the async function's promise is automatically rejected with that error.
  fs.writeFileSync(filename, fj, {
    encoding: "utf8"
  })
  return // This is the equivalent of resolve(); this line is optional too.
}
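Note that fs.writeFileSync blocks the event loop while it writes, which may matter when writing many files quickly. For comparison, a minimal non-blocking sketch using the built-in fs.promises API (Node 10+):
const fs = require('fs');

function writeToFile(fileName, file) {
  // fs.promises.writeFile already returns a promise, so no manual wrapping is needed.
  return fs.promises.writeFile(fileName, JSON.stringify(file, null, 2), 'utf8');
}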
Hope this answer works for you 😊

Related

Nodejs write multiple dynamically changing files with fs writefile

I need to write multiple dynamically changing files based on an array consisting of objects passed to a custom writeData() function. This array consists of objects containing the file name and the data to write as shown below:
[
{
file_name: "example.json",
dataObj,
},
{
file_name: "example2.json",
dataObj,
},
{
file_name: "example3.json",
dataObj,
},
{
file_name: "example4.json",
dataObj,
},
];
My current method is to map this array and read + write new data to each file:
array.map((entry) => {
  fs.readFile(entry.file_name, "utf8", (err, unparsedData) => {
    if (err) console.log(err);
    else {
      var parsedData = JSON.parse(unparsedData);
      parsedData.data.push(entry.dataObj);
      const parsedDataJSON = JSON.stringify(parsedData, null, 2);
      fs.writeFile(entry.file_name, parsedDataJSON, "utf8", (err) => {
        if (err) console.log(err);
      });
    }
  });
});
This, however, does not work. Only a small percentage of the data is written to these files, and oftentimes the file is not correctly written in JSON format (I think this is because two writeFile processes are writing to the same file at once and that breaks the file). Obviously this does not work the way I expected it to.
The ways I have tried to resolve this problem consisted of attempting to make fs.writeFile synchronous (delaying the map loop, allowing each write to finish before moving to the next entry), but this is not good practice, as synchronous calls hang up the entire app. I have also looked into implementing promises, but to no avail. I am a new learner of Node.js, so apologies for missing details/information. Any help is appreciated!
The same file is often listed multiple times in the array if that changes anything.
Well, that changes everything. You should have shown that in the original question. If that is the case, then you have to sequence each individual file in the loop so it finishes one before advancing to the next. To prevent conflicts between writing to the same file, you have to assure yourself of two things:
You sequence each of the files in the loop so the next one doesn't start until the previous one is done.
You don't call this code again while it's still in operation.
You can assure yourself of the first item like this:
async function processFiles(array) {
  for (let entry of array) {
    const unparsedData = await fs.promises.readFile(entry.file_name, "utf8");
    const parsedData = JSON.parse(unparsedData);
    parsedData.data.push(entry.dataObj);
    const json = JSON.stringify(parsedData, null, 2);
    await fs.promises.writeFile(entry.file_name, json, "utf8");
  }
}
This will abort the loop if it gets an error on any of them. If you want it to continue to write the others, you can add a try/catch internally:
async function processFiles(array) {
  let firstError;
  for (let entry of array) {
    try {
      const unparsedData = await fs.promises.readFile(entry.file_name, "utf8");
      const parsedData = JSON.parse(unparsedData);
      parsedData.data.push(entry.dataObj);
      const json = JSON.stringify(parsedData, null, 2);
      await fs.promises.writeFile(entry.file_name, json, "utf8");
    } catch (e) {
      // log error and continue with the rest of the loop
      if (!firstError) {
        firstError = e;
      }
      console.log(e);
    }
  }
  // make sure we communicate back any error that happened
  if (firstError) {
    throw firstError;
  }
}
To assure yourself of the second point above, you will have to either not use a setInterval() (replace it with a setTimeout() that you schedule when the promise returned by processFiles() resolves), or make absolutely sure that the setInterval() time is long enough that it will never fire before processFiles() is done.
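A minimal sketch of the setTimeout() approach (the 60-second delay and the call site are assumptions, not taken from the question):
async function runPeriodically() {
  try {
    await processFiles(array);
  } catch (e) {
    console.log(e);
  }
  // Schedule the next run only after this one has completely finished,
  // so two runs can never overlap the way setInterval() would allow.
  setTimeout(runPeriodically, 60 * 1000);
}

runPeriodically();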
Also, make absolutely sure that you are not modifying the array used in this function while that function is running.

Is this terminal-log the consequence of the Node JS asynchronous nature?

I haven't found anything specific about this; it isn't really a problem, but I would like to understand better what is going on here.
Basically, I'm testing some simple Node.js code, like this:
// Summary: Open a file, write to the file, delete the file.
let fs = require('fs');

fs.open('mynewfile.txt', 'w', function (err, file) {
  if (err) throw err;
  console.log('Created file!');
});

fs.appendFile('mynewfile.txt', 'After creating this file with fs.open, I appended data to it with fs.appendFile', function (err) {
  if (err) throw err;
  console.log('Added text to the file.');
});

fs.unlink('mynewfile.txt', function (err) {
  if (err) throw err;
  console.log('File deleted!');
});

console.log(__dirname);
I thought this code would be executed in the order it was written, from top to bottom, but when I look at the terminal I'm not sure that was the case, because this is what I get:
$ node FileSystem.js
C:\Users\Simon\OneDrive\Desktop\Portfolio\Learning Projects\NodeJS_Tutorial
Created file!
File deleted!
Added text to the file.
// Expected order would be: create file, add text to file, delete file, log dirname.
Instead of what the terminal might make you think, in the end when I look at my folder the code order still seems to have been followed somehow, because the file was deleted and I have nothing left in the directory.
So I was wondering: why doesn't the terminal log in the same order that the code is written, from top to bottom?
Would this be the result of NodeJS asynchronous nature or is it something else ?
The code is (in principle) executed from top to bottom, as you say. But fs.open, fs.appendFile, and fs.unlink are asynchronous. That is, they are started in that particular order, but there is no guarantee whatsoever in which order they finish, and thus you can't guarantee in which order the callbacks are executed. If you run the code multiple times, there is a good chance that you will encounter different execution orders ...
If you need a specific order, you have two different options
You call the later operation only in the callback of the prior one, i.e. something like below:
fs.open('mynewfile.txt', 'w', function (err, file) {
  if (err) throw err;
  console.log('Created file!');
  fs.appendFile('mynewfile.txt', '...', function (err) {
    if (err) throw err;
    console.log('Added text to the file.');
    fs.unlink('mynewfile.txt', function (err) {
      if (err) throw err;
      console.log('File deleted!');
    });
  });
});
You see, that code gets quite ugly and hard to read with all that increasing nesting ...
You switch to the promise-based approach:
let fs = require('fs').promises;
fs.open("myfile.txt", "w")
  .then(file => {
    return fs.appendFile("myfile.txt", "...");
  })
  .then(res => {
    return fs.unlink("myfile.txt");
  })
  .catch(e => {
    console.log(e);
  });
With the promise versions of the operations, you can also use async/await:
async function doit() {
  let file = await fs.open('myfile.txt', 'w');
  await fs.appendFile('myfile.txt', '...');
  await fs.unlink('myfile.txt');
}
For all three possibilities, you probably need to close the file before you can unlink it.
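For instance, with the promise-based API, fs.open resolves to a FileHandle that can be closed explicitly before the unlink; a sketch reusing the names above:
async function doit() {
  const file = await fs.open('myfile.txt', 'w');
  await file.appendFile('...'); // write through the handle
  await file.close();           // close before unlinking
  await fs.unlink('myfile.txt');
}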
For more details, please read about Promises, async/await, and the execution model in JavaScript.
It's a combination of 2 things:
The asynchronous nature of Node.js, as you correctly assume
Being able to unlink an open file
What likely happened is this:
The file was opened and created at the same time (open with flag w)
The file was opened a second time for appending (fs.appendFile)
The file was unlinked
Data was appended to the file (while it was already unlinked) and the file was closed
When data was being appended, the file still existed on disk as an inode, but had zero hard links (references) to it. It still takes up space then, but the OS checks the reference count when closing and frees up the space if the count has fallen to zero.
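On POSIX systems (Linux/macOS) you can observe this directly; a small sketch (the file name is just an example):
const fs = require('fs');

const fd = fs.openSync('ghost.txt', 'w'); // create and open the file
fs.unlinkSync('ghost.txt');               // remove its only directory entry
fs.writeSync(fd, 'still writable');       // the inode is still reachable via the open fd
console.log(fs.fstatSync(fd).nlink);      // prints 0: no hard links remain
fs.closeSync(fd);                         // the space is freed once the fd is closed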
People sometimes run into a similar situation with daemons such as HTTP servers that employ log rotation: if something goes wrong when switching over logs, the old log file may be unlinked but not closed, so it's never cleaned up and it takes space forever (until you reboot or restart the process).
Note that the ordering of operations that you're observing is random, and it is possible that they would be re-ordered. Don't rely on it.
You could write this as (untested):
let fs = require('fs').promises;

const main = async () => {
  await fs.open('mynewfile.txt', 'w');
  await fs.appendFile('mynewfile.txt', 'content');
  await fs.unlink('mynewfile.txt');
};

main()
  .then(() => console.log('success'))
  .catch(console.error);
or within another async function:
const someOtherFn = async () => {
  try {
    await main();
  } catch (e) {
    // handle any rejection to your liking
  }
}
(The catch block is not mandatory. You can opt to just let errors throw to the top. It's just to showcase how async/await allows you to make asynchronous code appear as if it were synchronous code, without running into callback hell.)

Concatenating files using streams

My app generates several files in parallel that I finally have to concatenate before serving them to the user.
Some of these can be big (>10 MB).
In order to achieve this, I have created two routines: one for handling the set of files to process and one for appending the content of a file to a destination file. Everything works fine on small/medium-sized files, but on large ones the content is truncated...
function copyFileContent(ws, rs) {
  let errorDetected = false;
  return new Promise((resolve, reject) => {
    if (rs && ws) {
      ws.on('error', (err) => {
        errorDetected = true;
        return reject();
      });
      ws.on('close', () => {
        if (!errorDetected) {
          resolve(); // -> need to wait for large files...
        } // Else reject has already occurred...
      });
      rs.pipe(ws);
    } else reject();
  });
}
async function mergeFiles(dest, files) {
  for (var i = 0; i < files.length; i++) {
    let w = fs.createWriteStream(dest, { flags: 'a', encoding: 'utf-8' });
    let r = fs.createReadStream(files[i]);
    await copyFileContent(w, r);
  }
}
After some investigation, I have double-checked that the close event on the writeStream (ws) was not a consequence of an error (which could explain why the content is truncated).
I finally added some delay by replacing the resolve() statement with setTimeout(() => { resolve() }, 3000);
Obviously, allowing more time for the system (OS = Windows 10) to actually write to the file fixes the issue. But I don't understand why! What is happening under the hood? How can I avoid such behaviour? I need to make sure that when the 'close' event occurs, the file is indeed fully populated.
Can anyone help me find my bug or my misunderstanding of streams?
Thx
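(For reference, one way to sidestep the hand-rolled event handling is stream.pipeline from 'stream/promises' (Node 15+), whose promise resolves only once the destination stream has finished. This is just a sketch of the approach, not a confirmed fix for the truncation described above.)
const fs = require('fs');
const { pipeline } = require('stream/promises');

async function mergeFiles(dest, files) {
  for (const file of files) {
    // Append each source file to the destination in turn.
    await pipeline(
      fs.createReadStream(file),
      fs.createWriteStream(dest, { flags: 'a' })
    );
  }
}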

in firebase cloud function the bucket.upload promise resolves too early

I wrote a function that works like this:
onNewZipFileRequested
{get all the necessary data}
.then{download all the files}
.then{create a zipfile with all those files}
.then{upload that zipfile} (*here is the problem)
.then{update the database with the signedUrl of the file}
Here is the relevant code
[***CREATION OF ZIP FILE WORKING****]
}).then(() =>{
zip.generateNodeStream({type:'nodebuffer',streamFiles:true})
.pipe(fs.createWriteStream(tempPath))
.on('finish', function () {
console.log("zip written.");
return bucket.upload(tempPath, { //**** problem****
destination: destinazionePath
});
});
}).then(()=>{
const config = {
action:'read',
expires:'03-09-2391'
}
return bucket.file(destinazionePath).getSignedUrl(config)
}).then(risultato=>{
const daSalvare ={
signedUrl: risultato[0],
status : 'fatto',
dataInserimento : zipball.dataInserimento
}
return event.data.ref.set(daSalvare)
})
On the client side, as soon as the app sees the status change and the new URL, a download button (pointing to the new URL) appears.
Everything is working, but if I try to download the file immediately... there is no file yet!!!
If I wait some time and retry, the file is there.
I noted that the time I have to wait depends on the size of the zip file.
The bucket.upload promise should resolve at the end of the upload, but apparently it resolves too early.
Is there a way to know exactly when the file is ready?
I may have to make some very big files; it's not a problem if the process takes several minutes, but I need to know when it's over.
* EDIT *
There was an unnecessary nesting in the code. While it was not the error (results are the same before and after refactoring), it was causing some confusion in the answers, so I edited it out.
I'd like to point out that I update the database only after getting the signed URL, and I get that only after the upload (I could not do otherwise), so to get any result at all the promise chain MUST work, and in fact it does. When the download button appears on the client side (which happens when 'status' becomes 'fatto'), it is already linked to the correct signed URL, but if I press it too early the file is not there (Failed - No file). If I wait a few seconds (the bigger the file, the longer I have to wait) then the file is there.
(English is not my mother tongue; if I have been unclear, ask and I will try to explain myself better.)
It looks like the problem could be that the braces are not aligned properly, causing a then statement to be embedded within another. Here is the code with the then statements separated:
[***CREATION OF ZIP FILE WORKING****]}).then(() => {
zip.generateNodeStream({type: 'nodebuffer', streamFiles: true})
.pipe(fs.createWriteStream(tempPath))
.on('finish', function () {
console.log('zip written.')
return bucket.upload(tempPath, {
destination: destinazionePath
})
})
}).then(() => {
const config = {
action: 'read',
expires: '03-09-2391'
}
return bucket.file(destinazionePath).getSignedUrl(config)
}).then(risultato => {
const daSalvare = {
signedUrl: risultato[0],
status : 'fatto',
dataInserimento : zipball.dataInserimento
}
return event.data.ref.set(daSalvare)
})
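Even with the braces separated, note that the first then callback above does not return anything to the chain; the return bucket.upload(...) sits inside the 'finish' handler, so the outer chain can move on before the upload completes. A sketch (untested, reusing the names from the question) of one way to make the chain wait for both the zip write and the upload; the getSignedUrl and database steps would follow unchanged:
}).then(() => {
  // Wait until the zip file has been fully written to tempPath.
  return new Promise((resolve, reject) => {
    zip.generateNodeStream({type: 'nodebuffer', streamFiles: true})
      .pipe(fs.createWriteStream(tempPath))
      .on('finish', resolve)
      .on('error', reject);
  });
}).then(() => {
  // Now the upload promise is part of the chain, so the getSignedUrl
  // step only runs after the file exists in the bucket.
  return bucket.upload(tempPath, { destination: destinazionePath });
})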

Creating a file only if it doesn't exist in Node.js

We have a buffer we'd like to write to a file. If the file already exists, we need to increment an index on it, and try again. Is there a way to create a file only if it doesn't exist, or should I just stat files until I get an error to find one that doesn't exist already?
For example, I have files a_1.jpg and a_2.jpg. I'd like my method to try creating a_1.jpg and a_2.jpg, and fail, and finally successfully create a_3.jpg.
The ideal method would look something like this:
fs.writeFile(path, data, { overwrite: false }, function (err) {
if (err) throw err;
console.log('It\'s saved!');
});
or like this:
fs.createWriteStream(path, { overwrite: false });
Does anything like this exist in node's fs library?
EDIT: My question isn't if there's a separate function that checks for existence. It's this: is there a way to create a file if it doesn't exist, in a single file system call?
As your intuition correctly guessed, the naive solution with a pair of exists / writeFile calls is wrong. Asynchronous code runs in unpredictable ways. And in the given case it goes like this:
Is there a file a.txt? — No.
(File a.txt gets created by another program)
Write to a.txt if it's possible. — Okay.
But yes, we can do that in a single call. We're working with the file system, so it's a good idea to read the developer manual on fs. And hey, here's an interesting part:
'w' - Open file for writing. The file is created (if it does not
exist) or truncated (if it exists).
'wx' - Like 'w' but fails if path exists.
So all we have to do is just add wx to the fs.open call. But hey, we don't like fopen-like I/O. Let's read up on fs.writeFile a bit more.
fs.writeFile(filename, data[, options], callback)#
filename String
data String | Buffer
options Object
encoding String | Null default = 'utf8'
flag String default = 'w'
callback Function
That options.flag looks promising. So we try
fs.writeFile(path, data, { flag: 'wx' }, function (err) {
if (err) throw err;
console.log("It's saved!");
});
And it works perfectly for a single write. I guess this code will still fail in some more bizarre ways if you try to solve your whole task with it. You have an atomic "check for a_#.jpg existence, and write there if it's free" operation, but all the rest of the fs state is not locked, and the a_1.jpg file may spontaneously disappear while you're already checking a_5.jpg. Most file systems are not ACID databases, and the fact that you're able to do at least some atomic operations is miraculous. It's very likely that the wx code won't work on some platform. So for the sake of your sanity, use a database, finally.
Some more info for the suffering
Imagine we're writing something like memoize-fs that caches results of function calls to the file system to save us some network/CPU time. Could we open the file for reading if it exists, and for writing if it doesn't, all in a single call? Let's take a closer look at those flags. After a while of mental exercise we can see that a+ does what we want: if the file doesn't exist, it creates one and opens it both for reading and writing, and if the file exists it does so without clearing the file (as w+ would). But now we cannot use it either in the (smth)File functions or in the create(Smth)Stream functions. And that seems like a missing feature.
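A sketch of working around that with the lower-level promise API, reading (or creating) a cache file without ever truncating it (the function and file names are just for illustration):
const fs = require('fs').promises;

async function readOrCreate(path) {
  // 'a+' creates the file if needed and never truncates an existing one.
  const handle = await fs.open(path, 'a+');
  try {
    return await handle.readFile('utf8');
  } finally {
    await handle.close();
  }
}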
So feel free to file it as a feature request (or even a bug) on the Node.js GitHub, as the lack of an atomic asynchronous file system API is a drawback of Node. Though don't expect changes any time soon.
Edit: I would like to link to the articles by Linus and by Dan Luu on why exactly you don't want to do anything smart with your fs calls, since the claim above was otherwise left mostly unsubstantiated.
What about using the a option?
According to the docs:
'a+' - Open file for reading and appending. The file is created if it does not exist.
It seems to work perfectly with createWriteStream
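For example (a sketch; note that createWriteStream takes the flag via the plural flags option):
// Appends if the file exists, creates it if it doesn't, and never truncates.
const stream = fs.createWriteStream(path, { flags: 'a' });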
This method is no longer recommended. fs.exists is deprecated. See comments.
Here are some options:
1) Have 2 "fs" calls. The first one is the "fs.exists" call, and the second is "fs.write / read, etc"
// Checks if the file exists.
// If it does, it just calls back.
// If it doesn't, then the file is created.
function checkForFile(fileName, callback) {
  fs.exists(fileName, function (exists) {
    if (exists) {
      callback();
    } else {
      fs.writeFile(fileName, '', { flag: 'wx' }, function (err) {
        callback();
      });
    }
  });
}
function writeToFile() {
  checkForFile("file.dat", function () {
    // It is now safe to write/read to file.dat
    fs.readFile("file.dat", function (err, data) {
      // do stuff
    });
  });
}
2) Or Create an empty file first:
--- Sync:
//If you want to force the file to be empty then you want to use the 'w' flag:
var fd = fs.openSync(filepath, 'w');
//That will truncate the file if it exists and create it if it doesn't.
//Wrap it in an fs.closeSync call if you don't need the file descriptor it returns.
fs.closeSync(fs.openSync(filepath, 'w'));
--- Async:
var fs = require("fs");
fs.open(path, "wx", function (err, fd) {
  // handle error
  fs.close(fd, function (err) {
    // handle error
  });
});
3) Or use "touch": https://github.com/isaacs/node-touch
To do this in a single system call you can use the fs-extra npm module. After this, the file will have been created, as well as the directory it is to be placed in.
const fs = require('fs-extra');
const file = '/tmp/this/path/does/not/exist/file.txt'
fs.ensureFile(file, err => {
console.log(err) // => null
});
Another way is to use ensureFileSync, which will do the same thing but synchronously.
const fs = require('fs-extra');
const file = '/tmp/this/path/does/not/exist/file.txt'
fs.ensureFileSync(file)
With async/await and TypeScript I would do:
import * as fs from 'fs'
async function upsertFile(name: string) {
try {
// try to read file
await fs.promises.readFile(name)
} catch (error) {
// create empty file, because it wasn't found
await fs.promises.writeFile(name, '')
}
}
Here's a synchronous way of doing it:
try {
  fs.truncateSync(filepath, 0);
} catch (err) {
  fs.writeFileSync(filepath, "", { flag: "wx" });
}
If the file exists it will get truncated; otherwise, the error is caught and the file gets created.
This works for me.
// Use the file system fs promises
const { access } = require('fs/promises');

// File exists: returns true
// Don't use fs.exists, which is deprecated!
const fexists = async (path) => {
  try {
    await access(path);
    return true;
  } catch {
    return false;
  }
}

// Wrapper for your main program
async function mainapp() {
  if (await fexists("./users.json")) {
    console.log("File is here");
  } else {
    console.log("File not here - so make one");
  }
}

// run your program
mainapp();
Just keep an eye on your async/awaits so everything plays nicely.
Hope this helps.
You can do something like this:
function writeFile(i) {
  var i = i || 0;
  var fileName = 'a_' + i + '.jpg';
  fs.exists(fileName, function (exists) {
    if (exists) {
      writeFile(++i);
    } else {
      // data is the buffer from the question
      fs.writeFile(fileName, data, function (err) {
        if (err) throw err;
      });
    }
  });
}
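Since fs.exists is deprecated and the check-then-write above can still race, here is a sketch of the same recursion using the 'wx' flag from the earlier answer instead (data is assumed to be the buffer from the question):
function writeFile(i) {
  i = i || 0;
  var fileName = 'a_' + i + '.jpg';
  // 'wx' makes the existence check and the write a single atomic operation.
  fs.writeFile(fileName, data, { flag: 'wx' }, function (err) {
    if (err && err.code === 'EEXIST') {
      writeFile(i + 1); // name taken, try the next index
    } else if (err) {
      throw err;
    }
  });
}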
