fs.readFile anomaly - node.js

I'm trying to run a script that opens a bunch of files asynchronously and reads their contents. I'm getting an error where fs.readFile's callback comes back with no data, yet the file is there and is not currently being opened by anything else. Totally confused.
The error is:
Error: OK, open
'D:\Workspace\fasttrack\public\cloudmade\images\998\256\6\31\61.png'
More info:
The program runs through a loop that has a bunch of objects in it that look like this:
newObject = {
    filePath: filePath,
    scanPixels: function(parent) {
        ....
    }
}
The loop calls each object's scanPixels function, which then does an fs.readFile on parent.filePath.
Here is the for loop:
for (var index = 0; index < objects.length; index++) {
    objects[index].scanPixels(objects[index]);
}
The scanPixels function is essentially this:
scanPixels: function(parent) {
    png_js.decode(parent.filePath, function(pixels) {
        ...more stuff
And in the png_js file:
PNG.decode = function(path, fn) {
    return fs.readFile(path, function(err, file) {
        var png;
        png = new PNG(file);
        return png.decode(function(pixels) {
            return fn(pixels);
        });
    });
};

The problem is that fs.readFile does not return a value; you would want fs.readFileSync for that:
var buffer1 = fs.readFileSync('./hello1.txt');
var buffer2;
fs.readFile('./hello2.txt', function (err, file) {
    buffer2 = file;
});
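To make the asynchronous case concrete, here is a minimal sketch (assuming a hello2.txt exists next to the script): the buffer is only populated once the callback fires, not on the line right after the call.
var fs = require('fs');

var buffer2;
fs.readFile('./hello2.txt', function (err, file) {
    if (err) throw err;
    buffer2 = file;
    console.log('inside the callback:', buffer2.toString()); // the file's contents
});
console.log('right after the call:', buffer2); // undefined, because the read hasn't finished yet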

Related

Fs.writeFile callback not called

Node version: 8.11.2
I have a simple CSV export function that takes an array of objects and generates the file's headers from the properties of those objects.
const exportCsv = (list, fileName) => {
    if (list.length > 0) {
        let headers = Object.keys(list[0]);
        let opts = { headers };
        let parser = new Parser(opts);
        let csv = parser.parse(list);
        fs.writeFile(`./output/${fileName}.csv`, csv, err => {
            if (err) {
                console.error(err);
            }
            console.log(`Wrote ${fileName} to disk.`);
        });
    } else {
        console.log('List is Empty. Nothing to export.');
    }
};
It was working great, but now the callback in the fs.writeFile call isn't firing, and there are no errors or exceptions from VS Code's debugger.
What would cause it to not run?
If the process exits before the write has finished, your callback will not be called, because fs.writeFile is asynchronous.
So you have two options:
make sure your process is not dead before the write is done (you can use async/await or a promise)
use writeFileSync instead (less efficient, but less confusing); both are sketched below
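A minimal sketch of both options, using util.promisify since fs.promises is not available on Node 8.11.2 (the surrounding exportCsv logic is simplified here):
const fs = require('fs');
const util = require('util');
const writeFile = util.promisify(fs.writeFile);

// Option 1: keep the write asynchronous, but let the caller await it so the
// process stays alive until the write has finished.
const exportCsvAsync = async (csv, fileName) => {
    await writeFile(`./output/${fileName}.csv`, csv);
    console.log(`Wrote ${fileName} to disk.`);
};

// Option 2: write synchronously; the call blocks until the file is on disk.
const exportCsvSync = (csv, fileName) => {
    fs.writeFileSync(`./output/${fileName}.csv`, csv);
    console.log(`Wrote ${fileName} to disk.`);
};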

Stop function from being invoked multiple times

I'm in the process of building a file upload component that allows you to pause/resume file uploads.
The standard way to achieve this seems to be to break the file into chunks on the client machine, then send the chunks along with book-keeping information up to the server which can store the chunks into a staging directory, then merge them together when it has received all of the chunks. So, this is what I am doing.
I am using node/express and I'm able to get the files fine, but I'm running into an issue because my merge_chunks function is being invoked multiple times.
Here's my call stack:
router.post('/api/videos',
    upload.single('file'),
    validate_params,
    rename_uploaded_chunk,
    check_completion_status,
    merge_chunks,
    record_upload_date,
    videos.update,
    send_completion_notice
);
The check_completion_status function is implemented as follows:
/* Recursively check to see if we have every chunk of a file */
var check_completion_status = function (req, res, next) {
    var current_chunk = 1;
    var see_if_chunks_exist = function () {
        fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
            if (current_chunk > req.total_chunks) {
                next();
            } else if (exists) {
                current_chunk++;
                see_if_chunks_exist();
            } else {
                res.sendStatus(202);
            }
        });
    };
    see_if_chunks_exist();
};
The file names in the staging directory have the chunk numbers embedded in them, so the idea is to see if we have a file for every chunk number. The function should only next() one time for a given (complete) file.
However, my merge_chunks function is being invoked multiple times (usually between 1 and 4). Logging does reveal that it's only invoked after I've received all of the chunks.
With this in mind, my assumption here is that it's the async nature of the fs.exists function that's causing the issue.
Even though the nth invocation of check_completion_status may occur before I have all of the chunks, by the time we get to the nth call to fs.exists(), x more chunks may have arrived and been processed concurrently, so the function can keep going and in some cases get to the end and call next(). However, those chunks that arrived concurrently also correspond to invocations of check_completion_status, which are also going to call next(), because we obviously have all of the files at this point.
This is causing issues because I didn't account for this when I wrote merge_chunks.
For completeness, here's the merge_chunks function:
var merge_chunks = (function () {
    var pipe_chunks = function (args) {
        args.chunk_number = args.chunk_number || 1;
        if (args.chunk_number > args.total_chunks) {
            args.write_stream.end();
            args.next();
        } else {
            var file_name = get_chunk_file_name(args.chunk_number, args.file_id);
            var read_stream = fs.createReadStream(file_name);
            read_stream.pipe(args.write_stream, {end: false});
            read_stream.on('end', function () {
                // once we're done with the chunk we can delete it and move on to the next one.
                fs.unlink(file_name);
                args.chunk_number += 1;
                pipe_chunks(args);
            });
        }
    };
    return function (req, res, next) {
        var out = path.resolve('videos', req.video_id);
        var write_stream = fs.createWriteStream(out);
        pipe_chunks({
            write_stream: write_stream,
            file_id: req.file_id,
            total_chunks: req.total_chunks,
            next: next
        });
    };
}());
Currently, I'm receiving an error because the second invocation of the function is trying to read the chunks that have already been deleted by the first invocation.
What is the typical pattern for handling this type of situation? I'd like to avoid a stateful architecture if possible. Is it possible to cancel pending handlers right before calling next() in check_completion_status?
If you just want to make it work ASAP, I would use a lock (much like a db lock) to lock the resource so that only one of the requests processes the chunks. Simply create a unique id on the client and send it along with the chunks. Then store that unique id in some sort of data structure, and look the id up prior to processing. The example below is far from optimal (in fact this map will keep growing, which is bad), but it should demonstrate the concept.
// Create a map (an array would work too) and keep track of the video ids that were processed. This map will persist through each request.
var processedVideos = {};
var check_completion_status = function (req, res, next) {
    var current_chunk = 1;
    var see_if_chunks_exist = function () {
        fs.exists(get_chunk_file_name(current_chunk, req.file_id), function (exists) {
            if (processedVideos[req.query.uniqueVideoId]) {
                res.sendStatus(202);
            } else if (current_chunk > req.total_chunks) {
                processedVideos[req.query.uniqueVideoId] = true;
                next();
            } else if (exists) {
                current_chunk++;
                see_if_chunks_exist();
            } else {
                res.sendStatus(202);
            }
        });
    };
    see_if_chunks_exist();
};
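To keep that map from growing forever, one option (a rough sketch, not from the original answer) is to evict entries after the merge has had time to finish, for example with a timer; a late duplicate request would then just get a 202 from the missing-chunk check, because the chunk files have already been deleted:
// Hypothetical helper: mark a video as processed, then forget it again later
// so processedVideos does not grow without bound.
var ONE_HOUR = 60 * 60 * 1000;
var mark_processed = function (uniqueVideoId) {
    processedVideos[uniqueVideoId] = true;
    setTimeout(function () {
        delete processedVideos[uniqueVideoId];
    }, ONE_HOUR);
};
check_completion_status would then call mark_processed(req.query.uniqueVideoId) instead of assigning into the map directly.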

How do I loop until a file with specific data inside is found in Node.js?

I'm learning a lot about Node.js by rewriting some utility tools I had in C# for the fun of it. I have either found something that is not a good idea to write in Node.js or I'm completely missing a concept that will make it work.
The goal of the program: Search a directory of files for a file with data that matches some criteria. The files are gzipped XML, and for the time being I'm just looking for one tag. Here's what I tried (files is an array of file names):
while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.pop();
    var fileContents = fs.readFileSync(currentPath);
    zlib.gunzip(fileContents, function(err, buff) {
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
            return;
        }
    });
    if (files.length % 1000 === 0) {
        console.log("%d files remain...", files.length);
    }
}
I was nervous about this when I wrote it. It's clear from the console output that all of the gunzip operations are asynchronous and decide to wait until the while loop is complete. That means when I finally do get some output, currentPath doesn't have the value it had when the file was read, so the program is useless. I don't see a synchronous way to decompress the data with the zlib module. I don't see a way to store the context (currentPath would do) so the callback has the right value. I originally tried streams, piping a file stream to a gunzip stream, but I had a similar problem in that all of my callbacks happened after the loop had completed and I'd lost useful context.
It's been a long day and I'm out of ideas for how to structure this. The loop is a synchronous thing, and my asynchronous stuff depends on its state. That is bad. What am I missing? If the files weren't gzipped, this would be easy because of readFileSync().
Wow. I didn't really expect no answers at all. I got in a time crunch but I spent the last couple of days looking over Node.js, hypothesizing why certain things were working like they did, and learning about control flow.
So the code as-is doesn't work because I need a closure to capture the value of currentPath. Boy does Node.js like closures and callbacks. So a better structure for the application would look like this:
function checkFile(currentPath) {
    var fileContents = fs.readFileSync(currentPath);
    zlib.gunzip(fileContents, function(err, buff) {
        if (buff.toString().indexOf("position") !== -1) {
            console.log("The file '%s' has an odometer reading.", currentPath);
            return;
        }
    });
}

while (files.length > 0) {
    var currentPath = rootDir + "\\" + files.shift();
    checkFile(currentPath);
}
But it turns out that's not very Node, since there's so much synchronous code. To do it asynchronously, I need to lean on more callbacks. The program turned out longer than I expected so I'll only post part of it for brevity, but the first bits of it look like this:
function checkForOdometer(currentPath, callback) {
    fs.readFile(currentPath, function(err, data) {
        unzipFile(data, function(hasReading) {
            callback(currentPath, hasReading);
        });
    });
}

function scheduleCheck(filePath, callback) {
    process.nextTick(function() {
        checkForOdometer(filePath, callback);
    });
}

var withReading = 0;
var totalFiles = 0;
function series(nextPath) {
    if (nextPath) {
        var fullPath = rootDir + nextPath;
        totalFiles++;
        scheduleCheck(fullPath, function(currentPath, hasReading) {
            if (hasReading) {
                withReading++;
                console.log("%s has a reading.", currentPath);
            }
            series(files.shift());
        });
    } else {
        console.log("%d files searched.", totalFiles);
        console.log("%d had a reading.", withReading);
    }
}
series(files.shift());
The reason for the series control flow is that it seems if I set up the obvious parallel search I end up running out of process memory, probably from having 60,000+ buffers' worth of data sitting in memory:
while (files.length > 0) {
    var currentPath = rootDir + files.shift();
    checkForOdometer(currentPath, function(callbackPath, hasReading) {
        //...
    });
}
I could probably set it up to schedule batches of, say, 50 files in parallel and wait to schedule 50 more when those are done. Setting up the series control flow seemed just as easy.
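That batching idea could look something like the sketch below (not part of the original program; it reuses checkForOdometer, files, rootDir, totalFiles and withReading from above and processes up to 50 files at a time):
function runBatch() {
    var batch = files.splice(0, 50); // take the next 50 paths (fewer on the last batch)
    if (batch.length === 0) {
        console.log("%d files searched.", totalFiles);
        console.log("%d had a reading.", withReading);
        return;
    }
    var remaining = batch.length;
    batch.forEach(function(name) {
        totalFiles++;
        checkForOdometer(rootDir + name, function(currentPath, hasReading) {
            if (hasReading) {
                withReading++;
                console.log("%s has a reading.", currentPath);
            }
            remaining--;
            if (remaining === 0) {
                runBatch(); // only start the next batch once this one has drained
            }
        });
    });
}
runBatch();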

node.js async parallel function returns a single result several times

I'm trying to upload several files in parallel using node.js's async module. My code looks like:
// fileArr is an array of objects. Each object contains the attributes of the file to be uploaded - filename, path, destination path, etc.
// toUpload is an array to which I push all the functions I want to execute with async.parallel
for (i = 0; i < fileArr.length; i++) {
    var src = fileArr[i]['path']
    var dest = fileArr[i]['dest']
    var fn = function(cb) {
        self.upload(src, dest, headers, function(err, url) {
            cb(null, url)
        })
    }
    toUpload.push(fn)
} // for loop ends here
async.parallel(toUpload, function(err, results) {
    console.log('results: ' + results)
})
My problem: for n = the number of functions in toUpload, the results array in the final callback contains the result from the last parallel task, repeated n times. I can't figure this out; it seems like every function should invoke its own callback with (null, url).
Also, when I try calling the self.upload function with the definitions of src and dest directly:
self.upload(fileArr[i]['path'], fileArr[i]['dest'], headers, function(err, url) {
    cb(null, url)
})
I get an error saying "cannot read property 'path' of undefined". So fileArr[i] is undefined. Why does this happen? I feel like there is some weirdness with assignments and scope going on...
If it's not immediately obvious from the question (and the code), I'm pretty new to programming.
Keep in mind, this is essentially:
var src, dest
for (i = 0; i < fileArr.length; i++) {
    src = fileArr[i]['path']
    dest = fileArr[i]['dest']
    var fn = function(cb) {
        self.upload(src, dest, headers, function(err, url) {
            cb(null, url)
        })
    }
    toUpload.push(fn)
}
which may make it clearer that by the time your fn function is called, the for loop has finished, so src and dest will have their final loop values for every call to fn.
Similarly, for self.upload(fileArr[i]['path'], fileArr[i]['dest'], ..., by the time your function runs, the value of i === fileArr.length because the for-loop has finished.
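If you wanted to keep building toUpload by hand, giving each function its own copy of the values per iteration (for example with forEach) would avoid the problem; a sketch, keeping the rest of the question's code as-is:
fileArr.forEach(function(file) {
    // `file` is a fresh binding on every iteration, so each pushed function
    // closes over its own values rather than the shared loop variables.
    toUpload.push(function(cb) {
        self.upload(file['path'], file['dest'], headers, function(err, url) {
            cb(null, url)
        })
    })
})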
The easiest solution for this would be to use async.map instead.
async.map(
    fileArr,
    function(file, callback) {
        self.upload(file['path'], file['dest'], headers, function(err, url) {
            callback(null, url);
        });
    },
    function(err, results) {
        console.log('results: ' + results)
    }
)
I'm passing null as the error because that is what you are doing in your example, but you should probably not be discarding errors since they might be important.
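If you do want errors to surface, a small variation on the snippet above (not the original answer's code) is to pass err straight through; async.map then stops at the first failure and hands that error to the final callback:
async.map(
    fileArr,
    function(file, callback) {
        self.upload(file['path'], file['dest'], headers, function(err, url) {
            callback(err, url); // propagate upload errors instead of swallowing them
        });
    },
    function(err, results) {
        if (err) {
            return console.error('upload failed: ' + err);
        }
        console.log('results: ' + results);
    }
)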

Catching console.log in node.js?

Is there a way that I can catch eventual console output caused by console.log(...) within node.js to prevent cluttering the terminal whilst unit testing a module?
Thanks
A better way could be to hook directly into the stream whose output you need to catch, because with Linus's method, if some module writes directly to stdout with process.stdout.write('foo') for example, it won't be caught.
var logs = [],
    hook_stream = function(_stream, fn) {
        // Reference the default write method
        var old_write = _stream.write;
        // _stream now writes with our shiny function
        _stream.write = fn;
        return function() {
            // reset to the default write method
            _stream.write = old_write;
        };
    },
    // hook up standard output
    unhook_stdout = hook_stream(process.stdout, function(string, encoding, fd) {
        logs.push(string);
    });

// goes to our custom write method
console.log('foo');
console.log('bar');

unhook_stdout();

console.log('Not hooked anymore.');

// Now do what you want with logs stored by the hook
logs.forEach(function(_log) {
    console.log('logged: ' + _log);
});
EDIT
console.log() ends its output with a newline; if you want to strip it, you'd better write:
_stream.write = function(string, encoding, fd) {
    var new_str = string.replace(/\n$/, '');
    fn(new_str, encoding, fd);
};
EDIT
Improved, generic way to do this on any method of any object, with async support. See the gist.
module.js:
module.exports = function() {
    console.log("foo");
}
program:
console.log = function() {};
mod = require("./module");
mod();
// Look ma no output!
Edit: Obviously you can collect the log messages for later if you wish:
var log = [];
console.log = function() {
    log.push([].slice.call(arguments));
};
capture-console solves this problem nicely.
var capcon = require('capture-console');

var stderr = capcon.captureStderr(function scope() {
    // whatever is done in here has stderr captured,
    // the return value is a string containing stderr
});

var stdout = capcon.captureStdout(function scope() {
    // whatever is done in here has stdout captured,
    // the return value is a string containing stdout
});
and later
Intercepting
You should be aware that all capture functions will still pass the values through to the main stdio write() functions, so logging will still go to your standard IO devices.
If this is not desirable, you can use the intercept functions. These functions are literally s/capture/intercept when compared to those shown above, and the only difference is that calls aren't forwarded through to the base implementation.
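Going by that description, the intercepting variant would look like the following (a sketch assuming interceptStdout mirrors captureStdout exactly, as the README implies):
var capcon = require('capture-console');

var stdout = capcon.interceptStdout(function scope() {
    // output produced in here is collected into the `stdout` string
    // and is NOT forwarded to the real terminal
    console.log('this only ends up in the stdout string');
});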
Simply adding the following snippet to your code will let you catch the logs and still print them to the console:
var log = [];
console.log = function(d) {
    log.push(d);
    process.stdout.write(d + '\n');
};
