NodeJS, batch processing files, alternative to writing excessive objects/functions per file?

I need to read all the mustache templates from a directory named templates/ and compile them with hogan.
For the sake of argument, assume their names are:
file1.mustache
file2.mustache
file3.mustache
Then we render each one with a view and save the result to an output directory named build/.
The resulting names would be:
name.file1
name.file2
name.file3
Obviously async is preferable, but what I'm most interested in is how you'd do this efficiently. I can't believe that the only way is writing per-file objects and anonymous functions.

You could use the fs-promise module along with Promise.all to easily read, process, and write your files in parallel:
const fsp = require('fs-promise');
const path = require('path');
const hogan = require('hogan.js');

function processTemplate(filename) {
  // read the template, compile it, and write the compiled output into build/
  return fsp.readFile(path.join('./templates', filename), 'utf8')
    .then((template) => hogan.compile(template, { asString: true })) // asString returns the compiled template as a string
    .then((compiled) => fsp.writeFile(path.join('./build', filename), compiled));
}

fsp.readdir('./templates')
  .then((files) => Promise.all(files.map(processTemplate)))
  .catch((error) => console.error(error));
Although I'm not sure I understand what you mean by "per file objects and anonymous functions".
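For what it's worth, on current Node versions the built-in fs.promises API covers the same ground without an extra dependency. Here is a rough sketch under the question's assumptions (templates/ in, build/ out); the empty view object is a placeholder for whatever data you actually render with:
const fs = require('fs/promises');
const path = require('path');
const hogan = require('hogan.js');

async function build() {
  const files = await fs.readdir('./templates');
  await Promise.all(files.map(async (filename) => {
    const source = await fs.readFile(path.join('./templates', filename), 'utf8');
    const rendered = hogan.compile(source).render({}); // substitute your real view here
    const outName = path.basename(filename, '.mustache');
    await fs.writeFile(path.join('./build', outName), rendered);
  }));
}

build().catch(console.error);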

Related

Unable to use one readable stream to write to two different targets in Node JS

I have a client side app where users can upload an image. I receive this image in my Node JS app as readable data and then manipulate it before saving like this:
uploadPhoto: async (server, request) => {
  try {
    const randomString = `${uuidv4()}.jpg`;
    const stream = Fse.createWriteStream(`${rootUploadPath}/${userId}/${randomString}`);
    const resizer = Sharp()
      .resize({
        width: 450
      });
    await data.file
      .pipe(resizer)
      .pipe(stream);
This works fine and writes the file to the project's local directory. The problem comes when I try to use the same readable data again in the same async function. Please note, all of this code is inside a try block.
const stream2 = Fse.createWriteStream(`${rootUploadPath}/${userId}/thumb_${randomString}`);
const resizer2 = Sharp()
  .resize({
    width: 45
  });
await data.file
  .pipe(resizer2)
  .pipe(stream2);
The second file is written, but when I check the file, it seems corrupted or didn't successfully write the data. The first image is always fine.
I've tried a few things and found one method that seems to work, but I don't understand why. I add this code just before I create the second write stream:
data.file.on('end', () => {
  console.log('There will be no more data.');
});
Putting the code for the second write stream inside the on-end callback doesn't make a difference; however, if I leave that code outside of the callback, between the first write stream code and the second write stream code, then it works and both files are successfully written.
It doesn't feel right leaving the code the way it is. Is there a better way I can write the second thumbnail image? I've tried using the Sharp module to read the file after the first write stream has written the data, and then create a smaller version of it, but it doesn't work. The file never seems to be ready to use.
You have two alternatives, and which one fits depends on how your software is designed.
If possible, I would avoid executing two transform operations on the same stream in the same "context", e.g. an API endpoint. I would rather separate those two transforms so they do not work on the same input stream.
If that is not possible or would require too many changes, the solution is to fork the input stream and then pipe it into two different Writable streams. I normally use Highland.js fork for these tasks.
Please also see my comments on how to properly handle streams with async/await to check when the write operation has finished.
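For illustration, here is a minimal sketch of what the fork approach could look like with Highland.js; the writeImage helper is made up for the example, and the paths reuse the variables from the question:
const _ = require('highland');
const Sharp = require('sharp');
const Fse = require('fs-extra');

// wrap the incoming readable once, then fork it for each consumer;
// both forks must be consumed for the data to flow
const source = _(data.file);
const fullFork = source.fork();
const thumbFork = source.fork();

const writeImage = (fork, width, target) =>
  new Promise((resolve, reject) => {
    fork
      .pipe(Sharp().resize({ width }))
      .pipe(Fse.createWriteStream(target))
      .on('finish', resolve)
      .on('error', reject);
  });

await Promise.all([
  writeImage(fullFork, 450, `${rootUploadPath}/${userId}/${randomString}`),
  writeImage(thumbFork, 45, `${rootUploadPath}/${userId}/thumb_${randomString}`),
]);
Awaiting the 'finish' events (rather than the pipe calls themselves) is what actually tells you the write operations have completed.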

Running node js export in google cloud function

We need to export a zip file containing a lot of data (a couple of GB). The zip archive needs to contain about 50-100 InDesign files (each about 100 MB) and some other smaller files. We are trying to use Google Cloud Functions to achieve this (lower costs, etc.). The function is triggered by a config file that is uploaded into a bucket; the config file contains all the information about which files need to be put into the zip. Unfortunately, the memory limit of 2 GB is always reached, so the function never succeeds.
We tried different things:
The first attempt was to loop over the files, create a promise to download each one, and await all the promises at once after the loop (the files are streamed directly into a file on disk).
The second attempt was to await every download inside the for loop, but again the memory limit was reached.
So my question is:
Why does Node.js not release the streams? It seems like Node keeps every streamed file in memory and finally crashes. I already tried setting the readStream and writeStream to null as suggested here:
How to prevent memory leaks in node.js?
But no change.
Note: we never reach the point where all files are downloaded so the zip file can be created. It always fails after the first few files.
See below the code snippets:
// First try via Promise.all:
const promises = []
for (const file of files) {
  promises.push(downloadIndesignToExternal(file, 'xxx', dir));
}
await Promise.all(promises)

// Second try: await every step (not performant in terms of execution time,
// but we wanted to know if the memory limit is also reached):
for (const file of files) {
  await downloadIndesignToExternal(file, 'xxx', dir);
}

// Code to download an InDesign file
function downloadIndesignToExternal(activeId, externalId, dir) {
  return new Promise((resolve, reject) => {
    let readStream = storage.bucket(INDESIGN_BUCKET).file(`${activeId}.indd`).createReadStream()
    let writeStream = fs.createWriteStream(`${dir}/${externalId}.indd`);
    readStream.pipe(writeStream);
    writeStream.on('finish', () => {
      resolve();
    });
    writeStream.on('error', (err) => {
      reject('Could not write file');
    })
  })
}
It's important to know that /tmp (os.tmpdir()) is a memory-based filesystem in Cloud Functions. When you download a file to /tmp, it is taking up memory just as if you had saved it to memory in a buffer.
If your function needs more memory than can be configured for a function, then Cloud Functions might not be the best solution to this problem.
If you still want to use Cloud Functions, you will have to find a way to stream the input files directly to the output file, but without saving any intermediate state in the function. I'm sure this is possible, but you will probably need to write a fair amount of extra code for this.
For anyone interested:
We got it working by streaming the files into the zip and streaming the zip directly into Google Cloud Storage. Memory usage is now around 150-300 MB, so this works perfectly for us.
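A rough sketch of that approach, assuming the archiver package and @google-cloud/storage; the exportZip function, target bucket, and object name are placeholders, while INDESIGN_BUCKET comes from the question's code:
const archiver = require('archiver');
const { Storage } = require('@google-cloud/storage');

const storage = new Storage();

async function exportZip(files, targetBucket, targetName) {
  const archive = archiver('zip');
  const output = storage.bucket(targetBucket).file(targetName).createWriteStream();

  const done = new Promise((resolve, reject) => {
    output.on('finish', resolve);
    output.on('error', reject);
    archive.on('error', reject);
  });

  // the archive is piped straight to Cloud Storage, so nothing lands in /tmp
  archive.pipe(output);

  for (const file of files) {
    archive.append(
      storage.bucket(INDESIGN_BUCKET).file(`${file}.indd`).createReadStream(),
      { name: `${file}.indd` }
    );
  }

  await archive.finalize();
  await done;
}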

Puppeteer evaluate not executing required function

I've got a little app I've written with Node.js and Puppeteer. I'm trying to require a function from a different file into my evaluate callback, but the function never fires and causes evaluate to fail. Here is a pretty simple example; maybe somebody can see if I'm just doing something stupid here.
Evaluate is called from File A
product = await page.evaluate( source.getProductInformation )
source.getProductInformation is defined in File B. It fails when I call a function that I require from within File B:
const priceSavePercent = calculateSavePercentage(priceWasNum, priceCurrentNum)
calculateSavePercentage is simply required at the top of File B: const { calculateSavePercentage } = require('../modules/helpers')
I've tried console logging everywhere and don't get any output to my console, and my evaluate callback doesn't return the object it's supposed to. Is there a different way I'm supposed to require dependencies into File B? An npm package and a constant are also required in File B, and neither causes issues. Any help is greatly appreciated. Let me know if you need any more info.
The problem is that the code you pass to evaluate runs in the context of the page you are scraping, not in your own Node.js process, so the function you required there isn't defined. You could try something like this:
await page.evaluate((source) => {
  return source.getProductInformation();
}, source);
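If getProductInformation only needs data from the page, another pattern (not from the answer above) is to extract raw, serialisable values inside evaluate and run the Node-side helpers on the result; the selectors below are placeholders for whatever the real markup uses:
const { calculateSavePercentage } = require('../modules/helpers');

// pull plain values out of the page context...
const raw = await page.evaluate(() => ({
  priceWas: document.querySelector('.price-was')?.textContent,        // placeholder selector
  priceCurrent: document.querySelector('.price-current')?.textContent // placeholder selector
}));

// ...and do the calculation in Node, where the required helper exists
const priceSavePercent = calculateSavePercentage(
  parseFloat(raw.priceWas),      // adjust the parsing to your markup
  parseFloat(raw.priceCurrent)
);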

What is the most efficient way to keep writing a frequently changing JavaScript object to a file in NodeJS?

I have a JavaScript object with many different properties, and it might look something like this:
var myObj = {
  prop1: "val1",
  prop2: [...],
  ...
}
The values in this object keep updating very frequently (several times every second) and there could be thousands of them. New values could be added, existing ones could be changed or removed.
I want to have a file that always has the updated version of this object. The simple approach for doing this would be just writing the entire object to the file all over again after each time that it changes like so:
fs.writeFileSync("file.json", JSON.stringify(myObj));
This doesn't seem very efficient for big objects that need to be written very frequently. Is there a better way of doing this?
You should use a database. Something simple like sqlite3 would be a good option. Have a table with just two columns, 'Key' and 'Value', and use it as a key-value store. You will gain advantages like transactions and better performance than a file, as well as simplifying your access.
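As a minimal sketch of that key/value idea, here is one way it could look; the answer only says "sqlite3", so the better-sqlite3 binding, database file, and table name below are assumptions:
const Database = require('better-sqlite3');
const db = new Database('state.db');

db.exec('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)');

const upsert = db.prepare(
  'INSERT INTO kv (key, value) VALUES (?, ?) ' +
  'ON CONFLICT(key) DO UPDATE SET value = excluded.value'
);

// write only the property that changed, instead of re-serialising the whole object
function setProp(key, value) {
  upsert.run(key, JSON.stringify(value));
}

setProp('prop1', 'val1');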
Maintaining a file (on the filesystem) containing the current state of a rapidly changing object is surprisingly difficult. Specifically, setting things up so some other program can read the file at any time is the hard part. Why? At any time the file may be in the process of being written, so the reader can get inconsistent results.
Here's an outline of a good way to do this.
1) Write the file less often than every time the state changes. Whenever the state changes, call updateFile(myObj). It sets a timer for, let's say, 500 ms, then writes the very latest state to the file when the timer expires. Something like this (not debugged):
let latestObj
let updateFileTimer = 0

function updateFile (myObj) {
  latestObj = myObj
  if (updateFileTimer === 0) {
    updateFileTimer = setTimeout (
      function () {
        /* write latestObj to the file */
        updateFileTimer = 0
      }, 500)
  }
}
This writes the latest state of your object to the file, but no more than every 500ms.
2) Inside that timeout function, write out a temporary file. When it's written, delete the existing file and rename the temp file to the existing file's name. Do all this asynchronously so the rest of your program won't have to wait for the filesystem. Your timeout function will look like this:
updateFileTimer = setTimeout (
  function () {
    /* write latestObj to the file */
    fs.writeFile("file.json.tmp",
      JSON.stringify(latestObj),
      function (err) {
        if (err) throw err;
        fs.unlink ( "file.json",
          function (err) {
            if (!err)
              fs.renameSync( "file.json.tmp", "file.json")
          } )
      } )
    updateFileTimer = 0
  }, 500)
There's one more thing to worry about. There's a brief period of time between the unlink and the renameSync operation where the "file.json" file does not exist in the file system. So, any program you write that READs "file.json" needs to try again if the file isn't found.
If you use Linux, macOS, FreeBSD, or another UNIX-derived operating system, this code will work well: those operating systems' file systems allow one program to unlink a file while another program is reading it. If you run it on a DOS-derived operating system like Windows, the unlink operation will fail while another program is reading the file.
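As an illustration of that reader-side retry, a small sketch (assuming a reasonably recent Node version for fs/promises and timers/promises; the file name and retry count are placeholders):
const fs = require('fs/promises');
const { setTimeout: delay } = require('timers/promises');

async function readState (retries = 5) {
  try {
    return JSON.parse(await fs.readFile('file.json', 'utf8'))
  } catch (err) {
    if (err.code === 'ENOENT' && retries > 0) {
      // the writer is between unlink and rename; wait briefly and try again
      await delay(50)
      return readState(retries - 1)
    }
    throw err
  }
}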

Is it possible in Node to read in a file and export its contents (without exporting asynchronously)?

TLDR: I want to read in a file's contents and then export a function which relies on those contents ... without making that exported function use promises or some other form of asynchronicity.
I'm trying to write an XML-validating module, and in order for it to do its thing I need to read in an XSD file. However, this only needs to happen once at "load time", so ideally I'd rather not have other modules that use my function wait for a promise to resolve to get their results. If I were using Webpack this would be easy, as I could use its text file loader to bring in the XSD as if it were any other module ... but unfortunately I'm not.
In other words, currently I have to do (borderline pseudo-code):
module.exports.validate = () =>
  new Promise((resolve) =>
    fs.readFile(path, (err, file) => {
      // use file to validate, then:
      resolve(validationResult);
    })
  );
and instead I'd like to do:
fs.readFile(path, (err, file) => {
  module.exports.validate = myValidationFunction;
});
But the above doesn't work because you can't export from callbacks, so my question is, is there any other way to accomplish this?
The https://github.com/jonschlinkert/to-exports library seems to offer exactly this, so it seems like it's possible ... but it doesn't work for me :(
P.S. At worst I could literally wrap the contents of the file in template string backticks, rename the file to .js, and export it that way:
module.exports = `*XSD contents go here*`;
However, that seems very kludgy, so I'm hoping there is a better way.
If you want to read a file synchronously, then use fs.readFileSync. It returns the contents of the file or throws an error.
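In this case that would mean reading the XSD once at module load time and exporting a plain synchronous function. A minimal sketch, where the schema path and the validateAgainstXsd helper are placeholders for the real validation logic:
const fs = require('fs');
const path = require('path');

// runs once, the first time this module is required
const xsd = fs.readFileSync(path.join(__dirname, 'schema.xsd'), 'utf8');

module.exports.validate = (xml) => validateAgainstXsd(xml, xsd);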
