NodeJS: Need idiomatic: Read files in dir, concatenate, transform, write - node.js

I'm trying to write a content manglement program in Node. I'm an old Ruby / Perl / Shell hand of many years, and I can't seem to get simple code that works in those languages to look similarly, eh, simple, in Node.
Task: Find all the *.md files, read them (in ls order), transform them, and bracket them with a header comment and a footer comment. The files, in order, contain pieces of Markdown that, when assembled and transformed, make a sensible HTML doc. Here's a shell implementation:
echo '<!-- Generate at:' $(date) ' -->' $(ls *.md |xargs cat|markdown)'<!-- Copyright Mumble-demo Inc. -->'
Produces the desired HTML:
<!-- Generate at: Tue Jun 6 08:25:59 EDT 2017 --> <h1>This is a Markdown File</h1> <h2>Heading 1</h2> <p>Inside of markdown we can create many interesting items</p> <ul> <li>such</li> <li>as</li> <li>lists</li> </ul><!-- Copyright Mumble-demo Inc. -->
Ruby is similarly reasonable...
#!/usr/bin/env ruby
require 'kramdown'
HEADER = "<!-- Generated at #{Time.now} -->\n"
FOOTER = "\n<!-- Copyright Mumble-demo Inc. -->"
OUTPUT = File.open("./output", "w")
results = Dir.glob("*.md").map { |f| File.open(f).readlines.join() }.reduce(:+)
OUTPUT.print(HEADER, Kramdown::Document.new(results).to_html, FOOTER)
But I can't figure out how to do this in Node in a Way That Feels Right(™)
A Way That Feels Wrong(™) is with the synchronous interfaces:
const fs = require("fs")
const marked = require("marked")
const HEADER = `<!-- Generated at ${new Date()} -->\n`
const FOOTER = `\n<!-- Copyright Mumble-demo Inc. -->`
fs.readdir(".", (err, files) => {
if (err) throw err;
let markdownFiles = files.filter((f) => f.endsWith(".md"))
let content = markdownFiles.reduce((memo, fileName) => {
return memo + fs.readFileSync(fileName, 'utf8')
}, "")
let contentString = [HEADER, marked(content), FOOTER].reduce((m, i) => m + i, "")
fs.writeFileSync("derp", contentString);
console.log(contentString);
})
A Way That Feels Right But That I Can't Get To Work(™) is:
Build read streams
Pipe them to markdown transform streams
Open an output stream and redirect transformed data to it
The great news is that this approach works right up until it comes time to put the header and footer comments in at the top and the bottom. They live in the code, not on the filesystem, so I can't "just add" them as another file to stream, sans transform, into the output stream. Most approaches wind up producing: header, footer, streamed data.
Obviously the pipe() work happens asynchronously, and the footer print fires before the read + transform work is done. I've tried horrible (and broken) Promise chains that ultimately did not work.
One alternate approach would be to turn the header and footer into streams (seems weird...) and flow them into the output stream as well (seems really weird).
I've stumped several seasoned developers with this...surely we're missing some common idiom here or is it actually this hard to do this simple task simply in Node?

Thoughts:
For most of my shell scripts, I simply read file contents into strings, synchronously. This approach doesn’t scale, but it usually doesn’t need to. And everything is so much easier with strings.
If you do anything asynchronous: Use async functions and util.promisify().
Long-term, asynchronous iteration and async generators will help with this kind of scenario, too.
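To make this concrete, here is a minimal sketch of the whole task using async functions and the promise-based fs API (fs.promises); it assumes the same marked package used elsewhere in this thread, and the output.html file name is just an example:

const fs = require('fs').promises;
const marked = require('marked');

const HEADER = `<!-- Generated at ${new Date()} -->\n`;
const FOOTER = `\n<!-- Copyright Mumble-demo Inc. -->`;

async function build() {
  // List the directory, keep only *.md files, and sort to approximate "ls order"
  const names = (await fs.readdir('.')).filter((f) => f.endsWith('.md')).sort();
  // Read every file, then join the pieces into one Markdown string
  const pieces = await Promise.all(names.map((f) => fs.readFile(f, 'utf8')));
  const html = HEADER + marked(pieces.join('')) + FOOTER;
  await fs.writeFile('output.html', html);
  console.log(html);
}

build().catch((err) => {
  console.error(err);
  process.exit(1);
});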

You can run it in "A Way That Feels Right" via the synchronous executor nsynjs. Your code might transform into something like this working example:
md-cat.js:
var nsynjs = require('nsynjs');
var nsynFs = require('../wrappers/nodeFs'); // part of nsynjs package, needs to be added manually

var synchronousCode = function(nsynFs) {
    var HEADER = "<!-- Generated at " + new Date() + " -->\n";
    var FOOTER = "\n<!-- Copyright Mumble-demo Inc. -->";

    var files = nsynFs.readdir(nsynjsCtx, ".").data;
    var content = "";
    for(var i = 0; i < files.length; i++) {
        var file = files[i];
        if(file.endsWith('.md'))
            content += nsynFs.readFile(nsynjsCtx, file, "utf8").data;
    }
    nsynFs.writeFile(nsynjsCtx, "derp", HEADER + content + FOOTER);
};

nsynjs.run(synchronousCode, {}, nsynFs, function () {
    console.log('synchronousCode done');
});
Even though it looks synchronous, it does not use any synchronous functions under the hood, so it will not block Node's event loop.

Try Gulp, which is the most idiomatic way nowadays.
If you can't or don't want to, use Promise chains; they feel like shell pipes.
#!/usr/bin/env node
'use strict';

const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs'));
const path = require('path');
const marked = require('marked');

const HEADER = `<!-- Generated at ${(new Date()).toISOString()} -->`;
const FOOTER = '<!-- Copyright Mumble-demo Inc. -->';

fs.readdirAsync(process.cwd())
  .map((fileName) => Promise.all([fileName, fs.statAsync(fileName)]))
  .filter(([fileName, stat]) => stat.isFile() && path.extname(fileName) === '.md')
  .call('sort', ([a], [b]) => a.localeCompare(b, 'en-US'))
  .map(([mdFileName]) => fs.readFileAsync(mdFileName, 'utf8'))
  .then((mdFiles) => {
    let out = [HEADER, marked(mdFiles.join('\n')), FOOTER].join('').replace(/\n/g, '');
    console.log(out);
    return fs.writeFileAsync('out.html', out);
  })
  .catch((err) => {
    console.error(err);
    process.exit(1);
  });
Thoughts:
Never write sync code in Node; you will always regret it.
Promise chains work best for such tasks.
stat.isFile() and sort() are just safety features that are missing in the bash and Ruby examples; removing them saves two lines of code.
Using Date.prototype.toString() should be considered a bug in most cases, because its output is unpredictable: it is platform- and locale-specific.
Node streams are overkill until you deal with huge files, which is usually not the case for markdown tasks.
Shell pipes don't use file system streams either; they load everything into memory, so efficiency is roughly the same.

Related

Redirect Readable object stdout process to file in node

I use an NPM library to parse markdown to HTML like this:
var Markdown = require('markdown-to-html').Markdown;
var md = new Markdown();
...
md.render('./test', opts, function(err) {
md.pipe(process.stdout)
});
This outputs the result to my terminal as intended.
However, I need the result inside the execution of my node program. I thought about writing the output stream to file and then reading it in at a later time but I can't figure out a way to write the output to a file instead.
I tried to play around with var file = fs.createWriteStream('./test.html'); but Node.js streams give me more headaches than results.
I've also looked into the library's repo and Markdown inherits from Readable via util like this:
var util = require('util');
var Readable = require('stream').Readable;
util.inherits(Markdown, Readable);
Any resources or advice would be highly appreciated. (I would also take another library for parsing the markdown, but this gave me the best results so far)
Actually creating a writable file-stream and piping the markdown to this stream should work just fine. Try it with:
const writeStream = fs.createWriteStream('./output.html');

md.render('./test', opts, function(err) {
  md.pipe(writeStream);
});

// in case of errors you should handle them
writeStream.on('error', function (err) {
  console.log(err);
});
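If you actually need the rendered HTML inside your program rather than in a file, you can skip the write stream entirely and collect the readable stream's chunks into a string. A minimal sketch, using the same md object and './test' input as above:

md.render('./test', opts, function(err) {
  if (err) throw err;
  let html = '';
  md.setEncoding('utf8');
  md.on('data', function (chunk) {
    html += chunk; // accumulate the stream's output as it arrives
  });
  md.on('end', function () {
    // html now holds the complete result, ready to use in your program
    console.log(html);
  });
});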

Use child_process#spawn with a generic string

I have a script in the form of a string that I would like to execute in a Node.js child process.
The data looks like this:
const script = {
  str: 'cd bar && fee fi fo fum',
  interpreter: 'zsh'
};
Normally, I could use
const exec = [script.str, '|', script.interpreter].join(' ');
const cp = require('child_process');
cp.exec(exec, function (err, stdout, stderr) {});
however, cp.exec buffers the stdout/stderr, and I would like to be able to stream stdout/stderr to wherever.
does anyone know if there is a way to use cp.spawn in some way with a generic string, in the same way you can use cp.exec? I would like to avoid writing the string to a temporary file and then executing the file with cp.spawn.
cp.spawn will work with a string but only if it has a predictable format - this is for a library so it needs to be extremely generic.
...I just thought of something, I am guessing the best way to do this is:
const n = cp.spawn(script.interpreter);
n.stdin.write(script.str); // <<< key part
n.stdout.setEncoding('utf8');
n.stdout.pipe(fs.createWriteStream('./wherever'));
I will try that out, but maybe someone has a better idea.
Ok figured this out.
I used the answer from this question:
Nodejs Child Process: write to stdin from an already initialised process
The following allows you to feed a generic string to a child process, using different shell interpreters. This example uses zsh, but you could use bash or sh or really any executable.
const cp = require('child_process');

const n = cp.spawn('zsh');

n.stdin.setEncoding('utf8');
n.stdin.write('echo "bar"\n'); // <<< key part, you must use newline char

n.stdout.setEncoding('utf8');
n.stdout.on('data', function (d) {
  console.log('data => ', d);
});
With Node.js as the interpreter, it's about the same, but it seems I need one extra call, n.stdin.end(), like so:
const cp = require('child_process');

const n = cp.spawn('node').on('error', function (e) {
  console.error(e.stack || e);
});

n.stdin.setEncoding('utf-8');
n.stdin.write("\n console.log(require('util').inspect({zim:'zam'}));\n\n"); // <<< key part
n.stdin.end(); // seems necessary to call .end()

n.stdout.setEncoding('utf8');
n.stdout.on('data', function (d) {
  console.log('data => ', d);
});
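An alternative that avoids writing to stdin altogether is to hand the whole string to the interpreter via its -c flag (zsh, bash, and sh all support it); cp.spawn still gives you streamable stdout/stderr. A sketch, reusing the script object from the question:

const cp = require('child_process');
const fs = require('fs');

const script = {
  str: 'cd bar && fee fi fo fum',
  interpreter: 'zsh'
};

// Pass the entire command string as a single -c argument
const n = cp.spawn(script.interpreter, ['-c', script.str]);

n.stdout.pipe(fs.createWriteStream('./wherever')); // stream stdout wherever you like
n.stderr.pipe(process.stderr);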

Creating multiple files from Vinyl stream with Through2

I've been trying to figure this out by myself, but had no success yet. I don't even know how to start researching for this (though I've tried some Google searchs already, to no avail), so I decided to ask this question here.
Is it possible to return multiple Vinyl files from a Through2 Object Stream?
My use case is this: I receive an HTML file via stream. I want to isolate two different sections of the file (using jQuery) and return them in two separate HTML files. I can do it with a single section (and a single resulting HTML file), but I have absolutely no idea how I would generate two different files.
Can anyone give me a hand here?
Thanks in advance.
The basic approach is something like this:
Create as many output files from your input file as you need using the clone() function.
Modify the .contents property of each file depending on what you want to do. Don't forget that this is a Buffer, not a String.
Modify the .path property of each file so your files don't overwrite each other. This is an absolute path so use something like path.parse() and path.join() to make things easier.
Call this.push() from within the through2 transform function for every file you have created.
Here's a quick example that splits a file test.txt into two equally large files test1.txt and test2.txt:
var gulp = require('gulp');
var through = require('through2').obj;
var path = require('path');

gulp.task('default', function () {
  return gulp.src('test.txt')
    .pipe(through(function (file, enc, cb) {
      var c = file.contents.toString();
      var f = path.parse(file.path);

      var file1 = file.clone();
      var file2 = file.clone();

      file1.contents = Buffer.from(c.substring(0, c.length / 2));
      file2.contents = Buffer.from(c.substring(c.length / 2));

      file1.path = path.join(f.dir, f.name + '1' + f.ext);
      file2.path = path.join(f.dir, f.name + '2' + f.ext);

      this.push(file1);
      this.push(file2);
      cb();
    }))
    .pipe(gulp.dest('out'));
});

Gulp: Passing through to a stream depending on the contents of a stream

I have the following simplified gulp task:
gulp.src(...)
  .pipe(stuff())
  .pipe(moreStuff())
  .pipe(imagemin())
  .pipe(yetMoreStuff());
I only want the imagemin stream to be called when the file path contains "xyz", but I want the other three streams to always be called.
Calling gulp.src() in another place is not appropriate; this example is massively simplified, and duplicating everything would be messy as hell.
So far, I've got this:
var through = require('through2');

gulp.src(...)
  .pipe(stuff())
  .pipe(moreStuff())
  .pipe(through.obj(function (file, enc, cb) {
    console.log(file.path.indexOf('hero') !== -1);
    // file has a pipe method but what do I do?!
  }))
  .pipe(yetMoreStuff());
Doesn't do anything. I don't know vinyl / streams well enough to be able to do this by myself :(
How do I do this?
It sounds like gulp-filter might be what you're looking for.
var Filter = require('gulp-filter');

var filter = Filter(['**xyz**']);

gulp.src(...)
  .pipe(stuff())
  .pipe(moreStuff())
  .pipe(filter)
  .pipe(imagemin())
  .pipe(filter.restore())
  .pipe(yetMoreStuff());
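Note that newer versions of gulp-filter (3.x and later) expect a { restore: true } option and expose restore as a stream property rather than a callable method, and the glob likely needs a path-style pattern. A sketch under those assumptions, keeping the placeholder pipes from the snippets above:

var Filter = require('gulp-filter');

// restore must be enabled explicitly in gulp-filter 3+
var filter = Filter(['**/*xyz*'], { restore: true });

gulp.src('src/**/*') // example glob; substitute your own source
  .pipe(stuff())
  .pipe(moreStuff())
  .pipe(filter) // only paths matching *xyz* continue from here
  .pipe(imagemin())
  .pipe(filter.restore) // a property in 3+, not a function call
  .pipe(yetMoreStuff());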

NodeJS: Asynchronous file read problems

New to NodeJS.
Yes, I know I could use a framework, but I want to get a good grok on it before delving into the myriad of fine tools that are out there.
my problem:
var img = fs.readFileSync(path);
the above works;
fs.readFile(path, function (err, data) {
  if (err) throw err;
  console.log(data);
});
the above doesn't work;
The input path is: 'C:\NodeSite\chrome.jpg'
Oh, and I'm working on Windows 7.
any help would be much appreciated.
Fixed
Late night/morning programming introduces errors that are hard to spot. The path was being set from two different places, and so the source paths were different in the two cases. Thank you for your help. I am a complete numpty. :)
If you are not setting an encoding when reading a file, you will get the binary content.
So, for example, the following snippet will output the content of the test file using UTF-8 encoding. If you don't specify an encoding, you will get a raw binary Buffer dumped on your console instead.
var fs = require('fs');

var path = "C:\\tmp\\testfile.txt";

fs.readFile(path, 'utf8', function (err, data) {
  if (err) throw err;
  console.log(data);
});
Another issue (especially on Windows-based OSes) can be the correct escaping of the target path. The above example shows how paths on Windows have to be escaped.
Java developers can simply use this synchronous JavaScript call, much as they would in plain Java, trouble-free:
var fs = require('fs');
var Contenu = fs.readFileSync(fILE_FULL_Name, 'utf8');
console.log(Contenu);
That should take care of small & big files.
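For genuinely big files, though, readFileSync pulls the entire file into memory; a read stream processes it in chunks instead. A minimal sketch (the path is just an example):

var fs = require('fs');

// Stream the file in chunks instead of loading it all at once
var stream = fs.createReadStream('C:\\tmp\\testfile.txt', 'utf8');

stream.on('data', function (chunk) {
  console.log(chunk); // each chunk is a piece of the file's content
});

stream.on('error', function (err) {
  console.error(err);
});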
