Node.js Web Server fs.createReadStream vs fs.readFile? - node.js

So I am writing my web server in pure Node.js, using only Bluebird for promisification. This has been bothering me for a week and I can't decide which one I should really use. I have read tons of posts, blogs and docs on these two topics; please answer based on your own working experience, thanks. Here goes the detailed sum-up and related questions.
Both approaches have been tested and both work great. But I can't test for performance: I only have my own basic website files (html, css, img, a small database, etc.), and I have never managed video files or huge databases.
Below are the relevant code parts, to give you the basic idea (if you already know which one to use, don't bother reading the code, to save you some time). This question is not about logic, so you can just read the parts between the dashed lines.
About fs.createReadStream:
Pros: good for huge files; it reads a chunk at a time, which saves memory, and pipe is really smart.
Cons: synchronous; can't be promisified (a stream is a different concept from a promise, too hard to do, not worth it).
//please ignore IP, it's just a custom namespace used for prototyping.
IP.read = function (fpath) {
    //----------------------------------------------------
    let file = fs.createReadStream(fpath);
    file.on('error', function () {
        return console.log('error on reading: ' + fpath);
    });
    return file;
    //----------------------------------------------------
};
//to set the response of onRequest(request, response) in http.createServer(onRequest).
IP.setResponse = function (fpath) {
    let ext = path.extname(fpath),
        data = IP.read(fpath);
    return function (resp) {
        //----------------------------------------------------
        //please ignore IP.setHeaders.
        resp.writeHead(200, IP.setHeaders(ext));
        data.pipe(resp).on('error', function (e) {
            console.log('error on piping ' + fpath);
        });
        //----------------------------------------------------
    };
};
About fs.readFile:
Pros: asynchronous and can easily be promisified, which makes the code really easy to write (develop) and read (maintain). Plus other benefits I haven't got my hands on yet, like data validation, security, etc.
Cons: bad for huge files.
IP.read = function (fpath) {
    //----------------------------------------------------
    // fs.readFileAsync comes from Bluebird's promisifyAll(fs).
    let file = fs.readFileAsync(fpath);
    return file;
    //----------------------------------------------------
};
//to set the response of onRequest(request, response) in http.createServer(onRequest).
IP.setResponse = function (fpath) {
    const ext = path.extname(fpath);
    return function (resp) {
        //----------------------------------------------------
        IP.read(fpath).then((data) => {
            resp.writeHead(200, IP.setHeaders(ext));
            resp.end(data);
        }).catch((e) => {
            console.log('Problem when reading: ' + fpath);
            console.log(e);
        });
        //----------------------------------------------------
    };
};
Here are my options:
• The easy way: Using fs.createReadStream for everything.
• The proper way: Using fs.createReadStream for huge files only.
• The practical way: Using fs.readFile for everything, until related problems occur, then handle those problems using fs.createReadStream.
My final decision is to use fs.createReadStream for huge files only (I will create a function just for huge files) and fs.readFile for everything else. Is this a good/proper decision? Any better suggestions?
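Here is a rough sketch of what I mean by that function (the 1 MB threshold, the IP.sendFile name, the headers parameter, and fs.statAsync/fs.readFileAsync via Bluebird's promisifyAll are just placeholder choices for this sketch, nothing is settled):
// Rough sketch of the hybrid read. fs.statAsync/fs.readFileAsync come from
// Bluebird's promisifyAll(fs); the 1 MB threshold is arbitrary.
const Promise = require('bluebird');
const fs = Promise.promisifyAll(require('fs'));

const HUGE = 1024 * 1024; // bytes

IP.sendFile = function (fpath, resp, headers) {
    return fs.statAsync(fpath).then(function (stats) {
        resp.writeHead(200, headers);
        if (stats.size > HUGE) {
            // huge file: stream it chunk by chunk
            fs.createReadStream(fpath)
                .on('error', function (e) {
                    console.log('error on streaming ' + fpath, e);
                    resp.end();
                })
                .pipe(resp);
        } else {
            // small file: read it into memory in one go
            return fs.readFileAsync(fpath).then(function (data) {
                resp.end(data);
            });
        }
    }).catch(function (e) {
        console.log('error on reading ' + fpath, e);
        resp.end();
    });
};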
P.S. (not important):
I really like building the infrastructure on my own. To give you an idea, when I instantiate a server I can just set the routes like this and customize whatever, however I want. Please don't suggest that I use a framework:
let routes = [
    {
        method: ['GET', 'POST'],
        uri: ['/', '/home', '/index.html'],
        handleReq: function () { return app.setResp(homeP); }
    },
    {
        method: 'GET',
        uri: '/main.css',
        handleReq: function () { return app.setResp(maincssP); }
    },
    {
        method: 'GET',
        uri: '/test-icon.svg',
        handleReq: function () { return app.setResp(svgP); }
    },
    {
        method: 'GET',
        uri: '/favicon.ico',
        handleReq: function () { return app.setResp(iconP); }
    }
];
Or I can customize it and put it in a config.json file like this:
{
    "routes": [
        {
            "method": ["GET", "POST"],
            "uri": ["/", "/home"],
            //I will create a function (handleReq) in my application to handle fpath
            "fpath": "./views/index.html"
        },
        {
            "method": "GET",
            "uri": "/main.css",
            "fpath": "./views/main.css"
        },
        {
            "method": "GET",
            "uri": "/test-icon.svg",
            "fpath": "./views/test-icon.svg"
        }
    ]
}
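(For illustration only, a rough sketch of the hypothetical loader I have in mind for turning such config entries into the route objects above; app.setResp taking an fpath is an assumption of this sketch, and the // comment would of course have to be stripped from the real config.json, since JSON itself has no comments:)
// Hypothetical loader sketch: build the routes array from config.json.
// Assumes app.setResp can take an fpath and that the real config.json
// contains no comments (plain JSON does not allow them).
const config = require('./config.json');

let routes = config.routes.map(function (r) {
    return {
        method: r.method,
        uri: r.uri,
        handleReq: function () { return app.setResp(r.fpath); }
    };
});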

Let's discuss the actual practical way.
You should not be serving static files from Node.js in production
createReadStream and readFile are both very useful. createReadStream is more efficient in most cases; consider it if you're processing a lot of files (rather than just serving them).
You should be serving static files from a static file server anyway. Most PaaS web hosts do this for you automatically, and if you set up an environment yourself you'll find yourself reverse-proxying Node behind something like IIS anyway, which should serve the static files.
This is only true for static files; again, if you read and transform them multiple times, your question becomes very relevant.
For other purposes, you can use fs.readFileAsync safely.
I use readFile a lot for reading files into buffers and working with them. createReadStream can improve latency, but overall you should get similar throughput, and the readFile API is easier to work with and more high-level.
So, in conclusion:
If you're serving static files and care about performance: don't do it from Node.js in production anyway.
If you're transforming files as streams and latency is important, use createReadStream.
Otherwise prefer readFile.
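As a side note on the "transforming files as streams" case: on Node 10+ you can get promise-friendly completion and centralized error handling for a pipe chain via stream.pipeline. A minimal sketch (the gzip step is only an illustrative transform, and sendCompressed is a made-up helper name):
// Minimal sketch: pipe a read stream through a transform with centralized
// error handling. stream.pipeline exists in Node 10+; the gzip step is
// only an illustrative transform.
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

function sendCompressed(fpath, resp) {
    resp.writeHead(200, { 'Content-Encoding': 'gzip' });
    return pipelineAsync(
        fs.createReadStream(fpath),
        zlib.createGzip(),
        resp
    ).catch(function (e) {
        // pipeline destroys every stream in the chain on failure,
        // so there is no manual cleanup to do here.
        console.log('error while streaming ' + fpath, e);
    });
}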

Related

How to process a large number of requests with Promise.all

I have about 5000 links and I need to crawl all of them. So I'm wondering whether there is a better approach than this. Here is my code.
const request = require('request');

let urls = [ 5000 urls go here ];

const doms = await getDoms(urls);
// processing and storing the doms

getDoms = async (urls) => {
    let data = await Promise.all(urls.map(url => {
        return getSiteCrawlPromise(url);
    }));
    return data;
};

getSiteCrawlPromise = (url) => {
    return new Promise((resolve, reject) => {
        let j = request.jar();
        request.get({ url: url, jar: j }, function (err, response, body) {
            if (err)
                return resolve({ body: null, jar: j, error: err });
            return resolve({ body: body, jar: j, error: null });
        });
    });
};
Is there a mechanism implemented in promises that can divide the jobs across multiple threads, process them, and then return the output as a whole?
And I don't want to divide the urls into smaller fragments and process those fragments.
The Promise object represents the eventual completion (or failure) of an asynchronous operation, and its resulting value.
There is no built-in mechanism in Promises to "divide jobs into multiple threads and process them". If you had to do that yourself, you would have to fragment the urls array into smaller arrays and queue the fragments onto separate crawler instances simultaneously.
But there is absolutely no need to go that way: since you're on Node.js, you can use node-crawler and its maxConnections option. That is what it was built for, and the end result is the same. You'll be crawling the urls over multiple concurrent connections, without wasting time and effort on manual chunking, juggling multiple crawler instances, or depending on any concurrency library.
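A rough sketch of what that looks like (based on node-crawler's documented maxConnections option and callback signature; collecting results into an array and listening for the drain event are my additions):
// Sketch based on node-crawler's documented options; collecting results
// into an array and the 'drain' listener are additions for illustration.
const Crawler = require('crawler');

const results = [];
const crawler = new Crawler({
    maxConnections: 10,            // crawl at most 10 urls at the same time
    callback: function (error, res, done) {
        results.push({ body: error ? null : res.body, error: error || null });
        done();                    // free this connection slot
    }
});

crawler.on('drain', function () {
    // fired when the queue is empty and all connections have finished
    console.log('crawled ' + results.length + ' pages');
});

crawler.queue(urls);               // urls is the array from the question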
There isn't such a mechanism built into JavaScript, at least not right now.
You can use a third-party Promise library that offers more features, like Bluebird, and make use of its concurrency feature:
const Promise = require('bluebird');

// Crawl all URLs, with 10 concurrent "threads".
Promise.map(arrayOfUrls, url => {
    return /* promise for crawling the url */;
}, { concurrency: 10 });
Another option is to use a dedicated throttling library (I highly recommend bottleneck), which lets you express any generic kind of rate limit. The syntax in that case would be similar to what you already have:
const Bottleneck = require('bottleneck');
const limit = new Bottleneck({ maxConcurrent: 10 });

const getSiteCrawlPromise = limit.wrap(url => {
    // the body of your getSiteCrawlPromise function, as normal
});
// getDoms stays exactly the same
You can solve this problem yourself, but bringing in one (or both!) of the libraries above will save you a lot of code.
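For completeness, if you would rather not add a dependency at all, a hand-rolled limiter is only a few lines. A sketch (mapWithLimit is a made-up helper name; the pool size of 10 mirrors the examples above):
// Hand-rolled concurrency limiter: run `mapper` over all items, keeping
// at most `limit` promises in flight at any time.
function mapWithLimit(items, limit, mapper) {
    const results = new Array(items.length);
    let next = 0;

    function worker() {
        if (next >= items.length) return Promise.resolve();
        const i = next++;
        return Promise.resolve(mapper(items[i])).then(function (value) {
            results[i] = value;
            return worker();       // pick up the next item once this one is done
        });
    }

    const workers = [];
    for (let k = 0; k < Math.min(limit, items.length); k++) {
        workers.push(worker());
    }
    return Promise.all(workers).then(function () { return results; });
}

// usage with the function from the question:
// mapWithLimit(urls, 10, getSiteCrawlPromise).then(doms => { /* ... */ });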

Another Node.js newb who doesn't get it

EDITED: adjusted my narrative and attempted to add output to the code as the examples show, but it doesn't work. What am I doing wrong?
Hi experts, or enthusiasts superior to myself,
The question is: how do I properly get the output of an asynchronous function in Node.js as a return value? The examples all talk of this mysterious callback function, but in the context of my code I don't see how it applies or gets implemented.
Yes, the question has been asked many times. If I'm asking it again, it is because the explanations provided didn't get this newb to an understanding. Yes, I spent nearly 24 hours trying to follow the examples, documentation, and other posts, but I didn't find one that explained it clearly enough for me to apply it to my code.
The concept of asynchronous code makes sense: the code runs, but the https call (in this case) hasn't finished yet. The code doesn't wait for the https call; you have to somehow grab the result after it has completed. While I haven't yet found the practicality of it, I'm sure I will as I continue to learn why Node.js is special in this way. Assuming my understanding is mostly right, my question is still the same: the concept is one thing, application and syntax are another.
This seems to be a common question and something nearly everyone new has trouble with.
Thus far none of the examples or explanations make clear how they apply to what I am working with. I understand there are additional modules that handle this differently, but I believe I won't understand the why/how as it applies unless I figure this out properly first.
As I am brand new to Node.js, feel free to expand on any aspect of my code, as I am eager to learn.
If anyone else finds this: the code gets data from the official Clash Royale API, for which you need to register your IP and get a token at https://developer.clashroyale.com.
app.js
require('dotenv').config();
var func = require('./functions.js');
console.log(func.uChests(process.env.MyPlayer)); //this should output the value
functions.js
require('dotenv').config();
//console.log('Loaded Functions')

module.exports.uChests = func_uChests;

//Clearly wrong application
//function func_uChests (playerID) {
function func_uChests (playerID, output) {
    //console.log('uChests')
    var http = require("https");

    var options = {
        "method": "GET",
        "hostname": "api.clashroyale.com",
        "port": null,
        "path": "/v1/players/%23" + playerID + "/upcomingchests",
        "headers": {
            "content-length": "0",
            "authorization": "Bearer " + process.env.Token,
            "accept": "application/json"
        }
    };

    var req = http.request(options, function (res) {
        var chunks = [];

        res.on("data", function (chunk) {
            chunks.push(chunk);
        });

        res.on("end", function () {
            var body = Buffer.concat(chunks);
            console.log(body.toString());
            /* example output
            {"items":[{"index":0,"name":"Magical Chest"},{"index":1,"name":"Silver Chest"},{"index":2,"name":"Silver Chest"},{"index":3,"name":"Golden Chest"},{"index":4,"name":"Silver Chest"},{"index":5,"name":"Silver Chest"},{"index":6,"name":"Silver Chest"},{"index":7,"name":"Golden Chest"},{"index":8,"name":"Silver Chest"},{"index":22,"name":"Legendary Chest"},{"index":40,"name":"Giant Chest"},{"index":76,"name":"Super Magical Chest"},{"index":77,"name":"Epic Chest"}]}
            */
        });
    });

    req.end();
}

//Clearly wrong application
function uChests(input, output) {
    func_uChests(input, output);
    console.log(output);
}
I think you should get a better feel for the async nature of Node. The only way to return a value to the calling code is through a callback parameter, or through async/await with the Promise API. Take a look below:
// return the value through a callback parameter
myAsyncFunction(function (value) {
    console.log(value);
});

// or using the Promise API (inside an async function)
let value = await myAsyncFunction();
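Applied to the code in the question, a minimal sketch of the Promise approach might look like this (uChestsAsync is a hypothetical name; the request options are the ones from your functions.js, and the JSON parsing is an addition):
// functions.js: resolve a Promise instead of only logging inside the callback.
require('dotenv').config();
const https = require('https');

function uChestsAsync(playerID) {
    const options = {
        method: 'GET',
        hostname: 'api.clashroyale.com',
        path: '/v1/players/%23' + playerID + '/upcomingchests',
        headers: {
            'authorization': 'Bearer ' + process.env.Token,
            'accept': 'application/json'
        }
    };
    return new Promise(function (resolve, reject) {
        const req = https.request(options, function (res) {
            const chunks = [];
            res.on('data', function (chunk) { chunks.push(chunk); });
            res.on('end', function () {
                try {
                    resolve(JSON.parse(Buffer.concat(chunks).toString()));
                } catch (e) {
                    reject(e);
                }
            });
        });
        req.on('error', reject);
        req.end();
    });
}

module.exports.uChests = uChestsAsync;

// app.js: the value only exists once the promise resolves, so log inside then():
// func.uChests(process.env.MyPlayer).then(chests => console.log(chests));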

Consecutive writings to several files in the server

(* As my previous question has been more or less answered, and the code has evolved, I am opening a new question. *)
I want to build a simple playground with the MEAN stack that mimics Plunker: a list of files and a textarea on the left-hand side, and a live preview on the right-hand side. Note that the files are saved in a temporary folder, and the live preview is an iframe fed by the files from that temporary folder.
********** what I have written **********
In the HTML file, I add one controller per file, so that I can track exactly which file is changed in the textarea. Then I only need to save that file to the server, rather than all the files:
<div ng-repeat="file in files track by $index" ng-controller="fileCtrl">
    <a ng-click="go(file)">{{file.name}}</a>
</div>
the controller:
app.controller('fileCtrl', ['$scope', 'codeService', function ($scope, codeService) {
    $scope.$watch('file', function () {
        codeService.render($scope.files, $scope.file);
    }, true);
}]);
the codeService:
app.service('codeService', ['$http', function ($http) {
    this.render = function (files, changedFile) {
        $http.post('/writeFile', changedFile);
    };
}]);
the router:
router.post('/writeFile', function (req, res, next) {
    var file = req.body;
    var fs = require('fs');
    fs.writeFile("public/tmp/" + file.name, file.body, function (err) {
        if (err) { return console.log(err); }
    });
});
********** my tests **********
My tests show that the modification in the textarea is caught correctly, and the modified file can more or less be saved to the server: it works quite well for the 1st and 2nd writes, but it often has trouble with the writes that follow.
********** my questions **********
Could anyone help me restructure the code to handle well:
• asynchronous writes, such that all the writes are carried out correctly (and fast);
• debounced writes, meaning we wait a little before each save. But it gets tricky with several files: suppose we can switch among files very fast, how would debouncing and asynchronous saving behave in that case?
1. Here is a Plunker http://plnkr.co/edit/NerwghZaqcRuoswHYPlq?p=preview where you can play with debounce and asynchronous functions.
2. If you switch between files very fast and type in different files, only the last result will be sent to the server.
3. Don't forget to handle errors in case the backend returns one (add a catch() after each $http call to the server).
4. On the Node.js side, please add a promise library to simplify the task. Personally I prefer Bluebird: http://bluebirdjs.com/docs/api/promise.all.html
// fs.writeFileAsync comes from Bluebird's promisifyAll:
var Promise = require('bluebird');
var fs = Promise.promisifyAll(require('fs'));

var files = [];
for (var i = 0; i < 100; ++i) {
    files.push(fs.writeFileAsync("file-" + i + ".txt", "", "utf-8"));
}

Promise.all(files).then(function () {
    console.log("all the files were created");
});
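For the debounce-per-file part of the question, a small sketch (one pending timer per file name; the 500 ms delay is arbitrary, and Angular's $timeout would be more idiomatic than setTimeout):
// Sketch: debounce the save per file, so fast switching between files does
// not drop pending writes for the previously edited file.
app.service('codeService', ['$http', function ($http) {
    var timers = {};   // one pending timer per file name

    this.render = function (files, changedFile) {
        var name = changedFile.name;
        if (timers[name]) clearTimeout(timers[name]);
        timers[name] = setTimeout(function () {
            delete timers[name];
            $http.post('/writeFile', changedFile)
                .catch(function (err) { console.log('save failed for ' + name, err); });
        }, 500);
    };
}]);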
Hope it helps

Node async vs sync

I am writing a Node server that reads/deletes/adds/etc. files on the filesystem. Is there any performance advantage to reading asynchronously? I can't do anything while waiting for the file to be read. Example:
deleteStructure: function(req, res) {
    var structure = req.param('structure');
    fs.unlink(structure, function(err) {
        if (err) return res.serverError(err);
        return res.ok();
    });
}
I am also making requests to another server using http.get. Is there any performance advantage to fetching asynchronously? I can't do anything while waiting for the file to be fetched. Example:
getStructure: function(req, res) {
    var structure = urls[req.param('structure')];
    http.get(structure).then(
        function (response) {
            return res.send(response);
        },
        function (err) {
            res.serverError(err);
        }
    );
}
If there is no performance advantage to reading files asynchronously, I can just use the synchronous methods. However, I am not aware of synchronous methods for HTTP calls; do any built-in ones exist?
FYI I am using Sails.js.
Thanks!
"I can't do anything while waiting for the file to be read."
"I can't do anything while waiting for the file to be fetched."
Wrong; you can handle an unrelated HTTP request while you wait.
Whenever your code is in the middle of a synchronous operation, your server will not respond to other requests at all.
This asynchronous scalability is the biggest attraction of Node.js.
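A quick way to see the difference for yourself, as a sketch ('./some-big-file' is a placeholder path): while /blocking is inside readFileSync, even a plain /ping request has to wait; with /non-blocking, /ping keeps being answered during the read.
// Sketch: compare a blocking and a non-blocking read in the same server.
const http = require('http');
const fs = require('fs');

http.createServer(function (req, res) {
    if (req.url === '/blocking') {
        const data = fs.readFileSync('./some-big-file'); // event loop blocked here
        res.end(data);
    } else if (req.url === '/non-blocking') {
        fs.readFile('./some-big-file', function (err, data) {
            if (err) return res.end('read error');
            res.end(data);                               // event loop stayed free meanwhile
        });
    } else {
        res.end('pong');                                 // e.g. /ping
    }
}).listen(3000);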

Node.js: detect when all events of a request have finished

Sorry if this question is simple, but I have been using Node.js for only a few days.
Basically I receive a JSON with some entries. I loop over these entries and launch an http request for each of them. Something like this:
for (var i in entries) {
    // Lots of stuff
    http.get(options, function (res) {
        // Parse response and detect if it was successful
    });
}
How can I detect when all the requests are done? I need this in order to call response.end().
Also, I need to report whether each entry succeeded or not. Should I use a global variable to save the result of each entry?
You can, for example, use caolan's "async" library:
async.map(entries, function (entry, cb) {
    http.get(options, function (res) {
        // call cb here; the first argument is the error (or null), the second one is the result
    });
}, function (err, res) {
    // this gets called when all requests are complete
    // res is an array with the results
});
There are many different libraries for that. I prefer the q and qq futures libraries to async, as async leads to forests of callbacks in complex scenarios. Yet another library is Step.
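For completeness, on newer Node versions you can also do this without any library by wrapping each request in a promise and using Promise.all. A sketch (fetchEntry and its status-code check are assumptions; entries, options and response come from the question's context):
// Sketch: wrap each request in a promise and wait for all of them.
// Resolving with { ok, error } instead of rejecting keeps Promise.all from
// bailing out on the first failed entry.
const http = require('http');

function fetchEntry(options) {
    return new Promise(function (resolve) {
        http.get(options, function (res) {
            res.resume();                                   // drain the body
            resolve({ ok: res.statusCode === 200, error: null });
        }).on('error', function (err) {
            resolve({ ok: false, error: err });
        });
    });
}

Promise.all(entries.map(function (entry) {
    // build per-entry options here if needed; the sketch reuses `options`
    return fetchEntry(options);
})).then(function (results) {
    // results[i] says whether entry i succeeded; no global variable needed
    response.end(JSON.stringify(results));
});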
