Concurrency with fs.writeFileSync using NodeJS and ExpressJS

I have the following code written with NodeJS and ExpressJS:
const express = require("express");
const fs = require("fs");
const bodyParser = require("body-parser");
const jsonParser = bodyParser.json();
const hostname = "127.0.0.1";
let port = 3001;
const app = express();
app.use(express.static(__dirname + "/answers"));
const answersPath = __dirname + "/answers/answers.json";
app.patch("/new/answer", jsonParser, function (req, res) {
try {
const questionId = req.body.questionId;
const answer = req.body.answer;
const answersJson = JSON.parse(fs.readFileSync(`${answersPath}`, "utf8"));
if (answersJson[questionId]) {
answersJson[questionId] = [...answersJson[questionId], answer];
} else {
answersJson[questionId] = [answer];
}
fs.writeFileSync(`${answersPath}`, JSON.stringify(answersJson));
res.sendStatus(200);
} catch (e) {
console.error(e);
res.sendStatus(500);
}
});
app.listen(port, hostname, () => {
console.log(`Server running at http://${hostname}:${port}/`);
});
What it basically does: it has an endpoint (/new/answer) that receives a JSON body containing a question ID and an answer.
If the question already exists in the answers.json file, the new answer is appended to that question's list of answers. If not, a new entry is created for the question with a list containing that answer.
Now, I've read the following article: https://www.geeksforgeeks.org/how-to-handle-concurrency-in-node-js/
From what I understood there, even if two clients call the endpoint at the same time, both answers will be saved one after the other (one request waits for the other), i.e. the file will not get overwritten.
So my question is: is this true? Does Node.js deal with concurrency on its own, or do I need to implement something to prevent this from happening?
Thank you, and sorry if this is a dumb question 😞.

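Although readFileSync() and writeFileSync() do achieve what you want here (because the whole read-modify-write in your handler is synchronous, Node.js runs it to completion before it starts handling the next request, so two simultaneous requests cannot overwrite each other's answers), you should avoid using synchronous functions in a Node.js server.
Synchronous functions block the entire Node.js process, not just a single Express route. This means your server becomes unresponsive to every client while it is reading or writing the file, and this becomes a real issue as the file gets bigger.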
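If you do switch to the asynchronous fs/promises API to avoid blocking, the concurrency question comes back: every `await` lets another request run, so two requests can both read the old file and the second write then discards the first answer. A minimal sketch, assuming a single Node.js process, is to push every read-modify-write through a promise chain so updates run one at a time. `serialize` and `addAnswer` are hypothetical names, not part of Express or fs.

```javascript
const fsp = require("fs/promises");

// Tail of the chain of pending updates. It never rejects, so a failed
// update does not block the ones queued after it.
let queue = Promise.resolve();

// Run `task` only after every previously queued task has settled.
function serialize(task) {
  const run = queue.then(task);
  queue = run.catch(() => {});
  return run;
}

// Hypothetical async version of the route's read-modify-write.
function addAnswer(filePath, questionId, answer) {
  return serialize(async () => {
    let answers = {};
    try {
      answers = JSON.parse(await fsp.readFile(filePath, "utf8"));
    } catch (err) {
      // A missing file means "no answers yet"; any other error is real.
      if (err.code !== "ENOENT") throw err;
    }
    answers[questionId] = [...(answers[questionId] || []), answer];
    await fsp.writeFile(filePath, JSON.stringify(answers));
  });
}

module.exports = { addAnswer, serialize };
```

In the route you would make the handler `async` and `await addAnswer(answersPath, req.body.questionId, req.body.answer)` before calling `res.sendStatus(200)`. This only serializes writes within one process; several processes or machines writing the same file would need a real lock or a database.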
Instead of using a file, you could keep the data only in memory. If you need to persist the data between server restarts, you can read it when the server starts and write it when the server stops. In this case it might be okay to use synchronous functions.
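A minimal sketch of the in-memory approach, assuming the whole answers object fits comfortably in memory. `createAnswerStore` and its methods are hypothetical names. Because `add` never touches the disk, concurrent requests simply take turns on the event loop.

```javascript
const fs = require("fs");

// Hypothetical helper: keeps answers in memory and persists them to `filePath`.
function createAnswerStore(filePath) {
  // Load once at startup; start empty if the file does not exist yet.
  let answers = {};
  if (fs.existsSync(filePath)) {
    answers = JSON.parse(fs.readFileSync(filePath, "utf8"));
  }

  return {
    // Pure in-memory update: no I/O, so it never waits on the disk.
    add(questionId, answer) {
      if (!answers[questionId]) answers[questionId] = [];
      answers[questionId].push(answer);
    },
    get(questionId) {
      return answers[questionId] || [];
    },
    // A synchronous write is acceptable here because it runs rarely (e.g. at shutdown).
    save() {
      fs.writeFileSync(filePath, JSON.stringify(answers));
    },
  };
}

module.exports = { createAnswerStore };
```

For example, call `const store = createAnswerStore(answersPath);` at startup and register `process.on('SIGINT', () => { store.save(); process.exit(0); });` to write the data on Ctrl+C. Answers added since the last save are lost if the process crashes, so you may also want to call `save()` periodically.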

Related

I don't see why I am getting ENOENT when I can see the file at the exact spot I am calling for it

I know this is very similar to other questions that have been asked about the same error. In the cases I have seen, though, the file name had been left off the URL. In my case (as far as I know) the URL is specified as it should be, and I can see the file on my localhost using other tools.
I have a need in a node.js app to perform I/O on json files without the benefit of using express routing. This is an API that has only one route (processor.js). It is accessed by a menu selection on the GUI by selecting 'Process'. From that point on everything happens within that route including multiple GETs/PUTs to json (for ids to data and then using the ids to get the data) and the building of SQL rows for populating SQL-Server Tables from the parsed json data. That, at least is the concept I am testing now. It is the hand I have been dealt, so I don't have other options.
I am using fs-extra rather than request or axios etc., because they all seem to expect express routes to accomplish the I/O. I appear to be able to directly read and write the json using fs-extra. I am using sequelize (or will be) for the SQL side.
That's the background.
Here is my processor.js (I am merely validating that I can in fact get idsList returned to me at this point):
'use strict';
// node_modules
const express = require('express');
const router = express.Router();
const fse = require('fs-extra')
// local modules
const idsList = require('../functions/getIds');
router.get('/', (req, res) => {
console.log(idsList);
});
module.exports = router;
Here is my getIds function:
'use strict';
// library modules
const express = require('express');
const router = express.Router();
const fse = require('fs-extra');
const uri = require('../uri');
// initialize general variables
let baseURL = `http://localhost:5000${uri}/`;
let idsID = 'ids.json';
const getIds = async () => {
let url = `${baseURL}${idsID}`;
try {
const idsList = await fse.readJson(url);
console.log('fse.readJson',idsList);
} catch (err) {
console.error(err);
}
}
module.exports = getIds();
And, here is my error, output to the console (it didn't format very well):
Listening on port 5000...
{ [Error: ENOENT: no such file or directory, open
'http://localhost:5000/Users/doug5solas/sandbox/libertyMutual/playground/api/ids.json']
errno: -2,
code: 'ENOENT',
syscall: 'open',
path:
'http://localhost:5000/Users/doug5solas/sandbox/libertyMutual/playground/api/ids.json' }
What am I missing?
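fs-extra (like Node's built-in fs) only works with files and directories on your local file system. It does not understand URLs: the string 'http://localhost:5000/Users/...' is treated as a relative file path, which does not exist, hence ENOENT. Pass the file's path on disk instead.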
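A sketch of reading ids.json from disk, assuming it sits in a directory you pass in (`dir` is a placeholder for wherever your uri module points). It uses the built-in fs/promises API; fs-extra's `fse.readJson(filePath)` would read and parse in one call. It also returns the parsed data and exports the function itself, because `module.exports = getIds()` in the question exports a Promise and the function never returns idsList.

```javascript
const fsp = require("fs/promises");
const path = require("path");

// Hypothetical replacement for getIds: read ids.json from a filesystem path,
// not from an http:// URL. `dir` is an assumed parameter.
async function getIds(dir) {
  const filePath = path.join(dir, "ids.json");
  const text = await fsp.readFile(filePath, "utf8");
  // Return the parsed data so callers can actually use it.
  return JSON.parse(text);
}

// Export the function, not the result of calling it.
module.exports = getIds;
```

A caller then awaits it, e.g. `router.get('/', async (req, res) => res.json(await getIds(dir)));`.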
If you want to read files hosted on another machine over HTTP, use an HTTP client such as axios.
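If ids.json really must come over HTTP (for example because the server publishes it with express.static), a sketch using only Node's built-in http module looks like this. `getJson` is a hypothetical helper name; axios, or the global fetch in Node 18+, would make it shorter.

```javascript
const http = require("http");

// Hypothetical helper: GET a URL and parse the response body as JSON.
function getJson(url) {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        if (res.statusCode !== 200) {
          res.resume(); // drain the body so the socket is released
          reject(new Error(`Request failed with status ${res.statusCode}`));
          return;
        }
        res.setEncoding("utf8");
        let body = "";
        res.on("data", (chunk) => (body += chunk));
        res.on("end", () => {
          try {
            resolve(JSON.parse(body));
          } catch (err) {
            reject(err); // body was not valid JSON
          }
        });
      })
      .on("error", reject); // network errors such as ECONNREFUSED
  });
}

module.exports = getJson;
```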
I moved away from fs-extra to fs.readFileSync and that solved the problem. It is not my preference, but it works, the file is small, and it is read only once.

Testing for server in Koa

I am using Koa for web development in Node.js. I have a server file that does nothing but start the server and initialise a few middlewares. Here is the sample code:
server.js
const Koa = require('koa');
var Router = require('koa-router');
var bodyParser = require('koa-bodyparser');
var app = new Koa();
var router = new Router();
app.use(bodyParser());
// AbcController and PqrController come from the project's controller modules (not shown)
router.post('/abc', AbcController.abcAction);
router.post('/pqr', PqrController.pqrAction);
app.use(router.routes());
app.listen(3000);
When we run npm start, the server starts on port 3000. Now I want to write unit test cases for this file using mocha, chai and sinon.
One way is to create a test file, let's say server_test.js, and do something like the following (just an example):
var server = require('./server');
server.close();
For this we need to add the following lines to server.js
var server = app.listen(3000);
module.exports = server;
Is this good practice? I think we should not expose the server in this fashion.
Since this file contains no functions of our own, is testing it really required?
Should we also exclude such files from SonarQube coverage?
Any better suggestion is always welcome. Need your help, guys. Thank you.
You can use chai-http to test the endpoints.
This is what I use in my project:
const chai = require('chai');
const chaiHttp = require('chai-http');
const expect = chai.expect;
const app = require('../app');
describe('/GET roles', function () {
it('should return bla bla bla',
function (done) {
chai.request(app)
.get('/roles')
.end(function (err, res) {
expect(res.status).eql(200)
expect(res.body).to.have.property('message').eql('Get role list success');
expect(res.body).to.have.property('roles');
expect(err).to.be.null;
done();
});
}
);
});
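On the question of exposing the server: a common alternative is to export the app (or a server factory) without binding a port, and let tests listen on a port of their own. This sketch uses Node's http module so it runs without Koa; `handler` is a hypothetical stand-in for Koa's `app.callback()`, which returns a plain (req, res) request listener, and the /abc response body is made up for illustration.

```javascript
const http = require("http");

// Stand-in for a Koa app's request listener (what `app.callback()` returns).
function handler(req, res) {
  res.setHeader("Content-Type", "application/json");
  if (req.method === "POST" && req.url === "/abc") {
    res.end(JSON.stringify({ route: "abc" }));
  } else {
    res.statusCode = 404;
    res.end(JSON.stringify({ error: "not found" }));
  }
}

// Build a server without binding a port, so each test picks its own.
function createServer() {
  return http.createServer(handler);
}

module.exports = { createServer, handler };
```

In server.js itself, `if (require.main === module) createServer().listen(3000);` keeps `npm start` working, while `require('./server')` from a test binds nothing. With chai-http, `chai.request(app.callback())` should work for a Koa app, since chai-http accepts a request-listener function as well as a server.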
There are primarily two ways to organize your test cases.
One is to put your test cases alongside your source files (in your case, server.spec.js). I personally prefer this way, as it encourages code modularity and makes your modules fully independent.
Another way is to create a separate directory, let's say test, where you put all your test cases following the same directory structure as the main application. This is useful when test cases only matter during development: at production time you can simply leave these files out.
Also, I usually prefer following the concepts of functional programming as it really helps to test each block of code independently.
Hope this helps

How to manage a background (scraping) process with node.js

I have an Express app that extracts data from plenty of websites. Currently I start this with a route (e.g. localhost/scrapdata) that gets the data and stores it in my PostgreSQL database. This task runs indefinitely.
I have other routes to get the data from my db.
Is it a good strategy to start my scraping task with a route, or are there other strategies?
This doesn't need to be an Express app, just a simple Node.js script that gets fired off at a specified interval. What you are looking for is cron.
If you want to keep your current Express app, then I recommend keeping its current structure but using something like node-schedule. In a separate file you could have something like:
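Cron (or any external scheduler) can launch such a script once per interval. If you would rather keep one long-running worker process outside Express, here is a minimal sketch of an interval loop; `runEvery` is a hypothetical helper and `task` stands for one scraping pass. It waits for each pass to finish before scheduling the next, so a slow scrape never overlaps the following one.

```javascript
// Hypothetical helper: run `task` repeatedly, `intervalMs` after each pass ends.
function runEvery(intervalMs, task) {
  let timer = null;
  let stopped = false;

  async function tick() {
    try {
      await task();
    } catch (err) {
      // Log and keep going; one failed pass should not kill the loop.
      console.error("task failed:", err);
    }
    // Schedule the next pass only after this one has finished.
    if (!stopped) timer = setTimeout(tick, intervalMs);
  }

  tick();

  // Returned function stops the loop; an in-flight pass is allowed to finish.
  return function stop() {
    stopped = true;
    clearTimeout(timer);
  };
}

module.exports = runEvery;
```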
// my-job.js
const schedule = require('node-schedule')
// '42 * * * *' is cron syntax for "at minute 42 of every hour"
module.exports = schedule.scheduleJob('42 * * * *', () => {
console.log('The answer to life, the universe, and everything!')
})
Then in your main app.js, just import the file to start the job:
const express = require('express')
...
require('./my-job')
Then in another route, such as /shutdown, you could do:
const express = require('express')
const j = require('./my-job')
const router = express.Router()
router.get('/shutdown', (req, res) => {
j.cancel()
res.json({ message: 'Canceled.' })
})
module.exports = router
This is just an idea, the above has not been tested.
Keep in mind, though, that scraping websites is a gray area. If a site offers an API, use that instead.

How do I overload the functionality of app.listen in expressjs

I've been trying to create (basically) a factory function that configures and builds an expressjs server for a dozen smaller specialized servers I have. For part of this I want to augment the listen function.
I would like to know the best way to go about this. I'm also looking for a reusable design choice here.
Server is created normally:
var httpServer = express();
...
Because of the way Express is designed (not sure if I am correct), I cannot access a {whatever}.prototype.listen, so I have come up with two approaches.
Using an additional variable in the current scope:
var oldListen = httpServer.listen;
httpServer.listen = function(callback){
...
return oldListen.call(httpServer, options.port, options.host, function(){
...
if ( typeof callback == 'function' ) callback();
});
};
This works and is fairly straightforward, but then I have a variable-hoisting wart. I also have a closure solution, but I think it may be too obtuse to be practical:
httpServer.listen = (function(superListen){
return function(callback){
...
return superListen.call(httpServer, options.port, options.host, function(){
...
if ( typeof callback == 'function' ) callback();
});
};
})(httpServer.listen);
Both examples are part of the factory context and I am intentionally reducing the arguments passed to the function.
Any help would be appreciated.
If you insist on "overloading", make sure you implement the original signature (such is the nature of overloading). Express's listen is just an alias for Node's internal http listen method:
server.listen(port, [host], [backlog], [callback]);
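To illustrate keeping the original signature, here is a sketch of a wrapper that only fills in defaults when listen() is called without a port and otherwise forwards every argument unchanged. `withDefaultListen` is a hypothetical name; it assumes `target` has a listen method with http.Server#listen semantics, which holds for an http.Server and for the app returned by express().

```javascript
const http = require("http");

// Hypothetical helper: listen() or listen(callback) uses `defaults`;
// every other call keeps the (port, [host], [backlog], [callback]) form.
function withDefaultListen(target, defaults) {
  const originalListen = target.listen;
  target.listen = function (...args) {
    if (args.length === 0 || typeof args[0] === "function") {
      // No port given: prepend the configured port and host.
      return originalListen.call(this, defaults.port, defaults.host, ...args);
    }
    // Any other form is forwarded untouched, so existing callers keep working.
    return originalListen.apply(this, args);
  };
  return target;
}

module.exports = withDefaultListen;
```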
UPDATE: Express even suggests using node's server API for custom implementations: http://expressjs.com/4x/api.html#app.listen
Otherwise, you should create your own custom listen method which would be defined like:
httpServer.myCustomListen = function (callback) {
return httpServer.listen.call(httpServer, options.port, options.host, callback);
}
The second option is your best bet, but in order for it to work, you must extend the Express library. Express is open source and hosted on GitHub. Fork it and modify it as you please. Periodically pull in new updates so you stay up to date with the core library. I do this all the time with Node modules.
There are two benefits from doing it this way:
You have complete control to customize the code however you see fit while staying up to date with the code written by the original authors.
If you find a bug or build a cool feature, you can submit a pull request to benefit the community at large.
You would first fork the repository, then grab the URL for your fork, clone it, and then add a reference to the original "upstream" repo:
git clone [url_to_your_fork]
cd express
git remote add upstream git@github.com:strongloop/express.git
Then you can push changes to your own repo (git push). If you want to get updates from the original repo, you can pull from the upstream repo: git pull upstream master.
If you want to add your custom fork of express as an npm module for a project, you would use the following:
npm install git+https://github.com/[your_user_name]/express.git --save
As Victor's answer pointed out, Express's prototype is in express/lib/application.js. That file is used to build Express and is exported via the application namespace in express/lib/express.js. Therefore, the .listen function can be referenced as express.application.listen.
One can then use this method (similar to Victor's):
var express = require('express');
express.application._listen = express.application.listen;
express.application.listen = function(callback) {
return this._listen(options.port, options.host, callback);
};
You can also use Lodash's _.wrap function if you don't want to store the base function in a variable yourself. It would look something like this:
var express = require('express');
var _ = require('lodash');
express.application.listen = _.wrap(express.application.listen, function(listenFn, callback) {
// _.wrap invokes this wrapper with the created function's `this` (the app), so pass it on
return listenFn.call(this, options.port, options.host, callback);
});
However, using these methods would run into the problems that you mentioned in your question (variable hoisting, creating an extra variable). To solve this, I would usually create my own subclass of express.application and replace the .listen function in that subclass and tell express to use that subclass instead. Due to express's current structure, however, you cannot replace express.application with your own subclass without overriding the express() function itself.
Hence, what I would do is to take over express.application.listen completely since it is only 2 lines. It is rather simple!
var express = require('express');
var http = require('http');
express.application.listen = function(callback) {
return http.createServer(this).listen(options.port, options.host, callback);
};
You can even make an https option!
var express = require('express');
var http = require('http');
var https = require('https');
express.application.listen = function(callback) {
return (options.https ? https.createServer({ ... }, this) : http.createServer(this))
.listen(options.port, options.host, callback);
};
Note: One of the other answers mentions forking express and modifying it. I would have a tough time justifying that for such a small function.
You should be able to easily overload the Express listen function. You can access it at the following object path: express.application.listen
So, you can implement something like this:
var express = require('express');
express.application.baseListen = express.application.listen;
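express.application.listen = function(port) {
console.log('Port is: ' + port);
// Forward all arguments and return the http.Server, like the original listen
return this.baseListen.apply(this, arguments);
};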
The implementation of the listen function is in the following path under the express module folder: node_modules\express\lib\application.js
Bind and listen for connections on the given host and port. This method is identical to node's http.Server#listen().
var express = require('express');
var app = express();
app.listen(3000);
The app returned by express() is in fact a JavaScript Function, designed to be passed to node's HTTP servers as a callback to handle requests. This allows you to provide both HTTP and HTTPS versions of your app with the same codebase easily, as the app does not inherit from these (it is simply a callback):
var express = require('express');
var https = require('https');
var http = require('http');
var app = express();
http.createServer(app).listen(80);
https.createServer(options, app).listen(443);
The app.listen() method is a convenience method for the following (if you wish to use HTTPS or provide both, use the technique above):
app.listen = function(){
var server = http.createServer(this);
return server.listen.apply(server, arguments);
};
Reference: http://expressjs.com/api.html
Hope this helps.
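Following the approach in the Express docs quoted above, a factory can create the server itself instead of patching app.listen. A sketch, where `startServer` and the `options.tls` field (https.createServer options such as key and cert) are hypothetical names; any request listener works as `app`, including an Express app.

```javascript
const http = require("http");
const https = require("https");

// Hypothetical factory: choose HTTP or HTTPS, then listen on the configured
// port and host. Returns the server, as app.listen() does.
function startServer(app, options, callback) {
  const server = options.tls
    ? https.createServer(options.tls, app) // options.tls: e.g. { key, cert }
    : http.createServer(app);
  return server.listen(options.port, options.host, callback);
}

module.exports = startServer;
```

Because nothing on express.application is modified, each of the smaller servers can be started with its own options without affecting the others.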

Nodejs blocks and can't process next requests during ZIP file streaming

I'm trying to create a simple app that creates a ZIP on the fly containing a few files and streams it to the client. It almost works. The problem I encountered is that Node.js blocks while streaming is in progress: all pending requests are processed only after the current stream finishes. But what is interesting is that these pending requests are then processed concurrently! So it can do this! I don't understand why Node.js (Express?) can't start processing the next request while streaming a file, or what I did wrong... :|
I'm using node-archiver as a stream source.
Here's the relevant part of my code:
var express = require('express');
var archiver = require('archiver');
var router = express.Router();
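router.get('/job/:id', function(req, res) {
var job = req.jobs[req.params.id];
var archive = archiver('zip');
// Set headers before any archive data is piped into the response
res.setHeader("Content-Type", "application/zip");
res.setHeader("Content-Disposition", 'attachment; filename="reports.zip"');
// Without an 'error' listener, an archiving error would crash the process
archive.on('error', function(err) {
res.destroy(err);
});
archive.pipe(res);
for (var n in job.files) {
var f = job.files[n];
archive.file(job.path + f, {name: f});
}
archive.finalize();
});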
module.exports = router;
Any advices? Thanks!
EDIT: I've noticed another problem, completely unrelated to archiver. I have the following basic app:
var http = require('http');
var fs = require('fs');
var server = http.createServer(function (req, res) {
var stream = fs.createReadStream('file.blob');
stream.pipe(res);
});
server.listen(31922);
Why does it get stuck in this case? The result is exactly the same as with archiver. The OS I use is SmartOS (based on OpenSolaris), which is a Unix. Maybe that is the problem? Any ideas?
For all those of you struggling with a similar problem: perhaps I was being dumb. The solution is simple: my testing method was wrong. My browser (and the others I tested behave similarly) blocks the next request to the same host name until the previous one has finished, and here the previous request had not finished because the download was still in progress. Thanks for your help!
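One way to confirm that diagnosis without a browser is to start several downloads from a Node client at once and check that they overlap. In this sketch (hypothetical names; a slow timer-driven response stands in for the ZIP stream, so it does not exercise archiver's own code) every download should begin before any of them finishes.

```javascript
const http = require("http");

// Server whose response trickles out `chunks` lines, one every `delayMs`.
function createSlowServer(chunks, delayMs) {
  return http.createServer((req, res) => {
    let sent = 0;
    const timer = setInterval(() => {
      res.write(`chunk ${sent}\n`);
      sent += 1;
      if (sent === chunks) {
        clearInterval(timer);
        res.end();
      }
    }, delayMs);
    // Stop writing if the client goes away mid-download.
    res.on("close", () => clearInterval(timer));
  });
}

// Start `count` downloads at once; record when each starts and ends.
function downloadConcurrently(port, count) {
  const jobs = Array.from({ length: count }, () =>
    new Promise((resolve, reject) => {
      http
        .get({ host: "127.0.0.1", port, path: "/", agent: false }, (res) => {
          const start = Date.now(); // response headers received
          res.resume(); // discard the body; only timing matters here
          res.on("end", () => resolve({ start, end: Date.now() }));
        })
        .on("error", reject);
    })
  );
  return Promise.all(jobs);
}

module.exports = { createSlowServer, downloadConcurrently };
```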
It seems that archiver depends on synchronous code; there is a recent issue open on GitHub addressing this:
https://github.com/ctalkington/node-archiver/issues/85
Given that it is synchronous, that's probably where your blocking is coming from.
Quote:
This module depends on file-utils which in the README says: "this is a
set of synchronous utility. As so, it should never be used on a
Node.js server. This is meant for users/command line utilities."
