Optimized way to access posted form data - NodeJS

In one of the older projects I saw two ways of handling form data.
In one of them it is done using EventEmitter methods, like this:
const http = require("http");
const { StringDecoder } = require("string_decoder");

http.createServer(function (req, res) {
  const decoder = new StringDecoder("utf-8");
  let buffer = "";

  req.on("data", function (chunk) {
    buffer += decoder.write(chunk);
  });

  req.on("end", function () {
    buffer += decoder.end();
    // Logic
  });
});
The second way (the Express way) of doing this is getting the parameters from the request's body:
app.post('/', function(req, res){
  const name = req.body.name;
});
As far as I understand, if the posted data is small we can fetch it from the body, and if it is large we can switch to buffering the chunks ourselves.
Is there any other good explanation for this?

Express is just a lib that wraps around the vanilla HTTP lib. If you read the expressjs source code, it's still the Node HTTP lib under the hood. When the body is too big, streams can be used to process the data continuously and efficiently.
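For illustration, a minimal sketch of the Express side, assuming Express 4.16+ where express.json() and express.urlencoded() are built in; without one of these body-parsing middlewares req.body is never populated, because they are what listen to the same "data"/"end" events and buffer the chunks for you:

const express = require("express");
const app = express();

// These middlewares consume the request stream (the same "data"/"end"
// events as the vanilla example) and buffer it before filling req.body.
app.use(express.json());
app.use(express.urlencoded({ extended: true }));

app.post("/", (req, res) => {
  // By the time this handler runs, the whole body has already been read.
  res.send(`Hello, ${req.body.name}`);
});

app.listen(3000);

For very large uploads you would skip the buffering middleware for that route and consume the request stream directly, e.g. req.pipe(someWritable).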

Related

How to determine the time Node.js spends to send an HTTP response body?

My current setup involves a Node.js web application using Express.js.
I am using DataDog's dd-tracer to measure the time Node.js spends for particular method invocations as part of my APM solution.
I would like to know if it is possible to measure the portion of time an incoming HTTP request is busy sending data back to the client as HTTP response body.
Are there any pitfalls or inaccuracies involved when trying to do this kind of instrumentation?
Does anybody know why this is not measured by APM client libraries by default?
I would like to know if it is possible to measure the portion of time an incoming HTTP request is busy sending data back to the client as HTTP response body.
You could wrap calls to res.write manually to create additional spans in the request trace. I would only recommend this if there are not many calls to the method within a request, and otherwise I would recommend to capture just a metric instead.
Alternatively, profiling might be an option that would give you a lot more information about exactly what is taking time within the res.write calls.
I look for a "global" solution which can be integrated into a Nest.js application without instrumenting each call to res.write manually.
As described above, you can simply wrap res.write directly at the start of every request. Using the tracer, this can be achieved like this:
res.write = tracer.wrap('http.write', res.write)
This should be done before any other middleware has the chance to write data.
Example middleware:
app.use((req, res, next) => {
  res.write = tracer.wrap('http.write', res.write)
  next()
})
Are there any pitfalls or inaccuracies involved when trying to do this kind of instrumentation?
Nothing major that I can think of.
Does anybody know why this is not measured by APM client libraries by default?
The main issue for doing this out of the box is that creating a span for every call to res.write may be expensive if there are too many calls. If you think it would make sense to have an option to do this out of the box, we can definitely consider adding that.
Hope this helps!
It depends on whether you want the response time for each call or you want to gather statistics about the response time.
For the former, to get the response time in the header of the response for each request, you can use the response-time package: https://github.com/expressjs/response-time
This will add a value to the response header (by default X-Response-Time) holding the elapsed time from when a request enters the middleware to when the headers are written out.
var express = require('express')
var responseTime = require('response-time')

var app = express()

app.use(responseTime())

app.get('/', function (req, res) {
  res.send('hello, world!')
})
If you want a more complete solution that gathers statistics including the response time, you can use the express-node-metrics package: https://www.npmjs.com/package/express-node-metrics
var metricsMiddleware = require('express-node-metrics').middleware;

app.use(metricsMiddleware);

app.get('/users', function(req, res, next) {
  //Do Something
})

app.listen(3000);
You can expose and access these statistics like this:
'use strict'

var express = require("express");
var router = express.Router();
var metrics = require('express-node-metrics').metrics;

router.get('/', function (req, res) {
  res.send(metrics.getAll(req.query.reset));
});
router.get('/process', function (req, res) {
  res.send(metrics.processMetrics(req.query.reset));
});
router.get('/internal', function (req, res) {
  res.send(metrics.internalMetrics(req.query.reset));
});
router.get('/api', function (req, res) {
  res.send(metrics.apiMetrics(req.query.reset));
});
First of all, I should state that I don't know dd-tracer, but I can try to provide a way to get the requested timing; it is then up to the developer to use it as needed.
The main inaccuracy that comes to mind is that every OS has its own TCP stack and writing to a TCP socket is a buffered operation: for response bodies smaller than the OS TCP buffer we will probably measure a time close to 0, and the result is also influenced by the Node.js event loop load. The larger the response body becomes, the more negligible the event-loop-related time becomes. So, if we measure the write time for all requests in order to have a single collection point, but do our analysis only on long-running requests, I think the measurement will be quite accurate.
Another possible source of inaccuracy is how the request handlers write their output: if a handler writes part of the body, then performs a long-running operation to compute the last part, and only then writes the missing part, the measured time is influenced by that long-running computation; we should take care that all request handlers write headers and body all at once.
My proposed solution (which works only if the server does not implement keep-alive) is to add a middleware like this:
app.use((req, res, next) => {
  let start;
  const { write } = res.socket;

  // Wrap only the first write call
  // (not an arrow function, so we keep access to `arguments`)
  res.socket.write = function() {
    // Immediately restore the write property so subsequent calls are not wrapped
    res.socket.write = write;
    // Take the start time
    start = new Date().getTime();
    // Actually perform the first write, preserving the return value (backpressure flag)
    return write.apply(res.socket, arguments);
  };

  res.socket.on("close", () => {
    // Compute the elapsed time
    const result = new Date().getTime() - start;
    // Handle the result as needed
    console.log("elapsed", result);
  });

  next();
});
Hope this helps.
You can start a timer before res.end; any code after res.end should run after it has finished, so stop the timer right after the res.end call. Don't quote me on that, though.
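A minimal sketch of that idea as an Express middleware (the wrapping approach, the logging, and the use of process.hrtime.bigint() are assumptions; note that this only measures handing the data over to the socket, not delivery to the client):

app.use((req, res, next) => {
  const originalEnd = res.end;
  res.end = function (...args) {
    const start = process.hrtime.bigint();
    const returnValue = originalEnd.apply(this, args); // hand the remaining data to the socket
    const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
    console.log(`res.end for ${req.originalUrl} took ${elapsedMs.toFixed(3)} ms`);
    return returnValue;
  };
  next();
});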

Node, Express, and parsing streamed JSON in endpoint without blocking thread

I'd like to provide an endpoint in my API to allow third-parties to send large batches of JSON data. I'm free to define the format of the JSON objects, but my initial thought is a simple array of objects:
{[{"id":1, "name":"Larry"}, {"id":2, "name":"Curly"}, {"id":3, "name":"Moe"}]}
As there could be any number of these objects in the array, I'd need to stream this data in, read each of these objects as they're streamed in, and persist them somewhere.
TL;DR: Stream a large array of JSON objects from the body of an Express POST request.
It's easy to get the most basic examples out there working, as all of them seem to demonstrate this idea using "fs" and working with the filesystem.
What I've been struggling with is the Express implementation of this. At this point, I think I've got this working using the "stream-json" package:
const express = require("express");
const router = express.Router();
const StreamArray = require("stream-json/streamers/StreamArray");

router.post("/filestream", (req, res, next) => {
  const stream = StreamArray.withParser();
  req.pipe(stream).on("data", ({key, value}) => {
    console.log(key, value);
  }).on("finish", () => {
    console.log("FINISH!");
  }).on("error", e => {
    console.log("Stream error :(");
  });
  res.status(200).send("Finished successfully!");
});
I end up with a proper readout of each object as it's parsed by stream-json. The problem seems to be with the thread getting blocked while the processing is happening. I can hit this once and immediately get the 200 response, but a second hit blocks the thread until the first batch finishes, while the second also begins.
Is there any way to do something like this w/o spawning a child process, or something like that? I'm unsure what to do with this, so that the endpoint can continue to receive requests while streaming/parsing the individual JSON objects.

How do you read a stream in a middleware and still be streamable in next middleware

I'm using a proxy middleware to forward multipart data to a different endpoint. I would like to get some information from the stream using previous middleware, and still have the stream readable for the proxy middleware that follows. Is there stream pattern that allows me to do this?
function preMiddleware(req, res, next) {
  req.rawBody = '';
  req.on('data', function(chunk) {
    req.rawBody += chunk;
  });
  req.on('end', () => {
    next();
  });
}

function proxyMiddleware(req, res, next) {
  console.log(req.rawBody);
  console.log(req.readable); // false
}

app.use('/cfs', preMiddleware, proxyMiddleware);
I want to access the name value of <input name="fee" type='file' /> before sending the streamed data to the external endpoint. I think I need to do this because the endpoint parses fee into the final url, and I would like to have a handle for doing some post processing. I'm open to alternative patterns to resolve this.
I don't think there is any mechanism for peeking into a stream without actually permanently removing data from the stream or any mechanism for "unreading" data from a stream to put it back into the stream.
As such, I can think of a few possible ideas:
Read the data you want from the stream and then send the data to the final endpoint manually (not using your proxy code that expects the readable stream).
Read the stream, get the data you want out of it, then create a new readable stream, put the data you read into that readable stream, and pass that readable stream on to the proxy (see the sketch after this list). Exactly how to pass it on to the proxy will need some looking into the proxy code. You might have to make a new req object that is the new stream.
Create a stream transform that lets you read the stream (potentially even modifying it) while creating a new stream that can be fed to the proxy.
Register your own data event handler, then pause the stream (registering a data event automatically triggers the stream to flow and you don't want it to flow yet), then call next() right away. I think this will allow you to "see" a copy of all the data as it goes by when the proxy middleware reads the stream, as there will just be multiple data event handlers, one for your middleware and one for the proxy middleware. This is a theoretical idea - I haven't yet tried it.
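A minimal sketch of the second idea, assuming Node 12+ for Readable.from; the bodyStream property name is made up here, and the proxy middleware would have to be pointed at it instead of the original req:

const { Readable } = require("stream");

function preMiddleware(req, res, next) {
  const chunks = [];
  req.on("data", (chunk) => chunks.push(chunk));
  req.on("end", () => {
    const body = Buffer.concat(chunks);
    // Inspect the raw body here, e.g. look for the "fee" field name.
    req.rawBody = body.toString();
    // A fresh readable stream carrying the same bytes, for the proxy to consume.
    req.bodyStream = Readable.from(body);
    next();
  });
}

The trade-off is that the whole body is held in memory, which is fine for small multipart uploads but not for large files.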
You would need to be able to send a single stream in two different directions, which is not gonna be easy if you try it on your own - luckily I wrote a helpful module back in the day, rereadable-stream, that you could use, and I'll use scramjet for finding the data you're interested in.
I assume your data will be a multipart-boundary:
const {StringStream} = require('scramjet');
const {ReReadable} = require("rereadable-stream");

// I will use a single middleware, since express does not allow to pass an altered request object to next()
app.use('/cfs', (req, res, next) => {
  const buffered = req.pipe(new ReReadable());  // pipe through ReReadable so the stream can be rewound later
  let file = '';

  buffered.pipe(new StringStream())             // pipe to a StringStream
    .lines('\n')                                // split request by line
    .filter(x => x.startsWith('Content-Disposition: form-data;'))
                                                // find form-data lines
    .parse(x => x.split(/;\s*/).slice(1).reduce((a, y) => {  // split values, skipping the header name itself
      const z = y.split('=');                   // split value name from quoted value
      a[z[0]] = JSON.parse(z[1]);               // assign to accumulator (values are quoted, JSON.parse unquotes them)
      return a;
    }, {}))
    .until(x => x.name === 'fee' && (file = x.filename, 1))
                                                // run the stream until the filename is found
    .run()
    .then(() => uploadFileToProxy(file, buffered.rewind(), res, next));
                                                // upload the file using your method
});
You'll probably need to adapt this a little to make it work in real world scenario. Let me know if you get stuck or there's something to fix in the above answer.

Node/Express, How do I modify static files but still have access to req.params?

I'm new to node/express, so there's (hopefully) an obvious answer that I'm missing.
There's a middleware for transforming static content: https://www.npmjs.com/package/connect-static-transform/. The transformation function looks like:
transform: function (path, text, send) {
  send(text.toUpperCase(), {'Content-Type': 'text/plain'});
}
So, that's great for transforming the content before serving, but it doesn't let me look at query parameters.
This answer shows how to do it: Connect or Express middleware to modify the response.body:
function modify(req, res, next){
  res.body = res.body + "modified";
  next();
}
But I can't figure out how to get it to run with static file content. When I run it res.body is undefined.
Is there some way to get a middleware to run after express.static?
My use case is that I want to serve files from disk making a small substitution of some text based on the value of a query parameter. This would be easy with server-side templating, like Flask. But I want the user to be able to do a simple npm-install and start up a tiny server to do this. Since I'm new to node and express, I wanted to save myself the bother of reading the url, locating the file on disk and reading it. But it's becoming clear that I wasted much more time trying this approach.
The answer appears to be "There is no answer." (As suggested by Pomax in the comment.) This is really annoying. It didn't take me too long to figure out how to serve and transform files myself, but now I'm having to figure out error handling. A million people have already written this code.
You can create middleware that only does transformation of body chunks as they are written with res.write or res.end or whatever.
For example:
const CSRF_RE = /<meta name="csrf-token" content="(.*)"([^>]*)?>/

function transformMiddleware (req, res, next) {
  const _write = res.write
  res.write = function(chunk, encoding) {
    if (chunk.toString().indexOf('<meta name="csrf-token"') === -1) {
      _write.call(res, chunk, encoding)
    } else {
      const newChunk = chunk.toString().replace(CSRF_RE, `<meta name="csrf-token" content="${req.csrfToken()}">`)
      _write.call(res, newChunk, encoding)
    }
  }
  next()
}
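A hedged usage sketch: registration order matters, because the wrapped res.write must be in place before express.static streams the file. The "public" directory is an arbitrary example, and req.csrfToken() in the middleware above additionally assumes the csurf middleware is registered earlier; for the original question's use case, the replacement inside res.write could read req.query instead.

const express = require("express");
const app = express();

// Wrap res.write first, then let the static handler stream files through it.
app.use(transformMiddleware);
app.use(express.static("public"));

app.listen(3000);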

Multiple clients posting data in node js

I've read that in Node.js one should treat POST requests carefully because the POST data may arrive in chunks, so it has to be handled like this, concatenating:
function handleRequest(request, response) {
  if (request.method == 'POST') {
    var body = '';
    request.on('data', function (data) {
      body += data;
    });
    request.on('end', function () {
      // data is complete here
    });
  }
}
What I don't understand is how this code snippet will handle several clients at the same time. Let's say two separate clients start uploading large POST data. They will be added to the same body, mixing up the data...
Or is it the framework which will handle this? Triggering different instances of handleRequest function so that they do not get mixed up in the body variable?
Thanks.
Given the request, response signature of your method, it looks like that's a listener for the request event.
Assuming that's correct, then this event is emitted for every new request, so as long as you are only concatenating new data to a body object that is unique to that handler (as in your current example), you're good to go.
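A minimal self-contained sketch to make the scoping concrete (the port number and response text are arbitrary): every request event invokes handleRequest again, so each client gets its own body variable in its own closure.

const http = require("http");

function handleRequest(request, response) {
  if (request.method === "POST") {
    let body = ""; // a fresh variable per invocation, i.e. per request
    request.on("data", (data) => {
      body += data;
    });
    request.on("end", () => {
      response.end(`Received ${body.length} bytes`);
    });
  } else {
    response.end();
  }
}

// Two clients uploading concurrently trigger two separate invocations,
// so their chunks accumulate into two separate `body` variables.
http.createServer(handleRequest).listen(3000);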
