How can I thread image processing in NodeJS?


I am writing an image processing web service in NodeJS and basically need to figure out how to make the actual image processing threaded. When I load-test it with Apache AB, NodeJS uses only one core and stalls, so I am surely doing something wrong here. How can I redesign this simple app to make use of the multiple cores on my server's CPU?
I stripped out all input filtering and simplified the image processing function to give you an idea of the app structure without long code listings.
In app.js
app.get('/generate/:q', function(req, res){
  var textString = req.params.q;
  res.setHeader("Content-Type", "image/png");
  res.end(generator.s());
});
In generate.js
var Canvas = require('canvas')
  , Font = Canvas.Font
  , fs = require('fs')
  , path = require("path")
  , plate = new Canvas.Image;
//To keep plate in RAM between requests.
fs.readFile(__dirname + '/plates/dhw132.png', function(err, squid){
  if (err) throw err;
  plate.src = squid;
});
exports.s = function () {
  // Declared with var so they do not leak into the global scope.
  var canvas = new Canvas(731,1024);
  var ctx = canvas.getContext('2d');
  ctx.drawImage(plate, 0, 0, plate.width, plate.height);
  return canvas.toBuffer();
}
How can I rewrite this to make generator.s() threaded?

Node is single-threaded, of course, but the need for multi-threading is a very valid use case. There are two ways I'm aware of.
Why not use cluster? You get a configurable number of worker processes (processes, not threads), typically one per CPU core on your machine. Cluster essentially load-balances your application across those processes and transparently lets them share the single listening HTTP port.
http://nodejs.org/api/cluster.html
There's a wrapper for it as well here: https://github.com/dpweb/cluster-node
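As a rough illustration of the cluster approach, here is a minimal sketch using Node's built-in cluster module. The helper names runClustered and pickWorkerCount are made up for this example, and restarting a worker when one exits is an assumption about what you might want, not something the module does for you.

```javascript
const cluster = require('cluster');
const os = require('os');

// Hypothetical helper: use the requested worker count if it is a positive
// integer, otherwise fall back to the number of CPU cores.
function pickWorkerCount(requested, cpuCount) {
  return Number.isInteger(requested) && requested > 0 ? requested : cpuCount;
}

// Hypothetical helper: the primary process forks the workers, and every
// worker calls startServer(), so all workers share one listening port.
function runClustered(startServer, requestedWorkers) {
  // cluster.isPrimary exists on newer Node versions; older ones only have isMaster.
  const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
  if (!isPrimary) {
    startServer();
    return;
  }
  const count = pickWorkerCount(requestedWorkers, os.cpus().length);
  for (let i = 0; i < count; i++) {
    cluster.fork();
  }
  // Assumption: replace a worker that dies so capacity stays constant.
  cluster.on('exit', function (worker) {
    console.log('worker ' + worker.process.pid + ' exited, starting a replacement');
    cluster.fork();
  });
}
```

With the app from the question, the call would look something like runClustered(function () { app.listen(3000); }), where the port number is only a placeholder. Each request still blocks the worker that handles it while the canvas is drawn, but other workers keep accepting requests on the remaining cores.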
A different option: you could fork a child process directly. Here's an example where an uploaded file gets converted to MP3 using lame. For your case, you would have to encapsulate all the image processing in a separate app, so the cluster option may be cleaner than doing that.
// Assumption: ef was defined elsewhere in the original app; a no-op callback is used here.
var ef = function(){};
app.post('/process', function(req, res){
  var f = req.files.filen;
  fs.rename(f.path, f.name, function(err) {
    if (err){
      fs.unlink(f.name, ef);
      throw err;
    }
    fs.unlink(f.path, function() {
      var ext = "." + req.body.extn;
      // Note: f.name comes from the client; sanitize it before passing it to a shell.
      require('child_process').exec("lame "+f.name, function(err, out, er) {
        var nfn = f.name + '.mp3';
        res.setHeader('Content-Type', 'application/octet-stream');
        res.setHeader('Content-disposition', 'attachment; filename='+nfn);
        // setHeader takes the header name and the value as separate arguments.
        res.setHeader('Content-Transfer-Encoding', 'binary');
        res.setHeader('Accept-Ranges', 'bytes');
        var size = fs.statSync(nfn).size;
        console.log(size, f.name, nfn)
        res.setHeader('Content-Length', size);
        fs.createReadStream(nfn).pipe(res);
        fs.unlink(nfn, ef); fs.unlink(f.name, ef);
      })
    })
  })
})

Related

Opening Maxmind db in Nodejs

I am trying to open the MaxMind open-source database in my Node.js application. My application receives a list of IP addresses from a Java application and then returns the latitude and longitude corresponding to each IP. I have successfully done this synchronously, but I want to do it asynchronously to make things a little faster. I have written code for this, but the application gets killed every time. I am guessing that the reason might be opening the same database simultaneously (I might be wrong :D). I am posting the code below. Please take a look at it and suggest where I am going wrong. Thanks!
app.post('/endPoint', function(req, res){
  var obj = req.body;
  var list = [];
  var ipList = obj.ipList;
  for(var i = 0; i<ipList.length; i++){
    var ip = ipList[i];
    //console.log(i);
    maxmind.open('./GeoLite2-City.mmdb', function(err, cityLookup){
      if(err) throw err;
      console.log("open database");
      var city = cityLookup.get(ip);
      if(city!=null){
        var cordinates = {'latitude': city.location.latitude, 'longitude': geodata.location.longitude};
        //console.log(cordinates);
        list.push(cordinates);
      }
      if(list.length == ipList.length){
        res.json({finalMap: list});
      }
    });
  }
});

You should open the database only once, and reuse it.
The easiest solution would be to synchronously open the database at the top of your file:
const maxmind = require('maxmind');
const cityLookup = maxmind.openSync('./GeoLite2-City.mmdb');
Reading it asynchronously wouldn't speed things up a whole lot, and because loading the database is done only once (during app startup), I don't think it's a big deal that it may temporarily block the event loop for a few seconds.
Then use the cityLookup object in your request handler:
app.post('/endPoint', function(req, res) {
  ...
  let city = cityLookup.get(ip);
  ...
});
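To make the "open once, reuse everywhere" idea concrete, here is a small sketch of the lookup loop as a plain function. The name lookupCoordinates is invented for this example, and cityLookup is assumed to be any object with a get(ip) method, such as the reader opened once at startup.

```javascript
// Hypothetical helper: maps a list of IPs to coordinates using an
// already-open lookup object (anything that exposes get(ip)).
function lookupCoordinates(cityLookup, ipList) {
  const list = [];
  for (const ip of ipList) {
    const city = cityLookup.get(ip);
    // Skip addresses that are unknown or have no location data.
    if (city && city.location) {
      list.push({
        latitude: city.location.latitude,
        longitude: city.location.longitude
      });
    }
  }
  return list;
}
```

Inside the handler this would be used roughly as res.json({finalMap: lookupCoordinates(cityLookup, req.body.ipList)}). Because the lookups are synchronous, the response is always sent, even when some addresses are missing from the database. The original check list.length == ipList.length would never fire in that case.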

Getting a progress for an FTP-upload with node

I have an Electron app which uploads a dropped file to a predefined server with node-ftp. The upload works like a charm, but despite reading a couple of suggestions I cannot figure out how to get information on the actual progress for a progress-bar.
My upload-code so far:
var ftp = new Client();
let uploadfile = fs.createReadStream(f.path);
let newname = uuid(); //some function I use for renaming
ftp.on('ready', function () {
  ftp.put(uploadfile, newname, function (err) {
    if (err) throw err;
    ftp.end();
  });
});
ftp.connect({user: 'test', password: 'test'});
I always stumble across monitoring the 'data' event, but could not find out how or where to access it (as you can see I'm quite new to JavaScript).

Got it. I found the answer in streams with percentage complete
With my code changed to
var ftp = new Client();
let uploadfile = fs.createReadStream(f.path);
let newname = uuid(); //some function I use for renaming
var uploadedSize = 0; // bytes read from the file so far
ftp.on('ready', function() {
  uploadfile.on('data', function(buffer) {
    var segmentLength = buffer.length;
    uploadedSize += segmentLength;
    console.log("Progress:\t" + ((uploadedSize/f.size*100).toFixed(2) + "%"));
  });
  ftp.put(uploadfile, newname, function(err) {
    if (err) throw err;
    ftp.end();
  });
});
ftp.connect({user: 'test', password: 'test'});
I get the percentage uploaded in console. From here it's only a small step to a graphical output.

On the client side, you can keep a byte count for your upload stream (http://www.experts.exchange.com/questions/24041115/upload-file-on-ftp-with-progressbar-and-time-left.html):
Set the lower limit of the progress bar to 0.
Set the upper limit to the length of the file being uploaded.
Feed the progress bar with the byte count
(http://www.stackoverflow.com/questions/24608048/how-do-i-count-bytecount-in-read-method-of-inputstream).
Alternatively, you may be able to use an npm package such as stream-meter (https://www.npmjs.com/package/stream-meter) or progress-stream (https://www.npmjs.com/package/progress-stream) and pipe your file stream through it to feed the progress bar. I am not sure about that because I do not know the internals of these packages, but progress-stream reports a transferred value that would fit exactly.
A very accurate way is to have code on the server that gives feedback to the browser (http://www.stackoverflow.com/questions/8480240/progress-bar-for-iframe-uploads).
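The byte-count idea can also be sketched with Node's built-in stream module alone, without the packages mentioned above. This is a minimal sketch, not how stream-meter or progress-stream work internally; createProgressTracker and createProgressStream are hypothetical names.

```javascript
const { Transform } = require('stream');

// Hypothetical helper: accumulates chunk lengths and returns the
// percentage of totalBytes seen so far.
function createProgressTracker(totalBytes) {
  let transferred = 0;
  return function track(chunkLength) {
    transferred += chunkLength;
    return totalBytes > 0 ? (transferred / totalBytes) * 100 : 100;
  };
}

// Hypothetical helper: a pass-through stream that calls onProgress(percent)
// for every chunk and forwards the data unchanged.
function createProgressStream(totalBytes, onProgress) {
  const track = createProgressTracker(totalBytes);
  return new Transform({
    transform(chunk, encoding, callback) {
      onProgress(track(chunk.length));
      callback(null, chunk);
    }
  });
}
```

Assuming the upload function accepts a readable stream as input, as the ftp.put call in the question does, the file stream could be piped through it first, for example uploadfile.pipe(createProgressStream(f.size, updateBar)), where updateBar stands for whatever updates your progress bar. The percentage reflects bytes read from disk and handed to the FTP client, which can run slightly ahead of what the server has actually received.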

Throttle CPU NODE.JS action to allow new calls to be processed

I have an ExpressJS application that accepts a request that results in 1K to 50K fs.link() actions being executed (it might even hit 500K).
The request (a POST) is not held up while this occurs. I immediately fire off a res.send(), which makes the client happy.
But the server then "forks" the job below, which needs to go and do all the fs.link() calls. These do happen asynchronously, but the amount of work (CPU, disk, etc.) means that the ExpressJS service is not very responsive to new requests during this time.
Is there some easy way (other than childProcess) to simulate forking a low-priority thread that would do this file linking?
Job.prototype.runJob = function (next) {
  var self = this;
  var max = this.files.length;
  var count = 0;
  async.each(this.files,
    function (file, step) {
      var src = path.join(self.sourcePath, file.path);
      var base = path.basename(src);
      var dest = path.join(self.root, base);
      fs.link(src, dest, function (err) {
        if (err) {
          // logger.addLog('warn', "fs.link failed for file: %s", err.message, { file: src });
          self.filesMissingList.push(src);
          self.errors = true;
          self.filesMissing++;
        } else {
          self.filesFound++;
        }
        self.batch.update({ tilesCount: ++count, tilesMax: max, done: false });
        step(null);
      });
    },
    function (err) {
      self.batch.update({ tilesCount: count, tilesMax: max, done: true });
      next(null, "FalconView Linking of: " + self.type + " run completed");
    });
}

You could use the webworker-threads module, which is good for spinning CPU-intensive tasks onto other threads. Alternatively, you could abuse cluster, but it's really the wrong tool for the job. (The cluster module is really better for scaling up web services, not for doing intensive tasks.)

You can try using async.eachLimit instead of async.each. That way you control how many fs.link() calls are in flight at once, which leaves the Express process room to handle incoming requests.
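To show what limiting concurrency means in practice, here is a small stand-in written in plain JavaScript. It is a simplified sketch of the idea behind async.eachLimit, not the library's actual implementation, and it assumes each iteratee calls its callback exactly once.

```javascript
// Simplified stand-in for async.eachLimit: runs iteratee(item, cb) for every
// item, but never has more than `limit` calls in flight at the same time.
function eachLimit(items, limit, iteratee, done) {
  let next = 0;      // index of the next item to start
  let running = 0;   // calls currently in flight
  let finished = 0;  // calls that have completed
  let failed = false;

  if (items.length === 0) return done(null);

  function launch() {
    while (!failed && running < limit && next < items.length) {
      const item = items[next++];
      running++;
      iteratee(item, function (err) {
        running--;
        finished++;
        if (failed) return;
        if (err) {
          failed = true;
          return done(err);
        }
        if (finished === items.length) return done(null);
        launch(); // a slot freed up, so start the next item
      });
    }
  }

  launch();
}
```

In runJob above, swapping async.each(this.files, worker, callback) for async.eachLimit(this.files, 10, worker, callback) would apply the same idea with the real library. The limit of 10 is only a guess; a suitable value depends on your disk and load.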

Node Streaming, Writing, and Memory

I'm attempting to dynamically concatenate files prior to serving their content. The following very simplified code shows an approach:
var http = require('http');
var fs = require('fs');
var start = '<!doctype html><html lang="en"><head><script>';
var funcsA = fs.readFileSync('functionsA.js', 'utf8');
var funcsB = fs.readFileSync('functionsB.js', 'utf8');
var funcsC = fs.readFileSync('functionsC.js', 'utf8');
var finish = '</script></head><body>some stuff here</body></html>';
var output = start + funcsA + funcsB + funcsC + finish;
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end(output);
}).listen(9000);
In reality, how I concatenate might depend on clues from the userAgent. My markup and scripts could be several hundred kilobytes combined.
I like this approach because there is no file system I/O happening within createServer. I seem to have read somewhere that this response.write(...) approach is not as efficient or low-overhead as streaming data using fs.createReadStream. I seem to recall this had something to do with what happens when the client cannot receive data as fast as Node can send it. We seem to be able to create a readable stream from a file system object, but not from memory. Is it possible to do what I have coded above with a streaming approach, with the file I/O happening initially, outside of the createServer function?
Or, on the other hand, are my concerns not that critical, and does the approach above offer no less efficiency than a streaming approach?
Thanks.

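// Inside the createServer callback, after res.writeHead(...).
// File names are taken from the question.
res.write(start)
var A = fs.createReadStream('functionsA.js')
var B = fs.createReadStream('functionsB.js')
var C = fs.createReadStream('functionsC.js')
// Pipe the files one after another; end: false keeps res open between files.
A.pipe(res, {
  end: false
})
A.on('end', function () {
  B.pipe(res, {
    end: false
  })
})
B.on('end', function () {
  C.pipe(res, {
    end: false
  })
})
C.on('end', function () {
  res.write(finish)
  res.end()
})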

Defining the streams before (and not inside) the createServer callback typically won't work: a readable stream can be consumed only once, so every request needs its own fresh streams.
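On the question of streaming from memory: recent Node versions provide Readable.from, which turns an array of in-memory strings into a readable stream. The sketch below assumes that API is available in your Node version. The names createPageStream and createPageServer are invented for this example.

```javascript
const http = require('http');
const { Readable } = require('stream');

// Hypothetical helper: builds a fresh readable stream over in-memory parts.
// A new stream is needed per request because a stream is consumed only once.
function createPageStream(parts) {
  return Readable.from(parts);
}

// Hypothetical helper: serves the parts in order, letting pipe() handle
// clients that read more slowly than Node can write.
function createPageServer(parts) {
  return http.createServer(function (req, res) {
    res.writeHead(200, { 'Content-Type': 'text/html' });
    createPageStream(parts).pipe(res);
  });
}
```

With the variables from the question, this would be started with createPageServer([start, funcsA, funcsB, funcsC, finish]).listen(9000). The files are still read once at startup, so no file I/O happens per request. Whether this measurably beats res.end(output) for a few hundred kilobytes is something to benchmark rather than assume.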

Node.js thumbnailer using Imagemagick: nondeterministic corruption

I have a Node.js server which dynamically generates and serves small (200x200) thumbnails from images (640x640) in a database (mongodb). I'm using the node-imagemagick module for thumbnailing.
My code works roughly 95% of the time; about 1 in 20 (or fewer) thumbnailed images are corrupt on the client (iOS), which reports:
JPEG Corrupt JPEG data: premature end of data segment
For the corrupt images, the client displays the top 50% - 75% of the image, and the remainder is truncated.
The behavior is non-deterministic, and the specific images that are corrupt change from request to request.
I'm using the following code to resize the image and output the thumbnail:
im.resize({
  srcData: image.imageData.buffer,
  width: opt_width,
}, function(err, stdout) {
  var responseHeaders = {};
  responseHeaders['content-type'] = 'image/jpeg';
  responseHeaders['content-length'] = stdout.length;
  debug('Writing ', stdout.length, ' bytes.');
  response.writeHead(200, responseHeaders);
  response.write(stdout, 'binary');
  response.end();
});
What could be wrong, here?
Notes:
The problem is not an incorrect content-length header. When I omit the header, the result is the same.
When I do not resize the image, the full-size image always seems to be fine.
In researching this I found two Stack Overflow questions that both solved the problem by increasing the buffer size. In my case the images are very small, so this seems unlikely to be responsible.
I was originally assigning stdout to a new Buffer(stdout, 'binary') and writing that. Removing it ('binary' will be deprecated) made no difference.

The problem seems to have been due to a slightly older version of node-imagemagick (0.1.2); upgrading to 0.1.3 was the solution.
In case this is helpful to anyone, here's the code I used to make Node.js queue up and handle client requests one at a time.
// Set up your server like normal.
http.createServer(handleRequest);
// ...
var requestQueue = [];
var isHandlingRequest = false; // While true, new requests wait in the queue.
// If you have any endpoints that don't always call response.end(), add them here.
var urlsToHandleConcurrently = {
  '/someCometStyleThingy': true
};
function handleRequest(req, res) {
  if (req.url in urlsToHandleConcurrently) {
    handleQueuedRequest(req, res);
    return;
  }
  requestQueue.push([req, res]); // Enqueue new requests.
  processRequestQueue(); // Check if a request in the queue can be handled.
}
function processRequestQueue() {
  // Continue only if no request is being processed and the queue is not empty.
  if (isHandlingRequest) return;
  if (requestQueue.length == 0) return;
  var op = requestQueue.shift();
  var req = op[0], res = op[1];
  // Wrap .end() on the http.ServerResponse instance to
  // unblock and process the next queued item.
  res.oldEnd = res.end;
  res.end = function() {
    // Forward all arguments (data, encoding, callback) to the original end().
    res.oldEnd.apply(res, arguments);
    isHandlingRequest = false;
    processRequestQueue();
  };
  // Start handling the request, while blocking the queue until res.end() is called.
  isHandlingRequest = true;
  handleQueuedRequest(req, res);
}
function handleQueuedRequest(req, res) {
  // Your regular request handling code here...
}
