Log4js takes hours to write logfile

Log4js takes hours to write logfile - node.js

I am using log4js in my code to log the results and errors. The program runs for about 2,5 hours before the final console output is made and afterwards needs several hours to complete writing the logfile. The log is writing for 6 hours now (since the algorithm itself finished) and the filesize is 100mb.
The log will be about 1,5 million lines (when done).
Is it normal for the log to be written as slow as this? Are there "standard" mistakes to make when using log4js that I could check?
In case you want to know: The program is running on an Intel i5 with 8gb RAM and an SSD drive, so the hardware shouldn't be the problem I guess.
I am not sure what other information I can give you, just ask ahead if you need to know something.

Dropbox sounds like a good candidate. Any anti virus software could also interfere.
Firstly I would confirm what your system is capable of by creating a mini log4js benchmark for the various configurations available on your PC, then compare that to your application performance.
var Benchmark = require('benchmark');
var log4js = require('log4js');
log4js.clearAppenders();
log4js.loadAppender('file');
log4js.addAppender(log4js.appenders.file('NUL'), 'nulnulnul');
var lognul = log4js.getLogger('nulnulnul');
log4js.addAppender(log4js.appenders.file('c:/your_dropbox/test.log'), 'normallog');
var lognorm = log4js.getLogger('normallog');
log4js.addAppender(log4js.appenders.file('c:/tmp/test.log'), 'nodropbox');
var lognodr = log4js.getLogger('nodropbox');
log4js.addAppender(log4js.appenders.file('c:/virus-exception/test.log'), 'nodropvir');
var lognodv = log4js.getLogger('nodropvir');
var suite = new Benchmark.Suite;
// add tests
suite.add('Log#Nul', function() {
lognul.info("Some lengthy nulnulnul info messages");
})
.add('Log#normal', function() {
lognorm.info("Some lengthy normallog info messages");
})
.add('Log#NoDropbox', function() {
lognodr.info("Some lengthy nodropbox info messages");
})
.add('Log#NoVirusOrDropbox', function() {
lognodv.info("Some lengthy nodropvir info messages");
})
// add listeners
.on('cycle', function(event) {
console.log(String(event.target));
})
.on('complete', function() {
console.log('Fastest is ' + this.filter('fastest').pluck('name'));
})
// run async
.run({ 'async': false });
If Dropbox or Virus software doesn't turn out to be the problem there are two Windows Sysinternal tools that will help you see what is going on your system when your process is running.
Process Explorer - Overall task manager/performance viewer
Gives you an overall view of your system so you can see which processes are doing what. You can also drill down into specific processes as well (right click/properties)
Process Monitor - Event profiler for processes.
Process Monitor is like a log file of all the system calls any process makes.
You can filter down to specific processes or calls so in your case you would be able to monitor Dropbox and your Node.js process and see if their access to the file in question is interleaved while Dropbox does it's work.

Related

Optimal method for nodejs to hand of image from database to browser

the end result that I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file and then I could just give the url to the file.
I also know I can hand off base64 string to the browser so it can render the image.
My question is which option is the most optimal? Or best practice? Keep in mind that if I go the stream method, I would have to check to see if the image has changed since the last time I displayed it...and if it has changed then I have to restream it out of the database.
I have been playing with the oracldb for node js and was able to successfully extract one blob into a file but I am also having trouble streaming multiple files.
This is a two question post:
Which is the most optimal:
1. Send Base64 string - I kind of like this method because i dont have to worry about streaming out the file and checking if it has changed since it is coming straight from the databse. My concern is can the browser/nodejs handle it? I know those strings can be very large. I could also be sending more than one image at a time.
Stream the blobs into files.
The second part question is how can i get multiple blobs out below is my code on streaming just one file, i found this example from github lobstream1.js
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
if (lob.type === oracledb.CLOB) {
console.log('Writing a CLOB to ' + outFileName);
lob.setEncoding('utf8'); // set the encoding so we get a 'string' not a 'buffer'
} else {
console.log('Writing a BLOB to ' + outFileName);
}
var errorHandled = false;
lob.on(
'error',
function(err) {
console.log("lob.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
lob.on(
'end',
function() {
console.log("lob.on 'end' event");
});
lob.on(
'close',
function() {
// console.log("lob.on 'close' event");
if (!errorHandled) {
return cb(null);
}
});
var outStream = fs.createWriteStream(outFileName);
outStream.on(
'error',
function(err) {
console.log("outStream.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
// Switch into flowing mode and push the LOB to the file
lob.pipe(outStream);
};
Fixed spooling out images with this method, I did change the dostream a bit.
for(var x = 0; x<result.rows.length;x++)
{
outputFileName = x + '.jpg';
console.log(outputFileName);
console.log(x);
var lob = result.rows[x][0];
dostream(lob,outputFileName);
// cb(null,lob);
}
Thank you for any help.

Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
The biggest problem with this scenario that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js based on what makes the most sense.

Unless you do something like tiling or the base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this seems extra IO - you will need to test to measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.

Nightmare doesn't run twice in a row - NodeJS

EDIT
I have noticed the removal of the .end() function appears to solve the issue, but after reading the Nightmare docs on the use of .end() it says: Completes any queue operations, disconnect and close the electron process.
Now while this does solve the problem, am I now just opening more and more electron processes each time the route is called, which will eventually cause the server to run out of memory, or is this a safe way to fix the issue?
ORIGINAL TEXT
Please consider the following problem:
I am developing a Node based service that will allow the user to request screenshot of a particular URL.
For this I am using Nightmare to visit the URL, wait 2 seconds, take a screenshot, which is saved to the disk, convert it to base64, delete the image and then return the base64 string.
console.log('Nightmare starts');
nightmare
.goto(url)
.wait(2000)
.screenshot(filename)
.end()
.then(function (result)
{
fs.exists(filename, function(exists)
{
if (exists)
{
data = fs.readFileSync(filename);
var base64 = data.toString('base64')
fs.unlink(filename);
var output = {'message':'success','map_image':base64};
res.send(output);
}
});
})
.catch(function (error)
{
console.error('Search failed:', error);
});
console.log("Nightmare Finished");
The above code works just fine, the first time it runs. However any subsequent calls to this just consoles "Nightmare starts" and "Nightmare Finished" instantly with the actual code in-between not running. I don't appear to have any errors display, nothing is caught if I wrap it in a try/catch. The node requires a reboot to allow it to happen again.
Something worth noting is that I am running on a headless ubuntu machine, as electron (one of the nightmare dependencies) appears to need a GUI, I am using xvfb to launch the node using the following command:
xvfb-run --auto-servernum --server-num=1 node server.js
I'm assuming this may be an issue with some resource not being released correctly on the first run, but any assistance would be appreciated.
Also open to any constructive criticism of my code, very new to Node and i'm sure i'm not writing in the most optimal way (sync file loading etc)

It appears that you are simply misplacing where you are creating the nightmare instances. Cannot help much without some more code snippet and information.
Way 1
Create nightmare instance every time and close them after you are done with your task. It will require some time to boot up the instance, but it will also lessen the memory load. Not to mention you can have multiple nightmare instances for different users.
Way 2
Don't end and re-use same nightmare instance. Have multiple nightmare instances and queue the call for screenshot. The websites will load fast and it won't take time to boot up an instance, but you will have longer wait time for longer queue.

How to execute / abort long running tasks in Node JS?

NodeJS server with a Mongo DB - one feature will generate a report JSON file from the DB, which can take a while (60 seconds up - has to process hundreds of thousands of entries).
We want to run this as a background task. We need to be able to start a report build process, monitor it, and abort it if the user decides to change the params and re build it.
What is the simplest approach with node? Don't really want to get into the realms of separate worker servers processing jobs, message queues etc - we need to keep this on the same box and fairly simple implementation.
1) Start the build as a async method, and return to the user, with socket.io reporting progress?
2) Spin off a child process for the build script?
3) Use something like https://www.npmjs.com/package/webworker-threads?
With the few approaches I've looked at I get stuck on the same two areas;
1) How to monitor progress?
2) How to abort an existing build process if the user re-submits data?
Any pointers would be greatly appreciated...

The best would be to separate this task from your main application. That said, it'd be easy to run it in the background.
To run it in the background and monit without message queue etc., the easiest would be a child_process.
You can launch a spawn job on an endpoint (or url) called by the user.
Next, setup a socket to return live monitoring of the child process
Add another endpoint to stop the job, with a unique id returned by 1. (or not, depending of your concurrency needs)
Some coding ideas:
var spawn = require('child_process').spawn
var job = null //keeping the job in memory to kill it
app.get('/save', function(req, res) {
if(job && job.pid)
return res.status(500).send('Job is already running').end()
job = spawn('node', ['/path/to/save/job.js'],
{
detached: false, //if not detached and your main process dies, the child will be killed too
stdio: [process.stdin, process.stdout, process.stderr] //those can be file streams for logs or wathever
})
job.on('close', function(code) {
job = null
//send socket informations about the job ending
})
return res.status(201) //created
})
app.get('/stop', function(req, res) {
if(!job || !job.pid)
return res.status(404).end()
job.kill('SIGTERM')
//or process.kill(job.pid, 'SIGTERM')
job = null
return res.status(200).end()
})
app.get('/isAlive', function(req, res) {
try {
job.kill(0)
return res.status(200).end()
} catch(e) { return res.status(500).send(e).end() }
})
To monit the child process you could use pidusage, we use it in PM2 for example. Add a route to monit a job and call it every second. Don't forget to release memory when job ends.
You might want to check out this library which will help you manage multi processing across microservices.

Best way to manage unique child processes in node.js

I'm about to start coding a chat bot. However, I plan on running more than one, using a wrapper to communicate and restart them. I have done this in the past with child_process.fork(), but it was incredibly inefficient. I've looked into spawn and cluster as well, but they all seem to focus on running the same thing, not unique bots. As for plugins, I've looked into fleet, forkfriend, and workerfarm, but none seem to fit my needs.
Is there any plugin or way I'm not seeing to help me do this? Or am I just going to have o wing it again?

You can have as many chat bots as you wish in a single process. The rule of thumb in Node.js is using one process per processor core since Node has slightly different multithreading model you might got used to.
Assuming you still need some multithreading on top of this, here is a couple of node modules you might find fitting your needs:
node-webworker-threads, dnode.
UPDATE:
Now I see what you need. There is a nice example in Node.js docs, which I saw recently. I just copy & paste it here:
var normal = require('child_process').fork('child.js', ['normal']);
var special = require('child_process').fork('child.js', ['special']);
// Open up the server and send sockets to child
var server = require('net').createServer();
server.on('connection', function (socket) {
// if this is a VIP
if (socket.remoteAddress === '74.125.127.100') {
special.send('socket', socket);
return;
}
// just the usual dudes
normal.send('socket', socket);
});
server.listen(1337);
child.js looks like this:
process.on('message', function(m, socket) {
if (m === 'socket') {
socket.end('You were handled as a ' + process.argv[2] + ' person');
}
});
I believe it's pretty much what you need. Launch several processes with different configs (if number of configs is relatively low) and pass socket to a particular one from master process.

NodeJS large directory file changes

I am working on more of a security dashboard, it watches for changes in files in the entire home directory with hundreds of sites (all Joomla, so a lot of files).
In order to keep on top of potential security issues we want to watch for file changes in an efficient way without creating unnecessary CPU/Memory overhead. We want to watch it at a faster interval but I know its more of a balancing act when you do want to keep it from using more cpu then a side process should.
I have tried to use "watch" with the following code, running in the home directory:
var watch, fs;
watch = require('watch');
fs = require('fs');
watch.createMonitor(__dirname,{interval:500,filter:function(file,stat){
if(file.indexOf('index.php')!=-1){
return true;
}else{
return false;
}
}},function(monitor){
monitor.filter(function(file){
console.log(file);
})
monitor.on('created',function(file,stat){
console.log(file + ' new');
});
monitor.on('changed',function(file,stat){
console.log(file + ' changed');
});
monitor.on('removed',function(file,stat){
console.log(file + ' deleted');
});
});
However this spikes the CPU to over 100% of a single core (sometimes 2) out of 8. Memory also takes up about 20% of 8gb pretty quickly as well. This is all just to create the watch event on all the files, so its before it can actually detect any file changes.
I know the issue with this is it goes through each file individually, and only does not track it if you filter that sort of file. Typically all I need to watch is the index.php in every directory, down to a point that it could be consistent (with some exceptions).
Is there a module already built to do this? Or is this something new? All modules I find assume its a smaller directory (like watching LESS or something) So not built for this sort of application at all.
Any ideas? I know this code will need to be scrapped as there is no way I can see to stop the CPU overhead.

Do not use package 'watch', just use fs.watch(...)
package 'watch':
consistent APIs
very slow because implement mostly in node, look source to see how it work
souce code: https://github.com/mikeal/watch/blob/master/main.js
fs.watch(..)
non-consistent APIs, not all OSs are supported.
very fast because it reused OS features
document: http://nodejs.org/docs/latest/api/fs.html#fs_fs_watch_filename_options_listener

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string