Observe file changes with node.js


I have the following use case:
A creates a chat and invites B and C. On the server, A creates a
file. A, B and C write messages into this file. A, B and C read this
file.
I want A to create a file on the server and observe it; if anybody else writes something into this file, the new content should be sent back over WebSockets.
So any change to this file should be observed by my Node.js application.
How can I observe file changes? Is this possible with Node.js without locking the files?
If it is not possible with files, would it be possible with a database object (NoSQL)?

The good news is that you can observe file changes with Node's API.
This, however, doesn't give you access to the content that has been written into the file.
You could perhaps use the fs.appendFile() function so that, whenever something is written into the file, you emit an event to something else that "logs" the new data being written.
fs.watch(), pasted directly from the docs:
fs.watch('somedir', function (event, filename) {
  console.log('event is: ' + event);
  if (filename) {
    console.log('filename provided: ' + filename);
  } else {
    console.log('filename not provided');
  }
});
Read here about the fs.watch(); function
EDIT: You can use the function
fs.watchFile();
Read here about the fs.watchFile(); function
This will allow you to watch a single file for changes, i.e. the listener is called whenever the file is accessed or modified by another process of any kind.
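Since neither fs.watch() nor fs.watchFile() hands you the written content, one way to get at it is to remember how many bytes have already been delivered and, on each change, read only what lies beyond that offset. A minimal sketch, assuming messages are only ever appended and the file exists for as long as it is watched; readAppended and watchAppends are hypothetical helper names, and the 500 ms polling interval is arbitrary:

```javascript
const fs = require('fs');

// Read whatever was added to `path` after byte `offset`.
// Returns the new bytes and the offset to resume from next time.
function readAppended(path, offset) {
  const fd = fs.openSync(path, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    // If the file shrank (truncated or replaced), start again from the top.
    const start = size < offset ? 0 : offset;
    if (size === start) {
      return { data: Buffer.alloc(0), offset: start };
    }
    const buffer = Buffer.alloc(size - start);
    const bytesRead = fs.readSync(fd, buffer, 0, buffer.length, start);
    return { data: buffer.subarray(0, bytesRead), offset: start + bytesRead };
  } finally {
    fs.closeSync(fd);
  }
}

// Poll `path` with fs.watchFile and pass newly appended text to `onData`.
// Returns a function that stops watching.
function watchAppends(path, onData) {
  let offset = fs.statSync(path).size; // only report content written from now on
  const listener = function (curr) {
    if (curr.size === offset) return; // accessed, but nothing new to read
    const result = readAppended(path, offset);
    offset = result.offset;
    if (result.data.length > 0) onData(result.data.toString('utf8'));
  };
  fs.watchFile(path, { interval: 500 }, listener);
  return function stop() {
    fs.unwatchFile(path, listener);
  };
}
```

With a WebSocket connection at hand (not shown here), usage could be as simple as watchAppends('chat.log', function (text) { socket.send(text); }). Note that a multi-byte UTF-8 character split across two writes would be decoded incorrectly by this sketch.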

Also you could use node-watch. Here's an easy example:
const watch = require('node-watch');
watch('README.md', function (event, filename) {
  console.log(filename, ' changed.');
});

I do not think you need to observe file changes or use a NoSQL database for this (unless you want to). My advice would be to look at events (the Observer pattern). There are more than enough tutorials on this topic available online (Google), for example Felix's article about using EventEmitters.
This publish/subscribe semantic can also be achieved with NoSQL. In Redis, for example, I think you should have a look at pubsub.
In MongoDB, I think tailable cursors are what you are looking for. On their blog they have a post explaining pub/sub.
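As a rough illustration of the EventEmitter approach: the server keeps the conversation in memory and notifies every subscriber when a message arrives, so no file has to be watched at all. ChatRoom and its post method are hypothetical names invented for this sketch, and delivering each message over a WebSocket is left to the listener:

```javascript
const EventEmitter = require('events');

// A chat room that announces every new message to its subscribers.
class ChatRoom extends EventEmitter {
  constructor() {
    super();
    this.messages = []; // in-memory history; persistence is out of scope here
  }

  // Store the message, then notify everyone listening for 'message'.
  post(author, text) {
    const message = { author: author, text: text, at: Date.now() };
    this.messages.push(message);
    this.emit('message', message);
    return message;
  }
}
```

Each connected client would then subscribe once, for example room.on('message', function (m) { socket.send(JSON.stringify(m)); }), where socket stands for that client's WebSocket connection.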

Related

Optimal method for nodejs to hand off image from database to browser

The end result that I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file and then I could just give the url to the file.
I also know I can hand off base64 string to the browser so it can render the image.
My question is which option is the most optimal? Or best practice? Keep in mind that if I go the stream method, I would have to check to see if the image has changed since the last time I displayed it...and if it has changed then I have to restream it out of the database.
I have been playing with oracledb for Node.js and was able to successfully extract one blob into a file, but I am also having trouble streaming multiple files.
This is a two-question post:
Which is the most optimal:
1. Send a Base64 string - I kind of like this method because I don't have to worry about streaming out the file and checking whether it has changed, since it is coming straight from the database. My concern is: can the browser/Node.js handle it? I know those strings can be very large. I could also be sending more than one image at a time.
2. Stream the blobs into files.
The second part of the question is: how can I get multiple blobs out? Below is my code for streaming just one file; I found this example on GitHub, lobstream1.js:
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
  if (lob.type === oracledb.CLOB) {
    console.log('Writing a CLOB to ' + outFileName);
    lob.setEncoding('utf8'); // set the encoding so we get a 'string' not a 'buffer'
  } else {
    console.log('Writing a BLOB to ' + outFileName);
  }
  var errorHandled = false;
  lob.on(
    'error',
    function(err) {
      console.log("lob.on 'error' event");
      if (!errorHandled) {
        errorHandled = true;
        lob.close(function() {
          return cb(err);
        });
      }
    });
  lob.on(
    'end',
    function() {
      console.log("lob.on 'end' event");
    });
  lob.on(
    'close',
    function() {
      // console.log("lob.on 'close' event");
      if (!errorHandled) {
        return cb(null);
      }
    });
  var outStream = fs.createWriteStream(outFileName);
  outStream.on(
    'error',
    function(err) {
      console.log("outStream.on 'error' event");
      if (!errorHandled) {
        errorHandled = true;
        lob.close(function() {
          return cb(err);
        });
      }
    });
  // Switch into flowing mode and push the LOB to the file
  lob.pipe(outStream);
};
Fixed spooling out images with this method, I did change the dostream a bit.
for (var x = 0; x < result.rows.length; x++) {
  outputFileName = x + '.jpg';
  console.log(outputFileName);
  console.log(x);
  var lob = result.rows[x][0];
  dostream(lob, outputFileName);
  // cb(null,lob);
}
Thank you for any help.

Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
The biggest problem with this scenario is that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js, based on what makes the most sense.
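The two-pool setup with buffered (non-streamed) LOBs could be sketched roughly as follows. This is an illustration under several assumptions: the table name images, its columns id and image, the alias names and the pool sizes are made up, and the driver is passed in as an argument (in an application it would be the result of require('oracledb')). It relies on the driver's createPool poolAlias option, getConnection with an alias, and the fetchInfo option for returning a BLOB as a Buffer; check these against the node-oracledb version you run.

```javascript
// Create one pool for LOB fetches and one for everything else.
// `credentials` holds user, password and connectString (not shown here).
async function createPools(oracledb, credentials) {
  await oracledb.createPool({ ...credentials, poolAlias: 'default', poolMax: 5 });
  await oracledb.createPool({ ...credentials, poolAlias: 'lobpool', poolMax: 5 });
}

// Fetch one image as a Buffer through the dedicated LOB pool.
// Resolves to null when no row matches.
async function fetchImage(oracledb, id) {
  const connection = await oracledb.getConnection('lobpool');
  try {
    const result = await connection.execute(
      'SELECT image FROM images WHERE id = :id', // hypothetical table and columns
      [id],
      // Ask the driver to buffer the BLOB instead of returning a Lob stream.
      { fetchInfo: { IMAGE: { type: oracledb.BUFFER } } }
    );
    return result.rows.length > 0 ? result.rows[0][0] : null;
  } finally {
    await connection.close(); // hands the connection back to the pool
  }
}
```

When all five LOB connections are busy, further getConnection('lobpool') calls wait in that pool's request queue, which is what limits concurrent image fetches without blocking the rest of the application.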

Unless you do something like tiling or the base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this seems extra IO - you will need to test to measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.
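For the Base64 inline option mentioned above, it helps to know what it costs: Base64 turns every 3 bytes into 4 characters, so the payload grows by about one third. A small sketch, assuming the image has already been fetched into a Buffer; toDataUri and base64Length are hypothetical helper names:

```javascript
// Build a data: URI that a browser can use directly as an <img> src.
function toDataUri(imageBuffer, mimeType) {
  return 'data:' + mimeType + ';base64,' + imageBuffer.toString('base64');
}

// Length in characters of the Base64 encoding of `byteLength` bytes
// (padding included).
function base64Length(byteLength) {
  return 4 * Math.ceil(byteLength / 3);
}
```

A 3 MiB image, for example, becomes a string of base64Length(3 * 1024 * 1024) = 4194304 characters, and it cannot be cached by the browser separately from the page that embeds it.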

how to read an incomplete file and wait for new data in nodejs

I have a UDP client that grabs some data from another source and writes it to a file on the server. Since this is a large amount of data, I don't want the end user to wait until it is fully written to the server before they can download it. So I made a Node.js server that grabs the latest data from the file and sends it to the user.
Here is the code:
var stream = fs.readFileSync(filename)
.on("data", function(data) {
response.write(data)
});
The problem here is that if the download starts when the file is only, for example, 10 MB, fs.readFileSync will only read my file up to 10 MB. Even if the file has grown to 100 MB two minutes later, fs.readFileSync will never know about the new data. How can I do this in Node? I would like to somehow refresh the fs state, or perhaps wait for new data using the fs module. Or is there some kind of file content watcher in fs?
EDIT:
I think the code below describes better what I would like to achieve. However, this code keeps reading forever, and I don't have any variable from fs.read that can help me stop it:
fs.open(filename, 'r', function(err, fd) {
  var bufferSize = 1000,
      chunkSize = 512,
      buffer = new Buffer(bufferSize),
      bytesRead = 0;
  while (true) { // check if file has new content inside
    fs.read(fd, buffer, 0, chunkSize, bytesRead);
    bytesRead += buffer.length;
  }
});

Node has a built-in method for this in the fs module. At the time of writing it is tagged as unstable, so it can change in the future.
It's called fs.watchFile(filename[, options], listener).
You can read more about it here: https://nodejs.org/api/fs.html#fs_fs_watchfile_filename_options_listener
But I highly suggest using one of the good, actively maintained modules, such as
watchr:
From its readme:
Better file system watching for Node.js. Provides a normalised API the
file watching APIs of different node versions, nested/recursive file
and directory watching, and accurate detailed events for
file/directory changes, deletions and creations.
The module page is here: https://github.com/bevry/watchr
(I have used the module in a couple of projects and it works great; I'm not otherwise related to it.)

You need to store the last known size of the file somewhere (for example, in a database).
Read the file size first.
Load your file.
Then write a script that checks whether the file has changed.
On the client, you can query the size with jQuery.post and decide in JavaScript whether you need to reload.
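The steps above amount to remembering how far you have read and asking only for what came after. Relevant to the loop in the question: fs.read and fs.readSync report how many bytes they actually read, and a result of 0 means the reader has caught up with the writer, which is the stop condition the loop was missing. A minimal sketch; drain is a hypothetical helper, and when to call it again (a timer, fs.watchFile, a watcher module) is left open:

```javascript
const fs = require('fs');

// Pass everything between `position` and the current end of the file to
// `onChunk`, in pieces of at most `chunkSize` bytes, and return the position
// to resume from on the next call.
function drain(fd, position, chunkSize, onChunk) {
  const buffer = Buffer.alloc(chunkSize);
  for (;;) {
    const bytesRead = fs.readSync(fd, buffer, 0, chunkSize, position);
    if (bytesRead === 0) {
      return position; // nothing more for now: caught up with the writer
    }
    // Copy the bytes, because `buffer` is reused on the next iteration.
    onChunk(Buffer.from(buffer.subarray(0, bytesRead)));
    position += bytesRead;
  }
}
```

In an HTTP handler, onChunk could be function (chunk) { response.write(chunk); }. The sketch cannot tell "the writer is finished" from "the writer is slow"; the server needs some other signal to decide when to end the response.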

How to get Vertica copy from stdin response in NodeJS?

I'm using Vertica Database 07.01.0100 and node.js v0.10.32. I'm using the vertica nodejs module by vanberger. I want to send a copy from stdin command, and that is working using this example: https://gist.github.com/soldair/5168249. Here's my code:
var loadStreamQuery = "COPY \""+input('table-name')+"\" FROM STDIN DELIMITER ',' skip 1 direct;"
var stream = through();
connection.copy(loadStreamQuery, function(transfer, success, fail) {
  stream.on('data', function(data) {
    log.info("loaddata: on data =>", data);
    transfer(data);
  });
  stream.on('end', function(data) {
    log.info("loaddata: on end =>", data);
    if (data) {
      transfer(data);
    }
    success();
    callback(null, {'result': {'status': '200', 'result': "Data was loaded successfully into Vertica"}});
  });
  stream.on('error', function(err) {
    fail();
    log.error("loaddata: on error =>", err);
    connection.disconnect();
  });
  stream.write(new Buffer(file));
  stream.end();
});
But, if the data file has more columns than the target table, it doesn't say so. It just happily runs, copying nothing and then ends. When I look at the table, nothing has been loaded. If I do the same thing in dbvisualizer, it tells me that 0 rows were affected.
I would like to examine the status of the command, but I don't know how. Is there some other event that I need to listen for? Do I need to save the result of copy to a variable and listen there, like I do with query calls? I'm a nodejs noob, so if the answer is obvious, just let me know.
Thanks!

I don't really think it is a node.js thing as much as it is a Vertica thing.
You need to look for rejected rows. You can find some good examples in the docs here.
If you want to actually see the rows that reject, you can do this by using a COPY statement clause like REJECTED DATA AS table "loader_rejects". Alternatively you can send it to a file on the cluster. I'm not aware of a way to get rejected rows to a local file using STDIN.
If you don't care at all about the actual data, and just want to know how many rows loaded and rejected... you can use GET_NUM_REJECTED_ROWS() and GET_NUM_ACCEPTED_ROWS(). I think COPY will actually also return a result set with just the count of loaded rows, at least that is what I've noticed in the past.
So I guess as an example, if you want to see how many rows were accepted and rejected, you could do:
connection.query "SELECT GET_NUM_REJECTED_ROWS() AS REJECTED_ROWS, GET_NUM_ACCEPTED_ROWS() AS ACCEPTED_ROWS", (err, resultset) -> log.info( err, resultset.fields, resultset.rows, resultset.status )
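Translated to the plain JavaScript callback style used in the question, the check could look like the sketch below. getCopyCounts is a hypothetical helper; it assumes the driver returns each row as an array of column values, and it has to run on the same connection as the COPY, because both functions report on the last load of the current session.

```javascript
// Ask Vertica how many rows the last COPY of this session accepted and rejected.
function getCopyCounts(connection, callback) {
  var sql = 'SELECT GET_NUM_REJECTED_ROWS() AS REJECTED_ROWS, ' +
            'GET_NUM_ACCEPTED_ROWS() AS ACCEPTED_ROWS';
  connection.query(sql, function (err, resultset) {
    if (err) {
      return callback(err);
    }
    var row = resultset.rows[0]; // assumed shape: [rejected, accepted]
    callback(null, { rejected: Number(row[0]), accepted: Number(row[1]) });
  });
}
```

Called after success() in the 'end' handler, this would let the code report a failure when accepted is 0 instead of always answering that the data was loaded successfully.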

Which nodejs library should I use to write into HDFS?

I have a Node.js application and I want to write data into the Hadoop HDFS file system. I have seen two main Node.js libraries that can do it: node-hdfs and node-webhdfs. Has anyone tried them? Any hints? Which one should I use in production?
I am inclined to use node-webhdfs since it uses the WebHDFS REST API. node-hdfs seems to be a C++ binding.
Any help will be greatly appreciated.

You may want to check out the webhdfs library. It provides a nice and straightforward interface (similar to the fs module API) for WebHDFS REST API calls.
Writing to the remote file:
var fs = require('fs');
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();
var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');
localFileStream.pipe(remoteFileStream);
remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});
remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});
Reading from the remote file:
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();
var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');
remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});
remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});
remoteFileStream.on('end', function onEnd () {
  // Download is done
});

Not good news!
Do not use node-hdfs. Although it seems promising, it has been obsolete for two years now. I tried to compile it, but it does not match the symbols of the current libhdfs. If you want to use something like that, you'll have to write your own Node.js binding.
You can use node-webhdfs, but IMHO there's not much advantage in that. It is better to use an HTTP library for Node.js and make your own requests. The hardest part is coping with the very asynchronous nature of Node.js, since you might first want to create a folder, then, after successfully creating it, create a file, and finally write or append data, all through HTTP requests that you must send and whose answers you must wait for before going on.
At least node-webhdfs might be a good reference to look at when starting your own code.
Br,
Fabio Moreira
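To make the "folder, then file, then append" sequence concrete, here is a sketch that only builds the three requests as data, so they can be sent one after another with whatever HTTP library is preferred. The operation names and HTTP methods (MKDIRS and CREATE via PUT, APPEND via POST) come from the WebHDFS REST API; webhdfsUrl and planWrite are hypothetical helpers, and the NameNode address in the usage note is a placeholder.

```javascript
// Build a WebHDFS URL such as http://host:port/webhdfs/v1/<path>?op=MKDIRS
function webhdfsUrl(base, hdfsPath, op, params) {
  const url = new URL('/webhdfs/v1' + hdfsPath, base);
  url.searchParams.set('op', op);
  for (const [key, value] of Object.entries(params || {})) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// The requests to issue, in order. Each one must succeed before the next is
// sent. CREATE and APPEND answer with a redirect to a DataNode, and the file
// data is sent to that redirected location.
function planWrite(base, dir, file) {
  const filePath = dir + '/' + file;
  return [
    { method: 'PUT', url: webhdfsUrl(base, dir, 'MKDIRS') },
    { method: 'PUT', url: webhdfsUrl(base, filePath, 'CREATE', { overwrite: false }) },
    { method: 'POST', url: webhdfsUrl(base, filePath, 'APPEND') }
  ];
}
```

For example, planWrite('http://namenode:50070', '/logs', 'app.log') yields the three steps for /logs/app.log; walking the array with promises or async/await keeps the ordering explicit.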

node-cloudfiles module - Is there a way to track upload progress

If anyone here is familiar with the node-cloudfiles module for Node.js, I could use some help in several different areas. Unfortunately, it seems the authors are nearly impossible to reach via their GitHub repo (EDIT: never mind, someone did reach out to me; I'll send an update when I have an answer of some sort prepared.)
I'll start with my most basic challenge: is there a way to track the progress of the upload? I have tried many things, but the object returned from the .addFile command does not seem to hold any sort of progress stats.
Here is a basic outline of what I am working with.
var readStream = fs.createReadStream(path+'.'+extension, streamopts);
var upOpts = {
  headers: {
    'content-type': 'video/'+extension,
    'content-length': totalBytes
  },
  remote: CDNfilename,
  stream: readStream
};
//reqStream is the object returned from the 'request' module,
//which is used by the 'cloudfiles' module.
var reqStream = cloudClient.addFile(Container.name, upOpts, function (err, uploaded) {
  if (err) { console.log(err); }
});
At first I thought I could just use the .bytesWritten property connected to an interval timer, but the object is not a normal node writeStream, so there is no such property.

Charlie (the author of the module) told me that this is possible because it's using a pipe and you just check the data events from the object returned from .addFile, like so:
reqStream.on('data', function () {
  /* track progress */
});
Whenever you need to contact somebody from the nodejitsu team, join the #nodejitsu channel on IRC, they're really active.
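Whichever stream the 'data' listener ends up on, the bookkeeping is the same: add up chunk lengths and compare against totalBytes. A minimal sketch; createProgressTracker is a hypothetical helper, and attaching it to the local readStream (rather than to reqStream) is this sketch's assumption, not something the module documents:

```javascript
// Returns a 'data' listener that reports cumulative progress on every chunk.
function createProgressTracker(totalBytes, onProgress) {
  let sentBytes = 0;
  return function track(chunk) {
    sentBytes += chunk.length;
    onProgress({
      sentBytes: sentBytes,
      totalBytes: totalBytes,
      percent: Math.round((sentBytes / totalBytes) * 100)
    });
  };
}
```

Usage could be readStream.on('data', createProgressTracker(totalBytes, function (p) { console.log(p.percent + '%'); })), attached right after the addFile call, on the assumption that the upload is already piping the stream by then; a 'data' listener added earlier would start the stream flowing before the upload is connected. Counted this way, the figure is bytes read from disk and handed to the upload, which can run ahead of what the server has actually received.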

At the time of writing this answer, there isn't really a good way to get upload progress for files being sent to cloudfiles. However, one of the nodejitsu geniuses implemented chunked uploading, which in my case, eliminates the need for progress reports. Thanks Bradley.
