node-cloudfiles module - Is there a way to track upload progress - node.js

If anyone here is familiar with the node-cloudfiles module for node.js, I could use some help in several different areas. Unfortunately, it seems the authors are nearly impossible to reach via their github repo (EDIT: never mind, someone did reach out to me; I'll send an update when I have an answer of some sort prepared.)
I'll start with my most basic challenge: is there a way to track the progress of the upload? I have tried many things, but the object returned from the .addFile command does not seem to hold any sort of progress stats.
Here is a basic outline of what I am working with.
var readStream = fs.createReadStream(path+'.'+extension, streamopts);
var upOpts = {
  headers: {
    'content-type': 'video/'+extension,
    'content-length': totalBytes
  },
  remote: CDNfilename,
  stream: readStream
};

//reqStream is the object returned from the 'request' module,
//which is used by the 'cloudfiles' module.
var reqStream = cloudClient.addFile(Container.name, upOpts, function (err, uploaded) {
  if (err) { console.log(err); }
});
At first I thought I could just use the .bytesWritten property connected to an interval timer, but the object is not a normal node writeStream, so there is no such property.

Charlie (the author of the module) told me that this is possible because it's using a pipe and you just check the data events from the object returned from .addFile, like so:
reqStream.on('data', function () {
  /* track progress */
});
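For example, here is a rough sketch (my own, not from the module docs) of turning those 'data' events into a percentage by counting chunk sizes against the totalBytes value from upOpts above; it assumes each 'data' chunk corresponds to bytes that have gone out:
var writtenBytes = 0;

reqStream.on('data', function (chunk) {
  // Assumption: each chunk represents data that has been sent
  writtenBytes += chunk.length;
  var percent = Math.round((writtenBytes / totalBytes) * 100);
  console.log('upload progress: ' + percent + '%');
});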
Whenever you need to contact somebody from the nodejitsu team, join the #nodejitsu channel on IRC; they're really active.

At the time of writing this answer, there isn't really a good way to get upload progress for files being sent to cloudfiles. However, one of the nodejitsu geniuses implemented chunked uploading, which, in my case, eliminates the need for progress reports. Thanks Bradley.

Related

Strapi & react-admin : I'd like to set 'Content-Range' header dynamically when any fetchAll query fires

I'm still a novice web developer, so please bear with me if I miss something fundamental!
I'm creating a backoffice for a Strapi backend, using react-admin.
React-admin library uses a 'data provider' to link itself with an API. Luckily someone already wrote a data provider for Strapi. I had no problem with step 1 and 2 of this README, and I can authenticate to Strapi within my React app.
I now want to fetch and display my Strapi data, starting with Users. In order to do that, quoting Step 3 of this readme: 'In controllers I need to set the Content-Range header with the total number of results to build the pagination'.
So far I tried to do this in my User controller, with no success.
What I try to achieve:
First, I'd like it to simply work with the ctx.set('Content-Range', ...) hard-coded in the controller, as in the aforementioned Step 3.
Second, I've thought it would be very dirty to copy/paste this logic into every controller (not to mention any future controllers), instead of having some callback function dynamically appending the Content-Range header to any fetchAll request. Ultimately that's what I aim for, because with ~40 Strapi objects to administrate already and plenty more to come, it has to scale.
Technical infos
node -v: 11.13.0
npm -v: 6.7.0
strapi version: 3.0.0-alpha.25.2
uname -r output: Linux 4.14.106-97.85.amzn2.x86_64
DB: mySQL v2.16
So far I've tried accessing the count() method of the User model like in the aforementioned Step 3, but my controller doesn't look like the example as I'm working with the users-permissions plugin.
This is the action I've tried to edit (located in project/plugins/users-permissions/controllers/User.js)
find: async (ctx) => {
  let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
  data.reduce((acc, user) => {
    acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
    return acc;
  }, []);
  // Send 200 `ok`
  ctx.send(data);
},
From what I've gathered on Strapi documentation (here and also here), context is a sort of wrapper object. I only worked with Express-generated APIs before, so I understood this snippet as 'use fetchAll method of the User model object, with ctx.query as an argument', but I had no luck logging this ctx.query. And as I can't log stuff, I'm kinda blocked.
In my exploration, I naively tried to log the full ctx object and work from there:
  // Send 200 `ok`
  ctx.send(data);
  strapi.log.info(ctx.query, ' were query');
  strapi.log.info(ctx.request, 'were request');
  strapi.log.info(ctx.response, 'were response');
  strapi.log.info(ctx.res, 'were res');
  strapi.log.info(ctx.req, 'were req');
  strapi.log.info(ctx, 'is full context')
},
Unfortunately, I fear I'm missing something obvious, as it gives me no output at all. Making a fetchAll request from my React app with these logs in place prints this in my terminal:
[2019-09-19T12:43:03.409Z] info were query
[2019-09-19T12:43:03.410Z] info were request
[2019-09-19T12:43:03.418Z] info were response
[2019-09-19T12:43:03.419Z] info were res
[2019-09-19T12:43:03.419Z] info were req
[2019-09-19T12:43:03.419Z] info is full context
[2019-09-19T12:43:03.435Z] debug GET /users?_sort=id:DESC&_start=0&_limit=10& (74 ms)
While in my frontend I get the good ol' The Content-Range header is missing in the HTTP Response message I'm trying to solve.
After writing this wall of text I realize the logging issue is separated from my original problem, but if I was able to at least log ctx properly, maybe I'd be able to find the solution myself.
Trying to summarize:
Actual problem is, how do I set my Content-Range properly in my strapi controller ? (partially answered cf. edit 3)
Collateral problem n°1: Can't even log ctx object (cf. edit 2)
Collateral problem n°2: Once I figure out the actual problem, is it feasible to address it dynamically (basically some callback function for index/fetchAll routes, in which the model is a variable, on which I'd call the appropriate count() method, and finally append the result to my response header)? I'm not asking for the code here, just if you think it's feasible and/or know a more elegant way.
Thank you for reading through, and excuse me if it was confusing; I wasn't sure which infos would be relevant, so I thought the more the better.
/edit1: forgot to mention, in my controller I also tried logging the strapi.plugins['users-permissions'].services.user object to see if it actually has a count() method, but got no luck with that either. I also tried the original snippet (Step 3 of the aforementioned README), but it failed as expected since, afaik, the User model isn't imported anywhere (the only import in User.js being lodash).
/edit2: About the logs, my bad, I just misunderstood the documentation. I now do:
  ctx.send(data);
  strapi.log.info('ctx should be : ', {ctx});
  strapi.log.info('ctx.req = ', {...ctx.req});
  strapi.log.info('ctx.res = ', {...ctx.res});
  strapi.log.info('ctx.request = ', {...ctx.request});
  strapi.log.info('ctx.response = ', {...ctx.response});
ctx logs fine this way; also, it seems the spread operator is needed to display nested objects ({ctx.req} crashes the server, {...ctx.req} is okay). Cool, because it narrows the question down to what's interesting.
/edit3: As expected, having logs helps big time. I've managed to display my users (although in the dirty way). I couldn't find any count() method, but watching the data object that is passed to ctx.send(), it's equivalent to your typical 'res.data', i.e. plain JSON with my user list. So a simple .length did the trick:
let data = await strapi.plugins['users-permissions'].services.user.fetchAll(ctx.query);
data.reduce((acc, user) => {
  acc.push(_.omit(user.toJSON ? user.toJSON() : user, ['password', 'resetPasswordToken']));
  return acc;
}, []);
ctx.set('Content-Range', data.length) // <-- it did the trick
// Send 200 `ok`
ctx.send(data);
Now starting to work on the hard part: the dynamic callback function that will do that for any index/fetchAll call. Will update once I figure it out
I'm using React Admin and Strapi together and installed ra-strapi-provider.
A little boring to paste the Content-Range header into all of my controllers, so I searched for a better solution. Then I found the middleware concept and created one that fits my needs. It's probably not the best solution, but it does its job well:
const _ = require("lodash");

module.exports = strapi => {
  return {
    // can also be async
    initialize() {
      strapi.app.use(async (ctx, next) => {
        await next();
        if (_.isArray(ctx.response.body))
          ctx.set("Content-Range", ctx.response.body.length);
      });
    }
  };
};
I hope it helps
For people still landing on this page:
Strapi has been updated from #alpha to #beta. Careful, as some of the code in my OP is no longer valid; also, some of their documentation is not up to date.
I failed to find a "clever" way to solve this problem; in the end I copy/pasted the ctx.set('Content-Range', data.length) bit in all relevant controllers and it just worked.
If somebody comes up with a clever solution for that problem I'll happily accept their answer. With the current Strapi version I don't think it's doable with policies or lifecycle callbacks.
The "quick & easy fix" is still to customize each relevant Strapi controller.
With strapi#beta you don't have direct access to the controller's code: you'll first need to "rewrite" one with the help of this doc. Then add the ctx.set('Content-Range', data.length) bit. Test it properly with RA; then, for the other controllers, you'll just have to create the folder, name the file, copy/paste your code and "Search & Replace" the model name.
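For example, an overridden find action might look roughly like this (a sketch based on the beta docs; the 'restaurant' model, file path and service call are illustrative, not taken from the question):
// ./api/restaurant/controllers/Restaurant.js  (path and model name are placeholders)
module.exports = {
  async find(ctx) {
    const entities = await strapi.services.restaurant.find(ctx.query);
    // What ra-strapi-provider expects in order to build its pagination
    ctx.set('Content-Range', entities.length);
    return entities;
  },
};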
The "longer & cleaner fix" would be to dive into the react-admin source code and refactorize so the lack of "Content-Range" header doesn't break pagination.
You'll then have to maintain your own react-admin fork, so make sure you're already committed to this library and have A LOT of tables to manage through it (so many that customizing every Strapi controller would be too tedious).
Before forking RA, please remember all the stuff you can do with the Strapi backoffice alone (including embedding your custom React app into it) and ensure it will be worth the trouble.

Optimal method for nodejs to hand of image from database to browser

The end result I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file and then I could just give the url to the file.
I also know I can hand off base64 string to the browser so it can render the image.
My question is: which option is the most optimal, or best practice? Keep in mind that if I go with the stream method, I would have to check whether the image has changed since the last time I displayed it... and if it has changed, I have to re-stream it out of the database.
I have been playing with oracledb for node js and was able to successfully extract one blob into a file, but I am also having trouble streaming multiple files.
This is a two question post:
Which is the most optimal:
1. Send a Base64 string - I kind of like this method because I don't have to worry about streaming out the file and checking if it has changed, since it is coming straight from the database. My concern is: can the browser/nodejs handle it? I know those strings can be very large. I could also be sending more than one image at a time.
2. Stream the blobs into files.
The second part of the question is how can I get multiple blobs out? Below is my code for streaming just one file; I found this example on GitHub, lobstream1.js:
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
  if (lob.type === oracledb.CLOB) {
    console.log('Writing a CLOB to ' + outFileName);
    lob.setEncoding('utf8');  // set the encoding so we get a 'string' not a 'buffer'
  } else {
    console.log('Writing a BLOB to ' + outFileName);
  }

  var errorHandled = false;

  lob.on(
    'error',
    function(err) {
      console.log("lob.on 'error' event");
      if (!errorHandled) {
        errorHandled = true;
        lob.close(function() {
          return cb(err);
        });
      }
    });
  lob.on(
    'end',
    function() {
      console.log("lob.on 'end' event");
    });
  lob.on(
    'close',
    function() {
      // console.log("lob.on 'close' event");
      if (!errorHandled) {
        return cb(null);
      }
    });

  var outStream = fs.createWriteStream(outFileName);
  outStream.on(
    'error',
    function(err) {
      console.log("outStream.on 'error' event");
      if (!errorHandled) {
        errorHandled = true;
        lob.close(function() {
          return cb(err);
        });
      }
    });

  // Switch into flowing mode and push the LOB to the file
  lob.pipe(outStream);
};
Fixed spooling out images with this method; I did change dostream a bit.
for (var x = 0; x < result.rows.length; x++) {
  var outputFileName = x + '.jpg';
  console.log(outputFileName);
  console.log(x);
  var lob = result.rows[x][0];
  dostream(lob, outputFileName);
  // cb(null,lob);
}
Thank you for any help.
Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
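As a rough illustration of that buffered approach (my own sketch, not code from the question; the table, column and pool alias names are placeholders), node-oracledb can return BLOB columns as Buffers via fetchAsBuffer, and the dedicated pool suggested above can be addressed through its poolAlias:
var oracledb = require('oracledb');

// Return BLOB columns as Buffers instead of Lob streams
oracledb.fetchAsBuffer = [oracledb.BLOB];

function getImage(imageId, cb) {
  // 'lobPool' is the alias of the separate pool created with oracledb.createPool()
  oracledb.getConnection('lobPool', function (err, conn) {
    if (err) { return cb(err); }
    conn.execute(
      'SELECT image_blob FROM images WHERE id = :id',  // illustrative table/column names
      [imageId],
      function (err, result) {
        conn.close(function () {});                    // release the connection back to the pool
        if (err) { return cb(err); }
        var row = result.rows[0];
        cb(null, row ? row[0] : null);                 // Buffer with the image bytes, or null
      }
    );
  });
}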
The biggest problem with this scenario is that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js based on what makes the most sense.
Unless you do something like tiling or base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this adds extra I/O; you will need to test and measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb, there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.

Not sure what .listen() in refluxjs does exactly

In refluxjs I'm not sure what .listen() does. From my understanding, it follows the same concept as Node's EventEmitter, but reflux wraps it in its own way. I can't seem to find documentation on this anywhere; maybe I missed it. I would like to find .listen() in the source code or documentation so I know exactly how refluxjs uses it.
Did you try the README? There's a whole section on it: Listening to changes in data store.
Listening to changes in data store
In your component, register to listen to changes in your data store
like this:
// Fairly simple view component that outputs to console
function ConsoleComponent() {
  // Registers a console logging callback to the statusStore updates
  statusStore.listen(function(status) {
    console.log('status: ', status);
  });
};
var consoleComponent = new ConsoleComponent();
Invoke actions as if they were functions:
statusUpdate(true);
statusUpdate(false);
With the setup above this will output the following in the console:
status: ONLINE
status: OFFLINE
And yes, its semantics are pretty much like EventEmitter; it uses eventemitter3 under the hood. listen itself is defined in PublisherMethods.js.
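For completeness, here is roughly what the statusStore and statusUpdate used above look like (adapted from the same README example; treat it as a sketch rather than a verbatim copy):
var Reflux = require('reflux');

// The action the component invokes like a function
var statusUpdate = Reflux.createAction();

// The store the component listens to
var statusStore = Reflux.createStore({
  init: function () {
    // Re-emit whenever statusUpdate is invoked
    this.listenTo(statusUpdate, this.output);
  },
  output: function (flag) {
    // trigger() is what the listen() callbacks receive
    this.trigger(flag ? 'ONLINE' : 'OFFLINE');
  }
});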

Which nodejs library should I use to write into HDFS?

I have a nodejs application and I want to write data into the hadoop HDFS file system. I have seen two main nodejs libraries that can do it: node-hdfs and node-webhdfs. Has anyone tried them? Any hints? Which one should I use in production?
I am inclined to use node-webhdfs since it uses the WebHDFS REST API. node-hdfs seems to be a C++ binding.
Any help will be greatly appreciated.
You may want to check out the webhdfs library. It provides a nice and straightforward interface (similar to the fs module API) for WebHDFS REST API calls.
Writing to the remote file:
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var localFileStream = fs.createReadStream('/path/to/local/file');
var remoteFileStream = hdfs.createWriteStream('/path/to/remote/file');

localFileStream.pipe(remoteFileStream);

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('finish', function onFinish () {
  // Upload is done
});
Reading from the remote file:
var WebHDFS = require('webhdfs');
var hdfs = WebHDFS.createClient();

var remoteFileStream = hdfs.createReadStream('/path/to/remote/file');

remoteFileStream.on('error', function onError (err) {
  // Do something with the error
});

remoteFileStream.on('data', function onChunk (chunk) {
  // Do something with the data chunk
});

remoteFileStream.on('finish', function onFinish () {
  // Read is done
});
Not good news!!!
Do not use node-hdfs. Although it seems promising, it is now two years out of date. I've tried to compile it, but it does not match the symbols of the current libhdfs. If you want to use something like that, you'll have to build your own nodejs binding.
You can use node-webhdfs, but IMHO there's not much advantage in that. It is better to use an http nodejs lib to make your own requests. The hardest part here is dealing with the very async nature of nodejs: you might first want to create a folder, then, after it's successfully created, create a file, and then, at last, write or append data - everything through http requests that you must send and whose answers you must wait for before going on (a minimal sketch of that flow follows below).
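Here's that sketch, using only Node's core http module; the NameNode host, port and user name are placeholders and this is untested illustration code, not a drop-in implementation. Creating a file through WebHDFS is a two-step dance: a PUT to the NameNode that answers with a redirect, then a PUT of the data to the DataNode URL it points you at.
var http = require('http');
var url = require('url');

function createFile(path, data, cb) {
  // Step 1: ask the NameNode where to write; it answers with a 307 redirect
  var req = http.request({
    method: 'PUT',
    host: 'namenode',   // placeholder host
    port: 50070,        // default WebHDFS port
    path: '/webhdfs/v1' + path + '?op=CREATE&user.name=hadoop'
  }, function (res) {
    res.resume();       // discard the (empty) redirect body
    var location = res.headers.location;
    if (!location) { return cb(new Error('No redirect from NameNode')); }
    // Step 2: send the file contents to the DataNode URL from the redirect
    var dn = url.parse(location);
    var dnReq = http.request({
      method: 'PUT',
      host: dn.hostname,
      port: dn.port,
      path: dn.path
    }, function (dnRes) {
      cb(dnRes.statusCode === 201 ? null : new Error('Create failed: ' + dnRes.statusCode));
    });
    dnReq.on('error', cb);
    dnReq.end(data);
  });
  req.on('error', cb);
  req.end();
}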
At least node-webhdfs might be a good reference for you to take a look at when starting your own code.
Br,
Fabio Moreira

Observe file changes with node.js

I have the following use case:
A creates a chat and invites B and C. On the server, A creates a file. A, B and C write messages into this file. A, B and C read this file.
I want A to create a file on the server and observe this file; if anybody else writes something into it, the new content should be sent back with websockets.
So, any change to this file should be observed by my node.js application.
How can I observe file changes? Is this possible with node.js without locking the files?
If it's not possible with files, would it be possible with a database object (NoSQL)?
The good news is that you can observe file changes with Node's API.
This however doesn't give you access to the content that has been written into the file.
You can maybe use the fs.appendFile() function so that when something is being written into the file you emit an event to something else that "logs" the new data being written (see the sketch below).
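As a sketch of that idea (the chatFile object and the 'message' event name are made up for illustration), you could wrap fs.appendFile in an EventEmitter so every successful write also notifies whoever is listening:
var fs = require('fs');
var EventEmitter = require('events').EventEmitter;

var chatFile = new EventEmitter();

chatFile.append = function (path, message, cb) {
  fs.appendFile(path, message + '\n', function (err) {
    if (err) { return cb(err); }
    chatFile.emit('message', message);  // notify listeners, e.g. a websocket broadcast
    cb(null);
  });
};

chatFile.on('message', function (message) {
  console.log('new chat message:', message);
});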
fs.watch(): Directly pasted from the docs
fs.watch('somedir', function (event, filename) {
  console.log('event is: ' + event);
  if (filename) {
    console.log('filename provided: ' + filename);
  } else {
    console.log('filename not provided');
  }
});
Read here about the fs.watch(); function
EDIT: You can use the function
fs.watchFile();
Read here about the fs.watchFile(); function
This will allow you to watch a file for changes, i.e. whenever it is modified by any other process.
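Putting that together with the chat use case, here is a rough sketch (my own; wss is assumed to be an existing ws.Server instance and 'chat.txt' is an illustrative file name) that reads only the newly appended bytes on each change and pushes them to connected clients:
var fs = require('fs');

var lastSize = 0;  // bytes already sent; starts at 0, so the first change sends the whole file

fs.watchFile('chat.txt', { interval: 500 }, function (curr, prev) {
  if (curr.size <= lastSize) { return; }  // nothing new appended
  // Read only the bytes appended since the last change
  var stream = fs.createReadStream('chat.txt', { start: lastSize, end: curr.size - 1 });
  var chunks = [];
  stream.on('data', function (chunk) { chunks.push(chunk); });
  stream.on('end', function () {
    var newContent = Buffer.concat(chunks).toString('utf8');
    wss.clients.forEach(function (client) {
      client.send(newContent);  // push the new content to every connected websocket client
    });
  });
  lastSize = curr.size;
});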
Also you could use node-watch. Here's an easy example:
const watch = require('node-watch')
watch('README.md', function(event, filename) {
  console.log(filename, ' changed.')
})
I do not think you need to observe file changes or use a NoSQL database for this (if you do not want to). My advice would be to look at events (Observer pattern). There are more than enough tutorials on this topic available online (Google), for example Felix's article about Using EventEmitters.
This publish/subscribe semantic can also be achieved with NoSQL. In Redis, for example, I think you should have a look at pubsub.
In MongoDB I think tailable cursors is what you are looking for. On their blog they have a post explaining pub/sub.
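For the Redis route, the pub/sub part is only a few lines with the classic (v3-style) node redis client; the channel name and message here are illustrative:
var redis = require('redis');

var subscriber = redis.createClient();
var publisher = redis.createClient();  // separate connection: a subscribing client cannot publish

subscriber.on('message', function (channel, message) {
  console.log('new message on ' + channel + ': ' + message);
  // here you would push the message to the connected websocket clients
});
subscriber.subscribe('chat');

// Any participant "writes a message" by publishing to the channel
publisher.publish('chat', 'Hello from A');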
