Optimal method for nodejs to hand of image from database to browser - node.js

the end result that I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file and then I could just give the url to the file.
I also know I can hand off base64 string to the browser so it can render the image.
My question is which option is the most optimal? Or best practice? Keep in mind that if I go the stream method, I would have to check to see if the image has changed since the last time I displayed it...and if it has changed then I have to restream it out of the database.
I have been playing with the oracldb for node js and was able to successfully extract one blob into a file but I am also having trouble streaming multiple files.
This is a two question post:
Which is the most optimal:
1. Send Base64 string - I kind of like this method because i dont have to worry about streaming out the file and checking if it has changed since it is coming straight from the databse. My concern is can the browser/nodejs handle it? I know those strings can be very large. I could also be sending more than one image at a time.
Stream the blobs into files.
The second part question is how can i get multiple blobs out below is my code on streaming just one file, i found this example from github lobstream1.js
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
if (lob.type === oracledb.CLOB) {
console.log('Writing a CLOB to ' + outFileName);
lob.setEncoding('utf8'); // set the encoding so we get a 'string' not a 'buffer'
} else {
console.log('Writing a BLOB to ' + outFileName);
}
var errorHandled = false;
lob.on(
'error',
function(err) {
console.log("lob.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
lob.on(
'end',
function() {
console.log("lob.on 'end' event");
});
lob.on(
'close',
function() {
// console.log("lob.on 'close' event");
if (!errorHandled) {
return cb(null);
}
});
var outStream = fs.createWriteStream(outFileName);
outStream.on(
'error',
function(err) {
console.log("outStream.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
// Switch into flowing mode and push the LOB to the file
lob.pipe(outStream);
};
Fixed spooling out images with this method, I did change the dostream a bit.
for(var x = 0; x<result.rows.length;x++)
{
outputFileName = x + '.jpg';
console.log(outputFileName);
console.log(x);
var lob = result.rows[x][0];
dostream(lob,outputFileName);
// cb(null,lob);
}
Thank you for any help.

Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
The biggest problem with this scenario that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js based on what makes the most sense.

Unless you do something like tiling or the base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this seems extra IO - you will need to test to measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.

Related

LokiJS inconsistent with updates & reads while working with 50k+ reads and writes

Currently we are working with nodeJS & LokiJS. as our application is dealing with real-time data; to communicate with external NoSQL/Relational DB will cause the latency problems.
So we decided to use in-memory database i.e, LokiJS.
LokiJs is good at when we are working with a collection which has 500-100 documents in it. but when it comes to the updates & parallel reads; it is worse.
meaning one of our vendor is published Kafka endpoint to consume the feed, and serve it to some external service again; From Kafka topic we are getting 100-200 events per second. So whenever, we received an Kafka event, we are updating the existing collection. As the delta updates are too frequent, LokiJS collection updates are not done properly by that read giving inconsistency results.
Here is my collection creation snippet.
let db= new loki('c:\products.json', {
autoload: true,
autosave: true,
autosaveInterval: 4000
});
this.collection= db.addCollection('REALTIMEFEED', {
indices: ["id", "type"],
adaptiveBinaryIndices: true,
autoupdate: true,
clone: true
});
function update(db, collection, element, id) {
try {
var data = collection.findOne(id);
data.snapshot = Date.now();
data.delta = element;
collection.update(data);
db.saveDatabase();
} catch (error) {
console.error('dbOperations:update:failed,', error)
}
}
Could you please suggest me that, am I missing anything here.
I think your problem lies in the fact that you are saving the database at each update. You already specified an autoSave and an autoSaveInterval, so LokiJS is going to periodically save your data. If you also force saving from each update you are clogging the process, since JS is single-threaded so it has to handle most of the operation (it can keep running when the save operations is passed off to the OS for the file save bit).

How to save records with Asset field use server-to-server cloudkit.js

I want to use server-to-server cloudkit js. to save record with Asset field.
the Asset field is a m4a audio. after saved, the audio file is corrupt to play
The Apple's Doc is not clear about the Asset field.
In a record that is being saved to the database, the value of an Asset field must be a window.Blob type. In the code fragment above, the type of the assetFile variable is window.File.
Docs:
https://developer.apple.com/documentation/cloudkitjs/cloudkit/database/1628735-saverecords
but in nodejs ,there is no Blob or .File, I filled it with a buffer like this code:
var dstFile = path.join(__dirname,"../test.m4a");
var data = fs.readFileSync(dstFile);
let buffer = Buffer.from(data);
var rec = {
recordType: "MyAttachment",
fields: {
ext: { value: ".m4a" },
file: { value: buffer }
}
}
//console.debug(rec);
mydatabase.newRecordsBatch().create(rec).commit().then(function (response) {
if (response.hasErrors) {
console.log(">>> saveAttachFile record failed");
console.warn(response.errors[0]);
} else {
var createdRecord = response.records[0];
console.log(">>> saveAttachFile record success:", createdRecord);
}
});
The record is successful be saved.
But when I download the audio from icloud.developer.apple.com/dashboard .
the audio file is corrupt to play.
What's wrong with it. thank you to reply.
I was having the same problem and have found a working solution!
Remembering that CloudKitJS needs you to define your own fetch method, I implemented a custom one to see what was going on. I then attached a debugger on the custom fetch to inspect the data that was passing through it.
After stepping through the caller, I found that all asset values are transformed using its toString() method only when the library is embedded in NodeJS. This is determined by the absence of the global window object.
When toString() is called on a Buffer, its contents are encoded to UTF-8 (by default), which causes binary assets to become malformed. If you're using node-fetch for your fetch implementation, it supports Buffer and stream.Readable, so this toString() call does nothing but harm.
The most unobtrusive fix I've found is to swap the toString() method on any Buffer or stream.Readable instances passed as an asset field values. You should probably use stream.Readable, by the way, so that you don't load the entire asset in memory when uploading.
Anyway, here's what it looks like in practice:
// Put this somewhere in your implementation
const swizzleBuffer = (buffer) => {
buffer.toString = () => buffer;
return buffer;
};
// Use this asset value instead
{ asset: swizzleBuffer(fs.readFileSync(path)) }
Please be aware that this workaround mutates a Buffer in an ugly way (since Buffer apparently can't be extended). It's probably a good idea to design an API which doesn't use Buffer arguments so that you can mutate instances that only you create yourself to avoid unintended side effects anywhere else in your code.
Also, sure to vendor (make a local copy) of CloudKitJS in your project, as the behavior may change in the future.
ORIGINAL ANSWER
I ran into the same problem and solved it by encoding my data using Base64. It appears that there's a bug in their SDK which mangles Buffer instances containing non-ascii characters (which, um, seems problematic).
Anyway, try something like this:
const assetField = { value: Buffer.from(data.toString('base64')), 'ascii') }
Side note:
You'll need to decode the asset(s) on the device before using them. There's no way to do this efficiently without writing your own routines, as the methods included in Data / NSData instances requires all data to be in memory.
This is a problem with CloudKitJS (and not the native CloudKit client / service), so the other option is to write your own routine to upload assets.
Neither of these options seem particularly great, but rolling your own atleast means there aren't extra steps for clients to take in order to use the asset.

Mongo Change Streams running multiple times (kind of): Node app running multiple instances

My Node app uses Mongo change streams, and the app runs 3+ instances in production (more eventually, so this will become more of an issue as it grows). So, when a change comes in the change stream functionality runs as many times as there are processes.
How to set things up so that the change stream only runs once?
Here's what I've got:
const options = { fullDocument: "updateLookup" };
const filter = [
{
$match: {
$and: [
{ "updateDescription.updatedFields.sites": { $exists: true } },
{ operationType: "update" }
]
}
}
];
const sitesStream = Client.watch(sitesFilter, options);
// Start listening to site stream
sitesStream.on("change", async change => {
console.log("in site change stream", change);
console.log(
"in site change stream, update desc",
change.updateDescription
);
// Do work...
console.log("site change stream done.");
return;
});
It can easily be done with only Mongodb query operators. You can add a modulo query on the ID field where the divisor is the number of your app instances (N). The remainder is then an element of {0, 1, 2, ..., N-1}. If your app instances are numbered in ascending order from zero to N-1 you can write the filter like this:
const filter = [
{
"$match": {
"$and": [
// Other filters
{ "_id": { "$mod": [<number of instances>, <this instance's id>]}}
]
}
}
];
Doing this with strong guarantees is difficult but not impossible. I wrote about the details of one solution here: https://www.alechenninger.com/2020/05/building-kafka-like-message-queue-with.html
The examples are in Java but the important part is the algorithm.
It comes down to a few techniques:
Each process attempts to obtain a lock
Each lock (or each change) has an associated fencing token
Processing each change must be idempotent
While processing the change, the token is used to ensure ordered, effectively-once updates.
More details in the blog post.
It sounds like you need a way to partition updates between instances. Have you looked into Apache Kafka? Basically what you would do is have a single application that writes the change data to a partitioned Kafka Topic and have your node application be a Kafka consumer. This would ensure only one application instance ever receives an update.
Depending on your partitioning strategy, you could even ensure that updates for the same record always go to the same node app (if your application needs to maintain its own state). Otherwise, you can spread out the updates in a round robin fashion.
The biggest benefit to using Kafka is that you can add and remove instances without having to adjust configurations. For example, you could start one instance and it would handle all updates. Then, as soon as you start another instance, they each start handling half of the load. You can continue this pattern for as many instances as there are partitions (and you can configure the topic to have 1000s of partitions if you want), that is the power of the Kafka consumer group. Scaling down works in the reverse.
While the Kafka option sounded interesting, it was a lot of infrastructure work on a platform I'm not familiar with, so I decided to go with something a little closer to home for me, sending an MQTT message to a little stand alone app, and letting the MQTT server monitor messages for uniqueness.
siteStream.on("change", async change => {
console.log("in site change stream);
const mqttClient = mqtt.connect("mqtt://localhost:1883");
const id = JSON.stringify(change._id._data);
// You'll want to push more than just the change stream id obviously...
mqttClient.on("connect", function() {
mqttClient.publish("myTopic", id);
mqttClient.end();
});
});
I'm still working out the final version of the MQTT server, but the method to evaluate uniqueness of messages will probably store an array of change stream IDs in application memory, as there is no need to persist them, and evaluate whether to proceed any further based on whether that change stream ID has been seen before.
var mqtt = require("mqtt");
var client = mqtt.connect("mqtt://localhost:1883");
var seen = [];
client.on("connect", function() {
client.subscribe("myTopic");
});
client.on("message", function(topic, message) {
context = message.toString().replace(/"/g, "");
if (seen.indexOf(context) < 0) {
seen.push(context);
// Do stuff
}
});
This doesn't include security, etc., but you get the idea.
Will that having a field in DB called status which will be updated using findAnUpdate based on the event received from change stream. So lets say you get 2 events at the same time from change stream. First event will update the status to start and the other will throw error if status is start. So the second event will not process any business logic.
I'm not claiming those are rock-solid production grade solutions, but I believe something like this could work
Solution 1
applying Read-Modify-Write:
Add version field to the document, all the created docs have version=0
Receive ChangeStream event
Read the document that needs to be updated
Perform the update on the model
Increment version
Update the document where both id and version match, otherwise discard the change
Yes, it creates 2 * n_application_replicas useless queries, so there is another option
Solution 2
Create collection of ResumeTokens in mongo which would store collection -> token mapping
In the changeStream handler code, after successful write, update ResumeToken in the collection
Create a feature toggle that will disable reading ChangeStream in your application
Configure only a single instance of your application to be a "reader"
In case of "reader" failure you might either enable reading on another node, or redeploy the "reader" node.
As a result: there might be an infinite amount of non-reader replicas and there won't be any useless queries

Is there any risk to read/write the same file content from different 'sessions' in Node JS?

I'm new in Node JS and i wonder if under mentioned snippets of code has multisession problem.
Consider I have Node JS server (express) and I listen on some POST request:
app.post('/sync/:method', onPostRequest);
var onPostRequest = function(req,res){
// parse request and fetch email list
var emails = [....]; // pseudocode
doJob(emails);
res.status(200).end('OK');
}
function doJob(_emails){
try {
emailsFromFile = fs.readFileSync(FILE_PATH, "utf8") || {};
if(_.isString(oldEmails)){
emailsFromFile = JSON.parse(emailsFromFile);
}
_emails.forEach(function(_email){
if( !emailsFromFile[_email] ){
emailsFromFile[_email] = 0;
}
else{
emailsFromFile[_email] += 1;
}
});
// write object back
fs.writeFileSync(FILE_PATH, JSON.stringify(emailsFromFile));
} catch (e) {
console.error(e);
};
}
So doJob method receives _emails list and I update (counter +1) these emails from object emailsFromFile loaded from file.
Consider I got 2 requests at the same time and it triggers doJob twice. I afraid that when one request loaded emailsFromFile from file, the second request might change file content.
Can anybody spread the light on this issue?
Because the code in the doJob() function is all synchronous, there is no risk of multiple requests causing a concurrency problem.
If you were using async IO in that function, then there would be possible concurrency issues.
To explain, Javascript in node.js is single threaded. So, there is only one thread of Javascript execution running at a time and that thread of execution runs until it returns back to the event loop. So, any sequence of entirely synchronous code like you have in doJob() will run to completion without interruption.
If, on the other hand, you use any asynchronous operations such as fs.readFile() instead of fs.readFileSync(), then that thread of execution will return back to the event loop at the point you call fs.readFileSync() and another request can be run while it is reading the file. If that were the case, then you could end up with two requests conflicting over the same file. In that case, you would have to implement some form of concurrency protection (some sort of flag or queue). This is the type of thing that databases offer lots of features for.
I have a node.js app running on a Raspberry Pi that uses lots of async file I/O and I can have conflicts with that code from multiple requests. I solved it by setting a flag anytime I'm writing to a specific file and any other requests that want to write to that file first check that flag and if it is set, those requests going into my own queue are then served when the prior request finishes its write operation. There are many other ways to solve that too. If this happens in a lot of places, then it's probably worth just getting a database that offers features for this type of write contention.

Observe file changes with node.js

I have the following UseCase:
A creates a Chat and invites B and C - On the Server A creates a
File. A, B and C writes messages into this file. A, B and C read this
file.
I want a to create a file on server and observe this file if anybody else writes something into this file send the new content back with websockets.
So, any change of this file should be observed by my node.js application.
How can I observe files-changes? Is this possible with node js without locking the files?
If not possible with files, would it be possible with database object (NoSQL)
Good news is that you can observe filechanges with Node's API.
This however doesn't give you access to the contents that has been written into the file.
You can maybe use the fs.appendFile(); function so that when something is being written into the file you emit an event to something else that "logs" your new data that is being written.
fs.watch(): Directly pasted from the docs
fs.watch('somedir', function (event, filename) {
console.log('event is: ' + event);
if (filename) {
console.log('filename provided: ' + filename);
} else {
console.log('filename not provided');
}
});
Read here about the fs.watch(); function
EDIT: You can use the function
fs.watchFile();
Read here about the fs.watchFile(); function
This will allow you to watch a file for changes. Ie. whenever it is accessed by some other processes of any kind.
Also you could use node-watch. Here's an easy example:
const watch = require('node-watch')
watch('README.md', function(event, filename) {
console.log(filename, ' changed.')
})
I do not think you need to have observe file changes or use a NoSQL database for this (if you do not want to). My advice would be to look at events(Observer pattern). There are more than enough tutorials on this topic available online (Google). For example Felix's article about Using EventEmitters
This publish/subcribe semantic can also be achieved with NoSQL. In Redis for example, I think you should have a look at pubsub.
In MongoDB I think tailable cursors is what you are looking for. On their blog they have a post explaining pub/sub.

Resources