Keep Object In Memory Between Requests with SailsJS/ Express - node.js

I'm building a server using SailsJS (a framework built on top of Express) and I need to keep an object in memory between requests. I would like to do this because loading it to/ from a database is taking way too long. Any ideas how I could do this?
Here's my code:
var params = req.params.all();
Network.findOne({ id: params.id }, function(err, network) {
if(network) {
var synapticNetwork = synaptic.Network.fromJSON(network.jsonValue);
if(synapticNetwork) { ...
Specifically, the fromJSON() function takes way too long and I would rather keep the synapticNetwork object in memory while the server is running (aka. load it when the server starts and just save periodically).

There are plenty of libraries out there for caching purposes, one of which is node-cache, as you've mentioned. Most of them share a similar API, for example memory-cache:
var cache = require('memory-cache');
// now just use the cache
cache.put('foo', 'bar');
console.log(cache.get('foo'))
You can also implement your own module and just require it wherever you need:
var cache = {};
module.exports = {
put: function(key, item) {
cache[key] = item;
},
get: function(key) {
return cache[key];
}
}
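Applied to the question, a module-level cache like this can hold the already-deserialized object, so that the slow fromJSON() call runs once per network rather than once per request. The following is a minimal sketch, not SailsJS-specific code: `getOrLoad`, `invalidate` and the `deserialize` callback are hypothetical names, and in the real app `deserialize` would wrap `synaptic.Network.fromJSON(network.jsonValue)`.

```javascript
// networkCache.js: hypothetical helper module (all names are illustrative)
const cache = new Map();

// Return the cached object for `key`, building it with `deserialize`
// only the first time it is requested. `deserialize` stands in for an
// expensive call such as synaptic.Network.fromJSON(network.jsonValue).
function getOrLoad(key, deserialize) {
  if (!cache.has(key)) {
    cache.set(key, deserialize());
  }
  return cache.get(key);
}

// Drop an entry, e.g. after the stored JSON has been replaced.
function invalidate(key) {
  cache.delete(key);
}

module.exports = { getOrLoad, invalidate };
```

Because the Map lives in module scope, it survives between requests for as long as the process runs. If several server processes are started, each one holds its own copy.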

There are a lot of potential solutions. The first and most obvious one is using some session middleware for express. Most web frameworks should have some sort of session solution.
https://github.com/expressjs/session
The next option would be to use a caching utility like the one Vsevolod suggested. It accomplishes pretty much the same thing as a session, except that if the data needs to be tied to a user/session you'll have to store some kind of identifier in the session and use that to retrieve the data from the cache, which I think is a bit redundant if that's your use case.
There are also utilities that will expand your session middleware and persist objects in the session to a database or other kinds of data stores, so that session information isn't lost even after server restarts. You still get the speed of an in-memory store, but backed by a database in case the in-memory store gets blown away.
Another option is to use Redis. You still have to serialize/deserialize your objects, but Redis is an in-memory data store and is super quick to write to and read from.

Related

Tracking currently active users in node.js

I am building an application using node.js and socket.io. I would like to create a table of users who are actively browsing the site at any given moment, which will update dynamically.
I am setting a cookie to give each browser a unique ID, and have a mysql database of all users (whether online or not); however, I'm not sure how best to use these two pieces of information to determine who is, and who isn't, actively browsing right now.
The simplest way would seem to be to store the cookie & socket IDs in an array, but I have read that global variables (which presumably this would have to be) are generally bad, and to be avoided.
Alternatively I could create a new database table, where IDs are inserted and deleted when a socket connects/disconnects; but I'm not sure whether this would be overkill.
Is one of these methods any better than the other, or is there a way of tracking this information which I haven't thought of yet?
You can keep track of active users in memory without it being a global variable. It can simply be a module level variable. This is one of the advantages of the nodejs module system.
The reasons to put it in a database instead of memory are:
You have multiple servers so you need a centralized place to put the data
You want the data stored persistently so if the server is restarted (normally or abnormally) you will have the recent data
The reasons for not putting it directly in a database:
It's a significant load of new database operations since you have to update the data on every single incoming request.
You can sometimes get the persistence without directly using a database by logging the access to a log file and then running cron jobs that parse the logs and do bulk addition of data to the database. This has a downside in that it's not as easy to query live data (since the most recent data is sitting in log files and hasn't been parsed yet).
For an in-memory store, you could do something like this:
// middleware that keeps track of user access
let userAccessMap = new Map();
app.use((req, res, next) => {
// get userId from the cookie (substitute your own cookie logic here)
let id = req.cookies.userID; // assumes cookie-parser (or similar) populates req.cookies
let lastAccess = Date.now();
// if you want to keep track of more than just lastAccess,
// you can store an object of data here instead of just the lastAccess time
// To update it, you would get the previous object, update some properties
// in it, and then set it back in the userAccessMap
userAccessMap.set(id, lastAccess);
next();
});
// routinely clean up the userAccessMap to remove old access times
// so it doesn't just grow forever
const cleanupFrequency = 30 * 60 * 1000; // run cleanup every 30 minutes
const cleanupTarget = 24 * 60 * 60 * 1000; // clean out users who haven't been here in the last day
setInterval(() => {
let now = Date.now();
for (let [id, lastAccess] of userAccessMap.entries()) {
if (now - lastAccess > cleanupTarget) {
// delete users who haven't been here in a long time
userAccessMap.delete(id);
}
}
}, cleanupFrequency);
// Then, create some sort of administrative interface (probably with some sort of access protection)
// that gives you access to the user access info
// This might even be available in a separate web server on a separate port that isn't open to the general public
app.get("/userAccessData", (req, res) => {
// perhaps convert this to a human readable user name by looking up the user id
// also may want to sort the data by recentAccess
res.json(Array.from(userAccessMap));
});
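To turn the stored timestamps into the "currently active" table the question asks for, one option is to treat a user as active if their last access falls within a recent window. This is a minimal sketch: `getActiveUsers` and the five-minute window are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical helper: list the user ids seen within the last `windowMs` ms.
// `accessMap` has the same shape as userAccessMap above (id -> timestamp).
function getActiveUsers(accessMap, windowMs, now = Date.now()) {
  const active = [];
  for (const [id, lastAccess] of accessMap) {
    if (now - lastAccess <= windowMs) {
      active.push(id);
    }
  }
  return active;
}

// Example with fixed timestamps so the result is easy to follow
const sample = new Map([
  ['alice', 1000000],          // seen at t = 1,000,000 ms
  ['bob', 1000000 - 600000],   // seen 10 minutes earlier
]);
const fiveMinutes = 5 * 60 * 1000;
console.log(getActiveUsers(sample, fiveMinutes, 1000000)); // [ 'alice' ]
```

The same function could back the administrative route, or feed a socket.io broadcast whenever the list changes.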

Optimal method for nodejs to hand off images from database to browser

the end result that I need is to send multiple images to a web browser from a database.
The images are stored as blobs.
I know I can stream them out of the database and into a file and then I could just give the url to the file.
I also know I can hand off base64 string to the browser so it can render the image.
My question is which option is the most optimal? Or best practice? Keep in mind that if I go the stream method, I would have to check to see if the image has changed since the last time I displayed it...and if it has changed then I have to restream it out of the database.
I have been playing with oracledb for Node.js and was able to successfully extract one blob into a file, but I am having trouble streaming multiple files.
This is a two question post:
Which is the most optimal:
1. Send Base64 string - I kind of like this method because I don't have to worry about streaming out the file and checking if it has changed, since it is coming straight from the database. My concern is: can the browser/nodejs handle it? I know those strings can be very large. I could also be sending more than one image at a time.
2. Stream the blobs into files.
The second part of the question is how I can get multiple blobs out. Below is my code for streaming just one file; I found this example on GitHub (lobstream1.js):
https://raw.githubusercontent.com/oracle/node-oracledb/master/examples/lobstream1.js
Focusing on the code:
// Stream a LOB to a file
var dostream = function(lob, cb) {
if (lob.type === oracledb.CLOB) {
console.log('Writing a CLOB to ' + outFileName);
lob.setEncoding('utf8'); // set the encoding so we get a 'string' not a 'buffer'
} else {
console.log('Writing a BLOB to ' + outFileName);
}
var errorHandled = false;
lob.on(
'error',
function(err) {
console.log("lob.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
lob.on(
'end',
function() {
console.log("lob.on 'end' event");
});
lob.on(
'close',
function() {
// console.log("lob.on 'close' event");
if (!errorHandled) {
return cb(null);
}
});
var outStream = fs.createWriteStream(outFileName);
outStream.on(
'error',
function(err) {
console.log("outStream.on 'error' event");
if (!errorHandled) {
errorHandled = true;
lob.close(function() {
return cb(err);
});
}
});
// Switch into flowing mode and push the LOB to the file
lob.pipe(outStream);
};
Fixed spooling out images with this method, I did change the dostream a bit.
for(var x = 0; x<result.rows.length;x++)
{
outputFileName = x + '.jpg';
console.log(outputFileName);
console.log(x);
var lob = result.rows[x][0];
dostream(lob,outputFileName);
// cb(null,lob);
}
Thank you for any help.
Given all the detail you provided in subsequent comments including the average image size, number of distinct images, memory available to Node.js, number of concurrent users, and the fact that it's "very critical to have the images up to date", here's my initial take...
For the first implementation, stick to the KISS principle and avoid over-engineering. Disable browser caching and don't cache images in Node.js. Instead, rely on the driver and Oracle Database to do the heavy lifting for you.
As for the table storing the images, try to use SecureFile LOBs over BasicFile LOBs (they are known to perform better) if possible. Also, look at the caching options available to both (CACHE, CACHE READS, and NOCACHE). Consider enabling the CACHE READS option based on your stated workload, but work with your DBA to ensure the buffer cache is sized appropriately so you will not impact others.
You can rely on the connection pool's connection request queue to help control how many people are fetching files concurrently. In fact, you might want to create a separate pool just for this purpose so that people fetching LOBs aren't blocking people doing other things in the application. For example, let's say you normally have one connection pool with 10 connections. You could create two connection pools with 5 connections each (use the connection pool cache to make this easy). Then, in the code path that fetches lobs, use the lob pool and use the other pool for everything else.
Given this setup, I'd also recommend NOT streaming the LOBs. Using the driver's ability to buffer the LOBs in Node.js will greatly simplify the code and you should have plenty of memory given such a small number of concurrent users/file fetches.
The biggest problem with this scenario is that the images are pretty large and they'll always be flowing from the database through Node.js to the browser. But since you'll be on an internal network, this might not be much of a problem. If it does turn out to be a problem, you can start to add caching in either the browser or Node.js based on what makes the most sense.
Unless you do something like tiling or the base64 inline encoding, each image needs its own URL, so each invocation of node-oracledb would return just one image. You could do some kind of caching by writing to disk, but this seems extra IO - you will need to test to measure your own system's performance and memory requirements. Regarding accessing multiple images in node-oracledb there's some code in https://github.com/oracle/node-oracledb/issues/1041#issuecomment-459002641 that may be useful.
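On the Base64 option raised in the question, it may help to quantify the cost: Base64 encodes every 3 bytes as 4 characters, so the payload grows by about a third before any JSON or HTML wrapping. A small sketch using Node's built-in Buffer (the 300,000-byte size is an arbitrary stand-in for an image, not a figure from the thread):

```javascript
// Stand-in for image bytes read from a BLOB column (size is arbitrary)
const imageBytes = Buffer.alloc(300000);

// Base64 turns each 3-byte group into 4 ASCII characters
const encoded = imageBytes.toString('base64');

console.log(imageBytes.length); // 300000
console.log(encoded.length);    // 400000

// A data URI that a browser could render directly in an <img> tag
const dataUri = 'data:image/jpeg;base64,' + encoded;
```

Whether that overhead matters depends on image size and how many images are sent per page, so it is worth measuring against the one-URL-per-image approach.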

Best way to reuse a large translation file within Node / Express

I'm new to Node but I figured I'd jump right in and start converting a PHP app into Node/Express. It's a bilingual app that uses gettext with PO/MO files. I found a Node module called node-gettext. I'd rather not convert the PO files into another format right now, so it seems this library is my only option.
So my concern is that right now, before every page render, I'm doing something like this:
exports.home_index = function(req, res)
{
var gettext = require('node-gettext'),
gt = new gettext();
var fs = require('fs');
gt.textdomain('de');
var fileContents = fs.readFileSync('./locale/de.mo');
gt.addTextdomain('de', fileContents);
res.render(
'home/index.ejs',
{ gt: gt }
);
};
I'll also be using the translations in classes, so with how it's set up now I'd have to load the entire translation file again every time I want to translate something in another place.
The translation file is about 50k and I really don't like having to do file operations like this on every page load. In Node/Express, what would be the most efficient way to handle this (aside from a database)? Usually a user won't even be changing their language after the first time (if they're changing it from English).
EDIT:
Ok, I have no idea if this is a good approach, but it at least lets me reuse the translation file in other parts of the app without reloading it everywhere I need to get translated text.
In app.js:
var express = require('express'),
app = express(),
...
gettext = require('node-gettext'),
gt = new gettext();
Then, also in app.js, I create the variable app.locals.gt to contain the gettext/translation object, and I include my middleware function:
app.locals.gt = gt;
app.use(locale());
In my middleware file I have this:
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
var fs = require('fs');
req.app.locals.gt.textdomain(lang);
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.gt.addTextdomain(lang, fileContents);
next();
};
};
It doesn't seem like a good idea to assign the loaded translation file to app, since depending on the current request that file will be one of two languages. If I assigned the loaded translation file to app instead of a request variable, can that mix up users' languages?
Anyway, I know there's got to be a better way of doing this.
The simplest option would be to do the following:
Add this in app.js:
var languageDomains = {};
app.locals.languageDomains = languageDomains; // the middleware below reads it from req.app.locals
Then modify your Middleware:
module.exports = function locale()
{
return function(req, res, next)
{
// do stuff here to populate lang variable
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs');
var fileContents = fs.readFileSync('./locales/' + lang + '.mo');
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
req.textdomain = req.app.locals.gt.textdomain(lang);
next();
};
};
By checking if the file has already been loaded you are preventing the action from happening multiple times, and the domain data will stay resident in the server's memory. The downside to the simplicity of this solution is that if you ever change the contents of your .mo files whilst the server is running, the changes won't be taken into account. However, this code could be extended to keep an eye on the mtime of the files and reload accordingly, or make use of fs.watchFile if required:
if ( !req.app.locals.languageDomains[lang] ) {
var fs = require('fs'), filename = './locales/' + lang + '.mo';
var fileContents = fs.readFileSync(filename);
fs.watchFile(filename, function (curr, prev) {
req.app.locals.gt.addTextdomain(lang, fs.readFileSync(filename));
});
req.app.locals.languageDomains[lang] = true;
req.app.locals.gt.addTextdomain(lang, fileContents);
}
Warning: It should also be noted that using sync versions of functions outside of server initialisation is not a good idea because it can freeze the thread. You'd be better off changing your sync loading to the async equivalent.
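One way to follow that advice is to cache a promise per file, so the translation file is read asynchronously at most once and concurrent requests share the same read. This is a sketch under assumptions: `loadDomainFile` and its `localesDir` parameter are hypothetical names, and handing the resulting buffer to `gt.addTextdomain` is left to the caller.

```javascript
const fs = require('fs');
const path = require('path');

// filename -> Promise<Buffer>; module-level, so shared by all requests
const domainFiles = new Map();

// Hypothetical helper: read <localesDir>/<lang>.mo without blocking the
// event loop, reusing the same promise for repeated or concurrent calls.
function loadDomainFile(localesDir, lang) {
  const filename = path.join(localesDir, lang + '.mo');
  if (!domainFiles.has(filename)) {
    const pending = fs.promises.readFile(filename);
    // Forget failed reads so that a later request can retry
    pending.catch(() => domainFiles.delete(filename));
    domainFiles.set(filename, pending);
  }
  return domainFiles.get(filename);
}
```

In the middleware this could be used as `loadDomainFile('./locales', lang).then(function (contents) { /* add the domain if not yet added */ next(); }).catch(next);`, so that a failed read is passed on to Express's error handling.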
After the above changes, rather than passing gt to your template, you should be able to use req.textdomain instead. It seems that the gettext library supports a number of requests directly on each domain object, which means you hopefully don't need to refer to the global gt object on a per-request basis (which will be changing its default domain on each request):
Each domain supports:
getTranslation
getComment
setComment
setTranslation
deleteTranslation
compilePO
compileMO
Taken from here:
https://github.com/andris9/node-gettext/blob/e193c67fdee439ab9710441ffd9dd96d027317b9/lib/domain.js
update
A little bit of further clarity.
Once the server has loaded the file into memory the first time, it should remain there for all subsequent connections it receives (for any visitor/request) because it is stored globally and won't be garbage collected, unless you remove all references to the data, which would mean gettext would need to have some kind of unload/forget domain method.
Node is different to PHP in that its environment is shared and wraps its own HTTP server (if you are using something like Express), which means it is very easy to remember data globally as it has a constant environment that all the code is executed within. PHP is always executed after the HTTP server has received and dealt with the request (e.g. Apache). Each PHP response is then executed in its own separate run-time, which means you have to rely on databases, sessions and cache stores to share even simple information and most resources.
further optimisations
Obviously with the above you are still running translations on each page load. The gettext library will be using the translation data resident in memory, but the lookups themselves take up processing time. To get around this, it would be best to make sure your URLs have something that makes them unique for each language, i.e. my-page/en/ or my.page.fr or even jp.domain.co.uk/my-page, and then enable some kind of full page caching using something like memcached or express-view-cache. However, once you start caching pages you need to make certain there aren't any regions that are user specific; if there are, you need to start implementing more complicated systems that are sensitive to these areas.
Remember: The golden rule of optimisation, don't do so before you need to... basically meaning I wouldn't worry about page caching until you know it's going to be an issue, but it is always worth bearing in mind what your options are, as it should shape your code design.
update 2
Just to illustrate a bit further on the behaviour of a server running in JavaScript, and how the global behaviour is not just a property of req.app, but in fact any object that is further up the scope chain.
So, as an example, instead of adding var languageDomains = {}; to your app.js, you could instantiate it further up the scope of wherever your middleware is placed. It's best to keep your global entities in one place however, so app.js is the better place, but this is just for illustration.
var languageDomains = {};
module.exports = function locale()
{
/// you can still access languageDomains here, and it will behave
/// globally for the entire server.
languageDomains[lang]
}
So basically, whereas with PHP the entire code-base is re-executed on each request (so languageDomains would be instantiated anew each time), in Node the only part of the code to be re-executed is the code within locale() (because it is triggered as part of a new request). This function will still have a reference to the already existing and defined languageDomains via the scope chain. Because languageDomains is never reset (on a per-request basis) it will behave globally.
Concurrent users
Node.js is single threaded. This means that in order for it to be concurrent i.e. handle multiple requests at the "same" time, you have to code your app in such a way that each little part can be executed very quickly and then slip into a waiting state, whilst another part of another request is dealt with.
This is the reason for the asynchronous and callback nature of Node, and the reason to avoid Sync calls whilst your app is running. Any one Sync request could halt or freeze execution of the thread and delay handling for all other requests. The reason why I state this is to give you a better idea of how multiple users might interact with your code (and global objects).
Basically, once a request is being dealt with by your server, it is its only focus until that particular execution cycle ends, i.e. your request handler stops calling other code that needs to run synchronously. Once that happens the next queued item is dealt with (a callback or something); this could be part of another request, or it could be the next part in the current request.
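The effect can be seen in a few lines: a timer callback that is already due cannot run until the synchronous code in front of it has finished. A minimal illustration, in which the busy loop is a deliberately bad stand-in for any long Sync call:

```javascript
const order = [];

// Queue a callback that is due immediately
setTimeout(() => order.push('timer callback'), 0);

// Block the thread for roughly 50 ms, as a long Sync call would
const start = Date.now();
while (Date.now() - start < 50) {
  // busy wait
}
order.push('synchronous code finished');

// The timer has been due for about 50 ms, but it has not run yet: it only
// gets its turn once the current synchronous execution ends.
console.log(order); // [ 'synchronous code finished' ]
```

In a server, the blocked callback would be another visitor's request waiting for its turn.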

sqlite returns SQLITE_BUSY in WAL mode

I have a web application working with sqlite database.
My version of sqlite is the latest from official windows binary distribution - 3.7.13.
The problem is that under heavy load on database, sqlite API functions (such as sqlite3_step) are returning SQLITE_BUSY.
I pass the following pragmas when initializing a connection:
journal_mode = WAL
page_size = 4096
synchronous = FULL
foreign_keys = on
The database is a single-file database. I'm using Mono 2.10.8 and the Mono.Data.Sqlite assembly provided with it to access the database.
I'm testing it with 50 parallel threads which are sending 50 subsequent http-requests each to my application. On every request some reading and writing are done to the database. Every set of IO operations is executed inside the transaction.
Everything goes well until somewhere around the 400th to 700th request. At this (random) point the API functions start returning SQLITE_BUSY permanently (to be more exact, until the retry limit is reached).
As far as I know, WAL mode transparently supports parallel reads and writes. I guessed that the cause could be an attempt to read the database while a checkpoint operation is executing. But even after turning autocheckpoint off the situation remains the same.
What could be wrong in this situation?
How to serve large amount of parallel database IO correctly?
P.S.
Only one connection per request is supposed.
I use nhibernate configured with WebSessionContext.
I initialize my NHibernate session like this:
ISession session = null;
//factory variable is session factory
if (CurrentSessionContext.HasBind(factory))
{
session = factory.GetCurrentSession();
if (session == null)
CurrentSessionContext.Unbind(factory);
}
if (session == null)
{
session = factory.OpenSession();
CurrentSessionContext.Bind(session);
}
return session;
And on HttpApplication.EndRequest i release it like this:
//factory variable is session factory
if (CurrentSessionContext.HasBind(factory))
{
try
{
CurrentSessionContext.Unbind(factory)
.Dispose();
}
catch (Exception ee)
{
Logr.Error("Error uninitializing session", ee);
}
}
So as far as I know there should be only one connection per request life cycle. While processing the request, code is executed sequentially (ASP.NET MVC 3). So it doesn't look like any concurrency is possible here. Can I conclude that no connections are shared in this case?
It's not clear to me if the request threads share the same connection or not. If they don't then you should not be having these issues.
Assuming that you are indeed sharing the connection object across multiple threads, you should use some locking mechanism, as the SqliteConnection isn't thread-safe (an old post, but the SQLite library maintained as part of Mono evolved from System.Data.SQLite found on http://sqlite.phxsoftware.com).
So assuming that you don't lock around using the SqliteConnection object, can you please try it? A simple way to accomplish this could look like this:
static readonly object _locker = new object();
public void ProcessRequest()
{
lock (_locker) {
using (IDbCommand dbcmd = conn.CreateCommand()) {
string sql = "INSERT INTO foo VALUES ('bar')";
dbcmd.CommandText = sql;
dbcmd.ExecuteNonQuery();
}
}
}
You may however choose to open a distinct connection with each thread to ensure you don't have any more threading issues with the SQLite library.
EDIT
Following-up on the code you posted, do you close the session after committing the transaction? If you don't use some ITransaction, do you flush and close the session? I'm asking since I don't see it in your code, and I see it mentioned in https://stackoverflow.com/a/43567/610650
I also see it mentioned on http://nhibernate.info/doc/nh/en/index.html#session-configuration:
Also note that you may call NHibernateHelper.GetCurrentSession(); as
many times as you like, you will always get the current ISession of
this HTTP request. You have to make sure the ISession is closed after
your unit-of-work completes, either in Application_EndRequest event
handler in your application class or in a HttpModule before the HTTP
response is sent.

Persistent Sessions in Meteor

So, one of the more confusing aspects I've been observing with Meteor is that Sessions get cleared every refresh. Since it isn't a persistent store, where would I put things like userid, or where people are in my application's state machine?
What are the patterns for those scenarios?
Actually what you could do is create a "subclass" of Session that stores the value in Amplify's local storage when set() is called. You would automatically inherit all the reactive properties of Session. Here is the code, it worked for me:
SessionAmplify = _.extend({}, Session, {
keys: _.object(_.map(amplify.store(), function(value, key) {
return [key, JSON.stringify(value)]
})),
set: function (key, value) {
Session.set.apply(this, arguments);
amplify.store(key, value);
},
});
Just replace all your Session.set/get calls with SessionAmplify.set/get calls. When set() is called, the parent Session method is called, as well as amplify.store(). When the "subclass" is first created, it loads everything that is in amplify's store inside its keys, so that they can be retrieved right away with get().
You can test a working variation of the Leaderboard example here: https://github.com/sebastienbarre/meteor-leaderboard
Well, for a start I would use Meteor's built-in Auth to store the userID. I think it uses local storage by default, but AFAIK there's no easy way to hook into that.
However, I would have thought if you want stuff to survive across refreshes you should either store it in mongo or use the URL to indicate where they are in the 'state machine'. You can use the bootstrap router (for example) to use pushState to change the URL.
