We are trying to implement the strategy outlined in the following presentation (slides 13-18) using nodejs/mongo-native driver.
https://www.slideshare.net/mongodb/securing-mongodb-to-serve-an-awsbased-multitenant-securityfanatic-saas-application
In summary:
Create a connection pool to mongodb from node.js.
For every request for a tenant, get a conenction from the pool and "authenticate" it. Use the authenticated conenection to serve the request. After response, return the connection to the pool.
Im able to create a connection pool to mongodb without specifying any database using the mongo-native driver like so:
const client = new MongoClient('mongodb://localhost:27017', { useNewUrlParser: true, poolSize: 10 });
However, in order to get a db object, I need to do the following:
const db = client.db(dbName);
This is where I would like to authenticate the connection, and it AFAICS, this functionality has been deprecated/removed from the more recent mongo drivers, node.js and java.
Going by the presentation, looks like this was possible to do with older versions of the Java driver.
Is it even possible for me to use a single connection pool and authenticate tenants to individual databases using the same connections ?
The alternative we have is to have a connection pool per tenant, which is not attractive to us at this time.
Any help will be appreciated, including reasons why this feature was deprecated/removed.
it's me from the slides!! :) I remember that session, it was fun.
Yeah that doesn't work any more, they killed this magnificent feature like 6 months after we implemented it and we were out with it in Beta at the time. We had to change the way we work..
It's a shame since till this day, in Mongo, "connection" (network stuff, SSL, cluster identification) and authentication are 2 separate actions.
Think about when you run mongo shell, you provide the host, port, replica set if any, and your in, connected! But not authenticated. You can then authenticate to user1, do stuff, and then authenticate to user2 and do stuff only user2 can do. And this is done on the same connection! without going thru the overhead creating the channel again, SSL handshake and so on...
Back then, the driver let us have a connection pool of "blank" connections that we could authenticate at will to the current tenant in context of that current execution thread.
Then they deprecated this capability, I think it was with Mongo 2.4. Now they only supported connections that are authenticated at creation. We asked enterprise support, they didn't say why, but to me it looked like they found this way is not secured, "old" authentication may leak, linger on that "not so blank" reusable connection.
We made a change in our multi-tenancy infra implementation, from a large pool of blank connections to many (small) pools of authenticated connections, a pool per tenant. These pools per tenant can be extremely small, like 3 or 5 connections. This solution scaled nicely to several hundreds of tenants, but to meet thousands of tenants we had to make all kinds of optimizations to create pools as needed, close them after idle time, lazy creation for non-active or dormant tenants, etc. This allowed us to scale even more... We're still looking into solutions and optimizations.
You could always go back to a global pool of authenticated connections to a Mongo user that have access to multiple databases. Yes, you can switch database on that same authenticated connection. You just can't switch authentication..
This is an example of pure Mongo Java driver, we used Spring which provide similar functionality:
MongoClient mongoClient = new MongoClient();
DB cust1db = mongoClient.getDB("cust1");
cust1db.get...
DB cust2db = mongoClient.getDB("cust2");
cust2db.get...
Somewhat related, I would recommend looking at MongoDB encryption at rest, it's an enterprise feature. The only way to encrypt each database (each customer) according to a different key.
Related
I wanted to create two database connection and periodically check the status of database connection. If database 1 fails I want to switch the connection to database 2. Can you give me some pointers please?
The first question is what exactly are you trying to do during the 'failover'. Is the Node.js app doing some work for users, or is it just the monitoring script? Then, next is how are your DBs configured (I'm guessing it's not RAC)?
There are all sorts of 'high availability' options and levels in Oracle, many of which are transparent to the application and are available when you use a connection pool. A few are described in the node-oracledb doc - look at Connections and High Availability. Other things can be configured in a tnsnames.ora file such as connection retries if connection requests fail.
As the most basic level the answer to your question is that you could periodically check whether a query works, or just use connection.ping(). If you go this 'roll your own' route, use a connection pool with size 1 for each DB. Then periodically get the connection from the pool and use it.
If you update your question with details, it would be easier to answer.
My aim is to log unique queries per session by writing custom QueryHandler implementation as logging all queries causes performance hit in our case.
Consider the case : If a user connected to cassandra cluster with java client and performs "select * from users where id = ?" 100 times.
And another user connected from cqlsh and performed same query 50 times. so i want to log only two queries in this case. For that i need a unique session id per login.
Cassandra provides below interface where all requests lands up but none of its apis provide any sessionId to differentiate between two different session described in above case.
org.apache.cassandra.cql3.QueryHandler
Note: I am able to get remoteaddress/port but i want some id which is created when user logged in and get destroyed when he disconnects.
In queryState.getClientState().getRemoteAddress() the address + port will be unique per tcp connection in the sessions pool. There can be multiple concurrent requests over each connection though, and a session can have multiple connections per host. There is also no guarantee the same tcp connection will be used from one request to another on client side.
However a single session cannot be connected as 2 different users (part of the initialization of connection) so the scenario you described isn't possible from the same Session object perspective. I think just using the address as the key for uniqueness will be all you can do given how the protocol/driver works. It will at least dedup things a little.
Are you actually processing your logging inline or are you pushing it off async? If using logback it should be using async appender but if your posting events synchronously to another server, might be better just to throw all the events on a queue and let it do the deduping in another thread so you don't hurt latency.
I have a Node.js script and a PostgreSQL database, and I'll be using a library that maintains a pool of connections to the database.
Say I have a script that queries the database multiple times (not a transaction) at different parts of the script, how do I tell if I should acquire a single connection/client and reuse it throughout*, or acquire a new client from the pool for each query? (Both works but which has better performance?)
*task in the pg-promise library, connect in the node-postgres library.
...
// Acquire connection from pool.
(Database query)
(Non-database-related code)
(Database query)
// Release connection to pool.
...
or
...
// Acquire connection from pool.
(Database query)
// Release connection to pool.
(Non-database-related code)
// Acquire connection from pool.
(Database query)
// Release connection to pool.
...
I am not sure, how the pool you are using works, but normally they should reuse the connections (don't disconnect after use), so you do not need to be concerned with caching connections.
You can use node-postgres module that will make you task easier.
And about your question when to use pool here is the brief answer.
PostgreSQL server can only handle 1 query at a time per connection.
That means if you have 1 global new pg.Client() connected to your
backend your entire app is bottleknecked based on how fast postgres
can respond to queries. It literally will line everything up, queuing
each query. Yeah, it's async and so that's alright...but wouldn't you
rather multiply your throughput by 10x? Use pg.connect set the
pg.defaults.poolSize to something sane (we do 25-100, not sure the
right number yet).
new pg.Client is for when you know what you're doing. When you need a
single long lived client for some reason or need to very carefully
control the life-cycle. A good example of this is when using
LISTEN/NOTIFY. The listening client needs to be around and connected
and not shared so it can properly handle NOTIFY messages. Other
example would be when opening up a 1-off client to kill some hung
stuff or in command line scripts.
here is the link of that module.
Hopefully this will help.
https://github.com/brianc/node-postgres
You can see the documentation over there and about the pooling. Thanks :)
And about closing the pool it provides the callback done which can be called when you want to close that pool.
I'm researching a good way to implement multiple database for multi-tenant support using node.js + mongoose and mongodb.
I've found out that mongoose supports a method called createConnection() and I'm wondering the best practice to use that. Actually I am storing all of those connection in an array, separated by tenant. It'd be like:
var connections = [
{ tenant: 'TenantA', connection: mongoose.createConnection('tenant-a') },
{ tenant: 'TenantB', connection: mongoose.createConnection('tenant-b') }
];
Let's say the user send the tenant he will be logged in by request headers, and I get it in a very early middleware in express.
app.use(function (req, res, next) {
req.mongoConnection = connections.find({tenant: req.get('tenant')});
});
The question is, is it OK to store those connections statically or a better move would be create that connection every time a request is made ?
Edit 2014-09-09 - More info on software requirements
At first we are going to have around 3 tenants, but our plan is to increase that number to 40 in a year or two. There are more read operations than write ones, it's basically a big data system with machine learning. It is not a freemium software. The databases are quite big because the amount of historical data, but it is not a problem to move very old data to another location (we already thought about that). We plan to shard it later if we run out of available resources on our database machine, we could also separate some tenants in different machines.
The thing that most intrigues me is that some people say it's not a good idea to have prefixed collections for multitenancy but the reasons for that are very short.
https://docs.compose.io/use-cases/multi-tenant.html
http://themongodba.wordpress.com/2014/04/20/building-fast-scalable-multi-tenant-apps-with-mongodb/
I would not recommend manually creating and managing those separate connections. I don't know the details of your multi-tenant requirements (number of tenants, size of databases, expected number transactions, etc), but I think it would be better to go with something like Mongoose's useDb function. Then Mongoose can handle all the connection pool details.
update
The first direction I would explore is to setup each tenant on a separate node process. There are some interesting benefits to running your tenants in separate node processes. It makes sense from a security standpoint (isolated memory) and from a stability standpoint (one tenant process crash doesn't effect others).
Assuming you're basing the tenancy off of the URL, you would setup a proxy server in front of the actual tenant servers. It's job would be to look at the URL and route to the correct process based on that information. This is a very straightforward node http proxy setup. Each tenant instance could be the exact same code base, but launched with a different config (which tells them what mongo connection string to use).
This means you're able to design your actual application as if it wasn't multi-tenant. Each process only knows about one mongo database, and there is no multi-tenant logic necessary. It also enables you to easily split up traffic later based on load. If you need split up the tenants for performance reasons, you can do it transparently at the proxy level. The DNS can all stay the same, and you can just move the server that the instances are on behind the scenes. You can even have the proxy balance the requests for a tenant between multiple servers.
I'm using the Node native client 1.4 in my application and I found something in the document a little bit confusing:
A Connection Pool is a cache of database connections maintained by the driver so that connections can be re-used when new connections to the database are required. To reduce the number of connection pools created by your application, we recommend calling MongoClient.connect once and reusing the database variable returned by the callback:
Several questions come in mind when reading this:
Does it mean the db object also maintains the fail over feature provided by replica set? Which I thought should be the work of MongoClient (not sure about this but the C# driver document does say MongoClient maintains replica set stuff)
If I'm reusing the db object, when should I invoke the db.close() function? I saw the db.close() in every example. But shouldn't we keep it open if we want to reuse it?
EDIT:
As it's a topic about reusing, I'd also want to know how we can share the db in different functions/objects?
As the project grows bigger, I don't want to nest all the functions/objects in one big closure, but I also don't want to pass it to all the functions/objects.
What's a more elegant way to share it among the application?
The concept of "connection pooling" for database connections has been around for some time. It really is a common sense approach as when you consider it, establishing a connection to a database every time you wish to issue a query is very costly and you don't want to be doing that with the additional overhead involved.
So the general principle is there that you have an object handle ( db reference in this case ) that essentially goes and checks for which "pooled" connection it can use, and possibly if the current "pool" is fully utilized then and create another ( or a few others ) connection up to the pool limit in order to service the request.
The MongoClient class itself is just a constructor or "factory" type class whose purpose is to establish the connections and indeed the connection pool and return a handle to the database for later usage. So it is actually the connections created here that are managed for things such as replica set fail-over or possibly choosing another router instance from the available instances and generally handling the connections.
As such, the general practice in "long lived" applications is that "handle" is either globally available or able to be retrieved from an instance manager to give access to the available connections. This avoids the need to "establish" a new connection elsewhere in your code, which has already been stated as a costly operation.
You mention the "example" code which is often present through many such driver implementation manuals often or always calling db.close. But these are just examples and not intended as long running applications, and as such those examples tend to be "cycle complete" in that they show all of the "initialization", the "usage" of various methods, and finally the "cleanup" as the application exits.
Good application or ODM type implementations will typically have a way to setup connections, share the pool and then gracefully cleanup when the application finally exits. You might write your code just like "manual page" examples for small scripts, but for a larger long running application you are probably going to implement code to "clean up" your connections as your actual application exits.