NodeJS sharding architecture with many MongoDB databases approaches - node.js

We have an architecture problem in our project. The project requires sharding, since we need nearly unlimited scalability for part of its services.
Currently we use Node.js + MongoDB (Mongoose) and MySQL (TypeORM). Data is separated into databases through a simple 'DB Locator' service, so a Node process needs connections to a lot of DBs (up to 1000).
Request example:
HTTP request from the client with a Shop ID;
Get the DB IP address/credentials from the 'DB Locator' service by Shop ID;
Create a connection to the specific database with the shop's data;
Perform DB queries.
We tried to implement it in two ways:
Create a connection for each request, and close it on response.
Problems:
we can't use the connection after the response (this is the main problem, because sometimes we need some asynchronous actions);
it works more slowly;
Keep all connections open.
Problems:
we reach the simultaneous connection limit or other limits;
memory leaks.
Which way is better? How to avoid described problems? Maybe there is a better solution?
Solution #1 worked perfectly for us in PHP, since PHP runs a single process per request and simply drops connections when the process ends. As we know, Express is pure JS code running in V8 and is not process based.
It would be great to close unused connections automatically, but we can't find options to do that.
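One pattern that could close unused connections automatically is a small LRU cache of Mongoose connections keyed by Shop ID, evicting and closing the least recently used one when a cap is reached. A minimal sketch, assuming a hypothetical dbLocator.lookup() that resolves a Shop ID to a connection URI:

const mongoose = require('mongoose');

const MAX_CONNECTIONS = 100;   // keep well under the server/OS connection limits
const connections = new Map(); // shopId -> Connection; Map iteration order doubles as LRU order

async function getConnection(shopId) {
  if (connections.has(shopId)) {
    // Refresh the LRU position: deleting and re-inserting moves the key to the end.
    const conn = connections.get(shopId);
    connections.delete(shopId);
    connections.set(shopId, conn);
    return conn;
  }
  if (connections.size >= MAX_CONNECTIONS) {
    // Evict and close the least recently used connection.
    const [oldestId, oldestConn] = connections.entries().next().value;
    connections.delete(oldestId);
    await oldestConn.close();
  }
  const uri = await dbLocator.lookup(shopId); // hypothetical 'DB Locator' call
  const conn = await mongoose.createConnection(uri, { maxPoolSize: 5 }).asPromise(); // Mongoose 6+
  connections.set(shopId, conn);
  return conn;
}

Because connections are reused across requests, asynchronous work after the response can still use them, and the cap keeps the total number of open connections bounded.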

The short answer: stop using MongoDB with Mongoose 😏
Longer answer:
MongoDB is a document-oriented DBMS. Its main use case is when you have loosely structured data that you have to store but don't need to query heavily. There is lazy indexing, dynamic typing, and much more that keeps you from using it as an RDBMS, but it is great as storage for logs or any serialized data.
The worst part here is Mongoose. This is the library that makes you feel like your trashbox is a wonderful world with relations, virtual fields, and many other things that should not be in a document-oriented DBMS. Also, there is a lot of legacy code from previous versions that causes trouble with connection management.
You already use TypeORM, which may work instead of Mongoose - with some restrictions, for sure.
Its connection management works exactly the same way as for MySQL.
Here is some more data: https://github.com/typeorm/typeorm/blob/master/docs/mongodb.md#defining-entities-and-columns
In this case you can use your TypeORM Repository as a transparent client that will initialize connections and close them, or keep them alive, on demand.
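A rough sketch of that per-shard usage with TypeORM's DataSource API (the entity definition and all names are illustrative, and exact options vary across TypeORM versions):

const { DataSource, EntitySchema } = require('typeorm');

// Illustrative entity; a real project would define richer columns.
const Shop = new EntitySchema({
  name: 'Shop',
  columns: {
    _id: { type: 'varchar', primary: true, objectId: true },
    title: { type: 'varchar' },
  },
});

async function queryShop(uri) {
  // One DataSource per shard database; the URI would come from the 'DB Locator'.
  const ds = new DataSource({ type: 'mongodb', url: uri, entities: [Shop] });
  await ds.initialize();          // opens the underlying connection pool
  try {
    return await ds.getMongoRepository('Shop').find();
  } finally {
    await ds.destroy();           // closes the pool (or keep the DataSource cached instead)
  }
}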

Related

MERN stack and Socket to Mongodb - real time data to frontend from database

I am setting up a website with the MERN stack. In the backend I will be constantly fetching data from an API/socket and saving it to the MongoDB database. On the frontend I want React to show/update the data in real time via the socket.
I am a bit worried about the number of requests/connections the server side and MongoDB can handle.
It's not clear how to set up a proper system to handle millions of users.
Could someone give me some info on what, how, and/or where to search on how to set up a stable system?
Any info is welcome, thank you.
This is a suggestion for the MongoDB portion.
You may want to try out their official course on MongoDB to understand how transactions work: MongoDB University. For hands-on practice, try their "MongoDB for JavaScript Developers" course.
Here are some things to look out for (two of them are sketched below):
Connection pooling
Understanding how transactions work
Indexes, especially how to create a good one, e.g. by following the ESR (Equality, Sort, Range) rule
MongoDB clusters: write to the primary and read from secondaries
I will update the list if I have time and remember some of the stuff.
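As a small illustration of the connection-pooling and primary/secondary points, a sketch using the official Node.js driver, assuming a replica-set URI (host names, pool size, and collection names are placeholders):

const { MongoClient } = require('mongodb');

// One shared client for the whole app; the driver manages the pool internally.
const client = new MongoClient('mongodb://host1,host2,host3/app?replicaSet=rs0', {
  maxPoolSize: 50,                      // cap concurrent connections per server
  readPreference: 'secondaryPreferred', // route reads to secondaries when available
});

async function main() {
  await client.connect();
  const users = client.db('app').collection('users');
  await users.insertOne({ name: 'dan' });        // writes always go to the primary
  console.log(await users.countDocuments());     // this read may hit a secondary
  await client.close();
}

main().catch(console.error);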

Should I close my mongoose node.js connection after saving into database?

I have the following code in my app.js which runs on server start (npm start)
mongo.mongoConnect('connection_string', 'users')
  .then((x) => {
    console.log('Database connection successful');
    app.listen(5000, () => console.log('Server started on port 5000'));
  })
  .catch((err) => {
    console.error(err.stack);
    process.exit(1);
  });
process.on('SIGINT', mongo.mongoDisconnect).on('SIGTERM', mongo.mongoDisconnect);
As you can see, I hook SIGINT and SIGTERM to close my connections on process exit.
I've been reading a lot about how to deal with database connections in Mongo and know that I should connect just once and reuse that connection across my application.
Does that mean that even after a save() call triggered by a POST request, I should not be closing my connection? And if I close it, how am I going to open it again, since the connection happens on app start?
I'm asking since in PHP my practice was to always open and close the connection after querying the MySQL database.
Likewise, does it mean that the connection will close only on server shutdown? In other words, will it always be present, since I never want to shut down my Node.js backend instance?
It is formally correct to open a connection, run a query, and then close the connection, but it is not a good practice, because opening a connection is an "expensive" operation and connections can be reused, which is much more efficient. The main restriction on an open connection is that it can only be used by 1 thread at a time. (More accurately, once a request is sent on a connection, no other requests can be sent on that connection until the response to that request is received.)
If your application is short lived or inherently single threaded, as may be the case when running as a "serverless" function, it may be acceptable to open and close a connection on each request.
While in theory it might be acceptable to open a single connection at the start of the program, keep a global reference to that connection, and reuse it, in practice there are common ways in which a connection becomes unusable that you would have to account for, and handling all the possibilities requires complex code. It gets even more complicated when, as is possible with MongoDB replica sets, you are actually connecting to more than one server and want to retry a command on a second server if the first one fails to respond.
That is why the standard and "best" practice is to use a "connection pool" to manage your database connections. A pool opens a set of network connections to the database, verifies and maintains their health, and dynamically assigns virtual database connections to actual network connections as needed. The pool is implemented in a library that will have received a lot of real world testing and is extremely likely to be better than anything you would write yourself. Connection pools have configuration options that would let you set any behavior you want, including opening a new connection for each request and closing it when done, but offer a wide range of performance enhancing capabilities, such as reusing connections and avoiding the overhead of creating them for each request.
This is why for MongoDB, the standard Node.js client already implements a connection pool. I do not know what mongo.mongoConnect in your code refers to; you said in the title that you are using Mongoose, but it uses connect, not mongoConnect, to connect to the database. In general you should either be using the standard client or a JavaScript ODM library like Mongoose. Either of them will take care of the connection management issues for you.
Refer to the documentation for the client/library you use for exactly the right way to use it. In general, you would initialize some kind of client object and store it globally before entering your main application handler. Then you would use this object to handle your database operations, and the object will transparently manage the underlying connections via the pool implementation. In this kind of setup, you would only close the connection when exiting the program, and usually the library takes care of that for you automatically, so you really never need to close the connection.
Thus, when using a MongoDB connection pool in NodeJS, you write your program basically the same way you would as if you just opened a connection at startup and then kept reusing it. The libraries take care of isolating you from all the problems that can arise from actually doing this. You do not need to, and in fact should not, close the connection after a database operation when using standard MongoDB NodeJS libraries.
Note that other connection pool implementations exist that do require you to close the connection. What you do with those pools is reserve (or "check out" or "open") a connection, use it, perhaps for multiple operations, and then release (or "check in" or "close") the connection when you are done. This is probably what you were doing in PHP. It is important to read and follow the documentation for the connection pool library you are using to make sure you are using it correctly.
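A minimal sketch of the connect-once, reuse-everywhere pattern described above, using Mongoose (the connection string and model are placeholders):

const express = require('express');
const mongoose = require('mongoose');

const app = express();
app.use(express.json());

const User = mongoose.model('User', new mongoose.Schema({ name: String }));

app.post('/users', async (req, res) => {
  // Reuses the pooled connection; no connect/close per request.
  const user = await User.create({ name: req.body.name });
  res.json(user);
});

// Connect once at startup; Mongoose pools connections under the hood.
mongoose.connect('connection_string')
  .then(() => app.listen(5000, () => console.log('Server started on port 5000')))
  .catch((err) => { console.error(err.stack); process.exit(1); });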
This may not be the exact answer you are looking for, but it is not a good idea to open a new connection for every request and then close it. It adds overhead, because it takes time (even if only milliseconds) to create a new connection.
Instead, you should create a pool of connections and use it in your app.
It's a good idea to close your mongo connection when your process dies or is stopped, but you should not need to close your mongoose connection after every successful query.
If you are instantiating a new mongo connection before each query you shouldn't need to be doing that either. You should just need to do that once when booting up your server.
You have two approaches:
1) reopen a connection on every call using middleware, or
2) queue your queries in Node and execute them all at once later.

How many connections could node's mongoose and mongodb themselves handle? Will the server crash?

I was thinking about building some kind of API built on NodeJS with Mongoose. I read that Mongoose uses 1 connection per app.
But let us say that we have 300,000 users joining a room to answer some questions (real-time); will Mongoose/MongoDB handle it? Will the server itself even handle it?
Thinking on the database side only:
The mongod executable has a parameter (--maxConns) for setting the maximum number of connections; prior to v2.6 there was a hard limit on it. Now, as the docs say: "This setting has no effect if it is higher than your operating system’s configured maximum connection tracking threshold", see here for Linux.
Besides that, you MUST consider a sharded cluster for this kind of load.

Short or long lived connections for RethinkDB?

We have a project on Node.js that is based on restify, and we are using RethinkDB as the database. The problem is that RethinkDB needs to be accessed from different parts of the code (route handlers, middlewares), but not on every request. I am wondering what the best way to connect to RethinkDB is in this case?
I see the following options:
have one long-lived connection that is stored somewhere (the approach we use now),
connect to RethinkDB on each HTTP request, with potentially some of the connections never being used,
connect in each part individually, with potentially several connections per HTTP request, but without useless connections.
I ask this because I am not sure how well RethinkDB handles short/long connections and how expensive they are. For instance, MongoDB prefers long connections, but all the examples in the RethinkDB docs use one connection per HTTP request.
I recommend a connection pool or one connection per query, especially if you use features like changefeeds, which are recommended to be on their own connection.
When you use a single connection for everything, you also have to handle reconnection when the connection times out or breaks. I think it's easier to just use a connection per query, or share a connection per request/response.
Just be sure to close your connection after using it; otherwise you will leak connections and new connections cannot be created.
Some drivers go further and don't require you to think about connections at all, such as rethinkdbdash: https://github.com/neumino/rethinkdbdash
Or Elixir RethinkDB, which has an issue for creating a connection pool: https://github.com/hamiltop/rethinkdb-elixir/issues/32
RethinkDB itself has an issue related to connection pooling: https://github.com/rethinkdb/rethinkdb/issues/281
That's probably where the community is heading, too.
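A minimal sketch of the connection-per-query style with the official rethinkdb driver (host, database, and table names are placeholders):

const r = require('rethinkdb');

// Open, run, and always close in finally to avoid the connection leak described above.
async function getUsers() {
  const conn = await r.connect({ host: 'localhost', port: 28015 });
  try {
    const cursor = await r.db('app').table('users').run(conn);
    return await cursor.toArray();
  } finally {
    await conn.close();
  }
}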

How are Node.js+Socket.io+MongoDB webapps truly asynchronous?

I have a good old-style LAMP webapp. A week ago I needed to add a push notification mechanism to it.
Therefore, what I did was to add Node.js + Socket.IO on the server and poll the MySQL database every 10 seconds using Node.js to check whether there were new items: if so, I would send them to the client(s) with Socket.IO.
I was pretty happy with the result, even if that is not a proper realtime notification (as there is a lag of up to 10 secs).
Now, I am about to build a new webapp which will need push notifications, too. I am wondering whether to go with the same approach as the first one (that I believe is more stable and mature) or to go totally Node.js, without PHP and Apache. As for the database, I have already decided to go for MongoDB.
Finally, my question is: if I go for Node.js+Socket.io+MongoDB will I get a truly near-real-time webapp? I mean, as soon as a new record is inserted into MongoDB, will there be some sort of event triggered that I can catch via node.js, do some checking on it and, if relevant, send the notification to the client? Or will there be anyway some sort of polling on the db server-side and lag, as with my first LAMP webapp?
A related question: can you build a realtime webapp on MySQL without doing any polling, as I did with my first app? Or do you need MongoDB (or Redis)?
I hope this question is not too silly - sorry, I am just starting with Node.js and co.
Thanks.
I understand your problem because I switched to Node.js from PHP/Apache/MySQL too.
Generally Node.js is stable; modules and your own scripts are the main sources of errors.
Real-time has nothing to do with the database; it's all about the client and server. You can query as much data as you want in your requests and push it to the other clients.
Choosing Node.js is very wise, but it's harder to implement.
When you insert a new record into your DB, the event is the request itself; you emit a push event alongside the database query, something like:
// Please note this is not real code, just an example of the idea
app.get('/query', function(request, response) {
  // Query your database
  db.query('SELECT * FROM users', function(rows) {
    // Push notification to dan
    socket.emit('database_query_executed', 'to_dan', rows);
    // End request
    response.end('success');
  });
});
Of course you can use MySQL! And any database you want. As I said, real-time has nothing to do with databases, because the database is in the middle of the process and it's totally optional.
If you want to use Node.js for push notifications and PHP/Apache for MySQL, then you will need to make two requests, one to each server, something like:
// this is javascript
ajax('http://node.yoursite.com/push', node_options)
ajax('http://php.yoursite.com/mysql_query', php_options)
Or, if you want just one request, or you want to use a form, you can call your PHP, and inside PHP you can create an HTTP or net request to Node.js, something like:
// this is php
new HttpRequest('http://node.yoursite.com/push', HttpRequest::METH_GET);
Using:
A regular MongoDB Collection as the Store,
A MongoDB Capped Collection with Tailable Cursors as the Queue,
A Node worker with Socket.IO watching the Queue as the Worker,
A Node server to serve the page with the Socket.IO client, and to receive POSTed data (or however else the data gets added) as the Server
It goes like this:
The new data gets sent to the Server,
The Server puts the data in the Store,
The Server adds the data's ObjectID to the Queue,
The Queue will send the newly arrived ObjectID to the open Tailable Cursor on the Worker,
The Worker goes and gets the actual data in the ObjectID from the Store,
The Worker emits the data through the socket,
The client receives the data from the socket.
This is 'push' from the initial addition of the data all the way to receipt at the client - no polling, so as real-time as you can get given the processing time at each step.
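A rough sketch of the Worker step, assuming a capped collection named queue whose entries carry the Store document's ObjectID in a storeId field (all names are hypothetical):

const { MongoClient } = require('mongodb');

// The Worker: tail the capped Queue collection and emit each new document over the socket.
async function runWorker(io) {
  const client = await MongoClient.connect('mongodb://localhost/app');
  const db = client.db('app');
  const queue = db.collection('queue');  // capped collection used as the Queue
  const store = db.collection('store');  // regular collection used as the Store

  // A tailable, awaitData cursor blocks waiting for new Queue entries instead of polling.
  const cursor = queue.find({}, { tailable: true, awaitData: true });
  for await (const entry of cursor) {
    const doc = await store.findOne({ _id: entry.storeId }); // fetch the actual data
    io.emit('new-data', doc);                                // push it to the clients
  }
}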
Re: triggers in MongoDB - please see this answer: https://stackoverflow.com/a/12405093/1651408
There are much more convenient triggers in MySQL, but calling Node.js from them would require a bit of work with MySQL UDFs (user-defined functions), for instance pushing data through a Unix socket. Please note that this is necessary only when other applications (besides your Node.js process) are updating the database, and be sure to choose InnoDB as the storage engine in this case (row- vs. table-level locking).
I can see no big problem with your technology choice of Socket.IO; even if client-side WebSockets aren't supported, you'll fall back (gracefully, I hope) to polling.
Finally, your question is not silly at all, since push technology is definitely superior to a flood of polling requests - it scales better. EDIT: However, I would not describe either technology as real-time.
Another EDIT: for a quite well-known and successful setup of this kind please read this: http://blog.fogcreek.com/the-trello-tech-stack/
Have you discovered Chole? It works separately from your web server and interfaces with it using HTTP POSTs. That way you can code your web app any which way you want.
Actually, using push technology like Socket.IO helps you use the server's resources efficiently, and it also lets you support everything from old browsers to modern ones by making a WebSocket or WebSocket-like connection.
Polling every 10 seconds means repeated HTTP requests, which is expensive, especially when a lot of users are present.
Unlike polling, push technology is relatively cheap: each user's client opens a dedicated socket (i.e. a WebSocket) to listen for the server's push notifications.
And usually your client-side JavaScript performs some actions when a push notification is received.
Using your LAMP stack and Socket.IO on a different port (other than 80) will be good enough to implement what you need.
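For instance, a minimal standalone Socket.IO server running on its own port next to Apache might look like this (the port and event names are arbitrary):

const { Server } = require('socket.io');

// Socket.IO listening on port 3000, while Apache keeps serving the site on port 80.
const io = new Server(3000, { cors: { origin: '*' } });

io.on('connection', (socket) => {
  console.log('client connected:', socket.id);
});

// Elsewhere (e.g. after a database write), push to every connected client:
function notifyClients(item) {
  io.emit('new-item', item);
}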
But using Node.js + MongoDB + Socket.IO actually helps you manage your server's resources much more efficiently, because all three are non-blocking by nature. If you understand the non-blocking concept correctly and implement your app appropriately, an identical app - one with the same features but a different language and database - will be able to handle a lot more requests than the typical LAMP stack.
[Image: a well-known chart comparing the thread-based and non-blocking approaches to handling concurrency - Apache (threads) vs. Nginx (non-blocking).]
MySQL is a great database. I believe you won't need joins and transactions for realtime notifications.
MongoDB does not have those two features unless you implement something similar yourself.
Because it lacks those two, and because of some characteristics of its own, MongoDB can store and fetch data much faster than traditional SQL databases.
Switching from MySQL to MongoDB will decrease the time it takes to insert and fetch data.
With JS you can open a socket to your server (not in old browsers). The server will have an ad-hoc program (on an ad-hoc port, so you need permission to open a port and run a program on your server) that will send data (almost) in real time to and from the client, without the HTTP protocol's overhead. Old browsers will just fall back to a polling mechanism.
I can't see another way to do this (there are probably already "cooked" frameworks that do this).
