NestJS TypeORM dynamically connect to multiple DB

I'm building an API that queries multiple databases. All of these databases are isolated instances of the same data structure, so the idea is for each request to indicate which database to point to. The number of databases is dynamic. Is there a way for a module to set up a varying number of database connections when starting up?
I've tried using TypeORM and connecting to the specific database when asked, but that adds time to the request, so I wanted to know whether there is a way to have all of the connections available.
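One way to avoid paying the connection cost on every request is to create each connection once and reuse it. Below is a minimal sketch, not a definitive implementation, assuming PostgreSQL and TypeORM 0.3+: a NestJS provider that lazily builds one DataSource per database name and caches it for later requests (the provider name, env-based config, and entity glob are assumptions, not from the question).

```typescript
// Hypothetical NestJS provider that caches one TypeORM DataSource per
// database name, so the connection cost is paid only on the first request.
import { Injectable, OnModuleDestroy } from "@nestjs/common";
import { DataSource } from "typeorm";

@Injectable()
export class TenantDataSourceService implements OnModuleDestroy {
  private readonly sources = new Map<string, DataSource>();

  // Returns an initialized DataSource for the requested database,
  // creating and caching it on first use.
  async getDataSource(dbName: string): Promise<DataSource> {
    const existing = this.sources.get(dbName);
    if (existing?.isInitialized) return existing;

    const source = new DataSource({
      type: "postgres",               // assumption: all instances are Postgres
      host: process.env.DB_HOST,
      port: Number(process.env.DB_PORT ?? 5432),
      username: process.env.DB_USER,
      password: process.env.DB_PASS,
      database: dbName,               // only the database name differs
      entities: [__dirname + "/**/*.entity{.ts,.js}"],
    });
    await source.initialize();
    this.sources.set(dbName, source);
    return source;
  }

  // Close every cached connection when the application shuts down.
  async onModuleDestroy(): Promise<void> {
    await Promise.all([...this.sources.values()].map((s) => s.destroy()));
  }
}
```

A request handler could then read the target database name from the request (e.g. a header or route parameter) and call getDataSource(name) before running its queries, so only the first request per database pays the setup cost.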

Related

Is writing multiple INSERTS versus UPDATE faster for temporary POSTGRES databases?

I am re-designing a project I built a year ago when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit and I'd like to expand on these new skills.
The application receives real-time data from an API, which I clean up and write to a database, as well as broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: receives the incoming data and passes it through my schema; I then send it to my socket-server.
socket-server: handles the WSS connections to the application's front-end clients. It also writes the data it receives from the scraper and api-m1 to a Postgres database. I would like to turn this into clusters eventually, since I am using Node.js, and will incorporate Redis; then I will run it behind an ALB with sticky sessions, etc., across multiple EC2 instances.
RDS: the Postgres table that socket-server writes incoming scraper and api-m1 data to. RDS is used to fetch the most recent stored data along with user profile config data. NOTE: the main RDS data table will have at most 120-150 UID records with 6-7 columns.
From a database perspective, what would be the quickest way to write my data to RDS, assuming that during peak times we have 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper? After each day I tear down the database using a Lambda function and start again (the data is only temporary and does not need to be kept for any prolonged period of time).
1. Should I INSERT each record using a SERIAL id, then from the frontend fetch the most recent rows based on the uid? (Sketched below.)
2a. Should I UPDATE each UID so I'd have a fixed N rows of data that I just search and update? (I can see this bottlenecking with my Postgres client.)
2b. Still use UPDATE but do BATCHED updates? (What issues will I run into if I make multiple clusters, i.e. will I run into concurrency problems where table record XYZ has an older value overwrite a more recent one because I'm using BATCH UPDATE with Node clusters?)
My concern is that UPDATEs are slower than INSERTs, though I don't need it to be as fast as possible: this section of the application isn't CPU heavy, and the rt-data isn't that intensive.
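For reference, a minimal sketch of option 1 above (append-only INSERTs plus fetching the latest row per uid), assuming node-postgres and a hypothetical rt_data table:

```typescript
// Append-only writes with a latest-row-per-uid read via DISTINCT ON.
// Table and column names here are assumptions, not from the question.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Append every incoming record; no row is ever updated.
async function insertRecord(uid: string, payload: object): Promise<void> {
  await pool.query(
    "INSERT INTO rt_data (uid, payload, created_at) VALUES ($1, $2, now())",
    [uid, JSON.stringify(payload)],
  );
}

// Fetch the most recent row for each uid.
async function latestPerUid() {
  const { rows } = await pool.query(
    `SELECT DISTINCT ON (uid) uid, payload, created_at
       FROM rt_data
      ORDER BY uid, created_at DESC`,
  );
  return rows;
}
```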
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
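A minimal sketch of what the suggested Redis route could look like, using ioredis (the library choice, key name, and data shape are assumptions): one hash keyed by uid, so each write simply overwrites the previous latest value and the frontend fetch is a single HGETALL.

```typescript
// Hypothetical Redis store for the latest record per uid.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Overwrite the latest payload for a uid (roughly the UPDATE-per-UID model).
async function writeLatest(uid: string, payload: object): Promise<void> {
  await redis.hset("rt:latest", uid, JSON.stringify(payload));
}

// Read the most recent payload for every uid in one round trip.
async function readAllLatest(): Promise<Record<string, unknown>> {
  const raw = await redis.hgetall("rt:latest");
  return Object.fromEntries(
    Object.entries(raw).map(([uid, json]) => [uid, JSON.parse(json)]),
  );
}
```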

N number of Sequelize connections to dynamically query via API calls

I'm looking to get some opinions on what the best approach is for the following scenario:
Our product requires connections to our users' Postgres databases via our Node Express server. They provide their credentials once, we store them encrypted in our internal operations DB, and we reference them whenever access is needed. A user can perform actions in our app UI like creating a table, deleting a table, etc., and can view table sizes, min/max values of a column, and so on.
These actions come to our server as authenticated API calls, and we query their databases via Sequelize as needed and return the results to the frontend.
My question is: when there are N users with N databases on different SQL instances that need to be connected to when an API call comes in to query the respective database, what is the best approach to maintaining that?
Should we create a new Sequelize connection instance each time an API is called, run the query, return the response, and close the connection? Or should we create a new Sequelize instance for a DB when an API is called, keep the instance for a certain amount of time, close the connection if it was inactive during that period, and recreate the instance the next time it's needed?
If there are better and more efficient ways of doing this, I would love to hear about it. Thanks.
Currently, I've tried creating a new Sequelize instance at the beginning of each API request, running the query, and then closing the connection. It works OK, but that's just locally with 2 DBs, so I can't tell what production would be like.
Edit: Anatoly suggested a connection pool; in that case, what needs to be considered for the config?
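A minimal sketch of the second option (a cached Sequelize instance per customer database, each with its own small pool, evicted after a period of inactivity); the cache shape, timeouts, and pool numbers below are assumptions, not recommendations from the thread.

```typescript
// Hypothetical per-database Sequelize cache: one instance per customer DB,
// each with its own small pool, closed after a period of inactivity.
import { Sequelize } from "sequelize";

interface DbCredentials {
  host: string;
  database: string;
  username: string;
  password: string;
}

interface CacheEntry {
  sequelize: Sequelize;
  lastUsed: number;
}

const cache = new Map<string, CacheEntry>();
const IDLE_EVICT_MS = 10 * 60 * 1000; // close instances unused for 10 minutes

export function getConnection(key: string, creds: DbCredentials): Sequelize {
  const hit = cache.get(key);
  if (hit) {
    hit.lastUsed = Date.now();
    return hit.sequelize;
  }
  const sequelize = new Sequelize(creds.database, creds.username, creds.password, {
    host: creds.host,
    dialect: "postgres",
    pool: { max: 5, min: 0, acquire: 30000, idle: 10000 }, // per-DB pool config
    logging: false,
  });
  cache.set(key, { sequelize, lastUsed: Date.now() });
  return sequelize;
}

// Periodically close connections that have not been used recently.
setInterval(async () => {
  const now = Date.now();
  for (const [key, entry] of cache) {
    if (now - entry.lastUsed > IDLE_EVICT_MS) {
      cache.delete(key);
      await entry.sequelize.close();
    }
  }
}, 60 * 1000);
```

On the pool config question: the main Sequelize pool settings are max and min (remember max is per database, so it multiplies across N cached instances), acquire (how long to wait for a free connection), and idle (how long an unused connection stays open); the right numbers depend on how many customer databases are active at once.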

Sharding in NodeJs with AWS RDS Postgresql as Database, Sequelize as ORM

I have a monolithic application backend written in Node.js that serves millions of requests a day, with Sequelize and Postgres as the database. Since ours is a tenant-based application, I am planning to shard my database so that I have x thousand tenants in one shard, x thousand in another shard, and so on. I use AWS RDS (PostgreSQL) as the database server.
On the infrastructure side it's pretty straightforward to create a new shard: creating a new RDS database server with the same configuration as my primary database is sufficient.
The main problem I am facing now is how to manage the shards.
For example: I have the following requirement -
All my queries of tenant_id < 10000 should go to meta_database
All my queries of tenant_id > 10000 and < 30000 should go to shard_1
All my queries of tenant_id > 30000 and < 60000 should go to shard_2
I tried with the following tools:
Sequelize -
It seems nearly impossible to do this with Sequelize, since it still does not support sharding. I can create multiple Sequelize connections, one per shard, and map each tenant_id to a particular shard manually in code, but that requires fetching the models each time by passing in the tenant's tenant_id, which is not a clean or readable approach (a rough sketch of this manual mapping is included at the end of this question).
pg_bouncer_rr -
I tried pgbouncer-rr and dropped it, since having logic at the query-routing level that extracts the tenant_id from the query and checks the value with a regex is not a good approach and can also cause unexpected errors.
Pg_fdw - Foreign Data Wrapper
I was able to create an FDW server and route my queries to the foreign server by following a few articles. But the problem is that all the records are still inserted into my primary meta database tables. It seems I was only able to route reads through the data wrappers, and the data still resides on the coordinator database. In addition, I can partition my table and place a few partitions on the foreign servers, but when a record is inserted it still gets written to the main database table and only then reflected in my foreign tables. How can I have my foreign server handle all my reads and writes completely independently of the meta database (the meta database should only do the routing and should not persist any data)?
pl/proxy -
I read a few articles on PL/Proxy; it requires me to write a function for every read and insert. I guess it's more useful for managing table partitions than for managing shards.
I am not sure how to proceed with tenant-based sharding. If anyone has achieved sharding with Node.js, Postgres and Sequelize, kindly help!
I am even okay with having a proxy in front of the database that takes care of query routing based on tenant_id. I tried Citus for this purpose, to use as a proxy, but it recently dropped its support for AWS.
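For what it's worth, a rough sketch of the manual mapping mentioned under Sequelize above, using the tenant_id ranges from the question (the shard names, environment variables, and helper function are assumptions):

```typescript
// Hypothetical manual shard routing: pick a Sequelize instance based on the
// tenant_id ranges described in the question.
import { Sequelize } from "sequelize";

interface Shard {
  name: string;
  maxTenantId: number; // tenants with tenant_id below this bound route here
  sequelize: Sequelize;
}

// Connection URLs are assumed to come from environment variables.
const connect = (url: string) =>
  new Sequelize(url, { dialect: "postgres", logging: false });

// Ordered by upper bound: < 10000 -> meta, < 30000 -> shard_1, < 60000 -> shard_2.
const shards: Shard[] = [
  { name: "meta_database", maxTenantId: 10000, sequelize: connect(process.env.META_DB_URL!) },
  { name: "shard_1", maxTenantId: 30000, sequelize: connect(process.env.SHARD_1_URL!) },
  { name: "shard_2", maxTenantId: 60000, sequelize: connect(process.env.SHARD_2_URL!) },
];

export function getShardForTenant(tenantId: number): Sequelize {
  const shard = shards.find((s) => tenantId < s.maxTenantId);
  if (!shard) throw new Error(`No shard configured for tenant ${tenantId}`);
  return shard.sequelize;
}

// Usage: every query for a tenant goes through its shard's connection, e.g.
// const rows = await getShardForTenant(tenantId).query("SELECT ...");
```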

MongoDB Multiple database vs single database

I have a Node.js project that uses MongoDB as its main database.
Normally, I just use one database to contain all the information (users, organizations, messages, ...).
But now I need to store one more thing - log data - which grows very fast.
So I'm considering storing the logs in another database to keep the current database safe and fast.
Does anyone have experience with this? Is that better than a single database?
Not a real question, the mods will certainly say. You have a few options depending on your log data and how / how often you want to access it:
Capped collections, if you don't need to store the logs for a long time (see the sketch after this list)
Something like Redis to delay writing to the log and keep the app responding fast
Use a replica set to distribute the database load.
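A minimal sketch of the capped-collection option, using the official Node.js MongoDB driver (the database name, collection name, and size limits are assumptions):

```typescript
// Hypothetical setup: a capped collection for logs, so old entries are
// discarded automatically once the size limit is reached.
import { MongoClient } from "mongodb";

async function setupLogCollection(): Promise<void> {
  const client = new MongoClient(process.env.MONGO_URL ?? "mongodb://localhost:27017");
  await client.connect();
  const db = client.db("app");

  // Create the capped collection once; MongoDB errors if it already exists.
  await db.createCollection("logs", {
    capped: true,
    size: 100 * 1024 * 1024, // keep at most ~100 MB of log documents
    max: 500_000,            // and at most 500k documents
  });

  // Writes look the same as for a normal collection.
  await db.collection("logs").insertOne({ level: "info", msg: "started", at: new Date() });

  await client.close();
}

setupLogCollection().catch(console.error);
```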

How to share an object in multiple instances of nodejs?

I have a feature where a user posts data containing a few user ids and some data related to those user ids, and I save it into a PostgreSQL database. I want to keep the returned user ids in some object.
I just want to check whether a userid is present in this object and only then call the database. This check happens very frequently, so I can't hit the DB every time just to check whether any data exists for that userid.
The problem is that I have multiple Node.js instances running on different servers, so how can I have a common object?
I know I can use Redis/Riak to store key-values on a server, but I don't want to increase complexity/learning just for a single case. (I have never used Redis/Riak before.)
Any suggestion ?
If your data is in different node.js processes on different servers, then the ONLY option is to use networking to communicate across servers with some common server to get the value. There are lots of different ways to do that.
Put the value in a database and always read the value from the common database
Designate one of your node.js instances as the master and have all the other node.js instances ask the master for the value anytime they need it
Synchronize the value to each node.js process using networking so each node.js instance always has a current value in its own process
Use a shared file system (kind of like a poor man's database)
Since you already have a database, you probably want to just store it in the database you already have and query it from there rather than introduce another data store with redis just for this one use. If possible, you can have each process cache the value over some interval of time to improve performance for frequent requests.
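A minimal sketch of that last suggestion, assuming node-postgres and a hypothetical user_data table: each process keeps a short-lived in-memory answer per userid and only falls back to the shared database when the cached answer has expired.

```typescript
// Hypothetical per-process cache: remember for a short time whether a userid
// has data in Postgres, so frequent checks don't each hit the database.
import { Pool } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

const TTL_MS = 30 * 1000; // how long a cached answer stays valid
const cache = new Map<string, { present: boolean; expires: number }>();

export async function hasDataForUser(userId: string): Promise<boolean> {
  const hit = cache.get(userId);
  if (hit && hit.expires > Date.now()) return hit.present;

  // Cache miss or expired entry: ask the common database.
  const { rows } = await pool.query(
    "SELECT 1 FROM user_data WHERE user_id = $1 LIMIT 1",
    [userId],
  );
  const present = rows.length > 0;
  cache.set(userId, { present, expires: Date.now() + TTL_MS });
  return present;
}
```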
