I'm looking to get some opinions on what the best approach is for the following scenario:
Our product connects to our users' Postgres databases from our Node/Express server. Users provide their credentials once, we store them encrypted in our internal operations DB, and we reference them whenever access is needed. A user can perform actions in our app UI like creating or deleting a table, and view information such as table sizes or the min/max values of a column.
These actions come to our server as authenticated API calls, and we query the user's database via Sequelize as needed and return the results to the frontend.
My question is: when there are N users with N databases across different SQL instances, each of which needs to be connected to whenever an API call targets it, what is the best approach to maintaining those connections?
Should we create a new Sequelize instance each time an API is called, run the query, return the response, and close the connection? Or should we create a Sequelize instance for a DB when an API call comes in, keep the instance alive for a certain amount of time, close the connection if it stays inactive for that long, and recreate the instance the next time it is needed?
If there are better and more efficient ways of doing this, I would love to hear about them. Thanks.
Currently I create a new Sequelize instance at the beginning of each API request, run the query, and then close the connection. It works OK, but that's only locally with 2 DBs, so I can't tell what production would be like.
Edit: Anatoly suggested a connection pool; in that case, what are the things that need to be considered for the config?
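One pattern that fits the second option (and Anatoly's pool suggestion) is a per-database cache of Sequelize instances, each with a small pool, plus an idle-eviction sweep. Below is a minimal sketch, assuming a hypothetical getCredentials(dbId) helper that decrypts the stored credentials from the operations DB; the pool sizes and timeouts are placeholders to tune, not recommendations.

    // A sketch of a per-database instance cache with pooling and idle eviction.
    // getCredentials(dbId) is a hypothetical helper that decrypts the stored
    // credentials from the internal operations DB.
    const { Sequelize } = require('sequelize');

    const IDLE_MS = 10 * 60 * 1000;  // close an instance after 10 min of inactivity
    const instances = new Map();     // dbId -> { sequelize, lastUsed }

    async function getSequelize(dbId) {
      const cached = instances.get(dbId);
      if (cached) {
        cached.lastUsed = Date.now();
        return cached.sequelize;
      }

      const creds = await getCredentials(dbId);
      const sequelize = new Sequelize(creds.database, creds.username, creds.password, {
        host: creds.host,
        port: creds.port,
        dialect: 'postgres',
        logging: false,
        pool: {
          max: 5,        // per-customer cap; keep it small since there are many DBs
          min: 0,        // let the pool drain to zero when idle
          idle: 10000,   // ms a connection may sit unused before the pool releases it
          acquire: 30000 // ms to wait for a free connection before erroring
        }
      });

      instances.set(dbId, { sequelize, lastUsed: Date.now() });
      return sequelize;
    }

    // Periodically close instances that haven't been used recently.
    setInterval(async () => {
      const now = Date.now();
      for (const [dbId, entry] of instances) {
        if (now - entry.lastUsed > IDLE_MS) {
          instances.delete(dbId);
          await entry.sequelize.close().catch(() => {});
        }
      }
    }, 60 * 1000);

The main config things to weigh per pool are max/min, the idle and acquire timeouts, and, globally, how many cached instances you allow to stay open at once relative to the max_connections setting on the customers' Postgres servers.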
Related
I am re-designing a project I built a year ago when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit, and I'd like to expand on these new skills.
The application receives real-time data from an API, which I clean up, write to a database, and broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: receives the incoming data and passes it through my schema; I then send it to my socket-server.
socket-server: handles the WSS connection to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a Postgres database. I would like to turn this into clusters eventually, since I am using Node.js, and will incorporate Redis; then I will run it behind an ALB with sticky sessions etc. across multiple EC2 instances.
RDS: the Postgres table that socket-server writes incoming scraper and api-m1 data to. RDS is used to fetch the most recent stored data along with user profile config data. NOTE: the main RDS data table will have at most 120-150 UID records with 6-7 columns.
To help visualize this, see the image below.
From a database perspective, what would be the quickest way to write my data to RDS,
assuming we have 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper during peak times? At the end of each day I tear down the database with a Lambda function and start again (the data is only temporary and does not need to be kept for any prolonged period of time).
1. Should I INSERT each record using a SERIAL id, then from the frontend fetch the most recent rows based on the uid?
2.a Should I UPDATE each UID so I'd have a fixed N rows of data which I just search and update? (I can see this bottlenecking with my Postgres client.)
2.b Still use UPDATE but do BATCHED updates? (What issues will I run into if I run multiple clusters, i.e. will I hit concurrency problems where table record XYZ gets an older value overwriting a more recent value because I'm using BATCH UPDATE with Node clusters?)
My concern is that UPDATEs are slower than INSERTs, but I don't need to squeeze out every last bit of speed. This section of the application isn't CPU heavy, and the rt-data isn't that intensive.
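For option 2.b, here is a sketch of what a batched upsert could look like with node-postgres. The table and column names (rt_data, uid, price, ts) are assumptions, and it presumes a unique constraint on uid. Deduplicating the buffer by uid and comparing timestamps in the ON CONFLICT clause is one way to keep an older batched value from overwriting a newer one when several cluster workers write concurrently.

    // Sketch of option 2.b with node-postgres: buffer records, keep only the
    // newest one per UID, and flush them in a single multi-row upsert.
    // Assumes rt_data has a unique constraint on uid.
    const { Pool } = require('pg');
    const pool = new Pool({ connectionString: process.env.DATABASE_URL });

    const pending = new Map(); // uid -> latest record seen since the last flush

    function enqueue(record) {
      const prev = pending.get(record.uid);
      if (!prev || prev.ts <= record.ts) pending.set(record.uid, record);
    }

    async function flush() {
      if (pending.size === 0) return;
      const batch = [...pending.values()];
      pending.clear();

      // Build a multi-row VALUES list: ($1,$2,$3), ($4,$5,$6), ...
      const values = [];
      const placeholders = batch.map((r, i) => {
        values.push(r.uid, r.price, r.ts);
        return `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`;
      });

      // The WHERE clause stops an older batched value from overwriting a newer
      // one, which matters once several cluster workers are writing.
      await pool.query(
        `INSERT INTO rt_data (uid, price, ts)
         VALUES ${placeholders.join(', ')}
         ON CONFLICT (uid) DO UPDATE
           SET price = EXCLUDED.price, ts = EXCLUDED.ts
           WHERE rt_data.ts < EXCLUDED.ts`,
        values
      );
    }

    // Flush every 500 ms instead of hitting Postgres once per record.
    setInterval(() => flush().catch(console.error), 500);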
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
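If it helps, a minimal sketch of that Redis approach, assuming records shaped like { uid, price, ts } and the node-redis v4 client; the key prefix and channel name are made up:

    // Hypothetical sketch: keep only the latest record per UID in Redis.
    const { createClient } = require('redis');

    const redis = createClient({ url: process.env.REDIS_URL });

    async function saveLatest(record) {
      const key = `rt:${record.uid}`;          // one hash per UID
      await redis.hSet(key, {
        price: String(record.price),
        ts: String(record.ts)
      });
      await redis.expire(key, 60 * 60 * 24);   // the data is torn down daily anyway
      // Fan the update out to the socket-server layer as well.
      await redis.publish('rt-updates', JSON.stringify(record));
    }

    async function main() {
      await redis.connect();
      await saveLatest({ uid: 'XYZ', price: 42.5, ts: Date.now() });
      await redis.quit();
    }

    main().catch(console.error);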
I'm building an API that is meant to query from multiple databases. All of these databases are isolated instances of the same data structure, so the idea is for the request to specify which database to point at. The number of databases is dynamic. Is there a way for a module to set up a varying number of databases when starting up?
I've tried using typeorm and connecting to the specific db when asked, but that adds some time to the request, so I wanted to know if there is a way to have them all set up.
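One way to avoid paying the connection cost on every request is to create each TypeORM DataSource lazily the first time a request targets that database and then reuse it. A rough sketch, where getDbConfig(dbKey) is a hypothetical lookup for the per-database options and the postgres type is just for illustration:

    // Sketch: lazily create one TypeORM DataSource per target database and
    // reuse it on later requests. getDbConfig(dbKey) is a hypothetical lookup
    // for that database's host/credentials/entities.
    const { DataSource } = require('typeorm');

    const dataSources = new Map(); // dbKey -> initialized DataSource

    async function getDataSource(dbKey) {
      const existing = dataSources.get(dbKey);
      if (existing) return existing;

      const config = await getDbConfig(dbKey);
      const ds = new DataSource({
        type: 'postgres',
        host: config.host,
        port: config.port,
        username: config.username,
        password: config.password,
        database: config.database,
        entities: config.entities // same entities, different physical DB
      });

      await ds.initialize();
      dataSources.set(dbKey, ds);
      return ds;
    }

    // In a request handler the connection cost is only paid on the first hit:
    // const ds = await getDataSource(req.params.dbKey);
    // const rows = await ds.query('SELECT ...');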
I need help limiting the number of connections typeORM can hold in its connectionManager.
Today I have many databases, more than 12 thousand, distributed across several servers, and each request in my application can connect to a different database because each database is tied to a user. So for each user requesting something from my API, my service runs createConnection(userParams), but I don't know how to control these connections.
I tried limiting it inside the userParams, something like
createConnection({ ...userParams, extra: { connectionLimit: 5 } })
but it seems this only limits the inner pool that is created each time. I need a way to limit the total number of connections the connectionManager can hold.
Basically I want a global pool instead of one for each connection created. Can someone please give me any hints?
It looks like what I wanted to achieve was not possible before typeorm version 0.3.6. On current versions the connectionManager no longer exists, so I'm able to control the connections myself.
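For anyone landing here on typeorm >= 0.3.x, a sketch of one way to manage this yourself: keep the DataSources in a map, cap how many stay open, and destroy the least recently used one before opening another. MAX_OPEN and buildOptions(userParams) are placeholders, not part of typeorm.

    // Sketch: cap the number of open DataSources and evict the least recently
    // used one before opening another database.
    const { DataSource } = require('typeorm');

    const MAX_OPEN = 50;
    const open = new Map(); // database key -> { ds, lastUsed }

    async function getDataSource(userParams) {
      const key = userParams.database;
      const hit = open.get(key);
      if (hit) {
        hit.lastUsed = Date.now();
        return hit.ds;
      }

      // At the cap: close whichever DataSource was used least recently.
      if (open.size >= MAX_OPEN) {
        let oldestKey = null;
        let oldestTime = Infinity;
        for (const [k, v] of open) {
          if (v.lastUsed < oldestTime) {
            oldestTime = v.lastUsed;
            oldestKey = k;
          }
        }
        const victim = open.get(oldestKey);
        open.delete(oldestKey);
        await victim.ds.destroy(); // closes that database's pool
      }

      const ds = new DataSource({
        ...buildOptions(userParams),   // your per-user connection options
        extra: { connectionLimit: 2 }  // keep each per-DB pool tiny (mysql2 option; pg uses max)
      });
      await ds.initialize();
      open.set(key, { ds, lastUsed: Date.now() });
      return ds;
    }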
I have a NodeJS project that uses MongoDB as its main database.
Normally I just use one database containing all the information (users, organizations, messages, ...).
But now I need to store one more thing - log data - which grows very, very fast.
So I'm considering storing the logs in another database to keep the current database safe and fast.
Does anyone have experience with this? Is it better than a single database?
Not a real question, the mods will certainly say. You have a few options depending on your log data and how / how often you want to access it:
Capped collections if you don't need to store the logs for a long time (see the sketch after this list)
Something like Redis to delay writing to the log and keep the app responding fast
Use a replica set to distribute the database load.
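For the capped-collection option, a minimal sketch with the official MongoDB Node driver; the database name, collection name, and size limits are assumptions to adjust:

    // Sketch of the capped-collection option with the official MongoDB driver.
    const { MongoClient } = require('mongodb');

    async function setupLogCollection() {
      const client = new MongoClient(process.env.MONGO_URL);
      await client.connect();
      const db = client.db('myapp');

      // A capped collection preserves insertion order and silently drops the
      // oldest documents once the size/count limits are reached, so old logs
      // never bloat the main database.
      await db.createCollection('logs', {
        capped: true,
        size: 256 * 1024 * 1024, // maximum size in bytes
        max: 1000000             // optional maximum number of documents
      });

      await client.close();
    }

    setupLogCollection().catch(console.error);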
I have a node.js app running on Heroku backed by a Mongo db, which breaks down like this:
The Node app connects to the db and stores the db and collection in "top level" variables (not sure if global is the right word).
The app iterates through each document in the db using the forEach() function in the node mongo driver.
Each iteration sends the document id to another function that uses the id to access fields on that document and take actions based on that data. In this case it's making requests against APIs from Amazon and Walmart to get updated pricing info. This function is also throttled so as not to make too many requests too quickly.
My question is this: how can I know it's safe to close the db connection? My best idea is to get a count of the documents, multiply that by the number of external API hits per document, increment a variable by one each time an API transaction finishes, and then test that number against the total expected; when it reaches that total, close the connection. This sounds so hackish, there has to be a better way. Any ideas?
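A less hackish route is to make the per-document work return a promise, collect those promises while iterating the cursor, and close the client only after they have all settled. A sketch, where processDocument stands in for the existing throttled Amazon/Walmart logic:

    // Sketch: give every document's work a promise, wait for all of them,
    // then close the client. processDocument is a stand-in for the existing
    // throttled price-check function.
    const { MongoClient } = require('mongodb');

    async function run() {
      const client = new MongoClient(process.env.MONGODB_URI);
      await client.connect();
      const collection = client.db('pricing').collection('products');

      const jobs = [];
      // for await iterates the cursor without loading every document at once.
      for await (const doc of collection.find({})) {
        jobs.push(processDocument(doc._id));
      }

      // Resolves once every document's API work has finished, success or failure.
      await Promise.allSettled(jobs);

      // Nothing else touches the db after this point, so closing is safe.
      await client.close();
    }

    run().catch(console.error);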