Solution for database updates without hitting the db at time intervals - node.js

Hitting a db again and again at short time intervals is a big mess: if there are 100k users logged in, the db will get a million requests every 10 seconds, which I can't afford. I have researched this issue a lot and need a proper solution for it.
(Working in Node.js & PostgreSQL)

Postgres 9.4+ provides logical decoding, which gives access to row-level changes. You can listen to the write-ahead log (WAL) of Postgres and have your application receive data as a push from the database.
You may have to build a middleware layer that does this for you. I found a good write-up that talks about utilizing logical decoding and Apache Kafka streams:
https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/
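For reference, here is a minimal sketch (not the Kafka setup from the article) of reading logical-decoding changes from Node.js with the pg module and the built-in test_decoding output plugin; the slot name is a placeholder and the server must have wal_level = logical. A production pipeline would stream the WAL through a connector rather than polling the slot like this.

const { Client } = require('pg');

async function readChanges() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();

  // One-time setup: create a logical replication slot using the built-in
  // test_decoding plugin (this errors if the slot already exists).
  await client.query(
    "SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding')"
  );

  // Consume all row-level changes recorded in the WAL since the last call.
  const { rows } = await client.query(
    "SELECT lsn, xid, data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL)"
  );
  for (const change of rows) {
    console.log(change.data); // e.g. table public.users: UPDATE: id[integer]:42 ...
  }

  await client.end();
}

readChanges().catch(console.error);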

Related

Is writing multiple INSERTS versus UPDATE faster for temporary POSTGRES databases?

I am re-designing a project I built a year ago when I was just starting to learn how to code. I used the MEAN stack back then and want to convert it to a PERN stack now. My AWS knowledge has also grown a bit and I'd like to expand on these new skills.
The application receives real-time data from an API, which I clean up and write to a database as well as broadcast to connected clients.
To better conceptualize this question I will refer to the following items:
api-m1: this receives the incoming data and passes it through my schema; I then send it to my socket-server.
socket-server: handles the WSS connections to the application's front-end clients. It also writes the data it gets from the scraper and api-m1 to a Postgres database. I would like to turn this into clusters eventually, since I am using Node.js, and will incorporate Redis. Then I will run it behind an ALB using sticky sessions etc. across multiple EC2 instances.
RDS: the Postgres table which socket-server writes the incoming scraper and api-m1 data to. RDS is used to fetch the most recent data stored, along with user profile config data. NOTE: the main RDS data table will have at most 120-150 UID records with 6-7 columns.
To help better visualize this, see the image below.
From a database perspective, what would be the quickest way to write my data to RDS?
Assume that during peak times we have 20-40 records/s from api-m1 plus another 20-40 records/s from the scraper. After each day I tear down the database using a Lambda function and start again (as the data is only temporary and does not need to be saved for any prolonged period of time).
1. Should I INSERT each record using a SERIAL id, then from the frontend fetch the most recent rows based off the UID?
2.a Should I UPDATE each UID, so I'd have a fixed N rows of data which I just search and update? (I can see this bottlenecking with my Postgres client.)
2.b Still use UPDATE but do BATCHED updates? (What issues will I run into if I make multiple clusters, i.e. will I run into concurrency problems where table record XYZ has an older value overwrite a more recent value because I'm using BATCH UPDATE with Node clusters?)
My concern is that UPDATEs are slower than INSERTs and I want to make this as fast as possible. This section of the application isn't CPU-heavy, and the real-time data isn't that intensive.
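For reference, a minimal sketch of what option 2.b could look like with node-postgres; the table and column names are made up. One INSERT ... ON CONFLICT statement per batch keeps a single row per UID, and the WHERE clause on the conflict action stops an older batch from overwriting a newer value, which addresses the concurrency worry with multiple clusters.

const { Pool } = require('pg');
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// records: [{ uid, payload, updatedAt }, ...] collected from api-m1 / the scraper
async function upsertBatch(records) {
  const values = [];
  const placeholders = records.map((r, i) => {
    values.push(r.uid, r.payload, r.updatedAt);
    const o = i * 3;
    return `($${o + 1}, $${o + 2}, $${o + 3})`;
  });

  // Insert new UIDs, update existing ones, and skip rows that are already newer.
  await pool.query(
    `INSERT INTO rt_data (uid, payload, updated_at)
     VALUES ${placeholders.join(', ')}
     ON CONFLICT (uid) DO UPDATE
       SET payload = EXCLUDED.payload, updated_at = EXCLUDED.updated_at
       WHERE rt_data.updated_at < EXCLUDED.updated_at`,
    values
  );
}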
To make my comments an answer:
You don't seem to need SQL semantics for anything here, so I'd just toss RDS and use e.g. Redis (or DynamoDB, I guess) for that data store.
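A rough sketch of that suggestion with the ioredis client (the key names and 24-hour expiry are assumptions): each UID maps to a single key that is simply overwritten on every new record, so there is no INSERT-vs-UPDATE question at all, and the keys expire on their own since the data is torn down daily anyway.

const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Overwrite the latest record for a UID; older data is simply replaced.
async function saveRecord({ uid, payload, updatedAt }) {
  await redis.set(`rt:${uid}`, JSON.stringify({ payload, updatedAt }), 'EX', 60 * 60 * 24);
}

// Fetch the most recent record for a UID (null if it has expired).
async function getRecord(uid) {
  const raw = await redis.get(`rt:${uid}`);
  return raw ? JSON.parse(raw) : null;
}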

Persist data for 24 hours

I need to build a microservice that scrapes a message once a day and persists it somewhere. It does not need to be accessible after 24 hours (it can be deleted). It doesn't really matter where or how, but I need to access it from an Express.js endpoint and return the message. Currently we use Redis and MongoDB for data persistence. It feels wrong to create a whole collection for one tiny service, and I'm not sure of an application of Redis that would fulfill this task. What's my best option? Open to any suggestions, thank you!
You can use YugabyteDB and set a table-level TTL of 24 hours, after which the data will be deleted automatically.
Redis provides an expiration mechanism out of the box. You can associate a timeout with a key, and it will be automatically deleted after the timeout has expired. Some official documentation here
Redis also provides logical databases, if you want to keep these expiring keys separate from the rest of your application, so you do not need to spin up another machine. Some official documentation here
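Since Redis is already in the stack, a minimal sketch of the expiring-key approach (the key name is an assumption, using ioredis): the scraper stores the day's message with a 24-hour TTL and the Express endpoint just reads it back.

const Redis = require('ioredis');
const redis = new Redis(process.env.REDIS_URL);

// Called once a day by the scraper; the key deletes itself after 24 hours.
async function storeDailyMessage(message) {
  await redis.set('daily:message', message, 'EX', 60 * 60 * 24);
}

// Express handler: returns 404 once the key has expired.
async function getDailyMessage(req, res) {
  const message = await redis.get('daily:message');
  if (message === null) return res.status(404).send('No message today');
  res.send(message);
}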

mongodb Atlas server - slow return

So I understand how some queries can take a while, and querying the same information many times can just eat up RAM.
I am wondering: is there a way to make the following query more friendly for real-time requests?
const LNowPlaying = require('mongoose').model('NowPlaying');
const query = LNowPlaying.findOne({ history: [y] }).sort({ _id: -1 });
We have our iOS and Android apps that request this information every second, which takes a toll on MongoDB Atlas.
We are wondering if there is a way in Node.js to cache the data that is returned for at least 30 seconds, and then fetch the new now-playing data when the data has changed.
(NOTE: we have a listener script that listens for song metadata changes and updates NowPlaying for every listener.)
MongoDB will try to cache queried data in memory when possible, but the frequent queries mentioned may still put too much load on the database.
You could use Redis, Memcached, or even an in-memory cache on the Node.js side to hold the query results for a time. The listener script referenced could invalidate the cache each time an update occurs for a song's metadata to ensure clients get the most up-to-date data. One example of an agnostic cache client for Node.js is catbox.
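A minimal in-process sketch of that idea (the 30-second TTL and the query come from the question; everything else is an assumption): serve a cached result for up to 30 seconds and let the listener script clear the cache whenever the song metadata changes. Note that this cache lives in a single Node.js process; a shared store like Redis or catbox would be needed across multiple instances.

const LNowPlaying = require('mongoose').model('NowPlaying');

const CACHE_TTL_MS = 30 * 1000;
let cached = null;   // last query result
let cachedAt = 0;    // when it was fetched

async function getNowPlaying(y) {
  if (cached && Date.now() - cachedAt < CACHE_TTL_MS) return cached;
  cached = await LNowPlaying.findOne({ history: [y] }).sort({ _id: -1 }).exec();
  cachedAt = Date.now();
  return cached;
}

// Call this from the listener script whenever the song metadata changes,
// so the next request re-queries MongoDB instead of serving stale data.
function invalidateNowPlayingCache() {
  cached = null;
}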

MongoDB Multiple database vs single database

I have a Node.js project that uses MongoDB as its main database.
Normally, I just use one database to contain all the information (users, organizations, messages, ...).
But now I need to store one more thing - log data - which grows very, very fast.
So I am considering storing the logs in another database to keep the current database safe and fast.
Does anyone have experience with this? Is that better than a single database?
The mods will certainly say this is not a real question. You have a few options, depending on your log data and how / how often you want to access it:
Capped collections if you don't need to store the logs for a long time
Something like Redis to delay writing to the log and keep the app responding fast
Use a replica set to distribute the database load.
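As a small sketch of the capped-collection option with the official mongodb driver (the database name and size limits are assumptions): a capped collection preserves insertion order and silently discards the oldest entries once it hits its size cap, so the log data cannot grow without bound.

const { MongoClient } = require('mongodb');

async function setupLogCollection() {
  const client = await MongoClient.connect(process.env.MONGO_URL);
  const db = client.db('myapp');

  // Fixed-size collection: the oldest documents are dropped when the cap is reached.
  await db.createCollection('app_logs', {
    capped: true,
    size: 100 * 1024 * 1024, // 100 MB on-disk cap
    max: 1000000,            // optional cap on the number of documents
  });

  return db.collection('app_logs');
}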

How to connect pinoccio to apache couchdb

Is there anyone using the nice pinoccio from www.pinocc.io ?
I want to use it to post data into an Apache CouchDB using node.js. So I'm trying to poll data from the Pinoccio API, but I'm a little lost as to whether I should:
schedule the polls
do long polls
do a completely different approach
Any ideas are welcome
Pitt
Sure. I wrote the Pinoccio API; here's how you do it:
https://gist.github.com/soldair/c11d6ae6f4bead140838
This example depends on the pinoccio npm module ~0.1.3, so make sure to npm install again to pick up the newest version.
You don't need to poll, because Pinoccio will send you changes as they happen if you have an open connection to either "stats" or "sync". If you want to poll you can, but it's not "real time".
Sync gives you the current state plus a stream of changes as they happen, so it's perfect if you only need to save the changes to your troop while your script is running, or show the current and last known state on a web page.
The solution that replicates every data point we store is stats. This is the example provided. Stats lets you read everything that has happened to a scout. Digital pins, for example, are the "digital" report. You can ask for data from a specific point in time or just from the current time (default). Changes to this "digital" report will continue streaming live as they happen, until the "end" time is reached, or if "tail" equals 0 in the options passed to stats.
Hope this helps. I tested the script on my local CouchDB and it worked well. You would need to modify it to copy more stats from each scout. I hope that soon you will be able to request multiple reports from multiple scouts in the same stream; I just have some bugs to sort out ;)
You need to look into 2 dimensions:
node.js talking to CouchDB. This is well understood and there are some questions you can find here.
Getting the data from the pinoccio. The API suggests that as long as the connection is open, you get data. So use a short timeout and a loop. You might want to run your own node.js instance for that.
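For the first dimension, a minimal sketch of writing a reading to CouchDB from Node.js (the database name and local URL are assumptions); CouchDB's plain HTTP API is enough here, no client library needed, and Node 18+ ships a global fetch.

// POST one document to CouchDB over its HTTP API.
async function saveReading(reading) {
  const res = await fetch('http://localhost:5984/pinoccio', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ ...reading, savedAt: new Date().toISOString() }),
  });
  if (!res.ok) throw new Error(`CouchDB responded with ${res.status}`);
  return res.json(); // { ok: true, id: '...', rev: '...' }
}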
Interesting fact: the CouchDB team seems to be working on replacing their internal JS engine with node.js.
