Saving chat transcripts in nosql databases - node.js

I'm building a chat server app using Node.js.
I need to save chat transcripts in a database and would like
to play with a NoSQL database like MongoDB. If I were in the relational DB world, I would
create users, chat_sessions and chat_messages tables and, for every new message, I'd
append a new record to the chat_messages table.
What is the best approach in NoSQL?
Do I have to create a chat_session document and, inside it, maintain
a chat_messages structure that is updated for every new message, or
is there a better way to do it with NoSQL/MongoDB?

You would use a similar approach and insert each new message as a separate document into a collection (possibly one per room or channel).
Documents in MongoDB have a 16 MB limit, so storing the entire history in one document, which grows in an unbounded fashion, would be a bad choice.
You can de-normalize usernames and store them on the messages themselves to avoid querying two collections (this would likely be a join in a relational database).
It may also make sense to split the data into multiple databases rather than collections (per room, channel, or whatever), because MongoDB currently has a database-level write lock. This would allow you to achieve greater write concurrency with MongoDB and Node.js.
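In Node.js, a minimal sketch with the official mongodb driver could look like the following (the database, collection and field names here are only examples, not anything prescribed above):

// Node.js, official 'mongodb' driver
const { MongoClient } = require('mongodb');

async function example() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const messages = client.db('chat').collection('chat_messages');

  // One document per message; the username is de-normalized onto the message
  // so rendering a transcript needs no extra lookup against a users collection.
  await messages.insertOne({
    room: 'general',
    username: 'alice',
    text: 'hello world',
    sentAt: new Date()
  });

  // A transcript is just a sorted query over a single room's messages.
  const transcript = await messages.find({ room: 'general' }).sort({ sentAt: 1 }).toArray();
  console.log(transcript);

  await client.close();
}

example().catch(console.error);

An index on { room: 1, sentAt: 1 } would keep that transcript query cheap as the collection grows.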

Related

How to structure Redis for data schemas with too many associations - mongodb

I have an application with more than 10 individual data models, and each of these models is deeply associated with the others. For example, a user model is associated with other data models like posts, comments, replies, connections, etc. I am trying to build a Redis cache system that caches the data every time a query is made. Consider the scenario where a post is upvoted: when this happens, I have to update every model that is somehow associated with that query.
So my question is: how should I structure my Redis cache system so that all related data is updated every time such a query is made?
That is a very broad question, and I don't know what your DB schema and entity relations look like, but I have a few suggestions that I hope will guide you in structuring your data.
Break down your entities
Store user, post, comment and reply entities separately. When you need to get a post, for example, fetch the post and all its related entities separately from Redis, then merge them to build the response.
Use keys like POST:345, USER:23, COMMENT:567.
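A rough sketch of that read path with the node-redis client (the key layout and the omitted cache-miss handling are assumptions, not something from your schema):

const { createClient } = require('redis');

async function getPostResponse(postId) {
  const redis = createClient();
  await redis.connect();

  // Each entity lives under its own key as a JSON string (POST:<id>, USER:<id>, ...).
  const post = JSON.parse(await redis.get(`POST:${postId}`));
  const user = JSON.parse(await redis.get(`USER:${post.userId}`));
  const comments = JSON.parse(await redis.get(`POST:${postId}:comments`));

  await redis.quit();

  // Merge the separately cached entities into one response object.
  // (Falling back to the primary database on a cache miss is omitted here.)
  return { ...post, user, comments };
}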
Don't store everything in redis
Maintaining a cache is more difficult than it looks. Store only the data you access most frequently and that will really make an impact when served from the cache. For example, caching user profiles will improve all post responses, comment responses, connection lists, etc., because all of them include user objects and those will already be cached.
Increment stats directly in redis
Values such as like and comment counts can be incremented and decremented directly in Redis.
Invalidate cache on update
When an entity is updated, don't update its cache entry. Simply delete it from the cache, and the next get call will cache the updated data. This keeps the code simple.
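The last two points together, again as a small node-redis sketch (key names are illustrative):

const { createClient } = require('redis');

async function upvotePost(postId) {
  const redis = createClient();
  await redis.connect();

  // Bump the counter directly in Redis instead of rebuilding the whole cached post.
  await redis.incr(`POST:${postId}:likes`);

  await redis.quit();
}

async function updatePost(postId, newFields) {
  // ... write newFields to the primary database here ...

  const redis = createClient();
  await redis.connect();

  // Don't rewrite the cache entry; just drop it and let the next read repopulate it.
  await redis.del(`POST:${postId}`);

  await redis.quit();
}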

Listen to changes of all databases in CouchDB

I have a scenario where there are multiple (~1000 - 5000) databases being created dynamically in CouchDB, similar to the "one database per user" strategy. Whenever a user creates a document in any DB, I need to hit an existing API and update that document. This need not be synchronous. A short delay is acceptable. I have thought of two ways to solve this:
Approach 1:
Continuously listen to the changes feed of the _global_changes database.
Get the name of the database which was updated from the feed.
Call the /{db}/_changes API with the seq (stored in Redis).
Fetch the changed document, call my external API and update the document.
Approach 2:
Continuously replicate all databases into a single database.
Listen to the /_changes feed of this database.
Fetch the changed document, call my external API and update the document in the original database (I can easily keep track of which document originally belongs to which database).
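Roughly, I imagine approach 1 looking something like this in Node.js (assuming CouchDB 2.x, where the /_db_updates endpoint is backed by the _global_changes database; the external API call is a placeholder, and the per-database sequences live in a Map here rather than Redis just to keep the sketch self-contained):

// Node 18+ (global fetch available)
const COUCH = 'http://admin:password@localhost:5984';
const seqByDb = new Map(); // stand-in for the seq values stored in Redis

async function followGlobalChanges() {
  let since = 'now';
  while (true) {
    // Long-poll the global changes feed to learn which database was written to.
    const res = await fetch(`${COUCH}/_db_updates?feed=longpoll&since=${encodeURIComponent(since)}`);
    const { results, last_seq } = await res.json();
    for (const { db_name, type } of results) {
      if (type === 'created' || type === 'updated') {
        await processDb(db_name);
      }
    }
    since = last_seq;
  }
}

async function processDb(dbName) {
  // Per-database checkpoint so only documents changed since the last run are fetched.
  const since = seqByDb.get(dbName) || 0;
  const url = `${COUCH}/${encodeURIComponent(dbName)}/_changes?since=${encodeURIComponent(since)}&include_docs=true`;
  const { results, last_seq } = await (await fetch(url)).json();
  for (const change of results) {
    if (change.doc) {
      // Call the existing external API here, then save the updated document back.
      // await callExternalApi(change.doc);
    }
  }
  seqByDb.set(dbName, last_seq);
}

followGlobalChanges().catch(console.error);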
Questions:
Does any of the above make sense? Will it scale to 5000 databases?
How do I handle failures? It is critical that the API be hit for all documents.
Thanks!

Real-Time Database Messaging

We've got an application in Django running against a PGSQL database. One of the functions we've grown to support is real-time messaging to our UI when data is updated in the backend DB.
So... for example we show the contents of a customer table in our UI, as records are added/removed/updated from the backend customer DB table we echo those updates to our UI in real-time via some redis/socket.io/node.js magic.
Currently we've rolled our own solution for this entire thing using overloaded save() methods on the Django table models. That actually works pretty well for our current functions, but as tables continue to grow into GBs of data, it is starting to slow down on some of the larger tables as our engine digs through the currently 'subscribed' UIs and works out which updates need to be messaged to which clients.
Curious what other options might exist here. I believe MongoDB and other no-sql type engines support some constructs like this out of the box but I'm not finding an exact hit when Googling for better solutions.
Currently we've rolled our own solution for this entire thing using
overloaded save() methods on the Django table models.
Instead of working at the app level, you might want to work at the lower, database level.
Add a PostgreSQL trigger that fires after row insertion, and use pg_notify to notify external apps of the change.
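For the trigger side, a hedged sketch, here installed from Node with the pg client (the table, function and channel names are just examples; note that NOTIFY payloads are capped at roughly 8 kB, so for wide rows you would send only the primary key):

const { Client } = require('pg');

async function installCustomerNotifyTrigger() {
  const client = new Client({ connectionString: 'postgres://username@localhost/database' });
  await client.connect();

  // The trigger function packages the new row as JSON and publishes it with pg_notify;
  // pg-pubsub (below) parses JSON payloads automatically.
  await client.query(`
    CREATE OR REPLACE FUNCTION notify_customer_change() RETURNS trigger AS $$
    BEGIN
      PERFORM pg_notify('channelName', row_to_json(NEW)::text);
      RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;
  `);

  await client.query(`
    DROP TRIGGER IF EXISTS customer_change ON customer;
    CREATE TRIGGER customer_change
      AFTER INSERT OR UPDATE ON customer
      FOR EACH ROW EXECUTE PROCEDURE notify_customer_change();
  `);

  await client.end();
}

installCustomerNotifyTrigger().catch(console.error);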
Then in NodeJS:
var PGPubsub = require('pg-pubsub');
var pubsubInstance = new PGPubsub('postgres://username@localhost/database');
pubsubInstance.addChannel('channelName', function (channelPayload) {
// Handle the notification and its payload
// If the payload was JSON it has already been parsed for you
});
See that and that.
And you will be able to do the same in Python: https://pypi.python.org/pypi/pgpubsub/0.0.2.
Finally, you might want to use data partitioning in PostgreSQL. Long story short, PostgreSQL already has everything you need :)

Multiple PouchDB to single CouchDB

I need to submit data from multiple mobile apps. In my mobile app I am planning to use PouchDB to store the documents; later I want these documents to sync to CouchDB, one-way only.
What will happen if I submit data from multiple devices? Will PouchDB create the same document IDs and overwrite data in CouchDB?
The document IDs will not be the same, assuming you're letting PouchDB create them (the likelihood of PouchDB generating the same ID twice is extremely low).
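For example (database names and the URL are placeholders):

const PouchDB = require('pouchdb');

const localDB = new PouchDB('reports');
const remoteDB = new PouchDB('http://example.com:5984/reports');

// post() lets PouchDB generate a random _id, so documents created on different
// devices will not collide when they are pushed into the same CouchDB database.
localDB.post({ type: 'report', createdAt: new Date().toISOString() });

// One-way replication: local changes are pushed up, nothing is pulled down.
localDB.replicate.to(remoteDB, { live: true, retry: true })
  .on('error', (err) => console.error(err));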

PouchDB - start local, replicate later

Does it create any major problems if we always create and populate a PouchDB database locally first, and then later sync/authenticate with a centralised CouchDB service like Cloudant?
Consider this simplified scenario:
1. You're building an accommodation booking service such as a hotel search or Airbnb.
2. You want people to be able to favourite/heart properties without having to create an account, and will use PouchDB to store this list
(i.e. the idea is to not break their flow by making them create an account when it isn't strictly necessary).
3. If users wish to opt in, they can later create an account and receive credentials for a "server side" database to sync with.
At the point of step 3, once I've created a per-user CouchDB database server-side and assigned credentials to pass back to the browser for sync/replication, how can I link that up with the PouchDB data already created? i.e.
Can PouchDB somehow just reuse the existing database for this sync, therefore pushing all existing data up to the hosted CouchDB database, or..
Instead do we need to create a new PouchDB database and then copy over all docs from the existing (non-replicated) one to this new (replicated) one, and then delete the existing one?
I want to make sure I'm not painting myself into any corner I haven't thought of, before we begin the first stage, which is supporting non-replicated PouchDB.
It depends on what kind of data you want to sync from the server, but in general, you can replicate a pre-existing database into a new one with existing documents, just so long as those document IDs don't conflict.
So probably the best idea for the star-rating model would be to create documents client-side with IDs like 'star_<timestamp>' to ensure they don't conflict with anything. Then you can aggregate them with a map/reduce function.
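A small sketch of that, with placeholder names:

const PouchDB = require('pouchdb');

const localDB = new PouchDB('favourites');

// Prefixed, timestamp-based IDs keep these documents from clashing with anything
// that may already exist in the server-side database.
function favourite(propertyId) {
  return localDB.put({
    _id: 'star_' + Date.now(),
    property: propertyId
  });
}

// Once the user opts in and receives credentials, push the existing local database
// to their per-user CouchDB database; no copying into a fresh local database is needed.
function linkToServer(remoteUrl) {
  return localDB.replicate.to(new PouchDB(remoteUrl));
}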
