Merge data from a local database to a remote MongoDB database - Node.js

I have a remote central database (MongoDB) that must contain all records produced by many clients using the application.
The application saves data in a local collection first, then should save it in the remote database.
We're good until now... the problem comes when there is no internet and we go into offline mode.
So let's suppose we have a remote central MongoDB database and 2 clients: client A and client B => local database A + local database B
(the two databases are independent).
While offline, I create document doc1 in local DB-A and document doc2 in local DB-B.
When online mode is active again, DB-A must push doc1 to the central DB and DB-B must push doc2 to the central DB.
I am using a MEAN stack application.
I have looked at Change Streams, but I am not sure how they would behave during a network break.
Any help will be much appreciated.

I am not sure about Realm and change streams; you will need to work out your own logic for your requirements.
I had the same situation and asked about it in the MongoDB forum; I got a suggestion to implement change streams, but when I tried it, it did not work for me.
After that, I developed my own logic using NodeJS, ExpressJS, and the MongoDB driver APIs to sync data from local to live and vice versa.
To explain it in simple steps and examples:
add a created-at timestamp property to every collection's documents (the _id ObjectId will also work),
ex: createdAt: ISODate("")
create 2 settings collections on the local server (one for local-to-live sync and a second for live-to-local sync) that store the db + collection name, the timestamp property, and the last synced document's timestamp,
ex: dbName.live_to_local_settings collection
{
  "db_collection": "dbName.collectionName",
  "syncProperty": "createdAt",
  "lastSync": ISODate("")
},
...
create 2 scripts on the local server (both run at the provided interval):
live-to-local sync:
loop over the dbName.live_to_local_settings documents and do the steps below for each collection
query the live collection (live."dbName.collectionName") for documents where createdAt >= settings.lastSync, sorted by createdAt in ascending order
query the local collection (local."dbName.collectionName") for only the documents found in the step above, by _id { $in: _ids }
loop over the documents found in the live collection and do the steps below:
insert the document into the local collection (local."dbName.collectionName") if it does not exist locally
update the document if the _id exists locally; there are 3 cases:
ignore the update if both dates are equal (local.createdAt == live.createdAt)
if live.createdAt < local.createdAt, merge both documents giving priority to the local document (its createdAt is the latest), then replace the whole document in both places (the local and live collections) using a replace query
else if local.createdAt < live.createdAt, merge both documents giving priority to the live document (its createdAt is the latest), then replace the whole document in both places (the local and live collections) using a replace query
update the lastSync date in the dbName.live_to_local_settings collection to that document's createdAt timestamp
local-to-live sync:
the steps above are the same, but with the query locations swapped (vice versa)
This looks complex, but it works perfectly with poor internet connectivity and resumes when the connection comes back! A rough sketch of the live-to-local pass is shown below.
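For illustration only, here is a minimal sketch of the live-to-local pass described above, using the MongoDB Node.js driver; the connection strings, database name, and interval are placeholder assumptions.

// live_to_local_sync.js - sketch only; connection strings and names are placeholders
const { MongoClient } = require('mongodb');

async function liveToLocalSync() {
  const localClient = await MongoClient.connect('mongodb://localhost:27017');        // hypothetical local server
  const liveClient = await MongoClient.connect('mongodb://live.example.com:27017');  // hypothetical live server
  const settings = localClient.db('dbName').collection('live_to_local_settings');

  for await (const s of settings.find()) {
    const [dbName, collName] = s.db_collection.split('.');
    const liveColl = liveClient.db(dbName).collection(collName);
    const localColl = localClient.db(dbName).collection(collName);

    // documents created or changed on the live side since the last sync
    const liveDocs = await liveColl
      .find({ [s.syncProperty]: { $gte: s.lastSync } })
      .sort({ [s.syncProperty]: 1 })
      .toArray();

    for (const liveDoc of liveDocs) {
      const localDoc = await localColl.findOne({ _id: liveDoc._id });
      if (!localDoc) {
        await localColl.insertOne(liveDoc);                        // missing locally: insert it
      } else if (+localDoc[s.syncProperty] !== +liveDoc[s.syncProperty]) {
        // keep the side with the newer timestamp, then replace the document in both places
        const newer = localDoc[s.syncProperty] > liveDoc[s.syncProperty] ? localDoc : liveDoc;
        const merged = { ...liveDoc, ...localDoc, ...newer };
        await localColl.replaceOne({ _id: liveDoc._id }, merged);
        await liveColl.replaceOne({ _id: liveDoc._id }, merged);
      }
      // remember how far we have synced
      await settings.updateOne({ _id: s._id }, { $set: { lastSync: liveDoc[s.syncProperty] } });
    }
  }

  await localClient.close();
  await liveClient.close();
}

setInterval(() => liveToLocalSync().catch(console.error), 60 * 1000);  // run at an interval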

Related

Delete all documents in MongoDB collections in Mongo shell

I would like to delete all documents in all MongoDB collections. Is there a way to delete all documents in all collections?
I was using db.collection.remove({}), but it only removes all documents in one collection. Is there any command to do this? I'm using NodeJS mostly, so maybe there is a way to use NodeJS to delete all documents in all collections?
Sorry if the question is dumb, I just started working with MongoDB.
As already suggested - you can either use .dropDatabase() to drop the entire database, or .collection.drop() to drop a collection. If it's just about deleting all documents in all collections, then you need to iterate over the list of collections and call either .collection.remove() or .collection.deleteMany() without any filter in the query condition.
To delete documents in each collection individually:
first list all collection names using .getCollectionNames(), then remove the documents.
let colls = db.getCollectionNames() // the Mongo shell accepts JS functions; with many collections you could also parallelize this
colls.forEach(eachColl => db[eachColl].remove({})) // or .deleteMany({})
Doing it this way, you'll still have the database and the (now empty) collections on the MongoDB server. You can come back later, check the list of collections, rename a few, etc.
But if you simply don't need to keep those collection names around, go ahead with the drop commands, preferably dropDatabase() since you wanted to delete all docs from all collections. Why is it preferred? Because, unlike SQL databases, MongoDB automatically creates a database and a collection the first time you write a document to a collection in that DB. So in MongoDB you might not need to maintain databases with empty collections.
Assume you're querying a collection named girlfriend in the mylife database - say it's already deleted/missing/never existed; .find() would return an empty array [], just like querying an empty collection. This is an advantage of MongoDB: it doesn't throw an error on mismatched names.
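Since the question mentions NodeJS, here is a minimal sketch with the official MongoDB Node.js driver; the connection string and database name are placeholders.

const { MongoClient } = require('mongodb');

async function clearAllCollections() {
  const client = await MongoClient.connect('mongodb://localhost:27017');  // hypothetical connection string
  const db = client.db('mydb');                                           // hypothetical database name
  const collections = await db.listCollections().toArray();               // every collection in the database
  for (const { name } of collections) {
    await db.collection(name).deleteMany({});                             // delete all documents, keep the collection
  }
  await client.close();
}

clearAllCollections().catch(console.error);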

Node.JS/Express - how to avoid multiple database queries

I have a basic Express app and I'm getting started with DB queries. I want to know how to avoid multiple DB queries, because I don't think the way I'm doing it is efficient:
app.get('/:word', function(req, res){
  const word = req.params.word;         // get the word from the URL
  db.create({ 'name': word });          // save it in the database
  console.log('the word is ' + word);
  res.send(word);
});
What I want to do is:
get the word from the URL
check if it exists in the database (or was previously requested, because if it was, it was probably already added through this basic code)
if it doesn't exist, add it and then proceed to the console.log
I want to add each word to my database once only and not run the DB query again and again.
Here's what I'm thinking:
Not-so-efficient way:
query to check if it exists before inserting
Good way, but I don't know how to start here:
cache the word being queried and maintain the cache to prevent DB queries
More info (edit):
I'm using MongoDB via Mongoose
the 'word' key is already unique, so I know it's not creating duplicate values
I don't want to run ANY DB queries if that value or that URL has already been hit once
The only way to check whether the word already exists is to query the database before inserting. There are libraries (and also databases) that implement a findOrCreate method, but this is always just an abstraction: behind the scenes, the database searches for an existing value before writing.
If your database is huge and querying it every time is not suitable, you could use a caching system (like Redis). But this definitely depends on your logic and your data size.
Probably you can optimize the process just by adding an index to the column you want to be unique (I guess it's name?).
You could also define the name column as unique. When inserting, the database will throw an error if the document already exists. But keep in mind again that, behind the scenes, the database queries for an existing value before inserting. The advantage of a unique column is that its index is created automatically, and in your app logic (Node.js) you can just call the insert method and add a little bit of error-handling logic.
MongoDB will create any collections you use in your app if they do not already exist.
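As a rough illustration of that insert-plus-error-handling idea (reusing the app from the question's snippet; the Word model name is an assumption):

const mongoose = require('mongoose');

// the unique option makes Mongoose create a unique index on name
const Word = mongoose.model('Word', new mongoose.Schema({
  name: { type: String, unique: true }
}));

app.get('/:word', async function (req, res) {
  try {
    await Word.create({ name: req.params.word });            // insert; fails if the word already exists
    console.log('the word is ' + req.params.word);
  } catch (err) {
    if (err.code !== 11000) return res.status(500).end();    // 11000 = duplicate key error, safe to ignore
  }
  res.send(req.params.word);
});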
Insert unique values:
Create a unique index on your key, so that each value can be added only once. If you try to add it again, it will throw an error.
To create a unique index:
db.collection.createIndex( { "name": 1 }, { unique: true } )
Caching:
For caching, store your data in a cache system (like memory-cache or Redis): the first time, the data is queried from MongoDB, and for subsequent requests you use the cache.
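A minimal in-memory caching sketch along those lines (a plain Set here; memory-cache or Redis would follow the same pattern, and the Word model is assumed as above):

const cache = new Set();                        // words we already know are in the database

app.get('/:word', async function (req, res) {
  const word = req.params.word;
  if (!cache.has(word)) {                       // not cached: fall back to the database once
    const existing = await Word.findOne({ name: word });
    if (!existing) await Word.create({ name: word });
    cache.add(word);                            // later requests for this word skip the DB entirely
  }
  res.send(word);
});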
In MongoDB you can use findOneAndUpdate with the optional flag upsert: true (see the documentation).
To ensure that every word appears only once, you should also set a unique index on that field. However, remember that a unique index is case sensitive, so Cat and cat are different words.
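A rough Mongoose sketch of that suggestion (the Word model name is an assumption):

app.get('/:word', async function (req, res) {
  const word = req.params.word;
  // inserts the word only if it is not there yet; otherwise this is a no-op
  await Word.findOneAndUpdate(
    { name: word },
    { $setOnInsert: { name: word } },
    { upsert: true }
  );
  console.log('the word is ' + word);
  res.send(word);
});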

Mongodb: big data structure

I'm rebuilding my website, which is a search engine for nicknames from the most active forum in France: you search for a nickname and you get all of its messages.
My current database contains more than 60 GB of data, stored in a MySQL database. I'm now rewriting it for MongoDB, and after importing 1 million messages (1 message = 1 document), find() started to take a while.
The structure of a document is as such:
{
  "_id" : ObjectId(),
  "message": "<p>Hai guys</p>",
  "pseudo" : "mahnickname", // the nickname (*pseudo* in my db)
  "ancre" : "774497928", // its id in the forum
  "datepost" : "30/11/2015 20:57:44"
}
I set the ancre id as unique, so I don't get the same entry twice.
Then the user enters a nickname and it finds all documents that have that nickname.
Here is the request:
Model.find({pseudo: "danickname"}).sort('-datepost').skip((r_page -1) * 20).limit(20).exec(function(err, bears)...
Should I structure it differently? Instead of having one document for each message, should I have one document per nickname and update that document whenever I get a new message from that nickname?
I was using the first method with MySQL and it wasn't taking that long.
Edit: Or maybe I should just index the nicknames (pseudo)?
Thanks!
Here are some recommendations for your big-data problem:
The ObjectId already contains a timestamp, and you can sort on it, so you could save some disk space by removing the datepost field.
Do you absolutely need the ancre field? The ObjectId is already unique and indexed. If you absolutely need it, and need to keep datepost separate too, you could make ancre your _id field.
As many mentioned, you should add an index on pseudo. This will make the "get all messages where the pseudo is mahnickname" search much faster.
If the number of messages per user is low, you could store all of them inside a single document per user. This would avoid having to skip to a specific page, which can be slow. However, be aware of the 16 MB document limit. I would personally still keep them in multiple documents.
To keep fast query speeds, ensure that all your indexed fields fit in RAM. You can see the RAM consumption of indexed fields by typing db.collection.stats() and looking at the indexSizes sub-document.
Would there be a way for you to avoid skipping documents and instead page by the time a message was written to the database? If so, use the datepost field or the timestamp in _id for your paging strategy. If you decide on using datepost, create a compound index on pseudo and datepost (see the sketch after these recommendations).
As for your benchmarks, you can closely monitor MongoDB by using mongotop and mongostat.
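For illustration, a sketch of that compound index plus range-based paging in the mongo shell; the collection name messages and the lastSeenDatepost variable are placeholders, and it assumes datepost is stored as a real Date (the string format shown in the question would not sort chronologically):

// compound index supporting "all messages of a pseudo, newest first"
db.messages.createIndex({ pseudo: 1, datepost: -1 })

// page 1: newest 20 messages for the nickname
db.messages.find({ pseudo: "mahnickname" }).sort({ datepost: -1 }).limit(20)

// next page: continue from the datepost of the last message on the previous page instead of using skip()
db.messages.find({ pseudo: "mahnickname", datepost: { $lt: lastSeenDatepost } })
           .sort({ datepost: -1 }).limit(20)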

CouchDB replication strategy with dynamic groups of users

This is the situation:
We have a set of users who share some documents. The set of documents they can share might change throughout the day, and so can the documents themselves (changes and deletions). The users can change some information on the documents.
E.g.
Users | Documents
A | X
A | Y
A | Z
B | X
B | Z
C | Y
Possible groups: A+C, A+B
The CouchDB server is a replica of a SQL Server database with this data; an ETL takes care of propagating changes to CouchDB. The CouchDB database is in turn replicated on each user's phone via PouchDB.
The goal:
To replicate changes and deletions accordingly.
What we've tried:
1) we figured we'd structure our documents with a list of the users that can access them. Each document would have a "Users" array, and a filter in the design document would take care of replication to the clients. Unfortunately, document deletions and document changes that no longer pass the filter (e.g. a user is removed from the array) do not come through the filtered _changes feed, so they cannot be replicated accordingly on the clients
2) a database per user. This is not possible, because users need to see each other's work on the documents (they share them)
3) a database per group of users. Pretty much the same problem as the first solution, but worse. In fact:
- groups of users can change or disappear: how do we reflect that client-side?
- a document can move to a new group: it would have to be re-downloaded from scratch, which greatly increases the download size
- the same document can be in more than one group! (see the example above)
- each client would have to know which groups she is in every time she logs in and replicate multiple databases. Then, on the return trip, you'd have to know in which databases the document was present
Is there a recipe for this situation? Am I missing an obvious solution?
EDIT
Partial solution for case 1:
localDB.sync(remoteDB, {
  live: true,
  retry: true,
  filter: 'app/by_user',
  query_params: { "agente": agent }
})
.on('paused', function(info){
  console.log("paused");
  localDB.allDocs().then(function(docs){
    console.log("allDocs");
    docs.rows.forEach(function(row){
      console.log(row);
      remoteDB.get(row.id)
        .then(function(doc){
          if(doc.Agents.indexOf(agent) < 0){
            localDB.remove(doc);
          }
        });
    });
  });
})
.on('change', function(result){
  console.log("change!");
  result.change.docs.forEach(function(change) {
    if(!change.deleted){
      $rootScope.$apply(function(){
        $rootScope.$broadcast('upsert', change);
      });
    }
  });
});
Each remove() is giving me a 409 (conflict), and rightfully so. Is there a way to tell Pouch "no longer consider this as replicable and just remove it from my DB?"
(3) Seems like the simplest solution to me, i.e. the "database per role" solution.
I think your difficulty stems from trying to manage permissions inside the documents themselves (and then using filtering replication). When you do that, you are basically trying to mirror CouchDB's permission system inside your documents, which is going to cause headaches.
Why not create a database per role, and assign roles to users using the normal _users database? If roles change, then users will lose or gain access to a set of documents. You would need to have server endpoints to handle the role-shuffling, or you would need to set up separate "admin" databases with special privileges, where users can change the roles.
Then on the client side, you can either replicate from the multiple CouchDB databases into separate local PouchDBs (and then collate the results together yourself), or replicate them all into a single PouchDB (probably a bad idea if you need to sync bidirectionally). Obviously you would need an initial step where you determine which databases the user has access to, but that's a small downside in my opinion.
Then if the user loses access to a document, they will simply get normal 401 errors during replication (which will show up in the 'denied' event during live replication). No need for ddocs or filtered replication - much simpler!
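A rough sketch of what the client side of that could look like with PouchDB; the role names, database naming scheme, and server URL are assumptions:

// one local database per role the user belongs to (e.g. roles fetched from GET /_session)
var roles = ['sales', 'support'];                                       // hypothetical roles
var databases = roles.map(function (role) {
  var local = new PouchDB('role_' + role);
  var remote = new PouchDB('https://couch.example.com/role_' + role);   // hypothetical URL scheme
  local.sync(remote, { live: true, retry: true })
    .on('denied', function (err) {
      console.log('lost access to', role, err);                         // user no longer has this role
    });
  return local;
});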
We arrived at the conclusion that:
1) our use case might not be what CouchDB is good for
2) we value our mental health: after almost a month struggling with this problem, we'd rather try and fail
3) documents are relatively inexpensive, so even if they stay on the user's phone that won't cause any major distress. If the data builds up too much, the user can simply clear the data and start fresh
Solution:
1) keep the architecture as in point 1
2) after each 'paused' event fires, compare local docs with remote docs; if a remote doc doesn't pass the filter, remove it from the UI. Should there be a way to remove the local document only, we'd be very interested in upgrading to that logic.
1) still sounds like the simplest approach to me...
I don't know PouchDB very well, but in plain CouchDB, the problem of changes on deleted documents can be worked around by keeping extra attributes on the deleted document, using your own custom delete function.
I mean... a delete is just an update which sets the _deleted attribute to true.
So, instead of deleting documents directly with the normal CouchDB CRUD DELETE, you can create an update function like this:
function(doc, req){
  // optional ACLs for deleting the doc.. doc is owned by req.userCtx.name
  // doc.users are users already granted to work with this doc
  return [{
    "_id" : doc._id,
    "_rev": doc._rev,
    "_deleted": true,
    "users": doc.users
  }, "Ok doc deleted"];
}
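For example, assuming the function above is saved as drop in a design document named app (both names are assumptions), a client could call it instead of issuing a plain DELETE:

// hypothetical design document "app" and update function "drop"
fetch('https://couch.example.com/mydb/_design/app/_update/drop/' + doc._id, { method: 'PUT' })
  .then(function (res) { return res.text(); })
  .then(function (msg) { console.log(msg); });   // "Ok doc deleted"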
Furthermore, using document rewriting rules, this update function can even be called when submitting an HTTP DELETE request (not only on PUT or POST). In this way your delete behaviour becomes totally transparent to the client, and you delete in a way that is more useful for your use case.
The Smileupps Chatty couchapp tutorial app uses this approach: extended deletes for different document types are performed within user/drop.js, profile/drop.js, chat/drop.js files

MongoDB: Copy a collection of referenced documents as subdocuments

I made the mistake of designing a schema with two collections, where documents in one contain a manual reference to the other. I realize now that I should have designed it so that the parent collection contains the other collection's documents as sub-documents instead.
The problem is, I've already put this schema into a production environment where hundreds of entries have already been created. What I'd like to do is somehow scan over all of the existing data and copy each item into its referenced parent document as a sub-document.
Here is an example of my schema:
Collection 1 - User
_id
Name
Collection 2 - Photos
_id
url
user_id
Is there a quick way to change the existing documents to be one collection like this:
Collection - User
_id
Name
Photos: [...]
Once I have the database set up correctly, I can easily modify my code to use the new structure, but the problem I'm having is figuring out how to quickly/procedurally copy the documents to their parents.
Additional detail - I'm using MongoHQ.com to host my MongoDB.
Thank You.
I don't know the specifics of your environment, but this sort of change usually involves the following kinds of steps:
Ensure that your old code doesn't complain if there is a Photos array in the User object.
"Freeze" the application so that new User and Photo documents are not created.
Run a migration script that copies the Photo documents into the User documents. This should be pretty easy to write either in JavaScript or through app code using the driver (see the example below).
Deploy the new version of the application that expects Photos to be embedded in the array.
"Unfreeze" the application to start creating new documents again.
If you cannot "freeze/unfreeze", you will need to run a delta script after step 4 that migrates the Photo documents created while the new application was being deployed (a sketch follows the migration script below).
The script will look something like this (untested):
db.User.find().forEach(function (u) {
  u.Photos = new Array();
  db.Photos.find({ user_id: u._id }).forEach(function (p) {
    u.Photos.push(p);
  });
  db.User.save(u);
});
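If a delta pass is needed (the no-freeze case mentioned above), a similar script could restrict itself to Photos created after a cut-off; the sketch below derives the cut-off from the ObjectId timestamp, which assumes the default auto-generated _id, and the date itself is a placeholder:

// Photos whose ObjectId was generated after the cut-off time
var cutoff = new Date("2014-01-01T00:00:00Z");   // hypothetical time the migration script ran
var cutoffId = ObjectId(Math.floor(cutoff.getTime() / 1000).toString(16) + "0000000000000000");

db.Photos.find({ _id: { $gt: cutoffId } }).forEach(function (p) {
  db.User.update({ _id: p.user_id }, { $push: { Photos: p } });
});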
