TTL mongo for specific entry array - node.js

I read that MongoDB has TTL (time to live) indexes that can be enabled on documents.
But does it work if document structure is as follows?
{
    username: 'user x',
    activity: [
        {type: a, desc: 1, timestamp: timestamp},
        {type: b, desc: 2, timestamp: timestamp},
        {type: b, desc: 3, timestamp: timestamp},
        etc.
    ]
}
Is there a way to set a TTL of timestamp + 7 days on each array item, so that only expired items are removed and the recent ones are kept?

Read the documentation carefully: a TTL index can be applied to an array field, but when it expires it deletes the whole document, not just the element inside the array.
However, you could split the array out into many documents.

There's currently no way to delete specific elements from an array using a TTL index. There's a feature request for this, but it seems like at the moment the best way to do this is to create a separate collection that links to the _id of your documents.
So in your case, instead of adding an activity array to all your users, you create a separate activities collection which contains documents like this:
{type:a, desc:1, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
{type:b, desc:2, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
{type:b, desc:3, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
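With that layout, a TTL index on the activities collection will expire each activity document seven days after its timestamp. A minimal sketch in the mongo shell, assuming timestamp is stored as a BSON Date (which TTL indexes require):
// Each activity document is removed roughly 7 days (604800 seconds) after its timestamp.
db.activities.createIndex(
    { timestamp: 1 },
    { expireAfterSeconds: 60 * 60 * 24 * 7 }
)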

Related

How to check for duplication before creating a new document in CouchDB/Cloudant?

We want to check if a document already exists in the database with the same fields and values as a new object we are trying to save, to prevent duplicated items.
Note: This question is not about updating documents or about duplicated document IDs, we only check the data to prevent saving a new document with the same data of an existing one.
Preferably we'd like to accomplish this with Mango/Cloudant queries and not rely on views.
The idea so far is:
1) Scan the data that we are trying to save and dynamically create a selector that matches that document's structure. (We can't hardcode the selectors because we have many types of documents.)
2) Query the DB for any documents matching that selector, to see if a document with those values already exists.
However, I wonder about the performance of this approach, since many of the selector fields will not be indexed.
I also much rather follow best practices than create something out of the blue, but haven't been able to find any known solutions for this specific scenario.
If you happen to know of any, please share.
Option 1 - Define a meaningful ID for your documents
The ID could be a logical composition or a computed hash of the values that should be unique.
If you want to check if a document ID already exists you can use the HEAD method
HEAD /db/docId
which returns 200 OK if the docId exists in the database.
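From Node.js, that existence check could be done with the nano client, for example (a sketch; the client choice and the database name are assumptions):
const nano = require('nano')('http://localhost:5984');
const db = nano.db.use('mydb');

async function docExists(docId) {
    try {
        await db.head(docId);              // HEAD /db/docId
        return true;                       // 200 OK: the _id already exists
    } catch (err) {
        if (err.statusCode === 404) return false;
        throw err;                         // anything else is a real error
    }
}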
If you would like to check whether the new document has the same content as the previous one, you may use the Validate Document Update Function, which allows you to compare both documents.
function(newDoc, oldDoc, userCtx, secObj) {
...
}
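A sketch of such a function, assuming type and desc are the fields that must not be duplicated between the stored revision and the new one:
function (newDoc, oldDoc, userCtx, secObj) {
    // Reject the update if the new revision carries the same key values
    // as the revision already stored.
    if (oldDoc && oldDoc.type === newDoc.type && oldDoc.desc === newDoc.desc) {
        throw({ forbidden: 'Document with the same content already exists.' });
    }
}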
Option 2 - Use content hash computed outside CouchDB
Before creating or updating a document, compute a hash from the values of the attributes that should be unique.
Include the hash in the document in a new attribute, e.g. "key_hash".
Create a Mango index on the "key_hash" attribute.
When a new doc is about to be inserted, compute the hash and search for documents with the same hash value using a Mango query before inserting.
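Putting those steps together from Node.js might look roughly like this (a sketch; the nano client, the hashed attributes, and the field names are all assumptions):
const crypto = require('crypto');

// Hash only the attributes that must be unique, in a stable order.
function keyHash(doc) {
    return crypto.createHash('sha256')
                 .update(JSON.stringify([doc.type, doc.desc]))
                 .digest('hex');
}

async function insertIfNew(db, doc) {           // db = nano database handle with a "key_hash" index
    doc.key_hash = keyHash(doc);
    const existing = await db.find({ selector: { key_hash: doc.key_hash }, limit: 1 });
    if (existing.docs.length > 0) return null;  // duplicate content already stored
    return db.insert(doc);
}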
Option 3 - Compute hash in a View
Define a view which emits the computed hash for each document as its key.
CouchDB's JavaScript support does not include hashing functions, so this could be difficult to include in a design document.
Use Erlang to define the map function, where you have access to Erlang's hashing support.
Before creating a new document, query the view with the hash that you computed beforehand.
One solution would be to take Juanjo's and Alexis's comment one step further.
Select the keys you wish to keep unique
Put the values in a string and generate a hash
Set the document's _id to that hash
PUT the document on the database.
Check the response for failure.
If another document already exists on the database with the same _id value, the PUT request will fail.
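In code, this is essentially the earlier hash idea with the hash moved into _id, so the database itself enforces uniqueness (again a sketch with nano; the hashed fields are assumptions):
const crypto = require('crypto');

async function putUnique(db, doc) {
    // Derive the _id from the values that must be unique.
    doc._id = crypto.createHash('sha256')
                    .update(JSON.stringify([doc.type, doc.desc]))
                    .digest('hex');
    try {
        return await db.insert(doc);             // PUT /db/<hash>
    } catch (err) {
        if (err.statusCode === 409) return null; // conflict: same content already exists
        throw err;
    }
}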

Is there a way to use dynamic collection names in AQL?

If I have a collection that contains collection names in it, is there a syntax in AQL that allows you to use dynamic collection names?
Here is an example of what I'm looking for. A collection called master has many documents, each with a state of Active or Disabled. Each document also has a key called collection_name, which is the name of another collection in this database.
FOR doc IN master
    FILTER doc.state == 'Active'
    FOR c IN COLLECTION(doc.collection_name)    // <--- invented command called COLLECTION
        RETURN {
            'collection_name': doc.collection_name,
            'contents': c
        }
I'm trying to retrieve all documents from all collections marked as Active in the master collection.
Is there a way to do this in one AQL query without having to break it up into an initial query on master followed by n queries for each of the collections returned?
As I've concluded from this ArangoDB issue, there is no way to truly use dynamic collection names.
However, you could use a user-defined AQL function as a workaround. See the last comment on the issue for a full explanation.
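For illustration, such a user-defined function might be registered from arangosh along these lines. Treat it as a sketch only: the module paths differ between ArangoDB versions, and whether a UDF may access the db object at all is an assumption here.
// arangosh: register a UDF that returns all documents of a named collection.
var aqlfunctions = require("@arangodb/aql/functions");

aqlfunctions.register("mine::collectionDocs", function (name) {
    var db = require("@arangodb").db;
    return db._collection(name).toArray();
}, false);

// AQL usage:
//   FOR doc IN master
//     FILTER doc.state == 'Active'
//     FOR c IN mine::collectionDocs(doc.collection_name)
//       RETURN { collection_name: doc.collection_name, contents: c }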

How can I reduce a collection by keeping only change points?

I have a Collection exampled below. This data is pulled from an endpoint every twenty minutes on a cron job.
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474116005,"_id":"kX0DpoZ5fkMr2ezg"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474684808,"_id":"ken1WRN47PTW159H"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474275606,"_id":"ko9r8u860es7E2hI"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
I want to discard any document (row) that doesn't show a change in empty (and, consequently, ready). My goal is to find the most recent timestamp at which these values changed within this collection.
Better illustrated, I want to reduce it to where the values change as such:
{"id":AFFD6,"empty":8,"capacity":15,"ready":6,"t":1474370406,"_id":"kROabyTIQ5eNoIf1"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474117205,"_id":"kes1gDlG1sBjgV1R"}
{"id":AFFD6,"empty":10,"capacity":15,"ready":4,"t":1474264806,"_id":"khILUjzGEPOn0c2P"}
{"id":AFFD6,"empty":9,"capacity":15,"ready":5,"t":1474591207,"_id":"kpLS6mCtkIiffTrN"}
Can I do this in a MongoDB query? Or am I better off with a JavaScript filter function?
MongoDB allows you to specify a unique constraint on an index. These constraints prevent applications from inserting documents that have duplicate values for the inserted fields.
Use the following code to make the id field unique:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more clarification.
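As for the change-point reduction the question asks about, the plain JavaScript filter it mentions could be as simple as this (a sketch; assumes the documents have already been fetched into an array):
// Sort by time, then keep a document only when `empty` differs from the
// previous document's value; the last element is the most recent change.
function changePoints(docs) {
    const sorted = [...docs].sort((a, b) => a.t - b.t);
    return sorted.filter((doc, i) => i === 0 || doc.empty !== sorted[i - 1].empty);
}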

Mongodb: big data structure

I'm rebuilding my website, which is a search engine for nicknames from the most active forum in France: you search for a nickname and you get all of its messages.
My current database contains more than 60 GB of data, stored in a MySQL database. I'm now rewriting it into a MongoDB database, and after retrieving 1 million messages (1 message = 1 document), find() started to take a while.
The structure of a document is as such:
{
"_id" : ObjectId(),
"message": "<p>Hai guys</p>",
"pseudo" : "mahnickname", //from a nickname (*pseudo* in my db)
"ancre" : "774497928", //its id in the forum
"datepost" : "30/11/2015 20:57:44"
}
I set the ancre field as unique, so I don't get the same entry twice.
Then the user enters the nickname and it finds all documents that have that nickname.
Here is the request:
Model.find({pseudo: "danickname"}).sort('-datepost').skip((r_page -1) * 20).limit(20).exec(function(err, bears)...
Should I structure it differently? Instead of having one document for each message, should I have a document for each nickname and update that document whenever I get a new message from that nickname?
I was using the first method with MySQL and it wasn't taking that long.
Edit: Or maybe should I just index the nicknames (pseudo)?
Thanks!
Here are some recommendations for your problem about big data:
The ObjectId already contains a timestamp. You can also sort on it. You could save on some disk space by removing the datepost field.
Do you absolutely need the ancre field? The ObjectId is already unique and indexed. If you absolutely need it, and need to keep datepost separate too, you could use ancre as your _id field instead.
As many mentioned, you should add an index on pseudo. This will make the "get all messages where the pseudo is mahnickname" search much faster.
If the amount of messages per user is low, you could store all of them inside a single Document per user. This would avoid having to skip to a specific page, which can be slow. However, be aware of the 16mb limit. I would personally still have them in multiple documents.
To keep fast query speeds, ensure that all your indexed fields fit in RAM. You can see the RAM consumption of indexed fields by typing db.collection.stats() and looking at the indexSizes sub-document.
Would there be a way for you to not skip documents, but use the time a message was written to the database for your pages? If so, use the datepost field or the timestamp in _id for your paging strategy. If you decide on using datepost, make a compound index on pseudo and datepost (sketched below).
As for your benchmarks, you can closely monitor MongoDB by using mongotop and mongostat.
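A mongo shell sketch of that compound index (the collection name messages is an assumption):
// Serves both "all messages for a pseudo" (prefix) and
// "messages for a pseudo, newest first" paging on datepost.
db.messages.createIndex({ pseudo: 1, datepost: -1 })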

How to efficiently bulk insert and update mongodb document values from an array?

I have a Tags collection which contains documents of the following structure:
{
word:"movie", //tag word
count:1 //count of times tag word has been used
}
I am given an array of new tags that need to be added/updated in the Tags collection:
["music","movie","book"]
I can update the counts of all Tags currently existing in the Tags collection by using the following query:
db.Tags.update({word:{$in:["music","movies","books"]}}, {$inc:{count:1}}, true, true);
While this is an effective strategy to update, I am unable to see which tag values were not found in the collection, and setting the upsert flag to true did not create new documents for the unfound tags.
This is where I am stuck, how should I handle the bulk insert of "new" values into the Tags collection?
Is there any other way I could better utilize the update so that it does upsert the new tag values?
(Note: I am using Node.js with mongoose, solutions using mongoose/node-mongo-native would be nice but not necessary)
Thanks ahead
The concept of using upsert and the $in operator simultaneously is incongruous. This simply will not work, as there is no way to differentiate between upsert if *any* are in and upsert if *none* are in.
In this case, MongoDB is doing the version you don't want it to do. But you can't make it change behaviour.
I would suggest simply issuing three consecutive writes by looping through the array of tags. I know it's annoying and it has a bad code smell, but that's just how MongoDB works.
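A sketch of that loop with the Node.js native driver, where each tag gets its own upsert (collection and field names taken from the question):
async function upsertTags(db, tags) {
    for (const word of tags) {
        // Increment the counter if the tag exists, otherwise create it with count 1.
        await db.collection('Tags').updateOne(
            { word },
            { $inc: { count: 1 } },
            { upsert: true }
        );
    }
}

// usage: await upsertTags(db, ['music', 'movie', 'book']);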
