How can I reduce a collection by keeping only change points? - node.js

I have a collection like the example below. The data is pulled from an endpoint every twenty minutes by a cron job.
I want to discard any document (row) that doesn't show a change in the empty value (and consequently in ready). My goal is to find the most recent timestamp at which these values changed within this collection.
Better illustrated, I want to reduce it to the rows where the values change, as such:
Can I do this in a MongoDB query? Or am I better off with a JavaScript filter function?

MongoDB allows you to specify a unique constraint on an index. These constraints prevent applications from inserting documents that have duplicate values for the indexed fields.
Use the following code to make the field unique:
db.collection.createIndex( { "id": 1 }, { unique: true } )
Also refer to the MongoDB documentation for more clarification.
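For the change-point question itself, a plain JavaScript filter is one option once the documents have been fetched. The following is a minimal sketch, not a definitive implementation: the function name changePoints and the field name timestamp are assumptions (only empty and ready appear in the question), and timestamp is assumed to be a Date or a number.

```javascript
// Keep the first document plus every document whose `empty` or `ready`
// value differs from the document immediately before it in time order.
// `timestamp` is an assumed field name (Date or number).
function changePoints(docs) {
  // Sort a copy so the caller's array is left untouched.
  const sorted = [...docs].sort((a, b) => a.timestamp - b.timestamp);
  return sorted.filter((doc, i) => {
    if (i === 0) return true; // nothing to compare the first row with
    const prev = sorted[i - 1];
    return doc.empty !== prev.empty || doc.ready !== prev.ready;
  });
}

// The last element of the result is the most recent change point.
function latestChange(docs) {
  const points = changePoints(docs);
  return points.length > 0 ? points[points.length - 1] : null;
}
```

With a collection that only grows by one row every twenty minutes, doing this in application code is probably cheap enough; a query-side approach would need to compare each document with its predecessor.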


Trying to understand mongodb indexes for finding documents with exact and unique value(s)

I am reading through the MongoDB docs for the Node.js driver, particularly the index section, and it looks like all of the indexes they mention are for sortable / searchable data. So I wanted to ask whether I need indexes for the following use case:
I have this user document structure:
email: string,
version: number,
otherData: ...
As far as I understand, I can query each user by _id, and this already has a default unique index applied to it? I also want to query users by email, so I created the following unique index:
collection.createIndex({ email: 1 }, { unique: true })
Is my understanding correct that by creating this index I guarantee that:
Email is always unique
My queries like collection.findOne({email: ''}) are optimised?
Next, I want to perform update operations on user documents, but only on specific versions, so:
collection.updateOne({email: '...', version: 2}, update)
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
Yes, the unique constraint is enforced at the DB layer, so by definition the value will be unique. It is worth mentioning that this can affect insert/update performance, as the check has to be executed on each of these operations. From my experience you only start feeling this overhead at larger scale (hundreds of millions of documents in a single collection plus thousands of inserts a minute).
Yes, there is no other way to optimize this further.
What index do I need to create in order to optimise this query? Should I be somehow looking into compound indexes for this as I am now using email and version?
You want to create a compound index; the syntax will look like this:
collection.createIndex({ email: 1, version: 1 }, { unique: true })
I will just say that by definition the (first) email index ensures uniqueness, so any additional filtering you add to the query and index will not really affect anything, as there will always be only one document with that email in the DB. Basically, why bother adding a "version" field to the query? If you need it for filtering that's fine, but then you won't need to alter the existing index.
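To make the point above concrete, here is a small sketch that keeps only the unique email index and still filters on version in the update. The helper names ensureEmailIndex and updateAtVersion are hypothetical; users is assumed to be a collection object from the Node.js driver, whose createIndex and updateOne methods are the only calls used.

```javascript
// Create the single unique index on `email`; this already narrows any
// { email, version } filter down to at most one candidate document.
function ensureEmailIndex(users) {
  return users.createIndex({ email: 1 }, { unique: true });
}

// Update a user only when the stored version matches. The extra `version`
// condition is checked against the one document found via the email index.
function updateAtVersion(users, email, version, update) {
  return users.updateOne({ email: email, version: version }, update);
}
```

Both helpers return the driver's promise, so a caller could await them and, for example, inspect matchedCount on the update result to see whether the version matched.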

Can mongoose batch update based on an array of objects that matches the collection?

I am working on a project in Express/Node, and I am utilizing a MongoDB database that has a collection of Course documents that represent a course in my school system that changes in real-time. The Course documents in my database each look like this:
Course Document
courseID: Number,
restrictions: String,
status: String,
My program has to check for changes in the school's course system and update my private MongoDB database with any changes it sees. To accomplish this, I currently have a script that looks at all the courses in the school system and records them in an array of objects, with each object corresponding to a course.
var allCourses = [
  { courseID: 123456, restrictions: "A and B", status: "OPEN" },
  { courseID: 678990, restrictions: "A", status: "FULL" }
];
The goal now is to be able to go through my database, and skip the documents that are the same as the corresponding javascript object in the array, and update those that are not.
Obviously, I could just iterate through my array with forEach, and update every single course by filtering by 'courseID' and updating both fields one document at a time, but I can foresee that this would take a large amount of time.
I was wondering if there was a batch update function, similar to the insertMany operation, that can take my array of objects and update my database documents that correspond to an object within the array?
These are helpful links
Trying to do a bulk upsert with Mongoose. What's the cleanest way to do this?
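One way to avoid a round trip per course is to turn the array into a list of operations and send them in a single bulkWrite call, which Mongoose models expose. The sketch below only builds the operations; the function name buildCourseOps is hypothetical, and the field names are taken from the question.

```javascript
// Build one updateOne operation per course, matching on courseID and
// setting the two fields that can change.
function buildCourseOps(allCourses) {
  return allCourses.map((course) => ({
    updateOne: {
      filter: { courseID: course.courseID },
      update: {
        $set: { restrictions: course.restrictions, status: course.status },
      },
    },
  }));
}
```

Assuming a Mongoose model named Course (a hypothetical name), the call would look like Course.bulkWrite(buildCourseOps(allCourses)). Documents whose fields already hold the same values should be matched but not counted as modified, so there is no need to skip them by hand. Adding upsert: true to each operation would also insert courses that are missing, if that is wanted.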

Node.JS/Express - how to avoid multiple database queries

I have a basic Express app and I'm getting started with DB queries. I want to know how to avoid multiple DB queries, because I don't think the way I do it is efficient:
app.get('/:word', function (req, res) {
  var word = req.params.word; // the word comes from the URL
  db.create({'name': word});
  console.log('the word is ' + word);
});
What I want to do is:
get the word from the url
check if it exists in the database (or was previously requested, because if it was then it was probably added already through this basic code)
if it doesn't exist then add it and then proceed to console.log
I want to add each word to my database once only and not run the db query again and again.
Here's what I'm thinking:
Not so efficient way
query to check if it exists before inserting one
Good way, but I don't know how to start here
Cache the word being queried and maintain the cache to prevent DB queries
More info edit
I'm using MongoDB via Mongoose
the 'word' key is already unique, so I know it's not creating duplicate values
I don't want to run ANY DB queries if that value or that URL has already been hit once
The only way to check if the word already exists is to query the database before inserting. There are libraries (and also databases) that implement a findOrCreate method, but this is always just an abstraction. Behind the scenes, the database will search for an existing value before writing.
If your database is huge and querying is not suitable, you could use a caching system (like Redis). But this definitely depends on your logic and your data size.
You can probably optimize the process just by adding an index to the column you want to be unique (I guess it's name?).
You could also define the column name as unique. When inserting, the database will throw an error if the document already exists. But keep in mind again that, behind the scenes, the database is querying for an existing value before inserting. The advantage of having a "unique" column is that the index for this column is created automatically, and from your app logic (Node.js) you can just call the insert method and add a little bit of error handling logic.
MongoDB will create any collections you use in your app if they do not already exist.
Insert Unique Value:
Create a unique index on your key, so that the value will be added only once. If you try to add it again, it will throw an error.
To create a unique index:
db.collection.createIndex( { "name": 1 }, { unique: true } )
Caching:
For caching, store your data in a cache system (like memory-cache or Redis) the first time the data is queried from MongoDB; for subsequent requests you can use the cache system.
In MongoDB you can use findOneAndUpdate with the optional flag upsert: true (see the documentation).
To ensure that every word appears only once, you should also set a unique index on that field. However, remember that a unique index is case sensitive, so Cat and cat are different words.
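The caching and upsert suggestions above can be combined. The following is a rough sketch under stated assumptions: makeWordRecorder is a hypothetical helper, model is assumed to be a Mongoose model with a unique name field, and the cache is a plain in-process Set, so it is lost on restart and not shared between processes.

```javascript
// Returns a function that writes each word to the database at most once
// per process. Words already seen are answered from the in-memory Set
// without any database call.
function makeWordRecorder(model) {
  const seen = new Set();
  return async function recordWord(word) {
    if (seen.has(word)) return false; // cache hit: no query at all
    seen.add(word);
    try {
      // Upsert: inserts the word if missing, otherwise leaves it alone.
      await model.updateOne(
        { name: word },
        { $setOnInsert: { name: word } },
        { upsert: true }
      );
    } catch (err) {
      seen.delete(word); // let a later request retry after a failure
      throw err;
    }
    return true;
  };
}
```

Inside the route handler this could be called as recordWord(req.params.word). The unique index is still worth keeping as a safety net, since the Set only knows about words seen by the current process.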

TTL mongo for specific entry array

I read that MongoDB has TTL (time to live) indexes that can be activated for documents.
But does it work if the document structure is as follows?
{
username: 'user x',
activity: [
{type:a, desc:1, timestamp:timestamp},
{type:b, desc:2, timestamp:timestamp},
{type:b, desc:3, timestamp:timestamp},
]
}
Is there a possibility to set a TTL based on timestamp + 7 days for each array item, so that only the old items expire and recent ones are kept?
Read the documentation carefully. A TTL index can be applied to an array, but when it expires it will delete the whole document, not just the element inside the array.
However, you could split the array out into many documents?
There's currently no way to delete specific elements from an array using a TTL index. There's a feature request for this, but it seems like at the moment the best way to do this is to create a separate collection that links to the _id of your documents.
So in your case, instead of adding an activity array to all your users, you create an extra activities collection which contains documents like this:
{type:a, desc:1, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
{type:b, desc:2, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
{type:b, desc:3, timestamp:timestamp, userId:ObjectId("611636e533f29e4bd6683b05")}
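With the activities split out like this, a TTL index on timestamp can expire each activity on its own. A minimal sketch, assuming activities is a Node.js driver collection object and that timestamp is stored as a BSON Date (TTL indexes do not act on numeric timestamps); the helper name ensureActivityTtl is hypothetical.

```javascript
// Seven days expressed in seconds, the unit expireAfterSeconds expects.
const SEVEN_DAYS_IN_SECONDS = 7 * 24 * 60 * 60;

// Create a TTL index so each activity document is removed roughly
// seven days after the Date stored in its `timestamp` field.
function ensureActivityTtl(activities) {
  return activities.createIndex(
    { timestamp: 1 },
    { expireAfterSeconds: SEVEN_DAYS_IN_SECONDS }
  );
}
```

Expired documents are removed by a background task, so they may linger for a short while after the seven days have passed.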

Mongoose: Only return one embedded document from array of embedded documents

I've got a model which contains an array of embedded documents. These embedded documents keep track of points the user has earned in a given activity. Since a user can be a part of several activities or just one, it makes sense to keep these activities in an array. Now, I want to extract the hall of fame, the top ten users for a given activity. Currently I'm doing it like this:
userModel.find({ "stats.activity": "soccer" }, ["stats", "email"])
.run (err, users) ->
(if you are wondering about the syntax, it's coffeescript)
where "stats" is the array of embedded documents/activities.
Now this actually works, but currently I'm only testing with accounts that have only one activity. I assume that something will go wrong (sorting-wise) once a user has more activities. Is there any way I can tell Mongoose to only return the embedded document where "activity" == "soccer" alongside the top-level document?
Btw, I realize I can do this another way, by having stats in its own collection and having a db-ref to the relevant user, but I'm wondering if it's possible to do it like this before I consider any rewrites.
You are correct that this won't work once you have multiple activities in your array.
Specifically, since you can't return just an arbitrary subset of an array with the element, you'll get back all of it and the sort will apply across all points, not just the ones "paired" with "activity":"soccer".
There is a pretty simple tweak that you could make to your schema to get around this though. Don't store the activity name as a value; use it as the key.
{ _id: userId,
email: email,
stats: [
{soccer : points},
{rugby: points},
{dance: points}
]
}
Now you will be able to query and sort like so:
Note that when you move to version 2.2 (currently only available as the unstable development version 2.1) you will be able to use the aggregation framework to get the exact results you want (only the particular subset of an array or subdocument that matches your query) without changing your schema.
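As a rough illustration of the aggregation approach with the original schema, the pipeline could be built as below. This is a sketch, not the answerer's code: topTenPipeline is a hypothetical helper, and points is an assumed name for the score field inside each stats entry (the question only names activity).

```javascript
// Build a pipeline that returns the ten highest scores for one activity,
// with only the matching stats entry attached to each user.
function topTenPipeline(activity) {
  return [
    // Cheap pre-filter: skip users who never took part in the activity.
    { $match: { "stats.activity": activity } },
    // One output document per stats entry.
    { $unwind: "$stats" },
    // Keep only the entry for the requested activity.
    { $match: { "stats.activity": activity } },
    // Highest score first; `points` is an assumed field name.
    { $sort: { "stats.points": -1 } },
    { $limit: 10 },
    { $project: { email: 1, stats: 1 } },
  ];
}
```

It would be passed to the model's aggregate method, for example userModel.aggregate(topTenPipeline("soccer")). Because the sort runs after the unwind and second match, it only sees the soccer points, which avoids the sorting problem described above.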
