Aggregate time series data

I am using MongoDB and Mongoose to store metrics data. Each document holds an array of metrics and references the project the metrics were recorded for, and each array entry references its metric type.
The schema for this looks like this:
exports.metricReportSchema = new Schema({
metrics: [{
metric: {
type: mongoose.Schema.Types.ObjectId,
ref: 'metricSchema',
required: true
},
value: {
type: String,
required: true
}
}],
project: {
type: mongoose.Schema.Types.ObjectId,
ref: 'projectSchema',
required: true
},
reportDate: Date
});
And the actual document looks like the following:
db.metricreports.findOne() {
"_id" : ObjectId("58a60e8459dd3d12ef8c5d51"),
"reportDate" : ISODate("2017-02-16T20:41:40.657Z"),
"project" : ObjectId("58a20f5f04ef5789d3ef8faa"),
"metrics" : [
{
"metric" : ObjectId("58a20f5f04ef5789d3ef8fb7"),
"value" : "781",
"_id" : ObjectId("58a60e8459dd3d12ef8c5d52")
}, {
"metric" : ObjectId("58a21106fc2aef8a10ded196"),
"value" : "566",
"_id" : ObjectId("58a60e8459dd3d12ef8c5d53")
}, {
"metric" : ObjectId("58a2141bded78e8ad8384f97"),
"value" : "501",
"_id" : ObjectId("58a60e8459dd3d12ef8c5d54")
}, {
"metric" : ObjectId("58a2141bded78e8ad8384f94"),
"value" : "44",
"_id" : ObjectId("58a60e8459dd3d12ef8c5d55")
}, {
"metric" : ObjectId("58a2141bded78e8ad8384f93"),
"value" : "645",
"_id" : ObjectId("58a60e8459dd3d12ef8c5d56")
}
],
"__v" : 0
}
Over time, there are multiple documents of this kind that store slices of data for multiple metrics. It is very convenient for selecting and displaying static reports on metrics for multiple projects and whatnot.
Now, this becomes a little complex when I try to build a time series report for an individual metric for a project.
Basically, I would need to scan multiple metricReport documents and extract a single metric's data from all available reports over time. Let's say I have 10 metricReports that each contain data for 10 different metrics and I only want to extract one. The result could look like this:
{
"_id": "...",
"project": "...",
"metric": "...",
"data": {
"2016-02-02": "22",
"2016-02-03": "453",
...
}
}
I could not find a way to do this with out-of-box mongoDB querying and filtering functionality and wanted to ask for advice:
Is my approach of storing multiple metrics in a single document reasonable? Would I be better off keeping metrics as individual documents and then "merging" them somehow?
Is there a way to achieve what I need without doing this in Node.js? (I assume that fetching the documents, iterating over them to build a new structure, and pushing it out is not going to be very fast.)
Is there a better way to do this? Virtual models or something in Mongoose that could help? I understand that MongoDB may not be the right choice for time series data, but this is only one part of the functionality. The MongoDB/Mongoose combination serves the other purposes nicely, and I don't want to change the technology midway.

Yes, but keep in mind that documents have a limited size (16 MB IIRC), so if your data is unbounded, this structure will not work as your "metrics" array will grow past that.
Ultimately yes, even if you can't figure out a decent filter query, Mongo has MapReduce which will allow you to do what you want, though it won't be easy. I'd use Node for this.
There's no silver bullet here. Mongo is excellent if you need to aggregate data and store it as arbitrary JSON (i.e. to be consumed by an app), and not so great at doing complex joins/views of data. Any joins at the application level will be slow. If you want performance, you'll have to aggregate your reports into individual documents, and save & serve them. If you have live data, this will be more difficult as you'll need to handle updates.
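As a rough illustration of the Node-side approach, the reshaping could look like the sketch below. The function name buildTimeSeries is hypothetical, and plain strings stand in for ObjectId values so that the example runs without a database. In a real application the reports array would come from a query on the metric reports collection.

```javascript
// Hypothetical helper: collapse many metric reports into one time series
// for a single metric. IDs are plain strings here for illustration only.
function buildTimeSeries(reports, projectId, metricId) {
  const data = {};
  for (const report of reports) {
    if (String(report.project) !== String(projectId)) continue;
    // Find the entry for the requested metric inside this report, if any.
    const entry = report.metrics.find((m) => String(m.metric) === String(metricId));
    if (!entry) continue;
    // Key by calendar day (UTC), e.g. "2017-02-16".
    const day = report.reportDate.toISOString().slice(0, 10);
    data[day] = entry.value;
  }
  return { project: projectId, metric: metricId, data };
}

// Made-up sample reports in the shape of the question's documents.
const reports = [
  { project: 'p1', reportDate: new Date('2017-02-16T20:41:40.657Z'),
    metrics: [{ metric: 'm1', value: '781' }, { metric: 'm2', value: '566' }] },
  { project: 'p1', reportDate: new Date('2017-02-17T20:41:40.657Z'),
    metrics: [{ metric: 'm1', value: '790' }] },
  { project: 'p2', reportDate: new Date('2017-02-17T20:41:40.657Z'),
    metrics: [{ metric: 'm1', value: '5' }] },
];

const series = buildTimeSeries(reports, 'p1', 'm1');
console.log(series);
```

The same filtering could probably be pushed into the database with an aggregation pipeline ($unwind on the metrics array followed by $match on the metric), which is an alternative to MapReduce and avoids transferring the unwanted metrics to Node.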

Related

Change datastructure of collection in MongoDB with NodeJS

I'm working with NodeJS and MongoDB.
I have a question about the data structure that I'm using in MongoDB. Currently I save my data in the database like this:
{
"_id" : ObjectId("4f9519d6684c8b1c9e72e367"),
"tipo" : "First Post on MongoDB",
"rates": {
"user": "5c981a0f8a76d426dc04619e",
"votation": 1
},
}
But the way that I want to store the data is in this structure:
{
"_id" : ObjectId("4f9519d6684c8b1c9e72e367"),
"tipo" : "First Post on MongoDB",
"rates": {
"5c981a0f8a76d426dc04619e":{
"votation": 1
}
},
}
I have been trying to save the data as in the last example, but I haven't managed to do it.
Here is how I build the object in NodeJS:
const post = {
tipo: body.tipo,
user: body.usuario,
duration: 25,
rates:
{
user: body.usuario,
votation: 1
}
};
And Here is how I have my model:
interface IPost extends Document {
tipo: String;
user: Object;
duration: number;
rates: Object;
}
Can someone please explain how I can do this?
Regards.

If you need to store key/value properties related to the main document and then just visualize them, probably you need mongoose maps.
A map's keys are always strings.
rates: { type: Map, of: Number }
In this way you can have something like this:
rates: {"5c981a0f8a76d426dc04619e": 1}
Using this schema you can have multiple rates, one for each user/key.
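With a Map-typed path like the one above, the value to save can be built as a plain object whose key is the user id. The sketch below only shows that object construction using a computed property name. The body object is a made-up stand-in for the request body from the question, and no database call is made.

```javascript
// Hypothetical request body; `usuario` holds the user's id as a string.
const body = { tipo: 'First Post on MongoDB', usuario: '5c981a0f8a76d426dc04619e' };

// A computed property name ([body.usuario]) uses the id itself as the key.
const post = {
  tipo: body.tipo,
  user: body.usuario,
  duration: 25,
  rates: { [body.usuario]: 1 },
};

console.log(JSON.stringify(post.rates));
```

To get the nested form from the question instead, the value would be { votation: 1 } rather than 1, and the schema's "of" type would need to describe that nested object.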
But, as noted in the comments, this could be a suboptimal solution if you need to run queries against these keys/values. In that case, an external schema reference should be used, and I'd use a three-schema approach: Post, User and Rates.
a Post has multiple rates
a Rate belongs to the User and Post couple
I found a similar question already asked on SO.

Create View from multiple collections MongoDB

I have following Mongo Schemas(truncated to hide project sensitive information) from a Healthcare project.
let PatientSchema = mongoose.Schema({_id:String})
let PrescriptionSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
let ReportSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
let EventsSchema = mongoose.Schema({_id:String, patient: { type: Number, ref: 'Patient', createdAt:Date }})
There is a UI screen in the mobile and web app called Health history, where I need to paginate the entries from prescriptions, reports and events, sorted by createdAt. So I am building a REST endpoint to get this heterogeneous data. How do I achieve this? Is it possible to create a "View" from multiple schema models so that I don't have to load the contents of all 3 schemas to fetch one page of entries? The schema of my "View" should look like the one below so that I can run additional queries on it (e.g. find the last report).
{recordType:String /* prescription/report/event */, createdDate:Date, data:Object /* content from any of the 3 tables */}

I can think of three ways to do this.
Imho the easiest way to achieve this is by using an aggregation something like this:
db.Patients.aggregate([
  { $match: { _id: <somePatientId> } },
  {
    $lookup: {
      from: "Prescription", // replicate this stage for Report and Event
      localField: "_id",
      foreignField: "patient",
      as: "prescriptions" // or "reports" or "events"
    }
  },
  { $unwind: "$prescriptions" }, // or "$reports" or "$events"
  { $sort: { "prescriptions.createdAt": -1 } }, // newest first
  { $skip: <positive integer> },
  { $limit: <positive integer> }
])
You'll have to adapt it further, to also get the correct createdDate. For this, you might want to look at the $replaceRoot operator.
The second option is to create a new "meta"-collection, that holds your actual list of events, but only holds a reference to your patient as well as the actual event using a refPath to handle the three different event types. This solution is the most elegant, because it makes querying your data way easier, and probably also more performant. Still, it requires you to create and handle another collection, which is why I didn't want to recommend this as the main solution, since I don't know if you can create a new collection.
As a last option, you could create virtual populate fields in Patient, that automatically fetch all prescriptions, reports and events. This has the disadvantage that you can not really sort and paginate properly...
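To make the target shape of the "meta" entries more concrete, here is a small in-memory sketch. It is not a database-side solution: the helper names toHistoryEntries and healthHistoryPage are hypothetical, and the sample documents are made up. It only shows how the three record types map onto the {recordType, createdDate, data} shape and how sorting and paging behave on that shape.

```javascript
// Hypothetical helper: tag each record with its type and a common date field.
function toHistoryEntries(recordType, docs) {
  return docs.map((doc) => ({ recordType, createdDate: doc.createdAt, data: doc }));
}

// Merge the three record types, newest first, and return one page.
function healthHistoryPage({ prescriptions, reports, events }, skip, limit) {
  const merged = [
    ...toHistoryEntries('prescription', prescriptions),
    ...toHistoryEntries('report', reports),
    ...toHistoryEntries('event', events),
  ];
  merged.sort((a, b) => b.createdDate - a.createdDate);
  return merged.slice(skip, skip + limit);
}

const page = healthHistoryPage({
  prescriptions: [{ _id: 'rx1', createdAt: new Date('2020-01-03') }],
  reports: [{ _id: 'rep1', createdAt: new Date('2020-01-05') }],
  events: [{ _id: 'ev1', createdAt: new Date('2020-01-04') }],
}, 0, 2);

console.log(page.map((e) => e.recordType));
```

With a dedicated collection holding entries of this shape, the sort, skip and limit steps would run in the database instead of in application memory.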

Mongoose and nodejs: about schema and query

I'm building a rest api that allows users to submit and retrieve data produced by surveys: questions are not mandatory and each "submit" could be different from each other. Each submit is a json with data and a "survey id":
{
id: abc123,
surveyid: 123,
array: [],
object: {}
...
}
I have to store this data and allow retrieving and querying.
First approach: going without schema and putting everything in a single collection: it works, but each json field is treated as a "String" and making queries on numeric values is problematic.
Second approach: get the question datatypes for each survey, save a mongoose schema definition in a JSON file, and then keep this file up to date.
Something like this, where each "schema : {}" entry represents a mongoose schema used for inserting and querying/retrieving data.
[
{
"surveyid" : "123",
"schema" : {
"name": "string",
"username" : "string",
"value" : "number",
"start": "date",
"bool" : "boolean",
...
}
},
{ ... }
]
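One way such a per-survey type map could be applied is sketched below in plain JavaScript, without Mongoose. The casters table and the castSubmission function are hypothetical names, and the sample submission is made up. The idea is only to show string values being converted according to the type names from the JSON file.

```javascript
// Hypothetical casters for the type names used in the JSON file above.
const casters = {
  string: (v) => String(v),
  number: (v) => Number(v),
  date: (v) => new Date(v),
  boolean: (v) => v === true || v === 'true',
};

// Cast known fields of a submission; unknown fields are left untouched.
function castSubmission(submission, typeMap) {
  const result = { ...submission };
  for (const [field, typeName] of Object.entries(typeMap)) {
    if (field in result && casters[typeName]) {
      result[field] = casters[typeName](result[field]);
    }
  }
  return result;
}

const typeMap = { name: 'string', value: 'number', start: 'date', bool: 'boolean' };
const cast = castSubmission(
  { surveyid: '123', name: 'Ann', value: '42', start: '2020-01-01', bool: 'true' },
  typeMap
);
console.log(cast);
```

Because the type map is read as data, it could be reloaded whenever the file changes, at the cost of having no validation beyond what the casters do.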
Hoping this is clear, I've some questions:
Right now I have a single collection for all "submits" and everything is treated as a string. Can I use a mongoose schema, without other modifications, to specify that some fields are numeric (or date or whatever)? Is it allowed, and is it even a good idea?
Are there any disadvantages to using an external JSON file? Are mongoose schemas loaded at run time when requested, or does the service need to be restarted when this file is updated?
How should I store data with a "schema" that could change often?
I hope it's clear!
Thank you!

Conditionally update an array in mongoose [duplicate]

Currently I am working on a mobile app. Basically, people can post their photos and their followers can like the photos, as on Instagram. I use MongoDB as the database. As on Instagram, there might be a lot of likes for a single photo. So using a separate indexed document for every single "like" does not seem reasonable, because it would waste a lot of memory. However, I'd like a user to be able to add a like quickly. So my question is how to model the "like". Basically, the data model is very similar to Instagram's, but using MongoDB.

No matter how you structure your overall document there are basically two things you need. That is basically a property for a "count" and a "list" of those who have already posted their "like" in order to ensure there are no duplicates submitted. Here's a basic structure:
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"photo": "imagename.png",
"likeCount": 0,
"likes": []
}
Whatever the case, there is a unique "_id" for your "photo post" and whatever information you want, but then the other fields as mentioned. The "likes" property here is an array, and that is going to hold the unique "_id" values from the "user" objects in your system. So every "user" has their own unique identifier somewhere, either in local storage or OpenId or something, but a unique identifier. I'll stick with ObjectId for the example.
When someone submits a "like" to a post, you want to issue the following update statement:
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": { "$ne": ObjectId("54bb2244a3a0f26f885be2a4") }
},
{
"$inc": { "likeCount": 1 },
"$push": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
Now the $inc operation there will increase the value of "likeCount" by the number specified, so increase by 1. The $push operation adds the unique identifier for the user to the array in the document for future reference.
The important thing here is to keep a record of the users who voted, and to look at what is happening in the "query" part of the statement. Apart from selecting the document to update by its own unique "_id", the other important thing is to check the "likes" array to make sure the current voting user is not in there already.
The same is true for the reverse case or "removing" the "like":
db.photos.update(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3"),
"likes": ObjectId("54bb2244a3a0f26f885be2a4")
},
{
"$inc": { "likeCount": -1 },
"$pull": { "likes": ObjectId("54bb2244a3a0f26f885be2a4") }
}
)
The main important thing here is the query conditions being used to make sure that no document is touched if all conditions are not met. So the count does not increase if the user had already voted or decrease if their vote was not actually present anymore at the time of the update.
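The effect of the two guarded updates can be imitated in memory, which may make the logic easier to follow. The sketch below is only a stand-in: the like and unlike functions are hypothetical, user ids are plain strings, and nothing here talks to MongoDB. In the database, the condition check and the modification happen together as one atomic update on the document, which this in-memory version does not need to worry about.

```javascript
// In-memory stand-in for a photo document; user ids are plain strings here.
const photo = { photo: 'imagename.png', likeCount: 0, likes: [] };

// Mirrors the "like" update: only applies if the user is not in `likes` yet.
function like(doc, userId) {
  if (doc.likes.includes(userId)) return false; // query condition not met
  doc.likeCount += 1;      // corresponds to $inc by 1
  doc.likes.push(userId);  // corresponds to $push
  return true;
}

// Mirrors the "unlike" update: only applies if the user is in `likes`.
function unlike(doc, userId) {
  const index = doc.likes.indexOf(userId);
  if (index === -1) return false; // query condition not met
  doc.likeCount -= 1;             // corresponds to $inc by -1
  doc.likes.splice(index, 1);     // corresponds to $pull
  return true;
}

like(photo, 'user-a');
like(photo, 'user-a');   // ignored: already liked
unlike(photo, 'user-b'); // ignored: never liked
console.log(photo.likeCount, photo.likes);
```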
Of course it is not practical to read an array with a couple of hundred entries in a document back in any other part of your application. But MongoDB has a very standard way to handle that as well:
db.photos.find(
{
"_id": ObjectId("54bb201aa3a0f26f885be2a3")
},
{
"photo": 1,
"likeCount": 1,
"likes": {
"$elemMatch": { "$eq": ObjectId("54bb2244a3a0f26f885be2a4") }
}
}
)
This usage of $elemMatch in projection will only return the current user if they are present or just a blank array where they are not. This allows the rest of your application logic to be aware if the current user has already placed a vote or not.
That is the basic technique and may work for you as is, but you should be aware that embedded arrays should not grow without bound, and there is also a hard 16MB limit on BSON documents. So the concept is sound, but it cannot be used on its own if you are expecting thousands of "like votes" on your content. There is a concept known as "bucketing", which is discussed in some detail in this example for Hybrid Schema design, that offers one solution for storing a high volume of "likes". You can use that along with the basic concepts here as a way to do this at volume.
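A very rough in-memory sketch of the bucketing idea follows. It is not taken from the linked example: the addLike function, the bucket shape {photoId, count, likes} and the BUCKET_SIZE value are all assumptions made for illustration. Each bucket stands in for a separate document, so no single document's array grows past a fixed capacity.

```javascript
const BUCKET_SIZE = 3; // assumed capacity; a real value would be far larger

// Each bucket stands in for a separate document: { photoId, count, likes }.
function addLike(buckets, photoId, userId) {
  const own = buckets.filter((b) => b.photoId === photoId);
  // Reject duplicates across all buckets of this photo.
  if (own.some((b) => b.likes.includes(userId))) return false;
  // Reuse the first bucket with free space, or start a new one.
  let bucket = own.find((b) => b.count < BUCKET_SIZE);
  if (!bucket) {
    bucket = { photoId, count: 0, likes: [] };
    buckets.push(bucket);
  }
  bucket.likes.push(userId);
  bucket.count += 1;
  return true;
}

const buckets = [];
for (const user of ['u1', 'u2', 'u3', 'u4', 'u1']) {
  addLike(buckets, 'photo-1', user);
}
console.log(buckets.length, buckets.map((b) => b.count));
```

The total like count for a photo is then the sum of the count fields of its buckets.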

Adding objects to a deep array in MongoDB

I've just started building a little application using MongoDB and can't seem to find any examples where I can add objects to a deep array that I can then find on an individual basis.
Let me illustrate by the following set of steps I take as well as the code I've written.
I create a simple object in MongoDB like so:
testing = { name: "s1", children: [] };
db.data.save(testing);
When I query it everything looks nice and simple still:
db.data.find();
Which outputs:
{
"_id" : ObjectId("4f36121082b4c129cfce3901"),
"name" : "s1",
"children" : [ ]
}
However, after I update the "children" array by "pushing" an object into it, I get into all sorts of problems.
First the update command that I run:
db.data.update({ name:"s1" },{
$push: {
children: { name:"r1" }
}
});
Then when I query the DB:
db.data.find({
children: { name: "r1" }
});
Results in:
{
"_id" : ObjectId("4f36121082b4c129cfce3901"),
"children" : [ { "name" : "r1" } ],
"name" : "s1"
}
Which doesn't make any sense to me, since I would have expected the following:
{
"name": "r1"
}
Is there a better way of inserting data into MongoDB so that when I run queries I extract individual objects rather than the entire tree? Or perhaps a better way of writing the "find" query?

By default, a MongoDB find retrieves all the fields of each matching document (like SELECT * in SQL). You can retrieve particular fields by specifying their names in a projection:
db.data.find({ "children.name": "r1" }, { "children.name": 1 });

Why would you expect it to return only part of a document? It returns the whole document unless you explicitly tell it which fields you want to include or exclude. See http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
