Suppose I have a book schema with a few fields.
const Book = new schema({
title: String,
content: String,
like: Number
});
How can I get the book with the most likes in the past two weeks?
Most importantly, the result needs to be updated daily.
For example, suppose I only need the likes from the past two days. If 5 people like the book on Day 1, 7 on Day 2, and 3 on Day 3, I expect 12 likes (5 + 7) on Day 2 and 10 likes (7 + 3) on Day 3.
I intend to add a field holding a 14-element array:
{
...,
likeCnt: {
type: Array,
default: Array(14).fill(0)
}
}
Then I only update the element at index new Date().getDate() % 14 when someone likes the book.
However, I would need a cron job to reset every book's likeCnt[] every day.
Is there a more efficient solution?
Thanks a lot.
I would add an array that records the date of every change to the data (here, every like):
{
...,
likes: [Date],
}
Then query it like this to get the changes from the last two weeks:
collection.find({
likes: {
$gte: new Date(Date.now() - 1209600000), // 14 days in milliseconds
},
});
With this solution, the data stays around even after the two weeks have passed.
You can either remove it periodically, or keep it so that you can change the feature later. What if next month you need the data for three weeks? Always design for generality and change.
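As a minimal sketch of the counting and clean-up steps in plain JavaScript: the helper names below (windowStart, countRecentLikes, pruneOldLikes) are hypothetical and not part of Mongoose or MongoDB. Note that matching an array of dates with $gte only selects books that have at least one like in the window; ranking books still needs a per-book count, which is what countRecentLikes illustrates on the application side.

```javascript
// Hypothetical helpers; the names are illustrative only.
const DAY_MS = 24 * 60 * 60 * 1000;

// Start of the window: `days` days before `now`.
function windowStart(days, now = new Date()) {
  return new Date(now.getTime() - days * DAY_MS);
}

// Count the like timestamps that fall inside the window.
function countRecentLikes(likes, days, now = new Date()) {
  const cutoff = windowStart(days, now).getTime();
  return likes.filter((d) => d.getTime() >= cutoff).length;
}

// Return a copy of the array without timestamps older than the window.
function pruneOldLikes(likes, days, now = new Date()) {
  const cutoff = windowStart(days, now).getTime();
  return likes.filter((d) => d.getTime() >= cutoff);
}
```

Because the window length is a parameter, switching from two weeks to three is a one-argument change, provided the older timestamps have not been pruned yet.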
I need to store something like startTime and endTime in my document. To give some more context, these will reflect the opening and closing times for a shop. So, for example, startTime could be 9AM and endTime could be 9PM. What is the best way to store this? This is what I am doing right now:
timings: {
startTime: {
type: String,
required: [true, "....."]
},
endTime: {
type: String,
required: [true, "....."]
}
}
The idea is to store the values as strings ("9AM", "9PM") and do some sort of time parsing each time I query the database. But I was wondering if there was a better approach to this? Another idea I had is to store it as DateTime and ignore the date part. What else can I do? I'd like to avoid parsing/processing on application level as much as possible and leverage the power of mongodb.
I'm using mongoose and nodeJS.
I agree that the Date type is not a good fit here (it drags in time zones, and you might not want to go there...).
I would store it as a number, not a string. Why? Because you might want to query it (e.g. "give me all shops that open after 8 PM"), and doing that with a string would be annoying.
I'd go with this:
{
startTime: {
value: number;
amOrPm: string; //(if you don't want to use a 24 hours base)
},
endTime: {
value: number;
amOrPm: string; //(if you don't want to use a 24 hours base)
},
timeOffset: number // So you keep track on the offset with the base timezone
}
You could also store the minutes, or even store the time only as "minutes elapsed since midnight" and convert it on every access.
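A rough sketch of the "minutes since midnight" conversion, assuming inputs such as "9AM", "9:30 PM", or "21:15". The function names toMinutes and formatMinutes are hypothetical helpers made up for this illustration.

```javascript
// Hypothetical helpers for a "minutes since midnight" representation.

// Parse strings such as "9AM", "9:30 PM" or "21:15" into minutes since midnight.
function toMinutes(text) {
  const m = /^\s*(\d{1,2})(?::(\d{2}))?\s*(AM|PM)?\s*$/i.exec(text);
  if (!m) throw new Error("Unrecognized time: " + text);
  let hours = Number(m[1]);
  const minutes = m[2] ? Number(m[2]) : 0;
  const suffix = m[3] ? m[3].toUpperCase() : null;
  if (suffix) {
    if (hours < 1 || hours > 12) throw new Error("Invalid 12-hour time: " + text);
    // 12AM maps to 0 and 12PM maps to 12.
    hours = (hours % 12) + (suffix === "PM" ? 12 : 0);
  }
  if (hours > 23 || minutes > 59) throw new Error("Out of range: " + text);
  return hours * 60 + minutes;
}

// Format minutes since midnight as 24-hour "HH:MM".
function formatMinutes(total) {
  const h = String(Math.floor(total / 60)).padStart(2, "0");
  const m = String(total % 60).padStart(2, "0");
  return h + ":" + m;
}
```

With numbers stored this way, "shops that open after 8 PM" becomes a plain numeric comparison such as { startTime: { $gt: 1200 } }, since 8 PM is 20 * 60 = 1200 minutes.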
Storing the offset this way won't let you easily query for a specific moment across different time zones, but I guess that isn't needed in your case.
You could also store days as numbers ('officially' Sunday is 0, then Monday is 1), but nowadays it is just as easy to store a name.
Edit: for the days, an array is probably better:
{
daysOpened: [0, 1, 4]
}
And finally, what if each day has different opening hours? You might then need an array of days, each containing the opening and closing time, as above.
To go into even more detail: some shops close at midday, and others (like restaurants) open twice a day. You could let them tick boxes on a schedule and store that in an array.
Let us know if you need to build something like that!
I'm having some issues designing a query that deals with overlapping dates.
So here's the scenario. I can't reveal too much about the actual project, but here is a similar example. Let's say I have a FleaMarket. It has a bunch of data about itself, such as name, location, etc.
A FleaMarket has many Stalls, which are available to be booked for a portion of the year (as short as 2 days, as long as the whole year). So the FleaMarket needs to specify when in the year it will be open. Most would be open either all year or all summer/fall, but it could possibly be broken down further (because seasons determine pricing). Each FleaMarket defines its Seasons, each of which includes a startDate and endDate (including the year).
Here's an ERD to model this example:
When a user attempts to book a Stall, they have already selected a FleaMarket (although ideally it would be nice to search based on availability in the future). It's really easy to tell if a Stall is already booked for the requested dates:
bookings = await Booking.find({
startDate: { $lt: <requested end date> },
endDate: { $gt: <requested start date> },
fleaMarketId: <flea market id>,
}).select('stallId');
bookedIds = bookings.map(b => b.stallId);
stalls = await Stall.find({
fleaMarketId: <flea market id>,
_id: { $nin: bookedIds }
});
The issue I'm having is determining whether a Stall is available for the specified Season. The problem is that two seasons could be sequential, so a booking could span both of them.
I originally tried a query like so:
seasons = await Season.find({
fleaMarketId: <flea market id>,
startDate: { $lt: <requested end date> },
endDate: {$gt: <requested start date> }
});
I then programmatically checked whether any of the returned seasons were sequential, and plucked the stalls that were available in all of them. Unfortunately, I just realized this won't work if the requested dates only partially overlap a season (e.g. requested Jan 1 2020 - Jan 10 2020, but the season is defined as Jan 2 2020 - May 1 2020).
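One way to handle the partial-overlap case in application code is to sort the returned seasons and sweep through them, tracking how far the requested range is covered. This is only a sketch: seasonsCoverRange is a hypothetical helper, and it assumes that sequential seasons touch (one season's endDate equals the next one's startDate) or overlap.

```javascript
// Hypothetical helper: do the seasons, taken together, cover [start, end] without gaps?
// Each season is { startDate: Date, endDate: Date }.
function seasonsCoverRange(seasons, start, end) {
  // Sort a copy by start date so the caller's array is left untouched.
  const sorted = [...seasons].sort((a, b) => a.startDate - b.startDate);
  let coveredUntil = start.getTime();
  for (const s of sorted) {
    if (s.startDate.getTime() > coveredUntil) break; // gap before this season
    coveredUntil = Math.max(coveredUntil, s.endDate.getTime());
    if (coveredUntil >= end.getTime()) return true;
  }
  return coveredUntil >= end.getTime();
}
```

For the example above (requested Jan 1 - Jan 10, season Jan 2 - May 1), the sweep stops at the first season because it starts after the requested start date, so the range is reported as not covered.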
Is there a way to check that the requested dates are completely covered, given that they could overlap multiple documents? I was thinking about calculating the current and future available season dates (as total ranges) and storing them denormalized on the Stall.
At this point I'm almost thinking I need to restructure the schema quite a bit. Any recommendations? I know this seems very relational, but pretty much everywhere else in the application doesn't really do much with the relationships. It's just this search that is quite problematic.
Update:
I just had the thought of maybe creating some sort of Calendar Document that can store a centralized list of availability for a FleaMarket, that would do a rolling update to only store future and present data, and slowly wiping away historical data, or maybe archiving it in a different format. Perhaps this will solve my issue, I will be discussing it with my team soon.
So as I said in an update in my post, I came up with the idea to create a rolling calendar.
For anyone who is interested, here's what I got:
I created an Availability collection, that contains documents like the following:
{
marketId: ObjectId('5dd705c0eeeaf900450e7009'),
stallId: ObjectId('5dde9fc3bf30e500280f80ce'),
availableDates: [
{
date: '2020-01-01T00:00:00.000Z',
price: 30.0,
seasonId: '5dd708e7534f3700a9cad0e7',
},
{
date: '2020-01-02T00:00:00.000Z',
price: 30.0,
seasonId: '5dd708e7534f3700a9cad0e7',
},
{
date: '2020-01-03T00:00:00.000Z',
price: 35.0,
seasonId: '5dd708e7534f3700a9cad0e8',
}
],
bookedDuring: [
'2020-01-01T00:00:00.000Z',
'2020-01-02T00:00:00.000Z'
]
}
Then handling updates to this collection:
Seasons:
- When creating, $push new dates onto each stall (and delete dates from the past).
- When updating, remove the old dates and add the new ones (or calculate the difference; either works, depending on the integrity of this data).
- When deleting, remove dates.
Stalls:
- When creating, insert records for associated seasons.
- When deleting, delete records from the Availability collection.
Bookings:
- When creating, add dates to bookedDuring.
- When updating, add or remove dates from bookedDuring.
Then, to find the available stalls for a market, query with { marketId: /* market's ID */, 'availableDates.date': { $all: [/* each desired day */] }, bookedDuring: { $nin: [/* same dates */] } } and pluck out the stallId.
And to find markets that have availability, query with { 'availableDates.date': { $all: [/* each desired day */] }, bookedDuring: { $nin: [/* same dates */] } } and select the distinct marketIds.
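A small sketch of how the list of desired days and the filter object could be built. The helpers eachDay and availabilityFilter are hypothetical, and the sketch assumes the dates are stored as the UTC-midnight ISO strings shown in the sample document; if they are stored as Date values instead, push Date objects rather than strings.

```javascript
// Hypothetical helpers; field names follow the Availability documents above.

// List every UTC midnight from `start` to `end` (inclusive) as ISO strings.
function eachDay(start, end) {
  const days = [];
  const cursor = new Date(
    Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), start.getUTCDate())
  );
  while (cursor.getTime() <= end.getTime()) {
    days.push(cursor.toISOString());
    cursor.setUTCDate(cursor.getUTCDate() + 1); // rolls over month and year ends
  }
  return days;
}

// Build a filter: every requested day must be available and none may be booked.
function availabilityFilter(marketId, start, end) {
  const days = eachDay(start, end);
  return {
    marketId,
    "availableDates.date": { $all: days },
    bookedDuring: { $nin: days },
  };
}
```

The resulting object can be passed to a find call on the Availability collection; dropping marketId from it gives the cross-market variant.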
I have a lot of data in Solr like this:
{
id: some_id
date: 2008-01-01T00:00:00Z
price: 34.20
currency: "CAD"
weight: 39.9
etc
}
I'd like to perform searches on it to find the unique set of ids, and group them by time. So sometimes I want to find the items that satisfy the search for each day, or week, or month.
The first way I tried to do this was to set an fq (filter query) to the date range I want, and set facet.field=id to get the unique ids for that range. But if I want to do this for each day, I'd have to run 365 (or 366) queries, which is quite a pain and very slow.
A solution to this was to use facet.pivot=date,id which would break this down into each day, and then for each day give the set of ids. This is perfect for the day case! However, how do we achieve the same thing for weekly? Or monthly?
What I want is the first facet.pivot, which is date, to be a range of values. So instead of getting this:
{
"responseHeader":{
...
},
"facet_counts":{
...
"facet_pivot":{
"date,id":[{
"field":"date",
"value":"2008-01-01T00:00:00Z",
"count":923,
"pivot":[{
"field":"id",
"value":18,
"count":1},
{
"field":"id",
"value":66,
"count":1},
{
"field":"id",
"value":70,
"count":1},
]
}
...]
}
}
We get something like this:
{
"responseHeader":{
...
},
"facet_counts":{
...
"facet_pivot":{
"date,id":[{
"field":"date",
"value":"2008-01-01T00:00:00Z TO 2008-01-31T00:00:00Z",
"count":923,
"pivot":[...similar to above]
}
...]
}
}
In other words, instead of grouping on the value of date, it groups on a range/interval/etc. I've toyed around with Solr's interval, range, etc. but can't seem to get something that works.
Try the following to get a one-month range with a gap of one week:
facet.range={!tag=rdt}date&facet.range.start=NOW/DAY&facet.range.gap=+7DAY&facet.range.end=NOW/DAY +30DAY&facet=true&facet.pivot={!range=rdt}date,id
I hope this helps!
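If the request is assembled in code, the parameters need URL encoding: a literal + in a query string is commonly decoded as a space, so the + in the gap value is usually sent as %2B. Below is a sketch using the standard URLSearchParams class. The function name rangePivotParams is hypothetical, and the q=*:* and rows=0 values are assumptions added for illustration; the field names and the rdt tag come from the parameters above.

```javascript
// Hypothetical helper that assembles the range-pivot parameters.
function rangePivotParams({ start, end, gap }) {
  const params = new URLSearchParams();
  params.set("q", "*:*"); // assumption: match everything
  params.set("rows", "0"); // assumption: only the facet counts are needed
  params.set("facet", "true");
  params.set("facet.range", "{!tag=rdt}date");
  params.set("facet.range.start", start);
  params.set("facet.range.end", end);
  params.set("facet.range.gap", gap);
  params.set("facet.pivot", "{!range=rdt}date,id");
  return params;
}

const params = rangePivotParams({
  start: "NOW/DAY",
  end: "NOW/DAY+30DAY",
  gap: "+7DAY",
});
const queryString = params.toString(); // append to the select handler URL
```

Changing gap to "+1MONTH" or "+1DAY" would give monthly or daily buckets with the same request shape.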
I have an app that listens to a websocket and it stores usernames/userID's (Usernames are 1-20 bytes, UserID's are 17 bytes). This is not a big deal because it's only one document. However, every round they participate in, it pushes the round ID (24 bytes) and a 'score' decimal value (ex: 1190.0015239999999).
The thing is, there is no limit to how many rounds there are and I can't afford to pay so much per month for mongolab. What's the best way to handle this data?
My thoughts:
- If there is a way to replace the _id field in MongoDB, I will replace it with the userID, which is 17 bytes long. Not sure if I can do that, though.
- Store user data with timestamps and remove OLD data that has a score value less than 200.
- Cut off user names that are more than 10 characters.
- Completely remove round IDs (or replace the _id field with roundId). (Won't work, since there are multiple roundIDs in each document.)
- Round the decimal value to two places.
- Remove round IDs after 30 days.
tl;dr
Need to store data efficiently (< 500 MB) in MongoLab.
Documents consist of username (1-20 characters), userid (17 characters), round (object array) = [{round ID (24 characters), score (1190.0015239999999)}].
Thanks in advance!
Edit:
Document Schema:
userID: {type: String},
userName: {type: String},
rounds: [{roundID: String, score: String}]
Modelling 1:n relationships as embedded documents is not the best choice except in very rare cases, because there is a 16 MB size limit for BSON documents at the time of this writing.
A better (read: more scalable and efficient) approach is to use document references.
First, you need your player data, of course. Here is an example:
{
_id: "SomeUserId",
name: "SomeName"
}
There is no need for an extra userId field, since each document needs to have an _id field with unique values anyway. Contrary to popular belief, this field's value does not have to be an ObjectId. So we have already reduced the size you need for your player data by 1/3, if I am not mistaken.
Next, the results of each round:
{
_id: {
round: "SomeString",
player: "SomeUserId"
},
score: 5,
createdAt: ISODate("2015-04-13T01:03:04.0002Z")
}
A few things are worth noting here. First and foremost: do NOT use strings to record values. Even grades should be stored as corresponding numerical values; otherwise you cannot compute averages and the like. I'll show more of that later. We are using a compound field for _id here, which is perfectly valid. Furthermore, it gives us a free index that optimizes a few of the most likely queries, like "How did player X score in round Y?"
db.results.find({"_id.player":"X","_id.round":"Y"})
or "What were the results of round Y?"
db.results.find({"_id.round":"Y"})
or "What were the scores of player X in all rounds?"
db.results.find({"_id.player":"X"})
However, by not using a string to save the score, even some nifty stats become rather cheap, for example "What was the average score of round Y?"
db.results.aggregate([
{ $match: { "_id.round": "Y" } },
{ $group: { _id: "$_id.round", averageScore: { $avg: "$score" } } }
])
or "What is the average score of each player in all rounds?"
db.results.aggregate([
{ $group: { _id: "$_id.player", averageAll: { $avg: "$score" } } }
])
While you could do these calculations in your application, MongoDB can do them much more efficiently, since the data does not have to be sent to your app before it is processed.
Next, the data expiration. We have a createdAt field of type ISODate. Now we let MongoDB take care of the rest by creating a TTL index:
db.results.createIndex(
{ "createdAt": 1 },
{ expireAfterSeconds: 60*60*24*30 } // 30 days
)
So all in all, this should be pretty much the most efficient way of storing and expiring your data, while improving scalability at the same time.
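To illustrate how one document in the original embedded schema maps onto the two document shapes above, here is a minimal sketch in plain JavaScript. The function name splitUser is hypothetical; the input field names (userID, userName, rounds, roundID, score) come from the schema in the question.

```javascript
// Hypothetical conversion from the embedded schema to referenced documents.
function splitUser(user, createdAt = new Date()) {
  // The user ID becomes the player's _id, so no separate userID field is needed.
  const player = { _id: user.userID, name: user.userName };
  const results = user.rounds.map((r) => ({
    _id: { round: r.roundID, player: user.userID }, // compound _id
    score: Number(r.score), // store a number, not a string
    createdAt, // used by the TTL index
  }));
  return { player, results };
}
```

The player document would go into one collection and each entry of results into another, for example the results collection used in the queries above.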
So currently you are storing three data points in the array for each record.
Setting _id: false prevents Mongoose from automatically creating an id for each subdocument. If you don't need roundID, you can use the following, which stores only one data point in the array:
rounds: [{ _id: false, score: String }]
Otherwise, if roundID actually has meaning, use the following, which stores two data points in the array:
rounds: [{ _id: false, roundID: String, score: String }]
Lastly, if you just need an ID for reference purposes, use the following, which stores two data points in the array, a random id and the score:
rounds: [{ score: String }]
I have a large collection of documents and each is valid for a range of days. The range could be from 1 week up to 1 year. I want to be able to get all the documents that are valid on a specific day.
How would I do that?
As an example say I have the following two documents:
doc1 = {
// 1 year ago to today
start_at: "2012-03-22T00:00:00Z",
end_at: "2013-03-22T00:00:00Z"
}
doc2 = {
// 2 months ago to today
start_at: "2013-01-22T00:00:00Z",
end_at: "2013-03-22T00:00:00Z"
}
And a map function:
(doc) ->
emit([doc.start_at, doc.end_at], null)
So for a date of 6 months ago I would only get doc1, a date of 1 week ago I would get both documents, and with a date of tomorrow I would receive no documents.
Note that actual resolution needs to be down to the second of the request being made and there are lots of documents, so strategies of emitting a key for every valid second would not be appropriate.
You could call emit for each day in your range, and then you can easily pick out the documents available for a specific day.
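function(doc) {
// Field names match the example documents (start_at / end_at).
var day = new Date(doc.start_at),
end = new Date(doc.end_at).getTime();
do {
emit(day, null); // no value, to keep the index small
day = new Date(day.getFullYear(), day.getMonth(), day.getDate() + 1);
} while (day.getTime() <= end);
}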
Even though you will have lots of documents, if you leave out the value part (2nd param) of your emit, the index will be as small as it could possibly be.
If you need to get more sophisticated, you could try out couchdb-lucene. You can index date fields as date objects and execute range queries with multiple fields in 1 request.
You can translate this into a computational geometry problem. If each document is a point in the two-dimensional plane [x, y] = [start_at, end_at], then the documents valid at a given date are the points in the rectangle bounded by left = -infinity, right = date (start_at < date), bottom = date, and top = infinity (end_at > date).
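The rectangle condition can be checked in memory with a simple filter, which is handy for verifying expected results on a small sample. This is only an illustration of the predicate, not an indexed query; validOn is a hypothetical helper, and the sample documents assume "today" is 2013-03-22.

```javascript
// Hypothetical in-memory version of the rectangle query:
// keep documents with start_at < date and end_at > date.
function validOn(docs, date) {
  const t = date.getTime();
  return docs.filter(
    (d) => new Date(d.start_at).getTime() < t && new Date(d.end_at).getTime() > t
  );
}

const docs = [
  { name: "doc1", start_at: "2012-03-22T00:00:00Z", end_at: "2013-03-22T00:00:00Z" },
  { name: "doc2", start_at: "2013-01-22T00:00:00Z", end_at: "2013-03-22T00:00:00Z" },
];

// Six months before "today": only doc1 is valid.
const sixMonthsAgo = validOn(docs, new Date("2012-09-22T00:00:00Z"));
```

A spatial index answers the same predicate without scanning every document, which is the point of the extension discussed next.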
Unfortunately, the CouchDB team underrates the power of computational geometry and does not support multidimensional queries. There is a GeoCouch extension that allows this kind of query as easily as:
http://localhost:5984/places/_design/main/_spatial/points?bbox=0,0,180,90
on a view that emits a spatial value:
emit({ type: "Point", coordinates: [doc.start_at, doc.end_at] }, doc);
The problem is the data type. You get floats in the range [-180.0, 180.0] / [-90.0, 90.0] and need at least an int (UNIX time format). If GeoCouch works for you with ranges bigger than 180.0, and the precision of float operations designed for geographical calculations is sufficient for dates with second precision, your problem is solved :) I am sure that, with a few tricks and hacks, you could solve this problem efficiently in geo software. If not GeoCouch, then perhaps Elasticsearch (which also supports multidimensional queries), which is easy to use with CouchDB through its River plugin system.