Sort and limit capping results - node.js

I have a collection containing 20,000+ documents where only _id is indexed.
I want to query using a limit and a sort, but the limit seems to kick in before the sort.
collection.find({})
  .sort({kick_return_yards: -1}) // descending, largest to smallest
  .limit(500)
  .toArray(function (err, records) {
    // ...
  });
kick_return_yards is a non-negative whole number. I know I have values for this field ranging from 0 to 177.
With the above query I only get documents where kick_return_yards is 0, so it seems the sort is executed after the limit.
How do I get around this?

Related

Get PartitionedList for partition with more than 2000 documents

I have a partitioned Cloudant database (on the free tier) with a partition that has more than 2000 documents. Unfortunately, running await db.partitionedList('partitionID') returns this object:
{
total_rows: 2082,
offset: 0,
rows: [...]
}
where rows is an array of only 2000 objects. Is there a way for me to get those 82 remaining rows, or get a list of all 2082 rows together. Thanks.
Cloudant limits the _partition endpoints to returning a maximum of 2000 rows so you can't get all 2082 rows at once.
The way to get the remaining rows is to store the doc ID of the last row and use it as the startkey of a second request, appending \0 so the list starts from the next doc ID in the index, e.g.
db.partitionedList('partitionID', {
startkey: `${firstResponse.rows[1999].id}\0`
})
Note that partitionedList is the equivalent of /{db}/_partition/{partitionID}/_all_docs, so key and id are the same in each row and you can safely assume they are unique (because each is a doc ID), allowing you to use the Unicode \0 trick. However, if you wanted to do the same with a _view you'd need to store both the key and the id, and you'd fetch the 2000th row twice.
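Putting the above together, here is a hedged sketch of a loop that pages through an entire partition. listPage is a hypothetical wrapper around db.partitionedList, injected as a function so the pagination logic can be shown without a live database:

```javascript
// Sketch: fetch every row in a partition by repeatedly requesting pages
// of up to `pageSize` rows, using the last doc ID + "\0" as the next startkey.
async function fetchAllPartitionRows(listPage, pageSize = 2000) {
  const rows = [];
  let startkey; // undefined on the first request: start from the beginning
  for (;;) {
    const res = await listPage({ startkey, limit: pageSize });
    rows.push(...res.rows);
    if (res.rows.length < pageSize) break; // short page means we are done
    // _all_docs keys are doc IDs, so they are unique: append "\0" to start
    // the next page just after the last row we received.
    startkey = res.rows[res.rows.length - 1].id + "\0";
  }
  return rows;
}
```

With the real client, listPage would be something like opts => db.partitionedList('partitionID', opts).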

Limit returned entries per user per query

I have the following request:
let artPerCall = 20
let artPerUser = 2
let start = req.params.page
let query = `
SELECT * from
(
SELECT a.*, row_to_json(u.*) as userinfo,
row_number() over (partition by u.address order by a.date desc) as ucount
FROM artworks a INNER JOIN users u ON a.address = u.address
WHERE a.flag != ($1) OR a.flag IS NULL
) t
WHERE ucount <= ($2)
ORDER BY date DESC
LIMIT ${artPerCall} OFFSET ${(start-1) * artPerCall}`
pool.query(query, ["ILLEGAL", artPerUser])
.then(users => {
if (users) {
res.json(users.rows);
}
})
.catch(err => {
next(err);
})
Called through an API with the path /artworks/paginate/1/20 (i.e. /artworks/paginate/{page}/20).
The expected result is to get 20 results per call with a maximum of 2 entries per user.
The current result:
It seems that it returns only 2 entries per user as expected, but once it has returned 2 for a user on one page, no more results appear for that user on the following pages even if they have more entries.
Any idea what I'm missing?
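Incidentally, req.params.page arrives as a string, so a bad value would make the computed OFFSET NaN. A small guard (hypothetical helper, a sketch only) avoids that:

```javascript
// Sketch: validate the page parameter before computing the SQL OFFSET,
// since req.params.page is a string and e.g. "abc" would yield NaN.
function pageToOffset(page, perCall) {
  const p = Number.parseInt(page, 10);
  if (!Number.isInteger(p) || p < 1) throw new Error("invalid page");
  return (p - 1) * perCall; // page 1 -> offset 0, page 2 -> offset perCall, ...
}
```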
It seems that it returns only 2 entries per user as expected, but once it has returned 2 for a user on one page, no more results appear for that user on the following pages even if they have more entries.
Correct, this is what the query does. It selects:
row_number() over (partition by u.address order by a.date desc) as ucount
...
WHERE ucount <= ($2)
If parameter $2 is set to 2, as it is in your example code, then it selects at most 2 entries per user before sorting and pagination. If a user has more entries, they are filtered out.
If you remove WHERE ucount <= ($2) then you'll simply get all the results ordered by date, but that doesn't sound like what you want.
However, what I think you want to achieve sounds a bit complicated, and I'm not sure it would be great for usability either, as the results would look quite random to the user. So you will need to describe exactly what you want, with example data.
For example, if you want to avoid one user posting a lot of items with the same date pushing all the other users down in the search results, limiting the number of results per user is a good idea, but perhaps a button "more from this user..." would be a better choice than pushing the users' items down to the next pages.
Suppose you have only two users: user1 posts 20 items with date "today" and user2 posted 10 items yesterday. Do you want 2 items from user1, then 2 items from user2, then the 18 remaining items from user1, then the 8 remaining items from user2? Or should they be interleaved with each other somehow, which will make the date order a bit random in the results?
EDIT
Here's a proposal:
SELECT * from
(
SELECT *,
row_number() over (partition by user_id order by date desc) as ucount
FROM artworks
) t
ORDER BY (ucount/3)::INTEGER ASC, date DESC
LIMIT 20 OFFSET 0;
"(ucount/3)::INTEGER" is 0 for the first two artworks of each user (ucount 1 and 2), then 1 for the next three (ucount 3 to 5), then 2 for the next three, and so on. So the two most recent artworks of each user end up first, then the next three artworks of each user, etc.
Another one:
ORDER BY ucount < 3 DESC, date DESC
This will put the most recent 2 artworks of each user first (ucount < 3 is true for them, and DESC sorts true before false in PostgreSQL); the rest is then simply sorted by date.
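To see how the bucketing behaves, here is a plain JavaScript simulation of row_number() and the (ucount/3) ordering over hypothetical sample rows (two users, dates as plain numbers):

```javascript
// Sketch: simulate row_number() OVER (PARTITION BY user ORDER BY date DESC)
// and then ORDER BY floor(ucount / 3), date DESC, on in-memory sample data.
const rows = [
  { user: "alice", date: 5 }, { user: "alice", date: 4 },
  { user: "alice", date: 3 }, { user: "bob", date: 2 },
  { user: "bob", date: 1 },
];

// Assign ucount per user, newest first (row_number starts at 1).
const counts = {};
const numbered = rows
  .slice()
  .sort((a, b) => b.date - a.date)
  .map(r => ({ ...r, ucount: (counts[r.user] = (counts[r.user] || 0) + 1) }));

// Bucket 0 holds each user's two newest rows (ucount 1 and 2),
// bucket 1 the next three (ucount 3..5), and so on; within a bucket,
// rows are ordered by date descending.
numbered.sort((a, b) =>
  Math.floor(a.ucount / 3) - Math.floor(b.ucount / 3) || b.date - a.date
);
```

Alice's third artwork (date 3) lands in bucket 1, so it sorts after everything in bucket 0 despite being newer than both of Bob's.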

Is there a way to query for documents that have a certain millisecond in a timestamp? [Mongo]

I have a collection on my db with a "created" field that is a timestamp.
What I need to do is somehow get the milliseconds of that timestamp, divide them by 10, floor the result, and THEN match it against a number.
What I'm trying to achieve with this is a random spread of these documents into "batches" that range from 0 to 100.
Is there a way to do all this in one query? Basically something like:
collection.find({ <created.convertToMilliseconds.divideByTen>: X })
// X being any number from 0 to 100
I tried an aggregation to add that field, but then I'm fetching all the documents and filtering them client-side; I was wondering if there's a way to do all this "in-query".
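One hedged sketch, assuming "created" is stored as a BSON Date and the server is MongoDB 3.6+ (so $expr is allowed inside find()): the batch number is floor(milliseconds / 10), which in practice gives values 0 to 99.

```javascript
// Sketch: match documents whose "created" timestamp falls in batch X,
// where the batch is floor(milliseconds-part / 10). Assumes "created"
// is a Date and MongoDB 3.6+ for $expr in find().
const X = 42; // hypothetical target batch

const batchFilter = {
  $expr: {
    $eq: [{ $floor: { $divide: [{ $millisecond: "$created" }, 10] } }, X],
  },
};
// usage: collection.find(batchFilter).toArray(...)

// The same bucketing in plain JavaScript, for reference:
function batchOf(date) {
  return Math.floor(date.getMilliseconds() / 10); // 0..99
}
```

Note that such a query cannot use an index on "created", so it still scans the collection server-side; it just avoids shipping every document to the client.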

CouchDB views: total_rows vs offset vs rows?

I am making a POST request to a CouchDB with a list of keys in the body.
This is a follow-up to a previous question asked on Stack Overflow here: CouchDB Query View with Multiple Keys Formatting.
I see that the result reports 711 total rows in this case, with an offset of 209. To me, an offset means valid results that have been truncated, and you would need to go to the next page to see them.
I'm getting confused because the offset, the rows, and what I actually get do not seem to add up. These are the results I'm getting:
{
total_rows: 711,
offset: 209,
rows: [{
id: 'b45d1be2-9173-4008-9240-41b01b66b5de',
key: 2213,
value: [Object]
}, {
id: 'a73d0b13-5d36-431f-8a7a-2f2b45cb480d',
key: 2214,
value: [Object]
},
// etc. BUT THERE ARE ONLY 303 OBJECTS IN THIS ARRAY????
]
}
You have not supplied the query parameters you are using, so I'll have to be a little general.
The total_rows value is the total number of rows in the view itself. The offset is the index in the view of the first row matching the given query. The rows matching the query parameters are returned in the rows array, so their count is trivial to obtain (rows.length).
If there are no entries in the view for a direct key query, the offset value is the index into the view where the entry would be if it had the desired key.
It would seem that the offset is the number of documents in the view BEFORE the first document that matches the key criteria,
and the rows are all the documents that match the criteria.
i.e. rows returns all the documents that match the key criteria, and offset tells you the 'index' within all the docs in the view at which the first matching document was found.
Please let me know if this is not correct :)
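In numbers, for the response above (taking the asker's observed 303 rows), the three fields relate like this:

```javascript
// Sketch: how total_rows, offset, and rows relate in a CouchDB view response.
const response = { total_rows: 711, offset: 209, rows: new Array(303) };

const before = response.offset;                       // view rows before the first match
const matched = response.rows.length;                 // rows matching the query
const after = response.total_rows - before - matched; // view rows after the last match
```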

Cassandra aggregation

I have a Cassandra cluster with 4 tables and data inside them.
I want to run requests with aggregation functions (sum, max, ...), but I've read here that it's impossible:
http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/cql_function_r.html
Is there a way to do sum, average, and group by without buying the Enterprise version? Can I use Presto, or another solution?
Thanks
Aggregate functions will be available as part of Cassandra 3.0
https://issues.apache.org/jira/browse/CASSANDRA-4914
Sure it does. From the CQL documentation:
Native aggregates
Count
The count function can be used to count the rows returned by a query.
Example:
SELECT COUNT (*) FROM plays;
SELECT COUNT (1) FROM plays;
It can also be used to count the non-null values of a given column:
SELECT COUNT (scores) FROM plays;
Max and Min
The max and min functions can be used to compute the maximum and the
minimum value returned by a query for a given column. For instance:
SELECT MIN (players), MAX (players) FROM plays WHERE game = 'quake';
Sum
The sum function can be used to sum up all the values returned by a
query for a given column. For instance:
SELECT SUM (players) FROM plays;
Avg
The avg function can be used to compute the average of all the values
returned by a query for a given column. For instance:
SELECT AVG (players) FROM plays;
You can also create your own aggregates, more documentation on aggregates here: http://cassandra.apache.org/doc/latest/cql/functions.html?highlight=aggregate

Resources