GraphQL-Ruby pagination with limits generates N+1 queries

Imagine that you have users and subscriptions to something, and you need to paginate each user's subscriptions. Each user has a different number of subscriptions. This is the first thing that came to mind:
users = User.where(id: [array]).index_by(&:id) # find the users and index them by id
subs = Subs.where(user_id: [array]).limit(3).offset(1) # find subs for all the users we need
subs.each { |s| users[s.user_id].subs << s } # build the GraphQL response
But it won't work, because the limit applies to the whole result set while we need a separate limit per user.
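To make the failure concrete, this is roughly the SQL that snippet generates (a sketch; the table name and id values are assumptions). The LIMIT and OFFSET constrain the combined result set, not each user's slice:

SELECT * FROM subs WHERE user_id IN (1, 2) LIMIT 3 OFFSET 1;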
The output should be like this:
{
  users: [
    {
      id: 1,
      subs: [sub1, sub2] // this user has only two elements, so pagination ends here
    },
    {
      id: 2,
      subs: [sub3, sub4, sub5] // this user has more items on the next page
    }
  ]
}
By default GraphQL makes a sub-query per user to achieve this, but that is exactly the N+1 problem. Is there a way to do it without N+1 queries, optimized for CPU and memory usage?

Solved here: https://elixirforum.com/t/how-to-do-pagination-in-a-nested-graphql-query-with-dataloader-batch-load/25282. Maybe it helps somebody. The key is a query with window-function partitioning, like this:
def query(queryable, params) do
  case params do
    %{chapters: true, offset: offset, first: first} ->
      last = offset + first
      # Number the rows within each parent, then page on that number.
      query = from r in queryable, select: r, select_merge: %{chapter_number: fragment("row_number() over (PARTITION by parent_id order by \"name\")")}
      from r in subquery(query), select: %Wikisource.Book{id: r.id, name: r.name, info: r.info, preface: r.preface, info_html: r.info_html, preface_html: r.preface_html}, where: r.chapter_number >= ^offset and r.chapter_number < ^last
    %{order_by: order_by, offset: from, first: size} ->
      from record in queryable, order_by: ^order_by, offset: ^from, limit: ^size
  end
end
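For readers outside Elixir/Ecto, here is a minimal SQL sketch of the same window-function idea, mirroring the limit(3).offset(1) semantics of the original Ruby (table and column names are assumptions): number each user's rows within its partition, then keep only the requested page. One round trip serves every user, so there is no N+1.

-- Rank each user's subscriptions, then page within each partition.
SELECT *
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY id) AS rn
  FROM subs s
  WHERE user_id IN (1, 2)
) ranked
WHERE rn > 1 AND rn <= 4; -- offset 1, limit 3: rows 2..4 per user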

Related

ArangoDB AQL nested subqueries relying on the data from another

I currently have three collections that need to be routed into one endpoint. I want to get the Course collection and sort it; then, from each course, I have to use nested subqueries to fetch a random review (there could be multiple tied to the same course) and also get the related user.
User {
  name: ...
  _id: User/4638
  _key: ...
}
Review {
  _from: User/4638
  _to: Course/489
  date: ...
}
Course {
  _id: Course/489
  title: ...
}
The issue I'm having is fetching the user based on the review. I've tried MERGE, but that seems to limit the query to one user when there should be multiple. Below is the current output using LET.
"course": {
"_key": "789",
"_id": "Courses/789",
"_rev": "_ebjuy62---",
"courseTitle": "Pandas Essential Training",
"mostRecentCost": 15.99,
"hours": 20,
"averageRating": 5
},
"review": [
{
"_key": "543729",
"_id": "Reviews/543729",
"_from": "Users/PGBJ38",
"_to": "Courses/789",
"_rev": "_ebOrt9u---",
"rating": 2
}
],
"user": []
},
Here is the current LET subquery method I'm using. I was wondering if there is any way to pass or maybe nest the subqueries so that user can read review. Currently I try to pass the LET variable, but it isn't read in the output, since a blank array is shown.
FOR c IN Courses
  SORT c.averageRating DESC
  LIMIT 3
  LET rev = (FOR r IN Reviews
    FILTER c._id == r._to
    SORT RAND()
    LIMIT 1
    RETURN r)
  LET use = (FOR u IN Users
    FILTER rev._from == u._id
    RETURN u)
  RETURN {course: c, review: rev, user: use}
The result of the first LET subquery, rev, is an array with one element. You can rewrite the complete query in two ways:
Set rev to the first element of the LET subquery result:
FOR c IN Courses
  SORT c.averageRating DESC
  LIMIT 3
  LET rev = (FOR r IN Reviews
    FILTER c._id == r._to
    SORT RAND()
    LIMIT 1
    RETURN r)[0]
  LET use = (FOR u IN Users
    FILTER rev._from == u._id
    RETURN u)
  RETURN {course: c, review: rev, user: use}
I use this variant in my own projects.
Access the first element of rev in the second LET subquery:
FOR c IN Courses
  SORT c.averageRating DESC
  LIMIT 3
  LET rev = (FOR r IN Reviews
    FILTER c._id == r._to
    SORT RAND()
    LIMIT 1
    RETURN r)
  LET use = (FOR u IN Users
    FILTER rev[0]._from == u._id
    RETURN u)
  RETURN {course: c, review: rev, user: use}
This is untested, so the syntax might need slight changes. And you have to look at cases where there aren't any reviews; I can't say off the top of my head how this behaves in that case.

Increment a field conditioned by a WHERE

I can't seem to figure out how to do this in Sequelize. I have an instance from findOne, and I want to increment one of its fields using an expression, and only under certain conditions. Something such as:
UPDATE Account SET balance = balance - 10 WHERE balance >= 10;
I want the db to calculate the expression, as this isn't happening in a transaction. So I can't do a SET balance = 32. (I could do SET balance = 32 WHERE balance = 42, but that's not as effective.) I don't want to put a CHECK in there, as there are other places where I do want to allow a negative balance.
(Our Sequelize colleague has left, and I can't figure out how to do this).
I see instance.increment and instance.decrement, but it doesn't look like they take a where object.
I don't see how to express the assignment balance = balance - 10, nor how to express the condition in the where object.
You are probably looking for Model.decrement instead of instance.decrement. instance.decrement updates one specific record, so a where clause doesn't make sense there.
Model.decrement: https://sequelize.org/master/class/lib/model.js~Model.html#static-method-decrement
The example in the link shows a scenario similar to yours.
============================================
Update:
This translates to your example.
const Op = require('sequelize').Op;

Account.decrement('balance', {
  by: 10,
  where: {
    balance: {
      [Op.gte]: 10
    }
  }
});
Based on @Emma's comments, here's what I have working. amountToAdd is just that; cash_balance is the field I'm incrementing. The check on cash_balance ensures that if I'm decrementing (that is, amountToAdd < 0), the balance doesn't go below 0. I need to muck around with that some.
const options = {
  where: {
    user_id: userId,
    cash_balance: {
      [Op.gte]: amountToAdd
    }
  },
  by: amountToAdd
};
const incrResults = await models.User.increment('cash_balance', options);
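For reference, a call like that should translate into a single atomic UPDATE along these lines (a sketch with example values amountToAdd = 10 and userId = 42; the table name Users is an assumption):

-- The database evaluates the expression itself, so there is no
-- read-modify-write race even outside a transaction.
UPDATE Users
SET cash_balance = cash_balance + 10
WHERE user_id = 42 AND cash_balance >= 10;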

Sequelize top-level where with eagerly loaded models creates a subquery

I'm running into an issue where Sequelize creates a subquery of the primary model and then joins the includes with that subquery instead of directly with the primary model's table. The query conditions for the include(s) end up inside the subquery's WHERE clause, which makes it invalid. I have shortened names to keep this compact, hopefully without losing any relevant info.
Environment:
Nodejs: 6.11.3
Sequelize: 3.23.6 => Updated to 4.38.1 and problem persists
MySql: 5.7.23
Code snip models:
I.model:
models.I.hasMany(models.II);
models.I.belongsTo(models.CJ);
models.I.belongsTo(models.CJS);
II.model:
models.II.belongsTo(models.I);
CJ.model:
models.CJ.hasMany(models.I);
models.CJ.hasMany(models.CJS);
CJS.model:
models.CJS.hasMany(models.I);
Code snip query definition:
let where = {
  cId: '2',
  iAmt: { '$gt': 0 },
  '$or': [
    { '$CJ.a1$': { $like: '%246%' } },
    { '$CJ.a2$': { $like: '%246%' } },
    { '$I.cPN$': { $like: '%246%' } }
  ]
};
let query = {
  where: where,
  order: orderBy,
  distinct: true,
  offset: offset,
  limit: limit,
  include: [
    {
      model: CJ,
      as: 'CJ',
      required: false
    }, {
      model: CJS,
      as: 'CJS',
      required: false
    }, {
      model: II,
      as: 'IIs',
      required: false
    }
  ]
};
I.findAll(query)
Produces SQL like the following:
SELECT `I`.*, `CJ`.`_id` AS `CJ._id`, `CJS`.`_id` AS `CJS._id`, `IIs`.`_id` AS `IIs._id`
FROM (SELECT `I`.`_id`, `I`.`CJId`, `I`.`CJSId`, `I`.`CId`
FROM `Is` AS `I`
WHERE `I`.`CId` = '2' AND
`I`.`iA` > 0 AND
(`CJ`.`a1` LIKE '%246%' OR
`CJ`.`a2` LIKE '%246%' OR
`I`.`cPN` LIKE '%246%'
)
ORDER BY `I`.`iNum` DESC LIMIT 0, 10) AS `I`
LEFT OUTER JOIN `CJs` AS `CJ` ON `I`.`CJId` = `CJ`.`_id`
LEFT OUTER JOIN `CJSs` AS `CJS` ON `I`.`CJSId` = `CJS`.`_id`
LEFT OUTER JOIN `IIs` AS `IIs` ON `I`.`_id` = `IIs`.`IId`
ORDER BY `I`.`iNum` DESC;
I was expecting something like this:
SELECT `I`.*, `CJ`.`_id` AS `CJ._id`, `CJS`.`_id` AS `CJS._id`, `IIs`.`_id` AS `IIs._id`
FROM `Is` AS `I`
LEFT OUTER JOIN `CJs` AS `CJ` ON `I`.`CJId` = `CJ`.`_id`
LEFT OUTER JOIN `CJSs` AS `CJS` ON `I`.`CJSId` = `CJS`.`_id`
LEFT OUTER JOIN `IIs` AS `IIs` ON `I`.`_id` = `IIs`.`IId`
WHERE `I`.`CId` = '2' AND
`I`.`iA` > 0 AND
(`CJ`.`a1` LIKE '%246%' OR
`CJ`.`a2` LIKE '%246%' OR
`I`.`cPN` LIKE '%246%'
)
ORDER BY `I`.`iNum` DESC LIMIT 0, 10
If I remove the II model from the include, it does work and moves the WHERE to the top level. I admit the structure of the query is not straightforward here, with I being a child of CJ and of CJS, which in turn is a child of CJ, and then II a child of I. What am I missing here?
Bueller's or anyone's 2 cents welcome!
What happens here is that you are using order and limit together with an eagerly loaded association; see the issue. To make it work there is a slightly hacky solution: add subQuery: false to your root model query. (Caveat: with subQuery: false the LIMIT applies to the joined rows, so a hasMany include such as IIs can cause fewer primary rows to come back than expected.)
let query = {
  where: where,
  order: orderBy,
  distinct: true,
  offset: offset,
  limit: limit,
  subQuery: false,
  include: [...]
};

ArangoDB Faceted Search Performance

We are evaluating ArangoDB's performance in the space of facet calculations.
There are a number of other products capable of the same, either via a special API or a query language:
MarkLogic Facets
ElasticSearch Aggregations
Solr Faceting etc.
We understand there is no special API in Arango to calculate facets explicitly.
In reality it is not needed; thanks to the comprehensive AQL, it can easily be achieved via a simple query like:
FOR a IN Asset
  COLLECT attr = a.attribute1 INTO g
  RETURN { value: attr, count: length(g) }
This query calculates a facet on attribute1 and yields its frequencies in the form:
[
  {
    "value": "test-attr1-1",
    "count": 2000000
  },
  {
    "value": "test-attr1-2",
    "count": 2000000
  },
  {
    "value": "test-attr1-3",
    "count": 3000000
  }
]
It says that across the entire collection attribute1 takes three values (test-attr1-1, test-attr1-2 and test-attr1-3), with the related counts provided.
In essence we run a DISTINCT query and aggregate the counts.
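In SQL terms the equivalent would be a plain GROUP BY with a count (a sketch, assuming an asset table):

SELECT attribute1 AS value, COUNT(*) AS count
FROM asset
GROUP BY attribute1;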
It looks simple and clean, with only one, but really big, issue: performance.
The query above runs for 31 (!) seconds on a test collection of only 8M documents.
We have experimented with different index types and storage engines (with RocksDB and without) and investigated explain plans, to no avail.
The test documents we use are very concise, with only three short attributes.
We would appreciate any input at this point.
Either we are doing something wrong, or ArangoDB is simply not designed to perform in this particular area.
By the way, the ultimate goal would be to run something like the following in sub-second time:
LET docs = (FOR a IN Asset
  FILTER a.name LIKE 'test-asset-%'
  SORT a.name
  RETURN a)
LET attribute1 = (
  FOR a IN docs
    COLLECT attr = a.attribute1 INTO g
    RETURN { value: attr, count: length(g[*]) }
)
LET attribute2 = (
  FOR a IN docs
    COLLECT attr = a.attribute2 INTO g
    RETURN { value: attr, count: length(g[*]) }
)
LET attribute3 = (
  FOR a IN docs
    COLLECT attr = a.attribute3 INTO g
    RETURN { value: attr, count: length(g[*]) }
)
LET attribute4 = (
  FOR a IN docs
    COLLECT attr = a.attribute4 INTO g
    RETURN { value: attr, count: length(g[*]) }
)
RETURN {
  counts: (RETURN {
    total: LENGTH(docs),
    offset: 2,
    to: 4,
    facets: {
      attribute1: { from: 0, to: 5, total: LENGTH(attribute1) },
      attribute2: { from: 5, to: 10, total: LENGTH(attribute2) },
      attribute3: { from: 0, to: 1000, total: LENGTH(attribute3) },
      attribute4: { from: 0, to: 1000, total: LENGTH(attribute4) }
    }
  }),
  items: (FOR a IN docs LIMIT 2, 4 RETURN {id: a._id, name: a.name}),
  facets: {
    attribute1: (FOR a IN attribute1 SORT a.count LIMIT 0, 5 RETURN a),
    attribute2: (FOR a IN attribute2 SORT a.value LIMIT 5, 10 RETURN a),
    attribute3: (FOR a IN attribute3 LIMIT 0, 1000 RETURN a),
    attribute4: (FOR a IN attribute4 SORT a.count, a.value LIMIT 0, 1000 RETURN a)
  }
}
Thanks!
It turns out the main thread happened on the ArangoDB Google Group.
Here is a link to the full discussion.
Here is a summary of the current solution:
Run a custom build of ArangoDB from a specific feature branch where a number of performance improvements have been made (hopefully they make it into a main release soon)
No indexes are required for facet calculations
MMFiles is the preferred storage engine
AQL should be written to use "COLLECT attr = a.attributeX WITH COUNT INTO length" instead of "count: length(g)"
The AQL should be split into smaller pieces and run in parallel (we are using Java 8's Fork/Join to spread the facet AQLs and then join them into a final result)
One AQL query to filter/sort and retrieve the main entity, if required (when sorting/filtering, add a corresponding skiplist index)
The rest are small AQL queries, one per facet, returning value/frequency pairs
In the end we gained a more-than-10x speedup compared to the original AQL above.

CouchDB view map function that doesn't segregate reduce keys

Here is the doc "schema":
{
  type: "offer",
  product: "xxx",
  price: "14",
  valid_from: [2012, 7, 1, 0, 0, 0]
}
There are a lot of such documents, with many valid dates in the past and the future, and often two or three offers in the same month. I can't find a way to make the following view: given a date, give me a list of the products and their running offer for that date.
I think I need to emit the valid_from field in order to set the endkey of the query to the given date, and then I need to reduce on the max of this field, which means I can't emit it.
Have I got it wrong? I am totally new to the map/reduce concept. Any suggestions on how to do it?
I'm really thrown by your comments about wanting to reduce; based on your requirements you want just a map function, no reduce. Here's the map function based on what you asked for:
function(d) {
  if (d.type === 'offer') {
    var dd = d.valid_from;
    dd[1] = ('0' + (dd[1] + 1)).slice(-2); // remove the +1 if 7 is July not August
    dd[2] = ('0' + dd[2]).slice(-2);
    emit(dd.slice(0, 3).join('-'));
  }
}
Then to show all offers valid for a given day you'd query this view with params like:
endkey="2012-08-01"&include_docs=true
