Cypher pagination: total result count

I have a monstrosity of a Cypher query and I need to paginate its results. What I am trying to do is get the total number of results before LIMIT is applied.
Here is my test graph: http://console.neo4j.org/?id=6hq9tj
I tried to use count(o) in all parts of the query, but I always get the same result: 'total_count: 1', as in here: http://console.neo4j.org/?id=konr7. The result I am trying to get is 'total_count: 6'.
I could always run a second query just to count the results, but it seems wasteful to execute two queries.
Can anyone help me with this? Thanks!

Something like this should work:
MATCH (o:Brand)
WITH o
ORDER BY o.name
WITH collect({uuid:o.uuid, name:o.name}) AS brands, COUNT(DISTINCT o.uuid) AS total
UNWIND brands AS brand_row
WITH total, brand_row
SKIP 5
LIMIT 5
RETURN COLLECT(brand_row) AS brands, total;
Note: this is untested; something similar worked for me. I'm also not sure how performant it is.
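Outside of Cypher, the collect-then-unwind trick boils down to: compute the count before you slice. A minimal Python sketch with made-up brand data (the uuids and names are invented for illustration):

```python
# Simulate the Cypher pattern: collect all rows, remember the total,
# then "unwind" and apply SKIP/LIMIT to get a single page.
brands = sorted(
    [{"uuid": i, "name": f"Brand {i}"} for i in range(6)],
    key=lambda b: b["name"],
)

total = len(brands)          # COUNT(...) taken before pagination
skip, limit = 5, 5           # SKIP 5 LIMIT 5
page = brands[skip:skip + limit]

print(total)  # 6 -- the count survives because it was computed before slicing
print(page)   # only the rows for the requested page
```

The key point is the ordering of operations: once the total is captured alongside the collected rows, the later skip/limit only trims the page, not the count.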

The only way I've gotten this to work is by writing the query twice. I'm not sure what the impact on performance is; I would guess (or hope) the first match is cached. Be warned: as my comment above to the question states, this is not a real solution, because if you use an offset that is out of range, nothing is returned!
// first query only to get count
MATCH (x:Brand)
WITH count(*) as total
// query again to get results :(
MATCH (o:Brand)
WITH total, o
ORDER BY o.name SKIP 5 LIMIT 5
WITH total, collect({uuid:o.uuid, name:o.name}) AS brands
RETURN {total:total, brands:brands}
If anyone comes up with a better solution, I would love to see it as well; I've spent enough time trying to get this to work properly.
Here is a slightly better solution that can handle an offset out of range...
// first query to get results
MATCH (o:Brand)
WITH o
ORDER BY o.name SKIP 5 LIMIT 5
WITH collect({uuid:o.uuid, name:o.name}) AS brands
// then query again to get count
MATCH (x:Brand)
WITH brands, count(*) as total
RETURN {total:total, brands:brands}
But it's still two queries, and it isn't a valid answer to the original question.

Related

How to retrieve all results from NearBySearch on Azure?

I am using NearBySearch from Microsoft Azure. The official documentation says that when you make a query, the API reports the number of matches as totalResults. However, it also says there is a limit on the number of items returned, which is at most 100.
In the case that totalResults >= limit == 100, the API will only return the first 100 results, thus not showing the remaining ones.
Question: Would you be able to suggest a way to retrieve the additional results using the NearBySearch function?
Note: On the Google API NearBySearch there is a parameter called next_page_token, which allows to view all the possible results. Is there something similar in Azure?
Each query is limited to 100 results. If totalResults is 150, you can execute the query with ofs=0 and limit=100 to get the first 100 entries. After that, execute a second query with ofs=100 (it works like an index) and limit=100; because there are only 50 results left, numResults will be 50.
I hope that is understandable.
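The ofs/limit bookkeeping described above can be sketched in Python; a small helper (the function name is made up) that yields the parameters for each request:

```python
def page_params(total_results, page_size=100):
    """Yield (ofs, limit) pairs covering all results, page_size per request."""
    ofs = 0
    while ofs < total_results:
        # Last page may hold fewer than page_size items.
        yield ofs, min(page_size, total_results - ofs)
        ofs += page_size

# For 150 total results this produces two requests:
print(list(page_params(150)))  # [(0, 100), (100, 50)]
```

Each (ofs, limit) pair then maps onto one HTTP call with those query-string parameters.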
Would you be able to suggest a way to retrieve the additional results
using the NearBySearch function?
Looking at the documentation, I noticed that there is an offset parameter (ofs) which defaults to zero. You should be able to use it to get the next set of results whenever the total results exceed the limit you specified.

Find documents in MongoDB with non-typical limit

I have a problem but no idea how to resolve it.
I've got PointValues collection in MongoDB.
PointValue schema has 3 parameters:
dataPoint (ref to DataPoint schema)
value (Number)
time (Date)
There is one pointValue for every hour (24 per day).
I have an API method to get the PointValues for a specified DataPoint and time range. The problem is that I need to limit the result to at most 1000 points. The typical limit(1000) method isn't a good fit, because I need points spread across the whole specified time range, with a time step that depends on the range and the number of point values.
So... for example:
Requesting data for 1 year = 1 * 365 * 24 = 8760 points.
It should return 1000 values, approximately 1 value per (24 / (1000 / 365)) = ~9 hours.
I have no idea which method I should use to filter that data in MongoDB.
Thanks for the help.
Sampling exactly like that in the database would be quite hard to do and likely not very performant. But an option that gives you a similar result would be to use an aggregation pipeline that $group's the $first value by $year, $dayOfYear, and $hour (and $minute and $second if you need smaller intervals). That way you can sample values by time steps, but your choice of step length is limited to what you have date operators for. So "hourly" samples are easy, but "9-hourly" samples get complicated. When this query is performance-critical and frequent, you might want to consider creating additional collections with daily, hourly, minutely etc. DataPoints so you don't need to perform that aggregation on every request.
But your documents are quite lightweight, because the actual payload is in a different collection. So you might consider fetching all the results in the requested time range and doing the skipping on the application layer. You might also combine this with the aggregation described above to pre-reduce the dataset: first use an aggregation pipeline to get hourly results into the application, then skip through the result set in steps of 9 documents. Whether or not this makes sense depends on how many documents you expect.
Also, remember to create an index on the time field.
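If you do the skipping on the application layer as suggested above, the step arithmetic is simple; a Python sketch (the function name is made up):

```python
import math

def downsample(points, max_points=1000):
    """Keep every step-th point so at most ~max_points survive,
    spread evenly across the whole time range."""
    if len(points) <= max_points:
        return points
    step = math.ceil(len(points) / max_points)
    return points[::step]

hourly = list(range(365 * 24))   # 8760 hourly values for one year
sample = downsample(hourly)
print(len(sample))  # 974 points, i.e. roughly one per 9 hours
```

The same slicing works on a list of documents fetched from the time-range query; only the step computation matters.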

Mongodb: How to get records N to M?

Let's say we want to get records 5 to 10 (the second-to-last 5 records).
What query should be used in Node.js or the mongodb shell?
I know that to get the last 5 messages one could try this (in Node.js):
db.collection(collection_name).find().limit(5);
As @Saleem posted in the comments, you would need to do a .skip():
db.coll.find(queryDoc).skip(x).limit(y)
However, to have a predictable order, you should add a .sort()
db.coll.find(queryDoc).sort(sortDoc).skip(x).limit(y)
limit cannot return a set of results in a range; it is literally for limiting the results to X number.
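The sort/skip/limit semantics can be sketched in Python over made-up documents; to get the second-to-last 5 records of N documents in ascending order, skip N - 10 and limit 5:

```python
def find_page(docs, sort_key, skip, limit):
    """Mimic db.coll.find().sort(...).skip(x).limit(y) on a plain list."""
    return sorted(docs, key=sort_key)[skip:skip + limit]

docs = [{"_id": i} for i in range(20)]  # invented collection of 20 docs

# Last 5 records are _id 15..19, so the second-to-last 5 are _id 10..14:
page = find_page(docs, lambda d: d["_id"], skip=len(docs) - 10, limit=5)
print([d["_id"] for d in page])  # [10, 11, 12, 13, 14]
```

Note that without a sort the database is free to return documents in any order, which is why the answer above insists on adding .sort() before skip/limit.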
Assuming you're still using Node.js, what you can try, according to the documentation, is to use min/max:
min {Number}, min set index bounds.
max {Number}, max set index bounds.
db.collection(user_name).find().min({index:5}).max({index:10});

Traversing the optimum path between nodes

In a graph where there are multiple paths from point (:A) to (:B) through a node (:C), I'd like to extract the paths from (:A) to (:B) through nodes of type (c:C) where c.Value is maximal. For instance: connect all movies with only their oldest common actors.
match p=(m1:Movie) <-[:ACTED_IN]- (a:Actor) -[:ACTED_IN]-> (m2:Movie)
return m1.Name, m2.Name, a.Name, max(a.Age)
The above query returns the proper age for the oldest actor, but not always the correct name.
Conversely, I noticed that the following query returns both the correct age and name.
match p=(m1:Movie) <-[:ACTED_IN]- (a:Actor) -[:ACTED_IN]-> (m2:Movie)
with m1, m2, a order by a.age desc
return m1.name, m2.name, a.name, max(a.age), head(collect(a.name))
Would this always be true? I guess so.
Is there a better way to do the job without the sorting, which may be costly?
You need to use ORDER BY ... LIMIT 1 for this:
match p=(m1:Movie) <-[:ACTED_IN]- (a:Actor) -[:ACTED_IN]-> (m2:Movie)
return m1.Name, m2.Name, a.Name, a.Age order by a.Age desc limit 1
Be aware that what you basically want is a weighted shortest path. Neo4j can do this more efficiently using Java code and the GraphAlgoFactory; see the chapter on this in the reference manual.
For those who want to do similar things, consider reading this post from @_nicolemargaret, which describes how to extract the n oldest actors acting in pairs of movies, rather than just the first as with head(collect()).
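To see why ordering descending and then taking head(collect(...)) agrees with the per-group maximum, and why a full sort isn't strictly needed, here is a Python sketch over made-up rows (one tuple per movie-pair/actor path, as the query would emit):

```python
rows = [
    ("Movie A", "Movie B", "Alice", 71),
    ("Movie A", "Movie B", "Bob", 54),
    ("Movie A", "Movie C", "Carol", 63),
]

# For each movie pair keep only the single oldest actor: a linear scan,
# no global sort required.
oldest = {}
for m1, m2, name, age in rows:
    key = (m1, m2)
    if key not in oldest or age > oldest[key][1]:
        oldest[key] = (name, age)

print(oldest[("Movie A", "Movie B")])  # ('Alice', 71)
```

Sorting descending and taking the head gives the same pair (name, age) per group; the scan just does it in one pass, which is what a LIMIT 1 per group would ideally compile to.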

How to get total rows for cypher with skip limit?

I am able to use SKIP, LIMIT (and ORDER BY) to fetch the contents of a particular page in the UI.
E.g., to render the nth page with page size m, the UI asks for SKIP n*m and LIMIT m.
But the UI also wants to generate links for all the possible pages. For that, I have to return the total number of rows available in Neo4j.
E.g., for a total of p rows, the UI will generate hyperlinks 1, 2, 3, ..., (p/m).
What is the best way (in terms of performance) to get the total number of rows while using SKIP and LIMIT in Cypher?
In general this is not advisable, as fetching all results requires pulling large swaths of the graph into memory.
You have two options:
use a simpler version of your query as a separate count query (which might also run asynchronously)
merge the count query and your real query into one, but it will be much more expensive than your skip-limit query alone; in the worst case, totalcount/pageSize times more expensive:
start n=node:User(name={username})
match n-[:KNOWS]->()
with n,count(*) as total
match n-[:KNOWS]->m
return m.name, total
skip {offset}
limit {pagesize}
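Once the total is returned alongside the page, the UI-side link generation mentioned in the question is just a ceiling division; a small Python sketch (names are made up):

```python
import math

def page_links(total_rows, page_size):
    """Page numbers the UI should render for total_rows results."""
    return list(range(1, math.ceil(total_rows / page_size) + 1))

print(page_links(42, 10))  # [1, 2, 3, 4, 5] -- the 5th page holds rows 41-42
```

The ceiling (rather than floor) matters so that a final partial page still gets a link.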
