"ignoreDocumentNotFound", "readCompleteInput" query options in explain - arangodb

When I use "explain" on an insert query, I get two query options that seem not to be documented:
ignoreDocumentNotFound
readCompleteInput
What are these options for, and what do they do?

Nice to see you like our db._explain() facility ;-)
To answer your question, one has to know that explain reuses backend functionality that is also used for other purposes:
distribute AQL queries in ArangoDB clusters
analyse what the optimizer did with queries in unit tests
The latter explains queries and checks whether certain assumptions about the query plan are still valid.
The ignoreDocumentNotFound and readCompleteInput flags are intended exactly for that purpose, so the unit tests can revalidate whether certain assumptions about the query still hold.
Since they add no value for the end user, they're not documented. One could argue whether explain should hide them to avoid irritation.

Related

Customize jOOQ dialect to alter the order in which LIMIT and OFFSET are rendered in a statement

I'm using jOOQ to generate queries to run against Athena (AKA PrestoDB/Trino)
To do this, I am using SQLDialects.DEFAULT, and it works because I use very basic query functionalities.
However, jOOQ renders queries like this:
select *
from "Artist"
limit 10
offset 10
God knows why, but the order of limit and offset seems to matter, and the query only works if written with the order swapped:
select *
from "Artist"
offset 10
limit 10
Is there a class I can subclass, to modify the statement render function so that the order of these are swapped? Or any other way of implementing this myself?
A generic solution in jOOQ
There isn't a simple way to change something as fundamental as the SELECT clause order (or any other SELECT clause syntax) in jOOQ, simply because this was never a requirement for core jOOQ usage, other than supporting fringe SQL dialects. Since supporting a SQL dialect is a lot of work in jOOQ (with all the integration tests, edge cases, etc.) and since the market shares of those dialects are low, it has simply never been a priority to improve this in jOOQ.
You may be tempted to think that this is "just" about the order of keywords in this one case. "Only this one case." It never is. It never stops, and the subtle differences between dialects never end. Just look at the jOOQ code base to get an idea of how weirdly different vendors choose to make their dialects. In this particular case, making this clause MySQL / PostgreSQL / SQLite compatible does seem obvious and simple, so your best chance is to file a feature request with the vendor. It should be in their own best interest to be more compatible with the market leaders, to facilitate migration.
Workarounds in jOOQ
You can, of course, patch your generated SQL on a low level, e.g. using an ExecuteListener and a simple regex. Whenever you encounter limit (\d+|\?) offset (\d+|\?), just swap the values (and bind values!). This might work reasonably well for top level selects. It's obviously harder if you're using LIMIT .. OFFSET in nested selects, but probably still doable.
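For illustration, here is a minimal sketch of such a listener (my naming; assuming jOOQ 3.15+, where ExecuteListener offers default methods). It only handles inlined literals; if you use bind values ("?"), you would have to swap the corresponding bind values as well:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jooq.ExecuteContext;
import org.jooq.ExecuteListener;

// Post-processes the rendered SQL just before execution.
public class SwapLimitOffsetListener implements ExecuteListener {

    private static final Pattern LIMIT_OFFSET =
        Pattern.compile("limit (\\d+) offset (\\d+)", Pattern.CASE_INSENSITIVE);

    @Override
    public void renderEnd(ExecuteContext ctx) {
        // Rewrite "limit X offset Y" into "offset Y limit X"
        Matcher m = LIMIT_OFFSET.matcher(ctx.sql());
        if (m.find())
            ctx.sql(m.replaceAll("offset $2 limit $1"));
    }
}

You would register it on your Configuration through an ExecuteListenerProvider, e.g. new DefaultExecuteListenerProvider(new SwapLimitOffsetListener()).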
Patching jOOQ is always an option. The class you're looking for in jOOQ 3.17 is org.jooq.impl.Limit. It contains all the rendering logic for this clause. If that's your only patch, then it might be possible to upgrade jOOQ. But obviously, patching is a slippery slope, as you may start patching all sorts of clauses, making upgrades impossible.
You can obviously use plain SQL templates for simple cases, e.g. resultQuery("{0} offset {1} limit {2}", actualSelect, val(10), val(10)). This doesn't scale well, but if it's only about 1-2 queries, it might suffice.
Using the SQLDialect.DEFAULT
I must warn you, at this point, that the behaviour of SQLDialect.DEFAULT is unspecified. Its main purpose is to produce something when you call QueryPart.toString() on a QueryPart that is not an Attachable, where a better SQLDialect is unavailable. The DEFAULT dialect may change between minor releases (or even patch releases, if there's an important bug in some toString() method), so any implementation you base on this is at risk of breaking with every upgrade.
The most viable long term solution
... would be to have support for these dialects in jOOQ:
#5414 Presto
#11485 Trino

MongoDB most efficient Query Strategy

I state that I have already tried looking in the Mongo documentation, but I have not found what I am looking for. I've also read similar questions, but they always talk about very simple queries. I'm working with Node's native MongoDB driver. This is a scalability problem, so the collections I am talking about can have millions of records or just a few dozen.
Basically I have a query and I need to validate all results (which have a complex structure). Two possible solutions come to mind:
I create a query as specific as possible and try to validate the result directly on the server
I use the cursor to go through the documents one by one from the client (this would also allow me to stop if I am looking for only one result)
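To make the two options concrete, here is a rough sketch using the MongoDB Java driver (the collection, field names and filter are invented for illustration; the same trade-off applies with the Node driver):

import static com.mongodb.client.model.Filters.*;

import org.bson.Document;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoCursor;

public class QueryStrategies {

    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                client.getDatabase("shop").getCollection("orders");

            // Strategy 1: push the whole predicate to the server,
            // so only matching documents cross the network.
            Document match = coll.find(and(
                    eq("status", "open"),
                    gte("total", 100),
                    elemMatch("items", gt("qty", 5))))
                .first();

            // Strategy 2: run a broader query and validate client-side,
            // stopping as soon as one document passes.
            try (MongoCursor<Document> cursor =
                     coll.find(eq("status", "open")).iterator()) {
                while (cursor.hasNext()) {
                    Document doc = cursor.next();
                    if (validate(doc))  // complex client-side check
                        break;
                }
            }
        }
    }

    private static boolean validate(Document doc) {
        return doc.getInteger("total", 0) >= 100;  // placeholder logic
    }
}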
Here is the question: what is the most efficient way, in terms of latency, overall time, bandwidth use and computational load on server/client? There is probably no single answer; in fact, I'd like to understand the pros and cons of the different approaches (and whichever approach you recommend). I know the solution should be determined on a case-by-case basis, however I am trying to figure out what would best cover most cases.
Also, to be more specific:
A) Being a complex query (several nested objects with ranges of values and lists of allowed values), performing the validation on the server would certainly save bandwidth, but is it always possible? And in terms of computation, could it be more efficient to do it on the client?
B) I don't understand the cursor behaviour: is it a continuously open stream until it is closed by the server/client? Also, does the result of next() already take up resources on the server/client beforehand, or does that only happen at the time of the call?
If anyone knows, I'd also like to know how Mongoose solved these "problems", for example in the case of custom validators.

Pagination in Gremlin

I'm trying to do pagination in Gremlin. I've followed the solution provided in the Gremlin recipes, so right now I am using the range() step to truncate results. This works well because when a user asks for x results, it only queries for x of them rather than performing a full search, which is significantly faster.
However in Gremlin docs it states that:
A Traversal’s result are never ordered unless explicitly by means of order()-step. Thus, never rely on the iteration order between TinkerPop3 releases and even within a release (as traversal optimizations may alter the flow).
And order is really important for pagination (we don't want users to see the same results on different pages).
Adding order() would really slow down querying, since the traversal would have to fetch all vertices that match the search and only then truncate them with range().
Any ideas how this can be solved so that results are consistent between queries, while still keeping the query time low by avoiding a full search?
The documentation is correct, but could use some additional context. Without order() you aren't guaranteed a specific order, unless the underlying graph database guarantees that order. TinkerPop should preserve that guarantee if it exists. So, ultimately you need to consider what the underlying graph will do.
As you are using JanusGraph, I'm pretty sure that it does not guarantee order of result iteration but there are caveats depending on the type of indexing that you do. You can read more about that here but may want to ask specific questions on the JanusGraph user list.
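For what it's worth, here is a minimal sketch (in Java, against an in-memory TinkerGraph; the label and property names are invented) of what stable pagination looks like once you order on a unique property:

import java.util.List;

import org.apache.tinkerpop.gremlin.process.traversal.Order;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.apache.tinkerpop.gremlin.tinkergraph.structure.TinkerGraph;

public class PaginationExample {

    public static void main(String[] args) {
        GraphTraversalSource g = TinkerGraph.open().traversal();

        int page = 2;
        int pageSize = 10;

        // Ordering by a unique key keeps page boundaries stable across
        // queries; whether the backend can serve this order from an index
        // determines how expensive the order() step is.
        List<Vertex> vertices = g.V().hasLabel("person")
            .order().by("personId", Order.asc)
            .range((long) page * pageSize, (long) (page + 1) * pageSize)
            .toList();
    }
}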

CouchDB a fit for a query like this?

I have been looking for an escape from GAE as the datastore does not support a lot of the things I want to do with it.
So I have looked at CouchDB (among others) and I really like the REST interface and the hosting option I found at Cloudant.
But for all my googling and reading any docs I could find, I still am not sure if it is a good fit.
So I come here in the hope that someone might have more insight.
I write web apps and a lot of the projects I want to do will involve a query that looks like this:
Find all entries that are within a user-input-lat/long bounding box and where start-time is less than user-input-time-1 and end-time is greater than user-input-time-2 and has all tags in user-input-list-of-tags.
That's not even pseudocode, but I hope it makes sense anyway.
I am not just looking for a "You cannot do that in CouchDB". Some kind of explanation and perhaps something like "If you can live without the tags then you can do this:"
I would like to use the Cloudant service, so GeoCouch is apparently out of the question, but they do offer something that is supposed to work like Lucene; does that mean the queries are slow?
As you can tell, I am a bit confused here, so just do your best to straighten me out and I'll be grateful :)
Leaving aside the tags (which are already a problem in themselves), what you describe is a multi-dimensional query: you have several "coordinates" (lat, long, start-time, end-time) and provide a range for each of these coordinates.
On its own, CouchDB cannot perform multi-dimensional queries at all — you only get single-dimension queries across one coordinate.
Tags certainly are possible, but it depends on whether you need documents that have at least one tag in the list, or documents that have all tags in the list. The first case is easy (run one query per tag using the bulk API); the second might require excessive amounts of memory (if a document has N tags, it needs to emit 2^N - 1 tag-sets in order to match all possible tag combinations involving it, so you should place an upper bound on either the number of tags in a document, or the number of tags in a query).
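To see why the all-tags case explodes, here is a small illustrative sketch that enumerates the tag-sets a single document would have to emit:

import java.util.ArrayList;
import java.util.List;

public class TagSubsets {

    // A document with N tags has 2^N - 1 non-empty tag subsets.
    static List<List<String>> nonEmptySubsets(List<String> tags) {
        List<List<String>> subsets = new ArrayList<>();
        int n = tags.size();
        for (int mask = 1; mask < (1 << n); mask++) {
            List<String> subset = new ArrayList<>();
            for (int i = 0; i < n; i++)
                if ((mask & (1 << i)) != 0)
                    subset.add(tags.get(i));
            subsets.add(subset);
        }
        return subsets;
    }

    public static void main(String[] args) {
        // 3 tags -> 7 subsets; 20 tags -> 1,048,575 subsets.
        System.out.println(nonEmptySubsets(List.of("a", "b", "c")).size());
    }
}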
Lucene does allow multi-dimensional and keyword-based queries, though I cannot vouch for their performance.

JPQL vs Criteria API

Well, first: I found this question:
What to use: JPQL or Criteria API?
and from other searches I gather that the only benefit of the Criteria API is that it can check the query and, if it isn't correct, return a compile-time error.
Is that right?
If not, what are the advantages and disadvantages of using JPQL versus the Criteria API?
P.S.: The question is born after this other question:
https://stackoverflow.com/questions/8342955/error-in-criteriaquery
and the difficulty I'm having in resolving that problem, when I can write the equivalent method using JPQL in 20 minutes...
I think what you are asking is when you would consider a Criteria query.
I will further assume you are using an Application Server.
The Criteria API exists to allow the construction of dynamic queries in a type-safe manner. Otherwise you would be concatenating query strings together, which is both error-prone and a security risk: SQL injection. That is essentially the only time you would want to use the Criteria API.
If the query remains basically the same and only needs to accept different parameters, you should use annotated @NamedQueries instead: they are simpler, precompiled, can be cached within the secondary cache and are validated during server startup.
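As an illustration, here is a minimal sketch of a dynamic Criteria query (using javax.persistence; Customer is assumed to be a mapped entity, and its fields are invented) where the WHERE clause depends on which filters the caller supplied:

import java.util.ArrayList;
import java.util.List;

import javax.persistence.EntityManager;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Predicate;
import javax.persistence.criteria.Root;

public class CustomerSearch {

    public List<Customer> search(EntityManager em, String name, String city) {
        CriteriaBuilder cb = em.getCriteriaBuilder();
        CriteriaQuery<Customer> query = cb.createQuery(Customer.class);
        Root<Customer> root = query.from(Customer.class);

        // Predicates are added only for the filters actually supplied,
        // so no string concatenation (and no injection risk) is involved.
        List<Predicate> predicates = new ArrayList<>();
        if (name != null)
            predicates.add(cb.equal(root.get("name"), name));
        if (city != null)
            predicates.add(cb.equal(root.get("city"), city));

        query.select(root).where(predicates.toArray(new Predicate[0]));
        return em.createQuery(query).getResultList();
    }
}

If instead only the parameter values change, the static alternative would be e.g. @NamedQuery(name = "Customer.byCity", query = "select c from Customer c where c.city = :city").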
That's basically the rule of thumb concerning Criteria queries versus @NamedQueries. In my experience you rarely need the Criteria API, but it is good that it exists for those rare times when it is required.
Hope this helps.
