JOOQ: multisetAgg or toSet filtering out NULL

Quite often, the new multisetAgg feature is used along with LEFT JOINs.
Let's say I have a user dimension table and a paid_subscriptions fact table. I want to query a specific user with all of their paid subscriptions, and for each subscription do some processing (like sending an email or whatever).
I would write some JOOQ like this:
ctx
    .select(row(
        USER.ID,
        USER.USERNAME,
        multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).as("subscr").convertFrom(r -> r.intoSet(Record1::value1))
    ).mapping(MyUserWithSubscriptionPOJO::new))
    .from(USER)
    .leftJoin(PAIDSUBSCRIPTIONS).onKey()
    .where(someCondition)
    .groupBy(USER)
    .fetch(Record1::value1);
The problem here is that multisetAgg produces a Set which can contain null as an element.
I either have to filter out the null subscriptions I don't care about after the JOOQ select, or I have to rewrite my query with something like this:
multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).as("subscr").convertFrom(r -> {
    final Set<String> res = r.intoSet(Record1::value1);
    res.remove(null); // remove possible nulls
    return res;
})
Neither looks particularly nice in code.
I wonder if there is a better approach with less code, an automatic filtering of null values, or some other kind of syntactic sugar available in JOOQ? After all, I think it is quite a common use case, especially considering that often enough I end up with some java8-style stream processing of my left-joined collection, whose first step is to filter out null, which is something I often forget :)
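For reference, the java8-style post-processing mentioned above is a one-liner, though it is easy to forget. A minimal sketch in plain Java, independent of jOOQ (the subscription names are made up):

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

public class FilterNulls {
    public static void main(String[] args) {
        // What a LEFT JOIN aggregation may hand back: one null element
        // for the unmatched side of the join.
        Set<String> withNull = new LinkedHashSet<>(Arrays.asList("gold", null, "silver"));

        // Drop the null element while collecting into a new Set.
        Set<String> cleaned = withNull.stream()
                .filter(Objects::nonNull)
                .collect(Collectors.toSet());

        System.out.println(cleaned.contains(null)); // false
        System.out.println(cleaned.size());         // 2
    }
}
```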

You're asking for a few things here:
SET instead of MULTISET (will be addressed with #12033)
Adding NULL filtering (is already possible with FILTER)
The implied idea that such NULL values could be removed automatically (might be addressed with #13776)
SET instead of MULTISET
The SQL standard has some notions of a SET as opposed to MULTISET or ARRAY. For example:
#13795
It isn't as powerful as MULTISET, and it doesn't have to be, because usually, just by adding DISTINCT you can turn any MULTISET into a SET. Nevertheless, Informix (possibly the most powerful ORDBMS) does have SET data types and constructors:
LIST (ARRAY)
MULTISET
SET
So, we might add support for this in the future, perhaps. I'm not sure yet of its utility, as opposed to using DISTINCT with MULTISET (already possible) or MULTISET_AGG (possible soon):
#12033
Adding NULL filtering
You already have the FILTER clause to do this directly in SQL. It's a SQL standard and supported by jOOQ natively, or via CASE emulations. A native SQL example, as supported by e.g. PostgreSQL:
select
    t.a,
    json_agg(u.c),
    json_agg(u.c) filter (where u.b is not null)
from (values (1), (2)) t (a)
left join (values (2, 'a'), (2, 'b'), (3, 'c'), (3, 'd')) u (b, c) on t.a = u.b
group by t.a
Producing:
|a |json_agg |json_agg |
|---|----------|----------|
|1 |[null] | |
|2 |["a", "b"]|["a", "b"]|
So, just write:
multisetAgg(PAIDSUBSCRIPTIONS.SUBNAME).filterWhere(PAIDSUBSCRIPTIONS.ID.isNotNull())
The implied idea that such NULL values could be removed automatically
Note, I understand that you'd probably like this to be done automatically. There's a thorough discussion on that subject here: #13776. As always, it's a desirable thing that is far from easy to implement consistently.
I'm positive that this will be done eventually, but it's a very big change.

Related

Jooq - converting nested objects

The problem I have is how to convert a jOOQ select query into some object. If I use the default jOOQ mapper, it works, but all fields must be mentioned, and in exact order. If I use SimpleFlatMapper, I have problems with multiset.
The problem with SimpleFlatMapper:
class Student {
    private final String id;
    Set<String> bookIds;
}

private static final SelectQueryMapper<Student> studentMapper =
    SelectQueryMapperFactory.newInstance().newMapper(Student.class);

var students = studentMapper.asList(
    context.select(
        STUDENT.ID.as("id"),
        multiset(
            select(BOOK.ID).from(BOOK).where(BOOK.STUDENT_ID.eq(STUDENT.ID))
        ).convertFrom(r -> r.intoSet(BOOK.ID)).as("bookIds"))
    .from(STUDENT).where(STUDENT.ID.eq("<id>"))
);
For the bookIds attribute, SimpleFlatMapper returns a Set of exactly one String ["[[book_id_1], [book_id_2]]"] instead of ["book_id_1", "book_id_2"].
As I already mentioned, this works with the default jOOQ mapper, but in my case not all attributes are mentioned in the columns, and it is possible that some attributes will be added which are not present in the table.
The question is: is there any possibility to tell SimpleFlatMapper that the mapping is one-to-one (Set to Set), or to have a default jOOQ mapper which ignores non-matching and out-of-order fields?
Also, what is the best approach in these situations?
Once you start using jOOQ's MULTISET and ad-hoc conversion capabilities, I doubt you still need third parties like SimpleFlatMapper, which I don't think can deserialise jOOQ's internally generated JSON serialisation format (currently an array of arrays, not an array of objects; there's no specification for this format, and it might change in any version).
Just use ad-hoc converters.
If I use default jooq mapper, it works but all fields must be mentioned, and in exact order
You should see that as a feature, not a bug. It increases type safety and forces you to think about your exact projection, helping you avoid projecting too much data (which will heavily slow down your queries!)
But you don't have to use the programmatic RecordMapper approach that is currently being advocated in the jOOQ manual and blog posts. The "old" reflective DefaultRecordMapper will continue to work, where you simply have to have matching column aliases / target type getters/setters/member names.

Best way to build dynamic SQL involving an optional limit?

What is the best way to optionally apply a LIMIT to a query in JOOQ? I want to run:
SelectSeekStepN<Record> readyToFetch = dslContext.select(selectFields).
    from(derivedTable).
    where(conditions).
    orderBy(orderForward);
if (length != Integer.MAX_VALUE)
    readyToFetch = readyToFetch.limit(length);
limit() returns SelectLimitPercentStep<Record> which is not a sub-class of SelectSeekStepN<Record> so I get a compiler error.
If, on the other hand, I change the return type of readyToFetch from SelectSeekStepN<Record> to Select<Record> which is compatible with the return type of limit() then I cannot invoke limit() on Select<Record>. I would need to explicitly cast it to SelectSeekStepN<Record>.
Is there a better way to do this?
Maybe JOOQ should treat Integer.MAX_VALUE as a special value (no limit) to make this kind of code easier to write...
Offering a no-op to pass to clauses like LIMIT
There is an occasional feature request asking for such no-op clauses in the DSL API, which would obviously be very helpful, especially in the case of LIMIT, where there is currently no non-hacky workaround. Unfortunately, there's no good solution yet, other than the one you've already mentioned in your question, to dynamically construct your SQL query.
For most clauses where optionality is required, something like a DSL.noCondition() exists. A DSL.noTable() has been requested, but not yet implemented (as of jOOQ 3.14). Same with a "no-op" for LIMIT: https://github.com/jOOQ/jOOQ/issues/11551
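The "no-op" idea is simply a neutral element for a clause. Outside of jOOQ, the same pattern can be sketched with plain java.util.function.Predicate, where x -> true plays the role of DSL.noCondition() (a plain-Java analogy, not jOOQ API):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class NoOpCondition {
    public static void main(String[] args) {
        // x -> true is the neutral element of and(): combining with it changes
        // nothing, just like DSL.noCondition() in a dynamic jOOQ WHERE clause.
        Predicate<Integer> noCondition = x -> true;
        Predicate<Integer> even = x -> x % 2 == 0;

        boolean filterEven = false; // imagine this comes from user input
        Predicate<Integer> condition = noCondition.and(filterEven ? even : noCondition);

        List<Integer> result = List.of(1, 2, 3, 4).stream()
                .filter(condition)
                .collect(Collectors.toList());

        System.out.println(result); // [1, 2, 3, 4] -- the no-op left everything in
    }
}
```

A hypothetical noLimit() would play exactly the same role for LIMIT, which is why the linked issue asks for it as explicit API rather than overloading null.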
Getting the types right with dynamic SQL
Your own question already contains the solution. It's just a minor typing problem. You probably chose to assign your intermediary step to SelectSeekStepN because your IDE suggested this type. But you can use any super type, instead.
Select<Record> readyToFetch;
SelectLimitStep<Record> readyToLimit;

readyToFetch = readyToLimit = dslContext.select(selectFields).
    from(derivedTable).
    where(conditions).
    orderBy(orderForward);

if (length != Integer.MAX_VALUE)
    readyToFetch = readyToLimit.limit(length);

readyToFetch.fetch();
You can take some inspiration from the ParserImpl logic, which does this all over the place. Assignment expressions are a blessing!
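The assignment-expression trick is independent of jOOQ and can be demonstrated with a minimal, hypothetical step hierarchy (the interface names below mirror, but are not, jOOQ's):

```java
// A minimal model of a fluent DSL: LimitStep is a subtype of the final Query
// type, mirroring how SelectLimitStep extends Select in jOOQ.
interface Query { String sql(); }
interface LimitStep extends Query { Query limit(int n); }

public class DynamicLimit {
    static LimitStep select() {
        return new LimitStep() {
            public String sql() { return "SELECT 1"; }
            public Query limit(int n) { return () -> "SELECT 1 LIMIT " + n; }
        };
    }

    public static void main(String[] args) {
        int length = 10;

        // The double assignment keeps both the wide type (Query) and the
        // narrow type (LimitStep) in scope at once.
        Query query;
        LimitStep step;
        query = step = select();
        if (length != Integer.MAX_VALUE)
            query = step.limit(length);

        System.out.println(query.sql()); // SELECT 1 LIMIT 10
    }
}
```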
Alternative using type inference on conditional expressions:
SelectLimitStep<Record> limit = dslContext.select(selectFields).
    from(derivedTable).
    where(conditions).
    orderBy(orderForward);

Result<?> result = (length != Integer.MAX_VALUE ? limit.limit(length) : limit).fetch();
Using null as a way to explicitly indicate the absence of LIMIT
Using null as a way to indicate the absence of a LIMIT is a very bad idea for at least 3 reasons:
Most jOOQ API interprets (Field<?>) null as a NULL bind value or NULL literal, never as an absent value. It would be very surprising if suddenly, we used null for that purpose only in LIMIT
Even if we did, we'd have to start distinguishing between null (the internal interpretation of an absent value) and null (the value you as a user provide jOOQ with explicitly). So, we'd need some noLimit() object anyway internally, to make the distinction, in case of which, why not just expose that as API instead of letting you hack around?
Some dialects support NULL limits. PostgreSQL interprets it as an absent LIMIT (which I find very confusing, the LIMIT being "unknown"). Oracle interprets it as LIMIT 0, which is much more reasonable. Other dialects (e.g. MySQL) reject LIMIT NULL as bad syntax, which is also reasonable. You're suggesting jOOQ override this behaviour and cleverly re-interpret it. I'd rather not!
I dug into the implementation and discovered that there is another method limit(Number) that treats null values as no limit. Consequently, the code could be written as:
Select<Record> readyToFetch = dslContext.select(selectFields).
    from(derivedTable).
    where(conditions).
    orderBy(orderForward).
    limit(length == Integer.MAX_VALUE ? null : length);

Raw sql with many columns

I'm building a CRUD application that pulls data using Persistent and executes a number of fairly complicated queries, for instance using window functions. Since these aren't supported by either Persistent or Esqueleto, I need to use raw SQL.
A good example is that I want to select rows in which the value does not deviate strongly from the previous value, so in pseudo-SQL the condition is WHERE val - lag(val) <= x. I need to run this selection in SQL, rather than pulling all data and then filtering in Haskell, because otherwise I'd have way too much data to handle.
These queries return many columns. However, the RawSql instance maxes out at tuples with 8 elements. So now I am writing additional functions from9, to9, from10, to10 and so on. And after that, all these are converted using functions with type (Single a, Single b, ...) -> DesiredType. Even though this could be shortened using code generation, the approach is simply hacky and clearly doesn't feel like good Haskell. This concerns me because I think most of my queries will require rawSql.
Do you have suggestions on how to improve this? Currently, my main thought is to denormalize the database and duplicate data, e.g. by including the lagged value as a column, so that I can query the data with Esqueleto.

cql binary protocol and named bound variables in prepared queries

imagine I have a simple CQL table
CREATE TABLE test (
    k int PRIMARY KEY,
    v1 text,
    v2 int,
    v3 float
)
There are many cases where one would want to make use of the schema-less essence of Cassandra and only set some of the values and do, for example, a
INSERT into test (k, v1) VALUES (1, 'something');
When writing an application to write to such a CQL table in a Cassandra cluster, the need to do this using prepared statements immediately arises, for performance reasons.
This is handled in different ways by different drivers. The Java driver, for example, has introduced (with the help of a modification to the CQL binary protocol) the possibility of using named bound variables. Very practical: CASSANDRA-6033
What I am wondering is what is the correct way, from a binary protocol point of view, to provide values only for a subset of bound variables in a prepared query?
Values in fact are provided to a prepared query by building a values list as described in
4.1.4. QUERY
[...]
Values. In that case, a [short] <n> followed by <n> [bytes]
values are provided. Those value are used for bound variables in
the query.
Please note the definition of [bytes]
[bytes] A [int] n, followed by n bytes if n >= 0. If n < 0,
no byte should follow and the value represented is `null`.
From this description I get the following:
"Values" in QUERY offers no ways to provide a value for a specific column. It is just an ordered list of values. I guess the [short] must correspond to the exact number of bound variables in a prepared query?
All values, no matter what types they are, are represented as [bytes]. If that is true, any interpretation of the [bytes] value is left to the server (conversion to int, short, text,...)?
Assuming I got this all right, I wonder if a 'null' [bytes] value can be used to just 'skip' a bound variable and not assign a value for it.
I tried this and patched the cpp driver (which is what I am interested in). Queries get executed, but when I perform a SELECT from cqlsh, I don't see the 'null' string representation for empty fields, so I wonder if that is a hack that for some reason just isn't crashing, or the intended way to do this.
I am sorry but I really don't think I can just download the java driver and see how named bound variables are implemented ! :(
---------- EDIT - SOLVED ----------
My assumptions were right, and support for skipping a field in a prepared query has now been added to the cpp driver (see here) by using a null [bytes] value.
What I am wondering is what is the correct way, from a binary protocol point of view, to provide values only for a subset of bound variables in a prepared query?
You need to prepare a query that only inserts/updates the subset of columns that you're interested in.
"Values" in QUERY offers no ways to provide a value for a specific column. It is just an ordered list of values. I guess the [short] must correspond to the exact number of bound variables in a prepared query?
That's correct. The ordering is determined by the column metadata that Cassandra returns when you prepare a query.
All values, no matter what types they are, are represented as [bytes]. If that is true, any interpretation of the [bytes] value is left to the server (conversion to int, short, text,...)?
That's also correct. The driver will use the returned column metadata to determine how to convert native values (strings, UUIDS, ints, etc) to a binary (bytes) format. Cassandra does the inverse of this operation server-side.
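The [bytes] framing from the quoted spec is easy to sketch: a signed 32-bit length prefix followed by the payload, with a negative length meaning null. A protocol-level sketch in Java (not actual driver code):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CqlBytes {
    // Encode a value as [bytes]: an [int] n followed by n bytes if n >= 0;
    // n = -1 with no payload represents null, per the protocol spec.
    static byte[] encodeBytes(byte[] value) {
        if (value == null) {
            return ByteBuffer.allocate(4).putInt(-1).array();
        }
        return ByteBuffer.allocate(4 + value.length)
                .putInt(value.length)
                .put(value)
                .array();
    }

    public static void main(String[] args) {
        byte[] text = encodeBytes("something".getBytes(StandardCharsets.UTF_8));
        byte[] nul  = encodeBytes(null);

        // 4-byte length prefix + 9 payload bytes for the text value
        System.out.println(text.length); // 13
        // null is just the 4-byte prefix containing -1
        System.out.println(nul.length);  // 4
        System.out.println(ByteBuffer.wrap(nul).getInt()); // -1
    }
}
```

The server reinterprets the payload using the column metadata established at prepare time, which is why the frame itself carries no type information.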
Assuming I got this all right, I wonder if a 'null' [bytes] value can be used to just 'skip' a bound variable and not assign a value for it.
A null column insertion is interpreted as a deletion.
Implementation of what I was trying to achieve has been done (see here) based on the principle I described.

How do I structure a SELECT query for the following

Hoping that someone here will be able to provide some mysql advice...
I am working on a categorical searchtag system. I have tables like the following:
EXERCISES
exerciseID
exerciseTitle
SEARCHTAGS
searchtagID
parentID ( -> searchtagID)
searchtag
EXERCISESEARCHTAGS
exerciseID (Foreign key -> EXERCISES)
searchtagID (Foreign key -> SEARCHTAGS)
Searchtags can be arranged in an arbitrarily deep tree. So for example I might have a tree of searchtags that looks like this...
Body Parts
Head
Neck
Arm
Shoulder
Elbow
Leg
Hip
Knee
Muscles
Pecs
Biceps
Triceps
Now...
I want to select all of the searchtags in ONE branch of the tree that reference at least ONE record in the subset of records referenced by a SINGLE searchtag in a DIFFERENT branch of the tree.
For example, let's say the searchtag "Arm" points to a subset of exercises. If any of the exercises in that subset are also referenced by searchtags from the "Muscles" branch of SEARCHTAGS, I would like to select for them. So my query could potentially return "Biceps," "Triceps".
Two questions:
1) What would the SELECT query for something like this look like? (If such a thing is even possible without creating a lot of slow down. I'm not sure where to start...)
2) Is there anything I should do to tweak my datastructure to ensure this query will continue to run fast - even as the tables get big?
Thanks in advance for your help, it's much appreciated.
An idea: consider using a cache table that saves all ancestor relationships in your searchtags:
CREATE TABLE SEARCHTAGRELATIONS (
    parentID INT,
    descendantID INT
);
Also include the tag itself as parent and descendant (so, for the searchtag with ID 1, the relations table includes a row (1,1)).
That way, you get rid of the parent/descendant relationships and can join a flat table. Assuming "Muscles" has the ID 5,
SELECT descendantID FROM SEARCHTAGRELATIONS WHERE parentID=5
returns all searchtags contained in muscles.
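The cache table is just the transitive closure of the parent relation. Building it from a parent map can be sketched in a few lines of plain Java (an in-memory illustration; the tag names come from the example tree above):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ClosureTable {
    // Walk up from every tag and collect (ancestor, descendant) pairs on the fly,
    // like "SELECT descendantID FROM SEARCHTAGRELATIONS WHERE parentID = ?".
    static Set<String> descendantsOf(String ancestor, Map<String, String> parent) {
        Set<String> tags = new HashSet<>(parent.keySet());
        tags.addAll(parent.values());
        Set<String> descendants = new TreeSet<>();
        for (String tag : tags) {
            // Every tag is its own ancestor, so the (1,1)-style rows fall out naturally.
            for (String cur = tag; cur != null; cur = parent.get(cur)) {
                if (cur.equals(ancestor))
                    descendants.add(tag);
            }
        }
        return descendants;
    }

    public static void main(String[] args) {
        Map<String, String> parent = new HashMap<>(); // child -> parentID
        parent.put("Biceps", "Muscles");
        parent.put("Triceps", "Muscles");
        parent.put("Shoulder", "Arm");
        parent.put("Arm", "Body Parts");

        System.out.println(descendantsOf("Muscles", parent)); // [Biceps, Muscles, Triceps]
    }
}
```

In the database version you would precompute these rows once (and maintain them on inserts/moves), trading write complexity for a flat, indexable join at query time.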
Alternatively, use modified preorder tree traversal, also known as the nested set model. It requires two fields (left and right) instead of one (parent id), and makes certain operations harder, but makes selecting whole branches much easier.