Customize jOOQ dialect to alter the order in which LIMIT and OFFSET are rendered in a statement - jooq

I'm using jOOQ to generate queries to run against Athena (AKA PrestoDB/Trino).
To do this, I am using SQLDialect.DEFAULT, and it works because I only use very basic query functionality.
However, jOOQ renders queries like this:
select *
from "Artist"
limit 10
offset 10
God knows why, but the order of limit and offset seems to matter, and the query only works if written with the order swapped:
select *
from "Artist"
offset 10
limit 10
Is there a class I can subclass, to modify the statement render function so that the order of these are swapped? Or any other way of implementing this myself?

A generic solution in jOOQ
There isn't a simple way to change something as fundamental as the order of SELECT clauses (or any other SELECT clause syntax) in jOOQ, simply because this was never a requirement for core jOOQ usage, other than for supporting fringe SQL dialects. Since supporting a SQL dialect is a lot of work in jOOQ (with all the integration tests, edge cases, etc.) and since the market shares of those dialects are low, improving this has simply never been a priority.
You may be tempted to think that this is "just" about the order of keywords in this one case. "Only this one case." It never is. It never stops, and the subtle differences between dialects never end. Just look at the jOOQ code base to get an idea of how weirdly different vendors choose to make their dialects. In this particular case, making this clause MySQL / PostgreSQL / SQLite compatible seems extremely obvious and simple, so your best chance is to make a case with the vendor for a feature request. It should be in their own best interest to be more compatible with the market leaders, to facilitate migration.
Workarounds in jOOQ
You can, of course, patch your generated SQL at a low level, e.g. using an ExecuteListener and a simple regex. Whenever you encounter limit (\d+|\?) offset (\d+|\?), just swap the values (and bind values!). This might work reasonably well for top-level selects. It's obviously harder if you're using LIMIT .. OFFSET in nested selects, but probably still doable.
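A minimal sketch of the regex part of that workaround, written in plain Java so it runs without jOOQ. It assumes the SQL ends up containing limit ... offset ... as in the question; the class LimitOffsetSwapper is hypothetical. In an actual ExecuteListener you would presumably call it from renderEnd(ExecuteContext) and write the result back with ctx.sql(String). When both clauses use ? placeholders, the corresponding bind values must be swapped as well, which this sketch does not do, and it makes no attempt to skip string literals or comments.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: rewrites "limit X offset Y" as "offset Y limit X". */
final class LimitOffsetSwapper {

    // Same idea as the regex above, but with \s+ instead of single spaces so that
    // formatted SQL (clauses on separate lines) matches too, and \b so that
    // identifiers merely ending in "limit" are not matched. Each value is either
    // a number or a "?" bind placeholder.
    private static final Pattern LIMIT_OFFSET = Pattern.compile(
        "\\blimit\\s+(\\d+|\\?)\\s+offset\\s+(\\d+|\\?)", Pattern.CASE_INSENSITIVE);

    private LimitOffsetSwapper() {}

    /** Rewrites every "limit X offset Y" occurrence as "offset Y limit X". */
    static String swap(String sql) {
        // Group 1 is the limit value, group 2 the offset value.
        return LIMIT_OFFSET.matcher(sql).replaceAll("offset $2 limit $1");
    }
}
```

For example, swap("select * from \"Artist\" limit 10 offset 20") yields select * from "Artist" offset 20 limit 10. Because this works on the final SQL string, it cannot tell a top-level LIMIT from one inside a nested select; that is exactly the limitation mentioned above.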
Patching jOOQ is always an option. The class you're looking for in jOOQ 3.17 is org.jooq.impl.Limit. It contains all the rendering logic for this clause. If that's your only patch, then upgrading jOOQ might still be possible. But obviously, patching is a slippery slope, as you may start patching all sorts of clauses, making upgrades impossible.
You can obviously use plain SQL templates for simple cases, e.g. resultQuery("{0} offset {1} limit {2}", actualSelect, val(10), val(10)). This doesn't scale well, but if it's only about 1-2 queries, it might suffice.
Using the SQLDialect.DEFAULT
I must warn you, at this point, that the behaviour of SQLDialect.DEFAULT is unspecified. Its main purpose is to produce something when you call QueryPart.toString() on a QueryPart that is not an Attachable, where a better SQLDialect is unavailable. The DEFAULT dialect may change between minor releases (or even patch releases, if there's an important bug in some toString() method), so any implementation you base on this is at risk of breaking with every upgrade.
The most viable long-term solution
... would be to have support for these dialects in jOOQ:
#5414 Presto
#11485 Trino

Related

JOOQ - Is there any similar tool like SQL 2 jOOQ Parser?

Since jOOQ 3.6+, they no longer ship the SQL 2 jOOQ Parser. Searching the internet, I can't find the SQL 2 jOOQ Parser tool anywhere.
I just wonder: is there any tool similar to SQL 2 jOOQ Parser, so that we can generate jOOQ code from native SQL?
There's a feature request for this:
https://github.com/jOOQ/jOOQ/issues/6277
From the feature request:
This was already implemented in the past by the https://github.com/sqlparser/sql2jooq third party module, but it suffered from several flaws:
It didn't produce very good jOOQ code
It worked only for MySQL and PostgreSQL
It depended on a third party parser (by Gudu Soft), which was proprietary and not under our control
It was hard to use
The product got zero (!) user feedback over quite some time, which is never a good sign.
Eventually, we'll re-iterate the idea, but it's a lot of work, and there are probably more interesting things that can be done first. The approach most people will choose when writing jOOQ queries is they:
Choose a test driven approach where the feedback cycle is tight, such that executing a query to test if it's correct is done relatively easily
Use views (seriously, use views! Why don't people use views more often?) for your very complex static SQL and query the views from jOOQ

JOOQ vs SQL Queries

I am working with jOOQ queries now... I feel that SQL queries look more readable and maintainable, so why do we need to use jOOQ instead of native SQL queries?
Can someone explain a few reasons for using it?
Thanks.
Here are the top value propositions that you will never get with native (string based) SQL:
Dynamic SQL is what jOOQ is really really good at. You can compose the most complex queries dynamically based on user input, configuration, etc. and still be sure that the query will run correctly.
An often underestimated effect of dynamic SQL is the fact that you will be able to think of SQL as an algebra: instead of writing difficult-to-compose native SQL syntax (with all the keywords, weird parenthesis rules, etc.), you can think in terms of expression trees, because you're effectively building an expression tree for your queries. Not only will this allow you to implement more sophisticated features, such as SQL transformation for multi-tenancy or row-level security, but also everyday things like transforming a set of values into a SQL set operation.
Vendor agnosticity. As soon as you have to support more than one SQL dialect, writing SQL manually is close to impossible because of the many subtle differences in dialects. The jOOQ documentation illustrates this e.g. with the LIMIT clause. Once this is a problem you have, you have to use either JPA (much restricted query language: JPQL) or jOOQ (almost no limitations with respect to SQL usage).
Type safety. Now, you will get type safety when you write views and stored procedures as well, but very often, you want to run ad-hoc queries from Java, and there is no guarantee about table names, column names, column data types, or syntax correctness when you do SQL in a string based fashion, e.g. using JDBC or JdbcTemplate, etc. By the way: jOOQ encourages you to use as many views and stored procedures as you want. They fit perfectly in the jOOQ paradigm.
Code generation. Which leads to more type safety. Your database schema becomes part of your client code. Your client code no longer compiles when your queries are incorrect. Imagine someone renaming a column and forgetting to refactor the 20 queries that use it. IDEs only provide some degree of safety when writing the query for the first time; they don't help you when you refactor your schema. With jOOQ, your build fails and you can fix the problem long before you go into production.
Documentation. The generated code also acts as documentation for your schema. Comments on your tables and columns turn into Javadoc, which you can introspect in your client language, without having to look them up on the server.
Data type bindings are very easy with jOOQ. Imagine using a library of 100s of stored procedures. Not only will you be able to access them in a type-safe way (through code generation), as if they were actual Java code, but you also don't have to worry about the tedious and useless activity of binding every single in and out parameter to a type and value.
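To make the vendor agnosticity point concrete with the LIMIT clause, here is a deliberately tiny, hypothetical renderer (PaginationRenderer and its Dialect enum are made up for illustration and are not jOOQ's API) that turns the same "skip 20, take 10" request into the syntax of a few dialects. jOOQ's real rendering logic for this clause covers many more dialects and cases than this sketch does.

```java
/** Hypothetical sketch: one pagination request, rendered per dialect. */
final class PaginationRenderer {

    enum Dialect { POSTGRES, MYSQL, SQLSERVER, TRINO }

    private PaginationRenderer() {}

    /** Appends the dialect-specific pagination clause to an already rendered select. */
    static String render(Dialect dialect, String select, int limit, int offset) {
        switch (dialect) {
            case POSTGRES:
            case MYSQL:
                // Both accept LIMIT followed by OFFSET.
                return select + " limit " + limit + " offset " + offset;
            case SQLSERVER:
                // SQL Server 2012+: standard OFFSET .. FETCH, which requires an ORDER BY.
                return select + " offset " + offset + " rows fetch next " + limit + " rows only";
            case TRINO:
                // Trino (and Athena, per the first question here) wants OFFSET before LIMIT.
                return select + " offset " + offset + " limit " + limit;
            default:
                throw new IllegalArgumentException("Unsupported dialect: " + dialect);
        }
    }
}
```

Even this toy version already needs three different shapes for one concept, before considering bind values, nested selects, or dialects without any OFFSET support.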
There are a ton of more advanced features derived from the above, such as:
The availability of a parser and by consequence the possibility of translating SQL.
Schema management tools, such as diffing two schema versions
Basic ActiveRecord support, including some nice things like optimistic locking.
Synthetic SQL features like type safe implicit JOIN
Query By Example.
A nice integration in Java streams or reactive streams.
Some more advanced SQL transformations (this is work in progress).
Export and import functionality
Simple JDBC mocking functionality, including a file based database mock.
Diagnostics
And, if you occasionally think something is much simpler to do with plain native SQL, then just:
Use plain native SQL, also in jOOQ
Disclaimer: As I work for the vendor, I'm obviously biased.

Performance impact in using camelCase in Cassandra columns

I know that Cassandra generally converts all column names to lowercase.
Is there a performance impact in using the camelCase in column names in Cassandra?
I used double quotes around the column names, and I am able to store them in camelCase, like below:
CREATE TABLE test (
Foo int PRIMARY KEY,
"Bar" int
);
Will there be a performance impact from storing the column names with double quotes?
Space-wise, no. Performance-wise, no. (Even assuming you have to send the double quotes over the wire, if you use prepared statements you send them only once, so the cost is negligible.)
On Cassandra 3, the names are only written once, in the header of the SSTables (Reference: http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html).
It gets pretty awkward having to always provide the double quotes (in CQLSH, for example), so I normally don't do it.
Also, older clusters that started with Thrift and migrated to CQL have a lot of those quoted names. So, to avoid confusion about where a table came from (though there are better ways of verifying this), it is good to keep the quotes away.
I don't believe there's an impact. I would say that the case-insensitive nature of CQL only serves the purpose of simplifying queries, as you can see from this answer: https://stackoverflow.com/a/28447941/824644
Also, it seems that there was a motivation for this behavior due to the preference for lower camel case in Java (the language in which Cassandra is written). See the discussion in this GitHub issue: https://github.com/reuzel/CqlSharp/issues/28
There's no impact performance wise. CQL downcases unquoted identifiers.
I understand it may cause trouble for developers. Camel or mixed case may be easy to handle, but if you're accessing the table through an API, you have probably defined a class that maps to that table anyway, so I don't see any overhead from having to change the table name etc. when accessing it through the API.
Moreover, when you have a bigger team of developers, it is seen as good data modeling practice to name columns in lowercase or to use underscores instead of camel case. That is what Cassandra does by default. If you really need the casing, then just use double quotes.
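To illustrate what "CQL downcases unquoted identifiers" means for the table in the question, here is a small hypothetical Java helper that mimics the rule (it is not part of Cassandra or any driver): Foo is stored as foo, while "Bar" keeps its case. The rule is applied when the statement text is interpreted, not per stored row, which fits the answers above saying there is no runtime cost.

```java
import java.util.Locale;

/** Hypothetical helper mimicking how CQL interprets identifiers. */
final class CqlIdentifiers {

    private CqlIdentifiers() {}

    /**
     * Unquoted identifiers are case-insensitive and end up in lowercase.
     * Double-quoted identifiers keep their case; "" inside them stands for a single ".
     */
    static String normalize(String identifier) {
        if (identifier.length() >= 2 && identifier.startsWith("\"") && identifier.endsWith("\"")) {
            // Strip the surrounding quotes and unescape doubled quotes.
            return identifier.substring(1, identifier.length() - 1).replace("\"\"", "\"");
        }
        return identifier.toLowerCase(Locale.ROOT);
    }
}
```

This is also why every later query has to write "Bar" with quotes: an unquoted Bar would be normalised to bar and no longer refer to the same column.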

Save or update in Groovy

Usually in Java I execute a SELECT statement and check the size of the ResultSet. If it is zero, I issue an INSERT, otherwise an UPDATE.
Since Groovy provides syntactic sugar on top of JDBC, I'm wondering if it provides a way to ease this process? Is there an easier way to save or update a record?
Note:
I know that Hibernate offers this, but I'd rather stick only with Groovy API.
There's a lightweight ORM called GStorm here, which I've had on my list of things to investigate. It has basically no dependencies, but doesn't handle related domain objects.
And a library to leverage Grails GORM here (which obviously pulls GORM out of Grails, so it has quite a few dependencies, including Hibernate).
Other than that (and probably some other examples I've missed), there's nothing I know of that does what you're trying to do. I guess you'd have to write your own (you could switch between INSERT and UPDATE depending on whether you pass a primary key, assuming primary keys are auto-generated by the DB).
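A minimal sketch of that "write your own" idea, in Java for illustration: the statement is chosen by whether the primary key is set, instead of by a preceding SELECT. SaveOrUpdate and buildSql are hypothetical names, the sketch assumes the key is generated by the database, and table/column names are concatenated as-is, so they must come from trusted code, never from user input.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks INSERT or UPDATE depending on whether the key is set. */
final class SaveOrUpdate {

    private SaveOrUpdate() {}

    /**
     * Builds a parameterised statement for one row. Bind order: the non-key
     * columns in map iteration order, followed by the key for an UPDATE.
     */
    static String buildSql(String table, String keyColumn, Map<String, Object> row) {
        List<String> columns = new ArrayList<>();
        for (String column : row.keySet()) {
            if (!column.equals(keyColumn)) {
                columns.add(column);
            }
        }
        if (row.get(keyColumn) == null) {
            // No key yet: insert and let the database generate it.
            String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
            return "insert into " + table + " (" + String.join(", ", columns) + ") values (" + placeholders + ")";
        }
        // Key present: update the existing row.
        List<String> assignments = new ArrayList<>();
        for (String column : columns) {
            assignments.add(column + " = ?");
        }
        return "update " + table + " set " + String.join(", ", assignments) + " where " + keyColumn + " = ?";
    }
}
```

In Groovy, the resulting string and values could then be handed to groovy.sql.Sql's executeInsert or executeUpdate; use an ordered map (e.g. LinkedHashMap) so the bind values line up with the placeholders.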

What code could be used as a string aggregator for Sybase? (Like Oracle's stragg)

In my travels in Oracle, the 'stragg' function, or 'String Aggregator' was life-saving when I had to create dynamic SQL queries on the fly.
You can read up about it here: http://www.oratechinfo.co.uk/delimited_lists_to_collections.html
The basic use of it was:
select stragg(fruit) from food;
fruit
-----------
apple,pear,banana,strawberry
1 row(s) returned
So simple to use, concatenating chr(13) turned it into a long list, and selecting information from system tables gave a 5 minute solution to dynamically generated SQL, e.g. auditing triggers.
Now I've been charged with transferring Oracle functionality related to auditing into Sybase, and a function similar to stragg would be ideal for this purpose.
E.g.
select #my_table = 'table_of_fruit'
select 'insert into '+#mytable+'_copy (' +char(10)
+ stragg(c.name) +char(10)
+ 'select '
+ stragg('inserted.'+c.name) + char(10)
+ 'from '+#mytable
from syscolumns c
where objectid(#mytable) = c.id
------------------------------------------
insert into table_of_fruit_copy
(fruit, sweetness, price)
select fruit, sweetness,price
from inserted
Done. Simple.
Except I don't know how to get a string-aggregation function working in Sybase.
Does anyone know of an attempt to do this kind of thing, or code that could work the same as stragg that could be used in this way?
The alternative at the moment is printing code based on complex cursors and such (sample LOC: 500), or select statements combining static strings and columns from user tables (sample LOC: 200). Stragg would severely reduce the complexity of this code, and would be a great deal of help in the future (sample LOC: who knows, maybe 50?)
p.s. I'm calling these selects through a shell script then piping them to file, then running the file through iSQL. Not the nicest solution, but it's better than the alternatives.
There are three separate answers
Question
You have made comments about simplicity, which need to be addressed before we get to the solution.
It is a common requirement to be able to take a delimited list of values, say A,B,C,D, and treat this data like it was a set of rows in a table, or vice versa
This is one of the Top Ten Worst Programming Practices I read about recently.
In general, Sybase types tend to be somewhat more academically and Relationally qualified than Oracle types, so we simply do not do that sort of thing in SybaseLand or DB2Land.
In 20 years of working with Sybase, I have had to code that as part of my project just once, and that was for a non-technical auditor who loaded the result set into MS Access.
On the other hand, I have had to code that at least 12 times, when producing text files for importation into Oracle databases (fulfilling external requirements is outside my project, but I satisfy any such requirement free). Obviously the target databases were sub-standard and non-relational (loading a column with more than one datum breaks 1NF, and creates Update Anomalies), which is typical of what Oracle types have to do to get some speed.
Therefore, no, it is not simplicity, at least in the sense of that principle. It is by definition, complexity.
Your reference to "arrays" is incorrect. All commercial DBMSs handle arrays, according to ISO/IEC/ANSI SQL (the STRAGG and LIST operators are non-standard SQL, therefore not SQL). Sybase is very strong in processing arrays. If it was an array, you would not need special hand coding to handle it (and you do, as per your question). This is not an array; there is no definition to the cells. This is a single concatenated scalar string.
Pivoting is an entirely different process, which uses set-processing; it does not require row-processing. (I understand on good authority, that Oracle is hopeless at scalar subqueries, and thus Oracle people are used to writing them as [very inefficient] joins or inline views, and then filtering: all that can be elevated to set-processing via scalar subqueries, and it will perform much faster. Particularly your Pivots.)
Even the author in your link posts as follows. Please familiarise yourself with the caveats:
It's as simple as this: If you want to have a system with no logical limitation in the number of data elements passed to a given process, then forget the following mechanisms! They are simply the wrong way to approach the problem.
Therefore, know whatever you are doing is sub-standard, non-relational, and limited; and go ahead with your eyes open. No use pretending that: it will not break; it is not limited; it is an "array"; or that Sybase doesn't have a neat little function that Oracle has. Any professional will see through all that. And if the string length is exceeded, for God's sake send some indicator back to the caller ("!Exceeded" in the string) identifying that condition.
Essentially you are turning the set-processing engine on its head, and forcing it into row-processing mode, so it will be very slow. A WHILE loop is distinctly faster than a cursor, but both are in the same class, row-processors.
The alternative at the moment is printing code based on complex cursors and such
What 200 or 500 LoC? It is possible I am missing something, but my code is the same few lines of code identified under "Using a Table Function" in your link. Maximum 20, if you count nice formatting, the loop, initialisation and error handling. There is nothing "complex" about it. Do the exact reverse to concatenate a single string from multiple rows. We use stored procedures for this (which Oracle does not really have; PL/SQL is a different animal). If you have ASE 15.0.2 or greater, you can use a User Defined Function, which you can then use in place of a column. Stored procs are better for true arrays.
The concatenation operator in Sybase is the plus sign. For reversal (decomposing the CSV string), you need the CHARINDEX and SUBSTRING functions.
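For readers more at home in a general-purpose language, the row-by-row loop logic described above (concatenate one value per iteration, flag an overflow with "!Exceeded" as recommended earlier, and decompose the string again) can be sketched in Java, where indexOf and substring play the roles of CHARINDEX and SUBSTRING. This only illustrates the algorithm; it is not Sybase code, and the class, method names and length limit are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the aggregation loop and its reverse. */
final class DelimitedList {

    static final String EXCEEDED = "!Exceeded";

    private DelimitedList() {}

    /** Concatenates values with ","; returns EXCEEDED instead of a silently truncated string. */
    static String aggregate(List<String> values, int maxLength) {
        StringBuilder result = new StringBuilder();
        for (String value : values) {          // one iteration per row, like a WHILE loop
            if (result.length() > 0) {
                result.append(',');
            }
            result.append(value);
            if (result.length() > maxLength) {
                return EXCEEDED;               // tell the caller the limit was hit
            }
        }
        return result.toString();
    }

    /** Splits a comma-separated string back into values, CHARINDEX/SUBSTRING style. */
    static List<String> decompose(String csv) {
        List<String> values = new ArrayList<>();
        int start = 0;
        int comma = csv.indexOf(',', start);   // like CHARINDEX
        while (comma >= 0) {
            values.add(csv.substring(start, comma));  // like SUBSTRING
            start = comma + 1;
            comma = csv.indexOf(',', start);
        }
        values.add(csv.substring(start));      // last element after the final comma
        return values;
    }
}
```

Note that both methods are inherently row-by-row, which is the performance concern raised above; the sketch makes the loop visible rather than hiding it.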
You may need the Function Reference Manual, if for nothing else, to avoid writing code where we have functions.
Likewise, we do not have a RANK() function. We are quite happy with the 4 lines of code required for the subquery. It is only required in Oracle because its subqueries are crippled.
OK, I have answered your question. Now to address the approaches.
You will be aware that code using Oracle Extensions to the SQL standard will need to be changed.
Sybase is way more automated than Oracle; if you familiarise yourself with its feature set, in many instances, you can get the same result (as you did in Oracle) without writing any code. Writing code-for-code blocks is the chain gang, rock-breaking method of building roads, in the context of bulldozers. Even if your company had good reason to use that method, you need to be aware that features work quite differently, e.g. triggers, which is why I am posting so much detail.
Another issue that will annoy you is that Oracle isn't really ANSI SQL compliant (stretches the definitions in many places, in order to appear to be compliant), and Sybase, given its customer base, is rigidly SQL compliant. So in addition to the same function working differently, or in a different deployment, you need to be aware that code changes may be required to elevate Oracle code to ANSI compliance levels, just to execute on an ANSI SQL compliant platform.
I am not sure if you are trying to write code for the content of a trigger, or if you are trying to capture the changes to a database. I will provide both answers.
Auditing
Capture Changes to Database
We have a very robust, fast and configurable Auditing subsystem, fit for high volumes and banking-level auditing requirements. Get your DBA to set up the (separate) sybaudit database, and to configure exactly which changes need to be captured. This facility will perform much faster than any code you or I can write in a trigger (as much as 100 times faster than the row-by-row processing required for the above, as it is executed within the engine, within your executing thread). And of course the setup time is a fraction of your coding time.
Triggers
Again, I am not sure exactly what you are trying to achieve, but assuming you want to copy every insert to some table to a COPY of that table (inside the Trigger), that example code you have provided will not work (and I am not counting syntax issues).
Speaking to your example, you need to do way more work, to deal with the different datatypes; column sizes; precisions; scale; etc. And perhaps the UPDATE() function to identify which columns have changed (for an UPDATE trigger of course). If all you are trying to do is convert the various datatypes to strings, check the CONVERT() function.
Triggers are transactional.
Never place row-processing code in a Trigger (it will strangle the table)
You can't place Dynamic SQL in a Trigger.
But in Sybase even that is not necessary. Refer to the User Guide; chapter 19 is devoted to Triggers, with several variations and examples. Inside the trigger, you should be able to simply:
INSERT table_copy
SELECT column_list -- never use * unless you want the db fixed in cement
FROM inserted
If you are trying to copy the inserts to all tables into one Audit table, then I understand your example a little better, but beware: you will be forcing a highly Symmetric Multi-Threading server (Oracle is not a server in the architecture sense) into single-threading through your table. Auditing is multi-threaded.
Last, the use of manual methods of any kind is not required, so if you could expand a bit more on your P.S. and on the requirement you are trying to fulfil, I can identify the programmatic method for you. It appears you are trying to use the PL/SQL approach (which is very limited).
Just use the LIST() function. It's a direct replacement for the stragg() function. Example:
SELECT LIST(state, ', ') FROM cities
Result:
name
CA, CA, MA, NY
