Average date difference using querydsl-jpa / querydsl-sql - jpql

I'm trying to compute an average date difference using QueryDSL.
I created a small project to demonstrate what I'm trying to accomplish, in a simplified manner (the real query is a lot more complex, with tons of joins / where / sort clauses). We have a Customer class with a birthDate field, and we are trying to get the average age, in seconds, of our customers. We also want the maximum age, but let's focus on the average for this post.
I tried writing this query using querydsl-jpa, but it fails with an obscure error:
java.lang.NullPointerException
at org.hibernate.dialect.function.StandardAnsiSqlAggregationFunctions$AvgFunction.determineJdbcTypeCode(StandardAnsiSqlAggregationFunctions.java:106)
at org.hibernate.dialect.function.StandardAnsiSqlAggregationFunctions$AvgFunction.render(StandardAnsiSqlAggregationFunctions.java:100)
at org.hibernate.hql.internal.ast.SqlGenerator.endFunctionTemplate(SqlGenerator.java:233)
[...]
I also tried other approaches, like using NumberTemplate.create(Double.class, "{0} - {1}", DateExpression.currentDate(), customer.birthDate).avg(), but it doesn't return the correct value. If we want to get a date difference in seconds, it seems we need to find some way of calling the database-specific date/time difference functions, not just use the minus sign.
Sadly, computing a date difference doesn't seem to be possible in JPQL, so I guess querydsl-jpa has limitations there too. So we would have to write a native SQL query, or find some hack to have the QueryDsl-generated JPQL call a native database function.
JPA 2.1 added support for invoking database functions, but there is a problem: the MySQL function takes the form TIMESTAMPDIFF(SECOND, '2012-06-06 13:13:55', '2012-06-06 15:20:18'). It would probably be possible if the first parameter (SECOND) was a String, but it seems to be a reference to some kind of constant, and it seems complicated to generate JPQL with the first parameter unquoted.
QueryDSL added support for date differences, but it seems most of the code resides in the querydsl-sql project, so I'm wondering if I can benefit from this with querydsl-jpa.
Here are my questions:
Is it possible to compute the average date difference using querydsl-jpa, having it maybe call the native database functions using JPA 2.1 support (maybe using Expressions.numberTemplate())? Or are we forced to use querydsl-sql?
If we have to use querydsl-sql, how do we generate both QCustomer and SCustomer? QCustomer is currently generated from the Customer entity using the plugin "com.mysema.maven:apt-maven-plugin". If I understood correctly, I have to use a different plugin (com.querydsl:querydsl-maven-plugin) to generate the SCustomer query type?
When looking at querydsl-sql-example, I don't see any entity class, so I guess the query types are generated by QueryDSL from the database schema? Is there a way to generate the SCustomer query type from the entity instead, like we do with querydsl-jpa?
If we use querydsl-sql, is there a way to "re-use" our querydsl-jpa predicates / sorts / joins clauses in the querydsl-sql query? Or do we have to duplicate that code using querydsl-sql-specific classes?
I'm also considering creating a database function that delegates to TIMESTAMPDIFF(SECOND, x, y), but it's not very portable...
Am I missing something? Is there a simpler way of doing what I'm trying to do?

Using template expressions you should be able to inject any custom JPQL snippets into the Querydsl query. That should answer your first question.
Using both querydsl-jpa and querydsl-sql in the same project is possible, but adds some complexity.

Related

How to update rows in Jooq without Codegen using JSON

I am using Jooq version 3.17.0 and attempting to insert data into a table without codegen.
At the minute, I am designing a system that allows data to be imported into multiple tables (one at a time, and starting with just one), yet I do not want to write specific code for each table and as of now, I haven't had a need for codegen.
The code currently works for importing data via JSON, with json being a String formatted in the 'Jooq' format. This imports data correctly into the database. This also allows us to send json data of table updates from one system to our main system that uses Jooq. Yet it gives me an error when I try to update.
I am using MYSQL as my database.
The original code for insertion is :
Result<Record> convertedJson = dslContext.fetchFromJSON(json);
Loader<Record> res1 = dslContext.loadInto(table(tableName)).loadJSON(json).fields(convertedJson.fields()).execute();
However, if we try to update data by sending in the same json, but with one field changed, jooq gives an error org.jooq.exception.DataAccessException stating that there is a duplicate entry for key.
I tried to use :
Loader<Record> res2 = dslContext.loadInto(table(tableName)).onDuplicateKeyUpdate().loadJSON(json).fields(convertedJson.fields()).execute();
But then this throws an error ON DUPLICATE KEY UPDATE only works on tables with explicit primary keys. Table is not updatable : <tableName> since in LoaderImpl.onDuplicateKeyUpdate():220 since table.getPrimaryKey() is null which technically makes sense since table(tableName) returns a Table that does not know it's fields.
My question is probably two-fold.
Is there a way to have a table that is aware of it's fields without codegen?
Is there a way for me to allow jooq to update rows this way.
My preferences is to steer clear of codegen, unless it's really needed. I probably could switch to codegen if needed, but again I would still need to be able to execute SQL without writing specific code for each table. Using JSON is still very much desired, as that allows me to send data from one application to another for import.
Using code generation
You've run into one of those many reasons why code generation is very helpful with jOOQ. If your various tables are known at compile time, and all you're doing is switch table names, then I would go with generated code, making the lookup of the table dynamic. That would solve the problem easily.
From experience with various similar support cases, I've always recommended this first, because as soon as these kinds of troubles start, it's a good idea to re-think the code generation strategy as you will run into other, similar problems, having to work around the lack of ubiquitously available meta data all the time. There are many other benefits to using the code generator.
Emulating code generation
If for some reason you cannot (e.g. the tables aren't known at compile time) or do not want to use the code generator, then you can do the code generator's work yourself at runtime, by building CustomTable types as documented here.
Using other means of providing meta information
Another way to provide jOOQ with meta data is to use one of various forms of implementing org.jooq.Meta, which include:
Looking up meta data from the JDBC driver's DatabaseMetaData (this can be slow, depending on your schema)
Letting jOOQ interpret some DDL scripts
Using jOOQ's XML representation of the standard SQL INFORMATION_SCHEMA
Using generated code

CloudSearch, How to find documents where a key is not defined?

Rather simple question here. Using CloudSearch, how do I find an object that does NOT have a certain key/property defined.
eg. I have been storing Car objects all along without indexing their price. Now I have began indexing Car objects with their msrp... how do I find the Car objects stored without any indexed price?
(and price:null)
(and price:undefined)
and other similar 'falsy' statements and their stringified permutations all do not work.
I am using AWS sdk in Node.js.
TIA!
Niko
The option that will work without any reindexing is a range search like
(NOT (range field=price [0,}))
which matches cars with a price that is not between 0 and infinity (eg ones with no price). See this answer for a discussion of other options.
Side note: I get the impression that you may be using CloudSearch to store your data. If so, I would consider using a datastore (which are designed to store data) rather than CloudSearch (which is a search engine). For one, it'll make this sort of query much easier.

Date function and Selecting top N queries in DocumentDB

I have following questions regarding Azure DocumentDB
According to this article, multiple functions have been added to
DocumentDB. Is there any way to get Date functions working? How can i
get the queries of type greater than some date working?
Is there any way to select top N results like 'Select top 10 * from users'?
According to Document playground , Order By will be supported in future. Is ther any other way around for now?
The application that I am developing requires certain number of results to be displayed that have been inserted recently. I need these functionalities within a stored procedure. The documents that I am storing in DocumentDB have a DateTime property. I require the above mentioned functionalities for my application to work. I have searched at documentation and samples. Please help if you know of any workaround.
Some thoughts/suggestions below:
Please take a look at this idea on how to store and query dates in DocumentDB (as epoch timestamps). http://azure.microsoft.com/blog/2014/11/19/working-with-dates-in-azure-documentdb-4/
To get top N results, set FeedOptions.MaxItemCount and read only one page, i.e., call ExecuteNextAsync() once. See https://msdn.microsoft.com/en-US/library/microsoft.azure.documents.linq.documentqueryable.asdocumentquery.aspx for an example. We're planning to add TOP to the grammar to make this easier in the future.
You can email me at arramac at microsoft dot com to get early access to Order By right away. This is planned for broad release shortly.
Please note that stored procedures are best used when you have a write operation(s). You'll be able to better throughput on reads when you query directly.

JPQL VS Criteria Api

Well, first: I found this question:
What to use: JPQL or Criteria API?
and from other searches I suppose that the only benefit of criteria api is that can check the query and, if it isn't correct, returns a compiler error.
Is it right?
If no, which are the advantage and disavantage from using JPQL or Criteria Api?
P.S.: The question is born after this other question:
https://stackoverflow.com/questions/8342955/error-in-criteriaquery
and the difficult that I'm finding to resolve that problem when I do the correct method using JPQL in 20 minutes...
I think what you are asking is when would you consider a Criteria Query.
I will further assume you are using an Application Server.
The Criteria API exists to allow for the construction of dynamic SQL queries in a type-safe manner that prevents SQL injection. Otherwise you would be concatenating SQL strings together which is both error prone and a security risk: i.e. SQL Injection. That would be the only time you would want to use the Criteria API.
If the query remains basically the same but need only accept different parameters you should use annotated #NamedQueries which are simpler, precompiled, can be cached within the secondary cache and validated during server startup.
That's basically the the rule of thumb concerning Criteria Queries versus #NamedQueries. In my experience rarely do you require the Criteria API but it is good that it exists for those rare times it is required.
Hope this helps.

What code could be used as a string aggregator for Sybase? (Like Oracle's stragg)

In my travels in Oracle, the 'stragg' function, or 'String Aggregator' was life-saving when I had to create dynamic SQL queries on the fly.
You can read up about it here: http://www.oratechinfo.co.uk/delimited_lists_to_collections.html
The basic use of it was:
select stragg(fruit) from food;
fruit
-----------
apple,pear,banana,strawberry
1 row(s) returned
So simple to use, concatenating chr(13) turned it into a long list, and selecting information from system tables gave a 5 minute solution to dynamically generated SQL, e.g. auditing triggers.
Now I've been charged with transferring oracle functionality related to auditing into Sybase, and a function similar to Stragg would be ideal for this purpose.
E.g.
select #my_table = 'table_of_fruit'
select 'insert into '+#mytable+'_copy (' +char(10)
+ stragg(c.name) +char(10)
+ 'select '
+ stragg('inserted.'+c.name) + char(10)
+ 'from '+#mytable
from syscolumns c
where objectid(#mytable) = c.id
------------------------------------------
insert into table_of_fruit_copy
(fruit, sweetness, price)
select fruit, sweetness,price
from inserted
Done. Simple.
Except I don't know how to get a string-aggregation function working in Sybase.
Does anyone know of an attempt to do this kind of thing, or code that could work the same as stragg that could be used in this way?
The alternative at the moment is printing code based on complex cursors and such (sample LOC: 500), or select statements combining static strings and columns from user tables (sample LOC: 200). Stragg would severely reduce the complexity of this code, and would be a great deal of help in the future (sample LOC: who knows, maybe 50?)
p.s. I'm calling these selects through a shell script then piping them to file, then running the file through iSQL. Not the nicest solution, but it's better than the alternatives.
There are three separate answers
Question
You have made comments about simplicity, which need to be addressed before we get to the solution.
It is a common requirement to be able to take a delimited list of values, say A,B,C,D, and treat this data like it was a set of rows in a table, or vice versa
This one of the Top Ten Worst Programming Practices I read about recently.
In general, Sybase types tend to be somewhat more academically and Relationally qualified than Oracle types, so we simply do not do that sort of thing in SybaseLand or DB2Land.
In 20 years of working with Sybase, I have had to code that as part of my project just once, and that was for non-technical Auditor who loaded the result set into MS Access.
On the other hand, I have had to code that at least 12 times, when producing text files for importation into Oracle databases (fulfilling external requirements is outside my project, but I satisfy any such requirement free). Obviously the target databases were sub-standard and non-relational (loading a column with more than one datum breaks 1NF, and creates Update Anomalies), which is typical of what Oracle types have to do to get some speed.
Therefore, no, it is not simplicity, at least in the sense of that principle. It is by definition, complexity.
Your reference to "arrays" is incorrect. All commercial dbms handle arrays, according to the ISO/IEC/ANSI SQL (STRAGGR and LIST operators are non-standard SQL, therefore not SQL). Sybase is very strong in processing arrays. If it was an array, you would not need special hand coding to handle it (and you do, as per your question). This is not an array, there is no definition to the cells. This is a single concatenated scalar string.
Pivoting is an entirely different process, which uses set-processing; it does not require row-processing. (I understand on good authority, that Oracle is hopeless at scalar subqueries, and thus Oracle people are used to writing them as [very inefficient] joins or inline views, and then filtering: all that can be elevated to set-processing via scalar subqueries, and it will perform much faster. Particularly your Pivots.)
Even the author in your link posts as follows. Please familiarise yourself with the caveats:
It's as simple as this: If you want to have a system with no logical limitation in the number of data elements passed to a given process, then forget the following mechanisms! They are simply the wrong way to approach the problem.
Therefore, know whatever you are doing is sub-standard, non-relational, and limited; and go ahead with your eyes open. No use pretending that: it will not break; it is not limited; it is an "array"; or that Sybase doesn't have a neat little function that Oracle has. Any professional will see through all that. And if the string length is exceeded, for God's sake send some indicator back to the caller ("!Exceeded" in the string) identifying that condition.
Essentially you are turning the set-processing engine on its head, and forcing it into row-processing mode, so it will be very slow. A WHILE loop is distinctly faster than a cursor, but both are in the same class, row-processors.
The alternative at the moment is printing code based on complex cursors and such
What 200 or 500 LoC ? It is possible I am missing something, but my code is the same few lines of code identified under "Using a Table Function" in your link. Maximum 20, if you count nice formatting; the loop; initialisation; error handling. There is nothing "complex" about it. Do the exact reverse to cancatenate a single string from multiple rows. We use stored procedures for this (which oracle does not have, really, PL/SQL is a different animal). If you have ASE 15.0.2 or greater, you can use a User Defined Function, which you can then use in place of a column. Stored procs are better for true arrays.
the concatenation operator in Sybase is the plus sign. For reversal (decomposing the CSV string) you need CHARINDEX and SUBSTRING functions
You may need the Function Reference Manual, if for nothing else, to avoid writing code where we have functions.
Likewise, we do not have a RANK() function. We are quite happy with the 4 lines of code requires for the subquery. It is only required for Oracle because subqueries are crippled.
Ok, I have answered your question, Now to address the approaches.
You will be aware that code using Oracle Extensions to the SQL standard will need to be changed.
Sybase is way more automated than Oracle; if you familiarise yourself with its feature set, in many instances, you can get the same result (as you did in Oracle) without writing any code. Writing code-for-code blocks is the chain gang, rock-breaking method of building roads, in the context of bulldozers. Even if your company had good reason to use that method, you need to the aware that features work quite differently, eg. triggers, which is why I am posting so much detail.
Another issue that will annoy you is that Oracle isn't really ANSI SQL compliant (stretches the definitions in many places, in order to appear to be compliant), and Sybase, given its customer base, is rigidly SQL compliant. So in addition to the same function working differently, or in a different deployment, you need to be aware that code changes may be required to elevate Oracle code to ANSI compliance levels, just to execute on an ANSI SQL compliant platform.
I am not sure if you are trying to write code for the content of a trigger, or if you are trying to capture the changes to a database. I will provide both answers.
Auditing
Capture Changes to Database
We have an very robust, fast and configurable Auditing subsystem, fit for high volumes and banking level auditing requirements. Get your DBA to setup the sybaudit (separate) database, and to configure exactly what changes need to be captured. This facility will perform much faster than any code you or I can write in a trigger (as much as 100 times faster than your row-by-processing required for the above, as it is executed within the engine, within your executing thread). And of course the setup time is a fraction of your coding time.
Triggers
Again, I am not sure exactly what you are trying to achieve, but assuming you want to copy every insert to some table to a COPY of that table (inside the Trigger), that example code you have provided will not work (and I am not counting syntax issues).
Speaking to your example, you need to do way more work, to deal with the different datatypes; column sizes; precisions; scale; etc. And perhaps the UPDATE() function to identify which columns have changed (for an UPDATE trigger of course). If all you are trying to do is convert the various datatypes to strings, check the CONVERT() function.
Triggers are transactional.
Never place row-processing code in a Trigger (it will strangle the table)
You can't place Dynamic SQL in a Trigger.
But in Sybase even that is not necessary. Refer to the User Guide, chapter 19 is devoted to Triggers, with several variations, and examples. Inside the trigger, you should be able to simply:
INSERT table_copy
SELECT column_list -- never use * unless you want the db fixed in cement
FROM inserted
If you are trying to copy the inserts to all tables into one Audit table, then beware. Then I understand your example a little bit more. You will be forcing a highly Symmetric Muli-Threading server (oracle is not a server in the architecture sense) into single-threading through your table. Auditing is multi-threaded.
Last, the use of manual methods of any kind is not required, so if you could expand a bit more on your PS, what the requirement you are trying to fulfil is, I can identify the programmatic method for you. It appears you are trying to use the PL/SQL approach (which is very limited).
Just use the LIST() function. It's a direct replacement for stragg() function. Example:
SELECT LIST(state, ', ') FROM cities
Result:
name
CA, CA, MA, NY

Resources