jOOQ: Can you force LIMIT to be rendered as ROW_NUMBER() window function? - jooq

I see this bit in the jOOQ docs:
https://github.com/jOOQ/jOOQ/blob/d727e6c476e8b1cbed1c91fd3724936c73cd9126/jOOQ/src/main/java/org/jooq/SelectLimitStep.java#L135-L147
/**
 * Add a <code>LIMIT</code> clause to the query
 * <p>
 * If there is no <code>LIMIT</code> or <code>TOP</code> clause in your
 * RDBMS, this may be emulated with a <code>ROW_NUMBER()</code> window
 * function and nested <code>SELECT</code> statements.
 * <p>
 * This is the same as calling {@link #limit(Number, Number)} with offset = 0, or
 * calling <code>.limit(numberOfRows).offset(0)</code>
 */
I'm wondering if there is a setting to force-enable this option?
There seems to be a setting for the opposite, to convert ROW_NUMBER to LIMIT, but not LIMIT to ROW_NUMBER.
To get around this, I've written the function below, but if the ability already exists in the codebase (and is probably implemented better there), I'd like to take advantage of it:
fun wrapQueryInRowNumberSubquery(
    stmt: SelectFinalStep<Record>,
    limit: Int = 0,
    offset: Int = 0
): SelectConditionStep<Record> {
    // Add a ROW_NUMBER() column to the original query
    stmt.query.addSelect(
        DSL.rowNumber().over()
            .partitionBy(DSL.field("*")) // custom logic here
            .orderBy(DSL.field("*")) // custom logic here
            .`as`("row_num")
    )
    // Wrap the original query and filter on the row number:
    // row_num > offset AND row_num <= offset + limit yields exactly `limit` rows
    return DSL.select(DSL.asterisk()).from(stmt)
        .where(
            DSL.field("row_num", Int::class.java).greaterThan(
                DSL.inline(offset)
            )
        )
        .and(
            DSL.field("row_num", Int::class.java).lessOrEqual(
                DSL.inline(offset + limit)
            )
        )
}

There are currently (jOOQ 3.17) no flags to enable / disable individual emulations independently of the SQL dialect.
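In the meantime, wrapping the query yourself, as in the function above, is the way to go. For reference, here is a minimal Java sketch of the same ROW_NUMBER() emulation; the ordering field and the aliases "t" and "row_num" are arbitrary choices made for illustration, not jOOQ conventions:

// Hedged sketch: emulate LIMIT .. OFFSET by putting the original query in a derived
// table with a ROW_NUMBER() column and filtering on that column in an outer select.
// The DSLContext, ordering field, and alias names are assumptions for illustration.
import org.jooq.DSLContext;
import org.jooq.Field;
import org.jooq.Record;
import org.jooq.Select;
import org.jooq.Table;
import org.jooq.impl.DSL;

public class RowNumberLimit {

    public static Select<Record> wrap(DSLContext ctx, Select<? extends Record> inner,
                                      Field<?> orderField, int limit, int offset) {
        Field<Integer> rowNum = DSL.rowNumber().over().orderBy(orderField).as("row_num");
        Table<?> derived = DSL.select(DSL.asterisk(), rowNum).from(inner).asTable("t");
        Field<Integer> rn = derived.field("row_num", Integer.class);

        // row_num > offset AND row_num <= offset + limit returns exactly `limit` rows
        return ctx.select(DSL.asterisk())
                .from(derived)
                .where(rn.gt(offset).and(rn.le(offset + limit)));
    }
}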

Related

Spring Integration Jdbc OutboundGateway returning 1 record ONLY even with MaxRows(0)

We have an application with some 30,000+ records in a table, and for an analytics use case we need to fetch all of them and keep iterating over the returned result for some computation. However, the Jdbc OutboundGateway is returning ONLY 1 record even with MaxRows(0), though there are 30,000+ records in the DB. The same gateway returns n records as a List when we explicitly set MaxRows() to a non-zero value.
Please share how this can be made to return all rows with MaxRows(0).
That's probably how your JDBC driver works or how your RDBMS is configured for maxRows.
The logic there in JdbcOutboundGateway is like this:
if (this.maxRows != null) {
    Assert.notNull(this.poller, "If you want to set 'maxRows', then you must provide a 'selectQuery'.");
    this.poller.setMaxRows(this.maxRows);
}
where that JdbcPollingChannelAdapter has this logic:
By default it is private int maxRows = 0;
return new PreparedStatementCreatorWithMaxRows(preparedStatementCreator,
        JdbcPollingChannelAdapter.this.maxRows);
And that one:
public PreparedStatement createPreparedStatement(Connection con) throws SQLException {
    PreparedStatement preparedStatement = this.delegate.createPreparedStatement(con);
    preparedStatement.setMaxRows(this.maxRows); // We can't mutate provided JdbcOperations for this option
    return preparedStatement;
}
Then PreparedStatement:
/**
 * Sets the limit for the maximum number of rows that any
 * {@code ResultSet} object generated by this {@code Statement}
 * object can contain to the given number.
 * If the limit is exceeded, the excess
 * rows are silently dropped.
 *
 * @param max the new max rows limit; zero means there is no limit
 * @throws SQLException if a database access error occurs,
 *         this method is called on a closed {@code Statement}
 *         or the condition {@code max >= 0} is not satisfied
 * @see #getMaxRows
 */
void setMaxRows(int max) throws SQLException;
zero means there is no limit
The logic in the JdbcOutboundGateway in the end is like this:
if (list.size() == 1 && (this.maxRows == null || this.maxRows == 1)) {
    payload = list.get(0);
}
So, we return one record only if ResultSet has only one element.
I doubt we can do anything from the Spring Integration perspective, unless you want to try Integer.MAX_VALUE for this property, since your JDBC driver does not honor the PreparedStatement.setMaxRows() contract.
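For reference, a minimal sketch of that workaround, assuming the gateway exposes the maxRows property via a setter as shown in the source above; the query, channel name, and bean wiring are illustrative only:

// Hedged sketch: work around the driver's handling of setMaxRows(0) by setting an
// explicit upper bound. Query, channel name, and bean wiring are assumptions.
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.ServiceActivator;
import org.springframework.integration.jdbc.JdbcOutboundGateway;

@Configuration
public class AnalyticsGatewayConfig {

    @Bean
    @ServiceActivator(inputChannel = "analyticsRequestChannel")
    public JdbcOutboundGateway analyticsGateway(DataSource dataSource) {
        // select-only gateway: no update query, just the select
        JdbcOutboundGateway gateway =
                new JdbcOutboundGateway(dataSource, null, "SELECT * FROM analytics_table");
        gateway.setMaxRows(Integer.MAX_VALUE); // instead of 0, which this driver does not treat as "no limit"
        return gateway;
    }
}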

Node.js, Sequelize findAndCountAll with offset and limit doesn't work when it contains "include" and "where: array[]" options

I'm trying to fetch paginated messages from a database given the ids of different chats. It works if I do not provide limit and offset, but when I provide the limit and offset parameters, it stops working. I am using MariaDB.
Message.findAndCountAll({
    where: { chat_id: ids }, // ids => array of ints
    offset: limit * page,
    limit: limit,
    include: {
        model: UnreadMessage, as: 'unreadMessages',
        where: { participant_id: userId }
    }
})
The error I see is this:
"(conn=12896, no: 1064, SQLState: 42000) You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ''20') AS `messages` INNER JOIN `unread_message` AS `unreadMessages` ON `messa...' at line 1\nsql: SELECT `messages`.*, `unreadMessages`.`message_id` AS `unreadMessages.message_id`, `unreadMessages`.`participant_id` AS `unreadMessages.participant_id` FROM (SELECT `messages`.`id`, `messages`.`chat_id`, `messages`.`content`, `messages`.`sender_id`, `messages`.`created_at` FROM `messages` AS `messages` WHERE `messages`.`chat_id` IN (3, 5) AND ( SELECT `message_id` FROM `unread_message` AS `unreadMessages` WHERE (`unreadMessages`.`participant_id` = 10 AND `unreadMessages`.`message_id` = `messages`.`id`) LIMIT 1 ) IS NOT NULL LIMIT 0, '20') AS `messages` INNER JOIN `unread_message` AS `unreadMessages` ON `messages`.`id` = `unreadMessages`.`message_id` AND `unreadMessages`.`participant_id` = 10; - parameters:[]"
My speculation when I first saw it was right all along. The error says it all:
...right syntax to use near ''20') AS `mess....
limit is a string. Cast it to a number, e.g. with +limit.
If I'm right, you're passing it straight from the request without casting it to an integer.

How to add my own custom information when calling createOrReplaceTempView in Spark SQL

I use createOrReplaceTempView to register a temp view in the Spark SQL catalog. This method takes just one parameter (the view name as a string), but I need to attach my own custom information, like a hashMap storing some data I need. Is there a good way to do this?
I know I can keep a hashMap in my own project.
You can see in the Spark source code that there is a method:
CreateViewCommand(
    name = tableIdentifier,
    userSpecifiedColumns = Nil,
    comment = None,
    properties = Map.empty,
    originalText = None,
    child = logicalPlan,
    allowExisting = false,
    replace = replace,
    viewType = viewType)
but we cannot pass properties; we cannot even pass table description information.
I'm not sure about the use case, but you can do that using Spark SQL (there is no direct API to add table properties, so consider this an indirect way):
/**
* Create or replace a view. This creates a [[CreateViewStatement]]
*
* For example:
* {{{
* CREATE [OR REPLACE] [[GLOBAL] TEMPORARY] VIEW [IF NOT EXISTS] multi_part_name
* [(column_name [COMMENT column_comment], ...) ]
* create_view_clauses
*
* AS SELECT ...;
*
* create_view_clauses (order insensitive):
* [COMMENT view_comment]
* [TBLPROPERTIES (property_name = property_value, ...)]
* }}}
*/
Visit here for more information:
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala#L3493
Sample query:
val viewQuery =
s"""
| CREATE OR REPLACE TEMPORARY VIEW new_test
| COMMENT 'this is view comment'
| TBLPROPERTIES ('creator'='som', 'createdTime'=${System.currentTimeMillis()})
| AS select * from test
""".stripMargin
sqlContext.sql(viewQuery)

Alternative to count in Spark SQL to check if a query returns an empty result

I know the count action can be expensive in Spark, so to improve performance I'd like a different way to check whether a query returns any results.
Here is what I did
var df = spark.sql("select * from table_name where condition = 'blah' limit 1");
var dfEmpty = df.head(1).isEmpty;
Is this a valid solution, or is there any potential uncaught error if I use the above to check the query result? It is a lot faster, though.
isEmpty takes the head of the data. This is quite reasonable for checking whether the result is empty; it is provided by the Spark API and is optimized, hence I'd prefer this.
Also, I think the limit 1 in the query is not required:
/**
 * Returns true if the `Dataset` is empty.
 *
 * @group basic
 * @since 2.4.0
 */
def isEmpty: Boolean = withAction("isEmpty", limit(1).groupBy().count().queryExecution) { plan =>
  plan.executeCollect().head.getLong(0) == 0
}
I think this is OK. You could also omit the limit(1), because a limit(1) is already part of the implementation of df.isEmpty. See also How to check if spark dataframe is empty?.
Note that the solution with df.isEmpty may not evaluate all columns. E.g. if you have a UDF for one column, it will probably not execute and could throw exceptions only on the real query. df.head(1).isEmpty, on the other hand, will evaluate all columns for one row.
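A minimal sketch of the two checks side by side (Java API; the table name and predicate are the placeholders from the question):

// Hedged sketch: two ways to test for an empty result without a full count().
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmptyCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("empty-check").getOrCreate();

        Dataset<Row> df = spark.sql("select * from table_name where condition = 'blah'");

        // Spark 2.4+: isEmpty internally runs limit(1).groupBy().count(),
        // so no explicit LIMIT is needed in the SQL
        boolean empty = df.isEmpty();

        // Manual alternative: materializes at most one full row, so all columns
        // (including UDF-backed ones) are actually evaluated for that row
        boolean emptyAlt = df.takeAsList(1).isEmpty();

        System.out.println(empty + " / " + emptyAlt);
        spark.stop();
    }
}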

Spark: Create dataframe with default values

Can we put a default value in a field of a dataframe while creating the dataframe? I am creating a Spark dataframe from List<Object[]> rows as:
List<org.apache.spark.sql.Row> sparkRows = rows.stream().map(RowFactory::create).collect(Collectors.toList());
Dataset<org.apache.spark.sql.Row> dataset = session.createDataFrame(sparkRows, schema);
While looking for a way, I found that org.apache.spark.sql.types.DataTypes contains an object of the org.apache.spark.sql.types.Metadata class. The documentation does not specify the exact purpose of this class:
/**
 * Metadata is a wrapper over Map[String, Any] that limits the value type to simple ones: Boolean,
 * Long, Double, String, Metadata, Array[Boolean], Array[Long], Array[Double], Array[String], and
 * Array[Metadata]. JSON is used for serialization.
 *
 * The default constructor is private. User should use either [[MetadataBuilder]] or
 * `Metadata.fromJson()` to create Metadata instances.
 *
 * @param map an immutable map that stores the data
 *
 * @since 1.3.0
 */
This class supports only a very limited set of datatypes, and there is no out-of-the-box API for using it to insert a default value during dataset creation.
Where does one use this metadata? Can someone share a real-life use case?
I know we could write our own map function to iterate over rows.stream().map(RowFactory::create) and put in default values. But is there any way to do this using Spark APIs?
Edit: I am expecting something similar to Oracle's DEFAULT functionality. We define a default value for each column, according to its datatype, and while creating the dataframe, if a value is missing or null, this default value is used.
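Spark has no per-column DEFAULT at DataFrame-creation time in the versions discussed here, so a common workaround is to fill nulls immediately after creation. A minimal sketch, assuming illustrative column names ("status", "retries") and that those columns are nullable in the schema:

// Hedged sketch: emulate column defaults by replacing nulls right after the
// DataFrame is created. Column names and default values are illustrative assumptions.
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class DefaultValues {
    public static Dataset<Row> applyDefaults(Dataset<Row> dataset) {
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("status", "NEW"); // default for a string column
        defaults.put("retries", 0);    // default for an integer column
        // DataFrameNaFunctions.fill(Map) only touches the columns named in the map
        return dataset.na().fill(defaults);
    }
}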
