I'm working off the example from the jOOQ blog post: https://blog.jooq.org/jooq-3-15s-new-multiset-operator-will-change-how-you-think-about-sql/. I've got the same setup: a foreign key relationship, and I want to load the Parent table and also get all rows that reference each of the Parent rows:
CREATE TABLE parent (
  id BIGINT NOT NULL,
  CONSTRAINT pk_parent PRIMARY KEY (id)
);

CREATE TABLE item (
  id BIGINT NOT NULL,
  parent_id BIGINT NOT NULL,
  type VARCHAR(255) NOT NULL,
  CONSTRAINT pk_item PRIMARY KEY (id),
  FOREIGN KEY (parent_id) REFERENCES parent (id)
);
This is what I think the jOOQ query should look like:
@Test
public void test() {
    dslContext.insertInto(PARENT, PARENT.ID).values(123L).execute();
    dslContext.insertInto(PARENT, PARENT.ID).values(456L).execute();
    dslContext.insertInto(ITEM, ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE).values(1L, 123L, "t1").execute();
    dslContext.insertInto(ITEM, ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE).values(2L, 456L, "t2").execute();

    var result = dslContext.select(
            PARENT.ID,
            DSL.multiset(
                DSL.select(
                        ITEM.ID,
                        ITEM.PARENT_ID,
                        ITEM.TYPE)
                    .from(ITEM)
                    .join(PARENT).onKey()))
        .from(PARENT)
        .fetch();
    System.out.println(result);
}
The result is that each Item shows up for each Parent:
Executing query : select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" join "PUBLIC"."PARENT" on "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"
Fetched result : +----+----------------------------+
: | ID|multiset |
: +----+----------------------------+
: | 123|[(1, 123, t1), (2, 456, t2)]|
: | 456|[(1, 123, t1), (2, 456, t2)]|
: +----+----------------------------+
Fetched row(s) : 2
I also tried doing an explicit check for Parent.id == Item.parent_id, but it didn't generate valid SQL:
var result =
    dslContext
        .select(
            PARENT.ID,
            DSL.multiset(
                DSL.select(ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE)
                    .from(ITEM)
                    .where(ITEM.PARENT_ID.eq(PARENT.ID))))
        .from(PARENT)
        .fetch();
Error:
jOOQ; bad SQL grammar [select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" where "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"]
org.springframework.jdbc.BadSqlGrammarException: jOOQ; bad SQL grammar [select "PUBLIC"."PARENT"."ID", (select coalesce(json_arrayagg(json_array("v0", "v1", "v2" null on null)), json_array(null on null)) from (select "PUBLIC"."ITEM"."ID" "v0", "PUBLIC"."ITEM"."PARENT_ID" "v1", "PUBLIC"."ITEM"."TYPE" "v2" from "PUBLIC"."ITEM" where "PUBLIC"."ITEM"."PARENT_ID" = "PUBLIC"."PARENT"."ID") "t") from "PUBLIC"."PARENT"]
at org.jooq_3.17.6.H2.debug(Unknown Source)
What am I doing wrong here?
Correlated derived table support
There are numerous SQL dialects that can emulate MULTISET in principle, but not if you correlate the subquery like you did. According to #12045, these dialects do not support correlated derived tables:
Db2
H2
MariaDB (see https://jira.mariadb.org/browse/MDEV-28196)
MySQL 5.7 (8 can handle it)
Oracle 11g (12c can handle it)
#12045 was fixed in jOOQ 3.18, producing a slightly less robust and more limited MULTISET emulation that works only in the absence of:
DISTINCT
UNION (and other set operations)
OFFSET .. FETCH
GROUP BY and HAVING
WINDOW and QUALIFY
But that probably doesn't affect 95% of all MULTISET usages.
Workarounds
You could use MULTISET_AGG, which doesn't suffer from this limitation (but is generally less powerful); see the sketch after this list
You could stop using H2, in case you're using it only as a test database (jOOQ recommends integration testing directly against your target database; this is a prime example of why that is generally better)
You could upgrade to 3.18.0-SNAPSHOT for the time being (built off GitHub, or available from here, if you're licensed: https://www.jooq.org/download/versions)
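A minimal sketch of what the MULTISET_AGG variant could look like for the schema above, assuming jOOQ 3.15+ (untested; note that aggregating over a LEFT JOIN typically produces a list containing a single all-null element for parents without items, where MULTISET would produce an empty list):

// Aggregate the ITEM columns per parent instead of correlating a subquery
var result = dslContext
    .select(
        PARENT.ID,
        DSL.multisetAgg(ITEM.ID, ITEM.PARENT_ID, ITEM.TYPE).as("items"))
    .from(PARENT)
    .leftJoin(ITEM).on(ITEM.PARENT_ID.eq(PARENT.ID))
    .groupBy(PARENT.ID)
    .fetch();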
Related
I have a node server accessing a Postgres database through an npm package, pg, and have a working query that returns the data, but I think it could be optimized. The data model is of versions and features; one version has many feature children. This query pattern works in a few contexts for my app, but it looks clumsy. Is there a cleaner way?
SELECT v.*,
       coalesce(
         (SELECT array_to_json(array_agg(row_to_json(x)))
          FROM (SELECT f.* FROM app_feature f WHERE f.version = v.id) x),
         '[]'
       ) as features
FROM app_version v;
CREATE TABLE app_version(
  id SERIAL PRIMARY KEY,
  major INT NOT NULL,
  mid INT NOT NULL,
  minor INT NOT NULL,
  date DATE,
  description VARCHAR(256),
  status VARCHAR(24)
);

CREATE TABLE app_feature(
  id SERIAL PRIMARY KEY,
  version INT,
  description VARCHAR(256),
  type VARCHAR(24),
  CONSTRAINT FK_app_feature_version FOREIGN KEY(version) REFERENCES app_version(id)
);
INSERT INTO app_version (major, mid, minor, date, description, status) VALUES (0,0,0, current_timestamp, 'initial test', 'PENDING');
INSERT INTO app_feature (version, description, type) VALUES (1, 'store features', 'New Feature');
INSERT INTO app_feature (version, description, type) VALUES (1, 'return features as json', 'New Feature');
The subquery in the FROM clause may not be needed.
select v.*,
coalesce((select array_to_json(array_agg(row_to_json(f)))
from app_feature f
where f.version = v.id), '[]') as features
from app_version v;
And my five cents: note that id is the primary key of app_version, so it's possible to group by app_version.id alone. (Keep in mind that the inner join drops versions that have no features, so the coalesce never actually applies here.)
select v.*, coalesce(json_agg(to_json(f)), '[]') as features
from app_version v join app_feature f on f.version = v.id
group by v.id;
You could move the JSON aggregation into a view, then join to the view:
create view app_features_json
as
select af.version,
json_agg(row_to_json(af)) as features
from app_feature af
group by af.version;
Then use that view in a join:
SELECT v.*,
       afj.features
FROM app_version v
JOIN app_features_json afj ON afj.version = v.id;
I have two kinds of records, shown below, in my table staudentdetail in Cosmos DB. In the example below, previousSchooldetail is a nullable field; it may or may not be present for a student.
Sample records below:
{
  "empid": "1234",
  "empname": "ram",
  "schoolname": "high school ,bankur",
  "class": "10",
  "previousSchooldetail": {
    "prevSchoolName": "1763440",
    "YearLeft": "2001"
  } -- (Nullable)
}

{
  "empid": "12345",
  "empname": "shyam",
  "schoolname": "high school",
  "class": "10"
}
I am trying to access the above records from Azure Databricks using PySpark or Scala code. But when we build the DataFrame by reading from Cosmos DB, it does not bring previousSchooldetail into the DataFrame. When we change the query to include an id for which previousSchooldetail exists, it does show up in the DataFrame.
Case 1:
val Query = "SELECT * FROM c"
Result when the query is fired directly:
empid
empname
schoolname
class

Case 2:
val Query = "SELECT * FROM c where c.empid=1234"
Result when the query is fired with the where clause:
empid
empname
schoolname
class
previousSchooldetail
  prevSchoolName
  YearLeft
Could you please tell me why I am not able to get previousSchooldetail in Case 1, and how I should proceed?
As @Jayendran mentioned in the comments, the first query will give you the previousSchooldetail document wherever it is available; otherwise, the column will not be present.
You can have this column present for all the scenarios by using the IS_DEFINED function. Try tweaking your query as below:
SELECT c.empid,
c.empname,
IS_DEFINED(c.previousSchooldetail) ? c.previousSchooldetail : null
as previousSchooldetail,
c.schoolname,
c.class
FROM c
If you are looking to get the result as a flat structure, it can be tricky and would need two separate queries, such as:
Query 1
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
p.prevSchoolName,
p.YearLeft
FROM c JOIN c.previousSchooldetail p
Query 2
SELECT c.empid,
c.empname,
c.schoolname,
c.class,
null as prevSchoolName,
null as YearLeft
FROM c
WHERE not IS_DEFINED (c.previousSchooldetail) or
c.previousSchooldetail = null
Unfortunately, Cosmos DB does not support LEFT JOIN or UNION. Hence, I'm not sure if you can achieve this in a single query.
Alternatively, you can create a stored procedure to return the desired result.
Curious to find out what the best way is to generate relationship identities through ADF.
Right now, I'm consuming JSON data that does not have any identity information. This data is then transformed into multiple database sink tables with relationships (1..n, etc.). Due to FK constraints on some of the destination sink tables, these relationships need to be "built up" one at a time.
This approach seems a bit kludgy, so I'm looking to see if there are other options that I'm not aware of.
Note that I need to include the Surrogate key generation for each insert. If I do not do this, based on output database schema, I'll get a 'cannot insert PK null' error.
Also note that I turn IDENTITY_INSERT ON/OFF for each sink.
I would tend to take more of an ELT approach and use the native JSON abilities in Azure SQL DB, i.e. OPENJSON. You could land the JSON in a table in Azure SQL DB using ADF (e.g. a Stored Proc activity) and then call another stored proc to process the JSON, something like this:
-- Setup
DROP TABLE IF EXISTS #tmp
DROP TABLE IF EXISTS import.City;
DROP TABLE IF EXISTS import.Region;
DROP TABLE IF EXISTS import.Country;
GO
DROP SCHEMA IF EXISTS import
GO
CREATE SCHEMA import
    -- Tables declared inside the CREATE SCHEMA statement are created in the new import schema
    CREATE TABLE Country ( CountryKey INT IDENTITY PRIMARY KEY, CountryName VARCHAR(50) NOT NULL UNIQUE )
    CREATE TABLE Region ( RegionKey INT IDENTITY PRIMARY KEY, CountryKey INT NOT NULL FOREIGN KEY REFERENCES import.Country, RegionName VARCHAR(50) NOT NULL UNIQUE )
    CREATE TABLE City ( CityKey INT IDENTITY(100,1) PRIMARY KEY, RegionKey INT NOT NULL FOREIGN KEY REFERENCES import.Region, CityName VARCHAR(50) NOT NULL UNIQUE )
GO
DECLARE @json NVARCHAR(MAX) = '{
    "Cities": [
        {
            "Country": "England",
            "Region": "Greater London",
            "City": "London"
        },
        {
            "Country": "England",
            "Region": "West Midlands",
            "City": "Birmingham"
        },
        {
            "Country": "England",
            "Region": "Greater Manchester",
            "City": "Manchester"
        },
        {
            "Country": "Scotland",
            "Region": "Lothian",
            "City": "Edinburgh"
        }
    ]
}'
SELECT *
INTO #tmp
FROM OPENJSON( @json, '$.Cities' )
WITH (
    Country VARCHAR(50),
    Region VARCHAR(50),
    City VARCHAR(50)
)
GO
-- Add the Country first (has no foreign keys)
INSERT INTO import.Country ( CountryName )
SELECT DISTINCT Country
FROM #tmp s
WHERE NOT EXISTS ( SELECT * FROM import.Country t WHERE s.Country = t.CountryName )
-- Add the Region next including Country FK
INSERT INTO import.Region ( CountryKey, RegionName )
SELECT t.CountryKey, s.Region
FROM #tmp s
INNER JOIN import.Country t ON s.Country = t.CountryName
-- Now add the City with FKs
INSERT INTO import.City ( RegionKey, CityName )
SELECT r.RegionKey, s.City
FROM #tmp s
INNER JOIN import.Country c ON s.Country = c.CountryName
INNER JOIN import.Region r ON s.Region = r.RegionName
AND c.CountryKey = r.CountryKey
SELECT * FROM import.City;
SELECT * FROM import.Region;
SELECT * FROM import.Country;
This is a simple test script designed to show the idea and should run end-to-end but it is not production code.
How do I select all relevant records according to the provided list of pairs?
table:
CREATE TABLE "users_groups" (
"user_id" INTEGER NOT NULL,
"group_id" BIGINT NOT NULL,
PRIMARY KEY (user_id, group_id),
"permissions" VARCHAR(255)
);
For example, say I have the following JavaScript array of pairs that I need to fetch from the DB:
[
{user_id: 1, group_id: 19},
{user_id: 1, group_id: 11},
{user_id: 5, group_id: 19}
]
Here we see that the same user_id can be in multiple groups.
I can loop over every array element and build the following query:
SELECT * FROM users_groups
WHERE (user_id = 1 AND group_id = 19)
OR (user_id = 1 AND group_id = 11)
OR (user_id = 5 AND group_id = 19);
But is this the best solution? Let's say the array is very long. As far as I know, query length can reach ~1 GB.
What is the best and quickest way to do this?
Bill Karwin's answer will work for Postgres just as well.
However, in my experience, joining against a VALUES clause is very often faster than a large IN list (with hundreds, if not thousands, of elements):
select ug.*
from users_groups ug
join (
  values (1,19), (1,11), (5,19), ...
) as l(uid, gid) on l.uid = ug.user_id and l.gid = ug.group_id;
This assumes that there are no duplicates in the values provided; otherwise the JOIN would result in duplicated rows, which the IN solution would not do.
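If you build that query from application code, here is a sketch of the idea in Java/JDBC for illustration (fetchPairs and Pair are hypothetical names, and a PostgreSQL Connection is assumed; the explicit casts keep the bind parameter types unambiguous):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;
import java.util.stream.Collectors;

class PairLookup {
    record Pair(int userId, long groupId) {}

    static void fetchPairs(Connection conn, List<Pair> pairs) throws SQLException {
        // One "(?::int, ?::bigint)" entry per pair; binding avoids concatenating raw values
        String valuesList = pairs.stream()
                .map(p -> "(?::int, ?::bigint)")
                .collect(Collectors.joining(", "));
        String sql = "select ug.* from users_groups ug"
                + " join (values " + valuesList + ") as l(uid, gid)"
                + " on l.uid = ug.user_id and l.gid = ug.group_id";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int i = 1;
            for (Pair p : pairs) {
                ps.setInt(i++, p.userId());
                ps.setLong(i++, p.groupId());
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getInt("user_id") + "/" + rs.getLong("group_id"));
                }
            }
        }
    }
}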
You tagged both mysql and postgresql, so I don't know which SQL database you're really using.
MySQL at least supports tuple comparisons:
SELECT * FROM users_groups WHERE (user_id, group_id) IN ((1,19), (1,11), (5,19), ...)
This kind of predicate can be optimized in MySQL 5.7 and later. See https://dev.mysql.com/doc/refman/5.7/en/range-optimization.html#row-constructor-range-optimization
I don't know whether PostgreSQL supports this type of predicate, or if it optimizes it.
SELECT count(*) FROM device_stats
WHERE orgid = 'XYZ'
AND regionid = 'NY'
AND campusid = 'C1'
AND buildingid = 'C1'
AND floorid = '2'
AND year = 2017;
The above CQL query returns the correct result, 32032, in CQL shell.
But when I run the same query using the QueryBuilder Java API, I see the count as 0:
BuiltStatement tagSummaryQuery = QueryBuilder.select()
    .countAll()
    .from("device_stats")
    .where(eq("orgid", "XYZ"))
    .and(eq("regionid", "NY"))
    .and(eq("campusid", "C1"))
    .and(eq("buildingid", "C1"))
    .and(eq("floorid", "2"))
    .and(eq("year", "2017"));

try {
    ResultSetFuture tagSummaryResults = session.executeAsync(tagSummaryQuery);
    tagSummaryResults.getUninterruptibly().all().stream().forEach(result -> {
        System.out.println(" totalCount > " + result.getLong(0));
    });
} catch (Exception e) {
    e.printStackTrace();
}
I have only 20 partitions and 32032 rows per partition.
What could be the reason QueryBuilder is not executing the query correctly?
Schema :
CREATE TABLE device_stats (
orgid text,
regionid text,
campusid text,
buildingid text,
floorid text,
year int,
endofwindow timestamp,
categoryid timeuuid,
devicestats map<text,bigint>,
PRIMARY KEY ((orgid, regionid, campusid, buildingid, floorid,year),endofwindow,categoryid)
) WITH CLUSTERING ORDER BY (endofwindow DESC,categoryid ASC);
// Using the keys function to index the map keys
CREATE INDEX ON device_stats (keys(devicestats));
I am using Cassandra 3.10 and com.datastax.cassandra:cassandra-driver-core:3.1.4.
Moving my comment to an answer since that seems to solve the original problem:
Changing .and(eq("year", "2017")) to .and(eq("year", 2017)) solves the issue since year is an int and not a text.
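For reference, the corrected statement from the question (only the last predicate changes):

BuiltStatement tagSummaryQuery = QueryBuilder.select()
    .countAll()
    .from("device_stats")
    .where(eq("orgid", "XYZ"))
    .and(eq("regionid", "NY"))
    .and(eq("campusid", "C1"))
    .and(eq("buildingid", "C1"))
    .and(eq("floorid", "2"))
    .and(eq("year", 2017)); // int literal matches the int column; the string "2017" produced a count of 0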