How to optimize DELETE .. NOT IN .. SUBQUERY in Firebird

I've this kind of delete query:
DELETE
FROM SLAVE_TABLE
WHERE ITEM_ID NOT IN (SELECT ITEM_ID FROM MASTER_TABLE)
Is there any way to optimize this?

You can use EXECUTE BLOCK to scan the detail table sequentially and delete the records that have no matching master record.
EXECUTE BLOCK
AS
DECLARE VARIABLE C CURSOR FOR
(SELECT d.id
FROM detail d LEFT JOIN master m
ON d.master_id = m.id
WHERE m.id IS NULL);
DECLARE VARIABLE I INTEGER;
BEGIN
OPEN C;
WHILE (1 = 1) DO
BEGIN
FETCH C INTO :I;
IF(ROW_COUNT = 0)THEN
LEAVE;
DELETE FROM detail WHERE id = :I;
END
CLOSE C;
END
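The same anti-join-then-delete pattern can be tried out on a small data set. The following is only a sketch: SQLite (via Python's sqlite3 module) stands in for Firebird, the sample rows are invented for illustration, and the table and column names simply follow the block above.

```python
import sqlite3

# SQLite stands in for Firebird here; the sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY);
    CREATE TABLE detail (id INTEGER PRIMARY KEY, master_id INTEGER);
    INSERT INTO master (id) VALUES (1), (2);
    INSERT INTO detail (id, master_id) VALUES (10, 1), (11, 2), (12, 3), (13, 99);
""")

# Anti-join: detail rows whose master_id has no match in master.
orphan_ids = [row[0] for row in conn.execute("""
    SELECT d.id
    FROM detail d LEFT JOIN master m ON d.master_id = m.id
    WHERE m.id IS NULL
    ORDER BY d.id
""")]

# Delete each orphan by primary key, mirroring the loop body of the EXECUTE BLOCK.
for detail_id in orphan_ids:
    conn.execute("DELETE FROM detail WHERE id = ?", (detail_id,))

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM detail ORDER BY id")]
```

Here the orphan ids are collected before any row is deleted, so the delete loop never modifies the result set it is reading from.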

(NOT) IN can usually be optimized by using (NOT) EXISTS instead.
DELETE
FROM SLAVE_TABLE
WHERE NOT EXISTS (SELECT 1 FROM MASTER_TABLE M WHERE M.ITEM_ID = SLAVE_TABLE.ITEM_ID)
I am not sure what you are trying to do here, but to me this query indicates that you should be using foreign keys to enforce this kind of constraint, rather than running queries to clean up the mess afterwards.
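Before swapping one form for the other, it may be worth checking that they behave the same on your data: NOT IN and NOT EXISTS only agree when the subquery returns no NULLs. The sketch below illustrates this with SQLite through Python's sqlite3 module as a stand-in; it says nothing about Firebird's query plans, and the helper `survivors` and the sample rows are hypothetical.

```python
import sqlite3

def survivors(delete_sql, master_ids):
    """Run delete_sql on a fresh database and return the ITEM_IDs left in SLAVE_TABLE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE MASTER_TABLE (ITEM_ID INTEGER)")
    conn.execute("CREATE TABLE SLAVE_TABLE (ITEM_ID INTEGER)")
    conn.executemany("INSERT INTO MASTER_TABLE VALUES (?)", [(i,) for i in master_ids])
    conn.executemany("INSERT INTO SLAVE_TABLE VALUES (?)", [(1,), (2,), (3,)])
    conn.execute(delete_sql)
    return [r[0] for r in conn.execute("SELECT ITEM_ID FROM SLAVE_TABLE ORDER BY ITEM_ID")]

NOT_IN = "DELETE FROM SLAVE_TABLE WHERE ITEM_ID NOT IN (SELECT ITEM_ID FROM MASTER_TABLE)"
NOT_EXISTS = """
    DELETE FROM SLAVE_TABLE
    WHERE NOT EXISTS (SELECT 1 FROM MASTER_TABLE M
                      WHERE M.ITEM_ID = SLAVE_TABLE.ITEM_ID)
"""

# Without NULLs in MASTER_TABLE, both forms delete the orphan row (ITEM_ID = 3).
without_nulls_not_in = survivors(NOT_IN, [1, 2])
without_nulls_not_exists = survivors(NOT_EXISTS, [1, 2])

# With a NULL in MASTER_TABLE, NOT IN evaluates to unknown for the orphan
# and deletes nothing, while NOT EXISTS still removes it.
with_null_not_in = survivors(NOT_IN, [1, 2, None])
with_null_not_exists = survivors(NOT_EXISTS, [1, 2, None])
```

Note that the inner comparison qualifies the outer column with its table name; an unqualified ITEM_ID inside the subquery would resolve to MASTER_TABLE's own column.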

Related

ATHENA/PRESTO complex query with multiple unnested tables

I would like to create a join over several tables.
table login : I would like to retrieve all the data from login
table logging : calculating the Nb_of_sessions for each db and for a specific event type, by user
table meeting : calculating the Nb_of_meetings for each db & for each user
table live : calculating the Nb_of_live for each db & for each user
I have those queries with the right results :
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when I try to put it all together, I seem to get bad data (only one db is retrieved), and the query does not seem efficient.
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have any ideas?
Regards.

The easiest way is to use WITH to structure the subqueries and then reference them.
From the WITH clause reference:
You can use WITH to flatten nested queries, or to simplify subqueries.
The WITH clause precedes the SELECT list in a query and defines one or
more subqueries for use within the SELECT query.
Each subquery defines a temporary table, similar to a view definition,
which you can reference in the FROM clause. The tables are used only
when the query runs.
Since you already have working sub queries, the following should work:
with logins as
(
SELECT db.id AS dbid,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;
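To see why aggregating inside each WITH subquery before joining matters, here is a small sketch. It uses SQLite through Python's sqlite3 module rather than Athena/Presto, drops the UNNEST steps, and works on invented, simplified tables (`login`, `meeting`, `live` with flat `dbid` and `userid` columns), so treat it as an illustration of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE login   (dbid TEXT, userid TEXT, firstname TEXT);
    CREATE TABLE meeting (dbid TEXT, userid TEXT);
    CREATE TABLE live    (dbid TEXT, userid TEXT);
    INSERT INTO login   VALUES ('db1', 'u1', 'Ann'), ('db1', 'u2', 'Bob');
    INSERT INTO meeting VALUES ('db1', 'u1'), ('db1', 'u1');
    INSERT INTO live    VALUES ('db1', 'u1'), ('db1', 'u1'), ('db1', 'u1');
""")

# Joining the raw tables first multiplies rows: 2 meetings x 3 chats = 6 rows for u1,
# so both counts come out inflated.
naive = conn.execute("""
    SELECT l.userid, COUNT(m.userid) AS nb_interviews, COUNT(c.userid) AS nb_chat
    FROM login l
    LEFT JOIN meeting m ON l.dbid = m.dbid AND l.userid = m.userid
    LEFT JOIN live c    ON l.dbid = c.dbid AND l.userid = c.userid
    GROUP BY l.userid
    ORDER BY l.userid
""").fetchall()

# Aggregating each table in its own CTE leaves one row per (dbid, userid) before the join.
with_ctes = conn.execute("""
    WITH meetings AS (
        SELECT dbid, userid, COUNT(*) AS nb_interviews FROM meeting GROUP BY dbid, userid
    ),
    chats AS (
        SELECT dbid, userid, COUNT(*) AS nb_chat FROM live GROUP BY dbid, userid
    )
    SELECT l.userid, COALESCE(m.nb_interviews, 0), COALESCE(c.nb_chat, 0)
    FROM login l
    LEFT JOIN meetings m ON l.dbid = m.dbid AND l.userid = m.userid
    LEFT JOIN chats c    ON l.dbid = c.dbid AND l.userid = c.userid
    ORDER BY l.userid
""").fetchall()
```

Because every CTE is already grouped by (dbid, userid), the final left joins can match at most one row per login and the counts stay correct.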

force replication of replicated tables

Some of my tables are of type REPLICATE. I would like these tables to be actually replicated (not pending) before I start querying my data. This will help me avoid data movement.
I have a script, which I found online, that runs in a loop and does a SELECT TOP 1 on all the tables that are set for replication, but sometimes the script runs for hours. It seems that the server sometimes won't trigger replication even if you do a SELECT TOP 1 from foo.
How can you force SQL Datawarehouse to complete replication?
The script looks something like this:
begin
CREATE TABLE #tbl
WITH
( DISTRIBUTION = ROUND_ROBIN
)
AS
SELECT
ROW_NUMBER() OVER(
ORDER BY
(
SELECT
NULL
)) AS Sequence
, CONCAT('SELECT TOP(1) * FROM ', s.name, '.', t.[name]) AS sql_code
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t
ON t.object_id = p.object_id
JOIN sys.schemas AS s
ON t.schema_id = s.schema_id
WHERE p.[state] = 'NotReady';
DECLARE @nbr_statements INT=
(
SELECT
COUNT(*)
FROM #tbl
), @i INT= 1;
WHILE @i <= @nbr_statements
BEGIN
DECLARE @sql_code NVARCHAR(4000)= (SELECT
sql_code
FROM #tbl
WHERE Sequence = @i);
EXEC sp_executesql @sql_code;
SET @i+=1;
END;
DROP TABLE #tbl;
SET @i = 0;
WHILE
(
SELECT TOP (1)
p.[state]
FROM sys.pdw_replicated_table_cache_state AS p
JOIN sys.tables AS t
ON t.object_id = p.object_id
JOIN sys.schemas AS s
ON t.schema_id = s.schema_id
WHERE p.[state] = 'NotReady'
) = 'NotReady'
BEGIN
IF @i % 100 = 0
BEGIN
RAISERROR('Replication in progress' , 0, 0) WITH NOWAIT;
END;
SET @i = @i + 1;
END;
END;
END

Henrik, if 'select top 1' doesn't trigger a replicated table build, then that would be a defect. Please file a support ticket.
Without looking at your system, it is impossible to know exactly what is going on. Here are a few things that could be contributing to the extended build time:
The replicated tables are large (size, not necessarily rows) requiring long build times.
There are a lot of secondary indexes on the replicated table requiring long build times.
Replicated table builds require staticrc20 (2 concurrency slots). If the concurrency slots are not available, the build will queue behind other running queries.
The replicated tables are constantly being modified with inserts, updates and deletes. Modifications require the table to be built again.

The best way is to run a command like this as part of the job which creates/updates the table:
select top 1 * from <table>
That will force its redistribution at the correct time, without the slow loop through the stored procedure.

U-Sql not allowing non-equijoins

I have stumbled across a bit of an issue with U-SQL which for me is a problem I haven't yet found a workaround for.
It seems U-SQL doesn't support anything other than == in joins, so you can't put > or < in the join itself.
For the use case below which I have done in oracle:
create table trf.test_1(
number_col int
);
insert into trf.test_1 VALUES (10);
insert into trf.test_1 VALUES (20);
insert into trf.test_1 VALUES (30);
insert into trf.test_1 VALUES (60);
drop table trf.test_2;
create table trf.test_2(
number_col int
);
insert into trf.test_2 VALUES (20);
insert into trf.test_2 VALUES (30);
SELECT t1.number_col, t2.number_col
FROM trf.test_1 t1
LEFT JOIN trf.test_2 t2 ON t1.number_col < t2.number_col
;
I get the following:
How might I do that in u-sql without the < join?
I tried a cross join, but if you include the < in the where clause it just turns into an inner and you don't get the rows with the nulls.
Any ideas appreciated.
@t1 =
SELECT * FROM
( VALUES
(10),
(20),
(30),
(60)
) AS T(num_col);
@t2 =
SELECT * FROM
( VALUES
(20),
(30)
) AS T(num_col);
@result =
SELECT t1.num_col, t2.num_col AS num_col_2
FROM @t1 AS t1
CROSS JOIN @t2 AS t2
WHERE t1.num_col < t2.num_col;
@result2 =
SELECT t1.num_col, t2.num_col_2
FROM @t1 AS t1
LEFT JOIN @result AS t2 ON t1.num_col == t2.num_col;
OUTPUT @result2
TO "/Output/ReferenceGuide/Joins/exampleA.csv"
USING Outputters.Csv();
Edit - I added a left join from the @t1 dataset back to the @result set, which seems to work, but I would be interested to hear of any better solutions. It seems a bit of a workaround.

This is a known feature and discussed extensively in the article "U-SQL SELECT Selecting from joins".
Some quotes from that article:
Join Comparisons
U-SQL, like most scaled out Big Data Query languages
that support joins, restricts the join comparison to equality
comparisons between columns in the rowsets to be joined...
...
If one has a non-equality comparison or a more complex expression (such as a method invocation) in the comparison, one can move the comparison to the SELECT’s WHERE clause. Or the more complex expression can be placed in an earlier SELECT statement’s column and then that alias can be referred to in the join comparison.
Basically, non-equijoins don't scale particularly well on a distributed platform like ADLA.
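The workaround of moving the comparison into a WHERE clause and then left-joining back on equality can be checked against a direct non-equi LEFT JOIN on the question's sample values. This is a sketch using SQLite through Python's sqlite3 module, not U-SQL; the CTE name `matched` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (num_col INTEGER);
    CREATE TABLE t2 (num_col INTEGER);
    INSERT INTO t1 VALUES (10), (20), (30), (60);
    INSERT INTO t2 VALUES (20), (30);
""")

# Reference result: the non-equi LEFT JOIN from the Oracle example in the question.
direct = conn.execute("""
    SELECT t1.num_col, t2.num_col
    FROM t1 LEFT JOIN t2 ON t1.num_col < t2.num_col
    ORDER BY 1, 2
""").fetchall()

# Equality-only rewrite: CROSS JOIN plus WHERE produces the matching pairs,
# then an equi LEFT JOIN from t1 restores the rows that had no match.
rewritten = conn.execute("""
    WITH matched AS (
        SELECT t1.num_col, t2.num_col AS num_col_2
        FROM t1 CROSS JOIN t2
        WHERE t1.num_col < t2.num_col
    )
    SELECT t1.num_col, m.num_col_2
    FROM t1 LEFT JOIN matched m ON t1.num_col = m.num_col
    ORDER BY 1, 2
""").fetchall()
```

One caveat of this rewrite: it assumes the values in t1 are unique. With duplicate values in t1, the join back on equality would multiply the matched rows, so a real key column would be needed for the final join.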

PL/SQL Join Collection Object problems

I am working with an Oracle 11g database, release 11.2.0.3.0 - 64 bit production
I have written the following procedure which uses a cursor to collect batches of benefit_ids (which are simply of type NUMBER) from a table called benefit_info. For each benefit_id within each batch, I need to obtain the associated customers and then perform various calculations etc. So far I have the following:
CREATE OR REPLACE PROCEDURE ben_correct(in_bulk_collect_limit IN PLS_INTEGER DEFAULT 1000)
IS
TYPE ben_identity_rec IS RECORD
(
life_scd_id NUMBER,
benefit_id NUMBER
);
TYPE ben_identity_col IS TABLE OF ben_identity_rec INDEX BY PLS_INTEGER;
life_col ben_identity_col;
ben_id NUMBER;
CURSOR benefit_cur
IS
SELECT benefit_id FROM benefit_info;
TYPE benefit_ids_t IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
benefit_ids benefit_ids_t;
PROCEDURE get_next_set_of_incoming(out_benefit_ids OUT NOCOPY benefit_ids_t)
IS
BEGIN
FETCH benefit_cur
BULK COLLECT INTO out_benefit_ids
LIMIT in_bulk_collect_limit;
END;
BEGIN
OPEN benefit_cur;
LOOP
get_next_set_of_incoming(benefit_ids);
/*
The code below is too slow as each benefit_id is considered
individually. Want to change FOR LOOP into LEFT JOIN of benefit_ids
*/
FOR indx IN 1 .. benefit_ids.count LOOP
ben_id := benefit_ids(indx);
SELECT c.life_scd_id, c.benefit_id
BULK COLLECT INTO life_col
FROM customer c
WHERE c.benefit_id = ben_id;
-- Now do further processing with life_col
END LOOP;
EXIT WHEN benefit_ids.count = 0;
END LOOP;
CLOSE benefit_cur;
END;
/
As indicated in the code above, the FOR indx IN 1 .. LOOP is VERY slow, particularly as there are millions of benefit_ids. However, I am aware I can replace the entire FOR LOOP with something like:
SELECT c.life_scd_id, c.benefit_id
BULK COLLECT INTO life_col
FROM customer c
LEFT JOIN table(benefit_ids) b
WHERE b.benefit_id IS NOT NULL;
However, for that to work I think I need to declare an Object type at the schema level as I think in the SELECT query you can join on pure tables or collections of objects. Therefore, from the procedure I remove
TYPE benefit_ids_t IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
and instead at the schema level I have defined
CREATE OR REPLACE TYPE ben_id FORCE AS object
(
benefit_id number
);
CREATE OR REPLACE TYPE benefit_ids_t FORCE AS TABLE OF ben_id;
My revised code essentially becomes:
CREATE OR REPLACE PROCEDURE ben_correct(in_bulk_collect_limit IN PLS_INTEGER DEFAULT 1000)
IS
sql_str VARCHAR2(1000);
TYPE ben_identity_rec IS RECORD
(
life_scd_id NUMBER,
benefit_id NUMBER
);
TYPE ben_identity_col IS TABLE OF ben_identity_rec INDEX BY PLS_INTEGER;
life_col ben_identity_col;
CURSOR benefit_cur
IS
SELECT benefit_id FROM benefit_info;
--- benefit_ids_t has now been declared at schema level
benefit_ids benefit_ids_t;
PROCEDURE get_next_set_of_incoming(out_benefit_ids OUT NOCOPY benefit_ids_t)
IS
BEGIN
FETCH benefit_cur
BULK COLLECT INTO out_benefit_ids
LIMIT in_bulk_collect_limit;
END;
BEGIN
OPEN benefit_cur;
LOOP
get_next_set_of_incoming(benefit_ids);
sql_str := 'SELECT c.life_scd_id, c.benefit_id
FROM customer c
LEFT JOIN table(benefit_ids) b
WHERE b.benefit_id IS NOT NULL';
EXECUTE IMMEDIATE sql_str BULK COLLECT INTO life_col;
-- Now do further processing with life_col
EXIT WHEN benefit_ids.count = 0;
END LOOP;
CLOSE benefit_cur;
END;
/
However, this generates ORA-24344 and PLS-00386 errors, i.e. type mismatch found at 'OUT_BENEFIT_IDS' between FETCH cursor and INTO variables.
I sort of understand that it is complaining that benefit_ids_t is now a table of ben_ids, which are in turn objects wrapping a number, which isn't quite the same as a table of numbers.
I've tried various attempts at resolving the issues, but I can't seem to quite get it right. Any help would be gratefully appreciated.
Also, any general comments to improve are welcome.

You don't need your table type to be of an object containing a number field, it can just be a table of numbers:
CREATE OR REPLACE TYPE benefit_ids_t FORCE AS TABLE OF number;
Or you can use a built-in type like sys.odcinumberlist, but having your own type under your control isn't a bad thing.
You don't want to use dynamic SQL though; this:
sql_str := 'SELECT c.life_scd_id, c.benefit_id
FROM customer c
LEFT JOIN table(benefit_ids) b
WHERE b.benefit_id IS NOT NULL';
EXECUTE IMMEDIATE sql_str BULK COLLECT INTO life_col;
won't work because benefit_ids isn't in scope when that dynamic statement is executed. You can just do it statically:
SELECT c.life_scd_id, c.benefit_id
BULK COLLECT INTO life_col
FROM table(benefit_ids) b
JOIN customer c
ON c.benefit_id = b.column_value;
which is closer to what you had in your original code.
Your EXIT is also in the wrong place - it will try to process rows in a loop when it doesn't find any. I wouldn't bother with the separate fetch procedure at all, it's easier to follow with the fetch directly in the loop:
BEGIN
OPEN benefit_cur;
LOOP
FETCH benefit_cur
BULK COLLECT INTO benefit_ids
LIMIT in_bulk_collect_limit;
EXIT WHEN benefit_ids.count = 0;
SELECT c.life_scd_id, c.benefit_id
BULK COLLECT INTO life_col
FROM table(benefit_ids) b
JOIN customer c
ON c.benefit_id = b.column_value;
-- Now do further processing with life_col
END LOOP;
CLOSE benefit_cur;
END;
If you did really want your object type, you could keep that, but you would need to make your cursor return instances of that object, via its default constructor:
CURSOR benefit_cur
IS
SELECT ben_id(benefit_id) FROM benefit_info;
The customer query join would then be:
SELECT c.life_scd_id, c.benefit_id
BULK COLLECT INTO life_col
FROM table(benefit_ids) b
JOIN customer c
ON c.benefit_id = b.benefit_id;
As it's an object type you can refer to its field name, benefit_id, rather than the generic column_value from the scalar type table.
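The overall loop shape (fetch a batch, exit straight away if it is empty, then join the whole batch to customer in one statement) can be sketched outside PL/SQL as well. The following is a loose analogue in Python with SQLite, not Oracle code: `fetchmany` plays the role of BULK COLLECT ... LIMIT, the temporary table `batch_ids` stands in for the collection passed to table(), and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE benefit_info (benefit_id INTEGER);
    CREATE TABLE customer (life_scd_id INTEGER, benefit_id INTEGER);
    CREATE TEMP TABLE batch_ids (benefit_id INTEGER);
    INSERT INTO benefit_info VALUES (1), (2), (3), (4), (5);
    INSERT INTO customer VALUES (100, 1), (101, 1), (102, 3), (103, 5);
""")

def process_in_batches(limit):
    """Fetch benefit ids in batches and join each batch to customer in one query."""
    batches = []
    cur = conn.execute("SELECT benefit_id FROM benefit_info ORDER BY benefit_id")
    while True:
        ids = cur.fetchmany(limit)
        if not ids:  # exit check sits right after the fetch, before any processing
            break
        # Load the current batch, then resolve all of its customers with a single join.
        conn.execute("DELETE FROM batch_ids")
        conn.executemany("INSERT INTO batch_ids VALUES (?)", ids)
        rows = conn.execute("""
            SELECT c.life_scd_id, c.benefit_id
            FROM batch_ids b JOIN customer c ON c.benefit_id = b.benefit_id
            ORDER BY c.life_scd_id
        """).fetchall()
        batches.append(rows)  # further processing of the batch would go here
    return batches

batches = process_in_batches(limit=2)
```

With the exit check placed directly after the fetch, the final empty batch never reaches the join, which is the point made above about the position of EXIT.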

Subsonic 3 Simple Query inner join sql syntax

I want to perform a simple join on two tables (BusinessUnit and UserBusinessUnit), so I can get a list of all BusinessUnits allocated to a given user.
The first attempt works, but there's no override of Select which allows me to restrict the columns returned (I get all columns from both tables):
var db = new KensDB();
SqlQuery query = db.Select
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
The second attempt allows the column name restriction, but the generated SQL contains pluralised table names (?)
SqlQuery query = new Select( new string[] { BusinessUnitTable.IdColumn, BusinessUnitTable.NameColumn } )
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
Produces...
SELECT [BusinessUnits].[Id], [BusinessUnits].[Name]
FROM [BusinessUnits]
INNER JOIN [UserBusinessUnits]
ON [BusinessUnits].[Id] = [UserBusinessUnits].[BusinessUnitId]
WHERE [BusinessUnits].[RecordStatus] = @0
AND [UserBusinessUnits].[UserId] = @1
So, two questions:
- How do I restrict the columns returned in method 1?
- Why does method 2 pluralise the table names in the generated SQL (and can I get round this?)
I'm using 3.0.0.3...

So far my experience with 3.0.0.3 suggests that this is not possible yet with the query tool, although it is with version 2.
I think the preferred method (so far) with version 3 is to use a linq query with something like:
var busUnits = from b in BusinessUnit.All()
join u in UserBusinessUnit.All() on b.Id equals u.BusinessUnitId
select b;

I ran into the pluralized table names myself, but it was because I'd only re-run one template after making schema changes.
Once I re-ran all the templates, the plural table names went away.
Try re-running all 4 templates and see if that solves it for you.
