Spanner mutation limit on deleting 1 row of parent table - google-cloud-spanner

Spanner documentation says:
Insert and update operations count with the multiplicity of the number of columns they affect. For example, inserting a new record may count as five mutations, if values are inserted into five columns. Delete and delete range operations count as one mutation regardless of the number of columns affected. Deleting a row from a parent table that has the ON DELETE CASCADE annotation is also counted as one mutation regardless of the number of interleaved child rows present.
Today I'm trying to delete 1 row from a parent table whose interleaved child table has ON DELETE CASCADE specified.
Example: DELETE FROM my_table WHERE some_primary_key='somevalue'
I'm getting an error message saying I'm hitting the mutation limit because the child table has more than 20k rows in this split. According to the documentation above, that should still count as 1 mutation no matter how many rows there are in the child table.
Note that the child table has a secondary index set up.
Is there an issue with this documentation, or am I missing something?

It seems that each resulting index update is counted as a separate mutation. I did the following experiments to verify it:
Create a table
CREATE TABLE ParentTable (
Pid INT64 NOT NULL,
Data STRING(1024),
) PRIMARY KEY(Pid);
CREATE TABLE ChildTable (
Pid INT64 NOT NULL,
Cid INT64 NOT NULL,
Data STRING(1024),
) PRIMARY KEY(Pid, Cid),
INTERLEAVE IN PARENT ParentTable ON DELETE CASCADE;
Populate 1 parent row and 30000 child rows:
// Insert 1 ParentTable row
INSERT INTO ParentTable (Pid) (SELECT * FROM UNNEST([1]));
// Insert 30000 rows into ChildTable for Pid=1, 10000 per statement
INSERT INTO ChildTable (Pid, Cid) (SELECT 1, child.Cid FROM
(SELECT 0+G.g AS Cid FROM (SELECT E.e*10+F.f AS g FROM (SELECT C.c*10+D.d AS e FROM (SELECT A.a*10 + B.b AS c FROM (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS a) AS A, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS b) AS B) AS C, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS d) AS D) AS E, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) as f) AS F) AS G) AS Child);
INSERT INTO ChildTable (Pid, Cid) (SELECT 1, child.Cid FROM
(SELECT 10000+G.g AS Cid FROM (SELECT E.e*10+F.f AS g FROM (SELECT C.c*10+D.d AS e FROM (SELECT A.a*10 + B.b AS c FROM (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS a) AS A, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS b) AS B) AS C, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS d) AS D) AS E, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) as f) AS F) AS G) AS Child);
INSERT INTO ChildTable (Pid, Cid) (SELECT 1, child.Cid FROM
(SELECT 20000+G.g AS Cid FROM (SELECT E.e*10+F.f AS g FROM (SELECT C.c*10+D.d AS e FROM (SELECT A.a*10 + B.b AS c FROM (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS a) AS A, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS b) AS B) AS C, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) AS d) AS D) AS E, (SELECT * FROM UNNEST([0,1,2,3,4,5,6,7,8,9]) as f) AS F) AS G) AS Child);
Verify that the child table has 30000 rows with Pid=1:
// Verify counts
SELECT COUNT(*) FROM ChildTable WHERE Pid=1;
Try to delete the parent row (this succeeded):
DELETE FROM ParentTable WHERE Pid=1;
Now repeat steps 1 to 3 again. This time, create a secondary index on the child table:
CREATE INDEX Cid_Increasing ON ChildTable (Cid) STORING (Data);
Try to delete the parent row again. This time it failed with:
This DML statement exceeds the mutation limit for a single transaction (20000). To reduce the mutation count, try a transaction with fewer writes, or use fewer indexes. This can help because the mutation count for an operation is equal to the number of columns it affects. Reducing writes or indexes reduces the number of affected columns, bringing your mutation count under the limit. Alternatively, try a Partitioned DML statement using the client libraries or gcloud command-line tool.
OK, try reducing the number of child rows to 20000 and then deleting the parent. It failed again.
DELETE FROM ChildTable WHERE Cid > 19999; // deleted 10000 rows.
SELECT COUNT(*) FROM ChildTable WHERE Pid=1; // should be 20000 now.
DELETE FROM ParentTable WHERE Pid=1; // still failed.
Delete one more row in the child table. This time the parent deletion succeeded.
DELETE FROM ChildTable WHERE Cid > 19998;
SELECT COUNT(*) FROM ChildTable WHERE Pid=1; // should be 19999 now.
DELETE FROM ParentTable WHERE Pid=1; // succeeded
The last two experiments suggest that:
the deletion of the parent row, together with all of its cascade-deleted child rows, is counted as only 1 mutation;
each resulting secondary-index update is counted as 1 mutation.
This also matches the arithmetic: with 20000 child rows the delete costs 1 + 20000 = 20001 mutations, which exceeds the 20000 limit, while with 19999 child rows it costs exactly 20000 and succeeds.
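When the fan-out of index updates makes a cascade delete too large for one transaction, the error message's own suggestion of Partitioned DML is the usual way out, since Partitioned DML is not subject to the per-transaction mutation limit. A sketch using gcloud, with placeholder instance and database names, assuming the statement meets Partitioned DML's partitionability requirements:
gcloud spanner databases execute-sql example-db \
  --instance=example-instance \
  --enable-partitioned-dml \
  --sql="DELETE FROM ParentTable WHERE Pid=1"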

Related

Error Snowflake - Unsupported subquery type cannot be evaluated

I am facing an error in Snowflake saying "Unsupported subquery type cannot be evaluated" after executing, for example, the statement below. How should I write this statement to avoid this error?
select A
from (
    select b
    , c
    from test_table
);
The outer query's column list needs to be within the column list of the subquery. Example: select b from (select b,c from test_table);
Ignoring the "columns" issue, the query you have shown will never trigger this error.
You would get it from this form, though:
select A.*
from tableA as A
where a.x = (select b.y FROM test_table as b where b.z = a.z)
This form, assuming there is only one b.y per b.z, can be turned into an inner join like:
select A.*
from tableA as A
join test_table as b
on b.z = a.z and a.x = b.y
Other forms of this pattern use the likes of max(b.y), and those can be made into a sub-select like:
select A.*
from tableA as A
join (
select c.z, max(c.y) as y from test_table as c group by 1
) as b
on b.z = a.z and a.x = b.y
But the general pattern is: in other databases there is no "cost" to doing row-by-row queries, whereas Snowflake is more optimal at pre-building tables of similar data and then equi-joining those results together. So the "how to write it" examples pivot from for-each-row thinking to building the set of all possible answers and then joining against that. This allows for the most parallel processing of the data possible. And while it means you, the developer, need to understand your data to get the best performance out of it, in general if you are doing large-scale data processing, you should be understanding your data anyway. So this cost is rather acceptable, imho.
If you are trying to match two attributes in the subquery, use the forms below.
If both need to match:
select * from Table WHERE a IN ( select b FROM test_table ) AND a IN ( select c FROM test_table )
If either one needs to match:
select * from Table WHERE a IN ( select b FROM test_table ) OR a IN ( select c FROM test_table )

How to do compare/subtract records

Table A has 20 records and table B shows 19 records. How do I find the one record that is missing in table B? How do I compare/subtract the records of these two tables to find that one record? I am running the query in Apache Superset.
The exact answer depends on which column(s) define whether two records are the same. Assuming you wanted to use some primary key column for the comparison, you could try:
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.pk = a.pk);
If you wanted to use more than one column to compare records from the two tables, then you would just add logic to the exists clause, e.g. for three columns:
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.col1 = a.col1 AND
b.col2 = a.col2 AND
b.col3 = a.col3)
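If the database behind Superset supports set operators, the same comparison can also be written with EXCEPT (called MINUS on some engines). A sketch, assuming both tables have identical column lists:
-- Rows present in TableA but missing from TableB
SELECT * FROM TableA
EXCEPT
SELECT * FROM TableB;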

How to include ' partition by ' in TD15 Pivot function?

Right now I have a query like this:
SELECT a, b,
SUM (CASE WHEN measure_name = 'ABC' THEN measure_qty END) OVER (PARTITION BY a, b ) AS ABCPIVOT
FROM data_app.work_test
Now that TD15 supports PIVOT directly, how do I include this PARTITION BY in the PIVOT function?
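No answer was posted here, but for reference: Teradata's PIVOT clause implicitly groups by all non-pivoted columns, so the PARTITION BY a, b falls out of the grouping rather than being written explicitly. A minimal sketch, assuming your Teradata release actually includes PIVOT and that work_test contains only the columns a, b, measure_name, and measure_qty; note that unlike the windowed SUM ... OVER version, PIVOT collapses the output to one row per (a, b):
-- Hypothetical rewrite; verify PIVOT support in your TD release.
SELECT *
FROM data_app.work_test
PIVOT (SUM(measure_qty) FOR measure_name IN ('ABC' AS ABCPIVOT)) AS p;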

DELETE FROM (SELECT ...) SAP HANA

How come this does not work and what is a workaround?
DELETE FROM
(SELECT
PKID
, a
, b)
Where a > 1
There is a Syntax Error at "(".
DELETE FROM (TABLE) where a > 1 gives the same syntax error.
I need to delete specific rows that are flagged using a rank function in my select statement.
I have now put a table name immediately after the DELETE FROM, with the WHERE restrictions on the DELETE expressed through a small series of self-joins of the table:
DELETE FROM TABLE1
WHERE x IN
(SELECT A.x
FROM (SELECT x, r1.y, r2.y, DENSE_RANK() OVER (PARTITION by r1.y, r2.y ORDER by x) AS RANK
FROM TABLE2 r0
INNER JOIN TABLE1 r1 on r0.x = r1.x
INNER JOIN TABLE1 r2 on r0.x = r2.x
WHERE r1.y = foo and r2.y = bar
) AS A
WHERE A.RANK > 1
)

Postgres: insert and return ID or return id of existing row

I need to tune this statement to return the id if the row exists, or insert the row if it does not, based on 2 parameters:
WITH d(t, e) AS ( VALUES (1, current_timestamp)),
t AS (SELECT id FROM attendance, d WHERE start = e and user_id = t),
i AS (INSERT INTO attendance (user_id, start)
SELECT t, e FROM d WHERE t NOT IN (SELECT id FROM t))
SELECT t,e FROM d WHERE t IN (SELECT id FROM t);
The values 1 and current_timestamp need to be parametric; I will use this from Node.js. I have used the current_timestamp function because when I supply a timestamp like '2016-08-18T14:51:42.333Z', it says that I need to convert it.
The current state is that, with the current_timestamp function, it will insert the new row but not return its id.
This is what I have constructed with help of this thread.
Thanks for any solutions.
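For reference, on Postgres 9.5+ the usual pattern for this is INSERT ... ON CONFLICT ... DO NOTHING ... RETURNING combined with a fallback SELECT. A sketch, assuming a unique constraint exists on (user_id, start) and using $1/$2 as the Node.js driver's parameter placeholders:
-- Hypothetical sketch; requires UNIQUE (user_id, start)
WITH ins AS (
  INSERT INTO attendance (user_id, start)
  VALUES ($1, $2::timestamptz)
  ON CONFLICT (user_id, start) DO NOTHING
  RETURNING id
)
SELECT id FROM ins
UNION ALL
SELECT id FROM attendance
WHERE user_id = $1 AND start = $2::timestamptz
LIMIT 1;
The explicit ::timestamptz cast also addresses the "I need to convert it" error when passing a string like '2016-08-18T14:51:42.333Z'.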
