How to delete data from a Delta table? - Databricks

I am trying to delete data from a Delta table.
When I run the query below, I get around 500 to 1000 records.
SELECT * FROM table1 inv
join (SELECT col1, col2, col3, min(Date) minDate, max(Date) maxDate FROM table2 a GROUP BY col1, col2, col3) aux
on aux.col1 = inv.col1 and aux.col2 = inv.col2 and aux.col3 = inv.col3
WHERE Date between aux.minDate and aux.maxDate
But when I try to delete those records with the query below, I get a syntax error.
DELETE FROM table1 inv
join (SELECT col1, col2, col3, min(Date) minDate, max(Date) maxDate FROM table2 a GROUP BY col1, col2, col3) aux
on aux.col1 = inv.col1 and aux.col2 = inv.col2 and aux.col3 = inv.col3
WHERE Date between aux.minDate and aux.maxDate
Could someone please help me here?
Thanks in advance :).

Here is the SQL reference:
DELETE FROM table_identifier [AS alias] [WHERE predicate]
You can't use JOIN in a DELETE statement, so move the join condition into the WHERE clause, typically as a subquery.
Here are some examples:
DELETE FROM table1
WHERE EXISTS (SELECT ... FROM table2 ...)
DELETE FROM table1
WHERE table1.col1 IN (SELECT ... FROM table2 WHERE ...)
DELETE FROM table1
WHERE table1.col1 NOT IN (SELECT ... FROM table2 WHERE ...)
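As a rough illustration of the EXISTS pattern applied to the query from the question, here is a minimal sketch that runs against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for Delta Lake here, and the sample rows are invented; whether your Databricks runtime accepts a derived table inside the EXISTS subquery is an assumption worth checking before relying on it.

```python
import sqlite3

# SQLite stands in for the Delta table; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (col1 TEXT, col2 TEXT, col3 TEXT, Date TEXT);
CREATE TABLE table2 (col1 TEXT, col2 TEXT, col3 TEXT, Date TEXT);
INSERT INTO table1 VALUES
  ('a', 'b', 'c', '2023-01-05'),  -- key matches, date inside the range
  ('a', 'b', 'c', '2023-03-01'),  -- key matches, date outside the range
  ('x', 'y', 'z', '2023-01-05');  -- no matching key in table2
INSERT INTO table2 VALUES
  ('a', 'b', 'c', '2023-01-01'),
  ('a', 'b', 'c', '2023-01-31');
""")

# The JOIN from the SELECT becomes a correlated EXISTS in the DELETE.
delete_sql = """
DELETE FROM table1
WHERE EXISTS (
    SELECT 1
    FROM (SELECT col1, col2, col3, MIN(Date) AS minDate, MAX(Date) AS maxDate
          FROM table2
          GROUP BY col1, col2, col3) AS aux
    WHERE aux.col1 = table1.col1
      AND aux.col2 = table1.col2
      AND aux.col3 = table1.col3
      AND table1.Date BETWEEN aux.minDate AND aux.maxDate
)
"""
deleted = conn.execute(delete_sql).rowcount
remaining = conn.execute(
    "SELECT col1, Date FROM table1 ORDER BY col1, Date"
).fetchall()
print(deleted, remaining)
```

Running the same predicate in a SELECT first is a cheap way to confirm that the DELETE will touch exactly the rows the original join returned.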

Related

How to get top 3 columns and their values across multiple columns (dynamically) in BigQuery

I have a table that looks like this
select 'Alice' AS ID, 1 as col1, 3 as col2, -2 as col3, 9 as col4
union all
select 'Bob' AS ID, -9 as col1, 2 as col2, 5 as col3, -6 as col4
I would like to get the top 3 absolute values for each record across the four columns and then format the output as a dictionary or STRUCT like below
select
'Alice' AS ID, [STRUCT('col4' AS column, 9 AS value), STRUCT('col2',3), STRUCT('col3',-2)] output
union all
select
'Bob' AS ID, [STRUCT('col1' AS column, -9 AS value), STRUCT('col4',-6), STRUCT('col3',5)] output
I would like it to be dynamic, so that I avoid writing out the columns individually. There could be up to 100 columns, and they change.
For more context, I am trying to get the top three features from the batch local explanations output in Vertex AI:
https://cloud.google.com/vertex-ai/docs/tabular-data/classification-regression/get-batch-predictions
I have looked up some examples and would like something similar to the second answer to "How to get max value of column values in a record ? (BigQuery)".
EDIT: the data is actually structured like this. If this is easier to work with, it would be the better starting point:
select 'Alice' AS ID, STRUCT(1 as col1, 3 as col2, -2 as col3, 9 as col4) AS featureAttributions
union all
SELECT 'Bob' AS ID, STRUCT(-9 as col1, 2 as col2, 5 as col3, -6 as col4) AS featureAttributions
Consider the query below.
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
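To make the intent of the UNPIVOT plus ARRAY_AGG(... ORDER BY ABS(value) DESC LIMIT 3) query easier to check, here is a small plain-Python sketch of the same logic on the two sample rows from the question. The function name top_features and its parameters are hypothetical; this is not BigQuery code, only a way to verify the expected result by hand.

```python
# Sample rows from the question, held as plain dictionaries.
rows = [
    {"ID": "Alice", "col1": 1, "col2": 3, "col3": -2, "col4": 9},
    {"ID": "Bob", "col1": -9, "col2": 2, "col3": 5, "col4": -6},
]

def top_features(row, k=3, id_key="ID"):
    """Return the k (column, value) pairs with the largest absolute value."""
    # "Unpivot": turn every non-ID column into a (column, value) pair.
    pairs = [(col, val) for col, val in row.items() if col != id_key]
    # Order by absolute value, largest first, then keep the first k.
    pairs.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return [{"column": col, "value": val} for col, val in pairs[:k]]

output = {row["ID"]: top_features(row) for row in rows}
for record_id, features in output.items():
    print(record_id, features)
```

The signed value is kept in the output; only the ordering uses the absolute value, which mirrors ORDER BY ABS(value) DESC in the query.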
Dynamic Query
"I would like it to be dynamic, so avoid writing out columns individually"
You need dynamic SQL for this. Referring to the answer from @Mikhail that you linked in the post, you can write a dynamic query like the one below.
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM sample_table UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);
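The column-list trick in the dynamic query (serialize one row to JSON, pull the keys out with a regular expression, join them with commas, and substitute the result for %s) can be sketched outside BigQuery as follows. The first_row dictionary is a made-up stand-in for the output of TO_JSON_STRING, and Python's re module is assumed to behave like BigQuery's regex engine for this simple pattern.

```python
import json
import re

# Hypothetical stand-in for TO_JSON_STRING((SELECT AS STRUCT * EXCEPT (ID) ...)).
first_row = {"col1": 1, "col2": 3, "col3": -2, "col4": 9}
json_text = json.dumps(first_row, separators=(",", ":"))

# Same pattern as in the query: capture whatever sits between a quote and '":'.
columns = re.findall(r'"([^,{]+)":', json_text)
column_list = ",".join(columns)

# Equivalent of FORMAT("""...%s...""", column_list).
TEMPLATE = """
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
  SELECT * FROM sample_table UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
"""
query = TEMPLATE % column_list
print(query)
```

The pattern relies on column names containing no commas or braces and on no string value in the row looking like a JSON key; with purely numeric feature attributions that should hold, but it is an assumption about the data.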
For the updated sample table, flatten the STRUCT first. The static and dynamic versions are:
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (col1, col2, col3, col4))
)
GROUP BY ID;
EXECUTE IMMEDIATE FORMAT("""
SELECT ID, ARRAY_AGG(STRUCT(column, value) ORDER BY ABS(value) DESC LIMIT 3) output
FROM (
SELECT * FROM (SELECT ID, featureAttributions.* FROM sample_table)
UNPIVOT (value FOR column IN (%s))
)
GROUP BY ID
""", ARRAY_TO_STRING(
REGEXP_EXTRACT_ALL(TO_JSON_STRING((SELECT featureAttributions FROM sample_table LIMIT 1)), r'"([^,{]+)":'), ',')
);

Update table based on CTE

I have an update query as below:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table
ON (update_cte.col1 = my_table.col1)
AND (update_cte.col2 = my_table.col2)
It gives me the following error:
Error: table name "my_table" specified more than once
I was able to resolve it by specifying an alias in the inner join.
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table as idr
ON (update_cte.col1 = idr.col1)
AND (update_cte.col2 = idr.col2)
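One caveat about the aliased version: in PostgreSQL, repeating the target table in the FROM clause under an alias makes it a self-join, so nothing ties the rows being updated to the joined rows unless a WHERE clause relates my_table to idr or update_cte. The commonly documented shape is UPDATE my_table SET col5 = update_cte.col5 FROM update_cte WHERE update_cte.col1 = my_table.col1 AND update_cte.col2 = my_table.col2. The sketch below shows the same per-row matching idea in a form that runs on SQLite, using a correlated subquery instead of UPDATE ... FROM and a string in place of daterange (both are substitutions, not the original poster's method). The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (col1 INTEGER, col2 INTEGER, col5 TEXT);
CREATE TABLE temp_table (col1 INTEGER, col2 INTEGER, col3 TEXT, col4 TEXT);
INSERT INTO my_table VALUES (1, 1, NULL), (2, 2, NULL), (3, 3, 'untouched');
INSERT INTO temp_table VALUES
  (1, 1, '2023-01-01', '2023-02-01'),
  (2, 2, '2023-03-01', '2023-04-01');
""")

# Each target row is matched to its own CTE row through the correlated
# predicates; rows without a match are left alone by the EXISTS guard.
conn.execute("""
WITH update_cte AS (
    SELECT col1, col2,
           '[' || col3 || ',' || col4 || ')' AS col5  -- stand-in for daterange()
    FROM temp_table
)
UPDATE my_table
SET col5 = (SELECT u.col5 FROM update_cte AS u
            WHERE u.col1 = my_table.col1 AND u.col2 = my_table.col2)
WHERE EXISTS (SELECT 1 FROM update_cte AS u
              WHERE u.col1 = my_table.col1 AND u.col2 = my_table.col2)
""")
result = conn.execute(
    "SELECT col1, col5 FROM my_table ORDER BY col1"
).fetchall()
print(result)
```

Checking a row that should not change, like the third one here, is a quick way to spot an update that is not correlated to its target.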

Cognos 11 - filter between query subjects

Given Table A with columns: ColA1, ColA2, ColA3
And a Table B with columns: ColB1
I want to restrict the data that can be returned from Table A based on data in Table B, like:
ColA1 not in ColB1
Ideally, there would be some way to incorporate SQL queries with SELECT statements in the filter.
What you want is
SELECT a.ColA1
, a.ColA2
, a.ColA3
FROM TableA a
LEFT OUTER JOIN TableB b on b.ColB1 = a.ColA1
WHERE b.ColB1 IS NULL
So...
Query1 contains ColA1, ColA2, and ColA3 from TableA.
Query2 contains ColB1 from TableB.
Query3
joins Query1 and Query2 on ColA1 1..1 = 0..1 ColB1
Data Items: ColA1, ColA2, ColA3
Filter: ColB1 IS NULL
NOT EXISTS is probably what you are looking for.
Try something like this:
select * from TableA as T1
where not exists
(select * from TableB as T2
where t1.key1 = t2.key1 and T1.key2 = t2.key2)
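Both answers describe an anti-join: keep the rows of Table A that have no match in Table B. The following sketch, run on an in-memory SQLite database with invented rows, checks that the LEFT OUTER JOIN ... IS NULL form and the NOT EXISTS form return the same rows. It says nothing about how Cognos generates SQL; it only illustrates the filter logic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ColA1 INTEGER, ColA2 TEXT, ColA3 TEXT);
CREATE TABLE TableB (ColB1 INTEGER);
INSERT INTO TableA VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
INSERT INTO TableB VALUES (2);
""")

# Anti-join via outer join: unmatched rows carry NULL in the TableB column.
left_join_rows = conn.execute("""
SELECT a.ColA1, a.ColA2, a.ColA3
FROM TableA a
LEFT OUTER JOIN TableB b ON b.ColB1 = a.ColA1
WHERE b.ColB1 IS NULL
ORDER BY a.ColA1
""").fetchall()

# Anti-join via NOT EXISTS: keep rows for which no TableB match exists.
not_exists_rows = conn.execute("""
SELECT a.ColA1, a.ColA2, a.ColA3
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.ColB1 = a.ColA1)
ORDER BY a.ColA1
""").fetchall()

print(left_join_rows, not_exists_rows)
```

Unlike NOT IN, both forms behave predictably when ColB1 contains NULLs, which is one reason they are usually preferred.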

Hive: how to use conditional statements to execute a different query based on a result

I have the query select col1, col2 from view1, and I want to execute it only when (select columnvalue from table1) > 0; otherwise it should do nothing.
if (select columnvalue from table1) > 0
select col1, col2 from view1
else
do nothing
How can I achieve this in a single Hive query?
If the check query returns a scalar value (a single row), you can cross join with the check result and filter on the > 0 condition:
with check_query as (
select count (*) cnt
from table1
)
select *
from view1 t
cross join check_query c
where c.cnt>0
;
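The cross-join gate can be tried on an in-memory SQLite database, as sketched below. SQLite stands in for Hive, view1 is created as a plain table for simplicity, and the rows are made up. Because the check query always returns exactly one row, the cross join does not multiply the rows of view1; it only attaches cnt to each of them so the WHERE clause can switch the whole result on or off.

```python
import sqlite3

GATED_QUERY = """
WITH check_query AS (
    SELECT COUNT(*) AS cnt FROM table1
)
SELECT t.col1, t.col2
FROM view1 t
CROSS JOIN check_query c
WHERE c.cnt > 0
ORDER BY t.col1
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (columnvalue INTEGER);
CREATE TABLE view1 (col1 INTEGER, col2 TEXT);  -- plain table standing in for the view
INSERT INTO view1 VALUES (1, 'a'), (2, 'b');
""")

# table1 is empty, so cnt = 0 and the gate filters out every row.
when_empty = conn.execute(GATED_QUERY).fetchall()

# Once table1 has a row, cnt > 0 and all rows of view1 come through.
conn.execute("INSERT INTO table1 VALUES (5)")
when_populated = conn.execute(GATED_QUERY).fetchall()

print(when_empty, when_populated)
```

To gate on the stored value rather than on the row count, the check query could select columnvalue instead, provided it is guaranteed to return a single row.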

SQL Get specific columns from one table and all rows from a joined table in one query

This may have been asked before and I just can't find it.
I have a one-to-many relationship in the database across a few tables.
table1
table2
table3
table2 to table3 is the one-to-many relationship.
Here's a mock of what I have:
select
table1.id
table1.Column
table2.Column2
-- I want all entries here from table 3 here as well
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join join table3 t3 on t3.ID2 = t2.ID2
Is it possible to also select all of the entries that belong to table3 in this query without specifying a sub-query in the select statement?
Also, does this look right? As I've said in the past I'm really new to SQL, thus my sucky code...
EDIT
Sorry guys I misspoke. I need a single column from each of the rows that should be in table3
select
table1.id,
table1.Column,
table2.Column2,
-- I'm going to need a subquery here aren't I...?
table3.columnFromRrow1,
table3.columnFromRrow2,
table3.columnFromRrow3
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join join table3 t3 on t3.ID2 = t2.ID2
;WITH cte AS
( SELECT t1.t1id,
t1.t1col,
t2.t2col,
t3.t3col,
-- number the table3 rows within each parent row
ROW_NUMBER() OVER (PARTITION BY t1.t1id, t1.t1col, t2.t2col
ORDER BY t3.id) AS RN
FROM table1 t1
LEFT OUTER JOIN table2 t2
ON t2.ID = t1.ID
LEFT OUTER JOIN table3 t3
ON t3.ID2 = t2.ID2
)
-- keep the first three table3 rows and pivot them into columns
SELECT
t1id,
t1col,
t2col,
MAX(CASE WHEN RN=1 THEN t3col END) AS columnFromRrow1,
MAX(CASE WHEN RN=2 THEN t3col END) AS columnFromRrow2,
MAX(CASE WHEN RN=3 THEN t3col END) AS columnFromRrow3
FROM cte
WHERE RN<=3
GROUP BY t1id,t1col,t2col
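The ROW_NUMBER plus MAX(CASE ...) pivot used in the CTE answer can be tried end to end with the sketch below, which runs on an in-memory SQLite database (window functions need SQLite 3.25 or newer). The schema and rows are hypothetical, chosen so that one parent row has four table3 children and only the first three should appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, t1col TEXT);
CREATE TABLE table2 (ID INTEGER, ID2 INTEGER, t2col TEXT);
CREATE TABLE table3 (id INTEGER, ID2 INTEGER, t3col TEXT);
INSERT INTO table1 VALUES (1, 'parent');
INSERT INTO table2 VALUES (1, 10, 'child');
INSERT INTO table3 VALUES
  (100, 10, 'first'), (101, 10, 'second'),
  (102, 10, 'third'), (103, 10, 'fourth');
""")

rows = conn.execute("""
WITH cte AS (
    SELECT t1.ID AS t1id, t1.t1col, t2.t2col, t3.t3col,
           -- number the table3 rows inside each (table1, table2) group
           ROW_NUMBER() OVER (PARTITION BY t1.ID, t1.t1col, t2.t2col
                              ORDER BY t3.id) AS RN
    FROM table1 t1
    LEFT OUTER JOIN table2 t2 ON t2.ID = t1.ID
    LEFT OUTER JOIN table3 t3 ON t3.ID2 = t2.ID2
)
SELECT t1id, t1col, t2col,
       MAX(CASE WHEN RN = 1 THEN t3col END) AS columnFromRow1,
       MAX(CASE WHEN RN = 2 THEN t3col END) AS columnFromRow2,
       MAX(CASE WHEN RN = 3 THEN t3col END) AS columnFromRow3
FROM cte
WHERE RN <= 3
GROUP BY t1id, t1col, t2col
""").fetchall()
print(rows)
```

Each CASE expression is non-NULL for at most one row per group, so MAX simply picks that row's value; a parent with fewer than three children gets NULL in the remaining columns.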
I've modified (and corrected) your query to do what you want.
SELECT
t1.id,
t1.Column,
t2.Column2,
t3.* -- All columns from table3
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2
ON t2.ID = t1.ID
LEFT OUTER JOIN table3 AS t3
ON t3.ID2 = t2.ID2
NOTE: This answer is no longer valid, because the original question has been modified...
Using *
select
t1.id,
t1.Column,
t2.Column2,
-- I want all entries here from table 3 here as well
t3.*
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join table3 t3 on t3.ID2 = t2.ID2
