Update table based on CTE - psycopg2

I have an update query as below:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table
ON (update_cte.col1 = my_table.col1)
AND (update_cte.col2 = my_table.col2)
It gives me the following error:
"Error: table name \"my_table\" specified more than once\n"}

I was able to resolve it by specifying an alias in the inner join.
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte inner join my_table as idr
ON (update_cte.col1 = idr.col1)
AND (update_cte.col2 = idr.col2)
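For reference, PostgreSQL's UPDATE ... FROM form is usually written without joining the target table a second time: list only the CTE in FROM and correlate it to the target in the WHERE clause. A minimal sketch using the same table and column names:
WITH update_cte as (
SELECT col1, col2,
daterange(col3, col4) as col5
FROM temp_table
)
UPDATE my_table
SET col5 = update_cte.col5
FROM update_cte
WHERE update_cte.col1 = my_table.col1
AND update_cte.col2 = my_table.col2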

Related

How to delete data from a Delta table?

I am trying to delete data from a Delta table.
When I run the query below, it returns around 500 to 1000 records.
SELECT * FROM table1 inv
join (SELECT col1, col2, col3, min(Date) minDate, max(Date) maxDate FROM table2 a GROUP BY col1, col2, col3) aux
on aux.col1 = inv.col1 and aux.col2 = inv.col2 and aux.col3 = inv.col3
WHERE Date between aux.minDate and aux.maxDate
But when I try to delete those records with the query below, I get a syntax error.
DELETE FROM table1 inv
join (SELECT col1, col2, col3, min(Date) minDate, max(Date) maxDate FROM table2 a GROUP BY col1, col2, col3) aux
on aux.col1 = inv.col1 and aux.col2 = inv.col2 and aux.col3 = inv.col3
WHERE Date between aux.minDate and aux.maxDate
Could someone please help me here?
Thanks in advance :).
Here is the SQL reference:
DELETE FROM table_identifier [AS alias] [WHERE predicate]
You can't use JOIN here, so expand your WHERE clause according to your needs.
Here are some examples:
DELETE FROM table1
WHERE EXISTS (SELECT ... FROM table2 ...)
DELETE FROM table1
WHERE table1.col1 IN (SELECT ... FROM table2 WHERE ...)
DELETE FROM table1
WHERE table1.col1 NOT IN (SELECT ... FROM table2 WHERE ...)
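Applied to the original tables, the EXISTS pattern could look roughly like the sketch below (assuming your Spark/Delta version accepts subqueries in the DELETE predicate; Date here is the column from table1):
DELETE FROM table1
WHERE EXISTS (
SELECT 1
FROM (SELECT col1, col2, col3, min(Date) minDate, max(Date) maxDate
FROM table2 GROUP BY col1, col2, col3) aux
WHERE aux.col1 = table1.col1
AND aux.col2 = table1.col2
AND aux.col3 = table1.col3
AND table1.Date BETWEEN aux.minDate AND aux.maxDate
)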

Pull latest record from data using distinct on?

I have data that looks like the below:
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11,col12,col13,col14,col15
2020-10-30 17:57:17,False,2020-07-01,14,2,False,0.0,True,30.0,True,30.0,True,True,True,False
2020-10-30 17:57:17,False,2020-07-01,15,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,True,2020-07-01,5,2,True,28.0,False,0.0,False,0.0,True,True,True,False
2020-11-15 17:57:17,False,2020-07-01,7,2,False,0.0,True,30.0,True,30.0,True,True,True,False
My query looks like the following:
select distinct on (col3) col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 <= '2020-09-16'
and col2 is false order by col3, col1 asc;
My expected answer is [14, 15], since these are the earliest records for '2020-07-01'. However, with the above query I only get [15]. Any ideas what I might be doing wrong?
I was able to resolve this using the following query:
select distinct col4
from table where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 = (select min(col1) from table
where col1 <= '2020-09-16' and col3 = '2020-07-01')
and col2 is false;
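The original query returns a single value because DISTINCT ON (col3) keeps exactly one row per distinct col3 value, so with only one col3 in play only one col4 can come back. If the goal is every row tied for the earliest col1, a window function is another option; a sketch with the same filters:
select col4
from (
select col4, rank() over (partition by col3 order by col1) as rnk
from table
where col13 is true and col15 is false
and col3 = '2020-07-01'
and col1 <= '2020-09-16'
and col2 is false
) t
where rnk = 1;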

Hive: How to use conditional statements to execute a different query based on a result

I have the query select col1, col2 from view1 and I want to execute it only when (select columnvalue from table1) > 0, otherwise do nothing.
if (select columnvalue from table1) > 0
select col1, col2 from view1
else
do nothing
How can I achieve this in single hive query?
If the check query returns a scalar value (a single row), then you can cross join with the check result and filter using the > 0 condition:
with check_query as (
select count (*) cnt
from table1
)
select *
from view1 t
cross join check_query c
where c.cnt>0
;
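If the check really is a scalar column rather than a count, the same pattern works; a sketch assuming table1 holds a single row with columnvalue:
with check_query as (
select columnvalue from table1
)
select t.col1, t.col2
from view1 t
cross join check_query c
where c.columnvalue > 0
;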

Cassandra select query failure

We have a table:
CREATE TABLE table (
col1 text,
col2 text,
col3 timestamp,
col4 int,
col5 timestamp,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 DESC, col3 DESC, col4 DESC)
When I try querying from this table like:
select * from table where col1 = 'something' and col3 < 'something'
and col4= 12 limit 5 ALLOW FILTERING;
select * from table where col1 = 'something' and col4 < 23
and col3 >= 'something' ALLOW FILTERING;
I always get the error: Clustering column "col4" cannot be restricted (preceding column "col3" is restricted by a non-EQ relation).
I tried changing the clustering column order to col4, col3, col2, but then the second query doesn't work and throws a similar error.
Any suggestion/advice to solve this problem?
We are on Cassandra 3.0.17.7.
You can use a non-equality condition only on the last clustering column restricted in the query.
For example, you can use col1 = val and col2 <= ..., or col1 = val and col2 = val2 and col3 <= ..., or col1 = val and col2 = val2 and col3 = val3 and col4 <= ..., but you can't put non-equality conditions on several clustering columns - that's how Cassandra reads data.
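With the schema above, that means restricting col2 (and col3, col4 if needed) with equality and keeping the range on the last restricted clustering column; a sketch with placeholder values:
select * from table where col1 = 'something' and col2 = 'something'
and col3 < '2020-01-01' limit 5;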

Column list specification in INSERT OVERWRITE statement

While trying to overwrite a Hive table with specific columns from Spark (PySpark) using a dataframe, I get the error below:
pyspark.sql.utils.ParseException:
mismatched input 'col1' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 36)

== SQL ==
insert OVERWRITE table DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 FROM dataframe
------------------------------------^^^
Based on https://issues.apache.org/jira/browse/HIVE-9481 it looks like a column list is still not supported in INSERT OVERWRITE, but running without the OVERWRITE keyword still gives me the same error:
sparkSession.sql("insert into table DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 FROM dataframe")
Note: the above works fine when a specific column list is not specified and the columns between the tables match.
But trying the same via the Hive terminal goes through fine:
INSERT INTO TABLE DB.TableName (Col1, Col2, Col3) select Col1, Col2, Col3 from DB.TableName2;
Should any property or configuration be set or passed through spark-submit?
Please let me know if you need more data or information.
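Until the column list is supported, the usual workaround is to drop it and select every target column in the table's declared order from the dataframe, padding any column the dataframe does not supply; a sketch, where Col4 is a hypothetical extra column filled with NULL:
insert OVERWRITE table DB.TableName select Col1, Col2, Col3, NULL as Col4 FROM dataframe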
