I am attempting to use a subquery in a left join condition, but am getting an error message that reads: "Error in SQL statement: AnalysisException: Table or view not found: TableD;" and points to the FROM TableD D2 statement in my subquery.
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
ON A.Key = B.Key
INNER JOIN TableC C
ON B.DetailKey = C.DetailKey
LEFT JOIN TableD D1
ON C.InstanceKey = D1.InstanceKey
AND D1.RankCnt = (SELECT MIN(D2.RankCnt)
FROM TableD D2
WHERE C.InstanceKey = D2.InstanceKey);
If I remove the subquery and hardcode D1.RankCnt = [anyValidRankCnt], the query runs without issue.
This question has also been posted on the Databricks Community Forum at https://forums.databricks.com/questions/14588/why-is-subquery-in-left-join-causing-error-msg.html.
I'm not sure whether that particular type of correlated subquery is supported in Spark at this time, but I was able to rewrite it in a few different ways, including one that uses ROW_NUMBER. Please check that these queries are semantically equivalent to yours on your data:
%sql
-- Rewrite 1: CTE
WITH cte AS
(
SELECT D1.Code, D1.Description, C.InstanceKey, ROW_NUMBER() OVER ( PARTITION BY c.InstanceKey ORDER BY D1.RankCnt ) xrank
FROM TableA A
INNER JOIN TableB B
ON A.Key = B.Key
INNER JOIN TableC C
ON B.DetailKey = C.DetailKey
LEFT JOIN TableD D1
ON C.InstanceKey = D1.InstanceKey
)
SELECT Code, Description, InstanceKey
FROM cte
WHERE xrank = 1;
-- Rewrite 2: subquery
SELECT x.Code, x.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
ON A.Key = B.Key
INNER JOIN TableC C
ON B.DetailKey = C.DetailKey
LEFT JOIN
(
SELECT D1.InstanceKey, D1.Code, D1.Description, D1.RankCnt
FROM TableD D1
INNER JOIN
(
SELECT InstanceKey, MIN(RankCnt) RankCnt
FROM TableD
GROUP BY InstanceKey
) D2 ON D1.InstanceKey = D2.InstanceKey
AND D1.RankCnt = D2.RankCnt
) x
ON c.InstanceKey = x.InstanceKey;
-- Rewrite 3: UNION ALL
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
ON A.Key = B.Key
INNER JOIN TableC C
ON B.DetailKey = C.DetailKey
INNER JOIN TableD D1
ON C.InstanceKey = D1.InstanceKey
INNER JOIN
(
SELECT D2.InstanceKey, MIN(D2.RankCnt) RankCnt
FROM TableD D2
GROUP BY D2.InstanceKey
) x ON C.InstanceKey = x.InstanceKey
AND D1.RankCnt = x.RankCnt
UNION ALL
SELECT NULL AS Code, NULL AS Description, C.InstanceKey
FROM TableA A
INNER JOIN TableB B
ON A.Key = B.Key
INNER JOIN TableC C
ON B.DetailKey = C.DetailKey
WHERE NOT EXISTS
(
SELECT *
FROM TableD D1
WHERE C.InstanceKey = D1.InstanceKey
);
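One way to do the equivalence check suggested above is to run the original query and a rewrite side by side on a small sample. The sketch below is not Spark: it uses SQLite through Python's sqlite3 module as a stand-in, drops TableA and TableB for brevity, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data: key 1 has a clear minimum, key 2 has a tie,
# key 3 has no rows in TableD at all.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableC (InstanceKey INTEGER);
CREATE TABLE TableD (InstanceKey INTEGER, Code TEXT, Description TEXT, RankCnt INTEGER);
INSERT INTO TableC VALUES (1), (2), (3);
INSERT INTO TableD VALUES
  (1, 'a', 'rank two', 2),
  (1, 'b', 'rank one', 1),
  (2, 'c', 'tied', 5),
  (2, 'd', 'tied', 5);
""")

# Original shape: correlated subquery inside the LEFT JOIN condition.
correlated = """
SELECT D1.Code, D1.Description, C.InstanceKey
FROM TableC C
LEFT JOIN TableD D1
  ON C.InstanceKey = D1.InstanceKey
 AND D1.RankCnt = (SELECT MIN(D2.RankCnt)
                   FROM TableD D2
                   WHERE C.InstanceKey = D2.InstanceKey)
ORDER BY C.InstanceKey, D1.Code
"""

# Rewrite 2 shape: pre-aggregate the minimum per key in a derived table.
derived = """
SELECT x.Code, x.Description, C.InstanceKey
FROM TableC C
LEFT JOIN (
  SELECT D1.InstanceKey, D1.Code, D1.Description
  FROM TableD D1
  INNER JOIN (SELECT InstanceKey, MIN(RankCnt) AS RankCnt
              FROM TableD
              GROUP BY InstanceKey) D2
    ON D1.InstanceKey = D2.InstanceKey
   AND D1.RankCnt = D2.RankCnt
) x ON C.InstanceKey = x.InstanceKey
ORDER BY C.InstanceKey, x.Code
"""

rows_correlated = conn.execute(correlated).fetchall()
rows_derived = conn.execute(derived).fetchall()
print(rows_correlated == rows_derived)
```

Ties are the case worth testing: both queries above return every row that shares the minimum RankCnt, whereas a ROW_NUMBER filter such as xrank = 1 keeps only one row per partition.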
Given Table A with columns: ColA1, ColA2, ColA3
And a Table B with columns: ColB1
I want to restrict the data that can be returned from Table A based on data in Table B, like:
ColA1 not in ColB1
Ideally, there would be some way to incorporate SQL queries with SELECT statements in the filter.
What you want is
SELECT a.ColA1
, a.ColA2
, a.ColA3
FROM TableA a
LEFT OUTER JOIN TableB b on b.ColB1 = a.ColA1
WHERE b.ColB1 IS NULL
So...
Query1 contains ColA1, ColA2, and ColA3 from TableA.
Query2 contains ColB1 from TableB.
Query3
joins Query1 and Query2 on ColA1 1..1 = 0..1 ColB1
Data Items: ColA1, ColA2, ColA3
Filter: ColB1 IS NULL
NOT EXISTS is probably what you are looking for.
Try something like this:
select * from TableA as T1
where not exists
(select * from TableB as T2
where t1.key1 = t2.key1 and T1.key2 = t2.key2)
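To compare the approaches from the answers above, here is a small sketch that runs the LEFT JOIN ... IS NULL form and the NOT EXISTS form against the same data, plus a literal NOT IN for contrast. It uses SQLite through Python's sqlite3 module as a stand-in for whatever engine sits behind your reporting tool, and the sample rows (including the NULL in ColB1) are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ColA1 INTEGER, ColA2 TEXT, ColA3 TEXT);
CREATE TABLE TableB (ColB1 INTEGER);
INSERT INTO TableA VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
INSERT INTO TableB VALUES (2), (NULL);  -- hypothetical data with a NULL key
""")

# Anti-join written as LEFT OUTER JOIN plus IS NULL filter.
left_join = [r[0] for r in conn.execute("""
    SELECT a.ColA1
    FROM TableA a
    LEFT OUTER JOIN TableB b ON b.ColB1 = a.ColA1
    WHERE b.ColB1 IS NULL
    ORDER BY a.ColA1
""")]

# Same anti-join written with NOT EXISTS.
not_exists = [r[0] for r in conn.execute("""
    SELECT a.ColA1
    FROM TableA a
    WHERE NOT EXISTS (SELECT * FROM TableB b WHERE b.ColB1 = a.ColA1)
    ORDER BY a.ColA1
""")]

# Literal NOT IN: a NULL in the subquery makes the predicate unknown
# for every non-matching row, so nothing is returned.
not_in = [r[0] for r in conn.execute("""
    SELECT a.ColA1
    FROM TableA a
    WHERE a.ColA1 NOT IN (SELECT ColB1 FROM TableB)
    ORDER BY a.ColA1
""")]

print(left_join, not_exists, not_in)
```

The first two forms agree with each other. The NOT IN form only matches them when ColB1 can never be NULL, which is one reason to prefer NOT EXISTS or the outer join.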
Imagine these two tables.
Table A
ID col1 col2 col3
1 foo baz bar
2 ofo zba rba
3 oof abz abr
Table B
A_ID field_name field_value
1 first Jon
1 last Doe
2 first Adam
2 last Smith
etc..
Now I would like to have a query (current one looks like this)
SELECT
a.id,
a.col1,
a.col2,
(SELECT field_value FROM B WHERE A_ID = a.id AND field_name = 'first') as first_name,
(SELECT field_value FROM B WHERE A_ID = a.id AND field_name = 'last') as last_name
FROM A a
WHERE (SELECT COUNT(*) FROM B WHERE A_ID = a.id) = 2;
This query is working. What I would like to achieve would be something like this.
SELECT
a.id,
a.col1,
a.col2,
(SELECT field_value FROM b WHERE b.field_name = 'first') as first_name,
(SELECT field_value FROM b WHERE b.field_name = 'last') as last_name
FROM
A a,
(SELECT field_value, field_name FROM B WHERE A_ID = a.id) b
WHERE (SELECT COUNT(*) FROM b) = 2;
What would a correct version of my approach look like? Is there any other way to avoid querying table B multiple times?
Thank you!
I would replace your correlated subqueries with joins:
SELECT
a.id,
a.col1,
a.col2,
b1.field_value AS fv1,
b2.field_value AS fv2
FROM A a
LEFT JOIN B b1
ON a.id = b1.A_ID AND b1.field_name = 'first'
LEFT JOIN B b2
ON a.id = b2.A_ID AND b2.field_name = 'last';
This answer assumes that each A record matches at most one B record per field_name. Your correlated subqueries need the same guarantee anyway, since each of them must return a single value.
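For comparison, here is a sketch that runs the join version on the sample tables from the question, next to a conditional-aggregation variant that reads B only once and keeps the "exactly two B rows" condition from the original WHERE clause. The aggregation variant is my own suggestion rather than part of the answer above, and SQLite via Python's sqlite3 module is used only as a convenient test bed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (ID INTEGER, col1 TEXT, col2 TEXT, col3 TEXT);
CREATE TABLE B (A_ID INTEGER, field_name TEXT, field_value TEXT);
INSERT INTO A VALUES (1, 'foo', 'baz', 'bar'), (2, 'ofo', 'zba', 'rba'), (3, 'oof', 'abz', 'abr');
INSERT INTO B VALUES (1, 'first', 'Jon'), (1, 'last', 'Doe'),
                     (2, 'first', 'Adam'), (2, 'last', 'Smith');
""")

# Join version: one LEFT JOIN per field; A rows without B rows are kept.
joined = conn.execute("""
    SELECT a.id, a.col1, a.col2, b1.field_value AS fv1, b2.field_value AS fv2
    FROM A a
    LEFT JOIN B b1 ON a.id = b1.A_ID AND b1.field_name = 'first'
    LEFT JOIN B b2 ON a.id = b2.A_ID AND b2.field_name = 'last'
    ORDER BY a.id
""").fetchall()

# Conditional aggregation: B is joined once, fields are pivoted with CASE,
# and HAVING keeps only A rows that have exactly two B rows.
aggregated = conn.execute("""
    SELECT a.id, a.col1, a.col2,
           MAX(CASE WHEN b.field_name = 'first' THEN b.field_value END) AS first_name,
           MAX(CASE WHEN b.field_name = 'last'  THEN b.field_value END) AS last_name
    FROM A a
    INNER JOIN B b ON b.A_ID = a.id
    GROUP BY a.id, a.col1, a.col2
    HAVING COUNT(*) = 2
    ORDER BY a.id
""").fetchall()

print(joined)
print(aggregated)
```

On this data the join version also returns the row for ID 3 with NULL names, while the aggregated version drops it, matching the COUNT(*) = 2 filter in the question.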
How come this does not work and what is a workaround?
DELETE FROM
(SELECT
PKID
, a
, b)
Where a > 1
There is a Syntax Error at "(".
DELETE FROM (TABLE) where a > 1 gives the same syntax error.
I need to delete specific rows that are flagged using a rank function in my select statement.
I have now put a table immediately after the DELETE FROM and put WHERE restrictions on the DELETE and in a small series of self-joins of the table.
DELETE FROM TABLE1
WHERE x IN
(SELECT A.x
FROM (SELECT r0.x, r1.y AS y1, r2.y AS y2, DENSE_RANK() OVER (PARTITION BY r1.y, r2.y ORDER BY r0.x) AS RANK
FROM TABLE2 r0
INNER JOIN TABLE1 r1 on r0.x = r1.x
INNER JOIN TABLE1 r2 on r0.x = r2.x
WHERE r1.y = foo and r2.y = bar
) AS A
WHERE A.RANK > 1
)
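A stripped-down sketch of the same pattern, for anyone landing here: DELETE targets a real table, and the ranked derived table only supplies the keys to remove. This assumes SQLite 3.25 or later (for window functions) through Python's sqlite3 module, and a simplified Table1 with made-up rows instead of the self-joins above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (x INTEGER PRIMARY KEY, y TEXT);
INSERT INTO Table1 VALUES (1, 'foo'), (2, 'foo'), (3, 'bar'), (4, 'foo'), (5, 'bar');
""")

# Rank rows within each y group; everything ranked after the first is deleted.
conn.execute("""
    DELETE FROM Table1
    WHERE x IN (
        SELECT ranked.x
        FROM (SELECT x, DENSE_RANK() OVER (PARTITION BY y ORDER BY x) AS rnk
              FROM Table1) AS ranked
        WHERE ranked.rnk > 1
    )
""")

remaining = conn.execute("SELECT x, y FROM Table1 ORDER BY x").fetchall()
print(remaining)
```

Only the lowest x in each y group survives. The alias rnk is used instead of RANK here because RANK is a function name and a reserved word in some engines.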
Is it possible to create a typed query that produces the following SQL?
SELECT A.*
FROM schema1.Table1 A
INNER JOIN (SELECT DISTINCT column1, column2 FROM schema1.Table2) B ON A.column1 = B.column1
You can't join a subselect with the typed API. The easiest way to implement this is to use a CustomJoin, e.g.:
var table1 = db.GetTableName<Table1>();
var q = db.From<Table1>()
.CustomJoin($@"INNER JOIN
(SELECT DISTINCT column1, column2 FROM schema1.Table2) B
ON {table1}.column1 = B.column1");
This may have been asked before and I just can't find it.
I have a one to many relationship in the database on a few tables.
table1
table2
table3
table2 - table3 is the 1-many relationship
here's a mock of what I have:
select
table1.id
table1.Column
table2.Column2
-- I want all entries here from table 3 here as well
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join join table3 t3 on t3.ID2 = t2.ID2
Is it possible to also select all of the entries that belong to table3 in this query without specifying a sub-query in the select statement?
Also, does this look right? As I've said in the past I'm really new to SQL, thus my sucky code...
EDIT
Sorry guys I misspoke. I need a single column from each of the rows that should be in table3
select
table1.id,
table1.Column,
table2.Column2,
-- I'm going to need a subquery here aren't I...?
table3.columnFromRrow1,
table3.columnFromRrow2,
table3.columnFromRrow3
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join join table3 t3 on t3.ID2 = t2.ID2
;WITH cte AS
( SELECT t1.t1id,
t1.t1col,
t2.t2col,
t3.t3col,
ROW_NUMBER() OVER (PARTITION BY t1.t1id, t1.t1col, t2.t2col
ORDER BY t3.id) AS RN
FROM table1 t1
LEFT OUTER JOIN table2 t2
ON t2.ID = t1.ID
LEFT OUTER JOIN table3 t3
ON t3.ID2 = t2.ID2
)
SELECT
t1id,
t1col,
t2col,
MAX(CASE WHEN RN=1 THEN t3col END) AS columnFromRrow1,
MAX(CASE WHEN RN=2 THEN t3col END) AS columnFromRrow2,
MAX(CASE WHEN RN=3 THEN t3col END) AS columnFromRrow3
FROM cte
WHERE RN<=3
GROUP BY t1id,t1col,t2col
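Here is a small runnable sketch of the ROW_NUMBER plus MAX(CASE ...) pivot above, so you can see what it returns when table3 has more or fewer than three rows per parent. It assumes SQLite 3.25 or later through Python's sqlite3 module, and the column names and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, t1col TEXT);
CREATE TABLE table2 (ID INTEGER, ID2 INTEGER, t2col TEXT);
CREATE TABLE table3 (id INTEGER, ID2 INTEGER, t3col TEXT);
INSERT INTO table1 VALUES (1, 'one'), (2, 'two');
INSERT INTO table2 VALUES (1, 10, 'ten'), (2, 20, 'twenty');
-- Parent 10 has four children (the fourth is ignored), parent 20 has one.
INSERT INTO table3 VALUES (100, 10, 'x'), (101, 10, 'y'), (102, 10, 'z'),
                          (103, 10, 'extra'), (104, 20, 'only');
""")

pivoted = conn.execute("""
    WITH cte AS (
        SELECT t1.ID AS t1id, t1.t1col, t2.t2col, t3.t3col,
               ROW_NUMBER() OVER (PARTITION BY t1.ID, t1.t1col, t2.t2col
                                  ORDER BY t3.id) AS RN
        FROM table1 t1
        LEFT OUTER JOIN table2 t2 ON t2.ID = t1.ID
        LEFT OUTER JOIN table3 t3 ON t3.ID2 = t2.ID2
    )
    SELECT t1id, t1col, t2col,
           MAX(CASE WHEN RN = 1 THEN t3col END) AS columnFromRow1,
           MAX(CASE WHEN RN = 2 THEN t3col END) AS columnFromRow2,
           MAX(CASE WHEN RN = 3 THEN t3col END) AS columnFromRow3
    FROM cte
    WHERE RN <= 3
    GROUP BY t1id, t1col, t2col
    ORDER BY t1id
""").fetchall()

print(pivoted)
```

Rows beyond the third are dropped by the RN <= 3 filter, and missing positions come back as NULL.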
I've modified (and corrected) your query to do what you want.
SELECT
t1.id,
t1.Column,
t2.Column2,
t3.* -- All columns from table3
FROM table1 AS t1
LEFT OUTER JOIN table2 AS t2
ON t2.ID = t1.ID
LEFT OUTER JOIN table3 AS t3
ON t3.ID2 = t2.ID2
NOTE: This answer is no longer valid, because the original question has been modified...
Using *
select
t1.id,
t1.Column,
t2.Column2,
-- I want all entries here from table 3 here as well
t3.*
From table1 t1
left outer join table2 t2 on t2.ID = t1.ID
left outer join table3 t3 on t3.ID2 = t2.ID2