Left join in subquery

I am trying to do a left join on my main table using this code:
select distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
VBen.BENF_COD_SEXE AS Sexe,
VBen.BENF_DAT_NAISS AS DatNaiss,
VBen.BENF_DAT_DECES AS DatDec,
A.date_ch as date_chsld
from PROD.V_FICH_ID_BEN_CM AS VBen
left join (select distinct VAss.BENF_NO_INDIV_BEN_BANLS as benbanls,
vass.BENF_DD_ADMIS_ASSU_MED as date_ch
from Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as vass ) as A
on VBen.BENF_NO_INDIV_BEN_BANLS = A.benbanls
where Vben.BENF_DAT_NAISS>'2016-04-01' or Vben.BENF_DAT_DECES>'2011-04-01'
The problem is that the query returns more rows than the main table does with the same WHERE condition. I don't understand what I am missing.
Thanks for your help

Why is it a problem?
The results simply indicate that you have a 1:M (one-to-many) relationship between VBen and Vass (A).
If the relationship should be 1:1 rather than 1:M, then either:
you're missing join criteria between the tables, or
you should be taking a MIN/MAX of your date instead of returning every date per benbanls.
To understand and answer better we would need to know what VBen and Vass actually represent, but put simply, you have multiple Vass (A) rows per VBen row.
To illustrate with an example: Think about Order_Header and Order_Line tables...
Order_header contains (order_Number PK)
Order_line contains (Order_Number, Order_Line PK)
An order can have multiple lines, and each line can have its own ship date: several items may have gone out in the same shipment/day, while some that were backordered went out on a different day. In this situation, an order still has multiple rows even though we DISTINCT order_number and ship date in a subquery. I would guess your situation is similar.
So 1 row in the base table × 2 rows in the derived/lines table gives us 2 records.
1 < 2, which is the situation you have now, and to me that is perfectly fine and expected if it's a 1:M relationship.
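The Order_Header/Order_Line illustration can be reproduced with a small sketch. It uses Python's built-in sqlite3 module with made-up sample rows (the table and column names follow the example above; the data is hypothetical). It shows that DISTINCT in the derived table still leaves two rows for the single order, because the order shipped on two different days.

```python
import sqlite3

# Hypothetical Order_Header / Order_Line tables mirroring the example above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Order_Header (order_number INTEGER PRIMARY KEY);
CREATE TABLE Order_Line (
    order_number INTEGER,
    order_line   INTEGER,
    ship_date    TEXT,
    PRIMARY KEY (order_number, order_line)
);
INSERT INTO Order_Header VALUES (1);
-- Three lines: two shipped together, one backordered and shipped later.
INSERT INTO Order_Line VALUES (1, 1, '2021-01-05'), (1, 2, '2021-01-05'), (1, 3, '2021-02-10');
""")

# DISTINCT in the derived table collapses the two identical ship dates,
# but two distinct (order_number, ship_date) pairs still remain.
rows = conn.execute("""
SELECT h.order_number, d.ship_date
FROM Order_Header AS h
LEFT JOIN (SELECT DISTINCT order_number, ship_date FROM Order_Line) AS d
       ON h.order_number = d.order_number
""").fetchall()

header_count = conn.execute("SELECT COUNT(*) FROM Order_Header").fetchone()[0]
print(header_count, len(rows))  # 1 header row, 2 joined rows
```

One header row joined to two distinct ship dates yields two result rows, which is the same 1:M effect as in the question.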
Maybe you need to do a MIN or MAX on the date instead of a DISTINCT?
If not, you're missing join criteria to make a 1:1 relationship,
or maybe your expectation is just flawed.
The below will give you a 1:1 relationship but I'm not sure it's what you're after.
SELECT distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
VBen.BENF_COD_SEXE AS Sexe,
VBen.BENF_DAT_NAISS AS DatNaiss,
VBen.BENF_DAT_DECES AS DatDec,
A.date_ch as date_chsld
FROM PROD.V_FICH_ID_BEN_CM AS VBen
LEFT JOIN (SELECT VAss.BENF_NO_INDIV_BEN_BANLS as benbanls,
Max(vass.BENF_DD_ADMIS_ASSU_MED) as date_ch
FROM Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as vass
GROUP BY VAss.BENF_NO_INDIV_BEN_BANLS) as A
on VBen.BENF_NO_INDIV_BEN_BANLS = A.benbanls
WHERE (Vben.BENF_DAT_NAISS>'2016-04-01'
or Vben.BENF_DAT_DECES>'2011-04-01')
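To check that the GROUP BY/MAX version really gives one row per main-table row, here is a small sketch using Python's sqlite3. The tables `ben` and `admis` are hypothetical, simplified stand-ins for V_FICH_ID_BEN_CM and V_ADMIS_ASSU_MED_PLAN_PRIOR_CM, with shortened column names and invented rows. The only difference between the two runs is the derived table A: DISTINCT versus GROUP BY with MAX.

```python
import sqlite3

# Hypothetical, simplified stand-ins for V_FICH_ID_BEN_CM (ben) and
# V_ADMIS_ASSU_MED_PLAN_PRIOR_CM (admis); column names are shortened.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ben (benbanls INTEGER PRIMARY KEY, dat_naiss TEXT);
CREATE TABLE admis (benbanls INTEGER, dd_admis TEXT);
INSERT INTO ben VALUES (1, '2017-01-01'), (2, '2018-06-15'), (3, '2019-03-03');
-- Beneficiary 1 has two admissions, 2 has one, 3 has none.
INSERT INTO admis VALUES (1, '2018-01-01'), (1, '2019-05-05'), (2, '2020-02-02');
""")

def run(subquery):
    """LEFT JOIN ben to the given derived table A and return the rows."""
    return conn.execute(f"""
        SELECT b.benbanls, A.date_ch
        FROM ben AS b
        LEFT JOIN ({subquery}) AS A ON b.benbanls = A.benbanls
        WHERE b.dat_naiss > '2016-04-01'
        ORDER BY b.benbanls
    """).fetchall()

# DISTINCT keeps every admission date, so beneficiary 1 appears twice.
all_dates = run("SELECT DISTINCT benbanls, dd_admis AS date_ch FROM admis")
# GROUP BY + MAX keeps one row per beneficiary: its latest admission date.
latest = run("SELECT benbanls, MAX(dd_admis) AS date_ch FROM admis GROUP BY benbanls")

print(len(all_dates), len(latest))  # 4 3
print(latest)  # [(1, '2019-05-05'), (2, '2020-02-02'), (3, None)]
```

Beneficiaries without any admission still appear, with a NULL date, because the join stays a LEFT JOIN.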

It is likely that there is more than one counterpart in the detail table of a record on the main table.
I tried your scenario on my DB and got the correct result.
In my DB:
select distinct p.PollId as PollId,
p.Title AS Title,
p.InsertDate AS DatDec,
ps.date_ch as date_chsld
from dbo.Poll AS p
left join (select distinct pSt.PollId as pollId,
Max(pSt.InsertDate) as date_ch
from dbo.PollStore as pSt
Group by pSt.PollId ) as ps
on p.PollId =ps.pollId
Applied to your query, it would look like this; please try it:
select distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
VBen.BENF_COD_SEXE AS Sexe,
VBen.BENF_DAT_NAISS AS DatNaiss,
VBen.BENF_DAT_DECES AS DatDec,
A.date_ch as date_chsld
from PROD.V_FICH_ID_BEN_CM AS VBen
left join (select vass.BENF_NO_INDIV_BEN_BANLS as benbanls,
Max(vass.BENF_DD_ADMIS_ASSU_MED) as date_ch
from Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as vass
Group by vass.BENF_NO_INDIV_BEN_BANLS) as A
on VBen.BENF_NO_INDIV_BEN_BANLS = A.benbanls
where Vben.BENF_DAT_NAISS>'2016-04-01' or Vben.BENF_DAT_DECES>'2011-04-01'

Related

Correct way to get the last value for a field in Apache Spark or Databricks Using SQL (Correct behavior of last and last_value)?

What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I read the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like they should return the last value of whatever is in the expression.
So if I have a select statement that does something like this:
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts listed below discuss how to get max values, which is not the same as the last value in a list ordered by a different field. I want the last team a player joined: if the player joined the Reds, the A's, the Zebras, and the Yankees, in that order in time, I'm looking for the Yankees. These posts also reach the solution procedurally using Python/R, and I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
You can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
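The same ROW_NUMBER pattern also answers the person_team example from the question. Below is a sketch run through Python's sqlite3 (window functions need SQLite 3.25 or later), with invented sample rows. Spark SQL accepts the same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` syntax. Numbering each person's rows by date_joined descending and keeping rn = 1 picks the most recent team deterministically, whereas last() over a pre-sorted subquery depends on an ordering that the aggregation does not guarantee.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows: pat joined the Reds, A's, Zebras, then Yankees (inserted out of order).
conn.executescript("""
CREATE TABLE person_team (person TEXT, team TEXT, date_joined TEXT);
INSERT INTO person_team VALUES
  ('pat', 'Reds',    '2015-04-01'),
  ('pat', 'Yankees', '2021-04-01'),
  ('pat', 'Zebras',  '2019-04-01'),
  ('pat', 'A''s',    '2017-04-01'),
  ('sam', 'Reds',    '2020-01-01');
""")

# Number each person's rows newest-first, then keep only row 1.
rows = conn.execute("""
SELECT person, team
FROM (
    SELECT person, team,
           ROW_NUMBER() OVER (PARTITION BY person ORDER BY date_joined DESC) AS rn
    FROM person_team
)
WHERE rn = 1
ORDER BY person
""").fetchall()
print(rows)  # [('pat', 'Yankees'), ('sam', 'Reds')]
```

If two rows for a person could share the same date_joined, add a tie-breaking column to the ORDER BY so the pick stays deterministic.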

Multiple tables join in Hive getting error - Both left and right aliases encountered in join

I am trying to join 3 tables. The table details are as follows.
I am expecting the following results.
Here is my query; I get the error "both left and right aliases encountered in join 'id'".
This is caused by joining the 3rd table to both the 1st and 2nd tables (the last FULL JOIN statement).
select coalesce(a.id,b.id,c.id) as id,
ref1,ref2,ref3
from v_cmo_test1 a
FULL JOIN v_cmo_test2 b on (a.id = b.id)
FULL JOIN v_cmo_test3 c on (c.id in (a.id,b.id))
If I use the query below, id 3 is repeated in the result, which I don't want.
select coalesce(a.id,b.id,c.id) as id,
ref1,ref2,ref3
from v_cmo_test1 a
FULL JOIN v_cmo_test2 b on a.id = b.id
FULL JOIN v_cmo_test3 c on c.id = a.id
Could anyone help me achieve the expected results? I really appreciate your help.
Thanks, Babu
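This is a very tricky requirement. The data is incorrect because you are using test1 as the driver: the last join matches test3 only against a.id, so a row that exists only in test2 (where a.id is NULL) cannot match its test3 counterpart, and the two come out as separate rows. The same thing can happen with the other tables. So I am joining two tables at a time to achieve what you want.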
select coalesce(inner_sq.id,c.id) as id,ref1,ref2,ref3
from
(select coalesce(a.id,b.id) as id,ref1,ref2
from v_cmo_test1 a
FULL JOIN v_cmo_test2 b on a.id = b.id
) inner_sq
FULL JOIN v_cmo_test3 c on c.id = inner_sq.id
Output of the inner_sq query:
1,bab,kim
2,xxx,yyy
3,,mmm
When you full join the above with test3, you should get your expected output.
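The difference between the two approaches can be seen with a small pure-Python model of FULL OUTER JOIN semantics. Rows are dicts, and a missing key stands for NULL. test1 and test2 are taken from the inner_sq output shown above. test3's rows are hypothetical, since the question's table details are not shown. The helper names (`full_join`, `eq`, `coalesce`) are illustrative only and do not come from Hive.

```python
def full_join(left, right, on):
    """Full outer join of two lists of dict rows.

    `on(l, r)` is the join predicate. Unmatched rows from either side are kept,
    with the other side's columns simply absent (i.e. NULL).
    """
    result, matched_right = [], set()
    for l in left:
        matches = [i for i, r in enumerate(right) if on(l, r)]
        for i in matches:
            result.append({**l, **right[i]})
            matched_right.add(i)
        if not matches:
            result.append(dict(l))
    result.extend(dict(r) for i, r in enumerate(right) if i not in matched_right)
    return result

def eq(x, y):
    """SQL-style equality: NULL (None) never equals anything."""
    return x is not None and y is not None and x == y

def coalesce(*values):
    return next((v for v in values if v is not None), None)

# test1/test2 match the inner_sq output above; test3's rows are hypothetical.
test1 = [{"a.id": 1, "ref1": "bab"}, {"a.id": 2, "ref1": "xxx"}]
test2 = [{"b.id": 1, "ref2": "kim"}, {"b.id": 2, "ref2": "yyy"}, {"b.id": 3, "ref2": "mmm"}]
test3 = [{"c.id": 1, "ref3": "r1"}, {"c.id": 3, "ref3": "r3"}]

ab = full_join(test1, test2, lambda l, r: eq(l.get("a.id"), r["b.id"]))

# Question's second query: test3 joins on a.id only, so the row that exists only
# in test2 (id 3, a.id NULL) cannot match and test3's id 3 comes out separately.
naive = full_join(ab, test3, lambda l, r: eq(l.get("a.id"), r["c.id"]))
naive_ids = sorted(coalesce(r.get("a.id"), r.get("b.id"), r.get("c.id")) for r in naive)

# Answer's approach: coalesce the first two ids in a derived table, then join on that.
inner_sq = [{"id": coalesce(r.get("a.id"), r.get("b.id")),
             "ref1": r.get("ref1"), "ref2": r.get("ref2")} for r in ab]
fixed = full_join(inner_sq, test3, lambda l, r: eq(l["id"], r["c.id"]))
fixed_ids = sorted(coalesce(r.get("id"), r.get("c.id")) for r in fixed)

print(naive_ids)  # [1, 2, 3, 3]
print(fixed_ids)  # [1, 2, 3]
```

Joining on the already-coalesced id is what removes the duplicate, because every row from the first two tables now carries a non-NULL key for the third join.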

sqlite combine 2 queries from different tables to make one

I recently started using SQL again; the last time I used it was in Microsoft Access 2000, so please bear with me if I'm a little behind the times.
I have 2 pointless virtual currencies on my discord server for my players to play pointless games with. Both of these currencies' transactions are currently stored in individual tables.
I wish to sum up all the transactions for each player to give them a single current amount for each currency. Individually I can do this:
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblGorillaTears.Amount)
FROM
tblPlayers
INNER JOIN
tblGorillaTears
ON
tblPlayers.PlayerID = tblGorillaTears.PlayerID
GROUP BY
tblPlayers.PlayerID;
and
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblKebabs.Amount)
FROM
tblPlayers
INNER JOIN
tblKebabs
ON
tblPlayers.PlayerID = tblKebabs.PlayerID
GROUP BY
tblPlayers.PlayerID;
What I need is a table that outputs the user name, the ID, and the total for each currency on one row, but when I do this:
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblGorillaTears.Amount) AS GT,
0 as Kebabs
FROM
tblPlayers
INNER JOIN
tblGorillaTears
ON
tblPlayers.PlayerID = tblGorillaTears.PlayerID
GROUP BY
tblPlayers.PlayerID
UNION
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
0 as GP,
SUM(tblKebabs.Amount)
FROM
tblPlayers
INNER JOIN
tblKebabs
ON
tblPlayers.PlayerID = tblKebabs.PlayerID
GROUP BY
tblPlayers.PlayerID;
The results end up with a row for each player for each currency. How can I make both currencies appear in the same row?
Previously in MS Access I was able to create two queries and then make a query of those two queries as if they were tables, but I cannot figure out how to do that in this instance. Thanks <3
UNION always adds new rows rather than columns; you can try something like the following query instead:
SELECT TP.playerid AS PlayerID,
TP.NAME AS NAME,
(SELECT Sum(TG.amount)
FROM tblgorillatears TG
WHERE TG.playerid = TP.playerid) AS GT,
(SELECT Sum(TK.amount)
FROM tblkebabs TK
WHERE TK.playerid = TP.playerid) AS Kebabs
FROM tblplayers TP
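Here is a sketch that runs the correlated-subquery answer in SQLite through Python's sqlite3 module, with invented players and transactions. It also shows the Access-style "query of queries" the question mentions, written as an alternative: each per-currency total becomes a grouped derived table, and both are LEFT JOINed to the players table. Both queries give one row per player.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; table and column names follow the question.
conn.executescript("""
CREATE TABLE tblPlayers (PlayerID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE tblGorillaTears (PlayerID INTEGER, Amount INTEGER);
CREATE TABLE tblKebabs (PlayerID INTEGER, Amount INTEGER);
INSERT INTO tblPlayers VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO tblGorillaTears VALUES (1, 10), (1, -3), (2, 5);
INSERT INTO tblKebabs VALUES (1, 4);  -- Bob has no kebab transactions yet
""")

# Correlated subqueries: one scalar total per currency, per player row.
rows = conn.execute("""
SELECT TP.PlayerID, TP.Name,
       (SELECT SUM(TG.Amount) FROM tblGorillaTears TG WHERE TG.PlayerID = TP.PlayerID) AS GT,
       (SELECT SUM(TK.Amount) FROM tblKebabs TK WHERE TK.PlayerID = TP.PlayerID) AS Kebabs
FROM tblPlayers TP
ORDER BY TP.PlayerID
""").fetchall()
print(rows)  # [(1, 'Alice', 7, 4), (2, 'Bob', 5, None)]

# Alternative: treat each grouped total as a derived table ("a query of queries").
rows2 = conn.execute("""
SELECT TP.PlayerID, TP.Name, G.GT, K.Kebabs
FROM tblPlayers TP
LEFT JOIN (SELECT PlayerID, SUM(Amount) AS GT FROM tblGorillaTears GROUP BY PlayerID) AS G
       ON G.PlayerID = TP.PlayerID
LEFT JOIN (SELECT PlayerID, SUM(Amount) AS Kebabs FROM tblKebabs GROUP BY PlayerID) AS K
       ON K.PlayerID = TP.PlayerID
ORDER BY TP.PlayerID
""").fetchall()
```

A player with no transactions in a table gets NULL for that currency. Wrap the expression in COALESCE(..., 0) if you would rather show 0.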

How to debug "Each GROUP BY expression must contain at least one column that is not an outer reference error"

Since SSRS doesn't allow filters on aggregates, I found some code which helped me come up with the below query. However, when I run it I get:
Each GROUP BY expression must contain at least one column that is not an outer reference
I have searched everywhere but can't find how to fix this. I've even removed the two extra tables from the query so there were no joins at all. I need to exclude any order whose line total is greater than 0 and less than $500.
SELECT
tdsls041_sales_order_lines.company,
tdsls041_sales_order_lines.order_number,
tdsls041_sales_order_lines.amount,
tdsls041_sales_order_lines.item,
tdsls041_sales_order_lines.container
FROM
tdsls041_sales_order_lines AS tdsls041_sales_order_lines
WHERE
(tdsls041_sales_order_lines.company = 610) AND
(tdsls041_sales_order_lines.order_number IN
(SELECT
tdsls041_sales_order_lines.order_number
FROM
tdsls041_sales_order_lines AS tdsls041_sales_order_lines_1
GROUP BY
tdsls041_sales_order_lines.order_number
HAVING
(SUM(tdsls041_sales_order_lines.amount) <= 500) OR
SUM(tdsls041_sales_order_lines.amount) > 0))
SQL Server raises this error because, inside the subquery, tdsls041_sales_order_lines.order_number refers to the outer query's table: the inner table is aliased tdsls041_sales_order_lines_1, so the GROUP BY expression contains only an outer reference. Give the inner table its own short alias and use that alias throughout the subquery.
The version below does that and also wraps the aggregate in another layer so that IN receives just the list of order numbers. Note too that the original HAVING (<= 500 OR > 0) matches every order with a non-NULL total. To drop orders whose total is greater than 0 and less than 500, keep the ones whose total is <= 0 or >= 500:
SELECT T1.company, T1.order_number, T1.amount, T1.item, T1.container
FROM tdsls041_sales_order_lines AS T1
WHERE (T1.company = 610) AND (T1.order_number IN
(SELECT order_number FROM
(SELECT TSOL.order_number, SUM(TSOL.amount) AS TTL
FROM tdsls041_sales_order_lines AS TSOL
GROUP BY TSOL.order_number
HAVING (SUM(TSOL.amount) <= 0) OR
SUM(TSOL.amount) >= 500) AS T2) )
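The filtering requirement from the question (exclude orders whose total is greater than 0 and less than $500) can be checked with a small sketch. It runs in SQLite through Python's sqlite3, with invented order lines, and uses the subquery with its own alias (TSOL) directly inside IN, without the extra derived-table layer. This is a simplification, and it is not SQL Server itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented order lines; table and column names follow the question.
conn.executescript("""
CREATE TABLE tdsls041_sales_order_lines (
    company INTEGER, order_number INTEGER, amount REAL, item TEXT, container TEXT
);
INSERT INTO tdsls041_sales_order_lines VALUES
  (610, 1, 100, 'A', 'box'), (610, 1, 200, 'B', 'box'),   -- total 300: excluded
  (610, 2, 400, 'A', 'box'), (610, 2, 300, 'C', 'crate'), -- total 700: kept
  (610, 3,  50, 'D', 'box'), (610, 3, -50, 'D', 'box');   -- total 0: kept
""")

rows = conn.execute("""
SELECT T1.company, T1.order_number, T1.amount, T1.item, T1.container
FROM tdsls041_sales_order_lines AS T1
WHERE T1.company = 610
  AND T1.order_number IN (
      SELECT TSOL.order_number
      FROM tdsls041_sales_order_lines AS TSOL
      GROUP BY TSOL.order_number
      -- keep only orders whose total is NOT strictly between 0 and 500
      HAVING SUM(TSOL.amount) <= 0 OR SUM(TSOL.amount) >= 500
  )
""").fetchall()
print(sorted({r[1] for r in rows}))  # [2, 3]
```

All lines of the kept orders are returned, so the report can still show line-level detail while the filter works on order totals.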
You can filter on aggregates in charts and tables. You have to put the aggregate filter on your group instead of on the table itself (Group Properties -> Filters tab).

how to join two or more tables and result set having all distinct values

I have about 20 Excel files containing data. All the tables have the same columns, like id, name, age, location, etc. Each file has distinct data, but I don't know whether data in one file is repeated in another file. So I want to join all the files, and the result set should contain only distinct values. Please help me with this problem as soon as possible. I want the result set to be stored in an Access database.
I would recommend either linking the sheets in Access or importing the sheets as tables.
From there, use a DISTINCT select on the tables/sheets to determine the keys required, and select only the records you need.
In SQL, you can use JOIN or NATURAL JOIN to join tables. I would look into NATURAL JOIN since you said all tables have the same columns.
After that you can use DISTINCT to get distinct values.
I'm not sure if this is what you're looking for, though: your question asks about Excel but you've tagged it with SQL.
If you can use all the tables in one query, you can use a union to get the distinct rows:
select id, name, age, location from Table1
union
select id, name, age, location from Table2
union
select id, name, age, location from Table3
union
...
You can insert the records directly from the result:
insert into ResultTable
select id, name, age, location from Table1
union
....
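The UNION-into-ResultTable approach can be tried out with a minimal sketch. It uses Python's sqlite3 with two small invented tables standing in for the imported sheets. Row 2 appears in both sources but ends up in ResultTable only once, because UNION (unlike UNION ALL) removes duplicate rows across all the SELECTs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "id INTEGER, name TEXT, age INTEGER, location TEXT"
# Invented stand-ins for two imported sheets plus the target table.
conn.executescript(f"""
CREATE TABLE Table1 ({cols});
CREATE TABLE Table2 ({cols});
CREATE TABLE ResultTable ({cols});
INSERT INTO Table1 VALUES (1, 'Ann', 30, 'Oslo'), (2, 'Ben', 41, 'Rome');
-- Row 2 is repeated in Table2; row 3 is new.
INSERT INTO Table2 VALUES (2, 'Ben', 41, 'Rome'), (3, 'Cy', 25, 'Lima');
""")

# UNION (unlike UNION ALL) removes duplicate rows across all the SELECTs.
conn.execute("""
INSERT INTO ResultTable
SELECT id, name, age, location FROM Table1
UNION
SELECT id, name, age, location FROM Table2
""")
print(conn.execute("SELECT id FROM ResultTable ORDER BY id").fetchall())  # [(1,), (2,), (3,)]
```

Note that UNION compares whole rows: two rows with the same id but, say, a different age would both be kept.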
If you can only select from one table at a time, you can skip inserting rows that are already in the result table:
insert into ResultTable
select t.id, t.name, t.age, t.location from Table1 as t
left join ResultTable as r on r.id = t.id
where r.id is null
(Assuming that id is a unique field identifying the record.)
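The one-table-at-a-time variant is an anti-join: a LEFT JOIN against ResultTable that keeps only the source rows with no match. A minimal sketch in SQLite via Python's sqlite3 follows. The invented tables mirror the previous example, and `append_new_rows` is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cols = "id INTEGER, name TEXT, age INTEGER, location TEXT"
conn.executescript(f"""
CREATE TABLE Table1 ({cols});
CREATE TABLE Table2 ({cols});
CREATE TABLE ResultTable ({cols});
INSERT INTO Table1 VALUES (1, 'Ann', 30, 'Oslo'), (2, 'Ben', 41, 'Rome');
INSERT INTO Table2 VALUES (2, 'Ben', 41, 'Rome'), (3, 'Cy', 25, 'Lima');
""")

def append_new_rows(source):
    """Insert rows from `source` whose id is not yet in ResultTable (anti-join)."""
    conn.execute(f"""
        INSERT INTO ResultTable
        SELECT t.id, t.name, t.age, t.location FROM {source} AS t
        LEFT JOIN ResultTable AS r ON r.id = t.id
        WHERE r.id IS NULL
    """)

for table in ("Table1", "Table2"):  # one source table at a time
    append_new_rows(table)

print(conn.execute("SELECT id FROM ResultTable ORDER BY id").fetchall())  # [(1,), (2,), (3,)]
```

The check is against rows already in ResultTable, so duplicate ids within a single source table would still both be inserted. As the answer notes, this relies on id uniquely identifying a record.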
It seems the unique set of data you want is this:
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;].[Sheet1$] AS T1
...but that you then want to arbitrarily apply a sequence of integers as id (rather than using the id values from the Excel tables).
Because the Access Database Engine does not support common table expressions and Excel does not support VIEWs, you will have to repeat that UNION query as derived tables (hopefully the optimizer will recognize the repetition), e.g. using a correlated subquery to get the row number:
SELECT (
SELECT COUNT(*) + 1
FROM (
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;].[Sheet1$] AS T1
UNION
SELECT T1.name, T1.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;].[Sheet1$] AS T1
) AS DT1
WHERE DT1.name < DT2.name
) AS id,
DT2.name, DT2.loc
FROM (
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db1.xls;].[Sheet1$] AS T2
UNION
SELECT T2.name, T2.loc
FROM [Excel 8.0;HDR=YES;IMEX=1;DATABASE=C:\db2.xls;].[Sheet1$] AS T2
) AS DT2;
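The correlated-subquery numbering can be tried outside Access with a sketch in SQLite via Python's sqlite3. Two small invented tables (`sheet1`, `sheet2`) stand in for the Excel sheets. Each row's id is 1 plus the number of distinct rows whose name sorts before it, and the UNION is written out twice, as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented stand-ins for the two Excel sheets.
conn.executescript("""
CREATE TABLE sheet1 (name TEXT, loc TEXT);
CREATE TABLE sheet2 (name TEXT, loc TEXT);
INSERT INTO sheet1 VALUES ('carol', 'NY'), ('alice', 'LA');
INSERT INTO sheet2 VALUES ('alice', 'LA'), ('bob', 'SF');  -- alice is a duplicate
""")

# The UNION appears twice in the final SQL, as in the Access answer (no CTEs there).
union_sql = "SELECT name, loc FROM sheet1 UNION SELECT name, loc FROM sheet2"
rows = conn.execute(f"""
SELECT (SELECT COUNT(*) + 1
        FROM ({union_sql}) AS DT1
        WHERE DT1.name < DT2.name) AS id,
       DT2.name, DT2.loc
FROM ({union_sql}) AS DT2
ORDER BY id
""").fetchall()
print(rows)  # [(1, 'alice', 'LA'), (2, 'bob', 'SF'), (3, 'carol', 'NY')]
```

Because the count compares only name, two distinct rows sharing a name (with different loc values) would receive the same id. Extend the WHERE condition if that can happen in your data.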
Note:
i want the result set to be stored in
an access database
Then maybe you should migrate the Excel data into a staging table in your Access database and do the data scrubbing from there. At least you could put that derived table into a VIEW :)
A join combines two tables by matching the values in corresponding columns. The result is a merged table consisting of the first table plus the matching rows copied from the second table. You can also use the DIGBD add-in for Excel.
