Hive multiple subqueries - subquery

I'm using Hive 0.9.0 and I'm trying to execute query i.e.
`SELECT a.id, b.user FROM (SELECT...FROM a_table) a, (SELECT...FROM b_table) b WHERE a.date = b.date;`
but it returns error "loop (...)+ does not match input....".
Does Hive support multiple subqueries in FROM just like Oracle DB?

Multiple subqueries allowed in hive.
I tested with below code,it works.
select * from (select id from test where id>10) a
join (select id from test where id>20) b on a.id=b.id;
Please post your exact code so that I can give relevant solution.

join subqueries is supported Absolutely.
I think the key problem is that u use SELECT...FROM.
The correct syntax is SELECT * FROM
SELECT a.id, b.user
FROM
(SELECT * FROM a_table) a
JOIN (SELECT * FROM b_table) b ON a.date = b.date;

If you want to obtain the full Cartesian product before applying the WHERE
clause, instead:
SELECT a.id, b.user FROM (SELECT...FROM a_table) a, (SELECT...FROM b_table) b WHERE a.date = b.date;
you should use 'join' in the middle, i.e.
SELECT a.id, b.user FROM (SELECT...FROM a_table) a join (SELECT...FROM b_table) b WHERE a.date = b.date;
above is not admissible in strict mode.

Related

Databricks AnalysisException: Column 'l' does not exist

I have a very strange occurrence with my code.
I keep on getting the error
AnalysisException: Column 'homepage_url' does not exist
However, when I do a select with cross Joins the column does actually exist.
Can someone take a look at my cross joins and let me know if that is where the problem is
SELECT DISTINCT
account.xpd_relationshipstatus AS CRM_xpd_relationshipstatus
,REPLACE(owneridname,'Data.Import #','') AS MontaguOwner
,account.ts_montaguoffice AS Montagu_Office
,CAST(account.ts_reminderdatesetto AS DATE) AS CRM_ts_reminderdatesetto
,CAST(account.ts_lastdatestatuschanged AS DATE) AS YearofCRMtslastdatestatuschanged
,organizations.name AS nameCB
,organizations.homepage_url
,iff(e like 'www.%', e, 'www.' + e) AS website
,left(category_list,charindex(',',category_list +',' )-1) AS category_CB
-- ,case when charindex(',',category_list,0) > 0 then left(category_list,charindex(',',category_list)-1) else category_list end as category_CB
,organizations.category_groups_list AS category_groups_CB
FROM basecrmcbreport.account
LEFT OUTER JOIN basecrmcbreport.CRM2CBURL_Lookup
ON account.Id = CRM2CBURL_Lookup.Key
LEFT OUTER JOIN basecrmcbreport.organizations
ON CRM2CBURL_Lookup.CB_URL_KEY = organizations.cb_url
cross Join (values (charindex('://', homepage_url))) a(a)
cross Join (values (iff(a = 0, 1, a + 3))) b(b)
cross Join (values (charindex('/', homepage_url, b))) c(c)
cross Join (values (iff(c = 0, length(homepage_url) + 1, c))) d(d)
cross Join (values (substring(homepage_url, b, d - b))) e(e)
Without the cross Joins
The main reason for cross join (or any join) to recognize the column when you select not when using table valued functions is that joins are used on tables only.
To use table valued functions, one must use cross apply or outer apply. But these are not supported in Databricks sql.
The following is the demo data I am using:
I tried using inner join on a table valued function using the following query and got the same error:
select d1.*,a from demo1 inner join (values(if(d1.team = 'OG',2,1))) a;
Instead, using the select query, the joins work as that is how they function:
select d1.*,a.no_of_wins from demo1 d1 inner join (select id,case team when 'OG' then 2 when 'TS' then 1 end as no_of_wins from demo1) a on d1.id=a.id;
So, the remedy for this problem is to replace all the table valued functions on which you are using joins with SELECT statements.

Cosmos DB Left Join

All of the documentation for Cosmos DB and it looks like it only supports the JOINkeyword, which seems to be a sort of INNER JOIN.
I have the following query:
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection,
FROM c
JOIN s IN c.OtherCollection
)
AS c order by c.id
This works fine and returns the data of documents that have OtherCollection populated. But It obviously does not return any documents that do not have it populated.
The reason for the join is that sometimes I execute the following query (queries are built up from user input)
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection,
FROM c
JOIN s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
AS c order by c.id
The question is how can I have a sort of LEFT JOIN operator in this scenario?
CosmosDB JOIN operation is limited to the scope of a single document. What possible is you can join parent object with child objects under same document.
It is totally different from SQL Join query which supports across two/many tables.
You can simulate LEFT JOIN with the EXISTS sentence.
Eg:
SELECT VALUE c
FROM c
WHERE (
(c.OtherCollection = null) OR EXISTS (--Like a "Left Join"
SELECT null
FROM s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
)
--AND/OR Some other c Node conditions
order by c.id

SQL Oracle Sub-query

I am having a issue getting this Sub-query to run. I am using Toad Data Point -Oracle. I get syntax error. I have tried several different ways with no luck. I am knew to sub-query's
Select *
from FINC.VNDR_ITEM_M as M
where M.ACCT_DOC_NBR = A.ACCT_DOC_NBR
(SELECT A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F AS B
where (B.ACCT_DOC_NBR = A.ACCT_DOC_NBR and B.FISCAL_YR=A.FISCAL_YR and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR and B.SRC_SYS_ID =A.SRC_SYS_ID and B.POST_DT=A.POST_DT and B.CO_CD=A.CO_CD)
and (B.CO_CD >='1000' and B.CO_CD <= '3000' or B.CO_CD ='7090') and (B.POST_DT Between to_date ('08/01/2018','mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')) and (B.SRC_SYS_ID ='15399') and (B.FISCAL_YR ='2018'))
GROUP BY
A.CLIENT_ID,
A.SRC_SYS_ID,
A.CO_CD,
A.ACCT_NBR,
A.CLR_DT,
A.ASGN_NBR,
A.FISCAL_YR,
A.ACCT_DOC_NBR,
A.LINE_ITEM_NBR,
A.MFR_PART_NBR,
A.POST_DT,
A.DRCR_IND,
A.DOC_CRNCY_AMT,
A.CRNCY_CD,
A.BSL_DT)
Your syntax is broken, you put subquery just at the end. Now it looks like:
select *
from dual as m
where a.dummy = m.dummy
(select dummy from dual)
It is in incorrect place, not joined, not aliased. What you should probably do is:
select *
from dual m
join (select dummy from dual) a on a.dummy = m.dummy
You also have some redundant, unnecessary brackets, but that's minor flaw. Full code (I cannot test it without data access):
select *
from FINC.VNDR_ITEM_M M
join (SELECT A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT
FROM FINC.VNDR_ITEM_F A
WHERE A.CLR_DT IN (SELECT MAX(B.CLR_DT)
FROM FINC.VNDR_ITEM_F AS B
where B.ACCT_DOC_NBR = A.ACCT_DOC_NBR
and B.FISCAL_YR=A.FISCAL_YR
and B.LINE_ITEM_NBR = A.LINE_ITEM_NBR
and B.SRC_SYS_ID =A.SRC_SYS_ID
and B.POST_DT=A.POST_DT
and B.CO_CD=A.CO_CD
and (('1000'<=B.CO_CD and B.CO_CD<='3000') or B.CO_CD='7090')
and B.POST_DT Between to_date ('08/01/2018', 'mm/dd/yyyy')
AND to_date ('08/31/2018', 'mm/dd/yyyy')
and B.SRC_SYS_ID ='15399' and B.FISCAL_YR ='2018')
GROUP BY A.CLIENT_ID, A.SRC_SYS_ID, A.CO_CD, A.ACCT_NBR, A.CLR_DT, A.ASGN_NBR,
A.FISCAL_YR, A.ACCT_DOC_NBR, A.LINE_ITEM_NBR, A.MFR_PART_NBR, A.POST_DT,
A.DRCR_IND, A.DOC_CRNCY_AMT, A.CRNCY_CD, A.BSL_DT) A
on M.ACCT_DOC_NBR = A.ACCT_DOC_NBR and M.CO_CD=A.CO_CD;
You need to add an alias to the SubSelect (or Derived Table in Standard SQL):
select *
from
( select .......
) AS dt
join ....

Subquery instead of column name vs Inner Join

I'm fairly new to SQL and have started running into sub-queries as in this query below:
SELECT C.CustomerID
, C.Name
, ( Select PhoneNumber
FROM PhoneNumberTable P
WHERE P.CustomerID = C.CustomerID ) AS "PhoneNumber"
FROM CustomerTable C
Comparing to this query with a join below:
SELECT C.CustomerID
, C.Name
, P.PhoneNumber
FROM CustomerTable C
JOIN PhoneNumberTable P
ON P.customerID = C.customerID
Is there a difference in terms of efficiency/speed? The SQL I am working with has several sub-queries as I have shown above (no JOINs) and it is difficult to read.
joins in my experience tend to be faster, but sometimes you need a subquery.
you should also look into CTE they are very usefull and much easier (in my opinion) to manage
in your specific case i would use a join... because you are trying to join the 2 tables together.

Complex sql query and JPQL

How to change this complex sql statement into JPQL?
select a.name, a.surname, b.street, c.location, c.location_desc
from table1 join table2 on b.id = a.some_fk
left join table3 d on d.id = a.id
left join table4 c on c.some2_id = d.some2_fk where a.id = 400;
If this is possible to present in the JPQL form?
Impossible to give a definitive answer without knowing the entities and their mapping, but the query would look like this:
select a.name, a.surname, b.street, c.location, c.locationDesc
from Entity1 a
join a.entity2 b
left join a.entity3 d
left join d.entity4 c
where a.id = 400;
provided the necessary associations between entities are there.
JPQL is object oriented, it operates against JPA entity objects ,not database tables. Either you need to change the Question and add an UML diagramm or provide the Entity classes.

Resources