Databricks AnalysisException: Column 'l' does not exist - databricks

I am seeing a very strange problem with my code.
I keep getting the error
AnalysisException: Column 'homepage_url' does not exist
However, when I do a select with cross joins the column does actually exist.
Can someone take a look at my cross joins and let me know if that is where the problem is?
SELECT DISTINCT
account.xpd_relationshipstatus AS CRM_xpd_relationshipstatus
,REPLACE(owneridname,'Data.Import #','') AS MontaguOwner
,account.ts_montaguoffice AS Montagu_Office
,CAST(account.ts_reminderdatesetto AS DATE) AS CRM_ts_reminderdatesetto
,CAST(account.ts_lastdatestatuschanged AS DATE) AS YearofCRMtslastdatestatuschanged
,organizations.name AS nameCB
,organizations.homepage_url
,iff(e like 'www.%', e, 'www.' + e) AS website
,left(category_list,charindex(',',category_list +',' )-1) AS category_CB
-- ,case when charindex(',',category_list,0) > 0 then left(category_list,charindex(',',category_list)-1) else category_list end as category_CB
,organizations.category_groups_list AS category_groups_CB
FROM basecrmcbreport.account
LEFT OUTER JOIN basecrmcbreport.CRM2CBURL_Lookup
ON account.Id = CRM2CBURL_Lookup.Key
LEFT OUTER JOIN basecrmcbreport.organizations
ON CRM2CBURL_Lookup.CB_URL_KEY = organizations.cb_url
cross Join (values (charindex('://', homepage_url))) a(a)
cross Join (values (iff(a = 0, 1, a + 3))) b(b)
cross Join (values (charindex('/', homepage_url, b))) c(c)
cross Join (values (iff(c = 0, length(homepage_url) + 1, c))) d(d)
cross Join (values (substring(homepage_url, b, d - b))) e(e)
Without the cross Joins

A cross join (or any other join) can resolve the column when you join to a SELECT, but not when you join to a table-valued function, because joins operate on tables only.
To join to a table-valued function you would need CROSS APPLY or OUTER APPLY, but these are not supported in Databricks SQL.
The following is the demo data I am using:
I tried using inner join on a table valued function using the following query and got the same error:
select d1.*,a from demo1 inner join (values(if(d1.team = 'OG',2,1))) a;
Joining to a SELECT subquery instead works, because that is what joins are designed for:
select d1.*,a.no_of_wins from demo1 d1 inner join (select id,case team when 'OG' then 2 when 'TS' then 1 end as no_of_wins from demo1) a on d1.id=a.id;
So, the remedy for this problem is to replace every table-valued function you are joining to with a SELECT statement.
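Applied to the query in the question, the chain of cross joins could be replaced by a chain of SELECTs, each adding one intermediate value. This is only a sketch, restricted to the organizations table: the CTE names (step_a to step_e) are made up for illustration, and it assumes Spark SQL's locate, if and concat in place of charindex, iff and the + operator, since + does not concatenate strings in Spark SQL.

```sql
-- Each CTE adds one intermediate column that the next step can reference.
WITH step_a AS (
  -- a: position of '://' in the URL (0 when absent)
  SELECT o.*, locate('://', o.homepage_url) AS a
  FROM basecrmcbreport.organizations o
),
step_b AS (
  -- b: first character after the scheme
  SELECT *, if(a = 0, 1, a + 3) AS b FROM step_a
),
step_c AS (
  -- c: first '/' at or after position b (0 when absent)
  SELECT *, locate('/', homepage_url, b) AS c FROM step_b
),
step_d AS (
  -- d: end of the host part
  SELECT *, if(c = 0, length(homepage_url) + 1, c) AS d FROM step_c
),
step_e AS (
  -- e: the host name itself
  SELECT *, substring(homepage_url, b, d - b) AS e FROM step_d
)
SELECT
  name AS nameCB,
  homepage_url,
  if(e LIKE 'www.%', e, concat('www.', e)) AS website
FROM step_e;
```

The remaining joins to account and CRM2CBURL_Lookup can then target step_e instead of organizations.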

Related

Cosmos DB Left Join

I have read all of the documentation for Cosmos DB, and it looks like it only supports the JOIN keyword, which seems to be a sort of INNER JOIN.
I have the following query:
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection
FROM c
JOIN s IN c.OtherCollection
)
AS c order by c.id
This works fine and returns the documents that have OtherCollection populated, but it obviously does not return any documents that do not have it populated.
The reason for the join is that sometimes I execute the following query (queries are built up from user input):
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection
FROM c
JOIN s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
AS c order by c.id
The question is how can I have a sort of LEFT JOIN operator in this scenario?
The Cosmos DB JOIN operation is limited to the scope of a single document. What is possible is joining a parent object with its child objects within the same document.
This is totally different from a SQL join, which works across two or more tables.
You can simulate a LEFT JOIN with an EXISTS expression.
E.g.:
SELECT VALUE c
FROM c
WHERE (
(c.OtherCollection = null) OR EXISTS (--Like a "Left Join"
SELECT null
FROM s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
)
--AND/OR Some other c Node conditions
order by c.id
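One caveat, sketched here on the assumption that some documents omit OtherCollection entirely: in Cosmos DB a missing property is undefined rather than null, so a comparison with null does not match those documents. The built-in IS_DEFINED and IS_NULL functions can cover both cases; whether you need both depends on how your documents are written.

```sql
SELECT VALUE c
FROM c
WHERE (
    NOT IS_DEFINED(c.OtherCollection)   -- property missing from the document
    OR IS_NULL(c.OtherCollection)       -- property present but explicitly null
    OR EXISTS (
        SELECT VALUE s
        FROM s IN c.OtherCollection
        WHERE s.PropertyName = 'SomeValue'
    )
)
ORDER BY c.id
```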

Postgresql jsonb -> invalid reference to FROM-clause entry for table "mt"

I'm trying to inner join multiple tables in order to bind jsonb with a name, but I'm getting this error:
ERROR: invalid reference to FROM-clause entry for table "mt"
A fiddle that recreates the problem is below.
SELECT test,jsonb_build_object(
'myData_updated',
json_agg(elems || jsonb_build_object('product_name', po.name))
)
FROM mainTable mt,
jsonb_array_elements(mt.myData) AS heading_elems,
jsonb_array_elements(heading_elems -> 'pItems') AS elems
JOIN products po ON (elems ->> 'pid' )::int = po.pid
INNER JOIN clients client ON client.client_id = mt.client_id
INNER JOIN projects project on project.project_id = mt.project_id
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=63e5b8a49940bb50b5bb7985a947c09e
I have tried removing the alias, but I still get the same error. Quick googling says it is caused by mixing JOIN with a comma-delimited FROM clause.
The syntax
table,
json_array_elements()
is the shortcut for
table CROSS JOIN LATERAL
json_array_elements()
So the , syntax you used is an implicit join, and the INNER JOINs that follow are explicit joins. Mixing the two does not always work, so replace the implicit syntax with the explicit one and the query works.
demo:db<>fiddle
Besides this, json_agg() is an aggregate function, so if you want to return other columns such as test you have to add a GROUP BY and/or apply aggregate functions to those columns.
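The reason the mix fails here is that JOIN binds more tightly than the comma, so the ON clauses of the explicit joins are resolved before mt is in scope. A sketch of the fully explicit form follows; it assumes test is a column of mainTable and adds the GROUP BY that the aggregate needs.

```sql
SELECT mt.test,
       jsonb_build_object(
           'myData_updated',
           json_agg(elems || jsonb_build_object('product_name', po.name))
       ) AS result
FROM mainTable mt
-- Explicit lateral joins replace the comma-separated function calls.
CROSS JOIN LATERAL jsonb_array_elements(mt.myData) AS heading_elems
CROSS JOIN LATERAL jsonb_array_elements(heading_elems -> 'pItems') AS elems
JOIN products po ON (elems ->> 'pid')::int = po.pid
JOIN clients client ON client.client_id = mt.client_id
JOIN projects project ON project.project_id = mt.project_id
-- json_agg is an aggregate, so the non-aggregated column is grouped.
GROUP BY mt.test;
```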

U-Sql not allowing non-equijoins

I have stumbled across a bit of an issue with U-SQL which for me is a problem I haven't yet found a workaround for.
It seems U-SQL doesn't support anything but == in joins, so you can't put > or < in the join condition itself.
For the use case below, which I have done in Oracle:
create table trf.test_1(
number_col int
);
insert into trf.test_1 VALUES (10);
insert into trf.test_1 VALUES (20);
insert into trf.test_1 VALUES (30);
insert into trf.test_1 VALUES (60);
drop table trf.test_2;
create table trf.test_2(
number_col int
);
insert into trf.test_2 VALUES (20);
insert into trf.test_2 VALUES (30);
SELECT t1.number_col, t2.number_col
FROM trf.test_1 t1
LEFT JOIN trf.test_2 t2 ON t1.number_col < t2.number_col
;
I get the following:
How might I do that in U-SQL without the < join?
I tried a cross join, but if you include the < in the WHERE clause it just turns into an inner join and you don't get the rows with the nulls.
Any ideas appreciated.
@t1 =
SELECT * FROM
( VALUES
(10),
(20),
(30),
(60)
) AS T(num_col);
@t2 =
SELECT * FROM
( VALUES
(20),
(30)
) AS T(num_col);
// Non-equality comparison moved into the WHERE clause of a cross join
@result =
SELECT t1.num_col, t2.num_col AS num_col_2
FROM @t1 AS t1
CROSS JOIN @t2 AS t2
WHERE t1.num_col < t2.num_col;
// Left join back to @t1 to recover the rows that have no match
@result2 =
SELECT t1.num_col, t2.num_col_2
FROM @t1 AS t1
LEFT JOIN @result AS t2 ON t1.num_col == t2.num_col;
OUTPUT @result2
TO "/Output/ReferenceGuide/Joins/exampleA.csv"
USING Outputters.Csv();
Edit - I added the left join from the @t1 dataset back to the @result set, which seems to work, but I would be interested if there are any better solutions out there. It seems like a bit of a workaround.
This is a known feature and is discussed extensively in the article "U-SQL SELECT Selecting from joins".
Some quotes from that article:
Join Comparisons
U-SQL, like most scaled out Big Data Query languages that support joins, restricts the join comparison to equality comparisons between columns in the rowsets to be joined...
...
If one has a non-equality comparison or a more complex expression (such as a method invocation) in the comparison, one can move the comparison to the SELECT’s WHERE clause. Or the more complex expression can be placed in an earlier SELECT statement’s column and then that alias can be referred to in the join comparison.
Basically, non-equality joins don't scale particularly well on a distributed platform like ADLA.

MSSQL: Use the result of nested sub-queries

The following works and results in the output shown in the image below.
SELECT
SU_Internal_ID,
NQ_QuestionText,
NA_AnswerText,
NoOfTimesChoosen
FROM
(SELECT
U.SU_Internal_ID,
NQ.NQ_QuestionText,
NA.NA_AnswerText,
COUNT(PC.UserID) AS NoOfTimesChoosen
FROM [dbo].[ParticipantNSChoices] PC
INNER JOIN [dbo].[KnowledgeSurveyAnswers] NA
on PC.NA_Internal_ID = NA.NA_Internal_ID
INNER JOIN [dbo].[KnowledgeSurveyQuestions] NQ
on PC.NQ_Internal_ID = NQ.NQ_Internal_ID
INNER JOIN [dbo].[AspNetUsers] U
on PC.UserID = U.Id
WHERE
U.SU_Internal_ID=1
and NQ.NQ_QuestionText LIKE '%Do you feel comfortable working with computers%'
GROUP
BY U.SU_Internal_ID,
NQ.NQ_QuestionText,
NA.NA_AnswerText ) as A
I want to add a column to show the percent for the two answers 'No' and 'Yes': so next to 'No' I want '20' and next to 'Yes' '80', but I'm pretty new at this and am stuck; I would appreciate any help. Thanks.
Result of working script
You don't need the outer SELECT.
SELECT
U.SU_Internal_ID,
NQ.NQ_QuestionText,
NA.NA_AnswerText,
COUNT(PC.UserID) AS NoOfTimesChoosen,
(cast(COUNT(PC.UserID) as float) /
cast(
(select count(*) from [dbo].[ParticipantNSChoices] PC2
INNER JOIN [dbo].[KnowledgeSurveyAnswers] NA2 on PC2.NA_Internal_ID = NA2.NA_Internal_ID
INNER JOIN [dbo].[KnowledgeSurveyQuestions] NQ2 on PC2.NQ_Internal_ID = NQ2.NQ_Internal_ID
INNER JOIN [dbo].[AspNetUsers] U2 on PC2.UserID = U2.Id
WHERE
U2.SU_Internal_ID=1
and NQ2.NQ_QuestionText LIKE '%Do you feel comfortable working with computers%' )
as float))
* 100 as PercentChosen
FROM [dbo].[ParticipantNSChoices] PC
INNER JOIN [dbo].[KnowledgeSurveyAnswers] NA
on PC.NA_Internal_ID = NA.NA_Internal_ID
INNER JOIN [dbo].[KnowledgeSurveyQuestions] NQ
on PC.NQ_Internal_ID = NQ.NQ_Internal_ID
INNER JOIN [dbo].[AspNetUsers] U
on PC.UserID = U.Id
WHERE
U.SU_Internal_ID=1
and NQ.NQ_QuestionText LIKE '%Do you feel comfortable working with computers%'
GROUP
BY U.SU_Internal_ID,
NQ.NQ_QuestionText,
NA.NA_AnswerText
The counts will be integers, so you need to cast as floats before dividing. You can then further format to your liking. Also, I might not have your exact denominator, because I don't know what your data looks like, but you can modify to match what you need.
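If repeating the joins in a subquery feels heavy, a window aggregate over the grouped counts is another option. This is a sketch rather than a drop-in replacement: it assumes the denominator you want is the total across all answer groups returned by the same WHERE clause, which matches the subquery above only when the filters are identical.

```sql
SELECT
    U.SU_Internal_ID,
    NQ.NQ_QuestionText,
    NA.NA_AnswerText,
    COUNT(PC.UserID) AS NoOfTimesChoosen,
    -- 100.0 forces decimal arithmetic; SUM(...) OVER () totals the per-group counts
    COUNT(PC.UserID) * 100.0 / SUM(COUNT(PC.UserID)) OVER () AS PercentChosen
FROM [dbo].[ParticipantNSChoices] PC
INNER JOIN [dbo].[KnowledgeSurveyAnswers] NA
    ON PC.NA_Internal_ID = NA.NA_Internal_ID
INNER JOIN [dbo].[KnowledgeSurveyQuestions] NQ
    ON PC.NQ_Internal_ID = NQ.NQ_Internal_ID
INNER JOIN [dbo].[AspNetUsers] U
    ON PC.UserID = U.Id
WHERE
    U.SU_Internal_ID = 1
    AND NQ.NQ_QuestionText LIKE '%Do you feel comfortable working with computers%'
GROUP BY
    U.SU_Internal_ID,
    NQ.NQ_QuestionText,
    NA.NA_AnswerText;
```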

Hive multiple subqueries

I'm using Hive 0.9.0 and I'm trying to execute a query like
`SELECT a.id, b.user FROM (SELECT...FROM a_table) a, (SELECT...FROM b_table) b WHERE a.date = b.date;`
but it returns the error "loop (...)+ does not match input....".
Does Hive support multiple subqueries in FROM just like Oracle DB?
Multiple subqueries are allowed in Hive.
I tested with the code below, and it works.
select * from (select id from test where id>10) a
join (select id from test where id>20) b on a.id=b.id;
Please post your exact code so that I can give a relevant solution.
Joining subqueries is absolutely supported.
I think the key problem is that you use SELECT...FROM.
The correct syntax is SELECT * FROM:
SELECT a.id, b.user
FROM
(SELECT * FROM a_table) a
JOIN (SELECT * FROM b_table) b ON a.date = b.date;
If you want to obtain the full Cartesian product before applying the WHERE clause, then instead of:
SELECT a.id, b.user FROM (SELECT...FROM a_table) a, (SELECT...FROM b_table) b WHERE a.date = b.date;
you should use 'join' in the middle, i.e.
SELECT a.id, b.user FROM (SELECT...FROM a_table) a join (SELECT...FROM b_table) b WHERE a.date = b.date;
Note that the query above is not admissible in strict mode.
