Reference join on a combined key in Azure Stream Analytics - azure

I'm trying to filter data based on a reference table on a combined key. I acutally found a solution that seems to work:
SELECT
i.id
, i.timestamp
, i.PropertyName
, i.PropertyValue
FROM iothub AS i
LEFT JOIN Reference AS R
ON CONCAT(i.id, '|', 'i.PropertyName) = R.uid
WHERE R.keepIt = 1
But if I do this I get a warning that my query contains a JOIN with no key selector which will be translated into a CROSS JOIN.
I tested the method and it seems to result in the correct results, but I'm afraid that there may be side effects later on through a maybe CROSS JOIN. Or may I just ignore this Azure warning, as it does not apply in my case?

The CONCAT(i.id, '|', 'i.PropertyName) = R.uid is not a key selector, since left side of the equality is an expression and not a column reference.
So this will be translated to the CROSS JOIN followed by a filter as the warning suggests.
This is a warning and does not affect the functional correctness of the result.
You can project the expression as a column before doing reference data join and then it will be proper key lookup join. Here is what your example query will look like:
SELECT
i.id
, i.timestamp
, i.PropertyName
, i.PropertyValue
FROM (SELECT id, timestamp, PropertyName, PropertyValue,
uid = CONCAT(id, '|', PropertyName)
FROM iothub) AS i
LEFT JOIN Reference AS R
ON i.uid = R.uid
WHERE R.keepIt = 1
Of cause, the sub-select can also be put into a separate step.

Related

Databricks AnalysisException: Column 'l' does not exist

I have a very strange occurrence with my code.
I keep on getting the error
AnalysisException: Column 'homepage_url' does not exist
However, when I do a select with cross Joins the column does actually exist.
Can someone take a look at my cross joins and let me know if that is where the problem is
SELECT DISTINCT
account.xpd_relationshipstatus AS CRM_xpd_relationshipstatus
,REPLACE(owneridname,'Data.Import #','') AS MontaguOwner
,account.ts_montaguoffice AS Montagu_Office
,CAST(account.ts_reminderdatesetto AS DATE) AS CRM_ts_reminderdatesetto
,CAST(account.ts_lastdatestatuschanged AS DATE) AS YearofCRMtslastdatestatuschanged
,organizations.name AS nameCB
,organizations.homepage_url
,iff(e like 'www.%', e, 'www.' + e) AS website
,left(category_list,charindex(',',category_list +',' )-1) AS category_CB
-- ,case when charindex(',',category_list,0) > 0 then left(category_list,charindex(',',category_list)-1) else category_list end as category_CB
,organizations.category_groups_list AS category_groups_CB
FROM basecrmcbreport.account
LEFT OUTER JOIN basecrmcbreport.CRM2CBURL_Lookup
ON account.Id = CRM2CBURL_Lookup.Key
LEFT OUTER JOIN basecrmcbreport.organizations
ON CRM2CBURL_Lookup.CB_URL_KEY = organizations.cb_url
cross Join (values (charindex('://', homepage_url))) a(a)
cross Join (values (iff(a = 0, 1, a + 3))) b(b)
cross Join (values (charindex('/', homepage_url, b))) c(c)
cross Join (values (iff(c = 0, length(homepage_url) + 1, c))) d(d)
cross Join (values (substring(homepage_url, b, d - b))) e(e)
Without the cross Joins
The main reason for cross join (or any join) to recognize the column when you select not when using table valued functions is that joins are used on tables only.
To use table valued functions, one must use cross apply or outer apply. But these are not supported in Databricks sql.
The following is the demo data I am using:
I tried using inner join on a table valued function using the following query and got the same error:
select d1.*,a from demo1 inner join (values(if(d1.team = 'OG',2,1))) a;
Instead, using the select query, the joins work as that is how they function:
select d1.*,a.no_of_wins from demo1 d1 inner join (select id,case team when 'OG' then 2 when 'TS' then 1 end as no_of_wins from demo1) a on d1.id=a.id;
So, the remedy for this problem is to replace all the table valued functions on which you are using joins with SELECT statements.

Couchbase N1QL query with joined subquery with USE KEYS

I need to write a n1ql query which demands another sub-query in select clause. As it is mandatory to use 'USE KEYS' while writing subqueries in n1ql. How to write USE KEYS clause for an inner joined query, below is an example of same case:
select meta(m).id as _ID, meta(m).cas as _CAS,
(select c.description
from bucketName p join bucketName c on p.categoryId = c.categoryId and p.type='product' and
c.type='category' and p.masterId=m.masterId ) as description //--How to use USE KEYS here ?
from bucketName m where m.type='master' and m.caseId='12345'
My requirment is to fetch some value from another 2 joined tables. however, I simplified above query to make it more understandable.
Please suggest the correct way to implement.
Also, is writting
sub-queries in n1ql is better than fetching documents seperatly and
merging them in coding?
Non FROM CLAUSE, correlated sub queries requires USE KEYS due to global secondary indexes queries can take long time and resources. This is restriction at present in the N1QL. If you can derive p's document key from the m you can give that as USE KEYS in p.
Otherwise you have two options
Option 1: As your subquery is in the projection Use ANSI JOIN https://blog.couchbase.com/ansi-join-support-n1ql/
SELECT META(m).id AS _ID, META(m).cas AS _CAS, c.description
FROM bucketName AS m
LEFT JOIN bucketName AS p ON p.masterId=m.masterId AND p.type='product'
LEFT JOIN bucketName AS c ON c.type='category' AND p.categoryId = c.categoryId
WHERE m.type='master' AND m.caseId='12345';
CREATE INDEX ix1 ON (caseId) WHERE type='master';
CREATE INDEX ix2 ON (masterId, categoryId) WHERE type='product';
CREATE INDEX ix3 ON (categoryId, description) WHERE type='category';
NOTE: If there is no Unique relation m to p to c JOIN can produce more results.
If that is case, you can do GROUP BY META(m).id, META(m).cas and
ARRAY_AGG(c.description). All descriptions are given as ARRAY.
Option 2:
As described by you issue two separate quires and merge in the application.

Postgresql jsonb -> invalid reference to FROM-clause entry for table "mt"

So I'm trying to inner join multiple tables in order to bind jsonb with a name. But I'm getting this error.
ERROR: invalid reference to FROM-clause entry for table "mt"
Find the recreational fiddle of the problem below.
SELECT test,jsonb_build_object(
'myData_updated',
json_agg(elems || jsonb_build_object('product_name', po.name))
)
FROM mainTable mt,
jsonb_array_elements(mt.myData) AS heading_elems,
jsonb_array_elements(heading_elems -> 'pItems') AS elems
JOIN products po ON (elems ->> 'pid' )::int = po.pid
INNER JOIN clients client ON client.client_id = mt.client_id
INNER JOIN projects project on project.project_id = mt.project_id
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=63e5b8a49940bb50b5bb7985a947c09e
I have tried removing alias i still get the same error. Quick googling says it is caused because of JOIN & ", delimited FROM" query
The syntax
table,
json_array_elements()
is the shortcut for
table CROSS JOIN LATERAL
json_array_elements()
So with the , syntax you used is an implicit join. After that, with using INNER JOIN you are using an explicit join. The mix is not always working, so replace the implicit syntax with an explicit one and it works.
demo:db<>fiddle
Beside this, the used function json_agg() is an aggregate, so if you want to get other columns like test you have to do a GROUP BY and/or use more aggregate functions on these columns.

SQLAlchemy: Referencing labels in SELECT subqueries

I'm trying to figure out how to replicate the below query in SQLAlchemy
SELECT c.company_id AS company_id,
(SELECT policy_id FROM associative_table at WHERE at.company_id = c.company_id) AS policy_id_ref,
(SELECT `default` FROM policy p WHERE p.policy_id = policy_id_ref) AS `default`,
FROM company c;
Note that this is a stripped down, basic example of what I'm really dealing with. The actual schema supports data and relationship versioning that requires the subqueries to include additional conditions, sorting, and limiting, making it impractical (if not impossible) for them to be joins.
The crux of the problem is in how the second subquery relies on policy_id_ref -- the value obtained from the first subquery. In SQLAlchemy, this is effectively what I have now:
ct = aliased(classes.company)
at = aliased(classes.associative_table)
pt = aliased(classes.policy)
policy_id_ref = session.query(at.policy_id).\
filter(at.company_id == ct.company_id).\
label('policy_id_ref')
policy_default = session.query(pt.default).\
filter(pt.id == 'policy_id_ref').\
label('default')
query = session.query(ct.company_id,policy_id_ref,policy_default)
The pull from the "company" table works fine as does the first subquery that retrieves the "policy_id_ref" column. The problem is the second subquery that has to reference that "policy_id_ref" column. I don't know how to write its filter in such a way that it literally renders "policy_id_ref" in the resulting query, to match the label of the first subquery.
Suggestions?
Thanks in advance
You can write your query as
select(
Companies.company_id,
AssociativeTable.policy_id.label('policy_id_ref'),
Policy.default.label('policy_default'),
).select_from(
Companies,
).join(
AssociativeTable,
AssociativeTable.company_id == Companies.company_id,
).join(
Policy,
AssociativeTable.policy_id == Policy.id
)
but in case you need reference to label from subquery => use literal_column
from sqlalchemy import func, select, literal_column
session.query(
func.array_agg(
literal_column('batch_info'),
JSONB
).label('history')
).select_from(
select(
func.jsonb_build_object(
'batch_id', AccountingQueueBatch.id,
'batch_label', AccountingQueueBatch.label,
).label('batch_info')
).select_from(
AccountingQueueBatch,
)
)

Postgres SQL Joins for Many To Many Relationship

right now I am "learning" Postgres SQL. I have 3 tables:
1) User: userId
2) Stack :stackId
3) User_Stack: userId, stackId
Now I want to fetch all stacks belonging to one user, given the userId. I understand I need to use Joins, but thats were I get stuck... I try it like this:
SELECT * FROM "Stack" LEFT OUTER JOIN "User_Stack" ON ('User_Stack.stackId' = 'Stack.stackId') WHERE "userId" = '590855';
Error: The returned data is empty.
PS: Is there any GUI Query builder out there ? Or do you have any other tips how to systematically create queries ?
EDIT: If I change the query to this:
SELECT * FROM "Stack" INNER JOIN "User_Stack" ON (User_Stack.stackId = Stack.stackId) WHERE "userId" = '590855';
I get the following error:
Kernel error: ERROR: missing FROM-clause entry for table "user_stack"
LINE 1: SELECT * FROM "Stack" INNER JOIN "User_Stack" ON (User_Stack...
Your main error is in the join. If you do 'something' = 'other' you're comparing string literals, not getting anything from the database. So this will always return false. You will want to compare table1.field1 = table2.field2
Another thing is the LEFT OUTER JOIN. I'm pretty sure you want an INNER JOIN since you want only fields that exist in the other table.
Also don't use double quotes for fields and table names since then the database will require case sensitivity and usually it's not good to have case sensitive names. You can use them with lowercase names if you need and always create them in lowercase.
Numbers also don't need to be quoted, it will just cause more processing when the system has to convert them from text to numbers.

Resources