How to get sub query columns in main query with WHERE EXISTS in PostgreSQL?

I am stuck with a query that is slow because of its JOINs. I want to use WHERE EXISTS in place of JOIN since, performance-wise, EXISTS takes less time than a JOIN.
I have modified the query and it executes as expected, but I am not able to use the subquery's columns in my main query.
Here is my query:
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
WHERE EXISTS
(SELECT t.term_id,t.abbreviation
FROM terms t
WHERE (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id)
AND EXISTS
(SELECT st.grade_level,
st.sid
FROM students st
WHERE (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id)
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id
And I am getting this error:
ERROR: missing FROM-clause entry for table "st"
SQL state: 42P01
Character: 29
I have used only one column here as a sample, but I need to use more fields from the subquery.
My main question is how I can achieve this with EXISTS, or with an alternative solution that performs better.
I am using the pg module, since my back end is Node.js.
UPDATE
Query with JOIN:
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
LEFT JOIN terms t ON (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id
LEFT JOIN students st ON (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id
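The error arises because a subquery inside EXISTS only yields a true/false test for each outer row: its select list is discarded, so the alias st is never in scope for the outer SELECT. Also note that the UPDATE's LEFT JOIN students is not equivalent to the EXISTS filter; an INNER JOIN is closer, because it also drops schedule rows that have no matching student while exposing st's columns (a one-to-many match can duplicate rows, but that does not change a MAX). A minimal sketch of both points follows, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL and hypothetical, cut-down versions of the schedules and students tables.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (student_id TEXT, school_id TEXT);
    CREATE TABLE students  (sid TEXT, grade_level TEXT);
    INSERT INTO schedules VALUES ('s1', 'A'), ('s2', 'A'), ('s3', 'B'), ('s9', 'B');
    INSERT INTO students  VALUES ('s1', '9'), ('s2', '11'), ('s3', '10');
""")

# 1) Referencing st outside the EXISTS subquery fails: st is only in scope inside it.
EXISTS_QUERY = """
    SELECT sch.school_id, MAX(CAST(st.grade_level AS INTEGER)) AS grades
    FROM schedules sch
    WHERE EXISTS (SELECT 1 FROM students st WHERE st.sid = sch.student_id)
    GROUP BY sch.school_id
"""
try:
    conn.execute(EXISTS_QUERY)
    exists_error = None
except sqlite3.OperationalError as exc:
    exists_error = str(exc)  # e.g. "no such column: st.grade_level"

# 2) An INNER JOIN filters like EXISTS (s9 has no student row and is dropped)
#    and also makes st's columns available to the outer SELECT.
JOIN_QUERY = """
    SELECT sch.school_id, MAX(CAST(st.grade_level AS INTEGER)) AS grades
    FROM schedules sch
    JOIN students st ON st.sid = sch.student_id
    GROUP BY sch.school_id
    ORDER BY sch.school_id
"""
join_rows = conn.execute(JOIN_QUERY).fetchall()

print(exists_error)
print(join_rows)  # [('A', 11), ('B', 10)]
```

The same reasoning applies to the terms subquery: since none of its columns are needed in the outer query, it can stay as an EXISTS filter.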

Related

ATHENA/PRESTO complex query with multiple unnested tables

I would like to create a join over several tables:
table login: I would like to retrieve all the data from login
table logging: calculate Nb_of_sessions for each db and each user, for a specific event type
table meeting: calculate Nb_of_meetings for each db and each user
table live: calculate Nb_of_live for each db and each user
I have these queries, which return the right results:
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db);
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when I try to put it all together, I seem to get bad data (only one db is retrieved), and the query does not seem efficient:
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have any ideas?
Regards.

The easiest way is to use WITH to structure the subqueries and then reference them.
From the WITH parameter reference:
You can use WITH to flatten nested queries, or to simplify subqueries. The WITH clause precedes the SELECT list in a query and defines one or more subqueries for use within the SELECT query. Each subquery defines a temporary table, similar to a view definition, which you can reference in the FROM clause. The tables are used only when the query runs.
Since you already have working subqueries, the following should work:
with logins as
(
SELECT db.id AS dbid,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;
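Why aggregating in the CTEs first helps: joining the raw tables pairs every matching row of one table with every matching row of the others, which inflates counts, and inner joins silently drop users who are missing from any one table. A minimal sketch of that effect, assuming Python's built-in sqlite3 module as a stand-in for Athena/Presto (SQLite has no UNNEST, so the tables are hypothetical, already-flattened versions with made-up data):

```python
import sqlite3

# Hypothetical, simplified tables: u1 has several visits and meetings in d1 only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE logins   (dbid TEXT, userid TEXT);
    CREATE TABLE visits   (dbid TEXT, userid TEXT, sessionid TEXT);
    CREATE TABLE meetings (dbid TEXT, userid TEXT);
    INSERT INTO logins   VALUES ('d1', 'u1'), ('d2', 'u1');
    INSERT INTO visits   VALUES ('d1', 'u1', 'a'), ('d1', 'u1', 'b'), ('d1', 'u1', 'c');
    INSERT INTO meetings VALUES ('d1', 'u1'), ('d1', 'u1');
""")

# Joining the raw tables first: every visit row pairs with every meeting row
# (3 x 2 = 6), and the inner joins drop d2, which has no visits or meetings.
naive = conn.execute("""
    SELECT l.dbid, l.userid,
           COUNT(DISTINCT v.sessionid) AS nb_visits,
           COUNT(m.userid)             AS nb_meetings
    FROM logins l
    JOIN visits   v ON v.dbid = l.dbid AND v.userid = l.userid
    JOIN meetings m ON m.dbid = l.dbid AND m.userid = l.userid
    GROUP BY l.dbid, l.userid
    ORDER BY l.dbid
""").fetchall()

# Aggregating in CTEs first and LEFT JOINing the results keeps one row per
# login and counts each table independently.
with_ctes = conn.execute("""
    WITH v AS (SELECT dbid, userid, COUNT(DISTINCT sessionid) AS nb_visits
               FROM visits GROUP BY dbid, userid),
         m AS (SELECT dbid, userid, COUNT(*) AS nb_meetings
               FROM meetings GROUP BY dbid, userid)
    SELECT l.dbid, l.userid, v.nb_visits, m.nb_meetings
    FROM logins l
    LEFT JOIN v ON v.dbid = l.dbid AND v.userid = l.userid
    LEFT JOIN m ON m.dbid = l.dbid AND m.userid = l.userid
    ORDER BY l.dbid
""").fetchall()

print(naive)      # [('d1', 'u1', 3, 6)]
print(with_ctes)  # [('d1', 'u1', 3, 2), ('d2', 'u1', None, None)]
```

The question's combined query is a comma (cross) join with no db conditions, so its fan-out is even larger than in this simplified version.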

Is there any table which we can refer to get the partition name in Azure?

I want to know the partition name (similar to PART_NAME in Hive) of a table that has been partitioned on Azure. I've looked in the system tables (such as sys.partitions) for this information but was not able to figure it out, so I wanted to ask whether there are any other system tables I can query to get details about the partition name.
To make the question clearer, I've added a screenshot. In it I've highlighted the name of the column on which the partitioning is performed; I want to retrieve that column name. Is there any way to retrieve it from the system tables?

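You have to join several catalog views together to get useful partition information. Here is a sample:
SELECT s.[name] AS [schema]
,t.[name] AS [table]
, i.[name] AS [index_name]
, c.[name] AS [partition_column]
, p.[partition_id]
, p.[partition_number]
, p.[rows] AS [partition_rows]
, rng.[value] AS [partition_value]
, ds.[data_space_id]
, ps.[function_id]
from sys.tables t
join sys.schemas s
on s.[schema_id] = t.[schema_id]
join sys.indexes i
on t.[object_id] = i.[object_id]
join sys.partitions p
on p.[object_id] = i.[object_id] AND p.[index_id] = i.[index_id]
join sys.data_spaces ds
on i.[data_space_id] = ds.[data_space_id]
join sys.partition_schemes ps
on ps.[data_space_id] = i.[data_space_id]
join sys.partition_functions pf
on ps.[function_id] = pf.[function_id]
-- the partitioning column is the index column with partition_ordinal > 0
join sys.index_columns ic
on ic.[object_id] = i.[object_id] AND ic.[index_id] = i.[index_id] AND ic.[partition_ordinal] > 0
join sys.columns c
on c.[object_id] = ic.[object_id] AND c.[column_id] = ic.[column_id]
-- LEFT JOIN: n boundary values define n + 1 partitions, so one partition has no matching boundary_id
left join sys.partition_range_values rng
on pf.[function_id] = rng.[function_id] AND rng.[boundary_id] = p.[partition_number]
where s.[name] = @schema_name
and t.[name] = @table_name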
Which returns something like the following:

ADW - Query performance issues

I have an Azure SQL Data Warehouse (Gen2, DW500c) with a Data Vault model containing several tables.
I am trying to execute a query that I think is taking too much time.
Here is the query I have been executing:
SELECT
H_PROFITCENTER.[BK_PROFITCENTER]
,H_ACCOUNT.[BK_ACCOUNT]
,H_LOCALCURRENCY.[BK_CURRENCY]
,H_DOCUMENTCURRENCY.[BK_CURRENCY]
,H_COSTCENTER.[BK_COSTCENTER]
,H_COMPANY.[BK_COMPANY]
,H_CURRENCY.[BK_CURRENCY]
,H_INTERNALORDER.[BK_INTERNALORDER]
,H_VERSION.[BK_VERSION]
,H_COSTELEMENT.[BK_COSTELEMENT]
,H_CALENDARDATE.[BK_DATE]
,H_VALUETYPEREPORT.[BK_VALUETYPEREPORT]
,H_FISCALPERIOD.[BK_FISCALPERIOD]
,H_COUNTRY.[BK_COUNTRY]
,H_FUNCTIONALAREA.[BK_FUNCTIONALAREA]
,SLADI.[LINE_ITEM]
,SLADI.[AMOUNT]
,SLADI.[CREDIT]
,SLADI.[DEBIT]
,SLADI.[QUANTITY]
,SLADI.[BALANCE]
,SLADI.[LOADING_DATE]
FROM [dwh].[L_ACCOUNTINGDOCUMENTITEMS] AS LADI
INNER JOIN [dwh].[SL_ACCOUNTINGDOCUMENTITEMS] AS SLADI ON LADI.[HK_ACCOUNTINGDOCUMENTITEMS] = SLADI.[HK_ACCOUNTINGDOCUMENTITEMS]
LEFT JOIN dwh.H_PROFITCENTER AS H_PROFITCENTER ON H_PROFITCENTER.[HK_PROFITCENTER] = LADI.[HK_PROFITCENTER]
LEFT JOIN dwh.H_ACCOUNT AS H_ACCOUNT ON H_ACCOUNT.[HK_ACCOUNT] = LADI.[HK_ACCOUNT]
LEFT JOIN dwh.H_CURRENCY AS H_LOCALCURRENCY ON H_LOCALCURRENCY.[HK_CURRENCY] = LADI.[HK_LOCALCURRENCY]
LEFT JOIN dwh.H_CURRENCY AS H_DOCUMENTCURRENCY ON H_DOCUMENTCURRENCY.[HK_CURRENCY] = LADI.[HK_DOCUMENTCURRENCY]
LEFT JOIN dwh.H_COSTCENTER AS H_COSTCENTER ON H_COSTCENTER.[HK_COSTCENTER] = LADI.[HK_COSTCENTER]
LEFT JOIN dwh.H_COMPANY AS H_COMPANY ON H_COMPANY.[HK_COMPANY] = LADI.[HK_COMPANY]
LEFT JOIN dwh.H_CURRENCY AS H_CURRENCY ON H_CURRENCY.[HK_CURRENCY] = LADI.[HK_CURRENCY]
LEFT JOIN dwh.H_INTERNALORDER AS H_INTERNALORDER ON H_INTERNALORDER.[HK_INTERNALORDER] = LADI.[HK_INTERNALORDER]
LEFT JOIN dwh.H_VERSION AS H_VERSION ON H_VERSION.[HK_VERSION] = LADI.[HK_VERSION]
LEFT JOIN dwh.H_COSTELEMENT AS H_COSTELEMENT ON H_COSTELEMENT.[HK_COSTELEMENT] = LADI.[HK_COSTELEMENT]
LEFT JOIN dwh.H_DATE AS H_CALENDARDATE ON H_CALENDARDATE.[HK_DATE] = LADI.[HK_CALENDARDATE]
LEFT JOIN dwh.H_VALUETYPEREPORT AS H_VALUETYPEREPORT ON H_VALUETYPEREPORT.[HK_VALUETYPEREPORT] = LADI.[HK_VALUETYPEREPORT]
LEFT JOIN dwh.H_FISCALPERIOD AS H_FISCALPERIOD ON H_FISCALPERIOD.[HK_FISCALPERIOD] = LADI.[HK_FISCALPERIOD]
LEFT JOIN dwh.H_COUNTRY AS H_COUNTRY ON H_COUNTRY.[HK_COUNTRY] = LADI.[HK_COUNTRY]
LEFT JOIN dwh.H_FUNCTIONALAREA AS H_FUNCTIONALAREA ON H_FUNCTIONALAREA.[HK_FUNCTIONALAREA] = LADI.[HK_FUNCTIONALAREA]
This query takes 22 minutes to execute.
Note that it returns around 1,200,000,000 rows.
[L_ACCOUNTINGDOCUMENTITEMS] and [SL_ACCOUNTINGDOCUMENTITEMS] are hash-distributed on the [HK_ACCOUNTINGDOCUMENTITEMS] column, and all other tables were created as replicated tables.
I have also enabled automatic statistics creation in Azure SQL Data Warehouse.
Can anyone help me understand how I can speed it up?

Here are some things to try to see whether you can make this faster:
Create a table from your query using CREATE TABLE AS SELECT (CTAS) with the ROUND_ROBIN distribution option, and time that. I suspect that returning such a large number of rows to your client could be a big contributor to the total time. If the CTAS finishes in, say, 5 minutes, you can safely conclude that the rest of the time is spent returning the rows.
If not, you can materialize some of the left joins into a table and then add that table to the main query to see whether that finishes faster.
You can also look at the explain plans to see whether you can cut down some steps by aligning the tables on a common distribution key.
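The CTAS test separates "how long the query takes" from "how long it takes to ship the rows to the client". A minimal sketch of the pattern, assuming Python's built-in sqlite3 module and a hypothetical src table in place of the warehouse join. SQLite runs in-process and has no DISTRIBUTION = ROUND_ROBIN option, so the timings here will not reproduce the network cost; the sketch only shows the shape of the comparison.

```python
import sqlite3
import time

# Hypothetical source table standing in for the joined result set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO src VALUES (?, ?)",
                 ((i, i * 0.5) for i in range(200_000)))

QUERY = "SELECT id, amount FROM src WHERE id % 2 = 0"

# (a) Run the query and pull every row back to the client.
start = time.perf_counter()
fetched = conn.execute(QUERY).fetchall()
fetch_seconds = time.perf_counter() - start

# (b) CTAS: materialize the same result into a table instead of returning it.
start = time.perf_counter()
conn.execute(f"CREATE TABLE result_ctas AS {QUERY}")
ctas_seconds = time.perf_counter() - start

ctas_rows = conn.execute("SELECT COUNT(*) FROM result_ctas").fetchone()[0]
print(f"fetch: {fetch_seconds:.3f}s, CTAS: {ctas_seconds:.3f}s")
print(len(fetched), ctas_rows)  # 100000 100000
```

Against the real warehouse, a large gap between (a) and (b) would point at result transfer rather than at the joins themselves.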

Is it possible to chain subsequent queries' where clauses in Dapper based on the results of a previous query in the same connection?

Is it possible to use .QueryMultiple (or some other method) in Dapper so that the results of each query are used in the WHERE clause of the next one, without having to run each query individually, get the id, call .Query again, get the next id, and so on?
For example:
string sqlString = @"select tableA_id from tableA where tableA_lastname = @lastname;
select tableB_id from tableB WHERE tableB_id = tableA_id";
db.QueryMultiple(sqlString, new {lastname = "smith"});
Is something like this possible with Dapper or do I need a view or stored procedure to accomplish this? I can use multiple joins for one SQL statement, but in my real query there are 7 joins, and I didn't think I should return 7 objects.
Right now I'm just using object.

You can store the results of each previous query in a table variable, then select from that variable and use it in the next query, for example:
-- 1) Store the first query's results in a table variable
DECLARE @TableA AS TABLE(
tableA_id INT
-- ... all other columns you need
)
INSERT INTO @TableA
SELECT tableA_id
FROM tableA
WHERE tableA_lastname = @lastname
-- 2) Return the stored rows as the first result set
SELECT *
FROM @TableA
-- 3) Use the stored ids in the next query
SELECT tableB_id
FROM tableB
JOIN @TableA a ON tableB_id = a.tableA_id
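The same "stash, then join" idea can be tried outside SQL Server. A minimal sketch, assuming Python's built-in sqlite3 module, where a connection-scoped TEMP table plays the role of the T-SQL table variable; the tableB_note column and the data are hypothetical. With Dapper, the T-SQL batch would presumably be sent in one QueryMultiple call and each result set read in turn with the returned reader's Read<T>() method.

```python
import sqlite3

# Hypothetical tables mirroring the question's tableA/tableB.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (tableA_id INTEGER, tableA_lastname TEXT);
    CREATE TABLE tableB (tableB_id INTEGER, tableB_note TEXT);
    INSERT INTO tableA VALUES (1, 'smith'), (2, 'jones'), (3, 'smith');
    INSERT INTO tableB VALUES (1, 'b-one'), (2, 'b-two'), (3, 'b-three');
""")

# Step 1: store the first query's result in a temp table; it is visible
# only to this connection, much like a T-SQL table variable in a batch.
conn.execute("CREATE TEMP TABLE tmpA (tableA_id INTEGER)")
conn.execute(
    "INSERT INTO tmpA SELECT tableA_id FROM tableA WHERE tableA_lastname = :lastname",
    {"lastname": "smith"},
)

# Step 2: return the stored ids, if the caller needs them.
a_ids = [row[0] for row in conn.execute("SELECT tableA_id FROM tmpA ORDER BY tableA_id")]

# Step 3: use the stored ids in the next query by joining to the temp table.
b_rows = conn.execute("""
    SELECT b.tableB_id, b.tableB_note
    FROM tableB b
    JOIN tmpA a ON b.tableB_id = a.tableA_id
    ORDER BY b.tableB_id
""").fetchall()

print(a_ids)   # [1, 3]
print(b_rows)  # [(1, 'b-one'), (3, 'b-three')]
```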

Subquery instead of column name vs Inner Join

I'm fairly new to SQL and have started running into subqueries such as the one in this query:
SELECT C.CustomerID
, C.Name
, ( Select PhoneNumber
FROM PhoneNumberTable P
WHERE P.CustomerID = C.CustomerID ) AS "PhoneNumber"
FROM CustomerTable C
Compare that with this query, which uses a join:
SELECT C.CustomerID
, C.Name
, P.PhoneNumber
FROM CustomerTable C
JOIN PhoneNumberTable P
ON P.customerID = C.customerID
Is there a difference in terms of efficiency/speed? The SQL I am working with has several subqueries like the one above (and no JOINs), and it is difficult to read.

In my experience joins tend to be faster, but sometimes you need a subquery.
You should also look into CTEs; they are very useful and (in my opinion) much easier to manage.
In your specific case I would use a join, because you are trying to join the two tables together.
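Besides speed, the two queries in the question are not always interchangeable. The scalar subquery keeps every customer and yields NULL when there is no phone row, so it behaves like a LEFT JOIN, while an INNER JOIN drops those customers. If a customer has several phone numbers, a join returns one row per number; SQL Server raises an error for a scalar subquery that returns more than one row, whereas SQLite silently uses the first row. A minimal sketch, assuming Python's built-in sqlite3 module and hypothetical data:

```python
import sqlite3

# Hypothetical data: Bob has no phone number on file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE CustomerTable    (CustomerID INTEGER, Name TEXT);
    CREATE TABLE PhoneNumberTable (CustomerID INTEGER, PhoneNumber TEXT);
    INSERT INTO CustomerTable    VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO PhoneNumberTable VALUES (1, '555-0100');
""")

# Scalar subquery in the select list: every customer is kept; no match gives NULL.
subquery_rows = conn.execute("""
    SELECT C.CustomerID, C.Name,
           (SELECT P.PhoneNumber FROM PhoneNumberTable P
            WHERE P.CustomerID = C.CustomerID) AS PhoneNumber
    FROM CustomerTable C
    ORDER BY C.CustomerID
""").fetchall()

# INNER JOIN: customers without a phone row are dropped.
inner_rows = conn.execute("""
    SELECT C.CustomerID, C.Name, P.PhoneNumber
    FROM CustomerTable C
    JOIN PhoneNumberTable P ON P.CustomerID = C.CustomerID
    ORDER BY C.CustomerID
""").fetchall()

# LEFT JOIN: same output as the subquery while each customer has at most one phone.
left_rows = conn.execute("""
    SELECT C.CustomerID, C.Name, P.PhoneNumber
    FROM CustomerTable C
    LEFT JOIN PhoneNumberTable P ON P.CustomerID = C.CustomerID
    ORDER BY C.CustomerID
""").fetchall()

print(subquery_rows)  # [(1, 'Ann', '555-0100'), (2, 'Bob', None)]
print(inner_rows)     # [(1, 'Ann', '555-0100')]
print(left_rows)      # [(1, 'Ann', '555-0100'), (2, 'Bob', None)]
```

So when rewriting the subqueries as joins for readability, a LEFT JOIN is the safer like-for-like replacement.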
