ATHENA/PRESTO complex query with multiple unnested tables - presto

i have i would like to create a join over several tables.
table login : I would like to retrieve all the data from login
table logging : calculating the Nb_of_sessions for each db & for each a specific event type by user
table meeting : calculating the Nb_of_meetings for each db & for each user
table live : calculating the Nb_of_live for each db & for each user
I have those queries with the right results :
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when i begin to try put it all together, it seems i retrieve bad data (i have only on db retrieved) and it seems not efficient.
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have an idea ?
Regards.

The easiest way is to work with with to structure the subquery and then reference them.
with parameter reference:
You can use WITH to flatten nested queries, or to simplify subqueries.
The WITH clause precedes the SELECT list in a query and defines one or
more subqueries for use within the SELECT query.
Each subquery defines a temporary table, similar to a view definition,
which you can reference in the FROM clause. The tables are used only
when the query runs.
Since you already have working sub queries, the following should work:
with logins as
(
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by db.id,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;

Related

Cosmos DB Left Join

All of the documentation for Cosmos DB and it looks like it only supports the JOINkeyword, which seems to be a sort of INNER JOIN.
I have the following query:
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection,
FROM c
JOIN s IN c.OtherCollection
)
AS c order by c.id
This works fine and returns the data of documents that have OtherCollection populated. But It obviously does not return any documents that do not have it populated.
The reason for the join is that sometimes I execute the following query (queries are built up from user input)
SELECT * FROM
(
SELECT
DISTINCT(c.id),
c.OtherCollection,
FROM c
JOIN s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
AS c order by c.id
The question is how can I have a sort of LEFT JOIN operator in this scenario?
CosmosDB JOIN operation is limited to the scope of a single document. What possible is you can join parent object with child objects under same document.
It is totally different from SQL Join query which supports across two/many tables.
You can simulate LEFT JOIN with the EXISTS sentence.
Eg:
SELECT VALUE c
FROM c
WHERE (
(c.OtherCollection = null) OR EXISTS (--Like a "Left Join"
SELECT null
FROM s IN c.OtherCollection
WHERE s.PropertyName = 'SomeValue'
)
)
--AND/OR Some other c Node conditions
order by c.id

Is it possible to chain subsequent queries's where clauses in Dapper based on the results of a previous query in the same connection?

Is it possible to use .QueryMultiple (or some other method) in Dapper, and use the results of each former query to be used in the where clause of the next query, without having to do each query individually, get the id, and then .Query again, get the id and so on.
For example,
string sqlString = #"select tableA_id from tableA where tableA_lastname = #lastname;
select tableB_id from tableB WHERE tableB_id = tableA_id";
db.QueryMultiple.(sqlString, new {lastname = "smith"});
Is something like this possible with Dapper or do I need a view or stored procedure to accomplish this? I can use multiple joins for one SQL statement, but in my real query there are 7 joins, and I didn't think I should return 7 objects.
Right now I'm just using object.
You can store every previous query in table parameter and then first perform select from the parameter and query for next, for example:
DECLARE #TableA AS Table(
tableA_id INT
-- ... all other columns you need..
)
INSERT #TableA
SELECT tableA_id
FROM tableA
WHERE tableA_lastname = #lastname
SELECT *
FROM #TableA
SELECT tableB_id
FROM tableB
JOIN tableA ON tableB_id = tableA_id

MySQL: Is there a better alternative to subquery in each left join

My application needs to get some basic data from a user table with primary key user_id - and various other data about the user from secondary tables, each of which has user_id as a foreign key. There are a bunch of these secondary tables such as name, addresss, phone, etcetera - things about a person that can change over time.
More specifically, I need only some values from the most recent row from each secondary table. Each table has a "latest" column which is unix timestamp of the most recent UPDATE or INSERT (we must not delete in this application).
The following works correctly:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables */
FROM user u
LEFT JOIN user_org uo ON (uo.user_id = u.user_id AND
uo.latest in (select max(latest) from name uo1
where uo1.user_id = u.user_id))
/* here, other LEFT JOINs like the above one */
WHERE u.username = :username
However, a subquery solution is widely discouraged due to slowness, and some of these queries will run on every request. So I came up with the following that works in some cases and gets rid of the subquery:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables, etc. */
FROM user u
INNER JOIN
( SELECT user_id, MAX(latest) utd
FROM user_org
GROUP BY user_id
) uo1 ON uo1.user_id = u.user_id
LEFT JOIN user_org uo
ON (uo.user_id = u.user_id and uo.latest = uo1.utd)
/* here, other clauses like the part from 'FROM' to here */
WHERE u.username = :username
The latter, unfortunately makes a hard dependence on data in the secondary table, so that the whole query fails if data is lacking in any secondary table for the particular user.
I've researched this on SO and www and there are many solutions for avoiding subqueries, but everything I've found on the subject has the issue in the main query, not in a left join.
The logic I need is "if there's data for this user in this secondary table, get the specified column(s) from the most recent row in that table, otherwise a null".
It seems to me that putting a "current row" marker column on the most recent row in each table would avoid the whole issue and run faster than any other solution, but would be against normalization (I would still have to have the 'latest' column to maintain order-able history of previous data).
Is there a solution that gets normalization + speed? This is mariadb so it needs Mysql syntax.
EDIT: Still would like a better way, but decided to go with the extra column. Now the problem described above is avoided, and the SELECT SQL is much simplified and presumably faster. The downside is adding complexity in saves, but SELECT is more frequent.
MariaDB supports ROW_NUMBER as of version 10.2:
SELECT
u.username,
u.user_id,
u.password,
u.email,
u.active,
uo.organization_id,
...
FROM user u
LEFT JOIN
(
select
user_org.*,
row_number() over(partition by user_id order by latest desc) as rn
from user_org
) uo ON uo.user_id = u.user_id AND uo.rn = 1
...
WHERE u.username = :username;

How to get sub query columns in main query with WHERE EXISTS in PostgreSQL?

I am stuck with a query which takes more time in JOIN, I want to use WHERE EXISTS in place of JOIN since as performance wise EXISTS takes less time than it.
I have modified the query and it's executing as per expectation but I am not able to use sub query's columns in my main query
Here is my query
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
WHERE EXISTS
(SELECT t.term_id,t.abbreviation
FROM terms t
WHERE (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id)
AND EXISTS
(SELECT st.grade_level,
st.sid
FROM students st
WHERE (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id)
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id
And I am getting this error:
ERROR: missing FROM-clause entry for table "st"
SQL state: 42P01
Character: 29
I have only used one column here just for sample but I have to use more fields from sub query.
My main aim is that how can I achieve this with EXISTS or any alternate solution which is more optimal as performance wise
I am using pg module on Node.js since as back end I am using Node.js.
UPDATE
Query with JOIN
SELECT MAX(st.grade_level::integer) AS grades ,
scl.sid AS org_sourced_id
FROM schedules_53b055b75cd237fde3af904c1e726e12 sch
LEFT JOIN schools scl ON(sch.school_id=scl.school_id)
AND scl.batch_id=sch.batch_id
AND scl.client_id = sch.client_id
AND sch.run_id = scl.run_id
LEFT JOIN terms t ON (sch.term = t.term_id)
AND t.batch_id=sch.batch_id
AND t.client_id = sch.client_id
AND t.run_id = sch.run_id
LEFT JOIN students st ON (sch.student_id=st.sid)
AND st.batch_id= sch.batch_id
AND st.client_id = sch.client_id
AND st.run_id = sch.run_id
GROUP BY scl.sid ,
sch.course_name ,
sch.course_number,
sch.school_id

Subsonic 3 Simple Query inner join sql syntax

I want to perform a simple join on two tables (BusinessUnit and UserBusinessUnit), so I can get a list of all BusinessUnits allocated to a given user.
The first attempt works, but there's no override of Select which allows me to restrict the columns returned (I get all columns from both tables):
var db = new KensDB();
SqlQuery query = db.Select
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
The second attept allows the column name restriction, but the generated sql contains pluralised table names (?)
SqlQuery query = new Select( new string[] { BusinessUnitTable.IdColumn, BusinessUnitTable.NameColumn } )
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
Produces...
SELECT [BusinessUnits].[Id], [BusinessUnits].[Name]
FROM [BusinessUnits]
INNER JOIN [UserBusinessUnits]
ON [BusinessUnits].[Id] = [UserBusinessUnits].[BusinessUnitId]
WHERE [BusinessUnits].[RecordStatus] = #0
AND [UserBusinessUnits].[UserId] = #1
So, two questions:
- How do I restrict the columns returned in method 1?
- Why does method 2 pluralise the column names in the generated SQL (and can I get round this?)
I'm using 3.0.0.3...
So far my experience with 3.0.0.3 suggests that this is not possible yet with the query tool, although it is with version 2.
I think the preferred method (so far) with version 3 is to use a linq query with something like:
var busUnits = from b in BusinessUnit.All()
join u in UserBusinessUnit.All() on b.Id equals u.BusinessUnitId
select b;
I ran into the pluralized table names myself, but it was because I'd only re-run one template after making schema changes.
Once I re-ran all the templates, the plural table names went away.
Try re-running all 4 templates and see if that solves it for you.

Resources