My application needs to get some basic data from a user table with primary key user_id, plus various other data about the user from secondary tables, each of which has user_id as a foreign key. There are a bunch of these secondary tables, such as name, address, phone, etc. - things about a person that can change over time.
More specifically, I need only some values from the most recent row of each secondary table. Each table has a "latest" column, which holds the Unix timestamp of the most recent UPDATE or INSERT (we must never delete in this application).
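For reference, one of these secondary tables (user_org, which appears in the queries below) looks roughly like this, simplified:
CREATE TABLE user_org (
    user_org_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id         INT UNSIGNED NOT NULL,  -- foreign key to user.user_id
    organization_id INT UNSIGNED NOT NULL,
    latest          INT UNSIGNED NOT NULL,  -- unix timestamp of the last INSERT/UPDATE
    KEY idx_user_latest (user_id, latest)   -- helps the max-latest-per-user lookups
);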
The following works correctly:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables */
FROM user u
LEFT JOIN user_org uo ON (uo.user_id = u.user_id AND
uo.latest in (select max(latest) from user_org uo1
where uo1.user_id = u.user_id))
/* here, other LEFT JOINs like the above one */
WHERE u.username = :username
However, correlated subqueries are widely discouraged as slow, and some of these queries will run on every request. So I came up with the following, which works in some cases and gets rid of the correlated subquery:
SELECT u.username, u.user_id, u.password, u.email, u.active
, n.first , n.middle , n.last
, uo.organization_id /* , other_cols_from_other_tables, etc. */
FROM user u
INNER JOIN
( SELECT user_id, MAX(latest) utd
FROM user_org
GROUP BY user_id
) uo1 ON uo1.user_id = u.user_id
LEFT JOIN user_org uo
ON (uo.user_id = u.user_id and uo.latest = uo1.utd)
/* here, repeat the INNER JOIN / LEFT JOIN pair above for each secondary table */
WHERE u.username = :username
The latter, unfortunately, creates a hard dependency on data in the secondary table: because of the INNER JOIN, the whole query returns nothing if any secondary table lacks data for the particular user.
I've researched this on SO and the web, and there are many solutions for avoiding subqueries, but everything I've found on the subject has the issue in the main query, not in a LEFT JOIN.
The logic I need is "if there's data for this user in this secondary table, get the specified column(s) from the most recent row in that table, otherwise a null".
It seems to me that putting a "current row" marker column on the most recent row in each table would avoid the whole issue and run faster than any other solution, but it would work against normalization (I would still need the 'latest' column to keep an orderable history of previous data).
Is there a solution that gets both normalization and speed? This is MariaDB, so it needs MySQL-compatible syntax.
EDIT: I would still like a better way, but I decided to go with the extra column. Now the problem described above is avoided, and the SELECT SQL is much simpler and presumably faster. The downside is added complexity in saves, but SELECTs are far more frequent.
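For illustration, with a hypothetical is_current marker column (name assumed) maintained on each secondary table, each join collapses to something like:
SELECT u.username, u.user_id, u.email, u.active
     , uo.organization_id
FROM user u
LEFT JOIN user_org uo
       ON uo.user_id = u.user_id
      AND uo.is_current = 1   -- hypothetical marker, set on INSERT/UPDATE
WHERE u.username = :username
The save path then has to clear the flag on the previous row and set it on the new one, ideally in the same transaction.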
MariaDB supports ROW_NUMBER as of version 10.2:
SELECT
u.username,
u.user_id,
u.password,
u.email,
u.active,
uo.organization_id,
...
FROM user u
LEFT JOIN
(
select
user_org.*,
row_number() over(partition by user_id order by latest desc) as rn
from user_org
) uo ON uo.user_id = u.user_id AND uo.rn = 1
...
WHERE u.username = :username;
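If you are on a MariaDB version older than 10.2, a NOT EXISTS anti-join is another common greatest-n-per-group pattern that keeps the LEFT JOIN semantics (you still get NULLs when the secondary table has no rows for the user). A rough sketch for one secondary table:
SELECT u.username, u.user_id, u.email, u.active
     , uo.organization_id
FROM user u
LEFT JOIN user_org uo
       ON uo.user_id = u.user_id
      AND NOT EXISTS (SELECT 1
                      FROM user_org newer
                      WHERE newer.user_id = uo.user_id
                        AND newer.latest > uo.latest)
WHERE u.username = :username;
It still uses a correlated subquery, but with an index on (user_id, latest) it should only have to probe a few index entries per user.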
I have several tables and I would like to create a join over all of them.
table login: I would like to retrieve all the data from login
table logging: calculating the Nb_of_sessions for each db and for a specific event type, per user
table meeting: calculating the Nb_of_meetings for each db and for each user
table live: calculating the Nb_of_live for each db and for each user
I have these queries, which give the right results:
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when I try to put it all together, it seems I retrieve bad data (only one db is retrieved) and it does not seem efficient.
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have an idea?
Regards.
The easiest way is to use WITH to structure the subqueries and then reference them.
From the WITH clause reference:
You can use WITH to flatten nested queries, or to simplify subqueries.
The WITH clause precedes the SELECT list in a query and defines one or
more subqueries for use within the SELECT query.
Each subquery defines a temporary table, similar to a view definition,
which you can reference in the FROM clause. The tables are used only
when the query runs.
Since you already have working subqueries, the following should work:
with logins as
(
SELECT db.id as dbid,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;
I want to know the partition name (similar to PART_NAME in Hive) of a partitioned table, specifically on Azure. I've been looking in the system tables (like sys.partitions) for this information but was not able to figure it out, so I wanted to ask: are there any other system tables I can refer to in order to get the partition name?
To make the question clearer, I've added a screenshot. In it I've highlighted the column on which the table is partitioned; I want to retrieve that column name, so is there any way to retrieve it from the system tables?
You have to join a bunch of tables together to get worthwhile Partition information. Here is a sample:
SELECT s.[name] AS [schema]
,t.[name] AS [table]
, i.[name] AS [index_name]
, p.[partition_id]
, p.[partition_number]
, p.[rows] AS [partition_rows]
, rng.[value] AS [partition_value]
, ds.[data_space_id]
, ps.[function_id]
from sys.tables t
join sys.schemas s
on s.[schema_id] = t.[schema_id]
join sys.indexes i
on t.[object_id] = i.[object_id]
join sys.partitions p
on p.[object_id] = t.[object_id]
join sys.data_spaces ds
on i.[data_space_id] = ds.[data_space_id]
join sys.partition_schemes ps
on ps.[data_space_id] = i.[data_space_id]
join sys.partition_functions pf
on ps.[function_id] = pf.[function_id]
join sys.partition_range_values rng
on pf.[function_id] = rng.[function_id] AND rng.[boundary_id] = p.[partition_number]
where s.[name] = @schema_name
and t.[name] = @table_name
Which returns one row per partition, including the index name, partition number, row count, and partition boundary value.
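If what you actually need is the column the table is partitioned on (the highlighted column in the screenshot), sys.index_columns should give you that: rows with partition_ordinal greater than 0 are the partitioning columns. A sketch along the same lines as the query above:
SELECT s.[name] AS [schema]
     , t.[name] AS [table]
     , c.[name] AS [partition_column]
from sys.tables t
join sys.schemas s
    on s.[schema_id] = t.[schema_id]
join sys.indexes i
    on i.[object_id] = t.[object_id]
    and i.[index_id] IN (0, 1)          -- heap or clustered index
join sys.index_columns ic
    on ic.[object_id] = i.[object_id]
    and ic.[index_id] = i.[index_id]
    and ic.[partition_ordinal] > 0      -- > 0 marks a partitioning column
join sys.columns c
    on c.[object_id] = t.[object_id]
    and c.[column_id] = ic.[column_id]
where s.[name] = @schema_name
and t.[name] = @table_name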
Is it possible to use .QueryMultiple (or some other method) in Dapper so that the results of each query can be used in the WHERE clause of the next query, without having to run each query individually, get the id, then .Query again, get the id, and so on?
For example,
string sqlString = @"select tableA_id from tableA where tableA_lastname = @lastname;
select tableB_id from tableB WHERE tableB_id = tableA_id";
db.QueryMultiple(sqlString, new { lastname = "smith" });
Is something like this possible with Dapper or do I need a view or stored procedure to accomplish this? I can use multiple joins for one SQL statement, but in my real query there are 7 joins, and I didn't think I should return 7 objects.
Right now I'm just using object.
You can store the result of each previous query in a table variable, then select from that variable both to return it and to drive the next query, for example:
DECLARE @TableA AS TABLE(
    tableA_id INT
    -- ... all other columns you need
)

INSERT @TableA
SELECT tableA_id
FROM tableA
WHERE tableA_lastname = @lastname

-- first result set: the rows from tableA
SELECT *
FROM @TableA

-- second result set: tableB rows matching the ids captured above
SELECT tableB_id
FROM tableB
JOIN @TableA a ON tableB.tableB_id = a.tableA_id
I'm trying to figure out how to replicate the below query in SQLAlchemy
SELECT c.company_id AS company_id,
(SELECT policy_id FROM associative_table at WHERE at.company_id = c.company_id) AS policy_id_ref,
(SELECT `default` FROM policy p WHERE p.policy_id = policy_id_ref) AS `default`
FROM company c;
Note that this is a stripped down, basic example of what I'm really dealing with. The actual schema supports data and relationship versioning that requires the subqueries to include additional conditions, sorting, and limiting, making it impractical (if not impossible) for them to be joins.
The crux of the problem is in how the second subquery relies on policy_id_ref -- the value obtained from the first subquery. In SQLAlchemy, this is effectively what I have now:
ct = aliased(classes.company)
at = aliased(classes.associative_table)
pt = aliased(classes.policy)
policy_id_ref = session.query(at.policy_id).\
filter(at.company_id == ct.company_id).\
label('policy_id_ref')
policy_default = session.query(pt.default).\
filter(pt.id == 'policy_id_ref').\
label('default')
query = session.query(ct.company_id,policy_id_ref,policy_default)
The pull from the "company" table works fine, as does the first subquery that retrieves the "policy_id_ref" column. The problem is the second subquery, which has to reference that "policy_id_ref" column. I don't know how to write its filter in such a way that it literally renders "policy_id_ref" in the resulting query, to match the label of the first subquery.
Suggestions?
Thanks in advance
You can write your query as
select(
Companies.company_id,
AssociativeTable.policy_id.label('policy_id_ref'),
Policy.default.label('policy_default'),
).select_from(
Companies,
).join(
AssociativeTable,
AssociativeTable.company_id == Companies.company_id,
).join(
Policy,
AssociativeTable.policy_id == Policy.id
)
but if you need to reference a label from a subquery, use literal_column:
from sqlalchemy import func, select, literal_column
session.query(
func.array_agg(
literal_column('batch_info'),
JSONB
).label('history')
).select_from(
select(
func.jsonb_build_object(
'batch_id', AccountingQueueBatch.id,
'batch_label', AccountingQueueBatch.label,
).label('batch_info')
).select_from(
AccountingQueueBatch,
)
)
I want to perform a simple join on two tables (BusinessUnit and UserBusinessUnit), so I can get a list of all BusinessUnits allocated to a given user.
The first attempt works, but there's no override of Select which allows me to restrict the columns returned (I get all columns from both tables):
var db = new KensDB();
SqlQuery query = db.Select
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
The second attempt allows the column restriction, but the generated SQL contains pluralised table names (?)
SqlQuery query = new Select( new string[] { BusinessUnitTable.IdColumn, BusinessUnitTable.NameColumn } )
.From<BusinessUnit>()
.InnerJoin<UserBusinessUnit>( BusinessUnitTable.IdColumn, UserBusinessUnitTable.BusinessUnitIdColumn )
.Where( BusinessUnitTable.RecordStatusColumn ).IsEqualTo( 1 )
.And( UserBusinessUnitTable.UserIdColumn ).IsEqualTo( userId );
Produces...
SELECT [BusinessUnits].[Id], [BusinessUnits].[Name]
FROM [BusinessUnits]
INNER JOIN [UserBusinessUnits]
ON [BusinessUnits].[Id] = [UserBusinessUnits].[BusinessUnitId]
WHERE [BusinessUnits].[RecordStatus] = @0
AND [UserBusinessUnits].[UserId] = @1
So, two questions:
- How do I restrict the columns returned in method 1?
- Why does method 2 pluralise the table names in the generated SQL (and can I get round this?)
I'm using 3.0.0.3...
So far my experience with 3.0.0.3 suggests that this is not possible yet with the query tool, although it is with version 2.
I think the preferred method (so far) with version 3 is to use a LINQ query with something like:
var busUnits = from b in BusinessUnit.All()
join u in UserBusinessUnit.All() on b.Id equals u.BusinessUnitId
select b;
I ran into the pluralized table names myself, but it was because I'd only re-run one template after making schema changes.
Once I re-ran all the templates, the plural table names went away.
Try re-running all 4 templates and see if that solves it for you.