I need to merge two queries: one lists the sum of all items per month, and one lists the sum of the items YTD.
I used UNION and it works when 'YTD' is selected in my drop-down list. However, when I select any other month it gives me the results of YTD and the selected month.
The union query so far:
SELECT
Site.Site_Name 'Site',
'YTD' as 'Month_Name',
Sum(MOT.Total_MR_Count_Received) 'Receiving',
Sum(MOT.Total_Line_Item_Count_Received) 'Checking',
Sum(MOT.Total_MR_Count_Shipped) 'Shipment Activity'
FROM
Metrics_Main
INNER JOIN Metrics_MOT MOT ON Metrics_Main.Metrics_Key = MOT.Metrics_Key
INNER JOIN Month ON Metrics_Main.Month_Key = Month.Month_Key
INNER JOIN Site ON Metrics_Main.Site_Key = Site.Site_Key
group by Site.site_name
union
SELECT
Site.Site_Name 'Site',
Month.Month_Name 'Month_Name',
sum(MOT.Total_MR_Count_Received) 'Receiving',
sum(MOT.Total_Line_Item_Count_Received) 'Checking',
sum(MOT.Total_MR_Count_Shipped) 'Shipment_Activity'
FROM
Metrics_Main
INNER JOIN Metrics_MOT MOT ON Metrics_Main.Metrics_Key = MOT.Metrics_Key
INNER JOIN Month ON Metrics_Main.Month_Key = Month.Month_Key
INNER JOIN Site ON Metrics_Main.Site_Key = Site.Site_Key
WHERE
Month.Month_Name like #Month_Name
group by Site.site_name, month.month_name
This will help: http://www.w3schools.com/sql/sql_union.asp. Make sure that you have the exact same number of columns in the two select queries; so add the "Month_Name" to the first query as well.
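One way to get the drop-down behaviour described in the question (a sketch, assuming #Month_Name is the parameter bound to the drop-down) is to filter each branch of the union on that parameter, so the YTD rollup only appears when 'YTD' is selected:
SELECT
Site.Site_Name 'Site',
'YTD' as 'Month_Name',
Sum(MOT.Total_MR_Count_Received) 'Receiving',
Sum(MOT.Total_Line_Item_Count_Received) 'Checking',
Sum(MOT.Total_MR_Count_Shipped) 'Shipment Activity'
FROM
Metrics_Main
INNER JOIN Metrics_MOT MOT ON Metrics_Main.Metrics_Key = MOT.Metrics_Key
INNER JOIN Site ON Metrics_Main.Site_Key = Site.Site_Key
-- only produce the YTD rollup when the user picked 'YTD'
WHERE #Month_Name = 'YTD'
group by Site.Site_Name
union all
SELECT
Site.Site_Name,
Month.Month_Name,
Sum(MOT.Total_MR_Count_Received),
Sum(MOT.Total_Line_Item_Count_Received),
Sum(MOT.Total_MR_Count_Shipped)
FROM
Metrics_Main
INNER JOIN Metrics_MOT MOT ON Metrics_Main.Metrics_Key = MOT.Metrics_Key
INNER JOIN Month ON Metrics_Main.Month_Key = Month.Month_Key
INNER JOIN Site ON Metrics_Main.Site_Key = Site.Site_Key
WHERE
-- only produce per-month rows when a real month is picked
Month.Month_Name like #Month_Name
AND #Month_Name <> 'YTD'
group by Site.Site_Name, Month.Month_Name
Using union all instead of union also avoids the cost of de-duplicating rows that can never overlap between the two branches.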
I have a very strange occurrence in my code.
I keep on getting the error
AnalysisException: Column 'homepage_url' does not exist
However, when I do a select without the cross joins, the column does actually exist.
Can someone take a look at my cross joins and let me know if that is where the problem is?
SELECT DISTINCT
account.xpd_relationshipstatus AS CRM_xpd_relationshipstatus
,REPLACE(owneridname,'Data.Import #','') AS MontaguOwner
,account.ts_montaguoffice AS Montagu_Office
,CAST(account.ts_reminderdatesetto AS DATE) AS CRM_ts_reminderdatesetto
,CAST(account.ts_lastdatestatuschanged AS DATE) AS YearofCRMtslastdatestatuschanged
,organizations.name AS nameCB
,organizations.homepage_url
,iff(e like 'www.%', e, 'www.' + e) AS website
,left(category_list,charindex(',',category_list +',' )-1) AS category_CB
-- ,case when charindex(',',category_list,0) > 0 then left(category_list,charindex(',',category_list)-1) else category_list end as category_CB
,organizations.category_groups_list AS category_groups_CB
FROM basecrmcbreport.account
LEFT OUTER JOIN basecrmcbreport.CRM2CBURL_Lookup
ON account.Id = CRM2CBURL_Lookup.Key
LEFT OUTER JOIN basecrmcbreport.organizations
ON CRM2CBURL_Lookup.CB_URL_KEY = organizations.cb_url
cross Join (values (charindex('://', homepage_url))) a(a)
cross Join (values (iff(a = 0, 1, a + 3))) b(b)
cross Join (values (charindex('/', homepage_url, b))) c(c)
cross Join (values (iff(c = 0, length(homepage_url) + 1, c))) d(d)
cross Join (values (substring(homepage_url, b, d - b))) e(e)
Without the cross joins, the select works and the column is resolved.
The reason the column is recognized in a plain select but not here is that joins operate on tables only, and your chained VALUES clauses behave like table-valued functions.
To join table-valued functions, one must use cross apply or outer apply, but these are not supported in Databricks SQL.
The following is the demo data I am using:
I tried using inner join on a table valued function using the following query and got the same error:
select d1.*,a from demo1 d1 inner join (values(if(d1.team = 'OG',2,1))) a;
Using a select subquery instead, the join works, since joins operate on tables and query results:
select d1.*,a.no_of_wins from demo1 d1 inner join (select id,case team when 'OG' then 2 when 'TS' then 1 end as no_of_wins from demo1) a on d1.id=a.id;
So, the remedy for this problem is to replace all the table-valued functions on which you are using joins with SELECT statements.
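As an illustration, here is a minimal sketch of that idea applied to the website derivation from the question (table and column names are taken from the question; regexp_extract stands in for the charindex/substring chain, so treat it as an assumption, not the only way):
SELECT
name AS nameCB,
homepage_url,
iff(host like 'www.%', host, 'www.' || host) AS website
FROM (
SELECT
name,
homepage_url,
-- strip an optional scheme and anything after the first '/'
regexp_extract(homepage_url, '^(?:[a-z]+://)?([^/]+)', 1) AS host
FROM basecrmcbreport.organizations
) t
Because the host is computed inside a derived-table SELECT rather than a joined VALUES clause, Databricks SQL resolves homepage_url without complaint.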
I would like to create a join over several tables.
table login: I would like to retrieve all the data from login
table logging: calculating the Nb_of_sessions for each db & for a specific event type, by user
table meeting: calculating the Nb_of_meetings for each db & for each user
table live: calculating the Nb_of_live for each db & for each user
I have those queries with the right results :
SELECT db.id,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid;
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid;
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id;
But when I try to put it all together, it seems I retrieve bad data (only one db is retrieved) and it seems inefficient.
select a1.db.id,a._id as userid,a.firstname,a.lastname,count(rl._id) as nb_chat
FROM
"logins"."login" a,
"loggings"."logging" b,
"meetings"."meeting" c,
"lives"."live" d,
UNNEST(dbs) AS a1 (db),
UNNEST(users) AS r1 (user)
where a._id = b.userid AND a._id = c.userid AND a._id = r1.user._id
group by 1,2,3,4
Do you have an idea?
Regards.
The easiest way is to use WITH to structure the subqueries and then reference them.
From the WITH clause reference:
You can use WITH to flatten nested queries, or to simplify subqueries.
The WITH clause precedes the SELECT list in a query and defines one or
more subqueries for use within the SELECT query.
Each subquery defines a temporary table, similar to a view definition,
which you can reference in the FROM clause. The tables are used only
when the query runs.
Since you already have working subqueries, the following should work:
with logins as
(
SELECT db.id as dbid,_id as userid,firstname,lastname
FROM "logins"."login",
UNNEST(dbs) AS a1 (db)
)
,visits as
(
SELECT dbid,userid,count(distinct(sessionid)) as no_of_visits,
array_join(array_agg(value.from_url),',') as from_url
FROM "loggings"."logging"
where event='url_event'
group by dbid,userid
)
,meetings as
(
SELECT dbid,userid AS userid,count(*) as nb_interviews,
array_join(array_agg(interviewer),',') as interviewer
FROM "meetings"."meeting"
group by dbid,userid
)
,chats as
(
SELECT dbid,r1.user._id AS userid,count(_id) as nb_chat
FROM "lives"."live",
UNNEST(users) AS r1 (user)
group by dbid,r1.user._id
)
select *
from logins l
left join visits v
on l.dbid = v.dbid
and l.userid = v.userid
left join meetings m
on l.dbid = m.dbid
and l.userid = m.userid
left join chats c
on l.dbid = c.dbid
and l.userid = c.userid;
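Two small refinements worth considering (a sketch; the column names are the ones defined in the CTEs above): select explicit columns instead of *, so the duplicated dbid/userid columns from the joined tables disappear, and wrap the counts in coalesce, because the left joins return NULL for users with no activity:
select l.dbid, l.userid, l.firstname, l.lastname,
-- unmatched left joins yield NULL; report them as zero counts
coalesce(v.no_of_visits, 0) as no_of_visits,
coalesce(m.nb_interviews, 0) as nb_interviews,
coalesce(c.nb_chat, 0) as nb_chat
from logins l
left join visits v on l.dbid = v.dbid and l.userid = v.userid
left join meetings m on l.dbid = m.dbid and l.userid = m.userid
left join chats c on l.dbid = c.dbid and l.userid = c.userid;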
File link here. I have two identical dataframes, each with 27817 rows. Trying to inner join these dataframes returns 128954989 rows.
dataframe1.join(dataframe2,"_c0").count
res16: Long = 128954989
How do I resolve this?
It happens because your join is creating cartesian products: every row with a given key value on the left matches every row with the same key value on the right. If you want to keep the rows on the left side of the join you can do a left join like:
dataframe1.join(dataframe2,"_c0", "left")
There are also more types of joins, and you have to select one of them depending on your needs. Here you can see the join types with examples:
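To confirm that duplicate keys are what multiplies the rows (a sketch, assuming the dataframe has been registered as a temporary view named df1; the view name is an assumption), count how often each join key occurs. A key that appears n times on both sides contributes n*n rows to the join output:
-- keys occurring more than once multiply the join output
SELECT _c0, COUNT(*) AS occurrences
FROM df1
GROUP BY _c0
HAVING COUNT(*) > 1
ORDER BY occurrences DESC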
I have a BQL query joining three tables as follows:
foreach (PXResult<GLTran, Branch, xTACOpenSourceDetail> rec in
PXSelectJoin<GLTran,
InnerJoin<Branch,
On<GLTran.branchID, Equal<Branch.branchID>>,
InnerJoin<xTACOpenSourceDetail,
On<Branch.branchCD, Equal<xTACOpenSourceDetail.string03>,
And<xTACOpenSourceDetail.openSourceName, Equal<Constants.openSourceName>,
And<xTACOpenSourceDetail.dataID, Equal<Constants.privateer>>>>>>,
Where<Branch.branchCD, NotEqual<Required<Branch.branchCD>>,
And<GLTran.posted, Equal<True>,
And<GLTran.ledgerID, Equal<Required<GLTran.ledgerID>>,
And<GLTran.tranDate, GreaterEqual<Required<GLTran.tranDate>>>>>>,
OrderBy<Asc<xTACOpenSourceDetail.string01, Asc<GLTran.batchNbr>>>>.Select(Base, osdBranch.String03, ledger.LedgerID, tacsmlm.Date01))
I want to add one aggregated field, namely the sum of the GLTran.CuryDebitAmt grouped by GLTran.BatchNbr and Branch.BranchCD.
I can easily do this in SQL using the SUM OVER functionality as follows:
SELECT SUM(GLTran.CuryDebitAmt) OVER (PARTITION BY GLTran.BatchNbr, Branch.BranchCD) as 'BatchTotal'
,GLTran.*
,Branch.*
,xTACOpenSourceDetail.*
FROM GLTran
Inner Join Branch
On GLTran.branchID = Branch.branchID
AND Branch.CompanyID = GLTran.CompanyID
Inner Join xTACOpenSourceDetail
On Branch.branchCD = xTACOpenSourceDetail.string03
And xTACOpenSourceDetail.openSourceName = 'TAC FM Map Company Branch'
And xTACOpenSourceDetail.dataID = 'Privateer'
AND xTACOpenSourceDetail.CompanyID = GLTran.CompanyID
Where Branch.branchCD <> '000 0000'
And GLTran.posted = 1
And GLTran.ledgerID = 6
And GLTran.tranDate >= '08/03/2017'
AND GLTran.CompanyID = 2
Order By xTACOpenSourceDetail.string01 ASC
,GLTran.batchNbr ASC
...but I have no idea how to add this single summed field in BQL. Any help is appreciated.
You will use a PXSelectGroupBy, and in the Aggregate of your BQL indicate which fields will "SUM" their values. Any field not called out will be the MAX value.
If you search for SUM< in the Acumatica source you can find plenty of BQL examples. Here is a BQL example from ARPaymentEntry. Only two fields (curyAdjdAmt & adjAmt) will contain a SUM, while all other fields returned will be the MAX.
SOAdjust other = PXSelectGroupBy<SOAdjust,
Where<SOAdjust.voided, Equal<False>,
And<SOAdjust.adjdOrderType, Equal<Required<SOAdjust.adjdOrderType>>,
And<SOAdjust.adjdOrderNbr, Equal<Required<SOAdjust.adjdOrderNbr>>,
And<
Where<SOAdjust.adjgDocType, NotEqual<Required<SOAdjust.adjgDocType>>,
Or<SOAdjust.adjgRefNbr, NotEqual<Required<SOAdjust.adjgRefNbr>>>>>>>>,
Aggregate<GroupBy<SOAdjust.adjdOrderType,
GroupBy<SOAdjust.adjdOrderNbr,
Sum<SOAdjust.curyAdjdAmt,
Sum<SOAdjust.adjAmt>>>>>>.Select(this, adj.AdjdOrderType, adj.AdjdOrderNbr, adj.AdjgDocType, adj.AdjgRefNbr);
Another alternative solution would be to create a PXProjection that computes the sums by group; you would then reference the projection table instead of the base table in your regular BQL select. I don't know the performance benefits of one versus the other; it is just another option.
For an author overview we are looking for a query which shows all the authors together with their best book. The problem with this query is that it lacks speed. There are only about 1500 authors, and the query to generate the overview currently takes 20 seconds.
The main problem seems to be generating the average rating of all the books per person.
With the following query, it is still rather fast:
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id)
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
However, if we make the subquery a little more complex by selecting the book with the highest average rating, it takes a full 20 seconds to generate a result set.
The query would then be as follows:
select
person.id as pers_id,
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate) as year,
thriller.id as thrill_id,
count(user_rating.id) as nr,
AVG(user_rating.rating) as avgrating
from
thriller
inner join
thriller_form
on thriller_form.thriller_id = thriller.id
inner join
thriller_person
on thriller_person.thriller_id = thriller.id
and thriller_person.person_type_id = 1
inner join
person
on person.id = thriller_person.person_id
left outer join
user_rating
on user_rating.thriller_id = thriller.id
and user_rating.rating_type_id = 1
where thriller.id in
(select top 1 B.id from thriller as B
inner join thriller_person as C on B.id=C.thriller_id
and person.id=C.person_id
inner join user_rating as D on B.id=D.thriller_id
group by B.id
order by AVG(D.rating) desc)
group by
person.firstname,
person.suffix,
person.lastname,
thriller.title,
year(thriller.orig_pubdate),
thriller.id,
person.id
order by
person.lastname
Anyone got a good suggestion to speed up this query?
Calculating an average requires a table scan since you've got to sum the values and then divide by the number of (relevant) rows. This in turn means that you're doing a lot of rescanning; that's slow. Can you calculate the averages once and store them? That would let your query use those pre-computed values. (Yes, it denormalizes the data, but denormalizing for performance is often necessary; there's a trade-off between performance and minimal data.)
It might be appropriate to use a temporary table as the store of the averages.
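For example, here is a sketch in T-SQL (assumed from the TOP syntax above; table and column names are taken from the question) that computes each thriller's average rating once into a temporary table:
-- compute each thriller's average rating and rating count once
SELECT thriller_id,
AVG(CAST(rating AS decimal(10,2))) AS avgrating,
COUNT(id) AS nr
INTO #thriller_avg
FROM user_rating
WHERE rating_type_id = 1
GROUP BY thriller_id;

-- index the temp table so the main query can seek instead of rescanning
CREATE INDEX ix_thriller_avg ON #thriller_avg (thriller_id);
The main query can then join #thriller_avg instead of aggregating user_rating on the fly, and the best-book subquery becomes a simple ORDER BY avgrating DESC over precomputed values.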