How to use ORDER BY and GROUP BY together in U-SQL - azure

I have a U-SQL query that fetches some data from 3 tables, and this query already has a GROUP BY. I want to fetch only the top 10 rows, so I have to use FETCH.
@data = SELECT C.id,C.Name,C.Address,ph.phoneLabel,ph.phone
FROM person AS C
INNER JOIN
phone AS ph
ON ph.id == C.id
GROUP BY id
ORDER BY id ASC
FETCH 100 ROWS;
Please provide me some samples.
Thanks in Advance!

I am not an expert, but a few days ago I ran a query that uses both a GROUP BY and an ORDER BY clause. Here's how it looks:
SELECT distinct savedposters.*, comments.rating, comments.posterid
FROM savedposters
INNER JOIN comments ON savedposters.id=comments.posterid
WHERE savedposters.display=1
GROUP BY comments.posterid
HAVING avg(comments.rating)>=4 and count(comments.rating)>=2
ORDER BY avg(comments.rating) DESC

What is your exact goal? There is no relationship between ORDER BY and GROUP BY. Your query has a GROUP BY but no aggregation, so the GROUP BY is not needed. The query would also fail, because the selected columns are neither aggregated nor listed in the GROUP BY. If you're looking to limit the output to 10 rows, then see the first example at Output Statement (U-SQL).
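To illustrate the point about GROUP BY needing an aggregate, here is a minimal sketch in Python using the standard sqlite3 module as a stand-in engine. This is not U-SQL: the table contents are made up, and SQLite's LIMIT is assumed here to play the role that FETCH plays in the question. The first query needs no GROUP BY at all; the second one groups only because it aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER, name TEXT);
    CREATE TABLE phone  (id INTEGER, phone TEXT);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO phone  VALUES (1, '111'), (1, '112'), (2, '221'), (3, '331');
""")

# No aggregation, so no GROUP BY: just join, sort, and cap the row count.
top_rows = conn.execute("""
    SELECT c.id, c.name, ph.phone
    FROM person AS c
    INNER JOIN phone AS ph ON ph.id = c.id
    ORDER BY c.id ASC, ph.phone ASC
    LIMIT 2
""").fetchall()

# GROUP BY only makes sense together with an aggregate such as COUNT.
phone_counts = conn.execute("""
    SELECT c.id, c.name, COUNT(ph.phone) AS phone_count
    FROM person AS c
    INNER JOIN phone AS ph ON ph.id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id ASC
    LIMIT 2
""").fetchall()
```

Every non-aggregated column in the second SELECT list also appears in its GROUP BY, which is the rule the original query breaks.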

Related

Hybris flexible search: extract order id which has more than 10 products

I need to extract the order IDs that have more than 10 products. Could you please suggest a flexible search query?
Hi, please use GROUP BY with a HAVING clause.
select {o.code} from {Order as o join OrderEntry as oe on {oe.order}={o.pk}}
group by {o.code} having count({o.code})>10
Reference for using the HAVING clause: having clause
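The GROUP BY plus HAVING pattern can be tried out with a small sketch in Python, using the standard sqlite3 module as a stand-in for the Hybris database. The table names, column names, and rows below are hypothetical and only mirror the Order/OrderEntry relationship from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (pk INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE order_entry (pk INTEGER PRIMARY KEY, order_pk INTEGER);
    INSERT INTO orders VALUES (1, 'A-100'), (2, 'A-200');
""")
# Order A-100 gets 11 entries, order A-200 gets 3.
conn.executemany("INSERT INTO order_entry (order_pk) VALUES (?)",
                 [(1,)] * 11 + [(2,)] * 3)

# HAVING filters whole groups after aggregation, which WHERE cannot do.
big_orders = conn.execute("""
    SELECT o.code, COUNT(oe.pk) AS entries
    FROM orders AS o
    JOIN order_entry AS oe ON oe.order_pk = o.pk
    GROUP BY o.code
    HAVING COUNT(oe.pk) > 10
""").fetchall()
```

Note that this counts order entries (rows), not ordered quantities; if "products" means quantity, a SUM over a quantity column would be needed instead.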

Azure Datafactory: How to implement nested sql query in transformation data flow

I have two streams, customer and customercontact. I am new to Azure Data Factory. I just want to know which data flow transformation will achieve the result of the SQL query below.
(SELECT *
FROM customercontact
WHERE customerid IN
(SELECT customerid
FROM customer)
ORDER BY timestamp DESC
LIMIT 1)
I can use the Exists transformation for the inner query, but I need some help with fetching the first row after sorting the customer contact data. So, basically, I am looking for a way to add a LIMIT/TOP/OFFSET clause in a data flow.
You can reproduce the given query in a data flow by combining different transformations.
For sorting, you can use the Sort transformation. Here you can select ascending or descending order.
For the top few records, you can use the Rank transformation.
For the "IN" clause, you can use the Exists transformation.
Refer - https://learn.microsoft.com/en-us/azure/data-factory/data-flow-rank
Here is my sample data in SQL as the source.
I have used the Rank transformation.
After the Rank transformation, one more column, RankColumn, is added.
Now, to select only the top 1 record, I have used the Filter row modifier with the expression equals(RankColumn,1).
Finally, add a Sink and run the pipeline.
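The sequence of transformations above can be sketched in plain Python to show what each step contributes. This is only an illustration of the logic, not Data Factory code: the sample rows are invented, and the sketch does not model how the Rank transformation treats ties.

```python
customers = [{"customerid": 1}, {"customerid": 2}]
contacts = [
    {"customerid": 1, "timestamp": "2023-01-05", "email": "a@example.com"},
    {"customerid": 2, "timestamp": "2023-03-01", "email": "b@example.com"},
    {"customerid": 9, "timestamp": "2023-04-01", "email": "z@example.com"},
]

# Exists: keep contacts whose customerid appears in the customer stream.
known_ids = {c["customerid"] for c in customers}
matched = [row for row in contacts if row["customerid"] in known_ids]

# Sort: newest timestamp first (ISO dates sort correctly as strings).
ordered = sorted(matched, key=lambda row: row["timestamp"], reverse=True)

# Rank: add a 1-based RankColumn following the sort order.
ranked = [dict(row, RankColumn=i) for i, row in enumerate(ordered, start=1)]

# Filter: the equivalent of equals(RankColumn, 1) keeps only the top record.
top_one = [row for row in ranked if row["RankColumn"] == 1]
```

The contact for customer 9 is newer than the others but is dropped by the Exists step, which matches the IN subquery in the original SQL.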

Count(*) not showing null values

I'm using Hybris by SAP for a small project and almost have this down. I'm trying to find the number of Point of Service locations with 0 orders in the past 7 days using Flexible Search.
Here is the HAC script I used:
select count(*), {PointOfService.name} from {Order left join PointOfService on {Order.pointOfService} = {PointOfService.pk}} where {creationTime} >= '2019-10-01' GROUP by {PointOfService.name} order by count(*)
The script gives me the quantity of orders for each individual PointOfService but does not give me the PointOfService locations with '0' orders. I read that this is due to count(*) not providing NULL values. Does anyone know a way around this?
You have an order attribute (creationTime) in your WHERE clause, so if there is no order for a specific point of service, you won't be able to see it.
Something like that should work:
select count({o.pk}),
{ps.name}
from {PointOfService as ps
left join Order as o on {o.pointOfService} = {ps.pk}}
GROUP by {ps.name}
having count({o.pk}) = 0
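A minimal sketch of the same idea, written in Python with the standard sqlite3 module rather than FlexibleSearch, shows why the join has to start from the point of service and why the date filter cannot sit in the WHERE clause. The schema and rows are hypothetical. Placing the date condition in the ON clause is one possible way to keep the 7-day window, not something the answer above spells out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE point_of_service (pk INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (pk INTEGER PRIMARY KEY, pos_pk INTEGER, creation_time TEXT);
    INSERT INTO point_of_service VALUES (1, 'North'), (2, 'South'), (3, 'West');
    INSERT INTO orders VALUES
        (10, 1, '2019-10-03'),
        (11, 1, '2019-10-04'),
        (12, 2, '2019-09-01');
""")

# The date filter lives in the ON clause, so stores without a recent order
# survive the LEFT JOIN with NULL order columns; COUNT(o.pk) ignores NULLs.
counts = conn.execute("""
    SELECT ps.name, COUNT(o.pk) AS recent_orders
    FROM point_of_service AS ps
    LEFT JOIN orders AS o
           ON o.pos_pk = ps.pk AND o.creation_time >= '2019-10-01'
    GROUP BY ps.name
    ORDER BY ps.name
""").fetchall()

idle = [name for name, recent_orders in counts if recent_orders == 0]
```

Using COUNT(*) instead of COUNT(o.pk) would report 1 for the idle stores, because the unmatched row produced by the LEFT JOIN is still a row.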

Select where multiple fields are not in subquery (excluding join)

I have a requirement to pull records that do not have history in an archive table. 2 fields of each record need to be checked against the archive.
In a technical sense, my requirement is a left join where the right side is 'null' (a.k.a. an excluding join), which in ABAP Open SQL is commonly implemented like this (for my scenario anyway):
Select * from xxxx //xxxx is a result for a multiple table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with 'for all entries', but I would still like to know if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields using SQL (without involving the application server).
I have tested quite a few variations of sub-queries in the life-cycle of the program I was making. NOT EXISTS with a multiple-field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result for a multiple table inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
select key from archive_table
where key = xxxx~key OR key = xxxx~foreign_key
)
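The NOT EXISTS check against two fields can be tried out with a small sketch in Python, using the standard sqlite3 module as a stand-in for the ABAP database layer. The table main_rows, the column rec_key, and the sample rows are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_rows (rec_key TEXT, foreign_key TEXT);
    CREATE TABLE archive_table (rec_key TEXT);
    INSERT INTO main_rows VALUES ('A', 'X'), ('B', 'Y'), ('C', 'Z');
    INSERT INTO archive_table VALUES ('A'), ('Y');
""")

# A row is kept only if neither of its two fields appears in the archive.
kept = conn.execute("""
    SELECT m.rec_key, m.foreign_key
    FROM main_rows AS m
    WHERE NOT EXISTS (
        SELECT 1 FROM archive_table AS a
        WHERE a.rec_key = m.rec_key OR a.rec_key = m.foreign_key
    )
    ORDER BY m.rec_key
""").fetchall()
```

Row A is excluded through its own key and row B through its foreign key, so a single correlated sub-query covers both fields.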
EDIT:
With changing requirements (for more filtering) a lot has changed, so I figured I would update this. The construct I marked as XXXX in my example contained a single left join ( where main to secondary table relation is 1-* ) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors, without financial records in 3 tables.
Additional requirements: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payee (*-* relationship between payer and payee).
The many-to-many join multiplied the record count within the construct I labeled XXXX, which in turn produced a lot of unnecessary work. For instance, a single customer with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
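The arithmetic in that example can be reproduced with a short Python sketch. The identifiers C1, P1..P3 and E1..E3 are placeholders for one customer, its payers, and its payees.

```python
from itertools import product

customer = "C1"
payers = ["P1", "P2", "P3"]
payees = ["E1", "E2", "E3"]

# Joining both relations in one query pairs every payer with every payee.
rows = [(customer, payer, payee) for payer, payee in product(payers, payees)]

# Each row carries 3 fields that would be checked against the archive.
fields_to_check = sum(len(row) for row in rows)

# Only these distinct values actually need checking.
unique_values = {value for row in rows for value in row}
```

Here len(rows) is 9, fields_to_check is 27, and len(unique_values) is 7, which is the gap between the work the join creates and the work that is needed.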
At this point, moving the left-joined tables from the main query into sub-queries and splitting them gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1
where
( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
and lfa1~lifnr not in ( select wyt3~lifnr from wyt3 inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
inner join bsik on bsik~lifnr = wyt3~lifn2 where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
and lfa1~lifnr not in ( select lfza~lifnr from lfza inner join bsik on bsik~lifnr = lfza~empfk where lfza~lifnr = lfa1~lifnr )
)
and [3 more sets of sub queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, not exists ( select .. from table where id = a.id or id = b.id or... ) appears to be the best.
The moment your exclusion criteria introduce a many-to-many relationship into your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with 1 sub-query that merely looks good).
Anyway, any additional insight into this is welcome.
EDIT2: Although it's slightly off topic, given how my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on to expand it. I learned that proper excluding join works. I just failed horribly at implementing it the first time.
select headers~key
from headers left join items on headers~key = items~key
where items~key is null
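For comparison, the excluding join can be sketched in Python with the standard sqlite3 module. The engine is a stand-in for the ABAP database layer, and the column name rec_key and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE headers (rec_key TEXT);
    CREATE TABLE items (rec_key TEXT);
    INSERT INTO headers VALUES ('H1'), ('H2'), ('H3');
    INSERT INTO items VALUES ('H1'), ('H1'), ('H3');
""")

# Headers without any item get NULL item columns from the LEFT JOIN;
# filtering on that NULL keeps exactly the unmatched headers.
orphans = conn.execute("""
    SELECT h.rec_key
    FROM headers AS h
    LEFT JOIN items AS i ON h.rec_key = i.rec_key
    WHERE i.rec_key IS NULL
""").fetchall()
```

The NULL test has to be on a column from the right-hand table in the WHERE clause; putting it in the ON clause would change the join condition instead of filtering the result.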
if it is possible to use results of a sub-query to check more than 1 field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in a subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. it should return exactly one column.
Your subquery can have a multi-column key, and such syntax is completely legitimate:
SELECT planetype, seatsmax
FROM saplane AS plane
WHERE seatsmax < @wa-seatsmax AND
seatsmax >= ALL ( SELECT seatsocc
FROM sflight
WHERE carrid = @wa-carrid AND
connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
So that's not the case for you. Your only choice seems to be a multi-join.
P.S. FOR ALL ENTRIES does not support negation logic, you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.

Spark sql top n per group

How can I get the top-n (let's say top 10 or top 3) per group in spark-sql?
http://www.xaprb.com/blog/2006/12/07/how-to-select-the-firstleastmax-row-per-group-in-sql/ provides a tutorial for general SQL. However, spark does not implement subqueries in the where clause.
You can use the window function feature that was added in Spark 1.4.
Suppose that we have a productRevenue table.
The answer to "What are the best-selling and the second best-selling products in every category?" is as follows:
SELECT product,category,revenue FROM
(SELECT product,category,revenue,dense_rank()
OVER (PARTITION BY category ORDER BY revenue DESC) as rank
FROM productRevenue) tmp
WHERE rank <= 2
This will give you the desired result.
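The same dense_rank pattern can be tried without a Spark cluster. The sketch below uses Python's standard sqlite3 module as a stand-in engine, which assumes an SQLite build with window function support (3.25 or newer). The rows are invented sample data, and the alias rnk is used instead of rank only to avoid any clash with the function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE productRevenue (product TEXT, category TEXT, revenue INTEGER);
    INSERT INTO productRevenue VALUES
        ('Thin', 'Cell phone', 6000),
        ('Ultra thin', 'Cell phone', 5000),
        ('Bendable', 'Cell phone', 3000),
        ('Pro', 'Tablet', 4500),
        ('Mini', 'Tablet', 5500),
        ('Big', 'Tablet', 2500);
""")

# Rank products inside each category by revenue, then keep ranks 1 and 2.
top_two = conn.execute("""
    SELECT product, category, revenue FROM
        (SELECT product, category, revenue,
                dense_rank() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
         FROM productRevenue) tmp
    WHERE rnk <= 2
    ORDER BY category, revenue DESC
""").fetchall()
```

Because dense_rank gives tied revenues the same rank, a category with ties can return more than two rows; row_number would be the choice if exactly n rows per group are required.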
