I'm using Hybris by SAP for a small project and have almost got this down. I'm trying to find the number of Point of Service locations with 0 orders in the past 7 days using Flexible Search.
Here is the HAC script I used:
select count(*), {PointOfService.name}
from {Order left join PointOfService on {Order.pointOfService} = {PointOfService.pk}}
where {creationTime} >= '2019-10-01'
group by {PointOfService.name}
order by count(*)
The script gives me the quantity of orders for each individual PointOfService, but does not give me the PointOfService locations with '0' orders. I read that this is due to count(*) not counting NULL values. Does anyone know a way around this?
You have an Order attribute ({creationTime}) in your WHERE clause, so if there is no order for a specific point of service, you won't be able to see it.
Something like this should work:
select count({o.pk}), {ps.name}
from {PointOfService as ps
      left join Order as o on {o.pointOfService} = {ps.pk}}
group by {ps.name}
having count({o.pk}) = 0
Note that an aggregate condition belongs in a HAVING clause, not WHERE, and it has to be count({o.pk}) rather than count(*): on a left join, count(*) still counts the unmatched row, so it would never return 0.
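To keep the original 7-day window without losing the zero-order locations, the date condition has to move out of the WHERE clause and into the join itself, otherwise the left join degenerates into an inner join. A sketch, assuming Flexible Search accepts an extra AND condition in the join's ON clause:
select count({o.pk}), {ps.name}
from {PointOfService as ps
      left join Order as o on {o.pointOfService} = {ps.pk}
          and {o.creationTime} >= '2019-10-01'}
group by {ps.name}
having count({o.pk}) = 0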
I have an Excel spreadsheet which I use as a relational database for my milk round. I query this database using MS Query in Excel (Mac 2011 version) to generate my delivery routes. One of the columns is the customer address, and I'd like to have it shown only once per order, i.e. a distinct query for just this column while displaying multiple other rows. It's purely for cosmetic purposes, to make the spreadsheet less cluttered.
The main spreadsheet I use as my database has column headings which I have screenshotted, complete with some sample data:
From this main spreadsheet I use MS Query to generate my delivery route which looks like this:
As you can see there is a lot of repeated data in the route generated from the query. What I'd like to do is have just one instance of the address per customer's order; it would help with the legibility of the route when opened on an iPad. I hide other columns that aren't really necessary to help in that regard.
*EDIT
From isolated's comments below, here's a screenshot of how, ideally, the data returned from the query should look:
I've manually deleted the repeated info in the name & address column to achieve the desired result. I've also hidden some columns that aren't really necessary and I use some conditional formatting rules to help distinguish each customer's order.
EDIT*
I have tried using a group by clause and the following window function but can't get it to work:
SELECT *
FROM (
SELECT "All Orders"."Route ID",
       "All Orders".Name,
       "All Orders".Address,
       ROW_NUMBER() OVER(PARTITION BY "All Orders".Address
                         ORDER BY "All Orders".Address DESC) AS row_number
FROM "All Orders"
) AS rows
WHERE row_number = 1;
Whenever I try to run the query I get an error message regarding syntax. Hopefully someone can tell me where I'm going wrong!
I don't know MS SQL at all, but you could do something with a formula in Excel. If you don't like this solution, simply put a comment below that you would still like a SQL route, and I can get you a query to try to adapt to MS SQL.
Create another column and call it address2 (or several more columns if your address spans multiple columns).
Then use these formulas and adjust as needed (they assume column A identifies the order, and columns C and D hold the address and town):
Column F (address2): =IF(A2=A1,"",C2)
Column G (town2): =IF(A2=A1,"",D2)
You can then hide columns C and D.
=============
U P D A T E
Here's a method that works in many DBMSs such as Postgres, but I don't know how to adapt rank() over (partition by ...) to Excel SQL.
select account,
       cust_name,
       item,
       case
           when prod_rank = 1 then address
           else ''
       end as address
from (
    select account,
           cust_name,
           item,
           address,
           rank() over (partition by account order by item) as prod_rank
    from table1
) z
order by account, item
I tried a few variations in Excel SQL and finally got this one to work.
select a.Account,
       a.Name,
       a.Product,
       IIf(a.Product = b.min_item, a.Address, '') as [address]
from table1 as a,
     ( select z.Account,
              min(z.Product) as min_item
       from table1 as z
       group by z.Account ) as b
where b.Account = a.Account
order by a.Account, a.Product
I have a requirement to pull records that do not have history in an archive table. Two fields of one record need to be checked for in the archive.
In a technical sense my requirement is a left join where the right side is null (a.k.a. an excluding join), which in ABAP Open SQL is commonly implemented like this (for my scenario anyway):
Select * from xxxx //xxxx is a result of a multiple-table join
where xxxx~key not in (select key from archive_table where [conditions] )
and xxxx~foreign_key not in (select key from archive_table where [conditions] )
Those 2 fields are also checked against 2 more tables, so that would mean a total of 6 subqueries.
Database engines that I have worked with previously usually had some methods to deal with such problems (such as excluding join or outer apply).
For this particular case I will be trying to use ABAP logic with FOR ALL ENTRIES, but I would still like to know if it is possible to use the results of a sub-query to check more than one field, or to use another form of excluding-join logic on multiple fields using SQL (without involving the application server).
I have tested quite a few variations of sub-queries over the life-cycle of the program I was making. NOT EXISTS with a multiple-field check (shortened example below) to exclude based on 2 keys works in certain cases.
Performance is acceptable (processing time is about 5 seconds), although it's noticeably slower than the same query when excluding based on 1 field.
Select * from xxxx //xxxx is a result of multiple inner joins and 1 left join ( 1-* relation )
where NOT EXISTS (
    select key from archive_table
    where key = xxxx~key OR key = xxxx~foreign_key
)
EDIT:
With changing requirements (more filtering) a lot has changed, so I figured I would update this. The construct I marked as xxxx in my example contained a single left join (where the main-to-secondary table relation is 1-*) and it appeared relatively fast.
This is where context becomes helpful for understanding the problem:
Initial requirement: pull all vendors without financial records in 3 tables.
Additional requirement: also exclude based on alternative payers (1-* relationship). This is what the example above is based on.
More requirements: also exclude based on alternative payees (*-* relationship between payer and payee).
The many-to-many join exponentially increased the record count within the construct I labeled xxxx, which in turn produced a lot of unnecessary work. For instance: a single vendor with 3 payers and 3 payees produced 9 rows, with a total of 27 fields to check (3 per row), when in reality there are only 7 unique values.
At this point, moving the left-joined tables from the main query into sub-queries and splitting them up gave significantly better performance than any smarter-looking alternatives.
select * from lfa1 inner join lfb1 on lfa1~lifnr = lfb1~lifnr
where
  ( lfa1~lifnr not in ( select lifnr from bsik where bsik~lifnr = lfa1~lifnr )
    and lfa1~lifnr not in ( select wyt3~lifnr from wyt3
                            inner join t024e on wyt3~ekorg = t024e~ekorg and wyt3~lifnr <> wyt3~lifn2
                            inner join bsik on bsik~lifnr = wyt3~lifn2
                            where wyt3~lifnr = lfa1~lifnr and t024e~bukrs = lfb1~bukrs )
    and lfa1~lifnr not in ( select lfza~lifnr from lfza
                            inner join bsik on bsik~lifnr = lfza~empfk
                            where lfza~lifnr = lfa1~lifnr )
  )
  and [3 more sets of sub-queries like the 3 above, just checking different tables].
My Conclusion:
When exclusion is based on a single field, both NOT IN and NOT EXISTS work. One might be better than the other, depending on the filters you use.
When exclusion is based on 2 or more fields and you don't have a many-to-many join in the main query, NOT EXISTS ( select ... from table where id = a.id or id = b.id or ... ) appears to be the best.
The moment your exclusion criteria involve a many-to-many relationship within your main query, I would recommend looking for an optimal way to implement multiple sub-queries instead (even having a sub-query for each key-table combination will perform better than a many-to-many join with one good-looking sub-query).
Anyways, any additional insight into this is welcome.
EDIT2: Although it's slightly off-topic given that my question was about sub-queries, I figured I would post an update. After over a year I had to revisit the solution I worked on, to expand it. I learned that a proper excluding join works; I just failed horribly at implementing it the first time.
select header~key
from headers left join items on headers~key = items~key
where items~key is null
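Applied to the original two-field requirement, that means joining the archive once per checked field. A rough sketch using the placeholder names from the question (not tested against the actual tables):
select xxxx~key
from xxxx left join archive_table as a1 on a1~key = xxxx~key
          left join archive_table as a2 on a2~key = xxxx~foreign_key
where a1~key is null
  and a2~key is null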
if it is possible to use results of a sub-query to check more than one field or use another form of excluding join logic on multiple fields
No, it is not possible to check two columns in a subquery, as SAP Help clearly says:
The clauses in the subquery subquery_clauses must constitute a scalar subquery.
Scalar is the keyword here, i.e. it should return exactly one column.
Your subquery can have a multi-column key, and such syntax is completely legit:
SELECT planetype, seatsmax
       FROM saplane AS plane
       WHERE seatsmax < @wa-seatsmax AND
             seatsmax >= ALL ( SELECT seatsocc
                               FROM sflight
                               WHERE carrid = @wa-carrid AND
                                     connid = @wa-connid )
However, you say that these two fields should be checked against different tables:
Those 2 fields are also checked against two more tables
so it's not the case for you. Your only choice seems to be multi-join.
P.S. FOR ALL ENTRIES does not support negation logic; you cannot just use some sort of NOT IN FOR ALL ENTRIES, it won't be that easy.
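For the record, FOR ALL ENTRIES only expands into positive equality checks, so the exclusion itself has to happen afterwards on the application server. A rough sketch (lt_candidates, archive_table and key are placeholders from the question, not names from a real system):
" guard against an empty driver table first, or the WHERE clause is dropped entirely
IF lt_candidates IS NOT INITIAL.
  SELECT key FROM archive_table
    FOR ALL ENTRIES IN @lt_candidates
    WHERE key = @lt_candidates-key
    INTO TABLE @DATA(lt_hits).
ENDIF.
" the actual exclusion then happens in ABAP, not in SQL
LOOP AT lt_hits INTO DATA(ls_hit).
  DELETE lt_candidates WHERE key = ls_hit-key.
ENDLOOP.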
I have a U-SQL query which fetches data from 3 tables, and this query already has a GROUP BY. I want to fetch only the top 10 rows, so I have to use FETCH.
#data= SELECT C.id,C.Name,C.Address,ph.phoneLabel,ph.phone
FROM person AS C
INNER JOIN
phone AS ph
ON ph.id == C.id
GROUP BY id
ORDER BY id ASC
FETCH 100 ROWS;
Please provide me with some samples.
Thanks in advance!
I am not an expert or anything, but a few days ago I executed a query which uses both a GROUP BY and an ORDER BY clause. Here's how it looks:
SELECT DISTINCT savedposters.*, comments.rating, comments.posterid
FROM savedposters
INNER JOIN comments ON savedposters.id = comments.posterid
WHERE savedposters.display = 1
GROUP BY comments.posterid
HAVING avg(comments.rating) >= 4 AND count(comments.rating) >= 2
ORDER BY avg(comments.rating) DESC
What is your exact goal? There is no relationship between ORDER BY and GROUP BY. In your query you have a GROUP BY but no aggregation, so the GROUP BY is not needed; besides, the query would fail as written. If you're looking to limit the output to 10 rows, see the first example at Output Statement (U-SQL).
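For illustration, a minimal sketch against the tables from the question, assuming no aggregation is actually needed (the output path is made up): in U-SQL the ORDER BY ... FETCH pair goes on the OUTPUT statement.
@data = SELECT C.id, C.Name, C.Address, ph.phoneLabel, ph.phone
        FROM person AS C
        INNER JOIN phone AS ph
        ON ph.id == C.id;

OUTPUT @data
TO "/output/top10rows.csv"
ORDER BY id ASC
FETCH 10 ROWS
USING Outputters.Csv();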
So I am querying data directly from OMS Log Analytics using Power BI Desktop, and I believe there is an 8MB hard limit on the data returned by the query. The problem I have is that I need to query about 30,000 rows, but I hit the 8MB limit around 18,000 rows. Is it possible to break the query up, for example query1 would return rows 1 - 18,000, query2 would return rows 18,001 - 28,000 and so on, and then merge the queries in Power BI to give me a view of all the data?
The problem is that my experience in this field, DAX in particular, is quite limited, so I don't know how to specify this in the Advanced Editor. Any help here would be highly appreciated.
Thanks!
Same Issue. Solved it.
My Need:
I have a table in Azure Log Analytics (LA) that accumulates about ~35K rows per day. I needed to get all rows from LA into Power BI for analysis.
My Issue:
I crafted the KQL query I wanted in the LA Logs web UX. I then selected the "Export -> Export to PowerBI M query" feature, pasted it into a blank query in Power BI, and authenticated. I noticed a few bizarre behaviors:
1) Like you said, I was getting a rolling ~35K rows of data; each query would trim just a bit off the first date in my KQL range.
2) I also found that for each day, the query would opportunistically trim off some rows, like it was 'guessing' what data I didn't need in order to fit into a limit.
3) No matter what KQL | where TimeGenerated >= ago(xd) clause I wrote, I clearly wasn't going to get back more than the limits it held me to.
My Solution - and it works great.
In Power Query, I created a new blank table in Power Query/M (not a DAX table!). In that table I used DateTimeZone.UtcNow() to start it off with today's date, then I added a column called [Days Back] and added rows for -1, -2, -3 ... -7. Then, with some M, I added another column that offsets today's date by [Days Back], giving me a history of dates.
Now I have a table from which I can iterate over each date in history and pass a parameter to my KQL query for: | where TimeGeneratedDate == todatetime('"& Date.ToText(TimeGeneratedDateLoop) & "')
As you can see, after I edited my main LA query to use TimeGeneratedDateLoop as a parameter, I can now get each full day's worth of records without hitting the LA limit. Note that in my case, no single day breaches the 8MB limit. If yours does, then you can attack this problem with 12-hour breakdowns instead of a full day.
Here's my final M query for the function:
NOTE: I also removed this line from the pre-generated query: "prefer"="ai.response-thinning=true" <- I don't know if it helped, but setting it to false didn't work.
let
FxDailyQuery = (TimeGeneratedDateLoop as date) =>
let
AnalyticsQuery =
let
Source = Json.Document(Web.Contents(
"https://api.loganalytics.io/v1/workspaces/xxxxx-202d-xxxx-a351-xxxxxxxxxxxx/query",
[
Query = [#"query"
= "YourLogAnalyticsTbl
| extend TimeGeneratedDate = bin(TimeGenerated, 1d)
| where notempty(Col1)
| where notempty(Col2)
| where TimeGenerated >= ago(30d)
| where TimeGeneratedDate == todatetime('"& Date.ToText(TimeGeneratedDateLoop) & "')
", #"x-ms-app" = "OmsAnalyticsPBI"],
Timeout = #duration(0, 0, 4, 0)
]
)),
TypeMap = #table({"AnalyticsTypes", "Type"}, {
{"string", Text.Type},
{"int", Int32.Type},
{"long", Int64.Type},
{"real", Double.Type},
{"timespan", Duration.Type},
{"datetime", DateTimeZone.Type},
{"bool", Logical.Type},
{"guid", Text.Type},
{"dynamic", Text.Type}
}),
DataTable = Source[tables]{0},
Columns = Table.FromRecords(DataTable[columns]),
ColumnsWithType = Table.Join(Columns, {"type"}, TypeMap, {"AnalyticsTypes"}),
Rows = Table.FromRows(DataTable[rows], Columns[name]),
Table = Table.TransformColumnTypes(Rows, Table.ToList(
ColumnsWithType,
(c) => {c{0}, c{3}}
))
in
Table
in
AnalyticsQuery
in
FxDailyQuery
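To actually pull the data, invoke the function once per row of the dates table and combine the results. A rough sketch (it assumes the dates table is named DatesBack with a date-typed [Query Date] column, and the function above is saved as FxDailyQuery; adjust the names to match your model):
let
    // call FxDailyQuery once per historical date...
    WithData = Table.AddColumn(DatesBack, "DayRows", each FxDailyQuery([Query Date])),
    // ...then stack the per-day tables into one
    Combined = Table.Combine(WithData[DayRows])
in
    Combined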
Before you downvote, I would like to state that I have looked at all of the similar questions, but I am still getting the dreaded "PRIMARY KEY column cannot be restricted" error.
Here's my table structure:
CREATE TABLE IF NOT EXISTS events (
id text,
name text,
start_time timestamp,
end_time timestamp,
parameters blob,
PRIMARY KEY (id, name, start_time, end_time)
);
And here's the query I am trying to execute:
SELECT * FROM events WHERE name = ? AND start_time >= ? AND end_time <= ?;
I am really stuck on this. Can anyone tell me what I am doing wrong?
Thanks,
Deniz
This is a query you need to remodel your data for, or use a distributed analytics platform (like Spark). id describes how your data is distributed through the database. Since it is not specified in this query, a full table scan will be required to determine the necessary rows. The Cassandra design team has decided that they would rather you not run a query at all than run a query which will not scale.
Basically, whenever you see "COLUMN cannot be restricted", it means that the query you have tried to perform cannot be done efficiently on the table you created.
To run the query anyway, use the ALLOW FILTERING clause:
SELECT * FROM analytics.events WHERE name = ? AND start_time >= ? AND end_time <= ? ALLOW FILTERING;
The "general" rule to make query is you have to pass at least all partition key columns, then you can add each key in the order they're set." So in order for you to make this work you'd need to add where id = x in there.
However, it appears what this error message is implying is that once you select 'start_time > 34' that's as far "down the chain" you're allowed to go otherwise it would require the "potentially too costly" ALLOW FILTERING flag. So it has to be "only equality" down to one < > combo on a single column. All in the name of speed. This works (though doesn't give a range query):
SELECT * FROM events WHERE name = 'a' AND start_time = 33 and end_time <= 34 and id = '35';
If you're looking for events "happening at minute y", maybe a different data model would be possible, like adding an event row for each minute the event is ongoing, or bucketing based on hour. See also https://stackoverflow.com/a/48755855/32453
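For illustration, one possible remodel along those lines, sketched only (it assumes queries always filter on an exact event name and only range over start_time):
CREATE TABLE IF NOT EXISTS events_by_name (
    name text,
    start_time timestamp,
    id text,
    end_time timestamp,
    parameters blob,
    PRIMARY KEY ((name), start_time, id)
);

-- name is the partition key and start_time the first clustering column,
-- so this range query is served efficiently without ALLOW FILTERING:
SELECT * FROM events_by_name
WHERE name = ? AND start_time >= ? AND start_time <= ?;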