MS Access: Delete all values but one in column by condition - subquery

I have a table in MS ACCESS 2013 that looks like this:
Id    Department    Status       FollowingDept  ActualArea
1000  Thinkerers    Thinking     Thinkerer      Thinkerer
1000  Drawers       OnDrawBoard  Drawers        Drawers
1000  MaterialPlan  To Plan      MaterialPlan   MaterialPlan
1000  Painters      MatNeeded    MaterialPlan
1000  Builders      DrawsNeeded  Drawers
The table tracks an ID that has to pass through five departments, each department with at least 5 different statuses.
Each status has a FollowingDept value; for example, *Department* Thinkerers has the status MoreCoffeeNow, which means *FollowingDept* Drawers.
All columns except ActualArea get their values from the feed of a query.
ActualArea is an Expr where I inserted this logic:
IIf(FollowingDept = Department, FollowingDept, "")
My logic is simple: if FollowingDept and Department coincide, then the ID's ActualArea gets the value from FollowingDept.
But as you can see, there can be rare cases like my example above, where 3 departments coincide with their FollowingDept. These cases are rare, but I would like to add something like a priority to Access.
Thinkerers has the top priority, then MaterialPlan, then Drawers, then Builders, and lastly Painters. So, following the same example, after ActualArea gets 3 values, Access should execute another query or subquery or whatever, where it evaluates each value's priority and leaves behind only the one with the top priority. In this example, Thinkerers has the top priority, so the other two values are eliminated from the ActualArea column.
Please keep in mind there are over 500 different IDs, and each ID is repeated 5 times, so there will be around 2500 records to evaluate.

You need another table with the possible values for actualArea and the priorities as numbers, and then you can select with a JOIN and order on the priority:
SELECT TOP 1 d.*, p.priority
FROM departments d
LEFT JOIN priorities p ON d.actualArea = p.dept
WHERE d.id = 1000
AND p.priority IS NOT NULL
ORDER BY p.priority ASC
The IS NOT NULL clause eliminates all of the rows where actualArea is empty. The TOP condition leaves only the row with the top priority.
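For reference, the priorities lookup might be as simple as this (the table and column names dept and priority are assumptions matching the query above; Access-style DDL, with one INSERT per row since Access doesn't accept multi-row VALUES):
CREATE TABLE priorities (
    dept TEXT(50) CONSTRAINT pk_priorities PRIMARY KEY,
    priority INTEGER
);
INSERT INTO priorities (dept, priority) VALUES ('Thinkerers', 1);
INSERT INTO priorities (dept, priority) VALUES ('MaterialPlan', 2);
INSERT INTO priorities (dept, priority) VALUES ('Drawers', 3);
INSERT INTO priorities (dept, priority) VALUES ('Builders', 4);
INSERT INTO priorities (dept, priority) VALUES ('Painters', 5);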
You don't seem to have a primary key for your table. If you can't add one, the second query below will work without it, but I would strongly advise you to go back and add a primary key to the table; it will save you an incredible amount of headache later. I added such a key to my test table, and it's called pID. This query uses that pID to remove the records you need (note the id = 1000 condition on the outer DELETE, so rows for other IDs are left alone):
DELETE FROM departments WHERE id = 1000 AND pID NOT IN (
SELECT TOP 1 d.pID
FROM departments d
LEFT JOIN priorities p ON d.actualArea = p.dept
WHERE d.id = 1000
AND p.priority IS NOT NULL
ORDER BY p.priority ASC
)
If you can't add a primary key to the data and actualArea is assumed to be unique, then you can just use the actualArea values to perform the delete:
DELETE FROM departments WHERE actualArea NOT IN (
SELECT TOP 1 d.actualArea
FROM departments d
LEFT JOIN priorities p ON d.actualArea = p.dept
WHERE d.id = 1000
AND p.priority IS NOT NULL
ORDER BY p.priority ASC
) AND id = 1000
If actualArea is not going to be unique, then we'll need to revisit this answer. This answer also assumes that you already have the id number.
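If you'd rather clean up all 500 IDs in one pass instead of one id at a time, here is an untested sketch of the same idea (it assumes the priorities table above and keeps, per id, only the row whose actualArea has the best priority; rows with an empty actualArea are untouched because they have no match in priorities, so the scalar subquery returns Null and the comparison fails):
DELETE FROM departments
WHERE (SELECT p.priority FROM priorities AS p WHERE p.dept = departments.actualArea)
> (SELECT MIN(p2.priority)
FROM departments AS d2
INNER JOIN priorities AS p2 ON d2.actualArea = p2.dept
WHERE d2.id = departments.id);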

Related

How "stable" is monotonically_increasing_id() in Spark?

I'm looking for an inexpensive way to distinguish duplicates and/or uniquely identify rows. I've been looking at the Spark built-ins monotonically_increasing_id() and uuid().
The problem with uuid() is that it does not retain its value and seems to be evaluated on the spot. For example:
with uuids as (select uuid() as uuid)
select * from uuids join uuids
produces two different UUIDs.
If I use monotonically_increasing_id(), I get two identical values, but can I trust that to always work? In other words, if I have a CTE with an id column generated by monotonically_increasing_id(), will any later rows derived from a row of that CTE see a consistent value of the id column within the same query?
In pseudo-SQL:
with /* ... */
with_ids as (select monotonically_increasing_id() as id, * from /* ... */),
/* ... */
derived_a as (/* Somehow derived from with_ids */),
derived_b as (/* Somehow derived from with_ids */)
select
(a.id == b.id) as are_same,
(a.id != b.id) as are_different
from derived_a as a
join derived_b as b
Will rows derived from the exact same rows of with_ids have are_same == true? Is it guaranteed that if the original rows were different, then are_different == true? The former is definitely false for uuid().
[Updated] Another example, involving a join and group by:
with
with_ids as (
select
monotonically_increasing_id() as id
,*
from table_a)
joined as (
select struct(a.*) as packed_a, a.id
from with_ids as a
left join table_b as b
on /* whatever */
)
select collect_set(packed_a) as should_be_singular
from joined
group by id
Is the row count in the above equal to the number of rows in table_a and is should_be_singular a single element array?
The documentation for both functions states that they are non-deterministic, but doesn't really offer any details on when the functions are evaluated or how they should be used.
The issue seems to be mentioned in SPARK-14241 and this question, but it's not clear if and under what conditions monotonically_increasing_id() is consistent within a single SQL statement.
From my past experience working with row identifiers (uuid, row_number, or monotonically_increasing_id), I cache the DataFrame.
Every subsequent calculation using the cached DataFrame will then see static row identifiers.
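A minimal sketch of that approach in Spark SQL (table and column names are assumptions):
-- Compute the ids once; CACHE TABLE ... AS SELECT materializes the query
-- (eagerly by default), so later references reuse the same rows and ids.
CACHE TABLE with_ids AS
SELECT monotonically_increasing_id() AS id, *
FROM table_a;

-- Both sides of this self-join now read the cached rows, so the ids line up.
SELECT a.id, b.id
FROM with_ids AS a
JOIN with_ids AS b ON a.id = b.id;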

Correct way to get the last value for a field in Apache Spark or Databricks Using SQL (Correct behavior of last and last_value)?

What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like it should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts linked below discuss how to get max values, which is not the same as the last value in a list ordered by a different field: I want the last team a player joined, so if the player joined the Reds, the A's, the Zebras, and the Yankees, in that order, I'm looking for the Yankees. Those posts also reach the solution procedurally using Python/R; I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
You can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
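Applied to the person/team example at the top of the question, the same pattern would be (a sketch using the question's table and column names):
select person, team as last_team
from (
  select person, team,
         row_number() over (partition by person order by date_joined desc) as rn
  from person_team
)
where rn = 1;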

sqlite combine 2 queries from different tables to make one

I recently took to using SQL again; the last time I used it was in Microsoft Access 2000, so please bear with me if I'm a little behind the times.
I have 2 pointless virtual currencies on my discord server for my players to play pointless games with. Both of these currencies' transactions are currently stored in individual tables.
I wish to sum up all the transactions for each player to give them a single current amount for each currency. Individually I can do this:
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblGorillaTears.Amount)
FROM
tblPlayers
INNER JOIN
tblGorillaTears
ON
tblPlayers.PlayerID = tblGorillaTears.PlayerID
GROUP BY
tblPlayers.PlayerID;
and
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblKebabs.Amount)
FROM
tblPlayers
INNER JOIN
tblKebabs
ON
tblPlayers.PlayerID = tblKebabs.PlayerID
GROUP BY
tblPlayers.PlayerID;
What I need is a table that outputs the user name, the ID, and the total for each currency on one row, but when I do this:
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
SUM(tblGorillaTears.Amount) AS GT,
0 as Kebabs
FROM
tblPlayers
INNER JOIN
tblGorillaTears
ON
tblPlayers.PlayerID = tblGorillaTears.PlayerID
GROUP BY
tblPlayers.PlayerID
UNION
SELECT
tblPlayers.PlayerID AS PlayerID,
tblPlayers.Name AS Name,
0 as GT,
SUM(tblKebabs.Amount)
FROM
tblPlayers
INNER JOIN
tblKebabs
ON
tblPlayers.PlayerID = tblKebabs.PlayerID
GROUP BY
tblPlayers.PlayerID;
the results end up as a row for each player for each currency. How can I make it so both currencies appear in the same row?
Previously in MS Access I was able to create two queries and then make a query of those two queries as if they were a table, but I cannot figure out how to do that in this instance. Thanks <3
UNION will add new rows for sure. You can try something like the following query instead:
SELECT TP.playerid AS PlayerID,
TP.name AS Name,
(SELECT Sum(TG.amount)
FROM tblgorillatears TG
WHERE TG.playerid = TP.playerid) AS GT,
(SELECT Sum(TK.amount)
FROM tblkebabs TK
WHERE TK.playerid = TP.playerid) AS Kebabs
FROM tblplayers TP
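The "query of two queries" approach you remember from Access also works in SQLite: join two aggregate subqueries to tblPlayers. A sketch (COALESCE covers players with no rows in one of the tables):
SELECT
    P.PlayerID,
    P.Name,
    COALESCE(GT.Total, 0) AS GorillaTears,
    COALESCE(K.Total, 0) AS Kebabs
FROM tblPlayers P
LEFT JOIN (SELECT PlayerID, SUM(Amount) AS Total
           FROM tblGorillaTears
           GROUP BY PlayerID) GT ON GT.PlayerID = P.PlayerID
LEFT JOIN (SELECT PlayerID, SUM(Amount) AS Total
           FROM tblKebabs
           GROUP BY PlayerID) K ON K.PlayerID = P.PlayerID;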

How to compare one row to others in DAX in Excel

I have this table, which has foreign keys from several other tables:
Basically, this table shows which students registered in which module run by which teacher in what term.
I want to query the following:
How many students have registered for more than one module run by a given tutor?
It will look something like this:
For example, Vasiliy Kuznetsov runs two modules: FunPro and NO. If one student registers for both of them, he is counted as one.
My SQL-oriented mind is telling me this: count all the rows in which the student_id and tutor_id pair is the same. For example, if in one row student_id is 5 and tutor_id is 10, and the same is true for the third row, then I count them as one.
How can I do that with DAX formulas?
RowCount:=
COUNTROWS( ModuleRegistration )
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
FILTER(
VALUES( ModuleRegistration[Student_ID] )
,[RowCount] >= 2
)
)
I refer to arguments positionally, thus the first argument to a function is (1), the second (2), and so on.
So, [RowCount] is trivial.
[StudentsWithTwoOrMoreRegistrations] is a bit more involved. DAX, being a functional language, is best understood inside-out.
FILTER() takes a table expression in (1) and evaluates a boolean predicate, (2), for each row in (1). It returns all rows from (1) for which (2) evaluates to true.
Our FILTER()'s (1) is VALUES( ModuleRegistration[Student_ID] ). VALUES() returns the unique rows from a field based on current filter context (it respects slicers and filters in the pivot table). Thus, we will return some subset of the unique list of [Student_ID]s.
Our FILTER()'s (2) is [RowCount] >= 2. For each [Student_ID] in (1), we'll evaluate [RowCount], checking how many times that student appears in ModuleRegistration. [RowCount] is evaluated in the combination of filter context from the pivot table (the [Faculty Name] field in your sample pivot provides filter context) and row context from FILTER()'s (1). Thus it counts how many times the student appears in ModuleRegistration for the [Faculty Name] on the pivot table row.
We check that [RowCount] is >= 2.
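In SQL terms, ignoring the pivot table's filter context, this first measure is roughly the following sketch:
SELECT COUNT(*)
FROM (
    SELECT Student_ID
    FROM ModuleRegistration
    GROUP BY Student_ID
    HAVING COUNT(*) >= 2
) s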
You've not indicated if your measure needs to handle grand totals, or how you might want to see that. If you need more help for the grand total to get it to behave the way you like, let me know.
Edit for grand total
There are a few ways you might want to handle grand totals. I'm going to assume that you want a unique count of students.
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
SUMMARIZE(
FILTER(
SUMMARIZE(
ModuleRegistration
,ModuleRegistration[Tutor_ID]
,ModuleRegistration[Student_ID]
)
,[RowCount] >= 2
)
,ModuleRegistration[Student_ID]
)
)
WTF happened to our measure?
Let's examine:
Starting with the innermost SUMMARIZE(). SUMMARIZE() navigates relationships outward from the table in (1) and groups by the columns listed in (2)-(N) (these don't have to be from the table in (1), but must be reachable by navigating relationships).
This is equivalent to the following in SQL:
SELECT DISTINCT
mr.Tutor_ID
,mr.Student_ID
FROM ModuleRegistration mr
We use FILTER() on this table like earlier. [RowCount] is evaluated in the combination of filter context from the pivot table and the row in the table, defined by our SUMMARIZE() above.
Now our row context is instead of just a student, a student-tutor pair. This pair will have a [RowCount] >= 2 when the student has taken more than one module from a tutor.
Our FILTER() returns the pairs which have a [RowCount] >= 2. This output table has two fields, [Tutor_ID] and [Student_ID], but we want to count distinct [Student_ID]s out of this.
Thus, we use the table from FILTER() as our (1) in the outer SUMMARIZE(). We group only by the values of [Student_ID]. We then count the rows of this table.
When only one [Faculty_Name] is in context, e.g. on a pivot table row, then our inner SUMMARIZE() is grouping by a single value of [Tutor_ID] and whatever [Student_ID]s are associated with it. This is identical to our earlier measure.
When we have many [Tutor_ID]s in context, like in the grand total, then we'll see the appropriate behavior of only counting each [Student_ID] once.
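In the same SQL terms, the grand-total version of the measure is roughly this sketch:
SELECT COUNT(*)
FROM (
    SELECT Student_ID
    FROM (
        SELECT Tutor_ID, Student_ID, COUNT(*) AS RowCount
        FROM ModuleRegistration
        GROUP BY Tutor_ID, Student_ID
    ) pairs
    WHERE RowCount >= 2
    GROUP BY Student_ID
) students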

How to debug "Each GROUP BY expression must contain at least one column that is not an outer reference error"

Since SSRS doesn't allow filters on aggregates, I found some code which helped me come up with the below query. However, when I run it I get:
Each GROUP BY expression must contain at least one column that is not an outer reference
I have searched everywhere but can't find how to fix this. I've even removed the two extra tables from the query so there were no joins at all. I need to not return any order where the total of the lines on the order is less than $500 and greater than 0.
SELECT
tdsls041_sales_order_lines.company,
tdsls041_sales_order_lines.order_number,
tdsls041_sales_order_lines.amount,
tdsls041_sales_order_lines.item,
tdsls041_sales_order_lines.container
FROM
tdsls041_sales_order_lines AS tdsls041_sales_order_lines
WHERE
(tdsls041_sales_order_lines.company = 610) AND
(tdsls041_sales_order_lines.order_number IN
(SELECT
tdsls041_sales_order_lines.order_number
FROM
tdsls041_sales_order_lines AS tdsls041_sales_order_lines_1
GROUP BY
tdsls041_sales_order_lines.order_number
HAVING
(SUM(tdsls041_sales_order_lines.amount) <= 500) OR
SUM(tdsls041_sales_order_lines.amount) > 0))
The issue SQL Server is complaining about is that the grouping wants an aggregate function in the SELECT statement, but you want to use IN, which needs a plain list of order numbers. Note also that your subquery aliases the table as tdsls041_sales_order_lines_1, yet the SELECT, GROUP BY, and HAVING all reference tdsls041_sales_order_lines, i.e. the outer table, so every GROUP BY expression is an outer reference, which is exactly what the error describes.
You just need to add an aggregate function to your subquery, reference the subquery's own alias consistently, and then add another layer to select just the order numbers from that.
SELECT T1.company, T1.order_number, T1.amount, T1.item, T1.container
FROM tdsls041_sales_order_lines AS T1
WHERE (T1.company = 610) AND (T1.order_number IN
(SELECT order_number FROM
(SELECT TSOL.order_number, SUM(TSOL.amount) AS TTL
FROM tdsls041_sales_order_lines AS TSOL
GROUP BY TSOL.order_number
HAVING (SUM(TSOL.amount) <= 500) OR
SUM(TSOL.amount) > 0) AS T2) )
You can filter on aggregates in charts and tables. You have to put the aggregate filter on your group instead of on the table itself (Group Properties -> Filters tab).
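For example (a sketch; exact UI labels vary by SSRS version, and the field name is an assumption), two filter rows on the group would keep orders totalling more than $0 and at most $500:
Expression: =Sum(Fields!amount.Value)   Operator: >    Value: 0
Expression: =Sum(Fields!amount.Value)   Operator: <=   Value: 500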
