I have a event list where it's info being collected from the web form. Sometimes people click submit and are likely to get duplicates.
Sometimes people dont remember whether they submitted already, so some may resubmit.
Sometime people cancel the event and again book that was previously cancelled.
Sometimes people simply Cancel and submit for fun
Hence I need to find a way to efficiently retrieve the list of bookings those are latest.
The list is arranged as they are submitted and hence the last one is the most recent submissions and is considered for processing.
Here is the link for testing purposes
In the expected output you see that last booking is considered (Mark) and there are no duplicates and the cancelled events are eliminated (Clay Dunn, Malcolm etc). rebookings are considered even if they are previously cancelled (Myrtle Todd)
I have tried with
Based on this data in Sheet1:
Try this on another sheet, cell A1 (it does not rely on your helper column B for any sort):
=arrayformula(
{Sheet1!B2:F2;
query(
iferror(
vlookup(unique(lower(Sheet1!F3:F)),{query({Sheet1!F3:F,row(Sheet1!F3:F),Sheet1!B3:F},"where Col1 is not null order by Col1,Col2 desc",0)},{3,4,5,6,7,2},)
,),"select Col1,Col2,Col3,Col4,Col5 where Col2 contains 'Booking' order by Col6 ",0)
})
Or if you don't want the row numbers in the helper column:
=arrayformula(
{Sheet1!C2:E2,"";
query(
iferror(
vlookup(unique(lower(Sheet1!F3:F)),{query({Sheet1!F3:F,row(Sheet1!F3:F),Sheet1!C3:F},"where Col1 is not null order by Col1,Col2 desc",0)},{3,4,5,6,2},)
,),"select Col1,Col2,Col3,Col4 where Col1 contains 'Booking' order by Col5 ",0)
})
It uses query, vlookup to get the latest record for each value in Col F, then omits anything but 'Booking'.
please try:
=SORT(SORTN(SORT(A3:E16,1,),2^99,2,5,1))
Notes:
SORTN
article
Filters are needed to be applied after SORTN to exclude last cancelled results. I suggest using query for this:
=QUERY(SORT(SORTN(SORT(A3:E16,1,),2^99,2,5,1)),"select * where Col2 != 'Cancelling'")
https://docs.google.com/spreadsheets/d/1qA_dlaFL3FQQdPKOBZtNaZ6k1x0Iv3AiQoHwtPSQO70/copy
Related
So Deduping is one of the basic and imp Datacleaning technique.
There are a number of ways to do that in dataflow.
Like myself doing deduping with help of aggregate transformation where i put key columns(Consider "Firstname" and "LastName" as cols) which are need to be unique in Group by and a column pattern like name != 'Firstname' && name!='LastName'
$$ _____first($$) in aggregate tab.
The problem with this method is ,if we have a total of 200 cols among 300 cols to be considered as Unique cols, Its a very tedious to do include 200 cols in my column Pattern.
Can anyone suggest a better and optimised Deduping process in Dataflow acc to the above situation?
I tried to repro the deduplication process using dataflow. Below is the approach.
List of columns that needs to be grouped by are given in dataflow parameters.
In this repro, three columns are given. This can be extended as per requirements.
Parameter Name: Par1
Type: String
Default value: 'col1,col2,col3'
Source is taken as in below image.
(Group By columns: col1, col2, col3;
Aggregate column: col4)
Then Aggregate transform is taken and in group by,
sha2(256,byNames(split($Par1,','))) is given in columns and it is named as groupbycolumn
In Aggregates, + Add column pattern near column1 and then delete Column1. Then Enter true() in matching condition. Then click on undefined column expression and enter $$ in column name expression and first($$) in value expression.
Output of aggregation function
Data is grouped by col1,col2 and col3 and first value of col4 is taken for every col1,col2 and col3 combination.
Then using select transformation, groupbycolumn from above output can be removed before copying to sink.
Reference: ** MS document** on Mapping data flow script - Azure Data Factory | Microsoft Learn
I have very awkward database(4mln rows), structured like this:
id, start , suspension, renewal , stop , delete , category
1, 01-01-2019, 01-02-2019, 01-03-2019, 01-04-2019, 02-04-2019, A
2, 01-01-2019, , , 01-04-2019, 02-04-2019, B
I can create nice Pivot Table, with date of lets say "suspension" as Column, number of unique id's as Values, Category as filter, and then see how many Items were suspended each day.
But I need to see as well, how many Items were active each day - that means Started, not Stopped or Deleted, and Renewed if Suspended.
Unfortunately it is not as simple as "substract number of deleted from started each day".
So when I filter category A in pivot table, Excel checks every record if category == "A", right?
Can I do something similar, to check if start <= date & (stop or delete >= date) & if(suspension <= date){renewal <= date} for each row, for each date?
Try using a calculated field. U can put the formula almost directly in there. May need some tweaking.
https://support.office.com/en-us/article/calculate-values-in-a-pivottable-11f41417-da80-435c-a5c6-b0185e59da77
All I've done a lot of background reading on this on and off for the past day or so and I'm none the wiser on how to achieve this.
I have looked on this site and found ways of concatenating multiple rows into one column however what I'm after is a bit more bespoke, please help...
I have two tables - one is a list of people and details about them such as name etc and a person reference.
The second contains a number of alerts about a person, one person can have multiple alerts. This would contain a person reference and the type of alert they have in a string.
I want to join these two tables using an inner join on the person reference.
I next want to find all of the alerts for each person and concatenate it into a string and show this as the "All alerts" column.
So I will end up with the following output:
First Name | Surname | All Alerts
-----------+---------+--------------------------
Tony | Stark | Alert 1, Alert 2, Alert 3
I can get as far as going through all of the alerts in the alerts table and put the alerts for every person into a string, obviously I need a concatenated value for each person, and haven't figured out how to do this.
I've spent a day on this and looked into XMLPath solutions and using CTE, CROSS APPLY and subqueries to specify the where clause. I am a little lost.
DECLARE #ConcatenatedVals VARCHAR (255)
SET #ConcatenatedVals =
(
DECLARE #AllAlerts VARCHAR(8000)
SELECT #AllAlerts = COALESCE(#AllAlerts + ', ', '') + personAlert
FROM Alerts
SELECT #AllAlerts AS 'All Alerts'
)
I found the solution for this posted here:
https://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/#Toc205129480
This referenced a solution by Adam Machanic
Also, stuff is actually a function, not seen this before:
SELECT p1.CategoryId,
stuff( (SELECT ','+ProductName
FROM Northwind.dbo.Products p2
WHERE p2.CategoryId = p1.CategoryId
ORDER BY ProductName
FOR XML PATH(''), TYPE).value('.', 'varchar(max)')
,1,1,'')
AS Products
FROM Northwind.dbo.Products p1
GROUP BY CategoryId ;
I have this table which has foreign keys from several other keys:
Basically, this table shows which students registered in which module run by which teacher in what term.
I want to query the following:
How many students have registered for more than one module run by a given tutor?
It will look something like this:
For example, Vasiliy Kuznetsov runs two modules: FunPro and NO. If one student registers for both of them, he is counted as one.
My sql oriented mind is telling me this: Count all the rows in which student_id and tutor_id are the same. For example, in one row student_id is 5 and tutor_id is 10, and the same is true for the third row. Then, I count it as one.
How can I do that with DAX formulas?
RowCount:=
COUNTROWS( ModuleRegistration )
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
FILTER(
VALUES( ModuleRegistration[Student_ID] )
,[RowCount] >= 2
)
)
I refer to arguments positionally, thus the first argument to a function is (1), the second (2), and so on.
So, [RowCount] is trivial.
[StudentsWithTwoOrMoreRegistrations] is a bit more involved. DAX, being a functional language, is best understood inside-out.
FILTER() takes a table expression in (1) and evaluates a boolean predicate, (2), for each row in (1). It returns all rows from (1) for which (2) evaluates to true.
Our FILTER()'s (1) is VALUES( ModuleRegistration[Student_ID] ). VALUES() returns the unique rows from a field based on current filter context (it respects slicers and filters in the pivot table). Thus, we will return some subset of the unique list of [Student_ID]s.
Our FILTER()'s (2) is [RowCount] >= 2. For each [Student_ID] in (1), we'll evaluate [RowCount], checking how many times that student appears in ModuleRegistration. [RowCount] is evaluated in the combination of filter context from the pivot table (the [Faculty Name] field in your sample pivot provides filter context) and row context from FILTER()'s (1). Thus it counts how many times the student appears in ModuleRegistration for the [Faculty Name] on the pivot table row.
We check that [RowCount] is >= 2.
You've not indicated if your measure needs to handle grand totals, or how you might want to see that. If you need more help for the grand total to get it to behave the way you like, let me know.
Edit for grand total
There are a few ways you might want to handle grand totals. I'm gong to assume that you want a unique count of students.
StudentsWithTwoOrMoreRegistrations:=
COUNTROWS(
SUMMARIZE(
FILTER(
SUMMARIZE(
ModuleRegistration
,ModuleRegistration[Tutor_ID]
,ModuleRegistration[Student_ID]
)
,[RowCount] >= 2
)
,ModuleRegistration[Student_ID]
)
)
WTF happened to our measure?
Let's examine:
Starting with the innermost SUMMARIZE(). SUMMARIZE() navigates relationships outward from the table in (1) and groups by the columns listed in (2)-(N) (these don't have to be from the table in (1), but must be reachable by navigating relationships).
This is equivalent to the following in SQL:
SELECT
mr.Tutor_ID
,mr.Student_ID
FROM ModuleRegistration mr
We use FILTER() on this table like earlier. [RowCount] is evaluated in the combination of filter context from the pivot table and the row in the table, defined by our SUMMARIZE() above.
Now our row context is instead of just a student, a student-tutor pair. This pair will have a [RowCount] >= 2 when the student has taken more than one module from a tutor.
Our FILTER() returns the pairs which have a [RowCount] >= 2. This output table has two fields, [Tutor_ID] and [Student_ID], but we want to count distinct [Student_ID]s out of this.
Thus, we use the table from FILTER() as our (1) in the outer SUMMARIZE(). We group only by the values of [Student_ID]. We then count the rows of this table.
When only one [Faculty_Name] is in context, e.g. on a pivot table row, then our inner SUMMARIZE() is grouping by a single value of [Tutor_ID] and whatever [Student_ID]s are associated with it. This is identical to our earlier measure.
When we have many [Tutor_ID]s in context, like in the grand total, then we'll see the appropriate behavior of only counting each [Student_ID] once.
I have the following data model:
Record: Id, ..., CreateDate
FactA: RecordId, CreateDate
FactB: RecordId, CreateDate
Relationships exist from FactA to Record and FactB to Record.
I've written measures on Records such as this with no issues:
FactA's:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA)
FactB's:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactB)
Now I'd like a count of Records with FactA but no FactB, in SQL I'd do a LEFT JOIN WHERE FactB.RecordId IS NULL but I can't figure out how to do similar in DAX. I've tried:
-- this returns blank, presumably because when there is a FactB then RecordId isn't blank, and when there is no Fact B then RecordId a NULL which isn't blank either
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA, FILTER(FactB, ISBLANK([RecordId])))
-- this returns the long "The value for columns "RecordId" in table "FactB" cannot be determined in the current context" error.
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FILTER(FactA, ISBLANK(FactB[RecordId])))
I've also tried various ways of using RELATED and RELATEDTABLE but I don't really understand enough about DAX and context to know what I'm doing.
Can someone explain how I can write the calculated measure to count Records with FactA but no FactB?
Thanks in advance.
Edit - Workaround
I've come up with this, it looks correct so far but I'm not sure if it is the generally correct way to do this:
-- Take the count with FactA and subtract the count of (FactA and FactB)
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[Id]), FactA) - CALCULATE(DISTINCTCOUNT(Records[Id]), FactA, FactB)
Here's an alternative, that might still not be the best way of doing it:
FactA_No_FactB:=CALCULATE(DISTINCTCOUNT(Records[ID]), FILTER(Records,CONTAINS(FactA, FactA[RecordID],Records[ID]) && NOT(CONTAINS(FactB,FactB[RecordID],Records[ID]))))
The difference between my version and yours is that mine returns a value of 1 for those items in and A but not B and BLANK for everything else. Your version returns 1 for those items in A but not B, 0 for those in both A and B and BLANK for everything else. Depending on your use case, one outcome may be prefereable over the other.