Count values for each row with a unique ID - excel

I have a bunch of rows in a table. Each row reflects an event in a patient. However, one patient can have experienced multiple events, so there can be multiple rows with the same patient number. Now I'd like to count the number of male patients in my database without counting patients who had multiple events more than once. Each patient is identified by a unique patient ID that could be used for this.
This shouldn't be all that complicated, except that I'm using a table that also has several filters, so I need to use SUBTOTAL for any counting functions.
I literally have no idea where to start, so I can't really provide any code...
Any function that could point me in the right direction would be greatly appreciated.
Thanks for the help.
~Laurens

Use a PivotTable to filter and count the patients in your database. Select your data and choose Insert -> Tables -> PivotTable. Put your filters in the Filters area and the patient ID in the Rows area. Then you can use COUNT to get the number of patients.
For more information about Pivot Tables, you can check this: https://support.office.com/en-us/article/Create-a-PivotTable-to-analyze-worksheet-data-a9a84538-bfe9-40a9-a8e9-f99134456576

To get the number of unique IDs in a single column when the IDs are numeric, you can use SUM with FREQUENCY:
=SUM(IF(FREQUENCY($A$1:$A$1000,$A$1:$A$1000)>0,1))
If the IDs are a mix of text and numbers, you can count the unique ones with this formula:
=SUM(IF(FREQUENCY(MATCH($A$1:$A$1000,$A$1:$A$1000,0),MATCH($A$1:$A$1000,$A$1:$A$1000,0))>0,1))
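To see why the MATCH/FREQUENCY version counts each ID only once, here is a rough Python sketch of the same steps (the helper name and sample IDs are made up; this is not an Excel API). MATCH(x, range, 0) turns every ID into the row number of its first occurrence, so duplicates share a number; FREQUENCY then reports a non-zero count only once per distinct number, and SUM(IF(...>0,1)) counts those.

```python
from collections import Counter


def count_unique_ids(ids):
    """Rough emulation of =SUM(IF(FREQUENCY(MATCH(r,r,0),MATCH(r,r,0))>0,1))."""
    first_row = {}
    positions = []
    for row, value in enumerate(ids, start=1):
        # MATCH(value, range, 0): 1-based row of the first occurrence
        first_row.setdefault(value, row)
        positions.append(first_row[value])
    # FREQUENCY step: how many rows share each first-occurrence position
    tally = Counter(positions)
    # SUM(IF(... > 0, 1)) step: count the positions with a non-zero tally
    return sum(1 for count in tally.values() if count > 0)


print(count_unique_ids([101, "P7", 101, 102, "P7"]))  # 3
```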
(From here)

Here you go
You've not mentioned whether an event is optional.
You might want to add an extra column H with a formula like H2=IF(C2="",0,1), giving 1 or 0, and multiply by it in column G as well.
Basically, if column G contains a 1, you include that row.
Here's what the results of the formula look like:
Revision
The table is sorted by patient ID.
When the patient ID changes, column H contains a 1; otherwise it contains 0.
So H2 is hard-coded to 1, and H3, H4 and H6 evaluate to 1.
So now G2=H2*E2, and so on. You can filter by column H.
The beauty of mapping things to binary zeros and ones is that you can multiply them to get a logical AND, whilst at the same time breaking a complex task into a series of steps. You can then apply a filter to the data to get the rows where column G is not zero and see the total count. Normally I'd insert a row between the header and the data (as row 2) and then have G2=SUM(G3:G9).
Sum column H for number of patients.
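The helper-column idea can be sketched in Python as below. The (patient_id, sex) layout and the "M" code are assumptions for illustration: the first-row flag plays the role of column H, and the male flag stands in for whichever 1/0 column (such as E) marks male patients. It also assumes a patient's sex is the same on every one of their rows, since only the first row per patient is checked.

```python
def count_unique_male_patients(events):
    """events: list of (patient_id, sex) tuples, one per event (hypothetical layout)."""
    # The answer sorts the table by patient ID first
    ordered = sorted(events, key=lambda event: event[0])
    total = 0
    previous_id = None
    for row_number, (patient_id, sex) in enumerate(ordered):
        # Column H: 1 on the first row of each patient, 0 otherwise
        new_patient = 1 if row_number == 0 or patient_id != previous_id else 0
        # 1/0 flag for male patients (stand-in for the answer's column E)
        is_male = 1 if sex == "M" else 0
        # Column G = H * E: 1 only on the first row of a male patient (logical AND)
        total += new_patient * is_male
        previous_id = patient_id
    return total


events = [(3, "M"), (1, "F"), (1, "F"), (2, "M"), (2, "M"), (3, "M")]
print(count_unique_male_patients(events))  # 2
```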

Related

Excel - Create dataframe out of raw data with multiple columns

In this question I will be writing some text, but also referring to a picture linked below.
I have raw data, with indicators (can be more than 4) always in column C.
In the raw data, I also have Company IDs (can be more than 2) always in row 5.
When these cross, they get some values. For instance, Company ID 1's value on indicator1 is 10.
I also have a list which takes the unique Company IDs on row 5.
I would like to create a dataframe that puts these values in table form, with each row representing a combination of indicator, company and the actual value, as shown in "desired outcome".
I would also like that table to be dynamic with how many indicators are in column C and how many companies are in row 5.
I need help with which formula to put in the three columns in desired outcome.
I have tried some combinations of XLOOKUP, INDEX/MATCH and BYROW, but I can't seem to find a solution.
In advance, thank you :)
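To make the desired outcome concrete: unpivoting an m-by-n grid gives m * n output rows, and output row k (counting from 0) takes indicator k // n and company k mod n. A small Python sketch of that mapping follows; the function name and every value except Company ID 1's value of 10 on indicator1 are made up. A dynamic Excel formula would need the same index arithmetic, driven by the number of entries in column C and row 5.

```python
def unpivot(indicators, company_ids, values):
    """Turn a grid into (indicator, company_id, value) rows.

    indicators: labels down column C; company_ids: IDs across row 5;
    values[i][j]: the value where indicator i meets company j.
    """
    n = len(company_ids)
    rows = []
    for k in range(len(indicators) * n):
        i, j = divmod(k, n)  # output row k -> indicator i, company j
        rows.append((indicators[i], company_ids[j], values[i][j]))
    return rows


grid = [[10, 20], [30, 40]]  # hypothetical values; only the 10 comes from the question
for row in unpivot(["indicator1", "indicator2"], [1, 2], grid):
    print(row)
```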

Power Query: Merge Only Unique Values Once

I am trying to figure out how to go about merging two tables and only having the matching rows in the first table appear once with a unique value from the second table.
I have two tables. For the sake of this question, I put everything into one picture here. I have an Orders table which is green. I have several order numbers for specific parts for each order. Each order is sorted by a priority for the order; 1 being the highest priority which will need to be fulfilled first.
In blue is my Stock table. This is what is currently being held in the warehouse. Each part has a serial number and some parts are located on different shelf locations.
The non-colored table is what I want my end result to be using Power Query. The orders with the highest priority get filled with the first serial number available in the inventory. Then the next order is filled with the next serial number and so on until there is no inventory left; in which case the query will just show blank.
For the sake of the question, this is what the tables look like before my attempts to merge.
I have been trying all types of merges and sorting combinations, but no matter what I do, I end up with duplicates, entire order rows removed, or incorrect priority fulfillment. I have a working array formula in Excel, which I can post if it helps, but since we are moving towards Power Query, I really want to learn how to do this if possible. I am sure there is something simple and logical about this task that I am overlooking, so any assistance would help. Thank you.
If you create a grouped index column you can then merge on that.
For both your Order table and your Stock table, perform a Group By operation on the PART# column.
New Column Name = All, Operation = All Rows
While the table is still grouped, add a new custom column named GROUP_INDEX using this formula:
Table.AddIndexColumn([All],"Index",1)
You can now delete the All column. Expand the GROUP_INDEX column to bring back the other columns (all except PART#, which is already there).
Now you can merge both tables. Make sure your Order table is the left (upper) table, merge on both the PART# and Index columns, and select Left Outer as the join kind.
Output:
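Outside Power Query, the grouped-index merge can be sketched in plain Python. The column names ORDER#, PRIORITY and SERIAL# are hypothetical (only PART# comes from the answer), and the orders are sorted by priority before numbering, on the assumption that priority 1 is filled first, as in the question. Numbering each part's rows 1, 2, 3, ... in both tables means the nth order for a part can only meet the nth serial number for that part, so the merge cannot produce duplicates.

```python
from collections import defaultdict


def add_group_index(rows, key):
    """Number rows 1, 2, 3, ... within each key (Group By + Table.AddIndexColumn)."""
    counters = defaultdict(int)
    indexed = []
    for row in rows:
        counters[row[key]] += 1
        indexed.append({**row, "Index": counters[row[key]]})
    return indexed


def allocate(orders, stock):
    # Assumption: PRIORITY 1 is filled first; sorted() is stable, so ties keep their order
    orders = add_group_index(sorted(orders, key=lambda r: r["PRIORITY"]), "PART#")
    stock = add_group_index(stock, "PART#")
    serial_by_key = {(s["PART#"], s["Index"]): s["SERIAL#"] for s in stock}
    # Left outer join on PART# and Index: orders with no stock left get None (blank)
    return [
        {**order, "SERIAL#": serial_by_key.get((order["PART#"], order["Index"]))}
        for order in orders
    ]


orders = [
    {"ORDER#": "A", "PART#": "P1", "PRIORITY": 2},
    {"ORDER#": "B", "PART#": "P1", "PRIORITY": 1},
    {"ORDER#": "C", "PART#": "P1", "PRIORITY": 3},
    {"ORDER#": "D", "PART#": "P2", "PRIORITY": 1},
]
stock = [
    {"PART#": "P1", "SERIAL#": "S1"},
    {"PART#": "P1", "SERIAL#": "S2"},
    {"PART#": "P2", "SERIAL#": "S9"},
]
for row in allocate(orders, stock):
    print(row["ORDER#"], row["SERIAL#"])
```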

How to count values in a data range with multiple filters

I need to create a module that will count the number of values in specified date ranges, with other criteria.
For example, I have a list of products (products A, B, C, D) in column C, and their sale date in column G.
I need to count all of product A sold before 1/1/1998. I then need to calculate product A sold between 1/1/1998 and 1/1/2005 etc.
I need to be able to run this for all the types of products, and group products together.
E.g. count all of product A & B sold before 1/1/1998.
This has to be done for a new workbook on a weekly basis so ideally needs to be able to be run for a new workbook each week. The tab names always remain the same.
Any help on how to get started would be appreciated
This answer assumes that your dates are entered as Excel dates in column G and not as text. You can test this with the formula =ISNUMBER(G3), where G3 is one of your dates. If it returns TRUE, your date is stored properly for use by Excel formulas and this answer.
=SUMPRODUCT((($C$1:$C$100="A")+($C$1:$C$100="B"))*($G$1:$G$100<DATE(1998,1,1)))
That is how to hard code it. Personally I would build a table. Each row of the table would be a product you are interested in knowing the count for and a sum of the count would give you combined totals. Repeat the table if you need multiple combinations.
In the following example, each product was counted individually, and the total for all the listed products was then added up at the bottom. The formula for the example, entered in column L next to the first product and copied down for each product, was:
=SUMPRODUCT(($C$2:$C$9=$J4)*($G$2:$G$9<K4))
The total at the bottom of the table was a simple SUM formula. Because SUMPRODUCT performs array-like operations, avoid full-column references and restrict it to the data that needs to be checked. Otherwise you may notice a slowdown, as many unnecessary calculations are being performed.
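The SUMPRODUCT pattern is easy to mirror in Python, which also shows why it works: each comparison gives TRUE/FALSE (1/0), adding the product tests acts as OR (a row has only one product, so the sum never exceeds 1), and multiplying by the date test acts as AND. In the sketch, set membership plays the role of the added product tests. The function names and sample data are made up, and the "between" version assumes the start date is inclusive and the end date exclusive, which the question does not specify.

```python
from datetime import date


def count_sold_before(products, sale_dates, wanted, cutoff):
    """Like =SUMPRODUCT(((C="A")+(C="B"))*(G<cutoff)) for the products in `wanted`."""
    return sum(
        (product in wanted) * (sold < cutoff)  # True/False multiply like 1/0
        for product, sold in zip(products, sale_dates)
    )


def count_sold_between(products, sale_dates, wanted, start, end):
    """Assumes start is inclusive and end is exclusive."""
    return sum(
        (product in wanted) * (start <= sold < end)
        for product, sold in zip(products, sale_dates)
    )


products = ["A", "B", "C", "A", "D", "B"]  # column C (made-up sample)
sale_dates = [date(1997, 5, 1), date(1996, 1, 1), date(1990, 1, 1),
              date(2000, 3, 1), date(1997, 1, 1), date(2004, 12, 31)]  # column G
print(count_sold_before(products, sale_dates, {"A", "B"}, date(1998, 1, 1)))  # 2
print(count_sold_between(products, sale_dates, {"A"}, date(1998, 1, 1), date(2005, 1, 1)))  # 1
```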

Count number of unique combinations of two columns

I have a spreadsheet of statistics from sports games over a season, for different leagues - each row holds a single event that happened in a game, such as a penalty. There are many rows of events for each individual game. One of the columns is the league, another is the home team and another is the away team. How can I count the total number of games in a given league? In other words, I would need to count the number of unique pairs of strings from Home and Away, where League = "Ligue 1".
EDIT
I have tried:
=SUMPRODUCT(1/(COUNTIFS(E2:E81078,"Ligue 1",F2:F81078,F2:F81078,G2:G81078,G2:G81078)))
which returns a #DIV/0! error (it does work if I don't include the column E = "Ligue 1" criterion).
This is similar to your formula but deals with the division by zero
=SUM(IFERROR((1/COUNTIFS(E2:E81078,"Ligue 1",F2:F81078,F2:F81078,G2:G81078,G2:G81078)),0))
Enter it with Ctrl+Shift+Enter rather than just Enter. If done correctly, you will see {} around the formula.
Try not to use ranges that are bigger than your data, because they slow these kinds of formulas down significantly.
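A Python sketch of what the IFERROR version computes (the (league, home, away) tuples are a hypothetical stand-in for columns E, F and G): every Ligue 1 row of a game is counted n times by COUNTIFS, so adding 1/n over those rows contributes exactly 1 per home/away pair, while rows from other leagues get 1/0, which IFERROR turns into 0. One caveat carries over from the formula: a row from another league whose home/away pair also occurs in Ligue 1 would add to the total, which cannot happen as long as team names are unique to a league. The nested loop also shows why these formulas slow down on large ranges, since every row is compared with every other row.

```python
from fractions import Fraction


def count_league_games(rows, league):
    """rows: (league, home, away) per event, mirroring columns E, F and G."""
    total = Fraction(0)  # exact fractions here; Excel itself adds floating-point 1/n
    for _league, home, away in rows:
        # COUNTIFS(E, league, F, home, G, away) for this row
        n = sum(1 for lg, h, a in rows if lg == league and h == home and a == away)
        # IFERROR(1/n, 0): rows with no match in the league contribute 0
        total += Fraction(1, n) if n else 0
    return total


rows = [
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "Lyon", "PSG"),
    ("Serie A", "Roma", "Lazio"),
    ("Ligue 1", "PSG", "Lyon"),
]
print(count_league_games(rows, "Ligue 1"))  # 2
```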
Update
This might also work if your data is ordered the way you show in your question. It counts the number of times the home team changes in the Ligue 1 data:
=SUMPRODUCT((F3:F81079<>F2:F81078)*(E2:E81078="Ligue 1"))
Note that the ranges in column F are offset by one row
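The change-counting formula can be sketched like this, using the same hypothetical (league, home, away) rows. Because F3:F81079 runs one row past the data, the last row is compared with a blank cell, so each game ends with exactly one change as long as each game's rows are contiguous. Two consecutive games with the same home team would be counted as one, which is why this only works for data ordered as in the question.

```python
def count_home_team_changes(rows, league):
    """Mirrors =SUMPRODUCT((F3:F81079<>F2:F81078)*(E2:E81078="Ligue 1"))."""
    total = 0
    for i, (row_league, home, _away) in enumerate(rows):
        # F3:F81079 runs one row past the data, so the last row meets a blank cell
        next_home = rows[i + 1][1] if i + 1 < len(rows) else ""
        # (next home <> this home) * (this row is in the league)
        total += (next_home != home) * (row_league == league)
    return total


rows = [
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "Lyon", "PSG"),
    ("Serie A", "Roma", "Lazio"),
    ("Ligue 1", "Nice", "Lens"),
]
print(count_home_team_changes(rows, "Ligue 1"))  # 3
```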
You can do this with a Pivot Table.
Add a "helper" column where you concatenate the two teams, preferably with a delimiter in between, eg:
=CONCATENATE(B2, "|", C2)
Use, for example, Teams as the column header.
Then choose Insert ► PivotTable and be sure to check Add this data to the Data Model.
This adds the Distinct Count option to the Value Field Settings.
Then drag League to the Rows area and Teams to the Values area, and select Distinct Count in the Value Field Settings.
You might get a table similar to below, which you can format in many different ways:
[Excel screenshot]
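The helper column plus Distinct Count can be sketched in Python as a set of "home|away" keys per league (the rows and function name are hypothetical). The sketch also shows why the delimiter matters: without it, home team "AB" playing "C" and home team "A" playing "BC" would both become "ABC".

```python
from collections import defaultdict


def distinct_games_per_league(rows):
    """rows: (league, home, away) per event; returns {league: distinct home|away pairs}."""
    keys = defaultdict(set)
    for league, home, away in rows:
        # Helper column: =CONCATENATE(home, "|", away)
        keys[league].add(home + "|" + away)
    # Pivot: League in Rows, Distinct Count of the helper column in Values
    return {league: len(pairs) for league, pairs in keys.items()}


rows = [
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "PSG", "Lyon"),
    ("Ligue 1", "Lyon", "PSG"),
    ("Serie A", "Roma", "Lazio"),
]
print(distinct_games_per_league(rows))  # {'Ligue 1': 2, 'Serie A': 1}
```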
Try this:
=SUMPRODUCT(1/COUNTIFS($B$1:$B$7,B1:B7,$C$1:$C$7,C1:C7))

how to calculate the means of 100s of subgroups in excel

I have a spreadsheet with ~8000 records, and there are ~400 unique identifiers (i.e. element 101, 102, 103, ..., 500) that I need to calculate means for. Is there a simple way to calculate means on a large dataset like this? Or will I have to do =AVERAGE('select column block') for each subgroup/unique identifier?
Many thanks
Use the following formula:
=AVERAGEIF($A$1:$A$8000,"=IDNUMBER",$B$1:$B$8000)
where
column A is your column of ID numbers, and
column B is the column of values you need the mean of.
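The AVERAGEIF-per-ID idea amounts to a group-by mean. A minimal Python sketch (the function name and sample data are made up) keeps a running sum and count per ID in one pass over the rows, instead of scanning all 8000 rows once per identifier:

```python
from collections import defaultdict


def means_by_id(ids, values):
    """One pass over columns A (ids) and B (values): sum and count per ID."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for identifier, value in zip(ids, values):
        sums[identifier] += value
        counts[identifier] += 1
    # Mean per ID, like =AVERAGEIF(A, id, B) evaluated for each ID
    return {identifier: sums[identifier] / counts[identifier] for identifier in sums}


print(means_by_id([101, 102, 101, 103], [2, 5, 4, 7]))  # {101: 3.0, 102: 5.0, 103: 7.0}
```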
If your ID numbers are sequential, you can set up something like:
=AVERAGEIF($A$1:$A$8000,"="&100+ROW(A1),$B$1:$B$8000)
and copy it down from C1 to C400 (one row per ID, 101 to 500).
Alternatively you could make a list of the unique identifiers with another formula and place that unique list in C1 to C500 and then in column D use the following:
=AVERAGEIF($A$1:$A$8000,C1,$B$1:$B$8000)
If you have a header row you will need to adjust your ranges accordingly
The formula to generate a unique list of IDs is:
=INDEX($A$2:$A$8001,MATCH(0,INDEX(COUNTIF($C$1:C1,$A$2:$A$8001),0,0),0))
Enter it in column C starting in row 2 and copy it down. It assumes the data starts in row 2 (A2:A8001), so if your data starts in row 1, you will need to shift it down one row.
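Here is a slow but faithful Python sketch of what the INDEX/MATCH/COUNTIF formula does as it is copied down (the function and list names are hypothetical). Each new cell counts how often every ID already appears in the cells above it (COUNTIF($C$1:C1, ...)), MATCH(0, ..., 0) finds the first ID with a count of zero, and INDEX returns it. Once every ID has been listed, no zero is left and Excel shows #N/A, which is where the sketch stops. It assumes the header in C1 does not match any ID.

```python
def unique_in_order(column_a):
    """Emulates copying the INDEX/MATCH/COUNTIF formula down from C2."""
    listed = []  # C2, C3, ... (C1 is the header)
    while True:
        # COUNTIF($C$1:C1, A2:A8001): occurrences of each ID in the cells above
        counts = [listed.count(value) for value in column_a]
        if 0 not in counts:
            break  # MATCH(0, ..., 0) fails: Excel shows #N/A here
        # INDEX(A2:A8001, MATCH(0, counts, 0)): first ID not listed yet
        listed.append(column_a[counts.index(0)])
    return listed


print(unique_in_order([103, 101, 103, 102, 101]))  # [103, 101, 102]
```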
Create a pivot table with the unique identifiers in the rows and calculate the average of the values.
For data that is already grouped by identifier and ready to be handed off for a visual review of the averages, try creating subtotals:
Select your data
Go to Data > Subtotal (far right of the tab).
In the Subtotal dialog, under At each change in, select the column header that corresponds to your unique identifier.
Select Average for Use function. Select the checkbox of the column for which you want to find the group's mean.
Select other formatting features if desired (defaults typically work best)
Click OK.
Take a sip of coffee and let the magic happen.
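Subtotal starts a new group every time the value in the At each change in column changes, which is why the data needs to be grouped (sorted) by identifier first. A Python sketch of that behavior uses itertools.groupby, which likewise only groups consecutive equal keys (the function name and sample data are made up):

```python
from itertools import groupby


def subtotal_averages(rows):
    """rows: (identifier, value) pairs in sheet order; one average per run of equal IDs."""
    result = []
    for identifier, group in groupby(rows, key=lambda row: row[0]):
        values = [value for _, value in group]
        result.append((identifier, sum(values) / len(values)))
    return result


print(subtotal_averages([(101, 2), (101, 4), (102, 5)]))  # [(101, 3.0), (102, 5.0)]
# Unsorted data: 101 is split into two separate subtotals
print(subtotal_averages([(101, 2), (102, 5), (101, 4)]))  # [(101, 2.0), (102, 5.0), (101, 4.0)]
```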
