Excel count unique occurrences of a text entry based on a status contained in a separate column

Alright, this is driving me insane...
I have a section of data in a spreadsheet that looks like this:
Column A          Column B   Column C
lksdf-46-we-32    Fire       1
lksdf-46-we-32    Fire       2
lksdf-46-we-32    Fire       3
lksdf-46-we-32    Fire       4
wgw3f-18-bw-11    Ice        1
wgw3f-18-bw-11    Ice        2
wgw3f-18-bw-11    Ice        3
wgw3f-18-bw-11    Ice        4
possf-12-he-91    Fire       1
possf-12-he-91    Fire       2
possf-12-he-91    Fire       3
possf-12-he-91    Fire       4
oiwen-20-lw-93    Water      1
oiwen-20-lw-93    Water      2
oiwen-20-lw-93    Water      3
oiwen-20-lw-93    Water      4
In another sheet, named 'Variables', I have a lookup category that looks something like this:
Column A
Fire
Water
I need to find the number of distinct entries in column A of the raw data sheet where column B matches any entry in column A of the Variables sheet. What I'm looking for is an Excel formula, but everything I've tried either returns duplicates (as a starting point) or returns 0. Also, could you please explain in detail how the query works in Excel? I'm a fairly experienced programmer, but I'm having a heck of a time wrapping my head around these Excel functions, and I've been tasked to finish this by the end of the day.

Try this "array formula" somewhere in the raw data sheet
=SUM(IF(FREQUENCY(IF(ISNUMBER(MATCH(B2:B100,Variables!A:A,0)),IF(A2:A100<>"",MATCH(A2:A100,A2:A100,0))),ROW(A2:A100)-ROW(A2)+1),1))
confirmed with CTRL+SHIFT+ENTER
The formula uses the FREQUENCY function with the row numbers as the "bins", and counts the bins that receive one or more entries. An entry is only made when the column B item matches an item in Variables column A, and the second MATCH function ensures that the same number (the position of the first match) is entered for every repeat of an item in column A, which guarantees that duplicates are not counted.
This formula looks at 100 rows of data in the raw data sheet; increase the range as required, but note that the formula is very "expensive" (MATCH(A2:A100,A2:A100,0) compares every row against the rest of the column), so it may prove impractical with very large datasets.
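For readers coming from a programming background, the same pipeline can be sketched in Python. This is only an illustration of what the array formula computes, not a replacement for it: the function name `count_distinct_matching` is hypothetical, the lists stand in for A2:A100, B2:B100 and Variables!A:A, and unlike MATCH the comparisons here are case-sensitive.

```python
def count_distinct_matching(ids, statuses, allowed):
    """Illustrate the SUM(IF(FREQUENCY(...),1)) array formula step by step."""
    n = len(ids)
    data = []
    for x, status in zip(ids, statuses):
        # IF(ISNUMBER(MATCH(B, Variables!A:A, 0)), IF(A<>"", ...)):
        # only rows with an allowed status and a non-blank id contribute
        if status in allowed and x != "":
            # MATCH(A, A2:A100, 0): 1-based position of the FIRST occurrence,
            # so every repeat of an id contributes the same number
            data.append(ids.index(x) + 1)
    # FREQUENCY(data, {1,...,n}) returns n+1 counts (the last is the overflow bin);
    # with whole-number values 1..n, value p is counted in bin p
    freq = [0] * (n + 1)
    for p in data:
        freq[p - 1] += 1
    # SUM(IF(freq, 1)): count the bins that received at least one entry
    return sum(1 for f in freq if f > 0)


# Sample data from the question
ids = (["lksdf-46-we-32"] * 4 + ["wgw3f-18-bw-11"] * 4
       + ["possf-12-he-91"] * 4 + ["oiwen-20-lw-93"] * 4)
statuses = ["Fire"] * 4 + ["Ice"] * 4 + ["Fire"] * 4 + ["Water"] * 4
print(count_distinct_matching(ids, statuses, {"Fire", "Water"}))  # 3
```

For the sample data only bins 1, 9 and 13 receive entries (the first rows of the three Fire/Water IDs), which is why the result is 3.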

Related

All combinations of 4 out of 7 columns with totals using Excel

I have 7 columns to choose from and I need to pick 4 of those columns and generate a total for each row. I also need every combination of 4, which means I'll have 35 new columns, with the totals for each of those combinations showing in each row. I need the code for this, ideally using only Excel. Here is an image of the columns; the grayed ones are the 7 columns I'm talking about. My knowledge of Excel is very limited. There are over 1,500 rows, if that matters.
This is a multi-step approach that uses some helper rows. There may be a more elegant formula that will do this, and much slicker options in VBA, but this is a formula-only approach.
Step 1 - Generate the List of Column Combinations
To generate the list, 4 helper rows need to be inserted at the top of your data, either above or below your header row. These 4 rows represent the column numbers you are going to pick. To keep the math simpler, I just used 1 for the first column and 7 for the last column; those numbers will be converted later to account for the columns in between in your spreadsheet. For the sake of this example, the first combination sum will occur in column AO and the first helper row will be row 1. The first combination is hard-coded and seeds the pattern for the remaining column combinations. Enter the following values in the corresponding cells:
AO1 = 1
AO2 = 2
AO3 = 3
AO4 = 4
In the adjacent column, a formula will be placed and copied to the right. It automatically increases the bottom value by 1 until it hits its maximum value (4, 5, 6 and 7 for rows 1 to 4 respectively), at which point the value in the row above increases by 1 and the value of the current cell becomes 1 more than the cell above. This produces a pattern that covers all 35 combinations by the time column BW is reached. Place the formulas below in the appropriate cells and copy them to the right:
AP1
=IF(AO2=5,AO1+1,AO1)
AP2
=IF(AO2=5,AP1+1,IF(AO3=6,AO2+1,AO2))
AP3
=IF(AO3=6,AP2+1,IF(AO4=7,AO3+1,AO3))
AP4
=IF(AO4=7,AP3+1,AO4+1)
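To check that the four helper-row formulas really walk through every combination, here is a small Python sketch that applies the same IF logic column by column. The function names `next_combo` and `helper_rows` are hypothetical; each tuple stands for rows 1 to 4 of one column, starting with the hard-coded AO1:AO4 seed.

```python
from itertools import combinations


def next_combo(prev):
    """Apply the AP1:AP4 formulas to the previous column's helper values."""
    r1, r2, r3, r4 = prev
    n1 = r1 + 1 if r2 == 5 else r1                            # AP1
    n2 = n1 + 1 if r2 == 5 else (r2 + 1 if r3 == 6 else r2)   # AP2 (uses new AP1)
    n3 = n2 + 1 if r3 == 6 else (r3 + 1 if r4 == 7 else r3)   # AP3 (uses new AP2)
    n4 = n3 + 1 if r4 == 7 else r4 + 1                        # AP4 (uses new AP3)
    return (n1, n2, n3, n4)


def helper_rows(count=35):
    """Return the helper values for columns AO through BW (35 columns)."""
    cols = [(1, 2, 3, 4)]  # AO1:AO4, hard-coded seed
    while len(cols) < count:
        cols.append(next_combo(cols[-1]))
    return cols


# The formulas enumerate the combinations in the same (lexicographic)
# order as itertools.combinations.
assert helper_rows() == list(combinations(range(1, 8), 4))
```

In other words, the copied-right formulas act as a "next combination" function, and 35 columns are exactly enough because there are C(7,4) = 35 ways to choose 4 of 7 columns.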
Step 2 - Sum the Appropriate Columns
I was hoping to use some sort of array-type operation to read through the column reference numbers above, but I could not get my head around it. Since there were just 4 entries to worry about, I simply added the four references together manually. The important thing to note is that we will be using the INDEX function over the 13 columns (AB to AN) that span your columns, so to convert the index number figured out above into one that grabs every second column, the calculated number is multiplied by 2 and then 1 is subtracted. That means 1,2,3,4 for the first column combination becomes 1,3,5,7. You can see this in the following formula. Place it in the appropriate cell and copy it down and to the right as needed.
AO5
=INDEX($AB5:$AN5,AO$1*2-1)+INDEX($AB5:$AN5,AO$2*2-1)+INDEX($AB5:$AN5,AO$3*2-1)+INDEX($AB5:$AN5,AO$4*2-1)
Pay careful attention to the $ signs, which lock the row or column references and prevent them from changing as the formula is copied.
Now you may need to adjust the cell references to match your sheet.
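The k*2-1 conversion in the INDEX formula can be illustrated the same way. In this sketch the list `row` stands for the 13 cells AB5:AN5, with hypothetical values in the 7 wanted columns (odd positions) and zeros in the columns in between; `combination_totals` is a hypothetical name.

```python
from itertools import combinations


def combination_totals(row, combos):
    """Total for each 4-column combination, like the AO5 formula copied right."""
    totals = []
    for combo in combos:
        # INDEX($AB5:$AN5, k*2-1) is 1-based; Python lists are 0-based,
        # so position k*2-1 becomes list index k*2-2
        totals.append(sum(row[k * 2 - 2] for k in combo))
    return totals


# Hypothetical row: the wanted columns hold 10..70, the in-between columns hold 0
row = [10, 0, 20, 0, 30, 0, 40, 0, 50, 0, 60, 0, 70]
totals = combination_totals(row, combinations(range(1, 8), 4))
print(totals[0], totals[-1])  # 100 220
```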

Fill in a table based on a column of categories in Excel

I have a table that looks like this:
Type    Value
Movie   5
Food    3
Gas     10
Food    2
...     ...
And there's a second table I want to fill in with "Value" based on the type in the first table, so that the corresponding rows look like this (the columns appear in a specific order because they are subcategories):
Rent   Food   Movie   Gas   Clothing   ...
              5
       3
                      10
       2
The title row is already there, so I was thinking there might be some kind of lookup method to do this? How do I do that?
Your second table appears to hold one value per row, but it doesn't have a label. It does correlate to the original row number; is this by design or coincidence?
If this is by design, you can use those 2 columns (hide them if you like). Get a unique list of categories by copying your labels to a new column and removing duplicates (Data tab > Remove Duplicates), then use Paste Special > Transpose in C1 to create the column headers.
So columns A and B remain unchanged,
row 1 contains the headers starting at column C,
and your data starts at C2.
This is the formula (for C2):
=IFERROR(VLOOKUP(C$1,$A2:$B2,2,FALSE),"")
Drag it down and to the right.
You can use Copy and Paste Special > Values when done to remove the formulas.
For something with only a hundred or a thousand cells this will be one of the easier options, but I would not do this on large tables; for those I would use Power Query or VBA.
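Because each formula only looks at its own row ($A2:$B2), the result is simply each row's value copied into the column whose header matches its type. A minimal Python sketch of that behavior, using the sample data from the question (the function name is hypothetical, and the comparison here is case-sensitive whereas VLOOKUP is not):

```python
def spread_by_category(rows, headers):
    """Mimic =IFERROR(VLOOKUP(C$1,$A2:$B2,2,FALSE),"") filled down and right."""
    table = []
    for typ, value in rows:
        # Each cell looks up its column header in the one-row range $A2:$B2;
        # a miss gives #N/A, which IFERROR turns into ""
        table.append([value if typ == header else "" for header in headers])
    return table


rows = [("Movie", 5), ("Food", 3), ("Gas", 10), ("Food", 2)]
headers = ["Rent", "Food", "Movie", "Gas", "Clothing"]
for line in spread_by_category(rows, headers):
    print(line)
```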
Assuming your first table is in Sheet1 and your second table is in Sheet2, you can try filling in Sheet2!A2 with
=IF(Sheet1!$A2=A$1,Sheet1!$B2,"")
and dragging it across and down. Hope you get how it works and that it's what you need.

To filter multiple columns with a condition on the results

I am trying to find a way of highlighting a result with multiple conditions. I have no knowledge of pivot tables. I would rather use a formula or macros. The table is organised by Dealer.
Acc     NAME   Add       Dealer   Total
68687   Sara   11 Wood   111A     0
68687   Sara   11 Wood   111A     0
32187   Sara   11 Wood   111A     0
12345   Tom    10 Main   7878C    2
12345   Tom    10 Main   7878C    2
54321   Tom    10 Main   7878C    2
My table is similar to the one above. I want to select the rows where the Total is greater than 0 and, for each Dealer, each unique Account number, with the lowest Account number highlighted somehow.
So the results I want for the table above would be: Dealer 7878C, Accounts 12345, 54321.
12345 being the lower of the two, it is highlighted.
I don't mind copying the results onto another sheet, as I don't want to remove any data from the sheet. I started by just filtering the Totals for >0, and I was thinking of trying to filter for unique values in Account, but it's the next step that I am stuck on. A COUNTIFS formula?
The sheet is quite large and I'm just not sure which is the best way to try and do it.
Thanks for any help.
There's a nice but complicated way to do it.
[Screenshots: the results with your original data, and with changed data]
As you can see I've placed your data in A1:E7.
I use two array formulas: one for the Dealer in G2:G5 and one for the Accounts in H2:N5. The Dealer formula is vertical, and the Accounts formula is horizontal.
For the dealers put this array formula in G2 (press Ctrl+Shift+Enter to enter it):
=IFERROR(INDEX($D$2:$D$7,SMALL(IF(($E$2:$E$7>0)*(COUNTIF($G$1:$G1,$D$2:$D$7)=0),ROW($D$2:$D$7)-1),ROW($G$1:$G1))),"")
Now copy G2 down to G3:G5 to get the rest of the relevant dealers.
For the accounts put this array formula in H2:
=IFERROR(SMALL(IF(($D$2:$D$7=$G2)*(COUNTIF($G2:G2,$A$2:$A$7)=0),$A$2:$A$7),1),"")
Now copy H2 to the right, I2:N2, and down to H3:N5.
To make the first accounts bold I simply make the H column formatted as Bold.
You can copy these formulas farther as needed. Note that the locations are important. If you want to place the formulas elsewhere you'll need to change the references accordingly.
Formulas explained
What these formulas do is check for your conditions, and then get the smallest value that hasn't been retrieved yet, in the upper / left most cells.
The two formulas are mostly the same, apart from the fact that in the account numbers we can use the actual numbers, and with the dealer we use the row number instead.
The dealer formula from the inside out:
The conditions are set in the IF part of the formula, with the multiplication * acting as a logical AND (TRUE*TRUE=1, FALSE*TRUE=0).
The first condition in IF(($E$2:$E$7>0)*(COUNTIF($G$1:$G1,$D$2:$D$7)=0),... checks that the row's Total value is greater than zero; the second condition checks that the dealer is not already present in the G column. The second condition is irrelevant in the first cell, but in the second cell, G3, it becomes COUNTIF($G$1:$G2,..., which returns more than 0 if the dealer already exists, so the condition evaluates to FALSE.
If the conditions are met, the IF returns the dealer's index using its row minus 1, ROW($D$2:$D$7)-1, which returns 1 for the first data row and so on, since the data starts in row 2. Otherwise it returns FALSE, which is ignored.
The SMALL function returns the k-th smallest item. It ignores the FALSE items, so it only sees the indexes that meet the conditions (Total>0 and not already present in the results). Because the COUNTIF condition already removes every dealer listed above the current cell, the wanted index is always the smallest remaining one: in G2, ROW($G$1:$G1) gives k = 1, and in G3, ROW($G$1:$G2) returns the array {1;2}, of which a single array-entered cell displays only the first result (k = 1). Typing 1 instead of ROW($G$1:$G1) gives the same result, which is what the accounts formula does.
The INDEX function simply returns the dealer from the data according to the index.
And finally, the IFERROR is there only to hide the errors when the end of the results is reached.
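In procedural terms, both formulas repeatedly pick the smallest remaining item that meets the conditions and has not been listed yet. The Python sketch below mirrors that logic on the sample table; the tuple layout (Acc, NAME, Add, Dealer, Total) follows A2:E7, and the function name is hypothetical. Like the accounts formula, it does not re-check Total for the accounts.

```python
def dealers_and_accounts(rows):
    """rows: (acc, name, add, dealer, total) tuples, as in A2:E7."""
    dealers = []
    # G column: smallest row index with Total > 0 whose dealer is not listed yet
    while True:
        candidates = [i for i, r in enumerate(rows)
                      if r[4] > 0 and r[3] not in dealers]
        if not candidates:
            break  # SMALL would return #NUM!, which IFERROR hides
        dealers.append(rows[min(candidates)][3])

    result = {}
    for dealer in dealers:
        accounts = []
        # H:N columns: smallest account of this dealer not already in the row
        while True:
            remaining = [r[0] for r in rows
                         if r[3] == dealer and r[0] not in accounts]
            if not remaining:
                break
            accounts.append(min(remaining))
        result[dealer] = accounts  # accounts[0] is the lowest (the bold H cell)
    return result


rows = [
    (68687, "Sara", "11 Wood", "111A", 0),
    (68687, "Sara", "11 Wood", "111A", 0),
    (32187, "Sara", "11 Wood", "111A", 0),
    (12345, "Tom", "10 Main", "7878C", 2),
    (12345, "Tom", "10 Main", "7878C", 2),
    (54321, "Tom", "10 Main", "7878C", 2),
]
print(dealers_and_accounts(rows))  # {'7878C': [12345, 54321]}
```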
Based on your sample data, and assuming a header row in row 1 and the leftmost column being column A:
=COUNTIF($A$2:A2,A2)
Place that in F2 and copy it down. Then filter the helper column for values equal to 1.
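The expanding range $A$2:A2 is what makes this work: in row n it counts how many times the account has appeared from row 2 down to row n, so the first occurrence gets 1. A Python sketch of the same running count (the function name is hypothetical):

```python
def running_countif(values):
    """Mimic =COUNTIF($A$2:A2,A2) copied down: occurrences seen so far."""
    seen = {}
    counts = []
    for v in values:
        seen[v] = seen.get(v, 0) + 1
        counts.append(seen[v])
    return counts


accounts = [68687, 68687, 32187, 12345, 12345, 54321]
counts = running_countif(accounts)
print(counts)  # [1, 2, 1, 1, 2, 1]
print([a for a, c in zip(accounts, counts) if c == 1])  # [68687, 32187, 12345, 54321]
```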

Quantifying conditional duration of values in Excel

I am trying to analyze blood pressure that is taken every minute, and determine how long the values stay within a certain range consecutively. I have the data set up in Excel for the moment, and I have color-coded the values based on the ranges I would like to quantify. I know that with a simple COUNTIF function I can get the total number of times these values meet the criteria, but what I want to do next is quantify how long the values fall within a specified range consecutively.
This shows values in columns in Excel, where each column is a different patient, and the heat map shows the value conditions to help me see whether certain thresholds occur for longer times than others. But I want to find a way to quantify this in Excel, if possible. Any help would be much appreciated.
The final result I am looking for is to be able to measure how much time each patient sustains a specific category of blood pressure to know if certain ranges are more prolonged than others (e.g. blood pressure is between 120-130 for 30 minutes). So in the spreadsheet above, assuming each cell is a 1-minute bin, for column HU, BP is between 120-130 for 3 minutes (rows 2-4), and again for 16 minutes (rows 6-22). In column HS, blood pressure is above 140 (black) for 7 minutes.
I want to find a workflow to quantify these durations so that I can get a summary of the number of consecutive 1-minute bins (each cell) at a specified range/threshold for each patient (column).
First, I would create another sheet (let's call it "Thresholds") with blood pressure thresholds in ascending order in column A.
Put a category number next to each value (in column B).
For example:
0 0
90 1
100 2
105 3
110 4
115 5
120 6
125 7
... etc.
Back in the other sheet, add a new column next to each blood pressure column, so you would have a new column HR next to HQ.
Put there a formula that looks up the category for the value in HQ, from sheet "Thresholds".
You can use VLOOKUP with approximate match for that (this is why the thresholds must be in ascending order). For example, in row 2:
=VLOOKUP(HQ2, Thresholds!$A$1:$B$1000, 2, TRUE)
Then add yet another column, which will be HS.
In there make a running count for same category rows, like this (for row 2, I assume you have used row 1 for column titles):
=IF(HR1<>HR2, 1, HS1+1)
Drag this formula down the column. It checks whether this row has a different blood pressure category than the previous one. If so, it sets the counter to 1 (this is the first instance in a new running series); otherwise it takes the value of the counter in the previous row and adds 1 to it.
Repeat this for the other columns (inserting 2 new columns next to each).
This will already give you a start for further analysis.
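The two helper columns can be sketched in Python to show what they compute. Assumptions: `THRESHOLDS`/`CATEGORIES` hold only the first rows of the example "Thresholds" table, VLOOKUP's approximate match (the largest threshold less than or equal to the value) is imitated with `bisect_right`, the readings are hypothetical, and the function names are hypothetical. The last function reads off each run's length, i.e. the counter value just before the category changes, which is the per-patient summary the question asks for.

```python
from bisect import bisect_right

# First rows of the example "Thresholds" sheet (column A, column B)
THRESHOLDS = [0, 90, 100, 105, 110, 115, 120, 125]
CATEGORIES = [0, 1, 2, 3, 4, 5, 6, 7]


def category(bp):
    """Approximate-match VLOOKUP: category of the largest threshold <= bp."""
    if bp < THRESHOLDS[0]:
        raise ValueError("below the smallest threshold (VLOOKUP gives #N/A)")
    return CATEGORIES[bisect_right(THRESHOLDS, bp) - 1]


def running_counts(cats):
    """=IF(HR1<>HR2, 1, HS1+1) copied down: position within the current run."""
    counts = []
    for i, c in enumerate(cats):
        if i > 0 and cats[i - 1] == c:
            counts.append(counts[-1] + 1)
        else:
            counts.append(1)
    return counts


def runs(cats):
    """(category, length in minutes) for each consecutive run."""
    counts = running_counts(cats)
    return [(c, counts[i]) for i, c in enumerate(cats)
            if i == len(cats) - 1 or cats[i + 1] != c]


# Hypothetical one-minute readings for one patient
readings = [118, 121, 124, 122, 130, 128]
cats = [category(bp) for bp in readings]
print(runs(cats))  # [(5, 1), (6, 3), (7, 2)]
```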

Excel VBA Lookup Methods

I have an issue that I've been scratching my head at. I've looked into the INDEX/MATCH lookup method and VLOOKUP/HLOOKUP, but I'm not sure these will help just yet. Here's what's happening:
I have two worksheets in Excel 2007. One has a Customer ID column (which does and will have duplicate IDs when the customer did "x activity" more than once in a month) and the date that this happened in another column. The second sheet gives an overview of a specific day, i.e. what happened on 7-1-13.
The issue is that my raw data sheet is sorted by date starting from the first of the month (7-1, 7-2, etc.), and when I run the VLOOKUP, if a Customer ID has a record on 7-2 and on 7-15, the VLOOKUP will only pull data from the 7-2 (first) row. Has anyone experienced this and found a workaround?
My current workaround would be to either make a new table for each day's data, or, instead of using my VLOOKUP of =VLOOKUP(A2,'Data Sheet'!A:D,4,0), give the range explicit row numbers, like =VLOOKUP(A2,'Data Sheet'!A$1:D$30,4,0). This is a daily report, and that would be intense. Please help if you can!
(Another side note, I have one main sheet for the view, one data sheet where it's all collected, and then 30 sheets, one for each day of the month, this case being July). For each sheet, I've named them the day of the month, so I'm reflecting the data as such:
Sheets("7-1-13") has data from the 1st on it. The Data Sheet in its entirety has data from 7-1-13 to 7-31-13. I need to reference IDs on the 1st to the data for the 1st and the 1st only.
I want to use something like this, but I'm having a hard time with it
=VLOOKUP(A2, 'Data Sheet'!A:D (ONLY IF THE CREATE DATE OF THIS ITEM IS 7-1), 4, 0)
but of course it's not that easy :p
This may not give you your results in a format you like and still requires a bit of manual work, but without going the route of macros, I think this will get you one step closer. I thought of using an array formula to get all the IDs by a specific date.
Example:
     A    B
     ID   Date
1    5    7/1/2013
2    2    7/2/2013
3    5    7/3/2013
In this situation, I assume you want 5 from the first row to appear on your 7/1 sheet, 2 to appear on your 7/2 sheet, and 5 from the third row to appear on your 7/3 sheet
On your 7/1 sheet, you'll need to select as many blank rows as your raw data has (using the example above, you would select A1:A3 on your 7/1 sheet). Once you have your cells selected, enter the following formula in the formula bar and press Ctrl+Shift+Enter. This is what makes the formula an array formula.
=((Raw_DataSheet!B1:B3=DATE(2013,7,1))*1)*Raw_DataSheet!A1:A3
What this formula does is look at all the dates in B1:B3 and find the ones that equal 7/1/2013. Since you're using an array formula, this gives you the array {TRUE,FALSE,FALSE}. Multiplying this by 1 gives you the array {1,0,0}: a 1 for each row of B1:B3 that was equal to 7/1/2013. This array {1,0,0} is then multiplied by your Customer IDs {5,2,5}:
5 * 1 = 5
2 * 0 = 0
5 * 0 = 0
So now your entire formula is equal to the array {5,0,0}. Since you selected A1:A3 on your 7/1 sheet, the values that should appear should be
     A
     ID
1    5
2    0
3    0
From here, you can always filter out the 0's and you'd just have a list of all the IDs that had the date of 7/1 from your Raw Data Sheet. You would also then replicate this for each of your sheets and just change the date in the formula...Yes, I know, way more complicated than you probably wanted but it's what I came up with!
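The TRUE/FALSE-times-1 trick is an elementwise mask. A Python sketch of the same arithmetic on the three example rows (the function name is hypothetical; `datetime.date` stands in for Excel's DATE):

```python
from datetime import date


def ids_on_date(ids, dates, day):
    """Mimic =((B1:B3=DATE(2013,7,1))*1)*A1:A3 entered over A1:A3."""
    mask = [int(d == day) for d in dates]      # {TRUE,FALSE,FALSE}*1 -> {1,0,0}
    return [m * i for m, i in zip(mask, ids)]  # elementwise product with the IDs


ids = [5, 2, 5]
dates = [date(2013, 7, 1), date(2013, 7, 2), date(2013, 7, 3)]
print(ids_on_date(ids, dates, date(2013, 7, 1)))  # [5, 0, 0]
```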

Resources