I am trying to get a row-by-row count of unique invoices in a spreadsheet. I want Excel to do this by returning either 1 for unique or 0 for duplicate.
I have had success with =IF(COUNTIF($C$3:C3,C3)>1,0,1).
This has given me an accurate count based on one specific column, but I have not had any luck advancing this beyond the one column. I would like this formula to be based on three criteria, not just two.
A          B       C               D          E           F           G
Vendor ID  Name 1  Invoice Number  Inv Date   Sum Amount  Acctg Date  Unique#
00001      A       0000001         3/16/2015  5.00        5/11/2016   1
00010      M       0000001         9/14/2015  10.00       5/24/2016   1
00010      M       0000001         9/4/2015   15.00       5/24/2016   0
00005      K       0000285         4/8/2016   20.00       4/18/2016   1
000106     O       000042          6/7/2016   30.00       6/21/2016   1
000107     H       006333          4/5/2016   6.00        4/11/2016   1
000107     H       006333          4/5/2016   6.00        4/12/2016   1
There are duplicates in all the columns because of how I needed to pull the report. I would like a pull down formula that would give me unique values of A, C, F in a 1,0 format on each row line by comparing each of them against a total combination of each of three columns. Please note vendor M having a duplicate invoice number vs vendor H which has two distinct invoices based on the criteria.
This will be a large drain on resources because of the size of the data. I am looking at around 20-90k lines, but maybe someone can show me a better mousetrap? VBA macro? Match Index? Anyway, onwards to the failures!
Please feel free to explain why they didn't work, or how they could. Also please ignore column locations compared to my example as I was moving things around quite frequently.
=A&C&F then use If(countif('ColumnX')), but this didn't work correctly as I found data that was listed as a repeat when it was actually unique. I think the root problem with doing this was combining the date and general formats into one cell.
=SUMPRODUCT((1/COUNTIFS(E3:E1000,E3:E1000,J3:J1000,J3:J1000,G3:G1000,G3:G1000)))
Multiple versions of AND with IF(CountIF)
Multiple versions of =A&C AND CountIF (Date)
I have also looked at the following questions in SE and found them helpful, but ultimately not what I specifically needed, or I failed at implementation.
Simple Pivot Table to Count Unique Values
I tried this unsuccessfully based on unique invoices, need three criteria not just one.
Count unique values in Excel
See above.
Excel Formula: Count Unique Values in a Row Based on Corresponding Value in Another Row
This looks like it should work, but I tried and failed to correctly adapt to my problem.
Excel - Return Count of Unique Values Based on Two Columns
This also should work perfectly with addition of third column. Formula yelled at me and called me names. Mentioned something about can't fix stupid.
Please let me know if any parts of the question are unclear. I did my best to not duplicate and trim the information down. Thanks in advance!
If I am understanding your problem correctly, basically you want column G to check if the current row is a duplicate (based on columns A, C and F) of any rows above it. If it is, return a 0, else return a 1.
If that is what you are looking to achieve, you can do so using the COUNTIFS() function to know if there are any duplicates above the row and then simply check if the count = 0 or is > 0 (=0 means it's unique, >0 means it is a duplicate).
Your formula for column G would look as follows:
G2: 1 (obviously we know it is unique since there are no values above it to be a duplicate of)
G3: =IF(COUNTIFS($A$2:A2,A3,$C$2:C2,C3,$F$2:F2,F3)=0,1,0)
then, drag G3 downwards.
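For anyone who wants to sanity-check the logic outside Excel, here is a minimal Python sketch of the same idea: flag a row with 1 the first time its (A, C, F) combination appears and 0 afterwards. The function name is made up for illustration, and the rows are the Vendor ID, Invoice Number and Acctg Date values from the question's sample. Using a tuple as the key also avoids the ambiguity that joining the cells into one string can introduce.

```python
def flag_first_occurrences(rows):
    """Return 1 the first time a (vendor, invoice, acctg_date) key appears, else 0."""
    seen = set()
    flags = []
    for vendor, invoice, acctg_date in rows:
        key = (vendor, invoice, acctg_date)  # tuple key, no string concatenation
        flags.append(0 if key in seen else 1)
        seen.add(key)
    return flags


# Columns A, C and F from the sample table in the question.
rows = [
    ("00001", "0000001", "5/11/2016"),
    ("00010", "0000001", "5/24/2016"),
    ("00010", "0000001", "5/24/2016"),
    ("00005", "0000285", "4/18/2016"),
    ("000106", "000042", "6/21/2016"),
    ("000107", "006333", "4/11/2016"),
    ("000107", "006333", "4/12/2016"),
]

print(flag_first_occurrences(rows))  # [1, 1, 0, 1, 1, 1, 1]
```

The set lookup touches each row once, whereas a COUNTIFS over an expanding range compares every row against all rows above it, so a helper like this may be worth considering if the formula becomes slow on 20-90k lines.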
Hope this is what you were looking for.
NOTE: I'm using Excel 2016, don't have access to the good stuff in 365 :(
I'm trying to build a summary sheet at the moment. The idea is to filter a table, then have the summary formulas pick out the top 5 values in a given column.
To do this I'm trying to use the Large function in Aggregate which will help me ignore hidden rows while also allowing me to extract the nth largest value.
From there I had thought to use match to find the row number of that value within the column so that I could also get text based values from the same row by assigning a column letter via Indirect.
^This is the crux of what I'm trying to do^
The code looks like this at the moment... (Sales is column "R")
=MATCH(AGGREGATE(14,5,Table1[Sales],1),Table1[Sales],0)+62
Would then go be
=INDIRECT("N"&MATCH(AGGREGATE(14,5,Table1[Sales],1),Table1[Sales],0)+62)
Aggregate(14 = Large*
Aggregate(,5 = Ignore hidden rows*
The 62 there at the end is there as the data in the table starts on row 63 (making it more robust with row() is on my list but not there yet).
The issue I'm having is it seems the lookup array in the Match function Table1[Sales] isn't being filtered as the table is being filtered.
At least that's what the results I'm seeing are indicating to me as the row number I'm getting back isn't within the filtered table (I.e. Match is returning a hidden row number).
My question is if anyone has an idea about how make this so that only visible rows are considered within the array.
(If I've completely missed the mark with this and someone has a better idea about how to accomplish this goal (without having to resort to array functions) I'd be very grateful).
Thanks!
Expected results
row  A          B
1    Company A  425
2    Company B  1500
4    Company A  1200
7    Company C  750
15   Company B  100
19   Company A  100
I'd be looking for the nth largest value in column B, say 1200 (second largest) in this example.
=MATCH(AGGREGATE(14,5,B:B,2),B:B,0)
=MATCH(1200,B:B,0)
=3
The expected result is Row 4, but because (again, it seems) the look-up array isn't excluding filtered/hidden rows, it is returning Row 3 instead.
I hope this is a bit clearer!
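To make the suspected cause concrete, here is a small Python sketch of the two steps. It rests on an assumption that is not in the question: that hidden row 3 also holds 1200, which would explain the reported result. The function names are hypothetical. The first function mimics AGGREGATE(14,5,...) by looking only at visible rows; the second mimics MATCH(...,0), which scans every row and has no notion of hidden ones.

```python
def nth_largest_visible(values, hidden_rows, n):
    """Return (row, value) of the nth largest value among visible rows only."""
    visible = [(value, row) for row, value in values.items() if row not in hidden_rows]
    # Largest value first; ties resolved by the lower row number.
    visible.sort(key=lambda pair: (-pair[0], pair[1]))
    value, row = visible[n - 1]
    return row, value


def first_match_any_row(values, target):
    """Mimic MATCH(target, range, 0): first row equal to target, hidden or not."""
    for row in sorted(values):
        if values[row] == target:
            return row
    return None


# Visible rows are taken from the table above; row 3 is an ASSUMED hidden duplicate.
values = {1: 425, 2: 1500, 3: 1200, 4: 1200, 7: 750, 15: 100, 19: 100}
hidden = {3}

row, value = nth_largest_visible(values, hidden, 2)
print(row, value)                          # 4 1200
print(first_match_any_row(values, value))  # 3
```

If that assumption holds, the lookup step needs its own visibility test instead of relying on the filter, for example a helper column that marks visible rows.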
I have a large data set (over 1 million rows) of patient names, problems/diagnoses, and the dates these diagnoses were entered (with each variable as a column header).
I would like to pull data from this source file to add to an existing file which has about 900 unique patient names with other demographics (in columns).
I am not able to use the vlookup function because most patients have multiple problems.
Are there any other functions or tricks which might be helpful?
Thanks in advance for your time and efforts.
Sample of what Data Currently looks like:
Name Diagnosis Date of Dx
A Head 11/15/12
B Leg 09/08/14
B Elbow 10/11/15
C Hand 02/23/16
A Toe 04/11/13
A Eye 05/25/15
C Ear 12/21/14
What I would like Data Set to Look like:
Name Dx#1 Date#1 Dx#2 Date#2 Dx#3 Date#3
A Head 11/15/12 Toe 04/11/13 Eye 05/25/15
B Leg 09/08/14 Elbow 10/11/15 n/a n/a
C Hand 02/23/16 Ear 12/21/14 n/a n/a
I'm not sure if you're familiar with the index and match functions, but you can use those to create the sheet. The easiest way would be to add several helper columns (in your example 3) and use the match function to get the reference row that you want.
From there you can offset the search range by previous match to find the next match. You can do this as many times as necessary depending on the number of conditions a patient has.
After that it's a simple index function to fill in the rows of the table with the desired values. You can clean up the extra cells with iferror if you want.
Assuming your data is in columns A1:C8, and your output dataset is in columns E1:K4, the following formulas will give you the desired output. The helper columns are found in L1:N4. These formulas would go in row 2, but you can drag them down to calculate for the rest of the rows.
The formula for each column:
E: No formula, list all patient names
F: =INDEX(B:B,L2)
G: =INDEX(C:C,L2)
H: =IFERROR(INDEX(B:B,M2),"")
I: =IFERROR(INDEX(C:C,M2),"")
J: =IFERROR(INDEX(B:B,N2),"")
K: =IFERROR(INDEX(C:C,N2),"")
L: =MATCH(E2,$A$1:$A$8,0)
M: =IFERROR(MATCH($E2,OFFSET($A$1,L2,0,COUNTA($A:$A)-L2),0)+L2,"")
N: =IFERROR(MATCH($E2,OFFSET($A$1,M2,0,COUNTA($A:$A)-M2),0)+M2,"")
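With over a million source rows, it may also be worth doing the reshaping outside Excel. The Python sketch below shows the same long-to-wide idea under a few assumptions: the function name and the max_dx limit are hypothetical, the records are the sample rows from the question, and diagnoses are kept in source order, not sorted by date.

```python
from collections import defaultdict


def to_wide(records, max_dx=3, filler="n/a"):
    """Turn (name, diagnosis, date) rows into one row per name."""
    grouped = defaultdict(list)
    for name, dx, date in records:
        grouped[name].append((dx, date))  # keeps source order per patient

    wide = []
    for name in sorted(grouped):
        row = [name]
        for dx, date in grouped[name][:max_dx]:
            row += [dx, date]
        # Pad patients with fewer than max_dx diagnoses.
        row += [filler] * (2 * max_dx - (len(row) - 1))
        wide.append(row)
    return wide


records = [
    ("A", "Head", "11/15/12"),
    ("B", "Leg", "09/08/14"),
    ("B", "Elbow", "10/11/15"),
    ("C", "Hand", "02/23/16"),
    ("A", "Toe", "04/11/13"),
    ("A", "Eye", "05/25/15"),
    ("C", "Ear", "12/21/14"),
]

for row in to_wide(records):
    print(row)
# ['A', 'Head', '11/15/12', 'Toe', '04/11/13', 'Eye', '05/25/15']
# ['B', 'Leg', '09/08/14', 'Elbow', '10/11/15', 'n/a', 'n/a']
# ['C', 'Hand', '02/23/16', 'Ear', '12/21/14', 'n/a', 'n/a']
```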
Hope this helps, and let me know if you have any questions about the formulas.
I am working from a large table in Excel and I want to sort the data into categories. What I’m trying to do is get Excel to count how many times a combination of criteria in columns C, D, and E occurs and return that count. So look through C:C and pick “Company”, then look through D:D for “Full Time – Temp” and then E:E for a location such as “Factory”. See link to sample table below.
Example:
G4 =COUNTIFS($C:$C,"company",$D:$D,"full time - temp",$E:$E,"home") and it returns 0
I4 =COUNTIFS($C:$C,"company bilingual",$D:$D,"bilingual - FT - perm") and it will return 3
My problem is column E
If I wanted to return in cell J4 how many “Company Bilingual” are “Bilingual - FT – Perm” and located in “Factory” I get 0.
I’ve tried using
J4 =COUNTIFS($C:$C,"company bilingual",$D:$D,"bilingual - FT – perm",$E:$E,"Factory") but it returns 0 and what I want it to return is 2. I understand why: it is saying there is no Factory cell on its own, because all the cells that have Factory have 3 items in them, e.g. Factory - Dallas. So I want to count all the factories in column E regardless of where the factory is actually located.
In summary what I want to do is find a function or array that will count one unique occurrence in column C, D, and E. If a cell in a column has more than one word I would like to be able to pick one word and ultimately still count all occurrences in the other columns and return a value.
In my research I have come across different suggestions but none that helps my problem.
I hope I've explained my problem, any assistance is greatly appreciated.
Screenshot of table
I suggest you make a criteria table and enter your criteria there, as shown in the snapshot. Due credit goes to @Harsha Vardhan: the approach suggested in his comments is the correct one. I have made a fully working example for clear understanding.
For partial string match I used a concatenated string in I2 ="*"&"Factory"&"*"
Criteria Table is in the Range G1:I4 and Results are in the Range J1:J4
Formula to be entered in J2 to J4 respectively are as per criteria mentioned in the table.
=COUNTIFS($C:$C,$G$2,$D:$D,$H$2,$E:$E,$I$2)
=COUNTIFS($C:$C,$G$3,$D:$D,$H$3)
=COUNTIFS($C:$C,$G$4,$D:$D,$H$4,$E:$E,$I$4)
Results are as per your requirement as shown in the snapshot.
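Since the snapshot is not reproduced here, a rough Python sketch may help show why "*Factory*" matches where "Factory" does not. The rows are hypothetical stand-ins for the table (only "Factory - Dallas" appears in the question), and the countifs helper is an approximation written for this example: it uses fnmatch for the * and ? wildcards and lowercases both sides because COUNTIFS ignores case. It does not reproduce Excel's ~ escape, and fnmatch treats [ ] specially.

```python
from fnmatch import fnmatchcase


def countifs(rows, *criteria):
    """Count rows where every (column_index, pattern) criterion matches."""
    def matches(cell, pattern):
        # Case-insensitive wildcard match on the whole cell.
        return fnmatchcase(str(cell).lower(), pattern.lower())

    return sum(all(matches(row[i], p) for i, p in criteria) for row in rows)


# HYPOTHETICAL rows standing in for columns C, D and E of the sample table.
rows = [
    ("Company Bilingual", "Bilingual - FT - Perm", "Factory - Dallas"),
    ("Company Bilingual", "Bilingual - FT - Perm", "Factory - Austin"),
    ("Company Bilingual", "Bilingual - FT - Perm", "Home"),
    ("Company", "Full Time - Temp", "Factory - Dallas"),
]

c = (0, "company bilingual")
d = (1, "bilingual - ft - perm")

print(countifs(rows, c, d))                   # 3
print(countifs(rows, c, d, (2, "Factory")))   # 0, no cell is exactly "Factory"
print(countifs(rows, c, d, (2, "*Factory*"))) # 2
```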
EDIT DATE 23-06-2016
This has reference to OP's comments on 22nd and 23rd June 2016. There is no change in the formulas. It is required that conditions are put correctly in the criteria table. As per new criterion specified by OP, following snapshot shows that correct results are obtained. Further file count multiple text 23062016 has been uploaded for perusal.
I have googled for hours, not being able to find a solution to what I need/want. I have an Excel sheet where I want to sum the values in one column based on the criteria that either one of two columns should have a specific value in it. For instance
     A    B    C
1    4    20   7
2    5    100  3
3    100  21   4
4    15   21   4
5    21   24   8
I want to sum the values in C given that at least one of A and B contains a value of less than or equal to 20. Let us assume that A1:A5 is named A, B1:B5 is named B, and C1:C5 is named C (for simplicity). I have tried:
{=SUMPRODUCT(C,((A<=20)+(B<=20)))}
which gives me the rows where both columns match summed twice, and
{=SUMPRODUCT(C,((A<=20)*(B<=20)))}
which gives me only the rows where both columns match.
So far, I have settled for the solution of adding a column D with the lowest value of A and B, but it bugs me so much that I can't do it with formulas.
Any help would be highly appreciated, so thanks in advance. All I have found when googling is the "multiple criteria for same column" problem.
Thanks. That works. I found another one that works, after I figured out that Excel does not treat 1 + 1 as 1, as I learnt in discrete mathematics, but, as you say, counts both of the TRUEs. I tried instead with:
{=SUM(IF((A<=20)+(B<=20);C;0))}
But I like yours better.
Your problem of "summing twice" in this formula
{=SUMPRODUCT(C,((A<=20)+(B<=20)))}
comes from the addition turning the first TRUE plus the second TRUE into 2. It is not actually summing every row twice: for any row where only one condition is met, that row is counted only once.
The solution is to turn either the 1 or the 2 into a 1, using an IF:
{=SUMPRODUCT(C,IF((A<=20)+(B<=20)>0,1,0))}
That way, each value in column C would only be counted at max once.
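The arithmetic can be checked with the sample data from the question. This is only a Python sketch of what the two formulas compute, not Excel itself, and the function names are made up for the example. In Python, as in Excel, adding two TRUE values gives 2.

```python
A = [4, 5, 100, 15, 21]
B = [20, 100, 21, 21, 24]
C = [7, 3, 4, 4, 8]


def sum_adding_flags(a, b, c, limit=20):
    """Weight each C value by the number of conditions met (0, 1 or 2)."""
    return sum(z * ((x <= limit) + (y <= limit)) for x, y, z in zip(a, b, c))


def sum_either(a, b, c, limit=20):
    """Add each C value at most once when A or B meets the condition."""
    return sum(z for x, y, z in zip(a, b, c) if x <= limit or y <= limit)


print(sum_adding_flags(A, B, C))  # 21, row 1 meets both conditions and counts twice
print(sum_either(A, B, C))        # 14, rows 1, 2 and 4 counted once each
```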
Following this site you could build up your SUMPRODUCT() formula like this:
=SUMPRODUCT(C,SIGN((A<=20)+(B<=20)))
So, instead of a nested IF(), you control your OR condition with the SIGN() function.
hth
If you plan to use a large set of data then it is best to use the array formula:
{=SUM(IF((A1:A5<=20)+(B1:B5<=20),C1:C5,0))}
Obviously adjust the range to suit the data set, however if the whole of each column is to form part of the formula then you can simply adjust to:
{=SUM(IF((A:A<=20)+(B:B<=20),C:C,0))}
This will perform the calculation on all rows of data within the A, B and C columns. With either example remember to press Ctrl + Shift + Enter in order to trigger the array formula (as opposed to typing the { and }).
My data table is like the image above. I can easily count the number of Male participants in group B or C using this array formula:
=SUM(COUNTIFS($B:$B, $E3, $C:$C, $F3:$F4))
The result is 3 as expected. However, I now want to do the reverse, that is, count the number of Male participants who are NOT in group B or C. The result should be 1, but I'm currently stuck on this.
Can anybody show me a way please (preferably not just counting the number of all Male participants and then do a subtraction)? I have even tried to change the values in the Group to something like <>B and <>C but it just doesn't work.
As you only have 2 in the group you can easily use COUNTIFS with 2 separate criteria, i.e.
=COUNTIFS($B:$B,$E3,$C:$C,"<>"&$F3,$C:$C,"<>"&$F4)
but clearly that might not be desirable for a large group, so you could use SUMPRODUCT like this to reference the group once
=SUMPRODUCT(($B:$B=$E3)*ISNA(MATCH($C:$C,$F3:$F4,0)))
ISNA will exclude matching rows - to include use ISNUMBER
You can replace F3:F4 with any single row or column of values
Note: whole columns with SUMPRODUCT will work (post Excel 2003) but is undesirable as Jerry says
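The ISNA(MATCH(...)) test is a "not in this list" check. Here is a minimal Python sketch of it, using hypothetical data, since the original image is not available. The rows were chosen only so that the counts agree with the numbers quoted in the question (3 Males in B or C, 1 Male elsewhere), and the function name is made up.

```python
def count_by_membership(rows, gender, groups, include=False):
    """Count rows of the given gender whose group is (or is not) in groups."""
    wanted = set(groups)
    return sum(
        1 for g, grp in rows
        if g == gender and ((grp in wanted) == include)
    )


# HYPOTHETICAL (Gender, Group) rows standing in for columns B and C.
rows = [
    ("Male", "A"),
    ("Male", "B"),
    ("Female", "B"),
    ("Male", "B"),
    ("Male", "C"),
    ("Female", "A"),
]

print(count_by_membership(rows, "Male", ["B", "C"]))                # 1, like ISNA
print(count_by_membership(rows, "Male", ["B", "C"], include=True))  # 3, like ISNUMBER
```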
Hmm, the thing with the formula right now is that the first COUNTIFS (for F3) will return 2 and the second COUNTIFS (for F4) will return 3, which SUM converts into 5 when you try:
=SUM(COUNTIFS($B:$B, $E3, $C:$C, "<>"&$F3:$F4))
I would suggest using SUMPRODUCT instead:
=SUMPRODUCT(($B:$B=$E3)*($C:$C<>$F3)*($C:$C<>$F4))
And maybe make the range smaller since this can take some time (you don't need to insert this as an array formula).
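A short Python sketch of why summing the two "<>" counts overshoots while multiplying the conditions does not. The data is hypothetical, picked to reproduce the 2 and 3 mentioned above, and the function names are made up for the example.

```python
# HYPOTHETICAL (Gender, Group) rows standing in for columns B and C.
rows = [
    ("Male", "A"),
    ("Male", "B"),
    ("Female", "B"),
    ("Male", "B"),
    ("Male", "C"),
    ("Female", "A"),
]


def count_not_group(rows, gender, group):
    """One COUNTIFS with a single '<>' criterion."""
    return sum(1 for g, grp in rows if g == gender and grp != group)


def count_not_any(rows, gender, groups):
    """All '<>' conditions must hold on the same row (the product form)."""
    return sum(
        1 for g, grp in rows
        if g == gender and all(grp != x for x in groups)
    )


per_group = [count_not_group(rows, "Male", x) for x in ["B", "C"]]
print(per_group)                                # [2, 3]
print(sum(per_group))                           # 5, rows outside both groups count twice
print(count_not_any(rows, "Male", ["B", "C"]))  # 1
```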
Otherwise, another option would be to count all the Males, and then subtract the counts for group B and subsequently C:
=COUNTIF($B:$B, $E3)-SUM(COUNTIFS($B:$B, $E3, $C:$C, $F3:$F4))