I'm trying to write an efficient Excel formula for flagging duplicates, but can't work out how. I have a large list of numbers (172,250 to be exact) and need to find which are duplicates and which are not. I have tried COUNTIF, COUNTIFS, and SUMPRODUCT, but all of them are so slow that they are more of a hindrance than a help. What I need is a way to detect a duplicate that stops as soon as the second occurrence is found; I don't need a full count. I'm trying to avoid VBA because this will eventually be given to an end user, who would be better served by simply copy-pasting a formula into her cells. Any help would be appreciated.
Thanks in advance!
Since you can sort, you can use a simple = comparator.
Let's say your numbers are in A2:A172251 and sorted (leaving one row above for a header, so the list can be filtered). In B2, put:
=A2=A1
Then fill down. You get TRUE for every repeat of a number and FALSE for its first occurrence. For example, given the first column below, you get the second:
Numbers Duplicates
1 FALSE
2 FALSE
2 TRUE
2 TRUE
3 FALSE
4 FALSE
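The comparison above marks only the second and later occurrences. If every member of a duplicate group should be flagged, including the first, one possible variant is to compare each cell with both of its neighbours. This is only a sketch under the same assumptions (sorted numbers in A2:A172251, a text header in A1, formula entered in B2 and filled down):

```excel
=OR(A2=A1,A2=A3)
```

With the sample list this would give TRUE for all three 2s and FALSE for 1, 3 and 4. One caveat: an empty cell compares equal to 0, so if the last number in the list is 0, the empty cell below it would make the last row read TRUE.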
If you don't want to sort, and want every value that has a duplicate anywhere in the list to be flagged, you could do something like this (the formulas go in a spare column, here B, because a formula placed in column A itself would create a circular reference):
B1: =ISNUMBER(MATCH($A1,$A2:$A$10000,0))
B2: =OR(ISNUMBER(MATCH($A2,$A3:$A$10000,0)),ISNUMBER(MATCH($A2,$A$1:A1,0)))
Select B2 and fill down as far as needed. Each cell returns TRUE if the value in column A also appears somewhere else in the list, and FALSE otherwise.
Summary
I would like to pick several random options from a list in Excel, restricted to the rows that have a TRUE indicator in another column. The number of random selections needs to vary, and duplicates shouldn't be pulled. Ideally, the formula would sit in a single cell and spill to the required length, if such a thing is possible.
Current attempt
I have tried to make this logic work by combining =INDEX, =RANDBETWEEN and =FILTER with a helper =RAND() column, in an attempt to filter the data and then randomly select from it. The issue I'm finding is that =FILTER does not combine properly with the others (although I'm likely doing something wrong). Additionally, this involves copying the formula down to each cell where I'd like a new randomly selected option, which I'd like to avoid if possible.
Example of required output
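| ID | T/F | Number to randomly select? | Output |
|----|-------|----------------------------|--------|
| 1 | TRUE | 5 | 2 [Formula is here and spills down] |
| 2 | TRUE | | 5 |
| 3 | FALSE | | 7 |
| 4 | FALSE | | 9 |
| 5 | TRUE | | 12 |
| 6 | FALSE | | |
| 7 | TRUE | | |
| 8 | TRUE | | |
| 9 | TRUE | | |
| 10 | FALSE | | |
| 11 | TRUE | | |
| 12 | TRUE | | |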
Here is an example of how I'd ideally like the output to look. The data itself is in A1:B13. I'd enter the number of options to randomly select in C2, and the output would be generated by a formula in D2 that spills down accordingly. The requirement is that it only selects IDs from column A whose TRUE/FALSE indicator in column B is TRUE. The answer I'm looking for is the formula I'd need to put into D2, please.
Extra detail
I'm always looking to improve, so any explanation on how something works is always appreciated, so I don't end up asking the same things again!
I am using MSO 365, but I am restricted to version 2108 due to rollout schedules.
Thanks for any help with this!
Try:
Formula in D2:
=LET(X,FILTER(A2:A13,B2:B13),INDEX(SORTBY(X,RANDARRAY(COUNT(X))),SEQUENCE(C2),1))
Or, if one needs these to be ascending:
=LET(X,FILTER(A2:A13,B2:B13),SORT(INDEX(SORTBY(X,RANDARRAY(COUNT(X))),SEQUENCE(C2),1)))
A last small edit could be to cap the value in C2 at the number of TRUE rows, so that no errors spill down when C2 asks for more items than are available:
=LET(X,FILTER(A2:A13,B2:B13),SORT(INDEX(SORTBY(X,RANDARRAY(COUNT(X))),SEQUENCE(MIN(COUNTIF(B2:B13,TRUE),C2)),1)))
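Since the question asked how this works, the same idea can be written with one named step per stage. This is a sketch rather than a different method; the names pool, shuffled and n are labels chosen here for readability, and the ranges are the ones from the question:

```excel
=LET(
    pool,     FILTER(A2:A13, B2:B13),
    shuffled, SORTBY(pool, RANDARRAY(ROWS(pool))),
    n,        MIN(C2, ROWS(pool)),
    INDEX(shuffled, SEQUENCE(n))
)
```

FILTER keeps only the IDs whose flag is TRUE, SORTBY reorders them by a fresh column of random numbers (which amounts to shuffling), and INDEX with SEQUENCE(n) returns the first n rows of the shuffled list. Because each ID appears once in the shuffled list, no duplicates can be drawn. RANDARRAY is volatile, so the selection changes on every recalculation.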
OK, so the issue is a spreadsheet where the goal is to check whether two columns end up with the same value. One column has the 'goal result' and the other column is usually a formula. (A starting number, plus a number of changes that haven't yet been issued in the paperwork, should equal the 'goal result'.)
There's a third 'check column' that will display either TRUE or FALSE depending on whether the two have equal values.
Now the issue is that sometimes, for no reason I can discern, the 'check column' displays FALSE even though the result values are clearly the same. There's no change in format and no difference in cell structure or formula structure. I appreciate that one cell holds a formula and the other doesn't, but my spreadsheet is massive, with hundreds of rows, so I see no reason this should only be an issue in 1 in 50 or so cells.
A snippet example
An example of the formula in the working (top) row.
An example of the formula in the row working differently
The formula that generates the TRUE or FALSE is as simple as "=H=I" (with the row numbers included, obviously).
EDIT: Thanks for the great response guys. The rounding solution worked. There was only addition and the true value IS only two decimals so I still don't quite understand why that would be an issue but I've replaced the formula throughout the sheet anyway, to be safe. Thanks again!
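For readers wondering what the rounding fix looks like and why it helps: Excel stores numbers as binary floating-point values, and most decimal fractions (0.1, for instance) have no exact binary representation. A result built up by addition can therefore differ from a typed-in constant in its last bits even though both display identically at two decimals. A minimal sketch, assuming the two values sit in H2 and I2 (hypothetical cell references) and that two decimal places is the intended precision:

```excel
=ROUND(H2,2)=ROUND(I2,2)
```

Rounding both sides to the precision that actually matters discards the stray trailing bits before the comparison. A tolerance test such as =ABS(H2-I2)<0.005 is another common way to express the same intent.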
I have an Excel sheet sort of like this:
I'm trying to figure out how to get the totals in cells B1 through B4.
I tried INDEX-MATCH, where I tried to match the words in A1:A4 with the words in row 7, get the numbers relative to them, and then sum them, but it was a lot of Google searching and stabbing in the dark -- every attempt returned an error.
I also tried to INDEX-MATCH the words in A1:A4 with row 7, and then nest a VLOOKUP in there where it'd get the number relative to "visits:" but that didn't work at all either.
Is INDEX-MATCH even the correct function? Any help would be much appreciated, I'm not even sure what to Google anymore.
EDIT: I need to use a search function of some kind, like the INDEX-MATCH method, rather than static formulas, because the sheet will change periodically and I don't want to have to update the formula every time I add an animal.
Your data table is unusual in structure.
However, if you are going to keep a fixed rule such that the number of visits is always offset 2 rows and 1 column from the animal type (and the animal type itself is always in row 7), you could do:
In B1:
=SUM(IF($A$7:$AAA$7=$A1, $B$9:$AAB$9, 0))
Confirm with Ctrl-Shift-Enter, and then copy down.
Does this work?
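If array entry is a concern for whoever maintains the sheet, a SUMIF over the same two offset ranges may give the same totals without Ctrl-Shift-Enter. This is a sketch under the same layout assumption (animal names in row 7, each visit count 2 rows down and 1 column to the right); it has not been checked against the original workbook:

```excel
=SUMIF($A$7:$AAA$7,$A1,$B$9:$AAB$9)
```

SUMIF pairs the cells of the two ranges by position, so a match in B7 picks up C9, and text cells in row 9 are ignored by the sum.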
=SUM(IF($B$7=A1,$C$9,0),IF($D$7=A1,$E$9,0),IF($F$7=A1,$G$9,0),IF($H$7=A1,$I$9,0))
I'm not sure I have fully grasped your challenge, but it seems the following solution would work.
Add the following formula in each cell where the number of visits is entered:
=+SUMIF($A$1:$A$end;animal;$B$1:$B$end)
Here end is the row number of the last cell that contains data in the first and second columns,
and animal is the cell that contains the name of the animal.
In your simple example, the formulas in cells C9, E9, G9 and I9 would therefore be, respectively:
=+SUMIF($A$1:$A$4;B7;$B$1:$B$4) ; =+SUMIF($A$1:$A$4;D7;$B$1:$B$4); =+SUMIF($A$1:$A$4;F7;$B$1:$B$4) and =+SUMIF($A$1:$A$4;H7;$B$1:$B$4).
I am trying to use a COUNTIFS formula to calculate how many installs are done. This is done by searching through a large table containing many blank cells. When using the following formula, I receive a #VALUE! error because Excel sees the blank cells as 0s and gets confused trying to count strings and integers:
=COUNTIFS(B10:B152,"Installs",D10:N152,"Done")
The range D10:N152 contains blanks and is causing the error.
Can I make Excel ignore those blanks or see them as strings instead of integers?
For COUNTIFS:
"Important: Each additional range must have the same number of rows and columns as the criteria_range1 argument. The ranges do not have to be adjacent to each other."
Maybe add a helper column, say O, with:
=IF(MATCH("Done",D10:N10,0)>=1,"Done","")
copied down to suit and then:
=COUNTIFS(B10:B152,"Installs",O10:O152,"Done")
Now that we know there is only one Done per row (and assuming Done is on its own in a cell), a helper column with, say:
=COUNTIF(D10:N10,"Done")
would be a shorter formula than =IF(MATCH("Done",D10:N10,0)>=1,"Done","") and would also allow a shorter counting formula than =COUNTIFS(B10:B152,"Installs",O10:O152,"Done"), say:
=SUM(O:O)
instead (assuming the rest of column O is blank or text; otherwise use =SUM(O10:O152)).
However, I aimed for a formula as similar as possible to the one used by the OP, which required the helper column to be populated with Done as well. At the time I was also allowing for the possibility that there might be more than one Done per row.
As pointed out in a comment, the helper column could have been populated with fewer keystrokes than =IF(MATCH("Done",D10:N10,0)>=1,"Done",""), if only by dropping the result for a failed test, say =IF(MATCH("Done",D10:N10,0)>=1,"Done"). MATCH never 'fails' that test: where no match is found it returns #N/A instead, which is good enough for a COUNTIF, since COUNTIF does not count #N/A values when the criterion is Done.
Try this array formula (confirm with Ctrl-Shift-Enter):
=SUM(($B$10:$B$152="Installs")*($D$10:$N$152="Done"))
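The same product of conditions can also be wrapped in SUMPRODUCT, which evaluates its argument as an array without special entry. A sketch over the ranges from the question, assuming none of the cells in those ranges contain error values:

```excel
=SUMPRODUCT(($B$10:$B$152="Installs")*($D$10:$N$152="Done"))
```

Each comparison yields TRUE/FALSE values, the multiplication turns them into 1s and 0s, and the single-column test on B is applied across every column of D:N, so the result counts Done cells that sit in Installs rows. Blank cells simply compare as FALSE, so they do not cause an error here.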
I am using the following formulae for a truly unique ranking of values: How to Rank Duplicate Values Sequentially.
(As you may or may not be aware the other option (see ##) can produce erroneous results!)
However, there is a problem: I would like to ignore empty cells! Currently, empty cells are counted as having the value zero.
How do I need to change the formulae in 1 to ignore empty cells and to return no value at all? Is that even possible with an array formula?
I avoid VBA as I need to keep this dynamic.
Thank you in advance for any hints!
pascal
(##): =RANK(A2,$A$2:$A$10)+COUNTIF($A$2:A2,A2)-1
End result:
Method (A1 is top left):
Data2: =IF(ISBLANK($A2),"",VALUE($A2&"."&(ROW()-ROW($B$1))))
Sorted: =SMALL($B$2:$B$8,ROW()-ROW($C$1))
Rank: =IFERROR(MATCH($B2,$C$2:$C$8,0),"")
I think this formula should work to create "unique ranks" and return blanks for blanks:
=IF(A2="","",RANK(A2,$A$2:$A$10)+COUNTIF($A$2:A2,A2)-1)
I'd expect that to calculate correctly, assuming A2:A10 contains numbers (not numbers stored as text) and the numbers don't exceed 15 significant digits. If you want to avoid COUNTIF, this formula with SUMPRODUCT should do the same:
=IF(A2="","",SUMPRODUCT((A$2:A$10>A2)*(A$2:A$10<>""))+SUMPRODUCT((A$2:A2=A2)*1))