Removal of Duplicate Values - Leading Date - excel

I'm trying to remove duplicate values from my workbook. The issue is that, among otherwise identical entries, the one I am attempting to keep has a leading year, as seen below.
I've tried using a VLOOKUP (below), but since it's not an exact match I used TRUE, which doesn't return the value I really want, and it would take another step to remove the value without the leading year.
=VLOOKUP(F2,F2:F657,1,TRUE)
Any and all help would be appreciated!

How about something like this:
{=IFERROR(VLOOKUP(F2,F2:F657,1,FALSE),INDEX(F2:F657,MATCH(F2,SUBSTITUTE(F2:F657,LEFT(F2:F657,7),""),0)))}
This works as follows: (1) first it checks whether the VLOOKUP finds an exact match. If it doesn't, (2) it tries to find a match in the list after removing the first 7 characters (4-digit year + space + hyphen + space). So the above solution assumes that exactly the first 7 characters must always be removed; it is not flexible about the number of characters that need to be removed.
Furthermore, the above solution assumes that you have an Excel version that supports the IFERROR function (Excel 2007 or later). Otherwise, you can substitute it with a complete IF formula.
Note that the above formula needs to be entered as an array formula, so you need to press Ctrl + Shift + Enter to enter it. For more information on array formulas have a look at Microsoft's website here: https://support.office.com/en-us/article/Guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
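The two-step logic described above can also be sketched outside Excel. The following Python is only an illustration, assuming every year-prefixed entry starts with exactly 7 characters such as "2016 - "; the name find_entry and the sample data are hypothetical and do not come from the original workbook.

```python
PREFIX_LEN = 7  # assumed: 4-digit year + space + hyphen + space


def find_entry(value, entries):
    """Return the entry equal to value, else the entry matching it after the prefix."""
    # Step 1: exact match, like VLOOKUP(..., FALSE).
    if value in entries:
        return value
    # Step 2: compare against each entry with its first 7 characters dropped.
    for entry in entries:
        if entry[PREFIX_LEN:] == value:
            return entry
    return None  # Excel would show #N/A here


print(find_entry("Widget C", ["2016 - Widget C", "Widget B"]))
```

As in the formula, a prefix of any other length would not be matched by this sketch.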

Related

Increment numbers when value changes in another column

I currently have a column of letters in B2:B11 alongside numbers that increment by 1 when letters appear consecutively (C2:C11). When a new letter appears, the sequence resets and starts from 1 again.
This is the formula I'm using:
=SCAN(0,B2:B11,
LAMBDA(a,b,
IF(OFFSET(b,-1,0)=b,
a+1,1)
)
)
It works fine when the letters are together in blocks, but when they are separated, any previous instances of a letter are forgotten about.
I want to find a solution that uses a single formula. I believe I'm on the right path using the new SCAN() function. Please don't suggest methods involving classic formulas or tables (I've already seen these).
The values returned should match those in D2:D11.
Here is the array version in E2:
=COUNTIF(OFFSET(A2,0,0,SEQUENCE(ROWS(A2:A11)),1),A2:A11)
or using LET for easier maintenance:
=LET(start, A2, range, A2:A11,
COUNTIF(OFFSET(start,0,0,SEQUENCE(ROWS(range)),1),range))
The idea was taken from here: Running Count Array Formula in Excel 365
Note: The last argument of OFFSET (width) is optional and defaults to the width of the reference, here 1, so it can be omitted.
This worked for me:
=COUNTIF(B$2:B2,"="&B2)
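The running count that these COUNTIF formulas produce can be sketched in Python for comparison. This is a minimal illustration, not a translation of any particular formula; the name running_count and the sample letters are hypothetical.

```python
from collections import defaultdict


def running_count(values):
    """For each item, count how many times it has appeared so far (inclusive)."""
    seen = defaultdict(int)
    result = []
    for value in values:
        seen[value] += 1
        result.append(seen[value])
    return result


letters = ["A", "A", "B", "A", "B", "C"]
print(running_count(letters))  # [1, 2, 1, 3, 2, 1]
```

Because the counts are kept per letter, earlier occurrences are remembered even when the letters are not in contiguous blocks.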

Multiple if search statements in excel?

I am trying to convert the text of a month to its number.
B2 cell:
BirthMonth_Jan
BirthMonth_Feb
BirthMonth_mar
BirthMonth_Apr
BirthMonth_May
BirthMonth_Jun, etc. to December
For example, BirthMonth_Jan will output 1 based on the search for Jan, so I can compare this to another set of numbers.
I have tried this, but it only works with two IF statements. Is there any way I can do this with 12?
=(IF(ISNUMBER(SEARCH("sep",B2)),"9")),(IF(ISNUMBER(SEARCH("aug",B2)),"8")),(IF(ISNUMBER(SEARCH("jul",B2)),"7")),(IF(ISNUMBER(SEARCH("jun",B2)),"6")),(IF(ISNUMBER(SEARCH("may",B2)),"5")),(IF(ISNUMBER(SEARCH("apr",B2)),"4")),(IF(ISNUMBER(SEARCH("mar",B2)),"3")),(IF(ISNUMBER(SEARCH("feb",B2)),"2")),(IF(ISNUMBER(SEARCH("jan",B2)),"1"))
I get #VALUE!
If I try this, it also doesn't work:
=IF(ISNUMBER(SEARCH("dec",B2)),"12",IF(ISNUMBER(SEARCH("nov",B2)),"11")),IF(ISNUMBER(SEARCH("DSH_KnowBe4_BirthMonth_Oc",B2)),"10"))
The second option only works with two, but if I add more it throws an error.
The questioner is trying to obtain a numeral equivalent to a partial month name extracted from a string. There are any number of examples on Stack Overflow and the web generally on this theme. What is special in this case is the partial month name in the target cell, and the use of the IF statement. The questioner is right to use SEARCH since it is not case-sensitive.
Two formulas are offered:
Formula 1
=(IF(ISNUMBER(SEARCH("sep",B2)),"9")),(IF(ISNUMBER(SEARCH("aug",B2)),"8")),(IF(ISNUMBER(SEARCH("jul",B2)),"7")),(IF(ISNUMBER(SEARCH("jun",B2)),"6")),(IF(ISNUMBER(SEARCH("may",B2)),"5")),(IF(ISNUMBER(SEARCH("apr",B2)),"4")),(IF(ISNUMBER(SEARCH("mar",B2)),"3")),(IF(ISNUMBER(SEARCH("feb",B2)),"2")),(IF(ISNUMBER(SEARCH("jan",B2)),"1"))
The questioner said "I get #Value!"
This is not a surprise because it is essentially a series of nine self-contained, unrelated IF statements, each separated by a comma. It is an invalid formula.
However, if the IF statements were nested, then the formula would work. Something along these lines:
=IF(ISNUMBER(SEARCH("jan",B2)),"1",IF(ISNUMBER(SEARCH("feb",B2)),"2",IF(ISNUMBER(SEARCH("mar",B2)),"3")))
Formula 2
=IF(ISNUMBER(SEARCH("dec",B2)),"12",IF(ISNUMBER(SEARCH("nov",B2)),"11")),IF(ISNUMBER(SEARCH("DSH_KnowBe4_BirthMonth_Oc",B2)),"10"))
So close and yet so far... This statement uses the nested approach mentioned above. There is a major typo in the October search (instead of searching for "oct", the formula searches for "DSH_KnowBe4_BirthMonth_Oc"), though this doesn't cause the formula to fail.
Failure is caused by two things:
1) The double closing bracket following "11" in the "November" search. There should be no brackets here, so that the next IF becomes the value-if-false argument.
2) The formula needs an additional closing bracket.
Two other things to note:
1) In the event of a match, the value returned is a string, not an integer.
2) There's no provision to return a value in the event of a failure to match.
Working IF statement formula
The following formula, consisting of nested IF statements, works as intended by the questioner.
=IF(ISNUMBER(SEARCH("jan",B2)),"1",IF(ISNUMBER(SEARCH("feb",B2)),"2",IF(ISNUMBER(SEARCH("mar",B2)),"3",IF(ISNUMBER(SEARCH("apr",B2)),"4",IF(ISNUMBER(SEARCH("may",B2)),"5",IF(ISNUMBER(SEARCH("jun",B2)),"6",IF(ISNUMBER(SEARCH("jul",B2)),"7",IF(ISNUMBER(SEARCH("aug",B2)),"8",IF(ISNUMBER(SEARCH("sep",B2)),"9",IF(ISNUMBER(SEARCH("oct",B2)),"10",IF(ISNUMBER(SEARCH("nov",B2)),"11",IF(ISNUMBER(SEARCH("dec",B2)),"12",NA()))))))))))))
Note, the formula uses the NA() function to return #N/A if there is no match.
VLOOKUP alternative
Though the above-mentioned formula works, I find it complicated and inflexible. My preference in situations like this is VLOOKUP. My equivalent formula would be:
=VLOOKUP(RIGHT(B2,LEN(B2)-SEARCH("_",B2)),Sheet2!$A$2:$B$13,2,FALSE)
Using January as an example: BirthMonth_Jan, the formula lookup works like this:
RIGHT(B2,LEN(B2)-SEARCH("_",B2))
1) search for the underscore character: SEARCH("_",B2).
2) deduct the result from the total length, LEN(B2)-SEARCH("_",B2), to give the number of characters to the right of the underscore.
3) get all the characters to the right of the underscore: RIGHT(B2,LEN(B2)-SEARCH("_",B2)). This is the lookup value.
4) create a reference table on another sheet; look up this table and return column 2 (the number for that month).
5) if there is no valid result, VLOOKUP automatically returns #N/A.
The reference table on a separate sheet (Sheet2!$A$2:$B$13 in the formula) holds the month abbreviations in column A and the matching month numbers in column B.
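The lookup steps above can be sketched in Python, purely as an illustration. The dict MONTHS stands in for the reference table and, like the function name month_number, is a hypothetical name; the sketch assumes the month abbreviation is everything after the first underscore.

```python
# Stand-in for the reference table (assumed layout: abbreviation -> number).
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}


def month_number(label):
    """Return the month number for a label such as 'BirthMonth_Jan', or None."""
    # Everything to the right of the first underscore, like RIGHT/LEN/SEARCH.
    _, sep, suffix = label.partition("_")
    if not sep:
        return None  # no underscore; the Excel formula would return #VALUE!
    # VLOOKUP ignores case, so normalise before the lookup.
    return MONTHS.get(suffix.lower())  # None plays the role of #N/A


print(month_number("BirthMonth_Jan"))  # 1
print(month_number("BirthMonth_mar"))  # 3
```

Unlike the nested IF formula, this returns a number rather than a string, and adding or renaming a month only touches the table.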
Not sure what you are trying to do with the formula, but if your "BirthMonth_" text is consistent, you can use:
=MONTH(DATEVALUE("1 "&SUBSTITUTE(A12,"BirthMonth_","")&" 2018"))
Having a view of your data and expected result would help if this is not what you're after.
It seems just possible that what you want is:
=MONTH(MID(B2,SEARCH("BirthMonth_",B2)+11,3)&0)
Returns a Number.

Formula to extract numbers from a text string

How could I extract only the numbers from a text string in Excel or Google Sheets? For example:
A1 - a1b23eg67
A2 - 15dgrgr156
Result desired is
B1 - 12367
B2 - 15156
You can do it with capture groups in Google Sheets
=REGEXREPLACE(A1,"(\d)|.","$1")
Anything which matches the contents of the brackets (a digit) will be copied to the output, anything else replaced by an empty string.
Please see @Max Makhrov's answer to this question
or
=regexreplace(A1,"[^\d]","")
to remove anything which isn't a digit.
Because you asked for Excel also,
If you have an Office 365 subscription, then you can use this array formula in Excel:
=--TEXTJOIN("",TRUE,IF(ISNUMBER(--MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1)),MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1),""))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then Excel will put {} around the formula.
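For readers who want to check the expected results outside a spreadsheet, both ideas (a regex that drops non-digits, and a character-by-character filter like the MID/TEXTJOIN formula) can be sketched in Python. The function names are hypothetical and the sketch assumes plain ASCII digits.

```python
import re


def digits_only(text):
    """Remove every character that is not an ASCII digit."""
    return re.sub(r"[^0-9]", "", text)


def digits_only_by_char(text):
    """Same result, built one character at a time like the MID/TEXTJOIN formula."""
    return "".join(ch for ch in text if ch in "0123456789")


print(digits_only("a1b23eg67"))           # 12367
print(digits_only_by_char("15dgrgr156"))  # 15156
```

Both functions return text; the Excel formula's leading -- additionally converts the joined digits to a number.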
I would imagine there is a way to pull this off with =RegexExtract, but I can't figure out how to get it to repeat the search after the first hit. Often with these regex function implementations there is a third parameter to repeat, but it doesn't look like Google implemented it.
At any rate, the following formula will do the trick. It's just a little roundabout:
=concatenate(SPLIT( LOWER(A1) , "abcdefghijklmnopqrstuvwxyz" ))
This is converting the string to lower case, then splitting the string using any letter of the alphabet. This will return an array of the numbers left over, which we concatenate back together.
Update, switched over to =REGEXREPLACE() instead of extract...:
=regexreplace(A1, "[a-z]", "")
That's a much cleaner and more obvious way of doing it than that concat(split()) nonsense.

Excel - IF Formula with a FIND

I have an Excel document that I have to process regularly while waiting for my company to build an automated process for this. The issue we recently found is that the formula I'm using can't return a result other than #VALUE! when FIND fails to find the text I need.
The formula we currently have is:
=IF(FIND("-",M2,3),RIGHT(M2,2))
The cells this formula checks contain states and provinces, which look like "CA-ON" or "US-NV".
The problem is that regions for the UK aren't filled out as "UK-XX"; the field contains the actual county instead, for example "Essex" or "Merryside".
What I need the formula to do is, if it can't find the hyphen (-) in the cell, take whatever value is there and write it in the cell the formula is in.
I should also mention that some of the cells are blank, since this is an optional field. Is there any way to run this formula so that, if it doesn't find the "-", it just writes what's there?
What about using MID() to see if the third character is "-"?
=IF(MID(A1,3,1)="-",RIGHT(A1,2),A1)
If you really want to use the find() function then:
=IF(ISNUMBER(FIND("-",A1)),RIGHT(A1,2),A1)
The IFERROR function should help here, and there is no longer any reason to use the IF statement. The formula below looks for the hyphen within the first 3 characters, subtracts its position from the length of the string, and returns that many characters from the right. IFERROR catches the cases where there is no hyphen and returns the original cell.
=IFERROR(RIGHT(M2,LEN(M2)-FIND("-",LEFT(M2,3),1)),M2)
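The intended behaviour of these formulas can be sketched in Python. This is a minimal illustration modelled on the MID variant, assuming the code always has the form of two characters, a hyphen, and two characters; the name region_code is hypothetical.

```python
def region_code(cell):
    """Return the last two characters of 'XX-YY' values, or the cell unchanged."""
    # Mirrors IF(MID(cell,3,1)="-", RIGHT(cell,2), cell).
    if len(cell) >= 3 and cell[2] == "-":
        return cell[-2:]
    return cell


for value in ["CA-ON", "US-NV", "Essex", ""]:
    print(repr(region_code(value)))
```

In this sketch county names and empty strings pass through unchanged, which is the fallback the question asks for.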

Last non-empty cell in a column

Does anyone know the formula to find the value of the last non-empty cell in a column, in Microsoft Excel?
Using the following simple formula is much faster:
=LOOKUP(2,1/(A:A<>""),A:A)
For Excel 2003:
=LOOKUP(2,1/(A1:A65535<>""),A1:A65535)
It gives you the following advantages:
it's not an array formula
it's not a volatile formula
Explanation:
(A:A<>"") returns the array {TRUE,TRUE,..,FALSE,..}.
1/(A:A<>"") converts this array to {1,1,..,#DIV/0!,..}.
LOOKUP expects the lookup array to be sorted in ascending order, and when it cannot find an exact match it chooses the largest value in the lookup range (in our case {1,1,..,#DIV/0!,..}) that is less than or equal to the lookup value (in our case 2). As a result, the formula finds the last 1 in the array and returns the corresponding value from the result range (the third parameter, A:A).
Also, a small note: the above formula doesn't take into account cells with errors (you will notice this only if the last non-empty cell contains an error). If you want to take them into account, use:
=LOOKUP(2,1/(NOT(ISBLANK(A:A))),A:A)
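The explanation of the LOOKUP trick can be followed step by step in a small Python sketch. It is only a model of the behaviour described above, not of Excel's internal search; the name last_non_empty is hypothetical, empty cells are represented by empty strings, and None stands in for #DIV/0!.

```python
def last_non_empty(cells):
    """Return the last value that is not an empty string, or None."""
    # 1/(cell<>"") gives 1 for filled cells and an error for empty ones;
    # here None stands in for #DIV/0!.
    flags = [1 if cell != "" else None for cell in cells]
    # LOOKUP(2, flags, cells): 2 is larger than every flag, so the match
    # lands on the last position that holds a 1.
    for position in range(len(flags) - 1, -1, -1):
        if flags[position] == 1:
            return cells[position]
    return None  # Excel would show #N/A


column = ["a", "", 3, "", "z", "", ""]
print(last_non_empty(column))  # z
```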
This works with both text and numbers and doesn't care if there are blank cells, i.e., it will return the last non-blank cell.
It needs to be array-entered, meaning that you press Ctrl-Shift-Enter after you type or paste it in. The below is for column A:
=INDEX(A:A,MAX((A:A<>"")*(ROW(A:A))))
Here is another option: =OFFSET($A$1;COUNTA(A:A)-1;0)
I know this question is old, but I'm not satisfied with the answers provided.
LOOKUP, VLOOKUP and HLOOKUP have performance issues and should really never be used.
Array functions have a lot of overhead and can also have performance issues, so they should only be used as a last resort.
COUNT and COUNTA run into problems if the data is not contiguously non-blank, i.e. you have blank cells and then data again in the range in question.
INDIRECT is volatile, so it should only be used as a last resort.
OFFSET is volatile, so it should only be used as a last resort.
Any reference to the last possible row or column (the 65536th row in Excel 2003, for instance) is not robust and results in extra overhead.
This is what I use
when the data type is mixed: =max(MATCH(1E+306,[RANGE],1),MATCH("*",[RANGE],-1))
when it's known that the data contains only numbers: =MATCH(1E+306,[RANGE],1)
when it's known that the data contains only text: =MATCH("*",[RANGE],-1)
MATCH has the lowest overhead and is non-volatile, so if you're working with lots of data this is the best to use.
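The idea behind the mixed-type formula (take the later of the last numeric position and the last text position) can be sketched in Python. This is an illustration under assumptions: empty cells are empty strings, the name last_row is hypothetical, and a missing type yields 0 here, whereas the plain MAX formula would propagate the #N/A from the failing MATCH.

```python
def last_row(cells):
    """Return the 1-based position of the last number or text cell, or 0."""
    last_number = 0
    last_text = 0
    for position, cell in enumerate(cells, start=1):
        if isinstance(cell, bool):
            continue  # booleans are treated as neither text nor numbers here
        if isinstance(cell, (int, float)):
            last_number = position  # role of MATCH(1E+306, range, 1)
        elif isinstance(cell, str) and cell != "":
            last_text = position    # role of MATCH("*", range, -1)
    return max(last_number, last_text)


print(last_row(["a", 5, "", "b", 7, ""]))  # 5
print(last_row(["x", "", "y"]))            # 3
```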
Inspired by the great lead given by Doug Glancy's answer, I came up with a way to do the same thing without the need of an array-formula. Do not ask me why, but I am keen to avoid the use of array formulae if at all possible (not for any particular reason, it's just my style).
Here it is:
=SUMPRODUCT(MAX(($A:$A<>"")*(ROW(A:A))))
For finding the last non-empty row using Column A as the reference column
=SUMPRODUCT(MAX(($1:$1<>"")*(COLUMN(1:1))))
For finding the last non-empty column using row 1 as the reference row
This can be further utilized in conjunction with the index function to efficiently define dynamic named ranges, but this is something for another post as this is not related to the immediate question addressed herein.
I've tested the above methods with Excel 2010, both "natively" and in "Compatibility Mode" (for older versions of Excel) and they work. Again, with these you do not need to do any of the Ctrl+Shift+Enter. By leveraging the way sumproduct works in Excel we can get our arms around the need to carry array-operations but we do it without an array-formula. I hope someone out there may appreciate the beauty, simplicity and elegance of these proposed sumproduct solutions as much as I do. I do not attest to the memory-efficiency of the above solutions though. Just that they are simple, look beautiful, help the intended purpose and are flexible enough to extend their use to other purposes :)
Hope this helps!
All the best!
This works in Excel 2003 (& later with minor edit, see below). Press Ctrl+Shift+Enter (not just Enter) to enter this as an array formula.
=IF(ISBLANK(A65536),INDEX(A1:A65535,MAX((A1:A65535<>"")*(ROW(A1:A65535)))),A65536)
Be aware that Excel 2003 is unable to apply an array formula to an entire column. Doing so yields #NUM!; unpredictable results may occur! (EDIT: Conflicting information from Microsoft: The same may or may not be true about Excel 2007; problem may have been fixed in 2010.)
That's why I apply the array formula to range A1:A65535 and give special treatment to the last cell, which is A65536 in Excel 2003. Can't just say A:A or even A1:A65536 as the latter automatically reverts to A:A.
If you're absolutely sure A65536 is blank, then you can skip the IF part:
=INDEX(A1:A65535,MAX((A1:A65535<>"")*(ROW(A1:A65535))))
Note that if you're using Excel 2007 or 2010, the last row number is 1048576 not 65536, so adjust the above as appropriate.
If there are no blank cells in the middle of your data, then I would just use the simpler formula, =INDEX(A:A,COUNTA(A:A)).
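Why the COUNTA shortcut needs gap-free data can be seen in a short Python comparison. Both functions are hypothetical models of the formulas named in their docstrings, with empty cells represented by empty strings.

```python
def last_by_count(cells):
    """Model of INDEX(A:A, COUNTA(A:A)): position = number of filled cells."""
    filled = sum(1 for cell in cells if cell != "")
    return cells[filled - 1] if filled else None


def last_by_max_row(cells):
    """Model of INDEX(A:A, MAX((A:A<>"")*ROW(A:A))): highest filled row."""
    rows = [row * (cell != "") for row, cell in enumerate(cells, start=1)]
    top = max(rows, default=0)
    return cells[top - 1] if top else None


no_gaps = ["a", "b", "c", ""]
with_gap = ["a", "", "c", ""]
print((last_by_count(no_gaps), last_by_max_row(no_gaps)))    # ('c', 'c')
print((last_by_count(with_gap), last_by_max_row(with_gap)))  # ('', 'c')
```

With a blank in the middle, the count points one row too high in the data, so the count-based version returns the blank instead of the last value.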
An alternative solution without array formulas, possibly more robust than the one hinted at in a previous answer, is
=INDEX(A:A,INDEX(MAX(($A:$A<>"")*(ROW(A:A))),0))
See this answer as an example.
Kudos to Brad and barry houdini, who helped solve this question.
Possible reasons for preferring a non-array formula are given in:
An official Microsoft page (look for "Disadvantages of using array formulas").
Array formulas can seem magical, but they also have some disadvantages:
You may occasionally forget to press CTRL+SHIFT+ENTER. Remember to press this key combination whenever you enter or edit an array formula.
Other users may not understand your formulas. Array formulas are relatively undocumented, so if other people need to modify your workbooks, you should either avoid array formulas or make sure those users understand how to change them.
Depending on the processing speed and memory of your computer, large array formulas can slow down calculations.
Array Formula Heresy.
If you search in column A, use:
=INDIRECT("A" & SUMPRODUCT(MAX((A:A<>"")*(ROW(A:A)))))
If your range is A1:A10, you can use:
=INDIRECT("A" & SUMPRODUCT(MAX(($A$1:$A10<>"")*(ROW($A$1:$A10)))))
In this formula,
SUMPRODUCT(MAX(($A$1:$A10<>"")*(ROW($A$1:$A10))))
returns the last non-blank row number, and INDIRECT() returns the cell value.
=INDEX(A:A, COUNTA(A:A), 1) taken from here
=MATCH("*";A1:A10;-1) for textual data
=MATCH(0;A1:A10;-1) for numerical data
I've tried all the non-volatile versions, but not one version given above has worked (Excel 2003/2007 update). Surely this can be done in Excel 2003? Not as an array formula, nor as a standard formula.
I either get just a blank, 0 or a #VALUE! error.
So I resorted to the volatile methods. This worked:
=LOOKUP(2,1/(T4:T369<>""),T4:T369)
@Julian Kroné: Using ";" instead of "," does NOT work! I think you are using LibreOffice, not MS Excel?
LOOKUP is so annoyingly volatile that I use it as a last resort only.
For Microsoft office 2013
"Last but one" of a non empty row:
=OFFSET(Sheet5!$C$1,COUNTA(Sheet5!$C:$C)-2,0)
"Last" non empty row:
=OFFSET(Sheet5!$C$1,COUNTA(Sheet5!$C:$C)-1,0)
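Place this code in a VBA module and save. The function then appears in the Insert Function dialog under the User Defined category.
Function LastNonBlankCell(rng As Excel.Range) As Variant
    ' Mark the function volatile so it recalculates with the sheet.
    Application.Volatile
    ' End(xlDown) behaves like Ctrl+Down from the first cell, so it only
    ' reaches the last value when the data has no gaps.
    LastNonBlankCell = rng.End(xlDown).Value
End Function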
For textual data (EQUIV is the French-locale name of MATCH):
EQUIV("";A1:A10;-1)
For numerical data:
EQUIV(0;A1:A10;-1)
This gives you the relative index of the last non-empty cell in the selected range (here A1:A10).
If you want to get the value, access it via INDIRECT after building the absolute cell reference as text, e.g.:
INDIRECT("A" & (nb_line_where_your_data_start + EQUIV(...) - 1))
I had the same problem too. This formula also works equally well:-
=INDIRECT(CONCATENATE("$G$",(14+(COUNTA($G$14:$G$65535)-1))))
14 being the row number of the first row in the rows you want to count.
I used HLOOKUP
A1 has a date;
A2:A8 has forecasts captured at different times, I want the latest
=Hlookup(a1,a1:a8,count(a2:a8)+1)
This uses a standard hlookup formula with the lookup array defined by the number of entries.
If you know that there are not going to be empty cells in between, the fastest way is this.
=INDIRECT("O"&(COUNT(O:O,"<>""")))
It just counts the non-empty cells and refers to the appropriate cell.
It can be used for a specific range as well.
=INDIRECT("O"&(COUNT(O4:O34,"<>""")+3))
This returns the last non empty cell in the range O4:O34.
This formula worked for me in Office 2010:
=LOOKUP(2;1/(A1:A100<>"");A1:A100)
A1: the first cell
A100: the last cell in the range being compared
I think the response from W5ALIVE is closest to what I use to find the last row of data in a column. Assuming I am looking for the last row with data in Column A, though, I would use the following for the more generic lookup:
=MAX(IFERROR(MATCH("*",A:A,-1),0),IFERROR(MATCH(9.99999999999999E+307,A:A,1),0))
The first MATCH will find the last text cell and the second MATCH finds the last numeric cell. The IFERROR function returns zero if the first MATCH finds all numeric cells or if the second match finds all text cells.
Basically this is a slight variation of W5ALIVE's mixed text and number solution.
In testing the timing, this was significantly quicker than the equivalent LOOKUP variations.
To return the actual value of that last cell, I prefer to use indirect cell referencing like this:
=INDIRECT("A"&MAX(IFERROR(MATCH("*",A:A,-1),0),IFERROR(MATCH(9.99999999999999E+307,A:A,1),0)))
The method offered by sancho.s is perhaps a cleaner option, but I would modify the portion that finds the row number to this:
=INDEX(MAX((A:A<>"")*(ROW(A:A))),1)
the only difference being that the ",1" returns the first value while the ",0" returns the entire array of values (all but one of which are not needed). I still tend to prefer addressing the cell to the index function there, in other words, returning the cell value with:
=INDIRECT("A"&INDEX(MAX((A:A<>"")*(ROW(A:A))),1))
Great thread!
If you are not afraid to use arrays, then the following is a very simple formula to solve the problem:
=SUM(IF(A:A<>"",1,0))
You must press CTRL + SHIFT + ENTER because this is an array formula.
INDEX returns a value by index position in an array and ROWS then is used to specify the last position of the array.
=LET(array,A1:A10,INDEX(array,ROWS(array)))
Also works for multiple columns when setting the parameter [column_num] of INDEX to 0:
=LET(array,A1:C10,INDEX(array,ROWS(array),0))
A simple one which works for me:
=F7-INDEX(A:A,COUNT(A:A))
Okay, so I had the same issue as the asker and tried both top answers, but I only got formula errors. It turned out that I needed to replace "," with ";" for the formulas to work. I am using Excel 2007.
Example:
=LOOKUP(2;1/(A:A<>"");A:A)
or
=INDEX(A:A;MAX((A:A<>"")*(ROW(A:A))))
For version tracking (adding the letter v to the beginning of the number), I found this one to work well in Xcelsius (SAP Dashboards)
="v"&MAX(A2:A500)
