if third letter is equal to - excel

I have a cell which contains any 4 letters, example akei, skiw. How do I ask
"if the 3rd letter is an i then equals True"
I was thinking something like
"=if(a1="??i?", True, False)"
But that dosen't Work

=MID(A1,3,1) = "i"
Should work, you don't need to use IF, the evaluation using the equals will return either TRUE or FALSE

The MID function let you to select a portion of the text, if you set the Position Start and the numbers of characters you want
=MID(A1,3,1) = "i"
so you just compare it to "i"

You could use the wildcard approach if you use COUNTIF like this
=COUNTIF(A1,"??i?")
That will return 1 or 0 and effectively tests two things, that A1 contains 4 characters AND the third one is "i"
As with MID this isn't case-sensitive so 1 will be returned for both XXIX and zziz

So I came upon this question for a similar use: Expanding a numbering system for lab specimen with each digit distinguishing something different and the 3rd digit indicating a heating profile
The resulting function was used to expand these codes out for people that would not know my logic otherwise:
=IF(MID(B2,3,1) = "1", "Temp1°C for 1 hour", "Temp2°C for X hours")

Related

How to fix the #SPILL! Error by displaying only the second value?

I have a column with some info displayed like that:
Product Info
I am the 3rd product from 2020
I was created in 1995 and I went public in 2021
I am a not sure if I'm from 2019 2020 2021
I have a formula to extract the year in the above column that is:
=IFERROR(FILTERXML("<k><m>"&SUBSTITUTE([#[Product Name]]," ","</m><m>")&"</m></k>","//m[.=number() and string-length()=4]"),"")
The problem with this formula is that it works fine with the first case, but it gives me a #SPILL! Error on the other two cases. My ideal output would be:
Product Info
Year
I am the 3rd product from 2020
2020
I was created in 1995 and I went public in 2021
2021
I am a not sure if I'm from 2019 2020 2021
Basically, for the first case, just return the 4 digits. EVERY time that I only have one sequence of 4 digits, I want to return that sequence.
For the second case, I want to return ONLY the second year. EVERY time I have 2 sequences of 4 digits, I want to return ONLY the second year.
For the third case, I want to return nothing. EVERY time I have more than 2 sequences of 4 digits, I want to return blank.
The last thing I tried to add was position()>5 and that would cut off the 1995 in the second example, but I would continue having the Error on the third example. Also, my list is quite huge, and I am not sure if the position()>5 thing would work for ALL products that fall in the same second example.
I am not very good with XPATH, so any help would be greatly appreciated.
Thank you!
Disclaimer: Below solution is written on the assumption that when 'count of years < 3', return the last given year. If 'count >= 3' then only return the last year if years come in pairs of two. Hence the use of 'modulus 2 == 0'.‡
You can expand the xpath for sure if you so desire. However, I'd rewrite it a little bit. Each predicate, the structure between the opening and closing square brackets, is a filter of a given nodelist. To write multiple of these structures is in fact anding such predicates. To get a better understanding of what most common xpath 1.0 functions can do within FILTERXML(), I'd like to redirect you to this post.
So to write a consecutive pattern of predicates I'd opt for:
[.*0=0] - First return a filtered nodelist of all numbers where a node multiplied by zero equals zero;
[string-length()=4] - Then return only those that are 4 characters long‡‡;
[position() = last() and (position() = 1 or position() mod 2 = 0)] - The 3rd and last predicate is the trickiest for your query. This is done with a first check that position() = last() meaning the node needs to be the last node in the filtered nodelist of step 2 and (position() = 1 or position() mod 2 = 0) means we want to check that this node is also at the 1st index or the modulus 2 of the indexed position equals 0‡‡‡.
Formula in B2:
=IFERROR(FILTERXML("<t><s>"&SUBSTITUTE(A2," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=4][position() = last() and (position() = 1 or position() mod 2 = 0)]"),"")
Whilst the above would work for Excel 2013 and higher‡‡‡‡, you do talk about spilled behaviour. If you happen to work with the current channel in ms365 you could also try:
=LET(x,TEXTSPLIT(A2," "),y,--FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),{1,2,3}),z,COUNT(y),IF(OR(z=1,MOD(z,2)=0),TAKE(y,,-1),""))
‡ If you need to simply return the last year if 'count < 3' then you can use xpath "//s[.*0=0][string-length()=4][position()<3 and position() = last()]" or ms365 formula =LET(x,TEXTSPLIT(A2," "),y,FILTER(x,ISNUMBER(-(x&"**0"))*(LEN(x)=4),""),IF(COUNTA(y)>2,"",TAKE(y,,-1))).
‡‡ Note that you can be more strict about this if you'd wish to validate that a year is between say 1900-2050 or so. One could replace the 1st and 2nd predicate with [.*1>1899][.*1<2051].
‡‡‡ Note that the order or writing your and/or statements in xpath do matter. We need to use explicit parentheses to control the precedence. See this
‡‡‡‡ This is not true for Excel Online or Excel for Mac
Just add a simple clause to determine the number of returns, for example using ROWS (since by default FILTERXML returns a vertical array):
=LET(
ζ, FILTERXML(
"<k><m>" &
SUBSTITUTE(
[#[Product Name]],
" ",
"</m><m>"
) & "</m></k>",
"//m[.=number() and string-length()=4]"
),
ξ, ROWS(ζ),
IF(ξ > 2, "", INDEX(ζ, ξ))
)
Edit: I might prefer to avoid FILTERXML here:
=LET(
ζ, TEXTSPLIT([#[Product Name]], " "),
ξ, -(ζ & "**0"),
IF(COUNT(ξ) > 2, "", IFERROR(-LOOKUP(1, FILTER(ξ, LEN(ζ) = 4)), ""))
)
You can try the following using TEXTAFTER function. Assuming you have years at the end delimited by space. If that is not the case, the formula can be adapted to have additional checks (it is a number and four-digit, but strictly speaking a year can have less or more than 4 digits). Let me know if the previous assumption doesn't apply so I can try to adapt it. The following is an array version, so you can use the entire table column in case you are using excel tables:
=LET(in,A2:A4,last,TEXTAFTER(in," ",-1),
IF(ISNUMBER(1*TEXTAFTER(SUBSTITUTE(in," "&last,"")," ",-1)),"",last))
For the case of more than one year, it removes the last year found, and if the second search is a number, then it returns empty, otherwise returns the previous year found.

How do I find the most common string in a column by replacing?

Suppose I have a column of wind directions ("N","S","W",E"). Each cell only contains 1 letter. If I am to find the most common wind directions,=CHAR(MODE(CODE(range))) will do the job
But if I handle wind directions like "SW","NE", the above function would not work. I know that =INDEX(range, MODE(MATCH(range, range, 0 ))) will work.
Just curious, somewhat similar to the first function, is there a way to substitue strings with numbers of choice only when passing in the column into MODE()function, so that it will return a number for me to MATCH() and INDEX() to get the result?
Clarify: Say that I have the following data
And I would like to substitue "N" with 0, "NE" with 45, "SW" with 225 and so on. So that MODE() will be applicable. And if needed, I can then use functions like INDEX(MATCH()) to return the actual letter representation of the wind direction.
=SWITCH(MODE(SWITCH(A:A,"N",0,"NE",45,"E",90,"SE",135,"S",180,"SW",225,"W",270,"NW",315,"")),0,"N",45,"NE",90,"E",135,"SE",180,"S",225,"SW",270,"W",315,"NW","")
Or similar:
=CHOOSE(1+MODE(SWITCH(A:A,"N",0,"NE",45,"E",90,"SE",135,"S",180,"SW",225,"W",270,"NW",315,""))/45,"N","NE","E","SE","S","SW","W","NW")
This first translates the strings to values, then translates the MODE result back to it's string.

How do I find the last number in a string with excel formulas

I'm parsing strings in excel, and I need to return everything through the last number. For example:
Input: A00XX
Output: A00
In my case, I know the last number will be between index 3 and 5, so I'm brute-forcing it with:
=LEFT([#Point],
IF(SUM((MID([#Point],5,1)={"0","1","2","3","4","5","6","7","8","9"})+0),5,
IF(SUM((MID([#Point],4,1)={"0","1","2","3","4","5","6","7","8","9"})+0),4,
IF(SUM((MID([#Point],3,1)={"0","1","2","3","4","5","6","7","8","9"})+0),3,
))))
Unfortunately, I've run into some edge cases where the numbers extended beyond index 5. Is there a generic way to find the last number in a string using excel formulas?
Note:
I've tried =MAX(SEARCH(... but it returns the index of the first number, not the last.
As a starting point: if we know the position of the last number, we can use LEFT to get the string to that point. Suppose that the position is 5:
=LEFT(A1, 5)
But, we don't know the position of the last number. Now, what if the only valid number was 0, and it only appeared once: then we could use FIND to locate the position of the number:
=LEFT(A1, FIND(0, A1))
But, we have more than one valid number. Suppose that we had all the numbers from 0 through 9, but each number could only appear once — then we could use MAX on a FIND array, to tell us which of the numbers is the last one:
=LEFT(A1, MAX(FIND({0,1,2,3,4,5,6,7,8,9}, A1)))
Unfortunately, FIND will throw a #VALUE! error any number doesn't appear, which will then make MAX return the same error. So, we need to fix that with IFERROR:
=LEFT(A1, MAX(IFERROR(FIND({0,1,2,3,4,5,6,7,8,9}, A1), 0)))
However, numbers can appear more than once. As such, we need a method to find the last occurrence of a value in a string (since FIND and SEARCH will, by default, return the first occurrence).
The SUBSTITUTE function has 3 mandatory arguments — Initial String, Value to be Replaced, Value to Replace with — and one Optional argument — the occurrence to replace. Normally, this is omitted, so that all occurrences are replaced. But, if we know how many times a character appears in a string, then we can replace just the last instance with a special/uncommon sub-string to search for.
To count how many times a character appears in a String, just start with the length of the String, then subtract the length when you SUBSTITUTE all copies of that character for Nothing:
=LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))
This means we can now replace the last occurrence of the character with, for example, ">¦<", and then FIND that:
=FIND(">¦<", SUBSTITUTE(A1, 0, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, 0, ""))))
Of course, we want to do this for all the numbers from 0 to 9, and take the MAX value (remembering our IFERROR), so we need to put the Array of values back in:
=MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0))
Then, we plug that all back into our initial LEFT function:
=LEFT(A1, MAX(IFERROR(FIND(">¦<", SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, ">¦<", LEN(A1) - LEN(SUBSTITUTE(A1, {0,1,2,3,4,5,6,7,8,9}, "")))), 0)))
An alternative, assuming that the length of the string in question will never be more than 9 characters (which seems a safe assumption based on your description):
=LEFT(A1,MATCH(0,0+ISERR(0+MID(A1,{1;2;3;4;5;6;7;8;9},1))))
This, depending on your version of Excel, may or may not require committing with CTRL+SHIFT+ENTER.
Note also that the separator within the array constant {1;2;3;4;5;6;7;8;9} is the semicolon, which, for English-language versions of Excel, represents the row-separator. This may require amending if you are using a non-English-language version.
Of course, we can replace this static constant with a dynamic construction. However, since we are already making the assumption that 9 is an upper limit on the number of characters for the string in question, this would not seem to be necessary.
If you have the newest version of Excel, you can try something like:
=LEFT(D1,
LET(x, SEQUENCE(LEN(D1)),
MAX(IF(ISNUMBER(NUMBERVALUE(MID(D1, SEQUENCE(LEN(D1)), 1))), x))))
For example:

Find entire number between # and space at variable places within a cell

How can I find an entire number between a "#" and a space when that combination could appear anywhere in a given cell?
Example cell contents:
"This is a #123 Test that I 45 like to run"
"This is a #45 Test that I 98 like to run"
I need to return "123" from the first one and "45" from the second one.
Using Mid(), I can return the "1", but the problem is the number between # and space can vary in length, but there will generally be a #, number or numbers, then a space.
As a secondary issue, there may be scenarios where there is no "#", but I need to find the first numeric value in the cell and return them (i.e. "1", "34", "648").
Any advice on either of these challenges is greatly appreciated.
This should work as well:
=MID(A11,(FIND("#",A11,1)+1),FIND(" ",A11,FIND("#",A11,1)+1)-FIND("#",A11,1))
works by looking for the hash and the following space... Not for the secondary question...
Since you've put the excel-vba tag on your question, here's a vba way of doing it using regular expressions that should satisfy both your primary and secondary issues:
Sub tmp()
Dim regEx As New RegExp
regEx.Pattern = "^.*?\#?(\d+)"
Dim i As Integer
For i = 1 To Range("A" & Rows.Count).End(xlUp).Row:
Set mat = regEx.Execute(Cells(i, 1).Value)
If mat.Count = 1 Then
Cells(i, 2).Value = mat(0).SubMatches(0)
End If
Next
End Sub
The regular expression uses a non-greedy character search (ie the "?" on the end of "'.*?" is what does that) to find the first pattern in the cell that matches either "#123" or just "123" where the "123" is any arbitrary sequence of digits.
This will return the first number in a string:
=--LEFT(MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1)),FIND(" ",MID(A1,AGGREGATE(15,6,FIND({1,2,3,4,5,6,7,8,9,0},A1),1),LEN(A1))))
AGGREGATE was introduced in 2010 Excel. If you do not ahve that then you will need to use this array formula:
=--LEFT(MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1)),FIND(" ",MID(A1,MIN(IFERROR(FIND({1,2,3,4,5,6,7,8,9,0},A1),1E+99)),LEN(A1))))
Being an array formula it needs to be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode. If done correctly then excel will put {} around the formula.

Excel 2007 - Generate unique ID based on text?

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.
It would be best if I didn't have to use VBA really. But if I have to, so be it.
Solution Without VBA.
Logic based on First 8 characters + number of character in a cell.
= CODE(cell) which returns Code number for first letter
= CODE(MID(cell,2,1)) returns Code number for second letter
= IFERROR(CODE(MID(cell,9,1)) If 9th character does not exist then return 0
= LEN(cell) number of character in a cell
Concatenating firs 8 codes + adding length of character on the end
If 8 character is not enough, then replicate additional codes for next characters in a string.
Final function:
=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
Sorry, I didn't found a solution with formula only even if this thread might help (trying to calculate the points in a scrabble game) but I didn't find a way to be sure the generated hash would be unique.
Yet, here is my solution, based on a UDF (Used-Defined Function):
Put the code in a module:
Public Function genId(ByVal sName As String) As Long
'Function to create a unique hash by summing the ascii value of each character of a given string
Dim sLetter As String
Dim i As Integer
For i = 1 To Len(sName)
genId = Asc(Mid(sName, i, 1)) * i + genId
Next i
End Function
And call it in your worksheet like a formula:
=genId(A1)
[EDIT] Added the * i to take into account the order. It works on my unit tests
May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID
Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long
Function GUID() As String
Dim ID(0 To 15) As Byte
Dim i As Long
If CoCreateGuid(ID(0)) = 0 Then
For i = 0 To 15
GUID = GUID & Format(Hex$(ID(i)), "00")
Next
Else
GUID = "Error while creating GUID!"
End If
End Function
Test using
Sub testGUID()
MsgBox GUID
End Sub
How to best implement depends on your needs. One way would be to write a macro to get a GUID populate a column where names exist. (note, using it as a udf as is is no good, since it will return a new GUID when recalculated)
EDIT
See this answer for creating a SHA1 hash of a string
Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.
If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.
If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values to your list the identifers will toggle themselves up and down, so you need to be careful if your data changes.
Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.
=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?
If so, you can use the below formula. It works by individually taking each character's ASCII code - 40 [assuming normal characters, this puts numbers at between 8 & 57, and letters at between 57 & 122], and multiplying that characters code by 10 ^ [that character's digit placement in the word]. Basically it takes that character code [-40], and concatenates each code onto the next.
EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"'s appended to it.
=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")
Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.
EDIT FOR ALTERNATIVE METHOD
I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10*[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:
By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:
=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)
[The RIGHT("00000000000000"... portion of the formula is meant to keep all codes the same number of characters]
Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:
001000000001001
This only works with the following assumptions:
The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.
And,
The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.

Resources