Excel: complicated SUMPRODUCT formula needed - excel

I'm in search of a SUMPRODUCT formula, or a similar sort of formula which does the same thing. It should do the following:
On worksheet A it needs to ignore incorrect zipcodes, meaning
zipcodes which do not consist of 4 numbers and 2 letters need to be
ignored. It also needs to take into account that there sometimes are
superfluous spaces behind the zipcode. And sometimes there is a
space between the numbers and letters, sometimes there isn't. Just
the 4 numbers and 2 letters need to be compared.
The correct zipcodes on worksheet A need to be compared with the
zipcodes on worksheet B. If they match, then all the values behind
the zip code need to be summed up. If there is another record
starting with the same zipcode then these need to be added up as
well.
Neither of the worksheets should need to be changed, since the data is generated frequently. The formula should be able to work on a third, separate worksheet. And it should work in Excel 2003.
EDIT: Added point 3.
I'll add an image to visualize what I mean. Hopefully someone can help me!

With some helper columns, you could use something like this (open in new tab for larger version):
The formulae:
In B2 to remove spaces and hence get a 'clean' ZIP and check the length:
=IF(LEN(SUBSTITUTE(A2," ",""))=6,SUBSTITUTE(A2," ",""),"")
In C2, to get the sum:
=IFERROR(IF(AND(ISNUMBER(LEFT(B2,4)*1),CODE(MID(LOWER(B2),5,1))>=97,CODE(MID(LOWER(B2),5,1))<=122,CODE(RIGHT(LOWER(B2)))>=97,CODE(RIGHT(LOWER(B2)))<=122),SUMPRODUCT($H$2:$K$8*($G$2:$G$8=B2)),""),"")
In G2, I used the same one as in B2:
=IF(LEN(SUBSTITUTE(F2," ",""))=6,SUBSTITUTE(F2," ",""),"")
Without the helper, the formula becomes much longer because of repeating parts:
=IFERROR(IF(AND(LEN(SUBSTITUTE(A2," ",""))=6,ISNUMBER(LEFT(SUBSTITUTE(A2," ",""),4)*1),CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))>=97,CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))<=122,CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))>=97,CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))<=122),SUMPRODUCT($H$2:$K$8*(SUBSTITUTE($F$2:$F$8," ","")=SUBSTITUTE(A2," ",""))),""),"")
Or
=IFERROR(
IF(
AND(
LEN(SUBSTITUTE(A2," ",""))=6, ' Check length
ISNUMBER(LEFT(SUBSTITUTE(A2," ",""),4)*1), ' Check numbers
CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))>=97, ' Check if letter
CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))<=122, ' Check if letter
CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))>=97, ' Check if letter
CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))<=122 ' Check if letter
),
SUMPRODUCT(
$H$2:$K$8*
(SUBSTITUTE($F$2:$F$8," ","")=SUBSTITUTE(A2," ",""))),
""
),
""
)
Oops, forgot that IFERROR was not in 2003. The only reason why I used it was that MID would return an empty string and CODE would subsequently give an error. You can use the below instead which makes sure the string is 6 chars first:
=IF(LEN(SUBSTITUTE(A2," ",""))=6,IF(AND(ISNUMBER(LEFT(SUBSTITUTE(A2," ",""),4)*1),CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))>=97,CODE(MID(LOWER(SUBSTITUTE(A2," ","")),5,1))<=122,CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))>=97,CODE(RIGHT(LOWER(SUBSTITUTE(A2," ",""))))<=122),SUMPRODUCT($H$2:$K$8*(SUBSTITUTE($F$2:$F$8," ","")=SUBSTITUTE(A2," ",""))),""),"")

Here you have a formula to validate Dutch postal codes
=AND(LEN(A2)=6; ISNUMBER(VALUE(LEFT(A2;4))); CODE(MID(LOWER(A2);5;1)) >= 97; CODE(MID(LOWER(A2);5;1)) <= 122; CODE(MID(LOWER(A2);6;1)) >= 97; CODE(MID(LOWER(A2);6;1)) <= 122)
0-9 = ASCII code 48 to 57
a-z = ASCII code 97 to 122 (lowercase)
In case you have a Dutch version of Excel, the formula would be:
=EN(LENGTE(A2)=6; ISGETAL(WAARDE(LINKS(A2;4))); CODE(DEEL(KLEINE.LETTERS(A2);5;1)) >= 97; CODE(DEEL(KLEINE.LETTERS(A2);5;1)) <= 122; CODE(DEEL(KLEINE.LETTERS(A2);6;1)) >= 97; CODE(DEEL(KLEINE.LETTERS(A2);6;1)) <= 122)

Related

How/which formula to use, to show combine text results for false condition (for pending task reporting usage)?

Wanted to check if CONCATENATE is the one to use (not sure if my excel has TEXTJOIN), and how to show just the text that has empty value in the cells.
For example in my attachment below, I want the intended result shown like in B2 and B3, where the texts shown with delimiter, when the values are false (empty).
If I were to use CONCATENATE like in Row 10 and Row 11, it's rather manual and it only capture "positive values" as in non-blank cells.
Purpose: To show pending tasks (empty/blank status cells)
Use MID with CONCATENATED IFS:
=MID(IF(C2="","/"&$C$1,"")&IF(D2="","/"&$D$1,"")&IF(E2="","/"&$E$1,"")&IF(F2="","/"&$F$1,"")&IF(GC2="","/"&$G$1,"")&IF(H2="","/"&$H$1,""),2,999)
I would use TEXJOIN and FILTER if you have the newest version of Excel.
For example: =TEXTJOIN("/",1,FILTER($E$2:$I$2, ISBLANK(E3:I3)))
EDIT: For older versions, a temporary workaround is as follows:
make a temporary array the same size as your original dataframe where each value is determined by a formula such as =IF(ISBLANK(E3), E$2&"/","")
Use something like =LEFT(CONCAT(E15:J15), LEN(CONCAT(E15:J15))-1) to get the desired result (where E15:J15 is where I elected to store the first row of the temporary array created in step 1).
I am not sure of your Excel version, but I think this would work in older versions (formatted for readability - will work if you paste it directly into cell B2 and copy down):
=LEFT(CONCAT( INDEX( CHOOSE({1;2;3},$C$1:$H$1,{"/","/","/","/","/","/"},{"","","","","",""}),
INDEX( IF(ISBLANK(C2:H2),{1;2},{3;3}),
MOD(COLUMN(A1:INDEX(1:1,,12))-1,2)+1,
(COLUMN(A1:INDEX(1:1,,12))-1)/2+1 ),
(COLUMN(A1:INDEX(1:1,,12))-1)/2+1 ) ),
SUM(7*ISBLANK(C2:H2))-1 )
Notes
As this is an array formula, you may have to enter it with CTRL + SHIFT + ENTER with an older version of Excel.
The stat labels must all have a length of 6 characters as shown in your post. If not, then they must at least have the same length and the last line SUM(7*ISBLANK(C2:H2))-1 must be changed to replace the 7 with the string length + 1, e.g. a length of 9 would be SUM(10*ISBLANK(C2:H2))-1.
If they don't have the same length, the LEFT( can be removed along with the SUM(10*ISBLANK(C2:H2))-1) at the end. You will end up having a trailing / delimiter at the end. You could fix that for the case of stat F being the last part by changing {"/","/","/","/","/","/"} to {"/","/","/","/","/",""}, but the other cases would still have a trailing /. Another approach is much more complex, but the component SUM(10*ISBLANK(C2:H2))-1) could be shaped to identify what to cut off or maybe a helper column could be built - in any case, let's hope your situation is that the stat labels all have the same length.
The delimiter "/" can be changed, but must always be a single character. If not, then then last line must be changed to SUM( [label length + delimiter length] *ISBLANK(C2:H2))-1.
This formula is fixed to 6 stat columns. If you need for it to accommodate more, it is possible by extending the {"/","/","/","/","/","/"} and {"","","","","",""} (one element for each new column) and replacing every 12 with 2 times the number of columns. Also, obviously, the references $C$1:$H$1 and C1:H2 must be changed to read in your new columns.

Excel - Deleting zeros on the right side of the cells

I would like to delete the zeros on the right side of the cells if there are more then 3 zeros.
Example:
A B
12345 12345
1230 1230
12345600 12345600
12000 12000
12340000000000000 1234000
1234500000000000000000 12345000
Is it possible to excel using just formula in the cells of the column B??
How to do?
Thanks so much!
The answers, given until now, are treating the numbers as strings, while I'd go for the numeric approach:
if mod(number,10000) = 0
then number = number div 1000;
return number;
Which means: if the number, divided by 10,000 equals 0 (if the number ends with '0000') then return the number, divided by a thousand (remove the last three zeroes).
You don't need this one time, but you need to remove all triplets of three zeroes, as much as possible, so instead of a simple if-loop, you might go for a while-loop:
while mod(number,10000) = 0
do number = number div 1000;
return number;
You can use this in a VBA function:
Public Function remove_ending_blanks(r As Range) As Double
Dim temp As Double
temp = r.Value
While temp Mod 10000 = 0
temp = temp / 1000
Wend
remove_ending_blanks = temp
End Function
You might also do this, using a formula, but the while-loop will need to be done using a circular reference, which is quite tricky.
Try this shorter formula solution and worked in left max. 3 zeros on the right side.
In B1, formula copied down :
=0+TRIM(LEFT(A1,MATCH(9^9,INDEX(1/MID(A1,ROW($1:$99),1),0))+3))
Bit of a stretch (I'm prety sure it can be done better), but if you have access to TEXTJOIN, try the following in B2:
=IF(RIGHT(A1,4)="0000",FILTERXML("<t><s>"&TEXTJOIN("</s><s>",1,MID(A1,1,LEN(A1)-ROW(A$1:INDEX(A:A,LEN(A1)))))&"</s></t>","//s[substring(., string-length(.)-3) != 0]"),A1)
Or:
=IF(RIGHT(A7,4)="0000",LEFT(A7,MAX((MID(A7,ROW(A$1:INDEX(A:A,LEN(A7))),1)<>"0")*(ROW(A$1:INDEX(A:A,LEN(A7)))))+3),A7)
Note: It's an array formula and needs to be confirmed through CtrlShiftEnter
It looks frightening, agreed, but would yield the correct result as far as my testing went. For example 100000400000 would yield 1000004000.
you could use this formula
=IF(RIGHT(A1,4)="0000",LEFT(A1,LEN(A1/10^LEN(A1))-FIND(".",A1/10^LEN(A1))+3),A1)
Consider:
Public Function ZeroTrimmer(r As Range) As String
s = r.Text
While Right(s, 4) = "0000"
s = Left(s, Len(s) - 1)
Wend
ZeroTrimmer = s
End Function
Edit: These will give incorrect results. I mis-read the question and didn't realize you wanted to leave a max of 3 trailing zeroes. The below will remove all trailing zeroes.
If it's an unknown number of trailing zeroes, it gets tricky. You'd have to use something like this...
=(10^(LEN(RIGHT(VALUE(CONCATENATE("0.", A2)),LEN(VALUE(CONCATENATE("0.", A2)))-FIND(".",VALUE(CONCATENATE("0.", A2)))))))*VALUE(CONCATENATE("0.", A2))
If you know the number of trailing zeroes it becomes much easier, and can be done like this:
=LEFT(A2,LEN(A2)-3)
Where 3 in the above formula represents the number of trailing zeroes to remove. Another variation could be:
=A2/(10^3)
These formulas will work if text or numeric.

How do I sum data based on a PART of the headers name?

Say I have columns
/670 - White | /650 - black | /680 - Red | /800 - Whitest
These have data in their rows. Basically, I want to SUM their values together if their headers contain my desired string.
For modularity's sake, I wanted to merely specify to sum /670, /650, and /680 without having to mention the rest of the header text.
So, something like =SUMIF(a1:c1; "/NUM & /NUM & /NUM"; a2:c2)
That doesn't work, and honestly I don't know what i should be looking for.
Additional stuff:
I'm trying to think of the answer myself, is it possible to mention the header text as condition for ifs? Like: if A2="/650 - Black" then proceed to sum the next header. Is this possible?
Possibility it would not involve VBA, a draggable formula would be preferable!
At this point, I may as well request a version which handles the complete header name rather than just a part of it as I believe it to be difficult for formula code alone.
Thanks for having a look!
Let me know if I need to elaborate.
EDIT: In regards to data samples, any positive number will do actually, damn shame stack overflow doesn't support table markdown. Anyway, for example then..:
+-------------+-------------+-------------+-------------+-------------+
| A | B | C | D | E |
+---+-------------+-------------+-------------+-------------+-------------+
| 1 |/650 - Black |/670 - White |/800 - White |/680 - Red |/650 - Black |
+---+-------------+-------------+-------------+-------------+-------------+
| 2 | 250 | 400 | 100 | 300 | 125 |
+---+-------------+-------------+-------------+-------------+-------------+
I should have clarified:
The number range for these headers would go from /100 - /9999 and no more than that.
EDIT:
Progress so far:
https://docs.google.com/spreadsheets/d/1GiJKFcPWzG5bDsNt93eG7WS_M5uuVk9cvkt2VGSbpxY/edit?usp=sharing
Formula:
=SUMPRODUCT((A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($H$1)=4,$H$1&"",$H$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($I$1)=4,$I$1&"",$I$1&" ")))+(A2:D2*
(MID($A$1:$D$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))))
Apparently, each MID function is returning false with each F9 calculation.
EDIT EDIT:
Okay! I found my issue, it's the /being read when you ALSO mentioned that it wasn't required. Man, I should stop skimming!
Final Edit:
=SUMPRODUCT((RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match5)=4,Match5&"",Match5&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match6)=4,Match6&"",Match6&" ")))+(RETURNSUM*
(MID(HEADER,2,4)=IF(LEN(Match7)=4,Match7&"",Match7&" ")))
The idea is that Header and RETURNSUM will become match criteria like the matches written above, that way it would be easier to punch new criterion into the search table. As of the moment, it doesn't support multiple rows/dragging.
I have knocked up a couple of formulas that will achieve what you are looking for. For ease I have made the search input require the number only as pressing / does not automatically type into the formula bar. I apologise for the length of the answer, I got a little carried away with the explanation.
I have set this up for 3 criteria located in J1, K1 and L1.
Here is the output I achieved:
Formula 1 - SUMPRODUCT():
=SUMPRODUCT((A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($K$1)=4,$K$1&"",$K$1&" ")))+(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($L$1)=4,$L$1&"",$L$1&" "))))
Sumproduct(array1,[array2]) behaves as an array formula without needed to be entered as one. Array formulas break down ranges and calculate them cell by cell (in this example we are using single rows so the formula will assess columns seperately).
(A4:G4*(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" ")))
Essentially I have broken the Sumproduct() formula into 3 identical parts - 1 for each search condition. (A4:G4*: Now, as the formula behaves like an array, we will multiply each individual cell by either 1 or 0 and add the results together.
1 is produced when the next part of the formula is true and 0 for when it is false (default numeric values for TRUE/FALSE).
(MID($A$1:$G$1,2,4)=IF(LEN($J$1)=4,$J$1&"",$J$1&" "))
MID(text,start_num,num_chars) is being used here to assess the 4 digits after the "/" and see whether they match with the number in the 3 cells that we are searching from (in this case the first one: J1). Again, as SUMPRODUCT() works very much like an array formula, each cell in the range will be assessed individually.
I have then used the IF(logical_test,[value_if_true],[value_if_false]) to check the length of the number that I am searching. As we are searching for a 4 digit text string, if the number is 4 digits then add nothing ("") to force it to a text string and if it is not (as it will have to be 3 digits) add 1 space to the end (" ") again forcing it to become a text string.
The formula will then perform the calculation like so:
The MID() formula produces the array: {"650 ","670 ","800 ","680 ","977 ","9999","143 "}. This combined with the first search produces {TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} which when multiplied by A4:G4
(remember 0 for false and 1 for true) produces this array: {250,0,0,0,0,0,0} essentially pulling the desired result ready to be summed together.
Formula 2: =SUM(IF(Array)): [This formula does not work for 3 digit numbers as they will exist within the 4 digit numbers! I have included it for educational purposes only]
=SUM(IF(ISNUMBER(SEARCH($J$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($K$1,$A$1:$G$1)),A8:G8),IF(ISNUMBER(SEARCH($L$1,$A$1:$G$1)),A8:G8))
The formula will need to be entered as an array (once copy and pasted while still in the formula bar hit CTRL+SHIFT+ENTER)
This formula works in a similar way, SUM() will add together the array values produced where IF(ISNUMBER(SEARCH() columns match the result column.
SEARCH() will return a number when it finds the exact characters in a cell which represents it's position in number of characters. By using ISNUMBER() I am avoiding having to do the whole MID() and IF(LEN()=4,""," ") I used in the previous formula as TRUE/FALSE will be produced when a match is found regardless of it's position or cell formatting.
As previously mentioned, this poses a problem as 999 can be found within 9999 etc.
The resulting array for the first part is: {250,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE} (if you would like to see the array you can highlight that part of the formula and calculate with F9 but be sure to highlight the exact brackets for that part of the formula).
I hope I have explained this well, feel free to ask any questions about stuff that you don't understand. It is good to see people keen to learn and not just fishing for a fast answer. I would be more than happy to help and explain in more depth.
I start this solution with the names in an array, you can read the header names into an array with not too much difficulty.
Sub test()
Dim myArray(1 To 4) As String
myArray(1) = "/670 - White"
myArray(2) = "/650 - black"
myArray(3) = "/680 - Red"
myArray(4) = "/800 - Whitest"
For Each ArrayValue In myArray
'Find position of last character
endposition = InStr(1, ArrayValue, " - ", vbTextCompare)
'Grab the number section from the string, based on starting and ending positions
stringvalue = Mid(ArrayValue, 2, endposition - 2)
'Convert to number
NumberValue = CLng(stringvalue)
'Add to total
Total = Total + NumberValue
Next ArrayValue
'Print total
Debug.Print Total
End Sub
This will print the answer to the debug window.

How to compare only a part of strings in different excel sheets(columns)

How to compare two sheet columns specied below, however only part of string in location(column)in sheet1 will matches only part of string with the Location(column) in sheet2?
1.only 1st two characters of location(column) in sheet1 and 1st two characters of location(column) in sheet2 should match.
2.only any two characters of location(column) in sheet1 and any two characters of location(column) in sheet2 should match. Please help
Location(sheet1) Location(sheet2)
____________________________________________
india- north USxcs
India-west Indiaasd
India- east Indiaavvds
India- south Africassdcasv
US- north Africavasvdsa
us-west UKsacvavsdv
uk- east Indiacascsa
uk- south UScssca
Africa-middle Indiacsasca
Africa-south Africaccc
Africa-east UKcac
For question 1 you can use MID function to extract the first two characters from each cell value and compare them.
For question 2 there is a solution if you can accept a predetermined maximum length of string. It is not a very nice solution! You can use nested if statements, basically 'unrolling the loop'. This example compares cell A1 and B1 for lengths of A1 up to 12 characters:
=IF(IFERROR(FIND(MID(A1,1,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,2,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,3,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,4,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,5,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,6,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,7,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,8,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,9,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,10,2),B1,1),0)>0,TRUE,
IF(IFERROR(FIND(MID(A1,11,2),B1,1),0)>0,TRUE,
FALSE
)
)
)
)
)
)
)
)
)
)
)
Thanks to James Jenkins for this update
It seems older versions of Excel have a limit of 7 nested functions. You can work around this (if you don't mind making your spreadsheet even more ugly) by chaining together formulas in adjacent cells. In fact if you wanted to get really creative you could use the column index to calculate the offsets for the search, something like:
=IF(IFERROR(FIND(MID($A1,(COLUMN(C1) - 3) * 6 + 1, 2), $B1, 1),0)>0,TRUE,
...repeat with +2, +3, +4, +5
if(D2 = FALSE, FALSE, TRUE)
)))))))
Then the column can be copied right if you ever need more string length. Note the innermost 'if' should force a TRUE or FALSE value when the adjacent column is blank.

Excel 2007 - Generate unique ID based on text?

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.
It would be best if I didn't have to use VBA really. But if I have to, so be it.
Solution Without VBA.
Logic based on First 8 characters + number of character in a cell.
= CODE(cell) which returns Code number for first letter
= CODE(MID(cell,2,1)) returns Code number for second letter
= IFERROR(CODE(MID(cell,9,1)) If 9th character does not exist then return 0
= LEN(cell) number of character in a cell
Concatenating firs 8 codes + adding length of character on the end
If 8 character is not enough, then replicate additional codes for next characters in a string.
Final function:
=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
Sorry, I didn't found a solution with formula only even if this thread might help (trying to calculate the points in a scrabble game) but I didn't find a way to be sure the generated hash would be unique.
Yet, here is my solution, based on a UDF (Used-Defined Function):
Put the code in a module:
Public Function genId(ByVal sName As String) As Long
'Function to create a unique hash by summing the ascii value of each character of a given string
Dim sLetter As String
Dim i As Integer
For i = 1 To Len(sName)
genId = Asc(Mid(sName, i, 1)) * i + genId
Next i
End Function
And call it in your worksheet like a formula:
=genId(A1)
[EDIT] Added the * i to take into account the order. It works on my unit tests
May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID
Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long
Function GUID() As String
Dim ID(0 To 15) As Byte
Dim i As Long
If CoCreateGuid(ID(0)) = 0 Then
For i = 0 To 15
GUID = GUID & Format(Hex$(ID(i)), "00")
Next
Else
GUID = "Error while creating GUID!"
End If
End Function
Test using
Sub testGUID()
MsgBox GUID
End Sub
How to best implement depends on your needs. One way would be to write a macro to get a GUID populate a column where names exist. (note, using it as a udf as is is no good, since it will return a new GUID when recalculated)
EDIT
See this answer for creating a SHA1 hash of a string
Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.
If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.
If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values to your list the identifers will toggle themselves up and down, so you need to be careful if your data changes.
Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.
=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?
If so, you can use the below formula. It works by individually taking each character's ASCII code - 40 [assuming normal characters, this puts numbers at between 8 & 57, and letters at between 57 & 122], and multiplying that characters code by 10 ^ [that character's digit placement in the word]. Basically it takes that character code [-40], and concatenates each code onto the next.
EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"'s appended to it.
=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")
Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.
EDIT FOR ALTERNATIVE METHOD
I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10*[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:
By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:
=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)
[The RIGHT("00000000000000"... portion of the formula is meant to keep all codes the same number of characters]
Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:
001000000001001
This only works with the following assumptions:
The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.
And,
The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.

Resources