UTF-8 byte length of a string in Microsoft Excel

I am trying to add in-cell data validation so that the string length is between 8 and 16 and the maximum byte length is less than 40 (UTF-8 encoding).
I created a data validation rule using Excel's built-in support:
Data Validation (Data tab -> Data Validation, between Remove Duplicates and Consolidate in Excel 2016 for Mac). In the Settings tab, the validation criteria are:
Allow: Text Length
Data: between
Min: 8 & Max: 16
Though the above validation satisfies all the restrictions i have(8
For other languages (say Japanese), the string length is counted in characters. For example, "こんにちはこんにちはこんにちは" (hellohellohello in Japanese) takes 45 bytes in UTF-8, which violates the 40-byte limit, even though its length is only 15.
I found the LENB function in Excel, but it returns 30 (instead of 45). I think it is based on a different encoding (ANSI, maybe).
I found the UNICODE function, which gives the Unicode number of the first character (12371 in the above case). But I don't see how I can get the byte count from this number (the first character, こ, takes 3 bytes).
Any help in this regard will be greatly appreciated.

I faced the same issue and here is the solution without VBA based on the answer above + this article. Assuming you have a string in A1:
=SUM(
IF(UNICODE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))<128, 1,
IF(UNICODE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))<2048, 2,
IF(UNICODE(MID(A1,ROW(INDIRECT("1:"&LEN(A1))),1))<65536, 3, 4
))))
Don't forget to enter it as an array formula (Ctrl+Shift+Enter) when leaving the cell :)

With the Unicode value of a character, you can compute how many bytes it will take in UTF-8: <128 is 1 byte, else <2048 is 2, else <65536 is 3, else 4.
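As a cross-check outside Excel, the thresholds above can be sketched in Python. The function name `utf8_byte_length` is made up for illustration; the result is compared against Python's own UTF-8 encoder using the Japanese sample from the question.

```python
def utf8_byte_length(text: str) -> int:
    """Sum the UTF-8 width of each code point using the thresholds above."""
    total = 0
    for ch in text:
        code_point = ord(ch)
        if code_point < 128:
            total += 1
        elif code_point < 2048:
            total += 2
        elif code_point < 65536:
            total += 3
        else:
            total += 4
    return total


sample = "こんにちはこんにちはこんにちは"
# Character count and UTF-8 byte count of the sample from the question.
print(len(sample), utf8_byte_length(sample))  # 15 45
# The threshold rule agrees with the built-in encoder for this sample.
print(utf8_byte_length(sample) == len(sample.encode("utf-8")))  # True
```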

Related

EXCEL: Unique alphanumeric code with certain characters excluded (without VBA / duplicates)

I am trying to create a list of codes, each 5 alphanumeric characters long.
They cannot contain 1 or I, and there cannot be duplicates when dragging / copying the formula down.
The characters that are allowed are:
023456789ABCDEFGHJKLMNOPQRSTUWVXYZ (Capital)
I have tried numerous options but I can't seem to figure this one out.
Cheers
If your allowable character string is in cell A1 then the following formula will result in random codes that are each five characters in length:
=MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1) & MID(A1,RANDBETWEEN(1,34),1)
But note that there is no guarantee that the codes will be unique.
As @ScottCraner pointed out, if you happen to have Office 365, you can use this much shorter formula that takes advantage of two new functions only available in Excel 365:
=CONCAT(MID(A1,RANDARRAY(5,,1,34,TRUE),1))
But again, there is no guarantee that the resulting codes will be unique.
This formula will generate the codes in order
=SUBSTITUTE(SUBSTITUTE(BASE(K, 34,5),"1","Z"),"I","Y")
Here K can be 0, 1, 2, .... One way to generate the first ~1,048,576 K's is to use ROW()-1. You could get higher values of K by using something like K = 1048576*(COLUMN()-1) + ROW()-1.
The formula works by
(a) calling BASE(K, 34, 5) to get a 5-char long base-34 representation of K
(b) substituting Z for 1 since 1 is not a valid char
(c) substituting Y for I since I is not a valid char
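Steps (a) to (c) can be sketched in Python to see why the codes come out distinct. This assumes BASE draws its 34 digits from 0-9 followed by A-X; `code_for` and `DIGITS` are illustrative names, not part of the original answer.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWX"  # assumed 34-symbol digit set of BASE(K, 34)


def code_for(k: int) -> str:
    """Return the 5-character code for counter k, mirroring steps (a)-(c)."""
    if not 0 <= k < 34 ** 5:
        raise ValueError("k must fit in five base-34 digits")
    chars = []
    for _ in range(5):
        k, remainder = divmod(k, 34)
        chars.append(DIGITS[remainder])
    base34 = "".join(reversed(chars))  # (a) zero-padded base-34 text
    # (b) and (c): swap the two forbidden symbols for the two unused letters
    return base34.replace("1", "Z").replace("I", "Y")


print([code_for(k) for k in range(3)])
```

Because Z and Y never occur in the base-34 text, the two substitutions are one-to-one, so distinct values of K give distinct codes up to 34^5 = 45,435,424 of them.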

PowerQuery M Conditional Column 'count' argument is out of range

I have a sheet with dates as MMDDYYYY with no leading 0's if the month number is a single digit. For example, 1012018 or 12312018. Each record has a date, and each date is either 7 or 8 characters in length.
Here is the code I am using to convert the numbers to dates:
if Text.Length([ContractDate]) = 7
then
Text.Range([ContractDate],0,1)&"/"&Text.Range([ContractDate],1,2)&"/"&Text.Range([ContractDate],4,4)
else
Text.Range([ContractDate],0,2)&"/"&Text.Range([ContractDate],2,2)&"/"&Text.Range([ContractDate],4,4)
The code works fine for the "else" condition but I am getting error "Expression.Error: The 'count' argument is out of range. Details: 4" for all records where Text.Length() = 7. I verified this by adding a second column to get Length of ContractDate.
What am I missing?
EDIT: Problem Solved - I'm an idiot. I was getting an error because in the "then" condition, I am extracting a substring of (4,4) from a value that only has Len=7. I can't get 4 characters out of a 7 character string when starting at index of 4.
I know you found the issue with your code, but worth pointing out some things that might be good to know.
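Text.Range with no character count will pull in all characters past the start point (so Text.Range([ContractDate], 4) would not raise an error for either length, although the 7-character values need a start index of 3 to capture the full year).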
Text.Middle operates like Text.Range but will not cause an error if you select a range that expands past the size of the string. This can be useful if for some reason you were dealing with variable size strings where you need a specific number of characters up to a limit past a certain position.
You could also use Text.PadStart([ContractDate], 8, "0") to pad the 7-length strings with a 0 at the start, and avoid the need for a conditional check altogether.
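The padding idea can be sketched in Python, where `str.zfill` plays the role of Text.PadStart. The function name `format_contract_date` is hypothetical and the input is assumed to be a 7- or 8-digit string as described in the question.

```python
def format_contract_date(raw: str) -> str:
    """Pad to 8 digits, then slice month, day and year at fixed positions."""
    padded = raw.zfill(8)  # same idea as Text.PadStart(raw, 8, "0")
    return f"{padded[0:2]}/{padded[2:4]}/{padded[4:8]}"


print(format_contract_date("1012018"))   # 01/01/2018
print(format_contract_date("12312018"))  # 12/31/2018
```

Once every value is 8 characters long, one set of offsets covers both cases and the out-of-range count cannot occur.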

How to build complex value from three variables?

I have an Excel spreadsheet with over 2000 entries:
Field B1: CustomerID as 000012345
Field B2: CustomerID as 0000432
Field C1: CustomerCountry as DE
Field C2: CustomerCountry as IT
I need to build codes 13 characters long consisting of "CustomerCountry" + "CustomerID" without leading 0s + a random number (it can be 6 digits, more or less, depending on the length of CustomerID).
The results should be like this: D1 Code as DE12345967895 or D2 Code as IT43274837401
How to do it with Excel functions?
UPDATED:
I tried this one. My big problem is making the random number long enough to get 13 characters in all. Sometimes CustomerID is just 3 or 4 digits long, and the concatenation of the three variables can be just 10 or 9 characters. But the codes always have to be 13 characters long.
Use & to concatenate strings.
Use VALUE(CustomerID) to trim the leading zeroes from the ID
Use RAND() to add a random number between 0 and 1 or RANDBETWEEN(x,y) to create one between x and y.
Combine the above and there you are!
If you always want 13 digits you can use LEFT(INT(RAND()*10^13);(13-LEN(CustomerCountry)-LEN(VALUE(CustomerID)))) for the random number to ALWAYS be the right length.
total formula
= CustomerCountry
& VALUE(CustomerID)
& LEFT(INT(RAND()*10^13);(13-LEN(CustomerCountry)-LEN(VALUE(CustomerID))))
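The same construction can be sketched in Python. The name `build_code` is made up for illustration, and the filler is generated digit by digit, which is a deliberate difference from the formula: LEFT(INT(RAND()*10^13), n) can come up short on the rare occasions when the random integer has fewer than n digits.

```python
import random


def build_code(country: str, customer_id: str, total_length: int = 13) -> str:
    """Country + ID without leading zeros + random digits up to total_length."""
    core = country + str(int(customer_id))  # int() drops leading zeros, like VALUE()
    pad = total_length - len(core)
    if pad < 0:
        raise ValueError("country and ID are already longer than the target length")
    # One random digit at a time, so leading zeros in the filler are kept.
    filler = "".join(random.choice("0123456789") for _ in range(pad))
    return core + filler


print(build_code("DE", "000012345"))
print(build_code("IT", "0000432"))
```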
=C1 & TEXT(B1,"0") & RIGHT(TEXT(RANDBETWEEN(0,99999999999),"00000000000"),11 - LEN(TEXT(B1,"0")))
that should do it
I don’t understand what is where and OP has accepted answer so have not bothered testing:
=LEFT(RIGHT(C1,2)&VALUE(MID(B1,15,13))&RANDBETWEEN(10^9,10^10),13)
(but I might revert to this if no one else picks the flaws in it first!)

Excel 2007 - Generate unique ID based on text?

I have a sheet with a list of names in Column B and an ID column in A. I was wondering if there is some kind of formula that can take the value in column B of that row and generate some kind of ID based on the text? Each name is also unique and is never repeated in any way.
It would be best if I didn't have to use VBA really. But if I have to, so be it.
Solution without VBA.
The logic is based on the first 8 characters plus the number of characters in the cell.
= CODE(cell) returns the code number of the first letter
= CODE(MID(cell,2,1)) returns the code number of the second letter
= IFERROR(CODE(MID(cell,9,1)),0) returns 0 if the 9th character does not exist
= LEN(cell) returns the number of characters in the cell
Concatenate the first 8 codes and append the length at the end.
If 8 characters are not enough, add further codes for the next characters in the string.
Final function:
=CODE(B2)&IFERROR(CODE(MID(B2,2,1)),0)&IFERROR(CODE(MID(B2,3,1)),0)&IFERROR(CODE(MID(B2,4,1)),0)&IFERROR(CODE(MID(B2,5,1)),0)&IFERROR(CODE(MID(B2,6,1)),0)&IFERROR(CODE(MID(B2,7,1)),0)&IFERROR(CODE(MID(B2,8,1)),0)&LEN(B2)
Sorry, I didn't find a formula-only solution. This thread might help (it tries to calculate the points in a Scrabble game), but I didn't find a way to be sure the generated hash would be unique.
Still, here is my solution, based on a UDF (User-Defined Function):
Put the code in a module:
Public Function genId(ByVal sName As String) As Long
    'Builds a numeric ID by summing each character's ASCII value weighted by its position.
    'Different strings can still produce the same sum, so uniqueness is not guaranteed.
    Dim i As Long
    For i = 1 To Len(sName)
        genId = genId + Asc(Mid(sName, i, 1)) * i
    Next i
End Function
And call it in your worksheet like a formula:
=genId(A1)
[EDIT] Added the * i to take into account the order. It works on my unit tests
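A quick Python sketch of the same position-weighted sum shows why it is worth checking for collisions before relying on it. Here `gen_id` is a hypothetical port of the UDF, and `ord` matches `Asc` only for plain ASCII input.

```python
def gen_id(name: str) -> int:
    """Sum of character codes, each multiplied by its 1-based position."""
    return sum(ord(ch) * position for position, ch in enumerate(name, start=1))


# Two different strings that land on the same ID.
print(gen_id("ab"), gen_id("ca"))  # 293 293
```

Weighting by position separates simple swaps such as "ab" and "ba", but other pairs can still collide, so a duplicate check on the generated column is a sensible safeguard.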
May be OTT for your needs, but you can use a call to CoCreateGuid to get a real GUID
Private Declare Function CoCreateGuid Lib "ole32" (ID As Any) As Long
Function GUID() As String
Dim ID(0 To 15) As Byte
Dim i As Long
If CoCreateGuid(ID(0)) = 0 Then
For i = 0 To 15
GUID = GUID & Right$("0" & Hex$(ID(i)), 2)
Next
Else
GUID = "Error while creating GUID!"
End If
End Function
Test using
Sub testGUID()
MsgBox GUID
End Sub
How best to implement this depends on your needs. One way would be to write a macro that gets a GUID and populates a column wherever names exist. (Note: using it as a UDF as-is is no good, since it will return a new GUID each time it is recalculated.)
EDIT
See this answer for creating a SHA1 hash of a string
Do you just want an incrementing numeric id column to sit next to your values? If so, and if your values will always be unique, you can very easily do this with formulae.
If your values were in column B, starting in B2 underneath your headers for example, in A2 you would type the formula "=IF(B2="","",1+MAX(A$1:A1))". You can copy and paste that down as far as your data extends, and it will increment a numeric identifier for each row in column B which isn't blank.
If you need to do anything more complicated, like identify and re-identify repeating values, or make identifiers 'freeze' once they're populated, let me know. Currently, when you clear or add values in your list the identifiers will shift up and down, so you need to be careful if your data changes.
Unique identifier based on the number of specific characters in text. I used an identifier based on vowels and numbers.
=LEN($J$14)-LEN(SUBSTITUTE($J$14;"a";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"e";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"i";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"j";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"o";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"u";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"y";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"1";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"2";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"3";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"4";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"5";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"6";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"7";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"8";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"9";""))&LEN($J$14)-LEN(SUBSTITUTE($J$14;"0";""))
You say you are confident that there are no duplicate values in your words. To push it further, are you confident that the first 8 characters in any word would be unique?
If so, you can use the formula below. It works by taking each character's ASCII code minus 40 [assuming normal characters, this puts digits between 8 & 17 and lowercase letters between 57 & 82], and multiplying that character's code by 10 ^ [that character's digit placement in the word]. Basically it takes each character code [-40] and concatenates each code onto the next.
EDIT Note that this code no longer requires that at least 8 characters exist in your word to prevent an error, as the actual word to be coded has 8 "0"'s appended to it.
=TEXT(SUM((CODE(MID(LOWER(RIGHT(REPT("0",8)&A3,8)),{1,2,3,4,5,6,7,8},1))-40)*10^{0,2,4,6,8,10,12,14}),"#")
Note that as this uses the ASCII values of the characters, the ID # could be used to identify the name directly - this does not really create anonymity, it just turns 8 unique characters into a unique number. It is obfuscated with the -40, but not really 'safe' in that sense. The -40 is just to get normal letters and numbers in the 2 digit range, so that multiplying by 10^0,2,4 etc. will create a 2 digit unique add-on to the created code.
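The arithmetic of the formula can be sketched in Python with exact integers. The name `encode8` is made up here, and the sketch mirrors the formula literally, including RIGHT(...,8), which means that for names longer than 8 characters it is the last 8 that get encoded.

```python
def encode8(name: str) -> int:
    """(code - 40) of each of the last 8 characters, packed two digits apiece."""
    padded = ("0" * 8 + name)[-8:].lower()  # RIGHT(REPT("0",8)&name,8), then LOWER
    # The leftmost character is scaled by 10^0 and the rightmost by 10^14.
    return sum((ord(ch) - 40) * 10 ** (2 * i) for i, ch in enumerate(padded))


print(encode8("a"))
print(encode8("Smith") == encode8("smith"))  # True, since the input is lower-cased
```

The result can run to 16 digits, while Excel keeps 15 significant digits in a number, so it may be worth checking whether the lowest digit survives in the worksheet version.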
EDIT FOR ALTERNATIVE METHOD
I had previously attempted to do this so that it would look at each letter of the alphabet, count the number of times it appears in the word, and then multiply that by 10^[that letter's position in the alphabet]. The problem with doing this (see comment below for formula) is that it required a number of 10^26-1, which is beyond Excel's floating point precision. However, I have a modified version of that method:
By limiting the number of allowed characters in the alphabet, we can get the max total size possible to 10^15-1, which Excel can properly calculate. The formula looks like this:
=RIGHT(REPT("0",15)&TEXT(SUM(LEN(A3)*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}-LEN(SUBSTITUTE(A3,MID(Alphabet,{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15},1),""))*10^{0,1,2,3,4,5,6,7,8,9,10,11,12,13,14}),"#"),15)
[The RIGHT(REPT("0",15)&... portion of the formula is meant to keep all codes the same number of characters]
Note that here, Alphabet is a named string which holds the characters: "abcdehilmnorstu". For example, using the above formula, the word "asdf" counts the instances of a, s, and d, but not 'f' which isn't in my contracted alphabet. The code of "asdf" would be:
001000000001001
This only works with the following assumptions:
The letters not listed (nor numbers / special characters) are not required to make each name unique. For example, asdf & asd would have the same code in the above method.
And,
The order of the letters is not required to make each name unique. For example, asd & dsa would have the same code in the above method.
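The letter-count code and both of its limitations can be reproduced with a short Python sketch. `ALPHABET` holds the same 15 characters as the named string, while `count_code` is an illustrative name, not something from the original answer.

```python
ALPHABET = "abcdehilmnorstu"  # the contracted alphabet from the answer


def count_code(name: str) -> str:
    """Each letter's count sits at 10^(its index in ALPHABET), padded to 15 digits."""
    total = sum(name.count(ch) * 10 ** i for i, ch in enumerate(ALPHABET))
    return str(total).zfill(15)[-15:]


print(count_code("asdf"))                       # 001000000001001
print(count_code("asd") == count_code("asdf"))  # True: 'f' is not in ALPHABET
print(count_code("dsa") == count_code("asd"))   # True: letter order is ignored
```

As in the formula, a letter that occurs ten or more times would carry into its neighbour's digit, so the sketch assumes short names.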

How to return a long string from a VBA function in Excel?

I've the following function defined in Excel:
Function LongString()
Dim i As Integer
Do
LongString = LongString & "X"
i = i + 1
Loop Until i > 40000
End Function
This results in an error: #VALUE!
It seems that the maximum string length is limited to 32768?
How can I get this working?
--EDIT--
Thank you all for your support. My solution was to split the output of my function across several cells, each containing fewer than 32768 characters.
According to Microsoft, the 32,767-character limit is in their specification (see here).
Length of cell contents (text): "32,767 characters. Only 1,024 display in a cell; all 32,767 display in the formula bar."
As such the only way you will get more than that in is to break down strings into multiple cells.
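Breaking a long string into cell-sized pieces can be sketched in Python as follows. `CELL_LIMIT` is taken from the specification quoted above, and `split_for_cells` is a hypothetical helper name.

```python
CELL_LIMIT = 32767  # maximum characters in one cell, per the quoted specification


def split_for_cells(text: str, limit: int = CELL_LIMIT) -> list[str]:
    """Cut text into consecutive pieces of at most `limit` characters."""
    return [text[start:start + limit] for start in range(0, len(text), limit)]


chunks = split_for_cells("X" * 40001)
print([len(chunk) for chunk in chunks])  # [32767, 7234]
```

Joining the pieces in order gives back the original string, so nothing is lost by spreading it over adjacent cells.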
Your LongString function returns a variant/variable-length string which can contain up to 2^31 (about 2 billion) characters.
However, as mentioned by @Jon, cells can only contain up to 32767 characters. (Change the data type of i to Long to prove the point.)
If you expand on what you are trying to achieve with LongString we may be able to offer some alternatives
Excel allows more than 32k in each cell. I have a strange situation: I have a cell with a string of 34743 bytes, and I process the string but am not able to return more than 32k. So the problem is in returning the value, not in the maximum size of the cell.
Note that this is Excel 2013; in 2003 the limit is 32k. But VBA code is still limited to returning 32k even in Excel 2013. M$ rare bugs.
