String sub not working correctly

I've got yet another question about Lua. I've created a method to calculate the total of some prices. The prices are in this format: £500. To convert them to numbers I'm using string:sub() and tonumber(), but I'm getting some weird results. Here is my code:
function functions.calculateTotalAmount()
    print("calculating total amount")
    saveData.totalAmount = 0
    print("There are " .. #saveData.amounts .. " in the amount file")
    for i=1, #saveData.names do
        print("SaveData.amounts[" .. i .. "] original = " .. saveData.amounts[i])
        print("SaveData.amounts[" .. i .. "] after sub= " .. saveData.amounts[i]:sub(2))
        print("totalAmount: " .. saveData.totalAmount)
        if saveData.income[i] then
            saveData.totalAmount = saveData.totalAmount + tonumber(saveData.amounts[i]:sub(2))
        else
            saveData.totalAmount = saveData.totalAmount - tonumber(saveData.amounts[i]:sub(2))
        end
    end
    totalAmountStr.text = saveData.totalAmount .. " " .. currencyFull
    loadsave.saveTable(saveData, "payMeBackTable.json")
end
I printed out some info in the for loop to determine the problem and this is what is being printed for the first 2 print statements in the for loop:
16:03:51.452 SaveData.amounts[1] original = ¥201
16:03:51.452 SaveData.amounts[1] after sub= 201
It looks fine here on Stack Overflow, but in my log the ¥ is not actually gone; instead it is replaced with a weird rectangle symbol. I've attached a picture of the printed text to this post.
Does anyone see what is going on here?

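Don't use sub in this case, as the ¥ sign is likely a multi-byte sequence (depending on the encoding), so sub(2) cuts it in the middle instead of removing it.
Use gsub("[^%d%.]+","") instead to remove all non-numeric parts. When you pass the result to tonumber, wrap the call in an extra pair of parentheses, as in tonumber((s:gsub("[^%d%.]+", ""))), because gsub also returns the number of substitutions, which tonumber would otherwise receive as its base argument.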
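To see why sub(2) leaves a stray byte behind, here is a small sketch in Python (used only for illustration; the byte layout is the same in any language). It assumes the amounts are UTF-8 encoded, and parse_amount is a hypothetical helper that mirrors the gsub pattern [^%d%.]+ by keeping only digits and the decimal point.

```python
import re


def parse_amount(text: str) -> float:
    # Keep only ASCII digits and the decimal point, like the Lua pattern "[^%d%.]+".
    cleaned = re.sub(r"[^0-9.]+", "", text)
    return float(cleaned)


raw = "¥201".encode("utf-8")  # the yen sign (U+00A5) takes two bytes: 0xC2 0xA5
print(raw)                    # b'\xc2\xa5201'
print(raw[1:])                # b'\xa5201': dropping one byte leaves half of the yen sign
print(parse_amount("¥201"))   # 201.0
print(parse_amount("£500"))   # 500.0
```

The leftover 0xA5 byte is not valid UTF-8 on its own, which is why the log shows a replacement rectangle instead of the yen sign.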

string.sub() works on the bytes of a string, not on its characters. The two differ when the string contains non-ASCII (e.g. Unicode) text.
If the number is at the end of the string, extract it with
amount = tonumber(saveData.amounts[i]:match("%d+$"))
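The same "digits anchored at the end" idea, sketched in Python for illustration (trailing_number is a hypothetical name). The pattern [0-9]+\Z plays the role of Lua's %d+$; \Z is used because Python's $ would also match just before a trailing newline.

```python
import re
from typing import Optional


def trailing_number(text: str) -> Optional[int]:
    # A run of ASCII digits that must reach the very end of the string.
    match = re.search(r"[0-9]+\Z", text)
    return int(match.group()) if match else None


print(trailing_number("¥201"))  # 201
print(trailing_number("£500"))  # 500
print(trailing_number("free"))  # None
```

Because the match only looks at the digits, it does not matter how many bytes the currency sign in front of them occupies.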

Lua strings are strings of bytes, not strings of characters. ASCII characters are 1 byte long, but most other characters consume multiple bytes, so using string.sub() isn't going to work.
There are several standards for converting between bytes and characters (or code points), but by far the most common on the web is UTF-8. If you are using Lua 5.3 or greater, you can use the built-in utf8 library for UTF-8 manipulation. For example, to take a substring of a UTF-8 string, you can do:
-- Simple version without bounds-checking.
local function utf8_sub1(s, start_char_idx, end_char_idx)
    local start_byte_idx = utf8.offset(s, start_char_idx)
    local end_byte_idx = utf8.offset(s, end_char_idx + 1) - 1
    return string.sub(s, start_byte_idx, end_byte_idx)
end
-- More robust version with bounds-checking.
local function utf8_sub2(s, start_char_idx, end_char_idx)
    local start_byte_idx = utf8.offset(s, start_char_idx)
    local end_byte_idx = utf8.offset(s, end_char_idx + 1)
    if start_byte_idx == nil then
        start_byte_idx = 1
    end
    if end_byte_idx == nil then
        end_byte_idx = -1
    else
        end_byte_idx = end_byte_idx - 1
    end
    return string.sub(s, start_byte_idx, end_byte_idx)
end
local s = "¥201"
print(string.sub(s, 2, 4)) -- an invalid byte sequence
print(utf8_sub1(s, 2, 4)) -- "201"
print(utf8_sub2(s, 2, 4)) -- "201"
print(utf8_sub1(s, 2, 5)) -- throws an error: utf8.offset(s, 6) returns nil
If you don't have Lua 5.3, you can use a third-party UTF-8 library instead to achieve the same functionality.
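The core of utf8.offset is small enough to sketch. The Python version below is only an illustration of what that function does over raw bytes; utf8_offset and utf8_sub are hypothetical names, it assumes valid UTF-8 input, and it covers only positive character indices (not utf8.offset's negative indices or its optional start-position argument).

```python
from typing import Optional


def utf8_offset(data: bytes, n: int) -> Optional[int]:
    """1-based byte position where the n-th character starts (n >= 1).

    Returns len(data) + 1 for the position just past the last character,
    and None when n is further out of range.
    """
    count = 0
    for i, byte in enumerate(data):
        # UTF-8 continuation bytes look like 0b10xxxxxx; any other byte starts a character.
        if (byte & 0xC0) != 0x80:
            count += 1
            if count == n:
                return i + 1
    if n == count + 1:
        return len(data) + 1
    return None


def utf8_sub(data: bytes, start_char: int, end_char: int) -> bytes:
    """Characters start_char..end_char (1-based, inclusive), like utf8_sub1."""
    start_byte = utf8_offset(data, start_char)
    end_byte = utf8_offset(data, end_char + 1)
    if start_byte is None or end_byte is None:
        raise IndexError("character index out of range")
    # Lua positions are 1-based and inclusive; Python slices are 0-based and half-open.
    return data[start_byte - 1:end_byte - 1]


s = "¥201".encode("utf-8")
print(s[1:4])             # b'\xa520': the same bytes Lua's string.sub(s, 2, 4) returns
print(utf8_sub(s, 2, 4))  # b'201'
```

Counting only the bytes that do not have the 10xxxxxx continuation pattern is what turns a character index into a byte index.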

Related

How to extract the first instance of digits in a cell with a specified length in VBA?

I have the following Text sample:
Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May
I want to get the number 079, so what I need is the first instance of digits of length 3. Sometimes the 3 digits are at the end, but they are usually found between the first 2 underscores. I only want the digits with length three (079), not 19, 1920, or 2554, which are different lengths.
Sometimes it can look like this with no underscore:
1920 O-B CLI 353 Tar Traf
Or like this with the 3 digit number at the end:
Ins-Si_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_079
There are also times when what I need is 2 digits, but when it's 2 digits it's always at the end, like this:
FY1920-Or-OLV-B-45
How would I get what I need in all cases?
You can split the listed items and check for 3 digits via Like:
Function Get3Digits(s As String) As String
    Dim tmp, elem
    tmp = Split(Replace(Replace(s, "-", " "), "_", " "), " ")
    For Each elem In tmp
        If elem Like "###" Then Get3Digits = elem: Exit Function
    Next
    If Get3Digits = vbNullString Then Get3Digits = IIf(Right(s, 2) Like "##", Right(s, 2), "")
End Function
Edited due to comment:
I would execute a 2 digit search when there are no 3 digit numbers before the end part and the last part is 2 digits. If 3 digits are found at the end then get 3, but if not then get 2. There are times when the last part is a number but only one digit. I would only want to get the last part if there are 2 or 3 digits. The - would not be relevant to the 2 digits. If nothing that is desired is found, then it would return " ".
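The split-and-check approach, including the refinement from the comment (fall back to a 2-digit last part, otherwise return " "), can be sketched in Python for comparison. get_code is a hypothetical name, and treating only hyphens, underscores and spaces as delimiters is an assumption taken from the samples in the question.

```python
import re


def get_code(text: str) -> str:
    # Hyphens, underscores and spaces all act as delimiters, as in the Replace/Split version.
    parts = re.split(r"[-_ ]+", text)
    # First choice: the first part that is exactly three ASCII digits.
    for part in parts:
        if re.fullmatch(r"[0-9]{3}", part):
            return part
    # Fallback: the last part, but only if it is exactly two digits.
    if re.fullmatch(r"[0-9]{2}", parts[-1]):
        return parts[-1]
    return " "  # nothing suitable was found


print(get_code("Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"))  # 079
print(get_code("1920 O-B CLI 353 Tar Traf"))  # 353
print(get_code("FY1920-Or-OLV-B-45"))         # 45
```

Keeping the result as a string preserves the leading zero in 079 without any extra formatting step.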
If VBA is not a must you could try:
=TEXT(INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>")&"</s></t>","//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)]"),1),"000")
It worked for your sample data.
Edit: Some explanation.
SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(A1,"_"," "),"-"," ")," ","</s><s>") - The key part: it transforms all three potential delimiters (hyphen, underscore and space) into valid XML end and start tags.
The result is concatenated with ampersands into a valid XML string (adding a parent node <t>).
FILTERXML can now be used to 'split' the string into an array.
//s[.*0=0][string-length()=3 or (position()=last() and string-length()=2)] - The 2nd parameter of FILTERXML, which must be valid XPath syntax. It reads:
//s    'Select all <s> nodes with the following conditions:
[.*0=0]    'Check if an <s> node times zero returns zero (to check if a node is numeric)
[string-length()=3 or (position()=last() and string-length()=2)]    'Check if a node is 3 characters long OR if it's the last node and only 2 characters long
INDEX(.....,1) - I mentioned in the comments that this is usually not needed, but since Excel O365 might spill the returned array, we may as well include it to prevent spill errors for those using the newest Excel version. It simply retrieves the first element of whatever array FILTERXML returns.
TEXT(....,"000") - Excel drops the leading zeros of a numeric value, so we use TEXT() to turn it back into a three-digit string.
If no element can be found, this returns an error; wrapping the formula in IFERROR fixes that.
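To make the three FILTERXML steps concrete, here is a rough Python analogue using the standard library's ElementTree. It is an illustration under stated assumptions, not a translation of Excel's engine: filterxml_first is a hypothetical name, ElementTree has no string-length() so the XPath predicates are written out by hand, and the numeric test is approximated as "all ASCII digits" (the formula's .*0=0 would also accept decimals). As with FILTERXML, input containing & or < would break the XML.

```python
import xml.etree.ElementTree as ET


def filterxml_first(text: str) -> str:
    # Step 1, like the nested SUBSTITUTE calls: every delimiter becomes "</s><s>".
    body = text.replace("_", " ").replace("-", " ").replace(" ", "</s><s>")
    root = ET.fromstring("<t><s>" + body + "</s></t>")
    nodes = [s.text or "" for s in root.findall("s")]
    last = len(nodes) - 1
    # Step 2, the XPath predicates: numeric, and 3 long or (last node and 2 long).
    hits = [
        v for i, v in enumerate(nodes)
        if v.isascii() and v.isdigit()
        and (len(v) == 3 or (i == last and len(v) == 2))
    ]
    # Step 3, INDEX(..., 1): take the first hit; "" stands in for an IFERROR fallback.
    return hits[0] if hits else ""


print(filterxml_first("Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"))  # 079
print(filterxml_first("FY1920-Or-OLV-B-45"))  # 45
```

Since the nodes stay strings here, no TEXT-style padding step is needed to keep the leading zero.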
Try this function, please:
Function ExtractThreeDigitsNumber(x As String) As String
    Dim El As Variant, arr As Variant, strFound As String
    If InStr(x, "_") > 0 Then
        arr = Split(x, "_")
    ElseIf InStr(x, "-") > 0 Then
        arr = Split(x, "-")
    Else
        arr = Split(x, " ")
    End If
    For Each El In arr
        If IsNumeric(El) And Len(El) = 3 Then strFound = El: Exit For
    Next
    If strFound = "" Then
        If IsNumeric(Right(x, 2)) Then ExtractThreeDigitsNumber = Right(x, 2)
    Else
        ExtractThreeDigitsNumber = strFound
    End If
End Function
It can be called in this way:
Sub testExtractThreDig()
    Dim x As String
    x = "Ins-Si_079_GM_SOC_US_VI SI_SOC_FY1920_US_FY19/20_A2554_Si Resp_2_May"
    Debug.Print ExtractThreeDigitsNumber(x)
End Sub

Sorting strings without methods and other types

Hello, I have to reorder a string; I am banned from using other types and str methods.
My problem is that I could not figure out how to finish my code so that it works with any string.
I tried comparing the results with sorted() to check, and I am stuck at the first exchange.
My code:
i = 0
s1 = "hello"
s2 = sorted(s1)
while (i<len(s1)):
    j=i+1
    while (j<=len(s1)-1):
        if (s1[i] > s1[j]):
            s1 = s1[0:i] + s1[j] + s1[i]
        j+=1
    i+=1
print(s1)
print(s2)
I tried adding + s1[len(s1):] at the end of that expression, but it only gave the right result for the single string I was testing. I am really stuck: how can I make it work for strings of any length?
Thanks
You're not reconstructing the string correctly in s1 = s1[0:i] + s1[j] + s1[i]: you replace one character with the other, but you never actually interchange the two, and you drop the rest of the split string instead of appending it to the new string.
Given what your code looks like, I would do it like this:
i = 0
s1 = "hello"
s2 = sorted(s1)
while i < len(s1):
    j = i + 1
    while j <= len(s1)-1:
        if s1[i] > s1[j]:
            s1 = s1[0:i] + s1[j] + s1[i+1:j] + s1[i] + s1[j+1:len(s1)]
        j += 1
    i += 1
print("".join(s2))
# > 'ehllo'
print(s1)
# > 'ehllo'
Please tell me if anything is unclear!
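Since the question was how to make this work for strings of any length, one way to check it is to wrap the same swap-by-slicing loop in a function and compare it against sorted() on several inputs. exchange_sort is a hypothetical name; the "".join(sorted(...)) call is only there for verification, not part of the sort itself.

```python
def exchange_sort(s: str) -> str:
    # The same loop as above: compare position i with every later position j.
    i = 0
    while i < len(s):
        j = i + 1
        while j < len(s):
            if s[i] > s[j]:
                # Rebuild the string with the characters at i and j interchanged.
                s = s[:i] + s[j] + s[i + 1:j] + s[i] + s[j + 1:]
            j += 1
        i += 1
    return s


# Check against sorted() for several lengths, including the empty and one-character cases.
for word in ["hello", "", "a", "banana", "zyxwv", "Hello World"]:
    assert exchange_sort(word) == "".join(sorted(word)), word
print(exchange_sort("banana"))  # aaabnn
```

The slices s[i + 1:j] and s[j + 1:] are what keep the untouched characters in place, which is why the function works regardless of length.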
I am banned from using other types and str methods
Based upon your criteria, your request is impossible. Just accessing the elements of a string requires string methods.
The technique that you are using is very convoluted, hard to read and is difficult to debug. Try running your code in a debugger.
Now given that you are allowed to convert a string to a list (which requires string methods), redesign your code to use simple, easy to understand statements.
The following code first converts the string into a list. It then loops through the list from the beginning and compares each character with every character after it. If a later character is less than the current character, the two are swapped. As you step through the string, these swaps result in a sorted list. At the end, the list is converted back to a string using join().
msg = 'hello'
s = list(msg)
for i in range(len(s) - 1):
    for j in range(i + 1, len(s)):
        if s[i] <= s[j]:
            continue
        # swap characters
        s[i], s[j] = s[j], s[i]
print(msg)
print(''.join(s))

Converting letters into NATO alphabet in MATLAB

I want to write MATLAB code that converts letters into the NATO alphabet. For example, the word 'hello' would be rewritten as Hotel-Echo-Lima-Lima-Oscar. I have been having some trouble with the code. So far I have the following:
function natoText = textToNato(plaintext)
    plaintext = lower(plaintext);
    r = zeros(1, length(plaintext))
    %Define my NATO alphabet
    natalph = ["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot","Golf", ...
        "Hotel","India","Juliet","Kilo","Lima","Mike","November","Oscar", ...
        "Papa","Quebec","Romeo","Sierra","Tango","Uniform","Victor",...
        "Whiskey","Xray","Yankee","Zulu"];
    %Define the normal lower alphabet
    noralpha = ['a' : 'z'];
    %Now we need to make a loop for matlab to check for each letter
    for i = 1:length(text)
        for j = 1:26
            n = r(i) == natalph(j);
            if noralpha(j) == text(i) : n
            else r(i) = r(i)
                natoText = ''
            end
        end
    end
    for v = 1:length(plaintext)
        natoText = natoText + r(v) + ''
        natoText = natoText(:,-1)
    end
end
I know the above code is a mess and I'm not really sure what I have been doing. Does anyone know a better way of doing this? Can I modify the above code so that it works?
I'm asking because when I run the code now, I get an empty plot, and I don't know why, since none of the lines asks for a plot.
You can actually do your conversion in one line. Given your string array natalph:
plaintext = 'hello'; % Your input; could also be "hello"
natoText = strjoin(natalph(char(lower(plaintext))-96), '-');
And the result:
natoText =
string
"Hotel-Echo-Lima-Lima-Oscar"
This uses the fact that character arrays can be treated as numeric arrays of their ASCII values. The code char(lower(plaintext))-96 converts plaintext to lowercase, then to a character array (if it isn't one already), and subtracting 96 implicitly converts it to a numeric vector of ASCII values. Since 'a' is equal to 97, this creates an index vector containing the values 1 ('a') through 26 ('z'). This is used to index the string array natalph, and the results are then joined together with hyphens.
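The same "character code minus offset" indexing trick carries over to other languages. Here is a Python sketch for comparison (text_to_nato is a hypothetical name). Python lists are 0-based, so the offset is 97 rather than 96, and the sketch adds a guard for characters outside a-z, which the one-liner above does not check.

```python
NATO = ["Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
        "Hotel", "India", "Juliet", "Kilo", "Lima", "Mike", "November", "Oscar",
        "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform", "Victor",
        "Whiskey", "Xray", "Yankee", "Zulu"]


def text_to_nato(plaintext: str) -> str:
    letters = plaintext.lower()
    # Guard added in this sketch: anything outside a-z would produce a bad index.
    if not all("a" <= c <= "z" for c in letters):
        raise ValueError("only the letters a-z can be spelled out")
    # ord('a') is 97, so ord(c) - 97 gives 0..25 (MATLAB subtracts 96 because it indexes from 1).
    return "-".join(NATO[ord(c) - 97] for c in letters)


print(text_to_nato("hello"))  # Hotel-Echo-Lima-Lima-Oscar
```

Without the guard, a space (code 32) would give a negative index, which Python silently wraps around instead of raising an error.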

VBA Trim leaving leading white space

I'm trying to compare strings in a macro and the data isn't always entered consistently. The difference comes down to the amount of leading white space (i.e. " test" vs. "test" vs. " test").
For my macro the three strings in the example should be equivalent. However I can't use Replace, as any spaces in the middle of the string (ex. "test one two three") should be retained. I had thought that was what Trim was supposed to do (as well as removing all trailing spaces). But when I use Trim on the strings, I don't see a difference, and I'm definitely left with white space at the front of the string.
So A) What does Trim really do in VBA? B) Is there a built in function for what I'm trying to do, or will I just need to write a function?
Thanks!
So, as Gary's Student alluded to, the character wasn't 32 (a normal space). It was in fact 160 (a non-breaking space). Now, me being the simple man I am, white space is white space. So in line with that view I created the following function, which strips from both ends of the string all characters that don't display anything to the human eye (i.e. anything that isn't a special character or alphanumeric). That function is below:
Function TrueTrim(v As String) As String
    Dim out As String
    out = v
    'Chop off the first character so long as it's white space
    Do While IsInvisible(Left(out, 1))
        out = Mid(out, 2)
    Loop
    'Chop off the last character so long as it's white space
    Do While IsInvisible(Right(out, 1))
        out = Left(out, Len(out) - 1)
    Loop
    'Capture result for return
    TrueTrim = out
End Function

Private Function IsInvisible(ByVal c As String) As Boolean
    'Characters that don't output something the human eye can see,
    'based on http://www.gtwiki.org/mwiki/?title=VB_Chr_Values
    Const bad As String = "||127||129||141||143||144||160||173||"
    Dim code As Long
    'An empty string means everything has been chopped off, so stop looping
    If c = "" Then Exit Function
    'AscW returns a signed Integer; the mask maps it to 0..65535
    code = AscW(c) And &HFFFF&
    IsInvisible = code < 33 Or InStr(1, bad, "||" & code & "||") <> 0
End Function
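The same two-pointer idea (advance from the front, back off from the end) can be sketched in Python for comparison. true_trim and is_invisible are hypothetical names; the set of extra codes is copied from the TrueTrim answer, and treating everything below 33 as invisible follows that answer as well.

```python
# Extra "invisible" character codes, copied from the TrueTrim answer.
INVISIBLE = {127, 129, 141, 143, 144, 160, 173}


def is_invisible(ch: str) -> bool:
    code = ord(ch)
    return code < 33 or code in INVISIBLE


def true_trim(s: str) -> str:
    start, end = 0, len(s)
    # Advance past invisible characters at the front...
    while start < end and is_invisible(s[start]):
        start += 1
    # ...and back off past invisible characters at the end.
    while end > start and is_invisible(s[end - 1]):
        end -= 1
    return s[start:end]


raw = "\u00a0 test one two three \u00a0"
print(repr(raw.strip(" ")))   # unchanged: stripping only ASCII spaces stops at the NBSP
print(repr(true_trim(raw)))   # 'test one two three'
```

Checking start < end before reading a character means an all-blank string simply trims down to an empty string instead of indexing past the end.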
TRIM() will remove all leading spaces:
Sub demo()
    Dim s As String
    s = " test "
    s2 = Trim(s)
    msg = ""
    For i = 1 To Len(s2)
        msg = msg & i & vbTab & Mid(s2, i, 1) & vbCrLf
    Next i
    MsgBox msg
End Sub
It is possible your data has characters that are not visible, but are not spaces either.
Without seeing your code it is hard to know, but you could also use the Application.WorksheetFunction.Clean() method, which removes non-printable characters (codes 0-31), in conjunction with the Trim() method. Note that Clean() does not remove the non-breaking space (code 160).
MSDN Reference page for WorksheetFunction.Clean()
Why don't you try using the Instr function instead? Something like this
Function Comp2Strings(str1 As String, str2 As String) As Boolean
    If InStr(str1, str2) <> 0 Or InStr(str2, str1) <> 0 Then
        Comp2Strings = True
    Else
        Comp2Strings = False
    End If
End Function
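Basically you are checking whether string1 contains string2 or string2 contains string1. This handles the leading-space case without having to trim the data, but note that it also reports a match whenever one string merely contains the other (e.g. "test" and "testing").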
VBA's Trim function is limited to dealing with spaces. It will remove spaces at the start and end of your string.
In order to deal with things like newlines and tabs, I've always imported the Microsoft VBScript RegEx library and used it to replace whitespace characters.
In your VBA window, go to Tools > References, then find Microsoft VBScript Regular Expressions 5.5. Check it and click OK.
Then you can create a fairly simple function to trim all white space, not just spaces.
Private Function TrimEx(ByVal stringToClean As String) As String
    Dim re As New RegExp
    ' Matches any whitespace at start of string
    ' (\xA0 explicitly covers the non-breaking space, character 160)
    re.Pattern = "^[\s\xA0]+"
    stringToClean = re.Replace(stringToClean, "")
    ' Matches any whitespace at end of string
    re.Pattern = "[\s\xA0]+$"
    stringToClean = re.Replace(stringToClean, "")
    TrimEx = stringToClean
End Function
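For comparison, the same two anchored regex replacements in Python (trim_ex is a hypothetical name). In Python 3, \s in a str pattern matches Unicode whitespace, which already includes the non-breaking space U+00A0; \Z is used for the end anchor so that only the true end of the string counts.

```python
import re

LEADING_WS = re.compile(r"^\s+")    # whitespace at the start of the string
TRAILING_WS = re.compile(r"\s+\Z")  # whitespace at the very end of the string


def trim_ex(s: str) -> str:
    s = LEADING_WS.sub("", s)
    return TRAILING_WS.sub("", s)


print(repr(trim_ex("\t\u00a0 test one two three \n")))  # 'test one two three'
```

Spaces in the middle of the string are untouched because both patterns are anchored. For this input, str.strip() with no arguments gives the same result, since it strips every character for which str.isspace() is true.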
Non-printable characters often separate the lines of text taken from a web page. In the examples below, X, Y and Z stand in for them.
Debug.Print Trim(Mid("X test ", 2)) ' skip 1 non-printable: Mid is 1-based, so start at 2
Debug.Print Trim(Mid("XY test ", 3)) ' skip 2 non-printables: start at 3
Debug.Print Trim(Mid("X Y Z test ", 2)) ' more rounds needed :)
Programmers like working with one large block of text because it can be chopped up neatly with built-in tools (InStr, Mid, Left, and others). But taking text that spans several child elements (e.g. reading .textContent rather than .innerText) can leave several non-printable characters to cope with, and DOM traversal and regular expressions are not for beginners. Reading the inner text of each child element precisely, one by one, may help avoid the non-printable characters.

String Manipulation in Lua: Make the odd char uppercase

I'm trying to write a library in Lua with some functions that manipulate strings.
I want to write a function that changes the odd characters of each word to uppercase (and the even ones to lowercase).
This is an example:
Input: This LIBRARY should work with any string!
Result: ThIs LiBrArY ShOuLd WoRk WiTh AnY StRiNg!
I tried with the "gsub" function but i found it really difficult to use.
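This almost works (the extra parentheses make print show only the new string, not gsub's second return value, the match count):
original = "This LIBRARY should work with any string!"
print((original:gsub("(.)(.)", function (x, y) return x:upper() .. y end)))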
It fails when the string has odd length and the last char is a letter, as in
original = "This LIBRARY should work with any strings"
I'll leave that case as an exercise.
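The pairwise-capture idea translates directly to a regex replacement with a callback. The Python sketch below (upper_odd_pairs is a hypothetical name) shows one way to cover the odd-length case: making the second character optional with (.?) lets the final character match on its own. Like the gsub version, it counts positions across the whole string, not per word.

```python
import re


def upper_odd_pairs(text: str) -> str:
    # Consume characters two at a time; (.?) may be empty, so a lone final character still matches.
    # DOTALL lets "." match newlines too, so they take part in the pairing like any other character.
    return re.sub(r"(.)(.?)", lambda m: m.group(1).upper() + m.group(2), text, flags=re.DOTALL)


print(upper_odd_pairs("strings"))  # StRiNgS
```

The same optional-capture trick works in a Lua pattern, since ? marks a single character class as optional there too.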
First, split the string into an array of words:
local original = "This LIBRARY should work with any string!"
local words = {}
for v in original:gmatch("%w+") do
    words[#words + 1] = v
end
Then, write a function that transforms a word as expected: odd characters to upper case, even characters to lower case:
function changeCase(str)
    local u = ""
    for i = 1, #str do
        if i % 2 == 1 then
            u = u .. string.upper(str:sub(i, i))
        else
            u = u .. string.lower(str:sub(i, i))
        end
    end
    return u
end
Use the function to modify every word:
for i, v in ipairs(words) do
    words[i] = changeCase(v)
end
Finally, use table.concat to join the words into one string:
local result = table.concat(words, " ")
print(result)
-- Output: ThIs LiBrArY ShOuLd WoRk WiTh AnY StRiNg
Since I have mostly been coding in Haskell lately, a functional-ish solution comes to mind:
local function head(str) return str:sub(1, 1) end
local function tail(str) return str:sub(2) end
local function helper(str, c)
    if #str == 0 then
        return ""
    end
    if c % 2 == 1 then
        return string.upper(head(str)) .. helper(tail(str), c + 1)
    else
        return head(str) .. helper(tail(str), c + 1)
    end
end
function foo(str)
    return helper(str, 1)
end
Disclaimer: Not tested, just showing the idea.
And now for real: Lua strings cannot be indexed with [] (s[i] returns nil), but s:sub(i, i) gives random access to the i-th character of an ASCII string, so a simple for loop with an index should do the trick just fine.
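A simple indexed loop that resets its counter at each word boundary reproduces the expected result from the question, punctuation included. This is a Python sketch for illustration; alternate_case is a hypothetical name, and treating a "word" as a run of alphanumeric characters is an assumption modeled on the %w+ pattern used in the split-into-words answer.

```python
def alternate_case(text: str) -> str:
    out = []
    pos = 0  # 1-based position of the current character within its word
    for ch in text:
        if ch.isalnum():
            pos += 1
            # Odd positions go to upper case, even positions to lower case.
            out.append(ch.upper() if pos % 2 == 1 else ch.lower())
        else:
            pos = 0  # spaces and punctuation end the current word
            out.append(ch)
    return "".join(out)


print(alternate_case("This LIBRARY should work with any string!"))
# ThIs LiBrArY ShOuLd WoRk WiTh AnY StRiNg!
```

Because non-word characters are copied through instead of being dropped, the trailing "!" and the original spacing survive, and odd-length words need no special handling.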
