Lua -- match strings including non-letter classes - string

I'm trying to find exact matches of strings in Lua, including special characters. I want the example below to report an exact match, but because of the - character it returns nil:
index = string.find("test-string", "test-string")
returns nil
index = string.find("test-string", "test-")
returns 1
index = string.find("test-string", "test")
also returns 1
How can I get it to do full matching?

- is a quantifier in Lua patterns: it matches the preceding single character (or character class) zero or more times, as few times as possible. So when you write test-string, the - applies to the t just before it, and find() looks for tes, then as few t characters as possible, then string. Since - isn't a literal minus sign here, the pattern matches things like teststring, but not test-string itself.
Do as Mike has said and escape it with the % character.
I found this helpful for better understanding patterns.
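To make the behaviour of the unescaped - concrete, here is a minimal sketch. The subject strings teststring and tesstring are made up for illustration; only the test-string examples come from the question.

```lua
-- Unescaped: "-" is a lazy quantifier on the preceding "t".
local a = string.find("test-string", "test-string")      -- nil: nothing in the pattern matches the literal "-"
local b, c = string.find("teststring", "test-string")    -- 1 10
local d, e = string.find("tesstring", "test-string")     -- 1 9 (zero repetitions of "t")

-- Escaped: "%-" is a literal hyphen.
local f, g = string.find("test-string", "test%-string")  -- 1 11

print(a, b, c, d, e, f, g)
```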

You can also ask for a plain substring match that ignores magic characters:
string.find("test-string", "test-string",1,true)
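Note that a plain find still reports a substring hit, not a whole-string match. A small sketch of one way to tell the two apart follows; the variable names are illustrative, and for a true exact match a plain == comparison is usually enough.

```lua
local subject, needle = "test-string", "test-string"

-- Plain search: the fourth argument disables all magic characters.
local first, last = string.find(subject, needle, 1, true)
print(first, last)                                   -- 1 11

-- The hit covers the whole subject only if it starts at 1 and ends at #subject.
local covers_all = (first == 1 and last == #subject)

-- A shorter needle is still found, but does not cover the whole subject.
local pf, pl = string.find(subject, "test-", 1, true)
local partial = (pf == 1 and pl == #subject)

print(covers_all, partial, subject == needle)        -- true false true
```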

You need to escape special characters in the pattern with the % character.
So in this case you are looking for
local index = string.find('test-string', 'test%-string')

Related

How to match a wildcard for strings?

Please suggest a wildcard for below Firstjson list
Firstjson = { p10_7_8 , p10_7_2 , p10_7_3 , p10_7_4 }
I have tried p10.7.* wildcard for below Secondjson list, it worked. But when I tried p10_7_* for above Firstjson list it did not work
Secondjson = { p10.7.8 , p10.7.2 , p10.7.3 , p10.7.4 }
You are attempting to use wildcard syntax, but Groovy expects regular expression syntax for its pattern matching.
What went wrong with your attempt:
Attempt #1: p10.7.*
A regular expression of . matches any single character and .* matches 0 or more characters. This means:
p10{exactly one character of any kind here}7{zero or more characters of any kind here}
You didn't realize it, but the . character in your first attempt was acting like a single-character wildcard too. This might match with p10x7abcdefg for example. It also does match p10.7.8 though. But be careful, it also matches p10.78, because the .* expression at the end of your pattern will happily match any sequence of characters, thus any and all characters following p10.7 are accepted.
Attempt #2: p10_7_*
_ matches only a literal underscore. But _* means to match zero or more underscores. It does not mean to match characters of any kind. So p10_7_* matches things like p10_7_______. Literally:
p10_7{zero or more underscores here}
What you can do instead:
You probably want a regular expression like p10_7_\d+
This will match things like p10_7_3 or p10_7_422. It works by matching the literal text p10_7_ followed by one or more digits where a digit is 0 through 9. \d matches any digit, and + means to match one or more of the preceding thing. Literally:
p10_7_{one or more digits here}
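For comparison with the Lua threads on this page, the same idea can be sketched in Lua's own pattern syntax. This is Lua, not the Groovy regular expression above: %d plays the role of \d, and the ^ and $ anchors are an added assumption so that the whole name must match. The sample names are made up.

```lua
local names = { "p10_7_8", "p10_7_422", "p10_7_", "p10_7_____", "p10.7.8" }
local pattern = "^p10_7_%d+$"   -- literal "p10_7_" followed by one or more digits

local results = {}
for _, name in ipairs(names) do
  results[name] = name:match(pattern) ~= nil
  print(name, results[name])
end
```

Only the first two names match; the bare prefix, the run of underscores, and the dotted name are all rejected.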

Match whole, exact text line with Lua

I'm looking for a little help on some Lua. I need some code to match this exact line:
efs.test efs.test.gpg
Here's what I have so far, which matches "efs.test":
if string.match(a.message, "%a+%a+%a+.%%a+%a+%a+%a+") then
  print(a.message)
else
  print("Does not match")
end
I've also tried this, which matches:
if string.match(a.message, "efs.test") then
  print(a.message)
else
  print("Does not match")
end
But when I try to add the extra text my compiler errors with "Number expected, got string" when running this code:
if string.match(a.message, "efs.test", "efs") then
  print(a.message)
else
  print("Does not match")
end
Any pointers would be great!
Thanks.
if string.match(a.message, "%a+%a+%a+.%%a+%a+%a+%a+") then
Firstly, this is a wrong use of quantifiers. From PiL 20.2:
+ 1 or more repetitions
* 0 or more repetitions
- also 0 or more repetitions
? optional (0 or 1 occurrence)
In words, you try to match for unlimited %a+ after you already matched the full word with unlimited %a+
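A short sketch of how these quantifiers behave in practice. The sample strings are invented for illustration and are not from the question.

```lua
local s = "<<a>><<b>>"
local greedy = s:match("<<.*>>")   -- "*" takes as much as it can: <<a>><<b>>
local lazy   = s:match("<<.->>")   -- "-" takes as little as it can: <<a>>
local opt    = ("color"):match("colou?r")   -- "?" makes the "u" optional: color

-- Stacking %a+ three times only raises the minimum length to three letters.
local long_word  = ("word"):match("%a+%a+%a+")   -- word
local short_word = ("ab"):match("%a+%a+%a+")     -- nil

print(greedy, lazy, opt, long_word, short_word)
```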
To match efs.test efs.test.gpg, I assume we have two file names. Taking the strict view that file names contain only %w, that is, alphanumeric characters (A-Za-z0-9), this would correctly match efs.test:
string.match(message, "%w+%.%w+")
Going one step further, match efs.test as filename and the following filename:
string.match(message, "%w+%.%w+ %w+%.%w+%.gpg")
While this would match both filenames, you would need to check if matched filenames are the same. We can go one step further yet:
local file, gpgfile = string.match(message, "(%w+%.%w+) (%1%.gpg)")
This pattern will return any <filename> <filename>.gpg where the filenames are equal.
With the use of capture groups, we capture the filename: it will be returned as the first variable and further represented as %1. Then after the space char, we try to match for %1 (captured filename) followed by .gpg. Since it's also enclosed in parentheses, it will become the second captured group and returned as the second variable. Done!
PS: You may want to grab ".gpg" by case-insensitive [Gg][Pp][Gg] pattern.
PPS: File names may contain spaces, dashes, UTF-8 characters etc. E.g. ext4 only forbids \0 and / characters.
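A quick check of the back-reference pattern, as a sketch. The second message, with a different base name, is made up to show the failing case.

```lua
local pattern = "(%w+%.%w+) (%1%.gpg)"

-- Same base name on both sides: both captures are returned.
local file, gpgfile = string.match("efs.test efs.test.gpg", pattern)
print(file, gpgfile)   -- efs.test   efs.test.gpg

-- Different base name: %1 cannot match, so the result is nil.
local other = string.match("efs.test other.test.gpg", pattern)
print(other)           -- nil
```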
The optional third argument of string.match is the index in the given string at which to start searching. If you are looking for exactly efs.test efs.test.gpg in that order with that given spacing, why not just use:
string.match(a.message, "efs%.test efs%.test%.gpg")
If you want to match the entire line containing that substring:
string.match(a.message, ".*efs%.test efs%.test%.gpg.*")
For reference
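The following sketch illustrates both points: a string passed as the third argument raises an error because a number is expected, and anchoring the escaped pattern with ^ and $ (an addition not in the answer above) requires the whole string to match. The message variable stands in for a.message.

```lua
local message = "efs.test efs.test.gpg"

-- The third argument is a numeric start index, so passing "efs" raises an error.
local ok = pcall(string.match, message, "efs.test", "efs")
print(ok)                                   -- false

-- With init = 10 the search skips the first "efs.test".
local first, last = string.find(message, "efs%.test", 10)
print(first, last)                          -- 10 17

-- "^" and "$" anchor the pattern to the start and end of the subject.
local anchored = "^efs%.test efs%.test%.gpg$"
local whole = string.match(message, anchored) ~= nil
local padded = string.match("x " .. message, anchored) ~= nil
print(whole, padded)                        -- true false
```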
If you are trying to match that exact line, it's way easier to just use:
if "efs.test efs.test.gpg" == a.message then
  print(a.message)
else
  print("string does not match!")
end
Of course this wouldn't find any other strings than this.
Another interpretation I see for your question is that you want to know if it has efs.test in the string, which you should be able to accomplish by doing:
if string.match(a.message, "%w+%.%w+") == "efs.test" then
...
end
Also, look into regular expressions: Lua patterns resemble them but are a simpler, separate syntax (for example, % is the escape character instead of \, and there is no alternation with |).

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting string into two parts on special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (letters, numbers, it doesn't matter what).
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See this documentation:
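One thing to be aware of with (.+)#(.+) is that .+ is greedy, so if the data part can itself contain #, the split happens at the last one. A sketch comparing it with a variant that splits at the first #; the input with two # characters and the [^#]+ variant are illustrative additions.

```lua
local s = "1234567#data#more"

-- Greedy: the first capture runs up to the LAST "#".
local a1, b1 = s:match("(.+)#(.+)")
print(a1, b1)   -- 1234567#data   more

-- [^#]+ stops at the FIRST "#"; (.*) also allows an empty second part.
local a2, b2 = s:match("([^#]+)#(.*)")
print(a2, b2)   -- 1234567   data#more
```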
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
  print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than #, so it effectively "splits" the string on the single-character delimiter #.
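If the pieces are needed as variables rather than printed, the same loop can collect them into a table. The split helper below is hypothetical (it is not part of Lua's standard library) and assumes sep is a single character that has no special meaning inside a [...] set, such as #.

```lua
-- Hypothetical helper: collect the runs of non-separator characters.
-- Assumes sep is one character that is not magic inside a set (not ^ ] % -).
local function split(s, sep)
  local parts = {}
  for piece in string.gmatch(s, "[^" .. sep .. "]+") do
    parts[#parts + 1] = piece
  end
  return parts
end

local parts = split("12345#data", "#")
local x, y = parts[1], parts[2]
print(x, y)                        -- 12345   data

-- Empty fields are skipped, because "+" needs at least one character.
local sparse = split("a##b", "#")
print(#sparse)                     -- 2
```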

Efficient way to insert characters between other characters in a string

What is an efficient way in MATLAB to replace/insert one symbol (in series of symbols) with several others that correspond to the one that is being replaced?
For example, consider having a string Eq: Eq = 'A*exp(-((x-xc)/w)^2)'. Is there a way to replace * with .*, / with ./,\ with .\, and ^ with .^ without writing four separate strrep() lines?
Regular expressions will do the job nicely. Regular expressions simply find patterns in text. You specify what kind of pattern you are looking for by a regular expression, and the output gives you the locations of where the pattern occurred.
For our particular case, not only do we want to find where patterns occur, we also want to replace those patterns with something else. Specifically, use the function regexprep from MATLAB to replace matches in a string with something else. What you want to do is replace all *, /, \ and ^ symbols by adding a . in front of each.
How regexprep works is that the first input is the string you're looking at, and the second input is a pattern that you're trying to find. In our case, we want to find any of *, /, \ and ^. To specify this pattern, you put those desired symbols in [] brackets. Regular expressions reserve \ as a special symbol to escape characters that would otherwise be interpreted as part of the regular expression syntax. As such, you need to use \\ for the \ character and \^ for the ^ character. The third input is what you want to replace each match with. In our case, we simply want to reuse each matched character, but add a . at the beginning of the match. This is done with \.$0 as the replacement. $0 refers to the entire matched text, which here is the single matched symbol. Because . is a special character in regular expressions, it is written as \. so that a literal dot is inserted.
Without further ado:
>> Eq = 'A*exp(-((x-xc)/w)^2)';
>> out = regexprep(Eq, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2)
The pattern we are looking for is [*/\\\^], which matches any one of *, /, \ (written as \\ in the regex), and ^ (written as \^ in the regex). Each match is replaced by the same symbol with a . character in front, which is what \.$0 does.
As a more complicated example, let's make sure that we include all of the symbols you're looking for in a sample equation:
>> A = 'A*exp(-((x-xc)/w)^2) \ b^2';
>> out = regexprep(A, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2) .\ b.^2
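For readers coming from the Lua threads on this page, the same substitution can be sketched with string.gsub. This is Lua, not MATLAB: %0 is Lua's equivalent of $0 (the whole match), and magic characters inside the set are escaped with % rather than \.

```lua
-- "\\" in the Lua string literals below is a single backslash character.
local eq = "A*exp(-((x-xc)/w)^2) \\ b^2"

-- The set matches any one of * / \ ^ ; ".%0" puts a dot before the matched symbol.
local out = eq:gsub("[%*/\\%^]", ".%0")
print(out)   -- A.*exp(-((x-xc)./w).^2) .\ b.^2
```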
I'd go with regexprep as in rayryeng's answer. But here's another approach, just to provide an alternative.
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
[~, jj] = sort([1:numel(Eq) ii-.5]); %// will be used to properly order the result
result = [Eq repmat('.',1,numel(ii))]; %// insert dots at the end
result = result(jj); %// properly order the result
And a variant:
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
jj = sort([1:numel(Eq) ii-.5]); %// dot locations are marked with fractional part
result = Eq(ceil(jj)); %// repeat characters where the dots will be placed
result(mod(jj,1)>0) = '.'; %// place dots at indices with fractional part
The vectorize function already does almost all of what you want except that it does not convert mldivide (\) to ldivide (.\).
By "efficient," do you mean fewer lines of code or faster? Regular expressions are almost always slower than other approaches and less readable. I don't think they're necessary or a good choice in this case. If you only need to convert your string once, then speed is less of a concern than readability (strrep will still be faster). If you need to do it many times, this simple code that you alluded to is 4–5 times faster than regexprep for short strings like your example (and much faster for longer strings):
out = strrep(Eq,'*','.*');
out = strrep(out,'/','./');
out = strrep(out,'\','.\');
out = strrep(out,'^','.^');
If you want one line, use:
out = strrep(strrep(strrep(strrep(Eq,'*','.*'),'/','./'),'\','.\'),'^','.^');
which will also be slightly faster still. Or create your own version of vectorize and call that.
Where regular expressions shine is in more complex cases, e.g., if your string is already partially vectorized: Eq = 'A.*exp(-((x-xc)/w)^2)'. Even still, the vectorize function just uses strrep and then calls strfind to "remove any possible '..*', '../', etc." and replace them with the proper element-wise operators because it's faster (symbolic math strings can get very large, for example).

Lua plain searching with string.gsub?

With Lua's string.find function, there is an optional fourth argument you can pass to enable plain searching. From the Lua wiki:
The pattern argument also allows more complex searches. See the PatternsTutorial for more information. We can turn off the pattern matching feature by using the optional fourth argument plain. plain takes a boolean value and must be preceded by index. E.g.,
= string.find("Hello Lua user", "%su") -- find a space character followed by "u"
10 11
= string.find("Hello Lua user", "%su", 1, true) -- turn on plain searches, now not found
nil
Basically, I was wondering how I can accomplish the same plain searching using Lua's string.gsub function.
I expected there to be something in the standard library for this, but there isn't. The solution, then, is to escape the special characters in the pattern so they don't perform their usual functions.
Here's the general idea:
obtain the pattern string
replace each special character with % followed by that character (for example, % becomes %%, and [ becomes %[)
use this as your search pattern for replacing the text
Here is a simple library function for text replacement:
function string.replace(text, old, new)
  local b,e = text:find(old,1,true)
  if b==nil then
    return text
  else
    return text:sub(1,b-1) .. new .. text:sub(e+1)
  end
end
This function can be called as newtext = text:replace(old,new).
Note that this only replaces the first occurrence of old in text.
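To replace every occurrence while keeping the search plain, the same find call can be repeated in a loop. The replace_all function below is a hypothetical sketch along those lines, not a standard library function; it treats both old and new as literal text.

```lua
-- Hypothetical helper: replace every plain-text occurrence of old with new.
local function replace_all(text, old, new)
  if old == "" then return text end           -- an empty needle would never advance
  local parts, pos = {}, 1
  while true do
    local b, e = text:find(old, pos, true)    -- plain search from the current position
    if not b then break end
    parts[#parts + 1] = text:sub(pos, b - 1)  -- text before the hit
    parts[#parts + 1] = new                   -- the literal replacement
    pos = e + 1
  end
  parts[#parts + 1] = text:sub(pos)           -- whatever follows the last hit
  return table.concat(parts)
end

local replaced = replace_all("a.b.c", ".", "%")
print(replaced)   -- a%b%c
```

Collecting the pieces in a table and joining them once avoids building many intermediate strings on long inputs.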
Use this function to escape all magic characters (and only those) in your search string.
function escape_magic(s)
  return (s:gsub('[%^%$%(%)%%%.%[%]%*%+%-%?]','%%%1'))
end
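Putting the pieces together, a plain-text gsub could look like the sketch below. The name gsub_plain is made up for illustration. The replacement string is escaped as well, since % is also special there (it introduces %0, %1, and so on).

```lua
-- Escape every magic character so the string is matched literally.
local function escape_magic(s)
  return (s:gsub('[%^%$%(%)%%%.%[%]%*%+%-%?]', '%%%1'))
end

-- Hypothetical helper: gsub that treats both old and new as plain text.
local function gsub_plain(text, old, new)
  local safe_new = new:gsub("%%", "%%%%")     -- a literal "%" must be written "%%"
  return text:gsub(escape_magic(old), safe_new)
end

local result, count = gsub_plain("1+1=2 (100%)", "+", "%")
print(result, count)   -- 1%1=2 (100%)   1
```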

Resources