String.format and gsub in Lua - string

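function string:split(sep)
    local sep, fields = sep or ":", {}
    local pattern = string.format("([^%s]+)", sep)
    -- gsub calls the function once per match and passes the capture as c
    self:gsub(pattern, function(c) fields[#fields + 1] = c end)
    print(c) -- c only exists inside the callback, so this prints nil
    return fields
end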
I have the above block of code.
string.format has the separator as its second argument. Why is that? Usually the second argument is the text that needs to be formatted.
gsub usually replaces a given pattern. What is the role of function(c) in gsub? How is it called and used here? Where does c come from in function(c)?

In the example code, the format specifier of string.format() is "([^%s]+)", in which %s expects a string, thus the second argument sep is a string.
For instance, if sep has a value of ",", then pattern becomes ([^,]+) (one or more occurrences of non-commas), which means the function string:split splits strings on commas (,).
string.gsub() accepts three types for its replacement argument (the third argument of string.gsub, or the second in the method form self:gsub(pattern, repl) used here): a string, a function, or a table. When it's a function, it is called every time a match occurs, with all captured substrings passed as arguments, in order; that is where c comes from. For more details, see string.gsub().
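For comparison, the same two steps can be sketched in Python, whose re.sub also accepts a function as the replacement. This is only a rough analogue, not the Lua code: the helper name split_fields is made up for illustration, and re.escape is an addition here, because regex metacharacters in the separator would otherwise change the pattern.

```python
import re

def split_fields(text, sep=":"):
    """Collect every run of non-separator characters, like the Lua string:split."""
    fields = []
    # Same idea as string.format("([^%s]+)", sep): build the pattern from the separator.
    pattern = "([^%s]+)" % re.escape(sep)

    def collect(match):
        # Called once per match; the capture plays the role of c in function(c).
        fields.append(match.group(1))
        return match.group(0)  # leave the text unchanged; the result is discarded

    re.sub(pattern, collect, text)
    return fields

print(split_fields("a,b,,c", ","))  # ['a', 'b', 'c']
```

As in the Lua version, the substitution result is thrown away; the call is used only for its side effect of filling the list.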

Related

Concatenating strings using sprintf

I wonder why the following code does not work?
function out = test(str1, str2)
    aux = sprintf(str1, str2);
end
Somehow MATLAB does not like how I supply the argument str1, which is entered by the user, to the function sprintf.
Read the documentation on sprintf in MATLAB; it makes clear what's wrong: the first argument. MATLAB's sprintf treats its first argument as a format specification, so a plain string such as 'hello' contains no conversion specifier for the second string. You'll probably want something along the lines of sprintf('%s %s', str1, str2) or sprintf([str1 ' ' str2]), i.e. explicitly concatenating the strings into one text string first.
Your current function will work if you call it as test('%s', 'hello') by the way, or even test('%f %f', [pi 5]). So you might want to use input verification to make sure you're only inputting strings.
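The same rule holds for printf-style formatting elsewhere. As a rough analogy in Python (not MATLAB, and the error behavior of the two differs), a hypothetical helper join_with_format keeps the format specification fixed and passes the user's strings only as data:

```python
def join_with_format(str1, str2):
    # The format spec is fixed; both user strings are only ever data.
    return "%s %s" % (str1, str2)

print(join_with_format("hello", "world"))  # hello world

try:
    "hello" % "world"  # "hello" has no conversion specifier to consume "world"
except TypeError as err:
    print("format failed:", err)
```

Keeping user input out of the format position also means a stray % in that input cannot be misread as a specifier.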

pass regex group to function for substituting [duplicate]

I have a string S = '02143' and a list A = ['a','b','c','d','e']. I want to replace all those digits in 'S' with their corresponding element in list A.
For example, replace 0 with A[0], 2 with A[2] and so on. Final output should be S = 'acbed'.
I tried:
S = re.sub(r'([0-9])', A[int(r'\g<1>')], S)
However this gives an error ValueError: invalid literal for int() with base 10: '\\g<1>'. I guess it is considering backreference '\g<1>' as a string. How can I solve this especially using re.sub and capture-groups, else alternatively?
The call re.sub(r'([0-9])', A[int(r'\g<1>')], S) does not work because \g<1> (an unambiguous way of writing the first backreference, otherwise \1) is only meaningful inside a replacement string that re.sub itself processes. Passed to any other function, it is just the literal string \g<1>, since the re module never gets a chance to evaluate it. The re engine only expands it during a match, but A[int(r'\g<1>')] is evaluated before re.sub is even called, so int() receives the literal text and raises the ValueError.
That is why it is made possible to use callback methods inside re.sub as the replacement argument: you may pass the matched group values to any external methods for advanced manipulation.
See the re documentation:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping
occurrence of pattern. The function takes a single match object
argument, and returns the replacement string.
Use
import re
S = '02143'
A = ['a','b','c','d','e']
print(re.sub(r'[0-9]',lambda x: A[int(x.group())],S))
Note that you do not need to capture the whole pattern with parentheses; you can access the whole match with x.group().
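The evaluation-order point can be made concrete with a small sketch. It uses a named callback (lookup is a name chosen here for illustration) and keeps the capture group from the question, then shows on its own the expression that was evaluated too early:

```python
import re

S = "02143"
A = ["a", "b", "c", "d", "e"]

def lookup(match):
    # match.group(1) is the captured digit, available only once a match exists.
    return A[int(match.group(1))]

result = re.sub(r"([0-9])", lookup, S)
print(result)  # acbed

# The failing version evaluates its argument before re.sub ever runs:
try:
    int(r"\g<1>")
except ValueError as err:
    print("evaluated too early:", err)
```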

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting a string into two parts on a special character.
For example:
12345#data
or
1234567#data
The first part has 5-7 characters and is separated by "#" from the second part, which holds other data (characters, numbers, it doesn't matter what).
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
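For readers more at home in Python, the same capture-based approach might look like the sketch below. It is an analogue rather than a translation of Lua's pattern engine, though both treat + as greedy, which matters once the input contains more than one "#":

```python
import re

s = "12345#data"
match = re.match(r"(.+)#(.+)", s)
if match:
    a, b = match.groups()
    print(a, b)  # 12345 data

# With several "#" characters, the greedy first group takes as much as it can.
print(re.match(r"(.+)#(.+)", "1#2#3").groups())  # ('1#2', '3')
```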
See this documentation:
First of all, although Lua does not have a split function in its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves.
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
    print(i)
end
The [^#]+ pattern matches one or more characters other than #, so it effectively "splits" the string on a single-character delimiter.
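The negated-character-class idea carries over to other regex engines. A minimal Python sketch of the same technique, using re.findall in place of string.gmatch, also shows one consequence worth knowing: empty fields are skipped rather than returned.

```python
import re

example = "12345#data"
parts = re.findall(r"[^#]+", example)
print(parts)  # ['12345', 'data']

# Empty fields vanish, because "+" needs at least one character per match.
print(re.findall(r"[^#]+", "a##b"))  # ['a', 'b']
```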

split string by char

Scala has a standard way of splitting a string in StringOps.split.
Its behaviour somewhat surprised me, though.
To demonstrate, using the quick convenience function
def sp(str: String) = str.split('.').toList
the following expressions all evaluate to true
(sp("") == List("")) //expected
(sp(".") == List()) //I would have expected List("", "")
(sp("a.b") == List("a", "b")) //expected
(sp(".b") == List("", "b")) //expected
(sp("a.") == List("a")) //I would have expected List("a", "")
(sp("..") == List()) // I would have expected List("", "", "")
(sp(".a.") == List("", "a")) // I would have expected List("", "a", "")
So I expected that split would return an array with (the number of separator occurrences) + 1 elements, but that's clearly not the case.
It is almost that, minus all trailing empty strings, but even that does not hold for splitting the empty string.
I'm failing to identify the pattern here. What rules does StringOps.split follow?
For bonus points, is there a good way (without too much copying/string appending) to get the split I'm expecting?
For the curious, you can find the code here: https://github.com/scala/scala/blob/v2.12.0-M1/src/library/scala/collection/immutable/StringLike.scala
See the split function that takes a character as its argument (line 206).
I think the general pattern here is that all trailing empty split results are ignored.
The exception is the first case, where the logic "if no separator char is found, just return the whole string" applies.
I am trying to find out whether there is any design documentation around this.
Also, if you use a String instead of a Char as the separator, it falls back to Java's regex split. As mentioned by @LRLucena, if you provide the limit parameter with a value larger than the number of parts, you get your trailing empty results. See http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String,%20int)
You can use split with a regular expression. I'm not sure, but I guess that the second parameter is the largest size of the resulting array.
def sp(str: String) = str.split("\\.", str.length+1).toList
Seems to be consistent with these three rules:
1) Trailing empty substrings are dropped.
2) An empty substring is considered trailing before it is considered leading, if applicable.
3) First case, with no separators is an exception.
split follows the behaviour of http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split(java.lang.String)
That is, it splits "around" the separator character, with the following exceptions:
Regardless of anything else, splitting the empty string will always give Array("")
Any trailing empty substrings are removed
Surrogate characters only match if the matched character is not part of a surrogate pair.
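The rules listed in these answers can be checked against the question's seven cases with a small emulation. The sketch below is in Python, whose own str.split keeps trailing empty strings (that is, it already behaves the way the asker expected); java_like_split is a hypothetical helper that models the observed results, not the actual Scala or Java implementation.

```python
def java_like_split(text, sep):
    parts = text.split(sep)  # Python keeps every empty piece
    if len(parts) == 1:
        # No separator found: return the whole string, even if it is empty.
        return parts
    # Otherwise drop trailing empty substrings only.
    while parts and parts[-1] == "":
        parts.pop()
    return parts

for s in ["", ".", "a.b", ".b", "a.", "..", ".a."]:
    print(repr(s), java_like_split(s, "."))
```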

How do I compare the contents of a capture with another string in lua

Suppose I have a string like so.
string = "This is just an {example} of a string. {Quite} boring."
At some point in the code, I want to use a function to replace the words between the curly brackets with something else. I prodded around in the manual, and came up with this solution.
function stringModify(a)
    return string.gsub(a, '{(.-)}', stringDecide("%1"))
end
function stringDecide(a)
    if a == "example" then
        return "excellent example"
    elseif a == "Quite" then
        return "Not"
    else
        return "ERROR"
    end
end
Only that it doesn't work the way I want it. The if part, for example, treats variable a as a literal "%1", instead of the contents of the capture.
How do I make it so that the contents of the capture is compared, instead the literal interpretation of the "%1" string?
You don't have to do the whole %1 thing to pass the captures over to your function. One mode of operation of string.gsub takes a function as the replacement and, every time it finds a match, calls it with the captures as arguments (or with the whole match if the pattern has no captures):
The last use of captured values is perhaps the most powerful. We can call string.gsub with a function as its third argument, instead of a replacement string. When invoked this way, string.gsub calls the given function every time it finds a match; the arguments to this function are the captures, while the value that the function returns is used as the replacement string.
Taking this into account, you can just remove a few characters from your existing code so you pass the function rather than calling it:
function stringModify(a)
    return string.gsub(a, '{(.-)}', stringDecide)
end
and your stringDecide function will work unmodified, as you can see.
In your existing code, you want string.gsub to call stringDecide for each match, with the captured string as its parameter. What actually happens is that stringDecide is called once, with the literal parameter "%1", before string.gsub is even called. It returns "ERROR", as expected, so your call is effectively string.gsub(a, '{(.-)}', "ERROR").
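The difference between passing a function and calling it is not specific to Lua. A minimal Python sketch of the same fix follows; the names mirror the question, but in Python the callback receives a match object rather than the capture itself.

```python
import re

def string_decide(match):
    word = match.group(1)  # contents of the capture, not the literal "%1"
    if word == "example":
        return "excellent example"
    elif word == "Quite":
        return "Not"
    return "ERROR"

text = "This is just an {example} of a string. {Quite} boring."
# Pass the function itself; writing string_decide(...) here would call it too early.
modified = re.sub(r"\{(.*?)\}", string_decide, text)
print(modified)  # This is just an excellent example of a string. Not boring.
```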
