How to match a part of string before a character into one variable and all after it into another - string

I have a problem with splitting string into two parts on special character.
For example:
12345#data
or
1234567#data
I have 5-7 characters in first part separated with "#" from second part, where are another data (characters,numbers, doesn't matter what)
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.

Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)

See this documentation:
First of all, although Lua does not have a split function is its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than # (so, it "splits" a string with 1 character).

Related

Python - Replacing repeated consonants with other values in a string

I want to write a function that, given a string, returns a new string in which occurences of a sequence of the same consonant with 2 or more elements are replaced with the same sequence except the first consonant - which should be replaced with the character 'm'.
The explanation was probably very confusing, so here are some examples:
"hello world" should return "hemlo world"
"Hannibal" should return "Hamnibal"
"error" should return "emror"
"although" should return "although" (returns the same string because none of the characters are repeated in a sequence)
"bbb" should return "mbb"
I looked into using regex but wasn't able to achieve what I wanted. Any help is appreciated.
Thank you in advance!
Regex is probably the best tool for the job here. The 'correct' expression is
test = """
hello world
Hannibal
error
although
bbb
"""
output = re.sub(r'(.)\1+', lambda g:f'm{g.group(0)[1:]}', test)
# '''
# hemlo world
# Hamnibal
# emror
# although
# mbb
# '''
The only real complicated part of this is the lambda that we give as an argument. re.sub() can accept one as its 'replacement criteria' - it gets passed a regex object (which we call .group(0) on to get the full match, i.e. all of the repeated letters) and should output a string, with which to replace whatever was matched. Here, we use it to output the character 'm' followed by the second character onwards of the match, in an f-string.
The regex itself is pretty straightforward as well. Any character (.), then the same character (\1) again one or more times (+). If you wanted just alphanumerics (i.e. not to replace duplicate whitespace characters), you could use (\w) instead of (.)

Lua - match only words outside {} braces in string and replace or append the words with substring

I have various strings with forms similar to:
This is a sentence outside braces{sentence{} with some words. {This is a
sentence inside braces with some words.}{This is a second sentence
inside braces.} Maybe some more words here for another sentence.
With Lua, I want to only match specific words in the string which are outside the "{}" braces. For example, I might want to match the word "sentence" outside the braces but not the occurrences of "sentence" inside the braces. I want to only match the bolded occurrences of the word not the italicized ones.
How to do it?
EDIT: What if I want append or replace the matched words while keeping the substrings inside the braces intact?
Example: append "word" to sentence:
This is a sentenceword outside braces{sentence{} with some words. {This is a
sentence inside braces with some words.}{This is a second sentence
inside braces.} Maybe some more words here for another sentenceword.
The simplest way to do this would be to replace all the brackets with a zero length strings in a temporary variable which you can then use to search for whatever you like.
You can easily do this using Lua's pattern matching and the following simple gsub code:
local tempStr = startStr:gsub("{.-}","")
The .- is the part that makes it grab everything between the { and } and gsub then replaces it all with a blank string.
Edit: The issue with the above method, as DarkWiiPlayer has pointed out is that the first open brace mathces with the first close brace which is incorrect.
The way around that is to use balanced braces (%b) as DarkWiiPlayer has recommended in his answer, like so:
local tempStr = startStr:gsub("%b{}","")
local function weird_match(word, str)
return str:gsub("%b{}", ''):match(word)
end
Replace balanced pairs of { and } with the empty string
Find the desired pattern (word) in the resulting string
Return the matched word (or its captures, if it has any)

Lua -- match strings including non-letter classes

I'm trying to find exact matches of strings in Lua including, special characters. I want the example below to return that it is an exact match, but because of the - character it returns nil
index = string.find("test-string", "test-string")
returns nil
index = string.find("test-string", "test-")
returns 1
index = string.find("test-string", "test")
also returns 1
How can I get it to do full matching?
- is a pattern operator in a Lua string pattern, so when you say test-string, you're telling find() to match the string test as few times as possible. So what happens is it looks at test-string, sees test in there, and since - isn't an actual minus sign in this case, it's really looking for teststring.
Do as Mike has said and escape it with the % character.
I found this helpful for better understanding patterns.
You can also ask for a plain substring match that ignores magic characters:
string.find("test-string", "test-string",1,true)
you need to escape special characters in the pattern with the % character.
so in this case you are looking for
local index = string.find('test-string', 'test%-string')

Efficient way to insert characters between other characters in a string

What is an efficient way in MATLAB to replace/insert one symbol (in series of symbols) with several others that correspond to the one that is being replaced?
For example, consider having a string Eq: Eq = 'A*exp(-((x-xc)/w)^2)'. Is there a way to replace * with .*, / with ./,\ with .\, and ^ with .^ without writing four separate strrep() lines?
Regular expressions will do the job nicely. Regular expressions simply find patterns in text. You specify what kind of pattern you are looking for by a regular expression, and the output gives you the locations of where the pattern occurred.
For our particular case, not only do we want to find where patterns occur, we also want to replace those patterns with something else. Specifically, use the function regexprep from MATLAB to replace matches in a string with something else. What you want to do is replace all *, /, \ and ^ symbols by adding a . in front of each.
How regexprep works is that the first input is the string you're looking at, the second input is a pattern that you're trying to find. In our case, we want to find any of *, /, \ and ^. To specify this pattern, you put those desired symbols in [] brackets. Regular expressions reserve \ as a special symbol to delineate characters that can be parsed as a regular expression but actually aren't. As such, you need to use \\ for the \ character and \^ for the ^ character. The third input is what you want to replace each match with. In our case, we simply want to reuse each matched character, but we add a . at the beginning of the match. This is done by doing \.$0 in the regular expression syntax. $0 means to grab the first token produced by a match... which is essentially the matched symbol from the pattern. . is also a reserved keyword using regular expressions, so we must prepend this symbol with a \ character.
Without further ado:
>> Eq = 'A*exp(-((x-xc)/w)^2)';
>> out = regexprep(Eq, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2)
The pattern we are looking for is [*/\\\^], which means that we want to find any of *, /, \ - denoted as \\ in regex, and \^ - denoted as ^ in regex. We want to find any of these symbols and replace them with the same symbol by adding a . character in front - \.$0.
As a more complicated example, let's make sure that we include all of the symbols you're looking for in a sample equation:
>> A = 'A*exp(-((x-xc)/w)^2) \ b^2';
>> out = regexprep(A, '[*/\\\^]', '\.$0')
out =
A.*exp(-((x-xc)./w).^2) .\ b.^2
I'd go with regexp as in rayryeng's answer. But here's another approach, just to provide an alternative.
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
[~, jj] = sort([1:numel(Eq) ii-.5]); %// will be used to properly order the result
result = [Eq repmat('.',1,numel(ii))]; %// insert dots at the end
result = result(jj); %// properly order the result
And a variant:
ops = '*/\^'; %// operators that need a dot
ii = find(ismember(Eq, ops)); %// find where dots should be inserted
jj = sort([1:numel(Eq) ii-.5]); %// dot locations are marked with fractional part
result = Eq(ceil(jj)); %// repeat characters where the dots will be placed
result(mod(jj,1)>0) = '.'; %// place dots at indices with fractional part
The vectorize function already does almost all of what you want except that it does not convert mldivide (\) to ldivide (.\).
By "efficient," do you mean fewer lines of code or faster? Regular expressions are almost always slower than other approaches and less readable. I don't think they're necessary or a good choice in this case. If you only need to convert your string once, then speed is less of a concern than readability (strrep will still be faster). If you need to do it many times, this simple code that you alluded to is 4–5 times faster than regexrep for short strings like your example (and much faster for longer strings):
out = strrep(Eq,'*','.*');
out = strrep(out,'/','./');
out = strrep(out,'\','.\');
out = strrep(out,'^','.^');
If you want one line, use:
out = strrep(strrep(strrep(strrep(Eq,'*','.*'),'/','./'),'\','.\'),'^','.^');
which will also be slightly faster still. Or create your own version of vectorize and call that.
Where regular expressions shine is in more complex cases, e.g., if your string is already partially vectorized: Eq = 'A.*exp(-((x-xc)/w)^2)'. Even still, the vectorize function just uses strrep and then calls strfind to "remove any possible '..*', '../', etc." and replace them with the proper element-wise operators because it's faster (symbolic math strings can get very large, for example).

How do I remove lines from a string begins with specific string in Lua?

How do I remove lines from a string begins with another string in Lua ? For instance i want to remove all line from string result begins with the word <Table. This is the code I've written so far:
for line in result:gmatch"<Table [^\n]*" do line = "" end
string.gmtach is used to get all occurrences of a pattern. For replacing certain pattern, you need to use string.gsub.
Another problem is your pattern <Table [^\n]* will match all line containing the word <Table, not just begins with it.
Lua pattern doesn't support beginning of line anchor, this almost works:
local str = result:gsub("\n<Table [^\n]*", "")
except that it will miss on the first line. My solution is using a second run to test the first line:
local str1 = result:gsub("\n<Table [^\n]*", "")
local str2 = str1:gsub("^<Table [^\n]*\n", "")
The LPEG library is perfect
for this kind of task.
Just write a function to create custom line strippers:
local mk_striplines
do
local lpeg = require "lpeg"
local P = lpeg.P
local Cs = lpeg.Cs
local lpegmatch = lpeg.match
local eol = P"\n\r" + P"\r\n" + P"\n" + P"\t"
local eof = P(-1)
local linerest = (1 - eol)^1 * (eol + eof) + eol
mk_striplines = function (pat)
pat = P (pat)
local matchline = pat * linerest
local striplines = Cs (((matchline / "") + linerest)^1)
return function (str)
return lpegmatch (striplines, str)
end
end
end
Note that the argument to mk_striplines() may be a string or a
pattern.
Thus the result is very flexible:
mk_striplines (P"<Table" + P"</Table>") would create a stripper
that drops lines with two different patterns.
mk_striplines (P"x" * P"y"^0) drops each line starting with an
x followed by any number of y’s -- you get the idea.
Usage example:
local linestripper = mk_striplines "foo"
local test = [[
foo lorem ipsum
bar baz
buzz
foo bar
xyzzy
]]
print (linestripper (test))
The other answers provide good solutions to actually stripping lines from a string, but don't address why your code is failing to do that.
Reformatting for clarity, you wrote:
for line in result:gmatch"<Table [^\n]*" do
line = ""
end
The first part is a reasonable way to iterate over result and extract all spans of text that begin with <Table and continue up to but not including the next newline character. The iterator returned by gmatch returns a copy of the matching text on each call, and the local variable line holds that copy for the body of the for loop.
Since the matching text is copied to line, changes made to line are not and cannot modifying the actual text stored in result.
This is due to a more fundamental property of Lua strings. All strings in Lua are immutable. Once stored, they cannot be changed. Variables holding strings are actually holding a pointer into the internal table of reference counted immutable strings, which permits only two operations: internalization of a new string, and deletion of an internalized string with no remaining references.
So any approach to editing the content of the string stored in result is going to require the creation of an entirely new string. Where string.gmatch provides an iteration over the content but cannot allow it to be changed, string.gsub provides for creation of a new string where all text matching a pattern has been replaced by something new. But even string.gsub is not changing the immutable source text; it is creating a new immutable string that is a copy of the old with substitutions made.
Using gsub could be as simple as this:
result = result:gsub("<Table [^\n]*", "")
but that will disclose other defects in the pattern itself. First, and most obviously, nothing requires that the pattern match at only the beginning of the line. Second, the pattern does not include the newline, so it will leave the line present but empty.
All of that can be refined by careful and clever use of the pattern library. But it doesn't change the fact that you are starting with XML text and are not handling it with XML aware tools. In that case, any approach based on pattern matching or even regular expressions is likely to end in tears.
result = result:gsub('%f[^\n%z]<Table [^\n]*', '')
The start of this pattern, '%f[^\n%z], is a frontier pattern which will match any transition from either a newline or zero character to another character, and for frontier patterns the pre-first character counts as a zero character. In other words, using that prefix allows the rest of the pattern to match at either the first line or any other start-of-line.
Reference: the Lua 5.3 manual, section 6.4.1 on string patterns

Resources