Groovy How to replace the exact match word in a String - groovy

Groovy How to replace the exact match word in a String.
I wanted to replace the exact matched word in a given string in Groovy. and when i tried the below am not getting the exact matched word
def str="My Name is Richards and Richardson"
log.info(str)
str=str.replace("Richards","Praveen")
log.info("After"+str)
Output after executing the above
My Name is Richards and Richardson
AfterMy Name is Praveen and Praveenon
Am Looking for the output like : AfterMy Name is Praveen and Richardson
I tried the boundaries \b
str=str.replace("\bRichards\b","Praveen")
which is in Java and its not working. Looks \b is ba backslash escape sequence in the Groovy
can someone help
def str="My Name is Richards and Richardson"
log.info(str)
str=str.replace("Richards","Praveen")
log.info("After"+str)
expecting:AfterMy Name is Praveen and Richardson

Using boundaries (/b) will not work with String::replace because the method argument does not accept a regular expression pattern but a simple string literal.
You have two options to get the expected outcome:
Instead of using String::replace you can use String::replaceFirst. As the method name suggests it will replace only the first occurrence of the Richards substring leaving the Richardson as is.
str = str.replaceFirst("Richards", "Praveen")
Instead of using String::replace you can use String::replaceAll, in opposite to String::replace it supports regular expressions so you can use word boundaries tokens
str = str.replaceAll("\\bRichards\\b","Praveen")
Mind the double slashes!
Also, according to the String::replaceAll documentation:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll. Use Matcher.quoteReplacement(java.lang.String) to suppress the special meaning of these characters, if desired.

Related

Python 3 re.sub returns unprintable characters instead of string

I'm trying to convert dates in dd/mm/yy format to dd/mm/20yy format using re.sub and capture groups.
date = "25/11/20"
fixed_date = re.sub(r"(\d\d/\d\d/)(\d\d)", r"\120\2", date)
However, even though my regex seems to work on regex101.com, Python returns an imprintable character.
fixed_date
Out[42]: 'P20'
How can I get my string? In this case, it would be "25/11/2020"
Edit: date is actually a string
Do
fixed_date = re.sub(r"(\d\d\/\d\d\/)(\d\d)", r"\g<1>20\g<2>", date)
From re docs:
In string-type repl arguments, in addition to the character escapes and backreferences described above, \g<name> will use the substring matched by the group named name, as defined by the (?P<name>...) syntax. \g<number> uses the corresponding group number; \g<2> is therefore equivalent to \2, but isn’t ambiguous in a replacement such as \g<2>0. \20 would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character '0'. The backreference \g<0> substitutes in the entire substring matched by the RE.

python Using variable in re.search source.error("bad escape %s" % escape, len(escape)) [duplicate]

I want to use input from a user as a regex pattern for a search over some text. It works, but how I can handle cases where user puts characters that have meaning in regex?
For example, the user wants to search for Word (s): regex engine will take the (s) as a group. I want it to treat it like a string "(s)" . I can run replace on user input and replace the ( with \( and the ) with \) but the problem is I will need to do replace for every possible regex symbol.
Do you know some better way ?
Use the re.escape() function for this:
4.2.3 re Module Contents
escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
A simplistic example, search any occurence of the provided string optionally followed by 's', and return the match object.
def simplistic_plural(word, text):
word_or_plural = re.escape(word) + 's?'
return re.match(word_or_plural, text)
You can use re.escape():
re.escape(string)
Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.
>>> import re
>>> re.escape('^a.*$')
'\\^a\\.\\*\\$'
If you are using a Python version < 3.7, this will escape non-alphanumerics that are not part of regular expression syntax as well.
If you are using a Python version < 3.7 but >= 3.3, this will escape non-alphanumerics that are not part of regular expression syntax, except for specifically underscore (_).
Unfortunately, re.escape() is not suited for the replacement string:
>>> re.sub('a', re.escape('_'), 'aa')
'\\_\\_'
A solution is to put the replacement in a lambda:
>>> re.sub('a', lambda _: '_', 'aa')
'__'
because the return value of the lambda is treated by re.sub() as a literal string.
Usually escaping the string that you feed into a regex is such that the regex considers those characters literally. Remember usually you type strings into your compuer and the computer insert the specific characters. When you see in your editor \n it's not really a new line until the parser decides it is. It's two characters. Once you pass it through python's print will display it and thus parse it as a new a line but in the text you see in the editor it's likely just the char for backslash followed by n. If you do \r"\n" then python will always interpret it as the raw thing you typed in (as far as I understand). To complicate things further there is another syntax/grammar going on with regexes. The regex parser will interpret the strings it's receives differently than python's print would. I believe this is why we are recommended to pass raw strings like r"(\n+) -- so that the regex receives what you actually typed. However, the regex will receive a parenthesis and won't match it as a literal parenthesis unless you tell it to explicitly using the regex's own syntax rules. For that you need r"(\fun \( x : nat \) :)" here the first parens won't be matched since it's a capture group due to lack of backslashes but the second one will be matched as literal parens.
Thus we usually do re.escape(regex) to escape things we want to be interpreted literally i.e. things that would be usually ignored by the regex paraser e.g. parens, spaces etc. will be escaped. e.g. code I have in my app:
# escapes non-alphanumeric to help match arbitrary literal string, I think the reason this is here is to help differentiate the things escaped from the regex we are inserting in the next line and the literal things we wanted escaped.
__ppt = re.escape(_ppt) # used for e.g. parenthesis ( are not interpreted as was to group this but literally
e.g. see these strings:
_ppt
Out[4]: '(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
__ppt
Out[5]: '\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
print(rf'{_ppt=}')
_ppt='(let H : forall x : bool, negb (negb x) = x := fun x : bool =>HEREinHERE)'
print(rf'{__ppt=}')
__ppt='\\(let\\ H\\ :\\ forall\\ x\\ :\\ bool,\\ negb\\ \\(negb\\ x\\)\\ =\\ x\\ :=\\ fun\\ x\\ :\\ bool\\ =>HEREinHERE\\)'
the double backslashes I believe are there so that the regex receives a literal backslash.
btw, I am surprised it printed double backslashes instead of a single one. If anyone can comment on that it would be appreciated. I'm also curious how to match literal backslashes now in the regex. I assume it's 4 backslashes but I honestly expected only 2 would have been needed due to the raw string r construct.

Lua -- match strings including non-letter classes

I'm trying to find exact matches of strings in Lua including, special characters. I want the example below to return that it is an exact match, but because of the - character it returns nil
index = string.find("test-string", "test-string")
returns nil
index = string.find("test-string", "test-")
returns 1
index = string.find("test-string", "test")
also returns 1
How can I get it to do full matching?
- is a pattern operator in a Lua string pattern, so when you say test-string, you're telling find() to match the string test as few times as possible. So what happens is it looks at test-string, sees test in there, and since - isn't an actual minus sign in this case, it's really looking for teststring.
Do as Mike has said and escape it with the % character.
I found this helpful for better understanding patterns.
You can also ask for a plain substring match that ignores magic characters:
string.find("test-string", "test-string",1,true)
you need to escape special characters in the pattern with the % character.
so in this case you are looking for
local index = string.find('test-string', 'test%-string')

How to match a part of string before a character into one variable and all after it into another

I have a problem with splitting string into two parts on special character.
For example:
12345#data
or
1234567#data
I have 5-7 characters in first part separated with "#" from second part, where are another data (characters,numbers, doesn't matter what)
I need to store two parts on each side of # in two variables:
x = 12345
y = data
without "#" character.
I was looking for some Lua string function like splitOn("#") or substring until character, but I haven't found that.
Use string.match and captures.
Try this:
s = "12345#data"
a,b = s:match("(.+)#(.+)")
print(a,b)
See this documentation:
First of all, although Lua does not have a split function is its standard library, it does have string.gmatch, which can be used instead of a split function in many cases. Unlike a split function, string.gmatch takes a pattern to match the non-delimiter text, instead of the delimiters themselves
It is easily achievable with the help of a negated character class with string.gmatch:
local example = "12345#data"
for i in string.gmatch(example, "[^#]+") do
print(i)
end
See IDEONE demo
The [^#]+ pattern matches one or more characters other than # (so, it "splits" a string with 1 character).

Meaning of $ in a string?

I came along this
__date__ = "$Date: 2011/06$"
and found this in the docs
$$ is an escape; it is replaced with a single $.
$identifier names a substitution placeholder matching a mapping key of "identifier". By default, "identifier" must spell a Python identifier. The first non-identifier character after the $ character terminates this placeholder specification.
${identifier} is equivalent to $identifier. It is required when valid identifier characters follow the placeholder but are not part of the placeholder, such as "${noun}ification".
but I don't understand it.
Could someone explain in plain english what's the $ for and give some examples preferably?
To Python, those dollar signs mean nothing at all. Just like the 'D' or 'a' that follow, the dollar sign is merely a character in a string.
To your source-code control system, the dollar signs indicate a substitution command. When you check out a new copy of your source code, that string is replaced with the timestamp of the last committed change to the file.
Reference:
http://svnbook.red-bean.com/en/1.6/svn.advanced.props.special.keywords.html
http://www.badgertronics.com/writings/cvs/keywords.html
This has been used in the context of string replace. For ex, if you have scenario with a variable which takes different value in same string, you can use this as follows:
import string
mytext = "$dog is an animal"
replaceDogtoCat = {"dog":"cat"}
mytemplate = string.Template(mytext)
print mytemplate.substitute(replaceDogtoCat) #output: cat is an animal
replaceDogtoGoat = {"dog":"goat"}
print mytemplate.substitute(replaceDogtoGoat) #output: goat is an animal
$dog is a variable which would get replaced when substitute gets executed

Resources