Removing special characters from a string In a Groovy Script - groovy

I am looking to remove special characters from a string using Groovy. I'm nearly there, but my code also removes the whitespace that is already in place, which I want to keep. I only want to remove the special characters (without leaving whitespace in their place). I am running the code below on the PostCode L&65$$ OBH:
def removespecialpostcodce = PostCode.replaceAll("[^a-zA-Z0-9]+","")
log.info removespecialpostcodce
Currently it returns L65OBH but I am looking for it to return L65 OBH
Can anyone help?

Add a space to the negated character class so that spaces are kept. Use:
PostCode.replaceAll("[^a-zA-Z0-9 ]+","")
instead of
PostCode.replaceAll("[^a-zA-Z0-9]+","")

To remove all special characters from a String, you can use a negated character class ([^...]), which matches every character not listed in it:
String str = "..\\.-._./-^+* ".replaceAll("[^A-Za-z0-9]","");
System.out.println("str: <"+str+">");
Output:
str: <>
To keep the spaces in the text, add a space to the character class:
String str = "..\\.-._./-^+* ".replaceAll("[^A-Za-z0-9 ]","");
System.out.println("str: <"+str+">");
Output:
str: < >
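The same negated character class behaves the same way in other regex engines. As a minimal sketch in Python (not Groovy), applied to the postcode from the question; the function name strip_special is hypothetical and only used for illustration:

```python
import re

def strip_special(text: str) -> str:
    """Remove every character that is not an ASCII letter, digit, or space."""
    # The space inside the brackets is what keeps existing spaces intact.
    return re.sub(r"[^A-Za-z0-9 ]+", "", text)

print(strip_special("L&65$$ OBH"))  # L65 OBH
```

Note that this class is ASCII-only, so accented letters would be removed as well.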

Related

Python - how to find string and remove string plus next x characters

I have the following string:
mystr = '(string_to_delete_20221012_11-36) keep this (string_to_delete_20221016_22-22) keep this (string_to_delete_20221017_20-55) keep this'
I wish to delete all the entries (string_to_deletexxxxxxxxxxxxxxx) (including the trailing space)
I sort of need pseudo code as follows:
If you find a string (string_to_delete then replace that string and the timestamp, closing parenthesis and trailing space with null e.g. delete the string (string_to_delete_20221012_11-36)
I would use a list comprehension but given that not all strings are contained inside parenthesis I cannot see what I could use to create the list via a string.split().
Is this something that needs regular expressions?
It seemed like a good place to use a regex:
import re
pattern = r'\(string_to_delete_.*?\)\s*'
mystr = '(string_to_delete_20221012_11-36) keep this (string_to_delete_20221016_22-22) keep this (string_to_delete_20221017_20-55) keep this'
for match in re.findall(pattern, mystr):
    mystr = mystr.replace(match, '', 1)  # replace the first occurrence of the matched text with an empty string
print(mystr)
This results in:
>> keep this keep this keep this
Brief regex breakdown of \(string_to_delete_.*?\)\s*:
\( matches a literal left parenthesis (the escape is needed)
string_to_delete_ matches that literal text
.*? matches as few characters as possible (non-greedy), here the timestamp
\) matches the closing parenthesis
\s* matches zero or more whitespace characters after it
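The pattern broken down above can also be applied in one pass with re.sub, which avoids the findall-then-replace loop. This is a minimal sketch using the same pattern and sample string; the name drop_markers is hypothetical:

```python
import re

# Same pattern as above, compiled once for reuse.
PATTERN = re.compile(r"\(string_to_delete_.*?\)\s*")

def drop_markers(text: str) -> str:
    """Delete every '(string_to_delete_...)' marker and the whitespace after it."""
    return PATTERN.sub("", text)

mystr = (
    "(string_to_delete_20221012_11-36) keep this "
    "(string_to_delete_20221016_22-22) keep this "
    "(string_to_delete_20221017_20-55) keep this"
)
print(drop_markers(mystr))  # keep this keep this keep this
```

Text that contains no marker is returned unchanged, so the function is safe to call on every line of a file.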

Apex - remove special characters from a string except for "+"

In Apex, I want to remove all the special characters in a string except for "+". This string is actually a phone number. I have done the following.
String sampleText = '+44 597/58-31-30';
sampleText = sampleText.replaceAll('\\D','');
System.debug(sampleText);
So, what it prints is 44597583130.
But I want to keep the + sign, as it represents 00.
Can someone help me with this?
Possible solutions:
String sampleText = '+44 597/58-31-30';
// remove everything except the characters you want to keep (+ and digits)
System.debug(sampleText.replaceAll('[^+\\d]',''));
// explicitly list each character that must be removed
System.debug(sampleText.replaceAll('/|-| ',''));
The output will be the same in both cases:
|DEBUG| +44597583130
|DEBUG| +44597583130
Edit: to keep only a leading + and remove any later ones:
String sampleText = '+0032 +497/+59-31-40';
System.debug(sampleText.replaceAll('(?!^\\+)[^\\d]',''));
|DEBUG|+0032497593140
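The lookahead pattern is not specific to Apex. As a hedged sketch, here is the same idea in Python (not Apex), where the regex needs only single backslashes in a raw string; the function name normalize_phone is hypothetical:

```python
import re

def normalize_phone(raw: str) -> str:
    """Strip every non-digit except a '+' at the very start of the string."""
    # (?!^\+) blocks a match on a '+' at position 0;
    # \D then matches any other non-digit, including later '+' signs.
    return re.sub(r"(?!^\+)\D", "", raw)

print(normalize_phone("+44 597/58-31-30"))      # +44597583130
print(normalize_phone("+0032 +497/+59-31-40"))  # +0032497593140
```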

re.sub replacing string using original sub-string

I have a text file. I would like to remove all decimal points and the digits that follow them, unless the number is preceded by text.
e.g. 12.29,14.6,8967.334 should be replaced with 12,14,8967
e.g. happypants2.3#email.com should not be modified.
My code is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r',\d+[.]\d+', r'\d+',txt1)
print(txt1)
Unless there is an easier way of doing this, how do I modify r'\d+' so that it just returns the number without the decimal part?
You need to make use of groups in your regex. You put the digits before the '.' into parentheses, and then you can use '\1' to refer to them later:
txt1 = re.sub(r',(\d+)[.]\d+', r',\1',txt1)
Note that in your attempted replacement code you forgot to replace the comma, so your numbers would have been glommed together. This still isn't perfect though; the first number, since it doesn't begin with a comma, isn't processed.
Instead of checking for a comma, the better way is to check word boundaries, which can be done using \b. So the solution is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r'\b(\d+)[.]\d+\b', r'\1',txt1)
print(txt1)
Considering these are the only two types of string present in your file, you can explicitly check for these conditions.
This may not be the most efficient way, but what I have done is split the string on commas and check whether each piece contains #email.com. If that's true, I just append it to a new list unchanged. Otherwise, converting the piece to float and then to int truncates the decimal part.
If you want everything back in a single str variable, you can use .join().
Code:
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt_list = []
for i in txt1.split(','):
    if '#email.com' in i:
        txt_list.append(i)  # keep email-like entries unchanged
    else:
        txt_list.append(str(int(float(i))))  # truncate the decimal part
txt_new = ",".join(txt_list)
txt_new
Output:
'9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6'
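If the non-numeric entries do not always contain #email.com, a variation of the split approach is to attempt the conversion and leave anything that is not a number untouched. This is only a sketch under that assumption; the function name truncate_numbers is hypothetical:

```python
def truncate_numbers(csv_line: str) -> str:
    """Truncate the decimal part of numeric fields; leave other fields as-is."""
    out = []
    for field in csv_line.split(","):
        try:
            # float() raises ValueError for text such as 'morris1.43#email.com';
            # int() raises OverflowError for 'inf' and ValueError for 'nan'.
            out.append(str(int(float(field))))
        except (ValueError, OverflowError):
            out.append(field)
    return ",".join(out)

txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
print(truncate_numbers(txt1))
# 9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6
```

Be aware that float() also accepts forms such as 1e5, so a field in scientific notation would be rewritten too.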

How to replace ALL characters in a string with one character

Does anyone know a method that allows you to replace all the characters in a word with a single character?
If not, can anyone suggest a way to basically print _ (underscore) the number of times which is the length of the string itself without using any loops or ifs in the code?
mystring = '_'*len(mystring)
Of course, I'm guessing at the name of your string variable and the character that you want to use.
Or, if you just want to print it out, you can:
print('_'*len(mystring))
import re
text = "abcdefghi"
# '.' matches any character except a newline; pass flags=re.DOTALL to include newlines
print(re.sub('.', '_', text))

Groovy Multiline String Doesn't Recognize Lines with Only Whitespaces

I'm guessing this is a well known issue and there's an efficient workaround somehow.
I am getting output which has lines in it that contain a fixed number of empty spaces. I'm doing a string comparison test such as the one below as part of a unit test. Is there a way to get this to pass without modifying the strings using stripIndent() or the like?
Note: the test below is supposed to have 4 spaces in the seemingly empty line between testStart and testEnd in the multiline string. However, Stack Overflow may be stripping them.
String singleLine = 'testStart\n    \ntestEnd'
String multiLine =
'''
testStart
testEnd
'''
println singleLine
println multiLine
assert singleLine == multiLine
String singleLine = 'testStart\n    \ntestEnd'
String multiLine =
'''
testStart
(assume there are 4 spaces on this line)
testEnd
'''
println singleLine
println multiLine
assert singleLine == multiLine
That assertion is supposed to fail. The first character in singleLine is the character t from testStart. The first character in multiLine is a newline character because the String begins immediately after the opening ''' and the first character you have after that is a newline character. You have the same issue at the end of the string. You could solve that in a couple of ways:
String multiLine =
'''\
testStart
(assume there are 4 spaces on this line)
testEnd\
'''
Or:
String multiLine =
'''testStart
(assume there are 4 spaces on this line)
testEnd'''
This was caused by a default IntelliJ setting that strips trailing spaces. I have now resolved it.
http://blog.darrenscott.com/2015/01/24/intellij-idea-14-how-to-stop-stripping-of-trailing-spaces/
