Python - Removing parentheses and quotation marks - python-3.x

I've been trying to remove paired parentheses (including the text between them), unbalanced parentheses, and quotation marks from a string.
What I've done so far:
import re
sample_text = '""sads"add"sfsfdsfds()()(0sefdAAAsfs)dasdad(asd'
res = re.sub(r'\([^)]*\)', '', sample_text)
It matches only the ()()(0sefdAAAsfs) part of the text. The unbalanced parenthesis and the quotation marks are left unmatched. What can be done to improve the above regex?

A regular expression isn't really the right tool for this job. Having said that, you can use the following pattern to match an opening paren, then zero or more non-paren characters, and a closing paren:
\([^()]*\)
Substitute a " or ' or [ or whatever for the other types of matching components.
But again, this would not work with something like this:
(asdf))))))))
Long story short: it's not a problem that a regular expression is capable of solving. Try testing it out here: https://regex101.com/r/bdiK5W/2.
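For what it's worth, a small loop around the regex gets further than a single pattern. The sketch below is only one possible reading of the question: it assumes that a stray, unbalanced parenthesis should be deleted as a character while the text after it is kept, and the function name strip_parens_and_quotes is made up for illustration.

```python
import re

def strip_parens_and_quotes(text: str) -> str:
    """Remove balanced (...) groups, then stray parentheses and quotes."""
    # Delete innermost "(...)" pairs repeatedly so nested groups collapse too.
    innermost_pair = re.compile(r'\([^()]*\)')
    while True:
        text, count = innermost_pair.subn('', text)
        if count == 0:
            break
    # Assumption: any parenthesis still present is unbalanced, so drop just
    # that character, together with single and double quotation marks.
    return re.sub(r'[()"\']', '', text)

sample_text = '""sads"add"sfsfdsfds()()(0sefdAAAsfs)dasdad(asd'
print(strip_parens_and_quotes(sample_text))
# sadsaddsfsfdsfdsdasdadasd
```

The same function reduces (asdf)))))))) to an empty string, because the balanced pair goes first and the leftover closing parentheses are then removed one by one.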

Related

Regex remove both apostrophes if they exist in Python

I’m quite new to Regex but almost finished with my text mining script. Only one thing fails: I’m trying to remove the apostrophes around a word if they exist. I’m using re.sub for this.
For instance:
‘Apple’ needs to be Apple
‘apple’ needs to be apple
‘[apple]’ needs to be [apple]
‘(apple)’ needs to be (apple)
However: Apple’s needs to stay Apple’s because there is only one apostrophe.
How do I select both apostrophes when there is a word in between so I can delete them with re.sub? In every try I remove the entire string! Hopefully someone can help.
My code is as follows:
str_o='\'Apple\''
str_o_a = re.sub(r"\'(.*?)\'","", str_o)
I have a simpler idea: split by whitespace, trim leading and trailing apostrophes, join with whitespace. Avoids having to write a regular expression and handles sentences such as "She's 'her' mother's daughter".
text = "She's 'her' mother's daughter"
text = ' '.join([word.strip("'") for word in text.split()])
print(text)
# She's her mother's daughter
The purpose of the parentheses in your regular expression was probably to capture the string you want to keep. The idiom looks like
str_o_a = re.sub(r"'([^']*)'", r"\1", str_o)
You want a raw string around the replacement, too, in order to preserve the backslash in the argument (otherwise you would be replacing with the literal string "\x01").
Notice also the preference for using a negated character class over a non-greedy "match anything" wildcard.
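One caveat with the capture-group pattern is that it also pairs up apostrophes that belong to different words in a longer sentence. A possible refinement, sketched here with a hypothetical helper named unquote, is to require that the opening apostrophe is not preceded by a word character and the closing one is not followed by one. This is an assumption about what counts as a "quoting" apostrophe, not a general solution.

```python
import re

# Opening apostrophe must not follow a word character; closing apostrophe
# must not be followed by one. Apostrophes inside words are left alone.
QUOTED = re.compile(r"(?<!\w)'([^']*)'(?!\w)")

def unquote(text: str) -> str:
    """Drop a pair of apostrophes that wraps a token, keeping the token."""
    return QUOTED.sub(r"\1", text)

for sample in ["'Apple'", "'[apple]'", "Apple's", "She's 'her' mother's daughter"]:
    print(unquote(sample))
# Apple
# [apple]
# Apple's
# She's her mother's daughter
```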

How to capture a string between parentheses?

str = "fa, (captured)[asd] asf, 31"
for word in str:gmatch("\(%a+\)") do
print(word)
end
Hi! I want to capture a word between parentheses.
My code should print the string "captured".
lua: /home/casey/Desktop/test.lua:3: invalid escape sequence near '\('
Instead I got this syntax error.
Of course, I could just find the positions of the parentheses and use the string.sub function,
but I prefer simple code.
Also, brackets gave me a similar error.
The escape character in Lua patterns is %, not \. So use this:
word=str:match("%((%a+)%)")
If you only need one match, there is no need for a gmatch loop.
To capture the string in square brackets, use a similar pattern:
word=str:match("%[(%a+)%]")
If the captured string is not entirely composed of letters, use .- instead of %a+.
lhf's answer likely gives you what you need, but I'd like to mention one more option that I feel is underused and may work for you as well. One issue with using %((%a+)%) is that it doesn't work for nested parentheses: if you apply it to something like "(text(more)text)", you'll get "more" even though you may expect "text(more)text". Note that you can't fix it by asking to match to the first closing parenthesis (%(([^%)]+)%)) as it will give you "text(more".
However, you can use %bxy pattern item, which balances x and y occurrences and will return (text(more)text) in this case (you'd need to use something like (%b()) to capture it). Again, this may be overkill for your case, but useful to keep in mind and may help someone else who comes across this problem.
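For readers coming from Python, whose re module has no counterpart to %b, the balancing idea can be sketched with a depth counter. The function first_balanced is a made-up name, and unlike %b() it gives up if the first opening delimiter is never closed instead of retrying from a later position.

```python
from typing import Optional

def first_balanced(text: str, open_ch: str = "(", close_ch: str = ")") -> Optional[str]:
    """Return the first balanced open_ch...close_ch span, delimiters included."""
    start = text.find(open_ch)
    if start == -1:
        return None
    depth = 0
    for i in range(start, len(text)):
        if text[i] == open_ch:
            depth += 1
        elif text[i] == close_ch:
            depth -= 1
            if depth == 0:
                # The delimiter opened at `start` is closed here.
                return text[start:i + 1]
    return None  # the first opener was never closed

print(first_balanced("(text(more)text)"))
# (text(more)text)
print(first_balanced("fa, (captured)[asd] asf, 31", "[", "]"))
# [asd]
```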

vim search and replace between number

I have a pattern where there are double-quotes between numbers in a CSV file.
I can search for the pattern with [0-9]\"[0-9], but how do I retain the digits while removing the double quote? The CSV format is like this:
"1234"5678","Text1","Text2"
"987654321","Text3","text4"
"7812891"3","Text5","Text6"
As you may notice there are double quotes between some numbers which I want to remove.
I have tried the following way, which is incorrect:
:%s/[0-9]\"[0-9]/[0-9][0-9]/g
Is it possible to execute a command at every match of the search pattern, for example to go one character forward and delete it? How can "lx" be embedded in search and replace?
You need to capture groups. Try:
:%s/\(\d\)"\(\d\)/\1\2/g
[A digit can also be denoted by \d.]
I know that this question has been answered already, but here's another approach:
:%s/\d\zs"\ze\d
Explanation:
%s   Substitute for the whole buffer
\d   look up for a digit
\zs set the start of match here
"     look up for a double-quote
\ze set the end of match here
\d   look up for a digit
That makes the substitute command to match only the double-quote surrounded by digits.
Omitting the replacement string just deletes the match.
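Outside Vim, the \zs and \ze trick corresponds roughly to lookbehind and lookahead assertions. A minimal Python sketch on the sample rows from the question (the variable names are illustrative only):

```python
import re

rows = [
    '"1234"5678","Text1","Text2"',
    '"987654321","Text3","text4"',
    '"7812891"3","Text5","Text6"',
]

# Match only a double quote that has a digit on both sides; the digits are
# asserted by the lookarounds but are not part of the match, so they survive.
quote_between_digits = re.compile(r'(?<=\d)"(?=\d)')

cleaned = [quote_between_digits.sub('', row) for row in rows]
for row in cleaned:
    print(row)
# "12345678","Text1","Text2"
# "987654321","Text3","text4"
# "78128913","Text5","Text6"
```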
You need capture groups in the regular expression so the digits on either side can be put back.
Try this:
:%s/\([0-9]\)"\([0-9]\)/\1\2/g
A bit naive solution:
%s/^"/BEGINNING OF LINE QUOTE MARK/g
%s/\",\"/quote comma quote/g
%s/\"$/quota end of line/g
%s/\"//g
%s/quota end of line/"/g
%s/quote comma quote/","/g
%s/BEGINNING OF LINE QUOTE MARK/"/g
A macro can be created quite easily out of it and invoked as many times as needed.

Ignore escape characters (backslashes) in R strings

While running an R-plugin in SPSS, I receive a Windows path string as input e.g.
'C:\Users\mhermans\somefile.csv'
I would like to use that path in subsequent R code, but then the slashes need to be replaced with forward slashes, otherwise R interprets it as escapes (eg. "\U used without hex digits" errors).
However, I have not been able to find a function that can replace the backslashes with forward slashes or double-escape them. All those functions assume those characters are already escaped.
So, is there something along the lines of:
>gsub('\\', '/', 'C:\Users\mhermans')
C:/Users/mhermans
You can try to use the 'allowEscapes' argument in scan()
X=scan(what="character",allowEscapes=F)
C:\Users\mhermans\somefile.csv
print(X)
[1] "C:\\Users\\mhermans\\somefile.csv"
As of version 4.0, introduced in April 2020, R provides a syntax for specifying raw strings. The string in the example can be written as:
path <- r"(C:\Users\mhermans\somefile.csv)"
From ?Quotes:
Raw character constants are also available using a syntax similar to the one used in C++: r"(...)" with ... any character sequence, except that it must not contain the closing sequence )". The delimiter pairs [] and {} can also be used, and R can be used in place of r. For additional flexibility, a number of dashes can be placed between the opening quote and the opening delimiter, as long as the same number of dashes appear between the closing delimiter and the closing quote.
First you need to get it assigned to a name:
pathname <- 'C:\\Users\\mhermans\\somefile.csv'
Notice that in order to get it into a named variable you needed to double them all, which gives a hint about how you could use regex. Actually, if you read it in from a text file, then R will do all the doubling for you. Mind you, it is not really doubling the backslashes: each one is stored as a single backslash, but it is displayed doubled and needs to be typed doubled at the console. Otherwise the R interpreter tries (and often fails) to turn it into a special character. To compound the problem, regex uses the backslash as an escape as well, so to detect a backslash with grep, sub, or gsub you need to quadruple the backslashes:
gsub("\\\\", "/", pathname)
# [1] "C:/Users/mhermans/somefile.csv"
You needed to doubly "double" the backslashes. The first of each couple of \'s is to signal to the grep machine that what next comes is a literal.
Consider:
nchar("\\A")
# returns `[1] 2`
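The same two layers of escaping exist in other languages. As a rough analogy only (this is Python, not R, and the variable names are made up), a raw string literal avoids the doubling in the source code, while the regex engine still wants its own escaped backslash:

```python
import re

# Raw literal: backslashes are taken as-is, much like R's r"(...)" syntax.
raw_path = r'C:\Users\mhermans\somefile.csv'

# "\\A" is two characters (a backslash and "A"), the counterpart of nchar("\\A").
print(len("\\A"))
# 2

# Plain string replacement needs one escaped backslash in a normal literal.
forward = raw_path.replace("\\", "/")

# The regex needs the backslash escaped again; a raw literal keeps it at two.
also_forward = re.sub(r"\\", "/", raw_path)

print(forward)
# C:/Users/mhermans/somefile.csv
```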
If file E:\Data\junk.txt contains the following text (without quotes): C:\Users\mhermans\somefile.csv
You may get a warning with the following statement, but it will work:
texinp <- readLines("E:\\Data\\junk.txt")
If file E:\Data\junk.txt contains the following text (with quotes): "C:\Users\mhermans\somefile.csv"
The above readLines statement might also give you a warning, but texinp will now contain:
"\"C:\\Users\\mhermans\\somefile.csv\""
So, to get what you want, make sure there aren't quotes in the incoming file, and use:
texinp <- suppressWarnings(readLines("E:\\Data\\junk.txt"))

Replacing quote marks around strings in Vim?

I have something akin to <Foobar Name='Hello There'/> and need to change the single quotation marks to double quotation marks. I tried :s/\'.*\'/\"\0\" but it ended up producing <Foobar Name="'Hello There'"/>. Replacing the \0 with \1 only produced a blank string inside the double quotes - is there some special syntax I'm missing that I need to make only the found string ("Hello There") inside the quotation marks assign to \1?
There's also surround.vim, if you're looking to do this fairly often. You'd use cs'" to change surrounding quotes.
You need to use groupings:
:s/\'\(.*\)\'/\"\1\"
This way argument 1 (ie, \1) will correspond to whatever is delimited by \( and \).
%s/'\([^']*\)'/"\1"/g
You will want to use [^']* instead of .* otherwise
'apples' are 'red' would get converted to "apples' are 'red"
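The difference between the greedy and the negated-class pattern is easy to check outside Vim as well. A short Python sketch of the same substitution (Python's re syntax is used here, so the groups are written without backslashes):

```python
import re

text = "'apples' are 'red'"

# Greedy .* runs from the first apostrophe to the last one on the line.
greedy = re.sub(r"'(.*)'", r'"\1"', text)

# [^']* cannot cross an apostrophe, so each quoted word is handled separately.
negated = re.sub(r"'([^']*)'", r'"\1"', text)

print(greedy)
# "apples' are 'red"
print(negated)
# "apples" are "red"
```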
Unless I'm missing something, wouldn't s/\'/"/g work?
Just an FYI - to replace all double quotes with single, this is the correct regexp - based on rayd09's example above
:%s/"\([^"]*\)"/'\1'/g
You need to put round brackets around the part of the expression you wish to capture.
s/\'\(.*\)\'/"\1"/
But, you might have problems with unintentional matching. Might you be able to simply replace any single quotes with double quotes in your file?
You've got the right idea -- you want to have "\1" as your replace clause, but you need to put the "Hello There" part in capture group 1 first (0 is the entire match). Try:
:%s/'\(.*\)'/"\1"/
Press Shift + V to enter visual line mode. Highlight the lines of code you want to remove single quotes from.
Then hit : on keyboard
Then type
s/'//g
Press Enter.
Done. You win.
Presuming you want to do this on an entire file ...
Normal mode:
ggvG$ [SHIFT+:]
Ex (command-line) mode:
'<,'>s/'/"/g [RET]
