Multiple variables in a Python regex - python-3.x

I have looked through several related posts and forums for an answer to my question, but found nothing that fits what I need.
I am trying to use variables instead of hard-coded values in a regex that searches for any one of several words in a line.
I do get the desired result if I don't use variables.
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub2/a2#dell>
<http://www.somesite.com/software/sub3/a3#Notepad>
re.search(r"\#Msoffice|#vlc|#Notepad", line)
This regex returns any line that contains #Msoffice OR #vlc OR #Notepad.
I tried defining a single variable using re.escape and that worked absolutely fine. However, I have tried many combinations using | and , (pipe and comma) without success.
Is there any way I can put #Msoffice, #vlc and #Notepad in separate variables so that I can change them later?
Thanks in advance!!

If I understood you correctly, you'd like to insert variables into your regex.
You are currently using a raw string (r' ') to make the regex more readable; an f-string (f' ') lets you insert any variable with {your_var}, so you can construct the regex as you like:
var1 = '#Msoffice'
var2 = '#vlc'
var3 = '#Notepad'
re.search(f'{var1}|{var2}|{var3}', line)
The most annoying issue is that in a non-raw string every backslash in the pattern has to be doubled, e.g. \\d instead of \d. Combining both prefixes as rf' ' avoids that.
Hope it helped
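When the keywords live in variables, it can be safer to escape them and join them with | rather than pasting them into the pattern by hand. The following is only a sketch of that idea; the keywords list and the matching variable are illustrative names, not part of the original question.

```python
import re

# Illustrative keyword variables; change these as needed.
keywords = ["#Msoffice", "#vlc", "#Notepad"]

# re.escape protects any regex metacharacters inside the keywords,
# and "|".join builds the alternation kw1|kw2|kw3.
pattern = re.compile("|".join(re.escape(k) for k in keywords))

lines = [
    "<http://www.somesite.com/software/sub/a1#Msoffice>",
    "<http://www.somesite.com/software/sub1/a1#vlc>",
    "<http://www.somesite.com/software/sub2/a2#dell>",
    "<http://www.somesite.com/software/sub3/a3#Notepad>",
]

matching = [line for line in lines if pattern.search(line)]
for line in matching:
    print(line)
```

Adding or removing a keyword then only means editing the list.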

import re
lines = ["<http://www.somesite.com/software/sub/a1#Msoffice>",
"<http://www.somesite.com/software/sub1/a1#vlc>",
"<http://www.somesite.com/software/sub2/a2#dell>",
"<http://www.somesite.com/software/sub3/a3#Notepad>"]
for line in lines:
    if re.search(r'\b(?:\#{}|\#{}|\#{})\b'.format('Msoffice', 'vlc', 'Notepad'), line):
        print(line)
Output :
<http://www.somesite.com/software/sub/a1#Msoffice>
<http://www.somesite.com/software/sub1/a1#vlc>
<http://www.somesite.com/software/sub3/a3#Notepad>

Related

How to parse a configuration file (kind of a CSV format) using Lua

I am using Lua on a small ESP8266 chip, trying to parse a text string that looks like the one below. I am very new to Lua and have tried many similar scripts found on this forum.
data="
-- String to be parsed\r\n
Tiempo1,20\r\n
Tiempo2a,900\r\n
Hora2b,27\r\n
Tiempo2b,20\r\n
Hora2c,29\r\n
Tiempo2c,18\r\n"
My goal would be to parse the string, and return all the configuration pairs (name/value).
If needed, I can modify the syntax of the config file because it is created by me.
I have been trying something like this:
var1,var2 = data:match("([Tiempo2a,]), ([^,]+)")
But it is returning nil,nil. I think I am going about this the wrong way.
Thank you very much for any help.
You need to use gmatch and capture the values without the control characters (\r\n) at the end of each line, or capture only the digits with %d+:
local data=[[
-- String to be parsed
Tiempo1,20
Tiempo2a,900
Hora2b,27
Tiempo2b,20
Hora2c,29
Tiempo2c,18]]
local t = {}
for k,v in data:gmatch("(%w-),([^%c]+)") do
    t[#t+1] = { k, v }
    print(k,v)
end

re.MULTILINE flag is interfering with the end of line $ operator

Sorry if this is a duplicate/basic question, I couldn't find any similar questions.
I have the following multiline string
my_txt = """
foo.exe\n
bar.exec\n
abab.exe\n
"""
(The newlines aren't actually written in my code, I put them there for clarity).
I want to match every file that ends with a .exe, (not .exec).
My regex was initially:
my_reg = re.compile(".+[.](?=exe$)")
my_matches = my_reg.finditer(my_txt)
I hoped that it would first find every character, go back until it found the ., and then check if the characters exe and a newline followed.
Only one match was found, and that was:
abab.exe.
I tried to mess around a bit, and changed the first line:
my_reg = re.compile(".+[.](?=exe$)",flags=re.MULTILINE).
This time, it successfully ran, returning
foo.
abab.
I thought re.MULTILINE wasn't supposed to interfere with the $ operator, or am I wrong about the $ operator/misusing something?
Thanks in advance!
You do need the multiline flag; otherwise $ only matches at the very end of your input (or just before a trailing newline there). You just need to match exe instead of using a lookahead:
my_reg = re.compile(".+[.]exe$", re.MULTILINE)
Output:
['foo.exe', 'abab.exe']
If you are trying to match the filename without the extension, you can put the period inside the lookahead:
my_reg = re.compile(r".+(?=\.exe$)", re.MULTILINE)
Output:
['foo', 'abab']
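To see the effect of the flag side by side, here is a small sketch that runs the patterns from this answer against the question's text. The variable names without_flag, with_flag and names_only are only illustrative.

```python
import re

# The text from the question, with real newlines.
my_txt = "\nfoo.exe\nbar.exec\nabab.exe\n"

pattern = r".+[.]exe$"

# Without re.MULTILINE, $ only matches at the end of the whole string
# (or just before its final newline).
without_flag = re.findall(pattern, my_txt)

# With re.MULTILINE, $ also matches at the end of every line.
with_flag = re.findall(pattern, my_txt, flags=re.MULTILINE)

# Lookahead variant: the extension is checked but not included in the match.
names_only = re.findall(r".+(?=\.exe$)", my_txt, flags=re.MULTILINE)

print(without_flag)
print(with_flag)
print(names_only)
```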

How to get the content after a string using regex in python

I have the following string:
A5697[2:10] = {ravi, rageev, raghav, smith};
I want the content after "A5697[2:10] =". So, my output should be:
{ravi, rageev, raghav, smith};
This is my code:
print(re.search(r'(?<=A\d+\[.*\] =\s).*', line).group())
But this gives an error:
sre_constants.error: look-behind requires fixed-width pattern
Can anyone help to solve this issue? I would prefer to use regex.
You can try re.sub, as below. Since you have given only one data point, I am assuming all the other data points follow a similar pattern.
import re
text = "A5697[2:10] = {ravi, rageev, raghav, smith}"
re.sub(r'(A\d+\[\d+:\d+\]\s+=\s+)(.+)', r'\2', text)
returns,
'{ravi, rageev, raghav, smith}'
re.sub substitutes the entire match of the regex with the second capturing group, which captures everything after '= '.
Simply replace the bits you don't want:
print(re.sub(r'A\d[^=]*= *','',line))
See demo here: https://rextester.com/NSG17655
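Since the question prefers re.search, another option is to drop the look-behind altogether and use a capturing group, which has no fixed-width restriction. This is a minimal sketch; the exact prefix pattern is an assumption based on the single sample line.

```python
import re

line = "A5697[2:10] = {ravi, rageev, raghav, smith};"

# Match the variable-width prefix normally and capture only what follows it.
match = re.search(r"A\d+\[.*?\]\s*=\s*(.*)", line)
content = match.group(1) if match else None
print(content)
```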

print(f"...:")-statement too long - break it into multiple lines without messing up the format

I have a console program with formatted output. To always get the same printout length, I use a rather complex formatted print statement.
print(f"\n{WHITE_BG}{64*'-'}")
print(f"\nDirektvergleich{9*' '}{RED}{players[0].name}{4*' '}{GREEN}vs.{4*' '}{RED}{players[1].name}{CLEAR}\n")
print(f"""{15*'~'}{' '}{YELLOW}Gesamt{CLEAR}:{' '}{players[0].name}{' '}{GREEN}{int(player1_direct_wins)}{(int(4-len(player1_direct_wins)))*' '}-{(int(4-len(player1_direct_losses)))*' '}{int(player1_direct_losses)}{CLEAR}{' '}{players[1].name}{' '}{(28-len(players[0].name)-len(players[1].name))*'~'}\n""")
print(f"""{15*'~'}{' '}{YELLOW}Trend{CLEAR}:{' '}{players[0].name}{' '}{GREEN}{int(player1_trend_wins)}{(int(4-len(player1_trend_wins)))*' '}-{(int(4-len(player1_trend_losses)))*' '}{int(player1_trend_losses)}{CLEAR}{' '}{players[1].name}{' '}{(28-len(players[0].name)-len(players[1].name))*'~'}""")
print(f"\n{WHITE_BG}{64*'-'}")
This leads to the following output in my windows cmd
For readability, I tried to split the print over multiple lines, and found on Stack Overflow the idea of using triple quotes. But when I break the print(f"...") statement in the middle, I mess up my formatting.
Example:
print(f"\n{WHITE_BG}{64*'-'}") # feed in as a string?!
print(f"\nDirektvergleich{9*' '}{RED}{players[0].name}{4*' '}{GREEN}vs.{4*' '}{RED}{players[1].name}{CLEAR}\n")
print(f"""{15*'~'}{' '}{YELLOW}Gesamt{CLEAR}:{' '}{players[0].name}{' '}{GREEN}{int(player1_direct_wins)}{(int(4-len(player1_direct_wins)))*' '}-
{(int(4-len(player1_direct_losses)))*' '}{int(player1_direct_losses)}{CLEAR}{' '}{players[1].name}{' '}{(28-len(players[0].name)-len(players[1].name))*'~'}\n""")
print(f"""{15*'~'}{' '}{YELLOW}Trend{CLEAR}:{' '}{players[0].name}{' '}{GREEN}{int(player1_trend_wins)}{(int(4-len(player1_trend_wins)))*' '}-
{(int(4-len(player1_trend_losses)))*' '}{int(player1_trend_losses)}{CLEAR}{' '}{players[1].name}{' '}{(28-len(players[0].name)-len(players[1].name))*'~'}""")
print(f"\n{WHITE_BG}{64*'-'}")
leads to...
Can anyone point me in the right direction on how to get the output formatted as shown, but without these absurdly long lines?
Thank you guys in advance!
Triple quoted strings preserve newline characters, so they are indeed not what you want here. Now when it finds two adjacent strings, the Python parser automagically concatenates them into a single string, i.e.:
s = "foo" "bar"
is equivalent to
s = "foobar"
And this works if you put your strings within parens:
s = ("foo" "bar")
in which case you can put each string on its own line as well:
s = (
    "foo"
    "bar"
)
This also applies to f-strings, so what you want is something like:
print(
    # Each literal must hold complete {...} fields; an expression cannot span two literals.
    f"{15*'~'}{' '}{YELLOW}Gesamt{CLEAR}:{' '}{players[0].name}{' '}{GREEN}"
    f"{int(player1_direct_wins)}{(int(4-len(player1_direct_wins)))*' '}-"
    f"{(int(4-len(player1_direct_losses)))*' '}{int(player1_direct_losses)}"
    f"{CLEAR}{' '}{players[1].name}{' '}"
    f"{(28-len(players[0].name)-len(players[1].name))*'~'}\n"
)
That being said, I'd rather use intermediate variables than try to cram such complex expressions into an f-string.
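As a rough illustration of the intermediate-variable approach, the sketch below builds one score line from named parts and uses format specs (<4 and >4) for the padding. The function name direct_line, its parameters and the sample values are hypothetical, and the colour codes from the question are left out to keep it short.

```python
def direct_line(label, name1, name2, wins, losses, width=28):
    """Build one fixed-width score line from named parts (hypothetical helper)."""
    # Left-align wins and right-align losses in 4 columns each.
    score = f"{wins:<4}-{losses:>4}"
    # Fill the rest of the line so that every line has the same length.
    filler = "~" * (width - len(name1) - len(name2))
    prefix = "~" * 15
    return f"{prefix} {label}: {name1} {score} {name2} {filler}"


# Sample values, for illustration only.
line = direct_line("Gesamt", "Anna", "Ben", 3, 12)
print(line)
```

Each part can then be checked or changed on its own, and no source line gets long.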

An Elegant Solution to Python's Multiline String?

I was trying to log the completion of a scheduled event I set up to run on Django. To keep my code presentable, instead of putting the string on a single line, I used a multiline string for the logger output inside a management Command class method. Example code:
# the usual imports...
# ....
import textwrap
logger = logging.getLogger(__name__)
class Command(BaseCommand):
    def handle(self, *args, **kwargs):
        # some codes here
        # ....
        final_statement = f'''\
            this is the final statements \
            with multiline string to have \
            a neater code.'''
        dedented_text = textwrap.dedent(final_statement)
        logger.info(dedented_text.replace('  ',''))
I have tried a few methods I found; however, most quick and easy methods still left big chunks of spaces in the terminal output, as shown here:
this is the final statement             with multiline string to have             a neater code.
So I came up with a creative solution to my problem, using:
dedented_text.replace('  ','')
This replaces every pair of spaces with nothing, so the single spaces between words are kept. It finally produced:
this is the final statement with multiline string to have a neater code.
Is this an elegant solution, or did I miss something on the internet?
You could use a regex to replace each newline, together with the whitespace after it, by a single space. Additionally, wrapping it into a function leads to less repetitive code, so let's do that.
import re
def single_line(string):
    # Collapse every line break and the indentation after it into one space.
    return re.sub(r"\n\s+", " ", string).strip()
final_statement = single_line(f'''
    this is the final statements
    with multiline string to have
    a neater code.''')
print(final_statement)
Alternatively, if you wish to avoid this particular problem (and don't mind the development overhead), you could store the strings in a file, such as JSON, so you can quickly edit prompts while keeping your code clean.
Thanks to Neil's suggestion, I have come up with a more elegant solution: a function that replaces each pair of spaces with nothing.
def single_line(string):
    return string.replace('  ','')
final_statement = '''\
    this is a much neater
    final statement
    to present my code
    '''
print(single_line(final_statement))
Building on Neil's solution, I have dropped the regex import. That's one line less of code!
Also, making it a function improves readability, as the whole print statement reads like English: "Print single line final statement".
Any better idea?
The issue with both Neil’s and Wong Siwei’s answers is they don’t work if your multiline string contains lines more indented than others:
my_string = """\
  this is my
  string and
    it has various
    indentation
  levels"""
What you want in the case above is to remove the two-space indentation, not every space at the beginning of a line.
The solution below should work in all cases:
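import re
def dedent(s):
    indent_level = None
    # re.MULTILINE lets ^ match at the start of every line, not only of the string.
    # The lookahead skips blank lines so they do not count as zero indentation.
    for m in re.finditer(r"^ *(?=\S)", s, flags=re.MULTILINE):
        line_indent_level = len(m.group())
        if indent_level is None or indent_level > line_indent_level:
            indent_level = line_indent_level
    if not indent_level:
        return s
    # Remove exactly that many spaces from the start of each line, keeping the newlines.
    return re.sub(r"^ {%s}" % indent_level, "", s, flags=re.MULTILINE)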
It first scans the whole string to find the lowest indentation level then uses that information to dedent all lines of it.
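The standard library's textwrap.dedent removes the whitespace common to all lines in much the same way, so it may be worth trying before writing a custom helper. A minimal sketch, using a sample string with mixed indentation; the single_line variable is only an illustrative extra step for collapsing the result onto one line.

```python
import textwrap

# Sample string with mixed indentation (illustrative values only).
my_string = """\
  this is my
  string and
    it has various
    indentation
  levels"""

# Removes only the indentation shared by every line (two spaces here).
dedented = textwrap.dedent(my_string)
print(dedented)

# Optionally collapse the result onto a single line, one space between parts.
single_line = " ".join(line.strip() for line in dedented.splitlines())
print(single_line)
```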
If you only care about making your code easier to read, you may instead use C-like string concatenation:
my_string = (
    "this is my string"
    " and I write it on"
    " multiple lines"
)
print(repr(my_string))
# => 'this is my string and I write it on multiple lines'
You may also want to make it explicit with +s:
my_string = "this is my string" + \
            " and I write it on" + \
            " multiple lines"
