python regex or something else


I have code that queries my network switches for interface errors. The output I get varies from time to time.
The output I get from the switches is in this format (the numbers change from time to time); a sample is included in the code below.
I want to print only the lines whose last number is greater than 0, like the line that starts with BAGG11.
My code looks like this:
import re
kobi = '''
BAGG11 13917779236 10133016 16491979 64
BAGG15 30841323485 22747672 19201545 0
BAGG16 811970 0 811970 0
'''
err = re.findall(r'[BAGG]', kobi)  # [BAGG] is a character class: it matches single letters B, A or G
print(err)

I think this can be done without regex.
Try this:
kobi = '''
BAGG11 13917779236 10133016 16491979 64
BAGG15 30841323485 22747672 19201545 0
BAGG16 811970 0 811970 0
'''
lst = kobi.split()
lines = [lst[i:i+5] for i in range(0, len(lst), 5)]
for line in lines:
    if int(line[-1]) > 0:
        print(' '.join(line))
Output:
BAGG11 13917779236 10133016 16491979 64
I've made some assumptions about your input:
each row always has five columns
the last column is a numerical value
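If the number of columns per row might vary, a variation that drops the five-column assumption is to split the text into lines first and look only at the last field of each line. This is a sketch, not part of the original answer; the function name lines_with_errors is hypothetical.

```python
kobi = '''
BAGG11 13917779236 10133016 16491979 64
BAGG15 30841323485 22747672 19201545 0
BAGG16 811970 0 811970 0
'''


def lines_with_errors(text):
    """Return the rows of text whose last column is an integer greater than 0."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and rows whose last column is not a plain non-negative integer.
        if fields and fields[-1].isdigit() and int(fields[-1]) > 0:
            result.append(line)
    return result


print(*lines_with_errors(kobi), sep="\n")
```

Because each line is handled on its own, a row with an extra or missing column no longer shifts every row after it.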

You might use a pattern to match BAGG and digits at the start and match a digit starting with 1-9 at the end.
^BAGG\d+[^\S\r\n].*[^\S\r\n][1-9]\d*$
If there should be 3 columns following, a bit more precise match could be using a quantifier {3} to match the number of "columns" in the middle.
^BAGG\d+(?:[^\S\r\n]+\d+){3}[^\S\r\n]+[1-9]\d*$
Explanation
^ Start of line
BAGG\d+ Match BAGG and 1+ digits
(?: Non capture group
[^\S\r\n]+\d+ Match 1+ whitespace chars without a newline followed by 1+ digits
){3} Close non capture group and repeat 3 times
[^\S\r\n]+ Match 1+ whitespace chars without newlines
[1-9]\d* Match a digit 1-9 followed by optional digits
$ End of line
For example
import re
kobi = '''
BAGG11 13917779236 10133016 16491979 64
BAGG15 30841323485 22747672 19201545 0
BAGG16 811970 0 811970 0
'''
err = re.findall (r'^BAGG\d+(?:[^\S\r\n]+\d+){3}[^\S\r\n]+[1-9]\d*$', kobi, re.MULTILINE)
print(err)
Output
['BAGG11 13917779236 10133016 16491979 64']
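If you need the interface name and the error counter as separate values rather than the whole line, one option is to put named groups around them and iterate with re.finditer. This sketch reuses the pattern above unchanged apart from the groups; the group names iface and errors are hypothetical.

```python
import re

kobi = '''
BAGG11 13917779236 10133016 16491979 64
BAGG15 30841323485 22747672 19201545 0
BAGG16 811970 0 811970 0
'''

# Same pattern as above, with named groups around the interface name and the last column.
pattern = re.compile(
    r"^(?P<iface>BAGG\d+)(?:[^\S\r\n]+\d+){3}[^\S\r\n]+(?P<errors>[1-9]\d*)$",
    re.MULTILINE,
)

# finditer yields Match objects, so each group can be read by its name.
results = [(m["iface"], int(m["errors"])) for m in pattern.finditer(kobi)]
for iface, errors in results:
    print(f"{iface}: {errors} errors")
```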

Related

Horizontal print of a complex string block

Once again I'm asking for your advice. I'm trying to print a complex string block; it should look like this:
  32         1      9999      523
+  8    - 3801    + 9999    -  49
----    ------    ------    -----
  40     -3800     19998      474
I wrote the function arrange_printer() to arrange the characters of a single problem in the correct format, so that it can be reused for every problem in the list. This is what my code looks like so far:
import operator
import sys
def arithmetic_arranger(problems, boolean: bool):
    arranged_problems = []
    if len(problems) <= 5:
        for num in range(len(problems)):
            arranged_problems += arrange_printer(problems[num], boolean)
    else:
        sys.exit("Error: Too many problems")
    return print(*arranged_problems, end=' ')
def arrange_printer(oper: str, boolean: bool):
    oper = oper.split()
    ops = {"+": operator.add, "-": operator.sub}
    a = int(oper[0])
    b = int(oper[2])
    if len(oper[0]) > len(oper[2]):
        size = len(oper[0])
    elif len(oper[0]) < len(oper[2]):
        size = len(oper[2])
    else:
        size = len(oper[0])
    line = '------'
    ope = ' %*i\n%s %*i\n%s' % (size,a,oper[1],size,b,'------'[0:size+2])
    try:
        res = ops[oper[1]](a,b)
    except:
        sys.exit("Error: Operator must be '+' or '-'.")
    if boolean == True:
        ope = '%s\n%*i' % (ope,size+2, res)
    return ope
arithmetic_arranger(['20 + 300', '1563 - 465 '], True)
#arrange_printer(' 20 + 334 ', True)
Sadly, I'm getting this format:
2 0
+ 3 0 0
- - - - -
3 2 0 1 5 6 3
- 4 6 5
- - - - - -
1 0 9 8
If I print the return value of arrange_printer() directly, as in the last (commented-out) line, the format is the one I want.
Any suggestions for improving my code or adopting good coding practices are welcome; I'm starting to get a feel for programming in Python.
Thank you for your help!

The first problem I see is that you use += to add an item to the arranged_problems list. Strings are iterable: somelist += someiterable iterates over someiterable and appends each of its elements to somelist. To add the string as a single item, use somelist.append().
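A minimal illustration of the difference, using the first line of the "20 + 300" problem:

```python
extended = []
extended += "  20"  # += extends the list with each character of the string
print(extended)  # [' ', ' ', '2', '0']

appended = []
appended.append("  20")  # append() adds the whole string as one element
print(appended)  # ['  20']
```

Passing the first list to print(*extended) is what spreads every character out, separated by spaces, in the question's output.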
Now once you fix this, it still won't work like you expect it to, because print() works by printing what you give it at the location of the cursor. Once you're on a new line, you can't go back to a previous line, because your cursor is already on the new line. Anything you print after that will go to the new line at the location of the cursor, so you need to arrange multiple problems such that their first lines all print first, then their second lines, and so on. Just fixing append(), you'd get this output:
  20
+ 300
-----
  320  1563
-  465
------
  1098
You get a string with \n denoting the start of the new line from each call to arrange_printer(). You can split this output into lines, and then process each row separately.
For example:
def arithmetic_arranger(problems, boolean:bool):
    arranged_problems = []
    if len(problems) > 5:
        print("Too many problems")
        return
    for problem in problems:
        # Arrange and split into individual lines
        lines = arrange_printer(problem, boolean).split('\n')
        # Append the list of lines to our main list
        arranged_problems.append(lines)
    # Now, arranged_problems contains one list for each problem.
    # Each list contains individual lines we want to print
    # Use zip() to iterate over all the lists inside arranged_problems simultaneously
    for problems_lines in zip(*arranged_problems):
        # problems_lines is e.g.
        # ('  20', ' 1563')
        # ('+ 300', '-  465') etc
        # Unpack this tuple and print it, separated by spaces.
        print(*problems_lines, sep=" ")
Which gives the output:
  20  1563
+ 300 -  465
----- ------
  320   1098
If you expect the problems to have different numbers of lines, you can use itertools.zip_longest() instead of zip().
To collect all my other comments in one place:
return print(...) is pretty useless: print() always returns None, so return print(...) will always make your function return None.
Instead of iterating over range(len(problems)) and accessing problems[num], just do for problem in problems and then use problem instead of problems[num].
Debugging is an important skill, and the sooner you learn it in your programming career, the better off you will be.
Stepping through your program with a debugger lets you see how each statement affects your program, and it is an invaluable debugging tool.
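Putting these suggestions together, here is a self-contained sketch in which each problem is built directly as a list of lines (so no split('\n') is needed) and zip() prints the rows side by side. The helper names format_problem and arrange, the four-space gap between problems, and raising ValueError instead of calling sys.exit() are assumptions of this sketch, not part of the original code.

```python
import operator

# Only + and - are supported, as in the question.
OPS = {"+": operator.add, "-": operator.sub}


def format_problem(problem, show_result):
    """Return one problem as a list of right-aligned lines of equal width."""
    left, op, right = problem.split()
    if op not in OPS:
        raise ValueError("Operator must be '+' or '-'.")
    width = max(len(left), len(right)) + 2  # room for the operator and one space
    lines = [
        left.rjust(width),
        op + right.rjust(width - 1),
        "-" * width,
    ]
    if show_result:
        lines.append(str(OPS[op](int(left), int(right))).rjust(width))
    return lines


def arrange(problems, show_result=False, gap="    "):
    """Lay out several problems side by side, separated by gap."""
    if len(problems) > 5:
        raise ValueError("Too many problems")
    columns = [format_problem(p, show_result) for p in problems]
    # zip(*columns) yields the first line of every problem, then the second, and so on.
    return "\n".join(gap.join(row) for row in zip(*columns))


print(arrange(["32 + 8", "1 - 3801", "9999 + 9999", "523 - 49"], show_result=True))
```

Returning the string instead of printing inside the function keeps it testable, and raising an exception lets the caller decide how to report bad input.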

Python reformatting strings based on contents

In a pandas dataframe I have rows with contents in the following format:
1) abc123-Target 4-ufs
2) abc123-target4-ufs
3) geo.4
4) j123T4
All of these should be simply: target 4
So far my cleaning procedure is as follows:
df["point_id"] = df["point_id"].str.lower()
df["point_id"] = df['point_id'].str.replace('^.*?(?=target)', '', regex=True)
This returns:
1) target 4-ufs
2) target4-ufs
3) geo.14
4) geo.2
5) j123T4
What I believe I need is:
a. Remove anything after the last number in the string, this solves 1
b. If 'target' does not have a space after it add a space, this with the above solves 2
c. If the string ends in a point and a number of any length remove everything before the point (incl. point) and replace with 'target ', this solves 3 and 4
d. If the string ends with a 't' followed by a number of any length remove everything before 't' and replace with 'target ', this solves 5
I'm looking at regex and the re module, but the following has no effect (I was trying to add a space before the last number):
df["point_id"] = re.sub(r'\D+$', '', df["point_id"])

Reading the rules, you might use 2 capture groups and check for the group values:
\btarget\s*(\d+)|.*[t.](\d+)$
\btarget\s*(\d+) Match target, optional whitespace chars and capture 1+ digits in group 1
| Or
.*[t.] Match 0+ characters followed by either t or a .
(\d+)$ Capture 1+ digits in group 2 at the end of the string
Python example:
import re
import pandas as pd
pattern = r"\btarget\s*(\d+)|.*[t.](\d+)$"
strings = [
    "abc123-Target 4-ufs",
    "abc123-target4-ufs",
    "geo.4",
    "j123T4"
]
df = pd.DataFrame(strings, columns=["point_id"])
def change(s):
    m = re.search(pattern, s, re.IGNORECASE)
    if m is None:  # leave values that match neither rule unchanged
        return s
    return "target " + (m.group(2) if m.group(2) else m.group(1))
df["point_id"] = df["point_id"].apply(change)
print(df)
Output
   point_id
0  target 4
1  target 4
2  target 4
3  target 4

You can use
import pandas as pd
df = pd.DataFrame({'point_id':['abc123-Target 4-ufs','abc123-target4-ufs','geo.4','j123T4']})
df['point_id'] = df['point_id'].str.replace(r'(?i).*Target\s*(\d+).*', r'target \1', regex=True)
df.loc[df['point_id'].str.contains(r'(?i)\w[.t]\d+$'), 'point_id'] = 'target 4'
#    point_id
# 0  target 4
# 1  target 4
# 2  target 4
# 3  target 4
The first regex is (?i).*Target\s*(\d+).*:
(?i) - case insensitive matching
.* - any 0+ chars other than line break chars, as many as possible
Target\s*(\d+) - Target, zero or more whitespaces, and one or more digits captured into Group 1
.* - any 0+ chars other than line break chars, as many as possible
The second regex, (?i)\w[.t]\d+$, matches:
(?i) - case insensitive matching
\w - a word char, then
[.t] - a . or t and then
\d+$ - one or more digits at the end of string.
The second regex is used as a mask, and the values in the point_id column are set to target 4 whenever the pattern matches the regex.
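If the number should come from the string rather than being hardcoded as 'target 4', a vectorized sketch with Series.str.extract can pull it from whichever alternative matched. This is a swapped-in technique, not the answer's method, and it assumes every value matches one of the two rules.

```python
import pandas as pd

df = pd.DataFrame({"point_id": ["abc123-Target 4-ufs", "abc123-target4-ufs", "geo.4", "j123T4"]})

# Two alternatives, one capture group each; (?i) makes the whole pattern case-insensitive.
# Column 0 holds the number after "target", column 1 the number after a trailing "." or "t".
parts = df["point_id"].str.extract(r"(?i)target\s*(\d+)|[.t](\d+)$")

# Whichever group matched supplies the number.
number = parts[0].fillna(parts[1])
df["point_id"] = "target " + number
print(df)
```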

Python regex multiple matches occurrences between two strings

I have a multi-line string with my start/end magic strings ("X" and "Y"). I'm trying to capture all occurrences but I'm experiencing some issues.
Here is the code
import re
testString = '''AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF
FFFYGGG
'''
pattern = re.compile(r'(.*)X(.*)Y(.*)', re.MULTILINE)
match = re.search(pattern, testString)
print(match.group(1))  # output: AAAAAXBBBBBYCCCCC
print(match.group(2))  # output: DDDDD
print(match.group(3))  # output: EEEEEEXFFF
Basically, I'm trying to capture all occurrences of the following (And I have to maintain text order):
Text before the magic start string (e.g.: AAAAA, CCCCC, EEEEEE)
Text between start/end magic strings (e.g.: BBBBB, DDDDD, FFF\nFFF)
Text after the magic end string (e.g.: CCCCC, GGG)
So I'm trying to print the following output (what's in parentheses below is just a comment):
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on. Printing them in that order would solve the issue. (Then I can pass each to other functions, ...etc.)
The code works nicely when the text is as simple as for example "AAXBBYCC", but with complicated strings I'm losing control.
Any ideas or alternative ways to do this?

You could match any character except X or Y in group 1 and then match X, and do the same for Y. The "after the magic string" part you could capture in a lookahead with a third group.
The negated character class [^XY] also matches a newline, so it can match the FFF\nFFF part.
([^XY]+)X([^XY]+)Y(?=([^XY]*))
([^XY]+)X Capture group 1, match 1+ times any char except X or Y, then match X
([^XY]+)Y Capture group 2, match 1+ times any char except X or Y, then match Y
(?= Positive lookahead, assert what is directly to the right is
([^XY]*) Capture group 3, match 0+ times any char except X or Y (0+ so the last pair still matches when nothing follows the final Y)
) Close lookahead
import re
regex = r"([^XY]+)X([^XY]+)Y(?=([^XY]*))"
s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
"FFFYGGG")
matches = re.findall(regex, s)
print(matches)
Output
[('AAAAA', 'BBBBB', 'CCCCC'), ('CCCCC', 'DDDDD', 'EEEEEE'), ('EEEEEE', 'FFF\nFFF', 'GGG')]

So I'm trying to print the following output: (what's in between brackets below is just a comment)
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on.
Since it doesn't matter whether before or after start or end, it is as simple as:
import re
o = re.split("X|Y", testString)
print(*o, sep='\n')

Can't you just use:
pattern = re.compile(r'[^XY]+')
match = re.findall(pattern, testString)
print(match)
# ['AAAAA', 'BBBBB', 'CCCCC', 'DDDDD', 'EEEEEE', 'FFF\nFFF', 'GGG\n']
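If you later need to know which role each piece played (inside an X...Y pair or outside it), one option is re.split with a capturing group, which keeps the delimiters in the result so they can be tracked. This sketch assumes X and Y always alternate and never nest; the function name label_segments and the labels "between"/"outside" are hypothetical.

```python
import re

test_string = "AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\nFFFYGGG\n"


def label_segments(text):
    """Return (label, segment) pairs in text order."""
    # The capturing group ([XY]) makes re.split keep each X and Y in the output list.
    tokens = re.split(r"([XY])", text)
    inside = False
    result = []
    for token in tokens:
        if token == "X":
            inside = True
        elif token == "Y":
            inside = False
        elif token:  # skip empty strings produced by adjacent delimiters
            result.append(("between" if inside else "outside", token))
    return result


for label, segment in label_segments(test_string):
    print(f"{label:8} {segment!r}")
```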

How to match different groups with different positions with one single pattern in Python3 Regex

I want to match different groups at different positions with only one pattern.
Note that the last 5 digits (the number in brackets) appear at a different position in each line; this is my actual question.
import re
line = "Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)"
pattern = r"(Jul\s\d\s\d+:+\d+:+\d+)"  # but I couldn't work out how to match another group at a different position: the 5 digits between brackets
result = re.search(pattern, line)
print(result.group(1))  # output should be: Jul 6 14:02:08 29187
# my actual output: Jul 6 14:02:08 (I still don't know how to match a group at a different position using one pattern only)

You may use
import re
def show_time_of_pid(line):
    pattern = r"^(Jul\s+\d+\s+[\d:]*\d).*?\[(\d+)]"
    result = re.search(pattern, line)
    return "{} pid:{}".format(result.group(1),result.group(2)) if result else ""
Regex details
^ - start of string
(Jul\s+\d+\s+[\d:]*\d) - Group 1: Jul, 1+ whitespaces, 1+ digits, 1+ whitespaces, zero or more digits or colons and then a digit
.*? - any 0+ chars, other than line break chars, as few as possible
\[(\d+)] - [, Group 2 capturing 1 or more digits, and then a ].
For example:
print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
# => Jul 6 14:01:23 pid:29440
print(show_time_of_pid("Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)"))
# => Jul 6 14:02:08 pid:29187
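The same idea can be written with named groups, which makes it clear that one pattern captures two fields at different positions. This sketch also generalizes the literal Jul to any capitalized three-letter month and produces the "timestamp pid" format asked for in the question; the month pattern, the group names and the function name timestamp_and_pid are assumptions, not part of the answer above.

```python
import re

SYSLOG_PID = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})"  # e.g. "Jul 6 14:02:08"
    r".*?\[(?P<pid>\d+)\]"  # the first [digits] anywhere after the timestamp
)


def timestamp_and_pid(line):
    """Return "timestamp pid" for a matching line, or None if the line does not match."""
    m = SYSLOG_PID.search(line)
    if m is None:
        return None
    return f"{m['timestamp']} {m['pid']}"


print(timestamp_and_pid("Jul 6 14:02:08 computer.name jam_tag=psim[29187]: (UUID:006)"))
```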

Python 3.x - don't count carriage returns with len

I'm writing the following code as part of my practice:
input_file = open('/home/me/01vshort.txt', 'r')
file_content = input_file.read()
input_file.close()
file_length_question = input("Count all characters (y/n)? ")
if file_length_question in ('y', 'Y', 'yes', 'Yes', 'YES'):
    print("\n")
    print(file_content, ("\n"), len(file_content) - file_content.count(" "))
It's counting carriage returns in the output, so for the following file (01vshort.txt), I get the following terminal output:
Count all characters (y/n)? y
0
0 0
1 1 1
9
...or...
Count all characters (y/n)? y
0
00
111
9
In both cases, the answer should be 6, as there are 6 characters, but I'm getting 9 as the result.
I've made sure the code is omitting whitespace, and have tested this with my input file by deliberately adding whitespace and running the code with and without the line:
- file_content.count(" ")
Can anyone assist here as to why the result is 9 and not 6?
Perhaps it isn't carriage returns at all?
I'm also curious as to why the result of 9 is indented by 1 whitespace? The input file simply contains the following (with a blank line at the end of the file, line numbers indicated in the example):
1. 0
2. 0 0
3. 1 1 1
4.
...or...
1. 0
2. 00
3. 111
4.
Thanks.

If you want to ignore all whitespace characters including tabs and newlines and other control characters:
print(sum(not c.isspace() for c in file_content))
will give you the 6 you expect.
Alternatively you can take advantage of the fact the .split() method with no argument will split a string on any whitespace character. So split it into non-space chunks and then join them all back together again without the whitespace characters:
print(len(''.join(file_content.split())))

You're getting 9 because the content of the file is equivalent to:
file_content = "0\n0 0\n1 1 1\n"
That is 12 characters: 6 digits, 3 spaces and 3 newline characters (\n, not carriage returns). You only subtract the spaces (file_content.count(" ")), so you get 12 - 3 = 9.
In order to count only the characters you'd either:
read the file line by line, or
use a regexp to match white space.
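A sketch of both suggestions, using the string above in place of the real file. io.StringIO stands in for the opened file here only so that the example runs on its own; with a real file you would iterate over the file object in the same way.

```python
import io
import re

file_content = "0\n0 0\n1 1 1\n"

# Option 1: go line by line, drop the line ending and the spaces, then count.
line_total = 0
for line in io.StringIO(file_content):
    line_total += len(line.rstrip("\r\n").replace(" ", ""))

# Option 2: delete every whitespace character (\s covers spaces, tabs, \n and \r), then count.
regex_total = len(re.sub(r"\s", "", file_content))

print(line_total, regex_total)
```

Option 1 only removes spaces and line endings, so a tab inside a line would still be counted; option 2 removes all whitespace.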
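For the indentation of the 9: print() separates its arguments with sep, which defaults to a single space, so a space is printed after the "\n" argument, right before the 9.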
