Split string with commas while keeping numeric parts - python-3.x

I'm using the following function to separate strings with commas right on the capitals, as long as it is not preceded by a blank space.
def func(x):
y = re.findall('[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
return ','.join(y)
However, when I try to separate the next string it removes the part with numbers.
Input = '49ersRiders Mapple'
Output = 'Riders Mapple'
I tried the following code but now it removes the 'ers' part.
def test(x):
y = re.findall(r'\d+[A-Z]*|[A-Z][^A-Z\s]+(?:\s+\S[^A-Z\s]*)*', x)
return ','.join(y)
Output = '49,Riders Mapple'
The output I'm looking for is this:
'49ers,Riders Mapple'
Is it possible to add this indication to my regex?
Thanks in advance

Maybe naive but why don't you use re.sub:
def func(x):
return re.sub(r'(?<!\s)([A-Z])', r',\1', x)
inp = '49ersRiders Mapple'
out = func(inp)
print(out)
# Output
49ers,Riders Mapple

Here is a regex re.findall approach:
inp = "49ersRiders"
output = ','.join(re.findall('(?:[A-Z]|[0-9])[^A-Z]+', inp))
print(output) # 49ers,Riders
The regex pattern used here says to match:
(?:
[A-Z] a leading uppercase letter (try to find this first)
| OR
[0-9] a leading number (fallback for no uppercase)
)
[^A-Z]+ one or more non capital letters following

Related

I want to compress each letter in a string with a specific length

I have the following string:
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'. I want to get an output like this: abaacbbb, in which "a" will be compressed with a length of 3 and "b" will be compressed with a length of 5. I used the following function, but it removes all the adjacent duplicates and the output is: abacb :
def remove_dup(x):
if len(x) < 2:
return x
if x[0] != x[1]:
return x[0] + remove_dup(x[1:])
return remove_dup(x[1:])
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'
print(remove_dup(x))
It would be wonderful if somebody could help me with this.
Thank you!
Unless this is a homework question with special constraints, this would be more conveniently and arguably more readably implemented with a regex substitution that replaces desired quantities of specific characters with a single instance of the captured character:
import re
def remove_dup(x):
return re.sub('(a){3}|([bc]){5}', r'\1\2', x)
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'
print(remove_dup(x))
This outputs:
abaacbbb

remove string which contains special character by python regular expression

my code:
s = '$ascv abs is good'
re.sub(p.search(s).group(),'',s)
ouput:
'$ascv abs is good'
the output what i want:
'abs is good'
I want to remove string which contains special character by python regular expression. I thought my code was right but the output is wrong.
How can i fix my code to make the output right?
invalid_chars = ['#'] # Characters you don't want in your text
# Determine if a string has any character you don't want
def if_clean(word):
for letter in word:
if letter in invalid_chars:
return False
return True
def clean_text(text):
text = text.split(' ') # Convert text to a list of words
text_clean = ''
for word in text:
if if_clean(word):
text_clean = text_clean+' '+word
return text_clean[1:]
# This will print 'abs is good'
print(clean_text('$ascv abs is good'))

Python regex multiple matches occurrences between two strings

I have a multi-line string with my start/end magic strings ("X" and "Y"). I'm trying to capture all occurrences but I'm experiencing some issues.
Here is the code
testString = '''AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF
FFFYGGG
'''
pattern = re.compile(r'(.*)X(.*)Y(.*)', re.MULTILINE)
match = re.search(pattern, testString)
print match.group(1) # output: AAAAAXBBBBBYCCCCC
print match.group(2) # output: DDDDD
print match.group(3) # output: EEEEEEXFFF
Basically, I'm trying to capture all occurrences of the following (And I have to maintain text order):
Text before the magic start string (e.g.: AAAAA, CCCCC, EEEEEE)
Text between start/end magic strings (e.g.: BBBBB, DDDDD, FFF\nFFF)
Text after the magic start string (e.g.: CCCCC, GGG)
So I'm trying to print the following output: (what's in between brackets below is just a comment)
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on. Printing them in that order would solve the issue. (Then I can pass each to other functions, ...etc.)
The code works nicely when the text is as simple as for example "AAXBBYCC", but with complicated strings I'm losing control.
Any ideas or alternative ways to do this?
You could match any character except X or Y in group 1 and then match X and do the same for Y. The "after the magic string" part you could capture in a lookahead with a third group.
The negated character class using [^ will also match an newline to match the FFFFFF part.
([^XY]+)X([^XY]+)Y(?=([^XY]+))
([^XY]+)X Capture group 1, match 1+ times any char except X or Y, then match X
([^XY]+)Y Capture group 2, match 1+ times any char except X or Y, then match Y
(?= Positive lookahead, assert what is directly to the right is
([^XY]+) Capture group 3, match 1+ times any char except X or Y
) Close lookahead
Regex demo | Python demo
import re
regex = r"([^XY]+)X([^XY]+)Y(?=([^XY]*))"
s = ("AAAAAXBBBBBYCCCCCXDDDDDYEEEEEEXFFF\n"
"FFFYGGG")
matches = re.findall(regex, s)
print(matches)
Output
[('AAAAA', 'BBBBB', 'CCCCC'), ('CCCCC', 'DDDDD', 'EEEEEE'), ('EEEEEE', 'FFF\nFFF', 'GGG')]
So I'm trying to print the following output: (what's in between brackets below is just a comment)
AAAAA (before magic string)
BBBBB (between magic strings)
CCCCC (before/after magic strings, it does not matter. Just the order matters.)
DDDDD (after magic string)
And so on.
Since it doesn't matter whether before or after start or end, it is as simple as:
import re
o = re.split("X|Y", testString)
print(*o, sep='\n')
Can't you just use:
pattern = re.compile(r'[^XY]+')
match = re.findall(pattern, testString)
print(match)
# ['AAAAA', 'BBBBB', 'CCCCC', 'DDDDD', 'EEEEEE', 'FFF\nFFF', 'GGG\n']

How to split string by odd length

Lets say with a string = "AABBAAAAABBBBAAABBBBAA"
I want to return string split by the odd lengths of the string (i.e when A = 5 or A = 3),
What I want returned is 1) AABBAAAAA 2)BBBBAAA 3)BBBBAA,
How can I do that?
I tried using regex [A]+[B]+ for a slightly different case
One option might be to regex iterate using re.finditer with the following pattern:
.*?(?:AAA(?:AA)?|$)
This pattern will non greedily consume until reaching either 3 A's, 5 A's, or the end of the string. Then, we can print out each complete match as we iterate.
input = 'AABBAAAAABBBBAAABBBBAA'
pattern = '.*?(?:AAA(?:AA)?|$)'
for match in re.finditer(pattern, input):
print match.group()
This prints:
AABBAAAAA
BBBBAAA
BBBBAA
You can use itertools.groupby:
s = 'BBAAAAABBBBAAABBBBAA'
from itertools import groupby
out = ['']
for v, g in groupby(s):
l = [*g]
out[-1] += ''.join(l)
if v == 'A' and len(l) in (3, 5):
out.append('')
print(out)
Prints:
['BBAAAAA', 'BBBBAAA', 'BBBBAA']

How can I delete the letter that occurs in the two strings using python?

That's the source code:
def revers_e(str_one,str_two):
for i in range(len(str_one)):
for j in range(len(str_two)):
if str_one[i] == str_two[j]:
str_one = (str_one - str_one[i]).split()
print(str_one)
else:
print('There is no relation')
if __name__ == '__main__':
str_one = input('Put your First String: ').split()
str_two = input('Put your Second String: ')
print(revers_e(str_one, str_two))
How can I remove a letter that occurs in both strings from the first string then print it?
How about a simple pythonic way of doing it
def revers_e(s1, s2):
print(*[i for i in s1 if i in s2]) # Print all characters to be deleted from s1
s1 = ''.join([i for i in s1 if i not in s2]) # Delete them from s1
This answer says, "Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
First of all you don't need to use a pretty suboptimal way using range and len to iterate over a string since strings are iterable you can just iterate over them with a simple loop.
And for finding intersection within 2 string you can use set.intersection which returns all the common characters in both string and then use str.translate to remove your common characters
intersect=set(str_one).intersection(str_two)
trans_table = dict.fromkeys(map(ord, intersect), None)
str_one.translate(trans_table)
def revers_e(str_one,str_two):
for i in range(len(str_one)):
for j in range(len(str_two)):
try:
if str_one[i] == str_two[j]:
first_part=str_one[0:i]
second_part=str_one[i+1:]
str_one =first_part+second_part
print(str_one)
else:
print('There is no relation')
except IndexError:
return
str_one = input('Put your First String: ')
str_two = input('Put your Second String: ')
revers_e(str_one, str_two)
I've modified your code, taking out a few bits and adding a few more.
str_one = input('Put your First String: ').split()
I removed the .split(), because all this would do is create a list of length 1, so in your loop, you'd be comparing the entire string of the first string to one letter of the second string.
str_one = (str_one - str_one[i]).split()
You can't remove a character from a string like this in Python, so I split the string into parts (you could also convert them into lists like I did in my other code which I deleted) whereby all the characters up to the last character before the matching character are included, followed by all the characters after the matching character, which are then appended into one string.
I used exception statements, because the first loop will use the original length, but this is subject to change, so could result in errors.
Lastly, I just called the function instead of printing it too, because all that does is return a None type.
These work in Python 2.7+ and Python 3
Given:
>>> s1='abcdefg'
>>> s2='efghijk'
You can use a set:
>>> set(s1).intersection(s2)
{'f', 'e', 'g'}
Then use that set in maketrans to make a translation table to None to delete those characters:
>>> s1.translate(str.maketrans({e:None for e in set(s1).intersection(s2)}))
'abcd'
Or use list comprehension:
>>> ''.join([e for e in s1 if e in s2])
'efg'
And a regex to produce a new string without the common characters:
>>> re.sub(''.join([e for e in s1 if e in s2]), '', s1)
'abcd'

Resources