I have a string like this
convert_text = "tet1+tet2+tet34+tet12+tet3"
I want to replace the digits in the string above with characters. The mapping list is available separately. When I try to replace the digit 1 with the character 'g' using replace, like below:
import re
convert_text = convert_text.replace('1','g')
print(convert_text)
the output is:
"tetg+tet2+tet34+tetg2+tet3"
How can I differentiate between single-digit and two-digit values? Is there any way to do this with a regular expression or something else?
You can use a regular expression with a callable replacement argument to substitute consecutive runs of digits with a value from a lookup table, e.g.:
import re
# Input text
convert_text = "tet1+tet2+tet34+tet12+tet3"
# Mapping from digit strings to their replacements
replacements = {'1': 'A', '2': 'B', '3': 'C', '12': 'T', '34': 'X'}
# Replace each run of digits with its mapped string
converted_text = re.sub(r'(\d+)', lambda m: replacements[m.group()], convert_text)
Which gives you:
'tetA+tetB+tetX+tetT+tetC'
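Note that the lookup above raises a KeyError if a run of digits is missing from the table. A minimal sketch of one way to handle that, assuming unmapped runs should simply be left unchanged (the function name replace_digit_runs is made up for illustration):

```python
import re

def replace_digit_runs(text, mapping):
    """Replace each run of digits using mapping; leave unmapped runs unchanged."""
    # dict.get falls back to the matched text when the run has no mapping.
    return re.sub(r'\d+', lambda m: mapping.get(m.group(), m.group()), text)

replacements = {'1': 'A', '2': 'B', '12': 'T'}  # deliberately incomplete table
print(replace_digit_runs("tet1+tet2+tet34+tet12+tet3", replacements))
# tetA+tetB+tet34+tetT+tet3
```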
import re
convert_text = "tet1+tet2+tet34+tet12+tet3"
pattern = re.compile(r'((?<!\d)\d(?!\d))')
convert_text2=pattern.sub('g',convert_text)
convert_text2
Out[2]: 'tetg+tetg+tet34+tet12+tetg'
You have to use negative lookahead and negative lookbehind patterns, which are written in parentheses:
(?!pat) for negative lookahead and
(?<!pat) for negative lookbehind.
The positive lookahead/lookbehind forms are the same with = instead of !.
EDIT: if you need to replace whole runs of digits, the regex is
pattern2 = re.compile(r'\d+')
In either pattern you can replace \d with the specific digit you need.
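As a sketch of that last point applied to the original question, the lookarounds can be kept as they are while the middle \d is swapped for the one digit of interest. The helper name replace_standalone_digit is hypothetical:

```python
import re

def replace_standalone_digit(text, digit, replacement):
    """Replace `digit` only where it is not part of a longer run of digits."""
    # The lookbehind and lookahead reject a digit on either side of the match.
    pattern = re.compile(r'(?<!\d)' + re.escape(digit) + r'(?!\d)')
    return pattern.sub(replacement, text)

print(replace_standalone_digit("tet1+tet2+tet34+tet12+tet3", '1', 'g'))
# tetg+tet2+tet34+tet12+tet3
```

Here the 1 inside 12 is left alone because it is followed by another digit.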
This:
words = words.withColumn('value_2', F.regexp_replace('value', '|'.join(stopWords), ''))
works fine for substrings.
However, I have a stop word 'a', and as a result 'was' becomes 'ws'. I only want it removed where it stands alone as 'A' or 'a', and to leave 'was' as is.
Place word boundaries around the alternation:
words = words.withColumn('value_2', F.regexp_replace('value', '\\b(' + '|'.join(stopWords) + ')\\b', ''))
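The same word-boundary idea can be tried out in plain Python with the re module, which is easier to experiment with than a Spark column. This is only a sketch: the stop-word list and the function name remove_stop_words are made up, and escaping each word and ignoring case are assumptions added here, not part of the answer above.

```python
import re

def remove_stop_words(text, stop_words):
    """Remove whole-word occurrences of the stop words, ignoring case."""
    # Escape each word so regex metacharacters are taken literally,
    # then wrap the alternation in word boundaries.
    alternation = '|'.join(map(re.escape, stop_words))
    pattern = re.compile(r'\b(?:' + alternation + r')\b', flags=re.IGNORECASE)
    return pattern.sub('', text)

print(remove_stop_words('A cat was on the mat', ['a', 'the']))
```

The word 'was' survives because the 'a' inside it has word characters on both sides, so \b does not match there. The surrounding spaces are not removed, so the result can contain doubled spaces.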
Let's say I have a string, "Mississippi".
I want to create a new string from "Mississippi" in which every character that is not a particular character is replaced.
For example, take the letter "S". In the new string, I want to keep all the S's in "MISSISSIPPI" and replace all the other letters with a "_".
I know how to do the reverse:
word = "MISSISSIPPI"
word2 = word.replace("S", "_")
print(word2)
word2 gives me MI__I__IPPI
but I can't figure out how to get word2 to be __SS_SS____
(The classic Hangman Game)
You can use the sub function of Python's re module with a regular expression that uses a negated character set, such as:
import re
line = re.sub(r"[^S]", "_", line)
This replaces any non S character with the desired character.
You could do this with str.maketrans() and str.translate(), but it would be easier with regular expressions. The trick is that you need to insert your string of valid characters into the regular expression programmatically:
import re
word = "MISSISSIPPI"
show = 'S' # augment as the game progresses
print(re.sub(r"[^{}]".format(show), "_", word))
A simpler way is to map a function across the string:
>>> ''.join(map(lambda w: '_' if w != 'S' else 'S', 'MISSISSIPPI'))
'__SS_SS____'
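For the Hangman use case, where several letters have been guessed, the mapping idea can be sketched with a set of guessed letters. The name mask_word is hypothetical, and the comparison is case-sensitive as in the examples above:

```python
def mask_word(word, guessed):
    """Show guessed letters; replace every other character with '_'."""
    guessed = set(guessed)
    return ''.join(ch if ch in guessed else '_' for ch in word)

print(mask_word("MISSISSIPPI", "S"))   # __SS_SS____
print(mask_word("MISSISSIPPI", "SP"))  # __SS_SS_PP_
print(mask_word("MISSISSIPPI", ""))    # ___________
```

Unlike the regex version, this also works before any letter has been guessed, since there is no character set to build.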
There is a list containing character sequences, such as:
seq_list = ['C','CA','CAF','CMMVF','E','CMM','CMMF','CMMFF',...]
and a string defined as below:
a_str = 'CAFCMMVFCMMECMMFFCCAF'
The problem is to match the longest character sequence from seq_list in a_str, from left to right, iteratively, and to append the character '|' after each match that is found.
For example,
a_str begins with 'C', but the actual character sequence is 'CAF', because 'CAF' is a longer sequence than 'C',
so the result should be as below:
a_str = 'CAF|CMMVFCMMECMMFFCCAF' #actual sequence match
'C|AFCMMVFCMMECMMFFCCAF' #false sequence match
Then the remaining string a_str_r should be a_str_r = 'CMMVFCMMECMMFFCCAF' after the character '|' has been appended. The iterative process then starts over, matching the longest sequence from the list, until the end of the string. The final result should be:
a_str = 'CAF|CMMVF|CMM|E|CMMFF|C|CAF|'
This was one of my attempts at this problem, and I still couldn't get it right!
a_str_r = []
for each in seq_list:
    for i in a_str:
        if each in i:
            a_str_r.append(i+'|')
return a_str_r
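You want to search for the leftmost longest match. That is a natural fit for a regular expression search, provided the longer alternatives are listed before the shorter ones, because Python's re tries alternatives from left to right and takes the first one that matches.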
import re
seq_list = ['C','CA','CAF','CMMVF','E','CMM','CMMF','CMMFF']
# Sort to put longer match strings before shorter ones
sseq_list = sorted(seq_list, key=len, reverse=True)
# Turn list into a regular expression string
sseq_re = '|'.join(sseq_list)
# Compile regular expression string
rx = re.compile(sseq_re)
# Put pipe characters between the matches
print('|'.join(rx.findall('CAFCMMVFCMMECMMFFCCAF')))
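The question's expected result also ends with a trailing '|'. A small variation, sketched here with a hypothetical helper split_longest, appends the pipe after every match with re.sub instead of joining the matches. Escaping the sequences is an extra precaution that the letter-only list does not strictly need.

```python
import re

def split_longest(text, sequences):
    """Append '|' after each match, preferring the longest sequence at each position."""
    # Longest alternatives first, so the regex engine tries them first.
    ordered = sorted(sequences, key=len, reverse=True)
    rx = re.compile('|'.join(map(re.escape, ordered)))
    return rx.sub(lambda m: m.group() + '|', text)

seq_list = ['C', 'CA', 'CAF', 'CMMVF', 'E', 'CMM', 'CMMF', 'CMMFF']
print(split_longest('CAFCMMVFCMMECMMFFCCAF', seq_list))
# CAF|CMMVF|CMM|E|CMMFF|C|CAF|
```

Any characters that match none of the sequences are left in place without a pipe.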
That's the source code:
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            if str_one[i] == str_two[j]:
                str_one = (str_one - str_one[i]).split()
                print(str_one)
            else:
                print('There is no relation')

if __name__ == '__main__':
    str_one = input('Put your First String: ').split()
    str_two = input('Put your Second String: ')
    print(revers_e(str_one, str_two))
How can I remove a letter that occurs in both strings from the first string then print it?
How about a simple Pythonic way of doing it?
def revers_e(s1, s2):
    print(*[i for i in s1 if i in s2])  # Print all characters to be deleted from s1
    return ''.join([i for i in s1 if i not in s2])  # Return s1 without them
This answer says, "Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
First of all, you don't need the suboptimal approach of using range and len to iterate over a string. Strings are iterable, so you can iterate over them directly with a simple loop.
To find the intersection of two strings, you can use set.intersection, which returns all the characters common to both strings, and then use str.translate to remove those common characters:
intersect=set(str_one).intersection(str_two)
trans_table = dict.fromkeys(map(ord, intersect), None)
str_one.translate(trans_table)
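Wrapped into a function, the three lines above can be run as follows. This is a sketch for Python 3, and the name remove_common is made up for illustration:

```python
def remove_common(str_one, str_two):
    """Return str_one without any character that also occurs in str_two."""
    intersect = set(str_one).intersection(str_two)
    # Map each common character's code point to None so translate() drops it.
    trans_table = dict.fromkeys(map(ord, intersect), None)
    return str_one.translate(trans_table)

print(remove_common('abcdefg', 'efghijk'))  # abcd
```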
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            try:
                if str_one[i] == str_two[j]:
                    first_part=str_one[0:i]
                    second_part=str_one[i+1:]
                    str_one =first_part+second_part
                    print(str_one)
                else:
                    print('There is no relation')
            except IndexError:
                return

str_one = input('Put your First String: ')
str_two = input('Put your Second String: ')
revers_e(str_one, str_two)
I've modified your code, taking out a few bits and adding a few more.
str_one = input('Put your First String: ').split()
I removed the .split(), because all this would do is create a list of length 1, so in your loop, you'd be comparing the entire string of the first string to one letter of the second string.
str_one = (str_one - str_one[i]).split()
You can't remove a character from a string like this in Python, so I split the string into two parts (you could also convert it into a list, as I did in other code which I deleted): all the characters before the matching character, and all the characters after it. The two parts are then joined into one string.
I used exception handling because the first loop uses the original length, but the string gets shorter as characters are removed, so an index can end up out of range.
Lastly, I just called the function instead of printing its result, because it only returns None.
The set and list comprehension examples work in Python 2.7+ and Python 3; str.maketrans as used here requires Python 3.
Given:
>>> s1='abcdefg'
>>> s2='efghijk'
You can use a set:
>>> set(s1).intersection(s2)
{'f', 'e', 'g'}
Then use that set with maketrans to build a translation table that maps those characters to None, which deletes them:
>>> s1.translate(str.maketrans({e:None for e in set(s1).intersection(s2)}))
'abcd'
Or use a list comprehension to collect the common characters:
>>> ''.join([e for e in s1 if e in s2])
'efg'
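And a regex to produce a new string without the common characters (they go in a character class, so each one is removed wherever it occurs rather than only as the contiguous sequence 'efg'):
>>> import re
>>> re.sub('[' + re.escape(''.join(e for e in s1 if e in s2)) + ']', '', s1)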
'abcd'
I want to find out whether string1 is a substring of string2.
e.g. string1="abc", string2="afcabcdfg".
I want to add wildcard cases, e.g. "*" can substitute for "a" and "c", and "y" can substitute for "f" or "d". As a result, "*by" should be a substring of "afcabcdfg".
What is a general way to code this? How should I loop over it?
For the example you provided, try using a dictionary to define all the replacements, then loop over the characters of the string like this:
string2="afcabcdfg"
table = {'a': '*', 'c': '*', 'f': 'y', 'd': 'y'}
new_string = ''
for c in string2:
    if c in table and table[c] not in new_string:
        new_string += table[c]
    elif c not in table:
        new_string += c
Use re, and a little string manipulation to make yourself a regex.
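import re
string1 = 'abc'
string2 = 'zbcdef'
string3 = 'zzzdef'  # illustrative non-matching string; string3 was not defined in the original
wildcards = 'a'
# . is wildcard in a regex.
my_regex = string1.replace(wildcards, '.')
# If there is a match, re returns an object. We don't care
# about what info the object holds, just that it returns.
# Note: re.match only matches at the start of the string;
# re.search looks for a match anywhere in it.
if re.match(my_regex, string2):
    print("Success")
# If there's no match, None is returned.
if not re.match(my_regex, string3):
    print("Also success")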
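The regex approach can be extended to wildcards that each stand for a specific set of characters. The sketch below is one possible reading of the question, not a definitive answer: it assumes each wildcard matches exactly one character from a non-empty set, and the function name wildcard_search and the wildcards dictionary are made up for illustration.

```python
import re

def wildcard_search(pattern, text, wildcards):
    """Return True if pattern, with wildcards expanded, occurs anywhere in text."""
    parts = []
    for ch in pattern:
        if ch in wildcards:
            # A wildcard matches any one of the characters it stands for.
            parts.append('[' + re.escape(wildcards[ch]) + ']')
        else:
            # Everything else must match literally.
            parts.append(re.escape(ch))
    return re.search(''.join(parts), text) is not None

wildcards = {'*': 'ac', 'y': 'fd'}
print(wildcard_search('*b*', 'afcabcdfg', wildcards))  # True: matches 'abc'
print(wildcard_search('*by', 'xxabdxx', wildcards))    # True: matches 'abd'
print(wildcard_search('*by', 'xxabzxx', wildcards))    # False
```

Because re.search is used instead of re.match, the pattern may occur anywhere in the text, which is what a substring test needs.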