Delete specific characters from a string (Python) - string

I understand that str = str.replace('x', '') will eliminate all the x's.
But let's say I have a string jxjrxxtzxz and I only want to delete the first and last x making the string jjrxxtzz. This is not string specific. I want to be able to handle all strings, and not just that specific example.
edit: assume that x is the only letter I want to remove. Thank you!

One fairly straightforward way is to use find and rfind to locate the characters to remove:
s = 'jxjrxxtzxz'
# Remove first occurrence of 'x'
ix = s.find('x')
if ix > -1:
    s = s[:ix] + s[ix+1:]
# Remove last occurrence of 'x'
ix = s.rfind('x')
if ix > -1:
    s = s[:ix] + s[ix+1:]

Not pretty but this will work:
def remove_first_last(c, s):
    return s.replace(c, '', 1)[::-1].replace(c, '', 1)[::-1]
Usage:
In [1]: remove_first_last('x', 'jxjrxxtzxz')
Out[1]: 'jjrxxtzz'
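Both answers above can also be folded into one helper that slices the string once. This is only a sketch: the name remove_first_and_last is made up for illustration, and it assumes ch is a single character, as in the question.

```python
def remove_first_and_last(text, ch):
    """Drop the first and the last occurrence of ch (assumed to be one character)."""
    first = text.find(ch)
    if first == -1:
        # ch does not occur at all: nothing to remove
        return text
    last = text.rfind(ch)
    if first == last:
        # only one occurrence: remove just that one
        return text[:first] + text[first + 1:]
    # keep everything except the characters at positions first and last
    return text[:first] + text[first + 1:last] + text[last + 1:]


print(remove_first_and_last('jxjrxxtzxz', 'x'))
```

A string with a single x loses just that one character, which matches what the find/rfind version does.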

Related

Replace string only if all characters match (Thai)

The problem is that มาก technically is in มาก็. Because มาก็ is มาก + ็.
So when I do
"แชมพูมาก็เยอะ".replace("มาก", " X ")
I end up with
แชมพู X ็เยอะ
And what I want
แชมพู X เยอะ
What I really want is to force the last character ก็ to count as a single character, so that มาก no longer matches มาก็.
While I haven't found a proper solution, I was able to find a workaround. I split each string into separate (combined) characters, i.e. a base character together with its combining marks, using the \X pattern of the third-party regex module. Then I compare those lists to each other.
import regex
# Check if a list is inside another list
def is_slice_in_list(s, l):
    len_s = len(s)  # so we don't recompute length of s on every iteration
    return any(s == l[i:len_s+i] for i in range(len(l) - len_s + 1))
def is_word_in_string(w, s):
    a = regex.findall(r'\X', w)
    b = regex.findall(r'\X', s)
    return is_slice_in_list(a, b)
assert is_word_in_string("มาก็", "พูมาก็เยอะ") == True
assert is_word_in_string("มาก", "พูมาก็เยอะ") == False
The regex will split like this:
พู ม า ก็ เ ย อ ะ
ม า ก
And as it compares ก็ to ก the function figures the words are not the same.
I will mark this as answered, but if there is a nicer or "proper" solution I will choose that one instead.
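If installing the regex module is not an option, the same idea can be approximated with the standard library. The sketch below is not the approach from the post: it simply refuses to replace a match that is immediately followed by a combining mark (Unicode general category starting with "M", which includes ็). The function name replace_whole_clusters is hypothetical, and the check is a simplification of real grapheme-cluster rules.

```python
import unicodedata


def replace_whole_clusters(text, word, replacement):
    """Replace word in text, skipping matches that are followed by a combining mark."""
    if not word:
        return text
    pieces = []
    pos = 0
    while True:
        hit = text.find(word, pos)
        if hit == -1:
            # no more matches: keep the rest of the text unchanged
            pieces.append(text[pos:])
            break
        end = hit + len(word)
        # a following mark means the match would cut a combined character in half
        cuts_cluster = end < len(text) and unicodedata.category(text[end]).startswith("M")
        if cuts_cluster:
            # keep the first matched character and search again right after it
            pieces.append(text[pos:hit + 1])
            pos = hit + 1
        else:
            pieces.append(text[pos:hit])
            pieces.append(replacement)
            pos = end
    return "".join(pieces)


print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก", " X "))
print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก็", " X "))
```

The first call leaves the string untouched because มาก is followed by ็, while the second call replaces the full มาก็.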

I want to compress each letter in a string with a specific length

I have the following string:
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'. I want to get the output abaacbbb, in which each group of 3 "a"s is compressed to one "a" and each group of 5 "b"s is compressed to one "b". I used the following function, but it removes all adjacent duplicates, so the output is abacb:
def remove_dup(x):
    if len(x) < 2:
        return x
    if x[0] != x[1]:
        return x[0] + remove_dup(x[1:])
    return remove_dup(x[1:])
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'
print(remove_dup(x))
It would be wonderful if somebody could help me with this.
Thank you!
Unless this is a homework question with special constraints, this would be more conveniently and arguably more readably implemented with a regex substitution that replaces desired quantities of specific characters with a single instance of the captured character:
import re
def remove_dup(x):
    return re.sub('(a){3}|([bc]){5}', r'\1\2', x)
x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'
print(remove_dup(x))
This outputs:
abaacbbb
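If the group sizes need to be configurable, a non-regex variant with itertools.groupby is one option. This is a sketch under the assumption that a run of n characters with group size k should become n // k characters plus the n % k leftovers, which is what the regex above effectively does. The name compress_runs and the sizes mapping are made up for illustration.

```python
from itertools import groupby


def compress_runs(text, sizes):
    """Collapse every full group of sizes[ch] repeated characters into one ch."""
    out = []
    for ch, run in groupby(text):
        n = sum(1 for _ in run)   # length of this run of identical characters
        k = sizes.get(ch, 1)      # characters without an entry are left alone
        out.append(ch * (n // k + n % k))
    return "".join(out)


x = 'aaabbbbbaaaaaacccccbbbbbbbbbbbbbbb'
print(compress_runs(x, {'a': 3, 'b': 5, 'c': 5}))
```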

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get the part of a string that is located between two known words in the initial string?
Example:
myString = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we’re done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1, len(mystring) - 1):
    if mystring[i-1] == "the" and mystring[i+1] == "string":
        print(mystring[i])
I suggest using a combination of the split and join methods.
This should help if you are looking for more than one word in the substring.
Turn the string into a list of words:
words = string.split()
Get the indices of your opening and closing markers, then join the words between them:
start = words.index('the')
end = words.index('string', start + 1)
substring = ' '.join(words[start+1:end])
You may want to check that both markers are present before proceeding, since list.index raises a ValueError when a word is missing.
If your problem gets more complex, e.g. multiple occurrences of the marker pair, I suggest using a regular expression:
import re
substrings = re.findall(r'the (.+?) string', string)
re.findall returns the matches as a list, so each substring is stored separately.
The spaces in the pattern keep the surrounding spaces out of the captured text; you can modify this to suit your needs.
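The regular expression idea can be wrapped in a small reusable function. The following is a sketch, not a definitive implementation: the name between is hypothetical, and the markers are matched as plain substrings rather than whole words, so "the" would also match inside "other".

```python
import re


def between(text, start_word, end_word):
    """Return every stretch of text found between start_word and end_word."""
    # re.escape keeps markers that contain regex metacharacters literal;
    # the lazy (.*?) stops at the nearest end_word, and the \s* parts
    # keep surrounding whitespace out of the captured group
    pattern = re.escape(start_word) + r"\s*(.*?)\s*" + re.escape(end_word)
    return re.findall(pattern, text)


searchString = ('this is the initial string but I added the relevant string '
                'pair a few more times into the search string.')
print(between(searchString, 'the', 'string'))
```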

How to remove all characters after first instance of punctuation/blank space?

I have short strings (tweets) in which I must extract all instances of mentions from the text and return a list of these instances including repeats.
extract_mentions('.#AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain')
[AndreaTantaros]
How do I make it so that I remove all text after the first instance of punctuation after '#'? (In this case it would be '-') Note, punctuation can be varied. Please no use of regex.
I have used the following:
def extract_mentions(tweet):
    tweet_list = tweet.split()
    mention_list = []
    for word in tweet_list:
        if '#' in word:
            x = word.index('#')
            y = word[x+1:len(word)]
            if y.isalnum() == False:
                y = word[x+1:-1]
                mention_list.append(y)
            else:
                mention_list.append(y)
    return mention_list
This only works for words with a single extra trailing character.
import string
def extract_mentions(s, delimeters = string.punctuation + string.whitespace):
    mentions = []
    begin = s.find('#')
    while begin >= 0:
        end = begin + 1
        while end < len(s) and s[end] not in delimeters:
            end += 1
        mentions.append(s[begin+1:end])
        begin = s.find('#', end)
    return mentions
>>> print(extract_mentions('.#AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain'))
['AndreaTantaros', 'MakeAmericaGreatAgain']
Use string.punctuation from the string module to get all punctuation characters.
First skip the leading characters while they are punctuation (otherwise the result would always be an empty string), then find the next punctuation character.
This uses two loops with opposite conditions, and a set for faster lookups.
z = ".#AndreaTantaros-supersleuth! You are a true journalistic professional. Keep up the great work! #MakeAmericaGreatAgain"
import string
# skip leading punctuation: find position of first non-punctuation
spun = set(string.punctuation)  # membership tests are faster on a set
start_pos = 0
while start_pos < len(z) and z[start_pos] in spun:
    start_pos += 1
# advance to the next punctuation character (or the end of the string)
end_pos = start_pos
while end_pos < len(z) and z[end_pos] not in spun:
    end_pos += 1
print(z[start_pos:end_pos])
Just use regexp to match and extract part of the text.
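For readers who are allowed to use regular expressions, that last suggestion might look like the sketch below. The function name extract_mentions_re is made up here, and the pattern assumes a mention consists only of ASCII letters and digits, so it stops at any punctuation or whitespace.

```python
import re


def extract_mentions_re(tweet):
    """Return the alphanumeric text that directly follows each '#'."""
    # the capture group keeps only the letters and digits after the '#'
    return re.findall(r"#([A-Za-z0-9]+)", tweet)


tweet = ('.#AndreaTantaros-supersleuth! You are a true journalistic '
         'professional. Keep up the great work! #MakeAmericaGreatAgain')
print(extract_mentions_re(tweet))
```

Unlike the loop-based version, this skips a '#' that is not followed by at least one letter or digit instead of recording an empty string.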

How can I delete the letter that occurs in the two strings using python?

That's the source code:
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            if str_one[i] == str_two[j]:
                str_one = (str_one - str_one[i]).split()
                print(str_one)
            else:
                print('There is no relation')
if __name__ == '__main__':
    str_one = input('Put your First String: ').split()
    str_two = input('Put your Second String: ')
    print(revers_e(str_one, str_two))
How can I remove a letter that occurs in both strings from the first string then print it?
How about a simple Pythonic way of doing it:
def revers_e(s1, s2):
    print(*[i for i in s1 if i in s2])  # Print all characters to be deleted from s1
    s1 = ''.join([i for i in s1 if i not in s2])  # Delete them from s1
    return s1  # Return the result, since reassigning s1 does not change the caller's string
This answer says, "Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
First of all, you don't need the fairly suboptimal combination of range and len to iterate over a string. Strings are iterable, so you can iterate over them directly with a simple loop.
To find the intersection of two strings, you can use set.intersection, which returns all the characters common to both strings, and then use str.translate to remove those common characters:
intersect=set(str_one).intersection(str_two)
trans_table = dict.fromkeys(map(ord, intersect), None)
str_one.translate(trans_table)
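Wrapped in a function, the snippet above could be used like this. It is a minimal sketch for Python 3; the name remove_common is only illustrative.

```python
def remove_common(first, second):
    """Return first without any character that also occurs in second."""
    common = set(first).intersection(second)
    # map each common character's code point to None so translate() deletes it
    table = dict.fromkeys(map(ord, common), None)
    return first.translate(table)


print(remove_common('abcdefg', 'efghijk'))
```

Because strings are immutable, translate returns a new string, so the caller has to use the return value.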
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            try:
                if str_one[i] == str_two[j]:
                    first_part=str_one[0:i]
                    second_part=str_one[i+1:]
                    str_one =first_part+second_part
                    print(str_one)
                else:
                    print('There is no relation')
            except IndexError:
                return
str_one = input('Put your First String: ')
str_two = input('Put your Second String: ')
revers_e(str_one, str_two)
I've modified your code, taking out a few bits and adding a few more.
str_one = input('Put your First String: ').split()
I removed the .split(), because all this would do is create a list of length 1, so in your loop, you'd be comparing the entire string of the first string to one letter of the second string.
str_one = (str_one - str_one[i]).split()
You can't remove a character from a string like this in Python, so I split the string into two parts: all the characters before the matching character, and all the characters after it. These are then concatenated into one string. (You could also convert the strings into lists, as I did in other code that I have since deleted.)
I used a try/except block because the outer loop uses the original length of str_one, but the string gets shorter as characters are removed, so an index can run past the end.
Lastly, I just called the function instead of printing its result, because the function returns None.
The set and list-comprehension examples work in Python 2.7+ and Python 3; str.maketrans is Python 3 only.
Given:
>>> s1='abcdefg'
>>> s2='efghijk'
You can use a set:
>>> set(s1).intersection(s2)
{'f', 'e', 'g'}
Then use that set in maketrans to make a translation table to None to delete those characters:
>>> s1.translate(str.maketrans({e:None for e in set(s1).intersection(s2)}))
'abcd'
Or use list comprehension:
>>> ''.join([e for e in s1 if e in s2])
'efg'
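And a regex to produce a new string without the common characters. The common characters go inside a character class, so they are removed even when they are not adjacent in s1:
>>> import re
>>> common = ''.join([e for e in s1 if e in s2])
>>> re.sub('[' + re.escape(common) + ']', '', s1)
'abcd'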

Resources