removing the matched character in string - python-3.x

original_string = "helloworld"
characters_to_remove = "world"
for character in characters_to_remove:
    if original_string.find(character) == -1:
        continue
    else:
        # remove matched character
        original_string = original_string.replace(character, '', 1)
print(original_string)
Expected output: hello
(But the output I am getting is: helol.) Can anyone resolve this issue?

The problem is that the 'o' and 'l' in 'world' also appear in 'hello'. Your loop goes through 'w', 'o', 'r', 'l', 'd' exactly once, and replace(character, '', 1) removes only the first occurrence of each character. So the code removes the 'o' and the first 'l' of 'hello' and leaves the 'o' and 'l' of 'world' in place, which gives 'helol'.
A far better way to do this is to remove the whole substring at once, using a regex or one of Python's many string functions, for example:
import re
re.sub('world', '', "helloworld")

Without the re module you can use the find and replace functions to remove the partial string:
original_string = "helloworld"
characters_to_remove = "world"
pos = original_string.find(characters_to_remove)
if pos != -1:  # find returns -1 when the substring is not present
    original_string = original_string.replace(original_string[pos:pos + len(characters_to_remove)], "")
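To make the difference between the two approaches concrete, here is a minimal sketch that runs the per-character loop from the question next to a whole-substring removal. The function names are made up for illustration and are not from either answer.

```python
def remove_chars_one_by_one(text, chars):
    """Question's approach: drop the first occurrence of each character in turn."""
    for ch in chars:
        text = text.replace(ch, "", 1)
    return text


def remove_substring(text, sub):
    """Drop the first occurrence of the whole substring instead."""
    return text.replace(sub, "", 1)


print(remove_chars_one_by_one("helloworld", "world"))  # helol
print(remove_substring("helloworld", "world"))         # hello
```

The first function eats the 'o' and the first 'l' of 'hello' because those are the first matches it finds; the second one only touches the contiguous 'world'.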

Related

Removing Characters With Regular Expression in List Comprehension in Python

I am learning python and I am trying to do some text preprocessing and I have been reading and borrowing ideas from Stackoverflow. I was able to come up with the following formulations below, but they don't appear to do what I was expecting, and they don't throw any errors either, so I'm stumped.
First, in a Pandas dataframe column, I am trying to remove the third consecutive character in a word; it's kind of like running a spell check on words that are supposed to have two consecutive characters instead of three:
buttter = butter
bettter = better
ladder = ladder
The code I used is below:
import re
docs['Comments'] = [c for c in docs['Comments'] if re.sub(r'(\w)\1{2,}', r'\1', c)]
In the second instance, I just want to replace repeated punctuation marks with the last one.
????? = ?
..... = .
!!!!! = !
---- = -
***** = *
And the code I have for that is:
docs['Comments'] = [i for i in docs['Comments'] if re.sub(r'[\?\.\!\*]+(?=[\?\.\!\*])', '', i)]
It looks like you want to use
docs['Comments'] = (
    docs['Comments'].str.replace(r'(\w)\1{2,}', r'\1\1', regex=True)
    .str.replace(r'([^\w\s]|_)(\1)+', r'\2', regex=True)
)
The r'(\w)\1{2,}' regex matches a word character repeated three or more times, and \1\1 replaces the whole run with two copies of that character.
The r'([^\w\s]|_)(\1)+' regex matches a repeated punctuation character and captures the last repetition in Group 2, so \2 replaces the match with a single punctuation character.
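A quick way to check the two patterns is to run them with plain re.sub on single strings, without pandas. The sketch below does that; squeeze_repeats is a made-up helper name. It also hints at why the original list comprehensions changed nothing: the if clause of a comprehension only filters items, and the string returned by re.sub was thrown away.

```python
import re


def squeeze_repeats(text):
    # Three or more of the same word character collapse to two.
    # Note: \w also matches digits and underscores, so "1000" would become "100".
    text = re.sub(r'(\w)\1{2,}', r'\1\1', text)
    # A run of the same punctuation character collapses to one.
    return re.sub(r'([^\w\s]|_)(\1)+', r'\2', text)


for sample in ["buttter", "bettter", "ladder", "what?????", "wait....."]:
    print(sample, "->", squeeze_repeats(sample))
```

If the column can contain numbers, a narrower class such as [a-zA-Z] in place of \w in the first pattern may be safer; that is a suggestion, not something the answer above states.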

Caesar Cipher in Python - how to replace characters

I'm trying to re-arrange long sentence from a puzzle that is encoded using a Caesar Cipher.
Here is my code.
sentence="g fmnc wms bgblr rpylqjyrc gr zw fylb. rfyrq ufyr amknsrcpq ypc dmp. bmgle gr gl zw fylb gq glcddgagclr ylb rfyr'q ufw rfgq rcvr gq qm jmle. sqgle qrpgle.kyicrpylq() gq pcamkkclbcb. lmu ynnjw ml rfc spj."
import string
a=string.ascii_lowercase
b=a[2:]+a[:2]
for i in range(26):
    sentence.replace(sentence[sentence.find(a[i])],b[i])
Am I missing anything in the replace function?
When I tried sentence.replace(sentence[sentence.find(a[0])],b[0])
it worked, but why can't I loop through it?
Thanks.
sentence.replace returns a new string, which you are immediately throwing away. Note that replacing each character in turn will also re-replace characters that were already substituted. See @RemcoGerlich's answer for a more detailed explanation of what is wrong. As for the solution, what about:
import string
letters = string.ascii_lowercase
shifted = {l: letters[(i + 2) % len(letters)] for i, l in enumerate(letters)}
sentence = ''.join(shifted.get(c, c) for c in sentence.lower())
or if you really want the tabled way:
from string import ascii_lowercase
rotated_lowercase = ascii_lowercase[2:] + ascii_lowercase[:2]
translation_table = str.maketrans(ascii_lowercase, rotated_lowercase)
sentence = sentence.translate(translation_table)
There are a few problems:
One, sentence[sentence.find(a[i])] is strange. It looks up where in the sentence the character a[i] occurs, and then looks up which character is there. Well, you already know: a[i]. Unless that character doesn't occur in the string; then .find returns -1, and sentence[-1] is the last character in the sentence. Probably not what you meant. So instead you meant sentence.replace(a[i], b[i]).
But you don't save the result anywhere. You meant sentence = sentence.replace(a[i], b[i]).
But that still doesn't work! What if a should be changed into b, and then b into c? Then the original a's are also changed into c! That's a fundamental problem with your approach.
Better solutions are given by modesitt. Mine would have been something like
lookupdict = {a_char: b_char for (a_char, b_char) in zip(a, b)}
sentence_translated = [lookupdict.get(s, s) for s in sentence]  # keep spaces and punctuation unchanged
sentence = ''.join(sentence_translated)
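The "a into b, then b into c" problem is easy to see on a short input. The sketch below is only an illustration with made-up function names: it runs the fixed-up sequential replace loop next to a translation table, both with a shift of 2 as in the question.

```python
import string

LETTERS = string.ascii_lowercase
SHIFTED = LETTERS[2:] + LETTERS[:2]


def shift_sequentially(text):
    """Flawed approach: later replacements also hit letters produced by earlier ones."""
    for src, dst in zip(LETTERS, SHIFTED):
        text = text.replace(src, dst)
    return text


def shift_with_table(text):
    """Each character is looked up exactly once in a translation table."""
    return text.translate(str.maketrans(LETTERS, SHIFTED))


print(shift_sequentially("abc xyz"))  # letters get shifted over and over
print(shift_with_table("abc xyz"))    # cde zab
```

The table version leaves spaces and punctuation alone because characters missing from the table are passed through unchanged.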

re.sub replacing string using original sub-string

I have a text file. I would like to remove all decimal points and their trailing numbers, unless text is preceding.
e.g 12.29,14.6,8967.334 should be replaced with 12,14,8967
e.g happypants2.3#email.com should not be modified.
My code is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r',\d+[.]\d+', r'\d+',txt1)
print(txt1)
Unless there is an easier way of completing this, how do I modify r'\d+' so it just returns the number without a decimal place?
You need to make use of groups in your regex. You put the digits before the '.' into parentheses, and then you can use '\1' to refer to them later:
txt1 = re.sub(r',(\d+)[.]\d+', r',\1',txt1)
Note that in your attempted replacement code you forgot to replace the comma, so your numbers would have been glommed together. This still isn't perfect though; the first number, since it doesn't begin with a comma, isn't processed.
Instead of checking for a comma, the better way is to check word boundaries, which can be done using \b. So the solution is:
import re
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt1 = re.sub(r'\b(\d+)[.]\d+\b', r'\1',txt1)
print(txt1)
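As a small check of the word-boundary version, the sketch below wraps the same pattern in a helper (strip_decimals is a made-up name) and runs it on both inputs from the question. Note that it truncates rather than rounds, so 88.7 becomes 88.

```python
import re

# \b before the digits means the number must not be glued to a preceding letter or digit
DECIMAL_TAIL = re.compile(r'\b(\d+)[.]\d+\b')


def strip_decimals(text):
    """Drop the decimal point and trailing digits from standalone numbers."""
    return DECIMAL_TAIL.sub(r'\1', text)


print(strip_decimals("12.29,14.6,8967.334"))
print(strip_decimals("9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"))
```

In 'morris1.43' there is no word boundary between 's' and '1', so the pattern cannot start there and the address is left alone.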
Considering these are the only two types of string present in your file, you can explicitly check for these conditions.
This may not be an efficient way, but what I have done is split the str and check if each piece contains #email.com. If that's true, I just append it to a new list. To satisfy your first condition, we can convert the str to float and then to int, which drops the decimal part.
If you want everything back in a str variable, you can use .join().
Code:
txt1 = "9.9,8.8,22.2,88.7,morris1.43#email.com,chat22.3#email.com,123.6,6.54"
txt_list = []
for i in txt1.split(','):
    if '#email.com' in i:
        txt_list.append(i)
    else:
        txt_list.append(str(int(float(i))))
txt_new = ",".join(txt_list)
txt_new
Output:
'9,8,22,88,morris1.43#email.com,chat22.3#email.com,123,6'

delete characters that are not letters, numbers, whitespace?

community,
I need to clean a string, so that it will contain only letters, numbers and whitespace.
The string currently consists of different sentences.
I tried:
for entry in s:
    if not isalpha() or isdigit() or isspace:
        del (entry)
    else: s.append(entry) # the wanted characters should be saved in the string, the rest should be deleted
I am using python 3.4.0
You can use this:
clean_string = ''.join(c for c in s if c.isalnum() or c.isspace())
It iterates through each character, leaving you only with the ones that satisfy at least one of the two criteria, then joins them all back together. I am using isalnum() to check for alphanumeric characters, rather than both isalpha() and isdigit() separately.
You can achieve the same thing using a filter. In Python 3, filter returns an iterator rather than a string, so join the result:
clean_string = ''.join(filter(lambda c: c.isalnum() or c.isspace(), s))
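Here is a small self-contained sketch of both variants so they can be compared on a sample string. The function names are made up for illustration, and the filter version is wrapped in ''.join because filter yields an iterator in Python 3.

```python
def keep_alnum_and_space(text):
    """Generator-expression version."""
    return ''.join(c for c in text if c.isalnum() or c.isspace())


def keep_alnum_and_space_filter(text):
    """filter() version; the result is joined back into a string."""
    return ''.join(filter(lambda c: c.isalnum() or c.isspace(), text))


sample = "Hello, World! It's 42 degrees."
print(keep_alnum_and_space(sample))
print(keep_alnum_and_space_filter(sample))
```

Keep in mind that isalnum() is Unicode-aware, so accented letters such as 'é' are kept as well.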
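The or does not work the way you think it works in English. Each check must also be called as a method on entry (entry.isalpha(), not isalpha()), and the wanted characters have to be collected in a new string, since strings are immutable. Instead, you should do:
new_s = ''
for entry in s:
    if entry.isalpha() or entry.isdigit() or entry.isspace():
        new_s += entry
print(new_s)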
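To see what the or in the question's condition actually does, the sketch below assumes the three checks were meant as method calls on the character and compares the condition as written with the parenthesised version. Both function names are hypothetical.

```python
def buggy_is_unwanted(c):
    # Python parses this as (not c.isalpha()) or c.isdigit() or c.isspace(),
    # so the "not" applies only to the first check.
    return not c.isalpha() or c.isdigit() or c.isspace()


def is_unwanted(c):
    # Parentheses make "not" apply to the whole alternative.
    return not (c.isalpha() or c.isdigit() or c.isspace())


for ch in ["a", "7", " ", "!"]:
    print(repr(ch), buggy_is_unwanted(ch), is_unwanted(ch))
```

With the condition as written, digits and whitespace are flagged for deletion too, which is the opposite of what was intended.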

Convert underscores to spaces in Matlab string?

So say I have a string with some underscores like hi_there.
Is there a way to auto-convert that string into "hi there"?
(the original string, by the way, is a variable name that I'm converting into a plot title).
Surprising that no-one has yet mentioned strrep:
>> strrep('string_with_underscores', '_', ' ')
ans =
string with underscores
which should be the standard way to do simple string replacements. For such a simple case, regexprep is overkill: yes, regular expressions are Swiss Army knives that can do everything possible, but they come with a long manual. The string indexing shown by @AndreasH only works for replacing single characters; it cannot do this:
>> s = 'string*-*with*-*funny*-*separators';
>> strrep(s, '*-*', ' ')
ans =
string with funny separators
>> s(s=='*-*') = ' '
Error using ==
Matrix dimensions must agree.
As a bonus, it also works for cell-arrays with strings:
>> strrep({'This_is_a','cell_array_with','strings_with','underscores'},'_',' ')
ans =
'This is a' 'cell array with' 'strings with' 'underscores'
Try this Matlab code for a string variable 's'
s(s=='_') = ' ';
If you ever have to do anything more complicated, say replacing multiple variable-length strings, s(s == '_') = ' ' will be a huge pain. In that case, consider using regexprep:
>> regexprep({'hi_there', 'hey_there'}, '_', ' ')
ans =
'hi there' 'hey there'
That being said, in your case @AndreasH.'s solution is the most appropriate and regexprep is overkill.
A more interesting question is why you are passing variables around as strings?
regexprep() may be what you're looking for and is a handy function in general.
regexprep('hi_there','_',' ')
This takes the first argument string and replaces instances of the second argument with the third. In this case it replaces all underscores with a space.
In Matlab, strings are character vectors, so simple string manipulations can be done with standard operators, e.g. replacing _ with whitespace:
text = 'variable_name';
text(text=='_') = ' '; % replace all occurrences of underscore with whitespace
=> text = variable name
I know this was already answered; however, in my case I was looking for a way to correct plot titles so that I could include a filename (which could have underscores). I wanted to print the titles with the underscores NOT displayed as subscripts. So, using the great info above, rather than substituting a space, I escaped the underscore in the substitution.
For example:
% Have the user select a file:
[infile inpath]=uigetfile('*.txt','Get some text file');
figure
% this is a problem for filenames with underscores
title(infile)
% this correctly displays filenames with underscores
title(strrep(infile,'_','\_'))
