Python 2.7 - remove special characters from a string and camelCasing it - string

Desired output:
My code:
def to_camel_case(text):
    lst =['_', '-']
    if text is None:
        return ''
    for char in text:
        if text in lst:
            text = text.replace(char, '').title()
            return text
1) The input could be an empty string - the above code does not return '' but None;
2) I am not sure that the title() method can help me obtain the desired output (only the first letter of each word separated by '-' or '_' in caps, except for the first word).
I prefer not to use regex if possible.

A better way to do this is to split the string and rejoin the pieces with a comprehension. The problem with the for loop is that you rebind text inside it, so the string you are checking is no longer the one you are iterating over. It's also hard to capitalize the next letter after replacing a _ or - because, looking at one character at a time, you don't have any context about what came before or after.
def to_camel_case(text):
    # None or an empty string: nothing to convert
    if not text:
        return ""
    # Split also removes the characters
    # Start by converting - to _, then splitting on _
    l = text.replace('-','_').split('_')
    # Break the list into two parts
    first = l[0]
    rest = l[1:]
    return first + ''.join(word.capitalize() for word in rest)
And our result:
print to_camel_case("hello-world")
Gives helloWorld
This method is quite flexible, and can even handle cases like "hello_world-how_are--you--", which could be difficult using regex if you're new to it.
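To check that claim, here is a minimal sketch of the same split-and-capitalize idea run on the messier input. It is written so it also runs on Python 3, and the guard for None and '' is an assumption about what the asker wants rather than part of the answer above.

```python
def to_camel_case(text):
    # Assumption: None and '' are both treated as "nothing to convert"
    if not text:
        return ""
    words = text.replace('-', '_').split('_')
    # Empty pieces (from '--' or trailing separators) capitalize to '',
    # so they simply vanish when the parts are joined
    return words[0] + ''.join(word.capitalize() for word in words[1:])

print(to_camel_case("hello_world-how_are--you--"))  # helloWorldHowAreYou
```

Note that capitalize() also lowercases the rest of each word, so "hello-wORLD" becomes helloWorld.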


Replace string only if all characters match (Thai)

The problem is that มาก technically is in มาก็, because มาก็ is มาก + ็.
So when I do
"แชมพูมาก็เยอะ".replace("มาก", " X ")
I end up with
แชมพู X ็เยอะ
And what I want
แชมพู X เยอะ
What I really want is to force the last character ก็ to count as a single character, so that มาก no longer matches มาก็.
While I haven't found a proper solution, I was able to find a workaround. I split each string into separate (combined) characters via regex, then check whether one list of characters occurs inside the other.
import regex  # third-party module; the \X pattern below is not supported by the standard re module
# Check if list s occurs as a contiguous slice of list l
def is_slice_in_list(s,l):
    len_s = len(s) #so we don't recompute length of s on every iteration
    return any(s == l[i:len_s+i] for i in range(len(l) - len_s+1))
def is_word_in_string(w, s):
    # \X matches one combined character (a base character plus its marks)
    a = regex.findall(r'\X', w)
    b = regex.findall(r'\X', s)
    return is_slice_in_list(a, b)
assert is_word_in_string("มาก็", "พูมาก็เยอะ") == True
assert is_word_in_string("มาก", "พูมาก็เยอะ") == False
The regex will split like this:
พู ม า ก็ เ ย อ ะ
ม า ก
And as it compares ก็ to ก the function figures the words are not the same.
I will mark this as answered, but if there is a nicer or "proper" solution I will choose that one.
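For the replace itself, one possible sketch that needs only the standard library is to skip any match that is immediately followed by a combining mark. The helper names is_combining_mark and replace_whole_clusters are made up for this example, and treating every character in a Unicode "M" category as part of the previous character is an approximation, not full grapheme segmentation. It also only checks the end of the match, so it assumes the search word does not itself start with a mark.

```python
import unicodedata

def is_combining_mark(ch):
    # General categories Mn, Mc and Me are the Unicode combining marks
    return unicodedata.category(ch).startswith('M')

def replace_whole_clusters(text, word, repl):
    """Replace word only where the match is not followed by a combining mark."""
    if not word:
        return text
    out = []
    i = 0
    while i < len(text):
        end = i + len(word)
        followed_by_mark = end < len(text) and is_combining_mark(text[end])
        if text.startswith(word, i) and not followed_by_mark:
            out.append(repl)
            i = end
        else:
            out.append(text[i])
            i += 1
    return ''.join(out)

# มาก is followed by ็ here, so nothing is replaced
print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก", " X "))
# มาก็ matches as a whole and is replaced
print(replace_whole_clusters("แชมพูมาก็เยอะ", "มาก็", " X "))
```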

How can I slice and keep text in a list every specific character?

I used BeautifulSoup and I got a result from .get_text(). The result contains a long text:
alpha = ['\n\n\n\nIntroduction!!\nGood\xa0morning.\n\n\n\nHow\xa0are\xa0you?\n\n']
It can be noticed that the number of \n is not the same, and there are \xa0 for spacing.
I want to slice every group of \n (\n\n or \n\n\n or \n\n\n\n ) and replace \xa0 with a space in a new list, to look like this:
beta = ['Introduction!!','Good morning.','How are you?']
How can I do it?
Thank you in advance.
I wrote a little script that solves your problem:
alpha = ['\n\n\n\nIntroduction!!\nGood\xa0morning.\n\n\n\nHow\xa0are\xa0you?\n\n']
beta = []
for s in alpha:
    # Turning the \xa0 into spaces
    s = s.replace('\xa0',' ')
    # Breaking the string by \n
    s = s.split('\n')
    # Explanation 1
    s = list(filter(lambda x: x != '', s))
    # Explanation 2
    beta = beta + s
Explanation 1
As there are sequences of consecutive \n inside the alpha string, split() will generate some empty strings. The filter() that I wrote removes them from the list.
Explanation 2
When the s string is split, it turns into a list of strings. We then need to concatenate that list onto beta.
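The same two steps (split on \n, drop the empty pieces) can also be sketched as a single list comprehension. This is just a compact variant of the loop above, using the alpha value from the question:

```python
alpha = ['\n\n\n\nIntroduction!!\nGood\xa0morning.\n\n\n\nHow\xa0are\xa0you?\n\n']

# For every string in alpha: split on newlines, skip the empty pieces,
# and turn each non-breaking space (\xa0) into a normal space
beta = [line.replace('\xa0', ' ')
        for s in alpha
        for line in s.split('\n')
        if line]

print(beta)  # ['Introduction!!', 'Good morning.', 'How are you?']
```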

Is there a way to substring, which is between two words in the string in Python?

My question is more or less similar to:
Is there a way to substring a string in Python?
but it's more specifically oriented.
How can I get a part of a string which is located between two known words in the initial string?
myString = "this is the initial string"
Substring = "initial"
knowing that "the" and "string" are the two known words in the string that can be used to get the substring.
Thank you!
You can start with simple string manipulation here. str.index is your best friend there, as it will tell you the position of a substring within a string; and you can also start searching somewhere later in the string:
>>> myString = "this is the initial string"
>>> myString.index('the')
8
>>> myString.index('string', 8)
20
Looking at the slice [8:20], we already get close to what we want:
>>> myString[8:20]
'the initial '
Of course, since we found the beginning position of 'the', we need to account for its length. And finally, we might want to strip whitespace:
>>> myString[8 + 3:20]
' initial '
>>> myString[8 + 3:20].strip()
'initial'
Combined, you would do this:
startIndex = myString.index('the')
substring = myString[startIndex + 3 : myString.index('string', startIndex)].strip()
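The combined version can be wrapped in a small helper. This is only a sketch: the name between is made up here, and it uses str.find (which returns -1 instead of raising ValueError) so that a missing marker yields None rather than an exception.

```python
def between(text, start_word, end_word):
    """Return the text between start_word and end_word, or None if either is missing."""
    start = text.find(start_word)
    if start == -1:
        return None
    start += len(start_word)  # skip past the start word itself
    end = text.find(end_word, start)
    if end == -1:
        return None
    return text[start:end].strip()

print(between("this is the initial string", "the", "string"))  # initial
```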
If you want to look for matches multiple times, then you just need to repeat doing this while looking only at the rest of the string. Since str.index will only ever find the first match, you can use this to scan the string very efficiently:
searchString = 'this is the initial string but I added the relevant string pair a few more times into the search string.'
startWord = 'the'
endWord = 'string'
results = []
index = 0
while True:
    try:
        startIndex = searchString.index(startWord, index)
        endIndex = searchString.index(endWord, startIndex)
        results.append(searchString[startIndex + len(startWord):endIndex].strip())
        # move the index to the end
        index = endIndex + len(endWord)
    except ValueError:
        # str.index raises a ValueError if there is no match; in that
        # case we know that we're done looking at the string, so we can
        # break out of the loop
        break
print(results)
# ['initial', 'relevant', 'search']
You can also try something like this:
mystring = "this is the initial string"
mystring = mystring.strip().split(" ")
for i in range(1,len(mystring)-1):
    if(mystring[i-1] == "the" and mystring[i+1] == "string"):
        print(mystring[i])  # the single word between the two markers
I suggest using a combination of list, split and join methods.
This should help if you are looking for more than 1 word in the substring.
Turn the string into a list of words:
words = string.split()
Get the index of your opening and closing markers then return the substring:
start = words.index('the')
end = words.index('string')
substring = ' '.join(words[start+1:end])
You may want to check that both markers are present, and in the right order, before slicing (words.index raises a ValueError when a word is missing).
If your problem gets more complex, e.g. multiple occurrences of the pair of markers, I suggest using a regular expression.
import re
substring = ''.join(re.findall(r'the (.+?) string', string))
re.findall returns the matches as a list, so drop the ''.join if you want to keep the substrings separate.
The spaces in the pattern around the group keep the surrounding spaces out of the match; you can modify this to your needs as well.
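As a rough sketch of the regex route for several occurrences, the helper below (the name find_between is hypothetical) escapes both markers with re.escape and adds \b word boundaries, so that "the" does not match inside a longer word such as "other". The example sentence is the one used earlier in this thread.

```python
import re

def find_between(text, start_word, end_word):
    # Non-greedy group: stop at the first end_word after each start_word
    pattern = (r'\b' + re.escape(start_word) + r'\b\s*(.+?)\s*\b'
               + re.escape(end_word) + r'\b')
    return re.findall(pattern, text)

search_string = ('this is the initial string but I added the relevant string '
                 'pair a few more times into the search string.')
print(find_between(search_string, 'the', 'string'))
# ['initial', 'relevant', 'search']
```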

How can I delete the letter that occurs in the two strings using python?

That's the source code:
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            if str_one[i] == str_two[j]:
                str_one = (str_one - str_one[i]).split()
            print('There is no relation')
if __name__ == '__main__':
    str_one = input('Put your First String: ').split()
    str_two = input('Put your Second String: ')
    print(revers_e(str_one, str_two))
How can I remove a letter that occurs in both strings from the first string then print it?
How about a simple, Pythonic way of doing it:
def revers_e(s1, s2):
    print(*[i for i in s1 if i in s2]) # Print all characters to be deleted from s1
    s1 = ''.join([i for i in s1 if i not in s2]) # Delete them from s1
    print(s1) # Print what is left of s1
This answer says, "Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
First of all, you don't need range and len to iterate over a string. Strings are iterable, so you can iterate over them directly with a simple loop.
To find the characters the two strings have in common, you can use set.intersection, which returns all the common characters. Then build a translation table that maps each of them to None and pass it to str_one.translate(trans_table) to remove them:
intersect = set(str_one).intersection(str_two)
trans_table = dict.fromkeys(map(ord, intersect), None)
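Put together, the intersection-plus-translate idea could look like the sketch below (Python 3). The function name remove_common is made up for illustration, and it returns the new string instead of printing it.

```python
def remove_common(str_one, str_two):
    # Characters that occur in both strings
    common = set(str_one).intersection(str_two)
    # Map each common character's code point to None, which str.translate deletes
    table = dict.fromkeys(map(ord, common), None)
    return str_one.translate(table)

print(remove_common('abcdefg', 'efghijk'))  # abcd
```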
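def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            try:
                if str_one[i] == str_two[j]:
                    # keep everything before and after the matching character
                    first_part = str_one[:i]
                    second_part = str_one[i+1:]
                    str_one =first_part+second_part
                    print(str_one)
                else:
                    print('There is no relation')
            except IndexError:
                # str_one got shorter than the original range, so stop
                return
str_one = input('Put your First String: ')
str_two = input('Put your Second String: ')
revers_e(str_one, str_two)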
I've modified your code, taking out a few bits and adding a few more.
str_one = input('Put your First String: ').split()
I removed the .split(), because all this would do is create a list of length 1, so in your loop, you'd be comparing the entire string of the first string to one letter of the second string.
str_one = (str_one - str_one[i]).split()
You can't remove a character from a string like this in Python, so I split the string into two parts (you could also convert it into a list, like I did in my other code which I deleted): all the characters before the matching character, and all the characters after it. The two parts are then joined back into one string.
I used exception handling because the outer loop uses the original length of the string, but the string gets shorter as characters are removed, so an index can end up out of range.
Lastly, I just called the function instead of printing its result, because the function returns None, so printing it would only show None.
These work in Python 3 (str.maketrans does not exist in Python 2.7):
>>> import re
>>> s1='abcdefg'
>>> s2='efghijk'
You can use a set (the order of the elements may vary):
>>> set(s1).intersection(s2)
{'f', 'e', 'g'}
Then use that set in maketrans to make a translation table to None to delete those characters:
>>> s1.translate(str.maketrans({e:None for e in set(s1).intersection(s2)}))
'abcd'
Or use a list comprehension to collect the common characters in order:
>>> ''.join([e for e in s1 if e in s2])
'efg'
And a regex character class to produce a new string without the common characters (this assumes there is at least one common character):
>>> re.sub('[' + re.escape(''.join(set(s1).intersection(s2))) + ']', '', s1)
'abcd'

parsing words in a document using specific delimiters

I have a document that I'm parsing words from but I want to consider anything that is not a-z, A-Z, 0-9, or an apostrophe, to be white space. How could I do this if I am using the following bit of code before:
ifstream file;
while(file >> word){
    listOfWords.push_back(word); // I want to make sure only words with the stated
                                 // range of characters exist in my list.
}
So, for example, a word with a disallowed character between "hor" and "se" would be two elements in my list, "hor" and "se".
Create a list of "whitespace characters" and then each time you encounter a character, check to see if that character is in the list and if so you've started a new word. This example is written in python, but the concept is the same.
def get_words(whitespace_chars, string):
    words = []
    current_word = ""
    for x in range(0, len(string)):
        if(string[x] in whitespace_chars):
            #check to see if we hit the end of a word.
            if(current_word != ""):
                words.append(current_word)
                current_word = ""
        else:
            #add current letter to current word.
            current_word += string[x]
    #if the last letter isnt whitespace then the last word wont be added, so add here.
    if(current_word != ""):
        words.append(current_word)
    return words
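Since the question describes the allowed characters rather than the delimiters, the check can also be inverted: anything outside a-z, A-Z, 0-9 and the apostrophe ends the current word. The sketch below stays in Python like the example above; WORD_CHARS and split_words are names invented for this illustration, and the same loop should carry over to C++ character by character.

```python
import string

# Characters that may appear inside a word; everything else acts as whitespace
WORD_CHARS = set(string.ascii_letters + string.digits + "'")

def split_words(text):
    words = []
    current = []
    for ch in text:
        if ch in WORD_CHARS:
            current.append(ch)
        elif current:
            # A delimiter ends the current word; runs of delimiters are skipped
            words.append(''.join(current))
            current = []
    if current:
        # The text ended in the middle of a word
        words.append(''.join(current))
    return words

print(split_words("hor-se isn't 42!"))  # ['hor', 'se', "isn't", '42']
```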
