I am wondering how to combine the following functions into one. The functions remove an entire word if it contains "_" or "/", respectively.
I have tried the following, and the code fulfils its purpose. It is, however, cumbersome, and I am wondering how to simplify it.
text = "This is _a default/ text"

def filter_string1(string):
    a = []
    for i in string.split():
        if "_" not in i:
            a.append(i)
    return ' '.join(a)

def filter_string2(string):
    a = []
    for i in string.split():
        if "/" not in i:
            a.append(i)
    return ' '.join(a)

text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash)
The output is (as desired):
"This is text"
You can combine the if conditions.
text = "This is _a default/ text"

# Named filter_words rather than filter to avoid shadowing the built-in filter()
def filter_words(string):
    a = []
    for i in string.split():
        if "_" not in i and "/" not in i:
            a.append(i)
    return ' '.join(a)

print(filter_words(text))
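If more characters need to be excluded later, the combined condition can be generalized with any(). This is only a sketch; the function name remove_words_containing and its chars parameter are made up for illustration:

```python
def remove_words_containing(text, chars=("_", "/")):
    """Drop every whitespace-separated word that contains any of `chars`."""
    kept = [word for word in text.split()
            if not any(c in word for c in chars)]
    return " ".join(kept)


print(remove_words_containing("This is _a default/ text"))
```

Adding another character then only means extending the chars tuple, not writing another condition.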
There is a function called re.sub in Python's re module which will let you accomplish this quickly.
import re

def remove_words(text):
    return re.sub(
        pattern=r'\s_[^\/]*\/',  # regular expression used to match the parts to remove
        repl='',                # replace matched parts with empty string
        string=text             # use `text` as input
    )
Explaining the regular expression \s_[^\/]*\/ (by deconstructing its parts):
\s_ match a whitespace character followed by an underscore
[^\/]* match any character sequence not containing a forward slash (sequence may be length 0)
\/ match the forward slash
Note that this removes everything from an underscore-prefixed word up to the next forward slash, including any words in between.
Testing the function:
text = "This is _a default/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
text = "This is _a longer/ _and also custom/ text"
text_no_underscore_no_slash = remove_words(text)
print('Result:', text_no_underscore_no_slash)
# Result: This is text
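By the way, your original code behaves differently on this input: it drops only the words that themselves contain "_" or "/", so "also" is kept. Depending on what you want, that may be a bug.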
text = "This is _a longer/ _and also custom/ text"
text_no_underscore = filter_string1(text)
text_no_underscore_no_slash = filter_string2(text_no_underscore)
print(text_no_underscore_no_slash == 'This is text')
# False
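If the per-word behaviour of the original functions is what is wanted, a regular expression can do that too. The following is a sketch under that assumption; the name remove_marked_words is hypothetical and not part of either snippet above:

```python
import re


def remove_marked_words(text):
    # \S*[_/]\S* matches one whole whitespace-delimited word containing "_" or "/";
    # the trailing \s* also consumes the whitespace that follows the word.
    # strip() removes the space left behind when the last word is dropped.
    return re.sub(r'\S*[_/]\S*\s*', '', text).strip()


print(remove_marked_words("This is _a longer/ _and also custom/ text"))
```

Here "also" survives, because only the words that themselves contain one of the two characters are removed.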
Related
Write a function fun(long_string) with one string parameter that returns a string. The function should extract the words separated by a single space " ", exclude/drop empty words as well as words that are equal to "end" and "exit", convert the remaining words to upper case, join them with the joining token string ";" and return this newly joined string.
My code is:
def fun(long_string):
    long_string = long_string.split(' ')
    try:
        if 'exit' in long_string:
            long_string.remove('exit')
        elif 'end' in long_string:
            long_string.remove('end')
    except ValueError:
        pass
.....................
But it does not remove "end" or "exit". Can someone please help me figure it out? I am a beginner in Python and I am stuck here.
You could try this and convert it into a function as you wish; it's very straightforward.
The code has not been fully tested (it works for your input), so please try different inputs and improve it to meet your requirements. Please ask if you have any questions.
inputs = "this is a long test exit string"
stop_words = ('end', 'exit')
# keep only whole words that are not stop words; split() also drops empty words
outs = [word for word in inputs.split() if word not in stop_words]
ans = ';'.join(w.upper() for w in outs) # do the final conversion
Confirm it:
assert ans == "THIS;IS;A;LONG;TEST;STRING" # silent means True
Edit: add function:
def fun(long_string):
    #s = "this is a long test exit string"
    stop_words = ('end', 'exit')
    # keep only whole words that are not stop words; split() also drops empty words
    outs = [word for word in long_string.split() if word not in stop_words]
    ans = ';'.join(w.upper() for w in outs)
    return ans

text = "this is a long test exit string"
print(fun(text))
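One pitfall worth noting with this kind of task: str.replace works on substrings, so it would also alter words such as "pending", and list.remove only removes the first occurrence. A sketch that compares whole words instead is shown here; the name join_upper_without and its parameters are hypothetical, chosen only for illustration:

```python
def join_upper_without(long_string, stop_words=("end", "exit"), token=";"):
    # Split on a single space, as the task describes; consecutive spaces
    # produce empty strings, which the `if w` test drops.
    words = long_string.split(" ")
    kept = [w.upper() for w in words if w and w not in stop_words]
    return token.join(kept)


print(join_upper_without("this is a pending  test exit string end"))
```

The comparison is case-sensitive, so "End" would be kept; lowercasing each word before the test is one way to change that if needed.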
My code:
s = '$ascv abs is good'
re.sub(p.search(s).group(),'',s)
Output:
'$ascv abs is good'
The output I want:
'abs is good'
I want to remove any word that contains a special character using a Python regular expression. I thought my code was right, but the output is wrong.
How can I fix my code to get the right output?
invalid_chars = ['$', '#'] # Characters you don't want in your text

# Determine whether a word is free of the characters you don't want
def if_clean(word):
    for letter in word:
        if letter in invalid_chars:
            return False
    return True

def clean_text(text):
    text = text.split(' ') # Convert text to a list of words
    text_clean = ''
    for word in text:
        if if_clean(word):
            text_clean = text_clean+' '+word
    return text_clean[1:]

# This will print 'abs is good'
print(clean_text('$ascv abs is good'))
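Since the question asks for a regular expression, here is a sketch of that route. It assumes "special character" means anything that is neither a word character (letters, digits, underscore) nor whitespace, which may be broader or narrower than what you need; the name drop_words_with_specials is made up for illustration:

```python
import re


def drop_words_with_specials(text):
    # \S*[^\w\s]\S* matches a whole whitespace-delimited word that contains
    # at least one character that is neither a word character nor whitespace;
    # \s* also removes the whitespace after that word.
    return re.sub(r'\S*[^\w\s]\S*\s*', '', text).strip()


print(drop_words_with_specials('$ascv abs is good'))
```

Note that an underscore counts as a word character under this assumption, so a word like "a_b" is kept.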
Today I was working on a function that removes any quoted strings from a chunk of data and replaces them with format fields instead ({0}, {1}, etc.).
I ran into a problem: the output was coming out completely scrambled, with {1} landing in a seemingly random place.
I later found out that this happens because replacing a slice of the list changes its length, so the spans of the remaining re matches no longer line up (it only works for the first iteration).
Gathering the strings works as expected, so this is not a problem with re.
I've read about mutable sequences and a number of other things, but was not able to find anything on this.
What I think I need is something like str.replace that takes slices instead of a substring.
Here is my code:
import re

def rm_strings_from_data(data):
    regex = re.compile(r'"(.*?)"')
    s = regex.finditer(data)
    list_data = list(data)
    val = 0
    strings = []
    for i in s:
        string = i.group()
        start, end = i.span()
        strings.append(string)
        list_data[start:end] = '{%d}' % val
        val += 1
    print(strings, ''.join(list_data), sep='\n\n')

if __name__ == '__main__':
    rm_strings_from_data('[hi="hello!" thing="a thing!" other="other thing"]')
I get:
['"hello!"', '"a thing!"', '"other thing"']
[hi={0} thing="a th{1}r="other thing{2}
I would like the output:
['"hello!"', '"a thing!"', '"other thing"']
[hi={0} thing={1} other={2}]
Any help would be appreciated. Thanks for your time :)
Why not match both key=value parts using regex capture groups like this: (\w+?)=(".*?")
Then it becomes very easy to assemble the lists as needed.
Sample Code:
import re

def rm_strings_from_data(data):
    regex = re.compile(r'(\w+?)=(".*?")')
    matches = regex.finditer(data)
    strings = []
    list_data = []
    # enumerate starts at 0, which matches the desired {0}, {1}, {2} numbering
    for match_num, match in enumerate(matches):
        strings.append(match.group(2))
        list_data.append(match.group(1) + '={' + str(match_num) + '}')
    # join with single spaces so no trailing space ends up before the closing bracket
    print(strings, '[' + ' '.join(list_data) + ']', sep='\n\n')

if __name__ == '__main__':
    rm_strings_from_data('[hi="hello!" thing="a thing!" other="other thing"]')
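Another option, closer to the "str.replace that takes slices" idea in the question, is to let re.sub do the splicing: it accepts a function as the replacement and rebuilds the string itself, so the shifting offsets never come into play. This is a sketch, not the only way to do it; the names extract_strings and replace are hypothetical:

```python
import re


def extract_strings(data):
    strings = []

    def replace(match):
        # Record the quoted string and return its placeholder, e.g. {0}
        strings.append(match.group())
        return '{%d}' % (len(strings) - 1)

    template = re.sub(r'"(.*?)"', replace, data)
    return strings, template


strings, template = extract_strings('[hi="hello!" thing="a thing!" other="other thing"]')
print(strings, template, sep='\n\n')
```

Unlike rebuilding the text from the key=value matches, this keeps everything outside the quotes exactly as it was in the input.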
I'm trying to reverse each word in a string individually, so that the words stay in order but each one is reversed. For example, "hi my name is" should give "ih ym eman si". However, the whole string gets flipped instead.
r = 0
def readReverse(): #creates the function
    start = default_timer() #initiates a timer
    r = len(n.split()) #n is the users input
    if len(n) == 0:
        return n
    else:
        return n[0] + readReverse(n[::-1])
    duration = default_timer() - start
    print(str(r) + " with a runtime of " + str(duration))
print(readReverse(n))
First split the string into words, punctuation and whitespace with a regular expression similar to the one in the code below. Then you can use a generator expression to reverse each word individually and finally join them together with str.join.
import re
text = "Hello, I'm a string!"
split_text = re.findall(r"[\w']+|[^\w]", text)
reversed_text = ''.join(word[::-1] for word in split_text)
print(reversed_text)
Output:
olleH, m'I a gnirts!
If you want to ignore the punctuation you can omit the regular expression and just split the string:
text = "Hello, I'm a string!"
reversed_text = ' '.join(word[::-1] for word in text.split())
However, the commas, exclamation marks, etc. will then be a part of the words.
,olleH m'I a !gnirts
Here's the recursive version:
def read_reverse(text):
    idx = text.find(' ')  # Find index of next space character.
    if idx == -1:  # No more spaces left.
        return text[::-1]
    else:  # Split off the first word and reverse it and recurse.
        return text[:idx][::-1] + ' ' + read_reverse(text[idx+1:])
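The question also tries to time the call. A minimal sketch of that, assuming default_timer is the one from the standard timeit module (the question does not show its import) and using a hypothetical helper named reverse_each_word:

```python
from timeit import default_timer


def reverse_each_word(text):
    # Splitting on a single space keeps the word order and spacing intact
    return ' '.join(word[::-1] for word in text.split(' '))


start = default_timer()
result = reverse_each_word("hi my name is")
duration = default_timer() - start
print(result, "with a runtime of", duration)
```

Taking the timing outside the function keeps it from being skipped by an early return, and the non-recursive helper avoids Python's recursion limit on very long inputs.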
Input:
to-camel-case
to_camel_case
Desired output:
toCamelCase
My code:
def to_camel_case(text):
    lst =['_', '-']
    if text is None:
        return ''
    else:
        for char in text:
            if text in lst:
                text = text.replace(char, '').title()
                return text
Issues:
1) The input could be an empty string; the above code then returns None rather than ''.
2) I am not sure that the title() method can help me obtain the desired output (only the first letter of each word after a '-' or '_' capitalized, with the first word left unchanged).
I prefer not to use regex if possible.
A better way to do this is to split the text and rebuild it with a generator expression. In your loop, the test `if text in lst` checks the whole string rather than `char`, so for inputs like these the branch never runs and the function falls through and returns None. It is also hard to capitalize the letter after a _ or - while replacing characters one at a time, because you have no context about what came before or after.
def to_camel_case(text):
    # Handle None and the empty string up front
    if not text:
        return ""
    # Split also removes the characters
    # Start by converting - to _, then splitting on _
    l = text.replace('-','_').split('_')
    # Break the list into two parts
    first = l[0]
    rest = l[1:]
    return first + ''.join(word.capitalize() for word in rest)
And our result:
print(to_camel_case("hello-world"))
Gives helloWorld
This method is quite flexible, and can even handle cases like "hello_world-how_are--you--", which could be difficult using regex if you're new to it.
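If a character-by-character loop is still preferred, the missing context can be carried in a flag that remembers whether the previous character was a separator. The following is only a sketch of that idea, still without regex; the name to_camel_case_loop is hypothetical:

```python
def to_camel_case_loop(text):
    if not text:  # covers both None and the empty string
        return ""
    result = []
    capitalize_next = False
    for char in text:
        if char in "_-":
            # Drop the separator and remember to capitalize what follows
            capitalize_next = True
        elif capitalize_next:
            result.append(char.upper())
            capitalize_next = False
        else:
            result.append(char)
    return "".join(result)


print(to_camel_case_loop("to-camel-case"))
print(to_camel_case_loop("hello_world-how_are--you--"))
```

Unlike capitalize(), this leaves the remaining letters of each word untouched, which only matters if the input already contains upper-case letters.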