I'm trying to get into NLP using NLTK and I understand most of the code below, but I don't understand what x.sub("", word) and if not new_word in "" mean. I'm confused.
import re
import string
from nltk.tokenize import word_tokenize
text = ["It is a pleasant evening.", "Guests, who came from the US arrived at the venue.", "Food was tasty."]
tokenized_docs = [word_tokenize(doc) for doc in text]
print(tokenized_docs)
x = re.compile("[%s]" % re.escape(string.punctuation))
token_nop = []
for sentence in tokenized_docs:
    new_sent = []
    for word in sentence:
        new_word = x.sub('', word)
        if not new_word in '':
            new_sent.append(new_word)
    token_nop.append(new_sent)
For simple things like this, Python is actually self-documenting. You can always fire up a Python interpreter and print a function's __doc__ attribute to see what it does:
>>> import re
>>> print(re.compile(".*").sub.__doc__)
sub(repl, string[, count = 0]) --> newstring
Return the string obtained by replacing the leftmost non-overlapping
occurrences of pattern in string by the replacement repl.
So, we see, sub simply replaces every match of the compiled regular expression pattern in the string with the replacement you pass in. (If you're unfamiliar with regular expressions in Python, the documentation for the re module is a good place to start.) So, for example:
>>> import re
>>> s = "Hello world"
>>> p = re.compile("[Hh]ello")
>>> p.sub("Goodbye", s)
'Goodbye world'
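As for in: new_word in '' is True only when new_word is itself the empty string, because the empty string is the only substring of ''. So if not new_word in '' just means "if new_word is not empty", i.e. the word is kept only if something is left after the punctuation has been stripped out.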
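To see both pieces in isolation, here is a small sketch that skips NLTK and uses a hand-written token list. The tokens and the helper name strip_punctuation are made up for illustration; only re and string come from the original code.

```python
import re
import string

# Character class that matches any single ASCII punctuation character.
punct = re.compile("[%s]" % re.escape(string.punctuation))

def strip_punctuation(tokens):
    """Remove punctuation from each token and drop tokens that end up empty."""
    cleaned = []
    for word in tokens:
        new_word = punct.sub("", word)  # replace every punctuation character with ""
        if new_word != "":              # same effect as `if not new_word in ''`
            cleaned.append(new_word)
    return cleaned

print(strip_punctuation(["Guests", ",", "who", "came", "U.S.", "."]))
# ['Guests', 'who', 'came', 'US']
```

Writing the test as new_word != "" (or just if new_word:) is usually easier to read than not new_word in ''.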
This is a silly repeating parrot program I wrote for practice. It worked on python 2.0, but it's not working now that I've started working with python 3.7. I keep getting this error...
TypeError: translate() takes exactly one argument (2 given)
I'm not sure what is happening.
import time
from string import whitespace, punctuation
# This prints a string up to a certain number of times equaling the number of characters in the string.
print ("I'm a parrot! Teach me a phrase! SQUAK!"),
parrot = input()
count = range(0, int(len(parrot.translate(None, punctuation).translate(None, whitespace))),
(int(len(parrot.translate(None, punctuation).translate(None, whitespace)))-(int(len(parrot.translate(None, punctuation).translate(None, whitespace))-1))))
for i in count:
    print ('SQUAK!'),
    print (parrot)
    time.sleep(1.5)
print ('That phrase had %s letters in it! SQUAK!') % len(parrot.translate(None, punctuation).translate(None, whitespace))
str.translate was changed between Python 2 and 3. In Python 2 the second argument to str.translate() contains the characters that are to be removed from the input string. In Python 3 str.translate() does not accept a second argument. Instead, characters that map to None in the translation table are removed.
You should be able to do this in Python 3:
parrot.translate({ord(c): None for c in punctuation + whitespace})
to remove punctuation and whitespace characters. E.g.
>>> parrot = "I'm a parrot! Teach me a phrase! SQUAK!"
>>> parrot.translate({ord(c): None for c in punctuation + whitespace})
'ImaparrotTeachmeaphraseSQUAK'
You could rewrite your code like this:
import time
from string import whitespace, punctuation
print("I'm a parrot! Teach me a phrase! SQUAK!")
parrot = input()
# Map every punctuation and whitespace character to None so translate() deletes it
translation_table = {ord(c): None for c in punctuation + whitespace}
cleaned_parrot = parrot.translate(translation_table)
# Repeat the phrase once per remaining character
for c in cleaned_parrot:
    print('SQUAK!', parrot)
    time.sleep(1.5)
print('That phrase had {} letters in it! SQUAK!'.format(len(cleaned_parrot)))
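As an alternative to building the dictionary by hand, Python 3's str.maketrans accepts a third argument listing characters to delete. A minimal sketch, where the names delete_table and count_letters are just illustrative:

```python
from string import punctuation, whitespace

# The third argument of str.maketrans lists the characters to delete.
delete_table = str.maketrans("", "", punctuation + whitespace)

def count_letters(phrase):
    """Count the characters left after removing punctuation and whitespace."""
    return len(phrase.translate(delete_table))

phrase = "I'm a parrot! Teach me a phrase! SQUAK!"
print(phrase.translate(delete_table))  # ImaparrotTeachmeaphraseSQUAK
print(count_letters(phrase))           # 28
```

Both forms produce the same kind of table, so which one to use is mostly a matter of taste.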
I have a problem with adding p + vowel after a given vowel in a string using Python.
For example:
If I write welcome, the program would print wepelcopomepe.
You can use regex. Here is an example:
import re
s = 'welcome'
new_s = re.sub(r'([aeiou])', r'\g<1>p\g<1>', s)  # raw strings keep \g from being treated as a string escape
print(new_s)
> wepelcopomepe
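The same idea wrapped in a function might look like the sketch below. The function name add_p is made up, and it only handles lowercase vowels, like the example above.

```python
import re

def add_p(text):
    """Insert 'p' plus the same vowel after every lowercase vowel."""
    # \1 refers back to the vowel captured by the parenthesised group.
    return re.sub(r'([aeiou])', r'\1p\1', text)

print(add_p('welcome'))  # wepelcopomepe
print(add_p('banana'))   # bapanapanapa
```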
So today I was working on a function that removes any quoted strings from a chunk of data and replaces them with format areas instead ({0}, {1}, etc.).
I ran into a problem because the output was becoming completely scrambled: a {1} was going in a seemingly random place.
I later found out that this happened because replacing slices in the list changed its length, so the spans from the earlier re matches no longer lined up (it only worked for the first iteration).
The gathering of the strings worked perfectly, as expected, so this is most certainly not a problem with re.
I've read about mutable sequences and a bunch of other things as well, but was not able to find anything on this.
What I think I need is something like str.replace that can take slices instead of a substring.
Here is my code:
import re
def rm_strings_from_data(data):
    regex = re.compile(r'"(.*?)"')
    s = regex.finditer(data)
    list_data = list(data)
    val = 0
    strings = []
    for i in s:
        string = i.group()
        start, end = i.span()
        strings.append(string)
        list_data[start:end] = '{%d}' % val
        val += 1
    print(strings, ''.join(list_data), sep='\n\n')
if __name__ == '__main__':
    rm_strings_from_data('[hi="hello!" thing="a thing!" other="other thing"]')
i get:
['"hello!"', '"a thing!"', '"other thing"']
[hi={0} thing="a th{1}r="other thing{2}
I would like the output:
['"hello!"', '"a thing!"', '"other thing"']
[hi={0} thing={1} other={2}]
any help would be appreciated. thanks for your time :)
Why not match both key=value parts using regex capture groups like this: (\w+?)=(".*?")
Then it becomes very easy to assemble the lists as needed.
Sample Code:
import re
def rm_strings_from_data(data):
    regex = re.compile(r'(\w+?)=(".*?")')
    matches = regex.finditer(data)
    strings = []
    list_data = []
    # enumerate() starts at 0, which matches the {0}, {1}, ... placeholders asked for
    for matchNum, match in enumerate(matches):
        strings.append(match.group(2))
        list_data.append(match.group(1) + '={' + str(matchNum) + '}')
    # Join with single spaces so there is no trailing space before the closing bracket
    print(strings, '[' + ' '.join(list_data) + ']', sep='\n\n')
if __name__ == '__main__':
    rm_strings_from_data('[hi="hello!" thing="a thing!" other="other thing"]')
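If the data is not always a clean list of key=value pairs, another option is to let re.sub do the slicing for you by passing a function as the replacement. This is a sketch of a different technique from the answer above, not a drop-in replacement for it; the names extract_quoted and _placeholder are made up for illustration.

```python
import re

def extract_quoted(data):
    """Replace each double-quoted string with {0}, {1}, ... and collect the originals."""
    strings = []

    def _placeholder(match):
        # re.sub calls this once per match, left to right, and handles
        # the changing offsets itself, so nothing gets misaligned.
        strings.append(match.group())
        return '{%d}' % (len(strings) - 1)

    template = re.sub(r'"(.*?)"', _placeholder, data)
    return strings, template

strings, template = extract_quoted('[hi="hello!" thing="a thing!" other="other thing"]')
print(strings)   # ['"hello!"', '"a thing!"', '"other thing"']
print(template)  # [hi={0} thing={1} other={2}]
```

Because the rest of the input is left untouched, text that is not part of a key=value pair survives in the template.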
That's the source code:
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            if str_one[i] == str_two[j]:
                str_one = (str_one - str_one[i]).split()
                print(str_one)
            else:
                print('There is no relation')
if __name__ == '__main__':
    str_one = input('Put your First String: ').split()
    str_two = input('Put your Second String: ')
    print(revers_e(str_one, str_two))
How can I remove a letter that occurs in both strings from the first string then print it?
How about a simple, Pythonic way of doing it?
def revers_e(s1, s2):
    print(*[i for i in s1 if i in s2])  # Print all characters to be deleted from s1
    s1 = ''.join([i for i in s1 if i not in s2])  # Delete them from s1
    return s1  # Return the result so the caller can use it
This answer says, "Python strings are immutable (i.e. they can't be modified). There are a lot of reasons for this. Use lists until you have no choice, only then turn them into strings."
First of all, you don't need the fairly suboptimal range(len(...)) idiom to iterate over a string. Strings are iterable, so you can loop over them directly with a simple for loop.
To find the characters that two strings have in common, you can use set.intersection, which returns all the characters present in both strings, and then use str.translate to remove those common characters:
intersect=set(str_one).intersection(str_two)
trans_table = dict.fromkeys(map(ord, intersect), None)
str_one.translate(trans_table)
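Put together as a function, the three lines above could look like this sketch. The function name remove_common and the sample strings are made up for illustration.

```python
def remove_common(str_one, str_two):
    """Return str_one without any character that also occurs in str_two."""
    intersect = set(str_one).intersection(str_two)
    # Map each common character's code point to None so translate() deletes it.
    trans_table = dict.fromkeys(map(ord, intersect), None)
    return str_one.translate(trans_table)

print(remove_common('abcdefg', 'efghijk'))  # abcd
print(remove_common('hello', 'xyz'))        # hello
```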
def revers_e(str_one,str_two):
    for i in range(len(str_one)):
        for j in range(len(str_two)):
            try:
                if str_one[i] == str_two[j]:
                    first_part=str_one[0:i]
                    second_part=str_one[i+1:]
                    str_one =first_part+second_part
                    print(str_one)
                else:
                    print('There is no relation')
            except IndexError:
                return
str_one = input('Put your First String: ')
str_two = input('Put your Second String: ')
revers_e(str_one, str_two)
I've modified your code, taking out a few bits and adding a few more.
str_one = input('Put your First String: ').split()
I removed the .split(), because all this would do is create a list of length 1, so in your loop, you'd be comparing the entire string of the first string to one letter of the second string.
str_one = (str_one - str_one[i]).split()
You can't remove a character from a string with - in Python, so I sliced the string into two parts: everything before the matching character and everything after it. The two parts are then concatenated into one string. (You could also convert the strings into lists, like I did in my other code which I deleted.)
I used a try/except block because the outer loop uses the original length of str_one, but the string gets shorter as characters are removed, so indexing could raise an IndexError.
Lastly, I just called the function instead of printing its result, because the function returns None and printing that would only output None.
These examples are shown for Python 3 (str.maketrans, in particular, is not available as a str method in Python 2.7).
Given:
>>> s1='abcdefg'
>>> s2='efghijk'
You can use a set:
>>> set(s1).intersection(s2)
{'f', 'e', 'g'}
Then use that set in maketrans to make a translation table to None to delete those characters:
>>> s1.translate(str.maketrans({e:None for e in set(s1).intersection(s2)}))
'abcd'
Or use list comprehension:
>>> ''.join([e for e in s1 if e in s2])
'efg'
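And a regex to produce a new string without the common characters. The characters go inside a character class so that each one is matched individually, not only as the literal substring 'efg':
>>> import re
>>> re.sub('[%s]' % re.escape(''.join([e for e in s1 if e in s2])), '', s1)
'abcd'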
I'm learning Python and one of the labs requires me to import a list of words to serve as a dictionary, then compare that list of words to some text that is also imported. This isn't for a class, I'm just learning this on my own, or I'd ask the teacher. I've been hung up on how to convert that imported text to uppercase before making the comparison.
Here is the URL to the lab: http://programarcadegames.com/index.php?chapter=lab_spell_check
I've looked at the posts/answers below and some youtube videos and I still can't figure out how to do this. Any help would be appreciated.
Convert a Python list with strings all to lowercase or uppercase
How to convert upper case letters to lower case
Here is the code I have so far:
# Chapter 16 Lab 11
import re
# This function takes in a line of text and returns
# a list of words in the line.
def split_line(line):
    return re.findall('[A-Za-z]+(?:\'[A-Za-z]+)?',line)
dfile = open("dictionary.txt")
dictfile = []
for line in dfile:
    line = line.strip()
    dictfile.append(line)
dfile.close()
print ("--- Linear Search ---")
afile = open("AliceInWonderLand200.txt")
for line in afile:
    words = []
    line = split_line(line)
    words.append(line)
    for word in words:
        lineNumber = 0
        lineNumber += 1
        if word != (dictfile):
            print ("Line ",(lineNumber)," possible misspelled word: ",(word))
afile.close()
Like the lab says, you use .upper():
dictfile = []
for line in dfile:
    line = line.strip()
    dictfile.append(line.upper()) # <- here.
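The words from the text need the same treatment before the comparison, otherwise "Alice" will never equal "ALICE". Here is a rough sketch of the whole check, using small in-memory lists instead of the lab's files so it runs on its own. The function name find_misspelled and the sample data are made up, and it uses a set lookup rather than the linear search the lab asks for.

```python
import re

def split_line(line):
    """Return a list of the words in a line of text."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", line)

def find_misspelled(text_lines, dictionary_words):
    """Return (line_number, word) pairs for words not found in the dictionary."""
    # Upper-case the dictionary once so comparisons are case-insensitive.
    known = {w.strip().upper() for w in dictionary_words}
    problems = []
    for line_number, line in enumerate(text_lines, start=1):
        for word in split_line(line):
            if word.upper() not in known:  # upper-case the text word as well
                problems.append((line_number, word))
    return problems

# Hypothetical sample data standing in for dictionary.txt and the Alice text.
dictionary_words = ["alice", "was", "beginning", "to", "get", "very", "tired"]
text_lines = ["Alice was beginning", "to get vary tired"]
for line_number, word in find_misspelled(text_lines, dictionary_words):
    print("Line", line_number, "possible misspelled word:", word)
# Line 2 possible misspelled word: vary
```

Note that the line counter comes from enumerate, so it keeps counting across lines instead of being reset for every word.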