def new_listes(poem_lines):
    r""" (list of str) -> list of str
    Return a list of lines from poem_lines
    """
    new_list = []
    for line in poem_lines:
        new_list.append(clean_up(line))
    return new_list
Precondition: len(poem_lines) == len(pattern[0])
Return a list of lines from poem_lines that do not have the right number of
syllables for the poetry pattern according to the pronunciation dictionary.
If all lines have the right number of syllables, return th
"""
k = ""
i=0
lst = []
for line in new_listes(poem_lines):
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                k = k + char
                if k.isdigit():
                    i=i+1
return i
So for the body, this is what I've written so far. I have a list of phonemes for each word (['N', 'EH1', 'K', 'S', 'T'] for the word "next") and I would like to count how many of them contain a digit (the 1 in EH1 makes it 1 for the word "next"), but I get back 0.
I've tried moving my code around to different indentation but I'm not sure how to proceed from there.
If I understand your question correctly, you want to transform a line like 'The first line leads off' into a list like this:
[
    ['DH', 'AH0'], # The
    ['F', 'ER1', 'S', 'T'], # first
    ['L', 'AY1', 'N'], # line
    ['L', 'IY1', 'D', 'Z'], # leads
    ['AO1', 'F'] # off
]
And then count the number of elements that contain a digit (5 in the provided example: AH0, ER1, AY1, IY1, AO1).
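What you were doing was building one ever-growing string and calling isdigit() on the whole thing, which is True only when every character is a digit: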
'DH'
'DHAH0'
'DHAH0F'
>>> 'DHAH0FER1'.isdigit()
False
You need to count the digits in the string:
def number_count(input_string):
    return sum(int(char.isdigit()) for char in input_string)
>>> number_count('a1b2')
2
And use it in your code (you don't have to build the string; you can count digits on the fly):
lst = []
for line in new_listes(poem_lines):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                i += number_count(char)
    lst.append(i)
return lst
Or do it a bit more Pythonically:
for line in new_listes(poem_lines):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            i += sum(number_count(char) for char in sylables)
    yield i
Or, if you want to keep your code (and build the string first, before counting):
lst = []
for line in new_listes(poem_lines):
    k = "" # k needs to be reset for each line
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                k = k + char
    lst.append(number_count(k))
return lst
All three should produce [5, 5, 4], either as a list or from a generator.
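For reference, here is a self-contained sketch of the counting idea that runs on its own. The tiny word_to_phonemes dictionary and the name count_syllables are made up for illustration (your real dictionary and clean_up step will differ), and it assumes every vowel phoneme ends in a stress digit, as in the examples above.

```python
# Hypothetical miniature pronunciation dictionary (an assumption for this sketch).
word_to_phonemes = {
    "THE": ["DH", "AH0"],
    "FIRST": ["F", "ER1", "S", "T"],
    "LINE": ["L", "AY1", "N"],
    "LEADS": ["L", "IY1", "D", "Z"],
    "OFF": ["AO1", "F"],
}


def count_syllables(line):
    """Count the phonemes whose last character is a digit (one per syllable)."""
    return sum(
        phoneme[-1].isdigit()          # True counts as 1, False as 0
        for word in line.split()
        for phoneme in word_to_phonemes[word]
    )


print(count_syllables("THE FIRST LINE LEADS OFF"))  # 5
```

Checking only the last character avoids building any intermediate string at all.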
Listing only the lines that don't match
I'm not sure what pattern[1] is supposed to mean, but let's assume that for every line you want to work with line[n], pattern[0][n] and pattern[1][n]. The easiest way to do this is with zip():
for line, pattern0, pattern1 in zip(new_listes(poem_lines), pattern[0], pattern[1]):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            i += sum(number_count(char) for char in sylables)
    if i != pattern0:
        yield line
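A runnable sketch of the zip() approach, under the assumption that pattern[0] is simply a list of expected syllable counts, one per line. The function name lines_with_wrong_syllables and the two-word dictionary are hypothetical, chosen only so the example runs on its own.

```python
def lines_with_wrong_syllables(lines, expected_counts, word_to_phonemes):
    """Return the lines whose syllable count differs from the expected count."""
    mismatches = []
    # zip() pairs each line with its expected count.
    for line, expected in zip(lines, expected_counts):
        actual = sum(
            phoneme[-1].isdigit()
            for word in line.split()
            for phoneme in word_to_phonemes[word]
        )
        if actual != expected:
            mismatches.append(line)
    return mismatches


# Hypothetical data for illustration only.
phonemes = {"NEXT": ["N", "EH1", "K", "S", "T"], "LINE": ["L", "AY1", "N"]}
print(lines_with_wrong_syllables(["NEXT LINE", "LINE"], [2, 2], phonemes))
# ['LINE']
```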
Alien Dictionary
Link to the online judge -> LINK
Given a sorted dictionary of an alien language having N words and k starting alphabets of standard dictionary. Find the order of characters in the alien language.
Note: Many orders may be possible for a particular test case, thus you may return any valid order and output will be 1 if the order of string returned by the function is correct else 0 denoting incorrect string returned.
Example 1:
Input:
N = 5, K = 4
dict = {"baa","abcd","abca","cab","cad"}
Output:
1
Explanation:
Here order of characters is
'b', 'd', 'a', 'c' Note that words are sorted
and in the given language "baa" comes before
"abcd", therefore 'b' is before 'a' in output.
Similarly we can find other orders.
My working code:
from collections import defaultdict
class Solution:
    def __init__(self):
        self.vertList = defaultdict(list)
    def addEdge(self,u,v):
        self.vertList[u].append(v)
    def topologicalSortDFS(self,givenV,visited,stack):
        visited.add(givenV)
        for nbr in self.vertList[givenV]:
            if nbr not in visited:
                self.topologicalSortDFS(nbr,visited,stack)
        stack.append(givenV)
    def findOrder(self,dict, N, K):
        list1 = dict
        for i in range(len(list1)-1):
            word1 = list1[i]
            word2 = list1[i+1]
            rangej = min(len(word1),len(word2))
            for j in range(rangej):
                if word1[j] != word2[j]:
                    u = word1[j]
                    v = word2[j]
                    self.addEdge(u,v)
                    break
        stack = []
        visited = set()
        vlist = [v for v in self.vertList]
        for v in vlist:
            if v not in visited:
                self.topologicalSortDFS(v,visited,stack)
        result = " ".join(stack[::-1])
        return result
#{
# Driver Code Starts
#Initial Template for Python 3
class sort_by_order:
    def __init__(self,s):
        self.priority = {}
        for i in range(len(s)):
            self.priority[s[i]] = i
    def transform(self,word):
        new_word = ''
        for c in word:
            new_word += chr( ord('a') + self.priority[c] )
        return new_word
    def sort_this_list(self,lst):
        lst.sort(key = self.transform)
if __name__ == '__main__':
    t=int(input())
    for _ in range(t):
        line=input().strip().split()
        n=int(line[0])
        k=int(line[1])
        alien_dict = [x for x in input().strip().split()]
        duplicate_dict = alien_dict.copy()
        ob=Solution()
        order = ob.findOrder(alien_dict,n,k)
        x = sort_by_order(order)
        x.sort_this_list(duplicate_dict)
        if duplicate_dict == alien_dict:
            print(1)
        else:
            print(0)
My problem:
The code runs fine for the test cases that are given in the example but fails for ["baa", "abcd", "abca", "cab", "cad"]
It throws the following error for this input:
Runtime Error:
Traceback (most recent call last):
  File "/home/e2beefe97937f518a410813879a35789.py", line 73, in <module>
    x.sort_this_list(duplicate_dict)
  File "/home/e2beefe97937f518a410813879a35789.py", line 58, in sort_this_list
    lst.sort(key = self.transform)
  File "/home/e2beefe97937f518a410813879a35789.py", line 54, in transform
    new_word += chr( ord('a') + self.priority[c] )
KeyError: 'f'
Running in some other IDE:
If I explicitly give this input using some other IDE then the output I'm getting is b d a c
Interesting problem. Your idea is correct: the words define a partially ordered set, so you can build a directed acyclic graph and find an ordered list of vertices using topological sort.
Your program fails because some letters may never be added to your vertList. A letter only ends up in the result if it takes part in at least one edge.
Spoiler: defining vlist as follows, instead of building it from self.vertList, solves the issue
vlist = [chr(ord('a') + v) for v in range(K)]
A simple failing example
Consider the input
2 4
baa abd
This will determine the following vertList
{"b": ["a"]}
The only constraint is that b must come before a in this alphabet. Your code returns the alphabet b a. Since the letter d is not present, the driver code raises an error when it tries to check your solution. In my opinion it should simply output 0 in this situation.
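A compact sketch of the same idea as a standalone function, so the effect of seeding the search with all K letters is easy to see. The name find_order is hypothetical, and the sketch assumes the alphabet is the first K lowercase letters starting at 'a', as the problem statement says. It joins the result without spaces, which is a choice of this sketch rather than something the judge requires.

```python
from collections import defaultdict


def find_order(words, k):
    """Return one valid letter order for the first k lowercase letters."""
    graph = defaultdict(list)
    # The first differing letter of each adjacent pair gives one edge.
    for first, second in zip(words, words[1:]):
        for a, b in zip(first, second):
            if a != b:
                graph[a].append(b)
                break

    visited, stack = set(), []

    def visit(letter):
        visited.add(letter)
        for nxt in graph[letter]:
            if nxt not in visited:
                visit(nxt)
        stack.append(letter)  # post-order: a letter is pushed after its successors

    # Start from every letter of the alphabet, not only those that appear in edges.
    for letter in (chr(ord("a") + i) for i in range(k)):
        if letter not in visited:
            visit(letter)
    return "".join(reversed(stack))


order = find_order(["baa", "abd"], 4)
print(sorted(order))                        # ['a', 'b', 'c', 'd']
print(order.index("b") < order.index("a"))  # True
```

With the failing input above, all four letters now appear in the result, so the driver's priority lookup no longer hits a missing key.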
#open a file for input
#loop through the contents to find four letter words
#split the contents of the string
#if length of string = 4 then print the word
my_file = open("myfile.txt", 'r')
for sentence in my_file:
    single_strings = sentence.split()
    for word in single_strings:
        if len(word) == 4:
            print(word)
I would like my code to return the four-letter words in a single string, but instead it prints each word on a new line. How can I return the words as one string so that I can split() them and get their length to print out?
All problems are simpler when broken into small parts. First write a function that returns a list containing all the words from a file:
def words_in_file(filename):
    with open(filename, 'r') as f:
        return [word for sentence in f for word in sentence.split()]
Then a function that filters a list of words by length:
def words_with_k_letters(words, k=-1):
    return filter(lambda w: len(w) == k, words)
Once you have these two functions the problem becomes trivial:
words = words_in_file("myfile.txt")
words = words_with_k_letters(words, k=4)
print(', '.join(words))
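The same idea can be tried without a file on disk. In this sketch an io.StringIO object stands in for the open file, and the function name four_letter_words and the sample text are assumptions made for illustration.

```python
import io


def four_letter_words(lines, k=4):
    """Collect every whitespace-separated word of length k from an iterable of lines."""
    return [word for line in lines for word in line.split() if len(word) == k]


# io.StringIO behaves like an open text file, so no real file is needed here.
sample = io.StringIO("this file has some four letter words\nlike that one\n")
print(" ".join(four_letter_words(sample)))  # this file some four like that
```

Joining with " " gives one string that can be split() again later.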
Working on a function that would detect if there are any vowels in a given string, also checks if the letter "g" is beside the vowels or not. If the letter "g" is beside a vowel, then it will also be considered a vowel. I did post a question similar to this and got an answer that almost works but I got no explanation as to how it was done and no-one replied to my comment asking for clarification.
Here is the function:
import re
def disemvowel(text):
    result = re.sub(r"G[AEIOU]+|[AEIOU]+G|[AEIOU]+", "", text, flags=re.IGNORECASE)
    print(result)
disemvowel("fragrance")
# frrnc
disemvowel('gargden')
# rgdn
disemvowel('gargdenag')
# rgdn
This function works for most cases, except when the letter 'g' both precedes and follows a vowel. For example, when I input 'gag' it returns 'g', when it should return an empty string. I just need clarification as to how this function works and what edits I could make to have it run properly for all scenarios.
This is my original function that I worked on but it only works for vowels since I could not figure out how to add a condition where it would detect the letter 'g' beside a vowel:
def disemvowel(text):
    text = list(text)
    new_letters = []
    for i in text:
        if i.lower() == "a" or i.lower() == "e" or i.lower() == "i" or i.lower() == "o" or i.lower() == "u":
            pass
        else:
            new_letters.append(i)
    print (''.join(new_letters))
disemvowel('fragrance')
# frgrnc
Two ways of doing it.
import re
def methodA(txt,ii):
    vowels = ['a', 'e', 'i', 'o', 'u']
    txt = re.sub('g'+ vowels[ii] +'g' ,' ', txt) # remove strings like gug
    txt = re.sub(vowels[ii] + 'g', ' ', txt) # remove strings like ug
    txt = re.sub('g' + vowels[ii] , ' ', txt) # remove strings like gu
    if (ii == len(vowels)-1 or txt == '' or txt == ' ') : # if string is empty or all vowels have been used
        txt = "".join( list(filter(lambda x:x not in ['a', 'e', 'i', 'o', 'u'], list(txt)))) # finally remove all vowels
        txt = re.sub(' ', '', txt)
        return txt
    ii = ii + 1
    return methodA(txt, ii) # call the function with next vowel in the list
ans = methodA("fragrance",0) # initialize with zeroth vowel
print(ans)
# frrnc
from itertools import permutations
def methodB(txt):
    vowels = ['a', 'e', 'i', 'o', 'u'] # vowels to remove
    a = [] # store permutation of strings to remove in this list
    for vowel in vowels: # create the list of all combo of characters that need to be removed
        a.extend(list(map(lambda x : x[0]+x[1]+x[2] , list(permutations('g' + vowel + 'g')) )) )
        a.extend(list(map(lambda x : x[0]+x[1] , list(permutations('g' + vowel )) )))
    lims = list(set([re.sub('gg','', xx) for xx in a ])) # we don't need gg and the duplicates
    lims.sort(key = len,reverse = True) # create a list of strings sorted by length
    for ll in lims:
        txt = re.sub(ll, ' ', txt) # remove the required strings with largest ones first
    return re.sub(' ','',txt)
ans=methodB("fragrance")
print(ans)
# frrnc
Here is a solution for this task:
import re
def disemvowel(text):
    return re.sub(r"G?[AEIOU]+G?", "", text, flags=re.IGNORECASE)
tests = {"fragrance": "frrnc", "gargden": "rgdn", "gargdenag": "rgdn", "gag": ""}
for test, value in tests.items():
    assert disemvowel(test) == value
print("PASSED")
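Since the question asked how the pattern works, here is the same regex spelled out with re.VERBOSE so each piece can carry a comment. This is only an annotated restatement of the pattern above, and the name PATTERN is introduced for illustration. The original alternation fails on 'gag' because its first branch, G[AEIOU]+, matches 'ga' and stops, leaving the trailing 'g' behind. Making both g's optional around one run of vowels lets a single match cover 'gag'.

```python
import re

PATTERN = re.compile(
    r"""
    g?         # an optional "g" directly before the vowels
    [aeiou]+   # one or more vowels
    g?         # an optional "g" directly after the vowels
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)


def disemvowel(text):
    """Remove every run of vowels together with a "g" touching it on either side."""
    return PATTERN.sub("", text)


print(repr(disemvowel("gag")))  # ''
print(disemvowel("fragrance"))  # frrnc
```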
I have written a program which is counting trigrams that occur 5 times or more in a text file. The trigrams should be printed out according to their frequency.
I cannot find the problem!
I get the following error message:
list index out of range
I have tried to make the range bigger but that did not work out
f = open("bsp_file.txt", encoding="utf-8")
text = f.read()
f.close()
words = []
for word in text.split():
    word = word.strip(",.:;-?!-–—_ ")
    if len(word) != 0:
        words.append(word)
trigrams = {}
for i in range(len(words)):
    word = words[i]
    nextword = words[i + 1]
    nextnextword = words[i + 2]
    key = (word, nextword, nextnextword)
    trigrams[key] = trigrams.get(key, 0) + 1
l = list(trigrams.items())
l.sort(key=lambda x: x[1])
l.reverse()
for key, count in l:
    if count < 5:
        break
    word = key[0]
    nextword = key[1]
    nextnextword = key[2]
    print(word, nextword, nextnextword, count)
The result should look like this (simplified):
s = "this is a trigram which is an example............."
this is a
is a trigram
a trigram which
trigram which is
which is an
is an example
As the comments pointed out, you're iterating over your list words with i and accessing words[i + 1] and words[i + 2]. When i reaches the last two cells of words, those indices are out of range.
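If you want to keep the index-based loop, a minimal sketch of the fix is to stop two positions before the end, so that words[i + 2] always exists. The sample sentence below is the one from the question.

```python
words = "this is a trigram which is an example".split()

trigrams = {}
# Stop at len(words) - 2 so that i + 2 is always a valid index.
for i in range(len(words) - 2):
    key = (words[i], words[i + 1], words[i + 2])
    trigrams[key] = trigrams.get(key, 0) + 1

print(len(trigrams))  # 6
```

If the text has fewer than three words, the range is empty and no trigram is produced.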
I suggest you read this tutorial to generate n-grams with pure python: http://www.albertauyeung.com/post/generating-ngrams-python/
Answer
If you don't have much time to read it all, here's the function I recommend, adapted from the link:
def get_ngrams_count(words, n):
    # generates tuples representing all n-grams
    ngrams_tuple = zip(*[words[i:] for i in range(n)])
    # turn them into a dictionary with the counts of all ngrams
    ngrams_count = {}
    for ngram in ngrams_tuple:
        if ngram not in ngrams_count:
            ngrams_count[ngram] = 0
        ngrams_count[ngram] += 1
    return ngrams_count
trigrams = get_ngrams_count(words, 3)
Please note that you can make this function a lot simpler by using a Counter (which subclasses dict, so it will be compatible with your code):
from collections import Counter
def get_ngrams_count(words, n):
    # turn the n-gram tuples into a dictionary with the counts of all ngrams
    return Counter(zip(*[words[i:] for i in range(n)]))
trigrams = get_ngrams_count(words, 3)
Side Notes
You can use the bool argument reverse in .sort() to sort your list from most common to least common:
l = list(trigrams.items())
l.sort(key=lambda x: x[1], reverse=True)
This is a tad faster than sorting your list in ascending order and then reversing it with .reverse().
A more generic way to print your sorted list (it works for any n-grams, not just trigrams):
for ngram, count in l:
    if count < 5:
        break
    # " ".join(ngram) will combine all elements of ngram in a string, separated with spaces
    print(" ".join(ngram), count)
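As a further option, Counter.most_common() already returns the items sorted from most to least frequent, so the manual sort can be dropped. The function name frequent_ngrams and the sample word list below are made up for this sketch.

```python
from collections import Counter


def frequent_ngrams(words, n, min_count):
    """Return (ngram, count) pairs with count >= min_count, most frequent first."""
    counts = Counter(zip(*[words[i:] for i in range(n)]))
    return [(ngram, c) for ngram, c in counts.most_common() if c >= min_count]


# Hypothetical sample data; use min_count=5 for the original task.
words = "a b c a b c a b".split()
for ngram, count in frequent_ngrams(words, 3, 2):
    print(" ".join(ngram), count)
```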
For example:
Example 1:
string = "Jack and the bean stalk."
updated_string = "Jcak and the baen saltk."
Example 2:
string = "Hey, Do you want to boogie? Yes, Please."
updated_string = "Hey, Do you wnat to bogoie? Yes, Palsee."
Now this string is stored in a file.
I want to read this string from the file. And write the updated string back in the file at same positions.
Letters of each word of the string with length greater than 3 must be scrambled/shuffled while keeping first and last letter as it is. Also if there is any punctuation mark the punctuation mark stays as it is.
My approach:
import random
with open("path/textfile.txt","r+") as file:
    file.seek(0)
    my_list = []
    for line in file.readlines():
        word = line.split()
        my_list.append(word)
    scrambled_list =[]
    for i in my_list:
        if len(i) >3:
            print(i)
            s1 = i[1]
            s2 = i[-1]
            s3 = i[1:-1]
            random.shuffle(s3)
            y = ''.join(s3)
            z = s1+y+s2+' '
            print(z)
This is one approach.
Demo:
from random import shuffle
import string
punctuation = tuple(string.punctuation)
with open("path/textfile.txt") as file:    #Same path as in the question
    for line in file.readlines():    #Iterate lines
        res = []
        for i in line.split():    #Split sentence to words
            punch = False
            if len(i) >= 4:    #Check if word is greater than 3 letters
                if i.endswith(punctuation):    #Check if words ends with punctuation
                    val = list(i[1:-2])    #Exclude last 2 chars
                    punch = True
                else:
                    val = list(i[1:-1])    #Exclude last 1 chars
                shuffle(val)    #Shuffle letters excluding the first and last.
                if punch:
                    res.append("{0}{1}{2}".format(i[0], "".join(val), i[-2:]))
                else:
                    res.append("{0}{1}{2}".format(i[0], "".join(val), i[-1]))
            else:
                res.append(i)
        print(" ".join(res))
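An alternative sketch that swaps the split-and-check logic for re.sub with a replacement function: the pattern only matches runs of letters, so punctuation and spacing stay where they are without special cases. The names scramble_word, scramble_text and scramble_file are hypothetical, and the sketch assumes words consist of ASCII letters only. scramble_file shows one possible way to write the result back to the same file.

```python
import random
import re


def scramble_word(match):
    """Shuffle the inner letters of one matched word, keeping first and last."""
    word = match.group(0)
    if len(word) <= 3:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def scramble_text(text):
    # [A-Za-z]+ matches letter runs only, so punctuation and whitespace are untouched.
    return re.sub(r"[A-Za-z]+", scramble_word, text)


def scramble_file(path):
    """Read the whole file, scramble it, and overwrite the file with the result."""
    with open(path, "r") as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(scramble_text(text))


print(scramble_text("Hey, Do you want to boogie? Yes, Please."))
```

The output is random, but every character stays within its own word, so the text keeps its length and layout.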