My code so far is below, but it does not work properly with remove(). Can anyone help me?
'''
Created on Apr 21, 2015
#author: Pallavi
'''
from pip._vendor.distlib.compat import raw_input
print ("Enter Query")
str=raw_input()
fo = open("stopwords.txt", "r+")
str1 = fo.read();
list=str1.split("\n");
fo.close()
words=str.split(" ");
for i in range(0,len(words)):
    for j in range(0,len(list)):
        if(list[j]==words[i]):
            print(words[i])
            words.remove(words[i])
Here is the error:
Enter Query
let them cry try diesd
let them try
Traceback (most recent call last):
  File "C:\Users\Pallavi\workspace\py\src\parser.py", line 17, in <module>
    if(list[j]==words[i]):
IndexError: list index out of range
The errors (apart from the issues in my other comments) occur because you are modifying a list while iterating over it. You compute the length of the list once, at the start, so after you have removed some elements the last indices no longer exist.
I would do it this way:
words = ['a', 'b', 'a', 'c', 'd']
stopwords = ['a', 'c']
for word in list(words): # iterating on a copy since removing will mess things up
    if word in stopwords:
        words.remove(word)
An even more pythonic way using list comprehensions:
new_words = [word for word in words if word not in stopwords]
As an observation, this could be another elegant way to do it:
new_words = list(filter(lambda w: w not in stop_words, initial_words))
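Putting the pieces together for the original question, a minimal sketch might look like the following. It assumes the stopwords have already been read into a list (for example with fo.read().split("\n") as in the question); the function name remove_stopwords is made up for illustration. Converting the stopwords to a set makes each membership test cheaper, and the comprehension builds a new list instead of mutating the one being iterated over.

```python
def remove_stopwords(query, stopwords):
    """Return the words of `query` that are not in `stopwords`, in order."""
    stop_set = set(stopwords)  # set membership tests are fast
    return [word for word in query.split() if word not in stop_set]


if __name__ == "__main__":
    # Hypothetical stopword list; in the question it comes from stopwords.txt.
    stopwords = ["let", "them", "try"]
    print(remove_stopwords("let them cry try diesd", stopwords))
```

Because nothing is removed from a list while it is being indexed, the IndexError from the question cannot occur here.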
''' call this script in a Bash Konsole like so: python reject.py
purpose of this script: remove certain words from a list of words ,
e.g. remove invalid packages in a request-list using
a list of rejected packages from the logfile,
say on https://fai-project.org/FAIme/#
remove trailing spaces e.g. with KDE Kate in wordlist like so:
kate: remove-trailing-space on; BOM off;
'''
with open("rejects", "r+") as fooo :
    stwf = fooo.read()
toreject = stwf.split("\n")
with open("wordlist", "r+") as bar :
    woL = bar.read()
words = woL.split("\n")
new_words = [word for word in words if word not in toreject]
with open("cleaned", "w+") as foobar :
    for ii in new_words:
        foobar.write("%s\n" % ii)
One more easy way to remove words from the list is to convert both lists to sets and subtract one from the other. Note that this drops duplicates and does not preserve the original order.
words = ['a', 'b', 'a', 'c', 'd']
words = set(words)
stopwords = ['a', 'c']
stopwords = set(stopwords)
final_list = words - stopwords
final_list = list(final_list)
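If order matters, a possible variation (a sketch, not part of the answer above) is to filter first and then remove duplicates with dict.fromkeys, which keeps the first occurrence of each word in its original position on Python 3.7 and later. The variable names as_set and ordered_unique are made up for illustration.

```python
words = ['a', 'b', 'a', 'c', 'd']
stopwords = {'a', 'c'}

# Set subtraction: duplicates and the original order are lost.
as_set = set(words) - stopwords

# Order-preserving alternative: filter, then drop duplicates with
# dict.fromkeys (dicts keep insertion order in Python 3.7+).
ordered_unique = list(dict.fromkeys(w for w in words if w not in stopwords))

print(sorted(as_set))
print(ordered_unique)
```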
I would like to find keywords from a list, but return a zero if the word does not exist (in this case: part). In this example, collabor occurs 4 times and part 0 times.
My current output is
[['collabor', 4]]
But what I would like to have is
[['collabor', 4], ['part', 0]]
str1 = ["collabor", "part"]
x10 = []
for y in wordlist:
    for string in str1:
        if y.find(string) != -1:
            x10.append(y)
from collections import Counter
x11 = Counter(x10)
your_list = [list(i) for i in x11.items()]
rowssorted = sorted(your_list, key=lambda x: x[0])
print(rowssorted)
Although you have not stated your problem and requirements clearly, I think I understand the task.
I assume that you have a set of words that may or may not occur in a given list and you want to print the count of those words based on the occurrence in the given list.
Code:
constants=["part","collabor"]
wordlist = ["collabor", "collabor"]
d={}
for const in constants:
    d[const]=0
for word in wordlist:
    if word in d:
        d[word]+=1
    else:
        d[word]=0
from collections import Counter
x11 = Counter(d)
your_list = [list(i) for i in x11.items()]
rowssorted = sorted(your_list, key=lambda x: x[0])
print(rowssorted)
output:
[['collabor', 2], ['part', 0]]
This approach gives the required output.
In Python, a dictionary is a common way to count occurrences.
Hope it helps!
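The question matches keywords as substrings (y.find(string)), so a variation that keeps that behaviour may be closer to what was asked. The sketch below relies on the fact that a Counter returns 0 for a key it has never seen; the function name count_keywords and the sample wordlist are assumptions made for illustration.

```python
from collections import Counter


def count_keywords(wordlist, keywords):
    """Count how many words contain each keyword as a substring."""
    counts = Counter()
    for word in wordlist:
        for keyword in keywords:
            if keyword in word:
                counts[keyword] += 1
    # Counter returns 0 for keys it has never seen, so missing
    # keywords still show up in the result.
    return sorted([keyword, counts[keyword]] for keyword in keywords)


# Hypothetical sample data.
wordlist = ["collaboration", "collaborate", "collabor", "collaborative", "other"]
print(count_keywords(wordlist, ["collabor", "part"]))
```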
What is the fastest way to find all the substrings in a string without using any modules and without making duplicates?
def lols(s):
    if not s:
        return 0
    lst = []
    for i in range (len(s)):
        for j in range(i , len(s)+1):
            if not s[i:j] :
                pass
            elif len(s[i:j]) == len(set(s[i:j])):
                lst.append(s[i:j])
    res = (max(lst , key=len))
s = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&'()*+,-./:;<=>?#[\\]^_`{|}~"
s = s*100
lols(s)
This function works fine with strings shorter than 1000 characters, but it freezes on the example string above, and the time limit is exceeded for large strings.
Problem
I recommend that you don't try this with super long strings like your s!
If you're able to install nltk so that it works (I recently had a problem with that but solved it by installing it in the Windows Sandbox, see here: Python3: Could not find "vcomp140.dll (or one of its dependencies)" while trying to import nltk), then this is one way to do it:
from nltk import ngrams
def lols(s):
    lst = []
    for i in range(1, len(s)):
        lst.extend([''.join(j) for j in ngrams(s, i)])
    lst.append(s)
    return lst
If not, you can do this instead of "from nltk import ngrams".
import collections, itertools
def ngrams(words, n):
    """Edited from https://stackoverflow.com/questions/17531684/n-grams-in-python-four-five-six-grams"""
    d = collections.deque(maxlen=n)
    d.extend(words[:n])
    words = words[n:]
    answer = []
    for window, word in zip(itertools.cycle((d,)), words):
        answer.append(''.join(window))
        d.append(word)
    answer.append(''.join(window))
    return answer
Demo:
>>> lols('username')
['u', 's', 'e', 'r', 'n', 'a', 'm', 'e', 'us', 'se', 'er', 'rn', 'na', 'am', 'me', 'use', 'ser', 'ern', 'rna', 'nam', 'ame', 'user', 'sern', 'erna', 'rnam', 'name', 'usern', 'serna', 'ernam', 'rname', 'userna', 'sernam', 'ername', 'usernam', 'sername', 'username']
Your function's running time probably grows like O(n^2) or worse: the two nested loops visit O(n^2) substrings, and slicing each one and building a set from it costs up to O(n) more. Memory may also be tight.
As for the maximum size of a string you can print to stdout with the print function: you have to pass the text to print as a Python object, and the maximum size of such an object depends on your platform. It is 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.
For more information, see sys.maxsize.
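If the actual goal is what the question's code computes, namely the longest substring with no repeated character, then a different technique avoids enumerating substrings altogether. The following is a sketch of the sliding-window approach under that assumption; the function name longest_unique_substring is made up for illustration. It makes a single pass over the string and uses no imports.

```python
def longest_unique_substring(s):
    """Return the first longest substring of s with no repeated character."""
    last_seen = {}               # character -> index of its latest occurrence
    start = 0                    # left edge of the current window
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        # If ch already occurs inside the window, move the left edge past it.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]


print(longest_unique_substring("abcabcbb"))
```

Each character is examined once, so the running time grows roughly linearly with the length of the string rather than quadratically or worse.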
I'm working on a function that detects any vowels in a given string and also checks whether the letter "g" is beside a vowel. If the letter "g" is beside a vowel, it is also treated as a vowel. I posted a similar question before and got an answer that almost works, but it came with no explanation of how it works, and no one replied to my comment asking for clarification.
Here is the function:
import re
def disemvowel(text):
    result = re.sub(r"G[AEIOU]+|[AEIOU]+G|[AEIOU]+", "", text, flags=re.IGNORECASE)
    print(result)
disemvowel("fragrance")
# frrnc
disemvowel('gargden')
# rgdn
disemvowel('gargdenag')
# rgdn
This function works in most cases, except when the letter 'g' both precedes and follows a vowel. For example, when I input 'gag' it returns 'g', although it should return an empty string. I just need clarification on how this function works and what edits I could make so that it handles all scenarios correctly.
This is my original function that I worked on but it only works for vowels since I could not figure out how to add a condition where it would detect the letter 'g' beside a vowel:
def disemvowel(text):
    text = list(text)
    new_letters = []
    for i in text:
        if i.lower() == "a" or i.lower() == "e" or i.lower() == "i" or i.lower() == "o" or i.lower() == "u":
            pass
        else:
            new_letters.append(i)
    print (''.join(new_letters))
disemvowel('fragrance')
# frgrnc
Two ways of doing it.
import re
def methodA(txt,ii):
    vowels = ['a', 'e', 'i', 'o', 'u']
    txt = re.sub('g'+ vowels[ii] +'g' ,' ', txt) # remove strings like gug
    txt = re.sub(vowels[ii] + 'g', ' ', txt) # remove strings like ug
    txt = re.sub('g' + vowels[ii] , ' ', txt) # remove strings like gu
    if (ii == len(vowels)-1 or txt == '' or txt == ' ') : # if string is empty or all vowels have been used
        txt = "".join( list(filter(lambda x:x not in ['a', 'e', 'i', 'o', 'u'], list(txt)))) # finally remove all vowels
        txt = re.sub(' ', '', txt)
        return txt
    ii = ii + 1
    return methodA(txt, ii) # call the function with next vowel in the list
ans = methodA("fragrance",0) # initialize with zeroth vowel
frrnc
from itertools import permutations
def methodB(txt):
    vowels = ['a', 'e', 'i', 'o', 'u'] # vowels to remove
    a = [] # store permutation of strings to remove in this list
    for vowel in vowels: # create the list of all combo of characters that need to be removed
        a.extend(list(map(lambda x : x[0]+x[1]+x[2] , list(permutations('g' + vowel + 'g')) )) )
        a.extend(list(map(lambda x : x[0]+x[1] , list(permutations('g' + vowel )) )))
    lims = list(set([re.sub('gg','', xx) for xx in a ])) # we don't need gg and the duplicates
    lims.sort(key = len,reverse = True) # create a list of strings sorted by length
    for ll in lims:
        txt = re.sub(ll, ' ', txt) # remove the required strings with largest ones first
    return re.sub(' ','',txt)
ans=methodB("fragrance")
frrnc
Here is a solution for this task:
import re
def disemvowel(text):
    # an optional g, one or more vowels, then another optional g
    return re.sub(r"G?[AEIOU]+G?", "", text, flags=re.IGNORECASE)
tests = {"fragrance": "frrnc", "gargden": "rgdn", "gargdenag": "rgdn", "gag": ""}
for test, value in tests.items():
    assert disemvowel(test) == value
print("PASSED")
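Since the question also asks how the pattern works, it may help to look at what each pattern actually matches. The sketch below uses re.findall to list the pieces that re.sub would delete; the helper name matches and the constants OLD and NEW are made up for illustration. With the original pattern, the first alternative G[AEIOU]+ consumes 'ga' in 'gag', and the remaining 'g' has no vowel next to it any more, so it survives. The pattern G?[AEIOU]+G? takes an optional g on both sides of the vowel run in a single match.

```python
import re

OLD = r"G[AEIOU]+|[AEIOU]+G|[AEIOU]+"   # pattern from the question
NEW = r"G?[AEIOU]+G?"                   # pattern from the answer above


def matches(pattern, text):
    """List the pieces of text that re.sub would remove."""
    return re.findall(pattern, text, flags=re.IGNORECASE)


print(matches(OLD, "gag"))        # ['ga']
print(matches(NEW, "gag"))        # ['gag']
print(matches(NEW, "gargdenag"))  # ['ga', 'e', 'ag']
```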
Basically, I need to find a line of text in python (and tk) and copy it, and the lines above and below it.
Imported_File = a long list of strings.
One_Ring = the line I'm looking for.
Find One_ring in Imported_File, Copy One_ring, the line above it and below it to Output_Var
I'm really stuck, any help would be amazing.
Assuming you are reading from a file that contains this long list of strings:
with open('textFile.txt') as f:
    Imported_File = f.readlines()
One_Ring = "This is the line\n"
Output_Var = ""
if One_Ring in Imported_File:
    idx = Imported_File.index(One_Ring)
    Output_Var = Imported_File[max(0, idx - 1):idx+2]
print(Output_Var)
The textFile.txt would be your input file. I created a sample file which looks like
Hello there
How are you
This is the line
Done, Bye
I assume that you are looking for the exact line:
imported_file = ['a', 'b', 'c', 'd', 'e', 'f']
one_ring = 'd'
if one_ring in imported_file:
    i = imported_file.index(one_ring)
    start = i-1
    end = i+2
    # problem with first element before "a" when `one_ring = 'a'`
    if start < 0:
        start = 0
    output_var = imported_file[start:end]
    print(output_var)
else:
    print('not found')
BTW: use lower_case names for variables: PEP 8 -- Style Guide for Python Code
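If the line can occur more than once, or more context is needed, the same slicing idea generalizes. The following is only a sketch; the function name lines_with_context and its before and after parameters are made up for illustration.

```python
def lines_with_context(lines, target, before=1, after=1):
    """Return one slice per match of target, with its neighbouring lines."""
    results = []
    for idx, line in enumerate(lines):
        if line == target:
            start = max(0, idx - before)   # clamp at the start of the list
            # A slice end past the last index is fine; Python truncates it.
            results.append(lines[start:idx + after + 1])
    return results


imported_file = ['a', 'b', 'c', 'd', 'e', 'f']
print(lines_with_context(imported_file, 'd'))  # [['c', 'd', 'e']]
print(lines_with_context(imported_file, 'a'))  # [['a', 'b']]
```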
def new_listes(poem_lines):
    r""" (list of str) -> list of str
    Return a list of lines from poem_lines
    """
    new_list = []
    for line in poem_lines:
        new_list.append(clean_up(line))
    return new_list
Precondition: len(poem_lines) == len(pattern[0])
Return a list of lines from poem_lines that do not have the right number of
syllables for the poetry pattern according to the pronunciation dictionary.
If all lines have the right number of syllables, return th
"""
k = ""
i=0
lst = []
for line in new_listes(poem_lines):
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                k = k + char
                if k.isdigit():
                    i=i+1
return i
So for the body, this is what I've written so far. I have a list of phonemes for each word (['N', 'EH1', 'K', 'S', 'T'] for the word next) and I would like to count how many of them contain a digit (the 1 in EH1 makes it 1 for the word next), but I get back 0.
I've tried moving my code around to different indentation but I'm not sure how to proceed from there.
If I understand your question correctly, you want to transform a line like 'The first line leads off' into a list like this:
[
    ['DH', 'AH0'], # The
    ['F', 'ER1', 'S', 'T'], # first
    ['L', 'AY1', 'N'], # line
    ['L', 'IY1', 'D', 'Z'], # leads
    ['AO1', 'F'] # off
]
and then count the number of elements that contain a digit (5 in this example: AH0, ER1, AY1, IY1, AO1).
What you were doing was building a string like:
'DH'
'DHAH0'
'DHAH0F'
>>> 'DHAH0FER1'.isdigit()
False
You need to count digits in string:
def number_count(input_string):
    return sum(int(char.isdigit()) for char in input_string)
>>> number_count('a1b2')
2
And use it in your code (you don't have to build string, you can count digits on the fly):
lst = []
for line in new_listes(poem_lines):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                i += number_count(char)
    lst.append(i)
return lst
Or do it a bit more pythonically:
for line in new_listes(poem_lines):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            i += sum(number_count(char) for char in sylables)
    yield i
Or if you want to keep your code (and build string first, before returning):
lst = []
for line in new_listes(poem_lines):
    k = "" # K needs to reset after each line
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            for char in sylables:
                k = k + char
    lst.append(number_count(k))
return lst
They should all produce a list (or a generator) yielding [5,5,4].
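For a version that can be run on its own, here is a minimal sketch. It assumes a tiny hypothetical pronunciation dictionary built from the phonemes listed earlier, and it assumes the line has already been cleaned up to upper-case words that match the dictionary keys. The function name count_syllables is made up for illustration.

```python
# Hypothetical miniature pronunciation dictionary (assumption for this sketch).
word_to_phonemes = {
    'THE': ['DH', 'AH0'],
    'FIRST': ['F', 'ER1', 'S', 'T'],
    'LINE': ['L', 'AY1', 'N'],
    'LEADS': ['L', 'IY1', 'D', 'Z'],
    'OFF': ['AO1', 'F'],
}


def count_syllables(line, word_to_phonemes):
    """Count the phonemes that contain a digit (one per syllable)."""
    return sum(
        any(char.isdigit() for char in phoneme)
        for word in line.split()
        for phoneme in word_to_phonemes[word]
    )


print(count_syllables('THE FIRST LINE LEADS OFF', word_to_phonemes))  # 5
```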
Listing only the lines that don't match
I'm not sure what pattern[1] is supposed to mean, but let's assume that for every line you want to work with line[n], pattern[0][n] and pattern[1][n]. The easiest way to do this is with zip():
for line, pattern0, pattern1 in zip(new_listes(poem_lines), pattern[0], pattern[1]):
    i = 0
    for word in line.split():
        for sylables in word_to_phonemes[word]:
            i += sum(number_count(char) for char in sylables)
    if i != pattern0:
        yield line
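As a self-contained illustration of the zip() idea, the sketch below pairs each line with its expected syllable count and collects the lines that differ. The function name lines_off_pattern, the sample dictionary and the expected counts are all hypothetical; the lines are assumed to be cleaned up already.

```python
def lines_off_pattern(lines, syllable_counts, word_to_phonemes):
    """Return the lines whose syllable count differs from the expected one."""
    bad = []
    for line, expected in zip(lines, syllable_counts):
        actual = sum(
            any(ch.isdigit() for ch in phoneme)
            for word in line.split()
            for phoneme in word_to_phonemes[word]
        )
        if actual != expected:
            bad.append(line)
    return bad


# Hypothetical data for illustration only.
word_to_phonemes = {'NEXT': ['N', 'EH1', 'K', 'S', 'T'], 'OFF': ['AO1', 'F']}
lines = ['NEXT OFF', 'OFF']
print(lines_off_pattern(lines, [2, 2], word_to_phonemes))  # ['OFF']
```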