IndexError: string index out of range, can't figure out why

In this part of my code, I want to strip any non-alphabetic characters from the words I read from a file.
I understand that the error probably happens because an empty string is being indexed,
but I can't figure out why, even after trying many different versions.
Here's what I have now:
    for i in given_file:
        cut_it_out = True
        while cut_it_out:
            if len(i) == 0:
                cut_it_out = False
            else:
                while (len(i) != 0) and cut_it_out:
                    if i.lower()[0].isalpha() and i.lower()[len(i) - 1].isalpha():
                        cut_it_out = False
                    if (not i.lower()[len(i) - 1].isalpha()):
                        i = i[:len(i) - 2]
                    if (not i.lower()[0].isalpha()):
                        i = i[1:]
Can anyone help me figure this out?
Thanks.
Edit: Thanks for the interesting answers :). I want it to be even more precise, but now there is an endless loop that I can't seem to get rid of.
Can anyone help me figure it out?
    all_words = {} # New empty dictionary
    for i in given_file:
        if "--" in i:
            split_point = i.index("--")
            part_1 = i[:split_point]
            part_2 = i[split_point + 2:]
            combined_parts = [part_1, part_2]
            given_file.insert(given_file.index(i)+2, str(part_1))
            given_file.insert(given_file.index(part_1)+1, str(part_2))
            #given_file.extend(combined_parts)
            given_file.remove(i)
            continue
        elif len(i) > 0:
            if i.find('0') == -1 and i.find('1') == -1 and i.find('2') == -1 and i.find('3') == -1 and i.find('4') == -1\
                and i.find('5') == -1 and i.find('6') == -1 and i.find('7') == -1 and i.find('8') == -1 and i.find('9') == -1:
                while not i[:1].isalpha():
                    i = i[1:]
                while not i[-1:].isalpha():
                    i = i[:-1]
                if i.lower() not in all_words:
                    all_words[i.lower()] = 1
                elif i.lower() in all_words:
                    all_words[i.lower()] += 1
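A likely cause of the endless loop, offered as a guess rather than a confirmed diagnosis: ''[:1].isalpha() is False, so once i has been trimmed down to an empty string, while not i[:1].isalpha() can never exit. Inserting into and removing from given_file while iterating over it can also make the loop skip or revisit entries. The sketch below avoids both by building a new list and guarding each loop with a non-empty check. The name count_words and its parameter are hypothetical, and given_file is assumed to be a list of strings.

```python
def count_words(words):
    """Count alphabetic words, splitting on "--" and trimming non-letters.

    `words` stands in for given_file (assumed to be a list of strings).
    """
    # Split on "--" into a new list instead of mutating the list being iterated.
    pieces = []
    for word in words:
        pieces.extend(word.split("--"))

    all_words = {}
    for piece in pieces:
        # Skip anything containing one of the digits 0-9.
        if any(ch in "0123456789" for ch in piece):
            continue
        # The "piece and" guard ends each loop once the string is empty.
        while piece and not piece[0].isalpha():
            piece = piece[1:]
        while piece and not piece[-1].isalpha():
            piece = piece[:-1]
        if piece:
            key = piece.lower()
            all_words[key] = all_words.get(key, 0) + 1
    return all_words


print(count_words(["Hello,", "well--known", "...", "abc123", "hello"]))
# {'hello': 2, 'well': 1, 'known': 1}
```

Entries made up entirely of punctuation end up empty and are simply not counted.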

I think your problem is a consequence of an overcomplicated solution.
The error was pointed out by @tobias_k. Your code can also be quite inefficient.
Try to simplify it, for example like this (I have not tested it yet):
    for i in given_file:
        beg = 0
        end = len(i) - 1
        # Move beg forward past leading non-letters
        while beg <= end and not i[beg].isalpha():
            beg = beg + 1
        # Move end backward past trailing non-letters
        while beg <= end and not i[end].isalpha():
            end = end - 1
        res = ""
        if beg <= end:
            res = i[beg:end + 1]  # end is inclusive, so slice up to end + 1

There are a few problems with your code:
The immediate problem is that the second if can strip away the last character in a string of all non-alpha characters, and then the third if will produce an exception.
If the last character is non-alpha, you strip away the last two characters.
There is no need for those two nested loops, and you can use break instead of that boolean variable.
If i.lower()[x] is non-alpha, so is i[x], so the lower() calls are unnecessary; also, it is better to use i[-1] for the last character.
After fixing those issues, but keeping the general idea the same, your code becomes
    while len(i) > 0:
        if i[0].isalpha() and i[-1].isalpha():
            break
        if not i[-1].isalpha():
            i = i[:-1]
        elif not i[0].isalpha(): # actually, just 'else' would be enough, too
            i = i[1:]
But that's still a bit hard to follow. I suggest using two loops for the two ends of the string:
    while i and not i[:1].isalpha():
        i = i[1:]
    while i and not i[-1:].isalpha():
        i = i[:-1]
Or you could just use a regular expression (after import re), something like this:
    i = re.sub(r"^[^a-zA-Z]+|[^a-zA-Z]+$", "", i)
This reads: replace every run (+) of characters that are not ([^...]) in the set a-zA-Z, located either directly after the start of the string (^) or (|) directly before its end ($), with "".
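To compare the two-loop version and the regular expression side by side, here is a small sketch that wraps each in a function. The function names strip_loops and strip_regex and the sample words are made up for illustration.

```python
import re


def strip_loops(word):
    """Trim non-letters from both ends using the two guarded loops."""
    while word and not word[0].isalpha():
        word = word[1:]
    while word and not word[-1].isalpha():
        word = word[:-1]
    return word


def strip_regex(word):
    """Same idea with the regular expression (ASCII letters only)."""
    return re.sub(r"^[^a-zA-Z]+|[^a-zA-Z]+$", "", word)


for sample in ["--hello!!", "'tis", "...", "", "don't"]:
    print(repr(sample), repr(strip_loops(sample)), repr(strip_regex(sample)))
```

Both return an empty string for input that contains no letters, so neither raises IndexError. One difference to keep in mind: str.isalpha() also accepts non-ASCII letters such as "é", while the character class a-zA-Z does not.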

Related

Write a program to check the overlapping of one string's suffix with the prefix of another string

    a = input()
    b = input()
    def longestSubstringFinder(string1, string2):
        answer = ""
        len1, len2 = len(string1), len(string2)
        for i in range(len1):
            match = ""
            for j in range(len2):
                if (i + j < len1 and string1[i + j] == string2[j]):
                    match += string2[j]
                else:
                    if (len(match) > len(answer)): answer = match
                    match = ""
        if answer == '':
            return 'No overlapping'
        else:
            return answer
    print(longestSubstringFinder(a, b))
With the code above, I am not getting the expected output for this input:
    correct
    wrong
My output: e
Expected output: No overlapping
Some issues:
The else block should not allow the inner loop to continue: when you have a mismatch, you should not try matches with higher values of j, but exit that loop and try the next value of i. So there needs to be a break in the else block.
The condition len(match) > len(answer) is not enough to identify a solution. The reason for getting into the else block might have been that the characters didn't match, and in that case you should never update answer.
On the other hand, answer is currently not updated when the inner loop ends normally, i.e. when all compared characters were equal and i + j < len1 was always true. This happens when the second input string is a suffix of the first. So you must update answer somewhere else to catch this case as well.
Here is the correction to your code, dealing with these issues:
    def longestSubstringFinder(string1, string2):
        answer = ""
        len1, len2 = len(string1), len(string2)
        for i in range(len1):
            match = ""
            for j in range(len2):
                if (i + j < len1 and string1[i + j] == string2[j]):
                    match += string2[j]
                    # Move the assignment to answer here, and add condition that
                    # this was the last character of string1:
                    if i + j == len1 - 1 and len(match) > len(answer): answer = match
                else:
                    break # Necessary!
        if answer == '':
            return 'No overlapping'
        else:
            return answer
With the use of string slicing, and comparing those slices instead of individual characters, you can make your code shorter and run faster.
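As a rough sketch of the slicing idea (not necessarily what the answerer had in mind), the function below compares whole slices, starting with the longest candidate. The name longest_overlap is hypothetical.

```python
def longest_overlap(string1, string2):
    """Return the longest suffix of string1 that is also a prefix of string2."""
    # Try the longest candidate first so the first hit is the answer.
    for size in range(min(len(string1), len(string2)), 0, -1):
        if string1[-size:] == string2[:size]:
            return string1[-size:]
    return "No overlapping"


print(longest_overlap("correct", "wrong"))    # No overlapping
print(longest_overlap("overlap", "lapping"))  # lap
```

Because the loop starts at the largest possible size and stops at 1, an empty overlap is never reported as a match.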
Using a regular expression, you can do it in fewer lines of code. I'm assuming you're a beginner in Python. If you are, it is worth learning regular expressions and list comprehensions for this type of task.
    import re
    str1, str2 = input(), input()
    def longestSubstringFinder(string1, string2):
        # All suffixes of string1, longest first
        list_of_suffixes = [string1[i:] for i in range(len(string1))]
        # Keep the suffixes found at the start of string2; re.escape guards special characters
        intersect = [slc for slc in list_of_suffixes if re.match(re.escape(slc), string2)]
        if len(intersect) == 0:
            return 'No overlapping'
        else:
            return intersect[0]
    print(longestSubstringFinder(str1, str2))
    a = str(input())
    b = str(input())
    g=len(a)-1
    prefix = "No overlapping"
    for i in range(len(a)):
        if a[(g-i):] == b[:i+1]:
            prefix = b[:i+1]
    print(prefix)

Find anagrams of a given sentence from a list of words

I have a sentence with no spaces and only lowercase letters, for example:
"johndrinksmilk"
and a list of words that contains only words that could be part of an anagram of the sentence above. The words are in alphabetical order, for example:
["drink","drinks","john","milk","milks"]
I want to create a function (without using libraries) which returns a tuple of three words that together can form the anagram of the given sentence. This tuple has to be the last possible anagram of the sentence. If the words in the given list can't be used to form the given sentence, the function should return None. Since I know I'm very bad at explaining things I'll try to give you some examples:
For example, with:
sentence = "johndrinksmilk"
g_list = ["drink","drinks","john","milk","milks"]
the result should be:
r_result = ("milks","john","drink")
while these results should be wrong:
w_result = ("drinks","john","milk")
w_result = None
w_result = ("drink","john","milks")
I tried this:
    def find_anagram(sentence, g_list):
        g_list.reverse()
        for fword in g_list:
            if g_list.index(fword) == len(g_list)-1:
                break
            for i in range(len(fword)):
                sentence_1 = sentence.replace(fword[i],"",1)
            if sentence_1 == "":
                break
            count2 = g_list.index(fword)+1
            for sword in g_list[count2:]:
                if g_list.index(sword) == len(g_list)-1:
                    break
                for i in range(len(sword)):
                    if sword.count(sword[i]) > sentence_1.count(sword[i]):
                        break
                    else:
                        sentence_2 = sentence_1.replace(sword[i],"",1)
                count3 = g_list.index(sword)+1
                if sentence_2 == "":
                    break
                for tword in g_list[count3:]:
                    for i in range(len(tword)):
                        if tword.count(tword[i]) != sentence_2.count(tword[i]):
                            break
                    else:
                        return (fword,sword,tword)
        return None
but instead of returning:
("milks","john","drink")
it returns:
None
Can anyone please tell me what's wrong? If you think my function is bad feel free to show me a different approach (but still without using libraries), because I have the feeling my function is both complex and very slow (and wrong of course...).
Thanks for your time.
Edit: new examples as requested.
sentence = "markeatsbread"
a_list = ["bread","daerb","eats","kram","mark","stae"] #these are all the possible words
the correct result is:
result = ["stae","mark","daerb"]
wrong results should be:
result = ["mark","eats","bread"] #this could be a possible anagram, but I need the last possible one
result = None #can't return None because there's at least one anagram
Try this and see if it works with all of your cases:
    def findAnagram(sentence, word_list):
        word_list.reverse()
        for f_word in word_list:
            if word_list[-1] == f_word:
                break
            index1 = word_list.index(f_word) + 1
            for s_word in word_list[index1:]:
                if word_list[-1] == s_word: break
                index2 = word_list.index(s_word) + 1
                for t_word in word_list[index2:]:
                    if (sorted(list(f_word + s_word + t_word)) == sorted(list(sentence))):
                        return (f_word, s_word, t_word)
Hopefully this helps you.
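The same search can be written with index loops that walk the list from the end, which leaves the caller's list untouched and makes the None case explicit. This is only a sketch under the assumption that "last possible anagram" means the first triple found when scanning the alphabetical list backwards, which matches both examples in the question. The name find_last_anagram is hypothetical, and only built-ins are used.

```python
def find_last_anagram(sentence, words):
    """Return the last triple (scanning the list backwards) that spells sentence."""
    target = sorted(sentence)
    n = len(words)
    # Walk indices from the end so the input list is not reversed in place.
    for i in range(n - 1, 1, -1):
        for j in range(i - 1, 0, -1):
            for k in range(j - 1, -1, -1):
                # Same letters in the same quantities means an anagram.
                if sorted(words[i] + words[j] + words[k]) == target:
                    return (words[i], words[j], words[k])
    return None


print(find_last_anagram("johndrinksmilk",
                        ["drink", "drinks", "john", "milk", "milks"]))
# ('milks', 'john', 'drink')
```

With fewer than three words the loops never run and the function returns None.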

Comparing spaces in python

I am creating a cipher script in Python without any modules, but I have come across a problem that I can't solve. msg[3] is a space, so it should be equal to bet[26], which is also a space. If I compare msg[3] with bet[26] in the shell...
    >>> msg[3] == bet[26]
    True
The output is True. However, when I run the program and print enmsg, the value 26 is missing where it should be.
    enmsg = []
    msg = "try harder"
    bet = "abcdefghijklmnopqrstuvwxyz "
    for x in range(0, len(msg)):
        for i in range(0, 26):
            if msg[x] == bet[i]:
                print(msg[x])
                enmsg.append(i)
You should get out of the habit of iterating over a range of indices and then looking up the value at the index. Instead iterate directly over your iterables, using enumerate when necessary.
    enmsg = []
    msg = "try harder"
    bet = "abcdefghijklmnopqrstuvwxyz "
    for msg_char in msg:
        for index, bet_char in enumerate(bet):
            if msg_char == bet_char:
                print(msg_char)
                enmsg.append(index)
Your inner loop stops too early: range(0, 26) ends at index 25, so it never reaches the space at index 26.
Try this:
    enmsg = []
    msg = "try harder"
    bet = "abcdefghijklmnopqrstuvwxyz "
    for x in range(0, len(msg)):
        for i in range(len(bet)):
            if msg[x] == bet[i]:
                print(msg[x])
                enmsg.append(i)
The upper bound of range is not inclusive; you'll need to extend it by one to actually check index 26 of the string. Better yet, iterate up to len(bet), as you did with len(msg) for the outer loop.
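Another way to avoid the hard-coded 26 altogether is to build a lookup table from the alphabet string once, so every character in bet, including the trailing space, is covered automatically. The sketch below uses a hypothetical function name, encode, and keeps the original behaviour of silently skipping characters that are not in bet.

```python
def encode(msg, bet="abcdefghijklmnopqrstuvwxyz "):
    """Map each character of msg to its position in the alphabet string bet."""
    # Build the lookup once; it covers every character in bet,
    # including the space at index 26.
    positions = {char: index for index, char in enumerate(bet)}
    # Characters missing from bet are skipped, as in the original loops.
    return [positions[char] for char in msg if char in positions]


print(encode("try harder"))
# [19, 17, 24, 26, 7, 0, 17, 3, 4, 17]
```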

closed or open parentheses checker

I am trying to write a program to determine whether or not a mathematical expression contains matching parentheses. I need to check whether it has the same number of left and right parentheses, and from that determine whether any are left open. But I'm not sure how to do that. After I enter the expression, nothing is printed. I know there is a better way to find out if they are closed, but I can't figure it out.
An example of an expression to check would be (5+5)/(5+2-5), or something like that.
    def main():
        left = 0
        right = 0
        even = (0 or 2 or 4 or 6 or 8 or 10 or 12 or 14) #is there a better way to check if they match rather than doing even or odd?
        odd = (1 or 3 or 5 or 7 or 9 or 11 or 13 or 15)
        if expression == "(":
            left += 1
        elif expression == ")":
            right -= 1
        expression = input("write a mathematical expression with parentheses") #lets user input mathematical expression to evaluate for correct number of parentheses
        parentheses = (left + right) #this is probably not the most efficient way, I just want to find out if my parentheses match, so suggestions here would help
        if parentheses == even:
            print("Your parentheses are closed")
        if parentheses == odd:
            print("You are missing a parenthese")
    main()
There are a couple of issues in your code:
1. You need to iterate over each character in your input string (expression) to see if it's a parenthesis or something else, and then count them.
2. You need to do step 1 before deciding whether the parentheses are closed or open, so the input has to be read before the counting starts.
3. Not a problem per se, but you can test whether a number is even with the modulo operator %, rather than defining even and odd variables. Note that (0 or 2 or 4 ...) simply evaluates to 2, and (1 or 3 ...) to 1. An even total does not guarantee a match, though (for example "(("), so the code below compares the two counts directly.
4. There are a couple of inefficiencies and unnecessary variables in your code.
The code below should do what you're after:
    def main():
        left = 0
        right = 0
        expression = input("write a mathematical expression with parentheses: ")
        for character in expression:
            if character == '(':
                left = left + 1
            elif character == ')':
                right = right + 1
        if left == right:
            print("Your parentheses are closed")
        else:
            # Fewer ")" than "(" means a right parenthesis is missing, and vice versa
            missing = "right" if right < left else "left"
            print("You are missing a " + missing + " parenthesis")
    main()
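Counting alone cannot tell ")(" apart from "()", since both contain one parenthesis of each kind. If the order matters, one common approach is to track the nesting depth while scanning. Here is a minimal sketch; the function name parentheses_balanced is made up for this example.

```python
def parentheses_balanced(expression):
    """Return True if every ")" closes an earlier "(" and none stay open."""
    depth = 0
    for character in expression:
        if character == "(":
            depth += 1
        elif character == ")":
            depth -= 1
            # A ")" with nothing open to close: unbalanced no matter what follows.
            if depth < 0:
                return False
    # Anything still open at the end is missing its ")".
    return depth == 0


print(parentheses_balanced("(5+5)/(5+2-5)"))  # True
print(parentheses_balanced(")("))             # False
```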

Binary search code not working

Good afternoon everyone,
I'm trying to search a list of names that is already sorted in alphabetical order. I can't figure out why my program isn't working. Any tips or pointers would be nice. Thanks.
    def main():
        names = ['Ava Fiscer', 'Bob White', 'Chris Rich', 'Danielle Porter', 'Gordon Pike', 'Hannah Beauregard', 'Matt Hoyle', 'Ross Harrison', 'Sasha Ricci', 'Xavier Adams']
        input('Please enter the name to be searched: ', )
        binarySearch
    main()
    def binarySearch(names):
        first = 0
        last = len(names) - 1
        position = -1
        found = False
        while not found and first <= last:
            middle = (first + last) / 2
            if names[middle] == value:
                found = True
                position = middle
            elif arr[middle] > value:
                last = middle -1
            else:
                first = middle + 1
        return position
What do you mean by "the program isn't working"? Do you get an error, or are the results wrong?
The code you pasted has several indentation problems. Besides that, these lines:
    input('Please enter the name to be searched: ', )
    binarySearch
run without a syntax error but do nothing useful: the value returned by input() is thrown away (and the trailing comma is redundant), and a bare function name does not call the function. Inside binarySearch, value and arr are never defined; value should be passed in as a parameter and arr should be names. The search logic itself looks fine, although the boundaries can always be tricky. Here is a working version, in case you find it helpful (the names are numbers here, but that is irrelevant):
    names = [1,2,4,5,6,8,9]
    def bs(n):
        start = 0
        end = len(names)
        while end - start > 0:
            m = (start+end)/2
            if names[m] == n:
                return m
            elif n < names[m]:
                end = m
            else:
                start = m + 1
        return -1
    print (bs(1))
    print (bs(6))
    print (bs(9))
    print (bs(3))
    print (bs(10))
    print (bs(-8))
Another thing I would like to point out is that this kind of binary search is already in the python standard library, the bisect module. However, if you are writing your own for practice or for any other reason that is just fine.
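For reference, here is roughly what a lookup with the bisect module could look like on the list from the question. This is a sketch, not the only way to use the module; the wrapper name binary_search is hypothetical, while bisect_left is the standard library function.

```python
from bisect import bisect_left


def binary_search(names, value):
    """Return the index of value in the sorted list names, or -1 if absent."""
    # bisect_left returns the leftmost position where value could be inserted
    # while keeping the list sorted.
    index = bisect_left(names, value)
    if index < len(names) and names[index] == value:
        return index
    return -1


names = ['Ava Fiscer', 'Bob White', 'Chris Rich', 'Danielle Porter',
         'Gordon Pike', 'Hannah Beauregard', 'Matt Hoyle', 'Ross Harrison',
         'Sasha Ricci', 'Xavier Adams']
print(binary_search(names, 'Matt Hoyle'))  # 6
print(binary_search(names, 'Nobody'))      # -1
```

The extra comparison after bisect_left is needed because the function reports an insertion point even when the value is not in the list.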
If you are using Python 3.*, you will want to change
    m = (start+end)/2
to
    m = (start+end)//2
In Python 3, / always returns a float, which cannot be used as a list index; // performs integer (floor) division.

Resources