Having trouble with str.find() - string

I'm trying to use the str.find() and it keeps raising an error, what am I doing wrong?
import codecs
def countLOC(inFile):
    """ Receives a file and then returns the amount
    of actual lines of code by not counting commented
    or blank lines """
    LOC = 0
    for line in inFile:
        if line.isspace():
            continue
        comment = line.find('#')
        if comment > 0:
            for letter in range(comment):
                if not letter.whitespace:
                    LOC += 1
                    break
    return LOC
if __name__ == "__main__":
    while True:
        file_loc = input("Enter the file name: ").strip()
        try:
            source = codecs.open(file_loc)
        except:
            print ("**Invalid filename**")
        else:
            break
    LOC_count = countLOC(source)
    print ("\nThere were {0} lines of code in {1}".format(LOC_count,source.name))
Error:
  File "C:\Users\Justen-san\Documents\Eclipse Workspace\countLOC\src\root\nested\linesOfCode.py", line 12, in countLOC
    comment = line.find('#')
TypeError: expected an object with the buffer interface

Use the built-in function open() instead of codecs.open().
You're running afoul of the difference between non-Unicode (Python 3 bytes, Python 2 str) and Unicode (Python 3 str, Python 2 unicode) string types. Python 3 won't convert automatically between non-Unicode and Unicode like Python 2 will. Using codecs.open() without an encoding parameter returns an object which yields bytes when you read from it.
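The bytes/str mismatch can be reproduced without codecs at all. The sketch below is illustrative only: it writes a throwaway file with tempfile (the file contents are made up), opens it in binary mode as a stand-in for a reader that yields bytes, and then opens it again in text mode.

```python
import os
import tempfile

# Hypothetical sample file, created only for this demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
    tmp.write("x = 1  # a comment\n")
    path = tmp.name

# Binary mode yields bytes; searching bytes for a str raises TypeError.
with open(path, "rb") as f:
    raw_line = f.readline()
try:
    raw_line.find("#")
    bytes_search_failed = False
except TypeError:
    bytes_search_failed = True

# Text mode yields str, so find('#') works as expected.
with open(path, "r") as f:
    text_line = f.readline()
comment_pos = text_line.find("#")

os.remove(path)
print(type(raw_line).__name__, bytes_search_failed)
print(type(text_line).__name__, comment_pos)
```

Searching the bytes line with b'#' instead of '#' would also work, but reading the file as text keeps the rest of the string handling unchanged.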
Also, your countLOC function won't work:
for letter in range(comment):
    if not letter.whitespace:
        LOC += 1
        break
That for loop will iterate over the numbers from zero to one less than the position of '#' in the string (letter = 0, 1, 2...); whitespace isn't a method of integers, and even if it were, you're not calling it.
Also, you're never incrementing LOC if the line doesn't contain #.
A "fixed" but otherwise faithful (and inefficient) version of your countLOC:
def countLOC(inFile):
    LOC = 0
    for line in inFile:
        if line.isspace():
            continue
        comment = line.find('#')
        if comment >= 0:  # find() returns -1 when there is no '#'
            for letter in line[:comment]:
                if not letter.isspace():
                    LOC += 1
                    break
        else:
            LOC += 1
    return LOC
How I might write the function:
def count_LOC(in_file):
    loc = 0
    for line in in_file:
        line = line.lstrip()
        if len(line) > 0 and not line.startswith('#'):
            loc += 1
    return loc
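To check a counter like this without touching the disk, it can be fed an io.StringIO, which iterates line by line like an open text file. The function is repeated here so the snippet runs on its own; the sample source text is made up.

```python
import io

def count_LOC(in_file):
    loc = 0
    for line in in_file:
        line = line.lstrip()
        if len(line) > 0 and not line.startswith('#'):
            loc += 1
    return loc

# Five lines: two comments, one blank line, two lines of code.
sample = io.StringIO(
    "# leading comment\n"
    "\n"
    "x = 1\n"
    "    # indented comment\n"
    "y = 2  # trailing comment still counts as code\n"
)
result = count_LOC(sample)
print(result)
```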

Are you actually passing an open file to the function? Maybe try printing type(file) and type(line), as there's something fishy here -- with an open file as the argument, I just can't reproduce your problem! (There are other bugs in your code but none that would cause that exception). Oh btw, as best practice, DON'T use names of builtins, such as file, for your own purposes -- that causes incredible amounts of confusion!

Related

Problem with decoding bytes to text and comparison?

I do not understand why when you open a document in bytes format with the 'open' function and decode it to text, when compared to a variable that contains exactly the same text, python says they are different. But that only happens when the decoded text of the document has line breaks.
example:
o = open('New.py','rb')
t = o.read().decode()
x = '''this is a
message for test'''
if t == x:
    print('true')
else:
    print('false')
Although the decoded text 't' and the text of the 'x' are exactly the same, python recognizes them as different and prints false.
I have really tried to find the difference in many ways but I still don't understand how they differ and how I can convert 't' to equal 'x'?
It's because the file ends with a line break (\n) that x does not have; it is part of the decoded string even if you don't see it.
import binascii
o = open('new.py','rb')
t = o.read().decode()
print(binascii.hexlify(t.encode()))
# b'7468697320697320610a6d65737361676520666f7220746573740a'
x = '''this is a
message for test'''
print(binascii.hexlify(x.encode()))
# b'7468697320697320610a6d65737361676520666f722074657374'
Here, the 0x0a at the end of t is the byte for the newline character.
To make them the same, you need to strip the leading and trailing whitespace (including newlines) from t. Assuming new.py looks like this (same as the value for x):
this is a
message for test
Then just do this:
o = open('new.py','rb')
t = o.read().decode().strip()
x = '''this is a
message for test'''
if t == x:
    print('true')
else:
    print('false')
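When two strings look identical but compare unequal, printing them with repr() usually exposes the difference. A minimal sketch, using an in-memory bytes value in place of the file contents (the value is assumed to match the hex dump above):

```python
# Stand-in for the bytes read from the file; note the trailing newline.
file_bytes = b"this is a\nmessage for test\n"
t = file_bytes.decode()
x = '''this is a
message for test'''

print(repr(t))   # shows the trailing '\n'
print(repr(x))
same_before = (t == x)
# rstrip('\n') removes only trailing newlines, leaving other whitespace intact.
same_after = (t.rstrip('\n') == x)
print(same_before, same_after)
```

Using rstrip('\n') rather than strip() is a narrower fix: it leaves any leading spaces or tabs in the file untouched.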

reduce the number of IF statements in Python

I have written a function that is going to have up to 72 IF statements, and I was hoping to write code that will be much shorter, but I have no idea where to start.
The function reads the self.timeselect variable when a radio button is selected and the result is saved to a text file called missing_time.txt. If the result is equal to 1 then "0000" is saved to the file, if the result is 2 then "0020" is saved, etc. There are 72 possible combinations.
Is there a smarter way to simplify the function ?
def buttonaction():
    selectedchoice = ""
    if self.timeselect.get() == 1:
        selectedchoice = "0000"
        orig_stdout = sys.stdout
        f = open('missing_time.txt', 'w')
        sys.stdout = f
        print(selectedchoice)
        f.close()
    if self.timeselect.get() == 2:
        selectedchoice = "0020"
        orig_stdout = sys.stdout
        f = open('missing_time.txt', 'w')
        sys.stdout = f
        print(selectedchoice)
        f.close()
self.timeselect = tkinter.IntVar()
self.Radio_1 = tkinter.Radiobutton(text="0000",variable =
    self.timeselect,indicator = 0 ,value=1)
self.Radio_1.place(x=50,y=200)
self.Radio_2 = tkinter.Radiobutton(text="0020",variable =
    self.timeselect,indicator = 0 ,value=2)
self.Radio_2.place(x=90,y=200)
choice_map = {
    1 : "0000",
    2 : "0020"
}
def buttonaction():
    selected = self.timeselect.get()
    if 0 < selected < 73: # This works as intended in Python
        selectedchoice = choice_map[selected]
        # Do you intend to append to file instead of replacing it?
        # See text below.
        with open("missing_time.txt", 'w') as outfile:
            outfile.write(selectedchoice + "\n")
        print(selectedchoice)
Better yet, if there is a pattern that relates the value of self.timeselect.get() to the string that you write out, generate selectedchoice directly from that pattern instead of using a dictionary to do the mapping.
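If the 72 values are clock times in 20-minute steps (an assumption based on 1 mapping to "0000" and 2 to "0020"; the question does not confirm it), the string can be computed rather than looked up. A sketch with a hypothetical helper name:

```python
def choice_to_time(selected, step_minutes=20):
    """Map 1..72 to 'HHMM' strings, assuming 20-minute steps from midnight."""
    if not 0 < selected < 73:
        raise ValueError("choice out of range: {}".format(selected))
    # Choice 1 is midnight, so the offset is (selected - 1) steps.
    hours, minutes = divmod((selected - 1) * step_minutes, 60)
    return "{:02d}{:02d}".format(hours, minutes)

print(choice_to_time(1), choice_to_time(2), choice_to_time(4), choice_to_time(72))
```

If the real values follow a different rule, only the arithmetic inside the helper would need to change.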
Edit
I find it a bit odd that you are clearing the file "missing_time.txt" every time you call buttonaction. If your intention is to append to it, change the file mode accordingly.
Also, instead of opening and closing the file each time, you might just want to open it once and pass the file handle to buttonaction or keep it as a global, depending on how you use it.
Finally, if you do not intend to catch the KeyError from an invalid key, you can do what @Clifford suggests and use choice_map.get(selected, "some default value that does not have to be str").
All you need to do in this case is construct a string from the integer value self.timeselect.get().
selectedchoice = self.timeselect.get()
if 0 < selectedchoice < 73:
    orig_stdout = sys.stdout
    f = open('missing_time.txt', 'w')
    sys.stdout = f
    print( str(selectedchoice).zfill(4) )  # Convert choice to string with
                                           # leading zeros to 4 characters
    f.close()
Further in the interests of simplification, redirecting stdout and restoring it is a cumbersome method of outputting to a file. Instead, you can write directly to the file:
with open('missing_time.txt', 'w') as f:
    f.write(str(selectedchoice).zfill(4) + "\n")
Note that because we use the with context manager here, f is automatically closed when we leave this context so there is no need to call f.close(). Ultimately you end up with:
selectedchoice = self.timeselect.get()
if 0 < selectedchoice < 73:
    with open('missing_time.txt', 'w') as f:
        f.write( str(selectedchoice).zfill(4) + "\n" )
Even if you did use the conditionals, each one differs only in the first line, so only that part needs to be conditional; the rest can be performed once, after the conditionals. Moreover, all the conditionals are mutually exclusive, so you can use else-if:
choice = self.timeselect.get()
if choice == 1:
    selectedchoice = "0000"
elif choice == 2:
    selectedchoice = "0020"
...
if 0 < choice < 73:
    with open('missing_time.txt', 'w') as f:
        f.write(selectedchoice + "\n")
In circumstances where there is no direct arithmetic relationship between selectedchoice and the required string, or the available choices are perhaps not contiguous, it is possible to implement a switch using a dictionary:
choiceToString = {
    1: "0001",
    2: "0002",
    ...
    72: "0072",
}
selectedchoice = choiceToString.get( self.timeselect.get(), "Invalid Choice")
if selectedchoice != "Invalid Choice":
    with open('missing_time.txt', 'w') as f:
        f.write(selectedchoice + "\n")
Since there is no switch statement in Python, you can't really reduce the number of if statements. But I see two ways to optimize and reduce your code length.
First, you can use
if condition:
    ...
elif condition:
    ...
instead of
if condition:
    ...
if condition:
    ...
since self.timeselect.get() can't evaluate to more than one int.
Secondly you can wrap all the code that doesn't vary in a function.
You can get rid of selectedchoice and put
orig_stdout = sys.stdout
f = open('missing_time.txt', 'w')
sys.stdout = f
print(selectedchoice)
f.close()
in a function writeToFile(selectedOption)
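A helper along those lines might look like the sketch below. The name write_to_file and its path parameter are illustrative, and it writes to the file directly instead of redirecting sys.stdout.

```python
import os
import tempfile

def write_to_file(selected_option, path="missing_time.txt"):
    """Overwrite path with the selected option followed by a newline."""
    with open(path, "w") as f:
        f.write(selected_option + "\n")

# Usage: write into a temporary directory so nothing in the working
# directory is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "missing_time.txt")
    write_to_file("0020", target)
    with open(target) as f:
        written = f.read()
print(repr(written))
```

Each branch of the if/elif chain then only has to pick the string, and a single call to the helper follows the chain.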
I'm assuming that the values are arbitrary and there's no defined pattern. I also see that the only thing that changes in your code is the selectedChoice variable. You can use a Dictionary in such cases. A dictionary's elements are key/value pairs so you can reference the key and get the value.
dictionary = {
    1:"0000",
    2:"0020",
    3:"0300",
    4:"4000"
}
def buttonAction():
    choice = self.timeselect.get()
    if 0 < choice <= 72:  # compare the integer key, not the string value
        selectedChoice = dictionary[choice]
        f = open('missing_time.txt', 'w')
        f.write(selectedChoice + " ")
        f.close()
        print(selectedChoice)

Having Issues Concatenating Strings into list without \n - Python3

I am currently having some issues trying to append strings into a new list. However, when I get to the end, my list looks like this:
['MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH\nIIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT\nCGLANMFAYHTLGYEDLDELQKEPQPLVFVIELLQVDAPSDYQRETWNLSNHEKMKAVPV\nLHGEGNRLFKLGRYEEASSKYQEAIICLRNLQTKEKPWEVQWLKLEKMINTLILNYCQCL\nLKKEEYYEVLEHTSDILRHHPGIVKAYYVRARAHAEVWNEAEAKADLQKVLELEPSMQKA\nVRRELRLLENRMAEKQEEERLRCRNMLSQGATQPPAEPPTEPPAQSSTEPPAEPPTAPSA\nELSAGPPAEPATEPPPSPGHSLQH\n']
I'd like to remove the newlines somehow. I looked at other questions on here and most suggest to use .rstrip however in adding that to my code, I get the same output. What am I missing here? Apologies if this question has been asked.
My input also looks like this(took the first 3 lines):
>sp|Q9NZN9|AIPL1_HUMAN Aryl-hydrocarbon-interacting protein-like 1 OS=Homo sapiens OX=9606 GN=AIPL1 PE=1 SV=2
MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH
IIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT
from sys import argv
protein = argv[1] #fasta file
sequence = '' #string linker
get_line = False #False = not the sequence
Uniprot_ID = []
sequence_list =[]
with open(protein) as pn:
    for line in pn:
        line.rstrip("\n")
        if line.startswith(">") and get_line == False:
            sp, u_id, name = line.strip().split('|')
            Uniprot_ID.append(u_id)
            get_line = True
            continue
        if line.startswith(">") and get_line == True:
            sequence.rstrip('\n')
            sequence_list.append(sequence) #add the amino acids onto the list
            sequence = '' #resets the str
        if line != ">" and get_line == True: #if the first line is not a fasta ID and is it a sequence?
            sequence += line
print(sequence_list)
Per documentation, rstrip removes trailing characters – the ones at the end. You probably misunderstood others' use of it to remove \ns because typically those would only appear at the end.
To replace a character with something else in an entire string, use replace instead.
These commands do not modify your string! They return a new string, so if you want to change something 'in' a current string variable, assign the result back to the original variable:
>>> line = 'ab\ncd\n'
>>> line.rstrip('\n')
'ab\ncd' # note: this is the immediate result, which is not assigned back to line
>>> line = line.replace('\n', '')
>>> line
'abcd'
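For a multi-line sequence like the one in the question, the difference between these methods can be seen side by side. The sample string is shortened and made up; str.splitlines() with join is shown as one more option.

```python
sequence = "MDAALLLNVEG\nIIIGNMFKLEV\nELSAGPPAEPA\n"

trailing_only = sequence.rstrip("\n")      # only the final newline goes
all_removed = sequence.replace("\n", "")   # every newline goes
joined = "".join(sequence.splitlines())    # same result via splitlines

print(repr(trailing_only))
print(repr(all_removed))
print(sequence.endswith("\n"))  # the original string is unchanged
```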
When I asked this question I had not taken the time to read the documentation and understand my code. After looking, I realized two things:
My code isn't actually getting what I am interested in.
For the specific question I asked, I could have simply used line.strip() to remove the '\n'.
from sys import argv
protein = argv[1] #fasta file
sequence = '' #string linker
get_line = False #False = not the sequence
Uniprot_ID = []
sequence_list = []
uni_seq = {}
"""this block of code takes a uniprot FASTA file and creates a
dictionary with the key as the uniprot id and the value as a sequence"""
with open (protein) as pn:
    for line in pn:
        if line.startswith(">"):
            if get_line == False:
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                get_line = True
            else:
                uni_seq[u_id] = sequence
                sequence_list.append(sequence)
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                sequence = ''
        else:
            if get_line == True:
                sequence += line.strip() # removes the newline space
# store the final record, which has no header line after it
uni_seq[u_id] = sequence
sequence_list.append(sequence)
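The same idea can be wrapped in a function so it is easier to test. This is a sketch, not the poster's code: the name read_fasta is hypothetical, it assumes every header looks like >sp|ID|description, and it is fed an io.StringIO with made-up records rather than a real file.

```python
import io

def read_fasta(handle):
    """Return {uniprot_id: sequence} from an iterable of FASTA lines."""
    sequences = {}
    current_id = None
    parts = []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if current_id is not None:
                sequences[current_id] = "".join(parts)
            # Header assumed to be '>sp|ID|description'.
            current_id = line.split("|")[1]
            parts = []
        elif line and current_id is not None:
            parts.append(line)
    if current_id is not None:
        sequences[current_id] = "".join(parts)  # store the last record
    return sequences

sample = io.StringIO(
    ">sp|P00001|FIRST_TEST made-up record\n"
    "MDAALL\n"
    "LNVEGV\n"
    ">sp|P00002|SECOND_TEST made-up record\n"
    "IIIGNM\n"
)
records = read_fasta(sample)
print(records)
```

Collecting the pieces in a list and joining once per record avoids repeated string concatenation, which matters only for very long sequences.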

Python: having trouble with for loop saving only the last object multiple times

I've made the following code to load from a newline-separated text file. It stores apple objects made up of colour, then size, then kind (each on its own line). The weird thing is that the load function works, but it returns all the loaded objects as identical to the last object loaded (although it puts the correct number of objects in the list, based on the lines in the text file). The print near the append shows the correct data being read for each object, though...
I'm not sure what is wrong and how I rectify it?
def loadInApplesTheOtherWay(filename):
    tempList = []
    #make a tempList and load apples from file into it
    with open(filename,"r") as file:
        #file goes, colour \n size \n kind \n repeat...
        lines = file.read().splitlines()
        count = 1
        newApple = apple()
        for line in lines:
            if count % 3 == 1:
                newApple.colour = line
            if count % 3 == 2:
                newApple.size = line
            if count % 3 == 0:
                newApple.kind = line
                tempList.append(newApple)
                print(newApple)
            count +=1
    return tempList
newApple is just an object reference, and the same object is appended every time.
>>> list(map(id, tempList))
The line above will show that every apple in the list has the same id. Because the last modification of newApple happens at the end of the file, every entry in tempList looks like the last apple.
To make them differ, you can copy the object before appending it, for example tempList.append(copy.deepcopy(newApple)); see https://docs.python.org/3/library/copy.html for more details.
Or you can create the object on the fly; you don't have to allocate newApple before the for loop.
def loadInApplesTheOtherWay(filename):
    tempList = []
    #make a tempList and load apples from file into it
    with open(filename,"r") as file:
        #file goes, colour \n size \n kind \n repeat...
        lines = file.read().splitlines()
        count = 1
        for line in lines:
            if count % 3 == 1:
                colour = line
            if count % 3 == 2:
                size = line
            if count % 3 == 0:
                kind = line
                newApple = Apple(colour, size, kind)
                tempList.append(newApple)
                print(newApple)
            count +=1
    return tempList
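You need to create a new apple for each record, for example by moving newApple = apple() inside the for loop so that it runs at the start of each group of three lines, rather than once before the loop.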
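The effect is easy to reproduce in isolation. In this sketch the Apple class is a bare stand-in (the original class is not shown in the question); it only contrasts reusing one object with creating one per record.

```python
class Apple:
    """Minimal stand-in for the apple class in the question."""
    def __init__(self, colour=None):
        self.colour = colour

colours = ["red", "green", "yellow"]

# Reusing one object: every list entry refers to the same Apple.
shared = Apple()
aliased = []
for colour in colours:
    shared.colour = colour
    aliased.append(shared)

# Creating a new object per record: the entries are independent.
distinct = [Apple(colour) for colour in colours]

print([a.colour for a in aliased])
print([a.colour for a in distinct])
print(len({id(a) for a in aliased}), len({id(a) for a in distinct}))
```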

How do I sum up values from a text file in Python?

I know there are a couple of posts about this question on S.O., but they have not helped me solve my problem. I am trying to use an accumulator to sum up the values in a text file. When there is a number on each line, my code just prints each line that is in the file. When there is a blank line between the numbers, I get an error message. I think it is a simple oversight, but I am new to Python so I am not sure what I am doing wrong.
My code:
def main():
    #Open a file named numbers.txt
    numbers_file = open('numbers.txt','r')
    #read the numbers on the file
    number = numbers_file.readline()
    while number != '':
        #convert to integer
        int_number = int(number)
        #create accumulator
        total = 0
        #Accumulates a total number
        total += int_number
        #read the numbers on the file
        number = numbers_file.readline()
        #Print the data that was inside the file
        print(total)
    #Close the the numbers file
    numbers_file.close()
#Call the main function
main()
Inputs in the text file (with a blank line between the numbers):
100

200

300

400

500
Gives me error message:
ValueError: invalid literal for int() with base 10: '\n'
Inputs in the text file:
100
200
300
400
500
Prints:
100
200
300
400
500
You need to exclude empty lines because you can't convert them to an int(). One pythonic (EAFP) way to do this is to catch the exception and ignore (though this will silently ignore any non-number line):
with open('numbers.txt','r') as numbers_file:
    total = 0
    for line in numbers_file:
        try:
            total += int(line)
        except ValueError:
            pass
print(total)
Or you can explicitly test that you don't have an empty string after you .strip() all the whitespace (this would still error for a non-numeric line, e.g. 'hello'):
with open('numbers.txt','r') as numbers_file:
    total = 0
    for line in numbers_file:
        if line.strip():
            total += int(line)
print(total)
This second one can be written as a generator expression:
with open('numbers.txt','r') as numbers_file:
    total = sum(int(line) for line in numbers_file if line.strip())
print(total)
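A middle ground between the two approaches is to skip blank lines but still report lines that are not numbers. A sketch, with a hypothetical function name and an io.StringIO standing in for numbers.txt:

```python
import io

def sum_numbers(lines):
    """Sum integer lines; skip blanks and collect lines that fail to parse."""
    total = 0
    rejected = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # blank line
        try:
            total += int(text)
        except ValueError:
            rejected.append(text)
    return total, rejected

sample = io.StringIO("100\n\n200\nhello\n300\n")
total, rejected = sum_numbers(sample)
print(total, rejected)
```

Returning the rejected lines leaves it to the caller to decide whether bad input should be ignored, logged, or treated as an error.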
You are assigning the value 0 to your accumulator each time you go through the loop, before you add the new value. This means you're adding the new value to 0 each time, which means you're just printing the new value.
If you move the line total = 0 to occur before the loop, then it should work as you were hoping.
If you want, you can clean this up a little:
numbers_file = open('numbers.txt','r')
total = 0
for number in numbers_file:
    if number.strip():
        int_number = int(number)
        total += int_number
print(total)
numbers_file.close()
would be a first pass. The check if number.strip() is true only when the line contains something other than whitespace, so blank lines (which are read as '\n') are skipped.
Hi, your code does not handle the newline character \n: a blank line is read as '\n', which cannot be converted to a number.
To ensure you only get literals that can be converted to numbers, you have to strip the other characters.
With e.g.
a = '100\ntest'
print(a.isnumeric())  # False
a = '103478'
print(a.isnumeric())  # True
you can test whether there is a character that prevents conversion to a number.
The regular expression package (re) lets you manipulate strings easily.
See this Stack Overflow thread.
import re
a = 'jkfads1000ki'
re.sub(r'\D','',a)
'1000'
See the Python docs on re.
