Problem with decoding bytes to text and comparison? - python-3.x

I do not understand why, when I open a document in bytes mode with the open function and decode it to text, Python says it differs from a variable that contains exactly the same text. This only happens when the decoded text of the document has line breaks.
Example:
o = open('New.py','rb')
t = o.read().decode()
x = '''this is a
message for test'''
if t == x:
    print('true')
else:
    print('false')
Although the decoded text t and the text of x look exactly the same, Python treats them as different and prints false.
I have tried to find the difference in many ways, but I still don't understand how they differ. How can I convert t so that it equals x?

It's because the file ends with a line break, which is still part of the decoded string (represented as \n) even if you don't see it.
import binascii
o = open('new.py','rb')
t = o.read().decode()
print(binascii.hexlify(t.encode()))
# b'7468697320697320610a6d65737361676520666f7220746573740a'
x = '''this is a
message for test'''
print(binascii.hexlify(x.encode()))
# b'7468697320697320610a6d65737361676520666f722074657374'
Here, the 0x0a at the end of t is the byte for the newline character.
To make them the same, you need to strip the surrounding whitespace and newlines from t. Assuming new.py looks like this (same as the value for x):
this is a
message for test
Then just do this:
o = open('new.py','rb')
t = o.read().decode().strip()
x = '''this is a
message for test'''
if t == x:
    print('true')
else:
    print('false')
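When two strings look identical but compare unequal, printing them with repr() makes hidden characters such as \n and \r visible. The following is a minimal sketch, not part of the original answer: the helper name read_text_normalized and the temporary file standing in for new.py are assumptions made for the demo. It also normalizes \r\n line endings, which can show up when a file saved on Windows is read in binary mode.

```python
import os
import tempfile


def read_text_normalized(path, encoding='utf-8'):
    """Read a file as bytes, decode it, and normalize its line endings.

    Hypothetical helper for illustration only.
    """
    with open(path, 'rb') as f:
        text = f.read().decode(encoding)
    # Binary mode performs no newline translation, so do it by hand,
    # then drop the trailing newline(s) that most editors append.
    return text.replace('\r\n', '\n').replace('\r', '\n').rstrip('\n')


# A temporary file stands in for new.py (Windows-style line endings).
with tempfile.NamedTemporaryFile('wb', suffix='.py', delete=False) as tmp:
    tmp.write(b'this is a\r\nmessage for test\r\n')

with open(tmp.name, 'rb') as f:
    raw = f.read().decode()
normalized = read_text_normalized(tmp.name)
os.remove(tmp.name)

x = '''this is a
message for test'''

print(repr(raw))          # repr() exposes the hidden \r and \n characters
print(repr(normalized))
print(raw == x, normalized == x)
```

Unlike strip(), this sketch keeps leading whitespace intact and only removes trailing newlines, which may matter if indentation in the file is significant.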

Related

Sorting sentences of text file by users input

My code sorts the sentences of a file by their length and saves the result to a new file.
How can I alter my code so that if the user inputs a number at program start, the lines are filtered based on that input?
Example: if the user inputs 50, the program sorts only the sentences longer than 50 characters; if the user inputs all, the program sorts all lines as normal.
My code:
file = open("testing_for_tools.txt", "r")
lines_ = file.readlines()
#print(lines_)
user_input = input("enter")
if user_input is int:
    lines = sorted(lines_, key=len)
else:
    lines = sorted(lines_, key=len)
# lines.sort()
file_out = open('testing_for_tools_sorted.txt', 'w')
file_out.write(''.join(lines)) # Write a sequence of strings to a file
file_out.close()
file.close()
print(lines)
input always returns a string. If you want an integer or some such, you need to parse it explicitly; you will never get an integer out of input.
is is not a type-testing primitive in Python, it's an identity primitive. It checks whether the left and right operands are the same object and that's it.
filter is what you're looking for here, or a list comprehension: if the user provided an input and that input is a valid integer, you want to filter the lines to only those above the specified length. This is a separate step from sorting.
That aside,
you should use with to manage files unless there are specific reasons that you shouldn't or can't
files have a writelines method which should be more efficient than writing joined lines
never ever open files in text mode without providing an encoding; otherwise Python asks the system for an encoding, and it's easy for that system to be misconfigured or oddly configured, leading to garbage inputs
with open("testing_for_tools.txt", "r", encoding='utf-8') as f:
    lines_ = f.readlines()
#print(lines_)
user_input = input("enter")
if user_input:
    try:
        limit = int(user_input.strip())
    except ValueError:
        pass  # not a number (e.g. "all"): keep every line
    else:
        # keep only the lines whose length (newline included) is at least limit
        lines_ = (l for l in lines_ if len(l) >= limit)
lines = sorted(lines_, key=len)
with open('testing_for_tools_sorted.txt', 'w', encoding='utf-8') as f:
    f.writelines(lines)
print(lines)
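The filter-then-sort logic can also be pulled into a function so it can be tried without any files. This is only a sketch under stated assumptions: the name filter_and_sort and the sample lines are hypothetical, and, following the wording of the question, it keeps lines strictly longer than the limit and measures length without the trailing newline.

```python
def filter_and_sort(lines, user_input):
    """Sort lines by length, optionally keeping only lines longer than a limit.

    Hypothetical helper: user_input is the raw string returned by input().
    """
    try:
        limit = int(user_input.strip())
    except ValueError:
        limit = None  # e.g. "all": no filtering

    if limit is not None:
        # Filtering is a separate step from sorting.
        lines = [line for line in lines if len(line.rstrip('\n')) > limit]
    return sorted(lines, key=len)


# Made-up sample data standing in for the contents of the input file.
sample = ["a long sentence here\n", "short\n", "medium one\n"]

print(filter_and_sort(sample, "all"))
print(filter_and_sort(sample, "6"))
```

Keeping input() and the file handling outside the function makes the behavior easy to check with plain lists.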
#Black Snow
I don't have anything else to add if it's working as expected.
This is a rather long answer:
idx_to_sort = [True if len(i)>int(user_input) else False for i in lines_]
idx_to_sort
lines_to_sort = []
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_to_sort.append(lines_[i])
lines_to_sort
lines = sorted(lines_to_sort, key=len)
lines
counter=0
for i in range(len(idx_to_sort)):
    if idx_to_sort[i]:
        lines_[i] = lines[counter]
        counter += 1
lines_
The output would be different but not what you expected.

If statement gets skipped, while only the else statement gets printed. And how do I store a string or int in a single variable?

I was trying to do an exercise which asked us to solve the following problem:
(Exercise problem image)
I tried to do it without using the exact keywords shown in the exercise.
Here is my code:
def StringLength(length_of_String):
    return len(text)
text = input("length_of_String :")
if type(text) == int:
    print ("python doesn't show length of integers")
else :
    print (len(text))
The problem I get is this: if I enter any text, say "joker", it outputs the length as 5, which is correct.
But when I type an integer or float, say 101, it still prints the length as 3, because the input is read as a string.
So how can I have a variable that, when I input an integer or a string, recognises the value as an integer or as a string?
some_variable = input() will always give you a string. You may want to modify your code:
def is_number(s):
    try:
        float(s)
        return True
    except ValueError:
        return False
def StringLength():
    text = input('Enter:')
    if is_number(text):
        print ("python doesn't show length of integers")
    else :
        return(len(text))
#StringLength() #Remove the '#' at the start of the line to test the function
Edit: I have added a function to test if the entered value is a number or not
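For the second half of the question (keeping either a number or a string in one variable), one possible approach is to try the conversions in turn and fall back to the original text. The sketch below is an illustration only; parse_value and describe are hypothetical names, not part of the exercise. Note that float() also accepts strings such as 'nan' and 'inf', so those would be treated as numbers here.

```python
def parse_value(text):
    """Return an int or float if text looks numeric, else the original string."""
    for convert in (int, float):
        try:
            return convert(text)
        except ValueError:
            continue  # not this kind of number; try the next conversion
    return text


def describe(text):
    """Report the length of a string, or say that the input was a number."""
    value = parse_value(text)
    if isinstance(value, (int, float)):
        return "number: {!r}".format(value)
    return "string of length {}".format(len(value))


print(describe("joker"))
print(describe("101"))
print(describe("3.5"))
```

The variable value can hold an int, a float or a str; isinstance is then the usual way to check which one it is.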

Having Issues Concatenating Strings into list without \n - Python3

I am currently having some issues trying to append strings into a new list. However, when I get to the end, my list looks like this:
['MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH\nIIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT\nCGLANMFAYHTLGYEDLDELQKEPQPLVFVIELLQVDAPSDYQRETWNLSNHEKMKAVPV\nLHGEGNRLFKLGRYEEASSKYQEAIICLRNLQTKEKPWEVQWLKLEKMINTLILNYCQCL\nLKKEEYYEVLEHTSDILRHHPGIVKAYYVRARAHAEVWNEAEAKADLQKVLELEPSMQKA\nVRRELRLLENRMAEKQEEERLRCRNMLSQGATQPPAEPPTEPPAQSSTEPPAEPPTAPSA\nELSAGPPAEPATEPPPSPGHSLQH\n']
I'd like to remove the newlines somehow. I looked at other questions on here, and most suggest using .rstrip; however, after adding that to my code, I get the same output. What am I missing here? Apologies if this question has been asked.
My input looks like this (the first 3 lines):
>sp|Q9NZN9|AIPL1_HUMAN Aryl-hydrocarbon-interacting protein-like 1 OS=Homo sapiens OX=9606 GN=AIPL1 PE=1 SV=2
MDAALLLNVEGVKKTILHGGTGELPNFITGSRVIFHFRTMKCDEERTVIDDSRQVGQPMH
IIIGNMFKLEVWEILLTSMRVHEVAEFWCDTIHTGVYPILSRSLRQMAQGKDPTEWHVHT
from sys import argv
protein = argv[1] #fasta file
sequence = '' #string linker
get_line = False #False = not the sequence
Uniprot_ID = []
sequence_list =[]
with open(protein) as pn:
    for line in pn:
        line.rstrip("\n")
        if line.startswith(">") and get_line == False:
            sp, u_id, name = line.strip().split('|')
            Uniprot_ID.append(u_id)
            get_line = True
            continue
        if line.startswith(">") and get_line == True:
            sequence.rstrip('\n')
            sequence_list.append(sequence) #add the amino acids onto the list
            sequence = '' #resets the str
        if line != ">" and get_line == True: #if the first line is not a fasta ID and is it a sequence?
            sequence += line
print(sequence_list)
Per documentation, rstrip removes trailing characters – the ones at the end. You probably misunderstood others' use of it to remove \ns because typically those would only appear at the end.
To replace a character with something else in an entire string, use replace instead.
These commands do not modify your string! They return a new string, so if you want to change something 'in' a current string variable, assign the result back to the original variable:
>>> line = 'ab\ncd\n'
>>> line.rstrip('\n')
'ab\ncd' # note: this is the immediate result, which is not assigned back to line
>>> line = line.replace('\n', '')
>>> line
'abcd'
When I asked this question I didn't take my time looking at the documentation and understanding my code. After looking, I realized two things:
my code wasn't actually getting what I was interested in.
For the specific question I asked, I could have simply used line.strip() to remove the '\n'.
sequence = '' #string linker
get_line = False #False = not the sequence
uni_seq = {}
"""this block of code takes a uniprot FASTA file and creates a
dictionary with the key as the uniprot id and the value as a sequence"""
with open (protein) as pn:
    for line in pn:
        if line.startswith(">"):
            if get_line == False:
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                get_line = True
            else:
                uni_seq[u_id] = sequence
                sequence_list.append(sequence)
                sp, u_id, name = line.strip().split('|')
                Uniprot_ID.append(u_id)
                sequence = ''
        else:
            if get_line == True:
                sequence += line.strip() # removes the newline space
uni_seq[u_id] = sequence
sequence_list.append(sequence)
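The same idea can be written as a function that works on any iterable of lines, which makes it testable without a file on disk. This is a sketch, not the poster's final code: parse_fasta is a hypothetical name, the header layout >db|ID|description is assumed from the input shown in the question, the sequence fragments are shortened, and the second record (P00000) is made up for the demo.

```python
def parse_fasta(lines):
    """Map each UniProt ID to its sequence, joined without newlines."""
    sequences = {}
    current_id = None
    chunks = []
    for line in lines:
        line = line.strip()  # strip() returns a new string, so keep the result
        if not line:
            continue
        if line.startswith('>'):
            if current_id is not None:
                sequences[current_id] = ''.join(chunks)
            # Assumed header layout: >db|ID|description
            current_id = line.split('|')[1]
            chunks = []
        elif current_id is not None:
            chunks.append(line)
    # Store the last record once the input is exhausted.
    if current_id is not None:
        sequences[current_id] = ''.join(chunks)
    return sequences


example = [
    ">sp|Q9NZN9|AIPL1_HUMAN Aryl-hydrocarbon-interacting protein-like 1\n",
    "MDAALLLNVEGVKK\n",
    "IIIGNMFKLEVWEI\n",
    ">sp|P00000|FAKE_HUMAN made-up second record\n",
    "ACDEFG\n",
]
result = parse_fasta(example)
print(result)
```

With a real file, the call would be something like parse_fasta(open(protein)), since a file object yields its lines when iterated.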

How to decode a list of letters and rebuild the original word in Python?

I'm trying to make a makeshift encoder/decoder without using modules. My method works with single letters but not with words. I have the code set up so it encodes every letter of the word with a key you choose.
What I'm wondering is how you can decode a list of encoded numbers one by one and then rebuild the word. This would be amazing and very helpful, thanks.
P.S. I'm a beginner in Python and this is my second day, so I have tried everything I know. Also, please don't use any modules.
while True :
    option = input('Encode or Decode? : ')
    if option == 'encode':
        start = input('What word do you want to be encoded?: ')
        word = start
        key = int(input('What key would you like to use?: '))
        z=[]
        for i in word:
            encoder = ord(i)*key+key/key
            z.append(encoder)
        print(z)
    else:
        start = float(input('What encoded string do you want to be decoded?: '))
        key = int(input('What key would you like to use?: '))
        decoder = start/key
        print(chr(round(decoder)))
What you could do to decode is to type the sequence of numbers back into the code I have adjusted for you:
    else:
        x = []
        start = (input('What encoded numbers do you want to be decoded?: '))
        split_list = start.split()
        key = int(input('What key would you like to use?: '))
        for i in split_list:
            integer = int(i)
            decoder = int(integer/key)
            letter = chr(decoder)
            x.append(letter)
        print("".join(x))
start.split() splits the code into separate strings and puts them in a list, split_list. The code then checks every number in split_list and decodes the number, then turns it back into a character. It then prints the joined result of the characters.
For example, if I encode apple with key 5, then run the decoder and type 486 561 561 541 506 with key 5 it successfully returns apple.
This even works for multiple words, as I tried encoding hello world then decoding it and it was successful. I hope this helps! :)
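As a compact illustration of the round trip, the encoding and decoding steps can be written as two small functions that use no modules. This is a sketch under assumptions rather than the code from the answer: the names encode and decode are hypothetical, key / key is written as the integer 1, and the decoder subtracts that 1 explicitly instead of relying on int() truncation, so it also works with a key of 1. The key must not be 0.

```python
def encode(word, key):
    # Same arithmetic as the question: ord(ch) * key + key / key,
    # where key / key is simply 1 (kept as an integer here).
    return [ord(ch) * key + 1 for ch in word]


def decode(numbers, key):
    # Undo the +1 first, then the multiplication, then rebuild the word.
    return ''.join(chr((n - 1) // key) for n in numbers)


codes = encode('apple', 5)
print(codes)
print(decode(codes, 5))

# Decoding from typed text: split the string, convert each piece to int.
typed = ' '.join(str(n) for n in codes)
print(decode([int(token) for token in typed.split()], 5))
```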

Having trouble with str.find()

I'm trying to use str.find() and it keeps raising an error. What am I doing wrong?
import codecs
def countLOC(inFile):
    """ Receives a file and then returns the amount
    of actual lines of code by not counting commented
    or blank lines """
    LOC = 0
    for line in inFile:
        if line.isspace():
            continue
        comment = line.find('#')
        if comment > 0:
            for letter in range(comment):
                if not letter.whitespace:
                    LOC += 1
                    break
    return LOC
if __name__ == "__main__":
    while True:
        file_loc = input("Enter the file name: ").strip()
        try:
            source = codecs.open(file_loc)
        except:
            print ("**Invalid filename**")
        else:
            break
    LOC_count = countLOC(source)
    print ("\nThere were {0} lines of code in {1}".format(LOC_count,source.name))
Error
  File "C:\Users\Justen-san\Documents\Eclipse Workspace\countLOC\src\root\nested\linesOfCode.py", line 12, in countLOC
    comment = line.find('#')
TypeError: expected an object with the buffer interface
Use the built-in function open() instead of codecs.open().
You're running afoul of the difference between non-Unicode (Python 3 bytes, Python 2 str) and Unicode (Python 3 str, Python 2 unicode) string types. Python 3 won't convert automatically between non-Unicode and Unicode like Python 2 will. Using codecs.open() without an encoding parameter returns an object which yields bytes when you read from it.
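The bytes/str mismatch can be reproduced without any file. The snippet below is a small illustrative sketch (the sample line and variable names are made up): calling find with a str argument on a bytes object raises TypeError in Python 3, while searching with a bytes pattern, or decoding first, works. The exact wording of the error message varies between Python versions.

```python
line_bytes = b"x = 1  # comment\n"   # what a file opened in binary mode yields

try:
    line_bytes.find('#')             # str pattern on a bytes object
    mixed_ok = True
except TypeError as exc:
    mixed_ok = False
    print("TypeError:", exc)

pos_bytes = line_bytes.find(b'#')                  # bytes pattern on bytes
pos_text = line_bytes.decode('utf-8').find('#')    # str pattern on str
print(pos_bytes, pos_text)
```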
Also, your countLOC function won't work:
for letter in range(comment):
    if not letter.whitespace:
        LOC += 1
        break
That for loop will iterate over the numbers from zero to one less than the position of '#' in the string (letter = 0, 1, 2...); whitespace isn't a method of integers, and even if it were, you're not calling it.
Also, you're never incrementing LOC if the line doesn't contain #.
A "fixed" but otherwise faithful (and inefficient) version of your countLOC:
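def countLOC(inFile):
    LOC = 0
    for line in inFile:
        if line.isspace():
            continue
        comment = line.find('#')
        if comment > 0:
            # count the line only if there is code before the '#'
            for letter in line[:comment]:
                if not letter.isspace():
                    LOC += 1
                    break
        elif comment < 0:
            # no '#' at all (comment == 0 means the line starts with '#')
            LOC += 1
    return LOC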
How I might write the function:
def count_LOC(in_file):
    loc = 0
    for line in in_file:
        line = line.lstrip()
        if len(line) > 0 and not line.startswith('#'):
            loc += 1
    return loc
Are you actually passing an open file to the function? Maybe try printing type(file) and type(line), as there's something fishy here -- with an open file as the argument, I just can't reproduce your problem! (There are other bugs in your code but none that would cause that exception). Oh btw, as best practice, DON'T use names of builtins, such as file, for your own purposes -- that causes incredible amounts of confusion!
