UnicodeDecodeError in decoding the torrent file - python-3.x

I am trying to write the torrent app from scratch just for learning purpose.So after few hours of reading wiki, I wrote some code for decoding of torrent files which use 'Bencoding' "https://en.wikipedia.org/wiki/Bencode" . But unfortunately, I didn't notice about byte string and python string. My code work fine with python string like torrent data, but when I passed torrent byte data, I got the encoding error.
I tried " open(file, 'rb',encoding='utf-8', errors='ignore') ". It did change the byte string into python string.I also tried all the available answers on stakoverflow. But some data were lost as errors and so I can't properly decode the torrent data. Pardon my messy coding and please help... I also read the bencoder library, and it works directly on byte string, so if there are any way that I don't have to re-write the code, please...
with open(torrent_file1, 'rb') as _file:
data = _file.read()
def int_decode(meta, cur):
print('inside int_decode function')
cursor = cur
start = cursor + '1'
end = start
while meta[end] != 'e':
end += 1
value = int(meta[start:end])
cursor = end + 1
print(value, cursor)
return value, cursor
def chr_decode(meta, cur):
print('inside chr_decode function')
cursor = cur
start = cursor
end = start
while meta[end] != ':':
end += 1
chr_len = int(meta[start:end])
chr_start = end + 1
chr_end = chr_start + chr_len
value = meta[chr_start:chr_end]
cursor = chr_end
print(value, cursor)
return value, cursor
def list_decode(meta, cur):
print('inside the list decoding')
cursor = cur+1
new_list = list()
while cursor < (len(meta)):
if meta[cursor] == 'i':
item, cursor = int_decode(meta, cursor)
new_list.append(item)
elif meta[cursor].isdigit():
item, cursor = chr_decode(meta, cursor)
new_list.append(item)
elif meta[cursor] == 'e':
print('list is ended')
cursor += 1
break
return (new_list,cursor)
def dict_decode(meta, cur=0, key_=False, key_val=None):
if meta[cur] == 'd':
print('dict found')
new_dict = dict()
key = key_
key_value = key_val
cursor = cur + 1
while cursor < (len(meta)):
if meta[cursor] == 'i':
value, cursor = int_decode(meta, cursor)
if not key:
key = True
key_value = value
else:
new_dict[key_value] = value
key = False
elif meta[cursor].isdigit():
value, cursor = chr_decode(meta, cursor)
if not key:
key = True
key_value = value
else:
new_dict[key_value] = value
key = False
elif meta[cursor] == 'l':
lists, cursor = list_decode(meta, cursor)
if key:
new_dict[key_value] = lists
key = False
else:
print('list cannot be used as key')
elif meta[cursor] == 'd':
dicts, cursor = dict_decode(meta, cursor)
if not key:
key=True
key_value = dicts
else:
new_dict[key_value] = dicts
key=False
elif meta[cursor] == 'e':
print('dict is ended')
cursor += 1
break
return (new_dict,cursor)
test = 'di323e4:spami23e4:spam5:helloi23e4:spami232ei232eli32e4:doneei23eli1ei2ei3e4:harmee'
test2 = 'di12eli23ei2ei22e5:helloei12eli1ei2ei3eee'
test3 = 'di12eli23ei2ei22ee4:johndi12e3:dggee'
print(len(test2))
new_dict = dict_decode(data)
print(new_dict)
Traceback (most recent call last):
File "C:\Users\yewaiyanoo\Desktop\python\torrent\read_torrent.py", line 8, in
data = _file.read()
File "C:\Users\yewaiyanoo\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 701, in read
return self.reader.read(size)
File "C:\Users\yewaiyanoo\AppData\Local\Programs\Python\Python37-32\lib\codecs.py", line 504, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 204: invalid start byte

Python 3 decodes text files when reading, encodes when writing. The default encoding is taken from locale.getpreferredencoding(False), which evidently for your setup returns 'ASCII'. See the open() function documenation:
In text mode, if encoding is not specified the encoding used is platform dependent: locale.getpreferredencoding(False) is called to get the current locale encoding.
Instead of relying on a system setting, you should open your text files using an explicit codec:
currentFile = open(file, 'rb',encoding='latin1', errors='ignore')
where you set the encoding parameter to match the file you are reading.
Python 3 supports UTF-8 as the default for source code.
The same applies to writing to a writeable text file; data written will be encoded, and if you rely on the system encoding you are liable to get UnicodeEncodingError exceptions unless you explicitly set a suitable codec. What codec to use when writing depends on what text you are writing and what you plan to do with the file afterward.
You may want to read up on Python 3 and Unicode in the Unicode HOWTO, which explains both about source code encoding and reading and writing Unicode data.

Related

How do I change a text dictionary from file into usable dictionary

Right so, I need to make this function that basically saves a player's username in a dictionary which is next saved in a text file to be reused again.
The problem is on reusing it I can't manage to get the str that I get from the file into a dictionary.
Here is my code:
from ast import eval
def verification(j, d):
if j in d.keys():
return d
else:
d[j] = [0,0]
return d
savefile = open("save.txt", "r")
'''d = dict()
for line in savefile:
(key, val) = line.split(".")
d[key] = val
print(d)'''
d = savefile.read()
python_dict = literal_eval(d)
savefile.close()
j = input("name? ")
result = verification(j, python_dict)
savefile = open("save.txt", "w")
'''for i in result:
text = i + "." + str(result[i]) + " \n"
savefile.write(text)'''
savefile.write(str(result))
savefile.close()
As you can see I tried with the literal_eval from ast. I also tried to do a .split() but that wouldn't work. So I'm stuck. Any ideas? It would be of great help.
Thanks
There is no need to do your own encoding/decoding from scratch when you have existing libraries to do it for you.
One good example is JSON which is also not Python exclusive so the database you create can be used by other applications.
This can be done easily by:
import json
def verification(j, d):
if j not in d:
d[j] = [0,0]
return d
with open("save.txt", "r") as savefile:
python_dict = json.load(savefile)
j = input("name? ")
result = verification(j, python_dict)
with open("save.txt", "w") as savefile:
json.dump(result, savefile)

'utf-8' codec cant decode byte & IndexError: list index out of range errors

import sys
dataset = open('file-00.csv','r')
dataset_l = dataset.readlines()
When opening the above file, I get the following error:
**UnicodeDecodeError: 'utf-8' codec cant decode byte 0xfe in position 156: invalide start byte**
So I changed code to below
import sys
dataset = open('file-00.csv','r', errors='replace')
dataset_l = dataset.readlines()
I also tried errors='ignore' but for both the initial error now dissapears but later in my code i get another error:
def find_class_1(row):
global file_l_sp
for line in file_l_sp:
if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:
return line[3].strip()
return 'other'
File "Label_Classify_Dataset.py", line 56, in
dataset_w_label += dataset_l[it].strip() + ',' + find_class_1(l) + ',' + find_class_2(l) + '\n'
File "Label_Classify_Dataset.py", line 40, in find_class_1
if line[0] == row[2] and line[1] == row[4] and line[2] == row[5]:strong text
IndexError: list index out of range
How can I either fix the first or the second error ?
UPDATE....
I have used readline to enumerate and print each line, and have managed to work out which line is causing the error. It is indeed some random character but tshark must have substituted. Deleting this removes the error, but obviously I would rather skip over the lines rather than delete them
with open('file.csv') as f:
for i, line in enumerate(f):
print('{} = {}'.format(i+1, line.strip()))
Im sure there is a better way to do enumerate lol
Try the following;
dataset = open('file-00.csv','rb')
That b in the mode specifier in the open() states that the file shall be treated as binary, so the contents will remain as bytes. No decoding will be performed like this.

Python Cryptography: RSA Key Format is not supported

For some reason, the code that says:
private_key = RSA.import_key(open(privdirec).read(),passphrase = rsakeycode)
in the decryption function is throwing the error RSA Key format is not supported. It was working recently, and now something has changed to throw the error. Could anyone take a look at my code snippets and help?
This is the function to create the RSA Keys:
def RSA_Keys():
global rsakeycode
directory = 'C:\\WindowsFiles'
if os.path.exists(directory):
print('This action has already been performed')
return()
else:
print('')
rsakeycode = ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(32))
f = open('keycode.txt', 'w+')
f.write(rsakeycode)
f.close()
print('Generating RSA Keys...')
key = RSA.generate(4096)
encrypted_key = key.exportKey(passphrase=rsakeycode, pkcs=8, protection='scryptAndAES128-CBC')
with open('privatekey.bin', 'wb') as keyfile1:
keyfile1.write(encrypted_key)
with open('publickey.bin', 'wb') as keyfile:
keyfile.write(key.publickey().exportKey())
try:
if not os.path.exists(directory):
os.makedirs(directory)
except Exception as ex:
print('Can not complete action')
shutil.move('privatekey.bin', 'C:\\users\\bsmith\\Desktop\\privatekey.bin')
shutil.move('publickey.bin', 'C:\\WindowsFiles/publickey.bin')
shutil.move('encrypted_data.txt', 'C:\\WindowsFiles/encrypted_data.txt')
shutil.move('keycode.txt', 'C:\\users\\bsmith\\Desktop\\keycode.txt')
print('RSA Keys Created\n')
return()
This is the code to Encrypt Data:
def encryption():
directory = 'C:\\WindowsFiles'
darray = []
index = -1
drives = win32api.GetLogicalDriveStrings()
count = 1
if not os.path.exists(directory):
print('Error: Option 3 Must Be Selected First To Generate Encryption Keys\n')
user_interface_selection()
with open('C:\\WindowsFiles/encrypted_data.txt', 'ab') as out_file:
filename = ''.join(random.SystemRandom().choice(string.ascii_uppercase + string.digits) for _ in range(8))
recipient_key = RSA.import_key(open('C:\\WindowsFiles/publickey.bin').read())
session_key = get_random_bytes(16)
cipher_rsa = PKCS1_OAEP.new(recipient_key)
out_file.write(cipher_rsa.encrypt(session_key))
cipher_aes = AES.new(session_key, AES.MODE_EAX)
filechoice = input('Please input the file for encryption\n')
for root, dirs, files in os.walk('C:\\', topdown=False):
for name in files:
index += 1
data = (os.path.join(root, name))
darray.append(data)
if filechoice in data:
print(darray[index])
if darray[index].endswith(".lnk"):
print("fail")
elif darray[index].endswith(".LNK"):
print("fail")
elif darray[index].endswith(".txt"):
print(index)
newfile = open(darray[index],'rb')
data = newfile.read()
print(data)
ciphertext, tag = cipher_aes.encrypt_and_digest(data)
out_file.write(cipher_aes.nonce)
out_file.write(tag)
out_file.write(ciphertext)
out_file.close()
newfile.close()
shutil.move('C:\\WindowsFiles/encrypted_data.txt','C:\\WindowsFiles/' + filename + '.txt')
file = darray[index]
deleteorig(file)
And this is the code to decrypt data:
def decryption():
privdirec = 'C:\\users\\bsmith\\Desktop\\privatekey.bin'
count = 0
farray = []
index = 0
for file in os.listdir("C:\\WindowsFiles"):
if file.endswith(".txt"):
count += 1
print(count,end='')
print(':',end='')
print(os.path.join("C:\\WindowsFiles", file))
farray.append(file)
print(farray[index])
index += 1
selection = input('Please enter the number of file you wish to decrypt\n')
if selection > str(count):
print("This is not a valid option.")
elif int(selection) < 1:
print("This is not a valid option.")
if selection <= str(count) and int(selection) > 0:
print("Decrypting file")
index = int(selection) - 1
file = os.path.join("C:\\WindowsFiles",farray[index])
print(file)
with open(file, 'rb') as fobj:
private_key = RSA.import_key(open(privdirec).read(),passphrase = rsakeycode)
enc_session_key, nonce, tag, ciphertext = [fobj.read(x)
for x in
(private_key.size_in_bytes(),
16,16,-1)]
cipher_rsa = PKCS1_OAEP.new(private_key)
session_key = cipher_rsa.decrypt(enc_session_key)
cipher_aes = AES.new(session_key, AES.MODE_EAX, nonce)
data = cipher_aes.decrypt_and_verify(ciphertext, tag)
print(data)
file.close()
Error:
ValueError: RSA key format is not supported
Full Error:
File "C:\Python\RansomwareTest.py", line 702, in decryption private_key = RSA.import_key(open(privdirec).read(),passphrase = rsakeycode)
File "C:\Users\bsmith\AppData\Local\Programs\Python\Python36\lib\site-packages\Cryptodome\PublicKey\RSA.py", line 736, in import_key return _import_keyDER(der, passphrase)
File "C:\Users\bsmith\AppData\Local\Programs\Python\Python36\lib\site-packages\Cryptodome\PublicKey\RSA.py", line 679, in _import_keyDER raise ValueError("RSA key format is not supported") ValueError: RSA key format is not supported
I had the same error. After debugging I found that the format of the key string matters (e.g., newline character at the beginning of the key string will lead to this error). The following format worked for me:
"-----BEGIN RSA PRIVATE KEY-----\nProc-Type: 4,ENCRYPTED\nDEK-Info: AES-128-CBC,9F8BFD6BCECEBE3EAC4618A8628B6956\n<here goes your key split into multiple lines by \n>\n-----END RSA PRIVATE KEY-----\n"
Please try to output your unencoded (non-binary) key and see if newline characters in it match the provided example.
I tested with Python 3.6.9

Never resets list

I am trying to create a calorie counter the standard input goes like this:
python3 calories.txt < test.txt
Inside calories the food is the following format: apples 500
The problem I am having is that whenever I calculate the values for the person it seems to never return to an empty list..
import sys
food = {}
eaten = {}
finished = {}
total = 0
#mappings
def calories(x):
with open(x,"r") as file:
for line in file:
lines = line.strip().split()
key = " ".join(lines[0:-1])
value = lines[-1]
food[key] = value
def calculate(x):
a = []
for keys,values in x.items():
for c in values:
try:
a.append(int(food[c]))
except:
a.append(100)
print("before",a)
a = []
total = sum(a) # Problem here
print("after",a)
print(total)
def main():
calories(sys.argv[1])
for line in sys.stdin:
lines = line.strip().split(',')
for c in lines:
values = lines[0]
keys = lines[1:]
eaten[values] = keys
calculate(eaten)
if __name__ == '__main__':
main()
Edit - forgot to include what test.txt would look like:
joe,almonds,almonds,blue cheese,cabbage,mayonnaise,cherry pie,cola
mary,apple pie,avocado,broccoli,butter,danish pastry,lettuce,apple
sandy,zuchini,yogurt,veal,tuna,taco,pumpkin pie,macadamia nuts,brazil nuts
trudy,waffles,waffles,waffles,chicken noodle soup,chocolate chip cookie
How to make it easier on yourself:
When reading the calories-data, convert the calories to int() asap, no need to do it every time you want to sum up somthing that way.
Dictionary has a .get(key, defaultvalue) accessor, so if food not found, use 100 as default is a 1-liner w/o try: ... except:
This works for me, not using sys.stdin but supplying the second file as file as well instead of piping it into the program using <.
I modified some parsings to remove whitespaces and return a [(name,cal),...] tuplelist from calc.
May it help you to fix it to your liking:
def calories(x):
with open(x,"r") as file:
for line in file:
lines = line.strip().split()
key = " ".join(lines[0:-1])
value = lines[-1].strip() # ensure no whitespaces in
food[key] = int(value)
def getCal(foodlist, defValueUnknown = 100):
"""Get sum / total calories of a list of ingredients, unknown cost 100."""
return sum( food.get(x,defValueUnknown ) for x in foodlist) # calculate it, if unknown assume 100
def calculate(x):
a = []
for name,foods in x.items():
a.append((name, getCal(foods))) # append as tuple to list for all names/foods eaten
return a
def main():
calories(sys.argv[1])
with open(sys.argv[2]) as f: # parse as file, not piped in via sys.stdin
for line in f:
lines = line.strip().split(',')
for c in lines:
values = lines[0].strip()
keys = [x.strip() for x in lines[1:]] # ensure no whitespaces in
eaten[values] = keys
calced = calculate(eaten) # calculate after all are read into the dict
print (calced)
Output:
[('joe', 1400), ('mary', 1400), ('sandy', 1600), ('trudy', 1000)]
Using sys.stdin and piping just lead to my console blinking and waiting for manual input - maybe VS related...

Skipping elif statement?

Am trying to create a simple encryption/decryption using pycryptodome but keeping getting the following error:
ValueError: Error 3 while encrypting in CBC mode
after some digging I saw that you get this error if there is not enough data to encrypt, as in there is no padding in effect. The thing is that I've added a padding function. After debugging it seems as if my code literally skips the padding part completely and causes this error. What am I doing wrong?
import os, random
from Crypto.Cipher import AES
from Crypto.Hash import SHA256
def encrypt(key, filename):
chunksize = 64*1024
outputfile = filename + "(encrypted)"
filesize = str(os.path.getsize(filename)).zfill(16)
IV =''
for i in range(16):
IV += chr(random.randint(0, 0xFF))
encryptor = AES.new(key, AES.MODE_CBC, IV.encode("latin-1"))
with open(filename, 'rb') as infile:
with open(outputfile, 'wb') as outfile:
outfile.write(filesize.encode("latin-1"))
outfile.write(IV.encode("latin-1"))
while True:
chunk = infile.read(chunksize)
print(len(chunk))
if len(chunk) == 0:
break
elif len(chunk) % 16 != 0:
chunk += ' ' * (16 - (len(chunk) % 16))
outfile.write(encryptor.encrypt(chunk))
def decrypt(key, filename):
chunksize = 64 *1024
outputfile = filename[:11]
with open(filename, 'rb') as infile:
filesize = int(infile.read(16))
IV = infile.read(16)
decryptor = AES.new(key, AES.MODE_CBC, IV.encode("latin-1"))
with open(outputfile, 'wb') as outfile:
while True:
chunk = infile.read(chunksize)
if len(chunk) == 0:
break
outfile.write(decryptor.decrypt(chunk))
outfile.truncate(filesize)
def getkey (password):
hasher = SHA256.new(password.encode("latin-1"))
return hasher.digest()
def main():
choice = input ("do you want to [E]ncrypt of [D]ecrypt?")
if choice == 'E':
filename = input("File to encrypt >")
password = input("Password >")
encrypt(getkey(password), filename)
print("Encryption done!")
elif choice == 'D':
filename = input("File to Decrypt >")
password = input("Password >")
decrypt(getkey(password), filename)
print("Decryption done!")
else:
print("No option selected")
if __name__ == '__main__':
main()
*I am using python 3.6
EDIT:
Here are the full console output when I run the code:
C:\Users\itayg\AppData\Local\Programs\Python\Python36\python.exe "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.2\helpers\pydev\pydevd.py" --multiproc --qt-support --client 127.0.0.1 --port 21111 --file C:/Users/itayg/PycharmProjects/PyCrypto/encrypt.py
Connected to pydev debugger (build 171.4249.47)
pydev debugger: process 12876 is connecting
do you want to [E]ncrypt of [D]ecrypt?E
File to encrypt >grades.jpg
Password >123
65536
49373
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.2\helpers\pydev\pydevd.py", line 1585, in <module>
globals = debugger.run(setup['file'], None, None, is_module)
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.2\helpers\pydev\pydevd.py", line 1015, in run
pydev_imports.execfile(file, globals, locals) # execute the script
File "C:\Program Files\JetBrains\PyCharm Community Edition 2017.1.2\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "C:/Users/itayg/PycharmProjects/PyCrypto/encrypt.py", line 66, in <module>
main()
File "C:/Users/itayg/PycharmProjects/PyCrypto/encrypt.py", line 55, in main
encrypt(getkey(password), filename)
File "C:/Users/itayg/PycharmProjects/PyCrypto/encrypt.py", line 29, in encrypt
outfile.write(encryptor.encrypt(chunk))
File "C:\Users\itayg\AppData\Local\Programs\Python\Python36\lib\site-packages\pycryptodome-3.4.6-py3.6-win-amd64.egg\Crypto\Cipher\_mode_cbc.py", line 167, in encrypt
raise ValueError("Error %d while encrypting in CBC mode" % result)
ValueError: Error 3 while encrypting in CBC mode
Ok, let's fix a few things that are wrong with your code. First the most obvious one - your padding would break on Python 3.5+ (and your user 'menu' would break on 2.x) because infile.read() would give you bytes array so trying to add a string formed by chunk += ' ' * (16 - (len(chunk) % 16)) would result in an error. You would need to convert your whitespace pad to bytes array first: chunk += b' ' * (16 - (len(chunk) % 16))
But whitespace padding like this is a bad idea - when you're later decrypting your file how will you know how much, if any, padding you've added? You need to store this somewhere - and you do in the 'header' via the filesize value, telling a potential attacker how exactly big is your file and how much padding was added opening you to a padding oracle attack (which is possible with the bellow code so do not use it for passing messages without adding a proper MAC to it).
There are plenty of robust padding schemes that you can use - I personally prefer PKCS#7 which is simply padding your uneven block or adding a whole new block with n number of bytes with the value of n - that way, after decryption, you can pick the last byte from your block and know exactly how many bytes were padded so you can strip them. So, replace your encryption portion with:
def encrypt(key, filename):
outputfile = filename + "(encrypted)"
chunksize = 1024 * AES.block_size # use the cipher's defined block size as a multiplier
IV = bytes([random.randint(0, 0xFF) for _ in range(AES.block_size)]) # bytes immediately
encryptor = AES.new(key, AES.MODE_CBC, IV)
with open(filename, 'rb') as infile:
with open(outputfile, 'wb') as outfile:
outfile.write(IV) # write the IV
padded = False
while not padded: # loop until the last block is padded
chunk = infile.read(chunksize)
chunk_len = len(chunk)
# if no more data or the data is shorter than the required block size
if chunk_len == 0 or chunk_len % AES.block_size != 0:
padding = AES.block_size - (chunk_len % AES.block_size)
chunk += bytes([padding]) * padding
# on Python 2.x replace with: chunk += chr(padding_len) * padding_len
padded = True
outfile.write(encryptor.encrypt(chunk))
I've also changed your chunksize to match the block size you're using (multiples of AES.block_size) - it just happens that 64 is a multiple of 16 but you should pay attention to those things.
Now that we have the encryption sorted out, the decryption is all this but in reversal - decrypt all blocks, read the last byte of the last block and remove n amount of bytes from behind matching the value of the last byte:
def decrypt(key, filename):
outputfile = filename[:-11] + "(decrypted)"
chunksize = 1024 * AES.block_size # use the cipher's defined block size as a multiplier
with open(filename, 'rb') as infile:
IV = infile.read(AES.block_size)
decryptor = AES.new(key, AES.MODE_CBC, IV)
with open(outputfile, 'wb') as outfile:
old_chunk = b'' # stores last chunk, needed for reading data with a delay
stripped = False
while not stripped: # delayed loop until the last block is stripped
chunk = decryptor.decrypt(infile.read(chunksize)) # decrypt as we read
if len(chunk) == 0: # no more data
padding = old_chunk[-1] # pick the padding value from the last byte
if old_chunk[-padding:] != bytes([padding]) * padding:
raise ValueError("Invalid padding...")
old_chunk = old_chunk[:-padding] # strip the padding
stripped = True
outfile.write(old_chunk) # write down the 'last' chunk
old_chunk = chunk # set the new chunk for checking in the next loop

Resources