UnicodeDecodeError Python 3.5.1 Email Script - python-3.x

I am attempting to send an email with an attachment to an SMS gateway address. However, I am currently getting: UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 60
I'm not sure how to go about fixing this and would be interested in your advice. Below is my code and the full error.
import smtplib, os
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
msg = MIMEMultipart()
msg['Subject'] = 'Cuteness'
msg['From'] = 'sample@outlook.com'
msg['To'] = '111111111@messaging.sprintpcs.com'
msg.preamble = "Would you pet me? I'd Pet me so hard..."
here = os.getcwd()
file = open('cutecat.png')  # the png shares directory with actual script
for here in file:  # Error appears to be in this section
    with open(file, 'rb') as fp:
        img = MIMImage(fp.read())
    msg.attach(img)
s = smtplib.SMTP('Localhost')
s.send_message(msg)
s.quit()
""" Traceback (most recent call last):
File "C:\Users\Thomas\Desktop\Test_Box\msgr.py", line 16, in <module>
for here in file:
File "C:\Users\Thomas\AppData\Local\Programs\Python\Python35-32\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 60: character maps to <undefined>"""

You're trying to open the file twice. First you have:
file = open('cutecat.png')
By default, open() reads files in text mode. That is generally not what you want for a binary file like a PNG.
And then you do:
for here in file:
    with open(file, 'rb') as fp:
        img = MIMImage(fp.read())
    msg.attach(img)
You get an exception in the first of these lines because Python tries to decode the contents of a binary file as text and fails. The chances of this happening are quite high: a binary file is unlikely to also be valid text in your default encoding (cp1252 here, according to the traceback).
But even if that had worked, you would then try to open the file again for every line in it, and you would be passing the file object rather than a file name to open(). This makes no sense!
Were you just copying and pasting from the examples, especially the third one? Note that this example is incomplete: the variable pngfiles used in it (which should be a sequence of file names) is not defined.
Try this instead:
with open('cutecat.png', 'rb') as fp:
    img = MIMEImage(fp.read())  # the class imported above is MIMEImage, not MIMImage
msg.attach(img)
Or if you want to include multiple files:
pngfiles = ('cutecat.png', 'kitten.png')
for filename in pngfiles:
    with open(filename, 'rb') as fp:
        img = MIMEImage(fp.read())  # the class imported above is MIMEImage, not MIMImage
    msg.attach(img)
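To see why text mode fails on a PNG, here is a minimal sketch that decodes a few bytes with cp1252, the codec named in the traceback. The sample bytes are made up for illustration (a PNG signature followed by 0x8d), and try_decode is a hypothetical helper, not part of any library.

```python
# Made-up sample: the 8-byte PNG signature followed by 0x8d,
# a byte that has no character assigned in cp1252.
sample = b"\x89PNG\r\n\x1a\n\x8d"


def try_decode(data, encoding):
    """Return the decoded text, or a short failure message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc}"


# Decoding as text fails, just like iterating over a file opened in text mode.
result = try_decode(sample, "cp1252")
print(result)

# Plain ASCII bytes decode without trouble.
print(try_decode(b"abc", "cp1252"))
```

Opening the file with 'rb' skips decoding altogether, which is why the corrected code hands the raw bytes straight to MIMEImage.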

Related

Python: copy file tree to a text file

I'm trying to create a text file with a tree of all files / dirs from a place that I choose using os.chdir(). My approach is to print the tree and to save all prints to the text file. The problem is that it doesn't copy the printed tree and the file is blank.
What am I doing wrong?
And is there a way to write this kind of data to the file without actually printing it?
My code:
import os
import sys
f = open("tree.txt", "w")
os.chdir("c:\\Users\Daniel\Desktop")
sys.stdout = f
os.system("tree /f")
f.close()
Edit
I was able to get the file tree from the clipboard after executing the command; however, it gives me an error when it tries to write to the txt file.
code:
import os
import tkinter
with open("tree.txt", "w") as f:
    os.system("tree /f |clip")
    root = tkinter.Tk()
    tree = root.clipboard_get()
    print(tree)
    f.write(tree)
error:
Traceback (most recent call last):
File "c:\Users\Daniel\Desktop\Tick\code_test\files.py", line 9, in <module>
f.write(tree)
File "C:\Users\Daniel\AppData\Local\Programs\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2502' in position 80: character maps to <undefined>
Solution
I found the problem: I needed to use the codecs module to be able to write Unicode to the text file. Now it works very well.
code:
import os
import tkinter
import codecs
with codecs.open("tree.txt", "w", "utf8") as f:
    os.chdir("c:\\Users")
    os.system("tree /f |clip")
    root = tkinter.Tk()
    tree = root.clipboard_get()
    f.write(tree)
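In Python 3 the built-in open() also accepts an encoding argument, so the same fix may not need codecs at all. The sketch below is only an illustration: the tree text is a made-up sample containing box-drawing characters, and the file is written to a temporary directory rather than to tree.txt.

```python
import os
import tempfile

# Made-up sample of what "tree /f" output might look like.
tree_text = "C:.\n\u2502   notes.txt\n\u2514\u2500\u2500\u2500docs\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tree.txt")
    # An explicit encoding avoids falling back to the platform default (cp1252 in the traceback).
    with open(path, "w", encoding="utf-8") as f:
        f.write(tree_text)
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

# The same text cannot be represented in cp1252, which is what the error reported.
try:
    tree_text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

print(round_trip == tree_text, cp1252_ok)
```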
The check_output function from the subprocess module can capture a program's output:
import subprocess
f = open("tree.txt", "wb")
tree_output = subprocess.check_output('tree /f', shell=True, cwd=r'c:\Users\Daniel\Desktop')
f.write(tree_output)
f.close()
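Or with a context manager:
import subprocess
with open("tree.txt", "wb") as f:
    f.write(subprocess.check_output('tree /f', shell=True, cwd=r'c:\Users\Daniel\Desktop'))
The wb mode is required because check_output returns bytes, not a str. If you want to process the output as a string, call tree_output.decode() first. Note that decode() assumes UTF-8 by default; console programs on Windows often emit a different code page, so you may need to pass that encoding explicitly.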
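If the output should be handled as text, one possible shape is to decode the captured bytes and then write them with an explicit encoding. This is a sketch under assumptions: save_command_output is a hypothetical helper, and a child Python process stands in for tree /f so the example runs on any platform.

```python
import os
import subprocess
import sys
import tempfile


def save_command_output(command, out_path, encoding="utf-8"):
    """Run command, decode its stdout, write the text to out_path as UTF-8, and return it."""
    raw = subprocess.check_output(command)         # bytes
    text = raw.decode(encoding, errors="replace")  # str; undecodable bytes become U+FFFD
    with open(out_path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


# Portable stand-in for "tree /f": a child Python process that prints one line.
demo_command = [sys.executable, "-c", "print('Desktop')"]

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "tree.txt")
    captured = save_command_output(demo_command, out_path)
    with open(out_path, encoding="utf-8") as f:
        saved = f.read()

print(captured.strip())
```

For the real tree command, the encoding argument would have to match whatever code page the console uses, which is not known from the question.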

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756

I'm unable to retrieve the data from a Microsoft Excel document. I've tried using the 'Latin-1' and 'UTF-8' encodings, but then it gives me hundreds of \x00's in the terminal. Is there any way I can retrieve the data and output it to a text file?
This is what I'm running on the terminal and the error I get:
PS C:\Users\Andy-\Desktop> python.exe SRT411-Lab2.py Lab2Data.xlsx
Traceback (most recent call last):
File "SRT411-Lab2.py", line 9, in <module>
lines = file.readlines()
File "C:\ProgramFiles\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756: character maps to <undefined>
Any help is greatly appreciated!
#!/usr/bin/python3
import sys
filename = sys.argv[1]
print(filename)
file = open(filename, 'r')
lines = file.readlines()
file.close()
print(lines)
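An .xlsx file is a ZIP archive of XML parts, not plain text, so reading it in text mode with any encoding cannot give you the cell values. I'd probably convert the Excel file to a CSV file and use pandas to parse it.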
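Once the workbook has been exported to CSV, getting the rows into a text file is straightforward. The sketch below uses the standard-library csv module instead of pandas to stay dependency-free; csv_to_text is a hypothetical helper, and the file name Lab2Data.csv and its contents are made up for the demonstration.

```python
import csv
import os
import tempfile


def csv_to_text(csv_path, txt_path, encoding="utf-8"):
    """Write every CSV row to txt_path as one tab-separated line; return the row count."""
    count = 0
    with open(csv_path, newline="", encoding=encoding) as src, \
            open(txt_path, "w", encoding="utf-8") as dst:
        for row in csv.reader(src):
            dst.write("\t".join(row) + "\n")
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "Lab2Data.csv")  # hypothetical export of the workbook
    txt_path = os.path.join(tmp, "Lab2Data.txt")
    # Create a tiny CSV so the example is self-contained.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["name", "score"], ["Andy", "42"]])
    row_count = csv_to_text(csv_path, txt_path)
    with open(txt_path, encoding="utf-8") as f:
        text = f.read()

print(row_count)
print(text)
```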

Python 'utf-8' codec stop message with IIS log

With the following python code
import csv
log_file = open('190415190514.txt', 'r')
all_data = csv.reader(log_file, delimiter=' ')
data = []
for row in all_data:
    data.append(row)
to read a big file containing
2019-04-15 00:00:46 192.168.168.29 GET / - 443 - 192.168.168.80 Mozilla/5.0+(compatible;+PRTG+Network+Monitor+(www.paessler.com);+Windows) - 200 0 0 0
I get this error
File "main.py", line 5, in <module>
for row in datareader:
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1284: invalid start byte
I think there is no problem with the data file since it is an IIS log file. If there is any encoding issue, how can I locate that line? I am also not sure if my problem is the same as this one.
Since you opened the file in text mode ('r') instead of binary mode ('rb'), Python tries to decode it, here as UTF-8. The contents of the file are apparently not valid UTF-8, so you're getting an error. You can find the line number of the offending line like this:
with open('190415190514.txt', 'rb') as f:
    for i, line in enumerate(f):
        try:
            line.decode('utf-8')
        except UnicodeDecodeError as e:
            print(f'{e} at line {i+1}')
You should probably pass errors or encoding to open(). See: https://docs.python.org/3/library/functions.html#open
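As a rough illustration of those two options, the sketch below reads a space-delimited log either with errors="replace" or with a different encoding. The helper read_log_rows and the sample log lines are made up; whether cp1252 is the right encoding for the real IIS log is an assumption that would need checking.

```python
import csv
import os
import tempfile


def read_log_rows(path, encoding="utf-8", errors="replace"):
    """Read a space-delimited log into a list of rows using the given decoding options."""
    with open(path, newline="", encoding=encoding, errors=errors) as f:
        return list(csv.reader(f, delimiter=" "))


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "sample.log")
    # Made-up log with one byte (0x80) that is not valid UTF-8.
    with open(log_path, "wb") as f:
        f.write(b"2019-04-15 GET / 200\n2019-04-15 GET /caf\x80 404\n")

    replaced = read_log_rows(log_path)                      # bad byte becomes U+FFFD
    as_cp1252 = read_log_rows(log_path, encoding="cp1252")  # 0x80 is the euro sign in cp1252

print(replaced[1])
print(as_cp1252[1])
```

errors="replace" keeps the read from failing but loses the original character, so picking the correct encoding is preferable when it is known.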

pdfparser from pdfminer: PDFException: PDFDocument is not initialized

I'm not understanding this error. I want to open a pdf and loop over the pages but I'm getting this exception and I couldn't find much by googling it.
Here is the example that fails
from pdfminer.pdfparser import PDFParser, PDFDocument
from os.path import basename, splitext
file = 'tmpfiles/tmpfile.pdf'
filename = splitext(basename(file))[0]
fp = open(file, 'rb')
parser = PDFParser(fp)
doc = PDFDocument(parser)
num_page = 0
text = ""
pages = doc.get_pages()
for p in pages:
    print("do whatever")
Here is the traceback
Traceback (most recent call last):
File "test.py", line 20, in <module>
for p in pages:
File "/home/.../anaconda3/lib/python3.6/site-packages/pdfminer/pdfparser.py", line 544, in get_pages
raise PDFException('PDFDocument is not initialized')
pdfminer.pdftypes.PDFException: PDFDocument is not initialized
I have python 3.6
Before doing this I'm saving the pdf file like this because I have the contents in a base64 encoded string
decoded = base64.b64decode(content_string)
with open(tmpfiles_path+'tmpfile.pdf', 'wb') as fout:
    fout.write(decoded)
Could it be that the file is being saved with some protection?
The problem was the version of pdfminer I was using. I installed pdfminer.six and changed the code like this:
from pdfminer.pdfpage import PDFPage
file = 'tmpfiles/tmpfile.pdf'
fp = open(file, 'rb')
pages = PDFPage.get_pages(fp)
for p in pages:
    print("do whatever")
Now it works.

Cutting Plain HTML files with RegEX

I am using this code to extract a part of my locally stored HTML files and save the shortened new document into a .txt file.
import glob
import os
import re
def extractor():
    os.chdir(r"F:\Test")  # the directory containing your html
    for file in glob.iglob("*.html"):  # iterates over all files in the directory ending in .html
        with open(file, encoding="utf8") as f, open((file.rsplit(".", 1)[0]) + ".txt", "w", encoding="utf8") as out:
            contents = f.read()
            extract = re.compile(r'(Start).*?End', re.I | re.S)
            cut = extract.sub('', contents)
            if re.search(extract, contents) is not None:
                out.write(cut)
            out.close()
extractor()
It works fine for most of my files; however, for a few files I have encoding issues and get:
Traceback (most recent call last):
File "C:/Users/6930p/PycharmProjects/untitled/Versuch/CutFile.py", line 16, in <module>
extractor()
File "C:/Users/6930p/PycharmProjects/untitled/Versuch/CutFile.py", line 14, in extractor
out.write(cut)
File "C:\Users\6930p\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 241205-241210: character maps to <undefined>
Does anyone have an idea what the problem is? I thought that by using encoding="utf8" I wouldn't have any problems with encoding...
Any help appreciated!
OK, the issue was indeed encoding="utf8": I forgot to open my newly created .txt file with "utf8" as well. The code is updated and works!
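When an error like this only names character positions, it can help to list which characters a given codec cannot represent. The following is a minimal sketch; find_unencodable is a hypothetical helper and the sample string is made up.

```python
def find_unencodable(text, encoding):
    """Return (index, character) pairs for characters that encoding cannot represent."""
    problems = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            problems.append((index, char))
    return problems


# Made-up sample: "é" fits in cp1252, the box-drawing and CJK characters do not.
sample = "caf\u00e9 \u2502 \u4e2d"
bad_for_cp1252 = find_unencodable(sample, "cp1252")
bad_for_utf8 = find_unencodable(sample, "utf-8")

print(bad_for_cp1252)
print(bad_for_utf8)
```

Checking character by character is slow on large files, so this is meant for diagnosis rather than for use inside the extraction loop.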
