zipfile / shutil.make_archive raises UnicodeEncodeError on German umlauts - python-3.x

I'm trying to zip a folder in Python 3 with the module zipfile.
Since I'm German, some of my filenames contain umlauts (äöü).
While zipping, I get a UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed.
The character in question is an ü.
How can I get zipfile to zip all my files?
The relevant code is this:
import os
import zipfile

def zipdir(path, ziph):
    # Walk the tree and add every file to the open ZipFile handle
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))

if __name__ == '__main__':
    zipf = zipfile.ZipFile('path/to/destination', 'w', zipfile.ZIP_DEFLATED)
    zipdir('path/to/folder', zipf)
    zipf.close()
Edit:
I get the same error when using shutil.make_archive:
import shutil
shutil.make_archive('/path/to/destination', 'zip', '/path/to/folder')
Full stacktrace of shutil.make_archive():
Traceback (most recent call last):
File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 59: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 39, in <module>
archive_dir(path, zip_fullpath)
File "run.py", line 19, in archive_dir
shutil.make_archive(dest, 'zip', source)
File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 822, in make_archive
filename = func(base_name, base_dir, **kwargs)
File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 720, in _make_zipfile
zf.write(path, path)
File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
self.fp.write(zinfo.FileHeader(zip64))
File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
filename, flag_bits = self._encodeFilenameFlags()
File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 59: surrogates not allowed
Full stacktrace of zipfile:
Traceback (most recent call last):
File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 95: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath)
File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
self.fp.write(zinfo.FileHeader(zip64))
File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
filename, flag_bits = self._encodeFilenameFlags()
File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed
Update:
I've tried some of the solutions that seemed to work for others at the posted link. This is what I got:
With ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1')) I got:
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1'))
File "/usr/lib64/python3.7/zipfile.py", line 1713, in write
zinfo = ZipInfo.from_file(filename, arcname)
File "/usr/lib64/python3.7/zipfile.py", line 506, in from_file
st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/some/path/to/documents/DIS_Broschüre_DE.pdf'
So the encoding/decoding returned a name that cannot be found in the file system.
The other option, ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8')), got me:
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 96: invalid start byte
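A minimal sketch of what the '\udcfc' in these messages stands for, assuming the name on disk contains the single Latin-1 byte 0xfc instead of the UTF-8 sequence for ü. The sample name and the helper names are made up for illustration.

```python
# Hypothetical on-disk name: Latin-1 byte 0xfc where UTF-8 would use 0xc3 0xbc.
RAW_NAME = b"DIS_Brosch\xfcre_DE.pdf"


def as_python_sees_it(raw):
    """Decode the way os.listdir() does on a UTF-8 system:
    undecodable bytes become lone surrogates (0xfc -> '\\udcfc')."""
    return raw.decode("utf-8", "surrogateescape")


def can_encode_strictly(name):
    """True if the name can be encoded as plain UTF-8, as zipfile needs."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


name = as_python_sees_it(RAW_NAME)
print(ascii(name))                # the 0xfc byte shows up as \udcfc
print(can_encode_strictly(name))  # False: this is where zipfile gives up
# Encoding with the same error handler gives back the original bytes.
print(name.encode("utf-8", "surrogateescape") == RAW_NAME)
```

Under that assumption, decoding the recovered bytes as ISO-8859-1 yields a proper string, but the operating system then looks for the UTF-8 form of that string, which is not the byte sequence stored on disk. That would explain the FileNotFoundError above.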

OK, I've found the problem.
The files in question were not the ones I thought they were. Ordinary umlauts work fine. The filenames themselves were actually corrupt, like this:
ls in one of the dirs gives:
2e_geh�usetechnologie_flyer_qrcode.pdf
Command line auto completion gives me:
2e_geh$'\344'usetechnologie_flyer_qrcode.pdf
Since these files were uploaded via a web interface, I can only imagine that they were created on Windows or another non-UNIX OS and the web server couldn't handle the names.
Other uploaded files had correct umlauts. I'm not sure what happened there, but I'm glad that neither Python nor the Linux FS is to blame.
Thanks for all the tips.
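For anyone who cannot rename such files first, here is a rough sketch of zipping them anyway. It assumes the broken names are Latin-1 encoded (the $'\344' shown by the shell is byte 0xe4, which is ä in Latin-1); repair_name and zip_dir_tolerant are hypothetical helpers, not part of zipfile. Each file is read through its original path, and only the name stored in the archive is repaired.

```python
import os
import zipfile


def repair_name(name, fallback="latin-1"):
    """Return a name that is safe to store in a zip archive.

    Names that already encode as UTF-8 are returned unchanged. Otherwise the
    raw bytes are recovered and reinterpreted with the assumed fallback codec.
    """
    try:
        name.encode("utf-8")
        return name
    except UnicodeEncodeError:
        raw = name.encode("utf-8", "surrogateescape")
        return raw.decode(fallback)


def zip_dir_tolerant(folder, archive_path):
    """Zip every file below folder, repairing only the archived names."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for file in files:
                full_path = os.path.join(root, file)
                rel_path = os.path.relpath(full_path, folder)
                # Read via the original path, store under the repaired name.
                zf.write(full_path, arcname=repair_name(rel_path))
```

One limitation of this sketch: a path that mixes correct UTF-8 umlauts with a broken byte is reinterpreted as a whole, so the correct umlauts in that path would come out garbled. Renaming the files on disk remains the cleaner fix.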

Related

"UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 3965: invalid start byte" when using Pyinstaller

I am trying to create an executable from two Python scripts. One script defines the GUI for the other, backend script. The backend reads in Excel files, creates DataFrames from them for manipulation, then outputs a new Excel file. This is the code that reads in the Excel files, where user_path, userAN, userRev1 and userRev2 are taken from user input in the GUI:
import pandas as pd
import numpy as np
import string
from tkinter import messagebox
import os

def generate_BOM(user_path, userAN, userRev1, userRev2):
    ## Append filepath with '/' if it does not include directory separator
    if not (user_path.endswith('/') or user_path.endswith('\\')):
        user_path = user_path + '/'
    ## Set filepath to current directory if user inputted path does not exist
    ## (keep the trailing separator, since file names are appended below)
    if not os.path.exists(user_path):
        user_path = './'
    fileFormat1 = userAN + '_' + userRev1 + '.xls'
    fileFormat2 = userAN + '_' + userRev2 + '.xls'
    for file in os.listdir(path=user_path):
        if file.endswith(fileFormat1):
            df1 = pd.read_excel(user_path+file, index_col=None)
        if file.endswith(fileFormat2):
            df2 = pd.read_excel(user_path+file, index_col=None)
When running the two scripts through Spyder, everything works perfectly. To create the exe, I am using Pyinstaller with the following command:
pyinstaller --onefile Delta_BOM_Creator.py
This results in the following error:
Traceback (most recent call last):
File "c:\users\davhar\anaconda3\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "c:\users\davhar\anaconda3\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\davhar\Anaconda3\Scripts\pyinstaller.exe\__main__.py", line 7, in <module>
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\__main__.py", line 114, in run
run_build(pyi_config, spec_file, **vars(args))
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\__main__.py", line 65, in run_build
PyInstaller.building.build_main.main(pyi_config, spec_file, **kwargs)
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 737, in main
build(specfile, kw.get('distpath'), kw.get('workpath'), kw.get('clean_build'))
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 684, in build
exec(code, spec_namespace)
File "C:\Users\davhar\.spyder-py3\DELTA_BOM_Creator\Delta_BOM_Creator.spec", line 7, in <module>
a = Analysis(['Delta_BOM_Creator.py'],
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 242, in __init__
self.__postinit__()
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\building\datastruct.py", line 160, in __postinit__
self.assemble()
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\building\build_main.py", line 414, in assemble
priority_scripts.append(self.graph.run_script(script))
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\depend\analysis.py", line 303, in run_script
self._top_script_node = super(PyiModuleGraph, self).run_script(
File "c:\users\davhar\anaconda3\lib\site-packages\PyInstaller\lib\modulegraph\modulegraph.py", line 1411, in run_script
contents = fp.read() + '\n'
File "c:\users\davhar\anaconda3\lib\codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 3965: invalid start byte
I've tried everything I could find that seemed somewhat related to this issue. To list just a few:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 15: invalid start byte
https://www.dlology.com/blog/solution-pyinstaller-unicodedecodeerror-utf-8-codec-cant-decode-byte/
Pandas read_excel: 'utf-8' codec can't decode byte 0xa8 in position 14: invalid start byte
I've never used Pyinstaller, or created an executable from python at all, so apologies for being a big time noob.
SOLUTION: I found a solution. I went into the codecs.py file mentioned in the error and added 'ignore' to line 322:
(result, consumed) = self._buffer_decode(data, 'ignore', final)
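Editing codecs.py changes decoding for every program run by that interpreter and silently drops bytes. A less invasive sketch, assuming the script itself was saved as Windows-1252 (byte 0x93 is the left curly double quote in that code page), is to locate the offending byte and re-save the source as UTF-8. The helper names and the sample bytes are made up for illustration.

```python
def first_invalid_utf8(data):
    """Return (offset, byte) of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, data[exc.start]
    return None


def reencode_to_utf8(data, source_encoding="cp1252"):
    """Reinterpret bytes from the assumed source encoding and return UTF-8 bytes."""
    return data.decode(source_encoding).encode("utf-8")


# Stand-in for the script contents: curly quotes typed in a cp1252 editor.
sample = b"label = \x93Delta BOM\x94\n"
print(first_invalid_utf8(sample))   # (8, 147), and 147 is 0x93
fixed = reencode_to_utf8(sample)
print(first_invalid_utf8(fixed))    # None
```

For a real file, the bytes would come from open(path, 'rb').read() and the result would be written back with open(path, 'wb'), ideally after making a backup copy.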

Python convert SVGz to PDF: no response

I tried to convert a batch of .svgz into a single pdf file following the instructions of Generating PDFs from SVG input.
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPDF
renderPDF.drawToFile(svg2rlg("images/p1.svgz"), "out.pdf")
I ran into OSError: Not a gzipped file (b'<s'). The file is not compressed at all, as I can read it with cat.
I changed the filename extension to .svg with mv and ran the code above again as renderPDF.drawToFile(svg2rlg("images_svg/p1.svg"), "out.pdf"), but got no response.
I terminated the process with Ctrl + C and got this:
$ python3 img_to_pdf.py
^CTraceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/reportlab/lib/utils.py", line 658, in open_for_read
return open_for_read_by_name(name,mode)
File "/usr/local/lib/python3.7/site-packages/reportlab/lib/utils.py", line 602, in open_for_read_by_name
return open(name,mode)
FileNotFoundError: [Errno 2] No such file or directory: 'p2_g_d0_f57.ttf'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/reportlab/lib/utils.py", line 661, in open_for_read
return getBytesIO(datareader(name) if name[:5].lower()=='data:' else urlopen(name).read())
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 510, in open
req = Request(fullurl, data)
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 328, in __init__
self.full_url = url
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 354, in full_url
self._parse()
File "/usr/local/Cellar/python/3.7.0/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 383, in _parse
raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: 'p2_g_d0_f57.ttf'
During handling of the above exception, another exception occurred:
......
How do I make it work?
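As a side note on the first error: the message Not a gzipped file (b'<s') means the file starts with the bytes `<s` rather than the gzip magic number 0x1f 0x8b. A small sketch that checks for this before deciding whether to decompress; read_svg_bytes is a hypothetical helper, and it does not address the font lookup seen in the later traceback.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_svg_bytes(raw):
    """Return SVG markup from bytes that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        return gzip.decompress(raw)
    return raw


plain = b"<svg xmlns='http://www.w3.org/2000/svg'/>"
print(read_svg_bytes(plain) == plain)                 # True
print(read_svg_bytes(gzip.compress(plain)) == plain)  # True
```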

python pandas, unicode decode error on read_csv

When importing a csv file I am getting an error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte
traceback:
Traceback (most recent call last):
File "<ipython-input-2-99e71d524b4b>", line 1, in <module>
runfile('C:/AppData/FinRecon/py_code/python3/DataJoin.py', wdir='C:/AppData/FinRecon/py_code/python3')
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 786, in runfile
execfile(filename, namespace)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 500, in <module>
M5()
File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 221, in M5
s3 = pd.read_csv(working_dir+"S3.csv", sep=",") #encode here encoding='utf-16
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 702, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 435, in _read
data = parser.read(nrows)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1139, in read
ret = self._engine.read(nrows)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\io\parsers.py", line 1995, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 899, in pandas._libs.parsers.TextReader.read
File "pandas/_libs/parsers.pyx", line 914, in pandas._libs.parsers.TextReader._read_low_memory
File "pandas/_libs/parsers.pyx", line 991, in pandas._libs.parsers.TextReader._read_rows
File "pandas/_libs/parsers.pyx", line 1123, in pandas._libs.parsers.TextReader._convert_column_data
File "pandas/_libs/parsers.pyx", line 1176, in pandas._libs.parsers.TextReader._convert_tokens
File "pandas/_libs/parsers.pyx", line 1299, in pandas._libs.parsers.TextReader._convert_with_dtype
File "pandas/_libs/parsers.pyx", line 1315, in pandas._libs.parsers.TextReader._string_convert
File "pandas/_libs/parsers.pyx", line 1553, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 15: invalid start byte
What I've tried:
`s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='utf-16')`
I get the error UnicodeError: UTF-16 stream does not start with BOM
What can be done to get this file to be read properly?
Try using s3 = pd.read_csv(working_dir+"S3.csv", sep=",", encoding='Latin-1')
Encoding issues mostly arise from the characters within the data. UTF-8 can represent every character, but it has a byte structure that must be respected at all times: a byte such as 0xa0 is only valid as a continuation of a multi-byte sequence and can never start a character. Latin-1, by contrast, maps every byte to a character of its own; for example, latin small letter i with diaeresis, right-pointing double angle quotation mark and inverted question mark are the single bytes 0xef, 0xbb and 0xbf, and 0xa0 is a no-break space. A file written in such an encoding therefore fails as soon as the UTF-8 decoder meets a stray 0xa0. Hence your error.
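If the encoding of the file is unknown, one option is to try a few candidates in order. This is only a sketch: pick_encoding is a hypothetical helper, and the candidate list is an assumption. Latin-1 accepts every byte, so it goes last and can return wrong characters without raising an error.

```python
def pick_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without an error."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue
        return enc
    raise ValueError("none of the candidate encodings fits")


# Stand-in for the file contents: 0xa0 is a no-break space in cp1252/Latin-1.
sample = b"id,amount\n1,1\xa0000\n"
print(pick_encoding(sample))        # cp1252
print(pick_encoding(b"id,amount"))  # utf-8
```

The bytes would come from open(working_dir+"S3.csv", "rb").read(), and the returned name can then be passed on as the encoding argument of pd.read_csv.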

python code of geograpy module gives some error

I solved some nltk errors, but these remain.
import geograpy
url = 'http://www.bbc.com/news/world-europe-26919928'
places = geograpy.get_place_context(url=url)
These are the errors generated:
Traceback (most recent call last):
File "C:\Users\Monika\Desktop\p.py", line 3, in <module>
places = geograpy.get_place_context(url=url)
File "C:\Users\Monika\AppData\Local\Programs\Python\Python37\lib\site-packages\geograpy\__init__.py", line 11, in get_place_context
pc.set_cities()
File "C:\Users\Monika\AppData\Local\Programs\Python\Python37\lib\site-packages\geograpy\places.py", line 137, in set_cities
self.populate_db()
File "C:\Users\Monika\AppData\Local\Programs\Python\Python37\lib\site-packages\geograpy\places.py", line 30, in populate_db
for row in reader:
File "C:\Users\Monika\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 274: character maps to <undefined>
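The traceback shows the rows being decoded by cp1252.py, which suggests the data file was opened with the Windows default encoding. A sketch of the difference an explicit encoding makes, assuming the file is really UTF-8 (the post does not confirm this); the sample data and read_rows are made up for illustration.

```python
import csv
import io

# Hypothetical UTF-8 data; the second letter c-with-caron encodes as 0xc4 0x8d,
# and byte 0x8d has no meaning in cp1252.
RAW = "city,country\n\u010ca\u010dak,Serbia\n".encode("utf-8")


def read_rows(raw, encoding):
    """Parse CSV bytes using an explicit text encoding."""
    text = io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="")
    return list(csv.reader(text))


print(len(read_rows(RAW, "utf-8")))  # 2
try:
    read_rows(RAW, "cp1252")
except UnicodeDecodeError as exc:
    print("cp1252 failed on byte", hex(exc.object[exc.start]))  # 0x8d
```

With a file on disk, the equivalent would be open(path, encoding="utf-8", newline="") instead of a bare open(path).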

spyder unicode decode error in startup

I was using spyder-ide while parsing a tumblr page with the permission of the author, and at some point everything just crashed. Even my Linux system froze. To cut to the chase: now I cannot start Spyder. It gives me the following error when I type spyder in my terminal:
Traceback (most recent call last):
File "/home/dk/anaconda3/bin/spyder", line 2, in <module>
from spyderlib import start_app
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/start_app.py", line 13, in <module>
from spyderlib.config import CONF
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/config.py", line 736, in <module>
subfolder=SUBFOLDER, backup=True, raw_mode=True)
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 215, in __init__
self.load_from_ini()
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 265, in load_from_ini
self.read(self.filename(), encoding='utf-8')
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 696, in read
self._read(fp, filename)
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 1012, in _read
for lineno, line in enumerate(fp, start=1):
File "/home/dk/anaconda3/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
I tried the solution here and received the following error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/spyder.py", line 107, in <module>
from spyderlib.utils.qthelpers import qapplication
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/utils/qthelpers.py", line 24, in <module>
from spyderlib.guiconfig import get_shortcut
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/guiconfig.py", line 22, in <module>
from spyderlib.config import CONF
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/config.py", line 736, in <module>
subfolder=SUBFOLDER, backup=True, raw_mode=True)
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 215, in __init__
self.load_from_ini()
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 265, in load_from_ini
self.read(self.filename(), encoding='utf-8')
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 696, in read
self._read(fp, filename)
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 1012, in _read
for lineno, line in enumerate(fp, start=1):
File "/home/dk/anaconda3/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
I tried uninstalling and reinstalling Anaconda, and that doesn't seem to help. I am open to suggestions. I am very new to Python, so I would appreciate a simple explanation of the possible causes of the error too.
Thanks in advance
Well, here is how I solved the issue.
I opened this: spyderlib/userconfig.py
and changed this: self.read(self.filename(), encoding='utf-8')
to this: self.read(self.filename(), encoding='latin-1')
It gave me a Warning: File contains no section headers but started Spyder anyway. After that, I closed Spyder, opened the terminal and entered spyder --reset, then restarted Spyder. It seems to work now.
Here is what you should not do at any cost for this problem: tinkering with these. I learned my lesson the hard way:
python3.5/configparser.py
python3.5/codecs.py
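A sketch of checking the settings file from outside, without editing any library module. It assumes the crash left Spyder's .ini file corrupted, which the post does not confirm. The helper name and the ".corrupt" suffix are invented, and the location of the .ini file is not known here, so it is passed in as a parameter.

```python
import configparser
import os


def load_or_quarantine(path):
    """Parse an .ini file as UTF-8; move it aside if it cannot be read.

    Returns the parser on success, or None after renaming the unreadable
    file to path + '.corrupt' so the application can write fresh defaults.
    """
    parser = configparser.ConfigParser()
    try:
        parser.read(path, encoding="utf-8")
    except (UnicodeDecodeError, configparser.Error):
        os.replace(path, path + ".corrupt")
        return None
    return parser
```

Moving the file aside keeps a copy for inspection and has roughly the effect of the spyder --reset step above.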
