UTF-8 is not the decoder - python-3.x

I'm using a program that reads data from an ubertooth-one device. I write the input data to a named pipe (created with mkfifo), but when I try to read the data I get the following error:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/usr/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/home/k1k4ss0/Escritorio/proyecto/tests/tmp.py", line 44, in interpretar
print ("{}".format(fifo.readline()))
File "/usr/lib/python3.9/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb2 in position 2: invalid start byte
My code is this one:
# with FIFO as : '/tmp/pipe'
def interpretar():
    with open(FIFO) as fifo:
        while True:
            print("{}".format(fifo.readline()))
The command executed is ubertooth-rx -q /tmp/pipe
which, according to the documentation, means:
• -q <file.pcap> : Capture packets to PCAP

The data you are getting is a binary packet capture, not Unicode text, so it can't be decoded as text. Use open(FIFO, "rb") to open the pipe in binary mode, and read() (instead of readline()) to read the binary data.
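As a minimal sketch of that suggestion (the helper name read_chunks and the chunk_size parameter are made up for illustration, not part of the original code), reading the pipe as raw bytes could look like this. Opening with buffering=0 is an assumption on my part, so that each read() returns whatever is currently available instead of waiting for a full buffer:

```python
def read_chunks(path, chunk_size=4096):
    """Yield raw bytes from `path` until the writer closes it (EOF).

    `read_chunks` and `chunk_size` are hypothetical names for this sketch.
    """
    # "rb" disables text decoding entirely, so no codec is involved.
    # buffering=0 gives an unbuffered stream: each read() is one system
    # call that returns up to chunk_size bytes.
    with open(path, "rb", buffering=0) as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:  # b"" means end of file / writer closed the pipe
                break
            yield chunk
```

It would be used as for chunk in read_chunks('/tmp/pipe'): ..., and each chunk is a bytes object that can be passed to a pcap parser rather than printed as text.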

Related

How to enforce encoding for sqlcipher/sqlite export?

I'm trying to work with the JSON output taken from an sqlcipher output (taken from here)
sqlcipher/sqlcipher -json -noheader "db.sqlite" "PRAGMA key = \"x'"MY-ENCRYPTION-KEY"'\";PRAGMA encoding = \"UTF-8\";select * from messages_fts_data;" > messages_fts_data.json
but when I try to load the content with a Python3 script I'm getting problems with the encoding:
>>> json.load(open("messages_fts_data.json"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.7/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/usr/lib64/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 32: invalid start byte
(I'm cheating a bit here: in my 'real' code I strip off b'[{"ok":"ok"}]\n' first, but since I get the error while reading the file, it's definitely not just a JSON error.)
When I try to handle the encoding manually by specifying encoding="utf-8" in the load() or open() calls, I get the same error.
What am I doing wrong? I thought I told sqlcipher to generate UTF-8 encoded output, but this doesn't seem to work.
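One way to narrow this down is to read the file as bytes and look at what surrounds the offending position. The sketch below is only a diagnostic aid, not a fix; the helper name find_invalid_utf8 and its context parameter are invented for this example:

```python
def find_invalid_utf8(data, context=8):
    """Return (offset, surrounding_bytes) for the first byte sequence that
    is not valid UTF-8, or None if `data` decodes cleanly.

    `find_invalid_utf8` and `context` are hypothetical names.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start / exc.end delimit the undecodable bytes.
        start = max(exc.start - context, 0)
        return exc.start, data[start:exc.end + context]
    return None
```

Calling it on open("messages_fts_data.json", "rb").read() would show the raw bytes around position 32, which helps to tell whether the column holds text in another encoding or binary data.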

Zipfile / shutil.make_archive throws EncodeError on german umlauts

I'm trying to zip a folder in Python 3 with the module zipfile.
Since I'm German, I have some filenames containing umlauts (äöü).
While zipping, I get a UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed.
The character in question is an ü.
How can I get zipfile to zip all my files?
The relevant code is this:
import os
import zipfile

def zipdir(path, ziph):
    # Walk the directory tree and add every file to the open archive.
    for root, dirs, files in os.walk(path):
        for file in files:
            ziph.write(os.path.join(root, file))

if __name__ == '__main__':
    zipf = zipfile.ZipFile('path/to/destination', 'w', zipfile.ZIP_DEFLATED)
    zipdir('path/to/folder', zipf)
    zipf.close()
Edit:
I've got the same error when I'm using shutil.make_archive.
import shutil
shutil.make_archive('/path/to/destination', 'zip', '/path/to/folder')
Full stacktrace of shutil.make_archive():
Traceback (most recent call last):
File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 59: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 39, in <module>
archive_dir(path, zip_fullpath)
File "run.py", line 19, in archive_dir
shutil.make_archive(dest, 'zip', source)
File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 822, in make_archive
filename = func(base_name, base_dir, **kwargs)
File "/home/sean/.local/share/virtualenvs/backup-script-QUcRKrDQ/lib/python3.7/shutil.py", line 720, in _make_zipfile
zf.write(path, path)
File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
self.fp.write(zinfo.FileHeader(zip64))
File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
filename, flag_bits = self._encodeFilenameFlags()
File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 59: surrogates not allowed
Full stacktrace of zipfile:
Traceback (most recent call last):
File "/usr/lib64/python3.7/zipfile.py", line 452, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode character '\udcfc' in position 95: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath)
File "/usr/lib64/python3.7/zipfile.py", line 1746, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/usr/lib64/python3.7/zipfile.py", line 1473, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/usr/lib64/python3.7/zipfile.py", line 1586, in _open_to_write
self.fp.write(zinfo.FileHeader(zip64))
File "/usr/lib64/python3.7/zipfile.py", line 442, in FileHeader
filename, flag_bits = self._encodeFilenameFlags()
File "/usr/lib64/python3.7/zipfile.py", line 454, in _encodeFilenameFlags
return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcfc' in position 95: surrogates not allowed
Update:
I've tried some solutions that seemed to work for some at the posted link. This is what I've got:
With ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1')) I got:
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath.encode('utf8','surrogateescape').decode('ISO-8859-1'))
File "/usr/lib64/python3.7/zipfile.py", line 1713, in write
zinfo = ZipInfo.from_file(filename, arcname)
File "/usr/lib64/python3.7/zipfile.py", line 506, in from_file
st = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: '/some/path/to/documents/DIS_Broschüre_DE.pdf'
So the encoding/decoding returned something that can not be found in the file system.
The other option: ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8')) got me
Traceback (most recent call last):
File "run.py", line 41, in <module>
zipdir(path, zipf)
File "run.py", line 16, in zipdir
ziph.write(filepath.encode('utf8','surrogateescape').decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 96: invalid start byte
OK, I've found the problem.
The files in question were not the ones I thought they were. Ordinary umlauts work fine. The filenames themselves were actually corrupt, like this:
ls in one of the dirs gives:
2e_geh�usetechnologie_flyer_qrcode.pdf
Command line auto completion gives me:
2e_geh$'\344'usetechnologie_flyer_qrcode.pdf
Since these files were uploaded via a web interface, I can only imagine that they were created on Windows or another non-UNIX OS and the web server couldn't handle the names.
Other uploaded files had correct umlauts. I'm not sure what happened there, but I'm glad that neither Python nor the Linux FS is to blame.
Thanks for all the tips.
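For anyone who cannot simply rename such files: the '\udcfc' in the error is how Python represents an undecodable filename byte (here 0xfc) through the surrogateescape error handler. A possible workaround, sketched below under the assumption that the stray bytes are Latin-1 (which matches the \344 / ä shown above), is to keep reading the file under its real name but store it in the archive under a repaired name. The helper repair_name is made up for this example:

```python
import os
import zipfile


def repair_name(name, fallback="latin-1"):
    """Replace surrogate-escaped bytes in `name` with characters decoded
    from a single-byte fallback encoding.

    `repair_name` is a hypothetical helper; "latin-1" is an assumption
    about how the corrupt names were originally encoded.
    """
    repaired = []
    for char in name:
        code = ord(char)
        if 0xDC80 <= code <= 0xDCFF:
            # Lone surrogate created by surrogateescape: recover the
            # original byte and decode it with the fallback encoding.
            repaired.append(bytes([code - 0xDC00]).decode(fallback))
        else:
            repaired.append(char)
    return "".join(repaired)


def zipdir(path, ziph):
    for root, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(root, file)
            # Open the file by its real name, but write an encodable
            # name into the archive.
            ziph.write(full_path, arcname=repair_name(full_path))
```

Names that are already valid pass through unchanged, so ordinary umlauts are not affected.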

What is this error when i try to parse a simple pcap file?

import dpkt
f = open('gtp.pcap')
pcap = dpkt.pcap.Reader(f)
for ts, buf in pcap:
    eth = dpkt.ethernet.Ethernet(buf)
    print(eth)
(venv) user@user-OptiPlex-7010:~/gtp_gaurang$ python3 new.py
Traceback (most recent call last):
File "new.py", line 4, in <module>
pcap = dpkt.pcap.Reader(f)
File "/home/user/gtp_gaurang/venv/lib/python3.5/site-packages/dpkt/pcap.py", line 244, in __init__
buf = self.__f.read(FileHdr.__hdr_len__)
File "/usr/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 16: invalid start byte
I am running this simple pcap parser code, but it shows the above error. Can anyone please help?
Please check the related answer. As it suggests, the UTF-8 decoder has hit a byte that it cannot decode. If you read the file in binary mode, no decoding takes place and the file contents stay as bytes, so the error does not occur.
Open the file in binary mode:
f = open('gtp.pcap', 'rb')
pcap = dpkt.pcap.Reader(f)
...
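To see why a pcap file is not text, it can help to look at its header directly. The following is a rough standard-library sketch, assuming the classic libpcap file format with its 24-byte global header (not pcapng); the function name parse_pcap_header is made up for illustration and is not part of dpkt:

```python
import struct

# Classic pcap global header: magic, version major/minor, timezone offset,
# timestamp accuracy, snapshot length, link-layer type (24 bytes in total).
_FIELDS = "IHHiIII"


def parse_pcap_header(raw):
    """Parse the 24-byte global header of a classic pcap file.

    `parse_pcap_header` is a hypothetical helper for this sketch.
    """
    if len(raw) < 24:
        raise ValueError("pcap global header needs 24 bytes")
    # The magic number tells us which byte order the writer used.
    (magic,) = struct.unpack("<I", raw[:4])
    if magic == 0xA1B2C3D4:
        order = "<"
    elif magic == 0xD4C3B2A1:
        order = ">"
    else:
        raise ValueError("not a classic pcap file")
    _, major, minor, _, _, snaplen, linktype = struct.unpack(order + _FIELDS, raw[:24])
    return {"version": (major, minor), "snaplen": snaplen, "linktype": linktype}
```

With f = open('gtp.pcap', 'rb'), calling parse_pcap_header(f.read(24)) returns packed integers as numbers. These are arbitrary byte values, which is why trying to decode them as UTF-8 fails.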

stanford-dependency parser with NLTK :UnicodeDecodeError:

I am trying to run the following lines of code:
import os
os.environ['JAVAHOME'] = 'path/to/java.exe'
os.environ['STANFORD_PARSER'] = 'path/to/stanford-parser.jar'
os.environ['STANFORD_MODELS'] = 'path/to/stanford-parser-3.8.0-models.jar'
from nltk.parse.stanford import StanfordDependencyParser
dep_parser = StanfordDependencyParser(model_path="path/to/englishPCFG.ser.gz")
sentence = "sample sentence ..."
# Dependency Parsing:
print("Dependency Parsing:")
print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
and at the line:
print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
I get the following issues:
Traceback (most recent call last):
File "C:/Users/Norbert/PycharmProjects/untitled/StanfordDependencyParser.py", line 21, in <module>
print([parse.tree() for parse in dep_parser.raw_parse(sentence)])
File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 134, in raw_parse
return next(self.raw_parse_sents([sentence], verbose))
File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 152, in raw_parse_sents
return self._parse_trees_output(self._execute(cmd, '\n'.join(sentences), verbose))
File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\parse\stanford.py", line 218, in _execute
stdout=PIPE, stderr=PIPE)
File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 135, in java
print(_decode_stdoutdata(stderr))
File "C:\Users\Norbert\AppData\Local\Programs\Python\Python36\lib\site-packages\nltk\internals.py", line 737, in _decode_stdoutdata
return stdoutdata.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xac in position 3097: invalid start byte
Any idea what could be wrong? I am not even dealing with any non-UTF-8 text.
I can print a few things by doing the following. It may not be exactly what you wanted, but it is a start.
print("Dependency Parsing:")
result = dep_parser.raw_parse(sentence)
#print(next(result))
dep = next(result)
print(list(dep.triples()))
Uncomment the line print(next(result)) if you want to see the entire output.
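Note that the traceback above fails inside nltk/internals.py while decoding the Java process's stderr, not while handling the sentence itself. As a general illustration of tolerant decoding (this is not a patch for NLTK, and the helper name decode_process_output is invented here), bytes from a subprocess can be decoded without raising like this:

```python
def decode_process_output(raw, encoding="utf-8"):
    """Decode subprocess output, substituting U+FFFD for undecodable bytes.

    `decode_process_output` is a hypothetical helper for this sketch.
    """
    # errors="replace" never raises; each invalid byte becomes "\ufffd".
    return raw.decode(encoding, errors="replace")
```

The replacement characters then show where the process emitted bytes in some other encoding, for example a localized console message.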

spyder unicode decode error in startup

I was using the Spyder IDE while parsing a Tumblr page (with the author's permission), and at some point everything crashed; even my Linux system froze. To cut to the chase, I can no longer start Spyder. After I type spyder in my terminal, it gives me the following error:
Traceback (most recent call last):
File "/home/dk/anaconda3/bin/spyder", line 2, in <module>
from spyderlib import start_app
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/start_app.py", line 13, in <module>
from spyderlib.config import CONF
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/config.py", line 736, in <module>
subfolder=SUBFOLDER, backup=True, raw_mode=True)
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 215, in __init__
self.load_from_ini()
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 265, in load_from_ini
self.read(self.filename(), encoding='utf-8')
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 696, in read
self._read(fp, filename)
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 1012, in _read
for lineno, line in enumerate(fp, start=1):
File "/home/dk/anaconda3/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
I tried the solution here and received the following error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/spyder.py", line 107, in <module>
from spyderlib.utils.qthelpers import qapplication
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/utils/qthelpers.py", line 24, in <module>
from spyderlib.guiconfig import get_shortcut
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/guiconfig.py", line 22, in <module>
from spyderlib.config import CONF
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/config.py", line 736, in <module>
subfolder=SUBFOLDER, backup=True, raw_mode=True)
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 215, in __init__
self.load_from_ini()
File "/home/dk/anaconda3/lib/python3.5/site-packages/spyderlib/userconfig.py", line 265, in load_from_ini
self.read(self.filename(), encoding='utf-8')
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 696, in read
self._read(fp, filename)
File "/home/dk/anaconda3/lib/python3.5/configparser.py", line 1012, in _read
for lineno, line in enumerate(fp, start=1):
File "/home/dk/anaconda3/lib/python3.5/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 0-1: invalid continuation byte
I tried uninstalling and reinstalling Anaconda, but that doesn't seem to help. I am open to suggestions. I am very new to Python, so I would also appreciate a simple explanation of the possible causes of the error.
Thanks in advance
Here is how I solved the issue.
I opened this: spyderlib/userconfig.py
and changed this: self.read(self.filename(), encoding='utf-8')
to this: self.read(self.filename(), encoding='latin-1')
It gave me a Warning: File contains no section headers but started Spyder anyway. After that, I closed Spyder, opened the terminal and ran spyder --reset, then restarted Spyder. It seems to work now.
Here is what you should not do at any cost for this problem. I learned my lesson the hard way by tinkering with these:
python3.5/configparser.py
python3.5/codecs.py
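A likely reason the change works is that latin-1 maps every possible byte to a character, so decoding can never fail. The "no section headers" warning then suggests the config file itself was corrupt, probably from the crash. As a sketch for checking a config file without editing library code (the helper name config_is_readable is made up, and this is not how Spyder itself does it):

```python
import configparser


def config_is_readable(path):
    """Return True if `path` is valid UTF-8 and parses as an INI file.

    `config_is_readable` is a hypothetical helper for this sketch.
    """
    parser = configparser.ConfigParser(interpolation=None)
    try:
        with open(path, encoding="utf-8") as handle:
            parser.read_file(handle)
    except (UnicodeDecodeError, configparser.Error):
        # Undecodable bytes or broken INI structure: treat as corrupt.
        return False
    return True
```

If this returns False for the Spyder .ini file, deleting or resetting that file (as spyder --reset does) is safer than changing the encoding in site-packages.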
