Set filesystem encoding for python 3 on the Intel Edison - python-3.x

Steps to reproduce:
Create a file test.txt with content This is 中文 (i.e., UTF-8 encoded non-ASCII text).
Custom-compile python 3.5.2 on the Intel Edison.
Launch the custom-compiled python3 interpreter and issue the following piece of code:
with open('test.txt', 'r') as fh:
    fh.readlines()
Actual behavior:
A UnicodeDecodeError exception is thrown. The file is opened as 'ASCII' instead of 'UTF-8' by default:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
On a "regular" Linux system this problem is easily solved by setting a proper locale, see e.g. this post or that post. On the Intel Edison, however, I cannot set the LC_CTYPE since the default Yocto Linux distribution is missing locales (see e.g. this page).
I also tried to use a couple of other hacks like
import sys; sys.getfilesystemencoding = lambda: 'UTF-8'
import locale; locale.getpreferredencoding = lambda: 'utf-8'
And I tried setting the PYTHONIOENCODING=utf8 environment variable before starting the python interpreter.
However, none of this works. The only workaround is to pass the encoding explicitly as a keyword argument to the open() call. This works for the above snippet, but it does not set a system-wide default for all the packages I am using (which implicitly open files as ASCII and may or may not offer a way to override that default).
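For reference, that explicit-encoding workaround looks roughly like this. It is a minimal sketch: the file content matches the reproduction steps, while the temporary directory and the variable names are assumptions made only to keep the example self-contained.

```python
import os
import tempfile

# Write the sample file from the reproduction steps as UTF-8 bytes.
path = os.path.join(tempfile.mkdtemp(), 'test.txt')
with open(path, 'wb') as fh:
    fh.write('This is 中文\n'.encode('utf-8'))

# Passing encoding= explicitly bypasses the locale-derived default.
with open(path, 'r', encoding='utf-8') as fh:
    lines = fh.readlines()

# Forcing ASCII reproduces the failure regardless of the system locale.
try:
    with open(path, 'r', encoding='ascii') as fh:
        fh.readlines()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```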
What is the proper way to set the python interpreter default filesystem encoding? (Of course without installing unneeded system-wide locales.)

You can set the LC_ALL environment variable to alter the default:
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ LC_ALL='.ASCII' python3 -c 'import locale; print(locale.getpreferredencoding())'
US-ASCII
I tested this both on OS X and CentOS 7.
As for your other attempts, here is why they don't work:
sys.getfilesystemencoding() applies to filenames only (e.g. os.listdir() and friends).
The io module doesn't actually use the locale.getpreferredencoding() function, so replacing the function on the locale module has no effect. A lightweight _bootlocale.py bootstrap module is used instead. More on that below.
PYTHONIOENCODING only applies to sys.stdin, sys.stdout and sys.stderr.
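To see that these are three separate settings, a small diagnostic along these lines can help. It is only a sketch: the dictionary labels are illustrative, and the values printed depend on the machine and the Python build.

```python
import locale
import sys

# Each value is controlled by a different mechanism, so they can disagree.
settings = {
    'filenames (sys.getfilesystemencoding)': sys.getfilesystemencoding(),
    'open() default (locale.getpreferredencoding)': locale.getpreferredencoding(False),
    'stdout (sys.stdout.encoding)': getattr(sys.stdout, 'encoding', None),
}

for name, value in settings.items():
    print('{}: {}'.format(name, value))
```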
If setting environment variables ultimately fails, you can still patch the _bootlocale module:
import _bootlocale
old = _bootlocale.getpreferredencoding # so you can restore
_bootlocale.getpreferredencoding = lambda *args: 'UTF-8'
This works for me (again on OS X and CentOS 7, tested with 3.6):
>>> import _bootlocale
>>> open('/tmp/test.txt', 'w').encoding # showing pre-patch setting
'UTF-8'
>>> old = _bootlocale.getpreferredencoding
>>> _bootlocale.getpreferredencoding = lambda *a: 'ASCII'
>>> open('/tmp/test.txt', 'w').encoding # gimped hook
'ASCII'
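If a newer interpreter is an option, Python 3.7 and later offer UTF-8 mode (the -X utf8 option, or PYTHONUTF8=1 in the environment), which makes open() default to UTF-8 regardless of the locale. This does not help with the 3.5.2 build from the question, and _bootlocale is a private module that may not exist in other releases, so treat the following as a sketch for newer interpreters only. The child script and variable names are illustrative.

```python
import subprocess
import sys

# Child script: report UTF-8 mode and the encoding open() picks by default.
child = (
    "import sys, tempfile\n"
    "with tempfile.TemporaryFile('w') as fh:\n"
    "    print(sys.flags.utf8_mode, fh.encoding.lower())\n"
)

# -X utf8 enables UTF-8 mode (Python 3.7+); PYTHONUTF8=1 is the
# environment-variable equivalent.
result = subprocess.run(
    [sys.executable, '-X', 'utf8', '-c', child],
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
report = result.stdout.split()
print(report)
```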

Related

Pygrib UnicodeEncodeError

I use python 3.9.1 on macOS Big Sur with an M1 chip.
I would like to open the grib format file which is provided by Japan Meteorological Agency.
So, I tried to use the pygrib library as below:
import pygrib
gpv_file = pygrib.open("Z__C_RJTD_20171205000000_GSM_GPV_Rjp_Lsurf_FD0000-0312_grib2.bin")
But I got an error like this:
----> 1 gpv_file = pygrib.open("Z__C_RJTD_20171205000000_GSM_GPV_Rjp_Lsurf_FD0000-0312_grib2.bin")
pygrib/_pygrib.pyx in pygrib._pygrib.open.__cinit__()
pygrib/_pygrib.pyx in pygrib._pygrib._strencode()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 50-51: ordinal not in range(128)
I asked other people to run the same code, and it somehow worked.
I am not sure what the problem is and how to fix it.

Why do different Python versions behave differently when printing to standard output?

Python 3.4 and Python 3.8/3.9 behave differently when I execute the statement below:
print('\u212B')
Python 3.8/3.9 can print it correctly.
Å
Python 3.4 will report an exception:
Traceback (most recent call last):
File "test.py", line 9, in <module>
print('\u212B')
UnicodeEncodeError: 'gbk' codec can't encode character '\u212b' in position 0: illegal multibyte sequence
According to this page, I can avoid the exception by replacing sys.stdout with the statement:
sys.stdout = io.TextIOWrapper(buffer=sys.stdout.buffer,encoding='utf-8')
But Python 3.4 still prints a different character, as shown below:
鈩?
So my questions are:
Why do different Python versions behave differently when printing to standard output?
How can I print the correct value Å in Python 3.4?
Edit 1:
I guess the difference is caused by PEP 528 -- Change Windows console encoding to UTF-8. But I still don't understand the mechanism of console encoding or how I can print the correct character in Python 3.4.
Edit 2:
One more difference: sys.getfilesystemencoding() returns utf-8 in Python 3.8/3.9 and mbcs in Python 3.4.
Why?
Regarding the rationale behind the stdout encoding you can read more in the answers here: Changing default encoding of Python?
In short, Python 3.4 is using your OS's encoding by default as the one for stdout whereas with Python 3.8 it is set to UTF-8.
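The difference can be reproduced without a console by encoding the character by hand. This is a rough sketch; the variable names are illustrative, and the last step only approximates what a GBK console does with UTF-8 bytes.

```python
text = '\u212B'  # ANGSTROM SIGN, the character from the question

# A GBK-encoded stdout has no mapping for this character, hence the exception.
try:
    text.encode('gbk')
    gbk_ok = True
except UnicodeEncodeError:
    gbk_ok = False

# UTF-8 can encode it, but a console that still decodes its input as GBK
# misreads those bytes, which is the likely source of the garbled output.
utf8_bytes = text.encode('utf-8')
misread = utf8_bytes.decode('gbk', errors='replace')

print(gbk_ok, utf8_bytes)
```

This is why wrapping sys.stdout in a UTF-8 TextIOWrapper removes the exception but does not by itself produce the right glyph: the console must also interpret the bytes as UTF-8.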
How to fix this?
You can use a new method - reconfigure introduced with Python 3.7:
sys.stdout.reconfigure(encoding='utf-8')
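The effect of reconfigure can be checked on any text stream, not only sys.stdout. The sketch below uses an in-memory io.BytesIO as a stand-in for the console (an assumption for illustration) and requires Python 3.7 or later.

```python
import io

# An in-memory byte buffer stands in for the real console here.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding='ascii')

# Switch the existing wrapper to UTF-8 without replacing the object.
stream.reconfigure(encoding='utf-8')
stream.write('\u212B')
stream.flush()

encoded = raw.getvalue()
print(stream.encoding, encoded)
```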
You can also try setting the environment variable PYTHONIOENCODING to utf-8 before starting Python:
set PYTHONIOENCODING=utf8
This is enough on most operating systems. On Windows with Python 3.6 or later, the variable is ignored for the interactive console unless another environment variable is also set:
set PYTHONLEGACYWINDOWSSTDIO=1
In Python versions before 3.7, you can install the win-unicode-console package, which handles Unicode console output transparently on Windows:
pip install win-unicode-console
If you are not running the code directly from a console there is a possibility that your IDE configuration is interfering.

Reading data in Spyder with a different encoding

I'm trying to read a .spydata file into Spyder. It was written on a different platform (and probably with a different encoding), and Spyder gives an error:
'ascii' codec can't decode byte 0xb5 in position 2: ordinal not in range(128)
I tried changing my encoding setting before loading spyder without success. Any ideas?
I fixed the encoding problem with this line on top of the file:
# -*- coding: utf-8 -*-
https://python.readthedocs.io/en/stable/howto/unicode.html
You can change the encoding if you need to. But I don't know whether you can add the encoding line to the file you mentioned. Hope it helps.

Tkinter Import Error on Puppy Linux

I'm having a few problems running tkinter on my Puppy Linux laptop using Geany to write and execute the code. I've got Python 3.1 running ok, but whenever I try to
import tkinter
I get the following error message:
file "/usr/lib/python3.1/tkinter/__init__.py", line 40, in <module>
import _tkinter
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-3: ordinal not in range(256)
I don't have the foggiest what's going on!
I changed the Geany filetypes.python document from:
compiler=python -m py_compile "%f"
run_cmd=python "%f"
which runs Python2 as the default (tkinter imports fine with this!), to:
compiler=python3.1 -c "import py_compile;py_compile.compile('%f')"
run_cmd=python3.1 "%f"
in order to allow me to use python 3.1 as the default. I did copy this code, rather than writing it as I'm not familiar with Linux.
I would like to know how I can successfully use tkinter with Python3 on PuppyLinux. I'm not tied in to using Geany, so any help is appreciated!
Rich

python 3.4 encoding in windows 8.1

I use the script mentioned in this question, to check the encoding:
import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))
and I obtain (python 3.4, windows 8.1):
windows-1252
False
cp1252
mbcs
windows-1252
ö Traceback (most recent call last):
File "C:/Users/.../UTF8-comprovacio.py", line 8, in <module>
print(chr(246), chr(9786), chr(9787))
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position 0: character maps to <undefined>
I'm trying to change the encoding on Windows 8.1 (e.g. I added an environment variable called "PYTHONIOENCODING" with the value utf8), but the result is always the same. How can I change the encoding and the value of PYTHONIOENCODING in Windows 8.1? (In fact, I have another computer, also with Windows 8.1, that shows the correct values, utf-8, but I don't know why.)
I had the same problem last week... I ended up just changing it in the IDE. I don't know which IDE you use, but in PyCharm, starting from the menu bar: File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8, and it now works like a charm.
Maybe yours has a similar option.
