I'm trying to output Unicode symbols from Python 3. The program is a simple one-liner that prints non-ASCII literals:
print("íó")
The program is encoded in UTF-8.
I'm running this program on two different Windows machines (Windows 7, Windows 10); on each machine I'm running this from both cmd and MinGW.
It works on Windows 10 (both cmd and MinGW).
On Windows 7, the output degrades to ASCII when run from cmd, and Python throws an exception when run from MinGW:
Traceback (most recent call last):
File "test.py", line 4, in <module>
print("\xed\xf3")
File "C:\Program Files (x86)\Python36-32\lib\encodings\cp1251.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
What could be the problem? What could differ between the machines (the Python and MinGW versions are very close)? And what is different about running Python from cmd versus MinGW that causes the exception?
Machine configurations:
Windows 7; Python 3.6.0; MinGW 2.8.3
Windows 10; Python 3.6.5; MinGW 2.8.5
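The traceback in the question names the cp1251 codec, the ANSI code page of Cyrillic-locale Windows installs, presumably used because under MinGW stdout is a pipe rather than a Windows console. A quick check shows that cp1251 simply has no byte for í or ó, whereas cp1252 and UTF-8 do. This is a minimal sketch; the helper name encodable_in is hypothetical:

```python
def encodable_in(text, encoding):
    """Return True if `text` can be encoded with `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)  # str.encode uses strict error handling by default
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    for enc in ("utf-8", "cp1252", "cp1251"):
        print(enc, encodable_in("íó", enc))
```

So any output stream that ends up encoding with cp1251 will raise on this print, regardless of how the source file itself is encoded.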
Related
Python 3.4 and Python 3.8/3.9 behave differently when I execute the statement below:
print('\u212B')
Python 3.8/3.9 can print it correctly.
Å
Python 3.4 raises an exception:
Traceback (most recent call last):
File "test.py", line 9, in <module>
print('\u212B')
UnicodeEncodeError: 'gbk' codec can't encode character '\u212b' in position 0: illegal multibyte sequence
According to this page, I can avoid the exception by overwriting sys.stdout:
import io, sys
sys.stdout = io.TextIOWrapper(buffer=sys.stdout.buffer, encoding='utf-8')
But Python 3.4 still prints a different character:
鈩?
So my questions are:
Why do different Python versions behave differently when printing to standard output?
How can I print the correct character Å in Python 3.4?
Edit 1:
I guess the difference is caused by PEP 528 -- Change Windows console encoding to UTF-8. But I still don't understand the mechanism of console encoding, or how to print the correct character in Python 3.4.
Edit 2:
One more difference: sys.getfilesystemencoding() returns utf-8 in Python 3.8/3.9 but mbcs in Python 3.4.
Why?
For the rationale behind the stdout encoding, see the answers to: Changing default encoding of Python?
In short, Python 3.4 encodes stdout with the console's legacy code page by default (GBK in your case), whereas since Python 3.6 (PEP 528), including 3.8, stdout uses UTF-8 when attached to the Windows console.
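This also explains the 鈩? from the Python 3.4 run, assuming (as the 'gbk' error message suggests) that the console decodes bytes as GBK (code page 936). The UTF-8 wrapper writes the three UTF-8 bytes of U+212B, and reading those bytes back as GBK yields 鈩 plus one incomplete byte, which the console shows as ?. A minimal sketch reproducing that:

```python
# U+212B ANGSTROM SIGN, as in the question.
text = "\u212B"

# What the utf-8 TextIOWrapper actually hands to the console.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'\xe2\x84\xab'

# What a GBK console makes of those bytes: 0xE2 0x84 form one GBK character,
# and the trailing 0xAB is an incomplete double-byte sequence, so it is replaced.
garbled = utf8_bytes.decode("gbk", errors="replace")
print(ascii(garbled))
```

In other words, replacing sys.stdout only changes which bytes Python emits; the console still has to interpret them with its own code page.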
How to fix this?
Since Python 3.7, you can use the reconfigure() method of text streams:
import sys
sys.stdout.reconfigure(encoding='utf-8')
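If the same script has to run on both old and new interpreters, one option is to try reconfigure() first and fall back to wrapping the underlying buffer, as in the question. A minimal sketch; the helper name force_utf8 is hypothetical. It only controls which bytes Python writes: on Python 3.4 a GBK console will still render UTF-8 bytes as 鈩?, so for a real Windows console on older versions the package mentioned below is the more complete fix.

```python
import io


def force_utf8(stream):
    """Return a text stream that encodes output as UTF-8 (hypothetical helper)."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the encoding of the existing stream in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older Python 3: detach the binary buffer (detach() flushes first and leaves
    # the old wrapper unusable) and wrap it in a new UTF-8 TextIOWrapper.
    line_buffering = stream.line_buffering
    return io.TextIOWrapper(stream.detach(), encoding="utf-8",
                            line_buffering=line_buffering)
```

Usage: sys.stdout = force_utf8(sys.stdout) at the top of the script, then print('\u212B') as usual. Detaching, rather than wrapping sys.stdout.buffer while the old wrapper is still alive, avoids two text layers sharing one buffer.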
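Alternatively, you can try setting the environment variable PYTHONIOENCODING to utf-8 before starting Python (in a POSIX shell, use export instead of set):
set PYTHONIOENCODING=utf8
This works on most operating systems, but on Windows with Python 3.6+ the variable is ignored for the interactive console unless another environment variable is also set:
set PYTHONLEGACYWINDOWSIOENCODING=1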
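Because PYTHONIOENCODING is read when the interpreter starts, it has to be present in the environment of the process that runs the script; changing os.environ inside the script does not affect that script's own stdout. A minimal sketch that launches a child interpreter with the variable set and inspects what it writes. The child's stdout is a pipe rather than a console, so the Windows console exception above does not apply; the optional ":errors" suffix (here ascii:replace) selects the error handler. The names CHILD and run_child are hypothetical.

```python
import codecs
import os
import subprocess
import sys

# Child program: report its stdout encoding, then print U+212B.
CHILD = "import sys; print(sys.stdout.encoding); print('\\u212B')"


def run_child(io_encoding):
    """Run a child Python with PYTHONIOENCODING set; return its stdout lines as bytes."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # stdout is captured through a pipe, so the variable takes effect on every OS.
    raw = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return raw.splitlines()


if __name__ == "__main__":
    for setting in ("utf-8", "ascii:replace"):
        encoding, angstrom = run_child(setting)
        print(setting, codecs.lookup(encoding.decode("ascii")).name, angstrom)
```

With utf-8 the child emits the three UTF-8 bytes of Å; with ascii:replace it reports ascii and emits a single ? instead of raising.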
For Python versions before 3.7, you can instead install the win-unicode-console package, which transparently handles Unicode console I/O on Windows:
pip install win-unicode-console
If you are not running the code directly from a console, your IDE configuration may be interfering.
The command print(u'\u2588') works in the online Python compiler at https://repl.it/languages/python3, but not on a Raspberry Pi accessed from a Windows 10 terminal through PuTTY. The following error appears:
>>> print('\u2588')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u2588' in position 0: ordinal not in range(128)
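I would appreciate any help.
The problem was resolved by a fresh install of Raspbian Buster with Python 3.7.3 (previously I had Python 3.5), so the old Python version may have been the root cause.
One possible explanation: Python 3.7 coerces the legacy C/POSIX locale to a UTF-8 based locale (PEP 538) instead of using ASCII for stdout.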
I'm struggling in Python 3.8.0 to unpack a long as shown in the example below. Is there a workaround, or a syntax change that I don't understand?
import struct
test = b'\x02\x00\x00\x00'
In Python 3.7.3 (Windows):
struct.calcsize('l')
struct.unpack('l',test)
produces
4
(2,)
but in Python 3.8.0 (Linux) it returns
8
and
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
struct.error: unpack requires a buffer of 8 bytes
Thanks for any help.
It's not a Python version issue; it's a platform issue. In native mode ('l' with no byte-order prefix), struct uses the platform C compiler's size for long.
On x86-64:
Windows (MSVC): long is 4 bytes
Linux (GCC): long is 8 bytes
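The usual fix is to request standard sizes instead of native ones: a byte-order prefix such as '<' (little-endian) makes 'l' exactly 4 bytes on every platform. A minimal sketch, assuming the data in the question is a little-endian 32-bit signed integer:

```python
import struct

test = b'\x02\x00\x00\x00'

# Native mode ('l' with no prefix) uses the C compiler's long, so this value
# depends on the platform (4 on Windows x86-64, 8 on Linux x86-64).
print(struct.calcsize('l'))

# Standard mode: '<' means little-endian, standard sizes and no alignment,
# so 'l' is always 4 bytes.
print(struct.calcsize('<l'))      # 4
print(struct.unpack('<l', test))  # (2,)
```

'<i' would work equally well; the point is that any prefix other than '@' switches from native to standard sizes.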
I use the script mentioned in this question to check the encoding:
import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ.get("PYTHONIOENCODING"))
print(chr(246), chr(9786), chr(9787))
and I get (Python 3.4, Windows 8.1):
windows-1252
False
cp1252
mbcs
windows-1252
ö Traceback (most recent call last):
File "C:/Users/.../UTF8-comprovacio.py", line 8, in <module>
print(chr(246), chr(9786), chr(9787))
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position 0: character maps to <undefined>
I'm trying to change the Windows 8.1 encoding (e.g. I added an environment variable called "PYTHONIOENCODING" with the value utf8), but the result is always the same. How can I change the encoding and the value of PYTHONIOENCODING in Windows 8.1? (In fact, I have another computer, also with Windows 8.1, that shows the correct values, utf-8, but I don't know why.)
I had the same problem last week and ended up just changing the IDE settings. I don't know your IDE, but in PyCharm, from the menu bar go to File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8. Now it works like a charm.
Maybe yours has a similar option.
When I try to print a test string:
print("áéíóú")
it works fine on my x64 computer, but on my ARM7 server, which also has Python 3, I get:
Traceback (most recent call last):
File "test.py", line 11, in <module>
print("\xe1\xe9\xedo\xfa")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
I'm puzzled. I would expect this error on Python 2.x, where strings are byte strings and the default encoding is ASCII, but in Python 3 strings are Unicode by default. Any tips?
I suspect your server's terminal is ASCII-only. Check by putting string creation and printing on separate lines and seeing which line raises:
s = "\xe1\xe9\xedo\xfa"
print(s)
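The same error can be reproduced without the server by printing to a text stream whose encoding is ASCII, which is what print() gets on an ASCII-only terminal. A minimal sketch; io.BytesIO standing in for the terminal, and the helper name write_to_ascii_terminal, are assumptions for illustration. Building the string succeeds, only encoding it for output fails, and an error handler such as backslashreplace avoids the exception at the cost of escaped output.

```python
import io


def write_to_ascii_terminal(text, errors="strict"):
    """Encode `text` as print() would on an ASCII-only terminal; return the bytes."""
    raw = io.BytesIO()
    terminal = io.TextIOWrapper(raw, encoding="ascii", errors=errors)
    print(text, file=terminal)  # raises UnicodeEncodeError in strict mode
    terminal.flush()
    return raw.getvalue()


if __name__ == "__main__":
    s = "\xe1\xe9\xedo\xfa"  # creating the str always works in Python 3
    try:
        write_to_ascii_terminal(s)
    except UnicodeEncodeError as exc:
        print("print() would fail:", exc.reason)
    print(write_to_ascii_terminal(s, errors="backslashreplace"))
```

If the assignment line runs but print(s) raises, the fix belongs in the terminal locale or the stream encoding (for example PYTHONIOENCODING), not in the string.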