I use the script mentioned in this question to check the encoding:
import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))
and I obtain (Python 3.4, Windows 8.1):
windows-1252
False
cp1252
mbcs
windows-1252
ö Traceback (most recent call last):
File "C:/Users/.../UTF8-comprovacio.py", line 8, in <module>
print(chr(246), chr(9786), chr(9787))
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u263a' in position 0: character maps to <undefined>
I'm trying to change the Windows 8.1 encoding (e.g. I added an environment variable called "PYTHONIOENCODING" with the value utf8), but the result is always the same. How can I change the encoding and the value of PYTHONIOENCODING in Windows 8.1? (In fact, I have another computer, also with Windows 8.1, that shows the correct value, utf-8, but I don't know why.)
I had the same problem last week... I ended up just changing it in the IDE. I don't know which IDE you use, but in PyCharm, starting from the menu bar: File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8, and it now works like a charm.
Maybe yours has a similar option.
Python 3.4 and Python 3.8/3.9 behave differently when I try to execute the statement below:
print('\u212B')
Python 3.8/3.9 can print it correctly.
Å
Python 3.4 raises an exception:
Traceback (most recent call last):
File "test.py", line 9, in <module>
print('\u212B')
UnicodeEncodeError: 'gbk' codec can't encode character '\u212b' in position 0: illegal multibyte sequence
According to this page, I can avoid the exception by replacing sys.stdout with the statement:
import io, sys
sys.stdout = io.TextIOWrapper(buffer=sys.stdout.buffer, encoding='utf-8')
But Python 3.4 still prints a different character, as shown below:
鈩?
So my questions are:
Why do different Python versions behave differently when printing to standard output?
How can I print the correct value Å in Python 3.4?
Edit 1:
I guess the difference is caused by PEP 528 -- Change Windows console encoding to UTF-8. But I still don't understand the mechanism of console encoding or how I can print the correct character in Python 3.4.
Edit 2:
One more difference: sys.getfilesystemencoding() returns utf-8 in Python 3.8/3.9 and mbcs in Python 3.4.
Why?
For the rationale behind the stdout encoding, you can read more in the answers here: Changing default encoding of Python?
In short, Python 3.4 uses your OS's encoding (the console code page, gbk in your case) for stdout by default, whereas Python 3.8 writes to the Windows console as UTF-8.
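The garbled 鈩? in the question is what you would expect if the UTF-8 bytes produced by the rewrapped sys.stdout are still interpreted by the console as gbk. Here is a minimal sketch of that effect. It assumes the console decodes with gbk and substitutes a placeholder for bytes it cannot decode; the variable names are only illustrative.

```python
# U+212B ANGSTROM SIGN, the character from the question.
text = "\u212b"

# What a UTF-8 TextIOWrapper sends to the underlying byte stream.
utf8_bytes = text.encode("utf-8")

# Assumption: the console still interprets incoming bytes as gbk.
# The first two bytes form a valid gbk pair; the third is left dangling
# and is replaced by a placeholder character.
seen_by_console = utf8_bytes.decode("gbk", errors="replace")

# ascii() keeps the demo printable on any console.
print(utf8_bytes)
print(ascii(seen_by_console))
```

So wrapping sys.stdout changes what Python emits, but the console must also be told to read UTF-8 before the character shows up correctly.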
How to fix this?
On Python 3.7 or later you can use the reconfigure() method of the stream:
sys.stdout.reconfigure(encoding='utf-8')
You can also try setting the environment variable PYTHONIOENCODING to utf-8:
set PYTHONIOENCODING=utf8
On Windows with Python 3.6 or later, this variable is ignored for the interactive console unless another environment variable is also set:
set PYTHONLEGACYWINDOWSSTDIO=1
On Python versions preceding 3.7 you can fix it by installing the win-unicode-console package, which handles Unicode console output transparently on Windows:
pip install win-unicode-console
If you are not running the code directly from a console there is a possibility that your IDE configuration is interfering.
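To combine the two approaches above (reconfigure on 3.7+, rewrapping the buffer on older versions), a helper along these lines might do. This is only a sketch: utf8_stdout is a hypothetical name, and the in-memory stream merely stands in for a legacy console stream.

```python
import io

def utf8_stdout(stream):
    """Return a text stream that writes UTF-8 to the same destination."""
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: switch the encoding in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older versions: wrap the underlying binary buffer instead.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8",
                            line_buffering=True)

# Stand-in for a legacy console stream (assumption for this demo).
raw = io.BytesIO()
legacy = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")

out = utf8_stdout(legacy)
print("\u212b", file=out)
out.flush()
print(raw.getvalue())
```

In a real script the call would be sys.stdout = utf8_stdout(sys.stdout). This only changes the bytes Python emits; the terminal still has to display them as UTF-8.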
I'm trying to output Unicode symbols from Python 3. The program is a simple one-liner where I'm printing non-ASCII literals:
print("íó")
The program is encoded in utf-8.
I'm running this program on two different Windows machines (Windows 7, Windows 10); on each machine I'm running this from both cmd and MinGW.
This works on Windows 10 (both cmd and MinGW).
On Windows 7 the output is degraded to ASCII when run from cmd, and Python throws an exception when run from MinGW:
Traceback (most recent call last):
File "test.py", line 4, in <module>
print("\xed\xf3")
File "C:\Program Files (x86)\Python36-32\lib\encodings\cp1251.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>
What could be the problem? What difference between the machines could explain this (the Python and MinGW versions are very close)? What is different about running Python from cmd versus MinGW that causes the exception?
Machines configurations:
Windows 7; Python 3.6.0; MinGW 2.8.3
Windows 10; Python 3.6.5; MinGW 2.8.5
Steps to reproduce:
Create a file test.txt with content This is 中文 (i.e., UTF-8 encoded non-ASCII text).
Custom-compile python 3.5.2 on the Intel Edison.
Launch the custom-compiled python3 interpreter and issue the following piece of code:
with open('test.txt', 'r') as fh:
    fh.readlines()
Actual behavior:
A UnicodeDecodeError exception is thrown. The file is opened as 'ASCII' instead of 'UTF-8' by default:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
On a "regular" Linux system this problem is easily solved by setting a proper locale, see e.g. this post or that post. On the Intel Edison, however, I cannot set the LC_CTYPE since the default Yocto Linux distribution is missing locales (see e.g. this page).
I also tried to use a couple of other hacks like
import sys; sys.getfilesystemencoding = lambda: 'UTF-8'
import locale; locale.getpreferredencoding = lambda: 'utf-8'
And I tried setting the PYTHONIOENCODING=utf8 environment variable before starting the python interpreter.
However, none of this works. The only workaround is to specify the encoding explicitly as a parameter to the open() call. This works for the above snippet, but it won't set the system-wide default for all packages I am using (they implicitly open files as ASCII and may or may not offer me a way to override that default behavior).
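For reference, the explicit-encoding workaround looks roughly like this. It is a self-contained sketch that writes its own copy of test.txt into a temporary directory, which is an assumption made so the snippet runs anywhere.

```python
import os
import tempfile

# Create a UTF-8 file like the test.txt from the reproduction steps.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as fh:
    fh.write("This is 中文\n")

# Passing encoding= makes the read independent of the locale default.
with open(path, "r", encoding="utf-8") as fh:
    lines = fh.readlines()

# ascii() keeps the output printable even on an ASCII-only terminal.
print([ascii(line) for line in lines])
```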
What is the proper way to set the python interpreter default filesystem encoding? (Of course without installing unneeded system-wide locales.)
You can set the LC_ALL environment variable to alter the default:
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ LC_ALL='.ASCII' python3 -c 'import locale; print(locale.getpreferredencoding())'
US-ASCII
I tested this both on OS X and CentOS 7.
As for your other attempts, here is why they don't work:
sys.getfilesystemencoding() applies to filenames only (e.g. os.listdir() and friends).
The io module doesn't actually use the locale.getpreferredencoding() function, so replacing the function on the module won't have an effect. A lightweight _bootlocale.py bootstrap module is used instead. More on that below.
PYTHONIOENCODING only applies to sys.stdin, sys.stdout and sys.stderr.
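The last point can be checked with a small experiment. The sketch below starts a child interpreter with PYTHONIOENCODING set to latin-1, a value chosen only because it is easy to tell apart from the usual defaults, and compares the encoding of sys.stdout with the default that open() picks.

```python
import os
import subprocess
import sys

# Child program: report the stdout encoding and the default encoding
# that open() chooses when none is given.
child = (
    "import os, sys\n"
    "print(sys.stdout.encoding)\n"
    "with open(os.devnull) as fh:\n"
    "    print(fh.encoding)\n"
)

env = dict(os.environ, PYTHONIOENCODING="latin-1")
result = subprocess.run(
    [sys.executable, "-c", child],
    env=env,
    stdout=subprocess.PIPE,
    check=True,
)
stdout_encoding, file_encoding = result.stdout.decode("ascii").split()
print("sys.stdout:", stdout_encoding)
print("open():    ", file_encoding)
```

The first value follows the environment variable, while the second still comes from the locale.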
If setting environment variables ultimately fails, you can still patch the _bootlocale module:
import _bootlocale
old = _bootlocale.getpreferredencoding # so you can restore
_bootlocale.getpreferredencoding = lambda *args: 'UTF-8'
This works for me (again on OS X and CentOS 7, tested with 3.6):
>>> import _bootlocale
>>> open('/tmp/test.txt', 'w').encoding # showing pre-patch setting
'UTF-8'
>>> old = _bootlocale.getpreferredencoding
>>> _bootlocale.getpreferredencoding = lambda *a: 'ASCII'
>>> open('/tmp/test.txt', 'w').encoding # gimped hook
'ASCII'
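Since _bootlocale is a private module, this patch is tied to the interpreter version. On Python 3.7 or later, which excludes the 3.5.2 build in the question, UTF-8 mode (the -X utf8 option or PYTHONUTF8=1) is a supported way to get the same effect. A sketch, assuming such an interpreter is available as sys.executable:

```python
import subprocess
import sys

# Child program: report whether UTF-8 mode is active and which default
# encoding open() picks.
child = (
    "import os, sys\n"
    "print(sys.flags.utf8_mode)\n"
    "with open(os.devnull) as fh:\n"
    "    print(fh.encoding)\n"
)

result = subprocess.run(
    [sys.executable, "-X", "utf8", "-c", child],
    stdout=subprocess.PIPE,
    check=True,
)
mode_flag, file_encoding = result.stdout.decode("ascii").split()
print("utf8_mode:", mode_flag)
print("open():   ", file_encoding)
```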
I keep getting a UnicodeEncodeError when trying to print an 'Á' that I get from a website requested using Selenium in Python 3.4.
I already declared the encoding at the top of my .py file:
# -*- coding: utf-8 -*-
The function is something like this:
from selenium import webdriver
b = webdriver.Firefox()
b.get('http://fisica.uniandes.edu.co/personal/profesores-de-planta')
dataProf = b.find_elements_by_css_selector('td[width="508"]')
for datos in dataProf:
    print(datos.text)
and the exception:
Traceback (most recent call last):
File "C:/Users/Andres/Desktop/scrap/scrap.py", line 444, in <module>
dar_p_fisica()
File "C:/Users/Andres/Desktop/scrap/scrap.py", line 390, in dar_p_fisica
print(datos.text) #.encode().decode('ascii', 'ignore')
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 173: character maps to <undefined>
thanks in advance
Already figured it out. As noted in this answer, the encoding error doesn't come from Python itself but from the encoding that the console is using. So the way to fix it is to run this command (on Windows):
chcp 65001
which sets the console encoding to UTF-8, and then run the program again. Or, if you are working in PyCharm as I was, go to Settings > Editor > File Encodings and set the IDE and Project encodings accordingly.
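If changing the console is not an option, another way to keep the script from crashing is to escape whatever the console encoding cannot represent. This is a sketch under stated assumptions: safe_text is a hypothetical helper, and the in-memory stream only simulates a cp1252 console.

```python
import io

def safe_text(text, encoding):
    """Replace characters the target encoding lacks with backslash escapes."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# Stand-in for a cp1252 console (assumption made for this demo only).
raw = io.BytesIO()
console = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")

sample = "\xc1\u2010"  # 'Á' followed by U+2010 HYPHEN from the traceback

try:
    print(sample, file=console)
    failed = False
except UnicodeEncodeError:
    failed = True  # same error as in the question

# The escaped version goes through without raising.
print(safe_text(sample, console.encoding), file=console)
console.flush()
print(raw.getvalue())
```

With a real console you would pass sys.stdout.encoding to the helper. The 'Á' survives, and the hyphen is shown as an escape instead of raising.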
When trying to print a test string:
print("áéíóú")
it works fine on my x64 computer, but on my ARM7 server, which also has Python 3, I get:
Traceback (most recent call last):
File "test.py", line 11, in <module>
print("\xe1\xe9\xedo\xfa")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
I'm puzzled. I would expect this error on Python 2.x, where strings are encoded as ASCII by default, but in Python 3 they are Unicode by default. Any tips?
I suspect that your server has an ASCII-only terminal. Check by putting the assignment and the output on different lines and see which line raises.
s = "\xe1\xe9\xedo\xfa"
print(s)
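A small diagnostic along these lines can confirm the suspicion without triggering the error. It uses only the standard library, and ascii() is there so the string can be shown even on an ASCII-only stream.

```python
import locale
import sys

s = "\xe1\xe9\xedo\xfa"

# ascii() escapes every non-ASCII character, so this line prints
# even when the output stream only accepts ASCII.
print(ascii(s))

# The encodings Python picked up from the environment.
print("stdout encoding:", sys.stdout.encoding)
print("preferred encoding:", locale.getpreferredencoding())
```

If the reported stdout encoding is an ASCII variant, the fix belongs in the server's locale or terminal settings, not in the string itself.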