Reading data in Spyder with a different encoding - python-3.x

I'm trying to read a .spydata file into Spyder that was written on a different platform (and probably with a different encoding), but Spyder gives an error:
'ascii' codec can't decode byte 0xb5 in position 2: ordinal not in range(128)
I tried changing my encoding settings before launching Spyder, without success. Any ideas?

I fixed the encoding problem with this line at the top of the file:
# -*- coding: utf-8 -*-
https://python.readthedocs.io/en/stable/howto/unicode.html
You can change the encoding if you need to. But I don't know whether you can include the encoding line in the file you mentioned. Hope it helps.
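If Spyder itself keeps refusing the file, a .spydata archive is (as far as I know) just a tar file of pickled variables, so you can try unpacking it by hand with an explicit pickle encoding. 'latin-1' can decode any byte value, including the 0xb5 from the error, so it is a safe fallback for pickles written under a different default. This is a sketch, not Spyder's API; the function name and scratch directory are mine:

```python
import os
import pickle
import tarfile

def load_spydata(path, encoding='latin-1'):
    """Unpack a .spydata archive by hand and unpickle its contents.

    A sketch, assuming the archive holds one or more .pickle files
    that each contain a dict mapping variable names to values.
    """
    data = {}
    with tarfile.open(path) as tar:
        tar.extractall('spydata_tmp')  # hypothetical scratch directory
        for member in tar.getnames():
            if member.endswith('.pickle'):
                with open(os.path.join('spydata_tmp', member), 'rb') as fh:
                    # encoding= only matters for pickles written by
                    # Python 2; 'latin-1' maps every byte to a character
                    data.update(pickle.load(fh, encoding=encoding))
    return data
```

If that loads, you can inspect the resulting dict directly instead of going through Spyder's importer.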

Related

Pygrib UnicodeEncodeError

I'm using Python 3.9.1 on macOS Big Sur with an M1 chip.
I would like to open a GRIB-format file provided by the Japan Meteorological Agency, so I tried the pygrib library as below:
import pygrib
gpv_file = pygrib.open("Z__C_RJTD_20171205000000_GSM_GPV_Rjp_Lsurf_FD0000-0312_grib2.bin")
But I got an error like this:
----> 1 gpv_file = pygrib.open("Z__C_RJTD_20171205000000_GSM_GPV_Rjp_Lsurf_FD0000-0312_grib2.bin")
pygrib/_pygrib.pyx in pygrib._pygrib.open.__cinit__()
pygrib/_pygrib.pyx in pygrib._pygrib._strencode()
UnicodeEncodeError: 'ascii' codec can't encode characters in position 50-51: ordinal not in range(128)
I asked other people to run the same code, and it worked for them.
I'm not sure what the problem is or how to fix it.
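The traceback ends in pygrib's internal _strencode, which is trying to ASCII-encode the file path; positions 50-51 suggest the offending characters sit somewhere in the absolute path (a Japanese directory name, say), since the filename itself is all ASCII. A hedged diagnostic to locate them (`non_ascii_positions` is my helper, not part of pygrib):

```python
import os

def non_ascii_positions(path):
    """Return (index, character) pairs for every non-ASCII char in path."""
    return [(i, ch) for i, ch in enumerate(path) if ord(ch) > 127]

# The error's positions are offsets into the string pygrib received,
# so check the absolute path, not just the filename:
filename = "Z__C_RJTD_20171205000000_GSM_GPV_Rjp_Lsurf_FD0000-0312_grib2.bin"
print(non_ascii_positions(os.path.abspath(filename)))
```

If that prints anything, moving the file to an all-ASCII path (e.g. somewhere under /tmp) is the quickest workaround, and it would also explain why the same code worked on other people's machines.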

Python 3 and Turkish Character In Debian

I have to do a project with Turkish content. On my Ubuntu machine with Python 3.6.5 I have no problems, but on the production server, a Debian machine running Python 3.6.2, I get:
SyntaxError: Non-UTF-8 code starting with ....
When I add # -*- coding: utf-8 -*- as the docs suggest, I get:
utf-8 can't decode byte
I have searched Google and Stack Overflow all day and tried every suggestion.
Any advice about this issue would be appreciated.
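That SyntaxError means CPython read the script's raw bytes and found a sequence that is not valid UTF-8, which usually means the file on the Debian box was saved in a legacy Turkish encoding; the coding cookie only declares an encoding, it can't repair one. A hedged way to check which encoding the file actually uses, trying the usual suspects (`guess_encoding` is my helper):

```python
def guess_encoding(raw_bytes, candidates=('utf-8', 'iso-8859-9', 'cp1254')):
    """Return the first candidate encoding that decodes raw_bytes cleanly."""
    for enc in candidates:
        try:
            raw_bytes.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

# Read the script itself as raw bytes, e.g.:
#   with open('myscript.py', 'rb') as fh:   # placeholder filename
#       print(guess_encoding(fh.read()))
sample = 'şğü'.encode('iso-8859-9')  # Turkish text saved as Latin-5
print(guess_encoding(sample))        # 0xfe is never valid UTF-8
```

Note that single-byte encodings like iso-8859-9 will "succeed" on almost any byte stream, so they only make sense as fallbacks after UTF-8 fails. Once you know the real encoding, re-save the file as UTF-8 (e.g. `iconv -f iso-8859-9 -t utf-8 old.py > new.py`).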

UnicodeDecodeError When Opening a tar File in Python 3

I'm using Linux Mint 18.1 and Python 3.5.2.
I have a library that currently works using Python 2.7. I need to use the library for a Python 3 project. I'm updating it and have run into a unicode problem that I can't seem to fix.
First, a file is created via tar cvjf tarfile.tbz2 (on a Linux system) and is later opened in the Python library as open(tarfile).
If I run the code as is, using Python 3, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc1 in position 11: invalid start byte
My first attempt at a fix was to open it as open(tarfile, encoding='utf-8'), since I was under the impression that tar would just use whatever the filesystem gave it. When I do this, I get the same error (only the byte value changes).
If I try with another encoding, say latin-1, I get the following error:
TypeError: Unicode-objects must be encoded before hashing
Which leads me to believe that utf-8 is correct, but I might be misunderstanding.
Can anyone provide suggestions?
I was going down the wrong path thinking this was some strange encoding problem, when it was just the fact that open() defaults to reading as text ('r'). In Python 2 the text/binary distinction is effectively a no-op, which is why the old code worked.
The fix is open(tarfile, 'rb').
The head fake with Unicode... should have seen this one coming. :facepalm:
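The fix above can be sketched concretely. The TypeError about "Unicode-objects must be encoded before hashing" was the giveaway that the file's contents were being read as str and fed to a hash function; binary mode fixes both errors at once. A sketch, assuming the library hashes the tarball (the function name is mine, not the library's):

```python
import hashlib

def hash_tarball(path):
    """Hash a tarball's raw bytes; 'rb' is the crucial part."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:  # binary mode: bytes in, no decoding
        for chunk in iter(lambda: fh.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

With 'rb' the reads yield bytes, so there is nothing for the 'utf-8' codec to choke on and nothing that needs encoding before hashing.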

Set filesystem encoding for python 3 on the Intel Edison

Steps to reproduce:
Create a file test.txt with content This is 中文 (i.e., UTF-8 encoded non-ASCII text).
Custom-compile python 3.5.2 on the Intel Edison.
Launch the custom-compiled python3 interpreter and issue the following piece of code:
with open('test.txt', 'r') as fh:
    fh.readlines()
Actual behavior:
A UnicodeDecodeError exception is thrown. The file is opened as 'ASCII' instead of 'UTF-8' by default:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
On a "regular" Linux system this problem is easily solved by setting a proper locale, see e.g. this post or that post. On the Intel Edison, however, I cannot set the LC_CTYPE since the default Yocto Linux distribution is missing locales (see e.g. this page).
I also tried to use a couple of other hacks like
import sys; sys.getfilesystemencoding = lambda: 'UTF-8'
import locale; locale.getpreferredencoding = lambda: 'utf-8'
And I tried setting the PYTHONIOENCODING=utf8 environment variable before starting the python interpreter.
However, none of this works. The only workaround is to pass the encoding explicitly to the open call. That works for the snippet above, but it won't set a system-wide default for all the packages I'm using (which implicitly open files as ASCII and may or may not offer a way to override that behavior).
What is the proper way to set the python interpreter default filesystem encoding? (Of course without installing unneeded system-wide locales.)
You can set the LC_ALL environment variable to alter the default:
$ python3 -c 'import locale; print(locale.getpreferredencoding())'
UTF-8
$ LC_ALL='.ASCII' python3 -c 'import locale; print(locale.getpreferredencoding())'
US-ASCII
I tested this both on OS X and CentOS 7.
As for your other attempts, here is why they don't work:
sys.getfilesystemencoding() applies to filenames only (e.g. os.listdir() and friends).
The io module doesn't actually use the locale.getpreferredencoding() function, so altering the function on the module won't have an effect. A lightweight _bootlocale.py bootstrap module is used instead. More on that below.
PYTHONIOENCODING only applies to sys.stdin, sys.stdout and sys.stderr.
If setting environment variables ultimately fails, you can still patch the _bootlocale module:
import _bootlocale
old = _bootlocale.getpreferredencoding # so you can restore
_bootlocale.getpreferredencoding = lambda *args: 'UTF-8'
This works for me (again on OS X and CentOS 7, tested with 3.6):
>>> import _bootlocale
>>> open('/tmp/test.txt', 'w').encoding # showing pre-patch setting
'UTF-8'
>>> old = _bootlocale.getpreferredencoding
>>> _bootlocale.getpreferredencoding = lambda *a: 'ASCII'
>>> open('/tmp/test.txt', 'w').encoding # gimped hook
'ASCII'

UnicodeEncodeError: 'charmap' codec can't encode character '\u2010': character maps to <undefined> [duplicate]

This question already has an answer here:
Why doesn't Python recognize my utf-8 encoded source file?
(1 answer)
Closed 6 years ago.
I keep getting a UnicodeEncodeError when trying to print an 'Á' that I scraped from a website with Selenium in Python 3.4.
I already defined at the top of my .py file
# -*- coding: utf-8 -*-
the def is something like this:
from selenium import webdriver
b = webdriver.Firefox()
b.get('http://fisica.uniandes.edu.co/personal/profesores-de-planta')
dataProf = b.find_elements_by_css_selector('td[width="508"]')
for datos in dataProf:
    print(datos.text)
and the exception:
Traceback (most recent call last):
File "C:/Users/Andres/Desktop/scrap/scrap.py", line 444, in <module>
dar_p_fisica()
File "C:/Users/Andres/Desktop/scrap/scrap.py", line 390, in dar_p_fisica
print(datos.text) #.encode().decode('ascii', 'ignore')
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010' in position 173: character maps to <undefined>
thanks in advance
Already figured it out. As noted in this answer, the encoding error doesn't come from Python but from the encoding the console is using. The way to fix it is to run this command (on Windows):
chcp 65001
which sets the console encoding to UTF-8, then run the program again. Or, if you're working in PyCharm as I was, go to Settings > Editor > File Encodings and set the IDE and project encodings accordingly.
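If neither chcp nor the IDE setting is an option (say the output is redirected), a hedged fallback is to substitute the characters the console can't represent instead of crashing. `safe_text` is my helper, not part of Selenium or the standard library:

```python
import sys

def safe_text(text, encoding=None):
    """Round-trip text through the console encoding, replacing
    characters it cannot represent (e.g. '\u2010' on cp1252) with '?'."""
    enc = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(enc, errors='replace').decode(enc)

# e.g. print(safe_text(datos.text)) in the loop from the question
```

You lose the unmappable characters ('\u2010' becomes '?'), but characters that cp1252 does cover, like 'Á', pass through unchanged.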
