Switch from Windows-1252 to utf8 default encoding in Python - python-3.x

I'm writing a module in Python using a Windows os. Unfortunately, the default enconding is cp1252 and it's giving me problems with special unicode characters (like the Hawaiian okina, ʻ, aka as '\u02bb').
I managed to solve the problem, but I would like to change the default setting.
I'm using Python 3.4.1. I read on other posts about the import sys and reload(sys) trick but it's not currently working for me.
Any help will be much appreciated!

Related

When do you encounter issues with encodings in Python3?

I have recently learned more in depth about ASCII, Unicode, UTF-8, UTF-16, etc. in Python3, but I am struggling to understand when would one run into issues while reading/writing to files.
So if I open a file:
with open(myfile, 'a') as f:
f.write(stuff)
where stuff = 'Hello World!'
I have no issues writing to a file.
If I have something like:
non_latin = '娜', I can still write to the file with no problems.
So when does one run into issues regarding encodings? When does one use encode() and decode()?
You run into issues if the default encoding for your OS doesn't support the characters written. In your case the default (obtained from locale.getpreferredencoding(False)) is probably UTF-8. On Windows, the default is an ANSI encoding like cp1252 and wouldn't support Chinese. Best to be explicit and use open(myfile,'w',encoding='utf8') for example.

Why is my Jupyter Notebook using ascii codec even on Python3?

I'm analyzing a dataframe that contains French characters.
I set up an IPython kernel based on Python3 within my jupyter notebook, so the default encoding should be utf-8.
However, I can no longer save my notebook as soon as an accented character appears in my .ipynb (like é, è...), even though those are handled in utf-8.
The error message that I'm getting when trying to save is this :
Unexpected error while saving file: Videos.ipynb 'ascii' codec can't encode characters in position 347-348: ordinal not in range(128)
Here is some minimal code that gives me the same error message in a basic Python3 kernel
import pandas as pd
d = {'EN': ['Hey', 'Wassup'], 'FR': ['Hé', 'ça va']}
df = pd.DataFrame(data=d)
(actually a simple cell with "é" as text does prevent me from saving already)
I've seen similar questions, but all of them were based on Python 2.7 so nothing relevant. I also tried several things in order to fix this :
Including # coding: utf-8 at the top of my notebook
Specifying the utf-8 encoding when reading the csv file
Trying to read the file with latin-1 encoding then saving (still not
supported by ascii codec)
Also checked my default encoding in python3, just to make sure
sys.getdefaultencoding()
'utf-8'
Opened the .ipynb in Notepad++ : the encoding is set to utf-8 in there. I can add accented characters and save there, but then can no longer open the notebook in jupyter (I get an "unknown error" message").
The problem comes from saving the notebook and not reading the file, so basically I want to switch to utf-8 encoding for saving my .ipynb files but don't know where.
I suspect the issue might come from the fact that I'm using WSL on Windows10 - just a gut feeling though.
Thanks a lot for your help!
Well, turns out uninstalling then reinstalling jupyter notebook did the trick. Not sure what happened but it's now solved.

Javascript export CSV encoding utf-8 and using excel to open issue

I have been reading quite some posts including this one
Javascript export CSV encoding utf-8 issue
I know lots mentioned it's because of microsoft excel that using something like this should work
https://superuser.com/questions/280603/how-to-set-character-encoding-when-opening-excel
I have tried on ubuntu (which didn't even have any issue), on windows10, which I have to use the second posts to import, on mac which has the biggest problem because mac does not import, does not read the unicode at all.
Is there anyway I can do it in coding while exporting to enforce excel to open with utf-8? or some other workaround I might be able to try?
Thanks in advance for any help and suggestions.
Many Windows applications, including Excel, assume the localized ANSI encoding (Windows-1252 on US Windows) when opening a file, unless the file starts with byte-order-mark (BOM) code point. While UTF-8 doesn't need a BOM, a UTF-8-encoded BOM at the start of a file clues Excel that the file is UTF-8. The byte sequence is EF BB BF and the equivalent Unicode code point is U+FEFF.

python console does not display ü

dear fellow developers,
I have tried to teach python console to display ü, but it insists with displaying ü instead. I have tried it with Python 3.5 and Python 3.6. The result is the same. If I run a .py file containing line print("ü") with F5 command, it displays
ü
instead of ü. If I type in the console
print ("ü")
it displays
ü
I know it has been discussed many times, but most of the methods I have come across during the last 5 hours have not helped me or I have not applied them properly. The problem exists also with other non ascii characters. I appreciate your help!
Try adding the following line on the top on your source file : # coding: utf-8
Also, check if your file encoding is correct (Always choose UTF-8).

UnicodeDecodeError: 'utf-8' Visual Studio Code using Python 3.5.2

I'm using Visual Studio Code, Python 3.5.2, Windows 10
print("£")
produces 2 symbols that I'm not familiar with.
input("Enter pound sign: ") -> £
produces the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 0: invalid start byte
The above examples work perfectly using Python IDLE.
I've tried changing the Encoding within Visual Studio Code with no success.
I've used Python 3.5.2 for some time now I never have this problem using Sublime Text 3.
Advice on solving this issue would be much appreciated.
This seems to be an issue with the Code Runner plugin of VS Code. A workaround is to run the code in the terminal. Add the following lines to the User or Workspace Settings file:
"code-runner.runInTerminal": false
This works on a Mac, I'm not sure about Windows.
Generally speaking the problem is that the default encoding used to print on the console doesn't support UTF-8. You can check the default encoding used by executing the following:
import sys
print(sys.stdout.encoding)
When I use the Code Runner plugin with the default configuration settings this value is US-ASCII, but when run it using terminal it is UTF-8.
Unfortunately I don't know how to change the default encoding for the Code Runner plugin.

Resources