UnicodeDecodeError: 'utf-8' Visual Studio Code using Python 3.5.2 - python-3.5

I'm using Visual Studio Code, Python 3.5.2, Windows 10
print("£")
produces two symbols that I'm not familiar with, and
input("Enter pound sign: ")
with £ entered at the prompt produces the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 0: invalid start byte
The above examples work perfectly using Python IDLE.
I've tried changing the Encoding within Visual Studio Code with no success.
I've used Python 3.5.2 for some time now and have never had this problem with Sublime Text 3.
Advice on solving this issue would be much appreciated.

This seems to be an issue with the Code Runner plugin for VS Code. A workaround is to run the code in the integrated terminal instead: add the following line to the User or Workspace Settings file:
"code-runner.runInTerminal": true
This works on a Mac, I'm not sure about Windows.
Generally speaking, the problem is that the default encoding used to print to the console doesn't support UTF-8. You can check which encoding is in use by executing the following:
import sys
print(sys.stdout.encoding)
When I use the Code Runner plugin with its default configuration this value is US-ASCII, but when I run the script in the terminal it is UTF-8.
Unfortunately I don't know how to change the default encoding for the Code Runner plugin.
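One Python-side workaround, assuming the output pane can actually render UTF-8 and only Python's detected encoding is wrong, is to re-wrap sys.stdout yourself. A minimal sketch for Python 3.5:
import io
import sys

# If the stream advertises something other than UTF-8, re-wrap it so that
# print() encodes to UTF-8; errors="replace" avoids crashes on odd characters.
if (sys.stdout.encoding or "").lower().replace("-", "") != "utf8":
    sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace")

print("£")
Setting the PYTHONIOENCODING=utf-8 environment variable before Python starts has a similar effect and also covers input(), but it only helps if you can control the environment Code Runner launches the interpreter in.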

Related

Why is my Jupyter Notebook using ascii codec even on Python3?

I'm analyzing a dataframe that contains French characters.
I set up an IPython kernel based on Python3 within my jupyter notebook, so the default encoding should be utf-8.
However, I can no longer save my notebook as soon as an accented character appears in my .ipynb (like é, è...), even though those are handled in utf-8.
The error message that I'm getting when trying to save is this:
Unexpected error while saving file: Videos.ipynb 'ascii' codec can't encode characters in position 347-348: ordinal not in range(128)
Here is some minimal code that gives me the same error message in a basic Python3 kernel
import pandas as pd
d = {'EN': ['Hey', 'Wassup'], 'FR': ['Hé', 'ça va']}
df = pd.DataFrame(data=d)
(in fact, a single cell containing just "é" as text is already enough to trigger the error)
I've seen similar questions, but all of them were based on Python 2.7, so nothing relevant. I also tried several things in order to fix this:
- Including # coding: utf-8 at the top of my notebook
- Specifying the utf-8 encoding when reading the csv file
- Trying to read the file with latin-1 encoding and then saving (still not supported by the ascii codec)
- Checking my default encoding in python3, just to make sure:
sys.getdefaultencoding()
'utf-8'
- Opening the .ipynb in Notepad++: the encoding is set to utf-8 there. I can add accented characters and save, but then I can no longer open the notebook in jupyter (I get an "unknown error" message).
The problem comes from saving the notebook, not from reading the file, so basically I want to switch to utf-8 encoding for saving my .ipynb files, but I don't know where to set that.
I suspect the issue might come from the fact that I'm using WSL on Windows10 - just a gut feeling though.
Thanks a lot for your help!
Well, turns out uninstalling then reinstalling jupyter notebook did the trick. Not sure what happened but it's now solved.
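A follow-up note in case the problem comes back: under WSL the notebook server sometimes runs with no locale set, in which case Python falls back to ASCII for anything that relies on locale.getpreferredencoding(). A quick diagnostic, run from the same environment that starts Jupyter:
import locale
import sys

print(sys.getdefaultencoding())       # always 'utf-8' on Python 3
print(locale.getpreferredencoding())  # 'ANSI_X3.4-1968' means the locale fell back to ASCII
If the second value is ASCII, exporting something like LANG=C.UTF-8 before launching jupyter notebook is a common fix.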

How do I configure a terminal to read UTF-8 characters?

I am working on a project which accepts user input via the command line. I am using up-to-date Windows 10 and (after much running around in circles...) I am aware that Windows is notoriously bad when it comes to handling UTF-8 characters. Consequently, I turned to VS Code and its integrated terminal (PowerShell) to supply input to the program. Sadly, the terminal seemed unable to accept accented UTF-8 characters such as "ë". I then did more research and configured settings.json in VS Code for UTF-8 BOM encoding. Still, the terminal failed to read accented characters. I am certain that my program is not the issue, nor is my font. I have reduced my code to a test that simply accepts input using readline-sync (which the developers confirm is compatible with UTF-8: https://github.com/anseki/readline-sync/issues/58) and console.logs it.
The test case I have been using is "Hëllo". When I input "Hëllo" into the VS Code terminal, my program outputs "H�llo". When I tried converting all of my apps to UTF-8 encoding using the administrative language settings for Windows 10 and subsequently input "Hëllo" via the command terminal, it output "Hllo". I also tried forcing CMD to use Code Page 65001 with chcp 65001 for UTF-8 encoding, but it still produced "Hllo".
Here is the code I used to configure the VS Code PowerShell terminal via settings.json:
{
    "[powershell]": {
        "files.encoding": "utf8bom",
        "files.autoGuessEncoding": true
    }
}
And here is the brief code I wrote to test my input/output and whether the "ë" is being read successfully (which it is not):
const rlSync = require('readline-sync');
const name = rlSync.question('Enter Player 1 Username (Case Sensitive): ');
console.log(name);
If y'all see any issues, please let me know!
I am looking for any way to properly configure my CLI to accept accented characters for use in my program. I do not mean to restrict this question to VS Code or Powershell. If there is a way to accomplish this with the basic Windows 10 CMD, I would love that. Thank you for any help y'all can provide! <3
Is there any particular reason you're using VS Code? I think you're looking for the System.Console InputEncoding/OutputEncoding - unfortunately my default encoding already works with "Hëllo", so I couldn't test this accurately, and I don't know whether it works inside VS Code.
Try this (one line at a time)
# store current encoding settings
$i = [System.Console]::InputEncoding
$o = [System.Console]::OutputEncoding
# set encoding to UTF8
[System.Console]::InputEncoding = [System.Text.Encoding]::UTF8
[System.Console]::OutputEncoding = [System.Text.Encoding]::UTF8
# test
"Hëllo"
# revert (if you want. if you don't want, I would at least note the default encoding)
[System.Console]::InputEncoding = $i
[System.Console]::OutputEncoding = $o

python console does not display ü

Dear fellow developers,
I have tried to teach the Python console to display ü, but it insists on displaying Ã¼ instead. I have tried it with Python 3.5 and Python 3.6; the result is the same. If I run a .py file containing the line print("ü") with the F5 command, it displays
Ã¼
instead of ü. If I type in the console
print("ü")
it displays
Ã¼
I know it has been discussed many times, but most of the methods I have come across during the last 5 hours have not helped me, or I have not applied them properly. The problem also exists with other non-ASCII characters. I appreciate your help!
Try adding the following line at the top of your source file: # coding: utf-8
Also, check that your file's encoding is correct (always choose UTF-8).
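As a side note, seeing two characters where one was expected is the classic sign of UTF-8 bytes being decoded with a single-byte code page such as cp1252. A tiny illustrative sketch that reproduces the effect:
# "ü" encoded as UTF-8 is the byte pair 0xC3 0xBC; decoding those bytes
# as cp1252 turns them into the two characters "Ã¼".
print("ü".encode("utf-8").decode("cp1252"))  # Ã¼
So the fix is to make every step (source file, editor, console) agree on UTF-8 rather than mixing encodings.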

Switch from Windows-1252 to utf8 default encoding in Python

I'm writing a module in Python on a Windows OS. Unfortunately, the default encoding is cp1252 and it's giving me problems with special Unicode characters (like the Hawaiian okina, ʻ, a.k.a. '\u02bb').
I managed to solve the problem, but I would like to change the default setting.
I'm using Python 3.4.1. I read in other posts about the import sys and reload(sys) trick, but it's not working for me.
Any help will be much appreciated!
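For context, the import sys / reload(sys) trick only ever applied to Python 2, where sys.setdefaultencoding() existed; it is gone in Python 3. On Python 3.4 the usual approach is not to change the global default but to pass the encoding explicitly wherever text is read or written. A minimal sketch (the file name is made up for illustration):
okina = "\u02bb"  # Hawaiian okina

# Writing and reading with an explicit encoding sidesteps the cp1252 default.
with open("hawaiian.txt", "w", encoding="utf-8") as f:
    f.write("Hawai" + okina + "i\n")

with open("hawaiian.txt", encoding="utf-8") as f:
    print(f.read() == "Hawai" + okina + "i\n")  # True: the character survived the round trip
For console output, setting the PYTHONIOENCODING=utf-8 environment variable changes sys.stdout's encoding without any code changes.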

How to convert Linux Python 3.4 code with national characters into executable code in windows

My native language is Polish.
I've got a program in Python 3.4 which I wrote on Linux. This program mostly works on text, Polish text. The variable names don't have any special characters, of course, but sometimes I put strings with Polish characters into them, the user will input strings with Polish characters from the keyboard, and my program reads from files that contain strings with Polish characters.
Everything works well on Linux. I didn't think about encoding; it just worked. But now I want to make it work on Windows. Can you help me understand what I should actually do to make this transition?
Or maybe some workaround - I just need to have a Windows executable file. The perfect tool for this would be PyInstaller, but it only works for Python 2.7, not 3.4. That's why I want to get it working on Windows and compile it into executable form with py2exe in VirtualBox. But if someone knows a way to do this from Linux, without these encoding problems, that would be great.
If not, back to my question. I tried to convert my Python scripts in gedit into ISO or CP1250 or 1252, and I wrote in the file header what coding I'm using; it actually worked a little, but then the Windows error pointed me to the text files from which I read some data, so I converted them too... But it didn't work.
So I decided that it's no longer time for blind trials, and I need to ask for help. I need to understand which encoding is used on Windows and which on Linux, what the best way is to convert one into the other, and how to make the program read characters the right way.
The best way would be - I guess - not changing anything in the encoding, but just making Windows Python understand which encoding I'm using. Is that possible?
A complete answer to my question would be great, but anything that points me in the right direction will also help a lot.
OK. I'm not sure if I understood your answer in the comments, but I tried sending the text to myself via mail, copying it into Notepad inside VirtualBox and saving it as utf_8. I still get this message:
C:\Users\python\Documents>py pytania.py
Traceback (most recent call last):
  File "pytania.py", line 864, in <module>
    start_probny()
  File "pytania.py", line 850, in start_probny
    utworzenie_danych()
  File "pytania.py", line 740, in utworzenie_danych
    utworzenie_pytania_piwo('a')
  File "pytania.py", line 367, in utworzenie_pytania_piwo
    for line in f: # Czytam po jednej linii
  File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134: character maps to <undefined>
As mentioned by Zero Piraeus in a comment: The default source encoding for Python 3.x is UTF-8, regardless of what platform it's running on...
If you have problems, that's probably because your source code has an incorrect encoding. You should stick to UTF-8 only (even though PEP 0263 -- Defining Python Source Code Encodings allows changing it).
The error message you provided is clear:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134
The traceback shows Python decoding the file with cp1252 (the Windows default), and it hits byte 0x9d, which has no character assigned in cp1252. That usually means the file is really in another encoding, most likely UTF-8. To check, you can use iconv(1) on a Linux machine and detect errors by doing a dummy conversion:
iconv -f utf8 -t iso8859-2 -o /dev/null < test.py
You can try to reproduce the problem by creating a very simple Python file containing, typically: print("test €uro encoding")
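Separately from the source-file encoding, the traceback shows the failure while iterating over a data file that was opened without an explicit encoding, so Python used the Windows default (cp1252). A short sketch of the usual fix; the file name here is only a placeholder:
# Opening the data file with an explicit encoding makes it read the same on
# Linux and Windows instead of depending on the platform default.
with open("pytania.txt", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
print(len(lines), "lines read without a UnicodeDecodeError")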

Resources