ascii codec error shown while running python3 code via Apache WSGI - python-3.x

Objective: Insert Japanese text into a .ini file
Steps:
Python version used: 3.6, with the Flask framework
The library used for writing the config file is configparser
Issue:
When I run the code via the "flask run" command, there are no issues. The Japanese text is inserted into the ini file correctly.
But when I run the same code via Apache (WSGI), I get the following error:
'ascii' codec can't encode characters in position 17-23: ordinal not in range(128)

Never interact with text files without explicitly specifying the encoding.
Sadly, even Python's official documentation neglects to obey this simple rule.
import configparser
config_path = 'your_file.ini'
config = configparser.ConfigParser()
with open(config_path, encoding='utf8') as fp:
    config.read_file(fp)
with open(config_path, 'w', encoding='utf8') as fp:
    config.write(fp)
utf8 is a reasonable choice for storing Unicode characters; pick a different encoding if you have a preference.
Japanese characters typically take three bytes each in UTF-8, while UTF-16 stores them in two bytes each, so picking utf16 can result in slightly smaller ini files, but there is no functional difference.
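A quick way to check the size difference yourself, as a minimal sketch (the sample string below is chosen only for illustration):
sample = '日本語のテキスト'
print(len(sample.encode('utf8')))       # 24 bytes: 3 bytes per character
print(len(sample.encode('utf-16-le')))  # 16 bytes: 2 bytes per character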

Related

When do you encounter issues with encodings in Python3?

I have recently learned more in depth about ASCII, Unicode, UTF-8, UTF-16, etc. in Python3, but I am struggling to understand when one would run into issues while reading from or writing to files.
So if I open a file:
with open(myfile, 'a') as f:
    f.write(stuff)
where stuff = 'Hello World!'
I have no issues writing to a file.
If I have something like:
non_latin = '娜', I can still write to the file with no problems.
So when does one run into issues regarding encodings? When does one use encode() and decode()?
You run into issues if the default encoding for your OS doesn't support the characters written. In your case the default (obtained from locale.getpreferredencoding(False)) is probably UTF-8. On Windows, the default is an ANSI encoding like cp1252 and wouldn't support Chinese. Best to be explicit and use open(myfile,'w',encoding='utf8') for example.
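As a minimal sketch of the idea (the file name demo.txt is just assumed here): passing encoding= explicitly removes the dependency on the OS default, and encode()/decode() are the explicit str/bytes conversions.
import locale
print(locale.getpreferredencoding(False))  # the default open() uses, e.g. 'UTF-8' on Linux, 'cp1252' on many Windows setups
text = 'Hello 娜'
data = text.encode('utf8')    # str -> bytes, done explicitly
print(data.decode('utf8'))    # bytes -> str
with open('demo.txt', 'w', encoding='utf8') as f:  # explicit encoding, independent of the OS default
    f.write(text)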

Why is my Jupyter Notebook using ascii codec even on Python3?

I'm analyzing a dataframe that contains French characters.
I set up an IPython kernel based on Python3 within my jupyter notebook, so the default encoding should be utf-8.
However, I can no longer save my notebook as soon as an accented character appears in my .ipynb (like é, è...), even though those are handled in utf-8.
The error message that I'm getting when trying to save is this:
Unexpected error while saving file: Videos.ipynb 'ascii' codec can't encode characters in position 347-348: ordinal not in range(128)
Here is some minimal code that gives me the same error message in a basic Python3 kernel
import pandas as pd
d = {'EN': ['Hey', 'Wassup'], 'FR': ['Hé', 'ça va']}
df = pd.DataFrame(data=d)
(actually a simple cell with "é" as text does prevent me from saving already)
I've seen similar questions, but all of them were based on Python 2.7, so nothing relevant. I also tried several things in order to fix this:
Including # coding: utf-8 at the top of my notebook
Specifying the utf-8 encoding when reading the csv file
Trying to read the file with latin-1 encoding then saving (still not supported by the ascii codec)
Also checked my default encoding in python3, just to make sure
sys.getdefaultencoding()
'utf-8'
Opened the .ipynb in Notepad++: the encoding is set to utf-8 in there. I can add accented characters and save there, but then I can no longer open the notebook in jupyter (I get an "unknown error" message).
The problem comes from saving the notebook and not reading the file, so basically I want to switch to utf-8 encoding for saving my .ipynb files but don't know where.
I suspect the issue might come from the fact that I'm using WSL on Windows10 - just a gut feeling though.
Thanks a lot for your help!
Well, turns out uninstalling then reinstalling jupyter notebook did the trick. Not sure what happened but it's now solved.

√ not recognized in Terminal

For a class of mine I have to make a very basic calculator. I want to write the code in such a way that the user can just enter what they want to do (e.g. √64), press return, and get the answer. I wrote this:
if '√' in operation:
    squareRoot = operation.replace('√', '')
    squareRootFinal = math.sqrt(int(squareRoot))
When I run this in IDLE, it works like a charm. However when I run this in Terminal I get the following error:
SyntaxError: Non-ASCII character '\xe2' in file x.py on line 50, but no encoding declared;
any suggestions?
Just declare the encoding. Python is being a bit cautious here and not guessing the encoding of your text file. IDLE is a text editor and so has already guessed the encoding and stored things internally as Unicode, which it can pass directly to the Python interpreter.
Put
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
at the top of the file. (It's pretty unlikely nowadays that your encoding is not UTF-8.)
For reference: in the past, files could use a variety of encodings, meaning the same text could be stored in different binary forms on disk. Almost all encodings have the same interpretation of bytes 0 to 127 (the ASCII subset), but if any other bytes occur in the file, their meaning is potentially ambiguous.
However, in recent years, UTF-8 has become by far the most common encoding, so it's almost always a safe guess.
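Putting it together, a minimal sketch of the asker's snippet with the declaration at the top (the input prompt is just assumed for illustration, not taken from the original program):
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
import math
operation = input('Enter an operation (e.g. √64): ')  # hypothetical prompt
if '√' in operation:
    squareRoot = operation.replace('√', '')
    squareRootFinal = math.sqrt(int(squareRoot))
    print(squareRootFinal)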

BeautifulSoup UnicodeEncodeError

I'm trying to parse an HTML page that I saved to my computer (Windows 10).
from bs4 import BeautifulSoup
with open("res/JLPT N5 vocab list.html", "r", encoding="utf8") as f:
soup = BeautifulSoup(f, "html.parser")
tables = soup.find_all("table")
sectable= tables[1]
for tr in sectable.contents[1:]:
if tr.name == "tr":
try:
print(tr.td.a.get_text())
except(AttributeError):
continue
It should print all of the Japanese words in the first column, but an error was raised at print(tr.td.a.get_text()): UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-1: character maps to <undefined>. So, how can I solve this error?
Finally, I solved it, according to the Miscellaneous section of the Beautiful Soup documentation.
UnicodeEncodeError: 'charmap' codec can't encode character u'\xfoo' in position bar (or just about any other UnicodeEncodeError) - This is not a problem with Beautiful Soup. This problem shows up in two main situations. First, when you try to print a Unicode character that your console doesn’t know how to display. (See this page on the Python wiki for help.) Second, when you’re writing to a file and you pass in a Unicode character that’s not supported by your default encoding. In this case, the simplest solution is to explicitly encode the Unicode string into UTF-8 with u.encode("utf8").
In my case, it was because I tried to print a Unicode character that my console didn't know how to display.
So I enabled a TrueType font for the console, changed the system locale to Japanese (so that the console encoding changed and I could choose a font that supports Japanese for the console), and then changed the console font to MS ゴシック (this font appeared after I changed the system locale).
If I want to write it to a file, I just open the file and specify the encoding as UTF-8.
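For the file case, a minimal sketch reusing sectable from the code above (the output file name vocab.txt is just a placeholder):
with open("vocab.txt", "w", encoding="utf8") as out:  # explicit UTF-8, independent of the console code page
    for tr in sectable.contents[1:]:
        if tr.name == "tr":
            try:
                out.write(tr.td.a.get_text() + "\n")
            except AttributeError:
                continue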

How to convert Linux Python 3.4 code with national characters into executable code on Windows

My national language is Polish.
I've got a program in Python 3.4 which I wrote on Linux. This program mostly works on text, Polish text. So of course variable names don't have any special characters, but sometimes I put strings with Polish characters into them, the user will input strings with Polish characters from the keyboard, and my program reads from files containing strings with Polish characters.
Everything works well on Linux. I didn't think about encoding, it just worked. But now I want to make it work on Windows. Can you help me understand what I should actually do to make this transition?
Or maybe some workaround - I just need a Windows executable file. The perfect way for this would be "PyInstaller", but it works only for Python 2.7, not 3.4. That's why I want to get it working on Windows and compile it into executable form with py2exe in VirtualBox. But if someone knows a way to do this from Linux, without these encoding problems, that would be great.
If not, I go back to my question. I tried to convert my Python scripts in gedit into ISO or CP1250 or CP1252, and I wrote in the file header what coding I'm using. It actually worked a little: now the Windows error points me at my text files from which I read some data, so I converted them too... But it didn't work.
So I decided that there is no more time for blind trials and I need to ask for help. I need to understand what encoding is used on Windows and which on Linux, what the best way is to convert one into the other, and how to make the program read characters the right way.
The best way would be - I guess - not changing any encodings, but just making Python on Windows understand what encoding I'm using. Is that possible?
A complete answer to my question would be great, but anything that points me in the right direction will also help me a lot.
OK. I'm not sure if I understand your answer in the comments, but I tried sending the text to myself via mail, copying it into Notepad inside VirtualBox and saving as UTF-8. I still get this message:
C:\Users\python\Documents>py pytania.py
Traceback (most recent call last):
File "pytania.py", line 864, in <module>
start_probny()
File "pytania.py", line 850, in start_probny
utworzenie_danych()
File "pytania.py", line 740, in utworzenie_danych
utworzenie_pytania_piwo('a')
File "pytania.py", line 367, in utworzenie_pytania_piwo
for line in f: # Read one line at a time
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134: character maps to <undefined>
As mentioned by Zero Piraeus in a comment: The default source encoding for Python 3.x is UTF-8, regardless of what platform it's running on...
If you have problems, that's probably because your source code has an incorrect encoding. You should stick to UTF-8 only (even though PEP 0263 -- Defining Python Source Code Encodings allows changing it).
The error message you provided is clear:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134
Looking at the traceback, Python is decoding a data file (the for line in f loop) with the Windows default codec, cp1252, and it hits a byte it can't map: 0x9d is undefined in cp1252. The file itself was most likely saved as UTF-8, so the encodings don't match. To diagnose the problem, use iconv(1) on a Linux machine to detect errors by doing a dummy conversion:
iconv -f utf8 -t iso8859-2 -o /dev/null < test.py
You can try to reproduce the problem by creating a very simple Python file, typically: print("test €uro encoding")
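On the reading side, a minimal sketch of the likely fix, assuming the data file is actually saved as UTF-8 (the file name below is just a placeholder):
# Open the data file with an explicit encoding instead of the Windows default (cp1252);
# swap 'utf8' for 'cp1250' if that is what the file really uses.
with open('dane.txt', encoding='utf8') as f:
    for line in f:
        print(line.rstrip())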
