For a class of mine I have to make a very basic calculator. I want to write the code in such a way that the user can just enter what they want to do (ex. √64) press return and get the answer. I wrote this:
if '√' in operation:
squareRoot = operation.replace('√','')
squareRootFinal = math.sqrt(int(squareRoot))
When I run this in IDLE, it works like a charm. However when I run this in Terminal I get the following error:
SyntaxError: Non-ASCII character '\xe2' in file x.py on line 50, but no encoding declared;
any suggestions?
Just declare the encoding. Python is begin a bit cautious here and not guessing the encoding of your text file. IDLE is a text editor and so has already guessed the encoding and stored things internally as unicode, which it can pass directly to the Python interpreter.
Put
#!/usr/bin/env python3
# -*- coding: UTF-8 -*-
at the top of the file. (It's pretty unlikely nowadays that your encoding is not UTF-8.)
For reference, in past times files had a variety of different possible encodings. That means that the same text could be stored in different ways in binary, when written to disk. Almost all encodings have the same interpretation of bytes 0 to 127—the ASCII subset. But if any other bytes occur in the file, their meaning is potentially ambiguous.
However, in recent years, UTF-8 has become by far the most common encoding, so it's almost always a safe guess.
Related
I have recently learned more in depth about ASCII, Unicode, UTF-8, UTF-16, etc. in Python3, but I am struggling to understand when would one run into issues while reading/writing to files.
So if I open a file:
with open(myfile, 'a') as f:
f.write(stuff)
where stuff = 'Hello World!'
I have no issues writing to a file.
If I have something like:
non_latin = '娜', I can still write to the file with no problems.
So when does one run into issues regarding encodings? When does one use encode() and decode()?
You run into issues if the default encoding for your OS doesn't support the characters written. In your case the default (obtained from locale.getpreferredencoding(False)) is probably UTF-8. On Windows, the default is an ANSI encoding like cp1252 and wouldn't support Chinese. Best to be explicit and use open(myfile,'w',encoding='utf8') for example.
The following code is generating the error shown in the title, please can someone explain what it means, what is causing it and how to fix?
year = eval(input("Enter the year: "))
day = (1+5*((year-1)%4)+4*((year-1)%100)+6*((year-1)%400))%7
print("The 1st of January",year,"falls on day",day,end = " ")
print("of the week(0=Sun,…,6=Sat).")
You have saved the Python source code file in a different encoding than UTF-8, namely Windows-1252, because you are working on Windows.
Switch your text editor to UTF-8 and run the program again.
Python 3 expects its source code to be saved as UTF-8 unless a different encoding is declared. The … is the offending character (character code 85 in Windows-1252). This character code is invalid in UTF-8, so Python fails to read the source code and gives the error you see.
When you re-save the file as UTF-8, the Unicode code point HORIZONTAL ELLIPSIS (U+2026) will be used, which becomes 3 bytes in the file (E2, 80, A6), and Python will be happy.
Alternatively you can explicitly declare your source code to be Windows-1252, this is equally valid and will work as well.
I'm trying to write a object to a gzipped json file in one step (minimising code, and possibly saving memory space). My initial idea (python3) was thus:
import gzip, json
with gzip.open("/tmp/test.gz", mode="wb") as f:
json.dump({"a": 1}, f)
This however fails: TypeError: 'str' does not support the buffer interface, which I think has to do with a string not being encoded to bytes. So what is the proper way to do this?
Current solution that I'm unhappy with:
Opening the file in text-mode solves the problem:
import gzip, json
with gzip.open("/tmp/test.gz", mode="wt") as f:
json.dump({"a": 1}, f)
however I don't like text-mode. In my mind (and maybe this is wrong, but supported by this), text mode is used to fix line-endings. This shouldn't be an issue because json doesn't have line endings, but I don't like it (possibly) messing with my bytes, it (possibly) is slower because it's looking for line-endings to fix, and (worst of all) I don't understand why something about line-endings fixes my encoding problems?
offtopic: I should have dived into the docs a bit further than I initially did.
The python docs show:
Normally, files are opened in text mode, that means, you read and write strings from and to the file, which are encoded in a specific encoding (the default being UTF-8). 'b' appended to the mode opens the file in binary mode: now the data is read and written in the form of bytes objects. This mode should be used for all files that don’t contain text.
I don't fully agree that the result from a json encode is a string (I think it should be a set of bytes, since it explicitly defines that it uses utf-8 encoding), but I noticed this before. So I guess text mode it is.
My national language is Polish.
I've got program in Python 3.4 which I wrote on linux. This program mostly work on text, Polish text. So of course, variable names don't have any special characters, but sometimes I put into them some strings with Polish characters, user will input from keyboard some strings with Polish characters and My program read from files, where I got strings with Polish characters.
Everything work well on Linux. I didn't think about encoding, it just worked. But now i want to make it work on Windows. Can you help me understand, what I should actually do to make this transform?
Or maybe some workaround - I just need to have Windows executable file. Perfect way for this, would be "Pyinstaller", but it work only for python 2.7, not 3.4. That's why I want to make it working on Windows, and in VirtualBox with py2exe compile into executable form. But maybe somone know way for this in Linux, it without this encoding problems, it would be great.
If not, I back to my question. I tried to convert my python scripts in gedit into ISO or CP1250 or 1252, I wrote in the file headline what coding I'm using, it actually worked a little, now my windows error pint me into my files with text form which I read some data, so I converted them too... But it didn't work.
So I decided, that it's no more time for blind trials, and I need to ask for help, I need to understand what encoding is used on windows, which on linux, what is the best way to convert one into another, and how make program read characters in right way.
The best way would be - I guess - not changing anything in encoding, but just make windows python understand what encoding I'm using. Is that possible?
Complete answer for my question would be great, but anything what will point me in right direction will also help me a lot.
OK. I'm not sure, if I understand your answer in comments, but tried sending text for myself via mail, coping it in virtualbox into notepad and save as utf_8. Still get this message:
C:\Users\python\Documents>py pytania.py
Traceback (most recent call last):
File "pytania.py", line 864, in <module>
start_probny()
File "pytania.py", line 850, in start_probny
utworzenie_danych()
File "pytania.py", line 740, in utworzenie_danych
utworzenie_pytania_piwo('a')
File "pytania.py", line 367, in utworzenie_pytania_piwo
for line in f: # Czytam po jednej linii
File "C:\Python34\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134: cha
racter maps to <undefined>
As mentioned by Zero Piraeus in a comment: The default source encoding for Python 3.x is UTF-8, regardless of what platform it's running on...
If you have problems, that probably because your source code has incorrect encoding. You should stick to UTF-8 only (even though PEP 0263 -- Defining Python Source Code Encodings allows changing it).
The error message you provided is clear:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1134
Python is currently expecting UTF8 (because "UnicodeDecodeError"!), but it encounters an illegal char (0x9d isn't a valid char is UTF8). To diagnose the problem, use iconv(1) on a Linux machine, to detect errors buy doing a dummy conversion:
iconv -f utf8 -t iso8859-2 -o /dev/null < test.py
You can try to reproduce the problem by creating a very simple python file, typically : print "test €uro encoding"
I extract unicode text from a website and do some printing that can contain non-ascii characters. Printing those raises a UnicodeEncodeError. How should I handle this case? I need a solution that requires minimum changes to the existing code.
I tried encoding the strings (by wrapping sys.stdout) with UTF-8, but then I get a TypeError since the original sys.stdout.write() method expects str not bytes.
I don't want the characters to be lost. The user may want to pipe the output into a file that would in this case be UTF8 formatted.
Edit: Setting PYTHONIOENCODING=utf8 can fix the problem, but isn't there a way I can "imitate" this behaviour in Python? Since utf8 doesn't match the Windows console encoding, some characters look strange. But this is by far better than a crash of the program.
Anyway to format the output from Python so I don't need to modify environment variables?