I want my code to stop producing errors depending on the locale set on the terminal. For example this code:
import os
print(f"Locale {os.getenv('LC_ALL')}")
foo_bytes = b'\xce\x94, \xd0\x99, \xd7\xa7, \xe2\x80\x8e \xd9\x85, \xe0\xb9\x97, \xe3\x81\x82, \xe5\x8f\xb6, \xe8\x91\x89, and \xeb\xa7\x90.'
print(foo_bytes.decode("utf-8", "replace"))
prints cleanly as long as my locale is en_US.UTF-8.
However, if I change my locale and run the script:
export LC_ALL=en_US.iso885915
python3 locale_script.py
it fails with:
Locale en_US.iso885915
Traceback (most recent call last):
  File "locale_script.py", line 10, in <module>
    print(foo_bytes.decode("utf-8", "replace"))
  File "/usr/lib/python3.6/encodings/iso8859_15.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0394' in position 0: character maps to <undefined>
This could be avoided if I could set the terminal locale from within my script, so that it uses UTF-8 as I require. I have tried setlocale, but it still ends in the same error:
import locale
locale.setlocale(locale.LC_ALL, "C.UTF-8")
Any advice on what to do? I would like to avoid having to re-encode all my strings like this:
foo = 'Δ, Й, ק, م, ๗, あ, 叶, 葉, and 말.'
print(foo.encode().decode(sys.stdout.encoding))
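One way to make the output independent of the terminal locale is to write through a text stream whose encoding is passed explicitly, rather than relying on the encoding Python chose for sys.stdout at startup (which is why calling setlocale afterwards changes nothing). The following is only a sketch: `make_utf8_writer` is a hypothetical helper, and it is demonstrated on an in-memory buffer so the written bytes can be inspected.

```python
import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that always encodes as UTF-8.

    Hypothetical helper for illustration. The encoding is passed explicitly,
    so it no longer depends on LC_ALL or LANG.
    """
    return io.TextIOWrapper(
        binary_stream,
        encoding="utf-8",
        errors="replace",     # substitute '?' rather than raise
        line_buffering=True,  # flush whenever a newline is written
    )


foo_bytes = b'\xce\x94, \xd0\x99, \xd7\xa7, and \xeb\xa7\x90.'

# Demonstrated on an in-memory buffer; a script would pass sys.stdout.buffer.
buffer = io.BytesIO()
out = make_utf8_writer(buffer)
print(foo_bytes.decode("utf-8", "replace"), file=out)
out.flush()

# The bytes that reached the underlying stream are UTF-8.
written = buffer.getvalue()
```

In a script, `out = make_utf8_writer(sys.stdout.buffer)` followed by `print(..., file=out)` would apply the same idea. On Python 3.7 and later, `sys.stdout.reconfigure(encoding="utf-8")` should achieve the same without a wrapper. Either way, the terminal itself still has to interpret the bytes as UTF-8 for the characters to display correctly.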
My Python script works fine when I run it from the console. It writes an HTML file, converts it to PDF with pdfkit, and sends an email. Everything works.
But when the script runs from the crontab, I get encoding errors.
if len(html) > 0:
    with open(file_html_to_pdf, 'w') as file:
        message = '[PDF] >> writing the html file "{0}"'.format(file_html_to_pdf)
        logging.info(message)
        print(message)
        file.write(html)
else:
    raise IOError('html file is invalid !')
In the crontab, I have this:
# VARIABLES FOR PYTHON
PYTHONIOENCODING=utf8
*/5 * * * * /home/users/my-user/my-project/env/bin/python /home/users/my-user/my-project/cronjob.py > /var/log/apps/my-project/cron_error_my-project.log 2>&1
And in the .bashrc:
# set locale utf-8 in french.....et voilà
export PYTHONIOENCODING=utf-8
export LC_ALL=fr_FR.utf-8
export LANG="$LC_ALL"
The error message:
UnicodeEncodeError: 'ascii' codec can't encode character '\xf4' in position 37: ordinal not in range(128)
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 400: ordinal not in range(128)
...
Message: ('<!> Error => ', UnicodeEncodeError('ascii', '<!DOCTYPE html>\n<head>\n <title>---
Python version:
$ /home/users/my-user/my-project/env/bin/python --version
Python 3.4.2
I don't understand why :(. Can anybody help me?
Thanks
F.
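A possible explanation, stated as an assumption because the traceback above is truncated: cron does not read .bashrc, so the job runs without LC_ALL or LANG, and PYTHONIOENCODING only affects stdin, stdout and stderr, not files opened with open(). If the failing call is file.write(html), passing the encoding explicitly removes the dependence on the environment. A minimal sketch, where `write_html` and the sample content are hypothetical:

```python
import os
import tempfile


def write_html(path, html):
    """Write html to path as UTF-8, independent of the process locale.

    Hypothetical helper; it mirrors the length check in the snippet above.
    """
    if not html:
        raise IOError("html file is invalid !")
    # Without encoding=..., open() falls back to the locale's preferred
    # encoding, which is typically ASCII in a bare cron environment.
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(html)


# Sample content with the two characters named in the error (ô and é).
sample = "<title>h\xf4tel r\xe9serv\xe9</title>"
target = os.path.join(tempfile.mkdtemp(), "sample.html")
write_html(target, sample)

# Read the file back as raw bytes to check what was actually stored.
with open(target, "rb") as handle:
    raw = handle.read()
```

The same explicit `encoding="utf-8"` argument applies when the file is read back later, so that reading and writing agree.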
In Py2:
(chr(145) + chr(78)).decode('utf-16')
I got u'\u4e91'.
But in Py3:
(chr(145) + chr(78)).encode('utf-8').decode('utf-16')
I got an error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-16-le' codec can't decode byte 0x4e in position 2: truncated data
Sometimes the two behave the same way, for example with (chr(93) + chr(78)), but sometimes they do not.
Why? And how can I do this correctly in Py3?
You have to use latin1 if you want to encode any byte transparently:
(chr(145) + chr(78)).encode('latin1').decode('utf-16')
#'云'
chr(145) gets encoded with 2 bytes in utf8 (as with all values above 127):
chr(145).encode('utf8')
# b'\xc2\x91'
while latin1 gives the single byte you wanted:
chr(145).encode('latin1')
# b'\x91'
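The same result can be reached without going through str at all, since Python 3 has a separate bytes type. The sketch below builds the two bytes directly and also shows why the UTF-8 route fails. Using "utf-16-le" instead of "utf-16" is my own choice here, to keep the example independent of the machine's byte order; as far as I know, plain "utf-16" without a BOM falls back to the platform's native order.

```python
# Build the two bytes directly instead of encoding a str.
raw = bytes([145, 78])          # b'\x91N'
text = raw.decode("utf-16-le")  # explicit byte order, no BOM required

# Why the UTF-8 round trip fails: chr(145) becomes two bytes, so the
# payload is three bytes long and cannot be split into 16-bit units.
via_utf8 = (chr(145) + chr(78)).encode("utf-8")     # b'\xc2\x91N'
via_latin1 = (chr(145) + chr(78)).encode("latin1")  # b'\x91N'

# ascii() keeps the output printable on any console encoding.
print(ascii(text), len(via_utf8), via_latin1 == raw)
```

This also explains why (chr(93) + chr(78)) behaves identically in both versions: both code points are below 128, so UTF-8 and latin1 produce the same two bytes.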
My full code:
import smtplib
from email.mime.text import MIMEText

smtp_adresi = "smtp.gmail.com"
smtp_port = 587
user = "****#gmail.com"
password = "*****"  # `pass` is a reserved word and cannot be a variable name
gonderilecek_adresler = ["****#bilalkocak.net", "******#gmail.com"]  # recipients
konu = "Subject"
content = "HTML content"

mail = MIMEText(content, "html", "UTF-8")
mail["From"] = user
mail["Subject"] = konu
mail["To"] = ",".join(gonderilecek_adresler)
mail = mail.as_string()

s = smtplib.SMTP(smtp_adresi, smtp_port)
s.starttls()
s.login(user, password)
s.sendmail(user, gonderilecek_adresler, mail)
Result:
C:\Users\ASUS\AppData\Local\Programs\Python\Python36-32\python.exe "C:/Users/ASUS/PycharmProjects/Again/SMTP ile Mail/main.py"
'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte
Process finished with exit code 0
The byte \xe7 is the ç in your name, encoded in something other than UTF-8 (maybe cp1254, given the Turkish name?). Save your source file as UTF-8 and try again. It helps to have a reproducible example: the ****** masking in your source probably removed the very characters that cause the problem.
Note #coding:utf8 at the top of the file declares the encoding of the file, but it is the default in Python 3 so it is not required. Python 2 would need it.
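To illustrate the diagnosis, here is a small sketch that reproduces the same error message from a single ç. The byte string `b"name = "` is a made-up stand-in for whatever precedes the character in the real file, and cp1254 is only the guess made above, not something confirmed by the question.

```python
text = "\u00e7"  # the letter ç

legacy_bytes = text.encode("cp1254")  # one byte, 0xe7, in the Turkish code page
utf8_bytes = text.encode("utf-8")     # two bytes, 0xc3 0xa7

# Decoding the legacy byte as UTF-8 reproduces the reported failure:
# 0xe7 announces a multi-byte sequence, but the next byte is plain ASCII.
reason = None
bad_offset = None
try:
    (b"name = " + legacy_bytes + b"x").decode("utf-8")
except UnicodeDecodeError as exc:
    reason = exc.reason
    bad_offset = exc.start

# Once the real source encoding is known, re-encoding yields valid UTF-8.
repaired = legacy_bytes.decode("cp1254").encode("utf-8")
```

Saving the file as UTF-8 in the editor performs the same conversion as the last line, for the whole file at once.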
Just when I thought I had my head wrapped around converting unicode to strings, Python 2.7 throws an exception.
The code below loops over a number of accented characters and converts them to their non-accented equivalents. I've put in a special case for the double s.
#!/usr/bin/python
# -*- coding: utf-8 -*-
import unicodedata
def unicodeToString(uni):
    return unicodedata.normalize("NFD", uni).encode("ascii", "ignore")
accentList = [
#(grave accent)
u"à",
u"è",
u"ì",
u"ò",
u"ù",
u"À",
u"È",
u"Ì",
u"Ò",
u"Ù",
#(acute accent)
u"á",
u"é",
u"í",
u"ó",
u"ú",
u"ý",
u"Á",
u"É",
u"Í",
u"Ó",
u"Ú",
u"Ý",
#(circumflex accent)
u"â",
u"ê",
u"î",
u"ô",
u"û",
u"Â",
u"Ê",
u"Î",
u"Ô",
u"Û",
#(tilde )
u"ã",
u"ñ",
u"õ",
u"Ã",
u"Ñ",
u"Õ",
#(diaeresis)
u"ä",
u"ë",
u"ï",
u"ö",
u"ü",
u"ÿ",
u"Ä",
u"Ë",
u"Ï",
u"Ö",
u"Ü",
u"Ÿ",
#ring
u"å",
u"Å",
#ae ligature
u"æ",
u"Æ",
#oe ligature
u"œ",
u"Œ",
#c cedilla
u"ç",
u"Ç",
# eth
u"ð",
u"Ð",
# o slash
u"ø",
u"Ø",
u"¿", # Spanish ?
u"¡", # Spanish !
u"ß" # Double s
]
for i in range(0, len(accentList)):
    try:
        u = accentList[i]
        s = unicodeToString(u)
        if u == u"ß":
            s = "ss"
        print("%s -> %s" % (u, s))
    except:
        pass
Without the try/except I get an error:
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xc0' in position 0: character maps to <undefined>
Is there anything I can do to make the code run without using the try/except? I'm using Sublime Text 2.
try/except does not make Unicode work. It just hides the errors.
To fix the UnicodeEncodeError, drop the try/except and see "Python, Unicode, and the Windows console".
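The traceback points at the console codec (cp437), which suggests the exception comes from printing the original character rather than from the accent stripping itself. The sketch below separates the two steps. It is written for Python 3 rather than the 2.7 used in the question, and `strip_accents` and `console_safe` are hypothetical helper names.

```python
import unicodedata


def strip_accents(text):
    """Return an ASCII approximation of text (hypothetical helper)."""
    text = text.replace("\u00df", "ss")  # ß has no decomposition; map it by hand
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and anything else outside ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def console_safe(text, encoding):
    """Encode for a console, substituting '?' for unsupported characters."""
    return text.encode(encoding, errors="replace")


samples = ["\u00e0", "\u00c0", "\u0178", "\u00df", "\u00f8"]  # à À Ÿ ß ø
stripped = [strip_accents(s) for s in samples]

# cp437, the console code page in the traceback, has à but not À.
lower_ok = console_safe("\u00e0", "cp437")
upper_replaced = console_safe("\u00c0", "cp437")
```

Characters without a canonical decomposition, such as ø, æ, œ and ð, come out as an empty string with this approach, so they would need explicit mappings like the one used for ß.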