python : error encoding 'ascii' ONLY running with crontab - python-3.x

My Python script works fine when I run it from the console. It writes an HTML file, converts it to PDF with pdfkit, and sends an email. Everything works.
But when the script runs from the crontab, I get encoding errors.
if len(html) > 0:
    with open(file_html_to_pdf, 'w') as file:
        message = '[PDF] >> writing the html file "{0}"'.format(file_html_to_pdf)
        logging.info(message)
        print(message)
        file.write(html)
else:
    raise IOError('html file is invalid !')
In the crontab, I have set this:
# VARIABLES FOR PYTHON
PYTHONIOENCODING=utf8
*/5 * * * * /home/users/my-user/my-project/env/bin/python /home/users/my-user/my-project/cronjob.py > /var/log/apps/my-project/cron_error_my-project.log 2>&1
And in the .bashrc:
# set locale utf-8 in french.....et voilà
export PYTHONIOENCODING=utf-8
export LC_ALL=fr_FR.utf-8
export LANG="$LC_ALL"
The error message:
UnicodeEncodeError: 'ascii' codec can't encode character '\xf4' in position 37: ordinal not in range(128)
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 400: ordinal not in range(128)
...
Message: ('<!> Error => ', UnicodeEncodeError('ascii', '<!DOCTYPE html>\n<head>\n <title>---
Python version:
$ /home/users/my-user/my-project/env/bin/python --version
Python 3.4.2
I don't understand why :(. Can anybody help me?
Thanks
F.
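One possible fix, sketched under the assumption that the failure comes from open() rather than from print(): cron does not read .bashrc, so the job likely runs with a C/POSIX locale, and on Python 3.4 open() without an explicit encoding then falls back to ASCII. PYTHONIOENCODING only affects stdin, stdout and stderr, not files opened with open(). The helper name write_html and the sample string below are hypothetical.

```python
import os
import tempfile


def write_html(path, html):
    """Write html to path as UTF-8, regardless of the process locale."""
    if len(html) == 0:
        raise IOError('html file is invalid !')
    # An explicit encoding avoids the locale-dependent default of open().
    with open(path, 'w', encoding='utf-8') as file:
        file.write(html)


# Hypothetical sample containing the two characters from the error messages
# ('\xf4' is o with circumflex, '\xe9' is e with acute accent).
sample = '<title>h\xf4tel, \xe9t\xe9</title>'
tmp_path = os.path.join(tempfile.mkdtemp(), 'sample.html')
write_html(tmp_path, sample)

# Read the file back as raw bytes to see what was actually stored.
with open(tmp_path, 'rb') as file:
    written = file.read()
```

Setting LANG or LC_ALL at the top of the crontab, next to PYTHONIOENCODING, should have a similar effect, assuming the fr_FR.UTF-8 locale is installed on the machine.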

Related

Set locale for stdout in python3

I want my code to stop producing errors depending on the locale set on the terminal. For example this code:
import os
print(f"Locale {os.getenv('LC_ALL')}")
foo_bytes = b'\xce\x94, \xd0\x99, \xd7\xa7, \xe2\x80\x8e \xd9\x85, \xe0\xb9\x97, \xe3\x81\x82, \xe5\x8f\xb6, \xe8\x91\x89, and \xeb\xa7\x90.'
print(foo_bytes.decode("utf-8", "replace"))
It prints cleanly as long as my locale is en_US.UTF-8.
However, if I change my locale and run the same script:
export LC_ALL=en_US.iso885915
python3 locale_script.py
it fails with:
Locale en_US.iso885915
Traceback (most recent call last):
  File "locale_script.py", line 10, in <module>
    print(foo_bytes.decode("utf-8", "replace"))
  File "/usr/lib/python3.6/encodings/iso8859_15.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0394' in position 0: character maps to <undefined>
This could be avoided if I could set the terminal locale from within my script so that it uses UTF-8, as I require. I have tried setlocale, but it still ends up with the same error.
import locale
locale.setlocale(locale.LC_ALL, "C.UTF-8")
Any advice on what to do? I would like to avoid having to re-encode all my strings:
foo = 'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
print(foo.encode().decode(sys.stdout.encoding))
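A common way around this, sketched here, is to stop relying on the locale and wrap the binary stream underneath sys.stdout in a text wrapper with an explicit encoding. locale.setlocale() does not help because sys.stdout has already picked its encoding at interpreter startup. The helper name utf8_text_stream is hypothetical, and the demonstration writes to an in-memory buffer instead of the real terminal.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(binary_stream, encoding='utf-8',
                            line_buffering=True)


# Demonstration on an in-memory buffer standing in for sys.stdout.buffer.
buffer = io.BytesIO()
out = utf8_text_stream(buffer)
foo_bytes = b'\xce\x94, \xd0\x99, \xd7\xa7'
print(foo_bytes.decode('utf-8', 'replace'), file=out)
out.flush()
encoded = buffer.getvalue()

# In a real script the same helper would be applied once, near the top:
#     sys.stdout = utf8_text_stream(sys.stdout.buffer)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') does the same in one call. Whether the terminal then displays the characters correctly still depends on the terminal itself expecting UTF-8.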

Mimicking bash wc functionalities using python

I have written a very simple Python program, called wc.py, which mimics "bash wc" behaviour to count the number of lines, words and bytes in a file. My program is as follows:
import sys

path = sys.argv[1]
w = 0
l = 0
b = 0
# Text mode: content is decoded to str and line endings are translated.
with open(path) as file:
    for currentLine in file:
        wordsInLine = currentLine.strip().split(' ')
        wordsInLine = [word for word in wordsInLine if word != '']
        w += len(wordsInLine)
        b += len(currentLine.encode('utf-8'))
        l += 1
# output
print(str(l) + ' ' + str(w) + ' ' + str(b))
To run my program, execute the following command:
python3 wc.py [a file to read the data from]
As a result, it shows:
[The number of lines in the file] [The number of words in the file] [The number of bytes in the file] [the file directory path]
The files I used to test my code are as follows:
file.txt which contains the following data:
1
2
3
4
Executing "wc file.txt" returns
4 4 8
Executing "python3 wc.py file.txt" returns 4 4 8
Download "Annual enterprise survey: 2020 financial year (provisional) – CSV" from CSV file download
Executing "wc [fileName].csv" returns
37081 500273 5881081
Executing "python3 wc.py [fileName].csv" returns
37081 500273 5844000
and a [something].pdf file
Executing "wc [something].pdf" works.
Executing "python3 code.py" throws the following errors:
Traceback (most recent call last):
  File "code.py", line 10, in <module>
    for currentLine in file:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 10: invalid start byte
As you can see, the output of python3 code.py [something].pdf and python3 code.py [something].csv is not the same as what wc returns. Could you help me find the reason for this erroneous behaviour in my code?
Regarding the CSV file, if you look at the difference between your result and that of wc:
5881081 - 5844000 = 37081 which is exactly the number of lines.
That is, every line has one additional character in the original file. That character is the carriage return \r, which got lost in Python because the file is read in text mode: with the default newline handling, \r\n line endings are translated to \n as you iterate over the lines. If you want a byte-correct result, you have to first identify the type of line breaks used in the file (and watch out for inconsistencies throughout the document).
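An alternative that sidesteps identifying the line breaks is to read the file as raw bytes. The sketch below is one way to do that, not the asker's code: the function name count_file and the test data are hypothetical. Binary mode should also avoid the UnicodeDecodeError on the PDF, since nothing is decoded.

```python
import os
import tempfile


def count_file(path):
    """Return (lines, words, bytes) for path, reading it as raw bytes."""
    lines = words = size = 0
    # Binary mode: no decoding and no newline translation, so "\r" bytes
    # are kept and non-text files do not raise decode errors.
    with open(path, 'rb') as file:
        for raw_line in file:
            lines += raw_line.endswith(b'\n')  # count newline bytes
            words += len(raw_line.split())     # split on ASCII whitespace
            size += len(raw_line)
    return lines, words, size


# Small self-check with Windows-style line endings (hypothetical data).
tmp_path = os.path.join(tempfile.mkdtemp(), 'crlf.txt')
with open(tmp_path, 'wb') as file:
    file.write(b'1\r\n2\r\n3\r\n4\r\n')
crlf_counts = count_file(tmp_path)
```

Calling print(*count_file(sys.argv[1])) after importing sys would give wc-style output. Word counts may still differ from wc for some files, because wc decides what counts as whitespace based on the locale.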

Problem /bin/sh: 0: Illegal option -* running code in python3

[Image: the error message, with the editor configured for UTF-8]
I am trying to run the following code in Python 3.7.5:
pi = 3.14159
radius = 15.3
print ('Circle Area is', pi * radius ** 2)
It does not run. This error appears: SyntaxError: Non-ASCII character '\xc3' in file
After adding the line #! - * - conding: utf8 - * - at the top, the following error appears:
/bin/sh: 0: Illegal option -*
It should be written without spaces, and the keyword is coding, not conding. It is also a plain comment starting with #, not a #! shebang line:
# -*- coding: utf-8 -*-
not
#! - * - conding: utf8 - * -
You don't need this line with Python 3. The default source encoding is UTF-8 in Python 3.
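A minimal sketch of how the top of such a script could look, assuming a Unix-like system where python3 is on the PATH. The first line is the shebang that selects the interpreter, and the second is the optional encoding declaration; the variable name area is added for illustration. Since the "Non-ASCII character" SyntaxError is worded the way Python 2 reports a missing declaration, it may also be worth checking which interpreter actually runs the file.

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# The declaration above is optional in Python 3, where UTF-8 is the default.

pi = 3.14159
radius = 15.3
area = pi * radius ** 2
print('Circle Area is', area)
```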

UnicodeEncodeError in python3.4.2?

Hi, I have the following code in Python 3.4.2:
s='416f1c7918f83a4f1922d86df5e78348'; w="0123456789abcdef"; x=''.join([chr(w.index(s[i])*16+w.index(s[i+1])) if(i%2==0) else '' for i in range(len(s))]); print(x);
and it shows this error:
UnicodeEncodeError: 'ascii' codec can't encode character '\xf8' in position 5: ordinal not in range(128)
Why is this happening? Isn't chr in Python 3 supposed to accept values above 128?
Too much work.
>>> import binascii
>>> binascii.unhexlify(s)
b'Ao\x1cy\x18\xf8:O\x19"\xd8m\xf5\xe7\x83H'
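To spell out the answer above: the join of chr() calls builds a str, and printing it fails only because stdout is using the ASCII codec, whereas unhexlify yields the raw bytes directly. The sketch below checks that both routes describe the same 16 byte values. The variable names text and raw are illustrative, and latin-1 is used in the comparison only because it maps each byte to the code point with the same number.

```python
import binascii

s = '416f1c7918f83a4f1922d86df5e78348'
w = '0123456789abcdef'

# The asker's approach: one character per pair of hex digits.
text = ''.join(chr(w.index(s[i]) * 16 + w.index(s[i + 1]))
               for i in range(0, len(s), 2))

# The shorter route: decode the hex digits straight into bytes.
raw = binascii.unhexlify(s)

# Both hold the same values; only the type differs (str versus bytes).
same_values = text == raw.decode('latin-1')

# Printing a bytes object shows escapes, so nothing has to be encoded.
print(raw)
```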

Python SMTP: 'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte

My full code:
import smtplib
from email.mime.text import MIMEText

smtp_adresi = "smtp.gmail.com"
smtp_port = 587
user = "****#gmail.com"
password = "*****"  # renamed from pass, which is a reserved word
gonderilecek_adresler = ["****#bilalkocak.net", "******#gmail.com"]  # recipients
konu = "Subject"
content = "HTML content"

# Build the HTML message and its headers.
mail = MIMEText(content, "html", "UTF-8")
mail["From"] = user
mail["Subject"] = konu
mail["To"] = ",".join(gonderilecek_adresler)
mail = mail.as_string()

# Connect, upgrade to TLS, log in and send.
s = smtplib.SMTP(smtp_adresi, smtp_port)
s.starttls()
s.login(user, password)
s.sendmail(user, gonderilecek_adresler, mail)
Result:
C:\Users\ASUS\AppData\Local\Programs\Python\Python36-32\python.exe "C:/Users/ASUS/PycharmProjects/Again/SMTP ile Mail/main.py"
'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte
Process finished with exit code 0
\xe7 is the ç in your name, encoded in something other than UTF-8 (maybe cp1254, given the Turkish name?). Save your source file in UTF-8 and try again. It helps to have a reproducible example: masking the values with ****** in the posted source probably removed the part that causes the problem.
Note that #coding:utf8 at the top of the file declares the encoding of the file, but UTF-8 is the default in Python 3, so it is not required. Python 2 would need it.
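The mismatch described above can be reproduced in a few lines. This is a sketch with a hypothetical sample word, not the asker's actual data: it encodes a string containing ç as cp1254 and then tries to decode those bytes as UTF-8.

```python
# Hypothetical sample word; '\u00e7' is the letter c with cedilla.
word = '\u00e7ay'

as_cp1254 = word.encode('cp1254')  # the cedilla becomes the single byte 0xe7
as_utf8 = word.encode('utf-8')     # the cedilla becomes the bytes 0xc3 0xa7

try:
    as_cp1254.decode('utf-8')
    error_text = ''
except UnicodeDecodeError as exc:
    # 0xe7 announces a multi-byte sequence, but a plain ASCII byte follows.
    error_text = str(exc)

print(error_text)
print(as_utf8)
```

The same bytes decode without error once the declared and actual encodings agree, which is why saving the source file as UTF-8 is the suggested fix.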
