Python SMTP: 'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte - python-3.x

My full code:
import smtplib
from email.mime.text import MIMEText

smtp_adresi = "smtp.gmail.com"
smtp_port = 587
user = "****@gmail.com"
password = "*****"  # "pass" is a reserved keyword in Python and cannot be a variable name
gonderilecek_adresler = ["****@bilalkocak.net", "******@gmail.com"]
konu = "Subject"
content = "HTML content"

mail = MIMEText(content, "html", "UTF-8")
mail["From"] = user  # was the undefined name kullanıcı_adı
mail["Subject"] = konu
mail["To"] = ",".join(gonderilecek_adresler)
mail = mail.as_string()

s = smtplib.SMTP(smtp_adresi, smtp_port)
s.starttls()
s.login(user, password)
s.sendmail(user, gonderilecek_adresler, mail)
s.quit()
Result:
C:\Users\ASUS\AppData\Local\Programs\Python\Python36-32\python.exe
"C:/Users/ASUS/PycharmProjects/Again/SMTP ile Mail/main.py" 'utf-8'
codec can't decode byte 0xe7 in position 7: invalid continuation byte
Process finished with exit code 0

\xe7 is the ç in your name, but it is not encoded in UTF-8 (maybe cp1254, given the Turkish name?). Save your source file as UTF-8 and try again. It helps to have a reproducible example; your ****** masking probably removed the problem from the posted code.
Note that # coding: utf-8 at the top of a file declares the file's encoding, but UTF-8 is already the default in Python 3, so it is not required there. Python 2 would need it.
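Independent of the source-file encoding, non-ASCII header values are safest when encoded explicitly. A minimal sketch (the Turkish strings are placeholders, not the asker's actual values):

```python
from email.mime.text import MIMEText
from email.header import Header

# body declared as UTF-8 HTML; MIMEText sets the charset for us
mail = MIMEText("<p>içerik</p>", "html", "utf-8")
# non-ASCII header values must be RFC 2047 encoded; Header does this on serialization
mail["Subject"] = Header("Konu başlığı", "utf-8")
raw = mail.as_string()
```

After as_string(), the body carries charset="utf-8" and the subject is emitted as an =?utf-8?...?= encoded word, so nothing in the wire format depends on how the source file was saved.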

Related

Set locale for stdout in python3

I want my code to stop producing errors depending on the locale set on the terminal. For example this code:
import os
print(f"Locale {os.getenv('LC_ALL')}")
foo_bytes = b'\xce\x94, \xd0\x99, \xd7\xa7, \xe2\x80\x8e \xd9\x85, \xe0\xb9\x97, \xe3\x81\x82, \xe5\x8f\xb6, \xe8\x91\x89, and \xeb\xa7\x90.'
print(foo_bytes.decode("utf-8", "replace"))
Will print cleanly as long as my locale is en_US.UTF-8.
However, if I change my locale and run the script described earlier:
export LC_ALL=en_US.iso885915
python3 locale_script.py
it fails with:
Locale en_US.iso885915
Traceback (most recent call last):
  File "locale_script.py", line 10, in <module>
    print(foo_bytes.decode("utf-8", "replace"))
  File "/usr/lib/python3.6/encodings/iso8859_15.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0394' in position 0: character maps to <undefined>
This could be avoided if I could set the locale from within my script so that it uses "utf-8" as required. I have tried setlocale, but it still ends up with the same error.
import locale
locale.setlocale(locale.LC_ALL, "C.UTF-8")
Any advice on what to do? I hope I can avoid re-encoding all my strings like this:
foo = 'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
print(foo.encode().decode(sys.stdout.encoding))
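One way to sidestep the terminal locale entirely is to choose the stream encoding yourself: on Python 3.7+ there is sys.stdout.reconfigure(encoding="utf-8"), and on 3.6 you can wrap sys.stdout.buffer in a fresh io.TextIOWrapper (setting PYTHONIOENCODING=utf-8 in the environment is another option). The effect is easiest to demonstrate on an in-memory buffer:

```python
import io

buf = io.BytesIO()
# same trick as io.TextIOWrapper(sys.stdout.buffer, ...): the encoding is
# picked explicitly instead of being inherited from the locale
out = io.TextIOWrapper(buf, encoding="utf-8", errors="replace")
out.write('Δ, Й, ק\n')
out.flush()
assert buf.getvalue().decode("utf-8") == 'Δ, Й, ק\n'
```

locale.setlocale alone does not help here because the encoding of sys.stdout is fixed when the interpreter starts, before the script runs.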

Can not build sphinx excerpts while keeping original language text

I need to build an excerpt from Arabic text and keep the original language so the excerpt can be displayed. The problem is that if I feed the Arabic text directly to the BuildExcerpts function, it gives the following error for this input:
'{"دولة": "فلسطين", "مصدر": "وزارة الاقتصاد الوطني", "رقم الشركة": "563420595", "اسم الشركة": "شركة حسان الغرابلي وشركاه للتجارة العامة", "عنوان الشركة": "غزة - الشجاعية", "نوع الشركة": "شركة مسجلة", "تاريخ التسجيل": "1994-07-03", "الهاتف": "", "راس مال الشركة": "0 دينار أردني", "مفوضون": "الشركاء مجتمعين"}'
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x84 in position 5: invalid start byte
As a workaround I used the unidecode module and fed the transliterated text to BuildExcerpts. But then the original language text is lost and cannot be rebuilt from the excerpt. See the output below.
[' ... -03", "lhtf": "", "rs ml lshrk#": "<b>0</b> dynr \'rdny", "mfwDwn": "lshrk mjtm ... ']
Is there way I can keep the original language encoding for the excerpt?
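Without seeing the indexing setup it is hard to be certain, but BuildExcerpts generally expects UTF-8 byte strings, and the 0x84 "invalid start byte" suggests the text reaching it was not UTF-8 (possibly cp1256 or double-encoded). A sketch of preparing the document text, assuming the Sphinx index itself is configured for UTF-8:

```python
import json

# placeholder record with Arabic keys and values
record = {"دولة": "فلسطين", "مصدر": "وزارة الاقتصاد الوطني"}

# ensure_ascii=False keeps the Arabic characters instead of \u escapes
payload = json.dumps(record, ensure_ascii=False)
data = payload.encode("utf-8")           # what the excerpt API should receive
assert data.decode("utf-8") == payload   # round-trips losslessly
```

If the bytes round-trip through UTF-8 like this before they reach the API, the excerpt should come back with the Arabic text intact rather than transliterated.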

How to read binary data in pyspark

I'm reading the binary file http://snap.stanford.edu/data/amazon/productGraph/image_features/image_features.b using pyspark.
import array

img_embedding_file = sc.binaryRecords("s3://bucket/image_features.b", 4106)

def mapper(features):
    a = array.array('f')
    a.frombytes(features)
    return a.tolist()

def byte_mapper(bytes):
    return str(bytes)

decoded_embeddings = img_embedding_file.map(lambda x: [byte_mapper(x[:10]), mapper(x[10:])])
When just product_id is selected from the rdd using
decoded_embeddings = img_embedding_file.map(lambda x: [byte_mapper(x[:10]), mapper(x[10:])])
The output for product_id is
["b'1582480311'", "b'\\x00\\x00\\x00\\x00\\x88c-?\\xeb\\xe2'", "b'7#\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'", "b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'", "b'\\xec/\\x0b?\\x00\\x00\\x00\\x00K\\xea'", "b'\\x00\\x00c\\x7f\\xd9?\\x00\\x00\\x00\\x00'", "b'L\\xa6\\n>\\x00\\x00\\x00\\x00\\xfe\\xd4'", "b'\\x00\\x00\\x00\\x00\\x00\\x00\\xe5\\xd0\\xa2='", "b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'", "b'\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00\\x00'"]
The file is hosted on s3.
Each record in the file has the first 10 bytes as product_id and the next 4096 bytes as image_features.
I'm able to extract all 4096 image features, but I'm facing an issue when reading the first 10 bytes and converting them into a properly readable format.
EDIT:
Finally, the problem comes from the recordLength. It's not 4096 + 10 but 4096*4 + 10. Changing it to:
img_embedding_file = sc.binaryRecords("s3://bucket/image_features.b", 16394)
should work.
You can actually find this in the code provided on the site you downloaded the binary file from:
for i in range(4096):
    feature.append(struct.unpack('f', f.read(4)))  # each feature is a 4-byte float, so 4096 * 4
Old answer:
I think the issue comes from your byte_mapper function.
That's not the correct way to convert bytes to a string. You should be using decode:
bytes = b'1582480311'
print(str(bytes))
# output: "b'1582480311'"
print(bytes.decode("utf-8"))
# output: '1582480311'
If you're getting the error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x88 in position 4: invalid start byte
That means the product_id bytes contain non-UTF-8 data. If you don't know the input encoding, it's difficult to convert it into a string.
However, you may want to ignore those bytes by passing "ignore" to the decode function:
bytes.decode("utf-8", "ignore")
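The fixed-length record layout can be checked locally with struct before involving Spark at all. A sketch with a made-up record (the product id and feature values are placeholders):

```python
import struct

features = [0.5] * 4096
# one record: 10 ASCII bytes of product id + 4096 little-endian float32 features
record = b"1582480311" + struct.pack("<4096f", *features)
assert len(record) == 16394  # 10 + 4096 * 4, the recordLength for binaryRecords

product_id = record[:10].decode("utf-8")
values = struct.unpack("<4096f", record[10:])
assert product_id == "1582480311"
assert len(values) == 4096
```

Slicing at byte 10 and decoding gives the id as a plain string, and the length assertion makes the 16394 record size explicit rather than guessed.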

python : 'ascii' encoding error ONLY when running from crontab

My Python script works very well when I run it from a console. It writes an HTML file, converts it to PDF with pdfkit, and sends a mail. Everything is fine.
But when the script runs from crontab, there are encoding errors.
if len(html) > 0:
    with open(file_html_to_pdf, 'w') as file:
        message = '[PDF] >> writing the html file "{0}"'.format(file_html_to_pdf)
        logging.info(message)
        print(message)
        file.write(html)
else:
    raise IOError('html file is invalid !')
In the crontab, I've done this :
# VARIABLES FOR PYTHON
PYTHONIOENCODING=utf8
*/5 * * * * /home/users/my-user/my-project/env/bin/python /home/users/my-user/my-project/cronjob.py > /var/log/apps/my-project/cron_error_my-project.log 2>&1
And in the .bashrc:
# set locale utf-8 in french.....et voilà
export PYTHONIOENCODING=utf-8
export LC_ALL=fr_FR.utf-8
export LANG="$LC_ALL"
The error message :
UnicodeEncodeError: 'ascii' codec can't encode character '\xf4' in position 37: ordinal not in range(128)
...
UnicodeEncodeError: 'ascii' codec can't encode character '\xe9' in position 400: ordinal not in range(128)
...
Message: ('<!> Error => ', UnicodeEncodeError('ascii', '<!DOCTYPE html>\n<head>\n <title>---
Python version :
$ /home/users/my-user/my-project/env/bin/python --version
Python 3.4.2
I don't understand why :(. Can anybody help me?
Thanks
F.
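A likely cause: cron runs jobs with a near-empty environment (the .bashrc exports are not read), so Python 3.4 falls back to the POSIX locale and open() without an explicit encoding uses the ASCII codec. Passing encoding='utf-8' to open() makes the write independent of the locale. A minimal sketch with a placeholder path:

```python
import os
import tempfile

html = '<p>voilà, été</p>'  # non-ASCII content that breaks under the C locale
path = os.path.join(tempfile.mkdtemp(), 'out.html')

# explicit encoding: behaves the same in a console and under cron
with open(path, 'w', encoding='utf-8') as fh:
    fh.write(html)

with open(path, encoding='utf-8') as fh:
    assert fh.read() == html
```

Environment variables set directly in the crontab (such as the PYTHONIOENCODING line above) only affect print output; file writes still need the explicit encoding argument.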

UnicodeEncodeError in python3.4.2?

Hi, I have the following code in Python 3.4.2:
s = '416f1c7918f83a4f1922d86df5e78348'
w = "0123456789abcdef"
x = ''.join([chr(w.index(s[i]) * 16 + w.index(s[i + 1])) if i % 2 == 0 else '' for i in range(len(s))])
print(x)
and it shows this error
UnicodeEncodeError:
'ascii' codec can't encode character '\xf8' in position 5: ordinal not in range
(128)
Why is this happening? Isn't chr in Python 3 supposed to handle code points above 128?
Too much work:
>>> import binascii
>>> binascii.unhexlify(s)
b'Ao\x1cy\x18\xf8:O\x19"\xd8m\xf5\xe7\x83H'
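To expand slightly: chr() itself accepts any code point; the UnicodeEncodeError is raised by print() when the console encoding is ASCII and cannot represent characters like '\xf8'. Decoding the hex into bytes avoids building such a string in the first place:

```python
import binascii

s = '416f1c7918f83a4f1922d86df5e78348'
raw = binascii.unhexlify(s)
assert raw == b'Ao\x1cy\x18\xf8:O\x19"\xd8m\xf5\xe7\x83H'

# the hand-rolled version builds the same values as Unicode code points
x = ''.join(chr(b) for b in raw)
assert x.encode('latin-1') == raw  # printing x is what fails on an ASCII terminal
```

Working with the bytes object directly (or decoding it with a known codec) keeps the data out of the terminal-encoding path entirely.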
