'ascii' codec can't decode byte 0xe2 - Python3 on Sublime Text 3 - python-3.x

This is my first question here.
I am running the following:
Sublime Text 3
OS 10
Python 3.6
I am trying to open and print a .txt file in Sublime. I am using the following code:
myfile = open("/stanford.txt", "r")
contents = myfile.read()
and this is the error message that I get:
Traceback (most recent call last):
  File "/Users/me/python/Test1.py", line 11, in <module>
    contents = myfile.read()
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 173: ordinal not in range(128)
[Finished in 0.1s with exit code 1]
The two lines of code above work in Sublime Text 3 when I run them with Python 2.7. They also work when I run them in Terminal (using Python 3). However, they do not work in IDLE: there I get exactly the same error message that Sublime Text 3 gives me when I run them with Python 3.
This is my build configuration on Sublime Text 3:
{
    "cmd": ["/Library/Frameworks/Python.framework/Versions/3.6/bin/python3", "$file"],
    "selector": "source.python",
    "file_regex": "file \"(...*?)\", line([0-9]+)"
}
I have also tried adding "PYTHONIOENCODING" but still the same error message:
{
    "cmd": ["/Library/Frameworks/Python.framework/Versions/3.6/bin/python3", "$file"],
    "selector": "source.python",
    "file_regex": "file \"(...*?)\", line([0-9]+)",
    "env": {"PYTHONIOENCODING": "utf8"}
}
(I have also tried "utf-8" with the dash instead of the above. Same error message).
What do I need to do so that Sublime Text 3 can read the file?
Thank you
Edit: I don't think this issue is a duplicate of the other? This works on Terminal for me but not on Sublime Text 3.
Edit2: I have noticed that if I remove the apostrophes (') contained in the text file, I get to open the file with no problems. The error only occurs when I add the apostrophes back to the text file.

Ok, I finally figured it out.
I added an encoding argument to the open() call:
myfile = open("/stanford.txt", "r", encoding="utf-8")
And it worked. I'm still trying to figure out a way to make this permanent for Python 3 in Sublime.
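A note on why this likely works, with a minimal sketch. In Python 3, open() without an encoding argument decodes with the locale's preferred encoding, which can be ASCII when the script is launched from a GUI editor that does not inherit the shell's LANG/LC_ALL settings. PYTHONIOENCODING only affects sys.stdin, sys.stdout and sys.stderr, not files opened with open(). Byte 0xe2 is the first byte of the UTF-8 encoding of the curly apostrophe (U+2019), which would fit Edit2. The sample file and its text below are made up for illustration.

```python
import locale
import os
import tempfile

# Encoding that open() falls back to when no encoding= is given.
print("default text encoding:", locale.getpreferredencoding(False))

# The curly apostrophe (U+2019) is three bytes in UTF-8, starting with 0xe2.
curly_bytes = "\u2019".encode("utf-8")
print(curly_bytes)  # b'\xe2\x80\x99'

# Hypothetical sample file standing in for stanford.txt.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Stanford\u2019s campus")

# Passing encoding= explicitly makes the read independent of the locale.
with open(path, "r", encoding="utf-8") as myfile:
    contents = myfile.read()

print(ascii(contents))
```

If the first line prints an ASCII-like encoding under the Sublime build but UTF-8 in Terminal, that would explain the difference. Setting a UTF-8 locale in the build system's "env" (for example "LANG": "en_US.UTF-8") is a commonly suggested way to make this permanent, though it is untested here.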

Related

How to print ダイスキ using Python 3.7 in Scite?

I'm using Win10 & Scite with utf-8 enabled output window.
The file is saved as UTF-8 with BOM
Script:
print('ダイスキ from python 3')
The script runs from the cmd prompt without error, but when run from Scite it produces an error:
Output:
>pythonw.exe -u "test.py"
Traceback (most recent call last):
  File "test.py", line 12, in <module>
    print('\u30c0\u30a4\u30b9\u30ad from python 3')
  File "D:\BIN\Python37\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1-2: character maps to <undefined>
>Exit code: 1
How do I properly print ダイスキ to stdout using Python 3 with Scite?
Updates:
I've edited the Scite Global Options file to support utf-8:
code.page=65001
I've tested C, Lua, and old Python 2.7, and they can print utf-8 strings (in the Scite output window).
This seems to be a Scite configuration error or maybe a Scite bug, because the Scite output window works with Lua and C but fails only with Python 3.
Scite uses popen() and pipes STDOUT.
Python 3.7 needs the environment variable "PYTHONIOENCODING" to be set manually, so you need to add an environment variable "PYTHONIOENCODING" set to "utf_8".
Try to do this:
print(u'ダイスキ')
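To illustrate what the traceback shows: when output is piped, Python picks an encoding for sys.stdout (cp1252 here), and that code page has no mapping for katakana. A rough sketch, using an in-memory stream in place of the Scite output pane; the helper name encode_like_stdout is made up for this example.

```python
import io

def encode_like_stdout(text, encoding):
    """Write text through a text stream with the given encoding; return the bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()

text = "\u30c0\u30a4\u30b9\u30ad"  # the four katakana characters from the question

try:
    encode_like_stdout(text, "cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False  # cp1252 cannot represent these characters

# The same text encodes without error once the stream uses UTF-8.
utf8_bytes = encode_like_stdout(text, "utf-8")
print(cp1252_ok, len(utf8_bytes))
```

On Python 3.7 and later, calling sys.stdout.reconfigure(encoding="utf-8") at the top of the script is an alternative to setting PYTHONIOENCODING when changing the environment is inconvenient.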

Stop Python adding " " when reading some .csv files

I have a number of .csv files which I am opening in Python 3. Some open fine and the script runs fine; with others I get the error below:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 5547: invalid start byte
If I tell Python to ignore errors as below
dataset = open('data.csv', 'r', errors='ignore')
the script then runs, but it adds quotation marks around each column header in the .csv, e.g.
"No.","Time","Source","Destination"
How can I open the .csv without the quotation marks, as per the others that already do this e.g. below
No.,Time,Source,Destination
I have tried running this on Linux Mint 18.3 with Python 3.6.4 and on Mac OS X with Python 3.6.3 and get the same results on both. I do not have a Windows PC to try.
Try stripping the quotes from the string, mate :)
a = "\"a\""
print(a.strip("\""))
or replace the " characters:
a.replace("\"", "")
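A different approach, sketched under assumptions. Byte 0x92 is the right single quotation mark in Windows-1252, so the failing files may be cp1252 rather than UTF-8 (an assumption; check where the files came from). The quotation marks are probably already in the file as CSV quoting rather than added by Python, and the csv module removes them while parsing. The sample data and file name below are invented.

```python
import csv
import os
import tempfile

# Hypothetical sample: quoted headers plus a cp1252 byte 0x92 (curly apostrophe).
raw = b'"No.","Time","Source","Destination"\r\n"1","0.0","host\x92s","10.0.0.1"\r\n'
path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(path, "wb") as f:
    f.write(raw)

# newline="" is what the csv docs recommend when opening files for csv.reader.
with open(path, "r", encoding="cp1252", newline="") as f:
    rows = list(csv.reader(f))

header = rows[0]
print(header)  # quotes are gone: ['No.', 'Time', 'Source', 'Destination']
```

Unlike strip(), which only touches the two ends of a whole line, csv.reader unquotes every field and also copes with commas inside quoted values.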

python error while reading large files from a folder to copy to another file

I'm trying to read the files in a folder and copy a specific part of each file to a new file using the Python code below, but I'm getting the error shown further down.
import glob
file=glob.glob("C:/Users/prasanth/Desktop/project/prgms/rank_free1/*.txt")
fp=[]
for b in file:
    fp.append(open(b,'r'))
s1=''
for f in fp:
    d=f.read().split('\t')
    rank=d[0]
    appname=d[1]
    appid=d[2]
    s1=appid+'\n'
    file=open('C:/Users/prasanth/Desktop/project/prgms/appids_file.txt','a',encoding="utf-8")
    file.write(s1)
    file.close()
I'm getting the following error message:
Traceback (most recent call last):
  File "appids.py", line 8, in <module>
    d=f.read().split('\t')
  File "C:\Users\prasanth\AppData\Local\Programs\Python\Python36-32\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 12307: character maps to <undefined>
From the traceback, at least one of the files you are opening contains a byte (0x8f) that cannot be decoded with the encoding Python is using by default here (cp1252), so it can't be read into a string variable without the right information about its encoding.
To handle this, you can open the file for reading in binary mode and take care of the problem in your script.
You could put d=f.read().split('\t') in a try: except: construct and reopen the file in binary mode in the except: branch, then handle the undecodable bytes in your script.
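The try/except idea above could look roughly like this. It is only a sketch: the helper name read_text_with_fallback and the choice of UTF-8 as the first attempt are assumptions, and undecodable bytes are replaced with U+FFFD rather than repaired.

```python
import os
import tempfile

def read_text_with_fallback(path, encoding="utf-8"):
    """Read a file as text; on a decode error, reread it as bytes and decode leniently."""
    try:
        with open(path, "r", encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError:
        with open(path, "rb") as f:
            data = f.read()
        # errors="replace" turns each undecodable byte into U+FFFD.
        return data.decode(encoding, errors="replace")

# Hypothetical tab-separated record containing the stray byte 0x8f from the traceback.
path = os.path.join(tempfile.mkdtemp(), "rank.txt")
with open(path, "wb") as f:
    f.write(b"1\tSome App\x8f\tcom.example.app\n")

fields = read_text_with_fallback(path).rstrip("\n").split("\t")
appid = fields[2]
print(appid)
```

Opening each file inside a with block, as here, also closes it promptly, which matters when a folder holds many large files.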

Python reading from non ascii file

I have a text file which contains the following character:
ÿ
To open the file I've tried both:
with open(file, "r") as myfile:
AND
with codecs.open(file, encoding='utf-8') as myfile:
with success. However when I try to read the file in as a string using:
file_string=myfile.read()
OR
file_string=myfile.readline()
I keep getting this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 11889: invalid start byte
Ideally I want it to ignore the character or substitute it with '' or whitespace.
I've come up with a solution: just use Python 2 instead of Python 3. I still can't seem to get it to work in Python 3, though.
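For the stated goal of ignoring or substituting the byte in Python 3, open() accepts an errors argument. A small sketch with an invented sample file follows. Byte 0xff is ÿ in Latin-1, so decoding as latin-1 may be the real fix if that is how the file was written (an assumption).

```python
import os
import tempfile

# Hypothetical file: ASCII text with one Latin-1 encoded byte 0xff in the middle.
path = os.path.join(tempfile.mkdtemp(), "input.txt")
with open(path, "wb") as f:
    f.write(b"abc\xffdef")

# Drop bytes that are not valid UTF-8.
with open(path, "r", encoding="utf-8", errors="ignore") as myfile:
    ignored = myfile.read()

# Substitute U+FFFD (the replacement character) for them instead.
with open(path, "r", encoding="utf-8", errors="replace") as myfile:
    replaced = myfile.read()

# Decode as Latin-1, which maps byte 0xff to U+00FF.
with open(path, "r", encoding="latin-1") as myfile:
    as_latin1 = myfile.read()

print(ascii(ignored), ascii(replaced), ascii(as_latin1))
```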

Python opening a txt file converted from pdf

I downloaded the pdf file from http://icdept.cgaux.org/pdf_files/English-Italian-Glossary-Nautical-Terms.pdf and converted it to a txt file using pdf2txt (downloaded from iTunes). I am trying to convert the contents of the file to a searchable Python dictionary (I am studying for an Italian sailing licence).
I am using the following simply to test whether I can get the text into a format that I can parse:
with open('English-Italian-Glossary-Nautical-Terms1.txt', 'r') as out_file:
    with open("nautical_glossary.txt", 'w') as in_file:
        for line in out_file:
            in_file.write(line)
but constantly get an error:
Traceback (most recent call last):
  File "/Users/admin/Desktop/untitled folder/nautical.py", line 4, in <module>
    for line in out_file:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)
I would appreciate some help understanding the error and a suggestion to resolve the problem.
I am not sure whether someone can suggest an obvious way to parse this particular file into a dictionary format?
This error tells you that the file's encoding is not the one Python expected. See Wikipedia on character encodings for background. In other words, the ascii codec doesn't know what byte 0xfe means.
You should find the correct encoding of the file and open it with that. I suspect it is utf-8, but I could be wrong. Did you try opening the file to see what it looks like?
Read this and try this:
with open('English-Italian-Glossary-Nautical-Terms1.txt', 'r') as out_file:
    with open("nautical_glossary.txt", 'w') as in_file:
        for line in out_file.readlines():
            in_file.write(line)
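One possibility worth checking, offered as a guess rather than a diagnosis: a byte 0xfe at position 0 is what the start of a UTF-16 big-endian byte order mark (0xFE 0xFF) looks like. The sketch below inspects the first bytes of a file and picks an encoding accordingly. The helper name guess_encoding_from_bom, the UTF-8 default and the sample file are all assumptions.

```python
import codecs
import os
import tempfile

def guess_encoding_from_bom(path, default="utf-8"):
    """Return an encoding name based on a leading byte order mark, if any."""
    with open(path, "rb") as f:
        head = f.read(3)
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if head.startswith(codecs.BOM_UTF16_BE) or head.startswith(codecs.BOM_UTF16_LE):
        return "utf-16"  # the utf-16 codec reads the BOM to pick the byte order
    return default

# Hypothetical sample standing in for the converted glossary file.
path = os.path.join(tempfile.mkdtemp(), "glossary.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF16_BE + "anchor - ancora\n".encode("utf-16-be"))

encoding = guess_encoding_from_bom(path)
with open(path, "r", encoding=encoding) as out_file:
    lines = out_file.readlines()

print(encoding, lines)
```

If the real file turns out to have no byte order mark, the helper just returns the default, and the encoding still has to be found some other way.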
