UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756 - python-3.x

I'm unable to retrieve the data from a Microsoft Excel document. I've tried the encodings 'Latin-1' and 'UTF-8', but then I get hundreds of \x00's in the terminal. Is there any way I can retrieve the data and output it to a text file?
This is what I'm running on the terminal and the error I get:
PS C:\Users\Andy-\Desktop> python.exe SRT411-Lab2.py Lab2Data.xlsx
Traceback (most recent call last):
  File "SRT411-Lab2.py", line 9, in <module>
    lines = file.readlines()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756: character maps to <undefined>
Any help is greatly appreciated!
#!/usr/bin/python3
import sys
filename = sys.argv[1]
print(filename)
file = open(filename, 'r')
lines = file.readlines()
file.close()
print(lines)

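I'd probably convert the Excel file to a CSV file and use pandas to parse it. An .xlsx file is a ZIP archive of XML documents, not plain text, so no text encoding will decode it into readable lines.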
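Before choosing an encoding, it can help to check whether the file is text at all. Here is a minimal sketch, using only the standard library, that inspects a file's leading bytes. The function name `sniff_container` and the signature table are hypothetical and only cover two formats:

```python
# Leading "magic" bytes for two common containers. Illustrative, not exhaustive.
MAGIC_SIGNATURES = {
    b"PK\x03\x04": "zip",   # ZIP archives, which includes .xlsx workbooks
    b"\x1f\x8b": "gzip",    # gzip-compressed streams
}


def sniff_container(path):
    """Return 'zip', 'gzip', or None based on the first bytes of the file."""
    with open(path, "rb") as fh:   # binary mode: no decoding is attempted
        head = fh.read(4)
    for magic, kind in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return kind
    return None
```

If this returns 'zip' for a file named .xlsx, the file needs a spreadsheet reader (for example pandas.read_excel, which relies on an installed Excel engine such as openpyxl) rather than open(filename, 'r').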

Related

Error while reading a csv file by using csv module in python3

When I am trying to read a csv file I am getting this type of error:
Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
The code that I used:
import csv
with open('Book1.csv') as f:
    a = csv.reader(f)
    for i in a:
        print(i)
I even tried changing the encoding to latin1:
import csv
with open('Book1.csv', encoding='latin1') as f:
    a = csv.reader(f)
    for i in a:
        print(i)
After that I get this error message:
Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
_csv.Error: line contains NUL
I am a beginner in Python.
This error is raised when the bytes in the file are not valid in the encoding used to decode them. When a byte sequence can't be decoded as UTF-8, Python raises a UnicodeDecodeError. You can try the encoding 'latin-1' (also known as 'iso-8859-1'), which maps every byte to a character:
import pandas as pd
dataset = pd.read_csv('Book1.csv', encoding='ISO-8859-1')
It can also be that the data is compressed. Have a look at this answer.
I would try reading the file with UTF-8 encoding.
Another solution might be this answer.
It's still most likely gzipped data. gzip's magic number is 0x1f 0x8b, which is consistent with the UnicodeDecodeError you get.
You could try decompressing the data on the fly:
import gzip
import pandas as pd
with open('destinations.csv', 'rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)  # decompress while reading
    destinations = pd.read_csv(gzip_fd)
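The same idea can be sketched with only the standard library: check for the gzip magic number first, then read the rows with the csv module. The function name `read_csv_rows` and the default encoding are assumptions for illustration, not part of the original answers:

```python
import csv
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def read_csv_rows(path, encoding="utf-8"):
    """Read CSV rows from a file that may or may not be gzip-compressed."""
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    # gzip.open and the built-in open both accept text mode, encoding and newline
    opener = gzip.open if is_gzip else open
    # newline="" is what the csv module expects of text files it reads
    with opener(path, "rt", encoding=encoding, newline="") as fh:
        return list(csv.reader(fh))
```

This would explain both errors in the question: UTF-8 rejects the 0x8b byte, and Latin-1 accepts every byte, so the compressed stream's NUL bytes reach the csv module instead.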

Python 'utf-8' codec stop message with IIS log

With the following python code
import csv
log_file = open('190415190514.txt', 'r')
all_data = csv.reader(log_file, delimiter=' ')
data = []
for row in all_data:
    data.append(row)
to read a big file containing
2019-04-15 00:00:46 192.168.168.29 GET / - 443 - 192.168.168.80 Mozilla/5.0+(compatible;+PRTG+Network+Monitor+(www.paessler.com);+Windows) - 200 0 0 0
I get this error
  File "main.py", line 5, in <module>
    for row in datareader:
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1284: invalid start byte
I think there is no problem with the data file since it is an IIS log file. If there is an encoding issue, how can I locate the offending line? I am also not sure whether my problem is the same as this one.
Since you opened the file as 'r' instead of 'rb', Python is trying to decode it as UTF-8. The contents of the file are apparently not valid UTF-8, so you're getting an error. You can find the line number of the offending line like this:
with open('190415190514.txt', 'rb') as f:
    for i, line in enumerate(f):
        try:
            line.decode('utf-8')
        except UnicodeDecodeError as e:
            print(f'{e} at line {i+1}')
You should probably pass errors or encoding to open. See: https://docs.python.org/3/library/functions.html#open
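A minimal sketch of both options, assuming a hypothetical helper named `read_text_lenient`. Passing errors='replace' substitutes U+FFFD for each undecodable byte, while passing a different encoding reinterprets the bytes; cp1252 here is only a guess for a Windows-generated log, not something the question confirms:

```python
def read_text_lenient(path, encoding="utf-8", errors="replace"):
    """Read a text file without raising on bytes the codec cannot decode."""
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        return fh.read()


def demo(path):
    """Show how one stray 0x80 byte reads under two different choices."""
    with open(path, "wb") as fh:
        fh.write(b"caf\x80")
    as_utf8 = read_text_lenient(path)                       # 0x80 becomes U+FFFD
    as_cp1252 = read_text_lenient(path, encoding="cp1252")  # 0x80 is the euro sign
    return as_utf8, as_cp1252
```

Replacing bytes loses information, so it suits log inspection better than any step that must round-trip the data.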

How to convert large binary file into pickle dictionary in python?

I am trying to convert a large binary file that contains Arabic words with 300-dimensional vectors into a pickled dictionary.
What I have written so far is:
import pickle
ArabicDict = {}
with open('cc.ar.300.bin', encoding='utf-8') as lex:
    for token in lex:
        for line in lex.readlines():
            data = line.split()
            ArabicDict[data[0]] = float(data[1])
pickle.dump(ArabicDict,open("ArabicDictionary.p","wb"))
The error which I am getting is:
Traceback (most recent call last):
  File "E:\Dataset", line 4, in <module>
    for token in lex:
  File "E:\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte

Unable to read "Binary (application/octet-stream)" file in python?

I am trying to read data files from CIFAR-10 data set. I have downloaded it but I am unable to read the files.
The code I am using to read the file:
def unpickle(file):
    print(file)
    import pickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict
file = 'data_batch_1'
It shows this error:
Traceback (most recent call last):
  File "basiccnn.py", line 28, in <module>
    data1 = unpickle(file)
  File "basiccnn.py", line 23, in unpickle
    dict = cPickle.load(fo)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
Since you're getting:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
You seem to have an encoding issue. According to the pickle.loads() documentation, the default encoding is ASCII, which is likely why you're getting that error. Setting encoding to "bytes" fixes the issue:
data = pickle.load(fo, encoding='bytes')
Two more things:
cPickle was renamed to _pickle in Python 3, but you should really just use pickle.
It's terrible practice to give variables the same names as built-in types. dict is the name of the dictionary type. Use some other unambiguous name, such as data, instead.
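To see the effect of the encoding argument in isolation, here is a small sketch that unpickles a hand-built Python 2 style pickle. The byte string and the function name `load_py2_pickle` are made up for illustration and are not taken from the CIFAR-10 files:

```python
import pickle

# Hand-built protocol-2 pickle of a Python 2 `str` holding the bytes 8b 61 62.
# Opcodes: PROTO 2 (b"\x80\x02"), SHORT_BINSTRING of length 3 (b"U\x03"), STOP (b".").
PY2_STR_PICKLE = b"\x80\x02U\x03\x8bab."


def load_py2_pickle(data):
    """Unpickle Python 2 data, keeping 8-bit strings as bytes objects."""
    return pickle.loads(data, encoding="bytes")


def fails_with_default_encoding(data):
    """Return True if the default ASCII decoding raises UnicodeDecodeError."""
    try:
        pickle.loads(data)
    except UnicodeDecodeError:
        return True
    return False
```

With encoding='bytes', dictionary keys that were Python 2 strings also come back as bytes objects, so lookups on the loaded data need bytes keys.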

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position

When I open and read a file in Python 3 while specifying its encoding, this error occurs. I want to convert text in any encoding to UTF-8 and save it.
"sin3" has an unknown encoding,
fh= open(sin3, mode="r", encoding='utf8')
ss= fh.read()
  File "/usr/lib/python3.2/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
I used codecs and got this error:
fh= codecs.open(sin3, mode="r", encoding='utf8')
ss= fh.read()
  File "/usr/lib/python3.2/codecs.py", line 679, in read
    return self.reader.read(size)
  File "/usr/lib/python3.2/codecs.py", line 482, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
Try this:
Open the CSV file in the Sublime Text editor.
Save the file in UTF-8 format.
In Sublime, click File -> Save with Encoding -> UTF-8.
Then you can read your file as usual.
I would recommend using pandas.
In pandas, you can read it by using:
import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')
Try this (note that errors='ignore' silently drops any bytes that cannot be decoded):
fh = codecs.open(sin3, "r",encoding='utf-8', errors='ignore')
You can solve this problem by using the pandas library:
import pandas as pd
data = pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv", encoding='latin1')
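For the original goal of converting a file with an unknown encoding to UTF-8, one possible approach is to try a short list of candidate encodings in order and re-save with the first one that decodes. This is a sketch under assumptions: the function name `convert_to_utf8` and the candidate list are hypothetical, and a decode that succeeds does not prove the guess was right, so the output should be checked by eye:

```python
def convert_to_utf8(src_path, dst_path, candidates=("utf-8", "cp1252", "latin-1")):
    """Re-encode a file as UTF-8 using the first candidate that decodes it.

    Returns the name of the encoding that was used.
    """
    with open(src_path, "rb") as fh:
        raw = fh.read()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        # newline="" keeps the original line endings untouched
        with open(dst_path, "w", encoding="utf-8", newline="") as out:
            out.write(text)
        return enc
    raise ValueError("none of the candidate encodings could decode the file")
```

Because latin-1 maps every byte to a character, it never fails, so it belongs last in the list as a fallback rather than first.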
