Unable to read "Binary (application/octet-stream)" file in python? - python-3.x

I am trying to read data files from CIFAR-10 data set. I have downloaded it but I am unable to read the files.
The code I am using to read the file.
def unpickle(file):
    print(file)
    import pickle
    fo = open(file, 'rb')
    dict = cPickle.load(fo)
    fo.close()
    return dict
file = 'data_batch_1'
It shows this error:
Traceback (most recent call last):
  File "basiccnn.py", line 28, in <module>
    data1 = unpickle(file)
  File "basiccnn.py", line 23, in unpickle
    dict = cPickle.load(fo)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)

Since you're getting:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8b in position 6: ordinal not in range(128)
You seem to have an encoding issue. According to the documentation for pickle.load(), the default encoding is ASCII, which is likely why you're getting that error. The encoding argument controls how 8-bit strings pickled by Python 2 are decoded. Setting encoding to "bytes" fixes the issue:
data = pickle.load(fo, encoding='bytes')
Two more things:
cPickle was renamed to _pickle in Python 3, but you should really just use pickle.
It's terrible practice to name variables the same as built-in types. dict is the name of the dictionary data type. Use some other unambiguous name such as data instead.
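Putting these points together, a minimal sketch of a corrected loader might look like the following. It assumes the batch file was pickled by Python 2 (hence encoding='bytes'); the function name load_batch is only illustrative and does not come from the question.

```python
import pickle


def load_batch(path):
    """Load a pickle written by Python 2 without decoding its 8-bit strings."""
    # Binary mode is required: pickle data is bytes, not text.
    with open(path, 'rb') as fo:
        # encoding='bytes' leaves Python 2 str objects as bytes, so they
        # come back as b'...' instead of being decoded as ASCII.
        return pickle.load(fo, encoding='bytes')
```

If the file really was written by Python 2, the dictionary keys come back as bytes objects, so you would index the result with something like batch[b'data'] rather than batch['data'].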

Related

Loading a dictionary saved as a msgpack with symspell

I am trying to use symspell in Python to spellcheck some old Spanish texts. Because of that, I need a dictionary that has old Spanish words, so I downloaded the large dictionary they share here, which is a msgpack file.
According to the basic usage, I can load a dictionary using this code
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary = pkg_resources.resource_filename(
    "symspellpy", "dictionary.txt"
)
sym_spell.load_dictionary(dictionary, term_index=0, count_index=1)
as shown here
But when I try it with the msgpack file like this
sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary = pkg_resources.resource_filename(
    "symspellpy", "large_es.msgpack"
)
sym_spell.load_dictionary(dictionary, term_index=0, count_index=1)
I get this error
Traceback (most recent call last):
  File ".../utils/quality_check.py", line 24, in <module>
    sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)
  File ".../lib/python3.8/site-packages/symspellpy/symspellpy.py", line 346, in load_dictionary
    return self._load_dictionary_stream(
  File ".../lib/python3.8/site-packages/symspellpy/symspellpy.py", line 1122, in _load_dictionary_stream
    for line in corpus_stream:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 0: invalid continuation byte
I know this means the file is supposed to be a txt file, but does anyone have an idea how I can load a frequency dictionary stored in a msgpack file with symspell in Python?

Error while reading a csv file by using csv module in python3

When I am trying to read a csv file I am getting this type of error:
Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
  File "/usr/lib/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
The code that I used:
import csv
with open('Book1.csv') as f:
    a=csv.reader(f)
    for i in a:
        print(i)
I even tried changing the encoding to latin1:
import csv
with open('Book1.csv',encoding='latin1') as f:
    a=csv.reader(f)
    for i in a:
        print(i)
After that I get this error message:
Traceback (most recent call last):
  File "/root/Downloads/csvafa.py", line 4, in <module>
    for i in a:
_csv.Error: line contains NUL
I am a beginner in Python.
This error is raised when Python tries to decode bytes that are not valid in the expected encoding. When the file's bytes cannot be decoded as UTF-8, Python raises a UnicodeDecodeError. You can try encoding 'latin-1' or 'iso-8859-1':
import pandas as pd
dataset = pd.read_csv('Book1.csv', encoding='ISO-8859-1')
It can also be that the data is compressed. Have a look at this answer.
I would try reading the file in utf-8 encoding.
Another solution might be this answer.
It's still most likely gzipped data. gzip's magic number is 0x1f 0x8b, which is consistent with the UnicodeDecodeError you get (byte 0x8b in position 1).
You could try decompressing the data on the fly:
import gzip
import pandas as pd
with open('destinations.csv', 'rb') as fd:
    gzip_fd = gzip.GzipFile(fileobj=fd)
    destinations = pd.read_csv(gzip_fd)
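If you would rather check before guessing, the magic-number test described above can be sketched with the standard library alone. This is only an illustration: the helper name read_csv_rows and the default encoding are assumptions, not something from the question.

```python
import csv
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # the first two bytes of a gzip stream


def read_csv_rows(path, encoding='utf-8'):
    """Return all rows of a CSV file, decompressing it first if it is gzipped."""
    # Peek at the first two bytes to decide how to open the file.
    with open(path, 'rb') as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    # Text mode with newline='' is what the csv module expects as input.
    with opener(path, 'rt', encoding=encoding, newline='') as f:
        return list(csv.reader(f))
```

Calling read_csv_rows('Book1.csv') would then work whether or not the file is actually compressed, provided the decompressed content is in the given encoding.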

UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 5: ordinal not in range(128)

I'm using python 3.7 and keep getting an error when trying to load a pickle file
Here's the code:
import pickle
with open('tenIntensities.pkl','rb') as handle:
    tenIntensities = pickle.load(handle)
and I get:
Traceback (most recent call last):
  File "C:\Users\Shaun Ganju\Desktop\Coding\Textbook_work\Chapter_3_Wrangling_Spike_Trains.py", line 87, in <module>
    tenIntensities = pickle.load(handle)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x80 in position 5: ordinal not in range(128)
I don't have much coding experience (just the basics), and the textbook I'm following says nothing about this. I looked online for answers and concluded that the message appears because I did not use UTF-8 encoding, so I changed my code to:
import pickle
with open('tenIntensities.pkl', encoding='utf-8') as handle:
    tenIntensities = pickle.load(handle)
I got this message instead:
Traceback (most recent call last):
  File "C:\Users\Shaun Ganju\Desktop\Coding\Textbook_work\Chapter_3_Wrangling_Spike_Trains.py", line 87, in <module>
    tenIntensities = pickle.load(handle)
TypeError: a bytes-like object is required, not 'str'
I'm kinda stuck and any help would be appreciated.
I encountered the same problem while working on something similar. In some cases it works if you keep opening the file in binary mode and pass the encoding to pickle.load instead (here latin-1):
with open('tenIntensities.pkl', 'rb') as handle:
    tenIntensities = pickle.load(handle, encoding='latin-1')
This works too:
with open('tenIntensities.pkl', 'rb') as handle:
    tenIntensities = pickle.load(handle, encoding='bytes')
Warning: there are times when neither of these works, no matter what you try!
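To see what the two encodings actually change, here is a small sketch. It uses a hand-written protocol 0 payload that imitates what Python 2 would emit for a dictionary with an 8-bit string value; the payload and the name load_legacy are made up for illustration. Note that the encoding is an argument of the pickle loader, not of open().

```python
import pickle

# Hand-written protocol 0 payload imitating a Python 2 pickle of
# {'spikes': '\x80\x01'}, i.e. an 8-bit str value with a non-ASCII byte.
PY2_PAYLOAD = b"(dS'spikes'\nS'\\x80\\x01'\ns."


def load_legacy(payload, encoding):
    """Unpickle bytes, decoding Python 2 8-bit strings with the given encoding."""
    return pickle.loads(payload, encoding=encoding)


as_text = load_legacy(PY2_PAYLOAD, 'latin-1')  # keys and values become str
as_bytes = load_legacy(PY2_PAYLOAD, 'bytes')   # keys and values stay bytes
```

With 'latin-1' every byte maps to a character, so loading cannot fail on decoding; with 'bytes' nothing is decoded at all, and the dictionary keys become bytes objects. Which one is appropriate depends on what the original program stored in those strings.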

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756

I'm unable to retrieve the data from a Microsoft Excel document. I've tried using encoding 'Latin-1' or 'UTF-8', but then it gives me hundreds of \x00's in the terminal. Is there any way I can retrieve the data and output it to a text file?
This is what I'm running on the terminal and the error I get:
PS C:\Users\Andy-\Desktop> python.exe SRT411-Lab2.py Lab2Data.xlsx
Traceback (most recent call last):
  File "SRT411-Lab2.py", line 9, in <module>
    lines = file.readlines()
  File "C:\ProgramFiles\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756: character maps to <undefined>
Any help is greatly appreciated!
#!/usr/bin/python3
import sys
filename = sys.argv[1]
print(filename)
file = open(filename, 'r')
lines = file.readlines()
file.close()
print(lines)
I'd probably convert the Excel file to a CSV file and use pandas to parse it.
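Some background on why readlines() fails here: an .xlsx file is a zip archive of XML parts, not plain text, so no text encoding will turn it into readable lines. A rough way to confirm this with the standard library is sketched below; the helper name looks_like_xlsx is hypothetical, and the check is only a heuristic.

```python
import zipfile


def looks_like_xlsx(path):
    """Heuristic check: is this a zip archive that holds an Excel workbook?"""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Office Open XML workbooks keep the workbook part at this path.
        return 'xl/workbook.xml' in archive.namelist()
```

If the check returns True, reading the cells is a job for a spreadsheet library; for example, pandas provides read_excel, assuming an Excel engine such as openpyxl is installed.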

Rodeo UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)

I have just started using Python 3. I am trying to open a CSV file using the Rodeo IDE:
fp = open('Proteomics_Data.csv') # open file on read mode
lines = fp.read().split("\n") # create a list containing all lines
I am getting an error, which I paste below.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
---------------------------------------------------------------------------
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-347-aebf19dd596a> in <module>()
----> 1 lines = fp.read().split("\n")
/Users/alessandro/anaconda/lib/python3.6/encodings/ascii.py in decode(self, input, final)
24 class IncrementalDecoder(codecs.IncrementalDecoder):
25 def decode(self, input, final=False):
---> 26 return codecs.ascii_decode(input, self.errors)[0]
27
28 class StreamWriter(Codec,codecs.StreamWriter):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
What I have found so far is that the terminal might not be set to UTF-8, but apparently Python 3 does not need that. I am not sure if this could be a problem related to the IDE?
Try specifying the encoding in the open function call.
fp = open('Proteomics_Data.csv', encoding='utf-8')
When I save a file as CSV I actually get two options:
CSV UTF-8 (Comma delimited)(.csv) and Comma Separated Values(.csv)
If I save with the second option (Comma Separated Values) I don't need to add encoding='utf-8'
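One more detail: byte 0xef in position 0 is consistent with a UTF-8 byte order mark (EF BB BF), which Excel's "CSV UTF-8" option is known to write at the start of the file. With plain 'utf-8' the file opens, but the mark stays glued to the first field. A minimal sketch that handles both kinds of file, assuming the content is otherwise valid UTF-8 (the function name read_lines is illustrative):

```python
def read_lines(path):
    """Read a text file as UTF-8, dropping a leading byte order mark if present."""
    # 'utf-8-sig' decodes UTF-8 and skips the BOM (EF BB BF) when the file
    # starts with one; files without a BOM are read unchanged.
    with open(path, encoding='utf-8-sig') as fp:
        return fp.read().splitlines()
```

For example, lines = read_lines('Proteomics_Data.csv') would give the same list as the read().split("\n") approach in the question, minus the stray mark on the first line and the empty trailing entry.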
