Python 'utf-8' codec stop message with IIS log - python-3.x

With the following python code
import csv
log_file = open('190415190514.txt', 'r')
all_data = csv.reader(log_file, delimiter=' ')
data = []
for row in all_data:
data.append(row)
to read a big file containing
2019-04-15 00:00:46 192.168.168.29 GET / - 443 - 192.168.168.80 Mozilla/5.0+(compatible;+PRTG+Network+Monitor+(www.paessler.com);+Windows) - 200 0 0 0
I get this error
File "main.py", line 5, in <module>
for row in datareader:
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1284: invalid start byte
I think there is no problem with the data file since it is a IIS log file. If there is any encoding issue, how can I locate that line? I am also not sure if my problem is the same this one.

Since you opened the file as 'r' instead of 'rb', python is trying to decode it as utf-8. The contents of the file are apparently not valid utf-8, so you're getting an erorr. You can find the line number of the offending line like this:
with open('190415190514.txt', 'rb') as f:
for i, line in enumerate(f):
try:
line.decode('utf-8')
except UnicodeDecodeError as e:
print (f'{e} at line {i+1}')
You probably should be passing errors or encoding to open. see: https://docs.python.org/3/library/functions.html#open

Related

Error while reading a csv file by using csv module in python3

When I am trying to read a csv file I am getting this type of error:
Traceback (most recent call last):
File "/root/Downloads/csvafa.py", line 4, in <module>
for i in a:
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
The code that i used:
import csv
with open('Book1.csv') as f:
a=csv.reader(f)
for i in a:
print(i)
i even tried to change the encoding to latin1:
import csv
with open('Book1.csv',encoding='latin1') as f:
a=csv.reader(f)
for i in a:
print(i)
After that i am getting this type of error message:
Traceback (most recent call last):
File "/root/Downloads/csvafa.py", line 4, in <module>
for i in a:
_csv.Error: line contains NUL
I am a beginner to python
This error is raised when we try to encode an invalid string. When Unicode string can’t be represented in this encoding (UTF-8), python raises a UnicodeEncodeError. You can try encoding: 'latin-1' or 'iso-8859-1'.
import pandas as pd
dataset = pd.read_csv('Book1.csv', encoding='ISO-8859–1')
It can also be that the data is compressed. Have a look at this answer.
I would try reading the file in utf-8 enconding
another solution might be this answer
It's still most likely gzipped data. gzip's magic number is 0x1f 0x8b, which is consistent with the UnicodeDecodeError you get.
You could try decompressing the data on the fly:
with open('destinations.csv', 'rb') as fd:
gzip_fd = gzip.GzipFile(fileobj=fd)
destinations = pd.read_csv(gzip_fd)

UnicodeDecodeError: charmap' codec can't decode byte 0x8f in position 756

I'm unable to retrieve the data from a Microsoft Excel document. I've tried using encoding 'Latin-1' or 'UTF-8' but when it gives me hundreds of \x00's in the terminal. Is there any way I can retrieve the data and output it to a text file?
This is what I'm running on the terminal and the error I get:
PS C:\Users\Andy-\Desktop> python.exe SRT411-Lab2.py Lab2Data.xlsx
Traceback (most recent call last):
File "SRT411-Lab2.py", line 9, in
lines = file.readlines()
File "C:\ProgramFiles\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756: character maps to <\undefined>
Any help is greatly appreciated!
#!/usr/bin/python3
import sys
filename = sys.argv[1]
print(filename)
file = open(filename, 'r')
lines = file.readlines()
file.close()
print(lines)
I'd probably convert the excel file to csv file and use pandas to parse it

PostgreSQL ANSI,Python SQL, utf-8' codec can't decode byte 0xa0

I am trying to run a sql query in python. In python 2 this used to work but now that I am using python 3 this no longer is working.
I get error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1: invalid start byte
Update, added in the 3 lines in the middle,also tried using 'windows-1252' here. same error:
conn_str = 'DSN=PostgreSQL30'
conn = pyodbc.connect('DSN=STACK_PROD')
###newly added
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
sql = "select * from stackoverflow where p_date = " + business_date
print("Query: " + sql)
crsr = conn.execute(sql)
traceback:
Traceback (most recent call last):
File "<ipython-input-2-b6db3f5e859e>", line 1, in <module>
runfile('//stack/overflow/create_extract_db_new.py', wdir='//stack/overflow')
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "//stack/overflow/create_extract_db_new.py", line 37, in <module>
crsr = conn.execute(sql)
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')
conn.setencoding(encoding='utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Dump Dsn Results, omitting my username, uid, password and server.:
[my_dsn]
Driver=C:\Program Files (x86)\psqlODBC\0903\bin\psqlodbc30a.dll
CommLog=0
Debug=0
Fetch=100
Optimizer=0
Ksqo=1
UniqueIndex=1
UseDeclareFetch=0
UnknownSizes=0
TextAsLongVarchar=1
UnknownsAsLongVarchar=0
BoolsAsChar=1
Parse=0
CancelAsFreeStmt=0
MaxVarcharSize=255
MaxLongVarcharSize=8190
ExtraSysTablePrefixes=dd_;
Description=my_dsn
Database=db_name
Port=9996
ReadOnly=0
ShowOidColumn=0
FakeOidIndex=0
RowVersioning=0
ShowSystemTables=0
Protocol=7.4
ConnSettings=
DisallowPremature=0
UpdatableCursors=1
LFConversion=1
TrueIsMinus1=0
BI=0
AB=0
ByteaAsLongVarBinary=0
UseServerSidePrepare=1
LowerCaseIdentifier=0
GssAuthUseGSS=0
SSLmode=disable
KeepaliveTime=-1
KeepaliveInterval=-1
PreferLibpq=-1
XaOpt=1
Error msg:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data
can anyone help me ?
When using PostgreSQL's Unicode driver you need to call setencoding and setdecoding as explained here.
# Python 3.x
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setencoding(encoding='utf-8')
If you are using PostgreSQL's "ANSI" driver then you may still need to call those methods to ensure that the correct single-byte character set (a.k.a. "code page", e.g., windows-1252) is used for SQL_CHAR.
What worked for me was using this line for connectiong via odbc instead, and I took out the encoding and decoding.
con = pyodbc.connect(r'DSN='+'STACK_PROD',autocommit=True)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1: invalid continuation byte

I'm trying to practice reading text from a file in python 3.5, but I keep getting an error message.
Here's my code:
# A program that reads a text file and prints its contents on the screen
# so that each row contains a line number
# A reminder for UTF-8 users:
def insert_line_number(line, line_number):
"""
A function that takes a line and a corresponding line number and joins them
together.
:param line: A line of text
:param line_number: A corresponding line number
:return: A line with teh line number at the beginning
"""
line_with_number = "{:3d} {:s}".format(line_number, line)
return line_with_number
def print_lines(file):
"""
A function that reads lines from a file and prints them with an added line
number.
:param file: The file whose lines are to be printed
:return: -
"""
line_number = 0
try:
# Going through the lines, adding line numbers and printing the
# resulting line
for line in file:
# Removing unnecessary characters from the end of the line
line = line.rstrip()
line_number += 1
# Creating a line with the appropriate line number
line_with_number = insert_line_number(line, line_number)
print(line_with_number)
except IOError:
print("Error reading the file. Program will close.")
# Our main
def main():
filename = input("Syötä tiedoston nimi: ")
try:
file = open(filename, "r", encoding="utf-8")
print_lines(file)
file.close()
except IOError:
print("Error opening the file. Program will close.")
# Program start
main()
...and here's the error message:
Traceback (most recent call last):
File "F:/.../5.7.1(tiedostojen_lukeminen).py", line 55, in <module>
main()
File "F:/.../5.7.1(tiedostojen_lukeminen).py", line 48, in main
print_lines(file)
File "F:/.../5.7.1(tiedostojen_lukeminen).py", line 32, in print_lines
for line in file:
File "C:\Program Files (x86)\Python35-32\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 1: invalid continuation byte
Process finished with exit code 1
Thomas K had this to say in another thread:
"To put the same thing another way: when you open a file for reading,
the encoding parameter needs to be the encoding the file is already
in, not the encoding you want (you select that when you write the
file)."
– Thomas K Nov 20 '12 at 13:02
Before posting this here I was specifically instructed to use the encoding="utf-8" -parameter inside the open()-function. Without the parameter the code runs just fine on my machine, but if I include it, I get the error above.
I'd be interested to know what is going on behind the scenes. Why doesn't my code work when the parameter is included?

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position

When I use open and read syntax to open and read file in Python 3 and change files encoding, but this error happened. I want to convert a text with any encoding to UTF-8 and save it.
"sin3" has an unknown encoding,
fh= open(sin3, mode="r", encoding='utf8')
ss= fh.read()
File "/usr/lib/python3.2/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
I used codecs and got this error:
fh= codecs.open(sin3, mode="r", encoding='utf8')
ss= fh.read()
File "/usr/lib/python3.2/codecs.py", line 679, in read
return self.reader.read(size)
File "/usr/lib/python3.2/codecs.py", line 482, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
Try this:
Open the csv file in Sublime text editor.
Save the file in utf-8 format.
In sublime, Click File -> Save with encoding -> UTF-8
Then, you can read your file as usual:
I would recommend using Pandas.
In Pandas, you can read it by using:
import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')
Try this:
fh = codecs.open(sin3, "r",encoding='utf-8', errors='ignore')
You can solve this problem by using Pandas library
import pandas as pd
data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')

Resources