PostgreSQL ANSI, Python SQL: 'utf-8' codec can't decode byte 0xa0 - python-3.x

I am trying to run a SQL query in Python. This used to work in Python 2, but now that I am using Python 3 it no longer works.
I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 1: invalid start byte
Update: I added the three lines in the middle and also tried 'windows-1252' there. Same error:
conn_str = 'DSN=PostgreSQL30'
conn = pyodbc.connect('DSN=STACK_PROD')
###newly added
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
sql = "select * from stackoverflow where p_date = " + business_date
print("Query: " + sql)
crsr = conn.execute(sql)
traceback:
Traceback (most recent call last):
File "<ipython-input-2-b6db3f5e859e>", line 1, in <module>
runfile('//stack/overflow/create_extract_db_new.py', wdir='//stack/overflow')
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "//stack/overflow/create_extract_db_new.py", line 37, in <module>
crsr = conn.execute(sql)
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-16')
conn.setencoding(encoding='utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
conn.setencoding(encoding='utf-8')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
Also tried:
conn.setdecoding(pyodbc.SQL_CHAR, encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WCHAR, encoding='windows-1252')
conn.setencoding(encoding='windows-1252')
conn.setdecoding(pyodbc.SQL_WMETADATA, encoding='windows-1252')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd5 in position 1: invalid continuation byte
DSN dump results, omitting my username, uid, password and server:
[my_dsn]
Driver=C:\Program Files (x86)\psqlODBC\0903\bin\psqlodbc30a.dll
CommLog=0
Debug=0
Fetch=100
Optimizer=0
Ksqo=1
UniqueIndex=1
UseDeclareFetch=0
UnknownSizes=0
TextAsLongVarchar=1
UnknownsAsLongVarchar=0
BoolsAsChar=1
Parse=0
CancelAsFreeStmt=0
MaxVarcharSize=255
MaxLongVarcharSize=8190
ExtraSysTablePrefixes=dd_;
Description=my_dsn
Database=db_name
Port=9996
ReadOnly=0
ShowOidColumn=0
FakeOidIndex=0
RowVersioning=0
ShowSystemTables=0
Protocol=7.4
ConnSettings=
DisallowPremature=0
UpdatableCursors=1
LFConversion=1
TrueIsMinus1=0
BI=0
AB=0
ByteaAsLongVarBinary=0
UseServerSidePrepare=1
LowerCaseIdentifier=0
GssAuthUseGSS=0
SSLmode=disable
KeepaliveTime=-1
KeepaliveInterval=-1
PreferLibpq=-1
XaOpt=1
Error msg:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 3: unexpected end of data
Can anyone help me?

When using PostgreSQL's Unicode driver you need to call setencoding and setdecoding as explained here.
# Python 3.x
cnxn.setdecoding(pyodbc.SQL_CHAR, encoding='utf-8')
cnxn.setdecoding(pyodbc.SQL_WCHAR, encoding='utf-8')
cnxn.setencoding(encoding='utf-8')
If you are using PostgreSQL's "ANSI" driver then you may still need to call those methods to ensure that the correct single-byte character set (a.k.a. "code page", e.g., windows-1252) is used for SQL_CHAR.
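The difference between the two drivers comes down to which codec can interpret the raw bytes. Here is a minimal sketch, using only the built-in bytes.decode (no pyodbc involved), of how the byte 0xa0 from the question behaves under both encodings. The helper decode_report and the sample value are hypothetical names made up for illustration.

```python
def decode_report(raw, encodings=("utf-8", "windows-1252")):
    """Return {encoding: decoded text or error reason} for the given bytes."""
    report = {}
    for enc in encodings:
        try:
            report[enc] = raw.decode(enc)
        except UnicodeDecodeError as exc:
            # exc.reason is the short message, e.g. "invalid start byte"
            report[enc] = f"error: {exc.reason}"
    return report


# Hypothetical column value containing the byte 0xa0 at position 1
sample = b"a\xa0b"
report = decode_report(sample)
print(report)
```

In windows-1252 the byte 0xa0 is a non-breaking space, so if the "ANSI" driver is returning single-byte data, a UTF-8 decoder would be expected to reject it in exactly this way.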

What worked for me was using this line to connect via ODBC instead; I also took out the encoding and decoding calls.
con = pyodbc.connect(r'DSN='+'STACK_PROD',autocommit=True)

Related

I keep getting UnicodeDecodeError even after changing encoding types

I did import sys and checked that the default encoding was already utf-8.
This is my function. The three open calls, from top to bottom, are what I tried, each with the error I got.
def get_file_data(self, filename, connection_string = "", readlines=False):
    file_data = None
    tmp_file_path = copyfile.make_temp_path()
    try:
        if connection_string:
            source_file = copyfile.join(connection_string, filename)
        else:
            source_file = filename
        copyfile.copyfile(source_file, tmp_file_path)
        tmp_file = open(tmp_file_path, "r") # gives error ('ascii' codec can't decode byte 0x9a in position 10: ordinal not in range(128))
        tmp_file = open(tmp_file_path, 'r', encoding="utf-8") # gives error ('utf-8' codec can't decode byte 0x9a in position 10: invalid start byte)
        tmp_file = open(tmp_file_path, 'r', encoding="utf-16") # gives error ('utf-16-le' codec can't decode byte 0x00 in position 156: truncated data)
        if readlines:
            file_data = tmp_file.readlines()
        else:
            file_data = tmp_file.read()
        tmp_file.close()
    finally:
        os.remove(tmp_file_path)
    return file_data
Traceback (most recent call last):
File "./leds/bin/tx_crash_accident_import.py", line 56, in task
zip_file = self.get_remote_file_data(zip_filename, connection_str)
File "/leds/leds/python/interfaces/interfaces_import/interfaces_import.py", line 1837, in get_remote_file_data
file_data = tmp_file.read()
File "/opt/leds_py3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 13:43:44 'ascii' codec can't decode byte 0x9a in position 10: ordinal not in range(128)
None
warn: 13:43:44 Traceback (most recent call last):
File "./leds/bin/tx_crash_accident_import.py", line 56, in task
zip_file = self.get_remote_file_data(zip_filename, connection_str)
File "/leds/leds/python/interfaces/interfaces_import/interfaces_import.py", line 1837, in get_remote_file_data
file_data = tmp_file.read()
File "/opt/leds_py3/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 13:43:44 'ascii' codec can't decode byte 0x9a in position 10: ordinal not in range(128)
What else can I try here? I read something about changing the system locales, but that would be my last resort.
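One thing that may be worth checking first: the traceback passes a variable named zip_filename, which suggests (this is an assumption) that the file is a zip archive rather than text. Binary data has no text encoding, so no encoding argument will make it decode. Opening with "rb" returns raw bytes and skips decoding entirely. A minimal sketch, with a hypothetical helper read_file_bytes and a throwaway archive built only for the demo:

```python
import os
import tempfile
import zipfile


def read_file_bytes(path):
    """Read a file without any decoding; returns a bytes object."""
    with open(path, "rb") as fh:
        return fh.read()


# Build a small zip archive in a temporary directory, then read it back raw.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.zip")  # hypothetical file name
    with zipfile.ZipFile(demo_path, "w") as zf:
        zf.writestr("records.txt", "sample record\n")
    raw = read_file_bytes(demo_path)

print(type(raw), raw[:4])
```

If the content turns out to be text after all, the bytes can still be decoded later with an explicit encoding once it is known.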

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756

I'm unable to retrieve the data from a Microsoft Excel document. I've tried using the encodings 'Latin-1' and 'UTF-8', but then it gives me hundreds of \x00's in the terminal. Is there any way I can retrieve the data and output it to a text file?
This is what I'm running on the terminal and the error I get:
PS C:\Users\Andy-\Desktop> python.exe SRT411-Lab2.py Lab2Data.xlsx
Traceback (most recent call last):
File "SRT411-Lab2.py", line 9, in <module>
lines = file.readlines()
File "C:\ProgramFiles\WindowsApps\PythonSoftwareFoundation.Python.3.7_3.7.1776.0_x64__qbz5n2kfra8p0\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 756: character maps to <undefined>
Any help is greatly appreciated!
#!/usr/bin/python3
import sys
filename = sys.argv[1]
print(filename)
file = open(filename, 'r')
lines = file.readlines()
file.close()
print(lines)
I'd probably convert the Excel file to a CSV file and use pandas to parse it.
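An .xlsx workbook is a zip-based container, not plain text, which is why reading it line by line in text mode fails. Below is a minimal sketch of the second half of that suggestion, assuming the workbook has already been exported to CSV (for example with "Save As" in Excel). It uses the standard-library csv module rather than pandas so that it runs without extra packages; pandas.read_csv would be the pandas counterpart. The file name and the helper read_csv_rows are hypothetical.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as lists of strings."""
    # newline="" lets the csv module handle line endings itself
    with open(path, newline="", encoding=encoding) as fh:
        return list(csv.reader(fh))


# Stand-in for a CSV exported from the workbook
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Lab2Data.csv")  # hypothetical export
    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        fh.write("name,score\nAna,90\n")
    rows = read_csv_rows(csv_path)

print(rows)
```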

Python 'utf-8' codec stop message with IIS log

With the following python code
import csv
log_file = open('190415190514.txt', 'r')
all_data = csv.reader(log_file, delimiter=' ')
data = []
for row in all_data:
    data.append(row)
to read a big file containing
2019-04-15 00:00:46 192.168.168.29 GET / - 443 - 192.168.168.80 Mozilla/5.0+(compatible;+PRTG+Network+Monitor+(www.paessler.com);+Windows) - 200 0 0 0
I get this error
File "main.py", line 5, in <module>
for row in datareader:
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 1284: invalid start byte
I think there is no problem with the data file since it is an IIS log file. If there is an encoding issue, how can I locate that line? I am also not sure if my problem is the same as this one.
Since you opened the file as 'r' instead of 'rb', Python is trying to decode it as utf-8. The contents of the file are apparently not valid utf-8, so you're getting an error. You can find the line number of the offending line like this:
with open('190415190514.txt', 'rb') as f:
    for i, line in enumerate(f):
        try:
            line.decode('utf-8')
        except UnicodeDecodeError as e:
            print(f'{e} at line {i+1}')
You should probably be passing errors or encoding to open. See: https://docs.python.org/3/library/functions.html#open
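As a minimal sketch of that last suggestion, the errors argument of open can be set to "replace" so that undecodable bytes become U+FFFD instead of raising. The helper read_log_rows and the one-line sample log are hypothetical; whether silently replacing bytes is acceptable depends on what the log is used for.

```python
import csv
import os
import tempfile


def read_log_rows(path, encoding="utf-8", errors="replace"):
    """Parse a space-delimited log, replacing bytes that fail to decode."""
    with open(path, "r", encoding=encoding, errors=errors, newline="") as fh:
        return list(csv.reader(fh, delimiter=" "))


# Sample line containing the byte 0x80, which is not valid UTF-8 on its own
with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = os.path.join(tmp_dir, "sample.log")  # hypothetical file name
    with open(log_path, "wb") as fh:
        fh.write(b"2019-04-15 GET /caf\x80 200\n")
    log_rows = read_log_rows(log_path)

print(log_rows)
```

Passing a different encoding (for example a single-byte code page, if that is what the server wrote) is the alternative when the replaced characters matter.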

How to convert large binary file into pickle dictionary in python?

I am trying to convert a large binary file that contains Arabic words with 300-dimensional vectors into a pickled dictionary.
What I have written so far is:
import pickle
ArabicDict = {}
with open('cc.ar.300.bin', encoding='utf-8') as lex:
    for token in lex:
        for line in lex.readlines():
            data = line.split()
            ArabicDict[data[0]] = float(data[1])
pickle.dump(ArabicDict,open("ArabicDictionary.p","wb"))
The error which I am getting is:
Traceback (most recent call last):
File "E:\Dataset", line 4, in <module>
for token in lex:
File "E:\lib\codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xba in position 0: invalid start byte
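The .bin suffix suggests a binary model file, which cannot be iterated as UTF-8 text; that would be consistent with the "invalid start byte" at position 0. Below is a minimal sketch, assuming the vectors are also available in a plain-text layout with one token per line followed by its components, optionally preceded by a "count dimension" header line. That layout, the file name demo.vec, and the helper load_text_vectors are assumptions, not taken from the question. The sketch stores the whole vector per token rather than only data[1].

```python
import os
import pickle
import tempfile


def load_text_vectors(path, encoding="utf-8"):
    """Map each token to its list of float components."""
    vectors = {}
    with open(path, encoding=encoding) as fh:
        for line in fh:
            parts = line.rstrip().split(" ")
            if len(parts) < 3:
                # Skip blank lines and a possible "count dimension" header
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


with tempfile.TemporaryDirectory() as tmp_dir:
    vec_path = os.path.join(tmp_dir, "demo.vec")  # hypothetical text file
    with open(vec_path, "w", encoding="utf-8") as fh:
        fh.write("2 3\nكتاب 0.1 0.2 0.3\nقلم 0.4 0.5 0.6\n")
    arabic_dict = load_text_vectors(vec_path)

    # Pickle the dictionary and load it back to confirm the round trip
    pickle_path = os.path.join(tmp_dir, "ArabicDictionary.p")
    with open(pickle_path, "wb") as fh:
        pickle.dump(arabic_dict, fh)
    with open(pickle_path, "rb") as fh:
        restored = pickle.load(fh)

print(len(restored))
```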

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position

When I open and read a file in Python 3 with open and read, specifying the file's encoding, I get the error below. I want to convert text in any encoding to UTF-8 and save it.
"sin3" has an unknown encoding,
fh= open(sin3, mode="r", encoding='utf8')
ss= fh.read()
File "/usr/lib/python3.2/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
I used codecs and got this error:
fh= codecs.open(sin3, mode="r", encoding='utf8')
ss= fh.read()
File "/usr/lib/python3.2/codecs.py", line 679, in read
return self.reader.read(size)
File "/usr/lib/python3.2/codecs.py", line 482, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte
Try this:
Open the CSV file in the Sublime Text editor.
Save the file in UTF-8 format.
In Sublime Text, click File -> Save with Encoding -> UTF-8.
Then, you can read your file as usual:
I would recommend using Pandas.
In Pandas, you can read it by using:
import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')
Try this:
fh = codecs.open(sin3, "r",encoding='utf-8', errors='ignore')
You can solve this problem by using the Pandas library:
import pandas as pd
data=pd.read_csv("C:\\Users\\akashkumar\\Downloads\\Customers.csv",encoding='latin1')
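For the original goal of converting a file of unknown encoding to UTF-8, one common approach is to try a short list of candidate encodings in order and re-save with the first one that decodes cleanly. A minimal sketch follows. The candidate list, the helper convert_to_utf8, and the file names are assumptions for illustration, and a clean decode does not prove the guess was right, so the output is worth checking by eye.

```python
import os
import tempfile


def convert_to_utf8(src_path, dst_path, candidates=("utf-8", "cp1252")):
    """Re-save src_path as UTF-8; return the encoding that decoded it."""
    with open(src_path, "rb") as fh:
        raw = fh.read()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate
        with open(dst_path, "w", encoding="utf-8", newline="") as out:
            out.write(text)
        return enc
    raise ValueError("none of the candidate encodings could decode the file")


with tempfile.TemporaryDirectory() as tmp_dir:
    src = os.path.join(tmp_dir, "input.txt")   # hypothetical input
    dst = os.path.join(tmp_dir, "output.txt")  # hypothetical output
    with open(src, "wb") as fh:
        fh.write(b"caf\xe9")  # not valid UTF-8; 0xe9 is "é" in cp1252
    used = convert_to_utf8(src, dst)
    with open(dst, encoding="utf-8") as fh:
        converted = fh.read()

print(used, converted)
```

Unlike errors='ignore', this keeps every character, at the cost of needing a sensible candidate list for the data at hand.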
