Fixing AttributeError: 'file' object has no attribute 'buffer' (Python3) - linux

Python 2.7 on Ubuntu. I tried to run a small Python script (a file converter) written for Python 3 and got this error:
$ python uboot_mdb_to_image.py < input.txt > output.bin
Traceback (most recent call last):
File "uboot_mdb_to_image.py", line 29, in <module>
ascii_stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='ascii', errors='strict')
AttributeError: 'file' object has no attribute 'buffer'
I suspect it's caused by syntax differences between Python 3 and Python 2. Here is the script itself:
#!/usr/bin/env python3
import sys
import io

BYTES_IN_LINE = 0x10  # Number of bytes to expect in each line

c_addr = None
hex_to_ch = {}
ascii_stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='ascii', errors='strict')
for line in ascii_stdin:
    line = line[:-1]  # Strip the linefeed (we can't strip all white
                      # space here, think of a line of 0x20s)
    data, ascii_data = line.split(" ", maxsplit=1)
    straddr, strdata = data.split(maxsplit=1)
    addr = int.from_bytes(bytes.fromhex(straddr[:-1]), byteorder='big')
    if c_addr != addr - BYTES_IN_LINE:
        if c_addr:
            sys.exit("Unexpected c_addr in line: '%s'" % line)
    c_addr = addr
    data = bytes.fromhex(strdata)
    if len(data) != BYTES_IN_LINE:
        sys.exit("Unexpected number of bytes in line: '%s'" % line)
    # Verify that the mapping from hex data to ASCII is consistent
    # (sanity check for transmission errors)
    for b, c in zip(data, ascii_data):
        try:
            if hex_to_ch[b] != c:
                sys.exit("Inconsistency between hex data and ASCII data in line (or the lines before): '%s'" % line)
        except KeyError:
            hex_to_ch[b] = c
    sys.stdout.buffer.write(data)
Can anyone advise how to fix this, please?

It's an old question, but since I've run into a similar issue and it came up first when googling the error...
Yes, it's caused by a difference between Python 3 and 2. In Python 3, sys.stdin is wrapped in io.TextIOWrapper. In Python 2 it's a file object, which doesn't have a buffer attribute. The same goes for stderr and stdout.
In this case, the same functionality in Python 2 can be achieved using the codecs module from the standard library:
import codecs
ascii_stdin = codecs.getreader("ascii")(sys.stdin, errors="strict")
However, this snippet provides an instance of codecs.StreamReader, not io.TextIOWrapper, so it may not be suitable in other cases. And, unfortunately, wrapping Python 2's stdin in io.TextIOWrapper isn't trivial - see Wrap an open stream with io.TextIOWrapper for more discussion on that.
The script in question has more Python 2 incompatibilities. Related to the issue in question, in Python 2 sys.stdout doesn't have a buffer attribute either, so the last line should be
sys.stdout.write(data)
Other things I can spot:
In Python 2, str.split doesn't accept maxsplit as a keyword argument. Pass it positionally instead: line.split(" ", 1).
int doesn't have a from_bytes method. But int(straddr[:-1], 16) is equivalent (and also works in Python 3).
The bytes type is Python 3 only. In Python 2, bytes is an alias for str, so bytes.fromhex isn't available.
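Putting the pieces above together, here is a minimal sketch of a version-agnostic way to set up the streams; the Python 2 branch uses the codecs workaround from above, and the names ascii_stdin and binary_stdout are just illustrative:

import sys
import io
import codecs

if sys.version_info[0] >= 3:
    # Python 3: unwrap the underlying buffer to control the text encoding explicitly
    ascii_stdin = io.TextIOWrapper(sys.stdin.buffer, encoding='ascii', errors='strict')
    binary_stdout = sys.stdout.buffer
else:
    # Python 2: sys.stdin is a plain file object, so wrap it with a codecs reader
    ascii_stdin = codecs.getreader('ascii')(sys.stdin, errors='strict')
    binary_stdout = sys.stdout  # already byte-oriented in Python 2

The rest of the script can then read from ascii_stdin and call binary_stdout.write(data) on both versions.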

Related

How can I open a random text file in Python?

I am trying to make a Python program that randomly selects a text file to open and outputs the contents of the randomly selected text file.
When I try running the code, I get this error:
Traceback (most recent call last):
File "//fileserva/home$/K59046/Documents/project/project.py", line 8, in
o = open(text, "r")
TypeError: expected str, bytes or os.PathLike object, not tuple
This is the code that I have written:
import os
import random
os.chdir('N:\Documents\project\doodoo')
a = os.getcwd()
print("current dir is",a)
file = random.randint(1, 4)
text = (file,".txt")
o = open(text, "r")
print (o.read())
Can somebody tell me what I am doing wrong?
As your error message says, your text variable is a tuple, not a string. You can use f-strings or string concatenation to solve this:
# string concatenation
text = str(file) + ".txt"
# f-strings
text = f"{file}.txt"
Your variable text is not what you expect. You currently create a tuple that could look like this: (2, ".txt"). If you want a string like "2.txt", you need to concatenate the two parts:
text = str(file) + ".txt"
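For completeness, a sketch of the corrected program with that one change applied (it assumes the files 1.txt through 4.txt actually exist in the directory):

import os
import random

os.chdir(r'N:\Documents\project\doodoo')  # raw string avoids backslash-escape surprises
print("current dir is", os.getcwd())

number = random.randint(1, 4)  # pick one of the four files at random
text = str(number) + ".txt"    # build the filename as a single string, not a tuple
with open(text, "r") as o:     # the context manager also closes the file for you
    print(o.read())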

pynag with python3.6 TypeError

I'm trying to read my nagios config data as follows:
pynag.Model.cfg_file = "path_to_nagios.cfg"
all_hosts = pynag.Model.Host.objects.all
This returns an error
TypeError: endswith first arg must be bytes or a tuple of bytes
From what I've read so far, it seems that it's related to how files are opened in Python 3.
Do you know how to correct this?
Thanks.
The fix was in the library code: parse_file() opens files in 'rb' mode. The reason this is an error in Python 3 and not Python 2 is that Python 2 treats bytes as an alias or synonym for str; it doesn't make a distinction between byte strings and Unicode strings as is done in Python 3.
In pynag/Parsers/__init__.py, change
lines = open(self.filename, 'rb').readlines()
to
lines = open(self.filename, 'r').readlines()
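As a minimal sketch of why the 'rb' mode trips up Python 3 (the sample line here is made up; the library compares lines against str values, per the traceback in the question):

line = b"define host {\n"     # what readlines() yields for a file opened with 'rb'
try:
    line.endswith("{\n")      # a str suffix tested against a bytes line
except TypeError as e:
    print(e)                  # endswith first arg must be bytes or a tuple of bytes
print(line.endswith(b"{\n"))  # True - bytes against bytes works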

Read data with multi delimiter in pyspark

I have an input file which looks like this and uses "|" as a multi-character delimiter:
162300111000000000106779"|"2005-11-16 14:12:32.860000000"|"1660320"|"0"|"2005-11-16 14:12:32.877000000"|""|""|""|""|""|""|""|"False"|"120600111000000000106776
I can read this type of record with a UDF as below:
inputDf = glueContext.sparkSession.read.option("delimiter", input_file_delimiter) \
    .csv("s3://" + landing_bucket_name + "/" + input_file_name)
udf = UserDefinedFunction(lambda x: re.sub('"', '', str(x)))
new_df = inputDf.select(*[udf(column).alias(column) for column in inputDf.columns])
But when I get an input file like this:
000/00"|"AE71501"|"Complaint for Attachment of Earnings Order"|"In accordance with section test of the Attachment of Test Act Test."|"Non-Test"|"Other non-test offences"|"N"|"Other Non-Test"|"Non-Test
I am getting the below exception while reading it, using the same UDF; my code fails at the exact same location where I have my UDF:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 66: ordinal not in range(128)
Any help on the below would be great:
- Optimized code to read both types of files, considering "|" as the separator.
- How my existing UDF can handle the second type of input records.
This is likely caused by running in Python 2.x which has two separate types for string-like objects (unicode strings and non-unicode strings, which are nowadays simply byte sequences).
Spark will read in your data (which are bytes, as there is no such thing as plain text), and decode the lines as a sequence of Unicode strings. When you call str on a Unicode string that has a codepoint that is not in the ASCII range of codepoints, Python 2 will produce an error:
# python2.7>>> unicode_str = u"ú"
>>> type(unicode_str)
<type 'unicode'>
>>> str(unicode_str)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 0: ordinal not in range(128)
The recommended path is that you work with Unicode strings (which is the default string object in Python 3) all throughout your program, except at the point where you either read/receive data (where you should provide a suitable encoding scheme, so that you can decode the raw bytes) and at the point where you write/send data (again, where you use an encoding to encode the data as a series of bytes). This is called “the Unicode sandwich”.
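As a minimal sketch of that sandwich in Python 2 (the filenames and the UTF-8 encoding are assumptions for illustration):

# Python 2: decode at the input edge, process unicode, encode at the output edge
with open("in.txt", "rb") as f:
    text = f.read().decode("utf-8")  # bytes -> unicode on the way in

result = text.upper()                # all processing happens on unicode objects

with open("out.txt", "wb") as f:
    f.write(result.encode("utf-8"))  # unicode -> bytes on the way out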
Many libraries, including Spark, already decode bytes and encode unicode strings for you. If you simply remove the call to str in your user defined function, your code will likely work:
#pyspark shell using Python 2.7
>>> spark.sparkContext.setLogLevel("OFF") # to hide the big Py4J traceback that is dumped to the console, without modifying the log4j.properties file
>>> from py4j.protocol import Py4JJavaError
>>> from pyspark.sql.types import *
>>> from pyspark.sql.functions import udf
>>> df = spark.read.csv("your_file.csv", sep="|")
>>> def strip_double_quotes_after_str_conversion(s):
... import re
... return re.sub('"', '', str(s))
...
>>> def strip_double_quotes_without_str_conversion(s):
... import re
... return re.sub('"', '', s)
...
>>> df.select(*[udf(strip_double_quotes_without_str_conversion, StringType())(column).alias(column) for column in df.columns]).show()
+------+-------+--------------------+--------------------+--------------------+----------------+---+--------------------+----+
| _c0| _c1| _c2| _c3| _c4| _c5|_c6| _c7| _c8|
+------+-------+--------------------+--------------------+--------------------+----------------+---+--------------------+----+
|037/02|TH68150|Aggravated vehicl...|Contrary to secti...|Theft of motor ve...|Vehicle offences| Y|Aggravated Vehicl...|37.2|
+------+-------+--------------------+--------------------+--------------------+----------------+---+--------------------+----+
>>> try:
... df.select(*[udf(strip_double_quotes_after_str_conversion, StringType())(column).alias(column) for column in df.columns]).show()
... except Py4JJavaError as e:
... print("That failed. Root cause: %s" % e.java_exception.getCause().getMessage().rsplit("\n", 2)[-2])
...
That failed. Root cause: UnicodeEncodeError: 'ascii' codec can't encode character u'\xfa' in position 78: ordinal not in range(128)
So, the solution to the experienced problem is simple: don’t use str in your UDF.
Note that Python 2.x will no longer be maintained as of January 1st 2020. You’d do well transitioning to Python 3.x before that. In fact, had you executed this in a Python 3 interpreter, you would not have experienced the issue at all.

why does file.tell() affect encoding?

Calling tell() while reading a GBK-encoded file of mine causes the next call to readline() to raise a UnicodeDecodeError. However, if I don't call tell(), it doesn't raise this error.
C:\tmp>hexdump badtell.txt
000000: 61 20 6B 0D 0A D2 BB B0-E3 a k......
C:\tmp>type test.py
with open(r'c:\tmp\badtell.txt', "r", encoding='gbk') as f:
    while True:
        pos = f.tell()
        line = f.readline();
        if not line: break
        print(line)
C:\tmp>python test.py
a k
Traceback (most recent call last):
File "test.py", line 4, in <module>
line = f.readline();
UnicodeDecodeError: 'gbk' codec can't decode byte 0xd2 in position 0: incomplete multibyte sequence
When I remove the f.tell() statement, it decodes successfully. Why?
I tried Python 3.4/3.5 x64 on Win7/Win10; the behavior is the same everywhere.
Anyone have any idea? Should I report a bug?
I have a big text file, and I really want to get file position ranges within it. Is there a workaround?
OK, there is a workaround. It works so far:
with open(r'c:\tmp\badtell.txt', "rb") as f:
    while True:
        pos = f.tell()
        line = f.readline()
        if not line: break
        line = line.decode("gbk").strip('\n')
        print(line)
I submitted an issue yesterday here: http://bugs.python.org/issue26990 - still no response yet.
I just replicated this on Python 3.4 x64 on Linux. Looking at the docs for TextIOBase, I don't see anything that says tell() causes problems with reading a file, so maybe it is indeed a bug.
b'\xd2'.decode('gbk')
gives an error like the one that you saw, but in your file that byte is followed by the byte BB, and
b'\xd2\xbb'.decode('gbk')
gives a value equal to '\u4e00', not an error.
I found a workaround that works for the data in your original question, but not for other data, as you've since found. Wish I knew why! I called seek() after every tell(), with the value that tell() returned:
pos = f.tell()
f.seek(pos)
line = f.readline()
An alternative to f.seek(f.tell()) is to use the SEEK_CUR mode of seek() to give the position. With an offset of 0, this does the same as the above code: it moves to the current position and returns that position (note that the io module must be imported for the SEEK_CUR constant).
pos = f.seek(0, io.SEEK_CUR)
line = f.readline()
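Putting that workaround back into the original loop gives a sketch like this (again, it helped for the file in the question, but reportedly not for all data):

with open(r'c:\tmp\badtell.txt', "r", encoding='gbk') as f:
    while True:
        pos = f.tell()
        f.seek(pos)          # re-seek to the position tell() just returned
        line = f.readline()
        if not line:
            break
        print(line)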

Python 3.2 TypeError - can't figure out what it means

I originally put this code through Python 2.7 but needed to move to Python 3.x because of work. I've been trying to figure out how to get this code to work in Python 3.2, with no luck.
import subprocess

cmd = subprocess.Popen('net use', shell=True, stdout=subprocess.PIPE)
for line in cmd.stdout:
    if 'no' in line:
        print(line)
I get this error
if 'no' in (line):
TypeError: Type str doesn't support the buffer API
Can anyone provide me with an answer as to why this is and/or some documentation to read?
Much appreciated.
Python 3 uses the bytes type in a lot of places where the encoding is not clearly defined. The stdout of your subprocess is a file object working with bytes data. So, you cannot check whether a str is contained in a bytes object, e.g.:
>>> 'no' in b'some bytes string'
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
'no' in b'some bytes string'
TypeError: Type str doesn't support the buffer API
What you need to do instead is test whether the bytes string contains another bytes string:
>>> b'no' in b'some bytes string'
False
So, back to your problem, this should work:
if b'no' in line:
    print(line)
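Alternatively, here is a sketch that decodes each line at the boundary so the rest of the loop keeps working with str; the encoding is an assumption, since 'net use' output depends on the console code page:

import subprocess

cmd = subprocess.Popen('net use', shell=True, stdout=subprocess.PIPE)
for raw in cmd.stdout:
    line = raw.decode('utf-8', errors='replace')  # bytes -> str at the boundary
    if 'no' in line:
        print(line)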
