pyqt5 fails to insert bytes - python-3.x

How do you insert a bytes object into a database? Whenever I attempt this I end up with an empty string (not NULL) being inserted.
with open(filepath, 'rb') as f:
    filehash = hashlib.md5(f.read()).hexdigest()
img_pkl = pickle.dumps(img, protocol=4)
record = self.tablemodel.record()
record.setValue('originfile_path', filepath)
record.setValue('originfile_hash', filehash)
record.setValue('image', img_pkl)
record.setValue('area', area)
self.tablemodel.insertRecord(-1, record)
The issue was noted on the mailing list but never addressed.
https://www.riverbankcomputing.com/pipermail/pyqt/2016-April/037260.html
It also turns out that it makes more sense to use the table model when you can, instead of prepared statements (the same mailing-list thread also noted a bug in PyQt regarding exec() vs exec_()).

You must explicitly convert to a QByteArray.
record.setValue('image', QtCore.QByteArray(img_pkl))
Note: you also have to convert numpy.float64 objects using float()
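For example, a minimal sketch of the insert from the question with both conversions applied (filepath, img, area, self.tablemodel and the column names are taken from the question; everything else is illustrative):
import hashlib
import pickle
from PyQt5 import QtCore

with open(filepath, 'rb') as f:
    filehash = hashlib.md5(f.read()).hexdigest()
img_pkl = pickle.dumps(img, protocol=4)

record = self.tablemodel.record()
record.setValue('originfile_path', filepath)
record.setValue('originfile_hash', filehash)
record.setValue('image', QtCore.QByteArray(img_pkl))  # wrap the pickled bytes in a QByteArray
record.setValue('area', float(area))                  # convert numpy.float64 to a plain float
self.tablemodel.insertRecord(-1, record)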

Related

adding getmtime() to db and getting TypeError: 'float' object is not iterable

I'm trying to add all the file names and last modification times for files in a specific directory to a database, in two different columns. This is the code I have:
def getlist(self):
    dbinput = self.txt_dest.get()
    conn = sqlite3.connect('dirdrill.db')
    for files in os.listdir(dbinput):
        for modtime in os.path.getmtime(dbinput):
            with conn:
                cur = conn.cursor()
                cur.execute("INSERT INTO tbl_files(col_filename, col_modinfo) values (?,?)", files, modtime)
    conn.commit()
    conn.close()
    getinfo(self)
I am getting the following error:
File "C:\Users\Ryan\Documents\GitHub\Tech-Academy-Projects\Python\dir_drill\dirdrill_func.py", line 69, in getlist
for modtime in os.path.getmtime(dbinput):
TypeError: 'float' object is not iterable
Do I need to convert the numbers that getmtime() returns into a human friendly format so that it is not a float? If so, how? Or should I be structuring the function a different way?
EDIT:
Thanks to some other help I found out that I just needed to change one line to modinfo = os.path.getmtime(dbinput). Did that and it solved the float error. However, now it says that I am offering too many arguments (max 2 and I have 3), even though I only see two. Onto the next bug... :-)
Found the answer to the original question and the additional error I discovered.
In the original code I needed to change the following:
for modtime in os.path.getmtime(dbinput):
to:
modtime = os.path.getmtime(dbinput)
For the follow-up error related to too many parameters, files and modtime needed to be wrapped in parentheses, i.e. passed as a single tuple, like so:
cur.execute("INSERT INTO tbl_files(col_filename, col_modinfo) values (?,?)", (files, modtime))
Hope this helps anyone else experiencing the same error.
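Putting both fixes together, a sketch of the corrected function might look like the following (self.txt_dest and getinfo are assumed from the question; note that it calls getmtime on each file path rather than on the directory, since the goal was the per-file modification time):
import os
import sqlite3

def getlist(self):
    dbinput = self.txt_dest.get()
    conn = sqlite3.connect('dirdrill.db')
    with conn:  # commits automatically on success
        cur = conn.cursor()
        for name in os.listdir(dbinput):
            modtime = os.path.getmtime(os.path.join(dbinput, name))  # mtime of each file, not of the directory
            cur.execute("INSERT INTO tbl_files(col_filename, col_modinfo) VALUES (?, ?)",
                        (name, modtime))  # both parameters passed as a single tuple
    conn.close()
    getinfo(self)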

python-snappy streaming data in a loop to a client

I would like to send multiple compressed arrays from a server to a client using python snappy, but I cannot get it to work after the first array. Here is a snippet for what is happening:
(sock is just the network socket that these are communicating through)
Server:
for i in range(n): #number of arrays to send
    val = items[i][1] #this is the array
    y = (json.dumps(val)).encode('utf-8')
    b = io.BytesIO(y)
    #snappy.stream_compress requires a file-like object as input, as far as I know.
    with b as in_file:
        with sock as out_file:
            snappy.stream_compress(in_file, out_file)
Client:
for i in range(n): #same n as before
    data = ''
    b = io.BytesIO()
    #snappy.stream_decompress requires a file-like object to write to, as far as I know
    snappy.stream_decompress(sock, b)
    data = b.getvalue().decode('utf-8')
    val = json.loads(data)
val = json.loads(data) works only on the first iteration, but afterwards it stops working. When I do a print(data), only the first iteration prints anything. I've verified that the server does flush and send all the data, so I believe it is a problem with how I receive the data.
I could not find a different way to do this. I searched and the only thing I could find is this post which has led me to what I currently have.
Any suggestions or comments?
with doesn't do what you think; refer to its documentation. It calls sock.__exit__() after the block has executed, which is not what you intended.
# what you wrote
with b as in_file:
with sock as out_file:
snappy.stream_compress(in_file, out_file)
# what you meant
snappy.stream_compress(b, sock)
By the way:
The line data = '' is redundant because it is reassigned anyway.
Adding to #paul-scharnofske's answer:
Likewise, on the receiving side: stream_decompress doesn't quit until end-of-file, which means it will read until the socket is closed. So if you send multiple separate compressed chunks, it will read all of them before finishing, which is not what you intend. Bottom line: you need to add "framing" around each chunk so that the receiving end knows where one chunk ends and the next one starts (a code sketch follows after these steps). One way to do that... For each array to be sent:
Create a io.BytesIO object with the json-encoded input as you're doing now
Create a second io.BytesIO object for the compressed output
Call stream_compress with the two BytesIO objects (you can write into a BytesIO in addition to reading from it)
Obtain the len of the output object
Send the length encoded as a 32-bit integer, say, with struct.pack("!I", length)
Send the output object
On the receiving side, reverse the process. For each array:
Read 4 bytes (the length)
Create a BytesIO object. Receive exactly length bytes, writing those bytes to the object
Create a second BytesIO object
Pass the received object as input and the second object as output to stream_decompress
json-decode the resulting output object
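A rough sketch of that framing, assuming sock is a file-like object wrapping the connection (with read/write/flush, e.g. from socket.makefile('rwb')); send_array and recv_array are just illustrative names:
import io
import json
import struct
import snappy

def send_array(sock, val):
    raw = io.BytesIO(json.dumps(val).encode('utf-8'))
    compressed = io.BytesIO()
    snappy.stream_compress(raw, compressed)        # compress into a buffer instead of the socket
    payload = compressed.getvalue()
    sock.write(struct.pack("!I", len(payload)))    # 4-byte big-endian length prefix
    sock.write(payload)
    sock.flush()

def recv_array(sock):
    length = struct.unpack("!I", sock.read(4))[0]  # read the length prefix
    compressed = io.BytesIO(sock.read(length))     # read exactly one framed chunk
    out = io.BytesIO()
    snappy.stream_decompress(compressed, out)
    return json.loads(out.getvalue().decode('utf-8'))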

Reportlab PDF creation with Python duplicating text

I am trying to automate the production of PDFs by reading data from a pandas data frame and writing it to a page of an existing PDF form using PyPDF2 and reportlab. The main meat of the program is here:
def pdfOperations(row, bp):
    packet = io.BytesIO()
    can = canvas.Canvas(packet, pagesize=letter)
    createText(row, can)
    packet.seek(0)
    new_pdf = PdfFileReader(packet)
    textPage = new_pdf.getPage(0)
    secondPage = bp.getPage(1)
    secondPage.mergePage(textPage)
    assemblePDF(frontPage, secondPage, row)
    del packet, can, new_pdf, textPage, secondPage

def main():
    df = openData()
    bp = readPDF()
    frontPage = bp.getPage(0)
    for ind in df.index:
        row = df.loc[ind]
        pdfOperations(row, bp)
This works fine for the first row of data and the first PDF generated, but for the subsequent ones all the text is overwritten, i.e. the second PDF contains text from both the first and the second iteration. I thought garbage collection would take care of all the in-memory changes, but that does not seem to be happening. Anyone know why?
I even tried forcing the objects to be deleted after the function has run its course, but no luck...
You read bp only once, before the loop. Then in the loop you obtain its second page via getPage(1) and merge stuff onto it. But since it's always the same object (bp), each iteration merges onto the same page, so all the merges done before add up.
While I don't find any way to create a "deepcopy" of a page in PyPDF2's docs, it should work to just create a new bp object for each iteration.
Somewhere in readPDF you must open your template PDF as a binary stream and pass that to PdfFileReader. Instead, you could read the data into a variable:
with open(filename, "rb") as f:
    bp_bin = f.read()
And from that, create a new PdfFileReader instance for each loop iteration:
for ind in df.index:
    row = df.loc[ind]
    bp = PdfFileReader(io.BytesIO(bp_bin))  # wrap the raw bytes in a fresh seekable stream for PdfFileReader
    pdfOperations(row, bp)
This should "reset" the secondPage every time without any additional file I/O overhead. Only the parsing is done again each time, but depending on the file size and contents, the time that takes may be low enough to live with.

Python 3.4: str: AttributeError: 'str' object has no attribute 'decode'

I have this code as part of a function that replaces badly encoded foreign characters in a string:
s = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s)
# b"String from an old database with weird mixed encodings"
I need a "real" string here, not bytes. But when I want to decode it, I get an exception:
s = "String from an old database with weird mixed encodings"
s = str(bytes(odbc_str.strip(), 'cp1252'))
s = s.replace('\\x82', 'é')
s = s.replace('\\x8a', 'è')
(...)
print(s.decode("utf-8"))
# AttributeError: 'str' object has no attribute 'decode'
Do you know why s is bytes here?
Why can't I decode it to a real string?
Do you know how to do it the clean way? (Today I return s[2:][:-1]. Working, but very ugly, and I would like to understand this behavior.)
Thanks in advance!
EDIT :
pypyodbc in Python 3 uses Unicode everywhere by default. That confused me. On connect, you can tell it to use ANSI instead.
con_odbc = pypyodbc.connect("DSN=GP", False, False, 0, False)
Then I can convert the returned values from cp850, which is the original codepage of the database:
str(odbc_str, "cp850", "replace")
No more need to manually replace each special character.
Thank you very much pepr
The printed b"String from an old database with weird mixed encodings" is not the representation of the string content. It is the value of the string content. As you did not pass the encoding argument to str()... (see the doc https://docs.python.org/3.4/library/stdtypes.html#str)
If neither encoding nor errors is given, str(object) returns object.__str__(), which is the “informal” or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).
This is what happened in your case. The b" is actually two characters that are part of the string content. You can also try:
s1 = 'String from an old database with weird mixed encodings'
print(type(s1), repr(s1))
by = bytes(s1, 'cp1252')
print(type(by), repr(by))
s2 = str(by)
print(type(s2), repr(s2))
and it prints:
<class 'str'> 'String from an old database with weird mixed encodings'
<class 'bytes'> b'String from an old database with weird mixed encodings'
<class 'str'> "b'String from an old database with weird mixed encodings'"
This is the reason why s[2:][:-1] works for you.
If you think more about it, then (in my opinion) you either want to get bytes or a bytearray from the database (if possible) and fix the bytes (see bytes.translate, https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#bytes.translate), or you successfully get the string (being lucky that there was no exception when constructing that string) and you want to replace the wrong characters with the correct ones (see also str.translate(), https://docs.python.org/3.4/library/stdtypes.html?highlight=translate#str.translate).
Possibly, the ODBC layer internally used the wrong encoding. (That is, the content of the database may be correct, but it was misinterpreted by the ODBC layer, and you are not able to tell it what the correct encoding is.) Then you want to encode the string back to bytes using that wrong encoding, and then decode the bytes using the right encoding.
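As an illustration of that encode-back-then-decode idea, a sketch that assumes, as in the edit above, that the data was really cp850 but was decoded by the driver as cp1252:
# odbc_str is the mis-decoded string returned by the driver (assumption: cp850 data read as cp1252)
raw = odbc_str.encode('cp1252', 'replace')   # undo the wrong decoding to recover the original bytes
fixed = raw.decode('cp850', 'replace')       # decode again with the database's real codepage
print(fixed)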

Extracting source code from html file using python3.1 urllib.request

I'm trying to obtain data using regular expressions from an HTML file, by implementing the following code:
import urllib.request
def extract_words(wdict, urlname):
uf = urllib.request.urlopen(urlname)
text = uf.read()
print (text)
match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
which returns an error:
File "extract.py", line 33, in extract_words
match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
File "/usr/lib/python3.1/re.py", line 192, in findall
return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
Upon experimenting further in the IDLE, I noticed that uf.read() indeed returns the HTML source code the first time I invoke it. But from then onwards, it returns an empty bytes object, b''. Is there any way to get around this?
uf.read() will only read the contents once. Then you have to close it and reopen it to read it again. This is true for any kind of stream. This is however not the problem.
The problem is that reading from any kind of binary source, such as a file or a webpage, will return the data as a bytes type, unless you specify an encoding. But your regexp is not specified as a bytes type, it's specified as a unicode str.
The re module will quite reasonably refuse to use unicode patterns on byte data, and the other way around.
The solution is to make the regexp pattern a bytes string, which you do by putting a b in front of it. Hence:
match = re.findall(b"<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
Should work. Another option is to decode the text so it also is a unicode str:
encoding = uf.headers.get_content_charset()  # in Python 3 the headers object is an email.message.Message
text = text.decode(encoding)
match = re.findall("<tr>\s*<td>([\w\s.;'(),-/]+)</td>\s+<td>([\w\s.,;'()-/]+)</td>\s*</tr>", text)
(Also, to extract data from HTML, I would say that lxml is a better option).
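For reference, a small sketch of that lxml route (wdict and urlname are the names from the question; a two-cell table row layout matching the regexp, and storing the first cell as key and the second as value, are assumptions):
import lxml.html

def extract_words(wdict, urlname):
    doc = lxml.html.parse(urlname)  # lxml handles the encoding declared by the page
    for row in doc.xpath('//tr'):
        cells = [td.text_content().strip() for td in row.xpath('td')]
        if len(cells) == 2:         # keep only rows with exactly two <td> cells, as in the regexp
            wdict[cells[0]] = cells[1]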
