Python 3: send an image via pipe / UDP, PIL errors, etc.

I'm having a difficult time figuring out how to send and receive an image (and display it, not save it) via a UDP socket or a named pipe (FIFO) on Linux.
With UDP sockets the problem is that the file size exceeds the maximum buffer size. Conceptually I could iterate over a bytearray of the image in buffer-size chunks, but I'm not sure how to implement this, or how to reconstruct the image on the other side.
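Something like the following is roughly what I have in mind for the UDP side, but it is only an untested sketch: the chunk size, address, and the b"END" sentinel are placeholders, and I know UDP gives no delivery or ordering guarantees.
import socket

import cv2
import numpy as np

CHUNK = 60000                  # stay under the ~64 KB UDP datagram limit
ADDR = ("127.0.0.1", 5005)     # placeholder address/port

def send_image(path):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ok, encoded = cv2.imencode(".png", cv2.imread(path))   # keep the PNG framing
    data = encoded.tobytes()
    for i in range(0, len(data), CHUNK):
        sock.sendto(data[i:i + CHUNK], ADDR)
    sock.sendto(b"END", ADDR)  # sentinel marking the last chunk

def recv_image():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(ADDR)
    chunks = []
    while True:
        packet, _ = sock.recvfrom(65535)
        if packet == b"END":
            break
        chunks.append(packet)
    buf = np.frombuffer(b"".join(chunks), dtype=np.uint8)
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)             # reconstructed image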
With named pipes, below is my code.
p1.py:
fifo_name = "fifoTest"
def Test():
img = cv2.imread("Lenna.png")
data = bytearray(img)
try:
os.mkfifo(fifo_name)
print("made fifo")
except FileExistsError:
print("fifo exists!")
with open(fifo_name, "wb", os.O_NONBLOCK) as f:
f.write(data)
f.close()
print("done!")
Some issues with p1: I can't seem to figure out when the writing to the pipe has finished, and often, no matter what happens on the other end of the pipe, this gives a BrokenPipeError.
p2.py:
import os
import io
from PIL import Image
import cv2
pipe = os.open("fifoTest", os.O_RDONLY, os.O_NONBLOCK)
# img is 117966 bytes, as given by the shell command: wc -c < Lenna.png
imgBytes = os.read(pipe, 117966)
#img = Image.open(io.BytesIO(imgBytes))
print(imgBytes)
print(len(imgBytes))
#cv2.imshow("img", img)
input("there?")
With p2 and those lines left commented out, I have no problems. After input() captures my keypress, I get no errors, but as mentioned, p1 errors with a broken pipe. When img = Image.open(io.BytesIO(imgBytes)) is uncommented, I get
img = Image.open(io.BytesIO(imgBytes))
File "/usr/local/lib/python3.5/dist-packages/PIL/Image.py", line 2585, in open
% (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f21b84b0a98>
I feel like this shouldn't be a hard problem; it's a really basic operation. I need this to happen at about 10 fps (which is why I'm not looking at other options). I have also gotten a "resource temporarily unavailable" error a few times, but I think the os.O_NONBLOCK flag fixed that.
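For the FIFO side, one pattern that might help (again an untested sketch, not the code I posted above; the 4-byte struct length prefix and the helper names are my own assumptions) is to write the length of the encoded image first, so the reader knows exactly how many bytes to consume:
import struct

import cv2
import numpy as np

FIFO = "fifoTest"   # assumes the FIFO has already been created with os.mkfifo

def write_image(path):
    ok, encoded = cv2.imencode(".png", cv2.imread(path))
    payload = encoded.tobytes()
    with open(FIFO, "wb") as f:                    # blocks until a reader opens the FIFO
        f.write(struct.pack("!I", len(payload)))   # 4-byte big-endian length prefix
        f.write(payload)

def read_image():
    with open(FIFO, "rb") as f:
        (length,) = struct.unpack("!I", f.read(4))
        payload = f.read(length)                   # buffered read loops until length bytes arrive
    buf = np.frombuffer(payload, dtype=np.uint8)
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)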
Things that I have already looked at (some were helpful, but I need to display an image once it's received):
Sending image over sockets (ONLY) in Python, image can not be open
Send image using socket programming Python
File Send by UDP Sockets
PIL: Convert Bytearray to Image

Related

Why does zlib decompression break after an HTTP request is reinitiated?

I have a python script that "streams" a very large gzip file using urllib3 and feeds it into a zlib.decompressobj. This zlib decompression object is configured to read gzip compression. If this initial http connection is interrupted then the zlib.decompressobj begins to throw errors after the connection is "resumed". See my source code below if you want to cut to the chase.
These errors occur despite the fact that the script initiates a new HTTP connection using the Range header to specify the number of bytes previously read. It resumes from the point that had been completely read when the connection was broken. I believe this arbitrary resume point is the source of my problem.
If I don't try to decompress the chunks of data being read in by urllib3, but instead just write them to a file, everything works just fine. Without trying to decompress the stream everything works even when there is an interruption. The completed archive is valid, it is the same size as one downloaded by a browser and the MD5 hash of the .gz file is the same as if I had downloaded the file directly with Chrome.
On the other hand, if I try to decompress the chunks of data coming in after the interruption, even with the Range header specified, the zlib library throws all kinds of errors. The most recent was Error -3 while decompressing data: invalid block type
Additional Notes:
The site that I am using has the Accept-Ranges header set to bytes, meaning that I am able to submit modified Range headers to the server.
I am not using the requests library in this script as it ultimately manages urllib3. I am instead using urllib3 directly in an attempt to cut out the middle man.
This script is an oversimplification of my ultimate goal, which is to stream the compressed data directly from where it is hosted, enrich it, and store it in a MySQL database on the local network.
I am heavily resource constrained inside of the docker container where this processing will occur.
The genesis of this question is present in a question I asked almost 3 weeks ago: requests.iter_content() thinks file is complete but it's not
The most common problem I am encountering with the urllib3 (and requests) library is the IncompleteRead(self._fp_bytes_read, self.length_remaining) error.
This error only appears if the urllib3 library has been patched to raise an exception when an incomplete read occurs.
My best guess:
I am guessing that the break in the data stream being fed to zlib.decompressobj is causing zlib to somehow lose context and start attempting to decompress the data again at an odd location. Sometimes it will resume, but the data stream is garbled, making me believe the byte location used as the new Range header fell at the front of some bytes which are then incorrectly interpreted as headers. I do not know how to counteract this and I have been trying to solve it for several weeks. The fact that the data are still valid when downloaded whole (without being decompressed prior to completion), even when an interruption occurs, makes me believe that some "loss of context" within zlib is the cause.
Source Code: (Has been updated to include a "buffer")
This code is a little bit slapped together so forgive me. Also, this target gzip file is quite a lot smaller than the actual file I will be using. Additionally, the target file in this example will no longer be available from Rapid7 in about a month's time. You may choose to substitute a different .gz file if that suits you.
import urllib3
import certifi
import inspect
import os
import time
import zlib

def patch_urllib3():
    """Set urllib3's enforce_content_length to True by default."""
    previous_init = urllib3.HTTPResponse.__init__
    def new_init(self, *args, **kwargs):
        previous_init(self, *args, enforce_content_length = True, **kwargs)
    urllib3.HTTPResponse.__init__ = new_init

#Patch the urllib3 module to throw an exception for IncompleteRead
patch_urllib3()

#Set the target URL
url = "https://opendata.rapid7.com/sonar.http/2021-11-27-1638020044-http_get_8899.json.gz"
#Set the local filename
local_filename = '2021-11-27-1638020044-http_get_8899_script.json.gz'
#Configure the PoolManager to handle https (I think...)
http = urllib3.PoolManager(ca_certs=certifi.where())
#Initiate start bytes at 0 then update as download occurs
sum_bytes_read=0
session_bytes_read=0
total_bytes_read=0
#Dummy variable to silence console output from file write
writer=0
#Set zlib window bits to 16 bits for gzip decompression
decompressor = zlib.decompressobj(zlib.MAX_WBITS|16)
#Build a buffer list
buf_list=[]
i=0

while True:
    print("Building request. Bytes read:",total_bytes_read)
    resp = http.request(
        'GET',
        url,
        timeout=urllib3.Timeout(connect=15, read=40),
        preload_content=False)
    print("Setting headers.")
    #This header should cause the request to resume at "total_bytes_read"
    resp.headers['Range'] = 'bytes=%s' % (total_bytes_read)
    print("Local filename:",local_filename)
    #If file already exists then append to it
    if os.path.exists(local_filename):
        print("File already exists.")
        try:
            print("Starting appended download.")
            with open(local_filename, 'ab') as f:
                for chunk in resp.stream(2048):
                    buf_list.append(chunk)
                    #Use i to offset the chunk being read from the "buffer"
                    #I.E. load 3 chunks (0,1,2) in the buffer list before starting to read from it
                    if i > 2:
                        buffered_chunk=buf_list.pop(0)
                        writer=f.write(buffered_chunk)
                        #Comment out the below line to stop the error from occurring.
                        #File download should complete successfully even if interrupted when the following line is commented out.
                        decompressed_chunk=decompressor.decompress(buffered_chunk)
                    #Increment i so that the buffer list will fill before reading from it
                    i=i+1
                    session_bytes_read = resp._fp_bytes_read
                    #Sum bytes read is an updated value that isn't stored. It is only used for console print
                    sum_bytes_read = total_bytes_read + session_bytes_read
                    print("[+] Bytes read:",str(format(sum_bytes_read, ",")), end='\r')
            print("\nAppended download complete.")
            break
        except Exception as e:
            print(e)
            #Add the current session bytes to the total bytes read each time the loop needs to repeat
            total_bytes_read=total_bytes_read+session_bytes_read
            print("Bytes Read:",total_bytes_read)
            #Mod the total_bytes back to the nearest chunk size so it can be re-requested
            total_bytes_read=total_bytes_read-(total_bytes_read%2048)-2048
            print("Rounded bytes Read:",total_bytes_read)
            #Pop the last entry off of the buffer since it may be incomplete
            buf_list.pop()
            #Reset i so that the buffer has to be rebuilt
            i=0
            print("Sleeping for 30 seconds before re-attempt...")
            time.sleep(30)
    #If file doesn't already exist then write to it directly
    else:
        print("File does not exist.")
        try:
            print("Starting initial download.")
            with open(local_filename, 'wb') as f:
                for chunk in resp.stream(2048):
                    buf_list.append(chunk)
                    #Use i to offset the chunk being read from the "buffer"
                    #I.E. load 3 chunks (0,1,2) in the buffer list before starting to read from it
                    if i > 2:
                        buffered_chunk=buf_list.pop(0)
                        #print("Buffered Chunk",str(i-2),"-",buffered_chunk)
                        writer=f.write(buffered_chunk)
                        decompressed_chunk=decompressor.decompress(buffered_chunk)
                    #Increment i so that the buffer list will fill before reading from it
                    i=i+1
                    session_bytes_read = resp._fp_bytes_read
                    print("[+] Bytes read:",str(format(session_bytes_read, ",")), end='\r')
            print("\nInitial download complete.")
            break
        except Exception as e:
            print(e)
            #Set the total bytes read equal to the session bytes since this is the first failure
            total_bytes_read=session_bytes_read
            print("Bytes Read:",total_bytes_read)
            #Mod the total_bytes back to the nearest chunk size so it can be re-requested
            total_bytes_read=total_bytes_read-(total_bytes_read%2048)-2048
            print("Rounded bytes Read:",total_bytes_read)
            #Pop the last entry off of the buffer since it may be incomplete
            buf_list.pop()
            #Reset i so that the buffer has to be rebuilt
            i=0
            print("Sleeping for 30 seconds before re-attempt...")
            time.sleep(30)
    print("Looping...")

#Finish writing from buffer into file
#BE SURE TO SET TO "APPEND" with "ab" or you will overwrite the start of the file
f = open(local_filename, 'ab')
print("[+] Finishing write from buffer.")
while not len(buf_list) == 0:
    buffered_chunk=buf_list.pop(0)
    writer=f.write(buffered_chunk)
    decompressed_chunk=decompressor.decompress(buffered_chunk)
#Flush and close the file
f.flush()
f.close()
resp.release_conn()
Reproducing the error
To reproduce the error perform the following actions:
Run the script and let the download start
Be sure that line 65 decompressed_chunk=decompressor.decompress(chunk) is not commented out
Turn off your network connection until an exception is raised
Turn your network connection back on immediately.
If the decompressor.decompress(chunk) line is removed from the script then it will download the file and the data can be successfully decompressed from the file itself. However, if line 65 is present and an interruption occurs, the zlib library will not be able to continue decompressing the data stream. I need to decompress the data stream as I cannot store the actual file I am trying to use.
Is there some way to prevent this from occurring? I have now attempted to add a "buffer" list that stores the chunks; the script discards the last chunk after a failure and moves back to a point in the file that preceded the "failed" chunk. I am able to re-establish the connection and even pull back all the data correctly, but even with a "buffer" my ability to decompress the stream is interrupted. I must not be recovering the data back into the buffer smoothly somehow.
Visualization:
I put this together very quickly in an attempt to better describe what I am trying to do...
I bet Mark Adler is hiding out there somewhere...
r+b doesn't append. You would need to use ab for that. It appears that on the re-try, you are reading the entire gzip file again from the start. With r+b, that file is written correctly to your output file, by overwriting what was read before.
However, you are feeding the initial read to the decompressor, and then the start of the file again. Not surprisingly, the decompressor then soon detects invalid compressed data.
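A minimal sketch of the resume logic this implies (not the answerer's code; the URL is a placeholder and it assumes the server honors Range requests): send the Range header with the request itself, keep a single decompressobj, and only advance the byte counter for chunks that have actually been both written and decompressed.
import zlib

import urllib3

url = "https://example.com/archive.json.gz"             # placeholder
http = urllib3.PoolManager()
decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)   # gzip wrapper
bytes_written = 0

with open("archive.json.gz", "wb") as f:
    while True:
        # The Range header must be part of the request; assigning to
        # resp.headers after the fact does not change what the server sends.
        resp = http.request(
            "GET", url,
            headers={"Range": "bytes=%d-" % bytes_written},
            preload_content=False)
        try:
            for chunk in resp.stream(2048):
                f.write(chunk)
                decompressor.decompress(chunk)   # same object across retries
                bytes_written += len(chunk)
            break                                # stream finished cleanly
        except Exception:
            resp.release_conn()
            continue  # re-request from the first byte not yet written
decompressor.flush()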

PIL saving image asynchronously?

I am processing an image progressively by cropping it multiple times inside a loop. Before each crop I save the new base image and then I proceed to the next operation. Lately I tried to do this while testing with Cypress, so operations are much faster, and by the time the next crop is requested sometimes the original file has not been saved yet. (I used subprocess.run(['ls', fromPage_imagePath], capture_output=True, text=True).stdout before cropping and noticed that the file was not there at the time of the next loop.)
The problem looked addressable, but instead I've been struggling with this for at least a couple of hours.
I first tried to flush the image, i.e.
img.save(toPage_imagePath)
img.flush()
os.fsync(img)
but then realized that a PIL object is not a file. I then followed recommendations from this post and tried using a file object for saving the image, i.e.
1:
with open(toPage_imagePath, 'wb') as out_file:
    img.save(out_file, 'PNG')
    out_file.flush()
    os.fsync(out_file)
and 2:
out_file = open(toPage_imagePath, 'wb')
img.save(out_file, 'PNG')
out_file.flush()
os.fsync(out_file)
out_file.close()
and finally tried even the time waiting option suggested in there, or:
img.save(toPage_imagePath)
noinfinite = 0
while noinfinite < 25:
    time.sleep(1)
    if os.path.isfile(toPage_imagePath):
        print('file found')
        break
    noinfinite += 1
but my code seems to swiftly ignore my attempts and jumps to the next loop. What am I missing here? I need a way to reliably pause the code until the PIL image is saved.
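One approach that might help (an untested sketch; save_atomically is a hypothetical helper, not from my code) is to write to a temporary file, fsync it, and only then rename it over the final path, so the file never appears under its real name until it is completely on disk:
import os
import tempfile

def save_atomically(img, toPage_imagePath):
    # Write to a temp file in the same directory, force it to disk, then
    # rename it into place; os.replace is atomic on POSIX filesystems.
    directory = os.path.dirname(toPage_imagePath) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".png")
    try:
        with os.fdopen(fd, "wb") as out_file:
            img.save(out_file, "PNG")
            out_file.flush()
            os.fsync(out_file.fileno())
        os.replace(tmp_path, toPage_imagePath)
    except BaseException:
        os.unlink(tmp_path)
        raise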

discord.py send BytesIO

I am manipulating an image using Pillow, and then want to send it to Discord. My code: https://paste.pythondiscord.com/comebefupo.py
When using image.show(), the manipulated image is shown fine.
However, when I want to upload the image to Discord, the bot gets stuck and no error is thrown:
got bytes from direct image string
got bytes from member/user pfp or str
opened image
opened draw
drew top text
drew bottom text
prepared buffer
prepared file
# Bot just gets stuck here, no errors
According to multiple sources (1, 2), I am doing the right thing by saving the image into a BytesIO stream and then using seek(0).
According to the documentation for discord.File, it takes an io.BufferedIOBase, which is (I believe) what I put in.
EDIT:
Saving the image first, and then sending that, works.
# Return whole image object
return image
self.convert_bytes(image_bytes, top_text, bottom_text).save('image.png')
await ctx.send(file=discord.File('image.png'))
I have no clue why this works and the other thing doesn't...
I had a similar problem last week; this was the code I used to send the image:
with BytesIO() as image_binary:
    image.save(image_binary, 'PNG')
    image_binary.seek(0)
    await ctx.send(file=discord.File(fp=image_binary, filename='image.png'))
This is not a full answer but it might help.
image_file = discord.File(io.BytesIO(image_bytes.encode()),filename=f"{name}.png")
await ctx.send(file=image_file )

Output data from subprocess command line by line

I am trying to read a large data file (millions of rows, in a very specific format) using a pre-built (in C) routine. I want to then yield the results of this, line by line, via a generator function.
I can read the file OK, but whereas just running:
<command> <filename>
directly in Linux will print the results line by line as it finds them, I've had no luck trying to replicate this within my generator function. It seems to output the entire lot as a single string that I need to split on newlines, and of course then everything needs reading before I can yield line 1.
This code will read the file, no problem:
import subprocess
import config
file_cmd = '<command> <filename>'
for rec in (subprocess.check_output([file_cmd], shell=True).decode(config.ENCODING).split('\n')):
    yield rec
(ENCODING is set in config.py to iso-8859-1 - it's a Swedish site)
The code I have works, in that it gives me the data, but in doing so, it tries to hold the whole lot in memory. I have larger files than this to process which are likely to blow the available memory, so this isn't an option.
I've played around with bufsize on Popen, but not had any success (and also, I can't decode or split after the Popen, though I guess the fact I need to split right now is actually my problem!).
I think I have this working now, so will answer my own question in the event somebody else is looking for this later ...
proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
while True:
    output = proc.stdout.readline()
    if output == b'' and proc.poll() is not None:
        break
    if output:
        yield output.decode(config.ENCODING).strip()
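An equivalent variant (a sketch, not part of my answer above): iterating over proc.stdout directly yields each line as it arrives, without the explicit poll() loop; stream_lines is just an illustrative wrapper name.
import shlex
import subprocess

import config

def stream_lines(file_cmd):
    proc = subprocess.Popen(shlex.split(file_cmd), stdout=subprocess.PIPE)
    with proc.stdout:
        # Each iteration yields one line as soon as the child flushes it,
        # so the full output is never held in memory.
        for raw_line in proc.stdout:
            yield raw_line.decode(config.ENCODING).rstrip("\n")
    proc.wait()   # reap the child once its output is exhausted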

Is there any way I can add image in bytes to pdf in Python?

I generated some plots in pandas and saved them in BytesIO streams, and now I want to add them to a PDF page and send out the PDF file as an attachment in an email:
import matplotlib.pyplot as plt
import io
from fpdf import FPDF
fig = plt.figure()
...
buf = io.BytesIO()
fig.savefig(buf, format='png')
pdf = FPDF()
pdf.add_page()
pdf.image(buf.getvalue(), type='PNG')
buf.close()
But this is not working, with the following error reported:
Traceback (most recent call last):
File "XXXX.py", line 166, in send_email
pdf.image(buf.getvalue(), type='PNG')
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 150, in wrapper
return fn(self, *args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 971, in image
info=self._parsepng(name)
File "/usr/local/lib/python3.6/site-packages/fpdf/fpdf.py", line 1769, in _parsepng
if name.startswith("http://") or name.startswith("https://"):
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
I want to solve this purely in memory and not to save image files locally. Can anyone help me with this? Thank you so much.
"I want to solve this purely in memory and not to save image files locally."
No, you should not do that, UNLESS you save the PDF in memory on a massive memory file drive. Most devices use the file system for extended memory, so it's a lot easier to use the physical file system rather than build a custom one in your very precious memory resources. Small files may work if your system has a memory-stream file system to use filenames with.
A PDF is a resource hog, since you MUST:
save the image as a naturally compressed image object
decompress, in that precious memory, a representation to inject as a PDF image one way or another
usually duplicated and re-expanded to be visible on screen
then the image data needs to be encoded into a deflated stream object
then the new Flate-encoded image needs to be written to the PDF as a partial file object with a hardcoded decimal address in a PDF file system
then the object needs to be cataloged and indexed at the end of the physical file, so the WHOLE PDF must also be expanded in memory.
... am I explaining why it's simpler to just save down to drip-feed cached file objects, rather than mess about using gigabytes of RAM drives?
Yes, you can use bytes instead of a file ...
In my case I have an MSSQL query that contains images as binary strings, and I want to use them directly, without saving multiple image files. So I was looking for a solution.
What we need:
pip install fpdf2
Then import io and FPDF into your Python 3.x file:
import io
from fpdf import FPDF
A look into image_parsing.py in the fpdf2 repository on GitHub shows that fpdf2 can work with binary objects too; no real image file is needed.
With BytesIO from io we can create a binary object out of our bytes string.
We name the binary object 'picture' and place it as our image on the PDF page.
import io
from fpdf import FPDF
# create bytes object of the image data
picture = io.BytesIO(b"some initial binary data: \x00\x01")
# set page to portrait A4 and positioning in mm
pdf = FPDF("P", "mm", "A4")
# create a page
pdf.add_page()
# insert image
# on position: x=10mm, y=10mm
# size: width=50mm, height=auto
pdf.image(picture,10,10,50)
# create PDF file
pdf.output("fpdf_test.pdf")
I hope this is helpful to some people.
