Writing BytesIO objects to in-memory Zipfile - python-3.x

I have a Flask-based webapp in which I'm trying to do everything in memory, without touching the disk at all.
I have created an in-memory Word document (using the python-docx library) and an in-memory Excel file (using openpyxl). Both are of type BytesIO. I want to return them both with Flask, so I want to zip them up and return the zip file to the user's browser.
My code is as follows:
inMemory = io.BytesIO()
zipfileObj = zipfile.ZipFile(inMemory, mode='w', compression=zipfile.ZIP_DEFLATED)
try:
    print('adding files to zip archive')
    zipfileObj.write(virtualWorkbook)
    zipfileObj.write(virtualWordDoc)
When the zipfile tries to write the virtualWorkbook I get the following error:
{TypeError}stat: path should be string, bytes, os.PathLike or integer, not BytesIO
I have skimmed the entirety of the internet but have come up empty-handed, so if someone could explain what I'm doing wrong, that would be amazing.

Seems like it's easier to mount a tmpfs/ramdisk (or something similar) at a specific directory and just use tempfile.NamedTemporaryFile() as usual.

You could use the writestr method; it accepts a name (or a ZipInfo) together with the data as a string or bytes:
zipfileObj.writestr(zipfile.ZipInfo('folder/name.docx'),
                    virtualWorkbook.getvalue())
(getvalue() returns the buffer's full contents regardless of the current position; with read() you would first need to seek(0).)
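For a fuller picture, here is a minimal sketch of the whole flow: building the archive in memory and returning it from a Flask view with send_file. It assumes the two BytesIO objects have already been filled by python-docx/openpyxl (placeholder bytes stand in for them here), and the member and download names are made up:
import io
import zipfile
from flask import Flask, send_file

app = Flask(__name__)

@app.route('/download')
def download():
    # In the real app these would be the BytesIO objects that python-docx
    # and openpyxl saved to; placeholder bytes stand in for them here.
    virtualWordDoc = io.BytesIO(b'...docx bytes...')
    virtualWorkbook = io.BytesIO(b'...xlsx bytes...')

    inMemory = io.BytesIO()
    with zipfile.ZipFile(inMemory, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        # writestr takes an archive name (or ZipInfo) plus the raw bytes
        zf.writestr('document.docx', virtualWordDoc.getvalue())
        zf.writestr('workbook.xlsx', virtualWorkbook.getvalue())

    inMemory.seek(0)  # rewind so send_file reads the archive from the start
    # download_name is Flask 2.x; older versions use attachment_filename instead
    return send_file(inMemory, mimetype='application/zip',
                     as_attachment=True, download_name='files.zip')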

Related

How to upload downloaded telegram media directly on google drive?

I'm working on the telethon download_media method for downloading images and videos. It is working fine (as expected). Now I want to upload the downloaded media directly to my Google Drive folder.
Sample code looks something like:
from telethon import TelegramClient, events, sync
from telethon.tl.types import PeerUser, PeerChat, PeerChannel
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
drive = GoogleDrive(gauth)
gfile = drive.CreateFile({'parents': [{'id': 'drive_directory_path'}]})

api_id = #####
api_hash = ##########

c = client.get_entity(PeerChannel(1234567))  # some random channel id
for m in client.iter_messages(c):
    if m.photo:
        # below is the one way and it works
        # m.download_media("Media/")
        # I want to try something like this - below code
        gfile.SetContentFile(m.media)
        gfile.Upload()
This code is not working. How can I define the Google Drive object for download_media?
Thanks in advance. Kindly assist!
The main problem is that, according to PyDrive's documentation, SetContentFile() expects a string with the file's local path, and then it just uses open(), so it is meant to be used with local files. In your code you're feeding it the media object, so it won't work.
To upload raw bytes with PyDrive you'll need to wrap them in a BytesIO object and assign that as the file's content. An example with a local file would look like this:
import io

drive = GoogleDrive(gauth)
file = drive.CreateFile({'mimeType': 'image/jpeg', 'title': 'example.jpg'})
filebytes = open('example.jpg', 'rb').read()
file.content = io.BytesIO(filebytes)
file.Upload()
Normally you don't need to do it this way, because SetContentFile() does the opening and conversion for you, but this should give you the idea: if you get the media file as bytes you can wrap it in BytesIO, assign it to file.content, and then upload it.
Now, if you look at the Telethon documentation, you will see that download_media() takes a file argument which you can set to bytes:
file (str | file, optional):
The output file path, directory, or stream-like object. If the path exists and is a file, it will be overwritten. If file is the type bytes, it will be downloaded in-memory as a bytestring (e.g. file=bytes).
So you should be able to call m.download_media(file=bytes) to get the photo as a bytes object (looking at the Telethon source code, it writes into an in-memory buffer and returns its contents). With this in mind, you can try the following change in your loop:
for m in client.iter_messages(c):
    if m.photo:
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()
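One detail worth flagging that the answer above doesn't cover: gfile is created once outside the loop, so every Upload() call updates the same Drive file. If each photo should become its own file, something along these lines (a sketch reusing the drive and client objects from the snippets above, with a made-up title pattern) creates a fresh file per message:
import io

for i, m in enumerate(client.iter_messages(c)):
    if m.photo:
        # create a new Drive file per photo; the title pattern is only illustrative
        gfile = drive.CreateFile({'title': f'photo_{i}.jpg',
                                  'parents': [{'id': 'drive_directory_path'}]})
        gfile.content = io.BytesIO(m.download_media(file=bytes))
        gfile.Upload()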
Note that I only tested the PyDrive side since I currently don't have access to the Telegram API, but looking at the docs I believe this should work. Let me know what happens.
Sources:
PyDrive docs and source
Telethon docs and source

Uploading a file from memory to S3 with Boto3

This question has been asked many times, but my case is ever so slightly different. I'm trying to create a Lambda function that builds an .html file and uploads it to S3. It works when the file is created on disk; then I can upload it like so:
boto3.client('s3').upload_file('index.html', bucket_name, 'folder/index.html')
So now I have to create the file in memory. For this I first tried StringIO(), but then .upload_file throws an error:
boto3.client('s3').upload_file(temp_file, bucket_name, 'folder/index.html')
ValueError: Filename must be a string.
So I tried using .upload_fileobj(), but then I get the error TypeError: a bytes-like object is required, not 'str'.
So I tried using BytesIO(), which wants me to convert the str to bytes first, so I did:
temp_file = BytesIO()
temp_file.write(index_top.encode('utf-8'))
print(temp_file.getvalue())
boto3.client('s3').upload_file(temp_file, bucket_name, 'folder/index.html')
But now it just uploads an empty file, despite the .getvalue() clearly showing that it does have content in there.
What am I doing wrong?
If you wish to create an object in Amazon S3 from memory, use put_object():
import boto3
s3_client = boto3.client('s3')
html = "<h2>Hello World</h2>"
s3_client.put_object(Body=html, Bucket='my-bucket', Key='foo.html', ContentType='text/html')
But now it just uploads an empty file, despite the .getvalue() clearly showing that it does have content in there.
When you finish writing to a file buffer, the position stays at the end. An upload from a buffer starts at its current position, and since you're at the end, there is no data left to read. To fix this, add a seek(0) to reset the buffer back to the beginning after you finish writing to it, and upload it with upload_fileobj(), which accepts a file-like object (upload_file() expects a filename string). Your code would look like this:
temp_file = BytesIO()
temp_file.write(index_top.encode('utf-8'))
temp_file.seek(0)
print(temp_file.getvalue())
boto3.client('s3').upload_fileobj(temp_file, bucket_name, 'folder/index.html')

Python3 convert byte object to file object

I am using an API that only takes file objects (a BufferedRandom object returned by open(file_name, 'r+b')).
However, what I have in hand is a variable holding a bytes object (the result of calling file.read() inside a with open(file_name, "rb") as file: block).
I am wondering how to convert this bytes object into a BufferedRandom-like object to serve as input to the API, because if I pass the bytes object to the API function directly I get the error "bytes" object has no attribute "read".
Thank you very much!
Found an answer here.
You can get your bytes data into a file object with this:
import io
f = io.BytesIO(raw_bytes_data)
Now f behaves just like a file object with methods like read, seek etc.
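As a quick illustration (made-up content), the wrapper supports the usual file-object calls:
import io

raw_bytes_data = b'example payload'   # stand-in for the bytes you already have
f = io.BytesIO(raw_bytes_data)

print(f.read(7))   # b'example'
f.seek(0)          # rewind, just like a real file
print(f.read())    # b'example payload'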
I had a similar issue when I needed to download (export) files from one REDCap database and upload (import) them into another using PyCap. The export_file method returns the file contents as a raw bytestream, while import_file needs a file object. Works with Python 3.

What does "deallocated bytearray object has exported buffers" mean exactly

I am trying to run an encryption algorithm using AES-256, but I get this error instead:
"deallocated bytearray object has exported buffers"
I can't seem to find any proper explanation of what the error itself actually means, and therefore am having trouble debugging this. Can anyone explain?
For context, this seems to happen particularly for large files over 1 GB.
for root, dirs, files in os.walk(dirPath):
    for name in files:
        filePath = os.path.join(root, name)
        with open(filePath, 'rb') as _file:
            textStr = _file.read()
        encrypted = fernet.encrypt(textStr)
        with open(filePath, 'wb') as _file:
            _file.write(encrypted)
The above code is my attempt at encrypting all files in a directory.
It's referring to the buffer protocol, a way of making views of raw memory in Python. It's not frequently used at the Python layer, instead usually being seen in CPython C modules, both built-in and third party C extension modules. The easiest way to use it from Python itself is with the memoryview type.
My guess is that something in the code (yours or the module you're using) made a view of a bytearray object as a buffer, then decref-ed the bytearray to zero references before releasing the buffer.
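To make "exported buffers" concrete, here is a small pure-Python illustration of the mechanism (it raises a BufferError rather than reproducing your exact SystemError, which typically comes from a C extension mismanaging references): while a memoryview is exported from a bytearray, the bytearray cannot be resized or freed out from under it.
ba = bytearray(b'secret data')
mv = memoryview(ba)        # exports a buffer over ba's memory

try:
    ba.extend(b'!!!')      # resizing while a buffer is exported is forbidden
except BufferError as err:
    print(err)             # "Existing exports of data: object cannot be re-sized"

mv.release()               # release the export...
ba.extend(b'!!!')          # ...and the bytearray can be resized again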

Read with MXRecordIO from bytes object

Is there a way that I can use mx.recordio.MXRecordIO to read from a bytes object rather than a file object?
For example I'm currently doing:
import mxnet as mx

results_file = 'results.rec'
with open(results_file, 'wb') as f:
    f.write(results)

recordio = mx.recordio.MXRecordIO(results_file, 'r')
temp = recordio.read()
But if possible I'd rather not have to write to file as an intermediate step. I've tried using BytesIO, but can't seem to get it to work.
Currently there is no way of achieving this, sorry. It is non-trivial because the RecordIO reading/parsing is done in C++, and you can't simply forward a Python stream to the C++ API.
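If going through a file can't be avoided, one workaround (a sketch of the question's own approach, just with automatic cleanup and no leftover file) is to stage the bytes in a temporary file; note that on Windows an open NamedTemporaryFile may not be reopenable by name:
import tempfile
import mxnet as mx

# 'results' is assumed to be the bytes object already held in memory
with tempfile.NamedTemporaryFile(suffix='.rec') as tmp:
    tmp.write(results)
    tmp.flush()  # make sure the bytes reach the file before MXRecordIO reads it
    recordio = mx.recordio.MXRecordIO(tmp.name, 'r')
    temp = recordio.read()
    recordio.close()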
