Python3 convert byte object to file object - python-3.x

I am using an API that only takes file objects (a BufferedRandom object returned by open(file_name, 'r+b')).
However, what I have in hand is a bytes object, returned by file.read() inside with open(file_name, "rb") as file:.
I am wondering how to convert this bytes object into a BufferedRandom object to pass to the API, because when I pass the bytes object to the API function directly, I get the error 'bytes' object has no attribute 'read'.
Thank you very much!

Found an answer here.
You can get your bytes data into a file object with this:
import io
f = io.BytesIO(raw_bytes_data)
Now f behaves like a binary file object, with methods such as read, write, and seek.
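A short sketch of that duck typing. read_header below is a hypothetical stand-in for an API that only calls read() and seek(). A BytesIO is also writable, which matches the read/write 'r+b' mode from the question. One caveat: a BytesIO is not literally a BufferedRandom, so an API that checks the exact type or needs a real fileno() will still reject it.

```python
import io

def read_header(fileobj, n=4):
    """Hypothetical stand-in for an API that only uses file methods."""
    fileobj.seek(0)
    header = fileobj.read(n)
    fileobj.seek(0)  # leave the stream rewound for the next reader
    return header

raw_bytes_data = b"PK\x03\x04rest-of-payload"  # e.g. the result of file.read()
f = io.BytesIO(raw_bytes_data)

header = read_header(f)

# BytesIO is read/write like 'r+b': append in place, then read everything back.
f.seek(0, io.SEEK_END)
f.write(b"!")
f.seek(0)
everything = f.read()
```

If the API does insist on a real file, tempfile.TemporaryFile() (binary read/write mode by default) provides one, at the cost of touching the disk.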
I had a similar issue when I needed to download (export) files from one REDCap database and upload (import) them into another using PyCap. The export_file method returns the file contents as a raw bytestream, while import_file needs a file object. Works with Python 3.

Related

Torch.jit.save() What is the difference between saving to file vs. to buffer?

I'm new to PyTorch. The documentation on torch.jit.save mentions two ways of saving a TorchScript module.
From the torch.jit docs, see the test code below (marked with two arrows):
def save(m, f, _extra_files=None):
    r"""
    Save an offline version of this module for use in a separate process. The
    saved module serializes all of the methods, submodules, parameters, and
    attributes of this module. It can be loaded into the C++ API using
    ``torch::jit::load(filename)`` or into the Python API with
    :func:`torch.jit.load <torch.jit.load>`.
    Args:
        m: A :class:`ScriptModule` to save.
        f: A file-like object (has to implement write and flush) or a string
           containing a file name.
        _extra_files: Map from filename to contents which will be stored as part of `f`.
    Example:
    .. testcode::
        import torch
        import io
        class MyModule(torch.nn.Module):
            def forward(self, x):
                return x + 10
        m = torch.jit.script(MyModule())
        # Save to file <-------------
        torch.jit.save(m, 'scriptmodule.pt')
        # This line is equivalent to the previous
        m.save("scriptmodule.pt")
        # Save to io.BytesIO buffer <-------------
        buffer = io.BytesIO()
        torch.jit.save(m, buffer)
        # Save with extra files
        extra_files = {'foo.txt': b'bar'}
        torch.jit.save(m, 'scriptmodule.pt', _extra_files=extra_files)
    """
What is the difference, or the advantage, of saving to a buffer rather than to a file? In what cases should I use one or the other?
Source: https://pytorch.org/docs/1.13/_modules/torch/jit/_serialization.html#load
Saving to a file creates a file on disk with the given name and writes the TorchScript module to it, while saving to a buffer writes the module to a memory buffer (in this case an io.BytesIO object) instead of a file on disk. The advantage of saving to a buffer is that it can be more convenient when you want to store the module in memory and don't want to write it to disk.
In cases where you want to persist the module to disk, you should save it to a file, but if you want to send the module over a network or keep it in memory, you should save it to a buffer. Writing to a buffer avoids a round trip through the file system, which can save time when the serialized bytes are going straight to a socket, a database, or another process anyway.
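The file-or-buffer contract from the docstring (f is "a file-like object (has to implement write and flush) or a string containing a file name") can be illustrated without PyTorch. The sketch below uses the standard library's pickle as a stand-in serializer; save and load here are hypothetical helpers, not torch functions.

```python
import io
import os
import pickle
import tempfile

def save(obj, f):
    """Hypothetical helper mirroring torch.jit.save's `f` contract:
    a file name (str / os.PathLike) or a file-like object with write() and flush()."""
    if isinstance(f, (str, os.PathLike)):
        with open(f, "wb") as fh:  # path: open, write, close
            pickle.dump(obj, fh)
    else:
        pickle.dump(obj, f)  # file-like: write at the current position
        f.flush()            # the caller keeps ownership, so do not close it

def load(f):
    """Counterpart of save(): accepts a file name or a readable file-like object."""
    if isinstance(f, (str, os.PathLike)):
        with open(f, "rb") as fh:
            return pickle.load(fh)
    return pickle.load(f)

state = {"weights": [0.1, 0.2], "bias": 10}

# 1) Save to a file on disk (a temp dir here, so the sketch cleans up after itself).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")
    save(state, path)
    from_disk = load(path)

# 2) Save to an in-memory buffer (nothing touches the disk).
buffer = io.BytesIO()
save(state, buffer)
payload = buffer.getvalue()  # raw bytes, e.g. to send over a socket
buffer.seek(0)               # rewind before reading back
from_buffer = load(buffer)
```

With TorchScript the buffer round trip has the same shape: torch.jit.save(m, buffer), then buffer.seek(0) before torch.jit.load(buffer), because loading reads from the buffer's current position.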

Writing BytesIO objects to in-memory Zipfile

I have a Flask-based web app in which I'm trying to do everything in memory, without touching the disk at all.
I have created an in-memory Word doc (using the python-docx library) and an in-memory Excel file (using openpyxl). They are both of type BytesIO. I want to return them both with Flask, so I want to zip them up and return the zipfile to the user's browser.
My code is as follows:
inMemory = io.BytesIO()
zipfileObj = zipfile.ZipFile(inMemory, mode='w', compression=zipfile.ZIP_DEFLATED)
try:
    print('adding files to zip archive')
    zipfileObj.write(virtualWorkbook)
    zipfileObj.write(virtualWordDoc)
When the zipfile tries to write the virtualWorkbook I get the following error:
{TypeError}stat: path should be string, bytes, os.PathLike or integer, not BytesIO
I have skimmed the entirety of the internet but have come up empty-handed, so if someone could explain what I'm doing wrong, that would be amazing.
It may be easier to mount a tmpfs/ramdisk (or something similar) on a specific directory, like here, and just use tempfile.NamedTemporaryFile() as usual.
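You could use the writestr method instead. ZipFile.write() expects a path to a file on disk, which is why it tries to stat the BytesIO object; writestr() takes the name to use inside the archive plus the data itself, as either str or bytes. Use getvalue() rather than read(), because read() only returns what lies after the buffer's current position, which may already be the end after the library has written to it:
zipfileObj.writestr('folder/name.xlsx', virtualWorkbook.getvalue())
zipfileObj.writestr('folder/name.docx', virtualWordDoc.getvalue())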
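Putting it together, a minimal in-memory sketch, assuming both documents are already BytesIO objects. Stand-in bytes replace the python-docx/openpyxl output here, and build_zip is a hypothetical helper name. The archive must be closed (here by the with block) before its bytes are complete, and the buffer must be rewound before anything reads from it.

```python
import io
import zipfile

def build_zip(files):
    """Return an in-memory ZIP (a rewound BytesIO) built from {arcname: BytesIO}."""
    in_memory = io.BytesIO()
    # Leaving the with block closes the archive, which writes the central directory.
    # The BytesIO itself stays open because ZipFile did not open it.
    with zipfile.ZipFile(in_memory, mode="w", compression=zipfile.ZIP_DEFLATED) as zf:
        for arcname, buf in files.items():
            # getvalue() returns the whole buffer regardless of its position.
            zf.writestr(arcname, buf.getvalue())
    in_memory.seek(0)  # rewind so the caller reads from the start
    return in_memory

# Stand-ins for the python-docx / openpyxl outputs (assumption: both are BytesIO).
virtualWordDoc = io.BytesIO(b"fake docx bytes")
virtualWorkbook = io.BytesIO(b"fake xlsx bytes")

archive = build_zip({"report.docx": virtualWordDoc, "data.xlsx": virtualWorkbook})
```

In Flask, the rewound buffer can then be returned with send_file(archive, mimetype="application/zip", as_attachment=True, download_name="documents.zip"); download_name is the Flask 2.x parameter name, while older releases used attachment_filename.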

How to detect encoding of a file format

I have files in an S3 bucket and I am reading them as streams. I want to detect the encoding of the different files.
I used the chardet library, and I am getting this error:
TypeError: Expected object of type bytes or bytearray, got: <class 'botocore.response.StreamingBody'>
and my code is:
a = (obj.get()['Body'])
reader = chardet.detect(a).get('encoding')
print(reader)
Are there any other ways to detect the encoding before opening a file?
I got it working: you need to call read() first, because chardet.detect() expects bytes, not a stream.
a = obj.get()['Body'].read()
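As for other ways: a cheap standard-library check can run before chardet. Look for a byte order mark, then test whether the bytes decode as UTF-8. This is a crude heuristic, not a replacement for chardet; guess_encoding is a hypothetical helper, and reading only a sample (64 KiB, an arbitrary choice) avoids pulling a whole S3 object into memory.

```python
import codecs
import io

# Byte order marks, longest first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def guess_encoding(stream, sample_size=64 * 1024):
    """Guess from the first bytes of any binary stream with .read(n).
    Returns None when undecided (then e.g. pass the sample to chardet.detect)."""
    sample = stream.read(sample_size)
    for bom, name in _BOMS:
        if sample.startswith(bom):
            return name
    # If the sample filled the whole window, the stream may continue, so a
    # multi-byte character cut off at the end must not count as an error.
    at_eof = len(sample) < sample_size
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=at_eof)
    except UnicodeDecodeError:
        return None
    return "utf-8"  # note: plain ASCII also lands here, since it is valid UTF-8
```

Against S3, the call would presumably be guess_encoding(obj.get()['Body']), since the StreamingBody also supports read(n).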

How do we send a file (accepted as part of a multipart request) to MinIO object storage in Python without saving the file in local storage?

I am trying to write an API in Python (Falcon) that accepts a file from a multipart form parameter and puts the file in MinIO object storage. The problem is that I want to send the file to MinIO without saving it in any temp location.
The MinIO Python client has a function we can use to send the file:
`put_object(bucket_name, object_name, data, length)`
where data is the file data and length is the total length of the object.
For more explanation: https://docs.min.io/docs/python-client-api-reference.html#put_object
I am having trouble obtaining the values for the "data" and "length" arguments of put_object.
The file object received by the API class is of type falcon_multipart.parser.Parser, which cannot be sent to MinIO directly.
I can make it work if I write the file to a temp location, then read it back from there and send it.
Can anyone help me find a solution to this?
I tried reading the file data from the Parser object and wrapping the bytes in io.BytesIO, but it did not work.
def on_post(self, req, resp):
    file = req.get_param('file')
    file_data = file.file.read()
    file_data = io.BytesIO(file_data)
    bucket_name = req.get_param('bucket_name')
    self.upload_file_to_minio(bucket_name, file, file_data)
def upload_file_to_minio(self, bucket_name, file, file_data):
    minioClient = Minio("localhost:9000", access_key='minio', secret_key='minio', secure=False)
    try:
        file_stat = sys.getsizeof(file_data)
        #file_stat = file_data.getbuffer().nbytes
        minioClient.put_object(bucket_name, "SampleFile", file, file_stat)
    except ResponseError as err:
        print(err)
Traceback (most recent call last):
  File "/home/user/.local/lib/python3.6/site-packages/minio/helpers.py", line 382, in is_non_empty_string
    if not input_string.strip():
AttributeError: 'NoneType' object has no attribute 'strip'
A very late answer to your question: as of Falcon 3.0, this should be possible by leveraging the framework's native multipart/form-data support.
There is an example of how to perform the same task with AWS S3: How can I save POSTed files (from a multipart form) directly to AWS S3?
However, as I understand it, MinIO requires either the total length, which is unknown here, or for the upload to be wrapped as a multipart upload. That should be doable by reading reasonably large chunks (e.g., 8 MiB or similar) into memory and sending them as multipart upload parts, without storing anything on disk.
IIRC Boto3's transfer manager does something like that under the hood for you too.
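A standard-library sketch of the two cases discussed here. For the known-size case, the commented-out getbuffer().nbytes line in the question already gives the payload length, whereas sys.getsizeof reports the BytesIO object's memory footprint; note also that the question's code passes file (the Parser) rather than file_data as the data argument. For the unknown-size case, iter_chunks is a hypothetical helper that reads fixed-size parts from a stream; 8 MiB mirrors the figure suggested above.

```python
import io
import sys

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB parts, as suggested above

def iter_chunks(stream, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: yield successive chunks from a binary stream."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # b"" means end of stream
            break
        yield chunk

# Known size: the uploaded file is already in memory.
raw = b"x" * 1000                      # stands in for file.file.read()
file_data = io.BytesIO(raw)
length = file_data.getbuffer().nbytes  # payload length in bytes
footprint = sys.getsizeof(file_data)   # object's memory footprint, NOT the payload length
# getbuffer() does not move the position, so file_data can be passed on as-is:
# minioClient.put_object(bucket_name, "SampleFile", file_data, length)

# Unknown size: read fixed-size parts without touching the disk.
parts = list(iter_chunks(io.BytesIO(b"y" * 25), chunk_size=10))
```

Recent minio-py releases also document passing length=-1 together with a part_size for streams of unknown size; it is worth checking the documentation for the installed version before relying on it.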

Difference between io.StringIO and a string variable in python

I am new to Python.
Can anybody explain the difference between a string variable and io.StringIO? We can store characters in both.
e.g.
String variable
k = 'RAVI'
io.StringIO
string_out = io.StringIO()
string_out.write('A sample string which we have to send to server as string data.')
string_out.getvalue()
If we print k or string_out.getvalue(), both print the text:
print(k)
print(string_out.getvalue())
They are similar because both str and StringIO represent strings, they just do it in different ways:
str: Immutable
StringIO: Mutable, file-like interface, which stores strs
A text-mode file handle (as produced by open("somefile.txt")) is also very similar to StringIO (both are "Text I/O"), with the latter allowing you to avoid using an actual file for file-like operations.
You can use io.StringIO() to simulate files. Because Python is dynamically typed, anything that accepts a text file object will usually accept an io.StringIO() too, so you can have a "file" in memory whose contents you control without writing any temporary files to disk.
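A small sketch of both points. The standard library's csv module only needs a file-like object, so an io.StringIO stands in for a real file; and unlike a str, a StringIO can be modified in place. The row data is made up for illustration.

```python
import csv
import io

rows = [["name", "score"], ["RAVI", "42"]]

# Write CSV into an in-memory text "file" instead of a file on disk.
out = io.StringIO()
csv.writer(out).writerows(rows)
text = out.getvalue()  # the whole buffer, regardless of the position

# Read it back: csv.reader accepts the StringIO just like an open text file.
back = list(csv.reader(io.StringIO(text)))

# StringIO is mutable: seek and overwrite characters in place.
buf = io.StringIO("RAVI")
buf.seek(0)
buf.write("D")  # overwrites the first character

# A str is immutable: "changing" it builds a new string object.
k = "RAVI"
k2 = "D" + k[1:]
```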
