Julia: Using ProtoBuf to read messages from a gzipped file

A sensor provides a stream of frames containing object coordinates, which are stored in ProtoBuf format in a gzipped file. I would like to read this file in Julia.
Using protoc, I have generated the ProtoBuf files for both Python and Julia: coordinate_push.py and coordinate_push.jl.
My Python code is as follows:
import gzip

from google.protobuf.internal.decoder import _DecodeVarint32

from src.proto import coordinate_push

frameList = []
with gzip.open(filePath) as f:
    data = f.read()

next_pos, pos = 0, 0
while pos < len(data):
    msg = coordinate_push.CoordinatesFrame()
    next_pos, pos = _DecodeVarint32(data, pos)  # next_pos = message length, pos = start of payload
    msg.ParseFromString(data[pos:pos + next_pos])
    frameList.append(msg)
    pos += next_pos
I'd like to rewrite the above in Julia, but I don't know where to start. Part of the problem is that I haven't fully understood the Python script (IO is not my strong point).
I understand that I need:
to open the gzip file, presumably using using GZip; file = GZip.open(file_path, "r")
to read in the data, along the lines of using ProtoBuf; data = readproto(iob, CoordinatesFrame())
What I don't understand is:
how to define iob, and especially how to link it to file (in the Julia Protobuf manual, we had iob = PipeBuffer(), but here it's a gzip-file that we'd like to read)
how to replicate the while-loop in Julia, and in particular the mysterious _DecodeVarint32 (I'm on Windows, in case that's relevant)
whether the file coordinate_push.jl has to be in the same directory as my main file, and if not, how I can properly import it (it is currently in a proto subfolder, and in Python I'd import it using from src.proto import coordinate_push)
Insight on any of the three points would be highly appreciated.

You should open an issue on the GZip.jl GitHub repo and ask the first part of your question there (I am not a GZip expert, unfortunately).
On the second point, I suggest looking at https://github.com/JuliaIO/FileIO.jl/blob/master/README.md for lots of examples of FileIO loops, which seem to be exactly what you need to replicate that Python loop. As for the mysterious _DecodeVarint32, your best bet is to hunt down the definition on GitHub or in the docs; a sketch of what it does is below.
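For what it's worth, _DecodeVarint32 simply decodes the length prefix written before each message: protobuf encodes the length as a base-128 varint, where each byte carries 7 bits of the value (least-significant group first) and the high bit signals that another byte follows. A minimal Python sketch of what it does (the real implementation lives in google.protobuf.internal.decoder):

def decode_varint32(data, pos):
    # Decode a base-128 varint starting at data[pos].
    result, shift = 0, 0
    while True:
        b = data[pos]
        pos += 1
        result |= (b & 0x7F) << shift  # each byte contributes 7 bits, low bits first
        if not (b & 0x80):             # high bit clear: last byte of the varint
            return result, pos         # (decoded value, position after the varint)
        shift += 7

Replicating the Python loop in Julia then comes down to doing the same byte arithmetic on the buffer you read from the gzip stream.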
For the third question, coordinate_push.jl does not need to be in the same folder as your main file (I am not sure exactly what you mean by this, so it might help to add context on the structure of your files). To import that file, all you need to do is add include("path/to/coordinate_push.jl") at the top of the file you want to call/run the code from. Note that the path can be either an absolute path or a path relative to the project (in some cases).

Related

Read a large hdf5 file from url by chunk in python

I have a 1.5-terabyte HDF5 file on Amazon S3, located at the link below. I don't have the disk space to save it, nor do I have the memory to read it. Accordingly, I want to read it by chunk, process each chunk, and discard the read part. I was hoping to use pandas' read_hdf to read it, but it does not support URLs. Neither does the h5py library, it seems, though it does mention a ros3 driver that I haven't been able to get working yet. I also tried the response to this question, but the chunks cannot be read by h5py, or at least I have not found a way yet. So I'm rather left with no idea of how to process this file. Does anyone have any idea how to do so? The link to the file is this:
https://oedi-data-lake.s3-us-west-2.amazonaws.com/building_synthetic_dataset/A_Synthetic_Building_Operation_Dataset.h5
After running into this exact same issue, I believe I've cobbled together a working solution using fsspec:
import h5py
import fsspec

URL = "..."  # Assuming a publicly accessible url
remote_f = fsspec.open(URL, mode="rb")
if hasattr(remote_f, "open"):
    remote_f = remote_f.open()
f = h5py.File(remote_f)
# Do regular hdf5 things...
I've confirmed, using your link above, that this does not read the data into memory, just as if it were a local file:
import h5py
import fsspec

URL = "https://oedi-data-lake.s3-us-west-2.amazonaws.com/building_synthetic_dataset/A_Synthetic_Building_Operation_Dataset.h5"
remote_f = fsspec.open(URL, mode="rb")
if hasattr(remote_f, "open"):
    remote_f = remote_f.open()
f = h5py.File(remote_f)
f.visititems(print)
# 1. README <HDF5 dataset "1. README": shape (), type "|O">
# 2. Resources <HDF5 group "/2. Resources" (2 members)>
# 2. Resources/2.1. Building Models <HDF5 group "/2. Resources/2.1. Building Models" (9 members)>
...
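As a side note, slicing a dataset through this file object transfers only the chunks that cover the requested range, so you can process the 1.5 TB file piece by piece. The dataset path below is hypothetical; use visititems as above to find real ones:

dset = f["path/to/some/dataset"]  # hypothetical path; list real ones with f.visititems(print)
chunk = dset[:1000]               # fetches only the chunks covering these rows
# process chunk, then move on to the next slice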

How to use python's configparser to write a file without sections

I need to modify a config file using python. The file has a format similar to
property_one = 0
property_two = 5
i.e. there aren't any section names.
Python's configparser module doesn't support sectionless files, but I can use it to load them easily anyway using the trick from here: https://stackoverflow.com/a/26859985/11637934
from configparser import ConfigParser
from itertools import chain

parser = ConfigParser()
with open("foo.conf") as lines:
    lines = chain(("[top]",), lines)  # This line does the trick.
    parser.read_file(lines)
The problem is, I can't find a clean way to write the parser back to a file without the section header. The best solution I have at the moment is to write the parser to a StringIO buffer, skip the first line and then write it to a file:
import io
import shutil

with open('foo.conf', 'w') as config_file, io.StringIO() as buffer:
    parser.write(buffer)
    buffer.seek(0)
    buffer.readline()  # skip the "[top]" section header
    shutil.copyfileobj(buffer, config_file)
It works but it's a little ugly and involves creating a second copy of the file in memory. Is there a better or more concise way of achieving this?
Stumbled on a less ugly way of doing this:
text = '\n'.join(['='.join(item) for item in parser.items('top')])
with open('foo.conf', 'w') as config_file:
    config_file.write(text)
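A loop variant of the same idea, in case you want to keep the key = value spacing from the original file (configparser stores every value as a string, so no conversion is needed):

with open('foo.conf', 'w') as config_file:
    for key, value in parser.items('top'):
        config_file.write(f"{key} = {value}\n")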

Read with MXRecordIO from bytes object

Is there a way that I can use mx.recordio.MXRecordIO to read from a bytes object rather than a file object?
For example I'm currently doing:
import mxnet as mx

results_file = 'results.rec'
with open(results_file, 'wb') as f:
    f.write(results)

recordio = mx.recordio.MXRecordIO(results_file, 'r')
temp = recordio.read()
But if possible I'd rather not have to write to file as an intermediate step. I've tried using BytesIO, but can't seem to get it to work.
Currently there is no way of achieving this, sorry. This is non-trivial because the RecordIO reading/parsing is done in C++, and you can't simply forward the stream to the C++ API.
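Until that changes, a temporary file at least keeps the intermediate step tidy. A minimal sketch of the workaround, assuming results is a bytes object holding a valid .rec payload:

import os
import tempfile

import mxnet as mx

# MXRecordIO only accepts a path, so write the bytes to a named temp file.
with tempfile.NamedTemporaryFile(suffix='.rec', delete=False) as f:
    f.write(results)
    temp_path = f.name

recordio = mx.recordio.MXRecordIO(temp_path, 'r')
temp = recordio.read()
recordio.close()
os.remove(temp_path)  # clean up the intermediate file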

How to open a binary file in my case .nii file using node.js

I want to open a binary file; at least, when I try to open it with the VS Code editor, it says the file can't be opened because it is a binary file.
Can someone explain what I can do to open this type of file and read its content?
About the .nii file format: it is NIfTI-1, used in medical visualization such as MRI.
What I'm trying to do is read this file at the lowest level and then make some computations.
I would like to use Node.js for this, not Python or C++.
More details about the file format can be found here.
https://nifti.nimh.nih.gov/
I don't know how VS Code handles binary files, but with Atom (or another text editor like vi), for example, you can open and view the content of a binary file. This is not very useful, however, as the content is not particularly human-readable, except maybe some metadata at the top of the file.
$ vim yourniifile.nii
Anyway, it all depends on what you want to do with the file, which computations you plan to apply to it, and how you will use the results afterwards.
Luckily, there are some npm packages that can help you with the task of reading and processing that kind of file, like nifti-reader-js or nifti-js. For example:
const fs = require('fs');
const niftijs = require('nifti-js');
let rawData = fs.readFileSync('yourniifile.nii');
let data = niftijs.parse(rawData);
console.log(data);

Load spydata file

I'm coming from R + RStudio. In R, you can save objects to an .RData file using save():
save(object_to_save, file = "C:/path/where/RData/file/will/be/saved.RData")
You can then load() the objects:
load(file = "C:/path/where/RData/file/was/saved.RData")
I'm now using Spyder and Python3, and I was wondering if the same thing is possible.
I'm aware everything in the global environment can be saved to a .spydata file using the save button in Spyder's Variable Explorer.
But I'm looking for a way to save to a .spydata file in the code: basically, just the code that runs under those buttons.
Bonus points if the answer includes a way to save an object (or multiple objects) and not the whole env.
(Please note I'm not looking for an answer using pickle or shelve, but really something similar to R's load() and save().)
(Spyder developer here) There's no way to do what you ask for with a command in Spyder consoles.
If you'd like to see this in a future Spyder release, please open an issue in our issues tracker about it, so we don't forget to consider it.
Considering the comment here, we can
rename the file from .spydata to .tar
extract the archive (using a file manager, for example); it will yield a .pickle file (and maybe a .npy)
load the objects that were saved from the environment:

import pickle

with open(path, 'rb') as f:
    data_temp = pickle.load(f)

That object will be a dictionary containing the saved objects.
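The same extraction can be done programmatically. A minimal sketch, assuming the archive contains a single pickled dictionary (the file names below are hypothetical):

import pickle
import tarfile

# A .spydata file is a plain tar archive, so tarfile can open it directly.
with tarfile.open("session.spydata", "r") as tar:
    tar.extractall("extracted")

# Hypothetical member name; check the extracted folder for the actual file.
with open("extracted/session.pickle", "rb") as f:
    data_temp = pickle.load(f)  # dictionary mapping variable names to saved objects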
