How to convert a compiled protocol buffer back to .proto file? - python-3.x

I have a compiled Google protocol buffer module for Python 2 and I'm attempting to port it to Python 3. Unfortunately, I cannot find the proto file I used to generate it anywhere. How do I recover the proto file so that I can compile a new module for Python 3? I don't know which protobuf version was used, and all I have is the .py file meant to run on Python 2.6.

You will have to write code (in Python, for instance) to walk through the tree of your message descriptors. They should, in principle, carry the full information of your original proto file except the code comments. The generated Python module you still have in your possession should allow you to serialize the file descriptor for your proto file as a file descriptor proto message, which could then be fed to code that writes it back out as proto source.
As a guide you should look into the various code generators for protoc which actually do the same: they read in a file descriptor as a protobuf message, analyze it and generate code.
Here's a basic introduction to writing a Protobuf plugin in Python:
https://www.expobrain.net/2015/09/13/create-a-plugin-for-google-protocol-buffer/
Here's the official list of protoc plugins
https://github.com/google/protobuf/blob/master/docs/third_party.md
And here's a protoc plugin that generates Lua code, written in Python:
https://github.com/sean-lin/protoc-gen-lua/blob/master/plugin/protoc-gen-lua
Let's have a look at the main code block:
def main():
    plugin_require_bin = sys.stdin.read()
    code_gen_req = plugin_pb2.CodeGeneratorRequest()
    code_gen_req.ParseFromString(plugin_require_bin)
    env = Env()
    for proto_file in code_gen_req.proto_file:
        code_gen_file(proto_file, env,
                      proto_file.name in code_gen_req.file_to_generate)
    code_generated = plugin_pb2.CodeGeneratorResponse()
    for k in _files:
        file_desc = code_generated.file.add()
        file_desc.name = k
        file_desc.content = _files[k]
    sys.stdout.write(code_generated.SerializeToString())
The loop "for proto_file in code_gen_req.proto_file:" iterates over the file descriptor messages that protoc handed to the plugin; each one is passed to code_gen_file, which generates the Lua code. So now you could do something like this:
from google.protobuf import descriptor_pb2
# This should get you the file descriptor for your proto file
file_descr = your_package_pb2.sometype.DESCRIPTOR.file
# serialized version of the file descriptor (FileDescriptorProto bytes)
filedescr_msg = file_descr.serialized_pb
# parse the bytes back into the message type the code generator works on
file_proto = descriptor_pb2.FileDescriptorProto.FromString(filedescr_msg)
# required by lua codegen
env = Env()
# create Lua code -> modify it to create proto code
code_gen_file(file_proto, env, "your_package.proto")
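To give an idea of what "modify it to create proto code" could look like, here is a minimal sketch of a renderer that walks a FileDescriptorProto-shaped object and prints proto source. It is not the Lua plugin's code: the function names (render_file, render_message, render_enum) are made up for illustration, it assumes proto2 syntax, and it ignores options, default values, groups, extensions, oneofs and services. The numeric type and label values are the ones defined in descriptor.proto.

```python
# Numeric values follow descriptor.proto (FieldDescriptorProto.Type and .Label).
SCALAR_TYPES = {
    1: "double", 2: "float", 3: "int64", 4: "uint64", 5: "int32",
    6: "fixed64", 7: "fixed32", 8: "bool", 9: "string", 12: "bytes",
    13: "uint32", 15: "sfixed32", 16: "sfixed64", 17: "sint32", 18: "sint64",
}
TYPE_MESSAGE, TYPE_ENUM = 11, 14
LABELS = {1: "optional", 2: "required", 3: "repeated"}


def render_enum(enum, indent=""):
    """Render an EnumDescriptorProto-like object as lines of proto source."""
    lines = ["%senum %s {" % (indent, enum.name)]
    for value in enum.value:
        lines.append("%s  %s = %d;" % (indent, value.name, value.number))
    lines.append(indent + "}")
    return lines


def render_message(msg, indent=""):
    """Render a DescriptorProto-like object, including nested types."""
    lines = ["%smessage %s {" % (indent, msg.name)]
    inner = indent + "  "
    for nested in msg.nested_type:
        lines.extend(render_message(nested, inner))
    for enum in msg.enum_type:
        lines.extend(render_enum(enum, inner))
    for field in msg.field:
        if field.type in (TYPE_MESSAGE, TYPE_ENUM):
            type_name = field.type_name  # fully qualified, e.g. ".pkg.Other"
        else:
            # Raises KeyError for groups (type 10), which this sketch skips.
            type_name = SCALAR_TYPES[field.type]
        lines.append("%s%s %s %s = %d;" % (
            inner, LABELS[field.label], type_name, field.name, field.number))
    lines.append(indent + "}")
    return lines


def render_file(file_proto):
    """Render a FileDescriptorProto-like object as proto source text."""
    lines = ['syntax = "proto2";']  # assumption: the lost file was proto2
    if file_proto.package:
        lines.append("package %s;" % file_proto.package)
    for dep in file_proto.dependency:
        lines.append('import "%s";' % dep)
    for enum in file_proto.enum_type:
        lines.extend(render_enum(enum))
    for msg in file_proto.message_type:
        lines.extend(render_message(msg))
    return "\n".join(lines) + "\n"
```

With the generated module importable, something like render_file(descriptor_pb2.FileDescriptorProto.FromString(your_package_pb2.DESCRIPTOR.serialized_pb)) should produce a first draft that you can then compare against the descriptor and complete by hand.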

As mentioned in the other post(s), you'll need to walk through the tree of your descriptor message and build your proto file contents.
You can find a full C++ example in the protocol buffers github repository. Here are some C++ code snippets from the link in order to give you an idea on how to implement this in Python:
// Special case map fields.
if (is_map()) {
  strings::SubstituteAndAppend(
      &field_type, "map<$0, $1>",
      message_type()->field(0)->FieldTypeNameDebugString(),
      message_type()->field(1)->FieldTypeNameDebugString());
} else {
  field_type = FieldTypeNameDebugString();
}
std::string label = StrCat(kLabelToName[this->label()], " ");
// Label is omitted for maps, oneof, and plain proto3 fields.
if (is_map() || containing_oneof() ||
    (is_optional() && !has_optional_keyword())) {
  label.clear();
}
SourceLocationCommentPrinter comment_printer(this, prefix,
                                             debug_string_options);
comment_printer.AddPreComment(contents);
strings::SubstituteAndAppend(
    contents, "$0$1$2 $3 = $4", prefix, label, field_type,
    type() == TYPE_GROUP ? message_type()->name() : name(), number());
Where the FieldTypeNameDebugString function is shown below:
// The field type string used in FieldDescriptor::DebugString()
std::string FieldDescriptor::FieldTypeNameDebugString() const {
  switch (type()) {
    case TYPE_MESSAGE:
      return "." + message_type()->full_name();
    case TYPE_ENUM:
      return "." + enum_type()->full_name();
    default:
      return kTypeToName[type()];
  }
}

Related

How to return pointer string with ctypes in python 2.7

I am working on an implementation for a new fiscal device, which uses the OPOS / UPOS library for communication. I am very new to ctypes and have no experience with C at all. However, I have managed to make it work, mostly.
But I am having issues with returning a string from the generic method DirectIO. The documentation says: "This command should be used immediately after EndFiscalReceipt() to retrieve unique ID of latest receipt"
" Parameters:
– Data [in]
Ignored.
– Obj [in/out]
Value to be read."
And adds .NET example under it:
int iData = 0;
string strReferenceID = "";
fiscalPrinter.EndFiscalReceipt();
fiscalPrinter.DirectIO(CMD_EKASA_RECEIPT_ID, ref iData, ref strReferenceID);
// strReferenceID will contain latest receipt ID, e.g. "O−7DBCDA8A56EE426DBCDA8A56EE426D1A"
The first parameter (CMD_EKASA_RECEIPT_ID) is the command being executed; that's why it's not listed above.
However, Python is not .NET, and I have never worked with .NET.
I have been following the instructions in the ctypes docs (https://docs.python.org/2.7/library/ctypes.html) and defined this method's argument and return types in init:
self.libc.DirectIO.argtypes = [ctypes.c_int32, ctypes.c_int32, ctypes.c_char_p]
self.libc.DirectIO.restype = ctypes.c_char_p
Then I tried several ways to retrieve the reply string, but none of them works in my case:
# attempt 1: empty Python string wrapped in c_char_p
s = ""
c_s = ctypes.c_char_p(s)
result = self.send_command(CMD_EKASA_RECEIPT_ID, 0, c_s)
# attempt 2: pointer to a 40-byte buffer
p = ctypes.create_string_buffer(40)
poin = ctypes.pointer(p)
result = self.send_command(CMD_EKASA_RECEIPT_ID, 0, poin)
# attempt 3: the 40-byte buffer itself
p = ctypes.create_string_buffer(40)
result = self.send_command(CMD_EKASA_RECEIPT_ID, 0, p)
# attempt 4: buffer initialised with 32 NUL bytes
s = ctypes.create_string_buffer('\000' * 32)
result = self.send_command(CMD_EKASA_RECEIPT_ID, 0, s)
In every case the string object I created is still empty ("") after calling the C method, just as I created it.
However, there is one more thing that does not make sense to me. My colleague showed me how to look up a method's arguments and return type in the header file. For this one, it says:
int DirectIO(int iCommand, int* piData, const char* pccString);
Which means it returns an integer, if I am not mistaken.
So what I am thinking is that I have to pass the method a pointer to a string created in Python, and C will change it into what I should read. Thus, I think my approach to the solution is right.
I have also tried the approach from "How to pass pointer back in ctypes?", but that does not work for me either.
I am starting to feel desperate. I am not sure whether I understand the problem correctly or whether I am looking for a solution in the right place.
I have solved my problem. The whole thing was about allocating memory. Every example on the net that I read created an empty string, like s = "". But that is not correct!
With an empty string "", the C library had no memory to write the result into.
This was almost the correct approach:
self.libc = ctypes.cdll.LoadLibrary(LIB_PATH)
self.libc.DirectIO.argtypes = [ctypes.c_int32, ctypes.c_int32, ctypes.c_char_p]
result_s = ctypes.c_char_p("")
log.info('S address: %s | S.value: "%s"' % (result_s, result_s.value))
self.libc.DirectIO(CMD_EKASA_RECEIPT_ID, 0, result_s)
log.info('S address: %s | S.value: "%s"' % (result_s, result_s.value))
returns:
S address: c_char_p(140192115373700) | S.value: ""
S address: c_char_p(140192115373700) | S.value: ""
It needed just a small modification:
self.libc = ctypes.cdll.LoadLibrary(LIB_PATH)
self.libc.DirectIO.argtypes = [ctypes.c_int32, ctypes.c_int32, ctypes.c_char_p]
result_s = ctypes.c_char_p(" " * 10)
log.info('S address: %s | S.value: %s' % (result_s, result_s.value))
self.libc.DirectIO(CMD_EKASA_RECEIPT_ID, 0, result_s)
log.info('S address: %s | S.value: %s' % (result_s, result_s.value))
Now, printing result_s after calling self.libc.DirectIO returns a different string than before the call:
S address: c_char_p(140532072777764) | S.value: " "
S address: c_char_p(140532072777764) | S.value: "0-C12A22F5"
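One caveat about the fix above: c_char_p(" " * 10) points at the memory of an immutable Python string, and the ctypes documentation warns against passing such objects to functions that write through the pointer; it recommends create_string_buffer() for mutable memory. Below is a minimal sketch of that variant, with the prototype taken from the header quoted in the question. Since the vendor library is not available here, fake_direct_io is a made-up stand-in that writes the sample ID from the output above; BUF_SIZE and the command value 1 are assumptions as well.

```python
import ctypes

BUF_SIZE = 64  # assumed upper bound for the reply; check the vendor documentation


# Stand-in for the vendor function so that the sketch runs without the library.
# Shape taken from the header:
#   int DirectIO(int iCommand, int* piData, const char* pccString);
# The last parameter is declared c_void_p here only so that the Python callback
# receives a raw address it can write to.
@ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int,
                  ctypes.POINTER(ctypes.c_int), ctypes.c_void_p)
def fake_direct_io(command, p_data, p_string):
    reply = b"0-C12A22F5\x00"  # NUL-terminated, as a C library would write it
    ctypes.memmove(p_string, reply, len(reply))
    return 0


def read_receipt_id(direct_io, command):
    """Call a DirectIO-shaped function and return (status, reply string)."""
    data = ctypes.c_int(0)                       # int* piData, ignored by this command
    buf = ctypes.create_string_buffer(BUF_SIZE)  # writable, zero-filled memory
    status = direct_io(command, ctypes.byref(data), buf)
    return status, buf.value.decode("ascii")


status, receipt_id = read_receipt_id(fake_direct_io, 1)  # 1 is a placeholder command
print(status, receipt_id)
```

For the real library, the matching declarations would presumably be argtypes = [ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.c_char_p] and restype = ctypes.c_int, since the header says the function returns an int rather than a string.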
The question is tagged linux, but OPOS does not work on Linux.
Or are you working in an emulation environment such as Wine?
In any case, without the right environment even trivial things can cause trouble.
First, work in a Windows 32-bit environment, create something that works there, and then port it to another environment.
Since OPOS uses OLE/COM technology, the first package to use is win32com or comtypes.
UnifiedPOS is a conceptual specification and there are no actual software components.
There are three types of software that actually run: OPOS, JavaPOS, and POS for .NET.
OPOS and POS for .NET only work in a Windows 32-bit environment.
Only JavaPOS can work in a Linux environment, and it is usually only available from Java.
If you want to make something in Python, you need to create a Wrapper (or glue) library that calls Java from Python.
If a C interface to UnifiedPOS (OPOS) is running on Linux without a Windows emulator or a Java wrapper, it may be a custom library/component created by the printer vendor with reference to UnifiedPOS.
In that case, I think the detailed specification can only be obtained from the vendor who created it.
To explain: the DirectIO method and DirectIOEvent are defined as a method/event that vendors can freely define and use.
Therefore, only the method/event names and parameters are defined in the UnifiedPOS document.
You need to ask the vendor that provides the DirectIO method/DirectIOEvent what functions their service object offers; it is up to the vendor to decide what the parameters mean.
The OPOS specification was absorbed into UnifiedPOS partway through its history, but until then it existed as a standalone specification.
What remains of it is here: MCS: OPOS Releases
This is the origin of your library's methods returning an integer.
By the way, this is the latest UnifiedPOS specification for now.
Document -- retail/17-07-32 (UnifiedPOS Retail Peripheral Architecture, Version 1.14.1)

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
"AGGGC",
"AGGGGGC",
"AGGAGC",
"AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibilty with AlignIO
ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[:len(cLoc)] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this is caused by the angle brackets ('<' and '>') in the string that ends up on the command line in place of a file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
"I can't get MafftCommandline() to read internal files made by io.StringIO()."
This is not surprising for a couple of reasons:
1. As you're aware, Biopython doesn't implement Mafft, it simply provides a convenient interface to set up a call to mafft in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.
2. The mafft program only works with an input file, it doesn't even allow stdin as a data source. (Though it does allow stdout as a data sink.) So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
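A minimal sketch of the tempfile.NamedTemporaryFile() route: it writes the sequences to a temporary FASTA file and returns the path, so no file name has to be managed by hand. The helper name write_temp_fasta is made up for illustration; the '>unknown' header mirrors the question's code.

```python
import os
import tempfile


def write_temp_fasta(sequences, label="unknown"):
    """Write sequences to a named temporary FASTA file and return its path."""
    # delete=False keeps the file on disk after it is closed, so that an
    # external process such as mafft can open it by name.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
        for seq in sequences:
            handle.write(">%s\n%s\n" % (label, seq))
        return handle.name


path = write_temp_fasta(["AGGGGC", "AGGGC", "AGGGGGC"])
try:
    # Here the path would be handed to the external tool, for example
    # MafftCommandline(mafft_exe, input=path) as in the question.
    with open(path) as handle:
        print(handle.read())
finally:
    os.remove(path)  # clean up, since delete=False leaves the file behind
```

On systems where the temporary directory lives in memory (a tmpfs mount, for example), this may also avoid most of the disk I/O the question was concerned about.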

Changing how nodejs require() fetches files

I'm looking to monkey-patch require() to replace its file loading with my own function. I imagine that internally require(module_id) does something like:
1. Convert module_id into a file path
2. Load the file path as a string
3. Compile the string into a module object and set up the various globals correctly
I'm looking to replace step 2 without reimplementing steps 1 + 3. Looking at the public API, there's require() which does 1 - 3, and require.resolve() which does 1. Is there a way to isolate step 2 from step 3?
I've looked at the source of require mocking tools such as mockery -- all they seem to be doing is replacing require() with a function that intercepts certain calls and returns a user-supplied object, and passes on other calls to the native require() function.
For context, I'm trying to write a function require_at_commit(module_id, git_commit_id), which loads a module and any of that module's requires as they were at the given commit.
I want this function because I want to be able to write certain functions that a) rely on various parts of my codebase, and b) are guaranteed to not change as I evolve my codebase. I want to "freeze" my code at various points in time, so thought this might be an easy way of avoiding having to package 20 copies of my codebase (an alternative would be to have "my_code_v1": "git://..." in my package.json, but I feel like that would be bloated and slow with 20 versions).
Update:
So the source code for module loading is here: https://github.com/joyent/node/blob/master/lib/module.js. Specifically, to do something like this you would need to reimplement Module._load, which is pretty straightforward. However, there's a bigger obstacle, which is that step 1, converting module_id into a file path, is actually harder than I thought, because resolveFilename needs to be able to call fs.exists() to know where to terminate its search... so I can't just substitute out individual files, I have to substitute entire directories, which means that it's probably easier just to export the entire git revision to a directory and point require() at that directory, as opposed to overriding require().
Update 2:
Ended up using a different approach altogether... see answer I added below
You can use the require.extensions mechanism. This is how the coffee-script coffee command can load .coffee files without ever writing .js files to disk.
Here's how it works:
https://github.com/jashkenas/coffee-script/blob/1.6.2/lib/coffee-script/coffee-script.js#L20
loadFile = function(module, filename) {
  var raw, stripped;
  raw = fs.readFileSync(filename, 'utf8');
  stripped = raw.charCodeAt(0) === 0xFEFF ? raw.substring(1) : raw;
  return module._compile(compile(stripped, {
    filename: filename,
    literate: helpers.isLiterate(filename)
  }), filename);
};
if (require.extensions) {
  _ref = ['.coffee', '.litcoffee', '.md', '.coffee.md'];
  for (_i = 0, _len = _ref.length; _i < _len; _i++) {
    ext = _ref[_i];
    require.extensions[ext] = loadFile;
  }
}
Basically, assuming your modules have a set of well-known extensions, you should be able to use this pattern of a function that takes the module and filename, does whatever loading/transforming you need, and then returns an object that is the module.
This may or may not be sufficient to do what you are asking, but honestly from your question it sounds like you are off in the weeds somewhere far from the rest of the programming world (don't take that harshly, it's just my initial reaction).
So rather than mess with the node require() module, what I ended up doing is archiving the given commit I need to a folder. My code looks something like this:
fs = require 'fs'
child_process = require 'child_process'

# commit_id is the commit we want
# (note that if we don't need the whole repository,
# we can pass "commit_id path_to_folder_we_need")
#
# path is the path to the file you want to require starting from the repository root
# (ie 'lib/module.coffee')
#
# cb is called with (err, loaded_module)
#
require_at_commit = (commit_id, path, cb) ->
  dir = 'old_versions' #make sure this is in .gitignore!
  dir += '/' + commit_id
  do_require = -> cb null, require dir + '/' + path
  if not fs.existsSync(dir)
    fs.mkdirSync(dir)
    cmd = 'git archive ' + commit_id + ' | tar -x -C ' + dir
    child_process.exec cmd, (error) ->
      if error
        cb error
      else
        do_require()
  else
    do_require()

How can I add the build version to a scons build

At the moment I'm using some magic to get the current git revision into my scons builds. I just grab the version and stick it into CPPDEFINES.
It works quite nicely ... until the version changes and scons wants to rebuild everything, rather than just the files that have changed, because the define that all files use has changed.
Ideally I'd generate a file using a custom builder called git_version.cpp and
just have a function in there that returns the right tag. That way only that one file would be rebuilt.
Now I'm sure I've seen a tutorial showing exactly how to do this .. but I can't seem to track it down. And I find the custom builder stuff a little odd in scons...
So any pointers would be appreciated...
Anyway just for reference this is what I'm currently doing:
# Lets get the version from git
# first get the base version
git_sha = subprocess.Popen(["git","rev-parse","--short=10","HEAD"], stdout=subprocess.PIPE ).communicate()[0].strip()
p1 = subprocess.Popen(["git", "status"], stdout=subprocess.PIPE )
p2 = subprocess.Popen(["grep", "Changed but not updated\\|Changes to be committed"], stdin=p1.stdout,stdout=subprocess.PIPE)
result = p2.communicate()[0].strip()
if result!="":
    git_sha += "[MOD]"
print "Building version %s"%git_sha
env = Environment()
env.Append( CPPDEFINES={'GITSHAMOD':'"\\"%s\\""'%git_sha} )
You don't need a custom Builder since this is just one file. You can use a function (attached to the target version file as an Action) to generate your version file. In the example code below, I've already computed the version and put it into an environment variable. You could do the same, or you could put your code that makes git calls in the version_action function.
version_build_template="""/*
* This file is automatically generated by the build process
* DO NOT EDIT!
*/
const char VERSION_STRING[] = "%s";
const char* getVersionString() { return VERSION_STRING; }
"""
def version_action(target, source, env):
    """
    Generate the version file with the current version in it
    """
    # env['VERSION'] is converted with str(); Python strings have no toString()
    contents = version_build_template % str(env['VERSION'])
    fd = open(target[0].path, 'w')
    fd.write(contents)
    fd.close()
    return 0

build_version = env.Command('version.build.cpp', [], Action(version_action))
env.AlwaysBuild(build_version)
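The answer mentions that the git calls could live inside version_action instead of an environment variable. A minimal sketch of that part, kept free of SCons so it can be tried on its own: git_version and make_version_source are made-up helper names, the run parameter exists only so the git call can be swapped out, and "git status --porcelain --untracked-files=no" is used in place of the question's grep on the human-readable status text.

```python
import subprocess

VERSION_TEMPLATE = """/*
 * This file is automatically generated by the build process
 * DO NOT EDIT!
 */
const char VERSION_STRING[] = "%s";
const char* getVersionString() { return VERSION_STRING; }
"""


def git_version(run=subprocess.check_output):
    """Return the short HEAD hash, with [MOD] appended for a modified tree."""
    sha = run(["git", "rev-parse", "--short=10", "HEAD"]).decode().strip()
    # --porcelain prints one line per changed path and nothing for a clean
    # tree; untracked files are left out so that they do not count as changes.
    status = run(["git", "status", "--porcelain",
                  "--untracked-files=no"]).decode().strip()
    return sha + "[MOD]" if status else sha


def make_version_source(version):
    """Return the C++ source text for the generated version file."""
    return VERSION_TEMPLATE % version
```

Inside version_action the file contents would then be make_version_source(git_version()), written to target[0].path exactly as in the answer's code.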

Python 3 C-API IO and File Execution

I am having serious trouble getting a Python 2 based C++ engine to work with Python 3. I know the whole IO stack has changed, but everything I try ends in failure. Below is the pre-code (Python 2) and the post-code (Python 3). I am hoping someone can help me figure out what I'm doing wrong. I am also using boost::python to control the references.
The program is supposed to load a Python Object into memory via a map and then upon using the run function it then finds the file loaded in memory and runs it. I based my code off an example from the delta3d python manager, where they load in a file and run it immediately. I have not seen anything equivalent in Python3.
Python2 Code Begins here:
// what this does is first calls the Python C-API to load the file, then pass the returned
// PyObject* into handle, which takes reference and sets it as a boost::python::object.
// this takes care of all future referencing and dereferencing.
try{
    bp::object file_object(bp::handle<>(PyFile_FromString(fullPath(filename), "r" )));
    loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_object));
}
catch(...)
{
    getExceptionFromPy();
}
Next I load the file from the std::map and attempt to execute it:
bp::object loaded_file = getLoadedFile(filename);
try
{
    PyRun_SimpleFile( PyFile_AsFile( loaded_file.ptr()), fullPath(filename) );
}
catch(...)
{
    getExceptionFromPy();
}
Python3 Code Begins here: This is what I have so far based off some suggestions here... SO Question
Load:
PyObject *ioMod, *opened_file, *fd_obj;
ioMod = PyImport_ImportModule("io");
opened_file = PyObject_CallMethod(ioMod, "open", "ss", fullPath(filename), "r");
bp::handle<> h_open(opened_file);
bp::object file_obj(h_open);
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_obj));
Run:
bp::object loaded_file = getLoadedFile(filename);
int fd = PyObject_AsFileDescriptor(loaded_file.ptr());
PyObject* fileObj = PyFile_FromFd(fd,fullPath(filename),"r",-1,"", "\n","", 0);
FILE* f_open = _fdopen(fd,"r");
PyRun_SimpleFile( f_open, fullPath(filename) );
Lastly, the general state of the program at this point: the file gets loaded as a TextIOWrapper, and in the Run: section the fd that is returned is always 3. For some reason _fdopen can never open the FILE, which means I can't call something like PyRun_SimpleFile. The error itself is a debug assertion in _fdopen. Is there a better way to do all this? I really appreciate any help.
If you want to see the full program of the Python2 version it's on Github
So this question was pretty hard to understand, and I'm sorry, but I found out my old code wasn't quite working as I expected. Here's what I wanted the code to do: load the Python file into memory, store it in a map, and then at a later point execute that code from memory. I accomplished this a bit differently than I expected, but it makes a lot of sense now.
1. Open the file using ifstream (see the code below)
2. Convert the chars into a boost::python::str
3. Execute the boost::python::str with boost::python::exec
4. Profit ???
Step 1)
vector<char> input;
ifstream file(fullPath(filename), ios::in);
if (!file.is_open())
{
    // set our error message here
    setCantFindFileError();
    input.push_back('\0');
    return input;
}
file >> std::noskipws;
copy(istream_iterator<char>(file), istream_iterator<char>(), back_inserter(input));
input.push_back('\n');
input.push_back('\0');
Step 2)
bp::str file_str(string(&input[0]));
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_str));
Step 3)
bp::str loaded_file = getLoadedFile(filename);
// Retrieve the main module
bp::object main = bp::import("__main__");
// Retrieve the main module's namespace
bp::object global(main.attr("__dict__"));
bp::exec(loaded_file, global, global);
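For comparison, here is the same load-now, run-later idea written in plain Python, which is roughly what the bp::exec call above does on the interpreter side. This is only an illustrative sketch: loaded_files, load_file and run_file are made-up names standing in for loaded_files_ and the engine's load and run functions.

```python
import os
import tempfile

loaded_files = {}  # path -> source text, playing the role of loaded_files_


def load_file(path):
    """Read a script into memory without executing it."""
    with open(path) as handle:
        loaded_files[path] = handle.read()


def run_file(path, namespace):
    """Execute a previously loaded script in the given namespace."""
    # Passing the path to compile() makes tracebacks name the original file.
    code = compile(loaded_files[path], path, "exec")
    exec(code, namespace, namespace)


# Small demonstration with a throwaway script.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write("answer = 6 * 7\n")
load_file(script.name)
os.remove(script.name)  # the file on disk is no longer needed once loaded
namespace = {}
run_file(script.name, namespace)
print(namespace["answer"])
```

Because only the source text is stored, no file object or file descriptor has to survive between loading and running, which sidesteps the _fdopen problem from the question.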
Full Code is located on github:
