Golang excel file reading - excel

I'm using the tealeg xlsx library to read an Excel file: https://github.com/tealeg/xlsx . The documentation is here: https://godoc.org/github.com/tealeg/ . It works perfectly fine if I call OpenFile() with a path in a local directory, but I want to read the file returned by http.Request.FormFile(), which is of type multipart.File. How do I get the tealeg package to read this file?
Tealeg's OpenReaderAt() looks like what I should use, but http.Request.FormFile() returns a multipart.File interface and I'm not sure how to get an io.ReaderAt from it. https://golang.org/pkg/mime/multipart/#File

func OpenReaderAt(r io.ReaderAt, size int64) (*File, error)
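xlsx.OpenReaderAt takes an io.ReaderAt, and the multipart.File interface includes io.ReaderAt, so you can pass the uploaded file to xlsx.OpenReaderAt directly.
The size is available from the *multipart.FileHeader that FormFile also returns (its Size field, Go 1.9 and later).
file, header, err := req.FormFile("key")
if err != nil {
	// handle the error, e.g. reply with http.StatusBadRequest
	return
}
defer file.Close()
// header.Size is the size of the uploaded file in bytes
wb, err := xlsx.OpenReaderAt(file, header.Size)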

Related

Django file object always 0 bytes when uploaded from python requests

I have been trying to upload a file to django REST using python requests.
I put the file, and some other data, to the server.
r = self.session.put(
    f"{hello_url}/shadow_pbem/savefile_api/",
    files=test_files,
    data={"hash": test_file_hash, 'leader': 78},
    headers=good_token_header,
)
I get a 200 response, the model saves all the data correctly as expected, including a correctly named save file in /media, except the save file in /media is always 0 bytes.
This is how I create the file object...
with open(testfile_path, "rb") as testfile:
...and verify the length, which is not 0.
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()
I create the files object for upload...
test_files = {
    "file": ("testfile.zip", testfile, "application/zip")
}
I put some code in the view to verify, and the file object in the view is there, but it is 0 bytes.
Here is the relevant part of the view. It seems to work fine, but all files are 0 bytes.
class SaveFileUploadView(APIView):
    parser_class = (FileUploadParser,)
    def put(self, request):
        if "file" not in request.data:
            raise ParseError("Empty content")
        f = request.data["file"]
        print(f"file {f} size:{f.size}")
        # prints file testfile.zip size:0
        # rest of view works fine...
I have tried with various files and formats, also using post. Files are always 0 bytes.
Any help appreciated I am going crazy....
If you do
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()
as you say,
you also need to rewind to the start afterwards; otherwise the file position stays at the end and there are indeed zero bytes left for Requests to read.
testfile.seek(0)
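The effect is easy to reproduce without a server. The following is a minimal sketch that uses an in-memory io.BytesIO as a stand-in for the opened test file (the payload bytes are made up for illustration). Measuring the size with seek()/tell() leaves the position at the end, so a read returns nothing until the stream is rewound.

```python
import io
import os

# Stand-in for the opened upload file; the contents are arbitrary example bytes.
testfile = io.BytesIO(b"example zip payload")

# Measure the size the same way as in the question.
testfile.seek(0, os.SEEK_END)
filesize = testfile.tell()

# The position is now at the end, so any reader gets nothing.
data_without_rewind = testfile.read()

# Rewind before handing the file to requests; the full content is readable again.
testfile.seek(0)
data_after_rewind = testfile.read()
```

An alternative that avoids moving the file position at all is os.path.getsize(testfile_path) on the path from the question.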

How to convert a compiled protocol buffer back to .proto file?

I have a compiled Google protocol buffer module for Python 2 and I'm attempting to port it to Python 3. Unfortunately, I cannot find the proto file I used to generate it anywhere. How do I recover the proto file so that I can compile a new one for Python 3? I don't know which proto version was used, and all I have is the .py file meant to run on Python 2.6.
You will have to write code (in Python, for instance) to walk through the tree of your message descriptors. They should, in principle, carry the full information of your original proto file except the code comments. The generated Python module you still have in your possession should let you serialize the file descriptor for your proto file as a file descriptor proto message, which can then be fed to code that renders it as proto source.
As a guide you should look into the various code generators for protoc which actually do the same: they read in a file descriptor as a protobuf message, analyze it and generate code.
Here's a basic introduction to writing a Protobuf plugin in Python:
https://www.expobrain.net/2015/09/13/create-a-plugin-for-google-protocol-buffer/
Here's the official list of protoc plugins:
https://github.com/google/protobuf/blob/master/docs/third_party.md
And here's a protoc plugin, written in Python, that generates Lua code:
https://github.com/sean-lin/protoc-gen-lua/blob/master/plugin/protoc-gen-lua
Let's have a look at the main code block
def main():
    plugin_require_bin = sys.stdin.read()
    code_gen_req = plugin_pb2.CodeGeneratorRequest()
    code_gen_req.ParseFromString(plugin_require_bin)
    env = Env()
    for proto_file in code_gen_req.proto_file:
        code_gen_file(proto_file, env,
                      proto_file.name in code_gen_req.file_to_generate)
    code_generated = plugin_pb2.CodeGeneratorResponse()
    for k in _files:
        file_desc = code_generated.file.add()
        file_desc.name = k
        file_desc.content = _files[k]
    sys.stdout.write(code_generated.SerializeToString())
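The loop for proto_file in code_gen_req.proto_file: iterates over the file descriptor messages that protoc hands to the plugin; the last argument tells code_gen_file whether Lua code was actually requested for that file. So now you could do something like this (assuming your generated module is called your_package_pb2 and exposes the usual module-level DESCRIPTOR):
from google.protobuf import descriptor_pb2
# This should get you the file descriptor for your proto file
file_descr = your_package_pb2.DESCRIPTOR
# serialized version of the file descriptor (bytes)
filedescr_bytes = file_descr.serialized_pb
# parse it into a FileDescriptorProto message, which is what protoc plugins receive
filedescr_msg = descriptor_pb2.FileDescriptorProto.FromString(filedescr_bytes)
# required by lua codegen
env = Env()
# create Lua code -> modify it to create proto code
code_gen_file(filedescr_msg, env, "your_package.proto")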
As mentioned in the other post(s), you'll need to walk through the tree of your descriptor message and build your proto file contents.
You can find a full C++ example in the protocol buffers github repository. Here are some C++ code snippets from the link in order to give you an idea on how to implement this in Python:
// Special case map fields.
if (is_map()) {
  strings::SubstituteAndAppend(
      &field_type, "map<$0, $1>",
      message_type()->field(0)->FieldTypeNameDebugString(),
      message_type()->field(1)->FieldTypeNameDebugString());
} else {
  field_type = FieldTypeNameDebugString();
}
std::string label = StrCat(kLabelToName[this->label()], " ");
// Label is omitted for maps, oneof, and plain proto3 fields.
if (is_map() || containing_oneof() ||
    (is_optional() && !has_optional_keyword())) {
  label.clear();
}
SourceLocationCommentPrinter comment_printer(this, prefix,
                                             debug_string_options);
comment_printer.AddPreComment(contents);
strings::SubstituteAndAppend(
    contents, "$0$1$2 $3 = $4", prefix, label, field_type,
    type() == TYPE_GROUP ? message_type()->name() : name(), number());
Where the FieldTypeNameDebugString function is shown below:
// The field type string used in FieldDescriptor::DebugString()
std::string FieldDescriptor::FieldTypeNameDebugString() const {
  switch (type()) {
    case TYPE_MESSAGE:
      return "." + message_type()->full_name();
    case TYPE_ENUM:
      return "." + enum_type()->full_name();
    default:
      return kTypeToName[type()];
  }
}
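To give a rough idea of how the same formatting logic might look in Python, here is a minimal sketch. It does not use the protobuf library: the Field dataclass and the helper names are hypothetical stand-ins for a real field descriptor, chosen only so the example runs on its own. It mirrors the two ideas from the C++ snippets: message and enum types are written with a leading dot and their full name, and the label is left out when it is empty.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    """Hypothetical stand-in for a field descriptor (not the protobuf API)."""
    name: str
    number: int
    scalar_type: str = ""     # e.g. "int32" or "string" for scalar fields
    full_type_name: str = ""  # set for message/enum fields, e.g. "pkg.Address"
    label: str = ""           # "optional", "required", "repeated", or "" if omitted


def field_type_name(field: Field) -> str:
    # Message and enum types are referenced by fully qualified name with a leading dot.
    if field.full_type_name:
        return "." + field.full_type_name
    return field.scalar_type


def field_to_proto(field: Field, prefix: str = "  ") -> str:
    # Same layout as the C++ format string: prefix, label, type, name, number.
    label = field.label + " " if field.label else ""
    return f"{prefix}{label}{field_type_name(field)} {field.name} = {field.number};"


def message_to_proto(name: str, fields: List[Field]) -> str:
    lines = [f"message {name} {{"]
    lines.extend(field_to_proto(f) for f in fields)
    lines.append("}")
    return "\n".join(lines)


person = message_to_proto("Person", [
    Field("name", 1, scalar_type="string", label="optional"),
    Field("ids", 2, scalar_type="int32", label="repeated"),
    Field("home", 3, full_type_name="pkg.Address"),
])
```

With a real descriptor, the values for name, number, label and type would come from the fields of each message in the parsed file descriptor rather than from hand-written Field objects.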

MafftCommandline and io.StringIO

I've been trying to use the Mafft alignment tool from Bio.Align.Applications. Currently, I've had success writing my sequence information out to temporary text files that are then read by MafftCommandline(). However, I'd like to avoid redundant steps as much as possible, so I've been trying to write to a memory file instead using io.StringIO(). This is where I've been having problems. I can't get MafftCommandline() to read internal files made by io.StringIO(). I've confirmed that the internal files are compatible with functions such as AlignIO.read(). The following is my test code:
from Bio.Align.Applications import MafftCommandline
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import io
from Bio import AlignIO
sequences1 = ["AGGGGC",
"AGGGC",
"AGGGGGC",
"AGGAGC",
"AGGGGG"]
longest_length = max(len(s) for s in sequences1)
padded_sequences = [s.ljust(longest_length, '-') for s in sequences1] #padded sequences used to test compatibilty with AlignIO
ioSeq = ''
for items in padded_sequences:
    ioSeq += '>unknown\n'
    ioSeq += items + '\n'
newC = io.StringIO(ioSeq)
cLoc = str(newC).strip()
cLocEdit = cLoc[1:len(cLoc) - 1] #create string to remove < and >
test1Handle = AlignIO.read(newC, "fasta")
#test1HandleString = AlignIO.read(cLocEdit, "fasta") #fails to interpret cLocEdit string
records = (SeqRecord(Seq(s)) for s in padded_sequences)
SeqIO.write(records, "msa_example.fasta", "fasta")
test1Handle1 = AlignIO.read("msa_example.fasta", "fasta") #alignIO same for both #demonstrates working AlignIO
in_file = '.../msa_example.fasta'
mafft_exe = '/usr/local/bin/mafft'
mafft_cline = MafftCommandline(mafft_exe, input=in_file) #have to change file path
mafft_cline1 = MafftCommandline(mafft_exe, input=cLocEdit) #fails to read string (same as AlignIO)
mafft_cline2 = MafftCommandline(mafft_exe, input=newC)
stdout, stderr = mafft_cline()
print(stdout) #corresponds to MafftCommandline with input file
stdout1, stderr1 = mafft_cline1()
print(stdout1) #corresponds to MafftCommandline with internal file
I get the following error messages:
ApplicationError: Non-zero return code 2 from '/usr/local/bin/mafft <_io.StringIO object at 0x10f439798>', message "/bin/sh: -c: line 0: syntax error near unexpected token `newline'"
I believe this is caused by the angle brackets ('<' and '>') in the string that gets passed as the file path.
ApplicationError: Non-zero return code 1 from '/usr/local/bin/mafft "_io.StringIO object at 0x10f439af8"', message '/usr/local/bin/mafft: Cannot open _io.StringIO object at 0x10f439af8.'
Attempting to remove the arrows by converting the file path to a string and indexing resulted in the above error.
Ultimately my goal is to reduce computation time. I hope to accomplish this by calling internal memory instead of writing out to a separate text file. Any advice or feedback regarding my goal is much appreciated. Thanks in advance.
"I can't get MafftCommandline() to read internal files made by io.StringIO()."
This is not surprising for a couple of reasons:
As you're aware, Biopython doesn't implement Mafft; it simply provides a convenient interface to set up a call to mafft in /usr/local/bin. The mafft executable runs as a separate process that does not have access to your Python program's internal memory, including your StringIO file.
The mafft program only works with an input file; it doesn't even allow stdin as a data source (though it does allow stdout as a data sink). So ultimately, there must be a file in the file system for mafft to open. Thus the need for your temporary file.
Perhaps tempfile.NamedTemporaryFile() or tempfile.mkstemp() might be a reasonable compromise.
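A minimal sketch of the tempfile.NamedTemporaryFile() route, using only the standard library. The helper name write_temp_fasta and the record names seq0, seq1, ... are made up for illustration. The resulting path is an ordinary string, so it could be passed as input= to MafftCommandline as in the question; that call is left out here so the example runs without mafft installed.

```python
import os
import tempfile


def write_temp_fasta(sequences):
    """Write sequences to a named temporary FASTA file and return its path."""
    # delete=False keeps the file on disk after it is closed, so that a
    # separate process can open it by path. The caller must remove it.
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
        for index, seq in enumerate(sequences):
            handle.write(f">seq{index}\n{seq}\n")
        return handle.name


sequences = ["AGGGGC", "AGGGC", "AGGGGGC"]
fasta_path = write_temp_fasta(sequences)
try:
    # An external tool would be run on fasta_path here.
    with open(fasta_path) as handle:
        contents = handle.read()
finally:
    # Clean up explicitly, since delete=False was used.
    os.remove(fasta_path)
```

This still touches the file system, but the file is short-lived and the naming and cleanup are handled for you.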

Changing how nodejs require() fetches files

I'm looking to monkey-patch require() to replace its file loading with my own function. I imagine that internally require(module_id) does something like:
1. Convert module_id into a file path
2. Load the file path as a string
3. Compile the string into a module object and set up the various globals correctly
I'm looking to replace step 2 without reimplementing steps 1 + 3. Looking at the public API, there's require() which does 1 - 3, and require.resolve() which does 1. Is there a way to isolate step 2 from step 3?
I've looked at the source of require mocking tools such as mockery -- all they seem to be doing is replacing require() with a function that intercepts certain calls and returns a user-supplied object, and passes on other calls to the native require() function.
For context, I'm trying to write a function require_at_commit(module_id, git_commit_id), which loads a module and any of that module's requires as they were at the given commit.
I want this function because I want to be able to write certain functions that a) rely on various parts of my codebase, and b) are guaranteed to not change as I evolve my codebase. I want to "freeze" my code at various points in time, so thought this might be an easy way of avoiding having to package 20 copies of my codebase (an alternative would be to have "my_code_v1": "git://..." in my package.json, but I feel like that would be bloated and slow with 20 versions).
Update:
So the source code for module loading is here: https://github.com/joyent/node/blob/master/lib/module.js. Specifically, to do something like this you would need to reimplement Module._load, which is pretty straightforward. However, there's a bigger obstacle: step 1, converting module_id into a file path, is actually harder than I thought, because resolveFilename needs to be able to call fs.exists() to know where to terminate its search. So I can't just substitute individual files; I have to substitute entire directories. That means it's probably easier to export the entire git revision to a directory and point require() at that directory than to override require().
Update 2:
Ended up using a different approach altogether... see answer I added below
You can use the require.extensions mechanism. This is how the coffee-script coffee command can load .coffee files without ever writing .js files to disk.
Here's how it works:
https://github.com/jashkenas/coffee-script/blob/1.6.2/lib/coffee-script/coffee-script.js#L20
loadFile = function(module, filename) {
  var raw, stripped;
  raw = fs.readFileSync(filename, 'utf8');
  stripped = raw.charCodeAt(0) === 0xFEFF ? raw.substring(1) : raw;
  return module._compile(compile(stripped, {
    filename: filename,
    literate: helpers.isLiterate(filename)
  }), filename);
};
if (require.extensions) {
  _ref = ['.coffee', '.litcoffee', '.md', '.coffee.md'];
  for (_i = 0, _len = _ref.length; _i < _len; _i++) {
    ext = _ref[_i];
    require.extensions[ext] = loadFile;
  }
}
Basically, assuming your modules have a set of well-known extensions, you should be able to use this pattern of a function that takes the module and filename, does whatever loading/transforming you need, and then returns an object that is the module.
This may or may not be sufficient to do what you are asking, but honestly from your question it sounds like you are off in the weeds somewhere far from the rest of the programming world (don't take that harshly, it's just my initial reaction).
So rather than mess with the node require() module, what I ended up doing is archiving the given commit I need to a folder. My code looks something like this:
# commit_id is the commit we want
# (note that if we don't need the whole repository,
# we can pass "commit_id path_to_folder_we_need")
#
# path is the path to the file you want to require starting from the repository root
# (ie 'lib/module.coffee')
#
# cb is called with (err, loaded_module)
#
require_at_commit = (commit_id, path, cb) ->
  dir = 'old_versions' #make sure this is in .gitignore!
  dir += '/' + commit_id
  do_require = -> cb null, require dir + '/' + path
  if not fs.existsSync(dir)
    fs.mkdirSync(dir)
    cmd = 'git archive ' + commit_id + ' | tar -x -C ' + dir
    child_process.exec cmd, (error) ->
      if error
        cb error
      else
        do_require()
  else
    do_require()

Python 3 C-API IO and File Execution

I am having some serious trouble getting a Python 2 based C++ engine to work in Python 3. I know the whole IO stack has changed, but everything I try ends in failure. Below are the old code (Python 2) and the new code (Python 3). I am hoping someone can help me figure out what I'm doing wrong. I am also using boost::python to manage the references.
The program is supposed to load a Python Object into memory via a map and then upon using the run function it then finds the file loaded in memory and runs it. I based my code off an example from the delta3d python manager, where they load in a file and run it immediately. I have not seen anything equivalent in Python3.
Python2 Code Begins here:
// what this does is first calls the Python C-API to load the file, then pass the returned
// PyObject* into handle, which takes reference and sets it as a boost::python::object.
// this takes care of all future referencing and dereferencing.
try {
    bp::object file_object(bp::handle<>(PyFile_FromString(fullPath(filename), "r")));
    loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_object));
}
catch (...)
{
    getExceptionFromPy();
}
Next I load the file from the std::map and attempt to execute it:
bp::object loaded_file = getLoadedFile(filename);
try
{
    PyRun_SimpleFile(PyFile_AsFile(loaded_file.ptr()), fullPath(filename));
}
catch (...)
{
    getExceptionFromPy();
}
Python3 Code Begins here: This is what I have so far based off some suggestions here... SO Question
Load:
PyObject *ioMod, *opened_file, *fd_obj;
ioMod = PyImport_ImportModule("io");
opened_file = PyObject_CallMethod(ioMod, "open", "ss", fullPath(filename), "r");
bp::handle<> h_open(opened_file);
bp::object file_obj(h_open);
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_obj));
Run:
bp::object loaded_file = getLoadedFile(filename);
int fd = PyObject_AsFileDescriptor(loaded_file.ptr());
PyObject* fileObj = PyFile_FromFd(fd,fullPath(filename),"r",-1,"", "\n","", 0);
FILE* f_open = _fdopen(fd,"r");
PyRun_SimpleFile( f_open, fullPath(filename) );
Lastly, the general state of the program at this point: the file gets loaded as a TextIOWrapper, and in the Run: section the fd that is returned is always 3. For some reason _fdopen can never open the FILE, which means I can't do something like PyRun_SimpleFile. The error itself is a debug assertion in _fdopen. Is there a better way to do all this? I really appreciate any help.
If you want to see the full program of the Python2 version it's on Github
So this question was pretty hard to understand and I'm sorry, but I found out my old code wasn't quite working as I expected. Here's what I wanted the code to do. Load the python file into memory, store it into a map and then at a later date execute that code in memory. I accomplished this a bit differently than I expected, but it makes a lot of sense now.
1. Open the file using ifstream and read it into a char buffer (see the code below)
2. Convert the char buffer into a boost::python::str
3. Execute the boost::python::str with boost::python::exec
4. Profit ???
Step 1)
vector<char> input;
ifstream file(fullPath(filename), ios::in);
if (!file.is_open())
{
    // set our error message here
    setCantFindFileError();
    input.push_back('\0');
    return input;
}
file >> std::noskipws;
copy(istream_iterator<char>(file), istream_iterator<char>(), back_inserter(input));
input.push_back('\n');
input.push_back('\0');
Step 2)
bp::str file_str(string(&input[0]));
loaded_files_.insert(std::make_pair(std::string(fullPath(filename)), file_str));
Step 3)
bp::str loaded_file = getLoadedFile(filename);
// Retrieve the main module
bp::object main = bp::import("__main__");
// Retrieve the main module's namespace
bp::object global(main.attr("__dict__"));
bp::exec(loaded_file, global, global);
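For comparison, the same load-now, run-later idea can be sketched in plain Python. This is only an illustration of the three steps, not the engine's code: the loaded_files dict is a stand-in for the loaded_files_ map, and load_file and run_file are hypothetical names. The source text is read once, kept in memory, and executed later with exec.

```python
import os
import tempfile

# Stand-in for the C++ loaded_files_ map: path -> source text held in memory.
loaded_files = {}


def load_file(path):
    """Steps 1 and 2: read the script and keep its source as a string."""
    with open(path, "r") as handle:
        loaded_files[path] = handle.read()


def run_file(path, namespace):
    """Step 3: execute the stored source in the given namespace."""
    code = compile(loaded_files[path], path, "exec")
    exec(code, namespace, namespace)


# Demo with a throwaway script file.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as script:
    script.write("answer = 6 * 7\n")
    script_path = script.name

load_file(script_path)
os.remove(script_path)  # the source now lives only in memory

namespace = {"__name__": "__main__"}
run_file(script_path, namespace)
```

Passing the original path to compile() keeps tracebacks pointing at the right file name even though the file itself is no longer read at run time.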
Full Code is located on github:

Resources