Stream Poco Zip Compression to Poco HTTPServerResponse

I would like to directly compress a directory into a Poco::HTTPServerResponse stream. However, downloading the zip file produced by the following code leads to a corrupt archive. I do know that the compression approach below works for locally created zip files, as I have successfully done that much. What am I missing, or is this simply not possible? (Poco v1.6.1)
std::string directory = "/tmp/data";
response.setStatusAndReason(HTTPResponse::HTTPStatus::HTTP_OK);
response.setKeepAlive(true);
response.setContentType("application/zip");
response.set("Content-Disposition","attachment; filename=\"data.zip\"");
Poco::Zip::Compress compress(response.send(),false);
compress.addRecursive(directory,
Poco::Zip::ZipCommon::CompressionMethod::CM_STORE,
Poco::Zip::ZipCommon::CompressionLevel::CL_MAXIMUM,
false, "data");
compress.close();

I use the same technique successfully, with only a slight difference: the compression method and the compression level (CM and CL).
compress.addFile( cacheFile, Poco::DateTime(), currentFile.GetName(), Poco::Zip::ZipCommon::CM_DEFLATE, Poco::Zip::ZipCommon::CL_SUPERFAST );
A zip file normally corresponds to the DEFLATE algorithm, so when unzipping, your file explorer/archive manager probably can't cope with it.
Either that, or it's pointless to use a MAXIMUM level on a STORE method (STORE does no compression by definition).
EDIT: Just tried it. Actually, it's because CM_STORE internally uses headers (probably some kind of tar). Once your files have been added to the zip stream and you close it, Poco tries to order the headers and resets the position of the output stream back to the start to write them.
Since that cannot be done on the HTTP output stream (your bytes are already sent!), it fails.
Switching to CM_DEFLATE should fix your problem.
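For reference, a minimal sketch of the suggested change, keeping the question's code as it is and only swapping the compression method; this is untested here, so treat it as an illustration of the answer rather than a verified fix:
// ... same response headers as in the question ...
// Per the explanation above, CM_DEFLATE avoids the seek back into the
// non-seekable HTTP output stream that the CM_STORE path needs on close().
Poco::Zip::Compress compress(response.send(), false);
compress.addRecursive(directory,
    Poco::Zip::ZipCommon::CompressionMethod::CM_DEFLATE,
    Poco::Zip::ZipCommon::CompressionLevel::CL_MAXIMUM,
    false, "data");
compress.close();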

Related

Is it possible to download a file nested in a zip file, without downloading the entire zip file?

Is it possible to download a file nested in a zip file, without downloading the entire zip archive?
For example from a url that could look like:
https://www.any.com/zipfile.zip?dir1\dir2\ZippedFileName.txt
Depending on if you are asking whether there is a simple way of implementing this on the server-side or a way of using standard protocols so you can do it from the client-side, there are different answers:
Doing it with the server's intentional support
Optimally, you implement a handler on the server that accepts a query string on any file download, similar to your suggestion (I would, however, include a variable name, for example: ?download_partial=dir1/dir2/file). Then the server can just extract the file from the ZIP archive and serve only that (maybe via a compressed stream if the file is large).
If this is the path you are going and you update the question with the technology used on the server, someone may be able to answer with suggested code.
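Since the answer deliberately leaves the server technology open, here is only a hedged, framework-agnostic Python sketch of the idea; read_partial and the paths are made-up examples:
import zipfile

def read_partial(zip_path, member):
    # Pull just one member (e.g. "dir1/dir2/ZippedFileName.txt") out of the
    # archive and return its bytes, ready to hand to whatever HTTP framework
    # is actually in use (optionally recompressed for transfer).
    with zipfile.ZipFile(zip_path) as zf:
        return zf.read(member)

# body = read_partial("/srv/files/zipfile.zip", "dir1/dir2/ZippedFileName.txt")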
But on with the slightly more fun way...
Doing it opportunistically if the server cooperates a little
There are two things that conspire to make this a bit feasible, but only worth it if the ZIP file is massive in comparison to the file you want from it.
ZIP files have a directory that says where in the archive each file is. This directory is present at the end of the archive.
HTTP servers optionally allow download of only a range of a response.
So, if we issue a HEAD request for the URL of the ZIP file (HEAD /path/file.zip), we may get back an Accept-Ranges: bytes header and a Content-Length header that tells us the length of the ZIP file. If we have both, we can issue a GET request with a header such as Range: bytes=1000000-1024000, which gives us just that part of the file.
The directory of files is towards the end of the archive, so if we request a reasonable block from the end of the file then we will likely get the central directory included. We then look up the file we want, and know where it is located in the large ZIP file.
We can then request just that range from the server, and decompress the result...
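A hedged sketch of those steps in Python, using only the standard library; the URL is a placeholder, ZIP64 archives are ignored, and the final decompression step is left as a comment:
import struct
import urllib.request

url = "https://www.any.com/zipfile.zip"

# 1. HEAD request: confirm range support and learn the archive size.
head = urllib.request.urlopen(urllib.request.Request(url, method="HEAD"))
assert head.headers.get("Accept-Ranges") == "bytes"
total = int(head.headers["Content-Length"])

# 2. Fetch a reasonable block from the end of the file, which should contain
#    the end-of-central-directory record and the central directory itself.
start = max(0, total - 64 * 1024)
req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{total - 1}"})
tail = urllib.request.urlopen(req).read()

# 3. Locate the end-of-central-directory record (signature PK\x05\x06) and
#    read the central directory's size and offset from it.
eocd = tail.rfind(b"PK\x05\x06")
cd_size, cd_offset = struct.unpack("<II", tail[eocd + 12:eocd + 20])

# 4. The central directory entries give each member's local header offset and
#    compressed size, so one more ranged GET fetches just the bytes of the
#    wanted file, which can then be decompressed locally.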

How do I decompress the diagram data in a .drawio file with node.js and zlib?

Diagrams.net, previously and still more widely known as draw.io, is a popular tool for drawing diagrams of various kinds. It stores diagrams in an XML-based format that uses the file ending .drawio. The file content has the structure:
<mxfile {...}>
<diagram {...}>
{the-actual-diagram-content}
</diagram>
</mxfile>
According to the documentation page Extracting the XML from mxfiles, the string {the-actual-diagram-content} contains the actual diagram data in compressed format, "compressed with the standard deflate process". I'd like to decompress this data in my node.js app to parse and modify it.
I have found an older, similar question on Stack Overflow that wants the same thing, but uses the libraries "atob" and, later, "pako". I'd like to achieve the same with the more standard "zlib" node.js module, which, if this really is "the standard deflate process", should be possible.
However, all my attempts to "inflate" the compressed string fail. I have mostly tried variations of the following code, with different encodings ('base64', 'utf8') and methods ('inflateSync', 'unzipSync', 'gunzipSync'):
zlib.inflateSync(Buffer.from(string, 'base64')).toString();
All attempts fail with the error "Error: incorrect header check". I read this as "dude, seriously, you're using the wrong unzip algorithm for this". However, I cannot figure out what the right algorithm or settings are.
The sample string I'd like to decode is the following. Using the jgraph inflate/deflate tool, this decompresses perfectly fine. However, the settings used there, "URL Encode", "Deflate", "Base64", sound to me like exactly what I am trying.
zVdbk6I6EP41Vp3z4BYXL/Ao3nV0VEYZfQsQITOBIEQu/voNAgrqrHtOzVbti5X+0t0kX/eXxJrYdeKhDzx7RkyIawJnxjWxVxOEZktmvymQZIAoNzLA8pGZQfwVUNEJ5iCXo0dkwqDiSAnBFHlV0CCuCw1awYDvk6jqtie4+lUPWPAOUA2A71ENmdTOUKnJXfERRJZdfJnn8hkHFM45ENjAJFEJEvs1sesTQrORE3chTrkreMniBl/MXhbmQ5f+TkCXT0gX48NHW1CSsVHXjta0nmcJAT7mG+7kq6VJQYFPjq4J0yxcTVQiG1GoesBIZyNWc4bZ1MHM4tkwoD75vFDFNqnkX4A+hfGXS+cvhLBGgsSB1E+YSxHQyjlMbuzoWhK+4NkulaPwA3kXWJfUV6LYIOfqP/Am3PGm/IW8ia3mX8abeMebSdIYesceNJkOcxNinUT9K6CcATaRsoOYWM8EAp92UsUz3CXu2c01b5AS42xygNLVn8sDMLJcNjYYtdBnAAY6xAowPq1zHbsEE/+ap1pbFuMn72Vjmxo/moXZi8uTvaSwYkTfi+WwcSmKWdeg1ChiMqJSdn7dFIxMcvQN+Fz8jDgL0mfN/mWT1bkfXFNqVxqtXjSVDzGgKKyu9VFX5ekXBLFdXHIL0o3wpZvGzPaYR5UPv5tEolhNJIg3iTISfpGocCT7fQArPmchXHj5/9po3GnjXhQYs4sPPj9OQOBlt+EexWmbPjhffEIBBfo5ddpY+Q1aQthVDsv2b51IX8v+voNKx9CjU+ibmqjOV2t/sf98SZvPS8qeBV46DCh0DYT/CTfzlR7MrF5PmC8SIz5MdfnN4UVrzlmdlql4q46i66/m3uq7mtdTvZ3jrFQ0Zp9S4EjXoqUgK+uX5cTtbS3TmDV36a4E5bhgxA0mW3u1w5PpSRuph+6SIW9HNIC0cewe9e0cxsJyA4VOe8v2qyznChq33wP8Ee3DE97iYWvIqXY74k/4oIKDGQta/LnmY9WBRjsaAaN90jSwWrNYHIZr5vGxZt8eeMTHLyCQArwGfZ2OY3Uh0tZouLKlYLya0FVfjjZyM3ZM5VVDn6d4ISvcECB5rYSOOEpCJA90I54dtp0X7Mubo77bACKoZiNgz0vFOxmKLr3tk7QLlNZorbYnQ6HhBtPe20Hy2QJUcR8vwlktvaUHvPc+VDYn1yFm0kY4lJfSXBxN9d3rsKmpO3VuyWa4kLr/PpfZQ9kAjEnUKd6e3I3U+MJGJL1v6vL5hDdROQOMPeAWl8udlP+kDMUHMhS/S4bVx0i98Q0yZOb1BZ25X/+GiP2f
What am I doing wrong?
Use zlib.inflateRawSync(). What you have there is a raw deflate stream, not a zlib stream.
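A short sketch of the full round trip this implies, assuming (as the jgraph tool's "URL Encode" / "Deflate" / "Base64" settings suggest) that the inflated bytes are percent-encoded XML; decodeDiagram is a made-up helper name:
const zlib = require('zlib');

function decodeDiagram(compressed) {
  // base64 -> raw DEFLATE -> percent-encoded XML -> plain XML
  const inflated = zlib.inflateRawSync(Buffer.from(compressed, 'base64'));
  return decodeURIComponent(inflated.toString('utf8'));
}

// const xml = decodeDiagram('zVdbk6I6EP41...');  // the sample string above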

Heroku cannot store files temporarily

I am writing a nodejs app which works with fonts. One action it performs is to download a .ttf font from the web, convert it to a base64 string, delete the .ttf, and use that string for other things. I need the .ttf file stored somewhere so I can convert it. This process takes about 1-2 seconds. I know Heroku has an ephemeral file system, but I only need to store the file for a very short time. Is there any way I can store my files? Using fs.writeFile currently returns this error:
Error: EROFS: read-only file system, open '/app\test.txt']
One idea: make an action that fetches the font, converts it, and stores the result in a global variable before it is used by another task.
When you want to use it again, check whether that global variable has already been filled with the font buffer (see the sketch below).
Reference: Singleton pattern
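A hedged sketch of that idea, keeping everything in memory so nothing has to touch the dyno's filesystem at all; the function name and URL handling are made-up examples:
const https = require('https');

let cachedFontBase64 = null; // module-level cache, the "singleton"

function getFontBase64(url) {
  return new Promise((resolve, reject) => {
    if (cachedFontBase64) return resolve(cachedFontBase64); // already filled
    https.get(url, (res) => {
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => {
        cachedFontBase64 = Buffer.concat(chunks).toString('base64');
        resolve(cachedFontBase64);
      });
    }).on('error', reject);
  });
}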
I didn't know that you could store stuff in the /tmp directory. It is working for the moment, but according to the dyno/ephemeral file system documentation it gets cleaned frequently, so I don't know whether it may cause other problems in the long run.
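For completeness, a minimal sketch of that /tmp approach; fontBuffer stands in for the downloaded .ttf bytes, and nothing here is Heroku-specific API:
const fs = require('fs');
const os = require('os');
const path = require('path');

function ttfToBase64ViaTmp(fontBuffer) {
  const tmpFile = path.join(os.tmpdir(), 'font.ttf'); // usually /tmp on Heroku
  fs.writeFileSync(tmpFile, fontBuffer);              // the short-lived .ttf copy
  const base64 = fs.readFileSync(tmpFile).toString('base64');
  fs.unlinkSync(tmpFile);                             // clean up straight away
  return base64;
}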

What's the best practice for watching for finished video encodes?

TL;DR
I'm unsure of the best way to recognise when encoding videos have finished, using Chokidar. Given the different ways encoders build their video files, what's the best way to accommodate all of them?
Context
I've developed a workflow for our office that allows us to quickly queue encode jobs in Adobe Premiere Pro. To queue them locally, I made use of Premiere's CEP API. I can easily send a job to Adobe Media Encoder (on the same machine) and it will automatically encode the video file to the relative project directory. This works great.
To queue encode jobs onto LAN workstations, I've taken a different approach, as the CEP API doesn't allow for any extensibility beyond the local machine. Instead I made use of Adobe Media Encoder's watch folders to detect added Premiere project files to a subfolder on our NAS (everything is on the NAS). This works great too.
Unfortunately, I'm unaware of a way for the queued encodes to be output to the relative project directory in the same way queuing locally does. I'm trying to find a way to do this by watching a common directory and moving finished files.
Since each video filename I'm queuing has this structure:
"projectName_sequenceName_givenName_renderType.mp4/.mxf", I've been able to move the files easily using this information. However, I'm struggling to accommodate the different methods that different encoding processes use. Different encoders - X264, MainConcept H264, etc. - write to disk differently.
Using Chokidar, I watched how different encoders build their files:
Example #1:
If I start a DNxHR MXF encode, it will first create the final .MXF container and then fill it. When it finishes, it writes the sidecar .XMP file. If the encode fails or is cancelled, the sidecar file is not written.
Example #2:
If I start a TMPG x264 encode, it will first create the final .mp4 container, then create a temporary file: '.mp4_00_' appended. It will then write some initial metadata to the final container, start encoding to the '.mp4_00_' and depending on file size, create additional temporary files, '_.mp4_01_', etc. Finally it writes some additional information to the container, then to the temporary files and then deletes the temporary files. If the encode fails or is cancelled, the files are deleted.
Example #3:
If I start a MainConcept H264 encode (Premiere's default), it will first create the audio temp file, in this case '.aac', and then another temp file, '.mkv.md0'. Halfway through encoding, it will create the video container '.m4v' and start encoding to that, create some more temporary files ('.md7'/'.md6'), create the final container '.mp4' along with 'sbjo.tmp', copy the '.mkv' file and '.aac' into the '.mp4' container, add a '00' file, very quickly delete it, and then finish writing the '.mp4' metadata. Some of this happens very quickly and Chokidar has not always picked it up, unless the encoder itself is being inconsistent.
These are the three encode types I've observed, and they're the three we need and use. I suppose I could watch each of them differently, but my concern is that if we ever switched encoders I'd have to rewrite the code to accommodate them. The watch-folder feature in Adobe Media Encoder recognises when files have finished encoding before attempting to use them. I haven't tested every format, but a good number of them. Is Media Encoder accommodating each unique encoding process? Is it simply polling locked files? Or is there something I'm missing?
The code I have currently works fine for DNxHR MXFs, provided they don't fail or get cancelled. It struggles with the H264/x264 examples: since the container file is created and then left untouched while the encoder writes to the temporary files, Chokidar registers 'add', and because the file is locked, the move fails. Obviously this works fine when simply copying or moving an already finished video file.
const chokidar = require('chokidar');

const watcher = chokidar.watch(['Z:/NETWORKRENDER/Finished/*.{mp4,mxf}'], {
  persistent: true,
  // On start, also work on existing files
  ignoreInitial: false,
  followSymlinks: true,
  interval: 1000,
  // Hold back 'add'/'change' events until the file size has stopped changing
  awaitWriteFinish: {
    stabilityThreshold: 5000,
    pollInterval: 20000
  },
});

Deleting original files as you go along adding files to a TAR file

I've written a small server function which is intended to tar together a bunch of locally downloaded files, then delete the originals. It looks something like this:
import os
import tarfile

with tarfile.open(archive_filename, "w:gz") as tar:
    for pb in designated_objects:
        bucket.download_file(pb.key, pb.key)  # fetch the object to a local file
        tar.add(pb.key)                       # add that file to the archive
        os.remove(pb.key)                     # delete the local copy straight away
My expectation is that this will generate a tar file containing all of my desired data, leaving an otherwise empty directory. The idea here is that I would like to minimize my disk usage as much as possible. However, I'm unsure whether deleting a file before the tar file has finished being generated (as done here) is allowed.
Will this code work as expected?
If it will not, is there something akin to an append mode that will?
As expected, the original files appear on disk and are then deleted. However, the behavior of the archive is unusual: when this code block is run, no archive is generated. In fact, this code block does nothing at all (except delete your files).
I find this behavior particularly unusual and surprising given that putting just a pass inside the with statement (as in the code that follows) will actually write an empty archive to disk. So in a sense, the given code block does even less than nothing!
with tarfile.open('archive_filename.xy.gz', "w:gz") as tar:
    pass
For reference, this behavior is what I get with Python 3.6. Behavior with other versions of Python may differ.
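To see what the pass version actually produces, a quick check along these lines (reusing the filename from the snippet above) can be run afterwards; it should open cleanly and list no members:
import tarfile

with tarfile.open('archive_filename.xy.gz', "w:gz") as tar:
    pass

with tarfile.open('archive_filename.xy.gz', "r:gz") as tar:
    print(tar.getnames())  # expected: [] - a valid but empty archive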
