Deal with ZIP-Buffer in node.js - node.js

I am building the server part of a webapp, using node.js. This involves getting data from thetvdb.com (API documentation of thetvdb).
The data comes as a zip file. HTTP download is no problem, however, parsing the file is. I actually never save the file, but just keep it in memory, as suggested in How to download and unzip a zip file in memory in NodeJs?
I have a buffer with valid data (same data as when I download the file with browser/curl...). However, adm-zip (I also tired other zip libraries, some suggest invalid zip length) can't open it. It does not show an error, but the zipEntries in the end have length of 0.
When I write out the buffer to the filesystem and open it with gui or cli tools it works.
I can't give a direkt link to the file, as it would involve my API key, however I re-uploaded it here.

I think I might have an answer for you:
Don't rely on npm install. I just ran the example that you linked to with the zip file you provided, and I get an output of "0".
I saw a comment on that other StackOverflow page, saying that the version of adm-zip on npm is not up to date. I grabbed a fresh copy of adm-zip from github, overwrote the one in my node_modules folder and reran the example code and now get the following:
...
<Actor>
<id>237811</id>
<Image>actors/237811.jpg</Image>
<Name>Peter Pratt</Name>
<Role>The Master</Role>
<SortOrder>3</SortOrder>
</Actor>
<Actor>
<id>23780s/237811.jpg</Image>
Give that a shot!

Related

Is it possible to download a file nested in a zip file, without downloading the entire zip file?

Is it possible to download a file nested in a zip file, without downloading the entire zip archive?
For example from a url that could look like:
https://www.any.com/zipfile.zip?dir1\dir2\ZippedFileName.txt
Depending on if you are asking whether there is a simple way of implementing this on the server-side or a way of using standard protocols so you can do it from the client-side, there are different answers:
Doing it with the server's intentional support
Optimally, you implement a handler on the server that accepts a query string to any file download similar to your suggestion (I would however include a variable name, example: ?download_partial=dir1/dir2/file
). Then the server can just extract the file from the ZIP archive and serve just that (maybe via a compressed stream if the file is large).
If this is the path you are going and you update the question with the technology used on the server, someone may be able to answer with suggested code.
But on with the slightly more fun way...
Doing it opportunistically if the server cooperates a little
There are two things that conspire to make this a bit feasible, but only worth it if the ZIP file is massive in comparison to the file you want from it.
ZIP files have a directory that says where in the archive each file is. This directory is present at the end of the archive.
HTTP servers optionally allow download of only a range of a response.
So, if we issue a HEAD request for the URL of the ZIP file: HEAD /path/file.zip we may get back a header Accept-Ranges: bytes and a header Content-Length that tells us the length of the ZIP file. If we have those then we can issue a GET request with the header (for example) Range: bytes=1000000-1024000 which would give us part of the file.
The directory of files is towards the end of the archive, so if we request a reasonable block from the end of the file then we will likely get the central directory included. We then look up the file we want, and know where it is located in the large ZIP file.
We can then request just that range from the server, and decompress the result...

How to debug a .zip generator algorithm?

I'm trying to implement a minimal version of .zip file generation following this spec: https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
I don't actually need compression, I just need a way to string together a bunch of files into a single widely adopted archive format with the capability to stream in file data while streaming out the zip.
So far I'm partially successful, 7-zip and windows built in zip extractor can extract them just fine, winrar and macos built in zip extractor are giving me corrupted archive errors.
I can't for the life of me find the actual problem(s?) though, as far as I can tell the .zips are built 100% to the specification but the spec is a big wall of text and with swooping changes from one zip file version to the next along with legacy attributes taking on new functions it is tad confusing.
Does anyone know of an extraction tool that can give me more specific errors than just "archive is corrupt"?
Or perhaps a zip generation utility where I can pick and choose between all the different ways of building a zip file so I can go and compare the results byte by byte?
Does anyone know of an extraction tool that can give me more specific errors than just "archive is corrupt"?
The unzipada tool # Zip-Ada project will do exactly that
Testing archive ko.zip
raised ZIP.ARCHIVE_CORRUPTED : Bad (or no) end-of-central-directory
[C:\Ada\za\unzipada.exe]
Zip.Find_First_Offset at zip.adb:589
Unzip.Extract at unzip.adb:667
Unzipada at unzipada.adb:259
By browsing the code (like: zip.adb, line 589) you can narrow down the corrupt archive issues. For building the tool, download the sources and follow the readme.txt file. There are also pre-built binaries for Windows.

Blank excel spreadsheet when unzipping file

I'm researching a project on software defined networking discussed on knowledgedefinednetworking.org and they provide several datasets. Two of the three datasets are unzipping just fine (100K.csk.gz & train.csv.gz), but benchmark.csv.gz unzips into a new spreadsheet but still uses 3.3GB of memory. I'm using WinZip to unzip the files and they're all going into the same folder, but only benchmark is coming back empty. Is this a common issue or is there something potentially wrong with the download of the file that causes it to unzip empty?
"Is this a common issue" <-- Simple answer : Yes, it is a coomn issue.
"or is there something potentially wrong with the download of the file that causes it to unzip empty?" <-- the download went ok.. I tried to save it as excel. went ok. The files (file1 file2) is not blank.
Note : try to use 7zip as your file uncompressor.
Hope that solves... (:

Understanding how libzip works

I have started working with the libzip library today. But I do not understand the principle how libzip works.
My focus is on zipping a directory with all the files and dirs within
into a zip-file.
Therefore, I started with zip_open(), then I read the directory
contents and add all the dirs with zip_dir_add() to the archive.
After that, I closed the zip-file with zip_close(). Everything was
fine. The next step should be to add all the files to the archive with
zip_file_add(). But it doesn't work. The last step closing the file
fails.
OK, I forgot to create a zip_source to get this done. I added a
statement a line before to get this source (zip_source_file()). But
still it doesn't work.
What is wrong in my thinking? Do I have to fopen() and fclose() the file on the filesystem also?
And what is the difference between zip_source_file() and zip_source_filep()?
Do I have to fopen() and fclose() the file on the filesystem also?
No, you can just use zip_source_file().
From your comments I think you have the right general idea, but there is probably some detail that is making it fail. Make sure you perform all the error checking the documentation suggests after each libzip call so you can get more information about what is causing it to fail.
You could also compare your code with https://gist.github.com/clalancette/bb5069a09c609e2d33c9858fcc6e170e

Artefact folder structure does not contain empty directories

I'm trying to store whole the output of my build, this includes some empty folders. These aren't included by the artefact mechanism in teamcity:
What doesn't work:
OAR\=> OAR.zip
OAR->OAR.zip
OAR
Inside of OAR i have a folder structure that needs to be stored. I know i could put a placeholder file in each but that is not the answer i'm after. Otherwise ill have to zip it myself?
Unfortunately TeamCity, by design, searches for files and uploads them as artifacts which means that empty folders are never included. Given the open and very old issue in the TeamCity tracker I doubt they are going to fix it any time soon.
I would recommend zipping the folder yourself, that is the approach we have taken. How you implement that depends on the build technology you are using. For example, if you are building using Nant you could add the zip task to your build, there are similar options for MSBuild and Ant.
If you don't want to rely on the build performing the zip I would recommend installing 7zip on your build agents and using the command line to perform the zip. Just remember if you want 7zip to include empty directories use * as the wildcard rather than *. * like so:
7z a -r OAR.zip *
Technically you could use powershell to do the zipping, which would be better than having to install something on your agents. I haven't tried this option myself.
Apologies for not linking all my references above. Apparently, and understandably so, I need at least 10 reputation to post more than 2 links.

Resources