Is it possible to download just part of a ZIP archive (e.g. one file)? [closed] - zip

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 1 year ago.
Is there a way by which I can download only a part of a .rar or .zip file without downloading the whole file?
There is a ZIP file containing files A, B, C, and D.
I only need A. Can I somehow tweak the download to download only A or if possible extract the file in the server itself and get A only?

The trick is to do what Sergio suggests without doing it manually. This is easy if you mount the ZIP file via an HTTP-backed virtual filesystem and then use the standard unzip command on it. This way the unzip utility's I/O calls are translated to HTTP range GETs, which means only the chunks of the ZIP file that you want get transferred over the network.
Here's an example for Linux using HTTPFS, a very lightweight virtual filesystem (it uses FUSE). There are similar tools for Windows.
Get/build httpfs:
$ wget http://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.10
$ mv 1.06.07.10 httpfs_1.06.07.10.tar.bz2
$ tar -xjf httpfs_1.06.07.10.tar.bz2
$ rm httpfs
$ ./make_httpfs
Mount a remote ZIP file and extract one file from it:
$ mkdir mount_pt
$ sudo ./httpfs http://server.com/zipfile.zip mount_pt
$ sudo ls mount_pt
zipfile.zip
$ sudo unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt
Of course, you can also use tools other than the command-line unzip. (I need sudo because that is apparently how FUSE is set up on my machine; you shouldn't need it.)
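The same idea can be sketched without FUSE: give a ZIP library a file-like object whose reads become range requests. This is a minimal Python sketch, not how httpfs is implemented. RangeReader and http_fetch are hypothetical names introduced here, and the sketch assumes the server honours the Range header (answers with status 206). The demo uses an in-memory archive in place of a remote file so it runs offline.

```python
import io
import os
import urllib.request
import zipfile


class RangeReader(io.RawIOBase):
    """Seekable read-only file whose reads are served by fetch(start, end)."""

    def __init__(self, size, fetch):
        super().__init__()
        self._size = size          # total size of the remote file in bytes
        self._fetch = fetch        # callable returning bytes start..end inclusive
        self._pos = 0
        self.bytes_fetched = 0     # counter, only to show how little is transferred

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        base = {io.SEEK_SET: 0, io.SEEK_CUR: self._pos, io.SEEK_END: self._size}[whence]
        if base + offset < 0:
            raise OSError("negative seek position")
        self._pos = base + offset
        return self._pos

    def readinto(self, buf):
        n = min(len(buf), self._size - self._pos)
        if n <= 0:
            return 0               # end of file
        data = self._fetch(self._pos, self._pos + n - 1)
        buf[:len(data)] = data
        self._pos += len(data)
        self.bytes_fetched += len(data)
        return len(data)


def http_fetch(url):
    """Build a fetch(start, end) callable that issues HTTP range GETs (assumed helper)."""
    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise OSError("server ignored the Range header")
            return resp.read()
    return fetch


# Offline demo: an in-memory archive stands in for the remote ZIP file.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w", zipfile.ZIP_STORED) as z:
    z.writestr("A.txt", b"only this file is wanted")
    for name in ("B.bin", "C.bin", "D.bin"):
        z.writestr(name, os.urandom(50_000))
blob = _buf.getvalue()

reader = RangeReader(len(blob), lambda s, e: blob[s:e + 1])
with zipfile.ZipFile(reader) as z:
    wanted = z.read("A.txt")
print(len(blob), reader.bytes_fetched)
```

For a real URL you would pass http_fetch(url) as the fetch argument and take the size from the Content-Length of a HEAD request. Every read is a separate request here, so a real version would probably want to cache or batch reads.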

In a way, yes, you can.
The ZIP file format defines a "central directory". Basically, this is a table that stores which files are in the archive and what their offsets are.
So, using the HTTP Range request header, you could download part of the file from the end (the central directory is the last thing in a ZIP file) and try to identify the central directory in it. If you succeed, then you know the file list and offsets, so you can proceed to fetch those chunks separately and decompress them yourself.
This approach is quite error-prone and is not guaranteed to work. But so is hacking in general :-)
Another possible approach would be to build a custom server for that (see pst's answer for more details).
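To make the central-directory step concrete, here is a rough Python sketch of locating the end-of-central-directory record in the downloaded tail and listing each entry's offset. The function names are made up for this example. It assumes a plain archive (no Zip64, no archive comment containing the signature, UTF-8 or ASCII names), and the "download" is simulated by slicing an in-memory archive.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"   # end-of-central-directory signature
CDH_SIG = b"PK\x01\x02"    # central-directory file header signature


def find_eocd(tail):
    """Locate the EOCD record in the tail bytes; return (cd_size, cd_offset, count)."""
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or pos + 22 > len(tail):
        raise ValueError("end-of-central-directory record not found")
    fields = struct.unpack("<4s4H2LH", tail[pos:pos + 22])
    count, cd_size, cd_offset = fields[4], fields[5], fields[6]
    return cd_size, cd_offset, count


def parse_central_directory(cd):
    """Map each file name to (local_header_offset, compressed_size, method)."""
    entries = {}
    pos = 0
    while pos + 46 <= len(cd) and cd[pos:pos + 4] == CDH_SIG:
        f = struct.unpack("<4s6H3L5H2L", cd[pos:pos + 46])
        method, csize = f[4], f[8]
        name_len, extra_len, comment_len, offset = f[10], f[11], f[12], f[16]
        name = cd[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        entries[name] = (offset, csize, method)
        pos += 46 + name_len + extra_len + comment_len
    return entries


# Demo on an in-memory archive; slicing blob stands in for range requests.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w", zipfile.ZIP_DEFLATED) as z:
    for name in ("A.txt", "B.txt", "C.txt", "D.txt"):
        z.writestr(name, name.encode() * 500)
blob = _buf.getvalue()

cd_size, cd_offset, count = find_eocd(blob[-64:])            # "download" the tail
entries = parse_central_directory(blob[cd_offset:cd_offset + cd_size])
print(count, entries["A.txt"])
```

Each offset points at a local file header (30 fixed bytes followed by the name and extra field), and the compressed data comes right after it. For method 8 (deflate) the data is a raw deflate stream, which zlib.decompressobj(-15) can inflate.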

There are several ways for a normal person to download an individual file from a compressed ZIP file; unfortunately, they aren't common knowledge. There are some open-source tools and online web services, including:
Windows: Iczelion's HTTP Zip Downloader (open-source) (that I've used for over 10 years!)
Linux: partial-zip (open-source)
Online: wobzip.org (closed-source)

You can arrange for your file to appear at the end of the ZIP file.
Download 100k:
$ curl -r -100000 https://www.keepassx.org/releases/2.0.2/KeePassX-2.0.2.zip -o tail.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  84739      0  0:00:01  0:00:01 --:--:-- 84817
Check what files we did get:
$ unzip -t tail.zip
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
testing: KeePassX-2.0.2/share/translations/keepassx_uk.qm OK
testing: KeePassX-2.0.2/share/translations/keepassx_zh_CN.qm OK
testing: KeePassX-2.0.2/share/translations/keepassx_zh_TW.qm OK
testing: KeePassX-2.0.2/zlib1.dll OK
At least one error was detected in tail.zip.
Then extract the last file:
$ unzip tail.zip KeePassX-2.0.2/zlib1.dll
Archive: tail.zip
error [tail.zip]: missing 7751495 bytes in zipfile
(attempting to process anyway)
inflating: KeePassX-2.0.2/zlib1.dll

I think Sergio Tulentsev's idea is brilliant.
However, if there is control over the server -- e.g., custom code can be deployed -- then it is a rather trivial operation (in the scheme of things :) to map/handle a request, extract the relevant portion of the ZIP archive, and send the data back in the HTTP stream.
The request might look like:
http://foo.bar/myfile.zip_a.jpeg
Which would mean extract -- and return -- "a.jpeg" from "myfile.zip".
(I intentionally chose this silly format so that browsers would likely choose "myfile.zip_a.jpeg" as the name in the download dialog when it appears.)
Of course, how this is implemented depends on the server/language/framework and there may already be existing solutions that support a similar operation (but I know not).
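As a rough illustration of the server-side idea, here is a minimal Python sketch of the two pieces such a handler would need: splitting the request name and reading one member. split_request and read_member are hypothetical names, the ".zip_" separator simply follows the example URL above, and the sketch assumes the archive name itself does not contain ".zip_". How it is wired into a real web server is left out.

```python
import io
import zipfile


def split_request(name):
    """Split 'myfile.zip_a.jpeg' into ('myfile.zip', 'a.jpeg')."""
    archive, sep, member = name.partition(".zip_")
    if not sep or not member:
        raise ValueError(f"not an archive_member request: {name!r}")
    return archive + ".zip", member


def read_member(archive_file, member):
    """Return the decompressed bytes of one member of a ZIP archive."""
    with zipfile.ZipFile(archive_file) as z:
        return z.read(member)   # raises KeyError if the member does not exist


# Demo with an in-memory archive standing in for myfile.zip on the server.
_buf = io.BytesIO()
with zipfile.ZipFile(_buf, "w") as z:
    z.writestr("a.jpeg", b"fake jpeg bytes")
    z.writestr("b.jpeg", b"other bytes")

archive_name, member_name = split_request("myfile.zip_a.jpeg")
payload = read_member(io.BytesIO(_buf.getvalue()), member_name)
print(archive_name, member_name, len(payload))
```

A real handler would also have to validate the archive name against the directory it serves from, so that a request cannot reach files outside it.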

Based on the good input, I have written a code snippet in PowerShell to show how it could work:
# demo code downloading a single DLL file from an online ZIP archive
# and extracting the DLL into memory to mount it finally to the main process.
cls
Remove-Variable * -ea 0
# definition for the ZIP archive, the file to be extracted and the checksum:
$url = 'https://github.com/sshnet/SSH.NET/releases/download/2020.0.1/SSH.NET-2020.0.1-bin.zip'
$sub = 'net40/Renci.SshNet.dll'
$md5 = '5B1AF51340F333CD8A49376B13AFCF9C'
# prepare HTTP client:
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$client = [System.Net.Http.HttpClient]::new($handler)
# get the length of the ZIP archive:
$req = [System.Net.HttpWebRequest]::Create($url)
$req.Method = 'HEAD'
$length = $req.GetResponse().ContentLength
$zip = [byte[]]::new($length)
# get the last 10k:
# how to get the correct length of the central ZIP directory here?
$start = $length-10kb
$end = $length-1
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$last10kb = $result.content.ReadAsByteArrayAsync().Result
$last10kb.CopyTo($zip, $start)
# get the block containing the DLL file:
# how to get the exact file-offset from the ZIP directory?
$start = $length-3537kb
$end = $length-3201kb
$client.DefaultRequestHeaders.Clear()
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$block = $result.content.ReadAsByteArrayAsync().Result
$block.CopyTo($zip, $start)
# extract the DLL file from archive:
Add-Type -AssemblyName System.IO.Compression
$stream = [System.IO.Memorystream]::new()
$stream.Write($zip,0,$zip.Length)
$archive = [System.IO.Compression.ZipArchive]::new($stream)
$entry = $archive.GetEntry($sub)
$bytes = [byte[]]::new($entry.Length)
[void]$entry.Open().Read($bytes, 0, $bytes.Length)
# check MD5:
$prov = [Security.Cryptography.MD5CryptoServiceProvider]::new().ComputeHash($bytes)
$hash = [string]::Concat($prov.foreach{$_.ToString("x2")})
if ($hash -ne $md5) {write-host 'dll has wrong checksum.' -f y ;break}
# load the DLL:
[void][System.Reflection.Assembly]::Load($bytes)
# use the single demo-call from the DLL:
$test = [Renci.SshNet.NoneAuthenticationMethod]::new('test')
'done.'

Related

The system cannot find the file specified - WinError 2

Upon looping a directory to delete txt files ONLY - a message is returned indicating The System cannot find the file specified: 'File.txt'.
I've made sure the txt files that I'm attempting to delete exist in the directory I'm looping. I've also checked my code and to make sure it can see my files by printing them in a list with the print command.
import os
fileLoc = 'c:\\temp\\files'
for files in os.listdir(fileLoc):
    if files.endswith('.txt'):
        os.unlink(files)
Upon initial execution, I expected all the txt files to be deleted and the non-txt files to be left alone. The actual result was the error message "FileNotFoundError: [WinError 2] The system cannot find the file specified: 'File.txt'".
Not sure what I'm doing wrong, any help would be appreciated.
It isn't found because the path you intended to unlink is relative to fileLoc, but your code unlinks the bare file name, which is resolved relative to the current working directory. If the cwd happened to contain *.txt files with the same names, the code would have unfortunate side effects.
Another way to look at it:
Essentially, by analogy, in the shell what you're trying to do is equivalent to this:
# first the setup
$ mkdir foo
$ touch foo/a.txt
# now your code is equivalent to:
$ rm *.txt
# won't work as intended because it removes the *.txt files in the
# current directory. In fact the bug is also that your code would unlink
# any *.txt files in the current working directory unintentionally.
# what you intended was:
$ rm foo/*.txt
The missing piece was the path to the file in question.
I'll add some editorial: The Old Bard taught us to "when in doubt, print variables". In other words, debug it. I don't see from the OP an attempt to do that. Just a thing to keep in mind.
Anyway, the revised code:
import os
fileLoc = 'c:\\temp\\files'
for file in os.listdir(fileLoc):
    if file.endswith('.txt'):
        os.unlink(os.path.join(fileLoc, file))
The fix: os.path.join() builds a path for you from parts. One part is the directory (path) where the file exists, aka: fileLoc. The other part is the filename, aka file.
os.path.join() makes a whole valid path from them using whatever OS directory separator is appropriate for your platform.
Also, might want to glance through:
https://docs.python.org/2/library/os.path.html
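An alternative sketch, assuming Python 3, uses pathlib instead of os.path.join. Path.glob yields paths that already include the directory, so the relative-path mistake cannot happen. delete_txt_files is a name made up for this example, and the demo runs in a temporary directory instead of c:\temp\files.

```python
import tempfile
from pathlib import Path


def delete_txt_files(directory):
    """Delete every *.txt file directly inside directory; return the deleted names."""
    deleted = []
    for path in Path(directory).glob("*.txt"):
        if path.is_file():
            path.unlink()          # path already includes the directory
            deleted.append(path.name)
    return sorted(deleted)


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.txt", "keep.csv"):
        (Path(tmp) / name).write_text("x")
    deleted = delete_txt_files(tmp)
    remaining = sorted(p.name for p in Path(tmp).iterdir())
print(deleted, remaining)
```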

zip command not working

I am trying to zip a file using shell script command. I am using following command:
zip ./test/step1.zip $FILES
where $FILES contain all the input files. But I am getting a warning as follows
zip warning: name not matched: myfile.dat
One more thing I observed: the warning is always about the file that comes last in the list of files in the folder, and that file does not get zipped.
Can anyone explain to me why this is happening? I am new to the shell script world.
zip warning: name not matched: myfile.dat
This means the file myfile.dat does not exist.
You will get the same error if the file is a symlink pointing to a non-existent file.
As you say, whatever file is last in $FILES is not added to the zip and triggers the warning. So I think something's wrong with the way you create $FILES. Chances are there is a newline, carriage return, space, tab, or other invisible character at the end of the last filename, resulting in a name that doesn't exist. Try this for example:
for f in $FILES; do echo :$f:; done
I bet the last line will be incorrect, for example:
:myfile.dat :
...or something like that instead of :myfile.dat: with no characters before the last :
UPDATE
If you say the script started working after running dos2unix on it, that confirms what everybody suspected already, that somehow there was a carriage-return at the end of your $FILES list.
od -c shows the \r carriage-return. Try echo $FILES | od -c
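To see why a trailing carriage return breaks the match, here is a small Python illustration (only a sketch; the file names are invented). Splitting on spaces alone keeps the CR attached to the last name, much as the shell does, because the shell's default field separators are space, tab, and newline, not CR.

```python
def clean_file_list(raw):
    """Split a whitespace-separated file list; str.split() also discards CR."""
    return raw.split()


raw = "first.dat second.dat myfile.dat\r"    # list read from a CRLF-terminated line
naive = raw.split(" ")                       # keeps the CR glued to the last name
cleaned = clean_file_list(raw)
print(repr(naive[-1]), cleaned)
```

The naive split yields a last element with the CR still attached, which is not the name of any file on disk, hence "name not matched".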
Another possible cause that can generate a zip warning: name not matched: error is having any of zip's environment variables set incorrectly.
From the man page:
ENVIRONMENT
The following environment variables are read and used by zip as described.
ZIPOPT
contains default options that will be used when running zip. The contents of this environment variable will get added to the command line just after the zip command.
ZIP
[Not on RISC OS and VMS] see ZIPOPT
Zip$Options
[RISC OS] see ZIPOPT
Zip$Exts
[RISC OS] contains extensions separated by a : that will cause native filenames with one of the specified extensions to be added to the zip file with basename and extension swapped.
ZIP_OPTS
[VMS] see ZIPOPT
In my case, I was using zip in a script and had the binary location in an environment variable ZIP so that we could change to a different zip binary easily without making tonnes of changes in the script.
Example:
ZIP=/usr/bin/zip
...
${ZIP} -r folder.zip folder
This is then processed as:
/usr/bin/zip /usr/bin/zip -r folder.zip folder
And generates the errors:
zip warning: name not matched: folder.zip
zip I/O error: Operation not permitted
zip error: Could not create output file (/usr/bin/zip.zip)
The first because it's now trying to add folder.zip to the archive instead of using it as the archive. The second and third because it's trying to use the file /usr/bin/zip.zip as the archive which is (fortunately) not writable by a normal user.
Note: This is a really old question, but I didn't find this answer anywhere, so I'm posting it to help future searchers (my future self included).
eebbesen hit the nail on the head in his comment for my case (but I cannot vote on comments).
Another possible reason missed in the other comments is a file exceeding the file size limit (4GB).
I converted my script for the unix environment using the dos2unix command and executed it as ./myscript.sh instead of bash myscript.sh.
I just discovered another potential cause for this. If the permissions of the directory/subdirectory don't allow the zip to find the file, it will report this error. Actually, if you run a chmod -R 444 on the directory, and then try to zip it, you will reproduce this error, and also have a "stored 0%" report, like this:
zip warning: name not matched: borrar/enviar
adding: borrar/ (stored 0%)
Hence, try changing the permissions of the files. If you tightened the permissions because you are sending the archive by email and the mail filters (like Gmail's) refuse executables, don't forget that very strict permissions at compression time can be the cause of the "name not matched" error you are reporting.
Spaces are not allowed:
it would fail if there is more than one file in $FILES, unless you put them in a loop.
I also encountered this issue. In my case, the line separator in my zip shell script was CRLF, which caused the problem. Using LF fixed it.

How to update all the files under the current directory in Ubuntu with some comments at the start of the files

I have an issue wherein I am trying to add a copyright message to all the files in our project. Since it will affect many directories and files, our team has split the task,
so each one of us will be updating the files manually. Can I automate it?
I tried with:
find -exec sed -i "1i # x CONFIDENTIAL\n# _____________________\n#\n# 1997 - 2012 x Incorporated\n# All Rights Reserved.\n#\n# NOTICE: All information contained herein is, and remains\n# the property of x Incorporated and its suppliers,\n# if any. The intellectual and technical concepts contained\n# herein are proprietary to x Incorporated\n# and its suppliers and may be covered by U.S. and Foreign Patents,\n# patents in process, and are protected by trade secret or copyright law.\n# Dissemination of this information or reproduction of this material\n# is strictly forbidden unless prior written permission is obtained\n# from x Incorporated.\n" -- {} \;
It just stops, as soon as it encounters the . folder and any folder under the current directory.
Can we control the command to affect some of the files in the directory by specifying the complete/partial name of the file?
You can do so by executing the following commands for each file:
cp file temp
cat copy_right_notice temp > file
Note that > overwrites file (while >> appends to file, which is not what you want (referring to your comment))
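The same prepend can be automated over a whole tree while skipping directories and limiting it by file name. This is a minimal Python sketch, not a tested replacement for the find/sed command: prepend_notice and the "*.py" pattern are assumptions made for illustration, the notice text is shortened, and files are assumed to be UTF-8 text.

```python
import fnmatch
import os
import tempfile

NOTICE = "# x CONFIDENTIAL\n# All Rights Reserved.\n"   # shortened notice


def prepend_notice(root, notice, pattern="*.py"):
    """Prepend notice to every regular file under root whose name matches pattern."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:                      # directories are never touched
            if not fnmatch.fnmatch(name, pattern):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8") as fh:
                body = fh.read()
            if body.startswith(notice):
                continue                            # already has the notice
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(notice + body)
            changed.append(os.path.relpath(path, root))
    return sorted(changed)


# Demo in a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    samples = (("a.py", "print(1)\n"),
               (os.path.join("sub", "b.py"), "print(2)\n"),
               ("notes.txt", "plain\n"))
    for rel, text in samples:
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write(text)
    changed = prepend_notice(tmp, NOTICE)
    second_run = prepend_notice(tmp, NOTICE)        # nothing left to change
    with open(os.path.join(tmp, "a.py"), encoding="utf-8") as fh:
        a_text = fh.read()
    with open(os.path.join(tmp, "notes.txt"), encoding="utf-8") as fh:
        notes_text = fh.read()
print(changed, second_run)
```

Because files that already start with the notice are skipped, running it twice does not stack the header.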

Renaming lots of files in Linux according to a pattern [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I'm trying to do three things with the mv command, but not sure it's possible? Probably need a script. not sure how to write it. All files are in same folder.
1) Files ending with v9.zip should just be .zip (the v9 removed)
2) Files containing _ should be -
3) Files with Uppercase letter next to a lowercase letter (or lowercase next to an Uppercase) should have a space between them. So MoveOverNow would be Move Over Now and ruNaway would be ruN away
[A-Z][a-z] or [a-z][A-Z] becomes [A-Z] [a-z] and [a-z] [A-Z]
There's a rename command provided with most Debian/Ubuntu based distros which was written by Robin Barker based on Larry Wall's original code from around 1998(!).
Here's an excerpt from the documentation:
"rename" renames the filenames supplied according to the rule specified as the first argument. The perlexpr argument is a Perl expression which is expected to modify the $_ string in Perl for at least some of the filenames
specified. If a given filename is not modified by the expression, it will not be renamed. If no filenames are given on the command line, filenames will be read via standard input.
For example, to rename all files matching "*.bak" to strip the extension, you might say
rename 's/\.bak$//' *.bak
To translate uppercase names to lower, you'd use
rename 'y/A-Z/a-z/' *
It uses Perl, so you can use Perl expressions to match the pattern. In fact, I believe it works much like tchrist's scripts.
One other really useful set of tools for bulk file renaming is the renameutils collection by Oskar Liljeblad. The source code is hosted by the Free Software Foundation. Additionally many distros (especially Debian/Ubuntu based distros) have a renameutils package with these tools.
On one of those distros you can install it with:
$ sudo apt-get install renameutils
And then to rename files just run this command:
$ qmv
It will pop open a text editor with the list of files, and you can manipulate them with your editor's search and replace function.
I haven't tested these, so I put echo at the front of the commands so you can try them before removing the echo to run them for real.
for f in *v9.zip; do echo mv "${f}" "${f%v9.zip}.zip"; done
for f in *_*; do echo mv "${f}" "${f//_/-}"; done
As for your third problem, I'm sure it can be done too, but maybe a more sophisticated approach than raw shell one-liners will help, as @tchrist mentioned.
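One possible "more sophisticated approach" is a short Python script. This is only a sketch: new_name and rename_all are names invented here, the case rule inserts a space only where a lowercase letter is followed by an uppercase one (the reading that turns MoveOverNow into Move Over Now), and only ASCII letters are considered. Like the echo trick above, it defaults to a dry run.

```python
import os
import re
import tempfile


def new_name(name):
    """Apply the three rules: strip v9 before .zip, _ becomes -, split aB into a B."""
    if name.endswith("v9.zip"):
        name = name[: -len("v9.zip")] + ".zip"
    name = name.replace("_", "-")
    return re.sub(r"([a-z])([A-Z])", r"\1 \2", name)


def rename_all(directory, dry_run=True):
    """Return the (old, new) pairs; rename on disk only when dry_run is False."""
    plan = []
    for old in sorted(os.listdir(directory)):
        new = new_name(old)
        if new != old:
            plan.append((old, new))
            if not dry_run:
                os.rename(os.path.join(directory, old), os.path.join(directory, new))
    return plan


# Demo in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("MoveOverNowv9.zip", "my_file.txt", "plain.txt"):
        open(os.path.join(tmp, name), "w").close()
    plan = rename_all(tmp)                    # dry run: nothing is renamed yet
    rename_all(tmp, dry_run=False)
    result = sorted(os.listdir(tmp))
print(plan, result)
```

Note that os.rename can silently replace an existing file on POSIX systems, so check the dry-run plan for collisions before running it for real.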
My favorite solution is my own rename script. The simplest examples that map to your problems are these:
% rename 's/_/-/g' *
% rename 's/(\p{Lower})(\p{Upper})/$1 $2/g' *
Although I really hate whitespace in my filenames, especially vertical whitespace:
% rename 's/\s//g' *
% rename 's/\v//g' *
et cetera. It’s based on a script by The Larry Wall, but extended with options, as in:
usage: /home/tchrist/scripts/rename [-ifqI0vnml] [-F file] perlexpr [files]
-i ask about clobbering existent files
-f force clobbers without inquiring
-q quietly skip clobbers without inquiring
-I ask about all changes
-0 read null-terminated filenames
-v verbosely says what its doing
-V verbosely says what its doing but with newlines between old and new filenames
-n don't really do it
-m to always rename
-l to always symlink
-F path read filelist to change from magic path(s)
As you see, it can change not just the names of files, but where symbolic links are pointing to using the same pattern. You don’t have to use a s/// pattern, although often one does.
The other tools in that directory are mostly for Unicode work, of which there are some super-useful ones.
The above answers apply to Debian, Ubuntu etc
For RHEL and co: rename from_pattern to_pattern files
I think the link to the rename script in tchrist's post is broken, and I couldn't find the page in the web archive, so here is another one in Perl.
#!/usr/bin/perl
# -w switch is off bc HERE docs cause erroneous messages to be displayed under
# Cygwin
#From the Perl Cookbook, Ch. 9.9
# rename - Larry's filename fixer
$help = <<EOF;
Usage: rename expr [files]
This script's first argument is Perl code that alters the filename
(stored in \$_ ) to reflect how you want the file renamed. It can do
this because it uses an eval to do the hard work. It also skips rename
calls when the filename is untouched. This lets you simply use
wildcards like rename EXPR * instead of making long lists of filenames.
Here are five examples of calling the rename program from your shell:
% rename 's/\.orig$//' *.orig
% rename 'tr/A-Z/a-z/ unless /^Make/' *
% rename '$_ .= ".bad"' *.f
% rename 'print "$_: "; s/foo/bar/ if <STDIN> =~ /^y/i' *
% find /tmp -name '*~' -print | rename 's/^(.+)~$/.#$1/'
The first shell command removes a trailing ".orig" from each filename.
The second converts uppercase to lowercase. Because a translation is
used rather than the lc function, this conversion won't be locale-
aware. To fix that, you'd have to write:
% rename 'use locale; $_ = lc($_) unless /^Make/' *
The third appends ".bad" to each Fortran file ending in ".f", something
a lot of us have wanted to do for a long time.
The fourth prompts the user for the change. Each file's name is printed
to standard output and a response is read from standard input. If the
user types something starting with a "y" or "Y", any "foo" in the
filename is changed to "bar".
The fifth uses find to locate files in /tmp that end with a tilde. It
renames these so that instead of ending with a tilde, they start with
a dot and a pound sign. In effect, this switches between two common
conventions for backup files
EOF
$op = shift or die $help;
chomp(@ARGV = <STDIN>) unless @ARGV;
for (@ARGV) {
    $was = $_;
    eval $op;
    die $@ if $@;
    rename($was,$_) unless $was eq $_;
}

An efficient way to detect corrupted png files?

I've written a program to process a bunch of png files that are generated by a separate process. The capture mostly works; however, there are times when the process dies and restarts, which leaves a corrupted image. I have no way to detect when the process dies or which file it dies on (there are ~3000 png files).
Is there a good way to check for a corrupted png file?
I know this is a question from 2010, but I think this is a better solution: pngcheck.
Since you're on a Linux system you probably already have Python installed.
An easy way would be to try loading and verifying the files with PIL (Python Imaging Library) (you'd need to install that first).
from PIL import Image
v_image = Image.open(file)
v_image.verify()
(taken verbatim from my own answer in this thread)
A different possible solution would be to slightly change how your processor processes the files: Have it always create a file named temp.png (for example), and then rename it to the "correct" name once it's done. That way, you know if there is a file named temp.png around, then the process got interrupted, whereas if there is no such file, then everything is good.
(A variant naming scheme would be to do what Firefox's downloader does -- append .partial to the real filename to get the temporary name.)
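A minimal Python sketch of this write-then-rename idea, assuming the generating process can be changed. write_atomically and interrupted_files are hypothetical names, and the ".partial" suffix follows the variant mentioned above. os.replace is atomic on POSIX systems when both names are on the same filesystem.

```python
import os
import tempfile


def write_atomically(path, data):
    """Write data under a temporary name, then rename it to path when complete."""
    partial = path + ".partial"
    with open(partial, "wb") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())      # make sure the bytes are on disk before renaming
    os.replace(partial, path)


def interrupted_files(directory):
    """List leftovers from writers that died before the rename."""
    return sorted(n for n in os.listdir(directory) if n.endswith(".partial"))


# Demo: one complete file, one simulated crash that leaves a partial file behind.
with tempfile.TemporaryDirectory() as tmp:
    write_atomically(os.path.join(tmp, "frame1.png"), b"complete image bytes")
    with open(os.path.join(tmp, "frame2.png.partial"), "wb") as fh:
        fh.write(b"half an im")
    leftovers = interrupted_files(tmp)
    listing = sorted(os.listdir(tmp))
print(leftovers, listing)
```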
Kind of a hack, but works
If you are running on Linux or something similar, you might have the "convert" command
$ convert --help
Version: ImageMagick 5.5.6 04/01/03 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 2003 ImageMagick Studio LLC
Usage: convert [options ...] file [ [options ...] file ...] [options ...] file
If you make an invalid png, and then try to convert, you'll get an error:
$ date> foo.png
$ convert foo.png foo.gif
convert: NotAPNGImageFile (foo.png).
Find all non-PNG files:
find . -type f -print0 | xargs -0 file --mime | grep -vF image/png
Find all corrupted PNG files:
find . -type f -print0 | xargs -0 -P0 sh -c 'magick identify +ping "$@" > /dev/null' sh
The file command only checks the magic number. Having the PNG magic number doesn't mean the file is a well-formed PNG.
magick identify is a tool from ImageMagick. By default, it only checks the headers of the file for better performance. Here we use +ping to disable that feature and make identify read the whole file.
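If ImageMagick is not available, a lighter structural check can be sketched with the Python standard library alone. This is a different technique from identify: it walks the PNG chunks, verifies each chunk's CRC, and requires a final IEND chunk, which catches truncated files and flipped bytes but does not decode the image data. png_is_intact and make_chunk are names invented for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_is_intact(data):
    """True if every chunk has a valid CRC and an IEND chunk is reached."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4             # header + payload + CRC
        if end > len(data):
            return False                       # chunk is cut off
        (crc,) = struct.unpack(">I", data[end - 4:end])
        if zlib.crc32(data[pos + 4:end - 4]) != crc:   # CRC covers type + payload
            return False
        if ctype == b"IEND":
            return True
        pos = end
    return False                               # ran out of data before IEND


def make_chunk(ctype, payload):
    """Build one PNG chunk (used only to create demo data)."""
    crc = zlib.crc32(ctype + payload)
    return struct.pack(">I", len(payload)) + ctype + payload + struct.pack(">I", crc)


# Demo data: a 1x1 greyscale PNG, a truncated copy, and a copy with one flipped byte.
good = (PNG_SIGNATURE
        + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
        + make_chunk(b"IDAT", zlib.compress(b"\x00\x00"))
        + make_chunk(b"IEND", b""))
truncated = good[:-10]
flipped = bytearray(good)
flipped[good.index(b"IDAT") + 4] ^= 0xFF
flipped = bytes(flipped)
print(png_is_intact(good), png_is_intact(truncated), png_is_intact(flipped))
```

To scan a directory, read each file's bytes and pass them to png_is_intact. With ~3000 files that is cheap, since no pixels are decoded.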
