Closed. This question is not about programming or software development. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 5 months ago.
example.zip/
└── example/
├── nice.md
├── tree.md
└── diagram.md
Expected:
example.zip/
├── nice.md
├── tree.md
└── diagram.md
example.zip contains a folder with the same name. In it are files that I want to move to the root of the zip file and remove the empty directory.
I looked at the zip man page but could not find any flags related to this, though I could be missing something.
I tried the --copy-entries flag. This creates a new zip with selected files from the existing zip, but it also copies over the folder hierarchy.
zip example.zip "*.md" --copy-entries --out example1.zip
I am trying to write a shell script to do this.
Is it possible to do without extracting the zip?
If you have (or can install) 7z (aka p7zip) you can make use of the d (delete) and rn (rename) commands, eg:
$ mkdir example
$ touch example/{nice.md,tree.md,diagram.md}
$ zip -r example.zip example
adding: example/ (stored 0%)
adding: example/diagram.md (stored 0%)
adding: example/nice.md (stored 0%)
adding: example/tree.md (stored 0%)
$ unzip -l example.zip
Archive: example.zip
Length Date Time Name
--------- ---------- ----- ----
0 09-15-2022 09:29 example/
0 09-15-2022 09:29 example/diagram.md
0 09-15-2022 09:29 example/nice.md
0 09-15-2022 09:29 example/tree.md
--------- -------
0 4 files
# rename the *.md files first and then delete the directory; if you delete
# the directory first you'll lose all files under the directory; the 7z d/rn
# commands will generate a lot of output (not shown here)
$ 7z rn example.zip example/nice.md nice.md
$ 7z rn example.zip example/tree.md tree.md
$ 7z rn example.zip example/diagram.md diagram.md
$ 7z d example.zip example
$ unzip -l example.zip
Archive: example.zip
Length Date Time Name
--------- ---------- ----- ----
0 09-15-2022 09:29 diagram.md
0 09-15-2022 09:29 nice.md
0 09-15-2022 09:29 tree.md
--------- -------
0 3 files
$ unzip example.zip
Archive: example.zip
extracting: diagram.md
extracting: nice.md
extracting: tree.md
I'm guessing that in OP's real-life example the names of the directories and/or files may not be known in advance. The 7z commands do work with bash variables (eg, 7z d "${zipfile}" "${dir_to_delete}"). If OP has issues dynamically processing the contents of a given *.zip, I'd recommend asking a new question.
For a large number of renames (or deletes) it looks like you can also:
specify multiple source/destination pairs on the single command line
use a list file
Good answer. Just to be clear, 7z does not do an in-place edit on the zip file when it does the rename/delete. Under the hood it copies the old zip into a temporary file (example.zip.tmp in this instance), renaming and deleting as it does that copy. Then it deletes the original zip file and renames the temporary file, example.zip.tmp, back to the original filename, example.zip. For the most part this is a perfectly acceptable (and safe) approach.
Here are the relevant lines from an strace run that show the deletion of the original example.zip file, followed by the renaming of example.zip.tmp to example.zip.
$ strace 7z rn example.zip example/tree.md tree.md
...
unlink("example.zip") = 0
rename("example.zip.tmp", "example.zip") = 0
...
The main edge case of this approach is very large zip files when you are strapped for disk space: you need enough free space to store the zip file twice while the temporary copy exists.
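For a script that cannot rely on 7z, the same copy-then-swap idea can be sketched with Python's standard zipfile module. This is only an illustration under assumptions: the function name flatten_zip and the ".tmp" suffix are made up here, and unlike 7z this version decompresses and recompresses every entry while copying.

```python
import os
import zipfile


def flatten_zip(zip_path, prefix):
    """Rewrite zip_path so entries under `prefix/` move to the archive root."""
    prefix = prefix.rstrip("/") + "/"
    tmp_path = zip_path + ".tmp"  # hypothetical temp name, mirroring what 7z does
    with zipfile.ZipFile(zip_path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            name = info.filename
            if name == prefix:
                continue  # drop the directory entry itself
            if name.startswith(prefix):
                name = name[len(prefix):]
            # Copy the entry under its new name, keeping timestamp and attributes.
            new_info = zipfile.ZipInfo(name, date_time=info.date_time)
            new_info.compress_type = info.compress_type
            new_info.external_attr = info.external_attr
            dst.writestr(new_info, src.read(info))
    os.replace(tmp_path, zip_path)  # swap the rewritten archive into place
```

Calling flatten_zip("example.zip", "example") should leave nice.md, tree.md and diagram.md at the root. As with 7z, the zip briefly exists twice on disk.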
I'm using the 7zip command line interface to extract archives, like so:
7za.exe x -y {path_to_zipfile} -o{path_to_target_folder}
If my zipfile is named my_archive.7z, then I get the following filestructure in the target folder:
🗁 target_folder
└─ 🗁 my_archive
├─ 🗋 foo.png
├─ 🗁 bar
│ ├─ 🗋 baz.txt
│ └─ 🗋 qux.txt
...
However, I don't want the subfolder 🗁 my_archive. I'm looking for flags to apply on the 7zip command such that everything extracts directly in the target folder, without creating the 🗁 my_archive subfolder.
NOTES
I can't replace x with e because the filestructure shouldn't be lost (the e flag pushes all files to the toplevel).
I'm working on a Windows 10 computer, but the solution must also work on Linux.
I'm using the following version: 7-Zip (a) 19.00 (x64)
Some background info: I'm calling 7zip from a Python program, like so:
# Variables:
# 'sevenzip_abspath': absolute path to 7za executable
# 'zipfile_abspath': absolute path to zipped file (`.7z` format)
# 'targetdir_abspath': absolute path to target directory
commandlist = [
sevenzip_abspath,
'x',
'-y',
zipfile_abspath,
f'-o{targetdir_abspath}',
]
output = subprocess.Popen(
commandlist,
stdout=subprocess.PIPE,
shell=False,
).communicate()[0]
if output is not None:
print(output.decode('utf-8'))
I know I could do all kinds of things in Python after the unzipping has finished (move/rename directories, etc etc), but that's for plan B. First I want to check if there is an elegant solution.
I'd like to stick to 7zip for reasons that would lead us too far here.
You can rename the top level folder to match the target folder before extracting the archive.
7za rn {path_to_zipfile} my_archive target_folder
This will permanently change the archive. If you don't want that, take a copy first.
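Since the question calls 7zip from Python, the rename-then-extract idea might be wired up roughly as below. This is a sketch under assumptions, not a tested recipe: it assumes the archive's top-level folder is named after the archive file, it extracts into the parent of the target folder so the renamed folder lands on the target itself, and the helper name build_flat_extract_commands is made up for illustration. It only builds the command lists; nothing is executed.

```python
import os


def build_flat_extract_commands(sevenzip, archive_copy, target_dir):
    """Return two 7za command lists: rename the top folder, then extract."""
    # Assumption: the top-level folder inside the archive matches the file name.
    top_folder = os.path.splitext(os.path.basename(archive_copy))[0]
    target_dir = os.path.normpath(target_dir)
    parent_dir, target_name = os.path.split(target_dir)
    rename_cmd = [sevenzip, "rn", archive_copy, top_folder, target_name]
    # Extract into the parent so the renamed folder becomes target_dir.
    extract_cmd = [sevenzip, "x", "-y", archive_copy, f"-o{parent_dir}"]
    return rename_cmd, extract_cmd
```

Each list can then be passed to subprocess.run(cmd, check=True). Because rn modifies the archive, point archive_copy at a copy (for example one made with shutil.copy2) if the original must stay untouched.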
Recently MaxMind changed their download policy, and the old simple format is no longer available. The new file format looks like this: GeoLite2-Country_20191231.tar.gz, and inside we have a folder with the same name containing two additional files.
Although there is an option to delete the date parameter from the link, it seems that the downloaded file will still contain the date.
Now, the problem is to extract that GeoLite2-Country.mmdb from the gzip file having that variable name programmatically.
The unzip part existing in my old script was this:
gunzip -c "$1"GeoLite2-Country.mmdb.gz > "$1"GeoLite2-Country.mmdb
The question is how to modify the above part for the new situation. Or, maybe someone knows another way to solve the same problem. Thanks in advance.
The folder structure:
-+ Geolite2-Country_YYYYMMDD.tar.gz:
|-+ Geolite2-Country_YYYYMMDD
|- licence.txt
|- copyright.txt
|- Geolite2-Country.mmdb
What I need is Geolite2-Country.mmdb in the current folder of gzip file.
tar -tf /GeoLite2-City.tar.gz | grep mmdb | xargs tar -xf /GeoLite2-City.tar.gz --strip-components 1 -C /
Just adjust the source and destination paths (and the archive name) to match your setup.
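If Python is available, the same thing can be sketched with the standard tarfile module, which avoids having to know the dated folder name. The function name extract_mmdb is an assumption for illustration; it simply takes the first member whose name ends in .mmdb.

```python
import os
import shutil
import tarfile


def extract_mmdb(tar_path, dest_dir):
    """Copy the first *.mmdb member of tar_path into dest_dir; return its path."""
    with tarfile.open(tar_path, "r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".mmdb"):
                out_path = os.path.join(dest_dir, os.path.basename(member.name))
                # extractfile() streams the member without recreating its folder.
                with tar.extractfile(member) as src, open(out_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return out_path
    raise FileNotFoundError(f"no .mmdb member in {tar_path}")
```

For example, extract_mmdb("GeoLite2-Country_20191231.tar.gz", ".") would write GeoLite2-Country.mmdb into the current folder regardless of the date in the folder name.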
I have a lot of files on our servers which we compress with a filter so that only the files older than x days get compressed.
The zip command compresses the original, makes a filename.zip and removes the original.
This has a small problem: the timestamp changes, since the compression job runs after x days.
So when we run the job that removes older files (which are by now zip files), not all files get removed, since the timestamp has changed from that of the original file to that of the compressed file.
I would like to add a condition so that, while zipping, the zip archive retains the original timestamp of the file even though the job runs at a later date.
One way of doing this would be to
Get timestamp of each original file with a date command
Compress the original, remove the original
Use and insert the earlier stored timestamp to the new zip file using "touch"
I am looking for a simpler solution.
Some old file I had:
$ ls -l foo
-rw-r--r-- 1 james james 120 Sep 5 07:28 foo
Zip and redate:
$ zip foo.zip foo && touch -d "$(date -R -r foo)" foo.zip
Check it out:
$ ls -l foo.zip
-rw-r--r-- 1 james james 120 Sep 5 07:28 foo.zip
Remove the original:
$ rm -i foo
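The same zip-and-redate sequence can be sketched in Python for use inside a larger cleanup job. The function name zip_keep_mtime is made up here, and the sketch assumes one file per archive, as in the example above.

```python
import os
import zipfile


def zip_keep_mtime(path):
    """Zip `path` to `path + '.zip'`, copy its timestamps, remove the original."""
    zip_path = path + ".zip"
    st = os.stat(path)  # remember the original's times before touching anything
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(path, arcname=os.path.basename(path))
    # Same effect as the touch step: give the archive the original's times.
    os.utime(zip_path, ns=(st.st_atime_ns, st.st_mtime_ns))
    os.remove(path)
    return zip_path
```

An age-based cleanup that looks at the modification time will then treat foo.zip exactly as it would have treated foo.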
Yes, you can unzip a file and preserve the old timestamp from the time it was originally created. The steps to do this are as follows:
Right-click filename.zip and open Properties
In the General tab, the Security section says "This file came from another computer and might be blocked to help protect this computer". Tick the Unblock check box and click OK
Extract the file and voilà, the extracted file has the date/time stamp from when the file was created/modified
I'm on a Linux system with limited resources and BusyBox -- this version of tar does not support --append, -r. Is there a workaround that will allow me to [1] append files from directory B to an existing tar of files from directory A after [2] making the B-files appear to have come from directory A? (Later, when someone extracts the files, they should all end up in the same directory A.)
Situation: I have a list of files that I want to tar, but I must process some of these files first. The files might be used by other processes so I don't want to edit them in-place. I want to be conservative when using disk space so my script only copies those files which it needs to change (vs copying them all and then processing some and finally archiving them all with tar -- if I copied them all I might run into disk space issues).
This means the files I want to archive end up in two separate locations. But I want the resulting tar file to appear as if they were all in the same location. Near the end of my script, I end up with two text files listing the A and B files by name.
I think this is straightforward with a full-blown version of tar, but I have to work with the BusyBox version (usage below). Thanks in advance for any ideas!
Usage: tar -[cxtzjaZmvO] [-X FILE] [-f TARFILE] [-C DIR] [FILE]...
Create, extract, or list files from a tar file
Operation:
c Create
x Extract
t List
Options:
f Name of TARFILE ('-' for stdin/out)
C Change to DIR before operation
v Verbose
z (De)compress using gzip
j (De)compress using bzip2
a (De)compress using lzma
Z (De)compress using compress
O Extract to stdout
h Follow symlinks
m Don't restore mtime
exclude File to exclude
X File with names to exclude
T File with names to include
In principle, you just need to append a tar repository containing the additional files to the end of the tar file. It is only slightly more difficult than that.
A tar file consists of any number of repetitions of header + file. The header is always a single 512-byte block, and the file is padded to a multiple of 512 bytes, so you can think of these units as being a variable number of 512-byte blocks. Each unit is independent; its header starts with the full pathname of the file. So there is no requirement that files in a directory be tarred together.
There is one complication. At the end of the tar file, there are at least two 512-byte blocks completely filled with 0s. When tar is reading a tar file, it will ignore a single zero-filled header, but the second one will cause it to stop reading the file. If it hits EOF, it will complain, so the terminating empty headers are required.
There might be more than two empty blocks, because tar actually writes in records which are a multiple of 512 bytes. GNU tar, for example, by default writes in multiples of 20 512-byte chunks, so the smallest tar file is normally 10240 bytes.
In order to append new data, you need to first truncate the existing file to eliminate the empty blocks.
I believe that if the tar file was produced by busybox, there will only be two empty blocks, but I haven't inspected the code. That would be easy; you only need to truncate the last 1024 bytes of the file before appending the additional files.
For general tar files, it is trickier. If you knew that the files themselves didn't have NUL bytes in them (i.e. they were all simple text files), you could remove empty headers until you found a block with a non-0 byte in it, which wouldn't be too difficult.
What I would do is:
1. Truncate the last 1024 bytes of the tar file.
2. Remember the current size of the tar file.
3. Append a test tar file consisting of the tar of a file with a simple short message.
4. Verify that tar tf correctly shows the test file.
5. Truncate the file back to the remembered length.
6. If tar tf found the test file's name, succeed.
7. If the last 512 bytes of the tar file are all 0s, truncate the last 512 bytes of the file, and return to step 2.
8. Otherwise fail.
If the above procedure succeeds, you can proceed to append the tar repository with the new files.
I don't know if you have a trunc command. If not, you can use dd to copy a file over the top of an old file at a specified offset (see the seek= option). dd will truncate the file automatically at the end of the copy. You can also use dd to read a 512-byte block (see the skip and count options).
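To make the block layout concrete, here is a small Python sketch of the "remove the empty blocks, then append" idea. It is only an illustration (a BusyBox system may well have no Python), the function names are made up, and it relies on the assumption stated above: no archived file ends in a full 512-byte block of NUL bytes, otherwise real data would be stripped too.

```python
BLOCK = 512


def strip_end_marker(tar_bytes):
    """Drop the trailing all-zero 512-byte blocks from an uncompressed tar."""
    end = len(tar_bytes) - len(tar_bytes) % BLOCK
    zero = bytes(BLOCK)
    while end >= BLOCK and tar_bytes[end - BLOCK:end] == zero:
        end -= BLOCK
    return tar_bytes[:end]


def concat_tars(first, second):
    """Join two uncompressed tar archives into one readable archive."""
    # The second archive's own end-of-archive blocks terminate the result.
    return strip_end_marker(first) + second
```

Both inputs must be uncompressed tar data; the result keeps the headers of both archives, so pathnames appear exactly as they were stored.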
The best solution is to cut the last 1024 bytes and concatenate a new tar after it. In order to append a tar to an existing tar file, they must be uncompressed.
For files like:
$ find a b
a
a/file1
b
b/file2
You can:
$ tar -C a -czvf a.tar.gz .
$ gunzip -c a.tar.gz | { head -c -$((512*2)); tar -C b -c .; } | gzip > a+b.tar.gz
With the result:
$ tar -tzvf a+b.tar.gz
drwxr-xr-x 0/0 0 2018-04-20 16:11:00 ./
-rw-r--r-- 0/0 0 2018-04-20 16:11:00 ./file1
drwxr-xr-x 0/0 0 2018-04-20 16:11:07 ./
-rw-r--r-- 0/0 0 2018-04-20 16:11:07 ./file2
Or you can create both tar in the same command:
$ tar -C a -c . | { head -c -$((512*2)); tar -C b -c .; } | gzip > a+b.tar.gz
Note that this works for tar files generated by BusyBox tar. As mentioned in the previous answer, GNU tar pads its output to a multiple of 20 blocks. You need to force the blocking factor to 1 (--blocking-factor=1) in order to know in advance how many blocks to cut:
$ tar --blocking-factor=1 -C a -c . | { head -c -$((512*2)); tar -C b -c .; } | gzip | tar --blocking-factor=1 -tzv
Anyway, GNU tar does have --append. The last --blocking-factor=1 is only needed if you intend to append to the resulting tar again.
Is there a way by which I can download only a part of a .rar or .zip file without downloading the whole file?
There is a ZIP file containing files A, B, C, and D.
I only need A. Can I somehow tweak the download to download only A or if possible extract the file in the server itself and get A only?
The trick is to do what Sergio suggests without doing it manually. This is easy if you mount the ZIP file via an HTTP-backed virtual filesystem and then use the standard unzip command on it. This way the unzip utility's I/O calls are translated to HTTP range GETs, which means only the chunks of the ZIP file that you want get transferred over the network.
Here's an example for Linux using HTTPFS, a very lightweight virtual filesystem (it uses FUSE). There are similar tools for Windows.
Get/build httpfs:
$ wget http://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.02
$ mv 1.06.07.02 httpfs_1.06.07.02.tar.bz2
$ tar -xjf httpfs_1.06.07.02.tar.bz2
$ rm httpfs
$ ./make_httpfs
Mount a remote ZIP file and extract one file from it:
$ mkdir mount_pt
$ sudo ./httpfs http://server.com/zipfile.zip mount_pt
$ sudo ls mount_pt
zipfile.zip
$ sudo unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt
Of course, you can also use tools other than the command-line unzip. (I need sudo because of how FUSE seems to be set up on my machine; you shouldn't need it.)
In a way, yes, you can.
The ZIP file format specifies a "central directory". Basically, this is a table that records which files are in the archive and what their offsets are.
So, using an HTTP Range request you could download just the end of the file (the central directory sits at the end of a ZIP file, followed only by the short end-of-central-directory record) and try to identify the central directory in it. If you succeed, then you know the file list and offsets, so you can proceed to fetch those chunks separately and decompress them yourself.
This approach is quite error-prone and is not guaranteed to work. But so is hacking in general :-)
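As a rough sketch of the "identify the central directory" step, the fixed 22-byte end-of-central-directory record can be located in the downloaded tail and unpacked with Python's struct module. The function name parse_eocd is made up here, and the sketch assumes a plain (non-Zip64) archive whose comment does not itself contain the record signature.

```python
import struct

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FORMAT = "<4sHHHHIIH"  # fixed 22-byte part of the record, little-endian


def parse_eocd(tail):
    """Find the End of Central Directory record in the last bytes of a ZIP."""
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0 or pos + struct.calcsize(EOCD_FORMAT) > len(tail):
        raise ValueError("no End of Central Directory record in this tail")
    (_sig, _disk, _cd_disk, _entries_here, entries,
     cd_size, cd_offset, _comment_len) = struct.unpack_from(EOCD_FORMAT, tail, pos)
    # cd_offset is counted from the start of the whole ZIP file, not the tail.
    return {"entries": entries, "cd_size": cd_size, "cd_offset": cd_offset}
```

With cd_offset and cd_size known, a second range request for exactly those bytes returns the central directory, whose entries in turn give the offset of each file's local header.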
Another possible approach would be to build a custom server for that (see pst's answer for more details).
There are several ways for a normal person to download an individual file from a compressed ZIP file; unfortunately, they aren't common knowledge. There are some open-source tools and online web services, including:
Windows: Iczelion's HTTP Zip Downloader (open-source) (that I've used for over 10 years!)
Linux: partial-zip (open-source)
Online: wobzip.org (closed-source)
You can arrange for your file to appear in the back of the ZIP file.
Download 100k:
$ curl -r -100000 https://www.keepassx.org/releases/2.0.2/KeePassX-2.0.2.zip -o tail.zip
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 97k 100 97k 0 0 84739 0 0:00:01 0:00:01 --:--:-- 84817
Check what files we did get:
$ unzip -t tail.zip
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]: attempt to seek before beginning of zipfile
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
testing: KeePassX-2.0.2/share/translations/keepassx_uk.qm OK
testing: KeePassX-2.0.2/share/translations/keepassx_zh_CN.qm OK
testing: KeePassX-2.0.2/share/translations/keepassx_zh_TW.qm OK
testing: KeePassX-2.0.2/zlib1.dll OK
At least one error was detected in tail.zip.
Then extract the last file:
$ unzip tail.zip KeePassX-2.0.2/zlib1.dll
Archive: tail.zip
error [tail.zip]: missing 7751495 bytes in zipfile
(attempting to process anyway)
inflating: KeePassX-2.0.2/zlib1.dll
I think Sergio Tulentsev's idea is brilliant.
However, if there is control over the server -- e.g., custom code can be deployed -- then it is a rather trivial operation (in the scheme of things :) to map/handle a request, extract the relevant portion of the ZIP archive, and send the data back in the HTTP stream.
The request might look like:
http://foo.bar/myfile.zip_a.jpeg
Which would mean extract -- and return -- "a.jpeg" from "myfile.zip".
(I intentionally chose this silly format so that browsers would likely choose "myfile.zip_a.jpeg" as the name in the download dialog when it appears.)
Of course, how this is implemented depends on the server/language/framework and there may already be existing solutions that support a similar operation (but I know not).
Based on the good input, I have written a code snippet in PowerShell to show how it could work:
# demo code downloading a single DLL file from an online ZIP archive
# and extracting the DLL into memory to mount it finally to the main process.
cls
Remove-Variable * -ea 0
# definition for the ZIP archive, the file to be extracted and the checksum:
$url = 'https://github.com/sshnet/SSH.NET/releases/download/2020.0.1/SSH.NET-2020.0.1-bin.zip'
$sub = 'net40/Renci.SshNet.dll'
$md5 = '5B1AF51340F333CD8A49376B13AFCF9C'
# prepare HTTP client:
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$client = [System.Net.Http.HttpClient]::new($handler)
# get the length of the ZIP archive:
$req = [System.Net.HttpWebRequest]::Create($url)
$req.Method = 'HEAD'
$length = $req.GetResponse().ContentLength
$zip = [byte[]]::new($length)
# get the last 10k:
# how to get the correct length of the central ZIP directory here?
$start = $length-10kb
$end = $length-1
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$last10kb = $result.content.ReadAsByteArrayAsync().Result
$last10kb.CopyTo($zip, $start)
# get the block containing the DLL file:
# how to get the exact file-offset from the ZIP directory?
$start = $length-3537kb
$end = $length-3201kb
$client.DefaultRequestHeaders.Clear()
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$block = $result.content.ReadAsByteArrayAsync().Result
$block.CopyTo($zip, $start)
# extract the DLL file from archive:
Add-Type -AssemblyName System.IO.Compression
$stream = [System.IO.Memorystream]::new()
$stream.Write($zip,0,$zip.Length)
$archive = [System.IO.Compression.ZipArchive]::new($stream)
$entry = $archive.GetEntry($sub)
# copy the whole entry; a single Read() call may return fewer bytes than requested
$out = [System.IO.MemoryStream]::new()
$entry.Open().CopyTo($out)
$bytes = $out.ToArray()
# check MD5:
$prov = [Security.Cryptography.MD5CryptoServiceProvider]::new().ComputeHash($bytes)
$hash = [string]::Concat($prov.foreach{$_.ToString("x2")})
if ($hash -ne $md5) {write-host 'dll has wrong checksum.' -f y ;break}
# load the DLL:
[void][System.Reflection.Assembly]::Load($bytes)
# use the single demo-call from the DLL:
$test = [Renci.SshNet.NoneAuthenticationMethod]::new('test')
'done.'