Trying to extract a file from a corrupted zip

After a few attempts at repairing the file with zip -F, zip -FF, 7zip and some other repair tools, I ran zipdetails, which reports: Unexpecded END at offset 00122462, value D40D148C.
Can I somehow use a hex editor to fix it? What does "Unexpecded END at offset .., value ..." mean?
Thanks!

Related

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position for a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong there, I would also really appreciate it.) I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but the command returns the following error:
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that; bcftools will create the output file itself.
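For illustration, a corrected invocation might look like the sketch below. It assumes that the existing 300 GB file is the input, that it is bgzip-compressed with an index next to it (the -r option needs one), and that extract_region and the output file name are placeholders that do not come from the question.

```shell
# Sketch: extract one region from an indexed, bgzip-compressed VCF.
# extract_region is a hypothetical helper name.
extract_region() {
    input=$1    # existing compressed VCF (assumed to be indexed, as -r requires)
    region=$2   # e.g. chr9:55252802-55252810
    output=$3   # bcftools creates this file itself
    # -O z selects compressed VCF output; -o names the output file;
    # the input file name comes last.
    bcftools view -f PASS --threads 8 -r "$region" -O z -o "$output" "$input"
}

# Usage (file names are assumptions):
# extract_region 722g.990.SNP.INDEL.chrAll.vcf.gz chr9:55252802-55252810 chr9_region.vcf.gz
```

To restrict the output to a few dogs, bcftools view also accepts -s with a comma-separated list of sample names.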
I don't have much experience with bcftools, but in general, if you want to use awk to manipulate a gzipped file, you can pipe the decompressed stream into it so that nothing is unpacked to disk. You can also pipe the result straight through gzip so that the output is compressed too, e.g.:
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Here -c writes to standard output and -d decompresses; zcat is equivalent to gzip -cd.
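As a concrete sketch of that pipeline for a VCF, the helper below keeps the header lines and the records inside one position range. The name filter_region is hypothetical, and it relies only on the VCF convention that header lines start with # and that the first two tab-separated columns are CHROM and POS.

```shell
# Sketch: stream a gzipped VCF through awk without unpacking it on disk.
# filter_region is a hypothetical helper name.
filter_region() {
    gzip -cd "$1" |
        awk -F '\t' -v chrom="$2" -v start="$3" -v end="$4" '
            /^#/ { print; next }                       # keep header lines
            $1 == chrom && $2 >= start && $2 <= end    # keep records in range
        ' |
        gzip -c
}

# Usage (file names are assumptions):
# filter_region largeFile.vcf.gz chr9 55252802 55252810 > subset.vcf.gz
```

Note that this still decompresses and scans the whole stream from the start, so on a 300 GB file it saves disk space rather than time.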
As a side note, if you only need to look at part of a large file, the excellent tool less may be useful: it loads only the parts of the file that you actually view. The -S option is particularly useful for wide formats with many columns because it stops line wrapping, and -N shows line numbers. (Opening a .gz file directly relies on a less input preprocessor such as lesspipe being configured; otherwise use zless.)
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.

How to split a messed-up-dump back to separated files?

I have a ZIP file that fails to unzip for some reason, with the error "invalid or incomplete multibyte or wide character". So I ran unzip -p myfile.zip > Messed.data, and now I want to split that dump back into separate files with a script:
Use unzip -l to get the file sizes.
dd ibs=1 skip=$((sum of front file-size)) count=$((this-file-size))
I tried this and found it unbearably slow.
Any help would be appreciated. Thank you.
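The dd call is slow largely because ibs=1 makes it read one byte at a time. The same offset arithmetic can be sketched with tail and head, which read in larger blocks. The helper name extract_slice is hypothetical, and head -c is assumed to be available (it is in GNU and BSD userlands).

```shell
# Sketch: cut one member out of the concatenated dump by byte offset.
# extract_slice is a hypothetical helper name.
extract_slice() {
    dump=$1     # e.g. Messed.data
    offset=$2   # sum of the sizes of all preceding files, in bytes
    size=$3     # size of the wanted file, in bytes
    # tail -c +N starts output at byte N (1-based), hence offset + 1.
    tail -c +"$((offset + 1))" "$dump" | head -c "$size"
}

# Usage (sizes taken from unzip -l; the numbers are placeholders):
# extract_slice Messed.data 0 1024 > first_file
# extract_slice Messed.data 1024 2048 > second_file
```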

Unzip the archive with more than one entry

I'm trying to decompress an ~8 GB .zip file piped from a curl command. Everything I have tried stops at <1 GB and prints the message:
... has more than one entry--rest ignored
I've tried: funzip, gunzip, gzip -d, zcat, ... also with different arguments - all end up in the above message.
The datafile is public, so it's easy to repro the issue:
curl -L https://archive.org/download/nycTaxiTripData2013/faredata2013.zip | funzip > datafile
Are you sure the file contains only a single entry? If it extracts to multiple files, you unfortunately cannot unzip it on the fly with these tools.
ZIP is a container format as well as a compression format, and its index of entries (the central directory) sits at the end of the archive, so stream tools such as funzip only extract the first entry. You'll have to download the whole file and then unzip it.

zip command not working

I am trying to zip files from a shell script. I am using the following command:
zip ./test/step1.zip $FILES
where $FILES contains all the input files. But I am getting the following warning:
zip warning: name not matched: myfile.dat
One more thing I observed: the warning always concerns the file that comes last in the list of files in the folder, and that file is not zipped.
Can anyone explain why this is happening? I am new to the shell scripting world.
zip warning: name not matched: myfile.dat
This means the file myfile.dat does not exist.
You will get the same error if the file is a symlink pointing to a non-existent file.
As you say, whichever file comes last in $FILES is not added to the zip, and the warning is printed for it. So I think something is wrong with the way you create $FILES. Chances are there is a newline, carriage return, space, tab, or other invisible character at the end of the last filename, resulting in a name that doesn't exist. Try this, for example:
for f in $FILES; do echo :$f:; done
I bet the last line will be incorrect, for example:
:myfile.dat :
...or something like that instead of :myfile.dat: with no characters before the last :
UPDATE
If you say the script started working after running dos2unix on it, that confirms what everybody suspected already, that somehow there was a carriage-return at the end of your $FILES list.
od -c shows the \r carriage-return. Try echo $FILES | od -c
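If od -c does show a \r, one way to clean the list inside the script is sketched below. The helper name strip_cr is hypothetical; it only deletes carriage returns and leaves everything else in the list untouched.

```shell
# Sketch: remove carriage returns from a file list before passing it to zip.
# strip_cr is a hypothetical helper name.
strip_cr() {
    printf '%s' "$1" | tr -d '\r'
}

# Usage:
# FILES=$(strip_cr "$FILES")
# zip ./test/step1.zip $FILES
```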
Another possible cause that can generate a zip warning: name not matched: error is having any of zip's environment variables set incorrectly.
From the man page:
ENVIRONMENT
The following environment variables are read and used by zip as described.
ZIPOPT
contains default options that will be used when running zip. The contents of this environment variable will get added to the command line just after the zip command.
ZIP
[Not on RISC OS and VMS] see ZIPOPT
Zip$Options
[RISC OS] see ZIPOPT
Zip$Exts
[RISC OS] contains extensions separated by a : that will cause native filenames with one of the specified extensions to be added to the zip file with basename and extension swapped.
ZIP_OPTS
[VMS] see ZIPOPT
In my case, I was using zip in a script and had the binary location in an environment variable ZIP so that we could change to a different zip binary easily without making tonnes of changes in the script.
Example:
ZIP=/usr/bin/zip
...
${ZIP} -r folder.zip folder
This is then processed as:
/usr/bin/zip /usr/bin/zip -r folder.zip folder
And generates the errors:
zip warning: name not matched: folder.zip
zip I/O error: Operation not permitted
zip error: Could not create output file (/usr/bin/zip.zip)
The first because it's now trying to add folder.zip to the archive instead of using it as the archive. The second and third because it's trying to use the file /usr/bin/zip.zip as the archive which is (fortunately) not writable by a normal user.
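A minimal sketch of one way around this is to keep the binary location in a variable that zip does not read. The names ZIP_BIN and make_archive are hypothetical. The subshell also unsets ZIP and ZIPOPT for the call, in case they are already set in the environment.

```shell
# Sketch: keep the binary path in a variable that zip itself ignores.
# ZIP_BIN and make_archive are hypothetical names.
ZIP_BIN=${ZIP_BIN:-/usr/bin/zip}

make_archive() {
    # Run in a subshell so the unset does not affect the rest of the script.
    (
        unset ZIP ZIPOPT
        "$ZIP_BIN" -r "$1" "$2"
    )
}

# Usage:
# make_archive folder.zip folder
```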
Note: This is a really old question, but I didn't find this answer anywhere, so I'm posting it to help future searchers (my future self included).
eebbesen hit the nail on the head in his comment for my case (but I cannot vote on comments).
Another possible reason not mentioned in the other comments is a file exceeding the file size limit (4 GB).
I converted my script for the Unix environment using the dos2unix command, and executed it as ./myscript.sh instead of bash myscript.sh.
I just discovered another potential cause for this. If the permissions of the directory/subdirectory don't allow the zip to find the file, it will report this error. Actually, if you run a chmod -R 444 on the directory, and then try to zip it, you will reproduce this error, and also have a "stored 0%" report, like this:
zip warning: name not matched: borrar/enviar
adding: borrar/ (stored 0%)
Hence, try changing the permissions of the files. If you are trying to send them by email, and the email service (like Gmail) refuses to send executables, don't forget that making the permissions too strict before zipping can cause the "name not matched" error you are reporting.
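The reason chmod -R 444 triggers this is that it removes the execute (search) bit from directories, so zip can no longer enter them. A minimal sketch of a repair, with the hypothetical helper name fix_tree_perms, is:

```shell
# Sketch: restore read and directory-search permission for the owner.
# fix_tree_perms is a hypothetical helper name.
fix_tree_perms() {
    # u+r : the owner may read files and list directories
    # u+X : the owner gets execute on directories, and on files only if
    #       some execute bit is already set, so plain files stay non-executable
    chmod -R u+rX "$1"
}

# Usage (directory name taken from the example above):
# fix_tree_perms borrar
# zip -r borrar.zip borrar
```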
spaces are not allowed:
It would fail if there is more than one file in $FILES, unless you process them in a loop.
I also encountered this issue. In my case, the line separator in my zip shell script was CRLF, which caused the problem. Using LF fixed it.

"Unable to open image" error when using ImageMagick's Filename References

I'm using ImageMagick to do some image processing from the command line, and would like to operate on a list of files as specified in foo.txt. From the instructions here: http://www.imagemagick.org/script/command-line-processing.php I see that I can use Filename References from a file prefixed with @. When I run something like:
montage @foo.txt output.jpg
everything works as expected, as long as foo.txt is in the current directory. However, when I try to access bar.txt in a different directory by running:
montage /some_directory/@bar.txt output2.jpg
I get:
montage: unable to open image /some_directory/@bar.txt: No such file or directory @ blob.c/OpenBlob/2480.
I believe the issue is my syntax, but I'm not sure what to change it to. Any help would be appreciated.
This is quite an old question, but it seems relatively obvious that you need to put the @ before the full path:
montage @/some_directory/bar.txt output2.jpg
As of ImageMagick 6.5.4-7 2014-02-10, paths are not supported with the @ syntax. The @ file must be in the current directory and identified by name only.
I haven't tried directing IM to pull the list of files from a file, but I do specify multiple files on the command line like this:
gm -sOutputFile=dest.ext -f file1.ppm file2.ppm file3.ppm
Can you pull the contents of that file into a variable, and then let the shell expand that variable?
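That idea could be sketched as follows. The helper name montage_from_list is hypothetical, and the list file is assumed to hold one image path per line. Reading line by line, rather than expanding an unquoted variable, keeps paths that contain spaces intact.

```shell
# Sketch: read one filename per line and pass them all as arguments.
# montage_from_list is a hypothetical helper name.
montage_from_list() {
    list=$1     # text file with one image path per line
    output=$2   # e.g. output2.jpg
    set --      # reuse the positional parameters as the argument list
    # The || test also picks up a last line that has no trailing newline.
    while IFS= read -r name || [ -n "$name" ]; do
        if [ -n "$name" ]; then
            set -- "$@" "$name"
        fi
    done < "$list"
    montage "$@" "$output"
}

# Usage:
# montage_from_list /some_directory/bar.txt output2.jpg
```

Relative paths inside the list are resolved against the current directory, not the directory that holds the list.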
