An efficient way to detect corrupted png files? - linux

I've written a program to process a bunch of png files that are generated by a separate process. The capture mostly works, however there are times when the process dies and is restarted, which leaves a corrupted image. I have no way to detect when the process dies or which file it died on (there are ~3000 png files).
Is there a good way to check for a corrupted png file?

I know this is a question from 2010, but I think this is a better solution: pngcheck.
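For reference, a typical invocation might look like the following (assuming pngcheck is installed, e.g. via "apt-get install pngcheck"; the guard simply skips the check on machines without it):

```shell
# -q: print nothing for good files, one line per damaged file; the exit
# status is non-zero if any file fails verification.
if command -v pngcheck >/dev/null 2>&1; then
    pngcheck -q ./*.png || echo "some PNG files are damaged"
fi
```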

Since you're on a Linux system you probably already have Python installed.
An easy way would be to try loading and verifying the files with PIL (Python Imaging Library) (you'd need to install that first).
from PIL import Image  # needs Pillow: pip install Pillow
try:
    v_image = Image.open(file)
    v_image.verify()  # raises an exception if the file is corrupted
except Exception:
    print('corrupted:', file)
(taken verbatim from my own answer in this thread)

A different possible solution would be to slightly change how your processor processes the files: Have it always create a file named temp.png (for example), and then rename it to the "correct" name once it's done. That way, you know if there is a file named temp.png around, then the process got interrupted, whereas if there is no such file, then everything is good.
(A variant naming scheme would be to do what Firefox's downloader does -- append .partial to the real filename to get the temporary name.)
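The rename trick works because mv within a single filesystem is atomic. A tiny sketch of the writer side (the file names are just placeholders):

```shell
dir=$(mktemp -d)                             # stand-in for the capture directory
printf 'fake image data' > "$dir/temp.png"   # 1. write to the temporary name
mv "$dir/temp.png" "$dir/capture-0001.png"   # 2. atomically publish under the real name
```

A reader that only looks for capture-*.png therefore never observes a half-written file.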

Kind of a hack, but it works.
If you are running on Linux or something like it, you might have the "convert" command:
$ convert --help
Version: ImageMagick 5.5.6 04/01/03 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 2003 ImageMagick Studio LLC
Usage: convert [options ...] file [ [options ...] file ...] [options ...] file
If you make an invalid png, and then try to convert, you'll get an error:
$ date> foo.png
$ convert foo.png foo.gif
convert: NotAPNGImageFile (foo.png).

Find all non-PNG files:
find . -type f -print0 | xargs -0 file --mime | grep -vF image/png
Find all corrupted PNG files:
find . -type f -print0 | xargs -0 -P0 sh -c 'magick identify +ping "$@" > /dev/null' sh
The file command only checks the magic number; having the PNG magic number doesn't mean the file is a well-formed PNG.
magick identify is a tool from ImageMagick. By default it only checks the headers of the file, for better performance. Here we use +ping to disable that shortcut and make identify read the whole file.
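If you want the corrupted files listed by name rather than buried in error noise, a per-file variant could look like this (assuming ImageMagick 7's magick binary; the guard skips machines without it):

```shell
# Run identify once per file and report only the failures by name.
if command -v magick >/dev/null 2>&1; then
    find . -type f -name '*.png' -print0 |
        xargs -0 -r -n1 sh -c \
            'magick identify +ping "$1" >/dev/null 2>&1 || echo "corrupt: $1"' sh
fi
```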

Related

Is it possible to get the kernel version from an ELF image file without disassembling it or using grep or strings?

I have a vmlinuz ELF image file. I need to get the kernel version from the image file without disassembling it. Is it possible to get the kernel version from offsets of that compressed image file? The file is ELF 64-bit MSB executable, statically linked, not stripped.
As previously mentioned, the version number is hardcoded into the compressed image file. How to extract it depends first on the compression algorithm used to compress the content.
Decompressing files on Linux can be challenging because of the number of compression algorithms and the corresponding tool options (not to forget that newer algorithms need a newer version of tar).
For files with
file extension tar.gz, tgz use e.g. $ tar -xzv -f vmlinuz.tgz
file extension tar.xz, use e.g. $ tar -xJv -f vmlinuz.tar.xz
file extension tar.bz2, use e.g. $ tar -xjv -f vmlinuz.tar.bz2
So if you have access to the file utility (it should also run on Windows), run the following to get the version string and additional information for a file named e.g. vmlinuz-4.x.y-z-a.
file vmlinuz-4.x.y-z-a
Another way to reverse-engineer it would be to read all the strings in the binary file vmlinuz-4.x.y-z-a and grep for part of the expected output.
strings vmlinuz-4.x.y-z-a | grep 'linked,'
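The version banner is stored as plain text inside the binary, so even grep -a (treat binary data as text) can pull it out of an uncompressed image without strings(1). A toy demonstration with a made-up stand-in file and a hypothetical version number:

```shell
bin=$(mktemp)                                                   # stand-in for the kernel binary
printf 'Linux version 4.15.0-20-generic (gcc ...)\n' > "$bin"   # hypothetical embedded banner
grep -ao 'Linux version [^ ]*' "$bin"                           # → Linux version 4.15.0-20-generic
```

For a compressed vmlinuz this only works after decompression, as discussed above.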

Is it possible to partially unzip a .vcf file?

I have a ~300 GB zipped vcf file (.vcf.gz) which contains the genomes of about 700 dogs. I am only interested in a few of these dogs and I do not have enough space to unzip the whole file at this time, although I am in the process of getting a computer to do this. Is it possible to unzip only parts of the file to begin testing my scripts?
I am trying to extract a specific SNP at a position on a subset of the samples. I have tried using bcftools to no avail. (If anyone can identify what went wrong with that, I would also really appreciate it. I created an empty file for the output (722g.990.SNP.INDEL.chrAll.vcf.bgz), but it returns the following error.)
bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 -o 722g.990.SNP.INDEL.chrAll.vcf.gz -O z 722g.990.SNP.INDEL.chrAll.vcf.bgz
The output type "722g.990.SNP.INDEL.chrAll.vcf.bgz" not recognised
I am planning on trying awk, but need to unzip the file first. Is it possible to partially unzip it so I can try this?
Double check your command line for bcftools view.
The error message 'The output type "something" is not recognized' is printed by bcftools when you specify an invalid value for the -O (upper-case O) command line option like this -O something. Based on the error message you are getting it seems that you might have put the file name there.
Check that you don't have your input and output file names the wrong way around in your command. Note that the -o (lower-case o) command line option specifies the output file name, and the file name at the end of the command line is the input file name.
Also, you write that you created an empty file for the output. You don't need to do that, bcftools will create the output file.
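Putting those fixes together, the corrected invocation would look something like the following (assuming the bgzipped source file is 722g.990.SNP.INDEL.chrAll.vcf.gz and that it has been indexed with bcftools index, which -r requires; the guard only skips execution when the tool or file is absent):

```shell
in=722g.990.SNP.INDEL.chrAll.vcf.gz    # assumed input file name
if command -v bcftools >/dev/null 2>&1 && [ -f "$in" ]; then
    # -O z: compressed VCF output format; -o: output file; input comes last
    bcftools view -f PASS --threads 8 -r chr9:55252802-55252810 \
        -O z -o 722g.990.SNP.INDEL.subset.vcf.gz "$in"
fi
```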
I don't have that much experience with bcftools, but generically: if you want to use awk to manipulate a gzipped file, you can pipe into it so the file is only decompressed as needed, and you can pipe the result directly through gzip so it too is compressed, e.g.
gzip -cd largeFile.vcf.gz | awk '{ <some awk> }' | gzip -c > newfile.txt.gz
Also, zcat is equivalent to gzip -cd: -c writes to standard output, -d decompresses.
As a side note, if you are performing operations on just part of a large file, you may also find the excellent tool less useful: it can view your large file while loading only the parts it needs. The -S option is particularly useful for wide formats with many columns, since it stops line wrapping; -N shows line numbers.
less -S largefile.vcf.gz
Quit the view with q; g takes you to the top of the file.
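To see why the pipe approach never needs the whole file on disk, here is a toy stand-in for the large VCF: gzip decompresses as a stream, and head stops reading as soon as it has what it needs:

```shell
f=$(mktemp).vcf.gz                                    # tiny stand-in file
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\n' | gzip -c > "$f"
gzip -cd "$f" | head -n 1    # → ##fileformat=VCFv4.2
```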

how to find files and execute program on them

I was trying to run a program in Linux with syntax like this:
BET2 <input file> <output file>
This program takes an image, performs some preprocessing on it, and saves the result to a new file. I can run the program correctly on a single file. However, I have about 1 million images and I don't want to run them one by one manually.
So, my question is, is there any way I could do the following:
find all the images (.jpg file) under the current directory
let each image (something.jpg) be the input of the preprocessing program and name the output with something_processed.jpg
You could use something like
ls *.jpg | xargs -I{} BET2 {} processed_{}
You can achieve your task by executing a command similar to the one below:
cd "/path/to/the/folder/containing/your/images"
for CURRENT_IMAGE in *.jpg; do
echo "Preprocessing image file $CURRENT_IMAGE"
# Your program that performs the image preprocessing
BET2 "$CURRENT_IMAGE" "${CURRENT_IMAGE%.jpg}_processed.jpg"
done
Hope it helps :)
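For a million files, find ... -exec is safer than globbing or parsing ls: no word-splitting problems and no argument-list limits. A runnable sketch, with cp standing in for BET2 (swap BET2 back in for real use):

```shell
dir=$(mktemp -d)
touch "$dir/a photo.jpg"       # a name with a space, on purpose
find "$dir" -maxdepth 1 -name '*.jpg' -exec sh -c '
    for f do
        cp "$f" "${f%.jpg}_processed.jpg"   # replace cp with BET2 here
    done' sh {} +
```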

imagemagick RAW file identify/convert - no such file or directory (tmp file)

I'm trying to use imagemagick-7 (CLI) on ubuntu to identify and convert RAW images to PNG format. I am using the dcraw binary as the delegate for identifying and converting raw images.
I updated the dng:decode delegate in delegates.xml as follows:
<delegate decode="dng:decode" command="&quot;dcraw&quot; &quot;%i&quot;" />
When I run this command: magick identify test.dng, I get the following error:
identify: unable to open image '/tmp/magick-24332a6nW8lcwejNJ.ppm': No such file or directory # error/blob.c/OpenBlob/3489.
The same error is given for magick convert. I noticed that imagemagick is generating a temporary intermediate file in my /tmp directory, which has a different name than the name it's expecting to find there. For example, it generates this file - magick-24332P6aVDePOFeCn.ppm - but is expecting the file it generated to have this name - magick-24332a6nW8lcwejNJ.ppm - which is why it's not finding it.
Note: I tested the same thing on OS X and it works perfectly fine (with the same delegates.xml configuration).
Is this a bug in imagemagick's implementation for unix systems or am I doing something wrong? Any thoughts would be greatly appreciated!
You were close. The right command to use in the delegate is:
<delegate decode="dng:decode" command="&quot;dcraw&quot; -c &quot;%i&quot; > &quot;%u.ppm&quot;" />
Almost! You need to use the %o placeholder to tell the delegate manager where the output file will be written. However, the dcraw utility doesn't have an output destination option -- at least from what I can tell from the man page. It does have a stdout option (-c), so you should be able to pipe/redirect the stream to an output location.
dcraw -c %i > %o
or in delegate XML.
<delegate decode="dng:decode" command="&quot;dcraw&quot; -c &quot;%i&quot; > &quot;%o&quot;" />
Update
Some tricks I've observed with custom delegations.
Use full paths to binaries. Usually because I installed something outside of the system's PATH, typically in /usr/local/bin or /opt.
command="/usr/local/bin/dcraw ...
Use mv over pipes. If you're not comfortable with debugging unix streams and such, just do a basic copy/move. We can rewrite the above command as something like...
dcraw %i; mv %i.ppm %o
Ignore escaping quotes until you know it's working. Encapsulating arguments is important and keeps everything safe, but \"%i and "%i are hard to read.
<delegate decode="dng:decode" command="dcraw %i; mv %i.ppm %o" />
... verify things are called correctly ... then probably escape paths.
<delegate decode="dng:decode" command="dcraw &quot;%i&quot;; mv &quot;%i&quot;.ppm &quot;%o&quot;" />
As pointed out previously, use identify -list Delegate to verify that the command is loaded correctly, and -verbose to verify that it was called correctly during runtime.
For anyone else who experiences this problem, my solution ended up being to switch from imagemagick to graphicsmagick, which is by default configured to use dcraw (of course, you need to have dcraw installed and on your PATH).
http://www.graphicsmagick.org/

Resize a list of images in line command

I would like to resize a list of images, all in the same directory. To achieve that, I use convert from ImageMagick. I would like to resize
image1.jpg
image2.jpg
...
into
image1-resized.jpg
image2-resized.jpg
...
I was wondering if there is a way to achieve this in a single command line. An elegant solution would often be useful, not only in this case.
EDIT:
I would like a non-script-like solution, i.e. without a for loop.
If you want to resize them to 800x600:
for file in *.jpg; do convert -resize 800x600 -- "$file" "${file%%.jpg}-resized.jpg"; done
(works in bash)
ls *.jpg|sed -e 's/\..*//'|xargs -I X convert X.jpg whatever-options X-resized.jpg
You can eliminate the sed and be extension-generic if you're willing to accept a slightly different final filename, 'resized-image1.jpg' instead of 'image1-resized.jpg':
ls|xargs -I X convert X whatever-options resized-X
GNU Parallel is even easier than for loops, and it's often faster:
parallel convert -resize 800x600 -- "{}" "{.}-resized.jpg" ::: *.jpg
A few things going on here, from right to left:
::: *.jpg means run the command for every jpg file
{.} means insert the current filename without the suffix (.jpg)
{} means insert the current filename
parallel means run the following command many times in parallel. It will choose the max to do in parallel to match the number of cores your computer has. As each one finishes it will launch the next one until all the jpg files are converted.
This runs the command convert -resize 800x600 -- foo.jpg foo-resized.jpg for each file. The -- tells convert to stop processing flags, in case a file name happens to start with a -.
P.S. On my mac I have Homebrew installed, so I was able to install parallel and convert with
brew install parallel
brew install imagemagick
If your image files have different extensions:
for f in *; do convert -resize 800x600 -- "$f" "${f%.*}-resized.${f##*.}"; done
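The parameter expansions doing the work here: ${f%.*} removes the shortest suffix matching .* (the extension), while ${f##*.} removes the longest prefix matching *. (everything except the extension). A quick demo with a made-up filename:

```shell
f='holiday photo.from-2019.jpeg'
echo "${f%.*}"                    # → holiday photo.from-2019  (name without extension)
echo "${f##*.}"                   # → jpeg                     (extension alone)
echo "${f%.*}-resized.${f##*.}"   # → holiday photo.from-2019-resized.jpeg
```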
