Linux command to compress the most at the expense of being slow

I know there are commands like gzip, bzip2, tar, and xz. I want to know which command compresses the most, to reclaim the most space, even at the expense of slower compression time. Any benchmark statistics would be great.
I googled but couldn't find any page that compares them against my needs.

You can find some (potentially outdated) benchmarks here. In general, xz -9 is going to compress things pretty well. As others have mentioned, though, it really depends on what data you're trying to compress.
I would recommend giving bzip2 or xz a try and seeing whether they are fast enough and/or compress well enough for your particular application. Benchmarks are never as good as testing on your exact data set.
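A throwaway comparison like the sketch below makes the ratio-versus-time trade-off visible on your own machine. It assumes gzip, bzip2, and xz are installed, and generates a compressible sample file; substitute your real archive for `$f`:

```shell
# Compare compressors on the same input: higher compression costs more time.
# Sketch only; point $f at your real archive instead of the generated sample.
f=$(mktemp)
seq 1 200000 > "$f"                      # roughly 1.4 MB of compressible text
for c in "gzip -9" "bzip2 -9" "xz -9"; do
  start=$(date +%s)
  $c -c "$f" > "$f.out"                  # -c writes to stdout, input is kept
  printf '%-8s %9s bytes  %3ds\n' "${c%% *}" "$(wc -c < "$f.out")" "$(( $(date +%s) - start ))"
done
rm -f "$f" "$f.out"
```

Typically gzip is the fastest with the worst ratio and xz -9 the slowest with the best ratio, but as noted above, only your own data set gives a trustworthy answer.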

Related

nodejs choosing between JIMP and MOZJPEG

I was wondering whether there is a glaring reason to use jimp vs. imagemin-mozjpeg for compressing JPEGs. (I am already using both imagemin and jimp in my project: imagemin-webp to serve next-gen images, and jimp to convert PNGs to JPEGs in rare cases.) So I am looking for reasoning based on the following:
Performance
Reliability (I have noticed that there are some JPEGs mozjpeg has trouble with and fails on, specifically ones that I have processed with the GNU Image Manipulation Program [GIMP].)
However, if someone has good reasons that don't align with the two aforementioned I would still like to hear them.
Here are some quick links to the NPM packages mentioned, if anyone needs them:
imagemin-mozjpeg
jimp
Performance
imagemin-mozjpeg uses mozjpeg to process images, and mozjpeg itself is written in C, while jimp does its processing in JavaScript.
As mentioned in jimp's main repository:
An image processing library for Node written entirely in JavaScript, with zero native dependencies.
We know the difference in performance between JavaScript and C.
Reliability
I do not want to inject much opinion in this section, but we can look directly at the statistics of each repository.
mozjpeg:
Star: 4.1k
Open Issues: 76
Closed Issues: 186
jimp:
Star: 10.3k
Open Issues: 157
Closed Issues: 430
I do not side with either; both have worked well. I really appreciate the work of the maintainers and contributors of both libraries.
Yes, and it goes far beyond the performance of the compression process (i.e. how long it takes to compress an image, which is also important) or the relative development activity of the library (which is arguably less important).
I highly recommend reading Is WebP really better than JPEG? (and this discussion), which shows that even among JPEG compression libraries, the implementation can have a significant impact on compression ratio.
In short, MozJPEG produces jpeg files that are 10% smaller than jpeg files produced by the reference JPEG implementation (libjpeg). Even more interesting, for images larger than 500px, MozJPEG actually produces jpeg files that are smaller than WebP.
This leads to an interesting question. It will depend on exactly your use case and priorities, but it might actually make sense to simplify and use MozJPEG for everything, and ditch WebP entirely.
Looking forward, AVIF might make sense as a true next-gen format (delivering 30% smaller images), but browser support is "coming soon". Alternatively, JPEG XL also looks promising, but the standard hasn't been finalized yet. HEIC is problematic and I wouldn't count on wide support.
Warning regarding jimp:
As jimp is implemented in pure JavaScript, all image operations end up blocking the JS thread. This is catastrophic in node.js.
You must use the new Worker Threads API manually to run jimp on a thread.
Finally, a warning regarding selecting image manipulation libraries generally in the node.js world:
From what I've seen, a majority of them end up writing temp files to disk, invoking a child process to do the actual work, and then reading the result back in (e.g. something like child_process.exec('imageresizer -in temp/file.jpg -out temp/resized.jpg')).
This is not an ideal way to do this, and it may be especially surprising when the API looks something like var img = await resizeImg(buffer), which does not look like it writes to disk.
imagemin is one such library; I would avoid it where performance matters.
Instead, search for modules that implement bindings to native code on the libuv thread pool. This will usually be the most performant way to work with images, since the operations happen on a thread in your node process and with minimal memory copying — and no disk I/O at all.
I've never used it, but node-mozjpeg looks like a good candidate.

jpegoptim vs jpegtran vs mozjpeg

I found two slightly conflicting blog posts on this matter, here I summarize:
jpegoptim and jpegtran perform identically and produce identical results, while mozjpeg takes twice as long for very marginal filesize savings
mozjpeg takes approx 1.7* the compute time and results in a roughly proportional benefit in filesize savings compared to jpegtran
Adding to confusion, this
Reddit thread suggests that mozjpeg uses the same algorithm used in jpegcrush, and that jpegcrush is a wrapper for jpegtran... so we've come full circle? I can only assume that those Reddit comments are wrong, can anyone confirm that?
MozJPEG library uses one algorithm inspired by jpegcrush (optimized progressive scans), but technically it's a completely new implementation.
MozJPEG library is a drop-in replacement for the popular libjpeg, so almost every JPEG-related tool can be compiled in "regular" and "MozJPEG" flavors.
There exists a "regular" jpegtran and a MozJPEG version of jpegtran. That's the same program, but the MozJPEG version has different default settings and performs extra work to compress better.
Similarly, jpegoptim is usually built with stock libjpeg, but it's also possible to build it with MozJPEG's version of libjpeg (e.g. ImageOptim does it).
There are two ways to use MozJPEG:
lossless (take an existing JPEG file and make it a bit smaller). That's what MozJPEG's jpegtran does.
lossy (create a new JPEG file from uncompressed pixels, with higher quality/filesize ratio). That's what MozJPEG's cjpeg does, and other tools like jpegoptim can be made to do with MozJPEG.
Both modes of operation are slower than vanilla non-optimizing libjpeg. Lossless optimization does less work, but also achieves smaller gain.
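The two modes map directly to two command-line tools shipped in a MozJPEG build (filenames here are illustrative; note these must be the MozJPEG-flavored binaries, not the stock libjpeg ones, to get the extra compression):

```shell
# Lossless: re-optimize an existing JPEG's entropy coding and scan layout.
jpegtran -copy none -optimize -progressive in.jpg > smaller.jpg

# Lossy: re-encode from uncompressed pixels at a chosen quality setting
# (djpeg decodes to PPM on stdout; cjpeg re-encodes from stdin).
djpeg in.jpg | cjpeg -quality 80 -optimize > recompressed.jpg
```

The lossy path generally saves more bytes, but it generation-loses quality, so it is only appropriate when you control the quality target.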
More precise speed analysis is here: https://libjpeg-turbo.org/About/Mozjpeg
Note that "slow" here is relative. In absolute terms it compresses several megapixels per second, so it may be fast enough for most applications.

how to check compression type without decompressing?

I wrote code in Node.js to decompress different file types (tar, tar.gz, etc.).
I do not have the filename available to me.
Currently I use brute force to decompress; the first method that succeeds wins.
I want to improve this by knowing the compression type beforehand.
Is there a way to do this?
Your "brute force" approach would actually work very well, since the software would determine incredibly quickly, usually within the first few bytes, that it had been handed the wrong thing. Except for the one that will work.
You can see this answer for a list of prefix bytes for common formats. You would also need to detect the tar format within a compressed format, which is not detailed there. Even if you find a matching prefix, you still need to proceed to decompress and decode to test the hypothesis, which is essentially your brute force method.
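The prefix-byte check can be sketched in shell (in Node.js you would inspect the same bytes at the start of your buffer). The magic numbers below are the well-known ones for gzip, bzip2, xz, and zstd; plain POSIX tar has no leading magic, but carries "ustar" at byte offset 257:

```shell
# Identify a compressed file by its magic bytes.
# gzip: 1f 8b   bzip2: 42 5a 68   xz: fd 37 7a 58 5a 00   zstd: 28 b5 2f fd
detect() {
  case "$(od -A n -t x1 -N 6 "$1" | tr -d ' \n')" in
    1f8b*)        echo gzip ;;
    425a68*)      echo bzip2 ;;
    fd377a585a00) echo xz ;;
    28b52ffd*)    echo zstd ;;
    *)
      # tar has no prefix magic; POSIX tar puts "ustar" at offset 257
      if [ "$(dd if="$1" bs=1 skip=257 count=5 2>/dev/null)" = ustar ]; then
        echo tar
      else
        echo unknown
      fi ;;
  esac
}
```

As noted above, this only forms a hypothesis; a .tar.gz will report gzip, and you still have to decompress and then detect tar inside the decompressed stream.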

Image size optimisation in Linux

There is a Yahoo "Smush.it" service that allows you to optimise image size.
I am looking for an alternative way to reach the same goal using a Linux application.
A lot of images need to be processed, and uploading them manually one by one does not seem like a good idea.
How can this be done in Linux?
For JPEGs I use JPEGmini, and from what I have tested it gives the best results, keeping the same visible image quality while reducing a lot of the size. They have a server version for Linux, which is not cheap, and I have never used it.
There's also Mozilla's mozjpeg which you can use directly from the terminal, but it also reduces image quality.
In some tests I did, mozjpeg gives slightly smaller files than JPEGmini, but with lower image quality.
If you need to reduce pngs, you could try Trimage or some of the alternatives listed on the same link.
Smush.it's FAQ lists all the tools they are using in their service.

Which gstreamer deinterlace method produces the best output?

I am using gstreamer and its deinterlace element; there are several options for its method property.
Some are marked as bad and should not be used. But which one should I use?
I am processing a video file and saving to a new file (i.e. not streaming), so I am willing to sacrifice CPU processing time to get good results. I am also not concerned about the output file size and am willing to sacrifice that for better results. I also don't care whether I need to run this more than once.
I am digitizing old family VHS tapes, and noticed the problem when the small kids were moving, running, or jumping around. Since these are home-made video tapes, there aren't many scene cuts, special effects, or fast-moving cars, just long continuous shots of people doing things.
So given that I don't care about lots of factors (processing time etc.), and the nature of the content, which is the best method to use?
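As a concrete starting point, an offline file-to-file pipeline might look like the sketch below. The element and property names are standard GStreamer 1.x, but the filenames, encoder choice, and method=greedyh selection are assumptions you should tune; the audio stream is ignored for brevity:

```shell
# Offline deinterlace sketch: decode, deinterlace with a high-quality
# method, re-encode. Filenames and the encoder are placeholders.
gst-launch-1.0 filesrc location=tape.avi ! decodebin ! \
  deinterlace method=greedyh fields=all ! videoconvert ! \
  x264enc ! matroskamux ! filesink location=deinterlaced.mkv
```

Since you are not streaming, it may also be worth comparing method=tomsmocomp, a motion-adaptive deinterlacer that costs more CPU but often handles exactly the kind of moving-people footage you describe better than the simpler field-based methods.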