jpegoptim vs jpegtran vs mozjpeg

I found two slightly conflicting blog posts on this matter; here is a summary:
One says jpegoptim and jpegtran perform identically and produce identical results, while mozjpeg takes twice as long for very marginal filesize savings.
The other says mozjpeg takes roughly 1.7× the compute time of jpegtran and delivers a roughly proportional benefit in filesize savings.
Adding to the confusion, this Reddit thread suggests that mozjpeg uses the same algorithm used in jpegcrush, and that jpegcrush is itself a wrapper for jpegtran... so have we come full circle? I can only assume those Reddit comments are wrong; can anyone confirm that?

The MozJPEG library uses one algorithm inspired by jpegcrush (optimized progressive scans), but technically it's a completely new implementation.
MozJPEG is a drop-in replacement for the popular libjpeg, so almost every JPEG-related tool can be compiled in a "regular" and a "MozJPEG" flavor.
There exists a "regular" jpegtran and a MozJPEG version of jpegtran. That's the same program, but the MozJPEG version has different default settings and performs extra work to compress better.
Similarly, jpegoptim is usually built with stock libjpeg, but it's also possible to build it with MozJPEG's version of libjpeg (e.g. ImageOptim does it).
There are two ways to use MozJPEG:
lossless (take an existing JPEG file and make it a bit smaller). That's what MozJPEG's jpegtran does.
lossy (create a new JPEG file from uncompressed pixels, with higher quality/filesize ratio). That's what MozJPEG's cjpeg does, and other tools like jpegoptim can be made to do with MozJPEG.
Both modes of operation are slower than vanilla, non-optimizing libjpeg. Lossless optimization does less work, but also achieves a smaller gain.
More precise speed analysis is here: https://libjpeg-turbo.org/About/Mozjpeg
Note that "slow" here is relative. In absolute terms it compresses several megapixels per second, so it may be fast enough for most applications.

Related

nodejs choosing between JIMP and MOZJPEG

I was wondering if there is a glaring reason to use jimp vs. imagemin-mozjpeg for compressing JPEGs (I am already using both imagemin and jimp in my project: imagemin-webp to serve next-gen images, and jimp to convert PNGs to JPEGs in rare cases). So I am looking for reasoning based on the following:
Performance
Reliability (I have noticed that there are some JPEGs mozjpeg has trouble with and fails on, specifically ones that I have processed with the GNU Image Manipulation Program [GIMP].)
However, if someone has good reasons that don't align with the two aforementioned, I would still like to hear them.
Here are some quick links to the npm packages mentioned, if anyone needs them:
imagemin-mozjpeg
jimp
Performance
imagemin-mozjpeg uses mozjpeg to process images, and mozjpeg itself is written in C, while jimp does the processing in JavaScript.
As mentioned in the main jimp repository:
An image processing library for Node written entirely in JavaScript, with zero native dependencies.
We know the difference in performance between JavaScript and C.
Reliability
I do not want to put much opinion in this section, but we can look directly at the statistics of each repository.
mozjpeg:
Stars: 4.1k
Open Issues: 76
Closed Issues: 186
jimp:
Stars: 10.3k
Open Issues: 157
Closed Issues: 430
I do not side with either; both have worked well. I really appreciate the work of the maintainers and contributors of both libraries.
Yes, and it goes far beyond the performance of the compression process (i.e. how long it takes to compress an image, which is also important) or the relative activity of development of the library (which is arguably less important).
I highly recommend reading Is WebP really better than JPEG? (and this discussion), which shows that even among JPEG compression libraries, the implementation can have a significant impact on compression ratio.
In short, MozJPEG produces JPEG files that are about 10% smaller than those produced by the reference JPEG implementation (libjpeg). Even more interesting, for images larger than 500px, MozJPEG actually produces files that are smaller than WebP.
This leads to an interesting conclusion: depending on your exact use case and priorities, it might actually make sense to simplify, use MozJPEG for everything, and ditch WebP entirely.
Looking forward, AVIF might make sense as a true next-gen format (delivering 30% smaller images), but browser support is "coming soon". Alternatively, JPEG XL also looks promising, but the standard hasn't been finalized yet. HEIC is problematic and I wouldn't count on wide support.
Warning regarding jimp:
As jimp is implemented in pure JavaScript, all image operations end up blocking the JS thread. This is catastrophic in node.js.
You must use the new Worker Threads API manually to run jimp on a thread.
Finally, a warning regarding selecting image manipulation libraries generally in the node.js world:
From what I've seen, a majority of them end up writing temp files to disk, invoking a child process to do the actual work, and then reading the result back in (e.g. something like child_process.exec('imageresizer -in temp/file.jpg -out temp/resized.jpg')).
This is not an ideal way to do this, and it may be especially surprising when the API looks something like var img = await resizeImg(buffer), which does not look like it writes to disk.
imagemin is one such library; I would avoid it where performance matters.
Instead, search for modules that implement bindings to native code on the libuv thread pool. This will usually be the most performant way to work with images, since the operations happen on a thread in your node process and with minimal memory copying — and no disk I/O at all.
I've never used it, but node-mozjpeg looks like a good candidate.

Is there a binary kind of SVG?

It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. Now if I had a cross-platform language (which I do) and something like a binary version of SVG, I could make my code target that and use/create interpreters for whatever platform I currently need to use it on.
The reason I don't want SVG is that the plaintext part makes it too slow for my purposes. I could of course just create my own intermediary format, but if there is something already out there that's implemented by various tools, that's less work for me!
Depending on what you mean by “too slow”, the answer varies:
Filesize too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
You could also try Brotli compression, which tends to produce smaller files at the cost of more compression time.
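A minimal sketch with the reference brotli command-line tool (the file name is a placeholder). Note that, unlike SVGZ, there is no standard ".svg.br" that viewers open directly; Brotli is mainly useful when the file is served over HTTP with Content-Encoding: br:
# -q 11 is the maximum (slowest) quality; -k keeps the original file
brotli -q 11 -k -o yourfile.svg.br yourfile.svg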
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which adds a fair amount of overhead (roughly a third on top of the raw size).
PDF can include “streams” of raw binary data, and is surprisingly efficient when programmatically generated.
Parsing the text data takes too long
This is tricky. PDF and its brother, Encapsulated PostScript, are also old, well-supported vector graphics formats. Unfortunately, they, too, are text at their core, with optional compression.
You could try Computer Graphics Metafiles, which can be compiled ahead of time. But I’m unsure how well-supported they are across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. Geoscience Australia did a formal study into the parsing speed: parsing floating-point numbers from text is relatively expensive for big data sets, compared to reading a 4- or 8-byte binary representation.
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of
variable-length integer coding and
normalising your points to a scale represented by integers, then storing paths as sequences of deltas
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line:
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
So far, all I've been able to find in the way of a binary SVG is this post about a Go project, which links to the project description and repo.
Adobe Flash SWF files may work. Due to the format's previous ubiquity, 'players' and libraries were written for many platforms. The specification was open and its license permissive. For simple 2D graphics, earlier, more widely supported versions would do fine.
The files are binary and extraordinarily small.

Image size optimisation in Linux

There is a Yahoo "Smush.it" service that allows you to optimise image size.
I am looking for an alternative way to reach the same goal using a Linux application.
A lot of images need to be processed, and uploading them manually one by one does not seem like a good idea.
How can this be done in Linux?
For JPEGs I use JPEGmini; from what I have tested, it gives the best results, keeping the same visible image quality while greatly reducing the size. They have a server version for Linux, which is not cheap and which I have never used.
There's also Mozilla's mozjpeg, which you can use directly from the terminal, but it also reduces image quality.
In some tests I did, mozjpeg gave slightly smaller files than JPEGmini's, but with lower image quality.
If you need to reduce pngs, you could try Trimage or some of the alternatives listed on the same link.
Smush.it's FAQ lists all the tools they are using in their service.
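To actually process a large batch without uploading files one by one, a simple find invocation from the shell is usually enough. This is a minimal sketch, assuming jpegoptim and optipng are installed (both are packaged by the major distributions); adjust the quality cap to taste:
# Lossless, in-place optimization of every JPEG under the current directory
find . -type f -iname '*.jpg' -exec jpegoptim --strip-all {} +
# Lossy variant: additionally cap quality at 85 for larger savings
# find . -type f -iname '*.jpg' -exec jpegoptim --strip-all --max=85 {} +
# Lossless PNG optimization
find . -type f -iname '*.png' -exec optipng -o2 {} +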

Linux command to compress the most at the expense of being slow

I know there are commands like gzip, bzip2, tar and xz. I want to know which command compresses the most, to reclaim the most space, even at the expense of slower compression. Any benchmark statistics would be great.
I googled but couldn't find any page that compared them in the way I need.
You can find some (potentially outdated) benchmarks here. In general, xz -9 is going to compress things pretty well. As others have mentioned, it really depends on what data you're trying to compress, though.
I would recommend giving either bzip2 or xz a try and seeing whether they are fast enough and/or compress well enough for your particular application. Benchmarks are never as good as testing on your exact data set.
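If you want numbers for your own data rather than a generic benchmark, a quick comparison like the following is enough. This is a sketch with placeholder file names; -k keeps the input archive so both compressors can be run on the same file:
tar -cf data.tar data/
time xz -k -9e data.tar      # -9 plus --extreme: slowest preset, usually smallest output
time bzip2 -k -9 data.tar
ls -l data.tar data.tar.xz data.tar.bz2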

Ways to Optimize Ghostscript?

I'm using Ghostscript on a Linux server to combine PDFs. I'm using the latest version, 9.05. I typically have to combine 5 or more PDFs. Currently it takes around 20 seconds to combine 3 PDFs... that just seems really slow to me. Here's the line I'm using:
gs -dBATCH -dNOPAUSE -dNOGC -q -sDEVICE=pdfwrite -sOutputFile=output.pdf -c 3000000 setvmthreshold -f a.pdf b.pdf c.pdf
Any suggestions?
Ghostscript fully interprets each PDF file down to marking operations and then builds a new file from the combined content. This is, obviously, far slower than simply copying the content stream around. That's why what you are doing seems slow.
As suggested above, use a tool which just copies the content streams and objects, renumbering as required; this will be much faster. In addition to pdfjam (which I don't know anything about), you could also look at pdftk. There are bound to be others as well.
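For example, pdftk concatenates by copying pages and objects rather than re-interpreting them (file names are placeholders):
pdftk a.pdf b.pdf c.pdf cat output combined.pdf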
If you just need to concatenate some PDFs, you might check out pdfjam. I've never found it slow during concatenation, but it does at times produce output PDFs that print rather slowly.
Use pdfconcat; it'll do it in a split second. Ghostscript is slow at doing everything.
After tracking down what was causing time-outs, I've noticed that Ghostscript is a lot faster dealing with PDF v1.4 (Acrobat 5 compatibility) files as opposed to v1.7. Simply saving the PDF files as v1.4 speeds things up when the files are later used by Ghostscript.
This may not be applicable to all situations. Depending on the speed gain, it may be worth using Ghostscript to convert the PDF down to a lower version before doing the other operations. I'm seeing the time drop from 30+ seconds to ~1 second just by changing the PDF version.
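A minimal sketch of that down-conversion step with Ghostscript's pdfwrite device (file names are placeholders):
gs -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -sOutputFile=a-v14.pdf a.pdf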
