Efficient image compression for PDF embedding with Linux

I would like to compress scanned text (monochrome or a few colours) and store it in PDF (or maybe DjVu) files. I remember getting very good results on Windows with Acrobat and "ZRLE"-compressed monochrome TIFF embedded into PDF. As far as I remember, the algorithm was lossless. Now I am looking for a way to get good results on Linux. It should save storage and avoid loss (I do not mind losing colours, but I do not want, e.g., JPEG compression, which would create noisy results for text scans). I need it for batch conversion, so I was thinking of the ImageMagick convert command. But which output format should I use to get good results and be able to embed the images into PDF files (for example, using pdflatex)? Or is it generally better to use DjVu files?

jbig2enc, an encoder for images using JBIG2 compression, was originally written for Google Books by Adam Langley:
https://github.com/agl/jbig2enc
I forked it to include the latest improvements by Rubypdf and others:
https://github.com/DingoDog/jbig2enc
I also built several binaries of jbig2enc for Puppy Linux (they may also work on other distributions):
http://dokupuppylinux.info/programs:encoders

DjVu is not a bad choice, but if you want to stay with PDF for better compatibility, you may want to look into lossless JBIG2 compression.
Quote from Wikipedia:
Overall, the algorithm used by JBIG2 to compress text is very similar to the JB2 compression scheme used in the DjVu file format for coding binary images.
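The idea JBIG2 and JB2 share for text, pattern matching and substitution, can be sketched in a few lines: split a bilevel page into connected components (roughly, glyphs), store each distinct glyph bitmap once in a dictionary, and record every occurrence as a dictionary index plus a position. This is a simplified, exact-match-only illustration under assumptions (4-connectivity, the page given as strings of '0'/'1'); it is not the JBIG2 bitstream format, and real encoders also entropy-code the dictionary and positions.

```python
from collections import deque

def components(page):
    """Yield (x, y, bitmap) for each 4-connected group of black pixels.

    page is a list of equal-length strings of '0'/'1' (1 = black).
    bitmap is a tuple of strings cropped to the component's bounding box,
    containing only this component's pixels.
    """
    h, w = len(page), len(page[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if page[sy][sx] != "1" or seen[sy][sx]:
                continue
            # Breadth-first flood fill collecting one component's pixels.
            pixels, queue = [], deque([(sx, sy)])
            seen[sy][sx] = True
            while queue:
                x, y = queue.popleft()
                pixels.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and page[ny][nx] == "1":
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            x0, x1 = min(p[0] for p in pixels), max(p[0] for p in pixels)
            y0, y1 = min(p[1] for p in pixels), max(p[1] for p in pixels)
            on = set(pixels)
            bitmap = tuple(
                "".join("1" if (x, y) in on else "0" for x in range(x0, x1 + 1))
                for y in range(y0, y1 + 1)
            )
            yield x0, y0, bitmap

def encode(page):
    """Return (dictionary, placements): unique glyph bitmaps and (index, x, y)."""
    dictionary, index_of, placements = [], {}, []
    for x, y, bitmap in components(page):
        if bitmap not in index_of:
            index_of[bitmap] = len(dictionary)
            dictionary.append(bitmap)
        placements.append((index_of[bitmap], x, y))
    return dictionary, placements

def decode(dictionary, placements, width, height):
    """Paint every placed glyph back onto a blank page (exact inverse)."""
    grid = [["0"] * width for _ in range(height)]
    for idx, x, y in placements:
        for dy, row in enumerate(dictionary[idx]):
            for dx, bit in enumerate(row):
                if bit == "1":
                    grid[y + dy][x + dx] = "1"
    return ["".join(row) for row in grid]

page = [
    "0000000000000",
    "0110011001100",
    "0110011001100",
    "0000000000000",
]
d, p = encode(page)
print(len(d), len(p))  # 1 3: one stored glyph, placed three times
```

Because every black pixel belongs to exactly one component, painting the placements back reproduces the page bit for bit; lossy JBIG2 modes differ by also merging glyphs that are only similar, which is where the well-known character-substitution errors come from.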

Related

Archive and compress a set of very similar PNG images

I have tens of thousands of PNG images that are very similar to each other, and I would like to archive them and compress them in the process. I am aware that PNG files can barely be compressed further because PNG is already a compressed format. In my case, though, the images are similar to each other, which is why I thought there might be a program that takes advantage of that. Any hints?
EDIT: example image: https://imgur.com/a/N9csZZH
Video compression also takes advantage of similarity between images (frames) to make the output smaller. You should try a codec with a lossless mode, such as FFV1, or VP9 in lossless mode (usually stored in a WebM container).
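The inter-frame idea that video codecs rely on can be illustrated without a codec: keep the first image as-is, store every later image as its byte-wise difference (XOR) from the previous one, and compress the whole stream together. This is a minimal standard-library sketch assuming the images have already been decoded to raw pixel buffers of identical size (decoding the PNGs is left out); the synthetic data and zlib stand in for real images and a real codec.

```python
import random
import zlib

def delta_encode(frames):
    """Store frame 0 verbatim, then each frame XOR-ed with its predecessor."""
    out = [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        out.append(bytes(a ^ b for a, b in zip(prev, cur)))
    return out

def delta_decode(deltas):
    """Invert delta_encode by XOR-ing each delta onto the rebuilt previous frame."""
    frames = [deltas[0]]
    for d in deltas[1:]:
        frames.append(bytes(a ^ b for a, b in zip(frames[-1], d)))
    return frames

# Synthetic data: one noisy "base" image and 20 variants differing in a few bytes.
rng = random.Random(0)
base = bytes(rng.randrange(256) for _ in range(64 * 64))
frames = []
for _ in range(20):
    img = bytearray(base)
    for _ in range(10):
        img[rng.randrange(len(img))] = rng.randrange(256)
    frames.append(bytes(img))

# Each image compressed on its own vs. the delta stream compressed as one unit.
independent = sum(len(zlib.compress(f, 9)) for f in frames)
together = len(zlib.compress(b"".join(delta_encode(frames)), 9))
print(independent, together)
```

The deltas are almost entirely zero bytes, so the combined stream costs roughly one image plus a little per variant, and delta_decode restores every frame exactly.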
I think that for PNG or other lossless formats, it does not matter how similar the images appear to humans when you want to compress them (in a single tar, for example). A difference of only a few bits or pixels makes each image a mathematically different object. If zstd at a high compression level cannot do the trick, you need not search any further: you cannot outperform entropy. There is a mathematical limit on compression, and zstd comes close to it.

How to convert an Enhanced Windows Metafile (EMF) to a JPG or PNG without losing quality?

I am using Tableau for data visualizations, and the only good-quality image export Tableau allows is *.emf.
Unfortunately, the online tool I use to put the report together (Canva) does not support the EMF format.
When I convert the file to JPG or PNG, the quality is drastically reduced :(
How can I overcome this? I tried many things, such as opening the EMF in Illustrator and saving it back with CMYK colors at 300 dpi. But nothing seems to keep the crisp quality of the original EMF file.
User-friendly solution:
Inkscape opens Enhanced Windows Metafiles and many other vector graphics file formats.
It exports to PNG with a choice of output resolution.
It is open source and available for Linux, Windows and Mac OS X.
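For many files, Inkscape can also be driven from the command line. The sketch below assumes Inkscape 1.x is on the PATH (the older 0.92 series used different export flags); the folder, file names and the 300 dpi default are placeholders.

```python
import shutil
import subprocess
from pathlib import Path

def inkscape_png_command(src, dpi=300):
    """Build an Inkscape 1.x command that rasterises src to a PNG beside it."""
    src = Path(src)
    out = src.with_suffix(".png")
    return [
        "inkscape",
        str(src),
        "--export-type=png",
        f"--export-filename={out}",
        f"--export-dpi={dpi}",
    ]

def convert_all(folder, dpi=300):
    """Convert every .emf file in folder; requires the inkscape binary."""
    if shutil.which("inkscape") is None:
        raise RuntimeError("inkscape not found on PATH")
    for emf in sorted(Path(folder).glob("*.emf")):
        subprocess.run(inkscape_png_command(emf, dpi), check=True)

# Example (hypothetical folder name): convert_all("tableau_exports", dpi=600)
```

Because EMF holds vector data, raising the export dpi adds pixels instead of blurring: the output width in pixels is the drawing's width in inches multiplied by the dpi.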
It is a fact that Tableau's image export feature does not provide many options. In general, when I need high-quality images, I use one of the methods below, depending on the quality I need and the tools available to me at the time:
Screenshot method: If you have a large screen, taking a screenshot directly from Tableau yields better images than the exported ones. If my viz is published to the web, I sometimes enlarge the graphic in my web browser and then take the screenshot.
Converting from PDF: Since PDF can contain vector objects, Tableau's PDF exports are high quality most of the time. If you cannot use these PDF files directly, you may try converting them to PNG or JPG files using online or desktop tools. Here is an online tool you may use for this purpose, but be careful with confidential files when using such online services :)
There are more ways to convert from PDF, but they are usually more complicated because they involve some Photoshop steps. I am not sure whether they are easy to apply to a lot of files, but you may still want to check them out: https://community.tableau.com/thread/120134
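As a local alternative to an online converter (which also avoids uploading confidential files), Poppler's pdftoppm can rasterise a PDF at a chosen resolution; its -png flag selects PNG output and -r sets the resolution in dpi. This sketch assumes poppler-utils is installed, which the answer above does not mention; file names are placeholders.

```python
import shutil
import subprocess

def pdftoppm_command(pdf, out_prefix, dpi=300):
    """Build a pdftoppm call writing one PNG per page, named after out_prefix."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf, out_prefix]

def pdf_to_png(pdf, out_prefix, dpi=300):
    """Run the conversion; requires the pdftoppm binary from poppler-utils."""
    if shutil.which("pdftoppm") is None:
        raise RuntimeError("pdftoppm (poppler-utils) not found on PATH")
    subprocess.run(pdftoppm_command(pdf, out_prefix, dpi), check=True)

# Example (hypothetical names): pdf_to_png("dashboard.pdf", "dashboard", dpi=600)
```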

How can I find and extract an image from inside a proprietary file format?

I have cached preview files from Capture One (a photo processing program, similar to Lightroom), but I have lost the originals. Capture One saves previews in its proprietary .cop format, and I'm not sure how to go about identifying what's what in there.
The strings ETIFFTagInteropIFD and JPEG Embedded TIFF Tags appear in the hex view, which suggests that a TIFF is somehow embedded in there.
I do have original JPEG files with their corresponding COP files, but when comparing them there isn't much that's similar, which makes sense, I guess, since the preview COP file is roughly half the size of the original.
What conclusions can I draw from this and what are some good tools for going further?
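One common first step is signature scanning ("file carving"): search the file for byte patterns that begin embedded formats, such as the JPEG start-of-image marker FF D8 FF and the TIFF headers 49 49 2A 00 (little-endian, "II*") and 4D 4D 00 2A (big-endian, "MM"), and cut candidate JPEGs at the next end-of-image marker FF D9. This is a naive sketch, not a description of the .cop format: it cuts too early if a JPEG contains an embedded thumbnail, and it cannot tell where an embedded TIFF ends. Tools such as binwalk automate the same idea; the input file name is a placeholder.

```python
from pathlib import Path

JPEG_SOI = b"\xff\xd8\xff"          # start-of-image marker + next marker's 0xFF
JPEG_EOI = b"\xff\xd9"              # end-of-image marker
TIFF_MAGIC = (b"II*\x00", b"MM\x00*")  # little- and big-endian TIFF headers

def find_all(data, pattern):
    """Return every offset at which pattern occurs in data."""
    offsets, start = [], data.find(pattern)
    while start != -1:
        offsets.append(start)
        start = data.find(pattern, start + 1)
    return offsets

def carve_jpegs(data):
    """Naively cut data from each JPEG SOI marker to the next EOI marker."""
    pieces = []
    for soi in find_all(data, JPEG_SOI):
        eoi = data.find(JPEG_EOI, soi + 2)
        if eoi != -1:
            pieces.append((soi, data[soi:eoi + 2]))
    return pieces

def report(path):
    """Print TIFF header offsets and write each JPEG candidate to its own file."""
    data = Path(path).read_bytes()
    for magic in TIFF_MAGIC:
        for off in find_all(data, magic):
            print(f"possible TIFF header at offset {off:#x}")
    for off, blob in carve_jpegs(data):
        out = Path(f"{path}.{off:08x}.jpg")
        out.write_bytes(blob)
        print(f"wrote {len(blob)} bytes from offset {off:#x} to {out}")

# Example (hypothetical file): report("preview.cop")
```

Comparing the offsets found in a COP file against its known original JPEG (for example, whether a carved piece opens in an image viewer) is a quick way to test the embedded-TIFF hypothesis.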

Is there a binary kind of SVG?

It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. If I had a cross-platform language (which I do) and something like a binary version of SVG, I could make my code target that and use or create interpreters for whatever platform I currently need.
The reason I don't want SVG is that the plaintext part makes it too slow for my purposes. I could of course create my own intermediate format, but if something already exists that is implemented by various tools, then so much the less work for me!
Depending on what you mean by “too slow”, the answer varies:
File size too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
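If the file is served over HTTP, you could also try Brotli compression (as a Content-Encoding), which tends to produce smaller files at the cost of more compression time. Note that .svgz is defined as gzip only, so SVG viewers will not open a Brotli-compressed file directly.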
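The gzip step can also be done programmatically, for example with Python's standard gzip module. An .svgz file is plain gzip data, so the round trip back to the original markup is exact; the sample SVG below is made up for illustration.

```python
import gzip
import tempfile
from pathlib import Path

def write_svgz(svg_text, path):
    """Gzip-compress SVG markup, save it (use a .svgz name), return the size."""
    data = gzip.compress(svg_text.encode("utf-8"), compresslevel=9)
    Path(path).write_bytes(data)
    return len(data)

def read_svgz(path):
    """Decompress an .svgz file back to SVG markup."""
    return gzip.decompress(Path(path).read_bytes()).decode("utf-8")

svg = ('<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">'
       + '<polyline points="100,200 100,100" stroke="black"/>' * 50
       + "</svg>")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.svgz"
    size = write_svgz(svg, target)
    roundtrip = read_svgz(target)

print(len(svg.encode("utf-8")), "->", size, "bytes")
```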
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which inflates the data by about a third (4 output characters for every 3 input bytes).
PDF can include “streams” of raw binary data, and it is surprisingly efficient when generated programmatically.
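The base64 overhead is easy to check with Python's standard library: the encoded length is 4 × ceil(n / 3) characters for n input bytes, before any data: URI prefix or surrounding markup is added.

```python
import base64
import math
import os

for n in (3, 1000, 1_000_000):
    raw = os.urandom(n)
    encoded = base64.b64encode(raw)
    # Standard (padded) base64 output length is 4 * ceil(n / 3).
    assert len(encoded) == 4 * math.ceil(n / 3)
    print(f"{n} bytes -> {len(encoded)} bytes ({len(encoded) / n - 1:.1%} larger)")
```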
Parsing the text data takes too long
This is tricky. PDF and its sibling, Encapsulated PostScript, are also old, well-supported vector graphics formats. Unfortunately, they too are text at their core, with optional compression.
You could try the Computer Graphics Metafile (CGM) format, which has a binary encoding and can be compiled ahead of time. But I'm unsure how well supported it is across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. Geoscience Australia did a formal study into the parsing speed: parsing floating-point numbers from text is relatively expensive for big data sets, compared to reading a 4- or 8-byte binary representation.
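The size and parsing difference described here can be reproduced with Python's standard library alone. The sketch stores the same coordinates as comma-separated text and as packed 64-bit doubles (array module, little-endian), then decodes both. The point count is an arbitrary choice, and timings depend entirely on the machine, so they are printed rather than asserted.

```python
import array
import random
import sys
import timeit

rng = random.Random(42)
values = [rng.uniform(-1e6, 1e6) for _ in range(100_000)]

# Text encoding: repr() round-trips Python floats exactly.
text = ",".join(repr(v) for v in values)

# Binary encoding: 8 bytes per double, stored little-endian.
binary = array.array("d", values)
if sys.byteorder == "big":
    binary.byteswap()
blob = binary.tobytes()

def parse_text():
    """Parse every number from its decimal text form."""
    return [float(s) for s in text.split(",")]

def parse_binary():
    """Reinterpret the raw bytes as doubles without any text parsing."""
    arr = array.array("d")
    arr.frombytes(blob)
    if sys.byteorder == "big":
        arr.byteswap()
    return arr.tolist()

print(f"text: {len(text)} bytes, binary: {len(blob)} bytes")
print("text parse  :", timeit.timeit(parse_text, number=5))
print("binary parse:", timeit.timeit(parse_binary, number=5))
```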
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of variable-length integer coding and normalising your points to a scale represented by integers, then storing paths as sequences of deltas.
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line:
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
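A minimal sketch of that combination, under stated assumptions: points are quantised to an integer grid (the scale parameter is a hypothetical choice, not Touchgram's format), each delta is zigzag-mapped so small negative steps stay small, and each value is written as an LEB128-style varint (7 bits per byte, high bit set when more bytes follow). With scale 1 the example line takes 7 bytes; quantised to a grid of 100 units it takes 4.

```python
def zigzag(n):
    """Map signed ints to unsigned: 0, -1, 1, -2, ... -> 0, 1, 2, 3, ..."""
    return n << 1 if n >= 0 else ((-n) << 1) - 1

def unzigzag(z):
    """Inverse of zigzag."""
    return z >> 1 if (z & 1) == 0 else -((z + 1) >> 1)

def write_varint(out, value):
    """Append value to bytearray out as little-endian base-128 (LEB128)."""
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return

def read_varint(data, pos):
    """Read one LEB128 value starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos

def encode_path(points, scale=1):
    """Quantise points to multiples of scale, then store zigzag varint deltas."""
    out = bytearray()
    px = py = 0
    for x, y in points:
        qx, qy = round(x / scale), round(y / scale)
        write_varint(out, zigzag(qx - px))
        write_varint(out, zigzag(qy - py))
        px, py = qx, qy
    return bytes(out)

def decode_path(data, scale=1):
    """Rebuild absolute (quantised) points from the delta stream."""
    points, pos, x, y = [], 0, 0, 0
    while pos < len(data):
        dx, pos = read_varint(data, pos)
        dy, pos = read_varint(data, pos)
        x += unzigzag(dx)
        y += unzigzag(dy)
        points.append((x * scale, y * scale))
    return points

line = [(100, 200), (100, 100)]
print(len(encode_path(line)))             # 7
print(len(encode_path(line, scale=100)))  # 4
```

Quantisation is the lossy step: coordinates that are not multiples of scale come back rounded, so the grid size trades precision for bytes.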
So far, all I've been able to find on binary SVG is this post about a Go project, linking to the project description and repo.
Adobe Flash SWF files may work. Because of Flash's former ubiquity, players and libraries were written for many platforms. The specifications were open and the license permissive. For simple 2D graphics, earlier, more compatible versions would do fine.
The files are binary and extraordinarily small.

Streaming images with Yesod and any image conversion library

I need to work with TIFF images online. TIFF images are not supported by browsers, so I thought maybe I could convert them on the fly and stream them to the browser as PNGs.
I found many image processing Haskell libraries, and JuicyPixels looks simple enough and supports reading TIFF and saving to many other formats, including PNG.
The simplest approach is to save to a PNG file and then serve it with sendFile.
But I think involving the hard drive in the process is going to add too much overhead and substantially slow down the response. So my question is: how do I stream the image converted with JuicyPixels from TIFF to PNG directly, without saving it to a file first?
Does JuicyPixels have any streaming interfaces? Or is there a simple way to get the data in a specific format and then pass it to a streaming library like conduit?
As a side question, has anyone streamed images from Yesod?
I don't have any experience with JuicyPixels, but it looks like it encodes to lazy ByteStrings. If that's the case, then you just need to return that lazy ByteString wrapped in a DontFullyEvaluate.
