Is there anything more compact than base64 for providing large images in plaintext logs? - node.js

I need to transfer a generated 1920x1080 image out of a Node pipeline that has no connectivity to the internet or to me; the only output I can control is plaintext logs.
So in order to retrieve that image, I need some way to get its data into a log file, which is tricky given the image is around 45KB with JPEG compression.
Converting the image to a base64 data URI works perfectly, as I can copy it from the logs and view the image in my browser, but the size of the URI is unwieldy at 40k+ characters.
Is there a more compact or efficient way of doing this?

This link gives insight: Binary-to-text encoding
Base85/ASCII85 is probably the most promising, but offers no more than a ~5% improvement. Is ~42KB less unwieldy than 45KB? How long will the solution live, how "shareable" must the output be, and must image quality be preserved?
If image quality isn't all that important for your purpose, you may get better mileage out of fiddling with the encoding options: JPEG is a lossy encoding, so you could save significant space at the cost of letting the images get grainier (although at 45KB they probably already are?).
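A minimal sketch of that idea in Node, assuming the pipeline can take an extra dependency such as sharp (the library is my assumption here, not something from the question): re-encode at a lower quality, then log the smaller payload as base64 exactly as before.

// Sketch: shrink the JPEG before logging; sharp is an assumed dependency.
const sharp = require('sharp');

async function logImage(imageBuffer) {
  // Lower quality = smaller buffer = shorter log line; tune to taste.
  const smaller = await sharp(imageBuffer).jpeg({ quality: 40 }).toBuffer();
  console.log(`data:image/jpeg;base64,${smaller.toString('base64')}`);
}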
You may want to get rid of the mention of URIs in your question. You're interested in Base64 as an encoding method (one in which 64 common text characters are used as digits, just as decimal uses 10, hexadecimal 16, Base85 85, etc.). Base64 is just a ubiquitous "standard" (actually a family of several standards - see Base64 Variants).

Related

How can I find and extract an image from inside a proprietary file format?

I have cached preview files from Capture One (a photo processing program, similar to Lightroom) where I have lost the originals. Capture One saves previews in their proprietary .cop format and I'm not sure how to go about identifying what's what in there.
The strings ETIFFTagInteropIFD and JPEG Embedded TIFF Tags appear in the hex view, which suggests that they are somehow embedding a TIFF in there.
I do have original JPEG files with their corresponding COP-file, but when comparing them there isn't much that's similar - which makes sense I guess, since the preview COP-file is roughly half the size of the original.
What conclusions can I draw from this and what are some good tools for going further?
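One low-tech thing to try (a sketch, not specific to the COP format, and the file names are made up) is to carve out anything that looks like a JPEG stream by scanning for the SOI/EOI markers:

// Naive JPEG carving: scan for SOI (FF D8 FF) ... EOI (FF D9) and dump each
// candidate. Expect false positives (embedded thumbnails, stray markers).
const fs = require('fs');

const buf = fs.readFileSync('preview.cop');   // hypothetical input file name
const SOI = Buffer.from([0xff, 0xd8, 0xff]);
const EOI = Buffer.from([0xff, 0xd9]);

let start = buf.indexOf(SOI);
let count = 0;
while (start !== -1) {
  const end = buf.indexOf(EOI, start + SOI.length);
  if (end === -1) break;
  fs.writeFileSync(`carved-${count++}.jpg`, buf.subarray(start, end + EOI.length));
  start = buf.indexOf(SOI, end + EOI.length);
}
console.log(`wrote ${count} candidate JPEG(s)`);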

Is there a binary kind of SVG?

It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. If I had a cross-platform language (which I do) and something like a binary version of SVG, I could make my code target that and use/create interpreters for whatever platform I currently need.
The reason I don't want SVG is that being plaintext makes it too slow for my purposes. I could of course create my own intermediary format, but if something already exists that's implemented by various tools, that's less work for me!
Depending on what you mean by “too slow”, the answer varies:
Filesize too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
You could also try Brotli compression, which tends to have smaller filesize at the cost of more compression time.
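If you're already in Node, the same thing can be sketched with the built-in zlib module (file names follow the shell example above):

// Gzip for SVGZ; Brotli for comparison (both live in Node core's zlib).
const fs = require('fs');
const zlib = require('zlib');

const svg = fs.readFileSync('yourfile.svg');
fs.writeFileSync('yourfile.svgz', zlib.gzipSync(svg));
fs.writeFileSync('yourfile.svg.br', zlib.brotliCompressSync(svg));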
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which has a fair amount of overhead.
PDF can include “streams” of raw binary data, and is surprisingly efficient when programmatically generated.
Parsing the text data takes too long
This is tricky. PDF and its brother, Encapsulated PostScript, are also old, well-supported vector graphics formats. Unfortunately, they too are text at their core, with optional compression.
You could try Computer Graphics Metafiles, which can be compiled ahead of time. But I’m unsure how well-supported they are across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. GeoScience Australia did a formal study into the parsing speed - parsing floating point numbers from text is relatively expensive for big data sets, compared to reading a 4 or 8 byte binary representation.
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of variable-length integer coding and normalising your points to a scale represented by integers, then storing paths as sequences of deltas.
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
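Here's a rough sketch of that delta-plus-varint combination (my own illustration, not Touchgram's actual format; 32-bit integer coordinates assumed):

// Each delta is zigzag-mapped (so small negatives stay small), then written
// as a varint: 7 payload bits per byte, high bit set when more bytes follow.
function encodePath(points) {                 // points: [[x, y], ...]
  const bytes = [];
  let prevX = 0, prevY = 0;
  for (const [x, y] of points) {
    for (const delta of [x - prevX, y - prevY]) {
      let zz = (delta << 1) ^ (delta >> 31);  // zigzag encode
      do {
        bytes.push((zz & 0x7f) | (zz > 0x7f ? 0x80 : 0));
        zz >>>= 7;
      } while (zz > 0);
    }
    prevX = x; prevY = y;
  }
  return Buffer.from(bytes);
}

// "100,200 100,100" -> deltas (100,200) then (0,-100) -> a handful of bytes
console.log(encodePath([[100, 200], [100, 100]]).length);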
So far, all I've been able to find on binary SVG is this post about a Go project, which links to the project description and repo.
Adobe Flash SWF files may work. Due to Flash's former ubiquity, 'players' and libraries were written for many platforms. The specification was open and the license permissive. For simple 2D graphics, earlier, more widely compatible versions would do fine.
The files are binary and extraordinarily small.

How to include UTF-8 data: URIs (for SVGs), in LESS?

Reasoning
I'm bundling small minified SVGs (icons) with my CSS via LESS's data-uri method, to reduce HTTP requests, much like icon fonts such as Octicons or Ye Olde CSS Sprites.
However, LESS encodes them in Base64.
This is sub-optimal in the case of SVG, which can be Data URI'd in UTF-8 (example).
There are three reasons why this is sub-optimal:
1: Base64 is silly for text
The purpose of Base64 is to encode binary data using only 6 bits per byte, making it safe to embed in text files. This is great for PNGs and JPEGs, but it makes any text file 33% larger for no reason. If you're now thinking "well gzip takes care of that", then keep in mind that...
2: Encoding text in Base64 makes gzip much less effective
To understand why this is the case, consider this:
btoa('width') === 'd2lkdGg='
btoa(' width') === 'IHdpZHRo'
btoa('  width') === 'ICB3aWR0aA=='
The same substring encodes to completely different Base64 depending on its byte offset, so the repetition gzip would normally exploit gets hidden.
As a practical example, let's take an actual SVG and experiment with it:
$ wc -c *
68630 tiger.svg
25699 tiger.svg.gz
91534 tiger.txt
34633 tiger.txt.gz
Even after gzipping, it's still ~35% larger.
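That experiment is easy to reproduce in Node if you want to check it against your own files (tiger.svg here stands in for any SVG):

// Compare the gzipped sizes of the raw SVG and its Base64 form.
const fs = require('fs');
const zlib = require('zlib');

const svg = fs.readFileSync('tiger.svg');
const b64 = Buffer.from(svg.toString('base64'));
console.log('svg.gz', zlib.gzipSync(svg).length);
console.log('b64.gz', zlib.gzipSync(b64).length);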
3: It disregards some free sources of redundancy
Think about the width example above. Every SVG will have this substring, and if you embed several SVGs in a CSS file, you'll probably have this keyword (and others) elsewhere too - exactly the kind of repetition gzip's DEFLATE can exploit, but not if it's hidden by Base64.
The Question
How can I embed SVGs in LESS as data: URIs using UTF-8 instead of Base64?
I can imagine a thousand ways to do this involving build tools like Grunt, but it breaks my workflow because I won't be able to do things like style: include:less all.less from a Jade view (I do this in development), or even just #import 'images.less'; from a less file.
I'm an idiot. This is simple:
data-uri('image/svg+xml;charset=UTF-8', 'path/to.svg')
I had to read LESS's source to figure this one out.
All the benefits I mention above are gained here, in particular that if you have tons of small SVGs, they will benefit from the redundancy between each other. And it works in all browsers.
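For anyone outside LESS, here's a rough Node sketch of what such a UTF-8 (non-Base64) data URI looks like when built by hand; the escaping list is deliberately minimal and would need more care in production:

// Minimal escaping: collapse newlines, swap double quotes for single ones,
// percent-encode only characters a quoted CSS url() can't carry raw.
const fs = require('fs');

const svg = fs.readFileSync('path/to.svg', 'utf8')
  .replace(/\r?\n/g, ' ')
  .replace(/"/g, "'")
  .replace(/[<>#%]/g, encodeURIComponent);

console.log(`background-image: url("data:image/svg+xml;charset=UTF-8,${svg}");`);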

What is the advantage of using Base64 encoding?

What is the advantage of using Base64 encoding?
I would like to understand it better. Do I really need it? Can't I simply use pure strings?
I heard that the encoding can be up to 30% larger than the original (at least for images).
Originally, some protocols only allowed 7-bit, and sometimes only 6-bit, data.
Base64 lets you represent 8-bit data using only 6 bits per transmitted character, so it can travel over those kinds of links.
Email is an example of this.
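A minimal Node round trip showing the idea (nothing email-specific here):

// Bytes a 7-bit channel can't carry, mapped into the 64-character alphabet.
const original = Buffer.from([0x00, 0xff, 0x7f, 0x80]);
const encoded = original.toString('base64');              // "AP9/gA=="
const decoded = Buffer.from(encoded, 'base64');
console.log(encoded, decoded.equals(original));           // true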
The primary use case of base64 encoding is when you want to store or transfer data with a restricted set of characters; i.e. when you can't pass an arbitrary value in each byte.
<img alt="Embedded Image"
src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAADIA..." />
This code will show the encoded image inline, but no one can hotlink to it from another website and use up your bandwidth.
The advantage of Base64 encoding, as others have said, is that it turns binary data into (most commonly) ASCII characters. Since the receiving end can almost certainly handle ASCII, it's a convenient way to transfer binary data via a text stream.
If your situation can handle native binary data, that will most likely yield better results in terms of speed and size, but if not, Base64 is probably the way to go. JSON is a great example of when you would benefit from something like this, as is storing binary in a text field somewhere. Give us some more details and we can provide a better-tailored answer.
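For the JSON case, a tiny sketch of what that looks like in Node (the file name and bytes are made up):

// Binary has to become text before it can sit in a JSON string.
const fileBytes = Buffer.from([0x89, 0x50, 0x4e, 0x47]);  // start of a PNG, for example
const wire = JSON.stringify({ name: 'photo.png', data: fileBytes.toString('base64') });
const back = Buffer.from(JSON.parse(wire).data, 'base64');
console.log(back.equals(fileBytes));                       // true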
One application is to transfer binary data in contexts where only characters are allowed. E.g. in XML documents/transfers. XML-RPC is an example of this.
Convert BLOB data to string and back...
Whether or not to use it depends on what you're using it for.
I've used it mostly for encoding binary data to pass through a mechanism that was really created for text files. For example, when passing a digital certificate request around or retrieving the finished digital certificate - in those cases, it's often very convenient to pass the binary data as Base64 via a text field on a web form.
I probably wouldn't use it if you have something that is already text and you just want to pass it somewhere.
I use it for passing around files that tend to get chewed up by email programs because they look like text files (e.g. HL7 transcripts for replay).

Will random data appended to a JPG make it unusable?

So, to simplify my life I want to be able to append 1 to 7 additional bytes to the end of some JPEG images my program is processing*. These are dummy padding (fillers, etc. - probably all 0x00), just to make the file size a multiple of 8 bytes for block encryption.
Having tried this out with a few programs, they appear to be fine with the additional bytes, which occur after the FF D9 marker that specifies the end of the image, so the file format seems well defined enough that the 'corruption' I'm adding at the end shouldn't matter.
I can always post-process the files later if needed, but my preference is to do the simplest thing possible, which is to let the bytes remain (I'm decrypting other file types that won't mind, so having a special case is annoying).
I figure with all the steganography hullabaloo years ago, someone has some input here...
*(the encryption processes 8-byte blocks; I don't want to store the pre-encryption file size, so I append 0x00 bytes to the input data and leave them there after decryption)
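For what it's worth, both steps described above are small in Node (a sketch, not the asker's actual code):

// Pad to an 8-byte multiple with 0x00 before encryption; optionally truncate
// at the EOI marker (FF D9) later if the padding ever needs to come off.
function padTo8(buf) {
  const over = buf.length % 8;
  return over === 0 ? buf : Buffer.concat([buf, Buffer.alloc(8 - over)]); // alloc() zero-fills
}

function stripPadding(buf) {
  const eoi = buf.lastIndexOf(Buffer.from([0xff, 0xd9]));
  return eoi === -1 ? buf : buf.subarray(0, eoi + 2);
}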
No, you can add bytes to the end of a JPEG file without making it unusable. The JPEG structure marks where the image data ends (the FF D9 marker), so a program reading it will stop there.
In fact, people have hidden zip files inside jpg files by appending the zip data to the end of the jpg data. Because of the way these formats are structured, the resulting file is valid in either format.
You can... but the results may be unpredictable.
Even though there is enough information in the format to tell the client to ignore the extra data, it is likely not a case the programmer tested for.
A paranoid program might look at the size, notice the discrepancy and decide it won't process your file because clearly it doesn't fully understand it. This is particularly likely when reading data from the web when random bytes in a file could be considered a security risk.
You can embed your data in the XMP tag within a JPEG (or EXIF or IPTC fields for that matter).
XMP is XML, so you have a fair bit of flexibility there to do your own custom stuff.
It's probably not the simplest thing possible but putting your data here will maintain the integrity of the JPEG and require no "post processing".
Your data will then show up in other imaging software such as Photoshop, which may not be ideal.
As others have stated, you have no control over how programs process image files, and therefore some programs may find the images valid while others may not.
However, there is a bigger issue here. Judging by your question, I'm deducing you're practicing "security through obscurity." It's widely considered a very bad practice. Use Google to find a plethora of articles about the topic.
