It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. Now if I had a cross platform language(which I do) and something like a binary version of SVG, I could make my code target that and use/create interpreters for whatever platform I currently need to use it on.
The reason I don't want SVG is because the plaintext part makes it too slow for my purposes. I could of course just create my own intermediary format but if there is something already out there that's implemented by various things then the less work for me!
Depending on what you mean by “too slow”, the answer varies:
Filesize too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
You could also try Brotli compression, which tends to have smaller filesize at the cost of more compression time.
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which has a fair amount of overhead.
PDF can include “streams” of raw binary data, and is surprisingly efficient when programmatically generated.
Parsing the text data takes too long
This is tricky. PDF and its brother, Encapsulated PostScript, are also old, well-supported vector graphic formats. Unfortunately, they too are also text at their core, with optional compression.
You could try Computer Graphics Metafiles, which can be compiled ahead of time. But I’m unsure how well-supported they are across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. GeoScience Australia did a formal study into the parsing speed - parsing floating point numbers from text is relatively expensive for big data sets, compared to reading a 4 or 8 byte binary representation.
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of
variable-length integer coding and
normalising your points to a scale represented by integers, then storing paths as sequences of deltas
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
So far, all I've been able to find in binary SVG is this post about a Go project linking to the project description and repo
Adobe Flash SWF files may work. Due to its previous ubiquity, 'players' and libraries were written for many platforms. The specifications were open and license permitting. For simple 2D graphics, earlier, more compatible versions would do fine.
The files are binary and extraordinarily small.
Related
From Wikipedia, about ZPAQ Compression-
ZPAQ has 5 compression levels from fast to best. At all but the best level, it uses the statistics of the order-1 prediction table used for deduplication to test whether the input appears random. If so, it is stored without compression as a speed optimization.
I've been working with the Python Data Compression and Archiving module, and wonder if any of those implementations (ZLIB, BZ2, LZMA) do the same? Do any of them simply store the data 'as-is' when it looks almost random? I'm not a coding expert and can't really follow the source code.
Related: How to efficiently predict if data is compressible
Some incomplete / best-guess remarks:
LZMA2 seems to do that, although for different reasons: compression-ratio; not for improving compression-time.
This is indicated at wiki:
LZMA2 is a simple container format that can include both uncompressed data and LZMA data, possibly with multiple different LZMA encoding parameters.
The XZ LZMA2 encoder processes the input in chunks (of up to 2 MB uncompressed size or 64 KB compressed size, whichever is lower), handing each chunk to the LZMA encoder, and then deciding whether to output an LZMA2 LZMA chunk including the encoded data, or to output an LZMA2 uncompressed chunk, depending on which is shorter (LZMA, like any other compressor, will necessarily expand rather than compress some kinds of data).
The latter quote also shows that there is no expected compression-speed gain as it's more or less a: do both and pick best approach.
(The article seems to focus on xz-based lzma2; probably transfers to whatever is within python; but no guarantees)
Above, together with python's docs:
Compression filters:
FILTER_LZMA1 (for use with FORMAT_ALONE)
FILTER_LZMA2 (for use with FORMAT_XZ and FORMAT_RAW)
would make me think you got everything you need and just need to use the right filter.
So check your reasoning again (time- or compression-ratio) and try the lzma2-filter with custom-prepared mixed data (if you don't want to trust blindly).
Intuition i don't expect the more classic zlib/bz2 formats to exploit uncompressable data (but it's a pure guess).
Reasoning
I'm bundling small minified SVGs (icons) with my CSS via LESS's data-uri method, to reduce HTTP requests similar to the purpose of icon fonts such as Octicons or Ye Olde CSS Sprites.
However, LESS encodes them in Base64.
This is sub-optimal in the case of SVG, which can be Data URI'd in UTF-8 (example).
There's three reasons why this is sub-optimal:
1: Base64 is silly for text
The purpose of Base64 is to encode binary data using only 6 bits per byte, making it safe to embed in text files. This is great for PNGs and JPEGs, but it makes any text file 33% larger for no reason. If you're now thinking "well gzip takes care of that", then keep in mind that...
2: Encoding text in Base64 makes gzip much less effective
To understand why this is the case, consider this:
btoa('width') === 'd2lkdGg='
btoa(' width') === 'IHdpZHRo'
btoa(' width') === 'ICB3aWR0aA=='
As a practical example, let's take an actual SVG and experiment with it:
$ wc -c *
68630 tiger.svg
25699 tiger.svg.gz
91534 tiger.txt
34633 tiger.txt.gz
Even after gzipping, it's still ~35% larger.
3: It disregards some free sources of redundancy
Think about the width example above. Every SVG will have this substring, and if you embed SVGs in a CSS, you'll probably have this keyword somewhere else (or others), which gzip could benefit from (because this is how Huffman Coding works), but not if it's hidden by Base64.
The Question
How can I embed SVGs in LESS as data: URIs using UTF-8 instead of Base64?
I can imagine a thousand ways to do this involving build tools like Grunt, but it breaks my workflow because I won't be able to do things like style: include:less all.less from a Jade view (I do this in development), or even just #import 'images.less'; from a less file.
I'm an idiot. This is simple:
data-uri('image/svg+xml;charset=UTF-8', 'path/to.svg')
I had to read LESS's source to figure this one out.
All the benefits I mention above are gained here, in particular that if you have tons of small SVGs, they will benefit from the redundancy between each other. And it works in all browsers.
DDS format has been made for directX right ? so it's should be optimized for it and not for openGL I guess.
So, there is another format(s) ? if yes, what format is a good choice ? what reason(s) ?
also, since I'm working on linux, I'm also concerned by making textures on linux. So I need a format who can be imported/exported by gimp.
The DDS format is useful for storing compressed textures. If you store the file in the same compression as it will be stored in the GPU memory, you don't need to decode and re-encode for GPU storage, instead you can just move it directly to memory.
The DDS format is basically used to store S3 Texture Compression data. The internal DDS formats DTX3 and DTX5 are for example S3TC formats that are also supported by OpenGL:
http://www.opengl.org/wiki/S3_Texture_Compression
DDS also can store pre-calculated mipmaps. Again this is a trade-off (larger file size) for reducing loading times, as the mipmaps could also be calculated at loading time.
As you can see, if you have the right code to parse the DDS file, e.g. the payload will be taken in its compressed form and not decoded on the host machine, then it is perfectly fine to use a DDS.
For an alternative, #CoffeeandCode pointed out the KTX format in his answer. These files use a different compression algorithm (see here). The advantage is that this compression is mandatory in newer OpenGL versions, while S3TC compression was always only available as an extension (and has patent problems). I don't know how they compare in quality and if you can expect OpenGL 4.3 on your target platforms.
My Take: If you are targeting recent hardware and OpenGL support (OpenGL ES 3 or OpenGL 4.3), you should use the KTX format and respective texture formats (libktx will generate the texture objects for you). If you need to be able to run on older hardware or happen to already have a lot of DDS data, you should probably stick with DDS for the time being.
There is nothing particularly D3D-"optimized" about DDS.
Once you read the header correctly, the (optionally) pre-compressed data is binary compatible with S3TC. Both DDS and OpenGL's equivalent (KTX) are capable of storing cubemap arrays and mipmaps in a single image file, that is their primary appeal.
With other formats, you may wind up using the driver to compress and/or generate mipmaps, and the driver usually does a poor job quality wise. The drivers are usually designed to do this quickly because they have a lot of textures that need to be compressed / mipmapped. You can use a higher quality mipmap downsample filter / compression implementation offline since the amount of time required to complete is rather unimportant.
The key benefits of DDS / KTX are:
Offline computation of mipmaps
Offline compression
Store all layers of a texture array/cubemap
Doing (1) and (2) offline can both improve image quality and reduce the overhead of loading textures at run-time. (3) is mostly for convenience, but a welcomed one.
I think the closest equivalent to DDS for DirectX is KTX, but even DDS works fine under OpenGL once parsed.
as the title, I used to use .dds, it did work,now I use type of .png, can it generate mipmap? Here functions what I am using: glTexImage2D(…).Or maybe gluBuild2DMipmaps(…) a better choice?
DDS are an image format that contains precalculated mipmaps. As far as quality goes, precalculated mipmaps offer the best quality, since they can be downsampled offline with advanced filter kernels like Lancozs, without having to care about runtime efficiency.
PNG does not contain additional mipmap levels so you have to compute the mipmaps at runtime. You should however not use gluBuild2DMipmaps for this. For one this function is known to exhibit buggy behavior in certain conditions and furthermore it will unconditionally resample all images to power-of-2 dimensions, although since OpenGL-2 non power-of-2 dimensions are perfectly fine for texture images.
Instead you should load the base level image with glTexImage2D(…) and use glGenerateMipmap(…) (available since OpenGL-3) to build the mipmap image pyramid from there. If you don't use OpenGL-3, you can use the SGIS_generate_mipmap extension, if available.
However be advised that online mipmap generation may yield not as good results as offline generation.
Another possible approach would be the use of JPEG2000 images; the nature of JPEG2000 image encoding results in an image pyramid being readily available. OTOH JPEG2000 is very costly to encode and decode.
I have some ideas I would like to experiment with relating to data compression, but am finding it difficult to decipher some parts of how the standard are applied "in real life". I would like to look at some sample compressed files to observe how the the blocks are arranged and the huffman tree(s) are structured.
Are there any tools in existence which can help visualize this for a given compressed file (zip/gzip/deflate etc)? I'm picturing something like a tree view or some form of graph visualizer.
You might be interested in this (if you are still interested that is :-P)
http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equals-awesome/
I made a "entropy image" tool.
The entropy_image tool replaces each pixel
with the (estimated) number of bits
necessary to encode that pixel using range coding or Huffman compression.
I hope this isn't the only compression visualization tool in the world.