visualization of compressed (deflated, gzipped) content structures - zip

I have some ideas I would like to experiment with relating to data compression, but am finding it difficult to decipher some parts of how the standard are applied "in real life". I would like to look at some sample compressed files to observe how the the blocks are arranged and the huffman tree(s) are structured.
Are there any tools in existence which can help visualize this for a given compressed file (zip/gzip/deflate etc)? I'm picturing something like a tree view or some form of graph visualizer.

You might be interested in this (if you are still interested that is :-P)
http://jvns.ca/blog/2013/10/24/day-16-gzip-plus-poetry-equals-awesome/

I made a "entropy image" tool.
The entropy_image tool replaces each pixel
with the (estimated) number of bits
necessary to encode that pixel using range coding or Huffman compression.
I hope this isn't the only compression visualization tool in the world.

Related

Do any of the Python compression module algorithms simply store the data for speed optimisation?

From Wikipedia, about ZPAQ Compression-
ZPAQ has 5 compression levels from fast to best. At all but the best level, it uses the statistics of the order-1 prediction table used for deduplication to test whether the input appears random. If so, it is stored without compression as a speed optimization.
I've been working with the Python Data Compression and Archiving module, and wonder if any of those implementations (ZLIB, BZ2, LZMA) do the same? Do any of them simply store the data 'as-is' when it looks almost random? I'm not a coding expert and can't really follow the source code.
Related: How to efficiently predict if data is compressible
Some incomplete / best-guess remarks:
LZMA2 seems to do that, although for different reasons: compression-ratio; not for improving compression-time.
This is indicated at wiki:
LZMA2 is a simple container format that can include both uncompressed data and LZMA data, possibly with multiple different LZMA encoding parameters.
The XZ LZMA2 encoder processes the input in chunks (of up to 2 MB uncompressed size or 64 KB compressed size, whichever is lower), handing each chunk to the LZMA encoder, and then deciding whether to output an LZMA2 LZMA chunk including the encoded data, or to output an LZMA2 uncompressed chunk, depending on which is shorter (LZMA, like any other compressor, will necessarily expand rather than compress some kinds of data).
The latter quote also shows that there is no expected compression-speed gain as it's more or less a: do both and pick best approach.
(The article seems to focus on xz-based lzma2; probably transfers to whatever is within python; but no guarantees)
Above, together with python's docs:
Compression filters:
FILTER_LZMA1 (for use with FORMAT_ALONE)
FILTER_LZMA2 (for use with FORMAT_XZ and FORMAT_RAW)
would make me think you got everything you need and just need to use the right filter.
So check your reasoning again (time- or compression-ratio) and try the lzma2-filter with custom-prepared mixed data (if you don't want to trust blindly).
Intuition i don't expect the more classic zlib/bz2 formats to exploit uncompressable data (but it's a pure guess).

What is the main different between JPEG-like compression standards and EZW?

Could anyone please explain briefly the difference between JPEG and EZW? And why JPEG is more popular? Is JPEG always better than EZW or just in most of the cases?
Thank you very much!
EZW is a theoretical technique which can be used as one step in wavelet compression. It's not a complete image encoder, and can't be used on its own. As best I can tell, nobody has proposed any image formats which depend on EZW, so nothing uses it (to the best of my knowledge).
(As an aside, wavelet image compression techniques have generally proven unsuccessful.)
JPEG, by contrast, is a standard which encompasses all layers of an image compressor, including the DCT as well as everything surrounding it: color space, entropy coding, file format, metadata, etc. Unlike EZW, it's been a complete, usable standard since 1992.

Is there a binary kind of SVG?

It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. Now if I had a cross platform language(which I do) and something like a binary version of SVG, I could make my code target that and use/create interpreters for whatever platform I currently need to use it on.
The reason I don't want SVG is because the plaintext part makes it too slow for my purposes. I could of course just create my own intermediary format but if there is something already out there that's implemented by various things then the less work for me!
Depending on what you mean by “too slow”, the answer varies:
Filesize too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
You could also try Brotli compression, which tends to have smaller filesize at the cost of more compression time.
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which has a fair amount of overhead.
PDF can include “streams” of raw binary data, and is surprisingly efficient when programmatically generated.
Parsing the text data takes too long
This is tricky. PDF and its brother, Encapsulated PostScript, are also old, well-supported vector graphic formats. Unfortunately, they too are also text at their core, with optional compression.
You could try Computer Graphics Metafiles, which can be compiled ahead of time. But I’m unsure how well-supported they are across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. GeoScience Australia did a formal study into the parsing speed - parsing floating point numbers from text is relatively expensive for big data sets, compared to reading a 4 or 8 byte binary representation.
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of
variable-length integer coding and
normalising your points to a scale represented by integers, then storing paths as sequences of deltas
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
So far, all I've been able to find in binary SVG is this post about a Go project linking to the project description and repo
Adobe Flash SWF files may work. Due to its previous ubiquity, 'players' and libraries were written for many platforms. The specifications were open and license permitting. For simple 2D graphics, earlier, more compatible versions would do fine.
The files are binary and extraordinarily small.

How to extract metadata and structure from Fireworks PNG files for conversion?

Since Adobe has declared Fireworks dead, it is apparent that many of us are potentially locked into Fireworks in terms of the proprietary layered PNG format that they have used. I have been hunting for options to extract / convert my documents over to either PSD's or some other layered SVG structure (perhaps similar to what Sketch does on OS X).
Anyone have any ideas for extracting the contents of Fireworks' PNGs programmatically? So far I haven't seen any libraries to make this feasible and really no discussion of how the metadata is structured in a Fireworks PNG in a format that would help to reverse engineer this format for proper conversion and extraction.

How can I create a 3D model file from geometric shapes?

I am writing a program that will output 3D model files based on simple geometric shapes (e. g. rectangular prisms & cylinders) with known coordinates in 3-dimensional space. As an example, imagine creating a 3D model of stonehenge. this question suggests that OBJ files are the easiest to generate, but I'm struggling to find a good tutorial or easy-to-use library for doing so.
Can anyone either
(1) describe step-by-step how to create a simple file OR
(2) point me to a tutorial that describes how to do so
Notes:
* Using a GUI-based program to draw such files is not an option for me
* I have no prior experience with 3D modeling
* Other formats such as WRL or DAE would work for me as well
EDIT:
I do not need to use textures, just combinations of simple geometric shapes positioned in 3D space.
I strongly recommend to use some ASCII exchange format there are many out there I usually use these:
*.x DirectX object (it is a C++ source code)
this one is easiest to implement !!! But there are not many tools that can handle them. If you do not want to spend too much time coding then this is the right choice. Just copy the templates (at the start) from any *.x file to get started.
here some specs
*.iges common and importable on most CAD/CAM platform (Catia included)
this one is a bit complicated but for export purposes it is not that bad. It supports Volume operation like +,-,&,^ which are VERY HARD to implement properly but you do not have to use them :)
*.dxf AutoCAD exchange format
this one is even more complicated then IGES. I do not recommend to use it
*.ac AC3D
I first saw this one in flight gear.
here some specs
at first look it is quite easy but the sub-object implementation is really tricky. Unless you use it you should be fine.
This approach is easily verifiable in note pad or by loading to some 3D model viewer. Chose one that is most suitable for your needs and code save/load function to your Apps internal model class/struct. This way you will be compatible with other software and eliminate incompatibility problems which are native to creating 'almost known' binary formats like 3ds,...
In your case I would use IGES (Initial Graphics Exchange Specification)
For export you do not need to implement all just few basic shapes so it would not be too difficult. I code importers which are much much more complicated. Mine IGES loader class is about 30KB of C++ source code look here for more info
You did not provide any info about your 3D mesh model structure and capabilities
like what primitives you use, are your object simple or in skeleton hierarchy, are you using textures, and more ... so it is impossible to answer
Anyway export often looks like this:
create header and structure of target file format
if the format has any directory structure fill it and write it (IGES)
for sub-objects do not forget to add transformation matrices ...
write the chunks you need (points list, faces list, normals, ...)
With ASCII formats you can do this inside String variable so you can easily insert into or modify. Do all thing in memory and write the whole thing to file at the end which is fast and also add capability to work with memory instead of files. This is handy if you want to pack many files to single package file like *.pak or send/receive files through IPC or LAN ...
[Edit1] more about IGES
fileformat specs
I learned IGES from this pdf ... Have no clue where from I got it but this was first valid link I found in google today. I am sure there is some non registration link out there too. It is about 13.7 MB and original name IGES5-3_forDownload.pdf.
win32 viewer
this is free IGES viewer. I do not like the interface and handling but it works. It is necessary to have functional viewer for testing yours ...
examples
here are many tutorial files for many entities there are 3 sub-links (igs,peek,gif) where you can see example file in more ways for better understanding.
exporting to IGES
you did not provide any info about your 3D mesh internal structure so I can not help with export. There are many ways to export the same way so pick one that is closest to your App 3D mesh representation. For example you can use:
point cloud
rotation surfaces
rectangle (QUAD) surfaces
border lines representation (non solid)
trim surface and many more ...

Resources