I'm developing a WPF app in which I want to encrypt and compress a heavy DataSet or objects (< 2 MB) and send them across the network. The other party would decompress and decrypt the data and consume it. This process will run in both directions, from server (WCF) to client and from client to server.
I want an efficient compression class (I would like to stick to the .NET compression classes).
It should take little time to compress and decompress.
The compression ratio should be high, while data retrieval must be 100% lossless.
Can anyone advise me on the compression classes (DeflateStream/GZipStream)?
Thanks
VJ
Most people confuse "smaller compressed size" with "better network performance". Generally speaking, "better-than-deflate" compression algorithms can reduce transfer bandwidth, but they can increase total transfer time (compression + transfer + decompression). In that sense an LZ-class compressor seems best. The fastest implementations are QuickLZ and LZ4, and both have a C# version. Their APIs are not exactly like DeflateStream's (they are actually simpler to use). QuickLZ sees growing usage in network-related applications, while LZ4 was recently patched into the Apache Hadoop source trunk in place of Google's Snappy.
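To make that trade-off concrete, here is a rough sketch of the kind of measurement worth doing. It is in Python purely for brevity (zlib stands in for a deflate-class codec such as DeflateStream, lzma for a stronger compressor), and the payload is made up; the compressed size tells you the transfer part, the timings the CPU part:

# Illustration only: zlib ~ deflate-class, lzma ~ "better-than-deflate".
# The payload is invented; substitute a representative serialized DataSet.
import lzma
import time
import zlib

payload = b"<row id='42'>some repetitive dataset content</row>" * 40000  # ~2 MB

def measure(name, compress, decompress):
    t0 = time.perf_counter()
    packed = compress(payload)
    t1 = time.perf_counter()
    assert decompress(packed) == payload   # retrieval must be 100% lossless
    t2 = time.perf_counter()
    print(f"{name}: {len(packed)} bytes, "
          f"compress {t1 - t0:.3f}s, decompress {t2 - t1:.3f}s")

measure("deflate-class (zlib)", zlib.compress, zlib.decompress)
measure("LZMA", lzma.compress, lzma.decompress)

On repetitive data you will typically see LZMA produce a noticeably smaller output while spending several times more CPU, which is exactly the balance you have to weigh against your link speed.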
If you need more compression, you can grab the LZMA SDK, which includes managed LZMA compression/decompression methods. But I should warn you: LZMA's memory consumption is usually very high (depending on parameters), so spawning several LZMA-powered threads is probably not what you really want.
If you need even more compression, have a look at PPM or bitwise CM-class algorithms. PPM is very good on textual data and has moderate speed (usually 2-3 MiB/sec). CM, on the other hand, is very good on binary data. Their memory consumption can be high (depending on parameters) and they can be very slow (from 1 MiB/sec down to a few bytes/sec, depending on the algorithm). Unfortunately, you will probably only find PPM implementations for .NET on the internet; CM is really hard to find in .NET due to its high complexity. I have written an order-0 bitwise coder in .NET that can be extended to a proper CM with additional models if you really need it.
From Wikipedia, about ZPAQ compression:
ZPAQ has 5 compression levels from fast to best. At all but the best level, it uses the statistics of the order-1 prediction table used for deduplication to test whether the input appears random. If so, it is stored without compression as a speed optimization.
I've been working with Python's Data Compression and Archiving modules, and I wonder whether any of those implementations (zlib, bz2, lzma) do the same. Do any of them simply store the data as-is when it looks almost random? I'm not a coding expert and can't really follow the source code.
Related: How to efficiently predict if data is compressible
Some incomplete / best-guess remarks:
LZMA2 seems to do that, although for a different reason: compression ratio, not compression time.
This is indicated on Wikipedia:
LZMA2 is a simple container format that can include both uncompressed data and LZMA data, possibly with multiple different LZMA encoding parameters.
The XZ LZMA2 encoder processes the input in chunks (of up to 2 MB uncompressed size or 64 KB compressed size, whichever is lower), handing each chunk to the LZMA encoder, and then deciding whether to output an LZMA2 LZMA chunk including the encoded data, or to output an LZMA2 uncompressed chunk, depending on which is shorter (LZMA, like any other compressor, will necessarily expand rather than compress some kinds of data).
The latter quote also shows that there is no compression-speed gain to expect, as it's more or less a "do both and pick the best" approach.
(The article focuses on xz-based LZMA2; this probably carries over to whatever is inside Python, but no guarantees.)
The above, together with Python's docs:
Compression filters:
FILTER_LZMA1 (for use with FORMAT_ALONE)
FILTER_LZMA2 (for use with FORMAT_XZ and FORMAT_RAW)
makes me think you have everything you need and just need to use the right filter.
So check your reasoning again (time or compression ratio) and try the LZMA2 filter with custom-prepared mixed data if you don't want to trust it blindly.
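For example, a quick way to check it yourself (the data below is invented just to contrast compressible and near-incompressible input):

# FILTER_LZMA2 with FORMAT_XZ, as in the docs quoted above.
import lzma
import os

filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

text = b"the quick brown fox jumps over the lazy dog " * 20000   # compressible
noise = os.urandom(len(text))                                    # ~random

for label, data in (("text", text), ("random", noise)):
    out = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    print(f"{label}: {len(data)} -> {len(out)} bytes")

# Expectation: the text shrinks dramatically, while the random block stays at
# roughly its original size because LZMA2 falls back to uncompressed chunks.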
Intuition: I don't expect the more classic zlib/bz2 formats to exploit incompressible data like this (but that's a pure guess).
It just seems to me that when writing code for dynamic data visualization, I end up doing the same things over and over in different languages/platforms. Now, if I had a cross-platform language (which I do) and something like a binary version of SVG, I could make my code target that and use/create interpreters for whatever platform I currently need to use it on.
The reason I don't want SVG is that the plaintext format makes it too slow for my purposes. I could of course just create my own intermediate format, but if there is something already out there that is implemented by various tools, that's less work for me!
Depending on what you mean by “too slow”, the answer varies:
Filesize too large
Officially, the closest thing SVG has to a binary format is SVGZ, which is a gzipped SVG file with the .svgz extension. All conforming SVG viewers should be able to open it. Making one is simple on *nix systems:
gzip yourfile.svg && mv yourfile.svg.gz yourfile.svgz
You could also try Brotli compression, which tends to have smaller filesize at the cost of more compression time.
Including other assets is inefficient
SVG can only bundle bitmaps and other binary data through base64 encoding, which has a fair amount of overhead.
PDF can include “streams” of raw binary data, and is surprisingly efficient when programmatically generated.
Parsing the text data takes too long
This is tricky. PDF and its sibling, Encapsulated PostScript, are also old, well-supported vector graphics formats. Unfortunately, they too are text at their core, with optional compression.
You could try Computer Graphics Metafiles, which can be compiled ahead of time. But I’m unsure how well-supported they are across consumer devices.
From a comment:
Almost nothing about the performance of SVG other than the transmission cost of sending it over a network is down to it being plaintext
No, that's completely wrong. I worked at CSIRO using XML for massive 3D models. GeoScience Australia did a formal study into the parsing speed - parsing floating-point numbers from text is relatively expensive for big data sets, compared to reading a 4- or 8-byte binary representation.
I've spent a lot of time optimising my internal binary formats for Touchgram and am now looking at vector art.
One of the techniques you can use is a combination of
variable-length integer coding and
normalising your points to a scale represented by integers, then storing paths as sequences of deltas.
That can yield paths where often only 1 or 2 bytes are used per step, as opposed to the typical 12.
Consider a basic line:
<polyline class="Connect" points="100,200 100,100" />
I could represent that with 4 bytes instead of 53.
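Here is a minimal sketch of that delta + variable-length-integer idea (the zigzag/varint helpers and the 32-bit assumption are mine, not from any SVG spec; coordinates are assumed to already be normalised to integers):

def zigzag(n):
    # Map signed deltas to unsigned so small magnitudes stay small (assumes 32-bit range).
    return (n << 1) ^ (n >> 31)

def varint(n):
    # LEB128-style encoding: 7 bits per byte, high bit = "more bytes follow".
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_polyline(points):
    # First point is a delta from (0, 0); every later point is a delta from its predecessor.
    data = bytearray()
    px, py = 0, 0
    for x, y in points:
        data += varint(zigzag(x - px)) + varint(zigzag(y - py))
        px, py = x, y
    return bytes(data)

# points="100,200 100,100" from the example above
print(len(encode_polyline([(100, 200), (100, 100)])))  # 7 bytes total; the second point costs 3

That is not the exact 4-byte layout referred to above, but it shows how typical steps collapse to one or two bytes each.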
So far, all I've been able to find in the way of binary SVG is this post about a Go project, linking to the project description and repo.
Adobe Flash SWF files may work. Due to its former ubiquity, 'players' and libraries were written for many platforms. The specification was open and the license permissive. For simple 2D graphics, the earlier, more compatible versions would do fine.
The files are binary and extraordinarily small.
The DDS format was made for DirectX, right? So it should be optimized for DirectX and not for OpenGL, I guess.
So, is there another format (or formats)? If yes, which format is a good choice, and for what reason(s)?
Also, since I'm working on Linux, I'm also concerned with creating textures on Linux, so I need a format that can be imported/exported by GIMP.
The DDS format is useful for storing compressed textures. If the file stores the texture with the same compression that will be used in GPU memory, you don't need to decode and re-encode it for GPU storage; you can just move it directly to GPU memory.
The DDS format is basically used to store S3 Texture Compression data. The internal DDS formats DXT3 and DXT5, for example, are S3TC formats that are also supported by OpenGL:
http://www.opengl.org/wiki/S3_Texture_Compression
DDS can also store pre-calculated mipmaps. Again, this is a trade-off (larger file size) to reduce loading time, since the mipmaps could also be calculated at load time.
As you can see, if you have the right code to parse the DDS file, so that the payload is taken in its compressed form and not decoded on the host machine, then it is perfectly fine to use DDS.
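As a sketch of what "the right code" means here (offsets follow the standard DDS_HEADER layout; the optional DX10 extension header and error handling are ignored, and the function name is mine), the still-compressed payload can be pulled out and handed to something like glCompressedTexImage2D:

import struct

def read_dds(path):
    with open(path, "rb") as f:
        data = f.read()
    if data[:4] != b"DDS ":
        raise ValueError("not a DDS file")
    height, width = struct.unpack_from("<II", data, 12)   # dwHeight, dwWidth
    mipmap_count = struct.unpack_from("<I", data, 28)[0]  # dwMipMapCount
    fourcc = data[84:88]          # e.g. b"DXT1", b"DXT3", b"DXT5"
    payload = data[128:]          # still S3TC-compressed: upload it as-is
    return width, height, mipmap_count, fourcc, payload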
As an alternative, @CoffeeandCode pointed out the KTX format in his answer. These files use a different compression algorithm (see here). The advantage is that this compression is mandatory in newer OpenGL versions, while S3TC compression was only ever available as an extension (and has patent problems). I don't know how they compare in quality, or whether you can expect OpenGL 4.3 on your target platforms.
My Take: If you are targeting recent hardware and OpenGL support (OpenGL ES 3 or OpenGL 4.3), you should use the KTX format and respective texture formats (libktx will generate the texture objects for you). If you need to be able to run on older hardware or happen to already have a lot of DDS data, you should probably stick with DDS for the time being.
There is nothing particularly D3D-"optimized" about DDS.
Once you read the header correctly, the (optionally) pre-compressed data is binary-compatible with S3TC. Both DDS and OpenGL's equivalent (KTX) can store cubemap arrays and mipmaps in a single image file; that is their primary appeal.
With other formats, you may wind up using the driver to compress and/or generate mipmaps, and the driver usually does a poor job quality-wise. Drivers are designed to do this quickly because they have a lot of textures to compress and mipmap. Offline, you can use a higher-quality mipmap downsample filter and compression implementation, since the time required to complete is largely unimportant.
The key benefits of DDS / KTX are:
1. Offline computation of mipmaps
2. Offline compression
3. Storage of all layers of a texture array/cubemap
Doing (1) and (2) offline can both improve image quality and reduce the overhead of loading textures at run-time. (3) is mostly for convenience, but a welcome one.
I think KTX is the closest OpenGL equivalent to DirectX's DDS, but even DDS works fine under OpenGL once parsed.
There's not much on this subject, perhaps because it isn't a good idea in the first place.
I want to create a realtime audio synthesis/processing engine that runs on the GPU. The reason for this is because I will also be using a physics library that runs on the GPU, and the audio output will be determined by the physics state. Is it true that GPU only carries audio output and can't generate it? Would this mean a large increase in latency, if I were to read the data back on the CPU and output it to the soundcard? I'm looking for a latency between 10 and 20ms in terms of the time between synthesis and playback.
Would the GPU accelerate synthesis by any worthwhile amount? I'm going to have a large number of synthesizers running at once, each of which I imagine could take up their own parallel process. AMD is coming out with GPU audio, so there must be something to this.
For what it's worth, I'm not sure this idea lacks merit. If DarkZero's observation about transfer times is correct, it doesn't sound like there would be much overhead in getting audio onto the GPU for processing, even from many different input channels. And while there are probably audio operations that are not very amenable to parallelization, many are very, very parallelizable.
It's obvious, for example, that computing 128 output samples from a sine source could be done completely in parallel. Working in blocks of that size would permit a latency of only about 3 ms (at 44.1 kHz), which is acceptable in most digital audio applications. Similarly, many other fundamental oscillators could be effectively parallelized. Amplitude modulation of such oscillators would be trivial. Efficient frequency modulation would be more challenging, but I would guess it is still possible.
In addition to oscillators, FIR filters are simple to parallelize, and a Google search turned up some promising-looking research papers (which I didn't take the trouble to read) suggesting that there are reasonable parallel approaches to IIR filter implementation as well. These two types of filters are fundamental to audio processing, and many useful audio operations can be understood as such filters.
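For example, here is a minimal block-based FIR sketch (NumPy, with a made-up 8-tap filter). Each output sample is an independent dot product over a window of the input, which is exactly the kind of work that maps well onto SIMD units or a GPU:

import numpy as np

def fir_block(block, taps, history):
    # `history` carries the last len(taps)-1 samples of the previous block so
    # consecutive blocks join without clicks. Within a block, every output
    # sample is an independent dot product and can be computed in parallel.
    padded = np.concatenate([history, block])
    out = np.convolve(padded, taps, mode="valid")
    return out, padded[-(len(taps) - 1):]

taps = np.ones(8) / 8                  # simple moving-average low-pass
history = np.zeros(len(taps) - 1)
block = np.random.randn(128)           # one 128-sample block (~3 ms at 44.1 kHz)
filtered, history = fir_block(block, taps, history)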
Wave-shaping is another task in digital audio that is embarrassingly parallel.
Even if you couldn't take an arbitrary software synth and map it effectively to the GPU, it is easy to imagine a software synthesizer constructed specifically to take advantage of the GPU's strengths, and avoid its weaknesses. A synthesizer relying exclusively on the components I have mentioned could still produce a fantastic range of sounds.
While marko is correct to point out that existing SIMD instructions can do some parallelization on the CPU, the number of inputs they can operate on at the same time pales in comparison to a good GPU.
In short, I hope you work on this and let us know what kind of results you see!
DSP operations on modern CPUs with vector processing units (SSE on x86/x64 or NEON on ARM) are already pretty cheap if exploited properly. This is particularly the case for filters, convolution, FFT and so on, which are fundamentally stream-based operations. These are the types of operations where a GPU might also excel.
As it turns out, soft synthesisers have quite a few operations in them that are not stream-like; furthermore, the tendency is to process increasingly small chunks of audio at once to target low latency. These are a really bad fit for the capabilities of a GPU.
The effort involved in using a GPU - particularly getting data in and out - is likely to far exceed any benefit you get. Furthermore, the capabilities of inexpensive personal computers - and also tablets and mobile devices - are more than enough for many digital audio applications. AMD seems to have a solution looking for a problem. For sure, the existing music and digital audio software industry is not about to start producing software that only targets a limited subset of hardware.
Typical transfer times for a few MB to/from the GPU are on the order of 50 µs.
Delay is not your problem; however, parallelizing an audio synthesizer on the GPU may be quite difficult. If you don't do it properly, the processing may take more time than the data copy.
If you are going to run multiple synthesizers at once, I would recommend running each synthesizer in a work-group and parallelizing the synthesis process across the available work-items. It will not be worthwhile to put each synthesizer in a single work-item, since it is unlikely you will have thousands of them.
http://arxiv.org/ftp/arxiv/papers/1211/1211.2038.pdf
You might be better off using OpenMP for its lower initialization times.
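To make the one-synthesizer-per-work-group suggestion above concrete, here is a hypothetical PyOpenCL sketch (all names, the block size and the frequencies are invented; it renders a single 128-sample sine block per synthesizer and copies the result back):

import numpy as np
import pyopencl as cl

KERNEL_SRC = """
__kernel void synth_block(__global float *out,
                          __global const float *freqs,
                          const float sample_rate)
{
    int synth  = get_group_id(0);   // one work-group per synthesizer
    int sample = get_local_id(0);   // one work-item per sample in the block
    float t = (float)sample / sample_rate;
    out[get_global_id(0)] = sin(6.2831853f * freqs[synth] * t);
}
"""

n_synths, block = 64, 128
freqs = (220.0 * np.arange(1, n_synths + 1)).astype(np.float32)
out = np.empty(n_synths * block, dtype=np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
prg = cl.Program(ctx, KERNEL_SRC).build()

mf = cl.mem_flags
freqs_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=freqs)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, out.nbytes)

# Global size = synths * block, local size = block: one group per synthesizer.
prg.synth_block(queue, (n_synths * block,), (block,),
                out_buf, freqs_buf, np.float32(44100.0))
cl.enqueue_copy(queue, out, out_buf)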
You could check out the NESS project, which is all about physical modelling synthesis. They use GPUs for audio rendering because the process involves simulating an acoustic 3D space for a given sound and calculating what happens to that sound within that virtual 3D space (and apparently GPUs are good at working with this sort of data). Note that this is not real-time synthesis, because it is so demanding of processing.
[Note: This is a rewrite of an earlier question that was considered inappropriate and closed.]
I need to do some pixel-level analysis of television (TV) video. The exact nature of this analysis is not pertinent, but it basically involves looking at every pixel of every frame of TV video, starting from an MPEG-2 transport stream. The host platform will be server-class, multiprocessor 64-bit Linux machines.
I need a library that can handle the decoding of the transport stream and present me with the image data in real-time. OpenCV and ffmpeg are two libraries that I am considering for this work. OpenCV is appealing because I have heard it has easy to use APIs and rich image analysis support, but I have no experience using it. I have used ffmpeg in the past for extracting video frame data from files for analysis, but it lacks image analysis support (though Intel's IPP can supplement).
In addition to general recommendations for approaches to this problem (excluding the actual image analysis), I have some more specific questions that would help me get started:
1. Are ffmpeg or OpenCV commonly used in industry as a foundation for real-time video analysis, or is there something else I should be looking at?
2. Can OpenCV decode video frames in real time and still leave enough CPU left over to do nontrivial image analysis, also in real time?
3. Is it sufficient to use ffmpeg for MPEG-2 transport stream decoding, or is it preferable to use an MPEG-2 decoding library directly (and if so, which one)?
4. Are there particular pixel formats for the output frames that ffmpeg or OpenCV is particularly efficient at producing (RGB, YUV, YUV422, etc.)?
1.
I would definitely recommend OpenCV for "real-time" image analysis. I assume by real-time you are referring to the ability to keep up with TV frame rates (e.g., NTSC (29.97 fps) or PAL (25 fps)). Of course, as mentioned in the comments, it certainly depends on the hardware you have available, as well as on the image size: SD (480p) vs. HD (720p or 1080p). FFmpeg certainly has its quirks, but you would be hard-pressed to find a better free alternative. Its power and flexibility are quite impressive; I'm sure that is one of the reasons the OpenCV developers decided to use it as the back-end for video decoding/encoding in OpenCV.
2.
I have not seen issues with high latency while using OpenCV for decoding. How much latency can your system tolerate? If you need to increase performance, consider using separate threads for capture/decoding and for image analysis. Since you mentioned having multi-processor systems, this should take greater advantage of your processing capabilities. I would definitely recommend using the latest Intel Core i7 (or possibly the Xeon equivalent) architecture, as this will give you the best performance available today.
I have used OpenCV on several embedded systems, so I'm quite familiar with your desire for peak performance. I have found many times that it was unnecessary to process a full-frame image (especially when trying to determine masks). I would highly recommend down-sampling the images if you are having difficulty processing your acquired video streams; this can sometimes instantly give you a 4-8x speedup (depending on your down-sample factor). On the performance front, I would also definitely recommend using Intel's IPP. Since OpenCV was originally an Intel project, IPP and OpenCV blend very well together.
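As an illustration of the capture-then-down-sample loop described above (the file name and scale factors are placeholders, and the analysis step is just a stub):

import cv2

cap = cv2.VideoCapture("input.ts")   # OpenCV decodes via its FFmpeg back-end
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Down-sample by 2x in each dimension: roughly 4x fewer pixels to analyse.
    small = cv2.resize(frame, None, fx=0.5, fy=0.5, interpolation=cv2.INTER_AREA)
    gray = cv2.cvtColor(small, cv2.COLOR_BGR2GRAY)
    # ... run the actual image analysis on `gray` or `small` here ...
cap.release()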
Finally, because image processing is one of those "embarrassingly parallel" problem fields, don't forget about the possibility of using GPUs as hardware accelerators for your problems. OpenCV has been doing a lot of work in this area lately, so you should have those tools available to you if needed.
3.
I think FFmpeg would be a good starting point; most of the alternatives I can think of (Handbrake, mencoder, etc.) tend to use ffmpeg as a backend, but it looks like you could probably roll your own with IPP's Video Coding library if you wanted to.
4.
OpenCV's internal representation of colors is BGR unless you use something like cvtColor to convert it. If you would like to see a list of the pixel formats that are supported by FFmpeg, you can run
ffmpeg -pix_fmts
to see what it can input and output.
For the 4th question only:
Video streams are encoded in a 4:2:2-type YUV format (YUV, YUV422, YCbCr, etc.). Converting them to BGR and back (for re-encoding) eats up lots of time, so if you can write your algorithms to run on YUV you'll get an instant performance boost.
Note 1: While OpenCV natively supports BGR images, you can make it process YUV, with some care and knowledge about its internals.
For example, if you want to detect people in the video, just take the Y plane of the decoded video buffer (it is effectively the grayscale representation of the image) and process it.
Note 2: If you want to access the YUV image, you must use the ffmpeg API directly in your app; OpenCV forces the conversion from YUV to BGR in its VideoCapture API.
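As a rough sketch of that approach (resolution and input name are placeholders), you can let the ffmpeg command-line tool decode to raw YUV420p on a pipe and read the Y plane directly, skipping OpenCV's forced BGR conversion:

import subprocess
import numpy as np

W, H = 720, 576
frame_size = W * H * 3 // 2   # YUV420p: full-size Y plane + quarter-size U and V

proc = subprocess.Popen(
    ["ffmpeg", "-i", "input.ts", "-f", "rawvideo", "-pix_fmt", "yuv420p", "-"],
    stdout=subprocess.PIPE)

while True:
    raw = proc.stdout.read(frame_size)
    if len(raw) < frame_size:
        break
    y_plane = np.frombuffer(raw, np.uint8, count=W * H).reshape(H, W)
    # y_plane is effectively a grayscale frame: run the analysis on it directly.
proc.stdout.close()
proc.wait()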