Comparative performance of Hadoop Zlib and JDK Gzip

Comparative performance of Hadoop Zlib and JDK Gzip - multithreading

I am doing some benchmarking of single-threaded compression codecs, and the throughput I see for Zlib seems significantly higher than what you would expect from a single thread. I have used org.apache.hadoop.io.compress.zlib.ZlibCompressor for the Zlib compressor implementation, and java.util.zip.Deflater for the Gzip implementation to compare with.
Is the Zlib compressor wrapper provided in Hadoop multi-threaded in some way, perhaps through its JNI interface?
Zlib:
import org.apache.hadoop.io.compress.zlib.*;
protected final ZlibCompressor zlibCompressor = new ZlibCompressor(ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION, ZlibCompressor.CompressionStrategy.DEFAULT_STRATEGY, ZlibCompressor.CompressionHeader.DEFAULT_HEADER, DEFAULT_BUFFER_SIZE);
protected final ZlibDecompressor zlibDecompressor = new ZlibDecompressor(ZlibDecompressor.CompressionHeader.DEFAULT_HEADER, DEFAULT_BUFFER_SIZE);
//compress
zlibCompressor.setInput(uncompressed, 0, uncompressed.length);
zlibCompressor.finish();
int n = zlibCompressor.compress(compressBuffer, 0, compressBuffer.length);
//decompress
zlibDecompressor.reset();
zlibDecompressor.setInput(compressed, 0, compressed.length);
int n = zlibDecompressor.decompress(uncompressBuffer, 0, uncompressBuffer.length);
Gzip:
import java.util.zip.*;
protected final Deflater deflater = new Deflater(COMPRESSION_LEVEL, NO_WRAP);
protected final Inflater inflater = new Inflater(NO_WRAP);
//compress
int n = compressBlockUsingStream(uncompressed, compressBuffer);
//decompress
inflater.reset();
int n = uncompressBlockUsingStream(new InflaterInputStream(new ByteArrayInputStream(compressed), inflater), uncompressBuffer);
Helper functions for Gzip:
protected int compressBlockUsingStream(byte[] uncompressed, byte[] compressBuffer) throws IOException
{
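// ByteArrayOutputStream here is presumably a custom class that writes into the caller's buffer:
// java.io.ByteArrayOutputStream has no byte[] constructor and no length() method.
ByteArrayOutputStream out = new ByteArrayOutputStream(compressBuffer);
// compressToStream(...) is not shown in the question.
compressToStream(uncompressed, out);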
return out.length();
}
protected int uncompressBlockUsingStream(InputStream in, byte[] uncompressBuffer) throws IOException
{
ByteArrayOutputStream out = new ByteArrayOutputStream(uncompressBuffer);
byte[] buffer = new byte[4096];
int count;
while ((count = in.read(buffer)) >= 0) {
out.write(buffer, 0, count);
}
in.close();
out.close();
return out.length();
}
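For an apples-to-apples comparison, the JDK side can be driven with the same block-style calls as ZlibCompressor (setInput, finish, then compress into a preallocated buffer) instead of going through streams. This is a sketch, not the question's code: the class name JdkBlockCodec is hypothetical, and it uses the zlib header (nowrap = false), which corresponds to DEFAULT_HEADER on the Hadoop side, whereas the Deflater in the question uses NO_WRAP.

```java
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical block-style codec using java.util.zip, mirroring the ZlibCompressor call pattern. */
public final class JdkBlockCodec {
    private final Deflater deflater;
    private final Inflater inflater = new Inflater(); // zlib header, like DEFAULT_HEADER

    public JdkBlockCodec(int level) {
        this.deflater = new Deflater(level); // nowrap = false: zlib header
    }

    /** Compresses all of input into out and returns the compressed length. */
    public int compress(byte[] input, byte[] out) {
        deflater.reset(); // reuse the native stream between blocks
        deflater.setInput(input, 0, input.length);
        deflater.finish();
        int n = 0;
        while (!deflater.finished()) {
            if (n == out.length) {
                throw new IllegalArgumentException("output buffer too small");
            }
            n += deflater.deflate(out, n, out.length - n);
        }
        return n;
    }

    /** Decompresses compressed[0, length) into out and returns the decompressed length. */
    public int decompress(byte[] compressed, int length, byte[] out) {
        inflater.reset();
        inflater.setInput(compressed, 0, length);
        int n = 0;
        try {
            while (!inflater.finished()) {
                if (n == out.length) {
                    throw new IllegalArgumentException("output buffer too small");
                }
                int k = inflater.inflate(out, n, out.length - n);
                if (k == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated input or preset dictionary required");
                }
                n += k;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed data", e);
        }
        return n;
    }

    /** Releases the native zlib state held by both objects. */
    public void close() {
        deflater.end();
        inflater.end();
    }
}
```

Timing this class against the stream-based helpers separates the cost of the stream plumbing from the cost of zlib itself.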
Throughput:
Zlib/block -- 143.902 MBps
Gzip/JDK/stream -- 22.573 MBps
Does anyone have an idea why Zlib is so much faster? Is it using all cores natively? The code is expected to run single-threaded. Is anyone able to replicate a similar result?
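One way to check whether the work is being spread across other threads is to compare wall-clock time with the CPU time of the calling thread. A JNI call runs on the thread that makes it, so if that thread's CPU time roughly matches the wall time, the compression ran on that thread. A sketch using java.lang.management.ThreadMXBean; ThreadCpuCheck and Timing are hypothetical names, and MB here means 10^6 bytes.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: times a workload by wall clock and by the calling thread's CPU time. */
public final class ThreadCpuCheck {
    public static final class Timing {
        public final long wallNanos;
        public final long threadCpuNanos; // -1 if thread CPU time is unavailable

        Timing(long wallNanos, long threadCpuNanos) {
            this.wallNanos = wallNanos;
            this.threadCpuNanos = threadCpuNanos;
        }

        /** Throughput in MB/s (10^6 bytes) based on wall-clock time. */
        public double mbPerSecond(long bytesProcessed) {
            return bytesProcessed / 1e6 / (wallNanos / 1e9);
        }
    }

    public static Timing measure(Runnable workload) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = mx.isCurrentThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled();
        long cpu0 = cpuAvailable ? mx.getCurrentThreadCpuTime() : -1;
        long wall0 = System.nanoTime();
        workload.run();
        long wall = System.nanoTime() - wall0;
        long cpu = cpuAvailable ? mx.getCurrentThreadCpuTime() - cpu0 : -1;
        return new Timing(wall, cpu);
    }
}
```

Wrap each codec's compress loop in measure(...) and compare threadCpuNanos with wallNanos for both; a large gap between the two throughput figures with similar CPU-to-wall ratios points at per-call overhead rather than hidden threads.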

java.util.zip uses zlib.
Are you sure that you're using the same compression level in both? Is COMPRESSION_LEVEL equal to ZlibCompressor.CompressionLevel.DEFAULT_COMPRESSION?
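The level setting alone can change both speed and output size by a large factor. For reference, Deflater.DEFAULT_COMPRESSION is -1, which zlib treats as level 6. A minimal sketch (LevelCheck is a hypothetical name) that measures the raw-deflate output size for a given level, so the two configurations can be compared on the same input:

```java
import java.util.zip.Deflater;

/** Hypothetical helper: compressed size of the same input at a given Deflater level. */
public final class LevelCheck {
    /** Returns the raw-deflate size of input at level 0-9, or -1 for the zlib default. */
    public static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level, true); // nowrap, as in the question's Deflater
        try {
            deflater.setInput(input);
            deflater.finish();
            byte[] chunk = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(chunk); // only the byte count is kept
            }
            return total;
        } finally {
            deflater.end(); // release native zlib memory promptly
        }
    }
}
```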

Related

Upload large file using azure java sdk more than 50k block

I'm trying to upload a file size of 230GB into azure block blob with the following code
private void uploadFile(FileObject srcFile, FileObject destFile) throws Exception {
try {
BlobClient destBlobClient = blobContainerClient.getBlobClient("destFilename");
long blockSize = 4 * 1024 * 1024; // 4 MB
ParallelTransferOptions opts = new ParallelTransferOptions()
.setBlockSizeLong(blockSize)
.setMaxConcurrency(5);
BlobRequestConditions requestConditions = new BlobRequestConditions();
try (BlobOutputStream bos = destBlobClient.getBlockBlobClient().getBlobOutputStream(
opts, null, null, null, requestConditions);
InputStream is = srcFile.getContent().getInputStream()) {
byte[] buffer = new byte[(int) blockSize];
int i = 0;
for (int len; (len = is.read(buffer)) != -1; ) {
bos.write(buffer, 0, len);
}
}
}
finally {
destFile.close();
srcFile.close();
}
}
Since I am explicitly setting the block size to 4 MB for each write operation, I assumed that each write would be treated as a single block in Azure. But that is not the case.
For the 230 GB file above, the write operation was executed 58880 times and the file was uploaded successfully.
Can someone please explain how blocks are split internally in Azure and help me understand this better?
Thanks in advance
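The block counts can be checked with a little arithmetic. Assuming 230 GB means 230 GiB and 4 MB means 4 MiB, cutting the file into 4 MiB blocks gives exactly 58,880 blocks, which is more than the 50,000-block limit for a single block blob mentioned in the title; the smallest block size that stays within the limit is about 4.71 MiB. The sketch below only does this arithmetic (BlockMath is a hypothetical name) and does not model how the SDK groups write() calls into staged blocks.

```java
/** Hypothetical helper: block-count arithmetic for a block blob upload. */
public final class BlockMath {
    /** Maximum number of committed blocks in one block blob. */
    public static final long MAX_BLOCKS = 50_000L;

    /** Number of blocks needed when the file is cut into fixed-size blocks. */
    public static long blocksNeeded(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    /** Smallest block size in bytes that keeps fileSize within MAX_BLOCKS blocks. */
    public static long minBlockSize(long fileSize) {
        return (fileSize + MAX_BLOCKS - 1) / MAX_BLOCKS;
    }
}
```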

How are assemblies located and loaded in .NET 5+?

I am currently working out what I should put in the different version numbers, so I read What are differences between AssemblyVersion, AssemblyFileVersion and AssemblyInformationalVersion? and many other posts and articles.
Based on SemVer, I would increase the minor version number if I make backwards compatible changes. I understand and like this concept, and I want to use it.
This answer on the above linked post has good explanations on the different version numbers, but it also says that changing AssemblyVersion would require recompiling all dependent assemblies and executables.
I did a quick test:
testVersionLib.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<TargetFramework>net5.0</TargetFramework>
<SignAssembly>true</SignAssembly>
<AssemblyOriginatorKeyFile>abc.snk</AssemblyOriginatorKeyFile>
<AssemblyVersion>3.4.5.6</AssemblyVersion>
</PropertyGroup>
</Project>
project testVersionLib: class1.cs
using System;
namespace testVersionLib
{
public class Class1
{
public string m_versionText = "3.4.5.6";
}
}
project testVersionExe: program.cs
using System;
using System.Diagnostics;
namespace testVersionExe
{
class Program
{
static void Main (string[] args)
{
Console.WriteLine ("Hello World!");
PrintFileVersion ();
PrintAssemblyFullName ();
}
private static void PrintAssemblyFullName ()
{
Console.WriteLine ("m_versionText: " + new testVersionLib.Class1 ().m_versionText);
Console.WriteLine ("DLL Assembly FullName: " + System.Reflection.Assembly.GetAssembly (typeof (testVersionLib.Class1)).FullName);
}
private static void PrintFileVersion ()
{
Console.WriteLine ("DLL FileVersion: " + FileVersionInfo.GetVersionInfo ("testVersionLib.dll").FileVersion);
}
}
}
and found that this may apply to .NET Framework, but it clearly does not apply to .NET 5 (and most likely not to .NET 6 and above either, and maybe not to earlier versions of .NET Core). I created a .NET 5 C# console app with AssemblyVersion 1.2.3.4 and a strong name. This EXE references a DLL with AssemblyVersion 3.4.5.6 and a strong name. The DLL was then compiled with different versions and placed in the EXE's folder without recompiling the EXE.
The results:
The EXE fails to start if the DLL version is below 3.4.5.6 (e.g. 3.4.5.5, 3.4.4.6, 3.3.5.6), which makes sense.
The EXE runs successfully if the DLL version is equal to or above the version that was used to create the app (equal: 3.4.5.6; above: 3.4.5.7, 3.4.6.6, 3.5.5.6, 4.4.5.6).
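The observed results can be restated as a simple rule: the EXE starts when the deployed DLL's AssemblyVersion is greater than or equal to the referenced one, compared component by component, most significant first. The Java sketch below only encodes that observation from the test (VersionRule is a hypothetical name); it is not the .NET runtime's actual binding logic.

```java
import java.util.Arrays;

/** Hypothetical model of the observed rule: runtime version >= referenced version. */
public final class VersionRule {
    /** Parses "major.minor.build.revision" into four integers. */
    static int[] parse(String version) {
        int[] parts = Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 parts: " + version);
        }
        return parts;
    }

    /** True if the runtime version compares >= the referenced version, part by part. */
    public static boolean loads(String referenced, String runtime) {
        return Arrays.compare(parse(runtime), parse(referenced)) >= 0; // lexicographic
    }
}
```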
This answer only says that
[...] .Net 5+ does not (by default) require that the assembly version used at runtime match the assembly version used at build time.
but it does not explain why and how.
How are assemblies located, resolved and loaded in .NET 5?
If someone wants to repeat my test with the compiled files, here's the 7z archive, encoded as PNG:
To decode the image, save it as PNG and use this code:
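using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Runtime.InteropServices;
// Bitmap comes from the System.Drawing.Common package on .NET 5+.
static void Main (string[] args)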
{
string dataPath = @"c:\temp\net5ver.7z";
string imagePath = @"c:\temp\net5ver.7z.png";
string decodedDataPath = @"c:\temp\net5ver.out.7z";
int imageWidth = 1024;
Encode (dataPath, imagePath, imageWidth);
Decode (imagePath, decodedDataPath);
}
public static void Decode (string i_imagePath, string i_dataPath)
{
var bitmap = new Bitmap (i_imagePath);
var bitmapData = bitmap.LockBits (new Rectangle (Point.Empty, bitmap.Size), System.Drawing.Imaging.ImageLockMode.ReadOnly, bitmap.PixelFormat);
byte[] dataLengthBytes = new byte[4];
Marshal.Copy (bitmapData.Scan0, dataLengthBytes, 0, 4);
int dataLength = BitConverter.ToInt32 (dataLengthBytes);
int imageWidth = bitmap.Width;
int dataLines = (int)Math.Ceiling (dataLength / (double)imageWidth);
if (bitmap.Height != dataLines + 1)
throw new Exception ();
byte[] row = new byte[imageWidth];
List<byte> data = new();
for (int copyIndex = 0; copyIndex < dataLines; copyIndex++)
{
int rowStartIndex = imageWidth * (copyIndex + 1);
Marshal.Copy (IntPtr.Add (bitmapData.Scan0, rowStartIndex), row, 0, row.Length);
data.AddRange (row.Take (dataLength - data.Count));
}
bitmap.UnlockBits (bitmapData);
System.IO.File.WriteAllBytes (i_dataPath, data.ToArray ());
}
public static void Encode (string i_dataPath,
string i_imagePath,
int i_imageWidth)
{
byte[] data = System.IO.File.ReadAllBytes (i_dataPath);
int dataLines = (int)Math.Ceiling (data.Length / (double)i_imageWidth);
int imageHeight = dataLines + 1;
var bitmap = new Bitmap (i_imageWidth, imageHeight, System.Drawing.Imaging.PixelFormat.Format8bppIndexed);
var palette = bitmap.Palette;
for (int index = 0; index <= byte.MaxValue; index++)
palette.Entries[index] = Color.FromArgb (index, index, index);
bitmap.Palette = palette;
var bitmapData = bitmap.LockBits (new Rectangle (Point.Empty, bitmap.Size), System.Drawing.Imaging.ImageLockMode.WriteOnly, bitmap.PixelFormat);
Marshal.Copy (BitConverter.GetBytes (data.Length), 0, bitmapData.Scan0, 4);
for (int copyIndex = 0; copyIndex < dataLines; copyIndex++)
{
int dataStartIndex = i_imageWidth * copyIndex;
int rowStartIndex = i_imageWidth * (copyIndex + 1);
byte[] row = data.Skip (dataStartIndex).Take (i_imageWidth).ToArray ();
Marshal.Copy (row, 0, IntPtr.Add (bitmapData.Scan0, rowStartIndex), row.Length);
}
bitmap.UnlockBits (bitmapData);
bitmap.Save (i_imagePath);
}
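The Encode/Decode pair stores the archive as an 8-bit indexed image: row 0 holds the data length as a 4-byte little-endian integer (BitConverter's byte order on x86/x64), and the following rows hold the data, with the last row zero-padded. The same layout can be sketched with plain byte arrays (RowLayout is a hypothetical name). Like the C# code, it assumes each image row is exactly width bytes long, which holds for a width of 1024 because GDI+ pads rows to a multiple of 4 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Hypothetical sketch of the header-row + data-rows layout, one byte per pixel. */
public final class RowLayout {
    /** Packs data into a (rows x width) pixel array: length row, then data rows. */
    public static byte[] pack(byte[] data, int width) {
        if (width < 4) {
            throw new IllegalArgumentException("width must hold the 4-byte length");
        }
        int dataRows = (data.length + width - 1) / width;
        byte[] pixels = new byte[width * (dataRows + 1)];
        ByteBuffer.wrap(pixels).order(ByteOrder.LITTLE_ENDIAN).putInt(0, data.length);
        System.arraycopy(data, 0, pixels, width, data.length); // last row stays zero-padded
        return pixels;
    }

    /** Reverses pack: reads the length from row 0 and drops the padding. */
    public static byte[] unpack(byte[] pixels, int width) {
        int length = ByteBuffer.wrap(pixels).order(ByteOrder.LITTLE_ENDIAN).getInt(0);
        int dataRows = (length + width - 1) / width;
        if (pixels.length != width * (dataRows + 1)) {
            throw new IllegalArgumentException("unexpected height"); // same check as Decode
        }
        return Arrays.copyOfRange(pixels, width, width + length);
    }
}
```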

C#: WPD - Downloading a Picture with meta tags

I am using the Portable Device API to automatically get photos from a connected smartphone. Everything transfers correctly. The code I use is the standard DownloadFile() routine:
public PortableDownloadInfo DownloadFile(PortableDeviceFile file, string saveToPath)
{
IPortableDeviceContent content;
_device.Content(out content);
IPortableDeviceResources resources;
content.Transfer(out resources);
PortableDeviceApiLib.IStream wpdStream;
uint optimalTransferSize = 0;
var property = new _tagpropertykey
{
fmtid = new Guid(0xE81E79BE, 0x34F0, 0x41BF, 0xB5, 0x3F, 0xF1, 0xA0, 0x6A, 0xE8, 0x78, 0x42),
pid = 0
};
resources.GetStream(file.Id, ref property, 0, ref optimalTransferSize, out wpdStream);
System.Runtime.InteropServices.ComTypes.IStream sourceStream =
// ReSharper disable once SuspiciousTypeConversion.Global
(System.Runtime.InteropServices.ComTypes.IStream)wpdStream;
var filename = Path.GetFileName(file.Name);
if (string.IsNullOrEmpty(filename))
return null;
FileStream targetStream = new FileStream(Path.Combine(saveToPath, filename),
FileMode.Create, FileAccess.Write);
try
{
unsafe
{
var buffer = new byte[1024];
int bytesRead;
do
{
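sourceStream.Read(buffer, 1024, new IntPtr(&bytesRead));
// Write only the bytes actually read; the last read is usually shorter than the buffer.
if (bytesRead > 0)
targetStream.Write(buffer, 0, bytesRead);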
} while (bytesRead > 0);
targetStream.Close();
}
}
finally
{
Marshal.ReleaseComObject(sourceStream);
Marshal.ReleaseComObject(wpdStream);
}
return pdi;
}
}
There are two problems with this standard code:
1) When the images are saved to the Windows machine, there is no EXIF information. This information is what I need. How do I preserve it?
2) The saved files are very bloated. For example, the source JPEG is 1,045,807 bytes, whilst the downloaded file is 3,942,840 bytes! It is similar for all of the other files. I would have thought that the code inside the unsafe {} section would copy it byte for byte. Is there a better (and safe) way to transfer the data?
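A copy loop of this kind must write only the number of bytes each read returned, not the buffer length; otherwise the short final read, and a zero-length last read, add stale bytes to the file. That can add at most a couple of kilobytes, so it would not by itself explain a fourfold size difference. A minimal Java sketch of the pattern (StreamCopy is a hypothetical name; on Java 9+ InputStream.transferTo does the same job):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies a stream, writing only the bytes actually read. */
public final class StreamCopy {
    public static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int bytesRead;
        while ((bytesRead = in.read(buffer)) != -1) {
            out.write(buffer, 0, bytesRead); // not buffer.length
            total += bytesRead;
        }
        return total;
    }
}
```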

Sorry about this. It works fine; something else is causing these issues.

TarsosDSP Pitch Detection from .wav file. And the result frequency is always less than half

I'm trying to use the TarsosDSP library to detect the pitch of a .wav file, and the resulting frequency is always half of the expected value.
Here is my code.
public class Main {
public static void main(String[] args){
try{
float sampleRate = 44100;
int audioBufferSize = 2048;
int bufferOverlap = 0;
//Create an AudioInputStream from my .wav file
URL soundURL = Main.class.getResource("/DetectPicthFromWav/329.wav");
AudioInputStream stream = AudioSystem.getAudioInputStream(soundURL);
//Convert into TarsosDSP API
JVMAudioInputStream audioStream = new JVMAudioInputStream(stream);
AudioDispatcher dispatcher = new AudioDispatcher(audioStream, audioBufferSize, bufferOverlap);
MyPitchDetector myPitchDetector = new MyPitchDetector();
dispatcher.addAudioProcessor(new PitchProcessor(PitchEstimationAlgorithm.YIN, sampleRate, audioBufferSize, myPitchDetector));
dispatcher.run();
}
catch(FileNotFoundException fne){fne.printStackTrace();}
catch(UnsupportedAudioFileException uafe){uafe.printStackTrace();}
catch(IOException ie){ie.printStackTrace();}
}
}
class MyPitchDetector implements PitchDetectionHandler{
//Here the result of pitch is always less than half.
@Override
public void handlePitch(PitchDetectionResult pitchDetectionResult,
AudioEvent audioEvent) {
if(pitchDetectionResult.getPitch() != -1){
double timeStamp = audioEvent.getTimeStamp();
float pitch = pitchDetectionResult.getPitch();
float probability = pitchDetectionResult.getProbability();
double rms = audioEvent.getRMS() * 100;
String message = String.format("Pitch detected at %.2fs: %.2fHz ( %.2f probability, RMS: %.5f )\n", timeStamp,pitch,probability,rms);
System.out.println(message);
}
}
}
The 329.wav file was generated with the http://onlinetonegenerator.com/ website at 329 Hz.
I don't know why the detected pitch is always 164.5 Hz. Is there a problem in my code?
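A first diagnostic step is to print the format actually stored in the WAV header and compare it with the sample rate and sample size the processing code assumes. A sketch using javax.sound.sampled (WavFormatCheck is a hypothetical name):

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: reports the format stored in an audio file header. */
public final class WavFormatCheck {
    public static AudioFormat formatOf(File wav) throws IOException, UnsupportedAudioFileException {
        return AudioSystem.getAudioFileFormat(wav).getFormat();
    }

    public static String describe(AudioFormat f) {
        return String.format("%.0f Hz, %d-bit, %d channel(s), %s, %s",
                f.getSampleRate(), f.getSampleSizeInBits(), f.getChannels(),
                f.getEncoding(), f.isBigEndian() ? "big-endian" : "little-endian");
    }
}
```

For example, System.out.println(WavFormatCheck.describe(WavFormatCheck.formatOf(new File("329.wav")))) shows whether the file really is 16-bit mono at 44100 Hz.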

Well, I don't know what methods you are using, but since the frequency is exactly halved, could the wrong sample rate be set?
Most operations assume the sample rate at which the signal was originally sampled; maybe you've passed half of that as an argument (or its default value is half of that)?

I just had the same problem with TarsosDSP on Android. For me the answer was that the file from http://onlinetonegenerator.com/ has 32-bit samples instead of 16-bit, which appears to be the default. Relevant code:
AssetFileDescriptor afd = getAssets().openFd("440.wav"); // 440Hz sine wave
InputStream is = afd.createInputStream();
TarsosDSPAudioFormat audioFormat = new TarsosDSPAudioFormat(
/* sample rate */ 44100,
/* HERE sample size in bits */ 32,
/* number of channels */ 1,
/* signed/unsigned data */ true,
/* big-endian byte order */ false
);
AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(is, audioFormat), 2048, 0);
PitchDetectionHandler pdh = ...
AudioProcessor p = new PitchProcessor(PitchProcessor.PitchEstimationAlgorithm.FFT_YIN, 44100, 2048, pdh);
dispatcher.addAudioProcessor(p);
new Thread(dispatcher, "Audio Dispatcher").start();
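Both answers lead to the same arithmetic. If samples are 32 bits wide but are read as 16-bit, each real sample becomes two, so one period of the tone spans twice as many samples; analysing at half the true sample rate stretches the period by the same factor. Either way 329 Hz shows up as 164.5 Hz. A small sketch of that scaling (PitchScaling is a hypothetical name; it ignores channel count and byte order):

```java
/** Hypothetical helper: apparent pitch when audio is decoded with the wrong rate or sample size. */
public final class PitchScaling {
    /**
     * A tone of trueHz recorded at actualRate with actualBits per sample, but decoded as
     * assumedBits per sample and analysed at assumedRate, appears at the returned frequency.
     */
    public static double apparentHz(double trueHz, double actualRate, int actualBits,
                                    double assumedRate, int assumedBits) {
        // Reading actualBits-wide samples as assumedBits-wide ones multiplies the
        // number of samples per period by actualBits / assumedBits.
        double periodInSamples = actualRate / trueHz * actualBits / assumedBits;
        return assumedRate / periodInSamples;
    }
}
```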

MonoTouch - WebRequest memory leak and crash?

I've got a MonoTouch app that does an HTTP POST with a 3.5MB file, and it is very unstable on the primary platforms that I test on (iPhone 3G with OS 3.1.2 and iPhone 4 with OS 4.2.1). I'll describe what I'm doing here and maybe someone can tell me if I'm doing something wrong.
In order to rule out the rest of my app, I've whittled this down to a tiny sample app. The app is an iPhone OpenGL Project and it does only this:
1. At startup, allocate 6MB of memory in 30k chunks. This simulates my app's memory usage.
2. Read a 3.5MB file into memory.
3. Create a thread to post the data. (Make a WebRequest object, use GetRequestStream(), and write the 3.5MB data in).
4. When the main thread detects that the posting thread is done, go to step 2 and repeat.
Also, each frame, I allocate 0-100k to simulate the app doing something. I don't keep any references to this data so it should be getting garbage collected.
iPhone 3G Result: The app gets through 6 to 8 uploads and then the OS kills it. There is no crash log, but there is a LowMemory log showing that the app was jettisoned.
iPhone 4 Result: It gets an Mprotect error around the 11th upload.
A few data points:
Instruments does NOT show the memory increasing as the app continues to upload.
Instruments doesn't show any significant leaks (maybe 1 kilobyte total).
It doesn't matter whether I write the post data in 64k chunks or all at once with one Stream.Write() call.
It doesn't matter whether I wait for a response (HttpWebRequest.HaveResponse) or not before starting the next upload.
It doesn't matter if the POST data is even valid. I've tried using valid POST data and I've tried sending 3MB of zeros.
If the app is not allocating any data each frame, then it takes longer to run out of memory (but as mentioned before, the memory that I'm allocating each frame is not referenced after the frame it was allocated on, so it should be scooped up by the GC).
If nobody has any ideas, I'll file a bug with Novell, but I wanted to see if I'm doing something wrong here first.
If anyone wants the full sample app, I can provide it, but I've pasted the contents of my EAGLView.cs below.
using System;
using System.Net;
using System.Threading;
using System.Collections.Generic;
using System.IO;
using OpenTK.Platform.iPhoneOS;
using MonoTouch.CoreAnimation;
using OpenTK;
using OpenTK.Graphics.ES11;
using MonoTouch.Foundation;
using MonoTouch.ObjCRuntime;
using MonoTouch.OpenGLES;
namespace CrashTest
{
public partial class EAGLView : iPhoneOSGameView
{
[Export("layerClass")]
static Class LayerClass ()
{
return iPhoneOSGameView.GetLayerClass ();
}
[Export("initWithCoder:")]
public EAGLView (NSCoder coder) : base(coder)
{
LayerRetainsBacking = false;
LayerColorFormat = EAGLColorFormat.RGBA8;
ContextRenderingApi = EAGLRenderingAPI.OpenGLES1;
}
protected override void ConfigureLayer (CAEAGLLayer eaglLayer)
{
eaglLayer.Opaque = true;
}
protected override void OnRenderFrame (FrameEventArgs e)
{
SimulateAppAllocations();
UpdatePost();
base.OnRenderFrame (e);
float[] squareVertices = { -0.5f, -0.5f, 0.5f, -0.5f, -0.5f, 0.5f, 0.5f, 0.5f };
byte[] squareColors = { 255, 255, 0, 255, 0, 255, 255, 255, 0, 0,
0, 0, 255, 0, 255, 255 };
MakeCurrent ();
GL.Viewport (0, 0, Size.Width, Size.Height);
GL.MatrixMode (All.Projection);
GL.LoadIdentity ();
GL.Ortho (-1.0f, 1.0f, -1.5f, 1.5f, -1.0f, 1.0f);
GL.MatrixMode (All.Modelview);
GL.Rotate (3.0f, 0.0f, 0.0f, 1.0f);
GL.ClearColor (0.5f, 0.5f, 0.5f, 1.0f);
GL.Clear ((uint)All.ColorBufferBit);
GL.VertexPointer (2, All.Float, 0, squareVertices);
GL.EnableClientState (All.VertexArray);
GL.ColorPointer (4, All.UnsignedByte, 0, squareColors);
GL.EnableClientState (All.ColorArray);
GL.DrawArrays (All.TriangleStrip, 0, 4);
SwapBuffers ();
}
AsyncHttpPost m_Post;
int m_nPosts = 1;
byte[] LoadPostData()
{
// Just return 3MB of zeros. It doesn't matter whether this is valid POST data or not.
return new byte[1024 * 1024 * 3];
}
void UpdatePost()
{
if ( m_Post == null || m_Post.PostStatus != AsyncHttpPostStatus.InProgress )
{
System.Console.WriteLine( string.Format( "Starting post {0}", m_nPosts++ ) );
byte [] postData = LoadPostData();
m_Post = new AsyncHttpPost(
"https://api-video.facebook.com/restserver.php",
"multipart/form-data; boundary=" + "8cdbcdf18ab6640",
postData );
}
}
Random m_Random = new Random(0);
List< byte [] > m_Allocations;
List< byte[] > m_InitialAllocations;
void SimulateAppAllocations()
{
// First time through, allocate a bunch of data that the app would allocate.
if ( m_InitialAllocations == null )
{
m_InitialAllocations = new List<byte[]>();
int nInitialBytes = 6 * 1024 * 1024;
int nBlockSize = 30000;
for ( int nCurBytes = 0; nCurBytes < nInitialBytes; nCurBytes += nBlockSize )
{
m_InitialAllocations.Add( new byte[nBlockSize] );
}
}
m_Allocations = new List<byte[]>();
for ( int i=0; i < 10; i++ )
{
int nAllocationSize = m_Random.Next( 10000 ) + 10;
m_Allocations.Add( new byte[nAllocationSize] );
}
}
}
public enum AsyncHttpPostStatus
{
InProgress,
Success,
Fail
}
public class AsyncHttpPost
{
public AsyncHttpPost( string sURL, string sContentType, byte [] postData )
{
m_PostData = postData;
m_PostStatus = AsyncHttpPostStatus.InProgress;
m_sContentType = sContentType;
m_sURL = sURL;
//UploadThread();
m_UploadThread = new Thread( new ThreadStart( UploadThread ) );
m_UploadThread.Start();
}
void UploadThread()
{
using ( MonoTouch.Foundation.NSAutoreleasePool pool = new MonoTouch.Foundation.NSAutoreleasePool() )
{
try
{
HttpWebRequest request = WebRequest.Create( m_sURL ) as HttpWebRequest;
request.Method = "POST";
request.ContentType = m_sContentType;
request.ContentLength = m_PostData.Length;
// Write the post data.
using ( Stream stream = request.GetRequestStream() )
{
stream.Write( m_PostData, 0, m_PostData.Length );
stream.Close();
}
System.Console.WriteLine( "Finished!" );
// We're done with the data now. Let it be garbage collected.
m_PostData = null;
// Finished!
m_PostStatus = AsyncHttpPostStatus.Success;
}
catch ( System.Exception e )
{
System.Console.WriteLine( "Error in AsyncHttpPost.UploadThread:\n" + e.Message );
m_PostStatus = AsyncHttpPostStatus.Fail;
}
}
}
public AsyncHttpPostStatus PostStatus
{
get
{
return m_PostStatus;
}
}
Thread m_UploadThread;
// Queued to be handled in the main thread.
byte [] m_PostData;
AsyncHttpPostStatus m_PostStatus;
string m_sContentType;
string m_sURL;
}
}

I think you should read in your file 1 KB (or some arbitrary size) at a time and write it to the web request.
Code similar to this:
byte[] buffer = new byte[1024];
int bytesRead = 0;
using (FileStream fileStream = File.OpenRead("YourFile.txt"))
{
while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) != 0)
{
httpPostStream.Write(buffer, 0, bytesRead);
}
}
This is off the top of my head, but I think it's right.
This way you don't have an extra 3MB floating around in memory when you don't really need to. I think tricks like this are even more important on iDevices (or other devices) than on the desktop.
Test different buffer sizes too: a larger buffer will get you better speed up to a point (I remember 8 KB being pretty good).
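A Java analogue of this approach (the question itself is C# on MonoTouch, so this only illustrates the technique): stream the body in fixed-size chunks, and tell the HTTP client the length up front so it does not buffer the whole body itself. With java.net.HttpURLConnection, setFixedLengthStreamingMode does that. ChunkedUpload and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Hypothetical helper: streams a POST body in chunks instead of holding it all in memory. */
public final class ChunkedUpload {
    /** Prepares a POST whose body is streamed with a known length. */
    public static HttpURLConnection openStreamingPost(URL url, long contentLength, String contentType)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection(); // does not connect yet
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", contentType);
        // Without this, HttpURLConnection buffers the entire body in memory before sending.
        conn.setFixedLengthStreamingMode(contentLength);
        return conn;
    }

    /** Copies source to the request body bufferSize bytes at a time; returns bytes written. */
    public static long writeBody(InputStream source, OutputStream body, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int bytesRead;
        while ((bytesRead = source.read(buffer)) != -1) {
            body.write(buffer, 0, bytesRead);
            total += bytesRead;
        }
        return total;
    }
}
```

Typical use: open the connection with openStreamingPost, call writeBody with a file stream, conn.getOutputStream() and an 8 KB buffer, then read conn.getResponseCode() so the exchange completes.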
