Is HDF5 able to handle multiple threads on its own, or does it have to be externally synchronized? The OpenMP example suggests the latter.
If the former, what is the proper way to define the dataspace to write to?
Anycorn,
HDF5 can handle multiple threads without external synchronization, although the writes will still be serialized internally. You should compile the latest version (1.8.6 as of 4/5/2011) and run ./configure with the --enable-threadsafe and --with-pthreads=/pthreads-include-path/,/pthreads-lib-path/ flags.
For example:
./configure --enable-threadsafe --with-pthreads=/usr/include,/usr/lib
With regards to defining a dataspace for writing, the simplest way is to construct a basic rectangular dataspace from a dimensions array and a rank value using the H5Screate_simple function. Mine usually follow the same steps:
// NUM = number of elements in this dimension
// Create a 1-dimensional size array
hsize_t dsDim[1] = {NUM};
// Create the NUM-element dataspace (rank param = 1)
hid_t dSpace = H5Screate_simple(1, dsDim, NULL);
...
// Create datasets using the dataspace
...
// Release the dataspace
H5Sclose(dSpace);
Hope this helps!
I want to copy files from one place to another and the problem is I deal with a lot of sparse files.
Is there any (easy) way of copying sparse files without them becoming huge at the destination?
My basic code:
out, err := os.Create(bricks[0] + "/" + fileName)
in, err := os.Open(event.Name)
io.Copy(out, in)
Some background theory
Note that io.Copy() pipes raw bytes – which is understandable once you consider that it pipes data from an io.Reader to an io.Writer, which provide Read([]byte) and Write([]byte), respectively.
As such, io.Copy() is able to deal with absolutely any source providing
bytes and absolutely any sink consuming them.
On the other hand, the location of the holes in a file is "side-channel" information which "classic" syscalls such as read(2) hide from their users, and io.Copy() is not able to convey such side-channel information in any way.
In other words, file sparseness was originally meant to be a transparent storage optimization, done behind the user's back.
So, no, there's no way io.Copy() could deal with sparse files in itself.
What to do about it
You'd need to go one level deeper and implement all this using the syscall package and some manual tinkering.
To work with holes, you should use the SEEK_HOLE and SEEK_DATA special values for the lseek(2) syscall which, while formally non-standard, are supported by all major platforms.
Unfortunately, support for those "whence" positions is present neither in the stock syscall package (as of Go 1.8.1) nor in the golang.org/x/sys tree.
But fear not, there are two easy steps:
First, the stock syscall.Seek() is actually mapped to lseek(2)
on the relevant platforms.
Next, you'd need to figure out the correct values for SEEK_HOLE and
SEEK_DATA for the platforms you need to support.
Note that they are free to be different between different platforms!
Say, on my Linux system I can do a simple
$ grep -E 'SEEK_(HOLE|DATA)' </usr/include/unistd.h
# define SEEK_DATA 3 /* Seek to next data. */
# define SEEK_HOLE 4 /* Seek to next hole. */
…to figure out the values for these symbols.
Now, say, you create a Linux-specific file in your package
containing something like
// +build linux

const (
    SEEK_DATA = 3
    SEEK_HOLE = 4
)
and then use these values with syscall.Seek().
The file descriptor to pass to syscall.Seek() and friends
can be obtained from an opened file using the Fd() method
of os.File values.
The pattern to use when reading is to detect regions containing data, and read the data from them – see this for one example.
Note that this deals with reading sparse files; if you want to actually transfer them as sparse – that is, preserving that property of theirs – the situation is more complicated: it appears to be even less portable, so some research and experimentation is due.
On Linux, it appears you could try to use fallocate(2) with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE to punch a hole at the
end of the file you're writing to; if that legitimately fails
(with syscall.EOPNOTSUPP), you just shovel as many zeroed blocks into the destination file as are covered by the hole you're reading – in the hope
that the OS will do the right thing and convert them to a hole by itself.
Note that some filesystems do not support holes at all – as a concept.
One example is the filesystems in the FAT family.
What I'm leading you to is that the inability to create a sparse file might actually be a property of the target filesystem in your case.
You might find Go issue #13548 "archive/tar: add support for writing tar containing sparse files" to be of interest.
One more note: you might also consider checking whether the destination directory resides on the same filesystem as the source file, and if it does, use syscall.Rename() (on POSIX systems) or os.Rename() to just move the file across directories without actually copying its data.
You don't need to resort to syscalls.
package main
import "os"
func main() {
	f, _ := os.Create("/tmp/sparse.dat")
	defer f.Close()
	f.Write([]byte("start"))
	f.Seek(1024*1024*10, 0)
	f.Write([]byte("end"))
}
Then you'll see:
$ ls -l /tmp/sparse.dat
-rw-rw-r-- 1 soren soren 10485763 Jun 25 14:29 /tmp/sparse.dat
$ du /tmp/sparse.dat
8 /tmp/sparse.dat
It's true you can't use io.Copy as is. Instead you need to implement an alternative to io.Copy which reads a chunk from the src and checks whether it's all '\0'. If it is, just dst.Seek(len(chunk), os.SEEK_CUR) to skip past that part in dst. That particular implementation is left as an exercise to the reader :)
I have a snippet that converts a VTK (off-screen) rendering to 1) a point cloud; 2) a color image. The implementation is correct; it's just that the speed/efficiency is an issue.
At the beginning of every iteration, I update my rendering by calling:
renderWin->Render ();
For point cloud, I get the depth using following line and then convert it to point cloud (code not posted).
float *depth = new float[width * height];
renderWin->GetZbufferData (0, 0, width - 1, height - 1, &(depth[0]));
For color image, I use vtkWindowToImageFilter to get current color rendered image:
windowToImageFilter->Modified(); // Must have this to get updated rendered image
windowToImageFilter->Update(); // this line takes a lot of time
render_img_vtk = windowToImageFilter->GetOutput();
The above program runs sequentially in a single thread. The render window size is about 1000x1000. There is not a lot of polydata to render. VTK was compiled with OpenGL2 support.
Issue:
This code runs at only about 15-20 Hz; when I disable/comment the windowToImageFilter part (vtkWindowToImageFilter::Update() takes a lot of time), the framerate goes up to about 30 Hz.
When I disable/comment vtkRenderWindow::GetZbufferData, it goes up to 50 Hz (which is how fast I call my loop and update the rendering).
I had a quick look at the VTK source for these two functions; I see they copy data using GL commands. I am not sure how I can speed this up.
Update:
After some searching, I found that the glReadPixels function called in GetZbufferData causes the delay, as it tries to synchronize the data. Please see this post: OpenGL read pixels faster than glReadPixels.
In that post, it is suggested that a PBO should be used. VTK has a class vtkPixelBufferObject, but no example can be found of using it to avoid blocking the pipeline when doing glReadPixels().
So how can I do this within the VTK pipeline?
My answer is just about the GetZbufferData portion.
vtkOpenGLRenderWindow already uses glReadPixels with little overhead, from what I can tell (here).
What happens after that, I believe, can introduce overhead. The main thing to note is that vtkOpenGLRenderWindow has 3 method overloads for GetZbufferData. You are using the overload with the same signature as the one used in vtkWindowToImageFilter (here).
I believe you are copying that part of the implementation from vtkWindowToImageFilter, which makes total sense. What do you do with the float pointer depthBuffer after you get it? Looking at the vtkWindowToImageFilter implementation, they have a for loop that calls memcpy (here). Their memcpy has to be in a loop in order to deal with spacing, because of the variables inIncrY and outIncrY. In your situation you should only have to call memcpy once, then free the array pointed to by depthBuffer – unless you are just using the pointer directly, in which case you have to think about who deletes that float array, because it was created with new.
I think the better option is to use the method with this signature: int GetZbufferData( int x1, int y1, int x2, int y2, vtkFloatArray* z )
In Python that looks like this:
import vtk
# create render pipeline (not shown)
# define image bounds (not shown)
vfa = vtk.vtkFloatArray()
ib = image_bounds
render_window.GetZbufferData(ib[0], ib[1], ib[2], ib[3], vfa)
The major benefit is that the pointer for the vtkFloatArray gets handed straight to glReadPixels. Also, VTK will take care of garbage collection of the vtkFloatArray if you create it with vtkSmartPointer (not needed in Python).
My python implementation is running at about 150Hz on a single pass. On a 640x480 render window.
I have a volume stored as slices in C# memory. The slices may not be consecutive in memory. I want to import this data and create a vtkImageData object.
The first way I found is to use vtkImageImport, but this importer seems to accept only a single void pointer as data input. Since my slices may not be consecutive in memory, I cannot hand it a single pointer to my slice data.
A second option is to create the vtkImageData from scratch and use vtkImageData->GetScalarPointer() to get a pointer to its data, then fill it in a loop. This is quite costly (although memcpy could speed things up a bit). I could also combine the copy approach with vtkImageImport, of course.
Are these my only options, or is there a better way to get the data into a VTK object? I want to be sure there is no other option before I take the copy approach (performance-heavy) or modify the low-level storage of my slices so they become consecutive in memory.
I'm not too familiar with VTK for C# (ActiViz). In C++, a good and rather fast approach is to use vtkImageData->GetScalarPointer() and manually copy your slices. It will increase your speed to allocate all the memory first, as you said; perhaps you want to do it this more robust way (change the numbers):
vtkImageData * img = vtkImageData::New();
img->SetExtent(0, 255, 0, 255, 0, 9);
img->SetSpacing(sx , sy, sz);
img->SetOrigin(ox, oy, oz);
img->SetNumberOfScalarComponents(1);
img->SetScalarTypeToFloat();
img->AllocateScalars();
Then it is not too hard to do something like:
float * fp = static_cast<float *>(img->GetScalarPointer());
for (int i = 0; i < 256 * 256 * 10; i++) {
    fp[i] = mydata[i];
}
Another, fancier option is to create your own vtkImporter, basing the code on vtkImageImport.
I am trying to gain further improvement in my Image Resizing algorithm by combining IPP and TBB. The two ways that I can accomplish this task are:
Use IPP without TBB
Use IPP with TBB inside a parallel_for loop
The issue is that I have coded the application and I get the correct result, but surprisingly my computation time is larger when the two are combined. To avoid clutter, I only paste part of my code here, but I can provide the whole thing if needed. For the first case, when I use only IPP, the code looks like this (the base of the algorithm was borrowed from the Intel TBB sample code for image resizing):
ippiResizeSqrPixel_8u_C1R(src, srcSize, srcStep, srcRoi, dst, dstStep, dstRoi,
m_nzoom_x,m_nzoom_y,0, 0, interpolation, pBufferWhole);
and the parallel_for loop looks like this:
parallel_for(
blocked_range<size_t>(0,CHUNK),
[=](const blocked_range<size_t> &r){
for (size_t i= r.begin(); i!= r.end(); i++){
ippiResizeSqrPixel_8u_C1R(src+((int)(i*srcWidth*srcHeight)), srcSize,
srcStep, srcRoi, dst+((int)(i*dstWidth*dstHeight)), dstStep, dstRoi,
m_nzoom_x,m_nzoom_y,0, 0, interpolation, pBuffer);
}
}
);
src and dst are pointers to the source and destination images. When TBB is used, the image is partitioned into CHUNK parts and the parallel_for loops over all of them, using an IPP function to resize each chunk independently. The values of dstHeight, srcHeight, srcRoi, and dstRoi are modified to accommodate the partitioning of the image, and src+((int)(i*srcWidth*srcHeight)) and dst+((int)(i*dstWidth*dstHeight)) point to the beginning of each partition in the source and destination images.
Apparently IPP and TBB can be combined in this manner – as I get the correct result – but what baffles me is that the computation time deteriorates when they're combined, compared to when IPP is used alone. Any thoughts on what could be the cause, or how I could solve this issue?
Thanks!
In your code, each parallelized task in parallel_for consists of multiple ippiResizeSqrPixel calls.
This may introduce needless overhead compared to the serial version, which calls it only once: such a function may contain a preparation phase (for example, setting up an interpolation coefficient table), and it is generally designed to process a large block of memory at a time for runtime efficiency. (But I don't know what IPP actually does internally.)
I suggest the following parallel structure:
parallel_for(
// Range = src (or dst) height of image.
blocked_range<size_t>(0, height),
[=](const blocked_range<size_t> &r) {
// 'r' = vertical range of image to process in this task.
// You can calculate src/dst region from 'r' here,
// and call ippiResizeSqrPixel once per task.
ippiResizeSqrPixel_8u_C1R( ... );
}
);
It turns out that some IPP functions use multi-threading automatically. For such functions no improvement can be gained by using TBB, and apparently ippiResizeSqrPixel_8u_C1R( ... ) is one of them. When I disabled all the cores but one, both versions performed equally well.
Does J2ME have something similar to the RandomAccessFile class, or is there any way to emulate this particular (random access) functionality?
The problem is this: I have a rather large binary data file (~600 KB) and would like to create a mobile application for using that data. The format of the data is home-made and contains many index blocks and data blocks. Reading the data on other platforms (like PHP or C) usually goes like this:
Read 2 bytes for the index key (K), another 2 for the index value (V) for the data type needed
Skip V bytes from the start of the file to seek to the position where the data for index key K starts
Read the data
Profit :)
This happens many times during the program flow.
I'm investigating the possibility of doing the very same on J2ME, and while I admit I'm quite new to the whole Java thing, I can't seem to find anything beyond the InputStream (DataInputStream) classes, which don't have the basic seek/skip-to-byte/return-position functions I need.
So, what are my chances?
You could use something like this:
try {
    DataInputStream di = new DataInputStream(is);
    di.mark(9999);
    short key = di.readShort();
    short val = di.readShort();
    di.reset();
    di.skip(val);
    byte[] b = new byte[255];
    di.read(b);
} catch (Exception ex) {
    ex.printStackTrace();
}
I prefer not to use the mark/reset methods; I think it is better to save the offset relative to the val location rather than from the start of the file, so you can skip these methods. I think they have some issues on some devices.
One more note: I don't recommend opening a 600 KB file; it will crash the application on many low-end devices. You should split this file into multiple files.