lmdb: how to determine free space left?

When creating an lmdb environment I can specify the map size. Is there a way to determine, at any point, how much of that map size is already used up?
In other words, I need to find out how much free space is left, so I can address running out of space before it happens.
The only thing I could think of is to go through all databases and use mdb_env_stat to get the number of branch, leaf and overflow pages, sum that up across all DBs (times the page size) and compare it to the current map size. Is this the correct way to calculate the used space?

That is indeed the approach I'm using as well (and the only one I could find).
For every database:
MDB_stat stat;
mdb_stat(d->transaction, d->dbi, &stat);
/* bytes used by this DB: page size times all pages in use */
auto dbSize = stat.ms_psize * (stat.ms_leaf_pages + stat.ms_branch_pages + stat.ms_overflow_pages);
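Putting it together, a rough (untested) sketch of the whole check might look like this; dbis and num_dbis are placeholders for however you track your open databases, and the configured map size comes from mdb_env_info():

#include <lmdb.h>
#include <stdio.h>

/* Sum the pages used by every database you have open. */
size_t used_bytes(MDB_txn *txn, MDB_dbi *dbis, size_t num_dbis)
{
    size_t used = 0;
    for (size_t i = 0; i < num_dbis; i++) {
        MDB_stat st;
        if (mdb_stat(txn, dbis[i], &st) == MDB_SUCCESS)
            used += st.ms_psize *
                    (st.ms_branch_pages + st.ms_leaf_pages + st.ms_overflow_pages);
    }
    return used;
}

void report_free_space(MDB_env *env, MDB_txn *txn, MDB_dbi *dbis, size_t num_dbis)
{
    MDB_envinfo info;
    mdb_env_info(env, &info);   /* info.me_mapsize is the configured map size */
    size_t used = used_bytes(txn, dbis, num_dbis);
    printf("used %zu of %zu bytes (%zu free)\n",
           used, (size_t)info.me_mapsize, (size_t)info.me_mapsize - used);
}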

Related

OpenCL: What if I have more tasks than available work items?

Let's make an example:
I want a vector dot product computed concurrently (it's not my actual case, just an example), so I have 2 large input vectors and a large output vector of the same size. The available work items are fewer than the size of these vectors. How can I do this dot product in OpenCL if there are fewer work items than elements? Is this possible, or do I just have to use some tricks?
Something like:
for (i = 0; i < n; i++) {
    output[i] = input1[i] * input2[i];
}
with n > available work items
If by "available work items" you mean you're running into the maximum given by CL_DEVICE_MAX_WORK_ITEM_SIZES, you can always enqueue your kernel multiple times for different ranges of the array.
Depending on your actual workload, it may be more sensible to make each work item perform more work though. In the simplest case, you can use the SIMD types such as float4, float8, float16, etc. and operate on large chunks like that in one go. As always though, there is no replacement for trying different approaches and measuring the performance of each.
Divide and conquer the data. If you keep the work-group size an integer divisor of the global work size, you can split the N work-group launches into batches of, say, k work-groups per kernel launch: enqueue N/k kernels, each with k*workgroup_size work-items, and address the buffers inside the kernel with the appropriate offset.
When you have per-work-group partial sums of the partial dot products (after several in-group reduction steps), you can simply sum them on the CPU, or on whichever device the data is headed to next.
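Both answers boil down to splitting the range across several launches. A rough host-side sketch of that (assuming OpenCL 1.1+ for the global work offset, with queue and kernel set up elsewhere):

#include <CL/cl.h>

/* Process n elements in chunks, reusing one kernel with a different
 * global work offset per launch; work-items then use get_global_id(0)
 * as their absolute element index. */
void run_in_chunks(cl_command_queue queue, cl_kernel kernel, size_t n, size_t chunk)
{
    for (size_t offset = 0; offset < n; offset += chunk) {
        size_t global = (n - offset < chunk) ? (n - offset) : chunk;
        clEnqueueNDRangeKernel(queue, kernel, 1,
                               &offset,  /* global_work_offset */
                               &global,  /* global_work_size */
                               NULL,     /* let the runtime pick the local size */
                               0, NULL, NULL);
    }
    clFinish(queue);
}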

What is a quick way to check if file contents are null?

I have a rather large file (32 GB) which is an image of an SD card, created using dd.
I suspected that the file was empty (i.e. filled with the null byte \x00) from a certain point onward.
I checked this using python in the following way (where f is an open file handle with the cursor at the last position I could find data at):
for i in xrange(512):
    if set(f.read(64*1048576)) != set(['\x00']):
        print i
        break
This worked well (in fact it revealed some data at the very end of the image), but took >9 minutes.
Has anyone got a better way to do this? There must be a much faster way, I'm sure, but cannot think of one.
Looking at a guide about memory buffers in Python here, I suspected that the comparison itself was the issue. In most dynamically typed languages memory copies are not very obvious, despite being a killer for performance.
In this case, as Oded R. established, reading into a preallocated buffer and comparing it against a previously prepared null-filled one is much more efficient.
size = 512
data = bytearray(size)   # reusable read buffer
cmp = bytearray(size)    # all-zero reference buffer
And when reading:
f = open(FILENAME, 'rb')
f.readinto(data)
Two things need to be taken into account:
The compared buffers should be of equal size, but comparing bigger buffers should be faster up to a point (I would expect memory fragmentation to be the main limit).
The last chunk may not be a full buffer; reading the file into the prepared buffer keeps the trailing zeroes where we want them.
Here the comparison of the two buffers will be quick, there will be no attempt to cast the bytes to strings (which we don't need), and since we reuse the same memory all the time, the garbage collector won't have much work to do either... :)
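Putting the pieces together, a sketch of the whole scan (untested; FILENAME and the chunk size are placeholders):

CHUNK = 64 * 1048576
zeros = bytearray(CHUNK)    # all-\x00 reference buffer
data = bytearray(CHUNK)     # reusable read buffer

with open(FILENAME, 'rb') as f:
    offset = 0
    while True:
        n = f.readinto(data)
        if not n:
            break                       # reached EOF without finding data
        # full chunks compare buffer against buffer; the final partial
        # chunk falls back to a sliced comparison
        same = (data == zeros) if n == len(data) else (data[:n] == zeros[:n])
        if not same:
            print('non-null data near offset %d' % offset)
            break
        offset += n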

Is the unit of VmData in proc/[pid]/status stored elsewhere?

Is it safe to assume that VmData, VmStk and VmExe (the sizes of the data, stack, and text segments) are always expressed in kB? Is this unit stored somewhere, or are these values the only place where the unit information lives?
I can't find any documentation that says for certain that it is in kB, but my best guess is that it is. Here are some programs I ran across that depend on it being in kB:
https://github.com/pmav/procvm/blob/master/procvm.sh
Look at the first code here:
http://locklessinc.com/articles/memory_usage/
Search for "Checking memory from a serial C program" here:
http://www.umbc.edu/hpcf/resources-tara/checking-memory-usage.html
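As a sanity check you don't have to assume the unit at all: each Vm* line in /proc/[pid]/status carries its unit after the value, so you can parse it along with the number. A small C sketch reading the calling process's own status:

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *fp = fopen("/proc/self/status", "r");
    char line[256], unit[16] = "?";
    long val = -1;

    if (!fp)
        return 1;
    while (fgets(line, sizeof line, fp)) {
        if (strncmp(line, "VmData:", 7) == 0) {   /* e.g. "VmData:    1234 kB" */
            sscanf(line + 7, "%ld %15s", &val, unit);
            break;
        }
    }
    fclose(fp);
    printf("VmData = %ld %s\n", val, unit);       /* unit is "kB" on current kernels */
    return 0;
}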

VimL: Get extra KB from function that outputs file size

Right now I'm creating a plugin of sorts for Vim, it's meant to simply have all kinds of utility functions to put in your statusline, here's the link: https://github.com/Greduan/vim-usefulstatusline
Right now I have this function: https://github.com/Greduan/vim-usefulstatusline/blob/master/autoload/usefulstatusline_filesize.vim
It simply converts the file size from bytes up to megabytes. Currently, once the file size reaches 1MB, for example, it outputs 1MB, which is fine, but I would also like it to output the extra bytes or KB on top of that.
For example, instead of outputting 1MB it would output 1MB-367KB, see what I mean? It would output the biggest unit, and then the remainder that follows it. It's hard to explain.
So how would I modify the current function(s) to output the size this way?
Thanks for your help! Any of it is appreciated. :)
Who needs this? I doubt it would be convenient for anyone (especially with small remainders like 1MB + 3KB); using 1.367MB is much better. I see in your code that you have neither MB (1000*1000 B) nor MiB (1024*1024 B); 1000*1024 bytes is very strange. Also, don't use getfsize(): it is wrong for any non-file buffer, which you constantly see in plugins. Use line2byte(line('$')+1)-1 instead.
For 1.367MB you can just rewrite humanize_bytes function in VimL if you are fine with depending on +float feature.
Using integer arithmetic you can get the remainder with
let kbytes_remainder = kbytes % 1000
And switch to either MiB/KiB (M/K, without the B, is the common shorthand ls uses) or MB/KB.
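A minimal sketch of such a humanize function in VimL (assuming +float and decimal units, i.e. MB = 1000*1000 B):

" Format a byte count as B / KB / MB with a fractional part.
function! s:HumanizeBytes(bytes) abort
  if a:bytes >= 1000000
    return printf('%.3fMB', a:bytes / 1000000.0)
  elseif a:bytes >= 1000
    return printf('%.1fKB', a:bytes / 1000.0)
  else
    return a:bytes . 'B'
  endif
endfunction

" Buffer size without getfsize():
" echo s:HumanizeBytes(line2byte(line('$') + 1) - 1)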

Unexpected memory usage of List<T>

I always thought that the default constructor for List would initialize a list with a capacity of 4 and that the capacity would be doubled when adding the 5th element, etc...
In my application I create a lot of lists (a tree-like structure where each node can have many children). Some of these nodes won't have any children, and since my application was fast but also used a bit too much memory, I decided to use the constructor that lets me specify the capacity, and set it to 1.
The strange thing is that memory usage with an initial capacity of 1 is about 15% higher than with the default constructor. It can't be a better fit with 4, since the growth from 1 would be 1, 2, 4. So why this extra increase in memory usage? As an additional test I tried starting with a capacity of 4; again memory usage was about 15% higher than with no capacity specified.
Now this really isn't a problem, but it bothers me that a pretty simple data structure that I've used for years has some extra logic that I didn't know about yet. Does anyone have an idea of the inner workings of List in this aspect?
That's because with the default constructor the internal storage array is set to a shared empty array, whereas with the capacity constructor an array of the requested size is allocated immediately instead of being created on the first call to Add.
You can see this using a decompiler like JustDecompile:
public List(int capacity)
{
    if (capacity < 0)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.capacity, ExceptionResource.ArgumentOutOfRange_NeedNonNegNum);
    }
    this._items = new T[capacity];
}

public List()
{
    this._items = List<T>._emptyArray;
}
If you look at the Add method, it calls EnsureCapacity, which enlarges the internal storage array when required. Obviously, if the array starts out as the empty array, the first Add will create an array of the default size.
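A quick way to see this behaviour for yourself (on a standard .NET runtime) is to watch Capacity as items are added:

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var byDefault = new List<int>();    // backed by the shared empty array
        var preSized  = new List<int>(1);   // allocates int[1] right away
        Console.WriteLine("initial capacity: default {0}, pre-sized {1}",
                          byDefault.Capacity, preSized.Capacity);   // 0 and 1

        for (int i = 0; i < 6; i++)
        {
            byDefault.Add(i);
            preSized.Add(i);
            Console.WriteLine("count {0}: default {1}, pre-sized {2}",
                              byDefault.Count, byDefault.Capacity, preSized.Capacity);
        }
        // the default list grows 0 -> 4 -> 8, the pre-sized list grows 1 -> 2 -> 4 -> 8
    }
}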
