I want to create an empty list of lists:
shape = (70000,70000)
corr = np.empty(shape).tolist()
How can I know how much RAM I need to hold this list using windows operating system (64 bit)?
This will create a list-of-lists-of-floats. The RAM goes into two things: the references stored in the inner lists and the Python float objects they point to. On a 64-bit build each reference is 8 bytes and each CPython float object is 24 bytes, so the floats account for roughly three quarters of the total. That makes 70000 * 70000 * (8 + 24) bytes, i.e. roughly 157 GB (about 146 GiB).
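A quick back-of-the-envelope check (a sketch; the per-object sizes are CPython- and platform-specific):
import sys

rows = cols = 70000
ptr_size = 8                      # one reference per element in each inner list
float_size = sys.getsizeof(0.0)   # 24 bytes for a CPython float object on a 64-bit build

estimate = rows * cols * (ptr_size + float_size)
print(f"~{estimate / 1024**3:.0f} GiB lower bound")   # roughly 146 GiB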
Lists look like this in memory: a contiguous block of references, each pointing to a separately allocated float object.
The 70001 list objects themselves also have overhead (each maintains a pointer to its storage array and its own length), but this is negligible in comparison (probably ~4 MB).
Also note that Python lists over-allocate space by an implementation-dependent factor, so consider these numbers a lower bound. Memory is over-allocated so that there are always some free slots available, which makes appends and inserts faster. The allocated space grows by roughly 12.5% each time the list fills up.
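A quick way to watch the over-allocation happen (a sketch; exact sizes vary by CPython version):
import sys

lst = []
last = sys.getsizeof(lst)
for _ in range(100):
    lst.append(None)
    size = sys.getsizeof(lst)
    if size != last:                      # capacity jumped: a reallocation happened
        print(f"len={len(lst):3d}  allocated={size} bytes")
        last = size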
In node.js, what is the difference between memory allocated in the node.js heap vs. memory reported as "external", as seen in process.memoryUsage()? Are there advantages or disadvantages to one over the other? If you were going to have a particularly large object (gigabytes), does it matter which one is used? That doc only says this:
external refers to the memory usage of C++ objects bound to JavaScript
objects managed by V8.
It doesn't tell you what the consequences or tradeoffs might be of allocations that use this "external" memory instead of heap memory.
I'm examining tradeoffs between using a large Array (regular Javascript Array object) vs. a large Uint32Array object. Apparently, an Array is always allocated in the heap, but a Uint32Array is always allocated from "external" memory (not in the heap).
Here's the background. I have an object that needs a fairly large array of 32-bit integers as part of what it does. This array can be as much as a billion units long. I realized that I could cut the storage occupied by the array in half if I switched to a Uint32Array instead of a regular Array, because a regular Array stores numbers as double-precision floats (64 bits each) whereas a Uint32Array stores 32-bit values.
In a test jig, I measured that the Uint32Array does indeed use half the total storage for the same number of units (32 bits per unit instead of 64). But when looking at the results of process.memoryUsage(), I noticed that the Uint32Array storage is in the "external" bucket of storage whereas the Array is in the "heapUsed" bucket.
To add a bit more context: because a Uint32Array is not resizable, when I do need to change its size I have to allocate a new Uint32Array and then copy the data over from the original (which I suppose could lead to more memory fragmentation opportunities). With an Array, you can just change the size and the JS engine takes care of whatever has to happen internally to adjust to the new size.
What are the advantages/disadvantages of making a large storage allocation that's "external" vs. in the "heap"? Any reason to care?
As per this page (http://www.timestored.com/kdb-guides/memory-management), the behavior for versions 2.6 and 2.7 is:
2.6 Unreferenced memory blocks over 32MB/64MB are returned immediately
2.7 Unreferenced memory blocks returned when memory full or .Q.gc[] called
But in both versions there is a significant difference between the used and heap space shown by .Q.w[], and this difference only grows as I run the function again. In 2.6 a difference could occur due to fragmentation (allocating many small objects), but I am not confident that accounts for a difference this large. In 2.7 it shows a significant difference even after running .Q.gc[]. I would like to understand the fundamental reason for this difference between the two versions, as shown below.
This is the behavior I am seeing in 2.6 and 2.7:
2.6:
used| 11442889952
heap| 28588376064
2.7 (after running .Q.gc[]):
used| 11398025856
heap| 16508780544
Automatic garbage collection doesn't clear small objects (<32 MB); in that case a manual GC call is required. If your process has a lot of unreferenced small objects, they will add to the heap size but not to the used size.
Second, since KDB allocates memory in powers of 2, that creates a difference between used and heap memory. For example, if a vector requires 64000 bytes, it will be assigned a memory block of size 2^16 = 65536 bytes. Boundary cases make this difference large: for example, if a vector requires 33000 bytes (just over 2^15), it will still be allocated 65536 bytes (2^16).
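A rough sketch of that rounding (illustrative only; kdb+'s allocator has more detail than a bare next-power-of-two):
def next_pow2_block(nbytes):
    # Round a request up to the next power of two (illustrative only).
    size = 1
    while size < nbytes:
        size *= 2
    return size

for need in (64000, 33000):
    block = next_pow2_block(need)
    print(f"need {need} bytes -> block {block} bytes (unused: {block - need})")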
The following site has a good explanation of the GC behavior:
http://www.aquaq.co.uk/q/garbage-collection-kdb/
Why does something like this run extremely slowly in Haskell?
test = [x|a<-[1..100],b<-[1..100],c<-[1..100],d<-[1..100],let x = a]
print $ length test
There are only about 10^8 numbers to run through; it should be done in the blink of an eye, but it seems to run forever and nearly crashes.
Are you running this in ghci or in a compiled program? It makes a big difference.
If in ghci, then ghci will keep the computed value of test around in case you want to use it later. Normally this is a good idea, but not in this case where test is a huge value that would be cheap to recompute anyways. How huge? For starters it's a list of 10^8 elements, and (on a 64-bit system) a list costs 24 bytes per element, so that's 2.4G already. Then there is space usage of the values themselves. One might think the values are all taken from [1..100], so they should be shared and use a negligible amount of space in total. But the values in the list are really of the form x, which could depend on a, b, c and d, and length never examines the values in the list as it traverses it. So each element is going to be represented as a closure that refers to a, b, c and d, which takes at least 8*(4+1) = 40 more bytes, bringing us to a total of 6.4G.
That's rather a lot, and the garbage collector has to do quite a lot of copying when you allocate 6.4G of data, all of it permanently live. That's what takes so long, not actually computing the list or its length.
If you compile the program
test = [x|a<-[1..100],b<-[1..100],c<-[1..100],d<-[1..100],let x = a]
main = print $ length test
then test does not have to be kept live as its length is being computed, as clearly it is never going to be used again. So now the GC has almost no work to do, and the program runs in a couple seconds (reasonable for ~10^8 list node allocations and computations on Integer).
You're not just running a loop 10^8 times, you're creating a list with 10^8 elements. Since you're using length, Haskell has to actually build the entire list to return its length. Each element in the list takes one word, which might be 32 bits or might be 64 bits. On the optimistic assumption that it's 32 bits (4 bytes), you've just allocated 400 MB (about 381.5 MiB) of memory. If it's 64 bits, then that's 800 MB (about 763 MiB) of memory you've just allocated. Depending on what else is going on on your system, you might have just hit the swap file / swap partition by allocating that much RAM in one chunk.
If there are other subtleties going on, I'm not aware of them, but memory usage is my first suspicion for why this is so slow.
PHP has an internal data structure called smart string (smart_str?), which stores both the length and the buffer size. That is, more memory than the length of the string is allocated to improve concatenation performance. Why isn't this data structure used for the actual PHP strings? Wouldn't that lead to fewer memory allocations and better performance?
Normal PHP strings (as of PHP 7) are represented by the zend_string type, which includes both the length of the string and its character data array. zend_strings are usually allocated to fit the character data precisely (alignment notwithstanding): they leave no spare room for appending additional characters.
The smart_str structure includes a pointer to a zend_string and an allocation size. This time, the zend_string will not be precisely allocated. Instead the allocation will be made too large, so that additional characters can be appended without expensive reallocations.
The reallocation policy for smart_str is as follows: First, it will be allocated to have a total size of 256 bytes (minus the zend_string header, minus allocator overhead). If this size is exceeded it will be reallocated to 4096 bytes (minus overhead). After that, the size will increase in increments of 4096 bytes.
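A sketch of that growth policy (ignoring the zend_string header and allocator overhead that the real code subtracts):
def smart_str_capacity(needed):
    # Illustrative model of the smart_str allocation sizes described above.
    if needed <= 256:
        return 256
    if needed <= 4096:
        return 4096
    return ((needed + 4095) // 4096) * 4096   # grow in 4096-byte increments

for n in (10, 300, 5000, 10000):
    print(n, "->", smart_str_capacity(n))      # 256, 4096, 8192, 12288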
Now, imagine that we replace all strings with smart_strings. This would mean that even a single character string would have a minimum allocation size of 256 bytes. Given that most strings in use are small, this is an unacceptable overhead.
So essentially, this is a classic performance/memory tradeoff. We use a memory-compact representation by default and switch to a faster, but less memory-effective representation in the cases that benefit most from it, i.e. cases where large strings are constructed from small parts.
Why is there a hardcoded chunk limit (.5 meg after compression) in memcached? Has anyone recompiled theirs to up it? I know I should not be sending big chunks like that around, but these extra heavy chunks happen for me from time to time and wreak havoc.
This question used to be in the official FAQ
What are some limits in memcached I might hit? (Wayback Machine)
To quote:
The simple limits you will probably see with memcache are the key and
item size limits. Keys are restricted to 250 characters. Stored data
cannot exceed 1 megabyte in size, since that is the largest typical
slab size.
The FAQ has since been revised, and there are now two separate questions covering this:
What is the maximum key length? (250 bytes)
The maximum size of a key is 250 characters. Note this value will be
less if you are using client "prefixes" or similar features, since the
prefix is tacked onto the front of the original key. Shorter keys are
generally better since they save memory and use less bandwidth.
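A client-side guard for the key limit can be sketched like this (the 250-byte figure and the note about prefixes come straight from the FAQ text above):
MAX_KEY_LEN = 250   # keys are limited to 250 bytes

def full_key(key, prefix=""):
    # Return prefix+key as bytes, refusing keys over the limit.
    # The prefix counts against the limit, as noted above.
    full = (prefix + key).encode("utf-8")
    if len(full) > MAX_KEY_LEN:
        raise ValueError(f"key too long: {len(full)} bytes (max {MAX_KEY_LEN})")
    return full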
Why are items limited to 1 megabyte in size?
Ahh, this is a popular question!
Short answer: Because of how the memory allocator's algorithm works.
Long answer: Memcached's memory storage engine (which will be
pluggable/adjusted in the future...), uses a slabs approach to memory
management. Memory is broken up into chunks of varying sizes,
starting at a minimum size and growing by a factor up to the
largest possible value.
Say the minimum value is 400 bytes, the maximum value is 1
megabyte, and the growth factor is 1.20:
slab 1 - 400 bytes
slab 2 - 480 bytes
slab 3 - 576 bytes
... etc.
The larger the slab, the more of a gap there is between it and the
previous slab. So the larger the maximum value the less efficient the
memory storage is. Memcached also has to pre-allocate some memory for
every slab that exists, so setting a smaller growth factor with a larger
max value will require even more overhead.
There are other reasons why you wouldn't want to do that... If we're
talking about a web page and you're attempting to store/load values
that large, you're probably doing something wrong. At that size it'll
take a noticeable amount of time to load and unpack the data structure
into memory, and your site will likely not perform very well.
If you really do want to store items larger than 1MB, you can
recompile memcached with an edited slabs.c:POWER_BLOCK value, or use
the inefficient malloc/free backend. Other suggestions include a
database, MogileFS, etc.
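To make the growth-factor arithmetic in the quoted FAQ concrete, here is a small sketch (400 bytes and 1.20 are the FAQ's example numbers, not memcached's actual defaults):
def slab_sizes(minimum=400, factor=1.20, maximum=1024 * 1024):
    # Yield chunk sizes: start at `minimum`, multiply by `factor`, cap at `maximum`.
    size = minimum
    while size < maximum:
        yield size
        size = int(size * factor)
    yield maximum

sizes = list(slab_sizes())
print(sizes[:3])                                    # [400, 480, 576], as in the example
print("slab classes up to 1 MB:", len(sizes))
print("gap before the largest class:", sizes[-1] - sizes[-2], "bytes")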