Is g_slice really faster than malloc?

The GLib docs recommend use of the GLib Slice Allocator over malloc:
"For newly written code it is recommended to use the new g_slice API instead of g_malloc() and friends, as long as objects are not resized during their lifetime and the object size used at allocation time is still available when freeing."
-- http://developer.gnome.org/glib/unstable/glib-Memory-Slices.html
But in practice, is g_slice significantly faster than Windows/Linux malloc (fast enough to warrant the extra trouble of handling sizes and GLib's preprocessor hacks like g_slice_new)? I'm planning to use GLib in my C++ program to handle INI-ish configuration (GKeyFile) and to get access to data structures not available in C++, like GHashTable, so the GLib dependency doesn't matter anyway.

Whether it's enough faster to be worth it depends on your app, but it should be faster.
There is another issue besides speed: memory fragmentation and per-block overhead. GSlice leaves large or variable-size allocations to malloc while handling small, known-size objects more space-efficiently.
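For reference, the "extra trouble of handling sizes" mentioned in the question mostly amounts to this. A minimal sketch of the public GSlice API (the Point type is made up for illustration; compile against glib-2.0 via pkg-config):

    #include <glib.h>

    typedef struct {
        double x, y;
    } Point;

    int main(void)
    {
        /* Type-based convenience macros: the size comes from the type. */
        Point *p = g_slice_new(Point);    /* uninitialised */
        Point *q = g_slice_new0(Point);   /* zero-filled */
        p->x = 1.0;
        p->y = 2.0;
        g_slice_free(Point, p);           /* the type supplies the size again */
        g_slice_free(Point, q);

        /* Size-based calls: you must pass the same size when freeing. */
        gpointer buf = g_slice_alloc(64);
        g_slice_free1(64, buf);

        return 0;
    }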

The slice API borrows heavily from research done at Sun Microsystems in the early 1990s, where the technique was called slab allocation (Jeff Bonwick's allocator). I could not find the original research paper, but there is a Wikipedia page about it, or you can just google for "slab allocation".
Essentially it eliminates expensive allocation/deallocation operations by facilitating reuse of memory blocks. It also reduces or eliminates memory fragmentation. So it is not all about speed, even though it should improve it as well.
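To make the reuse idea concrete, here is an illustrative sketch (not GSlice's actual code) of a fixed-size free list, which is the core trick behind slab-style allocators:

    /* Freed blocks are kept and handed out again instead of round-tripping
     * through malloc/free on every allocation. */
    #include <stdlib.h>

    typedef union block {
        union block *next;   /* valid only while the block sits on the free list */
        char payload[64];    /* the fixed object size this "slab" serves */
    } block;

    static block *free_list;

    void *slab_alloc(void)
    {
        if (free_list) {                  /* fast path: reuse a previously freed block */
            block *b = free_list;
            free_list = b->next;
            return b;
        }
        return malloc(sizeof(block));     /* slow path: grow (real slabs grab whole pages) */
    }

    void slab_free(void *p)
    {
        block *b = p;
        b->next = free_list;              /* push back for reuse; no free() call */
        free_list = b;
    }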
Whether you should use it or not - it depends... Look at Havoc's answer; he summarized it pretty well.
Update 1:
Note that modern Linux kernels include the SLAB allocator as one option, and it is often the default. So the difference between g_slice() and malloc() may be unnoticeable in that case. However, the purpose of GLib is cross-platform compatibility, so using the slice API may somewhat guarantee consistent performance across different platforms.
Update 2:
As a commenter pointed out, my first update is incorrect: SLAB allocation is used inside the kernel for kernel objects, while malloc() uses an unrelated mechanism to obtain memory for user processes, so the claim that malloc() is equivalent to g_slice() on Linux is invalid. Also see this answer for more details.

Why can't we have a safe ISA?

According to this paper: https://doi.org/10.1109/SP.2013.13, memory corruption bugs are one of the oldest problems in computer security. The lack of memory safety and type safety has caused countless bugs, costing billions of dollars and huge efforts to fix them.
But the root of C/C++'s memory vulnerability can be traced down to the ISA level. At the ISA level, every instruction can access any memory address without any fine-grained safety check (only coarse-grained checks like page faults). Sure, we can implement memory safety at a higher software level, like Java (JVM), but this comes at a significant performance cost. In a word, we can't have both safety and performance at the same time on existing CPUs.
My question is, why can't we implement the safety at the hardware level? If the CPU had a safe ISA, which ensured memory safety by, I don't know, taking over the responsibilities of malloc and free, then maybe we could get rid of the performance penalty of software safety checking. If anyone professional in microelectronics can tell me, is this idea realistic?
Depending on what you mean, it could make it impossible to implement memory-unsafe languages like C in a normal way. E.g. every memory access would have to be to some object that has a known size? I'd guess an operating system for such a machine might have to work around that "feature" by telling it that the entire address space was one large array object. Or else you'd need some mechanism for a read system call to know the proper bounds of the object it's writing into in the copy_to_user() part of its job. And then there's other OS stuff like accessing the same physical page from different virtual pages.
The OP (via asking on Reddit) found the CHERI project which is an attempt at this idea, involving "... revisit fundamental design choices in hardware and software to dramatically improve system security." Changing hardware alone can't work; compilers need to change, too. But they were able to adapt "Clang/LLVM, FreeBSD, FreeRTOS, and applications such as WebKit," so their approach could be practical. (Unlike the hypothetical versions I was imagining when writing other parts of this answer.)
CHERI uses "fine-grained memory protection", and "Language and compiler extensions" to implement memory-safe C and C++, and higher-level languages.
So it's not a drop-in replacement, and it sounds like you have to actively use the features to gain safety. As I argue in the rest of the answer, hardware can't do it alone, and it's highly non-trivial even with software cooperation. It's easy to come up with ways that wouldn't work. :P
For hardware-enforced memory safety to be possible, hardware would have to know about every object and its size, and be able to cache that structure in a way that allows efficient lookups to find the bounds. Page tables (4k granularity, or larger in more modern ISAs) are already hard enough for hardware to cache efficiently for large programs, and that's without even considering which pointer goes with which object.
Checking a TLB as part of every load and store can be done efficiently, but checking another structure in parallel with that might be problematic. Especially when the ranges don't have power-of-2 sizes and natural alignment, the way pages do, which is what makes it possible to build a TLB from content-addressable memory that checks for a match against each of several possible values for the high bits. (E.g. a page is 4k in size, always starting at a 4k alignment boundary.)
You mean it may cost too much at the hardware level, like die area?
Die area might not even be the biggest problem, especially these days. It would cost power, and/or cost latency in very important critical paths such as L1d hit load-use latency. Even if you could come up with some plausible way for software to make tables that hardware could check, or otherwise solve the other parts of this problem.
Modifying a page-table entry requires invalidating the entry, including TLB shootdown for other cores. If every free (and some malloc) cost inter-core communication to do similar things for object tables, that would be very expensive.
I think inventing a way for software to tell the hardware about objects would be an even bigger problem. malloc and free aren't something you can just build in to a CPU where memory addressing works anything like existing CPUs, or like it does in C. Software needs to manage memory, it doesn't make sense to try to build that in to a CPU. So then malloc and free (and mmap with file-backed mappings and shared memory...) need a way to tell the CPU about objects. Seems like a mess.
I think at best an ISA could provide more tools software can use to make bounds-checks cheaper. Perhaps some kind of extra semantics on loads/stores, like an extra operand for indexed addressing modes for load or store that takes a max?
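To illustrate what such a tool would be replacing, this is roughly the check that software has to emit today for every access it wants bounded (the function name and the abort-on-failure policy are just for illustration):

    /* Illustrative only: the per-access check that a hypothetical checked-load
     * instruction could absorb. In software this is an extra compare and branch
     * (or conditional trap) on every access you want bounded. */
    #include <stdlib.h>

    int checked_load(const int *base, size_t length, size_t index)
    {
        if (index >= length)   /* the part hardware could fold into the load itself */
            abort();           /* or raise a fault instead of a library call */
        return base[index];
    }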
At least if we want an ISA to work anything like current ones, rather than work like a JVM or a Transmeta Crusoe and internally recompile for some real ISA.
Intel's MPX ISA extension to x86 was an attempt to let software set up bound ranges, but it's been mostly abandoned due to lower performance than pure software. Intel even dropped it from their recent CPUs (Not present in 10th Gen CPUs using 10nm lithography, or later.)
This is all just off the top of my head; I haven't searched for any serious proposals for how a system could plausibly work.
I don't think memory safety is something you can easily add after the fact to languages like C that weren't originally designed with it.
Have a look at "Code for malloc and free" on SO. Those functions are very, very far away from even being something you could define within an instruction set.

What memory leaks can occur outside the view of GHC's heap profiler

I have a program that exhibits the behavior of a memory leak. It gradually takes up all of the system's memory until it fills all swap space, and then the operating system kills it. This happens once every several days.
I have extensively profiled the heap in a number of ways (-hy, -hm, -hc), tried limiting the heap size (-M128M), and tweaked the number of generations (-G1), but no matter what I do the heap size appears roughly constant and always low (measured in kB, not MB or GB). Yet when I observe the program in htop, its resident memory steadily climbs.
What this indicates to me is that the memory leak is coming from somewhere besides the GHC heap. My program makes use of dependencies, specifically Haskell's yaml library, which wraps the C library libyaml; it is possible that the leak is in the foreign pointers it holds to objects allocated by libyaml.
My question is threefold:
What places besides the GHC heap can memory leak from in a Haskell program?
What tools can I use to track these down?
What changes to my source code need to be made to avoid these types of leaks, as they seem to differ from the more commonly experienced space leaks in Haskell?
This certainly sounds like foreign pointers aren't being finalized properly. There are several possible reasons for this:
The underlying C library doesn't free memory properly.
The Haskell library doesn't set up finalization properly.
The ForeignPtr objects aren't being freed.
I think there's actually a decent chance that it's option 3. If the RTS consistently finds enough memory in the first GC generation, then it just won't bother running a major collection. Fortunately, this is the easiest to diagnose. Just have your program run System.Mem.performGC every so often. If that fixes it, you've found the bug and can tweak just how often you want to do that.
Another possible issue is that you could have foreign pointers lying around in long-lived thunks or other closures. Make sure you don't.
One particularly strong possibility when working with a wrapped C library is that the wrapper functions will return ByteStrings whose underlying arrays were allocated by C code. So any ByteStrings you get back from yaml could potentially be off-heap.

How to work with dynamically allocated memory in Assembly (x64/Linux)?

I'm trying to build a toy language compiler (that generates assembly for NASM) and so far so good, but I got really stuck in the topic of dynamic memory allocation. It's the only part on assembly that's stopping me from starting my implementation. The goal is just to learn how things work at the low level.
Is there a good and comprehensive guide/tutorial/book about how to dynamically allocate, use and free memory using Assembly (preferably x64/Linux)? I have found some tips here and there mentioning brk, sbrk and mmap, but I don't know how to use them and I feel that there is more to it than just checking the arguments and the return value of these syscalls. How do they work exactly?
For example, in this post, it is mentioned that sbrk moves the border of the data segment. Can I know where the border is initially/after calling sbrk? Can I just use my initial data segment for the first dynamic allocations (and how)?
This other post explains how free works in C, but it does not explain how C actually gives the memory back to the OS. I have also started to read some books on assembly, but somehow they seem to ignore this topic (perhaps because it's OS specific).
Are there some working assembly code examples? I really couldn't find enough information.
I know one way is to use glibc's malloc, but I wanted to know how it is done from assembly. How do compiled languages or even LLVM do it? Do they just use C's malloc?
malloc is an interface provided for userspace programs. It may have different implementations, such as ptmalloc, tcmalloc and jemalloc. Depending on the environment, you can choose different allocators to use and even implement your own allocator. As far as I know, jemalloc manages memory for userspace programs by mmap-ing a block of the demanded memory, and jemalloc controls when that block of memory is returned to the kernel/system. (I know jemalloc is used in Android.) jemalloc also uses sbrk depending on the state of system memory. For more detailed info, I think you have to read the code of the different allocators you want to learn about.
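If you want to skip libc entirely, the simplest primitive to start with is the mmap/munmap pair. The sketch below is C, but it maps one-to-one onto the raw syscalls an assembly program would make: on x86-64 Linux, mmap is syscall number 9 and munmap is 11, with the number in rax and arguments in rdi, rsi, rdx, r10, r8, r9.

    /* Minimal sketch: get one page straight from the kernel and give it back.
     * An assembly version makes exactly these two syscalls. */
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 4096;

        /* rax=9 (mmap): addr=NULL, len, PROT_READ|PROT_WRITE,
         * MAP_PRIVATE|MAP_ANONYMOUS, fd=-1, offset=0 */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        ((char *)p)[0] = 'x';            /* the memory is immediately usable */
        printf("got %zu bytes at %p\n", len, p);

        munmap(p, len);                  /* rax=11 (munmap): addr, len */
        return 0;
    }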

Why are semaphores limited in Linux?

We just ran out of semaphores on our Linux box, due to the use of too many Websphere Message Broker instances or somesuch.
A colleague and I got to wondering why this is even limited - it's just a bit of memory, right?
I thoroughly googled and found nothing.
Anyone know why this is?
cheers
Semaphores, when being used, require frequent access with very, very low overhead.
Having an expandable system, where memory for each newly requested semaphore structure is allocated on the fly, would introduce complexity that slows down access: the kernel would first have to look up where the particular semaphore in question is stored, then go fetch that memory and check the value. It is easier and faster to keep them in one compact block of fixed memory that is readily at hand.
Having them dispersed throughout memory via dynamic allocation would also make it more difficult to efficiently use memory pages that are locked (that is, not subject to being swapped out when there are high demands on memory). The use of "locked in" memory pages for kernel data is especially important for time-sensitive and/or critical kernel functions.
Having the limit be a tunable parameter (see links in the comments of original question) allows it to be increased at runtime if needed via an "expensive" reallocation and relocation of the block. But typically this is done one time at system initialization before anything much is even using semaphores.
That said, the amount of memory used by a semaphore set is rather tiny. With modern memory available on systems being in the many gigabytes the original default limits on the number of them might seem a bit stingy. But keep in mind that on many systems semaphores are rarely used by user space processes and the linux kernel finds its way into lots of small embedded systems with rather limited memory, so setting the default limit arbitrarily high in case it might be used seems wasteful.
The few software packages, such as Oracle database for example, that do depend on having many semaphores available, typically do recommend in their installation and/or system tuning advice to increase the system limits.
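If you want to see the limits on your own box, the SysV semaphore tunables are exposed in /proc/sys/kernel/sem (SEMMSL, SEMMNS, SEMOPM and SEMMNI, in that order). A quick sketch that just reads and prints them:

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/kernel/sem", "r");
        unsigned long semmsl, semmns, semopm, semmni;

        if (!f || fscanf(f, "%lu %lu %lu %lu",
                         &semmsl, &semmns, &semopm, &semmni) != 4) {
            perror("/proc/sys/kernel/sem");
            return 1;
        }
        printf("max semaphores per set     (SEMMSL): %lu\n", semmsl);
        printf("max semaphores system-wide (SEMMNS): %lu\n", semmns);
        printf("max ops per semop() call   (SEMOPM): %lu\n", semopm);
        printf("max semaphore sets         (SEMMNI): %lu\n", semmni);
        fclose(f);
        return 0;
    }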

Detect and remove Memory Leak in Linux Application

We have a very large project, basically an application that uses the Linux application programming interfaces and runs on a PowerPC processor. This project was initially developed by another company. We acquired the project from that company and now we are maintaining it.
The application is reported to have a lot of memory leak issues. Since this is a large project, it is not possible to go through each source file and find the memory leaks. We have used Valgrind, mpatrol and other memory leak detection tools. These tools did not help much, and the memory leak has not decreased by a significant percentage.
In this situation, how do we go about reducing the memory leak by a significant amount? Is there a general method people use in these cases, other than memory leak detection tools like the ones mentioned above?
Valgrind is usually among the best tools for this task. If it does not work well for you, there are only a couple of things you can still do.
First question: what language is the application written in? Valgrind is very good for C and C++, but will not help you with garbage-collected or scripting languages. So check the language first. There might be something similar for Java, but I have not used Java much, so you would have to ask someone else.
Play around a lot with the settings of Valgrind. One example could be using --leak-check=full or similar options. There are also plugins for Valgrind that can enhance its detection capabilities.
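For example, a fairly thorough Memcheck run looks something like this (./your_app stands in for your program; both options are standard Memcheck flags):

    valgrind --leak-check=full --show-leak-kinds=all ./your_app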
You say that the application was reported to have a memory leak. How was this detected? Did the application detect this by itself? If it was detected by the application on its own, without any external tools, this probably means someone has added their own memory tracker inside the application. Custom memory trackers, memory pools etc. mess up Valgrind and any other leak detection system very badly. So in case any custom memory handling is present in the application, your only choice is either to deactivate it (if possible) or to hook into this custom mechanism. How this could be done depends on your application.
Add your own memory tracker. For example in C++ it is possible to hook into new/delete calls and get them to track the memory. There are a couple of libraries you can use for this. You can also write your own new/delete replacement in about 500 LOC. If you decide to use this method, be sure to read a lot of tutorials on replacing new/delete, since there are several things that are unusual in the C++ world when attempting this task.
What makes you so sure there is a memory leak in the application (i.e. how was this detected)? If a tool just reported huge amounts of allocated memory, this might not even mean there is an actual memory leak. A memory leak means that the handles to the memory are lost and hence it becomes impossible to ever reach and free that memory again. If your application just gets a lot of memory and keeps it accessible, you probably have a completely different problem. For example, you might simply be using an algorithm with bad space complexity at one point or another, leading to many allocations. In this case you will not need a leak detector, but rather a memory profiler, which gives you a more detailed overview of the memory footprint of different parts of the code. However, I have never used a profiler for this kind of task before, so I cannot give you any more hints on this.
You could replace all memory allocation calls with calls to your own allocation functions, which should call the original functions and at the same time record how much memory is in use and where it was allocated. This will allow you to find the leaks and eliminate them by hand.
There might also be automated tools that allow you to do this - not sure, haven't used any. But this method works.
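A minimal sketch of that approach in plain C (the track_* names, the header trick and the macros are illustrative, not from any library; real projects often do this via LD_PRELOAD or the allocator's own hooks instead):

    #include <stdio.h>
    #include <stdlib.h>

    /* Prefix each block with a header that remembers its size, padded for alignment. */
    typedef union { size_t size; long double align_; } alloc_header;

    static size_t live_bytes;
    static size_t live_blocks;

    void *track_malloc(size_t n, const char *file, int line)
    {
        alloc_header *h = malloc(sizeof *h + n);
        if (!h)
            return NULL;
        h->size = n;
        live_bytes += n;
        live_blocks++;
        (void)file; (void)line;   /* a fuller version would record these per block */
        return h + 1;
    }

    void track_free(void *p)
    {
        if (!p)
            return;
        alloc_header *h = (alloc_header *)p - 1;
        live_bytes -= h->size;
        live_blocks--;
        free(h);
    }

    void track_report(void)
    {
        fprintf(stderr, "still live: %zu bytes in %zu blocks\n",
                live_bytes, live_blocks);
    }

    /* In a project-wide header, included after the system headers: */
    #define malloc(n) track_malloc((n), __FILE__, __LINE__)
    #define free(p)   track_free(p)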
Perhaps you might also consider using Boehm's garbage collector (that is, using GC_malloc instead of malloc etc. and not bothering about freeing data).
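A minimal sketch of what that looks like, assuming the Boehm-Demers-Weiser collector (bdwgc) is installed and you link with -lgc:

    #include <gc.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        GC_INIT();
        for (int i = 0; i < 1000000; i++) {
            char *buf = GC_MALLOC(256);   /* never freed explicitly */
            strcpy(buf, "temporary");
        }
        printf("GC heap size: %zu bytes\n", (size_t)GC_get_heap_size());
        return 0;
    }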
