General Information
I'm currently experimenting with the C->Haskell (C2HS) interface generator for Haskell. At first glance it was just awesome: I interfaced a rather complicated C++ library (using a small extern "C" wrapper) in just a couple of hours. (And I had never done any FFI before.)
There was just one problem: how do I get the memory allocated in the C/C++ library freed? I found {#pointer ... foreign #} in the C2HS documentation, and that looks exactly like what I'm after. Since my C wrapper turns the C++ library into a referentially transparent library with a functional interface, the Haskell storage manager should be able to do the hard work for me :-). Unfortunately, I was not able to get this working. To better explain my problem, I set up a small demo project on GitHub that has the same properties as the C/C++ library plus wrapper, but without the overhead. As you can see, the library is completely safe to use with pure unsafe FFI.
Demo Project
On GitHub, I created a small demo project organized as follows:
C library
The C library is very simple and useless: you pass it an integer and you can get as many integers (currently [0..n]) back from it. Remember: the library is useless, it is just a demo. The interface is quite simple, too: the function LTIData lti_new_data(int n) takes an integer and returns an opaque object containing data allocated by the C library. The library also has two accessor functions, int lti_element_count(LTIData data) and int lti_get_element(LTIData data, int n); the former returns the number of elements and the latter returns element n. Last but not least, the user of the library should free the opaque LTIData after use by calling void lti_free_data(LTIData data).
Experiment/C2HSBinding/Foreign/lib_to_interface.h
Experiment/C2HSBinding/Foreign/lib_to_interface.c
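For readers who do not want to open the repository, here is a minimal sketch of what a library with this interface could look like. It is an assumption for illustration only: the struct layout, the choice of storing the values 0..n, and the two-step allocation are mine, and the real lib_to_interface.c may allocate differently. The point it illustrates is that the handle owns a second allocation, so a plain free() on the handle would leak it.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Opaque handle, matching the interface described above. */
typedef struct lti_data *LTIData;

/* Hypothetical layout; the real library may differ. */
struct lti_data {
    int  count;     /* number of stored elements */
    int *elements;  /* separate allocation: free(handle) alone would leak it */
};

LTIData lti_new_data(int n)
{
    assert(n >= 0 && n < INT_MAX);
    LTIData data = malloc(sizeof *data);
    if (data == NULL)
        return NULL;
    data->count = n + 1;  /* assumed: the values 0..n inclusive */
    data->elements = malloc((size_t)data->count * sizeof *data->elements);
    if (data->elements == NULL) {
        free(data);
        return NULL;
    }
    for (int i = 0; i < data->count; i++)
        data->elements[i] = i;
    return data;
}

int lti_element_count(LTIData data)
{
    assert(data != NULL);
    return data->count;
}

int lti_get_element(LTIData data, int n)
{
    assert(data != NULL && n >= 0 && n < data->count);
    return data->elements[n];
}

/* Releases the inner allocation first, then the handle itself. */
void lti_free_data(LTIData data)
{
    if (data == NULL)
        return;
    free(data->elements);
    free(data);
}
```

A caller would pair every lti_new_data with exactly one lti_free_data, which is the call a Haskell finalizer has to make on the caller's behalf.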
Low-level Haskell Binding
The low-level Haskell binding is set up using C2HS; you can find it in
Experiment/C2HSBinding/Foreign/HsLTI.chs
High-level Haskell API
For fun I also set up a kind of high-level Haskell API on top of the low-level binding, plus a simple driver program that uses the high-level API. Using the driver program and e.g. valgrind, one can easily see the leaked memory (for parameters p_1, p_2, ..., p_n the library performs \sum_{i=1}^{n} (1 + p_i) allocations, as can be observed below):
$ valgrind dist/build/TestHsLTI/TestHsLTI 100 2>&1 | grep -e allocs -e frees
==22647== total heap usage: 184 allocs, 74 frees, 148,119 bytes allocated
$ valgrind dist/build/TestHsLTI/TestHsLTI 100 100 2>&1 | grep -e allocs -e frees
==22651== total heap usage: 292 allocs, 80 frees, 181,799 bytes allocated
$ valgrind dist/build/TestHsLTI/TestHsLTI 100 100 100 2>&1 | grep -e allocs -e frees
==22655== total heap usage: 400 allocs, 86 frees, 215,479 bytes allocated
$ valgrind dist/build/TestHsLTI/TestHsLTI 100 100 100 100 2>&1 | grep -e allocs -e frees
==22659== total heap usage: 508 allocs, 92 frees, 249,159 bytes allocated
Current State of the Demo
You should be able to clone, compile and run the project by simply typing git clone https://github.com/weissi/c2hs-experiments.git && cd c2hs-experiments && cabal configure && cabal build && dist/build/TestHsLTI/TestHsLTI
So what's the Problem again?
The problem is that the project only uses Foreign.Ptr and not the "managed" version, Foreign.ForeignPtr, via C2HS's {#pointer ... foreign #}, and I can't get that to work. In the demo project I also added a .chs file that tries to use these foreign pointers, but it does not work :-(. I tried very hard but didn't have any success.
There is also one thing I don't understand: how do I tell GHC, via C2HS, how to free the library's data? The demo project's library provides a function void lti_free_data(LTIData data) that should be called to free the memory. But GHC can't guess that!?! If GHC uses a regular free(), not all of the memory will get freed :-(.
Problem solved: I found this file doing something similar on the internet and was able to solve it :-).
All it needed was some boilerplate marshalling code:
foreign import ccall "lib_to_interface.h &lti_free_data"
  ltiFreeDataPtr :: FunPtr (Ptr LTIDataHs -> IO ())

newObjectHandle :: Ptr LTIDataHs -> IO LTIDataHs
newObjectHandle p = do
  fp <- newForeignPtr ltiFreeDataPtr p
  return $ LTIDataHs fp
Here's the final managed (ForeignPtr) version of the .chs file.
I am trying to reproduce a problem.
My C code is aborting with SIGABRT; I traced it back to line 3174 of this file:
https://elixir.bootlin.com/glibc/glibc-2.27/source/malloc/malloc.c
/* Little security check which won't hurt performance: the allocator
never wrapps around at the end of the address space. Therefore
we can exclude some size values which might appear here by
accident or by "design" from some intruder. We need to bypass
this check for dumped fake mmap chunks from the old main arena
because the new malloc may provide additional alignment. */
if ((__builtin_expect ((uintptr_t) oldp > (uintptr_t) -oldsize, 0)
|| __builtin_expect (misaligned_chunk (oldp), 0))
&& !DUMPED_MAIN_ARENA_CHUNK (oldp))
malloc_printerr ("realloc(): invalid pointer");
My understanding is that memory gets allocated when I call calloc, and that when I then call realloc to enlarge that memory area, the heap is not available for some reason, which results in SIGABRT.
My other question is: How can I limit the heap area to some bytes say, 10 bytes, to replicate the problem? On Stack Overflow, RLIMIT and setrlimit are mentioned, but without sample code. Can you provide sample code where heap size is 10 Bytes ?
How can I limit the heap area to some bytes say, 10 bytes
Can you provide sample code where heap size is 10 Bytes ?
From How to limit heap size for a c code in linux , you could do:
You could use (inside your program) setrlimit(2), probably with RLIMIT_AS (as cited by Ouah's answer).
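#include <sys/resource.h>
int main(void) {
    /* Cap the address space (soft and hard limit) at 10 bytes. */
    if (setrlimit(RLIMIT_AS, &(struct rlimit){10, 10}) != 0)
        return 1;
    /* Allocations that need new address space fail from here on. */
    return 0;
}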
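Better yet, make your shell do it. With bash it is the ulimit builtin. Note that ulimit -v counts in units of 1024 bytes, so the following caps virtual memory at 10 KB rather than 10 bytes.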
$ ulimit -v 10
$ ./your_program.out
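To see what a program actually observes under such a limit, here is a minimal sketch, assuming Linux with glibc. The function name malloc_fails_under_limit is hypothetical. It applies the limit in a forked child so the calling process keeps its own limits, and it lowers only the soft limit. Under these assumptions a large malloc is expected to return NULL, not to abort.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Keeps the compiler from optimising the allocation away. */
static void *volatile sink;

/* Hypothetical helper: returns 1 if malloc(request) fails in a child whose
   address space is capped at limit_bytes, 0 if it succeeds, -1 on errors. */
int malloc_fails_under_limit(rlim_t limit_bytes, size_t request)
{
    assert(request > 0);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct rlimit lim;
        if (getrlimit(RLIMIT_AS, &lim) != 0)
            _exit(2);
        lim.rlim_cur = limit_bytes;      /* lower only the soft limit */
        if (setrlimit(RLIMIT_AS, &lim) != 0)
            _exit(2);
        sink = malloc(request);
        _exit(sink == NULL ? 1 : 0);     /* report the outcome via exit code */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

Calling malloc_fails_under_limit(10, 64u * 1024 * 1024) asks for 64 MB under a 10-byte cap. Small requests may still succeed, because they can be served from heap memory the process already owns.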
to replicate the problem
Most probably, limiting the heap size will just produce a different problem, one caused by the limit itself. It is most likely unrelated to your abort and will not help you debug it. Instead, I would suggest looking into AddressSanitizer and valgrind.
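The check quoted from malloc.c in the question looks only at the pointer and the chunk size, not at how much heap is left, which fits the advice to look for a bad or corrupted pointer. Below is a rough standalone sketch of the same two conditions. The function name is hypothetical, the 16-byte alignment is an assumption (typical for 64-bit glibc, where the real value is MALLOC_ALIGNMENT), and the DUMPED_MAIN_ARENA_CHUNK exception is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment; glibc derives the real value from MALLOC_ALIGNMENT. */
#define SKETCH_ALIGNMENT ((uintptr_t)16)

/* Hypothetical helper mirroring the two tests in the quoted check:
   1. the chunk would wrap around the end of the address space, or
   2. the chunk address is not suitably aligned.
   Returns 1 if a pointer like this would be rejected, 0 otherwise. */
int chunk_looks_invalid(uintptr_t chunk_addr, size_t chunk_size)
{
    assert(chunk_size > 0);
    int wraps = chunk_addr > (uintptr_t)-chunk_size;
    int misaligned = (chunk_addr & (SKETCH_ALIGNMENT - 1)) != 0;
    return wraps || misaligned;
}
```

A pointer that was never returned by malloc/calloc/realloc, or a chunk header overwritten by an out-of-bounds write, can trip either condition regardless of how much memory is free.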
I am guessing (hoping) the answer is never.
That such memory must be explicitly freed.
For example, if I wrote:
julia> x = Libc.malloc(1_000_000)
Ptr{Void} @0x0000000002f6bd80
julia> x = nothing
have I just leaked ~1MB of memory?
However I am not 100% certain this is true,
because the docs don't mention it at all.
help?> Libc.malloc(3)
malloc(size::Integer) -> Ptr{Void}
Call malloc from the C standard library.
Yes, you are correct.
Julia is designed to interoperate seamlessly with C at a low level, so when you use the C wrapper libraries, you get C semantics and no garbage collection.
The docs for Libc.malloc are not written to teach C, but they could be improved to mention Libc.free, in case anyone gets confused.
Yet one more answer
Yes, you leaked 1 MB of memory. But there is a mechanism that implements ownership transfer:
struct MyStruct
...
end
n = 10
x = Base.Libc.malloc(n * sizeof(MyStruct)) # returns Ptr{Nothing}
xtyped = convert(Ptr{MyStruct}, x) # something like reinterpret cast
vector = unsafe_wrap(Array, xtyped, n; own = true) # returns Vector{MyStruct}
N.B. The last line transfers ownership of the memory to Julia; from this point on it is better to avoid using x and xtyped, as they can point to already freed memory.
Such low-level kung fu can prove helpful while dealing with binary files especially with function unsafe_read.
Alternatively, as it was mentioned you can use Base.Libc.free(x) to manually free up memory.
P.S. However, it is often better to rely on built-in memory management. By default, Julia tries to allocate immutable structs on the stack, which improves performance.
Another microbenchmark: Why is this "loop" (compiled with ghc -O2 -fllvm, 7.4.1, Linux 64bit 3.2 kernel, redirected to /dev/null)
mapM_ print [1..100000000]
about 5x slower than a simple for loop in plain C using the unbuffered write(2) syscall? I am trying to gather Haskell gotchas.
Even this slow C solution is much faster than Haskell
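#include <stdio.h>
#include <string.h>
#include <unistd.h>
int main(void) {
    int i;
    char buf[16];
    for (i=0; i<=100000000; i++) {
        sprintf(buf, "%d\n", i);
        write(1, buf, strlen(buf));   /* one unbuffered syscall per line */
    }
    return 0;
}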
Okay, on my box the C code, compiled with gcc -O3, takes about 21.5 seconds to run, and the original Haskell code about 56 seconds. So not a factor of 5, but a bit above 2.5.
The first nontrivial difference is that
mapM_ print [1..100000000]
uses Integers, that's a bit slower because it involves a check upfront, and then works with boxed Ints, while the Show instance of Int does the conversion work on unboxed Int#s.
Adding a type signature, so that the Haskell code works on Ints,
mapM_ print [1 :: Int .. 100000000]
brings the time down to 47 seconds, a bit above twice the time the C code takes.
Now, another big difference is that show produces a linked list of Char and doesn't just fill a contiguous buffer of bytes. That is slower too.
Then that linked list of Chars is used to fill a byte buffer that then is written to the stdout handle.
So, the Haskell code does more, and more complicated things than the C code, thus it's not surprising that it takes longer.
Admittedly, it would be desirable to have an easy way to output such things more directly (and hence faster). However, the proper way to handle it is to use a more suitable algorithm (that applies to C too). A simple change to
putStr . unlines $ map show [0 :: Int .. 100000000]
almost halves the time taken, and if one wants it really fast, one uses the faster ByteString I/O and builds the output efficiently as exemplified in applicative's answer.
On my (rather slow and outdated) machine the results are:
$ time haskell-test > haskell-out.txt
real 1m57.497s
user 1m47.759s
sys 0m9.369s
$ time c-test > c-out.txt
real 7m28.792s
user 1m9.072s
sys 6m13.923s
$ diff haskell-out.txt c-out.txt
$
(I have fixed the list so that both C and Haskell start with 0).
Yes you read this right. Haskell is several times faster than C. Or rather, normally buffered Haskell is faster than C with write(2) non-buffered syscall.
(When measuring output to /dev/null instead of a real disk file, C is about 1.5 times faster, but who cares about /dev/null performance?)
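For comparison, here is a minimal sketch of what buffering on the C side could look like: lines are formatted into one large buffer and handed to write(2) only when it is nearly full. The function names and the 64 KB buffer size are assumptions for illustration. It was not benchmarked here and is not the code measured above.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Writes all len bytes of buf to fd, retrying after partial writes.
   Returns the number of write(2) calls made, or -1 on error. */
static long write_all(int fd, const char *buf, size_t len)
{
    long calls = 0;
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
        calls++;
    }
    return calls;
}

/* Hypothetical helper: writes lo..hi (inclusive), one number per line,
   flushing a 64 KB buffer only when it is nearly full.
   Returns the number of write(2) calls made, or -1 on error. */
long write_range_buffered(int fd, int lo, int hi)
{
    char buf[1 << 16];
    size_t used = 0;
    long calls = 0;

    assert(lo <= hi);
    for (long i = lo; i <= hi; i++) {
        if (sizeof buf - used < 24) {   /* room for one more line? */
            long c = write_all(fd, buf, used);
            if (c < 0)
                return -1;
            calls += c;
            used = 0;
        }
        used += (size_t)sprintf(buf + used, "%ld\n", i);
    }
    long c = write_all(fd, buf, used);  /* flush the remainder */
    return c < 0 ? -1 : calls + c;
}
```

With this shape the number of syscalls depends on the output size in bytes, not on the number of lines, which is the property buffered Haskell I/O has by default.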
Technical data: Intel E2140 CPU, 2 cores, 1.6 GHz, 1M cache, Gentoo Linux, gcc4.6.1, ghc7.6.1.
The standard Haskell way to hand giant bytestrings over to the operating system is to use a builder monoid.
import Data.ByteString.Lazy.Builder -- requires bytestring-0.10.x
import Data.ByteString.Lazy.Builder.ASCII -- omit for bytestring-0.10.2.x
import Data.Monoid
import System.IO
main = hPutBuilder stdout $ build [0..100000000::Int]

build = foldr add_line mempty
  where add_line n b = intDec n <> charUtf8 '\n' <> b
which gives me:
$ time ./printbuilder >> /dev/null
real 0m7.032s
user 0m6.603s
sys 0m0.398s
in contrast to the Haskell approach you used:
$ time ./print >> /dev/null
real 1m0.143s
user 0m58.349s
sys 0m1.032s
That is, it's child's play to do nine times better than mapM_ print, contra Daniel Fischer's surprising defeatism. Everything you need to know is here: http://hackage.haskell.org/packages/archive/bytestring/0.10.2.0/doc/html/Data-ByteString-Builder.html I won't compare it with your C, since my results were much slower than Daniel's and n.m.'s, so I figure something was going wrong.
Edit: Made the imports consistent with all versions of bytestring-0.10.x. It occurred to me that the following might be clearer, the Builder equivalent of unlines . map show:
main = hPutBuilder stdout $ unlines_ $ map intDec [0..100000000::Int]
  where unlines_ = mconcat . map (<> charUtf8 '\n')
I have a curious memory leak, it seems that the library function to_unbounded_string is leaking!
Code snippets:
procedure Parse (Str : in String;
... do stuff...
declare
New_Element : constant Ada.Strings.Unbounded.Unbounded_String :=
Ada.Strings.Unbounded.To_Unbounded_String (Str); -- this leaks
begin
valgrind output:
==6009== 10,276 bytes in 1 blocks are possibly lost in loss record 153 of 153
==6009== at 0x4025BD3: malloc (vg_replace_malloc.c:236)
==6009== by 0x42703B8: __gnat_malloc (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x4269480: system__secondary_stack__ss_allocate (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x414929B: ada__strings__unbounded__to_unbounded_string (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x80F8AD4: syntax__parser__dash_parser__parseXn (token_parser_g.adb:35)
Where token_parser_g.adb:35 is listed above as the "-- this leaks" line.
Other info: Gnatmake version 4.4.5. gcc version 4.4 valgrind version valgrind-3.6.0.SVN-Debian, valgrind options -v --leak-check=full --read-var-info=yes --show-reachable=no
Any help or insights appreciated,
NWS.
Valgrind clearly says that there is possibly a memory leak. That doesn't necessarily mean there is one. For example, if the first call to that function allocates a pool of memory that is reused during the lifetime of the program but is never freed, Valgrind will report it as a possible memory leak even though it is not one: this is common practice, and the memory is returned to the OS upon process termination.
Now, if you think that there is a real memory leak, call this function in a loop and see if memory continues to grow. If it does, file a bug report or, even better, try to find and fix the leak and send a patch along with the bug report.
Hope it helps.
Was trying to keep this to comments, but what I was saying got too long and started to need formatting.
In Ada, string objects are generally assumed to be perfectly sized. The language provides functions to return the size and bounds of any string. Because of this, string handling in Ada is very different from C, and in fact more closely resembles how you'd do it in a functional language like Lisp.
But the basic principle is that, except in some very unusual situations, if you find yourself using Ada.Strings.Unbounded, you are going about things the wrong way.
The one case where you really can't get around using a variable-length string (or perhaps a buffer with a separate valid_length variable), is when reading strings as input from some external source. As you say, your parsing example is such a situation.
However, even here you should only have that situation on the initial buffer. Your call to your Parse routine should look something like this:
Ada.Text_IO.Get_Line (Buffer, Buffer_Len);
Parse (Buffer(Buffer'first..Buffer'first + Buffer_Len - 1));
Now inside the Parse routine you have a perfectly-sized constant Ada string to work with. If for some reason you need to pull out a subslice, you would do the following:
... --// Code to find start and end indices of my subslice
New_Element : constant String := Str(Element_Start..Element_End);
If you don't actually need to make a copy of that data for some reason though, you are better off just finding Element_Start and Element_End and working with a slice of the original string buffer. Eg:
if Str(Element_Start..Element_End) = "MyToken" then
I know this doesn't answer your question about Ada.Strings.Unbounded possibly leaking. But even if it doesn't leak, that code is relatively wasteful of machine resources (CPU and memory), and probably shouldn't be used for string manipulation unless you really need it.
Are bound[ed] strings scoped?
Expanding on @T.E.D.'s comments, Ada.Strings.Bounded "objects should not be implemented by implicit pointers and dynamic allocation." Instead, the maximum size is fixed when the generic is instantiated. As an implementation detail, GNAT uses a discriminant to specify the maximum size of the string and a record to store the current size and contents.
In contrast, Ada.Strings.Unbounded requires that "No storage associated with an Unbounded_String object shall be lost upon assignment or scope exit." As an implementation detail, GNAT uses a buffered implementation derived from Ada.Finalization.Controlled. As a result, the memory used by an Unbounded_String may appear to be a leak until the object is finalized, for example when the code returns to an enclosing scope.
In the static vs shared libraries debates, I've often heard that shared libraries eliminate duplication and reduce overall disk space. But how much disk space do shared libraries really save in modern Linux distros? How much more space would be needed if all programs were compiled using static libraries? Has anyone crunched the numbers for a typical desktop Linux distro such as Ubuntu? Are there any statistics available?
ADDENDUM:
All answers were informative and are appreciated, but they seemed to shoot down my question rather than attempt to answer it. Kaleb was on the right track, but he chose to crunch the numbers for memory space instead of disk space (my question was for disk space).
Because programs only "pay" for the portions of static libraries that they use, it seems practically impossible to quantitatively know what the disk space difference would be for all static vs all shared.
I feel like trashing my question now that I realize it's practically impossible to answer. But I'll leave it here to preserve the informative answers.
So that SO stops nagging me to choose an answer, I'm going to pick the most popular one (even if it sidesteps the question).
I'm not sure where you heard this, but reduced disk space is mostly a red herring as drive space approaches pennies per gigabyte. The real gain with shared libraries comes with security and bugfix updates for those libraries; applications using static libraries have to be individually rebuilt with the new libraries, whereas all apps using shared libraries can be updated at once by replacing only a few files.
Not only do shared libraries save disk space, they also save memory, and that's a lot more important. The prelinking step is important here... you can't share the memory pages between two instances of the same library unless they are loaded at the same address, and prelinking allows that to happen.
Shared libraries do not necessarily save disk space or memory.
When an application links to a static library, only those parts of the library that the application uses will be pulled into the application binary. The library archive (.a) contains object files (.o), and if they are well factored, the application will use less memory by only linking with the object files it uses. Shared libraries will contain the whole library on disk and in memory whether parts of it are used by applications or not.
For desktop and server systems, this is less likely to result in a win overall, but if you are developing embedded applications, it's worth trying static linking all the applications to see if that gives you an overall saving.
I was able to figure out a partial quantitative answer without having to do an obscene amount of work. Here is my (hare-brained) methodology:
1) Use the following command to generate a list of packages with their installed size and list of dependencies:
dpkg-query -Wf '${Package}\t${Installed-Size}\t${Depends}\n'
2) Parse the results and build a map of statistics for each package:
struct PkgStats
{
PkgStats() : kbSize(0), dependentCount(0) {}
int kbSize;
int dependentCount;
};
typedef std::map<std::string, PkgStats> PkgMap;
Where dependentCount is the number of other packages that directly depend on that package.
Results
Here is the Top 20 list of packages with the most dependants on my system:
Package               Installed KB   # Deps   Dup'd MB
libc6                        10096      750       7385
python                         624      112         68
libatk1.0-0                    200       92         18
perl                         18852       48        865
gconf2                         248       34          8
debconf                        988       23         21
libasound2                    1428       19         25
defoma                         564       18          9
libart-2.0-2                   164       14          2
libavahi-client3               160       14          2
libbz2-1.0                     128       12          1
openoffice.org-core         124908       11       1220
gcc-4.4-base                   168       10          1
libbonobo2-0                   916       10          8
cli-common                     336        8          2
coreutils                    12928        8         88
erlang-base                   6708        8         46
libbluetooth3                  200        8          1
dictionaries-common           1016        7          6
where Dup'd MB is the number of megabytes that would be duplicated if there was no sharing (= installed_size * (dependants_count - 1), for dependants_count > 1).
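The formula can be written as a small function. This is a sketch with a hypothetical name. It assumes 1 MB = 1024 KB and rounding to the nearest megabyte, which reproduces the rows I checked it against, e.g. libc6: 10096 KB * 749 / 1024 ≈ 7385 MB.

```c
#include <assert.h>

/* Hypothetical helper: megabytes that would be duplicated without sharing,
   i.e. installed_kb * (dependents - 1) / 1024, rounded to the nearest MB.
   Packages with at most one dependent duplicate nothing. */
long duplicated_mb(long installed_kb, long dependents)
{
    assert(installed_kb >= 0 && dependents >= 0);
    if (dependents <= 1)
        return 0;
    return (installed_kb * (dependents - 1) + 512) / 1024;
}
```

Summing this value over all packages gives the "total duplicated size with no sharing" figure, again counting direct dependencies only.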
It's not surprising to see libc6 on top. :) BTW, I have a typical Ubuntu 9.10 setup with a few programming-related packages installed, as well as some GIS tools.
Some statistics:
Total installed packages: 1717
Average # of direct dependents: 0.92
Total duplicated size with no sharing (ignoring indirect dependencies): 10.25GB
Histogram of # of direct dependents (note logarithmic Y scale):
Note that the above totally ignores indirect dependencies (i.e. everything should be at least indirectly dependent on libc6). What I really should have done is build a graph of all dependencies and use that as the basis for my statistics. Maybe I'll get around to it sometime and post a lengthy blog article with more details and rigor.
Ok, perhaps not an answer, but the memory savings are what I'd consider. The savings depend on the number of times a library is loaded after the first application, so let's find out how much is saved per library on the system using a quick script:
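#!/bin/bash
# Requires bash: uses [ == ] and $(( )) arithmetic.
lastlib=""
cnt=1
# Print the bytes saved for one library: (times opened - 1) * file size.
report() {
    [ -n "$1" ] || return
    size=$(ls -l "$1" 2>/dev/null | awk '{print $5}')
    echo "$1: $(( ($2 - 1) * ${size:-0} ))"
}
lsof | grep 'lib.*\.so$' | awk '{print $9}' | sort | {
    while read -r lib ; do
        if [ "$lastlib" == "$lib" ] ; then
            cnt=$((cnt + 1))
        else
            report "$lastlib" "$cnt"   # size and count refer to the same library
            cnt=1
        fi
        lastlib="$lib"
    done
    report "$lastlib" "$cnt"           # the last library has no following change
}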
That will give us savings per lib, as such:
...
/usr/lib64/qt4/plugins/crypto/libqca-ossl.so: 0
/usr/lib64/qt4/plugins/imageformats/libqgif.so: 540640
/usr/lib64/qt4/plugins/imageformats/libqico.so: 791200
...
Then, the total savings:
$ ./checker.sh | awk '{total = total + $2}END{print total}'
263160760
So, roughly speaking on my system I'm saving about 250 Megs of memory. Your mileage will vary.