Can FFTW use an alternate user-supplied malloc and free?

I'm working in an environment where I need to use alternate work-alike calls to malloc() and free(). I'd like to make calls into FFTW, but if FFTW internally calls malloc() and free() for its own purposes, it will screw up the memory management environment that I am in. So I need a way to tell FFTW to use alternate functions I supply instead of malloc and free. How do I do that?

I sure hope your underlying mex compiler is gcc / GNU binutils. If not, you might want to change mex compilers so that it is. This answer is very much specific to GNU ld...
Add the following to the code that gets compiled with your mexFunction:
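#include "mex.h"   /* mxMalloc, mxFree */

/* The linker redirects FFTW's malloc() calls here (see --wrap below). */
void *
__wrap_malloc(size_t size)
{
    return mxMalloc(size);
}

void
__wrap_free(void *ptr)
{
    mxFree(ptr);   /* mxFree returns void, so there is nothing to return */
}

/* The requested alignment is ignored; see the note on alignment below. */
void *
__wrap_memalign(size_t alignment __attribute__((__unused__)), size_t size)
{
    return mxMalloc(size);
}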
BTW, I got that list of three functions to wrap by doing an objdump -T libfftw3.so, and looking for *UND* entries for heap management functions. Those were the only three that showed up.
Now, add the following to your mex command line:
CLIBS='-Wl,--wrap=malloc,--wrap=free,--wrap=memalign,-Bstatic,-lfftw3,-Bdynamic $CLIBS'
By forcing the linker to bring in libfftw3 statically, it can wrap the specified heap calls with your custom wrappers that redirect to MATLAB's heap management functions.
If you wanted to be fancy, you could enforce the alignment manually in the memalign() wrapper, and return a properly aligned pointer, but then you'd have to track each allocation to undo the pointer change at free() time. I'm just naively assuming MATLAB's allocator will return pointers that are already zero mod the maximum useful alignment, thinking that MATLAB's need for that is similar to FFTW's.
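For what it's worth, the "fancy" variant could look roughly like the sketch below. It over-allocates, rounds the pointer up, and stashes the original pointer just in front of the aligned block so the free side can recover it. The names base_alloc, base_free, aligned_alloc_tracked and aligned_free_tracked are made up for illustration, and base_alloc/base_free are plain malloc/free stand-ins for mxMalloc/mxFree so the sketch builds outside MATLAB.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for mxMalloc/mxFree (assumption: swap these in a real mexFunction). */
static void *base_alloc(size_t size) { return malloc(size); }
static void base_free(void *ptr) { free(ptr); }

/* Allocate `size` bytes aligned to `alignment`, which must be a power of two. */
void *aligned_alloc_tracked(size_t alignment, size_t size)
{
    if (alignment < sizeof(void *))
        alignment = sizeof(void *);
    assert((alignment & (alignment - 1)) == 0);

    /* Room for the payload, worst-case padding, and the saved raw pointer. */
    void *raw = base_alloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave space for the saved pointer, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* Remember the raw pointer in the slot just before the aligned block. */
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Undo the pointer adjustment and release the underlying allocation. */
void aligned_free_tracked(void *ptr)
{
    if (ptr != NULL)
        base_free(((void **)ptr)[-1]);
}
```

Note that with --wrap, the free() wrapper sees pointers from both malloc() and memalign(), so in this scheme the malloc() wrapper would also have to go through aligned_alloc_tracked() (e.g. with alignment sizeof(void *)) so that every pointer carries the saved slot.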
Also, remember to destroy all your plans and call fftw_cleanup() on the way out of your mexFunction. MATLAB will force-free anything your mexFunction (or FFTW called by your mexFunction) allocated upon return of your mexFunction, unless you explicitly mexMakeMemoryPersistent() it.

Related

LD_PRELOAD stack and data segments memory allocation

Hello,
I'm writing a Linux module (based on a GitHub project called "Ccontrol") to create cache partitioning (a.k.a. page coloring) for mitigating timing side-channel attacks (for preventing attacks like Prime+Probe).
I've used the LD_PRELOAD environment variable to override all the malloc(), calloc() and free() calls and replace them with color-aware calls.
Now I'm looking for a way to color the stack and the data segments as well.
What is the system call or library routine that allocates memory for a newly created process?
Is there a way to override this call (without recompiling the kernel) using LD_PRELOAD or any other method?
Thank you all in advance,
Gal
There are two memory-allocating syscalls: brk (usually reached through the sbrk wrapper), which expands the contiguous heap segment, and mmap, which is used to map separate anonymous memory segments into the address space of the calling process.
You won't be able to use LD_PRELOAD to override these everywhere, though.
You'll only be able to do it if the code you're overriding makes these calls through the DSO-exported libc wrappers, which means you won't be able to override direct syscalls or syscalls made through unexported wrappers (DSO-internal ones marked __attribute__((visibility("hidden")))), which most libc implementations use quite a bit. You also won't be able to override the syscalls made by the dynamic linker.
If you need a robust way of overriding the calls, you'll need to turn to ptrace or modify the kernel.
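To make the mmap path concrete, here is a minimal sketch of an anonymous mapping requested through the exported libc wrapper, which is the kind of call an LD_PRELOAD library could intercept. The helper names map_anonymous and unmap_anonymous are made up for illustration; it assumes Linux with glibc.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map `len` bytes of zero-filled anonymous memory; returns NULL on failure. */
void *map_anonymous(size_t len)
{
    assert(len > 0);   /* mmap rejects zero-length mappings */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a mapping created by map_anonymous(); returns 0 on success. */
int unmap_anonymous(void *p, size_t len)
{
    return munmap(p, len);
}
```

The initial stack and data segments of a new process are set up by the kernel during execve, before any preloaded library runs, so they never pass through a wrapper like this.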

Using user-space functions like sprintf in the kernel, or not?

I am making a /proc entry for my driver. In the read callback function, the first argument is the location into which we write the data intended for the user. I searched for how to write data into it and saw that everybody uses sprintf for this purpose. I am surprised to see that it works in kernel space; surely it is wrong to use a user-space function in kernel space. I also can't figure out how to write to that location without using a user-space function like strcpy, sprintf, etc. I am using kernel version 3.9.10. Please suggest how I should do this without using sprintf or any other user-space function.
Most of the 'normal' user-space functions would make no sense in kernel code, so they are not available in the kernel.
However, some functions like sprintf, strcpy, or memcpy are useful in kernel code, so the kernel implements them (more or less completely) and makes them available for drivers.
See include/linux/kernel.h and include/linux/string.h.
sprintf is a kernel-space function in Linux. It is totally separate from its user-space namesake and may or may not work identically to it.
Just because a function exists in user space does not mean an identically named function cannot exist in kernel space.

Disabling vsyscalls in Linux

I'm working on a piece of software that monitors other processes' system calls using ptrace(2). Unfortunately, most modern operating systems implement some kind of fast user-mode syscalls, which are called vsyscalls in Linux.
Is there any way to disable the use of vsyscalls/vDSO for a single process or, if that is not possible, for the whole operating system?
Try echo 0 > /proc/sys/kernel/vsyscall64
If you're trying to ptrace gettimeofday calls and they aren't showing up, what time source is the system using (pmtimer, acpi, tsc, hpet, etc.)? I wonder if you'd humor me by trying to force your timer to something older like pmtimer. It's possible one of the many gtod timer-specific optimizations is causing your ptrace calls to be avoided, even with vsyscall set to zero.
It turns out there IS a way to effectively disable linking vDSO for a single process without disabling it system-wide using ptrace!
All you have to do is to stop the traced process before it returns from execve and remove the AT_SYSINFO_EHDR entry from the auxiliary vector (which comes directly after environment variables along the memory region pointed to in rsp). PTRACE_EVENT_EXEC is a good place to do this.
AT_SYSINFO_EHDR is what the kernel uses to tell the dynamic linker where the vDSO is mapped in the process's address space. If this entry is not present, ld.so seems to act as if the system hasn't mapped a vDSO.
Note that this doesn't unmap the vDSO from your process's memory; the dynamic linker merely ignores it when linking other shared libraries. A malicious program will still be able to interact with it if the author really wants to.
I know this answer is a bit late, but I hope this information will spare some poor soul a headache.
For newer systems echo 0 > /proc/sys/kernel/vsyscall64 might not work. In Ubuntu 16.04 vDSO can be disabled system-wide by adding the kernel parameter vdso=0 in /etc/default/grub under the parameter: GRUB_CMDLINE_LINUX_DEFAULT.
IMPORTANT: The parameter GRUB_CMDLINE_LINUX_DEFAULT might be overwritten by other configuration files in /etc/default/grub.d/..., so double-check where to add your custom configuration.
Picking up on Tenders McChiken's approach, I did create a wrapper that disables vDSO for an arbitrary binary, without affecting the rest of the system: https://github.com/danteu/novdso
The general procedure is quite simple:
use ptrace to wait for return from execve(2)
find the address of the auxvector
overwrite the AT_SYSINFO_EHDR entry with AT_IGNORE, telling the application to ignore the following value
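The last step can be sketched as a plain function over an in-memory copy of the auxiliary vector. This is only an illustration, not the code from the linked project: the name hide_vdso_entry is made up, a 64-bit tracee is assumed, and a real tracer would read and write the entries in the tracee's memory with ptrace (PTRACE_PEEKDATA / PTRACE_POKEDATA) instead of touching a local array.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Rewrite the AT_SYSINFO_EHDR entry of an AT_NULL-terminated auxv copy
   to AT_IGNORE, leaving its value untouched.
   Returns the index of the patched entry, or -1 if none was found. */
int hide_vdso_entry(Elf64_auxv_t *auxv)
{
    assert(auxv != NULL);
    for (int i = 0; auxv[i].a_type != AT_NULL; i++) {
        if (auxv[i].a_type == AT_SYSINFO_EHDR) {
            auxv[i].a_type = AT_IGNORE;   /* consumer skips this entry */
            return i;
        }
    }
    return -1;
}
```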
I know this is an older question, but nobody has mentioned a third useful way of disabling the vDSO on a per-process basis. Using LD_PRELOAD, you can override the libc functions with your own versions that perform the actual system call.
A simple shared library for overriding the gettimeofday and time functions, for example, could look like this:
vdso_override.c:
#include <time.h>
#include <sys/time.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Replaces libc's gettimeofday(): always enter the kernel, never the vDSO. */
int gettimeofday(struct timeval *restrict tv, struct timezone *restrict tz)
{
    return syscall(__NR_gettimeofday, (long)tv, (long)tz, 0, 0, 0, 0);
}

/* Replaces libc's time() in the same way. */
time_t time(time_t *tloc)
{
    return (time_t)syscall(__NR_time, (long)tloc, 0, 0, 0, 0, 0);
}
This uses the libc wrapper to issue a raw system call (see syscall(2)), so the vDSO is circumvented. You would have to overwrite all system calls that the vDSO exports on your architecture in this way (listed at vdso(7)).
Compile with
gcc -fpic -shared -o vdso_override.so vdso_override.c
Then run any program in which you want to disable vDSO calls as follows:
LD_PRELOAD=./vdso_override.so <some program>
This of course only works if the program you are running is not actively trying to circumvent this. While you can override a symbol using LD_PRELOAD, if the target program really wants to, there is a way to find the original symbol and use that instead.
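The same pattern extends to the other calls listed in vdso(7). As a hedged sketch for clock_gettime on 64-bit Linux: in a preload library the function would be named clock_gettime; the made-up name raw_clock_gettime is used here only so the sketch can be compiled and called next to libc's own version.

```c
#define _GNU_SOURCE       /* for syscall() */
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Issue clock_gettime as a real system call, bypassing the vDSO.
   Returns 0 on success, -1 with errno set on failure (like syscall(2)). */
int raw_clock_gettime(clockid_t clk, struct timespec *ts)
{
    assert(ts != NULL);
    return (int)syscall(SYS_clock_gettime, clk, ts);
}
```

Because this goes through the kernel every time, a tracer using ptrace(2) will see each call.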

What functions does _WinMainCRTStartup perform?

This is part of a series of at least two closely related, but distinct questions. I hope I'm doing the right thing by asking them separately.
I'm trying to get my Visual C++ 2008 app to work without the C Runtime Library. It's a Win32 GUI app without MFC or other fancy stuff, just plain Windows API.
So I set Project Properties -> Configuration -> C/C++ -> Advanced -> Omit Default Library Names to Yes (compiler flag /Zl) and rebuilt.
Then the linker complains about an unresolved external _WinMainCRTStartup. Fair enough, I can tell the linker to use a different entry point, say MyStartup. From what I gather around the web, _WinMainCRTStartup does some initialization stuff, and I probably want MyStartup to do a subset of that.
So my question is: What functions does _WinMainCRTStartup perform, and which of these can I omit if I don't use the CRT?
If you are knowledgeable about this stuff, please have a look at my other question too. Thanks!
Aside: Why do I want to do this in the first place?
My app doesn't explicitly use any CRT functions.
I like lean and mean apps.
It'll teach me something new.
The CRT's entry point does the following (this list is not complete):
Initializes global state needed by the CRT. If this is not done, you cannot use any functions or state provided by the CRT.
Initializes some global state that is used by the compiler. Run-time checks such as the security cookie used by /GS definitely stand out here. You can call __security_init_cookie yourself, however. You may need to add other code for other run-time checks.
Calls constructors on C++ objects. If you are writing C++ code, you may need to emulate this.
Retrieves the command line and startup information provided by the OS and passes it to your main. By default, no parameters are passed to the entry point of the program by the OS; they are all provided by the CRT.
The CRT source code is available with Visual Studio and you can step through the CRT's entry point in a debugger and find out exactly what it is doing.
A true Win32 program written in C (not C++) doesn't need any initialization at all, so you can start your project with WinMainCRTStartup() instead of WinMain(HINSTANCE,...).
It's also possible but a bit harder to write console programs as true Win32 applications; the default name of entry point is _mainCRTStartup().
Disable all extra code generation features like stack probes, array checks etc. Debugging is still possible.
Initialization
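Sometimes you need the first HINSTANCE parameter. It is the load address of the executable, which for Win32 (except Win32s) is (HINSTANCE)0x400000 by default; GetModuleHandle(NULL) returns it even when the image is loaded at a different address.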
The nCmdShow parameter is always SW_SHOWDEFAULT.
If necessary, retrieve the command line with GetCommandLine().
Termination
When your program spawns threads, e.g. by calling GetOpenFileName(), returning from WinMainCRTStartup() with the return keyword will hang your program; use ExitProcess() instead.
Caveats
You will run into considerable trouble when:
using stack frames (i.e. local variables) larger than 4 KBytes (per function)
using floating-point arithmetic (e.g. float->int conversion)
using 64-bit integers on 32-bit machines (multiply, bit-shift operations)
using C++ new, delete, and static objects with non-zero-out-all-members constructors
using standard library functions like fopen(), printf() of course
Troubleshoot
There is a C standard library available on all Windows systems (since Windows 95), the MSVCRT.DLL.
To use it, import their entry points, e.g. using my msvcrt-light.lib (google for it). But there are still some caveats, especially when using compilers newer than MSVC6:
stack frames are still limited to 4 KBytes
_ftol_sse or _ftol2_sse must be routed to _ftol
_iob_func must be routed to _iob
Its initialization seems to run at load time. At least the file functions will run seamlessly.
Old question, but the answers are either incorrect or focus on one specific problem.
There are a number of C and C++ features that simply will not be available on Windows (or most operating systems, for that matter) if the programs actually started at main/WinMain.
Take this simple example:
class my_class
{
public:
    my_class() { m_val = 5; }
    int my_func() { return m_val; }
private:
    int m_val;
};

my_class g_class;   // must be constructed before main() runs

int main(int argc, char **argv)
{
    return g_class.my_func();
}
In order for this program to function as expected, the constructor for my_class must be called before main. If the program started exactly at main, it would require a compiler hack (note: GCC does this in some cases) to insert a function call at the very beginning of main. Instead, on most OSes and in most cases, a different function constructs g_class and then calls main (on Windows, this is either mainCRTStartup or WinMainCRTStartup; on most other OSes I'm used to, it is a function called _start).
There are other things that C++ and even C require to be done before or after main in order to work.
How are stdin and stdout (std::cin and std::cout) usable as soon as main starts?
How does atexit work?
The C standard requires the standard library have a POSIX-like signal API, which on Windows must be "installed" before main().
On most OSes, there is no system-provided heap; the C runtime implements its own heap (Microsoft's C runtime just wraps the Kernel32 Heap functions).
Even the arguments passed to main, argc and argv, must be gotten from the system somehow.
You might want to take a look at Matt Pietrek's (ancient) articles on implementing his own C runtime for specifics on how this works with Windows + MSVC (note: MinGW and Cygwin implement specific things differently, but actually fall back to MSVCRT for most things):
http://msdn.microsoft.com/en-us/library/bb985746.aspx
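As a rough, portable model of what such a startup function does (this is not the real CRT code, and every name in it is made up for illustration): run the initializers, call the program's main, then run the registered exit handlers in reverse order. The g_val / init_g_val pair plays the role of g_class and its constructor from the example above.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*hook_fn)(void);

#define MAX_EXIT_HOOKS 8
static hook_fn exit_hooks[MAX_EXIT_HOOKS];
static size_t exit_hook_count;

/* Toy atexit(): remember a handler to run after the program's main. */
int toy_atexit(hook_fn fn)
{
    if (exit_hook_count == MAX_EXIT_HOOKS)
        return -1;
    exit_hooks[exit_hook_count++] = fn;
    return 0;
}

/* Toy entry point: initializers first, then main, then exit handlers. */
int toy_start(const hook_fn *inits, size_t n_inits,
              int (*app_main)(int, char **), int argc, char **argv)
{
    assert(app_main != NULL);
    for (size_t i = 0; i < n_inits; i++)
        inits[i]();                       /* stands in for global constructors */
    int status = app_main(argc, argv);
    while (exit_hook_count > 0)
        exit_hooks[--exit_hook_count]();  /* last registered runs first */
    return status;
}

/* Demo program: a "global object" that needs initializing before main. */
static int g_val;
static int cleanup_ran;

static void init_g_val(void) { g_val = 5; }
static void mark_cleanup(void) { cleanup_ran = 1; }

static int demo_main(int argc, char **argv)
{
    (void)argc;
    (void)argv;
    toy_atexit(mark_cleanup);
    return g_val;   /* only correct if init_g_val() already ran */
}
```

If toy_start() skipped the initializer loop, demo_main would return 0 instead of 5, which is the failure the answer describes for a program that starts exactly at main.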

Putting code and data into the same section in a Linux kernel module

I'm writing a Linux kernel module in which I would like to have some code and associated data in the same section. I declare the data and the functions with the attribute tags, like:
void * foo __attribute__ ((section ("SEC_A"))) = NULL;
void bar(void) __attribute__ ((section("SEC_A")));
However when I do this, gcc complains with:
error: foo causes a section type conflict
If I do not declare the function with the specific section name, gcc is fine with it. But I want both the function and the variable to be in the same section.
Is there any way to do that with gcc? My gcc version is gcc (Ubuntu 4.3.2-1ubuntu12) 4.3.2
From the GCC manual:
Some file formats do not support arbitrary sections so the section attribute is not available on all platforms. If you need to map the entire contents of a module to a particular section, consider using the facilities of the linker instead.
IIRC, Linux uses a flat memory model, so you don't gain anything by "forcing" things into a single section anyway, do you?
Hmmm. I suppose you could make an asm function to reserve the space and then do some pointer trickery to get its address. Might want to wrap the ugly in a macro...
Another thought would be to split the problem in half: write a small example case of the closest thing you can that still compiles, get the asm code, and tinker with it to see what you can get past the downstream stages. If nothing else, you could write something to munge the asm code for that module, entomb it in your makefile, and call it good.
Yet another thought: try putting the variable definitions in a small asm module (e.g., as db's or whatever with the right section declarations) and let the linker handle it.
I think you cannot put text (function) and data (BSS) objects into the same section because (some) OSes assume immutability of .TEXT section types for process re-use.
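One way to sidestep the conflict, sketched here under the assumption of GCC on an ELF target, is to give the data and the code two related section names and leave any merging to the linker, as the manual excerpt suggests. The names SEC_A_data and SEC_A_text are made up, bar() returns int only so the sketch has something observable, and whether a linker script can then place both input sections together in your module build is something you would have to verify.

```c
#include <assert.h>
#include <stddef.h>

/* Data lives in its own named section, so GCC sees no type conflict... */
void *foo __attribute__((section("SEC_A_data"))) = NULL;

/* ...and the function goes into a separate, code-only section. */
__attribute__((section("SEC_A_text"))) int bar(void)
{
    return foo == NULL;   /* 1 while foo is still unset */
}
```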
