Visual Leak Detector is turned off - memory-leaks

I am using VLD 2.5 to find memory leaks, and I have added the additional include directories and additional library directories.
#include <QtCore/QCoreApplication>
#include "vld.h"

int main(int argc, char *argv[])
{
    //VLDEnable();
    //VLDReportLeaks();
    QCoreApplication a(argc, argv);
    return a.exec();
}
This is how I use VLD. After I start debugging this simple program, the console tells me that Visual Leak Detector is turned off. Then I enable VLD explicitly, but it still doesn't work:
VLDEnable();
VLDReportLeaks();
My vld.ini says:
;; Visual Leak Detector - Initialization/Configuration File
;; Copyright (c) 2005-2016 VLD Team
;;
;; This library is free software; you can redistribute it and/or
;; modify it under the terms of the GNU Lesser General Public
;; License as published by the Free Software Foundation; either
;; version 2.1 of the License, or (at your option) any later version.
;;
;; This library is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;; Lesser General Public License for more details.
;;
;; You should have received a copy of the GNU Lesser General Public
;; License along with this library; if not, write to the Free Software
;; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
;;
;; See COPYING.txt for the full terms of the GNU Lesser General Public License.
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
; Any options left blank or not present will revert to their default values.
[Options]
; The main on/off switch. If off, Visual Leak Detector will be completely
; disabled. It will do nothing but print a message to the debugger indicating
; that it has been turned off.
;
; Valid Values: on, off
; Default: on
;
VLD = on
; If yes, duplicate leaks (those that are identical) are not shown individually.
; Only the first such leak is shown, along with a number indicating the total
; number of duplicate leaks.
;
; Valid Values: yes, no
; Default: no
;
AggregateDuplicates = no
; Lists any additional modules to be included in memory leak detection. This can
; be useful for checking for memory leaks in debug builds of 3rd party modules
; which can not be easily rebuilt with '#include "vld.h"'. This option should be
; used only if absolutely necessary and only if you really know what you are
; doing.
;
; CAUTION: Avoid listing any modules that link with the release CRT libraries.
; Only modules that link with the debug CRT libraries should be listed here.
; Doing otherwise might result in false memory leak reports or even crashes.
;
; Valid Values: Any list containing module names (i.e. names of EXEs or DLLs)
; Default: None.
;
ForceIncludeModules =
; Maximum number of data bytes to display for each leaked block. If zero, then
; the data dump is completely suppressed and only call stacks are shown.
; Limiting this to a low number can be useful if any of the leaked blocks are
; very large and cause unnecessary clutter in the memory leak report.
;
; Valid Values: 0 - 4294967295
; Default: 256
;
MaxDataDump =
; Maximum number of call stack frames to trace back during leak detection.
; Limiting this to a low number can reduce the CPU utilization overhead imposed
; by memory leak detection, especially when using the slower "safe" stack
; walking method (see StackWalkMethod below).
;
; Valid Values: 1 - 4294967295
; Default: 64
;
MaxTraceFrames =
; Sets the type of encoding to use for the generated memory leak report. This
; option is really only useful in conjunction with sending the report to a file.
; Sending a Unicode encoded report to the debugger is not useful because the
; debugger cannot display Unicode characters. Using Unicode encoding might be
; useful if the data contained in leaked blocks is likely to consist of Unicode
; text.
;
; Valid Values: ascii, unicode
; Default: ascii
;
ReportEncoding = ascii
; Sets the report file destination, if reporting to file is enabled. A relative
; path may be specified and is considered relative to the process' working
; directory.
;
; Valid Values: Any valid path and filename.
; Default: .\memory_leak_report.txt
;
ReportFile =
; Sets the report destination to either a file, the debugger, or both. If
; reporting to file is enabled, the report is sent to the file specified by the
; ReportFile option.
;
; Valid Values: debugger, file, both
; Default: debugger
;
ReportTo = debugger
; Turns on or off a self-test mode which is used to verify that VLD is able to
; detect memory leaks in itself. Intended to be used for debugging VLD itself,
; not for debugging other programs.
;
; Valid Values: on, off
; Default: off
;
SelfTest = off
; Selects the method to be used for walking the stack to obtain stack traces for
; allocated memory blocks. The "fast" method may not always be able to
; successfully trace completely through all call stacks. In such cases, the
; "safe" method may prove to more reliably obtain the full stack trace. The
; disadvantage is that the "safe" method is significantly slower than the "fast"
; method and will probably result in very noticeable performance degradation of
; the program being debugged.
;
; Valid Values: fast, safe
; Default: fast
;
StackWalkMethod = fast
; Determines whether memory leak detection should be initially enabled for all
; threads, or whether it should be initially disabled for all threads. If set
; to "yes", then any threads requiring memory leak detection to be enabled will
; need to call VLDEnable at some point to enable leak detection for those
; threads.
;
; Valid Values: yes, no
; Default: no
;
StartDisabled = no
; Determines whether or not all frames, including frames internal to the heap,
; are traced. There will always be a number of frames internal to Visual Leak
; Detector and C/C++ or Win32 heap APIs that aren't generally useful for
; determining the cause of a leak. Normally these frames are skipped during the
; stack trace, which somewhat reduces the time spent tracing and amount of data
; collected and stored in memory. Including all frames in the stack trace, all
; the way down into VLD's own code can, however, be useful for debugging VLD
; itself.
;
; Valid Values: yes, no
; Default: no
;
TraceInternalFrames = no
; Determines whether or not to report memory leaks caused by missing HeapFree calls.
;
; Valid Values: yes, no
; Default: no
;
SkipHeapFreeLeaks = no
; Determines whether or not to report memory leaks generated by CRT startup code.
; These are not actual memory leaks, as they are freed by the CRT after the VLD
; object has been destroyed.
;
; Valid Values: yes, no
; Default: yes
;
SkipCrtStartupLeaks = yes
Can someone help me?

I had a similar problem: suddenly I realized VLD printed "Visual Leak Detector is turned off" in the Debug log. I checked everything: #include "vld.h" was used, vld.ini was untouched, the DLL was in the PATH, so it should have worked.
In my case I found the cause by downloading the VLD sources and debugging them in a test case, and voilà: I had set an environment variable "VLD64" to some directory name months ago. After deleting it, everything worked fine again.

The environment variable VLD is used internally by the VLD library to check the "on"/"off" state:
if (!GetEnvironmentVariable(L"VLD", buffer, buffersize)) {
    GetPrivateProfileString(L"Options", L"VLD", L"on", buffer, buffersize, inipath);
}
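Reading that snippet, the lookup order appears to be: the VLD environment variable wins, and only if it is absent does the [Options] VLD entry in vld.ini apply, defaulting to "on". A standalone sketch reproducing the check (the ini path is a hypothetical placeholder):

#include <windows.h>
#include <cstdio>

int main()
{
    wchar_t buffer[64] = L"";
    // The environment variable takes priority over the ini file.
    if (!GetEnvironmentVariableW(L"VLD", buffer, 64)) {
        GetPrivateProfileStringW(L"Options", L"VLD", L"on",
                                 buffer, 64, L"C:\\path\\to\\vld.ini"); // hypothetical path
    }
    wprintf(L"effective VLD switch: %ls\n", buffer);
    return 0;
}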
Change your system environment variable name to VLD_PATH and reference it in the project settings instead:
$(VLD_PATH)\include
$(VLD_PATH)\lib\Win32
(Running "set VLD" in a Command Prompt lists every variable whose name begins with VLD, which makes stray entries easy to spot.)
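Once the stray variable is gone or renamed, a deliberate leak is a quick way to confirm VLD is active again. A minimal sketch, assuming a debug build and the standard vld.h API:

#include "vld.h"  // hooks the debug CRT heap as soon as this header is included

int main()
{
    int *leak = new int[10];  // deliberately never freed
    (void)leak;               // keep the compiler from warning it is unused
    return 0;                 // VLD prints a leak report at process exit
}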

Related

SHC "Argument list too long" not able to resolve by increasing stack size to unlimited [duplicate]

I have to pass 256 KB of text as an argument to the "aws sqs" command but am running into a limit on the command line at around 140 KB. Many places state that this was solved in the Linux kernel as of 2.6.23.
But I cannot get it to work. I am using kernel 3.14.48-33.39.amzn1.x86_64.
Here's a simple example to test:
#!/bin/bash
SIZE=1000
while [ $SIZE -lt 300000 ]
do
    echo "$SIZE"
    VAR="`head -c $SIZE < /dev/zero | tr '\0' 'a'`"
    ./foo "$VAR"
    let SIZE="( $SIZE * 20 ) / 19"
done
And the foo script is just:
#!/bin/bash
echo -n "$1" | wc -c
And the output for me is:
117037
123196
123196
129680
129680
136505
./testCL: line 11: ./foo: Argument list too long
143689
./testCL: line 11: ./foo: Argument list too long
151251
./testCL: line 11: ./foo: Argument list too long
159211
So, the question: how do I modify the testCL script so it can pass 256 KB of data? By the way, I have tried adding ulimit -s 65536 to the script and it didn't help.
And if this is plainly impossible I can deal with that, but can you shed light on this quote from my link above?
"While Linux is not Plan 9, in 2.6.23 Linux is adding variable
argument length. Theoretically you shouldn't hit frequently "argument
list too long" errors again, but this patch also limits the maximum
argument length to 25% of the maximum stack limit (ulimit -s)."
edit:
I was finally able to pass <= 256 KB as a single command line argument (see edit (4) at the bottom). However, please read carefully how I did it and decide for yourself whether this is a way you want to go. At least you should be able to understand, from what I found out, why you are otherwise 'stuck'.
With the coupling of ARG_MAX to ulimit -s / 4 came the introduction of MAX_ARG_STRLEN as the maximum length of a single argument:
/*
 * linux/fs/exec.c
 *
 * Copyright (C) 1991, 1992 Linus Torvalds
 */
...
#ifdef CONFIG_MMU
/*
 * The nascent bprm->mm is not visible until exec_mmap() but it can
 * use a lot of memory, account these pages in current->mm temporary
 * for oom_badness()->get_mm_rss(). Once exec succeeds or fails, we
 * change the counter back via acct_arg_size(0).
 */
...
static bool valid_arg_len(struct linux_binprm *bprm, long len)
{
    return len <= MAX_ARG_STRLEN;
}
...
#else
...
static bool valid_arg_len(struct linux_binprm *bprm, long len)
{
    return len <= bprm->p;
}
#endif /* CONFIG_MMU */
...
static int copy_strings(int argc, struct user_arg_ptr argv,
                        struct linux_binprm *bprm)
{
    ...
    str = get_user_arg_ptr(argv, argc);
    ...
    len = strnlen_user(str, MAX_ARG_STRLEN);
    if (!len)
        goto out;
    ret = -E2BIG;
    if (!valid_arg_len(bprm, len))
        goto out;
    ...
}
...
...
MAX_ARG_STRLEN is defined as 32 times the page size in linux/include/uapi/linux/binfmts.h:
...
/*
* These are the maximum length and maximum number of strings passed to the
* execve() system call. MAX_ARG_STRLEN is essentially random but serves to
* prevent the kernel from being unduly impacted by misaddressed pointers.
* MAX_ARG_STRINGS is chosen to fit in a signed 32-bit integer.
*/
#define MAX_ARG_STRLEN (PAGE_SIZE * 32)
#define MAX_ARG_STRINGS 0x7FFFFFFF
...
The default page size is 4 KB, so MAX_ARG_STRLEN = 4096 * 32 = 131072 bytes: you cannot pass a single argument longer than 128 KB.
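These limits can be inspected at runtime. A small sketch, assuming Linux with glibc, that prints the stack rlimit, the ARG_MAX derived from it, and the fixed per-string cap:

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    getrlimit(RLIMIT_STACK, &rl);
    // On kernels >= 2.6.23, ARG_MAX is derived from the stack limit
    // (roughly a quarter of it), while the per-string cap stays fixed.
    printf("stack limit          : %ld bytes\n", (long)rl.rlim_cur);
    printf("sysconf(_SC_ARG_MAX) : %ld bytes (~ stack/4)\n", sysconf(_SC_ARG_MAX));
    printf("page size * 32       : %ld bytes (MAX_ARG_STRLEN)\n",
           sysconf(_SC_PAGESIZE) * 32);
    return 0;
}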
I can't try it now but maybe switching to huge page mode (page size 4 MB) if possible on your system solves this problem.
For more detailed information and references see this answer to a similar question on Unix & Linux SE.
edits:
(1)
According to this answer one can change the page size of x86_64 Linux to 1 MB by enabling CONFIG_TRANSPARENT_HUGEPAGE and setting CONFIG_TRANSPARENT_HUGEPAGE_MADVISE to n in the kernel config.
(2)
After recompiling my kernel with the above configuration changes getconf PAGESIZE still returns 4096.
According to this answer CONFIG_HUGETLB_PAGE is also needed which I could pull in via CONFIG_HUGETLBFS. I am recompiling now and will test again.
(3)
I recompiled my kernel with CONFIG_HUGETLBFS enabled and now /proc/meminfo contains the corresponding HugePages_* entries mentioned in the corresponding section of the kernel documentation.
However, the page size according to getconf PAGESIZE is still unchanged. So while I should be able now to request huge pages via mmap calls, the kernel's default page size determining MAX_ARG_STRLEN is still fixed at 4 KB.
(4)
I modified linux/include/uapi/linux/binfmts.h to #define MAX_ARG_STRLEN (PAGE_SIZE * 64), recompiled my kernel and now your code produces:
...
117037
123196
123196
129680
129680
136505
143689
151251
159211
...
227982
227982
239981
239981
252611
252611
265906
./testCL: line 11: ./foo: Argument list too long
279901
./testCL: line 11: ./foo: Argument list too long
294632
./testCL: line 11: ./foo: Argument list too long
So now the limit moved from 128 KB to 256 KB as expected.
I don't know about potential side effects though.
As far as I can tell, my system seems to run just fine.
Just put the arguments into some file, and modify your program to accept "arguments" from a file. A common convention (notably used by GCC and several other GNU programs) is that an argument like @/tmp/arglist.txt asks your program to read arguments from the file /tmp/arglist.txt, often one line per argument; a sketch of this follows.
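Here is what that could look like for a hypothetical program; the '@' prefix and the one-line-per-argument format are just the convention described above:

#include <fstream>
#include <iostream>
#include <string>
#include <vector>

int main(int argc, char *argv[])
{
    std::vector<std::string> args;
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        if (!a.empty() && a[0] == '@') {
            // Expand @file into one argument per line of the file.
            std::ifstream in(a.substr(1));
            for (std::string line; std::getline(in, line); )
                args.push_back(line);
        } else {
            args.push_back(a);
        }
    }
    std::cout << args.size() << " effective arguments\n";
    return 0;
}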
You might perhaps pass some data through long environment variables, but they are also limited (and what the kernel in fact limits is the size of main's initial stack, which contains both the program arguments and the environment).
Alternatively, modify your program to be configurable through some configuration file which would contain the information you want to pass through arguments.
(If you can recompile your kernel, you might try to increase ARG_MAX, which is #define-d in linux-4.*/include/uapi/linux/limits.h, to a bigger power of two much smaller than your available RAM, e.g. to 2097152, before recompiling your kernel.)
Beyond raising your stack limit (using setrlimit(2) with RLIMIT_STACK, generally via the ulimit builtin in the parent shell), there is no way to circumvent that limitation (see the execve(2) man page and its "Limits on size of arguments and environment" section); you need to deal with it some other way.

How to get around the Linux "Too Many Arguments" limit


How Does Scanf Find Keyboard Input?

I was skimming through an implementation of scanf here, and I couldn't find the exact method by which the program is picking up input from the keyboard. I know there's always another layer deeper to go, but could someone explain perhaps a step below C code, i.e. scanf, how keyboard input gets made available to my program?
Despite the fact the file is called scanf.c, it contains only the sscanf and vsscanf functions (plus another internal function that's unimportant to the C library API).
Hence it's only used for scanning strings rather than reading from files.
In terms of how an actual scanf would work, it would usually just use the lower level functions in the C library, such as getchar() and ungetc().
In terms of how those functions work, that's up to the implementation. It may call a lower-level function yet again, it may have a memory-mapped keyboard so it can just read memory to get a keystroke, or it may receive interrupts and have an ISR store keys for later extraction.
The possibilities are very wide ranging.
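To make the getchar()/ungetc() layering concrete, here is a sketch (my own illustration, not the linked implementation) of how a scanf-style "%d" conversion could sit on top of those two calls: consume characters while they still fit the number, then push the first non-matching character back for the next conversion.

#include <ctype.h>
#include <stdio.h>

static int read_int(int *out)
{
    int c, sign = 1, value = 0, any = 0;

    while ((c = getchar()) != EOF && isspace(c))
        ;                              /* skip leading whitespace, like %d does */
    if (c == '-' || c == '+') {
        sign = (c == '-') ? -1 : 1;
        c = getchar();
    }
    while (c != EOF && isdigit(c)) {
        value = value * 10 + (c - '0');
        any = 1;
        c = getchar();
    }
    if (c != EOF)
        ungetc(c, stdin);              /* put back the first non-digit */
    if (!any)
        return 0;                      /* matching failure: scanf would return 0 */
    *out = sign * value;
    return 1;                          /* one item converted */
}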
For some concrete examples, I've developed a micro-processor emulator for the educational market that uses I/O ports for the keyboard (and other devices). So the lowest level code there is along the following lines:
:keyin equ 07d2 ; memory mapped keyboard port
push rf ; preserve register
setw rf :keyin ; use register f for input
:loop inb r0 rf ; get keyboard value to register 0
jz :loop ; 0 means none available, so try again
pop rf ; restore register f, register 0 now holds key
In contrast, the (much more successful) Linux can use one of the system calls for reading a character from the input file descriptor:
buffer: db ?
getch: mov eax, 3 ; sys_read
mov ebx, 0 ; descriptor 0 (input)
mov ecx, buffer ; address to store data
mov edx, 1 ; get one character
int 80h ; make system call (old style).
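For comparison, the C-level equivalent of that assembly is a thin wrapper around read(2) on file descriptor 0, which is where the C library's buffered input ultimately bottoms out (a sketch):

#include <unistd.h>

int getch(void)
{
    unsigned char c;
    /* read(2) on fd 0 is exactly what the int 80h / sys_read above performs */
    return (read(0, &c, 1) == 1) ? c : -1;
}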

x86 reserved EFLAGS bit 1 == 0: how can this happen?

I'm using the Win32 API to stop/start/inspect/change thread state. Generally works pretty well. Sometimes it fails, and I'm trying to track down the cause.
I have one thread that is forcing context switches on other threads by:
thread stop
fetch processor state into windows context block
read thread registers from windows context block to my own context block
write thread registers from another context block into windows context block
restart thread
This works remarkably well... but... very rarely, context switches seem to fail.
(Symptom: my multithreaded system blows sky high, executing at strange places with strange register content.)
The context control is accomplished by:
if ((suspend_count = SuspendThread(WindowsThreadHandle)) < 0)
{
    printf("TimeSlicer Suspend Thread failure");
    ...
}
...
Context.ContextFlags = (CONTEXT_INTEGER | CONTEXT_CONTROL | CONTEXT_FLOATING_POINT);
if (!GetThreadContext(WindowsThreadHandle, &Context))
{
    printf("Context fetch failure");
    ...
}
call ContextSwap(&Context); // does the context swap
if (ResumeThread(WindowsThreadHandle) < 0)
{
    printf("Thread resume failure");
    ...
}
None of the print statements ever get executed. I conclude that Windows thinks the context operations all happened reliably.
Oh, yes, I do know when a thread being stopped is not computing [e.g., in a system function] and won't attempt to stop/context switch it. I know this because each thread that does anything other-than-computing sets a thread specific "don't touch me" flag, while it is doing other-than-computing. (Device driver programmers will recognize this as the equivalent of "interrupt disable" instructions).
So, I wondered about the reliability of the content of the context block.
I added a variety of sanity tests on various register values pulled out of the context block; you can actually decide that ESP is OK (within bounds of the stack area defined in the TIB), PC is in the program that I expect or in a system call, etc. No surprises here.
I decided to check that the condition code bits (EFLAGS) were being properly read out; if this were wrong, it would cause a switched task to take a "wrong branch" when its state was restored. So I added the following code to verify that the purported EFLAGS register contains values that actually look like EFLAGS according to the Intel reference manual (http://en.wikipedia.org/wiki/FLAGS_register).
mov eax, Context.EFlags[ebx] ; ebx points to Windows Context block
mov ecx, eax ; check that we seem to have flag bits
and ecx, 0FFFEF32Ah ; where we expect constant flag bits to be
cmp ecx, 000000202h ; expected state of constant flag bits
je #f
breakpoint ; trap if unexpected flag bit status
##:
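For reference, here is the same sanity check rendered as C++ against a CONTEXT captured with GetThreadContext; this is a sketch, with the mask and expected value taken directly from the assembly above:

#include <windows.h>

// Returns true when the constant/reserved EFLAGS bits hold the values the
// Intel manual prescribes: bit 1 always 1, plus IF set in user mode (0x202).
bool FlagsLookSane(const CONTEXT &ctx)
{
    const DWORD kConstantBitsMask = 0xFFFEF32A; // bits expected to be constant
    const DWORD kExpectedValue    = 0x00000202; // bit 1 = 1, IF = 1
    return (ctx.EFlags & kConstantBitsMask) == kExpectedValue;
}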
On my Win 7 AMD Phenom II X6 1090T (hex core), it traps occasionally with a breakpoint, with ECX = 0200h. It fails the same way on my Win 7 Intel i7 system. I would ignore this, except it hints that the EFLAGS aren't being stored correctly, as I suspected.
According to my reading of the Intel (and also the AMD) reference manuals, bit 1 is reserved and always has the value "1". Not what I see here.
Obviously, MS fills the context block by doing complicated things on a thread stop. I expect them to store the state accurately. This bit isn't stored correctly.
If they don't store this bit correctly, what else don't they store?
Any explanations for why the value of this bit could/should be zero sometimes?
EDIT: My code dumps the registers and the stack on catching a breakpoint.
The stack area contains the context block as a local variable.
Both EAX and the value in the stack at the proper offset for EFLAGS in the context block contain the value 0244h, so the value in the context block really is wrong.
EDIT2: I changed the mask and comparison values to
and ecx, 0FFFEF328h ; was 0FFFEF32Ah, where we expect constant flag bits to be
cmp ecx, 000000200h
This seems to run reliably with no complaints. Apparently Win7 doesn't do bit 1 of eflags right, and it appears not to matter.
Still interested in an explanation, but apparently this is not the source of my occasional context switch crash.
Microsoft has a long history of squirreling away a few bits in places that aren't really used. Raymond Chen has given plenty of examples, e.g. using the lower bit(s) of a pointer that is aligned to more than one byte (so those bits are always zero).
In this case, Windows might have needed to store some of its thread context in an existing CONTEXT structure, and decided to use an otherwise unused bit in EFLAGS. You couldn't do anything with that bit anyway, and Windows will get that bit back when you call SetThreadContext.
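A sketch of that spare-bit pattern (my own illustration, not Windows code): a pointer to a 4-byte-aligned object always has its two low bits clear, so a flag can be smuggled into bit 0 and masked off before the pointer is used.

#include <cassert>
#include <cstdint>

struct alignas(4) Node { int value; };

uintptr_t pack(Node *p, bool flag)
{
    uintptr_t bits = reinterpret_cast<uintptr_t>(p);
    assert((bits & 3u) == 0);          // alignment guarantees the low bits are free
    return bits | (flag ? 1u : 0u);    // smuggle the flag into bit 0
}

Node *unpack(uintptr_t bits, bool *flag)
{
    *flag = (bits & 1u) != 0;
    return reinterpret_cast<Node *>(bits & ~static_cast<uintptr_t>(1));
}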

Memory allocation in VC++

I am working with VC++ 2005.
I have overloaded the new and delete operators.
All is fine.
My question concerns some of the magic VC++ adds to the memory allocation.
When I use the C++ call:
data = new _T [size];
The return from the global memory allocation is (for example) 071f2ea0, but data is set to 071f2ea4.
When the overloaded delete [] is called, the 071f2ea0 address is passed in.
Another note is when using something like:
data = new _T;
both data and the return from the global memory allocation are the same.
I am pretty sure Microsoft is adding something at the head of the memory allocation to use for bookkeeping. My question is: does anyone know the rules Microsoft is using?
I want to pass the value of "data" into some memory-testing routines, so I need to get back to the original memory reference from the global allocation call.
I could assume the 4 bytes are a count, but I wanted to make sure. It could easily be a flag plus an offset, or a count, or an index into some other table, or just an alignment to a CPU cache line. I need to find out for sure; I have not been able to find any references outlining the details.
I also think that on one of my other runs the offset was 6 bytes, not 4.
The 4 bytes most likely contain the total number of objects in the allocation, so delete [] will be able to loop over all objects in the array, calling their destructors.
To get back the original address, you could keep a lookup-table keyed on address / 16, which stores the base address and length. This will enable you to find the original allocation. You need to ensure that your allocation+4 doesn't cross a 16-byte boundary, however.
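A minimal sketch of that suggestion (names hypothetical): key the table on address / 16 so that a user pointer sitting a few bytes past the raw allocation still hashes to the same bucket, subject to the 16-byte-boundary caveat above.

#include <cstdint>
#include <unordered_map>

struct AllocInfo { void *base; std::size_t length; };

static std::unordered_map<std::uintptr_t, AllocInfo> g_allocs;

void remember(void *base, std::size_t length)
{
    g_allocs[reinterpret_cast<std::uintptr_t>(base) / 16] = AllocInfo{ base, length };
}

// user_ptr may be the raw base or base + cookie; both land in the same bucket.
AllocInfo *lookup(void *user_ptr)
{
    auto it = g_allocs.find(reinterpret_cast<std::uintptr_t>(user_ptr) / 16);
    return it == g_allocs.end() ? nullptr : &it->second;
}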
EDIT:
I went ahead and wrote a test program that creates 50 objects with a destructor via new, and calls delete []. The destructor just calls printf, so it won't be optimized away.
#include <stdio.h>

class MySimpleClass
{
public:
    ~MySimpleClass() { printf("Hi\n"); }
};

int main()
{
    MySimpleClass* arr = new MySimpleClass[50];
    delete [] arr;
    return 0;
}
The partial disassembly is below, cleaned up to be a bit more legible. As you can see, VC++ stores the array count in the initial 4 bytes.
; Allocation
mov ecx, 36h ; Size of allocation
call scratch!operator new
test rax,rax ; Don't write 4 bytes if NULL.
je scratch!main+0x25
mov dword ptr [rax],32h ; Store 50 in first 4 bytes
add rax,4 ; Increment pointer by 4
; Free
lea rdi,[rax-4] ; Grab previous 4 bytes of allocation
mov ebx,dword ptr [rdi] ; Store in loop counter
jmp StartLoop ; Jump to beginning of loop
Loop:
lea rcx,[scratch!`string' (00000000`ffe11170)] ; 1st param to printf
call qword ptr [scratch!_imp_printf] ; Destructor
StartLoop:
sub ebx,1 ; Decrement loop counter
jns Loop ; Loop while not negative
This bookkeeping is distinct from the bookkeeping that malloc or HeapAlloc do. Those allocators don't care about objects and arrays; they only see blobs of memory with a total size. VC++ can't query the heap manager for the total size of the allocation, because that would mean the heap manager would be bound to allocate a block of exactly the size you requested. The heap manager shouldn't have this limitation: if you ask for 240 bytes to hold 20 12-byte objects, it should be free to return a 256-byte block that it has immediately available.
The 4 bytes offset is for the number of elements. When delete[] is invoked it is necessary to know the exact number of elements to be able to call the destructors for exactly the necessary number of objects.
Since the memory allocator could have returned a bigger block than necessary for storing all the objects the only sure way to know the number of elements is to store it in the beginning of the block.
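One way to observe the cookie directly is to overload the array forms of new and delete and compare the raw pointer with what new[] returns. A sketch; the cookie for types with non-trivial destructors is an implementation detail of MSVC (and similar ABIs), not a guarantee of the standard:

#include <cstdio>
#include <cstdlib>
#include <new>

void *operator new[](std::size_t size)
{
    void *raw = std::malloc(size);
    if (!raw) throw std::bad_alloc();
    std::printf("operator new[] returns %p (size %zu)\n", raw, size);
    return raw;
}

void operator delete[](void *ptr) noexcept
{
    std::printf("operator delete[] receives %p\n", ptr);
    std::free(ptr);
}

struct WithDtor { ~WithDtor() {} };   // non-trivial destructor: cookie added
struct NoDtor   { int x; };           // trivial destructor: no cookie

int main()
{
    WithDtor *a = new WithDtor[5];    // a == raw + cookie
    std::printf("new WithDtor[5] -> %p\n", static_cast<void *>(a));
    delete [] a;

    NoDtor *b = new NoDtor[5];        // b == raw
    std::printf("new NoDtor[5]   -> %p\n", static_cast<void *>(b));
    delete [] b;
    return 0;
}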
For sure, memory allocation does need some bookkeeping info stored together with the actual memory.
Next to that, heap-allocated blocks will also be 'decorated' with some magic values, used to easily detect buffer overruns, double deletion, and so on. Take a look at this CodeGuru article for more info. If you want the full story on heap debugging, take a look at the MSDN documentation.
The final answer is:
A malloc (which new uses under the hood) allocates more memory than requested so the system can manage the heap. That debug information is not what I am interested in. What I am interested in is the difference between the return from malloc and the pointer a C++ array allocation hands back.
What I have been able to determine is that C++ sometimes adds/uses 4 additional bytes to keep a count of the objects being allocated. The twist is that these 4 bytes are only added if the objects being allocated require destruction.
So given:
void* _cdecl operator new(size_t size)
{
    void *ptr = malloc(size);
    return ptr;
}
for the case of:
object *data = new object[size];
data will be ptr plus 4 bytes (assuming object requires a destructor), while for:
char *data = new char[size];
data will equal ptr, because no destructor is required.
Again, I am not interested in the memory tracking malloc adds to manage memory.
