var
  S1: String;
  S2: String;
begin
  S1 := 'Sensitive Data';
  S2 := Crypt(S1, 'encryption key');
  S1 := '';
  FreeAndNil(S1);
end;
Now when I search my process memory using programs like "WinHex", I can easily find the unencrypted string!
I even tried creating a new form to encrypt this string and then unloading the form, but the string still exists in memory.
Is there any way to completely remove it?
Thanks in advance.
You need to overwrite the string with zeros when you are done with it. Like this:
ZeroMemory(Pointer(s), Length(s)*SizeOf(Char));
If you are paranoid that the compiler will optimise away the ZeroMemory then you could use SecureZeroMemory. However, the Delphi compiler will not optimise away ZeroMemory so this is somewhat moot.
If you just write:
s := '';
then the memory will be returned as is to the memory manager. You then have no control over when, if ever, the memory manager re-uses or returns the memory.
Obviously you'd need to do that to all copies of the string, and so the only sane approach is not to make copies of sensitive data.
None of this will help with the code as per your question because your sensitive data is a string literal and so is stored in the executable. This approach can only be applied meaningfully for dynamic data. I presume that your real program does not put sensitive data in literals.
Oh, and don't ever pass a string to FreeAndNil. You can only pass object variables to FreeAndNil, but the procedure uses an untyped var parameter so the compiler cannot save you from your mistake.
var
  S1: String;
  S2: String;
begin
  S1 := 'Sensitive Data';
  S2 := Crypt(S1, 'encryption key');
  UniqueString(S1); // <-- ensure the reference count of S1 is 1 before wiping
  ZeroMemory(Pointer(S1), Length(S1)*SizeOf(Char));
  // or better: SecureZeroMemory(Pointer(S1), Length(S1)*SizeOf(Char));
  S1 := '';
end;
I've been working on a server that expects data to be received through a buffer. I have an object which is defined like this and some procedures that modify the buffer in it:
type
  Packet* = ref object
    buf*: seq[int8]
    #buf*: array[0..4096, int8]
    pos*: int
proc newPacket*(size: int): Packet =
  result = Packet(buf: newSeq[int8](size))
  #result = Packet()

proc sendPacket*(s: AsyncSocket, p: Packet) =
  asyncCheck s.send(addr(p.buf), p.pos)
The reason I have two lines commented out is that this was the code I originally used, but creating an object that initialises a 4096-element array every time probably wasn't very good for performance. However, that version works and the seq[int8] version does not.
The strange thing is that my current code works perfectly fine if I use the old static buffer buf*: array[0..4096, int8]. In sendPacket, I have checked the data contained in the buffer to make sure the array and seq[int8] versions are equal, and they are (or at least appear to be). In other words, if I do var p = newPacket(17) and write exactly 17 bytes to p.buf, the element values appear to be the same in both versions.
So despite the data appearing to be the same in both versions, I get a different result when calling send and passing the address of the buffer.
In case it matters, the data would be read like this:
result = p.buf[p.pos]
inc(p.pos)
And written to like this:
p.buf[p.pos] = cast[int8](value)
inc(p.pos)
A few things I've looked into, which are probably unrelated to my problem anyway: GC_ref and GC_unref had no effect on my problem. I also looked at using alloc0 with buf defined as a pointer, but I couldn't seem to access the data behind that pointer, and that probably isn't what I should be doing in the first place. Also, if I do var data = p.buf and pass the addr of data instead, I get a different result, but still not the intended one.
So I guess what I want to get to the bottom of is:
Why does send work perfectly fine when I use array[0..4096, int8] but not seq[int8] which is initialised with newSeq, even when they appear to contain the same data?
Does my current layout for receiving and writing data even make sense in a language like Nim (or any language for that matter)? Is there a better way?
In order not to initialize the array you can use the noinit pragma like this:
buf* {.noinit.}: array[0..4096, int8]
You are probably taking the pointer to the seq, not the pointer to the data inside the seq, so try using addr(p.buf[0]).
A pos field is useless if you are using the seq version, since you have p.buf.len already - but you probably know that and just left it in for the array. If you want to use the seq and expect large packets, make sure to use newSeqOfCap to allocate the memory only once.
Also, your array is 1 byte too big: it goes from 0 to 4096 inclusive! Instead you can use array[0..4095, int8] or just array[4096, int8].
Personally I would prefer to use a uint8 type inside buf, so that you can store values from 0 to 255 instead of -128 to 127.
Using a seq inside a ref object means you have two layers of indirection when accessing buf, as well as two objects that the GC will have to clean up. You could just make Packet an alias for seq[uint8] (without ref): type Packet* = seq[uint8]. Or you can keep the array version if you want to store some more data inside the Packet later on.
Context: I'm writing something to process log data which involves loading several GB of data into memory and cross checking various things, finding correlations in data and writing the results out to another file. (This is essentially a cooking/denormalization step before loading into a Druid.io cluster.) I want to avoid having to write the information to a database for both performance and code simplicity - it is assumed that in the foreseeable future the volume of data processed at one time can be handled by adding memory to the machine.
My question is whether it is a good idea to attempt to explicitly deduplicate strings in my code, and if so, what a good approach would be. Many of the values in these log files are the exact same pieces of text (at a rough guess, only about 25% of the text values in the file are unique).
Since we're talking about gigs of data, and while RAM is cheap and swap is possible, there is still a limit, and if I'm careless I will very likely hit it. If I do something like this:
strstore := make(map[string]string)
// do some work that involves slicing and dicing some text, resulting in:
var a = "some string that we figured out that has about a 75% chance of being duplicate"
// note that there are about 10 other such variables that are calculated here, only "a" shown for simplicity
if _, ok := strstore[a]; !ok {
    strstore[a] = a
} else {
    a = strstore[a]
}
// now do some more stuff with "a" and keep it in a struct which is in
// some map some place
It would seem to me that this would have the effect of "reusing" existing strings, at the cost of a hash lookup and compare. Seemingly a good trade-off.
However, this might not be that helpful if the strings that are in fact new cause memory to become fragmented, with various holes left unclaimed.
I could also try to keep one big byte array/slice holding the character data and index into that, but it would make the code hard to write (especially having to mess around with conversions between []byte and string, which involve their own allocations), and I would probably just be doing a poor job of something that is really the Go runtime's domain anyway.
Looking for any advice on approaches to this problem, or if anyone's experience with this sort of thing has yielded particularly useful mechanisms to address this.
There are a lot of interesting data structures and algorithms that you could use here. It depends on what you are trying to do in the stats and processing stages.
I am not sure how compressible your logs are, but you could pre-process the data, again depending on your use cases: https://github.com/alecthomas/mph/blob/master/README.md
Take a look at some of these data structures as well for background :
https://github.com/Workiva/go-datastructures
I got the source of an older project and have to change a few small things, but I ran into big trouble because I only have Delphi 2010 to do it with.
There is a record defined:
bbil = record
  path : string;
  pos: byte;
  nr: Word;
end;
Later this definition is used to read from a file:
b_bil: file of bbil;
pbbil: ^bbil;
l_bil: TList;
while not(eof(b_bil)) do
begin
  new(pbbil);
  read(b_bil, pbbil^);
  l_bil.Add(pbbil);
end;
The primary problem is that the compiler does not accept the type "string" in the record, because the record would then need finalization, which is not allowed in a file type.
So I tried to change "string" to "string[255]" or "shortstring". With that change the app reads the file, but with wrong content.
My question is how to convert the old "string" type, with which the files were written, to the "new" types in Delphi 2010.
I have already tried a lot, e.g. "{$H-}". Adding just one extra char to the record shows that the file itself is correct: it is then read almost correctly, but shifted by one more char with each record - so a length byte plus 255 chars seems to be right for the definition, yet shortstring alone does not match.
Eek! It looks like your code either pre-dates or does not use long strings. If you want to get the same behaviour as in your old Delphi, then you need to replace string with ShortString.
I see that you've tried that already and report that it fails. It's really the only explanation that makes any sense to me, because all the other string types are essentially pointers, and so the only way the read could ever have worked is with a ShortString. The migration you are attempting is immense and you probably have huge numbers of confounding problems.
@LU RD makes a good point in the comments that the record layout may differ between Delphi versions since you are not using a packed record. You can investigate the record layout using the two Delphi versions that you have at hand. You will need to arrange that the size of the records matches between versions, and that the offsets of the fields also match.
Based on the comments below, adding a padding byte between pos and nr will resolve your problems.
bbil = record
  path : string;
  pos: byte;
  _pad: byte;
  nr: Word;
end;
You could also achieve the same effect by setting the $ALIGN compiler option to {$ALIGN ON}, which is how I think I would go about it.
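A rough sketch of that alternative (assuming path has already been switched to a ShortString so the record can be used with a typed file; the offsets in the comments are what {$ALIGN ON} should produce for this record, so verify them with SizeOf in both compilers):
{$ALIGN ON} // equivalent to {$A+}: the compiler inserts the padding byte itself
type
  bbil = record
    path: string[255]; // ShortString layout: length byte + 255 chars = 256 bytes
    pos: Byte;         // offset 256
    nr: Word;          // aligned up to offset 258, so SizeOf(bbil) = 260
  end;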
In the long run you really ought to get away from short strings, ANSI encoding, direct mapping between your internal records and your data files and so on. In the short run you may be better off getting hold of the same version of Delphi as was used to build this code and using that. I'd expect this issue to be just the tip of the iceberg.
Just remember:
"string" <> "string[255]" <> "shortstring" <> AnsiString
Back in the old DOS/Turbo Pascal days, "strings" were indeed limited to 255 characters, in large part because the first byte contained the string length, and a byte can only hold a value between 0 and 255.
That is no longer an issue in contemporary versions of Delphi.
"ShortString" is the type for the old DOS/Pascal string type.
"LongString" has been the default string type for a long time (including the Borland Delphi 2006 I currently use for most production work). From Delphi 3 .. Delphi 2009, LongStrings held 8-bit characters, and were limited only by available memory. From Delphi 3 .. Delphi 2009, "LongStrings" were synonymous with "AnsiStrings".
Recent versions of Delphi (Delphi 2009 and higher, including the new Delphi XE2) all now default to multi-byte Unicode "WideString" strings. WideStrings, like AnsiStrings, are also effectively "unlimited" in maximum length.
This article explains in more detail:
http://delphi.about.com/od/beginners/l/aa071800a.htm
PS:
Consider using "sizeof(bbil)" and "Packed" for binary records.
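For instance, a minimal sketch of that advice (SizeOf makes the on-disk record size explicit and packed removes compiler-inserted padding; note that reading the existing files would still need the padded 260-byte layout discussed above):
type
  bbil = packed record
    path: string[255]; // 256 bytes: length byte + up to 255 characters
    pos: Byte;
    nr: Word;
  end;

// somewhere early in the program:
Assert(SizeOf(bbil) = 256 + 1 + 2); // packed: exactly 259 bytes per record on disk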
Maybe I'm overlooking something, but the way I see it, your Delphi 3 code is broken too. Try to determine the size of your record:
bbil = record
  path : string;
  pos: byte;
  nr: Word;
end;
path (anything between 1 and 256 bytes - one byte for the length, the rest for the data), pos (1 byte), nr (2 bytes), making your record's data size vary from 1+1+2 = 4 bytes to 256+1+2 = 259 bytes. Under that circumstance you would get garbage from the file in any case, since your program can't know how many bytes to read before actually reading the data. I suggest you fix your record so that the string is of a fixed size, like:
path : string[255];
Then you would be able to write and read the file correctly in both Delphi 3 and Delphi 2010.
I have a curious memory leak; it seems that the library function to_unbounded_string is leaking!
Code snippets:
procedure Parse (Str : in String;
... do stuff...
declare
New_Element : constant Ada.Strings.Unbounded.Unbounded_String :=
Ada.Strings.Unbounded.To_Unbounded_String (Str); -- this leaks
begin
valgrind output:
==6009== 10,276 bytes in 1 blocks are possibly lost in loss record 153 of 153
==6009== at 0x4025BD3: malloc (vg_replace_malloc.c:236)
==6009== by 0x42703B8: __gnat_malloc (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x4269480: system__secondary_stack__ss_allocate (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x414929B: ada__strings__unbounded__to_unbounded_string (in /usr/lib/libgnat-4.4.so.1)
==6009== by 0x80F8AD4: syntax__parser__dash_parser__parseXn (token_parser_g.adb:35)
Where token_parser_g.adb:35 is listed above as the "-- this leaks" line.
Other info: Gnatmake version 4.4.5, gcc version 4.4, valgrind version valgrind-3.6.0.SVN-Debian, with valgrind options -v --leak-check=full --read-var-info=yes --show-reachable=no.
Any help or insights appreciated,
NWS.
Valgrind clearly says that there is possibly a memory leak; it doesn't necessarily mean there is one. For example, if the first call to that function allocates a pool of memory that is re-used during the lifetime of the program but never freed, Valgrind will report it as a possible memory leak, even though it is not one, as this is a common practice and the memory will be returned to the OS upon process termination.
Now, if you think that there really is a memory leak, call this function in a loop and see whether memory continues to grow. If it does, file a bug report, or even better, try to find and fix the leak and send a patch along with the bug report.
Hope it helps.
I was trying to keep this to comments, but what I was saying got too long and started to need formatting.
In Ada, string objects are generally assumed to be perfectly sized. The language provides functions to return the size and bounds of any string. Because of this, string handling in Ada is very different from C, and in fact more closely resembles how you'd do it in a functional language like Lisp.
But the basic principle is that, except in some very unusual situations, if you find yourself using Ada.Strings.Unbounded, you are going about things the wrong way.
The one case where you really can't get around using a variable-length string (or perhaps a buffer with a separate valid_length variable), is when reading strings as input from some external source. As you say, your parsing example is such a situation.
However, even here you should only have that situation on the initial buffer. Your call to your Parse routine should look something like this:
Ada.Text_IO.Get_Line (Buffer, Buffer_Len);
Parse (Buffer(Buffer'first..Buffer'first + Buffer_Len - 1));
Now inside the Parse routine you have a perfectly sized constant Ada string to work with. If for some reason you need to pull out a subslice, you would do the following:
... --// Code to find start and end indices of my subslice
New_Element : constant String := Str(Element_Start .. Element_End);
If you don't actually need to make a copy of that data for some reason though, you are better off just finding Element_Start and Element_End and working with a slice of the original string buffer. Eg:
if Str(Element_Start..Element_End) = "MyToken" then
I know this doesn't answer your question about Ada.Strings.Unbounded possibly leaking. But even if it doesn't leak, that code is relatively wasteful of machine resources (CPU and memory), and probably shouldn't be used for string manipulation unless you really need it.
Are bounded strings scoped?
Expanding on @T.E.D.'s comments, Ada.Strings.Bounded "objects should not be implemented by implicit pointers and dynamic allocation." Instead, the maximum size is fixed when the generic is instantiated. As an implementation detail, GNAT uses a discriminant to specify the maximum size of the string, and a record to store the current size and contents.
In contrast, Ada.Strings.Unbounded requires that "No storage associated with an Unbounded_String object shall be lost upon assignment or scope exit." As an implementation detail, GNAT uses a buffered implementation derived from Ada.Finalization.Controlled. As a result, the memory used by an Unbounded_String may appear to be a leak until the object is finalized, for example when control returns to an enclosing scope.
Is it possible to "wipe" strings in Delphi? Let me explain:
I am writing an application that will include a DLL to authorise users. It will read an encrypted file into an XML DOM, use the information there, and then release the DOM.
It is obvious that the unencrypted XML is still sitting in the memory of the DLL, and is therefore vulnerable to examination. Now, I'm not going to go overboard in protecting this - the user could create another DLL - but I'd like to take a basic step to prevent user names from sitting in memory for ages. However, I don't think I can easily wipe the memory anyway, because of references. If I traverse my DOM (which is a TNativeXML class) and change every string instance into something like "aaaaa", won't that just assign a new string pointer to the DOM reference and leave the old string sitting in memory awaiting re-allocation? Is there a way to be sure I am killing the only and original copy?
Or is there in D2007 a means to tell it to wipe all unused memory from the heap? So I could release the DOM, and then tell it to wipe.
Or should I just get on with my next task and forget this because it is really not worth bothering.
I don't think it is worth bothering with, because if a user can read the memory of the process using the DLL, the same user can also halt the execution at any given point in time. Halting the execution before the memory is wiped will still give the user full access to the unencrypted data.
IMO any user sufficiently interested and able to do what you describe will not be seriously inconvenienced by your DLL wiping the memory.
Two general points about this:
First, this is one of those areas where "if you have to ask, you probably shouldn't be doing this." And please don't take that the wrong way; I mean no disrespect to your programming skills. It's just that writing secure, cryptographically strong software is something that either you're an expert at or you aren't. Very much in the same way that knowing "a little bit of karate" is much more dangerous than knowing no karate at all. There are a number of third-party tools for writing secure software in Delphi which have expert support available; I would strongly encourage anyone without a deep knowledge of cryptographic services in Windows, the mathematical foundations of cryptography, and experience in defeating side channel attacks to use them instead of attempting to "roll their own."
To answer your specific question: The Windows API has a number of functions which are helpful, such as CryptProtectMemory. However, this will bring a false sense of security if you encrypt your memory, but have a hole elsewhere in the system, or expose a side channel. It can be like putting a lock on your door but leaving the window open.
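For illustration, a rough sketch of calling CryptProtectMemory from Delphi (the import is hand-written here since it is not declared in the D2007 RTL; double-check the declaration and constants against the SDK headers before relying on it):
uses Windows, SysUtils;

const
  CRYPTPROTECTMEMORY_BLOCK_SIZE   = 16; // the buffer length must be a multiple of this
  CRYPTPROTECTMEMORY_SAME_PROCESS = 0;

function CryptProtectMemory(pDataIn: Pointer; cbDataIn, dwFlags: DWORD): BOOL; stdcall;
  external 'crypt32.dll';
function CryptUnprotectMemory(pDataIn: Pointer; cbDataIn, dwFlags: DWORD): BOOL; stdcall;
  external 'crypt32.dll';

procedure ProtectBuffer(var Buf: array of Byte);
begin
  // The caller must have padded the buffer to a multiple of the block size.
  Assert(Length(Buf) mod CRYPTPROTECTMEMORY_BLOCK_SIZE = 0);
  if not CryptProtectMemory(@Buf[0], Length(Buf), CRYPTPROTECTMEMORY_SAME_PROCESS) then
    RaiseLastOSError;
end;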
How about something like this?
procedure WipeString(const str: String);
var
  i: Integer;
  iSize: Integer;
  pData: PChar;
begin
  iSize := Length(str);
  pData := PChar(str);
  for i := 0 to 7 do
  begin
    ZeroMemory(pData, iSize);
    FillMemory(pData, iSize, $FF); // 1111 1111
    FillMemory(pData, iSize, $AA); // 1010 1010
    FillMemory(pData, iSize, $55); // 0101 0101
    ZeroMemory(pData, iSize);
  end;
end;
DLLs don't own allocated memory, processes do. The memory allocated by your specific process will be discarded once the process terminates, whether the DLL hangs around (because it is in use by another process) or not.
How about decrypting the file to a stream, using a SAX processor instead of an XML DOM to do your verification and then overwriting the decrypted stream before freeing it?
If you use the FastMM memory manager in Full Debug mode, then you can force it to overwrite memory when it is being freed.
Normally that behaviour is used to detect wild pointers, but it can also be used for what you want.
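A minimal sketch of that setup (the FullDebugMode define lives in FastMM4Options.inc; the project name below is hypothetical):
program AuthDemo; // hypothetical project name

uses
  FastMM4, // must be listed first so it installs itself as the memory manager;
           // enable FullDebugMode in FastMM4Options.inc so freed blocks are overwritten
  SysUtils;

begin
  // ... decrypt, use and free the XML DOM here; every freed block is overwritten
  // with FastMM's debug fill pattern instead of being handed back untouched.
end.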
On the other hand, make sure you understand what Craig Stuntz writes: do not write this authentication and authorization stuff yourself, use the underlying operating system whenever possible.
BTW: Hallvard Vassbotn wrote a nice blog about FastMM:
http://hallvards.blogspot.com/2007/05/use-full-fastmm-consider-donating.html
Regards,
Jeroen Pluimers
Messy, but you could make a note of the heap size you've used while the heap is filled with sensitive data; then, when that data is released, do a GetMem to allocate a large chunk spanning (say) 200% of that, do a Fill on that chunk, and assume that any fragmentation is unlikely to be of much use to an examiner.
Bri
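A rough sketch of the heap-overwrite idea above (GetHeapStatus/THeapStatus are the D2007-era heap statistics API; note that a single large GetMem may be served by fresh pages from the OS rather than by the freed fragments, so treat this as illustrative only):
var
  Status: THeapStatus;
  P: Pointer;
  Size: Cardinal;
begin
  Status := GetHeapStatus;            // noted while the sensitive data was still allocated
  Size := Status.TotalAllocated * 2;  // span roughly 200% of that
  GetMem(P, Size);
  try
    FillChar(P^, Size, 0);            // overwrite whatever the memory manager hands back
  finally
    FreeMem(P);
  end;
end;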
How about keeping the password as a hash value in the XML, and verifying by comparing the hash of the input password with the hashed password in your XML?
Edit: You can keep all the sensitive data encrypted and decrypt it only at the last possible moment.
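A minimal sketch of that hash check, assuming a hypothetical HashPassword function (any one-way hash will do):
uses SysUtils;

function PasswordMatches(const Entered, StoredHashHex: string): Boolean;
begin
  // Compare hash against hash; the plaintext password never needs to be stored in the XML.
  Result := SameText(HashPassword(Entered), StoredHashHex); // HashPassword is hypothetical
end;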
Would it be possible to load the decrypted XML into an array of char or byte rather than a string? Then there would be no copy-on-write handling, so you would be able to backfill the memory with #0s before freeing it.
Be careful if assigning an array of char to a string, as Delphi has some smart handling here for compatibility with the traditional packed array[1..x] of char.
Also, could you use ShortString?
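A minimal sketch of that byte-array approach, assuming hypothetical DecryptToBytes and LoadDomFromBuffer helpers:
var
  Plain: array of Byte;
begin
  Plain := DecryptToBytes(EncryptedFile);        // hypothetical: plaintext XML as bytes, not a string
  try
    LoadDomFromBuffer(@Plain[0], Length(Plain)); // hypothetical: feed the parser directly from the buffer
  finally
    if Length(Plain) > 0 then
      FillChar(Plain[0], Length(Plain), 0);      // backfill with #0 before the array goes away
    Plain := nil;
  end;
end;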
If you're using XML, even encrypted, to store passwords, you're putting your users at risk. A better approach would be to store hash values of the passwords instead, and then compare the hash of the entered password against the stored hash. The advantage of this approach is that even knowing the hash value, you won't know the password that produces it. Adding a brute-force deterrent (count invalid password attempts, and lock the account after a certain number) will increase security even further.
There are several methods you can use to create a hash of a string. A good starting point would be to look at the TurboPower open source project "LockBox"; I believe it has several examples of creating one-way hash keys.
EDIT
But how does knowing the hash value help if it's one-way? If you're really paranoid, you can modify the hash value by something predictable that only you would know... say, a random number using a specific seed value plus the date. You could then store only enough of the hash in your XML to use it as a starting point for comparison. The nice thing about pseudo-random number generators is that they always generate the same series of "random" numbers given the same seed.
Be careful of functions that treat a string as a pointer and then use FillChar or ZeroMemory to wipe the string contents.
This is both wrong (strings are shared; you're clobbering someone else who is currently using the string)
and can cause an access violation (if the string happens to be a constant, it is sitting in a read-only data page in the process address space, and trying to write to it is an access violation).
procedure BurnString(var s: UnicodeString); overload;
begin
  {
    If the string is actually constant (reference count of -1), then any attempt to burn it will be
    an access violation, as the memory is sitting in a read-only data page.
    But Delphi provides no supported way to get the reference count of a string.
    It's also an issue if someone else is currently using the string (i.e. reference count > 1).
    If the string were only referenced by the caller (with a reference count of 1), then
    our function here, which received the string through a var reference, would also see the string with
    a reference count of one.

    Either way, we can only burn the string if there's no other reference.
    The use of UniqueString, while counter-intuitive, is the best approach.

    If you pass an unencrypted password to BurnString as a var parameter, and there were another reference,
    the string would still contain the password on exit. You can argue: what's the point of making a *copy*
    of a string only to burn the copy? Two things:
    - if you're debugging it, the string you passed will now be burned (i.e. your local variable will be empty)
    - most of the time the RefCount will be 1, and when the RefCount is one, UniqueString does nothing,
      so we *are* burning the only string
  }
  if Length(s) > 0 then
  begin
    System.UniqueString(s); //ensure the passed-in string has a reference count of one
    ZeroMemory(Pointer(s), System.Length(s)*SizeOf(WideChar));

    {
      By not calling UniqueString, we would only save on a memory allocation and wipe when RefCnt <> 1.
      It's an unsafe micro-optimization because we're using undocumented offsets to the reference count,
      and I'm really uncomfortable using it because it really is undocumented.
      It is absolutely a given that it won't change, and we'd have stopped using Delphi long before
      it changes. But I just can't do it.
    }
    //if PLongInt(PByte(S) - 8)^ = 1 then //RefCnt=1
    //  ZeroMemory(Pointer(s), System.Length(s)*SizeOf(WideChar));

    s := ''; //We want the caller to see their passed string come back as empty (even if it was shared with other variables)
  end;
end;
Once you have the UnicodeString version, you can create the AnsiString and WideString versions:
procedure BurnString(var s: AnsiString); overload;
begin
  if Length(s) > 0 then
  begin
    System.UniqueString(s);
    ZeroMemory(Pointer(s), System.Length(s)*SizeOf(AnsiChar));
    //if PLongInt(PByte(S) - 8)^ = 1 then //RefCount=1
    //  ZeroMemory(Pointer(s), System.Length(s)*SizeOf(AnsiChar));
    s := '';
  end;
end;
procedure BurnString(var s: WideString); overload;
begin
  //WideStrings (i.e. COM BSTRs) are not reference counted, but they are modifiable
  if Length(s) > 0 then
  begin
    ZeroMemory(Pointer(s), System.Length(s)*SizeOf(WideChar));
    s := '';
  end;
end;