Strings in Delphi are located in dynamic memory.
How can I calculate the actual memory (in bytes) used by a string variable?
I know the string must store some additional information, at least a reference count and a length, but how many bytes does it use besides the characters?
var
S: string;
I am using Delphi 2010, XE, and XE2.
The layout on 32-bit Unicode Delphi, taken from the official Embarcadero documentation, is like this (offsets are relative to the start of the character data):

Offset  Contents
-12     Code page (2 bytes)
-10     Element size (2 bytes)
-8      Reference count (4 bytes)
-4      Length in characters (4 bytes)
0       Character data, followed by a null terminator

Note that there's an additional Longint field in the 64-bit version, for 16-byte alignment of the payload. The StrRec record in System.pas looks like this:
StrRec = packed record
{$IF defined(CPUX64)}
  _Padding: LongInt; // Make 16 byte align for payload..
{$IFEND}
  codePage: Word;
  elemSize: Word;
  refCnt: Longint;
  length: Longint;
end;
The payload is always 2*(Length+1) bytes in size. The overhead is 12 or 16 bytes, for 32-bit or 64-bit targets respectively. Note that the actual memory block may be larger than needed, as determined by the memory manager.
Finally, there has been much misinformation around this question. On 64-bit targets, strings are still indexed by 32-bit signed integers.
For String specifically, you can use SysUtils.ByteLength() to get the byte length of the character data. If the result is not zero, increment it by SizeOf(System.StrRec) (the header in front of the character data) plus SizeOf(Char) (for the null terminator, which is not included in the length), e.g.:
var
  S: string;
  len: Integer;
begin
  S := ...;
  len := ByteLength(S);
  if len > 0 then
    Inc(len, SizeOf(StrRec) + SizeOf(Char));
end;
On the other hand, if you want to calculate the byte size of other string types, like AnsiString, AnsiString(N) (such as UTF8String), RawByteString, etc., you need to use System.StringElementSize() instead, e.g.:
var
  S: SomeStringType;
  len: Integer;
begin
  S := ...;
  len := Length(S) * StringElementSize(S);
  if len > 0 then
    Inc(len, SizeOf(StrRec) + StringElementSize(S));
end;
In either case, the reason you only increment the length when the string has characters in it is that empty strings do not take up any memory at all; they are nil pointers.
To answer the question:
How to calculate actual memory (in bytes) used by string variable?
MemSize  = Overhead + CharSize * (Length + 1)
CharSize = 1  // for ANSI strings
CharSize = 2  // for Unicode strings
Overhead = 12 // for 32-bit targets
Overhead = 16 // for 64-bit targets
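Plugging numbers into this formula is straightforward. As a quick sanity check, here is the arithmetic as a small Go sketch (the helper name delphiStrMemSize is made up for illustration; the constants come from the overhead and character sizes stated above):

```go
package main

import "fmt"

// delphiStrMemSize returns the heap bytes used by a non-empty Delphi
// string: the header overhead plus CharSize*(Length+1) for the payload
// and its null terminator. Hypothetical helper, for illustration only.
func delphiStrMemSize(length, charSize, overhead int) int {
	return overhead + charSize*(length+1)
}

func main() {
	// A 10-character Unicode string (CharSize = 2):
	fmt.Println(delphiStrMemSize(10, 2, 12)) // 34 bytes on 32-bit targets
	fmt.Println(delphiStrMemSize(10, 2, 16)) // 38 bytes on 64-bit targets
}
```

Remember that an empty string is a nil pointer and uses no heap memory at all, so the formula only applies when Length > 0.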
In the code below:
c := "fool"
d := []byte("fool")
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c)) // 16 bytes
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d)) // 24 bytes
To decide the data type needed to receive JSON data from CloudFoundry, I am testing the sample code above to understand the memory allocation for []byte vs. the string type.
The expected size of the string variable c is 1 byte x 4 ASCII-encoded letters = 4 bytes, but the size shows 16 bytes.
For the byte-slice variable d, Go embeds the string literal in the executable and converts it to a byte slice at runtime using the runtime.stringtoslicebyte function; something like []byte{102, 111, 111, 108}.
The expected size of d is again 1 byte x 4 ASCII values = 4 bytes, but the size of variable d shows 24 bytes, its underlying array capacity.
Why is the size of both variables not 4 bytes?
Both slices and strings in Go are struct-like headers:
reflect.SliceHeader:
type SliceHeader struct {
Data uintptr
Len int
Cap int
}
reflect.StringHeader:
type StringHeader struct {
Data uintptr
Len int
}
The sizes reported by unsafe.Sizeof() are the sizes of these headers, excluding the size of the pointed-to arrays:
Sizeof takes an expression x of any type and returns the size in bytes of a hypothetical variable v as if v was declared via var v = x. The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
To get the actual ("recursive") size of some arbitrary value, use Go's builtin testing and benchmarking framework. For details, see How to get memory size of variable in Go?
For strings specifically, see String memory usage in Golang. The complete memory required by a string value can be computed like this:
var str string = "some string"
stringSize := len(str) + int(unsafe.Sizeof(str))
I was optimizing code that uses a map[string]string where the value is only ever "A" or "B". I figured a map[string]bool would obviously be better, as the map holds around 50 million elements.
var a = "a"
var a2 = "Why This ultra long string take the same amount of space in memory as 'a'"
var b = true
var c = map[string]string{}
var d = map[string]bool{}
c["t"] = "A"
d["t"] = true
fmt.Printf("a: %T, %d\n", a, unsafe.Sizeof(a))
fmt.Printf("a2: %T, %d\n", a2, unsafe.Sizeof(a2))
fmt.Printf("b: %T, %d\n", b, unsafe.Sizeof(b))
fmt.Printf("c: %T, %d\n", c, unsafe.Sizeof(c))
fmt.Printf("d: %T, %d\n", d, unsafe.Sizeof(d))
fmt.Printf("c2: %T, %d\n", c, unsafe.Sizeof(c["t"]))
fmt.Printf("d2: %T, %d\n", d, unsafe.Sizeof(d["t"]))
And the result was:
a: string, 8
a2: string, 8
b: bool, 1
c: map[string]string, 4
d: map[string]bool, 4
c2: map[string]string, 8
d2: map[string]bool, 1
While testing I found something weird: why does a2, holding a really long string, use 8 bytes, the same as a, which holds only one letter?
unsafe.Sizeof() does not recursively go into data structures; it just reports the "shallow" size of the value passed. Quoting from its doc:
The size does not include any memory possibly referenced by x. For instance, if x is a slice, Sizeof returns the size of the slice descriptor, not the size of the memory referenced by the slice.
Maps in Go are implemented as pointers, so unsafe.Sizeof(somemap) will report the size of that pointer.
Strings in Go are just headers containing a pointer and a length. See reflect.StringHeader:
type StringHeader struct {
Data uintptr
Len int
}
So unsafe.Sizeof(somestring) will report the size of the above struct, which is independent of the length of the string value (which is the value of the Len field).
To get the actual memory requirement of a map ("deeply"), see How much memory do golang maps reserve? and also How to get memory size of variable in Go?
Go stores the UTF-8 encoded byte sequence of a string value in memory. The builtin function len() reports the byte length of a string, so basically the memory required to store a string value is:
var str string = "some string"
stringSize := len(str) + int(unsafe.Sizeof(str))
Also don't forget that a string value may be constructed by slicing another, bigger string; even if the original string is no longer referenced (and thus no longer needed), its bigger backing array will still be kept in memory for the smaller string slice.
For example:
s := "some loooooooong string"
s2 := s[:2]
Here, even though the memory requirement for s2 would be len(s2) + unsafe.Sizeof(s2) = 2 + unsafe.Sizeof(s2), the whole backing array of s will still be retained.
I am using the TComPort library. To send a request to the device, I need to send the command as hexadecimal data, like the example below:
procedure TForm1.Button3Click(Sender: TObject);
begin
ComPort1.WriteStr(#$D5#$D5);
end;
But that is a hardcoded example.
How can I convert S into a valid value for ComPort1.WriteStr?
procedure TForm1.Button3Click(Sender: TObject);
var
  S: String;
begin
  Edit1.Text := 'D5 D5';
  ComPort1.WriteStr(Edit1.Text);
end;
You don't send actual hex strings over the port; that is just a way to encode binary data in source code at compile time. #$D5#$D5 encodes 2 string characters, each with a numeric value of 213 (depending on {$HIGHCHARUNICODE}, which is OFF by default).
TComPort.WriteStr() expects actual bytes to send, not hex strings. If you want the user to enter hex strings that you then send as binary data, look at Delphi's HexToBin() function for that conversion.
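For comparison, the conversion HexToBin performs (hex text in, raw bytes out) can be sketched in Go like this; the hexTextToBytes helper name and the space-separated input format are assumptions for illustration:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// hexTextToBytes turns user-entered hex text such as "D5 D5" into the
// raw bytes to transmit, mirroring what Delphi's HexToBin() does.
func hexTextToBytes(s string) ([]byte, error) {
	// Strip separators, then decode each hex pair into one raw byte.
	return hex.DecodeString(strings.ReplaceAll(s, " ", ""))
}

func main() {
	raw, err := hexTextToBytes("D5 D5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("% X (%d bytes)\n", raw, len(raw)) // D5 D5 (2 bytes)
}
```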
That being said, note that string in Delphi 2009+ is 16-bit Unicode, not 8-bit ANSI. You should use TComPort.Write() to send binary data instead of TComPort.WriteStr(), e.g.:
procedure TForm1.Button3Click(Sender: TObject);
var
  buf: array[0..1] of Byte;
begin
  buf[0] := $D5;
  buf[1] := $D5;
  ComPort1.Write(buf, 2);
end;
However, TComPort.WriteStr() will accept a 16-bit Unicode string and transmit it as an 8-bit binary string by simply stripping off the upper 8 bits of each Char. So, if you send a string containing two Char($D5) values, it will be sent as the 2 bytes $D5 $D5.
Environment: Win7 64bit, Delphi 2010, Win32 project.
I am trying to get integer hash values for a set of strings with the help of the BobJenkinsHash() function from Generics.Defaults.
It works, but some points are not clear to me.
Can the result of the function be negative?
As I see on the source site, uint32_t is used as the result type of the hashword() function:
uint32_t hashword(
const uint32_t *k, /* the key, an array of uint32_t values */
size_t length, /* the length of the key, in uint32_ts */
uint32_t initval) /* the previous hash, or an arbitrary value */
{
Is that an unsigned int?
My second question: I get different results for different strings with identical values:
'DEFPROD001' => 759009858
'DEFPROD001' => 1185633302
Is this normal behaviour?
My full function to calculate the hash (if the first argument is empty then the second is returned):
function TAmWriterJD.ComposeID(const defaultID: string; const GUID: String): String;
var
  bjh: Integer;
begin
  if defaultID = '' then
  begin
    Result := GUID
  end
  else
  begin
    bjh := BobJenkinsHash(defaultID, Length(defaultID) * SizeOf(defaultID), 0);
    Result := IntToStr(bjh);
  end;
end;
The Delphi implementation is declared like so:
function BobJenkinsHash(const Data; Len, InitData: Integer): Integer;
It returns a signed 32 bit integer. So yes, this implementation can return negative values.
The C implementation you refer to returns an unsigned 32 bit integer. So that cannot return negative values.
Assuming both implementations are correct then they will, given the same input, return the same 32 bits of output. It's just that when interpreted as signed or unsigned values these bits have different meaning.
As to your second question, passing the same string to the hash function will yield the same hash. You must have made a mistake in your test case.
BobJenkinsHash(defaultID, Length(defaultID) * SizeOf(defaultID), 0);
Here defaultID is a string variable, and that is implemented as a pointer. You are therefore hashing the address rather than the string's characters, and not even doing that correctly, due to the incorrect length argument (SizeOf(defaultID) is the size of the pointer, not of a Char). Instead you need to write:
BobJenkinsHash(Pointer(defaultID)^, Length(defaultID) * SizeOf(Char), 0);
This program demonstrates:
{$APPTYPE CONSOLE}

uses
  System.Generics.Defaults;

var
  s, t: string;

begin
  s := 'DEFPROD001';
  t := 'DEFPROD001';
  Writeln(BobJenkinsHash(s, Length(s) * SizeOf(s), 0));
  Writeln(BobJenkinsHash(t, Length(t) * SizeOf(t), 0));
  Writeln(BobJenkinsHash(Pointer(s)^, Length(s) * SizeOf(Char), 0));
  Writeln(BobJenkinsHash(Pointer(t)^, Length(t) * SizeOf(Char), 0));
  Readln;
end.
Output:
2129045826
-331457644
-161666357
-161666357
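The underlying lesson ports to any language: hash the character data, not the variable holding the pointer. A Go sketch using the standard FNV-1a hash (not Bob Jenkins' function, but the determinism argument is the same):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashString hashes the character data of s, so identical contents
// always produce identical hashes regardless of where s is stored.
func hashString(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s)) // hash the bytes, not the string header
	return h.Sum32()
}

func main() {
	s := "DEFPROD001"
	t := "DEFPROD001"
	fmt.Println(hashString(s) == hashString(t)) // true: same contents, same hash
}
```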
I have a function that takes a pointer to a series of chars.
I wish to copy 256 chars from that point and place them in a string.
msg is not null-terminated.
The code below seems to give me some problems.
Is there a correct way to do this?
procedure Init(msg: PChar);
var
  myStr: String;
  i: Integer;
begin
  for i := 1 to 256 do
  begin
    myStr[i] := msg[i-1];
  end;
end;
SetString(myStr, msg, 256);
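For comparison, the "copy exactly N characters from an unterminated buffer" operation that SetString performs looks like this in Go (the takeN helper is a made-up name for illustration):

```go
package main

import "fmt"

// takeN copies exactly n bytes out of a buffer that has no terminator,
// the same job SetString(myStr, msg, n) does in Delphi.
func takeN(buf []byte, n int) string {
	return string(buf[:n]) // the conversion copies the bytes
}

func main() {
	buf := []byte("fool and more") // unterminated data beyond what we want
	fmt.Println(takeN(buf, 4))     // fool
}
```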
Your code misses SetLength; the corrected version is:
procedure Init(msg: PChar);
var
  myStr: String;
  i: Integer;
begin
  SetLength(myStr, 256);
  for i := 1 to 256 do
  begin
    myStr[i] := msg[i-1];
  end;
end;
The assignment can be done more efficiently, as already answered.
Updated
SetLength allocates 256 characters plus a terminating #0 for myStr; without SetLength your code is an error: it writes to a wild address and will eventually result in an access violation.
If msg is null-terminated, as it should be, and the 256 characters you want to obtain are followed by the null character, simply do
myStr := msg;
If msg is longer than that, you could just do
myStr := Copy(msg, 1, 256);
In this case, a better method is
myStr := WideCharLenToString(msg, 256);
assuming you are using Delphi 2009 or later, in which the strings are Unicode.