Creating string from byte array

I'm trying to create a string using the StringOf function in the code below. Why does ShowMessage display nothing after I call ZeroMemory on the array that was used to create the string? When the ZeroMemory call is commented out, ===== is displayed.
type
  TIdBytes = array of Byte;
procedure fill(var b: TIdBytes);
begin
  SetLength(b, 5);
  b[0] := 61;
  b[1] := 61;
  b[2] := 61;
  b[3] := 61;
  b[4] := 61;
end;
procedure TMainForm.FormCreate(Sender: TObject);
var
  POSSaleTransaction: TPOSSaleTransaction;
  s: AnsiString;
  b: TIdBytes;
begin
  fill(b);
  s := StringOf(TArray<Byte>(b));
  ZeroMemory(@b, Length(b));
  ShowMessage(s);
end;
I'm using Delphi XE4
The reason I'm calling ZeroMemory is that I want to be 100% sure that the newly created string does not reference the byte array but copies b's data. With ZeroMemory I'm deleting the contents of b, expecting that this will not affect the string.

ZeroMemory does not free memory. It writes zero bytes into the block of memory that you provide.
Even then, your code gets it wrong. In your code, b is a pointer to the dynamic array. You pass @b to ZeroMemory, so you are zeroising the pointer rather than the array that it points to. And since the byte count that you pass is greater than SizeOf(b), you are zeroising other parts of the stack too. That's why your call to ZeroMemory is destroying your string.
To zeroise the memory you would write:
ZeroMemory(Pointer(b), Length(b));
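The difference between zeroing the pointer and zeroing what it points to can be shown with a C analogy. This is only an analogy: Delphi dynamic arrays also carry a hidden reference count and length that plain C buffers lack. The sketch below is a minimal one, and the helper names copy_bytes and zero_contents are hypothetical. It copies the five '=' bytes (61) into an independent buffer and then zeroes the original through the pointer, which is the equivalent of ZeroMemory(Pointer(b), Length(b)).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy n bytes into a fresh allocation plus a terminating
   zero, roughly what converting the bytes into a string has to do. */
static char *copy_bytes(const unsigned char *buf, size_t n) {
    char *copy = malloc(n + 1);
    if (copy == NULL) return NULL;
    memcpy(copy, buf, n);
    copy[n] = '\0';
    return copy;
}

/* Hypothetical helper: zero the n bytes the pointer points to.
   This is the analogue of ZeroMemory(Pointer(b), Length(b)). */
static void zero_contents(unsigned char *buf, size_t n) {
    memset(buf, 0, n);
}

/* The buggy ZeroMemory(@b, Length(b)) corresponds to memset(&buf, 0, n):
   it zeroes the pointer variable itself and, when n > sizeof buf, whatever
   lies next to it on the stack. That is undefined behaviour, so it is not run here. */
```

After zero_contents runs, the buffer is all zeros, but the copy still reads "=====". The copy owns its own memory and never looked back at the original buffer.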
If you want to delete a dynamic array then you can write
b := nil;
or
Finalize(b);
or
SetLength(b, 0);
The reason I'm trying to use ZeroMemory is that I want to be 100% sure that newly created string is not using reference to the byte array, but is a copy of it.
You don't need to write any code to prove that. You can be sure because a Delphi string is UTF-16 encoded, and your byte array uses an 8 bit encoding. So even if the RTL designers wanted to take a reference to the byte array, it would not have been possible.

You just blew away the stack variables there: the b pointer in its entirety, and part of the pointer s ((Length(b) - SizeOf(b)) bytes of it).
What is b? It is a more complex structure: a handle, a pointer. Usually you do not want to destroy the memory structure; you want to put data into its cells. But in your example you wiped out the memory structures allocated on the stack, including, probably, the string itself.
The following program works as expected in Delphi XE2; note what it uses instead of ZeroMemory. Before using low-level tricks such as raw pointers (or untyped parameters, as in ZeroMemory), read up on what dynamic arrays are in Delphi and how they are laid out in memory from the CPU/assembler point of view.
program Project11;
{$APPTYPE CONSOLE}
{$R *.res}
uses
System.SysUtils;
procedure fill(var b: TBytes);
begin
SetLength(b,5);
// b[0]:=61; b[1]:=61; b[2]:=61; b[3]:=61; b[4]:=61;
FillChar(b[Low(b)], Length(b), 61); // Less copy-paste, more program structure
// Notice that above I pass the first cell inside the array,
// not the array container (the pointer variable) itself.
// That is both safer and documents the intention of the code.
end;
Procedure SOTest();
var
s: ansistring ;
b: TBytes;
begin
fill(b);
s := StringOf( b );
// ZeroMemory(@b, Length(b)); -- destroys the pointer instead of freeing the memory, which is a memory leak
// FillChar(b, Length(b), 0); -- same as above, written in Pascal style rather than C style.
b := nil; // this really does free the dynamic array. Laconic, but prone to errors if mistyped.
// SetLength(b, 0); -- more conventional and safe method to do the same: free string or dyn-array.
// Anyway that is unnecessary - both b and s would anyway be auto-freed before the function exit.
Writeln(Length(s):4, ' ', s);
end;
begin
try
SOTest;
Write('Press Enter to exit;'); ReadLn;
except
on E: Exception do
Writeln(E.ClassName, ': ', E.Message);
end;
end.
See the manuals:
http://docwiki.embarcadero.com/Libraries/XE4/en/System.FillChar
http://docwiki.embarcadero.com/RADStudio/XE4/en/Parameters_(Delphi)#Untyped_Parameters
http://docwiki.embarcadero.com/RADStudio/XE4/en/Structured_Types#Dynamic_Arrays
http://docwiki.embarcadero.com/Libraries/XE4/en/System.SysUtils.TBytes
http://docwiki.embarcadero.com/Libraries/XE4/en/System.SysUtils.StringOf
So the next question is: why were you trying to call ZeroMemory at all? What is the point? Is it an attempt to destroy a cipher key or other sensitive data? http://www.catb.org/~esr/faqs/smart-questions.html#goal
If you only want to ensure that the s variable does not have any external references, there is a special function for that: UniqueString.
http://docwiki.embarcadero.com/Libraries/XE4/en/System.UniqueString
However, in this particular workflow and this particular Delphi version that cannot happen anyway. Reread the manual for StringOf: it returns a UnicodeString held in a hidden temporary variable. In XE4 that string is UTF-16 encoded, with 2 bytes per character, so the original byte chain cannot be used as-is and is transformed into a new buffer.
After that, you convert the hidden UnicodeString temporary into the AnsiString variable s, which has one byte per character. So s cannot reference the temporary either; yet another independent buffer is allocated to hold the transformed data.
As you can see, there are two necessary copy-with-transformation operations, and each of them makes keeping a reference to the original data impossible.
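The two transformations can be sketched in C to show why each one needs its own buffer. This is a minimal sketch under a simplifying assumption: the input is ASCII/Latin-1, so every byte maps to exactly one 16-bit code unit. The real StringOf conversion goes through a code page. The function names widen and narrow are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* 8-bit -> 16-bit (like StringOf producing a UTF-16 UnicodeString).
   The result is twice as wide, so it cannot share the source buffer.
   Assumes ASCII/Latin-1 input: one byte becomes one code unit. */
static uint16_t *widen(const unsigned char *src, size_t n) {
    uint16_t *dst = malloc((n + 1) * sizeof *dst);
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < n; i++) dst[i] = src[i];
    dst[n] = 0;
    return dst;
}

/* 16-bit -> 8-bit (like assigning that UnicodeString to an AnsiString).
   Again a new buffer. Code units above 0xFF become '?' in this simplified sketch. */
static char *narrow(const uint16_t *src, size_t n) {
    char *dst = malloc(n + 1);
    if (dst == NULL) return NULL;
    for (size_t i = 0; i < n; i++) dst[i] = src[i] <= 0xFF ? (char)src[i] : '?';
    dst[n] = '\0';
    return dst;
}
```

After the round trip, zeroing the original bytes leaves both the wide and the narrow copies intact, which is the independence the question set out to prove.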

You probably want to do this:
ZeroMemory(@b[0], Length(b));
instead of
ZeroMemory(@b, Length(b));
Remember that the b variable is only a pointer (4 bytes in a 32-bit program) that points to the array of bytes.

Related

What are the possible consequences of using unsafe conversion from []byte to string in go?

The preferred way of converting []byte to string is this:
var b []byte
// fill b
s := string(b)
In this code the byte slice is copied, which can be a problem in situations where performance is important.
When performance is critical, one can consider performing the unsafe conversion:
var b []byte
// fill b
s := *(*string)(unsafe.Pointer(&b))
My question is: what can go wrong when using the unsafe conversion? I know that strings should be immutable, and that if we change b, s will also change. But so what? Is that the only bad thing that can happen?
Modifying something that the language spec guarantees to be immutable is an act of treason.
Since the spec guarantees that strings are immutable, compilers are allowed to generate code that caches their values and does other optimization based on this. You can't change values of strings in any normal way, and if you resort to dirty ways (like package unsafe) to still do it, you lose all the guarantees provided by the spec, and by continuing to use the modified strings, you may bump into "bugs" and unexpected things randomly.
For example if you use a string as a key in a map and you change the string after you put it into the map, you might not be able to find the associated value in the map using either the original or the modified value of the string (this is implementation dependent).
To demonstrate this, see this example:
package main

import (
	"fmt"
	"strconv"
	"unsafe"
)

func main() {
	m := map[string]int{}
	b := []byte("hi")
	s := *(*string)(unsafe.Pointer(&b))
	m[s] = 999
	fmt.Println("Before:", m)
	b[0] = 'b'
	fmt.Println("After:", m)
	fmt.Println("But it's there:", m[s], m["bi"])
	for i := 0; i < 1000; i++ {
		m[strconv.Itoa(i)] = i
	}
	fmt.Println("Now it's GONE:", m[s], m["bi"])
	for k, v := range m {
		if k == "bi" {
			fmt.Println("But still there, just in a different bucket: ", k, v)
		}
	}
}
Output (try it on the Go Playground):
Before: map[hi:999]
After: map[bi:<nil>]
But it's there: 999 999
Now it's GONE: 0 0
But still there, just in a different bucket: bi 999
At first we just see a weird result: a simple Println() is not able to find the value. It sees something (the key is found), but the value is displayed as <nil>, which is not even a valid value for the value type int (the zero value for int is 0).
If we grow the map (we add 1000 elements), the internal data structure of the map is reorganized. After this, we can't even find our value by explicitly asking for it with the appropriate key. It is still in the map, because iterating over all its key-value pairs finds it. But the hash code changes when the value of the string changes, so the key is most likely searched for in a different bucket from the one where it is stored (or where it should be).
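The underlying mechanism is that a hash table computes a key's hash once, at insertion time, and later lookups rely on that cached hash. A toy C table makes this visible. It is a minimal sketch and NOT Go's map implementation. Like the unsafe conversion, the table stores the key pointer without copying it, so mutating the buffer silently changes the key behind the cached hash. The names fnv1a, table_put and table_get are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_ENTRIES 16

typedef struct { const char *key; uint32_t hash; int value; } Entry;
typedef struct { Entry entries[MAX_ENTRIES]; int count; } Table;

/* 32-bit FNV-1a hash of a zero-terminated string. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 16777619u; }
    return h;
}

/* Stores the key pointer as-is (no copy) and caches its hash. */
static int table_put(Table *t, const char *key, int value) {
    if (t->count == MAX_ENTRIES) return 0;
    Entry *e = &t->entries[t->count++];
    e->key = key;
    e->hash = fnv1a(key);
    e->value = value;
    return 1;
}

/* Compares the cached hash first, then the key contents. Returns 1 if found. */
static int table_get(const Table *t, const char *key, int *out) {
    uint32_t h = fnv1a(key);
    for (int i = 0; i < t->count; i++) {
        const Entry *e = &t->entries[i];
        if (e->hash == h && strcmp(e->key, key) == 0) { *out = e->value; return 1; }
    }
    return 0;
}
```

After the buffer changes from "hi" to "bi", neither key can be found. "hi" still matches the cached hash but no longer matches the contents, and "bi" matches the contents but not the cached hash. The entry is still present if you scan the table directly, just like the Go map iteration above.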
Also note that code using package unsafe may work as you expect it now, but the same code might work completely differently (meaning it may break) with a future (or old) version of Go as "packages that import unsafe may be non-portable and are not protected by the Go 1 compatibility guidelines".
Also you may run into unexpected errors as the modified string might be used in different ways. Someone might just copy the string header, someone may copy its content. See this example:
package main

import (
	"fmt"
	"unsafe"
)

func main() {
	b := []byte{'h', 'i'}
	s := *(*string)(unsafe.Pointer(&b))
	s2 := s                  // Copy string header
	s3 := string([]byte(s))  // New string header but same content
	fmt.Println(s, s2, s3)
	b[0] = 'b'
	fmt.Println(s == s2)
	fmt.Println(s == s3)
}
We created 2 new local variables, s2 and s3, from s. s2 is initialized by copying the string header of s, and s3 is initialized with a new string value (a new string header) but the same content. Now, if you modify the original s (through b), a correct program should give the same result when comparing each new string with the original. That result could be true or false (depending on whether values were cached), but it should be the same for both.
But the output is (try it on the Go Playground):
hi hi hi
true
false

What exactly are strings in Nim?

From what I understand, strings in Nim are basically a mutable sequence of bytes and that they are copied on assignment.
Given that, I assumed that sizeof would tell me (like len) the number of bytes, but instead it always gives 8 on my 64-bit machine, so it seems to be holding a pointer.
Given that, I have the following questions...
What was the motivation behind copy on assignment? Is it because they're mutable?
Is there ever a time when it isn't copied when assigned? (I assume non-var function parameters don't copy. Anything else?)
Are they optimized such that they only actually get copied if/when they're mutated?
Is there any significant difference between a string and a sequence, or can the answers to the above questions be equally applied to all sequences?
Anything else in general worth noting?
Thank you!
The definition of strings actually is in system.nim, just under another name:
type
  TGenericSeq {.compilerproc, pure, inheritable.} = object
    len, reserved: int
  PGenericSeq {.exportc.} = ptr TGenericSeq
  UncheckedCharArray {.unchecked.} = array[0..ArrayDummySize, char]
  # len and space without counting the terminating zero:
  NimStringDesc {.compilerproc, final.} = object of TGenericSeq
    data: UncheckedCharArray
  NimString = ptr NimStringDesc
So a string is a raw pointer to an object with a len, reserved and data field. The procs for strings are defined in sysstr.nim.
The semantics of string assignment have been chosen to be the same as for all value types (not ref or ptr) in Nim by default, so you can assume that assignments create a copy. When a copy is unnecessary, the compiler can leave it out, but I'm not sure how much that is happening so far. Passing strings into a proc doesn't copy them. There is no optimization that defers string copies until they are mutated. Sequences behave in the same way.
You can change the default assignment behaviour of strings and seqs by marking them as shallow, then no copy is done on assignment:
var s = "foo"
shallow s
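A C mirror of the layout described above explains both observations: why sizeof reports 8 (the variable only holds a pointer) and what copy versus shallow assignment means. This is a minimal sketch with invented names (StringDesc, String, string_new and the two assign helpers). It is not the code the Nim compiler generates. intptr_t stands in for Nim's pointer-sized int.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of NimStringDesc: a len/reserved header followed directly
   by the character data in the same heap block. */
typedef struct {
    intptr_t len;       /* characters in use, excluding the terminating zero */
    intptr_t reserved;  /* capacity, excluding the terminating zero */
    char data[];        /* flexible array member, like UncheckedCharArray */
} StringDesc;

/* What a string variable actually holds: just a pointer (8 bytes on 64-bit). */
typedef StringDesc *String;

static String string_new(const char *src, size_t len) {
    String s = malloc(sizeof(StringDesc) + len + 1);
    if (s == NULL) return NULL;
    s->len = (intptr_t)len;
    s->reserved = (intptr_t)len;
    memcpy(s->data, src, len);
    s->data[len] = '\0';
    return s;
}

/* Default assignment semantics: allocate a new block and copy the characters. */
static String string_assign_copy(String src) {
    return string_new(src->data, (size_t)src->len);
}

/* "shallow" semantics: only the pointer is copied, so both names share one block. */
static String string_assign_shallow(String src) {
    return src;
}
```

If you mutate the original after both kinds of assignment, the copied string keeps "foo" while the shallow alias sees the change. That is the trade-off shallow makes in exchange for skipping the allocation.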

Are numbers, bools or nils garbage collected in Lua?

This article implies that all types beside numbers, bools and nil are garbage collected.
The field gc is used for the other values (strings, tables, functions, heavy userdata, and threads), which are those subject to garbage collection.
Would this mean under certain circumstances that overusing these non-gc types might result in memory leaks?
In Lua, you actually have 2 kinds of types: ones which are always passed by value, and ones passed by reference (as per section 2.1 of the Lua Reference Manual).
The ones you cite are all of the "passed-by-value" type, hence they are directly stored in a variable.
If you delete the variable, the value will be gone instantly.
So it will not start leaking memory, unless, of course, you keep generating new variables containing new values. But in that case it's your own fault ;).
In the article you linked to they write down the C code that shows how values are represented:
/*You can also find this in lobject.h in the Lua source*/
/*I paraphrased a bit to remove some macro magic*/
/*unions in C store one of the values at a time*/
union Value {
GCObject *gc; /* collectable objects */
void *p; /* light userdata */
int b; /* booleans */
lua_CFunction f; /* light C functions */
lua_Number n; /* numbers */
};
typedef union Value Value;
/*the _tt tag tells what kind of value is actually stored in the union*/
struct lua_TObject {
int _tt;
Value value_;
};
As you can see here, booleans and numbers are stored directly in the lua_TObject struct. Since they are not heap-allocated, they can never "leak", and garbage collecting them would make no sense.
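The same idea in a self-contained form: a tag plus a union, where numbers and booleans live inside the value itself and only strings point at separately allocated memory. This is a minimal sketch in the spirit of the struct above, not Lua's real definitions. Value, HeapString, the T_* tags and the helper functions are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_NIL, T_BOOLEAN, T_NUMBER, T_STRING } Tag;

/* A heap block holding string data; only values of this kind need a collector. */
typedef struct { size_t len; char chars[]; } HeapString;

typedef struct {
    Tag tag;
    union {
        int b;          /* booleans: stored inline */
        double n;       /* numbers: stored inline */
        HeapString *s;  /* strings: pointer to a separate allocation */
    } u;
} Value;

static Value make_nil(void) { Value v; v.tag = T_NIL; v.u.b = 0; return v; }
static Value make_boolean(int b) { Value v; v.tag = T_BOOLEAN; v.u.b = b != 0; return v; }
static Value make_number(double n) { Value v; v.tag = T_NUMBER; v.u.n = n; return v; }

static Value make_string(const char *text) {
    size_t len = strlen(text);
    HeapString *hs = malloc(sizeof *hs + len + 1);
    if (hs == NULL) return make_nil();  /* allocation failed */
    hs->len = len;
    memcpy(hs->chars, text, len + 1);
    Value v;
    v.tag = T_STRING;
    v.u.s = hs;
    return v;
}

/* Only values that point at heap memory have anything for a collector to reclaim. */
static int is_collectable(Value v) { return v.tag == T_STRING; }
```

Copying a number Value copies the number itself, so nothing is left behind on the heap when the copy goes away. Copying a string Value copies only the pointer, which is exactly the case a garbage collector exists to handle.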
One interesting thing to note, however, is that the garbage collector does not collect things referenced from the C side (such as memory behind light userdata, and C functions). These need to be managed manually from C, but that is to be expected, since in that case you are writing C instead of Lua.

Delphi: Form freezes while assigning strings in a thread

The code below runs in a thread.
Tf1 := TFileStream.Create(LogsPath,fmOpenRead or fmShareDenyNone);
...
str:=TStringList.Create;
str.LoadFromStream(tf1);
...
SynEditLog.Lines.Assign(str); // I do this with Synchronize
The text document contains 30,000 lines.
The form freezes while those lines are assigned to the SynEdit.
Loading the lines one by one takes 40 seconds; using Assign takes 8 seconds.
How can I keep the form from freezing?
Thanks!
I don't think Application.ProcessMessages is going to help here at all, since all the work happens in the one call to Assign.
Does SynEditLog have BeginUpdate/EndUpdate methods? I'd use them and see how you go. For instance:
SynEditLog.BeginUpdate;
try
SynEditLog.Lines.Assign(str);
finally
SynEditLog.EndUpdate;
end;
Update, in response to that not working:
You'll need to break down the assignment of the string list to the Lines property. Something like this:
var
LIndex: integer;
begin
SynEditLog.BeginUpdate;
try
//added: set the capacity before adding all the strings.
SynEditLog.Lines.Capacity := str.Capacity;
for LIndex := 0 to str.Count - 1 do
begin
SynEditLog.Lines.Add(str[LIndex]);
if LIndex mod 100 = 0 then
Application.ProcessMessages;
end;
finally
SynEditLog.EndUpdate;
end;
end;
(note: code typed directly into browser, may not compile)
If this is too slow, try increasing the LIndex mod 100 = 0 to something larger, like 1000 or even 5000.
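The batching pattern in that loop (add one line, and on every 100th line let the message loop run) can be sketched independently of Delphi. This is a minimal C sketch. AddLineFn and PumpFn are hypothetical stand-ins for SynEditLog.Lines.Add and Application.ProcessMessages, not real APIs, and CallCounter with its two callbacks exists only to make the behaviour observable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks standing in for Lines.Add and ProcessMessages. */
typedef void (*AddLineFn)(void *ctx, const char *line);
typedef void (*PumpFn)(void *ctx);

/* Add every line; after each line whose index is a multiple of `batch`,
   give the UI a chance to process messages (same test as LIndex mod 100 = 0). */
static void add_lines_in_batches(const char *const *lines, size_t count, size_t batch,
                                 AddLineFn add, PumpFn pump, void *ctx) {
    for (size_t i = 0; i < count; i++) {
        add(ctx, lines[i]);
        if (batch > 0 && i % batch == 0) pump(ctx);
    }
}

/* Example callbacks that only count how often they are called. */
typedef struct { size_t added; size_t pumps; } CallCounter;

static void count_add(void *ctx, const char *line) {
    (void)line;
    ((CallCounter *)ctx)->added++;
}

static void count_pump(void *ctx) {
    ((CallCounter *)ctx)->pumps++;
}
```

With 1000 lines and a batch of 100, the pump runs 10 times, at indexes 0, 100, ..., 900. Raising the batch size, as suggested above, trades UI responsiveness for less per-line overhead.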
The form is freezing because you're using the GUI thread to add 30,000 lines to your control, which naturally takes a while. During this time, the GUI can't update because you're using its thread, so it looks frozen.
One way around this would be to add a few lines (or just one) at a time, and, in between each add, update the GUI (by calling Application.ProcessMessages (thanks gordy)).

Obtaining a plain char* from a string in D?

I'm having an absolute hell of a time trying to figure out how to get a plain, mutable C string (a char*) from a D string (an immutable(char)[]) so that I can pass the character data to legacy C code. toStringz doesn't work, as I get an error saying that I "cannot implicitly convert expression (toStringz(this.fileName())) of type immutable(char)* to char*". Do I need to create a new, mutable array of char and copy the characters over?
If you can change the header of the D interface of that legacy C code, and you are sure that legacy C code will not modify the string, you could make it accept a const(char)*, e.g.
char* strncpy(char* dest, const(char)* src, size_t count);
// ^^^^^^^^^^^^
Yeah, it's not pretty, because the result is immutable.
This is why I always return a mutable copy of new arrays in my code. There's no point in making them immutable.
Solutions:
You could just do
char[] buffer = (myString ~ '\0').dup; //Concatenate a null terminator, then dup
then use buffer.ptr as the pointer.
However:
This wastes a string. A better approach might be:
char[] buffer = myString.dup;
buffer ~= '\0'; //Hopefully this doesn't reallocate
and using buffer.ptr afterwards.
Another solution is to use a method like this one:
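char* toMutableStringz(in char[] s) // renamed to avoid clashing with std.string.toStringz
{
    char[] result; // must be mutable: a string (immutable) cannot yield a char*
    if (s.length > 0 && s[$ - 1] == '\0') // Is it null-terminated?
    { result = s.dup; }
    else
    {
        result = new char[s.length + 1];
        result[0 .. s.length] = s[];
        result[$ - 1] = '\0'; // new char[] is filled with char.init (0xFF), not zeros
    }
    return result.ptr;
}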
This one is the most efficient but also the longest.
(Edit: Whoops, I had a typo in the if; fixed it.)
If you want to pass a mutable char* to a C function, you're going to need to allocate a mutable char[]. string isn't going to work, because it's immutable(char)[]. You can't alter immutable variables, so there is no way to pass a string to a function (C or otherwise) which needs to alter its elements.
So, if you have a string, and you need to pass it to a function which takes a char[], then you can use to!(char[]) or dup and get a mutable copy of it. In addition, if you want to pass it to a C function, you're going to need to append a '\0' to it so that it's zero-terminated. The easiest way to do that is just to do ~= '\0' on the char[], but the more efficient way would probably be to do something like this:
auto cstring = new char[](str.length + 1);
cstring[0 .. str.length] = str[];
cstring[$ - 1] = '\0';
In either case, you then pass cstring.ptr to the C function that you're calling.
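For comparison, here is the same allocate, copy and terminate sequence written on the C side. It is a minimal sketch with a hypothetical helper name, make_mutable_cstring. The source is treated as a pointer plus a length that need not be zero-terminated, which is how a D string slice is represented.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the D steps above: allocate length + 1, copy, then terminate. */
static char *make_mutable_cstring(const char *src, size_t len) {
    char *dst = malloc(len + 1);  /* new char[](str.length + 1) */
    if (dst == NULL) return NULL;
    memcpy(dst, src, len);        /* cstring[0 .. str.length] = str[] */
    dst[len] = '\0';              /* cstring[$ - 1] = '\0' */
    return dst;
}
```

The C function can then write to the returned buffer freely, because it is a private copy that the original, immutable data never depends on.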
If you know that the C function that you're calling isn't going to alter the string, then you can either do as KennyTM suggests and alter the C function's signature in D to take a const(char)*, or you can cast the string. e.g.
auto cstring = toStringz(str); // already an immutable(char)*, so no .ptr is needed
cfunc(cast(char*) cstring);
Altering the C function's signature would be more correct and less error-prone though.
It sounds like we may be altering std.conv.to to be smart enough to turn strings into zero-terminated strings when cast to char*, const(char)*, etc. So, once that's done, getting a zero-terminated mutable string should be easier, but for the moment, you pretty much just need to copy the string and append a '\0' to it so that it's zero-terminated. But regardless, you're never going to be able to pass a string to a C function which needs to modify it, because a string can't be mutated.
Without any context on which function you're calling it's hard to say what is the right solution.
Typically, if the C function wants to modify or write to the string it probably expects you to provide a buffer and a length. Usually what I do is:
Allocate a buffer:
auto buffer = new char[](256); // your own length instead of 256 here
And call the C function:
CWriteString(buffer.ptr, buffer.length);
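On the C side, a function following this buffer-plus-length convention might look like the sketch below. CWriteString is the placeholder name from the answer. Its body here is hypothetical: it writes at most len - 1 characters, always zero-terminates, and returns the length it wanted to write, following snprintf's semantics.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical implementation of the answer's CWriteString placeholder. */
static int CWriteString(char *buf, size_t len) {
    return snprintf(buf, len, "%s", "hello from C");
}
```

Because the callee never writes past len bytes, the D caller only has to allocate the buffer and pass buffer.ptr and buffer.length. If the buffer was too small, the result is truncated rather than overflowing.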
You can try the following:
char[] a = "abc\0".dup; // mutable copy that includes the terminating zero
char* p = a.ptr;
Now you can pass the pointer p to any function that expects a char*.
Hope it works.
