Memory allocation for strings in D

In D, string is an alias for immutable(char)[]. So every operation on string processing comes with allocation of memory. I tried to check this, but after replacing a character in the string I see the same address.
string str = "big";
writeln(&str);
str.replace("i","a");
writeln(&str);
Output:
> app.exe
19FE10
19FE10
I tried using ptr:
string str = "big";
writeln(str.ptr);
str.replace(`i`,`a`);
writeln(str.ptr);
And got the following output:
42E080
42E080
So it's showing the same address. Why?

You made a simple error in your code:
str.replace("i","a");
str.replace returns the new string with the replacement done, it doesn't actually replace the existing variable. So try str = str.replace("i", "a"); to see the change.
But you also made an overly broad statement about allocations:
So every operation on string processing comes with allocation of memory.
That's false: a great many operations do not require allocating new memory. Anything that can slice the existing string will do so, avoiding the need for new memory:
import std.string;
import std.stdio;
void main() {
    string a = " foo ";
    string b = a.strip();
    assert(b == "foo"); // whitespace stripped off...
    writeln(a.ptr);
    writeln(b.ptr); // but notice how close those ptrs are
    assert(b.ptr == a.ptr + 1); // yes, b is a slice of a (the one leading space is skipped)
}
replace will also return the original string if no replacement was actually done:
string a = " foo ";
string b = a.replace("p", "a"); // there is no p to replace
assert(a.ptr is b.ptr); // so same string returned
Indexing and iteration require no new allocation (of course). Believe it or not, even appending will sometimes not allocate, because there may be unused memory left at the end of the slice (though it usually will allocate).
There are also various functions that return range objects which apply the changes lazily as you iterate through them, avoiding allocation. For example, instead of replace(a, "f", ""), you might do something like filter!(ch => ch != 'f')(a) and loop through the result, which doesn't allocate a new string unless you ask it to.
So it is a lot more nuanced than you might think!
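To make the "slice instead of allocate" point concrete, here is a minimal sketch in C rather than D. It models a slice as a length plus a pointer; the `slice` struct and the `strip_view` function are hypothetical names invented for this illustration, not Phobos APIs.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical model of a D slice: a length plus a pointer to the data. */
typedef struct {
    size_t length;
    const char *ptr;
} slice;

/* Return a view of `s` without leading and trailing whitespace.
   No bytes are copied: the result points into the same storage as `s`. */
slice strip_view(slice s) {
    while (s.length > 0 && isspace((unsigned char)s.ptr[0])) {
        s.ptr++;        /* skip one leading whitespace byte */
        s.length--;
    }
    while (s.length > 0 && isspace((unsigned char)s.ptr[s.length - 1])) {
        s.length--;     /* drop one trailing whitespace byte */
    }
    return s;
}
```

Stripping only moves the pointer and shrinks the length, which is why the stripped result can share memory with the original.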

In D, every dynamic array (and so every string) is a length plus a pointer to the start of the array's values. For a local variable, that pair is usually stored on the stack.
When you take the address of a variable declared in a function body, what you really get is a pointer into the stack, not a pointer to the array's contents.
To get the address of an array's values, use .ptr.
So replace &str with str.ptr and you will see the address of the character data itself.
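The difference between the two addresses can be sketched in C. This is an illustration only: the `slice` struct and the `retarget` and `show` helpers are made-up names, and the code models the idea rather than D's runtime.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of a D slice: a length plus a pointer to the data. */
typedef struct {
    size_t length;
    const char *ptr;
} slice;

/* Point an existing slice variable at different character data,
   roughly what assigning a new string to `str` does. */
void retarget(slice *s, const char *data) {
    s->length = strlen(data);
    s->ptr = data;
}

/* Print the address of the variable itself and the address of its data. */
void show(const slice *s) {
    printf("variable at %p, data at %p\n",
           (const void *)s, (const void *)s->ptr);
}
```

Calling `show`, then `retarget`, then `show` again on the same variable would print the same first address both times, while the second address changes only if the variable now refers to different data.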

Related

Fastest, leanest way to append characters to form a string in Swift

I come from a C# background, where System.String is immutable and string concatenation is relatively expensive (it requires reallocating the string), so we know to use the StringBuilder type instead: it preallocates a larger buffer to which single characters (Char, a 16-bit value type) and short strings can be appended cheaply without extra allocation.
I'm porting some C# code to Swift which reads from a bit-array ([Bool]) at sub-octet indexes with character lengths less than 8 bits (it's a very space-conscious file format).
My C# code does something like this:
StringBuilder sb = new StringBuilder( expectedCharacterCount );
int idxInBits = 0;
Boolean[] bits = ...;
for(int i = 0; i < someLength; i++) {
    Char c = ReadNextCharacter( ref idxInBits, 6 ); // each character is 6 bits in this example
    sb.Append( c );
}
In Swift, I assume NSMutableString is the equivalent of .NET's StringBuilder, and I found this QA about appending individual characters ( How to append a character to string in Swift? ) so in Swift I have this:
var buffer: NSMutableString
for i in 0..<charCount {
let charValue: Character = readNextCharacter( ... )
buffer.AppendWithFormat("%c", charValue)
}
return String(buffer)
But I don't know why it goes through a format string first. That seems inefficient (reparsing the format string on every iteration), and as my code is running on iOS devices I want to be very conservative with my program's CPU and memory usage.
As I was writing this, I learned my code should really be using UnicodeScalar instead of Character. The problem is that NSMutableString does not let you append a UnicodeScalar value; you have to use Swift's own mutable String type, so now my code looks like:
var buffer: String
for i in 0..<charCount {
let x: UnicodeScalar = readNextCharacter( ... )
buffer.append(x)
}
return buffer
I thought that String was immutable, but I noticed its append method returns Void.
I still feel uncomfortable doing this because I don't know how Swift's String type is implemented internally, and I don't see how I can preallocate a large buffer to avoid reallocations (assuming Swift's String uses a growing algorithm).
(This answer was written based on documentation and source code valid for Swift 2 and 3: possibly needs updates and amendments once Swift 4 arrives)
Since Swift is now open-source, we can actually have a look at the source code for Swift's native String:
swift/stdlib/public/core/String.swift
From the source above, we have the following comment:
/// Growth and Capacity
/// ===================
///
/// When a string's contiguous storage fills up, new storage must be
/// allocated and characters must be moved to the new storage.
/// `String` uses an exponential growth strategy that makes `append` a
/// constant time operation *when amortized over many invocations*.
Given the above, you shouldn't need to worry about the performance of appending characters in Swift (be it via append(_: Character), append(_: UnicodeScalar) or appendContentsOf(_: String)), as reallocation of the contiguous storage for a given String instance should be infrequent relative to the number of single characters that must be appended before a reallocation occurs.
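To see why exponential growth keeps reallocations rare, here is a language-neutral sketch written in C. It is not Swift's actual String implementation: the `char_buffer` type, its functions, and the doubling factor are all assumptions chosen for illustration. The `reserve` argument plays the role of the preallocation the question asks about (Swift's String offers reserveCapacity(_:) for that purpose).

```c
#include <stdlib.h>

/* Hypothetical growable character buffer (not Swift's real layout). */
typedef struct {
    char *data;
    size_t length;    /* bytes in use, excluding the terminator */
    size_t capacity;  /* bytes allocated, including the terminator */
    size_t growths;   /* how many times the storage was reallocated */
} char_buffer;

/* Allocate room for `reserve` characters up front. Returns 0 on success. */
int buffer_init(char_buffer *b, size_t reserve) {
    b->data = malloc(reserve + 1);
    if (b->data == NULL) return -1;
    b->data[0] = '\0';
    b->length = 0;
    b->capacity = reserve + 1;
    b->growths = 0;
    return 0;
}

/* Append one character, doubling the storage only when it is full. */
int buffer_append(char_buffer *b, char c) {
    if (b->length + 1 >= b->capacity) {
        size_t new_capacity = b->capacity * 2;   /* assumed growth factor */
        char *grown = realloc(b->data, new_capacity);
        if (grown == NULL) return -1;
        b->data = grown;
        b->capacity = new_capacity;
        b->growths++;
    }
    b->data[b->length++] = c;
    b->data[b->length] = '\0';
    return 0;
}

/* Release the storage. */
void buffer_free(char_buffer *b) {
    free(b->data);
    b->data = NULL;
}
```

In this sketch, appending 1000 characters to a buffer that starts with room for 15 reallocates only six times, and not at all if 1000 characters were reserved up front.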
Also note that NSMutableString is not "purely native" Swift, but belongs to the family of bridged Obj-C classes (accessible via Foundation).
A note to your comment
"I thought that String was immutable, but I noticed its append method returns Void."
String is just a (value) type, that may be used by mutable as well as immutable properties
var foo = "foo" // mutable
let bar = "bar" // immutable
/* (both the above inferred to be of type 'String') */
The mutating, Void-returning instance methods append(_: Character) and append(_: UnicodeScalar) are available on mutable as well as immutable String instances, but naturally using them on the latter yields a compile-time error:
let chars : [Character] = ["b","a","r"]
foo.append(chars[0]) // "foob"
bar.append(chars[0]) // error: cannot use mutating member on immutable value ...

Inconsistencies when using UnsafeMutablePointer with String or Character types

I'm currently trying to implement my own DynamicArray data type in Swift. To do so I'm using pointers a bit. As my root I'm using an UnsafeMutablePointer of a generic type T:
struct DynamicArray<T> {
    private var root: UnsafeMutablePointer<T> = nil
    private var capacity = 0 {
        didSet {
            //...
        }
    }
    //...
    init(capacity: Int) {
        root = UnsafeMutablePointer<T>.alloc(capacity)
        self.capacity = capacity
    }
    init(count: Int, repeatedValue: T) {
        self.init(capacity: count)
        for index in 0..<count {
            (root + index).memory = repeatedValue
        }
        self.count = count
    }
    //...
}
Now as you can see I've also implemented a capacity property which tells me how much memory is currently allocated for root. Accordingly one can create an instance of DynamicArray using the init(capacity:) initializer, which allocates the appropriate amount of memory, and sets the capacity property.
But then I also implemented the init(count:repeatedValue:) initializer, which first allocates the needed memory using init(capacity: count). It then sets each segment in that part of memory to the repeatedValue.
When using the init(count:repeatedValue:) initializer with number types like Int, Double, or Float, it works perfectly fine. When using Character or String, though, it crashes. It doesn't crash consistently; it actually works sometimes, as can be seen here by compiling a few times.
var a = DynamicArray<Character>(count: 5, repeatedValue: "A")
println(a.description) //prints [A, A, A, A, A]
//crashes most of the time
var b = DynamicArray<Int>(count: 5, repeatedValue: 1)
println(b.description) //prints [1, 1, 1, 1, 1]
//works consistently
Why is this happening? Does it have to do with String and Character holding values of different length?
Update #1:
Now @AirspeedVelocity addressed the problem with init(count:repeatedValue:). The DynamicArray contains another initializer, though, which at first worked in a similar fashion to init(count:repeatedValue:). I changed it to work the way @AirspeedVelocity described for init(count:repeatedValue:):
init<C: CollectionType where C.Generator.Element == T, C.Index.Distance == Int>(collection: C) {
    let collectionCount = countElements(collection)
    self.init(capacity: collectionCount)
    root.initializeFrom(collection)
    count = collectionCount
}
I'm using the initializeFrom(source:) method as described here. And since collection conforms to CollectionType it should work fine.
I'm now getting this error though:
<stdin>:144:29: error: missing argument for parameter 'count' in call
root.initializeFrom(collection)
^
Is this just a misleading error message again?
Yes, chances are this doesn’t crash with basic inert types like integers but does with strings or arrays because they are more complex and allocate memory for themselves on creation/destruction.
The reason it’s crashing is that UnsafeMutablePointer memory needs to be initialized before it’s used (and similarly, needs to be de-initialized with destroy before it is deallocated).
So instead of assigning to the memory property, you should use the initialize method:
for index in 0..<count {
    (root + index).initialize(repeatedValue)
}
Since initializing from another collection of values is so common, there’s another version of initialize that takes one. You could use that in conjunction with another helper struct, Repeat, that is a collection of the same value repeated multiple times:
init(count: Int, repeatedValue: T) {
    self.init(capacity: count)
    root.initializeFrom(Repeat(count: count, repeatedValue: repeatedValue))
    self.count = count
}
However, there’s something else you need to be aware of which is that this code is currently inevitably going to leak memory. The reason being, you will need to destroy the contents and dealloc the pointed-to memory at some point before your DynamicArray struct is destroyed, otherwise you’ll leak. Since you can’t have a deinit in a struct, only a class, this won’t be possible to do automatically (this is assuming you aren’t expecting users of your array to do this themselves manually before it goes out of scope).
Additionally, if you want to implement value semantics (as with Array and String) via copy-on-write, you’ll also need a way of detecting if your internal buffer is being referenced multiple times. Take a look at ManagedBufferPointer to see a class that handles this for you.

Calling a C Function with C-style Strings

I'm struggling to call mktemp in D:
import core.sys.posix.stdlib;
import std.string: toStringz;
auto name = "alpha";
auto tmp = mktemp(name.toStringz);
but I can't figure out how to use it; DMD complains:
/home/per/Work/justd/fs.d(1042): Error: function core.sys.posix.stdlib.mktemp (char*) is not callable using argument types (immutable(char)*)
How do I create a mutable zero-terminated C-style string?
I think I've read somewhere that string literals (const or immutable) are implicitly convertible to zero (null)-terminated strings.
For this specific problem:
This is because mktemp needs to write to the string. From mktemp(3):
The last six characters of template must be XXXXXX and these are replaced with a string that makes the filename unique. Since it will be modified, template must not be a string constant, but should be declared as a character array.
So what you want to do here is use a char[] instead of a string. I'd go with:
import std.stdio;
void main() {
    import core.sys.posix.stdlib;
    // we'll use a little mutable buffer defined right here
    char[255] tempBuffer;
    string name = "alphaXXXXXX"; // last six X's are required by mktemp
    tempBuffer[0 .. name.length] = name[]; // copy the name into the mutable buffer
    tempBuffer[name.length] = 0; // make sure it is zero terminated yourself
    auto tmp = mktemp(tempBuffer.ptr);
    import std.conv;
    writeln(to!string(tmp));
}
In general, creating a mutable string can be done in one of two ways: one is to .dup something, or the other is to use a stack buffer like I did above.
toStringz doesn't care if the input data is mutable, it always returns immutable (apparently...). But it is easy to do it yourself:
auto c_str = ("foo".dup ~ "\0").ptr;
That's how you do it, .dup makes a mutable copy, and appending the zero terminator yourself ensures it is there.
string name = "alphaXXXXXX"; // last six X's are required by mktemp
auto tmp = mktemp((name.dup ~ "\0").ptr);
In addition to Adam's great answer, there's also std.utf.toUTFz, in which case you can do
void main()
{
    import core.sys.posix.stdlib;
    import std.conv, std.stdio, std.utf;
    auto name = toUTFz!(char*)("alphaXXXXXX");
    auto tmp = mktemp(name);
    writeln(to!string(tmp));
}
std.utf.toUTFz is std.string.toStringz's more capable cousin as it will generate null-terminated UTF-8, UTF-16, and UTF-32 strings (as opposed to just UTF-8) as well as any level of constness. The downside is that it's more verbose for cases where you just want immutable(char)*, because you have to specify the return type.
However, if efficiency is a concern, Adam's solution is likely better simply because it avoids having to allocate the C-string that you pass to mktemp on the heap. toUTFz is shorter though, so if you don't care about the efficiency cost of allocating the C-string on the heap (and most programs probably won't), then toUTFz is arguably better. It depends on the requirements of your particular program.
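For comparison, here is roughly what the C side of this looks like. It is a sketch, not a recommendation: the function names `build_template` and `create_temp_file` are made up for this example, and it deliberately calls POSIX mkstemp instead of mktemp, because mkstemp also creates and opens the file. Both functions share the requirement that matters here: a writable buffer ending in six X's.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy `prefix` into the caller's writable buffer and append the six X's
   that the mktemp family requires. Returns 0, or -1 if it does not fit. */
int build_template(char *buf, size_t cap, const char *prefix) {
    int n = snprintf(buf, cap, "%sXXXXXX", prefix);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Create a uniquely named file from `prefix`. On success the chosen name
   is left in `buf` and an open file descriptor is returned; -1 on failure.
   The caller is responsible for close() and, if wanted, unlink(). */
int create_temp_file(char *buf, size_t cap, const char *prefix) {
    if (build_template(buf, cap, prefix) != 0) {
        return -1;
    }
    return mkstemp(buf);  /* overwrites the X's in place */
}
```

Because the X's are overwritten in place, passing a string literal or any other read-only storage would be an error, which is the same constraint DMD enforces at compile time through immutable(char)*.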

Question related to string

I have two statements:
String aStr = new String("ABC");
String bStr = "ABC";
I read in a book that in the first statement the JVM creates two objects and one reference variable, whereas the second statement creates one reference variable and one object.
How is that? When I say new String("ABC"), it's pretty clear that an object is created.
Now my question is: for the "ABC" value, do we create another object as well?
Please clarify a bit more here.
Thank you
You will end up with two Strings.
1) the literal "ABC", used to construct aStr and assigned to bStr. The compiler makes sure that this is the same single instance.
2) a newly constructed String aStr (because you forced it to be new'ed, which is really pretty much nonsensical)
Using a string literal will only create a single object for the lifetime of the JVM - or possibly the classloader. (I can't remember the exact details, but it's almost never important.)
That means it's hard to say that the second statement in your code sample really "creates" an object - a certain object has to be present, but if you run the same code in a loop 100 times, it won't create any more objects... whereas the first statement would. (It would require that the object referred to by the "ABC" literal is present and create a new instance on each iteration, by virtue of calling the constructor.)
In particular, if you have:
Object x = "ABC";
Object y = "ABC";
then it's guaranteed (by the language specification) that x and y will refer to the same object. This extends to other constant expressions equal to the same string too:
private static final String A = "a";
Object z = A + "BC"; // x, y and z are still the same reference...
The only time I ever use the String(String) constructor is if I've got a string which may well be backed by a rather larger character array which I don't otherwise need:
String x = readSomeVeryLargeString();
String y = x.substring(5, 10);
String z = new String(y); // Copies the contents
Now if the strings that y and x refer to are eligible for collection but the string that z refers to isn't (e.g. it's passed on to other methods etc) then we don't end up holding all of the original long string in memory, which we would otherwise.

Obtaining a plain char* from a string in D?

I'm having an absolute hell of a time trying to figure out how to get a plain, mutable C string (a char*) from a D string (an immutable(char)[]) so that I can pass the character data to legacy C code. toStringz doesn't work, as I get an error saying that I "cannot implicitly convert expression (toStringz(this.fileName())) of type immutable(char)* to char*". Do I need to create a new, mutable array of char and copy the characters over?
If you can change the header of the D interface of that legacy C code, and you are sure that legacy C code will not modify the string, you could make it accept a const(char)*, e.g.
char* strncpy(char* dest, const(char)* src, size_t count);
//                        ^^^^^^^^^^^^
Yeah, it's not pretty, because the result is immutable.
This is why I always return a mutable copy of new arrays in my code. There's no point in making them immutable.
Solutions:
You could just do
char[] buffer = (myString ~ '\0').dup; //Concatenate a null terminator, then dup
then use buffer.ptr as the pointer.
However:
This wastes a string. A better approach might be:
char[] buffer = myString.dup;
buffer ~= '\0'; //Hopefully this doesn't reallocate
and using buffer.ptr afterwards.
Another solution is to use a method like this one:
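char* toStringz(in char[] s)
{
    char[] result; // must be mutable: we return char*, not immutable(char)*
    if (s.length > 0 && s[$ - 1] == '\0') // Is it already null-terminated?
    {
        result = s.dup;
    }
    else
    {
        result = new char[s.length + 1];
        result[0 .. s.length] = s[];
        result[$ - 1] = '\0'; // char.init is 0xFF, so set the terminator explicitly
    }
    return result.ptr;
}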
This one is the most efficient but also the longest.
(Edit: Whoops, I had a typo in the if; fixed it.)
If you want to pass a mutable char* to a C function, you're going to need to allocate a mutable char[]. string isn't going to work, because it's immutable(char)[]. You can't alter immutable variables, so there is no way to pass a string to a function (C or otherwise) which needs to alter its elements.
So, if you have a string, and you need to pass it to a function which takes a char[], then you can use to!(char[]) or dup and get a mutable copy of it. In addition, if you want to pass it to a C function, you're going to need to append a '\0' to it so that it's zero-terminated. The easiest way to do that is just to do ~= '\0' on the char[], but the more efficient way would probably be to do something like this:
auto cstring = new char[](str.length + 1);
cstring[0 .. str.length] = str[];
cstring[$ - 1] = '\0';
In either case, you then pass cstring.ptr to the C function that you're calling.
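Seen from the C side, the copy-and-terminate step amounts to the following sketch. The function name `to_mutable_cstring` is hypothetical; the `data` and `len` parameters stand in for a D slice's .ptr and .length, which is why the input is not assumed to be zero-terminated.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `len` bytes from `data` (not necessarily zero-terminated, like the
   contents of a D slice) into fresh mutable storage and add the terminator.
   Returns NULL if allocation fails. The caller must free() the result. */
char *to_mutable_cstring(const char *data, size_t len) {
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, data, len);
    copy[len] = '\0';
    return copy;
}
```

The copy is what makes mutation safe: the C function can scribble on the new buffer while the original immutable data stays untouched.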
If you know that the C function that you're calling isn't going to alter the string, then you can either do as KennyTM suggests and alter the C function's signature in D to take a const(char)*, or you can cast the string. e.g.
auto cstring = toStringz(str); // immutable(char)*, already a pointer
cfunc(cast(char*) cstring);    // cast away immutable; only safe if cfunc never writes
Altering the C function's signature would be more correct and less error-prone though.
It sounds like we may be altering std.conv.to to be smart enough to turn strings into zero-terminated strings when cast to char*, const(char)*, etc. So, once that's done, getting a zero-terminated mutable string should be easier, but for the moment, you pretty much just need to copy the string and append a '\0' to it so that it's zero-terminated. But regardless, you're never going to be able to pass a string to a C function which needs to modify it, because a string can't be mutated.
Without any context on which function you're calling it's hard to say what is the right solution.
Typically, if the C function wants to modify or write to the string it probably expects you to provide a buffer and a length. Usually what I do is:
Allocate a buffer:
auto buffer = new char[](256); // your own length instead of 256 here
And call the C function:
CWriteString(buffer.ptr, buffer.length);
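For reference, a C function following this buffer-plus-length convention might look like the sketch below. `describe_version` is a hypothetical stand-in for whatever C function is being called (such as the CWriteString example above); it is not a real library call.

```c
#include <stdio.h>

/* Hypothetical C API in the buffer-plus-length style: the caller owns the
   buffer and passes its size. The callee writes at most `len` bytes and,
   when `len` is non-zero, always zero-terminates. The return value is the
   length the full text needs, so the caller can detect truncation. */
size_t describe_version(char *buf, size_t len, int major, int minor) {
    int n = snprintf(buf, len, "version %d.%d", major, minor);
    return n < 0 ? 0 : (size_t)n;
}
```

From D, such a function would be called with buffer.ptr and buffer.length, and a return value of buffer.length or more would mean the output was cut short.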
You can try the following:
char a[]="abc";
char *p=a;
Now you can pass pointer 'p' to the array in any function.
Hope it works.

Resources