What is being copied when passing a string as parameter?

In Golang, everything is passed by value. Yet if I pass a slice "directly" (as opposed to passing it by pointer), any modification made to its elements inside the function is visible outside of it:
package main
import "fmt"
func f(a []int) {
	a[0] = 10
}
func main() {
	a := []int{2, 3, 4}
	f(a)
	fmt.Println(a)
}
Output: [10 3 4]
This is because, to my understanding, a slice consists (among other things) of a pointer to the underlying data array.
Unless I am mistaken (see here), a string also consists of a pointer (an unsafe.Pointer) to the underlying data, along with a "len" field. Hence, I was expecting the same behaviour as above but, apparently, I was wrong.
package main
import "fmt"
func f(s string) {
	s = "bar"
}
func main() {
	s := "foo"
	f(s)
	fmt.Println(s)
}
Output: foo
What is happening here with the string? Seems like the underlying data is being copied when the string is passed as argument.
Related question: When we do not wish our function to modify the string, is it still recommended to pass large strings by pointer for performance reasons?

A string has two values in it: pointer to an array, and the string length. When you pass string as an argument, those two values are copied, not the underlying array.
There is no way to modify the contents of string other than using unsafe. When you pass a *string to a function and that function modifies the string, the function simply modifies the string to point to a different array.
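As a minimal sketch of the two cases described above (the function names reassign and reassignPtr are made up for illustration), the following shows that assigning to a string parameter only changes the callee's copy of the pointer/length pair, while assigning through a *string makes the caller's variable refer to different data:

```go
package main

import "fmt"

// reassign receives a copy of the string header (pointer + length).
// Assigning to s changes only that local copy.
func reassign(s string) {
	s = "bar"
}

// reassignPtr receives a pointer to the caller's string variable,
// so the assignment makes that variable refer to different data.
func reassignPtr(s *string) {
	*s = "bar"
}

func main() {
	s := "foo"
	reassign(s)
	fmt.Println(s) // prints: foo
	reassignPtr(&s)
	fmt.Println(s) // prints: bar
}
```

In neither case are the bytes of "foo" modified or copied; only the small header is copied or overwritten.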

Related

Easy way to get string pointers

A library I am using has a very weird API that often takes string pointers. Currently I am doing this:
s := "foobar"
weirdFun(&s)
to pass strings. Is there a way to do this without the variable?

Maybe you should inform the author of the library that a string value in Go is already a small header (a pointer to immutable bytes plus a length), so passing a string to a function copies only that header, not the underlying data. No expensive copy operation is made, even though the string is passed by value.
Hope this helps!

The address operation &x can be used with addressable values.
According to the language specification:
The operand must be addressable, that is, either a variable, pointer indirection, or slice indexing operation; or a field selector of an addressable struct operand; or an array indexing operation of an addressable array. As an exception to the addressability requirement, x may also be a (possibly parenthesized) composite literal.
So, you can work around this using a composite literal:
package main
import (
	"fmt"
)
func main() {
	s := "text"
	fmt.Printf("value: %v, type: %T\n", &s, &s)
	fmt.Printf("value: %v, type: %T\n", &[]string{"literal"}[0], &[]string{"literal"}[0])
}
Even though it's possible, I don't recommend using this; it is not an example of clear code.

The Azure SDK uses string pointers to distinguish between no value and the empty string.
Use Azure's StringPtr function to create a pointer to a string literal.
import (
⋮
"github.com/Azure/go-autorest/autorest/to"
)
⋮
res, err := someClient.Create(ctx, someService.ExampleParameters{
Location: to.StringPtr(location),
})

The library is really weird, but you can do this in one line with a wrapper function, for example:
func PointerTo[T ~string](s T) *T {
	return &s
}
s := "string"
weirdFun(PointerTo(s))
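A self-contained sketch of the same idea, assuming Go 1.18 or later for generics. The helper name ptrTo is hypothetical, and the constraint is widened to any purely for illustration. Note that the returned pointer refers to a copy of the argument, so writing through it does not affect the original variable:

```go
package main

import "fmt"

// ptrTo returns a pointer to a copy of v (hypothetical helper name).
func ptrTo[T any](v T) *T {
	return &v
}

func main() {
	s := "foobar"
	p := ptrTo(s)
	*p = "changed"     // modifies the copy held by the helper's parameter
	fmt.Println(s, *p) // prints: foobar changed

	q := ptrTo("literal") // works without declaring a variable first
	fmt.Println(*q)       // prints: literal
}
```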

Error addressing the returned slice of a function

In the following code, the first Println fails to build with the error "slice of unaddressable value". The rest of the lines are just fine.
package main
import "fmt"
func getSlice() [0]int {
	return [...]int{}
}
func getString() string {
	return "hola"
}
func main() {
	fmt.Println(getSlice()[:]) // Error: slice of unaddressable value
	var a = getSlice()
	fmt.Println(a[:])
	fmt.Println(getString()[:])
	var b = getString()
	fmt.Println(b[:])
}
If the first Println is commented out, it works.
Why is that? What am I missing here?

What you're missing is that when slicing an array, the operand must be addressable ([0]int is an array, not a slice). And return values of function calls are not addressable. For details see How can I store reference to the result of an operation in Go?; and "cannot take the address of" and "cannot call pointer method on".
Spec: Slice expressions:
If the sliced operand is an array, it must be addressable and the result of the slice operation is a slice with the same element type as the array.
In this expression:
getSlice()[:]
getSlice() returns an array, and since it's the result of a function call, it's not addressable. Therefore you cannot slice it.
In this expression:
getString()[:]
getString() returns a string value, so it can be sliced even if the value is not addressable. This is allowed, because the result of the slice expression will be another string, and string values in Go are immutable.
Also, variables are addressable, so this will always work:
var a = getSlice()
fmt.Println(a[:])

getSlice() does not return a slice; it returns an array, and the result of a function call is not addressable. You could return a pointer to the array:
func getSlice() *[0]int {
return &[...]int{}
}
or leave getSlice() as is and place the result in a temporary variable:
t := getSlice()
fmt.Println(t[:])

Allocation memory for string

In D, string is an alias for immutable(char)[]. So every string-processing operation allocates memory. I tried to check this, but after replacing a character in the string I see the same address.
string str = "big";
writeln(&str);
str.replace("i","a");
writeln(&str);
Output:
> app.exe
19FE10
19FE10
I tried using ptr:
string str = "big";
writeln(str.ptr);
str.replace(`i`,`a`);
writeln(str.ptr);
And got the following output:
42E080
42E080
So it's showing the same address. Why?

You made a simple error in your code:
str.replace("i","a");
str.replace returns the new string with the replacement done, it doesn't actually replace the existing variable. So try str = str.replace("i", "a"); to see the change.
But you also made an overly broad statement about allocations:
So every string-processing operation allocates memory.
That's false; a great many operations do not require allocating new memory. Anything that can slice the existing string will do so, avoiding the need for new memory:
import std.string;
import std.stdio;
void main() {
string a = " foo ";
string b = a.strip();
assert(b == "foo"); // whitespace stripped off...
writeln(a.ptr);
writeln(b.ptr); // but notice how close those ptrs are
assert(b.ptr == a.ptr + 2); // yes, b is a slice of a
}
replace will also return the original string if no replacement was actually done:
string a = " foo ";
string b = a.replace("p", "a"); // there is no p to replace
assert(a.ptr is b.ptr); // so same string returned
Indexing and iteration require no new allocation (of course). Believe it or not, but even appending sometimes will not allocate because there may be memory left at the end of the slice that is not yet used (though it usually will).
There are also various functions that return range objects that apply the changes lazily as you iterate through them, avoiding allocation. For example, instead of replace(a, "f", "");, you might do something like filter!(ch => ch != 'f')(a); and loop through the result, which doesn't allocate a new string unless you ask it to.
So it is a lot more nuanced than you might think!

In D, a dynamic array such as string is a length plus a pointer to the start of the array's values. For a local variable, this pair is usually stored on the stack.
When you take the address of a variable declared in a function body, what you really get is a pointer into the stack, not a pointer to the characters.
To get the address of an array's values, use .ptr.
So replace &str with str.ptr and you will get the correct output.

Fastest, leanest way to append characters to form a string in Swift

I come from a C# background, where System.String is immutable and string concatenation is relatively expensive (as it requires reallocating the string). There, we know to use the StringBuilder type instead, as it preallocates a larger buffer where single characters (Char, a 16-bit value type) and short strings can be concatenated cheaply without extra allocation.
I'm porting some C# code to Swift which reads from a bit-array ([Bool]) at sub-octet indexes with character lengths less than 8 bits (it's a very space-conscious file format).
My C# code does something like this:
StringBuilder sb = new StringBuilder( expectedCharacterCount );
int idxInBits = 0;
Boolean[] bits = ...;
for(int i = 0; i < someLength; i++) {
Char c = ReadNextCharacter( ref idxInBits, 6 ); // each character is 6 bits in this example
sb.Append( c );
}
In Swift, I assume NSMutableString is the equivalent of .NET's StringBuilder, and I found this QA about appending individual characters ( How to append a character to string in Swift? ) so in Swift I have this:
var buffer: NSMutableString
for i in 0..<charCount {
let charValue: Character = readNextCharacter( ... )
buffer.AppendWithFormat("%c", charValue)
}
return String(buffer)
But I don't know why it goes through a format-string first, that seems inefficient (reparsing the format-string on every iteration) and as my code is running on iOS devices I want to be very conservative with my program's CPU and memory usage.
As I was writing this, I learned my code should really be using UnicodeScalar instead of Character, problem is NSMutableString does not let you append a UnicodeScalar value, you have to use Swift's own mutable String type, so now my code looks like:
var buffer: String
for i in 0..<charCount {
let x: UnicodeScalar = readNextCharacter( ... )
buffer.append(x)
}
return buffer
I thought that String was immutable, but I noticed its append method returns Void.
I still feel uncomfortable doing this because I don't know how Swift's String type is implemented internally, and I don't see how I can preallocate a large buffer to avoid reallocations (assuming Swift's String uses a growing algorithm).

(This answer was written based on documentation and source code valid for Swift 2 and 3: possibly needs updates and amendments once Swift 4 arrives)
Since Swift is now open-source, we can actually have a look at the source code for Swift's native String:
swift/stdlib/public/core/String.swift
From the source above, we have following comment
/// Growth and Capacity
/// ===================
///
/// When a string's contiguous storage fills up, new storage must be
/// allocated and characters must be moved to the new storage.
/// `String` uses an exponential growth strategy that makes `append` a
/// constant time operation *when amortized over many invocations*.
Given the above, you shouldn't need to worry about the performance of appending characters in Swift (be it via append(_: Character), append(_: UnicodeScalar) or appendContentsOf(_: String)): reallocation of the contiguous storage of a given String instance should be infrequent relative to the number of single characters appended.
Also note that NSMutableString is not "purely native" Swift, but belongs to the family of bridged Obj-C classes (accessible via Foundation).
A note to your comment
"I thought that String was immutable, but I noticed its append method returns Void."
String is just a (value) type, that may be used by mutable as well as immutable properties
var foo = "foo" // mutable
let bar = "bar" // immutable
/* (both the above inferred to be of type 'String') */
The mutating, Void-returning instance methods append(_: Character) and append(_: UnicodeScalar) are visible on mutable as well as immutable String instances, but naturally using them on the latter yields a compile-time error:
let chars : [Character] = ["b","a","r"]
foo.append(chars[0]) // "foob"
bar.append(chars[0]) // error: cannot use mutating member on immutable value ...

Obtaining a plain char* from a string in D?

I'm having an absolute hell of a time trying to figure out how to get a plain, mutable C string (a char*) from a D string (an immutable(char)[]) so that I can pass the character data to legacy C code. toStringz doesn't work, as I get an error saying that I "cannot implicitly convert expression (toStringz(this.fileName())) of type immutable(char)* to char*". Do I need to create a new, mutable array of char and copy the characters over?

If you can change the header of the D interface of that legacy C code, and you are sure that legacy C code will not modify the string, you could make it accept a const(char)*, e.g.
char* strncpy(char* dest, const(char)* src, size_t count);
//                        ^^^^^^^^^^^^

Yeah, it's not pretty, because the result is immutable.
This is why I always return a mutable copy of new arrays in my code. There's no point in making them immutable.
Solutions:
You could just do
char[] buffer = (myString ~ '\0').dup; //Concatenate a null terminator, then dup
then use buffer.ptr as the pointer.
However:
This wastes a string. A better approach might be:
char[] buffer = myString.dup;
buffer ~= '\0'; //Hopefully this doesn't reallocate
and using buffer.ptr afterwards.
Another solution is to use a method like this one:
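char* toStringz(in char[] s)
{
    char[] result; // must be mutable: a string could not hold s.dup or yield a char*
    if (s.length > 0 && s[$ - 1] == '\0') // Is it already null-terminated?
    { result = s.dup; }
    else
    {
        result = new char[s.length + 1];
        result[0 .. s.length] = s[];
        result[$ - 1] = '\0'; // char.init is 0xFF, so set the terminator explicitly
    }
    return result.ptr;
}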
This one is the most efficient but also the longest.
(Edit: Whoops, I had a typo in the if; fixed it.)

If you want to pass a mutable char* to a C function, you're going to need to allocate a mutable char[]. string isn't going to work, because it's immutable(char)[]. You can't alter immutable variables, so there is no way to pass a string to a function (C or otherwise) which needs to alter its elements.
So, if you have a string, and you need to pass it to a function which takes a char[], then you can use to!(char[]) or dup and get a mutable copy of it. In addition, if you want to pass it to a C function, you're going to need to append a '\0' to it so that it's zero-terminated. The easiest way to do that is just to do ~= '\0' on the char[], but the more efficient way would probably be to do something like this:
auto cstring = new char[](str.length + 1);
cstring[0 .. str.length] = str[];
cstring[$ - 1] = '\0';
In either case, you then pass cstring.ptr to the C function that you're calling.
If you know that the C function that you're calling isn't going to alter the string, then you can either do as KennyTM suggests and alter the C function's signature in D to take a const(char)*, or you can cast the string. e.g.
auto cstring = toStringz(str);
cfunc(cast(char*)cstring); // toStringz already returns a pointer, so no .ptr is needed
Altering the C function's signature would be more correct and less error-prone though.
It sounds like we may be altering std.conv.to to be smart enough to turn strings into zero-terminated strings when cast to char*, const(char)*, etc. So, once that's done, getting a zero-terminated mutable string should be easier, but for the moment, you pretty much just need to copy the string and append a '\0' to it so that it's zero-terminated. But regardless, you're never going to be able to pass a string to a C function which needs to modify it, because a string can't be mutated.

Without any context on which function you're calling it's hard to say what is the right solution.
Typically, if the C function wants to modify or write to the string it probably expects you to provide a buffer and a length. Usually what I do is:
Allocate a buffer:
auto buffer = new char[](256); // your own length instead of 256 here
And call the C function:
CWriteString(buffer.ptr, buffer.length);

You can try the following :
char a[]="abc";
char *p=a;
Now you can pass pointer 'p' to the array in any function.
Hope it works.
