Calling a C Function with C-style Strings - string

I'm struggling to call mktemp in D:
import core.sys.posix.stdlib;
import std.string: toStringz;
auto name = "alpha";
auto tmp = mktemp(name.toStringz);
but I can't figure out how to use it so DMD complains:
/home/per/Work/justd/fs.d(1042): Error: function core.sys.posix.stdlib.mktemp (char*) is not callable using argument types (immutable(char)*)
How do I create a mutable zero-terminated C-style string?
I think I've read somewhere that string literals (const or immutable) are implicitly convertible to zero (null)-terminated strings.

For this specific problem:
This is because mktemp needs to write to the string. From mktemp(3):
The last six characters of template must be XXXXXX and these are replaced with a string that makes the filename unique. Since it will be modified, template must not be a string constant, but should be declared as a character array.
So what you want to do here is use a char[] instead of a string. I'd go with:
import std.stdio;
void main() {
import core.sys.posix.stdlib;
// we'll use a little mutable buffer defined right here
char[255] tempBuffer;
string name = "alphaXXXXXX"; // last six X's are required by mktemp
tempBuffer[0 .. name.length] = name[]; // copy the name into the mutable buffer
tempBuffer[name.length] = 0; // make sure it is zero terminated yourself
auto tmp = mktemp(tempBuffer.ptr);
import std.conv;
writeln(to!string(tmp));
}
In general, creating a mutable string can be done in one of two ways: one is to .dup something, or the other is to use a stack buffer like I did above.
toStringz doesn't care if the input data is mutable, it always returns immutable (apparently...). But it is easy to do it yourself:
auto c_str = ("foo".dup ~ "\0").ptr;
That's how you do it, .dup makes a mutable copy, and appending the zero terminator yourself ensures it is there.
string name = "alphaXXXXXX"; // last six X's are required by mktemp
auto tmp = mktemp((name.dup ~ "\0").ptr);

In addition to Adam's great answer, there's also std.utf.toUTFz, in which case you can do
void main()
{
import core.sys.posix.stdlib;
import std.conv, std.stdio, std.utf;
auto name = toUTFz!(char*)("alphaXXXXXX");
auto tmp = mktemp(name);
writeln(to!string(tmp));
}
std.utf.toUTFz is std.string.toStringz's more capable cousin as it will generate null-terminated UTF-8, UTF-16, and UTF-32 strings (as opposed to just UTF-8) as well as any level of constness. The downside is that it's more verbose for cases where you just want immutable(char)*, because you have to specify the return type.
However, if efficiency is a concern, Adam's solution is likely better simply because it avoids having to allocate the C-string that you pass to mktemp on the heap. toUTFz is shorter though, so if you don't care about the efficiency cost of allocating the C-string on the heap (and most programs probably won't), then toUTFz is arguably better. It depends on the requirements of your particular program.

Related

kotlin string concatenation no side effect

Why is string concatenation kept without side-effects?
fun main() {
var s1 = "abc"
val s2 = "def"
s1.plus(s2)
println(s1)
// s1 = s1.plus(s2)
// println(s1)
}
I expected abcdef, but got just abc. The commented code works fine, but seems awkward.
The plus() method (or the + operator, which is basically the same thing in kotlin) is returning the concatenated string. So by calling
s1.plus(s2)
//or
s1 + s2
you concatenate these strings but you throw away the result.
As this answer shows. Your plus operation returns the concatenated string independent from the given strings. This is because strings are immutable in Kotlin. An alternative is to use CharLists or buffers.
Note that immutability has some serious benefits. With immutablility it is no problem to give away your references to immutable objects because nobody can change the object. You need to change the reference in your model to hold a new value. This way it helps a lot to guarantee consistency and integrity in your model. Also note that immutable objects are threadsafe for that reason.

Allocation memory for string

In D string is alias on immutable char[]. So every operation on string processing with allocation of memory. I tied to check it, but after replacing symbol in string I see the same address.
string str = "big";
writeln(&str);
str.replace("i","a");
writeln(&str);
Output:
> app.exe
19FE10
19FE10
I tried use ptr:
string str = "big";
writeln(str.ptr);
str.replace(`i`,`a`);
writeln(str.ptr);
And got next output:
42E080
42E080
So it's showing the same address. Why?
You made a simple error in your code:
str.replace("i","a");
str.replace returns the new string with the replacement done, it doesn't actually replace the existing variable. So try str = str.replace("i", "a"); to see the change.
But you also made a too broadly general statement about allocations:
So every operation on string processing with allocation of memory.
That's false, a great many operations do not require allocation of new memory. Anything that can slice the existing string will do so, avoiding needing new memory:
import std.string;
import std.stdio;
void main() {
string a = " foo ";
string b = a.strip();
assert(b == "foo"); // whitespace stripped off...
writeln(a.ptr);
writeln(b.ptr); // but notice how close those ptrs are
assert(b.ptr == a.ptr + 2); // yes, b is a slice of a
}
replace will also return the original string if no replacement was actually done:
string a = " foo ";
string b = a.replace("p", "a"); // there is no p to replace
assert(a.ptr is b.ptr); // so same string returned
Indexing and iteration require no new allocation (of course). Believe it or not, but even appending sometimes will not allocate because there may be memory left at the end of the slice that is not yet used (though it usually will).
There's also various functions that return range objects that do the changes as you iterate through them, avoiding allocation. For example, instead of replace(a, "f", "");, you might do something like filter!(ch => ch != 'f')(a); and loop through, which doesn't allocate a new string unless you ask it to.
So it is a lot more nuanced than you might think!
In D all arrays are a length + a pointer to the start of the array values. These are usually stored on the stack which just so happens to be RAM.
When you go take an address of a variable (which is in a function body) what you really are doing is getting a pointer to the stack.
To get the address of an array values use .ptr.
So replace &str with str.ptr and you will get the correct output.

Fastest, leanest way to append characters to form a string in Swift

I come from a C# background where System.String is immutable and string concatenation is relatively expensive (as it requires reallocating the string) we know to use the StringBuilder type instead as it preallocates a larger buffer where single characters (Char, a 16-bit value-type) and short strings can be concatenated cheaply without extra allocation.
I'm porting some C# code to Swift which reads from a bit-array ([Bool]) at sub-octet indexes with character lengths less than 8 bits (it's a very space-conscious file format).
My C# code does something like this:
StringBuilder sb = new StringBuilder( expectedCharacterCount );
int idxInBits = 0;
Boolean[] bits = ...;
for(int i = 0; i < someLength; i++) {
Char c = ReadNextCharacter( ref idxInBits, 6 ); // each character is 6 bits in this example
sb.Append( c );
}
In Swift, I assume NSMutableString is the equivalent of .NET's StringBuilder, and I found this QA about appending individual characters ( How to append a character to string in Swift? ) so in Swift I have this:
var buffer: NSMutableString
for i in 0..<charCount {
let charValue: Character = readNextCharacter( ... )
buffer.AppendWithFormat("%c", charValue)
}
return String(buffer)
But I don't know why it goes through a format-string first, that seems inefficient (reparsing the format-string on every iteration) and as my code is running on iOS devices I want to be very conservative with my program's CPU and memory usage.
As I was writing this, I learned my code should really be using UnicodeScalar instead of Character, problem is NSMutableString does not let you append a UnicodeScalar value, you have to use Swift's own mutable String type, so now my code looks like:
var buffer: String
for i in 0..<charCount {
let x: UnicodeScalar = readNextCharacter( ... )
buffer.append(x)
}
return buffer
I thought that String was immutable, but I noticed its append method returns Void.
I still feel uncomfortable doing this because I don't know how Swift's String type is implemented internally, and I don't see how I can preallocate a large buffer to avoid reallocations (assuming Swift's String uses a growing algorithm).
(This answer was written based on documentation and source code valid for Swift 2 and 3: possibly needs updates and amendments once Swift 4 arrives)
Since Swift is now open-source, we can actually have a look at the source code for Swift:s native String
swift/stdlib/public/core/String.swift
From the source above, we have following comment
/// Growth and Capacity
/// ===================
///
/// When a string's contiguous storage fills up, new storage must be
/// allocated and characters must be moved to the new storage.
/// `String` uses an exponential growth strategy that makes `append` a
/// constant time operation *when amortized over many invocations*.
Given the above, you shouldn't need to worry about the performance of appending characters in Swift (be it via append(_: Character), append(_: UniodeScalar) or appendContentsOf(_: String)), as reallocation of the contiguous storage for a certain String instance should not be very frequent w.r.t. number of single characters needed to be appended for this re-allocation to occur.
Also note that NSMutableString is not "purely native" Swift, but belong to the family of bridged Obj-C classes (accessible via Foundation).
A note to your comment
"I thought that String was immutable, but I noticed its append method returns Void."
String is just a (value) type, that may be used by mutable as well as immutable properties
var foo = "foo" // mutable
let bar = "bar" // immutable
/* (both the above inferred to be of type 'String') */
The mutating void-return instance methods append(_: Character) and append(_: UniodeScalar) are accessible to mutable as well as immutable String instances, but naturally using them with the latter will yield a compile time error
let chars : [Character] = ["b","a","r"]
foo.append(chars[0]) // "foob"
bar.append(chars[0]) // error: cannot use mutating member on immutable value ...

What exactly are strings in Nim?

From what I understand, strings in Nim are basically a mutable sequence of bytes and that they are copied on assignment.
Given that, I assumed that sizeof would tell me (like len) the number of bytes, but instead it always gives 8 on my 64-bit machine, so it seems to be holding a pointer.
Given that, I have the following questions...
What was the motivation behind copy on assignment? Is it because they're mutable?
Is there ever a time when it isn't copied when assigned? (I assume non-var function parameters don't copy. Anything else?)
Are they optimized such that they only actually get copied if/when they're mutated?
Is there any significant difference between a string and a sequence, or can the answers to the above questions be equally applied to all sequences?
Anything else in general worth noting?
Thank you!
The definition of strings actually is in system.nim, just under another name:
type
TGenericSeq {.compilerproc, pure, inheritable.} = object
len, reserved: int
PGenericSeq {.exportc.} = ptr TGenericSeq
UncheckedCharArray {.unchecked.} = array[0..ArrayDummySize, char]
# len and space without counting the terminating zero:
NimStringDesc {.compilerproc, final.} = object of TGenericSeq
data: UncheckedCharArray
NimString = ptr NimStringDesc
So a string is a raw pointer to an object with a len, reserved and data field. The procs for strings are defined in sysstr.nim.
The semantics of string assignments have been chosen to be the same as for all value types (not ref or ptr) in Nim by default, so you can assume that assignments create a copy. When a copy is unneccessary, the compiler can leave it out, but I'm not sure how much that is happening so far. Passing strings into a proc doesn't copy them. There is no optimization that prevents string copies until they are mutated. Sequences behave in the same way.
You can change the default assignment behaviour of strings and seqs by marking them as shallow, then no copy is done on assignment:
var s = "foo"
shallow s

Obtaining a plain char* from a string in D?

I'm having an absolute hell of a time trying to figure out how to get a plain, mutable C string (a char*) from a D string (a immutable(char)[]) to that I can pass the character data to legacy C code. toStringz doesn't work, as I get an error saying that I "cannot implicitly convert expression (toStringz(this.fileName())) of type immutable(char)* to char*". Do I need to recreate a new, mutable array of char and copy the characters over?
If you can change the header of the D interface of that legacy C code, and you are sure that legacy C code will not modify the string, you could make it accept a const(char)*, e.g.
char* strncpy(char* dest, const(char)* src, size_t count);
// ^^^^^^^^^^^^
Yeah, it's not pretty, because the result is immutable.
This is why I always return a mutable copy of new arrays in my code. There's no point in making them immutable.
Solutions:
You could just do
char[] buffer = (myString ~ '\0').dup; //Concatenate a null terminator, then dup
then use buffer.ptr as the pointer.
However:
This wastes a string. A better approach might be:
char[] buffer = myString.dup;
buffer ~= '\0'; //Hopefully this doesn't reallocate
and using buffer.ptr afterwards.
Another solution is to use a method like this one:
char* toStringz(in char[] s)
{
string result;
if (s.length > 0 && s[$ - 1] == '\0') //Is it null-terminated?
{ result = s.dup; }
else { result = new char[s.length + 1]; result[0 .. s.length][] = s[]; }
return result.ptr;
}
This one is the most efficient but also the longest.
(Edit: Whoops, I had a typo in the if; fixed it.)
If you want to pass a mutable char* to a C function, you're going to need to allocate a mutable char[]. string isn't going to work, because it's immutable(char)[]. You can't alter immutable variables, so there is no way to pass a string to a function (C or otherwise) which needs to alter its elements.
So, if you have a string, and you need to pass it to a function which takes a char[], then you can use to!(char[]) or dup and get a mutable copy of it. In addition, if you want to pass it to a C function, you're going to need to append a '\0' to it so that it's zero-terminated. The easiest way to do that is just to do ~= '\0' on the char[], but the more efficient way would probably be to do something like this:
auto cstring = new char[](str.length + 1);
cstring[0 .. str.length] = str[];
cstring[$ - 1] = '\0';
In either case, you then pass cstring.ptr to the C function that you're calling.
If you know that the C function that you're calling isn't going to alter the string, then you can either do as KennyTM suggests and alter the C function's signature in D to take a const(char)*, or you can cast the string. e.g.
auto cstring = toStringz(str);
cfunc(cast(char*)cstring.ptr);
Altering the C function's signature would be more correct and less error-prone though.
It sounds like we may be altering std.conv.to to be smart enough to turn strings into zero-terminated strings when cast to char*, const(char)*, etc. So, once that's done, getting a zero-terminated mutable string should be easier, but for the moment, you pretty much just need to copy the string and append a '\0' to it so that it's zero-terminated. But regardless, you're never going to be able to pass a string to a C function which needs to modify it, because a string can't be mutated.
Without any context on which function you're calling it's hard to say what is the right solution.
Typically, if the C function wants to modify or write to the string it probably expects you to provide a buffer and a length. Usually what I do is:
Allocate a buffer:
auto buffer = new char[](256); // your own length instead of 256 here
And call the C function:
CWriteString(buffer.ptr, buffer.length);
You can try the following :
char a[]="abc";
char *p=a;
Now you can pass pointer 'p' to the array in any function.
Hope it works.

Resources