This is a little different from the SameText question. I need to convert an AnsiString into an Integer.
var
  param: AnsiString;
  num: Integer;
begin
  if TryStrToInt(param, num) then
    ...
In pre-Unicode Delphi I would use the TryStrToInt function, but in modern Delphi there is only the Unicode version of it, so I'm getting this warning on the call: W1057 Implicit string cast from 'AnsiString' to 'string'.
My question is: how do I properly convert AnsiStrings in modern Delphi without getting compiler warnings (and without superfluous casts such as UnicodeString(text))?
Various options are available to you:
1. Accept and embrace Unicode. Stop using AnsiString.
2. Use an explicit conversion to string: TryStrToInt(string(param), num).
3. Disable warning W1057.
4. Perform the conversion from ANSI to UTF-16 yourself with a call to MultiByteToWideChar. This isn't a serious option, but if you want to leave W1057 enabled, and not use an explicit conversion, then it's what is left.
Frankly, option 1 is to be preferred. If you persist with AnsiString throughout your code you will be wallowing in an endless morass of casts and warnings. If you have a need for ANSI-encoded strings it is likely to be at an interop boundary. Perhaps you are reading or writing files that use ANSI encoding. Perform the conversion between ANSI and UTF-16 at the interop boundary, as sketched below. The rest of the time, for your internal code, use string.
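To illustrate, here is a minimal sketch of that boundary conversion, assuming the data arrives from an ANSI-encoded file; ReadAnsiFile is an invented helper name:

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper: read an ANSI-encoded file and hand back a native string.
function ReadAnsiFile(const FileName: string): string;
var
  Bytes: TBytes;
begin
  Bytes := TFile.ReadAllBytes(FileName);
  // Decode exactly once, at the boundary; everything downstream uses string.
  Result := TEncoding.ANSI.GetString(Bytes);
end;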
how to properly convert AnsiStrings in modern Delphi without getting compiler warnings (and without superfluously having to cast string to UnicodeString)
If you don't cast from AnsiString to string, the compiler will do it for you. You can't avoid the cast. It's just a matter of whether you do it explicitly or the compiler does it for you implicitly.
When you explicitly do the conversion in code (via cast), then the compiler doesn't worry about the side effects. It assumes you know what you're doing and leaves you alone.
You'll have to choose one. Compiler warning or explicit casting.
You could technically turn those compiler warnings off (but don't do this):
W1057 IMPLICIT_STRING_CAST ON Implicit string cast from '%s' to '%s' (Delphi)
W1058 IMPLICIT_STRING_CAST_LOSS ON Implicit string cast with potential data loss from '%s' to '%s' (Delphi)
When to use which type of STRING in Eiffel? I saw code using READABLE_STRING_GENERAL and having to call l_readable_string.out to convert it to STRING.
READABLE_STRING_GENERAL is an ancestor of all variants of strings: mutable, immutable, 8-bit, 32-bit, so it can be used as a formal argument type when the feature can handle any string variant.
READABLE_STRING_32 is a good choice when the code handles Unicode, and can work with either mutable or immutable versions.
STRING_32 is a mutable Unicode variant. The code can change its value.
STRING is an alias for a string type that can either be STRING_8 or STRING_32. At the time of writing only a few libraries are adapted to handle the mapping of STRING to STRING_32. However, this mapping could become the default in the future to facilitate working with Unicode.
Regardless of the future, I would recommend using ..._STRING_32 to process strings. This way the code directly supports Unicode. The libraries are also updated in this direction. For example, io.put_string_32 can be used to print a Unicode string to the standard output (using the current locale).
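As a minimal sketch of that recommendation (the feature name show is invented; as_string_32 and io.put_string_32 are the library calls doing the work):

show (a_text: READABLE_STRING_GENERAL)
        -- Accept any string variant; convert once and process as Unicode.
    do
        io.put_string_32 (a_text.as_string_32)
    end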
Just as a follow-up (I know I am years late in posting anything).
When converting a bytearray object (or a bytes object, for that matter) to a C string, the Cython documentation recommends using the following:
cdef char * cstr = py_bytearray
There is no overhead, as cstr points directly at the buffer of the bytearray object.
However, C strings are null-terminated, and thus in order to be able to pass cstr to a C function it must also be null-terminated. The Cython documentation doesn't provide any information on whether the resulting C strings are null-terminated.
It is possible to add a NUL byte explicitly to the bytearray object, e.g. by using b'text\x00' instead of just b'text'. Yet this is cumbersome, easy to forget, and there is at least experimental evidence that the explicit NUL byte is not needed:
%%cython
from libc.stdio cimport printf

def printit(py_bytearray):
    cdef char *ptr = py_bytearray
    printf("%s\n", ptr)
And now
printit(bytearray(b'text'))
prints the desired "text" to stdout (which, in the case of an IPython notebook, is obviously not the output shown in the browser).
But is this a lucky coincidence or is there a guarantee, that the buffer of a bytearray-object (or a bytes-object) is null-terminated?
I think it's safe (at least in Python 3), however I'd be a bit wary.
Cython uses the C-API function PyByteArray_AsString. The Python 3 documentation for it says "The returned array always has an extra null byte appended." The Python 2 version does not have that note, so it's difficult to be sure whether it's safe there.
Practically speaking, I think Python deals with this by always over-allocating bytearrays by one byte and NUL-terminating them (see the source code for one example of where this is done).
The only reason to be a bit cautious is that it's perfectly acceptable for bytearrays (and Python strings, for that matter) to contain a 0 byte within the string, so the NUL isn't a good indicator of where the end is. Therefore, you should really be using their len anyway. (This is a weak argument though, especially since you're probably the one initializing them, so you know whether this should be true.)
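To make that concrete, here is a minimal sketch (write_bytes is an invented name) that passes the buffer together with its Python-level length, so an embedded NUL byte cannot silently truncate the data:

%%cython
from libc.stdio cimport fwrite, stdout

def write_bytes(py_bytearray):
    # Use the pointer plus the explicit length, rather than trusting
    # a trailing NUL to mark the end of the data.
    cdef char *ptr = py_bytearray
    cdef Py_ssize_t n = len(py_bytearray)
    fwrite(ptr, 1, n, stdout)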
(My initial version of this answer had something about _PyByteArray_empty_string. #ead pointed out in the comments that I was mistaken about this and hence it's edited out...)
What's the most direct way to use a C string as Rust's Path?
I've got const char * from FFI and need to use it as a filesystem path in Rust.
I'd rather not enforce UTF-8 on the path, so converting through str/String is undesirable.
It should work on Windows at least for ASCII paths.
To clarify: I'm just replacing an existing C implementation that passes the path to fopen with a Rust stdlib implementation. It's not my problem whether it's a valid path or encoded properly for a given filesystem, as long as it's not worse than fopen (and I know fopen basically doesn't work on Windows).
Here's what I've learned:
Path/OsStr always use WTF-8 on Windows, and are an encoding-ignorant bag of bytes on Unix.
They never ever store any paths using any "wide" encoding like UTF-16 or UCS-2. The Windows-only masquerade of OsStr is to hide the WTF-8 encoding, nothing more.
It is extremely unlikely to ever change, because the standard library API supports creation of Path and OsStr from UTF-8 &str without any allocation or mutation of memory (i.e. as_ref() is supported, and its strict API doesn't leave room to implement it as anything other than a pointer cast).
Unix-only zero-copy version (it doesn't even depend on any implementation details):
use std::ffi::{CStr, OsStr};
use std::os::unix::ffi::OsStrExt;
use std::path::Path;

// CStr::from_ptr is unsafe: the pointer must be valid and NUL-terminated.
let slice = unsafe { CStr::from_ptr(c_null_terminated_string_ptr_here) };
let osstr = OsStr::from_bytes(slice.to_bytes());
let path: &Path = osstr.as_ref();
On Windows, converting only valid UTF-8 is the best Rust can do without a charade of creating WTF-8 OsString from code units:
…
let str = ::std::str::from_utf8(slice.to_bytes()).expect("keep your surrogates paired");
let path: &Path = str.as_ref();
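Putting the two platform branches together, here is a hedged sketch of a helper (cstr_to_path is an invented name) that borrows a &Path from a NUL-terminated pointer; the caller must guarantee the pointer is valid:

use std::ffi::CStr;
use std::os::raw::c_char;
use std::path::Path;

// Unix: zero-copy; any byte sequence is a legal path.
#[cfg(unix)]
unsafe fn cstr_to_path<'a>(ptr: *const c_char) -> Option<&'a Path> {
    use std::os::unix::ffi::OsStrExt;
    let bytes = CStr::from_ptr(ptr).to_bytes();
    Some(Path::new(std::ffi::OsStr::from_bytes(bytes)))
}

// Windows: only valid UTF-8 can become a Path without building a
// WTF-8 OsString from code units.
#[cfg(windows)]
unsafe fn cstr_to_path<'a>(ptr: *const c_char) -> Option<&'a Path> {
    let bytes = CStr::from_ptr(ptr).to_bytes();
    std::str::from_utf8(bytes).ok().map(Path::new)
}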
Safely and portably? Insofar as I'm aware, there isn't a way. My advice is to demand UTF-8 and just pray it never breaks.
The problem is that the only thing you can really say about a "C string" is that it's NUL-terminated. You can't really say anything meaningful about how it's encoded. At least, not with any real certainty.
Unsafely and/or non-portably? If you're running on Linux (and possibly other modern *NIXen), you can maybe use OsStrExt to do the conversion. This only works assuming the C string was a valid path in the first place. If it came from some string processing code that wasn't using the same encoding as the filesystem (which these days is generally "arbitrary bytes that look like UTF-8 but might not be")... well, you'll have to convert it yourself, first.
On Windows? Hahahaha. This depends on where the string came from. C strings embedded in an executable can be in a variety of encodings depending on how the code was compiled. If it came from the OS itself, it could be in one of two different encodings: the thread's OEM codepage, or the thread's ANSI codepage. I never worked out how to check which it's set to. If it came from the console, it would be in whatever the console's input encoding was set to when you received it... assuming it wasn't piped in from something else that was using a different encoding (hi there, PowerShell!). All of the above require you to roll your own transcoding code, since Rust itself avoids this by never, ever using non-Unicode APIs on Windows.
Oh, and don't forget that there is no 8-bit encoding that can properly store Windows paths, since Windows paths are "arbitrary 16-bit words that look like UTF-16 but might not be". [1]
... so, like I said: demand UTF-8 and just pray it never breaks, because trying to do it "correctly" leads to madness.
[1]: I should clarify: there is such an encoding: WTF-8, which is what Rust uses for OsStr and OsString on Windows. The catch is that nothing else on Windows uses this, so it's never going to be how a C string is encoded.
Written in Delphi XE3, my software is communicating with an instrument that occasionally sends binary data. I had expected I should use AnsiString, since this data will never be Unicode. I couldn't believe that the following code doesn't work as I had expected. I'm supposing that the characters I'm exposing it to are considered illegitimate...
var
  s: AnsiString;
begin
  s := 'test' + chr(128);
  // I had expected that since the string was set above to end in #128,
  // it should end in #128... it does not.
  if ord(s[5]) <> 128 then
    ShowMessage('String ending is not as expected!');
end;
Naturally, I could use a pointer to accomplish this, but I would think I should probably be using a different kind of string. Of course, I could use a byte array, but a string would be more convenient.
Really, I'd like to know "why" and to have some good alternatives.
Thanks!
The behaviour you observe stems from the fact that Chr(128) is a UTF-16 WideChar representing U+0080.
When translated to your ANSI locale this does not map to ordinal 128. I would expect U+0080 to have no equivalent in your ANSI locale and therefore map to ? to indicate a failed translation.
Indeed, the compiler will even warn you that this can happen. Your code, when compiled with default compiler options, yields these warnings:
W1058 Implicit string cast with potential data loss from 'string' to 'AnsiString'
W1062 Narrowing given wide string constant lost information
Personally, I would configure the warnings to treat both of those as errors.
The fundamental issue is revealed here:
My software is communicating with an instrument that occasionally sends binary data.
The correct data type for byte-oriented binary data is an array of bytes. In Delphi that would be TBytes.
It is wrong to use AnsiString since that exposes you to codepage translations. You want to be able to specify ordinal values and you categorically do not want text encodings to play a part. You do not want for your program's behaviour to be determined by the prevailing ANSI locale.
Strings are for text. For binary use byte arrays.
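As a minimal sketch of the idea, this builds the same five values the question attempted, with no text encoding involved (the Assert is only there to demonstrate that no translation takes place):

var
  Data: TBytes;
begin
  // Ordinal values are stored exactly as written; no codepage is involved.
  Data := TBytes.Create(Ord('t'), Ord('e'), Ord('s'), Ord('t'), 128);
  Assert(Data[4] = 128); // holds regardless of the prevailing ANSI locale
end;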
I am just summarizing info about implementing a digital tree (Trie) in VBA. I am not asking how to do that so please do not post your solutions - my specific question regarding fixed-length Strings in class modules comes at the end of this post.
A Trie is all about efficiency and performance, therefore most other programming languages use a Char data type to represent members of TrieNodes. Since VBA does not have a Char data type, I was thinking about faking it by using a fixed-length String with 1 character.
Note: I can come up with a work-around for this, i.e. use Byte and a simple function to convert between Chr() and Asc(), or an Enum, or declare a private str As String * 1 and take advantage of Get/Let properties, but that's not the point. Stay tuned though, because...
According to the Public Statement page of the Microsoft Help, you can't declare a fixed-length String variable in class modules.
I can't find any reasonable explanation for this constraint.
Can anyone give some insight why such a restriction applies to fixed-length Strings in class modules in VBA?
The VBA/VB6 runtime is heavily reliant on the COM system (oleaut32 et al) and this enforces some rules.
You can export a class file between VB "stuff", but if you publish (or could theoretically publish) it as a COM object, it must be able to describe a "fixed-length string" in its interface description/type library so that, say, a C++ client can consume it.
A fixed-length string is "special" because it has active behaviour, i.e. it's not a dumb data type; it behaves somewhat like a class. For example, it's always padded: if you assign to it, it will have trailing spaces, and in VBA the compiler adds generated code to get that behaviour. A C++ consumer would be unaware of the fixed-length nature of the string, because the interface can't describe it/does not support a corresponding type (a String is a BSTR), which could lead to problems.
Strings are of type BSTR, and as with a byte array, you would still lose the padding semantics if you used one of those instead.
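For completeness, here is a minimal sketch of the Private-field workaround mentioned in the question (the class and property names are invented). The question's own workaround relies on the fact that a Private fixed-length field is accepted in a class module, and the padding behaviour survives behind the property:

' Class module, e.g. clsChar
Private mChar As String * 1

Public Property Get Value() As String
    Value = mChar
End Property

Public Property Let Value(ByVal s As String)
    mChar = s   ' assignment pads/truncates to exactly one character
End Property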