How to know if wstring can be safely (no data loss) converted to string? - string

So I already know how to convert wstring to string (How to convert wstring into string?).
However, I would like to know whether it is safe to make the conversion, meaning that the wstring variable does not contain any characters that are not supported by the string type.

strings can hold any data, if you use the right encoding. They are just sequences of bytes. But you need to check with your particular encoding / conversion routine.
Should be simply a matter of round-tripping. An elegant solution to many things.
Warning, Pseudo-code, there is no literal convert_to_wstring() unless you make it so:
if(convert_to_wstring(convert_to_string(ws)) == ws)
happy_days();
If what goes in comes out, it is non-lossy (at least for your code points).
Not that it's the most efficient solution, but it should allow you to build on your favorite conversion routines.
// Round-trip and see if we lose anything
bool check_ws2s(const std::wstring& wstr)
{
return (s2ws(ws2s(wstr)) == wstr);
}
Using @dk123's conversions for C++11 from How to convert wstring into string? (upvote his answer here: https://stackoverflow.com/a/18374698/257090)
#include <codecvt>
#include <locale>
#include <string>
// Note: std::wstring_convert and std::codecvt_utf8 are deprecated since C++17.
std::wstring s2ws(const std::string& str)
{
typedef std::codecvt_utf8<wchar_t> convert_typeX;
std::wstring_convert<convert_typeX, wchar_t> converterX;
return converterX.from_bytes(str);
}
std::string ws2s(const std::wstring& wstr)
{
typedef std::codecvt_utf8<wchar_t> convert_typeX;
std::wstring_convert<convert_typeX, wchar_t> converterX;
return converterX.to_bytes(wstr);
}
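The round-trip idea does not depend on any particular conversion routine. As a minimal sketch, the check can be written once and parameterised on whichever pair of conversions you use. The names is_lossless, narrow_by_truncation and widen_bytes below are hypothetical and exist only to make the example runnable; the truncating pair stands in for your real conversions.

```cpp
#include <string>

// Hypothetical "truncating" conversion: keeps only the low byte of each wide char.
std::string narrow_by_truncation(const std::wstring& ws) {
    std::string out;
    out.reserve(ws.size());
    for (wchar_t wc : ws) out.push_back(static_cast<char>(wc));
    return out;
}

// Hypothetical inverse: widens each byte without sign extension.
std::wstring widen_bytes(const std::string& s) {
    std::wstring out;
    out.reserve(s.size());
    for (char c : s) out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    return out;
}

// Round-trip check: the conversion is lossless for ws if what goes in comes out.
template <typename ToNarrow, typename ToWide>
bool is_lossless(const std::wstring& ws, ToNarrow to_narrow, ToWide to_wide) {
    return to_wide(to_narrow(ws)) == ws;
}
```

With this pair, a plain ASCII string survives the round trip, while a string containing the euro sign (U+20AC) does not, because its high byte is dropped on the way to char.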
Note, if your idea of conversion is truncating the wide chars to chars, then it is simply a matter of iterating and checking that each wide char value fits in a char. This will probably do it.
WARNING: Not appropriate for multibyte encoding.
for(wchar_t& wc: ws) {
// Lossy if the value changes after narrowing to char and widening back
if(wc != static_cast<wchar_t>(static_cast<char>(wc)))
return false;
}
return true;
Or:
// Could use a narrowing cast comparison, but this avoids any warnings
for(wchar_t& wc: ws) {
if(wc > std::numeric_limits<char>::max())
return false;
}
return true;
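The loops above are fragments, so here is one way they might look as a complete function. This sketch makes two assumptions that are not in the original: it checks for 7-bit ASCII (0x00 to 0x7F) rather than numeric_limits<char>::max(), so the result does not depend on whether char is signed, and it also rejects negative values on platforms where wchar_t is signed. The name is_seven_bit_ascii is hypothetical.

```cpp
#include <algorithm>
#include <string>

// Returns true if every wide character is in the 7-bit ASCII range,
// so truncating each one to char cannot lose information.
bool is_seven_bit_ascii(const std::wstring& ws) {
    return std::all_of(ws.begin(), ws.end(), [](wchar_t wc) {
        return wc >= 0 && wc <= 0x7F;
    });
}
```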
FWIW, in Win32, there are conversion routines that accept a parameter of WC_ERR_INVALID_CHARS that tells the routine to fail instead of silently dropping code points. Non-standard solutions, of course.
Example: WideCharToMultiByte()
http://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx

Related

Convert an Arduino C++ String to an ANSI C string (char* array) using the String method .c_str() [duplicate]

Because many posts about this problem are misleading or ambiguous, here is how it works, just for the record (tested):
How to convert an Arduino C++ String to an ANSI C string (char* array) using the String method .c_str()
String myString = "WLAN_123456789"; // Arduino String
char cbuffer[128]=""; // ANSI C string (char* array); adjust size appropriately
strcpy( cbuffer, myString.c_str() );
example (tested):
void setup() {
Serial.begin(115200);
delay(1000);
Serial.println();
String myString = "WLAN_123456789";
char cbuffer[128]="";
Serial.println(myString); // debug
strcpy( cbuffer, myString.c_str() );
Serial.println(cbuffer); // debug
}
void loop() {
}
PS:
The example is meant for non-constant Strings and non-constant char arrays, which can be assigned to repeatedly and arbitrarily
(it also works for constant strings).
I know your intention is to share a "best practice" for using String.c_str(), but I can't help saying that your "answer" is exactly the kind that causes the confusion you describe. If anyone finds String.c_str() misleading or ambiguous, it is probably because they have not read the c_str() entry in the Arduino Reference or do not fully understand pointers. The reference says:
Converts the contents of a String as a C-style, null-terminated string. Note that this gives direct access to the internal String buffer and should be used with care. In particular, you should never modify the string through the pointer returned. When you modify the String object, or when it is destroyed, any pointer previously returned by c_str() becomes invalid and should not be used any longer.
String.c_str() returns a pointer to a char array (i.e. an array of char terminated with a '\0'); in other words, const char* ptr = String.c_str() represents exactly what String.c_str() is about. Note that it gives direct access to the buffer inside the String object, and you are not supposed to change it. So in most cases you use it like Serial.print(ptr). You only need a copy of it (as shown in your strcpy) if you want to modify the contents of your copy.
To put it simply: if the char array (i.e. String.c_str()) is the banana you want to get from a jungle (i.e. the String object), which consists of the forest, the monkey and the whole nine yards, then String.c_str() points you directly to the banana, nothing more, nothing less.
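The strcpy in the answer above does not check that the destination is large enough. As a hedged sketch of a bounds-checked copy, the function below uses std::string as a stand-in for the Arduino String class, since both expose c_str() with the same "read-only pointer into the internal buffer" contract. That substitution and the name copy_c_string are assumptions for illustration, not part of the Arduino API.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copies src (including its terminating '\0') into dst if it fits.
// On failure dst is left as an empty string and false is returned.
bool copy_c_string(const std::string& src, char* dst, std::size_t dst_size) {
    if (dst == nullptr || dst_size == 0) return false;
    if (src.size() + 1 > dst_size) {
        dst[0] = '\0';
        return false;
    }
    // c_str() is only read here; the copy in dst may be modified freely afterwards.
    std::memcpy(dst, src.c_str(), src.size() + 1);
    return true;
}
```

The copy stays valid after the source string is modified or destroyed, which is the only reason to copy at all; for read-only use the pointer from c_str() is enough.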

C++ override quotes

Ok, so I'm using C++ to make a library that'd help me to print lines into a console.
So, I want to override " "(quote operators) to create an std::string instead of the string literal, to make it easier for me to append other data types to that string I want to output.
I've seen this done before in the wxWidgets with their wxString, but I have no idea how I can do that myself.
Is that possible and how would I go about doing it?
I've already tried using this code, but with no luck:
class PString{
std::string operator""(const char* text, std::size_t len) {
return std::string(text, len);
}
};
I get this error:
error: expected suffix identifier
std::string operator""(const char* text, std::size_t len) {
^~
which, I'd assume, wants me to add a suffix after the "", but I don't want that. I only want to use "" (quotes).
Thanks!
You can't use "" without defining a suffix. A plain "" literal is an array of const char (which decays to const char*). It can carry a prefix (like L"", u"", U"", u8"", R"()") or be followed by a suffix (like ""s, ""sv, ...), and only suffixes can be overloaded.
The way wxString works is to provide an implicit constructor wxString::wxString(const char*), so that when you pass "some string" into a function it is essentially the same as passing wxString("some string").
Overloading operator""X gives you user-defined literals with the suffix X, as the other answer describes.
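Both options mentioned in the answers can be sketched in a few lines. The suffix _ps, the class PString (reusing the asker's name) and the helper describe are hypothetical; the leading underscore matters because suffixes without one are reserved for the standard library.

```cpp
#include <cstddef>
#include <string>

// Option 1: a user-defined literal. "text"_ps yields a std::string.
std::string operator""_ps(const char* text, std::size_t len) {
    return std::string(text, len);
}

// Option 2: a wrapper with an implicit (non-explicit) constructor,
// so a plain "text" literal converts automatically at call sites.
class PString {
public:
    PString(const char* text) : value_(text) {}
    const std::string& str() const { return value_; }
private:
    std::string value_;
};

// Any function taking a PString can now be called with a string literal.
std::string describe(const PString& s) {
    return "[" + s.str() + "]";
}
```

With these definitions, "Count: "_ps + std::to_string(3) is an ordinary std::string concatenation, and describe("hello") compiles because the literal is converted through the implicit constructor.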

Listing files in directory

I have created a windows form in c++ which, upon a button click, opens a dialog box for folder selection.
Now what I would like to do is get the list of files in that directory so that I can process them one by one.
I have googled it in many ways and found several approaches that rely on external libraries (such as boost and dirent.h). I would rather not use external resources, only the default ones at my disposal.
I've read about FindFirstFile and FindNextFile, but couldn't get that combination to work.
Could you please assist?
Thanks a lot,
Idan.
Here is the updated code:
HANDLE hFind;
WIN32_FIND_DATA FindFileData;
FolderBrowserDialog^ folderBrowserDialog1 = gcnew FolderBrowserDialog;
if (folderBrowserDialog1->ShowDialog() == System::Windows::Forms::DialogResult::OK)
{
String ^ selected = folderBrowserDialog1->SelectedPath;
selected += "\\*";
char* stringPointer = (char*) Marshal::StringToHGlobalAnsi(selected).ToPointer();
hFind = FindFirstFile((LPCWSTR)stringPointer, &FindFileData);
while(hFind != INVALID_HANDLE_VALUE)
{
printf("Found file: %s\r\n", FindFileData.cFileName);
if(FindNextFile(hFind, &FindFileData) == FALSE)
break;
}
}
You are obviously compiling for UNICODE (wide char), since you need to cast stringPointer for the lpFileName parameter of FindFirstFile. But since you pass an ANSI string, you probably won't get a useful result. You didn't write what you expect to find.
In the code before FindFirstFile you manually convert the SelectedPath value to an ANSI char string. That makes no sense when you need a wide char string anyway. Get the LPCWSTR from the String selected with the StringToHGlobalUni method. It looks something like this (not tested):
LPCWSTR stringPointer = static_cast<LPCWSTR>(Marshal::StringToHGlobalUni(selected).ToPointer());
hFind = FindFirstFile(stringPointer, &FindFileData);
In general: don't use casts except when you need to adapt to a badly designed interface. Use them only when you know exactly what you are doing.
Furthermore, you don't check the hFind result of FindFirstFile. It will be INVALID_HANDLE_VALUE if you pass a pointer to the wrong string format.

Converting wchar_t* to char* on iOS

I'm attempting to convert a wchar_t* to a char*. Here's my code:
size_t result = wcstombs(returned, str, length + 1);
if (result == (size_t)-1) {
int error = errno;
}
It indeed fails, and error is filled with 92 (ENOPROTOOPT) - Protocol not available.
I've even tried setting the locale:
setlocale(LC_ALL, "C");
And this one too:
setlocale(LC_ALL, "");
I'm tempted to just throw the characters with static casts!
It seems the issue was that the source string used a non-standard encoding (two ASCII characters for each wide character). It looked fine in the debugger, but internally it was clearly corrupt. The error code produced does not appear to be documented, but it is equivalent to simply not being able to decode that piece of text.
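For reference, a conversion with full error checking might look like the sketch below. It uses std::wcsrtombs, the restartable sibling of wcstombs, because the standard defines its behaviour when the destination is null (it only computes the required size). The function name narrow_in_current_locale is hypothetical, and which characters convert successfully depends on the locale set with setlocale.

```cpp
#include <cstddef>
#include <cwchar>
#include <string>

// Converts ws to a multibyte string using the current C locale.
// Returns false if any character cannot be represented.
// Note: conversion stops at the first embedded L'\0'.
bool narrow_in_current_locale(const std::wstring& ws, std::string& out) {
    std::mbstate_t state = std::mbstate_t();
    const wchar_t* src = ws.c_str();

    // First pass: null destination, so only the required byte count is computed.
    const std::size_t needed = std::wcsrtombs(nullptr, &src, 0, &state);
    if (needed == static_cast<std::size_t>(-1)) return false;

    // Second pass: convert into a buffer of exactly that size.
    out.assign(needed, '\0');
    state = std::mbstate_t();
    src = ws.c_str();
    const std::size_t written = std::wcsrtombs(&out[0], &src, needed, &state);
    return written == needed;
}
```

A false return here plays the same role as the (size_t)-1 result in the question: the input contains something the active locale cannot encode, or the input is not valid wide-character data in the first place.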

LPCSTR, TCHAR, String

I use the following string types: LPCSTR, TCHAR and String. I want to convert:
from TCHAR to LPCSTR
from String to char
I convert from TCHAR to LPCSTR with this code:
RunPath = TEXT("C:\\1");
LPCSTR Path = (LPCSTR)RunPath;
I convert from String to char with this code:
SaveFileDialog^ saveFileDialog1 = gcnew SaveFileDialog;
saveFileDialog1->Title = "Сохранение файла-настроек";
saveFileDialog1->Filter = "bck files (*.bck)|*.bck";
saveFileDialog1->RestoreDirectory = true;
pin_ptr<const wchar_t> wch = TEXT("");
if ( saveFileDialog1->ShowDialog() == System::Windows::Forms::DialogResult::OK ) {
wch = PtrToStringChars(saveFileDialog1->FileName);
} else return;
ofstream os(wch, ios::binary);
My problem is that when I set "Configuration Properties -> General -> Character Set" to "Use Multi-Byte Character Set", the first part of the code works correctly, but the second part gives error C2440.
When I set it to "Use Unicode Character Set", the second part works correctly, but the first part yields only the first character when converting from TCHAR to LPCSTR.
I'd suggest you need to be using Unicode the whole way through.
LPCSTR is a "long pointer to a constant string", i.e. a const char*. That's typically not what you want when you're dealing with .NET methods, because the char type in .NET is 16 bits wide.
You also should not use the TEXT("") macro unless you're planning multiple builds using various character encodings. Try writing all your string literals as wide literals with the L"" prefix instead, and use a pure Unicode build if you can.
See if that helps.
PS. std::wstring is very handy in your scenario.
EDIT
You see only one character because the string is now Unicode (UTF-16) but you cast it to a narrow string. Characters in the ASCII range keep the same numeric value in Unicode, but in UTF-16 the second of their two bytes is zero. So when a Unicode string is read as a C string, you only see the first character, because C strings are null (zero) terminated. The easy (and wrong) way to deal with this is to convert the std::wstring to a std::string and then pull the C string out of that. This is not a safe approach, because Unicode has a much larger character space than your standard narrow encoding.
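The "only one character" effect can be reproduced in portable C++. The sketch below uses a char16_t literal as a stand-in for a UTF-16 TCHAR string and assumes a little-endian machine, where the low byte of each code unit comes first. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Reads a UTF-16 buffer as if it were a narrow C string, which is what
// the (LPCSTR) cast does. strlen stops at the first zero byte.
std::size_t length_when_read_as_narrow(const char16_t* wide) {
    return std::strlen(reinterpret_cast<const char*>(wide));
}

// The real length, counted in 16-bit code units.
std::size_t length_in_code_units(const char16_t* wide) {
    return std::char_traits<char16_t>::length(wide);
}
```

For u"C:\\1" the real length is 4 code units, but on a little-endian machine the narrow view ends after the single byte 'C', because the next byte is the zero high byte of that same code unit.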

Resources