Unicode to char* c++ 11 - string

I want to know whether there is any way to convert a Unicode code point to a std::string or char in C++11.
I've been trying with the extended Latin Unicode letter Á (as an example), which has this encoding:
letter: Á
Unicode: 0x00C1
UTF8 literal: \xc3\x81
I've been able to do so if it's hardcoded as:
const char* c = u8"\u00C1";
But if I get the code point as a short, how can I do the equivalent to obtain the char* or std::string 'Á'?
EDIT, SOLUTION:
I was finally able to do so; here is the solution if anyone needs it:
#include <codecvt>
#include <locale>
#include <string>

std::wstring ws;
for (short input : inputList)
{
    wchar_t wc(input);
    ws += wc;
}
std::wstring_convert<std::codecvt_utf8<wchar_t>> cv;
std::string str = cv.to_bytes(ws);
Thanks for the comments, they were very helpful.

The C++11 standard contains codecvt_utf8, which converts between some internal character type (try char16_t if your compiler has it, otherwise wchar_t) and UTF-8 encoding.
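For example, a minimal sketch along those lines (assuming the code points arrive as 16-bit values covering only the BMP, and that your compiler's char16_t support works with <codecvt>):
#include <codecvt>
#include <locale>
#include <string>
#include <vector>

// Sketch: convert 16-bit code points to a UTF-8 encoded std::string.
std::string to_utf8(const std::vector<short>& inputList)
{
    std::u16string u16;
    for (short input : inputList)
        u16 += static_cast<char16_t>(input);

    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> cv;
    return cv.to_bytes(u16);   // {0x00C1} -> "\xC3\x81" ("Á")
}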

The problem is that char is only one byte long, while Unicode characters can require two or more bytes.
You can still treat the data as a char*, but you must remember that you are not dealing with a plain ASCII string (in a wide encoding there will be zero bytes).
You may have to switch to wchar_t.
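To illustrate the multi-byte point (a small sketch, not from the original answer): the single character Á occupies two bytes when encoded as UTF-8:
#include <cstdio>
#include <cstring>

int main()
{
    const char* c = u8"\u00C1";                     // "Á" encoded as UTF-8
    std::printf("bytes: %zu\n", std::strlen(c));    // prints 2
    for (const char* p = c; *p; ++p)
        std::printf("0x%02X ", static_cast<unsigned char>(*p));   // 0xC3 0x81
    return 0;
}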

Related

How to convert AnsiString to std::string in C++ Builder?

I would like to ask how I can get the text input from a TEdit control and convert it to std::string (not AnsiString).
For example, if I have a TEdit control named User, I get the text from it with the User->Text property. What I want to do is assign that value to a std::string, for example string my_str = User->Text;.
How can I do this in C++Builder? Is there some sort of ToString() method? I was not able to find one.
In C++Builder 2007 and earlier, TEdit::Text is an 8-bit AnsiString in the user's default ANSI locale. It is very straightforward to convert an AnsiString to a std::string - just use the AnsiString::c_str() method to get a null-terminated char* pointer to the AnsiString data, and then you can assign that to the std::string, eg:
std::string my_str = User->Text.c_str();
/* or:
System::AnsiString text = User->Text;
std::string my_str(text.c_str(), text.Length());
*/
If you want the std::string data to be in another character encoding, such as UTF-8, then you will have to convert the AnsiString data accordingly, such as with MultiByteToWideChar()/WideCharToMultiByte(), UTF8Encode(), etc, before assigning it to the std::string.
In C++Builder 2009 and later, TEdit::Text is a 16-bit UnicodeString in UTF-16 format. The easiest way to convert a UnicodeString to a std::string is to first convert to an AnsiStringT<CP> (where CP is the desired ANSI codepage - AnsiString uses CP=0, UTF8String uses CP=65001, etc), and then convert that to std::string, eg:
std::string my_str = AnsiString(User->Text).c_str(); // or UTF8String, etc...
/* or:
System::AnsiString text = User->Text; // or UTF8String, etc...
std::string my_str(text.c_str(), text.Length());
*/
Alternatively, in C++11 and later, you can convert the UnicodeString to a std::wstring first, and then use std::wstring_convert, eg:
#include <locale>
#include <codecvt>
std::wstring my_wstr = User->Text.c_str();
/* or:
System::UnicodeString text = User->Text;
std::wstring my_wstr(text.c_str(), text.Length());
*/
// System::Char may be either wchar_t or char16_t, depending
// on which platform you are compiling for...
std::string my_str = std::wstring_convert<std::codecvt_utf8_utf16<System::Char>>{}.to_bytes(my_wstr);
I had a lot of those conversions to do when migrating from Borland to Embarcadero Rio, so I created a helper method for it.
#include <cwchar> // std::wcslen

// Converts a wide string to the system ANSI code page using a static
// buffer (STR_CONV_BUF_SIZE is defined elsewhere), so the result is
// overwritten on each call and is not thread-safe.
char* __fastcall AnsiOf(wchar_t* w)
{
    static char c[STR_CONV_BUF_SIZE];
    memset(c, 0, sizeof(c));
    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, w, std::wcslen(w), c, STR_CONV_BUF_SIZE, NULL, NULL);
    return c;
}
std::string my_str = AnsiOf((User->Text).c_str());

How to convert a BSTR string to unsigned char (using COM in the application)

I am writing a small application which uses COM. I want to convert a BSTR string to an unsigned char array. To do this, I used the W2A() macro to convert from BSTR to std::string and then copied the result of c_str() into an unsigned char array. The code snippet is as follows:
void Send(BSTR *packet, int length)
{
    std::string strPacket = W2A(*packet);
    unsigned char * pBuffer = new unsigned char [strPacket.length()+1];
    memset(pBuffer, 0, strPacket.length()+1);
    memcpy(pBuffer, strPacket.c_str(), strPacket.length()+1);
}
This works fine when the packet contains a normal string, but if the packet contains a NUL character, the problem occurs: some unknown characters appear after that NUL in pBuffer after the conversion.
Can anyone please let me know how to avoid that? Or is there any other way to do it correctly?
A BSTR is a Windows API type and must be managed with API macros or functions. If you cannot use the W2A macro because your string may have NUL characters inside, you will have to use functions such as WideCharToMultiByte that can convert the wide characters of the BSTR to narrow characters for a char*. Be sure to have the SDK documentation at hand. Alternatively, you could make your program use WCHARs throughout.
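A rough sketch of that approach (assuming the target encoding is the system ANSI code page): SysStringLen() gives the true length of the BSTR, so embedded NUL characters are preserved instead of truncating the data.
#include <windows.h>
#include <string>

// Convert a BSTR (UTF-16, possibly containing embedded NULs) to a
// byte string in the system ANSI code page.
std::string BstrToBytes(BSTR packet)
{
    const int wideLen = static_cast<int>(SysStringLen(packet)); // length in WCHARs, not NUL-terminated
    if (wideLen == 0)
        return std::string();

    // First call: ask how many narrow bytes are needed.
    int narrowLen = WideCharToMultiByte(CP_ACP, 0, packet, wideLen,
                                        NULL, 0, NULL, NULL);
    std::string result(narrowLen, '\0');

    // Second call: do the actual conversion into the pre-sized buffer.
    WideCharToMultiByte(CP_ACP, 0, packet, wideLen,
                        &result[0], narrowLen, NULL, NULL);
    return result;   // result.size() == narrowLen, NULs preserved
}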

How to convert unsigned char to LPCTSTR in visual c++?

BYTE name[1000];
In my Visual C++ project there is a variable named name with the BYTE data type. If I am not wrong, BYTE is equivalent to unsigned char. Now I want to convert this unsigned char* to LPCTSTR.
How should I do that?
LPCTSTR is defined as either char const* or wchar_t const* based on whether UNICODE is defined or not.
If UNICODE is defined, then you need to convert the multi-byte string to a wide-char string using MultiByteToWideChar.
If UNICODE is not defined, a simple cast will suffice: reinterpret_cast<char const*>(name).
This assumes that name is a null-terminated C string, in which case defining it as BYTE would make no sense. You should use CHAR or TCHAR, based on how you are operating on name.
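A hedged sketch of the UNICODE case (assuming name holds a NUL-terminated ANSI string):
#include <windows.h>
#include <vector>

// Convert a NUL-terminated ANSI string stored in a BYTE buffer to a
// wide string usable as LPCWSTR (i.e. LPCTSTR in a UNICODE build).
std::vector<wchar_t> AnsiBytesToWide(const BYTE* name)
{
    const char* src = reinterpret_cast<const char*>(name);
    int wideLen = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0); // includes the NUL
    std::vector<wchar_t> wide(wideLen);
    MultiByteToWideChar(CP_ACP, 0, src, -1, wide.data(), wideLen);
    return wide;   // use wide.data() wherever an LPCTSTR is expected
}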
You can also assign the name variable to a CString object directly:
CString strName = name;
And then you can call CString's GetBuffer() or, preferably, GetString() method to obtain an LPCTSTR. The advantage is that the CString class will perform any required conversions automatically for you, so there is no need to worry about the Unicode setting.
LPCTSTR pszName = strName.GetString();

Conversion of CString to float

Could somebody help me with the following problem?
strFixFactorSide = _T("0.5");
dFixFactorSide = atof((const char *)(LPCTSTR)strFixFactorSide);
"dFixFactorSide" takes value as 0.0000;
How I will get correct value?
Use _tstof() instead of atof(), cast the CString to LPCTSTR, and leave it as such instead of trying to get it to const char *. Forget about const char * (LPCSTR) while you're working with Unicode and use only const _TCHAR * (LPCTSTR).
int _tmain(int argc, TCHAR* argv[], TCHAR* envp[])
{
    int nRetCode = 0;
    CString s1 = _T("123.4");
    CString s2 = _T("567.8");
    double v1 = _tstof((LPCTSTR)s1);
    double v2 = _tstof((LPCTSTR)s2);
    _tprintf(_T("%.3f"), v1 + v2);
    return nRetCode;
}
and running this correctly gives the expected answer.
I think your CString strFixFactorSide is a Unicode (UTF-16) string.
If it is, the cast (const char *) only changes the pointer type, but the string it points to still remains Unicode.
atof() doesn't work with Unicode strings. If you shove L"0.5" into it, it will fetch bytes 0x30 ('0') and 0x00 (also part of UTF-16 '0'), treat that as a NUL-terminated ASCII string "0" and convert it to 0.0.
If CString strFixFactorSide is a Unicode string, you need to either first convert it to an ASCII string and then apply atof() or use a function capable of converting Unicode strings to numbers. _wtof() can be used for Unicode strings.
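A minimal sketch of that last option (assuming a Unicode MFC build, so the CString converts implicitly to LPCWSTR):
#include <cstdlib>   // _wtof (Microsoft CRT)

CString strFixFactorSide = _T("0.5");
// In a Unicode build CString converts implicitly to LPCWSTR,
// so it can be passed to _wtof() directly.
double dFixFactorSide = _wtof(strFixFactorSide);   // 0.5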

LPCSTR, TCHAR, String

I use the following string types: LPCSTR, TCHAR, String. I want to convert:
from TCHAR to LPCSTR
from String to char
I convert from TCHAR to LPCSTR with this code:
RunPath = TEXT("C:\\1");
LPCSTR Path = (LPCSTR)RunPath;
From String to char I convert with this code:
SaveFileDialog^ saveFileDialog1 = gcnew SaveFileDialog;
saveFileDialog1->Title = "Save settings file";
saveFileDialog1->Filter = "bck files (*.bck)|*.bck";
saveFileDialog1->RestoreDirectory = true;
pin_ptr<const wchar_t> wch = TEXT("");
if ( saveFileDialog1->ShowDialog() == System::Windows::Forms::DialogResult::OK ) {
wch = PtrToStringChars(saveFileDialog1->FileName);
} else return;
ofstream os(wch, ios::binary);
My problem is that when I set Configuration Properties -> General -> Character Set to "Use Multi-Byte Character Set", the first part of the code works correctly, but the second part returns error C2440. When I set Configuration Properties -> General -> Character Set to "Use Unicode", the second part of the code works correctly, but the first part returns only the first character of the TCHAR string cast to LPCSTR.
I'd suggest you need to be using Unicode the whole way through.
LPCSTR is a "Long Pointer to a Constant String". That's typically not what you want when you're dealing with .NET methods. The char type in .NET is 16 bits wide.
You also should not use the TEXT("") macro unless you're planning multiple builds using various character encodings. Try wrapping your string literals with the L"" prefix instead and doing a pure Unicode build if you can.
See if that helps.
PS. std::wstring is very handy in your scenario.
EDIT
You see only one character because the string is now Unicode but you cast it as a regular narrow string. Most of the Unicode characters in the ASCII range have the same numeric value as in ASCII but have the second of their two bytes set to zero. So when a Unicode string is read as a C string, you only see the first character, because C strings are NUL (zero) terminated. The easy (and wrong) way to deal with this is to cast via std::wstring to std::string and then pull the C string out of that. This is not the safe approach, because Unicode has a much larger character space than your standard encoding.
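A sketch of the safer route the answer alludes to (my assumption here is that UTF-8 output is acceptable): re-encode the wide string explicitly instead of reinterpreting the pointer, which would stop at the first zero byte.
#include <codecvt>
#include <locale>
#include <string>

// Explicitly re-encode a wide (UTF-16 on Windows) string as UTF-8
// rather than casting the pointer to char*.
std::string ToUtf8(const std::wstring& wide)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> cv;
    return cv.to_bytes(wide);
}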
