Which wide-character string structure do I use? CString vs wstring

Which wide-character string structure do I use? CString vs wstring - string

I have an MFC application in C++ that uses std::string and std::wstring, and frequently casts from one to the other, and a whole lot of other nonsense. I need to standardize everything to a single format, so I was wondering if I should go with CString or std::wstring.
In the application, I'll need to generate strings from a string table, work with a lot of windows calls that require constant tchar or wchar_t pointers, Edit Controls, and interact with a COM object's API that requires BSTR.
I also have vectors of strings, so is there any problem with a vector of CStrings?
Which one is better? What are the pros and cons of each?
Examples
BSTR to wstring
CComBSTR tstr;
wstring album;
if( (trk->get_Info((BSTR *)&tstr)) == S_OK && tstr!= NULL)
album = (wstring)tstr;
wstring to BSTR
CComBSTR tstr = path.c_str();
if(trk->set_Info(tstr) == S_OK)
return true;
String resource to wstring
CString t;
wstring url;
t.LoadString(IDS_SCRIPTURL);
url = t;
GetProfileString() returns a CString.
integer to string format:
wchar_t total[32];
swprintf_s(total, 32, L"%d", trk->getInt());
wstring tot(total);

std::basic_string<> (or rather its specialisations) is horrible to work with, it's imo one of the major shortcomings of the STL (and I'd say C++ in general). It doesn't even know about encodings - c'mon, this is 2010. Being able to define the size of your character isn't enough, 'cause there's no way indicate variable-size characters in a basic_string<>. Now, utf-8 isn't nice to work with with a CString, but it's not as bad as trying to do it with basic_string. While I agree with the spirit of the above posters that a standard solution is better than the alternatives, CString is (if your project uses MFC or ATL anyway) much nicer to work with than std::string/wstring: conversions between ANSI/Unicode (through CStringA and CStringW), BSTR, loading from string table, cast operators to TCHAR (.c_str()? really?), ...
CString also has Format(), which, although not safe and somewhat ugly, is convenient. If you prefer safe formatting libraries, you'll be better off with basic_string.
Furthermore, CString has some algorithms as member functions that you'll need boost string utilities for to do on basic_string such as trim, split etc.
Vectors of CString are no problem.
Guard against a dogmatic dismissal of CString on the basis of it being Windows-only: if you use it in a Windows GUI, the application is Windows-only anyway. That being said, if there's any chance that your code will need to be cross-platform in the future, you're going to be stuck with basic_string<>.

I personally would go with CStrings in this case, since you state that you're working with BSTRs, using COM, and writing this in MFC. While wstrings would be more standards compliant you'll run into issues with constant converting from one to another. Since you're working with COM and writing it in MFC, there's no real reason to worry about making it cross-platform, since no other OS has COM like Windows and MFC is already locking you into Windows.
As you noted, CStrings also have built-in functions to help load strings and convert to BSTRs and the like, all pre-made and already built to work with Windows. So while you need to standardize on one format, why not make it easier to work with?

std::wstring would be much more portable, and benefit from a lot of existing prewritten code in STL and in boost. CString would probably go better with windows API's.
wchat_t : remember that you can get the data out of wstring any time by using the data() function, so you get the needed wchar_t pointer anyway.
BSTR : use SysAllocString to get the BSTR out of a wstring.data().
As for the platform dependance, remember that you can use std::basic_string<T> to define your own string, based on what you want the length of a single character to be.
I'd go for wstring every day....

Related

Convert from a LPCTSTR to a wchar*

I have a c++ application, and i need to convert a LPCTSTR to a wchar*.
Is there function to perform this conversion?
Using Visual Studio 2k8.
Thank you

From the comments, you are compiling for Unicode. In which case LPCTSTR evaluates as const wchar_t* and so no conversion is necessary. If you need a modifiable buffer, then you can allocate one and perform a memory copy. That works because the string is already encoded in UTF-16.
Since you are using C++ it makes sense to store strings in string classes rather than using raw C strings. For example you could use std::wstring. Or your could use the MFC/ATL string classes. Exactly which of these options is best for you depends on the specifics of the rest of your code base.

LPCTSTR may either be multibyte or Unicode, determined at compile-time. WinNT.h defines it as follows
#ifdef UNICODE
typedef LPCWSTR LPCTSTR;
#else
typedef LPCSTR LPCTSTR
#endif
meaning that it is already composed of wchar as Rup points out in a comment. So you might want to check UNICODE and use MultiByteToWideChar() if undefined. Of course, you'd need to know the code page the string is using, which depends on where and how it originates. The MultiByteToWideChar documentation has good code samples.

C++/CX - How to convert a number stored as String^ to byte?

I have a C++/CX Windows 8 application and I need to do something similar to the following conversion:
String^ foo = "32";
byte bar = <the numeric value of foo>
How can I convert the number stored within the String^ into the byte type? I am lost without all of the C# magic that I normally use to achieve this!
Thanks in advance for any help on this.

You are getting into trouble by assuming that C++/CX resembles C#. That's not the case at all, it is pure C++ with just some language extensions to make dealing with WinRT types easier. This is not appropriate use of the Platform::String type, it is not a general purpose string class. That's already covered by the standard C++ library. The class was intentionally crippled to discourage the usage you have in mind. This MSDN library article explains it well:
Use the Platform::String Class when you pass strings back and forth to methods in Windows Runtime classes, or when you are interacting with other Windows Runtime components across the application binary interface (ABI) boundary. The Platform::String Class provides methods for several common string operations, but it's not designed to be a full-featured string class. In your C++ module, use standard C++ string types such as wstring for any significant text processing, and then convert the final result to Platform::String^ before you pass it to or from a public interface. It's easy and efficient to convert between wstring or wchar_t* and Platform::String.
So the appropriate code ought to resemble:
#include <string>
...
std::wstring foo(L"32");
auto bar = static_cast<unsigned char>(std::stol(foo));

ATL CString or_bstr_t?

In our COM project, we need to choose between best string class implementation so that BSTR (used for COM interfaces) and elegant string class like CString provides many string manipulation APIs.
Are there any better way to handle the strings and string operations so that it can be BSTR complaints as well as we can have naive CString operations?

Unortunately nothing realy elegant here. The best you can do is to use CString::AllocSysString() and you better use a BSTR wrapper like CComBSTR or _bstr_t to manage the resulting BSTR lifetime. See this question for how it usually done.

CString has AllocSysString function:
http://msdn.microsoft.com/en-us/library/za1865s1%28VS.80%29.aspx
You can use it before calling COM methods.
You can use _bstr_t::Attach to create _bstr_t instance from CString::AllocSysString call, in this case you don't need to care about BSTR release.

Why use CComBSTR instead of just passing a WCHAR*?

I'm new to COM. What exactly is the advantage of replacing:
L"String"
with
CComBSTR(L"String")
I can see a changelist in the COM part of my .NET application where all strings are replaced in this way. Would like to know what is the need for this.

BSTR is not the same as WCHAR[]. BSTR values are prefixed with their length, as well as null-terminated.
If you're dealing with in-process objects that are written in C or C++, you'll usually get away with this, because the C/C++ code will probably assume that your BSTR is a null-terminated wide character string.
If, on the other hand, you're dealing with out-of-process/cross-machine objects, the proxy/stub marshalling code will assume that you really did pass a BSTR, and will expect to find a length field (it needs this to know how much data to marshal). This will go horribly wrong.
In short: if something expects a BSTR, call SysAllocString (or CComBSTR, or CString::AllocSysString).

You could just pass L"Something" into a COM method declared as expecting a BSTR but you should never do so.
The convention is that a BSTR is allocated using one of SysAllocString() family functions and anyone who receives a BSTR can (and should) call SysStringLen() whenever they want to find the string length. SysStringLen() relies on the BSTR being allocated with SysAllocString() family functions (since it uses extra data allocated and initialized by those functions) and if that requirement is violated the program will run into undefined behaviour.
Using SysAllocString() directly requires also calling SysFreeString() to free the string (otherwise memory is leaked) so it leads to lots of code and possibly to errors. The better way is just using a wrapper class such as CComBSTR or _bstr_t for managing the BSTRs - they will call SysAllocString()/SysFreeString() when necessary (unless you abuse them, of course).

Why are there so many string types in MFC?

LPTSTR* arrpsz = new LPTSTR[ m_iNumColumns ];
arrpsz[ 0 ] = new TCHAR[ lstrlen( pszText ) + 1 ];
(void)lstrcpy( arrpsz[ 0 ], pszText );
This is a code snippet about String in MFC and there are also _T("HELLO"). Why are there so many String types in MFC? What are they used for?

Strictly speaking, what you're showing here are windows specific strings, not MFC String types (but your point is even better taken if you add in CString and std::string). It's more complex than it needs to be -- largely for historical reasons.
tchar.h is definitely worth looking at -- also search for TCHAR on MSDN.
There's an old joke about string processing in C that you may find amusing: string handling in C is so efficient because there's no string type.

Historical reasons.
The original windows APIs were in C (unless the real originals were in Pascal and have been lost in the mists). Microsoft created its own datatypes to represent C datatypes, likely because C datatypes are not standard in their size. (For C integral types, char is at least 8 bits, short is at least 16 bits and at least as big as a char, int is at least 16 bits and at least as big as a short, and long is at least 32 bits and at least as big as an int.) Since Windows ran first on essentially 16-bit systems and later 32-bit, C compilers didn't necessarily agree on sizes. Microsoft further designated more complex types, so (if I've got this right) a C char * would be referred to as a LPCSTR.
Thing is, an 8-bit character is not suitable for Unicode, as UTF-8 is not easy to retrofit into C or C++. Therefore, they needed a wide character type, which in C would be referred to as wchar_t, but which got a set of Microsoft datatypes corresponding to the earlier ones. Furthermore, since people might want to compile sometimes in Unicode and sometimes in ASCII, they made the TCHAR character type, and corresponding string types, which would be based on either char (for ASCII compilation) or wchar_t (for Unicode).
Then came MFC and C++ (sigh of relief) and Microsoft wanted a string type. Since this was before the standardization of C++, there was no std::string, so they invented CString. (They also had container classes that weren't compatible with what came to be the STL and then the containers part of the library.)
Like any mature and heavily used application or API, there's a whole lot in it that would be done completely differently if it were possible to do it over from scratch.

See Generic-Text Mappings in TCHAR.H and the description of LPTSTR in Windows Data Types.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string