Convert char array to UNICODE in MFC C++ - visual-c++

I'm using the following code to read files from a folder in Windows. However, since this is an MFC application I have to convert the char array to UNICODE. For example, if I hard-code the path as "C:\images3\test\" as shown below, the code works.
WIN32_FIND_DATA FindFileData;
HANDLE hFind = INVALID_HANDLE_VALUE;
hFind = FindFirstFile(_T("C:\\images3\\test\\"), &FindFileData);
What I want is to get this working as follows:
char* pathOfFileType;
hFind = FindFirstFile(_T(pathOfFileType), &FindFileData);
Can anyone tell me how to fix this problem?
Thanks

Thanks a lot for all your responses. I learnt a lot from those answers because I also didn't have much idea about what is happening underneath. In the meantime I managed to get rid of the issue by simply converting to UNICODE, using the following code with minimal changes to my existing code.
#include <atlconv.h>
USES_CONVERSION;
//An ANSI string
LPSTR lpsz_ANSI_String = pathOfFileType;
//ANSI string being converted to a UNICODE string
LPWSTR lpUnicodeStr = A2W( lpsz_ANSI_String );
hFind = FindFirstFile(lpUnicodeStr, &FindFileData);

You can use the MultiByteToWideChar function to convert a string from chars to UTF-16, but you'd be better off getting pathOfFileType directly in Unicode from the user or from wherever you take it; otherwise you may still experience problems with paths that contain characters not included in the current code page.
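A minimal sketch of that conversion (my example, assuming pathOfFileType holds a NUL-terminated string in the current ANSI code page and the project is a Unicode build):
#include <vector>
// Ask for the required size in wide characters (incl. the terminating NUL),
// then convert and call the wide version of the API explicitly.
int len = MultiByteToWideChar(CP_ACP, 0, pathOfFileType, -1, NULL, 0);
std::vector<wchar_t> widePath(len);
MultiByteToWideChar(CP_ACP, 0, pathOfFileType, -1, &widePath[0], len);
WIN32_FIND_DATAW findData;
HANDLE hFind = FindFirstFileW(&widePath[0], &findData);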

Your question demonstrates a confusion of several issues. First, using MFC doesn't mean you have to convert the character array to Unicode; one has nothing to do with the other. Furthermore, FindFirstFile is a Win32 API, not an MFC function. Finally, _T("abc") is not necessarily Unicode; rather, _T(X) is a macro that in multi-byte builds expands to X, and in Unicode builds expands to L X, creating a wide-character literal. This is designed so that your code can compile in either a Unicode or a multi-byte configuration. To achieve the same flexibility when declaring a variable, you use the TCHAR type instead of char or wchar_t. So your second snippet should look like
TCHAR* pathOfFileType;
hFind = FindFirstFile(pathOfFileType, &FindFileData);
Note there is no _T macro here; it is only applied to string literals, not identifiers.
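For instance, a sketch of declaring and filling such a buffer so it compiles in both multi-byte and Unicode configurations (the wildcard pattern is just an example):
#include <tchar.h>
TCHAR pathOfFileType[MAX_PATH];
// _tcscpy_s maps to strcpy_s in multi-byte builds and wcscpy_s in Unicode builds
_tcscpy_s(pathOfFileType, MAX_PATH, _T("C:\\images3\\test\\*"));
hFind = FindFirstFile(pathOfFileType, &FindFileData);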

"since this a MFC application I have to convert the char array to UNICODE"
Not so. If you wish, you can change the project to use the Multi-Byte Character Set instead:
In Project Properties > General, change Character Set to 'Use Multi-Byte Character Set'.
Now this will work
char* pathOfFileType;
hFind = FindFirstFile(pathOfFileType, &FindFileData);
Supposing you want to keep using UNICODE (Visual Studio's name for the 2-byte encoding of Unicode characters native to Windows), then you have to explicitly call the ANSI ("A") version of the API, which also takes the ANSI variant of the find-data structure:
char* pathOfFileType;
WIN32_FIND_DATAA FindFileData;
hFind = FindFirstFileA(pathOfFileType, &FindFileData);
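A fuller sketch of the enumeration loop in its ANSI form (assuming pathOfFileType points to a wildcard pattern such as "C:\\images3\\test\\*"):
WIN32_FIND_DATAA FindFileData;
HANDLE hFind = FindFirstFileA(pathOfFileType, &FindFileData);
if (hFind != INVALID_HANDLE_VALUE) {
    do {
        // cFileName is a char array in the A variant of the structure
        printf("%s\n", FindFileData.cFileName);
    } while (FindNextFileA(hFind, &FindFileData));
    FindClose(hFind);
}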

Related

Converting Scandinavian letters from wstring to string

Goal
Converting a wstring containing ÅåÄäÖöÆæØø into a string in C++.
Environment
C++17, Visual Studio Community 2017, Windows 10 Pro 64-bit
Description
I'm trying to convert a wstring to a string and have implemented the solution suggested at
https://stackoverflow.com/a/3999597/1997617
// This is the code I use:
// Convert a wide Unicode string to an UTF8 string
std::string toString(const std::wstring &wstr)
{
if (wstr.empty()) return std::string();
int size_needed = WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
std::string strTo(size_needed, 0);
WideCharToMultiByte(CP_UTF8, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
return strTo;
}
So far so good.
My problem is that I have to handle Scandinavian letters (ÅåÄäÖöÆæØø) in addition to the English ones. Consider the input wstring below.
L"C:\\Users\\BjornLa\\Å-å-Ä-ä-Ö-ö Æ-æ-Ø-ø\\AEther Adept.jpg"
When returned it has become...
"C:\\Users\\BjornLa\\Å-å-Ä-ä-Ö-ö Æ-æ-Ø-ø\\AEther Adept.jpg"
... which causes me some trouble.
Question
So I would like to ask an often-asked question, but with a small addition:
How do I convert a wstring to a string when it contains Scandinavian characters?
So, I did some additional reading and experimenting based on the comments I got.
Turns out the solution is quite simple: just change CP_UTF8 to CP_ACP!
However...
Microsoft suggests that one actually should use CP_UTF8, if you read between the lines of the MSDN documentation for the function. The note for CP_ACP reads:
This value can be different on different computers, even on the same
network. It can be changed on the same computer, leading to stored
data becoming irrecoverably corrupted. This value is only intended for
temporary use and permanent storage should use UTF-16 or UTF-8 if
possible.
Also, the note for the function as a whole reads:
The ANSI code pages can be different on different computers, or can be
changed for a single computer, leading to data corruption. For the
most consistent results, applications should use Unicode, such as
UTF-8 or UTF-16, instead of a specific code page, unless legacy
standards or data formats prevent the use of Unicode. If using Unicode
is not possible, applications should tag the data stream with the
appropriate encoding name when protocols allow it. HTML and XML files
allow tagging, but text files do not.
So even though this CP_ACP solution works fine for my test cases, it remains to be seen if it is an overall good solution.
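For comparison, a minimal sketch of the CP_ACP variant described above; it is the same shape as the UTF-8 version, with only the code-page argument changed:
// Convert a wide Unicode string to the system's current ANSI code page
std::string toAnsiString(const std::wstring &wstr)
{
if (wstr.empty()) return std::string();
int size_needed = WideCharToMultiByte(CP_ACP, 0, &wstr[0], (int)wstr.size(), NULL, 0, NULL, NULL);
std::string strTo(size_needed, 0);
WideCharToMultiByte(CP_ACP, 0, &wstr[0], (int)wstr.size(), &strTo[0], size_needed, NULL, NULL);
return strTo;
}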

VC6 \r\n and Write works; Visual Studio 2013 does not work

The following code
if(!cfile.Open(fileName, CFile::modeCreate | CFile::modeReadWrite)){
return;
}
ggg.Format(_T("0 \r\n"));
cfile.Write(ggg, ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(ggg, ggg.GetLength());
produces the following:
0 SECTI
clearly this is wrong: (a) \r\n is ignored, and (b) the word SECTION is cut off.
Can someone please tell me what I am doing wrong?
The same code without _T() in VC6 produces the correct results.
Thank you
a.
Apparently, you are doing a Unicode build; CString (presumably that's what ggg is) holds a sequence of wchar_t characters, each two bytes large. ggg.GetLength() is the length of the string in characters.
However, CFile::Write takes the length in bytes, not in characters. You are passing half the number of bytes actually taken by the string, so only half the number of characters gets written.
Have you considered changing lines like:
cfile.Write(ggg, ggg.GetLength());
to
cfile.Write(ggg, ggg.GetLength() * sizeof(TCHAR));
Write needs the number of bytes (not characters). Since a Unicode (UTF-16) character is 2 bytes wide, you need to account for that. sizeof(TCHAR) is the number of bytes each character takes for a given build configuration: 1 for an ANSI build, 2 for a Unicode build. Multiply that by the string length and the number of bytes will be correct.
Information on TCHAR can be found in the MSDN documentation. In particular it is defined as:
The _TCHAR data type is defined conditionally in Tchar.h. If the symbol _UNICODE is defined for your build, _TCHAR is defined as wchar_t; otherwise, for single-byte and MBCS builds, it is defined as char. (wchar_t, the basic Unicode wide-character data type, is the 16-bit counterpart to an 8-bit signed char.)
TCHAR and _TCHAR in your usage should be synonymous. However, I believe these days Microsoft recommends including <tchar.h> and using _TCHAR. What I can't tell you is whether _TCHAR existed in VC 6.0.
With the method above, if you build for Unicode your output files will be in Unicode; if you build for ANSI, they will be output as 8-bit ASCII.
Want CFile::Write to output ASCII no matter what? Read on...
If you want all text written to the file as 8-bit ASCII you are going to have to use one of the conversion macros, in particular CT2A. More on the macros can be found in this MSDN article. Each macro's name can be broken apart: CT2A says convert the generic character string T (equivalent to W when _UNICODE is defined, equivalent to A otherwise) to ASCII, per the chart at the link. So whether the build is Unicode or ANSI, the output is ASCII. Your code would look something like:
ggg.Format(_T("0 \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
ggg.Format(_T("SECTION \r\n"));
cfile.Write(CT2A(ggg), ggg.GetLength());
Since the macro converts everything to ASCII, CString's GetLength() will suffice for the byte count.
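One caveat: GetLength() counts characters, and for pure-ASCII content that equals the converted byte count; if non-ASCII characters can appear, the ANSI result may use more than one byte per character, so a safer sketch measures the converted buffer directly:
ggg.Format(_T("SECTION \r\n"));
CT2A ansi(ggg); // keep the converted buffer alive in a named variable
cfile.Write(ansi, (UINT)strlen(ansi)); // byte length of the converted string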

Delphi Embarcadero XE: tons of warnings with String and PAnsiChar

I'm trying to migrate from Delphi 2007 to Embarcadero RAD Studio XE.
I'm getting tons of warnings.
They all look like this: I have a procedure where I declare a "String":
procedure SendMail( ADestinataire,ASubject : String);
And I'm trying to call a Windows API like this:
Res := MAPIResolveName(Session, Application.Handle,
PAnsiChar(ADestinataire), MAPI_LOGON_UI, 0, PRecip);
So the warnings are:
W1044: Suspicious typecast of string to PAnsiChar.
What am I doing wrong / how should I correct this (350 warnings...)?
Thank you very much
MAPIResolveName takes an LPSTR parameter, which is PAnsiChar in Delphi. Simple MAPI does not support UTF-16 strings (though it can be used with UTF-8 strings), so if you stick with Simple MAPI you should use AnsiStrings, for example
procedure SendMail( ADestinataire,ASubject : AnsiString);
or, better, you can keep
procedure SendMail( ADestinataire,ASubject : String);
and explicitly convert string parameters to AnsiStrings prior to calling MAPIResolveName
Update
The whole of Simple MAPI is now deprecated; Simple MAPI can be used with UTF-8 strings, but that requires some changes in code and in the registry.
So if the question is about quickly porting old ANSI Simple MAPI code to Unicode Delphi, the best approach is to stick with AnsiStrings.
A more solid approach is to abandon Simple MAPI completely and use Extended MAPI instead.
Just write
Res := MAPIResolveName(Session, Application.Handle,
PChar(ADestinataire), MAPI_LOGON_UI, 0, PRecip);
If you have a string, that is, a Unicode string, that is, a pointer to a sequence of Unicode characters, you shouldn't cast it to PAnsiChar. PAnsiChar is a pointer to a sequence of non-Unicode characters. Indeed, a cast to a PSomethingChar type simply tells the compiler to interpret the thing inside the cast as a pointer of the specified type; it doesn't do any conversion. So, basically, right now you are lying to the compiler: you have a Unicode string and instruct the compiler to interpret it as an ANSI (non-Unicode) string. That's bad.
Instead, you should cast it to PWideChar, a pointer to a sequence of Unicode characters. In Delphi 2009+, PChar is equivalent to PWideChar.
Of course, if you send a pointer to a sequence of Unicode characters to the function, then the function had better expect Unicode characters, and I am not sure that this is the case for the MAPIResolveName function. I suspect that it actually requires ANSI (that is, non-Unicode) characters. If this is the case, you need to convert the Unicode string to an ANSI (non-Unicode) string. This is easy: just write AnsiString(ADestinataire). Then you cast to PAnsiChar:
Res := MAPIResolveName(Session, Application.Handle,
PAnsiChar(AnsiString(ADestinataire)), MAPI_LOGON_UI, 0, PRecip);
Starting with Delphi 2009, the string data type is now implemented as a Unicode string. Previous versions implemented the string data type as an Ansi string (i.e. one byte per character).
This had significant implications when we ported our apps from 2007 to XE, and these articles were very helpful:
Delphi in a Unicode World (first of three parts)
Delphi and Unicode
Delphi Unicode Migration for Mere Mortals
To step back a moment:
When using       use the cast
===========      ============
String           PChar
AnsiString       PAnsiChar
WideString       PWideChar
String used to be an alias for AnsiString, which means it happened to work if you used PAnsiChar (even though you should have used PChar).
Now string is an alias for UnicodeString, which means using PAnsiChar (which was always wrong) is now really wrong.
Knowing this, you can see the mismatch in the question:
ADestinataire: String
PAnsiChar(ADestinataire)

Win32 development - String related datatypes in C++

I was going to start with Win32 app development. Before I could get the first window to display, I was ready to give up! I was overwhelmed by the number of datatypes you need to know about before you can write a simple WinMain and WndProc (unless you copy-paste, of course!).
Especially these:
LPSTR
LPCSTR
LPWSTR
LPCWSTR
Can someone point me to the right article that explains these with respect to Win32 programming? Which ones should I know about, which ones are needed in which situations, when to go for Unicode, what a multi-byte character set is, and all the related stuff.
And converting between these datatypes and char* or char[] and whatnot when calling Win32 API functions is a pain.
It is all so confusing.
Thanks for the help.
The pattern is relatively simple:
LPSTR = zero-terminated string of char
LPCSTR = constant zero-terminated string of char (C == constant)
LPWSTR = zero-terminated string of wchar_t (W == wide character)
LPCWSTR = constant zero-terminated string of wchar_t (C and W)
For details and explanations see e.g. http://www.codeproject.com/KB/string/cppstringguide1.aspx
The linked article also contains advice on when to use Unicode in your application and when not to.
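As a quick illustration, the four names are just typedefs over plain pointer types (a sketch; the real definitions live in WinNT.h):
#include <windows.h>
LPSTR   s   = NULL;      // char*           - modifiable ANSI string
LPCSTR  cs  = "hello";   // const char*     - read-only ANSI string
LPWSTR  ws  = NULL;      // wchar_t*        - modifiable wide (UTF-16) string
LPCWSTR cws = L"hello";  // const wchar_t*  - read-only wide string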

How do I PInvoke a multi-byte ANSI string?

I'm working on a PInvoke wrapper for a library that does not support Unicode strings, but does support multi-byte ANSI strings. While investigating FxCop reports on the library, I noticed that the string marshaling being used had some interesting side effects. The PInvoke method was using "best fit" mapping to create a single-byte ANSI string. For illustration, this is what one method looked like:
[DllImport("thedll.dll", CharSet=CharSet.Ansi)]
public static extern int CreateNewResource(string resourceName);
The result of calling this function with a string that contains non-ASCII characters is that Windows finds a "close" character; in practice an unmappable character generally ends up as "?". If we pretend that 'a' is a non-ASCII character, then passing "cat" as a parameter would create a resource named "c?t".
If I follow the guidelines in the FxCop rule, I end up with something like this:
[DllImport("thedll.dll", CharSet=CharSet.Ansi, BestFitMapping = false, ThrowOnUnmappableChar = true)]
public static extern int CreateNewResource([MarshalAs(UnmanagedType.LPStr)] string resourceName);
This introduces a change in behavior: now, when a character cannot be mapped, an exception is thrown. This concerns me because it is a breaking change, so I'd like to try to marshal the strings as multi-byte ANSI, but I cannot see a way to do so. UnmanagedType.LPStr is specified to be a single-byte ANSI string, LPTStr will be Unicode or ANSI depending on the system, and LPWStr is not what the library expects.
How would I tell PInvoke to marshal the string as a multi-byte string? I see there's a WideCharToMultiByte() API function; could I change the signature to expect an IntPtr to a string I create in unmanaged memory? It seems like this still has many of the problems that the current implementation has (it still might have to drop or substitute characters), so I'm not sure it is an improvement. Is there another method of marshaling that I'm missing?
ANSI is multi-byte, and ANSI strings are encoded according to the codepage currently enabled on the system. WideCharToMultiByte works the same way as P/Invoke.
Maybe what you're after is conversion to UTF-8. Although WideCharToMultiByte supports this, I don't think P/Invoke does, since it's not possible to adopt UTF-8 as the system-wide ANSI code page. At this point you'd be looking at passing the string as an IntPtr instead, although if you're doing that, you may as well use the managed Encoding class to do the conversion, rather than WideCharToMultiByte.
Here is the best way I've found to accomplish this: instead of marshalling the parameter as a string, marshal it as a byte[]. Put the responsibility on the caller of the P/Invoke API to convert to a byte array in the most appropriate fashion, most likely by using one of the System.Text.Encoding classes.
If you end up having to call WideCharToMultiByte manually, I would get rid of the p/invoke and manually marshal this using WideCharToMultiByte in a C++/CLI wrapper function. Managed C++ is much better at these interop scenarios than C# is.
Though, if this is the only p/invoke you have, it's probably not worth it.
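For illustration, a rough C++/CLI sketch of such a wrapper, assuming UTF-8 is the multi-byte encoding the library expects; CreateNewResourceNative is a hypothetical stand-in for the real native export:
// Hypothetical wrapper: convert the managed UTF-16 string to NUL-terminated
// UTF-8 bytes, pin them, and hand them to the native function.
int CreateNewResource(System::String^ resourceName)
{
    using namespace System::Text;
    int byteCount = Encoding::UTF8->GetByteCount(resourceName);
    // gcnew arrays are zero-initialized, so the extra byte is the NUL terminator
    array<System::Byte>^ utf8 = gcnew array<System::Byte>(byteCount + 1);
    Encoding::UTF8->GetBytes(resourceName, 0, resourceName->Length, utf8, 0);
    pin_ptr<System::Byte> p = &utf8[0];
    return CreateNewResourceNative(reinterpret_cast<const char*>(p));
}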
