Some Unicode characters (like 𤭢) have code points that need more than 2 bytes to encode. How do I use Win32 API functions like CreateFile() with such characters?
From WinBase.h:
WINBASEAPI
__out
HANDLE
WINAPI
CreateFileA(
    __in      LPCSTR lpFileName,
    __in      DWORD dwDesiredAccess,
    __in      DWORD dwShareMode,
    __in_opt  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in      DWORD dwCreationDisposition,
    __in      DWORD dwFlagsAndAttributes,
    __in_opt  HANDLE hTemplateFile
    );

WINBASEAPI
__out
HANDLE
WINAPI
CreateFileW(
    __in      LPCWSTR lpFileName,
    __in      DWORD dwDesiredAccess,
    __in      DWORD dwShareMode,
    __in_opt  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    __in      DWORD dwCreationDisposition,
    __in      DWORD dwFlagsAndAttributes,
    __in_opt  HANDLE hTemplateFile
    );

#ifdef UNICODE
#define CreateFile  CreateFileW
#else
#define CreateFile  CreateFileA
#endif // !UNICODE
LPCSTR and LPCWSTR are defined in WinNT.h as:
typedef __nullterminated CONST CHAR *LPCSTR, *PCSTR;
typedef __nullterminated CONST WCHAR *LPCWSTR, *PCWSTR;
CHAR and WCHAR are defined in WinNT.h as:
typedef char CHAR;
#ifndef _MAC
typedef wchar_t WCHAR; // wc, 16-bit UNICODE character
#else
// some Macintosh compilers don't define wchar_t in a convenient location, or define it as a char
typedef unsigned short WCHAR; // wc, 16-bit UNICODE character
#endif
CreateFileA() accepts LPCSTR file names, which point to arrays of 8-bit char elements.
CreateFileW() accepts LPCWSTR file names, which point to arrays of 16-bit wchar_t elements.
I have created a file at the path C:\𤭢.txt. It looks like it is not possible to open this file using CreateFile(), because its name contains the character 𤭢, whose Unicode code point is 0x24B62 and doesn't fit in a single WCHAR element.
But that file exists on my hard disk and Windows manages it normally. How do I open this file with a Win32 API function, the way Windows does internally?
Such characters are represented by UTF-16 surrogate pairs. It takes two wide character elements to represent that code point. So, you just need to call CreateFile passing the necessary surrogate pair. And naturally you need to use the wide variant of CreateFile.
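For example, a minimal sketch of opening the file from the question with a hard-coded pair (0xD852/0xDF62 is the UTF-16 encoding of U+24B62; a C++11 compiler also lets you write the \U escape instead):

#include <windows.h>

int main()
{
    // U+24B62 encoded as the UTF-16 surrogate pair 0xD852 0xDF62.
    // With C++11 you can write the same literal as L"C:\\\U00024B62.txt".
    const wchar_t* path = L"C:\\\xD852\xDF62.txt";

    HANDLE hFile = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                               OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile != INVALID_HANDLE_VALUE)
        CloseHandle(hFile);
    return 0;
}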
Presumably you won't be hard-coding such a filename in your code; you'll be getting it from a file dialog, FindFirstFile(), or similar, and those APIs will hand you the appropriately UTF-16 encoded buffer for the filename.
I find the use of 'sys_' in many places in the Linux kernel, such as 'sys_mount', 'sys_unlink', and so on.
But I can't find where these functions (or whatever they are) are defined. I guess they are defined in some way like #define sys(name) sys_##name(){}.
Can you tell me where and how they are defined? The kernel version is 4.14.255.
The table of syscalls seems to be defined for each architecture, e.g.
#
# 64-bit system call numbers and entry vectors
#
# The format is:
# <number> <abi> <name> <entry point>
#
# The abi is "common", "64" or "x32" for this file.
#
  0   common   read    sys_read
  1   common   write   sys_write
  2   common   open    sys_open
  3   common   close   sys_close
...
Source: https://github.com/torvalds/linux/blob/v4.14/arch/x86/entry/syscalls/syscall_64.tbl
The functions themselves are spread throughout the code. For example, sys_read is defined as:
SYSCALL_DEFINE3(read, unsigned int, fd, char __user *, buf, size_t, count)
{
return ksys_read(fd, buf, count);
}
Source: https://github.com/torvalds/linux/blob/v4.14/fs/read_write.c#L615-L618
The SYSCALL_DEFINE3 macro is defined at https://github.com/torvalds/linux/blob/v4.14/include/linux/syscalls.h#L198.
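The asker's guess about token pasting is essentially right: the macro glues sys_ onto the name with ##, which is why a plain grep for "sys_read(" finds nothing. A toy illustration of the technique (TOY_SYSCALL_DEFINE1 is a hypothetical macro for demonstration; the kernel's real one additionally emits tracing metadata and wrapper functions):

#include <cstdio>

// Toy version of the kernel's trick: the macro pastes "sys_" onto
// the given name, so the function definition never spells out
// "sys_read" literally in the source.
#define TOY_SYSCALL_DEFINE1(name, t1, a1) long sys_##name(t1 a1)

TOY_SYSCALL_DEFINE1(read, int, fd)
{
    std::printf("sys_read(fd=%d)\n", fd);
    return 0;
}

int main()
{
    return (int)sys_read(3); // prints "sys_read(fd=3)"
}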
I would like to ask how I can get the text from a TEdit control and convert it to a std::string (not AnsiString).
For example, if I have a TEdit control named User, I get its text with User->Text. What I want is to assign that value to a std::string, for example string my_str = User->Text;.
How can I do this in C++Builder? Is there some sort of ToString() method? I was not able to find one.
In C++Builder 2007 and earlier, TEdit::Text is an 8-bit AnsiString in the user's default ANSI locale. It is very straightforward to convert an AnsiString to a std::string - just use the AnsiString::c_str() method to get a null-terminated char* pointer to the AnsiString data, and then you can assign that to the std::string, eg:
std::string my_str = User->Text.c_str();
/* or:
System::AnsiString text = User->Text;
std::string my_str(text.c_str(), text.Length());
*/
If you want the std::string data to be in another character encoding, such as UTF-8, then you will have to convert the AnsiString data accordingly, such as with MultiByteToWideChar()/WideCharToMultiByte(), UTF8Encode(), etc, before assigning it to the std::string.
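For example, a minimal sketch of a UTF-8 conversion for C++Builder 2007 and earlier (this assumes the RTL's UTF8Encode() is available in your version; it takes a WideString and returns the UTF-8 bytes in an AnsiString):

// ANSI -> UTF-16 -> UTF-8
System::WideString wide = User->Text;        // ANSI to UTF-16
System::AnsiString utf8 = UTF8Encode(wide);  // UTF-16 to UTF-8 bytes
std::string my_str(utf8.c_str(), utf8.Length());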
In C++Builder 2009 and later, TEdit::Text is a 16-bit UnicodeString in UTF-16 format. The easiest way to convert a UnicodeString to a std::string is to first convert to an AnsiStringT<CP> (where CP is the desired ANSI codepage - AnsiString uses CP=0, UTF8String uses CP=65001, etc), and then convert that to std::string, eg:
std::string my_str = AnsiString(User->Text).c_str(); // or UTF8String, etc...
/* or:
System::AnsiString text = User->Text; // or UTF8String, etc...
std::string my_str(text.c_str(), text.Length());
*/
Alternatively, in C++11 and later, you can convert the UnicodeString to a std::wstring first, and then use std::wstring_convert, eg:
#include <locale>   // std::wstring_convert
#include <codecvt>  // std::codecvt_utf8_utf16
std::wstring my_wstr = User->Text.c_str();
/* or:
System::UnicodeString text = User->Text;
std::wstring my_wstr(text.c_str(), text.Length());
*/
// System::Char may be either wchar_t or char16_t, depending
// on which platform you are compiling for...
std::string my_str = std::wstring_convert<std::codecvt_utf8_utf16<System::Char>>{}.to_bytes(my_wstr);
I had a lot of these to migrate from Borland to Embarcadero Rio, so I created a method to do it.
#include <cwchar> // std::wcslen

#define STR_CONV_BUF_SIZE 4096 // assumed: pick a size large enough for your strings

char* __fastcall AnsiOf(wchar_t* w)
{
    // Note: the static buffer is overwritten on each call and is not thread-safe.
    static char c[STR_CONV_BUF_SIZE];
    memset(c, 0, sizeof(c));
    // Pass size-1 so the last zeroed byte always terminates the result.
    WideCharToMultiByte(CP_ACP, WC_NO_BEST_FIT_CHARS, w, (int)std::wcslen(w),
                        c, STR_CONV_BUF_SIZE - 1, NULL, NULL);
    return c;
}
std::string my_str = AnsiOf((User->Text).c_str());
I have a problem with the ReadProcessMemory function in C++. The function itself works fine, but when it comes to larger addresses (for example, when I use unsigned __int64 instead of DWORD, since DWORD is too small for the address), it gives me a wrong pointer address. Here is the relevant code:
DWORD tempAddress;
unsigned __int64 potentialBasePointerAddress = 0x13F8A0000 + 0x18606B8; //I used unsigned __int64 since 0x13F8A0000 is too large for DWORD
if (ReadProcessMemory(hProcess, (LPVOID)potentialBasePointerAddress, &tempAddress, sizeof(tempAddress), NULL))
{
cout << tempAddress << endl;
}
//in this specific case the tempAddress is 1BD5679 (or 29185657) but actually it should be 3E7D4FE0 (see (*))
(*) Screenshot: Cheat Engine result
If I change DWORD tempAddress; to unsigned __int64 tempAddress; the tempAddress is 1953A4A0002C88A5 (or 18249832811198240377)
I have really no clue how to solve this problem. I am pretty sure there is a way to handle base addresses larger than a DWORD, but I haven't been able to find it...
I am thankful for every help!
Are you compiling the program as a 64-bit binary? If you are compiling it as 32-bit, then the cast to LPVOID truncates your pointer to a 32-bit value, and you simply read from the wrong address.
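For illustration, a minimal sketch of a correct 64-bit read (this assumes hProcess was opened with PROCESS_VM_READ; ReadDwordAt is a hypothetical helper name):

#include <windows.h>

// Reads a DWORD from a 64-bit address in another process. The
// static_assert documents the point above: only in a 64-bit build
// does the cast to LPCVOID keep the full address instead of
// truncating it.
bool ReadDwordAt(HANDLE hProcess, unsigned __int64 address, DWORD &value)
{
    static_assert(sizeof(void*) == 8, "compile this as a 64-bit binary");
    SIZE_T bytesRead = 0;
    return ReadProcessMemory(hProcess,
                             reinterpret_cast<LPCVOID>(address),
                             &value, sizeof(value), &bytesRead)
           && bytesRead == sizeof(value);
}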
BYTE name[1000];
In my Visual C++ project there is a variable named name with the BYTE data type. If I am not mistaken, BYTE is equivalent to unsigned char. Now I want to convert this unsigned char* to LPCTSTR.
How should I do that?
LPCTSTR is defined as either char const* or wchar_t const* based on whether UNICODE is defined or not.
If UNICODE is defined, then you need to convert the multi-byte string to a wide-char string using MultiByteToWideChar.
If UNICODE is not defined, a simple cast will suffice: reinterpret_cast< char const* >( name ) (static_cast does not work here, because unsigned char* and char* are unrelated pointer types).
This assumes that name is a null-terminated C string, in which case declaring it as BYTE makes little sense. You should use CHAR or TCHAR, based on how you are operating on name.
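For the UNICODE branch, a minimal sketch of that conversion (AnsiBytesToWide is a hypothetical helper; it assumes name holds a null-terminated string in the system ANSI codepage):

#include <windows.h>
#include <string>
#include <vector>

std::wstring AnsiBytesToWide(const BYTE *name)
{
    const char *src = reinterpret_cast<const char*>(name);
    // First call computes the required length, terminator included.
    int len = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (len <= 1)
        return std::wstring();
    std::vector<wchar_t> buf(len);
    MultiByteToWideChar(CP_ACP, 0, src, -1, &buf[0], len);
    // The result's c_str() is usable wherever an LPCWSTR (and thus
    // LPCTSTR under UNICODE) is expected.
    return std::wstring(&buf[0], len - 1);
}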
You can also assign the name variable to a CString object directly:
CString strName = name;
And then you can call CString's GetBuffer(), or preferably GetString(), method to get an LPCTSTR. The advantage is that the CString class will perform any required conversions for you automatically. No need to worry about Unicode settings.
LPCTSTR pszName = strName.GetString();
Years ago, I used to do some basic programming in C. Now I am attempting to relearn what I have forgotten as well as learn Visual C++. I am confused though by all the string options and now the extra layer of trying to make my programs Unicode compatible. I have been reading Beginning Visual C++ 2010 as well as online reading to learn this information.
As an exercise I am writing a very basic program that asks a user to input some text and then display that text in the form of a messagebox. The program works, but my way of getting it to work was more through guesswork and looking at other examples than truly understanding why I need to convert the various strings into different types.
The code is:
#include "stdafx.h"
#include <iostream>
#include <string>
#include "Windows.h"
using std::wcin;
using std::wcout;
using std::wstring;
int _tmain(int argc, _TCHAR* argv[])
{
wstring myInput;
wcout << "Enter a string: ";
getline(wcin, myInput);
MessageBoxW(NULL, myInput.c_str(), _T("Test MessageBox"), 64);
return 0;
}
The MessageBox syntax is:
int WINAPI MessageBox(
__in_opt HWND hWnd,
__in_opt LPCTSTR lpText,
__in_opt LPCTSTR lpCaption,
__in UINT uType
);
On the other hand, if I just use the command line argument as the text of the messagebox, I do not need to convert the string at all and I am not sure why.
#include "stdafx.h"
#include <iostream>
#include <string>
#include "Windows.h"
using std::wcout;
int _tmain(int argc, _TCHAR* argv[])
{
MessageBoxW(NULL, argv[1], _T("Test MessageBox"), 64);
return 0;
}
My confusion is:
Why do I need to use the c_str() for argument 2 to MessageBoxW and why do I need to use the _T() macro (?) in argument 3?
Why did the program work in the second code example without doing some sort of conversion?
What exactly does LPCTSTR mean? I see another variant in MSDN functions called LPTSTR.
Thanks!
1) .c_str() is a standard C++ method for getting a C string out of a C++ string. _tmain, _T('x'), _T("text") and _TCHAR are (somewhat ugly) Microsoft macros that make your program compile either in Unicode or non-Unicode mode. There is a global setting in the project options that sets these macros to configure your project in one of the two modes.
If you are in non-unicode mode (referred to as ANSI mode in MS's documentation) the macros expand to:
main, 'x', "text", char
If you are in unicode mode, the macros expand to
wmain, L'x', L"text", wchar_t
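Simplified, the relevant pieces of tchar.h look roughly like this (a sketch, not the literal header contents):

#ifdef _UNICODE
typedef wchar_t _TCHAR;
#define __T(x) L##x
#else
typedef char _TCHAR;
#define __T(x) x
#endif
#define _T(x) __T(x)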
2) and 3) Windows headers are full of typedefs and macros like that. Sometimes they make code more obscure than it needs to be. In general, LP means pointer ("long pointer", I guess, but it's been a while since we needed to distinguish between near and far pointers), C means "const", T means it will be either char or wchar_t depending on project settings, and STR is obviously "string". After all, it's a plain C type, which is why you can pass C strings to these functions without conversion.
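Decoded, LPCTSTR and LPTSTR therefore amount to something like this (again a simplified sketch; the real WinNT.h builds them out of LPCWSTR/LPCSTR):

#ifdef UNICODE
typedef const wchar_t *LPCTSTR; // the "C" means const
typedef wchar_t       *LPTSTR;  // no "C": writable string
#else
typedef const char    *LPCTSTR;
typedef char          *LPTSTR;
#endif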
The MessageBoxW function expects a C-style wide-character string (WCHAR*). The _T() macro (or the L prefix) alters your string literal so that it's Unicode compatible (WCHAR* instead of char*).
argv[] doesn't hold string objects; with _tmain in a Unicode build you're already getting WCHAR* pointers out of it.
LPCTSTR is basically a WINAPI typedef for const char * or const WCHAR*, depending on whether you are building as UNICODE. Also see this post: LPCSTR, LPCTSTR and LPTSTR
In short, your main function is being passed WCHAR* strings and MessageBoxW expects WCHAR* strings.
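If you don't need the dual-mode TCHAR machinery at all, you can write the whole program in terms of wide strings (a minimal sketch, assuming a Unicode build and MSVC's wmain entry point; MB_ICONINFORMATION is the named constant for the 64 used in the question):

#include <windows.h>
#include <iostream>
#include <string>

int wmain(int argc, wchar_t* argv[])
{
    std::wstring myInput;
    std::wcout << L"Enter a string: ";
    std::getline(std::wcin, myInput);
    // c_str() yields the const wchar_t* (i.e. LPCWSTR) that
    // MessageBoxW expects for both the text and the caption.
    MessageBoxW(NULL, myInput.c_str(), L"Test MessageBox", MB_ICONINFORMATION);
    return 0;
}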