Unicode literals in Visual C++ - visual-c++

Consider the following code:
#include <string>
#include <fstream>
#include <iomanip>

int main() {
    std::string s = "\xe2\x82\xac\u20ac";
    std::ofstream out("test.txt");
    out << s.length() << ":" << s << std::endl;
    out << std::endl;
    out.close();
}
Under GCC 4.8 on Linux (Ubuntu 14.04), the file test.txt contains this:
6:€€
Under Visual C++ 2013 on Windows, it contains this:
4:€\x80
(By '\x80' I mean the single 8-bit character 0x80).
I've been completely unable to get either compiler to output a € character using std::wstring.
Two questions:
What exactly does the Microsoft compiler think it's doing with the char* literal? It's obviously encoding it somehow, but it's not clear how.
What is the right way to rewrite the above code using std::wstring and std::wofstream so that it outputs two € characters?

This is because you are using \u20ac, which is a Unicode character literal, in a narrow (char) string.
MSVC encodes "\xe2\x82\xac\u20ac" as 0xe2, 0x82, 0xac, 0x80, which is 4 narrow characters. It essentially encodes \u20ac as 0x80 because it maps the euro character to the Windows-1252 codepage, where the euro sign is 0x80.
GCC converts the Unicode literal \u20ac to the 3-byte UTF-8 sequence 0xe2, 0x82, 0xac, so the resulting string ends up as 0xe2, 0x82, 0xac, 0xe2, 0x82, 0xac.
If you use std::wstring s = L"\xe2\x82\xac\u20ac", MSVC encodes it as 0xe2, 0x00, 0x82, 0x00, 0xac, 0x00, 0xac, 0x20, which is 4 wide characters, but since you are mixing hand-written UTF-8 bytes with a UTF-16 character, the resulting string doesn't make much sense. If you use std::wstring s = L"\u20ac\u20ac" you get 2 Unicode characters in a wide string, as you'd expect.
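To see this concretely, here is a minimal sketch (not part of the original question) that dumps each byte of the narrow string in hex; under MSVC it should print 4 bytes ending in 0x80, under GCC the 6-byte double UTF-8 sequence:
#include <iostream>
#include <iomanip>
#include <string>

int main() {
    std::string s = "\xe2\x82\xac\u20ac";
    // Print every byte as two hex digits.
    // Expected: MSVC -> e2 82 ac 80 (4 bytes), GCC -> e2 82 ac e2 82 ac (6 bytes).
    for (std::string::size_type i = 0; i < s.length(); ++i)
        std::cout << std::hex << std::setw(2) << std::setfill('0')
                  << (static_cast<unsigned>(s[i]) & 0xffu) << ' ';
    std::cout << std::dec << "(length = " << s.length() << ")" << std::endl;
}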
The next problem is that MSVC's ofstream and wofstream always write using the narrow ANSI encoding. To get them to write UTF-8 you should use <codecvt> (VS 2010 or later):
#include <string>
#include <fstream>
#include <iomanip>
#include <codecvt>

int main()
{
    std::wstring s = L"\u20ac\u20ac";
    std::wofstream out("test.txt");
    std::locale loc(std::locale::classic(), new std::codecvt_utf8<wchar_t>);
    out.imbue(loc);
    out << s.length() << L":" << s << std::endl;
    out << std::endl;
    out.close();
}
and to write UTF-16 (or more specifically UTF-16LE):
#include <string>
#include <fstream>
#include <iomanip>
#include <codecvt>

int main()
{
    std::wstring s = L"\u20ac\u20ac";
    std::wofstream out("test.txt", std::ios::binary);
    std::locale loc(std::locale::classic(), new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>);
    out.imbue(loc);
    out << s.length() << L":" << s << L"\r\n";
    out << L"\r\n";
    out.close();
}
Note: with UTF-16 you have to use binary mode rather than text mode to avoid corruption (in text mode any 0x0A byte inside the UTF-16 data would be expanded to 0x0D 0x0A), so we can't use std::endl and have to write L"\r\n" ourselves to get the correct end-of-line behavior in the text file.
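If you also want the file to start with a byte-order mark so that editors recognize it as UTF-16LE, codecvt_utf16 accepts the std::generate_header mode flag; a hedged variation of the locale line above (only the facet's mode argument changes):
// Same as the UTF-16 example, but the facet also writes a UTF-16LE BOM (0xFF 0xFE)
// at the start of the file.
std::locale loc(std::locale::classic(),
    new std::codecvt_utf16<wchar_t, 0x10ffff,
        std::codecvt_mode(std::little_endian | std::generate_header)>);
out.imbue(loc);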

Related

Initializing c++ string with integer and chars

#include <iostream>
#include <string>
using namespace std;

int main() {
    string str {5, 'c'};
    cout << str; // "\005c"
}
Output: c
Inspecting with gdb confirms that str contains "\005c":
str[0] = '\005'
str[1] = 'c'
Why is str[0] not printed to the console?
C++ version used: C++11
ASCII 5 is ENQ, a control code intended to trigger a response at the receiving end. It is not visible on the console.
Reference: http://ascii.cl/
For example: try 65 instead of 5, you will see 'A'.
The ASCII code 5 is non-printable (see an ASCII table). The character '5', whose ASCII code is 53, is printable:
string str {53, 'c'};
cout << str; // 5c
So you can write it this way:
#include <iostream>
#include <string>
using namespace std;

int main() {
    string str {53, 'c'};
    cout << str; // "5c"
}
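If the goal is simply a string that prints as 5c, building it from the character '5' (or from std::to_string, C++11) avoids hard-coding ASCII values, and std::isprint shows why the original version looked empty; a small sketch along those lines:
#include <iostream>
#include <string>
#include <cctype>
using namespace std;

int main() {
    string str = to_string(5) + 'c';   // "5c" without looking up ASCII codes
    cout << str << "\n";

    // isprint() reports whether a character is visible at all:
    cout << boolalpha
         << static_cast<bool>(isprint('\005')) << "\n"  // false: ENQ is a control code
         << static_cast<bool>(isprint('5')) << "\n";    // true
}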

Listing the Files in a Directory in Visual C++

I have tried to "simplify" a nice piece of example code (the hyperlink to the code is at the end of this message) so that the directory string is specified in the source instead of being passed as a command-line argument. The simplified code compiles and executes, but the file name and size are not what I expect: the file name appears to be a hex number, and nFileSize.High is larger than nFileSize.Low (the actual file sizes range from 0 to 100 MB). I think my type casting may have introduced errors. Any suggestions?
#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include <bitset>
#include <sstream>
#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <strsafe.h>
#pragma comment(lib, "User32.lib")
using namespace std;
using namespace System; //set common language runtime support to /clr
int main()
{
WIN32_FIND_DATA ffd;
LARGE_INTEGER filesize;
//TCHAR szDir[MAX_PATH];
//size_t length_of_arg;
HANDLE hFind = INVALID_HANDLE_VALUE;
//DWORD dwError=0;
finstr = "C:\\Users\\MyName\\Documents\\Visual Studio 2010\\Projects\\Data Analysis\\Data Folder\\*";
//Just evaluate the first file before looping over all files
hFind = FindFirstFile((wchar_t*)(finstr.c_str()), &ffd);
wstring wsfname(ffd.cFileName);
string newtemp(wsfname.begin(), wsfname.end());
cout << "1st fname = " << ffd.cFileName << " newtemp = "<< newtemp << " nFSizeLo = "<< ffd.nFileSizeLow << " nFSizeHi = "<< ffd.nFileSizeHigh << "\n";
FindClose(hFind);
return 0;
}
Link to original example from Microsoft
https://msdn.microsoft.com/en-us/library/windows/desktop/aa365200(d=printer,v=vs.85).aspx
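No answer is included here, but the symptoms are consistent with the (wchar_t*) cast: in a Unicode build FindFirstFile is FindFirstFileW and expects a real wide string, and reinterpreting the bytes of a narrow std::string as wchar_t* hands it garbage, which also leaves ffd with meaningless contents. A hedged sketch of how the simplified program might look with std::wstring and the two 32-bit size halves combined through LARGE_INTEGER (path shortened, variable names mine):
#include <windows.h>
#include <iostream>
#include <string>

int main()
{
    WIN32_FIND_DATAW ffd;
    std::wstring finstr = L"C:\\Data Folder\\*";   // hypothetical, shortened path

    HANDLE hFind = FindFirstFileW(finstr.c_str(), &ffd);
    if (hFind == INVALID_HANDLE_VALUE)
        return 1;                                  // nothing found or bad path

    // Combine the two 32-bit halves into a single 64-bit file size.
    LARGE_INTEGER filesize;
    filesize.LowPart  = ffd.nFileSizeLow;
    filesize.HighPart = ffd.nFileSizeHigh;

    std::wcout << L"1st fname = " << ffd.cFileName
               << L" size = " << filesize.QuadPart << L"\n";

    FindClose(hFind);
    return 0;
}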

atoi on a character array with lots of integers

I have code in which a character array is populated with integers (converted to char arrays) and read by another function which converts them back to integers. I have used the following code to do the conversion to the char array:
char data[64];
int a = 10;
std::string str = boost::lexical_cast<std::string>(a);
memcpy(data + 8*k,str.c_str(),sizeof(str.c_str())); //k varies from 0 to 7
and the reconversion back to characters is done using:
char temp[8];
memcpy(temp,data+8*k,8);
int a = atoi(temp);
This works fine in general, but when I try to do it as part of a project involving Qt (ver 4.7), it compiles fine but gives me segmentation faults when it tries to read using memcpy(). Note that the segmentation fault happens only in the reading loop, not while writing the data. I don't know why this happens, but I want to get this done by any method.
So, are there any other functions I can use which take the character array and the start and end positions and convert that range into an integer? Then I wouldn't have to use memcpy() at all. What I am trying to do is something like this:
new_atoi(data, 8*k, 8*(k+1)); // k varies from 0 to 7
Thanks in advance.
You are copying only 4 characters (dependent on your system's pointer width). This leaves numbers of 4+ digits non-null-terminated, leading to runaway strings in the input to atoi:
sizeof(str.c_str()) // i.e. sizeof(char*) = 4 (32-bit systems)
should be
str.length() + 1
or the characters will not be null-terminated.
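As for a helper of the kind the question sketches with new_atoi: a hypothetical version (the name and signature come from the question, not from any library) can copy the slice into a std::string, which is always null-terminated, and then call atoi on it:
#include <cstdlib>
#include <string>

// Hypothetical helper: convert the bytes data[first, last) to an int.
// Assumes the slice holds one number (digits, optional sign, trailing padding).
int new_atoi(const char* data, std::size_t first, std::size_t last)
{
    std::string slice(data + first, data + last);  // the copy is null-terminated
    return std::atoi(slice.c_str());
}

// usage, matching the question: int a = new_atoi(data, 8*k, 8*(k+1));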
STL Only:
make_testdata(): see all the way down
Why don't you use streams...?
#include <sstream>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

int main()
{
    std::vector<int> data = make_testdata();

    std::ostringstream oss;
    std::copy(data.begin(), data.end(), std::ostream_iterator<int>(oss, "\t"));

    std::stringstream iss(oss.str());
    std::vector<int> clone;
    std::copy(std::istream_iterator<int>(iss), std::istream_iterator<int>(),
              std::back_inserter(clone));

    //verify that clone now contains the original random data:
    //bool ok = std::equal(data.begin(), data.end(), clone.begin());

    return 0;
}
You could do it a lot faster in plain C with atoi/itoa and some tweaks, but I reckon you should be using binary transmission (see Boost Spirit Karma and protobuf for good libraries) if you need the speed.
Boost Karma/Qi:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

namespace qi=::boost::spirit::qi;
namespace karma=::boost::spirit::karma;

static const char delimiter = '\0';

int main()
{
    std::vector<int> data = make_testdata();

    std::string astext;
    // astext.reserve(3 * sizeof(data[0]) * data.size()); // heuristic pre-alloc
    std::back_insert_iterator<std::string> out(astext);

    {
        using namespace karma;
        generate(out, delimit(delimiter) [ *int_ ], data);
        // generate_delimited(out, *int_, delimiter, data); // equivalent
        // generate(out, int_ % delimiter, data); // somehow much slower!
    }

    std::string::const_iterator begin(astext.begin()), end(astext.end());
    std::vector<int> clone;
    qi::parse(begin, end, qi::int_ % delimiter, clone);

    //verify that clone now contains the original random data:
    //bool ok = std::equal(data.begin(), data.end(), clone.begin());

    return 0;
}
If you wanted to do architecture independent binary serialization instead, you'd use this tiny adaptation making things a zillion times faster (see benchmark below...):
karma::generate(out, *karma::big_dword, data);
// ...
qi::parse(begin, end, *qi::big_dword, clone);
Boost Serialization
The best performance can be reached when using Boost Serialization in binary mode:
#include <sstream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

int main()
{
    std::vector<int> data = make_testdata();

    std::stringstream ss;
    {
        boost::archive::binary_oarchive oa(ss);
        oa << data;
    }

    std::vector<int> clone;
    {
        boost::archive::binary_iarchive ia(ss);
        ia >> clone;
    }

    //verify that clone now contains the original random data:
    //bool ok = std::equal(data.begin(), data.end(), clone.begin());

    return 0;
}
Testdata
(common to all versions above)
#include <boost/random.hpp>

// generates a deterministic pseudo-random vector of 32Mio ints
std::vector<int> make_testdata()
{
    std::vector<int> testdata;
    testdata.resize(2 << 24);
    std::generate(testdata.begin(), testdata.end(), boost::mt19937(0));
    return testdata;
}
Benchmarks
I benchmarked it by
using input data of 2<<24 (33554432) random integers
not displaying the output (we don't want to measure the scrolling performance of our terminal)
The rough timings were:
the STL-only version isn't too bad actually, at 12.6s
the Karma/Qi text version ran in 5.1s (down from 18s thanks to Arlen's hint at generate_delimited :))
the Karma/Qi binary version (big_dword) took only 1.4s (roughly 3-4x as fast)
Boost Serialization takes the cake with around 0.8s (or around 13s when substituting text archives for binary ones)
There is absolutely no reason for the Karma/Qi text version to be any slower than the STL version. I improved @sehe's implementation of the Karma/Qi text version to reflect that claim.
The following Boost Karma/Qi text version is more than twice as fast as the STL version:
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>
#include <boost/random.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/spirit/include/phoenix_stl.hpp>

namespace ascii = boost::spirit::ascii;
namespace qi = boost::spirit::qi;
namespace karma = boost::spirit::karma;
namespace phoenix = boost::phoenix;

template <typename OutputIterator>
void generate_numbers(OutputIterator& sink, const std::vector<int>& v){
    using karma::int_;
    using karma::generate_delimited;
    using ascii::space;

    generate_delimited(sink, *int_, space, v);
}

template <typename Iterator>
void parse_numbers(Iterator first, Iterator last, std::vector<int>& v){
    using qi::int_;
    using qi::phrase_parse;
    using ascii::space;
    using qi::_1;
    using phoenix::push_back;
    using phoenix::ref;

    phrase_parse(first, last, *int_[push_back(ref(v), _1)], space);
}

int main(int argc, char* argv[]){
    static boost::mt19937 rng(0); // make test deterministic
    std::vector<int> data;
    data.resize(2 << 24);
    std::generate(data.begin(), data.end(), rng);

    std::string astext;
    std::back_insert_iterator<std::string> out(astext);
    generate_numbers(out, data);
    //std::cout << astext << std::endl;

    std::string::const_iterator begin(astext.begin()), end(astext.end());
    std::vector<int> clone;
    parse_numbers(begin, end, clone);

    //verify that clone now contains the original random data:
    //std::copy(clone.begin(), clone.end(), std::ostream_iterator<int>(std::cout, ","));

    return 0;
}

cant convert parameter from char[#] to LPWSTR

When I compile this code in Visual C++, I get the error below. Can anyone help me solve this issue?
DWORD nBufferLength = MAX_PATH;
char szCurrentDirectory[MAX_PATH + 1];
GetCurrentDirectory(nBufferLength, szCurrentDirectory);
szCurrentDirectory[MAX_PATH +1 ] = '\0';
Error message:
Error 5 error C2664: 'GetCurrentDirectoryW' : cannot convert parameter 2 from 'char [261]' to 'LPWSTR' c:\car.cpp
Your program is configured to be compiled as Unicode. That's why GetCurrentDirectory is GetCurrentDirectoryW, which expects an LPWSTR (wchar_t*).
GetCurrentDirectoryW expects a wchar_t array instead of a char array. You can handle this by using TCHAR, which, like GetCurrentDirectory, depends on the Unicode setting and always represents the appropriate character type.
Don't forget to prepend your '\0' with an L in order to make the character literal wide, too!
It seems you have defined the UNICODE and _UNICODE compiler flags. In that case, you need to change the type of szCurrentDirectory from char to TCHAR.
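A small sketch of what the TCHAR-based version might look like (buffer name kept from the question; whether the call resolves to the W or A API then follows the project's Unicode setting automatically):
#include <windows.h>
#include <tchar.h>

int main()
{
    // TCHAR is wchar_t in a Unicode build and char otherwise,
    // so this compiles either way.
    TCHAR szCurrentDirectory[MAX_PATH + 1];
    DWORD len = GetCurrentDirectory(MAX_PATH + 1, szCurrentDirectory);
    if (len == 0 || len > MAX_PATH)
        return 1;                         // call failed or buffer too small
    szCurrentDirectory[len] = _T('\0');   // already terminated by the API; shown for clarity
    return 0;
}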
Headers:
#include <iostream>
#include <fstream>
#include <direct.h>
#include <string.h>
#include <windows.h> //not sure
Function to get current directory:
std::string getCurrentDirectoryOnWindows()
{
    const unsigned long maxDir = 260;
    wchar_t currentDir[maxDir];
    GetCurrentDirectory(maxDir, currentDir);
    std::wstring ws(currentDir);
    std::string current_dir(ws.begin(), ws.end());
    return std::string(current_dir);
}
To call function:
std::string path = getCurrentDirectoryOnWindows(); //Output like: C:\Users\NameUser\Documents\Programming\MFC Program 5
To make dir (Folder) in current directory:
std::string FolderName = "NewFolder";
std::string Dir1 = getCurrentDirectoryOnWindows() + "\\" + FolderName;
_mkdir(Dir1.c_str());
This works for me in MFC C++.

How to handle strings in VC++?

Hi,
Once we accept input from the keyboard, how can we add that character to a string in VC++?
Can anyone help me with this?
You can use std::string from STL, and the + or += operator.
To do this, #include <string> and use the class std::string.
After that, there are various ways to store the input from the user.
First, you may store the character input directly into the string:
std::string myStr;
std::cin >> myStr;
Second, you can read the input and append it to an existing string:
std::string myStr;
std::cin >> myStr;
myOtherStr += myStr;
#include <iostream>
#include <string>

int main(int argc, char** argv)
{
    std::string s;
    std::cin >> s;
    s += " ok";
    std::cout << s;
    return 0;
}
Try the following:
std::string inputStr;
std::cin >> inputStr;
This code will accept a string typed on the keyboard and store it into inputStr.
My guess is that you're in the process of learning C++. If so, my suggestion is to continue reading your C++ book. Keyboard input will surely be addressed in some upcoming chapter or the other.
There are several ways to go about it, depending on what exactly you need. Check the C++ I/O tutorial at this site: www.cplusplus.com
