Getting hexadecimal character escape while printing string in Xcode

I'm trying to run this C++ code in Xcode 8.1:
std::string str = "g[+g]g[-g]g[−g[+g]g]g[+g]g";
for (auto& c : str) {
    printf("%c", c);
}
and I'm getting this as output:
g[+g]g[-g]g[\342\210\222g[+g]g]g[+g]g
Does anyone know why some characters are printed as escaped byte values?
I already tried printing it with c_str().
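What is probably happening (my note, not part of the original question): the third minus sign in the literal is not the ASCII '-' but U+2212 MINUS SIGN, whose UTF-8 encoding is the three bytes 0xE2 0x88 0x92, i.e. exactly the octal escapes \342\210\222 the console shows. A minimal sketch that makes the multi-byte character visible by dumping non-ASCII bytes in hex:

#include <cstdio>
#include <string>

int main() {
    // The third minus below is U+2212 (MINUS SIGN), not the ASCII hyphen-minus.
    std::string str = "g[+g]g[-g]g[−g[+g]g]g[+g]g";
    for (unsigned char c : str) {
        if (c < 0x80)
            printf("%c", c);        // plain ASCII byte
        else
            printf("\\x%02X", c);   // one byte of a multi-byte UTF-8 sequence
    }
    printf("\n");
    return 0;
}

This should print g[+g]g[-g]g[\xE2\x88\x92g[+g]g]g[+g]g, confirming that only that one character is multi-byte.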

Related

Rust ncurses: garbled Chinese characters and setlocale doesn't work

I am trying to print Chinese characters on an ncurses screen:
log::debug!("row str {}", row_str);
ncurses::addstr(&row_str);
The variable row_str is displayed correctly as a parameter of log::debug, but gets garbled by ncurses::addstr, like this:
中彖~G潛®弾U // it should be "中文目录"
I've tried to fix it with the following three methods, but none of them works.
// method 1
gettextrs::setlocale(gettextrs::LocaleCategory::LcAll, "");
// method 2 (note: the original passed "".as_bytes().as_ptr(), which is not
// NUL-terminated, and category 0 is LC_CTYPE rather than LC_ALL on glibc;
// C's setlocale expects a NUL-terminated string)
use libc::setlocale;
unsafe {
    setlocale(libc::LC_ALL, b"\0".as_ptr() as *const libc::c_char);
}
// method 3
ncurses::setlocale(ncurses::constants::LcCategory::all, "")
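For reference (my addition, not part of the original post): wide-character output in ncurses normally requires calling setlocale(LC_ALL, "") before initscr() and linking against the wide build (ncursesw). The classic C pattern looks like this; the same call ordering applies to the Rust bindings:

#include <locale.h>
#include <ncurses.h>

int main(void) {
    // Must run before initscr(), and the program must link against ncursesw;
    // otherwise multi-byte UTF-8 strings are drawn byte by byte and garble.
    setlocale(LC_ALL, "");
    initscr();
    addstr("中文目录\n");   // UTF-8 bytes, decoded according to the locale
    refresh();
    getch();
    endwin();
    return 0;
}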

I need to count every word, line, and character of a given string

As you can see, I'm trying to get the word count, character count and line count, but the code below is not working.
#include <iostream>
using namespace std;

int main()
{
    char ch;
    int wc=1, lc=1, cc=0;
    while ((ch=cin.get())!='*')
    {
        cc++;
        if (ch==' ')
        {
            wc++;
        }
        else if (ch=='\n')
        {
            wc++;
            lc++;
        }
    }
    cout<<"\n the number of character=="<<cc;
    cout<<"\n the number of words=="<<wc;
    cout<<"\n the number of lines=="<<lc;
    return 0;
}
I entered your code and compiled it with g++. It works without any problems. Can you post the error you get, or did it compile?
Maybe your Visual C++ compiler is not working right; the code itself should work.
Edit: Below is a different version of the above code, where empty input is treated as zero words and EOF is also a break condition of the loop.
The EOF key depends on your system: on Windows it is Ctrl+Z, on Linux it is usually Ctrl+D.
The input text might have multiple spaces between words. Punctuation characters and digits (0-9) are treated as word delimiters as far as possible. Underscores, backticks, tildes and apostrophes (as in "don't") are handled as part of a word.
Curly brackets are handled as part of a word to keep the code simple, but parentheses are delimiters.
#include <iostream>
using namespace std;

int main()
{
    int ch, wc=0, lc=1, cc=0, old=0;
    cout<<"Enter your text, exit with '*':\n";
    while ((ch=cin.get())!='*' && ch!=EOF)
    {
        cc++;
        // ASCII trick: every delimiter (space, newline, digits, most
        // punctuation) has a code <= '?' (63); letters and _ ` ~ { } lie
        // above it. The apostrophe is excluded so "don't" stays one word.
        if (old<='?' && old!='\'')          // previous char was a delimiter
            wc += !(ch<='?' && ch!='\'');   // new word starts on a word char
        lc += ((old=ch)=='\n');             // remember ch, count newlines
    }
    cout<<"\nthe number of character=="<<cc
        <<"\nthe number of words=="<<wc
        <<"\nthe number of lines=="<<lc<<"\n";
    return 0;
}
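For example (my illustration): feeding the program the two lines "don't stop" and "hello world*" should report 22 characters, 4 words and 2 lines: the apostrophe keeps "don't" a single word, and the newline between the two lines bumps the line count.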

Unicode to char* c++ 11

I want to know if there is any way to convert a Unicode code point to a string or char in C++11.
I've been trying with the extended Latin letter Á (as an example), which has this encoding:
letter: Á
Unicode: 0x00C1
UTF8 literal: \xc3\x81
I've been able to do so if it's hardcoded as:
const char* c = u8"\u00C1";
But if I get the code point as a short, how can I do the equivalent to obtain the char* or std::string "Á"?
EDIT, SOLUTION:
I was finally able to do it; here is the solution if anyone needs it:
#include <codecvt>
#include <locale>
#include <string>

std::wstring ws;
for (short input : inputList)   // inputList holds the code points as shorts
{
    wchar_t wc(input);
    ws += wc;
}
std::wstring_convert<std::codecvt_utf8<wchar_t>> cv;
std::string str = cv.to_bytes(ws);   // str now holds the UTF-8 bytes
Thanks for the comments, they were very helpful.
The C++11 standard contains codecvt_utf8, which converts between some internal character type (try char16_t if your compiler has it, otherwise wchar_t) and UTF-8 encoding.
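A minimal sketch of that approach (my addition; std::wstring_convert and codecvt_utf8 are standard in C++11, though deprecated since C++17):

#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

int main() {
    short input = 0x00C1;   // code point for Á, as if received at run time
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    std::string utf8 = conv.to_bytes(static_cast<char16_t>(input));
    std::cout << utf8 << "\n";   // prints Á on a UTF-8 terminal
    return 0;
}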
The problem is that char is only one byte long, while a Unicode code unit for this character takes two bytes.
You can still treat the buffer as a char*, but you must remember that you are not dealing with an ASCII string (there will be embedded zero bytes).
You may have to switch to wchar_t.

Converting wchar_t* to char* on iOS

I'm attempting to convert a wchar_t* to a char*. Here's my code:
// str is the source wchar_t*; returned is the destination char buffer;
// length is the number of characters it should receive
size_t result = wcstombs(returned, str, length + 1);
if (result == (size_t)-1) {
    int error = errno;
}
It indeed fails, and error is filled with 92 (ENOPROTOOPT) - Protocol not available.
I've even tried setting the locale:
setlocale(LC_ALL, "C");
And this one too:
setlocale(LC_ALL, "");
I'm tempted to just throw the characters with static casts!
It seems the issue was that the source string used a non-standard encoding (two ASCII characters for each wide character), which looked fine in the debugger but was clearly corrupt internally. The error code produced is not documented anywhere, but it amounts to simply not being able to decode that piece of text.
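For reference (my addition): once the locale is set and the source really is a well-formed wide string, the conversion itself is straightforward. A minimal sketch:

#include <clocale>
#include <cstdio>
#include <cstdlib>
#include <cwchar>

int main() {
    // Pick a locale whose narrow encoding can represent the input;
    // "" selects the environment default, which is UTF-8 on iOS/macOS.
    setlocale(LC_ALL, "");
    const wchar_t *str = L"Ünïcødé";
    char returned[64];
    size_t result = wcstombs(returned, str, sizeof returned);
    if (result == (size_t)-1) {
        perror("wcstombs");
        return 1;
    }
    printf("%s (%zu bytes)\n", returned, result);
    return 0;
}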

C++ char* returned by SWIG hits a problem in Python 3.0

Our C++ lib works fine with Python 2.4 using SWIG, returning a C++ char* back to a Python str. But this solution hits a problem in Python 3.0; the error is:
Exception=(, UnicodeDecodeError('utf8', b"\xb6\x9d\xa.....",0, 1, 'unexpected code byte')
Our definition looks like this (it works fine in Python 2.4):
void cGetPubModulus(
    void* pSslRsa,
    char* cMod,
    int* nLen );
%include "cstring.i"
%cstring_output_withsize( char* cMod, int* nLen );
I suspect SWIG is doing a bytes->str conversion automatically. In Python 2.4 it can be implicit, but in Python 3.0 it's no longer allowed. Anyone got a good idea? Thanks.
It's rather Python 3 that does that conversion. In Python 2, bytes and str are the same thing; in Python 3, str is Unicode, so something somewhere tries to convert your data to Unicode as UTF-8, but it isn't UTF-8.
Your Python 3 code needs to return not a Python str, but a Python bytes. This will not work with Python 2, though, so you need preprocessor statements to handle the differences.
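A sketch of what such a version switch could look like in a hand-written SWIG typemap (my illustration: the typemap below is hypothetical; only cMod and nLen are taken from the question):

%typemap(argout) (char* cMod, int* nLen) {
    // Hand back raw bytes under Python 3, a plain str under Python 2.
#if PY_MAJOR_VERSION >= 3
    %append_output(PyBytes_FromStringAndSize($1, *$2));
#else
    %append_output(PyString_FromStringAndSize($1, *$2));
#endif
}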
I came across a similar problem. I wrote a SWIG typemap for a custom char array (an unsigned char, in fact) and it segfaulted when using Python 3. So I debugged the code within the typemap and realized the problem Lennart states.
My solution was to do the following in that typemap:
%typemap(in) byte_t[MAX_FONTFACE_LEN] {
    if (PyString_Check($input))
    {
        $1 = (byte_t *)PyString_AsString($input);
    }
    else if (PyUnicode_Check($input))
    {
        $1 = (byte_t *)PyUnicode_AsEncodedString($input, "utf-8", "Error ~");
        $1 = (byte_t *)PyBytes_AS_STRING($1);
    }
    else
    {
        PyErr_SetString(PyExc_TypeError,"Expected a string.");
        return NULL;
    }
}
That is, I check what kind of string object the PyObject is. PyString_Check() and PyUnicode_Check() return nonzero if the input is a byte string or a Unicode string, respectively. If it is a Unicode string, PyUnicode_AsEncodedString() converts it to a bytes object, and the PyBytes_AS_STRING() call then yields the underlying char*.
Note that I reuse the same variable for storing the Unicode string and later converting it to bytes. Despite being questionable (and maybe worth another coding-style discussion), the fact is that it solved my problem. I have tested it with Python 3 and Python 2.7 binaries without any problems so far.
Lastly, the final PyErr_SetString line raises an exception on the Python side to report that the input was neither a byte string nor a Unicode string.
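For completeness (my addition, not from the original answer): PyUnicode_AsEncodedString() returns a new PyObject reference, so a more careful variant keeps that reference in its own variable instead of funnelling it through the byte_t* slot. A sketch, assuming the same byte_t and MAX_FONTFACE_LEN as above:

%typemap(in) byte_t[MAX_FONTFACE_LEN] (PyObject *bytes = NULL) {
    if (PyBytes_Check($input)) {
        $1 = (byte_t *)PyBytes_AsString($input);
    } else if (PyUnicode_Check($input)) {
        bytes = PyUnicode_AsEncodedString($input, "utf-8", "strict");
        if (!bytes) SWIG_fail;   // encoding failed: propagate the Python error
        $1 = (byte_t *)PyBytes_AS_STRING(bytes);
    } else {
        PyErr_SetString(PyExc_TypeError, "Expected a string.");
        SWIG_fail;
    }
}

In real code a matching %typemap(freearg) should Py_XDECREF the bytes object once the wrapped call returns.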
