C++ char* returned by SWIG breaks in Python 3.0

Our C++ lib works fine with Python 2.4 using SWIG, returning a C++ char* back as a Python str. But this solution hits a problem in Python 3.0; the error is:
Exception=(, UnicodeDecodeError('utf8', b"\xb6\x9d\xa.....", 0, 1, 'unexpected code byte')
Our definition is like this (working fine in Python 2.4):
void cGetPubModulus(
    void* pSslRsa,
    char* cMod,
    int*  nLen );

%include "cstring.i"
%cstring_output_withsize( char* cMod, int* nLen );
I suspect SWIG is doing a bytes->str conversion automatically. In Python 2.4 it can be implicit, but in Python 3.0 it's no longer allowed. Anyone got a good idea? Thanks.

It's rather Python 3 that does that conversion. In Python 2, bytes and str are the same thing; in Python 3, str is Unicode, so something somewhere tries to decode the buffer as UTF-8, but it isn't UTF-8.
Your Python 3 code needs to return not a Python str but a Python bytes. This will not work with Python 2, though, so you need preprocessor statements to handle the differences.
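For instance, a minimal sketch of such a typemap, assuming the cMod/nLen signature from the question (SWIG_Python_AppendOutput is the helper SWIG's own cstring.i macros use internally; treat the exact name as an assumption for your SWIG version):

%include "cstring.i"

/* Sketch only: override the output typemap so the buffer is returned as
   Python bytes under Python 3 and as str under Python 2. */
%typemap(argout) (char* cMod, int* nLen) {
%#if PY_VERSION_HEX >= 0x03000000
    $result = SWIG_Python_AppendOutput($result,
                  PyBytes_FromStringAndSize($1, *$2));
%#else
    $result = SWIG_Python_AppendOutput($result,
                  PyString_FromStringAndSize($1, *$2));
%#endif
}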

I came across a similar problem. I wrote a SWIG typemap for a custom char array (an unsigned char, in fact) and it segfaulted when using Python 3. So I debugged the code within the typemap and realized the problem is what Lennart states. My solution was to do the following in that typemap:
%typemap(in) byte_t[MAX_FONTFACE_LEN] {
    if (PyString_Check($input))
    {
        /* Python 2 byte string: use its internal buffer directly */
        $1 = (byte_t *)PyString_AsString($input);
    }
    else if (PyUnicode_Check($input))
    {
        /* Python 3 unicode: encode to UTF-8 bytes, then take the buffer */
        $1 = (byte_t *)PyUnicode_AsEncodedString($input, "utf-8", "Error ~");
        $1 = (byte_t *)PyBytes_AS_STRING($1);
    }
    else
    {
        PyErr_SetString(PyExc_TypeError, "Expected a string.");
        return NULL;
    }
}
That is, I check what kind of string the PyObject is. The functions PyString_Check() and PyUnicode_Check() return nonzero if their input is a byte string or a Unicode string, respectively. If it's a Unicode string, we encode it to bytes with the call to PyUnicode_AsEncodedString() and later convert those bytes to a char* with the call to PyBytes_AS_STRING().
Note that I reuse the same variable for storing the Unicode string and converting it later to bytes. Questionable as that is, and although it could start another coding-style discussion, the fact is that it solved my problem. I have tested it with Python 3 and Python 2.7 binaries without any problems yet.
Lastly, the else branch raises an exception in the Python caller to report that the input wasn't a string, neither bytes nor Unicode.

Related

How to pass string/integer values from JS to C++?

I am using a Node.js C++ addon in my Node.js project. JS calls a method defined in C++ with a string as a parameter, but I couldn't get the string in C++. Below is my C++ code:
NAN_METHOD(DBNode::Test){
    printf("Hello\n");
    printf("%s\n", info[0]->ToString());
    printf("%d\n", info[1]->ToNumber());
}
Below is my js code:
const test = require('./build/Release/test.node');
test.test('ssss', 99);
Below is the output:
$ node demo.js
Hello
?ڄ?C
-272643000
You can see from the above output that the string and integer values are not correctly printed. Is there anything wrong with my code?
Let's start with numbers. ToNumber() returns a value of type Local<Number>. That differs from the regular C-like value that printf can digest.
First of all you need to unwrap the Local, which is a v8 pointer-like utility class.
You can do that with its overridden * operator, so *(info[1]->ToNumber()) gives us the v8 Number, a child of Value. But this is not the end of the story. Now we can pull a good old int from it: (*(info[1]->ToNumber())).Int32Value(). Or you can use the fact that Handle descendants override the -> operator too and write info[1]->ToNumber()->Int32Value().
The string case is harder. V8 uses UTF-8 strings, and you can use the String::Utf8Value utility class to get a char buffer from it: *(String::Utf8Value(info[0]->ToString()))
Usually you do not need any of this in v8 addons; I suggest working with the v8 objects (Local, String, Number, etc.) in your native code.
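Putting the two conversions together, a minimal sketch of the corrected method (assuming the NAN version used in the question; Nan::Utf8String and Nan::To are NAN's helpers for exactly this):

NAN_METHOD(DBNode::Test) {
    printf("Hello\n");
    // Convert the v8 string to a UTF-8 char buffer printf can handle
    Nan::Utf8String str(info[0]);
    printf("%s\n", *str);
    // Unwrap the Local<Number> into a plain int32
    int32_t num = Nan::To<int32_t>(info[1]).FromJust();
    printf("%d\n", num);
}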
Below is the solution:
NAN_METHOD(updateSignalValue) {
    Nan::Utf8String lSignal(info[0]);
    int len = lSignal.length();
    if (len <= 0) {
        return Nan::ThrowTypeError("arg must be a non-empty string");
    }
    std::cout << "\n hello lSignal value is :" << *lSignal;
}
Regards, Rakesh Kumar Jha

How to know if wstring can be safely (no data loss) converted to string?

So I already know how to convert a wstring to a string (How to convert wstring into string?).
However, I would like to know whether it is safe to make the conversion, meaning the wstring variable does not contain any characters that are not supported by the string type.
A std::string can hold any data if you use the right encoding; it is just a sequence of bytes. But you need to check with your particular encoding / conversion routine.
It should be simply a matter of round-tripping, an elegant solution to many things.
Warning: pseudo-code, there is no literal convert_to_wstring() unless you make it so:
if (convert_to_wstring(convert_to_string(ws)) == ws)
    happy_days();
If what goes in comes out, it is non-lossy (at least for your code points).
Not that it's the most efficient solution, but it should allow you to build from your favorite conversion routines.
// Round-trip and see if we lose anything
bool check_ws2s(const std::wstring& wstr)
{
    return (s2ws(ws2s(wstr)) == wstr);
}
Using @dk123's conversions for C++11 from How to convert wstring into string? (upvote his answer there: https://stackoverflow.com/a/18374698/257090):
#include <codecvt>
#include <locale>
#include <string>

std::wstring s2ws(const std::string& str)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.from_bytes(str);
}

std::string ws2s(const std::wstring& wstr)
{
    typedef std::codecvt_utf8<wchar_t> convert_typeX;
    std::wstring_convert<convert_typeX, wchar_t> converterX;
    return converterX.to_bytes(wstr);
}
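One caveat: with these helpers, std::wstring_convert throws std::range_error when it hits input it cannot encode (a lone surrogate, for example), so a robust round-trip check should catch that too. A sketch, reusing the helpers above:

#include <stdexcept>

// Returns false both when the round trip changes the string and when
// the conversion itself rejects the input.
bool check_ws2s_safe(const std::wstring& wstr)
{
    try {
        return s2ws(ws2s(wstr)) == wstr;
    } catch (const std::range_error&) {
        return false;  // wstring_convert could not encode the input
    }
}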
Note, if your idea of conversion is truncating the wide chars to chars, then it is simply a matter of iterating and checking that each wide char value fits in a char. This will probably do it.
WARNING: Not appropriate for multibyte encoding.
for (wchar_t& wc : ws) {
    if (wc > static_cast<char>(wc))
        return false;
}
return true;
Or:
// Could use a narrowing cast comparison, but this avoids any warnings
for (wchar_t& wc : ws) {
    if (wc > std::numeric_limits<char>::max())
        return false;
}
return true;
FWIW, in Win32 there are conversion routines that accept a WC_ERR_INVALID_CHARS flag, which tells the routine to fail instead of silently dropping code points. Non-standard solutions, of course.
Example: WideCharToMultiByte()
http://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx
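A Win32-only sketch of that approach, assuming Vista or later (WC_ERR_INVALID_CHARS is only valid together with CP_UTF8):

#include <windows.h>
#include <string>

// Fails (returns false) instead of silently dropping unencodable code points
bool wide_to_utf8_strict(const std::wstring& ws, std::string& out)
{
    if (ws.empty()) { out.clear(); return true; }
    int n = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                                ws.data(), (int)ws.size(),
                                NULL, 0, NULL, NULL);
    if (n == 0)
        return false;  // GetLastError() == ERROR_NO_UNICODE_TRANSLATION
    out.resize(n);
    WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS,
                        ws.data(), (int)ws.size(),
                        &out[0], n, NULL, NULL);
    return true;
}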

junk values in a LPTSTR string

I wrote a function to return the extension from a path. It looks like below:
LPTSTR GetExtension(LPCTSTR path1)
{
    CString str(path1);
    int length = str.ReverseFind(L'.');
    str = str.Right(str.GetLength() - length);
    LPTSTR extension = str.GetBuffer(0);
    str.ReleaseBuffer();
    return extension;
}
I checked in the debugger and found that extension has a valid value (.txt) while returning, but when I use the following statement in the main method:
LPTSTR extension = GetExtension(L"C:\\Windows\\text.txt");
The variable extension contains the following junk values:
ﻮ
ﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮﻮ䞐瀘嗯᠀骰PꬰP⚜叕u
Can anyone tell me what is the reason behind it?
You are returning a pointer to a released buffer, and the buffer is a local variable of the function. Both big no-nos. Change the signature to
size_t GetExtension(LPCTSTR path, LPTSTR buffer, size_t bufferSize)
so that you can copy the result into buffer.
Or return a CString or std::wstring; you're using C++, not C. Using TCHAR is also a heavily outmoded way to handle strings; the last non-Unicode version of Windows died a timely death 12 years ago.
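A minimal sketch of the CString-returning variant (the empty-string behavior when no dot is found is my assumption, not part of the original answer):

CString GetExtension(LPCTSTR path)
{
    CString str(path);
    int dot = str.ReverseFind(_T('.'));
    if (dot < 0)
        return CString();  // no extension found
    // The returned CString owns its storage, so nothing dangles
    return str.Mid(dot);   // e.g. ".txt"
}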
I don't have my compiler with me, but you are probably getting the buffer and storing a pointer into it, then releasing it while the LPTSTR is still pointing at that location.
Or maybe the buffer is on the stack while you look at it inside the function; on exiting the function you lose it.

Converting wchar_t* to char* on iOS

I'm attempting to convert a wchar_t* to a char*. Here's my code:
size_t result = wcstombs(returned, str, length + 1);
if (result == (size_t)-1) {
    int error = errno;
}
It indeed fails, and error is filled with 92 (ENOPROTOOPT) - Protocol not available.
I've even tried setting the locale:
setlocale(LC_ALL, "C");
And this one too:
setlocale(LC_ALL, "");
I'm tempted to just throw the characters with static casts!
It seems the issue was that the source string was encoded with a non-standard encoding (two ASCII characters for each wide character), which looked fine in the debugger but internally was clearly sour. The error code is misleading, too: on Darwin/iOS, errno 92 is EILSEQ (illegal byte sequence), not ENOPROTOOPT as on Linux, and it simply means that piece of text could not be converted.
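For reference, a minimal sketch of a wcstombs round trip that works once the locale matches the input (the locale name is an assumption; use whatever UTF-8 locale your platform provides):

#include <clocale>
#include <cstdlib>
#include <cstdio>

int main()
{
    // wcstombs converts using the current locale's encoding
    setlocale(LC_ALL, "en_US.UTF-8");
    const wchar_t* str = L"héllo";
    char buf[64];
    size_t n = wcstombs(buf, str, sizeof buf);
    if (n == (size_t)-1)
        perror("wcstombs");            // EILSEQ on unconvertible input
    else
        printf("%s (%zu bytes)\n", buf, n);
    return 0;
}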

Why am I getting gibberish output, along with valid output, when reading a String^ ?

I'm trying to write a few integers to a file (as a string). Every time I run this bit of code I get the integers into the text file as planned, but before the integers I get some gibberish. I did some experimenting and found that even if I put nothing into System::String ^ b, it would write the same gibberish output to the file or a message box, but I couldn't figure out why it would also do this when I was concatenating those integers to it (as strings). What could be going wrong here?
using namespace msclr::interop;
using namespace System;
using namespace System::IO;
using namespace System::Text;
...
System::IO::StreamWriter ^ x;
char buffer[21], buffer2[3];
int a;
for (a = 0; a < 10; a++) {
    itoa(weight[a], buffer, 10);
    strcat(buffer, buffer2);
}
System::String ^ b = marshal_as<String^>(buffer);
x->WriteLine(b);
What format is the file in? You may be reading a UTF-8 file with a byte-order mark that is silently applied by a text editing program.
http://en.wikipedia.org/wiki/Byte_order_mark
Typo in question or bug in code: pass buffer2 to itoa instead of buffer. Also, initialize buffer to "".
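A sketch with both fixes applied (buffer sizes assume each weight fits in two digits, as the original declarations imply):

char buffer[21] = "";  // start as a valid empty C string for strcat
char buffer2[3];       // two digits plus the terminator
for (int a = 0; a < 10; a++) {
    itoa(weight[a], buffer2, 10);  // convert one integer into buffer2
    strcat(buffer, buffer2);       // append it to the accumulated string
}
System::String ^ b = marshal_as<String^>(buffer);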
