I've registered a Lua API function using:
lua_register( luaState, "someFunc", aaa::someFunc);
This function has to read a string from someFunc's argument.
I am doing it with:
int aaa::someFunc( lua_State* state ) {
const char* _arg = ( const char* )lua_tostring( state, -1 );
// ...
return 0;
}
It works fine until I pass someFunc() some characters from my language, like ą, ł, ź, etc. (they are 'a', 'l', 'z' typed while holding the ALT key). For each of these characters, qDebug( "%s", _arg )* prints only "?". How can I recognize these values in my program?
For example, when I call sendFunc( "Aaś" ) in my script, the printed _arg is "Aa?".
*qDebug is from the Qt library; it works like printf() but outputs to the debug window.
Thanks.
Qt uses Unicode strings. I expect the string returned from Lua is in some other codepage; you'll need to convert it to Unicode to use it with qDebug.
To see what's really in the string (in terms of hex values), breakpoint your code before calling qDebug.
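A minimal sketch of that conversion (assuming the Lua script file is saved as UTF-8; if it is saved in the local codepage, e.g. Windows-1250 for Polish, use QString::fromLocal8Bit instead):
#include <QDebug>
#include <QString>
#include <lua.hpp>

static int someFunc( lua_State* state ) {
    const char* raw = lua_tostring( state, -1 );  // raw bytes; Lua does not change the encoding
    QString text = QString::fromUtf8( raw );      // tell Qt how to interpret those bytes
    qDebug() << text;                             // prints the characters instead of "?"
    return 0;
}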
In NodeJS, child_process.execFile and .spawn take this parameter:
args <string[]> List of string arguments.
How does NodeJS encode the strings you pass in this array?
Context: I'm writing a nodejs app which adds metadata (often including non-ASCII characters) to an mp3.
I know that ffmpeg expects UTF-8-encoded arguments. If my nodejs app invokes child_process.execFile("ffmpeg", ["-metadata", "title=" + myString], {encoding: "utf8"}), then how will nodejs encode myString in the arguments?
I know that id3v2 expects latin1-encoded arguments. If my nodejs app invokes child_process.execFile("id3v2", ["--titl", myString], {encoding: "latin1"}), then how will nodejs encode myString in the arguments?
I see that execFile and spawn both take an "encoding" argument. But the nodejs docs say "The encoding option can be used to specify the character encoding used to decode the stdout and stderr output." The docs say nothing about the encoding of args.
Answer: NodeJS always encodes the args as UTF-8.
I wrote a simplistic C++ app which shows the raw truth of the bytes that are passed into its argv:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("argc=%u\n", argc);
    for (int i = 0; i < argc; i++)
    {
        printf("%u:\"", i);
        for (char *c = argv[i]; *c != 0; c++)
        {
            if (*c >= 32 && *c < 127)
                printf("%c", *c);
            else
            {
                unsigned char d = *(unsigned char *)c;
                unsigned int e = d;
                printf("\\x%02X", e);
            }
        }
        printf("\"\n");
    }
    return 0;
}
Within my NodeJS app, I created some strings whose origins I knew for certain:
const a = Buffer.from([65]).toString("utf8");
const pound = Buffer.from([0xc2, 0xa3]).toString("utf8");
const skull = Buffer.from([0xe2, 0x98, 0xa0]).toString("utf8");
const pound2 = Buffer.from([0xa3]).toString("latin1");
The argument of toString indicates how the raw bytes in the buffer should be interpreted: as UTF-8 (or latin1 in the last case). The result is that I have four strings whose contents I unambiguously know are correct.
(I understand that JavaScript VMs typically store their strings as UTF-16. The fact that pound and pound2 behave the same in my experiments shows that the provenance of the strings doesn't matter.)
Finally I invoked execFile with these strings:
child_process.execFile("argcheck", [a, pound, pound2, skull], {encoding: "utf8"});
child_process.execFile("argcheck", [a, pound, pound2, skull], {encoding: "latin1"});
In both cases, the raw bytes that nodejs passed into argv were the UTF-8 encodings of the strings a, pound, pound2, skull.
So how can we pass latin1 arguments from nodejs?
The above explanation shows it's IMPOSSIBLE for nodejs to pass any latin1 character in the range 128..255 to child_process.spawn/execFile. But there's an escape hatch involving child_process.exec:
Example: this string "A £ ☠"
stored internally in JavaScript's UTF-16 as "\u0041 \u00A3 \u2620"
encoded in UTF-8 as "\x41 \xC2\xA3 \xE2\x98\xA0"
encoded in latin1 as "\x41 \xA3 ?" (the skull-and-crossbones is inexpressible in latin1)
Unicode chars 0-127 are the same as latin1 and encode into UTF-8 as the same single bytes
Unicode chars 128-255 also exist in latin1, but UTF-8 encodes them differently (as two bytes)
Unicode chars 256+ don't exist in latin1.
// this would encode them as utf8, which is wrong:
execFile("id3v2", ["--comment", "A £ ☠", "x.mp3"]);
// instead we'll use shell printf to bypass nodejs's wrongful encoding:
exec("id3v2 --comment \"`printf "A \xA3 ?"`\" x.mp3");
Here's a handy way to turn a string like "A £ ☠" into one like "A \xA3 ?", ready to pass into child_process.exec:
const comment2 = [...comment]
  .map(c =>
    c <= "\u007F" ? c : c <= "\u00FF"
      ? `\\x${("000" + c.charCodeAt(0).toString(16)).substr(-2)}` : "?"
  )
  .join("");
const cmd = `id3v2 --comment \"\`printf \"${comment2}\"\`\" \"${fn}\"`;
child_process.exec(cmd, (e, stdout, stderr) => { ... });
OK, so I'm using C++ to make a library that helps me print lines to a console.
I want to overload "" (the quote operators) to create a std::string instead of a string literal, to make it easier to append other data types to the string I want to output.
I've seen this done before in wxWidgets with their wxString, but I have no idea how I can do that myself.
Is that possible and how would I go about doing it?
I've already tried using this code, but with no luck:
class PString{
std::string operator""(const char* text, std::size_t len) {
return std::string(text, len);
}
};
I get this error:
error: expected suffix identifier
std::string operator""(const char* text, std::size_t len) {
^~
which, I assume, wants me to add a suffix after the "", but I don't want that. I want to use only "" (quotes).
Thanks!
You can't use "" without defining a suffix. "" is a const char* by itself either with a prefix (like L"", u"", U"", u8"", R"()") or followed by suffixes like (""s, ""sv, ...) which can be overloaded.
The way wxString works is to have an implicit constructor wxString::wxString(const char*), so that when you pass "some string" to a function taking a wxString it is essentially the same as wxString("some string").
Overloading operator"" X gives you user-defined string literals with a suffix, as the other answer explains.
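To get close to the original PString idea, here is a sketch combining both approaches (the _ps suffix and the str() accessor are just illustrative names):
#include <cstddef>
#include <string>
#include <utility>

class PString {
public:
    PString(const char* text) : data_(text) {}             // implicit on purpose: PString p = "hi";
    PString(std::string text) : data_(std::move(text)) {}
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

// A literal operator still needs a suffix; a bare "" cannot be overloaded.
PString operator"" _ps(const char* text, std::size_t len) {
    return PString(std::string(text, len));
}

void printLine(const PString& s);   // printLine("hello") and printLine("hello"_ps) both compile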
I'm trying to write a few integers to a file (as a string). Every time I run this bit of code I get the integers into the text file as planned, but before the integers I get some gibberish. I did some experimenting and found that even if I put nothing into System::String ^ b, I get the same gibberish output in the file or a message box, but I couldn't figure out why this happens when I'm concatenating those integers to it (as strings). What could be going wrong here?
using namespace msclr::interop;
using namespace System;
using namespace System::IO;
using namespace System::Text;
...
System::IO::StreamWriter ^ x;
char buffer[21], buffer2[3];
int a;
for(a = 0; a < 10; a++){
    itoa(weight[a], buffer, 10);
    strcat(buffer, buffer2);
}
System::String ^ b = marshal_as<String^>(buffer);
x->WriteLine(b);
What format is the file in? You may be reading a UTF-8 file with a byte-order mark that is silently applied by a text editing program.
http://en.wikipedia.org/wiki/Byte_order_mark
Typo in question or bug in code: pass buffer2 to itoa instead of buffer.
Also, initialize buffer to "";
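Putting both fixes together, a corrected sketch of the loop (assuming weight holds ten ints and the StreamWriter x is created in the elided part of the question):
char buffer[64] = "";                   // start empty so strcat appends to a valid string
char buffer2[12];                       // scratch space, big enough for one 32-bit int
for (int a = 0; a < 10; a++) {
    itoa(weight[a], buffer2, 10);       // convert the current integer into buffer2...
    strcat(buffer, buffer2);            // ...and append it to the accumulated string
}
System::String ^ b = marshal_as<String^>(buffer);   // b now contains only the digits
x->WriteLine(b);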
Our C++ lib works fine with Python 2.4 using SWIG, returning a C++ char* back to a Python str. But this solution hits a problem in Python 3.0; the error is:
Exception=(, UnicodeDecodeError('utf8', b"\xb6\x9d\xa.....",0, 1, 'unexpected code byte')
Our definition looks like this (working fine in Python 2.4):
void cGetPubModulus(
void* pSslRsa,
char* cMod,
int* nLen );
%include "cstring.i"
%cstring_output_withsize( char* cMod, int* nLen );
I suspect SWIG is doing a bytes->str conversion automatically. In Python 2.4 it can be implicit, but in Python 3.0 it's no longer allowed. Anyone got a good idea? Thanks.
It's rather Python 3 that does that conversion. In Python 2, bytes and str are the same thing; in Python 3, str is Unicode, so something somewhere tries to decode your data as UTF-8, but it isn't UTF-8.
Your Python 3 code needs to return not a Python str, but a Python bytes. This will not work with Python 2, though, so you need preprocessor statements to handle the differences.
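A sketch of how that might look at the SWIG layer (untested; it assumes the %cstring_output_withsize line from the question stays and that overriding only the argout typemap for the same argument pattern is enough; on Python 2.6+ the PyBytes_* names are aliases for the PyString_* ones, so the same code should still build there):
%include "cstring.i"
%cstring_output_withsize( char* cMod, int* nLen );

// Return the raw buffer as Python bytes instead of letting SWIG decode it as a str.
// This must appear before the declaration of cGetPubModulus is processed.
%typemap(argout) (char* cMod, int* nLen) {
    %append_output(PyBytes_FromStringAndSize($1, *$2));
}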
I came across a similar problem. I wrote a SWIG typemap for a custom char array (an unsigned char, in fact) and it segfaulted when using Python 3. So I debugged the code within the typemap and realized the problem Lennart describes.
My solution to that problem was doing the following in that typemap:
%typemap(in) byte_t[MAX_FONTFACE_LEN] {
    if (PyString_Check($input))
    {
        $1 = (byte_t *)PyString_AsString($input);
    }
    else if (PyUnicode_Check($input))
    {
        $1 = (byte_t *)PyUnicode_AsEncodedString($input, "utf-8", "Error ~");
        $1 = (byte_t *)PyBytes_AS_STRING($1);
    }
    else
    {
        PyErr_SetString(PyExc_TypeError,"Expected a string.");
        return NULL;
    }
}
That is, I check what kind of string object the PyObject is. PyString_Check() and PyUnicode_Check() return nonzero if the input is a byte string or a Unicode string, respectively. If it's a Unicode string, we convert it to bytes with the call to PyUnicode_AsEncodedString() and then convert those bytes to a char * with PyBytes_AS_STRING().
Note that I reuse the same variable for storing the Unicode string and for its conversion to bytes. Although that is questionable and could start another coding-style discussion, the fact is that it solved my problem. I have tested it with Python 3 and Python 2.7 binaries without any problems so far.
Lastly, the last branch raises an exception on the Python side to report that the input was neither a byte string nor a Unicode string.
I use the following string types: LPCSTR, TCHAR, String. I want to convert:
from TCHAR to LPCSTR
from String to char
I convert from TCHAR to LPCSTR with this code:
RunPath = TEXT("C:\\1");
LPCSTR Path = (LPCSTR)RunPath;
From String to char I convert with this code:
SaveFileDialog^ saveFileDialog1 = gcnew SaveFileDialog;
saveFileDialog1->Title = "Save settings file";
saveFileDialog1->Filter = "bck files (*.bck)|*.bck";
saveFileDialog1->RestoreDirectory = true;
pin_ptr<const wchar_t> wch = TEXT("");
if ( saveFileDialog1->ShowDialog() == System::Windows::Forms::DialogResult::OK ) {
    wch = PtrToStringChars(saveFileDialog1->FileName);
} else return;
ofstream os(wch, ios::binary);
My problem is that when I set Configuration Properties -> General -> Character Set to "Use Multi-Byte Character Set", the first part of the code works correctly, but the second part returns error C2440. When I set Character Set to "Use Unicode Character Set", the second part works correctly, but the first part returns only the first character when converting from TCHAR to LPCSTR.
I'd suggest you need to be using Unicode the whole way through.
LPCSTR is a "Long Pointer to a C-type String". That's typically not what you want when you're dealing with .Net methods. The char type in .Net is 16bits wide.
You also should not use the TEXT("") macro unless you're planning multiple builds using various character encodings. Prefer wide literals (the L"" prefix) for your string literals and a pure Unicode build if you can.
See if that helps.
PS. std::wstring is very handy in your scenario.
EDIT
You see only one character because the string is now Unicode (UTF-16) but you cast it as a regular char string. Most Unicode characters in the ASCII range have the same numeric value as in ASCII but have the second of their 2 bytes set to zero. So when a UTF-16 string is read as a C string, you only see the first character, because C strings are null (zero) terminated and that zero byte ends the string. The easy (and wrong) way to deal with this is to use std::wstring, convert it to a std::string, and then pull the C string out of that. This is not a safe approach, because Unicode has a much larger character space than your standard encoding.
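For the second part, a sketch of the safer route under Visual C++ (marshal_as<std::wstring> comes from <msclr/marshal_cppstd.h>, and the wide-filename ofstream constructor is a Microsoft extension) is to stay wide all the way to the stream:
#include <msclr/marshal_cppstd.h>
#include <fstream>
#include <string>

// System::String^ is UTF-16; copy it into a std::wstring without narrowing,
// then hand the wide file name straight to the stream.
System::String^ managedName = saveFileDialog1->FileName;
std::wstring fileName = msclr::interop::marshal_as<std::wstring>(managedName);
std::ofstream os(fileName.c_str(), std::ios::binary);
If a real LPCSTR is unavoidable for a legacy API, convert explicitly with WideCharToMultiByte rather than casting; the cast only reinterprets the pointer, which is why you saw a single character.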