How to get system charset in Rust on Windows?

How to get system charset in Rust on Windows? - rust

Working on Rust connector for TDengine, my problem is to get system charset in Rust. Which crate or which method should I use for this?

To get the ANSI codepage (that used for 8-bit text applications), use GetACP(). To get the OEM codepage (that used in consoles) use GetOEMCP().
I don't know that TDengine, but to keep your sanity you should avoid ANSI/OEM codepages and use UTF-8/Unicode whenever possible. The Rust OsString type makes this somewhat less painful.

Related

TI-Basic Problems

I recently bought a TI-84 Plus CE, and have been making programs using TI-BASIC.
I'm trying to make a simple text editor, and I need to convert character codes to characters. However, it seems that the char() command doesn't exist?
Please Help!

I don't believe that 84+ TI-BASIC supports ascii in this way (though I know that 68k BASIC had the ord() command) but one thing you could do is store all the typeable glyphs into a string (see: prgmGLYPHS on TI-Basic Developer for example) and then use inString() and sub() to store/retrieve their values. It's not pretty, and it's not fast, but it works. Here's an example using only the uppercase letters:
:"ABCDEFGHIJKLMNOPQRSTUVWXYZ→Str1
:Input ">",Str2
:Disp inString(Str1,Str2
:Input ">",N
:Disp sub(Str1,N,1
Note: The following pertains to my experience with the TI-84+ SE. The 84+ CE runs on a newer processor than the zilog Z80, so YMMV:
I expect what you're doing is storing your text in a List. Another thing that might be more efficient and secure is storing your text as an AppVar. These are allocated blocks of RAM/ROM that you can read to/write to at will...as long as you have a library to do it. With the 84+ SE you needed to use Celtic3 (or Doors CS, which includes Celtic as a library) to do that. I haven't used the 84+ CE enough to tell you what exists there, as the assembly code is entirely different. According to this reddit post the best way to do that is use the C toolchain, but I don't have experience with this either.

Reading utf-8 files to std::string in C++

Finally! We're starting to require that all our input files are encoded in utf-8! This is something we've been wanting to do for years. Unfortunately, we suck at it since none of us have ever tried it and most of us are Windows programmers or are used to operating systems where utf-8 is the only real option anyway; neither group knows anything about reading utf-8 strings in a platform agnostic way.
So we started to look at how to deal with utf-8 in a platform agnostic way and found that its pretty confusing (because Windows) and the other questions I've found here on stackoverflow don't really seem to cover our scenario or they are confusing. I found a reference to https://www.codeproject.com/Articles/38242/Reading-UTF-with-C-streams which, I find, is a bit confusing and contains a great deal of fluff.
So a few assumptions (that must be true or we're in a state of GIGO)
All files are in utf-8 (yay!)
The std::strings must contain utf-8; no conversion allowed.
The solution must be locale agnostic and work on both macOS (10.13+), Windows (10+), Android and iOS 10+.
Stream support is not required; we're dealing with local files only (for now), but support for streams is appreciated.
We're trying to avoid using std::wstring if we can and I see no reason to use it anyway. We're also trying to avoid using any third party libraries which do not use utf-8 encoded std::string; using a custom string with functions that overloads and converts all std::string arguments to the a custom string is acceptable.
Is there any way to do this using just the standard C++ library? Preferably just by imbuing the global locale with a facet that tells the stream library to just dump content of files in strings (using custom delimiters as usual); no conversion allowed.
This question is only about reading utf-8 files into std::strings and storing the content as utf-8 encoded strings. Dealing with Windows APIs and such is a separate concern.
C++17 is available.

UTF-8 is just a sequence of bytes that follow a specific encoding. If you read a sequence of bytes that is legitimate UTF-8 data into a std::string, then the string contains UTF-8 data.
There's nothing special you have to actually do to make this happen. This works like any other C or C++ file loading. Just don't mess around with iostream locales and you'll be fine.

Is there a safe / sanitised filename function in Rust

Given that rolling one's own is not usually a great security idea, is there a crate or library function in Rust to sanitise a filename? The recent 'nul' crate demonstrates that there's a few OS-specific gotchas.

"Safe" mostly depends on your threat model.
If you simply want to avoid your filesystem from being corrupted through bad filenames, then there's good news, you don't need to do anything since filesystem APIs will reject invalid names with errors. Although if you want file names and not paths you either have to be careful to not use APIs that take AsRef<Path> or strip path separators (see std::path::is_separator), since they will accept absolute or relative paths.
If you need to handle relative paths from untrusted inputs you will at least have to strip .. paths to stop directory traversal attacks.
If you want to avoid attacks on the human instead of the software you will have to do a lot of sanitizing, such as removing Unicode text direction overrides which could mislead the user about file extensions.
The recent 'nul' crate demonstrates that there's a few OS-specific gotchas.
That was not an issue with things being unsafe on a particular platform. The issue here is one of cross-platform compatibility. Something that works on UNIX systems did not translate nicely to Windows systems. But the Windows systems failed safely, it simply caused an error and Cargo handled that error in a particular way (stopping the update). It could have chosen to handle the failure in a different way, e.g. by skipping that one particular crate or by mangling the filename.

Modifying C++ DLL to support unicode - common pitfalls to avoid?

I have a windows DLL that currently only supports ASCII and I need to update it to work with Unicode strings. This DLL currently uses char* strings in a number of places, along with making a number of ASCII Windows API calls (like GetWindowTextA, RegQueryValueExA, CreateFileA, etc).
I want to switch to using the unicode/ascii macros defined in VC++. So instead of char or CHAR I'd use TCHAR. For char* I'd use LPTSTR. And I think things like sprintf_s would be changed to _stprintf_s.
I've never really dealt with unicode before, so I'm wondering if there are any common pitfalls I should look out for while doing this. Should it just be as simple as replacing the types and method names with the proper macros be enough, or are there other complications to look out for?

First read this article by Joel Spolsky: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then run through these links on Stack Overflow: What do I need to know about Unicode?
Generally, you are looking for any code that assumes one character = one byte (memory/buffer allocation, etc). But the links above will give you a pretty good rundown of the details.

The biggest danger is likely to be buffer sizes. If your memory allocations are made in terms of sizeof(TCHAR) you'll probably be OK, but if there is code where the original programmer was assuming that characters were 1 byte each and they used integers in malloc statements, that's hard to do a global search for.

Is there a way to convert from a string to pure code in C++?

I know that its possible to read from a .txt file and then convert various parts of that into string, char, and int values, but is it possible to take a string and use it as real code in the program?
Code:
string codeblock1="cout<<This is a test;";
string codeblock2="int array[5]={0,6,6,3,5};}";
int i;
cin>>i;
if(i)
{
execute(codeblock1);
}
else
{
execute(codeblock2);
}
Where execute is a function that converts from text to actual code (I don't know if there actually is a function called execute, I'm using it for the purpose of my example).

In C++ there's no simple way to do this. This feature is available in higher-level languages like Python, Lisp, Ruby and Perl (usually with some variation of an eval function). However, even in these languages this practice is frowned upon, because it can result in very unreadable code.
It's important you ask yourself (and perhaps tell us) why you want to do it?
Or do you only want to know if it's possible? If so, it is, though in a hairy way. You can write a C++ source file (generate whatever you want into it, as long as it's valid C++), then compile it and link to your code. All of this can be done automatically, of course, as long as a compiler is available to you in runtime (and you just execute it with system). I know someone who did this for some heavy optimization once. It's not pretty, but can be made to work.

You can create a function and parse whatever strings you like and create a data structure from it. This is known as a parse tree. Subsequently you can examine your parse tree and generate the necessary dynamic structures to perform the logic therin. The parse tree is subsequently converted into a runtime representation that is executed.
All compilers do exactly this. They take your code and they produce machine code based on this. In your particular case you want a language to write code for itself. Normally this is done in the context of a code generator and it is part of a larger build process. If you write a program to parse your language (consider flex and bison for this operation) that generates code you can achieve the results you desire.

Many scripting languages offer this sort of feature, going all the way back to eval in LISP - but C and C++ don't expose the compiler at runtime.
There's nothing in the spec that stops you from creating and executing some arbitrary machine language, like so:
char code[] = { 0x2f, 0x3c, 0x17, 0x43 }; // some machine code of some sort
typedef void (FuncType*)(); // define a function pointer type
FuncType func = (FuncType)code; // take the address of the code
func(); // and jump to it!
but most environments will crash if you try this, for security reasons. (Many viruses work by convincing ordinary programs to do something like this.)
In a normal environment, one thing you could do is create a complete program as text, then invoke the compiler to compile it and invoke the resulting executable.
If you want to run code in your own memory space, you could invoke the compiler to build you a DLL (or .so, depending on your platform) and then link in the DLL and jump into it.

First, I wanted to say, that I never implemented something like that myself and I may be way off, however, did you try CodeDomProvider class in System.CodeDom.Compiler namespace? I have a feeling the classes in System.CodeDom can provide you with the functionality you are looking for.
Of course, it will all be .NET code, not any other platform
Go here for sample

Yes, you just have to build a compiler (and possibly a linker) and you're there.
Several languages such as Python can be embedded into C/C++ so that may be an option.

It's kind of sort of possible, but not with just straight C/C++. You'll need some layer underneath such as LLVM.
Check out c-repl and ccons

One way that you could do this is with Boost Python. You wouldn't be using C++ at that point, but it's a good way of allowing the user to use a scripting language to interact with the existing program. I know it's not exactly what you want, but perhaps it might help.

Sounds like you're trying to create "C++Script", which doesn't exist as far as I know. C++ is a compiled language. This means it always must be compiled to native bytecode before being executed. You could wrap the code as a function, run it through a compiler, then execute the resulting DLL dynamically, but you're not going to get access to anything a compiled DLL wouldn't normally get.
You'd be better off trying to do this in Java, JavaScript, VBScript, or .NET, which are at one stage or another interpreted languages. Most of these languages either have an eval or execute function for just that, or can just be included as text.
Of course executing blocks of code isn't the safest idea - it will leave you vulnerable to all kinds of data execution attacks.
My recommendation would be to create a scripting language that serves the purposes of your application. This would give the user a limited set of instructions for security reasons, and allow you to interact with the existing program much more dynamically than a compiled external block.

Not easily, because C++ is a compiled language. Several people have pointed round-about ways to make it work - either execute the compiler, or incorporate a compiler or interpreter into your program. If you want to go the interpreter route, you can save yourself a lot of work by using an existing open source project, such as Lua

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string