Cast &Vec<char> as &str


I have a Vec<Vec<char>> as a console frame buffer.
I'd like to render the buffer, but that would force me to call print!() for every char. I would like to treat the inner &Vec<char> as a &str (not converting, not making a new String, just casting) so that I can print it as a whole with a single print!().
Is that possible, or is calling print!() once per character already as fast as calling it once for a whole string slice?

A &str represents a reference to a memory location where a string is stored encoded using UTF-8.
Since your Vec<char> is not a string encoded using UTF-8, there is no way around creating a new String in memory somewhere that you can then reference.
Luckily, the conversion is easy and fast: if v is your Vec<char>, it's simply v.iter().cloned().collect::<String>(). If you no longer wish to keep the old v around, you can replace .iter().cloned() with .into_iter().
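For the frame buffer in the question, the conversion above might be applied per row or to the whole frame at once. The following is a minimal sketch; the helper names row_to_string and frame_to_string are made up for illustration and do not come from the answer.

```rust
/// Collects one row of the frame buffer into a freshly allocated String.
/// (Hypothetical helper name, used only for this sketch.)
fn row_to_string(row: &[char]) -> String {
    // String implements FromIterator<&char>, so no explicit clone is needed.
    row.iter().collect()
}

/// Joins all rows with newlines so the frame can be written with one print!().
/// (Hypothetical helper name, used only for this sketch.)
fn frame_to_string(frame: &[Vec<char>]) -> String {
    let mut out = String::new();
    for row in frame {
        // String also implements Extend<&char>.
        out.extend(row.iter());
        out.push('\n');
    }
    out
}
```

With these helpers, print!("{}", frame_to_string(&frame)) writes the whole frame in one call instead of one call per character.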

Related

Python interpret bytearray as bytes

I have a question regarding the interpretation of a string as bytes.
In Python, I have a variable that contains, for example, this value:
"bytearray(b'\x13\x02US')"
This is unfortunately due to the behavior of a module I am using. My question is: how could I get this string into bytes?
I have tried stripping away the "bytearray(b'" at the start and the "')" at the end and then calling .encode(), but the result is:
b'\\x13\\x02US'
This clearly escapes the \ and so prevents the interpretation as bytes.
How could I get this converted into
b'\x13\x02US'
instead though?
Thank you very much!

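Replacing '\\' with '\' will not work: '\' is not a valid string literal, and the result of .encode() already contains only a single literal backslash (the repr merely displays it doubled). What you need is to have the escape sequences interpreted. Either chain .decode('unicode_escape').encode('latin-1') after your .encode() call, or strip only "bytearray(" and the final ")" and pass the remaining b'...' literal to ast.literal_eval().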

What's the most direct way to convert a Path to a *c_char?

Given a std::path::Path, what's the most direct way to convert it to a null-terminated *const std::os::raw::c_char (for passing to C functions that take a path)?
use std::ffi::CString;
use std::os::raw::c_char;
use std::path::Path;
extern "C" {
    fn some_c_function(path: *const c_char);
}
fn example_c_wrapper(path: &Path) {
    // Panics if the path is not valid UTF-8 or contains an interior NUL byte.
    let path_str_c = CString::new(path.as_os_str().to_str().unwrap()).unwrap();
    // Calling a foreign function requires an unsafe block.
    unsafe { some_c_function(path_str_c.as_ptr()) };
}
Is there a way to avoid having so many intermediate steps?
Path -> OsStr -> &str -> CString -> as_ptr()

It's not as easy as it looks. There's one piece of information you didn't provide: what encoding is the C function expecting the path to be in?
On Linux, paths are "just" arrays of bytes (0 being invalid), and applications usually don't try to decode them. (However, they may have to decode them with a particular encoding to e.g. display them to the user, in which case they will usually try to decode them according to the current locale, which will often use the UTF-8 encoding.)
On Windows, it's more complicated, because there are variations of API functions that use an "ANSI" code page and variations that use "Unicode" (UTF-16). Additionally, Windows historically didn't support setting UTF-8 as the "ANSI" code page (since Windows 10 version 1903, an application can opt in through its manifest). This means that unless the library specifically expects UTF-8 and converts the path to the native encoding itself, passing it a UTF-8 encoded path is generally wrong (though it might appear to work for strings containing only ASCII characters).
(I don't know about other platforms, but it's messy enough already.)
In Rust, Path is just a wrapper for OsStr. OsStr uses a platform-dependent representation that happens to be compatible with UTF-8 when the string is indeed valid UTF-8, but non-UTF-8 strings use an unspecified encoding (on Windows, it's actually using WTF-8, but this is not contractual; on Linux, it's just the array of bytes as is).
Before you pass a path to a C function, you must determine what encoding it's expecting the string to be in, and if it doesn't match Rust's encoding, you'll have to convert it before wrapping it in a CString. Rust doesn't let you convert a Path or an OsStr to anything other than a str in a platform-independent way. On Unix-based targets, the OsStrExt trait is available and provides access to the OsStr as a slice of bytes.
Rust used to provide a to_cstring method on OsStr, but it was never stabilized, and it was deprecated in Rust 1.6.0 once it was realized that the behavior was inappropriate for Windows (it returned a UTF-8 encoded path, but Windows APIs don't support that!).

As Path is just a thin wrapper around OsStr, you could nearly pass it as-is to your C function. But to be a valid C string we have to add the NUL terminating byte. Thus we must allocate a CString.
On the other hand, conversion to a str is both risky (what if the Path is not a valid UTF-8 string?) and an unnecessary cost: I use as_bytes() instead of to_str().
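use std::ffi::CString;
use std::os::unix::ffi::OsStrExt; // provides as_bytes() on OsStr (Unix only)
fn example_c_wrapper<P: AsRef<std::path::Path>>(path: P) {
    // Panics if the path contains an interior NUL byte.
    let path_str_c = CString::new(path.as_ref().as_os_str().as_bytes()).unwrap();
    unsafe { some_c_function(path_str_c.as_ptr()) };
}
This is for Unix. I do not know how it works for Windows.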
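A slightly fuller sketch of the same Unix-only approach, returning an error instead of unwrapping, may make the difference from to_str() clearer. The function names path_to_cstring and non_utf8_path are assumptions made for this illustration; the example path is made up as well.

```rust
use std::ffi::{CString, NulError, OsStr};
use std::os::unix::ffi::OsStrExt; // Unix only: byte-level access to OsStr
use std::path::Path;

/// Copies the raw path bytes into a NUL-terminated CString.
/// Fails only if the path itself contains an interior NUL byte.
/// (Hypothetical helper name, used only for this sketch.)
fn path_to_cstring(path: &Path) -> Result<CString, NulError> {
    CString::new(path.as_os_str().as_bytes())
}

/// Builds a path whose bytes are not valid UTF-8
/// (0xE9 is 'é' in Latin-1, but an incomplete sequence in UTF-8).
fn non_utf8_path() -> &'static Path {
    Path::new(OsStr::from_bytes(b"caf\xe9.txt"))
}
```

For the path returned by non_utf8_path(), to_str() yields None, while path_to_cstring() succeeds and preserves the bytes unchanged. The CString must stay alive for as long as the C function uses the pointer obtained from as_ptr().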

If your goal is to convert a path to some sequence of bytes that is interpreted as a "native" path on whatever platform the code was compiled for, then the most direct way to do this is by using the OsStrExt of each platform you want to support:
let path = ..;
let mut buf = Vec::new();
#[cfg(unix)] {
    use std::os::unix::ffi::OsStrExt;
    buf.extend(path.as_os_str().as_bytes());
    buf.push(0);
}
#[cfg(windows)] {
    use std::os::windows::ffi::OsStrExt;
    buf.extend(path.as_os_str()
        .encode_wide()
        .chain(Some(0))
        .map(|b| {
            let b = b.to_ne_bytes();
            b.get(0).map(|s| *s).into_iter().chain(b.get(1).map(|s| *s))
        })
        .flatten());
}
This code[1] gives you a buffer that holds the path as a null-terminated byte string when you run it on Linux, and as null-terminated "Unicode" (UTF-16) when you run it on Windows. You could add a fallback that converts OsStr to a str on other platforms, but I strongly recommend against that (see why below).
On Windows, you'll want to cast your buffer pointer to wchar_t * before using it with Unicode functions (e.g. _wfopen). This code assumes that wchar_t is two bytes wide and that the buffer is properly aligned for wchar_t.
On the Linux side, just use the pointer as-is.
About converting paths to Unicode strings: don't. Contrary to recommendations here and elsewhere, blindly converting a path to UTF-8 is not the correct way to handle a system path. Requiring that paths be valid Unicode will cause your code to fail when (not if) it encounters paths that are not valid Unicode. If you're handling real-world paths, you will inevitably be handling non-UTF-8 paths. Doing it right the first time will help avoid a lot of pain and misery in the long run.
[1]: This code was taken directly out of a library I'm working on (feel free to reuse). It has been tested against both Linux and 64-bit Windows via Wine.
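The step that flattens 16-bit code units into native-endian bytes can also be written with flat_map, since arrays implement IntoIterator by value (Rust 1.53 and later). The sketch below is an alternative formulation, not the code from the library mentioned above; the name wide_to_ne_bytes is hypothetical, and it takes any iterator of u16 so that it compiles on every platform. On Windows the input would be the result of encode_wide().

```rust
/// Flattens UTF-16 code units into bytes in native byte order and
/// appends a two-byte NUL terminator.
/// (Hypothetical helper name, used only for this sketch.)
fn wide_to_ne_bytes<I: IntoIterator<Item = u16>>(units: I) -> Vec<u8> {
    units
        .into_iter()
        .chain(std::iter::once(0)) // terminating NUL code unit
        .flat_map(|unit| unit.to_ne_bytes()) // each u16 becomes two bytes
        .collect()
}
```

The alignment caveat above still applies: a Vec<u8> is not guaranteed to be aligned for wchar_t, so collecting into a Vec<u16> and passing its pointer may be the safer choice where the API allows it.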

If you are trying to produce a Vec<u8>, I usually phone it in and do:
use std::path::Path;
#[cfg(unix)]
fn path_to_bytes<P: AsRef<Path>>(path: P) -> Vec<u8> {
    use std::os::unix::ffi::OsStrExt;
    path.as_ref().as_os_str().as_bytes().to_vec()
}
#[cfg(not(unix))]
fn path_to_bytes<P: AsRef<Path>>(path: P) -> Vec<u8> {
    // On Windows, could use std::os::windows::ffi::OsStrExt to encode_wide(),
    // but you end up with a Vec<u16> instead of a Vec<u8>, so that doesn't
    // really help.
    path.as_ref().to_string_lossy().into_owned().into_bytes()
}
I do this knowing full well that non-UTF-8 paths on non-Unix platforms will not be handled correctly. Note that you might need a Vec<u8> if you are working with Thrift or protocol buffers rather than a C API.

How to convert the PathBuf to String

I have to convert the PathBuf variable to a String to feed my function. My code is like this:
let cwd = env::current_dir().unwrap();
let my_str: String = cwd.as_os_str().to_str().unwrap().to_string();
println!("{:?}", my_str);
It works, but it is awful with the cwd.as_os_str….
Is there a more convenient method, or do you have any suggestions on how to handle it?

As mcarton has already said, it is not so simple, because not all paths are UTF-8 encoded. But you can use:
p.into_os_string().into_string()
For fine-grained control, use ? to propagate the error to the caller, or simply ignore the possibility of failure by calling unwrap():
let my_str = cwd.into_os_string().into_string().unwrap();
A nice thing about into_string() is that the error wraps the original OsString value.

It is not easy on purpose: Strings are UTF-8 encoded, but a PathBuf might not be (e.g. on Windows). So the conversion might fail.
There are also to_str and to_string_lossy methods for convenience. The former returns an Option<&str> to indicate possible failure, and the latter will always succeed but will replace invalid sequences with U+FFFD REPLACEMENT CHARACTER (which is why it returns Cow<str>: if the path is already valid UTF-8, it will return a reference to the inner buffer, but if some characters have to be replaced, it will allocate a new String for that; in both cases you can then use into_owned if you really need a String).
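The three conversions discussed in these answers can be put side by side. This is a minimal sketch; the wrapper function names are made up for illustration, while the methods they call (to_str, to_string_lossy, into_os_string, into_string) are the standard library ones named above.

```rust
use std::ffi::OsString;
use std::path::{Path, PathBuf};

/// Strict and borrowing: None if the path is not valid UTF-8.
fn path_to_string_strict(path: &Path) -> Option<String> {
    path.to_str().map(str::to_owned)
}

/// Lossy: always succeeds; invalid sequences become U+FFFD.
fn path_to_string_lossy(path: &Path) -> String {
    // into_owned() allocates only if the Cow is still borrowed.
    path.to_string_lossy().into_owned()
}

/// Strict and consuming: on failure the original OsString is handed back.
fn pathbuf_into_string(path: PathBuf) -> Result<String, OsString> {
    path.into_os_string().into_string()
}
```

Which one fits depends on whether the caller can handle failure: the strict variants make a non-UTF-8 path visible as None or Err, while the lossy one silently alters it, which is fine for display but not for reopening the file.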

One way to convert PathBuf to String would be:
your_path.as_path().display().to_string();

As @mcarton mentioned, to_string_lossy() should do the job.
let cwd = env::current_dir().unwrap();
let path: String = String::from(cwd.to_string_lossy());
rustc 1.56.1 (59eed8a2a 2021-11-01)
I am a (learning) Rust fan (after years as a C/C++ programmer), but man, when such a simple thing is made so complicated, with UTF-8 and Cow, people get lost in translation.

How to copy wide string in VC++ 2003

Hi, could you please help us copy one wchar_t* to another wchar_t*?
I'm using the following code:
wchar_t* str1=L"sreeni";
wchar_t* str2;
wcscpy(str2,str1);
The last line gives a run-time error, because I'm trying to copy the value of str1 without having allocated memory for str2.
Is there a function like wcscpy that copies one wchar_t pointer to another?
I don't want to use wide character arrays, i.e. there shouldn't be any restriction on the size of the string. I want to copy the complete content of str1 into str2.

I assume you mean you want to duplicate the string, rather than just the pointer.
wcsdup() is the wide-character equivalent of strdup(): it allocates the memory for the duplicated string, which you later release with free().

Detect partial or incomplete characters read from a buffer

In a loop I am reading a stream, that is encoded as UTF-8, 10 bytes (say) in every loop. As the stream is being passed to a buffer first, I must specify its read length in bytes before transforming it to a UTF-8 string. The issue that I am facing is that sometimes it will read partial, incomplete characters. I need to fix this.
Is there a way to detect if a string ends with an incomplete character, or some check that I can perform on the last character of my string to determine this?
Preferably a “non single-encoding” solution would be the best.

If a buffer ends with an incomplete character and you convert it into a string and then initialize a new buffer from that string, the new buffer will be a different length (longer if you're using utf8, shorter if you're using ucs2) than the original.
Something like:
var b2 = Buffer.from(buf.toString('utf8'), 'utf8');
if (b2.length !== buf.length) {
  // buffer has an incomplete character
} else {
  // buffer is OK
}
Substitute your desired encoding for 'utf8'.
Note that this is dependent on how the current implementation of Buffer#toString deals with incomplete characters, which isn't documented, though it's unlikely to be changed in a way that would result in equal-length buffers (a future implementation might throw an error instead, so you should probably wrap the code in a try-catch block).
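For comparison, here is how the same check might look in Rust, where the behavior is documented: std::str::from_utf8 reports an error whose error_len() is None exactly when the input ends in the middle of a character. This is a sketch for UTF-8 only, so it is not the encoding-independent solution the question asks for, and the function name split_incomplete_tail is hypothetical.

```rust
use std::str::{self, Utf8Error};

/// Splits a chunk into its valid UTF-8 prefix and the bytes of a trailing
/// incomplete character (empty if the chunk ends on a character boundary).
/// Returns Err if the chunk contains bytes that are invalid outright.
/// (Hypothetical helper name, used only for this sketch.)
fn split_incomplete_tail(buf: &[u8]) -> Result<(&str, &[u8]), Utf8Error> {
    match str::from_utf8(buf) {
        Ok(text) => Ok((text, &buf[buf.len()..])),
        // error_len() == None means the input ended unexpectedly,
        // i.e. the last character is incomplete rather than invalid.
        Err(e) if e.error_len().is_none() => {
            let (valid, rest) = buf.split_at(e.valid_up_to());
            let text = str::from_utf8(valid).expect("prefix was validated");
            Ok((text, rest))
        }
        Err(e) => Err(e),
    }
}
```

In a read loop, the returned tail would be kept and prepended to the next chunk before decoding it.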
