How to convert an ASCII char to int like C/C++ in Rust

In C/C++, an ASCII char can be converted to a number:
int('a')
But how do I do this in Rust?

You can convert a character to u32 with a cast: let code = 'a' as u32;. This gives you the Unicode scalar value of the character, which for ASCII characters is the same as the ASCII code.
fn main() {
    let s = "0123";
    for c in s.chars() {
        println!("{c} -> {}", c as u32);
    }
}
If you strictly need to work with ASCII rather than Unicode, you can also look at the ascii crate.
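For simple cases the standard library may be enough without an extra crate. The following is a minimal sketch; the helper names ascii_code and digit_value are hypothetical and only wrap the standard char::is_ascii and char::to_digit methods.

```rust
/// Returns the ASCII code of `c`, or `None` if `c` is outside the ASCII range.
/// (Hypothetical helper name, used here for illustration only.)
fn ascii_code(c: char) -> Option<u8> {
    if c.is_ascii() {
        // Safe to narrow: every ASCII character fits in 7 bits.
        Some(c as u8)
    } else {
        None
    }
}

/// Returns the numeric value of a decimal digit, similar to `c - '0'` in C.
fn digit_value(c: char) -> Option<u32> {
    c.to_digit(10)
}
```

With these, ascii_code('a') yields Some(97) while a non-ASCII character such as 'é' yields None. Byte literals like b'a' give the u8 value directly, with no cast.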

Related

Is there a function like char::escape_default for u8 bytes?

The Rust standard library has a char::escape_default function which will print the literal character if it's printable, or a sensible escape sequence (\n, \u{XXXX}, etc.) if not.
Is there an equivalent for bytes? Specifically, I would like it to return the literal byte if it's printable, or return a byte escape sequence (\xNN) if not.
The standard library has a std::ascii::escape_default function that satisfies this use case:
fn main() {
    let x = String::from_utf8(
        "The compiler said “you have an error!”."
            .bytes()
            .flat_map(|b| std::ascii::escape_default(b))
            .collect::<Vec<u8>>(),
    )
    .unwrap();
    println!("{}", x);
}
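The same idea can be wrapped in a small function that accepts arbitrary bytes, including ones that are not valid UTF-8. This is only a sketch; escape_bytes is a hypothetical name. It relies on the fact that std::ascii::escape_default only ever yields ASCII bytes, so each one can be converted to a char directly.

```rust
use std::ascii;

/// Escapes every byte with `std::ascii::escape_default` and collects the
/// result into a `String`. (Hypothetical helper, for illustration only.)
fn escape_bytes(bytes: &[u8]) -> String {
    bytes
        .iter()
        // Each input byte expands to one or more ASCII bytes.
        .flat_map(|&b| ascii::escape_default(b))
        // The escaped output is always ASCII, so u8 -> char is lossless.
        .map(char::from)
        .collect()
}
```

Printable bytes pass through unchanged, a newline becomes the two characters \n, and a byte such as 0xFF becomes \xff.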

Declare char for comparison

As part of Advent of Code 2020 day 3, I'm trying to compare characters in a string to a specific character.
fn main() {
    let str_foo =
"...##..#
##.#..#.";
    for char in str_foo.chars() {
        println!("{}", char == "#");
    }
}
The error I get is expected char, found &str.
I'm struggling to find a clean way to cast either the left or right side of the equality check so that they can be compared.
Use single quotes for character literals. Fixed:
fn main() {
    let str_foo =
"...##..#
##.#..#.";
    for char in str_foo.chars() {
        println!("{}", char == '#');
    }
}
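Once both sides of the comparison are char values, the usual iterator adapters can be applied. The sketch below uses two hypothetical helpers, count_char and is_tree; is_tree assumes the grid contains only ASCII characters, so that a byte index equals a column index.

```rust
/// Counts how many times `target` occurs in `grid`.
/// (Hypothetical helper, for illustration only.)
fn count_char(grid: &str, target: char) -> usize {
    grid.chars().filter(|&c| c == target).count()
}

/// Returns true if the cell at `column` in `row` is a '#'.
/// Assumption: `row` is ASCII-only, so byte offsets match columns.
fn is_tree(row: &str, column: usize) -> bool {
    // `get` returns None for an out-of-range column instead of panicking.
    row.as_bytes().get(column) == Some(&b'#')
}
```

Here a char literal ('#') is compared with a char, and a byte literal (b'#') with a u8; mixing either of them with a &str literal ("#") reproduces the original type error.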

Provide `char **` argument to C function from Rust?

I've got a C function that (simplified) looks like this:
static char buffer[13];
void get_string(const char **s) {
    sprintf(buffer, "Hello World!");
    *s = buffer;
}
I've declared it in Rust:
extern pub fn get_string(s: *mut *const c_char);
But I can't figure out the required incantation to call it, and convert the result to a Rust string. Everything I've tried either fails to compile, or causes a SEGV.
Any pointers?
First of all, char in Rust is not the equivalent to a char in C:
The char type represents a single character. More specifically, since 'character' isn't a well-defined concept in Unicode, char is a 'Unicode scalar value', which is similar to, but not the same as, a 'Unicode code point'.
In Rust, the matching type is either u8 or i8, depending on the platform. You can use std::os::raw::c_char, which always resolves to the right one:
Equivalent to C's char type.
C's char type is completely unlike Rust's char type; while Rust's type represents a unicode scalar value, C's char type is just an ordinary integer. This type will always be either i8 or u8, as the type is defined as being one byte long.
C chars are most commonly used to make C strings. Unlike Rust, where the length of a string is included alongside the string, C strings mark the end of a string with the character '\0'. See CStr for more information.
First, we need a variable, which can be passed to the function:
let mut ptr: *const c_char = std::ptr::null();
To pass it as *mut you simply can use a reference:
get_string(&mut ptr);
Now use the *const c_char for creating a CStr:
let c_str = CStr::from_ptr(ptr);
For converting it to a String you can choose:
c_str.to_string_lossy().to_string()
or
c_str.to_str().unwrap().to_string()
However, you shouldn't use String if you don't really need to. In most scenarios, a Cow<str> fulfills the needs. It can be obtained with c_str.to_string_lossy():
If the contents of the CStr are valid UTF-8 data, this function will return a Cow::Borrowed(&str) with the corresponding &str slice. Otherwise, it will replace any invalid UTF-8 sequences with U+FFFD REPLACEMENT CHARACTER and return a Cow::Owned(String) with the result.
Combine "Passing a Rust variable to a C function that expects to be able to modify it":
unsafe {
    let mut c_buf = std::ptr::null();
    get_string(&mut c_buf);
}
with "How do I convert a C string into a Rust string and back via FFI?":
extern crate libc;
use libc::c_char;
use std::ffi::CStr;
use std::str;
extern "C" {
    fn get_string(s: *mut *const c_char);
}
fn main() {
    unsafe {
        let mut c_buf = std::ptr::null();
        get_string(&mut c_buf);
        let c_str = CStr::from_ptr(c_buf);
        let str_slice: &str = c_str.to_str().unwrap();
        let str_buf: String = str_slice.to_owned(); // if necessary
    };
}
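To try the calling pattern without linking a C library, the C side can be stood in for by a Rust function with the same signature. This is a sketch under that assumption: BUFFER and the Rust definition of get_string are stand-ins for the C code, and fetch_string is a hypothetical wrapper that also adds a null check.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

// Stand-in for the C static buffer (assumption: real code links the C library).
static BUFFER: [u8; 13] = *b"Hello World!\0";

/// Stand-in for the C function: writes the buffer's address through `s`.
/// The caller must pass a valid, writable pointer.
unsafe extern "C" fn get_string(s: *mut *const c_char) {
    unsafe {
        *s = BUFFER.as_ptr() as *const c_char;
    }
}

/// Hypothetical safe wrapper: calls `get_string` and copies the result
/// into an owned `String`, returning `None` if the pointer stays null.
fn fetch_string() -> Option<String> {
    let mut ptr: *const c_char = std::ptr::null();
    unsafe {
        // `&mut ptr` coerces to `*mut *const c_char`.
        get_string(&mut ptr);
        if ptr.is_null() {
            return None;
        }
        // The buffer is NUL-terminated and lives for the whole program.
        Some(CStr::from_ptr(ptr).to_string_lossy().into_owned())
    }
}
```

Starting from a null pointer and checking it afterwards means a C function that fails to write its output is reported as None instead of being dereferenced.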

How to get the last character of a &str?

In Python, this would be final_char = mystring[-1]. How can I do the same in Rust?
I have tried
mystring[mystring.len() - 1]
but I get the error the type 'str' cannot be indexed by 'usize'
That is how you get the last char (which may not be what you think of as a "character"):
mystring.chars().last().unwrap();
Use unwrap only if you are sure that there is at least one char in your string.
Warning: about the general case (doing the same thing as mystring[-n] in Python): Rust strings are UTF-8 encoded and cannot be indexed by character position, because finding the n-th char is not an O(1) operation (a string in Rust is not an array of chars).
However, if you want to index from the end like in Python, you must do this in Rust:
mystring.chars().rev().nth(n - 1) // Python: mystring[-n]
and check if there is such a character.
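The check can be folded into the return type by returning an Option instead of unwrapping. The following is a minimal sketch with a hypothetical function name, char_from_end, using Python's convention that n = 1 is the last character.

```rust
/// Returns the n-th `char` counted from the end (n = 1 is the last one),
/// mirroring Python's `s[-n]`. Returns `None` if n is 0 or too large.
/// (Hypothetical helper, for illustration only.)
fn char_from_end(s: &str, n: usize) -> Option<char> {
    // `checked_sub` turns n == 0 into None instead of underflowing.
    let index = n.checked_sub(1)?;
    s.chars().rev().nth(index)
}
```

Because it walks chars rather than bytes, it also works for strings with multi-byte characters, at the cost of an O(n) scan from the end.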
If you miss the simplicity of Python syntax, you can write your own extension:
trait StrExt {
    fn from_end(&self, n: usize) -> char;
}
impl<'a> StrExt for &'a str {
    fn from_end(&self, n: usize) -> char {
        self.chars().rev().nth(n).expect("Index out of range in 'from_end'")
    }
}
fn main() {
    println!("{}", "foobar".from_end(2)) // prints 'b'
}
One option is to use slices. Here's an example:
let len = my_str.len();
let final_str = &my_str[len-1..];
This returns a string slice from position len-1 through the end of the string. That is to say, the last byte of your string. If your string consists of only ASCII values, then you'll get the final character of your string.
This only works with ASCII values because each of them occupies exactly one byte. If the last character is encoded with more than one byte, position len-1 falls in the middle of that character, and Rust panics at runtime because a string slice must start and end on character boundaries.
For a more detailed explanation, please see the strings section of the Rust book.
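A slice of the last character can still be taken safely by asking the string where that character starts. This is a sketch with a hypothetical function name, last_char_slice, built on the standard char_indices iterator.

```rust
/// Returns the last character of `s` as a string slice, or `None` if `s`
/// is empty. (Hypothetical helper, for illustration only.)
fn last_char_slice(s: &str) -> Option<&str> {
    s.char_indices()
        // The final (byte offset, char) pair marks where the last char starts.
        .next_back()
        // Slicing at a char's starting offset is always on a boundary.
        .map(|(start, _)| &s[start..])
}
```

If a fixed byte offset must be used anyway, str::get returns None rather than panicking when the range does not fall on character boundaries.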
As @Boiethios mentioned:
let last_ch = mystring.chars().last().unwrap();
Or
let last_ch = codes.chars().rev().nth(0).unwrap();
I would rather have (how hard is that!?):
let last_ch = codes.chars(-1); // Not implemented as of rustc 1.56.1

Parse a string containing a Unicode number into the corresponding Unicode character?

Is there a function to do something like this:
fn string_to_unicode_char(s: &str) -> Option<char> {
    // ...
}
fn main() {
    let s = r"\u{00AA}"; // note the raw string literal!
    string_to_unicode_char(s).unwrap();
}
Note that r"\u{00AA}" is a raw string, i.e. it isn't a Unicode escape sequence but 8 separate symbols: \ u { 0 0 A A }.
I need to interpret/convert/parse this string and return a char if all is good, None otherwise. I don't have any experience with Unicode, so any ideas are welcome.
I believe the function you are looking for is char::from_u32:
fn string_to_unicode_char(s: &str) -> Option<char> {
    // Do something more appropriate to find the actual number
    let number = &s[3..7];
    u32::from_str_radix(number, 16)
        .ok()
        .and_then(std::char::from_u32)
}
fn main() {
    let s = r"\u{00AA}"; // note the raw string literal!
    let ch = string_to_unicode_char(s);
    assert_eq!(ch, Some('\u{00AA}'));
}
I indeed completely misunderstood your question; my old answer can be seen in the edit logs
Is there a builtin function to parse a string containing a Rust unicode escape into the corresponding unicode character?
AFAIK, no, there is not a builtin function to do that.
The answer to "how to do it yourself" is a bit broad, as there are many ways to do it (and it's not clear whether you also want to parse standard escapes, such as "\n").
Use a regex
Do simple, naive manual parsing
Embed it into a bigger lexer (the function in the Rust compiler parsing such unicode escapes)
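As an illustration of the "simple, naive manual parsing" option, here is a sketch that accepts only the exact form \u{H...} with one to six hex digits. The name parse_unicode_escape is hypothetical, and the function deliberately ignores other escapes such as \n and the underscore separators that Rust source code allows.

```rust
/// Parses a string of the exact form `\u{H...}` (1 to 6 hex digits) into
/// the corresponding `char`. (Hypothetical helper, for illustration only.)
fn parse_unicode_escape(s: &str) -> Option<char> {
    // Strip the literal `\u{` prefix and the closing `}`.
    let hex = s.strip_prefix("\\u{")?.strip_suffix('}')?;

    // Reject empty or overlong input, and anything that is not a hex digit
    // (`from_str_radix` alone would also accept a leading '+').
    if hex.is_empty() || hex.len() > 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }

    // `char::from_u32` rejects surrogates and values above U+10FFFF.
    u32::from_str_radix(hex, 16).ok().and_then(char::from_u32)
}
```

Unlike slicing with fixed offsets, this returns None for malformed input instead of panicking, and it handles escapes with fewer or more than four digits.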
