Does Rust have an equivalent to Python's unichr() function? - rust

Python has the unichr() (or chr() in Python 3) function that takes an integer and returns a character with the Unicode code point of that number. Does Rust have an equivalent function?

Sure, though it is a built-in operator as:
let c: char = 97 as char;
println!("{}", c); // prints "a"
Note that the as operator only works for u8 values; casting any other integer type to char causes a compilation error:
let c: char = 97u32 as char; // error: only `u8` can be cast as `char`, not `u32`
If you need a string (to fully emulate the Python function), use to_string():
let s: String = (97 as char).to_string();
There is also the char::from_u32 function:
use std::char;
let c: Option<char> = char::from_u32(97);
It returns an Option<char> because not every number is a valid Unicode code point: the only valid values are 0x0000 to 0xD7FF and 0xE000 to 0x10FFFF. This function accepts a larger set of values than as char and can convert numbers larger than one byte, giving you access to the whole range of Unicode code points.
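To tie these pieces together, here is a minimal sketch of a Python-style chr() built on char::from_u32. The helper names unichr and byte_to_char are hypothetical (they are not part of the standard library); std::char::from_u32 and char::from(u8) are the only standard calls used.

```rust
/// Hypothetical helper that mimics Python's chr()/unichr().
/// Returns None instead of raising an error when `code` is not a valid
/// Unicode scalar value (a surrogate, or anything above 0x10FFFF).
fn unichr(code: u32) -> Option<String> {
    std::char::from_u32(code).map(|c| c.to_string())
}

/// Hypothetical helper: every u8 maps to a char, so this cannot fail.
fn byte_to_char(byte: u8) -> char {
    char::from(byte)
}
```

With these definitions, unichr(97) yields Some("a"), while unichr(0xD800) (a surrogate) and unichr(0x110000) (out of range) both yield None.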
I've compiled a set of examples on the Playground.

Related

What is suffix annotation in rust?

I was reading the docs and stumbled on this line:
let an_integer = 5i32; // Suffix annotation
What does this mean? I am assuming that it has 5 as its value and i32 as its integer type. Is that correct?
Yes, that's correct. When you write the literal 5 in a program, it could be interpreted as a variety of types. (A literal is a value, such as 5, which is written directly into the source code instead of being computed.) If we want to express that a literal is of a certain type, we can append ("suffix") the type to it to make it explicit, as in 5i32.
This is only done with certain built-in types, such as integers and floating-point numbers, but it can come in handy in some cases. For example, the following is not valid:
fn main() {
    println!("{}", 1 << 32);
}
That's because if you specify no type at all for an integer, it defaults to i32. Since it's not valid to shift a 32-bit integer by 32 bits, Rust produces an error.
However, we can write this and it will work:
fn main() {
    println!("{}", 1u64 << 32);
}
That's because now the integer is a u64 and it's in range.
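As a small sketch of the same point, the function below compares shifting with an explicit suffix against checked_shl, which reports an out-of-range shift as None instead of failing to compile. The function name shift_examples is made up for illustration.

```rust
/// Hypothetical demo: returns (1u64 << 32, 1i32 shifted by 32, 1u64 shifted by 32).
fn shift_examples() -> (u64, Option<i32>, Option<u64>) {
    // The u64 suffix makes the literal 64 bits wide, so shifting by 32 is in range.
    let wide = 1u64 << 32;
    // checked_shl returns None when the shift amount is >= the bit width (32 for i32).
    let too_far = 1i32.checked_shl(32);
    // The same shift is fine for a 64-bit integer.
    let in_range = 1u64.checked_shl(32);
    (wide, too_far, in_range)
}
```

The suffix is one way to pick the type; an annotation such as let x: u64 = 1; has the same effect on the literal.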

Why am I getting an error when unwrapping the Option returned by .to_digit called on a char?

I am trying to convert a single char variable into its ASCII value, but I keep getting an error.
Based on hansaplast's answer in the post "Parsing a char to u32", I thought this code should work:
let char_variable = 'a';
let should_be_u32_variable = char_variable.to_digit(10).unwrap();
But this code always panics with this error:
thread 'main' panicked at 'called Option::unwrap() on a None value'
For this code example (provided in hansaplast's answer):
let a = "29";
for c in a.chars() {
    println!("{:?}", c.to_digit(10));
}
Here the .to_digit() method works.
In both cases I am calling .to_digit(10) on values of type char, but my example panics while hansaplast's code works. Can someone explain the difference between these examples and what I am doing wrong? I am quite confused.
Both examples can be found here: Rust playground example
Would casting be OK in this case?
let c = 'a';
let u = c as u32 - 48;
If not, what is the recommended way of doing this?
Okay, I think you are confusing type casting with integer parsing.
to_digit is an integer parsing method. It takes a character and, given a radix, determines the value of that digit in that base. So 5 in base 10 is 5 and is stored as 00000101, while the numeral 11 in base 15 is 16 and is stored in memory as 00010000. A character that is not a digit in the given radix yields None, which is why 'a'.to_digit(10).unwrap() panics.
Type casting of primitives in Rust, like 'c' as u32, is probably closer to what you are after. It is distinct from integer parsing in that you don't care about the "meaning" of the character as a digit; what you care about is the value of the bits that represent it in memory. For example, the character 'c' is stored as 1100011 (99).
If you only care about ASCII characters, you should also check char::is_ascii() before doing your conversion. That way you can store your results in a u8 instead of a u32:
fn print_ascii_values_of_characters(string: &str) {
    for c in string.chars() {
        if c.is_ascii() {
            println!("{:b}", c as u8); // :b prints the binary representation
        }
    }
}
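To make the difference between parsing and casting concrete, here is a small sketch that puts the two side by side. The three function names are invented for this example; to_digit, is_ascii and the as casts are the standard operations discussed above.

```rust
/// Parsing: the value of `c` as a digit in `radix`, or None if it is not one.
fn digit_value(c: char, radix: u32) -> Option<u32> {
    c.to_digit(radix)
}

/// Casting: the Unicode code point of `c`, regardless of what it "means".
fn code_point(c: char) -> u32 {
    c as u32
}

/// Casting restricted to ASCII, so the result fits in a u8.
fn ascii_byte(c: char) -> Option<u8> {
    if c.is_ascii() {
        Some(c as u8)
    } else {
        None
    }
}
```

With these, digit_value('a', 10) is None (hence the panic on unwrap), digit_value('a', 16) is Some(10), and code_point('a') is 97. The c as u32 - 48 trick from the question only gives a meaningful digit when c is between '0' and '9'.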

How can I define a generic function that can return a given integer type?

I'd like to define a function that can return a number whose type is specified when the function is called. The function takes a buffer (Vec<u8>) and returns numeric value, e.g.
let byte = buf_to_num::<u8>(&buf);
let integer = buf_to_num::<u32>(&buf);
The buffer contains an ASCII string that represents a number, e.g. b"827", where each byte is the ASCII code of a digit.
This is my non-working code:
extern crate num;
use num::Integer;
use std::ops::{MulAssign, AddAssign};
fn buf_to_num<T: Integer + MulAssign + AddAssign>(buf: &Vec::<u8>) -> T {
    let mut result : T;
    for byte in buf {
        result *= 10;
        result += (byte - b'0');
    }
    result
}
I get mismatched type errors for both the addition and the multiplication lines (expected type T, found u32). So I guess my problem is how to tell the type system that T can be expressed in terms of a literal 10 or in terms of the result of (byte - b'0')?
Welcome to the joys of having to specify every single operation you're using in a generic function. It's a pain, but it is worth it.
You have two problems:
result *= 10; has no corresponding From<_> definition. When you write 10, the compiler has no way of knowing what 10 means as a T: it only knows the primitive types and any conversions you have defined by implementing From<_> traits.
You're mixing up two operations: converting a buffer of ASCII bytes into digits, and accumulating those digits into a number.
We need to make two changes to fix this:
We require From<u32> for T, so that u32 values (the constant 10 and each parsed digit) can be converted into T. Note that this excludes types narrower than u32, such as u8, which do not implement From<u32>.
We also clarify your logic by converting each u8 to a char so we can use to_digit() to turn it into a u32, before using From<u32> to get a T.
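use std::ops::{MulAssign, AddAssign};
fn parse_to_i<T: From<u32> + MulAssign + AddAssign>(buf: &[u8]) -> T {
    // Start the accumulator at zero, converted from u32 into T.
    let mut buffer: T = 0u32.into();
    for o in buf {
        // Shift the accumulated value one decimal place to the left.
        buffer *= 10.into();
        // Bytes that are not decimal digits count as 0 because of unwrap_or(0).
        buffer += (*o as char).to_digit(10).unwrap_or(0).into();
    }
    buffer
}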
You can convince yourself of its behavior on the playground.
The multiplication is resolved by converting the constant 10 with .into(), which makes it benefit from our requirement of From<u32> for T and lets the Rust compiler know we're not doing silly stuff.
The final change is to initialize the accumulator (result in your code, buffer here) to 0.
Let me know if this makes sense to you (or if it doesn't), and I'll be glad to elaborate further if there is a problem :-)
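If silently treating non-digit bytes as 0 is not what you want, one possible variation is to return an Option and bail out on the first invalid byte. This is only a sketch under the same From<u32> assumption as above; the name parse_checked is made up, and overflow is still not handled (it panics in debug builds if the number does not fit in T).

```rust
use std::ops::{AddAssign, MulAssign};

/// Hypothetical variant of the generic parser: returns None as soon as a byte
/// is not an ASCII decimal digit. Overflow of T is not checked.
fn parse_checked<T>(buf: &[u8]) -> Option<T>
where
    T: From<u32> + MulAssign + AddAssign,
{
    let mut acc = T::from(0u32);
    for &byte in buf {
        // `?` propagates None when the byte is not a digit in base 10.
        let digit = (byte as char).to_digit(10)?;
        acc *= T::from(10u32);
        acc += T::from(digit);
    }
    Some(acc)
}
```

Called as parse_checked::<u32>(b"827") this gives Some(827), while parse_checked::<u64>(b"8x7") gives None. An empty buffer yields Some(0), which may or may not be what you want.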

How to get the last character of a &str?

In Python, this would be final_char = mystring[-1]. How can I do the same in Rust?
I have tried
mystring[mystring.len() - 1]
but I get the error the type 'str' cannot be indexed by 'usize'
That is how you get the last char (which may not be what you think of as a "character"):
mystring.chars().last().unwrap();
Use unwrap only if you are sure that there is at least one char in your string.
Warning: about the general case (doing the same thing as mystring[-n] in Python): UTF-8 strings are not meant to be indexed, because indexing is not an O(1) operation (a string in Rust is not an array of characters). Please read this for more information.
However, if you want to index from the end like in Python, you must do this in Rust:
mystring.chars().rev().nth(n - 1) // Python: mystring[-n]
and check if there is such a character.
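As a sketch of what that check could look like, the helper below wraps the expression in a function that returns an Option. The name char_from_end is hypothetical, and n is 1-based to match Python's mystring[-n].

```rust
/// Hypothetical helper: the n-th char from the end (1-based, like Python's s[-n]).
/// Returns None when n is 0 or the string has fewer than n chars.
fn char_from_end(s: &str, n: usize) -> Option<char> {
    // checked_sub avoids the underflow that `n - 1` would cause for n == 0.
    let index = n.checked_sub(1)?;
    s.chars().rev().nth(index)
}
```

The caller can then decide what to do with None, for example with match or unwrap_or.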
If you miss the simplicity of Python syntax, you can write your own extension:
trait StrExt {
    fn from_end(&self, n: usize) -> char;
}
impl<'a> StrExt for &'a str {
    fn from_end(&self, n: usize) -> char {
        self.chars().rev().nth(n).expect("Index out of range in 'from_end'")
    }
}
fn main() {
    println!("{}", "foobar".from_end(2)) // prints 'b'
}
One option is to use slices. Here's an example:
let len = my_str.len();
let final_str = &my_str[len-1..];
This returns a string slice from position len-1 through the end of the string, that is, the last byte of your string. If your string consists of only ASCII values, you'll get the final character of your string.
This only works with ASCII values because each of them takes exactly one byte of storage. With anything else, Rust will panic at runtime if len-1 falls in the middle of a multi-byte character, which is what happens when you try to slice one byte out of a 2-byte character.
For a more detailed explanation, please see the strings section of the Rust book.
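If you want a slice rather than a char but cannot assume ASCII, one possible approach is to take the slice start from char_indices, which only ever yields offsets that lie on a character boundary. The helper name last_char_slice is made up for this sketch.

```rust
/// Hypothetical helper: the last char of `s` as a &str slice, or None if `s` is empty.
fn last_char_slice(s: &str) -> Option<&str> {
    // char_indices yields (byte offset, char) pairs, so the offset of the
    // last pair is always a valid place to start a slice.
    s.char_indices().last().map(|(start, _)| &s[start..])
}
```

For "foobar" this returns Some("r"). For "caf\u{e9}" it returns the two-byte slice holding the final character, whereas slicing at len-1 would panic because that offset is not a char boundary (is_char_boundary reports false for it).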
As @Boiethios mentioned
let last_ch = mystring.chars().last().unwrap();
Or
let last_ch = codes.chars().rev().nth(0).unwrap();
I would rather have (how hard is that!?)
let last_ch = codes.chars(-1); // Not implemented as of rustc 1.56.1

Does Rust's String have a method that returns the number of characters rather than the number of bytes?

Based on the Rust book, the String::len method returns the number of bytes composing the string, which may not correspond to the length in characters.
For example if we consider the following string in Japanese, len() would return 30, which is the number of bytes and not the number of characters, which would be 10:
let s = String::from("ラウトは難しいです！");
s.len() // returns 30.
The only way I have found to get the number of characters is using the following function:
s.chars().count()
which returns 10, and is the correct number of characters.
Is there any method on String that returns the characters count, aside from the one I am using above?
No. Using s.chars().count() is correct. Note that this is an O(N) operation (UTF-8 is a variable-width encoding, so the string has to be scanned), while getting the number of bytes is an O(1) operation.
You can see all the methods on str for yourself.
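To see the two counts next to each other, here is a tiny sketch; the function name byte_and_char_counts is invented for illustration and simply pairs len() with chars().count().

```rust
/// Hypothetical helper: returns (number of bytes, number of chars) for `s`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    // len() is O(1); chars().count() walks the whole string.
    (s.len(), s.chars().count())
}
```

For pure ASCII input such as "hello" both numbers are 5, while "ラウト" gives 9 bytes but 3 chars because each of those characters takes three bytes in UTF-8.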
As pointed out in the comments, a char is a specific concept:
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
One such example is with precomposed characters:
fn main() {
    println!("{}", "e\u{301}".chars().count()); // 2: 'e' followed by a combining acute accent
    println!("{}", "\u{e9}".chars().count()); // 1: precomposed 'é'
}
You may prefer to use graphemes from the unicode-segmentation crate instead:
use unicode_segmentation::UnicodeSegmentation; // 1.6.0
fn main() {
    println!("{}", "e\u{301}".graphemes(true).count()); // 1
    println!("{}", "\u{e9}".graphemes(true).count()); // 1
}
