How to create a &str from a single character? [duplicate] - rust

This question already has answers here:
Converting a char to &str
(3 answers)
Closed 1 year ago.
I can't believe I'm asking this frankly, but how do I create a &str (or a String) when I have a single character?

The first thing to try for simple conversions is into().
It works for String because String implements From<char>.
let c: char = 'π';
let s: String = c.into();
You can't build a &str directly from a char. A &str is a reference type. The easiest solution is to build it from a string:
let s: &str = &s;
An alternative for most kinds of values is the format macro:
let s = format!("{}", c);

If just need to use the &str locally and you want to avoid heap allocation, you can use char method encode_utf8:
fn main() {
let c = 'n';
let mut tmp = [0; 1];
let foo = c.encode_utf8(&mut tmp);
println!("str: {}", foo);
}
or
fn main() {
let tmp = [b'n'; 1];
let foo = std::str::from_utf8(&tmp).unwrap();
println!("str: {}", foo);
}
To work with every char you need to use a u8 array of length 4 [0; 4]. In utf8, ascii chars can be represented as a single byte, but all other characters require more bytes with maximum of 4.
This is a simplified example based on an answer from a very similar question:
Converting a char to &str

Related

Difference between double quotes and single quotes in Rust

I was doing the adventofcode of 2020 day 3 in Rust to train a little bit because I am new to Rust and I my code would not compile depending if I used single quotes or double quotes on my "tree" variable
the first code snippet would not compile and throw the error: expected u8, found &[u8; 1]
use std::fs;
fn main() {
let text: String = fs::read_to_string("./data/text").unwrap();
let vec: Vec<&str> = text.lines().collect();
let vec_vertical_len = vec.len();
let vec_horizontal_len = vec[0].len();
let mut i_pointer: usize = 0;
let mut j_pointer: usize = 0;
let mut tree_counter: usize = 0;
let tree = b"#";
loop {
i_pointer += 3;
j_pointer += 1;
if j_pointer >= vec_vertical_len {
break;
}
let i_index = i_pointer % vec_horizontal_len;
let character = vec[j_pointer].as_bytes()[i_index];
if character == tree {
tree_counter += 1
}
}
println!("{}", tree_counter);
}
the second snippet compiles and gives the right answer..
use std::fs;
fn main() {
let text: String = fs::read_to_string("./data/text").unwrap();
let vec: Vec<&str> = text.lines().collect();
let vec_vertical_len = vec.len();
let vec_horizontal_len = vec[0].len();
let mut i_pointer: usize = 0;
let mut j_pointer: usize = 0;
let mut tree_counter: usize = 0;
let tree = b'#';
loop {
i_pointer += 3;
j_pointer += 1;
if j_pointer >= vec_vertical_len {
break;
}
let i_index = i_pointer % vec_horizontal_len;
let character = vec[j_pointer].as_bytes()[i_index];
if character == tree {
tree_counter += 1
}
}
println!("{}", tree_counter);
}
I did not find any reference explaining what is going on when using single or double quotes..can someone help me?
The short answer is it works similarly to java. Single quotes for characters and double quotes for strings.
let a: char = 'k';
let b: &'static str = "k";
The b'' or b"" prefix means take what I have here and interpret as byte literals instead.
let a: u8 = b'k';
let b: &'static [u8; 1] = b"k";
The reason strings result in references is due to how they are stored in the compiled binary. It would be too inefficient to store a string constant inside each method, so strings get put at the beginning of the binary in header area. When your program is being executed, you are taking a reference to the bytes in that header (hence the static lifetime).
Going further down the rabbit hole, single quotes technically hold a codepoint. This is essentially what you might think of as a character. So a Unicode character would also be considered a single codepoint even though it may be multiple bytes long. A codepoint is assumed to fit into a u32 or less so you can safely convert any char by using as u32, but not the other way around since not all u32 values will match valid codepoints. This also means b'\u{x}' is not valid since \u{x} may produce characters that will not fit within a single byte.
// U+1F600 is a unicode smiley face
let a: char = '\u{1F600}';
assert_eq!(a as u32, 0x1F600);
However, you might find it interesting to know that since Rust strings are stored as UTF-8, codepoints over 127 will occupy multiple bytes in a string despite fitting into a single byte on their own. As you may already know, UTF-8 is simply a way of converting codepoints to bytes and back again.
let foo: &'static str = "\u{1F600}";
let foo_chars: Vec<char> = foo.chars().collect();
let foo_bytes: Vec<u8> = foo.bytes().collect();
assert_eq!(foo_chars.len(), 1);
assert_eq!(foo_bytes.len(), 4);
assert_eq!(foo_chars[0] as u32, 0x1F600);
assert_eq!(foo_bytes, vec![240, 159, 152, 128]);

How to write string representation of a u32 into a String buffer without extra allocation? [duplicate]

This question already has answers here:
How to efficiently push displayable item into String? [duplicate]
(1 answer)
How can I append a formatted string to an existing String?
(1 answer)
Closed 3 years ago.
I can convert a u32 into String and append it to an existing string buffer like this
let mut a = String::new();
let b = 1_u32.to_string();
a.push_str(&b[..]);
But this involves allocation of a new string object in the heap.
How to push the string representation of a u32 without allocating a new String?
Should I implement an int-to-string function from scratch?
Use write! and its family:
use std::fmt::Write;
fn main() {
let mut foo = "answer ".to_string();
write!(&mut foo, "is {}.", 42).expect("This shouldn't fail");
println!("The {}", foo);
}
This prints The answer is 42., and does exactly one allocation (the explicit to_string).
(Permalink to the playground)

How to find the starting offset of a string slice of another string? [duplicate]

This question already has answers here:
How to get the byte offset between `&str`
(2 answers)
Closed 3 years ago.
Given a string and a slice referring to some substring, is it possible to find the starting and ending index of the slice?
I have a ParseString function which takes in a reference to a string, and tries to parse it according to some grammar:
ParseString(inp_string: &str) -> Result<(), &str>
If the parsing is fine, the result is just Ok(()), but if there's some error, it usually is in some substring, and the error instance is Err(e), where e is a slice of that substring.
When given the substring where the error occurs, I want to say something like "Error from characters x to y", where x and y are the starting and ending indices of the erroneous substring.
I don't want to encode the position of the errors directly in Err, because I'm nesting these invocations, and the offsets in the nested slice might not correspond to the some slice in the top level string.
As long as all of your string slices borrow from the same string buffer, you can calculate offsets with simple pointer arithmetic. You need the following methods:
str::as_ptr(): Returns the pointer to the start of the string slice
A way to get the difference between two pointers. Right now, the easiest way is to just cast both pointers to usize (which is always a no-op) and then subtract those. On 1.47.0+, there is a method offset_from() which is slightly nicer.
Here is working code (Playground):
fn get_range(whole_buffer: &str, part: &str) -> (usize, usize) {
let start = part.as_ptr() as usize - whole_buffer.as_ptr() as usize;
let end = start + part.len();
(start, end)
}
fn main() {
let input = "Everyone ♥ Ümläuts!";
let part1 = &input[1..7];
println!("'{}' has offset {:?}", part1, get_range(input, part1));
let part2 = &input[7..16];
println!("'{}' has offset {:?}", part2, get_range(input, part2));
}
Rust actually used to have an unstable method for doing exactly this, but it was removed due to being obsolete, which was a bit odd considering the replacement didn't remotely have the same functionality.
That said, the implementation isn't that big, so you can just add the following to your code somewhere:
pub trait SubsliceOffset {
/**
Returns the byte offset of an inner slice relative to an enclosing outer slice.
Examples
```ignore
let string = "a\nb\nc";
let lines: Vec<&str> = string.lines().collect();
assert!(string.subslice_offset_stable(lines[0]) == Some(0)); // &"a"
assert!(string.subslice_offset_stable(lines[1]) == Some(2)); // &"b"
assert!(string.subslice_offset_stable(lines[2]) == Some(4)); // &"c"
assert!(string.subslice_offset_stable("other!") == None);
```
*/
fn subslice_offset_stable(&self, inner: &Self) -> Option<usize>;
}
impl SubsliceOffset for str {
fn subslice_offset_stable(&self, inner: &str) -> Option<usize> {
let self_beg = self.as_ptr() as usize;
let inner = inner.as_ptr() as usize;
if inner < self_beg || inner > self_beg.wrapping_add(self.len()) {
None
} else {
Some(inner.wrapping_sub(self_beg))
}
}
}
You can remove the _stable suffix if you don't need to support old versions of Rust; it's just there to avoid a name conflict with the now-removed subslice_offset method.

How do I get the first character out of a string?

I want to get the first character of a std::str. The method char_at() is currently unstable, as is String::slice_chars.
I have come up with the following, but it seems excessive to get a single character and not use the rest of the vector:
let text = "hello world!";
let char_vec: Vec<char> = text.chars().collect();
let ch = char_vec[0];
UTF-8 does not define what "character" is so it depends on what you want. In this case, chars are Unicode scalar values, and so the first char of a &str is going to be between one and four bytes.
If you want just the first char, then don't collect into a Vec<char>, just use the iterator:
let text = "hello world!";
let ch = text.chars().next().unwrap();
Alternatively, you can use the iterator's nth method:
let ch = text.chars().nth(0).unwrap();
Bear in mind that elements preceding the index passed to nth will be consumed from the iterator.
I wrote a function that returns the head of a &str and the rest:
fn car_cdr(s: &str) -> (&str, &str) {
for i in 1..5 {
let r = s.get(0..i);
match r {
Some(x) => return (x, &s[i..]),
None => (),
}
}
(&s[0..0], s)
}
Use it like this:
let (first_char, remainder) = car_cdr("test");
println!("first char: {}\nremainder: {}", first_char, remainder);
The output looks like:
first char: t
remainder: est
It works fine with chars that are more than 1 byte.
Get the first single character out of a string w/o using the rest of that string:
let text = "hello world!";
let ch = text.chars().take(1).last().unwrap();
It would be nice to have something similar to Haskell's head function and tail function for such cases.
I wrote this function to act like head and tail together (doesn't match exact implementation)
pub fn head_tail<T: Iterator, O: FromIterator<<T>::Item>>(iter: &mut T) -> (Option<<T>::Item>, O) {
(iter.next(), iter.collect::<O>())
}
Usage:
// works with Vec<i32>
let mut val = vec![1, 2, 3].into_iter();
println!("{:?}", head_tail::<_, Vec<i32>>(&mut val));
// works with chars in two ways
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, String>(&mut val));
// calling the function with Vec<char>
let mut val = "thanks! bedroom builds YT".chars();
println!("{:?}", head_tail::<_, Vec<char>>(&mut val));
NOTE: The head_tail function doesn't panic! if the iterator is empty. If this matched Haskell's head/tail output, this would have thrown an exception if the iterator was empty. It might also be good to use iterable trait to be more compatible to other types.
If you only want to test for it, you can use starts_with():
"rust".starts_with('r')
"rust".starts_with(|c| c == 'r')
I think it is pretty straight forward
let text = "hello world!";
let c: char = text.chars().next().unwrap();
next() takes the next item from the iterator
To “unwrap” something in Rust is to say, “Give me the result of the computation, and if there was an error, panic and stop the program.”
The accepted answer is a bit ugly!
let text = "hello world!";
let ch = &text[0..1]; // this returns "h"

Creating a string from Vec<char> [duplicate]

This question already has answers here:
How to convert Vec<char> to a string
(2 answers)
Closed 6 years ago.
I've got a Vec<char> that I need to turn into a &str or String, but I'm unsure of the best way to do this. I've looked around and every resource I've found seems to be out-dated in some way. The answers in this question don't seem to be applicable for the newest build.
I'm using the nightly for 2015-3-19
The iterator based approach with .collect should work, after updating for language changes:
char_vector.iter().cloned().collect::<String>();
(I've chosen to replace .map(|c| *c) with .cloned() but either works.)
If your vector can be consumed, you can also use into_iter to avoid the clone
fn main() {
let char_vector = vec!['h', 'e', 'l', 'l', 'o'];
let str: String = char_vector.into_iter().collect();
println!("{}", str);
}
You can convert the Vec into a String without doing any allocations. It requires quite some unsafe code though:
#![feature(raw, unicode)]
use std::raw::Repr;
use std::slice::from_raw_parts_mut;
fn inplace_to_string(v: Vec<char>) -> String {
unsafe {
let mut i = 0;
{
let ch_v = &v[..];
let r = ch_v.repr();
let p: &mut [u8] = from_raw_parts_mut(r.data as *mut u8, r.len*4);
for ch in ch_v {
i += ch.encode_utf8(&mut p[i..i+4]).unwrap();
}
}
let p = v.as_ptr();
let cap = v.capacity()*4;
std::mem::forget(v);
let v = Vec::from_raw_parts(p as *mut u8, i, cap);
String::from_utf8_unchecked(v)
}
}
fn main() {
let char_vector = vec!['h', 'ä', 'l', 'l', 'ö'];
let str: String = char_vector.iter().cloned().collect();
let str2 = inplace_to_string(char_vector);
println!("{}", str);
println!("{}", str2);
}
PlayPen
Detailed Explanation
This creates a mutable u8 slice and a char slice simultaneously to the same buffer (breaking all Rust guarantees). Note that the u8 slice is four times as large as the char slice, since char always takes up 4 bytes.
let ch_v = &v[..];
let r = ch_v.repr();
let v: &mut [u8] = from_raw_parts_mut(r.data as *mut u8, r.len*4);
We need that to iterate over the unicode chars and replace them by their utf8 encoded counterpart. Since utf8 is always shorter or the same length as unicode, we can guarantee that we never overwrite any part we haven't read yet.
for ch in ch_v {
i += ch.encode_utf8(&mut v[i..i+4]).unwrap();
}
Since a char is always unicode and our buffer is always exactly 4 bytes (which is the maximum number of bytes a utf8 encoded unicode char will need), we can encode our chars to utf8 without checking if it worked (it will always work). The encode_utf8 function returns the length of the utf8 representation. Our index i is the location of the last written utf8 char.
Finally we need to do some cleaning up. Our vector is still of type Vec<char>. We get all the info we need (Pointer to the heap allocated array and the capacity)
let p = v.as_ptr();
let cap = v.capacity()*4;
Then we release the previous vector from all obligations like freeing memory.
std::mem::forget(v);
and finally recreate the u8 vector with correct length and capacity, and directly turn it into a String. The conversion to String does not need to be checked, as we already know the utf8 is correct, since the original Vec<char> could only contain correct unicode chars.
let v = Vec::from_raw_parts(p as *mut u8, i, cap);
String::from_utf8_unchecked(v)

Resources