How to parse 16-bit hex from a char iterator in Rust

Say I have a str s with the value foo004Frtz and I want to use an iterator to parse the string.
The goal is to end up parsing the 004F into a u16.
let mut it = s.chars();
Assuming that it is at the correct position (second o), how do I parse the u16 from the iterator?
I tried:
let x = u16::from_str_radix(hex, 16); // should yield Ok(79)
but I have a hard time constructing hex from the iterator.
EDIT:
Made the value a bit more complicated.

Thanks to @Chayim I figured out one solution:
let hex = &it.as_str()[..4];
let x = u16::from_str_radix(hex, 16).unwrap();
Not sure though how efficient as_str() is...
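For what it's worth, Chars::as_str() only returns a view of the part of the string that has not been consumed yet, so it does not copy anything. Here is a minimal sketch of the same idea that avoids the panics from [..4] and unwrap() and also moves the iterator past the digits. The helper name parse_hex_u16 is made up for illustration, and it assumes the iterator has already consumed foo:

```rust
use std::str::Chars;

/// Parses exactly four ASCII hex digits from the front of `it` into a `u16`.
/// On success the iterator is advanced past the digits; on failure it is left untouched.
/// (Hypothetical helper, not part of the standard library.)
fn parse_hex_u16(it: &mut Chars<'_>) -> Option<u16> {
    // `get` returns `None` instead of panicking if fewer than four bytes
    // remain or if byte 4 is not on a char boundary.
    let hex = it.as_str().get(..4)?;
    // `from_str_radix` also accepts a leading `+`, so check the digits explicitly.
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u16::from_str_radix(hex, 16).ok()?;
    // Four ASCII digits are four chars, so step the iterator forward by four.
    for _ in 0..4 {
        it.next();
    }
    Some(value)
}
```

With s = "foo004Frtz" and the first three chars consumed, parse_hex_u16(&mut it) gives Some(79) and it.as_str() is then "rtz".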

Related

Taking only an int from a text string in Rust

I need to take only the integer from a string like "Critical: 3\r\n". Note that the value changes every time, so I can't search for "3"; I need to search for a generic int.
Thanks.

Many ways to do it. There are already some answers. Here is one more approach:
let s = "Critical: 3\r\n";
let s_res = s.split(":").collect::<Vec<&str>>()[1].trim();
println!("s_res = {s_res:?}"); // "3"
In the above code s_res will be a string (&str). To convert that string to an integer, you can do something like this:
let n: isize = s_res.parse().expect("Failed to parse the integer!");
println!("n = {n}"); // 3
Note that, depending on your needs, you might want to add some extra validation in case the pattern can change (for example, if the number of colons might not be exactly one): indexing with [1] panics when there is no colon, and expect panics when the text after it is not an integer.
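If the input might not always contain a colon, a variant that returns an Option instead of panicking could look like this. It is only a sketch: the function name int_after_colon and the choice of i64 are assumptions, not something from the question.

```rust
/// Returns the integer that follows the first `:` in `s`, if there is one.
/// (Hypothetical helper for illustration.)
fn int_after_colon(s: &str) -> Option<i64> {
    // `split_once` yields `None` when the separator is missing, instead of
    // panicking like indexing `[1]` into a collected `Vec` would.
    let (_, rest) = s.split_once(':')?;
    // `trim` removes the leading space and the trailing "\r\n".
    rest.trim().parse().ok()
}
```

int_after_colon("Critical: 3\r\n") returns Some(3), while a string without a colon, or with non-numeric text after it, returns None.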

Building on @AlexanderKrauze's comment, the most common way to do this is with a regex (here via the regex crate), which lets you look for any pattern in a string:
use regex::{Match, Regex};

let your_text = "Critical: 3\r\n";
let re = Regex::new(r"\d+").unwrap(); // matches one or more consecutive digits
let result: Option<Match> = re.find(your_text); // returns the first match, if any
let number: u32 = result.map(|m| m.as_str().parse::<u32>().unwrap()).unwrap_or(0); // converts to an integer, defaulting to 0 when nothing matches
print!("{}", number);
To match exactly one digit, use r"\d" instead.
More documentation can be found in the regex crate's documentation.

You can use chars to get an iterator over the chars of a string, and then apply filter on that iterator to keep only the digits (is_digit). Note that this collects every digit in the string into one String, not just the first run of digits.
fn main() {
    let my_str: String = "Critical: 3\r\n".to_owned();
    let digits: String = my_str.chars().filter(|c| c.is_digit(10)).collect();
    println!("{}", digits)
}
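Where only the first consecutive run of digits is wanted (so that "x 42 y 7" gives 42 rather than 427), something along these lines should work without pulling in a crate. The name first_number and the u32 return type are assumptions made for the sketch:

```rust
/// Returns the first run of ASCII digits in `s`, parsed as a `u32`.
/// (Hypothetical helper for illustration.)
fn first_number(s: &str) -> Option<u32> {
    // Byte offset of the first ASCII digit, or `None` if there is none.
    let start = s.find(|c: char| c.is_ascii_digit())?;
    let rest = &s[start..];
    // The run ends at the first non-digit, or at the end of the string.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    // `parse` fails (giving `None`) if the run does not fit in a `u32`.
    rest[..end].parse().ok()
}
```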

Need to extract the last word in a Rust string

I am doing some processing of a string in Rust, and I need to be able to extract the last set of characters from that string. In other words, given a string like the following:
some|not|necessarily|long|name
I need to be able to get the last part of that string, namely "name" and put it into another String or a &str, in a manner like:
let last = call_some_function("some|not|necessarily|long|name");
so that last becomes equal to "name".
Is there a way to do this? Is there a string function that will allow this to be done easily? If not (after looking at the documentation, I doubt that there is), how would one do this in Rust?

While the answer from @effect is correct, it is neither the most idiomatic nor the most performant way to do it: it walks the entire string and matches every | to reach the last one. You can make it better, but there is a method of str that does exactly what you want, rsplit_once():
let (_, name) = s.rsplit_once('|').unwrap();
// Or
// let name = s.rsplit_once('|').unwrap().1;
//
// You can also use a multichar separator:
// let (_, name) = s.rsplit_once("|").unwrap();
// But in the case of a single character, a `char` type is likely to be more performant.

You can use the str::split() method (available on String through deref), which returns an iterator over the substrings separated by the given pattern, and then use the Iterator::last() method to return the last element of that iterator, like so:
let s = String::from("some|not|necessarily|long|name");
let last = s.split('|').last().unwrap();
assert_eq!(last, "name");
Note that split is a method of string slices (&str), so you don't need an owned std::string::String:
let s = "some|not|necessarily|long|name";
let last = s.split('|').last().unwrap();
assert_eq!(last, "name");
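Both answers call unwrap(), and rsplit_once() in particular returns None when the string contains no separator at all. If the input may be a single word, a sketch like the following falls back to the whole string. It uses rsplit, which searches from the end and so stops at the last separator. The function name last_segment is made up here:

```rust
/// Returns the text after the last `|`, or the whole string if there is no `|`.
/// (Hypothetical helper for illustration.)
fn last_segment(s: &str) -> &str {
    // `rsplit` iterates from the back, so the first item it yields is the
    // last segment; it always yields at least one item.
    s.rsplit('|').next().unwrap_or(s)
}
```

last_segment("some|not|necessarily|long|name") is "name", and last_segment("name") is also "name".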

Why is the string being converted to bytes in this Rust tutorial?

I am reading the Rust tutorial, and in this section it converts a string into a byte array like so:
fn first_word(s: &String) -> usize {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return i;
        }
    }
    s.len()
}
They state that this conversion is because we want to find the first instance of the space character so we need to compare to it. My question is why do we need to convert to bytes? What if instead of converting the string to bytes we convert the byte ' ' into a String and compare to that?

Strings in Rust are UTF-8 encoded. You can iterate over chars but that will be a bit slower because Unicode code points are variable length, and the char type is 4 bytes long so you can't fit as many in a cache line.
A space has the same byte representation regardless of whether you are using ASCII or UTF-8 encoding, so this is an easy optimisation. It's also the same amount of code as iterating over chars.
But, probably more importantly, the function in question returns an index for where the character is found. Finding the index by iterating over chars would tell you how many Unicode code points to skip to get to that position, but you'd have to iterate again each time you wanted to use the index, because each preceding code point could be anywhere from 1 to 4 bytes long. An index into bytes is much more straightforward and efficient.
For example, with a byte index:
let words = String::from("Hello there");
let index = first_word(&words); // byte index
// just a slice
let first = std::str::from_utf8(&words.as_bytes()[0..index]).unwrap();
Indexing Unicode code points:
let words = String::from("Hello there");
let index = first_word(&words); // code point index (assuming first_word counted chars instead)
// having to iterate again, and allocate a new String
let first: String = words.chars().take(index).collect();
Any method to take a slice here would involve calculating the byte position first.
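To make the byte-index point concrete, here is a sketch of a variant that returns the slice directly instead of the index. This is not the tutorial's code at that point, only an illustration. Slicing at the position of a b' ' byte is always safe because no byte inside a multi-byte UTF-8 sequence can equal an ASCII space.

```rust
/// Returns everything before the first space, or the whole string if there is none.
fn first_word(s: &str) -> &str {
    // `position` gives the byte offset of the first space, which can be used
    // for slicing as-is, with no second pass over the string.
    match s.bytes().position(|b| b == b' ') {
        Some(i) => &s[..i],
        None => s,
    }
}
```

This works for non-ASCII text too: first_word("héllo wörld") is "héllo".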

[] operator for strings, link with slices for vectors

Why do you have to walk over the string to find the nᵗʰ letter when you do s[n], where s is a string? (According to https://doc.rust-lang.org/book/strings.html)
From what I understood, a string is an array of chars and a char is an array of 4 bytes or a 4-byte number. So wouldn't getting the nth letter be similar to doing v[4*n..4*n+4], where v is a vector?
What is the cost of v[i..j]?
I would assume that the cost of v[i..j] is j-i, and so the cost of s[n] should be 4.

Note: The second edition of The Rust Programming Language has an improved, smoother explanation of strings in Rust, which you might wish to read as well. The answer below, although still accurate, quotes from the first edition of the book.
I will try to clarify these misconceptions about strings in Rust by quoting from the book (https://doc.rust-lang.org/book/strings.html).
A ‘string’ is a sequence of Unicode scalar values encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid encoding of UTF-8 sequences.
With this in mind, and given that UTF-8 encodes each code point in 1 to 4 bytes depending on the character, strings in Rust, whether they are &str or String, are not arrays of characters and cannot be treated as such. The reason is explained further under Slicing:
Because strings are valid UTF-8, they do not support indexing:
let s = "hello";
println!("The first letter of s is {}", s[0]); // ERROR!!!
Usually, access to a vector with [] is very fast. But, because each character in a UTF-8 encoded string can be multiple bytes, you have to walk over the string to find the nᵗʰ letter of a string. This is a significantly more expensive operation, and we don’t want to be misleading.
Unlike what was mentioned in the question, one cannot do s[n]: although in theory this would allow us to fetch the nth byte in constant time, that byte is not guaranteed to make any sense on its own.
What is the cost of v[i..j] ?
The cost of slicing is actually constant, because it is done at byte-level:
You can get a slice of a string with slicing syntax:
let dog = "hachiko";
let hachi = &dog[0..5];
But note that these are byte offsets, not character offsets. So this will fail at runtime:
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
with this error:
thread '<main>' panicked at 'index 0 and/or 2 in 忠犬ハチ公 do not lie on character boundary'
Basically, slicing is acceptable and will yield a new view of that string, so no copies are made. However, it should only be used when you are completely sure that the offsets are right in terms of character boundaries.
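When the offsets are not known to be valid, str::get gives a non-panicking version of slicing, and char_indices can turn a character count into a byte offset first. A small sketch of both follows; the helper names prefix_by_bytes and prefix_by_chars are made up for illustration:

```rust
/// Returns the first `n` bytes of `s` as a string slice, or `None` if `n` is
/// out of range or falls inside a multi-byte character.
fn prefix_by_bytes(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

/// Returns the first `n` chars of `s` (the whole string if it is shorter).
fn prefix_by_chars(s: &str, n: usize) -> &str {
    // `char_indices` walks the string once to find the byte offset at which
    // the n-th char starts; the slice itself is then a constant-time view.
    match s.char_indices().nth(n) {
        Some((byte_offset, _)) => &s[..byte_offset],
        None => s,
    }
}
```

With the book's example, prefix_by_bytes("忠犬ハチ公", 2) is None rather than a panic, and prefix_by_chars("忠犬ハチ公", 2) is "忠犬".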
In order to iterate over each character of a string, you may instead call chars():
let c = s.chars().nth(n);
Even with that in mind, note that handling individual Unicode scalar values might not be exactly what you want if you need to handle combining marks and other modifiers (which are scalar values by themselves but should not be treated individually). Quoting now from the str API:
fn chars(&self) -> Chars
Returns an iterator over the chars of a string slice.
As a string slice consists of valid UTF-8, we can iterate through a string slice by char. This method returns such an iterator.
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
Remember, chars may not match your human intuition about characters:
let y = "y̆";
let mut chars = y.chars();
assert_eq!(Some('y'), chars.next()); // not 'y̆'
assert_eq!(Some('\u{0306}'), chars.next());
assert_eq!(None, chars.next());
The unicode_segmentation crate provides a means to define grapheme cluster boundaries:
extern crate unicode_segmentation;
use unicode_segmentation::UnicodeSegmentation;
let s = "a̐éö̲\r\n";
let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
assert_eq!(g, b);

If you do want to treat the string as an array of codepoints (which isn't strictly the same as characters; there are combining marks, emoji with separate skin-tone modifiers, etc.), you can collect it into a Vec:
fn main() {
    let s = "£10 🙃!";
    for (i, c) in s.char_indices() {
        println!("{} {}", i, c);
    }
    let v: Vec<char> = s.chars().collect();
    println!("v[5] = {}", v[5]);
}
With bonus demonstration of some varying character widths, this outputs:
0 £
2 1
3 0
4
5 🙃
9 !
v[5] = !

How do I reverse a string in 0.9?

How do I reverse a string in Rust 0.9?
According to rosettacode.org this worked in 0.8:
let reversed:~str = "一二三四五六七八九十".rev_iter().collect();
... but I can't get iterators working on strings in 0.9.
Also tried std::str::StrSlice::bytes_rev but I haven't figured out a clean way to convert the result back into a string without the compiler choking.

First of all, iterating over bytes and reversing will break multibyte characters; you want to iterate over chars instead:
let s = ~"abc";
let s2: ~str = s.chars_rev().collect();
println!("{:?}", s2);
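For anyone reading this with a current compiler: the ~str type and chars_rev no longer exist in Rust 1.0 and later. The equivalent today is to reverse the chars iterator, roughly as in this sketch. The function name is made up, and as with the 0.9 version, combining marks end up detached from their base characters, since this reverses scalar values rather than grapheme clusters.

```rust
/// Reverses a string by Unicode scalar value (not by grapheme cluster).
fn reverse_chars(s: &str) -> String {
    // `Chars` is a double-ended iterator, so `rev` needs no intermediate buffer.
    s.chars().rev().collect()
}
```

reverse_chars("一二三四五六七八九十") gives "十九八七六五四三二一".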
