Convert &[Box<[u8]>] to String or &str - rust

What's an efficient way to convert a result of type &[Box<[u8]>] into something more readily consumed like String or &str?
An example function is the txt_data() method from trust_dns_proto::rr::rdata::txt::TXT.
I've tried several things that seem to go nowhere, like:
fn main() {
    let raw: &[Box<[u8]>] = &["Hello", " world!"]
        .iter()
        .map(|i| i.as_bytes().to_vec().into_boxed_slice())
        .collect::<Vec<_>>();
    let value = raw.iter().map(|s| String::from(*s)).join("");
    assert_eq!(value, "Hello world!");
}
Where raw is of that type.

There is no way to convert this slice of byte buffers to a &str directly, because the data is split across several separate allocations. So a String looks like a good candidate.
I would use str::from_utf8() combined with try_fold():
use std::str;
fn main() {
    let raw: &[Box<[u8]>] = &["Hello", " world!"]
        .iter()
        .map(|i| i.as_bytes().to_vec().into_boxed_slice())
        .collect::<Vec<_>>();
    let value = raw
        .iter()
        .map(|i| str::from_utf8(i))
        .try_fold(String::new(), |a, i| {
            i.map(|i| {
                let mut a = a;
                a.push_str(i);
                a
            })
        });
    assert_eq!(value.as_ref().map(|x| x.as_str()), Ok("Hello world!"));
}

It looks like the solution is this:
let value: String = raw
    .iter()
    .map(|s| String::from_utf8((*s).to_vec()).unwrap())
    .collect::<Vec<String>>()
    .join("");
The key parts are from_utf8() and the (*s).to_vec() suggested by rustc. Note that unwrap() panics if any chunk is not valid UTF-8.
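A possibly simpler sketch, not taken from the answers above: a slice of Box<[u8]> can be joined with the standard [T]::concat into a single Vec<u8>, which is then validated once with String::from_utf8 (or converted lossily with String::from_utf8_lossy). Validating the joined bytes rather than each chunk separately also accepts a multi-byte character that happens to be split across two chunks. The names join_utf8 and join_utf8_lossy are hypothetical.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: copies all chunks into one buffer, then validates it as UTF-8.
fn join_utf8(raw: &[Box<[u8]>]) -> Result<String, FromUtf8Error> {
    // `concat` works here because Box<[u8]> implements Borrow<[u8]>.
    String::from_utf8(raw.concat())
}

/// Hypothetical lossy variant: invalid sequences become U+FFFD instead of an error.
fn join_utf8_lossy(raw: &[Box<[u8]>]) -> String {
    String::from_utf8_lossy(&raw.concat()).into_owned()
}

fn main() {
    let owned: Vec<Box<[u8]>> = ["Hello", " world!"]
        .iter()
        .map(|s| Box::<[u8]>::from(s.as_bytes()))
        .collect();
    println!("{:?}", join_utf8(&owned));

    // "é" is the two bytes 0xC3 0xA9; here they land in different chunks.
    let split: Vec<Box<[u8]>> = vec![
        b"caf\xC3".to_vec().into_boxed_slice(),
        b"\xA9".to_vec().into_boxed_slice(),
    ];
    // Validating the first chunk alone fails; validating the joined bytes succeeds.
    println!("{:?}", std::str::from_utf8(&split[0]).is_err());
    println!("{:?}", join_utf8(&split));
    println!("{:?}", join_utf8_lossy(&[b"ok\xFF".to_vec().into_boxed_slice()]));
}
```

The trade-off is one extra copy into the joined buffer, in exchange for a single validation pass and correct handling of characters that straddle chunk boundaries.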

Related

How to use a common BTreeMap variable in Rust (single thread)

Here is my original, simplified code. I want to use a global variable instead of separate variables in each function. What is the recommended way to do this in Rust?
BTW, I've tried both using a global and passing the map as a function parameter; both are a nightmare for a beginner. The lifetime and type issues are too difficult to solve.
This simple program is a single-threaded tool, so in C an extra mutex would not be necessary.
// version 1
use std::collections::BTreeMap;
// Trying but failed
// let mut guess_number = BTreeMap::new();
// | ^^^ expected item
fn read_csv() {
    let mut guess_number = BTreeMap::new();
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: u16 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv();
}
To show how hard this is for a beginner, here is the version that passes the map as a parameter:
// version 2
use std::collections::BTreeMap;
fn read_csv(guess_number: BTreeMap) {
    // ^^^^^^^^ expected 2 generic arguments
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: u16 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
}
fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
After some trial and error, I arrived at a type that seems to work, BTreeMap<&str, i32>:
// version 3
use std::collections::BTreeMap;
fn read_csv(guess_number: &BTreeMap<&str, i32>) {
    // let mut guess_number = BTreeMap::new();
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: i32 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
fn main() {
    let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(&guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
This causes the following error:
7  | fn read_csv(guess_number: &BTreeMap<&str, i32>) {
   |                           -------------------- help: consider changing this to be a mutable reference: `&mut BTreeMap<&str, i32>`
...
16 |         guess_number.insert(vec[0], number);
   |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ `guess_number` is a `&` reference, so the data it refers to cannot be borrowed as mutable
The final answer (globals seem to be discouraged in Rust, so I used a mutable reference):
// version 4
use std::collections::BTreeMap;
fn read_csv(guess_number: &mut BTreeMap<&str, i32>) {
    let lines = ["Tom,4", "John,6"];
    for line in lines.iter() {
        let split = line.split(",");
        let vec: Vec<_> = split.collect();
        println!("{} {:?}", line, vec);
        let number: i32 = vec[1].trim().parse().unwrap();
        guess_number.insert(vec[0], number);
    }
}
fn main() {
    let mut guess_number: BTreeMap<&str, i32> = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    if guess_number.contains_key("John") {
        println!("John's number={:?}", guess_number.get("John").unwrap());
    }
    read_csv(&mut guess_number);
    for (k, v) in guess_number {
        println!("{} {:?}", k, v);
    }
}
This question is not specific to BTreeMaps; it applies to pretty much all data types, such as numbers, strings, vectors, enums, etc.
If you want to pass a variable (value) from one function to another, you can do that in various ways in Rust. Typically you either move the value or pass a reference to it. Moving is quite specific to Rust and its ownership model. This is essential, so if you are serious about learning Rust, I strongly suggest you read the chapter "Understanding Ownership" in "the book". Don't get discouraged if you don't understand it on the first reading. Spend as much time as needed, as you really can't move forward without this knowledge.
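To make the "move" option concrete, here is a minimal sketch (not from the answer) in which read_csv takes the map by value, adds the parsed entries and hands the map back as its return value, so main can keep using it afterwards. It assumes &'static str keys, which holds for the literal CSV lines in the question, and uses str::split_once instead of collecting each line into a Vec.

```rust
use std::collections::BTreeMap;

// Takes ownership of the map, inserts the parsed entries, and returns the map.
fn read_csv(mut guess_number: BTreeMap<&'static str, i32>) -> BTreeMap<&'static str, i32> {
    for line in ["Tom,4", "John,6"] {
        // Skip lines without a comma or with a non-numeric value.
        if let Some((name, num)) = line.split_once(',') {
            if let Ok(n) = num.trim().parse() {
                guess_number.insert(name, n);
            }
        }
    }
    guess_number
}

fn main() {
    let mut guess_number = BTreeMap::new();
    guess_number.insert("Tom", 3);
    guess_number.insert("John", 7);
    // The map is moved into read_csv and moved back out through the return value.
    let guess_number = read_csv(guess_number);
    for (k, v) in &guess_number {
        println!("{} {}", k, v);
    }
}
```

Compared with passing &mut BTreeMap, moving makes the transfer of ownership explicit in the signature; both are common, and &mut is usually the lighter choice when the caller keeps the map anyway.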
As for global variables, there are very few situations where they should be used. In Rust, using global variables is somewhat more difficult than in most other languages. This thread is quite useful, although you might find it a bit difficult to comprehend. My advice to a beginner would be to first fully understand the basic concepts of moving and passing references.
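If a single-threaded "global" is really wanted, one standard-library option (a sketch, not taken from the answer) is a thread_local! static that wraps the map in a RefCell, which gives mutation checked at run time instead of a mutex. The map stores owned String keys here so it does not depend on where the names came from; the name GUESS_NUMBER is only illustrative.

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;

thread_local! {
    // One instance per thread; in a single-threaded program it behaves like a global.
    static GUESS_NUMBER: RefCell<BTreeMap<String, i32>> = RefCell::new(BTreeMap::new());
}

fn read_csv() {
    for line in ["Tom,4", "John,6"] {
        if let Some((name, num)) = line.split_once(',') {
            if let Ok(n) = num.trim().parse::<i32>() {
                // borrow_mut() panics if the map is already borrowed at this point.
                GUESS_NUMBER.with(|m| m.borrow_mut().insert(name.to_string(), n));
            }
        }
    }
}

fn main() {
    GUESS_NUMBER.with(|m| {
        let mut m = m.borrow_mut();
        m.insert("Tom".to_string(), 3);
        m.insert("John".to_string(), 7);
    });
    read_csv();
    GUESS_NUMBER.with(|m| {
        for (k, v) in m.borrow().iter() {
            println!("{} {}", k, v);
        }
    });
}
```

Every access has to go through with(), which is noticeably noisier than passing &mut BTreeMap around; that friction is one reason the answer recommends references first.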

Splitting a UTF-8 string into chunks

I want to split a UTF-8 string into chunks of equal size. I came up with a solution that does exactly that. Now I want to simplify it removing the first collect call if possible. Is there a way to do it?
fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .collect::<Vec<char>>()
        .chunks(3)
        .map(|chunk| chunk.iter().collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}
You can use chunks() from Itertools.
use itertools::Itertools; // 0.10.1
fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻ"
        .chars()
        .chunks(3)
        .into_iter()
        .map(|chunk| chunk.collect::<String>())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}
This doesn't require Itertools as a dependency and also does not allocate, as it iterates over slices of the original string:
fn chunks(s: &str, length: usize) -> impl Iterator<Item=&str> {
    assert!(length > 0);
    let mut indices = s.char_indices().map(|(idx, _)| idx).peekable();
    std::iter::from_fn(move || {
        let start_idx = match indices.next() {
            Some(idx) => idx,
            None => return None,
        };
        for _ in 0..length - 1 {
            indices.next();
        }
        let end_idx = match indices.peek() {
            Some(idx) => *idx,
            None => s.len(),
        };
        Some(&s[start_idx..end_idx])
    })
}
fn main() {
    let strings = chunks("ĄĆĘŁŃÓŚĆŹŻ", 3).collect::<Vec<&str>>();
    println!("{:?}", strings);
}
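A shorter variant of the same non-allocating idea (a sketch, not from the answers): keep the unconsumed rest of the string, find the byte offset where the next chunk begins with char_indices().nth(length), and cut there with str::split_at. Like the answer above, it chunks by Unicode scalar values (chars), not by graphemes.

```rust
/// Splits `s` into slices of `length` chars each; the last slice may be shorter.
fn chunks(mut s: &str, length: usize) -> impl Iterator<Item = &str> {
    assert!(length > 0);
    std::iter::from_fn(move || {
        if s.is_empty() {
            return None;
        }
        // Byte offset of the char at index `length` (the start of the next chunk),
        // or the end of the string if fewer chars remain.
        let end = s.char_indices().nth(length).map_or(s.len(), |(idx, _)| idx);
        let (head, tail) = s.split_at(end);
        s = tail;
        Some(head)
    })
}

fn main() {
    let strings = chunks("ĄĆĘŁŃÓŚĆŹŻ", 3).collect::<Vec<&str>>();
    println!("{:?}", strings);
}
```

Because end always comes from char_indices() or s.len(), split_at is only ever called on a char boundary and cannot panic.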
Having considered the problem with graphemes (a single user-perceived character, such as an accented letter, can consist of several chars), I ended up with the following solution.
I used the unicode-segmentation crate.
use unicode_segmentation::UnicodeSegmentation;
fn main() {
    let strings = "ĄĆĘŁŃÓŚĆŹŻèèèèè"
        .graphemes(true)
        .collect::<Vec<&str>>()
        .chunks(3) // 3 graphemes per chunk
        .map(|chunk| chunk.concat())
        .collect::<Vec<String>>();
    println!("{:?}", strings);
}
I hope some simplifications can still be made.

Does Rust have an equivalent to Python's dictionary comprehension syntax?

How would one translate the following Python, in which several files are read and their contents are used as values to a dictionary (with filename as key), to Rust?
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
My attempt is shown below, but I was wondering if a one-line, idiomatic solution is possible.
use std::{
    fs::File,
    io::{prelude::*, BufReader},
    path::Path,
    collections::HashMap,
};
macro_rules! map(
    { $($key:expr => $value:expr),+ } => {
        {
            let mut m = HashMap::new();
            $(
                m.insert($key, $value);
            )+
            m
        }
    };
);
fn lines_from_file<P>(filename: P) -> Vec<String>
where
    P: AsRef<Path>,
{
    let file = File::open(filename).expect("no such file");
    let buf = BufReader::new(file);
    buf.lines()
        .map(|l| l.expect("Could not parse line"))
        .collect()
}
fn main() {
    let _countries = map!{ "canada" => lines_from_file("canada.txt"),
                           "usa" => lines_from_file("usa.txt"),
                           "mexico" => lines_from_file("mexico.txt") };
}
Rust's iterators have map/filter/collect methods which are enough to do anything Python's comprehensions can. You can create a HashMap with collect on an iterator of pairs, but collect can return various types of collections, so you may have to specify the type you want.
For example,
use std::collections::HashMap;
fn main() {
    println!(
        "{:?}",
        (1..5).map(|i| (i + i, i * i)).collect::<HashMap<_, _>>()
    );
}
Is roughly equivalent to the Python
print({i+i: i*i for i in range(1, 5)})
But translated very literally, it's actually closer to
from builtins import dict
def main():
    print("{!r}".format(dict(map(lambda i: (i+i, i*i), range(1, 5)))))
if __name__ == "__main__":
    main()
not that you would ever say it that way in Python.
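Filters and nested loops from Python comprehensions map onto filter and flat_map in the same iterator style. A small sketch; the function names are illustrative only:

```rust
use std::collections::HashMap;

// Python: {i: i*i for i in range(1, 10) if i % 2 == 0}
fn even_squares() -> HashMap<i32, i32> {
    (1..10).filter(|&i| i % 2 == 0).map(|i| (i, i * i)).collect()
}

// Python: {(x, y): x*y for x in range(1, 3) for y in range(1, 3)}
fn products() -> HashMap<(i32, i32), i32> {
    (1..3)
        // `move` copies x into the inner closure so each pair keeps its own x.
        .flat_map(|x| (1..3).map(move |y| ((x, y), x * y)))
        .collect()
}

fn main() {
    println!("{:?}", even_squares());
    println!("{:?}", products());
}
```

The `if` clause becomes filter, and each additional `for` clause becomes one more level of flat_map.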
Python's comprehensions are just sugar for a for loop and an accumulator. Rust has macros, so you can make any sugar you want.
Take this simple Python example,
print({i+i: i*i for i in range(1, 5)})
You could easily re-write this as a loop and accumulator:
map = {}
for i in range(1, 5):
    map[i+i] = i*i
print(map)
You could do it basically the same way in Rust.
use std::collections::HashMap;
fn main() {
    let mut hm = HashMap::new();
    for i in 1..5 {
        hm.insert(i + i, i * i);
    }
    println!("{:?}", hm);
}
You can use a macro to do the rewriting to this form for you.
use std::collections::HashMap;
macro_rules! hashcomp {
    ($name:ident = $k:expr => $v:expr; for $i:ident in $itr:expr) => {
        let mut $name = HashMap::new();
        for $i in $itr {
            $name.insert($k, $v);
        }
    };
}
When you use it, the resulting code is much more compact. And this choice of separator tokens makes it resemble the Python.
fn main() {
    hashcomp!(hm = i+i => i*i; for i in 1..5);
    println!("{:?}", hm);
}
This is just a basic example that can handle a single loop. Python's comprehensions also can have filters and additional loops, but a more advanced macro could probably do that too.
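As a sketch of that idea (a hypothetical extension, not a standard macro), hashcomp! can be given a second rule that accepts an optional if filter; the rule without a filter simply forwards to it with `if true`.

```rust
use std::collections::HashMap;

macro_rules! hashcomp {
    // With a filter: only pairs whose condition holds are inserted.
    ($name:ident = $k:expr => $v:expr; for $i:ident in $itr:expr; if $cond:expr) => {
        let mut $name = HashMap::new();
        for $i in $itr {
            if $cond {
                $name.insert($k, $v);
            }
        }
    };
    // Without a filter: forward to the rule above with an always-true condition.
    ($name:ident = $k:expr => $v:expr; for $i:ident in $itr:expr) => {
        hashcomp!($name = $k => $v; for $i in $itr; if true)
    };
}

fn main() {
    hashcomp!(hm = i + i => i * i; for i in 1..5);
    hashcomp!(evens = i => i * i; for i in 1..10; if i % 2 == 0);
    println!("{:?}", hm);
    println!("{:?}", evens);
}
```

Because $name is passed in by the caller, the `let` binding it creates is visible after the macro call despite macro hygiene.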
Without using your own macros I think the closest to
countries = {region: open("{}.txt".format(region)).read() for region in ["canada", "usa", "mexico"]}
in Rust would be
let countries: HashMap<_, _> = ["canada", "usa", "mexico"].iter().map(|&c| {(c,read_to_string(c.to_owned() + ".txt").expect("Error reading file"),)}).collect();
but running a formatter such as rustfmt makes it more readable:
let countries: HashMap<_, _> = ["canada", "usa", "mexico"]
    .iter()
    .map(|&c| {
        (
            c,
            read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
        )
    })
    .collect();
A few notes:
To map over an array (or a vector), you first need to turn it into an iterator, hence iter().map(...).
To turn an iterator back into a concrete data structure, e.g. a HashMap (dict), use .collect(). This is both the advantage and the pain of Rust: it is very strict with types and performs no unexpected conversions.
A complete test program:
use std::collections::HashMap;
use std::fs::{read_to_string, File};
use std::io::Write;
fn create_files() -> std::io::Result<()> {
    let regions = [
        ("canada", "Ottawa"),
        ("usa", "Washington"),
        ("mexico", "Mexico City"),
    ];
    for (country, capital) in regions {
        let mut file = File::create(country.to_owned() + ".txt")?;
        write!(file, "The capital of {} is {}", country, capital)?;
    }
    Ok(())
}
fn create_hashmap() -> HashMap<&'static str, String> {
    let countries = ["canada", "usa", "mexico"]
        .iter()
        .map(|&c| {
            (
                c,
                read_to_string(c.to_owned() + ".txt").expect("Error reading file"),
            )
        })
        .collect();
    countries
}
fn main() -> std::io::Result<()> {
    create_files()?;
    let countries = create_hashmap();
    println!("{:#?}", countries);
    Ok(())
}
Note that specifying the type of countries is not needed here, because the return type of create_hashmap() already determines it.
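If a missing file should be reported to the caller instead of panicking, the same chain can collect into an io::Result, because Result implements FromIterator and stops at the first error. A sketch; the dir parameter and the name read_regions are additions for illustration (the answer reads from the current directory):

```rust
use std::collections::HashMap;
use std::fs::read_to_string;
use std::io;
use std::path::Path;

/// Reads `<dir>/<name>.txt` for every name; returns the first I/O error, if any.
fn read_regions<'a>(dir: &Path, names: &[&'a str]) -> io::Result<HashMap<&'a str, String>> {
    names
        .iter()
        .map(|&name| -> io::Result<(&'a str, String)> {
            let contents = read_to_string(dir.join(format!("{}.txt", name)))?;
            Ok((name, contents))
        })
        // Collecting Result items into Result<HashMap, _> short-circuits on Err.
        .collect()
}

fn main() {
    match read_regions(Path::new("."), &["canada", "usa", "mexico"]) {
        Ok(countries) => println!("{:#?}", countries),
        Err(e) => eprintln!("could not read a region file: {}", e),
    }
}
```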

How do I convert reverse domain notation to PascalCase?

I want to convert "foo.bar.baz" to "FooBarBaz". My input will always be only ASCII. I tried:
let result = "foo.bar.baz"
    .to_string()
    .split(".")
    .map(|x| x[0..1].to_string().to_uppercase() + &x[1..])
    .fold("".to_string(), |acc, x| acc + &x);
println!("{}", result);
but that feels inefficient.
Your solution is a good start. You could probably make it work without heap allocations in the "functional" style; I prefer putting complex logic into normal for loops though.
Also, I don't like assuming the input is ASCII without actually checking; this version should work with any string.
You probably could also use String::with_capacity in your code to avoid reallocations in standard cases.
fn dotted_to_pascal_case(s: &str) -> String {
    let mut result = String::with_capacity(s.len());
    for part in s.split('.') {
        let mut cs = part.chars();
        if let Some(c) = cs.next() {
            result.extend(c.to_uppercase());
        }
        result.push_str(cs.as_str());
    }
    result
}
fn main() {
    println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}
Stefan's answer is correct, but I decided to get rid of that up-front String allocation and go fully functional, without loops (collect() still allocates the resulting String):
fn dotted_to_pascal_case(s: &str) -> String {
    s.split('.')
        .map(|piece| piece.chars())
        .flat_map(|mut chars| {
            chars
                .next()
                .expect("empty section between dots!")
                .to_uppercase()
                .chain(chars)
        })
        .collect()
}
fn main() {
    println!("{}", dotted_to_pascal_case("foo.bar.baz"));
}
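Another option, a sketch not taken from either answer, is a single pass over the chars with a flag that records whether the next character starts a new segment. It avoids split entirely and tolerates empty segments such as "foo..bar" instead of panicking.

```rust
/// Single pass: upper-case the first char after each '.', and drop the dots.
fn dotted_to_pascal_case(s: &str) -> String {
    let mut result = String::with_capacity(s.len());
    // True at the start of the string and right after every '.'.
    let mut at_segment_start = true;
    for c in s.chars() {
        if c == '.' {
            at_segment_start = true;
        } else if at_segment_start {
            // to_uppercase() may yield more than one char (e.g. 'ß' -> "SS").
            result.extend(c.to_uppercase());
            at_segment_start = false;
        } else {
            result.push(c);
        }
    }
    result
}

fn main() {
    println!("{}", dotted_to_pascal_case("foo.bar.baz"));
    println!("{}", dotted_to_pascal_case("foo..bar"));
}
```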

How to convert a string of digits into a vector of digits?

I'm trying to store a string (or str) of digits, e.g. 12345 into a vector, such that the vector contains {1,2,3,4,5}.
As I'm totally new to Rust, I'm having problems with the types (String, str, char, ...) but also the lack of any information about conversion.
My current code looks like this:
fn main() {
    let text = "731671";
    let mut v: Vec<i32>;
    let mut d = text.chars();
    for i in 0..text.len() {
        v.push( d.next().to_digit(10) );
    }
}
You're close!
First, the index loop for i in 0..text.len() is not necessary since you're going to use an iterator anyway. It's simpler to loop directly over the iterator: for ch in text.chars(). Not only that, but your index loop and the character iterator are likely to diverge, because len() returns the number of bytes while chars() yields Unicode scalar values. Since UTF-8 encodes every non-ASCII character in more than one byte, a string containing such characters has fewer Unicode scalar values than bytes.
Next hurdle is that to_digit(10) returns an Option, telling you that there is a possibility the character won't be a digit. You can check whether to_digit(10) returned the Some variant of an Option with if let Some(digit) = ch.to_digit(10).
Pieced together, the code might now look like this:
fn main() {
    let text = "731671";
    let mut v = Vec::new();
    for ch in text.chars() {
        if let Some(digit) = ch.to_digit(10) {
            v.push(digit);
        }
    }
    println!("{:?}", v);
}
Now, this is rather imperative: you're making a vector and filling it digit by digit, all by yourself. You can try a more declarative or functional approach by applying a transformation over the string:
fn main() {
    let text = "731671";
    let v: Vec<u32> = text.chars().flat_map(|ch| ch.to_digit(10)).collect();
    println!("{:?}", v);
}
ArtemGr's answer is pretty good, but their version will skip any characters that aren't digits. If you'd rather have it fail on bad digits, you can use this version instead:
fn to_digits(text: &str) -> Option<Vec<u32>> {
    text.chars().map(|ch| ch.to_digit(10)).collect()
}
fn main() {
    println!("{:?}", to_digits("731671"));
    println!("{:?}", to_digits("731six71"));
}
Output:
Some([7, 3, 1, 6, 7, 1])
None
To mention the quick and dirty elephant in the room: if you REALLY know your string contains only the digits '0'..='9', then you can avoid memory allocations and copies and use the underlying &[u8] representation directly via str::as_bytes. Subtract b'0' from each element whenever you access it.
If you are doing competitive programming, this is one of the worthwhile speed and memory optimizations.
fn main() {
    let text = "12345";
    let digit = text.as_bytes();
    println!("Text = {:?}", text);
    println!("value of digit[3] = {}", digit[3] - b'0');
}
Output:
Text = "12345"
value of digit[3] = 4
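If the input is not fully trusted, the same byte-level trick can be guarded by one cheap validation pass that still avoids allocating: check every byte with u8::is_ascii_digit and only then hand out the borrowed bytes. A sketch with the hypothetical name digit_bytes:

```rust
/// Returns the string's bytes only if every byte is an ASCII digit ('0'..='9').
fn digit_bytes(text: &str) -> Option<&[u8]> {
    let bytes = text.as_bytes();
    // No allocation: validate in place, then borrow the original bytes.
    bytes.iter().all(u8::is_ascii_digit).then_some(bytes)
}

fn main() {
    if let Some(digits) = digit_bytes("12345") {
        // Every byte is known to be b'0'..=b'9', so this subtraction cannot underflow.
        println!("value of digits[3] = {}", digits[3] - b'0');
    }
    println!("{:?}", digit_bytes("12a45"));
}
```

bool::then_some requires Rust 1.62 or later; on older compilers, `.then(|| bytes)` does the same.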
This solution combines ArtemGr's + notriddle's solutions:
fn to_digits(string: &str) -> Vec<u32> {
    let opt_vec: Option<Vec<u32>> = string
        .chars()
        .map(|ch| ch.to_digit(10))
        .collect();
    match opt_vec {
        Some(vec_of_digits) => vec_of_digits,
        None => vec![],
    }
}
In my case, I implemented this function as a trait method on &str.
pub trait ExtraProperties {
    fn to_digits(self) -> Vec<u32>;
}
impl ExtraProperties for &str {
    fn to_digits(self) -> Vec<u32> {
        let opt_vec: Option<Vec<u32>> = self
            .chars()
            .map(|ch| ch.to_digit(10))
            .collect();
        match opt_vec {
            Some(vec_of_digits) => vec_of_digits,
            None => vec![],
        }
    }
}
In this way, I transform &str to a vector containing digits.
fn main() {
    let cnpj: &str = "123456789";
    let nums: Vec<u32> = cnpj.to_digits();
    println!("cnpj: {cnpj}"); // cnpj: 123456789
    println!("nums: {nums:?}"); // nums: [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
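Returning an empty Vec makes invalid input indistinguishable from an empty string. If that matters, a sketch (the name to_digits_checked is hypothetical) can collect into a Result that reports the first character that is not a decimal digit:

```rust
/// Parses every char as a base-10 digit; Err carries the first offending char.
fn to_digits_checked(s: &str) -> Result<Vec<u32>, char> {
    // ok_or turns a failed to_digit into Err(ch); collect stops at the first Err.
    s.chars().map(|ch| ch.to_digit(10).ok_or(ch)).collect()
}

fn main() {
    println!("{:?}", to_digits_checked("123456789"));
    println!("{:?}", to_digits_checked("12x4"));
}
```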
