I have a string with contents like the following:
key1:value1 key2:value2 key3:value3 ...
I want to end up with a HashMap<&str, &str> (or equivalent), which maps the keys and values from the string (e.g. a hash lookup of "key1" should return "value1").
Currently, I am able to easily fill up the HashMap with the following:
let mut hm = HashMap::new();
for item in my_string.split_ascii_whitespace() {
let splits = item.split(":").collect::<Vec<&str>>();
hm.insert(splits[0], splits[1]);
}
However, what if I wanted to do it in one line (even at the cost of readability, for "code golf"-ish purposes)? I know how to do this with a HashSet, so I'd imagine it would look somewhat similar; perhaps something like this (which doesn't actually compile):
let hm: HashMap<&str, &str> = HashMap::from_iter(my_string.split_ascii_whitespace().map(|s| s.split(":").take(2).collect::<Vec<&str>>()));
I have tried different combinations of things similar to the above, but I can't seem to find something that will actually compile.
My approach was to remember that an Iterator<Item=(K, V)> can be collected into a HashMap<K, V>. Knowing that, I worked backwards to figure out how to turn a &str into an Iterator<Item=(&str, &str)>, which I managed to do using the str methods find and split_at:
use std::collections::HashMap;
fn one_liner(string: &str) -> HashMap<&str, &str> {
string.split_whitespace().map(|s| s.split_at(s.find(":").unwrap())).map(|(key, val)| (key, &val[1..])).collect()
}
fn main() {
dbg!(one_liner("key1:value1 key2:value2 key3:value3"));
}
The second map call was necessary to remove the leading : character from the value string.
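As a possible alternative to the find/split_at pair, str::split_once (stable since Rust 1.52) returns the key and value directly, without the separator. The following is a minimal sketch; the function name parse_pairs is made up for illustration, and it assumes that silently skipping items without a ':' is acceptable, which differs from the unwrap above.

```rust
use std::collections::HashMap;

// Hypothetical helper: builds the map from "key:value" items separated by whitespace.
// Items that contain no ':' are skipped (an assumption, not the original behavior,
// which panics on such input).
fn parse_pairs(s: &str) -> HashMap<&str, &str> {
    s.split_whitespace()
        // split_once splits at the first ':' and drops the separator itself.
        .filter_map(|item| item.split_once(':'))
        .collect()
}
```

Because split_once splits at the first separator only, an item such as a:b:c maps the key a to the value b:c.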
I have a HashMap with Option<String> keys; is it possible to do lookups with a key of type Option<&str>? I know I can use a &str to look up in a HashMap<String, V> because String implements Borrow<str>.
Do I have to convert to an owned string just to do a lookup?
It's slightly less efficient, but you could use Cow here. It avoids the problem with the Borrow trait by instead having a single type that can represent either a reference or an owned value, like so:
use std::borrow::Cow;
use std::collections::HashMap;
fn main() {
let mut map = HashMap::<Option<Cow<'static, str>>, i32>::new();
map.insert(None, 5);
map.insert(Some(Cow::Borrowed("hello")), 10);
map.insert(Some(Cow::Borrowed("world")), 15);
// works with None and constant string slices...
assert_eq!(map.get(&None), Some(&5));
assert_eq!(map.get(&Some(Cow::Borrowed("hello"))), Some(&10));
// ...and also works with heap-allocated strings, without copies
let stack = String::from("world");
assert_eq!(map.get(&Some(Cow::Borrowed(&stack))), Some(&15));
}
In the Rust book we have the following code:
#[test]
fn one_result() {
let query = "duct";
let contents = "\
Rust:
safe, fast, productive.
Pick three.";
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
}
and the function for searching is:
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
let mut results = Vec::new();
for line in contents.lines() {
if line.contains(query) {
results.push(line);
}
}
results
}
How does assert_eq! access the vector element with a string? I cannot find any description of such functionality.
assert_eq! does not access vector elements by string. It compares equality (==) of the two vectors.
assert_eq! is just a macro that checks equality and panics if the two values are not equal.
In other words, this is the same as your assert:
if vec!["safe, fast, productive."] != search(query, contents) {
panic!()
}
Keep reading the book to find out about traits, notably the Eq and PartialEq traits, which are responsible for testing equality in Rust.
I think I found the answer. The code
assert_eq!(vec!["safe, fast, productive."], search(query, contents));
creates a vector with elements of type &str (string slices, not String) with only one entry that contains the text "safe, fast, productive." and then compares this vector with the vector returned from the search function. So it does not try to access vector elements with strings, but compares two vectors that contain only one element each.
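To make the comparison concrete, here is a rough sketch of what the assertion amounts to. The helper assert_slices_equal is hypothetical and only approximates assert_eq!; the real macro also reports source location and accepts any pair of types that implement PartialEq and Debug. The search function is a condensed version of the one from the question.

```rust
// Condensed version of the book's search function.
fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    contents.lines().filter(|line| line.contains(query)).collect()
}

// Hypothetical stand-in for assert_eq!: compare with != (PartialEq) and
// panic with both values (Debug) if they differ.
fn assert_slices_equal(left: &[&str], right: &[&str]) {
    if left != right {
        panic!("assertion failed: left = {:?}, right = {:?}", left, right);
    }
}

fn example_use() {
    let contents = "Rust:\nsafe, fast, productive.\nPick three.";
    // Both sides are sequences of &str; they are compared element by element.
    assert_slices_equal(&["safe, fast, productive."], &search("duct", contents));
}
```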
Per Steve Klabnik's writeup in the pre-Rust 1.0 documentation on the difference between String and &str, in Rust you should use &str unless you really need to have ownership over a String. Similarly, it's recommended to use references to slices (&[]) instead of Vecs unless you really need ownership over the Vec.
I have a Vec<String> and I want to write a function that uses this sequence of strings and it doesn't need ownership over the Vec or String instances, should that function take &[&str]? If so, what's the best way to reference the Vec<String> into &[&str]? Or, is this coercion overkill?
You can create a function that accepts both &[String] and &[&str] using the AsRef trait:
fn test<T: AsRef<str>>(inp: &[T]) {
for x in inp { print!("{} ", x.as_ref()) }
println!("");
}
fn main() {
let vref = vec!["Hello", "world!"];
let vown = vec!["May the Force".to_owned(), "be with you.".to_owned()];
test(&vref);
test(&vown);
}
This is actually impossible without either a memory allocation or a per-element call [1].
Going from String to &str is not just viewing the bits in a different light; String and &str have a different memory layout, and thus going from one to the other requires creating a new object. The same applies to Vec and &[].
Therefore, whilst you can go from Vec<T> to &[T], and thus from Vec<String> to &[String], you cannot directly go from Vec<String> to &[&str]. Your choices are:
either accept &[String]
allocate a new Vec<&str> referencing the first Vec, and convert that into a &[&str]
As an example of the allocation:
fn usage(_: &[&str]) {}
fn main() {
let owned = vec![String::new()];
let half_owned: Vec<_> = owned.iter().map(String::as_str).collect();
usage(&half_owned);
}
[1] Using generics and the AsRef<str> bound as shown in @aSpex's answer, you get a slightly more verbose function declaration with the flexibility you were asking for, but you do have to call .as_ref() on every element.
I have a &[u8] slice over a binary buffer. I need to parse it, but a lot of the methods that I would like to use (such as str::find) don't seem to be available on slices.
I've seen that I can convert both my buffer slice and my pattern to str by using from_utf8_unchecked(), but that seems a little dangerous (and also really hacky).
How can I find a subsequence in this slice? I actually need the index of the pattern, not just a slice view of the parts, so I don't think split will work.
Here's a simple implementation based on the windows iterator.
fn find_subsequence(haystack: &[u8], needle: &[u8]) -> Option<usize> {
haystack.windows(needle.len()).position(|window| window == needle)
}
fn main() {
assert_eq!(find_subsequence(b"qwertyuiop", b"tyu"), Some(4));
assert_eq!(find_subsequence(b"qwertyuiop", b"asd"), None);
}
The find_subsequence function can also be made generic:
fn find_subsequence<T>(haystack: &[T], needle: &[T]) -> Option<usize>
where for<'a> &'a [T]: PartialEq
{
haystack.windows(needle.len()).position(|window| window == needle)
}
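One caveat with the windows-based versions: slice::windows panics when the window size is 0, so an empty needle would panic. The following sketch adds a guard for that case; returning Some(0) for an empty needle is an assumption chosen to mirror str::find with an empty pattern. It also uses the simpler bound T: PartialEq, which is enough for comparing two slices.

```rust
// Variant of find_subsequence that does not panic on an empty needle.
fn find_subsequence<T: PartialEq>(haystack: &[T], needle: &[T]) -> Option<usize> {
    if needle.is_empty() {
        // Assumption: an empty pattern matches at index 0, like "abc".find("").
        return Some(0);
    }
    // windows() yields nothing when the needle is longer than the haystack,
    // so that case falls through to None.
    haystack
        .windows(needle.len())
        .position(|window| window == needle)
}
```

This is still a naive search that compares every window, so the dedicated crates mentioned in the other answers may be preferable for large inputs.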
I don't think the standard library contains a function for this. Some libcs have memmem, but at the moment the libc crate does not wrap this. You can use the twoway crate, however. rust-bio implements some pattern matching algorithms, too. All of those should be faster than using haystack.windows(..).position(..).
I found the memmem crate useful for this task:
use memmem::{Searcher, TwoWaySearcher};
let search = TwoWaySearcher::new("dog".as_bytes());
assert_eq!(
search.search_in("The quick brown fox jumped over the lazy dog.".as_bytes()),
Some(41)
);
How about Regex on bytes? That looks very powerful:
extern crate regex;
use regex::bytes::Regex;
fn main() {
//see https://doc.rust-lang.org/regex/regex/bytes/
let re = Regex::new(r"say [^,]*").unwrap();
let text = b"say foo, say bar, say baz";
// Collect the byte offset at which each match starts.
// The unwrap is OK here since capture group 0 (the whole match) always exists.
let starts: Vec<usize> =
re.captures_iter(text)
.map(|c| c.get(0).unwrap().start())
.collect();
assert_eq!(starts, vec![0, 9, 18]);
}
Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I've implemented the following method to return me the words from a file in a 2 dimensional data structure:
fn read_terms() -> Vec<Vec<String>> {
let path = Path::new("terms.txt");
let mut file = BufferedReader::new(File::open(&path));
return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?
There is a shorter and more readable way of getting words from a text file.
use std::fs::File;
use std::io::{BufRead, BufReader};
fn main() {
let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));
for line in reader.lines() {
// Each line is an io::Result<String>; unwrap panics on a read error.
for word in line.unwrap().split_whitespace() {
println!("word '{}'", word);
}
}
}
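If the Vec<Vec<String>> return type from the question is still wanted, the same BufRead-based loop can be collected instead of printed. This is a sketch rather than a definitive version: the function is made generic over any BufRead (an assumption, so that it can be fed a file or an in-memory buffer), and it returns read errors instead of unwrapping them.

```rust
use std::io::{self, BufRead};

// Reads all lines from `reader` and splits each one into owned words.
// Collecting an iterator of io::Result values stops at the first error.
fn read_terms<R: BufRead>(reader: R) -> io::Result<Vec<Vec<String>>> {
    reader
        .lines()
        .map(|line| -> io::Result<Vec<String>> {
            Ok(line?.split_whitespace().map(str::to_owned).collect())
        })
        .collect()
}
```

For a file, it could be called as read_terms(BufReader::new(File::open("terms.txt")?)). Both collect calls are needed here, and to_owned is needed too, because each word must outlive the temporary line String it was split from.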
You could instead read the entire file as a single String and then build a structure of references that points to the words inside:
use std::io::{self, Read};
use std::fs::File;
fn filename_to_string(s: &str) -> io::Result<String> {
let mut file = File::open(s)?;
let mut s = String::new();
file.read_to_string(&mut s)?;
Ok(s)
}
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
s.lines().map(|line| {
line.split_whitespace().collect()
}).collect()
}
fn example_use() {
let whole_file = filename_to_string("terms.txt").unwrap();
let wbyl = words_by_line(&whole_file);
println!("{:?}", wbyl)
}
This will read the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating: first into the buffer inside BufReader, then into a newly allocated String for each line, and then into a newly allocated String for each word. It will also use less memory, because the single large String and vectors of references are more compact than many individual Strings.
A drawback is that you can't directly return the structure of references, because it can't live past the stack frame that holds the single large String. In example_use above, we have to put the large String into a let in order to call words_by_line. It is possible to get around this with unsafe code and wrapping the String and references in a private struct, but that is much more complicated.
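A different workaround that stays in safe code, sketched here as an alternative and not as the unsafe approach mentioned above, is to store byte ranges instead of references. The struct then owns the String and can be returned freely. The names Words and word are hypothetical.

```rust
use std::ops::Range;

// Owns the whole text plus, for each line, the byte range of every word.
struct Words {
    text: String,
    lines: Vec<Vec<Range<usize>>>,
}

impl Words {
    fn new(text: String) -> Self {
        let base = text.as_ptr() as usize;
        let lines = text
            .lines()
            .map(|line| {
                line.split_whitespace()
                    .map(|w| {
                        // Each word is a subslice of `text`, so its offset is
                        // the distance between the two start addresses.
                        let start = w.as_ptr() as usize - base;
                        start..start + w.len()
                    })
                    .collect()
            })
            .collect();
        Words { text, lines }
    }

    // Returns the word at (line, index), borrowing from the owned text.
    fn word(&self, line: usize, index: usize) -> Option<&str> {
        let range = self.lines.get(line)?.get(index)?;
        Some(&self.text[range.clone()])
    }
}
```

The trade-off is that each lookup re-slices the String, and a Range<usize> is the same size as a &str, so this keeps the single-buffer layout but does not save memory over the vectors of references.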