I have source files that contain text CSV lines for many products for a given day. I want to use Rust to collate these files so that I end up with many new destination CSV files, one per product, each containing portions of the lines only specific to that product.
My current solution is to loop over the lines of the source files and use a HashMap<String, String> to gather the lines for each product. I split each source line and use the element containing the product ID as a key, to obtain an Entry (occupied or vacant) in my HashMap. If it is vacant, I initialize the value with a new String that is allocated up-front with a given capacity, so that I can efficiently append to it thereafter.
// so far, so good (the first CSV item is the product ID)
let mystringval = productmap.entry(splitsource[0].to_owned()).or_insert(String::with_capacity(SOME_CAPACITY));
I then want to append formatted elements of the same source line to this Entry. There are many examples online, such as
https://doc.rust-lang.org/std/collections/hash_map/struct.HashMap.html#method.entry
of how to make this work if the HashMap value is an integer:
// this works if you obtain an Entry from a HashMap containing int vals
*myval += 1;
I haven't figured out how to append more text to the Entry I obtain from my HashMap<String, String> using this kind of syntax, and I've done my best to research examples online. There are surprisingly few examples anywhere of manipulating non-numeric entries in Rust data structures.
// using the Entry obtained from my first code snippet above
*mystringval.push_str(sourcePortion.as_str());
Attempting to compile this produces the following error:
error: type `()` cannot be dereferenced
--> coll.rs:102:17
|
102 | *mystringval.push_str(sourcePortion.as_str());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
How can I append to a String inside the Entry value?
*mystringval.push_str(sourcePortion.as_str()); is parsed as *(mystringval.push_str(sourcePortion.as_str())); and since String::push_str returns (), you get the () cannot be dereferenced error.
Using parentheses around the dereference solves the precedence issue:
(*mystringval).push_str(sourcePortion.as_str());
The reason *myval += 1 works is because unary * has a higher precedence than +=, which means it's parsed as
(*myval) += 1
Since or_insert returns &mut V, you don't need to dereference it before calling its methods. The following also works:
mystringval.push_str(sourcePortion.as_str());
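Putting this together for the collation use case, a minimal sketch might look like the following. The `collate` function, the comma-splitting rule, and the value of `SOME_CAPACITY` are assumptions made for illustration, not the asker's actual code; it also swaps `or_insert` for `or_insert_with`, so the buffer is only allocated when the entry is vacant.

```rust
use std::collections::HashMap;

// Hypothetical capacity hint; tune it to the expected size of each output file.
const SOME_CAPACITY: usize = 1024;

/// Groups CSV lines by their first field (assumed to be the product ID) and
/// appends the remainder of each line to that product's buffer.
fn collate<'a, I>(lines: I) -> HashMap<String, String>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut product_map: HashMap<String, String> = HashMap::new();
    for line in lines {
        // Split once at the first comma: product ID, then the rest of the line.
        let (product_id, rest) = match line.split_once(',') {
            Some(parts) => parts,
            None => continue, // skip lines without a comma
        };
        // `or_insert_with` only runs the closure when the entry is vacant.
        let buf: &mut String = product_map
            .entry(product_id.to_owned())
            .or_insert_with(|| String::with_capacity(SOME_CAPACITY));
        // No `*` needed: method calls auto-dereference the `&mut String`.
        buf.push_str(rest);
        buf.push('\n');
    }
    product_map
}
```

Each value in the returned map could then be written to its own destination file.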
If you inspect the type returned by or_insert:
fn update_count(map: &mut HashMap<&str, u32>) {
let () = map.entry("hello").or_insert(0);
}
You will see it is a mutable reference:
error[E0308]: mismatched types
--> src/main.rs:4:9
|
4 | let () = map.entry("hello").or_insert(0);
| ^^ expected &mut u32, found ()
|
= note: expected type `&mut u32`
found type `()`
That means that you can call any method that needs a &mut self receiver with no extra syntax:
fn update_mapping(map: &mut HashMap<&str, String>) {
map.entry("hello").or_insert_with(String::new).push_str("wow")
}
Turning back to the integer form, what happens if we don't put the dereference?
fn update_count(map: &mut HashMap<&str, i32>) {
map.entry("hello").or_insert(0) += 1;
}
error[E0368]: binary assignment operation `+=` cannot be applied to type `&mut i32`
--> src/main.rs:4:5
|
4 | map.entry("hello").or_insert(0) += 1;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ cannot use `+=` on type `&mut i32`
error[E0067]: invalid left-hand side expression
--> src/main.rs:4:5
|
4 | map.entry("hello").or_insert(0) += 1;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ invalid expression for left-hand side
The difference is that the += operator automatically takes a mutable reference to the left-hand side of the expression. Expanded, it might look something like this:
use std::ops::AddAssign;
fn update_count(map: &mut HashMap<&str, i32>) {
AddAssign::add_assign(&mut map.entry("hello").or_insert(0), 1);
}
Adding the explicit dereference brings the types back to one that has the trait implemented:
use std::ops::AddAssign;
fn update_count(map: &mut HashMap<&str, i32>) {
AddAssign::add_assign(&mut (*map.entry("hello").or_insert(0)), 1);
}
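For comparison, here is a small sketch of both spellings in a form that compiles: the usual `*... += 1` and an explicit `add_assign` call on the `&mut u32` that `or_insert` returns. The function names are made up for this example.

```rust
use std::collections::HashMap;
use std::ops::AddAssign;

/// Counts how often each word occurs, using the dereference-and-add form.
fn count_words<'a>(words: &[&'a str]) -> HashMap<&'a str, u32> {
    let mut counts: HashMap<&'a str, u32> = HashMap::new();
    for &word in words {
        // Explicit dereference: `+=` is implemented for `u32`, not `&mut u32`.
        *counts.entry(word).or_insert(0) += 1;
    }
    counts
}

/// The same update written as an explicit call to `AddAssign::add_assign`.
fn count_words_explicit<'a>(words: &[&'a str]) -> HashMap<&'a str, u32> {
    let mut counts: HashMap<&'a str, u32> = HashMap::new();
    for &word in words {
        let slot: &mut u32 = counts.entry(word).or_insert(0);
        // `slot` is already the `&mut u32` receiver that `add_assign` expects.
        AddAssign::add_assign(slot, 1u32);
    }
    counts
}
```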
I was trying to print some data for debugging purposes when I encountered an error. You can have a look at the full code here.
The error occurred when I tried to print data returned from the prepare_hash_data() function, which returns a Vec<u8>.
let data = self.prepare_hash_data()?; // returns Vec<u8>
println!("{}", String::from_utf8(data)?); // error here
let mut hasher = Sha256::new();
hasher.input(&data[..]);
println!("{}", hasher.result_str().as_str());
self.hash = hasher.result_str();
The prepare_hash_data() function is given below. Other details are omitted; it is simply a function that returns a Vec<u8>.
fn prepare_hash_data(&self) -> Result<Vec<u8>, failure::Error> {
let content = (
self.hash_prev_block.clone(),
self.transactions.clone(),
self.timestamp,
DIFFICULTY,
self.nonce
);
let bytes = bincode::serialize(&content)?;
Ok(bytes)
}
The error given is
error[E0382]: borrow of moved value: `data`
--> src/block.rs:63:23
|
60 | let data = self.prepare_hash_data()?;
| ---- move occurs because `data` has type `Vec<u8>`, which does not implement the `Copy` trait
61 | println!("{}", String::from_utf8(data)?);
| ---- value moved here
62 | let mut hasher = Sha256::new();
63 | hasher.input(&data[..]);
| ^^^^ value borrowed here after move
I tried the following ways
Implementing Copy trait. But, Vec<u8> can't have the Copy trait as described here.
Looking at E0382 given in the error message, there are two ways suggested.
Using a reference, we can let another function borrow the value without changing its ownership.
But how should I use reference in this example?
Should I change the function signature to something like this fn prepare_hash_data(&self) -> Result<&Vec<u8>, failure::Error>?
With Rc, a value cannot be owned by more than one variable.
Don't know how to implement.
I tried cloning the data by println!("{}", String::from_utf8(data.clone())?);
But, it gives another error backtrace::backtrace::trace_unsynchronized
For the full error log, click here
What should be the correct approach to printing some data that can't be copied or cloned without moving it for later usage in subsequent lines?
I did look at the following solutions but can't relate the answer.
move occurs because value has type Vec, which does not implement the Copy trait
'move occurs because value has type' Rust error
I am trying to print some data for debugging purposes
The code you provided:
println!("{}", String::from_utf8(data)?);
Is not the correct way to debug-print data.
For one, String::from_utf8 consumes the data, so it can no longer be used afterwards. Further, your data most likely isn't valid UTF-8, so String::from_utf8 will only return an error.
Use debug printing instead, it works out of the box:
println!("{:?}", data);
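As a rough sketch of why this leaves `data` usable afterwards: formatting only needs a borrow. The helper names below are invented for illustration, and `String::from_utf8_lossy` is an addition not mentioned above; it is one option if a text-like view of possibly invalid UTF-8 is wanted.

```rust
/// Renders bytes for debugging without taking ownership of them.
fn debug_repr(data: &[u8]) -> String {
    format!("{:?}", data)
}

/// Lossy text view: invalid UTF-8 sequences become U+FFFD instead of an error.
fn lossy_repr(data: &[u8]) -> String {
    String::from_utf8_lossy(data).into_owned()
}
```

Because both helpers take `&[u8]`, the caller can still pass the same `Vec<u8>` to the hasher on the following lines.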
I retract my previous answer because it partially missed your problem.
It still contained valuable information, so I have kept it below. If someone disagrees, feel free to delete it together with this sentence.
--Previous Answer--
Your problem is that String::from_utf8 consumes its argument, meaning, data cannot be accessed any more afterwards.
Here is your problem in a more compact example:
fn main() {
let data: Vec<u8> = "Hello!".as_bytes().to_vec();
// Consumes `data`
println!("{}", String::from_utf8(data).unwrap());
// `data` cannot be accessed any more, as it got moved
// into `String::from_utf8`
println!("Length of vector: {}", data.len());
}
error[E0382]: borrow of moved value: `data`
--> src/main.rs:9:38
|
2 | let data: Vec<u8> = "Hello!".as_bytes().to_vec();
| ---- move occurs because `data` has type `Vec<u8>`, which does not implement the `Copy` trait
...
5 | println!("{}", String::from_utf8(data).unwrap());
| ---- value moved here
...
9 | println!("Length of vector: {}", data.len());
| ^^^^^^^^^^ value borrowed here after move
|
help: consider cloning the value if the performance cost is acceptable
|
5 | println!("{}", String::from_utf8(data.clone()).unwrap());
| ++++++++
In your case this can be easily fixed, though. The reason it consumes the data is that String is an owning type. It owns its data, meaning that if you create it from a Vec, it stores the Vec's data internally.
There is another type: &str, a string slice. It's very similar to String, just that it doesn't own its data, but merely references it. That means you can create it without destroying data:
fn main() {
let data: Vec<u8> = "Hello!".as_bytes().to_vec();
// Borrows `data`
println!("{}", std::str::from_utf8(&data).unwrap());
// `data` is still accessible because it was only borrowed
println!("Length of vector: {}", data.len());
}
Hello!
Length of vector: 6
That said, the given solution is only a workaround. The proper way to fix this is to implement Display.
As your data object clearly has more meaning than just a bunch of bytes, create a proper struct that represents its meaning.
There are two ways I would consider:
A newtype struct, like struct MyData(Vec<u8>), for which you can implement Display.
A String. As you convert it to a string anyway, why not just make it a string right away, as the return value of prepare_hash_data? Note that if your reason to have it a Vec is that it is binary data, then you shouldn't convert it to a string via from_utf8, as it's not UTF-8 data. If it is valid UTF-8 data, however, use a String right away, not a Vec<u8>. And String already implements Display and can be printed without further conversion.
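A minimal sketch of the newtype option, assuming a hexadecimal rendering is acceptable for binary data. The name `HashData` and the hex format are choices made for this example only.

```rust
use std::fmt;

/// Hypothetical newtype around the serialized bytes.
struct HashData(Vec<u8>);

impl fmt::Display for HashData {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print each byte as two lowercase hex digits.
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

With this in place, `println!("{}", data)` only borrows the wrapper, and the inner bytes stay available through `data.0`.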
I am looping over a Vec<&str>, each time reassigning a variable that holds the intersection of the last two checked. This is resulting in "expected char, found &char". I think this is happening because the loop is a new block scope, which means the values from the original HashSet are borrowed, and go into the new HashSet as borrowed. Unfortunately, the type checker doesn't like that. How do I create a new HashSet<char> instead of HashSet<&char>?
Here is my code:
use std::collections::HashSet;
fn find_item_in_common(sacks: Vec::<&str>) -> char {
let mut item: Option<char> = None;
let mut sacks_iter = sacks.iter();
let matching_chars = sacks_iter.next().unwrap().chars().collect::<HashSet<_>>();
loop {
let next_sack = sacks_iter.next();
if next_sack.is_none() { break; }
let next_sack_values: HashSet<_> = next_sack.unwrap().chars().collect();
matching_chars = matching_chars.intersection(&next_sack_values).collect::<HashSet<_>>();
}
matching_chars.drain().nth(0).unwrap()
}
and here are the errors that I'm seeing:
error[E0308]: mismatched types
--> src/bin/03.rs:13:26
|
6 | let matching_chars = sacks_iter.next().unwrap().chars().collect::<HashSet<_>>();
| ---------------------------------------------------------- expected due to this value
...
13 | matching_chars = matching_chars.intersection(&next_sack_values).collect::<HashSet<_>>();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected `char`, found `&char`
|
= note: expected struct `HashSet<char>`
found struct `HashSet<&char>`
By the way, what is that first error trying to tell me? It seems like it is missing something before or after "expected" -- <missing thing?> expected <or missing thing?> due to this value?
I also tried changing matching_chars = matching_chars to matching_chars = matching_chars.cloned() and I get the following error. I understand what the error is saying, but I don't know how to resolve it.
error[E0599]: the method `cloned` exists for struct `HashSet<char>`, but its trait bounds were not satisfied
--> src/bin/03.rs:13:41
|
13 | matching_chars = matching_chars.cloned().intersection(&next_sack_values).collect::<HashSet<_>>();
| ^^^^^^ method cannot be called on `HashSet<char>` due to unsatisfied trait bounds
|
::: /Users/brandoncc/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/std/src/collections/hash/set.rs:112:1
|
112 | pub struct HashSet<T, S = RandomState> {
| -------------------------------------- doesn't satisfy `HashSet<char>: Iterator`
|
= note: the following trait bounds were not satisfied:
`HashSet<char>: Iterator`
which is required by `&mut HashSet<char>: Iterator`
Your attempt at using cloned() was almost right but you have to call it after you create the iterator:
matching_chars.intersection(&next_sack_values).cloned().collect::<HashSet<_>>()
or for Copy types you should use the more appropriate .copied() adapter:
matching_chars.intersection(&next_sack_values).copied().collect::<HashSet<_>>()
Looking at the signature of HashSet::intersection will make this clearer:
pub fn intersection<'a>(
&'a self,
other: &'a HashSet<T, S>
) -> Intersection<'a, T, S>
The type Intersection<'a, T, S> implements Iterator<Item=&'a T>. So when you collect this iterator, you get a HashSet<&char> as opposed to a HashSet<char>.
The solution is simply to use .cloned on the iterator before you use .collect, since char is Clone, like so:
matching_chars = matching_chars.intersection(&next_sack_values).cloned().collect()
By the way, what is that first error trying to tell me?
The error is telling you that it expects char because (due to) the original value for matching_chars has type HashSet<char>.
I also tried changing matching_chars = matching_chars to matching_chars = matching_chars.cloned() and I get the following error. I understand what the error is saying, but I don't know how to resolve it.
Do you, really?
str::chars is an Iterator<Item=char>, so when you collect() to a hashset you get a HashSet<char>.
The problem is that intersection borrows the hashset, and since the items the hashset contains may or may not be Clone, it also has to borrow the set items, it can't just copy or clone them (not without restricting its flexibility anyway).
So that's where you need to add the cloned call, on the HashSet::intersection in order to adapt it from an Iterator<Item=&char> to an Iterator<Item=char>.
Or you can just use the & operator, which takes two borrowed hashsets and returns an owned hashset (requiring that the items be Clone).
Alternatively, use Iterator::filter or Iterator::find on one of the sets, checking if the others' HashSet::contains the item being looked at. Fundamentally that's basically what intersection does, and you know there's just one item at the end.
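Both alternatives can be sketched roughly as follows. Returning `Option<char>` and taking a slice are choices made for this example, not part of the original function, and `find_item_with_find` is a hypothetical name.

```rust
use std::collections::HashSet;

/// Returns a character common to every sack, or `None` if there is none
/// (or if `sacks` is empty). Uses the `&` operator on borrowed sets.
fn find_item_in_common(sacks: &[&str]) -> Option<char> {
    let mut iter = sacks.iter();
    let mut common: HashSet<char> = iter.next()?.chars().collect();
    for sack in iter {
        let next: HashSet<char> = sack.chars().collect();
        // `&a & &b` clones the shared items into a new owned HashSet<char>.
        common = &common & &next;
    }
    common.into_iter().next()
}

/// Same result via `Iterator::find` and `HashSet::contains`.
fn find_item_with_find(sacks: &[&str]) -> Option<char> {
    let (first, rest) = sacks.split_first()?;
    let rest_sets: Vec<HashSet<char>> =
        rest.iter().map(|sack| sack.chars().collect()).collect();
    // Stop at the first character that every other set contains.
    first
        .chars()
        .find(|c| rest_sets.iter().all(|set| set.contains(c)))
}
```

If several characters are shared, the first version returns an arbitrary one, since HashSet iteration order is unspecified.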
std::mem::swap has the signature:
pub fn swap<T>(x: &mut T, y: &mut T)
If I try to implement it (playground):
pub fn swap<T>(a: &mut T, b: &mut T) {
let t = a;
a = b;
b = t;
}
I get an error about the lifetimes of the two parameters:
error[E0623]: lifetime mismatch
--> src/lib.rs:4:9
|
1 | pub fn swap<T>(a: &mut T, b: &mut T) {
| ------ ------
| |
| these two types are declared with different lifetimes...
...
4 | b = t;
| ^ ...but data from `a` flows into `b` here
error[E0623]: lifetime mismatch
--> src/lib.rs:3:9
|
1 | pub fn swap<T>(a: &mut T, b: &mut T) {
| ------ ------ these two types are declared with different lifetimes...
2 | let t = a;
3 | a = b;
| ^ ...but data from `b` flows into `a` here
If I change the signature to:
pub fn swap_lt<'t, T>(mut a: &'t T, mut b: &'t T)
It compiles, but I get a warning which seems to mean that we're just swapping temporary copies:
warning: value assigned to `a` is never read
--> src/lib.rs:3:5
|
3 | a = b;
| ^
|
= note: `#[warn(unused_assignments)]` on by default
= help: maybe it is overwritten before being read?
warning: value assigned to `b` is never read
--> src/lib.rs:4:5
|
4 | b = t;
| ^
|
= help: maybe it is overwritten before being read?
Your code is not operating on temporary copies. It just swaps the references that were passed in, which does not have any effect on the values they are pointing to. This also explains why the compiler wants the lifetimes to match – reference x is pointing to the value reference y pointed to before and vice versa, which is only possible if the two references have the same lifetime.
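To make that observable, here is a small sketch that returns the two local references after swapping them. Returning the tuple is an addition for illustration; the question's `swap_lt` returns nothing, which is why the compiler warns that the assignments are never read.

```rust
/// Swaps only the two local reference variables; the values they point to
/// are left exactly where they were.
fn swap_refs<'t, T>(mut a: &'t T, mut b: &'t T) -> (&'t T, &'t T) {
    let t = a;
    a = b;
    b = t;
    // Returned so the effect on the local references can be inspected.
    (a, b)
}
```

Calling `swap_refs(&x, &y)` hands back references in the opposite order, while `x` and `y` themselves still hold their original values.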
When swapping the actual values, a different problem occurs. You first need to move one of the values to a temporary variable. However, since T is not Copy, you can't move a value out from behind a reference, since this would leave the reference invalid, which is not allowed in Rust. If you allow T: Default, you could replace the value with its default temporarily. However, if you want to implement the function for the general case, you need to resort to unsafe code. One way of doing so is using the std::ptr::read() and std::ptr::write() functions to read and write data from raw pointers:
use std::ptr::{read, write};

fn swap<T>(x: &mut T, y: &mut T) {
    unsafe {
        let z = read(x);
        write(x, read(y));
        write(y, z);
    }
}
This code is trickier than it looks. The read() function returns a copy of the value without invalidating the original value, so we end up with the same non-Copy value being present in two places. We need to take care that we don't drop any of the values, which happens implicitly in many cases. For example, this implementation is wrong, since it implicitly drops the value x is initially pointing to:
fn swap<T>(x: &mut T, y: &mut T) {
unsafe {
let z = read(x);
*x = read(y); // Wrong – drops the original value x is pointing to
write(y, z);
}
}
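One way to check the "no value is dropped" property is with a type that counts its own drops. The sketch below is an illustration under that assumption; `Counted` and `swap_and_count` are hypothetical helpers, and the `swap` body is the read/write version described above.

```rust
use std::cell::Cell;
use std::ptr::{read, write};

/// Hypothetical helper type that counts how many times values are dropped.
struct Counted<'a> {
    id: u32,
    drops: &'a Cell<u32>,
}

impl Drop for Counted<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn swap<T>(x: &mut T, y: &mut T) {
    // SAFETY: both pointers come from valid, distinct `&mut T` references,
    // and every value that is read is written back exactly once, so nothing
    // is duplicated or dropped inside this block.
    unsafe {
        let z = read(x);
        write(x, read(y));
        write(y, z);
    }
}

/// Returns (id now in `a`, id now in `b`, drops observed during the swap).
fn swap_and_count() -> (u32, u32, u32) {
    let drops = Cell::new(0);
    let mut a = Counted { id: 1, drops: &drops };
    let mut b = Counted { id: 2, drops: &drops };
    swap(&mut a, &mut b);
    (a.id, b.id, drops.get())
}
```

With the `*x = read(y)` variant, the counter would register a drop during the swap, which is the symptom of the bug.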
The actual implementation of swap() in the standard library uses a few optimizations:
It makes use of the std::ptr::copy_nonoverlapping() function instead of write(x, read(y)), which is implemented as a compiler intrinsic. The Rust compiler delegates this to LLVM to make sure the generated code is as efficient as possible for the target platform. Our code actually uses temporary storage for both x and y. Using copy_nonoverlapping(), temporary storage is only needed for one of the variables.
Values of 32 bytes or larger are swapped in blocks, so only 32 bytes of temporary storage are needed.
If you, for the sake of an exercise, don't want to use core::mem::swap or say core::ptr::swap, you could implement it as such:
pub fn swap<T>(a: &mut T, b: &mut T) {
unsafe {
let t = core::ptr::read(a);
core::ptr::copy_nonoverlapping(b, a, 1);
core::ptr::write(b, t);
}
}
Doing it using strictly safe code is not possible without having something like T: Default.
Other answers have covered unsafe implementations of swap(). A safe implementation is possible as well, but it requires additional constraints on T. For example:
pub fn swap<T: Default>(x: &mut T, y: &mut T) {
let t = std::mem::take(x);
*x = std::mem::take(y);
*y = t;
}
Here T: Default is required by std::mem::take(), which moves the value out of an &mut T reference, and leaves T::default() as replacement. A replacement is needed because the value behind the reference can and will be used again, so it must be in a valid state. For example, to move the value out of *x, we need to leave a well-defined value in *x because we will assign to *x in the subsequent line. The assignment, unaware of the previous operation, expects a valid value on the left-hand side, in order to destroy it. Leaving the old value untouched in *x would result in use-after-free and ultimately a double-free.
Another option is to require Clone:
pub fn swap<T: Clone>(x: &mut T, y: &mut T) {
let t = x.clone();
*x = y.clone();
*y = t;
}
For standard library containers this variant will be less efficient because T::clone() will perform a deep copy of the container, whereas T::default() will create an empty container without performing an allocation.
Implementing swap() without additional constraint on T requires unsafe code, as shown in other answers.
Please help me compile the code attached below. The compiler reports one of the following two errors, depending on which lines I comment out.
The program reads a &str containing simple "SVG path command"-like code and parses it. The pasted code has been simplified. It uses Regex to split the input string into lines and then examines each line in the main for loop. Each iteration pushes the parse result onto a vector. Finally, the function returns the vector.
Basically, the compiler says that returning the vector is not allowed because it refers to a local variable. I haven't found a workaround, though.
error[E0597]: `cmd` does not live long enough
--> src/main.rs:24:25
|
24 | codeV = re.captures(cmd.as_str());
| ----- ^^^ borrowed value does not live long enough
| |
| borrow might be used here, when `codeV` is dropped and runs the destructor for type `Option<regex::Captures<'_>>`
...
30 | }
| - `cmd` dropped here while still borrowed
|
= note: values in a scope are dropped in the opposite order they are defined
error[E0515]: cannot return value referencing local variable `cmd`
--> src/main.rs:31:1
|
24 | codeV = re.captures(cmd.as_str());
| --- `cmd` is borrowed here
...
31 | V //Error
| ^ returns a value referencing data owned by the current function
Playground
use regex::Regex;
pub fn parse(path:&str) {//->Vec<Option<regex::Captures<>>> //Error
let reg_n=Regex::new(r"\n").unwrap();
let path=reg_n.replace_all("\n"," ");
let reg_cmd=Regex::new(r"(?P<cmd>[mlhv])").unwrap();
let path=reg_cmd.replace_all(&path,"\n${cmd}");
let cmdV=reg_n.split(&path);
//let cmdV:Vec<&str> = reg.split(path).map(|x|x).collect();
let mut V:Vec<Option<regex::Captures<>>>=vec![];
let mut codeV:Option<regex::Captures<>>=None;
let mut count=0;
for cmd_f in cmdV{//This loop block has been simplified.
count+=1;
if count==1{continue;}
let mut cmd="".to_string();
cmd=cmd_f.to_string();
cmd=cmd.replace(" ","");
let re = Regex::new(r"\{(?P<code>[^\{^\}]{0,})\}").unwrap();
codeV = re.captures(cmd.as_str());
//cmd= re.replace_all(cmd.as_str(),"").to_string();
let cmd_0=cmd.chars().nth(0).unwrap();
//cmd.remove(0);
//V.push(codeV); //Compile error
V.push(None); //OK
}
//V
}
fn main() {
parse("m {abcd} l {efgh}");
}
Though I don't have any workaround.
regex's captures refer to the string they matched for efficiency. This means they can't outlive that string, as the match groups are essentially just offsets into that string.
Since the strings you match are created in the loop body, this means captures can't escape the loop body.
Aside from not creating strings in the loop body (or even the function), the solution / workaround is to convert your capture groups to owned data and store that: instead of trying to return a vector of captures, extract from the capture the data you actually want, convert it to an owned String (or tuple thereof, or whatever), and push that onto your vector.
e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=0107333e30f831a418d75b280e9e2f31
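The idea of storing owned data can be sketched without any dependency. Note the swap: this example uses plain string searching instead of the regex crate so that it stays self-contained, and `code_of` and the slice-based `parse` signature are assumptions made for illustration.

```rust
/// Extracts the text between the first `{` and the following `}`, if any.
/// The returned slice borrows from `cmd`.
fn code_of(cmd: &str) -> Option<&str> {
    let start = cmd.find('{')? + 1;
    let len = cmd[start..].find('}')?;
    Some(&cmd[start..start + len])
}

/// Parses already-split commands and returns one owned result per command.
fn parse(commands: &[&str]) -> Vec<Option<String>> {
    let mut codes = Vec::new();
    for raw in commands {
        // `cmd` is a local String, as in the question; it is dropped at the
        // end of each iteration.
        let cmd = raw.replace(' ', "");
        // Copy the borrowed match into an owned String before `cmd` goes away.
        codes.push(code_of(&cmd).map(String::from));
    }
    codes
}
```

With the regex crate, the corresponding step would be something like `caps.name("code").map(|m| m.as_str().to_owned())` inside the loop, so that only `String`s leave it.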
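You can use cmd.clone().as_str() if you are not sure whether the value has been borrowed, though note that the clone is a temporary, so anything that borrows from it cannot outlive the statement.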
I defined an Attribute type and I have a Vec<Attribute> that I am looping over to retrieve the "best" one. This was similar to my first attempt:
#[derive(Debug)]
struct Attribute;
impl Attribute {
fn new() -> Self {
Self
}
}
fn example(attrs: Vec<Attribute>, root: &mut Attribute) {
let mut best_attr = &Attribute::new();
for a in attrs.iter() {
if is_best(a) {
best_attr = a;
}
}
*root = *best_attr;
}
// simplified for example
fn is_best(_: &Attribute) -> bool {
true
}
I had the following compile error:
error[E0507]: cannot move out of borrowed content
--> src/lib.rs:17:13
|
17 | *root = *best_attr;
| ^^^^^^^^^^ cannot move out of borrowed content
After some searching for a solution, I resolved the error by doing the following:
Adding a #[derive(Clone)] attribute to my Attribute struct
Replacing the final statement with *root = best_attr.clone();
I don't fully understand why this works, and I feel like this is a rough solution to the problem I was having. How does this resolve the error, and is this the correct way to solve this problem?
You are experiencing the basis of the Rust memory model:
every object can (and must!) be owned by only exactly one other object
most types are never implicitly copied and always moved (there are some exceptions: types that implement Copy)
Take this code for example:
let x = String::new();
let y = x;
println!("{}", x);
it generates the error:
error[E0382]: borrow of moved value: `x`
--> src/main.rs:4:20
|
3 | let y = x;
| - value moved here
4 | println!("{}", x);
| ^ value borrowed here after move
|
= note: move occurs because `x` has type `std::string::String`, which does not implement the `Copy` trait
x, of type String, is not implicitly copyable, and thus has been moved into y. x cannot be used any longer.
In your code, when you write *root = *best_attr, you are first dereferencing the reference best_attr, then assigning the dereferenced value to *root. Your Attribute type is not Copy, thus this assignment should be a move.
Then, the compiler complains:
cannot move out of borrowed content
Indeed, best_attr is an immutable reference, which does not allow you to take ownership of the value behind it (it doesn't even allow modifying it). Allowing such a move would put the object owning the value behind the reference in an undefined state, which is exactly what Rust aims to prevent.
In this case, your best option is indeed to create a new object with the same value as the first one, which is exactly what the trait Clone is made for.
#[derive(Clone)] allows you to mark your structs as Clone-able, as long as all of their fields are Clone. In more complex cases, you'll have to implement the trait by hand.
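A short sketch of the two options follows. The `score` field and the `max_by_key` selection are stand-ins for the question's `is_best` logic, introduced only so the example has something to compare. The second function is an addition to the answer above: since `example` receives the `Vec<Attribute>` by value, the winner can also be moved out instead of cloned.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Attribute {
    // Hypothetical field so that "best" has something to compare.
    score: u32,
}

/// Borrowing version: the attributes are only borrowed, so the winner
/// has to be cloned into `root`.
fn best_cloned(attrs: &[Attribute], root: &mut Attribute) {
    if let Some(best) = attrs.iter().max_by_key(|a| a.score) {
        *root = best.clone();
    }
}

/// Owning version: the vector is consumed, so the winner can be moved
/// into `root` without a clone.
fn best_moved(attrs: Vec<Attribute>, root: &mut Attribute) {
    if let Some(best) = attrs.into_iter().max_by_key(|a| a.score) {
        *root = best;
    }
}
```

Which one fits depends on whether the caller still needs the vector afterwards.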