Vecs support std::io::Write, so code can be written that takes a File or Vec, for example. From the API reference, it looks like neither Vec nor slices support std::io::Read.
Is there a convenient way to achieve this? Does it require writing a wrapper struct?
Here is an example of working code, that reads and writes a file, with a single line commented that should read a vector.
use ::std::io;
// Generic IO
fn write_4_bytes<W>(mut file: W) -> Result<usize, io::Error>
where W: io::Write,
{
let len = file.write(b"1234")?;
Ok(len)
}
fn read_4_bytes<R>(mut file: R) -> Result<[u8; 4], io::Error>
where R: io::Read,
{
let mut buf: [u8; 4] = [0; 4];
file.read(&mut buf)?;
Ok(buf)
}
// Type specific
fn write_read_vec() {
let mut vec_as_file: Vec<u8> = Vec::new();
{ // Write
println!("Writing Vec... {}", write_4_bytes(&mut vec_as_file).unwrap());
}
{ // Read
// println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Comment this line above to avoid an error!
}
}
fn write_read_file() {
let filepath = "temp.txt";
{ // Write
let mut file_as_file = ::std::fs::File::create(filepath).expect("open failed");
println!("Writing File... {}", write_4_bytes(&mut file_as_file).unwrap());
}
{ // Read
let mut file_as_file = ::std::fs::File::open(filepath).expect("open failed");
println!("Reading File... {:?}", read_4_bytes(&mut file_as_file).unwrap());
}
}
fn main() {
write_read_vec();
write_read_file();
}
This fails with the error:
error[E0277]: the trait bound `std::vec::Vec<u8>: std::io::Read` is not satisfied
--> src/main.rs:29:42
|
29 | println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
| ^^^^^^^^^^^^ the trait `std::io::Read` is not implemented for `std::vec::Vec<u8>`
|
= note: required by `read_4_bytes`
I'd like to write tests for a file format encoder/decoder, without having to write to the file-system.
While vectors don't support std::io::Read, slices do.
There is some confusion here caused by Rust being able to coerce a Vec into a slice in some situations but not others.
In this case, an explicit coercion to a slice is needed because at the stage coercions are applied, the compiler doesn't know that Vec<u8> doesn't implement Read.
The code in the question will work when the vector is coerced into a slice using one of the following methods:
read_4_bytes(&*vec_as_file)
read_4_bytes(&vec_as_file[..])
read_4_bytes(vec_as_file.as_slice()).
Note:
When asking the question initially, I was taking &Read instead of Read. This made passing a reference to a slice fail, unless I'd passed in &&*vec_as_file which I didn't think to do.
Recent versions of rust you can also use as_slice() to convert a Vec to a slice.
Thanks to #arete on #rust for finding the solution!
std::io::Cursor
std::io::Cursor is a simple and useful wrapper that implements Read for Vec<u8>, so it allows to use vector as a readable entity.
let mut file = Cursor::new(vector);
read_something(&mut file);
And documentation shows how to use Cursor instead of File to write unit-tests!
Working example:
use std::io::Cursor;
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
let mut file = Cursor::new(vector);
read_something(&mut file);
}
From the documentation about std::io::Cursor:
Cursors are typically used with in-memory buffers to allow them to implement Read and/or Write...
The standard library implements some I/O traits on various types which are commonly used as a buffer, like Cursor<Vec<u8>> and Cursor<&[u8]>.
Slice
The example above works for slices as well. In that case it would look like the following:
read_something(&mut &vector[..]);
Working example:
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
read_something(&mut &vector[..]);
}
&mut &vector[..] is a "mutable reference to a slice" (a reference to a reference to a part of vector), so I just find the explicit option with Cursor to be more clear and elegant.
Cursor <-> Slice
Even more: if you have a Cursor that owns a buffer, and you need to emulate, for instance, a part of a "file", you can get a slice from the Cursor and pass to the function.
read_something(&mut &file.get_ref()[1..3]);
Related
To read the bytes of a PNG file, I want to create a function called read_8_bytes which will read the next 8 bytes in the file each time it's called.
fn main(){
let png = File::open("test.png").expect("1");
let mut png_reader = BufReader::new(png);
let mut byteBuffer: Vec<u8> = vec![0;8];
png_reader.read_exact(&mut byteBuffer).expect("2");
}
This works fine and if I keep calling read_exact from main I can read the next 8 bytes. I tried to create a function to do this and the solution just seems needlessly complicated. I'm wondering if there is a better way.
I thought I have to pass the BufReader to the function, but due to how Rust works this makes things complicated and I end up working out I need to do something like:
fn read_eight_bytes<R: BufRead>(fd: &mut R)
This compiles but I'm not happy because I don't understand why this needed to be done and seems complex. Is there a simple way of having a function I can pass a file descriptor type thing to and have it store the position like in C without having to do this?
Looking at your question, I think you are trying to say that you are confused as to why the <R: BufRead> is necessary or furthermore why this even works.
In your example, this generic is not strictly necessary. One could implement the function you describe like so:
use std::{fs, io};
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes(file: &mut fs::File) -> io::Result<[u8; 8]> {
use io::Read;
let mut bytes = [0; 8];
file.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
This is perfectly valid and hopefully should make sense.
But then, why does fn read_eight_bytes<R: BufRead>(file: &mut R) -> [u8; 8] work? First of all, I assume you understand the following concepts:
Generics
Traits
Given an understanding of the above concepts, you should know that this syntax means that the function read_eight_bytes is a generic function with a generic type named R. You should then also understand that the generic has a trait bound, requiring the type R to implement BufRead. And that this function takes a parameter which is a mutable reference to the variable file, which is of the type R.
Now taking a look at the definition of BufRead: we see that it contains several functions. But surprisingly there is no read_exact function! Why does a function like this compile?
use std::{fs, io};
use io::BufRead;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: BufRead>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Note: I have altered the return type to io::Result<...>. This is considered to be a better practice compared to unwraping every Result.
I have also changed the function call to use a BufReader because BufReader implements BufRead whilst File does not. I will cover the difference a little further below.
The reason this works is because BufRead is a Super Trait. This means that any type that implements BufRead must also implement Read too. And thus it must have the read_exact function!
Given our function never requires the functions on BufRead we could change the trait bound to only require Read:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Now here is something interesting about this change. The read_eight_bytes function can now be called in (at least) two different ways:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Why is this? This is because both File and BufReader implement the Read trait. And thus can both be used with the read_eight_bytes function!
So then why would someone want to use either File or BufReader over the other?
Well the BufReader documentation explains this:
The BufReader struct adds buffering to any reader.
It can be excessively inefficient to work directly with a Read
instance. For example, every call to read on TcpStream results in a
system call. A BufReader performs large, infrequent reads on the
underlying Read and maintains an in-memory buffer of the results.
BufReader can improve the speed of programs that make small and
repeated read calls to the same file or network socket. It does not
help when reading very large amounts at once, or reading just one or a
few times. It also provides no advantage when reading from a source
that is already in memory, like a Vec.
Now, remember how before we wrote this function just for the File type? The primary reason why one would want to write it with generics would be such that a caller can make the choice presented above. This is common practice in libraries where such a choice really does matter. However, generics come at the cost of increased compile times (when used excessively) and increased code complexity.
I'm trying to understand how trait objects are implemented in Rust. Please let me know if the following understanding is correct.
I have a function that takes any type that implements the Write trait:
fn some_func(write_to: &mut Write) {}
In any place where we have a type that implements this trait and calls the above function, the compiler generates a "trait object", probably by adding a call to TraitObject::new(data, vtable).
If we have something like:
let input = get_user_input(); // say we are expecting the input to be 1 or 2
let mut file = File::new("blah.txt").unwrap();
let mut vec: Vec<u8> = vec![1, 2, 3];
match input {
1 => some_func(&mut file),
2 => some_func(&mut vec),
}
will probably turn out to be:
match input {
1 => {
let file_write_trait_object: &mut Write =
TraitObject::new(&file, &vtable_for_file_write_trait);
some_func(file_write_trait_object);
}
2 => {
let vec_write_trait_object: &mut Write =
TraitObject::new(&vec, &vtable_for_vec_write_trait);
some_func(vec_write_trait_object);
}
}
Inside some_func the compiler will just access the methods used based on the vtable in the TraitObject passed along.
Trait objects are fat pointers, so fn some_func(write_to: &mut Write) compiles to something like fn some_func(_: *mut OpaqueStruct, _: *const WriteVtable).
With this sample code:
use std::fs::{File};
use std::io::{BufRead, BufReader};
use std::path::Path;
type BoxIter<T> = Box<Iterator<Item=T>>;
fn tokens_from_str<'a>(text: &'a str)
-> Box<Iterator<Item=String> + 'a> {
Box::new(text.lines().flat_map(|s|
s.split_whitespace().map(|s| s.to_string())
))
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P)
-> BoxIter<BoxIter<String>>
where P: AsRef<Path> {
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| tokens_from_str(&s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
I get this compiler error message:
rustc 1.18.0 (03fc9d622 2017-06-06)
error: `s` does not live long enough
--> <anon>:23:35
|
23 | .map(|s| tokens_from_str(&s));
| ^- borrowed value only lives until here
| |
| does not live long enough
|
= note: borrowed value must be valid for the static lifetime...
My questions are:
How can this be fixed (without changing the function signatures, if possible?)
Any suggestions on better function arguments and return values?
One issue is that .split_whitespace() takes a reference, and doesn't own its content. So when you try to construct a SplitWhitespace object with an owned owned object (this happens when you call .map(|s| tokens_from_str(&s))), the string s is dropped while SplitWhitespace is still trying to reference it. I wrote a quick fix to this by creating a struct that takes ownership of the String and yields a SplitWhitespace on demand.
use std::fs::File;
use std::io::{BufRead, BufReader};
use std::path::Path;
use std::iter::IntoIterator;
use std::str::SplitWhitespace;
pub struct SplitWhitespaceOwned(String);
impl<'a> IntoIterator for &'a SplitWhitespaceOwned {
type Item = &'a str;
type IntoIter = SplitWhitespace<'a>;
fn into_iter(self) -> Self::IntoIter {
self.0.split_whitespace()
}
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> Box<Iterator<Item = SplitWhitespaceOwned>>
where P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader
.lines()
.filter_map(|result| result.ok())
.map(|s| SplitWhitespaceOwned(s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path>
{
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
fn main() {
let t = tokens_from_path("test.txt");
for line in t {
for word in &line {
println!("{}", word);
}
}
}
Disclaimer: Frame Challenge
When dealing with huge files, the simplest solution is to use Memory Mapped Files.
That is, you tell the OS that you want the whole file to be accessible in memory, and it's up to it to deal with paging parts of the file in and out of memory.
Once this is down, then your whole file is accessible as a &[u8] or &str (at your convenience), and you can trivially access slices of it.
It may not always be the fastest solution; it certainly is the easiest.
The problem here is that you do use to_string() to turn each item into an owned value, this is done lazily. Since it's lazy, the value before to_string is used (a &str) still exists in the returned iterator's state, and thus is invalid (since the source String is dropped as soon as your map closure returns).
Naive solution
The simplest solution here is to remove the lazy evaluation for that part of the iterator, and simply allocate all tokens as soon as the line is allocated. This will not be as fast, and will involve an additional allocation, but has minimal changes from your current function, and keeps the same signature:
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<BoxIter<String>>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| {
let collected = tokens_from_str(&s).collect::<Vec<_>>();
Box::new(collected.into_iter()) as Box<Iterator<Item=String>>
});
Box::new(iter)
}
This solution will be fine for any small workload, and it will only allocate about twice as much memory for the line at the same time. There's a performance penalty, but unless you have 10mb+ lines, it probably won't matter.
If you do choose this solution, I'd recommend changing the function signature of tokens_from_path to directly return a BoxIter<String>:
pub fn tokens_from_path<P>(path_arg: P) -> BoxIter<String>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.flat_map(|s| {
let collected = tokens_from_str(&s).collect::<Vec<_>>();
Box::new(collected.into_iter()) as Box<Iterator<Item=String>>
});
Box::new(iter)
}
Alternative: decouple tokens_from_path and tokens_from_str
The original code doesn't work because you are trying to return borrows to a String which you don't return.
We can fix that by returning the String instead - just hidden behind an opaque API. This is pretty similar to breeden's solution, but slightly different in execution.
use std::fs::{File};
use std::io::{BufRead, BufReader};
use std::path::Path;
type BoxIter<T> = Box<Iterator<Item=T>>;
/// Structure representing in our code a line, but with an opaque API surface.
pub struct TokenIntermediate(String);
impl<'a> IntoIterator for &'a TokenIntermediate {
type Item = String;
type IntoIter = Box<Iterator<Item=String> + 'a>;
fn into_iter(self) -> Self::IntoIter {
// delegate to tokens_from_str
tokens_from_str(&self.0)
}
}
fn tokens_from_str<'a>(text: &'a str) -> Box<Iterator<Item=String> + 'a> {
Box::new(text.lines().flat_map(|s|
s.split_whitespace().map(|s| s.to_string())
))
}
// Returns an iterator of an iterator. The use case is a very large file where
// each line is very long. The outer iterator goes over the file's lines.
// The inner iterator returns the words of each line.
pub fn token_parts_from_path<P>(path_arg: P) -> BoxIter<TokenIntermediate>
where
P: AsRef<Path>
{
let reader = reader_from_path(path_arg);
let iter = reader.lines()
.filter_map(|result| result.ok())
.map(|s| TokenIntermediate(s));
Box::new(iter)
}
fn reader_from_path<P>(path_arg: P) -> BufReader<File>
where P: AsRef<Path> {
let path = path_arg.as_ref();
let file = File::open(path).unwrap();
BufReader::new(file)
}
As you notice, there's no difference to tokens_from_str, and tokens_from_path just returns this opaque TokenIntermediate struct. This will be just as usable as your original solution, all it does is push the ownership of your intermediate String values onto the caller, so they can iterate the tokens in them.
I need a completely in-memory object that I can give to BufReader and BufWriter. Something like Python's StringIO. I want to write to and read from such an object using methods ordinarily used with Files.
Is there a way to do this using the standard library?
In fact there is a way: Cursor<T>!
(please also read Shepmaster's answer on why often it's even easier)
In the documentation you can see that there are the following impls:
impl<T> Seek for Cursor<T> where T: AsRef<[u8]>
impl<T> Read for Cursor<T> where T: AsRef<[u8]>
impl Write for Cursor<Vec<u8>>
impl<T> AsRef<[T]> for Vec<T>
From this you can see that you can use the type Cursor<Vec<u8>> just as an ordinary file, because Read, Write and Seek are implemented for that type!
Little example (Playground):
use std::io::{Cursor, Read, Seek, SeekFrom, Write};
// Create fake "file"
let mut c = Cursor::new(Vec::new());
// Write into the "file" and seek to the beginning
c.write_all(&[1, 2, 3, 4, 5]).unwrap();
c.seek(SeekFrom::Start(0)).unwrap();
// Read the "file's" contents into a vector
let mut out = Vec::new();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
For a more useful example, check the documentation linked above.
You don't need a Cursor most of the time.
object that I can give to BufReader and BufWriter
BufReader requires a value that implements Read:
impl<R: Read> BufReader<R> {
pub fn new(inner: R) -> BufReader<R>
}
BufWriter requires a value that implements Write:
impl<W: Write> BufWriter<W> {
pub fn new(inner: W) -> BufWriter<W> {}
}
If you view the implementors of Read you will find impl<'a> Read for &'a [u8].
If you view the implementors of Write, you will find impl Write for Vec<u8>.
use std::io::{Read, Write};
fn main() {
// Create fake "file"
let mut file = Vec::new();
// Write into the "file"
file.write_all(&[1, 2, 3, 4, 5]).unwrap();
// Read the "file's" contents into a new vector
let mut out = Vec::new();
let mut c = file.as_slice();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
}
Writing to a Vec will always append to the end. We also take a slice to the Vec that we can update. Each read of c will advance the slice further and further until it is empty.
The main differences from Cursor:
Cannot seek the data, so you cannot easily re-read data
Cannot write to anywhere but the end
If you want to use BufReader with an in-memory String, you can use the as_bytes() method:
use std::io::BufRead;
use std::io::BufReader;
use std::io::Read;
fn read_buff<R: Read>(mut buffer: BufReader<R>) {
let mut data = String::new();
let _ = buffer.read_line(&mut data);
println!("read_buff got {}", data);
}
fn main() {
read_buff(BufReader::new("Potato!".as_bytes()));
}
This prints read_buff got Potato!. There is no need to use a cursor for this case.
To use an in-memory String with BufWriter, you can use the as_mut_vec method. Unfortunately it is unsafe and I have not found any other way. I don't like the Cursor approach since it consumes the vector and I have not found a way yet to use the Cursor together with BufWriter.
use std::io::BufWriter;
use std::io::Write;
pub fn write_something<W: Write>(mut buf: BufWriter<W>) {
buf.write("potato".as_bytes());
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::{BufWriter};
#[test]
fn testing_bufwriter_and_string() {
let mut s = String::new();
write_something(unsafe { BufWriter::new(s.as_mut_vec()) });
assert_eq!("potato", &s);
}
}
The following code reads space-delimited records from stdin, and writes comma-delimited records to stdout. Even with optimized builds it's rather slow (about twice as slow as using, say, awk).
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let fields: Vec<_> = line.split(' ').collect();
println!("{}", fields.join(","));
}
}
One obvious improvement would be to use itertools to join without allocating a vector (the collect call causes an allocation). However, I tried a different approach:
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
This version tries to reuse the same vector over and over. Unfortunately, the compiler complains:
error: `line` does not live long enough
--> src/main.rs:7:22
|
7 | cache.extend(line.split(' '));
| ^^^^
|
note: reference must be valid for the block suffix following statement 1 at 5:39...
--> src/main.rs:5:40
|
5 | let mut cache = Vec::<&str>::new();
| ^
note: ...but borrowed value is only valid for the for at 6:4
--> src/main.rs:6:5
|
6 | for line in stdin.lock().lines().map(|x| x.unwrap()) {
| ^
error: aborting due to previous error
Which of course makes sense: the line variable is only alive in the body of the for loop, whereas cache keeps a pointer into it across iterations. But that error still looks spurious to me: since the cache is cleared after each iteration, no reference to line can be kept, right?
How can I tell the borrow checker about this?
The only way to do this is to use transmute to change the Vec<&'a str> into a Vec<&'b str>. transmute is unsafe and Rust will not raise an error if you forget the call to clear here. You might want to extend the unsafe block up to after the call to clear to make it clear (no pun intended) where the code returns to "safe land".
use std::io::BufRead;
use std::mem;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let cache: &mut Vec<&str> = unsafe { mem::transmute(&mut cache) };
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache.clear();
}
}
In this case Rust doesn't know what you're trying to do. Unfortunately, .clear() does not affect how .extend() is checked.
The cache is a "vector of strings that live as long as the main function", but in extend() calls you're appending "strings that live only as long as one loop iteration", so that's a type mismatch. The call to .clear() doesn't change the types.
Usually such limited-time uses are expressed by making a long-lived opaque object that enables access to its memory by borrowing a temporary object with the right lifetime, like RefCell.borrow() gives a temporary Ref object. Implementation of that would be a bit involved and would require unsafe methods for recycling Vec's internal memory.
In this case an alternative solution could be to avoid any allocations at all (.join() allocates too) and stream the printing thanks to Peekable iterator wrapper:
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let mut fields = line.split(' ').peekable();
while let Some(field) = fields.next() {
print!("{}", field);
if fields.peek().is_some() {
print!(",");
}
}
print!("\n");
}
BTW: Francis' answer with transmute is good too. You can use unsafe to say you know what you're doing and override the lifetime check.
Itertools has .format() for the purpose of lazy formatting, which skips allocating a string too.
use std::io::BufRead;
use itertools::Itertools;
fn main() {
let stdin = std::io::stdin();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
println!("{}", line.split(' ').format(","));
}
}
A digression, something like this is a “safe abstraction” in the littlest sense of the solution in another answer here:
fn repurpose<'a, T: ?Sized>(mut v: Vec<&T>) -> Vec<&'a T> {
v.clear();
unsafe {
transmute(v)
}
}
Another approach is to refrain from storing references altogether, and to store indices instead. This trick can also be useful in other data structure contexts, so this might be a nice opportunity to try it out.
use std::io::BufRead;
fn main() {
let stdin = std::io::stdin();
let mut cache = Vec::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
cache.push(0);
cache.extend(line.match_indices(' ').map(|x| x.0 + 1));
// cache now contains the indices where new words start
// do something with this information
for i in 0..(cache.len() - 1) {
print!("{},", &line[cache[i]..(cache[i + 1] - 1)]);
}
println!("{}", &line[*cache.last().unwrap()..]);
cache.clear();
}
}
Though you made the remark yourself in the question, I feel the need to point out that there are more elegant methods to do this using iterators, that might avoid the allocation of a vector altogether.
The approach above was inspired by a similar question here, and becomes more useful if you need to do something more complicated than printing.
Elaborating on Francis's answer about using transmute(), this could be safely abstracted, I think, with this simple function:
pub fn zombie_vec<'a, 'b, T: ?Sized>(mut data: Vec<&'a T>) -> Vec<&'b T> {
data.clear();
unsafe {
std::mem::transmute(data)
}
}
Using this, the original code would be:
fn main() {
let stdin = std::io::stdin();
let mut cache0 = Vec::<&str>::new();
for line in stdin.lock().lines().map(|x| x.unwrap()) {
let mut cache = cache0; // into the loop
cache.extend(line.split(' '));
println!("{}", cache.join(","));
cache0 = zombie_vec(cache); // out of the loop
}
}
You need to move the outer vector into every loop iteration, and restore it back to before you finish, while safely erasing the local lifetime.
The safe solution is to use .drain(..) instead of .clear() where .. is a "full range". It returns an iterator, so drained elements can be processed in a loop. It is also available for other collections (String, HashMap, etc.)
fn main() {
let mut cache = Vec::<&str>::new();
for line in ["first line allocates for", "second"].iter() {
println!("Size and capacity: {}/{}", cache.len(), cache.capacity());
cache.extend(line.split(' '));
println!(" {}", cache.join(","));
cache.drain(..);
}
}