Convert image to bytes and then write to new file - rust

I'm trying to take an image that is converted into a vector of bytes and write those bytes to a new file. The first part is working, and my code is compiling, but the new file that is created ends up empty (nothing is written to it). What am I missing?
Is there a cleaner way to convert Vec<u8> into &[u8] so that it can be written? The way I'm currently doing it seems kind of ridiculous...
use std::os;
use std::io::BufferedReader;
use std::io::File;
use std::io::BufferedWriter;
fn get_file_buffer(path_str: String) -> Vec<u8> {
let path = Path::new(path_str.as_bytes());
let file = File::open(&path);
let mut reader = BufferedReader::new(file);
match reader.read_to_end() {
Ok(x) => x,
Err(_) => vec![0],
}
}
fn main() {
let file = get_file_buffer(os::args()[1].clone());
let mut new_file = File::create(&Path::new("foo.png")).unwrap();
let mut writer = BufferedWriter::new(new_file);
writer.write(String::from_utf8(file).unwrap().as_bytes()).unwrap();
writer.flush().unwrap();
}

Given a Vec<T>, you can get a &[T] out of it in two ways:
Take a reference to a dereference of it, i.e. &*file; this works because Vec<T> implements Deref<[T]>, so *file is effectively of type [T] (though doing that without borrowing it, i.e. &*file, is not legal).
Call the as_slice() method.
As the BufWriter docs say, “the buffer will be written out when the writer is dropped”, so that writer.flush().unwrap() is not strictly necessary, serving only to make handling of errors explicit.
But as for the behaviour you describe, that I mostly do not observe. So long as you do not encounter any I/O errors, the version not using the String dance will work fine, while with the String dance it will panic if the input data is not legal UTF-8 (which if you’re dealing with images it probably won’t be). String::from_utf8 returns None in such cases, and so unwrapping that panics.

Related

Rust read last x lines from file

Currently, I'm using this function in my code:
fn lines_from_file(filename: impl AsRef<Path>) -> Vec<String> {
let file = File::open(filename).expect("no such file");
let buf = BufReader::new(file);
buf.lines().map(|l| l.expect("Could not parse line")).collect()
}
How can I safely read the last x lines only in the file?
The tail crate claims to provide an efficient means of reading the final n lines from a file by means of the BackwardsReader struct, and looks fairly easy to use. I can't swear to its efficiency (it looks like it performs progressively larger reads seeking further and further back in the file, which is slightly suboptimal relative to an optimized memory map-based solution), but it's an easy all-in-one package and the inefficiencies likely won't matter in 99% of all use cases.
To avoid storing files in memory (because the files were quite large) I chose to use rev_buffer_reader and to only take x elements from the iterator
fn lines_from_file(file: &File, limit: usize) -> Vec<String> {
let buf = RevBufReader::new(file);
buf.lines().take(limit).map(|l| l.expect("Could not parse line")).collect()
}

Unknown size at compile time when trying to print string contents in Rust

I have a couple of pieces of code, once errors out and the other doesn't, and I don't understand why.
The one that errors out when compiling:
fn main() {
let s1 = String::from("hello");
println!("{}", *s1);
}
This throws: doesn't have a size known at compile-time, on the line println!("{}", *s1);
The one that works:
fn main() {
let s1 = String::from("hello");
print_string(&s1);
}
fn print_string(s1: &String) {
println!("{}", *s1);
}
Why is this happening? Aren't both correct ways to access the string contents and printing them?
In the first snippet you’re dereferencing a String. This yields an str which is a dynamically sized type (sometimes called unsized types in older texts). DSTs are somewhat difficult to use directly
In the second snippet you’re dereferencing a &String, which yields a regular String, which is a normal sized type.
In both cases the dereference is completely useless, why are you even using one?

Read an arbitrary number of bytes from type implementing Read

I have something that is Read; currently it's a File. I want to read a number of bytes from it that is only known at runtime (length prefix in a binary data structure).
So I tried this:
let mut vec = Vec::with_capacity(length);
let count = file.read(vec.as_mut_slice()).unwrap();
but count is zero because vec.as_mut_slice().len() is zero as well.
[0u8;length] of course doesn't work because the size must be known at compile time.
I wanted to do
let mut vec = Vec::with_capacity(length);
let count = file.take(length).read_to_end(vec).unwrap();
but take's receiver parameter is a T and I only have &mut T (and I'm not really sure why it's needed anyway).
I guess I can replace File with BufReader and dance around with fill_buf and consume which sounds complicated enough but I still wonder: Have I overlooked something?
Like the Iterator adaptors, the IO adaptors take self by value to be as efficient as possible. Also like the Iterator adaptors, a mutable reference to a Read is also a Read.
To solve your problem, you just need Read::by_ref:
use std::io::Read;
use std::fs::File;
fn main() {
let mut file = File::open("/etc/hosts").unwrap();
let length = 5;
let mut vec = Vec::with_capacity(length);
file.by_ref().take(length as u64).read_to_end(&mut vec).unwrap();
let mut the_rest = Vec::new();
file.read_to_end(&mut the_rest).unwrap();
}
1. Fill-this-vector version
Your first solution is close to work. You identified the problem but did not try to solve it! The problem is that whatever the capacity of the vector, it is still empty (vec.len() == 0). Instead, you could actually fill it with empty elements, such as:
let mut vec = vec![0u8; length];
The following full code works:
#![feature(convert)] // needed for `as_mut_slice()` as of 2015-07-19
use std::fs::File;
use std::io::Read;
fn main() {
let mut file = File::open("/usr/share/dict/words").unwrap();
let length: usize = 100;
let mut vec = vec![0u8; length];
let count = file.read(vec.as_mut_slice()).unwrap();
println!("read {} bytes.", count);
println!("vec = {:?}", vec);
}
Of course, you still have to check whether count == length, and read more data into the buffer if that's not the case.
2. Iterator version
Your second solution is better because you won't have to check how many bytes have been read, and you won't have to re-read in case count != length. You need to use the bytes() function on the Read trait (implemented by File). This transform the file into a stream (i.e an iterator). Because errors can still happen, you don't get an Iterator<Item=u8> but an Iterator<Item=Result<u8, R::Err>>. Hence you need to deal with failures explicitly within the iterator. We're going to use unwrap() here for simplicity:
use std::fs::File;
use std::io::Read;
fn main() {
let file = File::open("/usr/share/dict/words").unwrap();
let length: usize = 100;
let vec: Vec<u8> = file
.bytes()
.take(length)
.map(|r: Result<u8, _>| r.unwrap()) // or deal explicitly with failure!
.collect();
println!("vec = {:?}", vec);
}
You can always use a bit of unsafe to create a vector of uninitialized memory. It is perfectly safe to do with primitive types:
let mut v: Vec<u8> = Vec::with_capacity(length);
unsafe { v.set_len(length); }
let count = file.read(vec.as_mut_slice()).unwrap();
This way, vec.len() will be set to its capacity, and all bytes in it will be uninitialized (likely zeros, but possibly some garbage). This way you can avoid zeroing the memory, which is pretty safe for primitive types.
Note that read() method on Read is not guaranteed to fill the whole slice. It is possible for it to return with number of bytes less than the slice length. There are several RFCs on adding methods to fill this gap, for example, this one.

Idiomatic way to parse binary data into primitive types

I've written the following method to parse binary data from a gzipped file using GzDecoder from the Flate2 library
fn read_primitive<T: Copy>(reader: &mut GzDecoder<File>) -> std::io::Result<T>
{
let sz = mem::size_of::<T>();
let mut vec = Vec::<u8>::with_capacity(sz);
let ret: T;
unsafe{
vec.set_len(sz);
let mut s = &mut vec[..];
try!(reader.read(&mut s));
let ptr :*const u8 = s.as_ptr();
ret = *(ptr as *const T)
}
Ok(ret)
}
It works, but I'm not particularly happy with the code, especially with using the dummy vector and the temporary variable ptr. It all feels very inelegant to me and I'm sure there's a better way to do this. I'd be happy to hear any suggestions of how to clean up this code.
Your code allows any copyable T, not just primitives. That means you could try to parse in something with a reference, which is probably not what you want:
#[derive(Copy)]
struct Foo(&str);
However, the general sketch of your code is what I'd expect. You need a temporary place to store some data, and then you must convert that data to the appropriate primitive (perhaps dealing with endinaness issues).
I'd recommend the byteorder library. With it, you call specific methods for the primitive that is required:
reader.read_u16::<LittleEndian>()
Since these methods know the desired size, they can stack-allocate an array to use as the temporary buffer, which is likely a bit more efficient than a heap-allocation. Additionally, I'd suggest changing your code to accept a generic object with the Read trait, instead of the specific GzDecoder.
You may also want to look into a serialization library like rustc-serialize or serde to see if they fit any of your use cases.

Is this the right way to read lines from file and split them into words in Rust?

Editor's note: This code example is from a version of Rust prior to 1.0 and is not syntactically valid Rust 1.0 code. Updated versions of this code produce different errors, but the answers still contain valuable information.
I've implemented the following method to return me the words from a file in a 2 dimensional data structure:
fn read_terms() -> Vec<Vec<String>> {
let path = Path::new("terms.txt");
let mut file = BufferedReader::new(File::open(&path));
return file.lines().map(|x| x.unwrap().as_slice().words().map(|x| x.to_string()).collect()).collect();
}
Is this the right, idiomatic and efficient way in Rust? I'm wondering if collect() needs to be called so often and whether it's necessary to call to_string() here to allocate memory. Maybe the return type should be defined differently to be more idiomatic and efficient?
There is a shorter and more readable way of getting words from a text file.
use std::io::{BufRead, BufReader};
use std::fs::File;
let reader = BufReader::new(File::open("file.txt").expect("Cannot open file.txt"));
for line in reader.lines() {
for word in line.unwrap().split_whitespace() {
println!("word '{}'", word);
}
}
You could instead read the entire file as a single String and then build a structure of references that points to the words inside:
use std::io::{self, Read};
use std::fs::File;
fn filename_to_string(s: &str) -> io::Result<String> {
let mut file = File::open(s)?;
let mut s = String::new();
file.read_to_string(&mut s)?;
Ok(s)
}
fn words_by_line<'a>(s: &'a str) -> Vec<Vec<&'a str>> {
s.lines().map(|line| {
line.split_whitespace().collect()
}).collect()
}
fn example_use() {
let whole_file = filename_to_string("terms.txt").unwrap();
let wbyl = words_by_line(&whole_file);
println!("{:?}", wbyl)
}
This will read the file with less overhead because it can slurp it into a single buffer, whereas reading lines with BufReader implies a lot of copying and allocating, first into the buffer inside BufReader, and then into a newly allocated String for each line, and then into a newly allocated the String for each word. It will also use less memory, because the single large String and vectors of references are more compact than many individual Strings.
A drawback is that you can't directly return the structure of references, because it can't live past the stack frame the holds the single large String. In example_use above, we have to put the large String into a let in order to call words_by_line. It is possible to get around this with unsafe code and wrapping the String and references in a private struct, but that is much more complicated.

Resources