Read up to n bytes from file [duplicate] - rust

I have a struct with a BufStream<T> where T: Read+Write.
The BufStream can be a TcpStream and I'd like to read n bytes from it.
Not a fixed amount of bytes in a predefined buffer, but I have a string/stream which indicates the number of bytes to read next.
Is there a nice way to do that?

Since Rust 1.6, Read::read_exact can be used to do this. If bytes_to_read is the number of bytes you need to read, possibly determined at runtime, and reader is the stream to read from:
let mut buf = vec![0u8; bytes_to_read];
reader.read_exact(&mut buf)?;
The part that wasn't clear to me from the read_exact documentation was that the target buffer can be a dynamically-allocated Vec.
Thanks to the Rust Gitter community for pointing me to this solution.

It sounds like you want Read::take and Read::read_to_end.
This will allow you to read data into a &mut Vec<u8>, which is useful when you want to reuse an existing buffer or don't have an appropriately sized slice already. This allows you to avoid initializing the data with dummy values before overwriting them with the newly-read information:
use std::{
io::{prelude::*, BufReader},
str,
};
fn read_n<R>(reader: R, bytes_to_read: u64) -> Vec<u8>
where
R: Read,
{
let mut buf = vec![];
let mut chunk = reader.take(bytes_to_read);
// Do appropriate error handling for your situation
// Maybe it's OK if you didn't read enough bytes?
let n = chunk.read_to_end(&mut buf).expect("Didn't read enough");
assert_eq!(bytes_to_read as usize, n);
buf
}
fn main() {
let input_data = b"hello world";
let mut reader = BufReader::new(&input_data[..]);
let first = read_n(&mut reader, 5);
let _ = read_n(&mut reader, 1);
let second = read_n(&mut reader, 5);
println!(
"{:?}, {:?}",
str::from_utf8(&first),
str::from_utf8(&second)
);
}
If you are worried that Read::take consumes the reader by reference, note that take comes from Read and Read is implemented for any mutable reference to a type that implements Read. You can also use Read::by_ref to create this mutable reference.
See also:
Whats the idiomatic way to reference BufReader/BufWriter when passing it between functions?
Why does Iterator::take_while take ownership of the iterator?

Related

Is there a simpler way to pass a BufReader to a function?

To read the bytes of a PNG file, I want to create a function called read_8_bytes which will read the next 8 bytes in the file each time it's called.
fn main(){
let png = File::open("test.png").expect("1");
let mut png_reader = BufReader::new(png);
let mut byteBuffer: Vec<u8> = vec![0;8];
png_reader.read_exact(&mut byteBuffer).expect("2");
}
This works fine and if I keep calling read_exact from main I can read the next 8 bytes. I tried to create a function to do this and the solution just seems needlessly complicated. I'm wondering if there is a better way.
I thought I have to pass the BufReader to the function, but due to how Rust works this makes things complicated and I end up working out I need to do something like:
fn read_eight_bytes<R: BufRead>(fd: &mut R)
This compiles but I'm not happy because I don't understand why this needed to be done and seems complex. Is there a simple way of having a function I can pass a file descriptor type thing to and have it store the position like in C without having to do this?
Looking at your question, I think you are trying to say that you are confused as to why the <R: BufRead> is necessary or furthermore why this even works.
In your example, this generic is not strictly necessary. One could implement the function you describe like so:
use std::{fs, io};
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes(file: &mut fs::File) -> io::Result<[u8; 8]> {
use io::Read;
let mut bytes = [0; 8];
file.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
This is perfectly valid and hopefully should make sense.
But then, why does fn read_eight_bytes<R: BufRead>(file: &mut R) -> [u8; 8] work? First of all, I assume you understand the following concepts:
Generics
Traits
Given an understanding of the above concepts, you should know that this syntax means that the function read_eight_bytes is a generic function with a generic type named R. You should then also understand that the generic has a trait bound, requiring the type R to implement BufRead. And that this function takes a parameter which is a mutable reference to the variable file, which is of the type R.
Now taking a look at the definition of BufRead: we see that it contains several functions. But surprisingly there is no read_exact function! Why does a function like this compile?
use std::{fs, io};
use io::BufRead;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: BufRead>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Note: I have altered the return type to io::Result<...>. This is considered to be a better practice compared to unwraping every Result.
I have also changed the function call to use a BufReader because BufReader implements BufRead whilst File does not. I will cover the difference a little further below.
The reason this works is because BufRead is a Super Trait. This means that any type that implements BufRead must also implement Read too. And thus it must have the read_exact function!
Given our function never requires the functions on BufRead we could change the trait bound to only require Read:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Now here is something interesting about this change. The read_eight_bytes function can now be called in (at least) two different ways:
use std::{fs, io};
use io::Read;
fn main() -> io::Result<()> {
let mut file = fs::File::open("./path/to/file")?;
let bytes = read_eight_bytes(&mut file)?;
println!("{:?}", bytes);
let file = fs::File::open("./path/to/file")?;
let mut reader = io::BufReader::new(file);
let bytes = read_eight_bytes(&mut reader)?;
println!("{:?}", bytes);
Ok(())
}
fn read_eight_bytes<R: Read>(reader: &mut R) -> io::Result<[u8; 8]> {
let mut bytes = [0; 8];
reader.read_exact(&mut bytes)?;
Ok(bytes)
}
Playground
Why is this? This is because both File and BufReader implement the Read trait. And thus can both be used with the read_eight_bytes function!
So then why would someone want to use either File or BufReader over the other?
Well the BufReader documentation explains this:
The BufReader struct adds buffering to any reader.
It can be excessively inefficient to work directly with a Read
instance. For example, every call to read on TcpStream results in a
system call. A BufReader performs large, infrequent reads on the
underlying Read and maintains an in-memory buffer of the results.
BufReader can improve the speed of programs that make small and
repeated read calls to the same file or network socket. It does not
help when reading very large amounts at once, or reading just one or a
few times. It also provides no advantage when reading from a source
that is already in memory, like a Vec.
Now, remember how before we wrote this function just for the File type? The primary reason why one would want to write it with generics would be such that a caller can make the choice presented above. This is common practice in libraries where such a choice really does matter. However, generics come at the cost of increased compile times (when used excessively) and increased code complexity.

How to `read` a number of bytes into a `Vec`? [duplicate]

I have a struct with a BufStream<T> where T: Read+Write.
The BufStream can be a TcpStream and I'd like to read n bytes from it.
Not a fixed amount of bytes in a predefined buffer, but I have a string/stream which indicates the number of bytes to read next.
Is there a nice way to do that?
Since Rust 1.6, Read::read_exact can be used to do this. If bytes_to_read is the number of bytes you need to read, possibly determined at runtime, and reader is the stream to read from:
let mut buf = vec![0u8; bytes_to_read];
reader.read_exact(&mut buf)?;
The part that wasn't clear to me from the read_exact documentation was that the target buffer can be a dynamically-allocated Vec.
Thanks to the Rust Gitter community for pointing me to this solution.
It sounds like you want Read::take and Read::read_to_end.
This will allow you to read data into a &mut Vec<u8>, which is useful when you want to reuse an existing buffer or don't have an appropriately sized slice already. This allows you to avoid initializing the data with dummy values before overwriting them with the newly-read information:
use std::{
io::{prelude::*, BufReader},
str,
};
fn read_n<R>(reader: R, bytes_to_read: u64) -> Vec<u8>
where
R: Read,
{
let mut buf = vec![];
let mut chunk = reader.take(bytes_to_read);
// Do appropriate error handling for your situation
// Maybe it's OK if you didn't read enough bytes?
let n = chunk.read_to_end(&mut buf).expect("Didn't read enough");
assert_eq!(bytes_to_read as usize, n);
buf
}
fn main() {
let input_data = b"hello world";
let mut reader = BufReader::new(&input_data[..]);
let first = read_n(&mut reader, 5);
let _ = read_n(&mut reader, 1);
let second = read_n(&mut reader, 5);
println!(
"{:?}, {:?}",
str::from_utf8(&first),
str::from_utf8(&second)
);
}
If you are worried that Read::take consumes the reader by reference, note that take comes from Read and Read is implemented for any mutable reference to a type that implements Read. You can also use Read::by_ref to create this mutable reference.
See also:
Whats the idiomatic way to reference BufReader/BufWriter when passing it between functions?
Why does Iterator::take_while take ownership of the iterator?

How do I specify the calling convention for a C callback that uses &mut [u8] and u32? [duplicate]

Can I somehow get an array from std::ptr::read?
I'd like to do something close to:
let mut v: Vec<u8> = ...
let view = &some_struct as *const _ as *const u8;
v.write(&std::ptr::read<[u8, ..30]>(view));
Which is not valid in this form (can't use the array signature).
If you want to obtain a slice from a raw pointer, use std::slice::from_raw_parts():
let slice = unsafe { std::slice::from_raw_parts(some_pointer, count_of_items) };
If you want to obtain a mutable slice from a raw pointer, use std::slice::from_raw_parts_mut():
let slice = unsafe { std::slice::from_raw_parts_mut(some_pointer, count_of_items) };
Are you sure you want read()? Without special care it will cause disaster on structs with destructors. Also, read() does not read a value of some specified type from a pointer to bytes; it reads exactly one value of the type behind the pointer (e.g. if it is *const u8 then read() will read one byte) and returns it.
If you only want to write byte contents of a structure into a vector, you can obtain a slice from the raw pointer:
use std::mem;
use std::io::Write;
struct SomeStruct {
a: i32,
}
fn main() {
let some_struct = SomeStruct { a: 32 };
let mut v: Vec<u8> = Vec::new();
let view = &some_struct as *const _ as *const u8;
let slice = unsafe { std::slice::from_raw_parts(view, mem::size_of::<SomeStruct>()) };
v.write(slice).expect("Unable to write");
println!("{:?}", v);
}
This makes your code platform-dependent and even compiler-dependent: if you use types of variable size (e.g. isize/usize) in your struct or if you don't use #[repr(C)], the data you wrote into the vector is likely to be read as garbage on another machine (and even #[repr(C)] may not lift this problem sometimes, as far as I remember).

How to read (std::io::Read) from a Vec or Slice?

Vecs support std::io::Write, so code can be written that takes a File or Vec, for example. From the API reference, it looks like neither Vec nor slices support std::io::Read.
Is there a convenient way to achieve this? Does it require writing a wrapper struct?
Here is an example of working code, that reads and writes a file, with a single line commented that should read a vector.
use ::std::io;
// Generic IO
fn write_4_bytes<W>(mut file: W) -> Result<usize, io::Error>
where W: io::Write,
{
let len = file.write(b"1234")?;
Ok(len)
}
fn read_4_bytes<R>(mut file: R) -> Result<[u8; 4], io::Error>
where R: io::Read,
{
let mut buf: [u8; 4] = [0; 4];
file.read(&mut buf)?;
Ok(buf)
}
// Type specific
fn write_read_vec() {
let mut vec_as_file: Vec<u8> = Vec::new();
{ // Write
println!("Writing Vec... {}", write_4_bytes(&mut vec_as_file).unwrap());
}
{ // Read
// println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Comment this line above to avoid an error!
}
}
fn write_read_file() {
let filepath = "temp.txt";
{ // Write
let mut file_as_file = ::std::fs::File::create(filepath).expect("open failed");
println!("Writing File... {}", write_4_bytes(&mut file_as_file).unwrap());
}
{ // Read
let mut file_as_file = ::std::fs::File::open(filepath).expect("open failed");
println!("Reading File... {:?}", read_4_bytes(&mut file_as_file).unwrap());
}
}
fn main() {
write_read_vec();
write_read_file();
}
This fails with the error:
error[E0277]: the trait bound `std::vec::Vec<u8>: std::io::Read` is not satisfied
--> src/main.rs:29:42
|
29 | println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
| ^^^^^^^^^^^^ the trait `std::io::Read` is not implemented for `std::vec::Vec<u8>`
|
= note: required by `read_4_bytes`
I'd like to write tests for a file format encoder/decoder, without having to write to the file-system.
While vectors don't support std::io::Read, slices do.
There is some confusion here caused by Rust being able to coerce a Vec into a slice in some situations but not others.
In this case, an explicit coercion to a slice is needed because at the stage coercions are applied, the compiler doesn't know that Vec<u8> doesn't implement Read.
The code in the question will work when the vector is coerced into a slice using one of the following methods:
read_4_bytes(&*vec_as_file)
read_4_bytes(&vec_as_file[..])
read_4_bytes(vec_as_file.as_slice()).
Note:
When asking the question initially, I was taking &Read instead of Read. This made passing a reference to a slice fail, unless I'd passed in &&*vec_as_file which I didn't think to do.
Recent versions of rust you can also use as_slice() to convert a Vec to a slice.
Thanks to #arete on #rust for finding the solution!
std::io::Cursor
std::io::Cursor is a simple and useful wrapper that implements Read for Vec<u8>, so it allows to use vector as a readable entity.
let mut file = Cursor::new(vector);
read_something(&mut file);
And documentation shows how to use Cursor instead of File to write unit-tests!
Working example:
use std::io::Cursor;
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
let mut file = Cursor::new(vector);
read_something(&mut file);
}
From the documentation about std::io::Cursor:
Cursors are typically used with in-memory buffers to allow them to implement Read and/or Write...
The standard library implements some I/O traits on various types which are commonly used as a buffer, like Cursor<Vec<u8>> and Cursor<&[u8]>.
Slice
The example above works for slices as well. In that case it would look like the following:
read_something(&mut &vector[..]);
Working example:
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
read_something(&mut &vector[..]);
}
&mut &vector[..] is a "mutable reference to a slice" (a reference to a reference to a part of vector), so I just find the explicit option with Cursor to be more clear and elegant.
Cursor <-> Slice
Even more: if you have a Cursor that owns a buffer, and you need to emulate, for instance, a part of a "file", you can get a slice from the Cursor and pass to the function.
read_something(&mut &file.get_ref()[1..3]);

How to read a specific number of bytes from a stream?

I have a struct with a BufStream<T> where T: Read+Write.
The BufStream can be a TcpStream and I'd like to read n bytes from it.
Not a fixed amount of bytes in a predefined buffer, but I have a string/stream which indicates the number of bytes to read next.
Is there a nice way to do that?
Since Rust 1.6, Read::read_exact can be used to do this. If bytes_to_read is the number of bytes you need to read, possibly determined at runtime, and reader is the stream to read from:
let mut buf = vec![0u8; bytes_to_read];
reader.read_exact(&mut buf)?;
The part that wasn't clear to me from the read_exact documentation was that the target buffer can be a dynamically-allocated Vec.
Thanks to the Rust Gitter community for pointing me to this solution.
It sounds like you want Read::take and Read::read_to_end.
This will allow you to read data into a &mut Vec<u8>, which is useful when you want to reuse an existing buffer or don't have an appropriately sized slice already. This allows you to avoid initializing the data with dummy values before overwriting them with the newly-read information:
use std::{
io::{prelude::*, BufReader},
str,
};
fn read_n<R>(reader: R, bytes_to_read: u64) -> Vec<u8>
where
R: Read,
{
let mut buf = vec![];
let mut chunk = reader.take(bytes_to_read);
// Do appropriate error handling for your situation
// Maybe it's OK if you didn't read enough bytes?
let n = chunk.read_to_end(&mut buf).expect("Didn't read enough");
assert_eq!(bytes_to_read as usize, n);
buf
}
fn main() {
let input_data = b"hello world";
let mut reader = BufReader::new(&input_data[..]);
let first = read_n(&mut reader, 5);
let _ = read_n(&mut reader, 1);
let second = read_n(&mut reader, 5);
println!(
"{:?}, {:?}",
str::from_utf8(&first),
str::from_utf8(&second)
);
}
If you are worried that Read::take consumes the reader by reference, note that take comes from Read and Read is implemented for any mutable reference to a type that implements Read. You can also use Read::by_ref to create this mutable reference.
See also:
Whats the idiomatic way to reference BufReader/BufWriter when passing it between functions?
Why does Iterator::take_while take ownership of the iterator?

Resources