How to convert from std::io::Bytes to &[u8] - rust

I am trying to write the contents of an HTTP Response to a file.
extern crate reqwest;
use std::io::Write;
use std::fs::File;
fn main() {
let mut resp = reqwest::get("https://www.rust-lang.org").unwrap();
assert!(resp.status().is_success());
// Write contents to disk.
let mut f = File::create("download_file").expect("Unable to create file");
f.write_all(resp.bytes());
}
But I get the following compile error:
error[E0308]: mismatched types
--> src/main.rs:12:17
|
12 | f.write_all(resp.bytes());
| ^^^^^^^^^^^^ expected &[u8], found struct `std::io::Bytes`
|
= note: expected type `&[u8]`
found type `std::io::Bytes<reqwest::Response>`

You cannot. Checking the docs for io::Bytes, there are no appropriate methods. That's because io::Bytes is an iterator that returns things byte-by-byte so there may not even be a single underlying slice of data.
It you only had io::Bytes, you would need to collect the iterator into a Vec:
let data: Result<Vec<_>, _> = resp.bytes().collect();
let data = data.expect("Unable to read data");
f.write_all(&data).expect("Unable to write data");
However, in most cases you have access to the type that implements Read, so you could instead use Read::read_to_end:
let mut data = Vec::new();
resp.read_to_end(&mut data).expect("Unable to read data");
f.write_all(&data).expect("Unable to write data");
In this specific case, you can use io::copy to directly copy from the Request to the file because Request implements io::Read and File implements io::Write:
extern crate reqwest;
use std::io;
use std::fs::File;
fn main() {
let mut resp = reqwest::get("https://www.rust-lang.org").unwrap();
assert!(resp.status().is_success());
// Write contents to disk.
let mut f = File::create("download_file").expect("Unable to create file");
io::copy(&mut resp, &mut f).expect("Unable to copy data");
}

Related

How to write a hyper response body to a file?

I am trying to write a test program with tokio that grabs a file from a website and writes the streamed response to a file. The hyper website shows an example that uses a while loop and uses the .data() method the response body, but I'd like to manipulate the stream with .map() and a couple others.
I thought the next reasonable thing to try would be to convert the stream to an AsyncRead by using the .into_async_read() method from TryStreamExt, but that doesn't seem to work. I had to use a map to convert the hyper::error::Error into a std::error::Error to get a TryStream, but now the compiler is telling me that AsyncRead isn't implemented for the transformed stream. Here is my main.rs file and the error:
use std::error::Error;
use futures::stream::{StreamExt, TryStreamExt};
use http::Request;
use hyper::{Body, Client};
use hyper_tls::HttpsConnector;
use tokio::fs::File;
use tokio::io;
#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
let https = HttpsConnector::new();
let client = Client::builder().build::<_, Body>(https);
let request = Request::get("some file from the internet").body(Body::empty())?;
let response = client.request(request).await?;
let mut stream = response
.body()
.map(|result| result.map_err(|error| std::io::Error::new(std::io::ErrorKind::Other, "Error!")))
.into_async_read();
let mut file = File::create("output file").await?;
io::copy(&mut stream, &mut file).await?;
Ok(())
}
error[E0277]: the trait bound `futures_util::stream::try_stream::into_async_read::IntoAsyncRead<futures_util::stream::stream::map::Map<hyper::body::body::Body, [closure#src/main.rs:20:14: 20:103]>>: tokio::io::async_read::AsyncRead` is not satisfied
--> src/main.rs:24:5
|
24 | io::copy(&mut stream, &mut file).await?;
| ^^^^^^^^ the trait `tokio::io::async_read::AsyncRead` is not implemented for `futures_util::stream::try_stream::into_async_read::IntoAsyncRead<futures_util::stream::stream::map::Map<hyper::body::body::Body, [closure#src/main.rs:20:14: 20:103]>>`
|
::: /Users/jackson/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.13/src/io/util/copy.rs:63:12
|
63 | R: AsyncRead + Unpin + ?Sized,
| --------- required by this bound in `tokio::io::util::copy::copy`
error[E0277]: the trait bound `futures_util::stream::try_stream::into_async_read::IntoAsyncRead<futures_util::stream::stream::map::Map<hyper::body::body::Body, [closure#src/main.rs:20:14: 20:103]>>: tokio::io::async_read::AsyncRead` is not satisfied
--> src/main.rs:24:5
|
24 | io::copy(&mut stream, &mut file).await?;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the trait `tokio::io::async_read::AsyncRead` is not implemented for `futures_util::stream::try_stream::into_async_read::IntoAsyncRead<futures_util::stream::stream::map::Map<hyper::body::body::Body, [closure#src/main.rs:20:14: 20:103]>>`
|
= note: required because of the requirements on the impl of `core::future::future::Future` for `tokio::io::util::copy::Copy<'_, futures_util::stream::try_stream::into_async_read::IntoAsyncRead<futures_util::stream::stream::map::Map<hyper::body::body::Body, [closure#src/main.rs:20:14: 20:103]>>, tokio::fs::file::File>`
You almost had it. You invoked into_async_read which gives you an implementation of futures::io::AsyncRead but you want a tokio::io::AsyncRead.
The tokio-util crate gives you a tool to do this conversion.
Add to your Cargo.toml:
tokio-util = { version = "0.3.1", features=["compat"] }
And say you add a conversion function like this:
fn to_tokio_async_read(r: impl futures::io::AsyncRead) -> impl tokio::io::AsyncRead {
tokio_util::compat::FuturesAsyncReadCompatExt::compat(r)
}
then your code could become:
let mut futures_io_async_read = response
.body()
.map(|result| result.map_err(|error| std::io::Error::new(std::io::ErrorKind::Other, "Error!")))
.into_async_read();
let tokio_async_read = to_tokio_async_read(futures_io_async_read)
let mut file = File::create("output file").await?;
io::copy(&mut tokio_async_read, &mut file).await?;

How to decompress XZ data from a hyper::Response on the fly?

I'm downloading an XZ file with hyper, and I would like to save it to disk in decompressed form by extracting as much as possible from each incoming Chunk and writing results to disk immediately, as opposed to first downloading the entire file and then decompressing.
There is the xz2 crate that implements the XZ format. However, its XzDecoder does not seem to support a Python-like decompressobj model, where a caller repeatedly feeds partial input and gets partial output.
Instead, XzDecoder receives input bytes via a Read parameter, and I'm not sure how to glue these two things together. Is there a way to feed a Response to XzDecoder?
The only clue I found so far is this issue, which contains a reference to a private ReadableChunks type, which I could in theory replicate in my code - but maybe there is an easier way?
XzDecoder does not seem to support a Python-like decompressobj model, where a caller repeatedly feeds partial input and gets partial output
there's xz2::stream::Stream which does exactly what you want. Very rough untested code, needs proper error handling, etc, but I hope you'll get the idea:
fn process(body: hyper::body::Body) {
let mut decoder = xz2::stream::Stream::new_stream_decoder(1000, 0).unwrap();
body.for_each(|chunk| {
let mut buf: Vec<u8> = Vec::new();
if let Ok(_) = decoder.process_vec(&chunk, &mut buf, Action::Run) {
// write buf to disk
}
Ok(())
}).wait().unwrap();
}
Based on #Laney's answer, I came up with the following working code:
extern crate failure;
extern crate hyper;
extern crate tokio;
extern crate xz2;
use std::fs::File;
use std::io::Write;
use std::u64;
use failure::Error;
use futures::future::done;
use futures::stream::Stream;
use hyper::{Body, Chunk, Response};
use hyper::rt::Future;
use hyper_tls::HttpsConnector;
use tokio::runtime::Runtime;
fn decode_chunk(file: &mut File, xz: &mut xz2::stream::Stream, chunk: &Chunk)
-> Result<(), Error> {
let end = xz.total_in() as usize + chunk.len();
let mut buf = Vec::with_capacity(8192);
while (xz.total_in() as usize) < end {
buf.clear();
xz.process_vec(
&chunk[chunk.len() - (end - xz.total_in() as usize)..],
&mut buf,
xz2::stream::Action::Run)?;
file.write_all(&buf)?;
}
Ok(())
}
fn decode_response(mut file: File, response: Response<Body>)
-> impl Future<Item=(), Error=Error> {
done(xz2::stream::Stream::new_stream_decoder(u64::MAX, 0)
.map_err(Error::from))
.and_then(|mut xz| response
.into_body()
.map_err(Error::from)
.for_each(move |chunk| done(
decode_chunk(&mut file, &mut xz, &chunk))))
}
fn main() -> Result<(), Error> {
let client = hyper::Client::builder().build::<_, hyper::Body>(
HttpsConnector::new(1)?);
let file = File::create("hello-2.7.tar")?;
let mut runtime = Runtime::new()?;
runtime.block_on(client
.get("https://ftp.gnu.org/gnu/hello/hello-2.7.tar.xz".parse()?)
.map_err(Error::from)
.and_then(|response| decode_response(file, response)))?;
runtime.shutdown_now();
Ok(())
}

How to read (std::io::Read) from a Vec or Slice?

Vecs support std::io::Write, so code can be written that takes a File or Vec, for example. From the API reference, it looks like neither Vec nor slices support std::io::Read.
Is there a convenient way to achieve this? Does it require writing a wrapper struct?
Here is an example of working code, that reads and writes a file, with a single line commented that should read a vector.
use ::std::io;
// Generic IO
fn write_4_bytes<W>(mut file: W) -> Result<usize, io::Error>
where W: io::Write,
{
let len = file.write(b"1234")?;
Ok(len)
}
fn read_4_bytes<R>(mut file: R) -> Result<[u8; 4], io::Error>
where R: io::Read,
{
let mut buf: [u8; 4] = [0; 4];
file.read(&mut buf)?;
Ok(buf)
}
// Type specific
fn write_read_vec() {
let mut vec_as_file: Vec<u8> = Vec::new();
{ // Write
println!("Writing Vec... {}", write_4_bytes(&mut vec_as_file).unwrap());
}
{ // Read
// println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// Comment this line above to avoid an error!
}
}
fn write_read_file() {
let filepath = "temp.txt";
{ // Write
let mut file_as_file = ::std::fs::File::create(filepath).expect("open failed");
println!("Writing File... {}", write_4_bytes(&mut file_as_file).unwrap());
}
{ // Read
let mut file_as_file = ::std::fs::File::open(filepath).expect("open failed");
println!("Reading File... {:?}", read_4_bytes(&mut file_as_file).unwrap());
}
}
fn main() {
write_read_vec();
write_read_file();
}
This fails with the error:
error[E0277]: the trait bound `std::vec::Vec<u8>: std::io::Read` is not satisfied
--> src/main.rs:29:42
|
29 | println!("Reading File... {:?}", read_4_bytes(&vec_as_file).unwrap());
| ^^^^^^^^^^^^ the trait `std::io::Read` is not implemented for `std::vec::Vec<u8>`
|
= note: required by `read_4_bytes`
I'd like to write tests for a file format encoder/decoder, without having to write to the file-system.
While vectors don't support std::io::Read, slices do.
There is some confusion here caused by Rust being able to coerce a Vec into a slice in some situations but not others.
In this case, an explicit coercion to a slice is needed because at the stage coercions are applied, the compiler doesn't know that Vec<u8> doesn't implement Read.
The code in the question will work when the vector is coerced into a slice using one of the following methods:
read_4_bytes(&*vec_as_file)
read_4_bytes(&vec_as_file[..])
read_4_bytes(vec_as_file.as_slice()).
Note:
When asking the question initially, I was taking &Read instead of Read. This made passing a reference to a slice fail, unless I'd passed in &&*vec_as_file which I didn't think to do.
Recent versions of rust you can also use as_slice() to convert a Vec to a slice.
Thanks to #arete on #rust for finding the solution!
std::io::Cursor
std::io::Cursor is a simple and useful wrapper that implements Read for Vec<u8>, so it allows to use vector as a readable entity.
let mut file = Cursor::new(vector);
read_something(&mut file);
And documentation shows how to use Cursor instead of File to write unit-tests!
Working example:
use std::io::Cursor;
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
let mut file = Cursor::new(vector);
read_something(&mut file);
}
From the documentation about std::io::Cursor:
Cursors are typically used with in-memory buffers to allow them to implement Read and/or Write...
The standard library implements some I/O traits on various types which are commonly used as a buffer, like Cursor<Vec<u8>> and Cursor<&[u8]>.
Slice
The example above works for slices as well. In that case it would look like the following:
read_something(&mut &vector[..]);
Working example:
use std::io::Read;
fn read_something(file: &mut impl Read) {
let _ = file.read(&mut [0; 8]);
}
fn main() {
let vector = vec![1, 2, 3, 4];
read_something(&mut &vector[..]);
}
&mut &vector[..] is a "mutable reference to a slice" (a reference to a reference to a part of vector), so I just find the explicit option with Cursor to be more clear and elegant.
Cursor <-> Slice
Even more: if you have a Cursor that owns a buffer, and you need to emulate, for instance, a part of a "file", you can get a slice from the Cursor and pass to the function.
read_something(&mut &file.get_ref()[1..3]);

Method accepting either a file buffer or a string [duplicate]

I need a completely in-memory object that I can give to BufReader and BufWriter. Something like Python's StringIO. I want to write to and read from such an object using methods ordinarily used with Files.
Is there a way to do this using the standard library?
In fact there is a way: Cursor<T>!
(please also read Shepmaster's answer on why often it's even easier)
In the documentation you can see that there are the following impls:
impl<T> Seek for Cursor<T> where T: AsRef<[u8]>
impl<T> Read for Cursor<T> where T: AsRef<[u8]>
impl Write for Cursor<Vec<u8>>
impl<T> AsRef<[T]> for Vec<T>
From this you can see that you can use the type Cursor<Vec<u8>> just as an ordinary file, because Read, Write and Seek are implemented for that type!
Little example (Playground):
use std::io::{Cursor, Read, Seek, SeekFrom, Write};
// Create fake "file"
let mut c = Cursor::new(Vec::new());
// Write into the "file" and seek to the beginning
c.write_all(&[1, 2, 3, 4, 5]).unwrap();
c.seek(SeekFrom::Start(0)).unwrap();
// Read the "file's" contents into a vector
let mut out = Vec::new();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
For a more useful example, check the documentation linked above.
You don't need a Cursor most of the time.
object that I can give to BufReader and BufWriter
BufReader requires a value that implements Read:
impl<R: Read> BufReader<R> {
pub fn new(inner: R) -> BufReader<R>
}
BufWriter requires a value that implements Write:
impl<W: Write> BufWriter<W> {
pub fn new(inner: W) -> BufWriter<W> {}
}
If you view the implementors of Read you will find impl<'a> Read for &'a [u8].
If you view the implementors of Write, you will find impl Write for Vec<u8>.
use std::io::{Read, Write};
fn main() {
// Create fake "file"
let mut file = Vec::new();
// Write into the "file"
file.write_all(&[1, 2, 3, 4, 5]).unwrap();
// Read the "file's" contents into a new vector
let mut out = Vec::new();
let mut c = file.as_slice();
c.read_to_end(&mut out).unwrap();
println!("{:?}", out);
}
Writing to a Vec will always append to the end. We also take a slice to the Vec that we can update. Each read of c will advance the slice further and further until it is empty.
The main differences from Cursor:
Cannot seek the data, so you cannot easily re-read data
Cannot write to anywhere but the end
If you want to use BufReader with an in-memory String, you can use the as_bytes() method:
use std::io::BufRead;
use std::io::BufReader;
use std::io::Read;
fn read_buff<R: Read>(mut buffer: BufReader<R>) {
let mut data = String::new();
let _ = buffer.read_line(&mut data);
println!("read_buff got {}", data);
}
fn main() {
read_buff(BufReader::new("Potato!".as_bytes()));
}
This prints read_buff got Potato!. There is no need to use a cursor for this case.
To use an in-memory String with BufWriter, you can use the as_mut_vec method. Unfortunately it is unsafe and I have not found any other way. I don't like the Cursor approach since it consumes the vector and I have not found a way yet to use the Cursor together with BufWriter.
use std::io::BufWriter;
use std::io::Write;
pub fn write_something<W: Write>(mut buf: BufWriter<W>) {
buf.write("potato".as_bytes());
}
#[cfg(test)]
mod tests {
use super::*;
use std::io::{BufWriter};
#[test]
fn testing_bufwriter_and_string() {
let mut s = String::new();
write_something(unsafe { BufWriter::new(s.as_mut_vec()) });
assert_eq!("potato", &s);
}
}

tokio-curl: capture output into a local `Vec` - may outlive borrowed value

I do not know Rust well enough to understand lifetimes and closures yet...
Trying to collect the downloaded data into a vector using tokio-curl:
extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl;
use std::io::{self, Write};
use std::str;
use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;
fn main() {
// Create an event loop that we'll run on, as well as an HTTP `Session`
// which we'll be routing all requests through.
let mut lp = Core::new().unwrap();
let mut out = Vec::new();
let session = Session::new(lp.handle());
// Prepare the HTTP request to be sent.
let mut req = Easy::new();
req.get(true).unwrap();
req.url("https://www.rust-lang.org").unwrap();
req.write_function(|data| {
out.extend_from_slice(data);
io::stdout().write_all(data).unwrap();
Ok(data.len())
})
.unwrap();
// Once we've got our session, issue an HTTP request to download the
// rust-lang home page
let request = session.perform(req);
// Execute the request, and print the response code as well as the error
// that happened (if any).
let mut req = lp.run(request).unwrap();
println!("{:?}", req.response_code());
println!("out: {}", str::from_utf8(&out).unwrap());
}
Produces an error:
error[E0373]: closure may outlive the current function, but it borrows `out`, which is owned by the current function
--> src/main.rs:25:24
|
25 | req.write_function(|data| {
| ^^^^^^ may outlive borrowed value `out`
26 | out.extend_from_slice(data);
| --- `out` is borrowed here
|
help: to force the closure to take ownership of `out` (and any other referenced variables), use the `move` keyword, as shown:
| req.write_function(move |data| {
Investigating further, I see that Easy::write_function requires the 'static lifetime, but the example of how to collect output from the curl-rust docs uses Transfer::write_function instead:
use curl::easy::Easy;
let mut data = Vec::new();
let mut handle = Easy::new();
handle.url("https://www.rust-lang.org/").unwrap();
{
let mut transfer = handle.transfer();
transfer.write_function(|new_data| {
data.extend_from_slice(new_data);
Ok(new_data.len())
}).unwrap();
transfer.perform().unwrap();
}
println!("{:?}", data);
The Transfer::write_function does not require the 'static lifetime:
impl<'easy, 'data> Transfer<'easy, 'data> {
/// Same as `Easy::write_function`, just takes a non `'static` lifetime
/// corresponding to the lifetime of this transfer.
pub fn write_function<F>(&mut self, f: F) -> Result<(), Error>
where F: FnMut(&[u8]) -> Result<usize, WriteError> + 'data
{
...
But I can't use a Transfer instance on tokio-curl's Session::perform because it requires the Easy type:
pub fn perform(&self, handle: Easy) -> Perform {
transfer.easy is a private field that is directly passed to session.perform.
It this an issue with tokio-curl? Maybe it should mark the transfer.easy field as public or implement new function like perform_transfer? Is there another way to collect output using tokio-curl per transfer?
The first thing you have to understand when using the futures library is that you don't have any control over what thread the code is going to run on.
In addition, the documentation for curl's Easy::write_function says:
Note that the lifetime bound on this function is 'static, but that is often too restrictive. To use stack data consider calling the transfer method and then using write_function to configure a callback that can reference stack-local data.
The most straight-forward solution is to use some type of locking primitive to ensure that only one thread at a time may have access to the vector. You also have to share ownership of the vector between the main thread and the closure:
use std::sync::Mutex;
use std::sync::Arc;
let out = Arc::new(Mutex::new(Vec::new()));
let out_closure = out.clone();
// ...
req.write_function(move |data| {
let mut out = out_closure.lock().expect("Unable to lock output");
// ...
}).expect("Cannot set writing function");
// ...
let out = out.lock().expect("Unable to lock output");
println!("out: {}", str::from_utf8(&out).expect("Data was not UTF-8"));
Unfortunately, the tokio-curl library does not currently support using the Transfer type that would allow for stack-based data.

Resources