Stream body bytes from Surf request to file - rust

I'm making requests for files using surf and I want to save the response body to a file. But these files are too large to hold in memory, so I need to be able to stream them to the file.
Surf looks like it supports this but I have not found a way to save the result in a file.
My attempt looks like this
let mut result = surf::get(&link)
    .await
    .map_err(|err| anyhow!(err))
    .context(format!("Failed to fetch from {}", &link))?;
let body = surf::http::Body::from_reader(result, None);
let mut image_tempfile: File = tempfile::tempfile()?;
image_tempfile.write(body);
but this does not work, as write() expects a &[u8], which I believe would require reading the whole body into memory. Is there any way I can write the content of the surf Body to a file without holding it all in memory?

Surf's Response implements AsyncRead (a.k.a. async_std::io::Read) and if you convert your temporary file into something that implements AsyncWrite (like async_std::fs::File) then you can use async_std::io::copy to move bytes from one to the other asynchronously, without buffering the whole file:
let mut response = surf::get("https://www.google.com").await.unwrap();
let mut tempfile = async_std::fs::File::from(tempfile::tempfile().unwrap());
async_std::io::copy(&mut response, &mut tempfile).await.unwrap();


Reading response into bytes takes forever

I am trying to download a file using reqwest. The response status is 200, but at the line where I read the response as bytes, response.bytes().await?, it waits forever.
However, when I try to make a curl request for the same URL, it passes and I am able to successfully download the file.
I am not sure what is wrong or how I should debug from here. Any suggestion is welcome.
use anyhow::Context;
use reqwest_middleware::ClientBuilder;
use reqwest_retry::{policies::ExponentialBackoff, RetryTransientMiddleware};
use reqwest_tracing::TracingMiddleware;
use std::fs;
use std::io::Cursor;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use std::{
    path::Path,
    process::{Command, Output},
};

async fn download() -> Result<(), anyhow::Error> {
    let panda_link = format!(
        "https://surya.jfrog.io/artifactory/binaries/panda/v2.34.0/{}/panda",
        ARCH_NAME
    );
    let retry_policy = ExponentialBackoff::builder().build_with_max_retries(3);
    let client = ClientBuilder::new(reqwest::Client::new())
        .with(TracingMiddleware)
        .with(RetryTransientMiddleware::new_with_policy(retry_policy))
        .build();
    println!("the client has been successfully built");
    let response = client
        .get(panda_link)
        .send()
        .await?;
    println!("got the response {}", response.status());
    response
        .error_for_status_ref()
        .context("Failed to download panda")?;
    println!("check if response was error is done");
    let response_bytes = response.bytes().await?;
    println!("reading response bytes");
    let mut content = Cursor::new(response_bytes);
    println!("reading bytes");
    let new_file_path = PathBuf::from("/");
    println!("this is the newfile_path {:?}", new_file_path);
    let mut file = std::fs::File::create(&new_file_path)
        .context(format!("Failed creating file {}", &new_file_path.display()))?;
    fs::set_permissions(&new_file_path, fs::Permissions::from_mode(0o750))
        .context(format!("Failed making {} executable", &new_file_path.display()))?;
    println!("file created");
    std::io::copy(&mut content, &mut file)
        .context(format!("Failed saving file {}", &new_file_path.display()))?;
    Ok(())
}
I think the issue you are having is a combination of multiple things:
Personal internet download speed.
The website's own speed (traffic load, server location, how fast it generates the download).
RAM speed.
Disk speed.
I did some tests with the code below. (It's functional and doesn't "wait forever".)
If I pointed the download link at a large file (~100 MB) on Google Docs it would download in roughly a second, but if I pointed it at a large file (~100 MB) on a website that wasn't very fast it would take a few seconds.
// Download link
let panda_link = format!(
    "https://surya.jfrog.io/artifactory/binaries/panda/v2.34.0/{}/panda",
    ARCH_NAME
);

// File path
let new_file_path = PathBuf::from("/");

// Gets the response from the website as bytes. (I'm not sure whether this is
// all held in memory; if it were, downloading anything larger than your total
// RAM would be impossible.)
let content = reqwest::get(panda_link).await?.bytes().await?;

// Writes the bytes to disk.
fs::write(new_file_path, content)?;

Rust hyper HTTP(S) server to read/write arbitrary amount of bytes

I'm looking to build a basic HTTP(S) server using Rust's hyper, with the purpose of throughput measurement. Essentially it has two functions:
on GET request, send an infinite (or arbitrary large) stream of bytes.
on POST, discard all incoming bytes, then send a short acknowledgement.
I have this working for HTTP using std::net, but would like to add it to hyper to be able to measure HTTP and HTTPS. Being fairly new to rust, I am wondering how to add it to the hyper HTTPS example server - that is, how I can get the response builder to expose a stream (io::stream?) I can write a static buffer of random bytes to, without building the entire response body in memory.
Essentially, I would like to go
loop {
    match stream.write(rand_bytes) {
        Ok(_) => {},
        Err(_) => break,
    }
}
here
async fn echo(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    let mut response = Response::new(Body::empty());
    match (req.method(), req.uri().path()) {
        // Help route.
        (&Method::GET, "/") => {
            *response.body_mut() = Body::from("I want to be a stream.\n");
        }
        ...
I see that I could wrap a futures stream using wrap_stream, so maybe my question is how to define a stream iterator that I can use in wrap_stream which returns the same bytes over and over again.
Thanks to the comment from Caesar above, the following snippet worked for me:
let infstream: futures::stream::Iter<std::iter::Repeat<Result<String, String>>> =
    stream::iter(std::iter::repeat(Ok(rand_string)));
*response.body_mut() = Body::wrap_stream(infstream);
This does not implement any termination criteria, but it could easily be modified to return a fixed number of bytes.

How should I get a copy of this object in rust?

I've got a small program that is supposed to print out the response status of an html get request and also print out the raw html of that response. (I am using the latest version of the reqwest crate for this)
fn main() {
    let req = reqwest::blocking::get("https://www.rust-lang.org");
    let rawhtml = req.clone().unwrap().text().unwrap();
    let status = req.unwrap().status();
    println!("status: {}\n\n", status);
    println!("{}", rawhtml);
}
running with cargo run gives me error[E0599] saying
method cannot be called on Result<reqwest::blocking::Response, reqwest::Error> due to unsatisfied trait bounds
So if I cannot clone the object then how am I to use more than one of its functions that consumes "self"? I don't want to have to make multiple get requests when all the info I need should be in one.
First of all, reqwest::blocking::Response doesn't implement Clone or provide an alternative, so "getting a copy" is not an option; you need to structure your program so it doesn't need a copy.
The problem you're having with your current code is not mainly with the Response; it's that you're calling Result::unwrap, which does consume the Result it's called on — in order to give you its contents. The right thing to do here is to call unwrap only once.
let req = reqwest::blocking::get("https://www.rust-lang.org")
.unwrap();
let rawhtml = req.text().unwrap();
let status = req.status();
This still won't compile, but that's because you are calling the methods in the wrong order: you must ask for whatever things you need from the headers of the response before using the body. This is not an arbitrary constraint; it's because HTTP gives you that information in that order, and so having the API work this way allows reqwest to not need to store the entire response as it is downloaded — the Response object is not just an unchanging data structure but actually represents the response as it is being sent to your computer.
This version will work:
let req = reqwest::blocking::get("https://www.rust-lang.org")
.unwrap();
let status = req.status();
let rawhtml = req.text().unwrap();
status and rawhtml both implement Clone, so you can keep those around and make copies as much as you need, unlike the Response.
(Disclaimer: I haven't actually used reqwest myself; this answer is based on reading the docs and source, and general Rust principles.)

How to use tokio's UdpSocket to handle messages in a 1 server: N clients setup?

What I want to do:
... write a (1) server/ (N) clients (network-game-)architecture that uses UDP sockets as underlying base for communication.
Messages are sent as Vec<u8>, encoded via bincode (crate)
I also want to be able to occasionally send datagrams that can exceed the typical max MTU of ~1500 bytes and be correctly assembled on receiver end, including sending of ack-messages etc. (I assume I'll have to implement that myself, right?)
For the UdpSocket I thought about using tokio's implementation and maybe framed. I am not sure whether this is a good choice though, as it seems that this would introduce an unnecessary step of mapping Vec<u8> (serialized by bincode) to Vec<u8> (needed by UdpCodec of tokio) (?)
Consider this minimal code-example:
Cargo.toml (server)
bincode = "1.0"
futures = "0.1"
tokio-core = "^0.1"
(Serde and serde-derive are used in shared crate where the protocol is defined!)
(I want to replace tokio-core with tokio asap)
fn main() -> () {
    let addr = format!("127.0.0.1:{port}", port = 8080);
    let addr = addr
        .parse::<SocketAddr>()
        .expect(&format!("Couldn't create valid SocketAddress out of {}", addr));
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let socket = UdpSocket::bind(&addr, &handle)
        .expect(&format!("Couldn't bind socket to address {}", addr));
    let udp_future = socket.framed(MyCodec {}).for_each(|(addr, data)| {
        socket.send_to(&data, &addr); // Just echo back the data
        Ok(())
    });
    core.run(udp_future).unwrap();
}
struct MyCodec;

impl UdpCodec for MyCodec {
    type In = (SocketAddr, Vec<u8>);
    type Out = (SocketAddr, Vec<u8>);

    fn decode(&mut self, src: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
        Ok((*src, buf.to_vec()))
    }

    fn encode(&mut self, msg: Self::Out, buf: &mut Vec<u8>) -> SocketAddr {
        let (addr, mut data) = msg;
        buf.append(&mut data);
        addr
    }
}
The problem here is:
let udp_future = socket.framed(MyCodec {}).for_each(|(addr, data)| {
| ------ value moved here ^^^^^^^^^^^^^^ value captured here after move
|
= note: move occurs because socket has type tokio_core::net::UdpSocket, which does not implement the Copy trait
The error makes total sense, yet I am not sure how I would create such a simple echo service. In reality, the handling of a message involves a bit more logic, of course, but for the sake of a minimal example, this should be enough to give a rough idea.
My workaround is an ugly hack: creating a second socket.
Here's the signature of UdpSocket::framed from Tokio's documentation:
pub fn framed<C: UdpCodec>(self, codec: C) -> UdpFramed<C>
Note that it takes self, not &self; that is, calling this function consumes the socket. The UdpFramed wrapper owns the underlying socket when you call this. Your compilation error is telling you that you're moving socket when you call this method, but you're also trying to borrow socket inside your closure (to call send_to).
This probably isn't what you want for real code. The whole point of using framed() is to turn your socket into something higher-level, so you can send your codec's items directly instead of having to assemble datagrams. Using send or send_to directly on the socket will probably break the framing of your message protocol. In this code, where you're trying to implement a simple echo server, you don't need to use framed at all. But if you do want to have your cake and eat it and use both framed and send_to, luckily UdpFramed still allows you to borrow the underlying UdpSocket, using get_ref. You can fix your problem this way:
let framed = {
    let socket = UdpSocket::bind(&addr, &handle)
        .expect(&format!("Couldn't bind socket to address {}", addr));
    socket.framed(MyCodec {})
};
let udp_future = framed.for_each(|(addr, data)| {
    info!(self.logger, "Udp packet received from {}: length: {}", addr, data.len());
    framed.get_ref().send_to(&data, &addr); // Just echo back the data
    Ok(())
});
I haven't checked this code, since (as Shepmaster rightly pointed out) your code snippet has other problems, but it should give you the idea anyway. I'll repeat my warning from earlier: if you do this in real code, it will break the network protocol you're using. get_ref's documentation puts it like this:
Note that care should be taken to not tamper with the underlying stream of data coming in as it may corrupt the stream of frames otherwise being worked with.
To answer the new part of your question: yes, you need to handle reassembly yourself, which means your codec does actually need to do some framing on the bytes you're sending. Typically this might involve a start sequence which cannot occur in the Vec<u8>. The start sequence lets you recognise the start of the next message after a packet was lost (which happens a lot with UDP). If there's no byte sequence that can't occur in the Vec<u8>, you need to escape it when it does occur. You might then send the length of the message, followed by the data itself; or just the data, followed by an end sequence and a checksum so you know none was lost. There are pros and cons to these designs, and it's a big topic in itself.
You also need your UdpCodec to contain data: a map from SocketAddr to the partially-reassembled message that's currently in progress. In decode, if you are given the start of a message, copy it into the map and return Ok. If you are given the middle of a message, and you already have the start of a message in the map (for that SocketAddr), append the buffer to the existing buffer and return Ok. When you get to the end of the message, return the whole thing and empty the buffer. The methods on UdpCodec take &mut self in order to enable this use case. (NB In theory, you should also deal with packets arriving out of order, but that's actually quite rare in the real world.)
encode is a lot simpler: you just need to add the same framing and copy the message into the buffer.
Let me reiterate here that you don't need to and shouldn't use the underlying socket after calling framed() on it. UdpFramed is both a source and a sink, so you use that one object to send the replies as well. You can even use split() to get separate Stream and Sink implementations out of it, if that makes the ownership easier in your application.
Overall, now I've seen how much of the problem you're struggling with, I'd recommend just using several TCP sockets instead of UDP. If you want a connection-oriented, reliable protocol, TCP already exists and does that for you. It's very easy to spend a lot of time making a "reliable" layer on top of UDP that is both slower and less reliable than TCP.

How can I read non-blocking from stdin?

Is there a way to check whether data is available on stdin in Rust, or to do a read that returns immediately with the currently available data?
My goal is to be able to read the input produced for instance by cursor keys in a shell that is setup to return all read data immediately. For instance with an equivalent to: stty -echo -echok -icanon min 1 time 0.
I suppose one solution would be to use ncurses or similar libraries, but I would like to avoid any kind of large dependencies.
So far, I got only blocking input, which is not what I want:
let mut reader = stdin();
let mut s = String::new();
match reader.read_to_string(&mut s) {...} // this blocks :(
Converting OP's comment into an answer:
You can spawn a thread and send data over a channel. You can then poll that channel in the main thread using try_recv.
use std::io;
use std::sync::mpsc;
use std::sync::mpsc::Receiver;
use std::sync::mpsc::TryRecvError;
use std::{thread, time};
fn main() {
    let stdin_channel = spawn_stdin_channel();
    loop {
        match stdin_channel.try_recv() {
            Ok(key) => println!("Received: {}", key),
            Err(TryRecvError::Empty) => println!("Channel empty"),
            Err(TryRecvError::Disconnected) => panic!("Channel disconnected"),
        }
        sleep(1000);
    }
}

fn spawn_stdin_channel() -> Receiver<String> {
    let (tx, rx) = mpsc::channel::<String>();
    thread::spawn(move || loop {
        let mut buffer = String::new();
        io::stdin().read_line(&mut buffer).unwrap();
        tx.send(buffer).unwrap();
    });
    rx
}

fn sleep(millis: u64) {
    let duration = time::Duration::from_millis(millis);
    thread::sleep(duration);
}
Most operating systems default to working with standard input and output in a blocking way, so it is no wonder that the Rust standard library follows suit.
To read from a blocking stream in a non-blocking way you might create a separate thread, so that the extra thread blocks instead of the main one. Checking whether a blocking file descriptor produced some input is similar: spawn a thread, make it read the data, check whether it produced any data so far.
Here's a piece of code that I use with a similar goal of processing a pipe output interactively and that can hopefully serve as an example. It sends the data over a channel, which supports the try_recv method - allowing you to check whether the data is available or not.
Someone has told me that mio might be used to read from a pipe in a non-blocking way, so you might want to check it out too. I suspect that passing the stdin file descriptor (0) to Receiver::from_raw_fd should just work.
You could also potentially look at using ncurses (also on crates.io), which would allow you to read in raw mode. There are a few examples in the GitHub repository which show how to do this.
