Reading response into bytes takes forever [closed] - rust

I am trying to download a file using reqwest. The response status is 200, but at the line where I read the response as bytes, response.bytes().await?, it waits forever.
However, when I make a curl request to the same URL, it succeeds and I am able to download the file.
I am not sure what is wrong or how to debug from here. Any suggestion is welcome.
use anyhow::Context;
use reqwest_middleware::ClientBuilder;
use reqwest_retry::{policies::ExponentialBackoff, RetryTransientMiddleware};
use reqwest_tracing::TracingMiddleware;
use std::fs;
use std::io::Cursor;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use std::{
    path::Path,
    process::{Command, Output},
};

async fn download() -> Result<(), anyhow::Error> {
    let panda_link = format!(
        "https://surya.jfrog.io/artifactory/binaries/panda/v2.34.0/{}/panda",
        ARCH_NAME
    );
    let retry_policy = ExponentialBackoff::builder().build_with_max_retries(3);
    let client = ClientBuilder::new(reqwest::Client::new())
        .with(TracingMiddleware)
        .with(RetryTransientMiddleware::new_with_policy(retry_policy))
        .build();
    println!("the client has been successfully built");
    let response = client
        .get(panda_link)
        .send()
        .await?;
    println!("got the response {}", response.status());
    response
        .error_for_status_ref()
        .context("Failed to download panda")?;
    println!("check if response was error is done");
    let response_bytes = response.bytes().await?;
    println!("reading response bytes");
    let mut content = Cursor::new(response_bytes);
    println!("reading bytes");
    let new_file_path = PathBuf::from("/");
    println!("this is the newfile_path {:?}", new_file_path);
    let mut file = std::fs::File::create(&new_file_path)
        .context(format!("Failed creating file {}", &new_file_path.display()))?;
    fs::set_permissions(&new_file_path, fs::Permissions::from_mode(0o750)).context(
        format!("Failed making {} executable", &new_file_path.display()),
    )?;
    println!("file created");
    std::io::copy(&mut content, &mut file)
        .context(format!("Failed saving file {}", &new_file_path.display()))?;
    Ok(())
}

I think the issue you are having is a combination of multiple things:
Your personal internet download speed.
The website's speed itself (the website's traffic load, its location, and how quickly it generates the download).
RAM speed.
Disk speed.
I did some tests with the code below. (It's functional and doesn't "wait forever".)
If I pointed the download link at a large file (~100 MB) on Google Docs, it would download in (give or take) a second, but if I pointed it at a large file (~100 MB) on a website that wasn't very fast, it would take a few seconds.
// Download link
let panda_link = format!(
    "https://surya.jfrog.io/artifactory/binaries/panda/v2.34.0/{}/panda",
    ARCH_NAME
);
// File path
let new_file_path = PathBuf::from("/");
// Gets the response from the website as bytes. (Not sure if it's stored in memory; if it is, then it would be impossible to download something larger than your total RAM size.)
let content = reqwest::get(panda_link).await?.bytes().await?;
// Writes the bytes to the disk.
fs::write(new_file_path, content)?;
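If the worry is that .bytes() buffers the whole body in memory, reqwest can also stream the response to disk chunk by chunk. A minimal sketch, assuming reqwest's stream feature and the futures-util crate are enabled; the function name, parameters, and error handling are illustrative, not taken from the question:
use futures_util::StreamExt;
use std::io::Write;

async fn download_streaming(url: &str, path: &std::path::Path) -> Result<(), anyhow::Error> {
    // Fail early on a non-2xx status instead of waiting on the body.
    let response = reqwest::get(url).await?.error_for_status()?;
    // Pull the body chunk by chunk rather than buffering it all with .bytes().
    let mut stream = response.bytes_stream();
    // Blocking file I/O is used here for brevity; tokio::fs could be used instead.
    let mut file = std::fs::File::create(path)?;
    while let Some(chunk) = stream.next().await {
        file.write_all(&chunk?)?;
    }
    Ok(())
}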

Related

Stream body bytes from Surf request to file

I'm making requests for files using surf and I want to save the response body in a file. But these files are too large to store in memory, so I need to be able to stream them to the file.
Surf looks like it supports this, but I have not found a way to save the result in a file.
My attempt looks like this:
let mut result = surf::get(&link)
    .await
    .map_err(|err| anyhow!(err))
    .context(format!("Failed to fetch from {}", &link))?;
let body = surf::http::Body::from_reader(req, None);
let mut image_tempfile: File = tempfile::tempfile()?;
image_tempfile.write(body);
but this does not work, as write() expects a &[u8], which I believe would require reading the whole body into memory. Is there any way I can write the content of the surf Body to a file without holding it all in memory?
Surf's Response implements AsyncRead (a.k.a. async_std::io::Read), and if you convert your temporary file into something that implements AsyncWrite (like async_std::fs::File), then you can use async_std::io::copy to move bytes from one to the other asynchronously, without buffering the whole file:
let mut response = surf::get("https://www.google.com").await.unwrap();
let mut tempfile = async_std::fs::File::from(tempfile::tempfile().unwrap());
async_std::io::copy(&mut response, &mut tempfile).await.unwrap();
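The same idea works for writing to a named file instead of a tempfile. A sketch under the same setup as the snippet above; the output path is illustrative, and the explicit flush is just there to make sure buffered bytes reach the disk:
use async_std::io::prelude::*;

let mut response = surf::get(&link).await.unwrap();
// Create the destination file asynchronously (path is an example value).
let mut file = async_std::fs::File::create("/tmp/image.bin").await.unwrap();
// Stream the body straight into the file without buffering it all in memory.
async_std::io::copy(&mut response, &mut file).await.unwrap();
file.flush().await.unwrap();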

Rust hyper HTTP(S) server to read/write arbitrary amount of bytes

I'm looking to build a basic HTTP(S) server using Rust's hyper, with the purpose of throughput measurement. Essentially it has two functions:
on a GET request, send an infinite (or arbitrarily large) stream of bytes;
on a POST, discard all incoming bytes, then send a short acknowledgement.
I have this working for HTTP using std::net, but would like to port it to hyper to be able to measure both HTTP and HTTPS. Being fairly new to Rust, I am wondering how to add it to the hyper HTTPS example server - that is, how I can get the response builder to expose a stream (io::stream?) that I can write a static buffer of random bytes to, without building the entire response body in memory.
Essentially, I would like to go
loop {
    match stream.write(rand_bytes) {
        Ok(_) => {},
        Err(_) => break,
    }
}
here
async fn echo(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    let mut response = Response::new(Body::empty());
    match (req.method(), req.uri().path()) {
        // Help route.
        (&Method::GET, "/") => {
            *response.body_mut() = Body::from("I want to be a stream.\n");
        }
        ...
I see that I could wrap a futures stream using wrap_stream, so maybe my question is how to define a stream iterator that I can use in wrap_stream which returns the same bytes over and over again.
Thanks to the comment from Caesar above, the following snippet worked for me:
let infstream: futures::stream::Iter<std::iter::Repeat<Result<String, String>>> = stream::iter(std::iter::repeat(Ok(rand_string)));
*response.body_mut() = Body::wrap_stream(infstream);
This does not implement any termination criteria, but it could easily be modified to return a fixed number of bytes.
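For instance, a hedged sketch of a bounded variant: limit the repeating iterator with take so the stream ends after a fixed number of chunks. The chunk size and count below are illustrative values, not taken from the original answer:
// Send the same 1 KiB chunk 1024 times (~1 MiB total), then end the stream.
let chunk = "x".repeat(1024);
let bounded = stream::iter(std::iter::repeat(Ok::<String, String>(chunk)).take(1024));
*response.body_mut() = Body::wrap_stream(bounded);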

How to exchange data in a block_on section?

I'm learning Rust and Tokio and I suspect I may be going in the wrong direction.
I'm trying to open a connection to a remote server and perform a handshake. I want to use non-blocking IO, so I'm using Tokio's thread pool. The handshake needs to be performed quickly or the remote will close the socket, so I'm trying to chain the message exchange in a single block_on section:
let result: Result<(), Box<dyn std::error::Error>> = session
    .runtime()
    .borrow_mut()
    .block_on(async {
        let startup = startup(session.configuration());
        stream.write_all(startup.as_ref()).await?;
        let mut buffer: Vec<u8> = Vec::new();
        let mut tmp = [0u8; 1];
        loop {
            let total = stream.read(&mut tmp).await;
            /*
            if total == 0 {
                break;
            }
            */
            if total.is_err() {
                break;
            }
            buffer.extend(&tmp);
        }
        Ok(())
    });
My problem is what to do when there are no more bytes to read from the socket. My current implementation reads the response and then hangs after the last byte, I believe because the socket is not closed. I thought checking for 0 bytes read would be enough, but the call to read() never returns.
What's the best way to handle this?
From your comment:
Nope, the connection is meant to remain open.
If you read from an open connection, the read will block until there are enough bytes to satisfy it or the other end closes the connection, similar to how blocking reads work in C. Tokio is working as intended.
If closing the stream does not signal the end of a message, then you will have to do your own work to figure out when to stop reading and start processing. A simple way would be to just prefix the request with a length, and only read that many bytes.
Note that you'd have to do the above no matter what API you'd use. The fact that you use tokio or not doesn't really answer the fundamental question of "when is the message over".
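As a sketch of the length-prefix idea using Tokio's AsyncReadExt; the 4-byte big-endian prefix and the function name are assumed framing choices for illustration, not something the remote protocol necessarily uses:
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;

// Read one length-prefixed message: a 4-byte big-endian length, then exactly
// that many bytes of payload.
async fn read_message(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let len = stream.read_u32().await? as usize;
    let mut payload = vec![0u8; len];
    stream.read_exact(&mut payload).await?;
    Ok(payload)
}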

Runtime returning without completing all tasks despite using block_on in Tokio

I am writing a multi-threaded concurrent Kafka producer using Rust and Tokio. The project has two modes: an interactive mode that runs in an infinite loop, and a file mode which takes a file as an argument, reads it, and sends its messages to Kafka via multiple threads. Interactive mode works fine, but file mode has issues.
To achieve this, I had initially started with Rayon, but then switched to a more flexible runtime: Tokio. Now I am able to parallelize the task of sending data over a specified number of threads within Tokio; however, it seems that the runtime is getting dropped before all messages are produced. Here is my code:
pub fn worker(brokers: String, f: File, t: usize, topic: Arc<String>) {
    let reader = BufReader::new(f);
    let mut rt = runtime::Builder::new()
        .threaded_scheduler()
        .core_threads(t)
        .build()
        .unwrap();
    let producers: Arc<Vec<Mutex<BaseProducer>>> = Arc::new(
        (0..t)
            .map(|_| get_producer(&brokers))
            .collect::<Vec<Mutex<BaseProducer>>>(),
    );
    let acounter = atomic::AtomicUsize::new(0);
    let _results: Vec<_> = reader
        .lines()
        .map(|line| line.unwrap())
        .map(move |line| {
            let prods = producers.clone();
            let tp = topic.clone();
            let cnt = acounter.swap(
                (acounter.load(atomic::Ordering::SeqCst) + 1) % t,
                atomic::Ordering::SeqCst,
            );
            rt.block_on(async move {
                match prods[cnt]
                    .lock()
                    .unwrap()
                    .send(BaseRecord::to(&(*tp)).payload(&line).key(""))
                {
                    Ok(_) => (),
                    Err(e) => eprintln!("{:?}", e),
                };
            })
        })
        .collect();
}

fn get_producer(brokers: &String) -> Mutex<BaseProducer> {
    Mutex::new(
        BaseProducer::from_config(
            ClientConfig::new()
                .set("bootstrap.servers", &brokers)
                .set("message.timeout.ms", "5000"),
        )
        .expect("Producer creation error"),
    )
}
As a high-level walkthrough: I create mutable producers equal to the number of threads specified and every task within this thread will use one of these producers. The file is read line by line sequentially and every line is moved into the closure that produces it as a message to Kafka.
The code works fine for the most part, but there are issues related to the runtime exiting without completing all tasks, even when I am using the runtime's block_on function, which is supposed to block until the future is complete (the async block, in my case).
I believe the issue is that the runtime is getting dropped before all the threads within Tokio have exited successfully.
I tried reading a file with this approach having 100,000 records. On a single thread, I was able to produce 28,000 records; on 2 threads, close to 46,000; and while utilising all 8 cores of my CPU, I was getting 99,000-100,000 messages, non-deterministically.
I have checked several answers on SO, but none help in my case. I also read through the documentation of tokio::runtime::Runtime here and tried to use spawn and then use futures::future::join, but that didn't work either.
Any help is appreciated!

How can I read non-blocking from stdin?

Is there a way to check whether data is available on stdin in Rust, or to do a read that returns immediately with the currently available data?
My goal is to be able to read the input produced by, for instance, cursor keys in a shell that is set up to return all read data immediately, e.g. with an equivalent of stty -echo -echok -icanon min 1 time 0.
I suppose one solution would be to use ncurses or similar libraries, but I would like to avoid any kind of large dependencies.
So far, I got only blocking input, which is not what I want:
let mut reader = stdin();
let mut s = String::new();
match reader.read_to_string(&mut s) {...} // this blocks :(
Converting OP's comment into an answer:
You can spawn a thread and send data over a channel. You can then poll that channel in the main thread using try_recv.
use std::io;
use std::sync::mpsc;
use std::sync::mpsc::Receiver;
use std::sync::mpsc::TryRecvError;
use std::{thread, time};

fn main() {
    let stdin_channel = spawn_stdin_channel();
    loop {
        match stdin_channel.try_recv() {
            Ok(key) => println!("Received: {}", key),
            Err(TryRecvError::Empty) => println!("Channel empty"),
            Err(TryRecvError::Disconnected) => panic!("Channel disconnected"),
        }
        sleep(1000);
    }
}

fn spawn_stdin_channel() -> Receiver<String> {
    let (tx, rx) = mpsc::channel::<String>();
    thread::spawn(move || loop {
        let mut buffer = String::new();
        io::stdin().read_line(&mut buffer).unwrap();
        tx.send(buffer).unwrap();
    });
    rx
}

fn sleep(millis: u64) {
    let duration = time::Duration::from_millis(millis);
    thread::sleep(duration);
}
Most operating systems work with standard input and output in a blocking way by default. No wonder, then, that the Rust standard library follows suit.
To read from a blocking stream in a non-blocking way you might create a separate thread, so that the extra thread blocks instead of the main one. Checking whether a blocking file descriptor produced some input is similar: spawn a thread, make it read the data, check whether it produced any data so far.
Here's a piece of code that I use with a similar goal of processing a pipe output interactively and that can hopefully serve as an example. It sends the data over a channel, which supports the try_recv method - allowing you to check whether the data is available or not.
Someone has told me that mio might be used to read from a pipe in a non-blocking way, so you might want to check it out too. I suspect that passing the stdin file descriptor (0) to Receiver::from_raw_fd should just work.
You could also potentially look at using ncurses (also on crates.io) which would allow you read in raw mode. There are a few examples in the Github repository which show how to do this.
