Rust hyper HTTP(S) server to read/write an arbitrary number of bytes

I'm looking to build a basic HTTP(S) server using Rust hyper, for the purpose of throughput measurement. Essentially it has two functions:
- on a GET request, send an infinite (or arbitrarily large) stream of bytes;
- on a POST request, discard all incoming bytes, then send a short acknowledgement.
I have this working for HTTP using std::net, but would like to port it to hyper so I can measure both HTTP and HTTPS. Being fairly new to Rust, I am wondering how to add it to the hyper HTTPS example server - that is, how I can get the response builder to expose a stream (io::stream?) that I can write a static buffer of random bytes to, without building the entire response body in memory.
Essentially, I would like to go
loop {
    match stream.write(rand_bytes) {
        Ok(_) => {},
        Err(_) => break,
    }
}
here
async fn echo(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    let mut response = Response::new(Body::empty());
    match (req.method(), req.uri().path()) {
        // Help route.
        (&Method::GET, "/") => {
            *response.body_mut() = Body::from("I want to be a stream.\n");
        }
        ...
I see that I could wrap a futures stream using wrap_stream, so maybe my question is how to define a stream iterator that I can use in wrap_stream which returns the same bytes over and over again.

Thanks to the comment from Caesar above, the following snippet worked for me:
let infstream: futures::stream::Iter<std::iter::Repeat<Result<String, String>>> =
    stream::iter(std::iter::repeat(Ok(rand_string)));
*response.body_mut() = Body::wrap_stream(infstream);
This does not implement any termination criteria, but it could easily be modified to return a fixed number of bytes.
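For example, a bounded variant can reuse the same pattern with take (a sketch, assuming hyper 0.14 with the stream feature; the chunk size and count are arbitrary):

use futures::stream;
use hyper::Body;

// Illustrative sizes: 16_000 chunks of 64 KiB each, roughly 1 GiB total.
const CHUNK_SIZE: usize = 64 * 1024;
const NUM_CHUNKS: usize = 16_000;

fn bounded_body() -> Body {
    let chunk = vec![0u8; CHUNK_SIZE];
    // `take` turns the infinite repeat iterator into a bounded one; the
    // String error type is never produced, it only satisfies the
    // `Result` item shape that `wrap_stream` expects.
    let stream = stream::iter(std::iter::repeat(Ok::<_, String>(chunk)).take(NUM_CHUNKS));
    Body::wrap_stream(stream)
}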

Related

How to know when the server has received the whole request?

I am implementing the HTTP/1.1 protocol from scratch for academic purposes. I have implemented a RequestBuilder which builds the request object incrementally from the buffers passed to it. This is the code that handles the opened socket.
async fn process_socket(stream: TcpStream) -> Result<Request> {
    let mut request_builder = RequestBuilder::new();
    // Zero-initialise the buffer; `MaybeUninit::uninit().assume_init()`
    // on a plain byte array is undefined behaviour.
    let mut buffer = [0u8; 1024];
    loop {
        stream.readable().await?;
        match stream.try_read(&mut buffer) {
            Ok(0) => {
                break;
            }
            Ok(n) => {
                // Parse only the bytes actually read, not the whole buffer.
                request_builder.parse(&buffer[..n]);
            }
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
                continue;
            }
            Err(e) => {
                return Err(e.into());
            }
        }
    }
    let request = request_builder.build()?;
    Ok(request)
}
request_builder.parse(&buffer[..n]); takes the next chunk of input and parses the request further. My question is: how do I break the loop once the client has sent the whole request? When I make a request to the server using curl localhost:8080, the whole request is parsed.
Expected behaviour
The loop would have been broken after reading the whole request stream.
Actual behaviour
The loop is stuck at stream.readable().await?; after reading the whole request into the buffer. Currently, when I kill the curl command using Ctrl+C, the loop is broken via Ok(0), but I want it to break after reading the whole request.
You need to interpret the HTTP request itself, because the client will not half-close the TCP connection after sending it. A FIN from the client (which would violate the protocol here) is the only other thing that makes readable() return, unless the client sends more data (which would also break the HTTP specification). So the only reliable way to know the request is over is to parse it: read the headers up to the blank line, then use Content-Length (or chunked transfer encoding) to work out how many body bytes remain.
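Schematically, that means breaking out of the loop as soon as the parser reports completion (a sketch; is_complete() is a hypothetical method on the asker's RequestBuilder that would return true once the headers and any Content-Length body have been fully consumed):

loop {
    stream.readable().await?;
    match stream.try_read(&mut buffer) {
        Ok(0) => break, // peer closed the connection
        Ok(n) => {
            request_builder.parse(&buffer[..n]);
            // Hypothetical: stop once the parser has seen the whole
            // request, instead of blocking on readable() again.
            if request_builder.is_complete() {
                break;
            }
        }
        Err(ref e) if e.kind() == ErrorKind::WouldBlock => continue,
        Err(e) => return Err(e.into()),
    }
}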

How to exchange data in a block_on section?

I'm learning Rust and Tokio and I suspect I may be going in the wrong direction.
I'm trying to open a connection to a remote server and perform a handshake. I want to use non-blocking IO so I'm using Tokio's thread pool. The handshake needs to be performed quickly or the remote will close the socket so I'm trying to chain the message exchange in a single block_on section:
let result: Result<(), Box<dyn std::error::Error>> = session
    .runtime()
    .borrow_mut()
    .block_on(async {
        let startup = startup(session.configuration());
        stream.write_all(startup.as_ref()).await?;
        let mut buffer: Vec<u8> = Vec::new();
        let mut tmp = [0u8; 1];
        loop {
            let total = stream.read(&mut tmp).await;
            /*
            if total == 0 {
                break;
            }
            */
            if total.is_err() {
                break;
            }
            buffer.extend(&tmp);
        }
        Ok(())
    });
My problem is what to do when there are no more bytes in the socket to read. My current implementation reads the response, then hangs after the last byte, I believe because the socket is not closed. I thought checking for 0 bytes read would be enough, but the call to read() never returns.
What's the best way to handle this?
From your comment:
Nope, the connection is meant to remain open.
If you read from an open connection, the read will block until there are enough bytes to satisfy it or the other end closes the connection, similar to how blocking reads work in C. Tokio is working as-intended.
If closing the stream does not signal the end of a message, then you will have to do your own work to figure out when to stop reading and start processing. A simple way would be to prefix the request with a length, and only read that many bytes.
Note that you'd have to do the above no matter what API you'd use. The fact that you use tokio or not doesn't really answer the fundamental question of "when is the message over".
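As an illustration of the length-prefix idea, here is a sketch using tokio's AsyncReadExt/AsyncWriteExt (the 4-byte big-endian length is an arbitrary choice):

use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

// Read one length-prefixed message: a big-endian u32 length,
// followed by exactly that many bytes.
async fn read_message(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let len = stream.read_u32().await? as usize;
    let mut buf = vec![0u8; len];
    stream.read_exact(&mut buf).await?;
    Ok(buf)
}

// Write one message with the matching length prefix.
async fn write_message(stream: &mut TcpStream, msg: &[u8]) -> std::io::Result<()> {
    stream.write_u32(msg.len() as u32).await?;
    stream.write_all(msg).await?;
    Ok(())
}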

How to use tokio's UdpSocket to handle messages in a 1 server: N clients setup?

What I want to do:
... write a (1) server / (N) clients (network-game) architecture that uses UDP sockets as the underlying base for communication.
Messages are sent as Vec<u8>, encoded via bincode (crate).
I also want to be able to occasionally send datagrams that exceed the typical max MTU of ~1500 bytes and have them correctly assembled on the receiver end, including sending of ack-messages etc. (I assume I'll have to implement that myself, right?)
For the UdpSocket I thought about using tokio's implementation, and maybe framed. I am not sure whether this is a good choice though, as it seems it would introduce an unnecessary step of mapping Vec<u8> (serialized by bincode) to Vec<u8> (needed by tokio's UdpCodec) (?)
Consider this minimal code-example:
Cargo.toml (server)
bincode = "1.0"
futures = "0.1"
tokio-core = "^0.1"
(Serde and serde-derive are used in shared crate where the protocol is defined!)
(I want to replace tokio-core with tokio asap)
fn main() {
    let addr = format!("127.0.0.1:{port}", port = 8080);
    let addr = addr
        .parse::<SocketAddr>()
        .expect(&format!("Couldn't create valid SocketAddress out of {}", addr));
    let mut core = Core::new().unwrap();
    let handle = core.handle();
    let socket = UdpSocket::bind(&addr, &handle)
        .expect(&format!("Couldn't bind socket to address {}", addr));
    let udp_future = socket.framed(MyCodec {}).for_each(|(addr, data)| {
        socket.send_to(&data, &addr); // Just echo back the data
        Ok(())
    });
    core.run(udp_future).unwrap();
}
struct MyCodec;

impl UdpCodec for MyCodec {
    type In = (SocketAddr, Vec<u8>);
    type Out = (SocketAddr, Vec<u8>);

    fn decode(&mut self, src: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
        Ok((*src, buf.to_vec()))
    }

    fn encode(&mut self, msg: Self::Out, buf: &mut Vec<u8>) -> SocketAddr {
        let (addr, mut data) = msg;
        buf.append(&mut data);
        addr
    }
}
The problem here is:
let udp_future = socket.framed(MyCodec {}).for_each(|(addr, data)| {
                 ------ value moved here   ^^^^^^^^^^^^^^ value captured here after move

 = note: move occurs because `socket` has type `tokio_core::net::UdpSocket`,
         which does not implement the `Copy` trait
The error makes total sense, yet I am not sure how I would create such a simple echo service. In reality, the handling of a message involves a bit more logic of course, but for the sake of a minimal example, this should be enough to give a rough idea.
My current workaround is an ugly hack: creating a second socket.
Here's the signature of UdpSocket::framed from Tokio's documentation:
pub fn framed<C: UdpCodec>(self, codec: C) -> UdpFramed<C>
Note that it takes self, not &self; that is, calling this function consumes the socket. The UdpFramed wrapper owns the underlying socket when you call this. Your compilation error is telling you that you're moving socket when you call this method, but you're also trying to borrow socket inside your closure (to call send_to).
This probably isn't what you want for real code. The whole point of using framed() is to turn your socket into something higher-level, so you can send your codec's items directly instead of having to assemble datagrams. Using send or send_to directly on the socket will probably break the framing of your message protocol. In this code, where you're trying to implement a simple echo server, you don't need to use framed at all. But if you do want to have your cake and eat it and use both framed and send_to, luckily UdpFramed still allows you to borrow the underlying UdpSocket, using get_ref. You can fix your problem this way:
let framed = {
    let socket = UdpSocket::bind(&addr, &handle)
        .expect(&format!("Couldn't bind socket to address {}", addr));
    socket.framed(MyCodec {})
};
let udp_future = framed.for_each(|(addr, data)| {
    info!(self.logger, "Udp packet received from {}: length: {}", addr, data.len());
    framed.get_ref().send_to(&data, &addr); // Just echo back the data
    Ok(())
});
I haven't checked this code, since (as Shepmaster rightly pointed out) your code snippet has other problems, but it should give you the idea anyway. I'll repeat my warning from earlier: if you do this in real code, it will break the network protocol you're using. get_ref's documentation puts it like this:
Note that care should be taken to not tamper with the underlying stream of data coming in as it may corrupt the stream of frames otherwise being worked with.
To answer the new part of your question: yes, you need to handle reassembly yourself, which means your codec does actually need to do some framing on the bytes you're sending. Typically this might involve a start sequence which cannot occur in the Vec<u8>. The start sequence lets you recognise the start of the next message after a packet was lost (which happens a lot with UDP). If there's no byte sequence that can't occur in the Vec<u8>, you need to escape it when it does occur. You might then send the length of the message, followed by the data itself; or just the data, followed by an end sequence and a checksum so you know none was lost. There are pros and cons to these designs, and it's a big topic in itself.
You also need your UdpCodec to contain data: a map from SocketAddr to the partially-reassembled message that's currently in progress. In decode, if you are given the start of a message, copy it into the map and return Ok. If you are given the middle of a message, and you already have the start of a message in the map (for that SocketAddr), append the buffer to the existing buffer and return Ok. When you get to the end of the message, return the whole thing and empty the buffer. The methods on UdpCodec take &mut self in order to enable this use case. (NB In theory, you should also deal with packets arriving out of order, but that's actually quite rare in the real world.)
encode is a lot simpler: you just need to add the same framing and copy the message into the buffer.
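A rough sketch of such a stateful codec, under the simplest design above (the START/END marker bytes are assumptions, and it ignores the escaping problem mentioned earlier; None from decode means "datagram consumed, message not complete yet"):

use std::collections::HashMap;
use std::io;
use std::net::SocketAddr;

const START: u8 = 0x02; // hypothetical start-of-message marker
const END: u8 = 0x03;   // hypothetical end-of-message marker

struct ReassemblingCodec {
    partial: HashMap<SocketAddr, Vec<u8>>,
}

impl UdpCodec for ReassemblingCodec {
    type In = (SocketAddr, Option<Vec<u8>>);
    type Out = (SocketAddr, Vec<u8>);

    fn decode(&mut self, src: &SocketAddr, buf: &[u8]) -> io::Result<Self::In> {
        let entry = self.partial.entry(*src).or_insert_with(Vec::new);
        if buf.first() == Some(&START) {
            // Start of a new message: drop any stale partial data.
            entry.clear();
            entry.extend_from_slice(&buf[1..]);
        } else {
            // Continuation of a message already in progress.
            entry.extend_from_slice(buf);
        }
        if entry.last() == Some(&END) {
            // Message complete: hand it over, without the END marker.
            let mut msg = std::mem::replace(entry, Vec::new());
            msg.pop();
            return Ok((*src, Some(msg)));
        }
        Ok((*src, None))
    }

    fn encode(&mut self, msg: Self::Out, buf: &mut Vec<u8>) -> SocketAddr {
        let (addr, data) = msg;
        buf.push(START);
        buf.extend_from_slice(&data);
        buf.push(END);
        addr
    }
}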
Let me reiterate here that you don't need to and shouldn't use the underlying socket after calling framed() on it. UdpFramed is both a source and a sink, so you use that one object to send the replies as well. You can even use split() to get separate Stream and Sink implementations out of it, if that makes the ownership easier in your application.
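For instance, the echo server can be expressed against the framed socket alone (a sketch, in the same tokio-core/futures 0.1 style as the question):

let (sink, stream) = socket.framed(MyCodec {}).split();
// Forward every decoded (addr, data) item straight back into the sink,
// echoing each datagram to its sender.
let echo = stream.forward(sink);
core.run(echo).unwrap();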
Overall, now I've seen how much of the problem you're struggling with, I'd recommend just using several TCP sockets instead of UDP. If you want a connection-oriented, reliable protocol, TCP already exists and does that for you. It's very easy to spend a lot of time making a "reliable" layer on top of UDP that is both slower and less reliable than TCP.

Join futures with limited concurrency

I have a large vector of Hyper HTTP request futures and want to resolve them into a vector of results. Since there is a limit of maximum open files, I want to limit concurrency to N futures.
I've experimented with Stream::buffer_unordered, but it seems like it executes futures one by one.
We've used code like this in a project to avoid opening too many TCP sockets. These futures have Hyper futures within, so it seems exactly the same case.
// Convert the iterator into a `Stream`. We will process
// `PARALLELISM` futures at the same time, but with no specified
// order.
let all_done = futures::stream::iter(iterator_of_futures.map(Ok))
    .buffer_unordered(PARALLELISM);

// Everything after here is just using the stream in
// some manner, not directly related.
let mut successes = Vec::with_capacity(LIMIT);
let mut failures = Vec::with_capacity(LIMIT);

// Pull values off the stream, dividing them into success and
// failure buckets.
let mut all_done = all_done.into_future();
loop {
    match core.run(all_done) {
        Ok((None, _)) => break,
        Ok((Some(v), next_all_done)) => {
            successes.push(v);
            all_done = next_all_done.into_future();
        }
        Err((v, next_all_done)) => {
            failures.push(v);
            all_done = next_all_done.into_future();
        }
    }
}
This is used in a piece of example code, so the event loop (core) is explicitly driven. Watching the number of file handles used by the program showed that it was capped. Additionally, before this bottleneck was added, we quickly ran out of allowable file handles, whereas afterward we did not.
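For reference, with today's futures 0.3 / async-await APIs the same bottleneck can be expressed more directly (a sketch; requests stands for your iterator of Hyper request futures and PARALLELISM for the concurrency cap):

use futures::stream::{self, StreamExt};

// Keep at most PARALLELISM futures in flight at once and gather
// whatever they resolve to (Ok or Err) into one vector.
let results: Vec<_> = stream::iter(requests)
    .buffer_unordered(PARALLELISM)
    .collect()
    .await;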

How do I create a structure that can be used in a Rust multithreaded server?

I want to implement a simple server, used by 3 different modules of my project.
These modules send data to the server, which saves it into a file and merges the information once the modules have finished their job.
Each piece of information has a timestamp (a float) and a label (a float or a string).
This is my data structure for saving this information:
pub struct Data {
    file_name: String,
    logs: Vec<(f32, String)>,
    measures: Vec<(f32, f32)>,
    statements: Vec<(f32, String)>,
}
I use sockets to interact with the server.
I also use Arc to make a single Data struct shareable between these modules.
So, when I handle a client, I verify that the message sent by the module is correct, and if it is, I call a function that processes the message and saves it into the right field of the data structure (logs, measures or statements).
// Current IP address
let ip_addr: &str = &format!("{}:{}", &ip, port);

// Bind the current IP address
let listener = match TcpListener::bind(ip_addr) {
    Ok(listener) => listener,
    Err(error) => panic!("Cannot bind {}, due to error {}", ip_addr, error),
};

let global_data_struct = Data::new(DEFAULT_FILE.to_string());
let global_data_struct_shared = Arc::new(global_data_struct);

// Get and process streams
for stream in listener.incoming() {
    let mut global_data_struct_shared_clone = global_data_struct_shared.clone();
    thread::spawn(move || {
        match stream {
            // Get the stream value
            Ok(mut stream_v) => {
                let current_ip = stream_v.peer_addr().unwrap().ip();
                let current_port = stream_v.peer_addr().unwrap().port();
                println!("Connected with peer {}:{}", current_ip, current_port);
                // PROBLEM IN handle_client!
                // Arc::get_mut on global_data_struct_shared_clone
                // returns None, not a value - so I can't access the
                // fields of global_data_struct_shared_clone :'(
                handle_client(&mut stream_v, &mut global_data_struct_shared_clone);
            }
            Err(_) => error!("Cannot decode stream"),
        }
    });
}

// Stop listening
drop(listener);
I have trouble getting a mutable reference in handle_client to process the fields of global_data_struct_shared_clone, because Arc::get_mut(&mut global_data_struct_shared_clone) returns None - due to the global_data_struct_shared.clone() made for each incoming request.
Can someone help me manage this structure correctly between these 3 modules, please?
The insight of Rust is that memory safety is achieved by enforcing Aliasing XOR Mutability.
Enforcing this single principle prevents whole classes of bugs: pointer/iterator invalidation (which was the goal) and also data races.
As much as possible, Rust will try to enforce this principle at compile-time; however it can also enforce it at run-time if the user opts in by using dedicated types/methods.
Arc::get_mut is such a method. An Arc (Atomically Reference Counted pointer) is specifically meant to share a reference between multiple owners, which means aliasing, and as a result it disallows mutability by default; Arc::get_mut performs a run-time check: if the pointer is actually not aliased (a reference count of 1), then it allows mutability.
However, as you realized, this is not suitable in your case since the Arc is aliased at that point in time.
So you need to turn to other types.
The simplest solution is Arc<Mutex<...>>: Arc allows sharing, and Mutex allows controlled mutability; together, they let you share data with mutability enforced at run-time by the Mutex.
This is coarse-grained, but might very well be sufficient.
More sophisticated approaches can use RwLock (Reader-Writer lock), more granular Mutex or even atomics; but I would advise starting with a single Mutex and see how it goes, you have to walk before you run.
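Concretely, applied to the code above, that looks something like this (a sketch reusing the question's Data and listener; the logs push is just an example write):

use std::sync::{Arc, Mutex};
use std::thread;

// Wrap the struct in a Mutex before sharing it between threads.
let shared: Arc<Mutex<Data>> = Arc::new(Mutex::new(Data::new(DEFAULT_FILE.to_string())));

for stream in listener.incoming() {
    let shared = Arc::clone(&shared);
    thread::spawn(move || {
        if let Ok(stream_v) = stream {
            println!("Connected with peer {}", stream_v.peer_addr().unwrap());
            // `lock()` blocks until no other thread holds the mutex, then
            // yields a guard giving mutable access to the Data inside.
            let mut data = shared.lock().unwrap();
            data.logs.push((0.0, "peer connected".to_string()));
            // handle_client can take `&mut *data` instead of the Arc.
        }
    });
}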
