How to parse multipart forms using abonander/multipart with Rocket? - rust

This might be useful for me:
I have no idea how you're meant to go about parsing a multipart form
besides doing it manually, using just the raw POST data string as input.
I will try to adapt the Hyper example, but any help would be much appreciated.
Relevant issues:
Support Multipart Forms.
support rocket

Rocket's primary abstraction for data is the FromData trait. Given the POST data and the request, you can construct a given type:
pub trait FromData<'a>: Sized {
    type Error;
    type Owned: Borrow<Self::Borrowed>;
    type Borrowed: ?Sized;
    fn transform(
        request: &Request,
        data: Data
    ) -> Transform<Outcome<Self::Owned, Self::Error>>;
    fn from_data(
        request: &Request,
        outcome: Transformed<'a, Self>
    ) -> Outcome<Self, Self::Error>;
}
Then, it's just a matter of reading the API for multipart and inserting tab A into slot B:
#![feature(proc_macro_hygiene, decl_macro)]
use multipart::server::Multipart; // 0.16.1, default-features = false, features = ["server"]
use rocket::{
data::{Data, FromData, Outcome, Transform, Transformed},
post, routes, Request,
}; // 0.4.2
use std::io::Read;
#[post("/", data = "<upload>")]
fn index(upload: DummyMultipart) -> String {
format!("I read this: {:?}", upload)
}
#[derive(Debug)]
struct DummyMultipart {
alpha: String,
one: i32,
file: Vec<u8>,
}
// All of the errors in these functions should be reported
impl<'a> FromData<'a> for DummyMultipart {
type Owned = Vec<u8>;
type Borrowed = [u8];
type Error = ();
fn transform(_request: &Request, data: Data) -> Transform<Outcome<Self::Owned, Self::Error>> {
let mut d = Vec::new();
data.stream_to(&mut d).expect("Unable to read");
Transform::Owned(Outcome::Success(d))
}
fn from_data(request: &Request, outcome: Transformed<'a, Self>) -> Outcome<Self, Self::Error> {
let d = outcome.owned()?;
let ct = request
.headers()
.get_one("Content-Type")
.expect("no content-type");
let idx = ct.find("boundary=").expect("no boundary");
let boundary = &ct[(idx + "boundary=".len())..];
let mut mp = Multipart::with_body(&d[..], boundary);
// Custom implementation parts
let mut alpha = None;
let mut one = None;
let mut file = None;
mp.foreach_entry(|mut entry| match &*entry.headers.name {
"alpha" => {
let mut t = String::new();
entry.data.read_to_string(&mut t).expect("not text");
alpha = Some(t);
}
"one" => {
let mut t = String::new();
entry.data.read_to_string(&mut t).expect("not text");
let n = t.parse().expect("not number");
one = Some(n);
}
"file" => {
let mut d = Vec::new();
entry.data.read_to_end(&mut d).expect("not file");
file = Some(d);
}
other => panic!("No known key {}", other),
})
.expect("Unable to iterate");
let v = DummyMultipart {
alpha: alpha.expect("alpha not set"),
one: one.expect("one not set"),
file: file.expect("file not set"),
};
// End custom
Outcome::Success(v)
}
}
fn main() {
rocket::ignite().mount("/", routes![index]).launch();
}
I've never used either of these APIs for real, so there's no guarantee that this is a good implementation. In fact, all the panicking on error definitely means it's suboptimal. A production usage would handle all of those cleanly.
However, it does work:
% curl -X POST -F alpha=omega -F one=2 -F file=@hello http://localhost:8000/
I read this: DummyMultipart { alpha: "omega", one: 2, file: [104, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100, 33, 10] }
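As one example of handling an error cleanly instead of panicking, the boundary lookup done with find("boundary=") and expect could return a Result. This is a std-only sketch: the function name and the BoundaryError type are hypothetical, and it only handles the common boundary=value and boundary="value" forms, not every case RFC 2046 allows (for example, a quoted boundary containing ';').

```rust
/// Errors from extracting a multipart boundary (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum BoundaryError {
    NotMultipart,
    MissingBoundary,
}

/// Extracts the boundary from a Content-Type value such as
/// `multipart/form-data; boundary=XYZ` or `...; boundary="XYZ"; charset=utf-8`.
fn boundary_from_content_type(ct: &str) -> Result<&str, BoundaryError> {
    let mut parts = ct.split(';').map(str::trim);
    // The first part is the media type; only multipart/* carries a boundary.
    let media_type = parts.next().unwrap_or("");
    if !media_type.to_ascii_lowercase().starts_with("multipart/") {
        return Err(BoundaryError::NotMultipart);
    }
    for param in parts {
        if let Some((name, value)) = param.split_once('=') {
            if name.trim().eq_ignore_ascii_case("boundary") {
                // Strip optional surrounding double quotes.
                let value = value.trim();
                let value = value
                    .strip_prefix('"')
                    .and_then(|v| v.strip_suffix('"'))
                    .unwrap_or(value);
                if !value.is_empty() {
                    return Ok(value);
                }
            }
        }
    }
    Err(BoundaryError::MissingBoundary)
}
```

Inside from_data, an Err here could then presumably be turned into a failure outcome carrying Status::BadRequest rather than aborting the request handler with expect.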
An advanced implementation would allow for some abstraction between the user-specific data and the generic multipart aspects. Something like Multipart<MyForm> would be nice.
The author of Rocket points out that this solution allows a malicious end user to POST an arbitrarily large file, which would cause the machine to run out of memory. Depending on the intended use, you may wish to cap the number of bytes read, potentially spilling to the filesystem past some threshold.
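A cap can be enforced with std::io::Read::take before buffering. The sketch below is std-only and generic over any reader; read_capped is a hypothetical helper, and in the Rocket 0.4 transform above one would presumably pass it the reader returned by Data::open() instead of calling stream_to. It rejects oversized bodies rather than spilling them to disk.

```rust
use std::io::{self, Read};

/// Reads at most `limit` bytes from `reader` into memory.
/// Returns an error if the source holds more than `limit` bytes,
/// instead of letting an unbounded upload exhaust memory.
fn read_capped<R: Read>(reader: R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    // Ask for one byte more than the limit so that "exactly at the limit"
    // can be told apart from "over the limit".
    reader.take(limit.saturating_add(1)).read_to_end(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "upload exceeds size limit",
        ));
    }
    Ok(buf)
}
```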

At the time of writing, official support for multipart form parsing in Rocket was still being discussed. In the meantime, take a look at the official example showing how to integrate the multipart crate with Rocket: https://github.com/abonander/multipart/blob/master/examples/rocket.rs

Related

can you carry mutable state in nom?

I've been playing around with nom and I'm wondering how to track state when you are parsing something. For practice I'm using nom to write a QOI image file library.
So far I've tried using closures and wrapping the parsers in a struct implementation. I'm fairly sure I could get it done with global mutable state, but we all know that's a sin. I've posted a minimal(ish) example below with the error I get on my struct impl version:
use nom::{
branch::alt,
bytes::streaming::tag,
error::Error,
error::ErrorKind,
error::ParseError,
multi::{count, many1},
number::complete::be_u8,
sequence::{pair, terminated, tuple},
IResult,
};
type Pixel = [u8; 3];
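fn px_hash(px: Pixel) -> usize {
// Widen to usize before multiplying so the sum cannot overflow u8; QOI's index is the sum modulo 64.
(px[0] as usize * 3 + px[1] as usize * 5 + px[2] as usize * 7) % 64
}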
pub struct Decoder {
table: [Pixel; 64],
prev: Pixel,
}
const QOI_END_TAG: &[u8] = &[0, 0, 0, 0, 0, 0, 0, 1];
const QOI_FAKE_HEADER: u8 = 0xff;
const QOI_OP_RGB: u8 = 0xfe;
impl Decoder {
pub fn decode(&mut self, input: &[u8]) -> IResult<&[u8], Vec<Pixel>> {
pair(
tag(&[QOI_FAKE_HEADER]),
terminated(many1(alt((self.rgb, self.index))), tag(QOI_END_TAG)),
)(input)
.map(|(_, (next_input, pixels))| (next_input, pixels))
}
pub fn rgb(&mut self, input: &[u8]) -> IResult<&[u8], Pixel> {
tuple((tag(&[QOI_OP_RGB]), count(be_u8, 3)))(input).map(|(next_input, (_, c))| {
let px = [c[0], c[1], c[2]];
self.prev = px;
self.table[px_hash(px)] = px;
(next_input, px)
})
}
pub fn index(&mut self, input: &[u8]) -> IResult<&[u8], Pixel> {
be_u8(input).and_then(|(next_input, res)| {
if res >= 64 {
// Bytes >= 64 are not index ops: return the error instead of constructing and discarding it.
return Err(nom::Err::Error(Error::from_error_kind(input, ErrorKind::Tag)));
}
let res = res as usize;
self.prev = self.table[res];
Ok((next_input, self.table[res]))
})
}
}
And here's the error I get.
error[E0615]: attempted to take value of method `rgb` on type `&mut min::Decoder`
--> src/min.rs:32:40
|
32 | terminated(many1(alt((self.rgb, self.index))), tag(QOI_END_TAG)),
| ^^^ method, not a field
I understand that the signature isn't really what's expected, since self is an implicit argument, but closures have the same problem, only worse.
Does anyone know a work around for a problem like this? Carrying state is a common requirement for any kind of decompression etc. so I'm sure that this issue has come up for people but my google-fu hasn't come up with much.
I suppose it's also possible that I'm using the wrong tool for this job, but I was under the impression that parser combinators could get something like this done.
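The core problem is that every parser handed to nom's alt would need its own &mut self borrow, which the borrow checker rejects. A common workaround is to keep the mutable state in a RefCell, so each parser is a closure that only borrows the state immutably and several of them can coexist; with nom, closures of that shape returning an IResult could be passed to alt together. To keep this sketch std-only and testable, the loop that many1(alt((rgb, index))) would perform is written by hand, and each parser returns the number of bytes consumed instead of an IResult. The hash follows QOI's (r*3 + g*5 + b*7) % 64 without the alpha term, widened to usize to avoid overflow.

```rust
use std::cell::RefCell;

type Pixel = [u8; 3];

const QOI_OP_RGB: u8 = 0xfe;

// QOI-style index hash, computed in usize so the arithmetic cannot overflow u8.
fn px_hash(px: Pixel) -> usize {
    (px[0] as usize * 3 + px[1] as usize * 5 + px[2] as usize * 7) % 64
}

struct State {
    table: [Pixel; 64],
    prev: Pixel, // kept for other ops (e.g. diffs) that this sketch omits
}

fn decode(input: &[u8]) -> Vec<Pixel> {
    // Shared state behind a RefCell: each closure below captures only `&state`,
    // so both parsers can exist at the same time.
    let state = RefCell::new(State { table: [[0; 3]; 64], prev: [0; 3] });

    // RGB op: tag byte followed by three channel bytes.
    let rgb = |i: &[u8]| -> Option<(usize, Pixel)> {
        match i {
            [QOI_OP_RGB, r, g, b, ..] => {
                let px = [*r, *g, *b];
                let mut s = state.borrow_mut();
                s.prev = px;
                s.table[px_hash(px)] = px;
                Some((4, px))
            }
            _ => None,
        }
    };

    // Index op: a single byte below 64 that looks up a previously seen pixel.
    let index = |i: &[u8]| -> Option<(usize, Pixel)> {
        match i.first() {
            Some(&b) if b < 64 => {
                let mut s = state.borrow_mut();
                let px = s.table[b as usize];
                s.prev = px;
                Some((1, px))
            }
            _ => None,
        }
    };

    // Hand-rolled equivalent of many1(alt((rgb, index))): stop when neither matches.
    let mut pos = 0;
    let mut pixels = Vec::new();
    while let Some((used, px)) = rgb(&input[pos..]).or_else(|| index(&input[pos..])) {
        pixels.push(px);
        pos += used;
    }
    pixels
}
```

Each borrow_mut lasts only for the body of one parser call, so the runtime borrow check never sees overlapping borrows here.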

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bytes and convert into i32
let int64: i64 = reader.read(); // separate implementation to read 8 bytes and convert into i64
I imagine it looking like this: (pseudo-code again)
impl Reader {
read<T>(&mut self) -> T {
// if T is i32 ... else if ...
}
}
or like this:
impl Reader {
read(&mut self) -> i32 {
// ...
}
read(&mut self) -> i64 {
// ...
}
}
But I haven't found anything relevant yet.
(I actually have, for the first case (if T is i32 ...), but it looked really unreadable and inconvenient)
You could do this by having a Readable trait which you implement on i32 and i64, which does the operation. Then on Reader you could have a generic function which takes any type that is Readable and return it, for example:
struct Reader {
n: u8,
}
trait Readable {
fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
fn read_from_reader(reader: &mut Reader) -> i32 {
reader.n += 1;
reader.n as i32
}
}
impl Readable for i64 {
fn read_from_reader(reader: &mut Reader) -> i64 {
reader.n += 1;
reader.n as i64
}
}
impl Reader {
fn read<T: Readable>(&mut self) -> T {
T::read_from_reader(self)
}
}
fn main() {
let mut r = Reader { n: 0 };
let int32: i32 = r.read();
let int64: i64 = r.read();
println!("{} {}", int32, int64);
}
You can try it on the playground
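The counter in the answer stands in for real decoding. A sketch of the same trait-dispatch pattern that actually consumes 4 or 8 bytes, as the question asks, might look like this; the field names, the take_array helper, the Option return type (to signal running out of bytes), and little-endian byte order are all assumptions of this sketch.

```rust
use std::convert::TryInto;

/// A reader over an in-memory byte buffer (field names are illustrative).
struct Reader {
    bytes: Vec<u8>,
    pos: usize,
}

/// Types that know how to decode themselves from the front of a Reader.
trait Readable: Sized {
    fn read_from_reader(reader: &mut Reader) -> Option<Self>;
}

impl Reader {
    /// Takes the next N bytes as a fixed-size array, or None if too few remain.
    fn take_array<const N: usize>(&mut self) -> Option<[u8; N]> {
        let end = self.pos.checked_add(N)?;
        let chunk = self.bytes.get(self.pos..end)?;
        self.pos = end;
        chunk.try_into().ok()
    }

    /// Generic entry point: the requested type picks the implementation.
    fn read<T: Readable>(&mut self) -> Option<T> {
        T::read_from_reader(self)
    }
}

impl Readable for i32 {
    fn read_from_reader(reader: &mut Reader) -> Option<i32> {
        reader.take_array::<4>().map(i32::from_le_bytes)
    }
}

impl Readable for i64 {
    fn read_from_reader(reader: &mut Reader) -> Option<i64> {
        reader.take_array::<8>().map(i64::from_le_bytes)
    }
}
```

Usage matches the question: let int32: i32 = reader.read().unwrap(); picks the i32 implementation purely from the annotated type.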
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{
mem::{self, MaybeUninit},
ptr,
};
static DATA: [u8; 8] = [
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
];
struct Reader;
impl Reader {
fn read<T: Copy + Sized>(&self) -> T
where
[(); mem::size_of::<T>()]: ,
{
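let mut buf = [0_u8; mem::size_of::<T>()]; // zero-initialized: calling assume_init on uninitialized u8s would be undefined behavior
unsafe {
// Copies size_of::<T>() bytes out of DATA, so T must be at most 8 bytes here.
ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
mem::transmute_copy(&buf)
}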
}
}
fn main() {
let reader = Reader;
let v_u8: u8 = reader.read();
dbg!(v_u8);
let v_u16: u16 = reader.read();
dbg!(v_u16);
let v_u32: u32 = reader.read();
dbg!(v_u32);
let v_u64: u64 = reader.read();
dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
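There is a tracking issue for this limitation if you want to dig deeper into the error.
We just follow the compiler's suggestion and add a where bound. This requires the generic_const_exprs feature, which is only available on nightly Rust and is still marked incomplete.
Next, the array should be zero-initialized with 0_u8. Skipping initialization with unsafe { MaybeUninit::uninit().assume_init() } is not a valid optimization: calling assume_init on uninitialized integers is undefined behavior, and zeroing a few bytes that are overwritten immediately costs next to nothing.
Finally, copy the bytes you need into the array, then transmute it into the generic type and return it. This is only sound for types where every bit pattern is a valid value, such as integers and floats; for types like bool or char, arbitrary bytes can produce invalid values, which is undefined behavior.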
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615
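On stable Rust, generic_const_exprs can be avoided altogether: std::ptr::read_unaligned reads a T straight out of a byte slice, with no intermediate array. The sketch below restricts T with a hypothetical unsafe marker trait, Plain, implemented only for unsigned integers, because reading arbitrary bytes into a type like bool would be undefined behavior; crates such as bytemuck provide a vetted Pod trait for this purpose. Like the original, it reads in native byte order.

```rust
use std::{mem, ptr};

/// Marker for types where every bit pattern is a valid value
/// (hypothetical trait for this sketch).
unsafe trait Plain: Copy {}
unsafe impl Plain for u8 {}
unsafe impl Plain for u16 {}
unsafe impl Plain for u32 {}
unsafe impl Plain for u64 {}

struct Reader<'a> {
    data: &'a [u8],
}

impl<'a> Reader<'a> {
    /// Reads a `T` from the start of `data` in native byte order,
    /// or returns `None` if there are not enough bytes.
    fn read<T: Plain>(&self) -> Option<T> {
        if self.data.len() < mem::size_of::<T>() {
            return None;
        }
        // SAFETY: the length check guarantees size_of::<T>() readable bytes,
        // read_unaligned tolerates any alignment, and `Plain` guarantees that
        // every bit pattern is a valid T.
        Some(unsafe { ptr::read_unaligned(self.data.as_ptr() as *const T) })
    }
}
```

The length check also removes the out-of-bounds risk of copying size_of::<T>() bytes from a fixed 8-byte buffer.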

Read binary file in units of f64 in Rust

Assuming you have a binary file example.bin and you want to read that file in units of f64, i.e. the first 8 bytes give a float, the next 8 bytes give a number, etc. (assuming you know the endianness), how can this be done in Rust?
I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert each group of 8 bytes to an f64, i.e.
fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
    barry.try_into().expect("slice with incorrect length")
}
let file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
let nr = f64::from_be_bytes(*nr);
I saw this post, but it's from 2015 and a lot has changed in Rust since then, so I was wondering if there is a better/faster way these days?
Here is an example without proper error handling, and without checking for files whose size is not a multiple of 8 bytes.
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
    // Use BufReader because files in std are unbuffered by default,
    // and reading 8 bytes at a time from an unbuffered file is a really bad idea.
    let mut input = BufReader::new(
        File::open("floats.bin")
            .expect("Failed to open file")
    );
    let mut floats = Vec::new();
    loop {
        use std::io::ErrorKind;
        // You may use 8 instead of `size_of`, but size_of is less error-prone.
        let mut buffer = [0u8; std::mem::size_of::<f64>()];
        // Use read_exact because `read` may return fewer
        // than 8 bytes even if there are bytes left in the file.
        // This, however, prevents us from handling cases
        // when the file size is not a multiple of 8.
        let res = input.read_exact(&mut buffer);
        match res {
            // We detect if we read until the end.
            // If there were some excess bytes after the last read, they are lost.
            Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
            // Add more cases of errors you want to handle.
            _ => {}
        }
        // You should probably do better error handling.
        // This simply panics.
        res.expect("Unexpected error during read");
        // Use `from_be_bytes` if the numbers in the file are big-endian.
        let f = f64::from_le_bytes(buffer);
        floats.push(f);
    }
}
I would create a generic iterator that returns f64 for flexibility and reusability.
use std::io::{self, Read};
struct F64Reader<R: io::BufRead> {
    inner: R,
}
impl<R: io::BufRead> F64Reader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner }
    }
}
impl<R: io::BufRead> Iterator for F64Reader<R> {
    type Item = f64;
    fn next(&mut self) -> Option<Self::Item> {
        let mut buff: [u8; 8] = [0; 8];
        // Stops iterating at EOF (or on any other read error).
        self.inner.read_exact(&mut buff).ok()?;
        // Assumes big-endian data; use from_le_bytes for little-endian files.
        Some(f64::from_be_bytes(buff))
    }
}
This means that if the file is large, you can loop through the values without storing them all in memory:
let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
println!("{}", f)
}
Or, if you want all the values, you can collect them:
let input = fs::File::open("example.bin")?;
let values: Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();
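If the whole file fits in memory, std::fs::read combined with slice::chunks_exact avoids both the manual loop and the slicing gymnastics from the question. A sketch, assuming big-endian data like the iterator above; the function name is made up for illustration.

```rust
use std::convert::TryInto;

/// Decodes a byte buffer as consecutive big-endian f64 values.
/// Trailing bytes that do not form a full 8-byte group are ignored
/// (chunks_exact exposes them via `remainder()` if you need to check).
fn bytes_to_f64s(bytes: &[u8]) -> Vec<f64> {
    bytes
        .chunks_exact(8)
        .map(|chunk| {
            // chunks_exact(8) guarantees len == 8, so this conversion cannot fail.
            let arr: [u8; 8] = chunk.try_into().expect("chunk is exactly 8 bytes");
            f64::from_be_bytes(arr)
        })
        .collect()
}
```

In a function returning std::io::Result, usage would be let values = bytes_to_f64s(&std::fs::read("example.bin")?);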

How can I put an async function into a Vec in Rust?

I need to put some futures in a Vec for later joining. However if I try to collect it using an iterator, the compiler doesn't seem to be able to determine the type for the vector.
I'm trying to create a command line utility that accepts an arbitrary number of IP addresses, communicates with those remotes and collects the results for printing. The communication function works well, I've cut down the program to show the failure I need to understand.
use futures::future::join_all;
use itertools::Itertools;
use std::net::SocketAddr;
use std::str::from_utf8;
use std::fmt;
#[tokio::main(flavor = "current_thread")]
pub async fn main() -> Result<(), Box<dyn std::error::Error>> {
let socket: Vec<SocketAddr> = vec![
"192.168.20.33:502".parse().unwrap(),
"192.168.20.34:502".parse().unwrap(),];
let async_vec = vec![
MyStruct::get(socket[0]),
MyStruct::get(socket[1]),];
// The above 3 lines happen to work to build a Vec because there are
// 2 sockets. But I need to build a Vec to join_all from an arbitary
// number of addresses. Why doesn't the line below work instead?
//let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
let rt = join_all(async_vec).await;
let results = rt.iter().map(|x| x.as_ref().unwrap().to_string()).join("\n");
let mut rvec: Vec<String> = results.split("\n").map(|x| x.to_string()).collect();
rvec.sort_by(|a, b| a[15..20].cmp(&b[15..20]));
println!("{}", rvec.join("\n"));
Ok(())
}
struct MyStruct {
serial: [u8; 12],
placeholder: String,
}
impl fmt::Display for MyStruct {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let serial = match from_utf8(&self.serial) {
Ok(v) => v,
Err(_) => "(invalid)",
};
let lines = (1..4).map(|x| format!("{}, line{}, {}", serial, x, self.placeholder)).join("\n");
write!(f, "{}", lines)
}
}
impl MyStruct {
pub async fn get(sockaddr: SocketAddr) -> Result<MyStruct, Box<dyn std::error::Error>> {
let char = sockaddr.ip().to_string().chars().last().unwrap();
let rv = MyStruct{serial: [char as u8;12], placeholder: sockaddr.to_string(), };
Ok(rv)
}
}
This line:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect();
doesn't work because the compiler can't know that you want to collect everything into a Vec. You might want to collect into some other container (e.g. a linked list or a set). Therefore you need to tell the compiler the kind of container you want with:
let async_vec = socket.iter().map(|x| MyStruct::get(*x)).collect::<Vec<_>>();
or:
let async_vec: Vec<_> = socket.iter().map(|x| MyStruct::get(*x)).collect();
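To see why the target has to be named, here is the same iterator collected into three different containers. Only the type written at the call site (turbofish or binding annotation) tells collect which FromIterator implementation to use; the squares function is just an illustrative stand-in for the map over sockets.

```rust
use std::collections::{BTreeSet, LinkedList};

// Stand-in iterator; in the question this would be the map over sockets.
fn squares() -> impl Iterator<Item = u32> {
    (1..=3).map(|x| x * x)
}

fn collect_three_ways() -> (Vec<u32>, LinkedList<u32>, BTreeSet<u32>) {
    // Turbofish names the container; the element type is still inferred.
    let as_vec = squares().collect::<Vec<_>>();
    // An annotation on the binding works just as well.
    let as_list: LinkedList<_> = squares().collect();
    let as_set: BTreeSet<u32> = squares().collect();
    (as_vec, as_list, as_set)
}
```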

How to write incoming stream into a file in warp?

Goal:
The server should be able to receive a stream of binary data and save it to a file.
I'm getting this error:
mismatched types
expected `&[u8]`, found type parameter `B`
How can I get a &[u8] from generic type B?
use warp::Filter;
use warp::{body};
use futures::stream::Stream;
async fn handle_upload<S, B>(stream: S) -> Result<impl warp::Reply, warp::Rejection>
where
S: Stream<Item = Result<B, warp::Error>>,
S: StreamExt,
B: warp::Buf
{
let mut file = File::create("some_binary_file").unwrap();
let pinnedStream = Box::pin(stream);
while let Some(item) = pinnedStream.next().await {
let data = item.unwrap();
file.write_all(data);
}
Ok(warp::reply())
}
#[tokio::main]
async fn main() {
pretty_env_logger::init();
let upload = warp::put()
.and(warp::path("stream"))
.and(body::stream())
.and_then(handle_upload);
warp::serve(upload).run(([127, 0, 0, 1], 3030)).await;
}
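B implements warp::Buf, which is re-exported from the bytes crate. It has a .bytes() method that returns a &[u8], which may work. (In bytes 1.x, which warp 0.3 uses, this method is named .chunk().)
However, the documentation says that .bytes() may return a shorter slice than the buffer actually contains. So you can either loop while the buffer .has_remaining(), writing the slice from .bytes() and then calling .advance() on the buffer by that slice's length, or convert the whole buffer to Bytes and write that to the file:
let mut data = item.unwrap();
// bytes 0.5 API; in bytes 1.x, use data.copy_to_bytes(data.remaining()) instead of to_bytes()
file.write_all(data.to_bytes().as_ref()).unwrap();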
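The chunk-and-advance loop can be sketched without any crates. MiniBuf below is a hypothetical stand-in that mirrors the method names of bytes::Buf in bytes 1.x (remaining, chunk, advance, has_remaining), and Segments is a made-up non-contiguous buffer whose chunk() is shorter than remaining() whenever more than one segment is left. With the real crate, the same loop would run over the B items from the stream.

```rust
use std::io::{self, Write};

/// Stand-in for the parts of `bytes::Buf` used here (hypothetical trait for this sketch).
trait MiniBuf {
    fn remaining(&self) -> usize;
    fn chunk(&self) -> &[u8];
    fn advance(&mut self, n: usize);
    fn has_remaining(&self) -> bool {
        self.remaining() > 0
    }
}

/// A buffer split across several segments (illustrative only).
struct Segments {
    parts: Vec<Vec<u8>>,
    part: usize,   // index of the current segment
    offset: usize, // position inside the current segment
}

impl MiniBuf for Segments {
    fn remaining(&self) -> usize {
        let total: usize = self.parts[self.part..].iter().map(Vec::len).sum();
        total - self.offset
    }
    fn chunk(&self) -> &[u8] {
        match self.parts.get(self.part) {
            Some(p) => &p[self.offset..],
            None => &[],
        }
    }
    fn advance(&mut self, n: usize) {
        self.offset += n;
        // Step past every segment that is now fully consumed, including empty ones.
        while self.part < self.parts.len() && self.offset >= self.parts[self.part].len() {
            self.offset -= self.parts[self.part].len();
            self.part += 1;
        }
    }
}

/// Writes everything in `buf` to `out`, one contiguous chunk at a time.
fn write_all_buf<B: MiniBuf, W: Write>(buf: &mut B, out: &mut W) -> io::Result<()> {
    while buf.has_remaining() {
        let chunk = buf.chunk();
        out.write_all(chunk)?;
        let n = chunk.len();
        buf.advance(n);
    }
    Ok(())
}
```

Writing only the first chunk would silently drop the later segments, which is why the loop runs until has_remaining() is false.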

Resources