I'm fairly new to Rust and have spent most of my time writing code in C/C++. I have a Flask web server that returns a packed data structure in the form of a length followed by a null-terminated string:
test_data = "Hello there bob!" + "\x00"
test_data = test_data.encode("utf-8")
data = struct.pack("<I", len(test_data))
data += test_data
return data
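For comparison, here is a sketch of the same framing built in Rust, which can be handy for testing the client without the Flask server. `encode_frame` is a hypothetical helper: `struct.pack("<I", ...)` corresponds to `u32::to_le_bytes`, and the length includes the appended null byte, just like the Python code.

```rust
use std::convert::TryFrom;

/// Hypothetical helper mirroring the Flask code:
/// 4-byte little-endian length (counting the trailing '\0'), then the bytes.
fn encode_frame(s: &str) -> Vec<u8> {
    let mut payload = s.as_bytes().to_vec();
    payload.push(0); // the Python side appends "\x00" before encoding
    let len = u32::try_from(payload.len()).expect("string too long for a u32 length");
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&len.to_le_bytes()); // same as struct.pack("<I", ...)
    frame.extend_from_slice(&payload);
    frame
}

fn main() {
    // "Hello there bob!" is 16 bytes; with the '\0' the length field is 17.
    let frame = encode_frame("Hello there bob!");
    println!("{:?}", &frame[..4]);
}
```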
In my Rust code, I'm using the easy_http_request crate and can successfully get the response back by calling get_from_url_str. What I'm trying to do is map the returned response back to the Test data structure (if possible). I've tried, unsuccessfully, to use align_to to map the string data onto the structure.
extern crate easy_http_request;
extern crate libc;
use easy_http_request::DefaultHttpRequest;
use libc::c_char;
#[repr(C, packed)]
#[derive(Debug, Clone, Copy)]
struct Test {
a: u32,
b: *const c_char // TODO: What do I put here???
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let (head, body, _tail) = unsafe { response.body.align_to::<Test>() };
let my_test: Test = body[0];
println!("{}", my_test.a); // Correctly prints '17'
println!("{:?}", my_test.b); // Fails
}
I'm not sure this is possible in Rust. In response.body I can correctly see the null-terminated string, so I know the data is there. I'm just unsure whether there's a way to map it to a string in the Test structure. There's no reason I need to use a null-terminated string. Ultimately, I'm just trying to map a data structure consisting of a size and a string to a Rust struct with similar types.
It looks like you are confused by two different meanings of pack:
* In Python, pack is a protocol of sorts to serialize data into an array of bytes.
* In Rust, pack is a directive added to a struct to remove padding between members and disable other weirdness.
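A quick way to see what Rust's packed does is to compare type sizes. The struct names below are hypothetical, and the padded size assumes a typical target where u32 is 4-byte aligned (e.g. x86_64 or aarch64).

```rust
use std::mem::{align_of, size_of};

// Hypothetical structs: identical fields, with and without padding.
#[allow(dead_code)]
#[repr(C)]
struct Padded {
    tag: u8,
    value: u32, // 3 padding bytes are inserted before this field
}

#[allow(dead_code)]
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32, // no padding: the struct is exactly 1 + 4 bytes, alignment 1
}

fn main() {
    println!("Padded: size {}, align {}", size_of::<Padded>(), align_of::<Padded>());
    println!("Packed: size {}, align {}", size_of::<Packed>(), align_of::<Packed>());
}
```

Note that neither layout can hold a variable-length string inline; packing only removes padding between fixed-size fields.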
While they can be used together to make a protocol work, that is not the case here, because your packed data has a variable-length member. And trying to serialize/deserialize a pointer value directly is a very bad idea.
Your packed Flask message is basically:
4 bytes: a little-endian value with the number of bytes in the string (including the trailing null byte).
That many bytes for the string itself, encoded in UTF-8.
For that you do not need a packed struct. The easiest way is to just read the fields manually, one by one. Something like this (error checking omitted):
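use std::convert::TryInto;
let body = &response.body;
// the Flask side packs the length with "<I", i.e. a little-endian u32
let a = u32::from_le_bytes(body[0..4].try_into().unwrap());
// `a` counts the trailing '\0', so strip it from the decoded string
let b = std::str::from_utf8(&body[4..4 + a as usize]).unwrap().trim_end_matches('\0');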
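Since the snippet above omits error checking, here is a sketch of the same manual parsing that returns an error instead of panicking on short or malformed input. `decode_frame` and `FrameError` are hypothetical names, and the sketch assumes the framing described above (a little-endian u32 length that includes the trailing null byte).

```rust
use std::convert::TryInto;
use std::str::Utf8Error;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum FrameError {
    TooShort,
    BadUtf8(Utf8Error),
}

/// Parses a 4-byte little-endian length followed by that many bytes.
/// Returns the string (without its trailing '\0') and the unparsed rest.
fn decode_frame(body: &[u8]) -> Result<(&str, &[u8]), FrameError> {
    let len_bytes: [u8; 4] = body
        .get(..4)
        .ok_or(FrameError::TooShort)?
        .try_into()
        .unwrap(); // cannot fail: get(..4) returned exactly 4 bytes
    let len = u32::from_le_bytes(len_bytes) as usize;
    let end = len.checked_add(4).ok_or(FrameError::TooShort)?;
    let payload = body.get(4..end).ok_or(FrameError::TooShort)?;
    // The length includes the null terminator; drop it if present.
    let payload = payload.strip_suffix(&[0]).unwrap_or(payload);
    let text = std::str::from_utf8(payload).map_err(FrameError::BadUtf8)?;
    Ok((text, &body[end..]))
}

fn main() {
    // Same bytes as the Flask server sends: length 17, then "Hello there bob!\0".
    let mut body = vec![17u8, 0, 0, 0];
    body.extend_from_slice(b"Hello there bob!\0");
    match decode_frame(&body) {
        Ok((text, rest)) => println!("{:?} ({} bytes left over)", text, rest.len()),
        Err(e) => println!("malformed frame: {:?}", e),
    }
}
```

In the real client you would pass `&response.body` instead of the hand-built vector.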
Don't use raw pointers: they are unsafe to use, and recommended only when there are strong reasons to get around Rust’s safety guarantees.
At minimum, a struct that fits your requirement is something like:
struct Test<'a> {
value: &'a str
}
Or use an owned String value to avoid lifetime dependencies.
A &str comprises a pointer and a length (it is not a C-like char * pointer).
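To make that concrete: a &str is a "fat" reference made of a pointer to the bytes plus a length, with no null terminator involved. This sketch inspects the two parts and rebuilds the slice from them; the size check reflects how current targets lay out &str.

```rust
use std::mem::size_of;

fn main() {
    // A &str is a (pointer, length) pair: twice the size of a thin pointer.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let owned = String::from("Hello there bob!");
    let s: &str = &owned;
    let (ptr, len) = (s.as_ptr(), s.len());

    // Rebuild the &str from its parts. This is only sound while `owned` is
    // alive and unmodified: the raw pointer does not keep the data alive.
    let rebuilt: &str =
        unsafe { std::str::from_utf8_unchecked(std::slice::from_raw_parts(ptr, len)) };
    println!("{} ({} bytes, no terminator)", rebuilt, len);
}
```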
By the way, the hard part is not parsing the wire protocol but correctly handling all the possible decoding errors and avoiding unexpected runtime failures in case of buggy or malicious senders.
In order not to reinvent the wheel, here is an example with the parser combinator library nom:
use nom::{
number::complete::le_u32,
bytes::complete::take,
error::ErrorKind,
IResult
};
use easy_http_request::DefaultHttpRequest;
use std::str::from_utf8;
#[derive(Debug, Clone)]
struct Test {
value: String
}
fn decode_len_value(bytes: &[u8]) -> IResult<&[u8], Test> {
let (buffer, len) = le_u32(bytes)?;
// take len-1 bytes because the null char (\0) is counted in len
// (saturating_sub avoids an underflow panic if len is 0)
let (remaining, val) = take(len.saturating_sub(1))(buffer)?;
match from_utf8(val) {
Ok(strval) => Ok((remaining, Test {value: strval.to_owned()})),
Err(_) => Err(nom::Err::Error((remaining, ErrorKind::Char)))
}
}
fn main() {
let response = DefaultHttpRequest::get_from_url_str("http://localhost:5000/").unwrap().send().unwrap();
let result = decode_len_value(&response.body);
println!("{:?}", result);
}
I work on a Rust library used, through C headers, in a Swift UI.
I can read a string from Swift in Rust, but I can't immediately send what I've just read back to Swift (i.e. from Rust).
--
Basically, I can successfully convert a *const i8 containing hello world into a String.
But the pointer returned by as_ptr() for that same String is inconsistent (and so cannot be parsed as UTF-8 in Swift):
Swift sends hello world as *const i8
Rust handles it successfully through let input: &str (#1 print in get_message()) => correctly prints hello world
Now I can't convert this input &str to a pointer again:
the pointer can't be decoded by Swift
the "pointer encoding" changes at every call of the function (it should always be the same output, as for "hello world".as_ptr())
Basically, why does
"hello world".as_ptr() always have the same output and can be decoded by Swift,
while input.as_ptr() has a different output every time it is called and can never be decoded by Swift (even though printing input correctly returns hello world)?
Do you guys have ideas?
#[derive(Debug)]
#[repr(C)]
pub struct MessageC {
pub message_bytes: *const u8,
pub message_len: libc::size_t,
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
unsafe fn c_string_safe(cstring: *const i8) -> String {
CStr::from_ptr(cstring).to_string_lossy().into_owned()
}
/// # Safety
/// call of c_string_safe from Swift
/// => https://doc.rust-lang.org/std/ffi/struct.CStr.html#method.from_ptr
/// on `async extern "C"` => <https://stackoverflow.com/a/52521592/7281870>
#[no_mangle]
#[tokio::main] // allow async function, needed to call here other async functions (not this example but needed)
pub async unsafe extern "C" fn get_message(
user_input: *const i8,
) -> MessageC {
let input: &str = &c_string_safe(user_input);
println!("from Swift: {}", input); // [consistent] from Swift: hello world
println!("converted to ptr: {:?}", input.as_ptr()); // [inconsistent] converted to ptr: 0x60000079d770 / converted to ptr: 0x6000007b40b0
println!("directly to ptr: {:?}", "hello world".as_ptr()); // [consistent] directly to ptr: 0x1028aaf6f
MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
}
}
The way you construct MessageC is unsound and returns a dangling pointer. The code in get_message() is equivalent to this:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let _invisible = c_string_safe(user_input);
let input: &str = &_invisible;
// let's skip the prints
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
drop(_invisible);
return msg;
}
Hopefully this formulation highlights the issue: c_string_safe() returns an owned heap-allocated String which gets dropped (and its data deallocated) by the end of the function. input is a slice that refers to the data allocated by that String. In safe Rust you wouldn't be allowed to return a slice referring to a local variable such as input - you'd have to either return the String itself or limit yourself to passing the slice downwards to functions.
However, you're not using safe Rust and you're creating a pointer to the heap-allocated data. Now you have a problem because as soon as get_message() returns, the _invisible String gets deallocated, and the pointer you're returning is dangling. The dangling pointer may even appear to work because deallocation is not obligated to clear the data from memory, it just marks it as available for future allocations. But those future allocations can and will happen, perhaps from a different thread. Thus a program that references freed memory is bound to misbehave, often in an unpredictable fashion - which is precisely what you have observed.
In all-Rust code you'd resolve the issue by safely returning String instead. But you're doing FFI, so you must reduce the string to a pointer/length pair. Rust allows you to do just that, the easiest way being to just call std::mem::forget() to prevent the string from getting deallocated:
pub async unsafe extern "C" fn get_message(user_input: *const i8) -> MessageC {
let mut input = c_string_safe(user_input);
input.shrink_to_fit(); // ensure string capacity == len
let msg = MessageC {
message_bytes: input.as_ptr(),
message_len: input.len() as libc::size_t,
};
std::mem::forget(input); // prevent input's data from being deallocated on return
msg
}
But now you have a different problem: get_message() allocates a string, but how do you deallocate it? Just dropping MessageC won't do it because it just contains pointers. (And doing so by implementing Drop would probably be unwise because you're sending it to Swift or whatever.) The solution is to provide a separate function that re-creates the String from the MessageC and drops it immediately:
pub unsafe fn free_message_c(m: MessageC) {
// The call to `shrink_to_fit()` above makes it sound to re-assemble
// the string using a capacity equal to its length
drop(String::from_raw_parts(
m.message_bytes as *mut _,
m.message_len,
m.message_len,
));
}
You should call this function once you're done with MessageC, i.e. when the Swift code has done its job. (You could even make it extern "C" and call it from Swift.)
Finally, using "hello world".as_ptr() directly works because "hello world" is a static &str which is baked into the executable and never gets deallocated. In other words, it doesn't point to a String, it points to some static data that comes with the program.
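A variant of the approach above that does not rely on shrink_to_fit() is to convert the String into a Box<str>, whose allocation is always exactly len bytes, and hand out the raw pointer from Box::into_raw. This is a sketch: make_message_c and free_message_c_boxed are hypothetical names, MessageC is redeclared with usize standing in for libc::size_t so the example needs no crates, and the round trip is exercised from Rust instead of Swift. To export the free function you would add #[no_mangle].

```rust
/// Same shape as the question's MessageC, with usize standing in for libc::size_t.
#[repr(C)]
pub struct MessageC {
    pub message_bytes: *const u8,
    pub message_len: usize,
}

/// Hypothetical constructor: hands ownership of the string's buffer to the caller.
pub fn make_message_c(s: String) -> MessageC {
    // into_boxed_str() drops any excess capacity, so the allocation is exactly len bytes.
    let boxed: Box<str> = s.into_boxed_str();
    let message_len = boxed.len();
    // into_raw() gives up ownership without freeing, so no mem::forget is needed.
    let raw: *mut str = Box::into_raw(boxed);
    MessageC {
        message_bytes: raw as *const u8,
        message_len,
    }
}

/// Hypothetical destructor: must be called exactly once per MessageC.
pub unsafe extern "C" fn free_message_c_boxed(m: MessageC) {
    let slice = std::ptr::slice_from_raw_parts_mut(m.message_bytes as *mut u8, m.message_len);
    // Rebuilding the Box<str> lets its Drop free the allocation with the right size.
    unsafe { drop(Box::from_raw(slice as *mut str)) };
}

fn main() {
    let m = make_message_c(String::from("hello world"));
    // What the Swift side receives: a pointer/length pair over UTF-8 (no terminator).
    let view = unsafe { std::slice::from_raw_parts(m.message_bytes, m.message_len) };
    println!("{}", std::str::from_utf8(view).unwrap());
    unsafe { free_message_c_boxed(m) }; // `view` must not be used after this
}
```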
I want to send my struct via a TcpStream. I can send a String or u8 values, but I cannot send an arbitrary struct. For example:
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = convert_struct(my_struct); // how??
tcp_stream.write(bytes);
After receiving the data, I want to convert &[u8] back to MyStruct. How can I convert between these two representations?
I know Rust has a JSON module for serializing data, but I don't want to use JSON because I want to send data as fast and as compactly as possible, with little or no overhead.
Viewing a correctly sized struct as zero-copy bytes can be done using only the standard library and a generic function.
In the example below there is a reusable function called any_as_u8_slice instead of convert_struct, since this is a utility that wraps the cast and the slice creation.
Note that while the question asks about converting, this example creates a read-only slice, which has the advantage of not needing to copy the memory.
Here's a working example based on the question:
unsafe fn any_as_u8_slice<T: Sized>(p: &T) -> &[u8] {
::core::slice::from_raw_parts(
(p as *const T) as *const u8,
::core::mem::size_of::<T>(),
)
}
fn main() {
struct MyStruct {
id: u8,
data: [u8; 1024],
}
let my_struct = MyStruct { id: 0, data: [1; 1024] };
let bytes: &[u8] = unsafe { any_as_u8_slice(&my_struct) };
// tcp_stream.write(bytes);
println!("{:?}", bytes);
}
Note 1) even though third-party crates might be better in some cases, this is such a primitive operation that it's useful to know how to do it in Rust.
Note 2) at the time of writing (Rust 1.15), there is no support for const functions. Once there is, it will be possible to cast into a fixed-size array instead of a slice.
Note 3) the any_as_u8_slice function is marked unsafe because any padding bytes in the struct may be uninitialized memory (giving undefined behavior).
If there were a way to ensure input arguments used only structs which were #[repr(packed)], then it could be safe.
Otherwise the function is fairly safe: it cannot cause a buffer over-run, because the output is read-only, has a fixed number of bytes, and its lifetime is bound to the input. If you wanted a version that returned a &mut [u8], that would be quite dangerous, since modifying it could easily create inconsistent or corrupt data.
(Shamelessly stolen and adapted from Renato Zannon's comment on a similar question)
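The answer above only covers the sending side. For a struct like MyStruct, whose fields are all plain bytes, a sketch of both directions without any unsafe code is to copy the fields into a fixed layout yourself. `to_bytes`, `from_bytes` and `WIRE_SIZE` are hypothetical names, and the layout (id first, then the 1024 data bytes) is an assumption both ends must agree on.

```rust
use std::convert::TryInto;

/// Assumed wire layout: 1 byte of `id`, then the 1024 bytes of `data`.
const WIRE_SIZE: usize = 1 + 1024;

#[derive(Debug, PartialEq)]
struct MyStruct {
    id: u8,
    data: [u8; 1024],
}

impl MyStruct {
    /// Hypothetical helper: copies the fields into a byte array (no unsafe, no padding).
    fn to_bytes(&self) -> [u8; WIRE_SIZE] {
        let mut out = [0u8; WIRE_SIZE];
        out[0] = self.id;
        out[1..].copy_from_slice(&self.data);
        out
    }

    /// Hypothetical helper: rebuilds a MyStruct, or None if too few bytes arrived.
    fn from_bytes(bytes: &[u8]) -> Option<MyStruct> {
        let bytes = bytes.get(..WIRE_SIZE)?;
        Some(MyStruct {
            id: bytes[0],
            data: bytes[1..].try_into().ok()?, // exactly 1024 bytes at this point
        })
    }
}

fn main() {
    let my_struct = MyStruct { id: 0, data: [1; 1024] };
    let bytes = my_struct.to_bytes(); // this is what you would write to the TcpStream
    let back = MyStruct::from_bytes(&bytes).unwrap();
    assert_eq!(back, my_struct);
    println!("round trip of {} bytes ok", bytes.len());
}
```

For multi-byte fields you would add explicit to_le_bytes/from_le_bytes calls, which also sidesteps the endianness and padding concerns of the raw-slice approach.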
Perhaps a solution like bincode would suit your case? Here's a working excerpt:
Cargo.toml
[package]
name = "foo"
version = "0.1.0"
authors = ["An Devloper <an.devloper#example.com>"]
edition = "2018"
[dependencies]
bincode = "1.0"
serde = { version = "1.0", features = ["derive"] }
main.rs
use serde::{Deserialize, Serialize};
use std::fs::File;
#[derive(Serialize, Deserialize)]
struct A {
id: i8,
key: i16,
name: String,
values: Vec<String>,
}
fn main() {
let a = A {
id: 42,
key: 1337,
name: "Hello world".to_string(),
values: vec!["alpha".to_string(), "beta".to_string()],
};
// Encode to something implementing `Write`
let mut f = File::create("/tmp/output.bin").unwrap();
bincode::serialize_into(&mut f, &a).unwrap();
// Or just to a buffer
let bytes = bincode::serialize(&a).unwrap();
println!("{:?}", bytes);
}
You would then be able to send the bytes wherever you want. I assume you are already aware of the issues with naively sending bytes around (like potential endianness issues or versioning), but I'll mention them just in case ^_^.
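Whatever encoding you pick, TCP is a byte stream, so the receiver also needs to know where one message ends. A common approach is to prefix each message with its length. The following is a sketch over any std::io::Write/Read (a TcpStream implements both); the helper names and the little-endian u32 prefix are assumptions, and a Vec plus a Cursor stand in for the socket.

```rust
use std::convert::TryFrom;
use std::io::{self, Cursor, Read, Write};

/// Hypothetical helper: writes a 4-byte little-endian length, then the payload.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "frame too large"))?;
    // write_all, unlike write(), keeps writing until every byte has been sent
    w.write_all(&len.to_le_bytes())?;
    w.write_all(payload)
}

/// Hypothetical helper: reads one frame written by `write_frame`.
/// A real server should cap the length before allocating.
fn read_frame<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    // read_exact keeps reading until the buffer is full, or returns an error
    r.read_exact(&mut payload)?;
    Ok(payload)
}

fn main() -> io::Result<()> {
    let mut wire = Vec::new(); // stands in for the TcpStream
    write_frame(&mut wire, b"first")?;
    write_frame(&mut wire, b"second message")?;

    let mut reader = Cursor::new(wire);
    println!("{:?}", String::from_utf8_lossy(&read_frame(&mut reader)?));
    println!("{:?}", String::from_utf8_lossy(&read_frame(&mut reader)?));
    Ok(())
}
```

The payload can be the output of bincode::serialize or of a hand-written encoder; the framing is independent of it.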
I'm doing something with MaybeUninit and FFI in Rust that seems to work, but I suspect may be unsound/relying on undefined behavior.
My aim is to have a struct MoreA extend a struct A, by including A as an initial field. And then to call some C code that writes to the struct A. And then finalize MoreA by filling in its additional fields, based on what's in A.
In my application, the additional fields of MoreA are all integers, so I don't have to worry about assignments to them dropping the (uninitialized) previous values.
Here's a minimal example:
use core::fmt::Debug;
use std::mem::MaybeUninit;
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);
#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
head: A,
more: i32,
}
unsafe fn mock_ffi(p: *mut A) {
// write doesn't drop previous (uninitialized) occupant of p
p.write(A(1, 2));
}
fn main() {
let mut b = MaybeUninit::<MoreA>::uninit();
unsafe { mock_ffi(b.as_mut_ptr().cast()); }
let b = unsafe {
let mut b = b.assume_init();
b.more = 3;
b
};
assert_eq!(&b, &MoreA { head: A(1, 2), more: 3 });
}
Is the code let b = unsafe { ... } sound? It runs OK and Miri doesn't complain.
But the MaybeUninit docs say:
Moreover, uninitialized memory is special in that the compiler knows that it does not have a fixed value. This makes it undefined behavior to have uninitialized data in a variable even if that variable has an integer type, which otherwise can hold any fixed bit pattern.
Also, the Rust Reference says that "Behavior considered undefined" includes:
Producing an invalid value, even in private fields and locals. "Producing" a value happens any time a value is assigned to or read from a place, passed to a function/primitive operation or returned from a function/primitive operation. The following values are invalid (at their respective type):
... An integer (i*/u*) or ... obtained from uninitialized memory.
On the other hand, it doesn't seem possible to write to the more field before calling assume_init. Later on the same page:
There is currently no supported way to create a raw pointer or reference to a field of a struct inside MaybeUninit. That means it is not possible to create a struct by calling MaybeUninit::uninit::<Struct>() and then writing to its fields.
If what I'm doing in the above code example does trigger undefined behavior, what would solutions be?
I'd like to avoid boxing the A value (that is, I'd like to have it be directly included in MoreA).
I'd hope also to avoid having to create one A to pass to mock_ffi and then having to copy the results into MoreA. A in my real application is a large structure.
I guess if there's no sound way to get what I'm after, though, I'd have to choose one of those two fallbacks.
If struct A is of a type that can hold the bit-pattern 0 as a valid value, then I guess a third fallback would be:
Start with MaybeUninit::zeroed() rather than MaybeUninit::uninit().
Currently, the only sound way to refer to uninitialized memory—of any type—is MaybeUninit. In practice, it is probably safe to read or write to uninitialized integers, but that is not officially documented. It is definitely not safe to read or write to an uninitialized bool or most other types.
In general, as the documentation states, you cannot initialize a struct field by field. However, it is sound to do so as long as:
the struct has repr(C). This is necessary because it prevents Rust from doing clever layout tricks, so that the layout of a field of type MaybeUninit<T> remains identical to the layout of a field of type T, regardless of its adjacent fields.
every field is MaybeUninit. This lets us assume_init() for the entire struct, and then later initialise each field individually.
Given that your struct is already repr(C), you can use an intermediate representation which uses MaybeUninit for every field. The repr(C) also means that we can transmute between the types once it is initialised, provided that the two structs have the same fields in the same order.
use std::mem::{self, MaybeUninit};
#[repr(C)]
struct MoreAConstruct {
head: MaybeUninit<A>,
more: MaybeUninit<i32>,
}
let b: MoreA = unsafe {
// It's OK to assume a struct is initialized when all of its fields are MaybeUninit
let mut b_construct = MaybeUninit::<MoreAConstruct>::uninit().assume_init();
mock_ffi(b_construct.head.as_mut_ptr());
b_construct.more = MaybeUninit::new(3);
mem::transmute(b_construct)
};
It is now possible (since Rust 1.51) to initialize fields of any uninitialized struct using the std::ptr::addr_of_mut macro. This example is from the documentation:
You can use MaybeUninit, and the std::ptr::addr_of_mut macro, to initialize structs field by field:
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;
#[derive(Debug, PartialEq)]
pub struct Foo {
    name: String,
    list: Vec<u8>,
}
let foo = {
    let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
    let ptr = uninit.as_mut_ptr();
    // Initializing the `name` field
    unsafe { addr_of_mut!((*ptr).name).write("Bob".to_string()); }
    // Initializing the `list` field
    // If there is a panic here, then the `String` in the `name` field leaks.
    unsafe { addr_of_mut!((*ptr).list).write(vec![0, 1, 2]); }
    // All the fields are initialized, so we call `assume_init` to get an initialized Foo.
    unsafe { uninit.assume_init() }
};
assert_eq!(
foo,
Foo {
name: "Bob".to_string(),
list: vec![0, 1, 2]
}
);
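Applied to the question's types, the same technique lets the C code write A directly into the MoreA allocation, with no intermediate copy and no box. This is a sketch: A, MoreA and mock_ffi are taken from the question (mock_ffi stands in for the real C function), and build_more_a is a hypothetical name. For illustration, `more` is derived from the fields of A, which happens to give the question's expected value of 3.

```rust
use std::mem::MaybeUninit;
use std::ptr::addr_of_mut;

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct A(i32, i32);

#[derive(Clone, Copy, PartialEq, Debug)]
#[repr(C)]
struct MoreA {
    head: A,
    more: i32,
}

// Stand-in for the C function that fills in an A (from the question).
unsafe fn mock_ffi(p: *mut A) {
    unsafe { p.write(A(1, 2)) };
}

fn build_more_a() -> MoreA {
    let mut uninit = MaybeUninit::<MoreA>::uninit();
    let ptr = uninit.as_mut_ptr();
    unsafe {
        // addr_of_mut! creates a raw pointer to the field without ever
        // creating a reference to uninitialized memory.
        mock_ffi(addr_of_mut!((*ptr).head));
        // `head` is initialized now, so its integer fields can be read
        // through the raw pointer.
        let more = (*ptr).head.0 + (*ptr).head.1;
        addr_of_mut!((*ptr).more).write(more);
        // Both fields have been written, so the whole MoreA is initialized.
        uninit.assume_init()
    }
}

fn main() {
    let b = build_more_a();
    assert_eq!(b, MoreA { head: A(1, 2), more: 3 });
    println!("{:?}", b);
}
```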
Essentially, I have a TCP-based network protocol to parse.
In C I can just cast some memory to the type that I want. How can I accomplish something similar with Rust?
You can do the same thing in Rust too. You just have to be a little careful when you define the structure.
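use std::io::Read;
use std::mem;
#[repr(C, packed)] // C field order, no padding between fields
struct YourProtoHeader {
    magic: u8,
    len: u32,
}
let mut buf = [0u8; 1024]; // large enough buffer
let header_size = mem::size_of::<YourProtoHeader>();
// read exactly one header from some Reader (a socket, perhaps)
reader.read_exact(&mut buf[..header_size]).unwrap();
let ptr: *const u8 = buf.as_ptr();
let ptr: *const YourProtoHeader = ptr as *const YourProtoHeader;
// OK only because a packed struct has alignment 1 and any bit pattern is a valid u8/u32
let ptr: &YourProtoHeader = unsafe { &*ptr };
// copy the field out first: println! would take a reference to a packed field, which is an error
let len = ptr.len; // note: this is in the machine's native byte order
println!("Data length: {}", len);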
Unfortunately, I didn't know how to make the buffer exactly size_of::<YourProtoHeader>() bytes: an array length must be a constant, and at the time the size_of() call could not be used in the array initializer. In current Rust, mem::size_of is a const fn, so [0u8; mem::size_of::<YourProtoHeader>()] works. Nevertheless, a large enough buffer will work too.
Here we're converting a pointer to the beginning of the buffer to a pointer to your structure. This is the same thing you would do in C. The structure itself should be annotated with repr(C) and packed (written together as #[repr(C, packed)]): the first disallows field reordering, the second disables padding for field alignment.
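Casting the buffer reads len in the machine's native byte order, which only matches the protocol if both ends share it. As an alternative sketch (not a cast), the header can be decoded field by field with an explicit byte order. The little-endian order and the parse_header name are assumptions here; use from_be_bytes instead if the protocol uses big-endian (network byte order).

```rust
use std::convert::TryInto;

#[derive(Debug, Clone, Copy, PartialEq)]
struct YourProtoHeader {
    magic: u8,
    len: u32,
}

/// Hypothetical parser for a 5-byte header: 1 magic byte, then a little-endian u32.
/// Returns None if the buffer is shorter than a header.
fn parse_header(buf: &[u8]) -> Option<YourProtoHeader> {
    let bytes = buf.get(..5)?;
    Some(YourProtoHeader {
        magic: bytes[0],
        len: u32::from_le_bytes(bytes[1..5].try_into().ok()?),
    })
}

fn main() {
    let buf = [0x7f, 0x10, 0x00, 0x00, 0x00, 0xaa]; // header followed by payload
    let header = parse_header(&buf).expect("buffer shorter than a header");
    println!("magic {:#x}, data length {}", header.magic, header.len);
}
```

No unsafe code, packed attribute, or alignment reasoning is needed, and the struct can use whatever layout Rust prefers.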