Instantiating a struct that has private fields [duplicate] - rust

I'm trying to learn more about FFI in Rust and linking with C libraries, specifically libc. While on my "quest", I came across the following problem.
Normal pattern in C
void(* sig_set(int sig, void(*handler)(int))) {
// uninitialized sigaction structs
struct sigaction new_action, old_action;
// assign options to new action
new_action.sa_flags = SA_RESTART;
new_action.sa_handler = handler;
sigemptyset(&new_action.sa_mask);
if(sigaction(sig, &new_action, &old_action) < 0) {
fprintf(stderr, "Error: %s!\n", "signal error");
exit(1);
}
return old_action.sa_handler;
}
Attempt in Rust
use libc; // 0.2.77
fn sig_init(sig: i32, handler: fn(i32) -> ()) -> usize {
unsafe {
let mut new_action: libc::sigaction;
let mut old_action: libc::sigaction;
new_action.sa_flags = 0x10000000;
new_action.sa_sigaction = handler as usize;
libc::sigemptyset(&mut new_action.sa_mask as *mut libc::sigset_t);
libc::sigaction(
sig,
&mut new_action as *mut libc::sigaction,
&mut old_action as *mut libc::sigaction,
);
old_action.sa_sigaction
}
}
The compiler will throw the following error:
error[E0381]: assign to part of possibly-uninitialized variable: `new_action`
--> src/lib.rs:8:9
|
8 | new_action.sa_flags = 0x10000000;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of possibly-uninitialized `new_action`
error[E0381]: borrow of possibly-uninitialized variable: `old_action`
--> src/lib.rs:15:13
|
15 | &mut old_action as *mut libc::sigaction,
| ^^^^^^^^^^^^^^^ use of possibly-uninitialized `old_action`
This makes sense as very bad things could happen if sigemptyset were to read from sa_mask. So I tried the following on line 3 of the above.
let mut new_action: libc::sigaction = libc::sigaction {
sa_sigaction: handler as usize,
sa_flags: 0x10000000,
sa_mask: mask,
};
This will not work as _restorer is missing in the above example, but _restorer is private. How would I get around this problem or a similar situation? Would you use something like mem::transmute?

The standard library defines a couple of types and functions to deal with initialization. They are generic, so they can be used to initialize values of any type.
Modern Rust suggests using MaybeUninit in most cases. Applied to your case, it would look something like:
use std::mem::MaybeUninit;
let mut new_action: libc::sigaction = MaybeUninit::zeroed().assume_init();
let mut old_action: libc::sigaction = MaybeUninit::zeroed().assume_init();
MaybeUninit was stabilized in Rust 1.36. Before that, you could use std::mem::uninitialized(), which gives you an uninitialized value. LLVM will consider the contents to be undefined, and will perform aggressive optimizations based on this. You must initialize any value before it's read.
More appropriate to your case, there's std::mem::zeroed(), which gives you a value whose storage is filled with zeroes. This function is unsafe because such a value is not necessarily legal for all types. zeroed() is appropriate for "plain old data" (POD) types. Applied to your case, it would look something like:
use std::mem;
let mut new_action: libc::sigaction = mem::zeroed();
let mut old_action: libc::sigaction = mem::zeroed();

How about using mem::zeroed? The docs even say:
This is useful for FFI functions sometimes, but should generally be avoided.

Related

Is there a way to use the value of an element from a map to update another element without interior mutability? [duplicate]

I have the following code:
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &mut HashMap<i32, HashSet<i32>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get_mut(&start).unwrap();
let pipes = conns.get(&num).unwrap();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
The logic is not very important, I'm trying to create a function which will itself and walk over pipes.
My issue is that this doesn't compile:
error[E0502]: cannot borrow `*conns` as immutable because it is also borrowed as mutable
--> src/main.rs:10:17
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- mutable borrow occurs here
10 | let pipes = conns.get(&num).unwrap();
| ^^^^^ immutable borrow occurs here
...
19 | }
| - mutable borrow ends here
error[E0499]: cannot borrow `*conns` as mutable more than once at a time
--> src/main.rs:16:46
|
9 | let mut orig_conns = conns.get_mut(&start).unwrap();
| ----- first mutable borrow occurs here
...
16 | populate_connections(start, num, conns, ancs);
| ^^^^^ second mutable borrow occurs here
...
19 | }
| - first borrow ends here
I don't know how to make it work. At the beginning, I'm trying to get two HashSets stored in a HashMap (orig_conns and pipes).
Rust won't let me have both mutable and immutable variables at the same time. I'm confused a bit because this will be completely different objects but I guess if &start == &num, then I would have two different references to the same object (one mutable, one immutable).
Thats ok, but then how can I achieve this? I want to iterate over one HashSet and read and modify other one. Let's assume that they won't be the same HashSet.
If you can change your datatypes and your function signature, you can use a RefCell to create interior mutability:
use std::cell::RefCell;
use std::collections::{HashMap, HashSet};
fn populate_connections(
start: i32,
num: i32,
conns: &HashMap<i32, RefCell<HashSet<i32>>>,
ancs: &mut HashSet<i32>,
) {
let mut orig_conns = conns.get(&start).unwrap().borrow_mut();
let pipes = conns.get(&num).unwrap().borrow();
for pipe in pipes.iter() {
if !ancs.contains(pipe) && !orig_conns.contains(pipe) {
ancs.insert(*pipe);
orig_conns.insert(*pipe);
populate_connections(start, num, conns, ancs);
}
}
}
fn main() {}
Note that if start == num, the thread will panic because this is an attempt to have both mutable and immutable access to the same HashSet.
Safe alternatives to RefCell
Depending on your exact data and code needs, you can also use types like Cell or one of the atomics. These have lower memory overhead than a RefCell and only a small effect on codegen.
In multithreaded cases, you may wish to use a Mutex or RwLock.
Use hashbrown::HashMap
If you can switch to using hashbrown, you may be able to use a method like get_many_mut:
use hashbrown::HashMap; // 0.12.1
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
if let Some([a, b]) = map.get_many_mut([&1, &2]) {
std::mem::swap(a, b);
}
dbg!(&map);
}
As hashbrown is what powers the standard library hashmap, this is also available in nightly Rust as HashMap::get_many_mut.
Unsafe code
If you can guarantee that your two indices are different, you can use unsafe code and avoid interior mutability:
use std::collections::HashMap;
fn get_mut_pair<'a, K, V>(conns: &'a mut HashMap<K, V>, a: &K, b: &K) -> (&'a mut V, &'a mut V)
where
K: Eq + std::hash::Hash,
{
unsafe {
let a = conns.get_mut(a).unwrap() as *mut _;
let b = conns.get_mut(b).unwrap() as *mut _;
assert_ne!(a, b, "The two keys must not resolve to the same value");
(&mut *a, &mut *b)
}
}
fn main() {
let mut map = HashMap::new();
map.insert(1, true);
map.insert(2, false);
dbg!(&map);
let (a, b) = get_mut_pair(&mut map, &1, &2);
std::mem::swap(a, b);
dbg!(&map);
}
Similar code can be found in libraries like multi_mut.
This code tries to have an abundance of caution. An assertion enforces that the two values are distinct pointers before converting them back into mutable references and we explicitly add lifetimes to the returned variables.
You should understand the nuances of unsafe code before blindly using this solution. Notably, previous versions of this answer were incorrect. Thanks to #oberien for finding the unsoundness in the original implementation of this and proposing a fix. This playground demonstrates how purely safe Rust code could cause the old code to result in memory unsafety.
An enhanced version of this solution could accept an array of keys and return an array of values:
fn get_mut_pair<'a, K, V, const N: usize>(conns: &'a mut HashMap<K, V>, mut ks: [&K; N]) -> [&'a mut V; N]
It becomes more difficult to ensure that all the incoming keys are unique, however.
Note that this function doesn't attempt to solve the original problem, which is vastly more complex than verifying that two indices are disjoint. The original problem requires:
tracking three disjoint borrows, two of which are mutable and one that is immutable.
tracking the recursive call
must not modify the HashMap in any way which would cause resizing, which would invalidate any of the existing references from a previous level.
must not alias any of the references from a previous level.
Using something like RefCell is a much simpler way to ensure you do not trigger memory unsafety.

Does the documentation mention the possibility of adding the `mut` keyword in front of functions' arguments?

I have a basic Reader encapsulating some generic elements:
pub struct Reader<R> {
inner: R,
order: Endian,
first_ifd_offset: usize,
}
impl<R: Read + Seek> Reader<R> {
pub fn new(reader: R) -> Result<Reader<R>> {
let mut order_raw = [0, 0];
reader.read_exact(&mut order_raw)?;
let magic_number = u16::to_be(u16::from_bytes(order_raw));
/* ... */
}
}
This does not compile and produces the following error:
error[E0596]: cannot borrow immutable argument `reader` as mutable
--> src/reader.rs:17:9
|
15 | pub fn new(reader: R) -> Result<Reader<R>> {
| ------ consider changing this to `mut reader`
16 | let mut order_raw = [0, 0];
17 | reader.read_exact(&mut order_raw)?;
| ^^^^^^ cannot borrow mutably
As I am getting the argument by value, the new function should be the new owner of the reader element. The compiler advises me to to add a mut keyword in front of the function argument.
Does the documentation mention the possibility of adding the mut keyword in front of functions' arguments? I was not able to find resources mentioning it.
The BufReader struct of the standard library has a
similar new function and does not use a mut keyword but an unsafe
block code in the body. Does unsafe prevent the usage of mut inside the function's signature?
It’s implied but not directly mentioned in the book. Both let and function arguments are patterns, so just like you can use mut in let, you can use it in arguments.
I think the compiler is very precise in saying where to add the mut. Generally the compiler tries to underline the specific places:
pub fn new(mut reader: R) -> Result<Reader<R>>
It's now possible to mutate the reader in the function. This behaves like:
pub fn new(reader: R) -> Result<Reader<R>, Error> {
let mut reader = reader;
// ...
As far as I know, it's only mentioned once in the book but more or less in sense of It's a pattern, you may use it in functions too.
unsafe does not fix it, it's UB:
Mutating non-mutable data — that is, data reached through a shared reference or data owned by a let binding), unless that data is contained within an UnsafeCell<U>.

How do I initialize an opaque C struct when using Rust FFI?

Here's what I would like to do in C code:
#include <some_lib.h>
int main() {
some_lib_struct_t x;
some_lib_func(&x);
}
How do I make use of the library in Rust? Here's what I've got so far:
extern crate libc; // 0.2.51
struct some_lib_struct_t;
#[link(name = "some_lib")]
extern "C" {
fn some_lib_func(x: *mut some_lib_struct_t);
}
fn main() {
let mut x: some_lib_struct_t;
unsafe {
some_lib_func(&mut x);
}
}
When compiling I get an error:
error[E0381]: borrow of possibly uninitialized variable: `x`
--> src/main.rs:13:23
|
13 | some_lib_func(&mut x);
| ^^^^^^ use of possibly uninitialized `x`
The safest answer is to initialize the struct yourself:
let mut x: some_lib_struct_t = some_lib_struct_t;
unsafe {
some_lib_func(&mut x);
}
The closest analog to the C code is to use MaybeUninit
use std::mem::MaybeUninit;
unsafe {
let mut x = MaybeUninit::uninit();
some_lib_func(x.as_mut_ptr());
}
Before Rust 1.36, you can use mem::uninitialized:
unsafe {
let mut x: some_lib_struct_t = std::mem::uninitialized();
some_lib_func(&mut x);
}
You have to be sure that some_lib_func completely initializes all the members of the struct, otherwise the unsafety will leak outside of the unsafe block.
Speaking of "members of the struct", I can almost guarantee your code won't do what you want. You've defined some_lib_struct_t as having zero size. That means that no stack space will be allocated for it, and a reference to it won't be what your C code is expecting.
You need to mirror the definition of the C struct in Rust so that the appropriate size, padding, and alignment can be allocated. Usually, this means using repr(C).
Many times, C libraries avoid exposing their internal struct representation by always returning a pointer to the opaque type:
What's the Rust idiom to define a field pointing to a C opaque pointer?
In Rust how can I define or import a C struct from a third party library?
Allocating an object for C / FFI library calls
From the documentation for mem::uninitialized():
Deprecated since 1.39.0: use mem::MaybeUninit instead
The new solution would look like this:
use std::mem::MaybeUninit;
let instance = unsafe {
let mut x: MaybeUninit<some_lib_struct_t> = MaybeUninit::uninit();
some_lib_func(x.as_mut_ptr());
x.assume_init()
}
After reading Shepmaster's answer, I looked closer at the header for the library. Just as they said, some_lib_struct_t was just a typedef for a pointer to actual_lib_struct_t. I made the following changes:
extern crate libc;
struct actual_lib_struct_t;
type some_lib_type_t = *mut actual_lib_struct_t;
#[link(name="some_lib")]
extern {
fn some_lib_func(x: *mut some_lib_type_t);
}
fn main() {
let mut x: some_lib_type_t;
unsafe {
x = std::mem::uninitialized();
some_lib_func(&mut x);
}
}
And it works! I do however get the warning found zero-size struct in foreign module, consider adding a member to this struct, #[warn(improper_ctypes)] on by default.

tokio-curl: capture output into a local `Vec` - may outlive borrowed value

I do not know Rust well enough to understand lifetimes and closures yet...
Trying to collect the downloaded data into a vector using tokio-curl:
extern crate curl;
extern crate futures;
extern crate tokio_core;
extern crate tokio_curl;
use std::io::{self, Write};
use std::str;
use curl::easy::Easy;
use tokio_core::reactor::Core;
use tokio_curl::Session;
fn main() {
// Create an event loop that we'll run on, as well as an HTTP `Session`
// which we'll be routing all requests through.
let mut lp = Core::new().unwrap();
let mut out = Vec::new();
let session = Session::new(lp.handle());
// Prepare the HTTP request to be sent.
let mut req = Easy::new();
req.get(true).unwrap();
req.url("https://www.rust-lang.org").unwrap();
req.write_function(|data| {
out.extend_from_slice(data);
io::stdout().write_all(data).unwrap();
Ok(data.len())
})
.unwrap();
// Once we've got our session, issue an HTTP request to download the
// rust-lang home page
let request = session.perform(req);
// Execute the request, and print the response code as well as the error
// that happened (if any).
let mut req = lp.run(request).unwrap();
println!("{:?}", req.response_code());
println!("out: {}", str::from_utf8(&out).unwrap());
}
Produces an error:
error[E0373]: closure may outlive the current function, but it borrows `out`, which is owned by the current function
--> src/main.rs:25:24
|
25 | req.write_function(|data| {
| ^^^^^^ may outlive borrowed value `out`
26 | out.extend_from_slice(data);
| --- `out` is borrowed here
|
help: to force the closure to take ownership of `out` (and any other referenced variables), use the `move` keyword, as shown:
| req.write_function(move |data| {
Investigating further, I see that Easy::write_function requires the 'static lifetime, but the example of how to collect output from the curl-rust docs uses Transfer::write_function instead:
use curl::easy::Easy;
let mut data = Vec::new();
let mut handle = Easy::new();
handle.url("https://www.rust-lang.org/").unwrap();
{
let mut transfer = handle.transfer();
transfer.write_function(|new_data| {
data.extend_from_slice(new_data);
Ok(new_data.len())
}).unwrap();
transfer.perform().unwrap();
}
println!("{:?}", data);
The Transfer::write_function does not require the 'static lifetime:
impl<'easy, 'data> Transfer<'easy, 'data> {
/// Same as `Easy::write_function`, just takes a non `'static` lifetime
/// corresponding to the lifetime of this transfer.
pub fn write_function<F>(&mut self, f: F) -> Result<(), Error>
where F: FnMut(&[u8]) -> Result<usize, WriteError> + 'data
{
...
But I can't use a Transfer instance on tokio-curl's Session::perform because it requires the Easy type:
pub fn perform(&self, handle: Easy) -> Perform {
transfer.easy is a private field that is directly passed to session.perform.
It this an issue with tokio-curl? Maybe it should mark the transfer.easy field as public or implement new function like perform_transfer? Is there another way to collect output using tokio-curl per transfer?
The first thing you have to understand when using the futures library is that you don't have any control over what thread the code is going to run on.
In addition, the documentation for curl's Easy::write_function says:
Note that the lifetime bound on this function is 'static, but that is often too restrictive. To use stack data consider calling the transfer method and then using write_function to configure a callback that can reference stack-local data.
The most straight-forward solution is to use some type of locking primitive to ensure that only one thread at a time may have access to the vector. You also have to share ownership of the vector between the main thread and the closure:
use std::sync::Mutex;
use std::sync::Arc;
let out = Arc::new(Mutex::new(Vec::new()));
let out_closure = out.clone();
// ...
req.write_function(move |data| {
let mut out = out_closure.lock().expect("Unable to lock output");
// ...
}).expect("Cannot set writing function");
// ...
let out = out.lock().expect("Unable to lock output");
println!("out: {}", str::from_utf8(&out).expect("Data was not UTF-8"));
Unfortunately, the tokio-curl library does not currently support using the Transfer type that would allow for stack-based data.

How can I initialize sigset_t or other variables used as "out parameters" in Rust?

I'm trying to learn more about FFI in Rust and linking with C libraries, specifically libc. While on my "quest", I came across the following problem.
Normal pattern in C
void(* sig_set(int sig, void(*handler)(int))) {
// uninitialized sigaction structs
struct sigaction new_action, old_action;
// assign options to new action
new_action.sa_flags = SA_RESTART;
new_action.sa_handler = handler;
sigemptyset(&new_action.sa_mask);
if(sigaction(sig, &new_action, &old_action) < 0) {
fprintf(stderr, "Error: %s!\n", "signal error");
exit(1);
}
return old_action.sa_handler;
}
Attempt in Rust
use libc; // 0.2.77
fn sig_init(sig: i32, handler: fn(i32) -> ()) -> usize {
unsafe {
let mut new_action: libc::sigaction;
let mut old_action: libc::sigaction;
new_action.sa_flags = 0x10000000;
new_action.sa_sigaction = handler as usize;
libc::sigemptyset(&mut new_action.sa_mask as *mut libc::sigset_t);
libc::sigaction(
sig,
&mut new_action as *mut libc::sigaction,
&mut old_action as *mut libc::sigaction,
);
old_action.sa_sigaction
}
}
The compiler will throw the following error:
error[E0381]: assign to part of possibly-uninitialized variable: `new_action`
--> src/lib.rs:8:9
|
8 | new_action.sa_flags = 0x10000000;
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ use of possibly-uninitialized `new_action`
error[E0381]: borrow of possibly-uninitialized variable: `old_action`
--> src/lib.rs:15:13
|
15 | &mut old_action as *mut libc::sigaction,
| ^^^^^^^^^^^^^^^ use of possibly-uninitialized `old_action`
This makes sense as very bad things could happen if sigemptyset were to read from sa_mask. So I tried the following on line 3 of the above.
let mut new_action: libc::sigaction = libc::sigaction {
sa_sigaction: handler as usize,
sa_flags: 0x10000000,
sa_mask: mask,
};
This will not work as _restorer is missing in the above example, but _restorer is private. How would I get around this problem or a similar situation? Would you use something like mem::transmute?
The standard library defines a couple of types and functions to deal with initialization. They are generic, so they can be used to initialize values of any type.
Modern Rust suggests using MaybeUninit in most cases. Applied to your case, it would look something like:
use std::mem::MaybeUninit;
let mut new_action: libc::sigaction = MaybeUninit::zeroed().assume_init();
let mut old_action: libc::sigaction = MaybeUninit::zeroed().assume_init();
MaybeUninit was stabilized in Rust 1.36. Before that, you could use std::mem::uninitialized(), which gives you an uninitialized value. LLVM will consider the contents to be undefined, and will perform aggressive optimizations based on this. You must initialize any value before it's read.
More appropriate to your case, there's std::mem::zeroed(), which gives you a value whose storage is filled with zeroes. This function is unsafe because such a value is not necessarily legal for all types. zeroed() is appropriate for "plain old data" (POD) types. Applied to your case, it would look something like:
use std::mem;
let mut new_action: libc::sigaction = mem::zeroed();
let mut old_action: libc::sigaction = mem::zeroed();
How about using mem::zeroed? The docs even say:
This is useful for FFI functions sometimes, but should generally be avoided.

Resources