How can you allocate a raw mutable pointer in stable Rust? - rust

I was trying to build a naive implementation of a custom String-like struct with small string optimization. Now that unions are allowed in stable Rust, I came up with the following code:
struct Large {
capacity: usize,
buffer: *mut u8,
}
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
I can't seem to find a way how to allocate that *mut u8. Is it possible to do in stable Rust? It looks like using alloc::heap would work, but it is only available in nightly.

As of Rust 1.28, std::alloc::alloc is stable.
Here is an example which shows in general how it can be used.
use std::{
alloc::{self, Layout},
cmp, mem, ptr, slice, str,
};
// This really should **not** be copied
#[derive(Copy, Clone)]
struct Large {
capacity: usize,
buffer: *mut u8,
}
// This really should **not** be copied
#[derive(Copy, Clone, Default)]
struct Small([u8; 16]);
union Container {
large: Large,
small: Small,
}
struct MyString {
len: usize,
container: Container,
}
impl MyString {
fn new() -> Self {
MyString {
len: 0,
container: Container {
small: Small::default(),
},
}
}
fn as_buf(&self) -> &[u8] {
unsafe {
if self.len <= 16 {
&self.container.small.0[..self.len]
} else {
slice::from_raw_parts(self.container.large.buffer, self.len)
}
}
}
pub fn as_str(&self) -> &str {
unsafe { str::from_utf8_unchecked(self.as_buf()) }
}
// Not actually UTF-8 safe!
fn push(&mut self, c: u8) {
unsafe {
use cmp::Ordering::*;
match self.len.cmp(&16) {
Less => {
self.container.small.0[self.len] = c;
}
Equal => {
let capacity = 17;
let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
.expect("Bad layout");
let buffer = alloc::alloc(layout);
{
let buf = self.as_buf();
ptr::copy_nonoverlapping(buf.as_ptr(), buffer, buf.len());
}
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.offset(self.len as isize) = c;
}
Greater => {
let Large {
mut capacity,
buffer,
} = self.container.large;
capacity += 1;
let layout = Layout::from_size_align(capacity, mem::align_of::<u8>())
.expect("Bad layout");
let buffer = alloc::realloc(buffer, layout, capacity);
self.container.large = Large { capacity, buffer };
*self.container.large.buffer.offset(self.len as isize) = c;
}
}
self.len += 1;
}
}
}
impl Drop for MyString {
fn drop(&mut self) {
unsafe {
if self.len > 16 {
let Large { capacity, buffer } = self.container.large;
let layout =
Layout::from_size_align(capacity, mem::align_of::<u8>()).expect("Bad layout");
alloc::dealloc(buffer, layout);
}
}
}
}
fn main() {
let mut s = MyString::new();
for _ in 0..32 {
s.push(b'a');
println!("{}", s.as_str());
}
}
I believe this code to be correct with respect to allocations, but not for anything else. Like all unsafe code, verify it yourself. It's also completely inefficient as it reallocates for every additional character.
If you'd like to allocate a collection of u8 instead of a single u8, you can create a Vec and then convert it into the constituent pieces, such as by calling as_mut_ptr:
use std::mem;
fn main() {
let mut foo = vec![0; 1024]; // or Vec::<u8>::with_capacity(1024);
let ptr = foo.as_mut_ptr();
let cap = foo.capacity();
let len = foo.len();
mem::forget(foo); // Avoid calling the destructor!
let foo_again = unsafe { Vec::from_raw_parts(ptr, len, cap) }; // Rebuild it to drop it
// Do *NOT* use `ptr` / `cap` / `len` anymore
}
Re allocating is a bit of a pain though; you'd have to convert back to a Vec and do the whole dance forwards and backwards
That being said, your Large struct seems to be missing a length, which would be distinct from capacity. You could just use a Vec instead of writing it out. I see now it's up a bit in the hierarchy.
I wonder if having a full String wouldn't be a lot easier, even if it were a bit less efficient in that the length is double-counted...
union Container {
large: String,
small: Small,
}
See also:
What is the right way to allocate data to pass to an FFI call?
How do I use the Rust memory allocator for a C library that can be provided an allocator?

What about Box::into_raw()?
struct TypeMatches(*mut u8);
TypeMatches(Box::into_raw(Box::new(0u8)));
But it's difficult to tell from your code snippet if this is what you really need. You probably want a real allocator, and you could use libc::malloc with an as cast, as in this example.

There's a memalloc crate which provides a stable allocation API. It's implemented by allocating memory with Vec::with_capacity, then extracting the pointer:
let vec = Vec::with_capacity(cap);
let ptr = buf.as_mut_ptr();
mem::forget(vec);
To free the memory, use Vec::from_raw_parts.

Related

Is there a way in Rust to overload method for a specific type?

The following is only an example. If there's a native solution for this exact problem with reading bytes - cool, but my goal is to learn how to do it by myself, for any other purpose as well.
I'd like to do something like this: (pseudo-code below)
let mut reader = Reader::new(bytesArr);
let int32: i32 = reader.read(); // separate implementation to read 4 bits and convert into int32
let int64: i64 = reader.read(); // separate implementation to read 8 bits and convert into int64
I imagine it looking like this: (pseudo-code again)
impl Reader {
read<T>(&mut self) -> T {
// if T is i32 ... else if ...
}
}
or like this:
impl Reader {
read(&mut self) -> i32 {
// ...
}
read(&mut self) -> i64 {
// ...
}
}
But haven't found anything relatable yet.
(I actually have, for the first case (if T is i32 ...), but it looked really unreadable and inconvenient)
You could do this by having a Readable trait which you implement on i32 and i64, which does the operation. Then on Reader you could have a generic function which takes any type that is Readable and return it, for example:
struct Reader {
n: u8,
}
trait Readable {
fn read_from_reader(reader: &mut Reader) -> Self;
}
impl Readable for i32 {
fn read_from_reader(reader: &mut Reader) -> i32 {
reader.n += 1;
reader.n as i32
}
}
impl Readable for i64 {
fn read_from_reader(reader: &mut Reader) -> i64 {
reader.n += 1;
reader.n as i64
}
}
impl Reader {
fn read<T: Readable>(&mut self) -> T {
T::read_from_reader(self)
}
}
fn main() {
let mut r = Reader { n: 0 };
let int32: i32 = r.read();
let int64: i64 = r.read();
println!("{} {}", int32, int64);
}
You can try it on the playground
After some trials and searches, I found that implementing them in current Rust seems a bit difficult, but not impossible.
Here is the code, I'll explain it afterwards:
#![feature(generic_const_exprs)]
use std::{
mem::{self, MaybeUninit},
ptr,
};
static DATA: [u8; 8] = [
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
u8::MAX,
];
struct Reader;
impl Reader {
fn read<T: Copy + Sized>(&self) -> T
where
[(); mem::size_of::<T>()]: ,
{
let mut buf = [unsafe { MaybeUninit::uninit().assume_init() }; mem::size_of::<T>()];
unsafe {
ptr::copy_nonoverlapping(DATA.as_ptr(), buf.as_mut_ptr(), buf.len());
mem::transmute_copy(&buf)
}
}
}
fn main() {
let reader = Reader;
let v_u8: u8 = reader.read();
dbg!(v_u8);
let v_u16: u16 = reader.read();
dbg!(v_u16);
let v_u32: u32 = reader.read();
dbg!(v_u32);
let v_u64: u64 = reader.read();
dbg!(v_u64);
}
Suppose the global static variable DATA is the target data you want to read.
In current Rust, we cannot directly use the size of a generic parameter as the length of an array. This does not work:
fn example<T: Copy + Sized>() {
let mut _buf = [0_u8; mem::size_of::<T>()];
}
The compiler gives a weird error:
error: unconstrained generic constant
--> src\main.rs:34:31
|
34 | let mut _buf = [0_u8; mem::size_of::<T>()];
| ^^^^^^^^^^^^^^^^^^^
|
= help: try adding a `where` bound using this expression: `where [(); mem::size_of::<T>()]:`
There is an issue that is tracking it, if you want to go deeper into this error you can take a look.
We just follow the compiler's suggestion to add a where bound. This requires feature generic_const_exprs to be enabled.
Next, unsafe { MaybeUninit::uninit().assume_init() } is optional, which drops the overhead of initializing this array, since we will eventually overwrite it completely. You can replace it with 0_u8 if you don't like it.
Finally, copy the data you need and transmute this array to your generic type, return.
I think you will see the output you expect:
[src\main.rs:38] v_u8 = 255
[src\main.rs:41] v_u16 = 65535
[src\main.rs:44] v_u32 = 4294967295
[src\main.rs:47] v_u64 = 18446744073709551615

If an ffi function modifies a pointer, should the owning struct be referenced mutable?

I am currently experimenting with the FFI functionality of Rust and implemented a simble HTTP request using libcurl as an exercise. Consider the following self-contained example:
use std::ffi::c_void;
#[repr(C)]
struct CURL {
_private: [u8; 0],
}
// Global CURL codes
const CURL_GLOBAL_DEFAULT: i64 = 3;
const CURLOPT_WRITEDATA: i64 = 10001;
const CURLOPT_URL: i64 = 10002;
const CURLOPT_WRITEFUNCTION: i64 = 20011;
// Curl types
type CURLcode = i64;
type CURLoption = i64;
// Curl function bindings
#[link(name = "curl")]
extern "C" {
fn curl_easy_init() -> *mut CURL;
fn curl_easy_setopt(handle: *mut CURL, option: CURLoption, value: *mut c_void) -> CURLcode;
fn curl_easy_perform(handle: *mut CURL) -> CURLcode;
fn curl_global_init(flags: i64) -> CURLcode;
}
// Curl callback for data retrieving
extern "C" fn callback_writefunction(
data: *mut u8,
size: usize,
nmemb: usize,
user_data: *mut c_void,
) -> usize {
let slice = unsafe { std::slice::from_raw_parts(data, size * nmemb) };
let mut vec = unsafe { Box::from_raw(user_data as *mut Vec<u8>) };
vec.extend_from_slice(slice);
Box::into_raw(vec);
nmemb * size
}
type Result<T> = std::result::Result<T, CURLcode>;
// Our own curl handle
pub struct Curl {
handle: *mut CURL,
data_ptr: *mut Vec<u8>,
}
impl Curl {
pub fn new() -> std::result::Result<Curl, CURLcode> {
let ret = unsafe { curl_global_init(CURL_GLOBAL_DEFAULT) };
if ret != 0 {
return Err(ret);
}
let handle = unsafe { curl_easy_init() };
if handle.is_null() {
return Err(2); // CURLE_FAILED_INIT according to libcurl-errors(3)
}
// Set data callback
let ret = unsafe {
curl_easy_setopt(
handle,
CURLOPT_WRITEFUNCTION,
callback_writefunction as *mut c_void,
)
};
if ret != 0 {
return Err(2);
}
// Set data pointer
let data_buf = Box::new(Vec::new());
let data_ptr = Box::into_raw(data_buf);
let ret = unsafe {
curl_easy_setopt(handle, CURLOPT_WRITEDATA, data_ptr as *mut std::ffi::c_void)
};
match ret {
0 => Ok(Curl { handle, data_ptr }),
_ => Err(2),
}
}
pub fn set_url(&self, url: &str) -> Result<()> {
let url_cstr = std::ffi::CString::new(url.as_bytes()).unwrap();
let ret = unsafe {
curl_easy_setopt(
self.handle,
CURLOPT_URL,
url_cstr.as_ptr() as *mut std::ffi::c_void,
)
};
match ret {
0 => Ok(()),
x => Err(x),
}
}
pub fn perform(&self) -> Result<String> {
let ret = unsafe { curl_easy_perform(self.handle) };
if ret == 0 {
let b = unsafe { Box::from_raw(self.data_ptr) };
let data = (*b).clone();
Box::into_raw(b);
Ok(String::from_utf8(data).unwrap())
} else {
Err(ret)
}
}
}
fn main() -> Result<()> {
let my_curl = Curl::new().unwrap();
my_curl.set_url("https://www.example.com")?;
my_curl.perform().and_then(|data| Ok(println!("{}", data)))
// No cleanup code in this example for the sake of brevity.
}
While this works, I found it surprising that my_curl does not need to be declared mut, since none of the methods use &mut self, even though they pass a mut* pointer to the FFI function
s.
Should I change the declaration of perform to use &mut self instead of &self (for safety), since the internal buffer gets modified? Rust does not enforce this, but of course Rust does not know that the buffer gets modified by libcurl.
This small example runs fine, but I am unsure if I would be facing any kind of issues in larger programs, when the compiler might optimize for non-mutable access on the Curl struct, even though the instance of the struct is getting modified - or at least the data the pointer is pointing to.
Contrary to popular belief, there is absolutely no borrowchecker-induced restriction in Rust on passing *const/*mut pointers. There doesn't need to be, because dereferencing pointers is inherently unsafe, and can only be done in such blocks, with the programmer verifying all necessary invariants manually. In your case, you need to tell the compiler that is a mutable reference, as you already suspected.
The interested reader should definitely give the ffi section of the nomicon a read, to find out about some interesting ways to shoot yourself in the foot with it.

How to update in one thread and read from many?

I've failed to get this code past the borrow-checker:
use std::sync::Arc;
use std::thread::{sleep, spawn};
use std::time::Duration;
#[derive(Debug, Clone)]
struct State {
count: u64,
not_copyable: Vec<u8>,
}
fn bar(thread_num: u8, arc_state: Arc<State>) {
let state = arc_state.clone();
loop {
sleep(Duration::from_millis(1000));
println!("thread_num: {}, state.count: {}", thread_num, state.count);
}
}
fn main() -> std::io::Result<()> {
let mut state = State {
count: 0,
not_copyable: vec![],
};
let arc_state = Arc::new(state);
for i in 0..2 {
spawn(move || {
bar(i, arc_state.clone());
});
}
loop {
sleep(Duration::from_millis(300));
state.count += 1;
}
}
I'm probably trying the wrong thing.
I want one (main) thread which can update state and many threads which can read state.
How should I do this in Rust?
I have read the Rust book on shared state, but that uses mutexes which seem overly complex for a single writer / multiple reader situation.
In C I would achieve this with a generous sprinkling of _Atomic.
Atomics are indeed a proper way, there are plenty of those in std (link. Your example needs 2 fixes.
Arc must be cloned before moving into the closure, so your loop becomes:
for i in 0..2 {
let arc_state = arc_state.clone();
spawn(move || { bar(i, arc_state); });
}
Using AtomicU64 is fairly straight forward, though you need explicitly use newtype methods with specified Ordering (Playground):
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread::{sleep, spawn};
use std::time::Duration;
#[derive(Debug)]
struct State {
count: AtomicU64,
not_copyable: Vec<u8>,
}
fn bar(thread_num: u8, arc_state: Arc<State>) {
let state = arc_state.clone();
loop {
sleep(Duration::from_millis(1000));
println!(
"thread_num: {}, state.count: {}",
thread_num,
state.count.load(Ordering::Relaxed)
);
}
}
fn main() -> std::io::Result<()> {
let state = State {
count: AtomicU64::new(0),
not_copyable: vec![],
};
let arc_state = Arc::new(state);
for i in 0..2 {
let arc_state = arc_state.clone();
spawn(move || {
bar(i, arc_state);
});
}
loop {
sleep(Duration::from_millis(300));
// you can't use `state` here, because it moved
arc_state.count.fetch_add(1, Ordering::Relaxed);
}
}

Rust creating String with pointer offset

So let's say I have a String, "Foo Bar" and I want to create a substring of "Bar" without allocating new memory.
So I moved the raw pointer of the original string to the start of the substring (in this case offsetting it by 4) and use the String::from_raw_parts() function to create the String.
So far I have the following code, which as far as I understand should do this just fine. I just don't understand why this does not work.
use std::mem;
fn main() {
let s = String::from("Foo Bar");
let ptr = s.as_ptr();
mem::forget(s);
unsafe {
// no error when using ptr.add(0)
let txt = String::from_raw_parts(ptr.add(4) as *mut _, 3, 3);
println!("{:?}", txt); // This even prints "Bar" but crashes afterwards
println!("prints because 'txt' is still in scope");
}
println!("won't print because 'txt' was dropped",)
}
I get the following error on Windows:
error: process didn't exit successfully: `target\debug\main.exe` (exit code: 0xc0000374, STATUS_HEAP_CORRUPTION)
And these on Linux (cargo run; cargo run --release):
munmap_chunk(): invalid pointer
free(): invalid pointer
I think it has something to do with the destructor of String, because as long as txt is in scope the program runs just fine.
Another thing to notice is that when I use ptr.add(0) instead of ptr.add(4) it runs without an error.
Creating a slice didn't give me any problems on the other Hand. Dropping that worked just fine.
let t = slice::from_raw_parts(ptr.add(4), 3);
In the end I want to split an owned String in place into multiple owned Strings without allocating new memory.
Any help is appreciated.
The reason for the errors is the way that the allocator works. It is Undefined Behaviour to ask the allocator to free a pointer that it didn't give you in the first place. In this case, the allocator allocated 7 bytes for s and returned a pointer to the first one. However, when txt is dropped, it tells the allocator to deallocate a pointer to byte 4, which it has never seen before. This is why there is no issue when you add(0) instead of add(4).
Using unsafe correctly is hard, and you should avoid it where possible.
Part of the purpose of the &str type is to allow portions of an owned string to be shared, so I would strongly encourage you to use those if you can.
If the reason you can't just use &str on its own is because you aren't able to track the lifetimes back to the original String, then there are still some solutions, with different trade-offs:
Leak the memory, so it's effectively static:
let mut s = String::from("Foo Bar");
let s = Box::leak(s.into_boxed_str());
let txt: &'static str = &s[4..];
let s: &'static str = &s[..4];
Obviously, you can only do this a few times in your application, or else you are going to use up too much memory that you can't get back.
Use reference-counting to make sure that the original String stays around long enough for all of the slices to remain valid. Here is a sketch solution:
use std::{fmt, ops::Deref, rc::Rc};
struct RcStr {
rc: Rc<String>,
start: usize,
len: usize,
}
impl RcStr {
fn from_rc_string(rc: Rc<String>, start: usize, len: usize) -> Self {
RcStr { rc, start, len }
}
fn as_str(&self) -> &str {
&self.rc[self.start..self.start + self.len]
}
}
impl Deref for RcStr {
type Target = str;
fn deref(&self) -> &str {
self.as_str()
}
}
impl fmt::Display for RcStr {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fmt::Display::fmt(self.as_str(), f)
}
}
impl fmt::Debug for RcStr {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fmt::Debug::fmt(self.as_str(), f)
}
}
fn main() {
let s = Rc::new(String::from("Foo Bar"));
let txt = RcStr::from_rc_string(Rc::clone(&s), 4, 3);
let s = RcStr::from_rc_string(Rc::clone(&s), 0, 4);
println!("{:?}", txt); // "Bar"
println!("{:?}", s); // "Foo "
}

Rust MemWriter return pointer to Buffer

I'd like to have a function use a MemWriter to write some bytes and then return a pointer to the buffer. I'm struggling to understand how to use lifetimes in this case. How would I make the below code work and what should I read to fill my knowledge gap here?
struct Request<T: Encodable> {
id: i16,
e: T
}
impl <T: Encodable> Request<T> {
fn serialize<'s>(&'s self) -> io::IoResult<&'s Vec<u8>> {
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
let buf = writer.unwrap();
let size = buf.len();
let result: io::IoResult<&Vec<u8>> = Ok(&buf);
result
}
}
You can't return a reference to a buffer that is stored nowhere
You need to store your buffer internally, or you would try to return a reference to freed memory, which is dangerous and thus forbidden by the lifetime checker.
For example like this :
struct Request<T: Encodable> {
buf: Vec<u8>
}
impl <T: Encodable> Request<T> {
fn serialize<'s>(&'s mut self) -> io::IoResult<&'s Vec<u8>> { //'
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
self.buf = writer.unwrap();
let size = self.buf.len();
let result: io::IoResult<&Vec<u8>> = Ok(&self.buf);
result
}
}
Or, as Vladimir Matveev pointed out in the comments, you can simply return the Vec. Vec is already a container safely managing memory on the heap, returning it directly should be good for you in most situations, and this way you avoid any lifetime issues.
impl <T: Encodable> Request<T> {
fn serialize(&mut self) -> io::IoResult<Vec<u8>> {
let mut writer = io::MemWriter::new();
try!(writer.write_be_i16(0 as i16));
let buf = writer.unwrap();
let size = buf.len();
Ok(buf)
}
}

Resources