What is the difference between the allocators "Global" and "System"? - rust

The Rust standard library provides two allocator structs: std::alloc::{System, Global}.
What's the relation and the difference between them?

Please note that the following explanation is based on Rust 1.60 nightly. Some details might change when the allocator_api feature gets stabilized.
TL;DR: std::alloc::System connects the standard library types that need memory allocations to the memory allocation mechanism of the operating system (for example libc's malloc, which in turn uses brk()/mmap()). alloc::alloc::Global is the default implementation for the allocator_api feature, which can be used to alter the allocations of Vec, Box, etc.
Whereas std::alloc::System actually comes from the standard library, std::alloc::Global is in fact a re-export of alloc::alloc::Global.
When you register a custom global allocator, you need a type that implements the trait core::alloc::GlobalAlloc. When you assign this type to a global static variable with the #[global_allocator] attribute, Rust lets the following "magic symbols" point to the corresponding implementations of the trait functions.
extern "Rust" {
// These are the magic symbols to call the global allocator. rustc generates
// them to call `__rg_alloc` etc. if there is a `#[global_allocator]` attribute
// (the code expanding that attribute macro generates those functions), or to call
// the default implementations in libstd (`__rdl_alloc` etc. in `library/std/src/alloc.rs`)
// otherwise.
// The rustc fork of LLVM also special-cases these function names to be able to optimize them
// like `malloc`, `realloc`, and `free`, respectively.
#[rustc_allocator]
#[rustc_allocator_nounwind]
fn __rust_alloc(size: usize, align: usize) -> *mut u8;
#[rustc_allocator_nounwind]
fn __rust_dealloc(ptr: *mut u8, size: usize, align: usize);
#[rustc_allocator_nounwind]
fn __rust_realloc(ptr: *mut u8, old_size: usize, align: usize, new_size: usize) -> *mut u8;
#[rustc_allocator_nounwind]
fn __rust_alloc_zeroed(size: usize, align: usize) -> *mut u8;
}
There exist higher-level Rust wrappers for these magic functions:
alloc::alloc::{alloc, alloc_zeroed, dealloc, realloc}.
std::alloc::System is the default #[global_allocator] for the target you are compiling for (Unix, Windows, ...). For example, on Unix systems it uses libc::malloc, which in turn uses the mmap() and brk() syscalls to obtain heap memory from the operating system.
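To make the registration mechanism concrete, here is a minimal sketch of a custom global allocator that forwards every request to System and only counts the requested bytes. The names CountingAllocator, ALLOCATED_BYTES and allocated_bytes are hypothetical and chosen for illustration; only GlobalAlloc, Layout, System and #[global_allocator] come from the standard library.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: delegates to `System` and counts requested bytes.
struct CountingAllocator;

/// Total number of bytes requested so far (never decremented in this sketch).
static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        // SAFETY: the caller upholds the `GlobalAlloc::alloc` contract.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this `layout`.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// From here on, `__rust_alloc` and friends end up in `CountingAllocator`.
#[global_allocator]
static GLOBAL: CountingAllocator = CountingAllocator;

fn allocated_bytes() -> usize {
    ALLOCATED_BYTES.load(Ordering::Relaxed)
}

fn main() {
    let before = allocated_bytes();
    let v: Vec<u8> = Vec::with_capacity(1024);
    let after = allocated_bytes();
    println!("capacity {}, bytes requested in between: {}", v.capacity(), after - before);
}
```

Because Vec uses Global by default, and Global forwards to the registered global allocator, the Vec allocation above passes through CountingAllocator::alloc.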
alloc::alloc::Global, however, is part of the allocator_api feature, which can be seen as a finer-grained way to select the allocator that should be used. The Allocator API brings the trait alloc::alloc::Allocator, which is implemented by alloc::alloc::Global. By default, it forwards requests to alloc::alloc::{alloc, alloc_zeroed, dealloc, realloc}. The typical types provided by the alloc crate (Vec, BTreeMap, Box, ...) all allow choosing a custom alloc::alloc::Allocator. The default for this type parameter is alloc::alloc::Global.
To give an example, this is the type definition of alloc::vec::Vec (note the default "Global"):
pub struct Vec<T, #[unstable(feature = "allocator_api", issue = "32838")] A: Allocator = Global> {
buf: RawVec<T, A>,
len: usize,
}
You can use custom allocators (the allocator_api-feature) for example to force page-aligned allocations in a Box<T>.
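Since allocator_api is nightly-only, the following is just a rough stable-Rust sketch of the underlying idea: the alignment is requested through the Layout that is passed to the allocator. The type PageAlignedBuf and the constant PAGE_SIZE (assumed to be 4096) are hypothetical; a custom Allocator for Box<T> would perform a similar alignment adjustment inside its allocate method.

```rust
use std::alloc::{handle_alloc_error, GlobalAlloc, Layout, System};
use std::ptr::NonNull;

/// Assumed page size; real code should query the operating system.
const PAGE_SIZE: usize = 4096;

/// Hypothetical owning handle to a zeroed, page-aligned byte buffer.
struct PageAlignedBuf {
    ptr: NonNull<u8>,
    layout: Layout,
}

impl PageAlignedBuf {
    fn new(size: usize) -> Self {
        assert!(size > 0, "zero-sized allocations are not supported here");
        let layout = Layout::from_size_align(size, PAGE_SIZE).expect("invalid layout");
        // SAFETY: `layout` has a non-zero size.
        let raw = unsafe { System.alloc_zeroed(layout) };
        let ptr = match NonNull::new(raw) {
            Some(ptr) => ptr,
            None => handle_alloc_error(layout),
        };
        Self { ptr, layout }
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: the buffer holds `layout.size()` zero-initialized bytes.
        unsafe { std::slice::from_raw_parts(self.ptr.as_ptr(), self.layout.size()) }
    }
}

impl Drop for PageAlignedBuf {
    fn drop(&mut self) {
        // SAFETY: `ptr` was allocated by `System` with exactly this layout.
        unsafe { System.dealloc(self.ptr.as_ptr(), self.layout) }
    }
}

fn main() {
    let buf = PageAlignedBuf::new(100);
    let addr = buf.as_slice().as_ptr() as usize;
    println!("address {:#x}, page aligned: {}", addr, addr % PAGE_SIZE == 0);
}
```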

Related

Are fixed position buffers a good use case for Pin?

I'm writing some code that shares buffers between Rust and the Linux kernel. After the buffer is registered, its memory location should never move. Since the API takes a *mut u8 (actually a libc::iovec) but doesn't enforce the fixed memory location constraint, it works fine if I represent the buffer as RefCell<Vec<u8>>, Arc<RefCell<Vec<u8>>>, or Arc<Mutex<Vec<u8>>>. (But the Rust code actually never writes to the buffers.)
Does Pin offer additional safety against data moving in this use case? I think it may not, since one of the major risks is calling resize() and growing the vector, but I do intend to call resize() without growing the vector. The code works, but I'm interested in writing this the most correct way.
I don't really know about Pin but it is mostly useful for self-referential structs and coroutines AFAIK.
For your case, I would just create a struct that enforces the invariant about data movement. E.g.:
use std::ops::{Deref, DerefMut};
struct MyData{
data: Vec<u8>,
}
impl MyData{
fn with_capacity(cap: usize)->Self{
Self{data: Vec::with_capacity(cap)}
}
fn resize(&mut self, new_len: usize, val_to_set: u8){
let old_ptr = self.data.as_ptr();
// Enforce invariant that memory never moves.
assert!(new_len <= self.data.capacity());
self.data.resize(new_len, val_to_set);
debug_assert_eq!(old_ptr, self.data.as_ptr());
}
}
// To get access to bytes
impl Deref for MyData{
type Target = [u8];
fn deref(&self)->&[u8]{
&*self.data
}
}
impl DerefMut for MyData{
fn deref_mut(&mut self)->&mut[u8]{
&mut *self.data
}
}
This struct enforces that bytes never move without involving Pin.
And you can wrap it into RefCell.
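If the buffer never needs to change its length at all, a boxed slice might be an even simpler option, because Box<[u8]> has no spare capacity and no method that reallocates. The following is a minimal sketch under that assumption; FixedBuf is a hypothetical name.

```rust
/// Hypothetical fixed-size buffer: a boxed slice cannot grow or shrink,
/// so its heap allocation is never reallocated while the box is alive.
struct FixedBuf {
    data: Box<[u8]>,
}

impl FixedBuf {
    fn new(len: usize) -> Self {
        Self { data: vec![0u8; len].into_boxed_slice() }
    }

    /// Pointer that could be handed to the kernel, e.g. inside an iovec.
    fn as_mut_ptr(&mut self) -> *mut u8 {
        self.data.as_mut_ptr()
    }

    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}

fn main() {
    let mut buf = FixedBuf::new(4096);
    let registered = buf.as_mut_ptr();

    // Moving the handle moves only the pointer and the length, not the bytes.
    let mut moved = buf;
    assert_eq!(registered, moved.as_mut_ptr());
    println!("{} bytes stay at {:p}", moved.as_slice().len(), registered);
}
```

Dropping or replacing the FixedBuf still frees the memory, so the handle has to outlive the registration with the kernel.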

How to safely create an opaque struct and then free it over the FFI boundary?

I am using cbindgen to generate C bindings for a small Rust crate that implements the ULID specification. To avoid leaking information, I am generating an opaque struct ulid_ctx and returning a pointer to that context object when it is first created. I'm struggling a little bit with reconciling Rust's ownership semantics and C's laissez-faire approach to memory.
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
seed: u32,
}
#[no_mangle]
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
let ctx = ulid_ctx { seed };
Box::leak(Box::new(ctx))
}
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
Box::from_raw(ctx);
}
Two questions:
Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the heap and then inform Rust that the function no longer owns it?
Will Box::from_raw(ctx); re-create a Box and then immediately drop it, thereby freeing the memory?
Although it's not a lot of data (32 bits), I would like to avoid creating a memory leak if possible.
Does Box::leak(Box::new(ctx)) correctly allocate a ctx value on the
heap and then inform Rust that the function no longer owns it?
Yes. As the name says, it leaks the data, so the value will not be dropped when it goes out of scope.
As per @user4815162342's comment, consider using Box::into_raw instead, which returns the *mut ulid_ctx directly.
Will Box::from_raw(ctx); re-create a Box and then immediately drop it,
thereby freeing the memory?
That is also true: it builds a Box, which is then dropped. As a side note, it may be nicer to make the drop explicit.
#[no_mangle]
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
drop(unsafe { Box::from_raw(ctx) })
}
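Putting both suggestions together, the whole round trip could look roughly like the sketch below. The accessor ulid_ctx_seed and the null check are assumptions added for illustration and are not part of the original crate. The #[no_mangle] attributes are left out so that the sketch compiles in every edition.

```rust
#[allow(non_camel_case_types)]
pub struct ulid_ctx {
    seed: u32,
}

// Real exports additionally need `#[no_mangle]`
// (spelled `#[unsafe(no_mangle)]` in the 2024 edition).
pub extern "C" fn ulid_create(seed: u32) -> *mut ulid_ctx {
    // `into_raw` hands ownership of the heap allocation to the caller.
    Box::into_raw(Box::new(ulid_ctx { seed }))
}

/// Hypothetical accessor, only here to show that the pointer stays usable.
///
/// # Safety
/// `ctx` must come from `ulid_create` and must not have been destroyed.
pub unsafe extern "C" fn ulid_ctx_seed(ctx: *const ulid_ctx) -> u32 {
    unsafe { (*ctx).seed }
}

/// # Safety
/// `ctx` must be null or come from `ulid_create`, and must not be used again.
pub unsafe extern "C" fn ulid_ctx_destroy(ctx: *mut ulid_ctx) {
    if ctx.is_null() {
        return;
    }
    // Rebuild the Box so that dropping it frees the allocation.
    drop(unsafe { Box::from_raw(ctx) });
}

fn main() {
    let ctx = ulid_create(42);
    println!("seed = {}", unsafe { ulid_ctx_seed(ctx) });
    unsafe { ulid_ctx_destroy(ctx) };
}
```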

How do I force Rust to allocate a big object on the heap without using a Vec? [duplicate]

In this code...
struct Test { a: i32, b: i64 }
fn foo() -> Box<Test> { // Stack frame:
let v = Test { a: 123, b: 456 }; // 12 bytes
Box::new(v) // `usize` bytes (`*const T` in `Box`)
}
... as far as I understand (ignoring possible optimizations), v gets allocated on the stack and then copied to the heap, before being returned in a Box.
And this code...
fn foo() -> Box<Test> {
Box::new(Test { a: 123, b: 456 })
}
...shouldn't be any different, presumably, since there should be a temporary variable for struct allocation (assuming compiler doesn't have any special semantics for the instantiation expression inside Box::new()).
I've found Do values in return position always get allocated in the parents stack frame or receiving Box?. Regarding my specific question, it only proposes the experimental box syntax, but mostly talks about compiler optimizations (copy elision).
So my question remains: using stable Rust, how does one allocate structs directly on the heap, without relying on compiler optimizations?
As of Rust 1.39, there seems to be only one way in stable to allocate memory on the heap directly - by using std::alloc::alloc (note that the docs state that it is expected to be deprecated). It's reasonably unsafe.
Example:
#[derive(Debug)]
struct Test {
a: i64,
b: &'static str,
}
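fn main() {
    use std::alloc::{alloc, handle_alloc_error, Layout};
    unsafe {
        let layout = Layout::new::<Test>();
        let ptr = alloc(layout) as *mut Test;
        // `alloc` signals failure by returning a null pointer.
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Neither field has a destructor, so plain assignment does not
        // read or drop the uninitialized memory.
        (*ptr).a = 42;
        (*ptr).b = "testing";
        // The Box takes ownership and frees the memory when it is dropped.
        let bx = Box::from_raw(ptr);
        println!("{:?}", bx);
    }
}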
This approach is used in the unstable method Box::new_uninit.
It turns out there's even a crate for avoiding memcpy calls (among other things): copyless. This crate also uses an approach based on this.
You seem to be looking for the box_syntax feature, however as of Rust 1.39.0 it is not stable and only available with a nightly compiler. It also seems like this feature will not be stabilized any time soon, and might have a different design if it ever gets stabilized.
On a nightly compiler, you can write:
#![feature(box_syntax)]
struct Test { a: i32, b: i64 }
fn foo() -> Box<Test> {
box Test { a: 123, b: 456 }
}
Is there a way to allocate directly to the heap without box?
No. If there was, it wouldn't need a language change.
People tend to avoid this by using the unstable syntax indirectly, such as by using one of the standard containers which, in turn, uses it internally.
See also:
How to allocate arrays on the heap in Rust 1.0?
Is there any way to allocate a standard Rust array directly on the heap, skipping the stack entirely?
What does the box keyword do?
What is the <- symbol in Rust?
I recently had the same problem. Based on the answers here and other places, I wrote a simple function for heap allocation:
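pub fn unsafe_allocate<T>() -> Box<T> {
    use std::alloc::{alloc, handle_alloc_error, Layout};
    let layout = Layout::new::<T>();
    // Calling `alloc` with a zero-sized layout is undefined behavior.
    assert!(layout.size() > 0, "T must not be zero-sized");
    unsafe {
        let ptr = alloc(layout) as *mut T;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // The contents are uninitialized; overwrite them before reading.
        Box::from_raw(ptr)
    }
}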
This will create a region in memory automatically sized for T and unsafely convince Rust that this memory region is an actual T value. The memory may contain arbitrary data; you should not assume all values are 0, and every field has to be overwritten before it is read.
Example use:
let mut triangles: Box<[Triangle; 100000]> = unsafe_allocate::<[Triangle; 100000]>();
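A variant that never exposes uninitialized values could write every element in place before the Box is created. This is only a sketch under the assumption that the big object is an array; boxed_array_with is a hypothetical helper, not a standard library function.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};

/// Hypothetical helper: builds a `Box<[T; N]>` by writing each element
/// directly into freshly allocated heap memory.
fn boxed_array_with<T, F, const N: usize>(mut init: F) -> Box<[T; N]>
where
    F: FnMut(usize) -> T,
{
    let layout = Layout::new::<[T; N]>();
    // Calling `alloc` with a zero-sized layout is undefined behavior.
    assert!(layout.size() > 0, "zero-sized arrays do not need an allocation");
    unsafe {
        let ptr = alloc(layout) as *mut [T; N];
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        let first = ptr as *mut T;
        for i in 0..N {
            // Only one element at a time exists outside the heap block.
            // If `init` panics, the block is leaked, which is safe.
            first.add(i).write(init(i));
        }
        // All N elements are initialized, so the Box is a valid value.
        Box::from_raw(ptr)
    }
}

fn main() {
    let big: Box<[u64; 100_000]> = boxed_array_with(|i| i as u64 * 2);
    println!("len = {}, last = {}", big.len(), big[big.len() - 1]);
}
```

The array itself is never materialized on the stack here, independent of optimization level, because it only ever exists behind the raw pointer.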

How to add/subtract an offset to/from NonNull<Opaque>?

I provide two functions for managing memory:
unsafe extern "system" fn alloc<A: Alloc>(
size: usize,
alignment: usize,
) -> *mut c_void { ... }
unsafe extern "system" fn free<A: Alloc>(
memory: *mut c_void
) { ... }
Both functions internally use the allocator-api.
These signatures cannot be changed. The problem is that free does not ask for size and alignment, which are required for Alloc::dealloc. To get around this, alloc allocates some extra space for one Layout. free can then read this Layout to get the needed extra data.
Recently, the allocator-api changed and instead of *mut u8 it now uses NonNull<Opaque>. This is where my problem occurs.
core::alloc::Opaque:
An opaque, unsized type. Used for pointers to allocated memory. [...] Such pointers are similar to C’s void* type.
Opaque is not Sized, so using NonNull::as_ptr().add() and NonNull::as_ptr().sub() is forbidden.
Previously, I used something like this (for simplicity, I assume Alloc's functions to be static):
#![feature(allocator_api)]
#![no_std]
extern crate libc;
use core::alloc::{Alloc, Layout};
use libc::c_void;
unsafe extern "system" fn alloc<A: Alloc>(
size: usize,
alignment: usize,
) -> *mut c_void
{
let requested_layout =
Layout::from_size_align(size, alignment).unwrap();
let (layout, padding) = Layout::new::<Layout>()
.extend_packed(requested_layout)
.unwrap();
let ptr = A::alloc(layout).unwrap();
(ptr as *mut Layout).write(layout);
ptr.add(padding)
}
The last line is not possible anymore with NonNull<Opaque>. How can I get around this?
I'd probably write it like this, using NonNull::as_ptr to get a *mut Opaque and then cast that to different concrete types:
#![feature(allocator_api)]
#![no_std]
extern crate libc;
use core::alloc::{Alloc, Layout};
use libc::c_void;
unsafe fn alloc<A: Alloc>(allocator: &mut A, size: usize, alignment: usize) -> *mut c_void {
let requested_layout = Layout::from_size_align(size, alignment).expect("Invalid layout");
let (layout, _padding) = Layout::new::<Layout>()
.extend_packed(requested_layout)
.expect("Unable to create layout");
let ptr = allocator.alloc(layout).expect("Unable to allocate");
// Get a pointer to our layout storage
let raw = ptr.as_ptr() as *mut Layout;
// Save it
raw.write(layout);
// Skip over it
raw.offset(1) as *mut _
}
unsafe extern "system" fn alloc<A: Alloc>(
This makes no sense to me. The various FFI ABIs ("C", "system", etc.) have no way of specifying Rust generic types. It seems deeply incorrect for this function to be marked extern.
Layout::new::<Layout>().extend_packed(requested_layout)
This seems likely to be very broken. As the documentation for Layout::extend_packed states, emphasis mine:
the alignment of next is irrelevant, and is not incorporated at all into the resulting layout.
Your returned pointer doesn't seem to honor the alignment request.
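One way to honor the requested alignment is to use Layout::extend instead of extend_packed and to store the bookkeeping data directly in front of the returned pointer. The sketch below uses the stable std::alloc::{alloc, dealloc} functions rather than the old Alloc trait from the question; Header, alloc_with_header and free_with_header are hypothetical names.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::mem::size_of;
use std::ptr;

/// Bookkeeping stored directly in front of the pointer handed to the caller.
#[derive(Clone, Copy)]
struct Header {
    layout: Layout, // layout of the whole allocation (header + payload)
    offset: usize,  // distance from the allocation start to the payload
}

fn alloc_with_header(size: usize, alignment: usize) -> *mut u8 {
    let requested = match Layout::from_size_align(size, alignment) {
        Ok(layout) => layout,
        Err(_) => return ptr::null_mut(),
    };
    // Unlike `extend_packed`, `extend` pads the header so that the payload
    // offset is a multiple of `alignment`, and raises the overall alignment.
    let (layout, offset) = match Layout::new::<Header>().extend(requested) {
        Ok(pair) => pair,
        Err(_) => return ptr::null_mut(),
    };
    // SAFETY: `layout` contains a `Header`, so its size is non-zero.
    let base = unsafe { alloc(layout) };
    if base.is_null() {
        return ptr::null_mut();
    }
    // SAFETY: `offset >= size_of::<Header>()` and `offset <= layout.size()`.
    unsafe {
        let payload = base.add(offset);
        let header = payload.sub(size_of::<Header>()) as *mut Header;
        header.write_unaligned(Header { layout, offset });
        payload
    }
}

/// # Safety
/// `payload` must be null or a live pointer returned by `alloc_with_header`.
unsafe fn free_with_header(payload: *mut u8) {
    if payload.is_null() {
        return;
    }
    unsafe {
        let header = (payload.sub(size_of::<Header>()) as *const Header).read_unaligned();
        dealloc(payload.sub(header.offset), header.layout);
    }
}

fn main() {
    let p = alloc_with_header(100, 64);
    println!("64-byte aligned: {}", p as usize % 64 == 0);
    unsafe { free_with_header(p) };
}
```

Storing the offset next to the layout lets free find the start of the allocation without knowing the alignment that was requested.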
