Am getting "Illegal instruction" when running collect on iter::count - linux

Running this:
fn main() {
    std::iter::count(1i16, 3).collect::<Vec<i16>>();
}
I get:
thread '<main>' panicked at 'capacity overflow', /home/tshepang/projects/rust/src/libcore/option.rs:329
That's what I'd expect when running this:
fn main() {
    std::iter::count(1i8, 3).collect::<Vec<i8>>();
}
But instead, I get this:
Illegal instruction
In addition, syslog displays this line:
Dec 27 08:31:08 thome kernel: [170925.955841] traps: main[30631] trap invalid opcode ip:7f60ab175470 sp:7fffbb116578 error:0 in main[7f60ab15c000+5b000]

This was a fun adventure.
Iter::collect simply calls FromIterator::from_iter
Vec's implementation of FromIterator asks the iterator for its size and then allocates memory:
let (lower, _) = iterator.size_hint();
let mut vector = Vec::with_capacity(lower);
Vec::with_capacity computes the total size of memory and attempts to allocate it:
let size = capacity.checked_mul(mem::size_of::<T>())
.expect("capacity overflow");
let ptr = unsafe { allocate(size, mem::min_align_of::<T>()) };
if ptr.is_null() { ::alloc::oom() } // Important!
In this case, i8 takes 1 byte, and the lower bound of an infinite iterator is std::uint::MAX. Multiplied together, that's still std::uint::MAX. When we allocate that, we get a null pointer back.
alloc::oom is defined to simply abort, which is implemented by an Illegal Instruction!
An i16 behaves differently because it trips the checked_mul expectation: std::uint::MAX * 2 bytes overflows, so the multiplication fails before any allocation is attempted.
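The difference between the two element types comes down to that one checked multiplication. A minimal sketch in current Rust, where usize::MAX stands in for the old std::uint::MAX; the helper name requested_bytes is made up for illustration and is not part of the standard library:

```rust
use std::mem::size_of;

/// Hypothetical helper mirroring the size computation quoted above:
/// capacity times element size, with `None` on overflow.
fn requested_bytes<T>(capacity: usize) -> Option<usize> {
    capacity.checked_mul(size_of::<T>())
}

fn main() {
    // i8 is 1 byte wide, so the product still fits in a usize.
    println!("{:?}", requested_bytes::<i8>(usize::MAX));
    // i16 is 2 bytes wide, so the product overflows and yields None,
    // which is the case that `.expect("capacity overflow")` turns into a panic.
    println!("{:?}", requested_bytes::<i16>(usize::MAX));
}
```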
In modern Rust, the examples would be written as:
(1i16..).step_by(3).collect::<Vec<_>>();
(1i8..).step_by(3).collect::<Vec<_>>();
Both now fail in the same manner:
memory allocation of 12297829382473034412 bytes failed
memory allocation of 6148914691236517206 bytes failed
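If aborting the process is not acceptable, the fallible reservation API can report the failure as a value instead. A small sketch, assuming a toolchain where Vec::try_reserve is stable (Rust 1.57 or later); the function name try_alloc is illustrative only:

```rust
use std::collections::TryReserveError;

/// Illustrative helper: try to reserve room for `elements` i16 values
/// and return the error instead of aborting when that is impossible.
fn try_alloc(elements: usize) -> Result<Vec<i16>, TryReserveError> {
    let mut v: Vec<i16> = Vec::new();
    v.try_reserve(elements)?;
    Ok(v)
}

fn main() {
    match try_alloc(usize::MAX) {
        Ok(_) => println!("allocated"),
        Err(e) => println!("allocation refused: {}", e),
    }
    // A small request succeeds as usual.
    println!("small request ok: {}", try_alloc(16).is_ok());
}
```

This only helps when you control the reservation yourself; collect() on an unbounded iterator still goes through the infallible path.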

Related

Can I free memory from Vec::into_boxed_slice using Box::from_raw?

I saw the following code for returning a byte array to C:
#[repr(C)]
struct Buffer {
    data: *mut u8,
    len: usize,
}
extern "C" fn generate_data() -> Buffer {
    let mut buf = vec![0; 512].into_boxed_slice();
    let data = buf.as_mut_ptr();
    let len = buf.len();
    std::mem::forget(buf);
    Buffer { data, len }
}
extern "C" fn free_buf(buf: Buffer) {
    let s = unsafe { std::slice::from_raw_parts_mut(buf.data, buf.len) };
    let s = s.as_mut_ptr();
    unsafe {
        Box::from_raw(s);
    }
}
I notice that the free_buf function takes a Buffer, instead of a *mut u8. Is this intentional?
Can the free_buf function be reduced to:
unsafe extern "C" fn free_buf(ptr: *mut u8) {
    Box::from_raw(ptr);
}
You are correct to note that the C runtime free function takes only a pointer to the memory region to be freed as an argument.
However, you don't call this directly. In fact Rust has a layer that abstracts away the actual memory allocator being used: std::alloc::GlobalAlloc.
The reason for providing such an abstraction is to allow other allocators to be used, and in fact it is quite easy to swap out the default OS provided allocator.
It would be quite limiting to require that any allocator keeps track of the length of blocks to allow them to be freed without supplying the length to the deallocation function, so the general deallocation function requires the length as well.
You might be interested to know that C++ has a similar abstraction. This answer provides some more discussion about why it could be preferable to require the application to keep track of the lengths of allocated memory regions rather than the heap manager.
If we check the type of Box::from_raw, we see that it constructs a Box<u8> from a raw *mut u8. One would need a *mut [u8] (a fat pointer to a slice) in order to construct a Box<[u8]>, which is what we had at the very beginning.
Dropping a Box<u8> will, at best, release only one byte of memory (if it does not cause a runtime error), while dropping a Box<[u8]> correctly releases all of the memory.
No, what you are doing is undefined behavior: the types passed through into_raw() and from_raw() must match. Rust's allocator API doesn't require the allocator to remember any information about an allocation, so the allocator implementation relies on all the information passed to it being correct.
In your example, *mut u8 and *mut [u8] are entirely different types and so have different layouts.
Also, mismatching the type could prevent the destructor from running properly.
You can't use from_raw() to free an arbitrary pointer the way C's free() does with a void *.
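Putting the answers together, a deallocation function that respects the type has to rebuild the Box<[u8]> from both the pointer and the length. The following is a sketch under that assumption, not the original author's code; it also swaps mem::forget for Box::into_raw, and the null check is an extra precaution added here:

```rust
#[repr(C)]
pub struct Buffer {
    pub data: *mut u8,
    pub len: usize,
}

pub extern "C" fn generate_data() -> Buffer {
    let boxed: Box<[u8]> = vec![0u8; 512].into_boxed_slice();
    let len = boxed.len();
    // into_raw gives up ownership without running the destructor,
    // so no mem::forget is needed.
    let data = Box::into_raw(boxed) as *mut u8;
    Buffer { data, len }
}

/// Safety: `buf` must come from `generate_data` and must not be freed twice.
pub unsafe extern "C" fn free_buf(buf: Buffer) {
    if buf.data.is_null() {
        return;
    }
    // Rebuild the fat pointer (address + length), then let Box drop the whole slice.
    let slice: *mut [u8] = std::ptr::slice_from_raw_parts_mut(buf.data, buf.len);
    drop(unsafe { Box::from_raw(slice) });
}

fn main() {
    let buf = generate_data();
    println!("got {} bytes", buf.len);
    unsafe { free_buf(buf) };
}
```

This is why free_buf takes the whole Buffer: the length is part of what the allocator needs to see again.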

Why does creating and writing to a very large vector cause a core dump?

I'm creating a Sieve of Eratosthenes so I can see all the prime numbers up to the starting number. Just the following code causes a core dump on Rust 1.26. There are no compiler warnings or errors, and the core dump isn't very helpful either with no error message.
fn main() {
    let starting_number: i64 = 600851475143;
    let mut primes = vec![true; 600851475143];
    primes[0] = false;
    primes[1] = false;
    for i in 2..((starting_number as f64).ln() as usize) {
        if primes[i] {
            let mut j = i + i;
            while j < primes.len() {
                primes[j] = false;
                j += i;
            }
        }
    }
}
I thought Rust was all about safety and avoiding core dumps? Is this a legit error with my code which isn't caught by the compiler or something different?
The problem is that you run out of memory.
Many operating systems allocate memory lazily: the OS does not actually commit the memory you ask for until you use it. A bit-packed representation would need at least 75 106 434 393 bytes (about 70 GiB), but Rust doesn't optimize the size of Vec<bool>, so you are asking for 600 851 475 143 bytes (about 560 GiB). Your OS could not find that much memory.
The OS cannot report this failure gracefully, because it already told you "OK" when you asked for the memory. It treats it as a critical error and ends your process with a core dump.
I thought Rust was all about safety and avoiding core dumps?
A core dump doesn't necessarily imply that your program is unsafe. As you can see, your program didn't perform an out-of-bounds memory access; it simply didn't have enough memory. Killing the process is the best way to handle this error from your OS's point of view, and nothing here is unsafe according to Rust's definition of safety.
BTW, on my machine (archlinux), your program is simply killed:
[1] 4901 killed cargo run
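The memory figures in this answer follow from simple arithmetic on the element count. A small sketch that computes them; the function names are invented for illustration, and the bit-packed variant is only a size estimate, not a suggestion that std provides such a type:

```rust
/// Bytes needed to store `n` flags as one `bool` per element,
/// which is how `Vec<bool>` lays them out.
fn bytes_as_bools(n: u64) -> u64 {
    n * std::mem::size_of::<bool>() as u64
}

/// Bytes needed if the same flags were packed eight to a byte.
fn bytes_bit_packed(n: u64) -> u64 {
    (n + 7) / 8
}

fn main() {
    const GIB: f64 = 1024.0 * 1024.0 * 1024.0;
    let n: u64 = 600_851_475_143;
    let plain = bytes_as_bools(n);
    let packed = bytes_bit_packed(n);
    println!("Vec<bool>:  {} bytes ({:.1} GiB)", plain, plain as f64 / GIB);
    println!("bit-packed: {} bytes ({:.1} GiB)", packed, packed as f64 / GIB);
}
```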

Splitting a `Vec`

I'm trying to write a little buffer-thing for parsing so I can pull records off the front of it as I parse them out, ideally without making any copies, just transferring ownership of chunks from the front of the buffer as I go. Here's my implementation:
struct BufferThing {
    buf: Vec<u8>,
}
impl BufferThing {
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        let remaining: usize = self.buf.len() - size;
        let ptr: *mut u8 = self.buf.as_mut_ptr();
        unsafe {
            self.buf = Vec::from_raw_parts(ptr.offset(size as isize), remaining, remaining);
            Vec::from_raw_parts(ptr, size, size)
        }
    }
}
This compiles, but crashes with signal: 11, SIGSEGV: invalid memory reference as soon as it starts running. This is mostly the same code as the example in the Nomicon, but I'm trying to do it on Vecs and I'm trying to split a field instead of the object itself.
Is it possible to do this without copying out one of the Vecs? And is there some section of the Nomicon or other documentation that explains why I'm blowing everything up in the unsafe block?
Unfortunately, that's not how memory allocators work. It might have been possible in the past, when memory was at a premium, but today's allocators are geared for speed rather than memory preservation.
A common implementation of memory allocators is to use slabs. Basically, it's:
struct Allocator {
    less_than_32_bytes: List<[u8; 32]>,
    less_than_64_bytes: List<[u8; 64]>,
    less_than_128_bytes: List<[u8; 128]>,
    less_than_256_bytes: List<[u8; 256]>,
    less_than_512_bytes: List<[u8; 512]>,
    ...
}
When you request 96 bytes, it takes an element from less_than_128_bytes.
When you free that element, it frees all of it, not just the first N bytes, and the whole block is now re-usable. Any pointer inside the block is now dangling and should NOT be dereferenced.
Furthermore, trying to free a pointer in the middle of a block will only confuse the allocator: it won't find it, because the contract is that you address blocks by their first byte.
You violated the contract using unsafe code, BOOM.
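To make the size-class idea concrete, here is a toy lookup for the slab layout sketched above. It is purely illustrative: the function size_class, the power-of-two rounding, and the 32 to 512 byte range are assumptions taken from the pseudo-struct, not how any particular allocator is implemented:

```rust
/// Hypothetical size-class lookup: round the request up to the next
/// power of two, with 32 bytes as the smallest class and 512 as the largest.
/// Returns `None` when the request does not fit any slab.
fn size_class(request: usize) -> Option<usize> {
    let class = request.checked_next_power_of_two()?.max(32);
    if class <= 512 {
        Some(class)
    } else {
        None
    }
}

fn main() {
    // A 96-byte request is served from the 128-byte slab, so the block
    // handed out is larger than what was asked for.
    println!("{:?}", size_class(96));
    // Requests beyond the largest class would go to some other mechanism.
    println!("{:?}", size_class(4096));
}
```

Because the block is tracked as a whole, there is no entry describing "the second half of a block", which is why freeing an interior pointer cannot work.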
The solution I propose is simple:
use a single Vec<u8> containing the whole buffer to parse
use slices into this Vec for parsing
Rust will check the lifetimes, so your slices cannot outlive the buffer, and slicing a slice further (s[..offset], s[offset..]) does not allocate.
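A minimal sketch of the slice-based approach, assuming fixed-size records for simplicity; the helper take_record is a made-up name:

```rust
/// Split `size` bytes off the front of `input` without copying or allocating.
/// Returns `(record, rest)`, or `None` if fewer than `size` bytes remain.
fn take_record(input: &[u8], size: usize) -> Option<(&[u8], &[u8])> {
    if size <= input.len() {
        Some(input.split_at(size))
    } else {
        None
    }
}

fn main() {
    let buffer: Vec<u8> = vec![1, 2, 3, 4, 5];
    // `rest` always borrows from `buffer`, so it cannot outlive it.
    let mut rest: &[u8] = &buffer;
    while let Some((record, tail)) = take_record(rest, 2) {
        println!("record: {:?}", record);
        rest = tail;
    }
    println!("leftover: {:?}", rest);
}
```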
If you don't mind one allocation, there's Vec::split_off which allocates a new Vec big enough for the split part.
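For comparison, here is how the extract method from the question could look with Vec::split_off. This is a sketch of one safe alternative, at the cost of one allocation and a copy of the remaining bytes per call:

```rust
struct BufferThing {
    buf: Vec<u8>,
}

impl BufferThing {
    /// Hands the first `size` bytes to the caller as an owned Vec.
    /// `split_off` allocates a new Vec and moves the tail into it.
    fn extract(&mut self, size: usize) -> Vec<u8> {
        assert!(size <= self.buf.len());
        let tail = self.buf.split_off(size);
        // `self.buf` now holds only the front; swap the tail in and return the front.
        std::mem::replace(&mut self.buf, tail)
    }
}

fn main() {
    let mut thing = BufferThing { buf: vec![1, 2, 3, 4, 5] };
    let front = thing.extract(2);
    println!("front: {:?}, remaining: {:?}", front, thing.buf);
}
```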

Recursive function calculating factorials leads to stack overflow

I tried a recursive factorial algorithm in Rust. I use this version of the compiler:
rustc 1.12.0 (3191fbae9 2016-09-23)
cargo 0.13.0-nightly (109cb7c 2016-08-19)
Code:
extern crate num_bigint;
extern crate num_traits;
use num_bigint::{BigUint, ToBigUint};
use num_traits::One;
fn factorial(num: u64) -> BigUint {
    let current: BigUint = num.to_biguint().unwrap();
    if num <= 1 {
        return One::one();
    }
    return current * factorial(num - 1);
}
fn main() {
    let num: u64 = 100000;
    println!("Factorial {}! = {}", num, factorial(num))
}
I got this error:
$ cargo run
thread 'main' has overflowed its stack
fatal runtime error: stack overflow
error: Process didn't exit successfully
How to fix that? And why do I see this error when using Rust?
Rust doesn't have tail call elimination, so your recursion is limited by your stack size. It may be a feature for Rust in the future (you can read more about it at the Rust FAQ), but in the meantime you will have to either not recurse so deep or use loops.
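A loop-based version uses constant stack space regardless of the input. The sketch below uses u128 with overflow checking instead of BigUint, purely so that it compiles without external crates; the same fold should carry over to BigUint by swapping the accumulator type and using plain multiplication:

```rust
/// Iterative factorial. Returns `None` once the result no longer fits in a u128.
fn factorial(n: u32) -> Option<u128> {
    (1..=n as u128).try_fold(1u128, |acc, k| acc.checked_mul(k))
}

fn main() {
    println!("{:?}", factorial(20));
    // Far too large for u128, so the checked multiplication reports overflow.
    println!("{:?}", factorial(100_000));
}
```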
Why?
This is a stack overflow which occurs whenever there is no stack memory left. For example, stack memory is used by
local variables
function arguments
return values
Recursion uses a lot of stack memory, because for every recursive call, the memory for all local variables, function arguments, ... has to be allocated on the stack.
How to fix that?
The obvious solution is to write your algorithm in a non-recursive manner (you should do this when you want to use the algorithm in production!). But you can also just increase the stack size. While the stack size of the main thread can't be modified, you can create a new thread and set a specific stack size:
use std::thread;
fn main() {
    let num: u64 = 100_000;
    // Size of one stack frame for `factorial()` was measured experimentally
    thread::Builder::new().stack_size(num as usize * 0xFF).spawn(move || {
        println!("Factorial {}! = {}", num, factorial(num));
    }).unwrap().join().unwrap();
}
This code works and, when executed via cargo run --release (with optimization!), outputs the solution after only a couple of seconds calculation.
Measuring stack frame size
In case you want to know how the stack frame size (memory requirement for one call) for factorial() was measured: I printed the address of the function argument num on each factorial() call:
fn factorial(num: u64) -> BigUint {
    println!("{:p}", &num);
    // ...
}
The difference between the addresses printed by two successive calls is (more or less) the stack frame size. On my machine, the difference was slightly less than 0xFF (255), so I just used that as the size.
In case you're wondering why the stack frame size isn't smaller: the Rust compiler doesn't really optimize for this metric. Usually it's really not important, so optimizers tend to sacrifice this memory requirement for better execution speed. I took a look at the assembly and in this case many BigUint methods were inlined. This means that the local variables of other functions are using stack space as well!
Just as an alternative.. (I do not recommend)
Matt's answer is true to an extent. There is a crate called stacker that can artificially increase the stack size for use in recursive algorithms. It does this by allocating some heap memory to overflow into.
As a word of warning: this takes a very long time to run, but it runs, and it doesn't blow the stack. Compiling with optimizations brings the time down, but it's still pretty slow. You're likely to get better performance from a loop, as Matt suggests. I thought I would throw this out there anyway.
extern crate num_bigint;
extern crate num_traits;
extern crate stacker;
use num_bigint::{BigUint, ToBigUint};
use num_traits::One;
fn factorial(num: u64) -> BigUint {
    // println!("Called with: {}", num);
    let current: BigUint = num.to_biguint().unwrap();
    if num <= 1 {
        // println!("Returning...");
        return One::one();
    }
    stacker::maybe_grow(1024 * 1024, 1024 * 1024, || {
        current * factorial(num - 1)
    })
}
fn main() {
    let num: u64 = 100000;
    println!("Factorial {}! = {}", num, factorial(num));
}
I have commented out the debug printlns.. you can uncomment them if you like.

What happens when a stack-allocated value is boxed?

If we have a value that is already allocated on stack, will boxing copy it to heap and then transfer ownership (that's how it works in .NET, with the exception that both copies will stay alive)? Or will the compiler be "smart" enough to allocate it directly on heap from the beginning?
struct Foo {
    x: i32,
}
fn main() {
    // a is allocated on stack?
    let a = Foo { x: 1 };
    // if a is not used, it will be optimized out
    println!("{}", a.x);
    // what happens here? will the stack allocated structure
    // be moved to heap? or was it originally allocated on heap?
    let b = Box::new(a);
}
I'm not a specialist in assembler, but this looks like it is actually allocated on stack and then moved: http://pastebin.com/8PzsgTJ1. But I need a confirmation from someone who actually knows what is happening.
It would be pretty strange for this optimization to happen as you describe it. For example, in this code:
let a = Foo { x: 1 };
// operation that observes a
let b = Box::new(a);
// operation that observes b
&a and &b would be equal, which would be surprising. However, if you do something similar, but don't observe a:
#[inline(never)]
fn frobnotz() -> Box<Foo> {
    let a = Foo { x: 1 };
    Box::new(a)
}
You can see via the LLVM IR that this case was optimized:
define internal fastcc noalias dereferenceable(4) %Foo* @_ZN8frobnotz20h3dca7bc0ee8400bciaaE() unnamed_addr #0 {
entry-block:
%0 = tail call i8* @je_mallocx(i64 4, i32 0)
%1 = icmp eq i8* %0, null
br i1 %1, label %then-block-106-.i.i, label %"_ZN5boxed12Box$LT$T$GT$3new20h2665038481379993400E.exit"
then-block-106-.i.i: ; preds = %entry-block
tail call void @_ZN3oom20he7076b57c17ed7c6HYaE()
unreachable
"_ZN5boxed12Box$LT$T$GT$3new20h2665038481379993400E.exit": ; preds = %entry-block
%2 = bitcast i8* %0 to %Foo*
%x.sroa.0.0..sroa_idx.i = bitcast i8* %0 to i32*
store i32 1, i32* %x.sroa.0.0..sroa_idx.i, align 4
ret %Foo* %2
}
Similarly, you can return the struct on the stack and then box it up, and there will still just be the one allocation:
You may think that this gives us terrible performance: return a value and then immediately box it up ?! Isn't this pattern the worst of both worlds? Rust is smarter than that. There is no copy in this code. main allocates enough room for the box, passes a pointer to that memory into foo as x, and then foo writes the value straight into the Box.
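The return-then-box pattern described here can be sketched as follows. The names make_foo and boxed_foo are hypothetical, and whether the intermediate stack copy is actually elided depends on the compiler version and optimization level, so treat this as the shape of the code rather than a guarantee about the generated machine code:

```rust
struct Foo {
    x: i32,
}

/// Returns the struct by value, with no Box in sight.
#[inline(never)]
fn make_foo() -> Foo {
    Foo { x: 1 }
}

/// Boxes the returned value immediately. There is exactly one heap
/// allocation here, made by Box::new.
fn boxed_foo() -> Box<Foo> {
    Box::new(make_foo())
}

fn main() {
    let b = boxed_foo();
    println!("{}", b.x);
}
```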
As explained in the official Rust documentation here, Box<T>::new(x: T) allocates memory on the heap and then moves the argument into that memory. Accessing a after let b = Box::new(a) is a compile-time error.
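The move can be seen directly in a few lines. In this sketch the helper box_and_unbox is an invented name, and the commented-out line is the access the compiler rejects:

```rust
struct Foo {
    x: i32,
}

/// Moves `a` into a Box and then back out again.
fn box_and_unbox(a: Foo) -> Foo {
    let b = Box::new(a); // `a` is moved into the heap allocation here
    // println!("{}", a.x); // does not compile: `a` was moved on the line above
    *b // moving out of the Box hands the value back and frees the allocation
}

fn main() {
    let back = box_and_unbox(Foo { x: 1 });
    println!("{}", back.x);
}
```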

Resources