Treat Mmap as volatile in Rust - rust

I am trying to monitor a binary file containing single 32 bit integer. I mapped the file into memory via MmapMut and read in a forever loop:
fn read(mmap: &MmapMut) {
let mut i = u32::from_ne_bytes(mmap[0..4].try_into().unwrap());
loop {
let j = u32::from_ne_bytes(mmap[0..4].try_into().unwrap());
if j != i {
assert!(false); // this assert should fail, but it never does
i = j;
}
}
}
However, compiler seems to optimise the loop away assuming neither i or j can ever change.
Is there a way to prevent this optimisation?

You can use MmapRaw with read_volatile().
fn read(mmap: &MmapRaw) {
let mut i = unsafe { mmap.as_ptr().cast::<u32>().read_volatile() };
loop {
let j = unsafe { mmap.as_ptr().cast::<u32>().read_volatile() };
if j != i {
assert!(false);
i = j;
}
}
}

Related

Rust thread 'main' has overflowed its stack

I'm trying to re-implement this C code:
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#define ROWS 100000
#define COLS 10000
int twodarray[ROWS][COLS];
int main() {
for (int i = 0; i < ROWS; i++) {
for (int j = 0; j < COLS; j++) {
twodarray[i][j] = rand();
}
}
int64 sum = 0;
for (int i = 0; i < ROWS; i++) {
for (int j = 0; j < COLS; j++) {
sum += twodarray[i][j];
}
}
}
So, after a lot of trial and error I've come up with Rust code that at least compiles
extern crate rand;
const ROWS: usize = 100000;
const COLS: usize = 10000;
fn main() {
let mut twodarray: [[u128; COLS]; ROWS] = [[0; COLS]; ROWS];
for i in 0..ROWS {
for j in 0..COLS {
twodarray[i][j] = rand::random::<u8>() as u128;
}
}
let mut sum: u128 = 0;
for k in 0..ROWS {
for j in 0..COLS {
sum += twodarray[k][j] as u128;
}
}
println!("{}", sum);
}
But when I compile and execute it, I get the following error message: thread 'main' has overflowed its stack. To be honest, I have absolutelly no idea why this is happening. I'm currently learning rust, but there is not a lot of information on 2d arrays online... I DO NOT WANT TO USE VECTOR. This exercise is specifically aimed on arrays.
EDIT:
After reading the accepted answear, I've come up with Rust code that outputs expected results:
extern crate rand;
const ROWS: usize = 100000;
const COLS: usize = 10000;
fn main() {
// let mut twodarray: Box<[[u8; COLS]; ROWS]> = Box::new([[0; COLS]; ROWS]);
// for i in 0..ROWS {
// for j in 0..COLS {
// twodarray[i][j] = rand::random::<u8>();
// }
// }
// let mut sum: u32 = 0;
// for k in 0..ROWS {
// for l in 0..COLS {
// sum += twodarray[k][l] as u32;
// }
// }
let mut twodarray: Box<[[u8; ROWS]; COLS]> = Box::new([[0; ROWS]; COLS]);
for i in 0..COLS {
for j in 0..ROWS {
twodarray[i][j] = rand::random::<u8>();
}
}
let mut sum: u32 = 0;
for k in 0..COLS {
for l in 0..ROWS {
sum += twodarray[k][l] as u32;
}
}
println!("{}", sum);
}
This two-dimensional array is huge. Rust tries to allocate it on the stack (that's the way arrays in Rust work).
You explicitly wrote you didn't want to use vectors, but in reality vectors are allocated on the heap, so you wouldn't have such a problem with them. It's good to keep that in mind.
If you insist on using arrays, maybe put them in a Box, so that they go to the heap?
That's not a Rust-specific problem. Stacks are generally a lot more limited, compared to the heap, usually up to a few megabytes. In some other languages, such as Java, arrays are allocated on the heap, but that doesn't mean there is no similar stack size limit there as well.

Pass mutable pointer to function without initializing it first

use std::ptr::{addr_of_mut, null_mut};
use libc::{CLOCK_MONOTONIC, timer_create, timer_delete, timer_t};
fn main() {
let mut timer1: timer_t = null_mut();
unsafe {
let r = timer_create(CLOCK_MONOTONIC, null_mut(), addr_of_mut!(timer1));
if r == 0 {
timer_delete(timer1);
}
}
}
When calling timer_create(), the resulting timer ID is stored at variable timer1. I pass it as a mutable pointer, so that's the output variable.
How can I avoid initializing timer1 to null_mut() as in the code above knowing that it's guaranteed by the API to be safe ?
You can use MaybeUninit:
use std::mem::MaybeUninit;
use std::ptr::null_mut;
use libc::{CLOCK_MONOTONIC, timer_create, timer_delete, timer_t};
fn main() {
let mut timer1 = MaybeUninit::<timer_t>::uninit();
let timer1 = unsafe {
let r = timer_create(CLOCK_MONOTONIC, null_mut(), timer1.as_mut_ptr());
if r != 0 {
panic!("…");
}
timer1.assume_init()
};
unsafe {
timer_delete(timer1);
}
}

Is there a way to update a string in place in rust?

You can also consider this as, is it possible to URLify a string in place in rust?
For example,
Problem statement: Replace whitespace with %20
Assumption: String will have enough capacity left to accommodate new characters.
Input: Hello how are you
Output: Hello%20how%20are%20you
I know there are ways to do this if we don't have to do this "in place". I am solving a problem that explicitly states that you have to update in place.
If there isn't any safe way to do this, is there any particular reason behind that?
[Edit]
I was able to solve this using unsafe approach, but would appreciate a better approach than this. More idiomatic approach if there is.
fn space_20(sentence: &mut String) {
if !sentence.is_ascii() {
panic!("Invalid string");
}
let chars: Vec<usize> = sentence.char_indices().filter(|(_, ch)| ch.is_whitespace()).map(|(idx, _)| idx ).collect();
let char_count = chars.len();
if char_count == 0 {
return;
}
let sentence_len = sentence.len();
sentence.push_str(&"*".repeat(char_count*2)); // filling string with * so that bytes array becomes of required size.
unsafe {
let bytes = sentence.as_bytes_mut();
let mut final_idx = sentence_len + (char_count * 2) - 1;
let mut i = sentence_len - 1;
let mut char_ptr = char_count - 1;
loop {
if i != chars[char_ptr] {
bytes[final_idx] = bytes[i];
if final_idx == 0 {
// all elements are filled.
println!("all elements are filled.");
break;
}
final_idx -= 1;
} else {
bytes[final_idx] = '0' as u8;
bytes[final_idx - 1] = '2' as u8;
bytes[final_idx - 2] = '%' as u8;
// final_idx is of type usize cannot be less than 0.
if final_idx < 3 {
println!("all elements are filled at start.");
break;
}
final_idx -= 3;
// char_ptr is of type usize cannot be less than 0.
if char_ptr > 0 {
char_ptr -= 1;
}
}
if i == 0 {
// all elements are parsed.
println!("all elements are parsed.");
break;
}
i -= 1;
}
}
}
fn main() {
let mut sentence = String::with_capacity(1000);
sentence.push_str(" hello, how are you?");
// sentence.push_str("hello, how are you?");
// sentence.push_str(" hello, how are you? ");
// sentence.push_str(" ");
// sentence.push_str("abcd");
space_20(&mut sentence);
println!("{}", sentence);
}
An O(n) solution that neither uses unsafe nor allocates (provided that the string has enough capacity), using std::mem::take:
fn urlify_spaces(text: &mut String) {
const SPACE_REPLACEMENT: &[u8] = b"%20";
// operating on bytes for simplicity
let mut buffer = std::mem::take(text).into_bytes();
let old_len = buffer.len();
let space_count = buffer.iter().filter(|&&byte| byte == b' ').count();
let new_len = buffer.len() + (SPACE_REPLACEMENT.len() - 1) * space_count;
buffer.resize(new_len, b'\0');
let mut write_pos = new_len;
for read_pos in (0..old_len).rev() {
let byte = buffer[read_pos];
if byte == b' ' {
write_pos -= SPACE_REPLACEMENT.len();
buffer[write_pos..write_pos + SPACE_REPLACEMENT.len()]
.copy_from_slice(SPACE_REPLACEMENT);
} else {
write_pos -= 1;
buffer[write_pos] = byte;
}
}
*text = String::from_utf8(buffer).expect("invalid UTF-8 during URL-ification");
}
(playground)
Basically, it calculates the final length of the string, sets up a reading pointer and a writing pointer, and translates the string from right to left. Since "%20" has more characters than " ", the writing pointer never catches up with the reading pointer.
Is it possible to do this without unsafe?
Yes like this:
fn main() {
let mut my_string = String::from("Hello how are you");
let mut insert_positions = Vec::new();
let mut char_counter = 0;
for c in my_string.chars() {
if c == ' ' {
insert_positions.push(char_counter);
char_counter += 2; // Because we will insert two extra chars here later.
}
char_counter += 1;
}
for p in insert_positions.iter() {
my_string.remove(*p);
my_string.insert(*p, '0');
my_string.insert(*p, '2');
my_string.insert(*p, '%');
}
println!("{}", my_string);
}
Here is the Playground.
But should you do it?
As discussed for example here on Reddit this is almost always not the recommended way of doing this, because both remove and insert are O(n) operations as noted in the documentation.
Edit
A slightly better version:
fn main() {
let mut my_string = String::from("Hello how are you");
let mut insert_positions = Vec::new();
let mut char_counter = 0;
for c in my_string.chars() {
if c == ' ' {
insert_positions.push(char_counter);
char_counter += 2; // Because we will insert two extra chars here later.
}
char_counter += 1;
}
for p in insert_positions.iter() {
my_string.remove(*p);
my_string.insert_str(*p, "%20");
}
println!("{}", my_string);
}
and the corresponding Playground.

Rust handling variable outside the scoped_threshhold

from rust beginner.
I have some problems about ownership I think. What i wanna do is changing "ret" which is boolean
type variable inside the pool block. But when i ran the code and checked the ret, it changed well inside the pool block but outside the block, ret alway behave as true,,,
plz fix my headache...
let mut pool = Pool::new(max_worker);
let mut ret = true;
pool.scoped(|scoped| {
for i in 0..somevalue{
scoped.execute( move || {
let ret_ref = &mut ret;
// Do Something
if success {
*ret_ref = false
}
});
}
});
if ret == true { /* Do Something */ }
scoped_threadpool::Pool::scoped(&mut self, <closure>) returns a closure that impls FnOnce which means you can only call it once. You had it inside a for loop which is why the compiler kept giving you errors with confusing suggestions. Once you refactor the code to move the for outside the call to scoped then it compiles and works as expected:
use scoped_threadpool::Pool;
fn main() {
let max_workers = 1;
let somevalue = 1;
let mut pool = Pool::new(max_workers);
let mut ret = true;
let ret_ref = &mut ret;
for i in 0..somevalue {
pool.scoped(|scoped| {
scoped.execute(|| {
// do something
let success = true;
if success {
*ret_ref = false
}
});
});
}
if ret == true {
println!("ret stayed true");
} else {
// prints this
println!("ret was changed to false in the scoped thread");
}
}
playground

How to give each CPU core mutable access to a portion of a Vec? [duplicate]

This question already has an answer here:
How do I pass disjoint slices from a vector to different threads?
(1 answer)
Closed 4 years ago.
I've got an embarrassingly parallel bit of graphics rendering code that I would like to run across my CPU cores. I've coded up a test case (the function computed is nonsense) to explore how I might parallelize it. I'd like to code this using std Rust in order to learn about using std::thread. But, I don't understand how to give each thread a portion of the framebuffer. I'll put the full testcase code below, but I'll try to break it down first.
The sequential form is super simple:
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
I thought that it would help to make a buffer that was the same size, but re-arranged to be 3D & indexed by core first. This is the same computation, just a reordering of the data to show the workings.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
But, when I try to put the inner part of the code in a closure & create a thread, I get errors about the buffer & lifetimes. I basically don't understand what to do & could use some guidance. I want per_core_buffer to just temporarily refer to the data in buffer2 that belongs to that core & allow it to be written, synchronize all the threads & then read buffer2 afterwards. Is this possible?
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
The error is this & I don't understand:
error[E0597]: `buffer2` does not live long enough
--> src/main.rs:50:36
|
50 | let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
| ^^^^^^^ borrowed value does not live long enough
...
88 | }
| - borrowed value only lives until here
|
= note: borrowed value must be valid for the static lifetime...
The full testcase is:
extern crate num_cpus;
use std::time::Instant;
use std::thread;
fn compute(x: i32, y: i32) -> i32 {
(x*y) % (x+y+10000)
}
fn main() {
let num_logical_cores = num_cpus::get();
const WIDTH: usize = 40000;
const HEIGHT: usize = 10000;
let y_per_core = HEIGHT/num_logical_cores + 1;
// ------------------------------------------------------------
// Serial Calculation...
let mut buffer0 = vec![vec![0i32; WIDTH]; HEIGHT];
let start0 = Instant::now();
for j in 0..HEIGHT {
for i in 0..WIDTH {
buffer0[j][i] = compute(i as i32,j as i32);
}
}
let dur0 = start0.elapsed();
// ------------------------------------------------------------
// On the way to Parallel Calculation...
// Reorder the data buffer to be 3D with one 2D region per core.
let mut buffer1 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let start1 = Instant::now();
for c in 0..num_logical_cores {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
buffer1[c][y][i] = compute(i as i32,j as i32)
}
}
}
let dur1 = start1.elapsed();
// ------------------------------------------------------------
// Actual Parallel Calculation...
let mut buffer2 = vec![vec![vec![0i32; WIDTH]; y_per_core]; num_logical_cores];
let mut handles = Vec::new();
let start2 = Instant::now();
for c in 0..num_logical_cores {
let per_core_buffer = &mut buffer2[c]; // <<< lifetime error
let handle = thread::spawn(move || {
for y in 0..y_per_core {
let j = y*num_logical_cores + c;
if j >= HEIGHT {
break;
}
for i in 0..WIDTH {
per_core_buffer[y][i] = compute(i as i32,j as i32)
}
}
});
handles.push(handle)
}
for handle in handles {
handle.join().unwrap();
}
let dur2 = start2.elapsed();
println!("Runtime: Serial={0:.3}ms, AlmostParallel={1:.3}ms, Parallel={2:.3}ms",
1000.*dur0.as_secs() as f64 + 1e-6*(dur0.subsec_nanos() as f64),
1000.*dur1.as_secs() as f64 + 1e-6*(dur1.subsec_nanos() as f64),
1000.*dur2.as_secs() as f64 + 1e-6*(dur2.subsec_nanos() as f64));
// Sanity check
for j in 0..HEIGHT {
let c = j % num_logical_cores;
let y = j / num_logical_cores;
for i in 0..WIDTH {
if buffer0[j][i] != buffer1[c][y][i] {
println!("wtf1? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer1[c][y][i])
}
if buffer0[j][i] != buffer2[c][y][i] {
println!("wtf2? {0} {1} {2} {3}",i,j,buffer0[j][i],buffer2[c][y][i])
}
}
}
}
Thanks to #Shepmaster for the pointers and clarification that this is not an easy problem for Rust, and that I needed to consider crates to find a reasonable solution. I'm only just starting out in Rust, so this really wasn't clear to me.
I liked the ability to control the number of threads that scoped_threadpool gives, so I went with that. Translating my code from above directly, I tried to use the 4D buffer with core as the most-significant-index and that ran into troubles because that 3D vector does not implement the Copy trait. The fact that it implements Copy makes me concerned about performance, but I went back to the original problem and implemented it more directly & found a reasonable speedup by making each row a thread. Copying each row will not be a large memory overhead.
The code that works for me is:
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
let mut pool = Pool::new(num_logical_cores as u32);
pool.scoped(|scope| {
let mut y = 0;
for e in &mut buffer2 {
scope.execute(move || {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
});
y += 1;
}
});
On a 6 core, 12 thread i7-8700K for 400000x4000 testcase this runs in 3.2 seconds serially & 481ms in parallel--a reasonable speedup.
EDIT: I continued to think about this issue and got a suggestion from Rustlang on twitter that I should consider rayon. I converted my code to rayon and got similar speedup with the following code.
let mut buffer2 = vec![vec![0i32; WIDTH]; HEIGHT];
buffer2
.par_iter_mut()
.enumerate()
.map(|(y,e): (usize, &mut Vec<i32>)| {
for x in 0..WIDTH {
(*e)[x] = compute(x as i32,y as i32);
}
})
.collect::<Vec<_>>();

Resources