Getting HC-SR04 ultrasonic sensor data from STM32F411 with Rust HAL yields a constant value independent of sensor condition

So I want to get the distance in cm from my sensor; I already did this with Arduino C on an Arduino-compatible board. Now I want to do it on an STM32. Below is my code (leaving out the conversion of pulse length to distance, as the delta time is already constant at this point).
#![deny(unsafe_code)]
#![allow(clippy::empty_loop)]
#![no_main]
#![no_std]

use panic_halt as _; // panic handler
use cortex_m_rt::entry;
use stm32f4xx_hal as hal;
use crate::hal::{pac, prelude::*};
use rtt_target::{rtt_init_print, rprintln};
use stm32f4xx_hal::timer::{Timer, CounterUs};
use stm32f4xx_hal::pac::TIM2;
use core::fmt::Debug;

fn dbg<T: Debug>(d: T, tag: &str) -> T {
    rprintln!("{} {:?}", tag, d);
    d
}

// Busy-wait for `us` microseconds on the TIM2 microsecond counter.
fn waste(c_us: &CounterUs<TIM2>, us: u32) {
    let ts1 = c_us.now().ticks();
    while (c_us.now().ticks() - ts1) < us {}
}

// Busy-wait until `predicate` holds or `us` microseconds elapse;
// returns the elapsed ticks either way.
fn waste_until<T>(c_us: &CounterUs<TIM2>,
                  predicate: fn(&T) -> bool,
                  dt: &T,
                  us: u32) -> u32 {
    let ts1 = c_us.now().ticks();
    while (c_us.now().ticks() - ts1) < us && !predicate(dt) {}
    c_us.now().ticks() - ts1
}

#[entry]
fn main() -> ! {
    if let (Some(dp), Some(_cp)) = (
        pac::Peripherals::take(),
        cortex_m::peripheral::Peripherals::take(),
    ) {
        rtt_init_print!();
        let gpioa = dp.GPIOA.split();
        let mut trig = gpioa.pa3.into_push_pull_output();
        let echo = gpioa.pa4.into_pull_up_input();
        let rcc = dp.RCC.constrain();
        let clocks = rcc.cfgr.freeze();
        let mut counter = Timer::new(dp.TIM2, &clocks).counter_us();
        counter.start(1_000_000_u32.micros()).unwrap();
        loop {
            // 10 µs trigger pulse
            trig.set_low();
            waste(&counter, 2);
            trig.set_high();
            waste(&counter, 10);
            trig.set_low();
            // wait for the echo line to rise, then measure how long it stays high
            let _ = waste_until(&counter, |c| c.is_high(), &echo, 1000);
            let pulse_duration = waste_until(&counter, |c| c.is_low(), &echo, 1000);
            rprintln!("{}", pulse_duration);
        }
    }
    loop {}
}
I know that the code at this point does not stop evaluating the data in the case of a timeout in the waste_until function, but given that there is an object less than 10 cm from the sensor (which has a range of up to 2 meters), this shouldn't be causing issues.
I have a few things I don't understand completely, which I assume might be the cause of this behavior.
First of all, I'm not sure whether hardware timers wrap around on their own or have to be reset manually. (I used my waste function with a half-second delay and managed to make a seemingly OK blinky program, so I hope I got it right.)
I'm not sure whether I have to configure a maximum sampling frequency for TIM2; in theory I could do it with the sysclock, but I didn't find a way to do it with TIM2. I also assumed that the HAL wouldn't let me create a CounterUs without a valid minimum sample rate.
I'm not sure if ticks() are in a one-to-one relation with microseconds (I only assumed so because it seemed logical that a CounterUs would work that way).
I'm not sure about the problems that might occur if the timer wraps around mid-wait and the delta time becomes negative (in the case of u32 it just overflows).
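For illustration, here is a minimal sketch of a wrap-aware delta (my own, not part of the original code), under the assumption that counter.start(1_000_000_u32.micros()) makes the counter wrap at 1_000_000 ticks:

// PERIOD is an assumption: the tick count at which this counter wraps,
// matching the 1-second period passed to counter.start(...)
const PERIOD: u32 = 1_000_000;

// Delta between two tick readings, correct across at most one wrap.
fn elapsed_ticks(now: u32, ts1: u32) -> u32 {
    if now >= ts1 {
        now - ts1
    } else {
        // the counter wrapped once between ts1 and now
        PERIOD - ts1 + now
    }
}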
When it comes to pull_up_input and pull_down_input: does pull_up mean that the pin is normally pulled high and has to go low to register a logical one, or that it has to be pulled high to read a logical one? (It's also not very clear whether the is_low() and is_high() methods refer to the electrical state of the pin or to its logical value.)
I've spent quite some time on this, sadly to no avail so far. Hopefully someone can tell me whether one of the things above is wrong and indeed causes the issue, or point out something I haven't considered.
(The value I'm getting is 1000-1001.)
So, from one of the comments I found out about pull-down and pull-up resistors and watched a couple of YouTube videos on the matter. I'm not sure if this is correct, but from what I've found it seems that I in fact need a pull_down_input for the echo pin. So I replaced it, and the value I'm getting is still constant, but it's 1 now.
Now that makes some sense, since I assume the 1000 was originating from the timeout value in my waste_until call. But getting 1 is a bit more confusing; I mean, it cannot be faster than 1 µs, right?
So after experimenting some more, I've ended up with this version of the code:
#![deny(unsafe_code)]
#![allow(clippy::empty_loop)]
#![no_main]
#![no_std]

use panic_halt as _; // panic handler
use cortex_m_rt::entry;
use stm32f4xx_hal as hal;
use crate::hal::{pac, prelude::*};
use rtt_target::{rtt_init_print, rprintln};
use stm32f4xx_hal::timer::{Timer, CounterUs};
use stm32f4xx_hal::pac::TIM2;
use core::fmt::Debug;

fn dbg<T: Debug>(d: T, tag: &str) -> T {
    rprintln!("{} {:?}", tag, d);
    d
}

// Busy-wait for `us` microseconds on the TIM2 microsecond counter.
fn waste(c_us: &CounterUs<TIM2>, us: u32) {
    let ts1 = c_us.now().ticks();
    while (c_us.now().ticks() - ts1) < us {}
}

// Busy-wait until `predicate` holds or `us` microseconds elapse;
// returns Some(elapsed ticks) on success and None on timeout.
fn waste_until<T>(c_us: &CounterUs<TIM2>,
                  predicate: fn(&T) -> bool,
                  dt: &T,
                  us: u32) -> Option<u32> {
    let ts1 = c_us.now().ticks();
    while (c_us.now().ticks() - ts1) < us && !predicate(dt) {}
    if predicate(dt) { Some(c_us.now().ticks() - ts1) } else { None }
}

#[entry]
fn main() -> ! {
    if let (Some(dp), Some(_cp)) = (
        pac::Peripherals::take(),
        cortex_m::peripheral::Peripherals::take(),
    ) {
        rtt_init_print!();
        let gpioa = dp.GPIOA.split();
        let mut trig = gpioa.pa4.into_push_pull_output();
        let echo = gpioa.pa5.into_pull_down_input();
        let rcc = dp.RCC.constrain();
        let clocks = rcc.cfgr.freeze();
        let mut counter = Timer::new(dp.TIM2, &clocks).counter_us();
        counter.start(1_000_000_u32.micros()).unwrap();
        loop {
            // starting pulse
            trig.set_low();
            waste(&counter, 2);
            trig.set_high();
            waste(&counter, 10);
            trig.set_low();
            // ending pulse
            // starting echo read
            if let Some(_) = waste_until(&counter, |c| c.is_high(), &echo, 1_000_000) {
                // rising edge arrived before the timeout
                if let Some(pulse_duration) = waste_until(&counter, |c| c.is_low(), &echo, 1_000_000) {
                    rprintln!("{}", pulse_duration);
                } else {
                    rprintln!("no falling edge");
                }
            } else {
                rprintln!("no rising edge");
            }
            // end echo read
        }
    }
    loop {}
}
And here it became clear that the pattern was in fact that the first 1-3 readings output the same value (so far I've seen 1, 21 and 41), and after that it keeps timing out in the outer if.
I tried changing I/O pins because I suspected my poor solder job was to blame, and I also inspected the pins with a multimeter; they seem to be fine.
I'm not entirely sure, but I think that since the sensor has a recommended VCC of 5 volts and the ST-LINK/V2 supplies the board with 3.3 volts, the sensor may perform worse (but once again, the target object is at most 5 cm away).
Here are the images of my breadboard, just in case I missed something.

Related

CPU time sleep instead of wall-clock time sleep

Currently, I have the following Rust toy program:
use rayon::prelude::*;
use std::{env, thread, time};

/// Sleeps 1 second n times in parallel using rayon
fn rayon_sleep(n: usize) {
    let millis = vec![0; n];
    millis
        .par_iter()
        .for_each(|_| thread::sleep(time::Duration::from_millis(1000)));
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n = args[1].parse::<usize>().unwrap();
    let now = time::Instant::now();
    rayon_sleep(n);
    println!("rayon: {:?}", now.elapsed());
}
Basically, my program accepts one input argument n. Then, I sleep for 1 second n times. The program executes the sleep tasks in parallel using rayon.
However, this is not exactly what I want. As far as I know, thread::sleep sleeps according to wall-clock time, whereas I would like to keep a virtual CPU busy for 1 second of CPU time.
Is there any way to do this?
EDIT
I would like to make this point clear: I don't mind if the OS preempts the tasks. However, if this happens, then I don't want to consider the time the task spends in the ready/waiting queue.
EDIT
This is a simple, illustrative example of what I need to do. In reality, I have to develop a benchmark for a crate that allows defining and simulating models using the DEVS formalism. The benchmark aims to compare DEVS-compliant libraries with each other, and it explicitly says that the models must spend a fixed, known amount of CPU time. That is why I need to make sure of that. Thus, I cannot use a simple busy loop nor simply sleep.
I followed Sven Marnach's suggestions and implemented the following function:
use cpu_time::ThreadTime;
use rayon::prelude::*;
use std::{env, time};

/// Spins for 1 second of CPU time, n times in parallel using rayon
fn rayon_sleep(n: usize) {
    let millis = vec![0; n];
    millis.par_iter().for_each(|_| {
        let duration = time::Duration::from_millis(1000);
        let mut x: u32 = 0;
        let now = ThreadTime::now(); // get current thread time
        while now.elapsed() < duration { // active sleep
            std::hint::black_box(&mut x); // to avoid compiler optimizations
            x = x.wrapping_add(1);
        }
    });
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let n = args[1].parse::<usize>().unwrap();
    let now = time::Instant::now();
    rayon_sleep(n);
    println!("rayon: {:?}", now.elapsed());
}
If I set n to 8, it takes roughly 2 seconds. I'd expect better performance (1 second, as I have 8 vCPUs), but I guess the overhead corresponds to the OS scheduling policy.
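For what it's worth, here is a minimal sketch (my own, using only the cpu_time and std APIs already shown above) that makes the gap between wall-clock time and CPU time visible for a single task:

use cpu_time::ThreadTime;
use std::time;

fn main() {
    let wall = time::Instant::now();
    let cpu = ThreadTime::now();
    let mut x: u32 = 0;
    // spin for 100 ms of this thread's CPU time
    while cpu.elapsed() < time::Duration::from_millis(100) {
        std::hint::black_box(&mut x);
        x = x.wrapping_add(1);
    }
    // wall.elapsed() >= cpu.elapsed(); the gap is time spent preempted or descheduled
    println!("wall: {:?}, cpu: {:?}", wall.elapsed(), cpu.elapsed());
}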

How do I get the difference in time in nanoseconds? Problem in types (Rust)

I have a bit of a problem regarding types in Rust.
In my problem, I am simulating message transferring in a graph.
I have the speed at which a message can be transferred along a channel in M/s (megabits per second) and the message size in M (megabits). Getting the time at which the message arrives is pretty standard: size/speed.
I am now trying to get the difference in time in nanoseconds between the time that the message was sent and the time now. I want it in nanoseconds because if the message is very small then the time for transferring will also be very small.
If I want the difference in milliseconds, I can get it with diff.num_milliseconds(), which gives me an i64.
However, if I want the number of nanoseconds, diff.num_nanoseconds() returns an Option<i64>, and you can't compare a float with an Option<i64>.
use std::thread;
use std::time::Duration;
use chrono::Utc;

fn main() {
    // size of message in M
    let size = 0.05;
    // transfer speed in M/s
    let speed = 1e6;
    // speed in M/ns
    let speed_nano = speed / 1e9;
    let now = Utc::now();
    let sec = Duration::from_millis(5000);
    thread::sleep(sec);
    let then = Utc::now();
    let diff = then - now;
    // error: an f64 cannot be compared with an Option<i64>
    println!("{}", (size / speed_nano) < diff.num_nanoseconds());
}
What am I missing here? How can I properly do the comparison?
Option<i64> means that the function returns a value that contains either an i64 or nothing at all. You get either a Some(an_i64) or None, which represents the absence of an i64 value.
You can extract the i64 (if it's there) using unwrap, for example:
(size/speed_nano) < diff.num_nanoseconds().unwrap() as f64 // cast needed: f64 and i64 can't be compared directly
If there's no i64 value there, unwrap will panic and terminate your program. You can also use unwrap_or to supply a default value:
(size/speed_nano) < diff.num_nanoseconds().unwrap_or(0) as f64
You can't compare a float to an Option<i64> because it doesn't make sense: how would a float compare to a missing, "nonexistent" value None?
fn is_less(a: f64, b: Option<i64>) -> bool {
    match b {
        Some(my_integer) => a < my_integer as f64,
        None => ?????
    }
}
In that case, the float a is not less than None, but it's also not equal to it and not greater, so what is it? Rust decided that it's an error.
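A hedged alternative to unwrap (my own sketch, not part of the original answer) is to handle the None case explicitly with Option::map_or, deciding up front what the comparison should yield when no nanosecond count is available:

// Returns whether the transfer time (in ns, as f64) is less than the elapsed time.
// A None from num_nanoseconds() (duration overflow) is treated as "no"; adjust to taste.
fn transfer_is_faster(transfer_ns: f64, diff: chrono::Duration) -> bool {
    diff.num_nanoseconds().map_or(false, |ns| transfer_ns < ns as f64)
}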

Bad rust code optimization or I just haven't done enough? (Euler #757)

I'm trying to solve my first ever Project Euler problem, just to have fun with Rust, and got stuck on what seems to be an extremely long compute time.
Problem:
https://projecteuler.net/problem=757
I came up with this code to try to solve it; it solves the base problem (up to 10^6) in ~245 ms and gets the expected result of 2,851.
use std::time::Instant;

// Returns every factor pair of `num` (each divisor up to sqrt(num) and its complement).
fn factor(num: u64) -> Vec<u64> {
    let mut counter = 1;
    let mut factors = Vec::with_capacity(((num as f64).log(10.0) * 100.0) as _);
    while counter <= (num as f64).sqrt() as _ {
        let div = num / counter;
        let rem = num % counter;
        if rem == 0 {
            factors.push(counter);
            factors.push(div);
        }
        counter += 1
    }
    factors.shrink_to_fit();
    factors
}

fn main() {
    let now = Instant::now();
    let max = 10u64.pow(6);
    let mut counter = 0;
    'a: for i in 1..max {
        // Optimization: All numbers in the pattern appear to be evenly divisible by 4
        let div4 = i / 4;
        let mod4 = i % 4;
        if mod4 != 0 { continue }
        // Optimization: And the remainder of that divided by 3 is always 0 or 1
        if div4 % 3 > 1 { continue }
        let mut factors = factor(i);
        if factors.len() >= 4 {
            // Optimization: The later found factors seem to be the most likely to fit the pattern, so try them first
            factors.reverse();
            let pairs: Vec<_> = factors.chunks(2).collect();
            for paira in pairs.iter() {
                for pairb in pairs.iter() {
                    if pairb[0] + pairb[1] == paira[0] + paira[1] + 1 {
                        counter += 1;
                        continue 'a;
                    }
                }
            }
        }
    }
    println!("{}, {} ms", counter, now.elapsed().as_millis());
}
It looks like my code is spending most of its time on factoring, and in my search for a more efficient factoring algorithm than what I was able to come up with on my own, I couldn't find any ready-made Rust code (the code I did find was actually slower). But I ran a simulation to estimate how long it would take even with a perfect factoring algorithm: the non-factoring portion of this code alone would take 13 days to find all numbers up to 10^14. Probably not what the creator of this problem intends.
Given that I'm relatively new to programming, is there some concept or programming method I'm not aware of (like, say, using a hashmap for fast lookups) that can be used in this situation? Or is the solution going to involve spotting patterns in the numbers and making optimizations like the ones I've found so far?
If Vec::push is called when the vector is at its capacity, it will re-allocate its internal buffer to double the size and copy all its elements to this new allocation.
Vec::new() creates a vector with no space allocated so it will be doing this re-allocation.
You can use Vec::with_capacity((num/2) as usize) to avoid this and just allocate the max you might need.
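A quick sketch (my own illustration) of the difference: with sufficient up-front capacity, push never reallocates, while a Vec::new() vector has to grow (and copy its contents) several times:

fn main() {
    let mut v: Vec<u64> = Vec::with_capacity(1000);
    let cap = v.capacity();
    for i in 0..1000 {
        v.push(i); // stays within the reserved capacity
    }
    assert_eq!(v.capacity(), cap); // no reallocation occurred

    let mut w: Vec<u64> = Vec::new(); // capacity 0
    for i in 0..1000 {
        w.push(i); // reallocates roughly log2(1000) times as it grows
    }
    assert!(w.capacity() >= 1000);
}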

What corner case am I missing in my Rust emulation of C++'s `std::cin >>`?

My plan is to write a simple method which does exactly what std::cin >> from the C++ standard library does:
use std::io::BufRead;
pub fn input<T: std::str::FromStr>(handle: &std::io::Stdin) -> Result<T, T::Err> {
    let mut x = String::new();
    let mut guard = handle.lock();
    loop {
        let mut trimmed = false;
        let available = guard.fill_buf().unwrap();
        let l = match available.iter().position(|&b| !(b as char).is_whitespace()) {
            Some(i) => {
                trimmed = true;
                i
            }
            None => available.len(),
        };
        guard.consume(l);
        if trimmed {
            break;
        }
    }
    let available = guard.fill_buf().unwrap();
    let l = match available.iter().position(|&b| (b as char).is_whitespace()) {
        Some(i) => i,
        None => available.len(),
    };
    x.push_str(std::str::from_utf8(&available[..l]).unwrap());
    guard.consume(l);
    T::from_str(&x)
}
The loop is meant to trim away all the whitespace before valid input begins. The match block outside the loop is where the length of the valid input (that is, before trailing whitespaces begin or EOF is reached) is calculated.
Here is an example using the above method.
let handle = std::io::stdin();
let x: i32 = input(&handle).unwrap();
println!("x: {}", x);
let y: String = input(&handle).unwrap();
println!("y: {}", y);
When I tried a few simple tests, the method worked as intended. However, when I use it with online judges like the one on Codeforces, I get complaints that the program sometimes stays idle or reads the wrong input, among other issues, which leads me to suspect I missed a corner case or something like that. This usually happens when the input is a few hundred lines long.
What input is going to break the method? What is the correction?
After a lot of experimentation, I noticed a lag when reading each input, which added up as the number of inputs increased. The function doesn't make use of a buffer; it tries to access the stream every time it needs to fill a variable, which is slow, hence the lag.
Lesson learnt: Always use a buffer with a good capacity.
However, the idleness issue still persisted until I replaced the fill_buf/consume pairs with something like read_line or read_to_string.
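As a sketch of that lesson (my own code, assuming the goal is reading whitespace-separated tokens from stdin): read everything into one buffer first, then parse tokens out of it:

use std::io::Read;

fn main() {
    // One buffered read up front instead of touching the stream per token.
    let mut buf = String::new();
    std::io::stdin().read_to_string(&mut buf).unwrap();
    let mut tokens = buf.split_ascii_whitespace();

    let x: i32 = tokens.next().unwrap().parse().unwrap();
    let y: &str = tokens.next().unwrap();
    println!("x: {}, y: {}", x, y);
}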

Is `iter().map().sum()` as fast as `iter().fold()`?

Does the compiler generate the same code for iter().map().sum() and iter().fold()? In the end they achieve the same goal, but the first code would iterate two times, once for the map and once for the sum.
Here is an example. Which version would be faster in total?
pub fn square(s: u32) -> u64 {
    match s {
        s @ 1...64 => 2u64.pow(s - 1),
        _ => panic!("Square must be between 1 and 64")
    }
}

pub fn total() -> u64 {
    // A fold
    (0..64).fold(0u64, |r, s| r + square(s + 1))
    // or a map
    (1..64).map(square).sum()
}
What would be good tools to look at the assembly or benchmark this?
For them to generate the same code, they'd first have to do the same thing. Your two examples do not:
fn total_fold() -> u64 {
    (0..64).fold(0u64, |r, s| r + square(s + 1))
}

fn total_map() -> u64 {
    (1..64).map(square).sum()
}

fn main() {
    println!("{}", total_fold());
    println!("{}", total_map());
}
18446744073709551615
9223372036854775807
Let's assume you meant
fn total_fold() -> u64 {
    (1..64).fold(0u64, |r, s| r + square(s + 1))
}

fn total_map() -> u64 {
    (1..64).map(|i| square(i + 1)).sum()
}
There are a few avenues to check:
The generated LLVM IR
The generated assembly
Benchmark
The easiest source for the IR and assembly is one of the playgrounds (official or alternate). These both have buttons to view the assembly or IR. You can also pass --emit=llvm-ir or --emit=asm to the compiler to generate these files.
Make sure to generate assembly or IR in release mode. The attribute #[inline(never)] is often useful to keep functions separate to find them easier in the output.
Benchmarking is documented in The Rust Programming Language, so there's no need to repeat all that valuable information.
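For a concrete starting point, here is a minimal benchmark sketch using the criterion crate (my own choice of tool, not one the answer prescribes), assuming the total_fold and total_map functions above are in scope:

use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bench_totals(c: &mut Criterion) {
    // benchmark each variant separately so the reports can be compared
    c.bench_function("fold", |b| b.iter(|| black_box(total_fold())));
    c.bench_function("map + sum", |b| b.iter(|| black_box(total_map())));
}

criterion_group!(benches, bench_totals);
criterion_main!(benches);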
Before Rust 1.14, these do not produce the exact same assembly. I'd wait for benchmarking / profiling data to see if there's any meaningful impact on performance before I worried.
As of Rust 1.14, they do produce the same assembly! This is one reason I love Rust. You can write clear and idiomatic code and smart people come along and make it equally as fast.
but the first code would iterate two times, once for the map and once for the sum.
This is incorrect, and I'd love to know what source told you this so we can correct it and prevent future misunderstandings. An iterator operates on a pull basis: one element is processed at a time. The core method is next, which yields a single value, running just enough computation to produce that value.
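A minimal sketch (my own) of that pull model; the closure given to map runs once per next() call, not in a separate pass:

fn main() {
    let mut it = (1..4).map(|x| {
        println!("mapping {}", x); // executes lazily, one element per next()
        x * 2
    });
    assert_eq!(it.next(), Some(2)); // prints "mapping 1"
    assert_eq!(it.next(), Some(4)); // prints "mapping 2"
    // the third element (x = 3) has not been computed yet
}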
First, let's fix those examples to actually return the same result:
pub fn total_fold_iter() -> u64 {
    (1..65).fold(0u64, |r, s| r + square(s))
}

pub fn total_map_iter() -> u64 {
    (1..65).map(square).sum()
}
Now, let's develop them, starting with fold. A fold is just a loop and an accumulator; it is roughly equivalent to:
pub fn total_fold_explicit() -> u64 {
    let mut total = 0;
    for i in 1..65 {
        total = total + square(i);
    }
    total
}
Then, let's go with map and sum, and unwrap the sum first, which is roughly equivalent to:
pub fn total_map_partial_iter() -> u64 {
    let mut total = 0;
    for i in (1..65).map(square) {
        total += i;
    }
    total
}
It's just a simple accumulator! And now, let's unwrap the map layer (which only applies a function), obtaining something that is roughly equivalent to:
pub fn total_map_explicit() -> u64 {
    let mut total = 0;
    for i in 1..65 {
        let s = square(i);
        total += s;
    }
    total
}
As you can see, the two are extremely similar: they apply the same operations in the same order and have the same overall complexity.
Which is faster? I have no idea. And a micro-benchmark may only tell half the truth anyway: just because something is faster in a micro-benchmark does not mean it is faster in the midst of other code.
What I can say, however, is that they both have equivalent complexity and should therefore behave similarly, i.e., within a constant factor of each other.
And that I would personally go for map + sum, because it expresses the intent more clearly whereas fold is the "kitchen-sink" of Iterator methods and therefore far less informative.
