Why does .step_by() expect a usize? - rust

I am trying to implement a function with nested for loops. I am having trouble with the .step_by() function, however.
Here is my code:
fn get_prime_factors_below(n: i32) -> HashMap<i32, Vec<i32>> {
for i in 2..n / 2 + 1 {
for j in (i * 2..n).step_by(i) {
//
}
}
return factors;
}
This returns the following error when I try to compile it:
for j in (i * 2..n).step_by(i) {
------- ^ expected `usize`, found `i32`
|
arguments to this function are incorrect
Why does .step_by() expect a usize and doesn't work with an i32?

Using usize makes sense whenever you are, in some manner, performing an index lookup. If you tried to index with an unsigned integer type narrower than usize (e.g. u8), you might not be able to address the whole container. Remember that on every target Rust supports, usize is the pointer-sized unsigned integer.
The largest step you might want to make is a usize-sized step over an iterator which occupies all your memory. If we used a larger size than this, the largest step would step into memory our system can't represent. If we used a size smaller than this, we wouldn't be able to make the biggest possible step at all!
By the same logic, using usize lets your code compile for both 32-bit and 64-bit systems without any changes. In that sense usize is the 'correct' choice; any other integer type can create problems for you later on.
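As a rough sketch of how the function from the question could be made to compile, one option is to use usize for the whole computation so that no cast is needed (the other option is to keep i32 and write .step_by(i as usize)). The loop body was elided in the question, so the body below, which records each prime i as a factor of its multiples, is only an assumption about what was intended:

```rust
use std::collections::HashMap;

// Sketch only: the loop body is an assumption, since the question elided it.
fn get_prime_factors_below(n: usize) -> HashMap<usize, Vec<usize>> {
    let mut factors: HashMap<usize, Vec<usize>> = HashMap::new();
    for i in 2..n / 2 + 1 {
        // A number that already has recorded factors is composite; skip it.
        if factors.contains_key(&i) {
            continue;
        }
        // `i` is a usize, so it can be passed to step_by directly.
        // Note that step_by panics if the step is 0; here i >= 2.
        for j in (i * 2..n).step_by(i) {
            factors.entry(j).or_default().push(i);
        }
    }
    factors
}
```

With this sketch, get_prime_factors_below(13) maps 12 to [2, 3] and has no entry for primes such as 7.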

Related

In Rust, is it faster to store larger types or to store smaller ones and cast them all the time?

I'm developing a chess engine in Rust. I have a Move struct with from and to fields, which are Square. Square is a struct containing a usize, so that I can use it directly when accessing board elements of the position. Since in Rust indexing must be done with usize, I'm wondering what's the fastest way to handle this situation (note that move generation should be as fast as possible). I understand it's more memory friendly to store u8 and cast them every time I need to use them as an index, but is it faster? What would be the idiomatic way to approach this?
I have:
struct Square { index: usize }
impl Position {
    fn at(&self, square: Square) -> Option<Piece> {
        self.board[square.index]
    }
}
I've tried migrating to u8 and casting every time with mixed results:
struct Square(u8);
impl Position {
    fn at(&self, square: Square) -> Option<Piece> {
        self.board[square.0 as usize]
    }
}
Pro u8 casting:
better cache utilization (objects are smaller), though this may only matter when there are a lot of objects
Con u8:
casting might require additional instructions on some platforms, but these are usually register-only operations that the CPU handles cheaply
The idiomatic way to avoid writing as usize everywhere is to implement a wrapper method:
impl Square {
#[inline]
pub fn index(&self) -> usize {
self.0 as usize
}
}
Or, when you want to make it really typesafe, implement std::ops::Index:
struct Piece;
struct Board([Piece; 64]);
struct Square(u8);
impl std::ops::Index<Square> for Board {
type Output = Piece;
fn index(&self, index: Square) -> &Self::Output {
&self.0[index.0 as usize]
}
}
When you only cast for indexing, Rust's compiler is smart enough to notice that, so both versions produce the exact same assembly.
See playground.
Even if that weren't the case, casting from u8 to usize wouldn't be much more than a single instruction (which is pretty much no overhead).
On the other hand, usize takes 8 times as much space as u8 (on a 64-bit machine).
So if you plan on having A LOT of Square instances, that might be a factor to consider and a reason to go with the casting option; if not, it pretty much doesn't matter at all.
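To make the size difference concrete, here is a small sketch that compares the two layouts with std::mem::size_of. The type names (SquareWide, SquareNarrow, MoveWide, MoveNarrow) and the move_sizes helper are hypothetical and only mirror the Move and Square structs described in the question:

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the two Square representations discussed above.
#[allow(dead_code)]
struct SquareWide {
    index: usize,
}

#[allow(dead_code)]
struct SquareNarrow(u8);

// A move holds a `from` and a `to` square, as in the question.
#[allow(dead_code)]
struct MoveWide {
    from: SquareWide,
    to: SquareWide,
}

#[allow(dead_code)]
struct MoveNarrow {
    from: SquareNarrow,
    to: SquareNarrow,
}

/// Returns (size of the usize-based move, size of the u8-based move) in bytes.
fn move_sizes() -> (usize, usize) {
    (size_of::<MoveWide>(), size_of::<MoveNarrow>())
}
```

The u8-based move occupies 2 bytes, while the usize-based one occupies two machine words, so the narrow version packs many more moves into a cache line.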

How to store state within a module between function calls?

I have the following functions in a module:
pub fn square(s: u32) -> u64 {
if s < 1 || s > 64 {
panic!("Square must be between 1 and 64")
}
total_for_square(s) - total_for_square(s - 1)
}
fn total_for_square(s: u32) -> u64 {
if s == 64 {
return u64::max_value();
}
2u64.pow(s) - 1
}
pub fn total() -> u64 {
u64::max_value()
}
This works fine when calling individual functions directly. However, I want to optimize it and cache values to total_for_square to speed up future look ups (storing in a HashMap). How should I approach where to store the HashMap so it's available between calls? I know I could refactor to put all of this in a struct, but in this case, I cannot change the API.
In other, higher level languages I have used, I would just have a variable in the same scope as the functions. However, it's not clear if that is possible in Rust on the module level.
Regarding having a variable in the same scope as the functions: you can do something similar in Rust, but it's syntactically more complicated. You need to create a global for your cache, using lazy_static or once_cell for instance.
The cache will need to be thread-safe though, so either a regular map sitting behind a Mutex or RwLock, or some sort of concurrent map.
Although given you only have 64 inputs, you could just precompute the entire thing and return precomputed values directly.
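A minimal sketch of such a global cache follows. It swaps in std::sync::OnceLock from the standard library (available since Rust 1.70) in place of the lazy_static or once_cell crates, and puts a regular HashMap behind a Mutex. The cache() helper is a hypothetical name introduced for illustration:

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

// Hypothetical helper: returns the module-level cache, creating it on first use.
fn cache() -> &'static Mutex<HashMap<u32, u64>> {
    static CACHE: OnceLock<Mutex<HashMap<u32, u64>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

fn total_for_square(s: u32) -> u64 {
    // Lock the map for the duration of the lookup so concurrent callers are safe.
    let mut map = cache().lock().unwrap();
    // Compute the value only if it is not cached yet.
    *map.entry(s).or_insert_with(|| {
        if s == 64 {
            u64::MAX
        } else {
            2u64.pow(s) - 1
        }
    })
}
```

The public signature of total_for_square is unchanged, so the existing API keeps working while repeated calls are served from the map.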
The cached crate comes in handy:
use cached::proc_macro::cached;
#[cached]
fn total_for_square(s: u32) -> u64 {
if s == 64 {
return u64::MAX;
}
2u64.pow(s) - 1
}
Indeed, you only need to write two lines, and the crate will take care of everything. Internally, the cached values are stored in a hash map.
(Note that u64::max_value() has been superseded by u64::MAX)
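Side note: in this specific case, the simplest solution is probably to skip the cache entirely. total_for_square(s) - total_for_square(s - 1) always equals 2^(s - 1), so square can return 1u64 << (s - 1) directly.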

How can I define a generic function that can return a given integer type?

I'd like to define a function that can return a number whose type is specified when the function is called. The function takes a buffer (Vec<u8>) and returns numeric value, e.g.
let byte = buf_to_num::<u8>(&buf);
let integer = buf_to_num::<u32>(&buf);
The buffer contains an ASCII string that represents a number, e.g. b"827", where each byte is the ASCII code of a digit.
This is my non-working code:
extern crate num;
use num::Integer;
use std::ops::{MulAssign, AddAssign};
fn buf_to_num<T: Integer + MulAssign + AddAssign>(buf: &Vec::<u8>) -> T {
let mut result : T;
for byte in buf {
result *= 10;
result += (byte - b'0');
}
result
}
I get mismatched type errors for both the addition and the multiplication lines (expected type T, found u32). So I guess my problem is how to tell the type system that T can be expressed in terms of a literal 10 or in terms of the result of (byte - b'0')?
Welcome to the joys of having to specify every single operation you're using on a generic type. It's a pain, but it is worth it.
You have two problems:
result *= 10; without a corresponding From<_> definition. This is because, when you specify "10", there is no way for the compiler to know what "10" as a T means - it knows primitive types, and any conversion you defined by implementing From<_> traits
You're mixing up two operations - coercion from a vector of characters to an integer, and your operation.
We need to make two assumptions for this:
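We will require From<u32> so we can cap our numbers to u32. This limits T to types that can hold any u32 losslessly (u32, u64, i64, u128, ...); narrower types such as u8 do not implement From<u32>.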
We will also clarify your logic and convert each u8 to char so we can use to_digit() to convert that to u32, before making use of From<u32> to get a T.
use std::ops::{MulAssign, AddAssign};
fn parse_to_i<T: From<u32> + MulAssign + AddAssign>(buf: &[u8]) -> T {
let mut buffer:T = (0 as u32).into();
for o in buf {
buffer *= 10.into();
buffer += (*o as char).to_digit(10).unwrap_or(0).into();
}
buffer
}
You can convince yourself of its behavior on the playground
The multiplication is resolved by converting the constant with .into(), which relies on our From<u32> requirement for T and lets the Rust compiler know we're not doing silly stuff.
The final change is to initialize the accumulator (result in your code, buffer here) to 0.
Let me know if this makes sense to you (or if it doesn't), and I'll be glad to elaborate further if there is a problem :-)
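If narrower targets such as u8 are needed, as in the question's buf_to_num::<u8> call, one possible variant is to accumulate into a u32 with checked arithmetic and only convert at the end through TryFrom<u32>. This is a sketch under assumptions that are not in the original code: it returns an Option, rejects non-digit bytes instead of treating them as 0, and caps inputs at u32::MAX:

```rust
use std::convert::TryFrom;

// Sketch: parse ASCII digits into a u32, then narrow or widen to T at the end.
// An empty buffer yields Some(0) in this version.
fn buf_to_num<T: TryFrom<u32>>(buf: &[u8]) -> Option<T> {
    let mut acc: u32 = 0;
    for &byte in buf {
        // to_digit returns None for anything that is not a decimal digit.
        let digit = (byte as char).to_digit(10)?;
        // checked_* returns None instead of overflowing the u32 accumulator.
        acc = acc.checked_mul(10)?.checked_add(digit)?;
    }
    // Fails (None) if the value does not fit in the requested type.
    T::try_from(acc).ok()
}
```

With this variant, buf_to_num::<u32>(b"827") yields Some(827), while buf_to_num::<u8>(b"827") yields None because 827 does not fit in a u8.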

Do I have to use a usize to access elements of a vector?

I have a 2D vector that rejects indexing using i32 values, but works if I cast those values using as usize:
#[derive(Clone)]
struct Color;
struct Pixel {
color: Color,
}
fn shooting_star(p: &mut Vec<Vec<Pixel>>, x: i32, y: i32, w: i32, h: i32, c: Color) {
for i in x..=w {
for j in y..=h {
p[i][j].color = c.clone();
}
}
}
fn main() {}
When I compile, I get the error message
error[E0277]: the trait bound `i32: std::slice::SliceIndex<[std::vec::Vec<Pixel>]>` is not satisfied
--> src/main.rs:11:13
|
11 | p[i][j].color = c.clone();
| ^^^^ slice indices are of type `usize` or ranges of `usize`
|
= help: the trait `std::slice::SliceIndex<[std::vec::Vec<Pixel>]>` is not implemented for `i32`
= note: required because of the requirements on the impl of `std::ops::Index<i32>` for `std::vec::Vec<std::vec::Vec<Pixel>>`
If I change the code to have
p[i as usize][j as usize].color = c.clone();
Then everything works fine. However, this feels like it would be a really bizarre choice with no reason not to be handled by the Vec type.
In the documentation, there are plenty of examples like
assert_eq!(vec[0], 1);
By my understanding, if a plain number with no decimal is by default an i32, then there's no reason using an i32 to index shouldn't work.
Unlike Java, C# or even C++, numeric literals in Rust do not have a fixed type. The numeric type of a literal is usually inferred by the compiler, or explicitly stated using a suffix (0usize, 0.0f64, and so on). In that regard, the type of the 0 literal in assert_eq!(vec[0], 1); is inferred to be a usize, since Rust allows Vec indexing by numbers of type usize only.
As for the rationale behind using usize as the indexing type: a usize is the same size as a pointer on the target architecture, so it can represent the index or address of every possible memory location on the machine the program runs on. In practice, a vector's allocation is capped at isize::MAX bytes (isize::MAX == usize::MAX / 2), so no valid length or index can exceed what a usize holds. Using usize for the sizes and indices of a Vec prevents the creation and usage of a vector larger than the addressable memory itself.
Furthermore, using an unsigned integer just large enough to refer to all possible memory locations removes two dynamic checks: first, that the supplied size or index is non-negative (if isize were used, this check would have to be implemented manually), and second, that creating a vector or dereferencing a value of a vector will not cause the computer to run out of memory. However, the latter is guaranteed only when the type stored in the vector fits into one word.
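If the coordinates really have to stay i32 (for example because negative values are meaningful elsewhere), a checked conversion is a safer alternative to a bare as usize, which would turn a negative number into a huge index. The following is only a sketch; get_at is a hypothetical helper and the grid of u8 values stands in for the Pixel grid from the question:

```rust
use std::convert::TryFrom;

// Hypothetical helper: look up grid[x][y] from signed coordinates.
// Returns None for negative or out-of-range coordinates instead of panicking.
fn get_at(grid: &[Vec<u8>], x: i32, y: i32) -> Option<u8> {
    // usize::try_from fails for negative values, so they never reach the index.
    let row = usize::try_from(x).ok()?;
    let col = usize::try_from(y).ok()?;
    // get() performs the bounds check and returns None when out of range.
    grid.get(row)?.get(col).copied()
}
```

The explicit conversion keeps the non-negativity check visible at the call boundary, while the indexing itself still happens with usize.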

What is the difference between casting to `i32` from `usize` versus the other way?

I am making a function that creates an array of n random numbers, but the comparison in my while loop throws an error.
while ar.len() as i32 < size { }
Complains with: expected one of !, (, +, ,, ::, <, or >, found {.
If I remove the as i32, it complains about mismatched types, and if I add as usize to the size variable instead, it doesn't complain.
When you cast from a smaller-sized type to a larger one, you won't lose any data, but the data will now take up more space.
When you cast from a larger-sized type to a smaller one, you might lose some of your data, but the data will take up less space.
Pretend I have a box of size 1 that can hold the numbers 0 to 9 and another box of size 2 that can hold the numbers 0 to 99.
If I want to store the number 7; both boxes will work, but I will have space left over if I use the larger box. I could move the value from the smaller box to the larger box without any trouble.
If I want to store the number 42; only one box can fit the number: the larger one. If I try to take the number and cram it in the smaller box, something will be lost, usually the upper parts of the number. In this case, my 42 would be transformed into a 2! Oops!
In addition, signedness plays a role: when you cast between signed and unsigned numbers, the same bits may be interpreted as a different value, so a number like -1 (as an i8) becomes 255 (as a u8)!
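The box analogy can be tried out directly. The sketch below uses three hypothetical helper functions to show a lossy narrowing cast, a sign reinterpretation, and the checked alternative from the standard library:

```rust
use std::convert::TryFrom;

/// Narrowing with `as` silently keeps only the low 8 bits of the value.
fn narrow_lossy(value: u32) -> u8 {
    value as u8
}

/// Casting between signed and unsigned types of the same width
/// keeps the bits and reinterprets them.
fn reinterpret_sign(value: i8) -> u8 {
    value as u8
}

/// A checked conversion reports values that do not fit instead of truncating.
fn narrow_checked(value: u32) -> Option<u8> {
    u8::try_from(value).ok()
}
```

Here narrow_lossy(300) gives 44 (300 - 256), reinterpret_sign(-1) gives 255, and narrow_checked(300) gives None, while narrow_checked(42) gives Some(42).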
See also:
How do I convert between numeric types safely and idiomatically?
In this particular case, it's a bit more complicated. A usize is defined to be a "pointer-sized integer", which is usually the native size of the machine. On a 64-bit x64 processor, that means a usize is 64 bits, and on a 32-bit x86 processor, it will be 32 bits.
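Casting a usize to an i32 will thus operate differently depending on what type of machine you are running on.
The error message you get is because the code you've tried isn't syntactically correct, and the compiler isn't giving a good error message. After as i32, the parser treats the following < as the start of generic arguments for the type rather than as a comparison operator.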
You really want to type
while (ar.len() as i32) < size { }
The parentheses will help the precedence be properly applied.
To be on the safe side, I'd cast to the larger value:
while ar.len() < size as usize { }
See also:
How do I convert a usize to a u32 using TryFrom?
How to idiomatically convert between u32 and usize?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
It seems that your size is of type i32. You either need parentheses:
while (ar.len() as i32) < size { }
or cast size to usize:
while ar.len() < size as usize { }
This works because len() returns a usize and the types on both sides of the comparison need to match. You need the parentheses in the first case so that the parser doesn't read i32 < size as the start of a generic type, but instead compares ar.len() as i32 with size, which is your intention.
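One caveat with size as usize is that a negative i32 turns into a very large usize, so the loop would try to grow the vector almost without bound. A possible sketch that converts once, up front, with a checked conversion is shown below. The fill function is a hypothetical name, and the pushed values are placeholders, since the question's random number source is not shown:

```rust
use std::convert::TryFrom;

// Hypothetical sketch: build a vector of `size` elements, treating a
// negative size as zero instead of letting it wrap to a huge usize.
fn fill(size: i32) -> Vec<i32> {
    let target = usize::try_from(size).unwrap_or(0);
    let mut ar = Vec::with_capacity(target);
    // Both sides of the comparison are usize, so no cast is needed here.
    while ar.len() < target {
        // Placeholder value; a real version would push a random number.
        let next = ar.len() as i32;
        ar.push(next);
    }
    ar
}
```

Converting size a single time also keeps the loop condition free of casts, which sidesteps the parsing problem altogether.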

Resources