How to iterate over all byte values (overflowing_literals in `0..256`)

I'm trying to iterate over all possible byte (u8) values. Unfortunately, the literals in 0..256 are inferred as u8, and 256 overflows:
fn foo(byte: u8) {
    println!("{}", byte);
}

fn main() {
    for byte in 0..256 {
        foo(byte);
        println!("Never executed.");
    }

    for byte in 0..1 {
        foo(byte);
        println!("Executed once.");
    }
}
The above compiles with:
warning: literal out of range for u8
--> src/main.rs:6:20
|
6 | for byte in 0..256 {
| ^^^
|
= note: #[warn(overflowing_literals)] on by default
The first loop body is never executed at all.
My workaround is very ugly and feels brittle because of the cast:
for short in 0..256 {
    let _explicit_type: u16 = short;
    foo(short as u8);
}
Is there a better way?

As of Rust 1.26, inclusive ranges are stabilized using the syntax ..=, so you can write this as:
for byte in 0..=255 {
    foo(byte);
}

This is the issue "Unable to create a range with max value".
The gist of it is that byte is inferred to be u8, so 0..256 is represented as a Range<u8>, but unfortunately 256 overflows as a u8.
The current work-around is to use a larger integral type and cast to u8 later on since 256 is actually never reached.
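A slightly cleaner version of that work-around (a sketch; foo is the function from the question) annotates the loop variable directly instead of using a throwaway binding:

// Iterate as u16 so the exclusive end 256 is representable,
// then cast each value back down to u8 (256 itself is never produced).
for byte in 0u16..256 {
    foo(byte as u8);
}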
There is an RFC for inclusive ranges written with ..., which has entered its final comment period; in the future it may be possible to write for byte in 0...255 or the alternative (0..255).inclusive().

Related

How to generate a random String of alphanumeric chars?

The first part of the question is probably pretty common and there are enough code samples that explain how to generate a random string of alphanumerics. The piece of code I use is from here.
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let rand_string: String = thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect();

    println!("{}", rand_string);
}
However, this piece of code does not compile (note: I'm on nightly):
error[E0277]: a value of type `String` cannot be built from an iterator over elements of type `u8`
--> src/main.rs:8:10
|
8 | .collect();
| ^^^^^^^ value of type `String` cannot be built from `std::iter::Iterator<Item=u8>`
|
= help: the trait `FromIterator<u8>` is not implemented for `String`
Ok, the elements that are generated are of type u8. So I guess this is an array or vector of u8:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let r = thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect::<Vec<_>>();

    let s = String::from_utf8_lossy(&r);
    println!("{}", s);
}
And this compiles and works!
2dCsTqoNUR1f0EzRV60IiuHlaM4TfK
All good, except that I would like to ask if someone could explain what exactly happens regarding the types and how this can be optimised.
Questions
.sample_iter(&Alphanumeric) produces u8 and not chars?
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
.sample_iter(&Alphanumeric) produces u8 and not chars?
Yes, this was changed in rand v0.8. You can see in the docs for 0.7.3:
impl Distribution<char> for Alphanumeric
But then in the docs for 0.8.0:
impl Distribution<u8> for Alphanumeric
How can I avoid the second variable s and directly interpret an u8 as a utf-8 character? I guess the representation in memory would not change at all?
There are a couple of ways to do this, the most obvious being to just cast every u8 to a char:
let s: String = thread_rng()
    .sample_iter(&Alphanumeric)
    .take(30)
    .map(|x| x as char)
    .collect();
Or, using the From<u8> impl for char:
let s: String = thread_rng()
    .sample_iter(&Alphanumeric)
    .take(30)
    .map(char::from)
    .collect();
Of course here, since you know every u8 must be valid UTF-8, you can use String::from_utf8_unchecked, which is faster than from_utf8_lossy (although probably around the same speed as the as char method):
let s = unsafe {
    String::from_utf8_unchecked(
        thread_rng()
            .sample_iter(&Alphanumeric)
            .take(30)
            .collect::<Vec<_>>(),
    )
};
If, for some reason, the unsafe bothers you and you want to stay safe, then you can use the slower String::from_utf8 and unwrap the Result so you get a panic instead of undefined behavior (even though this code should never panic or hit UB):
let s = String::from_utf8(
    thread_rng()
        .sample_iter(&Alphanumeric)
        .take(30)
        .collect::<Vec<_>>(),
).unwrap();
The length of these strings should always be 30. How can I optimise the heap allocation of a Vec away? Also they could actually be char[] instead of Strings.
First of all, trust me, you don't want arrays of chars. They are not fun to work with. If you want a stack string, use a u8 array and then a function like std::str::from_utf8, or the faster std::str::from_utf8_unchecked (again, only safe here because you know valid UTF-8 will be generated).
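For example, a minimal sketch of that stack-buffer approach, assuming rand 0.8 (where Alphanumeric yields u8) and filling the buffer with a plain loop instead of collecting an iterator:

use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    // Fixed-size buffer on the stack; no Vec or String allocation.
    let mut buf = [0u8; 30];
    let mut rng = thread_rng();
    for b in buf.iter_mut() {
        *b = rng.sample(Alphanumeric);
    }
    // Alphanumeric only produces ASCII, so this never fails;
    // from_utf8_unchecked would also be sound here.
    let s: &str = std::str::from_utf8(&buf).unwrap();
    println!("{}", s);
}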
As to optimizing the heap allocation away, refer to this answer. Basically, it's not possible without a bit of hackiness/ugliness (such as writing your own function that collects an iterator into an array of 30 elements).
Once const generics are finally stabilized, there'll be a much prettier solution.
The first example in the docs for rand::distributions::Alphanumeric shows that if you want to convert the u8s into chars you should map them using the char::from function:
use rand::{thread_rng, Rng};
use rand::distributions::Alphanumeric;

fn main() {
    let rand_string: String = thread_rng()
        .sample_iter(&Alphanumeric)
        .map(char::from) // map added here
        .take(30)
        .collect();

    println!("{}", rand_string);
}

How can I define a generic function that can return a given integer type?

I'd like to define a function that can return a number whose type is specified when the function is called. The function takes a buffer (Vec<u8>) and returns a numeric value, e.g.
let byte = buf_to_num::<u8>(&buf);
let integer = buf_to_num::<u32>(&buf);
The buffer contains an ASCII string that represents a number, e.g. b"827", where each byte is the ASCII code of a digit.
This is my non-working code:
extern crate num;

use num::Integer;
use std::ops::{MulAssign, AddAssign};

fn buf_to_num<T: Integer + MulAssign + AddAssign>(buf: &Vec::<u8>) -> T {
    let mut result: T;
    for byte in buf {
        result *= 10;
        result += (byte - b'0');
    }
    result
}
I get mismatched type errors for both the addition and the multiplication lines (expected type T, found u32). So I guess my problem is how to tell the type system that T can be expressed in terms of a literal 10 or in terms of the result of (byte - b'0')?
Welcome to the joys of having to specify every single operation you're using as a generic bound. It's a pain, but it is worth it.
You have two problems:
result *= 10; has no corresponding From<_> conversion. When you write 10, there is no way for the compiler to know what 10 means as a T - it only knows primitive types, plus any conversions you defined by implementing From<_> traits.
You're mixing up two operations: converting the ASCII bytes to digits, and accumulating those digits into a number.
We need to make two assumptions for this:
We will require From<u32> so we can cap our numbers to u32
We will also clarify your logic and convert each u8 to char so we can use to_digit() to convert that to u32, before making use of From<u32> to get a T.
use std::ops::{MulAssign, AddAssign};

fn parse_to_i<T: From<u32> + MulAssign + AddAssign>(buf: &[u8]) -> T {
    let mut buffer: T = (0 as u32).into();
    for o in buf {
        buffer *= 10.into();
        buffer += (*o as char).to_digit(10).unwrap_or(0).into();
    }
    buffer
}
You can convince yourself of its behavior on the playground
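A brief usage sketch (assuming the parse_to_i definition above is in scope; the caller picks the output type):

fn main() {
    // Both calls parse the same ASCII buffer, into different integer types.
    let a: u32 = parse_to_i(b"827");
    let b: u64 = parse_to_i(b"827");
    println!("{} {}", a, b); // 827 827
}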
The multiplication is resolved by converting the constant 10 with .into(), which makes it benefit from our requirement of From<u32> for T and allows the Rust compiler to know we're not doing silly stuff.
The final change is to initialize the accumulator (buffer here, result in your version) to 0 through the same From<u32> conversion.
Let me know if this makes sense to you (or if it doesn't), and I'll be glad to elaborate further if there is a problem :-)

How to calculate u64 modulus u8 in Rust? [duplicate]

Editor's note: This question is from a version of Rust prior to 1.0 and references some items that are not present in Rust 1.0. The answers still contain valuable information.
What's the idiomatic way to convert from (say) a usize to a u32?
For example, casting using 4294967295us as u32 works and the Rust 0.12 reference docs on type casting say
A numeric value can be cast to any numeric type. A raw pointer value can be cast to or from any integral type or raw pointer type. Any other cast is unsupported and will fail to compile.
but 4294967296us as u32 will silently overflow and give a result of 0.
I found ToPrimitive and FromPrimitive which provide nice functions like to_u32() -> Option<u32>, but they're marked as unstable:
#[unstable(feature = "core", reason = "trait is likely to be removed")]
What's the idiomatic (and safe) way to convert between numeric (and pointer) types?
The platform-dependent size of isize / usize is one reason why I'm asking this question - the original scenario was that I wanted to convert from u32 to usize so I could represent a tree in a Vec<u32> (e.g. let t = vec![0u32, 0u32, 1u32]; then to get the grand-parent of node 2 you would write t[t[2us] as usize]), and I wondered how it would fail if usize were less than 32 bits.
Converting values
From a type that fits completely within another
There's no problem here. Use the From trait to be explicit that there's no loss occurring:
fn example(v: i8) -> i32 {
    i32::from(v) // or v.into()
}
You could choose to use as, but it's recommended to avoid it when you don't need it (see below):
fn example(v: i8) -> i32 {
    v as i32
}
From a type that doesn't fit completely in another
There isn't a single method that makes general sense - you are asking how to fit two things in a space meant for one. One good initial attempt is to use an Option — Some when the value fits and None otherwise. You can then fail your program or substitute a default value, depending on your needs.
Since Rust 1.34, you can use TryFrom:
use std::convert::TryFrom;

fn example(v: i32) -> Option<i8> {
    i8::try_from(v).ok()
}
Before that, you'd have to write similar code yourself:
fn example(v: i32) -> Option<i8> {
    if v > std::i8::MAX as i32 {
        None
    } else {
        Some(v as i8)
    }
}
From a type that may or may not fit completely within another
The range of numbers isize / usize can represent changes based on the platform you are compiling for. You'll need to use TryFrom regardless of your current platform.
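For instance, a sketch of the usize case from the question (the width of usize depends on the target, so the check happens at run time):

use std::convert::TryFrom;

fn example(v: usize) -> Option<u32> {
    // Succeeds on any platform when the value fits in 32 bits,
    // returns None otherwise.
    u32::try_from(v).ok()
}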
See also:
How do I convert a usize to a u32 using TryFrom?
Why is type conversion from u64 to usize allowed using `as` but not `From`?
What as does
but 4294967296us as u32 will silently overflow and give a result of 0
When converting to a smaller type, as just takes the lower bits of the number, disregarding the upper bits, including the sign:
fn main() {
    let a: u16 = 0x1234;
    let b: u8 = a as u8;
    println!("0x{:04x}, 0x{:02x}", a, b); // 0x1234, 0x34

    let a: i16 = -257;
    let b: u8 = a as u8;
    println!("0x{:02x}, 0x{:02x}", a, b); // 0xfeff, 0xff
}
See also:
What is the difference between From::from and as in Rust?
About ToPrimitive / FromPrimitive
RFC 369, Num Reform, states:
Ideally [...] ToPrimitive [...] would all be removed in favor of a more principled way of working with C-like enums
In the meantime, these traits live on in the num crate:
ToPrimitive
FromPrimitive
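A small sketch of what using the num crate's ToPrimitive looks like (assuming the num-traits crate as a dependency):

use num_traits::ToPrimitive;

fn main() {
    let fits: u64 = 4_294_967_295;    // u32::MAX
    let too_big: u64 = 4_294_967_296; // one past u32::MAX

    println!("{:?}", fits.to_u32());    // Some(4294967295)
    println!("{:?}", too_big.to_u32()); // None
}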

Can a BigInteger be truncated to an i32 in Rust?

In Java, intValue() gives back a truncated portion of the BigInteger instance. I wrote a similar program in Rust but it appears not to truncate:
extern crate num;

use num::bigint::{BigInt, RandBigInt};
use num::ToPrimitive;

fn main() {
    println!("Hello, world!");
    truncate_num(
        BigInt::parse_bytes(b"423445324324324324234324", 10).unwrap(),
        BigInt::parse_bytes(b"22447", 10).unwrap(),
    );
}

fn truncate_num(num1: BigInt, num2: BigInt) -> i32 {
    println!("Truncation of {} is {:?}.", num1, num1.to_i32());
    println!("Truncation of {} is {:?}.", num2, num2.to_i32());
    return 0;
}
The output I get from this is
Hello, world!
Truncation of 423445324324324324234324 is None.
Truncation of 22447 is Some(22447).
How can I achieve this in Rust? Should I try a conversion to String and then truncate manually? This would be my last resort.
Java's intValue() returns the lowest 32 bits of the integer. This could be done by a bitwise-AND operation x & 0xffffffff. A BigInt in Rust doesn't support bitwise manipulation, but you could first convert it to a BigUint which supports such operations.
fn truncate_biguint_to_u32(a: &BigUint) -> u32 {
    use std::u32;
    let mask = BigUint::from(u32::MAX);
    (a & mask).to_u32().unwrap()
}
Converting BigInt to BigUint will be successful only when it is not negative. If the BigInt is negative (-x), we could find the lowest 32 bits of its absolute value (x), then negate the result.
fn truncate_bigint_to_u32(a: &BigInt) -> u32 {
    use num_traits::Signed;
    let was_negative = a.is_negative();
    let abs = a.abs().to_biguint().unwrap();
    let truncated = truncate_biguint_to_u32(&abs);
    if was_negative {
        truncated.wrapping_neg()
    } else {
        truncated
    }
}
You may use truncate_bigint_to_u32(a) as i32 if you need a signed number.
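Putting it together with the question's example (a sketch, assuming the num crates and the truncate_bigint_to_u32 helper above are in scope):

use num::bigint::BigInt;

fn main() {
    let num1 = BigInt::parse_bytes(b"423445324324324324234324", 10).unwrap();
    // Keep only the low 32 bits and reinterpret them as signed,
    // mirroring Java's BigInteger::intValue.
    let truncated = truncate_bigint_to_u32(&num1) as i32;
    println!("Truncation of {} is {}.", num1, truncated);
}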
There is also a to_signed_bytes_le() method with which you could extract the bytes and decode that into a primitive integer directly:
fn truncate_bigint_to_u32_slow(a: &BigInt) -> u32 {
    let mut bytes = a.to_signed_bytes_le();
    bytes.resize(4, 0);
    bytes[0] as u32 | (bytes[1] as u32) << 8 | (bytes[2] as u32) << 16 | (bytes[3] as u32) << 24
}
This method is extremely slow compared to the above methods and I don't recommend using it.
There's no natural truncation of a big integer into a smaller one. Either it fits or you have to decide what value you want.
You could do this:
println!("Truncation of {} is {:?}.", num1, num1.to_i32().unwrap_or(-1));
or
println!("Truncation of {} is {:?}.", num1, num1.to_i32().unwrap_or(std::i32::MAX));
but your application logic should probably dictate what's the desired behavior when the returned option contains no value.

What does “`str` does not have a constant size known at compile-time” mean, and what's the simplest way to fix it?

I'm trying to manipulate a string derived from a function parameter and then return the result of that manipulation:
fn main() {
    let a: [u8; 3] = [0, 1, 2];
    for i in a.iter() {
        println!("{}", choose("abc", *i));
    }
}

fn choose(s: &str, pad: u8) -> String {
    let c = match pad {
        0 => ["000000000000000", s].join("")[s.len()..],
        1 => [s, "000000000000000"].join("")[..16],
        _ => ["00", s, "0000000000000"].join("")[..16],
    };
    c.to_string()
}
On building, I get this error:
error[E0277]: the trait bound `str: std::marker::Sized` is not satisfied
--> src\main.rs:9:9
|
9 | let c = match pad {
| ^ `str` does not have a constant size known at compile-time
|
= help: the trait `std::marker::Sized` is not implemented for `str`
= note: all local variables must have a statically known size
What's wrong here, and what's the simplest way to fix it?
TL;DR Don't use str, use &str. The reference is important.
The issue can be simplified to this:
fn main() {
    let demo = "demo"[..];
}
You are attempting to slice a &str (but the same would happen for a String, &[T], Vec<T>, etc.), but have not taken a reference to the result. This means that the type of demo would be str. To fix it, add an &:
let demo = &"demo"[..];
In your broader example, you are also running into the fact that you are creating an allocated String inside of the match statement (via join) and then attempting to return a reference to it. This is disallowed because the String will be dropped at the end of the match, invalidating any references. In another language, this could lead to memory unsafety.
One potential fix is to store the created String for the duration of the function, preventing its deallocation until after the new string is created:
fn choose(s: &str, pad: u8) -> String {
    let tmp;

    match pad {
        0 => {
            tmp = ["000000000000000", s].join("");
            &tmp[s.len()..]
        }
        1 => {
            tmp = [s, "000000000000000"].join("");
            &tmp[..16]
        }
        _ => {
            tmp = ["00", s, "0000000000000"].join("");
            &tmp[..16]
        }
    }.to_string()
}
Editorially, there are probably more efficient ways of writing this function; the formatting machinery has options for padding strings (see the sketch below). You might even be able to just truncate the string returned from join without creating a new one.
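For instance, a sketch of choose built on format! padding, under the assumption that s never exceeds the field widths used in the original (format! pads but does not truncate):

fn choose(s: &str, pad: u8) -> String {
    match pad {
        // {:0>15}: right-align s in a 15-wide field, filling with '0' on the left.
        0 => format!("{:0>15}", s),
        // {:0<16}: left-align s in a 16-wide field, filling with '0' on the right.
        1 => format!("{:0<16}", s),
        _ => format!("00{:0<14}", s),
    }
}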
What it means is harder to explain succinctly. Rust has a number of types that are unsized. The most prevalent ones are str and [T]. Contrast these types to how you normally see them used: &str or &[T]. You might even see them as Box<str> or Arc<[T]>. The commonality is that they are always used behind a reference of some kind.
Because these types don't have a size, they cannot be stored in a variable on the stack — the compiler wouldn't know how much stack space to reserve for them! That's the essence of the error message.
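A tiny illustration of that point (a sketch; the commented-out line is the one the compiler rejects):

fn main() {
    let s: &str = "hello";            // fine: a &str is (pointer, length)
    let b: Box<str> = "hello".into(); // fine: the Box owns the unsized str
    // let bad: str = *s;             // error: `str` does not have a size known at compile-time
    println!("{} {}", s, b);
}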
See also:
What is the return type of the indexing operation?
Return local String as a slice (&str)
Why your first FizzBuzz implementation may not work

Resources