I'm trying to print out rows where the string "name" has a fixed length of 20 (padded with trailing spaces).
I then want to generate a float (amount) with 10 whole digits and 8 decimals. The problem is I can't figure out how to format the amount with leading zeroes so they all have the same length; also, for some reason, all the decimals currently become zero.
The output I want:
John Doe D4356557654354645634564563.15343534
John Doe C5674543545645634565456345.34535767
John Doe C0000000000000000000000000.44786756
John Doe D0000000000000000000865421.12576545
What the output currently looks like:
John Doe 12345678912345C390571360.00000000
John Doe 12345678912345D5000080896.00000000
John Doe 12345678912345C4320145.50000000
John Doe 12345678912345C1073856384.00000000
Code
use rand::Rng;
use pad::PadStr;

struct Report {
    name: String,
    account_number: i64,
    letter: char,
    amount: f32,
}

fn main() {
    let mut n = 1;
    let mut rng = rand::thread_rng();
    while n < 101 {
        let acc = Report {
            name: String::from("John Doe").pad_to_width(20),
            account_number: 12345678912345,
            letter: rng.gen_range('C'..='D'),
            amount: rng.gen_range(100.1..9999999999.9),
        };
        println!("{}{}{}{:.8}\n", acc.name, acc.account_number, acc.letter, acc.amount);
        n += 1;
    }
}
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e229dbd212a94cd9cc0be507568c48d5
(for some reason "pad" is not working on the playground)
The reason you don't see any after-comma values is that f32 does not have enough precision to represent the value you would like to print. In fact, no commonly used base-2 float has the precision you need here; converting fractional numbers back and forth between base 10 and base 2 will always result in rounding errors.
For example, if you represent '9876543210.12345678' (a number your string can represent) in IEEE-754 32-bit floats (which f32 is), you get:
Desired value: 9876543210.12345678
Most accurate representation: 9876543488.00000000
Error due to conversion: 277.87654322
Binary Representation: 01010000 00010011 00101100 00000110
Hexadecimal Representation: 0x50132c06
Even f64 still makes errors:
Desired value: 9876543210.12345678
Value actually stored in double: 9876543210.1234569549560546875
Error due to conversion: 0.0000001749560546875
Binary Representation: 01000010 00000010 01100101 10000000 10110111 01010000 11111100 11010111
Hexadecimal Representation: 0x42026580B750FCD7
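You can reproduce this in Rust; any digits printed beyond the type's actual precision are meaningless:
fn main() {
    let x: f32 = 9876543210.12345678;
    let y: f64 = 9876543210.12345678;
    // f32 can't even represent the integer part exactly:
    println!("{:.8}", x); // 9876543488.00000000
    // f64 gets much closer, but is still not exact:
    println!("{:.8}", y); // 9876543210.12345695
}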
Don't use floats for this task. Use integers instead, for example two integers that are then printed with a dot in between.
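A minimal sketch of that two-integer idea (names and values are illustrative, taken from the desired output):
fn main() {
    let whole: u64 = 865421;  // whole part
    let frac: u32 = 12576545; // fractional part, exactly 8 digits
    // Zero-pad the whole part to 10 digits and the fraction to 8,
    // then print them with a dot in between.
    println!("{:010}.{:08}", whole, frac); // 0000865421.12576545
}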
If you really do need the actual value, use a decimal float instead, so it can actually represent the values you are trying to encode.
Another alternative would be to use fixed point integers. In other words, shift all values of your numbers by 8 decimals and represent them as an integer. Then, add the dot again when printing them. In your case you are lucky; if you store 9876543210.12345678 as the integer 987654321012345678, it still fits into 64 bits, so you can use a u64 to represent it. In fact, given that your example seems to be talking about an amount of money, this is most likely how it was intended.
By the way, don't use pad for padding. It's already built into Rust's format macro: format!("{:<20}", "John Doe")
Example:
use rand::Rng;

#[derive(Debug)]
struct MoneyAmount(u64);

impl MoneyAmount {
    fn as_value_string(&self) -> String {
        // Zero-pad to 18 digits (10 whole + 8 fractional), then insert
        // the decimal point after the 10th digit.
        let mut s = format!("{:018}", self.0);
        s.insert(10, '.');
        s
    }
}

struct Report {
    name: String,
    account_number: i64,
    letter: char,
    amount: MoneyAmount,
}

fn main() {
    let mut rng = rand::thread_rng();
    for _ in 0..10 {
        let acc = Report {
            name: "John Doe".to_string(),
            account_number: 12345678912345,
            letter: rng.gen_range('C'..='D'),
            // The amount is stored as a fixed-point integer,
            // shifted by 8 decimal places.
            amount: MoneyAmount(rng.gen_range(100100000000..999999999999999999)),
        };
        println!(
            "{:<20}{}{}{}",
            acc.name,
            acc.account_number,
            acc.letter,
            acc.amount.as_value_string()
        );
    }
}
John Doe 12345678912345D3628299098.68538932
John Doe 12345678912345C5874745565.85000457
John Doe 12345678912345C6337870441.26543580
John Doe 12345678912345D1215442576.70454002
John Doe 12345678912345C3402018622.70996714
John Doe 12345678912345C7999783867.43749281
John Doe 12345678912345D5797682336.45356635
John Doe 12345678912345D1707577080.35404025
John Doe 12345678912345D5813907399.04925935
John Doe 12345678912345C0611246390.19108372
Would anyone know how to convert a binary number into a string that represents its digits?
let s: u32 = 0b00100000001011001100001101110001110000110010110011100000;
I need to study different parts of this binary number by cutting it into pieces (e.g. the first 5 digits, then digits 6 to 15, etc...).
In order to do so, I'm thinking of using string slices, but first I need to convert the binary number into a string ("00100000010110011...").
Thank you!
Use binary format:
fn main() {
    let s: u64 = 0b00100000001011001100001101110001110000110010110011100000u64;
    let s_str: String = format!("{s:b}");
    println!("{s_str}");
}
Playground
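Note that {:b} drops leading zeros, so the slices won't line up with the original literal. If you need a fixed width, a zero-padding width specifier does that (56 here matches the digit count of the literal):
fn main() {
    let s: u64 = 0b00100000001011001100001101110001110000110010110011100000u64;
    // {:056b} zero-pads the binary digits to a width of 56.
    let s_str = format!("{s:056b}");
    println!("{s_str}");
    // The digits are all ASCII, so byte slicing is safe here.
    println!("first 5 digits: {}", &s_str[0..5]);
}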
How do I format 5.0 into 05.0%? Using {:02.2} does not seem to work; it does the same as {:.2}.
https://doc.rust-lang.org/std/fmt/#width
fn main() {
    println!("{:04.1}%", 5.0);
}
prints
05.0%
which is padded to a total length of 4 characters, including the decimal point. Your {:02.2} example pads to width 2, but 5.00 already has length 4, so nothing changes.
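To see both side by side:
fn main() {
    // Width 2 is already exceeded by "5.00" (4 characters), so no padding.
    println!("{:02.2}%", 5.0); // 5.00%
    // Width 4 leaves room for one leading zero in front of "5.0".
    println!("{:04.1}%", 5.0); // 05.0%
}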
In Java, I could do this.
int diff = 'Z' - 'A'; // 25
I have tried the same in Rust:
fn main() {
    'Z' - 'A';
}
but the compiler complains:
error[E0369]: binary operation `-` cannot be applied to type `char`
--> src/main.rs:2:5
|
2 | 'Z' - 'A';
| ^^^^^^^^^
|
= note: an implementation of `std::ops::Sub` might be missing for `char`
How can I do the equivalent operation in Rust?
The operation is meaningless in a Unicode world, and barely ever meaningful in an ASCII world, which is why Rust doesn't provide it directly. But there are two ways to do this, depending on your use case:
Cast the characters to their scalar value: 'Z' as u32 - 'A' as u32
Use byte character literals: b'Z' - b'A'
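Both compile and yield 25:
fn main() {
    // Cast to the Unicode scalar values; works for any two chars.
    let diff = 'Z' as u32 - 'A' as u32;
    assert_eq!(diff, 25);
    // Byte literals are u8, so plain subtraction works (ASCII only).
    let diff = b'Z' - b'A';
    assert_eq!(diff, 25);
}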
Math is not meaningless in Unicode; saying so misses the most amazing feature of UTF-8.
Any 7-bit char with a 0 high bit is valid US-ASCII, and a 7-bit US-ASCII document is valid UTF-8. You can treat UTF-8 as US-ASCII bytes provided all comparisons and math deal with values below 128.
This is by design of UTF-8; C code tends to just work, however Rust makes this complicated.
Given a string value: &str, grab the bytes with as_bytes():
fn main() {
    let value: &str = "Hello: World/"; // sample input; any &str works
    for byt in value.as_bytes() {
        let mut c = *byt; // c is u8 (an unsigned byte)
        // if we are dealing with us-ascii chars...
        if c >= b'A' && c <= b'Z' {
            // math works: this converts to us-ascii lowercase
            c += 32;
        }
        // to treat the u8 as a rust char, cast it
        let ch = c as char;
        // now you can write sane code like
        if ch == ':' || ch == ' ' || ch == '/' {
            println!("separator: {ch:?}");
        }
        // but you can't do math anymore
    }
}
This math is not meaningless: +32 is a handy lowercase function for A-Z, and this is valid treatment of UTF-8 chars.
It is not by accident that a + 1 == b in UTF-8. ASCII-betical ordering may not be the same as real-world alphabetical ordering, but it is still useful because it performs well over a common range of characters.
It is not meaningless that '3' + 1 == '4' in ASCII.
You will not break UTF-8 strings by working on their bytes; simple code like c == b'a' will work even if you have smiley poos in the string.
Doing math on Rust's char is impossible, which is a shame.
Doing math on one byte of a UTF-8 string is as valid as it's ever been.
I have a piece of text with characters of different bytelength.
let text = "Hello привет";
I need to take a slice of the string given start (included) and end (excluded) character indices. I tried this
let slice = &text[start..end];
and got the following error
thread 'main' panicked at 'byte index 7 is not a char boundary; it is inside 'п' (bytes 6..8) of `Hello привет`'
I suppose it happens since Cyrillic letters are multi-byte and the [..] notation takes chars using byte indices. What can I use if I want to slice using character indices, like I do in Python:
slice = text[start:end] ?
I know I can use the chars() iterator and manually walk through the desired substring, but is there a more concise way?
Possible solutions to codepoint slicing
I know I can use the chars() iterator and manually walk through the desired substring, but is there a more concise way?
If you know the exact byte indices, you can slice a string:
let text = "Hello привет";
println!("{}", &text[2..10]);
This prints "llo пр". So the problem is to find out the exact byte position. You can do that fairly easily with the char_indices() iterator (alternatively you could use chars() with char::len_utf8()):
let text = "Hello привет";
let end = text.char_indices().map(|(i, _)| i).nth(8).unwrap();
println!("{}", &text[2..end]);
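The chars() with char::len_utf8() variant mentioned above could look like this (a sketch computing the same byte offset):
fn main() {
    let text = "Hello привет";
    // Sum the UTF-8 byte lengths of the first 8 chars to get the byte offset.
    let end: usize = text.chars().take(8).map(char::len_utf8).sum();
    println!("{}", &text[2..end]); // llo пр
}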
As another alternative, you can first collect the string into Vec<char>. Then, indexing is simple, but to print it as a string, you have to collect it again or write your own function to do it.
let text = "Hello привет";
let text_vec = text.chars().collect::<Vec<_>>();
println!("{}", text_vec[2..8].iter().cloned().collect::<String>());
Why is this not easier?
As you can see, neither of these solutions is all that great. This is intentional, for two reasons:
As str is simply a UTF-8 buffer, indexing by Unicode codepoints is an O(n) operation. Usually, people expect the [] operator to be an O(1) operation. Rust makes this runtime complexity explicit and doesn't try to hide it. In both solutions above you can clearly see that it's not O(1).
But the more important reason:
Unicode codepoints are generally not a useful unit
What Python does (and what you think you want) is not all that useful. It all comes down to the complexity of language, and thus the complexity of Unicode. Python slices Unicode codepoints. This is what a Rust char represents. It's 32 bits big (a few fewer bits would suffice, but we round up to a power of 2).
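You can check both sizes directly: a char is always 4 bytes, while its UTF-8 encoding varies:
fn main() {
    // A char is a Unicode scalar value and always occupies 4 bytes.
    assert_eq!(std::mem::size_of::<char>(), 4);
    // Its UTF-8 encoding, in contrast, takes 1 to 4 bytes.
    assert_eq!('a'.len_utf8(), 1);
    assert_eq!('п'.len_utf8(), 2);
    assert_eq!('ハ'.len_utf8(), 3);
    assert_eq!('🙃'.len_utf8(), 4);
}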
But what you actually want to do is slice user-perceived characters. "Character", however, is a loosely defined term: different cultures and languages regard different things as "one character". The closest approximation is a "grapheme cluster". Such a cluster can consist of one or more Unicode codepoints. Consider this Python 3 code:
>>> s = "Jürgen"
>>> s[0:2]
'Ju'
Surprising, right? This is because the string above is:
0x004A LATIN CAPITAL LETTER J
0x0075 LATIN SMALL LETTER U
0x0308 COMBINING DIAERESIS
...
This is an example of a combining character that is rendered as part of the previous character. Python slicing does the "wrong" thing here.
Another example:
>>> s = "fire"
>>> s[0:2]
'fir'
Also not what you'd expect. This time, fi is actually the ligature fi, which is one codepoint.
There are far more examples where Unicode behaves in a surprising way. See the links at the bottom for more information and examples.
So if you want to work with international strings that should be able to work everywhere, don't do codepoint slicing! If you really need to semantically view the string as a series of characters, use grapheme clusters. To do that, the crate unicode-segmentation is very useful.
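As a sketch of what that looks like, here is the decomposed "Jürgen" from the Python example sliced by grapheme clusters with unicode-segmentation:
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    // "Ju" + COMBINING DIAERESIS + "rgen", as in the Python example above.
    let s = "Ju\u{0308}rgen";
    // Taking two grapheme clusters keeps the diaeresis with its base letter.
    let first_two: String = s.graphemes(true).take(2).collect();
    assert_eq!(first_two, "Ju\u{0308}"); // renders as "Jü"
}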
Further resources on this topic:
Blogpost "Let's stop ascribing meaning to unicode codepoints"
Blogpost "Breaking our Latin-1 assumptions
http://utf8everywhere.org/
A UTF-8 encoded string may contain characters which consist of multiple bytes. In your case, п starts at index 6 (inclusive) and ends at position 8 (exclusive), so index 7 is not the start of a character. This is why your error occurred.
You may use str::char_indices() for solving this (remember that getting to a position in a UTF-8 string is O(n)):
fn get_utf8_slice(string: &str, start: usize, end: usize) -> Option<&str> {
    assert!(end >= start);
    string.char_indices().nth(start).and_then(|(start_pos, _)| {
        // Find the byte offset of the char `end - start` positions further
        // on; chaining the tail's length covers ranges that reach the end
        // of the string.
        let tail = &string[start_pos..];
        tail.char_indices()
            .map(|(pos, _)| pos)
            .chain(Some(tail.len()))
            .nth(end - start)
            .map(|end_pos| &string[start_pos..start_pos + end_pos])
    })
}
playground
You may use str::chars() if you are fine with getting a String:
let string: String = text.chars().take(end).skip(start).collect();
Here is a function which retrieves a UTF-8 slice, with the following pros:
handles all edge cases (empty input, 0-width output ranges, out-of-scope ranges);
never panics;
uses start-inclusive, end-exclusive ranges.
pub fn utf8_slice(s: &str, start: usize, end: usize) -> Option<&str> {
    let mut iter = s.char_indices()
        .map(|(pos, _)| pos)
        .chain(Some(s.len()))
        .skip(start)
        .peekable();
    let start_pos = *iter.peek()?;
    for _ in start..end {
        iter.next();
    }
    Some(&s[start_pos..*iter.peek()?])
}
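For example (illustrative calls, using the string from the question):
fn main() {
    let text = "Hello привет";
    assert_eq!(utf8_slice(text, 6, 8), Some("пр"));
    assert_eq!(utf8_slice(text, 0, 0), Some("")); // 0-width range
    assert_eq!(utf8_slice(text, 6, 99), None);    // out-of-scope range
}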
Why do you have to walk over the string to find the nᵗʰ letter of a string when you do s[n], where s is a string? (According to https://doc.rust-lang.org/book/strings.html)
From what I understood, a string is an array of chars, and a char is an array of 4 bytes, or a number of 4 bytes. So wouldn't getting the nth letter be similar to doing v[4*n..4*n+4], where v is a vector?
What is the cost of v[i..j]?
I would assume that the cost of v[i..j] is j - i, and so the cost of s[n] should be 4.
Note: The second edition of The Rust Programming Language has an improved and smooth explanation to Strings in Rust, which you might wish to read as well. The answer below, although still accurate, quotes from the first edition of the book.
I will try to clarify these misconceptions about strings in Rust by quoting from the book (https://doc.rust-lang.org/book/strings.html).
A ‘string’ is a sequence of Unicode scalar values encoded as a stream of UTF-8 bytes. All strings are guaranteed to be a valid encoding of UTF-8 sequences.
With this in mind, plus the fact that UTF-8 encodes code points with a variable number of bytes (1 to 4, depending on the character), all strings in Rust, whether they are &str or String, are not arrays of characters and cannot be treated as such. It is further explained why in the section on Slicing:
Because strings are valid UTF-8, they do not support indexing:
let s = "hello";
println!("The first letter of s is {}", s[0]); // ERROR!!!
Usually, access to a vector with [] is very fast. But, because each character in a UTF-8 encoded string can be multiple bytes, you have to walk over the string to find the nᵗʰ letter of a string. This is a significantly more expensive operation, and we don’t want to be misleading.
Unlike what was suggested in the question, one cannot do s[n], because although in theory this would allow us to fetch the nth byte in constant time, that byte is not guaranteed to make any sense on its own.
What is the cost of v[i..j]?
The cost of slicing is actually constant, because it is done at byte-level:
You can get a slice of a string with slicing syntax:
let dog = "hachiko";
let hachi = &dog[0..5];
But note that these are byte offsets, not character offsets. So this will fail at runtime:
let dog = "忠犬ハチ公";
let hachi = &dog[0..2];
with this error:
thread '' panicked at 'index 0 and/or 2 in 忠犬ハチ公 do not lie on
character boundary'
Basically, slicing is acceptable and will yield a new view of that string, so no copies are made. However, it should only be used when you are completely sure that the offsets are right in terms of character boundaries.
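If you cannot be sure, str::is_char_boundary lets you check an offset before slicing; a small sketch:
fn main() {
    let dog = "忠犬ハチ公";
    // Each of these characters takes 3 bytes in UTF-8, so byte offset 3
    // is a character boundary while offset 2 is not.
    assert!(dog.is_char_boundary(3));
    assert!(!dog.is_char_boundary(2));
    if dog.is_char_boundary(3) {
        println!("{}", &dog[0..3]); // prints 忠
    }
}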
In order to iterate over each character of a string, you may instead call chars():
let c = s.chars().nth(n);
Even with that in mind, note that handling Unicode characters might not be exactly what you want if you wish to handle character modifiers in UTF-8 (which are scalar values by themselves but should not be treated individually either). Quoting now from the str API:
fn chars(&self) -> Chars
Returns an iterator over the chars of a string slice.
As a string slice consists of valid UTF-8, we can iterate through a string slice by char. This method returns such an iterator.
It's important to remember that char represents a Unicode Scalar Value, and may not match your idea of what a 'character' is. Iteration over grapheme clusters may be what you actually want.
Remember, chars may not match your human intuition about characters:
let y = "y̆";
let mut chars = y.chars();
assert_eq!(Some('y'), chars.next()); // not 'y̆'
assert_eq!(Some('\u{0306}'), chars.next());
assert_eq!(None, chars.next());
The unicode_segmentation crate provides a means to define grapheme cluster boundaries:
use unicode_segmentation::UnicodeSegmentation;

fn main() {
    let s = "a̐éö̲\r\n";
    let g = UnicodeSegmentation::graphemes(s, true).collect::<Vec<&str>>();
    let b: &[_] = &["a̐", "é", "ö̲", "\r\n"];
    assert_eq!(g, b);
}
If you do want to treat the string as an array of codepoints (which isn't strictly the same as characters; there are combining marks, emoji with separate skin-tone modifiers, etc.), you can collect it into a Vec:
fn main() {
    let s = "£10 🙃!";
    for (i, c) in s.char_indices() {
        println!("{} {}", i, c);
    }
    let v: Vec<char> = s.chars().collect();
    println!("v[5] = {}", v[5]);
}
Play link
With bonus demonstration of some varying character widths, this outputs:
0 £
2 1
3 0
4
5 🙃
9 !
v[5] = !