Why is the amount of bytes read different in each case? - rust

The snippets below try to count the number of bytes read from the following sample.txt:
sample.txt
one two three four five six
seven eight nine ten eleven twelve
thirteen fourteen fifteen sixteen
%
case 1:
let file = File::open(fname)?;
let mut reader = BufReader::new(&file);
let mut buffer: Vec<u8> = vec![];
let num_bytes = reader.read_until(b'%', &mut buffer);
//println!("{}", String::from_utf8(buffer).unwrap());
println!("read_bytes: {}", num_bytes.unwrap());
read_bytes: 101
case 2:
let file = File::open(fname)?;
let mut reader = BufReader::new(&file);
let mut num_bytes: u32 = 0;
for readline in reader.lines() {
    if let Ok(line) = readline {
        //println!("{}", line);
        let bytes = line.as_bytes();
        num_bytes += bytes.len() as u32;
        if bytes == b"%" {
            break;
        }
    }
}
println!("read_bytes: {}", num_bytes);
read_bytes: 98
I can't figure out why the two cases output different results. Any help is appreciated, thanks.

From the docs for BufRead::lines:
The iterator returned from this function will yield instances of io::Result<String>. Each string returned will not have a newline byte.
Your count is off by 3 because the data has three lines before the %, and each of them ends in a newline character. read_until counts those newlines; lines() strips them, so the second example never counts them.
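A minimal sketch of the difference, using a small in-memory buffer instead of sample.txt. The helper names count_with_read_until and count_with_lines are made up for illustration:

```rust
use std::io::{BufRead, Cursor};

// Counts bytes the way case 1 does: everything up to and including b'%'.
fn count_with_read_until(data: &[u8]) -> usize {
    let mut reader = Cursor::new(data);
    let mut buffer = Vec::new();
    reader.read_until(b'%', &mut buffer).unwrap()
}

// Counts bytes the way case 2 does: line by line, with the newline
// already stripped from each line by lines().
fn count_with_lines(data: &[u8]) -> usize {
    let reader = Cursor::new(data);
    let mut total = 0;
    for line in reader.lines() {
        let line = line.unwrap();
        total += line.len();
        if line == "%" {
            break;
        }
    }
    total
}

fn main() {
    // Two lines, each ending in '\n', followed by the delimiter.
    let data = b"ab\ncd\n%";
    println!("read_until: {}", count_with_read_until(data));
    println!("lines:      {}", count_with_lines(data));
}
```

The two counts differ by exactly the number of newline bytes that come before the %.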

Related

u32 to ASCII Bytes without String Rust

I want to convert a u32 into ASCII bytes.
input: 1u32
output: [49]
This was my attempt, but the result is empty for 0u32, and it uses Vec. I would prefer ArrayVec, but how do I know the size needed for the number? Is there a simple way to do this without any dynamic allocation?
let mut num = 1u32;
let base = 10u32;
let mut a: Vec<char> = Vec::new();
while num != 0 {
    let chars = char::from_digit(num % base, 10u32).unwrap();
    a.push(chars);
    num /= base;
}
let mut vec_of_u8s: Vec<u8> = a.iter().map(|c| *c as u8).collect();
vec_of_u8s.reverse();
println!("{:?}", vec_of_u8s);
Use the write! macro and an ArrayVec with its capacity set to 10 (the maximum number of decimal digits in a u32):
use std::io::Write;
use arrayvec::ArrayVec; // 0.7.2
fn main() {
    let input = 1u32;
    let mut buffer = ArrayVec::<u8, 10>::new();
    write!(buffer, "{}", input).unwrap();
    dbg!(buffer);
}
[src/main.rs:10] buffer = [
49,
]
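If pulling in a crate is not an option, the same idea can be sketched with only the standard library. This is not the ArrayVec approach from the answer but a swapped-in alternative: the hypothetical helper u32_to_ascii fills a caller-provided 10-byte stack buffer from the right and returns the used part as a slice. It also handles 0, which the loop in the question does not:

```rust
// Hypothetical helper: writes the decimal ASCII digits of `n` into the
// end of `buf` and returns the slice that holds them. No heap allocation.
fn u32_to_ascii(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut i = buf.len();
    // A loop that always runs once, so that 0 produces b"0".
    loop {
        i -= 1;
        buf[i] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[i..]
}

fn main() {
    let mut buf = [0u8; 10];
    println!("{:?}", u32_to_ascii(1, &mut buf));
    println!("{:?}", u32_to_ascii(0, &mut buf));
    println!("{:?}", u32_to_ascii(u32::MAX, &mut buf));
}
```

Ten bytes are enough because u32::MAX (4294967295) has ten decimal digits.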

Difference between double quotes and single quotes in Rust

I was doing Advent of Code 2020 day 3 in Rust to practice, because I am new to Rust, and my code would or would not compile depending on whether I used single quotes or double quotes for my "tree" variable.
The first code snippet does not compile and throws the error: expected u8, found &[u8; 1]
use std::fs;
fn main() {
    let text: String = fs::read_to_string("./data/text").unwrap();
    let vec: Vec<&str> = text.lines().collect();
    let vec_vertical_len = vec.len();
    let vec_horizontal_len = vec[0].len();
    let mut i_pointer: usize = 0;
    let mut j_pointer: usize = 0;
    let mut tree_counter: usize = 0;
    let tree = b"#";
    loop {
        i_pointer += 3;
        j_pointer += 1;
        if j_pointer >= vec_vertical_len {
            break;
        }
        let i_index = i_pointer % vec_horizontal_len;
        let character = vec[j_pointer].as_bytes()[i_index];
        if character == tree {
            tree_counter += 1
        }
    }
    println!("{}", tree_counter);
}
The second snippet compiles and gives the right answer:
use std::fs;
fn main() {
    let text: String = fs::read_to_string("./data/text").unwrap();
    let vec: Vec<&str> = text.lines().collect();
    let vec_vertical_len = vec.len();
    let vec_horizontal_len = vec[0].len();
    let mut i_pointer: usize = 0;
    let mut j_pointer: usize = 0;
    let mut tree_counter: usize = 0;
    let tree = b'#';
    loop {
        i_pointer += 3;
        j_pointer += 1;
        if j_pointer >= vec_vertical_len {
            break;
        }
        let i_index = i_pointer % vec_horizontal_len;
        let character = vec[j_pointer].as_bytes()[i_index];
        if character == tree {
            tree_counter += 1
        }
    }
    println!("{}", tree_counter);
}
I did not find any reference explaining what is going on when using single or double quotes. Can someone help me?
The short answer is that it works similarly to Java: single quotes for characters and double quotes for strings.
let a: char = 'k';
let b: &'static str = "k";
The b prefix, as in b'' or b"", means: take what I have here and interpret it as a byte literal instead.
let a: u8 = b'k';
let b: &'static [u8; 1] = b"k";
The reason string literals result in references is due to how they are stored in the compiled binary. It would be too inefficient to store a separate copy of a string constant inside each function, so string literals are placed in a read-only data section of the binary. When your program runs, the literal gives you a reference to those bytes (hence the static lifetime).
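Applied to the question's comparison, this is why b"#" fails and b'#' works: indexing as_bytes() yields a u8, which can be compared with another u8 but not with a reference to an array. A small sketch, where is_tree is a hypothetical helper and not part of the original code:

```rust
// Hypothetical helper: true if the byte at `index` in `line` is a '#'.
fn is_tree(line: &str, index: usize) -> bool {
    let character: u8 = line.as_bytes()[index];
    let tree_byte: u8 = b'#'; // single quotes: one u8
    let tree_bytes: &[u8; 1] = b"#"; // double quotes: reference to a byte array
    // u8 compares with u8; to use the array form, index into it first.
    character == tree_byte && character == tree_bytes[0]
}

fn main() {
    println!("{}", is_tree("..#.", 2));
    println!("{}", is_tree("..#.", 0));
}
```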
Going further down the rabbit hole, single quotes technically hold a codepoint. This is essentially what you might think of as a character. So a Unicode character would also be considered a single codepoint even though it may be multiple bytes long. A codepoint is assumed to fit into a u32 or less so you can safely convert any char by using as u32, but not the other way around since not all u32 values will match valid codepoints. This also means b'\u{x}' is not valid since \u{x} may produce characters that will not fit within a single byte.
// U+1F600 is a unicode smiley face
let a: char = '\u{1F600}';
assert_eq!(a as u32, 0x1F600);
However, you might find it interesting to know that since Rust strings are stored as UTF-8, codepoints over 127 will occupy multiple bytes in a string, even those from 128 to 255 whose values would fit into a single byte on their own. As you may already know, UTF-8 is simply a way of converting codepoints to bytes and back again.
let foo: &'static str = "\u{1F600}";
let foo_chars: Vec<char> = foo.chars().collect();
let foo_bytes: Vec<u8> = foo.bytes().collect();
assert_eq!(foo_chars.len(), 1);
assert_eq!(foo_bytes.len(), 4);
assert_eq!(foo_chars[0] as u32, 0x1F600);
assert_eq!(foo_bytes, vec![240, 159, 152, 128]);
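The same effect can be seen with a codepoint whose value is small enough to fit in one byte. In this sketch, U+00E9 (é) has the value 0xE9 but still takes two bytes in a string; utf8_len is a hypothetical wrapper around char::len_utf8:

```rust
// Hypothetical wrapper: number of bytes `c` occupies when encoded as UTF-8.
fn utf8_len(c: char) -> usize {
    c.len_utf8()
}

fn main() {
    let e_acute = '\u{E9}';
    // The codepoint value itself fits in a single byte...
    assert_eq!(e_acute as u32, 0xE9);
    // ...but its UTF-8 encoding is two bytes long.
    assert_eq!(utf8_len(e_acute), 2);
    assert_eq!("\u{E9}".as_bytes(), &[0xC3u8, 0xA9][..]);
    // ASCII characters (up to 127) are the only single-byte case.
    assert_eq!(utf8_len('k'), 1);
}
```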

Remove value if part of vector, and if so accumulate it to another variable

I currently do it this way:
// v is a vector with thousands of sorted unsigned int values.
let mut total = 0;
// [...]
// some loop
    let a = 5;
    if v.iter().any(|&x| x == a as u16) {
        total += a;
        v.retain(|&x| x != a as u16);
    }
// end loop
But it is quite inefficient, since I iterate over v twice (although perhaps the compiler would catch this and optimize it). Is there a more elegant way to do this in Rust?
NB: The vector is sorted and contains no duplicate values, if that helps.
If I understand your request correctly, here is a solution:
Your vector is sorted, so you can use binary_search() to find the value.
If it is found, you can use remove() with the returned index.
fn foo(data: &mut Vec<u16>) -> u64 {
    let mut total: u64 = 0;
    let mut a = 0;
    while !data.is_empty() {
        if let Ok(i) = data.binary_search(&a) {
            total += data.remove(i) as u64;
        }
        a += 1;
    }
    total
}
fn main() {
    let mut data = vec![1, 3, 8, 9, 46];
    assert_eq!(foo(&mut data), 67);
}
This keeps the vector sorted while removing; note that this is a dummy example. If you don't care about the order, you can use swap_remove(), but that rules out binary_search() afterwards.
It's hard to say which would be better.
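For the loop body in the question, the binary_search() plus remove() idea could be wrapped like this. A minimal sketch, assuming the vector stays sorted and has no duplicates; take_if_present is a made-up name:

```rust
// Hypothetical helper: removes `a` from the sorted vector `v` if present
// and returns it, so the caller can add it to a running total.
fn take_if_present(v: &mut Vec<u16>, a: u16) -> Option<u16> {
    match v.binary_search(&a) {
        Ok(i) => Some(v.remove(i)), // found: remove keeps the order intact
        Err(_) => None,             // not found: leave the vector alone
    }
}

fn main() {
    let mut v: Vec<u16> = vec![1, 3, 8, 9, 46];
    let mut total: u64 = 0;
    for a in [3u16, 5, 46] {
        if let Some(x) = take_if_present(&mut v, a) {
            total += x as u64;
        }
    }
    println!("total = {}, v = {:?}", total, v);
}
```

The search is O(log n); remove() still shifts the elements after the removed index, so each removal remains linear in the worst case.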

Converting large number stored in array of u32 to bytes and back

I'm doing some computational mathematics in Rust, and I have some large numbers which I store in an array of 24 values. I have functions that convert them to bytes and back, but it doesn't work fine for u32 values, whereas it works fine for u64. The code sample can be found below:
fn main() {
    let mut bytes = [0u8; 96]; // since u32 is 4 bytes in my system, 4*24 = 96
    let mut j;
    let mut k: u32;
    let mut num: [u32; 24] = [1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981, 2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433, 4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229, 4185060747, 1883946895, 4146];
    println!("original_num: {:?}", num);
    for i in 0..96 {
        j = i / 4;
        k = (i % 4) as u32;
        bytes[i as usize] = (num[j as usize] >> (4 * k)) as u8;
    }
    println!("num_to_bytes: {:?}", &bytes[..]);
    num = [0u32; 24];
    for i in 0..96 {
        j = i / 4;
        k = (i % 4) as u32;
        num[j as usize] |= (bytes[i as usize] as u32) << (4 * k);
    }
    println!("recovered_num: {:?}", num);
}
Rust playground
The above code does not retrieve the correct number from the byte array. But, if I change all u32 to u64, all 4s to 8s, and reduce the size of num from 24 values to 12, it works all fine. I assume I have some logical problem for the u32 version. The correctly working u64 version can be found in this Rust playground.
Learning how to create an MCVE (minimal, complete, verifiable example) is a crucial skill when programming. For example, why do you have an array at all? Why do you reuse variables?
Your original first number is 0x4F9B1BD6, the output first number is 0x000B1BD6.
Comparing the intermediate bytes shows that you have garbage:
const BYTES_PER_U32: usize = 4;
let num = 0x4F9B1BD6_u32;
println!("{:08X}", num);
let mut bytes = [0u8; BYTES_PER_U32];
for i in 0..bytes.len() {
    let k = (i % BYTES_PER_U32) as u32;
    bytes[i] = (num >> (4 * k)) as u8; // same shift as in the question
}
for b in &bytes {
    print!("{:02X}", b);
}
println!();
4F9B1BD6
D6BD1BB1
Printing out the values of k:
for i in 0..bytes.len() {
    let k = (i % BYTES_PER_U32) as u32;
    println!("{} / {}", k, 4 * k);
    bytes[i] = (num >> (4 * k)) as u8;
}
Shows that you are trying to shift by multiples of 4 bits:
0 / 0
1 / 4
2 / 8
3 / 12
I'm pretty sure that every common platform today uses 8 bits for a byte, not 4.
This is why magic numbers are bad. If you had used constants for the values, you would have noticed the problem much sooner.
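For illustration, here is a sketch of the manual shifting with named constants, where the shift is a multiple of 8 bits rather than 4. The names BITS_PER_BYTE, to_bytes and from_bytes are assumptions, and the byte order is least significant byte first, as in the question's loop:

```rust
const BYTES_PER_U32: usize = 4;
const BITS_PER_BYTE: u32 = 8;

// Splits a u32 into bytes, least significant byte first.
fn to_bytes(num: u32) -> [u8; BYTES_PER_U32] {
    let mut bytes = [0u8; BYTES_PER_U32];
    for (i, byte) in bytes.iter_mut().enumerate() {
        *byte = (num >> (BITS_PER_BYTE * i as u32)) as u8;
    }
    bytes
}

// Reassembles the u32 from bytes produced by to_bytes.
fn from_bytes(bytes: [u8; BYTES_PER_U32]) -> u32 {
    let mut num = 0u32;
    for (i, &byte) in bytes.iter().enumerate() {
        num |= (byte as u32) << (BITS_PER_BYTE * i as u32);
    }
    num
}

fn main() {
    let num = 0x4F9B1BD6_u32;
    let bytes = to_bytes(num);
    println!("{:02X?}", bytes);
    println!("{:08X}", from_bytes(bytes));
}
```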
since u32 is 4 bytes in my system
A u32 better be 4 bytes on every system — that's why it's a u32.
Overall, don't reinvent the wheel. Use the byteorder crate or equivalent:
extern crate byteorder;
use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
const LENGTH: usize = 24;
const BYTES_PER_U32: usize = 4;
fn main() {
    let num: [u32; LENGTH] = [
        1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981,
        2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433,
        4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229,
        4185060747, 1883946895, 4146,
    ];
    println!("original_num: {:?}", num);
    let mut bytes = [0u8; LENGTH * BYTES_PER_U32];
    {
        let mut bytes = &mut bytes[..];
        for &n in &num {
            bytes.write_u32::<BigEndian>(n).unwrap();
        }
    }
    let mut num = [0u32; LENGTH];
    {
        let mut bytes = &bytes[..];
        for n in &mut num {
            *n = bytes.read_u32::<BigEndian>().unwrap();
        }
    }
    println!("recovered_num: {:?}", num);
}
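An "equivalent" that needs no external crate is the standard library's u32::to_be_bytes and u32::from_be_bytes. The sketch below is an alternative to the byteorder version, not a reproduction of it; words_to_bytes and bytes_to_words are hypothetical helper names, and Vec is used only to keep the example short:

```rust
const BYTES_PER_U32: usize = 4;

// Serializes each u32 as 4 big-endian bytes.
fn words_to_bytes(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * BYTES_PER_U32);
    for w in words {
        out.extend_from_slice(&w.to_be_bytes());
    }
    out
}

// Rebuilds the u32 values; trailing bytes that do not fill a whole
// 4-byte chunk are ignored by chunks_exact.
fn bytes_to_words(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks_exact(BYTES_PER_U32)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() {
    let num = [1335565270u32, 4203813549, 4146];
    let bytes = words_to_bytes(&num);
    let recovered = bytes_to_words(&bytes);
    println!("original_num:  {:?}", num);
    println!("recovered_num: {:?}", recovered);
}
```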

Convert int to a vector of strings

I am trying to convert long numbers to a string vector. For example, 17562 would become ["1", "7", "5", "6", "2"]. I have seen a lot of examples of converting ints to strings, but no ints to string vectors. I want to iterate over each digit individually.
Here is what I have so far, but it isn't working.
fn main() {
    let x = 42;
    let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
    println!("{:?}", values);
}
This gives me the compiler error:
<anon>:3:29: 3:42 error: borrowed value does not live long enough
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:3:88: 6:2 note: reference must be valid for the block suffix following statement 1 at 3:87...
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:4 println!("{:?}", values);
<anon>:5
<anon>:6 }
<anon>:3:5: 3:88 note: ...but borrowed value is only valid for the statement at 3:4
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
<anon>:3:5: 3:88 help: consider using a `let` binding to increase its lifetime
<anon>:3 let values: Vec<&str> = x.to_string().split(|c: char| c.is_alphabetic()).collect();
The equivalent of what I am trying to do in Python would be x = 42; x = list(str(x)); print(x)
Ok, the first problem is that you don't store the result of x.to_string() anywhere. As such, it will cease to exist at the end of the expression, meaning that values will be trying to reference a value that no longer exists. Hence the error. The simplest solution is to just store the temporary string somewhere so that it continues to exist:
fn main() {
    let x = 42;
    let x_str = x.to_string();
    let values: Vec<&str> = x_str.split(|c: char| c.is_alphabetic()).collect();
    println!("{:?}", values);
}
Second problem: this outputs ["42"] because you told it to split on letters. You probably meant to use is_numeric:
fn main() {
    let x = 42;
    let x_str = x.to_string();
    let values: Vec<&str> = x_str.split(|c: char| c.is_numeric()).collect();
    println!("{:?}", values);
}
Third problem: this outputs ["", "", ""], because those are the three strings between numeric characters. Split's argument is the separator. Thus, the third problem is that you're using entirely the wrong method to begin with.
The closest direct equivalent to the Python code you listed would be:
fn main() {
    let x = 42;
    let values: Vec<String> = x.to_string().chars().map(|c| c.to_string()).collect();
    println!("{:?}", values);
}
At last, it outputs: ["4", "2"].
But, this is horribly inefficient: this takes the integer, allocates an intermediate buffer, prints the integer to it, turns it into a string. It takes each code point in that string, allocates an intermediate buffer, prints the code point to it, turns it into a string. Then it collects all these strings into a Vec, possibly reallocating more than once.
It works, but is a bit wasteful. If you don't care about waste, you can stop reading now.
You can make things a bit less wasteful by collecting code points instead of strings:
fn main() {
    let x = 42;
    let values: Vec<char> = x.to_string().chars().collect();
    println!("{:?}", values);
}
This outputs: ['4', '2']. Note the different quotes because we're using char instead of String.
We can remove the intermediate allocations from Vec resizing by pre-allocating its storage, which gives us this version:
fn main() {
    let x = 42u32; // no negatives!
    let values = {
        if x == 0 {
            vec!['0']
        } else {
            // pre-allocate Vec so there's no resizing
            let digits = 1 + (x as f64).log10() as u32;
            let mut cs = Vec::with_capacity(digits as usize);
            let mut div = 10u32.pow(digits - 1);
            while div > 0 {
                cs.push((b'0' + ((x / div) % 10) as u8) as char);
                div /= 10;
            }
            cs
        }
    };
    println!("{:?}", values);
}
Unless you're doing this in a loop, I'd just stick to the correct, wasteful version.
If you are looking for a performant version, I'd just use this:
fn digits(mut val: u64) -> Vec<u8> {
    // An unsigned 64-bit number can have 20 digits
    let mut result = Vec::with_capacity(20);
    loop {
        let digit = val % 10;
        val /= 10;
        result.push(digit as u8);
        if val == 0 { break }
    }
    result.reverse();
    result
}
fn main() {
    println!("{:?}", digits(0));
    println!("{:?}", digits(1));
    println!("{:?}", digits(9));
    println!("{:?}", digits(10));
    println!("{:?}", digits(11));
    println!("{:?}", digits(1234567890));
    println!("{:?}", digits(0xFFFFFFFFFFFFFFFF));
}
This may over allocate by a few bytes, but 20 bytes total is small unless you are doing this a whole bunch. It also leaves each value as a number, which you can convert to a string as needed.
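Converting those numeric digits to strings "as needed" could look like the following sketch. digit_strings is a hypothetical helper layered on top of the digits function above, which is repeated here so the example compiles on its own:

```rust
// Same digits function as in the answer above.
fn digits(mut val: u64) -> Vec<u8> {
    let mut result = Vec::with_capacity(20);
    loop {
        result.push((val % 10) as u8);
        val /= 10;
        if val == 0 {
            break;
        }
    }
    result.reverse();
    result
}

// Hypothetical helper: turns each numeric digit (0..=9) into a
// one-character String by offsetting from ASCII '0'.
fn digit_strings(val: u64) -> Vec<String> {
    digits(val)
        .into_iter()
        .map(|d| char::from(b'0' + d).to_string())
        .collect()
}

fn main() {
    println!("{:?}", digit_strings(17562));
}
```

This still allocates one String per digit, so it only makes sense when strings are really what the caller needs.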
What about:
let ss = value.to_string()
    .chars()
    .map(|c| c.to_string())
    .collect::<Vec<_>>();
Demo
Not the greatest perf but reads well.
