I'm writing a method that receives an instance of bytes::Bytes representing a Type/Length/Value data structure, where byte 0 is the type, the next 4 bytes are the length, and the remaining bytes are the value. I implemented a unit test that is behaving in a very unexpected way.
Given the method:
fn split_into_packets(packet: &Bytes) -> Vec<Bytes> {
let mut packets: Vec<Bytes> = Vec::new();
let mut begin: usize = 1;
while begin < packet.len() {
let slice = &packet[1..5];
print!("0: {} ", slice[0]);
print!("1: {} ", slice[1]);
print!("2: {} ", slice[2]);
println!("3: {}", slice[3]);
let size = u32::from_be_bytes(pop(slice));
println!("{}", size);
}
return packets;
}
And the test:
let mut bytes = BytesMut::with_capacity(330);
bytes.extend_from_slice(b"R\x52\x00\x00\x00\x08\x00");
let packets = split_into_packets(&bytes.freeze());
I see the following on my console:
0: 82 1: 0 2: 0 3: 0
I expected it to be:
0: 0 1: 0 2: 0 3: 82
What's going on? What am I missing?
fn split_into_packets(packet: &Bytes) -> Vec<Bytes> { // packet = "R\x52\x00\x00\x00\x08\x00"
let mut packets: Vec<Bytes> = Vec::new();
let mut begin: usize = 1;
while begin < packet.len() {
let slice = &packet[1..5]; // slice = "\x52\x00\x00\x00"
print!("0: {} ", slice[0]); // slice[0] is the first byte of "\x52\x00\x00\x00": \x52 = 0x52 = 82 (in decimal)
print!("1: {} ", slice[1]); // slice[1] is the second byte: \x00 = 0x0 = 0 (in decimal)
print!("2: {} ", slice[2]); // slice[2] is the third byte: \x00 = 0x0 = 0 (in decimal)
println!("3: {}", slice[3]); // slice[3] is the fourth byte: \x00 = 0x0 = 0 (in decimal)
let size = u32::from_be_bytes(pop(slice));
println!("{}", size);
}
return packets;
}
I hope the above explains why you get 82, 0, 0, 0 when printing the bytes one after another.
So, on to the next thing: how do we convert 4 bytes to a u32? There are two possibilities, which differ in the byte order they assume:
from_be_bytes: Converts bytes to u32 in big-endian: u32::from_be_bytes([0x12, 0x34, 0x56, 0x78])==0x12345678
from_le_bytes: Converts bytes to u32 in little-endian: u32::from_le_bytes([0x78, 0x56, 0x34, 0x12])==0x12345678
For endianness, you can consult, e.g., the respective Wikipedia page.
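To make the difference concrete, here is a minimal sketch that decodes the same four bytes in both byte orders and then reads one Type/Length/Value record. It uses plain `&[u8]` slices instead of `bytes::Bytes` so that it runs without external crates. The helper name `parse_tlv` is hypothetical, and the sketch assumes the length field is big-endian and counts only the value bytes.

```rust
/// Parses one Type/Length/Value record from the front of `input`.
/// Returns (type, value, rest), or None if the input is too short.
/// Assumption: the 4-byte length is big-endian and counts only the value.
fn parse_tlv(input: &[u8]) -> Option<(u8, &[u8], &[u8])> {
    let (&kind, rest) = input.split_first()?;
    if rest.len() < 4 {
        return None;
    }
    let (len_bytes, rest) = rest.split_at(4);
    let len = u32::from_be_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    if rest.len() < len {
        return None;
    }
    let (value, rest) = rest.split_at(len);
    Some((kind, value, rest))
}

fn main() {
    // The same four bytes give different integers depending on byte order.
    let raw = [0x52, 0x00, 0x00, 0x00];
    println!("big-endian:    {:#010x}", u32::from_be_bytes(raw)); // 0x52000000
    println!("little-endian: {:#010x}", u32::from_le_bytes(raw)); // 0x00000052

    // Type b'R', length 2 (big-endian), value [0xAA, 0xBB], one trailing byte.
    let packet = [b'R', 0x00, 0x00, 0x00, 0x02, 0xAA, 0xBB, 0xFF];
    println!("{:?}", parse_tlv(&packet));
}
```

Calling `parse_tlv` again on the returned `rest` slice is one way to walk through several records in a row.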
In Rust, turning a *const u8 into a *const T is OK, but dereferencing the cast pointer is unsafe because the memory pointed to may not satisfy T's requirements of size, alignment and valid byte pattern. I'm trying to come up with an example that violates the alignment requirement but satisfies the other two.
So I generate a random slice of 7 u8 and try to interpret different length-4 sub-slices as an f32 value. Any byte pattern is a valid f32, and 4 u8 are indeed size_of::<f32>(). So the only thing that varies is the alignment of the sub-slice pointer, which is shifted from the base slice:
slice: [ 0 | 1 | 2 | 3 | 4 | 5 | 6 ]
sub-slices: [ 0 1 2 3 ]
[ 1 2 3 4 ]
[ 2 3 4 5 ]
[ 3 4 5 6 ]
This is the code that I run
use std::mem::transmute;
use std::ptr::read;
use std::convert::TryInto;
//use rand::Rng;
fn to_f32(v: &[u8]) -> f32 {
let ptr = v.as_ptr() as *const f32;
unsafe {
// [1] dereference
*ptr
// [2] alternatively
//ptr.read()
}
}
fn main() {
println!("align_of::<f32>() = {}", std::mem::align_of::<f32>());
//let mut rng = rand::thread_rng();
// with a pointer on the stack
let v: [u8; 7] = [ 0x4A, 0x3A, 0x2a, 0x10, 0x0F, 0xD2, 0x37];
// with a pointer on the heap
//let v = Box::new(rng.gen::<[u8;7]>());
for i in 0..4 {
let ptr = &v[i..(i+4)];
let f = to_f32(ptr);
// max alignment of ptr
let alignment = 1 << (ptr.as_ptr() as usize).trailing_zeros();
// other ways to convert, as a control check
let repr = ptr.try_into().expect("");
let f2 = unsafe { transmute::<[u8; 4], f32>(repr) };
let f3 = f32::from_le_bytes(repr);
println!("{:x?} [{alignment}]: {repr:02x?} : {f} =? {f2} = {f3}", ptr.as_ptr());
assert_eq!(f, f2);
assert_eq!(f, f3);
}
}
The code outputs:
align_of::<f32>() = 4
0x7fffa431a5d1 [1]: [4a, 3a, 2a, 10] : 0.000000000000000000000000000033571493 =? 0.000000000000000000000000000033571493 = 0.000000000000000000000000000033571493
0x7fffa431a5d2 [2]: [3a, 2a, 10, 0f] : 0.000000000000000000000000000007107881 =? 0.000000000000000000000000000007107881 = 0.000000000000000000000000000007107881
0x7fffa431a5d3 [1]: [2a, 10, 0f, d2] : -153612880000 =? -153612880000 = -153612880000
0x7fffa431a5d4 [4]: [10, 0f, d2, 37] : 0.000025040965 =? 0.000025040965 = 0.000025040965
The question is: why does this code never fail an assertion, even though it [1] unsafely dereferences an unaligned pointer or [2] calls ptr::read(), which explicitly requires valid alignment?
Dereferencing an unaligned pointer is Undefined Behavior. Undefined Behavior is undefined: anything can happen, and that includes the expected result. This does not mean the code is correct. Specifically, x86 allows unaligned reads, which is likely why it does not fail.
Miri indeed reports an error in your code:
error: Undefined Behavior: accessing memory with alignment 1, but alignment 4 is required
--> src/main.rs:10:9
|
10 | *ptr
| ^^^^ accessing memory with alignment 1, but alignment 4 is required
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `to_f32` at src/main.rs:10:9: 10:13
note: inside `main`
--> src/main.rs:28:17
|
28 | let f = to_f32(ptr);
| ^^^^^^^^^^^
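If the goal is simply to read an f32 from an arbitrary byte offset, the standard library offers `read_unaligned` on raw pointers, which has no alignment requirement. The following is a minimal sketch; the function name `to_f32_unaligned` is made up for illustration, and the safe `f32::from_ne_bytes` conversion is used as a control.

```rust
/// Reads an f32 from the first four bytes of `v` without requiring alignment.
/// Panics if `v` has fewer than four bytes.
fn to_f32_unaligned(v: &[u8]) -> f32 {
    assert!(v.len() >= std::mem::size_of::<f32>());
    let ptr = v.as_ptr() as *const f32;
    // SAFETY: the assert guarantees four readable bytes, every bit pattern is
    // a valid f32, and read_unaligned places no alignment requirement on ptr.
    unsafe { ptr.read_unaligned() }
}

fn main() {
    let v: [u8; 7] = [0x4A, 0x3A, 0x2A, 0x10, 0x0F, 0xD2, 0x37];
    for i in 0..4 {
        let bytes = &v[i..i + 4];
        let f = to_f32_unaligned(bytes);
        // Safe control: native byte order, same as the pointer read.
        let g = f32::from_ne_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
        assert_eq!(f.to_bits(), g.to_bits());
        println!("{:02x?} -> {}", bytes, f);
    }
}
```

When no raw pointer is needed at all, the `from_ne_bytes`, `from_le_bytes` and `from_be_bytes` conversions avoid `unsafe` entirely.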
I want to convert a u32 into ASCII bytes.
input: 1u32
output: [49]
This was my attempt, but the result is empty for 0u32, and it uses Vec. I would prefer ArrayVec, but how do I know the size of the number? Is there a simple way to do this without any dynamic allocation?
let mut num = 1u32;
let base = 10u32;
let mut a: Vec<char> = Vec::new();
while num != 0 {
let chars = char::from_digit(num % base,10u32).unwrap();
a.push(chars);
num /= base;
}
let mut vec_of_u8s: Vec<u8> = a.iter().map(|c| *c as u8).collect();
vec_of_u8s.reverse();
println!("{:?}",vec_of_u8s)
Use the write! macro and ArrayVec with the capacity set to 10 (the maximum digits of a u32):
use std::io::Write;
use arrayvec::ArrayVec; // 0.7.2
fn main() {
let input = 1u32;
let mut buffer = ArrayVec::<u8, 10>::new();
write!(buffer, "{}", input).unwrap();
dbg!(buffer);
}
[src/main.rs:10] buffer = [
49,
]
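If pulling in a crate is not desired, a fixed-size stack buffer filled from the back is another option. This is only a sketch using the standard library; the function name `u32_to_ascii` is hypothetical. The buffer has 10 bytes because u32::MAX (4294967295) has ten decimal digits, and the loop always runs at least once so that 0 yields a single digit.

```rust
/// Writes the decimal ASCII digits of `n` into the tail of `buf`
/// and returns the part of the buffer that was used.
fn u32_to_ascii(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut pos = buf.len();
    // do-while shape: the body runs once even for n == 0
    loop {
        pos -= 1;
        buf[pos] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[pos..]
}

fn main() {
    let mut buf = [0u8; 10];
    println!("{:?}", u32_to_ascii(1, &mut buf)); // [49]
    println!("{:?}", u32_to_ascii(0, &mut buf)); // [48]
    println!("{:?}", u32_to_ascii(u32::MAX, &mut buf));
}
```

Filling from the back avoids the final `reverse()` step, since the least significant digit is produced first.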
In the case of Brazil, the thousands separator is '.' and the decimal separator is ','.
Is there any more efficient way using just the standard Rust library?
I'm currently using the following functions:
thousands_separator:
fn thousands_separator (value: f64, decimal: usize) -> String {
let abs_value = value.abs(); // absolute value
let round = format!("{:0.decimal$}", abs_value);
let integer: String = round[..(round.len() - decimal - 1)].to_string();
let fraction: String = round[(round.len() - decimal)..].to_string();
//println!("round: {}", round);
//println!("integer: {}", integer);
//println!("fraction: {}", fraction);
let size = 3;
let thousands_sep = '.';
let decimal_sep = "," ;
let integer_splitted = integer
.chars()
.enumerate()
.flat_map(|(i, c)| {
if (integer.len() - i) % size == 0 && i > 0 {
Some(thousands_sep)
} else {
None
}
.into_iter()
.chain(std::iter::once(c))
})
.collect::<String>();
if value.is_sign_negative() {
"-".to_string() + &integer_splitted + decimal_sep + &fraction
} else {
integer_splitted + decimal_sep + &fraction
}
}
thousands_separator_alternative:
fn thousands_separator_alternative(value: f64, decimal: usize) -> String {
let abs_value = value.abs(); // absolute value
let round = format!("{:0.decimal$}", abs_value);
let integer: String = round[..(round.len() - decimal - 1)].to_string();
let fraction: String = round[(round.len() - decimal)..].to_string();
println!("round: {}", round);
println!("integer: {}", integer);
println!("fraction: {}", fraction);
let size = 3;
let thousands_sep = '.';
let decimal_sep = "," ;
// Get chars from string.
let chars: Vec<char> = integer.chars().collect();
// Allocate new string.
let mut integer_splitted = String::new();
// Add characters and thousands_sep in sequence
let mut i = 0;
loop {
let j = integer.len() - i;
if j % size == 0 && i > 0 {
integer_splitted.push(thousands_sep);
}
integer_splitted.push(chars[i]);
//println!("i: {} ; j: {} ; integer_splitted: {}", i, j, integer_splitted);
if i == integer.len() - 1 {
break;
}
i += 1;
}
if value.is_sign_negative() {
"-".to_string() + &integer_splitted + decimal_sep + &fraction
} else {
integer_splitted + decimal_sep + &fraction
}
}
The end result is (https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=393859bc0628fcf61eb57dad57a0b945):
number: 67999.9999
A: formatted number: 68.000,00
round: 68000.00
integer: 68000
fraction: 00
B: formatted number: 68.000,00
number: 56345722178.365
A: formatted number: 56.345.722.178,36
round: 56345722178.36
integer: 56345722178
fraction: 36
B: formatted number: 56.345.722.178,36
number: -2987954368.996177
A: formatted number: -2.987.954.369,00
round: 2987954369.00
integer: 2987954369
fraction: 00
B: formatted number: -2.987.954.369,00
number: 0.999
A: formatted number: 1,00
round: 1.00
integer: 1
fraction: 00
B: formatted number: 1,00
number: -4321.99999
A: formatted number: -4.322,00
round: 4322.00
integer: 4322
fraction: 00
B: formatted number: -4.322,00
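One possible std-only variant, offered as a sketch rather than a benchmarked improvement: format once, split at the '.', and push characters into a single pre-sized String, so there is no intermediate Vec<char> and there are no extra to_string() copies. The function name `format_br` is hypothetical; unlike the two functions above, it also handles decimal == 0, where the formatted number contains no '.'.

```rust
/// Formats `value` with `decimal` fraction digits, '.' between groups of
/// three integer digits and ',' before the fraction.
fn format_br(value: f64, decimal: usize) -> String {
    let rounded = format!("{:.*}", decimal, value.abs());
    // With decimal == 0 there is no '.', so everything is the integer part.
    let (integer, fraction) = match rounded.split_once('.') {
        Some((i, f)) => (i, Some(f)),
        None => (rounded.as_str(), None),
    };
    let mut out = String::with_capacity(rounded.len() + integer.len() / 3 + 1);
    if value.is_sign_negative() {
        out.push('-');
    }
    for (i, c) in integer.chars().enumerate() {
        // Insert a separator whenever a multiple of three digits remains.
        if i > 0 && (integer.len() - i) % 3 == 0 {
            out.push('.');
        }
        out.push(c);
    }
    if let Some(f) = fraction {
        out.push(',');
        out.push_str(f);
    }
    out
}

fn main() {
    for &n in &[67999.9999, -2987954368.996177, 0.999, -4321.99999] {
        println!("{} -> {}", n, format_br(n, 2));
    }
}
```

Whether this is measurably faster would need to be checked with a benchmark on realistic inputs.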
The snippets below try to count the number of bytes read from the following sample.txt:
sample.txt
one two three four five six
seven eight nine ten eleven twelve
thirteen fourteen fifteen sixteen
%
case 1:
let file = File::open(fname)?;
let mut reader = BufReader::new(&file);
let mut buffer: Vec<u8> = vec![];
let num_bytes = reader.read_until(b'%', &mut buffer);
//println!("{}", String::from_utf8(buffer).unwrap());
println!("read_bytes: {}", num_bytes.unwrap());
read_bytes: 101
case 2:
let file = File::open(fname)?;
let mut reader = BufReader::new(&file);
let mut num_bytes: u32 = 0;
for readline in reader.lines() {
if let Ok(line) = readline {
//println!("{}", line);
let bytes = line.as_bytes();
num_bytes += bytes.len() as u32;
if bytes == b"%" {
break;
}
}
}
println!("read_bytes: {}", num_bytes)
read_bytes: 98
I can't seem to figure out why the two cases output different results. Any help is appreciated, thanks.
From the docs for BufRead::lines:
The iterator returned from this function will yield instances of io::Result<String>. Each string returned will not have a newline byte.
Your count is off by 3 because you have 3 lines in the data and newline characters are not being counted in the second example.
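The effect can be reproduced on in-memory data. The sketch below uses `std::io::Cursor` in place of a file, and the three function names are made up for illustration. `read_line` keeps the line terminator, so summing its return values matches `read_until`, while `lines()` drops one byte per newline.

```rust
use std::io::{BufRead, Cursor};

/// Counts bytes up to and including the first b'%' using read_until.
fn count_with_read_until(data: &[u8]) -> usize {
    let mut reader = Cursor::new(data);
    let mut buffer = Vec::new();
    reader.read_until(b'%', &mut buffer).expect("in-memory read")
}

/// Sums line lengths from lines(), which strips the newline from each line.
fn count_with_lines(data: &[u8]) -> usize {
    let mut total = 0;
    for line in Cursor::new(data).lines() {
        let line = line.expect("valid UTF-8");
        total += line.len();
        if line == "%" {
            break;
        }
    }
    total
}

/// Sums the byte counts returned by read_line, which keeps the newline.
fn count_with_read_line(data: &[u8]) -> usize {
    let mut reader = Cursor::new(data);
    let mut line = String::new();
    let mut total = 0;
    loop {
        line.clear();
        let n = reader.read_line(&mut line).expect("valid UTF-8");
        if n == 0 {
            break; // end of input
        }
        total += n;
        if line.trim_end() == "%" {
            break;
        }
    }
    total
}

fn main() {
    let data = b"one two\nthree\n%";
    println!("read_until: {}", count_with_read_until(data)); // 15
    println!("lines:      {}", count_with_lines(data)); // 13
    println!("read_line:  {}", count_with_read_line(data)); // 15
}
```

Note that if a newline follows the `%`, the `read_line` version counts it as well, whereas `read_until` stops right at the delimiter.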
I'm doing some computational mathematics in Rust, and I have some large numbers which I store in an array of 24 values. I have functions that convert them to bytes and back, but they don't work correctly for u32 values, whereas they work fine for u64. The code sample can be found below:
fn main() {
let mut bytes = [0u8; 96]; // since u32 is 4 bytes in my system, 4*24 = 96
let mut j;
let mut k: u32;
let mut num: [u32; 24] = [1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981, 2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433, 4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229, 4185060747, 1883946895, 4146];
println!("original_num: {:?}", num);
for i in 0..96 {
j = i / 4;
k = (i % 4) as u32;
bytes[i as usize] = (num[j as usize] >> (4 * k)) as u8;
}
println!("num_to_bytes: {:?}", &bytes[..]);
num = [0u32; 24];
for i in 0..96 {
j = i / 4;
k = (i % 4) as u32;
num[j as usize] |= (bytes[i as usize] as u32) << (4 * k);
}
println!("recovered_num: {:?}", num);
}
The above code does not retrieve the correct numbers from the byte array. But if I change all u32 to u64, all 4s to 8s, and reduce the size of num from 24 values to 12, it all works fine. I assume I have some logical problem in the u32 version. The correctly working u64 version can be found in this Rust playground.
Learning how to create an MCVE is a crucial skill when programming. For example, why do you have an array at all? Why do you reuse variables?
Your original first number is 0x4F9B1BD6, the output first number is 0x000B1BD6.
Comparing the intermediate bytes shows that you have garbage:
let num = 0x4F9B1BD6_u32;
println!("{:08X}", num);
let mut bytes = [0u8; BYTES_PER_U32];
for i in 0..bytes.len() {
let k = (i % BYTES_PER_U32) as u32;
bytes[i] = (num >> (4 * k)) as u8;
}
for b in &bytes {
print!("{:X}", b);
}
println!();
4F9B1BD6
D6BD1BB1
Printing out the values of k:
for i in 0..bytes.len() {
let k = (i % BYTES_PER_U32) as u32;
println!("{} / {}", k, 4 * k);
bytes[i] = (num >> (4 * k)) as u8;
}
Shows that you are trying to shift by multiples of 4 bits:
0 / 0
1 / 4
2 / 8
3 / 12
I'm pretty sure that every common platform today uses 8 bits for a byte, not 4.
This is why magic numbers are bad. If you had used constants for the values, you would have noticed the problem much sooner.
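As an illustration, here is a sketch of the manual shifting with named constants, where the shift is a multiple of 8 bits. The constant `BITS_PER_BYTE` and the two function names are assumptions introduced for this example; the byte order is least significant byte first, as in the original loop.

```rust
const BITS_PER_BYTE: u32 = 8;
const BYTES_PER_U32: usize = 4;

/// Splits `n` into bytes, least significant byte first.
fn u32_to_bytes(n: u32) -> [u8; BYTES_PER_U32] {
    let mut bytes = [0u8; BYTES_PER_U32];
    for (i, b) in bytes.iter_mut().enumerate() {
        // Shift by whole bytes (0, 8, 16, 24 bits), then keep the low byte.
        *b = (n >> (BITS_PER_BYTE * i as u32)) as u8;
    }
    bytes
}

/// Reassembles a u32 from bytes stored least significant byte first.
fn bytes_to_u32(bytes: [u8; BYTES_PER_U32]) -> u32 {
    let mut n = 0u32;
    for (i, &b) in bytes.iter().enumerate() {
        n |= (b as u32) << (BITS_PER_BYTE * i as u32);
    }
    n
}

fn main() {
    let num = 0x4F9B1BD6_u32;
    let bytes = u32_to_bytes(num);
    println!("{:02X?}", bytes); // [D6, 1B, 9B, 4F]
    println!("{:08X}", bytes_to_u32(bytes)); // 4F9B1BD6
}
```

With the constant in place, the shift amounts are 0, 8, 16 and 24 bits, so no two bytes overlap.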
since u32 is 4 bytes in my system
A u32 better be 4 bytes on every system; that's why it's a u32.
Overall, don't reinvent the wheel. Use the byteorder crate or equivalent:
extern crate byteorder;
use byteorder::{BigEndian, ReadBytesExt, WriteBytesExt};
const LENGTH: usize = 24;
const BYTES_PER_U32: usize = 4;
fn main() {
let num: [u32; LENGTH] = [
1335565270, 4203813549, 2020505583, 2839365494, 2315860270, 442833049, 1854500981,
2254414916, 4192631541, 2072826612, 1479410393, 718887683, 1421359821, 733943433,
4073545728, 4141847560, 1761299410, 3068851576, 1582484065, 1882676300, 1565750229,
4185060747, 1883946895, 4146,
];
println!("original_num: {:?}", num);
let mut bytes = [0u8; LENGTH * BYTES_PER_U32];
{
let mut bytes = &mut bytes[..];
for &n in &num {
bytes.write_u32::<BigEndian>(n).unwrap();
}
}
let mut num = [0u32; LENGTH];
{
let mut bytes = &bytes[..];
for n in &mut num {
*n = bytes.read_u32::<BigEndian>().unwrap();
}
}
println!("recovered_num: {:?}", num);
}
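For comparison, the same round trip can be sketched with only the standard library, using `u32::to_be_bytes` and `u32::from_be_bytes` together with `chunks_exact`. The function names and the shortened three-element array are assumptions chosen to keep the example small.

```rust
const LENGTH: usize = 3;
const BYTES_PER_U32: usize = 4;

/// Serializes each number as 4 big-endian bytes, one after another.
fn to_bytes(nums: &[u32; LENGTH]) -> [u8; LENGTH * BYTES_PER_U32] {
    let mut bytes = [0u8; LENGTH * BYTES_PER_U32];
    for (chunk, n) in bytes.chunks_exact_mut(BYTES_PER_U32).zip(nums) {
        chunk.copy_from_slice(&n.to_be_bytes());
    }
    bytes
}

/// Rebuilds the numbers from consecutive 4-byte big-endian chunks.
fn from_bytes(bytes: &[u8; LENGTH * BYTES_PER_U32]) -> [u32; LENGTH] {
    let mut nums = [0u32; LENGTH];
    for (n, chunk) in nums.iter_mut().zip(bytes.chunks_exact(BYTES_PER_U32)) {
        *n = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    nums
}

fn main() {
    let nums: [u32; LENGTH] = [1335565270, 4203813549, 4146];
    let bytes = to_bytes(&nums);
    println!("bytes: {:02X?}", bytes);
    println!("recovered: {:?}", from_bytes(&bytes));
}
```

The byteorder version has the advantage of working on any reader or writer, while this sketch is limited to fixed-size arrays.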