Parsing variable from delimited file - rust

I have a file named important.txt whose content is delimited by the pipe symbol |:
1|130|80|120|110|E
2|290|420|90|70|B
3|100|220|30|80|C
Then I use BufReader's split method to read its content:
use std::error::Error;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::*;
use std::path::Path;
fn main() {
let path = Path::new("important.txt");
let display = path.display();
//Open read-only
let file = match File::open(&path) {
Err(why) => panic!("can't open {}: {}", display,
Error::description(&why)),
Ok(file) => file,
};
//Read each line
let reader = BufReader::new(&file);
for vars in reader.split(b'|') {
println!("{:?}\n", vars.unwrap());
}
}
The problem is that vars.unwrap() returns bytes instead of a string:
[49]
[49, 51, 48]
[56, 48]
[49, 50, 48]
[49, 49, 48]
[69, 10, 50]
[50, 57, 48]
[52, 50, 48]
[57, 48]
[55, 48]
[66, 10, 51]
[49, 48, 48]
[50, 50, 48]
[51, 48]
[56, 48]
[67, 10]
Do you have any idea how to parse this delimited file into variables in Rust?

Since your data is line-based, you can use BufRead::lines:
use std::io::{BufReader, BufRead};
fn main() {
let input = r#"1|130|80|120|110|E
2|290|420|90|70|B
3|100|220|30|80|C
"#;
let reader = BufReader::new(input.as_bytes());
for line in reader.lines() {
for value in line.unwrap().split('|') {
println!("{}", value);
}
}
}
This gives you an iterator over Strings for each line in the input. You then use str::split to get the pieces.
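Alternatively, you can take the bytes you already have (a Vec<u8> per piece) and make a string from them with str::from_utf8. Note that splitting only on b'|' leaves the newline inside some pieces (for example "E\n2"), as the byte output in the question shows: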
use std::io::{BufReader, BufRead};
use std::str;
fn main() {
let input = r#"1|130|80|120|110|E
2|290|420|90|70|B
3|100|220|30|80|C
"#;
let reader = BufReader::new(input.as_bytes());
for vars in reader.split(b'|') {
let bytes = vars.unwrap();
let s = str::from_utf8(&bytes).unwrap();
println!("{}", s);
}
}
You may also want to look into the csv crate if you are reading structured data like a CSV that just happens to be pipe-delimited.
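If the goal is to end up with typed variables rather than printed strings, the lines-plus-split approach can be extended with str::parse. The following is only a sketch using the standard library: the Record struct, its field names, and the parse_line helper are hypothetical, since the question does not say what the columns mean.

```rust
use std::io::{BufRead, BufReader};

// Hypothetical record type: the field names are guesses, because the
// question does not describe the columns.
#[derive(Debug, PartialEq)]
struct Record {
    id: u32,
    values: [u32; 4],
    grade: char,
}

// Parse one line such as "1|130|80|120|110|E". Returns None if a field is
// missing, a number fails to parse, or there are more than six fields.
fn parse_line(line: &str) -> Option<Record> {
    let mut fields = line.trim_end().split('|');
    let id: u32 = fields.next()?.parse().ok()?;
    let mut values = [0u32; 4];
    for slot in values.iter_mut() {
        *slot = fields.next()?.parse().ok()?;
    }
    // Take the first character of the last field as the grade.
    let grade = fields.next()?.chars().next()?;
    if fields.next().is_some() {
        return None; // more fields than expected
    }
    Some(Record { id, values, grade })
}

fn main() {
    let input = "1|130|80|120|110|E\n2|290|420|90|70|B\n3|100|220|30|80|C\n";
    let reader = BufReader::new(input.as_bytes());
    for line in reader.lines() {
        let line = line.unwrap();
        match parse_line(&line) {
            Some(record) => println!("{:?}", record),
            None => eprintln!("skipping malformed line: {}", line),
        }
    }
}
```

Returning Option keeps the sketch short; a real program would probably want an error type that says which field was bad.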

Related

Deserialize JSON list of hex strings as bytes

Iʼm trying to read a JSON stream, part of which looks like
"data": [
"c1a8f800a4393e0cacd05a5bc60ae3e0",
"bbac4013c1ca3482155b584d35dac185",
"685f237d4fcbd191c981b94ef6986cde",
"a08898e81f1ddb6612aa12641b856aa9"
]
(there are more entries in the data list and each entry is longer, but this should be illustrative; both the length of the list and the length of each hex string are known at compile time)
Ideally Iʼd want a single [u8; 64] (the actual size is known at compile time), or failing that, a Vec<u8>, but I imagine itʼs easier to deserialize it as a Vec<[u8; 16]> and merge them later. However, Iʼm having trouble doing even that.
The hex crate has a way to deserialize a single hex string as a Vec or array of u8, but I canʼt figure out how to tell Serde to do that for each entry of the list. Is there a simple way to do that Iʼm overlooking, or do I need to write my own list deserializer?
Serde has the power to use serializers and deserializers from other crates in a nested fashion using #[serde(with = "...")]. Since hex has a serde feature, this can be easily done.
Here is a simple example using serde_json and hex.
Cargo.toml
serde = { version = "1.0.133", features = ["derive"] }
serde_json = "1.0.74"
hex = { version = "0.4", features = ["serde"] }
main.rs
use serde::{Deserialize, Serialize};
use serde_json::Result;
#[derive(Serialize, Deserialize, Debug)]
struct MyData {
data: Vec<MyHex>,
}
#[derive(Serialize, Deserialize, Debug)]
#[serde(transparent)]
struct MyHex {
#[serde(with = "hex::serde")]
hex: Vec<u8>,
}
fn main() -> Result<()> {
let data = r#"
{
"data": [
"c1a8f800a4393e0cacd05a5bc60ae3e0",
"bbac4013c1ca3482155b584d35dac185",
"685f237d4fcbd191c981b94ef6986cde",
"a08898e81f1ddb6612aa12641b856aa9"
]
}
"#;
let my_data: MyData = serde_json::from_str(data)?;
println!("{:?}", my_data); // MyData { data: [MyHex { hex: [193, 168, 248, 0, 164, 57, 62, 12, 172, 208, 90, 91, 198, 10, 227, 224] }, MyHex { hex: [187, 172, 64, 19, 193, 202, 52, 130, 21, 91, 88, 77, 53, 218, 193, 133] }, MyHex { hex: [104, 95, 35, 125, 79, 203, 209, 145, 201, 129, 185, 78, 246, 152, 108, 222] }, MyHex { hex: [160, 136, 152, 232, 31, 29, 219, 102, 18, 170, 18, 100, 27, 133, 106, 169] }] }
return Ok(());
}
Serde With Reference
Hex Serde Reference
In some performance-critical situations, it may be advantageous to implement your own deserializer and use it with serde(deserialize_with = …).
If you go that route, you have to:
Implement a deserialization function for data
Implement a visitor which takes a sequence of precisely 4 blocks
These blocks then need another deserialization function that turns a string into [u8; 16]
use serde::{Deserialize, Deserializer};
#[derive(Deserialize, Debug)]
pub struct Foo {
#[serde(deserialize_with = "deserialize_array_of_hex")]
pub data: [u8; 64],
}
fn deserialize_array_of_hex<'de, D: Deserializer<'de>>(d: D) -> Result<[u8; 64], D::Error> {
use serde::de;
use std::fmt;
#[derive(Deserialize)]
struct Block(#[serde(with="hex::serde")] [u8; 16]);
struct VecVisitor;
impl<'de> de::Visitor<'de> for VecVisitor {
type Value = [u8; 64];
fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
write!(formatter, "a list containing 4 hex strings")
}
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
where
A: de::SeqAccess<'de>,
{
let mut data = [0; 64];
for i in 0..4 {
let block = seq
.next_element::<Block>()?
.ok_or_else(|| de::Error::custom("too short"))?;
for j in 0..16 {
data[i * 16 + j] = block.0[j];
}
}
if seq.next_element::<String>()?.is_some() {
return Err(de::Error::custom("too long"))
}
Ok(data)
}
}
d.deserialize_seq(VecVisitor)
}
One could also implement DeserializeSeed for Block and pass only a reference to the [u8; 64] to be written into, but I suspect that copying 16 bytes is negligibly cheap. (Edit: I measured it; it turns out to be about 10% faster than the other two solutions in this post, when using hex::decode_to_slice in visit_str.)
Actually, never mind implementing your own deserializer for performance: the above solution is about equal in performance to the following:
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(from = "MyDataPre")]
pub struct MyData {
pub data: [u8; 64],
}
impl From<MyDataPre> for MyData {
fn from(p: MyDataPre) -> Self {
let mut data = [0; 64];
for b in 0..4 {
for j in 0..16 {
data[b * 16 + j] = p.data[b].0[j];
}
}
MyData { data }
}
}
#[derive(Deserialize, Debug)]
pub struct MyDataPre {
data: [MyHex; 4],
}
#[derive(Deserialize, Debug)]
struct MyHex (#[serde(with = "hex::serde")] [u8; 16]);
The trick here is the use of #[serde(from = …)], which allows you to deserialize to some other struct, and then tell serde how to convert that to the struct you originally wanted.
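The nested index loops that merge the four blocks can also be written with chunks_exact_mut and copy_from_slice. This is a small standard-library sketch of just that merging step, independent of serde; flatten_blocks is a hypothetical helper name and the block contents are made-up filler bytes.

```rust
// Hypothetical helper: copies four 16-byte blocks into one 64-byte array.
fn flatten_blocks(blocks: &[[u8; 16]; 4]) -> [u8; 64] {
    let mut data = [0u8; 64];
    // Each 16-byte chunk of the output receives one input block, in order.
    for (chunk, block) in data.chunks_exact_mut(16).zip(blocks.iter()) {
        chunk.copy_from_slice(block);
    }
    data
}

fn main() {
    // Filler data standing in for the decoded hex strings.
    let blocks = [[0xc1u8; 16], [0xbb; 16], [0x68; 16], [0xa0; 16]];
    let data = flatten_blocks(&blocks);
    assert_eq!(data[0], 0xc1);
    assert_eq!(data[16], 0xbb);
    assert_eq!(data[63], 0xa0);
    println!("{:x?}", &data[..]);
}
```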

How to stream a vector of bytes to BufWriter?

I'm trying to stream bytes to a tcp server by using io::copy(&mut reader, &mut writer), but it gives me this error: the trait "std::io::Read" is not implemented for "Vec<{integer}>". Here I have a vector of bytes which would be the same as me opening a file and converting it to bytes. I want to write the bytes to the BufWriter. What am I doing wrong?
use std::io;
use std::net::TcpStream;
use std::io::BufWriter;
pub fn connect() {
if let Ok(stream) = TcpStream::connect("localhost:8080") {
println!("Connection established!");
let mut reader = vec![
137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 0, 70, 0, 0, 0, 70,
];
let mut writer = BufWriter::new(&stream);
io::copy(&mut reader, &mut writer).expect("Failed to write to stream");
} else {
println!("Couldn't connect to the server")
}
}
error[E0277]: the trait bound `Vec<{integer}>: std::io::Read` is not satisfied
--> src/lib.rs:12:31
|
12 | io::copy(&mut reader, &mut writer).expect("Failed to write to stream");
| -------- ^^^^^^^^^^^ the trait `std::io::Read` is not implemented for `Vec<{integer}>`
| |
| required by a bound introduced by this call
|
note: required by a bound in `std::io::copy`
The compiler has a little trouble here: Vec doesn't implement Read, but &[u8] does. You just have to get a slice from the Vec before creating the mutable reference:
io::copy(&mut reader.as_slice(), &mut writer).expect("Failed to write to stream");
See also:
What is the difference between storing a Vec vs a Slice?
What are the differences between Rust's `String` and `str`?
Using .as_slice() like so works for me:
pub fn connect() {
if let Ok(stream) = TcpStream::connect("localhost:8080") {
println!("Connection established!");
let reader = vec![
137, 80, 78, 71, 13, 10, 26, 10, 0, 0, 0, 13, 73, 72, 68, 82, 0, 0, 0, 70, 0, 0, 0, 70,
];
let mut writer = BufWriter::new(&stream);
io::copy(&mut reader.as_slice(), &mut writer).expect("Failed to write to stream");
} else {
println!("Couldn't connect to the server")
}
}
That’s because std::io::Read is implemented for byte slices (&[u8]), not for Vec<u8> itself.
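To try this without a server, the same io::copy call can be pointed at an in-memory writer. The sketch below assumes a hypothetical send_bytes helper and uses a Vec<u8> in place of the TcpStream; it also shows std::io::Cursor, which is another standard way to get a Read implementation over a Vec.

```rust
use std::io::{self, BufWriter, Cursor, Write};

// Copies `bytes` into any writer through a BufWriter and returns the
// number of bytes copied. `send_bytes` is a hypothetical helper name.
fn send_bytes<W: Write>(bytes: &[u8], sink: W) -> io::Result<u64> {
    let mut writer = BufWriter::new(sink);
    // `&[u8]` implements `Read`; reading advances the slice, which is why
    // `io::copy` needs a mutable reference to the slice itself.
    let mut reader = bytes;
    let copied = io::copy(&mut reader, &mut writer)?;
    writer.flush()?;
    Ok(copied)
}

fn main() -> io::Result<()> {
    let data: Vec<u8> = vec![137, 80, 78, 71, 13, 10, 26, 10];

    // A Vec<u8> stands in for the TcpStream so the example runs offline.
    let mut sink = Vec::new();
    let copied = send_bytes(&data, &mut sink)?;
    assert_eq!(copied, 8);
    assert_eq!(sink, data);

    // Cursor owns the Vec and implements Read as well.
    let mut cursor = Cursor::new(data.clone());
    let mut sink2 = Vec::new();
    io::copy(&mut cursor, &mut sink2)?;
    assert_eq!(sink2, data);
    Ok(())
}
```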

How to pad an array with zeros?

fn main() {
let arr: [u8;8] = [97, 112, 112, 108, 101];
println!("Len is {}",arr.len());
println!("Elements are {:?}",arr);
}
error[E0308]: mismatched types
--> src/main.rs:2:23
|
2 | let arr: [u8;8] = [97, 112, 112, 108, 101];
| ------ ^^^^^^^^^^^^^^^^^^^^^^^^ expected an array with a fixed size of 8 elements, found one with 5 elements
| |
| expected due to this
Is there any way to pad the remaining elements with 0's? Something like:
let arr: [u8;8] = [97, 112, 112, 108, 101].something();
In addition to the other answers, you can use const generics to write a dedicated method.
fn pad_zeroes<const A: usize, const B: usize>(arr: [u8; A]) -> [u8; B] {
assert!(B >= A); //just for a nicer error message, adding #[track_caller] to the function may also be desirable
let mut b = [0; B];
b[..A].copy_from_slice(&arr);
b
}
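A possible way to call it, as a sketch: the function is repeated here so the example is self-contained. The input length A is inferred from the argument and the output length B from the type annotation on the binding.

```rust
fn pad_zeroes<const A: usize, const B: usize>(arr: [u8; A]) -> [u8; B] {
    assert!(B >= A); // the output must be at least as long as the input
    let mut b = [0; B];
    b[..A].copy_from_slice(&arr);
    b
}

fn main() {
    // A = 5 comes from the argument, B = 8 from the annotation.
    let arr: [u8; 8] = pad_zeroes([97, 112, 112, 108, 101]);
    println!("Len is {}", arr.len());
    println!("Elements are {:?}", arr);
    // prints:
    // Len is 8
    // Elements are [97, 112, 112, 108, 101, 0, 0, 0]
}
```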
You can use concat_arrays macro for it:
use concat_arrays::concat_arrays;
fn main() {
let arr: [u8; 8] = concat_arrays!([97, 112, 112, 108, 101], [0; 3]);
println!("{:?}", arr);
}
I don't think it's possible to do without external dependencies.
You could start with zeros and set the initial values afterwards. This requires arr to be mut though.
fn main() {
let mut arr: [u8;8] = [0;8];
let init = [97, 112, 112, 108, 101];
arr[..init.len()].copy_from_slice(&init);
println!("Len is {}",arr.len());
println!("Elements are {:?}",arr);
}

How can I parse an itm.txt file?

I am new to embedded and Rust programming. I am working with a sensor and I am able to get its readings.
The problem:
The data is in the itm.txt file, which contains headers and variable-length payloads. The raw output I get when reading the sensor is shown below.
Can someone please help me parse this data? I have not found a way to do it in Rust.
This is the code I am using to get the file content:
use std::io;
use std::fs::File;
use std::io::{Read, BufReader};
fn decode<R: io::Read>(mut r: R) -> io::Result<Vec<u8>> {
let mut output = vec![];
loop {
let mut len = 0u8;
r.read_exact(std::slice::from_mut(&mut len))?;
let len = 1 << (len - 1);
let mut buf = vec![0; len];
let res = r.read_exact(&mut buf);
if buf == b"\0" {
break;
}
output.extend(buf);
}
Ok(output)
}
fn main() {
//let data = "{ x: 579, y: -197 , z: -485 }\0";
let mut file = File::open("/tmp/itm.txt").expect("No such file is present at this location");
let mut buf_reader = Vec::new();
file.read_to_end(&mut buf_reader).expect("error");
let content = std::str::from_utf8(&buf_reader).unwrap();
println!("raw str: {:?}", content);
println!("raw hex: {:x?}", content.as_bytes());
let decoded = decode(content.as_bytes()).unwrap();
let s = std::str::from_utf8(&decoded).expect("Failed");
println!("decoded str: {:?}", s);
}
This is the error I am getting now:
raw str: "\u{2}Un\u{3}scal\u{3}edMe\u{3}asur\u{3}emen\u{1}t\u{1} \u{2}{ \u{1}x\u{2}: \u{2}31\u{1}3\u{1},\u{1} \n\n"
raw hex: [2, 55, 6e, 3, 73, 63, 61, 6c, 3, 65, 64, 4d, 65, 3, 61, 73, 75, 72, 3, 65, 6d, 65, 6e, 1, 74, 1, 20, 2, 7b, 20, 1, 78, 2, 3a, 20, 2, 33, 31, 1, 33, 1, 2c, 1, 20, a, a]
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }', src/main.rs:51:46
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
This is the link to itm thing.
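Judging only from the hex dump above, each payload seems to be preceded by a header byte of 1, 2 or 3 announcing 1, 2 or 4 payload bytes, and the file ends with two newline bytes (0x0a) rather than a "\0". In the posted decode function, 0x0a is treated as a header, so it asks for 1 << 9 = 512 bytes and read_exact fails with UnexpectedEof. The following is a sketch under that assumed framing, not an implementation of the ITM specification: it stops at the end of input or at the first byte that is not one of the three headers.

```rust
use std::io::{self, Read};

// Decodes frames of the form [header][payload], where header bytes 1, 2
// and 3 announce 1, 2 and 4 payload bytes. This framing is inferred from
// the hex dump in the question, not taken from the ITM specification.
fn decode<R: Read>(mut r: R) -> io::Result<Vec<u8>> {
    let mut output = Vec::new();
    let mut header = [0u8; 1];
    loop {
        // A clean end of input ends the loop instead of raising an error.
        if r.read(&mut header)? == 0 {
            break;
        }
        let len = match header[0] {
            1 => 1,
            2 => 2,
            3 => 4,
            // Anything else (such as the trailing "\n\n") is not a header.
            _ => break,
        };
        let mut buf = [0u8; 4];
        r.read_exact(&mut buf[..len])?;
        output.extend_from_slice(&buf[..len]);
    }
    Ok(output)
}

fn main() -> io::Result<()> {
    // The bytes from the "raw hex" dump in the question.
    let raw: &[u8] = &[
        0x02, 0x55, 0x6e, 0x03, 0x73, 0x63, 0x61, 0x6c, 0x03, 0x65, 0x64, 0x4d,
        0x65, 0x03, 0x61, 0x73, 0x75, 0x72, 0x03, 0x65, 0x6d, 0x65, 0x6e, 0x01,
        0x74, 0x01, 0x20, 0x02, 0x7b, 0x20, 0x01, 0x78, 0x02, 0x3a, 0x20, 0x02,
        0x33, 0x31, 0x01, 0x33, 0x01, 0x2c, 0x01, 0x20, 0x0a, 0x0a,
    ];
    let decoded = decode(raw)?;
    let s = std::str::from_utf8(&decoded).expect("payload is not UTF-8");
    println!("decoded str: {:?}", s);
    Ok(())
}
```

Since the bytes are passed to decode directly, the intermediate std::str::from_utf8 on the raw file content is not needed; the raw stream is binary framing, and only the decoded payload is expected to be text.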

Why do I get incorrect values when implementing HMAC-SHA256?

I'm trying to make a function in Rust that will return a HMAC-SHA256 digest. I've been working from the description at Wikipedia and RFC 2104.
I've been struggling with returning the correct HMAC. I'm using ring for the SHA256 digests but no matter what I try, I can't seem to get the right result. I suspect it might have something to do with .as_ref().to_vec() conversions. Even if that's true, I don't know how to continue from that. Not everything from RFC 2104 is implemented in the following code, but it highlights my issue.
extern crate ring;
use ring::{digest, test};
pub fn hmac(k: Vec<u8>, mut m: Vec<u8>) -> Vec<u8> {
// Initialize ipad and opad as byte vectors with SHA256 blocksize
let ipad = vec![0x5C; 64];
let opad = vec![0x36; 64];
// iround and oround are used to separate the two steps with XORing
let mut iround = vec![];
let mut oround = vec![];
for count in 0..k.len() {
iround.push(k[count] ^ ipad[count]);
oround.push(k[count] ^ opad[count]);
}
iround.append(&mut m); // m is emptied here
iround = (digest::digest(&digest::SHA256, &iround).as_ref()).to_vec();
oround.append(&mut iround); // iround is emptied here
oround = (digest::digest(&digest::SHA256, &oround).as_ref()).to_vec();
let hashed_mac = oround.to_vec();
return hashed_mac;
}
#[test]
fn test_hmac_digest() {
let k = vec![0x61; 64];
let m = vec![0x62; 64];
let actual = hmac(k, m);
// Expected value taken from: https://www.freeformatter.com/hmac-generator.html#ad-output
let expected = test::from_hex("f6cbb37b326d36f2f27d294ac3bb46a6aac29c1c9936b985576041bfb338ae70").unwrap();
assert_eq!(actual, expected);
}
These are the digests:
Actual = [139, 141, 144, 52, 11, 3, 48, 112, 117, 7, 56, 151, 163, 65, 152, 195, 163, 164, 26, 250, 178, 100, 187, 230, 89, 61, 191, 164, 146, 228, 180, 62]
Expected = [246, 203, 179, 123, 50, 109, 54, 242, 242, 125, 41, 74, 195, 187, 70, 166, 170, 194, 156, 28, 153, 54, 185, 133, 87, 96, 65, 191, 179, 56, 174, 112]
As mentioned in a comment, you have swapped the bytes for the inner and outer padding. Refer back to the Wikipedia page:
o_key_pad = key xor [0x5c * blockSize] //Outer padded key
i_key_pad = key xor [0x36 * blockSize] //Inner padded key
Here's what my take on the function would look like. I believe it performs fewer allocations:
extern crate ring;
use ring::{digest, test};
const BLOCK_SIZE: usize = 64;
pub fn hmac(k: &[u8], m: &[u8]) -> Vec<u8> {
assert_eq!(k.len(), BLOCK_SIZE);
let mut i_key_pad: Vec<_> = k.iter().map(|&k| k ^ 0x36).collect();
let mut o_key_pad: Vec<_> = k.iter().map(|&k| k ^ 0x5C).collect();
i_key_pad.extend_from_slice(m);
let hash = |v| digest::digest(&digest::SHA256, v);
let a = hash(&i_key_pad);
o_key_pad.extend_from_slice(a.as_ref());
hash(&o_key_pad).as_ref().to_vec()
}
#[test]
fn test_hmac_digest() {
let k = [0x61; BLOCK_SIZE];
let m = [0x62; BLOCK_SIZE];
let actual = hmac(&k, &m);
// Expected value taken from: https://www.freeformatter.com/hmac-generator.html#ad-output
let expected = test::from_hex("f6cbb37b326d36f2f27d294ac3bb46a6aac29c1c9936b985576041bfb338ae70").unwrap();
assert_eq!(actual, expected);
}
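The function above asserts that the key is exactly one block long. RFC 2104 also allows shorter keys, which are padded with zero bytes up to the block size before XORing. Here is a standard-library sketch of only the key-padding step; key_pads is a hypothetical helper, and keys longer than one block are rejected because RFC 2104 requires hashing them first, which needs SHA-256 and is left out.

```rust
const BLOCK_SIZE: usize = 64;

// Returns (i_key_pad, o_key_pad) for a key no longer than one block.
// Longer keys must be hashed first according to RFC 2104; that step is
// not covered by this sketch.
fn key_pads(key: &[u8]) -> ([u8; BLOCK_SIZE], [u8; BLOCK_SIZE]) {
    assert!(key.len() <= BLOCK_SIZE, "hash longer keys before padding");
    // Zero-pad the key on the right up to the block size.
    let mut padded = [0u8; BLOCK_SIZE];
    padded[..key.len()].copy_from_slice(key);

    let mut i_key_pad = [0u8; BLOCK_SIZE];
    let mut o_key_pad = [0u8; BLOCK_SIZE];
    for (i, &byte) in padded.iter().enumerate() {
        i_key_pad[i] = byte ^ 0x36; // inner pad
        o_key_pad[i] = byte ^ 0x5c; // outer pad
    }
    (i_key_pad, o_key_pad)
}

fn main() {
    // A 16-byte key: the first 16 bytes are key ^ pad, the rest 0 ^ pad.
    let (i_key_pad, o_key_pad) = key_pads(&[0x61; 16]);
    assert_eq!(i_key_pad[0], 0x57);
    assert_eq!(o_key_pad[0], 0x3d);
    assert_eq!(i_key_pad[63], 0x36);
    assert_eq!(o_key_pad[63], 0x5c);
}
```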

Resources