How do I iterate over elements of a struct? - rust

I'm writing a small client/server program for encrypted network communications and have the following struct to allow the endpoints to negotiate capabilities.
struct KeyExchangePacket {
    kexinit: u8,
    replay_cookie: [u8; 32],
    kex_algorithms: String,
    kgen_algorithms: String,
    encryption_algorithms: String,
    mac_algorithms: String,
    compression_algorithms: String,
    supported_languages: String,
}
I need to convert the fields into bytes in order to send them over a TcpStream, but I currently have to convert them one at a time.
send_buffer.extend_from_slice(kex_algorithms.as_bytes());
send_buffer.extend_from_slice(kgen_algorithms.as_bytes());
etc...
Is there a way to iterate over the fields and push their byte values into a buffer for sending?

Is there a way to iterate over the fields
No. You have to implement it yourself, or find a macro / compiler plugin that will do it for you.
See How to iterate or map over tuples? for a similar question.
Think about how iterators work. An iterator has to yield a single type for each iteration. What would that type be for your struct composed of at least 3 different types?
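If you go the manual route, a serializer written by hand might look like the following sketch (the to_bytes name is made up here, and there is no length-prefixing or framing; adapt it to whatever your protocol actually requires):
impl KeyExchangePacket {
    fn to_bytes(&self) -> Vec<u8> {
        let mut buf = Vec::new();
        // fixed-size fields first
        buf.push(self.kexinit);
        buf.extend_from_slice(&self.replay_cookie);
        // then the string fields, in declaration order
        for s in [
            &self.kex_algorithms,
            &self.kgen_algorithms,
            &self.encryption_algorithms,
            &self.mac_algorithms,
            &self.compression_algorithms,
            &self.supported_languages,
        ].iter() {
            buf.extend_from_slice(s.as_bytes());
        }
        buf
    }
}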

Bincode does this.
let packet = KeyExchangePacket { /* ... */ };
let size_limit = bincode::SizeLimit::Infinite;
let encoded: Vec<u8> = bincode::serde::serialize(&packet, size_limit).unwrap();
From the readme:
The encoding (and thus decoding) proceeds unsurprisingly -- primitive types are encoded according to the underlying Writer, tuples and structs are encoded by encoding their fields one-by-one, and enums are encoded by first writing out the tag representing the variant and then the contents.
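Note that for this to compile, KeyExchangePacket has to implement serde's Serialize, which is typically obtained via the derive (a sketch, assuming serde's derive support is available):
#[derive(Serialize)]
struct KeyExchangePacket {
    // ... fields as above ...
}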

Most efficient way to keep collection of string references

What is the most efficient way to keep a collection of references to strings in Rust?
Specifically, I have the following as the beginning of some code to parse command line arguments (option parsing to be added):
let args: Vec<String> = env::args().collect();
let mut files: Vec<&String> = Vec::new();
let mut i = 1;
while i < args.len() {
    let arg = &args[i];
    i += 1;
    if arg.as_bytes()[0] != b'-' {
        files.push(arg);
        continue;
    }
}
args is declared as Vec<String>, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory, and it would only be necessary to make a vector of references to the existing strings. But the compiler seems to concur that it needs to be Vec<String>.
It would seem inefficient to do the same for files; there is surely no need for further copying. Instead, I have declared it as Vec<&String>, which as I understand it, means only creating a vector of references to the existing strings, which is optimal. (Not that it makes a measurable performance difference for command line arguments, but I want to figure this out now, so I can get it right later when dealing with much larger data.)
Where I am slightly confused is that Rust seems to frequently recommend str over String, and indeed the compiler is happy to have files hold either &String or &str.
My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.
Is the above correct, or am I missing something?
args is declared as Vec<String>, as recommended in https://doc.rust-lang.org/book/ch12-01-accepting-command-line-arguments.html. As I understand it, that means new strings are constructed, which is mildly surprising; I would've expected that the command line arguments already exist in memory
The command-line arguments do exist in memory, but:
- they are not String; they are not even guaranteed to be UTF-8
- they are not in a Vec layout
Fundamentally there isn't even any prescription as to their storage; all you know is that they're C strings (NUL-terminated) and that you get an array of pointers to those, whose last element is a null pointer.
Which is why args is an iterator of String: it will lazily decode and validate each argument as you request it. In fact, you can check its source code:
pub fn args() -> Args {
    Args { inner: args_os() }
}

#[stable(feature = "env", since = "1.0.0")]
impl Iterator for Args {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        self.inner.next().map(|s| s.into_string().unwrap())
    }

    fn size_hint(&self) -> (usize, Option<usize>) {
        self.inner.size_hint()
    }
}
Now, I couldn't tell you why args_os yields OsString rather than OsStr; I would assume portability of some sort (e.g. some platforms might not guarantee that the args data lives for the entirety of the program).
My best guess right now is that str, being an object that refers to a slice of a string, is most efficient when you want to keep a reference to just part of the string, but when you know you want the whole string, it is better to skip the overhead of creating a slice object, and just keep &String.
Is the above correct, or am I missing something?
&String exists only for regularity (in the sense that it's a natural outgrowth of shared references and String existing concurrently), it's not actually useful: an &String only lets you access readonly / immutable methods of String, all of which are really provided by str aside from capacity() (which is rarely useful) and a handful of methods duplicated from str to String (I assume for efficiency) like len or is_empty.
&str is also generally more efficient than &String: while its size is 2 words (pointer, length) rather than one (pointer), it points directly to the relevant data rather than pointing to a pointer to the relevant data (and requiring a dereference to access the length property). As such, &String is rarely considered useful, and clippy will warn against it by default (likewise for &Vec, where &[T] is usually better for the same reason).
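Applied to the question's loop, a sketch of collecting &str instead of &String (the borrows still point into args, so nothing is copied; starts_with is used here instead of indexing the first byte, which also avoids a panic on an empty argument):
let args: Vec<String> = std::env::args().collect();
let mut files: Vec<&str> = Vec::new();
for arg in &args[1..] {
    if !arg.starts_with('-') {
        files.push(arg.as_str());
    }
}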

How to partially deserialise a JSON object?

I have a JSON object:
{"content":{"foo":1,"bar":2},"signature":"3f5ab1..."}
Deserialising this into a custom type already works fine, using:
let s: SignedContent = serde_json::from_str(string)?;
What I want is {"foo":1,"bar":2} as a &[u8] slice, so that I can check the signature.
(I am aware of the issues around canonical JSON representations, and have mitigations in place.)
Currently I am wastefully re-serialising the Content object (within the SignedContent object) into a string and getting the octets from that.
Is there a more efficient way?
Looks like a job for serde_json::value::RawValue (which is available with the "raw_value" feature).
Reference to a range of bytes encompassing a single valid JSON value in the input data.
A RawValue can be used to defer parsing parts of a payload until later, or to avoid parsing it at all in the case that part of the payload just needs to be transferred verbatim into a different output object.
When serializing, a value of this type will retain its original formatting and will not be minified or pretty-printed.
With usage being:
#[derive(Deserialize)]
struct SignedContent<'a> {
    #[serde(borrow)]
    content: &'a RawValue,
    // or without the 'a
    // content: Box<RawValue>
}
You can then use content.get() to get the raw &str.
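A usage sketch (input and verify_signature are hypothetical stand-ins for the question's own JSON string and signature check):
let signed: SignedContent = serde_json::from_str(input)?;
let content_bytes: &[u8] = signed.content.get().as_bytes();
verify_signature(content_bytes)?;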

How to achieve blake2AsHex functionality from Polkadot.js in Substrate?

I want to use a blake2AsHex kind of function in Rust. This function exists in JavaScript, but I am looking for a corresponding function in Rust. So far I am using the Substrate primitives, which are:
pub fn blake2_256(data: &[u8]) -> [u8; 32]
// Do a Blake2 256-bit hash and return result.
I am getting a different value.
When I execute this in console:
util_crypto.blake2AsHex("0x0000000000000000000000000000000000000000000000000000000000000001")
I get the desired value: 0x33e423980c9b37d048bd5fadbd4a2aeb95146922045405accc2f468d0ef96988. However, when I execute this rust code:
let res = hex::encode(&blake2_256("0x0000000000000000000000000000000000000000000000000000000000000001".as_bytes()));
println!("File Hash encoding: {:?}", res);
I get a different value:
47016246ca22488cf19f5e2e274124494d272c69150c3db5f091c9306b6223fc
Hence, how can I implement blake2AsHex in Rust?
Again you have an issue with data types here.
"0x0000000000000000000000000000000000000000000000000000000000000001".as_bytes()
is converting that whole string literal to its ASCII bytes, not to the bytes its hexadecimal digits represent.
You need to correctly create the byte array that you want to represent, and then it should work.
You are already using hex::encode for bytes to hex string... you should be using hex::decode for hex string to bytes:
https://docs.rs/hex/0.3.1/hex/fn.decode.html
Decodes a hex string into raw bytes.
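Putting it together, a sketch using the same blake2_256 primitive and hex crate from the question (the "0x" prefix has to be stripped first, since hex::decode does not accept it):
let input = "0x0000000000000000000000000000000000000000000000000000000000000001";
let bytes = hex::decode(input.trim_start_matches("0x")).expect("valid hex");
let hash = blake2_256(&bytes);
println!("0x{}", hex::encode(hash));
// should print the value blake2AsHex produced:
// 0x33e423980c9b37d048bd5fadbd4a2aeb95146922045405accc2f468d0ef96988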

Does Rust provide a way to parse integer numbers directly from ASCII data in byte (u8) arrays?

Rust has FromStr; however, as far as I can see, this only takes Unicode text input. Is there an equivalent to this for [u8] arrays?
By "parse" I mean take ASCII characters and return an integer, like C's atoi does.
Or do I need to either...
Convert the u8 array to a string first, then call FromStr.
Call out to libc's atoi.
Write an atoi in Rust.
In nearly all cases the first option is reasonable; however, there are cases where files may be very large, with no predefined encoding... or contain mixed binary and text, where it's most straightforward to read integer numbers as bytes.
No, the standard library has no such feature, but it doesn't need one.
As stated in the comments, the raw bytes can be converted to a &str via:
str::from_utf8
str::from_utf8_unchecked
Neither of these perform extra allocation. The first one ensures the bytes are valid UTF-8, the second does not. Everyone should use the checked form until such time as profiling proves that it's a bottleneck, then use the unchecked form once it's proven safe to do so.
If bytes deeper in the data need to be parsed, a slice of the raw bytes can be obtained before conversion:
use std::str;

fn main() {
    let raw_data = b"123132";
    let the_bytes = &raw_data[1..4];
    let the_string = str::from_utf8(the_bytes).expect("not UTF-8");
    let the_number: u64 = the_string.parse().expect("not a number");
    assert_eq!(the_number, 231);
}
As in other code, these lines can be extracted into a function or a trait to allow for reuse. However, once that path is followed, it would be a good idea to look into one of the many great crates aimed at parsing. This is especially true if there's a need to parse binary data in addition to textual data.
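For instance, a minimal helper along those lines might look like this (a sketch; the parse_ascii_u64 name is made up, and it only handles plain decimal digits):
use std::str;

fn parse_ascii_u64(bytes: &[u8]) -> Option<u64> {
    // validate as UTF-8, then reuse the standard parser
    str::from_utf8(bytes).ok()?.parse().ok()
}

fn main() {
    assert_eq!(parse_ascii_u64(b"231"), Some(231));
    assert_eq!(parse_ascii_u64(b"x1"), None);
}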
I do not know of any way in the standard library, but maybe the atoi crate works for you? Full disclosure: I am its author.
use atoi::atoi;
let (number, digits) = atoi::<u32>(b"42 is the answer"); //returns (42,2)
You can check if the second element of the tuple is a zero to see if the slice starts with a digit.
let (number, digits) = atoi::<u32>(b"x"); //returns (0,0)
let (number, digits) = atoi::<u32>(b"0"); //returns (0,1)

Convert a Vec<u16> or Vec<WCHAR> to a &str

I'm getting into Rust programming to write a small program and I'm a little bit lost in string conversions.
In my program, I have a vector as follows:
let mut name: Vec<winnt::WCHAR> = Vec::new();
WCHAR is the same as a u16 on my Windows machine.
I hand over the Vec<u16> to a C function (as a pointer) which fills it with data. I then need to convert the string contained in the vector into a &str. However, no matter what I try, I cannot manage to get this conversion working.
The only thing I managed to get working is to convert it to a WideString:
widestr = unsafe { WideCString::from_ptr_str(name.as_ptr()) };
But this seems to be a step in the wrong direction.
What is the best way to convert the Vec<u16> to an &str, under the assumption that the vector holds a valid and null-terminated string?
I then need to convert the string contained in the vector into a &str. However, no matter what I try, I cannot manage to get this conversion working.
There's no way of making this a "free" conversion.
A &str is a Unicode string encoded with UTF-8. This is a byte-oriented encoding. If you have UTF-16 (or the different but common UCS-2 encoding), there's no way to read one as the other. That's equivalent to trying to read a JPEG image as a PDF. Both chunks of data might be a string, but the encoding is important.
The first question is "do you really need to do that?". Many times, you can take data from one function and shovel it back into another function, never looking at it. If you can get away with that, that might be the best answer.
If you do need to transform it, then you have to deal with the errors that can occur. An arbitrary array of 16-bit integers may not be valid UTF-16 or UCS-2. These encodings have edge cases that can easily produce invalid strings. Null-termination is another aspect - Unicode actually allows for embedded NUL characters, so a null-terminated string can't hold all possible Unicode characters!
Once you've ensured that the encoding is valid 1 and figured out how many entries in the input vector comprise the string, then you have to decode the input format and re-encode to the output format. This is likely to require some kind of new allocation, so you are most likely to end up with a String, which can then be used most anywhere a &str can be used.
There is a built-in method to convert UTF-16 data to a String: String::from_utf16. Note that it returns a Result to allow for these error cases. There's also String::from_utf16_lossy, which replaces invalid encoded parts with the Unicode replacement character.
let name = [0x68, 0x65, 0x6c, 0x6c, 0x6f];
let a = String::from_utf16(&name);
let b = String::from_utf16_lossy(&name);
println!("{:?}", a);
println!("{:?}", b);
If you are starting from a pointer to a u16 or WCHAR, you will need to convert to a slice first by using slice::from_raw_parts. If you have a null-terminated string, you need to find the NUL yourself and slice the input appropriately.
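A sketch of that slicing step, assuming name is the question's Vec<u16> and has already been filled (and NUL-terminated) by the C call:
let len = name.iter().position(|&c| c == 0).unwrap_or(name.len());
let text = String::from_utf16(&name[..len]).expect("invalid UTF-16");
println!("{}", text);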
1: This is actually a great way of using types; a &str is guaranteed to be UTF-8 encoded, so no further check needs to be made. Similarly, the WideCString is likely to perform a check once upon construction and then can skip the check on later uses.
This is my simple hack for this case. There may well be bugs; fix it for your own case:
let mut v = vec![0u16; MAX_PATH as usize];

// imaginary win32 function
win32_function(v.as_mut_ptr());

let mut path = String::new();
for val in v.iter() {
    // keep only the low byte of each UTF-16 unit (only correct for ASCII data)
    let c: u8 = (*val & 0xFF) as u8;
    if c == 0 {
        break;
    } else {
        path.push(c as char);
    }
}
