Rust struct field str slice with size

I can do the following:
#[repr(C, packed)]
#[derive(Clone, Copy, Debug)]
struct Test {
    field: [u8; 8],
}
But I want to format it as a string when I'm using the debug formatter, and I'm wondering if I could define the field as a str slice with a known size. I can't use String because I am loading the struct from memory, where it is not instantiated by my Rust program. I have read that string slices are represented using UTF-8 characters, but I need ASCII. Should I manually implement the Debug trait instead?

There's no automatic way of doing that, so you'll have to manually implement Debug. Be careful, however, that not every [u8; 8] is valid UTF-8 (unless you're actually guaranteed to get ASCII).
To be safe, you could switch whether to print field as a string or as an array of bytes based on whether it's legal UTF-8:
use std::fmt;

// Note: remove `Debug` from the struct's `#[derive]` list,
// since we are implementing it manually here.
impl fmt::Debug for Test {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        let utf8;
        let value: &dyn fmt::Debug = if let Ok(s) = std::str::from_utf8(&self.field) {
            utf8 = s;
            &utf8
        } else {
            &self.field
        };
        f.debug_struct("Test")
            .field("field", value)
            .finish()
    }
}
fn main() {
    let valid_ascii = Test {
        field: [b'a'; 8],
    };
    let invalid_ascii = Test {
        field: [0xFF; 8],
    };
    println!("{:?}", valid_ascii); // Test { field: "aaaaaaaa" }
    println!("{:?}", invalid_ascii); // Test { field: [255, 255, 255, 255, 255, 255, 255, 255] }
}
If you're guaranteed valid ASCII, you can of course just use std::str::from_utf8_unchecked (which is unsafe) and skip the check.
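For example (a minimal sketch; calling from_utf8_unchecked on non-UTF-8 data is undefined behavior, so this is only sound under that ASCII guarantee):

impl fmt::Debug for Test {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        // Safety: only valid if `field` is guaranteed to hold ASCII (or other valid UTF-8)
        let s = unsafe { std::str::from_utf8_unchecked(&self.field) };
        f.debug_struct("Test").field("field", &s).finish()
    }
}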

Related

Skip empty objects when deserializing array with serde

I need to deserialize a JSON array of a type, let's call it Foo. I have implemented this and it works well for most stuff, but I have noticed the latest version of the data will sometimes include erroneous empty objects.
Prior to this change, each Foo could be deserialized into the following enum:
#[derive(Deserialize, Debug)]
#[serde(untagged)]
pub enum Foo<'s> {
    Error {
        // My current workaround is using Option<Cow<'s, str>>
        error: Cow<'s, str>,
    },
    Value {
        a: u32,
        b: i32,
        // etc.
    },
}
/// Foo is part of a larger struct Bar.
#[derive(Deserialize, Debug)]
pub struct Bar<'s> {
    foos: Vec<Foo<'s>>,
    // etc.
}
This struct may represent one of the following JSON values:
// Valid inputs
[]
[{"a": 34, "b": -23},{"a": 33, "b": -2},{"a": 37, "b": 1}]
[{"error":"Unable to connect to network"}]
[{"a": 34, "b": -23},{"error":"Timeout"},{"a": 37, "b": 1}]
// Possible input for latest versions of data
[{},{},{},{},{},{},{"a": 34, "b": -23},{},{},{},{},{},{},{},{"error":"Timeout"},{},{},{},{},{},{}]
This does not happen very often, but it is enough to cause issues. Normally, the array should include 3 or fewer entries, but these extraneous empty objects break that convention. There is no meaningful information I can gain from parsing {}, and in the worst cases there can be hundreds of them in one array.
I do not want to error on parsing {}, as the array still contains other meaningful values, but I do not want to include {} in my parsed data either. Ideally, I would also be able to use tinyvec::ArrayVec<[Foo<'s>; 3]> instead of a Vec<Foo<'s>> to save memory and reduce time spent on allocation during parsing, but I am unable to due to this issue.
How can I skip {} JSON values when deserializing an array with serde in Rust?
I also put together a Rust Playground with some test cases to try different solutions.
serde_with::VecSkipError provides a way to skip any elements that fail to deserialize. Note that this ignores all errors, not only the empty object {}, so it might be too permissive.
#[serde_with::serde_as]
#[derive(Deserialize)]
pub struct Bar<'s> {
    #[serde_as(as = "serde_with::VecSkipError<_>")]
    foos: Vec<Foo<'s>>,
}
Playground
The simplest, but not performant, solution would be to define an enum that captures both the Foo case and the empty case, deserialize into a vector of those, and then filter that vector to get just the nonempty ones.
#[derive(Deserialize, Debug)]
#[serde(untagged)]
pub enum FooDe<'s> {
    Nonempty(Foo<'s>),
    Empty {},
}
fn main() {
    let json = r#"[
        {},{},{},{},{},{},
        {"a": 34, "b": -23},
        {},{},{},{},{},{},{},
        {"error":"Timeout"},
        {},{},{},{},{},{}
    ]"#;
    let foo_des = serde_json::from_str::<Vec<FooDe>>(json).unwrap();
    let foos = foo_des
        .into_iter()
        .filter_map(|item| {
            use FooDe::*;
            match item {
                Nonempty(foo) => Some(foo),
                Empty {} => None,
            }
        })
        .collect();
    let bar = Bar { foos };
    println!("{:?}", bar);
    // Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}
Conceptually this is simple but you're allocating a lot of space for Empty cases that you ultimately don't need. Instead, you can control exactly how deserialization is done by implementing it yourself.
use std::fmt;
use std::marker::PhantomData;
use serde::de::{Deserialize, Deserializer, SeqAccess, Visitor};

// Note: drop `Deserialize` from Bar's derive list for this approach,
// since we implement it manually below.
struct BarVisitor<'s> {
    marker: PhantomData<fn() -> Bar<'s>>,
}

impl<'s> BarVisitor<'s> {
    fn new() -> Self {
        BarVisitor {
            marker: PhantomData,
        }
    }
}

// This is the trait that informs Serde how to deserialize Bar.
impl<'de, 's: 'de> Deserialize<'de> for Bar<'s> {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        impl<'de, 's: 'de> Visitor<'de> for BarVisitor<'s> {
            // The type that our Visitor is going to produce.
            type Value = Bar<'s>;

            fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                formatter.write_str("a list of objects")
            }

            fn visit_seq<V>(self, mut access: V) -> Result<Self::Value, V::Error>
            where
                V: SeqAccess<'de>,
            {
                let mut foos = Vec::new();
                while let Some(foo_de) = access.next_element::<FooDe>()? {
                    if let FooDe::Nonempty(foo) = foo_de {
                        foos.push(foo)
                    }
                }
                let bar = Bar { foos };
                Ok(bar)
            }
        }

        // Instantiate our Visitor and ask the Deserializer to drive
        // it over the input data, resulting in an instance of Bar.
        deserializer.deserialize_seq(BarVisitor::new())
    }
}
fn main() {
    let json = r#"[
        {},{},{},{},{},{},
        {"a": 34, "b": -23},
        {},{},{},{},{},{},{},
        {"error":"Timeout"},
        {},{},{},{},{},{}
    ]"#;
    let bar = serde_json::from_str::<Bar>(json).unwrap();
    println!("{:?}", bar);
    // Bar { foos: [Value { a: 34, b: -23 }, Error { error: "Timeout" }] }
}

get_value returning `f64` instead of `[u8; 4]`

I'm using the noise crate and having trouble understanding how to convert their Color type to an RGB value.
noise = "0.7.0"
pub type Color = [u8; 4];
I'm trying to use the get_value() function, seen here in the docs as:
pub fn get_value(&self, x: usize, y: usize) -> Color {
    let (width, height) = self.size;

    if x < width && y < height {
        self.map[x + y * width]
    } else {
        self.border_color
    }
}
get_value() is implemented for PlaneMapBuilder. So I would expect PlaneMapBuilder::get_value(x,y) to return something of the format [r,g,b,a], but this does not happen:
extern crate noise;
use noise::{utils::*, Billow};

fn main() {
    let my_noise = PlaneMapBuilder::new(&Billow::new()).build();
    let my_val = my_noise.get_value(1, 1);
    println!("{}", my_val.to_string());
    // prints something like -0.610765515150546, not a [u8; 4] as I would expect
}
In the docs I see this definition of add_gradient_point() which takes a Color as a parameter:
pub fn add_gradient_point(mut self, pos: f64, color: Color) -> Self {
    // check to see if the vector already contains the input point.
    if !self
        .gradient_points
        .iter()
        .any(|&x| (x.pos - pos).abs() < std::f64::EPSILON)
    {
        // it doesn't, so find the correct position to insert the new
        // control point.
        let insertion_point = self.find_insertion_point(pos);

        // add the new control point at the correct position.
        self.gradient_points
            .insert(insertion_point, GradientPoint { pos, color });
    }

    self
}
Here they use the [u8; 4] structure I would expect for the Color type:
let jade_gradient = ColorGradient::new()
    .clear_gradient()
    .add_gradient_point(-1.000, [24, 146, 102, 255])
    .add_gradient_point(0.000, [78, 154, 115, 255])
What could explain this behavior?
get_value() is implemented for PlaneMapBuilder
You are correct that PlaneMapBuilder "implements" get_value(). However, it is not the get_value() from NoiseImage; it is the one from NoiseMap (the type returned by build()), whose get_value() returns an f64 and not a Color.
Depending on what kind of "colors" you want, you could instead use ImageRenderer and call its render() method with &my_noise, which returns a NoiseImage.
// noise = "0.7.0"
use noise::{utils::*, Billow};

fn main() {
    let my_noise = PlaneMapBuilder::new(&Billow::new()).build();
    let image = ImageRenderer::new().render(&my_noise);
    let my_val = image.get_value(1, 1);
    println!("{:?}", my_val);
    // Prints: `[18, 18, 18, 255]`
}
Here they use the [u8; 4] structure I would expect for the Color type
Just to be clear, those are the same thing in this case. In short, the type keyword lets you define a "type alias" for an existing type, essentially giving a complex type a shorthand name. The alias and the original are still the same type.
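A small sketch of that point (the alias here mirrors the crate's definition):

// `Color` is just another name for `[u8; 4]`; the two are interchangeable.
type Color = [u8; 4];

fn show(c: Color) {
    println!("{:?}", c);
}

fn main() {
    let raw: [u8; 4] = [24, 146, 102, 255];
    show(raw); // compiles: `Color` and `[u8; 4]` are the same type
}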

How can I convert a hex string to a u8 slice?

I have a string that looks like this "090A0B0C" and I would like to convert it to a slice that looks something like this [9, 10, 11, 12]. How would I best go about doing that?
I don't want to convert a single hex char tuple to a single integer value. I want to convert a string consisting of multiple hex char tuples to a slice of multiple integer values.
You can also implement hex encoding and decoding yourself, in case you want to avoid the dependency on the hex crate:
use std::{fmt::Write, num::ParseIntError};

pub fn decode_hex(s: &str) -> Result<Vec<u8>, ParseIntError> {
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16))
        .collect()
}

pub fn encode_hex(bytes: &[u8]) -> String {
    let mut s = String::with_capacity(bytes.len() * 2);
    for &b in bytes {
        write!(&mut s, "{:02x}", b).unwrap();
    }
    s
}
Note that the decode_hex() function panics if the string length is odd. I've made a version with better error handling and an optimised encoder available on the playground.
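For reference, a sketch of what such a check could look like, using a hypothetical DecodeHexError type (the playground version may differ):

use std::num::ParseIntError;

#[derive(Debug)]
pub enum DecodeHexError {
    OddLength,
    ParseInt(ParseIntError),
}

pub fn decode_hex(s: &str) -> Result<Vec<u8>, DecodeHexError> {
    if s.len() % 2 != 0 {
        // reject odd-length input instead of panicking on the final slice
        return Err(DecodeHexError::OddLength);
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).map_err(DecodeHexError::ParseInt))
        .collect()
}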
You could use the hex crate for that. The decode function looks like it does what you want:
fn main() {
    let input = "090A0B0C";
    let decoded = hex::decode(input).expect("Decoding failed");
    println!("{:?}", decoded);
}
The above will print [9, 10, 11, 12]. Note that decode returns a heap-allocated Vec<u8>; if you want to decode into an array, you'd want to use the decode_to_slice function
fn main() {
    let input = "090A0B0C";
    let mut decoded = [0; 4];
    hex::decode_to_slice(input, &mut decoded).expect("Decoding failed");
    println!("{:?}", decoded);
}
or the FromHex trait:
use hex::FromHex;

fn main() {
    let input = "090A0B0C";
    let decoded = <[u8; 4]>::from_hex(input).expect("Decoding failed");
    println!("{:?}", decoded);
}

What is the correct way to fill a C string pointer from Rust?

I have an FFI signature I need to implement:

pub unsafe extern fn f(header_size: u32, header_ptr: *mut u8) -> i32;

An FFI caller is expected to provide a buffer header_ptr and the size of that buffer, header_size. Rust is expected to fill a string into that buffer, up to header_size bytes, and return 0 if successful. The FFI caller is expected to interpret the string as ASCII.
How can I fill that buffer in the most idiomatic way, given that I have a headers: &str with the content I want to provide?
Right now I have:
let header_bytes = slice::from_raw_parts_mut(header_ptr, header_size as usize);

if header_bytes.len() < headers.len() {
    return Errors::IndexOutOfBounds as i32;
}

for (i, byte) in headers.as_bytes().iter().enumerate() {
    header_bytes[i] = *byte;
}
But that feels wrong.
Edit: I think this is not an exact duplicate of this because my question relates to strings, and IIRC there are special considerations when converting a &str to a CString.
Since C strings are not much more than 0-terminated byte arrays, converting from Rust strings is very straightforward. Almost every valid Rust string is also a valid C string, but you have to make sure that the C string ends with a 0-character and that there are no 0-characters anywhere else in it.
Rust provides a type that takes care of the conversion: CString.
If your input string was successfully converted to a CString you can simply copy the bytes without worrying about the details.
use std::slice;
use std::ffi::CString;

pub unsafe extern fn f(header_size: u32, header_ptr: *mut u8) -> i32 {
    let headers = "abc";

    let c_headers = match CString::new(headers) {
        Ok(cs) => cs,
        Err(_) => return -1, // failed to convert to C string
    };
    let bytes = c_headers.as_bytes_with_nul();

    let header_bytes = slice::from_raw_parts_mut(header_ptr, header_size as usize);
    header_bytes[..bytes.len()].copy_from_slice(bytes);

    0 // success
}

fn main() {
    let mut h = [1u8; 8];
    unsafe {
        f(h.len() as u32, h.as_mut_ptr());
    }
    println!("{:?}", h); // [97, 98, 99, 0, 1, 1, 1, 1]
}
Note that I left out the length check for brevity. header_bytes[..bytes.len()] will panic if the buffer is too short. This is something you will want to avoid if f is called from C.
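A sketch of that check, slotting into f in place of the slice-and-copy lines above (the -2 error code is a made-up placeholder):

let header_bytes = slice::from_raw_parts_mut(header_ptr, header_size as usize);

// Return an error code instead of panicking when the caller's buffer is too small.
if header_bytes.len() < bytes.len() {
    return -2; // e.g. Errors::IndexOutOfBounds as i32, as in the question
}

header_bytes[..bytes.len()].copy_from_slice(bytes);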

How to convert a string of digits into a vector of digits?

I'm trying to store a string (or str) of digits, e.g. 12345, into a vector, such that the vector contains {1, 2, 3, 4, 5}.
As I'm totally new to Rust, I'm having problems with the types (String, str, char, ...) and with the lack of information about conversions.
My current code looks like this:
fn main() {
    let text = "731671";
    let mut v: Vec<i32>;
    let mut d = text.chars();
    for i in 0..text.len() {
        v.push(d.next().to_digit(10));
    }
}
You're close!
First, the index loop for i in 0..text.len() is not necessary, since you're going to use an iterator anyway. It's simpler to loop directly over the iterator: for ch in text.chars(). Not only that, but your index loop and the character iterator are likely to diverge, because len() returns the number of bytes while chars() yields Unicode scalar values. Being UTF-8, the string may well have fewer Unicode scalar values than it has bytes.
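For instance, with a non-ASCII string:

fn main() {
    let text = "né"; // 'é' takes two bytes in UTF-8
    println!("{}", text.len());           // 3 (bytes)
    println!("{}", text.chars().count()); // 2 (Unicode scalar values)
}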
The next hurdle is that to_digit(10) returns an Option, telling you that the character might not be a digit. You can check whether to_digit(10) returned the Some variant with if let Some(digit) = ch.to_digit(10).
Pieced together, the code might now look like this:
fn main() {
    let text = "731671";
    let mut v = Vec::new();
    for ch in text.chars() {
        if let Some(digit) = ch.to_digit(10) {
            v.push(digit);
        }
    }
    println!("{:?}", v);
}
Now, this is rather imperative: you're making a vector and filling it digit by digit, all by yourself. You can try a more declarative or functional approach by applying a transformation over the string:
fn main() {
    let text = "731671";
    let v: Vec<u32> = text.chars().flat_map(|ch| ch.to_digit(10)).collect();
    println!("{:?}", v);
}
ArtemGr's answer is pretty good, but their version will skip any characters that aren't digits. If you'd rather have it fail on bad digits, you can use this version instead:
fn to_digits(text: &str) -> Option<Vec<u32>> {
    text.chars().map(|ch| ch.to_digit(10)).collect()
}

fn main() {
    println!("{:?}", to_digits("731671"));
    println!("{:?}", to_digits("731six71"));
}
Output:
Some([7, 3, 1, 6, 7, 1])
None
To mention the quick and dirty elephant in the room: if you really know your string contains only digits in the range '0'..='9', then you can avoid memory allocations and copies and use the underlying &[u8] representation from str::as_bytes directly. Subtract b'0' from each element whenever you access it.
If you are doing competitive programming, this is one of the worthwhile speed and memory optimizations.
fn main() {
    let text = "12345";
    let digit = text.as_bytes();
    println!("Text = {:?}", text);
    println!("value of digit[3] = {}", digit[3] - b'0');
}
Output:
Text = "12345"
value of digit[3] = 4
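If you do want the whole string converted up front using this trick, here is a short sketch (note that collecting allocates, unlike the lazy access above):

fn main() {
    let text = "12345";
    // subtract b'0' from every ASCII digit byte to get its numeric value
    let digits: Vec<u8> = text.as_bytes().iter().map(|b| b - b'0').collect();
    println!("{:?}", digits); // [1, 2, 3, 4, 5]
}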
This solution combines ArtemGr's + notriddle's solutions:
fn to_digits(string: &str) -> Vec<u32> {
    let opt_vec: Option<Vec<u32>> = string
        .chars()
        .map(|ch| ch.to_digit(10))
        .collect();

    match opt_vec {
        Some(vec_of_digits) => vec_of_digits,
        None => vec![],
    }
}
In my case, I implemented this function for &str.
pub trait ExtraProperties {
    fn to_digits(self) -> Vec<u32>;
}

impl ExtraProperties for &str {
    fn to_digits(self) -> Vec<u32> {
        let opt_vec: Option<Vec<u32>> = self
            .chars()
            .map(|ch| ch.to_digit(10))
            .collect();

        match opt_vec {
            Some(vec_of_digits) => vec_of_digits,
            None => vec![],
        }
    }
}
In this way, I can transform a &str into a vector containing its digits.
fn main() {
    let cnpj: &str = "123456789";
    let nums: Vec<u32> = cnpj.to_digits();
    println!("cnpj: {cnpj}");   // cnpj: 123456789
    println!("nums: {nums:?}"); // nums: [1, 2, 3, 4, 5, 6, 7, 8, 9]
}
See the Rust Playground.
