How to slice a large Vec<i32> as &[u8]?

I don't know how to convert a Vec<i32> into a &[u8] slice.
fn main() {
let v: Vec<i32> = vec![1; 100_000_000];
let v_bytes: &[u8] = /* ... */;
}
I want to write a large Vec<i32> to a file so I can read it back at a future time.

You can use std::slice::from_raw_parts:
let v_bytes: &[u8] = unsafe {
std::slice::from_raw_parts(
v.as_ptr() as *const u8,
v.len() * std::mem::size_of::<i32>(),
)
};
Following the comments on this answer, you should wrap this code in a function and have the return value borrow the input, so that you use the borrow checker as far as possible:
fn as_u8_slice(v: &[i32]) -> &[u8] {
unsafe {
std::slice::from_raw_parts(
v.as_ptr() as *const u8,
v.len() * std::mem::size_of::<i32>(),
)
}
}

Since Rust 1.30, the best solution is to use slice::align_to:
fn main() {
let v: Vec<i32> = vec![1; 8];
let (head, body, tail) = unsafe { v.align_to::<u8>() };
assert!(head.is_empty());
assert!(tail.is_empty());
println!("{:#x?}", body);
}
This properly handles the cases where the alignment of the first type and the second type do not match. In this example, the assert! statements verify that head and tail are empty, which is expected because the alignment of i32 is at least that of u8.
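To see why the head and tail slices matter, here is a minimal sketch going in the opposite direction, from bytes to i32, where the byte slice may start at an address that is not 4-byte aligned. The helper name split_aligned is made up for this illustration.

```rust
/// Hypothetical helper: splits a byte slice into an unaligned prefix, a
/// correctly aligned run of i32 values, and a leftover suffix.
fn split_aligned(bytes: &[u8]) -> (&[u8], &[i32], &[u8]) {
    // SAFETY: every bit pattern is a valid i32, and align_to only places
    // correctly aligned elements in the middle slice.
    unsafe { bytes.align_to::<i32>() }
}
```

The three parts always cover the whole input, but how the bytes are distributed between them depends on the address of the slice, so code using this should handle a non-empty prefix and suffix rather than assert that they are empty.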
I took @swizard's answer and ran with it a bit to get the other side of the coin, reading the vector back in:
use std::fs::File;
use std::io::{Read, Write};
use std::{mem, slice};
fn as_u8_slice(v: &[i32]) -> &[u8] {
let element_size = mem::size_of::<i32>();
unsafe { slice::from_raw_parts(v.as_ptr() as *const u8, v.len() * element_size) }
}
fn from_u8(v: Vec<u8>) -> Vec<i32> {
let data = v.as_ptr();
let len = v.len();
let capacity = v.capacity();
let element_size = mem::size_of::<i32>();
// Make sure we have a proper amount of capacity (may be overkill)
assert_eq!(capacity % element_size, 0);
// Make sure we are going to read a full chunk of stuff
assert_eq!(len % element_size, 0);
unsafe {
// Don't allow the current vector to be dropped
// (which would invalidate the memory)
mem::forget(v);
Vec::from_raw_parts(
data as *mut i32,
len / element_size,
capacity / element_size,
)
}
}
fn do_write(filename: &str, v: &[i32]) {
let mut f = File::create(filename).unwrap();
f.write_all(as_u8_slice(v)).unwrap();
}
fn do_read(filename: &str) -> Vec<i32> {
let mut f = File::open(filename).unwrap();
let mut bytes = Vec::new();
f.read_to_end(&mut bytes).unwrap();
from_u8(bytes)
}
fn main() {
let v = vec![42; 10];
do_write("vector.dump", &v);
let v2 = do_read("vector.dump");
assert_eq!(v, v2);
println!("{:?}", v2)
}
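One caveat about from_u8 above: Vec::from_raw_parts requires that the pointer was allocated with the same alignment as the target element type, and a buffer allocated for a Vec<u8> is only guaranteed to be aligned for u8. Below is a minimal sketch of a copying alternative that avoids unsafe entirely. The names to_bytes and from_bytes are made up for this example, and native endianness is assumed so that the bytes match what as_u8_slice produces.

```rust
/// Encode each i32 as 4 native-endian bytes (this copies the data).
fn to_bytes(v: &[i32]) -> Vec<u8> {
    v.iter().flat_map(|x| x.to_ne_bytes()).collect()
}

/// Decode native-endian bytes back into i32 values.
/// Returns None if the length is not a multiple of 4.
fn from_bytes(bytes: &[u8]) -> Option<Vec<i32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            .map(|c| i32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}
```

If the file has to be readable on machines with a different byte order, to_le_bytes and from_le_bytes can be substituted for the native-endian calls.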

Related

How can concatenated &[u8] slices implement the Read trait without additional copying?

The Read trait is implemented for &[u8]. How can I get a Read trait over several concatenated u8 slices without actually doing any concatenation first?
If I concatenate first, there will be two copies -- multiple arrays into a single array followed by copying from single array to destination via the Read trait. I would like to avoid the first copying.
I want a Read trait over &[&[u8]] that treats multiple slices as a single continuous slice.
fn foo<R: std::io::Read + Send>(data: R) {
// ...
}
let a: &[u8] = &[1, 2, 3, 4, 5];
let b: &[u8] = &[1, 2];
let c: &[&[u8]] = &[a, b];
foo(c); // <- this won't compile because `c` is not a slice of bytes.

You could use the multi_reader crate, which can concatenate any number of values that implement Read:
let a: &[u8] = &[1, 2, 3, 4, 5];
let b: &[u8] = &[1, 2];
let c: &[&[u8]] = &[a, b];
foo(multi_reader::MultiReader::new(c.iter().copied()));
If you don't want to depend on an external crate, you can wrap the slices in a struct of your own and implement Read for it:
use std::io::Read;
struct MultiRead<'a> {
sources: &'a [&'a [u8]],
pos_in_current: usize,
}
impl<'a> MultiRead<'a> {
fn new(sources: &'a [&'a [u8]]) -> MultiRead<'a> {
MultiRead {
sources,
pos_in_current: 0,
}
}
}
impl Read for MultiRead<'_> {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
let current = loop {
if self.sources.is_empty() {
return Ok(0); // EOF
}
let current = self.sources[0];
if self.pos_in_current < current.len() {
break current;
}
self.pos_in_current = 0;
self.sources = &self.sources[1..];
};
let read_size = buf.len().min(current.len() - self.pos_in_current);
buf[..read_size].copy_from_slice(&current[self.pos_in_current..][..read_size]);
self.pos_in_current += read_size;
Ok(read_size)
}
}

Create a wrapper type around the slices and implement Read for it. Compared to user4815162342's answer, I delegate down to the implementation of Read for slices:
use std::{io::Read, mem};
struct Wrapper<'a, 'b>(&'a mut [&'b [u8]]);
impl<'a, 'b> Read for Wrapper<'a, 'b> {
fn read(&mut self, buf: &mut [u8]) -> std::io::Result<usize> {
let slices = mem::take(&mut self.0);
match slices {
[head, ..] => {
let n_bytes = head.read(buf)?;
if head.is_empty() {
// Advance the child slice
self.0 = &mut slices[1..];
} else {
// More to read, put back all the child slices
self.0 = slices;
}
Ok(n_bytes)
}
_ => Ok(0),
}
}
}
fn main() {
let parts: &mut [&[u8]] = &mut [b"hello ", b"world"];
let mut w = Wrapper(parts);
let mut buf = Vec::new();
w.read_to_end(&mut buf).unwrap();
assert_eq!(b"hello world", &*buf);
}
A more efficient implementation would implement further methods from Read, such as read_to_end or read_vectored.
See also:
How do I implement a trait I don't own for a type I don't own?
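When the number of slices is small and known at compile time, the standard library's Read::chain adapter may already be enough, with no wrapper type at all. A minimal sketch follows; the function name read_both is made up for this example.

```rust
use std::io::Read;

/// Reads two slices as one continuous stream without concatenating them first.
fn read_both(a: &[u8], b: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    // `&[u8]` implements Read, so `chain` yields a reader that drains `a`, then `b`.
    a.chain(b).read_to_end(&mut out)?;
    Ok(out)
}
```

The chained reader can be passed to any function that accepts a generic Read. For a slice of slices whose length is only known at run time, a wrapper type like the ones above is still needed.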

How to constrain the lifetime when doing unsafe conversion

I want to create a &mut Test from a byte-slice reference of the same size while keeping lifetime checking.
I can do this with a helper function, because the function signature lets the compiler tie the lifetime of the output to the input. The code below is intentionally designed to fail to compile, because space is moved into drop while test still borrows from it. It fails as intended.
use std::mem::size_of;
struct Test {
a: i32,
}
/// 'a can be removed for simplification
fn create_test<'a>(ptr: &'a mut [u8]) -> &'a mut Test {
assert_eq!(ptr.len(), size_of::<Test>());
unsafe { &mut *(ptr as *mut [u8] as *mut Test) }
}
fn main() {
let mut space = Box::new([0 as u8; 100]);
let (s1, _s2) = space.split_at_mut(size_of::<Test>());
let test = create_test(s1);
drop(space);
test.a += 1;
}
My question is how can I do this without declaring an extra function to constrain the lifetime.
fn main() {
let mut space = Box::new([0 as u8; 100]);
let (s1, _s2): (&'a mut [u8], _) = space.split_at_mut(size_of::<Test>());
let test: &'a mut Test = unsafe { &mut *(s1 as *mut [u8] as *mut Test) };
drop(space);
}
A lifetime annotation such as 'a is not allowed here.

The following code works, and the borrow checker still tracks the resulting references:
use std::alloc::Layout;
fn main() {
let mut space = Box::new([0 as u8; 100]);
let layout = Layout::new::<Test>();
println!("{}", layout.align());
let (_prefix, tests, _suffix) = unsafe { space.align_to_mut::<Test>() };
assert!(tests.len() > 0);
let test = &mut tests[0];
let (_, suffix, _) = unsafe { tests[1..].align_to_mut::<u8>() };
}

You cannot do that, but this is not needed either. Lifetimes are used to ensure safety across function boundaries. In the same function you can just ensure safety manually.
Theoretically, we would not need a borrow checker if the compiler could just inspect the called functions and follow the execution path to determine whether we invoke undefined behavior. In practice, this can't be done because of problems like the halting problem and because of the compile-time cost.

Taking ownership of vector twice

I am trying to figure out the best way to use two functions on the same vector. I'm able to get both functions to do what I need them to do, but as soon as I want to use them both on the same vector, I can't seem to make that compile "reasonably". I imagine I can just stick a bunch of muts and &s everywhere, but that seems like a lot just to get two functions to run on the same vector as opposed to one. Am I missing some best practice here that can make this simpler?
Current code, which fails to compile with an error saying that v is a `&` reference:
fn main() {
let vec = vec![1,2,1,4,5];
println!("Mean: {}, Median: {}", mean(&vec), median(&vec))
}
fn mean(v: &Vec<i32>) -> i32 {
v.iter().sum::<i32>() / v.len() as i32
}
fn median(v: &Vec<i32>) -> i32 {
v.sort();
let med_idx = v.len() / 2 as usize;
v[med_idx]
}

You cannot sort a vector through a shared (&) reference. Rust requires you to think carefully about ownership and mutability. Your median function sorts the vector internally, so you should either allow it to modify the argument (median(v: &mut Vec<i32>) in the function definition and &mut vec at the call site) or make an explicit copy inside. If you allow mutating the vector, the original vector must itself be declared mutable (let mut vec). So, you can make it compile like this:
fn main() {
let mut vec = vec![1,2,1,4,5]; // !
println!("Mean: {}, Median: {}", mean(&vec), median(&mut vec)) // !
}
fn mean(v: &Vec<i32>) -> i32 {
v.iter().sum::<i32>() / v.len() as i32
}
fn median(v: &mut Vec<i32>) -> i32 { // !
v.sort();
let med_idx = v.len() / 2 as usize;
v[med_idx]
}
However, making median modify the vector it analyzes seems very weird to me. I think it would be better to make an explicit copy and sort that:
fn main() {
let vec = vec![1,2,1,4,5];
println!("Mean: {}, Median: {}", mean(&vec), median(&vec))
}
fn mean(v: &Vec<i32>) -> i32 {
v.iter().sum::<i32>() / v.len() as i32
}
fn median(v: &Vec<i32>) -> i32 { // !
let mut v_sorted = v.clone(); // creates a copy
v_sorted.sort();
let med_idx = v_sorted.len() / 2;
v_sorted[med_idx] // index into the sorted copy, not the original
}
If you don't want to pay for the copy on every call, you can stick with the first solution and create the copy at the call site only when you need it. This gives the most flexibility:
fn main() {
let vec = vec![1,2,1,4,5];
println!("Mean: {}, Median: {}", mean(&vec), median(&mut vec.clone())) // !
}
fn mean(v: &Vec<i32>) -> i32 {
v.iter().sum::<i32>() / v.len() as i32
}
fn median(v: &mut Vec<i32>) -> i32 { // !
v.sort();
let med_idx = v.len() / 2 as usize;
v[med_idx]
}
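As a further variation, here is a minimal sketch of the copying approach that takes slices (&[i32]) instead of &Vec<i32>, so the same functions also accept arrays and sub-slices. Like the original code, both functions are assumed to be called with non-empty input and panic otherwise.

```rust
/// Integer mean; takes a slice so it accepts &Vec<i32>, arrays, and sub-slices.
/// Panics on empty input (division by zero), like the original.
fn mean(v: &[i32]) -> i32 {
    v.iter().sum::<i32>() / v.len() as i32
}

/// Upper median; sorts a private copy so the caller's data is left untouched.
/// Panics on empty input (index out of bounds), like the original.
fn median(v: &[i32]) -> i32 {
    let mut sorted = v.to_vec();
    sorted.sort_unstable();
    sorted[sorted.len() / 2]
}
```

A call such as mean(&vec) still works unchanged, because a &Vec<i32> coerces to &[i32] automatically.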

How to convert a Rust integer type to its string representation without allocating a String? [duplicate]

I want to do something like:
let x = 123;
let mut buf = [0 as u8; 20];
format_to!(x --> buf);
assert_eq!(&buf[..3], &b"123"[..]);
With #![no_std] and without any memory allocator.
As I understand, there is an implementation of core::fmt::Display for u64, and I want to use it if possible.
In other words, I want to do something like format!(...), but without a memory allocator. How can I do this?

Let's start with the standard version:
use std::io::Write;
fn main() {
let x = 123;
let mut buf = [0 as u8; 20];
write!(&mut buf[..], "{}", x).expect("Can't write");
assert_eq!(&buf[0..3], b"123");
}
If we then remove the standard library:
#![feature(lang_items)]
#![no_std]
use core::panic::PanicInfo;
#[lang = "eh_personality"]
extern "C" fn eh_personality() {}
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
loop {}
}
fn main() {
let x = 123;
let mut buf = [0 as u8; 20];
write!(&mut buf[..], "{}", x).expect("Can't write");
assert_eq!(&buf[0..3], b"123");
}
We get the error
error[E0599]: no method named `write_fmt` found for type `&mut [u8]` in the current scope
--> src/main.rs:17:5
|
17 | write!(&mut buf[..], "{}", x).expect("Can't write");
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this error originates in a macro outside of the current crate (in Nightly builds, run with -Z external-macro-backtrace for more info)
write_fmt is implemented in the core library by core::fmt::Write. If we implement it ourselves, we are able to pass that error:
#![feature(lang_items)]
#![feature(start)]
#![no_std]
use core::panic::PanicInfo;
#[lang = "eh_personality"]
extern "C" fn eh_personality() {}
#[panic_handler]
fn panic(info: &PanicInfo) -> ! {
loop {}
}
use core::fmt::{self, Write};
struct Wrapper<'a> {
buf: &'a mut [u8],
offset: usize,
}
impl<'a> Wrapper<'a> {
fn new(buf: &'a mut [u8]) -> Self {
Wrapper {
buf: buf,
offset: 0,
}
}
}
impl<'a> fmt::Write for Wrapper<'a> {
fn write_str(&mut self, s: &str) -> fmt::Result {
let bytes = s.as_bytes();
// Skip over already-copied data
let remainder = &mut self.buf[self.offset..];
// Check if there is space remaining (return error instead of panicking)
if remainder.len() < bytes.len() { return Err(core::fmt::Error); }
// Make the two slices the same length
let remainder = &mut remainder[..bytes.len()];
// Copy
remainder.copy_from_slice(bytes);
// Update offset to avoid overwriting
self.offset += bytes.len();
Ok(())
}
}
#[start]
fn start(_argc: isize, _argv: *const *const u8) -> isize {
let x = 123;
let mut buf = [0 as u8; 20];
write!(Wrapper::new(&mut buf), "{}", x).expect("Can't write");
assert_eq!(&buf[0..3], b"123");
0
}
Note that we are duplicating the behavior of io::Cursor into this wrapper. Normally, multiple writes to a &mut [u8] will overwrite each other. This is good for reusing allocation, but not useful when you have consecutive writes of the same data.
Then it's just a matter of writing a macro if you want to.
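Such a macro could look roughly like the sketch below. The names SliceWriter and format_to! are made up for this illustration, and the call syntax differs from the `x --> buf` form in the question. It only depends on core::fmt, so the same code should also work under #![no_std].

```rust
use core::fmt::{self, Write};

/// Hypothetical writer: appends formatted text to a byte buffer and
/// tracks how many bytes have been written so far.
struct SliceWriter<'a> {
    buf: &'a mut [u8],
    offset: usize,
}

impl<'a> Write for SliceWriter<'a> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let bytes = s.as_bytes();
        let end = self.offset.checked_add(bytes.len()).ok_or(fmt::Error)?;
        // Return an error instead of panicking when the buffer is too small.
        let dst = self.buf.get_mut(self.offset..end).ok_or(fmt::Error)?;
        dst.copy_from_slice(bytes);
        self.offset = end;
        Ok(())
    }
}

/// Hypothetical macro: formats into `$buf` and evaluates to
/// `Result<usize, fmt::Error>` holding the number of bytes written.
macro_rules! format_to {
    ($buf:expr, $($arg:tt)*) => {{
        let mut w = SliceWriter { buf: &mut $buf[..], offset: 0 };
        let result = write!(w, $($arg)*);
        let written = w.offset;
        result.map(|()| written)
    }};
}
```

With this, format_to!(buf, "{}", x) fills the start of buf and reports the length, so the formatted text is &buf[..n]. If the buffer is too small, the macro evaluates to an error rather than truncating silently.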
You should also be able to use a crate like arrayvec, which has written this code for you. This is untested:
#![feature(lang_items)]
#![feature(start)]
#![no_std]
use core::panic::PanicInfo;
#[lang = "eh_personality"]
extern "C" fn eh_personality() {}
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
loop {}
}
use arrayvec::ArrayString; // 0.4.10
use core::fmt::Write;
#[start]
fn start(_argc: isize, _argv: *const *const u8) -> isize {
let x = 123;
let mut buf = ArrayString::<[u8; 20]>::new();
write!(&mut buf, "{}", x).expect("Can't write");
assert_eq!(&buf, "123");
0
}

With bare_io:
use bare_io::{Cursor, Write};
let mut buf = [0 as u8; 256];
let mut cur = Cursor::new(&mut buf[..]);
write!(&mut cur, "hello world, stack buf, {}\n\0", 234).expect("!write");
unsafe { puts(buf.as_ptr()) };
With bare_io, smallvec and alloc:
use smallvec::{Array, SmallVec};
struct WriteSmallVec<A: Array<Item = u8>>(SmallVec<A>);
impl<A: Array<Item = u8>> Write for WriteSmallVec<A> {
fn write(&mut self, buf: &[u8]) -> bare_io::Result<usize> {
self.0.extend_from_slice(buf);
Ok(buf.len())
}
fn flush(&mut self) -> bare_io::Result<()> {
Ok(())
}
}
let mut sv = WriteSmallVec(SmallVec::<[u8; 256]>::new());
write!(&mut sv, "hello world, SmallVec, prev len: {}\n\0", len).expect("!write");
unsafe { puts(sv.0.as_ptr()) };
With bare_io, patched inlinable_string and alloc:
use core::fmt::Write;
use inlinable_string::{InlinableString, StringExt};
let mut is = InlinableString::new();
write!(&mut is, "hello world, InlinableString, {}\n\0", 345).expect("!write");
unsafe { puts(is.as_ptr()) };
Tested in the Linux kernel:
cargo build --release -Z build-std=core,alloc --target=x86_64-linux-kernel
Also did some benchmarks, comparing a simple array with SmallVec and InlinableString: https://gitlab.com/artemciy/lin-socks/-/blob/95d2bb96/bench/stack-string.rs
P.S. bare-io has since been yanked.

How can I use the format! macro in a no_std environment?

How could I implement the following example without using std?
let text = format!("example {:.1} test {:x} words {}", num1, num2, num3);
text has type &str and num1, num2 and num3 have any numeric type.
I've tried using numtoa and itoa/dtoa for displaying numbers but numtoa does not support floats and itoa does not support no_std. I feel like displaying a number in a string is fairly common and that I'm probably missing something obvious.

In addition to Shepmaster's answer you can also format strings without an allocator.
In core::fmt::Write you only need to implement write_str and then you get write_fmt for free.
With format_args!(...) (same syntax as format!) you can prepare a core::fmt::Arguments value, which can be passed to core::fmt::write.
For example:
#![crate_type = "dylib"]
#![no_std]
pub mod write_to {
use core::cmp::min;
use core::fmt;
pub struct WriteTo<'a> {
buffer: &'a mut [u8],
// on write error (i.e. not enough space in buffer) this grows beyond
// `buffer.len()`.
used: usize,
}
impl<'a> WriteTo<'a> {
pub fn new(buffer: &'a mut [u8]) -> Self {
WriteTo { buffer, used: 0 }
}
pub fn as_str(self) -> Option<&'a str> {
if self.used <= self.buffer.len() {
// only successful concats of str - must be a valid str.
use core::str::from_utf8_unchecked;
Some(unsafe { from_utf8_unchecked(&self.buffer[..self.used]) })
} else {
None
}
}
}
impl<'a> fmt::Write for WriteTo<'a> {
fn write_str(&mut self, s: &str) -> fmt::Result {
if self.used > self.buffer.len() {
return Err(fmt::Error);
}
let remaining_buf = &mut self.buffer[self.used..];
let raw_s = s.as_bytes();
let write_num = min(raw_s.len(), remaining_buf.len());
remaining_buf[..write_num].copy_from_slice(&raw_s[..write_num]);
self.used += raw_s.len();
if write_num < raw_s.len() {
Err(fmt::Error)
} else {
Ok(())
}
}
}
pub fn show<'a>(buffer: &'a mut [u8], args: fmt::Arguments) -> Result<&'a str, fmt::Error> {
let mut w = WriteTo::new(buffer);
fmt::write(&mut w, args)?;
w.as_str().ok_or(fmt::Error)
}
}
pub fn test() {
let mut buf = [0u8; 64];
let _s: &str = write_to::show(
&mut buf,
format_args!("write some stuff {:?}: {}", "foo", 42),
).unwrap();
}

In general, you don't. format! allocates a String, and a no_std environment doesn't have an allocator.
If you do have an allocator, you can use the alloc crate. This crate contains the format! macro.
#![crate_type = "dylib"]
#![no_std]
#[macro_use]
extern crate alloc;
fn thing() {
let text = format!("example {:.1} test {:x} words {}", 1, 2, 3);
}
See also:
How to format output to a byte array with no_std and no allocator?

You can also combine the usage of numtoa and arrayvec crates. Example:
#![no_std]
use numtoa::NumToA;
use arrayvec::ArrayString;
fn main() -> ! {
let mut num_buffer = [0u8; 20];
let mut text = ArrayString::<[_; 100]>::new();
let num1 = 123;
let num2 = 456;
let num3 = 789;
// text.clear(); (on subsequent usages)
text.push_str("example ");
text.push_str(num1.numtoa_str(10, &mut num_buffer));
text.push_str(" test ");
text.push_str(num2.numtoa_str(10, &mut num_buffer));
text.push_str(" words ");
text.push_str(num3.numtoa_str(10, &mut num_buffer));
// main is declared as `-> !`, so it must never return
loop {}
}
Note that push_str can panic. See the API documentation for the non-panicking try_ methods.
And Cargo.toml
[dependencies]
arrayvec = { version = "0.5", default-features = false }
numtoa = "0.2"

Write a formatter!
use core::fmt::{self, Write};
use core::str;
fn main() {
// For LCD 160 / 8 = 20 chars
let mut buf = [0u8; 20];
let mut buf = ByteMutWriter::new(&mut buf[..]);
buf.clear();
write!(&mut buf, "Hello {}!", "Rust").unwrap();
// buf.as_str()
}
pub struct ByteMutWriter<'a> {
buf: &'a mut [u8],
cursor: usize,
}
impl<'a> ByteMutWriter<'a> {
pub fn new(buf: &'a mut [u8]) -> Self {
ByteMutWriter { buf, cursor: 0 }
}
pub fn as_str(&self) -> &str {
str::from_utf8(&self.buf[0..self.cursor]).unwrap()
}
#[inline]
pub fn capacity(&self) -> usize {
self.buf.len()
}
pub fn clear(&mut self) {
self.cursor = 0;
}
pub fn len(&self) -> usize {
self.cursor
}
pub fn empty(&self) -> bool {
self.cursor == 0
}
pub fn full(&self) -> bool {
self.capacity() == self.cursor
}
}
impl fmt::Write for ByteMutWriter<'_> {
fn write_str(&mut self, s: &str) -> fmt::Result {
let cap = self.capacity();
for (i, &b) in self.buf[self.cursor..cap]
.iter_mut()
.zip(s.as_bytes().iter())
{
*i = b;
}
self.cursor = usize::min(cap, self.cursor + s.as_bytes().len());
Ok(())
}
}

I built a small crate based on Shepmaster's post. It also includes a macro for easy use. All of this works without a heap allocator and is compatible with no_std.
use arrform::{arrform, ArrForm};
let af = arrform!(64, "write some stuff {}: {:.2}", "foo", 42.3456);
assert_eq!("write some stuff foo: 42.35", af.as_str());
This macro first reserves a buffer on the stack. Then it uses the struct ArrForm to format text and numbers. It returns an instance of ArrForm that allows easy access to the contained text. The macro panics if the chosen buffer is too small.
See https://github.com/Simsys/arrform or https://crates.io/crates/arrform.
