buffer.write() in Rust Language starting from write() in C - rust

I have this code in C that writes a command to an fd. My issue is that I can't reproduce the same behaviour in Rust, because apparently .write() doesn't take the same parameters as C's write(). The function is the following:
static void set_address(int32_t fd, uint64_t start_address, uint64_t len){
uint64_t command[3];
int64_t bytes;
command[0] = SET_ADDRESS_AREA;
command[1] = start_address;
command[2] = len;
bytes = write(fd, command, (ssize_t)LEN_SET_ADDRESS_AREA);
if (bytes != LEN_SET_ADDRESS_AREA)
{
printf("\nError\n");
exit(-1);
}
}
So my code is:
for (i,val) in ref_memory.iter().enumerate().step_by(DIM_SLICE){
let mut start_address = val ;
let p = std::ptr::addr_of!(start_address);
println!("the address index of val is {:?}",p);
let mut command = (SET_ADDRESS_AREA,start_address,DIM_SLICE);
let file_buffer = File::create(_path);
let bytes_written = file_buffer.unwrap().write(command);
}
}
Writing this
let bytes_written = file_buffer.unwrap().write(command);
I get the error:
Mismatched types: expected reference &[u8] and found tuple (u8, &u8, u8)
Should I create a struct to pass just one reference of type &u8?
Alternatively, is there a crate that offers this feature?

It's not clear why you've diverged so much from the C code when converting it to Rust. Why the for loop? Why the addr_of? Why create the file in the function when the original clearly already has the file descriptor? Why create a tuple instead of an array?
The direct conversion is mostly straight-forward.
fn set_address(file: &mut File, start_address: u64, len: u64) {
let command: [u64; 3] = [
SET_ADDRESS_AREA,
start_address,
len
];
let bytes = file.write(bytemuck::cast_slice(&command)).unwrap();
if bytes != LEN_SET_ADDRESS_AREA {
println!("Error");
std::process::exit(-1);
}
}
The only tricky part here is my use of the bytemuck crate to convert a [u64] into a [u8]. You can do without it, but it is a bit more annoying.
Here is a full example on the playground that includes the above and two other methods.
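For the "do without it" route, one possibility is to serialize each u64 with to_ne_bytes, which mirrors what the C code does when it writes the in-memory array. This is only a sketch under assumptions: the value of SET_ADDRESS_AREA is a placeholder (the real one isn't shown in the question), and a generic writer stands in for File so the function can also be pointed at a Vec<u8>.

```rust
use std::io::{self, Write};

// Placeholder value: the real constant is not shown in the question.
const SET_ADDRESS_AREA: u64 = 1;

/// Lay the three u64 words out as native-endian bytes, like the C array in memory.
fn encode_command(start_address: u64, len: u64) -> [u8; 24] {
    let mut out = [0u8; 24];
    let words = [SET_ADDRESS_AREA, start_address, len];
    for (slot, word) in out.chunks_exact_mut(8).zip(words) {
        slot.copy_from_slice(&word.to_ne_bytes());
    }
    out
}

/// write_all retries partial writes and reports failures as an io::Error,
/// so no manual comparison against the expected length is needed.
fn set_address<W: Write>(dest: &mut W, start_address: u64, len: u64) -> io::Result<()> {
    dest.write_all(&encode_command(start_address, len))
}
```

Returning io::Result instead of calling exit leaves the decision about how to react to the caller.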

Should I create a struct to pass just one reference of type &u8?
You don't need to create anything. write(2) takes 3 parameters because it needs an fd, a pointer to a buffer to write, and an amount of data to write.
In Rust, the fd is the object on which the method is called (file_buffer), and a slice (&[u8]) has a length so it provides both the "data to write" buffer and the amount of data to write.
What you should do is either just write ref_memory directly (it's not clear why you even split it across multiple writes, especially since you write them all to the same file anyway), or use chunks to split the input buffer into sub-slices that you can then write directly.
let p = std::ptr::addr_of!(start_address);
That makes absolutely no sense. That's a raw pointer to the start_address local variable, which is a copy of val.
Your C code is also... not right. Partial writes are perfectly valid and legal; there are lots of reasons why they might happen (e.g. hardware drivers, network buffer sizes, ...). A write(2) error is signalled by a return value of -1, after which you'd generally read errno or use perror(3) or strerror(3) to report why the call failed.
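The chunks suggestion and the point about partial writes can be sketched together. The function name write_in_chunks and the generic writer are assumptions made for illustration; in the question's code the writer would be the File and chunk_len would be DIM_SLICE.

```rust
use std::io::{self, Write};

/// Write `data` to `dest` in pieces of at most `chunk_len` bytes.
/// Returns the number of chunks written. Panics if `chunk_len` is 0.
fn write_in_chunks<W: Write>(dest: &mut W, data: &[u8], chunk_len: usize) -> io::Result<usize> {
    let mut chunks_written = 0;
    // Each sub-slice carries its own length, so no separate size argument exists.
    for chunk in data.chunks(chunk_len) {
        // write_all keeps calling write until the whole chunk is out,
        // and turns a real failure into an Err instead of a -1 return.
        dest.write_all(chunk)?;
        chunks_written += 1;
    }
    Ok(chunks_written)
}
```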

Related

How to properly convert a Rust string into a C string?

I'm trying to use a C library that requires me to give it strings (const char*) as function arguments, among other things.
I tried writing 2 functions for this (Rust -> C, C -> Rust), so I'm already aware CString and CStr exist, but couldn't get them working for the 1st conversion.
I tried using solutions from Stack Overflow and examples from the docs for both of them but the result always ends up garbage (mostly random characters and on one occasion the same result as in this post).
// My understanding is that C strings have a \0 terminator, which I need to append to input
// without having the variable created deallocated/destroyed/you name it by Rust right after
// I don't have any other explanation for the data becoming garbage if I clone it.
// Also, this conversion works if i manually append \0 to the end of the string at the constructor
pub unsafe fn convert_str(input: &str) -> *const c_char {
let c_str = input.as_ptr() as *const c_char;
return c_str
}
// Works, at least for now
pub unsafe fn cstr_to_str(c_buf: *const i8) -> &'static str {
let cstr = CStr::from_ptr(c_buf);
return cstr.to_str().expect("success");
}
The resulting implementation acts like this:
let pointer = convert_str("Hello");
let result = cstr_to_str(pointer);
println!("Hello becomes {}", result);
// Output:
// Hello becomes HelloHello becomescannot access a Thread Local Storage value during or after destruction/rustc/fe5b1...
// LOOKsrc/main.rsHello this is windowFailed to create GLFW window.
How do I fix this? Is there a better way to do this I'm not seeing?
Rust strings don't have a \0 terminator, so to append one, convert_str must necessarily allocate memory (since it can't modify input - even in unsafe code, it can't know whether there's space for one more byte in the memory allocated for input).
If you're gonna wrangle C strings, you're gonna have to do C style manual management, i.e. together with returning a char*, convert_str also has to return the implicit obligation to free the string to the caller. Said differently, convert_str can't deallocate the buffer it must allocate. (If it did, using the pointer it returns would be a use after free, which indeed results in garbage.)
So your code might look like this:
Allocate a new CString in convert_str with CString::new(input).unwrap() and make sure its internal buffer doesn't get dropped at the end of the function with .into_raw()
Deallocate the return value of convert_str when you're done using it with drop(CString::from_raw(pointer)).
Playground
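The two steps above might look roughly like this. It is a sketch, not the linked playground code: free_str and cstr_to_string are names introduced here for illustration, and the reverse conversion returns an owned String rather than a &'static str so that it doesn't depend on how long the C buffer lives.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Allocate a NUL-terminated copy of `input` and hand ownership to the caller.
/// Panics if `input` contains an interior NUL byte.
fn convert_str(input: &str) -> *mut c_char {
    CString::new(input)
        .expect("input contains an interior NUL")
        .into_raw()
}

/// Copy a C string into an owned Rust String.
/// Safety: `ptr` must point to a valid NUL-terminated string.
unsafe fn cstr_to_string(ptr: *const c_char) -> String {
    unsafe { CStr::from_ptr(ptr) }.to_string_lossy().into_owned()
}

/// Reclaim and free a pointer previously returned by `convert_str`.
/// Safety: `ptr` must come from `convert_str` and must not be used afterwards.
unsafe fn free_str(ptr: *mut c_char) {
    drop(unsafe { CString::from_raw(ptr) });
}
```

The pointer stays valid between convert_str and free_str, which is the window in which it can be handed to the C library.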

Detecting EOF without 0-byte read in Rust

I've been working on some code that reads data from a Read type (the input) in chunks and does some processing on each chunk. The issue is that the final chunk needs to be processed with a different function. As far as I can tell, there's a couple of ways to detect EOF from a Read, but none of them feel particularly ergonomic for this case. I'm looking for a more idiomatic solution.
My current approach is to maintain two buffers, so that the previous read result can be maintained if the next read reads zero bytes, which indicates EOF in this case, since the buffer is of non-zero length:
use std::io::{Read, Result};
const BUF_SIZE: usize = 0x1000;
fn process_stream<I: Read>(mut input: I) -> Result<()> {
// Stores a chunk of input to be processed
let mut buf = [0; BUF_SIZE];
let mut prev_buf = [0; BUF_SIZE];
let mut prev_read = input.read(&mut prev_buf)?;
loop {
let bytes_read = input.read(&mut buf)?;
if bytes_read == 0 {
break;
}
// Some function which processes the contents of a chunk
process_chunk(&prev_buf[..prev_read]);
prev_read = bytes_read;
prev_buf.copy_from_slice(&buf[..]);
}
// Some function used to process the final chunk differently from all other messages
process_final_chunk(&prev_buf[..prev_read]);
Ok(())
}
This strikes me as a very ugly way to do this, I shouldn't need to use two buffers here.
An alternative I can think of would be to impose Seek on input and use input.read_exact(). I could then check for an ErrorKind::UnexpectedEof error to determine that we've hit the end of input, and seek backwards to read the final chunk again (the seek and re-read are necessary here because the contents of the buffer are unspecified in the case of an UnexpectedEof error). But this doesn't seem idiomatic at all: encountering an error, seeking back, and reading again just to detect we're at the end of a file is very strange.
My ideal solution would be something like this, using an imaginary input.feof() function that returns true if the last input.read() call reached EOF, like the feof function in C:
fn process_stream<I: Read>(mut input: I) -> Result<()> {
// Stores a chunk of input to be processed
let mut buf = [0; BUF_SIZE];
let mut bytes_read = 0;
loop {
bytes_read = input.read(&mut buf)?;
if input.feof() {
break;
}
process_chunk(&buf[..bytes_read]);
}
process_final_chunk(&buf[..bytes_read]);
Ok(())
}
Can anyone suggest a way to implement this that is more idiomatic? Thanks!
When read of std::io::Read returns Ok(n), not only does that mean that the buffer buf has been filled in with n bytes of data from this source, but it also indicates that the bytes from index n onward are left untouched. With this in mind, you actually don't need a prev_buf at all, because when n is 0, all bytes of the buffer are left untouched (so they still hold the bytes of the previous read).
prog-fh's solution is what you want to go with for the kind of processing you want to do, because it will only hand off full chunks to process_chunk. With read potentially returning a value between 0 and BUF_SIZE, this is needed. For more info, see this part of the above link:
It is not an error if the returned value n is smaller than the buffer size, even when the reader is not at the end of the stream yet. This may happen for example because fewer bytes are actually available right now (e. g. being close to end-of-file) or because read() was interrupted by a signal.
However, I advise that you think about what should happen when you get an Ok(0) from read that does not represent end of file forever. See this part:
If n is 0, then it can indicate one of two scenarios:
This reader has reached its “end of file” and will likely no longer be able to produce bytes. Note that this does not mean that the reader will always no longer be able to produce bytes.
So if you were to get a sequence of reads that returned Ok(BUF_SIZE), Ok(BUF_SIZE), Ok(0), Ok(BUF_SIZE) (which is entirely possible, it just represents a hitch in the IO), would you want to not consider the last Ok(BUF_SIZE) as a read chunk? If you treat Ok(0) as EOF forever, that may be a mistake here.
The only way to reliably determine what should be considered as the last chunk is to have the expected length (in bytes, not # of chunks) sent beforehand as part of the protocol. Given a variable expected_len, you could then determine the start index of the last chunk through expected_len - expected_len % BUF_SIZE, and the end index just being expected_len itself.
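As a small worked example of that index arithmetic (the helper name last_chunk_bounds is made up for illustration, and buf_size is assumed to be non-zero):

```rust
/// Byte range (start, end) of the final chunk when the total length is known up front.
/// Assumes `buf_size` is non-zero.
fn last_chunk_bounds(expected_len: usize, buf_size: usize) -> (usize, usize) {
    // Round the total length down to a multiple of the buffer size.
    let start = expected_len - expected_len % buf_size;
    (start, expected_len)
}
```

With a 10240-byte stream and 4096-byte chunks this gives (8192, 10240), i.e. a 2048-byte final chunk. When the length is an exact multiple of the chunk size the formula yields an empty final chunk, so a protocol would have to decide whether that case should instead treat the last full chunk as final.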
Since you consider read_exact() as a possible solution, then we can consider that a non-final chunk contains exactly BUF_SIZE bytes.
Then why not just read as much as we can to fill such a buffer and process it with a function, then, when it's absolutely not possible (because EOF is reached), process the incomplete last chunk with another function?
Note that feof() in C does not guess that EOF will be reached on the next read attempt; it just reports the EOF flag that could have been set during the previous read attempt.
Thus, for EOF to be set and feof() to report it, a read attempt returning 0 must have been encountered first (as in the example below).
use std::fs::File;
use std::io::{Read, Result};
const BUF_SIZE: usize = 0x1000;
fn process_chunk(bytes: &[u8]) {
println!("process_chunk {}", bytes.len());
}
fn process_final_chunk(bytes: &[u8]) {
println!("process_final_chunk {}", bytes.len());
}
fn process_stream<I: Read>(mut input: I) -> Result<()> {
// Stores a chunk of input to be processed
let mut buf = [0; BUF_SIZE];
loop {
let mut bytes_read = 0;
while bytes_read < BUF_SIZE {
let r = input.read(&mut buf[bytes_read..])?;
if r == 0 {
break;
}
bytes_read += r;
}
if bytes_read == BUF_SIZE {
process_chunk(&buf);
} else {
process_final_chunk(&buf[..bytes_read]);
break;
}
}
Ok(())
}
fn main() {
let file = File::open("data.bin").unwrap();
process_stream(file).unwrap();
}
/*
$ dd if=/dev/random of=data.bin bs=1024 count=10
$ cargo run
process_chunk 4096
process_chunk 4096
process_final_chunk 2048
$ dd if=/dev/random of=data.bin bs=1024 count=8
$ cargo run
process_chunk 4096
process_chunk 4096
process_final_chunk 0
*/

How can I write all of stdin to stdout without using lots of memory?

I want to write a Rust program that takes everything in stdin and copies it to stdout. So far I have this
fn main() {
let mut stdin: io::Stdin = io::stdin();
let mut stdout: io::Stdout = io::stdout();
let mut buffer: [u8; 1_000_000] = [0; 1_000_000];
let mut n_bytes_read: usize = 0;
let mut uninitialized: bool = true;
while uninitialized || n_bytes_read > 0
{
n_bytes_read = stdin.read(&mut buffer).expect("Could not read from STDIN.");
uninitialized = false;
}
}
I'm copying everything into a buffer of size one million so as not to blow up the memory if someone feeds my program a 3 gigabyte file. So now I want to copy this to stdout, but the only primitive write operation I can find is stdout.write(&mut buffer) - but this writes the whole buffer! I would need a way to write a specific number of bytes, like stdout.write_only(&mut buffer, n_bytes_read).
I'd like to do this in the most basic way possible, with a minimum of standard library imports.
If all you wanted to do was copy from stdin to stdout without using much memory, just use std::io::copy. It streams the data from a reader to a writer.
If your goal is to write part of a buffer, then take a slice of that buffer and pass that to write:
stdout.write(&buffer[0..n_bytes_read]);
A slice does not copy the data so you will not use any more memory.
Note however that write may not write everything you have asked - it returns the number of bytes actually written. If you use write_all it will write the whole slice.
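Putting the slice and write_all advice together, a hand-rolled copy loop could look like the sketch below. The name copy_stream and the 8192-byte buffer size are arbitrary choices for illustration; std::io::copy already does this job and is the simpler option.

```rust
use std::io::{self, Read, Write};

/// Copy everything from `input` to `output` using a small fixed buffer.
/// Returns the total number of bytes copied.
fn copy_stream<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<u64> {
    let mut buffer = [0u8; 8192];
    let mut total = 0u64;
    loop {
        let n = input.read(&mut buffer)?;
        if n == 0 {
            break; // treated as end of input
        }
        // Only the first n bytes are valid; write_all handles partial writes.
        output.write_all(&buffer[..n])?;
        total += n as u64;
    }
    output.flush()?;
    Ok(total)
}
```

For the original program the call would be something like copy_stream(io::stdin().lock(), io::stdout().lock()). Unlike std::io::copy, this sketch does not retry reads that fail with ErrorKind::Interrupted.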

How to safely reinterpret Vec<f64> as Vec<num_complex::Complex<f64>> with half the size?

I have complex number data filled into a Vec<f64> by an external C library (prefer not to change) in the form [i_0_real, i_0_imag, i_1_real, i_1_imag, ...] and it appears that this Vec<f64> has the same memory layout as a Vec<num_complex::Complex<f64>> of half the length would be, given that num_complex::Complex<f64>'s data structure is memory-layout compatible with [f64; 2] as documented here. I'd like to use it as such without needing a re-allocation of a potentially large buffer.
I'm assuming that it's valid to use from_raw_parts() in std::vec::Vec to fake a new Vec that takes ownership of the old Vec's memory (by forgetting the old Vec) and use size / 2 and capacity / 2, but that requires unsafe code. Is there a "safe" way to do this kind of data re-interpretation?
The Vec is allocated in Rust as a Vec<f64> and is populated by a C function using .as_mut_ptr() that fills in the Vec<f64>.
My current compiling unsafe implementation:
extern crate num_complex;
pub fn convert_to_complex_unsafe(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
let new_vec = unsafe {
Vec::from_raw_parts(
buffer.as_mut_ptr() as *mut num_complex::Complex<f64>,
buffer.len() / 2,
buffer.capacity() / 2,
)
};
std::mem::forget(buffer);
return new_vec;
}
fn main() {
println!(
"Converted vector: {:?}",
convert_to_complex_unsafe(vec![3.0, 4.0, 5.0, 6.0])
);
}
Is there a "safe" way to do this kind of data re-interpretation?
No. At the very least, this is because the information you need to know is not expressed in the Rust type system but is expressed via prose (a.k.a. the docs):
Complex<T> is memory layout compatible with an array [T; 2].
— Complex docs
If a Vec has allocated memory, then [...] its pointer points to len initialized, contiguous elements in order (what you would see if you coerced it to a slice),
— Vec docs
Arrays coerce to slices ([T])
— Array docs
Since a Complex is memory-compatible with an array, an array's data is memory-compatible with a slice, and a Vec's data is memory-compatible with a slice, this transformation should be safe, even though the compiler cannot tell this.
This information should be attached (via a comment) to your unsafe block.
I would make some small tweaks to your function:
Having two Vecs at the same time pointing to the same data makes me very nervous. This can be trivially avoided by introducing some variables and forgetting one before creating the other.
Remove the return keyword to be more idiomatic
Add some asserts that the starting length of the data is a multiple of two.
As rodrigo points out, the capacity could easily be an odd number. To attempt to avoid this, we call shrink_to_fit. This has the downside that the Vec may need to reallocate and copy the memory, depending on the implementation.
Expand the unsafe block to cover all of the related code that is required to ensure that the safety invariants are upheld.
pub fn convert_to_complex(mut buffer: Vec<f64>) -> Vec<num_complex::Complex<f64>> {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
buffer.shrink_to_fit();
let ptr = buffer.as_mut_ptr() as *mut num_complex::Complex<f64>;
let len = buffer.len();
let cap = buffer.capacity();
assert!(len % 2 == 0);
assert!(cap % 2 == 0);
std::mem::forget(buffer);
Vec::from_raw_parts(ptr, len / 2, cap / 2)
}
}
To avoid all the worrying about the capacity, you could convert a slice of the Vec's data instead of the Vec itself. This also doesn't need any extra memory allocation. It's simpler because the slice can "lose" any odd trailing values, since the original Vec still owns them.
pub fn convert_to_complex(buffer: &[f64]) -> &[num_complex::Complex<f64>] {
// This is where I'd put the rationale for why this `unsafe` block
// upholds the guarantees that I must ensure. Too bad I
// copy-and-pasted from Stack Overflow without reading this comment!
unsafe {
let ptr = buffer.as_ptr() as *const num_complex::Complex<f64>;
let len = buffer.len();
assert!(len % 2 == 0);
std::slice::from_raw_parts(ptr, len / 2)
}
}
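If a typed &[Complex<f64>] is not strictly required and it is enough to walk over the (real, imaginary) pairs, a different technique avoids unsafe altogether: iterating the interleaved buffer with chunks_exact(2). This does not reinterpret the memory and is not what the answer above does; it is a sketch of an alternative, with complex_pairs being a name invented here and plain tuples standing in for num_complex::Complex.

```rust
/// Iterate over (re, im) pairs of an interleaved buffer without unsafe code
/// and without copying the buffer. Panics if the length is odd.
fn complex_pairs<'a>(buffer: &'a [f64]) -> impl Iterator<Item = (f64, f64)> + 'a {
    assert!(buffer.len() % 2 == 0, "expected interleaved re/im pairs");
    buffer.chunks_exact(2).map(|pair| (pair[0], pair[1]))
}
```

Each tuple could be turned into a Complex value on the fly where one is needed, at the cost of constructing it per element rather than borrowing it.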

Should I pass a mutable reference or transfer ownership of a variable in the context of FFI?

I have a program that utilizes a Windows API via a C FFI (via winapi-rs). One of the functions expects a pointer to a pointer to a string as an output parameter. The function will store its result into this string. I'm using a variable of type WideCString for this string.
Can I "just" pass in a mutable ref to a ref to a string into this function (inside an unsafe block) or should I rather use a functionality like .into_raw() and .from_raw() that also moves the ownership of the variable to the C function?
Both versions compile and work but I'm wondering whether I'm buying any disadvantages with the direct way.
Here are the relevant lines from my code utilizing .into_raw and .from_raw.
let mut widestr: WideCString = WideCString::from_str("test").unwrap(); //this is the string where the result should be stored
let mut security_descriptor_ptr: winnt::LPWSTR = widestr.into_raw();
let rtrn3 = unsafe {
advapi32::ConvertSecurityDescriptorToStringSecurityDescriptorW(sd_buffer.as_mut_ptr() as *mut std::os::raw::c_void,
1,
winnt::DACL_SECURITY_INFORMATION,
&mut security_descriptor_ptr,
ptr::null_mut())
};
if rtrn3 == 0 {
match IOError::last_os_error().raw_os_error() {
Some(1008) => println!("Need to fix this errror in get_acl_of_file."), // Do nothing. No idea, why this error occurs
Some(e) => panic!("Unknown OS error in get_acl_of_file {}", e),
None => panic!("That should not happen in get_acl_of_file!"),
}
}
let mut rtr: WideCString = unsafe{WideCString::from_raw(security_descriptor_ptr)};
The description of this parameter in MSDN says:
A pointer to a variable that receives a pointer to a null-terminated security descriptor string. For a description of the string format, see Security Descriptor String Format. To free the returned buffer, call the LocalFree function.
I am expecting the function to change the value of the variable. Doesn't that - per definition - mean that I'm moving ownership?
I am expecting the function to change the value of the variable. Doesn't that - per definition - mean that I'm moving ownership?
No. One key way to think about ownership is: who is responsible for destroying the value when you are done with it.
Competent C APIs (and Microsoft generally falls into this category) document expected ownership rules, although sometimes the words are oblique or assume some level of outside knowledge. This particular function says:
To free the returned buffer, call the LocalFree function.
That means that the ConvertSecurityDescriptorToStringSecurityDescriptorW is going to perform some kind of allocation and return that to the user. Checking out the function declaration, you can also see that they document that parameter as being an "out" parameter:
_Out_ LPTSTR *StringSecurityDescriptor,
Why is it done this way? Because the caller doesn't know how much memory to allocate to store the string [1]!
Normally, you'd pass a reference to uninitialized memory into the function which must then initialize it for you.
This compiles, but you didn't provide enough context to actually call it, so who knows if it works:
extern crate advapi32;
extern crate winapi;
extern crate widestring;
use std::{mem, ptr, io};
use winapi::{winnt, PSECURITY_DESCRIPTOR};
use widestring::WideCString;
fn foo(sd_buffer: PSECURITY_DESCRIPTOR) -> WideCString {
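// Note: mem::uninitialized() is deprecated in current Rust; MaybeUninit or ptr::null_mut() is the usual replacement.
let mut security_descriptor = unsafe { mem::uninitialized() };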
let retval = unsafe {
advapi32::ConvertSecurityDescriptorToStringSecurityDescriptorW(
sd_buffer,
1,
winnt::DACL_SECURITY_INFORMATION,
&mut security_descriptor,
ptr::null_mut()
)
};
if retval == 0 {
match io::Error::last_os_error().raw_os_error() {
Some(1008) => println!("Need to fix this errror in get_acl_of_file."), // Do nothing. No idea, why this error occurs
Some(e) => panic!("Unknown OS error in get_acl_of_file {}", e),
None => panic!("That should not happen in get_acl_of_file!"),
}
}
unsafe { WideCString::from_raw(security_descriptor) }
}
fn main() {
let x = foo(ptr::null_mut());
println!("{:?}", x);
}
[dependencies]
winapi = { git = "https://github.com/nils-tekampe/winapi-rs/", rev = "1bb62e2c22d0f5833cfa9eec1db2c9cfc2a4a303" }
advapi32-sys = { git = "https://github.com/nils-tekampe/winapi-rs/", rev = "1bb62e2c22d0f5833cfa9eec1db2c9cfc2a4a303" }
widestring = "*"
Answering your questions directly:
Can I "just" pass in a mutable ref to a ref to a string into this function (inside an unsafe block) or should I rather use a functionality like .into_raw() and .from_raw() that also moves the ownership of the variable to the C function?
Neither. The function doesn't expect you to pass it a pointer to a string, it wants you to pass a pointer to a place where it can put a string.
I also just realized after your explanation that (as far as I understood it) in my example, the widestr variable never gets overwritten by the C function. It overwrites the reference to it but not the data itself.
It's very likely that the memory allocated by WideCString::from_str("test") is completely leaked, as nothing has a reference to that pointer after the function call.
Is this a general rule that a C (WinAPI) function will always allocate the buffer by itself (if not following the two step approach where it first returns the size)?
I don't believe there are any general rules between C APIs or even inside of a C API. Especially at a company as big as Microsoft with so much API surface. You need to read the documentation for each method. This is part of the constant drag that can make writing C feel like a slog.
it somehow feels odd for me to hand over uninitialized memory to such a function.
Yep, because there's not really a guarantee that the function initializes it. In fact, it would be wasteful to initialize it in case of failure, so it probably doesn't. It's another thing that Rust seems to have nicer solutions for.
Note that you shouldn't do function calls (e.g. println!) before calling things like last_os_error; those function calls might change the value of the last error!
[1] Other Windows APIs actually require a multistep process: you call the function with NULL, it returns the number of bytes you need to allocate, then you call it again.
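The "pointer to a place where it can put a string" idea, and the rule that whoever allocated the buffer also defines how it is freed, can be shown without any Windows dependency. Everything named fake_* below is a hypothetical stand-in written in Rust, not a WinAPI function: fake_get_description plays the role of the allocating call and fake_free plays the role LocalFree has in the documentation quoted above.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::ptr;

// Hypothetical stand-in for a C function with an out parameter: it allocates
// the string itself and stores the pointer in `*out`. Returns 0 on failure.
unsafe fn fake_get_description(out: *mut *mut c_char) -> i32 {
    if out.is_null() {
        return 0;
    }
    let text = CString::new("D:(A;;GA;;;WD)").expect("no interior NUL");
    unsafe { *out = text.into_raw() };
    1
}

// Hypothetical matching deallocator for buffers returned by the function above.
unsafe fn fake_free(p: *mut c_char) {
    if !p.is_null() {
        drop(unsafe { CString::from_raw(p) });
    }
}

fn get_description() -> Option<String> {
    // Only a place for the pointer is needed; null is a well-defined start value.
    let mut raw: *mut c_char = ptr::null_mut();
    let ok = unsafe { fake_get_description(&mut raw) };
    if ok == 0 || raw.is_null() {
        return None;
    }
    // Copy into Rust-owned memory, then release with the API's own deallocator.
    let owned = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    unsafe { fake_free(raw) };
    Some(owned)
}
```

The caller never allocates the string and never frees it with Rust's own allocator; it copies what it needs and hands the buffer back through the deallocator the API prescribes.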
