Does Rust's standard library support direct IO file access?


Is there a way to specify O_DIRECT with Rust's standard library, or do you need to use libc?

You can use the Unix-specific extension trait std::os::unix::fs::OpenOptionsExt:
use std::{fs::OpenOptions, os::unix::fs::OpenOptionsExt};
const O_DIRECT: i32 = 0o0040000; // Double check value
fn main() {
OpenOptions::new()
.read(true)
.custom_flags(O_DIRECT)
.open("/etc/passwd")
.expect("Can't open");
}
The value of O_DIRECT is platform-specific, however. I'd probably end up using libc to provide the value.
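As a minimal std-only sketch of what "platform-specific" means here, the flag can be selected per target. The helper names o_direct and open_direct are hypothetical, and only the Linux values for x86/x86_64 and arm/aarch64 are filled in; in real code, taking the constant from the libc crate (libc::O_DIRECT) is likely the more robust choice. Note also that on Linux, reads through an O_DIRECT descriptor generally need suitably aligned buffers, which this sketch does not handle.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Hypothetical helper: the O_DIRECT value for the few Linux targets
/// covered here, or None when this sketch does not know the value.
fn o_direct() -> Option<i32> {
    if cfg!(all(target_os = "linux", any(target_arch = "x86", target_arch = "x86_64"))) {
        Some(0o40000)
    } else if cfg!(all(target_os = "linux", any(target_arch = "arm", target_arch = "aarch64"))) {
        Some(0o200000)
    } else {
        None
    }
}

/// Hypothetical helper: open `path` read-only with O_DIRECT, failing with
/// `Unsupported` on targets where the flag value is not known.
fn open_direct(path: &Path) -> io::Result<File> {
    let flag = o_direct().ok_or_else(|| {
        io::Error::new(io::ErrorKind::Unsupported, "O_DIRECT value unknown for this target")
    })?;
    OpenOptions::new().read(true).custom_flags(flag).open(path)
}

fn main() {
    // Some filesystems reject O_DIRECT, so an error here is not unusual.
    match open_direct(Path::new("/etc/passwd")) {
        Ok(_) => println!("opened with O_DIRECT"),
        Err(e) => println!("could not open with O_DIRECT: {e}"),
    }
}
```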

Related

Why is fs::read_dir() thread safe on POSIX platforms

Some Background
Originally, Rust switched from readdir(3) to readdir_r(3) for thread safety. However, readdir_r(3) has problems of its own, so the change was reverted:
Linux and Android: fs: Use readdir() instead of readdir_r() on Linux and Android
Fuchsia: Switch Fuchsia to readdir (instead of readdir_r)
...
So, in the current implementation, readdir(3) is used on most POSIX platforms:
#[cfg(any(
target_os = "android",
target_os = "linux",
target_os = "solaris",
target_os = "fuchsia",
target_os = "redox",
target_os = "illumos"
))]
fn next(&mut self) -> Option<io::Result<DirEntry>> {
unsafe {
loop {
// As of POSIX.1-2017, readdir() is not required to be thread safe; only
// readdir_r() is. However, readdir_r() cannot correctly handle platforms
// with unlimited or variable NAME_MAX. Many modern platforms guarantee
// thread safety for readdir() as long an individual DIR* is not accessed
// concurrently, which is sufficient for Rust.
super::os::set_errno(0);
let entry_ptr = readdir64(self.inner.dirp.0);
Thread issue of readdir(3)
The problem with readdir(3) is that its return value (struct dirent *) points into the internal buffer of the directory stream (DIR), so it can be overwritten by subsequent readdir(3) calls. If a DIR stream is shared among multiple threads that all call readdir(3), the result is a race condition.
To handle this safely, external synchronization is needed.
My question
I was curious what Rust does to avoid this issue. It seems that it simply calls readdir(3), memcpys the returned entry into a caller-allocated buffer, and returns. But the function is not marked unsafe, which confuses me.
So my question is: why is it safe to call fs::read_dir() in multi-threaded programs?
There is a comment stating that it is safe to use in Rust without extra external synchronization, but I didn't understand it:
It requires external synchronization if a particular directory stream may be shared among threads, but I believe we avoid that naturally from the lack of &mut aliasing. Dir is Sync, but only ReadDir accesses it, and only from its mutable Iterator implementation.
OP's edit after 3 months
At the time of writing this question, I was not familiar with multi-threaded programming in Rust. Having improved my skills since, I now realize that this is easy to verify:
// With scoped threads
// Does not compile since we cannot mutably borrow pwd more than once
use std::{
fs::read_dir,
thread::{scope, spawn},
};
fn main() {
let mut pwd = read_dir(".").unwrap();
scope(|s| {
for _ in 1..10 {
s.spawn(|| {
let entry = pwd.next().unwrap().unwrap();
println!("{:?}", entry.file_name());
});
}
})
}
// Use interior mutability to share it with multiple threads
// This code does compile because synchronization is applied (RwLock)
use std::{
fs::read_dir,
sync::{Arc, RwLock},
thread::spawn,
};
fn main() {
let pwd = Arc::new(RwLock::new(read_dir(".").unwrap()));
for _ in 1..10 {
spawn({
let pwd = Arc::clone(&pwd);
move || {
let entry = pwd.write().unwrap().next().unwrap().unwrap();
println!("{:?}", entry.file_name());
}
}).join().unwrap();
}
}

readdir is not safe when called from multiple threads with the same DIR* dirp parameter (i.e. with the same self.inner.dirp.0 in the Rust case) but it may be called safely with different dirps. Since calling ReadDir::next requires a &mut self, it is guaranteed that nobody else can call it from another thread at the same time on the same ReadDir instance, and so it is safe.
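To make the &mut self argument concrete, here is a minimal sketch in which several scoped threads drain one ReadDir through a Mutex. The function name list_names_with_workers is hypothetical. Each call to next() happens while the lock is held, so no two threads ever touch the underlying directory stream at the same time.

```rust
use std::ffi::OsString;
use std::fs::{self, ReadDir};
use std::io;
use std::path::Path;
use std::sync::Mutex;
use std::thread;

/// Hypothetical helper: list the entry names of `dir` using `workers`
/// threads that share a single ReadDir behind a Mutex.
fn list_names_with_workers(dir: &Path, workers: usize) -> io::Result<Vec<OsString>> {
    let entries: Mutex<ReadDir> = Mutex::new(fs::read_dir(dir)?);
    let names = Mutex::new(Vec::new());

    thread::scope(|s| {
        for _ in 0..workers {
            s.spawn(|| loop {
                // The guard is a temporary, so the lock is released as soon
                // as next() returns; only one thread advances the iterator.
                let next = entries.lock().unwrap().next();
                match next {
                    Some(Ok(entry)) => names.lock().unwrap().push(entry.file_name()),
                    // For simplicity, this worker stops on the first error
                    // or at the end of the directory.
                    Some(Err(_)) | None => break,
                }
            });
        }
    });

    Ok(names.into_inner().unwrap())
}

fn main() -> io::Result<()> {
    let names = list_names_with_workers(Path::new("."), 4)?;
    println!("{} entries", names.len());
    Ok(())
}
```

Removing the Mutex and calling next() directly from the spawned closures brings back the borrow error from the scoped-thread example in the question, which is the compiler enforcing the same guarantee.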

Dealing with lifetimes in global variables

I am currently writing a board support package for an embedded board and would like to set up a serial output over USB. I am basing my design on the hifive BSP.
The process to do this is in three steps:
Set up the USB bus (UsbBusAllocator, which refers to a UsbPeripheral)
Initialize a SerialPort instance, which refers to UsbBusAllocator
Initialize a UsbDevice instance, which refers to UsbBusAllocator
To make my life simpler, I wrapped the SerialPort and UsbDevice in a SerialWrapper struct:
pub struct SerialWrapper<'a> {
port: SerialPort<'a, Usbd<UsbPeripheral<'a>>>,
dev: UsbDevice<'a, Usbd<UsbPeripheral<'a>>>,
}
impl<'a> SerialWrapper<'a> {
pub fn new(bus: &'a UsbPort<'a>) -> Self {
// create the wrapper ...
}
}
I would like a way to make the structure created by SerialWrapper::new global.
I tried to use:
static mut STDOUT: Option<SerialWrapper<'a>> = None;
but I can't use this as lifetime 'a is not declared.
I thought about using MaybeUninit or PhantomData but both will still need to have SerialWrapper<'a> as type parameter and I will get the same issue.
My goal is to be able to use code similar to this:
struct A;
struct B<'a> {
s: &'a A,
}
static mut STDOUT: Option<B> = None;
fn init_stdout() {
let a = A {};
unsafe {
STDOUT = Some(B {s: &a});
}
}
// access_stdout is the important function
// all the rest can be changed without issue
fn access_stdout() -> Result<(), ()> {
unsafe {
if let Some(_stdout) = STDOUT {
// do stuff if system ready
Ok(())
} else {
// do stuff if system not ready
Err(())
}
}
}
fn main() {
init_stdout();
let _ = access_stdout();
}
Do you have any suggestions on how to proceed?
I don't mind having a solution requiring unsafe code, as long as I can have safe functions to access my serial port.

Short answer: whenever you have a lifetime in the type of a static variable, that lifetime needs to be 'static.
Rust does not allow dangling references, so if a static variable holds anything that lives for less than the 'static lifetime, there is the possibility of a dangling reference. I think your code will need a fair amount of refactoring to satisfy this requirement. I would recommend figuring out a way to store the data you need without references, since that will make your life much easier. If it is absolutely imperative that you store references, you'll need to figure out a way to leak the data to extend its lifetime to 'static.
I am not terribly familiar with embedded development, and I know that static mut has some use-cases there, however usage of that language feature is pretty universally frowned upon. static muts are wildly unsafe and even bypass some borrow checker mechanisms by allowing multiple mutable references at the same time. If you were to encode this properly in Rust's type system you'd probably want to make it just static STDOUT: SyncWrapperForUnsafeCell<Option<T>>, and then provide a safe interface (that might involve locking) for your wrapper with comments explaining why your current environment makes it safe. However if you think that a static mut is the appropriate option there I trust your judgement.
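As a rough illustration of the two suggestions above (leaking to get a 'static reference, and wrapping the global in a type with a safe interface), here is a sketch based on the A/B example from the question. It uses std::sync::Mutex and Box::leak, which assumes a hosted target with std and an allocator; on a bare-metal board you would need a no_std equivalent (for example a critical-section based mutex and a statically allocated A), so treat this as the shape of the solution rather than drop-in BSP code.

```rust
use std::sync::Mutex;

struct A;

#[allow(dead_code)]
struct B<'a> {
    s: &'a A,
}

// The lifetime inside a static must be 'static.
static STDOUT: Mutex<Option<B<'static>>> = Mutex::new(None);

fn init_stdout() {
    // Leaking the allocation yields a reference that is valid for the rest
    // of the program, so it can be stored in the static.
    let a: &'static A = Box::leak(Box::new(A));
    *STDOUT.lock().unwrap() = Some(B { s: a });
}

// Safe accessor: callers never touch `static mut` or `unsafe`.
fn access_stdout() -> Result<(), ()> {
    match STDOUT.lock().unwrap().as_ref() {
        Some(_stdout) => Ok(()), // system ready
        None => Err(()),         // system not ready
    }
}

fn main() {
    println!("before init: {:?}", access_stdout());
    init_stdout();
    println!("after init: {:?}", access_stdout());
}
```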

Generic BufReader?

I want to pass a BufReader instance to a separate function:
use std::fs::File;
use std::io::BufReader;
fn main() -> std::io::Result<()> {
let f = File::open("test.txt")?;
let mut reader = BufReader::new(f);
read(&mut reader);
Ok(())
}
fn read(reader: &mut BufReader) {
// todo
}
but get an error about a missing generic argument for BufReader<R>. What generic parameter to use - String, because that's what I want to read? Sorry for my ignorance - I only know generics from Java.

You can make the function generic over the inner reader type, constrained to Read (so your function works for a BufReader wrapping any type that implements Read):
use std::io::{BufReader, Read};
fn read<R: Read>(reader: &mut BufReader<R>) {
// todo
}
From the documentation:
The BufReader struct adds buffering to any reader.
It can be excessively inefficient to work directly with a Read
instance. For example, every call to read on TcpStream results in a
system call. A BufReader performs large, infrequent reads on the
underlying Read and maintains an in-memory buffer of the results.
Also, check the Rust documentation on generics.
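To show that the generic parameter is the type of the wrapped reader and not the type of the data you read out of it, here is a small sketch. The function count_lines is a hypothetical example; the same function works for a BufReader<File> and for a BufReader over an in-memory byte slice.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// Hypothetical example: count the lines available from any buffered reader.
fn count_lines<R: Read>(reader: &mut BufReader<R>) -> io::Result<usize> {
    let mut count = 0;
    let mut line = String::new();
    // read_line returns Ok(0) once the end of the input is reached.
    while reader.read_line(&mut line)? != 0 {
        count += 1;
        line.clear();
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Here R is &[u8]; with File::open(...) it would be File instead.
    let mut reader = BufReader::new("first\nsecond\nthird\n".as_bytes());
    println!("{} lines", count_lines(&mut reader)?);
    Ok(())
}
```

If the function only needs buffered reading and does not care that the concrete type is BufReader, accepting `impl BufRead` (or `R: BufRead`) is a common alternative.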

How to store one of two constants in a value, where the constants share traits?

Depending on configuration, I need to select either stdout or sink once, and pass the result as an output destination for subsequent output calls.
My Java and C++ experience tell me that abstracting away from the concrete type is wise and makes room for future design changes. This code however won't compile:
let out = if std::env::var("LOG").is_ok() {
std::io::stdout()
} else {
std::io::sink()
};
Stating...
`if` and `else` have incompatible types
What is the Rust-o-matic way of solving this?

Dynamic dispatch using trait objects is probably what you need:
use std::io::{self, Write};
use std::env;
fn get_output() -> Box<dyn Write> {
if env::var("LOG").is_ok() {
Box::new(io::stdout())
} else {
Box::new(io::sink())
}
}
let out = get_output();

The approach from Peter's answer is probably what you need, but it does require an extra allocation. (Which probably doesn't matter in the least in this case, but could matter in other scenarios.) If you are only passing out downward, i.e. as argument to functions, you can avoid the allocation by using two variables to store the different outputs:
let (mut stdout, mut sink);
let out: &mut dyn Write = if std::env::var("LOG").is_ok() {
stdout = std::io::stdout();
&mut stdout
} else {
sink = std::io::sink();
&mut sink
};
// ...proceed to use out...
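For completeness, here is a runnable sketch of the two-variable approach with the trait object passed down to a function. The function emit is a hypothetical stand-in for whatever code consumes the output; because it takes `&mut dyn Write`, it works equally well with stdout, sink, or an in-memory Vec<u8>.

```rust
use std::io::{self, Write};

/// Hypothetical consumer: writes one line to whichever destination it is given.
fn emit(out: &mut dyn Write, msg: &str) -> io::Result<()> {
    writeln!(out, "{msg}")
}

fn main() -> io::Result<()> {
    // Both variables are declared up front, but only one is initialized.
    let (mut stdout, mut sink);
    let out: &mut dyn Write = if std::env::var("LOG").is_ok() {
        stdout = io::stdout();
        &mut stdout
    } else {
        sink = io::sink();
        &mut sink
    };
    emit(out, "hello")?;
    Ok(())
}
```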

Is it possible to create a Stream from a File rather than loading the file contents into memory?

I'm currently using the rusoto_s3 lib to upload a file to S3. All the examples I have found do the same thing: Open a file, read the full contents of the file into memory (Vec<u8>), then convert the Vec into a ByteStream (which implements From<Vec<u8>>). Here is a code example:
fn upload_file(&self, file_path: &Path) -> FileResult<PutObjectOutput> {
let mut file = File::open(file_path)?;
let mut file_data: Vec<u8> = vec![];
file.read_to_end(&mut file_data)?;
let client = S3Client::new(Region::UsEast1);
let mut request = PutObjectRequest::default();
request.body = Some(file_data.into());
Ok(client.put_object(request).sync()?)
}
This is probably acceptable for small files, but (I assume) this technique would break down as soon as you attempt to upload a file with a size greater than the available heap memory.
Another way to create a ByteStream is by using this initializer which accepts an object implementing the Stream trait. I would assume that File would implement this trait, but this does not appear to be the case.
My question(s):
Is there some type which can be constructed from a File which implements Stream? Is the correct solution to make my own tuple struct which wraps File and implements Stream itself, and is this implementation trivial? Is there another solution I'm not seeing, or am I simply misunderstanding how memory is allocated in the code above?

Is there some type which can be constructed from a File which implements Stream?
No, unfortunately. Nothing built-in in std, futures or tokio can do this directly at the moment.
Due to the "detached" nature of Stream's items, such an implementation would have to allocate a new owned buffer for every slice of incoming data and hand it over to the caller. That wouldn't be very efficient. Only once the Rust language has generic associated types (GATs), which will hopefully arrive next year, can the problem be addressed satisfactorily. Check out this futures-rs ticket and Niko's async interview #2 for more detail.
That being said, there are use cases right now where a Stream facade on top of underlying IO is desirable and good enough.
Is the correct solution to make my own tuple struct which wraps File and implements Stream itself, and is this implementation trivial?
For futures-0.1, which rusoto depends on, there are several ways to implement this:
implement Stream trait for a struct that wraps a Read
make use of futures utility functions such as futures::stream::poll_fn
tokio-codec-0.1 has an excellent FramedRead that has already implemented Stream
The third is surely the easiest:
use futures::stream::Stream; // futures = "0.1.29"
use rusoto_core::{ByteStream, Region}; // rusoto_core = "0.42.0"
use rusoto_s3::{PutObjectOutput, PutObjectRequest, S3Client, S3}; // rusoto_s3 = "0.42.0"
use std::{error::Error, fs::File, path::Path};
use tokio_codec::{BytesCodec, FramedRead}; // tokio-codec = "0.1.1"
use tokio_io::io::AllowStdIo; // tokio-io = "0.1.12"
fn upload_file(file_path: &Path) -> Result<PutObjectOutput, Box<dyn Error>> {
let file = File::open(file_path)?;
let aio = AllowStdIo::new(file);
let stream = FramedRead::new(aio, BytesCodec::new()).map(|bs| bs.freeze());
let client = S3Client::new(Region::UsEast1);
let mut request = PutObjectRequest::default();
request.body = Some(ByteStream::new(stream));
Ok(client.put_object(request).sync()?)
}
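To illustrate the "new owned buffer for every slice" point without pulling in futures or tokio, here is a synchronous, std-only analogue: an Iterator (not a futures Stream) that wraps any Read and yields owned chunks. The type name Chunks and its fields are hypothetical. Memory use is bounded by the chunk size instead of the file size, at the cost of one allocation per chunk.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper: yields the contents of `inner` as owned chunks
/// of at most `chunk_size` bytes each.
struct Chunks<R> {
    inner: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for Chunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            // Each item owns its data, so a fresh buffer is allocated per chunk.
            let mut buf = vec![0u8; self.chunk_size];
            match self.inner.read(&mut buf) {
                Ok(0) => return None, // end of input
                Ok(n) => {
                    buf.truncate(n);
                    return Some(Ok(buf));
                }
                // Retry reads that were interrupted by a signal.
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(e)),
            }
        }
    }
}

fn main() -> io::Result<()> {
    // A byte slice stands in for a File here; File implements Read as well.
    let data: &[u8] = b"some file contents";
    let chunks = Chunks { inner: data, chunk_size: 8 };
    for chunk in chunks {
        println!("{} bytes", chunk?.len());
    }
    Ok(())
}
```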
