How do I read OS-compatible strings from stdin? - rust

I'm trying to write a Rust program that gets a separated list of filenames on stdin.
On Windows, I might invoke it from a cmd window with something like:
dir /b /s | findstr .*,v$ | rust-prog -n
On Unix I'd use something like:
find . -name '*,v' -print0 | rust-prog -0
I'm having trouble converting what I receive on stdin into something that can be used by std::path::Path. As I understand it, to get something that will compile on Windows or Unix, I'm going to need to use conditional compilation, and std::os::windows::ffi or std::os::unix::ffi as appropriate.
Furthermore, it seems that on Windows I'll need to call kernel32::MultiByteToWideChar with the current code page to create something usable by std::os::windows::ffi::OsStrExt.
Is there an easier way to do this? Does what I'm suggesting even seem workable?
As an example, it's easy to convert a string to a path, so I tried to use the string handling functions of stdin:
use std::io;
fn main() {
let mut buffer = String::new();
match io::stdin().read_line(&mut buffer) {
Ok(_) => println!("{}", buffer),
Err(error) => println!("error: {}", error)
}
}
On Windows, if I have a directory with a single file called ¿.txt (that's byte 0xbf) and pipe the name into stdin, I get: error: stream did not contain valid UTF-8.

Here's a reasonable looking version for Windows. Convert the console supplied string to a wide string using win32api functions then wrap it in an OsString using OsString::from_wide.
I'm not convinced it uses the correct code page yet. dir seems to use the OEM code page, so maybe that should be the default. A console also distinguishes between its input code page and its output code page.
In my Cargo.toml:
[dependencies]
winapi = "0.2"
kernel32-sys = "0.2.2"
Code to read a list of filenames piped through stdin on Windows as per the question.
extern crate kernel32;
extern crate winapi;
use std::io::{self, Read};
use std::ptr;
use std::fs::metadata;
use std::ffi::OsString;
use std::os::windows::ffi::OsStringExt;
/// Convert windows console input to wide string that can
/// be used by OS functions
fn wide_from_console_string(bytes: &[u8]) -> Vec<u16> {
assert!(bytes.len() < std::i32::MAX as usize);
let mut wide;
let mut len;
unsafe {
let cp = kernel32::GetConsoleCP();
len = kernel32::MultiByteToWideChar(cp, 0, bytes.as_ptr() as *const i8, bytes.len() as i32, ptr::null_mut(), 0);
wide = Vec::with_capacity(len as usize);
len = kernel32::MultiByteToWideChar(cp, 0, bytes.as_ptr() as *const i8, bytes.len() as i32, wide.as_mut_ptr(), len);
wide.set_len(len as usize);
}
wide
}
/// Extract paths from a list supplied as Cr LF
/// separated wide string
/// Would use a generic split on substring if it existed
fn paths_from_wide(wide: &[u16]) -> Vec<OsString> {
let mut r = Vec::new();
let mut start = 0;
let mut i = start;
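// Compare against `i + 1` so an empty slice cannot underflow `len() - 1`.
while i + 1 < wide.len() {
if wide[i] == 13 && wide[i + 1] == 10 {
if i > start {
r.push(OsString::from_wide(&wide[start..i]));
}
start = i + 2;
i += 2;
} else {
i += 1;
}
}
// Keep a final name that is not terminated by CR LF, including its last unit.
if start < wide.len() {
r.push(OsString::from_wide(&wide[start..]));
}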
r
}
fn main() {
let mut bytes = Vec::new();
if let Ok(_) = io::stdin().read_to_end(&mut bytes) {
let pathlist = wide_from_console_string(&bytes[..]);
let paths = paths_from_wide(&pathlist[..]);
for path in paths {
match metadata(&path) {
Ok(stat) => println!("{:?} is_file: {}", &path, stat.is_file()),
Err(e) => println!("Error: {:?} for {:?}", e, &path)
}
}
}
}
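For the Unix half of the question (find . -name '*,v' -print0 | rust-prog -0), no code-page conversion should be needed, because on Unix an OsStr is just a sequence of bytes. A minimal sketch, assuming a Unix target; paths_from_bytes is a name made up for this illustration and is not part of the answer above:

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix only: lets us build an OsStr from raw bytes
use std::path::PathBuf;

/// Split `bytes` on `sep` and turn every non-empty record into a path.
/// No UTF-8 validation happens, so a name such as b"\xbf.txt" survives intact.
/// (Hypothetical helper, for illustration only.)
fn paths_from_bytes(bytes: &[u8], sep: u8) -> Vec<PathBuf> {
    bytes
        .split(|&b| b == sep)
        .filter(|record| !record.is_empty())
        .map(|record| PathBuf::from(OsStr::from_bytes(record)))
        .collect()
}
```

To use it, read all of stdin into a Vec<u8> with read_to_end, as the Windows code does, then call paths_from_bytes(&bytes, 0) for the -0 case or paths_from_bytes(&bytes, b'\n') for newline-separated input.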

Related

Read binary file in units of f64 in Rust

Assuming you have a binary file example.bin and you want to read that file in units of f64, i.e. the first 8 bytes give one float, the next 8 bytes give the next one, etc. (assuming you know the endianness). How can this be done in Rust?
I know that one can use std::fs::read("example.bin") to get a Vec<u8> of the data, but then you have to do quite a bit of "gymnastics" to convert each group of 8 bytes to an f64, i.e.
use std::convert::TryInto; // in the prelude since the 2021 edition
fn eight_bytes_to_array(barry: &[u8]) -> &[u8; 8] {
barry.try_into().expect("slice with incorrect length")
}
let file_content = std::fs::read("example.bin").expect("Could not read file!");
let nr = eight_bytes_to_array(&file_content[0..8]);
// Interpret the 8 bytes as a big-endian f64
let nr = f64::from_be_bytes(*nr);
I saw this post, but it's from 2015 and a lot of changes have happened in Rust since then, so I was wondering if there is a better/faster way these days?
An example without proper error handling; it also does not check for a file whose size is not a multiple of 8 bytes.
use std::fs::File;
use std::io::{BufReader, Read};
fn main() {
// Use a BufReader because files in std are unbuffered by default,
// and issuing one 8-byte read per value would be very slow.
let mut input = BufReader::new(
File::open("floats.bin")
.expect("Failed to open file")
);
let mut floats = Vec::new();
loop {
use std::io::ErrorKind;
// You may use 8 instead of `size_of` but size_of is less error-prone.
let mut buffer = [0u8; std::mem::size_of::<f64>()];
// Using read_exact because `read` may return less
// than 8 bytes even if there are bytes in the file.
// This, however, prevents us from handling cases
// when file size cannot be divided by 8.
let res = input.read_exact(&mut buffer);
match res {
// We detect if we read until the end.
// If there were some excess bytes after last read, they are lost.
Err(error) if error.kind() == ErrorKind::UnexpectedEof => break,
// Add more cases of errors you want to handle.
_ => {}
}
// You should do better error-handling probably.
// This simply panics.
res.expect("Unexpected error during read");
// Use `from_be_bytes` if the numbers in the file are big-endian
let f = f64::from_le_bytes(buffer);
floats.push(f);
}
}
I would create a generic iterator that returns f64 for flexibility and reusability.
struct F64Reader<R: io::BufRead> {
inner: R,
}
impl<R: io::BufRead> F64Reader<R> {
pub fn new(inner: R) -> Self {
Self{
inner
}
}
}
impl<R: io::BufRead> Iterator for F64Reader<R> {
type Item = f64;
fn next(&mut self) -> Option<Self::Item> {
let mut buff: [u8; 8] = [0;8];
self.inner.read_exact(&mut buff).ok()?;
Some(f64::from_be_bytes(buff))
}
}
This means if the file is large, you can loop through the values without storing it all in memory
let input = fs::File::open("example.bin")?;
for f in F64Reader::new(io::BufReader::new(input)) {
println!("{}", f)
}
Or if you want all the values you can collect them
let input = fs::File::open("example.bin")?;
let values : Vec<f64> = F64Reader::new(io::BufReader::new(input)).collect();
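If the whole file already fits in memory (the std::fs::read route from the question), slice::chunks_exact removes most of the "gymnastics". A minimal sketch, assuming big-endian data as in the question; f64s_from_be_bytes is a hypothetical helper name:

```rust
use std::convert::TryInto; // needed on editions before 2021

/// Decode a byte buffer as consecutive big-endian f64 values.
/// Returns None when the length is not a multiple of 8.
/// (Hypothetical helper, for illustration only.)
fn f64s_from_be_bytes(bytes: &[u8]) -> Option<Vec<f64>> {
    let chunks = bytes.chunks_exact(8);
    // `remainder` holds the trailing bytes that do not fill a whole chunk.
    if !chunks.remainder().is_empty() {
        return None;
    }
    Some(
        chunks
            .map(|chunk| f64::from_be_bytes(chunk.try_into().expect("chunk is 8 bytes")))
            .collect(),
    )
}
```

Something like f64s_from_be_bytes(&std::fs::read("example.bin")?) would then give all values at once. Swap in f64::from_le_bytes for little-endian files.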

How to read a text file in Rust and read multiple values per line

So basically, I have a text file with the following syntax:
String int
String int
String int
I have an idea how to read the Values if there is only one entry per line, but if there are multiple, I do not know how to do it.
In Java, I would do something simple with while and Scanner but in Rust I have no clue.
I am fairly new to Rust so please help me.
Thanks for your help in advance
Solution
Here is my modified version of @netwave's code:
use std::fs;
use std::io::{BufRead, BufReader, Error};
fn main() -> Result<(), Error> {
let buff_reader = BufReader::new(fs::File::open(file)?);
for line in buff_reader.lines() {
let parsed = sscanf::scanf!(line?, "{} {}", String, i32);
println!("{:?}\n", parsed);
}
Ok(())
}
You can use the BufRead trait, which has a read_line method. You can also use lines.
The easiest option is to wrap the File instance in a BufReader:
use std::fs;
use std::io::{BufRead, BufReader};
...
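let mut buff_reader = BufReader::new(fs::File::open(path)?);
loop {
let mut buff = String::new();
// read_line returns Ok(0) at end of file; stop there instead of looping forever
if buff_reader.read_line(&mut buff)? == 0 {
break;
}
println!("{}", buff);
}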
Once you have each line, you can use the sscanf crate to parse it into the types you need:
let parsed = sscanf::scanf!(buff, "{} {}", String, i32);
Based on: https://doc.rust-lang.org/rust-by-example/std_misc/file/read_lines.html
Given a data.txt that contains:
str1 100
str2 200
str3 300
use std::fs::File;
use std::io::{self, BufRead};
use std::path::Path;
fn main() {
// data.txt must exist in the current directory before this produces output
if let Ok(lines) = read_lines("./data.txt") {
// Consumes the iterator, returns an (Optional) String
for line in lines {
if let Ok(data) = line {
let values: Vec<&str> = data.split(' ').collect();
match values.len() {
2 => {
let strdata = values[0].parse::<String>();
let intdata = values[1].parse::<i32>();
println!("Got: {:?} {:?}", strdata, intdata);
},
_ => panic!("Invalid input line {}", data),
};
}
}
}
}
// The output is wrapped in a Result to allow matching on errors
// Returns an Iterator to the Reader of the lines of the file.
fn read_lines<P>(filename: P) -> io::Result<io::Lines<io::BufReader<File>>>
where P: AsRef<Path>, {
let file = File::open(filename)?;
Ok(io::BufReader::new(file).lines())
}
Outputs:
Got: Ok("str1") Ok(100)
Got: Ok("str2") Ok(200)
Got: Ok("str3") Ok(300)
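The per-line parsing can also be done with the standard library alone, without the sscanf crate. A minimal sketch, assuming each line holds exactly one name and one integer separated by whitespace; parse_line is a hypothetical helper name:

```rust
/// Parse one "name number" line into a (String, i32) pair.
/// Returns None if a field is missing, the number is invalid,
/// or there are extra fields. (Hypothetical helper, for illustration only.)
fn parse_line(line: &str) -> Option<(String, i32)> {
    let mut fields = line.split_whitespace();
    let name = fields.next()?.to_string();
    let value = fields.next()?.parse().ok()?;
    // Reject lines with more than two fields.
    if fields.next().is_some() {
        return None;
    }
    Some((name, value))
}
```

It could be called on each line yielded by lines() in the code above. Returning an Option lets the caller decide whether a malformed line should be skipped or reported.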

Streaming version for Rust (nom 7) multiline parser

I'm trying to learn nom, but I'm also new to Rust.
I have a text file with words on each line.
I want to split it into 2 files: lines that are valid ASCII (without control codes) and everything else.
extern crate nom;
use std::fs::File;
use std::io::{prelude::*, BufReader};
fn main() -> std::io::Result<()> {
let file = File::open("/words.txt")?;
let reader = BufReader::new(file);
for line in reader.lines().filter_map(|result| result.ok()) {
parse(line);
}
Ok(())
}
fn parse(line: String) {
for c in line.chars() {
if c.is_ascii_control() | !c.is_ascii() {
println!("C> {}", line);
return;
}
}
if line.len() > 0 {
println!("A{}> {}", line.len(), line);
}
}
But the input file is too large for in-memory processing, so I should use nom's streaming functionality, like this.
How can I modify this code to combine a streaming buffer of limited capacity (1000 chars) with the line_ending check?
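For comparison, here is a sketch that does not use nom at all. It relies on BufRead::read_until from the standard library, which keeps only one line in memory at a time, and it works on bytes so that invalid UTF-8 lines are classified instead of being dropped. The names is_clean_ascii and split_lines are made up for this illustration, and the 1000-character cap from the question is not implemented here:

```rust
use std::io::{self, BufRead, Write};

/// True when every byte is ASCII and none is a control code.
fn is_clean_ascii(line: &[u8]) -> bool {
    line.iter().all(|b| b.is_ascii() && !b.is_ascii_control())
}

/// Copy each non-empty line of `input` to `ascii` or `other`.
/// Only one line is buffered at a time. (Hypothetical helper.)
fn split_lines<R, A, B>(mut input: R, mut ascii: A, mut other: B) -> io::Result<()>
where
    R: BufRead,
    A: Write,
    B: Write,
{
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until returns Ok(0) once the input is exhausted.
        if input.read_until(b'\n', &mut line)? == 0 {
            return Ok(());
        }
        // Strip the line ending (LF or CR LF) before classifying.
        if line.last() == Some(&b'\n') {
            line.pop();
        }
        if line.last() == Some(&b'\r') {
            line.pop();
        }
        if line.is_empty() {
            continue;
        }
        let out: &mut dyn Write = if is_clean_ascii(&line) {
            &mut ascii
        } else {
            &mut other
        };
        out.write_all(&line)?;
        out.write_all(b"\n")?;
    }
}
```

In a real program the three arguments would presumably be a BufReader around the input file and two BufWriter-wrapped output files.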

Why does a generic function replicating C's fread for unsigned integers always return zero?

I am trying to read in binary 16-bit machine instructions from a 16-bit architecture (the exact nature of that is irrelevant here), and print them back out as hexadecimal values. In C, I found this simple by using the fread function to read 16 bits into a uint16_t.
I figured that I would try to replicate fread in Rust. It seems to be reasonably trivial if I can know ahead-of-time the exact size of the variable that is being read into, and I had that working specifically for 16 bits.
I decided that I wanted to try to make the fread function generic over the various built-in unsigned integer types. For that I came up with the below function, using some traits from the Num crate:
fn fread<T>(
buffer: &mut T,
element_count: usize,
stream: &mut BufReader<File>,
) -> Result<usize, std::io::Error>
where
T: num::PrimInt + num::Unsigned,
{
let type_size = std::mem::size_of::<T>();
let mut buf = Vec::with_capacity(element_count * type_size);
let buf_slice = buf.as_mut_slice();
let bytes_read = match stream.read_exact(buf_slice) {
Ok(()) => element_count * type_size,
Err(ref e) if e.kind() == std::io::ErrorKind::UnexpectedEof => 0,
Err(e) => panic!("{}", e),
};
*buffer = buf_slice
.iter()
.enumerate()
.map(|(i, &b)| {
let mut holder2: T = num::zero();
holder2 = holder2 | T::from(b).expect("Casting from u8 to T failed");
holder2 << ((type_size - i) * 8)
})
.fold(num::zero(), |acc, h| acc | h);
Ok(bytes_read)
}
The issue is that when I call it in the main function, I seem to always get 0x00 back out, but the number of bytes read that is returned by the function is always 2, so that the program enters an infinite loop:
extern crate num;
use std::fs::File;
use std::io::BufReader;
use std::io::prelude::Read;
fn main() -> Result<(), std::io::Error> {
let cmd_line_args = std::env::args().collect::<Vec<_>>();
let f = File::open(&cmd_line_args[1])?;
let mut reader = BufReader::new(f);
let mut instructions: Vec<u16> = Vec::new();
let mut next_instruction: u16 = 0;
fread(&mut next_instruction, 1, &mut reader)?;
let base_address = next_instruction;
while fread(&mut next_instruction, 1, &mut reader)? > 0 {
instructions.push(next_instruction);
}
println!("{:#04x}", base_address);
for i in instructions {
println!("0x{:04x}", i);
}
Ok(())
}
It appears to me that I'm somehow never reading anything from the file, so the function always just returns the number of bytes it was supposed to read. I'm clearly not using something correctly here, but I'm honestly unsure what I'm doing wrong.
This is compiled on Rust 1.26 stable for Windows if that matters.
What am I doing wrong, and what should I do differently to replicate fread? I realise that this is probably a case of the XY problem (in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer), but I'm really curious as to what I'm doing wrong here.
Your problem is that this line:
let mut buf = Vec::with_capacity(element_count * type_size);
creates a zero-length vector, even though it allocates memory for element_count * type_size bytes. Therefore you are asking stream.read_exact to read zero bytes. One way to fix this is to replace the above line with:
let mut buf = vec![0; element_count * type_size];
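The difference between the two constructors is easy to check in isolation. A small sketch; buffer_lengths is a hypothetical helper that only reports the slice lengths read_exact would see:

```rust
/// Return the slice lengths produced by `Vec::with_capacity(n)` and `vec![0; n]`.
/// (Hypothetical helper, for illustration only.)
fn buffer_lengths(n: usize) -> (usize, usize) {
    let reserved: Vec<u8> = Vec::with_capacity(n); // memory reserved, but no elements
    let zeroed: Vec<u8> = vec![0; n]; // n elements, all zero
    (reserved.as_slice().len(), zeroed.as_slice().len())
}
```

For n = 2 this returns (0, 2): read_exact on the first slice has nothing to fill and succeeds immediately, which matches the behaviour described in the question.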
Side note: when the read succeeds, bytes_read receives the number of bytes you expected to read, not the number of bytes you actually read. You should probably use std::mem::size_of_val(buf_slice) to get the true byte count.
"in that there's almost certainly a better Rust way to repeatedly read some bytes from a file and pack them into one unsigned integer"
Yes, use the byteorder crate. This requires no unneeded heap allocation (the Vec in the original code):
extern crate byteorder;
use byteorder::{LittleEndian, ReadBytesExt};
use std::{
fs::File, io::{self, BufReader, Read},
};
fn read_instructions_to_end<R>(mut rdr: R) -> io::Result<Vec<u16>>
where
R: Read,
{
let mut instructions = Vec::new();
loop {
match rdr.read_u16::<LittleEndian>() {
Ok(instruction) => instructions.push(instruction),
Err(e) => {
return if e.kind() == std::io::ErrorKind::UnexpectedEof {
Ok(instructions)
} else {
Err(e)
}
}
}
}
}
fn main() -> Result<(), std::io::Error> {
let name = std::env::args().skip(1).next().expect("no file name");
let f = File::open(name)?;
let mut f = BufReader::new(f);
let base_address = f.read_u16::<LittleEndian>()?;
let instructions = read_instructions_to_end(f)?;
println!("{:#04x}", base_address);
for i in &instructions {
println!("0x{:04x}", i);
}
Ok(())
}
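The same idea can be written without an extra crate on newer compilers, since the integer types gained from_le_bytes and from_be_bytes in Rust 1.32 (later than the 1.26 mentioned in the question). A minimal sketch, assuming little-endian data as in the byteorder example; read_u16_le is a hypothetical helper name:

```rust
use std::io::{self, Read};

/// Read one little-endian u16, or return Ok(None) at end of input.
/// (Hypothetical helper, for illustration only.)
fn read_u16_le<R: Read>(reader: &mut R) -> io::Result<Option<u16>> {
    let mut buf = [0u8; 2]; // fixed-size stack buffer, no heap allocation
    match reader.read_exact(&mut buf) {
        Ok(()) => Ok(Some(u16::from_le_bytes(buf))),
        // Fewer than two bytes were left: treat it as end of input.
        Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => Ok(None),
        Err(e) => Err(e),
    }
}
```

A loop such as while let Some(instruction) = read_u16_le(&mut reader)? { ... } would then replace the fread loop from the question. Note that a single trailing byte is silently dropped, just as in the byteorder version.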

How do I use include_str! for multiple files or an entire directory?

I would like to copy an entire directory to a location in a user's $HOME. Individually copying files to that directory is straightforward:
let contents = include_str!("resources/profiles/default.json");
let fpath = dpath.join(&fname);
fs::write(fpath, contents).expect(&format!("failed to create profile: {}", n));
I haven't found a way to adapt this to multiple files:
for n in ["default"] {
let fname = format!("{}{}", n, ".json");
let x = format!("resources/profiles/{}", fname).as_str();
let contents = include_str!(x);
let fpath = dpath.join(&fname);
fs::write(fpath, contents).expect(&format!("failed to create profile: {}", n));
}
...the compiler complains that x must be a string literal.
As far as I know, there are two options:
Write a custom macro.
Replicate the first code for each file I want to copy.
What is the best way of doing this?
I would create a build script that iterates through a directory, building up an array of tuples containing the name and another macro call to include the raw data:
use std::{
env,
error::Error,
fs::{self, File},
io::Write,
path::Path,
};
const SOURCE_DIR: &str = "some/path/to/include";
fn main() -> Result<(), Box<dyn Error>> {
let out_dir = env::var("OUT_DIR")?;
let dest_path = Path::new(&out_dir).join("all_the_files.rs");
let mut all_the_files = File::create(&dest_path)?;
writeln!(&mut all_the_files, r##"["##,)?;
for f in fs::read_dir(SOURCE_DIR)? {
let f = f?;
if !f.file_type()?.is_file() {
continue;
}
writeln!(
&mut all_the_files,
r##"("{name}", include_bytes!(r#"{name}"#)),"##,
name = f.path().display(),
)?;
}
writeln!(&mut all_the_files, r##"]"##,)?;
Ok(())
}
This has some weaknesses, namely that it requires the path to be expressible as a &str. Since you were already using include_str!, I don't think that's an extra requirement. This also means that the generated string has to be a valid Rust string. We use raw strings inside the generated file, but this can still fail if a filename contains the sequence "#, which would end the raw string early. A better solution would probably use str::escape_default.
Since we are including files, I used include_bytes! instead of include_str!, but you can switch back if you really need to. Including the raw bytes skips UTF-8 validation at compile time, so it's a small win.
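The escaping suggested above might look like the following. This is only a sketch: tuple_entry is a hypothetical helper that builds one line of the generated file, and it uses ordinary string literals rather than raw strings so that the escapes are interpreted:

```rust
/// Build one `("name", include_bytes!("name")),` entry for the generated file.
/// `escape_default` escapes quotes, backslashes and non-ASCII characters,
/// so the result stays a valid Rust string literal.
/// (Hypothetical helper, for illustration only.)
fn tuple_entry(path: &str) -> String {
    let escaped = path.escape_default().to_string();
    format!("(\"{0}\", include_bytes!(\"{0}\")),", escaped)
}
```

In the build script, the writeln! inside the loop could then write tuple_entry(...) for each file instead of the raw-string template.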
Using it involves importing the generated value:
const ALL_THE_FILES: &[(&str, &[u8])] = &include!(concat!(env!("OUT_DIR"), "/all_the_files.rs"));
fn main() {
for (name, data) in ALL_THE_FILES {
println!("File {} is {} bytes", name, data.len());
}
}
See also:
How can I locate resources for testing with Cargo?
You can use the include_dir macro from the include_dir crate.
use include_dir::{include_dir, Dir};
use std::path::Path;
const PROJECT_DIR: Dir = include_dir!(".");
// of course, you can retrieve a file by its full path
let lib_rs = PROJECT_DIR.get_file("src/lib.rs").unwrap();
// you can also inspect the file's contents
let body = lib_rs.contents_utf8().unwrap();
assert!(body.contains("SOME_INTERESTING_STRING"));
Using a macro:
macro_rules! incl_profiles {
( $( $x:expr ),* ) => {
{
let mut profs = Vec::new();
$(
profs.push(($x, include_str!(concat!("resources/profiles/", $x, ".json"))));
)*
profs
}
};
}
...
let prof_tups: Vec<(&str, &str)> = incl_profiles!("default", "python");
for (prof_name, prof_str) in prof_tups {
let fname = format!("{}{}", prof_name, ".json");
let fpath = dpath.join(&fname);
fs::write(fpath, prof_str).expect(&format!("failed to create profile: {}", prof_name));
}
Note: This is not dynamic. The files ("default" and "python") are specified in the call to the macro.
