Parsing number with nom 5.0 - rust

I'm trying to parse a large file (tens of GB) streaming using Nom 5.0. One piece of the parser tries to parse numbers:
use nom::IResult;
use nom::character::streaming::{char, digit1};
// use nom::character::complete::{char, digit1};
use nom::combinator::{map, opt};
use nom::multi::many1;
use nom::sequence::{preceded, tuple};
pub fn number(input: &str) -> IResult<&str, &str> {
map(
tuple((
opt(char('-')),
many1(digit1),
opt(preceded(char('.'), many1(digit1)))
)),
|_| "0"
)(input)
}
(Obviously, it should not return "0" for all number; that's just to make the function as simple as possible.) For this parser, I wrote a test:
#[test]
fn match_positive_integer() {
let (_, res) = number("0").unwrap();
assert_eq!("0", res);
}
This test fails with Incomplete(Size(1)) because the "decimals" opt() wants to read data and it isn't there. If I switch to the complete versions of the matchers (as commented-out line), the test passes.
I assume this will actually work in production, because it will be fed additional data when complaining about incompleteness, but I would still like to create unit tests. Additionally, the issue would occur in production if a number happened to be the very last bit of input in a file. How do I convince a streaming Nom parser that there is no more data available?

One can argue that the test in its original form is correct: The parser can't decide whether the given input is a number or not, so the parsing-result is in fact undecided yet. In production, especially when reading large files as you do, the buffer of already-read-but-to-be-parsed bytes might end right in between what could be a number unless it's actually not. Then, the parser needs to preserve its current state and ask for more input so it can retry/continue. Think of Incomplete not as a final error but as I don't even know: This could be an error depending on the next byte, this problem is undecidable as of yet!.
You can use the complete-combinator on your top-level parser so when you do in fact reach EOF, you error out on that. Incomplete-results within the top-level parser should be handled e.g. by expanding the read-buffer by some margin and retrying.
You can wrap the parser in a complete()-parser local to the current unittest and test on that. Something to the tune of
#[test]
fn match_positive_integer() {
let (_, res) = complete(number("0")).unwrap();
assert_eq!("0", res);
}

Related

How to immediately early return when Iterator gives error Result in a loop?

I'm in progress of learning Rust and I'm trying to apply similar style I'm using with C++. With C++ I would be using RAII and exceptions and the closest I can get with Rust is RAII combined with early return with error Result. As such, when I write code such as
fn parse_meminfo() -> Result<Meminfo, io::Error> {
// ...
let file = File::open("/proc/meminfo")?;
for line in io::BufReader::new(file).lines() {
let line = line?;
// ...
Is it possible to somehow avoid the fragment let line = line?; of code? I already tried
for line in io::BufReader::new(file).lines()?
and
for line? in io::BufReader::new(file).lines()
and
for Ok(line) in io::BufReader::new(file).lines()
and
for line in io::BufReader::new(file).lines().expect("failed to read line")
but rustc wasn't happy with any of those. First and fourth have the test in logically wrong position because ? and expect() are testing the iterator instead of the item, if I've understood correctly. Second was syntax error, and the syntax Ok(line) is not happy to ignore the error because that doesn't mean early return. I would like to have early return with the error if the iterator in the loop gives an error and adding one extra line of code just for that seems silly and that test seems a bit detached from the actual loop. Can you suggest anything better? If the let line = line?; is indeed the idiomatic style, then I guess I would have to learn to love it.
Is it possible to somehow avoid the fragment let line = line?; of code?
Depending how you're using line, you could just tack on the ? over there. expr? simply desugars to1
match expr {
Ok(v) => v,
Err(e) => return e.into()
}
So you can see how the second attempt would not work (the left hand side of the for...in is an irrefutable pattern, not an expression).
The error handling here is largely orthogonal to the iteration, and iteration doesn't (at this juncture, though it seems unlikely that'll be added) have first-class support for error signaling, so you have an Iterator<Item=Result<...>> and you handle the Result however you wish.
If you don't care for the error reporting you could always .lines().map(|v| v.expect(...) and that'll immediately panic at the first error, but I don't know that that's what you're looking for.
1: that's not quite true anymore because of the `Try` trait, but close enough still

How to reduce flicker in terminal re-drawing?

I have a program that displays the state of some commands ran in parallel
fmt ✔
clippy cargo clippy --tests --color always ...
tests cargo test --color always ..
The program is my first one that relies on multi-threading, and I have some threads running those programs as soon as they are "available", and I have one thread (the main one) dedicated to waiting for new results (which are pretty rare, given that jobs tend to run for at leat a few seconds, and there a relatively few jobs, 10 in parallel at most) and deleting & reprinting in a loop the state of things.
In this part of the software, I don't print the output of the commands, just the commands being ran and some ascii spinner.
I don't know how these things should be done, so I managed to limit redraws to at least 40ms :
const AWAIT_TIME: Duration = std::time::Duration::from_millis(40);
fn delay(&mut self) -> usize {
let time_for = AWAIT_TIME
- SystemTime::now()
.duration_since(self.last_occurence)
.unwrap();
let millis: usize = std::cmp::max(time_for.as_millis() as usize, 0);
if millis != 0 {
sleep(time_for);
}
self.last_occurence = SystemTime::now();
millis
}
while let Some(progress) = read(&rx) { ... }
job_display.refresh(&tracker, delay);
delay = job_starter.delay();
So I end up tracking the number of lines and chars written and delete them all :
struct TermWrapper {
term: Box<StdoutTerminal>,
written_lines: u16,
written_chars: usize,
}
...
pub fn clear(&mut self) {
(0..self.written_lines as usize).for_each(|_| {
self.term.cursor_up().unwrap();
self.term.carriage_return().unwrap();
self.term.delete_line().unwrap();
});
self.written_lines = 0;
self.written_chars = 0;
}
It works, but it tends to flicker, especially in embedded terminals.
My next idea is to store the hash of printed string and skip the redraw if I can.
Are there some known patterns I can apply to get some nicer output ?
What are the common strategies I can use ?
The minimum requirement to guarantee no flicker when updating a terminal is: don't send one thing and then overwrite it with something else (within a single 'frame' of drawing). In the case of clearing, we can restate that rule more specifically: don't clear the regions that you're going to put text in. Instead, clear only regions that you know you aren't putting text in (in case there is previous text there).
The conventional terminal command set contains a very useful tool for this: the “clear to end of line” command. The way you can use it is:
Move the cursor to the beginning of a line you want to replace the text in.
Write the text, without any newline or CRLF at the end
Write “clear to end of line”. (In crossterm, that's ClearType::UntilNewLine.)
After sending the clear command, the rest of the line is cleared (just as if you had happened to write the exact number of spaces to completely fill the line). In this way, you need to keep track of which lines you're writing on, but you don't need to keep track of the exact width of each string you wrote.
The next step beyond this, useful for arbitrary 2D screen layouts, is to remember what text has previously been sent to the terminal, and only send what needs to be changed — in Rust, the tui crate provides this, and you can also find bindings to the well-known C library curses for the same purpose.

How to get keyboard input with Termion in Rust?

I'm trying out Rust and I really like it so far. I'm working on a tool that needs to get arrow key input from the user. So far, I've got something half-working: if I hold a key for a while, the relevant function gets called. However, it's far from instantaneous.
What I've got so far:
let mut stdout = io::stdout().into_raw_mode();
let mut stdin = termion::async_stdin();
// let mut stdin = io::stdin();
let mut it = stdin.keys(); //iterator object
loop {
//copied straight from GitLab: https://gitlab.redox-os.org/redox-os/termion/-/issues/168
let b = it.next();
match b {
Some(x) => match x {
Ok(k) => {
match k {
Key::Left => move_cursor(&mut cursor_char, -1, &enc_chars, &mpt, &status),
Key::Right => move_cursor(&mut cursor_char, 1, &enc_chars, &mpt, &status),
Key::Ctrl('c') => break,
_ => {}
}
},
_ => {}
},
None => {}
}
//this loop might do nothing if no recognized key was pressed.
}
I don't quite understand it myself. I'm using the terminal raw mode, if that has anything to do with it. I've looked at the rustyline crate, but that's really no good as it's more of an interactive shell-thing, and I just want to detect keypresses.
If you're using raw input mode and reading key by key, you'll need to manually buffer the character keys using the same kind of match loop you already have. The Key::Char(ch) enum variant can be used to match regular characters. You can then use either a mutable String or an array like [u8; MAX_SIZE] to store the character data and append characters as they're typed. If the user moves the cursor, you'd need to keep track of the current position within your input buffer and make sure to insert the newly typed characters into the correct spot, moving the existing characters if needed. It is a lot of work, which is why there are crates that will do it for you, but you will have less chance to control how the input behaves. If you want to use an existing crate, then tui-rs might be a good one to check out for a complete solution, or linefeed for something much simpler.
As for the delay, I think it might be because you're using AsyncReader, which according to the docs is using a secondary thread to do blocking reads

Warning function should have a snake case identifier on by default

I'm trying to figure out what this warning actually means. The program works perfectly but during compile I get this warning:
main.rs:6:1: 8:2 warning: function 'isMultiple' should have a snake case identifier,
#[warn(non_snake_case_functions)] on by default
the code is very simple:
/*
Find the sum of all multiples of 3 or 5 below 1000
*/
fn isMultiple(num: int) -> bool {
num % 5 == 0 || num % 3 == 0
}
fn main() {
let mut sum_of_multiples = 0;
//loop from 0..999
for i in range(0,1000) {
sum_of_multiples +=
if isMultiple(i) {
i
}else{
0
};
}
println!("Sum is {}", sum_of_multiples);
}
You can turn it off by including this line in your file. Check out this thread
#![allow(non_snake_case)]
Rust style is for functions with snake_case names, i.e. the compiler is recommending you write fn is_multiple(...).
My take on it is that programmers have been debating the case of names and other formatting over and over and over again, for decades. Everyone has their preferences, they are all different.
It's time we all grew up and realized that it's better we all use the same case and formatting. That makes code much easier to read for everyone in the long run. It's time to quit the selfish bickering over preferences. Just do as everyone else.
Personally I'm not much into underscores so the Rust standard upset me a bit. But I'm prepared to get use to it. Wouldn't it be great if we all did that and never had to waste time thinking and arguing about it again.
To that end I use cargo-fmt and clippy and I accept whatever they say to do. Job done, move on to next thing.
There is a method in the Rust convention:
Structs get camel case.
Variables get snake case.
Constants get all upper case.
Makes it easy to see what is what at a glance.
-ZiCog
Link to thread : https://users.rust-lang.org/t/is-snake-case-better-than-camelcase-when-writing-rust-code/36277/2

table of functions vs switch in golang

im am writing a simple emulator in go (should i? or should i go back to c?).
anyway, i am fetching the instruction and decoding it. at this point i have a byte like 0x81, and i have to execute the right function.
should i have something like this
func (sys *cpu) eval() {
switch opcode {
case 0x80:
sys.add(sys.b)
case 0x81:
sys.add(sys.c)
etc
}
}
or something like this
var fnTable = []func(*cpu) {
0x80: func(sys *cpu) {
sys.add(sys.b)
},
0x81: func(sys *cpu) {
sys.add(sys.c)
}
}
func (sys *cpu) eval() {
return fnTable[opcode](sys)
}
1.which one is better?
2.which one is faster?
also
3.can i declare a function inline?
4.i have a cpu struct in which i have the registers etc. would it be faster if i have the registers and all as globals? (without the struct)
thank you very much.
I did some benchmarks and the table version is faster than the switch version once you have more than about 4 cases.
I was surprised to discover that the Go compiler (gc, anyway; not sure about gccgo) doesn't seem to be smart enough to turn a dense switch into a jump table.
Update:
Ken Thompson posted on the Go mailing list describing the difficulties of optimizing switch.
The first version looks better to me, YMMV.
Benchmark it. Depends how good is the compiler at optimizing. The "jump table" version might be faster if the compiler doesn't try hard enough to optimize.
Depends on your definition of what is "to declare function inline". Go can declare and define functions/methods at the top level only. But functions are first class citizens in Go, so one can have variables/parameters/return values and structured types of function type. In all this places a function literal can [also] be assigned to the variable/field/element...
Possibly. Still I would suggest to not keep the cpu state in a global variable. Once you possibly decide to go emulating multicore, it will be welcome ;-)
If you have the ast of some expression, and you want to eval it for a big amount of data rows, then you may only once compile it into the tree of lambdas, and do not calculate any switches on each iteration at all;
For example, given such ast: {* (a, {+ (b, c)})}
Compile function (in very rough pseudo language) will be something like this:
func (e *evaluator) compile(brunch ast) {
switch brunch.type {
case binaryOperator:
switch brunch.op {
case *: return func() {compile(brunch.arg0) * compile(brunch.arg1)}
case +: return func() {compile(brunch.arg0) + compile(brunch.arg1)}
}
case BasicLit: return func() {return brunch.arg0}
case Ident: return func(){return e.GetIdent(brunch.arg0)}
}
}
So eventually compile returns the func, that must be called on each row of your data and there will be no switches or other calculation stuff at all.
There remains the question about operations with data of different types, that is for your own research ;)
This is an interesting approach, in situations, when there is no jump-table mechanism available :) but sure, func call is more complex operation then jump.

Resources