I am reading data from two text files:
let fil1_json = File::open("fil1.json")?;
let mut fil1_json_reader = BufReader::new(fil1_json);

let fil2_json = File::open("fil2.json")?;
let mut fil2_json_reader = BufReader::new(fil2_json);

for fil1_line in fil1_json_reader.by_ref().lines() {
    for fil2_line in fil2_json_reader.by_ref().lines() {
        println!("{:#?} ----- {:#?}", fil1_line, fil2_line);
    }
}
The inner nested loop only runs once. It looks like fil2_json_reader is getting emptied after the first iteration of the outer loop.
Why is it changing, when I am not changing it anywhere?
Readers consume the data. In the case of File, this is the natural expectation, since file abstractions almost universally have a cursor that advances every time you read.
If you want to iterate several times over the same data, the obvious option is to save it to memory (typically before splitting it into lines(), though you could also collect the lines into a vector, even if that is slower). However, since the reader is backed by an actual file, it is better to re-iterate over the file by seeking back to its beginning:
fil2_json_reader.seek(SeekFrom::Start(0))
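Note that the rewind has to happen on every pass of the outer loop. A minimal sketch of the fixed loop (with simple error propagation) might look like this:

use std::fs::File;
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom};

fn main() -> std::io::Result<()> {
    let fil1_json = File::open("fil1.json")?;
    let fil1_json_reader = BufReader::new(fil1_json);
    let fil2_json = File::open("fil2.json")?;
    let mut fil2_json_reader = BufReader::new(fil2_json);

    for fil1_line in fil1_json_reader.lines() {
        // Rewind the second reader so its lines can be read again
        // on every pass of the outer loop.
        fil2_json_reader.seek(SeekFrom::Start(0))?;
        for fil2_line in fil2_json_reader.by_ref().lines() {
            println!("{:#?} ----- {:#?}", fil1_line, fil2_line);
        }
    }
    Ok(())
}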
I have a program that displays the state of some commands run in parallel:
fmt ✔
clippy cargo clippy --tests --color always ...
tests cargo test --color always ..
The program is my first one that relies on multi-threading. I have some threads running those commands as soon as they become "available", and one thread (the main one) dedicated to waiting for new results (which are pretty rare, given that jobs tend to run for at least a few seconds and there are relatively few jobs, 10 in parallel at most) and to deleting and reprinting the state of things in a loop.
In this part of the software, I don't print the output of the commands, just the commands being run and an ASCII spinner.
I don't know how these things should be done, so I managed to limit redraws to one every 40 ms at most:
const AWAIT_TIME: Duration = std::time::Duration::from_millis(40);

fn delay(&mut self) -> usize {
    let time_for = AWAIT_TIME
        - SystemTime::now()
            .duration_since(self.last_occurence)
            .unwrap();
    let millis: usize = std::cmp::max(time_for.as_millis() as usize, 0);
    if millis != 0 {
        sleep(time_for);
    }
    self.last_occurence = SystemTime::now();
    millis
}
while let Some(progress) = read(&rx) { ... }
job_display.refresh(&tracker, delay);
delay = job_starter.delay();
So I end up tracking the number of lines and chars written and deleting them all:
struct TermWrapper {
    term: Box<StdoutTerminal>,
    written_lines: u16,
    written_chars: usize,
}

...

pub fn clear(&mut self) {
    (0..self.written_lines as usize).for_each(|_| {
        self.term.cursor_up().unwrap();
        self.term.carriage_return().unwrap();
        self.term.delete_line().unwrap();
    });
    self.written_lines = 0;
    self.written_chars = 0;
}
It works, but it tends to flicker, especially in embedded terminals.
My next idea is to store the hash of the printed string and skip the redraw when nothing has changed.
Are there some known patterns I can apply to get nicer output?
What are the common strategies I can use?
The minimum requirement to guarantee no flicker when updating a terminal is: don't send one thing and then overwrite it with something else (within a single 'frame' of drawing). In the case of clearing, we can restate that rule more specifically: don't clear the regions that you're going to put text in. Instead, clear only regions that you know you aren't putting text in (in case there is previous text there).
The conventional terminal command set contains a very useful tool for this: the “clear to end of line” command. The way you can use it is:
Move the cursor to the beginning of a line you want to replace the text in.
Write the text, without any newline or CRLF at the end.
Write “clear to end of line”. (In crossterm, that's ClearType::UntilNewLine.)
After sending the clear command, the rest of the line is cleared (just as if you had happened to write the exact number of spaces to completely fill the line). In this way, you need to keep track of which lines you're writing on, but you don't need to keep track of the exact width of each string you wrote.
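As a rough sketch of that pattern (using crossterm, with a hypothetical redraw function over whatever status lines you track), the per-frame update could look like this:

use std::io::{stdout, Write};

use crossterm::{
    cursor::MoveTo,
    style::Print,
    terminal::{Clear, ClearType},
    QueueableCommand,
};

// Redraw a fixed block of status lines starting at the top of the screen.
fn redraw(lines: &[String]) -> std::io::Result<()> {
    let mut out = stdout();
    for (row, line) in lines.iter().enumerate() {
        out.queue(MoveTo(0, row as u16))?;          // start of the line to replace
        out.queue(Print(line))?;                     // new text, no trailing newline
        out.queue(Clear(ClearType::UntilNewLine))?;  // wipe whatever old text is left over
    }
    out.flush() // send everything to the terminal in one go
}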
The next step beyond this, useful for arbitrary 2D screen layouts, is to remember what text has previously been sent to the terminal, and only send what needs to be changed — in Rust, the tui crate provides this, and you can also find bindings to the well-known C library curses for the same purpose.
I'm trying out Rust and I really like it so far. I'm working on a tool that needs to get arrow key input from the user. So far, I've got something half-working: if I hold a key for a while, the relevant function gets called. However, it's far from instantaneous.
What I've got so far:
let mut stdout = io::stdout().into_raw_mode();
let mut stdin = termion::async_stdin();
// let mut stdin = io::stdin();
let mut it = stdin.keys(); // iterator object

loop {
    // copied straight from GitLab: https://gitlab.redox-os.org/redox-os/termion/-/issues/168
    let b = it.next();
    match b {
        Some(x) => match x {
            Ok(k) => {
                match k {
                    Key::Left => move_cursor(&mut cursor_char, -1, &enc_chars, &mpt, &status),
                    Key::Right => move_cursor(&mut cursor_char, 1, &enc_chars, &mpt, &status),
                    Key::Ctrl('c') => break,
                    _ => {}
                }
            },
            _ => {}
        },
        None => {}
    }
    // this loop might do nothing if no recognized key was pressed.
}
I don't quite understand it myself. I'm using the terminal raw mode, if that has anything to do with it. I've looked at the rustyline crate, but that's really no good as it's more of an interactive shell-thing, and I just want to detect keypresses.
If you're using raw input mode and reading key by key, you'll need to manually buffer the character keys using the same kind of match loop you already have. The Key::Char(ch) enum variant can be used to match regular characters. You can then use either a mutable String or an array like [u8; MAX_SIZE] to store the character data and append characters as they're typed. If the user moves the cursor, you'd need to keep track of the current position within your input buffer and make sure to insert the newly typed characters into the correct spot, moving the existing characters if needed. It is a lot of work, which is why there are crates that will do it for you, but you will have less chance to control how the input behaves. If you want to use an existing crate, then tui-rs might be a good one to check out for a complete solution, or linefeed for something much simpler.
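As an illustration, here is a minimal sketch of that manual buffering (assuming termion's Key enum; cursor tracking and mid-string insertion are left out for brevity):

use termion::event::Key;

// Accumulate typed characters into a String while in raw mode.
fn handle_key(buffer: &mut String, key: Key) {
    match key {
        Key::Char(ch) => buffer.push(ch), // append the typed character
        Key::Backspace => {
            buffer.pop(); // drop the last character, if any
        }
        _ => {} // arrow keys etc. would adjust a cursor index here
    }
}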
As for the delay, I think it might be because you're using AsyncReader, which according to the docs uses a secondary thread to do blocking reads.
I need to profile several variables like frames per second being rendered in my app. Therefore I need a simple way to update variables in the terminal.
I've searched and found ascii_table for generating tables, and termion for updating the terminal. But I suspect termion here is simply being used to clear the terminal.
Anyway, I was able to draw a simple table and update its contents every 200 milliseconds:
use ascii_table::{Align, AsciiTable, Column};

extern crate termion;
use termion::{clear, color, cursor};

use std::fmt::Display;
use std::{thread, time};

fn main() {
    let mut ascii_table = AsciiTable::default();
    ascii_table.max_width = 40;

    let mut column = Column::default();
    column.header = "H1".into();
    column.align = Align::Left;
    ascii_table.columns.insert(0, column);

    let mut column = Column::default();
    column.header = "H2".into();
    column.align = Align::Center;
    ascii_table.columns.insert(1, column);

    let mut column = Column::default();
    column.header = "H3".into();
    column.align = Align::Right;
    ascii_table.columns.insert(2, column);

    let mut i = 0;
    while (true) {
        let data: Vec<Vec<&dyn Display>> = vec![
            vec![&i, &"hello", &789],
        ];
        let s = ascii_table.format(data.clone());
        println!(
            "\n{}{}{}{}",
            cursor::Hide,
            clear::All,
            cursor::Goto(1, 1),
            s
        );
        println!("Hello"); // couldn't make this appear on top.
        i = i + 1;
        std::thread::sleep(std::time::Duration::from_millis(200));
    }
}
Is this the way programs like top update data on the terminal? Or is there a better way? It'd be nice to have more complex structures.
There isn't a fundamentally better way to format complex data on a terminal than you're doing. There are some individual refinements that can be made to improve display quality.
In particular, to reduce flickering, it is best to overwrite text rather than clearing the entire terminal first, and to clear only the parts that either need to become blank or are already blank. Use narrower clear operations such as clear to end of line, which is appropriate when you're replacing a line that might become shorter; by putting this clear at the end of the text, an unchanged line doesn't disappear briefly.
Since you're starting with code that generates multiline text, you'll need to edit the string:
// Insert clear-to-end-of-line into the table
let s = ascii_table.format(data.clone())
    .replace("\n", &format!("{}\n", clear::UntilNewline));

println!(
    "\n{}{}{}{}",
    cursor::Hide,
    cursor::Goto(1, 1),
    s,
    clear::AfterCursor,
);
Notice that I have reordered the operations: first, we go to (1, 1) and draw the text (clearing to end of line as we go). Then when everything is done, we clear from cursor to end of screen. This way, we're never clearing any of the text we want to be still present, so there will be no flicker.
I notice you have another wish:
println!("Hello");//couldn't make this appear on top.
All you need to do here is print it after the goto and before the table, include the clearing, and it'll work as you'd like.
// Move cursor to top. This always goes first.
// Note print!, not println!, since we don't want to move down
// after the goto.
print!("{}{}", cursor::Hide, cursor::Goto(1, 1));
// Use clear::UntilNewline on intermediate things
println!("Hello{}", clear::UntilNewline);
// ...even if they are blank lines
println!("{}", clear::UntilNewline);
// Use clear::AfterCursor on the *last* thing printed
// Note print!, not println!, since if we are filling the entire terminal
// we don't want to cause it to scroll down.
print!("{}{}", s, clear::AfterCursor);
One thing I haven't covered is that you can also use termion::cursor::Goto to move to specific areas on the terminal to update them, instead of writing entire lines top-to-bottom. This is of course more complex since your program has to comprehend the entire layout to know what cursor position to go to, and know which parts need to be redrawn. In the days of actual serial terminals and modems that had very low data rates, this was a very important optimization to avoid wasting transmission time on characters that were the same — today, it's less critical.
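For example (a hypothetical helper with made-up coordinates), a single field could be refreshed in place like this:

use std::io::{stdout, Write};
use termion::{clear, cursor};

// Overwrite one value at a fixed position instead of redrawing every line.
// Note that clear::UntilNewline also wipes anything further right on that row.
fn update_cell(row: u16, col: u16, value: &str) {
    print!("{}{}{}", cursor::Goto(col, row), value, clear::UntilNewline);
    stdout().flush().unwrap();
}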
I'm fairly new to Rust. I graduated with a Computer Engineering degree 4 years ago, and I remember discussing (and understanding) atomic operations in my Operating Systems course. However, since graduating, I've been working primarily in high-level languages where I haven't had to care about low-level stuff like atomics. Now that I'm getting into Rust, I'm struggling to remember how a lot of this stuff works.
I'm currently trying to understand the source code for the hibitset library, specifically atomic.rs.
This module specifies an AtomicBitSet type which corresponds to the BitSet type from lib.rs, but using atomic values and operations. From my understanding, an "atomic operation" is an operation that is guaranteed to not be interrupted by another thread; any "load" or "store" on the same value will have to wait for the operation to finish before proceeding. Following from this definition, an "atomic value" is a value whose operations are fully atomic. AtomicBitSet uses AtomicUsize, which is a usize wrapper where all methods are fully atomic. However, AtomicBitSet specifies several operations that seem to not be atomic (add and remove), and there is one atomic operation: add_atomic. Looking at add vs add_atomic, I can't really tell what the difference is.
Here is add (verbatim):
/// Adds `id` to the `BitSet`. Returns `true` if the value was
/// already in the set.
#[inline]
pub fn add(&mut self, id: Index) -> bool {
    use std::sync::atomic::Ordering::Relaxed;

    let (_, p1, p2) = offsets(id);
    if self.layer1[p1].add(id) {
        return true;
    }

    self.layer2[p2].store(self.layer2[p2].load(Relaxed) | id.mask(SHIFT2), Relaxed);
    self.layer3
        .store(self.layer3.load(Relaxed) | id.mask(SHIFT3), Relaxed);
    false
}
This method calls load() and store() directly. I'm assuming that the fact that it's using Ordering::Relaxed is what makes this method non-atomic, because another thread doing the same thing to a different index might clobber this operation.
Here is add_atomic (verbatim):
/// Adds `id` to the `AtomicBitSet`. Returns `true` if the value was
/// already in the set.
///
/// Because we cannot safely extend an AtomicBitSet without unique ownership
/// this will panic if the Index is out of range.
#[inline]
pub fn add_atomic(&self, id: Index) -> bool {
    let (_, p1, p2) = offsets(id);

    // While it is tempting to check of the bit was set and exit here if it
    // was, this can result in a data race. If this thread and another
    // thread both set the same bit it is possible for the second thread
    // to exit before l3 was set. Resulting in the iterator to be in an
    // incorrect state. The window is small, but it exists.
    let set = self.layer1[p1].add(id);
    self.layer2[p2].fetch_or(id.mask(SHIFT2), Ordering::Relaxed);
    self.layer3.fetch_or(id.mask(SHIFT3), Ordering::Relaxed);
    set
}
This method uses fetch_or instead of calling load and store directly, which I'm assuming is what makes this method atomic.
But why does the usage of Ordering::Relaxed still allow this to be considered atomic? I realize that the individual "or" operations are atomic, but the full method could be run at the same time as another thread. Wouldn't that have an impact?
Moreover, why would a type like this expose non-atomic methods? Is it just for performance? That seems confusing to me. If I were to pick an AtomicBitSet over a BitSet because it's going to be used by more than one thread, I'd probably want to only use atomic operations on it. If I didn't I wouldn't be using it. Right?
I'd also love an explanation of the comment inside add_atomic. As-is it does not make sense to me. Doesn't the non-atomic version still have to care about that? It seems like the two methods are doing effectively the same thing, just with different levels of atomicity.
I'd really just love some help wrapping my head around atomics. I think I understand ordering after reading this and this, but both are still using concepts that I don't understand. When they talk about one thread "seeing" something from another, what does that mean exactly? When it's said that sequentially-consistent operations have the same order "across all threads" what does that even mean? Does the processor change the instruction order differently for different threads?
In the non-atomic case, this line:
self.layer2[p2].store(self.layer2[p2].load(Relaxed) | id.mask(SHIFT2), Relaxed);
is more or less equivalent to:
let tmp1 = self.layer2[p2];
let tmp2 = tmp1 | id.mask(SHIFT2);
self.layer2[p2] = tmp2;
so another thread could change self.layer2[p2] between the moment it is read into tmp1 and the moment tmp2 is stored into it. So if another thread tries to set another bit at the same time, there is a risk that the following sequence occurs:
thread 1 reads an empty mask,
thread 2 reads an empty mask,
thread 1 sets bit 1 of the mask and writes it,
thread 2 sets bit 2 of the mask and writes it, thus overwriting the value set by thread 1,
in the end only bit 2 is set!
The same goes for self.layer3.
In the atomic case, the use of fetch_or guarantees that the whole read-modify-write cycle is atomic.
In both cases, since the ordering is relaxed, the writes to layer2 and layer3 may seem to occur in any order as seen from other threads.
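To make the contrast concrete, here is a small standalone sketch (not taken from hibitset) of the two ways of setting a bit in a shared word:

use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

// Non-atomic read-modify-write: another thread can store between our
// load and our store, and its bit is then silently lost.
fn set_bit_racy(word: &AtomicUsize, mask: usize) {
    word.store(word.load(Relaxed) | mask, Relaxed);
}

// Atomic read-modify-write: the load, OR, and store happen as one
// indivisible operation, so concurrent callers never lose each other's bits.
fn set_bit_atomic(word: &AtomicUsize, mask: usize) {
    word.fetch_or(mask, Relaxed);
}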
The comment inside add_atomic is meant to avoid an issue when two threads try to add the same bit. Assume that add_atomic was written like this:
pub fn add_atomic(&self, id: Index) -> bool {
    let (_, p1, p2) = offsets(id);
    if self.layer1[p1].add(id) {
        return true;
    }
    self.layer2[p2].fetch_or(id.mask(SHIFT2), Ordering::Relaxed);
    self.layer3.fetch_or(id.mask(SHIFT3), Ordering::Relaxed);
    false
}
Then you risk the following sequence:
thread 1 sets bit 1 in layer1 and sees that it wasn't set beforehand,
thread 2 tries to set bit 1 in layer1 and sees that thread 1 already set it, so thread 2 returns from add_atomic,
thread 2 executes another operation that requires reading layer3, but layer3 has not been updated yet, so thread 2 gets a wrong value!
thread 1 updates layer3, but it is too late.
This is why the add_atomic case ensures that layer2 and layer3 are set properly in all threads even if it looked like the bit was already set beforehand.
So I've been searching for a solution to this problem for some time. I've written a program to take data from two separate text files, parse it, and output to another text file and an ARFF file for analysis by Weka. The problem I'm running into is that the function I wrote to handle the data reading and parsing doesn't de-allocate memory properly. Every successive call uses an additional 100MB or so, and I need to call this function over 60 times over the course of the program. Is there a way to force D to de-allocate memory, with respect to arrays, dynamic arrays, and associative arrays in particular?
An example of my problem:
struct Datum {
    string Foo;
    int Bar;
}

Datum[] Collate() {
    Datum[] data;
    int[] userDataSet;
    int[string] secondarySet;
    string[] raw = splitLines(readText(readFile)).dup;

    foreach (r; raw) {
        userDataSet ~= parse(r);
        secondarySet[r.split(",").dup] = parseSomeOtherWay(r);
    }

    data = doSomeOtherCalculation(userDataSet, secondarySet);
    return data;
}
Are the strings in the returned data still pointing inside the original text file?
Array slicing operations in D do not make a copy of the data - instead, they just store a pointer and length. This also applies to splitLines, split, and possibly to doSomeOtherCalculation. This means that as long as a substring of the original file text exists anywhere in the program, the entire file's contents cannot be freed.
If the data you're returning is only a small fraction of the size of the text file you're reading, you can use .dup to make a copy of the string. This will prevent the small strings from pinning the entire file's contents in memory.
If the content of the Collate() result is duplicated after the call, it is probably not collected by the GC and thus stays in memory even though it is no longer used. If so, you can use a global container that you reset on each call to Collate():
void Collate(out Datum[] data) {
    // data content is cleared because of the 'out' param storage class
    // your processing to fill data
}