Is there a FuturesOrdered alternative that yields results one by one?

In Rust, I have a bunch of async functions that I want to execute in parallel. The order in which the results of these functions are handled is important. I also want to retrieve the results of these functions as they become available.
Let me try to explain.
Here is the description of FuturesOrdered:
This "combinator" is similar to FuturesUnordered, but it imposes an order on top of the set of futures. While futures in the set will race to completion in parallel, results will only be returned in the order their originating futures were added to the queue.
So far so good. Now look at this example:
let mut ft = FuturesOrdered::new();
ft.push(wait_n(1)); // wait_n sleeps
ft.push(wait_n(2)); // for the given
ft.push(wait_n(4)); // number of secs
ft.push(wait_n(3));
ft.push(wait_n(5));
let r = ft.collect::<Vec<u64>>().await;
Since FuturesOrdered awaits until all futures are completed, this is what I get:
|--| ++
|----| ++
|--------| ++
|------| ++
|----------|++
++-> all results available here
This is what I want:
|--|++
|----|++
|--------|++
|------| ++
|----------| ++
In other words, I want to await the next future while the remaining futures keep racing to completion. Also note that even though task #4 completed before task #3, it was handled after #3 because of the initial order.
How can I get a stream of futures that are executed concurrently like this? I was hoping for something like this:
let mut ft = MagicalStreamOfOrderedFutures::new();
ft.push(wait_n(1));
ft.push(wait_n(2));
ft.push(wait_n(4));
ft.push(wait_n(3));
ft.push(wait_n(5));
while let Some(result) = ft.next().await {
// returns results in order at seconds 1,2,4,4,5
}

Since FuturesOrdered awaits until all futures are completed
It does not inherently do that.
You're asking it to, because you're collect-ing into a Vec. The entire point of StreamExt::collect is to convert the entire stream into a collection:
Transforms a stream into a collection, returning a future representing the result of that computation. The returned future will be resolved when the stream terminates.
It can only yield the collection once all the futures have settled.
If you access the stream lazily, it'll yield items as they become available:
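use futures::{future, stream, FutureExt, StreamExt};
let mut s = stream::FuturesOrdered::new();
s.push(future::lazy(|_| 1).boxed());
s.push(future::lazy(|_| panic!("never resolves")).boxed());
let f = s.next().await; // only the first future has to finish
println!("{:?}", f); // prints Some(1)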
works just fine, despite it not being possible for the second future to resolve. If you try to collect it, it'll panic.
How can I get a stream of futures that are executed concurrently like this? I was hoping for something like this:
Exactly like that?
use futures::{stream, StreamExt};
use std::time::{Duration, Instant};
use tokio::time::sleep; // assumed: tokio's timer, as available on the playground
let mut s = stream::FuturesOrdered::new();
s.push(sleep(Duration::from_millis(100)));
s.push(sleep(Duration::from_millis(200)));
s.push(sleep(Duration::from_millis(400)));
s.push(sleep(Duration::from_millis(300)));
s.push(sleep(Duration::from_millis(500)));
let start = Instant::now();
while s.next().await.is_some() {
println!("{:.2?}", Instant::now() - start);
}
101.49ms
200.98ms
400.94ms
400.96ms
501.40ms
(using millisecond sleeps because multi-second sleep tends to trip the playground's timeout)
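The same ordering semantics can be reproduced without an async runtime, which may make it easier to see why consuming the results one at a time yields them early. This is a minimal sketch using only std threads and one channel per task, a swapped-in technique and not how FuturesOrdered is implemented: every task runs concurrently on its own thread, and the consumer receives from the channels in push order, so result i is handled as soon as tasks 0..=i have finished. The helper `run_ordered` and the millisecond durations are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs one "task" per duration concurrently and returns
/// (value, elapsed ms when handled) pairs in push order, not completion order.
fn run_ordered(durations_ms: &[u64]) -> Vec<(u64, u128)> {
    let start = Instant::now();
    // One receiver per task, kept in the order the tasks were pushed.
    let receivers: Vec<mpsc::Receiver<u64>> = durations_ms
        .iter()
        .map(|&ms| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(ms)); // stand-in for wait_n
                let _ = tx.send(ms);
            });
            rx
        })
        .collect();

    let mut handled = Vec::new();
    for rx in receivers {
        // Blocks only until *this* task is done; later tasks keep running.
        let value = rx.recv().expect("task thread panicked");
        handled.push((value, start.elapsed().as_millis()));
    }
    handled
}

fn main() {
    for (value, at_ms) in run_ordered(&[100, 200, 400, 300, 500]) {
        println!("task sleeping {value} ms handled at ~{at_ms} ms");
    }
}
```

The 300 ms task finishes before the 400 ms one but is handled right after it, matching the timeline the question asks for.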

Related

get() function doesn't yield any result on tuple keys, and the values exist and do match when manually checked

let mut FEED_CA_:HashMap<(Address, Address, Address), (iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>)> = HashMap::new();
for i in FEED_CA_START{
let c0 = i[0];
let c1 = i[1];
let c2 = i[2];
let p_01_ = ALL_C.get(&&(*c0, *c1)).ok_or(continue).unwrap();
FEED_CA_START is a vec of (c0, c1, c2) tuples :)
ALL_C is a HashMap of <(Address, Address), Address> from which I can't get the value out.
The whole iteration fails on the last line of the snippet above. There are eight more of those get()s, and not one of them returns a value; the output data structure (FEED_CA_), into which I insert the results after iterating through the whole db, always ends up with len = 0.
I've:
Printed out the values &(c0, c1) for each i in my tuple FEED_CA_START
Printed out the values of the faulty &(c0, c1), as well as &(*c0, *c1) and even &&(*c0, *c1).
Manually found that they indeed do match, do exist (100% of those calls should succeed).
Printed out the types of the above and made sure they indeed do match. Same result, even if types of the query and key match and are both exactly: &(primitive_types::H160, primitive_types::H160)
Done some borrow & deref trial-and-error combinations; there are only so many I can try, and still no luck.
Printed out the databases I've derived those from and checked them manually. There are multiple matches, yet the program always returns exactly 0. I've tried multiple functions/methods/iterations; my program always breaks at the get() call and goes straight to continue.
I ran a simple test where in a new program I've created a HashMap with addresses akin to ALL_C , and accessed it using a tuple key - and there it worked. Here it doesn't. I've even thrown some reference/typing spaghetti there to make sure it breaks as well, it didn't. Got the value every time. Here I'm trying as hard as I can to make it work, and it's day 2 stuck on this issue, dead end.
How do I approach a problem like that? I'm lost.
If I've understood the code snippet correctly, I think your problem is the .ok_or(continue).
Arguments passed to ok_or are eagerly evaluated; if you are passing the result of a function call, it is recommended to use ok_or_else, which is lazily evaluated.
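In other words, the continue is always evaluated: it is the argument of ok_or, so it runs before ok_or is even called, and the loop jumps to its next iteration on every pass, whether or not the key exists. Nothing after that line ever executes, so the code is really not doing what you expect.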
let p_01_ = ALL_C.get(&(c0, c1)).ok_or(continue).unwrap();
println!("unreachable"); // never runs: `continue` has already jumped to the next iteration
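The eager-versus-lazy difference can be observed directly. A minimal sketch, using a hypothetical `make_err` helper that counts its calls: the argument of ok_or is built even when the Option is Some, while the closure passed to ok_or_else only runs for None. A `continue` in argument position is evaluated the same eager way, except that evaluating it jumps out of the loop body.

```rust
use std::cell::Cell;

// Hypothetical helper that records each call, so we can see *when* the
// argument of ok_or / ok_or_else is evaluated.
fn make_err(calls: &Cell<u32>) -> &'static str {
    calls.set(calls.get() + 1);
    "missing"
}

fn main() {
    let calls = Cell::new(0);
    let found: Option<i32> = Some(7);

    // ok_or: the argument is evaluated before ok_or runs, even though
    // `found` is Some, so make_err is called anyway.
    let _ = found.ok_or(make_err(&calls));
    println!("after ok_or: {} call(s)", calls.get()); // prints "after ok_or: 1 call(s)"

    // ok_or_else: the closure only runs when the Option is None.
    let _ = found.ok_or_else(|| make_err(&calls));
    println!("after ok_or_else: {} call(s)", calls.get()); // still 1
}
```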
As a possible solution, you could use filter_map, which ignores all None results, and then check the length. It doesn't seem very elegant, but it might be better than repeated if let blocks:
// Note: I just guessed some types for an example
for i in FEED_CA_START {
let c0 = i.0;
let c1 = i.1;
let c2 = i.2;
let args = vec![(c0, c1), (c0, c2)];
let ps: Vec<&Address> = args.iter().filter_map(|arg| ALL_C.get(arg)).collect();
if ps.len() != args.len() {
continue;
}
// Do something
}
As per the other answer, continue is evaluated, so the loop never gets to the rest of the code. IMO it should be impossible to construct that in the first place.
In any case, that is not a good construct; you should use something more idiomatic ("rusty"), like:
let mut FEED_CA_:HashMap<(Address, Address, Address), (iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>, iuniswapv2pair_mod::IUniswapV2Pair<Provider<Ws>>)> = HashMap::new();
for i in FEED_CA_START{
let c0 = i[0];
let c1 = i[1];
let c2 = i[2];
if let Some(p_01_) = ALL_C.get(&&(*c0, *c1)) {
// compute stuff over `p_01_`
} else { // else could be avoided if you don't need to compute anything else in the loop if there is nothing for the key
continue
};
...
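A self-contained variant of the same idea, as a sketch under stated assumptions: `Address` is a hypothetical `u32` stand-in for primitive_types::H160 (any Copy + Eq + Hash type behaves the same as a tuple key), `matching_triples` is a hypothetical helper, and it uses let-else (Rust 1.65+) instead of if let, so `continue` only runs when a lookup actually returns None.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the question's `Address` type.
type Address = u32;

/// Keeps a (c0, c1, c2) triple only if both pair lookups succeed;
/// a missing key skips to the next triple.
fn matching_triples(
    triples: &[(Address, Address, Address)],
    all_c: &HashMap<(Address, Address), Address>,
) -> Vec<(Address, Address, Address)> {
    let mut out = Vec::new();
    for &(c0, c1, c2) in triples {
        // The else branch (and thus `continue`) only runs on None.
        let Some(p01) = all_c.get(&(c0, c1)) else { continue };
        let Some(p02) = all_c.get(&(c0, c2)) else { continue };
        println!("({c0}, {c1}, {c2}) -> pairs {p01}, {p02}");
        out.push((c0, c1, c2));
    }
    out
}

fn main() {
    let all_c = HashMap::from([((1, 2), 10), ((1, 3), 11), ((4, 5), 12)]);
    // (1, 2, 3): both keys exist; (4, 5, 6): key (4, 6) is missing, so it is skipped.
    let triples = [(1, 2, 3), (4, 5, 6)];
    println!("{:?}", matching_triples(&triples, &all_c));
}
```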

Rust - nested loop going only once inside inner loop

I am reading data from text files:
let fil1_json = File::open("fil1.json")?;
let mut fil1_json_reader = BufReader::new(fil1_json);
let fil2_json = File::open("fil2.json")?;
let mut fil2_json_reader = BufReader::new(fil2_json);
for fil1_line in fil1_json_reader.by_ref().lines() {
for fil2_line in fil2_json_reader.by_ref().lines() {
println!("{:#?} ----- {:#?}", fil1_line, fil2_line);
}
}
The inner (second) loop body only runs during the first iteration of the outer loop. It looks like fil2_json_reader is getting emptied after the first iteration.
Where is it changing, given that I am not changing it anywhere?
Readers consume the data. In the case of File, this is the natural expectation, since file abstractions almost universally have a cursor that advances every time you read.
If you want to iterate several times over the same data, then the obvious option is saving it to memory (typically before splitting into lines(), but you can also save a vector of those even if it will be slower). However, since the reader is backed by an actual file, it is better to re-iterate over the file by seeking to its beginning:
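use std::io::{Seek, SeekFrom}; // Seek must be in scope to call .seek()
fil2_json_reader.seek(SeekFrom::Start(0))?; // rewind before each pass of the inner loop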
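A runnable sketch of the rewind approach, assuming in-memory `std::io::Cursor` data in place of the two JSON files so it needs no files on disk (`count_pairs` is a hypothetical name). It counts how often the inner loop body runs with and without seeking back to the start before each inner pass.

```rust
use std::io::{BufRead, BufReader, Cursor, Read, Seek, SeekFrom};

/// Counts inner-loop iterations; `Cursor` stands in for the question's files.
fn count_pairs(rewind: bool) -> std::io::Result<usize> {
    let outer = BufReader::new(Cursor::new("a\nb\nc\n"));
    let mut inner = BufReader::new(Cursor::new("1\n2\n"));
    let mut pairs = 0;
    for outer_line in outer.lines() {
        let outer_line = outer_line?;
        if rewind {
            // Move the inner reader's cursor back to the start of its data.
            inner.seek(SeekFrom::Start(0))?;
        }
        // by_ref() lets the inner loop borrow the reader instead of consuming it.
        for inner_line in inner.by_ref().lines() {
            println!("{} ----- {}", outer_line, inner_line?);
            pairs += 1;
        }
    }
    Ok(pairs)
}

fn main() -> std::io::Result<()> {
    println!("without rewind: {} pairs", count_pairs(false)?); // prints "without rewind: 2 pairs"
    println!("with rewind: {} pairs", count_pairs(true)?); // prints "with rewind: 6 pairs"
    Ok(())
}
```

Without the seek, the inner reader is already at end-of-data from the second outer line on, which is exactly the behavior described in the question.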

What is the difference between this "atomic" Rust code and its "non-atomic" counterpart?

I'm fairly new to Rust. I graduated with a Computer Engineering degree 4 years ago, and I remember discussing (and understanding) atomic operations in my Operating Systems course. However, since graduating, I've been working primarily in high-level languages where I haven't had to care about low-level stuff like atomics. Now that I'm getting into Rust, I'm struggling to remember how a lot of this stuff works.
I'm currently trying to understand the source code for the hibitset library, specifically atomic.rs.
This module specifies an AtomicBitSet type which corresponds to the BitSet type from lib.rs, but using atomic values and operations. From my understanding, an "atomic operation" is an operation that is guaranteed to not be interrupted by another thread; any "load" or "store" on the same value will have to wait for the operation to finish before proceeding. Following from this definition, an "atomic value" is a value whose operations are fully atomic. AtomicBitSet uses AtomicUsize, which is a usize wrapper where all methods are fully atomic. However, AtomicBitSet specifies several operations that seem to not be atomic (add and remove), and there is one atomic operation: add_atomic. Looking at add vs add_atomic, I can't really tell what the difference is.
Here is add (verbatim):
/// Adds `id` to the `BitSet`. Returns `true` if the value was
/// already in the set.
#[inline]
pub fn add(&mut self, id: Index) -> bool {
use std::sync::atomic::Ordering::Relaxed;
let (_, p1, p2) = offsets(id);
if self.layer1[p1].add(id) {
return true;
}
self.layer2[p2].store(self.layer2[p2].load(Relaxed) | id.mask(SHIFT2), Relaxed);
self.layer3
.store(self.layer3.load(Relaxed) | id.mask(SHIFT3), Relaxed);
false
}
This method calls load() and store() directly. I'm assuming that the fact that it's using Ordering::Relaxed is what makes this method non-atomic, because another thread doing the same thing to a different index might clobber this operation.
Here is add_atomic (verbatim):
/// Adds `id` to the `AtomicBitSet`. Returns `true` if the value was
/// already in the set.
///
/// Because we cannot safely extend an AtomicBitSet without unique ownership
/// this will panic if the Index is out of range.
#[inline]
pub fn add_atomic(&self, id: Index) -> bool {
let (_, p1, p2) = offsets(id);
// While it is tempting to check of the bit was set and exit here if it
// was, this can result in a data race. If this thread and another
// thread both set the same bit it is possible for the second thread
// to exit before l3 was set. Resulting in the iterator to be in an
// incorrect state. The window is small, but it exists.
let set = self.layer1[p1].add(id);
self.layer2[p2].fetch_or(id.mask(SHIFT2), Ordering::Relaxed);
self.layer3.fetch_or(id.mask(SHIFT3), Ordering::Relaxed);
set
}
This method uses fetch_or instead of calling load and store directly, which I'm assuming is what makes this method atomic.
But why does the usage of Ordering::Relaxed still allow this to be considered atomic? I realize that the individual "or" operations are atomic, but the full method could be run at the same time as another thread. Wouldn't that have an impact?
Moreover, why would a type like this expose non-atomic methods? Is it just for performance? That seems confusing to me. If I were to pick an AtomicBitSet over a BitSet because it's going to be used by more than one thread, I'd probably want to only use atomic operations on it. If I didn't I wouldn't be using it. Right?
I'd also love an explanation of the comment inside add_atomic. As-is it does not make sense to me. Doesn't the non-atomic version still have to care about that? It seems like the two methods are doing effectively the same thing, just with different levels of atomicity.
I'd really just love some help wrapping my head around atomics. I think I understand ordering after reading this and this, but both are still using concepts that I don't understand. When they talk about one thread "seeing" something from another, what does that mean exactly? When it's said that sequentially-consistent operations have the same order "across all threads" what does that even mean? Does the processor change the instruction order differently for different threads?
In the non-atomic case, this line:
self.layer2[p2].store(self.layer2[p2].load(Relaxed) | id.mask(SHIFT2), Relaxed);
is more or less equivalent to:
let tmp1 = self.layer2[p2]; // read (the load)
let tmp2 = tmp1 | id.mask(SHIFT2); // modify
self.layer2[p2] = tmp2; // write (the store)
so another thread could change self.layer2[p2] between the moment it is read into tmp1 and the moment tmp2 is stored into it. So if another thread tries to set another bit at the same time, there is a risk that the following sequence occurs:
thread 1 reads an empty mask,
thread 2 reads an empty mask,
thread 1 sets bit 1 of the mask and writes it,
thread 2 sets bit 2 of the mask and writes it, thus overwriting the value set by thread 1,
in the end only bit 2 is set!
The same goes for self.layer3.
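The lost-update sequence above can be replayed deterministically. This sketch simulates the two threads on a single thread by running their steps by hand in the listed order (the function name is hypothetical, and "bit 1"/"bit 2" are taken as the masks 0b01 and 0b10). Each load and store is atomic on its own; it is the load-then-store pair that is not.

```rust
use std::sync::atomic::{AtomicUsize, Ordering::Relaxed};

/// Replays the interleaving step by step and returns the final mask.
fn replay_lost_update() -> usize {
    let mask = AtomicUsize::new(0);
    let t1_read = mask.load(Relaxed); // thread 1 reads an empty mask
    let t2_read = mask.load(Relaxed); // thread 2 reads an empty mask
    mask.store(t1_read | 0b01, Relaxed); // thread 1 sets bit 1 and writes it
    mask.store(t2_read | 0b10, Relaxed); // thread 2 sets bit 2, overwriting bit 1
    mask.load(Relaxed)
}

fn main() {
    // prints "final mask: 0b10" (only bit 2 survives)
    println!("final mask: {:#04b}", replay_lost_update());
}
```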
In the atomic case, the use of fetch_or guarantees that the whole read-modify-write cycle is atomic.
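To see that fetch_or stays correct under real concurrency, even with Ordering::Relaxed, here is a small sketch (hypothetical helper `set_bits_concurrently`) where every thread sets a different bit of one shared AtomicUsize. Relaxed ordering weakens how writes to *different* locations are ordered relative to each other; it does not weaken the atomicity of each read-modify-write, so no bit can be lost.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns one thread per bit; each sets its own bit with fetch_or.
fn set_bits_concurrently(bits: u32) -> usize {
    let mask = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..bits)
        .map(|bit| {
            let mask = Arc::clone(&mask);
            thread::spawn(move || {
                // Single atomic read-modify-write: no other thread can
                // sneak a store in between the read and the write.
                mask.fetch_or(1usize << bit, Ordering::Relaxed);
            })
        })
        .collect();
    for h in handles {
        // join() synchronizes with the thread's completion, so its
        // fetch_or is visible to the load below.
        h.join().unwrap();
    }
    mask.load(Ordering::Relaxed)
}

fn main() {
    println!("{:#x}", set_bits_concurrently(16)); // prints "0xffff"
}
```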
In both cases, since the ordering is relaxed, the writes to layer2 and layer3 may seem to occur in any order as seen from other threads.
The comment inside add_atomic is meant to avoid an issue when two threads try to add the same bit. Assume that add_atomic were written like this:
pub fn add_atomic(&self, id: Index) -> bool {
let (_, p1, p2) = offsets(id);
if self.layer1[p1].add(id) {
return true;
}
self.layer2[p2].fetch_or(id.mask(SHIFT2), Ordering::Relaxed);
self.layer3.fetch_or(id.mask(SHIFT3), Ordering::Relaxed);
false
}
Then you risk the following sequence:
thread 1 sets bit 1 in layer1 and sees that it wasn't set beforehand,
thread 2 tries to set bit 1 in layer1 and sees that thread 1 already set it, so thread 2 returns from add_atomic,
thread 2 executes another operation that requires reading layer3, but layer3 has not been updated yet, so thread 2 gets a wrong value!
thread 1 updates layer3, but it is too late.
This is why the add_atomic case ensures that layer2 and layer3 are set properly in all threads even if it looked like the bit was already set beforehand.
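The early-return race can also be replayed by hand. This is a simplified sketch, not hibitset's code: `layer1` and `layer3` are single hypothetical AtomicU64 stand-ins (layer2 is omitted), and the hypothetical `replay` function interleaves the two threads' steps in the order listed above. It returns the layer3 value that thread 2 reads right after its own add call has returned.

```rust
use std::sync::atomic::{AtomicU64, Ordering::Relaxed};

/// `early_return = true` models the variant that returns as soon as layer1
/// reports the bit as already set; `false` models add_atomic.
fn replay(early_return: bool) -> u64 {
    let layer1 = AtomicU64::new(0);
    let layer3 = AtomicU64::new(0);

    // Thread 1: sets the bit in layer1 and sees it was not set before...
    let t1_was_set = (layer1.fetch_or(1, Relaxed) & 1) != 0;
    assert!(!t1_was_set);
    // ...then is paused before it touches layer3.

    // Thread 2: runs its whole add call.
    let t2_was_set = (layer1.fetch_or(1, Relaxed) & 1) != 0;
    if !(early_return && t2_was_set) {
        layer3.fetch_or(1, Relaxed); // add_atomic always updates layer3
    }
    // Thread 2 returns and immediately relies on layer3.
    let seen_by_t2 = layer3.load(Relaxed);

    // Thread 1 resumes and updates layer3, too late for thread 2.
    layer3.fetch_or(1, Relaxed);
    seen_by_t2
}

fn main() {
    println!("early return: thread 2 sees layer3 = {}", replay(true)); // prints 0
    println!("add_atomic:   thread 2 sees layer3 = {}", replay(false)); // prints 1
}
```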

blockingForEach(), why apply function to blocked observables

I'm having trouble understanding the point of a blocking Observable, specifically blockingForEach()
What is the point in applying a function to an Observable that we will never see?? Below, I'm attempting to have my console output in the following order
this is the integer multiplied by two:2
this is the integer multiplied by two:4
this is the integer multiplied by two:6
Statement comes after multiplication
My current method prints the statement before the multiplication
fun rxTest(){
val observer1 = Observable.just(1,2,3).observeOn(AndroidSchedulers.mainThread())
val observer2 = observer1.map { response -> response * 2 }
observer2
.observeOn(AndroidSchedulers.mainThread())
.subscribeOn(AndroidSchedulers.mainThread())
.subscribe{ it -> System.out.println("this is the integer multiplied by two:" + it) }
System.out.println("Statement comes after multiplication ")
}
Now I have changed my method to include blockingForEach():
fun rxTest(){
val observer1 = Observable.just(1,2,3).observeOn(AndroidSchedulers.mainThread())
val observer2 = observer1.map { response -> response * 2 }
observer2
.observeOn(AndroidSchedulers.mainThread())
.subscribeOn(AndroidSchedulers.mainThread())
.blockingForEach { it -> System.out.println("this is the integer multiplied by two:" + it) }
System.out.println("Statement comes after multiplication ")
}
1.) What happens to the transformed observables once they are no longer blocking? Wasn't that just unnecessary work, since we never see those Observables?
2.) Why does my System.out("Statement...") appear before my observables' output when I'm subscribing? It's like observer2 skips its blocking method, makes the System.out call, and then resumes its subscription.
It's not clear what you mean by your statement that you will "never see" values emitted by an observer chain. Each value emitted in the observer chain is seen by the observers downstream of the point where it is emitted. The point where you subscribe to the observer chain is the usual place to perform a side effect, such as printing a value or storing it in a variable. Thus, the values are always seen.
In your examples, you are getting confused by how the schedulers work. When you use the observeOn() or subscribeOn() operators, you are telling the observer chain to emit values after they have been moved to a different thread. When you move data between threads, the destination thread has to be able to process the data. If your main code is running on that same thread, you can lock yourself out (deadlock) or reorder operations.
Normally, the use of blocking operations is strongly discouraged. Blocking operations can often be used when testing, because you have full control of the consequences. There are a couple of other situations where blocking may make sense. An example would be an application that requires access to a database or other resource; the application has no purpose without that resource, so it blocks until it becomes available or a timeout occurs, kicking it out.

Multithreading based on duplicated jOOλ streams

The code below represents a toy example of the problem I am trying to solve.
Imagine that we have an original stream of data, originalStream, and that the goal is to apply two very different data-processing operations to it. As an example here, one operation multiplies each element by 2, then by its index, and sums the result (dataProcess1), and the other multiplies each element by 4 and sums the result (dataProcess2). Obviously the operations would not be so simple in real life...
The idea is to use jOOλ in order to duplicate the stream and apply both operations to the 2 streams. However, the trick is that I want to run both data processing in different threads. Since originalStream.duplicate() is not thread-safe out of the box, the code below will fail to give the right result which should be: result1 = 570; result2 = 180. Instead the code may unpredictably fail on NPE, yield the wrong result or (sometimes) even give the right result...
The question is how to minimally modify the code such that it will become thread-safe.
Note that I do not want to first collect the stream into a list and then generate 2 new streams. Instead I want to stay with streams until they are eventually collected at the end of the data process. It may not be the most efficient nor the most logical thing to want to do but I think it is nevertheless conceptually interesting. Note also that I wish to keep using org.jooq.lambda.Seq (group: 'org.jooq', name: 'jool', version: '0.9.12') as much as possible as the real data processing functions will use methods that are specific to this library and not present in regular Java streams.
Seq<Long> originalStream = seq(LongStream.range(0, 10));
Tuple2<Seq<Long>, Seq<Long>> duplicatedOriginalStream = originalStream.duplicate();
ExecutorService executor = Executors.newFixedThreadPool(2);
List<Future<Long>> res = executor.invokeAll(Arrays.asList(
() -> duplicatedOriginalStream.v1.map(x -> 2 * x).zipWithIndex().map(x -> x.v1 * x.v2).reduce((x, y) -> x + y).orElse(0L),
() -> duplicatedOriginalStream.v2.map(x -> 4 * x).reduce((x, y) -> x + y).orElse(0L)
));
executor.shutdown();
System.out.printf("result1 = %d\tresult2 = %d\n", res.get(0).get(), res.get(1).get());
