Is there a way to make StreamExt::next non-blocking (fail fast) if the stream is empty (i.e. it would need to wait for the next element)? - rust

Currently I am doing something like this:
use tokio::time::{timeout, Duration};

while let Ok(option_element) = timeout(Duration::from_nanos(1), stream.next()).await {
    // ...
}
to drain the items already in the rx buffer of the stream; I don't want to wait for the next element that has not yet been received.
I suspect the timeout slows down the while loop.
Is there a better way to do this without using a timeout?
Possibly something like https://github.com/async-rs/async-std/issues/579, but for the streams in futures/tokio.

The direct answer to your question is to use the FutureExt::now_or_never method from the futures crate, as in stream.next().now_or_never().
However, it is important to avoid writing a busy loop that waits on several things by calling now_or_never on each of them in a loop. That approach blocks the thread; prefer a different solution such as tokio::select! to wait for multiple things at once. For the special case where you are constantly checking whether the task should shut down, see this other question instead.
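For illustration, a select! over the stream and a shutdown signal might look like this. This is a sketch assuming an async context with StreamExt in scope; shutdown_rx is a hypothetical tokio::sync::watch::Receiver used as a shutdown signal.
// Wait on several sources concurrently instead of spinning on
// now_or_never for each of them.
tokio::select! {
    maybe_item = stream.next() => {
        // Some(item) => handle it; None => the stream terminated.
    }
    _ = shutdown_rx.changed() => {
        // The shutdown signal fired; stop processing.
    }
}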
On the other hand, an example where using now_or_never is perfectly fine is when you want to empty a queue of the items available right now so you can batch-process them in some manner. This is fine because the now_or_never loop stops spinning as soon as it has emptied the queue.
Beware that if the stream has terminated, now_or_never will still succeed, because next() immediately resolves to None in that case.
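A minimal sketch of that draining pattern, assuming use futures::{FutureExt, StreamExt}; and a stream that is Unpin:
let mut batch = Vec::new();
// Take everything that is ready right now, without waiting.
while let Some(Some(item)) = stream.next().now_or_never() {
    batch.push(item);
}
// The loop stops on None (the next item is not ready yet) or on
// Some(None) (the stream has terminated).
// ...batch-process `batch` here...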

Related

Get all available items from a futures Stream (non-blocking)

I have a WebSocket connection which wraps a futures_core::stream::Stream (incoming) and Sink (outgoing).
I want to decode and process all available messages from the Stream without blocking. Clearly, at the socket level it's a TCP/IP stream of bytes, and there are going to be 0..N messages sitting in the socket receive buffer waiting for a call to read(). A non-blocking call to read could well read multiple pipelined WebSocket frames. At the level of the Rust abstraction this might be possible with fn poll_next(...):
The trait is modelled after Future, but allows poll_next to be called even after a value has been produced, yielding None once the stream has been fully exhausted.
However, I don't know how to use this poll method directly without the async/await syntax, and even if I could, I don't see how it solves the problem. If I call it in a loop while I get back Some(frame), collecting the frames in a Vec, it will still suspend the task when it runs out of buffered frames and returns Poll::Pending, so I won't be able to do anything with the collected frames immediately anyway. Ideally I need to process the collected frames when I get Poll::Pending without suspending anything, and then call it again, allowing it to suspend only the second time around, if need be. Is there a solution here that doesn't involve discarding all of the future abstractions and resorting to buffering and parsing WebSocket frames myself?
You seem to have a misunderstanding of how suspensions work. When the parent function calls poll_next in a loop, it is not poll_next returning Poll::Pending that results in a suspension. Instead, it is the function containing the loop returning Poll::Pending as a result of that. But nothing says you have to do so immediately: you are free to process the frames you have collected before returning to the executor.
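A sketch of that idea using futures::future::poll_fn, assuming stream: impl Stream<Item = Frame> + Unpin, where Frame stands in for your decoded message type:
use std::task::Poll;
use futures::{future::poll_fn, StreamExt};

// Collect every frame that is ready right now. When the stream would
// block, return what we have instead of propagating Poll::Pending.
let frames: Vec<Frame> = poll_fn(|cx| {
    let mut frames = Vec::new();
    while let Poll::Ready(Some(frame)) = stream.poll_next_unpin(cx) {
        frames.push(frame);
    }
    Poll::Ready(frames)
}).await;
// ...process `frames` here; the next await on the stream may suspend.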

thread with a forever loop with one inherently async operation

I'm trying to understand the semantics of async/await in an infinitely looping worker thread started inside a Windows service. I'm a newbie at this, so give me some leeway here; I'm trying to understand the concept.
The worker thread will loop forever (until the service is stopped) and it processes an external queue resource (in this case a SQL Server Service Broker queue).
The worker thread uses config data which could be changed while the service is running by receiving commands on the main service thread via some kind of IPC. Ideally the worker thread should process those config changes while waiting for the external queue messages to be received. Reading from service broker is inherently asynchronous, you literally issue a "waitfor receive" TSQL statement with a receive timeout.
But I don't quite understand the flow of control I'd need to use to do that.
Let's say I used a ConcurrentQueue to pass config change messages from the main thread to the worker thread. Then, if I did something like...
void ProcessBrokerMessages() {
    foreach (BrokerMessage m in ReadBrokerQueue()) {
        ProcessMessage(m);
    }
}
// ... inside the worker thread:
while (!serviceStopped) {
    foreach (var configChange in configChangeConcurrentQueue) {
        processConfigChange(configChange);
    }
    ProcessBrokerMessages();
}
...then the foreach loop to process config changes and the broker processing function need to "take turns" to run. Specifically, the config-change-processing loop won't run while the potentially-long-running broker receive command is running.
My understanding is that simply turning ProcessBrokerMessages() into an async method doesn't help me in this case (or I don't understand what would happen). To me, with my lack of understanding, the most intuitive interpretation is that when I hit the async call it would go off and do its thing, and execution would continue with a restart of the outer while loop; but that would mean the loop would also execute ProcessBrokerMessages() over and over even though it's already running from the invocation in the previous iteration, which I don't want.
As far as I know this is not what would happen, though I only "know" that because I've read something along those lines. I don't really understand it.
Arguably the existing flow of control (i.e., without the async call) is OK: if config changes affect the ProcessBrokerMessages() function (which they can), then the config can't be changed while the function is running anyway. But that seems like a point specific to this particular example. I can imagine a case where config changes affect something else the thread does, unrelated to the ProcessBrokerMessages() call.
Can someone improve my understanding here? What's the right way to have
- a block of code which loops over multiple statements,
- where one (or some) but not all of those statements are asynchronous,
- and the async operation should only ever be executing once at a time,
- but execution should keep looping through the rest of the statements while the single instance of the async operation runs,
- and the async method should be called again in the loop if the previous invocation has completed?
It seems like I could use a BackgroundWorker to run the receive statement, which flips a flag when its job is done, but it also seems weird to me to create a thread specifically for processing the external resource and then, within that thread, create a BackgroundWorker to actually do that job.
You could use a CancellationToken. Most async functions accept one as a parameter, and they cancel the call (the returned Task, actually) if the token is signaled. SqlCommand.ExecuteReaderAsync (which you're likely using to issue the WAITFOR RECEIVE) is no different. So:
- Have a cancellation token passed to the 'execution' thread.
- The settings monitor (the one responding to IPC) also has a reference to the token.
- When a config change occurs, the monitor makes the config change and then signals the token.
- The execution thread aborts any pending WAITFOR (or any pending processing in the message processing loop, actually; you should use the cancellation token everywhere), and any transaction is aborted and rolled back.
- Restart the execution thread with a new cancellation token; it will use the new config. A sketch of the receive side follows this list.
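A minimal sketch of that receive side (connection setup is omitted; MyQueue and the five-second timeout are placeholders):
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

async Task ReceiveLoopAsync(SqlConnection conn, CancellationToken token) {
    using var cmd = new SqlCommand(
        "WAITFOR (RECEIVE TOP (1) * FROM MyQueue), TIMEOUT 5000;", conn);
    // ExecuteReaderAsync observes the token; signaling it cancels the
    // pending WAITFOR.
    try {
        using var reader = await cmd.ExecuteReaderAsync(token);
        while (await reader.ReadAsync(token)) {
            // ...process the received message...
        }
    } catch (OperationCanceledException) {
        // Config changed: roll back, pick up the new config, restart.
    }
}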
So in this particular case I decided to go with a simpler shared-state solution. This is of course a less sound solution in principle, but since there's not a lot of shared state involved, and since the overall application isn't very complicated, it seemed forgivable.
My implementation is to use locking, but with writes to the config from the service's main thread wrapped up in a Task.Run(). The reader doesn't bother with a Task, since the reader is already on its own thread.
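Roughly this shape (a sketch; Config and the member names are hypothetical):
private readonly object _configLock = new object();
private Config _config;

// Service main thread: fire off the write so IPC handling isn't blocked.
void OnConfigChange(Config newConfig) {
    Task.Run(() => { lock (_configLock) { _config = newConfig; } });
}

// Worker thread: already on its own thread, so it just takes the lock.
Config ReadConfig() {
    lock (_configLock) { return _config; }
}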

Forcing a message loop to yield

I hope this question isn't too broad.
I'm working with a legacy Ada application. This application is built around a very old piece of middleware that handles, among other things, our IPC. For the sake of this question, I can boil the middleware's provisions down to
1: a message loop that processes messages (from other programs or this program)
2: a function to send messages to this program or others
3: a function to read from a database
The program operates mainly on a message loop - simply something like
loop
    This_Msg := Message_Loop.Wait_For_Message; -- Blocking wait call
    -- Do things based on This_Msg's ID
end loop;
However, there are also callbacks that can be triggered by external stimuli. These callbacks run in their own threads. Some of these callbacks call the database-reading function, which has always been fine, EXCEPT, as we recently discovered, in a relatively rare condition. When this condition occurs, it turns out it isn't safe to read from the database while the message loop is executing its blocking Wait_For_Message.
It seemed like a simple solution would be to use a protected object to synchronize the Wait_For_Message and database read: if we try to read the database while Wait_For_Message is blocking, the read will block until Wait_For_Message returns, at which point the Wait_For_Message call will be blocked until the database read is complete. The next problem is that I can't guarantee the message loop will receive a message in a timely fashion, meaning that the database read could be blocked for an arbitrary amount of time. It seems like the solution to this is also simple: send a do-nothing message to the loop before blocking, ensuring that the Wait_For_Message call will yield.
What I'm trying to wrap my head around is:
If I send the do-nothing message and THEN block before the database read, I don't think I can guarantee that Wait_For_Message won't have returned, yielded, processed the do-nothing message, and started blocking again before the pre-database read block. I think I conceptually need to start blocking and THEN push a message, but I'm not sure how to do this. I think I could handle it with a second layer of locks, but I can't think of the most efficient way to do so, and don't know if that's even the right solution. This is really my first foray into concurrency in Ada, so I'm hoping for a pointer in the right direction.
Perhaps you should use a task for this. The following has the task waiting at the select to either process a message or access the DB, while another call on an entry during the processing would queue on that entry until the loop reiterates the select, thus eliminating the problem altogether; unless, somehow, your DB access calls the Message entry, but that shouldn't happen.
With Ada.Containers.Indefinite_Holders;
With Ada.Text_IO;

Package Example is
   -- DB_Rec is assumed to be defined elsewhere.
   Task Message_Processor is
      Entry Message( Text : String );
      Entry Read_DB( Data : DB_Rec );
   End Message_Processor;
End Example;

Package Body Example is
   Task Body Message_Processor is
      Package Message_Holder is new Ada.Containers.Indefinite_Holders
         (Element_Type => String);
      Package DB_Rec_Holder is new Ada.Containers.Indefinite_Holders
         (Element_Type => DB_Rec);
      Current_Message : Message_Holder.Holder;
      Current_DB_Rec  : DB_Rec_Holder.Holder;
   Begin
      MESSAGE_LOOP:
      loop
         select
            accept Message (Text : in String) do
               Current_Message := Message_Holder.To_Holder( Text );
            end Message;
            -- Process the message **outside** the rendezvous.
            delay 1.0; -- simulate processing.
            Ada.Text_IO.Put_Line( Current_Message.Element );
         or
            accept Read_DB (Data : in DB_Rec) do
               Current_DB_Rec := DB_Rec_Holder.To_Holder( Data );
            end Read_DB;
            -- Process the DB-record here, **outside** the rendezvous.
         or
            terminate;
         end select;
      end loop MESSAGE_LOOP;
   End Message_Processor;
End Example;

How does the stack work in Node.js?

I've recently started reading up on Node.js (and JS) and am a little confused about how callbacks work in Node.js. Assume we do something like this:
setTimeout(function(){
    console.log("World");
}, 1000);
console.log("Hello");
output: "Hello World"
So far, from what I've read, JS is single-threaded, so the event loop is going through one big stack, and I've also been told not to put big calls in the callback functions.
1) OK, so my first question: assuming it's one stack, does the callback function get run by the main event loop thread? If so, and we have a site that serves up content via callbacks (fetches from the db and pushes the response) with 1,000 concurrent users, are those 1,000 users basically being served synchronously, where the main thread goes into each callback function, does the computation, and then continues the main event loop? If this is the case, how exactly is this concurrent?
2) How are the callback functions added to the stack? Let's say my code was like the following:
setTimeout(function(){
    console.log("Hello");
}, 1000);
setTimeout(function(){
    console.log("World");
}, 2000);
Then does the callback function get added to the stack before the timeout has even occurred? If so, is there a flag that is set to notify the main thread that the callback is ready (or is there another mechanism)? If this is in fact what happens, doesn't it just bloat the stack, especially for large web applications with many callback functions? And the larger the stack, the longer everything takes to run, since the thread has to step through the entire thing.
The event loop is not a stack. It could be better thought of as a queue (first in/first out). The most common usage is for performing I/O operations.
Imagine you want to read a file from disk. You start by saying hey, I want to read this file, and when you're done reading, call this callback. While the actual I/O operation is being performed in a separate process, the Node application is free to keep doing something else. When the file has finished being read, it adds an item to the event loop's queue indicating the completion. The node app might still be busy doing something else, or there may be other items waiting to be dequeued first, but eventually our completion notification will be dequeued by node's event loop. It will match this up with our callback, and then the callback will be called as expected. When the callback has returned, the event loop dequeues the next item and continues. When there is nothing left in node's event loop, the application exits.
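In code, that sequence looks something like this (Node's callback-style fs API; the file path is just an example):
const fs = require('fs');

// 1. Request the read and hand over a callback.
fs.readFile('/etc/hosts', 'utf8', function (err, contents) {
    // 3. Runs only when the event loop dequeues the completion notification.
    if (err) throw err;
    console.log('file is ' + contents.length + ' characters long');
});

// 2. Executes immediately, while the read is still in flight.
console.log('read requested');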
That's a rough approximation of how the event loop works, not an exact technical description.
For the specific setTimeout case, think of it like a priority queue. Node won't consider dequeuing the item/job/callback until at least that amount of time has passed.
There are lots of great write-ups on the Node/JavaScript event loop which you probably want to read up on if you're confused.
Callback functions are not added to the caller's stack; there is no recursion here. They are called from the event loop. Try replacing console.log in your example with console.trace() and watch the result: the stack is not growing.

Should I run my for loops in an async way when I use Node.js?

I'm testing with Node.js with Express.
Theoretically, if I run some very heavy calculation in a "for loop" without any callbacks, the process should be blocked and other requests should be ignored.
But in my case, a regular "for loop"
for (var i = 0; i < 300000; i++) {
    console.log(i);
}
does not block any requests; it just produces high CPU load.
It accepts other requests as well.
So why should I use some other method to make these non-blocking, such as
process.nextTick()
Or does Node.js take care of basic loop constructs (for, while) by wrapping them in process.nextTick() by default?
Node runs in a single thread with an event loop, so, as you said, while your for loop is executing, no other processing will happen. The underlying operating system TCP socket may very well accept incoming connections, but if Node is busy with your looping logic then the request itself won't be processed until afterward.
If you absolutely must run some long-running process in Node, then you should use separate worker processes to do the calculation and leave the main event loop free for request handling.
Node doesn't wrap loops with process.nextTick().
It may be that your program is continuing to accept new connections because console.log is yielding control back to the main event loop, since it's an I/O operation.
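If the heavy work has to stay on the main thread, one common pattern (illustrative; not something Node applies for you) is to do the work in chunks and yield to the event loop between chunks:
// Process n items in chunks, yielding between chunks so that
// pending requests get a turn on the event loop.
function heavyLoop(i, n, done) {
    var CHUNK = 10000;
    var end = Math.min(i + CHUNK, n);
    for (; i < end; i++) {
        // ...heavy per-item work goes here...
    }
    if (i < n) {
        setImmediate(function () { heavyLoop(i, n, done); });
    } else {
        done();
    }
}

heavyLoop(0, 300000, function () { console.log('finished'); });
setImmediate is used here rather than process.nextTick, because nextTick callbacks run before the event loop proceeds and would still starve pending I/O.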
