asyncio streams check if reader has data

asyncio streams check if reader has data - python-3.x

So I want to implement a simple comms protocol where reads & writes are completely asynchronous. That means that client sends some data and then server may or may not respond with an answer. So I can't just call reader.read() because that blocks until at least something is returned. And I may have something more to send in the mean time.
So is there a way to check if reader has something to read?
(please note that I'm talking specifically about the streams version: I'm fully aware that protocols version has separate handlers for reading and writing and does not suffer from this issue)

There is no way to ask reader has incoming data or not.
I guess to create asyncio.Task for reading data from asyncio stream reader in loop.
If you need to write data asynchronously feel free to call StreamWriter.write() from any task that have some outgoing data.
I strongly dont recommend to use protocols directly -- they are low-level abstraction useful for flow control but for application code is better to use high-level streams.

Related

Is python3's BufferedProtocol an abstraction over TCP? Or is it so low-level that I have to implement the TCP things too?

I am referring to this: https://docs.python.org/3/library/asyncio-protocol.html#asyncio.BufferedProtocol
I haven't seen the answer to this question documented anywhere and I want to know the answer in advance of writing any code.
It seems to imply that it is a modification of asyncio.Protocol (for TCP) but seeing as though TCP is not mentioned for BufferedProtocol it's got me concerned that I'd have to contend with out of order packets etc.
Many thanks!

BufferedProtocol isn't a protocol based on TCP, it's an interface (base class) for custom implementation of asyncio protocols, specifically those that try to minimize the amount of copying. The docstring provides more details:
The idea of BufferedProtocol is that it allows to manually allocate and control the receive buffer. Event loops can then use the buffer provided by the protocol to avoid unnecessary data copies. This can result in noticeable performance improvement for protocols that receive big amounts of data. Sophisticated protocols can allocate the buffer only once at creation time.
Currently none of the protocols shipped with asyncio derive from BufferedProtocol, so the use case for this is user code that needs to achieve high throughput - see the BPO issue and the linked mailing list post for details.
seeing as though TCP is not mentioned for BufferedProtocol it's got me concerned that I'd have to contend with out of order packets etc.
Unless you are writing custom low-level asyncio code, you shouldn't care about BufferedProtocol at all. Regular asyncio TCP code calls functions such as open_connection or start_server, both of which handle provide a streaming abstraction on top of TCP sockets in the usual way (using a buffer, handling errors, etc.).

I can confirm - BufferedProtocol is for TCP only. Not for files or anything else. And it gives you a handle on a zero copy buffer to work with. That's basically all I wanted to know.

Channels in Go, and emitters in node.js?

Does Go have an equivalent of node.js' "emitter"?
I'm teaching myself Go by porting over a node.js library I wrote. In the node version, the library emits an event once something happens (e.g. it listens on UDP port 1234 and when "ABC" is received, "abcreceived" is emitted so the calling code can respond as necessary (e.g. sending back "DEF")
I've seen channels in Go (and am currently reading up on them), but as I'm still new to this language, I don't know if (or how, for that matter) that can be used to communicate with whatever code is using my library.
I've also seen https://github.com/chuckpreslar/emission, but am not sure if this is acceptable, or if there's a better ("Best practice") way of doing things.

Go and Node.js are very different. Node.js supports concurrency only via callbacks. There might be various ways of dressing them up, but they're fundamentally callbacks.
In Node.js, there is no parallelism; Node.js has a single-threaded runtime. When Node.js async is used to achieve what is called 'parallel' execution, it isn't parallel in the sense used in Go, but concurrent.
Concurrency is not parallelism in the Go world.
Go has explicit concurrency based on Communicating Sequential Processes (CSP), a mathematical basis conceived by Tony Hoare at Oxford. The runtime interleaves cooperating processes called goroutines by time-slicing them onto the available CPU cores. Within each goroutine, the code is single threaded, so is easy to write. In the simple case, no data is shared between goroutines; instead messages pass between them along channels. In this way, there is no need for callbacks.
When goroutines get blocked waiting for I/O, that's OK because they don't use any CPU time until they're unblocked. Their memory footprint is slight and you can have very large numbers of them. So callbacks are not needed for I/O operations either.
Because the execution models of Go and Node.js are about as different as they could be, attempting to port code from one to the other is very likely to lead to very clumsy solutions. It's better to start from the original requirements and implement from scratch.
It would be possible to distort the Go concurrency model using function arguments to behave like callbacks. This would be a bad idea because it would not be idiomatic and would lose the benefits that CSP gives.

So by reading others' Go code and some links in the comments to my question, I think channels are the way to go.
In my library code (semi pseudo-code):
// Make a new channel called "Events"
var Events = make(chan
func doSomething() {
// ...
Events <-"abcreceived" // Add "abcreceived" to the Events channel
}
And in the code that will use my library:
evt := <-mylib.Events
switch evt {
case "abcreceived":
sendBackDEF()
break
// ...
}
I still prefer node.js' EventEmitter (because you can transfer data back easily) but for simple things, this should suffice.

Idiomatic way to handle writes to a TcpStream while waiting on read

As a way to familiarize myself with Rust & networking in general I started writing a very basic telnet chat server. Everything appears to be going well, but right now I end up with blocks of unsafe code, and I wanted to know if there was a better way to do things.
I spawn a task to listen for connections similar to this: Example TCP server written in Rust Once a user connects and I get a TcpStream I put it in a Connection struct. A Connection struct uses channels to communicate with two tasks- one for reading from its TcpStream, and one for writing. The reading task blocks on the TcpStream's read() method and sends any input back to the Connection struct. The writer blocks on a Port's recv() method and writes anything it receives to the TcpStream. In this way, the main loop of the program can simply maintain a vector of Connection structs to check for user input and write to them at will. The issue is that with this implementation the TcpStream must be be shared by both the read & write tasks, and the mutable write() method called whilst the mutable read() method is still blocking in the other task. In 0.8 I did this with an Rc and unsafe_borrow_mut's, but I'd like to do it in a better fashion if I can- the objections of Rust's type system in this case may be completely valid, for all I know. Any comments on overall design would be welcome as well. Thanks.

There is currently no idiomatic way to do this because Rust's IO API currently does not support parallel read or writes.
See this issue for more info: https://github.com/mozilla/rust/issues/11165
There will be a redesign of the TcpStream API to allow such things in one or the other way (see issue) and which will then provide an "idiomatic way" to do this.

winsock 2. thread safety for simultaneous send's. tcp

is it possible to have multiple threads sending on the same socket? will there be interleaving of the streams or will the socket block on the first thread (assuming tcp)? the majority of opinions i've found seems to warn against doing this for obvious fears of interleaving, but i've also found a few comments that state the opposite. are interleaving fears a carryover from winsock1 and are they well-founded for winsock2? is there a way to setup a winsock2 socket that would allow for lack of local synchronization?
two of the contrary opinions below... who's right?
comment 1
"Winsock 2 implementations should be completely thread safe. Simultaneous reads / writes on different threads should succeed, or fail with WSAEINPROGRESS, depending on the setting of the overlapped flag when the socket is created. Anyway by default, overlapped sockets are created; so you don't have to worry about it. Make sure you don't use NT SP6, if ur on SP6a, you should be ok !"
source
comment 2
"The same DLL doesn't get accessed by multiple processes as of the introduction of Windows 95. Each process gets its own copy of the writable data segment for the DLL. The "all processes share" model was the old Win16 model, which is luckily quite dead and buried by now ;-)"
source
looking forward to your comments!
jim
~edit1~
to clarify what i mean by interleaving. thread 1 sends the msg "Hello" thread 2 sends the msg "world!". recipient receives: "Hwoel lorld!". this assumes both messages were NOT sent in a while loop. is this possible?

I'd really advice against doing this in any case. The send functions might send less than you tell it to for various very legit reasons, and if another thread might enter and try to also send something, you're just messing up your data.
Now, you can certainly write to a socket from several threads, but you've no longer any control over what gets on the wire unless you've proper locking at the application level.
consider sending some data:
WSASend(sock,buf,buflen,&sent,0,0,0:
the sent parameter will hold the no. of bytes actually sent - similar to the return value of the send()function. To send all the data in buf you will have to loop doing a WSASend until all all the data actually get sent.
If, say, the first WSASend sends all but the last 4 bytes, another thread might go and send something while you loop back and try to send the last 4 bytes.
With proper locking to ensure that can't happen, it should e no problem sending from several threads - I wouldn't do it anyway just for the pure hell it will be to debug when something does go wrong.

is it possible to have multiple threads sending on the same socket?
Yes - although, depending on implementation this can be more or less visible. First, I'll clarify where I am coming from:
C# / .Net 3.5
System.Net.Sockets.Socket
The overall visibility (i.e. required management) of threading and the headaches incurred will be directly dependent on how the socket is implemented (synchronously or asynchronously). If you go the synchronous route then you have a lot of work to manually manage connecting, sending, and receiving over multiple threads. I highly recommend that this implementation be avoided. The efforts to correctly and efficiently perform the synchronous methods in a threaded model simply are not worth the comparable efforts to implement the asynchronous methods.
I have implemented an asynchronous Tcp server in less time than it took for me to implement the threaded synchronous version. Async is much easier to debug - and if you are intent on Tcp (my favorite choice) then you really have few worries in lost messages, missing data, or whatever.
will there be interleaving of the streams or will the socket block on the first thread (assuming tcp)?
I had to research interleaved streams (from wiki) to ensure that I was accurate in my understanding of what you are asking. To further understand interleaving and mixed messages, refer to these links on wiki:
Real Time Messaging Protocol
Transmission Control Protocol
Specifically, the power of Tcp is best described in the following section:
Due to network congestion, traffic load balancing, or other unpredictable network behavior, IP packets can be
lost, duplicated, or delivered out of order. TCP detects these problems, requests retransmission of lost
packets, rearranges out-of-order packets, and even helps minimize network congestion to reduce the
occurrence of the other problems. Once the TCP receiver has finally reassembled a perfect copy of the data
originally transmitted, it passes that datagram to the application program. Thus, TCP abstracts the application's
communication from the underlying networking details.
What this means is that interleaved messages will be re-ordered into their respective messages as sent by the sender. It is expected that threading is or would be involved in developing a performance-driven Tcp client/server mechanism - whether through async or sync methods.
In order to keep a socket from blocking, you can set it's Blocking property to false.
I hope this gives you some good information to work with. Heck, I even learned a little bit...

What are good sources to study the threading implementation of a XMPP application?

From my understanding the XMPP protocol is based on an always-on connection where you have no, immediate, indication of when an XML message ends.
This means you have to evaluate the stream as it comes. This also means that, probably, you have to deal with asynchronous connections since the socket can block in the middle of an XML message, either due to message length or a connection being slow.
I would appreciate one source per answer so we can mod them up and see what's the favourite.

Are you wanting to deal with multiple connections at once? Good asynch socket processing is a must in that case, to avoid one thread per connection.
Otherwise, you just need an XML parser that can deal with a chunk of bytes at a time. Expat is the canonical example; if you're in Java, try XP. These types of XML parsers will fire events as possible, and buffer partial stanzas until the rest arrives.
Now, to address your assertion that there is no notification when a stanza ends, that's not really true. The important thing is not to process the XML stream as if it is a sequence of documents. Use the following pseudo-code:
stanza = null
while parser has more:
switch on token type:
START_TAG:
elem = create element from parser state
if stanza is not null:
add elem as child of stanza
stanza = elem
END_TAG:
parent = parent of stanza
if parent is not null:
fire OnStanza event
stanza = parent
This approach should work with an event-based or pull parser. It only requires holding on to one pointer worth of state. Obviously, you'll also need to handle attributes, character data, entity references (like & and the like), and special-purpose the stream:stream tag, but this should get you started.

Igniterealtime.org provides an open source XMPP-server and client written in java

ejabberd is written in Erlang. I don't know the details of the ejabberd implementation, but one advantage of using Erlang is really inexpensive threads. I'll speculate they start a thread per XMPP connection. In Erlang terminology these would be called processes, but these are not protected-memory address spaces they are lightweight user-space threads.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string