Will `TcpStream` be disabled once it receives invalid UTF-8? - rust

I'm trying to create a server that receives a string from a client over a TCP socket and returns the same string on the same socket. I want the following behavior:
Keep communicating with the client repeatedly (the loop block in the code below).
When the server receives valid UTF-8 from the client, return the same text (the Ok branch in the loop block).
When the server receives data that is not valid UTF-8, return the string "Invalid data" (the Err branch in the loop block).
use std::io::Error;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::TcpListener;

#[tokio::main]
async fn main() -> Result<(), Error> {
    // Create TCP listener
    let addr = "localhost:8080";
    let socket = TcpListener::bind(&addr).await;
    let listener = socket.unwrap();
    // Accept connection
    let (mut socket, _addr) = listener.accept().await.unwrap();
    // Split socket into read and write halves
    let (reader, mut writer) = socket.split();
    // Read buffer into string
    let mut line = String::new();
    let mut buf_reader = BufReader::new(reader);
    loop {
        match buf_reader.read_line(&mut line).await {
            Ok(bytes) => {
                if bytes == 0 {
                    // `bytes == 0` means the connection is closed
                    break;
                }
            }
            Err(error) => {
                println!("{error}");
                // Valid lines keep their trailing newline, so add one here too
                line = "Invalid data\n".to_string();
            }
        }
        // Respond to client
        writer.write_all(line.as_bytes()).await.unwrap();
        line.clear();
    }
    Ok(())
}
The client is telnet on macOS.
telnet localhost 8080
Here is how to reproduce the issue:
Typing "hello" returns "hello".
Typing Ctrl-C and pressing Enter shows "stream did not contain valid UTF-8" on the server side, and telnet displays no response from the server (I want "Invalid data" to be displayed).
Typing "hello" again returns nothing, even though I have confirmed that the server is receiving it.
telnet output:
hello
hello
^C
hello
Will the TcpStream become invalid once invalid UTF-8 is received?
Expected behaviours
The server returns "Invalid data" when it receives invalid UTF-8 characters.
The server returns the same characters it receives if they are valid UTF-8 characters, even after receiving invalid UTF-8 characters in the previous loop.

It was an issue with telnet, not with the TcpStream.
To confirm, I created invalid.csv, which contains three lines alternating between valid and invalid UTF-8 sequences:
invalid.csv
hello
(invalid UTF-8 sequences)
hello
Then I piped it into nc:
cat invalid.csv | nc localhost 8080
The output was:
hello
Invalid data
hello
which is as expected.
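The same echo logic can be checked without a socket. The sketch below is a std-only stand-in: it uses std's blocking BufRead::read_line on an in-memory Cursor instead of tokio's socket, the helper name echo_lines is hypothetical, and the input assumes Ctrl-C made telnet send its IAC IP command (bytes 0xFF 0xF4, where 0xFF can never appear in valid UTF-8). It shows that a failed read_line consumes the offending line and leaves the String unchanged, so the next valid line is read normally; as far as I know, tokio's AsyncBufReadExt::read_line follows the same error semantics.

```rust
use std::io::{BufRead, Cursor, ErrorKind, Result, Write};

/// Hypothetical helper: echo each line from `input` to `output`, replacing
/// any line that is not valid UTF-8 with "Invalid data\n".
fn echo_lines<R: BufRead, W: Write>(mut input: R, mut output: W) -> Result<()> {
    let mut line = String::new();
    loop {
        match input.read_line(&mut line) {
            // 0 bytes read means EOF (the peer closed the connection).
            Ok(0) => break,
            // Valid UTF-8: `line` still contains its trailing newline.
            Ok(_) => output.write_all(line.as_bytes())?,
            // Invalid UTF-8: the bad line has been consumed and `line` is left
            // unchanged, so the reader is still usable afterwards.
            Err(e) if e.kind() == ErrorKind::InvalidData => {
                output.write_all(b"Invalid data\n")?;
            }
            Err(e) => return Err(e),
        }
        line.clear();
    }
    Ok(())
}

fn main() -> Result<()> {
    // Simulated client input: a valid line, telnet's IAC IP bytes plus CRLF,
    // then another valid line.
    let input = Cursor::new(b"hello\n\xff\xf4\r\nhello\n".to_vec());
    let mut output = Vec::new();
    echo_lines(input, &mut output)?;
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```

The output mirrors the nc run above: one reply per line, with the invalid line replaced and the connection still usable.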

Related

Encode utf8 on TcpStream

I am trying to route all my traffic with iptables:
iptables -t nat -D OUTPUT -p tcp -j DNAT --to-destination 127.0.0.1:3400
to my Rust code, which is listening on a specific port:
let addrs = [
    SocketAddr::from(([127, 0, 0, 1], 3400)),
];
let tcp = TcpListener::bind(&addrs[..]).expect("error bind tcp");
match tcp.accept() {
    Ok((_socket, addr)) => println!("{:?} ", addr),
    Err(_) => println!("error found"),
}
let mut buffer = [0; 500];
let mut buf = unsafe {
    slice::from_raw_parts_mut((&mut buffer).as_mut_ptr(), buffer.len())
};
for stream in tcp.incoming() {
    let buf = stream.unwrap().read(buf).expect("stream read buffer ");
    let result = StrType::from_utf8(&buffer).expect("result decode failed");
    // println!("{:?} {:?}", buffer, buf);
    println!("{:?}", buf);
    println!("{}", result.len());
    println!("{:?}\n\n", result);
}
Then I want to read my data, which is UTF-8, but I get this error:
thread 'main' panicked at 'result decode failed: Utf8Error { valid_up_to: 8, error_len: Some(1) }', src/main.rs:46:50
How can I resolve this error, or how can I get the requested data?
Thanks for your help.
Since UTF-8 encoded characters can be 1 to 4 bytes long, when you receive data over the network (or from any other stream), a packet (or the buffer you read into) can end in the middle of a character. Rust requires str and String to contain only valid UTF-8, so interpreting such bytes as a UTF-8 string returns an error.
Luckily, the error type Utf8Error tells you up to which byte the slice is valid UTF-8. So you can use only the first, valid part and prepend the rest to the data from the next read. The linked documentation shows an example of this.
Also, you don't need the unsafe slice::from_raw_parts_mut; just use &mut buffer. Note too that read returns how many bytes it wrote, so decode only that prefix of the buffer rather than all 500 bytes.
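A minimal std-only sketch of that carry-over approach follows. The helper name decode_chunk is hypothetical, and skipping invalid bytes on error is one possible policy, not the only one. Note that the error in the question has error_len: Some(1), which per the Utf8Error documentation means the byte at index 8 is invalid outright; a character cut off at the end of the buffer shows up as error_len: None, and only that case is fixed by waiting for more data.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: append `chunk` to `pending`, move every complete
/// UTF-8 character into `out`, and keep an incomplete trailing character
/// in `pending` so the next chunk can finish it.
fn decode_chunk(pending: &mut Vec<u8>, chunk: &[u8], out: &mut String) -> Result<(), Utf8Error> {
    pending.extend_from_slice(chunk);
    match str::from_utf8(&pending[..]) {
        Ok(s) => {
            out.push_str(s);
            pending.clear();
            Ok(())
        }
        Err(e) => {
            let valid = e.valid_up_to();
            // The prefix up to `valid_up_to` is guaranteed to be valid UTF-8.
            out.push_str(str::from_utf8(&pending[..valid]).expect("prefix is valid"));
            match e.error_len() {
                // None: the data ends mid-character; keep the tail for later.
                None => {
                    pending.drain(..valid);
                    Ok(())
                }
                // Some(len): `len` bytes are invalid no matter what follows.
                // Drop them (bytes after them stay pending) and report the error.
                Some(len) => {
                    pending.drain(..valid + len);
                    Err(e)
                }
            }
        }
    }
}

fn main() {
    // "héllo": 'é' is two bytes, and the first read ends between them.
    let data = "héllo".as_bytes();
    let mut pending = Vec::new();
    let mut out = String::new();
    for chunk in [&data[..2], &data[2..]] {
        decode_chunk(&mut pending, chunk, &mut out).expect("invalid UTF-8");
    }
    println!("{}", out);
}
```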

Continuously process child process' outputs byte for byte with a BufReader

I'm trying to interact with an external command (in this case, exiftool) and reading the output byte by byte as in the example below.
While I can get it to work if I'm willing to first read in all the output and wait for the child process to finish, using a BufReader seems to result in indefinitely waiting for the first byte. I used this example as reference for accessing stdout with a BufReader.
use std::io::{Write, Read};
use std::process::{Command, Stdio, ChildStdin, ChildStdout};

fn main() {
    let mut child = Command::new("exiftool")
        .arg("-@") // "Read command line options from file"
        .arg("-") // use stdin for -@
        .arg("-q") // "quiet processing" (only send image data to stdout)
        .arg("-previewImage") // for extracting thumbnails
        .arg("-b") // "Output metadata in binary format"
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn().unwrap();
    {
        // Pass input file names via stdin
        let stdin: &mut ChildStdin = child.stdin.as_mut().unwrap();
        stdin.write_all("IMG_1709.CR2".as_bytes()).unwrap();
        // Leave scope:
        // "When an instance of ChildStdin is dropped, the ChildStdin’s underlying file handle will
        // be closed."
    }
    // This doesn't work:
    let stdout: ChildStdout = child.stdout.take().unwrap();
    let reader = std::io::BufReader::new(stdout);
    for (byte_i, byte_value) in reader.bytes().enumerate() {
        // This line is never printed and the program doesn't seem to terminate:
        println!("Obtained byte {}: {}", byte_i, byte_value.unwrap());
        // …
        break;
    }
    // This works:
    let output = child.wait_with_output().unwrap();
    for (byte_i, byte_value) in output.stdout.iter().enumerate() {
        println!("Obtained byte {}: {}", byte_i, byte_value);
        // …
        break;
    }
}
You're not closing the child's stdin. Your stdin variable is a mutable reference, and dropping it has no effect on the referenced ChildStdin, so exiftool never sees the end of its input and keeps waiting instead of producing output.
Use child.stdin.take() instead of child.stdin.as_mut():
{
    // Pass input file names via stdin.
    // `take()` moves the ChildStdin out of `child`; `mut` is needed for write_all.
    let mut stdin: ChildStdin = child.stdin.take().unwrap();
    stdin.write_all("IMG_1709.CR2".as_bytes()).unwrap();
    // Leave scope:
    // "When an instance of ChildStdin is dropped, the ChildStdin’s underlying file handle will
    // be closed."
}
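A minimal, self-contained sketch of the same fix, assuming a Unix-like system where `cat` is on the PATH (standing in for exiftool, since it simply echoes its stdin back); the function name roundtrip_through_cat is hypothetical. Because take() moves the ChildStdin out of the Child, dropping it at the end of the inner scope closes the pipe; `cat` then sees EOF, writes everything and exits, so the BufReader byte loop terminates.

```rust
use std::io::{BufReader, Read, Result, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: pipe `input` through `cat` and return what it prints.
fn roundtrip_through_cat(input: &[u8]) -> Result<Vec<u8>> {
    let mut child = Command::new("cat")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    {
        // `take()` moves the ChildStdin out of `child`; it is dropped (and the
        // pipe closed) at the end of this scope, so `cat` sees end-of-file.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        stdin.write_all(input)?;
    }
    let stdout = child.stdout.take().expect("stdout was piped");
    // Reading byte by byte through a BufReader now terminates at EOF.
    let bytes = BufReader::new(stdout).bytes().collect::<Result<Vec<u8>>>()?;
    child.wait()?;
    Ok(bytes)
}

fn main() -> Result<()> {
    let out = roundtrip_through_cat(b"IMG_1709.CR2\n")?;
    for (byte_i, byte_value) in out.iter().enumerate() {
        println!("Obtained byte {}: {}", byte_i, byte_value);
    }
    Ok(())
}
```

Writing all input before reading any output is fine for a short file name, but with large inputs it can deadlock once the pipe buffers fill; in that case write from a separate thread.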

How to know when the server has received the whole request?

I am implementing the HTTP/1.1 protocol from scratch for academic purposes. I have implemented a RequestBuilder, which builds the request object incrementally from the buffers passed to it. This is the code that handles the opened socket:
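use std::io::ErrorKind;
use tokio::net::TcpStream;

async fn process_socket(stream: TcpStream) -> Result<Request> {
    let mut request_builder = RequestBuilder::new();
    // Zero-initialized: `MaybeUninit::uninit().assume_init()` on [u8; N] is undefined behavior.
    let mut buffer = [0u8; 1024];
    loop {
        stream.readable().await?;
        let n = match stream.try_read(&mut buffer) {
            Ok(0) => {
                // The peer closed its write half (FIN): no more data will arrive.
                break;
            }
            Ok(n) => n,
            Err(ref e) if e.kind() == ErrorKind::WouldBlock => {
                // Spurious readiness; wait again.
                continue;
            }
            Err(e) => {
                return Err(e.into());
            }
        };
        // Only the first `n` bytes were filled by this read.
        request_builder.parse(&buffer[..n]);
    }
    let request = request_builder.build()?;
    Ok(request)
}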
request_builder.parse takes the next part of the buffer and parses the request further. My question is how to break the loop once the client has sent the whole request. When I make a request to the server using curl localhost:8080, the whole request is parsed.
Expected behaviour
The loop would have been broken after reading the whole request stream.
Actual behaviour
The loop is stuck at stream.readable().await?; after reading the whole request into the buffer. Currently, when I kill the curl command with Ctrl+C, the loop ends via the Ok(0) branch, but I want it to break after reading the whole request.
You need to interpret the HTTP request itself to know when it is complete, because an HTTP client will not half-close the TCP connection after sending its request. A FIN from the client (which would violate the protocol, and which try_read reports as Ok(0)) is the only way readable() completes again, unless the client sends more data (which breaks the HTTP specification).
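In HTTP/1.1, a request's header block ends with an empty line ("\r\n\r\n"), and a body, if present, is usually as long as its Content-Length header says. The std-only sketch below checks whether a buffer already holds a complete request; the name complete_request_len is hypothetical, and it deliberately ignores chunked Transfer-Encoding, which a full implementation must also handle.

```rust
/// Hypothetical helper: returns `Some(total_len)` once `buf` holds a complete
/// HTTP/1.1 request, i.e. the header block ending in an empty line ("\r\n\r\n")
/// plus `Content-Length` bytes of body. Assumption: no chunked Transfer-Encoding.
fn complete_request_len(buf: &[u8]) -> Option<usize> {
    // End of the header block: the first empty line.
    let header_end = buf.windows(4).position(|w| w == b"\r\n\r\n")? + 4;
    let headers = std::str::from_utf8(&buf[..header_end]).ok()?;
    // Body length from Content-Length (header names are case-insensitive); 0 if absent.
    let content_length = headers
        .split("\r\n")
        .filter_map(|line| line.split_once(':'))
        .find(|(name, _)| name.trim().eq_ignore_ascii_case("content-length"))
        .map(|(_, value)| value.trim().parse::<usize>())
        .transpose()
        .ok()?
        .unwrap_or(0);
    let total = header_end + content_length;
    (buf.len() >= total).then_some(total)
}

fn main() {
    let partial = b"POST / HTTP/1.1\r\nContent-Length: 5\r\n\r\nhel";
    let full = b"POST / HTTP/1.1\r\nContent-Length: 5\r\n\r\nhello";
    println!("{:?}", complete_request_len(partial)); // None: body incomplete
    println!("{:?}", complete_request_len(full)); // Some(43)
}
```

In the question's loop, you would append each freshly read slice to a growing Vec<u8>, run a check like this after every read, and break as soon as it returns Some(...) instead of waiting for Ok(0).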

Why do I get a FrameTooBig error when using Tokio's frame_delimited?

I'm trying to get my feet wet using Tokio. When I send a message from a Telnet connection, I get Custom { kind: InvalidData, error: FrameTooBig }. I don't understand the issue, nor how to overcome it.
extern crate tokio;
extern crate tokio_io;

use tokio::io;
use tokio::net::TcpListener;
use tokio::prelude::*;
use tokio_io::codec::length_delimited;

fn main() {
    let addr = "127.0.0.1:12345".parse().unwrap();
    let listener = TcpListener::bind(&addr).unwrap();
    let server = listener
        .incoming()
        .for_each(|socket| {
            let transport = length_delimited::Builder::new().new_read(socket);
            let msg_proccessing = transport
                .for_each(|msg| {
                    // Note: This part is never actually executed
                    println!("{:?}", msg);
                    Ok(())
                })
                .map_err(|e| println!("waaaaaaaaaaaaaaaaa {:?}", e));
            tokio::spawn(msg_proccessing);
            Ok(())
        })
        .map_err(|_| {});
    tokio::run(server);
}
Client side:
▶ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
\x00\x00\x00\x0bhello world
Connection closed by foreign host.
The problem was on the client side, and it was related to how Telnet works. It's not straightforward to send raw hex bytes using Telnet, so I tried this instead, and it worked well:
echo '\x00\x00\x00\x0bhello world' | nc localhost 12345 #WORKS!
However, neither of these work:
echo '\x00\x00\x00\x0bhello world' | telnet localhost
▶ telnet localhost 12345
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
\x00\x00\x00\x0bhello world
Connection closed by foreign host.
It seems the FrameTooBig error was caused by the message telnet sent being bigger than what the server expected. I wasn't able to encode the frame properly using hex escapes, so the length header the server received did not match the actual message length, hence the error.
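Telnet sends exactly the characters you type, so the line above reaches the server as the ASCII text `\x00\x00...` rather than as four raw bytes. The std-only sketch below decodes the first four bytes the way a 4-byte big-endian length header is read; frame_len and MAX_FRAME_LEN are hypothetical names, and the header layout and the 8 MiB limit are assumptions about the codec's defaults. The typed text yields 0x5C783030 (roughly 1.55 billion), far above the limit, while the raw bytes yield the intended 11.

```rust
/// Hypothetical helper: read a 4-byte big-endian length prefix (assumed to
/// match the codec's default header). Returns None if fewer than 4 bytes.
fn frame_len(data: &[u8]) -> Option<u32> {
    let h = data.get(..4)?;
    Some(u32::from_be_bytes([h[0], h[1], h[2], h[3]]))
}

/// Assumed default maximum frame length of the codec (8 MiB).
const MAX_FRAME_LEN: u32 = 8 * 1024 * 1024;

fn main() {
    // Typed into telnet: literal backslashes, 'x' and digits (a raw string literal).
    let typed: &[u8] = br"\x00\x00\x00\x0bhello world";
    // What the codec expects: four raw header bytes followed by the payload.
    let raw: &[u8] = b"\x00\x00\x00\x0bhello world";
    for (name, data) in [("typed", typed), ("raw", raw)] {
        let len = frame_len(data).expect("at least 4 bytes");
        println!("{name}: header says {len} bytes, too big: {}", len > MAX_FRAME_LEN);
    }
}
```

With nc, the shell's echo turns the escapes into raw bytes before they are sent. zsh's echo does this by default, while bash's builtin echo needs -e; printf with octal escapes such as '\000\000\000\013hello world' is the portable choice.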

Using Elixir to talk to Rust via Ports, what am I getting wrong?

I'm writing a tutorial on communicating between Elixir and Rust via a Port, because I couldn't find a simple example of it anywhere.
I can get Rustler to work, but that is a NIF, not a Port.
I'm missing something fundamental in my code. I'm not sure if I'm missing something basic in stdio or if it's something else, but I've tried a lot of different things.
I can get port communication to work with a very basic program in Rust:
use std::env;

fn main() {
    println!("hello world!");
}
I can pull its output into my iex -S mix session by opening this port:
defmodule PortExample do
  def test() do
    port = Port.open({:spawn_executable, "_build/dev/rustler_crates/portexample/debug/portexample"}, [:binary])
    Port.info(port)
    port
  end
end
Here's what the iex for that looks like:
Interactive Elixir (1.4.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> PortExample.test()
#Port<0.9420>
iex(2)> flush()
{#Port<0.9420>, {:data, "hello world!\n"}}
:ok
iex(3)>
I can do the same using a Porcelain library call:
alias Porcelain.Result

def porcelain() do
  result = Porcelain.exec("_build/dev/rustler_crates/portexample/debug/portexample", ["hello", "world"])
  IO.inspect result.out
end
corresponding iex:
iex(3)> PortExample.porcelain()
"hello world!\n"
"hello world!\n"
iex(4)>
However, as soon as I start using a Rust library with some form of input/output, things start falling over.
For example, Rust code:
use std::io::{self, Write, Read};

fn main() {
    let mut input = String::new();
    let mut output = String::new();
    for i in 0..2 {
        match io::stdin().read_line(&mut input) {
            Ok(n) => {
                println!("input: {}", input.trim());
                io::stdout().flush();
            }
            Err(error) => println!("error: {}", error),
        }
    }
}
I can get it to compile and run on the command line:
hello
input: hello
world
input: hello
world
However, when I call it from an Elixir port:
iex(12)> port = PortExample.test()
#Port<0.8779>
iex(13)> Port.command(port, "hello")
true
iex(14)> Port.command(port, "world")
true
iex(15)> Port.command(port, "!")
true
iex(16)> Port.command(port, "more")
true
iex(17)> flush()
:ok
iex(18)> Port.info(port)
[name: '_build/dev/rustler_crates/portexample/debug/portexample',
links: [#PID<0.271.0>], id: 4697, connected: #PID<0.271.0>, input: 0,
output: 15, os_pid: 21523]
I get no data from it at all! However, the Port.info(port) call shows that it has received 15 bytes. It just hasn't returned anything at all to the port. I've been trying to read other code, and I thought I was doing things similarly enough that it should work, but it doesn't.
I thought: maybe the buffer isn't flushed? so I flush the buffer in Rust.
I thought: maybe the loop is hanging, so I limited it to only a few passes.
When I try to run this same code through the porcelain call, it hangs.
You're reading a line of input in the Rust code, which reads until a newline (\n), but you're not sending a newline character from Elixir. If you add a \n to the end of each message in the Port.command calls, it works:
iex(1)> port = Port.open({:spawn_executable, "a"}, [:binary])
#Port<0.1229>
iex(2)> Port.command(port, "hello")
true
iex(3)> flush()
:ok
iex(4)> Port.command(port, "hello\n")
true
iex(5)> flush()
{#Port<0.1229>, {:data, "input: hellohello\n"}}
:ok
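For completeness, here is a sketch of a Rust side that fits the answer's fix. It reads newline-terminated lines until EOF, clears the buffer between lines (the original keeps appending, which is why its second command-line reply above repeats hello), and flushes after each reply. The serve helper is hypothetical, and the Elixir side is still expected to end every Port.command message with "\n"; without it, read_line keeps waiting, which is why the first "hello" above only appeared together with the next message.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical helper: reply "input: <line>" for every line read from `input`,
/// flushing after each reply so the other end of the pipe sees it immediately.
fn serve<R: BufRead, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut line = String::new();
    loop {
        line.clear(); // otherwise read_line appends to the previous line
        if input.read_line(&mut line)? == 0 {
            return Ok(()); // EOF: the port was closed
        }
        writeln!(output, "input: {}", line.trim_end())?;
        output.flush()?;
    }
}

fn main() -> io::Result<()> {
    let stdin = io::stdin();
    serve(stdin.lock(), io::stdout())
}
```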

Resources