Encode utf8 on TcpStream

I am trying to route all my traffic with iptables:
iptables -t nat -D OUTPUT -p tcp -j DNAT --to-destination 127.0.0.1:3400
to my Rust code, which is listening on a specific port:
let addrs = [
    SocketAddr::from(([127, 0, 0, 1], 3400)),
];
let tcp = TcpListener::bind(&addrs[..]).expect("error bind tcp");
match tcp.accept() {
    Ok((_socket,addr)) => println!("{:?} ",addr),
    Err(_) => println!("error found"),
}
let mut buffer = [0;500];
let mut buf = unsafe {
    slice::from_raw_parts_mut((&mut buffer).as_mut_ptr(),buffer.len())
};
for stream in tcp.incoming() {
    let buf = stream.unwrap().read(buf).expect("stream read buffer ");
    let result = StrType::from_utf8(&buffer).expect("result decode failed");
    // println!("{:?} {:?}",buffer,buf);
    println!("{:?}",buf);
    println!("{}",result.len());
    println!("{:?}\n\n",result);
}
Then I want to read my data, which is UTF-8, but I get this error:
thread 'main' panicked at 'result decode failed: Utf8Error { valid_up_to: 8, error_len: Some(1) }', src/main.rs:46:50
How can I resolve this error, or how else can I get the requested data?
Thanks for your help.

Because a UTF-8 encoded character can be 1 to 4 bytes long, data arriving over the network (or any other stream) can be split in the middle of a character, either at a packet boundary or at the end of the buffer you read into. Rust requires that str and String contain only valid UTF-8, so interpreting such a byte slice as a UTF-8 string returns an error.
Luckily, the error type Utf8Error records how many leading bytes of the slice are valid UTF-8 (valid_up_to). So you can decode only the first, valid part and prepend the remaining bytes to the next chunk of data. You can see an example of that in the linked documentation.
Also, you don't need the unsafe slice::from_raw_parts_mut; just pass &mut buffer.
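The approach described above could look roughly like the sketch below. Utf8Chunker and its push method are hypothetical names made up for this illustration; only str::from_utf8, Utf8Error::valid_up_to and Utf8Error::error_len come from the standard library. It assumes that a trailing incomplete character should be held back until more bytes arrive, and that genuinely invalid bytes should be reported to the caller.

```rust
use std::str;

/// Hypothetical helper: buffers incoming bytes and returns only complete UTF-8 text.
struct Utf8Chunker {
    pending: Vec<u8>,
}

impl Utf8Chunker {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }

    /// Feed one chunk as read from the stream; returns the text decodable so far.
    fn push(&mut self, chunk: &[u8]) -> Result<String, str::Utf8Error> {
        self.pending.extend_from_slice(chunk);
        let valid = match str::from_utf8(&self.pending) {
            Ok(_) => self.pending.len(),
            // error_len() == None: the input ended in the middle of a character,
            // so keep the tail and wait for the next chunk.
            Err(e) if e.error_len().is_none() => e.valid_up_to(),
            // error_len() == Some(n): the bytes are invalid UTF-8, not merely split.
            Err(e) => return Err(e),
        };
        let text: Vec<u8> = self.pending.drain(..valid).collect();
        Ok(String::from_utf8(text).expect("prefix was validated above"))
    }
}

fn main() {
    let data = "héllo wörld".as_bytes();
    let mut chunker = Utf8Chunker::new();
    let mut text = String::new();
    // Two-byte chunks deliberately split the two-byte character 'é'.
    for chunk in data.chunks(2) {
        text.push_str(&chunker.push(chunk).expect("invalid UTF-8"));
    }
    println!("{text}");
}
```

Note that the panic message in the question reports error_len: Some(1), which this sketch treats as invalid input rather than a split character, so it would take the error branch.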

Related

Will `TcpStream` be disabled once it receives an invalid UTF-8?

I'm trying to create a server that receives a string from a client through TCP socket communication and returns the same string to the same socket. I want the following specifications:
Repeat communication with the client (corresponding to the loop block in the code below)
When the server receives valid UTF-8, return the same string (Ok branch in the loop block)
When the server receives invalid UTF-8, return the string "Invalid data" (Err branch in the loop block)
use std::io::Error;
use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader};
use tokio::net::TcpListener;
#[tokio::main]
async fn main() -> Result<(), Error> {
    // Create TCP listener
    let addr = "localhost:8080";
    let socket = TcpListener::bind(&addr).await;
    let listener = socket.unwrap();
    // Accept connection
    let (mut socket, _addr) = listener.accept().await.unwrap();
    // Split socket into read and write halves
    let (reader, mut writer) = socket.split();
    // Read buffer into string
    let mut line = String::new();
    let mut buf_reader = BufReader::new(reader);
    loop {
        match buf_reader.read_line(&mut line).await {
            Ok(bytes) => {
                if bytes == 0 {
                    // `bytes == 0` means the connection is closed
                    break;
                }
            }
            Err(error) => {
                println!("{error}");
                line = "Invalid data".to_string();
            }
        }
        // Respond to client
        writer.write_all(line.as_bytes()).await.unwrap();
        line.clear();
    }
    Ok(())
}
The client is telnet on macOS.
telnet localhost 8080
Steps to reproduce the issue:
Typing "hello" returns "hello".
Typing ctrl-C and pressing Enter shows "stream did not contain valid UTF-8" on the server side and no response from the server is displayed on the telnet side (I want "Invalid data" to be displayed).
Typing "hello" again returns nothing, even though I have confirmed that the server is receiving it.
telnet output:
hello
hello
^C
hello
Will the TcpStream become invalid once invalid UTF-8 is received?
Expected behaviours
The server returns "Invalid data" when it receives invalid UTF-8 characters.
The server returns the same characters it receives if they are valid UTF-8 characters, even after receiving invalid UTF-8 characters in the previous loop.

It was an issue with telnet.
I created invalid.csv which contains 3 rows alternating between valid and invalid rows of UTF-8 sequences:
invalid.csv
hello
(invalid UTF-8 sequences)
hello
Then, I used a pipe with nc command:
cat invalid.csv | nc localhost 8080
The output was:
hello
Invalid data
hello
which is as expected.
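For anyone who wants to repeat this check without a network client, here is a minimal synchronous sketch using only the standard library. It is not the tokio server from the question: echo_lines is a hypothetical helper, the input is an in-memory Cursor instead of a socket, and it reads raw bytes with read_until and validates each line itself, which is an assumption about how one might structure it.

```rust
use std::io::{self, BufRead, Cursor, Write};

/// Hypothetical helper: echo each line, replacing invalid UTF-8 lines with "Invalid data".
fn echo_lines<R: BufRead, W: Write>(mut reader: R, mut writer: W) -> io::Result<()> {
    let mut line = Vec::new();
    loop {
        line.clear();
        // read_until works on raw bytes, so invalid UTF-8 is not an I/O error here.
        if reader.read_until(b'\n', &mut line)? == 0 {
            return Ok(()); // 0 bytes read means end of input
        }
        match std::str::from_utf8(&line) {
            Ok(text) => writer.write_all(text.as_bytes())?,
            Err(_) => writer.write_all(b"Invalid data\n")?,
        }
    }
}

fn main() -> io::Result<()> {
    // Same shape as invalid.csv: valid row, invalid row, valid row.
    let input = Cursor::new(b"hello\n\xff\xfe\nhello\n".to_vec());
    let mut output = Vec::new();
    echo_lines(input, &mut output)?;
    print!("{}", String::from_utf8_lossy(&output));
    Ok(())
}
```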

How to set ALPN protocols before TLS handshake with OpenSSL in Rust?

I want to set the ALPN protocols to "h2" and "http/1.1" before the TLS handshake. I am using .set_alpn_protos(). However, my attempt yields an error at runtime:
context.set_alpn_protos(b"\x06h2\x08http/1.1").expect("set ALPN error");
thread 'main' panicked at 'set ALPN error: ErrorStack([])', src/checker/tls/get_tls_info.rs:58:56
I can set them successfully in Python like this:
ssl.set_alpn_protos([b'h2', b'http/1.1'])
What am I doing wrong?

From the docs you linked (emphasis mine):
It consists of a sequence of supported protocol names prefixed by their byte length.
Replace \x06 with \x02, since b"h2" is 2 bytes, not 6:
context.set_alpn_protos(b"\x02h2\x08http/1.1").expect("set ALPN error");
You could also use something like the following to compute the wire format from a list similar to Python's:
let protos: &[&[u8]] = &[b"h2", b"http/1.1"];
let wire = protos.into_iter().flat_map(|proto| {
    let mut proto = proto.to_vec();
    let len: u8 = proto.len().try_into().expect("proto is too long");
    proto.insert(0, len);
    proto
}).collect::<Vec<_>>();
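If you want this as a reusable, non-panicking function, a sketch might look like the following. alpn_wire_format is a hypothetical name, not part of the openssl crate; it only builds the length-prefixed byte string, and I'm assuming you would then pass the result to set_alpn_protos as in the question.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: encode protocol names as length-prefixed bytes.
/// Returns None if a name is empty or longer than 255 bytes.
fn alpn_wire_format(protos: &[&[u8]]) -> Option<Vec<u8>> {
    let mut wire = Vec::new();
    for proto in protos {
        // Each name needs a non-zero length that fits in a single byte.
        let len = u8::try_from(proto.len()).ok().filter(|&n| n > 0)?;
        wire.push(len);
        wire.extend_from_slice(proto);
    }
    Some(wire)
}

fn main() {
    let protos: [&[u8]; 2] = [b"h2", b"http/1.1"];
    let wire = alpn_wire_format(&protos).expect("protocol name has an invalid length");
    // Matches the corrected literal above.
    assert_eq!(wire, b"\x02h2\x08http/1.1".to_vec());
    println!("{wire:?}");
}
```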

Nodejs asymmetrical buffer <-> string conversion

In nodejs I had naively expected the following to always output true:
let buff = Buffer.allocUnsafe(20); // Essentially random contents
let str = buff.toString('utf8');
let decode = Buffer.from(str, 'utf8');
console.log(0 === buff.compare(decode));
Given a Buffer buff, how can I detect ahead of time whether buff will be exactly equal to Buffer.from(buff.toString('utf8'), 'utf8')?

You should probably be fine just testing that the input buffer contains valid UTF-8 data:
try {
    new TextDecoder('utf-8', { fatal: true }).decode(buff);
    console.log(true);
} catch {
    console.log(false);
}
But I wouldn't swear on Node being 100% consistent in the handling of invalid UTF-8 data when converting from string to buffer. If you want to be safe, you'll have to stick to buffer comparison. You could make the process of encoding/decoding a little more efficient by using transcode, which does not require creating a temporary string.
import { transcode } from 'buffer';
let buff = Buffer.allocUnsafe(20);
let decode = transcode(buff, 'utf8', 'utf8');
console.log(0 === buff.compare(decode));
If you're interested in how TextDecoder determines whether a buffer represents a valid UTF-8 string, the rigorous definition of this procedure can be found here.

Continuously process child process' outputs byte for byte with a BufReader

I'm trying to interact with an external command (in this case, exiftool) and reading the output byte by byte as in the example below.
While I can get it to work if I'm willing to first read in all the output and wait for the child process to finish, using a BufReader seems to result in indefinitely waiting for the first byte. I used this example as reference for accessing stdout with a BufReader.
use std::io::{Write, Read};
use std::process::{Command, Stdio, ChildStdin, ChildStdout};
fn main() {
    let mut child = Command::new("exiftool")
        .arg("-#") // "Read command line options from file"
        .arg("-") // use stdin for -#
        .arg("-q") // "quiet processing" (only send image data to stdout)
        .arg("-previewImage") // for extracting thumbnails
        .arg("-b") // "Output metadata in binary format"
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn().unwrap();
    {
        // Pass input file names via stdin
        let stdin: &mut ChildStdin = child.stdin.as_mut().unwrap();
        stdin.write_all("IMG_1709.CR2".as_bytes()).unwrap();
        // Leave scope:
        // "When an instance of ChildStdin is dropped, the ChildStdin’s underlying file handle will
        // be closed."
    }
    // This doesn't work:
    let stdout: ChildStdout = child.stdout.take().unwrap();
    let reader = std::io::BufReader::new(stdout);
    for (byte_i, byte_value) in reader.bytes().enumerate() {
        // This line is never printed and the program doesn't seem to terminate:
        println!("Obtained byte {}: {}", byte_i, byte_value.unwrap());
        // …
        break;
    }
    // This works:
    let output = child.wait_with_output().unwrap();
    for (byte_i, byte_value) in output.stdout.iter().enumerate() {
        println!("Obtained byte {}: {}", byte_i, byte_value);
        // …
        break;
    }
}

You're not closing the child's stdin. Your stdin variable is a mutable reference, and dropping that has no effect on the referenced ChildStdin.
Use child.stdin.take() instead of child.stdin.as_mut():
{
    // Pass input file names via stdin
    // `mut` is needed because write_all takes &mut self
    let mut stdin: ChildStdin = child.stdin.take().unwrap();
    stdin.write_all("IMG_1709.CR2".as_bytes()).unwrap();
    // Leave scope:
    // "When an instance of ChildStdin is dropped, the ChildStdin’s underlying file handle will
    // be closed."
}
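Here is a self-contained sketch of the whole round trip with that fix applied. I don't have exiftool at hand, so it assumes cat as a stand-in child process (it copies stdin to stdout and exits at EOF); pipe_through is a hypothetical helper name.

```rust
use std::io::{self, BufReader, Read, Write};
use std::process::{Command, Stdio};

/// Hypothetical helper: feed `input` to `program` and collect its stdout byte by byte.
fn pipe_through(program: &str, input: &[u8]) -> io::Result<Vec<u8>> {
    let mut child = Command::new(program)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    {
        // take() moves the ChildStdin out of `child`, so it is really dropped
        // (and the pipe closed) at the end of this block.
        let mut stdin = child.stdin.take().expect("stdin was piped");
        // Fine for small inputs; a large input could fill the pipe before we start reading.
        stdin.write_all(input)?;
    }

    let stdout = child.stdout.take().expect("stdout was piped");
    let mut collected = Vec::new();
    for byte in BufReader::new(stdout).bytes() {
        collected.push(byte?);
    }
    child.wait()?;
    Ok(collected)
}

fn main() -> io::Result<()> {
    // `cat` is an assumed stand-in for exiftool.
    let bytes = pipe_through("cat", b"IMG_1709.CR2\n")?;
    for (byte_i, byte_value) in bytes.iter().enumerate() {
        println!("Obtained byte {}: {}", byte_i, byte_value);
    }
    Ok(())
}
```

With as_mut() instead of take(), the child would never see EOF on its stdin and the read loop would block, which matches the behaviour described in the question.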

strace can't parse my netlink message, but it appears valid

I'm taking my first stab at using the NetLink API in linux. I'm using Rust because it hasn't bitten me in the ass enough for me to go back to C yet. I figured a good place to start would be to enumerate the netlink devices, since there's already a utility that does that (ip link). When I run the Rust code it returns 3 devices out of the 6 devices that ip link returns. So I'm trying to inspect the request that I'm sending vs. what ip link is sending.
# Mine
$ sudo strace -ff -v -e trace=%network mine 2>&1 |grep sendto
sendto(3,
{
{nlmsg_len=40, nlmsg_type=0x12 /* NLMSG_??? */, nlmsg_flags=NLM_F_REQUEST|0x300, nlmsg_seq=1620766291, nlmsg_pid=0},
"\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x1d\x00\x01\x00\x00\x00"
},
40, 0, {sa_family=AF_NETLINK, nl_pid=0, nl_groups=00000000}, 12) = 40
$ sudo strace -ff --const-print-style=raw -v -e trace=%network mine 2>&1 |grep sendto
sendto(3,
{
{nlmsg_len=40, nlmsg_type=0x12, nlmsg_flags=0x301, nlmsg_seq=1620766293, nlmsg_pid=0},
"\x11\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x08\x00\x1d\x00\x01\x00\x00\x00"
},
40, 0, {sa_family=0x10, nl_pid=0, nl_groups=00000000}, 12) = 40
# ip link
$ sudo strace -ff -v -e trace=%network ip link 2>&1 |grep sendto
sendto(3,
{
{nlmsg_len=40, nlmsg_type=RTM_GETLINK, nlmsg_flags=NLM_F_REQUEST|NLM_F_DUMP, nlmsg_seq=1620765818, nlmsg_pid=0},
{ifi_family=AF_PACKET, ifi_type=ARPHRD_NETROM, ifi_index=0, ifi_flags=0, ifi_change=0},
{
{nla_len=8, nla_type=IFLA_EXT_MASK},
1
}
},
40, 0, NULL, 0) = 40
$ sudo strace -ff --const-print-style=raw -v -e trace=%network ip link 2>&1 |grep sendto
sendto(3,
{
{nlmsg_len=40, nlmsg_type=0x12, nlmsg_flags=0x301, nlmsg_seq=1620765854, nlmsg_pid=0},
{ifi_family=0x11, ifi_type=0, ifi_index=0, ifi_flags=0, ifi_change=0},
{
{nla_len=8, nla_type=0x1d},
1
}
},
40, 0, NULL, 0) = 40
The two differences that I can spot are that (A) ip link binds the socket whereas the Rust library provides a dest_addr argument to sendto. I doubt this is relevant. (B) strace can parse the structure sent by ip link but can't seem to fully parse the structure sent by my Rust program. strace says both programs agree on the struct nlmsghdr header. However, strace doesn't parse the struct ifinfomsg of my program. Looking at the bytes, however, it appears to match.
The Rust library I'm using, netlink-packet-route, doesn't seem to have an obvious equivalent to struct rtattr. Adjacent to the ifinfomsg (called LinkHeader in the Rust library) there's a list of what it calls Nla. The enum values line up with their C equivalents and, as shown above, the constant values line up too.
man rtnetlink doesn't mention anything about IFLA_EXT_MASK as a possible rtattr for RTM_GETLINK and I don't get many other hits in documentation for it.
I guess the next step is to pop both into gdb and see if there's any other observable difference between the two calls.
The super-ugly demo-quality Rust code that produces the above message:
use std::io::{Result, Error, ErrorKind};
use netlink_sys::{Socket, protocols::NETLINK_ROUTE, SocketAddr};
use netlink_packet_core::{NetlinkMessage, NetlinkHeader, NLM_F_DUMP, NLM_F_REQUEST};
use netlink_packet_route::{RtnlMessage, LinkMessage, LinkHeader, AF_PACKET, ARPHRD_NETROM};
use netlink_packet_core::NetlinkPayload::InnerMessage;
use netlink_packet_route::RtnlMessage::NewLink;
use netlink_packet_route::link::nlas::Nla;
use std::time::{UNIX_EPOCH, SystemTime};
fn main() -> Result<()> {
    println!("Hello, world!");
    let socket = Socket::new(NETLINK_ROUTE)?;
    let kernel_addr = SocketAddr::new(0, 0);
    let msgid = SystemTime::now().duration_since(UNIX_EPOCH).map_err(|_|{Error::from(ErrorKind::Other)})?.as_secs();
    let nlas: Vec<Nla> = vec![Nla::ExtMask(1)];
    let mut packet = NetlinkMessage {
        header: NetlinkHeader {
            sequence_number: msgid as u32,
            flags: NLM_F_DUMP | NLM_F_REQUEST,
            ..Default::default()
        },
        payload: RtnlMessage::GetLink(LinkMessage {header: LinkHeader {
            interface_family: AF_PACKET as u8,
            link_layer_type: ARPHRD_NETROM,
            ..LinkHeader::default()}, nlas, ..LinkMessage::default()}).into(),
    };
    packet.finalize();
    let mut buf = vec![0; packet.header.length as usize];
    packet.serialize(&mut buf[..]);
    let n_sent = socket.send_to(&buf[..], &kernel_addr, 0).unwrap();
    assert_eq!(n_sent, buf.len());
    let mut buf = vec![0; 4096];
    loop {
        let (n_received, sender_addr) = socket.recv_from(&mut buf[..], 0).unwrap();
        assert_eq!(sender_addr, kernel_addr);
        for i in &mut buf[n_received..] { *i = 0 };
        if n_received == 4096 { return Err(Error::from(ErrorKind::OutOfMemory))}
        if buf[4] == 2 && buf[5] == 0 {
            println!("the kernel responded with an error");
            return Err(Error::from(ErrorKind::ConnectionReset));
        }
        if buf[4] == 3 && buf[5] == 0 {
            println!("Done");
            return Ok(());
        }
        let resp = NetlinkMessage::<RtnlMessage>::deserialize(&buf).expect("Failed to deserialize message");
        match resp.payload {
            InnerMessage(i) => match i {
                NewLink(j) => {
                    let name = j.nlas.iter().find(|nla| { matches!(nla, Nla::IfName(_))});
                    println!("index {:?}: {:?}", j.header.index, name);
                },
                _ => println!("Some other message type")
            },
            _ => println!("Some other message type")
        }
    }
}

So it turns out that netlink will send more than one response structure in a single response packet. I discovered this by setting up a monitor interface for netlink and tcpdump'ing it. The total number of bytes sent back in response to ip link and to my program were similar, but they were distributed over differing numbers of packets. The size of the buffer used on the recvmsg call affects the way netlink packs the results.
sudo ip link add nlmon0 type nlmon
sudo ip link set dev nlmon0 up
sudo tcpdump -i nlmon0 -w /tmp/dump &
ip link
mine
fg
CTRL+C
tshark -r /tmp/dump
1 0.000000 → Netlink route 56
2 0.000073 → Netlink route 2660
3 0.000127 → Netlink route 2688
4 0.000205 → Netlink route 4060
5 0.000258 → Netlink 36
6 3.622386 → Netlink route 56
7 3.622449 → Netlink route 2660
8 3.622512 → Netlink route 2688
9 3.623179 → Netlink route 2740
10 3.623748 → Netlink route 1336
11 3.624273 → Netlink 36
I will need to figure out how this Rust library is set up to deal with this.
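In the meantime, here is a rough sketch of how one received buffer could be walked message by message using only the standard library. It does not use netlink-packet-route; split_netlink_messages and fake_message are hypothetical helpers, and the sketch assumes the usual 16-byte struct nlmsghdr in native byte order with each message padded to a 4-byte boundary. My loop above appears to look only at the first header in each buffer, which would explain the missing devices.

```rust
/// Size of struct nlmsghdr: u32 len, u16 type, u16 flags, u32 seq, u32 pid.
const NLMSG_HDRLEN: usize = 16;

/// Hypothetical helper: split one received buffer into (message type, payload) pairs.
fn split_netlink_messages(buf: &[u8]) -> Vec<(u16, &[u8])> {
    let mut messages = Vec::new();
    let mut offset = 0;
    while offset + NLMSG_HDRLEN <= buf.len() {
        let len = u32::from_ne_bytes([
            buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3],
        ]) as usize;
        let msg_type = u16::from_ne_bytes([buf[offset + 4], buf[offset + 5]]);
        if len < NLMSG_HDRLEN || len > buf.len() - offset {
            break; // truncated or malformed message
        }
        messages.push((msg_type, &buf[offset + NLMSG_HDRLEN..offset + len]));
        // The next header starts at the next 4-byte boundary.
        offset += (len + 3) & !3;
    }
    messages
}

/// Hypothetical test fixture: build a header-plus-payload message, padded to 4 bytes.
fn fake_message(msg_type: u16, payload: &[u8]) -> Vec<u8> {
    let len = (NLMSG_HDRLEN + payload.len()) as u32;
    let mut msg = Vec::new();
    msg.extend_from_slice(&len.to_ne_bytes());
    msg.extend_from_slice(&msg_type.to_ne_bytes());
    msg.extend_from_slice(&[0u8; 10]); // flags, sequence number, pid
    msg.extend_from_slice(payload);
    while msg.len() % 4 != 0 {
        msg.push(0);
    }
    msg
}

fn main() {
    // Two RTM_NEWLINK (16) messages followed by NLMSG_DONE (3) in one buffer.
    let mut datagram = fake_message(16, b"eth0\0");
    datagram.extend(fake_message(16, b"lo\0"));
    datagram.extend(fake_message(3, &[]));
    for (msg_type, payload) in split_netlink_messages(&datagram) {
        println!("type {msg_type}: {} payload bytes", payload.len());
    }
}
```

Only the first n_received bytes of the receive buffer should be passed in, and each payload slice would still need to be handed to the library (or parsed by hand) to get at the link attributes.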
