I'm trying to convert a file (that I opened and read into a buffer) into a valid BSON format.
I'm writing the client side for making a request that takes two fields:
Name of the file
File(buffer)
The problem is that I can't seem to make a successful conversion.
Another question: after making this conversion, is it possible to convert the BSON value into a buffer? That's the type the curl (Easy) crate takes for making its requests (i.e. requests made from the terminal, not from browser forms).
This is my code for making the request:
// It takes in a file path.
fn send_file_post(file_from_arg: &str) -> tide::Result {
    // Initialise the connection to the server
    let mut easy = Easy::new();
    easy.url("http://0.0.0.0:8080/hi").unwrap();

    // Opens and reads the file path
    let mut file = File::open(file_from_arg)?;
    let mut buf = [0; 1096];

    // Reads the file into the buffer
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            // reached end of file
            break;
        }
        // easy.write_all(&buf[..n])?;
    }

    // Attempted BSON format
    let bson_data: Bson = bson!({
        "name": file_from_arg,
        "file": buf
    });

    // Sending the request; this is only sending the file for now. I want to
    // send a BSON format that is buffered (in a buffer/bytes format)
    easy.post_fields_copy(&buf).unwrap();
    easy.write_function(|data| {
        stdout().write_all(data).unwrap();
        Ok(data.len())
    })
    .unwrap();

    println!(" oh hi{:?}", easy.perform().unwrap());
    Ok(format!("okay sent!").into())
}
I realised that serde_json has a to_vec method, whose output (a Vec<u8>) can be likened to bytes, if not the same thing. So I converted the file into bytes and sent it as a buffer. This is what the function looks like below.
// It takes in a file path.
fn send_file_post(file_from_arg: &str, port_addr: &str) -> tide::Result {
    // Initialise the connection to the server
    let mut easy = Easy::new();
    let port = format!("{}/hi", port_addr);
    easy.url(&port).unwrap();

    // Opens and reads the file into a Vec<u8>
    let file = std::fs::read(file_from_arg)?;

    // Extracts the name of the file from the file path
    let (.., file_name) = file_from_arg
        .rsplit_once(std::path::MAIN_SEPARATOR)
        .unwrap();

    // Creates the necessary type to send the file in bytes
    let new_post = FileSearch {
        file_name: file_name.to_string(),
        file_bytes: file,
    };

    // Serialize into a vector of bytes
    let send_file_body_req = serde_json::to_vec(&new_post).unwrap();

    // Make and send the request
    easy.post(true).unwrap();
    easy.post_field_size(send_file_body_req.len() as u64).unwrap();
    let mut transfer = easy.transfer();
    // Keep the slice outside the closure so successive calls continue where
    // the previous one stopped, instead of re-reading from the start.
    let mut body = send_file_body_req.as_slice();
    transfer
        .read_function(move |buf| Ok(body.read(buf).unwrap_or(0)))
        .unwrap();
    transfer.perform().unwrap();

    Ok(format!("okay sent!").into())
}
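For the original BSON question: the bson crate can serialize a document straight into a Vec<u8>, which is the buffer type the curl Easy API wants. A minimal sketch, assuming the bson 2.x crate (bson_body is a hypothetical helper name, not part of any API):

use bson::{doc, Document};

// Build the two-field document and serialize it to BSON bytes.
fn bson_body(file_name: &str, file_bytes: Vec<u8>) -> Vec<u8> {
    let doc: Document = doc! {
        "name": file_name,
        // Raw bytes must be wrapped in bson::Binary; a fixed-size array
        // like `buf` would otherwise serialize as a BSON array of integers.
        "file": bson::Binary {
            subtype: bson::spec::BinarySubtype::Generic,
            bytes: file_bytes,
        },
    };
    let mut body = Vec::new();
    doc.to_writer(&mut body).expect("failed to serialize BSON");
    body
}

The resulting Vec<u8> can then be handed to post_fields_copy or streamed through read_function exactly like the JSON body above.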
Related
I'm writing a program to calculate the md5sum of files. The core code is:
use md5;
use std::fs::File;
use std::io::Read;

fn func(file_path: &str) {
    let mut f = File::open(file_path).unwrap();
    let mut contents = Vec::<u8>::new();
    f.read_to_end(&mut contents).unwrap();
    let digest = md5::compute(contents.as_slice());
    println!("{:x}\t{}", digest, file_path);
}
This function works well with moderate-size files, but it raises a segmentation fault when calculating a large file, generating a core dump file of more than 22 GB. How can I do a partial read, calculate the md5sum of each chunk separately, and then gather them into a final result?
You can use a BufReader to read the file in chunks:
use std::fs::File;
use std::io::{BufRead, BufReader};

fn func(file_path: &str) {
    let f = File::open(file_path).unwrap();
    // Find the length of the file
    let len = f.metadata().unwrap().len();
    // Decide on a reasonable buffer size (1 MB in this case; the fastest will depend on hardware)
    let buf_len = len.min(1_000_000) as usize;
    let mut buf = BufReader::with_capacity(buf_len, f);
    let mut context = md5::Context::new();
    loop {
        // Get a chunk of the file
        let part = buf.fill_buf().unwrap();
        // If that chunk was empty, the reader has reached EOF
        if part.is_empty() {
            break;
        }
        // Add the chunk to the md5
        context.consume(part);
        // Tell the buffer that the chunk is consumed
        let part_len = part.len();
        buf.consume(part_len);
    }
    let digest = context.compute();
    println!("{:x}\t{}", digest, file_path);
}
I'm not sure if this is going to help, though. I would expect yours to produce a memory allocation error, so a segmentation fault is unexpected.
I made a loop for a webserver.
On a Windows client I didn't have any problems, but with a Linux client the server didn't respond to requests.
The problem: I found out that if request_size % buffer_size == 0, the loop runs once more, waiting for more data.
The question: is there an efficient way of reading data that takes into consideration slow connections and connections that drop packets? (Not just using non_blocking or nodelay.)
let listener = TcpListener::bind("127.0.0.1:80").unwrap();
while let Ok((mut stream, _)) = listener.accept() {
    let mut data: Vec<u8> = Vec::new();
    let mut buf = [0u8; 32];
    while let Ok(size) = stream.read(&mut buf) {
        data.extend(buf[..size].iter());
        if size != buf.len() { break; }
    }
    // do something with the data
}
I could increase the buffer size but that wouldn't solve the problem.
First, to detect EOF reliably, you should test the size returned by Read::read against zero, not against your buffer size: on a 'slow connection' you might not get enough data to fill the entire buffer at once, causing your loop to quit early with an incomplete message in data.
There are essentially 3 ways to make sure you received the entire message:
Read until EOF
Read a fixed-sized message
Encode some 'content length' and read that many bytes
Notice that only the last two variants allow your client to eventually send more data over the same stream. Also notice that these two variants can be implemented fairly easily via Read::read_exact.
Besides, if you don't trust your client, it might be helpful to set up TcpStream::set_read_timeout with a reasonably long timeout (e.g. 2 min).
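For illustration, here is a minimal sketch of the 'content length' variant built on Read::read_exact. The helper names and the choice of a big-endian u32 length prefix are mine, not something from the question:

use std::io::{Read, Write};
use std::net::TcpStream;

// Receiver side: read a u32 length prefix, then exactly that many bytes.
fn read_length_prefixed(stream: &mut TcpStream) -> std::io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut data = vec![0u8; len];
    stream.read_exact(&mut data)?;
    Ok(data)
}

// Sender side: write the length prefix followed by the payload.
fn write_length_prefixed(stream: &mut TcpStream, msg: &[u8]) -> std::io::Result<()> {
    stream.write_all(&(msg.len() as u32).to_be_bytes())?;
    stream.write_all(msg)
}

Because read_exact blocks until the requested number of bytes has arrived (or the connection fails), this copes with slow connections that deliver the message in many small pieces.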
Read until EOF
This is probably the easiest method and, according to your title and code, probably the one you are aiming for. However, to generate an EOF, the client must shut down at least its write channel. So, if your server is stuck in read, I assume you forgot to shut down your client (though I have to guess here).
On the server side, if you really want to read until EOF, you don't need to loop yourself; you can simply use the Read::read_to_end utility function. Here is an example for a client & server with the client sending a single message terminated by EOF:
use std::io::Read;
use std::io::Write;
use std::net::TcpListener;
use std::net::TcpStream;

// --- Client code
const SERVER_ADDR: &str = "localhost:1234";

pub fn client() {
    let mut socket = TcpStream::connect(SERVER_ADDR).expect("Failed to connect");
    // Send a 'single' message; the flushes kind of simulate a very slow connection
    for _ in 0..3 {
        socket.write_all(b"Hello").expect("Failed to send");
        socket.flush().unwrap();
    }
    // Instead of shutdown, you can also drop(socket), but then you can't read.
    socket.shutdown(std::net::Shutdown::Write).unwrap();
    // go reading, or whatever
}

// --- Server code
const SERVER_BIND: &str = "127.0.0.1:1234";

pub fn server() {
    let listener = TcpListener::bind(SERVER_BIND).expect("Failed to bind");
    while let Ok((stream, _)) = listener.accept() {
        let _ = handle_client(stream); // don't care if the client screwed up
    }
}

pub fn handle_client(mut socket: TcpStream) -> std::io::Result<()> {
    let mut data: Vec<u8> = Vec::new();
    // Read all bytes until EOF
    socket.read_to_end(&mut data)?;
    println!("Data: {:?}", data); // or whatever
    Ok(())
}
I have made this code to check for alive URLs in a text file. It was first written to check a single URL and the script worked, but then I wanted to make it multithreaded, and I got this error:
error
Here is the original code:
use hyper_tls::HttpsConnector;
use hyper::Client;
use tokio::io::BufReader;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let https = HttpsConnector::new();
    let url = std::env::args().nth(1).expect("no list given");
    let client = Client::builder().build::<_, hyper::Body>(https);
    let reader = BufReader::new(url);
    let lines = reader.lines();
    for l in lines {
        let sep = l.parse()?;
        // Await the response...
        let resp = client.get(sep).await?;
        if resp.status() == 200 {
            println!("{}", l);
        }
        if resp.status() == 301 {
            println!("{}", l);
        }
    }
    Ok(())
}
The issue seems to be that you are passing in the file's name, as opposed to its contents, to the BufReader.
In order to read the contents instead, you can use a tokio::fs::File.
Here's an example of reading a file and printing its lines to stdout using tokio and a BufReader:
use tokio::{
    fs::File,
    io::{
        // This trait needs to be imported, as the lines function being
        // used on the reader is defined there
        AsyncBufReadExt,
        BufReader,
    },
};

#[tokio::main]
async fn main() {
    // Get the file command line argument
    let file_argument = std::env::args().nth(1).expect("Please provide a file as command line argument.");
    // Open the file
    let file = File::open(file_argument).await.expect("Failed to open file");
    // Create a reader using the file
    let reader = BufReader::new(file);
    // Get an iterator over the lines
    let mut lines = reader.lines();
    // This has to be used instead of a for loop, since lines isn't a
    // normal iterator, but a Lines struct, the next element of which
    // can be obtained using the next_line function.
    while let Some(line) = lines.next_line().await.expect("Failed to read file") {
        // Print the current line
        println!("{}", line);
    }
}
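To tie this back to the original URL check, the line reader can be combined with the hyper client roughly like this. This is a sketch under the assumption that the question uses hyper 0.14 with hyper-tls, as its imports suggest:

use hyper::Client;
use hyper_tls::HttpsConnector;
use tokio::{fs::File, io::{AsyncBufReadExt, BufReader}};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    let https = HttpsConnector::new();
    let client = Client::builder().build::<_, hyper::Body>(https);
    // Open the URL list file instead of treating its name as the content
    let file_argument = std::env::args().nth(1).expect("no list given");
    let file = File::open(file_argument).await?;
    let mut lines = BufReader::new(file).lines();
    while let Some(line) = lines.next_line().await? {
        // Parse each line as a URI and request it
        let uri: hyper::Uri = line.parse()?;
        let resp = client.get(uri).await?;
        if resp.status() == 200 || resp.status() == 301 {
            println!("{}", line);
        }
    }
    Ok(())
}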
Is there a good way to handle the ownership of a file held within a struct in Rust? As a stripped-down example, consider:
// Buffered file IO
use std::io::{BufReader, BufRead};
use std::fs::File;

// Structure that contains a file
#[derive(Debug)]
struct Foo {
    file: BufReader<File>,
    data: Vec<f64>,
}

// Reads the file and strips the header
fn init_foo(fname: &str) -> Foo {
    // Open the file
    let mut file = BufReader::new(File::open(fname).unwrap());
    // Dump the header
    let mut header = String::new();
    let _ = file.read_line(&mut header);
    // Return our foo
    Foo { file: file, data: Vec::new() }
}

// Read the remaining foo data and process it
fn read_foo(mut foo: Foo) -> Foo {
    // Strip one more line
    let mut header_alt = String::new();
    let _ = foo.file.read_line(&mut header_alt);
    // Read in the rest of the file line by line
    let mut data = Vec::new();
    for (lineno, line) in foo.file.lines().enumerate() {
        // Strip the error
        let line = line.unwrap();
        // Print some diagnostic information
        println!("Line {}: val {}", lineno, line);
        // Save the element
        data.push(line.parse::<f64>().unwrap());
    }
    // Export foo
    Foo { data: data, ..foo }
}

fn main() {
    // Initialize our foo
    let foo = init_foo("foo.txt");
    // Read in our data
    let foo = read_foo(foo);
    // Print out some debugging info
    println!("{:?}", foo);
}
This currently gives the compilation error:
error[E0382]: use of moved value: `foo.file`
--> src/main.rs:48:5
|
35 | for (lineno,line) in foo.file.lines().enumerate() {
| -------- value moved here
...
48 | Foo { data : data, ..foo}
| ^^^^^^^^^^^^^^^^^^^^^^^^^ value used here after move
|
= note: move occurs because `foo.file` has type `std::io::BufReader<std::fs::File>`, which does not implement the `Copy` trait
error: aborting due to previous error
For more information about this error, try `rustc --explain E0382`.
error: Could not compile `rust_file_struct`.
To learn more, run the command again with --verbose.
And, to be sure, this makes sense. Here, lines() takes ownership of the buffered file, so we can't use the value in the return. What's confusing me is a better way to handle this situation. Certainly, after the for loop, the file is consumed, so it really can't be used. To better denote this, we could represent file as Option<BufReader<File>>. However, this causes some grief because the second read_line call, inside of read_foo, needs a mutable reference to file, and I'm not sure how to obtain one if it's wrapped inside of an Option. Is there a good way of handling the ownership?
To be clear, this is a stripped-down example. In the actual use case, there are several files as well as other data. I've structured things this way because it represents a configuration that comes from the command-line options. Some of the options are files, some are flags. In either case, I'd like to do some of the processing of the files, but not all of it, early on in order to throw the appropriate errors.
I think you're on track with using the Option within the Foo struct. Assuming the struct becomes:
struct Foo {
    file: Option<BufReader<File>>,
    data: Vec<f64>,
}
The following code is a possible solution:
// Reads the file and strips the header
fn init_foo(fname: &str) -> Foo {
    // Open the file
    let mut file = BufReader::new(File::open(fname).unwrap());
    // Dump the header
    let mut header = String::new();
    let _ = file.read_line(&mut header);
    // Return our foo
    Foo { file: Some(file), data: Vec::new() }
}

// Read the remaining foo data and process it
fn read_foo(foo: Foo) -> Option<Foo> {
    let mut file = foo.file?;
    // Strip one more line
    let mut header_alt = String::new();
    let _ = file.read_line(&mut header_alt);
    // Read in the rest of the file line by line
    let mut data = Vec::new();
    for (lineno, line) in file.lines().enumerate() {
        // Strip the error
        let line = line.unwrap();
        // Print some diagnostic information
        println!("Line {}: val {}", lineno, line);
        // Save the element
        data.push(line.parse::<f64>().unwrap());
    }
    // Export foo
    Some(Foo { data: data, file: None })
}
Note that in this case read_foo returns an optional Foo, since the file could be None.
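As an aside, for the part of the question about getting a mutable reference out of the Option without consuming it: Option::as_mut does exactly that. A minimal sketch against the same Foo (skip_line is a made-up helper name):

// Borrow the reader mutably while it stays inside the Option.
fn skip_line(foo: &mut Foo) {
    if let Some(file) = foo.file.as_mut() {
        let mut header_alt = String::new();
        let _ = file.read_line(&mut header_alt);
    }
}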
On a side note, IMO, unless you absolutely need the BufReader to be travelling along with the Foo, I would discard it. As you've already found, calling lines causes a move, which makes it difficult to retain within another struct. As a suggestion, you could make the file field simply a String so that you could always derive the BufReader and read the file when needed.
For example, here's a solution where a file name (i.e. a &str) can be turned into a Foo with all the line processing done just before the construction of the struct.
// Buffered file IO
use std::io::{BufReader, BufRead};
use std::fs::File;

// Structure that contains a file
#[derive(Debug)]
struct Foo {
    file: String,
    data: Vec<f64>,
}

trait IntoFoo {
    fn into_foo(self) -> Foo;
}

impl IntoFoo for &str {
    fn into_foo(self) -> Foo {
        // Open the file
        let mut file = BufReader::new(File::open(self).unwrap());
        // Dump the header
        let mut header = String::new();
        let _ = file.read_line(&mut header);
        // Strip one more line
        let mut header_alt = String::new();
        let _ = file.read_line(&mut header_alt);
        // Read in the rest of the file line by line
        let mut data = Vec::new();
        for (lineno, line) in file.lines().enumerate() {
            // Strip the error
            let line = line.unwrap();
            // Print some diagnostic information
            println!("Line {}: val {}", lineno, line);
            // Save the element
            data.push(line.parse::<f64>().unwrap());
        }
        Foo { file: self.to_string(), data }
    }
}

fn main() {
    // Read in our data from the file
    let foo = "foo.txt".into_foo();
    // Print out some debugging info
    println!("{:?}", foo);
}
In this case, there's no need to worry about the ownership of the BufReader because it's created, used, and discarded in the same function. Of course, I don't fully know your use case, so this may not be suitable for your implementation.
I read a file given as an argument, but when I try to pass it to handle_client in a task at the bottom, so that it can be written to the TCP stream when someone connects, I get error: capture of moved value: html... What am I missing?
fn get_file_string(path_str: &String) -> String {
    let path = Path::new(path_str.as_bytes());
    let file = File::open(&path);
    let mut reader = BufferedReader::new(file);
    reader.read_to_string().unwrap()
}

fn main() {
    let listener = TcpListener::bind("127.0.0.1:8001");
    let mut acceptor = listener.listen();
    let ref file_to_host = os::args()[1];
    let html = get_file_string(file_to_host).clone();

    fn handle_client(mut stream: TcpStream, html: String) {
        let write = stream.write_str(html.as_slice());
    }

    for stream in acceptor.incoming() {
        match stream {
            Err(e) => { println!("{}", e) }
            Ok(stream) => spawn(proc() {
                handle_client(stream, html)
            })
        }
    }
}
html is a String. Just like everything in Rust, a String is owned in exactly one place (if it satisfied Copy, it would be able to just duplicate it implicitly, but as it involves a heap allocation it's definitely not). At present, you're passing html to the handle_client function by value; therefore, when you call handle_client(stream, html), both stream and html are moved into that function and are no longer accessible. In the case of stream, that doesn't matter, as it's a variable from inside the loop, but html comes from outside the loop; if the compiler let you do it, the first time through the loop it would work fine, but then the String would be freed; the second time through, you would be passing an invalid String.
The solution in this case, seeing as you're passing it through spawn and so can't pass a reference (the slice, &str), is to clone the value so that the copy can be moved into the proc and into the handle_client call:
Ok(stream) => {
    let html = html.clone();
    spawn(proc() {
        handle_client(stream, html)
    })
}
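This answer is for pre-1.0 Rust, as the proc() syntax shows. For readers on current Rust, the same clone-before-spawn pattern looks roughly like the following sketch; the address and the HTML content are made up for illustration:

use std::io::Write;
use std::net::TcpListener;
use std::thread;

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:8001")?;
    let html = String::from("<html>hi</html>");
    for stream in listener.incoming() {
        match stream {
            Err(e) => println!("{}", e),
            Ok(mut stream) => {
                // Clone before the spawn so each thread owns its own copy
                let html = html.clone();
                thread::spawn(move || {
                    let _ = stream.write_all(html.as_bytes());
                });
            }
        }
    }
    Ok(())
}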