from a stream "FramedRead" how to "do something" in every chunk - rust

I would like to display the upload progress of a file using the crate indicatif. I am uploading the file asynchronously using reqwest with something like this:
use tokio::fs::File;
use tokio_util::codec::{BytesCodec, FramedRead};
use reqwest::Body;
let file = File::open(file_path).await?;
let stream = FramedRead::new(file, BytesCodec::new());
let body = Body::wrap_stream(stream);
client.put(url).body(body)
The progress bar is implemented like this:
use indicatif::ProgressBar;
let bar = ProgressBar::new(1000);
for _ in 0..1000 {
    bar.inc(1);
    // ...
}
bar.finish();
How, starting from the stream:
let stream = FramedRead::new(file, BytesCodec::new());
// how on every chunk do X ?
let body = Body::wrap_stream(stream);
could I call bar.inc(1) on every chunk?
From the docs I see that there is a read_buffer, but how can I iterate over it in a way that lets me call a custom function, or count the bytes sent so that I could display, for example, "bytes sent" so far?

You can use TryStreamExt::inspect_ok, for instance, which will call a closure with a reference to every Ok(item) in the stream when that item is consumed.
use futures::stream::TryStreamExt;
use tokio_util::codec::{BytesCodec, FramedRead};
let stream = FramedRead::new(file, BytesCodec::new())
    .inspect_ok(|chunk| {
        // do X with chunk...
    });
let body = Body::wrap_stream(stream);
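For the progress-bar use case specifically, here is a minimal sketch of how the pieces could be wired together (untested; it assumes the bar length is the file size in bytes, that client, url and file_path come from the surrounding code, and that reqwest's stream feature is enabled for Body::wrap_stream):
use futures::stream::TryStreamExt;
use indicatif::ProgressBar;
use reqwest::Body;
use tokio::fs::File;
use tokio_util::codec::{BytesCodec, FramedRead};

let file = File::open(file_path).await?;
let total_size = file.metadata().await?.len();
let bar = ProgressBar::new(total_size);

let bar_clone = bar.clone(); // ProgressBar is cheap to clone; clones share the same state
let stream = FramedRead::new(file, BytesCodec::new())
    .inspect_ok(move |chunk| {
        // count this chunk's bytes towards the progress bar
        bar_clone.inc(chunk.len() as u64);
    });

let body = Body::wrap_stream(stream);
client.put(url).body(body).send().await?;
bar.finish();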

Related

How to select a file as bytes or text in Rust WASM?

I am trying to get the Vec<u8> or String (or, more ideally, a Blob ObjectURL) of a file upload triggered by a button click.
I am guessing this will require an invisible <input> somewhere in the DOM, but I can't figure out how to leverage web_sys and/or gloo to get either the contents or a Blob ObjectURL.
A js-triggered input probably won't do the trick, as many browsers won't let you trigger a file input from JS, for good reasons. You can use labels to hide the input if you think it is ugly. Other than that, you need to wiggle yourself through the files API of HtmlInputElement. Pretty painful, that:
use js_sys::{Object, Reflect, Uint8Array};
use wasm_bindgen::{prelude::*, JsCast};
use wasm_bindgen_futures::JsFuture;
use web_sys::*;
#[wasm_bindgen(start)]
pub fn init() {
    // Just some setup for the example
    std::panic::set_hook(Box::new(console_error_panic_hook::hook));
    let window = window().unwrap();
    let document = window.document().unwrap();
    let body = document.body().unwrap();
    while let Some(child) = body.first_child() {
        body.remove_child(&child).unwrap();
    }
    // Create the actual input element
    let input = document
        .create_element("input")
        .expect_throw("Create input")
        .dyn_into::<HtmlInputElement>()
        .unwrap();
    input
        .set_attribute("type", "file")
        .expect_throw("Set input type file");
    let recv_file = {
        let input = input.clone();
        Closure::<dyn FnMut()>::wrap(Box::new(move || {
            let input = input.clone();
            wasm_bindgen_futures::spawn_local(async move {
                file_callback(input.files()).await;
            })
        }))
    };
    input
        .add_event_listener_with_callback("change", recv_file.as_ref().dyn_ref().unwrap())
        .expect_throw("Listen for file upload");
    recv_file.forget(); // TODO: this leaks. I forgot how to get around that.
    body.append_child(&input).unwrap();
}
async fn file_callback(files: Option<FileList>) {
    let files = match files {
        Some(files) => files,
        None => return,
    };
    for i in 0..files.length() {
        let file = match files.item(i) {
            Some(file) => file,
            None => continue,
        };
        console::log_2(&"File:".into(), &file.name().into());
        let reader = file
            .stream()
            .get_reader()
            .dyn_into::<ReadableStreamDefaultReader>()
            .expect_throw("Reader is reader");
        let mut data = Vec::new();
        loop {
            let chunk = JsFuture::from(reader.read())
                .await
                .expect_throw("Read")
                .dyn_into::<Object>()
                .unwrap();
            // ReadableStreamReadResult is somehow wrong. So go by hand. Might be a web-sys bug.
            let done = Reflect::get(&chunk, &"done".into()).expect_throw("Get done");
            if done.is_truthy() {
                break;
            }
            let chunk = Reflect::get(&chunk, &"value".into())
                .expect_throw("Get chunk")
                .dyn_into::<Uint8Array>()
                .expect_throw("bytes are bytes");
            let data_len = data.len();
            data.resize(data_len + chunk.length() as usize, 255);
            chunk.copy_to(&mut data[data_len..]);
        }
        console::log_2(
            &"Got data".into(),
            &String::from_utf8_lossy(&data).into_owned().into(),
        );
    }
}
(If you've got questions about the code, ask. But it's too much to explain it in detail.)
And additionally, the features you need on web-sys for this to work:
[dependencies.web-sys]
version = "0.3.60"
features = ["Window", "Navigator", "console", "Document", "HtmlInputElement", "Event", "EventTarget", "FileList", "File", "Blob", "ReadableStream", "ReadableStreamDefaultReader", "ReadableStreamReadResult"]
Thanks to Caesar I ended up with this code for use with dominator as the Dom crate.
pub fn upload_file_input(mimes: &str, mutable: Mutable<Vec<u8>>) -> Dom {
    input(|i| {
        i.class("file-input")
            .prop("type", "file")
            .prop("accept", mimes)
            .apply(|el| {
                let element: HtmlInputElement = el.__internal_element();
                let recv_file = {
                    let input = element.clone();
                    Closure::<dyn FnMut()>::wrap(Box::new(move || {
                        let input = input.clone();
                        let mutable = mutable.clone();
                        wasm_bindgen_futures::spawn_local(async move {
                            file_callback(input.files(), mutable.clone()).await;
                        })
                    }))
                };
                element
                    .add_event_listener_with_callback(
                        "change",
                        recv_file.as_ref().dyn_ref().unwrap(),
                    )
                    .expect("Listen for file upload");
                recv_file.forget();
                el
            })
    })
}
async fn file_callback(files: Option<FileList>, mutable: Mutable<Vec<u8>>) {
    let files = match files {
        Some(files) => files,
        None => return,
    };
    for i in 0..files.length() {
        let file = match files.item(i) {
            Some(file) => file,
            None => continue,
        };
        // gloo::console::console_dbg!("File:", &file.name());
        let reader = file
            .stream()
            .get_reader()
            .dyn_into::<ReadableStreamDefaultReader>()
            .expect("Reader is reader");
        let mut data = Vec::new();
        loop {
            let chunk = JsFuture::from(reader.read())
                .await
                .expect("Read")
                .dyn_into::<Object>()
                .unwrap();
            // ReadableStreamReadResult is somehow wrong. So go by hand. Might be a web-sys bug.
            let done = Reflect::get(&chunk, &"done".into()).expect("Get done");
            if done.is_truthy() {
                break;
            }
            let chunk = Reflect::get(&chunk, &"value".into())
                .expect("Get chunk")
                .dyn_into::<Uint8Array>()
                .expect("bytes are bytes");
            let data_len = data.len();
            data.resize(data_len + chunk.length() as usize, 255);
            chunk.copy_to(&mut data[data_len..]);
        }
        mutable.set(data);
        // gloo::console::console_dbg!(
        //     "Got data",
        //     &String::from_utf8_lossy(&data).into_owned(),
        // );
    }
}
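A hypothetical usage sketch (assuming Mutable here is futures_signals::signal::Mutable, which dominator builds on, and that the MIME filter string is just an example):
let file_bytes: Mutable<Vec<u8>> = Mutable::new(Vec::new());
let dom = upload_file_input("image/png, image/jpeg", file_bytes.clone());
// file_bytes.signal_cloned() can then drive whatever part of the app needs the uploaded bytes.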

Is it possible to get both the text and the JSON of a response from reqwest

From the reqwest docs, you can get the deserialized JSON or the body text from a response.
What I can't see is how to get them both. My requirement is that I want the decoded JSON for use in the code, but I also want to print out the text for debugging. Unfortunately, attempting to get both gives an error about use of a moved value, since both of these functions take ownership of the response. It doesn't seem possible to clone the response either.
This is an example of something I'd like to be able to do but line 4 is invalid since it uses response which was moved on line 1.
let posts: Vec<Post> = match response.json::<PostList>().await {
    Ok(post_list) => post_list.posts,
    Err(e) => {
        let text = response.text().await.unwrap();
        println!("Error fetching posts: {}, {}", e, text);
        Vec::new()
    }
};
The reason both json() and text() cannot be called on the same response is that both of these methods have to read the whole response stream, and this can only be done once.
Your best option here is to first read it into a String and then parse JSON from that string:
let response_text = response.text().await.unwrap();
let posts: Vec<Post> = match serde_json::from_str::<PostList>(&response_text) {
    ...
};
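Putting the two together, a sketch that reuses the question's own match arms (assuming serde_json is a dependency and PostList/Post derive Deserialize):
let response_text = response.text().await.unwrap();
let posts: Vec<Post> = match serde_json::from_str::<PostList>(&response_text) {
    Ok(post_list) => post_list.posts,
    Err(e) => {
        // the body text is still available here for debugging
        println!("Error fetching posts: {}, {}", e, response_text);
        Vec::new()
    }
};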

How do I get URL parameters?

Something is wrong with the code below (I'm absolutely new to Rust...):
let function = move |req: &mut Request| -> IronResult<Response> {
    let router = req.extensions.get::<Router>().expect("Unable to get router");
    println!("router:{:?}", router);
    let val = router.find("param").expect("param is required");
    ...
}
...
router.get("/page", function, "handler");
...
When executed (site/page?param=0), I got the traces below. It looks like 'router' is empty. What is wrong?
router:Params { map: {} }
thread '<unnamed>' panicked at 'param is required', src/main.rs:xx:xx
So, I decided to use params::{Params} and almost reached my goal, but...
let function = move |req: &mut Request| -> IronResult<Response> {
    use params::{Params};
    use crate::iron::Plugin;
    let map = req.get_ref::<Params>().unwrap();
    let val1 = map.find(&["param1"]).expect("param1 is required");
    let val2 = map.find(&["param2"]).expect("param2 is required");
    let cmd = format!("cmd={:?}{:?}\n", val1, val2);
    println!("{}", cmd);
    ...
};
I get cmd="1600","100" instead of the wanted cmd=1600,100. I can't use format!("{}") due to a compilation error:
error[E0277]: `params::Value` doesn't implement `std::fmt::Display`
`params::Value` cannot be formatted with the default formatter
I tried different string/number conversions but failed, since val1 and val2 are not strings but params::Value. I also tried removing the quotes from the cmd string but failed. Any idea of a simple solution?
The params of the Router are those specified in the URL pattern, e.g. if you define a route GET /:query then you'll get a query param storing the corresponding path segment.
The querystring is accessed via Request::url.
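For example, a sketch of the path-parameter variant, following the router crate's documented pattern (untested; handler body shortened):
let function = move |req: &mut Request| -> IronResult<Response> {
    let router = req.extensions.get::<Router>().expect("Unable to get router");
    // for a request to /page/1600 this yields "1600" as a &str
    let val = router.find("param").expect("param is required");
    Ok(Response::with((iron::status::Ok, format!("param={}", val))))
};
// register the route with a path segment parameter instead of a query string
router.get("/page/:param", function, "handler");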

How to speed up writing points to InfluxDB?

I'm writing points (60k per batch) to InfluxDB.
Point contains time (i64) and value (f64).
It looks like this:
use influx_db_client::{
    Client, Point, Points, Value, Precision
};
use tokio;
fn main() {
    let client = Client::new(Url::parse("http://localhost:8086").expect("Cannot parse url"), "test");
    let data: Vec<Point> = get_data();
    tokio::runtime::Runtime::new().unwrap().block_on(async move {
        for chunk in data.chunks(60000) {
            let points = Points::create_new(chunk.to_vec());
            client.write_points(points, Some(Precision::Milliseconds), None).await.expect("Cannot write points");
        }
    });
}
It works and takes 6 seconds, but I think it could be faster.
I tried to use a threadpool, but it doesn't work because I can't create a lot of clients.
use influx_db_client::{
    Client, Point, Points, Value, Precision
};
use tokio;
use threadpool::ThreadPool;
fn main() {
    let n_workers = 4;
    let data: Vec<Point> = get_data();
    let pool = ThreadPool::new(n_workers);
    for chunk in data.chunks(60000) {
        let points = Points::create_new(chunk.to_vec());
        pool.execute(move || {
            tokio::runtime::Runtime::new().unwrap().block_on(async move {
                let client = Client::new(Url::parse("http://localhost:8086").expect("Cannot parse url"), "test");
                client.write_points(points, Some(Precision::Milliseconds), None).await.expect("Cannot write points");
            });
        });
    }
}
So I don't know how to make it faster.
Can I use asynchronous execution?
Perhaps you know some optimization tricks?
data contains 1'400'000 points (they are created quickly, in about 1.3 s: get_data() takes 0.627 s, and the chunk loop without calling client.write_points takes 0.678 s).
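One possible direction, sketched here without testing and assuming influx_db_client's write_points borrows the client (so a single client can be shared), is to keep one runtime and one client and overlap the batch writes with futures' buffer_unordered:
use futures::{stream, StreamExt};
use influx_db_client::{Client, Point, Points, Precision};

fn main() {
    let client = Client::new(Url::parse("http://localhost:8086").expect("Cannot parse url"), "test");
    let data: Vec<Point> = get_data();
    tokio::runtime::Runtime::new().unwrap().block_on(async move {
        let client = &client;
        stream::iter(data.chunks(60000))
            .map(|chunk| async move {
                // one write per batch, all sharing the same client by reference
                let points = Points::create_new(chunk.to_vec());
                client.write_points(points, Some(Precision::Milliseconds), None).await
            })
            .buffer_unordered(4) // number of writes in flight; tune as needed
            .for_each(|res| async move {
                res.expect("Cannot write points");
            })
            .await;
    });
}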

Fastest way to send many groups of HTTP requests using new async/await syntax and control the amount of workers

Most recent threads I have read say async is the better way to perform lots of I/O-bound work such as sending HTTP requests and the like. I have tried to pick up async recently but am struggling to understand how to send many groups of requests in parallel, for example:
let client = reqwest::Client::new();
let mut requests = 0;
let get = client.get("https://somesite.com").send().await?;
let response = get.text().await?;
if response.contains("some stuff") {
    let get = client.get("https://somesite.com/something").send().await?;
    let response = get.text().await?;
    if response.contains("some new stuff") {
        requests += 1;
        println!("Got response {}", requests)
    }
}
I understand it is similar to this question, but mine is strictly about the nightly Rust async/await syntax and a more specific use case where groups of requests/tasks need to be done. I also find using combinators for these situations a bit confusing, and was hoping the newer style would help make it a bit more readable.
Not sure if this is the fastest way, as I am just experimenting myself, but here is my solution:
let client = reqwest::Client::new();
let links = vec![ // A vec of strings representing links
    "example.net/a".to_owned(),
    "example.net/b".to_owned(),
    "example.net/c".to_owned(),
    "example.net/d".to_owned(),
];
let ref_client = &client; // Need this to prevent client from being moved into the first map
futures::stream::iter(links)
    .map(async move |link: String| {
        let res = ref_client.get(&link).send().await;
        // res.map(|res| res.text().await.unwrap().to_vec())
        match res { // This is where I would usually use `map`, but not sure how to await for a future inside a result
            Ok(res) => Ok(res.text().await.unwrap()),
            Err(err) => Err(err),
        }
    })
    .buffer_unordered(10) // Number of connections at the same time
    .filter_map(|c| future::ready(c.ok())) // Throw errors out, do your own error handling here
    .filter_map(|item| {
        if item.contains("abc") {
            future::ready(Some(item))
        } else {
            future::ready(None)
        }
    })
    .map(async move |sec_link| {
        let res = ref_client.get(&sec_link).send().await;
        match res {
            Ok(res) => Ok(res.text().await.unwrap()),
            Err(err) => Err(err),
        }
    })
    .buffer_unordered(10) // Number of connections for the secondary requests (so max 20 connections concurrently)
    .filter_map(|c| future::ready(c.ok()))
    .for_each(|item| {
        println!("File received: {}", item);
        future::ready(())
    })
    .await;
This requires the #![feature(async_closure)] feature.
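As a side note, the nightly feature is only needed for the async move |...| closure syntax; on stable Rust the same stage can be written as an ordinary closure returning an async block, e.g. for the first map (same setup as above):
.map(move |link: String| async move {
    // ordinary closure returning an async block: no nightly feature needed
    match ref_client.get(&link).send().await {
        Ok(res) => Ok(res.text().await.unwrap()),
        Err(err) => Err(err),
    }
})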
