I'm iterating over several gigabytes of input items from a database. On each input item, I'm doing some CPU-intensive processing which produces one or more new output items, tens of gigabytes in total. The output items are then stored in another database table.
I have gotten a nice speedup by using Rayon for parallel processing. However, the database API is not thread-safe; it's Send but not Sync, so the I/O must be serialized.
Ideally, I would just want to write:
input_database
    .read_items()
    .par_bridge() // Start parallelism.
    .flat_map_iter(|input_item| {
        // produce an Iterator<Item = OutputItem>
    })
    .ser_bridge() // End parallelism. This function does not exist.
    .for_each(|output_item| {
        output_database.write_item(output_item);
    });
Basically I want the opposite of par_bridge(); something that runs on the thread where it's called, reads items from each thread, and produces them serially. But in the current implementation of Rayon, this doesn't seem to exist. I'm not sure whether this is because it's theoretically impossible, or whether it doesn't fit into the current design of the library.
The output is too big to collect it all into a Vec first; it needs to be streamed into the database directly.
By the way, I'm not married to Rayon; if there's another crate that is more suitable, I'm happy to make the switch.
You can wrap your output database in an Arc<Mutex> to prevent parallel accesses:
let output_database = Arc::new(Mutex::new(output_database));
input_database
    .read_items()
    .par_bridge() // Start parallelism.
    .flat_map_iter(|input_item| {
        // produce an Iterator<Item = OutputItem>
    })
    .for_each_with(output_database, |output_database, output_item| {
        output_database.lock().unwrap().write_item(output_item);
    });
I assume the order doesn't matter, i.e. the output items don't need to be written in any particular order.
You could use an mpsc::channel to transfer your data from the for_each closure to your database API, e.g.
use std::sync::mpsc;

let (tx, rx) = mpsc::channel();

input_database
    .read_items()
    .par_bridge() // Start parallelism.
    .flat_map_iter(|input_item| {
        // produce an Iterator<Item = OutputItem>
    })
    .for_each(move |output_item| {
        tx.send(output_item).unwrap();
    });
and in a second thread you can use the rx end to receive the data and write it to the database, as sketched below.
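A minimal sketch of that receiving side (the thread spawn and channel shutdown are my assumptions; output_database and write_item are from the question):

use std::sync::mpsc;
use std::thread;

// `output_database` is the non-Sync handle from the question; it is moved into the writer thread.
let (tx, rx) = mpsc::channel();

let writer = thread::spawn(move || {
    // Drains the channel serially on this single thread.
    for output_item in rx {
        output_database.write_item(output_item);
    }
});

// ... run the Rayon pipeline from above here; it takes `tx` by move, so once
// for_each returns every Sender is gone and the writer's loop ends.

writer.join().unwrap();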
I am a JavaScript developer and I am building a desktop application using Tauri. I am used to single-threaded languages and trying to get comfortable with the concept of concurrency.
My problem can be summarized as follows. I receive a JSON array from the backend of length 500. I loop through this array and perform some asynchronous operations like making a network call. In the end, I aggregate, structure, and return the data. This entire process takes around 25-35 seconds on my machine.
I wanted to leverage concurrency to reduce the time required for this operation. One possible solution I thought of was to create n threads, say 8, and process the data in parallel.
async fn main() {
    // creating variables which will hold final structured data
    let app_data;
    let app_to_bundle_map;

    // create 8 threads
    let thread_1 = thread::spawn(|| {});
    let thread_2 = thread::spawn(|| {});
    // .. and so on

    for i in 1..500 {
        let thread_assigned = i % 8;
        match thread_assigned {
            1 => {
                // process_data() on thread 1 and insert into app_data & app_to_bundle_map.
                // But how do I assign process_data() to the thread's closure?
                // How do I make sure thread 1 is available for use?
            }
            2 => {
                // process_data() on thread 2 and insert into app_data & app_to_bundle_map
            }
            _ => {
                // process_data() on thread _ and insert into app_data & app_to_bundle_map
            }
        }
    }
}

fn process_data(item: String, data: &Data) {
    // perform some heavy operations like
    make_network_call(item.url);
    // perform more operations and modify the function argument
    more_processing();
}

async fn make_a_network_call(url: String) -> String {
    let client = reqwest::Client::builder().build().unwrap();
    let _res: Result<Response, Error> = client.get(url).send().await;
    match _res {
        Ok(_res) => {
            // return response
        }
        Err(_res) => {
            format!(r#"{{"error": "{placeholder}"}}"#, placeholder = _res.to_string())
        }
    }
}
Another option I thought of was dividing my data of size 500 into 8 parts and then processing them in parallel. Is this a better approach? Or are both approaches wrong? If so, what do you suggest is the correct way to solve such problems in Rust? Overall, my final goal is to reduce the time from 25-35 seconds to less than 10 seconds. Looking forward to everybody's insights. Thank you in advance.
Concurrency is hard [citation needed]. What you are trying to do here is handle it manually. That could of course work, but it will be a pain, especially if you are a beginner. Luckily there are fantastic libraries out there that handle the concurrency part for you; you should check whether they already provide what you need. From your description I am not quite sure whether you are CPU-bound or I/O-bound.
If you are CPU-bound you should look at the rayon crate. It lets you easily iterate in parallel over some iterator.
If you are I/O-bound you should look at async Rust. There are many libraries that do many things, but I would recommend tokio to begin with. It is production-ready and puts great emphasis on networking. You would, however, need to learn a bit about async Rust, as it requires a different mental model than normal synchronous code.
And regardless of which one you choose, you should familiarize yourself with channels. They are a great and easy tool for passing data around, including from one thread to another.
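For the I/O-bound case, a rough sketch of what this could look like with tokio and a channel (fetch_one, the item type, and the limit of 8 are placeholders, and Tokio's full feature set is assumed):

use std::sync::Arc;
use tokio::sync::{mpsc, Semaphore};

// Stand-in for the real per-item work (network call + processing).
async fn fetch_one(item: String) -> String {
    item
}

#[tokio::main]
async fn main() {
    let items: Vec<String> = (0..500).map(|i| format!("item-{i}")).collect();
    let limit = Arc::new(Semaphore::new(8)); // at most 8 requests in flight
    let (tx, mut rx) = mpsc::unbounded_channel();

    for item in items {
        // Wait for a free slot before spawning the next task.
        let permit = limit.clone().acquire_owned().await.unwrap();
        let tx = tx.clone();
        tokio::spawn(async move {
            let result = fetch_one(item).await;
            drop(permit); // free the slot for the next task
            let _ = tx.send(result); // pass the result back over the channel
        });
    }
    drop(tx); // close the channel so the receive loop below can end

    let mut results = Vec::new();
    while let Some(r) = rx.recv().await {
        results.push(r);
    }
    println!("processed {} items", results.len());
}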
I'm new to Rust.
For learning purposes, I'm writing a simple program to search for files in Linux, and it uses a recursive function:
fn ffinder(base_dir: String, prmtr: &'static str, e: bool, h: bool) -> std::io::Result<()> {
    let mut handle_vec = vec![];
    let pth = std::fs::read_dir(&base_dir)?;
    for p in pth {
        let p2 = p?.path().clone();
        if p2.is_dir() {
            if !h { // search doesn't include hidden directories
                let sstring: String = get_fname(p2.display().to_string());
                let slice: String = sstring[..1].to_string();
                if slice != "." {
                    let handle = thread::spawn(move || {
                        ffinder(p2.display().to_string(), prmtr, e, h);
                    });
                    handle_vec.push(handle);
                }
            } else { // search includes hidden directories
                let handle2 = thread::spawn(move || {
                    ffinder(p2.display().to_string(), prmtr, e, h);
                });
                handle_vec.push(handle2);
            }
        } else {
            let handle3 = thread::spawn(move || {
                if compare(rmv_underline(get_fname(p2.display().to_string())), rmv_underline(prmtr.to_string()), e) {
                    println!("File found at: {}", p2.display().to_string().blue());
                }
            });
            handle_vec.push(handle3);
        }
    }
    for h in handle_vec {
        h.join().unwrap();
    }
    Ok(())
}
I've tried to use multithreading (thread::spawn), however it can create too many threads, exceeding the OS limit, which breaks the program execution.
Is there a way to combine multithreading with recursion using a safe, limited (fixed) number of threads?
As one of the commenters mentioned, this is an absolutely perfect case for using Rayon. The blog post mentioned doesn't show how Rayon might be used in recursion, only making an allusion to crossbeam's scoped threads with a broken link. However, Rayon provides its own scoped threads implementation that solves your problem as well, in that it only uses as many threads as you have cores available, avoiding the error you ran into.
Here's the documentation for it:
https://docs.rs/rayon/1.0.1/rayon/fn.scope.html
Here's an example from some code I recently wrote. Basically what it does is recursively scan a folder, and each time it nests into a folder it creates a new job to scan that folder while the current thread continues. In my own tests it vastly outperforms a single threaded approach.
use std::fs::{self, DirEntry};
use std::path::{Path, PathBuf};
use std::sync::mpsc::{self, Sender};

use rayon::Scope;

let source = PathBuf::from("/foo/bar/");
let (tx, rx) = mpsc::channel();
rayon::scope(|s| scan(&source, tx, s));

fn scan<'a, U: AsRef<Path>>(
    src: &U,
    tx: Sender<(Result<DirEntry, std::io::Error>, u64)>,
    scope: &Scope<'a>,
) {
    let dir = fs::read_dir(src).unwrap();
    dir.into_iter().for_each(|entry| {
        let info = entry.as_ref().unwrap();
        let path = info.path();
        if path.is_dir() {
            let tx = tx.clone();
            scope.spawn(move |s| scan(&path, tx, s)) // Recursive call here
        } else {
            // dbg!("{}", path.as_os_str().to_string_lossy());
            let size = info.metadata().unwrap().len();
            tx.send((entry, size)).unwrap();
        }
    });
}
I'm not an expert on Rayon, but I'm fairly certain the threading strategy works like this:
Rayon creates a pool of threads to match the number of logical cores you have available in your environment. The first call to the scoped function creates a job that the first available thread "steals" from the queue of jobs available. Each time we make another recursive call, it doesn't necessarily execute immediately, but it creates a new job that an idle thread can then "steal" from the queue. If all of the threads are busy, the job queue just fills up each time we make another recursive call, and each time a thread finishes its current job it steals another job from the queue.
The full code can be found here: https://github.com/1Dragoon/fcp
(Note that the repo is a work in progress and the code there is frequently broken, so it probably won't work at the time you're reading this.)
As a caveat to the reader: I'm more of a sysadmin than an actual developer, so I also don't know if this is the ideal approach. From Rayon's documentation linked earlier:
scope() is a more flexible building block compared to join(), since a loop can be used to spawn any number of tasks without recursing
The language of that is a bit confusing. I'm not sure what they mean by "without recursing". join seems to intend for you to already know your tasks ahead of time and to execute them in parallel as threads become available, whereas scope seems more aimed at creating jobs only when you need them, rather than having to know everything you need to do in advance. Either that or I'm understanding their meaning backwards.
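For contrast, a minimal sketch of join()-style recursion (my own illustration, not from the linked docs): each call splits the work in two and hands both halves to rayon::join, which may run them in parallel on the same fixed-size pool.

fn par_sum(slice: &[u64]) -> u64 {
    if slice.len() < 1024 {
        return slice.iter().sum(); // small enough: just do it sequentially
    }
    let (left, right) = slice.split_at(slice.len() / 2);
    // join() runs exactly two closures, potentially in parallel.
    let (a, b) = rayon::join(|| par_sum(left), || par_sum(right));
    a + b
}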
I got really stuck at this very simple problem in Rust:
Ideally using a single-core CPU, I want to be able to read parts of the same file in a non-blocking way:
struct PrefetchOp {
}

impl PrefetchOp {
    async fn start(&self, current_data: [u8; 8]) {
        let mut next = self.prefetch(); // this reads data from disk and puts it in memory
        self.perform_slow_step(current_data);
        for _x in 0..100 {
            let future = self.prefetch();
            self.perform_slow_step(next.await);
            next = future;
        }
    }
}
Now, I want to write a prefetch that reads data asynchronously, such that hopefully it will be ready by the next slow step, or that line will wait for it to finish.
Now, prefetch reads parts of the same file, and my plan was to either:
Implement a future that creates the file descriptor, performs a poll_seek and performs poll_read
Or, create a file descriptor before the loop and have a future that does the poll_seek and the poll_read. Will I not have concurrency issues then?
How can I do this in the simplest, most elegant way?
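A minimal sketch of the second option, assuming Tokio's default multi-threaded runtime (the file name, CHUNK size, and the prefetch/perform_slow_step helpers are all illustrative, and the file is assumed long enough for every chunk):

use std::io::SeekFrom;
use tokio::fs::File;
use tokio::io::{AsyncReadExt, AsyncSeekExt};

const CHUNK: usize = 64 * 1024; // illustrative chunk size

// Open a fresh handle per read so concurrent reads don't fight over one seek position.
async fn prefetch(path: &'static str, offset: u64) -> std::io::Result<Vec<u8>> {
    let mut file = File::open(path).await?;
    file.seek(SeekFrom::Start(offset)).await?;
    let mut buf = vec![0u8; CHUNK];
    file.read_exact(&mut buf).await?;
    Ok(buf)
}

fn perform_slow_step(_data: &[u8]) {
    // CPU-heavy work stands in here
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let path = "data.bin";
    // Spawn the read for chunk 0, then always keep the next read in flight
    // while the slow step runs on the previous chunk.
    let mut next = tokio::spawn(prefetch(path, 0));
    for i in 1..100u64 {
        let upcoming = tokio::spawn(prefetch(path, i * CHUNK as u64));
        let data = next.await.expect("prefetch task panicked")?;
        perform_slow_step(&data);
        next = upcoming;
    }
    let data = next.await.expect("prefetch task panicked")?;
    perform_slow_step(&data);
    Ok(())
}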
I have a large vector of Hyper HTTP request futures and want to resolve them into a vector of results. Since there is a limit of maximum open files, I want to limit concurrency to N futures.
I've experimented with Stream::buffer_unordered, but it seems like it executes futures one by one.
We've used code like this in a project to avoid opening too many TCP sockets. These futures have Hyper futures within, so it seems exactly the same case.
// Convert the iterator into a `Stream`. We will process
// `PARALLELISM` futures at the same time, but with no specified
// order.
let all_done =
    futures::stream::iter(iterator_of_futures.map(Ok))
        .buffer_unordered(PARALLELISM);

// Everything after here is just using the stream in
// some manner, not directly related

let mut successes = Vec::with_capacity(LIMIT);
let mut failures = Vec::with_capacity(LIMIT);

// Pull values off the stream, dividing them into success and
// failure buckets.
let mut all_done = all_done.into_future();
loop {
    match core.run(all_done) {
        Ok((None, _)) => break,
        Ok((Some(v), next_all_done)) => {
            successes.push(v);
            all_done = next_all_done.into_future();
        }
        Err((v, next_all_done)) => {
            failures.push(v);
            all_done = next_all_done.into_future();
        }
    }
}
This is used in a piece of example code, so the event loop (core) is explicitly driven. Watching the number of file handles used by the program showed that it was capped. Additionally, before this bottleneck was added, we quickly ran out of allowable file handles, whereas afterward we did not.
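For reference, the same idea with today's futures crate and async/await looks roughly like this (a sketch; requests and PARALLELISM are placeholders, and it has to run inside an async context):

use futures::stream::{self, StreamExt};

// `requests` is the vector of request futures; PARALLELISM caps how many run at once.
let results: Vec<_> = stream::iter(requests)
    .buffer_unordered(PARALLELISM)
    .collect()
    .await;

// Split into success and failure buckets, like the loop above.
let (successes, failures): (Vec<_>, Vec<_>) =
    results.into_iter().partition(|r| r.is_ok());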
I want to implement a simple server, used by 3 different modules of my project.
These modules will send data to the server, which will save it into a file and merge this information when the modules finish their job.
All this information has a timestamp (a float) and a label (a float or a string).
This is my data structure to save these informations:
pub struct Data {
    file_name: String,
    logs: Vec<(f32, String)>,
    measures: Vec<(f32, f32)>,
    statements: Vec<(f32, String)>,
}
I use sockets to interact with the server.
I also use Arc to share a Data struct between each of these modules.
So, when I handle the client, I verify whether the message sent by the module is correct, and if it is I call a new function that processes and saves the message in the right data structure field (logs, measures or statements).
// Current IP address
let ip_addr: &str = &format!("{}:{}", &ip, port);

// Bind the current IP address
let listener = match TcpListener::bind(ip_addr) {
    Ok(listener) => listener,
    Err(error) => panic!("Cannot bind {}, due to error {}", ip_addr, error),
};

let global_data_struct = Data::new(DEFAULT_FILE.to_string());
let global_data_struct_shared = Arc::new(global_data_struct);

// Get and process streams
for stream in listener.incoming() {
    let mut global_data_struct_shared_clone = global_data_struct_shared.clone();
    thread::spawn(move || {
        // Borrow stream
        let stream = stream;
        match stream {
            // Get the stream value
            Ok(mut stream_v) => {
                let current_ip = stream_v.peer_addr().unwrap().ip();
                let current_port = stream_v.peer_addr().unwrap().port();
                println!("Connected with peer {}:{}", current_ip, current_port);

                // PROBLEM IN handle_client!
                // A get_mut from global_data_struct_shared_clone
                // returns to me None, not a value - so I
                // can't access global_data_struct_shared_clone
                // fields :'(
                handle_client(&mut stream_v, &mut global_data_struct_shared_clone);
            },
            Err(_) => error!("Cannot decode stream"),
        }
    });
}

// Stop listening
drop(listener);
I have some problems getting a mutable reference in handle_client to process fields in global_data_struct_shared_clone, because Arc::get_mut(&mut global_data_struct_shared_clone) returns None - due to the global_data_struct_shared.clone() for each incoming request.
Can someone help me to manage this structure correctly between these 3 modules, please?
The insight of Rust is that memory safety is achieved by enforcing Aliasing XOR Mutability.
Enforcing this single principle prevents whole classes of bugs: pointer/iterator invalidation (which was the goal) and also data races.
As much as possible, Rust will try to enforce this principle at compile-time; however it can also enforce it at run-time if the user opts in by using dedicated types/methods.
Arc::get_mut is such a method. An Arc (Atomically Reference Counted pointer) is specifically meant to share a reference between multiple owners, which means aliasing, and as a result it disallows mutability by default; Arc::get_mut performs a run-time check: if the pointer is actually not aliased (count of 1), then it allows mutability.
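A tiny illustration of that run-time check:

use std::sync::Arc;

let mut a = Arc::new(5);
assert!(Arc::get_mut(&mut a).is_some()); // unique owner: mutation allowed
let _b = Arc::clone(&a);
assert!(Arc::get_mut(&mut a).is_none()); // aliased: mutation refused at run time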
However, as you realized, this is not suitable in your case since the Arc is aliased at that point in time.
So you need to turn to other types.
The simplest solution is Arc<Mutex<...>>: Arc allows sharing, Mutex allows controlled mutability; together you can share with run-time controlled mutability enforced by the Mutex.
This is coarse-grained, but might very well be sufficient.
More sophisticated approaches can use RwLock (Reader-Writer lock), more granular Mutexes, or even atomics; but I would advise starting with a single Mutex and seeing how it goes. You have to walk before you run.
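A minimal sketch of the Mutex approach applied to the code from the question (the message parsing and the pushed values are placeholders; handle_client's new signature is my assumption):

use std::net::TcpStream;
use std::sync::{Arc, Mutex};

// Wrap the shared state once, then clone the Arc for each connection as before.
let global_data_struct_shared = Arc::new(Mutex::new(Data::new(DEFAULT_FILE.to_string())));

// handle_client now takes the shared, lockable Data.
fn handle_client(stream: &mut TcpStream, data: &Arc<Mutex<Data>>) {
    // ... read and validate the module's message from `stream` ...
    let timestamp = 0.0; // placeholder values
    let label = String::from("started");
    // Lock only for the mutation; the guard unlocks when it goes out of scope.
    data.lock().unwrap().logs.push((timestamp, label));
}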