How to bulk update MongoDB in Rust by inserting the full doc if an entry doesn't exist and updating select fields if it exists?

In short, I am looking for the most efficient way, with the fewest calls, to execute this pseudocode logic:
match find(doc) {
Some(x) => x.update(select_fields),
None=>collection.insert(all_fields)
}
but in bulk, for the entire local DB, without iterating one by one. Is there such a method? What's the most minimal one currently available?
My use case:
I have a HashMap<T,MyStruct>. I've packed both key and value into the doc!{}. Is that okay?
For some reason I was getting the error "the trait From<u64> is not implemented for Bson" for key3, so I changed my code to use f64:
let dmp_op = my_database.lock().unwrap().clone();
let mut dmp_db = vec![];
for (k,v) in dmp_op{
dmp_db.push(doc! { "key": value, "key2": value2, "key3": value3 as f64 },
)
};
match collection.insert_many(dmp_db, None).await{
Ok(x)=>x,
Err(x)=>{
println!("{:?}",x);
continue
}
};
This part works, but it's not repeatable. Instead of doing this, I'd love to execute the aforementioned logic in the most efficient way from scratch.
I can't find any information on whether the single-document methods I used, such as find_one_and_update() with upsert, can be applied in bulk.
PS: On second thought... maybe my infra logic is flawed? I'm just starting with MongoDB. Which is preferable:
Inserting/updating one by one from inside the worker threads directly into MongoDB, instead of into a local HashMap?
Creating a separate thread that from time to time writes the local HashMap into MongoDB and clears it to keep resource usage low?
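On the From<u64> error mentioned above: BSON has no unsigned 64-bit integer type, which is why that conversion is missing, and casting to f64 silently loses precision for values above 2^53. Below is a minimal std-only sketch of a checked conversion to i64 (a type Bson does accept), assuming the values are expected to fit in the signed range. The helper name to_bson_int is hypothetical.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: convert a u64 to the i64 that BSON can store,
/// returning None when the value does not fit instead of rounding it.
fn to_bson_int(value: u64) -> Option<i64> {
    i64::try_from(value).ok()
}

fn main() {
    // 2^53 + 1 is the first integer an f64 cannot represent exactly.
    let big: u64 = (1u64 << 53) + 1;

    // The `as f64` cast rounds the value, so the round trip does not match.
    let via_float = big as f64 as u64;
    assert_ne!(via_float, big);

    // The checked conversion keeps the exact value...
    assert_eq!(to_bson_int(big), Some(9_007_199_254_740_993));
    // ...and reports values that are out of range for i64.
    assert_eq!(to_bson_int(u64::MAX), None);

    println!("f64 round trip: {}, checked: {:?}", via_float, to_bson_int(big));
}
```

The resulting i64 can then go into doc! in place of the `as f64` cast; what to do on None (skip, clamp, store as a string) is a design decision left open here.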

Related

How to avoid clones when using postgres_types::Json?

I'm currently writing a Rust app that uses tokio-postgres, and I need to run a SQL query that fetches some data based on a jsonb row. The problem is that tokio-postgres uses a particular type (postgres_types::Json), which can be used like this: &Json::<Struct>(struct_var).
The struct variable can't be a reference, so Json takes ownership, which is a problem because I need to use one of the struct's fields afterwards.
I could solve the problem with clone, but first I wanted to know whether there is another solution that would not hurt performance.
Here is the function :
pub async fn user_exists_ipv4(
pool: &Pool,
ip: IpAddr,
device: &Device,
) -> Result<Option<Uuid>, String> {
// Get a connection from the pool
let conn = get_connection(pool).await?;
let country = &device.country[..];
// Get the user id from the database
let result = conn
.query(
FETCH_USER_QUERY_FOR_V4,
&[
&ip.to_string(),
&Json::<Device>(device.clone()),
&country.to_string(),
],
)
.await?
...
You can use references with Json: it is simply a wrapper that implements ToSql for types that are Serialize-able, and that includes &T where T: Serialize. So you can use it with device directly as it is:
&Json::<&Device>(device)
You also don't need to annotate the type of Json explicitly since it can be inferred directly from what you pass to it. The code above could be more succinctly written as:
&Json(device)
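To see why the reference works without a clone, here is a std-only sketch of the same pattern. Wrapper and Encode are hypothetical stand-ins for Json and ToSql, and Display stands in for Serialize. Like Serialize, Display is implemented for &T whenever T implements it, so a wrapper bounded on the trait accepts a borrowed value.

```rust
use std::fmt::{self, Display};

// Hypothetical stand-in for postgres_types::Json: a thin generic wrapper.
struct Wrapper<T>(T);

// Hypothetical stand-in for ToSql.
trait Encode {
    fn encode(&self) -> String;
}

// Implemented for any T that is Display (standing in for Serialize).
// Because `&T: Display` whenever `T: Display`, Wrapper<&Device> qualifies too.
impl<T: Display> Encode for Wrapper<T> {
    fn encode(&self) -> String {
        format!("\"{}\"", self.0)
    }
}

struct Device {
    country: String,
}

impl Display for Device {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "device in {}", self.country)
    }
}

fn encode_device(device: &Device) -> String {
    // The wrapper holds a reference; nothing is cloned or moved.
    Wrapper(device).encode()
}

fn main() {
    let device = Device { country: "FR".to_string() };
    let encoded = encode_device(&device);
    // `device` is still usable after being wrapped.
    println!("{} / {}", encoded, device.country);
}
```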

How can I insert a key-value pair in an unordered map that is present in the innermost struct in rust programming language?

This is my data model:
pub struct RaffleDetails {
prize: Balance,
start: Timestamp,
end: Timestamp,
participants: UnorderedMap<AccountId, Balance>,
}
pub struct RaffleDapp {
raffles: UnorderedMap<AccountId, RaffleDetails>,
}
How can I insert a key-value pair in the 'participants' variable?
I tried self.raffles.get(&raffle_account_id).unwrap().participants.insert(&env::predecessor_account_id(), &confidence); but it's not persistent.
References:
UnorderedMap
NEAR Rust SDK
You need to make sure you are updating the RaffleDetails instance that is in the map, not a copy/clone of it.
I'm not familiar with UnorderedMap, but it seems to me the get() method returns a copy of the value that is in the map, so you are only updating the copied value. I don't know if UnorderedMap allows you to mutate a value in it directly (skimming through the docs, I don't see such a method). What you can do though is re-insert the modified RaffleDetails into the raffles map (so as to replace the old one with the modified one).
I'm talking about something like this (I haven't tested compiling it):
// get() hands back a copy, so bind it mutably, modify it, and write it back
if let Some(mut copied_rd) = self.raffles.get(&raffle_account_id) {
    copied_rd.participants.insert(&env::predecessor_account_id(), &confidence);
    self.raffles.insert(&raffle_account_id, &copied_rd);
}
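The read-modify-write pattern can be tried out without the NEAR SDK. The sketch below is std-only and uses a hypothetical CopyMap whose get() returns an owned copy, on the assumption (as the answer suspects) that UnorderedMap behaves this way. The type and function names are illustrative only.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct RaffleDetails {
    participants: HashMap<String, u128>,
}

// Hypothetical stand-in for a persistent map whose get() returns a copy.
struct CopyMap {
    inner: HashMap<String, RaffleDetails>,
}

impl CopyMap {
    fn get(&self, key: &str) -> Option<RaffleDetails> {
        self.inner.get(key).cloned()
    }

    fn insert(&mut self, key: &str, value: &RaffleDetails) {
        self.inner.insert(key.to_string(), value.clone());
    }
}

/// Adds a participant and writes the modified copy back into the map.
/// Returns false when the raffle does not exist.
fn add_participant(raffles: &mut CopyMap, raffle: &str, account: &str, amount: u128) -> bool {
    match raffles.get(raffle) {
        Some(mut details) => {
            details.participants.insert(account.to_string(), amount);
            // Without this re-insert, only the local copy would change.
            raffles.insert(raffle, &details);
            true
        }
        None => false,
    }
}

fn main() {
    let mut raffles = CopyMap { inner: HashMap::new() };
    raffles.insert("raffle.near", &RaffleDetails { participants: HashMap::new() });

    assert!(add_participant(&mut raffles, "raffle.near", "alice.near", 10));
    let stored = raffles.get("raffle.near").unwrap();
    println!("participants: {:?}", stored.participants);
}
```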

Moving context into several closures?

I have found a way to move context into several closures, but it looks ugly. I do it with help of Rc and cloning each variable I need to use for each closure. Particularly I don't like to clone every variable for every closure I want to use:
let mut context = Rc::new( Context { a : 13 } );
..
let context_clone_1 = Rc::clone( &context );
engine.on_event1( Box::new( move ||
{
println!( "on_event1 : {}", context_clone_1.a );
...
let context_clone_2 = Rc::clone( &context );
engine.on_event2( Box::new( move ||
{
println!( "on_event2 : {}", context_clone_2.a );
...
This is a verbose way to go, and I feel there must be a better one. Also, uncommenting the line // context_clone_1.a += 1; breaks compilation. What is the proper way of solving problems like this in Rust?
Here is a playground with minimal code.
There are two "problems" here:
Since you specifically asked about context_clone_1.a += 1;: When putting a value into an Rc, there could be multiple references to that value, derived from the independent Rc owners. If mutation was allowed, this would also allow simultaneous mutation and aliasing, which is not allowed in Rust; therefore Rc does not allow mutating its inner value. A common approach to regain mutability is to put the value into a RefCell, which provides mutability through try_borrow_mut() with a runtime check that ensures no aliasing occurs. A Rc<RefCell<T>> is commonly seen in Rust.
Regarding the use of Rc: The way your code is currently set up is actually fine, at least if that's how it should work. The way the code is currently structured allows for flexibility, including cases where multiple Context-objects provide callback implementations on different events. For example, this is currently possible:
let context1 = Context { a : 13 };
engine.on_event1(Box::new(move || {
    println!("on_event1 : {}", context1.a);
}));
let context2 = Context { a : 999 };
engine.on_event2(Box::new(move || {
    println!("on_event2 : {}", context2.a);
}));
In case you have exactly one Context (as in your example), and since the Engine needs to make sure that all callbacks are alive while it itself is alive, you'll need to put each callback - which is structured as a completely separate thing - into a Rc. In your case, all Rc end up pointing to the same object; but they don't have to and this is what your code currently allows for.
A simpler solution would be to define a trait for Context, something along the lines of
trait EventDriver {
    fn event1(&mut self, engine: &Engine);
    fn event2(&mut self, engine: &Engine);
}
... and then have Context implement the trait. The Engine-struct then becomes generic over E: EventDriver and Context becomes the E in that. This solution only allows for exactly one instance of Context to provide event callbacks. But since Engine is the owner of that object, it can be sure that all callbacks are alive while it itself is alive and the whole Rc-thing goes away.
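For the first point above (mutating the shared context), here is a minimal std-only sketch of the Rc<RefCell<T>> approach. The Engine here is a hypothetical stand-in that just stores boxed callbacks and fires them in order; the real engine from the question may look different.

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Context {
    a: i32,
}

// Hypothetical engine: stores callbacks and runs them in registration order.
struct Engine {
    callbacks: Vec<Box<dyn FnMut()>>,
}

impl Engine {
    fn new() -> Self {
        Engine { callbacks: Vec::new() }
    }

    fn on_event(&mut self, callback: Box<dyn FnMut()>) {
        self.callbacks.push(callback);
    }

    fn fire_all(&mut self) {
        for callback in self.callbacks.iter_mut() {
            callback();
        }
    }
}

fn run() -> i32 {
    // RefCell moves the aliasing check to runtime, so each Rc owner can mutate.
    let context = Rc::new(RefCell::new(Context { a: 13 }));
    let mut engine = Engine::new();

    let ctx1 = Rc::clone(&context);
    engine.on_event(Box::new(move || {
        ctx1.borrow_mut().a += 1;
    }));

    let ctx2 = Rc::clone(&context);
    engine.on_event(Box::new(move || {
        ctx2.borrow_mut().a *= 2;
    }));

    engine.fire_all();
    let result = context.borrow().a;
    result
}

fn main() {
    println!("a = {}", run());
}
```

Each borrow_mut() is released at the end of its statement, so the callbacks never hold overlapping borrows. Holding a borrow across a call that borrows again would panic at runtime, which is the trade-off of this approach.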

Handling duplicate inserts into database in async rust

Beginner in both rust and async programming here.
I have a function that downloads and stores a bunch of tweets in the database:
pub async fn process_user_timeline(config: &Settings, pool: &PgPool, user_object: &Value) {
// get timeline
if let Ok((user_timeline, _)) =
get_user_timeline(config, user_object["id"].as_str().unwrap()).await
{
// store tweets
if let Some(tweets) = user_timeline["data"].as_array() {
for tweet in tweets.iter() {
store_tweet(pool, &tweet, &user_timeline, "normal")
.await
.unwrap_or_else(|e| {
println!(
">>>X>>> failed to store tweet {}: {:?}",
tweet["id"].as_str().unwrap(),
e
)
});
}
}
}
}
It's being called in an asynchronous loop by another function:
pub async fn loop_until_hit_rate_limit<'a, T, Fut>(
object_arr: &'a [T],
settings: &'a Settings,
pool: &'a PgPool,
f: impl Fn(&'a Settings, &'a PgPool, &'a T) -> Fut + Copy,
rate_limit: usize,
) where
Fut: Future,
{
let total = object_arr.len();
let capped_total = min(total, rate_limit);
let mut futs = vec![];
for (i, object) in object_arr[..capped_total].iter().enumerate() {
futs.push(async move {
println!(">>> PROCESSING {}/{}", i + 1, total);
f(settings, pool, object).await;
});
}
futures::future::join_all(futs).await;
}
Sometimes two async tasks will try to insert the same tweet at the same time, producing this error:
failed to store tweet 1398307091442409475: Database(PgDatabaseError { severity: Error, code: "23505", message: "duplicate key value violates unique constraint \"tweets_tweet_id_key\"", detail: Some("Key (tweet_id)=(1398307091442409475) already exists."), hint: None, position: None, where: None, schema: Some("public"), table: Some("tweets"), column: None, data_type: None, constraint: Some("tweets_tweet_id_key"), file: Some("nbtinsert.c"), line: Some(656), routine: Some("_bt_check_unique") })
Mind the code already checks for whether a tweet is present before inserting it, so this only happens in the following scenario: READ from task 1 > READ from task 2 > WRITE from task 1 (success) > WRITE from task 2 (error).
To solve this, my best attempt so far has been to place an unwrap_or_else() clause which lets one of the tasks fail without panicking out of the entire execution. I am aware of at least one drawback - sometimes both tasks will bail out and the tweet never gets written. It happens in <1% of cases, but it happens.
Are there other drawbacks to my approach I'm not aware of?
What's the right way to handle this? I hate losing data, and even worse doing so non-deterministically.
PS I'm using actix web and sqlx as my webserver / db libraries.
Generally for anything that may be written by multiple threads/processes, any logic like
if (!exists) {
writeValue()
}
needs to either be protected by some kind of lock, or the code needs to be changed to write atomically with the possibility the write will fail because something else already wrote to it.
For in-memory data in Rust you'd use Mutex to ensure that you can read and then write the data back before anything else reads it, or Atomic to modify the data in such a way that if something already wrote it, you can detect that.
In databases, for any query that might conflict with some other query happening around the same time, you'd want to use an ON CONFLICT clause in your query so that the database itself knows what to do when it tries to write data and it already exists.
For your case, since I'm guessing the tweets are immutable, you'd likely want ON CONFLICT (tweet_id) DO NOTHING (or whatever your ID column is), in which case the INSERT skips the row if a tweet with that ID already exists, and it won't throw an error.
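For the in-memory case described above, the same idea (one atomic "insert if absent" instead of a separate check followed by a write) can be sketched with std only. The function name store_concurrently is hypothetical, and a Mutex<HashSet> stands in for the table of tweet IDs.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `workers` threads that all try to store the same tweet id and
/// returns how many of them actually performed the write.
fn store_concurrently(workers: usize, tweet_id: u64) -> usize {
    let seen = Arc::new(Mutex::new(HashSet::<u64>::new()));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let seen = Arc::clone(&seen);
            thread::spawn(move || {
                // Check and write happen under one lock: insert() returns
                // false if the id was already present, so no task can slip
                // in between a "does it exist?" read and the write.
                let inserted = seen.lock().unwrap().insert(tweet_id);
                inserted
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|&stored| stored)
        .count()
}

fn main() {
    let writers = store_concurrently(8, 1398307091442409475);
    println!("{} of 8 tasks stored the tweet", writers);
}
```

Exactly one task wins and the others see a harmless "already there" result, which is the behaviour DO NOTHING gives you on the database side.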

How can I make this Rust code more idiomatic

Recently I started to learn Rust, and one of my main struggles is converting years of object-oriented thinking into procedural code.
I'm trying to parse an XML document whose tags are each processed by a specific handler that can deal with the data it gets from the children.
Furthermore, I have some field members that are common between the handlers, and I would prefer not to have to write the same fields in all of them.
I tried my hand on it and my code came out like this:
use roxmltree::Node; // roxmltree = "0.14.0"
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
let tag_handler: dyn XMLTagHandler = match tag_name {
"name" => NameHandler::new(),
"phone" => PhoneHandler::new(),
_ => DefaultHandler::new()
};
if tag_handler.is_recursive() {
for child in node.children() {
let child_value = get_data_from(&child);
// do something with child value
}
}
let value: String = tag_handler.value();
value
}
// consider that handlers are on my project and can be adapted to my needs, and that XMLTagHandler is the trait that they share in common.
My main issues with this are:
This feels like an object-oriented approach;
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field;
I could use one type for a Handler and pass to it a function pointer, but this approach seems dirty, e.g.: => Handler::new(my_other_params, phone_handler_func)
This feels like an object-oriented approach
Actually, I don't think so. This code is in clear violation of the Tell-Don't-Ask principle, which falls out from the central idea of object-oriented programming: the encapsulation of data and related behavior into objects. The objects (NameHandler, PhoneHandler, etc.) don't have enough knowledge about what they are to do things on their own, so get_data_from has to query them for information and decide what to do, rather than simply sending a message and letting the object figure out how to deal with it.
So let's start by moving the knowledge about what to do with each kind of tag into the handler itself:
trait XmlTagHandler {
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
}
impl XmlTagHandler for NameHandler {
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {
        // "name" is not a recursive tag, so do nothing
    }
}
impl XmlTagHandler for DefaultHandler {
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        // all other tags may be recursive
        for child in node.children() {
            // the callback takes a reference, and calling an FnMut needs a mutable binding
            callback(&child);
        }
    }
}
This way you call foreach_child on every kind of Handler, and let the handler itself decide whether the right action is to recurse or not. After all, that's why they have different types -- right?
To get rid of the dyn part, which is unnecessary, let's write a little generic helper function that uses XmlTagHandler to handle one specific kind of tag, and modify get_data_from so it just dispatches to the correct parameterized version of it. (I'll suppose that XmlTagHandler also has a new function so that you can create one generically.)
fn handle_tag<H: XmlTagHandler>(node: &Node) -> String {
let handler = H::new();
handler.foreach_child(node, |child| {
// do something with child value
});
handler.value()
}
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => handle_tag::<NameHandler>(node),
"phone" => handle_tag::<PhoneHandler>(node),
_ => handle_tag::<DefaultHandler>(node),
}
}
If you don't like handle_tag::<SomeHandler>(node), also consider making handle_tag a provided method of XmlTagHandler, so you can instead write SomeHandler::handle(node).
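A minimal sketch of that provided-method variant follows. It is std-only, so Node here is a hypothetical stand-in for roxmltree::Node, and the signatures of new() and value() are assumptions made so the example can run on its own.

```rust
// Hypothetical stand-in for roxmltree::Node.
struct Node {
    name: String,
    text: String,
    children: Vec<Node>,
}

trait XmlTagHandler: Sized {
    fn new() -> Self;
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
    // Assumed signature: builds the result from the node and its children's values.
    fn value(&self, node: &Node, child_values: &[String]) -> String;

    // Provided method: written once, shared by every implementor.
    fn handle(node: &Node) -> String {
        let handler = Self::new();
        let mut child_values = Vec::new();
        handler.foreach_child(node, |child| child_values.push(get_data_from(child)));
        handler.value(node, &child_values)
    }
}

struct NameHandler;
struct DefaultHandler;

impl XmlTagHandler for NameHandler {
    fn new() -> Self {
        NameHandler
    }
    // "name" is not recursive: never call the callback.
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {}
    fn value(&self, node: &Node, _child_values: &[String]) -> String {
        node.text.clone()
    }
}

impl XmlTagHandler for DefaultHandler {
    fn new() -> Self {
        DefaultHandler
    }
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        for child in &node.children {
            callback(child);
        }
    }
    fn value(&self, _node: &Node, child_values: &[String]) -> String {
        child_values.join(",")
    }
}

fn get_data_from(node: &Node) -> String {
    match node.name.as_str() {
        "name" => NameHandler::handle(node),
        _ => DefaultHandler::handle(node),
    }
}

fn leaf(name: &str, text: &str) -> Node {
    Node { name: name.to_string(), text: text.to_string(), children: Vec::new() }
}

fn main() {
    let root = Node {
        name: "person".to_string(),
        text: String::new(),
        children: vec![leaf("name", "Ann"), leaf("name", "Bob")],
    };
    println!("{}", get_data_from(&root));
}
```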
Note that I have not really changed any of the data structures. Your presumption of an XmlTagHandler trait and various Handler implementors is a pretty normal way to organize code. However, in this case, it doesn't offer any real improvement over just writing three separate functions:
fn get_data_from(node: &Node) -> String {
let tag_name = get_node_name(node);
match tag_name {
"name" => get_name_from(node),
"phone" => get_phone_from(node),
_ => get_other_from(node),
}
}
In some languages, such as Java, all code has to be part of some class – so you can find yourself writing classes that don't exist for any other reason than to group related things together. In Rust you don't need to do this, so make sure that any added complication such as XmlTagHandler is actually pulling its weight.
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field
Without more information about the fields, it's impossible to really understand what problem you're facing here; however, in general, if there is a family of structs that have some data in common, you may want to make a generic struct instead of a trait. See the answers to How to reuse codes for Binary Search Tree, Red-Black Tree, and AVL Tree? for more suggestions.
I could use one type for a Handler and pass to it a function pointer, but this approach seems dirty
Elegance is sometimes a useful thing, but it is subjective. I would recommend closures rather than function pointers, but this suggestion doesn't seem "dirty" to me. Making closures and putting them in data structures is a very normal way to write Rust code. If you can elaborate on what you don't like about it, perhaps someone could point out ways to improve it.
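As a rough illustration of the closure-based variant, here is a std-only sketch in which a single Handler struct carries the shared fields once and stores the per-tag behaviour as a boxed closure. The field names and the handler_for function are hypothetical, and plain strings stand in for XML nodes.

```rust
// One struct holds the fields every handler shares, plus the per-tag logic.
struct Handler {
    is_recursive: bool,
    extract: Box<dyn Fn(&str) -> String>,
}

impl Handler {
    fn new<F>(is_recursive: bool, extract: F) -> Self
    where
        F: Fn(&str) -> String + 'static,
    {
        Handler { is_recursive, extract: Box::new(extract) }
    }

    fn value(&self, raw: &str) -> String {
        (self.extract)(raw)
    }
}

// Hypothetical dispatch: picks the closure and shared settings per tag name.
fn handler_for(tag_name: &str) -> Handler {
    match tag_name {
        "name" => Handler::new(false, |raw| raw.trim().to_string()),
        "phone" => Handler::new(false, |raw| {
            // Keep only the digits of the phone number.
            raw.chars().filter(|c| c.is_ascii_digit()).collect::<String>()
        }),
        _ => Handler::new(true, |raw| raw.to_string()),
    }
}

fn main() {
    let phone = handler_for("phone");
    println!("{} (recursive: {})", phone.value("+1 (555) 010-9999"), phone.is_recursive);
}
```

Adding another shared field later means touching the struct and the new() calls, not every handler type.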
