How to avoid clones when using postgres_types::Json?

I'm currently writing a Rust app that uses tokio-postgres, and I need to run an SQL query that fetches some data based on a jsonb column. The problem is that tokio-postgres uses a particular type (postgres_types::Json), which can be used like this: &Json::<Struct>(struct_var).
The struct variable can't be a reference, so Json takes ownership of it, which is a problem because I need to use one of the struct's fields afterwards.
I could solve the problem with clone, but first I wanted to know whether there is another solution that would not hurt performance.
Here is the function :
pub async fn user_exists_ipv4(
    pool: &Pool,
    ip: IpAddr,
    device: &Device,
) -> Result<Option<Uuid>, String> {
    // Get a connection from the pool
    let conn = get_connection(pool).await?;
    let country = &device.country[..];
    // Get the user id from the database
    let result = conn
        .query(
            FETCH_USER_QUERY_FOR_V4,
            &[
                &ip.to_string(),
                &Json::<Device>(device.clone()),
                &country.to_string(),
            ],
        )
        .await?
    ...

You can use references with Json, it is simply a wrapper that implements ToSql for types that are Serialize-able. That will include &T where T: Serialize. So you can use it with device directly as it is:
&Json::<&Device>(device)
You also don't need to annotate the type of Json explicitly since it can be inferred directly from what you pass to it. The code above could be more succinctly written as:
&Json(device)
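Why a reference is enough can be sketched without a database. The block below uses toy stand-ins: the local `Serialize` trait, the local `Json` wrapper and its `to_sql_text` method are assumptions made for illustration, not the real serde or postgres_types items. It mirrors the relevant shape only: a blanket impl makes `&T` serializable whenever `T` is, so the wrapper can hold a borrow and the original value stays usable.

```rust
// Toy stand-in for serde::Serialize (assumption: reduced to one method).
trait Serialize {
    fn serialize(&self) -> String;
}

// Mirrors the idea of a blanket impl: a reference to a serializable value
// is itself serializable.
impl<T: Serialize + ?Sized> Serialize for &T {
    fn serialize(&self) -> String {
        (**self).serialize()
    }
}

// Toy stand-in for postgres_types::Json: a plain tuple-struct wrapper.
struct Json<T>(T);

impl<T: Serialize> Json<T> {
    // Hypothetical method standing in for the ToSql encoding step.
    fn to_sql_text(&self) -> String {
        self.0.serialize()
    }
}

struct Device {
    country: String,
}

impl Serialize for Device {
    fn serialize(&self) -> String {
        format!("{{\"country\":\"{}\"}}", self.country)
    }
}

fn encode(device: &Device) -> String {
    // `T` is inferred as `&Device`; nothing is cloned or moved.
    Json(device).to_sql_text()
}

fn main() {
    let device = Device { country: "FR".to_string() };
    let text = encode(&device);
    // `device` is still owned here and can be used afterwards.
    println!("{} / {}", text, device.country);
}
```

With the real crates, the same inference is what lets `&Json(device)` stand in for `&Json::<Device>(device.clone())` in the parameter list.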

Related

Handling duplicate inserts into database in async rust

Beginner in both rust and async programming here.
I have a function that downloads and stores a bunch of tweets in the database:
pub async fn process_user_timeline(config: &Settings, pool: &PgPool, user_object: &Value) {
    // get timeline
    if let Ok((user_timeline, _)) =
        get_user_timeline(config, user_object["id"].as_str().unwrap()).await
    {
        // store tweets
        if let Some(tweets) = user_timeline["data"].as_array() {
            for tweet in tweets.iter() {
                store_tweet(pool, &tweet, &user_timeline, "normal")
                    .await
                    .unwrap_or_else(|e| {
                        println!(
                            ">>>X>>> failed to store tweet {}: {:?}",
                            tweet["id"].as_str().unwrap(),
                            e
                        )
                    });
            }
        }
    }
}
It's being called in an asynchronous loop by another function:
pub async fn loop_until_hit_rate_limit<'a, T, Fut>(
    object_arr: &'a [T],
    settings: &'a Settings,
    pool: &'a PgPool,
    f: impl Fn(&'a Settings, &'a PgPool, &'a T) -> Fut + Copy,
    rate_limit: usize,
) where
    Fut: Future,
{
    let total = object_arr.len();
    let capped_total = min(total, rate_limit);
    let mut futs = vec![];
    for (i, object) in object_arr[..capped_total].iter().enumerate() {
        futs.push(async move {
            println!(">>> PROCESSING {}/{}", i + 1, total);
            f(settings, pool, object).await;
        });
    }
    futures::future::join_all(futs).await;
}
Sometimes two async tasks will try to insert the same tweet at the same time, producing this error:
failed to store tweet 1398307091442409475: Database(PgDatabaseError { severity: Error, code: "23505", message: "duplicate key value violates unique constraint \"tweets_tweet_id_key\"", detail: Some("Key (tweet_id)=(1398307091442409475) already exists."), hint: None, position: None, where: None, schema: Some("public"), table: Some("tweets"), column: None, data_type: None, constraint: Some("tweets_tweet_id_key"), file: Some("nbtinsert.c"), line: Some(656), routine: Some("_bt_check_unique") })
Mind you, the code already checks whether a tweet is present before inserting it, so this only happens in the following scenario: READ from task 1 > READ from task 2 > WRITE from task 1 (success) > WRITE from task 2 (error).
To solve this, my best attempt so far has been to add an unwrap_or_else() call, which lets one of the tasks fail without panicking out of the entire execution. I am aware of at least one drawback: sometimes both tasks bail out and the tweet never gets written. It happens in <1% of cases, but it happens.
Are there other drawbacks to my approach I'm not aware of?
What's the right way to handle this? I hate losing data, and even worse doing so non-deterministically.
PS I'm using actix web and sqlx as my webserver / db libraries.

Generally for anything that may be written by multiple threads/processes, any logic like
if (!exists) {
    writeValue()
}
needs to either be protected by some kind of lock, or the code needs to be changed to write atomically with the possibility the write will fail because something else already wrote to it.
For in-memory data in Rust you'd use Mutex to ensure that you can read and then write the data back before anything else reads it, or Atomic to modify the data in such a way that if something already wrote it, you can detect that.
In databases, for any query that might conflict with some other query happening around the same time, you'd want to use an ON CONFLICT clause in your query so that the database itself knows what to do when it tries to write data and it already exists.
For your case, since I'm guessing the tweets are immutable, you'd likely want to use ON CONFLICT (tweet_id) DO NOTHING (or whatever your ID column is; note that the column name goes in parentheses). The INSERT then skips the row if there is already a tweet with the ID you are inserting, and it won't raise an error.
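The difference between "check, then write" and a single atomic write can be illustrated in memory. This is only a sketch of the principle, not database code: `concurrent_inserts` is a hypothetical helper, and a `Mutex<HashSet<u64>>` stands in for the tweets table. `HashSet::insert` reports whether the key was new, which plays the same role here as ON CONFLICT ... DO NOTHING does in the INSERT statement.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

// Returns how many of `workers` threads actually stored `tweet_id`.
fn concurrent_inserts(tweet_id: u64, workers: usize) -> usize {
    let store = Arc::new(Mutex::new(HashSet::new()));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // One atomic step under the lock: `insert` returns true only
                // if the key was absent, so there is no separate "exists?"
                // check for another thread to slip in after.
                store.lock().unwrap().insert(tweet_id)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|&inserted| inserted)
        .count()
}

fn main() {
    let winners = concurrent_inserts(1398307091442409475, 8);
    println!("{} task(s) stored the tweet", winners);
}
```

Exactly one thread wins and the others see a harmless "already there" result, so no task fails and no tweet is lost.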

How can I make this Rust code more idiomatic

Recently I started to learn Rust, and one of my main struggles is converting years of object-oriented thinking into procedural code.
I'm trying to parse an XML document whose tags are each processed by a specific handler that can deal with the data it gets from the children.
Furthermore, I have some field members that are common between them, and I would prefer not to have to write the same fields in all the handlers.
I tried my hand on it and my code came out like this:
use roxmltree::Node; // roxmltree = "0.14.0"
fn get_data_from(node: &Node) -> String {
    let tag_name = get_node_name(node);
    // Trait objects are unsized, so they have to live behind a pointer such as Box.
    let tag_handler: Box<dyn XMLTagHandler> = match tag_name {
        "name" => Box::new(NameHandler::new()),
        "phone" => Box::new(PhoneHandler::new()),
        _ => Box::new(DefaultHandler::new()),
    };
    if tag_handler.is_recursive() {
        for child in node.children() {
            let child_value = get_data_from(&child);
            // do something with child value
        }
    }
    tag_handler.value()
}
// consider that handlers are on my project and can be adapted to my needs, and that XMLTagHandler is the trait that they share in common.
My main issues with this are:
This feels like an object-oriented approach to it;
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field;
I could use one type for a Handler and pass it a function pointer, but this approach seems dirty, e.g.: => Handler::new(my_other_params, phone_handler_func)

This feels like an object-oriented approach to it
Actually, I don't think so. This code is in clear violation of the Tell-Don't-Ask principle, which falls out from the central idea of object-oriented programming: the encapsulation of data and related behavior into objects. The objects (NameHandler, PhoneHandler, etc.) don't have enough knowledge about what they are to do things on their own, so get_data_from has to query them for information and decide what to do, rather than simply sending a message and letting the object figure out how to deal with it.
So let's start by moving the knowledge about what to do with each kind of tag into the handler itself:
trait XmlTagHandler {
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
}
impl XmlTagHandler for NameHandler {
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {
        // "name" is not a recursive tag, so do nothing
    }
}
impl XmlTagHandler for DefaultHandler {
    // `mut` is required in order to call an FnMut closure
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        // all other tags may be recursive
        for child in node.children() {
            // children() yields nodes by value; the callback takes a reference
            callback(&child);
        }
    }
}
This way you call foreach_child on every kind of Handler, and let the handler itself decide whether the right action is to recurse or not. After all, that's why they have different types -- right?
To get rid of the dyn part, which is unnecessary, let's write a little generic helper function that uses XmlTagHandler to handle one specific kind of tag, and modify get_data_from so it just dispatches to the correct parameterized version of it. (I'll suppose that XmlTagHandler also has a new function so that you can create one generically.)
fn handle_tag<H: XmlTagHandler>(node: &Node) -> String {
    let handler = H::new();
    handler.foreach_child(node, |child| {
        // do something with child value
    });
    handler.value()
}
fn get_data_from(node: &Node) -> String {
    let tag_name = get_node_name(node);
    match tag_name {
        "name" => handle_tag::<NameHandler>(node),
        "phone" => handle_tag::<PhoneHandler>(node),
        _ => handle_tag::<DefaultHandler>(node),
    }
}
If you don't like handle_tag::<SomeHandler>(node), also consider making handle_tag a provided method of XmlTagHandler, so you can instead write SomeHandler::handle(node).
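A provided method of that kind might look like the sketch below. Several things here are assumptions made to keep it self-contained: `Node` is a small stand-in for roxmltree's node type, `value` is given extra parameters so it has something to return, and the "do something with child value" step simply collects each child's text.

```rust
// Hypothetical stand-in for roxmltree::Node so the sketch compiles on its own.
struct Node {
    text: String,
    children: Vec<Node>,
}

trait XmlTagHandler: Sized {
    fn new() -> Self;
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, callback: F);
    // Assumed signature: the original only says that `value` exists.
    fn value(&self, node: &Node, child_values: &[String]) -> String;

    // Provided method: every implementor gets `SomeHandler::handle(node)`.
    fn handle(node: &Node) -> String {
        let handler = Self::new();
        let mut child_values = Vec::new();
        handler.foreach_child(node, |child| child_values.push(child.text.clone()));
        handler.value(node, &child_values)
    }
}

struct NameHandler;
impl XmlTagHandler for NameHandler {
    fn new() -> Self {
        NameHandler
    }
    // "name" is not recursive, so the callback is never invoked.
    fn foreach_child<F: FnMut(&Node)>(&self, _node: &Node, _callback: F) {}
    fn value(&self, node: &Node, _child_values: &[String]) -> String {
        node.text.clone()
    }
}

struct DefaultHandler;
impl XmlTagHandler for DefaultHandler {
    fn new() -> Self {
        DefaultHandler
    }
    fn foreach_child<F: FnMut(&Node)>(&self, node: &Node, mut callback: F) {
        for child in &node.children {
            callback(child);
        }
    }
    fn value(&self, _node: &Node, child_values: &[String]) -> String {
        child_values.join(",")
    }
}

fn leaf(text: &str) -> Node {
    Node { text: text.to_string(), children: Vec::new() }
}

fn sample() -> Node {
    Node { text: "root".to_string(), children: vec![leaf("a"), leaf("b")] }
}

fn main() {
    println!("{}", NameHandler::handle(&sample()));
    println!("{}", DefaultHandler::handle(&sample()));
}
```

Because `handle` has a default body, adding a new handler only requires the three methods that actually differ between tags.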
Note that I have not really changed any of the data structures. Your presumption of an XmlTagHandler trait and various Handler implementors is a pretty normal way to organize code. However, in this case, it doesn't offer any real improvement over just writing three separate functions:
fn get_data_from(node: &Node) -> String {
    let tag_name = get_node_name(node);
    match tag_name {
        "name" => get_name_from(node),
        "phone" => get_phone_from(node),
        _ => get_other_from(node),
    }
}
In some languages, such as Java, all code has to be part of some class – so you can find yourself writing classes that don't exist for any other reason than to group related things together. In Rust you don't need to do this, so make sure that any added complication such as XmlTagHandler is actually pulling its weight.
is_recursive needs to be reimplemented for each struct because traits cannot have field members, and I will have to add more fields later, which means more boilerplate for each new field
Without more information about the fields, it's impossible to really understand what problem you're facing here; however, in general, if there is a family of structs that have some data in common, you may want to make a generic struct instead of a trait. See the answers to How to reuse codes for Binary Search Tree, Red-Black Tree, and AVL Tree? for more suggestions.
I could use one type for a Handler and pass to it a function pointer, but this approach seems dirty
Elegance is sometimes a useful thing, but it is subjective. I would recommend closures rather than function pointers, but this suggestion doesn't seem "dirty" to me. Making closures and putting them in data structures is a very normal way to write Rust code. If you can elaborate on what you don't like about it, perhaps someone could point out ways to improve it.
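As a rough sketch of the closure-based variant: one `Handler` struct holds the shared fields once, and the per-tag behaviour is stored as a boxed closure. The names `Handler`, `handler_for` and `extract`, the simplified `Node`, and the "+" prefix for phone numbers are all made up for illustration.

```rust
// Simplified stand-in for an XML node (assumption, not roxmltree's type).
struct Node {
    tag: String,
    text: String,
}

// One handler type: common fields live here once, behaviour is a closure.
struct Handler {
    is_recursive: bool,
    extract: Box<dyn Fn(&Node) -> String>,
}

impl Handler {
    fn new(is_recursive: bool, extract: impl Fn(&Node) -> String + 'static) -> Self {
        Handler { is_recursive, extract: Box::new(extract) }
    }
}

fn handler_for(tag: &str) -> Handler {
    match tag {
        "name" => Handler::new(false, |n| n.text.trim().to_string()),
        "phone" => {
            // Unlike a plain function pointer, a closure can capture state.
            let prefix = "+".to_string();
            Handler::new(false, move |n| format!("{}{}", prefix, n.text))
        }
        _ => Handler::new(true, |n| n.text.clone()),
    }
}

fn get_data_from(node: &Node) -> String {
    let handler = handler_for(&node.tag);
    // Parentheses are needed to call a closure stored in a field.
    (handler.extract)(node)
}

fn main() {
    let node = Node { tag: "phone".to_string(), text: "123".to_string() };
    println!("{}", get_data_from(&node));
    println!("recursive by default: {}", handler_for("other").is_recursive);
}
```

Adding another shared field then means touching `Handler` and `Handler::new` only, rather than every handler type.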

Deserializing a String with into_serde makes the app panic

With a friend of mine, we're trying to use the serde_json crate to deserialize some message sent by a WebSocket.
We are having a specific error, and we managed to recreate it with the following snippet of code:
use serde::{Deserialize, Serialize};
use wasm_bindgen::JsValue; // needed for JsValue::from_str and into_serde
#[derive(Deserialize, Debug)]
struct EffetSer {
    test: String
}
fn main() {
    let test_value = JsValue::from_str("{\"test\": \"value\"}");
    let test_value: EffetSer = test_value.into_serde().unwrap();
    log::error!("WOW : {:?}", test_value);
}
Our TOML has the following dependencies:
wasm-bindgen = { version = '0.2.63', features = ['serde-serialize'] }
serde = { version = '1.0', features = ["derive"] }
serde_json = '1.0.55'
js-sys = '0.3.40'
The error is the following:
app.js:310 panicked at 'called `Result::unwrap()` on an `Err` value: Error("invalid type: string \"{\\\"test\\\": \\\"value\\\"}\", expected struct EffetSer", line: 1, column: 23)'
Any help would be very appreciated, as we're still struggling to understand what we're doing wrong and why we cannot deserialize our String.

The problem is likely a misunderstanding of into_serde's semantics.
According to the documentation, it works like this:
Invokes JSON.stringify on this value and then parses the resulting JSON into an arbitrary Rust value.
In other words, its semantics are as follows:
convert each component of the JsValue to the corresponding serde internal element;
deserialize the required type from the given tree of components.
Now, what does this mean in our case? Well, you created JsValue using JsValue::from_str, which, again according to documentation,
Creates a new JS value which is a string.
So, the JsValue here is not an object, as you are likely assuming; it is a primitive: a string that simply happens to have the shape of an object's JSON representation. Then, when you invoke into_serde, Serde sees the string not as input to be parsed but as the value's internal representation, and a string cannot be transformed into a struct.
Now, what to do? There are several ways to fix this code:
First and the most obvious: don't use JsValue at all, deserialize from &str directly with serde_json::from_str.
Use js_sys::JSON::parse to get the object-like JsValue from the string, and then convert it to EffetSer with into_serde. This is likely to be less efficient, since it requires a round trip through JSON.parse and JSON.stringify to convert the string to an object and then back to a string.
Write your own method to convert JsValue to EffetSer directly. I'm not sure if this is possible, however, since I wasn't able to find a way to extract a single field from JS object.

Can't use a neon JsArray: This function takes 3 parameters but 2 were supplied

I'm learning how to use neon, but there is one thing I don't understand. If I try to execute this code:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::mem::Handle;
use neon::js::{JsInteger, JsNumber, JsString, JsObject, JsArray, JsValue, Object, Key};
use neon::js::error::{JsError, Kind};
fn test(call: Call) -> JsResult<JsArray> {
    let scope = call.scope;
    let js_arr: Handle<JsArray> = try!(try!(call.arguments.require(scope, 1)).check::<JsArray>());
    js_arr.set(0, JsNumber::new(scope, 1000));
    Ok(js_arr)
}
register_module!(m, {
    m.export("test", test)
});
I get this error when I call js_arr.set: This function takes 3 parameters but 2 were supplied.
I don't understand why since it's a JsArray. Even Racer tells me that the set method takes 2 parameters. No matter what, js_arr.set takes 3 parameters in this order: &mut bool, neon::macro_internal::runtime::raw::Local and neon::macro_internal::runtime::raw::Local.
What's happening? I can't understand how JsArray works.

As paulsevere says on a GitHub issue for Neon, import neon::js::Object. In addition, do not import Key, which also provides a set method:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::js::{Object, JsArray, JsInteger, JsObject, JsNumber};
fn make_an_array(call: Call) -> JsResult<JsArray> {
    let scope = call.scope; // the current scope for rooting handles
    let array = JsArray::new(scope, 3);
    array.set(0, JsInteger::new(scope, 9000))?;
    array.set(1, JsObject::new(scope))?;
    array.set(2, JsNumber::new(scope, 3.14159))?;
    Ok(array)
}
register_module!(m, {
    m.export("main", make_an_array)
});
This creates a brand new array. If you'd like to accept an array as the first argument to your function and then modify it, this works:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::js::{Object, JsArray, JsInteger, JsUndefined};
use neon::mem::Handle;
fn hello(call: Call) -> JsResult<JsUndefined> {
    let scope = call.scope;
    let js_arr: Handle<JsArray> = call.arguments.require(scope, 0)?.check::<JsArray>()?;
    js_arr.set(0, JsInteger::new(scope, 1000))?;
    Ok(JsUndefined::new())
}
register_module!(m, {
    m.export("hello", hello)
});

let js_arr: Handle<JsArray> makes it clear that js_arr is a Handle<JsArray> and Handle<T> has this method:
unsafe fn set(self, out: &mut bool, obj: Local, val: Local) -> bool
I'd guess that you're accidentally trying to call Handle::set (which is unsafe and takes three non-self arguments) rather than JsArray::set (which is safe and takes two non-self arguments).
If that's the case, you need to force a deref_mut to occur. (_mut because JsArray::set takes &mut self.)
I haven't run into this sort of naming collision before, so I can't be certain whether the auto-deref is smart enough, but something like this may work:
(&mut js_arr).set(0, JsNumber::new(scope, 1000));
Failing that, two other things to try are:
JsArray::set(&mut js_arr, 0, JsNumber::new(scope, 1000));
(If the former example fails because it's too much like C++-style method overloading. This is known as Fully Qualified Syntax and is normally used to disambiguate when an object implements two traits which provide methods of the same name.)
Call js_arr.deref_mut() directly to get a mutable reference to the underlying JsArray, then call set on that.
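The kind of name collision guessed at above can be reproduced with toy types. Nothing below is Neon's API: `Handle`, `Array` and `Object` are local stand-ins, built only so that a smart pointer has an inherent three-argument `set` while its deref target gets a two-argument `set` from a trait. Under that assumption, the plain method call picks the pointer's own method first, and the last two suggestions reach the target's method.

```rust
use std::ops::{Deref, DerefMut};

// Toy stand-ins, not Neon's real types.
struct Array {
    items: Vec<i64>,
}

trait Object {
    fn set(&mut self, index: usize, value: i64) -> bool;
}

impl Object for Array {
    fn set(&mut self, index: usize, value: i64) -> bool {
        if index >= self.items.len() {
            self.items.resize(index + 1, 0);
        }
        self.items[index] = value;
        true
    }
}

struct Handle<T> {
    inner: T,
}

impl<T> Handle<T> {
    // Inherent method with the same name but three non-self arguments.
    #[allow(dead_code)]
    fn set(self, _out: &mut bool, _obj: i64, _val: i64) -> bool {
        false
    }
}

impl<T> Deref for Handle<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.inner
    }
}

impl<T> DerefMut for Handle<T> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.inner
    }
}

fn demo() -> Vec<i64> {
    let mut handle = Handle { inner: Array { items: vec![0; 3] } };
    // handle.set(0, 1000); // does not compile: resolves to Handle::set,
    //                      // which expects three arguments.

    // Fully qualified syntax names the trait impl on the deref target;
    // `&mut handle` is deref-coerced to `&mut Array`.
    <Array as Object>::set(&mut handle, 0, 1000);

    // Explicit deref_mut yields `&mut Array`, whose only `set` is the trait's.
    handle.deref_mut().set(1, 2000);

    handle.inner.items.clone()
}

fn main() {
    println!("{:?}", demo());
}
```

Method lookup tries the receiver's own type before following `Deref`, which is why the inherent method shadows the trait method in the commented-out call.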

How to pass a dynamic amount of typed arguments to a function?

Let's say I want to write a little client for an HTTP API. It has a resource that returns a list of cars:
GET /cars
It also accepts the two optional query parameters color and manufacturer, so I could query specific cars like:
GET /cars?color=black
GET /cars?manufacturer=BMW
GET /cars?color=green&manufacturer=VW
How would I expose these resources properly in Rust? Since Rust doesn't support overloading, defining multiple functions seems to be the usual approach, like:
fn get_cars() -> Cars
fn get_cars_by_color(color: Color) -> Cars
fn get_cars_by_manufacturer(manufacturer: Manufacturer) -> Cars
fn get_cars_by_manufacturer_and_color(manufacturer: Manufacturer, color: Color) -> Cars
But this will obviously not scale when you have more than a few parameters.
Another way would be to use a struct:
struct Parameters {
    color: Option<Color>,
    manufacturer: Option<Manufacturer>
}
fn get_cars(params: Parameters) -> Cars
This has the same scaling issue: every struct field must be set on creation (even if its value is just None).
I guess I could just accept a HashMap<String, String>, but that doesn't sound very good either.
So my question is, what is the proper/best way to do this in Rust?

You could use the Builder pattern, as mentioned here. For your particular API, it could look like this:
Cars::new_get()
    .by_color("black")
    .by_manufacturer("BMW")
    .exec();
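A minimal sketch of what could sit behind such a call chain is shown below. The type name `CarQuery` and the `path` method are hypothetical, the parameters are plain strings rather than `Color` and `Manufacturer`, and instead of performing a request the builder only assembles the request path (without URL-encoding the values).

```rust
// Hypothetical builder; Default gives every optional filter the value None.
#[derive(Default)]
struct CarQuery {
    color: Option<String>,
    manufacturer: Option<String>,
}

impl CarQuery {
    fn new() -> Self {
        Self::default()
    }

    // Each setter takes the builder by value and returns it, enabling chaining.
    fn by_color(mut self, color: &str) -> Self {
        self.color = Some(color.to_string());
        self
    }

    fn by_manufacturer(mut self, manufacturer: &str) -> Self {
        self.manufacturer = Some(manufacturer.to_string());
        self
    }

    // Builds the request path; a real client would send it from `exec`.
    fn path(&self) -> String {
        let mut pairs = Vec::new();
        if let Some(color) = &self.color {
            pairs.push(format!("color={}", color));
        }
        if let Some(manufacturer) = &self.manufacturer {
            pairs.push(format!("manufacturer={}", manufacturer));
        }
        if pairs.is_empty() {
            "/cars".to_string()
        } else {
            format!("/cars?{}", pairs.join("&"))
        }
    }
}

fn main() {
    let query = CarQuery::new().by_color("green").by_manufacturer("VW");
    println!("GET {}", query.path());
}
```

Callers only mention the filters they care about, so adding a parameter means one new field and one new setter, and existing call sites keep compiling.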

I would like to point out that no matter the solution, if you wish for a compile-time checked solution the "url parsing -> compile-time checkable" translation is necessarily hard-wired. You can generate that with an external script, with macros, etc... but in any case for the compiler to check it, it must exist at compile-time. There just is no short-cut.
Therefore, no matter which API you go for, at some point you will have something akin to:
fn parse_url(url: &str) -> Parameters {
    // Struct literals must name the type and every field
    let mut p = Parameters { color: None, manufacturer: None };
    if let Some(manufacturer) = extract("manufacturer", url) {
        p.manufacturer = Some(Manufacturer::new(manufacturer));
    }
    if let Some(color) = extract("color", url) {
        p.color = Some(Color::new(color));
    }
    p
}
And although you can try and sugarcoat it, the fundamentals won't change.
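For completeness, here is one way the hard-wired translation could be made runnable. It is a sketch under assumptions: `extract` is a hypothetical helper that the original leaves undefined, the fields are plain strings instead of `Color` and `Manufacturer`, and no percent-decoding is attempted.

```rust
#[derive(Debug, PartialEq, Default)]
struct Parameters {
    color: Option<String>,
    manufacturer: Option<String>,
}

// Hypothetical helper: the value of `key` in the URL's query string, if any.
fn extract<'a>(key: &str, url: &'a str) -> Option<&'a str> {
    // Everything after the first '?' is the query string.
    let query = url.split_once('?')?.1;
    query
        .split('&')
        .filter_map(|pair| pair.split_once('='))
        .find(|(k, _)| *k == key)
        .map(|(_, v)| v)
}

// The parameter names are still spelled out one by one, as the answer says.
fn parse_url(url: &str) -> Parameters {
    Parameters {
        color: extract("color", url).map(str::to_string),
        manufacturer: extract("manufacturer", url).map(str::to_string),
    }
}

fn main() {
    println!("{:?}", parse_url("/cars?color=green&manufacturer=VW"));
}
```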
