How to configure tower_http TraceLayer in a separate function? - rust

I'm implementing a tokio/axum HTTP server. In the function where I run the server, I configure routing, add shared application services and add tracing layer.
My tracing configuration looks like this:
let tracing_layer = TraceLayer::new_for_http()
.make_span_with(|_request: &Request<Body>| {
let request_id = Uuid::new_v4().to_string();
tracing::info_span!("http-request", %request_id)
})
.on_request(|request: &Request<Body>, _span: &Span| {
tracing::info!("request: {} {}", request.method(), request.uri().path())
})
.on_response(
|response: &Response<BoxBody>, latency: Duration, _span: &Span| {
tracing::info!("response: {} {:?}", response.status(), latency)
},
)
.on_failure(
|error: ServerErrorsFailureClass, _latency: Duration, _span: &Span| {
tracing::error!("error: {}", error)
},
);
let app = Router::new()
// routes
.layer(tracing_layer)
// other layers
...
Trying to organize the code a bit I move the tracing layer configuration to a separate function. The trick is to provide a compiling return type for this function.
The first approach was to move the code as is and let an IDE generate the return type:
TraceLayer<SharedClassifier<ServerErrorsAsFailures>, fn(&Request<Body>) -> Span, fn(&Request<Body>, &Span), fn(&Response<BoxBody>, Duration, &Span), DefaultOnBodyChunk, DefaultOnEos, fn(ServerErrorsFailureClass, Duration, &Span)>
Which is completely unreadable, but the worst is it does not compile: "expected fn pointer, found closure"
In the second approach I changed fn into impl Fn that would mean a closure type. Again, I get an error that my closures are not Clone.
Third, I try to extract closures into separate functions. But then I get "expected fn pointer, found fn item".
What can I do 1) to make it compile and 2) to make it more readable?

Speaking from experience, breaking up the code like that is very hard due to all the generics. I would instead recommend functions that accept and return axum::Routers. That way you bypass all the generics:
fn add_middleware(router: Router) -> Router {
router.layer(
TraceLayer::new_for_http().make_span_with(...)
)
}

Related

Error handling for applications: how to return a public message error instead of all the chain of errors and tracing it at the same time?

PROLOGUE
I'm using async-graphql and I have hundreds of resolvers and for each resolver I would like to trace all the possible errors.
In each method of my app I'm using anyhow::{Error}.
Right now I have code similar to this for each resolver:
#[Object]
impl MutationRoot {
async fn player_create(&self, ctx: &Context<'_>, input: PlayerInput) -> Result<Player> {
let services = ctx.data_unchecked::<Services>();
let player = services
.player_create(input)
.await?;
Ok(player)
}
}
So I thought about using the below code (note the added line with .map_err()):
#[Object]
impl MutationRoot {
async fn player_create(&self, ctx: &Context<'_>, input: PlayerInput) -> Result<Player> {
let services = ctx.data_unchecked::<Services>();
let player = services
.player_create(input)
.await
.map_err(errorify)?;
Ok(player)
}
}
fn errorify(err: anyhow::Error) -> async_graphql::Error {
tracing::error!("{:?}", err);
err.into()
}
Now the error is traced along with all the error chain:
ERROR example::main:10: I'm the error number 4
Caused by:
0: I'm the error number 3
1: I'm the error number 2
2: I'm the error number 1
in example::async_graphql
QUESTION 1
Is there a way to avoid the .map_err() on each resolver?
I would like to use the ? alone.
Should I use a custom error?
Can we have a global "hook"/callback/fn to call on each error?
QUESTION 2
As you can see above the chain of the error is the inverse.
In my graphql response I'm getting as message the "I'm the error number 4" but I need to get the "I'm the error number 2" instead.
The error chain is created using anyhow like this:
main.rs: returns error with .with_context(|| "I'm the error number 4")?
call fn player_create() in graphql.rs: returns with .with_context(|| "I'm the error number 3")?
call fn new_player() in domain.rs: returns with .with_context(|| "I'm the error number 2")?
call fn save_player() in database.rs: returns with .with_context(|| "I'm the error number 1")?
How can I accomplish this?
I'm really new to Rust. I come from Golang where I was using a struct like:
type Error struct {
error error
public_message string
}
chaining it easily with:
return fmt.Errorf("this function is called but the error was: %w", previousError)
How to do it in Rust?
Do I necessarily have to use anyhow?
Can you point me to a good handling error tutorial/book for applications?
Thank you very much.
I would suggest you define your own error for your library and handle them properly by using thiserror crate.
It's like Go defining var ErrNotFound = errors.New(...) and use fmt.Errorf(..., err) to add context.
With the powerful tool enum in Rust, so you can handle every error properly by match arms. It also provides really convenient derive macro like #[from] or #[error(transparent)] to make error conversion/forwarding easy.
Edit 1:
If you want to separate public message and tracing log, you may consider defining you custom error struct like this:
#[derive(Error, Debug)]
pub struct MyError {
msg: String,
#[source] // optional if field name is `source`
source: anyhow::Error,
}
and implement Display trait for formatting its inner msg field.
Finally, you could use macro in tracing-attributes crate:
#[instrument(err(Debug))]
fn my_function(arg: usize) -> Result<(), std::io::Error> {
Ok(())
}
Is there a way to avoid the .map_err() on each resolver?
Yes, you should be able to remove it unless you really need to convert to async_graphql::Error.
Do I necessarily have to use anyhow?
No, but it makes this easier when using ? on different error types.
I'm really new to Rust. I come from Golang where I was using a struct like:
Take a look at thiserror which lets you build you own enum of error variants.

Moving context into several closures?

I have found a way to move context into several closures, but it looks ugly. I do it with help of Rc and cloning each variable I need to use for each closure. Particularly I don't like to clone every variable for every closure I want to use:
let mut context = Rc::new( Context { a : 13 } );
..
let context_clone_1 = Rc::clone( &context );
engine.on_event1( Box::new( move ||
{
println!( "on_event1 : {}", context_clone_1.a );
...
let context_clone_2 = Rc::clone( &context );
engine.on_event2( Box::new( move ||
{
println!( "on_event1 : {}", context_clone_1.a );
...
It is an extensive way to go and I feel there must be a better way to do it. Also, uncommenting line // context_clone_1.a += 1; breaks the compilation. What is the proper way of solving problems like this in Rust?
Here is a playground with minimal code.
There are two "problems" here:
Since you specifically asked about context_clone_1.a += 1;: When putting a value into an Rc, there could be multiple references to that value, derived from the independent Rc owners. If mutation was allowed, this would also allow simultaneous mutation and aliasing, which is not allowed in Rust; therefore Rc does not allow mutating its inner value. A common approach to regain mutability is to put the value into a RefCell, which provides mutability through try_borrow_mut() with a runtime check that ensures no aliasing occurs. A Rc<RefCell<T>> is commonly seen in Rust.
Regarding the use of Rc: The way your code is currently set up is actually fine, at least if that's how it should work. The way the code is currently structured allows for flexibility, including cases where multiple Context-objects provide callback implementations on different events. For example, this is currently possible:
let context1 = Context { a : 13 };
engine.on_event1(Box::new(move ||
{
println!("on_event1 : {}", context1.a );
});
let context2 = Context { a : 999 };
engine.on_event2(Box::new(move ||
{
println!("on_event1 : {}", context2.a );
});
In case you have exactly one Context (as in your example), and since the Engine needs to make sure that all callbacks are alive while it itself is alive, you'll need to put each callback - which is structured as a completely separate thing - into a Rc. In your case, all Rc end up pointing to the same object; but they don't have to and this is what your code currently allows for.
A more simple solution would be to define a trait for Context, something along the lines of
trait EventDriver {
fn event1(&mut self, &Engine);
fn event2(&mut self, &Engine);
}
... and then have Context implement the trait. The Engine-struct then becomes generic over E: EventDriver and Context becomes the E in that. This solution only allows for exactly one instance of Context to provide event callbacks. But since Engine is the owner of that object, it can be sure that all callbacks are alive while it itself is alive and the whole Rc-thing goes away.

Handling duplicate inserts into database in async rust

Beginner in both rust and async programming here.
I have a function that downloads and stores a bunch of tweets in the database:
pub async fn process_user_timeline(config: &Settings, pool: &PgPool, user_object: &Value) {
// get timeline
if let Ok((user_timeline, _)) =
get_user_timeline(config, user_object["id"].as_str().unwrap()).await
{
// store tweets
if let Some(tweets) = user_timeline["data"].as_array() {
for tweet in tweets.iter() {
store_tweet(pool, &tweet, &user_timeline, "normal")
.await
.unwrap_or_else(|e| {
println!(
">>>X>>> failed to store tweet {}: {:?}",
tweet["id"].as_str().unwrap(),
e
)
});
}
}
}
}
It's being called in an asynchronous loop by another function:
pub async fn loop_until_hit_rate_limit<'a, T, Fut>(
object_arr: &'a [T],
settings: &'a Settings,
pool: &'a PgPool,
f: impl Fn(&'a Settings, &'a PgPool, &'a T) -> Fut + Copy,
rate_limit: usize,
) where
Fut: Future,
{
let total = object_arr.len();
let capped_total = min(total, rate_limit);
let mut futs = vec![];
for (i, object) in object_arr[..capped_total].iter().enumerate() {
futs.push(async move {
println!(">>> PROCESSING {}/{}", i + 1, total);
f(settings, pool, object).await;
});
}
futures::future::join_all(futs).await;
}
Sometimes two async tasks will try to insert the same tweet at the same time, producing this error:
failed to store tweet 1398307091442409475: Database(PgDatabaseError { severity: Error, code: "23505", message: "duplicate key value violates unique constraint \"tweets_tweet_id_key\"", detail: Some("Key (tweet_id)=(1398307091442409475) already exists."), hint: None, position: None, where: None, schema: Some("public"), table: Some("tweets"), column: None, data_type: None, constraint: Some("tweets_tweet_id_key"), file: Some("nbtinsert.c"), line: Some(656), routine: Some("_bt_check_unique") })
Mind the code already checks for whether a tweet is present before inserting it, so this only happens in the following scenario: READ from task 1 > READ from task 2 > WRITE from task 1 (success) > WRITE from task 2 (error).
To solve this, my best attempt so far has been to place an unwrap_or_else() clause which lets one of the tasks fail without panicking out of the entire execution. I am aware of at least one drawback - sometimes both tasks will bail out and the tweet never gets written. It happens in <1% of cases, but it happens.
Are there other drawbacks to my approach I'm not aware of?
What's the right way to handle this? I hate losing data, and even worse doing so non-deterministically.
PS I'm using actix web and sqlx as my webserver / db libraries.
Generally for anything that may be written by multiple threads/processes, any logic like
if (!exists) {
writeValue()
}
needs to either be protected by some kind of lock, or the code needs to be changed to write atomically with the possibility the write will fail because something else already wrote to it.
For in-memory data in Rust you'd use Mutex to ensure that you can read and then write the data back before anything else reads it, or Atomic to modify the data in such a way that if something already wrote it, you can detect that.
In databases, for any query that might conflict with some other query happening around the same time, you'd want to use an ON CONFLICT clause in your query so that the database itself knows what to do when it tries to write data and it already exists.
For your case since I'm guessing the tweets are immutable, you'd likely want to do ON CONFLICT tweet_id DO NOTHING (or whatever your ID column is), in which case the INSERT will skip inserting if there is already a tweet with the ID you are inserting, and it won't throw an error.

Can't use a neon JsArray: This function takes 3 parameters but 2 were supplied

I'm learning how to use neon, but I don't understand a thing. If I try to execute this code:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::mem::Handle;
use neon::js::{JsInteger, JsNumber, JsString, JsObject, JsArray, JsValue, Object, Key};
use neon::js::error::{JsError, Kind};
fn test(call: Call) -> JsResult<JsArray> {
let scope = call.scope;
let js_arr: Handle<JsArray> = try!(try!(call.arguments.require(scope, 1)).check::<JsArray>());
js_arr.set(0, JsNumber::new(scope, 1000));
Ok(js_arr)
}
register_module!(m, {
m.export("test", test)
});
I get this error when I call js_arr.set: This function takes 3 parameters but 2 were supplied.
I don't understand why since it's a JsArray. Even Racer tells me that the set method takes 2 parameters. No matter what, js_arr.set takes 3 parameters in this order: &mut bool, neon::macro_internal::runtime::raw::Local and neon::macro_internal::runtime::raw::Local.
What's happening? I can't understand how JsArray works.
As paulsevere says on a GitHub issue for Neon, import neon::js::Object. In addition, do not import Key, which also provides a set method:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::js::{Object, JsArray, JsInteger, JsObject, JsNumber};
fn make_an_array(call: Call) -> JsResult<JsArray> {
let scope = call.scope; // the current scope for rooting handles
let array = JsArray::new(scope, 3);
array.set(0, JsInteger::new(scope, 9000))?;
array.set(1, JsObject::new(scope))?;
array.set(2, JsNumber::new(scope, 3.14159))?;
Ok(array)
}
register_module!(m, {
m.export("main", make_an_array)
});
This creates a brand new array. If you'd like to accept an array as the first argument to your function and then modify it, this works:
#[macro_use]
extern crate neon;
use neon::vm::{Call, JsResult};
use neon::js::{Object, JsArray, JsInteger, JsUndefined};
use neon::mem::Handle;
fn hello(call: Call) -> JsResult<JsUndefined> {
let scope = call.scope;
let js_arr: Handle<JsArray> = call.arguments.require(scope, 0)?.check::<JsArray>()?;
js_arr.set(0, JsInteger::new(scope, 1000))?;
Ok(JsUndefined::new())
}
register_module!(m, {
m.export("hello", hello)
});
let js_arr: Handle<JsArray> makes it clear that js_arr is a Handle<JsArray> and Handle<T> has this method:
unsafe fn set(self, out: &mut bool, obj: Local, val: Local) -> bool
I'd guess that you're accidentally trying to call Handle::set (which is unsafe and takes three non-self arguments) rather than JsArray::set (which is safe and takes two non-self arguments).
If that's the case, you need to force a deref_mut to occur. (_mut because JsArray::set takes &mut self.)
I haven't run into this sort of naming collision before, so I can't be certain whether the auto-deref is smart enough, but something like this may work:
(&mut js_arr).set(0, JsNumber::new(scope, 1000));
Failing that, two other things to try are:
JsArray::set(&mut js_arr, 0, JsNumber::new(scope, 1000));
(If the former example fails because it's too much like C++-style method overloading. This is known as Fully Qualified Syntax and is normally used to disambiguate when an object implements two traits which provide methods of the same name.)
Call js_arr.deref_mut() directly to get a mutable reference to the underlying JsArray, then call set on that.

How to pass a dynamic amount of typed arguments to a function?

Lets say I want to write a little client for an HTTP API. It has a resource that returns a list of cars:
GET /cars
It also accepts the two optional query parameters color and manufacturer, so I could query specific cars like:
GET /cars?color=black
GET /cars?manufacturer=BMW
GET /cars?color=green&manufacturer=VW
How would I expose these resources properly in Rust? Since Rust doesn't support overloading, defining multiple functions seems to be the usual approach, like:
fn get_cars() -> Cars
fn get_cars_by_color(color: Color) -> Cars
fn get_cars_by_manufacturer(manufacturer: Manufacturer) -> Cars
fn get_cars_by_manufacturer_and_color(manufacturer: Manufacturer, color: Color) -> Cars
But this will obviously not scale when you have more than a few parameters.
Another way would be to use a struct:
struct Parameters {
color: Option<Color>,
manufacturer: Option<Manufacturer>
}
fn get_cars(params: Parameters) -> Cars
This has the same scaling issue, every struct field must be set on creation (even if its value is just None).
I guess I could just accept a HashMap<String, String>, but that doesn't sound very good either.
So my question is, what is the proper/best way to do this in Rust?
You could use the Builder pattern, as mentioned here. For your particular API, it could look like this:
Cars::new_get()
.by_color("black")
.by_manufacturer("BMW")
.exec();
I would like to point out that no matter the solution, if you wish for a compile-time checked solution the "url parsing -> compile-time checkable" translation is necessarily hard-wired. You can generate that with an external script, with macros, etc... but in any case for the compiler to check it, it must exist at compile-time. There just is no short-cut.
Therefore, no matter which API you go for, at some point you will have something akin to:
fn parse_url(url: &str) -> Parameters {
let mut p: Parameters = { None, None };
if let Some(manufacturer) = extract("manufacturer", url) {
p.manufacturer = Some(Manufacturer::new(manufacturer));
}
if let Some(color) = extract("color", url) {
p.color = Some(Color::new(color));
}
p
}
And although you can try and sugarcoat it, the fundamentals won't change.

Resources