It is hard to explain exactly what I mean by contextual go-to-implementation, so take the following example code in Rust:
struct A {}
struct B {}

impl From<A> for B {
    fn from(a: A) -> Self {
        B{}
    }
}

fn fun() -> B {
    let a = A{};
    a.into()
}
It seems useful to me to be able to place the cursor on the call to into() in the last line of fun and jump straight to the definition of from() in impl From<A> for B (to see how a, of type A, becomes a value of type B).
What really happens is that the go-to-implementation request takes me to the generic implementation of into in the standard library:
// From implies Into
#[stable(feature = "rust1", since = "1.0.0")]
#[rustc_const_unstable(feature = "const_convert", issue = "88674")]
impl<T, U> const Into<U> for T
where
    U: ~const From<T>,
{
    /// Calls `U::from(self)`.
    ///
    /// That is, this conversion is whatever the implementation of
    /// <code>[From]<T> for U</code> chooses to do.
    fn into(self) -> U { // <- I'm taken here
        U::from(self)
    }
}
And this is correct. However, the context is lost: there is no way to follow U::from(self) onward to the actual destination back in my source code, because there are many implementations of From. An LSP implementation could know exactly that, in the context of the recent jump, T = A and U = B, so:
the editor could temporarily show this context to the user as inline hints (until the context is reset),
on the next go-to-implementation request, the context could be used to know that there is exactly one implementation of From for the given context (T and U) and jump specifically to the line fn from(a: A) -> Self {.
Would an implementation of this feature require changes in the protocol definition itself?
Are there any features already in the protocol for handling generic code that could point me towards the correct approach?
Go-to-implementation functionality is described in https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_implementation
It looks like the request is passed ImplementationParams, which contains the location for which implementations are to be retrieved.
An array of LocationLinks could be used to pass the context of previous go-to-implementation requests (or, in fact, any go-to-* requests), which would allow the language server to limit the number of returned LocationLinks based on a context containing some type instantiation.
So, yes, the protocol would need extension to support this feature.
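As a rough sketch of what such extended parameters could look like, written as plain Rust structs: ContextualImplementationParams and its context field are invented here for illustration and are not part of LSP 3.17, and the field sets are trimmed down relative to the real protocol types.
use std::string::String;

// Trimmed-down stand-ins for the corresponding LSP structures.
struct Position {
    line: u32,
    character: u32,
}

struct LocationLink {
    target_uri: String,
    target_selection_start: Position, // full ranges elided for brevity
}

// Hypothetical extension of ImplementationParams: the client echoes back the
// LocationLinks of its previous go-to-* jumps so the server can recover the
// type instantiation (here T = A, U = B) in effect at the current location.
struct ContextualImplementationParams {
    text_document_uri: String,
    position: Position,
    // Stack of previous go-to-* jumps, most recent last.
    context: Vec<LocationLink>,
}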
There is also another possibility. The go-to-* context stack could be maintained in the server and each go-to-* request would push a new context onto the stack. Analysis of this stack would allow the server to contextually limit the list of proposed go-to-* locations, but it would also allow it to generate code hints/diagnostics differently, depending on the go-to context stack, including a new type of hint/hover for contextual type concretization.
Up to this point, this requires no modifications to the protocol. In practice, though, at least methods to pop and clear the stack would be necessary.
Backwards compatibility can be achieved by defining new server and client capabilities.
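A very rough sketch of the server-side idea follows; all names here are invented for illustration, and nothing of this exists in the protocol today.
// Hypothetical server-side context stack; every go-to-* request pushes the
// type instantiation in effect at the jump (e.g. T = A, U = B).
struct GoToContext {
    substitutions: Vec<(String, String)>, // e.g. [("T", "A"), ("U", "B")]
}

#[derive(Default)]
struct GoToContextStack {
    stack: Vec<GoToContext>,
}

impl GoToContextStack {
    fn push(&mut self, ctx: GoToContext) {
        self.stack.push(ctx);
    }
    // pop/clear are the operations that would need new protocol methods,
    // e.g. notifications sent when the user resets the navigation context.
    fn pop(&mut self) -> Option<GoToContext> {
        self.stack.pop()
    }
    fn clear(&mut self) {
        self.stack.clear();
    }
}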
How can I downcast a trait to a struct like in this C# example?
I have a base trait and several derived structs that must be pushed into a single vector of base traits.
I have to check if each item of the vector is castable to a specific derived struct and, if so, use it as a struct of that type.
This is my Rust code; I don't know how to implement the commented part:
trait Base {
    fn base_method(&self);
}

struct Derived1;

impl Derived1 {
    pub fn derived1_method(&self) {
        println!("Derived1");
    }
}

impl Base for Derived1 {
    fn base_method(&self) {
        println!("Base Derived1");
    }
}

struct Derived2;

impl Derived2 {
    pub fn derived2_method(&self) {
        println!("Derived2");
    }
}

impl Base for Derived2 {
    fn base_method(&self) {
        println!("Base Derived2");
    }
}

fn main() {
    let mut bases = Vec::<Box<dyn Base>>::new();
    let der1 = Derived1{};
    let der2 = Derived2{};
    bases.push(Box::new(der1));
    bases.push(Box::new(der2));
    for b in bases {
        b.base_method();
        //if (b is Derived1)
        //    (b as Derived1).derived1_method();
        //else if (b is Derived2)
        //    (b as Derived2).derived2_method();
    }
}
Technically you can use as_any, as explained in this answer:
How to get a reference to a concrete type from a trait object?
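For completeness, here is a minimal sketch of that pattern applied to the code above: an as_any method is added to Base, and std::any::Any provides downcast_ref. Only Derived1 is shown; Derived2 would be analogous.
use std::any::Any;

trait Base {
    fn base_method(&self);
    // Escape hatch: expose the concrete type behind the trait object.
    fn as_any(&self) -> &dyn Any;
}

struct Derived1;

impl Derived1 {
    pub fn derived1_method(&self) {
        println!("Derived1");
    }
}

impl Base for Derived1 {
    fn base_method(&self) {
        println!("Base Derived1");
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let bases: Vec<Box<dyn Base>> = vec![Box::new(Derived1)];
    for b in &bases {
        b.base_method();
        // downcast_ref::<T>() returns Some only if the erased type is T.
        if let Some(d) = b.as_any().downcast_ref::<Derived1>() {
            d.derived1_method();
        }
    }
}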
However, type-checking and downcasting when looping over a vector of trait objects is considered a code smell. If you put a bunch of objects into a vector and then loop over that vector, presumably the objects in that vector are supposed to play a similar role.
So then you should refactor your code such that you can call the same method on your object regardless of the underlying concrete type.
From your code, it seems you're purely checking the type (and downcasting) so that you can call the appropriate method. What you really should do, then, is introduce yet another trait that provides a unified interface that you then can call from your loop, so that the loop doesn't need to know the concrete type at all.
EDIT: Allow me to add a concrete example that highlights this. I'm going to use Python, because in Python it's very easy to do what you are asking, so we can focus on why it's not the best design choice:
class Dog:
    def bark(self):
        print("Woof woof")

class Cat:
    def meow(self):
        print("Meow meow")

list_of_animals = [Dog(), Dog(), Cat(), Dog()]

for animal in list_of_animals:
    if isinstance(animal, Dog):
        animal.bark()
    elif isinstance(animal, Cat):
        animal.meow()
Here Python's dynamic typing allows us to just slap all the objects into the same list, iterate over it, and then figure out the type at runtime so we can call the right method.
But really the whole point of well-designed object oriented code is to lift that burden from the caller to the object. This type of design is very inflexible, because as soon as you add another animal, you'll have to add another branch to your if block, and you better do that everywhere you had that branching.
The solution is of course to identify the common role that both bark and meow play, and abstract that behavior into an interface. In Python of course we don't need to formally declare such an interface, so we can just slap another method in:
class Dog:
    ...
    def make_sound(self):
        self.bark()

class Cat:
    ...
    def make_sound(self):
        self.meow()

...

for animal in list_of_animals:
    animal.make_sound()
In your Rust code, you actually have two options, and which one is right depends on the rest of your design. Either, as I suggested, add another trait that expresses the common role the objects play (why put them into a vector otherwise in the first place?) and implement it for all your derived structs; a sketch of this follows below. Or express all the various derived structs as variants of the same enum, and add a method to the enum that handles the dispatch. The enum is more closed to outside extension than the trait version, so the right solution depends on your needs.
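To make that concrete, here is a minimal sketch of the trait option applied to the question's code; specific_method is an invented name standing in for derived1_method/derived2_method:
trait Base {
    fn base_method(&self);
    // The unified interface: each concrete type decides what its specific
    // behavior is, so the loop never needs to downcast.
    fn specific_method(&self);
}

struct Derived1;
impl Base for Derived1 {
    fn base_method(&self) { println!("Base Derived1"); }
    fn specific_method(&self) { println!("Derived1"); }
}

struct Derived2;
impl Base for Derived2 {
    fn base_method(&self) { println!("Base Derived2"); }
    fn specific_method(&self) { println!("Derived2"); }
}

fn main() {
    let bases: Vec<Box<dyn Base>> = vec![Box::new(Derived1), Box::new(Derived2)];
    for b in &bases {
        b.base_method();
        b.specific_method(); // no type check, no downcast
    }
}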
I want to create a function that provides a two step write and commit, like so:
// Omitting locking for brevity
struct States {
    committed_state: u64,
    // By reference is just a placeholder - I don't know how to do this
    pending_states: HashSet<i64>,
}

impl States {
    fn read_dirty(&self) -> u64 {
        // Sum committed state and all non-committed states
        self.committed_state + self.pending_states.iter().sum::<i64>() as u64
    }

    fn read_committed(&self) -> u64 {
        self.committed_state
    }
}
let state_container = States::default();

async fn update_state(state_container: States, new_state: i64) -> impl Future {
    // This is just pseudo code missing locking and such
    // I'd like to add a reference to new_state
    state_container.pending_states.insert(new_state);
    async move {
        // I would like to defer the commit:
        // add the state to the committed state
        state_container.committed_state += new_state;
        // Then remove it *by reference* from the pending states
        state_container.pending_states.remove(new_state)
    }
}
I'd like to be in a situation where I can call it like so
let commit_handler = update_state(state_container, 3).await;
// Do some external transactional stuff
third_party_transactional_service(...)?;
// Commit if the above line does not error
commit_handler.await;
The problem I have is that HashMaps and HashSets hash values based on their value, not their actual reference, so I can't remove them by reference.
I appreciate this is a bit of a long question, but I'm just trying to give a bit more context as to what I'm trying to do. I know that in a typical database you'd generally have an atomic counter to generate the transaction ID, but that feels a bit overkill when the pointer reference would be enough.
However, I don't want to get the pointer value using unsafe, because reaching for unsafe for something this simple seems off.
Values in Rust don't have an identity like they do in other languages. You need to ascribe an identity to them somehow. You've hit on two ways to do this in your question: an ID contained within the value, or the address of the value as a pointer.
Option 1: An ID contained in the value
It's trivial to have a usize ID with a static AtomicUsize (atomics have interior mutability).
use std::sync::atomic::{AtomicUsize, Ordering};

// No impl of Clone/Copy as we want these IDs to be unique.
#[derive(Debug, Hash, PartialEq, Eq)]
#[repr(transparent)]
pub struct OpaqueIdentifier(usize);

impl OpaqueIdentifier {
    pub fn new() -> Self {
        static COUNTER: AtomicUsize = AtomicUsize::new(0);
        Self(COUNTER.fetch_add(1, Ordering::Relaxed))
    }

    pub fn id(&self) -> usize {
        self.0
    }
}
Now your map key becomes usize, and you're done.
Having this be a separate type that doesn't implement Copy or Clone gives you a concept of an "owned unique ID": every type holding one of these IDs is forced not to be Copy, and a Clone impl would require obtaining a new ID.
(You can use a different integer type than usize. I chose it semi-arbitrarily.)
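For example, here is a minimal sketch of how the map in the question could key pending states by such an ID; PendingState and the method names are illustrative, not part of the question's code:
use std::collections::HashMap;

struct PendingState {
    _id: OpaqueIdentifier, // owning the ID ties the entry's identity to it
    value: i64,
}

struct States {
    committed_state: u64,
    pending_states: HashMap<usize, PendingState>,
}

impl States {
    fn insert_pending(&mut self, value: i64) -> usize {
        let id = OpaqueIdentifier::new();
        let key = id.id();
        self.pending_states.insert(key, PendingState { _id: id, value });
        key // hand the key to the commit handler so it can remove the entry
    }

    fn commit(&mut self, key: usize) {
        if let Some(p) = self.pending_states.remove(&key) {
            self.committed_state += p.value as u64;
        }
    }
}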
Option 2: A pointer to the value
This is more challenging in Rust since values in Rust are movable by default. In order for this approach to be viable, you have to remove this capability by pinning.
To make this work, both of the following must be true:
You pin the value you're using to provide identity, and
The pinned value is !Unpin (otherwise pinning still allows moves!), which can be forced by adding a PhantomPinned member to the value's type.
Note that the pin contract is only upheld if the object remains pinned for its entire lifetime. To enforce this, your factory for such objects should only dispense pinned boxes.
This could complicate your API as you cannot obtain a mutable reference to a pinned value without unsafe. The pin documentation has examples of how to do this properly.
Assuming that you have done all of this, you can then use *const T as the key in your map (where T is the pinned type). Note that conversion to a pointer is safe -- it's conversion back to a reference that isn't. So you can just use some_pin_box.get_ref() as *const _ to obtain the pointer you'll use for lookup.
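Here is a minimal sketch of that approach, assuming the identity-providing type is only ever handed out as a pinned box:
use std::collections::HashSet;
use std::marker::PhantomPinned;
use std::pin::Pin;

// PhantomPinned makes State !Unpin, so once pinned the value can never be
// moved again and its address is stable.
struct State {
    value: u64,
    _pin: PhantomPinned,
}

fn main() {
    // Only ever hand out pinned boxes, so the pin contract holds for the
    // value's entire lifetime.
    let state: Pin<Box<State>> = Box::pin(State { value: 3, _pin: PhantomPinned });

    // Converting the pinned reference to a raw pointer is safe; only going
    // back from pointer to reference would need unsafe.
    let key = state.as_ref().get_ref() as *const State;

    let mut pending: HashSet<*const State> = HashSet::new();
    pending.insert(key);
    println!("tracking state with value {}", state.value);
    assert!(pending.remove(&key));
}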
The pinned box approach comes with pretty significant drawbacks:
All values being used to provide identity have to be allocated on the heap (unless using local pinning, which is unlikely to be ergonomic -- the pin! macro making this simpler is experimental).
The implementation of the type providing identity has to accept self as Pin<&Self> or Pin<&mut Self>, requiring unsafe code to mutate the contents.
In my opinion, it's not even a good semantic fit for the problem. "Location in memory" and "identity" are different things, and it's only kind of by accident that the former can sometimes be used to implement the latter. It's a bit silly that moving a value in memory would change its identity, no?
I'd just go with adding an ID to the value. This is a substantially more obvious pattern, and it has no serious drawbacks.
In this question, an issue arose that could be solved by changing an attempt at using a generic type parameter into an associated type. That prompted the question "Why is an associated type more appropriate here?", which made me want to know more.
The RFC that introduced associated types says:
This RFC clarifies trait matching by:
Treating all trait type parameters as input types, and
Providing associated types, which are output types.
The RFC uses a graph structure as a motivating example, and this is also used in the documentation, but I'll admit to not fully appreciating the benefits of the associated type version over the type-parameterized version. The primary thing is that the distance method doesn't need to care about the Edge type. This is nice but seems a bit shallow of a reason for having associated types at all.
I've found associated types to be pretty intuitive to use in practice, but I find myself struggling when deciding where and when I should use them in my own API.
When writing code, when should I choose an associated type over a generic type parameter, and when should I do the opposite?
This is now touched on in the second edition of The Rust Programming Language. However, let's dive in a bit more.
Let us start with a simpler example.
When is it appropriate to use a trait method?
There are multiple ways to provide late binding:
trait MyTrait {
    fn hello_world(&self) -> String;
}
Or:
struct MyTrait<T> {
    t: T,
    hello_world: fn(&T) -> String,
}

impl<T> MyTrait<T> {
    fn new(t: T, hello_world: fn(&T) -> String) -> MyTrait<T> {
        MyTrait { t, hello_world }
    }

    fn hello_world(&self) -> String {
        (self.hello_world)(&self.t)
    }
}
Disregarding any implementation/performance strategy, both excerpts above allow the user to specify in a dynamic manner how hello_world should behave.
The one difference (semantically) is that the trait implementation guarantees that, for a given type T implementing the trait, hello_world will always have the same behavior, whereas the struct implementation allows a different behavior on a per-instance basis.
Whether using a trait method is appropriate or not depends on the use case!
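A quick usage sketch of the struct form, using the definitions above, shows the per-instance flexibility; polite and casual are illustrative names:
fn polite(_: &()) -> String { "Good day!".to_string() }
fn casual(_: &()) -> String { "Hey!".to_string() }

fn main() {
    // Same type, different per-instance behavior:
    let a = MyTrait::new((), polite);
    let b = MyTrait::new((), casual);
    assert_ne!(a.hello_world(), b.hello_world());
}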
When is it appropriate to use an associated type?
Similarly to the trait methods above, an associated type is a form of late binding (though it occurs at compilation), allowing the user of the trait to specify for a given instance which type to substitute. It is not the only way (thus the question):
trait MyTrait {
    type Return;
    fn hello_world(&self) -> Self::Return;
}
Or:
trait MyTrait<Return> {
    fn hello_world(&self) -> Return;
}
Both are equivalent to the late binding of methods above:
the first one enforces that for a given Self there is a single Return associated;
the second one, instead, allows implementing MyTrait for Self for multiple Return types (see the sketch below).
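To make the difference concrete, here is a minimal sketch; WithParam and WithAssoc are illustrative names:
trait WithParam<Return> {
    fn hello_world(&self) -> Return;
}

struct Thing;

// With a generic parameter, the same Self may implement the trait several
// times, once per Return type:
impl WithParam<i32> for Thing {
    fn hello_world(&self) -> i32 { 0 }
}

impl WithParam<String> for Thing {
    fn hello_world(&self) -> String { String::new() }
}

trait WithAssoc {
    type Return;
    fn hello_world(&self) -> Self::Return;
}

// With an associated type, a second `impl WithAssoc for Thing` with a
// different Return would be rejected as a conflicting implementation:
impl WithAssoc for Thing {
    type Return = i32;
    fn hello_world(&self) -> i32 { 0 }
}

fn main() {
    let t = Thing;
    let a = <Thing as WithParam<i32>>::hello_world(&t);
    let b = <Thing as WithParam<String>>::hello_world(&t);
    let c = <Thing as WithAssoc>::hello_world(&t); // Return is fixed to i32
    let _ = (a, b, c);
}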
Which form is more appropriate depends on whether it makes sense to enforce unicity or not. For example:
Deref uses an associated type because without unicity the compiler would go mad during inference
Add uses an associated type because its author thought that given the two arguments there would be a logical return type
As you can see, while Deref is an obvious use case (technical constraint), the case of Add is less clear-cut: maybe it would make sense for i32 + i32 to yield either i32 or Complex<i32> depending on the context? Nonetheless, the author exercised their judgment and decided that overloading the return type for additions was unnecessary.
My personal stance is that there is no right answer. Still, beyond the unicity argument, I would mention that associated types make the trait easier to use, since they decrease the number of parameters that have to be specified; so when the benefits of the flexibility of a regular trait parameter are not obvious, I suggest starting with an associated type.
Associated types are a grouping mechanism, so they should be used when it makes sense to group types together.
The Graph trait introduced in the documentation is an example of this. You want a Graph to be generic, but once you have a specific kind of Graph, you don't want the Node or Edge types to vary anymore. A particular Graph isn't going to want to vary those types within a single implementation, and in fact, wants them to always be the same. They're grouped together, or one might even say associated.
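Roughly, the example from the documentation looks like this (trimmed down); the point is that distance only ever has to name the node type, and the edge type comes along for free:
trait Graph {
    type N;
    type E;
    fn has_edge(&self, a: &Self::N, b: &Self::N) -> bool;
}

// With type parameters this would have to be written as
// fn distance<N, E, G: Graph<N, E>>(...), dragging E along for no reason.
fn distance<G: Graph>(_graph: &G, _start: &G::N, _end: &G::N) -> u32 {
    0 // actual shortest-path logic elided
}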
Associated types can be used to tell the compiler "these two types between these two implementations are the same". Here's a double dispatch example that compiles, and is close to how the standard library relates iterators to sums:
trait MySum {
    type Item;
    fn sum<I>(iter: I)
    where
        I: MyIter<Item = Self::Item>;
}

trait MyIter {
    type Item;
    fn next(&self) {}
    fn sum<S>(self)
    where
        S: MySum<Item = Self::Item>;
}

struct MyU32;

impl MySum for MyU32 {
    type Item = MyU32;
    fn sum<I>(iter: I)
    where
        I: MyIter<Item = Self::Item>,
    {
        iter.next()
    }
}

struct MyVec;

impl MyIter for MyVec {
    type Item = MyU32;
    fn sum<S>(self)
    where
        S: MySum<Item = Self::Item>,
    {
        S::sum::<Self>(self)
    }
}

fn main() {}
fn main() {}
The blog post https://blog.thomasheartman.com/posts/on-generics-and-associated-types also has some good information on this:
In short, use generics when you want a type A to be able to implement a trait any number of times for different type parameters, such as in the case of the From trait.
Use associated types if it makes sense for a type to only implement the trait once, such as with Iterator and Deref.
The main goal is to implement a computation graph that handles nodes with values and nodes with operators (think of simple arithmetic operators like add, subtract, multiply, etc.). An operator node can take up to two value nodes and "produces" a resulting value node.
Up to now, I'm using an enum to differentiate between a value and operator node:
pub enum Node<'a, T> where T: Copy + Clone {
    Value(ValueNode<'a, T>),
    Operator(OperatorNode),
}

pub struct ValueNode<'a, T> {
    id: usize,
    value_object: &'a dyn ValueType<T>,
}
Update: Node::Value contains a struct, which itself contains a reference to a trait object ValueType, which is being implemented by a variety of concrete types.
But here comes the problem. During compilation, the generic types will be replaced by the actual types (monomorphization). The generic type T is also propagated throughout the computation graph (obviously):
pub struct ComputationGraph<T> where T: Copy + Clone {
    nodes: Vec<Node<T>>,
}
This actually restricts the usage of ComputeGraph to one specific ValueType.
Furthermore, the generic type T cannot be Sized, since a value node can be an opaque type handled by a different backend not available through Rust (think of C opaque types made available through FFI).
One possible solution to this problem would be to introduce an additional enum type that "mirrors" the concrete implementations of the ValueType trait mentioned above. This approach would be similar to what enum dispatch does.
Is there anything I haven't thought of to use multiple implementations of ValueType?
Update:
What I want to achieve is the following code:
pub struct Scalar<T> where T: Copy + Clone {
    data: T,
}

fn main() {
    let cg = ComputeGraph::new();
    // a new scalar type. doesn't have to be a tuple struct
    let a = Scalar::new::<f32>(1.0);
    let b_size = 32;
    let b = Container::new::<opaque_type>(b_size);
    let op = OperatorAdd::new();
    // cg.insert_operator_node constructs four nodes: three value nodes
    // and one operator node internally.
    let result = cg.insert_operator_node::<Container>(&op, &a, &b);
}
Update:
ValueType<T> looks like this:
pub trait ValueType<T> {
    fn get_size(&self) -> usize;
    fn get_value(&self) -> T;
}
Update:
To further clarify my question, think of a small BLAS library backed by OpenCL. The memory management and device interaction shall be transparent to the user. A Matrix type allocates space on an OpenCL device as a primitive-type buffer, and the appropriate call returns a pointer to that specific region of memory. Think of an operation that scales the matrix by a scalar, which is represented by a primitive value. Both the (pointer to the) buffer and the scalar can be passed to a kernel function. Going back to the ComputeGraph, it seems obvious that all BLAS operations form some kind of computational graph, which can be reduced to a linear list of instructions (think of setting kernel arguments, allocating buffers, enqueueing the kernel, storing the result, etc.). Having said all that, a computation graph needs to be able to store value nodes with a variety of types.
As so often, the answer to the problem posed in the question is obvious in hindsight: the graph expects one generic type (with trait bounds), and using an enum to "cluster" the various subtypes is the solution, as already sketched out in the question.
An example to illustrate the solution. Consider the following "subtypes":
struct Buffer<T> {
    // fields
}

struct Scalar<T> {
    // fields
}

struct Kernel {
    // fields
}
The value-containing types can be packed into an enum:
enum MemType {
    Buffer(Buffer<f32>),
    Scalar(Scalar<f32>),
    // more enum variants ...
}
MemType and Kernel can now be packed into an enum as well:
enum Node {
    Value(MemType),
    Operator(Kernel),
}
Node can now be used as the main type for nodes/vertices inside the graph. The solution might not be very elegant, but it does the trick for now. Maybe some code restructuring might be done in the future.
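For illustration, a minimal sketch of how the enum-typed nodes could be stored and created inside the graph; ComputeGraph and the method names here are invented, not taken from the question:
struct ComputeGraph {
    nodes: Vec<Node>,
}

impl ComputeGraph {
    fn new() -> Self {
        ComputeGraph { nodes: Vec::new() }
    }

    // Returns the index of the inserted node, usable as a node id.
    fn insert_value(&mut self, value: MemType) -> usize {
        self.nodes.push(Node::Value(value));
        self.nodes.len() - 1
    }

    fn insert_operator(&mut self, kernel: Kernel) -> usize {
        self.nodes.push(Node::Operator(kernel));
        self.nodes.len() - 1
    }
}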
I have a lint that warns on x.len() == 0, suggesting to use x.is_empty() instead. However, I wanted to get rid of the false positives if x has no is_empty(self: &Self) method.
Thus began the quest to look up methods from within rustc.
First step, get the x: I matched the node of an Expr to ExprMethodCall(ref method, _, ref args) (and made sure that args.len() == 1 and method.node.as_str() == "len") and just used &*args[0], which I will call expr from now on.
Next step, get the type of x: This can easily be done using rustc::middle::ty::expr_ty(cx.tcx, expr). Note that this is a rustc::middle::ty::Ty (and not a syntax::ast::Ty, which led to some confusion).
To look up the methods, ctxt.impl_items and ctxt.trait_item_def_ids looked promising, so I got the DefId for my type with rustc::middle::ty::ty_to_def_id(ty) and tried to get the ids. However, this approach has a few problems:
For
let x = [1, 2];
x.len() == 2 // <- lookee here
I simply have no DefId. That is ok though, because we have a ty_vec in that case, and std::vec::Vec is known to have both len() and is_empty().
The good news is that ctxt.trait_item_def_ids has a suitable entry for a trait with an is_empty method. Alas, for the following example:
struct One;

impl One {
    fn is_empty(self: &Self) -> bool { false }
}
I got no TraitOrItemId for any impl item, which is a bit unfortunate. Can someone privy to rustc help me find my impl items?
I got it! The problem was that I was trying to get a DefId for the type, not for the impl. Going through cx.tcx.inherent_impls.get(id) gave me a vec of DefIds for inherent impls, which I could then query via the impl_items lookup I had already implemented.
Look in rust-clippy/src/len_zero.rs for an example implementation. Edit: Note that the implementation is O(N) where N is the number of methods of the type (either direct impl or by traits) – perhaps rustc will someday allow a faster lookup...