What functionality does it makes sense to implement using Rust enums? - rust

I'm having problem understanding the usefulness of Rust enums after reading The Rust Programming Language.
In section 17.3, Implementing an Object-Oriented Design Pattern, we have this paragraph:
If we were to create an alternative implementation that didn’t use the state pattern, we might instead use match expressions in the methods on Post or even in the main code that checks the state of the post and changes behavior in those places. That would mean we would have to look in several places to understand all the implications of a post being in the published state! This would only increase the more states we added: each of those match expressions would need another arm.
I agree completely. It would be very bad to use enums in this case because of the reasons outlined. Yet, using enums was my first thought of a more idiomatic implementation. Later in the same section, the book introduces the concept of encoding the state of the objects using types, via variable shadowing.
It's my understanding that Rust enums can contain complex data structures, and different variants of the same enum can contain different types.
What is a real life example of a design in which enums are the better option? I can only find fake or very simple examples in other sources.
I understand that Rust uses enums for things like Result and Option, but those are very simple uses. I was thinking of some functionality with a more complex behavior.
This turned out to be a somewhat open ended question, but I could not find a useful response after searching Google. I'm free to change this question to a more closed version if someone could be so kind as to help me rephrase it.

A fundamental trade-off between these choices in a broad sense has a name: "the expression problem". You should find plenty on Google under that name, both in general and in the context of Rust.
In the context of the question, the "problem" is to write the code in such a way that both adding a new state and adding a new operation on states does not involve modifying existing implementations.
When using a trait object, it is easy to add a state, but not an operation. To add a state, one defines a new type and implements the trait. To add an operation, naively, one adds a method to the trait but has to intrusively update the trait implementations for all states.
When using an enum for state, it is easy to add a new operation, but not a new state. To add an operation, one defines a new function. To add a new state, naively, one must intrusively modify all the existing operations to handle the new state.
If I explained this well enough, hopefully it should be clear that both will have a place. They are in a way dual to one another.
With this lens, an enum would be a better fit when the operations on the enum are expected to change more than the alternatives. For example, suppose you were trying to represent an abstract syntax tree for C++, which changes every three years. The set of types of AST nodes may not change frequently relative to the set of operations you may want to perform on AST nodes.
With that said, there are solutions to the more difficult options in both cases, but they remain somewhat more difficult. And what code must be modified may not be the primary concern.

Related

Multiple specialization, iterator patterns in Rust

Learning Rust (yay!) and I'm trying to understand the intended idiomatic programming required for certain iterator patterns, while scoring top performance. Note: not Rust's Iterator trait, just a method I've written accepting a closure and applying it to some data I'm pulling off of disk / out of memory.
I was delighted to see that Rust (+LLVM?) took an iterator I had written for sparse matrix entries, and a closure for doing sparse matrix vector multiplication, written as
iterator.map_edges({ |x, y| dst[y] += src[x] });
and inlined the closure's body in the generated code. It went quite fast. :D
If I create two of these iterators, or use the first a second time (not a correctness issue) each instance slows down quite a lot (about 2x in this case), presumably because the optimizer no longer chooses to do specialization because of the multiple call sites, and you end up doing a function call for each element.
I'm trying to understand if there are idiomatic patterns that keep the pleasant experience above (I like it, at least) without sacrificing the performance. My options seem to be (none satisfying this constraint):
Accept dodgy performance (2x slower is not fatal, but no prizes either).
Ask the user to supply a batch-oriented closure, so acting on an iterator over a small batch of data. This exposes a bit much of the internals of the iterator (the data are compressed nicely, and the user needs to know how to unwrap them, or the iterator needs to stage an unwrapped batch in memory).
Make map_edges generic in a type implementing a hypothetical EdgeMapClosure trait, and ask the user to implement such a type for each closure they want to inline. Not tested, but I would guess this exposes distinct methods to LLVM, each of which get nicely inlined. Downside is that the user has to write their own closure (packing relevant state up, etc).
Horrible hacks, like make distinct methods map_edges0, map_edges1, ... . Or add a generic parameter the programmer can use to make the methods distinct, but which is otherwise ignored.
Non-solutions include "just use for pair in iterator.iter() { /* */ }"; this is prep work for a data/task-parallel platform, and I would like to be able to capture/move these closures to work threads rather than capturing the main thread's execution. Maybe the pattern I should be using is to write the above, put it in a lambda/closure, and ship it around instead?
In a perfect world, it would be great to have a pattern which causes each occurrence of map_edges in the source file to result in different specialized methods in the binary, without forcing the entire project to be optimized at some scary level. I'm coming out of an unpleasant relationship with managed languages and JITs where generics would be the only way (I know of) to get this to happen, but Rust and LLVM seem magical enough that I thought there might be a good way. How do Rust's iterators handle this to inline their closure bodies? Or don't they (they should!)?
It seems that the problem is resolved by Rust's new approach to closures outlined at
http://smallcultfollowing.com/babysteps/blog/2014/11/26/purging-proc/
In short, Option 3 above (make functions generic with respect to a new closure type) is now transparently implemented when you make an implementation generic using the new closure traits. Rust produces the type behind the scenes for you.

How to simulate optional fields in an ADT in Haskell?

I have an algebraic data type that's being used all thoughout my program. I've realized that there's a point when I need to annotate all my structures of that type with a simple string.
I would like to not go through tons of code to account for adding a field to my commonly used type. Especially since there's no meaningful value for this annotation until quite far in my program, it seems excessive to refactor a thousand lines of code for a trivial change.
Also, as the type is pretty complex, it seems silly to just duplicate the type and make a slightly different version.
Is this just a weakness of Haskell, or am I missing the right way to handle this? I assume it's the latter, but I can't find anything akin to optional parameters to a type constructor.
I wouldn't go as far as declaring "a weakness of Haskell" - more like something that's needed to take into account when dealing with ADTs (and not specifically in Haskell).
Each program consists of "entities" and "actions". Your problem is that you need to modify an entity (an ADT in Haskell).
In an OOP language the solution will be straightforward - subclass you type. But if you'll decide to change the signature of a method in an existing class - you'll have no choice but to go through your entire codebase and deal with the damage.
In a functional language, functions are "cheap" - each function is independent from the ADT and refactoring it is often simple. Is that a weakness of OOP languages? No - it's just a tradeoff between ease of modifying existing "actions" (which is easy in functional languages) and ease of modifying "entities" (which is easy in OOP languages).
And in a practical note:
Designing an ADT is very important since changing can be very expensive.
#bheklilr's advice is priceless - follow it!

Is there a list of names you should not use in programming?

Is there a list of items on the web that you should not use when creating a model or variable?
For example, if I wanted to create apartment listings, naming a model something like Property would be problematic in the future and also confusing since property is a built-in Python function.
I did try Googling this, but couldn't come up with anything.
Thanks!
Rules and constraints about naming depend on the programming language. How an identifier/name is bound depends on the language semantics and its scoping rules: an identifer/name will be bound to different element depending on the scope. Scoping is usally lexical (i.e. static) but some language have dynamic scoping (some variant of lisp).
If names are different, there is no confusion in scoping. If identifiers/names are reused accrossed scopes, an identifier/name might mask another one. This is referred as Shadowing. This is a source of confusion.
Certain reserved names (i.e. keywords) have special meaning. Such keyword can simply be forbidden as names of other elements, or not.
For instance, in Smallatalk self is a keyword. It is still possible to declare a temporary variable self, though. In the scope where the temporary variable is visible, self resolves to the temporary variable, not the usual self that is receiver of the message.
Of course, shadowing can happen between regular names.
Scoping rules take types into consideration as well, and inheritance might introduce shadows.
Another source of confusion related to binding is Method Overloading. In statically typed languages, which method is executed depends on the static types at the call site. In certain cases, overloading makes it confusing to know which method is selected. Both Shadowing and Overloading should avoided to avoid confusions.
If your goal is to translate Python to Javascript, and vice versa, I guess you need to check the scoping rules and keywords of both languages to make sure your translation is not only syntactically correct, but also semantically correct.
Generally, programming languages have 'reserved words' or 'keywords' that you're either not able to use or in some cases are but should stay away from. For Python, you can find that list here.
Most words in most natural languages can have different meanings, according to the context. That's why we use specifiers to make the meaning of a word clear. If in any case you think that some particular identifier may be confusing, you can just add a specifier to make it clear. For example ObjectProperty has probably nothing to do with real estate, even in an application that deals with real estate.
The case you present is no different than using generic identifiers with no attached context. For example a variable named limit or length may have completely different meanings in different programs. Just use identifiers that make sense and document their meaning extensively. Being consistent within your own code base would also be preferable. Do not complicate your life with banned term lists that will never be complete and will only make programming more difficult.
The obvious exceptions are words reserved by your programming language of choice - but then again no decent compiler would allow you to use them anyway...

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropraite imports, you get a hash based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform : Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform : Node -> Node
whose output 'looks the same' as the data structure as you would get by doing:
recursiveTransform Node originalChildren annotations =
myTransform Node recursivelyTransformedChildren annotations
where
recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

What programming languages will let me manipulate the sequence of instructions in a method?

I have an upcoming project in which a core requirement will be to mutate the way a method works at runtime. Note that I'm not talking about a higher level OO concept like "shadow one method with another", although the practical effect would be similar.
The key properties I'm after are:
I must be able to modify the method in such a way that I can add new expressions, remove existing expressions, or modify any of the expressions that take place in it.
After modifying the method, subsequent calls to that method would invoke the new sequence of operations. (Or, if the language binds methods rather than evaluating every single time, provide me a way to unbind/rebind the new method.)
Ideally, I would like to manipulate the atomic units of the language (e.g., "invoke method foo on object bar") and not the assembly directly (e.g. "pop these three parameters onto the stack"). In other words, I'd like to be able to have high confidence that the operations I construct are semantically meaningful in the language. But I'll take what I can get.
If you're not sure if a candidate language meets these criteria, here's a simple litmus test:
Can you write another method called clean which:
accepts a method m as input
returns another method m2 that performs the same operations as m
such that m2 is identical to m, but doesn't contain any calls to the print-to-standard-out method in your language (puts, System.Console.WriteLn, println, etc.)?
I'd like to do some preliminary research now and figure out what the strongest candidates are. Having a large, active community is as important to me as the practicality of implementing what I want to do. I am aware that there may be some unforged territory here, since manipulating bytecode directly is not typically an operation that needs to be exposed.
What are the choices available to me? If possible, can you provide a toy example in one or more of the languages that you recommend, or point me to a recent example?
Update: The reason I'm after this is that I'd like to write a program which is capable of modifying itself at runtime in response to new information. This modification goes beyond mere parameters or configurable data, but full-fledged, evolved changes in behavior. (No, I'm not writing a virus. ;) )
Well, you could always use .NET and the Expression libraries to build up expressions. That I think is really your best bet as you can build up representations of commands in memory and there is good library support for manipulating, traversing, etc.
Well, those languages with really strong macro support (in particular Lisps) could qualify.
But are you sure you actually need to go this deeply? I don't know what you're trying to do, but I suppose you could emulate it without actually getting too deeply into metaprogramming. Say, instead of using a method and manipulating it, use a collection of functions (with some way of sharing state, e.g. an object holding state passed to each).
I would say Groovy can do this.
For example
class Foo {
void bar() {
println "foobar"
}
}
Foo.metaClass.bar = {->
prinltn "barfoo"
}
Or a specific instance of foo without effecting other instances
fooInstance.metaClass.bar = {->
println "instance barfoo"
}
Using this approach I can modify, remove or add expression from the method and Subsequent calls will use the new method. You can do quite a lot with the Groovy metaClass.
In java, many professional framework do so using the open source ASM framework.
Here is a list of all famous java apps and libs including ASM.
A few years ago BCEL was also very much used.
There are languages/environments that allows a real runtime modification - for example, Common Lisp, Smalltalk, Forth. Use one of them if you really know what you're doing. Otherwise you can simply employ an interpreter pattern for an evolving part of your code, it is possible (and trivial) with any OO or functional language.

Resources