Why does NPM's policy of duplicated dependencies work? - node.js

By default, when I use NPM to manage a package depending on foo and bar, both of which depend on corelib, by default, NPM will install corelib twice (once for foo, and once for bar). They might even be different versions.
Now, let's suppose that corelib defined some data structure (e.g. a URL object) which is passed between foo, bar and the main application. Now, what I would expect, is if there was ever a backwards incompatible change to this object (e.g. one of the field names changed), and foo depended on corelib-1.0 and bar depended on corelib-2.0, I'd be a very sad panda: bar's version of corelib-2.0 might see a data structure created by the old version of corelib-1.0 and things would not work very well.
I was really surprised to discover that this situation basically never happens (I trawled Google, Stack Overflow, etc, looking for examples of people whose applications had stopped working, but who could have fixed it by running dedupe.) So my question is, why is this the case? Is it because node.js libraries never define data structures that are shared outside of the programmers? Is it because node.js developers never break backwards compatibility of their data structures? I'd really like to know!

this situation basically never happens
Yes, my experience is indeed that that is not a problem in the Node/JS ecosystem. And I think it is, in part, thanks to the robustness principle.
Below is my view on why and how.
Primitives, the early days
I think the first and foremost reason is that the language provides a common basis for primitive types (Number, String, Bool, Null, Undefined) and some basic compound types (Object, Array, RegExp, etc...).
So if I receive a String from one of the libs' APIs I use, and pass it to another, it cannot go wrong because there is just a single String type.
This is what used to happen, and still happens to some extent to this day: Library authors try to rely on the built-ins as much as possible and only diverge when there is sufficient reason to, and with sufficient care and thought.
Not so in Haskell. Before I started using stack, I've run into the following situation quite a few times with Text and ByteString:
Couldn't match type ‘T.Text’
with ‘Text’
NB: ‘T.Text’
is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.1’
‘Text’ is defined in ‘Data.Text.Internal’ in package ‘text-1.2.2.0’
Expected type: String -> Text
Actual type: String -> T.Text
This is quite frustrating, because in the above example only the patch version is different. The two data types may only be different nominally, and the ADT definition and the underlying memory representation may be completely identical.
As an example, it could have been a minor bugfix to the intersperse function that warranted the release of 1.2.2.1. Which is completely irrelevant to me if all I care about, in this hypothetical example, is concatenating some Texts and comparing their lengths.
Compound types, objects
Sometimes there is sufficient reason to diverge in JS from the built in data types: Take Promises as an example. It's such a useful abstraction over async computations compared to callbacks that many APIs started using them. What now? How come we don't run into many incompatibilities when different versions of these {then(), fail(), ...} objects are being passed up, down and around the dependency tree?
I think it's thanks to the robustness principle.
Be conservative in what you send, be liberal in what you accept.
So if I am authoring a JS library which I know returns promises and takes promises as part of its API, I'll be very careful how I interact with the received objects. E.g. I won't be calling fancy .success(), .finally(), ['catch']() methods on it, since I want to be as compatible as possible with different users, with different implementations of Promises. So, very conservatively, I may just use .then(done, fail), and nothing more. At this point, it doesn't matter if the user uses the promises that my lib returns, or Bluebirds' or even if they hand-write their own, so long as those adhere to the most basic Promise 'laws' -- the most basic API contracts.
Can this still lead to breakage at runtime? Yes, it can. If even the most basic API contract is not fulfilled, you may get an exception saying "Uncaught TypeError: promise.then is not a function". I think the trick here is that library authors are explicit about what their API needs: e.g. a .then method on the supplied object. And then its up to whoever is building on top of that API to make it damn sure that that method is available on the object they pass in.
I'd like to also point out here that this is also the case for Haskell, isn't it? Should I be so foolish as to write an instance for a typeclass that still type-checks without following its laws, I'll get runtime errors, won't I?
Where do we go from here?
Having thought through all this just now, I think we might be able to have the benefits of the robustness principle even in Haskell with much less (or even no(?)) risk for runtime exceptions/errors compared to JavaScript: We just need the typesystem be granular enough so it can distinguish what we want to do with the data we manipulate, and determine if that is still safe or not. E.g. The hypothetical Text example above, I would wager is still safe. And the compiler should only complain if I'm trying to use intersperse, and asks me to qualify it. E.g. with T.intersperse so it can be sure which one I want to use.
How do we do this in practice? Do we need extra support, e.g. language extension flags from GHC? We might not.
Just recently I found bookkeeper, which is a compile-time type-checked anonymous records implementation.
Please note: The following is conjecture on my part, I haven't taken much time to try and experiment with Bookkeeper. But I intend to in my Haskell projects to see if what I write about below could really be achieved with an approach such as this.
With Bookkeeper I could define an API like so:
emptyBook & #then =: id & #fail =: const
:: Bookkeeper.Internal.Book'
'["fail" 'Data.Type.Map.:-> (a -> b -> a),
"then" 'Data.Type.Map.:-> (a1 -> a1)]
Since functions are also first-class values. And whichever API takes this Book as an argument can be very specific what it demands from it: Namely the #then function, and that it has to match a certain type signature. And it cares not for any other function that may or may not be present with whatever signature. All this checked at compile time.
Prelude Bookkeeper
> let f o = (o ?: #foo) "a" "b" in f $ emptyBook & #foo =: (++)
"ab"
Conclusion
Maybe Bookkeeper or something similar will turn out to be useful in my experiments. Maybe Backpack will rush to the rescue with its common interface definitions. Or some other solution comes along. But either way, I hope we can move towards being able to take advantage of the robustness principle. And that Haskell's dependency management can also "just work" most of the time and fail with type errors only when it is truly warranted.
Does the above make sense? Anything unclear? Does it answer your question? I'd be curious to hear.
Further possibly relevant discussion may be found in this /r/haskell reddit thread, where this topic came up just not long ago, and I thought to post this answer to both places.

If I understand well, the supposed problem might be :
Module A
exports = require("c") //v0.1
Module B
console.log(require("a"))
console.log(require("c")) //v0.2
Module C
V0.1
exports = "hello";
V0.2
exports = "world";
By copying C_0.2 in node_modules and C0.1 in node_modules/a/node_modules and creating dummy packages.json, I think I created the case you're talking about.
will B have 2 different conflicting versions of C_data ?
Short answer :
it does. So node does not handle conflicting versions.
The reason you don't see it on the internet is as gustavohenke explained that node naturally does not encourage you to pollute the global scope or chain pass structures between modules.
In other words, it's not often that you'll see a module export another module's structure.

I don't have first-hand experience with this kind of situation in a large JS program, but I would guess that it has to do with the OO style of bundling data together with the functions that act on that data into a single object. Effectively the "ABI" of an object is to pull public methods by name out of a dictionary, and then invoke them by passing the object as the first argument. (Or perhaps the dictionary contains closures that are already partially applied to the object itself; it doesn't really matter.)
In Haskell we do encapsulation at a module level. For example, take a module that defines a type T and a bunch of functions, and exports the type constructor T (but not its definition) and some of the functions. The normal way to use such a module (and the only way that the type system will permit) is to use one exported function create to create a value of type T, and another exported function consume to consume the value of type T: consume (create a b c) x y z.
If I had two different versions of the module with different definitions of T and I was able to use the create from version 1 together with the consume from version 2 then I'd likely get a crash or wrong answer. Note that this is possible even if the public API and externally observable behavior of the two versions is identical; perhaps version 2 has a different representation of T that allows for a more efficient implementation of consume. Of course, GHC's type system stops you from doing this, but there are no such safeguards in a dynamic language.
You can translate this style of programming directly into a language like JavaScript or Python:
import M
result = M.consume(M.create(a, b, c), x, y, z)
and it would have exactly the same kind of problem that you are talking about.
However, it's far more common to use the OO style:
import M
result = M.create(a, b, c).consume(x, y, z)
Note that only create is imported from the module. consume is in a sense imported from the object we got back from create. In your foo/bar/corelib example, let's say that foo (which depends on corelib-1.0) calls create and passes the result to bar (which depends on corelib-2.0) which will call consume on it. Actually, while foo needs a dependency on corelib to call create, bar does not need a dependency on corelib to call consume at all. It's only using the base language notions to invoke consume (what we could spell getattr in Python). In this situation, bar will end up invoking the version of consume from corelib-1.0 regardless of what version of corelib bar "depends on".
Of course for this to work the public API of corelib must not have changed too much between corelib-1.0 and corelib-2.0. If bar wants to use a method fancyconsume which is new in corelib-2.0 then it won't be present on an object created by corelib-1.0. Still, this situation is much better than we had in original Haskell version, where even changes that do not affect the public API at all can cause breakage. And perhaps bar depends on corelib-2.0 features for the objects it creates and consumes itself, but only uses the API of corelib-1.0 to consume objects it receives externally.
To achieve something similar in Haskell, you could use this translation. Rather than directly using the underlying implementation
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
we wrap up the consumer interface with an existential in an API package corelib-api:
module TInterface where
data T = forall a. T { impl :: a,
_consume :: a -> X -> Y -> Z -> R,
... } -- Or use a type class if preferred.
consume :: T -> X -> Y -> Z -> R
consume t = (_consume t) (impl t)
and then the implementation in a separate package corelib:
module T where
import TInterface
data TImpl = TImpl ... -- private
create_ :: A -> B -> C -> TImpl
consume_ :: TImpl -> X -> Y -> Z -> R
...
create :: A -> B -> C -> T
create a b c = T { impl = create_ a b c,
_consume = consume_ }
Now foo uses corelib-1.0 to call create, but bar only needs corelib-api to call consume. The type T lives in corelib-api, so if the public API version does not change, then foo and bar can interoperate even if bar is linked against a different version of corelib.
(I know Backpack has a lot to say about this kind of thing; I'm offering this translation as a way to explain what is happening in the OO programs, not as a style one should seriously adopt.)

Here is a question that mostly answers the same thing: https://stackoverflow.com/a/15948590/2083599
Node.js modules don't pollute the global scope, so when they're required, they'll be private to the module that required them - and this is a great functionality.
When 2 or more packages require different versions of the same lib, NPM will install them for each package, so no conflicts will ever happen.
When they don't, NPM will install only once that lib.
In the other hand, Bower, which is a package manager for the browser, does install only flat dependencies because the libs will go to the global scope, so you can't install jquery 1.x.x and 2.x.x. They'll only export the same jQuery and $ vars.
About the backwards compatibility problems:
All developers do break backwards compatibility at least once! The only difference between Node developers and developers of other platforms is that we have been teached to always use semver.
Considering that most packages out there have not reached v2.0.0 yet, I believe that they have kept the same API in the switch from v0.x.x to v1.0.0.

Related

What are module signatures in Haskell?

I have recently found Haskell's feature called "module signatures". As I have discovered they are put in .hsig files and begin with signature keyword instead of module.
The example syntax of such a file may look like
signature Str where
data Str
empty :: Str
append :: Str -> Str -> Str
However, I cannot imagine how and why one would use them. Could you explain me which problems do they solve and how to properly make use of them?
They strongly remind me the module system that one can see in OCaml (link), which also has modules signatures and separate implementations, but I can't decide how close are these two concepts. Is it somehow related?
They are related to the OCaml module system, but with some important differences:
Signatures are defined within the language (in .hsig files) but unlike in OCaml they are not instantiated within the language. Instead, the package manager controls instantiation (currently, only Cabal provides that). Modules never know if they are importing an abstract signature or an actual module.
Implementation modules know nothing about signatures and do not not reference them directly. Any existing module can implement a signature if the definitions happen to be compatible.
Instantiation is triggered by a coincidence of module name and signature name in the dependencies of some compilation unit (executable, library, test suite...) When the names coincide, a process called "signature matching" takes place that verifies that the types and definitions are compatible.
The "happy path" is that in your program you depend on some library having a signature "hole", and also on another library that provides an implementation module with the same name. Then signature matching happens automatically. When the names don't match, or we need multiple instantiations of the signature-using library, we have to rename signatures and/or modules in the mixins section of the Cabal file.
As for why module signatures might be useful, consider bytestring, the most popular library by far for handling binary data in Haskell. But there are others, for example stdio with its Bytes type.
Suppose you are writing your own library that uses binary data, and you don't want to force your users into either stdio or bytestring. What are your choices?
One would be to create something like a Bytelike class and parameterize all your functions with it. You would also need to add a type parameter to every data type that contains bytes.
Another would be to create a signature that defines an abstract binary data type and all the operations that are required of it. Your library would make use of the signature, and remain "indefinite" until the user depends both on your library and a suitable implementation when creating his own libraries.
From the perspective of the user, the typeclass solution is unsatisfactory. The user knows that he wants to use either ByteString or Bytes, just one of them. The decision will not depend on some runtime flag and will remain constant across the extent of his program. And yet he has to deal with a more complex API that reminds him of that already decided issue at every turn.
It's better if he makes the decision once, writes it in his .cabal file, and deals with a simpler API from then onwards.
As described here, they're quite closely related to OCaml module signatures. They allow you to create a package missing some modules and say these modules, containing such and such types and values, should be delivered by package's user. I haven't tested it myself, but I imagine that such a package works very much like an OCaml functor.

What is a Sample in Helm?

There doesn't seem to be much documentation for Sample a in the Haskell FRP library Helm. I am trying to write a function similar to sample on in Elm and I think update could help. However I am confused about how update works because, from the source code here, it seems that the variable p is not used at all.
What should this function be doing and why is the input p included if it isn't used? Is there a better way to do this? I think seq could work, but I tried implementing my animation with seq and it doesn't do the thing I am looking for.
Probably the first argument there exists for historical reasons or for consistency with other functions offered by helm; but I don't know enough about either to say for sure.
The intended use of the update function seems to be to wrap the appropriate constructor around its argument: update p a s will result in either Changed a or Unchanged a depending on whether a matches with the value stored in s. One might use this, for example, as an argument to foldp:
foldp (update undefined) :: Eq a => Sample a -> Signal a -> Signal (Sample a)
Downstream signals could then ignore Unchanged values easily.

Representing and building a cyclic abstract syntax tree

I'm a newbie Haskell programmer with imperative background.
I'm writing a program that parses an abstract syntax tree (or rather a graph) that has cycles. (This is actually GCC's Generic AST). I'm designing data types for representing this graph, but I'm facing difficulties:
Firstly, there are lots of nasty cycles in GCC AST. For example type declarations refer to the actual type descriptor which refers back to the type's declaration (and this declaration may sometimes be different from the original, sometimes it is just a reference to an identifier node). All tree nodes reference a so-called context node (a form of parent reference). However, this context node (for some node X) often is not the same as the node that originally referred to X. For example a declaration for some builtin function may be found from a C++ namespace node, but the function declaration's context points to a translation unit declaration. Perhpas this makes using zippers impossible? Also I can't just ignore these context nodes, because they are useful for finding out the scope of declarations.
So, when designing the data structure I should take into account the fact, that sometimes I don't know what kind of node I will be dealing with at run time. However, when I'm certain that a node will have a known type, I would like to be able to reflect this on type level and take advantage of static type checking (for example function's result type is always a type, not an integer constant, but it might be integer or pointer type, etc.).
Secondly, GCC tree is a hierarchical data type in object-oriented sense. All nodes have common information, such as what kind of node they are. All declarations have a name and many flags, and variable declarations have type information in addition. Many nodes have so much data that it would be incovenient to access this information through pattern matching only. So I most likely will be using accessor functions (and type classes to provide an uniform interface regardless of the node type).
Thirdly, I would like my graph to be purely functional, but I don't know how to build it. My input is text, which has a section for every node. Nodes are identified by unique ids, and there are lots of forward- and self-references.
So, with my background, I sense that I'm trying to force my Haskell interface to an imperative form. So I'm asking you for some concrete advice to guide me in designing my data types, their interface and how to build the graph.
So far I have decided that I will not be tying the knot. It would prevent me from doing transformations on the tree (which IMHO deserves some transformations). It also would make it hard to write this tree back to disc, if I want to do that some day.
EDIT:
Here is a sample of my input. It is in YAML, but the format is not yet set to stone (I'm generating this data my self from within GCC).
http://sange.fi/~aura/test.yaml The example contains the global namespace with one function declaration for (int main (int argc, char *argv[])).
Thanks in advance!

Is there an object-identity-based, thread-safe memoization library somewhere?

I know that memoization seems to be a perennial topic here on the haskell tag on stack overflow, but I think this question has not been asked before.
I'm aware of several different 'off the shelf' memoization libraries for Haskell:
The memo-combinators and memotrie packages, which make use of a beautiful trick involving lazy infinite data structures to achieve memoization in a purely functional way. (As I understand it, the former is slightly more flexible, while the latter is easier to use in simple cases: see this SO answer for discussion.)
The uglymemo package, which uses unsafePerformIO internally but still presents a referentially transparent interface. The use of unsafePerformIO internally results in better performance than the previous two packages. (Off the shelf, its implementation uses comparison-based search data structures, rather than perhaps-slightly-more-efficient hash functions; but I think that if you find and replace Cmp for Hashable and Data.Map for Data.HashMap and add the appropraite imports, you get a hash based version.)
However, I'm not aware of any library that looks answers up based on object identity rather than object value. This can be important, because sometimes the kinds of object which are being used as keys to your memo table (that is, as input to the function being memoized) can be large---so large that fully examining the object to determine whether you've seen it before is itself a slow operation. Slow, and also unnecessary, if you will be applying the memoized function again and again to an object which is stored at a given 'location in memory' 1. (This might happen, for example, if we're memoizing a function which is being called recursively over some large data structure with a lot of structural sharing.) If we've already computed our memoized function on that exact object before, we can already know the answer, even without looking at the object itself!
Implementing such a memoization library involves several subtle issues and doing it properly requires several special pieces of support from the language. Luckily, GHC provides all the special features that we need, and there is a paper by Peyton-Jones, Marlow and Elliott which basically worries about most of these issues for you, explaining how to build a solid implementation. They don't provide all details, but they get close.
The one detail which I can see which one probably ought to worry about, but which they don't worry about, is thread safety---their code is apparently not threadsafe at all.
My question is: does anyone know of a packaged library which does the kind of memoization discussed in the Peyton-Jones, Marlow and Elliott paper, filling in all the details (and preferably filling in proper thread-safety as well)?
Failing that, I guess I will have to code it up myself: does anyone have any ideas of other subtleties (beyond thread safety and the ones discussed in the paper) which the implementer of such a library would do well to bear in mind?
UPDATE
Following #luqui's suggestion below, here's a little more data on the exact problem I face. Let's suppose there's a type:
data Node = Node [Node] [Annotation]
This type can be used to represent a simple kind of rooted DAG in memory, where Nodes are DAG Nodes, the root is just a distinguished Node, and each node is annotated with some Annotations whose internal structure, I think, need not concern us (but if it matters, just ask and I'll be more specific.) If used in this way, note that there may well be significant structural sharing between Nodes in memory---there may be exponentially more paths which lead from the root to a node than there are nodes themselves. I am given a data structure of this form, from an external library with which I must interface; I cannot change the data type.
I have a function
myTransform : Node -> Node
the details of which need not concern us (or at least I think so; but again I can be more specific if needed). It maps nodes to nodes, examining the annotations of the node it is given, and the annotations its immediate children, to come up with a new Node with the same children but possibly different annotations. I wish to write a function
recursiveTransform : Node -> Node
whose output 'looks the same' as the data structure as you would get by doing:
recursiveTransform Node originalChildren annotations =
myTransform Node recursivelyTransformedChildren annotations
where
recursivelyTransformedChildren = map recursiveTransform originalChildren
except that it uses structural sharing in the obvious way so that it doesn't return an exponential data structure, but rather one on the order of the same size as its input.
I appreciate that this would all be easier if say, the Nodes were numbered before I got them, or I could otherwise change the definition of a Node. I can't (easily) do either of these things.
I am also interested in the general question of the existence of a library implementing the functionality I mention quite independently of the particular concrete problem I face right now: I feel like I've had to work around this kind of issue on a few occasions, and it would be nice to slay the dragon once and for all. The fact that SPJ et al felt that it was worth adding not one but three features to GHC to support the existence of libraries of this form suggests that the feature is genuinely useful and can't be worked around in all cases. (BUT I'd still also be very interested in hearing about workarounds which will help in this particular case too: the long term problem is not as urgent as the problem I face right now :-) )
1 Technically, I don't quite mean location in memory, since the garbage collector sometimes moves objects around a bit---what I really mean is 'object identity'. But we can think of this as being roughly the same as our intuitive idea of location in memory.
If you only want to memoize based on object identity, and not equality, you can just use the existing laziness mechanisms built into the language.
For example, if you have a data structure like this
data Foo = Foo { ... }
expensive :: Foo -> Bar
then you can just add the value to be memoized as an extra field and let the laziness take care of the rest for you.
data Foo = Foo { ..., memo :: Bar }
To make it easier to use, add a smart constructor to tie the knot.
makeFoo ... = let foo = Foo { ..., memo = expensive foo } in foo
Though this is somewhat less elegant than using a library, and requires modification of the data type to really be useful, it's a very simple technique and all thread-safety issues are already taken care of for you.
It seems that stable-memo would be just what you needed (although I'm not sure if it can handle multiple threads):
Whereas most memo combinators memoize based on equality, stable-memo does it based on whether the exact same argument has been passed to the function before (that is, is the same argument in memory).
stable-memo only evaluates keys to WHNF.
This can be more suitable for recursive functions over graphs with cycles.
stable-memo doesn't retain the keys it has seen so far, which allows them to be garbage collected if they will no longer be used. Finalizers are put in place to remove the corresponding entries from the memo table if this happens.
Data.StableMemo.Weak provides an alternative set of combinators that also avoid retaining the results of the function, only reusing results if they have not yet been garbage collected.
There is no type class constraint on the function's argument.
stable-memo will not work for arguments which happen to have the same value but are not the same heap object. This rules out many candidates for memoization, such as the most common example, the naive Fibonacci implementation whose domain is machine Ints; it can still be made to work for some domains, though, such as the lazy naturals.
Ekmett just uploaded a library that handles this and more (produced at HacPhi): http://hackage.haskell.org/package/intern. He assures me that it is thread safe.
Edit: Actually, strictly speaking I realize this does something rather different. But I think you can use it for your purposes. It's really more of a stringtable-atom type interning library that works over arbitrary data structures (including recursive ones). It uses WeakPtrs internally to maintain the table. However, it uses Ints to index the values to avoid structural equality checks, which means packing them into the data type, when what you want are apparently actually StableNames. So I realize this answers a related question, but requires modifying your data type, which you want to avoid...

What is typestate?

What does TypeState refer to in respect to language design? I saw it mentioned in some discussions regarding a new language by mozilla called Rust.
Note: Typestate was dropped from Rust, only a limited version (tracking uninitialized and moved from variables) is left. See my note at the end.
The motivation behind TypeState is that types are immutable, however some of their properties are dynamic, on a per variable basis.
The idea is therefore to create simple predicates about a type, and use the Control-Flow analysis that the compiler execute for many other reasons to statically decorate the type with those predicates.
Those predicates are not actually checked by the compiler itself, it could be too onerous, instead the compiler will simply reasons in terms of graph.
As a simple example, you create a predicate even, which returns true if a number is even.
Now, you create two functions:
halve, which only acts on even numbers
double, which take any number, and return an even number.
Note that the type number is not changed, you do not create a evennumber type and duplicate all those functions that previously acted on number. You just compose number with a predicate called even.
Now, let's build some graphs:
a: number -> halve(a) #! error: `a` is not `even`
a: number, even -> halve(a) # ok
a: number -> b = double(a) -> b: number, even
Simple, isn't it ?
Of course it gets a bit more complicated when you have several possible paths:
a: number -> a = double(a) -> a: number, even -> halve(a) #! error: `a` is not `even`
\___________________________________/
This shows that you reason in terms of sets of predicates:
when joining two paths, the new set of predicates is the intersection of the sets of predicates given by those two paths
This can be augmented by the generic rule of a function:
to call a function, the set of predicates it requires must be satisfied
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
And thus the building block of TypeState in Rust:
check: checks that the predicate holds, if it does not fail, otherwise adds the predicate to set of predicates
Note that since Rust requires that predicates are pure functions, it can eliminate redundant check calls if it can prove that the predicate already holds at this point.
What Typestate lack is simple: composability.
If you read the description carefully, you will note this:
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
This means that predicates for a types are useless in themselves, the utility comes from annotating functions. Therefore, introducing a new predicate in an existing codebase is a bore, as the existing functions need be reviewed and tweaked to cater to explain whether or not they need/preserve the invariant.
And this may lead to duplicating functions at an exponential rate when new predicates pop up: predicates are not, unfortunately, composable. The very design issue they were meant to address (proliferation of types, thus functions), does not seem to be addressed.
It's basically an extension of types, where you don't just check whether some operation is allowed in general, but in this specific context. All that at compile time.
The original paper is actually quite readable.
There's a typestate checker written for Java, and Adam Warski's explanatory page gives some useful information. I'm only just figuring this material out myself, but if you are familiar with QuickCheck for Haskell, the application of QuickCheck to monadic state seems similar: categorise the states and explain how they change when they are mutated through the interface.
Typestate is explained as:
leverage type system to encode state changes
Implemented by creating a type for each state
Use move semantics to invalidate a state
Return the next state from the previous state
Optionally drop the state(close file, connections,...)
Compile time enforcement of logic
struct Data;
struct Signed;
impl Data {
fn sign(self) -> Signed {
Signed
}
}
let data = Data;
let singed = data.sign();
data.sign() // Compile error

Resources