Declaring a field as `f: elems[g] -> one h` is producing a name-not-found error. What am I missing?

In a model I am writing, I would like to say that a reading of a manuscript identifies a sequence of tokens in the manuscript and maps them to types; the mapping should be defined for all tokens in the manuscript and should identify exactly one type for each token. (Types and tokens here are as described by Peirce's type/token distinction.)
When I write
sig Document, Type, Token {}
sig Reading {
  doc: Document,
  tokens: seq Token,
  mapping: elems[tokens] -> one Type
}
run {} for 3
an attempt to execute the run command produces the error message
The name "elems" cannot be found.
If I replace elems[tokens] with seq/Int.tokens (borrowing from the definition of elems in util/sequniv) or (simplifying) Int.tokens or univ.tokens, I get the results I expect.
If I replace elems[tokens] with ran[tokens] (and include open util/relation), I get a similar complaint about the name ran.
Using these names elsewhere in the model (not shown) does not elicit this error, so I infer that the problem is not that the functions in question are unknown but that function invocations are unwelcome on the right-hand side of a field declaration.
The grammar says of the right-hand side of a field declaration only that it is an expression, and function invocations are allowed as expressions. So I suppose there is a constraint expressed elsewhere that explains why my initial formulation does not work. Can anyone tell me what it is?
I can make do with univ.tokens for now, but I would prefer the original formulation as easier for my expected readers to understand -- they can squint and think of it as a function call, whereas with dot join I need to pause to explain it to them, which distracts from the core task of the model. My thanks for any help.

I toyed a bit with the example, and it seems your inference is correct: one can't reference a function in a field declaration (even though the grammar suggests otherwise).
What you proposed (univ.tokens or Int.tokens) is for me the cleanest workaround.
What Peter proposed in his comment does indeed do the trick, but redefining elems looks too fishy, especially since the elems function is already clearly defined for your readers. It could be a source of confusion.
Another idea, if you really, really insist on using elems[tokens], would be to declare mapping as a relation from Token to Type and then constrain the left-hand side of the relation to consist only of those Token atoms present in the sequence. Your model would then be:
sig Document, Type, Token {}
sig Reading {
  doc: Document,
  tokens: seq Token,
  mapping: Token -> one Type
} {
  mapping.Type = elems[tokens]
}
run {} for 3


Strict map with custom reading method or type with custom read instance

So, this is not exactly a problem, but I would like an opinion on which would be the better way. I need to read data from an outside source (TCP) that comes in basically this format:
key: value
okey: enum
stuff: 0.12240
amazin: 1020
And I need to parse it into a Haskell-accessible format. The two solutions I thought about were either to parse it into a strict String-to-String map, or into a type declared with record syntax.
Initially I thought to make a type synonym for my String => String map and write extractor functions like amazin :: NiceSynonym -> Int, doing the necessary treatment and parsing within the function, but that felt kind of sketchy at the time. Then I tried an actual type declaration with record syntax and a custom Read instance. That was a nightmare, because there are a lot of enums and keys with different types and so on. And it felt... disappointing. It simply wraps the arguments and creates reader functions, not much different from the original: amazin :: TypeDeclaration -> Int.
Now I'm kind of regretting not going with reader functions as I initially envisioned. So, is there anything else I'm forgetting to consider? Any pros and cons of either side to take note of? Is one objectively better than the other?
P.S.: Some considerations that may make one or the other better:
Once read, I won't need to change it at all; it's basically a status report
No need to compare, add, etc.; again, it's just a status report
No real need for performance; I won't be reading hundreds of these a second or anything
TL;DR: Given that input example, what's the best way to turn it into a Haskell-readable format? A map, a data constructor, a dependent map...?
Both ways are very valid in their own respects, but since I was also making an API to interact with the protocol, I preferred the record syntax so I could cover all the properties more easily. Also, I wasn't really going to do any checking or treatment in the getter functions, and no matter how boring writing the Read instance for my type might have seemed, I bet writing all the getter functions manually would have been worse. Parsing stuff manually is inherently boring; I guess I was just looking for a magical functional one-liner to do all the work for me.
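For concreteness, here is a minimal sketch of both approaches, using field names taken from the example input above (illustrative only; none of this is actual code from either project):

import qualified Data.Map.Strict as M

-- Approach 1: a strict String-to-String map plus typed extractor functions.
type Report = M.Map String String

parseReport :: String -> Report
parseReport = M.fromList . map splitField . lines
  where
    splitField l = case break (== ':') l of
      (k, ':' : v) -> (k, dropWhile (== ' ') v)
      (k, _)       -> (k, "")

amazin :: Report -> Int
amazin r = maybe (error "missing key: amazin") read (M.lookup "amazin" r)

-- Approach 2: a record filled in by one parsing function, instead of a
-- hand-written Read instance.
data Status = Status { statusStuff :: Double, statusAmazin :: Int }
  deriving Show

parseStatus :: String -> Status
parseStatus input =
  let r = parseReport input
  in Status { statusStuff  = maybe 0 read (M.lookup "stuff" r)
            , statusAmazin = maybe 0 read (M.lookup "amazin" r)
            }

Either way, the per-field parsing ends up in one place; the record variant just gives the result a fixed shape and type-checked field access.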

Haskell lexical layout rule implementation

I have been working on a pet language, which has Haskell-like syntax. One of the neat things which Haskell does, which I have been trying to replicate, is its insertion of {, } and ; tokens based on code layout, before the parsing step.
I found http://www.haskell.org/onlinereport/syntax-iso.html, which includes a specification of how to implement the layout program, and have made a version of it (modified, of course, for my (much simpler) language).
Unfortunately, I am getting an incorrect output for the following:
f (do x y z) a b
It should be producing the token stream ID ( DO { ID ID ID } ) ID ID, but it is instead producing the token stream ID ( DO { ID ID ID ) ID ID }.
I imagine that this is due to my unsatisfactory implementation of parse-error(t) (parse-error(t) = false), but I have no idea how I would go about implementing parse-error(t) efficiently.
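For reference, a fairly direct transcription of the report's layout function into Haskell, with parse-error(t) hard-coded to false, looks roughly like the following sketch (the token names are invented for illustration; this is not my actual implementation, but it shows the same behaviour):

-- The enriched token stream from the report: <n> for the first lexeme of a
-- new line at column n, {n} for the lexeme following a layout keyword with
-- no explicit '{', and ordinary lexemes otherwise.
data Tok = Angle Int | Brace Int | Lexeme String
  deriving Show

-- The layout function L from the report; the second argument is the stack
-- of enclosing layout contexts.
layout :: [Tok] -> [Int] -> [String]
layout (Angle n : ts) (m : ms)
  | m == n = ";" : layout ts (m : ms)
  | n < m  = "}" : layout (Angle n : ts) ms
layout (Angle _ : ts) ms = layout ts ms
layout (Brace n : ts) (m : ms)
  | n > m  = "{" : layout ts (n : m : ms)
layout (Brace n : ts) []
  | n > 0  = "{" : layout ts [n]
layout (Brace n : ts) ms = "{" : "}" : layout (Angle n : ts) ms
layout (Lexeme "}" : ts) (0 : ms) = "}" : layout ts ms
layout (Lexeme "}" : _ ) _        = error "unexpected explicit '}'"
layout (Lexeme "{" : ts) ms       = "{" : layout ts (0 : ms)
layout (Lexeme t : ts) (m : ms)
  | m /= 0 && parseError t = "}" : layout (Lexeme t : ts) ms
layout (Lexeme t : ts) ms = t : layout ts ms
layout [] [] = []
layout [] (m : ms)
  | m /= 0    = "}" : layout [] ms
  | otherwise = error "missing explicit '}' at end of input"

-- The unsatisfactory part: never report a parse error, so the side
-- condition above never closes an implicit block early.
parseError :: String -> Bool
parseError _ = False

With that stub, the ')' never triggers the "m /= 0 and parse-error(t)" rule, so the implicit block opened after the do stays open until the end of input, and the closing '}' is emitted after the last ID, which is exactly the bad token stream shown above.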
How do Haskell compilers like GHC etc. handle this case? Is there an easy way to implement parse-error(t) such that it handles this case (and hopefully others that I haven't noticed yet)?
I ended up implementing a custom version of the parsing algorithm used by JISON's compiled parsers, one which takes an immutable state object and a token and performs as much work as possible with that token before returning. I am then able to use this implementation to check whether a token will produce a parse error, and to easily roll back to previous states of the parser.
It works fairly well, although it is a bit of a hack right now. You can find the code here: https://github.com/mystor/myst/blob/cb9b7b7d83e5d00f45bef0f994e3a4ce71c11bee/compiler/layout.js
I tried doing what @augustss suggested, using the error production to fake the token's insertion, but it appears that JISON doesn't have all of the tools I need to get a reliable implementation, and re-implementing a stripped-down version of the parsing algorithm turned out to be easier and lined up better with the original document.

is not bound to a legal value during translation

I would like to know a bit more about the bounding this error relates to.
I have an Alloy model for which I create an instance manually (by writing it down in XML).
This instance is readable and the A4Solution can be correctly displayed.
But when I try to evaluate an expression in this instance using the eval() function, I receive this error message, even though the field name and type of the ExprVar retrieved from the model are exactly the same as those in the instance.
I would like to know what this bounding consists of. What properties are taken into consideration to tell that an element of the instance is bound to an element of the model?
Is the hidden ID appearing somewhere in the XML taken into consideration?
I would like to know what this bounding consists of.
It means that every free variable (i.e., relation in Kodkod) has to be given a bound before asking the solver to solve the formula.
What properties are taken into consideration to tell that an element of the instance is bound to an element of the model?
The instance XML file contains an exact value (a tuple set) for each and every sig and field from the model; those tuple sets are exactly the bounds that should be used when evaluating expressions. I'm not sure how exactly you are trying to use the Alloy API, but recreating the bounds from the XML file should (mostly) be handled by the API behind the scenes.
Judging by your last comment to my previous answer, your problem might be related to this post.
In a nutshell, if you are trying to evaluate AST expression objects obtained from one "alloy world" (CompModule) against a solution that was entirely recreated from an XML file, you will get exactly that error. For example, if you have two PrimSig objects, s1 and s2, and they both have the same name (e.g., both were obtained by parsing sig S {...}), they are not "Java equals" (s1.equals(s2) returns false); the same holds for the underlying Kodkod relations/variables. So if you then try to evaluate s1 in a context which has a bound only for s2, you'll get an error saying that s1 is not bound.
Providing the set of Signatures defined in the metamodel while reading the A4Solution solved my problem.
Thus changing:
solution = A4SolutionReader.read(null, new XMLNode(f));
to
solution = A4SolutionReader.read(mm.getAllReachableSigs(), new XMLNode(f));
solved this illegal bounding issue.

What is typestate?

What does TypeState refer to with respect to language design? I saw it mentioned in some discussions regarding a new language by Mozilla called Rust.
Note: Typestate was dropped from Rust; only a limited version (tracking uninitialized and moved-from variables) is left. See my note at the end.
The motivation behind TypeState is that types are immutable, but some of their properties are dynamic, on a per-variable basis.
The idea is therefore to create simple predicates about a type, and use the control-flow analysis that the compiler executes for many other reasons to statically decorate the type with those predicates.
Those predicates are not actually checked by the compiler itself, which could be too onerous; instead, the compiler simply reasons in terms of a graph.
As a simple example, you create a predicate even, which returns true if a number is even.
Now, you create two functions:
halve, which only acts on even numbers
double, which takes any number and returns an even number.
Note that the type number is not changed: you do not create an evennumber type and duplicate all those functions that previously acted on number. You just compose number with a predicate called even.
Now, let's build some graphs:
a: number -> halve(a) #! error: `a` is not `even`
a: number, even -> halve(a) # ok
a: number -> b = double(a) -> b: number, even
Simple, isn't it?
Of course it gets a bit more complicated when you have several possible paths:
a: number -> a = double(a) -> a: number, even -> halve(a) #! error: `a` is not `even`
          \__________________________________/
This shows that you reason in terms of sets of predicates:
when joining two paths, the new set of predicates is the intersection of the sets of predicates given by those two paths
This can be augmented by the generic rule of a function:
to call a function, the set of predicates it requires must be satisfied
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
And thus the building block of TypeState in Rust:
check: checks that the predicate holds; if it does not, the check fails, otherwise the predicate is added to the set of predicates
Note that since Rust requires that predicates are pure functions, it can eliminate redundant check calls if it can prove that the predicate already holds at this point.
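If it helps to see the same even/halve/double/check shape outside of Rust, here is a loose Haskell sketch that tracks the predicate in a phantom type parameter instead of relying on flow analysis (all names are illustrative; this is an analogy, not how Rust's typestate was implemented):

{-# LANGUAGE DataKinds, KindSignatures #-}

-- What we statically know about a value's parity.
data Parity = KnownEven | Unknown

-- A number tagged with a (phantom) parity predicate.
newtype Number (p :: Parity) = Number Int

-- double establishes the predicate...
double :: Number p -> Number 'KnownEven
double (Number n) = Number (n * 2)

-- ...and halve requires it.
halve :: Number 'KnownEven -> Number 'Unknown
halve (Number n) = Number (n `div` 2)

-- check is the runtime test: when it succeeds, the type records that the
-- predicate holds (unlike Rust's check, this returns Maybe instead of failing).
check :: Number p -> Maybe (Number 'KnownEven)
check (Number n)
  | even n    = Just (Number n)
  | otherwise = Nothing

The difference from typestate proper is that the predicate lives in the type of the value rather than being attached to the variable by flow analysis, but the graphs above read the same way: halve only type-checks on a value that double or check has already marked as even.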
What Typestate lacks is simple: composability.
If you read the description carefully, you will note this:
after a function is called, only the set of predicates it established is satisfied (note: arguments taken by value are not affected)
This means that predicates for a type are useless in themselves; the utility comes from annotating functions. Therefore, introducing a new predicate into an existing codebase is a chore, as the existing functions need to be reviewed and tweaked to state whether or not they need/preserve the invariant.
And this may lead to duplicating functions at an exponential rate when new predicates pop up: predicates are not, unfortunately, composable. The very design issue they were meant to address (proliferation of types, and thus of functions) does not seem to be addressed.
It's basically an extension of types, where you don't just check whether some operation is allowed in general, but in this specific context. All that at compile time.
The original paper is actually quite readable.
There's a typestate checker written for Java, and Adam Warski's explanatory page gives some useful information. I'm only just figuring this material out myself, but if you are familiar with QuickCheck for Haskell, the application of QuickCheck to monadic state seems similar: categorise the states and explain how they change when they are mutated through the interface.
Typestate is explained as:
Leverage the type system to encode state changes
Implemented by creating a type for each state
Use move semantics to invalidate a state
Return the next state from the previous state
Optionally drop the state (close files, connections, ...)
Compile-time enforcement of logic
struct Data;
struct Signed;

impl Data {
    // Taking self by value consumes the Data, so it cannot be used again.
    fn sign(self) -> Signed {
        Signed
    }
}

fn main() {
    let data = Data;
    let signed = data.sign();
    data.sign(); // Compile error: `data` was moved by the first call to sign()
}

Why do a lot of programming languages put the type *after* the variable name?

I just came across this question in the Go FAQ, and it reminded me of something that's been bugging me for a while. Unfortunately, I don't really see what the answer is getting at.
It seems like almost every non C-like language puts the type after the variable name, like so:
var : int
Just out of sheer curiosity, why is this? Are there advantages to choosing one or the other?
There is a parsing issue, as Keith Randall says, but it isn't what he describes. The "not knowing whether it is a declaration or an expression" simply doesn't matter - you don't care whether it's an expression or a declaration until you've parsed the whole thing anyway, at which point the ambiguity is resolved.
Using a context-free parser, it doesn't matter in the slightest whether the type comes before or after the variable name. What matters is that you don't need to look up user-defined type names to understand the type specification - you don't need to have understood everything that came before in order to understand the current token.
Pascal syntax is context-free - if not completely, at least WRT this issue. The fact that the variable name comes first is less important than details such as the colon separator and the syntax of type descriptions.
C syntax is context-sensitive. In order for the parser to determine where a type description ends and which token is the variable name, it needs to have already interpreted everything that came before so that it can determine whether a given identifier token is the variable name or just another token contributing to the type description.
Because C syntax is context-sensitive, it is very difficult (if not impossible) to parse using traditional parser-generator tools such as yacc/bison, whereas Pascal syntax is easy to parse using the same tools. That said, there are parser generators now that can cope with C and even C++ syntax. Although it's not properly documented or in a 1.? release etc., my personal favorite is Kelbt, which uses backtracking LR and supports semantic "undo" - basically undoing additions to the symbol table when speculative parses turn out to be wrong.
In practice, C and C++ parsers are usually hand-written, mixing recursive descent and precedence parsing. I assume the same applies to Java and C#.
Incidentally, similar issues with context sensitivity in C++ parsing have created a lot of nasties. The "Alternative Function Syntax" for C++0x is working around a similar issue by moving a type specification to the end and placing it after a separator - very much like the Pascal colon for function return types. It doesn't get rid of the context sensitivity, but adopting that Pascal-like convention does make it a bit more manageable.
The 'most other' languages you speak of are those that are more declarative. They aim to let you program more along the lines you think in (assuming you aren't boxed into imperative thinking).
Type-last reads as 'create a variable called NAME of type TYPE'.
This is, of course, the opposite of saying 'create a TYPE called NAME', but when you think about it, what the value is for is more important than the type; the type is merely a programmatic constraint on the data.
If the name of the variable starts at column 0, it's easier to find the name of the variable.
Compare
QHash<QString, QPair<int, QString> > hash;
and
hash : QHash<QString, QPair<int, QString> >;
Now imagine how much more readable your typical C++ header could be.
In formal language theory and type theory, it's almost always written as var: type. For instance, in the typed lambda calculus you'll see proofs containing statements such as:
x : A y : B
-------------
\x.y : A->B
I don't think it really matters, but I think there are two justifications: one is that "x : A" is read "x is of type A", the other is that a type is like a set (e.g. int is the set of integers), and the notation is related to "x ε A".
Some of this stuff pre-dates the modern languages you're thinking of.
An increasing trend is to not state the type at all, or to optionally state the type. This could be a dynamically typed language where there really is no type on the variable, or it could be a statically typed language which infers the type from the context.
If the type is sometimes given and sometimes inferred, then it's easier to read if the optional bit comes afterwards.
There are also trends related to whether a language regards itself as coming from the C school or the functional school or whatever, but these are a waste of time. The languages which improve on their predecessors and are worth learning are the ones that are willing to accept input from all different schools based on merit, not be picky about a feature's heritage.
"Those who cannot remember the past are condemned to repeat it."
Putting the type before the variable started innocuously enough with Fortran and Algol, but it got really ugly in C, where some type modifiers are applied before the variable, others after. That's why in C you have such beauties as
int (*p)[10];
or
void (*signal(int x, void (*f)(int)))(int)
together with a utility (cdecl) whose purpose is to decrypt such gibberish.
In Pascal, the type comes after the variable, so the first examples becomes
p: pointer to array[10] of int
Contrast with
q: array[10] of pointer to int
which, in C, is
int *q[10]
In C, you need parentheses to distinguish this from int (*p)[10]. Parentheses are not required in Pascal, where only the order matters.
The signal function would be
signal: function(x: int, f: function(int) to void) to (function(int) to void)
Still a mouthful, but at least within the realm of human comprehension.
In fairness, the problem isn't that C put the types before the name, but that it perversely insists on putting bits and pieces before, and others after, the name.
But if you try to put everything before the name, the order is still unintuitive:
int [10] a // an int, ahem, ten of them, called a
int [10]* a // an int, no wait, ten, actually a pointer thereto, called a
So, the answer is: A sensibly designed programming language puts the variables before the types because the result is more readable for humans.
I'm not sure, but I think it's got to do with the "name vs. noun" concept.
Essentially, if you put the type first (such as "int varname"), you're declaring an "integer named 'varname'"; that is, you're giving an instance of a type a name. However, if you put the name first, and then the type (such as "varname : int"), you're saying "this is 'varname'; it's an integer". In the first case, you're giving an instance of something a name; in the second, you're defining a noun and stating that it's an instance of something.
It's a bit like if you were defining a table as a piece of furniture; saying "this is furniture and I call it 'table'" (type first) is different from saying "a table is a kind of furniture" (type last).
It's just how the language was designed. Visual Basic has always been this way.
Most (if not all) curly brace languages put the type first. This is more intuitive to me, as the same position also specifies the return type of a method. So the inputs go into the parenthesis, and the output goes out the back of the method name.
I always thought the way C does it was slightly peculiar: instead of constructing types, the user has to declare them implicitly. It's not just before/after the variable name; in general, you may need to embed the variable name among the type attributes (or, in some usage, to embed an empty space where the name would be if you were actually declaring one).
As a weak form of pattern-matching, it is intelligible to some extent, but it doesn't seem to provide any particular advantages, either. And trying to write (or read) a function pointer type can easily take you beyond the point of ready intelligibility. So overall this aspect of C is a disadvantage, and I'm happy to see that Go has left it behind.
Putting the type first helps in parsing. For instance, in C, if you declared variables like
x int;
When you parse just the x, then you don't know whether x is a declaration or an expression. In contrast, with
int x;
When you parse the int, you know you're in a declaration (types always start a declaration of some sort).
Given progress in parsing languages, this slight help isn't terribly useful nowadays.
Fortran puts the type first:
REAL*4 I,J,K
INTEGER*4 A,B,C
And yes, there's a (very feeble) joke there for those familiar with Fortran.
There is room to argue that this is easier than C, which puts the type information around the name when the type is complex enough (pointers to functions, for example).
What about dynamically typed languages (cheers @wcoenen)? You just use the variable.
