Optimization of total ordering in Alloy and Kodkod - alloy

It is said in various places (e.g. here enter link description here or there) that the total ordering relation is hardwired in order to improve the efficiency of analyses (and to get atom names to appear in the "natural" order).
As far as I understand, the optimization is made in Kodkod (in this piece of code). However, is there an article or document explaining in more detail (than the Java documentation, which speaks in terms of Boolean matrices and does not provide an argumentation for the algorithm --which is fine in source code documentation--) the optimizations made in Kodkod? As far as I can tell, E. Torlak's PhD thesis does not speak of these ones (a paper by I. Shlyakhter speak about other optimizations, but I don't know whether those are implemented in Kodkod or Alloy).

This optimization is made in Kodkod, at the code location you found. It is described briefly in Ilya Shlyakhter's PhD thesis (p. 111, Section 4.5.5).

Related

Is Alloy Analyzer "a falsifier"?

In my community, recently we actively use the term "falsification" of a formal specification. The term appears in, for instance:
https://www.cs.huji.ac.il/~ornak/publications/cav05.pdf
I wonder whether Alloy Analyzer does falsification. It seems true for me, but I'm not sure. Is it correct? If not, what is the difference?
Yes, Alloy is a falsifier. Alloy's primary novelty when it was introduced 20 years ago was to argue that falsification was often more important than verification, since most designs are not correct, so the role of an analyzer should be to find the errors, not to show that they are not present. For a discussion of this issue, see Section 1.4, Verification vs. Refutation in Software analysis: A roadmap (Jackson and Rinard, 2000); Section 5.1.1, Instance Finding and Undecidability Compromises in Software Abstractions (Jackson 2006).
In Alloy's case though, there's another aspect, which is the argument that scope-complete analysis is actually quite effective from a verification standpoint. This claim is what we called the "small scope hypothesis" -- that most bugs can be found in small scopes (that is analyses that are bounded by a small fixed number of elements in each basic type).
BTW, Alloy was one of the earliest tools to suggest using SAT for bounded verification. See, for example, Boolean Compilation of Relational Specifications (Daniel Jackson, 1998), a tech report that was known to the authors of the first bounded model checking paper, which discusses Alloy's predecessor, Nitpick, in the following terms:
The hypothesis underlying Nitpick is a controversial one. It is that,
in practice, small scopes suffice. In other words, most errors can be
demonstrated by counterexamples within a small scope. This is a purely
empirical hypothesis, since the relevant distribution of errors cannot
be described mathematically: it is determined by the specifications
people write.
Our hope is that successful use of the Nitpick tool will justify the
hypothesis. There is some evidence already for its plausibility. In
our experience with Nitpick to date, we have not gained further
information by increasing the scope beyond 6.
A similar notion of scope is implicit in the context of model checking
of hardware. Although the individual state machines are usually
finite, the design is frequently parameterized by the number of
machines executing in parallel. This metric is analogous to scope; as
the number of machines increases, the state space increases
exponentially, and it is rarely possible to analyze a system involving
more than a handful of machines. Fortunately, however, it seems that
only small configurations are required to find errors. The celebrated
analysis of the Futurebus+ cache protocol [C+95], which perhaps marked
the turning point in model checking’s industrial reputation, was
performed for up to 8 processors and 3 buses. The reported flaws,
however, could be demonstrated with counterexamples involving at most
3 processors and 2 buses.
From my understanding of what is meant by falsification, yes, Alloy does it.
It becomes quite apparent when you look at the motivation behind the creation of Alloy, as forumalted in the Software Abstraction book:
This book is the result of a 10-year effort to bridge this gap, to develop a language (Alloy) that captures the essence of software abstractions simply and succinctly, with an analysis that is fully automatic, and can expose the subtlest of flaws.

More work like Judea Pearl's Heuristics?

I am researching formal and informal search heuristics. One of the best books on the subject I've found is Judea Pearl's Heuristics. Embarrassingly, I find myself unable to find a good search strategy that returns more material in this vein.
Things I am looking for:
Summary papers about advances in search
Books/papers that cover some of the history of the development of methods
Some idea about who is currently producing research in this space and their specialization
Additional keywords, search methods, and items that should appear on this list to broaden the search
I'm looking for non-technical material. Most works have a bunch of specific implementation detail and small, short bits about where the research came from and what it lead to (which leads to me chasing citation trails). This is totally fine, but hoping to find works that include more of the non-technical info all in one place.
Some works I've canvassed so far:
Search and Optimization by Metaheuristics. Techniques and Algorithms Inspired by Nature
Metaheuristics: from design to implementation
Artificial Intelligence, Evolutionary Computing and Metaheuristics: In the Footsteps of Alan Turing
Essays and Surveys in Metaheuristics
Essentials of Metaheuristics
Handbook of approximation algorithms and metaheuristics
Heuristics, Metaheuristics and Approximate Methods in Planning and Scheduling
Recent Advances on Meta-Heuristics and Their Application to Real Scenarios
Advances in Knowledge Representation
Applications of Conceptual Spaces: The Case for Geometric Knowledge Representation
Concepts, Ontologies, and Knowledge Representation
Handbook of Knowledge Representation
I realize this is more an academically oriented question and am also open to suggestions of where else to post such a question.

machine representation of natural text

I'm currently working on high-level machine representation of natural text.
For example,
"I had one dog but I gave it to Danny who didn't have any"
would be
I.have.dog =1
I.have.dog -=1
Danny.have.dog = 0
Danny.have.dog +=1
something like this....
I'm trying to find resources, but can't really find matching topics..
Is there a valid subject name for this type of research? Any library of resources?
Natural logic sounds like something related but it's not really the same thing I'm working on. Please help me out!
Representing natural language's meaning is the domain of computational semantics. Within that area, lots of frameworks have been developed, though the basic one is still first-order logic.
Specifically, your problem seems to be that of recognizing discourse semantics, which deals with information change brought about by language use. This is pretty much an open area of research, so expect to find a lot of research papers and PhD positions, but little readily-usable software.
As larsmans already said, this is pretty much a really open field of research, called computational semantics (a subfield of computational linguistics.)
There's one important thing that you'll need to understand before starting off in the comp-sem world: most people there use fancy high-level languages. By high-level I don't mean C, but more something like LISP, Prolog, or, as of late, Haskell. Computational semantics is very close to logic, which is why people researching the topic are more comfortable with functional and logical languages — they're closer to what they actually use all day long.
It will also be very useful for you to first look at some foundational course in predicate logic, since that's what the underlying literature usually takes for granted.
A good introduction to the connection between logic and language is L.T.F. Gamut — Logic, Language, and Meaning, volume I. This deals with the linguistic side of semantics, which won't help you implement anything, but it will help you understand the following literature. That said, there are at least some books that will explain predicate logic as they go, but if you ask me, any person really interested in the representation of language as a formal system should take a course in predicate and possibly intuitionist and intensional logic.
To give you a bit of a peek, your example is rather difficult to treat for
current comp-sem approaches. Not impossible, but already pretty high up the
scale of difficulty. What makes it difficult is the tense for one part (dealing
with tense and aspect will typically bring you into even semantics,) but also
that you'd have to define the give and have relations in a way that
works for this example. (An easier example to work with would be, say "I had
a dog, but I gave it to Danny who didn't have any." Can you see why?)
Let's translate "I have a dog."
∃x[dog(x) ∧ have(I,x)]
(There is an object x, such that x is a dog and the have-relation holds between
"I" and x.)
These sentences would then be evaluated against a model, where the "I"
constant might already be defined. By evaluating multiple sentences in sequence,
you could then alter that model so that it keeps track of a conversation.
Let's give you some suggestions to start you off.
The classic comp-sem system is
SHRDLU, which places geometric
figures of certain color in a virtual environment. You can play around with it, since there's a Windows-compatible demo online at that page I linked you to.
The best modern book on the topic is probably Blackburn and Bos
(2005). It's written in Prolog, but
there are sources linked on the page to learn Prolog
(now!)
Van Eijck and Unger give a good course on computational semantics in Haskell, which is a bit more recent, but in my eyes not quite as educational in terms of raw computational semantics as Blackburn and Bos.

Software to Tune/Calibrate Properties for Heuristic Algorithms

Today I read that there is a software called WinCalibra (scroll a bit down) which can take a text file with properties as input.
This program can then optimize the input properties based on the output values of your algorithm. See this paper or the user documentation for more information (see link above; sadly doc is a zipped exe).
Do you know other software which can do the same which runs under Linux? (preferable Open Source)
EDIT: Since I need this for a java application: should I invest my research in java libraries like gaul or watchmaker? The problem is that I don't want to roll out my own solution nor I have time to do so. Do you have pointers to an out-of-the-box applications like Calibra? (internet searches weren't successfull; I only found libraries)
I decided to give away the bounty (otherwise no one would have a benefit) although I didn't found a satisfactory solution :-( (out-of-the-box application)
Some kind of (Metropolis algorithm-like) probability selected random walk is a possibility in this instance. Perhaps with simulated annealing to improve the final selection. Though the timing parameters you've supplied are not optimal for getting a really great result this way.
It works like this:
You start at some point. Use your existing data to pick one that look promising (like the highest value you've got). Set o to the output value at this point.
You propose a randomly selected step in the input space, assign the output value there to n.
Accept the step (that is update the working position) if 1) n>o or 2) the new value is lower, but a random number on [0,1) is less than f(n/o) for some monotonically increasing f() with range and domain on [0,1).
Repeat steps 2 and 3 as long as you can afford, collecting statistics at each step.
Finally compute the result. In your case an average of all points is probably sufficient.
Important frill: This approach has trouble if the space has many local maxima with deep dips between them unless the step size is big enough to get past the dips; but big steps makes the whole thing slow to converge. To fix this you do two things:
Do simulated annealing (start with a large step size and gradually reduce it, thus allowing the walker to move between local maxima early on, but trapping it in one region later to accumulate precision results.
Use several (many if you can afford it) independent walkers so that they can get trapped in different local maxima. The more you use, and the bigger the difference in output values, the more likely you are to get the best maxima.
This is not necessary if you know that you only have one, big, broad, nicely behaved local extreme.
Finally, the selection of f(). You can just use f(x) = x, but you'll get optimal convergence if you use f(x) = exp(-(1/x)).
Again, you don't have enough time for a great many steps (though if you have multiple computers, you can run separate instances to get the multiple walkers effect, which will help), so you might be better off with some kind of deterministic approach. But that is not a subject I know enough about to offer any advice.
There are a lot of genetic algorithm based software that can do exactly that. Wrote a PHD about it a decade or two ago.
A google for Genetic Algorithms Linux shows a load of starting points.
Intrigued by the question, I did a bit of poking around, trying to get a better understanding of the nature of CALIBRA, its standing in academic circles and the existence of similar software of projects, in the Open Source and Linux world.
Please be kind (and, please, edit directly, or suggest editing) for the likely instances where my assertions are incomplete, inexact and even flat-out incorrect. While working in related fields, I'm by no mean an Operational Research (OR) authority!
[Algorithm] Parameter tuning problem is a relatively well defined problem, typically framed as one of a solution search problem whereby, the combination of all possible parameter values constitute a solution space and the parameter tuning logic's aim is to "navigate" [portions of] this space in search of an optimal (or locally optimal) set of parameters.
The optimality of a given solution is measured in various ways and such metrics help direct the search. In the case of the Parameter Tuning problem, the validity of a given solution is measured, directly or through a function, from the output of the algorithm [i.e. the algorithm being tuned not the algorithm of the tuning logic!].
Framed as a search problem, the discipline of Algorithm Parameter Tuning doesn't differ significantly from other other Solution Search problems where the solution space is defined by something else than the parameters to a given algorithm. But because it works on algorithms which are in themselves solutions of sorts, this discipline is sometimes referred as Metaheuristics or Metasearch. (A metaheuristics approach can be applied to various algorihms)
Certainly there are many specific features of the parameter tuning problem as compared to the other optimization applications but with regard to the solution searching per-se, the approaches and problems are generally the same.
Indeed, while well defined, the search problem is generally still broadly unsolved, and is the object of active research in very many different directions, for many different domains. Various approaches offer mixed success depending on the specific conditions and requirements of the domain, and this vibrant and diverse mix of academic research and practical applications is a common trait to Metaheuristics and to Optimization at large.
So... back to CALIBRA...
From its own authors' admission, Calibra has several limitations
Limit of 5 parameters, maximum
Requirement of a range of values for [some of ?] the parameters
Works better when the parameters are relatively independent (but... wait, when that is the case, isn't the whole search problem much easier ;-) )
CALIBRA is based on a combination of approaches, which are repeated in a sequence. A mix of guided search and local optimization.
The paper where CALIBRA was presented is dated 2006. Since then, there's been relatively few references to this paper and to CALIBRA at large. Its two authors have since published several other papers in various disciplines related to Operational Research (OR).
This may be indicative that CALIBRA hasn't been perceived as a breakthrough.
State of the art in that area ("parameter tuning", "algorithm configuration") is the SPOT package in R. You can connect external fitness functions using a language of your choice. It is really powerful.
I am working on adapters for e.g. C++ and Java that simplify the experimental setup, which requires some getting used to in SPOT. The project goes under name InPUT, and a first version of the tuning part will be up soon.

canonical problems list

Does anyone known of a a good reference for canonical CS problems?
I'm thinking of things like "the sorting problem", "the bin packing problem", "the travailing salesman problem" and what not.
edit: websites preferred
You can probably find the best in an algorithms textbook like Introduction to Algorithms. Though I've never read that particular book, it's quite renowned for being thorough and would probably contain most of the problems you're likely to encounter.
"Computers and Intractability: A guide to the theory of NP-Completeness" by Garey and Johnson is a great reference for this sort of thing, although the "solved" problems (in P) are obviously not given much attention in the book.
I'm not aware of any good on-line resources, but Karp's seminal paper Reducibility among Combinatorial Problems (1972) on reductions and complexity is probably the "canonical" reference for Hard Problems.
Have you looked at Wikipedia's Category:Computational problems and Category:NP Complete Problems pages? It's probably not complete, but they look like good starting points. Wikipedia seems to do pretty well in CS topics.
I don't think you'll find the answers to all those problems in only one book. I've never seen any decent, comprehensive website on algorithms, so I'd recommend you to stick to the books. That said, you can always get some introductory material on canonical algorithm texts (there are always three I usually recommend: CLRS, Manber, Aho, Hopcroft and Ullman (this one is a bit out of date in some key topics, but it's so formal and well-written that it's a must-read). All of them contain important combinatorial problems that are, in some sense, canonical problems in computer science. After learning some fundamentals in graph theory you'll be able to move to Network Flows and Linear Programming. These comprise a set of techniques that will ultimately solve most problems you'll encounter (linear programming with the variables restricted to integer values is NP-hard). Network flows deals with problems defined on graphs (with weighted/capacitated edges) with very interesting applications in fields that seemingly have no relationship to graph theory whatsoever. THE textbook on this is Ahuja, Magnanti and Orlin's. Linear programming is some kind of superset of network flows, and deals with optimizing a linear function on variables subject to restrictions in the form of a linear system of equations. A book that emphasizes the relationship to network flows is Bazaraa's. Then you can move on to integer programming, a very valuable tool that presents many natural techniques for modelling problems like bin packing, task scheduling, the knapsack problem, and so on. A good reference would be L. Wolsey's book.
You definitely want to look at NIST's Dictionary of Algorithms and Data Structures. It's got the traveling salesman problem, the Byzantine generals problem, the dining philosophers' problem, the knapsack problem (= your "bin packing problem", I think), the cutting stock problem, the eight queens problem, the knight's tour problem, the busy beaver problem, the halting problem, etc. etc.
It doesn't have the firing squad synchronization problem (I'm surprised about that omission) or the Jeep problem (more logistics than computer science).
Interestingly enough there's a blog on codinghorror.com which talks about some of these in puzzle form. (I can't remember whether I've read Smullyan's book cited in the blog, but he is a good compiler of puzzles & philosophical musings. Martin Gardner and Douglas Hofstadter and H.E. Dudeney are others.)
Also maybe check out the Stony Brook Algorithm Repository.
(Or look up "combinatorial problems" on google, or search for "problem" in Wolfram Mathworld or look at Hilbert's problems, but in all these links many of them are more pure-mathematics than computer science.)
#rcreswick those sound like good references but fall a bit shy of what I'm thinking of. (However, for all I know, it's the best there is)
I'm going to not mark anything as accepted in hopes people might find a better reference.
Meanwhile, I'm going to list a few problems here, fell free to add more
The sorting problem Find an order for a set that is monotonic in a given way
The bin packing problem partition a set into a minimum number of sets where each subset is "smaller" than some limit
The travailing salesman problem Find a Hamiltonian cycle in a weighted graph with the minimum total weight

Resources