Alloy symmetry breaking does not work

I am using the Alloy API to generate some models.
Recently I realized that Alloy generates isomorphic models.
Is symmetry breaking on by default?
Kind regards,

Yes, symmetry breaking is on by default. (Actually, I'm not aware of any way to turn it off, so "default" may not be quite the right word to use for it.)
If you find multiple isomorphic models among your results, it is because the Alloy Analyzer makes a performance / symmetry-breaking tradeoff. The tradeoff is discussed in the text following section 5.2.1 of Software Abstractions:
[The Analyzer] generates symmetry-breaking constraints from the model, and conjoins them to the analysis constraint. If they were perfect, these constraints would rule out all but one assignment in each equivalence class, but that turns out to require very large symmetry-breaking constraints, which would overload the solver and actually damage performance. The analyzer therefore generates a much smaller constraint, which breaks only some of the symmetries, but in practice eliminates a very high proportion (over 99%) of the assignments.
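To make "isomorphic models" concrete, here is a minimal sketch (the signature name is hypothetical, not taken from the question). Enumerating the instances of even a model this small, whether in the GUI or through the API, can occasionally produce two solutions that differ only by a relabeling of atoms, precisely because the generated constraint breaks only most of the symmetries:

sig Node { edges: set Node }

// Ask the Analyzer to enumerate all instances with exactly three
// Node atoms. Two instances whose edge graphs differ only in which
// atom is named Node$0, Node$1 or Node$2 are isomorphic; most, but
// not all, of these relabelings are suppressed by symmetry breaking.
run {} for exactly 3 Node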

Related

Is Alloy Analyzer "a falsifier"?

In my community, we have recently been actively using the term "falsification" of a formal specification. The term appears in, for instance:
https://www.cs.huji.ac.il/~ornak/publications/cav05.pdf
I wonder whether the Alloy Analyzer does falsification. It seems true to me, but I'm not sure. Is that correct? If not, what is the difference?
Yes, Alloy is a falsifier. Alloy's primary novelty when it was introduced 20 years ago was to argue that falsification was often more important than verification, since most designs are not correct, so the role of an analyzer should be to find the errors, not to show that they are not present. For a discussion of this issue, see Section 1.4, Verification vs. Refutation, in Software Analysis: A Roadmap (Jackson and Rinard, 2000), and Section 5.1.1, Instance Finding and Undecidability Compromises, in Software Abstractions (Jackson, 2006).
In Alloy's case, though, there's another aspect, which is the argument that scope-complete analysis is actually quite effective from a verification standpoint. This claim is what we called the "small scope hypothesis" -- that most bugs can be found in small scopes (that is, in analyses bounded by a small, fixed number of elements in each basic type).
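As a concrete, hypothetical illustration of what a scope-bounded analysis looks like (the model and assertion below are invented for this answer, not taken from any real specification):

sig List { next: lone List }

// A deliberately over-strong claim: following `next` never leads
// back to the starting atom.
assert Acyclic { no l: List | l in l.^next }

// The Analyzer exhaustively examines every instance with at most
// 5 List atoms. Nothing above forbids cycles, so it reports a tiny
// counterexample (a list whose next points back to itself) -- that is
// falsification. Had no counterexample existed, the result would be a
// guarantee only up to the bound of 5, which is exactly where the
// small scope hypothesis comes in.
check Acyclic for 5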
BTW, Alloy was one of the earliest tools to suggest using SAT for bounded verification. See, for example, Boolean Compilation of Relational Specifications (Daniel Jackson, 1998), a tech report that was known to the authors of the first bounded model checking paper, which discusses Alloy's predecessor, Nitpick, in the following terms:
The hypothesis underlying Nitpick is a controversial one. It is that, in practice, small scopes suffice. In other words, most errors can be demonstrated by counterexamples within a small scope. This is a purely empirical hypothesis, since the relevant distribution of errors cannot be described mathematically: it is determined by the specifications people write.
Our hope is that successful use of the Nitpick tool will justify the hypothesis. There is some evidence already for its plausibility. In our experience with Nitpick to date, we have not gained further information by increasing the scope beyond 6.
A similar notion of scope is implicit in the context of model checking of hardware. Although the individual state machines are usually finite, the design is frequently parameterized by the number of machines executing in parallel. This metric is analogous to scope; as the number of machines increases, the state space increases exponentially, and it is rarely possible to analyze a system involving more than a handful of machines. Fortunately, however, it seems that only small configurations are required to find errors. The celebrated analysis of the Futurebus+ cache protocol [C+95], which perhaps marked the turning point in model checking’s industrial reputation, was performed for up to 8 processors and 3 buses. The reported flaws, however, could be demonstrated with counterexamples involving at most 3 processors and 2 buses.
From my understanding of what is meant by falsification, yes, Alloy does it.
It becomes quite apparent when you look at the motivation behind the creation of Alloy, as formulated in the Software Abstractions book:
This book is the result of a 10-year effort to bridge this gap, to develop a language (Alloy) that captures the essence of software abstractions simply and succinctly, with an analysis that is fully automatic, and can expose the subtlest of flaws.

Is incomplete testing versus exhaustive analysis an apples-to-oranges comparison?

I often hear arguments like this: a disadvantage of traditional testing is that it is incomplete, whereas Alloy analysis is exhaustive and complete (within a bound). But the first is talking about software and the second is talking about models. Isn't that an apples-to-oranges comparison?
Update: I was wrong. The comparison is not this: testing code versus analyzing models. That is an apples-to-oranges comparison. Instead, the comparisons are these:
Testing models versus analysis of models.
Testing code versus analysis of code.
Those are apples-to-apples comparisons.
So, whether the artifact is a model or code, you can compare two kinds of analysis: testing, which corresponds to drawing a relatively small number of cases randomly, without a bound on the size, versus small scope analysis, which involves all cases within a small bound.
Thanks to Daniel Jackson for clearing up my misunderstanding.
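One way to make the apples-to-apples comparison concrete, as a hypothetical Alloy sketch (the signatures and commands are invented for illustration): running a single hand-built scenario corresponds to testing the model, while a bounded check corresponds to analyzing it.

sig Person { spouse: lone Person }
fact { all p: Person | p.spouse.spouse in p }  -- spouse is symmetric
fact { no p: Person | p.spouse = p }           -- nobody is their own spouse

// Testing the model: one scenario, chosen and built by a person.
pred TwoCouples { #Person = 4 and #spouse = 4 }
run TwoCouples for 4

// Analyzing the model: every instance with at most 4 persons is
// examined, not just the scenarios someone happened to think of.
assert NoSelfSpouse { no p: Person | p = p.spouse }
check NoSelfSpouse for 4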
First, when Alloy was invented, the only existing non-proof-based tools for analyzing models in data-rich languages such as Z and VDM used scenarios to test the model. Each scenario was constructed by the user, so the approach suffered from both the cost of creating the scenarios and the low coverage that their small number provided.
Second, Alloy has been used to find bugs in code: see the PhD theses by Mandana Vaziri, Mana Taghdiri, Greg Dennis, Juan Pablo Galeotti and others. In all of these, bugs were found that evaded conventional tests.
Third, it's worth noting that bounded-exhaustive forms of testing are becoming viable. Sarfraz Khurshid was a pioneer in this work with his thesis on generating test cases, initially in a tool called TestEra based on Alloy, and later (with Darko Marinov et al.) in a tool called Korat that traded a more directed solving method for less declarative constraints.

Consistent terminology: Modeling, DAE, ODE

I am new to the subject of modeling physical systems. I have read some basic literature and done some tutorials in Modelica and Simulink/Simscape. I wanted to ask whether I understand the following correctly:
Symbolic manipulation is the process of transforming a system of differential-algebraic equations (the physical model, a DAE) into a system of ordinary differential equations (an ODE) that can be solved by standard solvers (Runge-Kutta, BDF, ...).
There are also solvers that can solve DAEs directly, but Modelica tools (OpenModelica, Dymola) and Simscape transform the system into an ODE (why is this approach better than direct DAE solvers?).
"Flat Modelica code" is the result (= the ODE) of the transformation.
Thank you very much for your answers.
Symbolic processing for Modelica includes:
remove the object-oriented structure and obtain a hybrid DAE (flat Modelica)
perform matching, index reduction, and causalization to get an ODE (see the small example after this list)
perform optimizations (tearing, common subexpression elimination, etc.)
generate code for a particular solver
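To give a made-up, two-equation flavor of what matching and causalization do: given the unordered equations $0 = u - \sin(t)$ and $0 = u - R\,i$ with unknowns $u$ and $i$, matching pairs the first equation with $u$ and the second with $i$, and causalization sorts them into the executable order $u := \sin(t)$, then $i := u / R$. Index reduction (differentiating selected equations, e.g. via the Pantelides algorithm) handles the cases where such a sorting is not possible on the original equations.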
OpenModelica can also solve the system in DAE mode, without transforming it to an ODE, and I guess other Modelica tools can do that as well.
A "flat Modelica code" is Modelica code where the object orientation is removed, connect equations are expanded to normal equations. The result is a hybrid DAE.
See Modelica Spec 3.3 for more info about all this (for example Appendix C):
https://modelica.org/documents/ModelicaSpec33Revision1.pdf
So I think your understanding of the terminology is very good too.
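For reference, the forms being distinguished above are, roughly (ignoring events and discrete variables, and writing $x$ for the differential/state variables and $y$ for the algebraic variables):
general (possibly high-index) DAE, the form of a flattened continuous model: $0 = F(\dot{x}, x, y, t)$
semi-explicit index-1 DAE: $\dot{x} = f(x, y, t)$, $0 = g(x, y, t)$, with $\partial g / \partial y$ invertible so that $y$ can be solved for
ODE, the form that standard integrators (Runge-Kutta, BDF, ...) expect: $\dot{x} = f(x, t)$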
Due to the declarative (as opposed to imperative) way of programming in Modelica, we immediately get very large numbers of algebraic equations. Solving these (partly) symbolically has, above all, these essential advantages:
Speed. Without eliminating algebraic loops, Modelica would not be practically usable for any real-world problem, and even with elimination it is only in simple cases that no algebraic equations remain. It would be too slow, and it would force you to do the transformations manually in Modelica as well (as in imperative languages, e.g. C/C++, or in Simulink). Even today Modelica can still be slower than manually transformed and optimized solutions.
Moreover, Modelica applications often need to run simulations in real time.
Correctness. Symbolic transformations are based on proofs, and Modelica applications are often in the area of safety-critical or cyber-physical systems.
One additional consideration is that there are different forms of DAEs, and modeling often leads to high-index DAEs that are complicated to solve numerically (*). (Here "high" means index greater than 1, typically 2, but sometimes even higher.)
Symbolic transformations can reduce high-index DAEs to semi-explicit index-1 DAEs, which can then be turned into ODEs by (numerically) solving the algebraic systems of equations.
Thus, even if a tool solves DAEs directly, it is normally the semi-explicit index-1 DAE that is solved, not the original high-index DAE.
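To illustrate what index reduction does, the classic textbook example is the planar pendulum in Cartesian coordinates (length $L$, mass $m$, Lagrange multiplier $\lambda$ for the rod force):
$m\ddot{x} = -\lambda x, \qquad m\ddot{y} = -\lambda y - m g, \qquad x^2 + y^2 = L^2$
This DAE has index 3. Differentiating the length constraint once gives the hidden constraint $x\dot{x} + y\dot{y} = 0$; differentiating again and substituting the accelerations yields an algebraic equation for the multiplier, $\lambda = m(\dot{x}^2 + \dot{y}^2 - g y)/L^2$, after which the system is semi-explicit index 1 and can be handled by standard methods, provided the hidden constraints are kept satisfied (e.g. via the dummy-derivative technique that Modelica tools typically use).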
(I know this answer is late. The hybrid part of the symbolic transformations is more complicated; I am still working on that.)
For more information see https://en.wikipedia.org/wiki/Differential-algebraic_system_of_equations
(*): There are some solvers for high-index DAEs (in particular index 2), but they typically rely on a specific structure of the model, and finding that structure requires techniques similar to those used for reducing the index to 1.

How to implement a "Generalisation" in SCL

Is it possible for a generalisation in UML to be implemented in Simatic SCL code (or Structured text code)?
The definition of a Generalisation in UML:
A generalisation is a relationship between a more general classifier and a more specific classifier. Each instance of the specific classifier is also an indirect instance of the general classifier. Thus, the specific classifier inherits the features of the more general classifier.
Features specified for instances of the general classifier are implicitly specified for instances of the specific classifier. Any constraint applying to instances of the general classifier also applies to instances of the specific classifier.
In general the answer to this is no, not really. All means of programming PLCs (ladder, ST, FBD, etc.) are generally only very lightly abstracted from the actual machine code. They are closer to assembly wrappers than to anything we would think of as a modern development language. Structured Text is closer to a very primitive Pascal; it lacks almost any sort of object-oriented feature.
The point is that PLCs and PLC programmers have long been used to an approach of extreme micromanagement when it comes to developing programs for them. The reasons for this are many, some more valid than others. Scott Whitlock wrote a good piece here outlining some of those reasons. A big one is that the maintenance guys on the factory floor are often the ones trying to troubleshoot the machines, and having clear, non-abstract, state-machine information available to them is much more valuable than an elegant, minimal formulation that strokes the ego of the system developer.
PLC programming is a ruthlessly practical industry. If you have the choice between something 10% more practical and something 90% more elegant, the practical solution will always win.
With that said - there are some who are playing in this area. I suggest a quick read of this article for some examples of trying to make ST work a bit like you are suggesting. Still, I would be cautious before putting anything like this to work in a real factory with real machines that need to be both safe and reliably making money.

Expression trees vs IL.Emit for runtime code specialization

I recently learned that it is possible to generate C# code at runtime and I would like to put this feature to use. I have code that does some very basic geometric calculations, like computing line-plane intersections, and many of those calculations are performed for the same plane or the same line over and over again. By generating code specialized for a particular plane or line, I think I should be able to gain some performance.
The problem is that I'm not sure where to begin. From reading a few blog posts and browsing MSDN documentation I've come across two possible strategies for generating code at runtime: Expression trees and IL.Emit. Using expression trees seems much easier because there is no need to learn anything about OpCodes and various other MSIL related intricacies but I'm not sure if expression trees are as fast as manually generated MSIL. So are there any suggestions on which method I should go with?
The performance of both is generally the same, as expression trees are internally traversed and emitted as IL using the same underlying system functions that you would be using yourself. It is theoretically possible to emit more efficient IL using the low-level functions, but I doubt there would be any practically important performance gain. That would depend on the task, but I have not come across any practical optimisation of hand-emitted IL compared to the IL emitted from expression trees.
I highly suggest getting ILSpy, a tool that decompiles CLR assemblies. With it you can look at the code that actually traverses the expression trees and emits the IL.
Finally, a caveat. I have used expression trees in a language parser, where function calls are bound to grammar rules that are compiled from a file at runtime. Compiled is key here. For many of the problems I came across, when what you want to achieve is known at compile time, you would not gain much performance from runtime code generation. Some CLR JIT optimizations might also be unavailable to dynamic code. This is only an opinion from my own practice, and your domain may be different, but if performance is critical I would rather look at native code and highly optimized libraries. Some of the work I have done would be snail-slow without LAPACK/MKL. But that is advice you did not ask for, so take it with a grain of salt.
If I were in your situation, I would try alternatives from high level to low level, in order of increasing "time & effort needed" and decreasing reusability, and I would stop as soon as the performance is good enough for the time being, i.e.:
first, I'd check to see if Math.NET, LAPACK or some similar numeric library already has similar functionality, or I can adapt/extend the code to my needs;
second, I'd try Expression Trees;
third, I'd check Roslyn Project (even though it is in prerelease version);
fourth, I'd think about writing common routines with unsafe C code;
[fifth, I'd think about quitting and starting a new career in a different profession :) ],
and only if none of these work out would I be so hopeless as to try emitting IL at run time.
But perhaps I'm biased against low level approaches; your expertise, experience and point of view might be different.

Resources