I am struggling to understand the difference in functionality between clp(Z) and another relational arithmetic system used in MiniKanren.
In particular, clp(Z) apparently applies to bounded fields, while Kiselyov et al.'s system is described as applying to unbounded fields.
I tried various edge cases involving infinity and indeterminacy, but I wasn't able to find clear differences other than Kiselyov et al. obviously not supporting intervals and negative numbers.
What is the point/advantage of the Kiselyov system? Is it mainly that the implementation is simpler, or is there more?
Good question!
There are many approaches to performing relational arithmetic, including CLP(Z), CLP(FD), "Kiselyov Arithmetic", and relational Peano Arithmetic. You can also restrict arithmetic to only work on ground numbers (and otherwise signal an error), or delay evaluation of arithmetic constraints until the arguments to a relation become ground enough to solve the relation deterministically.
All of these approaches are useful, and they all have their tradeoffs.
I've been thinking about writing a short paper on this topic. If you are interested, perhaps we could write it up together.
To briefly answer your question, we should keep in mind the distinction between CLP(Z) and CLP(FD). 'CLP(X)' stands for "Constraint Logic Programming over domain 'X'". CLP(FD) operates over a Finite Domain (FD) of integers. In CLP(Z), the domain is the set of all integers, and is therefore unbounded in size.
Obviously the FD domain is contained within the Z domain, so why bother having a separate CLP(FD) domain/solver? Because it may be faster or easier to solve problems within a restricted domain. Indeed, some problems that are undecidable in one domain may become decidable within a restricted domain.
In particular, clp(Z) apparently applies to bounded fields, while Kiselyov et al.'s system is described as applying to unbounded fields.
The Z domain in CLP(Z) is actually unbounded. The FD domain in CLP(FD) is bounded. In Kiselyov Arithmetic, the domain is unbounded.
Kiselyov Numerals are interesting in that one numeral can represent infinite sets of concrete numbers. For example,
(0 1 1)
is the Kiselyov Numeral representing 6. (Kiselyov Numerals are lists of binary digits in little-endian order, which is why 6 is represented with a leading '0' instead of a leading '1'.)
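For concreteness, here is a tiny Haskell sketch of that little-endian binary encoding (my illustration, not the miniKanren code; the actual relations are of course relational, and their numerals may contain fresh logic variables, which a ground encoding like this cannot express):

-- Little-endian bits: the head of the list is the least significant bit.
-- toBits 6 == [0,1,1], and fromBits [0,1,1] == 6.
toBits :: Integer -> [Integer]
toBits 0 = []
toBits n = n `mod` 2 : toBits (n `div` 2)   -- intended for non-negative n

fromBits :: [Integer] -> Integer
fromBits = foldr (\b rest -> b + 2 * rest) 0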
Consider this Kiselyov Numeral:
`(1 . ,x)
where x is a "fresh" logic variable. This Kiselyov Numeral represents any positive odd integer. This is one advantage of Kiselyov Arithmetic: operations can be performed on partially-instantiated numerals, each of which represents potentially infinitely many concrete natural numbers, without ever grounding the (potentially infinitely many) answers. Representing infinitely many natural numbers as a single numeral sometimes lets us reason about all of them at once. Alas, this only works when the underlying set of natural numbers can be represented using Kiselyov Numerals of the form
`(<bit sequence that doesn't end in 0> . ,x)
One disadvantage of Kiselyov Arithmetic is that each arithmetic relation is "solved" immediately: if we want to add two Kiselyov Numerals, then multiply the result by another Kiselyov Numeral, we have to fully perform one operation (the addition) before performing the other (the multiplication). In contrast, a CLP(Z) or CLP(FD) solver can accumulate constraints, check satisfiability at each step, and only perform the full solving at the end of the computation, once all the constraints have been accumulated. This approach can be much more efficient, and it can also find inconsistencies in a set of constraints where naive use of Kiselyov Arithmetic would diverge.
other than Kiselyov et al. obviously not supporting intervals and negative numbers.
Kiselyov Arithmetic can be extended to support negative numbers, and also to support fractions/rational numbers. I suspect that supporting intervals is also doable. Alas, I don't know of any libraries that include these extensions.
There are many other tradeoffs between different approaches to relational arithmetic, worthy of a short paper, at least! I hope this gives you some idea, however.
Cheers,
--Will
Related
I'm working with linear problems over the rationals in Z3. To use Z3, I go through SBV.
An example of a problem I pose is:
import Data.SBV

solution1 = do
  x <- sRational "x"
  w <- sRational "w"
  constrain $ x .< w
  constrain $ x + 2*w .>= 0 .|| x .== 1
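For reference, here is a hypothetical, self-contained way to run such a query (the name runSolution1 and the explicit sTrue goal are my additions, not from the question; the final pure sTrue just gives sat an explicit SBool to satisfy, so this should work even on SBV versions where a constraint-only block is not directly accepted by sat):

import Data.SBV

-- Hypothetical driver for the constraints above; prints a model if one exists.
runSolution1 :: IO SatResult
runSolution1 = sat $ do
  x <- sRational "x"
  w <- sRational "w"
  constrain $ x .< w
  constrain $ x + 2*w .>= 0 .|| x .== 1
  pure sTrue  -- explicit goal: "are the accumulated constraints satisfiable?"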
My question is:
Are these kinds of problems decidable?
I couldn't find a list of decidable theories or a way to tell if a theory is decidable.
The closest I found is this. The theory of the reals is decidable, but is it the same for the rationals? Intuition tells me that it is, but I haven't found anything that allows me to say so for certain.
Thanks in advance
SBV models rationals using the standard "two integers" idea; that is, it represents the numerator and the denominator separately as integers. This means that if you add two symbolic rationals, you'll have a non-linear term over the integers. So, in theory, the problem will be in the semi-decidable fragment. That is, even if you restrict your multiplications to concrete scalars, addition of symbolic rationals will give rise to non-linear terms over integers.
Having said that, I've had good luck using rationals: z3 was able to decide most problems of interest without much difficulty. If it proves to be an issue, you should switch to the SReal type (i.e., algebraic reals), for which z3 has a decision procedure. But of course, the models you get can now include algebraic reals, such as square-root-of-2, etc. (i.e., the roots of polynomials with integer coefficients).
Side note: If your problem allows for delta-sat (i.e., satisfiability with perturbations), you should look into dReal (http://dreal.github.io), which SBV also supports as a backend solver. But perhaps that's not what you had in mind.
Theoretical note
Strictly speaking, linear arithmetic over rationals is decidable; see Section 3 of https://www.cs.ox.ac.uk/people/james.worrell/lecture15-2015.pdf for a proof. However, SMT solvers do not support rationals out-of-the-box, and SBV (as I mentioned above) uses two symbolic integers to represent rationals. So, adding two rationals will give rise to multiplication of two symbolic integers, taking you out of the decidable fragment. Of course, in practice the solvers are quite adept at coming up with solutions even in the presence of non-linear terms; it's just that you're not always guaranteed an answer. So, a stricter answer to your question is: while linear arithmetic over rationals is decidable, the translation used by SBV puts the problem into the non-linear integer arithmetic domain, and hence decidability is not guaranteed. In any case, SMTLib does not come with a theory of rationals, so you're kind of out of luck when it comes to first-class support for them.
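To make the non-linearity concrete, here is a tiny sketch in plain Haskell (my illustration, not SBV's actual internals) of the "two integers" encoding of rationals:

-- Sketch only: a rational is a pair (numerator, denominator).
-- (a/b) + (c/d) = (a*d + c*b) / (b*d)
-- If a, b, c and d are symbolic integers, then a*d, c*b and b*d are products
-- of two unknowns, i.e. non-linear integer terms.
addRat :: (Integer, Integer) -> (Integer, Integer) -> (Integer, Integer)
addRat (a, b) (c, d) = (a*d + c*b, b*d)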
I guess a rational solution will exist iff an integer solution exists to a suitably scaled collection of constraints. For example, x=1/2(=5/10), w=3/5(=6/10) is a solution to your example problem. Scaling your problem by 10, we have the equivalent constraint set:
10*x < 10*w
(10*x + 20*w >= 0) || (10*x == 10)
Writing x'=10*x and w'=10*w, this means that x'=5, w'=6 is an integer solution to:
x' < w'
(x' + 2*w' >= 0) || (x' == 10)
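As a quick sanity check (my addition, not part of the original answer), the proposed integer solution does satisfy the scaled system:

-- x' = 5, w' = 6: 5 < 6, and 5 + 2*6 = 17 >= 0, so the disjunction holds.
checkScaled :: Bool
checkScaled = let x' = 5 :: Integer
                  w' = 6 :: Integer
              in x' < w' && (x' + 2*w' >= 0 || x' == 10)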
Presburger famously showed that first-order logic plus integers and addition is decidable. (Multiplication by a constant is also allowed, since it can be expanded to an addition -- e.g. 3*x is x+x+x.)
I guess the only trick left is to show that it's possible to choose what scaling to use without having solved the problem yet. Nothing obvious occurs to me off the top of my head, but it seems reasonable that this should be doable. For example, perhaps if you take the product of all the nonzero numerators and denominators in your constraint set, you can show that the set of rationals with that product as their denominator is indistinguishable from the full set of rationals. (If so, you could look through the proof to see if it still works with a smaller denominator.)
I'm not a z3 expert, so I can't talk about how this translates to whether that tool specifically is suitable, but it seems likely to me that it is possible to create a suitable tool.
In Haskell, why is the infix alias of mappend (from class Monoid) <> instead of +? In algebra courses + is usually used for the binary operator of a monoid.
The function + is specific to numbers, and moreover, it's only one way to implement Monoid for numbers (* is equally valid). Similarly, with booleans, it would be equally valid to use && or ||. Using the symbol + suggests that Monoids are about addition specifically, when really they're just about any associative operation (together with an identity element).
It is true that, at least in my experience, one is likely to use mappend in a fashion resembling addition: concatenating lists or vectors, taking unions of sets or maps, etc, etc. However, the Haskell mindset favors generality and adherence to mathematical principles over (arguably) what is more intuitive. It's certainly reasonable, in my opinion, to think of mappend as a sort of general addition, and make adjustments in the cases where it isn't.
Partly due to the principle of least astonishment and partly because there are at least two sensible monoid instances for numbers (namely, Sum and Product from Data.Monoid).
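For a quick illustration of those two instances (standard Data.Monoid wrappers, nothing specific to the question):

import Data.Monoid (Sum(..), Product(..))

-- Both wrappers are lawful monoids over Int, so neither + nor * gets to
-- claim the <> symbol for itself.
sumExample :: Sum Int
sumExample = Sum 2 <> Sum 3              -- Sum {getSum = 5}

productExample :: Product Int
productExample = Product 2 <> Product 3  -- Product {getProduct = 6}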
All of the type classes that I've come across have had laws that establish symmetries by specifying equations. I was wondering, though, whether there are any prominent theoretical or even practical examples of type classes that establish asymmetries, i.e. ones that demand the lack of symmetry, by e.g. specifying <expr1> /= <expr2>, or <, or not somePredicate(a, b)?
I understand that an inequality can be expressed as an equation with a free variable, e.g. a > b iff a = b + k for some positive k, but I'm thinking the introduction of free variables itself might align with my idea of enforced asymmetry.
What would be the (theoretical) applications of such law? And are there any (runnable) examples of this?
Alternatively: if this can't be considered Haskell enough to be on SO, should this go on CS or CSTheory?
Algebraic laws in general are typically specified only in terms of equational identities, not disequalities. The standpoint from which to think about this is model theory. A theory can be thought of as 1) a collection of symbols of different arities, so that sentences can be constructed from them (e.g. at arity 0 we might have numerals, at arity 1 negation, and at arity 2 addition), and 2) a set of equations that provide relations between sentences constructed from such signatures.
This lets us describe things like various arithmetic theories, groups, rings, modules, etc.
Now a model of a theory is a set of concrete assignments of mathematical objects (numbers, functions, etc) to the elements of the signature, such that the translation of sentences into the elements of the model respects these equations.
Categorically, we often think of a theory as a special sort of category of all possible sentences generated by the signature. The arrows in this category are implication -- sentences which may be generated from others by application of the equational identities. This in turn induces equivalences between all sentences which are the same under the application of the equational identities (this yields the "generators and relations" approach). And in turn, a model is simply a functor from this theory to any other category, though typically Set.
This yields a very nice adjunction between syntax and semantics. The greater the collection of sentences you want to model, the fewer the models you can get, and the more models you have, the smaller the set of sentences that will be satisfied by all of them. (Here I am only sketching the idea rather than filling in lots of important details).
In any case, one consequence of this that people tend to ignore, but that really pays off, is that in such a setting you want a "terminal model" that is the least among all models, just as you want an "initial theory" that admits all models. The terminal (aka trivial) model is the functor that sends everything in the theory to a one-element set and every operation to the unique map on that set. Lots of very nice properties emerge when you have such things. But note -- to have such things, you must only have equational identities and not "disidentities." Such theories are called Algebraic Theories.
What does this all have to do with Haskell? Well, we can think of the signatures of typeclasses exactly as signatures of algebraic theories, and the laws of them as the equations of such theories. And that's generally how typeclasses are used in Haskell and why they were introduced -- to suit these sorts of situations.
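As a small illustration (the class and method names here are made up for the example), the methods of a class form the signature and the laws are stated purely as equations:

-- A typeclass viewed as an algebraic theory: the methods are the signature,
-- and the laws (in comments, as usual in Haskell) are equational identities.
class MyMonoid m where
  munit :: m
  mop   :: m -> m -> m

-- Laws (no disequalities anywhere):
--   mop munit x     == x
--   mop x munit     == x
--   mop x (mop y z) == mop (mop x y) z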
But of course we don't have to do this -- we can have whatever laws we want. But we lose all sorts of nice properties along the way -- and often in fact find that disequalities mean our theory will have very few models, and with weird structure relating them. Since typeclasses are intended to capture common structure between various things, and since non-algebraic theories tend to fix unique(ish) models, it turns out that it is rarely the case that we would want to use disequality relations in typeclass laws. And indeed I can't think of any examples where I've seen it come up.
Here's another way to think of it -- consider a theory with equalities and disequalities both, and then eliminate the disequalities. What remains still admits all the prior models, but also may have a bunch of "unintended" ones. So we don't gain additional reasoning in terms of rewrites -- we just have certain models that are a priori excluded. Furthermore, when one wishes to rule out "unintended" models this is usually because we want to fix a particular "intended" one. And if we want to fix a particular intended model, the question immediately arises -- why not just use that concrete structure, instead of the more general typeclass?
pigworker once asked how to express that a type is infinitely differentiable. This question brought to mind the fact that in complex analysis, a function that is differentiable (on an open set) must be infinitely differentiable (on that set). Is there a way to talk about complex differentiation of datatypes? If so, does a similar theorem hold?
Not really an answer... but this rant is way too long for a comment.
I find it a bit misleading to think complex differentiability just implies infinite differentiability. It's in fact much stronger than that: if a function is complex differentiable, then its derivatives at any point determine the entire function. And because infinite differentiability gives you a full Taylor series, you have an analytic function which is equal to your function, i.e. is your function itself. So, in a sense complex differentiable functions are analytic... because they are.
From a (standard) calculus perspective, the key contrast between real diff'ability and complex diff'ability is that in the reals, there is only one direction in which you can take the limit of difference-quotients (f(x+δ) - f x)/δ. You merely require that the left limit equals the right limit. But because that's an equality after the limit, this has only an effect locally. (Topologically speaking, the constraint just compares two discrete values, so it doesn't really deal with continuity properties at all.)
OTOH, for complex differentiability we require that the limit of the difference quotient is the same if we approach x from any direction in the entire complex plane. That's an entire continuous degree of freedom constrained. You can then go on to perform topological tricks (Cauchy integrals are essentially that) to “spread” the constraint through the entire domain.
I consider this a bit problematic philosophically. Holomorphic functions aren't really functions at all, as in: they're not so much defined by the entirety of their result values across the domain, as by some way to write them with analytic formulas (i.e. possibly-infinite algebraic expressions / polynomials).
Most mathematicians and physicists apparently like this a lot – such expressions are just the way in which they generally write functions.
I don't, really, like it at all: to me, a function should be a function, something defined by individual values, like field strengths you can measure in space or results you can define in Haskell.
Anyway, I digress...
If we translate this issue from functions on numbers to functors on Haskell types, I suppose the upshot is that complex diff'ability means nothing else but: a type can be written as a (possibly infinite?) ADT polynomial. And how to get infinite differentiability for such ADTs was shown in the post you linked to.
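As a small, concrete illustration (my example, not from the linked post): the algebraic derivative of the polynomial X^2 is 2*X, and the Haskell reading of that is "the one-hole contexts of a pair":

-- X^2: a pair of a's.
data Pair a = Pair a a

-- 2*X: a hole in the first slot or in the second slot, plus the remaining a.
data PairHole a = HoleFst a | HoleSnd a

-- Filling the hole recovers a full pair.
plug :: a -> PairHole a -> Pair a
plug x (HoleFst y) = Pair x y
plug x (HoleSnd y) = Pair y x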
Another spin... perhaps closer to an answer.
These “derivatives” of Haskell types aren't really derivatives in the calculus sense. As in, they aren't motivated by a concept of small-perturbation response analysis†. It so happens that you can mathematically prove, for a very specific class of functions – those defined by an algebraic expression – that the calculus-derivative can again be written in a simple algebraic way (given by the well-known differentiation rules). That means trivially that you can differentiate infinitely often.
The usefulness of this symbolic differentiation also motivates to think about it as a more abstract operation. And when you're differentiating Haskell types, it is mainly just this algebraic definition you're going after, not the original calculus one.
Which is fine... but once you're doing algebra rather than calculus, it's not very meaningful to distinguish “real” from “complex” – it's actually neither, because you're not handling values but symbolic representations of values. An untyped language, if you will (and indeed, Haskell's type language is still untyped, with everything having kind *).
†Be it with traditional convergent limits or NSA-infinitesimals.
In System F I can define the genuine total addition function using Church numerals.
In Haskell I cannot define that function because of the bottom value. For example, in Haskell, if x + y = x, then I cannot say that y is zero - if x is bottom, x + y = x for any y. So the addition is not the true addition but an approximation to it.
In C I cannot define that function because C specification requires everything to have finite size. So in C possible approximations are even worse than in Haskell.
So we have:
In System F it's possible to define the addition, but it's not possible to have a complete implementation (because there is no infinite hardware).
In Haskell it's not possible to define the addition (because of the bottom), and it's not possible to have a complete implementation.
In C it's not possible to define the total addition function (because the semantics of everything is bounded), but compliant implementations are possible.
So all 3 formal systems (Haskell, System F and C) seem to have different design tradeoffs.
So what are the consequences of choosing one over another?
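For reference, here is the System F construction the question alludes to, transcribed into Haskell with RankNTypes (my transcription; note that in Haskell these types are still inhabited by bottom, which is exactly the question's point):

{-# LANGUAGE RankNTypes #-}

-- A Church numeral applies a function n times.
newtype Church = Church (forall a. (a -> a) -> a -> a)

czero :: Church
czero = Church (\_ z -> z)

csucc :: Church -> Church
csucc (Church n) = Church (\f z -> f (n f z))

-- Addition: m applications of f, followed by n more.
cadd :: Church -> Church -> Church
cadd (Church m) (Church n) = Church (\f z -> m f (n f z))

-- For inspection only.
toInt :: Church -> Integer
toInt (Church n) = n (+ 1) 0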
Haskell
This is a strange problem because you're working with a vague notion of =. _|_ = _|_ only "holds" (and even then you should really use ⊑) at the domain semantic level. If we distinguish between information available at the domain semantic level and equality in the language itself, then it's perfectly correct to say that True ⊑ x + y == x --> True ⊑ y == 0.
It's not addition that's the problem, and it's not natural numbers that are the problem either -- the issue is simply distinguishing between equality in the language and statements about equality or information in the semantics of the language. Absent the issue of bottoms, we can typically reason about Haskell using naive equational logic. With bottoms, we can still use equational reasoning -- we just have to be more sophisticated with our equations.
A fuller and clearer exposition of the relationship between total languages and the partial languages defined by lifting them is given in the excellent paper "Fast and Loose Reasoning is Morally Correct".
C
You claim that C requires everything (including addressable space) to have a finite size, and therefore that C semantics "impose a limit" on the size of representable naturals. Not really. The C99 standard says the following: "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type."

The rationale document further emphasizes that "C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type."
As you can see, there's explicitly no assumption that pointers must be of a finite size.
You have a set of theories to use as frameworks for your reasoning; finite reality, Haskell semantics, and System F are just a few of them.

You can choose the appropriate theory for your work, or build a new theory from scratch or from large pieces of existing theories put together. For example, you can consider the set of always-terminating Haskell programs and safely employ a bottomless semantics. In that case your addition will be correct.

For a low-level language there may be reasons to build finiteness in, but for a high-level language it is worth omitting such things, because more abstract theories allow wider application.

While programming, you use not the "language specification" theory but the "language specification + implementation limitations" theory, so there is no difference between the cases where the memory limits appear in the language specification or in the language implementation. The absence of limits becomes important when you start building purely theoretical constructions within the framework of the language semantics. For example, you may want to prove some program equivalences or language translations, and find that every unneeded detail in the language specification brings much pain to the proof.
I'm sure you've heard the aphorism that "in theory there is no difference between theory and practice, but in practice there is."
In this case, in theory there are differences, but all of these systems deal with the same finite amount of addressable memory so in practice there is no difference.
EDIT:
Assuming you can represent a natural number in any of these systems, you can represent addition in any of them. If the constraints you are concerned about prevent you from representing a natural number then you can't represent Nat*Nat addition.
Represent a natural number as a pair: a heuristic lower bound on the maximum bit size, and a lazily evaluated list of bits.
In the lambda calculus, you can represent the list as a function that, when called with true, returns the 1's bit, and when called with false, returns a function that does the same for the 2's bit, and so on.
Addition is then an operation applied to the zip of those two lazy lists that propagates a carry bit.
You of course have to represent the maximum bit size heuristic as a natural number, but if you only instantiate numbers with a bit count that is strictly smaller than the number you are representing, and your operators don't break that heuristic, then the bit size is inductively a smaller problem than the numbers you want to manipulate, so operations terminate.
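Here is a small Haskell sketch of the carry-propagating addition on little-endian bit lists described above (my illustration; the heuristic bit-size bound is left out):

-- Bits are least-significant first; the lists may be produced lazily.
type Bits = [Int]

-- Add with an explicit carry; output bits can be consumed before the
-- inputs are fully known.
addBits :: Bits -> Bits -> Bits
addBits = go 0
  where
    go 0 [] ys = ys
    go 0 xs [] = xs
    go c [] ys = go c [0] ys
    go c xs [] = go c xs [0]
    go c (x:xs) (y:ys) =
      let s = x + y + c
      in s `mod` 2 : go (s `div` 2) xs ys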
On the ease of accounting for edge cases, C will give you very little help. You can return special values to represent overflow/underflow, and even try to make them infectious (like IEEE-754 NaN), but you won't get complaints at compile time if you fail to check. You could try to trap problems by installing a handler for SIGFPE or something similar.
I cannot say that y is zero - if x is bottom, x + y = x for any y.
If you're looking to do symbolic manipulation, Matlab and Mathematica are implemented in C and C-like languages. That said, Python has a well-optimized bigint implementation that is used for all integer types. It's probably not suitable for representing really, really large numbers, though.