In System F I can define the genuine total addition function using Church numerals.
In Haskell I cannot define that function, because of the bottom value. For example, in Haskell, from x + y = x I cannot conclude that y is zero: if x is bottom, then x + y = x for any y. So the addition is not the true addition but an approximation to it.
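For concreteness, here is the definition I have in mind, transcribed into Haskell with RankNTypes (a sketch; in System F the type below contains only the genuine numerals, while in Haskell it is also inhabited by bottom):

{-# LANGUAGE RankNTypes #-}

-- A Church numeral is a polymorphic iterator, as in System F.
newtype Church = Church (forall a. (a -> a) -> a -> a)

zero :: Church
zero = Church (\_ z -> z)

suc :: Church -> Church
suc (Church n) = Church (\s z -> s (n s z))

-- Addition: iterate suc m times, starting from n.
add :: Church -> Church -> Church
add (Church m) n = m suc n

toInt :: Church -> Int
toInt (Church n) = n (+ 1) 0

-- In System F, add is total. In Haskell, undefined :: Church also
-- inhabits the type, and add undefined y is bottom for every y, so
-- add x y = x no longer implies that y is zero.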
In C I cannot define that function because the C specification requires everything to have a finite size. So in C the possible approximations are even worse than in Haskell.
So we have:
In System F it's possible to define the addition, but it's not possible to have a complete implementation (because there is no infinite hardware).
In Haskell it's not possible to define the addition (because of bottom), and it's not possible to have a complete implementation.
In C it's not possible to define the total addition function (because the semantics of everything is bounded), but compliant implementations are possible.
So all 3 formal systems (Haskell, System F and C) seem to have different design tradeoffs.
So what are the consequences of choosing one over another?
Haskell
This is a strange problem because you're working with a vague notion of =. _|_ = _|_ only "holds" (and even then you should really use ⊑) at the domain semantic level. If we distinguish between information available at the domain semantic level and equality in the language itself, then it's perfectly correct to say that True ⊑ x + y == x --> True ⊑ y == 0.
It's not addition that's the problem, and it's not natural numbers that are the problem either -- the issue is simply distinguishing between equality in the language and statements about equality or information in the semantics of the language. Absent the issue of bottoms, we can typically reason about Haskell using naive equational logic. With bottoms, we can still use equational reasoning -- we just have to be more sophisticated with our equations.
A fuller and clearer exposition of the relationship between total languages and the partial languages defined by lifting them is given in the excellent paper "Fast and Loose Reasoning is Morally Correct".
C
You claim that C requires everything (including addressable space) to have a finite size, and therefore that C semantics "impose a limit" on the size of representable naturals. Not really. The C99 standard says the following: "Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type."
The rationale document further emphasizes that "C has now been implemented on a wide range of architectures. While some of these architectures feature uniform pointers which are the size of some integer type, maximally portable code cannot assume any necessary correspondence between different pointer types and the integer types. On some implementations, pointers can even be wider than any integer type."
As you can see, there's explicitly no assumption that pointers must be of a finite size.
You have a set of theories to use as frameworks for your reasoning; finite reality, Haskell semantics, and System F are just some of them.
You can choose the appropriate theory for your work, or build a new theory from scratch or from big pieces of existing theories put together. For example, you can consider the set of always-terminating Haskell programs and safely employ bottomless semantics. In that case your addition will be correct.
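For instance, a minimal sketch: on the fragment of finite, fully defined naturals, ordinary structural recursion is the genuine addition:

data Nat = Z | S Nat

-- Total on the bottomless fragment: structural recursion on the first
-- argument terminates for every finite, fully defined Nat.
add :: Nat -> Nat -> Nat
add Z     n = n
add (S m) n = S (add m n)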
For a low-level language there may be reasons to build finiteness in, but for a high-level language it is worth omitting such things, because more abstract theories allow wider application.
While programming, you use not the "language specification" theory but the "language specification + implementation limitations" theory, so there is no difference between memory limits present in the language specification and limits present in the implementation. The absence of limits becomes important when you start building purely theoretical constructions in the framework of the language semantics. For example, you may want to prove some program equivalences or language translations, and you will find that every unneeded detail in the language specification brings much pain to the proof.
I'm sure you've heard the aphorism that "in theory there is no difference between theory and practice, but in practice there is."
In this case, in theory there are differences, but all of these systems deal with the same finite amount of addressable memory so in practice there is no difference.
EDIT:
Assuming you can represent a natural number in any of these systems, you can represent addition in any of them. If the constraints you are concerned about prevent you from representing a natural number then you can't represent Nat*Nat addition.
Represent a natural number as a pair: a heuristic lower bound on the maximum bit size, and a lazily evaluated list of bits.
In the lambda calculus, you can represent the list as a function that, called with true, returns the 1's bit, and, called with false, returns a function that does the same for the 2's bit, and so on.
Addition is then an operation applied to the zip of those two lazy lists that propagates a carry bit.
You of course have to represent the maximum bit size heuristic as a natural number, but if you only instantiate numbers with a bit count that is strictly smaller than the number you are representing, and your operators don't break that heuristic, then the bit size is inductively a smaller problem than the numbers you want to manipulate, so operations terminate.
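Here is a sketch of the lazy-bit-list half of that representation in Haskell (the bit-size heuristic is omitted, and all the names are mine):

-- A natural number as a lazily evaluated, little-endian list of bits.
type Bits = [Bool]

-- Zip the two bit streams, propagating a carry bit.
addBits :: Bool -> Bits -> Bits -> Bits
addBits c (x:xs) (y:ys) = s : addBits c' xs ys
  where
    s  = x /= (y /= c)                 -- sum bit: XOR of the three inputs
    c' = (x && y) || (c && (x || y))   -- carry out: majority of the three
addBits c xs     []     = addCarry c xs
addBits c []     ys     = addCarry c ys

-- Push a remaining carry into a single bit stream.
addCarry :: Bool -> Bits -> Bits
addCarry False ys     = ys
addCarry True  []     = [True]
addCarry True  (y:ys) = not y : addCarry y ys

add :: Bits -> Bits -> Bits
add = addBits False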
On the ease of accounting for edge cases, C will give you very little help. You can return special values to represent overflow/underflow, and even try to make them infectious (like IEEE-754 NaN), but you won't get complaints at compile time if you fail to check. You could try to install a handler for SIGFPE or something similar to trap problems.
"I cannot say that y is zero - if x is bottom, x + y = x for any y."
If you're looking to do symbolic manipulation, Matlab and Mathematica are implemented in C and C-like languages. That said, Python has a well-optimized bigint implementation that is used for all integer types. It's probably not suitable for representing really, really large numbers though.
Related
I'm working with linear problems over the rationals in Z3. To use Z3, I go through SBV.
An example of a problem I pose is:
import Data.SBV

solution1 = do
  x <- sRational "x"
  w <- sRational "w"
  constrain $ x .< w
  constrain $ x + 2*w .>= 0 .|| x .== 1
My question is:
Are these kinds of problems decidable?
I couldn't find a list of decidable theories or a way to tell if a theory is decidable.
The closest I found is this. The theory of the reals is decidable, but is it the same for the rationals? Intuition tells me that it is, but I have not found anything that allows me to confirm it.
Thanks in advance
SBV models rationals using the standard "two integers" idea; that is, it represents the numerator and the denominator separately as integers. This means that if you add two symbolic rationals, you'll have a non-linear term over the integers. So, in theory, the problem will be in the semi-decidable fragment. That is, even if you restrict your multiplications to concrete scalars, addition of symbolic rationals will give rise to non-linear terms over integers.
Having said that, I've had good luck using rationals; z3 was able to decide most problems of interest without much difficulty. If it proves to be an issue, you should switch to the SReal type (i.e., algebraic reals), for which z3 has a decision procedure. But of course, the models you get can then include algebraic reals such as square-root-of-2, etc. (i.e., the roots of any polynomial with integer coefficients).
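For example, here is the problem from the question restated over algebraic reals (a sketch; solution2 is my name for it):

import Data.SBV

-- Same constraints as solution1, but over SReal, where z3 has a
-- complete decision procedure.
solution2 :: IO SatResult
solution2 = sat $ do
  x <- sReal "x"
  w <- sReal "w"
  constrain $ x .< w
  constrain $ x + 2*w .>= 0 .|| x .== 1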
Side note: If your problem allows for delta-sat (i.e., satisfiability with perturbations), you should look into dReal (http://dreal.github.io), which SBV also supports as a backend solver. But perhaps that's not what you had in mind.
Theoretical note
Strictly speaking, linear arithmetic over rationals is decidable; see Section 3 of https://www.cs.ox.ac.uk/people/james.worrell/lecture15-2015.pdf for a proof. However, SMT solvers do not support rationals out-of-the-box, and SBV (as I mentioned above) uses two symbolic integers to represent rationals. So, adding two rationals will give rise to multiplication of two symbolic integers, taking you out of the decidable fragment. Of course, in practice, the solvers are quite adept at coming up with solutions even in the presence of non-linear terms; it's just that success is not always guaranteed. So, a stricter answer to your question is that while linear arithmetic over rationals is decidable, the translation used by SBV puts the problem into the non-linear integer arithmetic domain, and hence decidability is not guaranteed. In any case, SMTLib does not come with a theory of rationals, so you're kind of out of luck when it comes to first-class support for them.
I guess a rational solution will exist iff an integer solution exists to a suitably scaled collection of constraints. For example, x=1/2(=5/10), w=3/5(=6/10) is a solution to your example problem. Scaling your problem by 10, we have the equivalent constraint set:
10*x < 10*w
(10*x + 20*w >= 0) || (10*x == 10)
Writing x'=10*x and w'=10*w, this means that x'=5, w'=6 is an integer solution to:
x' < w'
(x' + 2*w' >= 0) || (x' == 10)
Presburger famously showed that first-order logic plus integers and addition is decidable. (Multiplication by a constant is also allowed, since it can be expanded to an addition -- e.g. 3*x is x+x+x.)
I guess the only trick left is to show that it's possible to choose what scaling to use without having solved the problem yet. Nothing obvious occurs to me off the top of my head, but it seems reasonable that this should be doable. For example, perhaps if you take the product of all the nonzero numerators and denominators in your constraint set, you can show that the set of rationals with that product as their denominator is indistinguishable from the full set of rationals. (If so, you could look through the proof to see if it still works with a smaller denominator.)
I'm not a z3 expert, so I can't talk about how this translates to whether that tool specifically is suitable, but it seems likely to me that it is possible to create a suitable tool.
How does the pumping length of a regular language relate to the pumping length of a related language? For example, if A :< B :< C are all regular languages and k is the pumping length of B, do we know anything about the pumping lengths of A and C?
One might be inclined naively to think that a sublanguage has a smaller (<=) pumping length when we look at finite languages. {a,ab,abc} :< {a,ab,abc,abcd} have respective pumping lengths 4 <= 5. Taking an element out of a set can't make its longest word even longer.
On the other hand if you look at the state machine formed by the synchronized product of two languages, the intersection language and the union language have the same state machine structure, but differ in that the set of final states of the intersection is a subset of the set of final states of the union. Having more final states, could make it more probable to find a shorter path through the state machine. But on the contrary having fewer final states makes it more likely that the state machine has non-co-accessible states, and is thus reducible.
Note first that all languages over an alphabet are related to all other languages over that alphabet by some relation, however contrived. We really do need to limit discussion, as you have suggested, to something like subset in order to meaningfully scope the question.
Now, you've already correctly noted that in the general case, the subset relation doesn't have a clear bearing on the pumping lengths of relative languages. You can easily take a* and compare to {a^n} and show that a minimal DFA for a* is almost always simpler than one for {a^n}.
Let us further restrict ourselves to languages that differ by a finite number of entries; that is, L \ R is finite and R \ L is finite. This is an indicator of similarity different from subset; but if we require, w.l.o.g., that R \ L be empty, then we recover a restricted version of subset. We might now ask the same question given this modified version: for languages that differ in a finite number of entries, does the subset relation tell us anything?
The answer is still no. Consider L = a* and R = a* \ A, where A is a finite non-empty subset of a*. Even still, L takes one state and R takes potentially many more.
Restricting ourselves to finite sets only, as you suggest, does let us deduce what you propose: that a minimal automaton for R will have no more states than the one for L. Why is this? We need n+1 states along an accepting path for a string of length n, and we need a dead state to absorb strings not in the language (of which there are infinitely many). A minimal DFA for a finite language will have no loops at all (except the self-loop on the dead state), since otherwise you'd be able to accept infinitely many strings.
Your observation about taking the Cartesian product is correct, in that the result of applying the construction gives a structurally identical DFA for any set operation (union, difference, intersection, etc.); however, these DFAs are not guaranteed to be minimal and the point for finite languages still holds, namely, the DFA for intersection will have no more states than the one for union, before and after DFA minimization of both machines.
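To make the shared-structure point concrete, here is a sketch of the synchronized product (the names are mine; pass (&&) for intersection and (||) for union):

-- A DFA with state type s over a Char alphabet.
data DFA s = DFA
  { start  :: s
  , step   :: s -> Char -> s
  , accept :: s -> Bool
  }

-- Synchronized product: the transition structure is identical for every
-- set operation; only the combination of accepting predicates differs.
productDFA :: (Bool -> Bool -> Bool) -> DFA s -> DFA t -> DFA (s, t)
productDFA combine a b = DFA
  { start  = (start a, start b)
  , step   = \(s, t) c -> (step a s c, step b t c)
  , accept = \(s, t) -> accept a s `combine` accept b t
  }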
You might be able to take the Myhill-Nerode theorem and define a notion of "close-relatedness" with it that does allow you to compare languages and determine which will have the larger minimal DFA. I am not sure about that, however, since an easy way of doing so would let you compare against parameterized reference languages and, for instance, easily prove that any given non-regular language is non-regular. That might be a big deal in mathematics in its own right (that is, either it's impossible in general, or being able to do it would have consequences on the order of proving P=NP).
I'm new to F# and Haskell and am implementing a project in order to determine which language I would prefer to devote more time to.
I have numerous situations where I expect a given numerical type to have given dimensions based on parameters passed to a top-level function (i.e., at runtime). For example, in this F# snippet, I have
type DataStreamItem = LinearAlgebra.Vector<float32>

type Ball =
    {R : float32;
     X : DataStreamItem}
and I expect all instances of type DataStreamItem to have D dimensions.
My question is in the interest of algorithm development and debugging, since such shape-mismatch bugs can be a headache to pin down but should be a non-issue when the algorithm is up and running:
Is there a way, in either F# or Haskell, to constrain DataStreamItem and / or Ball to have dimensions of D? Or do I need to resort to pattern matching on every calculation?
If the latter is the case, are there any good, light-weight paradigms to catch such constraint violations as soon as they occur (and that can be removed when performance is critical)?
Edit:
To clarify the sense in which D is constrained:
D is defined such that, if you expressed the algorithm of the function main(DataStream) as a computation graph, all of the intermediate calculations would depend on the dimension D for the execution of main(DataStream). The simplest example I can think of would be a dot product of M with a DataStreamItem: the dimension of DataStream would determine the dimension parameters with which M is created.
Another Edit:
A week later, I found the following blog post outlining precisely what I was looking for in dependent types in Haskell:
https://blog.jle.im/entry/practical-dependent-types-in-haskell-1.html
And Another:
This Reddit thread contains some discussion on dependent types in Haskell, as well as a link to the quite interesting dissertation proposal of R. Eisenberg.
Neither Haskell's nor F#'s type system is rich enough to (directly) express statements of the sort "N nested instances of a recursive type T, where N is between 2 and 6" or "a string of characters exactly 6 long". Not in those exact terms, at least.
I mean, sure, you can always express such a 6-long string type as type String6 = String6 of char*char*char*char*char*char or some variant of the sort (which technically should be enough for your particular example with vectors, unless you're not telling us the whole example), but you can't say something like type String6 = s:string{s.Length=6} and, more importantly, you can't define functions of the form concat: String<n> -> String<m> -> String<n+m>, where n and m represent string lengths.
But you're not the first person asking this question. This research direction does exist, and is called "dependent types", and I can express the gist of it most generally as "having higher-order, more powerful operations on types" (as opposed to just union and intersection, as we have in ML languages) - notice how in the example above I parametrize the type String with a number, not another type, and then do arithmetic on that number.
The most prominent language prototypes (that I know of) in this direction are Agda, Idris, F*, and Coq (not really the full deal AFAIK). Check them out, but beware: this is kind of the edge of tomorrow, and I wouldn't advise starting a big project based on those languages.
(edit: apparently you can do certain tricks in Haskell to simulate dependent types, but it's not very convenient, and you have to enable UndecidableInstances)
Alternatively, you could go with a weaker solution of doing the checks at runtime. The general gist is: wrap your vector types in a plain wrapper, don't allow direct construction of it, but provide constructor functions instead, and make those constructor functions ensure the desired property (i.e. length). Something like:
type Stream4 = private Stream4 of DataStreamItem
    with
    static member create (item: DataStreamItem) =
        if item.Length = 4 then Some (Stream4 item)
        else None

    // Alternatively:
    // static member create (item: DataStreamItem) =
    //     if item.Length <> 4 then failwith "Expected a 4-long vector."
    //     Stream4 item
Here is a fuller explanation of the approach from Scott Wlaschin: constrained strings.
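For comparison, the same trick in Haskell (a sketch; the module and function names are made up): export the type but not its data constructor, so the length check in the smart constructor is the only way to build a value:

module Stream4 (Stream4, mkStream4, getStream4) where

import qualified Data.Vector.Unboxed as VU

-- The data constructor is hidden; values can only be built via mkStream4.
newtype Stream4 = Stream4 (VU.Vector Float)

mkStream4 :: VU.Vector Float -> Maybe Stream4
mkStream4 v
  | VU.length v == 4 = Just (Stream4 v)
  | otherwise        = Nothing

getStream4 :: Stream4 -> VU.Vector Float
getStream4 (Stream4 v) = v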
So if I understood correctly, you're actually not doing any type-level arithmetic, you just have a “length tag” that's shared in a chain of function calls.
This has long been possible to do in Haskell; one way that I consider quite elegant is to annotate your arrays with a standard fixed-length type of the desired length:
import qualified Data.Vector.Unboxed as VU

newtype FixVect v s = FixVect { getFixVect :: VU.Vector s }
To ensure the correct length, you only provide (polymorphic) smart constructors that construct from the fixed-length type – perfectly safe, though the actual dimension number is nowhere mentioned!
class VectorSpace v => FiniteDimensional v where
  asFixVect :: v -> FixVect v (Scalar v)

instance FiniteDimensional Float where
  asFixVect s = FixVect $ VU.singleton s

instance (FiniteDimensional a, FiniteDimensional b, Scalar a ~ Scalar b) => FiniteDimensional (a, b) where
  asFixVect (a, b) = case (asFixVect a, asFixVect b) of
    (FixVect av, FixVect bv) -> FixVect $ av <> bv
This construction from unboxed tuples is really inefficient; however, this doesn't mean you can't write efficient programs with this paradigm – if the dimension always stays constant, you only need to wrap and unwrap once, and can do all the critical operations through safe yet runtime-unchecked zips, folds and LA combinations.
Regardless, this approach isn't really widely used. Perhaps the single constant dimension is in fact too limiting for most relevant operations, and if you need to unwrap to tuples often it's way too inefficient. Another approach that is taking off these days is to actually tag the vectors with type-level numbers. Such numbers have become available in a usable form with the introduction of data kinds in GHC-7.4. Up until now, they're still rather unwieldy and not fit for proper arithmetic, but the upcoming 8.0 will greatly improve many aspects of this dependently-typed programming in Haskell.
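To give a flavour of what those type-level numbers look like (a sketch against GHC.TypeLits as of GHC 7.8+; the names Vec, mkVec and zipVec are mine):

{-# LANGUAGE DataKinds, KindSignatures, ScopedTypeVariables #-}

import Data.Proxy (Proxy (..))
import GHC.TypeLits (KnownNat, Nat, natVal)
import qualified Data.Vector.Unboxed as VU

-- A vector tagged with its length at the type level.
newtype Vec (n :: Nat) = Vec (VU.Vector Float)

-- Smart constructor: the one place where the runtime length is checked
-- against the type-level tag.
mkVec :: forall n. KnownNat n => VU.Vector Float -> Maybe (Vec n)
mkVec v
  | VU.length v == fromInteger (natVal (Proxy :: Proxy n)) = Just (Vec v)
  | otherwise                                              = Nothing

-- Once tagged, same-length operations need no further runtime checks.
zipVec :: (Float -> Float -> Float) -> Vec n -> Vec n -> Vec n
zipVec f (Vec a) (Vec b) = Vec (VU.zipWith f a b)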
A library that offers efficient length-indexed arrays is linear.
pigworker once asked how to express that a type is infinitely differentiable. This question brought to mind the fact that in complex analysis, a function that is differentiable (on an open set) must be infinitely differentiable (on that set). Is there a way to talk about complex differentiation of datatypes? If so, does a similar theorem hold?
Not really an answer... but this rant is way too long for a comment.
I find it a bit misleading to think complex differentiability just implies infinite differentiability. It's in fact much stronger than that: if a function is complex differentiable, then its derivatives at any point determine the entire function. And because infinite differentiability gives you a full Taylor series, you have an analytic function which is equal to your function, i.e. is your function itself. So, in a sense complex differentiable functions are analytic... because they are.
From a (standard) calculus perspective, the key contrast between real diff'ability and complex diff'ability is that in the reals, there is only one direction in which you can take the limit of difference-quotients (f(x+δ) - f x)/δ. You merely require that the left limit equals the right limit. But because that's an equality after the limit, this has only an effect locally. (Topologically speaking, the constraint just compares two discrete values, so it doesn't really deal with continuity properties at all.)
OTOH, for complex differentiability we require that the limit of the difference quotient is the same if we approach x from any direction in the entire complex plane. That's an entire continuous degree of freedom constrained. You can then go on to perform topological tricks (Cauchy integrals are essentially that) to “spread” the constraint through the entire domain.
I consider this a bit problematic philosophically. Holomorphic functions aren't really functions at all, as in: they're not so much defined by the entirety of their result values across the domain, as by some way to write them with analytic formulas (i.e. possibly-infinite algebraic expressions / polynomials).
Most mathematicians and physicists apparently like this a lot – such expressions are just the way in which they generally write functions.
I don't, really, like it at all: to me, a function should be a function, something defined by individual values, like field strengths you can measure in space or results you can define in Haskell.
Anyway, I digress...
If we translate this issue from functions on numbers to functors on Haskell types, I suppose the upshot is that complex diff'ability means nothing else but: a type can be written as a (possibly infinite?) ADT polynomial. And how to get infinite differentiability for such ADTs was shown in the post you linked to.
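For a concrete instance of that algebraic view (the standard one-hole-context example, not specific to the linked post): lists satisfy L(a) = 1/(1-a), and the calculus derivative 1/(1-a)^2 = L(a) * L(a) is exactly the type of lists with one hole:

-- A one-hole context in a list is a pair of lists: the elements before
-- the hole (stored reversed, zipper style) and the elements after it.
data ListDeriv a = ListDeriv [a] [a]

-- Plugging a value into the hole recovers a plain list.
plug :: ListDeriv a -> a -> [a]
plug (ListDeriv before after) x = reverse before ++ x : after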
Another spin... perhaps closer to an answer.
These “derivatives” of Haskell types aren't really derivatives in the calculus sense. As in, they aren't motivated by a concept of small-perturbation response analysis†. It so happens that you can mathematically prove, for a very specific class of functions – those defined by an algebraic expression – that the calculus derivative can again be written in a simple algebraic way (given by the well-known differentiation rules). That means, trivially, that you can differentiate infinitely often.
The usefulness of this symbolic differentiation also motivates to think about it as a more abstract operation. And when you're differentiating Haskell types, it is mainly just this algebraic definition you're going after, not the original calculus one.
Which is fine... but once you're doing algebra rather than calculus, it's not very meaningful to distinguish “real” from “complex” – it's actually neither, because you're not handling values but symbolic representations of values. An untyped language, if you will (and indeed, Haskell's type language is still untyped, with everything having kind *).
†Be it with traditional convergent limits or NSA-infinitesimals.
I'm wondering why Haskell doesn't have a single-element tuple. Is it just because nobody has needed it so far, or are there rational reasons? I found an interesting thread in a comment on the Real World Haskell website, http://book.realworldhaskell.org/read/types-and-functions.html#funcstypes.composite, and people guessed at various reasons, like:
No good syntax sugar.
It is useless.
You can think that a normal value like (1) is actually a single element tuple.
But does anyone know the actual reason, rather than a guess?
There's a lib for that!
http://hackage.haskell.org/packages/archive/OneTuple/0.2.1/doc/html/Data-Tuple-OneTuple.html
Actually, we have a OneTuple we use all the time. It's called Identity, and is now used as the base of standard pure monads in the new mtl:
http://hackage.haskell.org/packages/archive/transformers/0.2.2.0/doc/html/Data-Functor-Identity.html
And it has an important use! By virtue of providing a type constructor of kind * -> *, it can be made an instance (a trivial one, granted, though not the most trivial) of Monad, Functor, etc., which lets us use it as a base for transformer stacks.
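A quick illustration (assuming a current transformers/mtl): the plain State monad is literally StateT stacked on top of this one-tuple:

import Control.Monad.State (State, get, modify, runState)

-- State s is defined as StateT s Identity: Identity is the trivial
-- "one-tuple" monad at the bottom of the transformer stack.
tick :: State Int Int
tick = do
  modify (+ 1)
  get

main :: IO ()
main = print (runState tick 0)  -- prints (1,1)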
The exact reason is that it's totally unnecessary. Why would you need a one-tuple when you can just have its value?
The syntax also tends to be a bit clunky. In Python, you can have one-tuples, but you need a trailing comma to distinguish it from a parenthesized expression:
onetuple = (3,)
All in all, there's no reason for it. I'm sure there's no "official" reason, because the designers of Haskell probably never even considered a single-element tuple: it has no use.
I don't know if you were looking for some reasons beyond the obvious, but in this case the obvious answer is the right one.
My answer is not exactly about Haskell semantics, but about the theoretical mathematical elegance of making a value the same as its one-tuple. (So this answer should not be taken as an explanation of the standard behavior expected of a Haskell implementation, because it isn't intended as such.)
In programming languages and computation models where all functions are curried, such as lambda-calculus and combinatory logic, every function has exactly one input argument and one output/return value. No more, no less.
When we want a particular function f to have more than one input argument – say 3 – we simulate it under this curried regime by creating a 1-argument function that returns a 2-argument function. Thus, f x y z = ((f x) y) z, and f x would return a 2-argument function.
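In Haskell notation, a trivial sketch:

-- f looks like a 3-argument function, but it is a chain of 1-argument
-- functions: Int -> (Int -> (Int -> Int)).
f :: Int -> Int -> Int -> Int
f x y z = x + y + z

g :: Int -> Int   -- f 1 2 is itself a function awaiting its last argument
g = f 1 2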
Likewise, sometimes we might want to return more than one value from a function. It is not literally possible under this semantics, but we can simulate it by returning a tuple. We can generalize this.
If, for uniformity, we constrain the only return value of any function to be an (n-)tuple, we are able to harmonize some interesting features of the unit value and of supposedly non-tuple return values with the features of tuples in general, as follows.
Let's adopt as the general syntax of n-tuples the following schema, where ci is the component with index i:
(c1, c2, ..., cn)
Notice that n-tuples have delimiting parentheses in this syntax.
Under this schema, how would we represent a 0-tuple? Since it has no components, this degenerate case would be represented like this: ( ). This syntax precisely coincides with the syntax we adopt to represent the unit value. So, we are tempted to make the unit value the same as a 0-tuple.
What about a 1-tuple? It would have this representation: (v). Here a syntactical ambiguity would immediately arise: parentheses would be used in the language both as 1-tuple delimiters and as mere grouping of values or expressions. So, in a context where (v) appears, a compiler or interpreter would be unsure whether this is a 1-tuple with a component whose value is v, or just an isolated value v inside superfluous parentheses.
A way to solve this ambiguity is to force a value to be the same as the 1-tuple that would have it as its only component. Not much would be sacrificed, since the only non-empty projection we can perform on a 1-tuple is to obtain its only value.
For this to be consistently enforced, the syntactical consequence is that we would have to relax a bit our former requirement that delimiting parentheses are mandatory for all n-tuples: now they would be optional for 1-tuples, and mandatory for all other values of n. (Or we could require all values to be delimited by parentheses, but this would be inconvenient for practical use.)
In summary, under the interpretation that a 1-tuple is the same as its only component value, we could, by making syntactic puns with parentheses, consider all return values of functions in our programming language or computing model as n-tuples: the 0-tuple in the case of the unit type, 1-tuples in the case of ordinary/"atomic" values which we usually don't think of as tuples, and pairs/triples/quadruples/... for other kinds of tuples. This heterodox interpretation is mathematically parsimonious and uniform, is expressive enough to simulate functions with multiple input arguments and multiple return values, and is not incompatible with Haskell (in the sense that no harm is done if the programmer assumes this unofficial interpretation).
This was an argument by syntactic puns. Whether you are satisfied or not with it, we can do even better. A more principled argument can be taken from the mathematical theory of relations, by exploring the Cartesian product operation.
An (n-adic) relation is extensionally defined as a uniform set of (n-)tuples. (This characterization is fundamental to relational database theory and is therefore important knowledge for professional computer programmers.)
A dyadic relation – a set of pairs (2-tuples) – is a subset of the Cartesian product of 2 sets: R ⊆ A × B. For a homogeneous relation: R ⊆ A × A = A².
A triadic relation – a set of triples (3-tuples) – is a subset of the Cartesian product of 3 sets: R ⊆ A × B × C. For a homogeneous relation: R ⊆ A × A × A = A³.
A monadic relation – a set of monuples (1-tuples) – is a subset of the Cartesian product of 1 set: R ⊆ A (since A¹ = A, by the usual mathematical convention).
As we can see, a monadic relation is just a set of atomic values! This means that a set of 1-tuples is a set of atomic values. Therefore, it is convenient to consider that 1-tuples and atomic values are the same thing.