Can good type systems distinguish between matrices in different bases?

Can good type systems distinguish between matrices in different bases? - haskell

My program (Hartree-Fock/iterative SCF) has two matrices F and F' which are really the same matrix expressed in two different bases. I just lost three hours of debugging time because I accidentally used F' instead of F. In C++, the type-checker doesn't catch this kind of error because both variables are Eigen::Matrix<double, 2, 2> objects.
I was wondering, for the Haskell/ML/etc. people, whether if you were writing this program you would have constructed a type system where F and F' had different types? What would that look like? I'm basically trying to get an idea how I can outsource some logic errors onto the type checker.
Edit: The basis of a matrix is like the unit. You can say 1L or however many gallons, they both mean the same thing. Or, to give a vector example, you can say (0,1) in Cartesian coordinates or (1,pi/2) in polar. But even though the meaning is the same, the numerical values are different.
Edit: Maybe units was the wrong analogy. I'm not looking for some kind of record type where I can specify that the first field will be litres and the second gallons, but rather a way to say that this matrix as a whole, is defined in terms of some other matrix (the basis), where the basis could be any matrix of the same dimensions. E.g., the constructor would look something like mkMatrix [[1, 2], [3, 4]] [[5, 6], [7, 8]] and then adding that object to another matrix would type-check only if both objects had the same matrix as their second parameters. Does that make sense?
Edit: definition on Wikipedia, worked examples

This is entirely possible in Haskell.
Statically checked dimensions
Haskell has arrays with statically checked dimensions, where the dimensions can be manipulated and checked statically, preventing indexing into the wrong dimension. Some examples:
This will only work on 2-D arrays:
multiplyMM :: Array DIM2 Double -> Array DIM2 Double -> Array DIM2 Double
An example from repa should give you a sense. Here, taking a diagonal requires a 2D array, returns a 1D array of the same type.
diagonal :: Array DIM2 e -> Array DIM1 e
or, from Matt sottile's repa tutorial, statically checked dimensions on a 3D matrix transform:
f :: Array DIM3 Double -> Array DIM2 Double
f u =
let slabX = (Z:.All:.All:.(0::Int))
slabY = (Z:.All:.All:.(1::Int))
u' = (slice u slabX) * (slice u slabX) +
(slice u slabY) * (slice u slabY)
in
R.map sqrt u'
Statically checked units
Another example from outside of matrix programming: statically checked units of dimension, making it a type error to confuse e.g. feet and meters, without doing the conversion.
Prelude> 3 *~ foot + 1 *~ metre
1.9144 m
or for a whole suite of SI units and quanities.
E.g. can't add things of different dimension, such as volumes and lengths:
> 1 *~ centi litre + 2 *~ inch
Error:
Expected type: Unit DVolume a1
Actual type: Unit DLength a0
So, following the repa-style array dimension types, I'd suggest adding a Base phantom type parameter to your array type, and using that to distinguish between bases. In Haskell, the index Dim
type argument gives the rank of the array (i.e. its shape), and you could do similarly.
Or, if by base you mean some dimension on the units, using dimensional types.
So, yep, this is almost a commodity technique in Haskell now, and there's some examples of designing with types like this to help you get started.

This is a very good question. I don't think you can encode the notion of a basis in most type systems, because essentially anything that the type checker does needs to be able to terminate, and making judgments about whether two real-valued vectors are equal is too difficult. You could have (2 v_1) + (2 v_2) or 2 (v_1 + v_2), for example. There are some languages which use dependent types [ wikipedia ], but these are relatively academic.
I think most of your debugging pain would be alleviated if you simply encoded the bases in which you matrix works along with the matrix. For example,
newtype Matrix = Matrix { transform :: [[Double]],
srcbasis :: [Double], dstbasis :: [Double] }
and then, when you M from basis a to b with N, check that N is from b to c, and return a matrix with basis a to c.
NOTE -- it seems most people here have programming instead of math background, so I'll provide short explanation here. Matrices are encodings of linear transformations between vector spaces. For example, if you're encoding a rotation by 45 degrees in R^2 (2-dimensional reals), then the standard way of encoding this in a matrix is saying that the standard basis vector e_1, written "[1, 0]", is sent to a combination of e_1 and e_2, namely [1/sqrt(2), 1/sqrt(2)]. The point is that you can encode the same rotation by saying where different vectors go, for example, you could say where you're sending [1,1] and [1,-1] instead of e_1=[1,0] and e_2=[0,1], and this would have a different matrix representation.
Edit 1
If you have a finite set of bases you are working with, you can do it...
{-# LANGUAGE EmptyDataDecls #-}
data BasisA
data BasisB
data BasisC
newtype Matrix a b = Matrix { coefficients :: [[Double]] }
multiply :: Matrix a b -> Matrix b c -> Matrix a c
multiply (Matrix a_coeff) (Matrix b_coeff) = (Matrix multiplied) :: Matrix a c
where multiplied = undefined -- your algorithm here
Then, in ghci (the interactive Haskell interpreter),
*Matrix> let m = Matrix [[1, 2], [3, 4]] :: Matrix BasisA BasisB
*Matrix> m `multiply` m
<interactive>:1:13:
Couldn't match expected type `BasisB'
against inferred type `BasisA'
*Matrix> let m2 = Matrix [[1, 2], [3, 4]] :: Matrix BasisB BasisC
*Matrix> m `multiply` m2
-- works after you finish defining show and the multiplication algorithm

While I realize this does not strictly address the (clarified) question – my apologies – it seems relevant at least in relation to Don Stewart's popular answer...
I am the author of the Haskell dimensional library that Don referenced and provided examples from. I have also been writing – somewhat under the radar – an experimental rudimentary linear algebra library based on dimensional. This linear algebra library statically tracks the sizes of vectors and matrices as well as the physical dimensions ("units") of their elements on a per element basis.
This last point – tracking physical dimensions on a per element basis – is rather challenging and perhaps overkill for most uses, and one could even argue that it makes little mathematical sense to have quantities of different physical dimensions as elements in any given vector/matrix. However, some linear algebra applications of interest to me such as kalman filtering and weighted least squares estimation typically use heterogeneous state vectors and covariance matrices.
Using a Kalman filter as an example, consider a state vector x = [d, v] which has physical dimensions [L, LT^-1]. The next (future) state vector is predicted by multiplication by the state transition matrix F, i.e.: x' = F x_. Clearly for this equation to make sense F cannot be arbitrary but must have size and physical dimensions [[1, T], [T^-1, 1]]. The predict_x' function below statically ensures that this relationship holds:
predict_x' :: (Num a, MatrixVector f x x) => Mat f a -> Vec x a -> Vec x a
predict_x' f x_ = f |*< x_
(The unsightly operator |*< denotes multiplication of a matrix on the left with a vector on the right.)
More generally, for an a priori state vector x_ of arbitrary size and with elements of arbitrary physical dimensions, passing a state transition matrix f with "incompatible" size and/or physical dimensions to predict_x' will cause a compile time error.

In F# (which originally evolved from OCaml), you can use units of measure. Andrew Kenned, who designed the feature (and also created a very interesting theory behind it) has a great series of articles that demonstrate it.
This can quite likely be used in your scenario - although I don't fully understand the question. For example, you can declare two unit types like this:
[<Measure>] type litre
[<Measure>] type gallon
Adding litres and gallons gives you a compile time error:
1.0<litre> + 1.0<gallon> // Error!
F# doesn't automatically insert conversion between different units, but you can write a conversion function:
let toLitres gal = gal * 3.78541178<litre/gallon>
1.0<litre> + (toLitres 1.0<gallon>)
The beautiful thing about units of measure in F# is that they are automatically inferred and functions are generic. If you multiply 1.0<gallon> * 1.0<gallon>, the result is 1.0<gallon^2>.
People have used this feature for various things - ranging from conversion of virtual meters to screen pixels (in solar system simulations) to converting currencies (dollars in financial systems). Although I'm not expert, it is quite likely that you could use it in some way for your problem domain too.

If it's expressed in a different base, you can just add a template parameter to act as the base. That will differentiate those types. A float is a float is a float- if you don't want two float values to be the same if they actually have the same value, then you need to tell the type system about it.

Related

How to filter by predicate on index in Repa

I have two Repa arrays a1 and a2 and I would like to eliminate all the elements in a2 for which the corresponding index in a1 is above a certain threshold. For example:
import qualified Data.Array.Repa as R -- for Repa
import Data.Array.Repa (Z (..), (:.)(..))
a1 = R.fromFunction (Z :. 4) $ \(Z :. x) -> [8, 15, 9, 14] ! x
a2 = R.fromFunction (Z :. 4) $ \(Z :. x) -> [0, 1, 2, 3] ! x
threshold = 10
desired = R.fromFunction (Z :. 2) $ \(Z :. x) -> [0, 2] ! x
-- 15 and 14 are above the threshold, 10
One way to do this is with selectP but I would like to avoid using this, since it computes the arrays, and I would like my arrays to remain in delayed form, if possible.
Another way is with the repa-array, but stack solver does not seem to know how to import this library with resolver nightly-2017-04-10.

One way to look at this issue is that, in order to create a Repa Array, you need to know the size (extent) of the Array upon creation (eg. fromFunction), but, in case of filter operation, there is no way to know the size of the resulting Array in repa without applying a thresholding predicate, essentially computing values of the resulting Array.
Another way to look at it is, Delayed array is a simple function from an index to a value, which is fine for most operations. For filtering though, when you apply a predicate, in order to find a value at a particular index, you now need to know all values that come before that index in the resulting array, cause for any location, a value may be there, maybe not.
vector package solves this issue elegantly with stream fusion, and repa-array, next version of Repa, which is still in experimental stage, seems to be trying to use a similar approach, except with extention to higher dimensions (I might be wrong, haven't looked too closely).
So, short answer, there is no way to do filtering with Repa style functional fusion. Either:
stick to selectP - faster (probably), but less memory efficient (for sure), or
piggy back onto ifilter from vector package for sequential
filtering

You can build a list of pairs with zip, then filter by a predicate function with the type (Int,Int) -> Bool and lastly extract the first or second element of the pair (depending on which one you want) by using map fst or map snd respectively. Everything you need for this is in Prelude.
I hope this is enough information so you can put the pieces together yourself. If in doubt, look at the type signatures of the functions i mentioned.

Rendering Fractals: The Mobius Transformation and The Newtonian Basin

I understand how to render (two dimensional) "Escape Time Group" fractals (Julia and Mandelbrot), but I can't seem to get a Mobius Transformation or a Newton Basin rendered.
I'm trying to render them using the same method (by recursively using the polynomial equation on each pixel 'n' times), but I have a feeling these fractals are rendered using totally different methods. Mobius 'Transformation' implies that an image must already exist, and then be transformed to produce the geometry, and the Newton Basin seems to plot each point, not just points that fall into a set.
How are these fractals graphed? Are they graphed using the same iterative methods as the Julia and Mandelbrot?
Equations I'm Using:
Julia: Zn+1 = Zn^2 + C
Where Z is a complex number representing a pixel, and C is a complex constant (Correct).
Mandelbrot: Cn+1 = Cn^2 + Z
Where Z is a complex number representing a pixel, and C is the complex number (0, 0), and is compounded each step (The reverse of the Julia, correct).
Newton Basin: Zn+1 = Zn - (Zn^x - a) / (Zn^y - a)
Where Z is a complex number representing a pixel, x and y are exponents of various degrees, and a is a complex constant (Incorrect - creating a centered, eight legged 'line star').
Mobius Transformation: Zn+1 = (aZn + b) / (cZn + d)
Where Z is a complex number representing a pixel, and a, b, c, and d are complex constants (Incorrect, everything falls into the set).
So how are the Newton Basin and Mobius Transformation plotted on the complex plane?
Update: Mobius Transformations are just that; transformations.
"Every Möbius transformation is
a composition of translations,
rotations, zooms (dilations) and
inversions."
To perform a Mobius Transformation, a shape, picture, smear, etc. must be present already in order to transform it.
Now how about those Newton Basins?
Update 2: My math was wrong for the Newton Basin. The denominator at the end of the equation is (supposed to be) the derivative of the original function. The function can be understood by studying 'NewtonRoot.m' from the MIT MatLab source-code. A search engine can find it quite easily. I'm still at a loss as to how to graph it on the complex plane, though...
Newton Basin:
f(x) = x - f(x) / f'(x)

In Mandelbrot and Julia sets you terminate the inner loop if it exceeds a certain threshold as a measurement how fast the orbit "reaches" infinity
if(|z| > 4) { stop }
For newton fractals it is the other way round: Since the newton method is usually converging towards a certain value we are interested how fast it reaches its limit, which can be done by checking when the difference of two consecutive values drops below a certain value (usually 10^-9 is a good value)
if(|z[n] - z[n-1]| < epsilon) { stop }

How do I define a collection of values with units?

I want to define a collection of data arrays and associate each with units. For instance, one member of the collection could be Vector Float to which I somehow associate the units meters. Another member could be Vector Int to which I associate the units counts. Another could be Vector Double to which I associate the units centimeters. My use case is that for each member of the collection, I want to:
Display the precision of the data (i.e. Int vs Float vs Double vs Int64)
Display the numeric values according to the precision (i.e. [1.0, 3.0, 4.0] or [1, 3, 4] as appropriate).
Display the significance of the values, including units (i.e. Length meters, Count, Length centimeters).
Each data array is read from a file that defines the array of values, the precision, and the units, so none of these are available at compile time.
How can I define the type for such a collection, say, as an IntMap? Can I do so using existing units related libraries or do I need to roll my own? I am open to using any GHC extensions needed to get the job done (such as GADTs).
I have encountered the following issues in trying to formulate a solution.
Vector Float, Vector Double, and Vector Int are all different types, so I cannot put them into the same collection without existential quantification such as in the answer to this question. However, if I use existential quantification using the Num class as a constraint, I remove the precision information.
The units in the dimensional package are at the type level, so arrays with quantity Length cannot be in the same collection as arrays with quantity Time. Again, this might be solved with existential quantification, and again, destroying the needed information. Other units packages I've found treat units on the type level as well.
One possible solution is to take the units and precision information out of the type level and into the data level:
data Quantity = Length LengthUnit
| Time TimeUnit
| Count
| Angle AngleUnit
etc
data TimeUnit = Sec | MSec | Hour | Day
data AngleUnit = Deg | Rad | etc
data LengthUnit = Meter | Centimeter | Mile
etc
data Precision = Int | Float | Double
data DataArray quant prec = forall e . Num e => DataArray Quantity Precision (Vector e)
(This simple units system could be improved by adding prefixes, conversion factors, etc, like units packages usually do. The example above is just for illustrative purposes.)
This presents a number of problems as well. For one, it's easy to define a DataArray that has inconsistent type such as DataArray Angle Int (Vector Float). It also smells of reinventing the wheel, both in implementing a whole new units system and redundantly defining the precision of the data.
I feel like I am overcomplicating something that should be simple. What is a type strategy that achieves my usage goals?

Stuck with Apostolico-Crochemore algorithm

I am trying to understand the Apostolico-Crochemore algorithm.
The only English description I have found is http://www-igm.univ-mlv.fr/~lecroq/string/node12.html#SECTION00120, but I am stuck with the second line of the description where it says
x is a power of a single character
What does that mean?
m in this case is the length of the pattern, c is a character from the alphabet in use. I can't understand how x == c^m.
This is then followed by (x=(a^l)bu for a, b in Sigma, u in Sigma and a neq b that also uses ^ operation which I cannot understand.

Algorithms on strings are sometimes described in the jargon of formal languages, where concatenation (joining) of strings is written as multiplication: x * y, usually written just xy, means "the string x followed by the string y". So x^n (i.e. "raising the string x to the nth power") naturally means "n copies of the string x, joined together".
This is mostly just a notational device, though multiplication (of ordinary real numbers) and string concatenation do share some abstract mathematical properties. E.g. they are both associative: (xy)z = x(yz), whether we're talking about multiplying numbers or joining strings. (OTOH, xy = yx for real numbers but not for strings, in general. But then matrix multiplication is not commutative either.)

Is there a language with constrainable types?

Is there a typed programming language where I can constrain types like the following two examples?
A Probability is a floating point number with minimum value 0.0 and maximum value 1.0.
type Probability subtype of float
where
max_value = 0.0
min_value = 1.0
A Discrete Probability Distribution is a map, where: the keys should all be the same type, the values are all Probabilities, and the sum of the values = 1.0.
type DPD<K> subtype of map<K, Probability>
where
sum(values) = 1.0
As far as I understand, this is not possible with Haskell or Agda.

What you want is called refinement types.
It's possible to define Probability in Agda: Prob.agda
The probability mass function type, with sum condition is defined at line 264.
There are languages with more direct refinement types than in Agda, for example ATS

You can do this in Haskell with Liquid Haskell which extends Haskell with refinement types. The predicates are managed by an SMT solver at compile time which means that the proofs are fully automatic but the logic you can use is limited by what the SMT solver handles. (Happily, modern SMT solvers are reasonably versatile!)
One problem is that I don't think Liquid Haskell currently supports floats. If it doesn't though, it should be possible to rectify because there are theories of floating point numbers for SMT solvers. You could also pretend floating point numbers were actually rational (or even use Rational in Haskell!). With this in mind, your first type could look like this:
{p : Float | p >= 0 && p <= 1}
Your second type would be a bit harder to encode, especially because maps are an abstract type that's hard to reason about. If you used a list of pairs instead of a map, you could write a "measure" like this:
measure total :: [(a, Float)] -> Float
total [] = 0
total ((_, p):ps) = p + probDist ps
(You might want to wrap [] in a newtype too.)
Now you can use total in a refinement to constrain a list:
{dist: [(a, Float)] | total dist == 1}
The neat trick with Liquid Haskell is that all the reasoning is automated for you at compile time, in return for using a somewhat constrained logic. (Measures like total are also very constrained in how they can be written—it's a small subset of Haskell with rules like "exactly one case per constructor".) This means that refinement types in this style are less powerful but much easier to use than full-on dependent types, making them more practical.

Perl6 has a notion of "type subsets" which can add arbitrary conditions to create a "sub type."
For your question specifically:
subset Probability of Real where 0 .. 1;
and
role DPD[::T] {
has Map[T, Probability] $.map
where [+](.values) == 1; # calls `.values` on Map
}
(note: in current implementations, the "where" part is checked at run-time, but since "real types" are checked at compile-time (that includes your classes), and since there are pure annotations (is pure) inside the std (which is mostly perl6) (those are also on operators like *, etc), it's only a matter of effort put into it (and it shouldn't be much more).
More generally:
# (%% is the "divisible by", which we can negate, becoming "!%%")
subset Even of Int where * %% 2; # * creates a closure around its expression
subset Odd of Int where -> $n { $n !%% 2 } # using a real "closure" ("pointy block")
Then you can check if a number matches with the Smart Matching operator ~~:
say 4 ~~ Even; # True
say 4 ~~ Odd; # False
say 5 ~~ Odd; # True
And, thanks to multi subs (or multi whatever, really – multi methods or others), we can dispatch based on that:
multi say-parity(Odd $n) { say "Number $n is odd" }
multi say-parity(Even) { say "This number is even" } # we don't name the argument, we just put its type
#Also, the last semicolon in a block is optional

Nimrod is a new language that supports this concept. They are called Subranges. Here is an example. You can learn more about the language here link
type
TSubrange = range[0..5]

For the first part, yes, that would be Pascal, which has integer subranges.

The Whiley language supports something very much like what you are saying. For example:
type natural is (int x) where x >= 0
type probability is (real x) where 0.0 <= x && x <= 1.0
These types can also be implemented as pre-/post-conditions like so:
function abs(int x) => (int r)
ensures r >= 0:
//
if x >= 0:
return x
else:
return -x
The language is very expressive. These invariants and pre-/post-conditions are verified statically using an SMT solver. This handles examples like the above very well, but currently struggles with more complex examples involving arrays and loop invariants.

For anyone interested, I thought I'd add an example of how you might solve this in Nim as of 2019.
The first part of the questions is straightfoward, since in the interval since since this question was asked, Nim has gained the ability to generate subrange types on floats (as well as ordinal and enum types). The code below defines two new float subranges types, Probability and ProbOne.
The second part of the question is more tricky -- defining a type with constrains on a function of it's fields. My proposed solution doesn't directly define such a type but instead uses a macro (makePmf) to tie the creation of a constant Table[T,Probability] object to the ability to create a valid ProbOne object (thus ensuring that the PMF is valid). The makePmf macro is evaluated at compile time, ensuring that you can't create an invalid PMF table.
Note that I'm a relative newcomer to Nim so this may not be the most idiomatic way to write this macro:
import macros, tables
type
Probability = range[0.0 .. 1.0]
ProbOne = range[1.0..1.0]
macro makePmf(name: untyped, tbl: untyped): untyped =
## Construct a Table[T, Probability] ensuring
## Sum(Probabilities) == 1.0
# helper templates
template asTable(tc: untyped): untyped =
tc.toTable
template asProb(f: float): untyped =
Probability(f)
# ensure that passed value is already is already
# a table constructor
tbl.expectKind nnkTableConstr
var
totprob: Probability = 0.0
fval: float
newtbl = newTree(nnkTableConstr)
# create Table[T, Probability]
for child in tbl:
child.expectKind nnkExprColonExpr
child[1].expectKind nnkFloatLit
fval = floatVal(child[1])
totprob += Probability(fval)
newtbl.add(newColonExpr(child[0], getAst(asProb(fval))))
# this serves as the check that probs sum to 1.0
discard ProbOne(totprob)
result = newStmtList(newConstStmt(name, getAst(asTable(newtbl))))
makePmf(uniformpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25})
# this static block will show that the macro was evaluated at compile time
static:
echo uniformpmf
# the following invalid PMF won't compile
# makePmf(invalidpmf, {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.15})
Note: A cool benefit of using a macro is that nimsuggest (as integrated into VS Code) will even highlight attempts to create an invalid Pmf table.

Modula 3 has subrange types. (Subranges of ordinals.) So for your Example 1, if you're willing to map probability to an integer range of some precision, you could use this:
TYPE PROBABILITY = [0..100]
Add significant digits as necessary.
Ref: More about subrange ordinals here.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string