Can constraints be enforced on public data types? - haskell

I have the following code :
-- A CharBox is a rectangular matrix of characters
data CharBox = CharBox [String]
deriving Show
-- Build a CharBox, ensuring the contents are rectangular
mkCharBox :: [String] -> CharBox
mkCharBox [] = CharBox []
mkCharBox xxs#(x:xs) = if (all (\s -> (length s) == length x) xs)
then CharBox xxs
else error "CharBox must be a rectangle."
The [[Char]] must be rectangular (i.e. all sub-lists must have the same length) for many functions in the module to work properly. Inside the module I'm always using the mkCharBox "constructor" so I don't have to enforce this constraint all the time.
Initially I wanted my module declaration to look like this :
module CharBox (
CharBox, -- No (CharBox) because it doesn't enforce rectangularity
mkCharBox
) where
But like that, users of my module cannot pattern match on CharBox. In another module I do
findWiresRight :: CharBox -> [Int]
findWiresRight (CharBox xs) = elemIndices '-' (map last xs)
And ghci complains: Not in scope: data constructor 'CharBox'
Is it possible to enforce my constraint that CharBoxes contain only rectangular arrays, while still allowing pattern matching ? (Also if this is not possible, I'd be interested in knowing the technical reason why. I find there's usually a lot to learn in Haskell when exploring such restrictions)

It's not possible in vanilla Haskell to both hide the constructors and support pattern matching.
The usual approaches to address this are:
view patterns, essentially, export the pattern matching functions.
or:
move the invariant into the type system via size types.

The simplest solution would be to add an extract function to the module:
extract :: CharBox -> [String]
extract (CharBox xs) = xs
and then use it instead of pattern matching:
findWiresRight :: CharBox -> [Int]
findWiresRight c = elemIndices '-' $ map last $ extract c

Related

Haskell: Why ++ is not allowed in pattern matching?

Suppose we want to write our own sum function in Haskell:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
Why can't we do something like:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs
In other words why can't we use ++ in pattern matching ?
This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.
We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like
lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _ = Nothing
and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.
The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.
We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?
(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)
Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write
lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a k
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)
We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).
Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.
You can only pattern match on constructors, not on general functions.
Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function
uncons :: [a] -> Maybe (a, [a])
which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.
++ is not injective though, for example
Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]
Neither of these representations is the right one, so how should the list be deconstructed again?
What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
where xs:>ω = xs ++ [ω]
unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs
Then
sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0
Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.
A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.
You can only pattern-match on data constructors, and ++ is a function, not a data constructor.
Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.
(You might argue that "cd" could always replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)
++ isn't a constructor, it's just a plain function. You can only match on constructors.
You can use ViewPatterns or PatternSynonyms to augment your ability to pattern match (thanks #luqui).

Insufficient definition of replicate

I have a question that I think is rather tricky.
The standard prelude contains the function
replicate :: Int -> a -> [a]
The following might seem like a reasonable definition for it
replicate n x = take n [x,x,..]
But it is actually not sufficient. Why not?
I know that the replicate function is defined as:
replicate :: Int -> a -> [a]
replicate n x = take n (repeat x)
And repeat is defined as:
repeat :: a -> [a]
repeat x = xs where xs = x:xs
Is the definition insufficient (from the question) because it uses an infinite list?
First of all there is a small syntax error in the question, it should be:
replicate n x = take n [x,x..]
-- ^ no comma
but let's not be picky.
Now when you use range syntax (i.e. x..), then x should be of a type that is an instance of Enum. Indeed:
Prelude> :t \n x -> take n [x,x..]
\n x -> take n [x,x..] :: Enum a => Int -> a -> [a]
You can argue that x,x.. will only generate x, but the Haskell compiler does not know that at compile time.
So the type in replicate (in the question) is too specific: it implies a type constraint - Enum a - that is actually not necessary.
Your own definition on the other hand is perfectly fine. Haskell has no problem with infinite lists since it uses lazy evaluation. Furthermore because you define xs with xs as tail, you actually constructed a circular linked list which also is better in terms of memory usage.

Recursively merge list of lists based on shared elements

I don't know what the official technical name is for what I'm trying to do so I'll try to explain it as best I can.
Given a list of lists:
[[2,3,4,5], [1,5,6], [7,8,9]]
I want to union only the lists that have atleast one common element. So basically something like this:
simUnion :: [[Int]] -> [[Int]]
simUnion list = --...
--Result
-- [[1,2,3,4,5,6], [7,8,9]]
The problem I'm running into is running a match process between each element. Basically this is like the old math class problem where each person in a room must shake the hand of each other person. Ordinarily I'd accomplish this with a nested for loop, but how can I do this using Haskell's recursion?
Any help at all would be great!
If there is a finite number of distinct elements, you can turn the task inside out and make a Ord elem => Map elem [[elem]] out of your [[elem]] and then start iteratively merging the elements by the next algorithm:
while map isn't empty, take away a key, put it in the queue
get all the groups containing key popped from the queue
concat them and put into the queue (and in some accumulator, too)
if the queue got empty, the group is finished; take another key from the map
Note: The following post is written in literate Haskell. Save it as *.lhs and load it in GHCi. Also note that the discussed algorithm has runtime O(n²) and isn't optimal. A better approach would use union find or similar.
First, let us think about the tools we need if we want to group a single list x with the rest of the lists xs. We need to separate between the lists from xs that have an element in common with x, and we need to build the union of such lists. Therefore, we should import some functions from Data.List:
> import Data.List (partition, union)
Next, we need to check whether two lists are suitable to get merged:
> intersects :: Eq a => [a] -> [a] -> Bool
> intersects xs ys = any (`elem` ys) xs
Now we have all the tools at hand to define simUnion. The empty case is clear: if we don't have any lists, the result doesn't have any list either:
> simUnion :: Eq a => [[a]] -> [[a]]
> simUnion [] = []
Suppose we have at least two lists. We take the first one and check whether they have any element in common with any other list. We can do so by using partition:
> simUnion (x:xs) =
> let (common, noncommon) = partition (intersects x) xs
Now, common :: [[a]] will only contain those lists that have at least one element in common. There can be two cases now: either common is empty, and our list x has no element in common with any list from xs:
> in if null common
> then x : simUnion xs
We ignore uncommon here, since xs == uncommon in this case. In the other case, we need to build the union of all lists in common and x. This can be done with foldr union. However, this new list must be used in simUnion again, since it may have new intersections. For example, in
simUnion [[1,2], [2,3], [3,4]]
you want to end up with [[1,2,3,4]], not [[1,2,3],[3,4]]:
> else simUnion (foldr union x common : noncommon)
Note that the result will be unsorted, but you can map sort over it as a last step.
I have two main recommendations:
Don't think of it in terms of recursion! Instead, make liberal use of library utility functions.
Use appropriate data structures! Since you're talking about membership tests and unions, sets (from the Data.Set module) sound like they would be a better choice.
Applying those ideas, here's a fairly simple (though perhaps very naïve and suboptimal) solution:
import Data.Set (Set)
import qualified Data.Set as Set
simUnion :: Set (Set Int) -> Set (Set Int)
simUnion sets = Set.map outer sets
where outer :: Set Int -> Set Int
outer set = unionMap middle set
where middle :: Int -> Set Int
middle i = unionMap inner sets
where inner :: Set Int -> Set Int
inner set
| i `Set.member` set = set
| otherwise = Set.empty
-- | Utility function analogous to the 'concatMap' list function, but
-- for sets.
unionMap :: (Ord a, Ord b) => (a -> Set b) -> Set a -> Set b
unionMap f as = Set.unions (map f (Set.toList as))
Now using your example:
-- | This evaluates to:
--
-- >>> simUnion sampleData
-- fromList [fromList [1,2,3,4,5,6],fromList [7,8,9]]
sampleData :: Set (Set Int)
sampleData = Set.fromList (map Set.fromList sampleData')
where sampleData' :: [[Int]]
sampleData' = [[2,3,4,5], [1,5,6], [7,8,9]]
Ordinarily I'd accomplish this with a nested for loop, but how can I do this using Haskell's recursion?
You don't use recursion directly. You use higher-order functions like Set.map and unionMap. Note that these functions are analogous to loops, and that we're using them in a nested manner. Rule of thumb: imperative for loops very often translate to functional map, filter, reduce or similar operations. Nested imperative loops correspondingly often translate to nested use of such functions.

Uniqueness and other restrictions for Arbitrary in QuickCheck

I'm trying to write a modified Arbitrary instance for my data type, where (in my case) a subcomponent has a type [String]. I would ideally like to bring uniqueness in the instance itself, that way I don't need ==> headers / prerequisites for every test I write.
Here's my data type:
data Foo = Vars [String]
and the trivial Arbitrary instance:
instance Arbitrary Foo where
arbitrary = Vars <$> (:[]) <$> choose ('A','z')
This instance is strange, I know. In the past, I've had difficulty when quickcheck combinatorically explodes, so I'd like to keep these values small. Another request - how can I make an instance where the generated strings are under 4 characters, for instance?
All of this, fundamentally requires (boolean) predicates to augment Arbitrary instances. Is this possible?
Definitely you want the instance to produce only instances that match the intention of the data type. If you want all the variables to be distinct, the Arbitrary instance must reflect this. (Another question is if in this case it wouldn't make more sense to define Vars as a set, like newtype Vars = Set [String].)
I'd suggest to check for duplicates using Set or Hashtable, as nub has O(n^2) complexity, which might slow down your test considerably for larger inputs. For example:
import Control.Applicative
import Data.List (nub)
import qualified Data.Set as Set
import Test.QuickCheck
newtype Foo = Vars [String]
-- | Checks if a given list has no duplicates in _O(n log n)_.
hasNoDups :: (Ord a) => [a] -> Bool
hasNoDups = loop Set.empty
where
loop _ [] = True
loop s (x:xs) | s' <- Set.insert x s, Set.size s' > Set.size s
= loop s' xs
| otherwise
= False
-- | Always worth to test if we wrote `hasNoDups` properly.
prop_hasNoDups :: [Int] -> Property
prop_hasNoDups xs = hasNoDups xs === (nub xs == xs)
Your instance then needs to create a list of list, and each list should be randomized. So instead of (: []), which creates just a singleton list (and just one level), you need to call listOf twice:
instance Arbitrary Foo where
arbitrary = Vars <$> (listOf . listOf $ choose ('A','z'))
`suchThat` hasNoDups
Also notice that choose ('A', 'z') allows to use all characters between A and z, which includes many control characters. My guess is that you rather want something like
oneof [choose ('A','Z'), choose ('a','z')]
If you really want, you could also make hasNoDups O(n) using hash tables in the ST monad.
Concerning limiting the size: you could always have your own parametrized functions that produce different Gen Foo, but I'd say in most cases it's not necessary. Gen has it's own internal size parameter, which is increased throughout the tests (see this answer), so different sizes (as generated using listOf) of lists are covered.
But I'd suggest you to implement shrink, as this will give you much nicer counter-examples. For example, if we define (a wrong test) that tried to verify that no instance of Var contains 'a' in any of its variable:
prop_Foo_hasNoDups :: Foo -> Property
prop_Foo_hasNoDups (Vars xs) = all (notElem 'a') xs === True
we'll get ugly counter-examples such as
Vars ["RhdaJytDWKm","FHHhrqbI","JVPKGTqNCN","awa","DABsOGNRYz","Wshubp","Iab","pl"]
But adding
shrink (Vars xs) = map Vars $ shrink xs
to Arbitrary Foo makes the counter-example to be just
Vars ["a"]
suchThat :: Gen a -> (a -> Bool) -> Gen a is a way to embed Boolean predicates in a Gen. See the haddocks for more info.
Here's how you would make the instance unique:
instance Arbitrary Foo where
arbitrary = Vars <$> (:[]) <$> (:[]) <$> choose ('A','z')
`suchThat` isUnique
where
isUnique x = nub x == x

One interesting pattern

I'm solving 99 Haskell Probems. I've successfully solved problem No. 21, and when I opened solution page, the following solution was proposed:
Insert an element at a given position into a list.
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (n+1) = let (ys,zs) = split xs n in ys++x:zs
I found pattern (n + 1) interesting, because it seems to be an elegant way to convert 1-based argument of insertAt into 0-based argument of split (it's function from previous exercises, essentially the same as splitAt). The problem is that GHC did not find this pattern that elegant, in fact it says:
Parse error in pattern: n + 1
I don't think that the guy who wrote the answer was dumb and I would like to know if this kind of patterns is legal in Haskell, and if it is, how to fix the solution.
I believe it has been removed from the language, and so was likely around when the author of 99 Haskell Problems wrote that solution, but it is no longer in Haskell.
The problem with n+k patterns goes back to a design decision in Haskell, to distinguish between constructors and variables in patterns by the first character of their names. If you go back to ML, a common function definition might look like (using Haskell syntax)
map f nil = nil
map f (x:xn) = f x : map f xn
As you can see, syntactically there's no difference between f and nil on the LHS of the first line, but they have different roles; f is a variable that needs to be bound to the first argument to map while nil is a constructor that needs to be matched against the second. Now, ML makes this distinction by looking each variable up in the surrounding scope, and assuming names are variables when the look-up fails. So nil is recognized as a constructor when the lookup fails. But consider what happens when there's a typo in the pattern:
map f niil = nil
(two is in niil). niil isn't a constructor name in scope, so it gets treated as a variable, and the definition is silently interpreted incorrectly.
Haskell's solution to this problem is to require constructor names to begin with uppercase letters, and variable names to begin with lowercase letters. And, for infix operators / constructors, constructor names must begin with : while operator names may not begin with :. This also helps distinguish between deconstructing bindings:
x:xn = ...
is clearly a deconstructing binding, because you can't define a function named :, while
n - m = ...
is clearly a function definition, because - can't be a constructor name.
But allowing n+k patterns, like n+1, means that + is both a valid function name, and something that works like a constructor in patterns. Now
n + 1 = ...
is ambiguous again; it could be part of the definition of a function named (+), or it could be a deconstructing pattern match definition of n. In Haskell 98, this ambiguity was solved by declaring
n + 1 = ...
a function definition, and
(n + 1) = ...
a deconstructing binding. But that obviously was never a satisfactory solution.
Note that you can now use view patterns instead of n+1.
For example:
{-# LANGUAGE ViewPatterns #-}
module Temp where
import Data.List (splitAt)
split :: [a] -> Int -> ([a], [a])
split = flip splitAt
insertAt :: a -> [a] -> Int -> [a]
insertAt x xs (subtract 1 -> n) = let (ys,zs) = split xs n in ys++x:zs

Resources