What to call a function that splits lists? - haskell

I want to write a function that splits lists into sublists according to what items satisfy a given property p. My question is what to call the function. I'll give examples in Haskell, but the same problem would come up in F# or ML.
split :: (a -> Bool) -> [a] -> [[a]] --- split lists into list of sublists
The sublists, concatenated, are the original list:
concat (split p xss) == xs
Every sublist satisfies the initial_p_only p property, which is to say (A) the sublist begins with an element satisfying p—and is therefore not empty, and (B) no other elements satisfy p:
initial_p_only :: (a -> Bool) -> [a] -> Bool
initial_p_only p [] = False
initial_p_only p (x:xs) = p x && all (not . p) xs
So to be precise about it,
all (initial_p_only p) (split p xss)
If the very first element in the original list does not satisfy p, split fails.
This function needs to be called something other than split. What should I call it??

I believe the function you're describing is breakBefore from the list-grouping package.
Data.List.Grouping: http://hackage.haskell.org/packages/archive/list-grouping/0.1.1/doc/html/Data-List-Grouping.html
ghci> breakBefore even [3,1,4,1,5,9,2,6,5,3,5,8,9,7,9,3,2,3,8,4,6,2,6]
[[3,1],[4,1,5,9],[2],[6,5,3,5],[8,9,7,9,3],[2,3],[8],[4],[6],[2],[6]]

I quite like some name based on the term "break" as adamse suggests. There are quite a few possible variants of the function. Here is what I'd expect (based on the naming used in F# libraries).
A function named just breakBefore would take an element before which it should break:
breakBefore :: Eq a => a -> [a] -> [[a]]
A function with the With suffix would take some kind of function that directly specifies when to break. In case of brekaing this is the function a -> Bool that you wanted:
breakBeforeWith :: (a -> Bool) -> [a] -> [[a]]
You could also imagine a function with By suffix would take a key selector and break when the key changes (which is a bit like group by, but you can have multiple groups with the same key):
breakBeforeBy :: Eq k => (a -> k) -> [a] -> [[a]]
I admit that the names are getting a bit long - and maybe the only function that is really useful is the one you wanted. However, F# libraries seem to be using this pattern quite consistently (e.g. there is sort, sortBy taking key selector and sortWith taking comparer function).
Perhaps it is possible to have these three variants for more of the list processing functions (and it's quite good idea to have some consistent naming pattern for these three types).

Related

Variable scope in a higher-order lambda function

In working through a solution to the 8 Queens problem, a person used the following line of code:
sameDiag try qs = any (\(colDist,q) -> abs (try - q) == colDist) $ zip [1..] qs
try is an an item; qs is a list of the same items.
Can someone explain how colDist and q in the lambda function get bound to anything?
How did try and q used in the body of lambda function find their way into the same scope?
To the degree this is a Haskell idiom, what problem does this design approach help solve?
The function any is a higher-order function that takes 2 arguments:
the 1st argument is of type a -> Bool, i.e. a function from a to Bool
the 2nd argument is of type [a], i.e. a list of items of type a;
i.e. the 1st argument is a function that takes any element from the list passed as the 2nd argument, and returns a Bool based on that element. (well it can take any values of type a, not just the ones in that list, but it's quite obviously certain that any won't be invoking it with some arbitrary values of a but the ones from the list.)
You can then simplify thinking about the original snippet by doing a slight refactoring:
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
where
xs = zip [1..] qs
f = (\(colDist, q) -> abs (try - q) == colDist)
which can be transformed into
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
where
xs = zip [1..] qs
f (colDist, q) = abs (try - q) == colDist)
which in turn can be transformed into
sameDiag :: Int -> [Int] -> Bool
sameDiag try qs = any f xs
where
xs = zip [1..] qs
f pair = abs (try - q) == colDist) where (colDist, q) = pair
(Note that sameDiag could also have a more general type Integral a => a -> [a] -> Bool rather than the current monomorphic one)
— so how does the pair in f pair = ... get bound to a value? well, simple: it's just a function; whoever calls it must pass along a value for the pair argument. — when calling any with the first argument set to f, it's the invocation of the function any who's doing the calling of f, with individual elements of the list xs passed in as values of the argument pair.
and, since the contents of xs is a list of pairs, it's OK to pass an individual pair from this list to f as f expects it to be just that.
EDIT: a further explanation of any to address the asker's comment:
Is this a fair synthesis? This approach to designing a higher-order function allows the invoking code to change how f behaves AND invoke the higher-order function with a list that requires additional processing prior to being used to invoke f for every element in the list. Encapsulating the list processing (in this case with zip) seems the right thing to do, but is the intent of this additional processing really clear in the original one-liner above?
There's really no additional processing done by any prior to invoking f. There is just very minimalistic bookkeeping in addition to simply iterating through the passed in list xs: invoking f on the elements during the iteration, and immediately breaking the iteration and returning True the first time f returns True for any list element.
Most of the behavior of any is "implicit" though in that it's taken care of by Haskell's lazy evaluation, basic language semantics as well as existing functions, which any is composed of (well at least my version of it below, any' — I haven't taken a look at the built-in Prelude version of any yet but I'm sure it's not much different; just probably more heavily optimised).
In fact, any is simple it's almost trivial to re-implement it with a one liner on a GHCi prompt:
Prelude> let any' f xs = or (map f xs)
let's see now what GHC computes as its type:
Prelude> :t any'
any' :: (a -> Bool) -> [a] -> Bool
— same as the built-in any. So let's give it some trial runs:
Prelude> any' odd [1, 2, 3] -- any odd values in the list?
True
Prelude> any' even [1, 3] -- any even ones?
False
Prelude> let adult = (>=18)
Prelude> any' adult [17, 17, 16, 15, 17, 18]
— see how you can sometimes write code that almost looks like English with higher-order functions?
zip :: [a] -> [b] -> [(a,b)] takes two lists and joins them into pairs, dropping any remaining at the end.
any :: (a -> Bool) -> [a] -> Bool takes a function and a list of as and then returns True if any of the values returned true or not.
So colDist and q are the first and second elements of the pairs in the list made by zip [1..] qs, and they are bound when they are applied to the pair by any.
q is only bound within the body of the lambda function - this is the same as with lambda calculus. Since try was bound before in the function definition, it is still available in this inner scope. If you think of lambda calculus, the term \x.\y.x+y makes sense, despite the x and the y being bound at different times.
As for the design approach, this approach is much cleaner than trying to iterate or recurse through the list manually. It seems quite clear in its intentions to me (with respect to the larger codebase it comes from).

Cleanest way to apply a list of boolean functions to a list?

Consider this:
ruleset = [rule0, rule1, rule2, rule3, rule4, rule5]
where rule0, rule1, etc. are boolean functions that take one argument. What is the cleanest way to find if all elements of a particular list satisfy all the rules in the ruleset?
Obviously, a loop would work, but Haskell folks always seem to have clever one-liners for these types of problems.
The all function seems appropriate (eg. all (== check_one_element) ruleset) or nested maps. Also, map ($ anElement) ruleset is roughly what I want, but for all elements.
I'm a novice at Haskell and the many ways one could approach this problem are overwhelming.
If you require all the functions to be true for each argument, then it's just
and (ruleset <*> list)
(You'll need to import Control.Applicative to use <*>.)
Explanation:
When <*> is given a pair of lists, it applies each function from the list on the left to each argument from the list on the right, and gives back a list containing all the results.
A one-liner:
import Control.Monad.Reader
-- sample data
rulesetL = [ (== 1), (>= 2), (<= 3) ]
list = [1..10]
result = and $ concatMap (sequence rulesetL) list
(The type we're working on here is Integer, but it could be anything else.)
Let me explain what's happening: rulesetL is of type [Integer -> Bool]. By realizing that (->) e is a monad, we can use
sequence :: Monad m => [m a] -> m [a]
which in our case will get specialized to type [Integer -> Bool] -> (Integer -> [Bool]). So
sequence rulesetL :: Integer -> [Bool]
will pass a value to all the rules in the list. Next, we use concatMap to apply this function to list and collect all results into a single list. Finally, calling
and :: [Bool] -> Bool
will check that all combinations returned True.
Edit: Check out dave4420's answer, it's nicer and more concise. Mine answer could help if you'd need to combine rules and apply them later on some lists. In particular
liftM and . sequence :: [a -> Bool] -> (a -> Bool)
combines several rules into one. You can also extend it to other similar combinators like using or etc. Realizing that rules are values of (->) a monad can give you other useful combinators, such as:
andRules = liftM2 (&&) :: (a -> Bool) -> (a -> Bool) -> (a -> Bool)
orRules = liftM2 (||) :: (a -> Bool) -> (a -> Bool) -> (a -> Bool)
notRule = liftM not :: (a -> Bool) -> (a -> Bool)
-- or just (not .)
etc. (don't forget to import Control.Monad.Reader).
An easier-to-understand version (without using Control.Applicative):
satisfyAll elems ruleset = and $ map (\x -> all ($ x) ruleset) elems
Personally, I like this way of writing the function, as the only combinator it uses explicitly is and:
allOkay ruleset items = and [rule item | rule <- ruleset, item <- items]

Type-safe difference lists

A common idiom in Haskell, difference lists, is to represent a list xs as the value (xs ++). Then (.) becomes "(++)" and id becomes "[]" (in fact this works for any monoid or category). Since we can compose functions in constant time, this gives us a nice way to efficiently build up lists by appending.
Unfortunately the type [a] -> [a] is way bigger than the type of functions of the form (xs ++) -- most functions on lists do something other than prepend to their argument.
One approach around this (as used in dlist) is to make a special DList type with a smart constructor. Another approach (as used in ShowS) is to not enforce the constraint anywhere and hope for the best. But is there a nice way of keeping all the desired properties of difference lists while using a type that's exactly the right size?
Yes!
We can view [a] as a free monad instance Free ((,) a) ().
Thus we can apply the scheme described by Edward Kmett in Free Monads for Less.
The type we'll get is
newtype F a = F { runF :: forall r. (() -> r) -> ((a, r) -> r) -> r }
or simply
newtype F a = F { runF :: forall r. r -> (a -> r -> r) -> r }
So runF is nothing else than the foldr function for our list!
This is called the Boehm-Berarducci encoding and it's isomorphic to the original data type (list) — so this is as small as you can possibly get.
Will Ness says:
So this type is still too "wide", it allows more than just prefixing - doesn't constrain the g function argument.
If I understood his argument correctly, he points out that you can apply the foldr (or runF) function to something different from [] and (:).
But I never claimed that you can use foldr-encoding only for concatenation. Indeed, as this name implies, you can use it to calculate any fold — and that's what Will Ness demonstrated.
It may become more clear if you forget for a moment that there's one true list type, [a]. There may be lots of list types — e.g. I can define one by
data List a = Nil | Cons a (List a)
It's be different from, but isomorphic to [a].
The foldr-encoding above is just yet another encoding of lists, like List a or [a]. It is also isomorphic to [a], as evidenced by functions \l -> F (\a f -> foldr a f l) and \x -> runF [] (:) and the fact that their compositions (in either order) is identity. But you are not obliged to convert to [a] — you can convert to List directly, using \x -> runF x Nil Cons.
The important point is that F a doesn't contain an element that is not the foldr functions for some list — nor does it contain an element that is the foldr functions for more than one list (obviously).
Thus, it doesn't contain too few or too many elements — it contains precisely as many elements as needed to exactly represent all lists.
This is not true of the difference list encoding — for example, the reverse function is not an append operation for any list.

What are the alternatives to prelude's iterate if the "output" values are not the same as those being iterated on?

I have come across a pattern where, I start with a seed value x and at each step generate a new seed value and a value to be output. My desired final result is a list of the output values. This can be represented by the following function:
my_iter :: (a -> (a, b)) -> a -> [b]
my_iter f x = y : my_iter f x'
where (x',y) = f x
And a contrived example of using this would be generating the Fibonacci numbers:
fibs:: [Integer]
fibs = my_iter (\(a,b) -> let c = a+b in ((b, c), c)) (0,1)
-- [1, 2, 3, 5, 8...
My problem is that I have this feeling that there is very likely a more idiomatic way to do this kind of stuff. What are the idiomatic alternatives to my function?
The only ones I can think of right now involve iterate from the Prelude, but they have some shortcomings.
One way is to iterate first and map after
my_iter f x = map f2 $ iterate f1 x
where f1 = fst . f
f2 = snd . f
However, this can look ugly if there is no natural way to split f into the separate f1 and f2 functions. (In the contrived Fibonacci case this is easy to do, but there are some situations where the generated value is not an "independent" function of the seed so its not so simple to split things)
The other way is to tuple the "output" values together with the seeds, and use a separate step to separate them (kind of like the "Schwartzian transform" for sorting things):
my_iter f x = map snd . tail $ iterate (f.fst) (x, undefined)
But this seems wierd, since we have to remember to ignore the generated values in order to get to the seed (the (f.fst) bit) and add we need an "undefined" value for the first, dummy generated value.
As already noted, the function you want is unfoldr. As the name suggests, it's the opposite of foldr, but it might be instructive to see exactly why that's true. Here's the type of foldr:
(a -> b -> b) -> b -> [a] -> b
The first two arguments are ways of obtaining something of type b, and correspond to the two data constructors for lists:
[] :: [a]
(:) :: a -> [a] -> [a]
...where each occurrence of [a] is replaced by b. Noting that the [] case produces a b with no input, we can consolidate the two as a function taking Maybe (a, b) as input.
(Maybe (a, b) -> b) -> ([a] -> b)
The extra parentheses show that this is essentially a function that turns one kind of transformation into another.
Now, simply reverse the direction of both transformations:
(b -> Maybe (a, b)) -> (b -> [a])
The result is exactly the type of unfoldr.
The underlying idea this demonstrates can be applied similarly to other recursive data types, as well.
The standard function you're looking for is called unfoldr.
Hoogle is a very useful tool in this case, since it doesn't only support searching functions by name, but also by type.
In your case, you came up with the desired type (a -> (a, b)) -> a -> [b]. Entering it yields no results - hmm.
Well, maybe there's a standard function with a slightly different syntax. For example, the standard function might have its arguments flipped; let's look for something with (a -> (a, b)) in its type signature somewhere. This time we're lucky as there are plenty of results, but all of them are in exotic packages and none of them seems very helpful.
Maybe the second part of your function is a better match, you want to generate a list out of some initial element after all - so type in a -> [b] and hit search. First result: unfoldr - bingo!
Another possibility is iterateM in State monad:
iterateM :: Monad m => m a -> m [a]
iterateM = sequence . repeat
It is not in standard library but it's easy to build.
So your my_iter is
evalState . sequence . repeat :: State s a -> s -> [a]

Creating a list type using functions

For a silly challenge I am trying to implement a list type using as little of the prelude as possible and without using any custom types (the data keyword).
I can construct an modify a list using tuples like so:
import Prelude (Int(..), Num(..), Eq(..))
cons x = (x, ())
prepend x xs = (x, xs)
head (x, _) = x
tail (_, x) = x
at xs n = if n == 0 then xs else at (tail xs) (n-1)
I cannot think of how to write an at (!!) function. Is this even possible in a static language?
If it is possible could you try to nudge me in the right direction without telling me the answer.
There is a standard trick known as Church encoding that makes this easy. Here's a generic example to get you started:
data Foo = A Int Bool | B String
fooValue1 = A 3 False
fooValue2 = B "hello!"
Now, a function that wants to use this piece of data must know what to do with each of the constructors. So, assuming it wants to produce some result of type r, it must at the very least have two functions, one of type Int -> Bool -> r (to handle the A constructor), and the other of type String -> r (to handle the B constructor). In fact, we could write the type that way instead:
type Foo r = (Int -> Bool -> r) -> (String -> r) -> r
You should read the type Foo r here as saying "a function that consumes a Foo and produces an r". The type itself "stores" a Foo inside a closure -- so that it will effectively apply one or the other of its arguments to the value it closed over. Using this idea, we can rewrite fooValue1 and fooValue2:
fooValue1 = \consumeA consumeB -> consumeA 3 False
fooValue2 = \consumeA consumeB -> consumeB "hello!"
Now, let's try applying this trick to real lists (though not using Haskell's fancy syntax sugar).
data List a = Nil | Cons a (List a)
Following the same format as before, consuming a list like this involves either giving a value of type r (in case the constructor was Nil) or telling what to do with an a and another List a, so. At first, this seems problematic, since:
type List a r = (r) -> (a -> List a -> r) -> r
isn't really a good type (it's recursive!). But we can instead demand that we first reduce all the recursive arguments to r first... then we can adjust this type to make something more reasonable.
type List a r = (r) -> (a -> r -> r) -> r
(Again, we should read the type List a r as being "a thing that consumes a list of as and produces an r".)
There's one final trick that's necessary. What we would like to do is to enforce the requirement that the r that our List a r returns is actually constructed from the arguments we pass. That's a little abstract, so let's give an example of a bad value that happens to have type List a r, but which we'd like to rule out.
badList = \consumeNil consumeCons -> False
Now, badList has type List a Bool, but it's not really a function that consumes a list and produces a Bool, since in some sense there's no list being consumed. We can rule this out by demanding that the type work for any r, no matter what the user wants r to be:
type List a = forall r. (r) -> (a -> r -> r) -> r
This enforces the idea that the only way to get an r that gets us off the ground is to use the (user-supplied) consumeNil function. Can you see how to make this same refinement for our original Foo type?
If it is possible could you try and nudge me in the right direction without telling me the answer.
It's possible, in more than one way. But your main problem here is that you've not implemented lists. You've implemented fixed-size vectors whose length is encoded in the type.
Compare the types from adding an element to the head of a list vs. your implementation:
(:) :: a -> [a] -> [a]
prepend :: a -> b -> (a, b)
To construct an equivalent of the built-in list type, you'd need a function like prepend with a type resembling a -> b -> b. And if you want your lists to be parameterized by element type in a straightforward way, you need the type to further resemble a -> f a -> f a.
Is this even possible in a static language?
You're also on to something here, in that the encoding you're using works fine in something like Scheme. Languages with "dynamic" systems can be regarded as having a single static type with implicit conversions and metadata attached, which obviously solves the type mismatch problem in a very extreme way!
I cannot think of how to write an at (!!) function.
Recalling that your "lists" actually encode their length in their type, it should be easy to see why it's difficult to write functions that do anything other than increment/decrement the length. You can actually do this, but it requires elaborate encoding and more advanced type system features. A hint in this direction is that you'll need to use type-level numbers as well. You'd probably enjoy doing this as an exercise as well, but it's much more advanced than encoding lists.
Solution A - nested tuples:
Your lists are really nested tuples - for example, they can hold items of different types, and their type reveals their length.
It is possible to write indexing-like function for nested tuples, but it is ugly, and it won't correspond to Prelude's lists. Something like this:
class List a b where ...
instance List () b where ...
instance List a b => List (b,a) b where ...
Solution B - use data
I recommend using data construct. Tuples are internally something like this:
data (,) a b = Pair a b
so you aren't avoiding data. The division between "custom types" and "primitive types" is rather artificial in Haskell, as opposed to C.
Solution C - use newtype:
If you are fine with newtype but not data:
newtype List a = List (Maybe (a, List a))
Solution D - rank-2-types:
Use rank-2-types:
type List a = forall b. b -> (a -> b -> b) -> b
list :: List Int
list = \n c -> c 1 (c 2 n) -- [1,2]
and write functions for them. I think this is closest to your goal. Google for "Church encoding" if you need more hints.
Let's set aside at, and just think about your first four functions for the moment. You haven't given them type signatures, so let's look at those; they'll make things much clearer. The types are
cons :: a -> (a, ())
prepend :: a -> b -> (a, b)
head :: (a, b) -> a
tail :: (a, b) -> b
Hmmm. Compare these to the types of the corresponding Prelude functions1:
return :: a -> [a]
(:) :: a -> [a] -> [a]
head :: [a] -> a
tail :: [a] -> [a]
The big difference is that, in your code, there's nothing that corresponds to the list type, []. What would such a type be? Well, let's compare, function by function.
cons/return: here, (a,()) corresponds to [a]
prepend/(:): here, both b and (a,b) correspond to [a]
head: here, (a,b) corresponds to [a]
tail: here, (a,b) corresponds to [a]
It's clear, then, that what you're trying to say is that a list is a pair. And prepend indicates that you then expect the tail of the list to be another list. So what would that make the list type? You'd want to write type List a = (a,List a) (although this would leave out (), your empty list, but I'll get to that later), but you can't do this—type synonyms can't be recursive. After all, think about what the type of at/!! would be. In the prelude, you have (!!) :: [a] -> Int -> a. Here, you might try at :: (a,b) -> Int -> a, but this won't work; you have no way to convert a b into an a. So you really ought to have at :: (a,(a,b)) -> Int -> a, but of course this won't work either. You'll never be able to work with the structure of the list (neatly), because you'd need an infinite type. Now, you might argue that your type does stop, because () will finish a list. But then you run into a related problem: now, a length-zero list has type (), a length-one list has type (a,()), a length-two list has type (a,(a,())), etc. This is the problem: there is no single "list type" in your implementation, and so at can't have a well-typed first parameter.
You have hit on something, though; consider the definition of lists:
data List a = []
| a : [a]
Here, [] :: [a], and (:) :: a -> [a] -> [a]. In other words, a list is isomorphic to something which is either a singleton value, or a pair of a value and a list:
newtype List' a = List' (Either () (a,List' a))
You were trying to use the same trick without creating a type, but it's this creation of a new type which allows you to get the recursion. And it's exactly your missing recursion which allows lists to have a single type.
1: On a related note, cons should be called something like singleton, and prepend should be cons, but that's not important right now.
You can implement the datatype List a as a pair (f, n) where f :: Nat -> a and n :: Nat, where n is the length of the list:
type List a = (Int -> a, Int)
Implementing the empty list, the list operations cons, head, tail, and null, and a function convert :: List a -> [a] is left as an easy exercise.
(Disclaimer: stole this from Bird's Introduction to Functional Programming in Haskell.)
Of course, you could represent tuples via functions as well. And then True and False and the natural numbers ...

Resources