Mutually Exclusive Events (Probability Theory) - haskell

Data.Set and Data.List have not a function for intersectionS.
As I was interested in mutually exclusive events, I wrote the following function.
Is it correct? Is it efficient?
mutuallyExclusiveEvents
:: (Foldable t, Ord a) => t (S.Set a) -> Bool
mutuallyExclusiveEvents xss =
isJust $ foldr (\xs acc -> case acc of
Nothing -> Nothing
Just s -> if any (`S.member` s) xs
then Nothing
else Just (S.union xs s)
) (Just $ S.empty) xss
EXAMPLES
mutuallyExclusiveEvents [S.fromList [1,3,5], S.fromList [2,4,6], S.fromList [10,12,1]] == False
mutuallyExclusiveEvents [S.fromList [1,3,5], S.fromList [2,4,6], S.fromList [10,12]] == True

It's a really good start. I think we can identify two goals:
we don't want our algorithm to be quadratic in the number of sub-sets
we want our loop over the input set of sets to truly short-circuit as soon as we determine there is overlap
We would fail at 1 if we tried the naive solution of checking the intersection of each pair of sub-sets in turn. In your solution you've recognized or intuited that if s1 intersects with s2 then it also intersects with the union of s2 and s3 so you can accumulate a union and check for intersection in one pass and save work.
You also partially succeed at (2) in that you avoid doing meaningful work as soon as you find an intersection. The only deficiency is you still have to traverse the entire list. We'd like mutuallyExclusiveEvents to truly short-circuit, that is it should work on infinite lists. A good way to test this when you're developing is using undefined:
*Main S> mutuallyExclusiveEvents' ([S.fromList [1,3,5], S.fromList [10,12,1]] ++ undefined )
*** Exception: Prelude.undefined
CallStack (from HasCallStack):
error, called at libraries/base/GHC/Err.hs:79:14 in base:GHC.Err
undefined, called at <interactive>:95:93 in interactive:Ghci3
Due to laziness foldr (and other functions implemented in terms of it), scanr etc. can truly shortcircuit in the way that we want, e.g.:
*Main S> foldr (&&) True ([True, False] ++ undefined )
False
The trick is that if you want to be able to short-circuit when some condition holds, the function :: a -> b -> b you pass to foldr must be able to return a result without inspecting the second argument (of type b), i.e. it must be lazy in its second argument. Vice versa for foldl.
Here's the solution I came up with:
mutuallyExclusiveEvents
:: (Ord a) => [S.Set a] -> Bool
mutuallyExclusiveEvents xss =
all nonOverlapping $ zip xss $ scanl S.union S.empty xss
where nonOverlapping (s1, s2) = S.null $ S.intersection s1 s2
One way of looking at the functions above: scanl and zip are both productive ( inspecting the head of the resulting list only requires one step of evaluation) while all short-circuits in the way we've just been talking about.
Note it's less general than yours, working only over lists. I thought to rewrite it without zip but with scanl1 but was surprised to find that is not polymorphic in Traversable (there may be a good reason).
EDIT: also as you probably know you can likely get very sophisticated with your approach to this (and related) problems if you want to, especially if false-positives or approximations are okay, e.g. https://en.wikipedia.org/wiki/HyperLogLog

Related

Conditions on list comprehension using Haskell and SBV

I want to write a Haskell list comprehension with a condition on symbolic expressions (SBV). I reproduced the problem with the following small example.
import Data.SBV
allUs :: [SInteger]
allUs = [0,1,2]
f :: SInteger -> SBool
f 0 = sTrue
f 1 = sFalse
f 2 = sTrue
someUs :: [SInteger]
someUs = [u | u <- allUs, f u == sTrue]
with show someUs, this gives the following error
*** Data.SBV: Comparing symbolic values using Haskell's Eq class!
***
*** Received: 0 :: SInteger == 0 :: SInteger
*** Instead use: 0 :: SInteger .== 0 :: SInteger
***
*** The Eq instance for symbolic values are necessiated only because
*** of the Bits class requirement. You must use symbolic equality
*** operators instead. (And complain to Haskell folks that they
*** remove the 'Eq' superclass from 'Bits'!.)
CallStack (from HasCallStack):
error, called at ./Data/SBV/Core/Symbolic.hs:1009:23 in sbv-8.8.5-IR852OLMhURGkbvysaJG5x:Data.SBV.Core.Symbolic
Changing the condition into f u .== sTrue also gives an error
<interactive>:8:27: error:
• Couldn't match type ‘SBV Bool’ with ‘Bool’
Expected type: Bool
Actual type: SBool
• In the expression: f u .== sTrue
In a stmt of a list comprehension: f u .== sTrue
In the expression: [u | u <- allUs, f u .== sTrue]
How to get around this problem?
Neither your f nor your someUs are symbolically computable as written. Ideally, these should be type-errors, rejected out-of-hand. This is due to the fact that symbolic values cannot be instances of the Eq class: Why? Because determining equality of symbolic values requires a call to the underlying solver; so the result cannot be Bool; it should really be SBool. But Haskell doesn't allow generalized guards in pattern-matching to allow for that possibility. (And there are good reasons for that too, so it's not really Haskell's fault here. It's just that the two styles of programming don't work well all that great together.)
You can ask why SBV makes symbolic values an instance of the Eq class. The only reason why it's an instance of Eq is what the error message is telling you: Because we want them to be instances of the Bits class; which has Eq as a superclass requirement. But that's a whole another discussion.
Based on this, how can you write your functions in SBV? Here's how you'd code f in the symbolic style:
f :: SInteger -> SBool
f i = ite (i .== 0) sTrue
$ ite (i .== 1) sFalse
$ ite (i .== 2) sTrue
$ sFalse -- arbitrarily filled to make the function total
Ugly, but this is the only way to write it unless you want to play some quasi-quoting tricks.
Regarding someUs: This isn't something you can directly write symbolically either: This is known as a spine-concrete list. And there's no way for SBV to know how long your resulting list would be without actually running the solver on individual elements. In general you cannot do filter like functions on a spine-concrete list with symbolic elements.
The solution is to use what's known as a symbolic list and a bounded-list abstraction. This isn't very satisfactory, but is the best you can do to avoid termination problems:
{-# LANGUAGE OverloadedLists #-}
import Data.SBV
import Data.SBV.List
import Data.SBV.Tools.BoundedList
f :: SInteger -> SBool
f i = ite (i .== 0) sTrue
$ ite (i .== 1) sFalse
$ ite (i .== 2) sTrue
$ sFalse -- arbitrarily filled to make the function total
allUs :: SList Integer
allUs = [0,1,2]
someUs :: SList Integer
someUs = bfilter 10 f allUs
When I run this, I get:
*Main> someUs
[0,2] :: [SInteger]
But you'll ask what's that number 10 in the call to bfilter? Well, the idea is that all lists are assumed to have some sort of an upper bound on their length, and the Data.SBV.Tools.BoundedList exports a bunch of methods to deal with them easily; all taking a bound parameter. So long as the inputs are at most this length long, they'll work correctly. There's no guarantee as to what happens if your list is longer than the bound given. (In general it'll chop off your lists at the bound, but you should not rely on that behavior.)
There's a worked-out example of uses of such lists in coordination with BMC (bounded-model-checking) at https://hackage.haskell.org/package/sbv-8.12/docs/Documentation-SBV-Examples-Lists-BoundedMutex.html
To sum up, dealing with lists in a symbolic context comes with some costs in modeling and how much you can do, due to restrictions in Haskell (where Bool is a fixed type instead of a class), and underlying solvers, which cannot deal with recursively defined functions all that well. The latter is mainly due to the fact that such proofs require induction, and SMT-solvers cannot do induction out-of-the-box. But if you follow the rules of the game using BMC like ideas, you can handle practical instances of the problem up to reasonable bounds.
(.==) takes two instances of EqSymbolic, returning an SBool. Inside a list comprehension, conditionals are implemented using the guard function.
Here's what it looks like:
guard :: Alternative f => Bool -> f ()
guard False = empty
guard True = pure ()
For lists, empty is [], and pure () returns a singleton list [()]. Any member of the list that evaluates to False will return an empty list instead of a unit item, excluding it from computations down the chain.
[True, False, True] >>= guard
= concatMap guard [True, False, True]
= concat $ map guard [True, False, True]
= concat $ [[()], [], [()]]
= [(), ()]
The second branch is then excluded when the context is flattened, so it's "pruned" from the computation.
It seems like you have two problems here - when you pattern match in f, you're doing a comparison using the Eq class. That's where the SBV error is coming from. Since your values are close together, you could use select, which takes a list of items, a default, an expression which evaluates to an index, and attempt to take the indexth item from that list.
You could rewrite f as
f :: SInteger -> SBool
f = select [sTrue, sFalse, sTrue] sFalse
The second problem is that guards explicitly look for Bool, but (.==) still returns an SBool. Looking at Data.SBV, you should be able to coerce that into a regular Bool using unliteral, which attempts to unwrap an SBV value into an equivalent Haskell one.
fromSBool :: SBool -> Bool
fromSBool = fromMaybe False . unliteral
someUs :: [SInteger]
someUs = [u | u <- allUs, fromSBool (f u)]
-- [0 :: SInteger, 2 :: SInteger]

Haskell: Why ++ is not allowed in pattern matching?

Suppose we want to write our own sum function in Haskell:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (x:xs) = x + sum' xs
Why can't we do something like:
sum' :: (Num a) => [a] -> a
sum' [] = 0
sum' (xs++[x]) = x + sum' xs
In other words why can't we use ++ in pattern matching ?
This is a deserving question, and it has so far received sensible answers (mutter only constructors allowed, mutter injectivity, mutter ambiguity), but there's still time to change all that.
We can say what the rules are, but most of the explanations for why the rules are what they are start by over-generalising the question, addressing why we can't pattern match against any old function (mutter Prolog). This is to ignore the fact that ++ isn't any old function: it's a (spatially) linear plugging-stuff-together function, induced by the zipper-structure of lists. Pattern matching is about taking stuff apart, and indeed, notating the process in terms of the plugger-togetherers and pattern variables standing for the components. Its motivation is clarity. So I'd like
lookup :: Eq k => k -> [(k, v)] -> Maybe v
lookup k (_ ++ [(k, v)] ++ _) = Just v
lookup _ _ = Nothing
and not only because it would remind me of the fun I had thirty years ago when I implemented a functional language whose pattern matching offered exactly that.
The objection that it's ambiguous is a legitimate one, but not a dealbreaker. Plugger-togetherers like ++ offer only finitely many decompositions of finite input (and if you're working on infinite data, that's your own lookout), so what's involved is at worst search, rather than magic (inventing arbitrary inputs that arbitrary functions might have thrown away). Search calls for some means of prioritisation, but so do our ordered matching rules. Search can also result in failure, but so, again, can matching.
We have a sensible way to manage computations offering alternatives (failure and choice) via the Alternative abstraction, but we are not used to thinking of pattern matching as a form of such computation, which is why we exploit Alternative structure only in the expression language. The noble, if quixotic, exception is match-failure in do-notation, which calls the relevant fail rather than necessarily crashing out. Pattern matching is an attempt to compute an environment suitable for the evaluation of a 'right-hand side' expression; failure to compute such an environment is already handled, so why not choice?
(Edit: I should, of course, add that you only really need search if you have more than one stretchy thing in a pattern, so the proposed xs++[x] pattern shouldn't trigger any choices. Of course, it takes time to find the end of a list.)
Imagine there was some sort of funny bracket for writing Alternative computations, e.g., with (|) meaning empty, (|a1|a2|) meaning (|a1|) <|> (|a2|), and a regular old (|f s1 .. sn|) meaning pure f <*> s1 .. <*> sn. One might very well also imagine (|case a of {p1 -> a1; .. pn->an}|) performing a sensible translation of search-patterns (e.g. involving ++) in terms of Alternative combinators. We could write
lookup :: (Eq k, Alternative a) => k -> [(k, v)] -> a k
lookup k xs = (|case xs of _ ++ [(k, v)] ++ _ -> pure v|)
We may obtain a reasonable language of search-patterns for any datatype generated by fixpoints of differentiable functors: symbolic differentiation is exactly what turns tuples of structures into choices of possible substructures. Good old ++ is just the sublists-of-lists example (which is confusing, because a list-with-a-hole-for-a-sublist looks a lot like a list, but the same is not true for other datatypes).
Hilariously, with a spot of LinearTypes, we might even keep hold of holey data by their holes as well as their root, then plug away destructively in constant time. It's scandalous behaviour only if you don't notice you're doing it.
You can only pattern match on constructors, not on general functions.
Mathematically, a constructor is an injective function: each combination of arguments gives one unique value, in this case a list. Because that value is unique, the language can deconstruct it again into the original arguments. I.e., when you pattern match on :, you essentially use the function
uncons :: [a] -> Maybe (a, [a])
which checks if the list is of a form you could have constructed with : (i.e., if it is non-empty), and if yes, gives you back the head and tail.
++ is not injective though, for example
Prelude> [0,1] ++ [2]
[0,1,2]
Prelude> [0] ++ [1,2]
[0,1,2]
Neither of these representations is the right one, so how should the list be deconstructed again?
What you can do however is define a new, “virtual” constructor that acts like : in that it always seperates exactly one element from the rest of the list (if possible), but does so on the right:
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
pattern (:>) :: [a] -> a -> [a]
pattern (xs:>ω) <- (unsnoc -> Just (xs,ω))
where xs:>ω = xs ++ [ω]
unsnoc :: [a] -> Maybe ([a], a)
unsnoc [] = Nothing
unsnoc [x] = Just x
unsnoc (_:xs) = unsnoc xs
Then
sum' :: Num a => [a] -> a
sum' (xs:>x) = x + sum xs
sum' [] = 0
Note that this is very inefficient though, because the :> pattern-synonym actually needs to dig through the entire list, so sum' has quadratic rather than linear complexity.
A container that allows pattern matching on both the left and right end efficiently is Data.Sequence, with its :<| and :|> pattern synonyms.
You can only pattern-match on data constructors, and ++ is a function, not a data constructor.
Data constructors are persistent; a value like 'c':[] cannot be simplified further, because it is a fundamental value of type [Char]. An expression like "c" ++ "d", however, can replaced with its equivalent "cd" at any time, and thus couldn't reliably be counted on to be present for pattern matching.
(You might argue that "cd" could always replaced by "c" ++ "d", but in general there isn't a one-to-one mapping between a list and a decomposition via ++. Is "cde" equivalent to "c" ++ "de" or "cd" ++ "e" for pattern matching purposes?)
++ isn't a constructor, it's just a plain function. You can only match on constructors.
You can use ViewPatterns or PatternSynonyms to augment your ability to pattern match (thanks #luqui).

Functional Programming-Style Map Function that adds elements?

I know and love my filter, map and reduce, which happen to be part of more and more languages that are not really purely functional.
I found myself needing a similar function though: something like map, but instead of one to one it would be one to many.
I.e. one element of the original list might be mapped to multiple elements in the target list.
Is there already something like this out there or do I have to roll my own?
This is exactly what >>= specialized to lists does.
> [1..6] >>= \x -> take (x `mod` 3) [1..]
[1,1,2,1,1,2]
It's concatenating together the results of
> map (\x -> take (x `mod` 3) [1..]) [1..6]
[[1],[1,2],[],[1],[1,2],[]]
You do not have to roll your own. There are many relevant functions here, but I'll highlight three.
First of all, there is the concat function, which already comes in the Prelude (the standard library that's loaded by default). What this function does, when applied to a list of lists, is return the list that contains concatenated contents of the sublists.
EXERCISE: Write your own version of concat :: [[a]] -> [a].
So using concat together with map, you could write this function:
concatMap :: (a -> [b]) -> [a] -> [b]
concatMap f = concat . map f
...except that you don't actually need to write it, because it's such a common pattern that the Prelude already has it (at a more general type than what I show here—the library version takes any Foldable, not just lists).
Finally, there is also the Monad instance for list, which can be defined this way:
instance Monad [] where
return a = [a]
as >>= f = concatMap f as
So the >>= operator (the centerpiece of the Monad class), when working with lists, is exactly the same thing as concatMap.
EXERCISE: Skim through the documentation of the Data.List module. Figure out how to import the module into your code and play around with some of the functions.

Would the ability to detect cyclic lists in Haskell break any properties of the language?

In Haskell, some lists are cyclic:
ones = 1 : ones
Others are not:
nums = [1..]
And then there are things like this:
more_ones = f 1 where f x = x : f x
This denotes the same value as ones, and certainly that value is a repeating sequence. But whether it's represented in memory as a cyclic data structure is doubtful. (An implementation could do so, but this answer explains that "it's unlikely that this will happen in practice".)
Suppose we take a Haskell implementation and hack into it a built-in function isCycle :: [a] -> Bool that examines the structure of the in-memory representation of the argument. It returns True if the list is physically cyclic and False if the argument is of finite length. Otherwise, it will fail to terminate. (I imagine "hacking it in" because it's impossible to write that function in Haskell.)
Would the existence of this function break any interesting properties of the language?
Would the existence of this function break any interesting properties of the language?
Yes it would. It would break referential transparency (see also the Wikipedia article). A Haskell expression can be always replaced by its value. In other words, it depends only on the passed arguments and nothing else. If we had
isCycle :: [a] -> Bool
as you propose, expressions using it would not satisfy this property any more. They could depend on the internal memory representation of values. In consequence, other laws would be violated. For example the identity law for Functor
fmap id === id
would not hold any more: You'd be able to distinguish between ones and fmap id ones, as the latter would be acyclic. And compiler optimizations such as applying the above law would not longer preserve program properties.
However another question would be having function
isCycleIO :: [a] -> IO Bool
as IO actions are allowed to examine and change anything.
A pure solution could be to have a data type that internally distinguishes the two:
import qualified Data.Foldable as F
data SmartList a = Cyclic [a] | Acyclic [a]
instance Functor SmartList where
fmap f (Cyclic xs) = Cyclic (map f xs)
fmap f (Acyclic xs) = Acyclic (map f xs)
instance F.Foldable SmartList where
foldr f z (Acyclic xs) = F.foldr f z xs
foldr f _ (Cyclic xs) = let r = F.foldr f r xs in r
Of course it wouldn't be able to recognize if a generic list is cyclic or not, but for many operations it'd be possible to preserve the knowledge of having Cyclic values.
In the general case, no you can't identify a cyclic list. However if the list is being generated by an unfold operation then you can. Data.List contains this:
unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
The first argument is a function that takes a "state" argument of type "b" and may return an element of the list and a new state. The second argument is the initial state. "Nothing" means the list ends.
If the state ever recurs then the list will repeat from the point of the last state. So if we instead use a different unfold function that returns a list of (a, b) pairs we can inspect the state corresponding to each element. If the same state is seen twice then the list is cyclic. Of course this assumes that the state is an instance of Eq or something.

Folding across Maybes in Haskell

In an attempt to learn Haskell, I have come across a situation in which I wish to do a fold over a list but my accumulator is a Maybe. The function I'm folding with however takes in the "extracted" value in the Maybe and if one fails they all fail. I have a solution I find kludgy, but knowing as little Haskell as I do, I believe there should be a better way. Say we have the following toy problem: we want to sum a list, but fours for some reason are bad, so if we attempt to sum in a four at any time we want to return Nothing. My current solution is as follows:
import Maybe
explodingFourSum :: [Int] -> Maybe Int
explodingFourSum numberList =
foldl explodingFourMonAdd (Just 0) numberList
where explodingFourMonAdd =
(\x y -> if isNothing x
then Nothing
else explodingFourAdd (fromJust x) y)
explodingFourAdd :: Int -> Int -> Maybe Int
explodingFourAdd _ 4 = Nothing
explodingFourAdd x y = Just(x + y)
So basically, is there a way to clean up, or eliminate, the lambda in the explodingFourMonAdd using some kind of Monad fold? Or somehow currying in the >>=
operator so that the fold behaves like a list of functions chained by >>=?
I think you can use foldM
explodingFourSum numberList = foldM explodingFourAdd 0 numberList
This lets you get rid of the extra lambda and that (Just 0) in the beggining.
BTW, check out hoogle to search around for functions you don't really remember the name for.
So basically, is there a way to clean up, or eliminate, the lambda in the explodingFourMonAdd using some kind of Monad fold?
Yapp. In Control.Monad there's the foldM function, which is exactly what you want here. So you can replace your call to foldl with foldM explodingFourAdd 0 numberList.
You can exploit the fact, that Maybe is a monad. The function sequence :: [m a] -> m [a] has the following effect, if m is Maybe: If all elements in the list are Just x for some x, the result is a list of all those justs. Otherwise, the result is Nothing.
So you first decide for all elements, whether it is a failure. For instance, take your example:
foursToNothing :: [Int] -> [Maybe Int]
foursToNothing = map go where
go 4 = Nothing
go x = Just x
Then you run sequence and fmap the fold:
explodingFourSum = fmap (foldl' (+) 0) . sequence . foursToNothing
Of course you have to adapt this to your specific case.
Here's another possibility not mentioned by other people. You can separately check for fours and do the sum:
import Control.Monad
explodingFourSum xs = guard (all (/=4) xs) >> return (sum xs)
That's the entire source. This solution is beautiful in a lot of ways: it reuses a lot of already-written code, and it nicely expresses the two important facts about the function (whereas the other solutions posted here mix those two facts up together).
Of course, there is at least one good reason not to use this implementation, as well. The other solutions mentioned here traverse the input list only once; this interacts nicely with the garbage collector, allowing only small portions of the list to be in memory at any given time. This solution, on the other hand, traverses xs twice, which will prevent the garbage collector from collecting the list during the first pass.
You can solve your toy example that way, too:
import Data.Traversable
explodingFour 4 = Nothing
explodingFour x = Just x
explodingFourSum = fmap sum . traverse explodingFour
Of course this works only because one value is enough to know when the calculation fails. If the failure condition depends on both values x and y in explodingFourSum, you need to use foldM.
BTW: A fancy way to write explodingFour would be
import Control.Monad
explodingFour x = mfilter (/=4) (Just x)
This trick works for explodingFourAdd as well, but is less readable:
explodingFourAdd x y = Just (x+) `ap` mfilter (/=4) (Just y)

Resources