Out-of-bounds `select` even though I `constrain` the index - haskell

I have a static-length list of values ks :: [SInt16] and an index x :: SInt16. I'd like to index into the list using x:
(.!) :: (Mergeable a) => [a] -> SInt16 -> a
xs .! i = select xs (error "(.!) : out of bounds") i
I would expect to be able to use (.!) with a sufficiently constrained x like this:
sat $ do
    let ks = [1, 3, 5, 2, 4]
    x <- sInt16 "x"
    constrain $ 0 .<= x .&& x .< literal (fromIntegral $ length ks)
    let y = ks .! x
    return $ y .< x
However, this fails with the error coming from (.!).
Of course, in my real program, I use (.!) all over the place in locations where there is no suitable default value to use in select.
How can I index into a list with a constrained-to-be-in-bounds index?

Simple solution
select is fully expanded by SBV during symbolic execution, hence you do have to provide a proper default value, as you found out. So, if you do want to use select you have to come up with an actual value there.
To address your immediate need, I'd suggest simply defining:
(.!) :: (Mergeable a) => [a] -> SInt16 -> a
[] .! _ = error "(.!): Empty list!"
xs@(x:_) .! i = select xs x i
So long as you make sure you have asserted enough constraints on i, this should work just fine.
A slightly better approach
The above requires your user to keep track of proper constraints on the index variable, and this can get rather hairy. A simple trick to use in these cases is to use a "smart" constructor instead. First define:
import Data.SBV
mkIndex :: SIntegral b => String -> [a] -> Symbolic (SBV b)
mkIndex nm lst = do i <- free nm
                    constrain $ 0 .<= i .&& i .< literal (fromIntegral (length lst))
                    pure i
(.!) :: (Mergeable a) => [a] -> SInt16 -> a
[] .! _ = error "(.!): Empty list!"
xs@(x:_) .! i = select xs x i
Now you can say:
p = sat $ do let ks = [1, 3, 5, 2, 4]
             x <- mkIndex "x" ks
             let y = ks .! x
             return $ y .< x
This is just a bit more verbose than your original (as you need to pass the list you want to index into), but it can save a lot of headaches down the road. Furthermore, you can change your mkIndex to put diagnostics, or assert further constraints as needed.
A more defensive approach
The "better" approach above requires you to know in advance the length of the list you'll be indexing into. This is obvious in your example, but I can imagine situations where that information will not be readily available. If that is the case, I'd recommend actually creating a symbolic value for the out-of-bounds access element, and tracking that explicitly yourself. This is more complicated, but you can hide most of it behind a simple data-type. Something like:
{-# LANGUAGE ScopedTypeVariables #-}
import Data.SBV
newtype Index a b = Index (SBV a, SBV b)
mkIndex :: (SymVal a, SymVal b) => String -> Symbolic (Index a b)
mkIndex nm = do def <- free $ nm ++ "_access_out_of_bounds_value"
                idx <- free nm
                pure $ Index (def, idx)
(.!) :: (SymVal a, SIntegral b) => [SBV a] -> Index a b -> SBV a
xs .! Index (i, i') = select xs i i'
Now assume you try to do a sat, but put in incorrect constraints on your index:
p = sat $ do let ks = [1, 3, 5, 2, 4]
             xi@(Index (_, x)) :: Index Int16 Int16 <- mkIndex "x"
             -- incorrectly constrain x here to do out-of-bounds
             constrain $ x .> 10
             let y = ks .! xi
             pure $ y .< x
You'll get:
*Main> p
Satisfiable. Model:
x_access_out_of_bounds_value = 0 :: Int16
x = 16386 :: Int16
This way, you can see that something went wrong, and what value the solver picked to satisfy the access-out-of-bounds case.
Summary
Which approach you take really depends on your actual needs. But I'd recommend going for at least the second alternative if possible, as an SMT solver can always "cleverly" pick values to give you unexpected models. You'd have protected yourself against at least the most obvious bugs that way. In a production system, I'd insist on the third approach, as debugging bugs that arise due to complicated constraints can be rather difficult in practice. The more "tracking" variables you leave for yourself, the better.

Related

How to avoid infinite loop in zipWith a self reference?

I'd like to create a list data structure that can zipWith that has a better behavior with self reference. This is for an esoteric language that will rely on self reference and laziness to be Turing complete using only values (no user functions). I've already created it, called Atlas but it has many built ins, I'd like to reduce that and be able to compile/interpret in Haskell.
The issue is that zipWith checks if either list is empty and returns empty. But in the case that this answer depends on the result of zipWith then it will loop infinitely. Essentially I'd like it to detect this case and have faith that the list won't be empty. Here is an example using DList
import Data.DList
import Data.List (uncons)
zipDL :: (a->b->c) -> DList a -> DList b -> DList c
zipDL f a b = fromList $ zipL f (toList a) (toList b)
zipL :: (a->b->c) -> [a] -> [b] -> [c]
zipL _ [] _ = []
zipL _ _ [] = []
zipL f ~(a:as) ~(b:bs) = f a b : zipL f as bs
a = fromList [5,6,7]
main=print $ dh where
  d = zipDL (+) a $ snoc (fromList dt) 0
  ~(Just (dh,dt)) = uncons $ toList d
This code would sum the list 5,6,7 except for the issue. It can be fixed by removing zipL _ _ [] = [] because then it assumes that the result won't be empty and then it in fact turns out not to be empty. But this is a bad solution because we can't always assume that it is the second list that could have the self reference.
Another way of explaining it is if we talk about the sizes of these list.
The size of zip a b = min (size a) (size b)
So in this example: size d = min (size a) (size d-1+1)
But therein lies the problem: if the size of d is 0, the equation gives size d = 0; if the size of d is 1, it gives 1; but once the size of d is assumed to be greater than the size of a, the equation gives size a, which is a contradiction. Any size from 0 to size a is consistent, which means the result is undefined.
Essentially I want to detect this case and make the size of d = a.
So far the only thing I have figured out is to make all lists lists of Maybe, and terminate lists with a Nothing value. Then in the application of the zipWith binary function return Nothing if either value is Nothing. You can then take out both of the [] checks in zip, because you can think of all lists as being infinite. Finally to make the summation example work, instead of doing a snoc, do a map, and replace any Nothing value with the snoc value. This works because when checking the second list for Nothing, it can lazily return true, since no value of the second list can be nothing.
Here is that code:
import Data.Maybe
data L a = L (Maybe a) (L a)
nil :: L a
nil = L Nothing nil
fromL :: [a] -> L a
fromL [] = nil
fromL (x:xs) = L (Just x) (fromL xs)
binOpMaybe :: (a->b->c) -> Maybe a -> Maybe b -> Maybe c
binOpMaybe f Nothing _ = Nothing
binOpMaybe f _ Nothing = Nothing
binOpMaybe f (Just a) (Just b) = Just (f a b)
zip2W :: (a->b->c) -> L a -> L b -> L c
zip2W f ~(L a as) ~(L b bs) = L (binOpMaybe f a b) (zip2W f as bs)
unconsL :: L a -> (Maybe a, Maybe (L a))
unconsL ~(L a as) = (a, Just as)
mapOr :: a -> L a -> L a
mapOr v ~(L a as) = L (Just $ fromMaybe v a) $ mapOr v as
main=print $ h
  where
    a = fromL [4,5,6]
    b = zip2W (+) a (mapOr 0 (fromJust t))
    (h,t) = unconsL $ b
The downside to this approach is it needs this other operator to map with Just . fromMaybe initialvalue. This is a less intuitive operator than ++. And without it the language could be built entirely on ++ uncons and (:[]) which would be pretty neat.
The other thing I've figured out is, in the current Ruby implementation, to throw an error when a value depends on itself, and catch it in the empty list detection. But this is very hacky and not entirely sound, although it does work for cases like this. I don't think this can work in Haskell since I don't think you can detect self dependence?
Sorry for the long description and the very odd use case. I've spent tons of time thinking about this, but haven't solved it yet and can't explain it any more succinctly! Not expecting an answer but figured it is worth a shot, thanks for considering.
EDIT:
After seeing it framed as a greatest fixed point question, it seems like a poor question because there is no efficient general solution to such a problem. For example, suppose the code was b = zipWith (+) a (if length b < 1 then [1] else []).
For my purposes it could still be nice to handle some cases correctly - the example provided does have a solution. So I could reframe the question as: when can we find the greatest fixed point efficiently and what is that fixed point? But I believe there is no simple answer to such a question, and so it would be a poor basis for a programming language to rely on ad hoc rules.
Sounds like you want a greatest fixed point. I'm not sure I've seen this done before, but maybe it's possible to make a sensible type class for types that support those.
class GF a where gfix :: (a -> a) -> a
instance GF a => GF [a] where
    gfix f = case (f (repeat undefined), f []) of
        (_:_, _) -> b:bs where
            b = gfix (\a' -> head (f (a':bs)))
            bs = gfix (\as' -> tail (f (b:as')))
        ([], []) -> []
        _ -> error "no fixed point greater than bottom exists"
-- use the usual least fixed point. this ain't quite right, but
-- it works for this example, and maybe it's Good Enough
instance GF Int where gfix f = let x = f x in x
Try it out in ghci:
> gfix (\xs -> zipWith (+) [5,6,7] (tail xs ++ [0])) :: [Int]
[18,13,7]
This implementation isn't particularly efficient; e.g. replacing [5,6,7] with [1..n] results in a runtime that's quadratic in n. Perhaps with some cleverness that can be improved, but it's not immediately obvious to me how that would go.
I have an answer for this specific case, not general.
appendRepeat :: a -> [a] -> [a]
appendRepeat v a = h : appendRepeat v t
  where
    ~(h,t) =
      if null a
      then (v,[])
      else (head a,tail a)
a = [4,5,6]
main=print $ head b
  where
    b = zipWith (+) a $ appendRepeat 0 (tail b)
appendRepeat appends an infinite list of a repeated value to the end of a list. The key point is that it does not check whether the list is empty before committing to return a non-empty list whose tail is a recursive call. This way laziness never ends up in an infinite loop checking the zipWith _ [] case.
So this code works, and for the purposes of the original question, it can be used to convert the language to just using 2 simple functions (++ and :[]). But the interpreter would need to do some static analysis for appending a repeated value and replace it to using this special appendRepeat function (which can easily be done in Atlas). It seems hacky to only make this one implementation switcharoo, but that is all that is needed.

Haskell nested function order

I'm trying to write a function in Haskell to generate multidimensional lists.
(Technically I'm using Curry, but my understanding is that it's mostly a superset of Haskell, and the thing I'm trying to do is common to Haskell as well.)
After a fair bit of head scratching, I realized my initial desired function (m_array generating_function list_of_dimensions, giving a list nested to a depth equal to length list_of_dimensions) was probably at odds with the type system itself, since (AFAICT) the nesting-depth of a list is part of its type, and my function wanted to return values whose nesting-depths differed based on the value of a parameter, meaning it wanted to return values whose types varied based on the value of a parameter, which (AFAICT) isn't supported in Haskell. (If I'm wrong, and this CAN be done, please tell me.) At this point I moved on to the next paragraph, but if there's a workaround I've missed that takes very similar parameters and still outputs a nested list, let me know. Like, maybe if you can encode the indices as some data type that implicitly includes the nesting level in its type, and is instantiated with e.g. dimensions 5 2 6 ..., maybe that'd work? Not sure.
In any case, I thought that perhaps I could encode the nesting-depth by nesting the function itself, while still keeping the parameters manageable. This did work, and I ended up with the following:
ma f (l:ls) idx = [f ls (idx++[i]) | i <- [0..(l-1)]]
However, so far it's still a little clunky to use: you need to nest the calls, like
ma (ma (ma (\_ i -> 0))) [2,2,2] []
(which, btw, gives [[[0,0],[0,0]],[[0,0],[0,0]]]. If you use (\_ i -> i), it fills the array with the indices of the corresponding element, which is a result I'd like to keep available, but could be a confusing example.)
I'd prefer to minimize the boilerplate necessary. If I can't just call
ma (\_ i -> i) [2,2,2]
I'd LIKE to be able to call, at worst,
ma ma ma (\_ i -> i) [2,2,2] []
But if I try that, I get errors. Presumably the list of parameters is being divvied up in a way that doesn't make sense for the function. I've spent about half an hour googling and experimenting, trying to figure out Haskell's mechanism for parsing strings of functions like that, but I haven't found a clear explanation, and understanding eludes me. So, the formal questions:
How does Haskell parse e.g. f1 f2 f3 x y z? How are the arguments assigned? Is it dependent on the signatures of the functions, or does it e.g. just try to call f1 with 5 arguments?
Is there a way of restructuring ma to permit calling it without parentheses? (Adding at most two helper functions would be permissible, e.g. maStart ma ma maStop (\_ i -> i) [1,2,3,4] [], if necessary.)
The function you want in your head-scratching paragraph is possible directly -- though a bit noisily. With GADTs and DataKinds, values can be parameterized by numbers. You won't be able to use lists directly, because they don't mention their length in their type, but a straightforward variant that does works great. Here's how it looks.
{-# Language DataKinds #-}
{-# Language GADTs #-}
{-# Language ScopedTypeVariables #-}
{-# Language StandaloneDeriving #-}
{-# Language TypeOperators #-}
import GHC.TypeLits
infixr 5 :+
data Vec n a where
    O :: Vec 0 a -- O is supposed to look a bit like a mix of 0 and []
    (:+) :: a -> Vec n a -> Vec (n+1) a
data FullTree n a where
    Leaf :: a -> FullTree 0 a
    Branch :: [FullTree n a] -> FullTree (n+1) a
deriving instance Show a => Show (Vec n a)
deriving instance Show a => Show (FullTree n a)
ma :: forall n a. ([Int] -> a) -> Vec n Int -> FullTree n a
ma f = go [] where
    go :: [Int] -> Vec n' Int -> FullTree n' a
    go is O = Leaf (f is)
    go is (l :+ ls) = Branch [go (i:is) ls | i <- [0..l-1]]
Try it out in ghci:
> ma (\_ -> 0) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]],Branch [Branch [Leaf 0,Leaf 0],Branch [Leaf 0,Leaf 0]]]
> ma (\i -> i) (2 :+ 2 :+ 2 :+ O)
Branch [Branch [Branch [Leaf [0,0,0],Leaf [1,0,0]],Branch [Leaf [0,1,0],Leaf [1,1,0]]],Branch [Branch [Leaf [0,0,1],Leaf [1,0,1]],Branch [Leaf [0,1,1],Leaf [1,1,1]]]]
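On the first formal question: function application in Haskell is left-associative and is parsed before any types are consulted, so f1 f2 f3 x y z always means ((((f1 f2) f3) x) y) z. That is why ma ma ma f dims [] hands ma itself to ma as the generating function instead of nesting the calls. A minimal sketch of the rule, using the hypothetical helpers twice and inc (neither appears in the question):

```haskell
-- Hypothetical helpers, for illustration only.
twice :: (a -> a) -> a -> a
twice f x = f (f x)

inc :: Int -> Int
inc = (+ 1)

-- Juxtaposition groups to the left: this parses as ((twice twice) inc) 0.
implicit :: Int
implicit = twice twice inc 0

-- The same expression with the grouping written out.
explicit :: Int
explicit = ((twice twice) inc) 0

-- Right-nesting has to be requested with parentheses (or with $).
-- Writing `inc inc inc 0` would parse as ((inc inc) inc) 0 and be rejected
-- by the type checker, because inc does not accept a function argument.
chained :: Int
chained = inc (inc (inc 0))
```

The signatures only decide whether the already-fixed grouping type-checks; they never change how the arguments are divided up.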
A low-tech solution:
In Haskell, you can model multi-level lists by using the so-called free monad.
The base definition is:
data Free ft a = Pure a | Free (ft (Free ft a))
where ft can be any functor, but here we are interested in ft being [], that is the list functor.
So we define our multidimensional list like this:
import Control.Monad
import Control.Monad.Free
type Mll = Free [] -- Multi-Level List
The Mll type transformer happens to be an instance of the Functor, Foldable and Traversable classes, which can come in handy.
To make an array of arbitrary dimension, we start with:
the list of dimensions, for example [5,2,6]
the filler function, which returns a value for a given set of indices
We can start by making a “grid” object, whose item at indices say [x,y,z] is precisely the [x,y,z] list. As we have a functor instance, we can complete the process by just applying fmap filler to our grid object.
This gives the following code:
makeNdArray :: ([Int] -> a) -> [Int] -> Mll a
makeNdArray filler dims =
    let
        addPrefix x (Pure xs) = Pure (x:xs)
        addPrefix x (Free xss) = Free $ map (fmap (x:)) xss
        makeGrid [] = Pure []
        makeGrid (d:ds) = let base = 0
                              fn k = addPrefix k (makeGrid ds)
                          in Free $ map fn [base .. (d-1+base)]
        grid = makeGrid dims
    in
        fmap filler grid -- because we are an instance of the Functor class
To visualize the resulting structure, it is handy to be able to remove the constructor names:
displayMll :: Show a => Mll a -> String
displayMll = filter (\ch -> not (elem ch "Pure Free")) . show
The resulting structure can easily be flattened if need be:
toListFromMll :: Mll a -> [a]
toListFromMll xs = foldr (:) [] xs
For numeric base types, we can get a multidimensional sum function “for free”, so to speak:
mllSum :: Num a => (Mll a) -> a
mllSum = sum -- because we are an instance of the Foldable class
-- or manually: foldr (+) 0
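Reading a single element back out works by following one index per nesting level. The sketch below is an assumption-laden illustration: it declares a local copy of the Free type so that it depends only on base, and indexMll and sample are hypothetical names that are not part of the free package.

```haskell
-- Local stand-in for the Free type from Control.Monad.Free, so that the
-- sketch needs nothing beyond base.
data Free f a = Pure a | Free (f (Free f a))

type Mll = Free []

-- Hypothetical helper: follow one index per nesting level.
-- Returns Nothing when an index is out of range or when the number of
-- indices does not match the nesting depth.
indexMll :: [Int] -> Mll a -> Maybe a
indexMll []       (Pure x) = Just x
indexMll (i : is) (Free xs)
  | i >= 0, (sub : _) <- drop i xs = indexMll is sub
indexMll _ _ = Nothing

-- A small 2x2 example value.
sample :: Mll Int
sample = Free [Free [Pure 11, Pure 12], Free [Pure 21, Pure 22]]
```

With this, indexMll [1,0] sample picks the first element of the second row, while too few or too many indices give Nothing instead of a runtime error.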
Some practice:
We use [5,2,6] as the dimension set. To visualize the structure, we associate a decimal digit to every index. We can pretend to have 1-base indexing by adding 111, because that way all the resulting numbers are 3 digits long, which makes the result easier to check. Extra newlines added manually.
$ ghci
GHCi, version 8.8.4: https://www.haskell.org/ghc/ :? for help
λ>
λ> dims = [5,2,6]
λ> filler = \[x,y,z] -> (100*x + 10*y + z + 111)
λ>
λ> mxs = makeNdArray filler dims
λ>
λ> displayMll mxs
"[[[111,112,113,114,115,116],[121,122,123,124,125,126]],
[[211,212,213,214,215,216],[221,222,223,224,225,226]],
[[311,312,313,314,315,316],[321,322,323,324,325,326]],
[[411,412,413,414,415,416],[421,422,423,424,425,426]],
[[511,512,513,514,515,516],[521,522,523,524,525,526]]]"
λ>
As mentioned above, we can flatten the structure:
λ>
λ> xs = toListFromMll mxs
λ> xs
[111,112,113,114,115,116,121,122,123,124,125,126,211,212,213,214,215,216,221,222,223,224,225,226,311,312,313,314,315,316,321,322,323,324,325,326,411,412,413,414,415,416,421,422,423,424,425,426,511,512,513,514,515,516,521,522,523,524,525,526]
λ>
or take its overall sum:
λ>
λ> sum mxs
19110
λ>
λ> sum xs
19110
λ>
λ>
λ> length mxs
60
λ>
λ> length xs
60
λ>

Constrain a symbolic list on count of elements of a certain type in SBV

Using the SBV library, I'm trying to satisfy conditions on a symbolic list of states:
data State = Intro | Start | Content | Comma | Dot
mkSymbolicEnumeration ''State
-- examples of such lists
[Intro, Start, Content, Comma, Start, Comma, Content, Dot]
[Intro, Comma, Start, Content, Comma, Content, Start, Dot]
All works fine except that I need the final list to contain exactly n elements of either [Intro, Start, Content] in total. Currently I do it using a bounded filter:
answer :: Int -> Symbolic [State]
answer n = do
    seq <- sList "seq"
    let maxl = n+6
    let minl = n+2
    constrain $ L.length seq .<= fromIntegral maxl
    constrain $ L.length seq .>= fromIntegral minl
    -- some additional constraints hidden for brevity purposes
    let etypes e = e `sElem` [sIntro, sStart, sContent]
    constrain $ L.length (L.bfilter maxl etypes seq) .== fromIntegral n
As you can see, the list can be of any length between n+2 and n+6, the important bit is that it has the right count of [sIntro, sStart, sContent] elements within it.
It works all fine, except it's extremely slow. Like, for n=4 it takes a few seconds, but for n>=6 it takes forever (more than 30 minutes and still counting). If I remove the bounded filter constraint, the result is instant with n up to 25 or so.
In the end, I don't particularly care about using L.bfilter. All I need is a way to declare that the final symbolic list should contain exactly n elements of some given types.
-> Is there a faster way to be able to satisfy for count(sIntro || sStart || sContent)?
-- EDIT after discussion in comments:
The code below is supposed to make sure that all valid elements are up front in the elts list. For example, if we count 8 valid elements in elts, then we take the first 8 elts and count the valid elements (validTaken) in this sub-list. If the result is 8, all 8 valid elements are up front in elts. Sadly, this results in a systematic Unsat outcome, even after removing all other constraints. The function works well when tested against some dummy lists of elements, though.
-- | test that all valid elements are upfront in the list of elements
validUpFront :: SInteger -> [Elem] -> SBool
validUpFront valids elts =
    let takeValids = flip take elts <$> (fromInteger <$> unliteral valids)
        validTaken = sum $ map (oneIf . included) $ fromMaybe [] takeValids
    in valids .== validTaken
-- ...
answer n = runSMT $ do
    -- ...
    let valids = sum $ map (oneIf . included) elts :: SInteger
    constrain $ validUpFront valids elts
Solvers for the sequence logic, while quite versatile, are notoriously slow. For this particular problem, I'd recommend using regular boolean logic, which will perform much better. Here's how I'd code your problem:
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE StandaloneDeriving #-}
import Data.SBV
import Data.SBV.Control
import Data.Maybe
import Control.Monad
data State = Intro | Start | Content | Comma | Dot
mkSymbolicEnumeration ''State
data Elem = Elem { included :: SBool
                 , element :: SState
                 }
new :: Symbolic Elem
new = do i <- free_
         e <- free_
         pure Elem {included = i, element = e}
get :: Elem -> Query (Maybe State)
get e = do isIn <- getValue (included e)
           if isIn
              then Just <$> getValue (element e)
              else pure Nothing
answer :: Int -> IO [State]
answer n = runSMT $ do
    let maxl = n+6
    let minl = n+2
    -- allocate upto maxl elements
    elts <- replicateM maxl new
    -- ask for at least minl of them to be valid
    let valids :: SInteger
        valids = sum $ map (oneIf . included) elts
    constrain $ valids .>= fromIntegral minl
    -- count the interesting ones
    let isEtype e = included e .&& element e `sElem` [sIntro, sStart, sContent]
        eTypeCount :: SInteger
        eTypeCount = sum $ map (oneIf . isEtype) elts
    constrain $ eTypeCount .== fromIntegral n
    query $ do cs <- checkSat
               case cs of
                 Sat -> catMaybes <$> mapM get elts
                 _ -> error $ "Query is " ++ show cs
Example run:
*Main> answer 5
[Intro,Comma,Comma,Intro,Intro,Intro,Start]
I've been able to run up to answer 500, which returned in about 5 seconds on my relatively old machine.
Making sure all valids are at the beginning
The easiest way to make sure all the valid elements are at the beginning of the list is to count the alternations in the included value, and make sure you allow only one such transition:
-- make sure there's at most one-flip in the sequence.
-- This'll ensure all the selected elements are up-front.
let atMostOneFlip [] = sTrue
    atMostOneFlip (x:xs) = ite x (atMostOneFlip xs) (sAll sNot xs)
constrain $ atMostOneFlip (map included elts)
This'll make sure all the valids precede the suffix of the list that contains the invalid entries. When you write your other properties, you'd have to check that both the current element and the next element are valid. In template form:
foo (x:y:rest) = ((included x .&& included y) .=> (element y .== sStart .=> element x .== sDot))
                 .&& foo (y:rest)
By symbolically looking at the values of included x and included y, you can determine if they are both included, or if x is the last element, or if they're both out; and write the corresponding constraints as implications in each case. The above shows the case for when you're in the middle of the sequence somewhere, with both x and y included.
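For intuition, the same at-most-one-flip check can be written over ordinary Bool values. This is only a concrete counterpart of the symbolic atMostOneFlip above, with ite replaced by if and sAll sNot by all not; the name atMostOneFlipBool is hypothetical and nothing here touches the solver.

```haskell
-- Concrete (non-symbolic) counterpart of atMostOneFlip, for intuition only.
-- True exactly when the list is a run of True values followed by a run of
-- False values; either run may be empty.
atMostOneFlipBool :: [Bool] -> Bool
atMostOneFlipBool []       = True
atMostOneFlipBool (x : xs) = if x then atMostOneFlipBool xs else all not xs
```

A list such as [True, True, False, False] passes, while [True, False, True] is rejected because a valid element reappears after an invalid one.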

Haskell - finding bigrams from an input list of words

I'm following the NLPWP Computational Linguistics site and trying to create a Haskell procedure to find collocations (most common groupings of two words, like "United States" or "to find") in a list of words. I've got the following working code to find bigram frequency:
import Data.Map (Map)
import qualified Data.Map as Map
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
bigram [] = []
bigram [_] = []
bigram xs = take 2 xs : bigram (tail xs)
-- | Helper for freqList and freqBigram
countElem base alow = case (Map.lookup alow base) of
    Just v -> Map.insert alow (v + 1) base
    Nothing -> Map.insert alow 1 base
-- | Maps each word to its frequency.
freqList alow = foldl countElem Map.empty alow
-- | Maps each bigram to its frequency.
freqBigram alow = foldl countElem Map.empty (bigram alow)
I'm trying to write a function that outputs a Map from each bigram to [freq of bigram]/[(freq word 1)*(freq word 2)]. Could you possibly provide advice on how to approach it?
None of the following code is working, but it gives a vague outline for what I was trying to do.
collocations alow =
| let f key = (Map.lookup key freqBi) / ((Map.lookup (first alow) freqs)*(Map.lookup (last alow) freqs))
in Map.mapWithKey f = freqBi
where freqs = (freqList alow)
where freqBi = (freqBigram alow)
I'm very new to Haskell, so let me know if you've got any idea how to fix the collocations procedure. Style tips are also welcome.
Most of your code looks sane, except for the final collocations function.
I'm not sure why there's a stray pipe in there after the equals sign. You're not trying to write any kind of pattern guard, so I don't think that should be there.
Map.lookup returns a Maybe value (Nothing when the key is absent), so trying to do division or multiplication on the result directly isn't going to work. Maybe what you want is some kind of function that takes a key and a map, and returns the associated count or zero if the key doesn't exist?
Other than that, it looks like you're not too far off having this work.
As I read it, your confusion stems from mistaking types, more or less. General advice: Use type signatures on all your top level functions and make sure they are sensible and what you expect of the function (I often do this even before implementing the function).
Let's take a look at your
-- | Function for creating a list of bigrams
-- | e.g. [("Colorless", "green"), ("green", "ideas")]
bigram :: [a] -> [[a]]
If you pass in a list of Strings, you get back a list of lists of Strings, so each bigram is itself a list rather than a pair.
You could decide to be more explicit (only allow Strings instead of some type a, for the beginning at least). So, actually we get a list of Words and make a list of Bigrams from it:
import Prelude hiding (Word) -- Prelude already exports a Word type
type Word = String
type Bigram = (Word, Word)
bigram :: [Word] -> [Bigram]
For the implementation you can try to use readily available functions from Data.List, for example zipWith and tail.
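As a rough sketch of that hint, pairing a list with its own tail gives every adjacent pair. The name bigramPairs is hypothetical, and drop 1 is used here as a total stand-in for tail so that the empty list needs no special case.

```haskell
-- Hypothetical pair-based variant of bigram: each element is paired with
-- its successor. zip stops at the shorter list, so lists with fewer than
-- two elements yield no bigrams.
bigramPairs :: [a] -> [(a, a)]
bigramPairs ws = zip ws (drop 1 ws)
```

Because the result is a list of tuples, each bigram can later be taken apart by pattern matching on (word1, word2).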
Now your freqList and freqBigram look like
freqList :: [Word] -> Map Word Int
freqBigram :: [Word] -> Map Bigram Int
With these signatures in place, the compiler's error messages will be clearer to you. To point at the problem: take care what you're doing in the lookups for the word frequencies. You're searching for the frequency of word1 and word2, and the bigram is (word1,word2).
Now you should be able to figure the solution out on your own, I guess.
First of all I advise you to have a look at the function
insertWith :: Ord k => (a -> a -> a) -> k -> a -> Map k a -> Map k a
maybe you'll recognize the pattern if used
f freqs bg = insertWith (+) bg 1 freqs
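Folding that step over a whole list gives a frequency table in one pass. A minimal sketch, assuming the containers package that ships with GHC; countAll is a hypothetical name standing in for freqList and freqBigram.

```haskell
import qualified Data.Map.Strict as Map
import Data.List (foldl')

-- Hypothetical helper: counts how often each key occurs.
-- insertWith (+) k 1 inserts 1 for a new key and adds 1 to an existing count,
-- which replaces the explicit case on Map.lookup.
countAll :: Ord k => [k] -> Map.Map k Int
countAll = foldl' (\freqs k -> Map.insertWith (+) k 1 freqs) Map.empty
```

The strict fold foldl' is chosen here instead of foldl only to avoid building up thunks on long inputs; the result is the same.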
Next, as @MathematicalOrchid already pointed out, your solution is not too far from being correct.
lookup :: Ord k => k -> Map k a -> Maybe a
You already took care of that in your countElem function.
What I'd like to note is that there is this neat abstraction called Applicative, which works really well for problems like yours.
First of all, you have to import Control.Applicative if you're using a GHC prior to 7.10; for newer versions it is already at your fingertips.
So what does this abstraction provide? Similar to Functor, it gives you a way to handle "side effects", in your case the possibility of a failing lookup resulting in Nothing.
We have two operators provided by Applicative: pure and <*>. In addition, as every Applicative is required to be a Functor, we also get fmap or <$>; the latter is just an infix alias of the former, for convenience.
So how does this apply to your situation?
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(<$>) :: Functor f => (a -> b) -> f a -> f b
First of all you see that those two look darn similar but with <*> being slightly less familiar.
Now having a function
f :: Int -> Int
f x = x + 3
and
x1 :: Maybe Int
x1 = Just 4
x2 :: Maybe Int
x2 = Nothing
one couldn't simply write f x1 because that wouldn't typecheck. But, and this is the first idea to keep in mind, Maybe is a Functor (it is also an Applicative, and it is even an M-thing, but let's not go there).
f <$> x1 = Just 7
f <$> x2 = Nothing
so you can imagine the f looking up the value and performing the calculation inside the Just and if there is no value - a.k.a. we have the Nothing situation, we'll do what every lazy student does - be lazy and do nothing ;-).
Now we get to the next part <*>
g1 :: Maybe (Int -> Int)
g1 = Just (\x -> x + 3)
g2 :: Maybe (Int -> Int)
g2 = Nothing
Still g1 x1 wouldn't work, but
g1 <*> x1 = Just 7
g1 <*> x2 = Nothing
g2 <*> x1 = Nothing -- remember g2 is Nothing
g2 <*> x2 = Nothing
NEAT! - but still how does this solve your problem?
The 'magic' is using both operators ... for multi-argument functions
h :: Int -> Int -> Int
h x y = x + y + 2
and partial function application, which just means put in one value get back a function that waits for the next value.
GHCi> :type h 1
h 1 :: Int -> Int
Now the strange thing happens: we can use <$> with a function like h.
GHCi> :type h <$> x1
h <$> x1 :: Maybe (Int -> Int)
well that's good because then we can use our <*> with it
y1 :: Maybe Int
y1 = Just 7
h <$> x1 <*> y1 = Just (4 + 7 + 2)
                = Just 13
and this even works with an arbitrary number of arguments
k :: Int -> Int -> Int -> Int -> Int
k x y z w = ...
k <$> x1 <*> y1 <*> z1 <*> w1 = ...
So design a pure function that works with Int, Float, Double or whatever you like and then use the Functor/Applicative abstraction to make your lookup and frequency calculation work with each other.
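Applied to the collocation measure from the question, that could look roughly like the sketch below. The names score and collocationScore are hypothetical, and the pair-typed bigram key is an assumption based on the Bigram type suggested in another answer.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Pure arithmetic: [freq of bigram] / ([freq word 1] * [freq word 2]).
-- It knows nothing about Maybe or Map.
score :: Int -> Int -> Int -> Double
score both w1 w2 = fromIntegral both / (fromIntegral w1 * fromIntegral w2)

-- Hypothetical helper: the three lookups are combined with <$> and <*>,
-- so the result is Nothing as soon as any one of them fails.
collocationScore
  :: Map (String, String) Int  -- bigram frequencies
  -> Map String Int            -- word frequencies
  -> (String, String)          -- the bigram to score
  -> Maybe Double
collocationScore freqBi freqs bg@(w1, w2) =
  score <$> Map.lookup bg freqBi <*> Map.lookup w1 freqs <*> Map.lookup w2 freqs
```

Keeping score free of Maybe means the formula can be tested on plain numbers, while the Applicative plumbing handles the missing-key case in one place.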

Is there an indexed list in Haskell and is it good or bad?

I am a new comer to the Haskell world and I am wondering if there is something like this:
data IndexedList a = IList Int [a]
findIndex::(Int->Int)->IndexedList a->(a,IndexedList a)
findIndex f (IList x l) = (l!!(f x), IList (f x) l)
next::IndexedList a->(a,IndexedList a)
next x = findIndex (+1) x
I've noticed that this kind of list is not purely functional but kind of useful for some applications. Should it be considered harmful?
Thanks,
Bob
It's certainly useful to have a list that comes equipped with a pointer to a particular location in the list. However, the way it's usually done in Haskell is somewhat different - rather than using an explicit pointer, we tend to use a zipper.
The list zipper looks like this
data ListZipper a = LZ [a] a [a] deriving (Show)
You should think of the middle field a as being the element that is currently pointed to, the first field [a] as being the elements before the current position, and the final field [a] as being the elements after the current position.
Usually we store the elements before the current one in reverse order, for efficiency, so that the list [0, 1, 2, *3*, 4, 5, 6] with a pointer to the middle element, would be stored as
LZ [2,1,0] 3 [4,5,6]
You can define functions that move the pointer to the left or right
left (LZ (a:as) b bs) = LZ as a (b:bs)
right (LZ as a (b:bs)) = LZ (a:as) b bs
If you want to move to the left or right n times, then you can do that with the help of a function that takes another function, and applies it n times to its argument
times n f = (!!n) . iterate f
so that to move left three times, you could use
>> let lz = LZ [2,1,0] 3 [4,5,6]
>> (3 `times` left) lz
LZ [] 0 [1,2,3,4,5,6]
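Note that left and right as written are partial: they have no equation for a zipper that is already at the corresponding end. A minimal sketch of total variants follows; the names leftMay, rightMay and timesMay are hypothetical, and the type is repeated with an extra Eq instance so the block stands on its own.

```haskell
data ListZipper a = LZ [a] a [a] deriving (Show, Eq)

-- Move the focus one step left; Nothing if already at the left end.
leftMay :: ListZipper a -> Maybe (ListZipper a)
leftMay (LZ (a : as) b bs) = Just (LZ as a (b : bs))
leftMay _                  = Nothing

-- Move the focus one step right; Nothing if already at the right end.
rightMay :: ListZipper a -> Maybe (ListZipper a)
rightMay (LZ as a (b : bs)) = Just (LZ (a : as) b bs)
rightMay _                  = Nothing

-- Apply a Maybe-returning step n times, stopping at the first failure.
timesMay :: Int -> (a -> Maybe a) -> a -> Maybe a
timesMay n step x
  | n <= 0    = Just x
  | otherwise = step x >>= timesMay (n - 1) step
```

Moving left three times from LZ [2,1,0] 3 [4,5,6] still ends at LZ [] 0 [1,2,3,4,5,6], now wrapped in Just, while a fourth step yields Nothing rather than a pattern-match failure.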
Your two functions findIndex and next can be written as
next :: ListZipper a -> (a, ListZipper a)
next = findIndex 1
findIndex :: Int -> ListZipper a -> (a, ListZipper a)
findIndex n x = let y@(LZ _ a _) = (n `times` right) x in (a, y)
Contrary to what you think this list is in fact purely functional. The reason is that IList (f x) l creates a new list (and does not, as you may think, modify the current IndexedList). It is in general not that easy to create non-purely functional data structures or functions in Haskell, as long as you stay away from unsafePerformIO.
The reason I would recommend against using the IndexedList is that there is no assurance that the index is non-negative and less than the length of the list. If it is not, the lookup l!!(f x) will fail with an exception, which is generally considered bad style in Haskell. An alternative could be to use a safe lookup, which returns a Maybe a like the following:
findIndex :: (Int -> Int) -> IndexedList a -> (Maybe a, IndexedList a)
findIndex f (IList i l) = (maybe_x, IList new_i l)
  where
    new_i = f i
    -- (!!) also throws on negative indices, so check both bounds
    maybe_x = if new_i >= 0 && new_i < length l
                then Just (l !! new_i)
                else Nothing
I also cannot think of a use case where such a list would be useful, but I guess I am limited by my creativity ;)
