Haskell recursion with random numbers and IO - haskell

For the 99 Haskell questions, specifically the 23rd one, I need to
"Extract a given number of randomly selected elements from a list.
Example (in lisp):
(rnd-select '(a b c d e f g h) 3)
(E D A)
"
Which I have implemented like so:
import System.Random
import Control.Monad
removeAt :: [a] -> Int -> [a]
removeAt (x:xs) i
| i > 0 = x : removeAt xs (i-1)
| otherwise = xs
rndSelect :: (RandomGen g) => [a] -> Int -> g -> IO [a]
rndSelect _ 0 _ = return []
rndSelect xs n gen = do
let (pos, newGen) = randomR (0, length xs - 1) gen
rest <- rndSelect (removeAt xs pos) (n-1) newGen
return $ (xs!!pos):rest
-- for an explanation of what this is doing see EXPLANATION below
As far as I can tell this works, but what I'm concerned about are those last two lines. I'm new to this and I don't know the associated costs of the '<-' operator is or bouncing in and out of IO repeatedly like I'm doing. Is this efficient, is there a better way to do this that doesn't involve bouncing IO, or is there no real overheads involved?
Any insight you have is appreciated, since I've only recently started learning these more sophisticated concepts in Haskell and haven't yet gotten used to reasoning about Haskell's IO system.
EXPLANATION: In order to do this I've decided that I should randomly select one element from the list using the randomR function (returns a random number in a given range), and keep doing this recursively until I've taken n elements.
I've made a couple assumptions about the problem that have lead me to this approach. Firstly I've assumed that rndSelect can select a specific element from the list only once, and secondly I've assumed that each element should have an equal probability of being picked.
PS: it's my first question on SO so if I've formatted the question poorly feel free to tell me.

You do not need IO for this, since randomR does not require it. What you need to do however, is to thread the random number generator through your computation:
import System.Random
import Control.Monad
removeAt :: [a] -> Int -> [a]
removeAt (x:xs) i
| i > 0 = x : removeAt xs (i-1)
| otherwise = xs
rndSelect :: (RandomGen t, Num a) => [a1] -> a -> t -> ([a1], t)
rndSelect _ 0 g = ([],g)
rndSelect xs n gen =
let (pos, newGen) = randomR (0, length xs - 1) gen
(rest,ng) = rndSelect (removeAt xs pos) (n-1) newGen
in ((xs!!pos):rest, ng)
If you're concerned about overheads going from IO to pure code, don't be. Instead you can try mwc-random package which will be atleast an order of magnitude faster in this case. Further, you could get additional benefit using any random access data structure instead of list if you have many elements.

You can avoid IO as :
rndSelect :: (RandomGen g) => [a] -> Int -> g -> [a]
rndSelect _ 0 _ = return []
rndSelect xs n gen = do
let (pos, newGen) = randomR (0, length xs - 1) gen
rest = rndSelect (removeAt xs pos) (n-1) newGen
in (xs!!pos):rest

Related

How do I deal with the error "No instance for (Control.Monad.IO.Class.MonadIO [])"?

Good evening everybody! This is a question concerning Haskell. I want to get x random elements from a list by using a function.
The problem I get is that when I try to use random numbers, with randomRIO. I get the error message:
No instance for (Control.Monad.IO.Class.MonadIO [])
arising from a use of `randomRIO'
This error message suddenly goes away when i use print or return. But I dont want to use print, and return messes up the output to a nested list [[a]] instead of [a].
Does any of you have a tip on what I can do to extract x random elements from the list, in the form of a list?
The type would be something like this. xRanElems :: [a] -> Int -> [a]
where the second Int is an accumulator but I got that covered.
xRanElems xs n = do
r <- randomRIO (0, n)
xRanElems2 xs n r
where n is just length xs - 1
xRanElems2 xs n r = (xs !! r) : (xRanElems (xsWithOutSelectedElem xs r) (n-1))
Thankful for any input!
The following typechecks:
import System.Random
xRanElems :: [a] -> Int -> IO [a]
xRanElems xs n = do
-- randomRIO :: Random t
-- => (t, t) -> IO t
r <- randomRIO (0, n) -- r :: Int
xRanElems2 xs n r
xRanElems2 :: [a] -> Int -> Int -> IO [a]
xRanElems2 xs n r =
let (a,b:c) = splitAt r xs
in
fmap (b :) -- [a] -> [a]
(xRanElems -- IO [a] -> IO [a]
(a++c) (n-1))
Trying to run it, e.g. with xRanElems [1..3] 2, reveals that it loops forever.
This is because you need to provide the base case in xRanElems to stop the recursion, e.g. returning [] when n <= 0.
The above code also contains an off-by-1 error which you're invited to fix.

Is there any terminating fold in Haskell?

I need some kind of fold which can terminate if I already have the data I want.
For example I need to find first 3 numbers which are greater than 5. I decided to use Either for termination and my code looks like this:
terminatingFold :: ([b] -> a -> Either [b] [b]) -> [a] -> [b]
terminatingFold f l = reverse $ either id id $ fold [] l
where fold acc [] = Right acc
fold acc (x:xs) = f acc x >>= flip fold xs
first3NumsGreater5 acc x =
if length acc >= 3
then Left acc
else Right (if x > 5 then (x : acc) else acc)
Are there some more clever/generic approaches?
The result of your function is a list, and it would be desirable if it were produced lazily, that is, extracting one item from the result should only require evaluating the input list up until the item is found there.
Unfolds are under-appreciated for these kinds of tasks. Instead of focusing on "consuming" the input list, let's think of it as a seed from which (paired with some internal accumulator) we can produce the result, element by element.
Let's define a Seed type that contains a generic accumulator paired with the as-yet unconsumed parts of the input:
{-# LANGUAGE NamedFieldPuns #-}
import Data.List (unfoldr)
data Seed acc input = Seed {acc :: acc, pending :: [input]}
Now let's reformulate first3NumsGreater5 as a function that either produces the next output element from the Seed, of signals that there aren't any more elements:
type Counter = Int
first3NumsGreater5 :: Seed Counter Int -> Maybe (Int, Seed Counter Int)
first3NumsGreater5 (Seed {acc, pending})
| acc >= 3 =
Nothing
| otherwise =
case dropWhile (<= 5) pending of
[] -> Nothing
x : xs -> Just (x, Seed {acc = succ acc, pending = xs})
Now our main function can be written in terms of unfoldr:
unfoldFromList ::
(Seed acc input -> Maybe (output, Seed acc input)) ->
acc ->
[input] ->
[output]
unfoldFromList next acc pending = unfoldr next (Seed {acc, pending})
Putting it to work:
main :: IO ()
main = print $ unfoldFromList first3NumsGreater5 0 [0, 6, 2, 7, 9, 10, 11]
-- [6,7,9]
Normally an early termination-capable fold is foldr with the combining function which is non-strict in its second argument. But, its information flow is right-to-left (if any), while you want it left-to-right.
A possible solution is to make foldr function as a left fold, which can then be made to stop early:
foldlWhile :: Foldable t
=> (a -> Bool) -> (r -> a -> r) -> r
-> t a -> r
foldlWhile t f a xs = foldr cons (\acc -> acc) xs a
where
cons x r acc | t x = r (f acc x)
| otherwise = acc
You will need to tweak this for t to test the acc instead of x, to fit your purposes.
This function is foldlWhile from https://wiki.haskell.org/Foldl_as_foldr_alternative, re-written a little. foldl'Breaking from there might fit the bill a bit better.
foldr with the lazy reducer function can express corecursion perfectly fine just like unfoldr does.
And your code is already lazy: terminatingFold (\acc x -> Left acc) [1..] => []. That's why I'm not sure if this answer is "more clever", as you've requested.
edit: following a comment by #danidiaz, to make it properly lazy you'd have to code it as e.g.
first3above5 :: (Foldable t, Ord a, Num a)
=> t a -> [a]
first3above5 xs = foldr cons (const []) xs 0
where
cons x r i | x > 5 = if i==2 then [x]
else x : r (i+1)
| otherwise = r i
This can be generalized further by abstracting the test and the count.
Of course it's just reimplementing take 3 . filter (> 5), but shows how to do it in general with foldr.

Search in the list of integers, one of the longest ordered subsets (not necessarily consecutive) ordered by growth

Function, which finds in the list of integers one of the longest ordered increments of subscripts (not necessarily consecutive) numbers. Example:
• Sequence [21,27,15,14,18,16,14,17,22,13] = [14,16,17,22]
I have a problem with the function which takes the initial number from the array, and looks for a sequence:
fstLen:: Int -> [Int] -> [Int]
fstLen a [] = a: []
fstLen x (l:ls) = if x < l then x:(fstLen l ls) else fstLen x ls
I have problems in place, 14,18,16,14,17,22,13
14 < 18 but then 18 > 16 and my algorithm takes the number 16 as the basis and is looking for a new sequence and I need to go back to 14
How can I do it?
(sorry for my english)
You could always just use subsequences from Data.List to get all the possible subsequences in a list. When you get these subsequences, just take the sorted ones with this function and filter:
isSorted :: (Ord a) => [a] -> Bool
isSorted [] = True
isSorted [_] = True
isSorted(x:y:xs) = x <= y && isSorted (y:xs)
Then get the maximum length subsequence with maximumBy(or another method), with the ordering being comparinglength.
Here is what the code could look like:
import Data.Ord (comparing)
import Data.List (subsequences, maximumBy, nub)
isSorted :: (Ord a) => [a] -> Bool
isSorted [] = True
isSorted [_] = True
isSorted(x:y:xs) = x <= y && isSorted (y:xs)
max_sequence :: (Ord a) => [a] -> [a]
max_sequence xs = maximumBy (comparing length) $ map nub $ filter isSorted (subsequences xs)
Which seems to work correctly:
*Main> max_sequence [21,27,15,14,18,16,14,17,22,13]
[14,16,17,22]
Note: used map nub to remove duplicate elements from the sub sequences. If this is not used, then this will return [14,14,17,22] as the maximum sub sequence, which may be fine if you allow this.
A more efficient n log n solution can be done by maintaining a map where
keys are the first element of an increasing sequence.
values are a tuple: (length of the sequence, the actual sequence)
and the map maintains the invariance that for each possible size of an increasing sequence, only the lexicographically largest one is retained.
Extra traceShow bellow to demonstrate how the map changes while folding from the end of the list:
import Debug.Trace (traceShow)
import Data.Map (empty, elems, insert, delete, lookupGT, lookupLT)
-- longest (strictly) increasing sequence
lis :: (Ord k, Show k, Foldable t) => t k -> [k]
lis = snd . maximum . elems . foldr go empty
where
go x m = traceShow m $ case x `lookupLT` m of
Nothing -> m'
Just (k, v) -> if fst a < fst v then m' else k `delete` m'
where
a = case x `lookupGT` m of
Nothing -> (1, [x])
Just (_, (i, r)) -> (i + 1, x:r)
m' = insert x a m
then:
\> lis [21,27,15,14,18,16,14,17,22,13]
fromList []
fromList [(13,(1,[13]))]
fromList [(22,(1,[22]))]
fromList [(17,(2,[17,22])),(22,(1,[22]))]
fromList [(14,(3,[14,17,22])),(17,(2,[17,22])),(22,(1,[22]))]
fromList [(16,(3,[16,17,22])),(17,(2,[17,22])),(22,(1,[22]))]
fromList [(16,(3,[16,17,22])),(18,(2,[18,22])),(22,(1,[22]))]
fromList [(14,(4,[14,16,17,22])),(16,(3,[16,17,22])),(18,(2,[18,22])),(22,(1,[22]))]
fromList [(15,(4,[15,16,17,22])),(16,(3,[16,17,22])),(18,(2,[18,22])),(22,(1,[22]))]
fromList [(15,(4,[15,16,17,22])),(16,(3,[16,17,22])),(18,(2,[18,22])),(27,(1,[27]))]
[15,16,17,22]
It is not necessary to retain the lists within the map. One can reconstruct the longest increasing sequence only using the keys and the length of the sequences (i.e. only the first element of the tuples).
Excellent question! Looking forward to a variety of answers.
Still improving my answer. The answer below folds to build increasing subsequences from the right. It also uses the the list monad to prepend new elements to subsequences if the new element is smaller than the head of the subsequence. (This is my first real application of the list monad.) For example,
λ> [[3], [1]] >>= (prepIfSmaller 2)
[[2,3],[3],[1]]
This solution is about as short as I can make it.
import Data.List (maximumBy)
maxSubsequence :: Ord a => [a] -> [a]
maxSubsequence [] = []
maxSubsequence xs = takeLongest $ go [] xs
where
takeLongest :: Ord a => [[a]] -> [a]
takeLongest = maximumBy (\ x y -> compare (length x) (length y))
go :: Ord a => [[a]] -> [a] -> [[a]]
go = foldr (\x subs -> [x] : (subs >>= (prepIfSmaller x)))
where prepIfSmaller x s#(h:_) = (if x < h then [x:s] else []) ++ [s]
Quick test.
λ> maxSubsequence [21,27,15,14,18,16,14,17,22,13]
[15,16,17,22]

Elegant way to return the longer of two strings

I'm trying to write a function that returns the longer of two strings. So far this is what I have:
maxString :: String -> String -> String
maxString a b
| (length a) > (length b) = a
| otherwise = b
This works, but I'm wondering if there is a more elegant way to write this. Note: the two arguments cannot be in a list. They must be separate arguments to allow for currying.
Thoughts?
So far all answers except Tejing's fully traverse both arguments. This one only traverses as far as the end of the shorter one.
longest a b = l' a b where
l' _ [] = a
l' [] _ = b
l' (_:ar) (_:br) = l' ar br
Played around with this one a bit. I wanted it to be a one-liner as well as having complexity O(min(n,m)). (ie. working even with infinite lists)
maxList :: [a] -> [a] -> [a]
maxList s s' = if s == zipWith const s s' then s' else s
You can use existing Haskell facilities in Data.Ord and Data.Function and obtain a one-liner as follows:
maxString' x y = maximumBy (compare `on` length) [x, y]
Code:
import Data.Ord
import Data.Function
import Data.List
maxString' x y = maximumBy (compare `on` length) [x, y]
*Main> maxString' "ab" "c"
"ab"
-- EDIT --
As #DavidYoung pointed out,
compare `on` length
above can also be written in a shorter form: comparing length
That's pretty much the most concise way to write it. However, the version below will terminate even when one list is infinite (O(min(a,b)) instead of O(a + b)).
compareLengths :: [a] -> [b] -> Ordering
compareLengths [ ] [ ] = EQ
compareLengths (a:as) [ ] = GT
compareLengths [ ] (b:bs) = LT
compareLengths (a:as) (b:bs) = compareLengths as bs
maxList :: [a] -> [a] -> [a]
maxList a b = case compareLengths a b of
GT -> a
_ -> b
Also, there's no real reason to limit your function to just Strings when it would make sense for arbitrary lists.

Improving code to generate a distribution

I am new to Haskell and I wonder how/if I can make this code more efficient and tidy. It seems unnecessarily long and untidy.
My script generates a list of 10 averages of 10 coin flips.
import Data.List
import System.Random
type Rand a = StdGen -> Maybe (a,StdGen)
output = do
gen <- newStdGen
return $ distBernoulli 10 10 gen
distBernoulli :: Int -> Int -> StdGen -> [Double]
distBernoulli m n gen = [fromIntegral (sum x) / fromIntegral (length x) | x <- lst]
where lst = splitList (randomList (n*m) gen) n
splitList :: [Int] -> Int -> [[Int]]
splitList [] n = []
splitList lst n = take n lst : splitList (drop n lst) n
randomList :: Int -> StdGen -> [Int]
randomList n = take n . unfoldr trialBernoulli
trialBernoulli :: Rand Int
trialBernoulli gen = Just ((2*x)-1,y)
where (x,y) = randomR (0,1) gen
Any help would be appreciated, thanks.
I'd tackle this problem in a slightly different way. First I'd define a function that would give me an infinite sampling of flips from a Bernoulli distribution with success probability p:
flips :: Double -> StdGen -> [Bool]
flips p = map (< p) . randoms
Then I'd write distBernoulli as follows:
distBernoulli :: Int -> Int -> StdGen -> [Double]
distBernoulli m n = take m . map avg . splitEvery n . map val . flips 0.5
where
val True = 1
val False = -1
avg = (/ fromIntegral n) . sum
I think this matches your definition of distBernoulli:
*Main> distBernoulli 10 10 $ mkStdGen 0
[-0.2,0.4,0.4,0.0,0.0,0.2,0.0,0.6,0.2,0.0]
(Note that I'm using splitEvery from the handy split package, so you'd have to install the package and add import Data.List.Split (splitEvery) to your imports.)
This approach is slightly more general, and I think a little neater, but really the main difference is just that I'm using randoms and splitEvery.
EDIT: I posted this too fast and didn't match behavior, it should be good now.
import Control.Monad.Random
import Control.Monad (liftM, replicateM)
KNOWLEDGE: If you like randoms then use MonadRandom - it rocks.
STYLE: Only importing symbols you use helps readability and sometimes maintainability.
output :: IO [Double]
output = liftM (map dist) getLists
Note: I've given output an explicit type, but know it doesn't have to be IO.
STYLE:
1) Its usually good to separate your IO from pure functions. Here I've divided out the getting of random lists from the calculation of distributions. In your case it was pure but you combined getting "random" lists via a generator with the distribution function; I would divide those parts up.
2) Read Do notation considered harmful. Consider using >>= instead of
output = do
gen <- new
return $ dist gen
you can do:
output = new >>= dist
Wow!
dist :: [Int] -> Double
dist lst = (fromIntegral (sum lst) / fromIntegral (length lst))
getLists :: MonadRandom m => Int -> Int -> m [[Int]]
getLists m n= replicateM m (getList n)
KNOWLEDGE In Control.Monad anything ending in an M is like the original but for monads. In this case, replicateM should be familiar if you used the Data.List replicate function.
getList :: MonadRandom m => Int -> m [Int]
getList m = liftM (map (subtract 1 . (*2)) . take m) (getRandomRs (0,1::Int))
STYLE: If I do something lots of times I like to have a single instance in its own function (getList) then the repetition in a separate function.
I'm not sure I understand your code or your question...
But it seems to me all you'd need to do is generate a list of random ones and zeroes, and then divide each of them by their length with a map and add them together with a foldl.
Something like:
makeList n lis = if n /= 0 then
makeList (n-1) randomR(0,1) : lis
else
lis
And then make it apply a Map and Foldl or Foldr to it.
Using the above, I am now using this.
import Data.List
import System.Random
type Rand a = [a]
distBernoulli :: Int -> Int -> StdGen -> [Double]
distBernoulli m n gen = [fromIntegral (sum x) / fromIntegral (length x) | x <- lst]
where lst = take m $ splitList (listBernoulli gen) n
listBernoulli :: StdGen -> Rand Int
listBernoulli = map (\x -> (x*2)-1) . randomRs (0,1)
splitList :: [Int] -> Int -> [[Int]]
splitList lst n = take n lst : splitList (drop n lst) n
Thanks for your help, and I welcome any further comments :)

Resources