Finding elements and their indices in a list - haskell

I need to have both the elements of a list satisfying a predicate and the indices of these elements. I can achieve this as follows:
import Data.List (findIndices)
list :: [Int]
list = [3,2,4,1,9]
indices = findIndices (>2) list
elems = [list!!i | i <- indices]
-- same as: elems = filter (>2) list
Isn't there a package providing a function that gives both the elements and their indices in "one shot"? I'm surprised I can't find such a function somewhere. Otherwise, how could I write such a function, improving my code above? I don't believe this code is optimal, since it effectively accesses the elements of the list twice. I took a quick look at the source code of findIndices but I don't understand it yet.

You can make it more efficient – avoid the !! access – by filtering a list of (index, element) tuples.
let (indices, elems) = unzip [(i, x) | (i, x) <- zip [0..] list, x > 2]
Split into an appropriate function:
findItems :: (a -> Bool) -> [a] -> [(Int, a)]
findItems predicate = filter (predicate . snd) . zip [0..]
let (indices, elems) = unzip $ findItems (>2) list
There might be a more straightforward way, and I’ll be happy to find out about it :)
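For what it's worth, this is the output I would expect in GHCi (not taken from the original post):
> findItems (>2) [3,2,4,1,9]
[(0,3),(2,4),(4,9)]
> unzip $ findItems (>2) [3,2,4,1,9]
([0,2,4],[3,4,9])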

I think Ry's suggestion is just fine. For a more direct, and in particular more generic one, you could use lens tooling:
Prelude> import Control.Lens as L
Prelude L> import Control.Arrow as A
Prelude L A> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) [3,2,4,1,9]
([0,2,4],[3,4,9])
This can immediately be used also on arrays (where the index extraction is much more useful)
Prelude L A> import qualified Data.Vector as V
Prelude L A V> ifoldr (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ V.fromList [3,2,4,1,9]
([0,2,4],[3,4,9])
...even on unboxed ones, though these aren't Foldable:
Prelude L A V> import qualified Data.Vector.Unboxed as VU
Prelude L A V VU> import Data.Vector.Generic.Lens as V
Prelude L A V VU V> ifoldrOf vectorTraverse (\i x -> if x>2 then (i:)***(x:) else id) ([],[]) $ VU.fromList [3,2,4,1,9]
([0,2,4],[3.0,4.0,9.0])

(indices, elems) = unzip [ item | item <- zip [0..] ls, (snd item) > 2 ]
Not sure that it's any more efficient, but it gets it done in "one shot".
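Presumably (I haven't run this), with the question's list it yields the same pair as the other answers:
> let (indices, elems) = unzip [ item | item <- zip [0..] [3,2,4,1,9], snd item > 2 ]
> (indices, elems)
([0,2,4],[3,4,9])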

Related

Random numbers without duplicates

Simulating a lottery of 6 numbers chosen from 40, I want to create a list of numbers in Haskell using the system random generator but eliminate duplicates, which often arise.
If I have the following:
import Control.Monad (forM)
import System.Random

main :: IO ()
main = do
  rs <- forM [1..6] $ \_x -> randomRIO (1, 40) :: (IO Int)
  print rs
this is halfway there. But how do I filter out duplicates? It seems to me I need a while loop of some sort that keeps adding elements which are not already in the list until the list has the required size. If I could generate an infinite list of random numbers and filter it inside the IO monad, I am sure that would work, but I do not know how to approach this. It seems while loops are generally deprecated in Haskell, so I am uncertain of the true Haskeller's way here. Is this a legitimate use case for a while loop, and if so, how does one do that?
The function you are looking for is nub from Data.List, which removes duplicates.
import Data.List
import System.Random

main = do
  g <- newStdGen
  print . take 6 . nub $ (randomRs (1,40) g :: [Int])
If you don't mind using a library, then install the random-shuffle package and use it like this:
import System.Random.Shuffle
import Control.Monad.Random

main1 = do
  perm <- evalRandIO $ shuffleM [1..10]
  print perm
If you want to see how to implement a naive Fisher-Yates shuffle using lists in Haskell, have a look at this code:
shuffle2 xs g = go [] g (length xs) xs
  where
    go perm g n avail
      | n == 0    = (perm, g)
      | otherwise = let (i, g') = randomR (0, n-1) g
                        a       = avail !! i
                        -- can also use splitAt to define avail':
                        avail'  = take i avail ++ drop (i+1) avail
                    in go (a:perm) g' (n-1) avail'

main = do
  perm <- evalRandIO $ liftRand $ shuffle2 [1..10]
  print perm
The parameters to the go helper function are:
perm - the constructed permutation so far
g - the current generator value
n - the length of the available items
avail - the available items - i.e. items not yet selected to be part of the permutation
go simply adds a random element from avail to the permutation being constructed and recursively calls itself with the new avail list and new generator.
To only draw k random elements from xs, just start go at k instead of length xs:
shuffle2 xs k g = go [] g k xs
...
You could also use a temporary array (in the ST or IO monad) to implement a Fisher-Yates type algorithm. The shuffleM function in random-shuffle uses yet another, completely different approach which you might find interesting.
Update: Here is an example of using an ST-array in a F-Y style algorithm:
import Control.Monad.Random
import Data.Array.ST
import Control.Monad
import Control.Monad.ST (runST, ST)

shuffle3 :: RandomGen g => Int -> g -> ([Int], g)
shuffle3 n g0 = runST $ do
  arr <- newListArray (1,n) [1..n] :: ST s (STUArray s Int Int)
  let step g i = do
        let (j, g') = randomR (1,n) g
        -- swap i and j
        a <- readArray arr i
        b <- readArray arr j
        writeArray arr j a
        writeArray arr i b
        return g'
  g' <- foldM step g0 [1..n]
  perm <- getElems arr
  return (perm, g')

main = do
  perm <- evalRandIO $ liftRand $ shuffle3 20
  print perm
I've used the Fisher-Yates shuffle in C++ with a decent random number generator to great success. This approach is very efficient if you are willing to allocate an array for holding numbers 1 to 40.
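In Haskell terms, one way to read this suggestion is to reuse the shuffle3 sketch from the answer above and keep only the first six entries; the helper below is my own illustration, not code from this post:
-- Hypothetical helper: shuffle [1..40] with shuffle3 (defined in the previous answer)
-- and keep the first 6 entries as the drawn numbers.
draw6of40 :: RandomGen g => g -> ([Int], g)
draw6of40 g = let (perm, g') = shuffle3 40 g
              in  (take 6 perm, g')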
Going the strict IO way requires breaking down nub, bringing the condition into the tail recursion.
import System.Random

randsf :: (Eq a, Num a, Random a) => [a] -> IO [a]
randsf rs
  | length rs >= 6 = return rs
  | otherwise = do
      r <- randomRIO (1,40)
      if elem r rs
        then randsf rs
        else randsf (r:rs)

main = do
  rs <- randsf [] :: IO [Int]
  print rs
If you know what you are doing, unsafeInterleaveIO from System.IO.Unsafe can be handy, allowing you to generate lazy lists from IO. Functions like getContents work this way.
import Control.Monad
import System.Random
import System.IO.Unsafe
import Data.List

rands :: (Eq a, Num a, Random a) => IO [a]
rands = do
  r <- randomRIO (1,40)
  unsafeInterleaveIO $ liftM (r:) rands

main = do
  rs <- rands :: IO [Int]
  print . take 6 $ nub rs
You commented:
The goal is to learn how to build a list monadically using filtering. It's a raw newbie question
Maybe you should change the question title then! Anyways, this is quite a common task. I usually define a combinator with a general monadic type that does what I want, give it a descriptive name (which I didn't quite succeed in here :-) and then use it, like below
import Control.Monad
import System.Random

-- | 'myUntil': accumulate a list with unique action results until the list
-- satisfies a test
myUntil :: (Monad m, Eq a) => ([a] -> Bool) -> m a -> m [a]
myUntil test action = myUntil' test [] action where
  myUntil' test resultSoFar action = do
    if test resultSoFar
      then return resultSoFar
      else do
        x <- action
        let newResults = if x `elem` resultSoFar
                           then resultSoFar
                           else resultSoFar ++ [x] -- x:resultSoFar
        myUntil' test newResults action

main :: IO ()
main = do
  let enough xs = length xs == 6
      drawNumber = randomRIO (0, 40::Int)
  numbers <- myUntil enough drawNumber
  print numbers
NB: this is not the optimal way to get your 6 distinct numbers, but it is meant as an example of how to, in general, build a list monadically using a filter that works on the entire list.
It is in essence the same as Vektorweg's longest answer, but it uses a combinator with a much more general type (which is the way I like to do it, and which may be more useful for you, given your comment at the top of this answer).

Implementing an efficient sliding-window algorithm in Haskell

I needed an efficient sliding window function in Haskell, so I wrote the following:
windows n xz@(x:xs)
  | length v < n = []
  | otherwise = v : windows n xs
  where
    v = take n xz
My problem with this is that I think the complexity is O(n*m), where m is the length of the list and n is the window size. You walk down the list once for take and again for length, and you do that for essentially m-n starting positions. It seems like it can be more efficient than this, but I'm at a loss for how to make it more linear. Any takers?
You can't get better than O(m*n), since this is the size of the output data structure.
But you can avoid checking the lengths of the windows if you reverse the order of operations: First create n shifted lists and then just zip them together. Zipping will get rid of those that don't have enough elements automatically.
import Control.Applicative
import Data.Traversable (sequenceA)
import Data.List (tails)
transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList
Zipping a list of lists is just a transposition, but unlike transpose from Data.List it throws away outputs that would have less than n elements.
Now it's easy to make the window function: Take m lists, each shifted by 1, and just zip them:
windows :: Int -> [a] -> [[a]]
windows m = transpose' . take m . tails
Works also for infinite lists.
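A quick check in GHCi (this is the output I would expect; it is not copied from the original answer):
> take 4 (windows 3 [1..])
[[1,2,3],[2,3,4],[3,4,5],[4,5,6]]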
You can use Seq from Data.Sequence, which has O(1) enqueue and dequeue at both ends:
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import Data.Sequence ((|>))

windows :: Int -> [a] -> [[a]]
windows n0 = go 0 Seq.empty
  where
    go n s (a:as) | n' <  n0  =              go n' s'  as
                  | n' == n0  = toList s'  : go n' s'  as
                  | otherwise = toList s'' : go n  s'' as
      where
        n'  = n + 1         -- O(1)
        s'  = s |> a        -- O(1)
        s'' = Seq.drop 1 s' -- O(1)
    go _ _ [] = []
Note that if you materialize the entire result your algorithm is necessarily O(N*M) since that is the size of your result. Using Seq just improves performance by a constant factor.
Example use:
>>> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
First let's get the windows without worrying about the short ones at the end:
import Data.List (tails)
windows' :: Int -> [a] -> [[a]]
windows' n = map (take n) . tails
> windows' 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5],[4,5],[5],[]]
Now we want to get rid of the short ones without checking the length of every one.
Since we know they are at the end, we could lose them like this:
windows n xs = take (length xs - n + 1) (windows' n xs)
But that's not great since we still go through xs an extra time to get its length. It also doesn't work on infinite lists, which your original solution did.
Instead let's write a function for using one list as a ruler to measure the amount to take from another:
takeLengthOf :: [a] -> [b] -> [b]
takeLengthOf = zipWith (flip const)
> takeLengthOf ["elements", "get", "ignored"] [1..10]
[1,2,3]
Now we can write this:
windows :: Int -> [a] -> [[a]]
windows n xs = takeLengthOf (drop (n-1) xs) (windows' n xs)
> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
Works on infinite lists too:
> take 5 (windows 3 [1..])
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]
As Gabriella Gonzalez says, the time complexity is no better if you want to use the whole result. But if you only use some of the windows, we now manage to avoid doing the work of take and length on the ones you don't use.
If you want O(1) length then why not use a structure that provides O(1) length? Assuming you aren't looking for windows from an infinite list, consider using:
import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.List (unfoldr)

windows :: Int -> [a] -> [[a]]
windows n = map V.toList . unfoldr go . V.fromList
  where
    go xs | V.length xs < n = Nothing
          | otherwise =
              let (a,b) = V.splitAt n xs
              in Just (a,b)
Conversion of each window from a vector to a list might bite you some; I won't hazard an optimistic guess there, but I will bet that the performance is better than the list-only version.
For the sliding window I also used unboxed Vectors, as length, take, drop and splitAt are all O(1) operations.
The code from Thomas M. DuBuisson gives a window shifted by n rather than a sliding window, except when n = 1. A (++) is therefore needed; however, this has a cost of O(n+m), so be careful where you put it.
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector)
import Data.List

windows :: Int -> Vector Double -> [[Double]]
windows n = unfoldr go
  where
    go !xs | V.length xs < n = Nothing
           | otherwise =
               let (a,b) = V.splitAt 1 xs
                   c     = V.toList a ++ V.toList (V.take (n-1) b)
               in Just (c,b)
I tried it out with +RTS -sstderr and:
putStrLn $ show (sum $ concat $ windows 10 (V.fromList [1..1000000]))
and got real time 1.051s and 96.9% usage, keeping in mind that after the sliding window two O(m) operations are performed.

Haskell: how to operate the string type in a do block

I want to make a function that first divides a list l into two lists m and n, and then creates two threads to find the longest palindrome in each list. My code is:
import Control.Concurrent (forkIO)
import System.Environment (getArgs)
import Data.List
import Data.Ord

main = do
  l <- getArgs
  forkIO $ putStrLn $ show $ longestPalindr $ mList l
  forkIO $ putStrLn $ show $ longestPalindr $ nList l

longestPalindr x =
  snd $ last $ sort $
  map (\l -> (length l, l)) $
  map head $ group $ sort $
  filter (\y -> y == reverse y) $
  concatMap inits $ tails x

mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
Now I can compile it, but the result is []. When I just run longestPalindr and mList, I get the right result. I thought the logic here was right. So what is the problem?
The question title may need to be changed, as this is no longer about type errors.
The functionality of the program can be fixed by simply mapping longestPalindr across the two halves of the list. In your code, you are finding the longest palindrome across [[Char]], so the result length is usually just 1.
I've given a simple example of par and pseq. This just suggests to the compiler that it may be smart to evaluate left and right independently. It doesn't guarantee parallel evaluation, but rather leaves it up to the compiler to decide.
Consult Parallel Haskell on the wiki to understand sparks, compile with the -threaded flag, then run it with +RTS -N2. Add -sstderr for profiling, and see if there is any benefit to sparking here. I would expect negative returns until you start to feed it longer lists.
For further reading on functional parallelism, take a look at Control.Parallel.Strategies. Manually wrangling threads in Haskell is only really needed in nondeterministic scenarios.
import Control.Parallel (par, pseq)
import System.Environment (getArgs)
import Data.List
import Data.Ord
import Data.Function (on)

main = do
  l <- getArgs
  let left  = map longestPalindr (mList l)
      right = map longestPalindr (nList l)
  left `par` right `pseq` print (longest (left ++ right))

longest :: [[a]] -> [a]
longest = maximumBy (compare `on` length)

longestPalindr x = longest pals
  where pals = nub $ filter (\y -> y == reverse y) substrings
        substrings = concatMap inits $ tails x

mList l = take (length l `div` 2) l
nList l = drop (length l `div` 2) l
For reference, please read the Parallel chapter from Simon Marlow's book.
http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
As others have stated, using par from the Eval monad seems to be the correct approach here.
Here is a simplified view of your problem. You can test it out by compiling with -threaded, running with +RTS -N2, and then using ThreadScope to profile the performance.
import Control.Parallel.Strategies
import Data.List (maximumBy, subsequences)
import Data.Ord

isPalindrome :: Eq a => [a] -> Bool
isPalindrome xs = xs == reverse xs

-- * note while subsequences is correct, it is asymptotically
-- inefficient due to nested foldr calls
getLongestPalindrome :: Ord a => [a] -> Int
getLongestPalindrome = length . maximum' . filter isPalindrome . subsequences
  where maximum' :: Ord a => [[a]] -> [a]
        maximum' = maximumBy $ comparing length

--- Do it in parallel, in a monad
-- rpar rpar seems to fit your case, according to Simon Marlow's book
-- http://chimera.labs.oreilly.com/books/1230000000929/ch02.html#sec_par-eval-whnf
main :: IO ()
main = do
  let shorter = [2,3,4,5,4,3,2]
      longer  = [1,2,3,4,5,4,3,2,1]
      result  = runEval $ do
        a <- rpar $ getLongestPalindrome shorter
        b <- rpar $ getLongestPalindrome longer
        if a > b -- 'a > b' will always be false in this case
          then return (a, "shorter")
          else return (b, "longer")
  print result
-- This will print the length of the longest palindrome along w/ the list name
-- Don't forget to compile w/ -threaded and use ThreadScope to check
-- performance and evaluation

How can I efficiently filter a pair of lists in Haskell?

I'm working on a simple problem on Programming Praxis: remove all duplicates from a list without changing the order. Assuming the elements are in class Ord, I came up with the following:
import Data.Set (Set)
import qualified Data.Set as Set
buildsets::Ord a => [a] -> [Set a]
buildsets = scanl (flip Set.insert) Set.empty
nub2::Ord a => [a] -> [a]
nub2 thelist = map fst $ filter (not . uncurry Set.member) (zip thelist (buildsets thelist))
As you can see, the buildsets function gets me most of the way there, but that last step (nub2) of putting everything together looks absolutely horrible. Is there a cleaner way to accomplish this?
Since we have to filter the list and we should probably use some set to keep records, we might as well use filterM with the state monad:
import qualified Data.Set as S
import Control.Monad.State.Strict

nub2 :: Ord a => [a] -> [a]
nub2 = (`evalState` S.empty) . filterM go where
  go x = state $ \s -> if S.member x s
                         then (False, s)
                         else (True, S.insert x s)
If I wanted to somewhat golf the function, I'd do the following:
import Control.Arrow ((&&&))

nub2 = (`evalState` S.empty) . filterM (\x -> state (S.notMember x &&& S.insert x))
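Either version should behave the same way; for example, I would expect (untested here):
> nub2 [3,1,3,2,1,2,4]
[3,1,2,4]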
Simple recursion looks ok to me.
g xs = go xs S.empty
  where
    go [] _ = []
    go (x:xs) a | S.member x a = go xs a
                | otherwise    = x : go xs (S.insert x a)
Based directly on Sassa NF's suggestion, but with a slight type change for cleanliness:
g x = catMaybes $ unfoldr go (Set.empty, x)
  where
    go (_, [])     = Nothing
    go (s, (x:xs)) = Just ( if Set.member x s then Nothing else Just x
                          , (Set.insert x s, xs) )
Sometimes it really cleans up code to pull out and name subpieces. (In some ways this really is the Haskell way to comment code.)
This is wordier than what you did above, but I think it is much easier to understand...
First I start with some definitions:
type Info = ([Int], S.Set Int) -- The remaining and the already-seen items at a point in the list

item = head . fst -- The current item
rest = fst        -- Future items
seen = snd        -- The items already seen
Then I add two self-descriptive helper functions:
itemHasBeenSeen :: Info -> Bool
itemHasBeenSeen info = item info `S.member` seen info

moveItemToSet :: Info -> Info
moveItemToSet info = (tail $ rest info, item info `S.insert` seen info)
With this the program becomes:
nub2 :: [Int] -> [Int]
nub2 theList =
    map item
  $ filter (not . itemHasBeenSeen)
  $ takeWhile (not . null . rest)
  $ iterate moveItemToSet start
  where start = (theList, S.empty)
Reading from bottom to top (just as the data flows), you can easily see what is happening:
start = (theList, S.empty): start with the full list and an empty set.
iterate moveItemToSet start: repeatedly move the first item of the list into the set, producing the successive Info values as a (lazy) list.
takeWhile (not . null . rest): stop the iteration when you run out of elements.
filter (not . itemHasBeenSeen): remove items that have already been seen.
map item: throw away the helper values.
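Presumably (expected output, not from the original post), this gives back the list with duplicates removed and the original order preserved:
> nub2 [3,1,3,2,1]
[3,1,2]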

How to group similar items in a list using Haskell?

Given a list of tuples like this:
dic = [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
How to group items of dic resulting in a list grp where,
grp = [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
I'm actually a newcomer to Haskell... and seem to be falling in love with it.
Using group or groupBy in Data.List will only group similar adjacent items in a list.
I wrote an inefficient function for this, but it runs into memory failures as I need to process a very large coded string list. Hope you can help me find a more efficient way.
Whenever possible, reuse library code.
import Data.Map
sortAndGroup assocs = fromListWith (++) [(k, [v]) | (k, v) <- assocs]
Try it out in ghci:
*Main> sortAndGroup [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
fromList [(1,["bb","cc","aa"]),(2,["aa"]),(3,["gg","ff"])]
EDIT In the comments, some folks are worried about whether (++) or flip (++) is the right choice. The documentation doesn't say which way things get associated; you can find out by experimenting, or you can sidestep the whole issue using difference lists:
sortAndGroup assocs = ($[]) <$> fromListWith (.) [(k, (v:)) | (k, v) <- assocs]
-- OR
sortAndGroup = fmap ($[]) . fromListWith (.) . fmap (fmap (:))
These alternatives are about the same length as the original, but they're a bit less readable to me.
Here's my solution:
import Data.Function (on)
import Data.List (sortBy, groupBy)
import Data.Ord (comparing)
myGroup :: (Eq a, Ord a) => [(a, b)] -> [(a, [b])]
myGroup = map (\l -> (fst . head $ l, map snd l)) . groupBy ((==) `on` fst)
. sortBy (comparing fst)
This works by first sorting the list with sortBy:
[(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
=> [(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
then grouping the list elements by the associated key with groupBy:
[(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
=> [[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
and then transforming the grouped items to tuples with map:
[[(1,"aa"),(1,"bb"),(1,"cc")],[(2,"aa")],[(3,"ff"),(3,"gg")]]
=> [(1,["aa","bb","cc"]), (2, ["aa"]), (3, ["ff","gg"])]
Testing:
> myGroup dic
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
Also you can use TransformListComp extension, for example:
Prelude> :set -XTransformListComp
Prelude> import GHC.Exts (groupWith, the)
Prelude GHC.Exts> let dic = [ (1, "aa"), (1, "bb"), (1, "cc") , (2, "aa"), (3, "ff"), (3, "gg")]
Prelude GHC.Exts> [(the key, value) | (key, value) <- dic, then group by key using groupWith]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]
If the list is not sorted on the first element, I don't think you can do better than O(n log n).
One simple way would be to just sort the list first and then use any of the approaches for the sorted case.
You can use a Map k [a] from Data.Map, using the first element of each tuple as the key and accumulating the values.
You can write your own complex function, which even after all your attempts will still take O(n log n).
If the list is sorted on the first element, as is the case in your example, then the task is trivial with something like groupBy, as given in the answer by #Mikhail, or using foldr, and there are numerous other ways.
An example of using foldr is here:
grp :: Eq a => [(a,b)] -> [(a,[b])]
grp = foldr f []
  where
    f (z,s) []                       = [(z,[s])]
    f (z,s) a@((x,y):xs) | x == z    = (x, s:y) : xs
                         | otherwise = (z,[s]) : a
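A quick check on the sorted list from the question (this is the output I would expect, untested):
> grp [(1,"aa"),(1,"bb"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg")]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]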
{-# LANGUAGE TransformListComp #-}
import GHC.Exts
import Data.List
import Data.Function (on)

process :: [(Integer, String)] -> [(Integer, [String])]
process list = [ (the a, b)
               | let info = [ (x, y) | (x, y) <- list, then sortWith by y ]
               , (a, b) <- info
               , then group by a using groupWith
               ]
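Presumably (I have not run this), applying process to the dic from the question gives the grouped result, with each value list sorted because of the inner sortWith:
> process [(1,"aa"),(1,"cc"),(2,"aa"),(3,"ff"),(3,"gg"),(1,"bb")]
[(1,["aa","bb","cc"]),(2,["aa"]),(3,["ff","gg"])]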
