What optimizations can be made to this Haskell code?


--Returns last N elements in list
lastN :: Int -> [a] -> [a]
lastN n xs = let m = length xs in drop (m-n) xs
--create contiguous array starting from index b within list a
produceContiguous :: [a] -> Int -> [[a]]
produceContiguous [] _ = [[]]
produceContiguous arr ix = scanl (\acc x -> acc ++ [x]) [arr !! ix] inset
  where inset = lastN (length arr - (ix + 1)) arr
--Find maximum sum of all possible contiguous sub arrays, modulo [n]
--d is dummy data
let d = [1,2,3,10,6,3,1,47,10]
let maxResult = maximum $ map (\s -> maximum s) $ map (\c -> map (\ac -> (sum ac )`mod` (last n)) c ) $ map (\n -> produceContiguous d n ) [0..(length d) -1]
I'm a Haskell newb - just a few days into it .. If I'm doing something obviously wrong, whoopsies

You can improve the runtime a lot by observing that map sum (produceContiguous d n) (which has runtime Ω(m^2), m the length of drop n d -- possibly O(m^3) time because you're appending to the end of acc on each iteration) can be collapsed to scanl (+) 0 (drop n d) (which has runtime O(m)). There are plenty of other stylistic changes I would make as well, but that's the main algorithmic one I can think of.
Cleaning up all the stylistic stuff, I would probably write:
import Control.Monad
import Data.List
addMod n x y = (x+y) `mod` n
maxResult n = maximum . (scanl (addMod n) 0 <=< tails)
In ghci:
*Main> jaggedGoofyMax 100 [1..1000]
99
(12.85 secs, 24,038,973,096 bytes)
*Main> dmwitMax 100 [1..1000]
99
(0.42 secs, 291,977,440 bytes)
Not shown here is the version of jaggedGoofyMax that has only the optimization I mentioned in my first paragraph applied, which has slightly better runtime/memory usage stats than dmwitMax when run in ghci (but basically identical to dmwitMax when both are compiled with -O2). So you can see that even for modest input sizes this optimization can make a big difference.
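The collapse described in the first paragraph can be checked directly. The following is only a sketch: contiguousFrom is a compact restatement of the question's produceContiguous with its arguments flipped, and prefixSums and maxModSum are hypothetical names that do not appear in the answer.

```haskell
import Data.List (tails)

-- Compact restatement of the question's produceContiguous (arguments flipped):
-- every non-empty run that starts at index ix. Assumes 0 <= ix < length xs.
contiguousFrom :: Int -> [a] -> [[a]]
contiguousFrom ix xs = scanl (\acc x -> acc ++ [x]) [xs !! ix] (drop (ix + 1) xs)

-- The same sums without building the runs: running totals of the suffix.
prefixSums :: Num a => Int -> [a] -> [a]
prefixSums ix xs = scanl1 (+) (drop ix xs)

-- Maximum of (sum of run) `mod` m over all contiguous runs, assuming m > 0.
-- scanl (+) 0 also yields the empty run's sum (0); since `mod` with a positive
-- modulus is never negative, that extra 0 cannot change the maximum.
maxModSum :: Integral a => a -> [a] -> a
maxModSum m xs = maximum [s `mod` m | t <- tails xs, s <- scanl (+) 0 t]
```

Here map sum (contiguousFrom ix xs) and prefixSums ix xs should agree for every valid ix, while the second never materialises the intermediate sublists.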

Related

Haskell: Increment elements of a list by cumulative length of previous lists

Here is the list of lists: [[1,2,3],[1,2,3,4],[1,2,3]]
How can I increment each element of the second list by the length of the first list, and increment the third list by the length of the first list + second list? The first list should remain unchanged.
Intended output: [[1,2,3],[4,5,6,7],[8,9,10]]
Since the first list has length 3, the second list is generated by [1+3, 2+3, 3+3, 4+3].
Since the first list + second list combined have length 7, the third list is generated by [1+7, 2+7, 3+7].
Ideally it should work with any number of lists.
So far, I've had slight success using this:
scanl1 (\xs ys -> [y + length xs | y <- ys]) [[1,2,3],[1,2,3,4],[1,2,3]]
which outputs: [[1,2,3],[4,5,6,7],[5,6,7]]

scanl1 is a good idea, but it's not quite right, because you don't want your accumulator to be a list, but rather to be an integer. So you really want scanl, not scanl1. I'll leave it as an exercise for you to see how to adjust your solution - given that you managed to write something almost-right with scanl1, I don't think you'll find it too hard once you have the right function.
In the comments, jpmariner suggests mapAccumL :: (s -> a -> (s, b)) -> s -> [a] -> (s, [b]). That's perfectly typed for what we want to do, so let's see how it would look.
import Data.Traversable (mapAccumL)
addPreviousLengths :: [[Int]] -> [[Int]]
addPreviousLengths = snd . mapAccumL go 0
  where go n xs = (n + length xs, map (+ n) xs)
λ> addPreviousLengths [[1,2,3],[1,2,3,4],[1,2,3]]
[[1,2,3],[4,5,6,7],[8,9,10]]
mapAccumL really is the best tool for this job - there's not much unnecessary complexity involved in using it. But if you're trying to implement this from scratch, you might try the recursive approach Francis King suggested. I'd suggest a lazy algorithm instead of the tail-recursive algorithm, though:
incrLength :: [[Int]] -> [[Int]]
incrLength = go 0
  where go _ [] = []
        go amount (x:xs) =
          map (+ amount) x : go (amount + length x) xs
It works the same as the mapAccumL version. Note that both versions are lazy: they consume only as much of the input list as necessary. This is an advantage not shared by a tail-recursive approach.
λ> take 3 . incrLength $ repeat [1]
[[1],[2],[3]]
λ> take 3 . addPreviousLengths $ repeat [1]
[[1],[2],[3]]

There are many ways to solve this. A simple recursion is one approach:
lst :: [[Int]]
lst = [[1,2,3],[1,2,3,4],[1,2,3]]
incrLength :: [[Int]] -> Int -> [[Int]] -> [[Int]]
incrLength [] _ result = result
incrLength (x:xs) amount result =
  incrLength xs (amount + length x) (result ++ [map (+amount) x])
(Edit: it is more efficient to use (:) in this function; see @amalloy's comment below. The result then has to be reversed.
incrLength :: [[Int]] -> Int -> [[Int]] -> [[Int]]
incrLength [] _ result = reverse result
incrLength (x:xs) amount result =
  incrLength xs (amount + length x) (map (+amount) x : result)
End Edit)
Another approach is to use scanl. We use length to get the length of the inner lists, then accumulate using scanl.
map length lst -- [3,4,3]
scanl (+) 0 $ map length lst -- [0,3,7,10]
init $ scanl (+) 0 $ map length lst -- [0,3,7]
Then we zip the lst and the accumulated value together, and map one over the other.
incrLength' :: [[Int]] -> [[Int]]
incrLength' lst =
  [map (+ snd y) (fst y) | y <- zip lst addlst]
  where
    addlst = init $ scanl (+) 0 $ map length lst
main = do
  print $ incrLength lst 0 [] -- [[1,2,3],[4,5,6,7],[8,9,10]]
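The zip-then-map step of the scanl approach can also be written with zipWith. This is a minimal sketch under that reading; incrLengthZip is a hypothetical name, not part of the answer.

```haskell
-- Offsets are the running totals of the inner lengths: [0, 3, 7, 10] for the example.
-- zipWith stops at the shorter list, so the trailing total needs no init.
incrLengthZip :: [[Int]] -> [[Int]]
incrLengthZip xss = zipWith (\off xs -> map (+ off) xs) offsets xss
  where
    offsets = scanl (+) 0 (map length xss)
```

Because scanl and zipWith are both lazy, this variant should also work on an infinite outer list, like the mapAccumL version.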

Can this prime sieve code be further simplified in Haskell?

The code works well
primes = next [2 ..]
  where
    next (p : ps) = p : next ts
      where
        ts = filter (\x -> mod x p /= 0) ps
It's just that GHCi thinks there is an incomplete pattern in next.
Well, this is correct from a grammatical point of view.
But obviously the input of 'next' cannot be empty.
So is there a solution other than adding the declaration
({-# OPTIONS_GHC -Wno-incomplete-patterns #-})?

The exhaustiveness checker knows that next has type Integral a => [a] -> [a] (mod requires Integral). The empty list is a valid argument to next, even if you never actually call next on the empty list.
The key here is that you don't really want Integral a => [a] as your argument type. You know it will only be called on an infinite list, so use a type that doesn't have finite lists as values.
import Prelude hiding (sequence) -- the local sequence below would otherwise clash with Prelude's
data Stream a = Cons a (Stream a)
sequence :: Num a => a -> Stream a
sequence x = Cons x (sequence (x + 1))
filterStream :: (a -> Bool) -> Stream a -> Stream a
filterStream p (Cons x xs) | p x       = Cons x (filterStream p xs)
                           | otherwise = filterStream p xs
-- Since you'll probably want a list of values, not just a stream of them, at some point.
toList :: Stream a -> [a]
toList (Cons x xs) = x : toList xs
primes :: Stream Integer
primes = next (sequence 2)
  where
    -- p is the current prime; recurse on the stream with its multiples removed
    next (Cons p xs) = Cons p (next xs')
      where xs' = filterStream (\x -> mod x p /= 0) xs
The Stream library provides a module Data.Stream that defines the Stream type and numerous analogs to list functions.
import qualified Data.Stream as S
import Data.Stream (Stream(..))
-- S.toList exists as well.
primes :: Stream Integer
primes = next (S.fromList [2..])
  where next (Cons p xs) = Cons p (next (S.filter (\x -> mod x p /= 0) xs))

You've already got a proper answer to your question. For completeness, the other option is just to add the unneeded clause that we know will never be called:
primes = next [2 ..]
  where
    next (p : xs) =
      p : next [x | x <- xs, mod x p > 0]
    next _ = undefined
Another, more "old-style" solution, is to analyze the argument by explicit calls to head and tail (very much not recommended, in general):
primes = next [2 ..]
  where
    next xs = let { p = head xs } in
      p : next [x | x <- tail xs, mod x p > 0]
This could perhaps count as a simplification.
On an unrelated note, you write that it "works well". Unfortunately, while indeed producing the correct results, it does so very slowly. Because of always taking only one element at a time off the input list, its time complexity is quadratic in the number n of primes produced. In other words, primes !! n takes time quadratic in n. Empirically,
> primes !! 1000
7927 -- (0.27 secs, 102987392 bytes)
> primes !! 2000
17393 -- (1.00 secs, 413106616 bytes)
> primes !! 3000
27457 -- (2.23 secs, 952005872 bytes)
> logBase (2/1) (1.00 / 0.27)
1.8889686876112561 -- n^1.9
> logBase (3/2) (2.23 / 1.00)
1.9779792870810489 -- n^2.0
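The exponents in the transcript above come from comparing two measurements: if the running time grows like n^a, then t2/t1 = (n2/n1)^a, so a = logBase (n2/n1) (t2/t1). A small helper for that calculation might look like this (growthOrder is a hypothetical name; the timings in the usage note are the ones quoted above):

```haskell
-- Empirical order of growth from two (problem size, running time) measurements.
-- Assumes n1 /= n2 and all four values are positive.
growthOrder :: (Double, Double) -> (Double, Double) -> Double
growthOrder (n1, t1) (n2, t2) = logBase (n2 / n1) (t2 / t1)
```

For example, growthOrder (1000, 0.27) (2000, 1.00) reproduces the first exponent, roughly 1.89.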
In fact a whole bunch of elements may be taken from the input at once, up to the square of the current prime, with the code thus taking only about n^1.5 time, give or take a log factor:
{-# LANGUAGE ViewPatterns #-}
primes_ = 2 : next primes_ [3 ..]
  where
    next (p : ps) (span (< p*p) -> (h, xs)) =
      h ++ next ps [x | x <- xs, mod x p > 0]
    next _ _ = undefined
Empirically, again, we get
> primes !! 3000
27457 -- (0.08 secs, 29980864 bytes)
> primes !! 30000
350381 -- (1.81 secs, 668764336 bytes)
> primes !! 60000
746777 -- (4.74 secs, 1785785848 bytes)
> primes !! 100000
1299721 -- (9.87 secs, 3633306112 bytes)
> logBase (6/3) (4.74 / 1.81)
1.388897361815054 -- n^1.4
> logBase (10/6) (9.87 / 4.74)
1.4358377567888103 -- n^1.45
As we can see here, the complexity advantage expresses itself as an enormous speedup in absolute terms as well.
So then this sieve is equivalent to the optimal trial division, unlike the one in the question. Of course when it was first proposed in 1976, Haskell had no view patterns yet, and in fact there was yet no Haskell itself.
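For readers who prefer to avoid the ViewPatterns extension, the same postponed-filter idea can be sketched with a plain let binding. This is an illustrative variant, not the answer's code; primesPostponed and go are hypothetical names.

```haskell
-- Each prime p only starts filtering once the candidates reach p*p;
-- everything below p*p that survived the earlier filters is already prime.
primesPostponed :: [Integer]
primesPostponed = 2 : go primesPostponed [3 ..]
  where
    go (p : ps) xs =
      let (h, rest) = span (< p * p) xs
      in  h ++ go ps [x | x <- rest, x `mod` p > 0]
    go [] _ = [] -- unreachable: the list of primes is infinite
```

The list feeds on its own earlier elements, which works because every chunk h is produced before the next prime is demanded.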

A kind of sliding window

This function comes from some code to calculate convolutions of finite sequences.
window n k = [ drop (i-k) $ take i $ [1..n] | i <- [1..(n+k)-1] ]
*Main> window 4 6
[[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4],[1,2,3,4],[2,3,4],[3,4],[4]]
It's a sliding window of length k over a sequence of length n, where k can be larger than n.
The code calls take and drop on the source list roughly n+k times, so it seems to have at least quadratic complexity.
Clearly, it can be written without a list comprehension:
window n k = map (\i -> (drop (i-k) . take i) [1..n]) [1..(n+k)-1]
Is there a better way to do this?
Edit:
Full set of examples, as requested.
Prelude> window 4 4
[[1],[1,2],[1,2,3],[1,2,3,4],[2,3,4],[3,4],[4]]
Prelude> window 4 6
[[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4],[1,2,3,4],[2,3,4],[3,4],[4]]
Prelude> window 6 4
[[1],[1,2],[1,2,3],[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6],[5,6],[6]]
Computing the convolution of [1..4] and [1..5] works like this:
Prelude> let a = window 4 5
Prelude> let b = window 5 4
Prelude> map sum $ zipWith (zipWith (*)) a (map reverse b)
[1,4,10,20,30,34,31,20]

So you want to have a window of length k sliding over the given sequence (its length n is then not important).
It starts with just its last cell over the head of the sequence, then it moves along notch-by-notch until it covers the sequence's last element by its head cell.
This is then just map (take k) (tails sequence) with take k (inits sequence) in the front:
window :: Int -> [a] -> [[a]]
window k = (++) <$> take k . inits <*> map (take k) . tails
Observe:
> window 4 [1..6]
[[],[1],[1,2],[1,2,3],[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6],[5,6],[6],[]]
> window 6 [1..4]
[[],[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4],[2,3,4],[3,4],[4],[]]
You can take care of the []s by putting it through init . tail.
There's a discrepancy with your desired output in case k > n. If that's important an additional sequence of xs should be inserted between the two parts. Thus we get
-- NB: will diverge on infinite lists
window :: Int -> [a] -> [[a]]
window k xs
  = (init . tail) $
      take k (inits xs)
      ++ replicate (k-n-1) xs
      ++ map (take k) (tails xs)
  where
    n = length xs
Note: measuring length is an anti-pattern; it is used here for prototyping purposes only. Because of it the function will get stuck on infinite lists. Instead, length should be fused in so the function will be productive, producing successive windows indefinitely right away.
So now we get
> window 4 [1..6]
[[1],[1,2],[1,2,3],[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6],[5,6],[6]]
> window 6 [1..4]
[[1],[1,2],[1,2,3],[1,2,3,4],[1,2,3,4],[1,2,3,4],[2,3,4],[3,4],[4]]
tails is linear, and inits, normally quadratic, is capped by take k so in case k << n it'll be linear as well.
For completeness, here's a version which doesn't measure the length of the input list, so it works for infinite inputs as well:
window :: Int -> [a] -> [[a]]
window k xs | k > 0
  = a
  ++ replicate (k - length a) xs
  ++ (init . map (take k) . tails
      . drop 1 $ xs)
  where
    a = take k . tail $ inits xs
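To tie this back to the question's use case, the last definition can be plugged into the convolution recipe from the question. The sketch below makes two assumptions that are not in the answer: the k <= 0 case returns an empty list, and convolve is a hypothetical helper name.

```haskell
import Data.List (inits, tails)

-- The length-free window from the answer, with an explicit (assumed) case for k <= 0.
window :: Int -> [a] -> [[a]]
window k xs
  | k > 0     = a ++ replicate (k - length a) xs
                  ++ (init . map (take k) . tails . drop 1 $ xs)
  | otherwise = []
  where
    a = take k (drop 1 (inits xs))

-- Full convolution of two finite lists: slide a window of the other list's
-- length over each input, reverse one side, and sum the pairwise products.
convolve :: Num a => [a] -> [a] -> [a]
convolve xs ys =
  map sum $ zipWith (zipWith (*)) (window (length ys) xs)
                                  (map reverse (window (length xs) ys))
```

With these definitions, convolve [1..4] [1..5] should reproduce the list shown in the question's edit.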

Using fold* to grow a list in Haskell

I'm trying to solve the following problem in Haskell: given an integer return the list of its digits. The constraint is I have to only use one of the fold* functions (* = {r,l,1,l1}).
Without such constraint, the code is simple:
list_digits :: Int -> [Int]
list_digits 0 = []
list_digits n = list_digits r ++ [n-10*r]
  where
    r = div n 10
But how do I use fold* to essentially grow a list of digits from an empty list?
Thanks in advance.

Is this a homework assignment? It's pretty strange for the assignment to require you to use foldr, because this is a natural use for unfoldr, not foldr. unfoldr :: (b -> Maybe (a, b)) -> b -> [a] builds a list, whereas foldr :: (a -> b -> b) -> b -> [a] -> b consumes a list. An implementation of this function using foldr would be horribly contorted.
import Data.List (unfoldr)
listDigits :: Int -> [Int]
listDigits = unfoldr digRem
  where digRem x
          | x <= 0 = Nothing
          | otherwise = Just (x `mod` 10, x `div` 10)
In the language of imperative programming, this is basically a while loop. Each iteration of the loop appends x `mod` 10 to the output list and passes x `div` 10 to the next iteration. (The digits therefore come out least significant first; apply reverse if you need the order produced by list_digits in the question.) In, say, Python, this'd be written as
def list_digits(x):
    output = []
    while x > 0:
        output.append(x % 10)
        x = x // 10
    return output
But unfoldr allows us to express the loop at a much higher level. unfoldr captures the pattern of "building a list one item at a time" and makes it explicit. You don't have to think through the sequential behaviour of the loop and realise that the list is being built one element at a time, as you do with the Python code; you just have to know what unfoldr does. Granted, programming with folds and unfolds takes a little getting used to, but it's worth it for the greater expressiveness.
If your assignment is marked by machine and it really does require you to type the word foldr into your program text, (you should ask your teacher why they did that and) you can play a sneaky trick with the following "id[]-as-foldr" function:
obfuscatedId = foldr (:) []
listDigits = obfuscatedId . unfoldr digRem

Though unfoldr is probably what the assignment meant, you can write this using foldr if you use foldr as a hylomorphism, that is, building up one list while it tears another down.
digits :: Int -> [Int]
digits n = snd $ foldr go (n, []) places where
  places = replicate num_digits ()
  num_digits | n > 0 = 1 + floor (logBase 10 $ fromIntegral n)
             | otherwise = 0
  go () (n, ds) = let (q,r) = n `quotRem` 10 in (q, r : ds)
Effectively, what we're doing here is using foldr as "map-with-state". We know ahead of time how many digits we need to output (using log10), just not what those digits are, so we use unit (()) values as stand-ins for those digits.
If your teacher's a stickler for just having a foldr at the top-level, you can get away with making go partial:
digits' :: Int -> [Int]
digits' n = foldr go [n] places where
  places = replicate num_digits ()
  num_digits | n > 0 = floor (logBase 10 $ fromIntegral n)
             | otherwise = 0
  go () (n:ds) = let (q,r) = n `quotRem` 10 in (q:r:ds)
This has slightly different behaviour on non-positive numbers:
>>> digits 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits' 1234567890
[1,2,3,4,5,6,7,8,9,0]
>>> digits 0
[]
>>> digits' 0
[0]
>>> digits (negate 1234567890)
[]
>>> digits' (negate 1234567890)
[-1234567890]
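One caveat with counting digits through logBase: the result is a Double, and for some exact powers of ten it can land just below the true value (logBase 10 1000 is commonly reported as 2.9999999999999996), in which case floor undercounts. A sketch that keeps the foldr shape but counts digits with integer division only is given below; digitsInt is a hypothetical name and this is one possible workaround, not the answer's method.

```haskell
-- One placeholder per digit: n, n `div` 10, n `div` 100, ... while positive.
-- The placeholders' values are ignored; only their count matters.
digitsInt :: Int -> [Int]
digitsInt n = snd $ foldr go (n, []) places
  where
    places = takeWhile (> 0) (iterate (`div` 10) n)
    go _ (m, ds) = let (q, r) = m `quotRem` 10 in (q, r : ds)
```

Like digits above, this returns [] for zero and for negative inputs.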

Computing Moving Average in Haskell

I'm working on learning Haskell, so I tried to implement a moving average function. Here is my code:
mAverage :: Int-> [Int] -> [Float]
mAverage x a = [fromIntegral k / fromIntegral x | k <- rawAverage]
  where
    rawAverage = mAverage' x a a
-- First list contains original values; second list contains moving average computations
mAverage' :: Int -> [Int] -> [Int] -> [Int]
mAverage' 1 a b = b
mAverage' x a b = mAverage' (x - 1) a' b'
  where
    a' = init a
    b' = zipWith (+) a' (tail b)
where the user calls mAverage with a length for each average and the list of values (e.g. mAverage 4 [1,2..100]).
However, when I run the code on the input mAverage 4 [1,2..100000], I get that it takes 3.6 seconds in ghci (using :set +s) and uses a gigabyte of memory. This seems very inefficient to me, as the equivalent function takes a fraction of a second in Python. Is there some way that I could make my code more efficient?

If you want to learn something new, take a look at this nice solution to the moving average problem. It was written by one of my students, so I won't claim authorship. I really like it because it's very short. The only problem here is the average function. Such functions are known to be bad. Instead you can use Beautiful folds by Gabriel Gonzalez. And yes, this function takes O(k) time (where k is the size of the window) to calculate the average of each window. I find that better, because you can run into floating-point errors if you only add the new element to the window and subtract the last one. Oh, it also uses the State monad :)
{-# LANGUAGE UnicodeSyntax #-}
module MovingAverage where
import Control.Monad (forM)
import Control.Monad.State (evalState, gets, modify)
moving :: Fractional a ⇒ Int → [a] → [a]
moving n _ | n <= 0 = error "non-positive argument"
moving n xs = evalState (forM xs $ \x → modify ((x:) . take (n-1)) >> gets average) []
  where
    average xs = sum xs / fromIntegral n
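The remark about average functions presumably refers to the usual sum-divided-by-length definition, which walks the list twice; the "beautiful folds" idea is to compute both quantities in a single pass. The sketch below shows that idea using only foldl' from base. It does not use the API of any fold library, and MeanAcc and mean are hypothetical names.

```haskell
import Data.List (foldl')

-- Strict accumulator holding the running sum and the element count.
data MeanAcc = MeanAcc !Double !Int

-- Single-pass mean; returns 0 for an empty list (an arbitrary choice for this sketch).
mean :: [Double] -> Double
mean xs = case foldl' step (MeanAcc 0 0) xs of
  MeanAcc _ 0 -> 0
  MeanAcc s n -> s / fromIntegral n
  where
    step (MeanAcc s n) x = MeanAcc (s + x) (n + 1)
```

The strict fields keep the accumulator evaluated at every step, so no chain of thunks builds up while the list is consumed.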

Here is a straightforward list-based solution which is idiomatic and fast enough, though it requires more memory.
import Data.List (tails)
mavg :: Fractional b => Int -> [b] -> [b]
mavg k lst = take (length lst - k + 1) $ map average $ tails lst -- a list of length n has n-k+1 full windows
  where average = (/ fromIntegral k) . sum . take k
This solution allows any function to be used instead of average in a moving window.
The following solution is less universal but it is constant in space and seems to be the fastest one.
import Data.List (scanl')
mavg :: Fractional b => Int -> [b] -> [b]
mavg k lst = map (/ fromIntegral k) $ scanl' (+) (sum h) $ zipWith (-) t lst
  where (h, t) = splitAt k lst
Finally, the solution which uses a kind of Okasaki's persistent functional queue, to keep the moving window. It does make sense when dealing with streaming data, like conduits or pipes.
mavg k lst = map average $ scanl' enq ([], take k lst) $ drop k lst
  where
    average (l,r) = (sum l + sum r) / fromIntegral k
    enq (l, []) x = enq ([], reverse l) x
    enq (l, (_:r)) x = (x:l, r)
And as it was mentioned in comments to the original post, do not use ghci for profiling. For example, you won't be able to see any benefits of scanl' in ghci.
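The running-sum version relies on the fact that consecutive window sums differ by one entering and one leaving element. A quick way to convince yourself is to compare it against a direct definition; the sketch below does that, assuming 0 < k <= length xs. The names mavgNaive and mavgRunning are hypothetical, and mavgRunning is the scanl' solution above specialised to Double.

```haskell
import Data.List (scanl', tails)

-- Reference definition: average every full window of size k directly, O(n*k).
mavgNaive :: Int -> [Double] -> [Double]
mavgNaive k xs =
  [sum w / fromIntegral k | t <- tails xs, let w = take k t, length w == k]

-- Running-sum version: start from the first window's sum, then for each later
-- position add the element entering the window and subtract the one leaving it.
mavgRunning :: Int -> [Double] -> [Double]
mavgRunning k xs = map (/ fromIntegral k) $ scanl' (+) (sum h) $ zipWith (-) t xs
  where
    (h, t) = splitAt k xs
```

On small inputs such as mavgNaive 3 [1..6] and mavgRunning 3 [1..6] the two should agree exactly; on long floating-point inputs the running sum may drift slightly, which is the trade-off mentioned in the other answer.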

Here's a solution for you.
The idea is to scan two lists, one where the averaging window starts, and another where it ends. Getting the tail end of a list costs as much as scanning the part we're skipping, and we're not copying anything. (If the window size were usually quite large, we could calculate remaining_data along with the sum of initial_data, in one go.)
We generate a list of partial sums as described in my comment, then divide them by the window width to get averages.
While slidingAverage calculates averages for a biased position (window width to the right), centeredSlidingAverage calculates centered averages, using half the window width to the left and to the right.
import Data.List (splitAt, replicate)
slidingAverage :: Int -> [Int] -> [Double] -- window size, source list -> list of averages
slidingAverage w xs = map divide $ initial_sum : slidingSum initial_sum xs remaining_data
  where
    divide = (\n -> (fromIntegral n) / (fromIntegral w)) -- divides the sums by window size
    initial_sum = sum initial_data
    (initial_data, remaining_data) = splitAt w xs
centeredSlidingAverage :: Int -> [Int] -> [Double] -- window size, source list -> list of averages
centeredSlidingAverage w xs = slidingAverage w $ left_padding ++ xs ++ right_padding
  where
    left_padding = replicate half_width 0
    right_padding = replicate (w - half_width) 0
    half_width = (w `quot` 2) -- quot is integer division
slidingSum :: Int -> [Int] -> [Int] -> [Int] -- window_sum before_window after_window -> list of sums
slidingSum _ _ [] = []
slidingSum window_sum before_window after_window = new_sum : slidingSum new_sum new_before new_after
  where
    value_to_go = head before_window
    new_before = tail before_window
    value_to_come = head after_window
    new_after = tail after_window
    new_sum = window_sum - value_to_go + value_to_come
When I try length $ slidingAverage 10 [1..1000000], it takes less than a second on my MBP. Due to the laziness, centeredSlidingAverage takes about the same time.

One simple way of doing it that also has O(n) complexity (for a fixed window size):
movingAverage :: (Fractional a) => Int -> [a] -> [a]
movingAverage n _ | n <= 0 = error "non-positive argument"
movingAverage n xs = fmap average $ groupBy n xs
  where average xs' = sum xs' / fromIntegral (length xs')
groupBy :: Int -> [a] -> [[a]]
groupBy _ [] = []
groupBy n xs = go [] xs
  where
    go _ [] = []
    go l (x:xs') = (x:t) : go (x:l) xs'
      where t = take (n-1) l

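Another way is to use a mutable array in the ST monad (the code below uses STArray rather than STUArray). Note that it averages each pair of adjacent elements, i.e. a fixed window of size 2.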
import Data.Array.Unboxed
import Data.Array.ST
import Data.STRef
import Control.Monad
import Control.Monad.ST
movingAverage :: [Double] -> IO [Double]
movingAverage vals = stToIO $ do
  let end = length vals - 1
  myArray <- newArray (1, end) 0 :: ST s (STArray s Int Double)
  forM_ [1 .. end] $ \i -> do
    let cval = vals !! i
    let lval = vals !! (i-1)
    writeArray myArray i ((cval + lval)/2)
  getElems myArray
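Two things stand out in the code above: each vals !! i walks the list from the start, and the result has to live in IO. A possible variant, assuming the array package that ships with GHC, copies the input into an unboxed array first and uses runSTUArray to get a pure result. This is a sketch rather than a drop-in replacement; pairAverages is a hypothetical name, and like the original it only averages adjacent pairs.

```haskell
import Control.Monad (forM_)
import Data.Array.ST (newArray, runSTUArray, writeArray)
import Data.Array.Unboxed (UArray, elems, listArray, (!))

-- Averages of adjacent pairs, computed in a mutable unboxed array.
pairAverages :: [Double] -> [Double]
pairAverages vals = elems result
  where
    n = length vals
    -- Immutable copy of the input, so each lookup is O(1) instead of a list walk.
    src = listArray (0, n - 1) vals :: UArray Int Double
    result :: UArray Int Double
    result = runSTUArray $ do
      out <- newArray (1, n - 1) 0
      forM_ [1 .. n - 1] $ \i ->
        writeArray out i ((src ! i + src ! (i - 1)) / 2)
      return out
```

For inputs with fewer than two elements the output array has empty bounds, so the result is the empty list.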
