Optimize this bit of conduit code for speed - haskell

I am implementing an approximate counting algorithm where we:
Maintain t counters {X1,..., Xt} using log (log n) bits for each
Initialize all counters to 0
When an item arrives, increase each Xi by 1 independently with probability (1/2)^Xi
When the stream is over, output Z = (1/t) * ((2^X1 - 1) + ... + (2^Xt - 1))
Repeat the above steps m times independently and in parallel, and output the median.
Here it is in Haskell, using the conduit library.
import Data.Random
import Data.Conduit
import Data.List
import Data.Ord (comparing)
import qualified Data.Conduit.List as Cl
import Control.Monad.Identity
type Prob = Double
type Counter = Float
type Delta = Double
type Eps = Double
-- * Run Morris alpha on stream inputs `xs`
morrisA :: [a] -> IO Counter
morrisA xs = flip runRVar StdRandom $ Cl.sourceList xs $$ alpha
-- * Run Morris beta on stream inputs `xs` for `t` independent trials and average
morrisB :: Int -> [a] -> IO Counter
morrisB t = fmap rmean . replicateM t . morrisA
-- * final morris algorithm
-- * Run on stream inputs `xs` for `t` independent trials, `t = 1/(eps^2 * delta)`,
-- * and `m` times in parallel, `m = 1/delta`,
-- * and take the median
morris :: Eps -> Delta -> [a] -> IO Counter
morris e d = fmap rmedian . replicateM m . morrisB t
  where (t, m) = (round $ 1 / (e^2 * d), round $ 1 / d)
-- * Utils * --
-- * A step in morris Algorithm alpha
alpha :: Sink a RVar Counter
alpha = (\x -> 2^(round x) - 1) <$> Cl.foldM (\x _ -> incr x) 0
-- * Increment a counter `x` with probability 1/2^x
incr :: Counter -> RVar Counter
incr x = do
  h <- (\q -> q <= (0.5 ^ round x :: Prob)) <$> uniform 0 1
  return $! if h then succ x else x
rmean, rmedian :: (Floating a, Ord a, RealFrac a) => [a] -> Float
rmean = fromIntegral . round . mean
rmedian = fromIntegral . round . median
-- |Numerically stable mean
mean :: Floating a => [a] -> a
mean x = fst $ foldl' (\(!m, !n) x -> (m+(x-m)/(n+1),n+1)) (0,0) x
-- |Median
median :: (Floating a, Ord a) => [a] -> a
median x
  | odd n     = x' !! (n `div` 2)
  | otherwise = mean $ take 2 $ drop i x'
  where
    i  = (length x' `div` 2) - 1
    x' = sort x
    n  = length x
The problem is that the runtime of morris is linear in both the length of the stream and the number of trials t*m. For example, morrisA takes about 100μs for 100 items, and if we want to be 95% confident of 5% error (eps = delta = 0.05), we have to run morrisA t*m = 160000 times.
Could someone suggest how to optimize this code? Maybe:
implement morris in something other than conduit
find some faster way to repeat morrisA than replicateM (see the sketch below).
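One possible direction (a sketch under assumptions, not a drop-in replacement for the conduit pipeline above): keep all k = t*m counters in one unboxed mutable vector and make a single pass over the stream, so the list is traversed once instead of t*m times, and use mwc-random, which is typically much faster than random-fu's runRVar. The name morrisCounters is hypothetical; the averaging/median post-processing would stay as in morris.
import Control.Monad (forM, forM_, when)
import qualified Data.Vector.Unboxed.Mutable as VM
import System.Random.MWC (createSystemRandom, uniformR)

-- One pass over `xs`, flipping one coin per counter per item.
-- Returns the per-counter estimates 2^Xi - 1.
morrisCounters :: Int -> [a] -> IO [Double]
morrisCounters k xs = do
  gen <- createSystemRandom
  cs  <- VM.replicate k (0 :: Int)
  forM_ xs $ \_ ->
    forM_ [0 .. k - 1] $ \i -> do
      c <- VM.unsafeRead cs i
      u <- uniformR (0, 1 :: Double) gen
      when (u <= 0.5 ^ c) $ VM.unsafeWrite cs i (c + 1)
  forM [0 .. k - 1] $ \i -> do
    c <- VM.unsafeRead cs i
    return (2 ^^ c - 1)
This keeps the same O(n * t * m) number of coin flips but drops the per-trial conduit and RVar overhead. A further win is possible by sampling, for each counter, how many arrivals to skip until its next increment (a geometric distribution), but that changes the structure more.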

Related

Counting "perfect squares" in a list in Haskell

I am new to Haskell and I am struggling to get this concept to work.
I have to take a list of integers as a parameter, count the number of perfect squares (1, 4, 9, etc.) in the list, and output that value. So, for example, myfunction [1,5,9] would output 2.
So far I have only got one function to work:
myfunction list = map per list
  where per y = floor (sqrt (y) * sqrt (y))
It takes the square root of the element multiplied by the square root of the element. My idea was to compare that result with the original element: if they are equal, the element is a perfect square, and then I would increment a value.
My issue is that my book doesn't give me any ideas for how to increment a variable or how to combine more than one function, and as a result I've been aimlessly working on this over the course of 3 days.
Thank you in advance for any help, advice, or resources!
Fixing your version
First, your completed version could look like this:
myfunction list = length $ filter per list
  where
    per y = floor (sqrt y * sqrt y) == y
This even type-checks, but it does not work (try it).
That's because there is a small problem. Let's make it obvious by giving some types:
myfunction :: [Int] -> Int
myfunction list = length $ filter per list
  where
    per :: Int -> Bool
    per y = floor (sqrt y * sqrt y) == y
you get an error:
No instance for (Floating Int) arising from a use of ‘sqrt’
it is trying to tell you that it does not know how to use sqrt on an Int. An easy fix is using fromIntegral to convert the Int into something sqrt can work with:
myfunction :: [Int] -> Int
myfunction list = length $ filter per list
  where
    per :: Int -> Bool
    per y = floor (sqrt (fromIntegral y) * sqrt (fromIntegral y)) == y
this kind of works (it compiles, but gives the wrong answer) and produces a warning, which we can get rid of with
myfunction :: [Int] -> Int
myfunction list = length $ filter per list
  where
    per :: Int -> Bool
    per y = floor (sqrt (fromIntegral y :: Double) * sqrt (fromIntegral y)) == y
where we tell Haskell what type to use for the conversion (the warning would tell you that it defaults to this anyway).
But it still gives the wrong answer.
@jpmarinier already explained why: the way you take the square root and compare is sadly not cutting it (at least not the way I think you intended). Here is a fix:
myfunction :: [Int] -> Int
myfunction list = length $ filter per list
  where
    per :: Int -> Bool
    per y = let y' = fromIntegral y :: Double in sqrt y' ** 2 == y'
where we first convert y to a Double value y' and test that.
Another option is an integer square root, as @jpmarinier mentioned:
myfunction :: [Int] -> Int
myfunction list = length $ filter per list
  where
    per :: Int -> Bool
    per y = squareRoot y * squareRoot y == y

squareRoot :: Int -> Int
squareRoot = floor . sqrt . (fromIntegral :: Int -> Double)
that should finally work.
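For a quick check, the example from the question in GHCi:
> myfunction [1,5,9]
2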
Without floor and sqrt:
OK, this is maybe a bit much to grok for now, but here is how you can do it by sieving out the values.
Let's start by creating an (ascending) list of all perfect squares. I don't know which type you want those numbers to be, so let's stay generic:
-- need later
import Data.List (sort)
perfectSquares :: Enum a => Num a => [a]
perfectSquares = [ n*n | n <- [1..] ]
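Being infinite, this list is only ever consumed prefix by prefix, e.g.:
> take 5 perfectSquares :: [Int]
[1,4,9,16,25]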
Now let's make a function that counts the common elements of two lists. If both lists are sorted, this can be done recursively by walking along them together. I don't know if your input lists are always sorted, so let's sort the input first:
countOccurances :: (Enum a, Num a, Ord a) => [a] -> [a] -> a
countOccurances from list =
  countAlong from $ sort list
  where
    countAlong [] _ = 0
    countAlong _ [] = 0
    countAlong allXs@(x:xs) allYs@(y:ys)
      | x < y     = countAlong xs allYs
      | x > y     = countAlong allXs ys
      | otherwise = 1 + countAlong allXs ys
Having both, we can combine them for the answer:
import Data.List (sort)

countPerfectSquares :: (Enum a, Num a, Ord a) => [a] -> a
countPerfectSquares = countOccurances perfectSquares

countOccurances :: (Enum a, Num a, Ord a) => [a] -> [a] -> a
countOccurances from list =
  countAlong from $ sort list
  where
    countAlong [] _ = 0
    countAlong _ [] = 0
    countAlong allXs@(x:xs) allYs@(y:ys)
      | x < y     = countAlong xs allYs
      | x > y     = countAlong allXs ys
      | otherwise = 1 + countAlong allXs ys

perfectSquares :: Enum a => Num a => [a]
perfectSquares = [ n*n | n <- [1..] ]
Example:
> countPerfectSquares [1,5,9] :: Int
2

Haskell read n numbers from user and return their sum

I'm trying to write a function isums that reads n numbers from the user and returns their sum. Also, after each number, the sum up to that number should be printed. I have this so far:
isums :: Int -> IO Int
isums n = do
  num <- readLn
  putStrLn (show (num + sum))
  sum <- isums (n - 1)
  return (num + sum)
Also I'm not using IORef.
This would probably be easiest to express using a helper function, as the extra requirement of printing the partial sums after each input adds a bit of extra clutter:
isums :: Int -> IO Int
isums n = helper n 0
  where
    helper 0 acc = return acc
    helper m acc = do
      x <- readLn
      let x' = acc + x
      print x'
      helper (m - 1) x'
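A sample session (my illustration, with hypothetical inputs 1, 2, 3):
> isums 3
1        -- user enters 1
1        -- partial sum printed
2        -- user enters 2
3        -- partial sum printed
3        -- user enters 3
6        -- partial sum printed
6        -- the value returned by isums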
What you're doing is kind of like a fold (look at foldM), only instead of traversing a list, you're getting the values "to be folded" from IO. If we had a function:
accM :: Monad m => (a -> m a) -> m a -> Int -> m a
accM f acc 0 = acc
accM f acc n = accM f (acc >>= f) (n - 1)
Then we could write this as:
isums :: Int -> IO Int
isums n = accM helper (return 0) n
  where
    helper acc = do
      x <- readLn
      let x' = acc + x
      print x'
      return x'
Which is a bit nicer (and more reusable) as it lets us separate the general behaviour (accM) from the specific behaviour (helper).
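(As an aside, accM just iterates a monadic step n times; for example, accM (return . (+1)) (return 0) 5 builds an action that yields 5, in any monad.)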
Here is one more solution: it builds on the last version of @Willem's, but instead of using a list of ()'s (which is a bit underwhelming) as fuel for the loop (foldM), it folds over a list of actions that read the values.
import Control.Monad
isums n = foldM go 0 (replicate n readLn)
  where
    go s a = do
      x <- a
      let s' = s + x
      print s'
      return s'
Here replicate n readLn creates a list of actions, each of which reads an integer. These actions are not run until go is called during the fold by foldM. The fact that we can create such a list without performing any actual reading stems from Haskell's laziness.
You can use replicateM :: Applicative m => Int -> m a -> m [a] for this:
import Control.Monad(replicateM)
isums :: (Read n, Num n) => Int -> IO n
isums n = do
  numbers <- replicateM n readLn
  return (sum numbers)
So here we repeat readLn the given number of times, and then we return the sum of the numbers list.
An equivalent can be achieved with an fmap:
import Control.Monad(replicateM)
isums :: (Read n, Num n) => Int -> IO n
isums n = fmap sum (replicateM n readLn)
Or even pointfree (and pointless):
import Control.Monad(replicateM)
isums :: (Read n, Num n) => Int -> IO n
isums = fmap sum . flip replicateM readLn
We can also produce a list of partial sums with scanl:
import Control.Monad(replicateM)
isums :: (Read n, Num n) => Int -> IO [n]
isums = fmap (scanl (+) 0) . flip replicateM readLn
and then process the list later; or, in case we need to print these partial sums, we can perform a mapM_ on that list, like:
import Control.Monad(replicateM)
isums :: (Read n, Num n) => Int -> IO ()
isums n = fmap (scanl (+) 0) (replicateM n readLn) >>= mapM_ print
Or in case we need to print the partial sums:
isums :: (Read n, Num n, Show n) => Int -> IO n
isums n = foldM f 0 (replicate n ())
  where f a _ = readLn >>= \b -> let c = a + b in print c >> return c

Lagrange Interpolation for a schema based on Shamir's Secret Sharing

I'm trying to debug an issue with an implementation of a threshold encryption scheme. I've posted this question on crypto to get some help with the actual scheme but was hoping to get a sanity check on the simplified code I am using.
Essentially, the crypto system uses Shamir's Secret Sharing to combine the shares of a key. The polynomial is each member of the list 'a' multiplied by an increasing power of the polynomial's parameter. I've left out the mod by a prime to simplify the code, as the actual implementation uses PBC via a Haskell wrapper.
I have for the polynomial
poly :: [Integer] -> Integer -> Integer
poly as xi = f 1 as
  where
    f _ [] = 0
    f 0 _  = 0
    f s (a:as) = (a * s) + f (s * xi) as
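For a quick sanity check of poly: poly [10,20,30] 2 is 10*1 + 20*2 + 30*4 = 170, i.e. the polynomial 10 + 20x + 30x^2 evaluated at x = 2.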
The Lagrange interpolation is:
interp0 :: [(Integer, Integer)] -> Integer
interp0 xys = round (sum $ zipWith (*) ys $ fmap (f xs) xs)
  where
    xs = map (fromIntegral . fst) xys
    ys = map (fromIntegral . snd) xys
    f :: (Eq a, Fractional a) => [a] -> a -> a
    f xs xj = product $ map (p xj) xs
    p :: (Eq a, Fractional a) => a -> a -> a
    p xj xm = if xj == xm then 1 else negate (xm / (xj - xm))
and the split and combination code is
execPoly as@(a0:_) = do
  let xs = zipWith (,) [0..] (fmap (poly as) [0..100])
  let t = length as + 1
  let offset = 1
  let shares = take t (drop offset xs)
  let sm2 = interp0 shares
  putText ("poly and interp over " <> show as <> " = " <> show sm2 <> ". Should be " <> show a0)

main :: IO ()
main = do
  execPoly [10,20,30,40,50,60,70,80,90,100,110,120,130,140,150] -- 1
  execPoly [10,20,30,40,50,60,70,80] -- 2
The first execPoly call fails to combine to 10, but the second combines correctly. The magic threshold seems to be 8 shares.
Is my code correct? Am I missing something in the implementation that limits the threshold size to 8?
As @MathematicalOrchid said, it was a precision problem: the Lagrange basis terms are computed in Double (via the defaulting of fromIntegral/round), and with enough shares their products grow beyond what a Double's 53-bit mantissa can represent exactly, so the rounded sum drifts off the secret.
Updated the code to:
import Data.Ratio

f :: (Eq a, Integral a) => [a] -> a -> Ratio a
f xs xj = product $ map (p xj) xs

p :: (Eq a, Integral a) => a -> a -> Ratio a
p xj xm = if xj == xm then (1 % 1) else (negate xm) % (xj - xm)
And it works as expected.
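For completeness, here is a sketch (my addition, assuming the rest of the posted code is unchanged and using the Data.Ratio import above) of interp0 adjusted to the exact Ratio arithmetic; everything stays exact until the final round:
interp0 :: [(Integer, Integer)] -> Integer
interp0 xys = round . sum $ zipWith (*) ys (map (f xs) xs)
  where
    xs = map fst xys
    ys = map ((% 1) . snd) xys
    f :: [Integer] -> Integer -> Ratio Integer
    f xs' xj = product (map (p xj) xs')
    p :: Integer -> Integer -> Ratio Integer
    p xj xm = if xj == xm then 1 % 1 else negate xm % (xj - xm)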

Function Type Restrictions

Is it generally preferable to give a function the strictest or the loosest type signature? What are the pros and cons of each approach? I found that when I rewrote my Pearson correlation code using only Doubles, it was easier for me to write, follow, and reason about (this could just be inexperience). But I can also see how a broader type signature would make the functions more generally applicable. Would stricter type signatures be characterized as a form of tech debt?
With Typeclasses:
import Data.List
mean :: Fractional a => [a] -> a
mean xs = s / n
  where
    (s, n) = foldl' k (0, 0) xs
    k (s, n) x = s `seq` n `seq` (s + x, n + 1)
covariance :: Fractional a => [a] -> [a] -> a
covariance xs ys = mean productXY
  where
    productXY = zipWith (*) [x - mx | x <- xs] [y - my | y <- ys]
    mx = mean xs
    my = mean ys
stddev :: Floating a => [a] -> a
stddev xs = sqrt (covariance xs xs)
pearson :: RealFloat a => [a] -> [a] -> a
pearson x y = fifthRound $ covariance x y / (stddev x * stddev y)
pearsonMatrix :: RealFloat a => [[a]] -> [[a]]
pearsonMatrix (x:xs) = [pearson x y | y <- x:xs]:(pearsonMatrix xs)
pearsonMatrix [] = []
fifthRound :: RealFrac a => a -> a
fifthRound x = (/100000) $ fromIntegral $ round (x * 100000)
With Doubles:
import Data.List
mean :: [Double] -> Double
mean xs = s / n
  where
    (s, n) = foldl' k (0, 0) xs
    k (s, n) x = s `seq` n `seq` (s + x, n + 1)
covariance :: [Double] -> [Double] -> Double
covariance xs ys = mean productXY
  where
    productXY = zipWith (*) [x - mx | x <- xs] [y - my | y <- ys]
    mx = mean xs
    my = mean ys
stddev :: [Double] -> Double
stddev xs = sqrt (covariance xs xs)
pearson :: [Double] -> [Double] -> Double
pearson x y = fifthRound (covariance x y / (stddev x * stddev y))
pearsonMatrix :: [[Double]] -> [[Double]]
pearsonMatrix (x:xs) = [pearson x y | y <- x:xs]:(pearsonMatrix xs)
pearsonMatrix [] = []
fifthRound :: Double -> Double
fifthRound x = (/100000) $ fromIntegral $ round (x * 100000)
Readability is a matter of opinion. In general, I find more general type signatures more readable, because there are fewer possible definitions (sometimes there is even only one non-diverging definition). For example, seeing that mean only has a Fractional constraint immediately limits the operations being performed in that function (compared to the Double version, which could be performing sqrt operations for all I know). Of course, generalizing types is not always more readable.
The main disadvantage of the more general versions of functions is that they may remain unoptimized at runtime, so that the Fractional dictionary for Double has to be passed to mean every time it is called.
You can have the best of both worlds by adding a SPECIALIZE pragma. This tells the compiler to duplicate your function code with some of the type variables instantiated. If you know you are going to call your mean function pretty much only with Double, this is what I would do:
{-# SPECIALIZE mean :: [Double] -> Double #-}
mean :: Fractional a => [a] -> a
mean xs = s / n
  where
    (s, n) = foldl' k (0, 0) xs
    k (s, n) x = s `seq` n `seq` (s + x, n + 1)
And you get to see the specialized version of the signature in your code too! Yay!
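An aside (my addition, not from the original answer): if you don't want to enumerate concrete types up front, marking the definition with {-# INLINABLE mean #-} instead keeps its unfolding in the interface file, so GHC can specialize it at call sites in other modules as they appear.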

Ever increasing CPU consumption with Haskell and stream-fusion

Here is a short Haskell program that generates a 440 Hz sound. It uses pulseaudio as an audio backend.
import GHC.Float
import Control.Arrow
import Sound.Pulse.Simple
import qualified Data.List.Stream as S
import Data.List
type Time = Double
type Frequency = Double
type Sample = Double
type CV = Double
chunksize = 441 * 2
sampleRate :: (Fractional a) => a
sampleRate = 44100
integral :: [Double] -> [Double]
integral = scanl1 (\acc x -> acc + x / sampleRate)
chunks :: Int -> [a] -> [[a]]
chunks n = S.takeWhile (not . S.null) . S.unfoldr (Just . S.splitAt n)
pulseaudioOutput :: [Sample] -> IO ()
pulseaudioOutput sx = do
  pa <- simpleNew Nothing "Synths" Play Nothing "Synths PCM output"
          (SampleSpec (F32 LittleEndian) 44100 1) Nothing Nothing
  mapM_ (simpleWrite pa . S.map double2Float) $ chunks 1000 sx
  simpleDrain pa
  simpleFree pa
oscSine :: Frequency -> [CV] -> [Sample]
oscSine f = S.map sin <<< integral <<< S.map ((2 * pi * f *) . (2**))
music :: [Sample]
music = oscSine 440 (S.repeat 0)
main = do
  pulseaudioOutput music
If I compile and run this, I see ever-growing CPU consumption.
If I change "S.splitAt" to "splitAt" in the definition of "chunks", everything is fine.
Can anyone guess why this might be?
Thank you.
Update
In the following code, all three versions of chunks can produce the aforementioned behaviour:
import GHC.Float
import Control.Arrow
import Sound.Pulse.Simple
import Data.List.Stream
import Prelude hiding ( unfoldr
                      , map
                      , null
                      , scanl1
                      , takeWhile
                      , repeat
                      , splitAt
                      , drop
                      , take
                      )
type Time = Double
type Frequency = Double
type Sample = Double
type CV = Double
chunksize = 441 * 2
sampleRate :: (Fractional a) => a
sampleRate = 44100
integral :: [Double] -> [Double]
integral = scanl1 (\acc x -> acc + x / sampleRate)
chunks :: Int -> [a] -> [[a]]
--chunks n = takeWhile (not . null) . unfoldr (Just . splitAt n)
--chunks n xs = take n xs : chunks n (drop n xs)
chunks n xs = h : chunks n t
  where
    (h, t) = splitAt n xs
pulseaudioOutput :: [Sample] -> IO ()
pulseaudioOutput sx = do
  pa <- simpleNew Nothing "Synths" Play Nothing "Synths PCM output"
          (SampleSpec (F32 LittleEndian) 44100 1) Nothing Nothing
  mapM_ (simpleWrite pa . map double2Float) $ chunks 1000 sx
  simpleDrain pa
  simpleFree pa
oscSine :: Frequency -> [CV] -> [Sample]
oscSine f = map sin <<< integral <<< map ((2 * pi * f *) . (2**))
music :: [Sample]
music = oscSine 440 (repeat 0)
main = do
  pulseaudioOutput music
I cleaned up the code to avoid mixing plain old lists and stream-fusion lists. The memory/CPU leak is still there. To check the code against plain old lists, just remove the Prelude import and the ".Stream" after "Data.List".
The splitAt on streams that is substituted by the fusion rules (http://hackage.haskell.org/package/stream-fusion-0.1.2.5/docs/Data-Stream.html#g:12) has the following signature:
splitAt :: Int -> Stream a -> ([a], [a])
From this we can see that it produces lists, not streams, which obstructs further fusion. The correct thing to do, I think, is either to produce a splitAt that generates streams, or, better yet, to write a chunks function directly on streams, with the appropriate fusion rules from the list version.
Here is a splitAt on streams that I think should be good. You would of course need to pair it with the appropriate rewrite rules from a splitAt on lists, and if those rewrite rules get tricky, perhaps write the chunks function directly, though it seems a bit tricky to do so as well:
-- needs {-# LANGUAGE BangPatterns #-}
splitAt :: Int -> Stream a -> (Stream a, Stream a)
splitAt n0 (Stream next s0)
  | n0 < 0    = (nilStream, Stream next s0)
  | otherwise = loop_splitAt n0 s0
  where
    nilStream = Stream (const Done) s0
    loop_splitAt 0 !s = (nilStream, Stream next s)
    loop_splitAt !n !s = case next s of
      Done       -> (nilStream, nilStream)
      Skip s'    -> loop_splitAt n s'
      Yield x s' -> let (xs', xs'') = loop_splitAt (n - 1) s'
                    in (cons x xs', xs'')
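To hook this up, the list-level splitAt would go through stream/unstream in the usual way; a rough, untested sketch, assuming the Data.Stream internals used above (stream, unstream, cons) and leaving the RULES pairing aside:
splitAtL :: Int -> [a] -> ([a], [a])
splitAtL n xs = (unstream s1, unstream s2)
  where
    (s1, s2) = splitAt n (stream xs) -- the stream-level splitAt above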
