Haskell `randoms` function not behaving well with my library

I'm trying to write a Haskell library for cryptographically secure random numbers. The code follows:
module URandom (URandom, initialize) where

import qualified Data.ByteString.Lazy as B
import System.Random
import Data.Word

newtype URandom = URandom [Word8]

instance RandomGen URandom where
  next (URandom (x : xs)) = (fromIntegral x, URandom xs)
  split (URandom l) = (URandom (evens l), URandom (odds l))
    where evens (x : _ : xs) = x : evens xs
          odds  (_ : x : xs) = x : odds xs
  genRange _ = (fromIntegral (minBound :: Word8), fromIntegral (maxBound :: Word8))

initialize :: IO URandom
initialize = URandom . B.unpack <$> B.readFile "/dev/urandom"
Unfortunately, it's not behaving like I want. In particular, performing
take 10 . randoms <$> initialize
yields (something similar to)
[-4611651379516519433,-4611644973572935887,-31514321567846,9223361179177989878,-4611732094835278236,9223327886739677537,4611709625714976418,37194416358963,4611669560113361421,-4611645373004878170,-9223329383535098640,4611675323959360258,-27021785867556,9223330964083681227,4611705212636167666]
which, to my albeit untrained eye, does not appear very random; there are a lot of 46... and 92... prefixes in there.
What could be going wrong? Why doesn't this produce well-distributed numbers? It's worth noting that even if I concatenate Word8s together to form Ints, the distribution does not improve; I didn't think that code was worth including here.
Edit: here's some evidence that's not distributed correctly. I've written a function called histogram:
{-# LANGUAGE ScopedTypeVariables, UnicodeSyntax #-}
import qualified Data.Sequence as S

histogram :: ∀ t . (Integral t, Bounded t) => [t] -> Int -> S.Seq Int
histogram [] buckets = S.replicate buckets 0
histogram (x : xs) buckets = S.adjust (+ 1) (whichBucket x) (histogram xs buckets)
  where whichBucket x = fromIntegral $ ((fromIntegral x * fromIntegral buckets) :: Integer) `div` fromIntegral (maxBound :: t)
and when I run
g <- initialize
histogram (take 1000000 $ randoms g :: [Word64]) 16
I get back
fromList [128510,0,0,121294,129020,0,0,122090,127873,0,0,120919,128637,0,0,121657]
Some of the buckets are completely empty!

The issue is a bug in random-1.0.1.1 that was fixed in random-1.1. The changelog points to this ticket. In particular, referring to the older version:
It also assumes that all RandomGen implementations produce the same range of random values as StdGen.
Here randomness is produced 8 bits at a time, and that caused the observed behavior.
random-1.1 fixed this:
This implementation also works with any RandomGen, even ones that produce as little as a single bit of entropy per next call or have a minimum bound other than zero.
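Upgrading to random-1.1 (or later) is the real fix. If you are stuck on the old version, a rough workaround (my own sketch, untested, not from the linked ticket) is to hand the library a generator whose next already spans the full Int range by packing eight bytes per call:

import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word64)
import System.Random

-- Hypothetical wrapper: consumes /dev/urandom bytes eight at a time so the
-- default genRange (the full Int range) matches what random-1.0.1.1 assumes.
newtype URandom64 = URandom64 [Word8]

instance RandomGen URandom64 where
  next (URandom64 ws) = (fromIntegral packed, URandom64 rest)
    where (chunk, rest) = splitAt 8 ws
          packed = foldl (\acc b -> (acc `shiftL` 8) .|. fromIntegral b) (0 :: Word64) chunk
  split (URandom64 ws) = (URandom64 (evens ws), URandom64 (odds ws))
    where evens (x : _ : xs) = x : evens xs
          evens _            = []
          odds  (_ : x : xs) = x : odds xs
          odds  _            = []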

Related

Computing Moving Average in Haskell

I'm working on learning Haskell, so I tried to implement a moving average function. Here is my code:
mAverage :: Int -> [Int] -> [Float]
mAverage x a = [fromIntegral k / fromIntegral x | k <- rawAverage]
  where
    rawAverage = mAverage' x a a
    -- First list contains original values; second list contains moving average computations
    mAverage' :: Int -> [Int] -> [Int] -> [Int]
    mAverage' 1 a b = b
    mAverage' x a b = mAverage' (x - 1) a' b'
      where
        a' = init a
        b' = zipWith (+) a' (tail b)
where the user calls mAverage with a length for each average and the list of values (e.g. mAverage 4 [1,2..100]).
However, when I run the code on the input mAverage 4 [1,2..100000], I get that it takes 3.6 seconds in ghci (using :set +s) and uses a gigabyte of memory. This seems very inefficient to me, as the equivalent function takes a fraction of a second in Python. Is there some way that I could make my code more efficient?
If you want to learn something new, take a look at this nice solution to the moving-average problem. It was written by one of my students, so I won't claim authorship. I really like it because it's very short. The only weak spot is the average function; functions like that are known to be problematic, and you could use Gabriel Gonzalez's Beautiful Folds instead. And yes, this function takes O(k) time (where k is the window size) to compute each window's average; I actually prefer that, because adding only the new element and subtracting the old one can accumulate floating-point error. Oh, and it uses the State monad :)
{-# LANGUAGE UnicodeSyntax #-}
module MovingAverage where

import Control.Monad (forM)
import Control.Monad.State (evalState, gets, modify)

moving :: Fractional a ⇒ Int → [a] → [a]
moving n _ | n <= 0 = error "non-positive argument"
moving n xs = evalState (forM xs $ \x → modify ((x:) . take (n-1)) >> gets average) []
  where
    average xs = sum xs / fromIntegral n
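A quick check of my own in GHCi (note that the first few windows are shorter than n but still divided by n, so the leading values come out scaled down):

> moving 2 [1, 2, 3, 4]
[0.5,1.5,2.5,3.5]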
Here is a straightforward list-based solution which is idiomatic and fast enough, though it requires more memory.
import Data.List (tails)
mavg :: Fractional b => Int -> [b] -> [b]
mavg k lst = take (length lst - k + 1) $ map average $ tails lst
  where average = (/ fromIntegral k) . sum . take k
This solution allows you to use any function in place of average within the moving window.
The following solution is less universal but it is constant in space and seems to be the fastest one.
import Data.List (scanl')
mavg :: Fractional b => Int -> [b] -> [b]
mavg :: Fractional b => Int -> [b] -> [b]
mavg k lst = map (/ fromIntegral k) $ scanl' (+) (sum h) $ zipWith (-) t lst
  where (h, t) = splitAt k lst
Finally, the solution which uses a kind of Okasaki's persistent functional queue, to keep the moving window. It does make sense when dealing with streaming data, like conduits or pipes.
mavg k lst = map average $ scanl' enq ([], take k lst) $ drop k lst
  where
    average (l, r) = (sum l + sum r) / fromIntegral k
    enq (l, [])      x = enq ([], reverse l) x
    enq (l, (_ : r)) x = (x : l, r)
And as was mentioned in the comments on the original post, do not use GHCi for profiling: for example, you won't see any benefit from scanl' in GHCi.
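For instance, a minimal benchmark sketch with the criterion package (my suggestion, not part of the original answer; compile with -O2 and assume the scanl'-based mavg above is in scope):

import Criterion.Main

main :: IO ()
main = defaultMain
  [ bench "mavg 4" $ nf (mavg 4) [1 .. 100000 :: Double] ]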
Here's a solution for you.
The idea is to scan two lists, one where the averaging window starts and another where it ends. Getting the tail end of a list costs only as much as scanning the part we skip, and we're not copying anything. (If the window size were usually quite large, we could compute remaining_data while counting the sum of initial_data, in one pass.)
We generate a list of partial sums as described in my comment, then divide them by the window width to get averages.
While slidingAverage computes averages for a biased position (the window extends to the right), centeredSlidingAverage computes centered averages, using half the window width to the left and to the right.
import Data.List (splitAt, replicate)

slidingAverage :: Int -> [Int] -> [Double]   -- window size, source list -> list of averages
slidingAverage w xs = map divide $ initial_sum : slidingSum initial_sum xs remaining_data
  where
    divide = \n -> fromIntegral n / fromIntegral w   -- divides the sums by window size
    initial_sum = sum initial_data
    (initial_data, remaining_data) = splitAt w xs

centeredSlidingAverage :: Int -> [Int] -> [Double]   -- window size, source list -> list of averages
centeredSlidingAverage w xs = slidingAverage w $ left_padding ++ xs ++ right_padding
  where
    left_padding  = replicate half_width 0
    right_padding = replicate (w - half_width) 0
    half_width    = w `quot` 2   -- quot is integer division

slidingSum :: Int -> [Int] -> [Int] -> [Int]   -- window_sum before_window after_window -> list of sums
slidingSum _ _ [] = []
slidingSum window_sum before_window after_window = new_sum : slidingSum new_sum new_before new_after
  where
    value_to_go   = head before_window
    new_before    = tail before_window
    value_to_come = head after_window
    new_after     = tail after_window
    new_sum       = window_sum - value_to_go + value_to_come
When I try length $ slidingAverage 10 [1..1000000], it takes less than a second on my MBP. Due to the laziness, centeredSlidingAverage takes about the same time.
One simple way of doing it that also makes a single pass over the input:
movingAverage :: (Fractional a) => Int -> [a] -> [a]
movingAverage n _ | n <= 0 = error "non-positive argument"
movingAverage n xs = fmap average $ groupBy n xs
  where average xs' = sum xs' / fromIntegral (length xs')

groupBy :: Int -> [a] -> [[a]]
groupBy _ [] = []
groupBy n xs = go [] xs
  where
    go _ [] = []
    go l (x : xs') = (x : t) : go (x : l) xs'
      where t = take (n - 1) l
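A quick sanity check of my own (the windows are built most-recent-first and the first few are shorter than n, which is why average divides by the actual window length):

> movingAverage 2 [1, 2, 3, 4]
[1.0,1.5,2.5,3.5]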
Another way is to use a mutable array in the ST monad (here an STArray).
import Data.Array.Unboxed
import Data.Array.ST
import Data.STRef
import Control.Monad
import Control.Monad.ST

movingAverage :: [Double] -> IO [Double]
movingAverage vals = stToIO $ do
  let end = length vals - 1
  myArray <- newArray (1, end) 0 :: ST s (STArray s Int Double)
  forM_ [1 .. end] $ \i -> do
    let cval = vals !! i
    let lval = vals !! (i - 1)
    writeArray myArray i ((cval + lval) / 2)
  getElems myArray
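A quick run of my own in GHCi (note this version only averages consecutive pairs, i.e. a window of two, and the vals !! i lookups make it quadratic in the list length):

> movingAverage [1, 2, 3, 4]
[1.5,2.5,3.5]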

What's the best way to represent a short bit string?

I want to represent a string of up to around 120 bits, and speed is critical. I need to be able to build a bitstring by repeated snoc operations, and then to consume it with repeated uncons operations. One idea is to steal the implementation of Word128 from data-dword and use something like this to build:
empty = 1
snoc xs x = (xs `shiftL` 1) .|. x
But the unconsing seems to get a bit ugly, having to first countLeadingZeros and shift left to eliminate them before being able to read off the elements by shifting and masking the high bits.
Is there some more pleasant way that's at least as fast, or some faster way that's not too much more unpleasant?
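For concreteness, here is a rough sketch (mine, untested, using Word64 rather than Word128 for illustration) of the countLeadingZeros-style uncons the question describes, given empty = 1 and snoc xs x = (xs `shiftL` 1) .|. x:

import Data.Bits
import Data.Word

-- Hypothetical names; the sentinel is the highest set bit, and the head is
-- the bit just below it.
unconsHi :: Word64 -> Maybe (Word64, Word64)
unconsHi 1  = Nothing                          -- only the sentinel left: empty
unconsHi xs = Just (if testBit xs (p - 1) then 1 else 0, rest)
  where
    p    = 63 - countLeadingZeros xs           -- position of the sentinel bit
    rest = (xs .&. (bit (p - 1) - 1))          -- keep the bits below the head
             .|. bit (p - 1)                   -- new sentinel one place lower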
Context
Phil Ruffwind has proposed a version of lens's at for Data.Map, but all implementations thus far are substantially slower than the naive implementation lens currently uses when key comparison is cheap. If I could produce a very cheap representation of the path to an entry while looking it up, and then consume it very efficiently with a specialized version of insert or delete, then maybe I could make this worthwhile.
I am not sure if this qualifies. I fear that I'm re-implementing countLeadingZeros in some form...
Anyway, the idea is to snoc bits from the left, shifting right. Then we can "count" the trailing zeros of x using x-1 and an XOR. The result of the "count" is a mask "00..01..11" which, roughly, is a unary representation of the trailing zeros. We never convert this unary form to binary because we have no need to: with some bit-level work, we can uncons directly.
Untested and unproven code follows.
import Data.Word
import Data.Bits
import Text.Printf

type T = Word64   -- can be adapted to any WordN

-- for pretty printing
pr :: T -> String
pr x = printf "%064b\n" x

empty :: T
empty = shiftL 1 63

snoc :: T -> T -> T
snoc x xs = shiftR xs 1 .|. (shiftL x 63)

-- returns (head, tail)
-- head is not normalized (0 or 1), only (0 or /= 0)
uncons :: T -> (T, T)
uncons xs =
  let -- example
      -- 0101001100000000000 xs
      y = xs `xor` (xs - 1)
      -- 0000000111111111111 y
      z = shiftR y 1 + 1
      -- 0000000100000000000 z
      z' = shiftL z 1
      -- 0000001000000000000 z'
  in (xs .&. z', (xs .&. complement z) .|. z')
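As a quick sanity check (mine, not the answer author's; remember the head comes out non-normalized, so compare against 0):

> let xs = snoc 0 (snoc 1 empty)   -- push 1, then 0
> fst (uncons xs) /= 0             -- first bit out is the 1 pushed first
True
> fst (uncons (snd (uncons xs))) /= 0
False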

Implementing an efficient sliding-window algorithm in Haskell

I needed an efficient sliding window function in Haskell, so I wrote the following:
windows n xz@(x:xs)
  | length v < n = []
  | otherwise    = v : windows n xs
  where
    v = take n xz
My problem with this is that I think the complexity is O(n*m), where m is the length of the list and n is the window size. You walk down the list once for take, again for length, and you do that at essentially m-n positions. It seems like it should be possible to do better than this, but I'm at a loss for how to make it more linear. Any takers?
You can't get better than O(m*n), since this is the size of the output data structure.
But you can avoid checking the lengths of the windows if you reverse the order of operations: First create n shifted lists and then just zip them together. Zipping will get rid of those that don't have enough elements automatically.
import Control.Applicative
import Data.Traversable (sequenceA)
import Data.List (tails)
transpose' :: [[a]] -> [[a]]
transpose' = getZipList . sequenceA . map ZipList
Zipping a list of lists is just a transposition, but unlike transpose from Data.List it throws away outputs that would have less than n elements.
Now it's easy to make the window function: Take m lists, each shifted by 1, and just zip them:
windows :: Int -> [a] -> [[a]]
windows m = transpose' . take m . tails
Works also for infinite lists.
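For example (my own GHCi check, assuming the definitions above):

> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
> take 2 (windows 3 [1..])
[[1,2,3],[2,3,4]]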
You can use Seq from Data.Sequence, which has O(1) enqueue and dequeue at both ends:
import Data.Foldable (toList)
import qualified Data.Sequence as Seq
import Data.Sequence ((|>))
windows :: Int -> [a] -> [[a]]
windows n0 = go 0 Seq.empty
  where
    go n s (a:as) | n' <  n0  = go n' s' as
                  | n' == n0  = toList s' : go n' s' as
                  | otherwise = toList s'' : go n s'' as
      where
        n'  = n + 1         -- O(1)
        s'  = s |> a        -- O(1)
        s'' = Seq.drop 1 s' -- O(1)
    go _ _ [] = []
Note that if you materialize the entire result your algorithm is necessarily O(N*M) since that is the size of your result. Using Seq just improves performance by a constant factor.
Example use:
>>> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
First let's get the windows without worrying about the short ones at the end:
import Data.List (tails)
windows' :: Int -> [a] -> [[a]]
windows' n = map (take n) . tails
> windows' 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5],[4,5],[5],[]]
Now we want to get rid of the short ones without checking the length of every one.
Since we know they are at the end, we could lose them like this:
windows n xs = take (length xs - n + 1) (windows' n xs)
But that's not great since we still go through xs an extra time to get its length. It also doesn't work on infinite lists, which your original solution did.
Instead let's write a function for using one list as a ruler to measure the amount to take from another:
takeLengthOf :: [a] -> [b] -> [b]
takeLengthOf = zipWith (flip const)
> takeLengthOf ["elements", "get", "ignored"] [1..10]
[1,2,3]
Now we can write this:
windows :: Int -> [a] -> [[a]]
windows n xs = takeLengthOf (drop (n-1) xs) (windows' n xs)
> windows 3 [1..5]
[[1,2,3],[2,3,4],[3,4,5]]
Works on infinite lists too:
> take 5 (windows 3 [1..])
[[1,2,3],[2,3,4],[3,4,5],[4,5,6],[5,6,7]]
As Gabriella Gonzalez says, the time complexity is no better if you want to use the whole result. But if you only use some of the windows, we now manage to avoid doing the work of take and length on the ones you don't use.
If you want O(1) length then why not use a structure that provides O(1) length? Assuming you aren't looking for windows from an infinite list, consider using:
import qualified Data.Vector as V
import Data.Vector (Vector)
import Data.List(unfoldr)
windows :: Int -> [a] -> [[a]]
windows n = map V.toList . unfoldr go . V.fromList
  where
    go xs | V.length xs < n = Nothing
          | otherwise =
              let (a, b) = V.splitAt n xs
              in  Just (a, b)
Conversion of each window from a vector to a list might bite you a bit; I won't hazard an optimistic guess there, but I will bet that the performance is better than the list-only version.
For the sliding window I also used unboxed Vectors, since length, take, drop and splitAt are all O(1) operations on them.
The code from Thomas M. DuBuisson produces windows shifted by n rather than a sliding window (unless n = 1); a (++) is therefore needed to rebuild each window, but (++) costs O(n+m), so be careful where you put it.
{-# LANGUAGE BangPatterns #-}
import qualified Data.Vector.Unboxed as V
import Data.Vector.Unboxed (Vector)
import Data.List

windows :: Int -> Vector Double -> [[Double]]
windows n = unfoldr go
  where
    go !xs | V.length xs < n = Nothing
           | otherwise =
               let (a, b) = V.splitAt 1 xs
                   c      = V.toList a ++ V.toList (V.take (n - 1) b)
               in  Just (c, b)
I tried it out with +RTS -sstderr and
putStrLn $ show (sum $ concat $ windows 10 (V.fromList [1..1000000]))
and got a real time of 1.051s with 96.9% productivity, keeping in mind that after the sliding window two further O(m) operations (concat and sum) are performed.

Haskell - How do I get Random Points (Int,Int)

I'm trying to get a set of random points (x,y) for drawing graph nodes to a screen. I need one randomly generated point for each node name passed in.
I found this code on a SO page, and modified it slightly to work for me, but it doesn't really do what I need.
I need a list of random (as random as possible) (Int,Int).
Anyway, here is what I have so far, and of course, it gives the same values every time, so it isn't particularly random :)
rndPoints :: [String] -> [Point]
rndPoints [] = []
rndPoints xs = zip x y
  where
    size = length xs
    x = take size (tail (map fst $ scanl (\(r, gen) _ -> randomR (25 :: Int, 1000 :: Int) gen) (random (mkStdGen 1)) $ repeat ()))
    y = take size (tail (map fst $ scanl (\(r, gen) _ -> randomR (25 :: Int, 775 :: Int) gen) (random (mkStdGen 1)) $ repeat ()))
Any help would be much appreciated.
First, let's clean up your code a bit. There is a plural version of randomR that delivers an infinite list of random values: randomRs. This simplifies things a bit:
rndPoints1 :: [String] -> [Point]
rndPoints1 [] = []
rndPoints1 xs = zip x y
  where
    size = length xs
    x = take size $ randomRs (25, 1000) (mkStdGen 1)
    y = take size $ randomRs (25, 775) (mkStdGen 1)
We can simplify that further, by using zip's property that it stops after the shorter list is exhausted:
rndPoints2 :: [a] -> [Point]
rndPoints2 xs = map snd $ zip xs $ zip x y
  where
    x = randomRs (25, 1000) (mkStdGen 1)
    y = randomRs (25, 775) (mkStdGen 1)
Notice I've also generalized the type of incoming list to just [a]. Since the values are never used, they needn't be Strings!
Now, it gives the same values every time because it uses mkStdGen to create a pseudo-random generator from the same seed (1) each time. If you want it to be different each time, you need to create a generator in IO, based on the random state of the computer. Rather than put the whole computation in IO, it is cleaner to pass in a StdGen:
rndPoints3 :: StdGen -> [Point]
rndPoints3 sg = zip x y
  where
    (sg1, sg2) = split sg
    x = randomRs (25, 1000) sg1
    y = randomRs (25, 775) sg2

pointsForLabels :: [a] -> StdGen -> [(a, Point)]
pointsForLabels xs sg = zip xs $ rndPoints3 sg

example3 :: [a] -> IO [(a, Point)]
example3 xs = newStdGen >>= return . pointsForLabels xs
Here, newStdGen creates a new pseudo-random generator each time, but it is in IO. That is passed eventually to a pure (non-IO) function rndPoints3 that takes the generator, and returns an infinite list of random Points. Within that function, split is used to create two generators from it, and each is used to derive the random list of coordinates.
pointsForLabels now separates out the logic of matching up a new random point with each label. I also changed it to return the more likely useful pairs of labels and Points.
Finally, example3 lives in IO, and creates the generator and passes it all into the otherwise pure code.
I ended up using MonadRandom for this. I think the code was a little clearer and easier for me to understand. You could adapt the following code to address the original question.
import Control.Applicative
import Control.Monad.Random
type Point = (Float, Float)
type Poly = [Point]
randomScalar :: (RandomGen g) => Rand g Float
randomScalar = getRandomR (-500, 500)
randomPoint :: (RandomGen g) => Rand g Point
randomPoint = (,) <$> randomScalar <*> randomScalar
randomPoly :: (RandomGen g) => Int -> Rand g Poly
randomPoly n = sequence (replicate n randomPoint)
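For instance (my own usage sketch; evalRandIO comes from Control.Monad.Random and seeds the computation from the global generator):

main :: IO ()
main = do
  poly <- evalRandIO (randomPoly 3)
  print poly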

Sum over Haskell Map

Is there a standard function to sum all the values in a Haskell Map? My Map looks something like [(a,2),(b,4),(c,6)].
Essentially, what I am trying to do is compute a normalized frequency distribution: the values in the map above are counts for a, b, c, and I need to normalize them to [(a,1/6),(b,1/3),(c,1/2)].
You can simply do Map.foldl' (+) 0 (or M.foldl', if you imported Data.Map as M).
This is just like foldl' (+) 0 . Map.elems, but slightly more efficient. (Don't forget the apostrophe — using foldl or foldr to do sums with the standard numeric types (Int, Integer, Float, Double, etc.) will build up huge thunks, which will use up lots of memory and possibly cause your program to overflow the stack.)
However, only sufficiently recent versions of containers (>= 0.4.2.0) contain Data.Map.foldl', and you shouldn't upgrade it with cabal install, since it comes with GHC. So unless you're on GHC 7.2 or above, foldl' (+) 0 . Map.elems is the best way to accomplish this.
You could also use Data.Foldable.sum, which works on any instance of the Foldable typeclass, but will still build up large thunks on the common numeric types.
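For example (a quick sketch of my own, using containers' Data.Map and base's Data.Foldable):

import qualified Data.Map as Map
import qualified Data.Foldable as F

counts :: Map.Map String Int
counts = Map.fromList [("a", 2), ("b", 4), ("c", 6)]

total :: Int
total = F.sum counts   -- sums the values: 12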
Here's a complete example:
normalize :: (Fractional a) => Map k a -> Map k a
normalize m = Map.map (/ total) m
  where total = foldl' (+) 0 $ Map.elems m
You'll need to import Data.List to use foldl'.
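A quick check of my own in GHCi (assuming import qualified Data.Map as Map, import Data.Map (Map) and import Data.List (foldl')):

> normalize (Map.fromList [("a", 2), ("b", 4), ("c", 6)])
fromList [("a",0.16666666666666666),("b",0.3333333333333333),("c",0.5)]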
let total = foldr (\(_, n) r -> r + n) 0 l
in  map (\(x, y) -> (x, y / total)) l
where l is your map, viewed as a list of (key, value) pairs.
Simple:
import qualified Data.Map as M

sumMap = M.foldl' (+) 0

normalizeMap m =
  let s = sumMap m
  in  M.map (/ s) m

main = do
  let m = M.fromList [("foo", 1), ("bar", 2), ("baz", 6)]
  (print . sumMap) m
  (print . normalizeMap) m
prints:
9.0
fromList [("bar",0.2222222222222222),("baz",0.6666666666666666),("foo",0.1111111111111111)]
