Using a custom generator vs Arbitrary instance in QuickCheck

Using a custom generator vs Arbitrary instance in QuickCheck - haskell

Here's a simple function. It takes an input Int and returns a (possibly empty) list of (Int, Int) pairs, where the input Int is the sum of the cubed elements of any of the pairs.
cubeDecomposition :: Int -> [(Int, Int)]
cubeDecomposition n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
where m = truncate $ fromIntegral n ** (1/3)
-- cubeDecomposition 1729
-- [(1,12),(9,10)]
I want to test the property that the above is true; if I cube each element and sum any of the return tuples, then I get my input back:
import Control.Arrow
cubedElementsSumToN :: Int -> Bool
cubedElementsSumToN n = all (== n) d
where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecomposition n)
For runtime considerations, I'd like to limit the input Ints to a certain size when testing this with QuickCheck. I can define an appropriate type and Arbitrary instance:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Test.QuickCheck
newtype SmallInt = SmallInt Int
deriving (Show, Eq, Enum, Ord, Num, Real, Integral)
instance Arbitrary SmallInt where
arbitrary = fmap SmallInt (choose (-10000000, 10000000))
And then I guess I have to define versions of the function and property that use SmallInt rather than Int:
cubeDecompositionQC :: SmallInt -> [(SmallInt, SmallInt)]
cubeDecompositionQC n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
where m = truncate $ fromIntegral n ** (1/3)
cubedElementsSumToN' :: SmallInt -> Bool
cubedElementsSumToN' n = all (== n) d
where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecompositionQC n)
-- cubeDecompositionQC 1729
-- [(SmallInt 1,SmallInt 12),(SmallInt 9,SmallInt 10)]
This works fine, and the standard 100 tests pass as expected. But it seems unnecessary to define a new type, instance, and function when all I really need is a custom generator. So I tried this:
smallInts :: Gen Int
smallInts = choose (-10000000, 10000000)
cubedElementsSumToN'' :: Int -> Property
cubedElementsSumToN'' n = forAll smallInts $ \m -> all (== n) (d m)
where d = map (uncurry (+) . ((^3) *** (^3)))
. cubeDecomposition
Now, the first few times I ran this, everything worked fine, and all tests pass. But on subsequent runs I observed failures. Bumping up the test size reliably finds one:
*** Failed! Falsifiable (after 674 tests and 1 shrink):
0
8205379
I'm a bit confused here due to the presence of two shrunken inputs - 0 and 8205379 - returned from QuickCheck, where I would intuitively expect one. Also, those inputs work as predicted (on my show-able property, at least):
*Main> cubedElementsSumToN 0
True
*Main> cubedElementsSumToN 8205379
True
So it seems like obviously there's a problem in the property that uses the custom Gen I defined.
What have I done wrong?

I quickly realized that the property as I've written it is obviously incorrect. Here's the proper way to do it, using the original cubedElementsSumToN property:
quickCheck (forAll smallInts cubedElementsSumToN)
which reads quite naturally.

Related

Haskell: Difference between "(Int a, Bool b) => a -> b" and Int -> Bool [duplicate]

I wrote my first program in Haskell today. It compiles and runs successfully. And since it is not a typical "Hello World" program, it in fact does much more than that, so please congrats me :D
Anyway, I've few doubts regarding my code, and the syntax in Haskell.
Problem:
My program reads an integer N from the standard input and then, for each integer i in the range [1,N], it prints whether i is a prime number or not. Currently it doesn't check for input error. :-)
Solution: (also doubts/questions)
To solve the problem, I wrote this function to test primality of an integer:
is_prime :: Integer -> Bool
is_prime n = helper n 2
where
helper :: Integer -> Integer -> Bool
helper n i
| n < 2 * i = True
| mod n i > 0 = helper n (i+1)
| otherwise = False
It works great. But my doubt is that the first line is a result of many hit-and-trials, as what I read in this tutorial didn't work, and gave this error (I suppose this is an error, though it doesn't say so):
prime.hs:9:13:
Type constructor `Integer' used as a class
In the type signature for `is_prime':
is_prime :: Integer a => a -> Bool
According to the tutorial (which is a nicely-written tutorial, by the way), the first line should be: (the tutorial says (Integral a) => a -> String, so I thought (Integer a) => a -> Bool should work as well.)
is_prime :: (Integer a) => a -> Bool
which doesn't work, and gives the above posted error (?).
And why does it not work? What is the difference between this line (which doesn't work) and the line (which works)?
Also, what is the idiomatic way to loop through 1 to N? I'm not completely satisfied with the loop in my code. Please suggest improvements. Here is my code:
--read_int function
read_int :: IO Integer
read_int = do
line <- getLine
readIO line
--is_prime function
is_prime :: Integer -> Bool
is_prime n = helper n 2
where
helper :: Integer -> Integer -> Bool
helper n i
| n < 2 * i = True
| mod n i > 0 = helper n (i+1)
| otherwise = False
main = do
n <- read_int
dump 1 n
where
dump i x = do
putStrLn ( show (i) ++ " is a prime? " ++ show (is_prime i) )
if i >= x
then putStrLn ("")
else do
dump (i+1) x

You are misreading the tutorial. It would say the type signature should be
is_prime :: (Integral a) => a -> Bool
-- NOT Integer a
These are different types:
Integer -> Bool
This is a function that takes a value of type Integer and gives back a value of type Bool.
Integral a => a -> Bool
This is a function that takes a value of type a and gives back a value of type Bool.
What is a? It can be any type of the caller's choice that implements the Integral type class, such as Integer or Int.
(And the difference between Int and Integer? The latter can represent an integer of any magnitude, the former wraps around eventually, similar to ints in C/Java/etc.)
The idiomatic way to loop depends on what your loop does: it will either be a map, a fold, or a filter.
Your loop in main is a map, and because you're doing i/o in your loop, you need to use mapM_.
let dump i = putStrLn ( show (i) ++ " is a prime? " ++ show (is_prime i) )
in mapM_ dump [1..n]
Meanwhile, your loop in is_prime is a fold (specifically all in this case):
is_prime :: Integer -> Bool
is_prime n = all nondivisor [2 .. n `div` 2]
where
nondivisor :: Integer -> Bool
nondivisor i = mod n i > 0
(And on a minor point of style, it's conventional in Haskell to use names like isPrime instead of names like is_prime.)

Part 1: If you look at the tutorial again, you'll notice that it actually gives type signatures in the following forms:
isPrime :: Integer -> Bool
-- or
isPrime :: Integral a => a -> Bool
isPrime :: (Integral a) => a -> Bool -- equivalent
Here, Integer is the name of a concrete type (has an actual representation) and Integral is the name of a class of types. The Integer type is a member of the Integral class.
The constraint Integral a means that whatever type a happens to be, a has to be a member of the Integral class.
Part 2: There are plenty of ways to write such a function. Your recursive definition looks fine (although you might want to use n < i * i instead of n < 2 * i, since it's faster).
If you're learning Haskell, you'll probably want to try writing it using higher-order functions or list comprehensions. Something like:
module Main (main) where
import Control.Monad (forM_)
isPrime :: Integer -> Bool
isPrime n = all (\i -> (n `rem` i) /= 0) $ takeWhile (\i -> i^2 <= n) [2..]
main :: IO ()
main = do n <- readLn
forM_ [1..n] $ \i ->
putStrLn (show (i) ++ " is a prime? " ++ show (isPrime i))

It is Integral a, not Integer a. See http://www.haskell.org/haskellwiki/Converting_numbers.
map and friends is how you loop in Haskell. This is how I would re-write the loop:
main :: IO ()
main = do
n <- read_int
mapM_ tell_prime [1..n]
where tell_prime i = putStrLn (show i ++ " is a prime? " ++ show (is_prime i))

Haskell: RandomGen drops half of values

I am writing a simple deterministic Random Number generator, based on the xorshift. The goal here is not to get a cryptographically secure or statistically perfect (pseudo-)random number generator, but to be able to archieve the same deterministic sequence of semi-random numbers across programming languages.
My Haskell program looks like follows:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module SimpleRNG where
import Data.Word (Word32)
import Data.Bits (xor, shift)
import System.Random (RandomGen(..))
import Control.Arrow
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
newtype SeedState = SeedState Word32
deriving (Eq, Show, Enum, Bounded)
seed :: Integral a => a -> SeedState
seed = SeedState . fromIntegral
rand_r :: SeedState -> (Word32, SeedState)
rand_r (SeedState num) = (res, SeedState res)
where
res = num
|> xorshift 13
|> xorshift (-17)
|> xorshift 5
xorshift :: Int -> Word32 -> Word32
xorshift amount x = x `xor` (shift x amount)
instance RandomGen SeedState where
next seed_state = (first fromIntegral) $ rand_r seed_state
where
genRange seed_state = (fromEnum (minBound `asTypeOf` seed_state),
fromEnum (maxBound `asTypeOf` seed_state))
split seed_state#(SeedState num) = (seed_state', inverted_seed_state')
where
(_, seed_state') = next seed_state
(_, inverted_seed_state') = next inverted_seed_state
inverted_seed_state = SeedState (maxBound - num)
Now, for some reason, when running
take 10 $ System.Random.randoms (seed 42) :: [Word32]
it returns only the 'odd' results, compared to the output of the following Python program:
class SeedState(object):
def __init__(self, seed = 42):
self.data = seed
def rand_r(rng_state):
num = rng_state.data
num ^= (num << 13) % (2 ** 32)
num ^= (num >> 17) % (2 ** 32)
num ^= (num << 5) % (2 ** 32)
rng_state.data = num
return num
__global_rng_state = SeedState(42)
def rand():
global __global_rng_state
return rand_r(__global_rng_state)
def seed(seed):
global __global_rng_state
__global_rng_state = SeedState(seed)
if __name__ == '__main__':
for x in range(0, 10):
print(rand())
It seems like the internals of the System.Random module do some weird trickery with the return result of the generator
(calling
map fst $ take 10 $ iterate (\(_, rng) -> rand_r rng) (rand_r $ seed 42)
gives the result I'd expect).
This is odd, since the type returned by the generator is already a Word32, so it could/should just be passed on unaltered without any remapping happening.
What is going on here, and is there a way to plug this xorshift-generator into System.Random in a way that returns the same results?

This is to do with the behaviour of System.Random.randoms, which repeatedly applies random to a RandomGen, not next.
class Random a where
...
random :: (RandomGen g) => g -> (a, g)
The Random class is what allows you to reuse RandomGen instances across different enums, and the instance for Word32 (as well as nearly all other types) is defined as
instance Random Word32 where randomR = randomIvalIntegral; random = randomBounded
randomBounded just calls randomR, so the behaviour of random is decided by `
randomIvalIntegral (l,h) = randomIvalInteger (toInteger l, toInteger h)
randomIvalInteger is an interesting function, you can read the source here. It's actually causing your problem because the function will discard a certain number of intermediate values based on the range of the generator and the range being generated over.
To get the values you want, you just need to use next instead - the easiest way would just be to define
randoms' g = x : (randoms' g') where (x, g') = next g

How do I memoize?

I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
| otherwise = n * 3 + 1
recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
then f (collatz x)
else x
-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
then f =<< mocollatz x
else return x
-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1
collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])
longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
where
order :: Ord b => (a, b) -> (a, b) -> Ordering
order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
## -1,6 +1,7 ##
module Main where
import Data.Function
+import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
size :: (Integral a) => a
## -22,10 +23,15 ## recollatz = fix $ \f x -> if (x /= 1)
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
- then f =<< mocollatz x
- else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+ Nothing -> mutate x
+ Just y -> y
+ where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Trivialities.
-- Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one -- just about one and a half million times. This must be the reason of the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if it's true and why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements to the achieving of automagical memoization highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
Nothing -> mutate x
Just y -> y
mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x
table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Running in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
{- ... -}
The table is therefore computed just once for the entire program run.
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
| odd val = 3 * val + 1
| otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
where
firstInRange = head rest
stepsToFirstInRange = 1 + (length biggerThanLimit)
(biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
where
collatzFn :: Int -> Int
collatzFn index
| index <= 1 = 1
| otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
where
(nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
-- Use a fold through the vector to find the longest Collatz sequence under
-- the limit, and keep track of both the maximum length and the initial
-- value of the sequence, which is the index.
let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
accMaxLen acc#(accMaxLen, accMaxIndex) index currLen
| currLen <= accMaxLen = acc
| otherwise = (currLen, index)
putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
++ show maxLength ++ " at index " ++ show maxIndex

Arbitrary String generator in Haskell (Test.QuickCheck.Gen)

I am struggling on Real World Haskell Chapter 11 quickCheck generator implementation for a an algebraic data type.
Following the book implementation (which was published in 2008), I came up with the following:
-- file: ch11/Prettify2.hs
module Prettify2(
Doc(..)
) where
data Doc = Empty
| Char Char
| Text String
| Line
| Concat Doc Doc
| Union Doc Doc
deriving (Show, Eq)
And my Arbitrary implementation:
-- file: ch11/Arbitrary.hs
import System.Random
import Test.QuickCheck.Gen
import qualified Test.QuickCheck.Arbitrary
class Arbitrary a where
arbitrary :: Gen a
-- elements' :: [a] => Gen a {- Expected a constraint, but ‘[a]’ has kind ‘*’ -}
-- choose' :: Random a => (a, a) -> Gen a
-- oneof' :: [Gen a] -> a
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance (Arbitrary a, Arbitrary b) => Arbitrary (a, b) where
arbitrary = do
x <- arbitrary
y <- arbitrary
return (x, y)
instance Arbitrary Char where
arbitrary = elements (['A'..'Z'] ++ ['a' .. 'z'] ++ " ~!##$%^&*()")
I tried the two following implementation with no success:
import Prettify2
import Control.Monad( liftM, liftM2 )
instance Arbitrary Doc where
arbitrary = do
n <- choose (1,6) :: Gen Int
case n of
1 -> return Empty
2 -> do x <- arbitrary
return (Char x)
3 -> do x <- arbitrary
return (Text x)
4 -> return Line
5 -> do x <- arbitrary
y <- arbitrary
return (Concat x y)
6 -> do x <- arbitrary
y <- arbitrary
return (Union x y)
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]
But it doesn't compile since No instance for (Arbitrary String)
I tried then to implement the instance for Arbitrary String in the following ways:
import qualified Test.QuickCheck.Arbitrary but it does not implement Arbitrary String neither
installing Test.RandomStrings hackage link
instance Arbitrary String where
arbitrary = do
n <- choose (8, 16) :: Gen Int
return $ randomWord randomASCII n :: Gen String
With the following backtrace:
$ ghci
GHCi, version 7.10.3: http://www.haskell.org/ghc/ :? for help
Prelude> :l Arbitrary.hs
[1 of 2] Compiling Prettify2 ( Prettify2.hs, interpreted )
[2 of 2] Compiling Main ( Arbitrary.hs, interpreted )
Arbitrary.hs:76:9:
The last statement in a 'do' block must be an expression
return <- randomWord randomASCII n :: Gen String
Failed, modules loaded: Prettify2
Would you have any good suggestion about how to implement this particular generator and - more in general - how to proceed in these cases?
Thank you in advance

Don't define a new Arbitrary type class, import Test.QuickCheck instead. It defines most of these instances for you. Also be careful about the version of quickcheck, RWH assumes version 1.
The resulting full implementation will be:
-- file: ch11/Arbitrary.hs
import Test.QuickCheck
import Prettify2
import Control.Monad( liftM, liftM2 )
data Ternary = Yes
| No
| Unknown
deriving(Eq, Show)
instance Arbitrary Ternary where
arbitrary = do
n <- choose (0, 2) :: Gen Int
return $ case n of
0 -> Yes
1 -> No
_ -> Unknown
instance Arbitrary Doc where
arbitrary =
oneof [ return Empty
, liftM Char arbitrary
, liftM Text arbitrary
, return Line
, liftM2 Concat arbitrary arbitrary
, liftM2 Union arbitrary arbitrary ]

Intersperse values into separate Vectors using generate

I am trying to generate a tuple of Vectors by using a function that creates a custom data type (or a tuple) of values from an index. Here is an approach that achieves the desired result:
import Prelude hiding (map, unzip)
import Data.Vector hiding (map)
import Data.Array.Repa
import Data.Functor.Identity
data Foo = Foo {fooX :: Int, fooY :: Int}
unfoo :: Foo -> (Int, Int)
unfoo (Foo x y) = (x, y)
make :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make n f = unzip $ generate n getElt where
getElt i = unfoo $ f i
Except that I would like to do it in a single iteration per Vector, almost like it is shown below, but avoiding multiple evaluation of function f:
make' :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make' n f = (generate n getElt1, generate n getElt2) where
getElt1 i = fooX $ f i
getElt2 i = fooY $ f i
Just as a note, I understand that Vector library supports fusion, and the first example is already pretty efficient. I need a solution to generate concept, other libraries have very similar constructors (Repa has fromFunction for example), and I am using Vectors here simply to demonstrate a problem.
Maybe some sort of memoizing of f function call would work, but I cannot think of anything.
Edit:
Another demonstration of the problem using Repa:
makeR :: Int -> (Int -> Foo) -> (Array U DIM1 Int, Array U DIM1 Int)
makeR n f = runIdentity $ do
let arr = fromFunction (Z :. n) (\ (Z :. i) -> unfoo $ f i)
arr1 <- computeP $ map fst arr
arr2 <- computeP $ map snd arr
return (arr1, arr2)
Same as with vectors, fusion saves the day on performance, but an intermediate array arr of tuples is still required, which I am trying to avoid.
Edit 2: (3 years later)
In the Repa example above it will not create an intermediate array, since fromFunction creates a delayed array. Instead it will be even worse, it will evaluate f twice for each index, one for the first array, second time for the second array. Delayed array must be computed in order to avoid such duplication of work.

Looking back at my own question from a few years ago I can now easily show what I was trying to do back than and how to get it done.
In short, it can't be done purely, therefore we need to resort to ST monad and manual mutation of two vectors, but in the end we do get this nice and pure function that creates only two vectors and does not rely on fusion.
import Control.Monad.ST
import Data.Vector.Primitive
import Data.Vector.Primitive.Mutable
data Foo = Foo {fooX :: Int, fooY :: Int}
make :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make n f = runST $ do
let n' = max 0 n
mv1 <- new n'
mv2 <- new n'
let fillVectors i
| i < n' = let Foo x y = f i
in write mv1 i x >> write mv2 i y >> fillVectors (i + 1)
| otherwise = return ()
fillVectors 0
v1 <- unsafeFreeze mv1
v2 <- unsafeFreeze mv2
return (v1, v2)
And the we use it in a similar fashion it is done with generate:
λ> make 10 (\ i -> Foo (i + i) (i * i))
([0,2,4,6,8,10,12,14,16,18],[0,1,4,9,16,25,36,49,64,81])

The essential thing you're trying to write is
splat f = unzip . fmap f
which shares the results of evaluating f between the two result vectors, but you want to avoid the intermediate vector. Unfortunately, I'm pretty sure you can't have it both ways in any meaningful sense. Consider a vector of length 1 for simplicity. In order for the result vectors to share the result of f (v ! 0), each will need a reference to a thunk representing that result. Well, that thunk has to be somewhere, and it really might as well be in a vector.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string