Warning on specialisations when compiling Haskell Code with ghc

Warning on specialisations when compiling Haskell Code with ghc - haskell

I get the following error when trying to compile
$ ghc --make -O2 -Wall -fforce-recomp
[1 of 1] Compiling Main (
isPrimeSmart.hs, isPrimeSmart.o )
SpecConstr
Function `$wa{v s2we} [lid]'
has two call patterns, but the limit is 1
Use -fspec-constr-count=n to set the bound
Use -dppr-debug to see specialisations Linking isPrimeSmart
...
My code is:
{-# OPTIONS_GHC -O2 -optc-O2 #-}
import qualified Data.ByteString.Lazy.Char8 as StrL -- StrL is STRing Library
import Data.List
-- read in a file. First line tells how many cases. Each case is on a separate
-- line with the lower an upper bounds separated by a space. Print all primes
-- between the lower and upper bound. Separate results for each case with
-- a blank line.
main :: IO ()
main = do
let factors = takeWhile (<= (ceiling $ sqrt (1000000000::Double))) allPrimes
(l:ls) <- StrL.lines `fmap` StrL.getContents
let numCases = readInt l
let cases = (take numCases ls)
sequence_ $ intersperse (putStrLn "") $ map (doLine factors) cases
-- get and print all primes between the integers specified on a line.
doLine :: [Integer] -> StrL.ByteString -> IO ()
doLine factors l = mapM_ print $ primesForLine factors l
---------------------- pure code below this line ------------------------------
-- get all primes between the integers specified on a line.
primesForLine :: [Integer] -> StrL.ByteString -> [Integer]
primesForLine factors l = getPrimes factors range
where
range = rangeForLine l
-- Generate a list of numbers to check, store it in list, and then check them...
getPrimes :: [Integer] -> (Integer, Integer) -> [Integer]
getPrimes factors range = filter (isPrime factors) (getCandidates range)
-- generate list of candidate values based on upper and lower bound
getCandidates :: (Integer, Integer) -> [Integer]
getCandidates (propStart, propEnd) = list
where
list = if propStart < 3
then 2 : oddList
else oddList
oddList = [listStart, listStart + 2 .. propEnd]
listStart = if cleanStart `rem` 2 == 0
then cleanStart + 1
else cleanStart
cleanStart = if propStart < 3
then 3
else propStart
-- A line always has the lower and upper bound separated by a space.
rangeForLine :: StrL.ByteString -> (Integer, Integer)
rangeForLine caseLine = start `seq` end `seq` (start, end)
where
[start, end] = (map readInteger $ StrL.words caseLine)::[Integer]
-- read an Integer from a ByteString
readInteger :: StrL.ByteString -> Integer
readInteger x =
case StrL.readInteger x of Just (i,_) -> i
Nothing -> error "Unparsable Integer"
-- read an Int from a ByteString
readInt :: StrL.ByteString -> Int
readInt x =
case StrL.readInt x of Just (i,_) -> i
Nothing -> error "Unparsable Int"
-- generates all primes in a lazy way.
allPrimes :: [Integer]
allPrimes = ps (2:[3,5 .. ])
where
ps (np:candidates) = -- np stands for New Prime
np : ps (filter (\n -> n `rem` np /= 0) candidates)
ps [] = error "this can't happen but is shuts up the compiler"
-- Check to see if it is a prime by comparing against the factors.
isPrime :: [Integer] -> Integer -> Bool
isPrime factors val = all (\f -> val `rem` f /= 0) validFactors
where
validFactors = takeWhile (< ceil) factors
ceil = ((ceiling $ sqrt $ ((fromInteger val)::Double))) :: Integer
I have no idea how to fix this warning. How do I start? Do I compile to assembly and match the error up? What does the warning even mean?

These are just (annoying) warnings, indicating that GHC could do further specializations to your code if you really want to. Future versions of GHC will likely not emit this data by default, since there's nothing you can do about it anyway.
They are harmless, and are not errors. Don't worry about them.
To directly address the problem, you can use -w (suppress warnings) instead of -Wall.
E.g. in a file {-# OPTIONS_GHC -w #-} will disable warnings.
Alternately, increasing the specialization threshold will make the warning go away, e.g. -fspec-constr-count=16

Related

Why is recursion slower than filter with this Random number counter

I am writing a function that generates a million random numbers of 1 or 0 and then counts how many 0s were generated.
import System.Random
import Control.Monad
countZeros :: Int -> IO Int
countZeros n = (length . filter (==0)) <$> (replicateM n $ randomRIO (0,1 :: Int))
countZeros' :: Int -> IO Int
countZeros' n = go n 0
where
go :: Int -> Int -> IO Int
go x acc = do
r <- randomRIO (0,1 :: Int)
case x of
0 -> pure acc
_ -> let acc' = if r == 0 then succ acc else acc
in go (pred x) acc'
when I run the functions with an input of 1000000
>λ= countZeros 1000000
499716
(0.93 secs, 789,015,080 bytes)
>λ= countZeros' 1000000
500442
(2.02 secs, 1,109,569,560 bytes)
I don't understand why the prime function is twice as slow as the other. I assumed that they are essentially doing the same thing behind the scenes.
I am using GHCi.
What am I missing?

With bang patterns, and proper compilation with -O2, the "prime" function is faster:
{-# LANGUAGE BangPatterns #-}
module Main where
import System.Random
import Control.Monad
import System.Environment
countZeros :: Int -> IO Int
countZeros n = (length . filter (==0)) <$> (replicateM n $ randomRIO (0,1 :: Int))
countZeros' :: Int -> IO Int
countZeros' n = go n 0
where
go :: Int -> Int -> IO Int
go !x !acc = do
r <- randomRIO (0,1 :: Int)
case x of
0 -> pure acc
_ -> let acc' = if r == 0 then succ acc else acc
in go (pred x) acc'
main :: IO ()
main = do
[what] <- getArgs
let n = 1000 * 1000 * 10
fun = case what of
"1" -> countZeros
"2" -> countZeros'
_ -> error "arg not a number"
putStrLn "----"
print =<< fun n
putStrLn "----"
Compiled with
$ stack ghc -- RandomPerf.hs -O2 -Wall
$ stack ghc -- --version
The Glorious Glasgow Haskell Compilation System, version 8.6.3
Tests:
$ time ./RandomPerf.exe 1
----
4999482
----
real 0m3.329s
user 0m0.000s
sys 0m0.031s
$ time ./RandomPerf.exe 2
----
5001089
----
real 0m2.338s
user 0m0.000s
sys 0m0.046s
Repeating the tests gives comparable results, so this is not a fluke.
Result: the countZeros' function is significantly faster.
Using Criterion and running a proper benchmark is left as an exercise.
You probably used GHCi to assess performance, which prevents the optimizer to do its job. GHCi sacrifices proper optimization to load files faster, and be more usable in an interactive way.

These actually work in different ways from each other, at a level that matters. And both are slow.
The version using replicateM is bad because replicateM in IO can't stream its results. The entire list will be constructed at once, before filter and length get to start operating on it. The reason it's faster is that length is strict in its accumulator, so it doesn't generate a massive nested chain of thinks the way your other version does. And that's even worse for performance.
The recursive version doesn't use a strict accumulator. This means that the value it returns is a giant chain of nested thunks, holding on to all the generated entries and a bunch of indirect calls via list indexing. This is even more memory used than the filter version, because it's holding on to a bunch of closures as well as all the values. But even with that fixed, it would still be slow. Using !! just wrecks performance. It's recursive when a simple if would do the same job much more efficiently.

How do I memoize?

I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
| otherwise = n * 3 + 1
recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
then f (collatz x)
else x
-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
then f =<< mocollatz x
else return x
-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1
collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])
longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
where
order :: Ord b => (a, b) -> (a, b) -> Ordering
order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
## -1,6 +1,7 ##
module Main where
import Data.Function
+import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
size :: (Integral a) => a
## -22,10 +23,15 ## recollatz = fix $ \f x -> if (x /= 1)
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
- then f =<< mocollatz x
- else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+ Nothing -> mutate x
+ Just y -> y
+ where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Trivialities.
-- Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one -- just about one and a half million times. This must be the reason of the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if it's true and why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements to the achieving of automagical memoization highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
Nothing -> mutate x
Just y -> y
mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x
table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Running in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.

Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
{- ... -}
The table is therefore computed just once for the entire program run.
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.

Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
| odd val = 3 * val + 1
| otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
where
firstInRange = head rest
stepsToFirstInRange = 1 + (length biggerThanLimit)
(biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
where
collatzFn :: Int -> Int
collatzFn index
| index <= 1 = 1
| otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
where
(nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
-- Use a fold through the vector to find the longest Collatz sequence under
-- the limit, and keep track of both the maximum length and the initial
-- value of the sequence, which is the index.
let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
accMaxLen acc#(accMaxLen, accMaxIndex) index currLen
| currLen <= accMaxLen = acc
| otherwise = (currLen, index)
putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
++ show maxLength ++ " at index " ++ show maxIndex

Working with Data.Map.StrictMap.Maps using Control.Parallel

I have the following code. The M prefix designates functions from Data.Map.Strict, and Table is a type alias for Data.Map.Strict.Map Mapping Bool, where Mapping is an arbitrary opaque structure.
computeCoverage :: Table -> Expr -> Maybe Coverage
computeCoverage t e = go t True M.empty
where go src flag targ
| null src = if flag
then Nothing
else Just (M.size t, targ)
| otherwise = let ((m, b), rest) = M.deleteFindMin src
result = interpret e m
flag' = result && flag in
go rest flag' (if b == result then targ else M.insert m b targ)
I would like to be able to use Control.Parallel to perform this with as much parallelism as possible. However, I'm not sure how to do this. Based on reading Data.Map.Strict, it seems what you're supposed to do is call splitRoot, then do whatever parallel stuff you want on the resulting list, then recombine (I guess?). Have I basically got the right idea? If not, what should I do instead to parallelize the code above?

Here's a contrived example. You just use parMap over M.splitRoot m:
import qualified Data.Map.Strict as M
import Control.Parallel.Strategies
import System.Environment
fib 0 = 0
fib 1 = 1
fib n = fib (n-2) + fib (n-1)
theMap :: Int -> M.Map Int Int
theMap n = M.fromList [ (x, 33 + mod x 3) | x <- [1..n] ]
isInteresting n = mod (fib n) 2 == 0
countInteresting :: M.Map Int Int -> Int
countInteresting m = length $ filter isInteresting (M.elems m)
doit :: Int -> [Int]
doit n = parMap rseq countInteresting (M.splitRoot $ theMap n)
main :: IO ()
main = do
( arg1 : _) <- getArgs
let n = read arg1
print $ doit n
Note, however these caveats:
the splits may not be of equal size
use splitRoot if working with a Map is helpful for your computation; this particular example doesn't benefit from the Map structure of root - it could have just parMapped over the elements.

Haskell: Parallel code is slower than sequential version

I am pretty new to Haskell threads (and parallel programming in general) and I am not sure why my parallel version of an algorithm runs slower than the corresponding sequential version.
The algorithm tries to find all k-combinations without using recursion. For this, I am using this helper function, which given a number with k bits set, returns the next number with the same number of bits set:
import Data.Bits
nextKBitNumber :: Integer -> Integer
nextKBitNumber n
| n == 0 = 0
| otherwise = ripple .|. ones
where smallest = n .&. (-n)
ripple = n + smallest
newSmallest = ripple .&. (-ripple)
ones = (newSmallest `div` smallest) `shiftR` 1 - 1
It is now easy to obtain sequentially all k-combinations in the range [(2^k - 1), (2^(n-k)+...+ 2^(n-1)):
import qualified Data.Stream as ST
combs :: Int -> Int -> [Integer]
combs n k = ST.takeWhile (<= end) $ kBitNumbers start
where start = 2^k - 1
end = sum $ fmap (2^) [n-k..n-1]
kBitNumbers :: Integer -> ST.Stream Integer
kBitNumbers = ST.iterate nextKBitNumber
main :: IO ()
main = do
params <- getArgs
let n = read $ params !! 0
k = read $ params !! 1
print $ length (combs n k)
My idea is that this should be easily parallelizable splitting this range into smaller parts. For example:
start :: Int -> Integer
start k = 2 ^ k - 1
end :: Int -> Int -> Integer
end n k = sum $ fmap (2 ^) [n-k..n-1]
splits :: Int -> Int -> Int -> [(Integer, Integer, Int)]
splits n k numSplits = fixedRanges ranges []
where s = start k
e = end n k
step = (e-s) `div` (min (e-s) (toInteger numSplits))
initSplits = [s,s+step..e]
ranges = zip initSplits (tail initSplits)
fixedRanges [] acc = acc
fixedRanges [x] acc = acc ++ [(fst x, e, k)]
fixedRanges (x:xs) acc = fixedRanges xs (acc ++ [(fst x, snd x, k)])
At this point, I would like to run each split in parallel, something like:
runSplit :: (Integer, Integer, Int) -> [Integer]
runSplit (start, end, k) = ST.takeWhile (<= end) $ kBitNumbers (fixStart start)
where fixStart s
| popCount s == k = s
| otherwise = fixStart $ s + 1
For pallalelization I am using the monad-par package:
import Control.Monad.Par
import System.Environment
import qualified Data.Set as S
main :: IO ()
main = do
params <- getArgs
let n = read $ params !! 0
k = read $ params !! 1
numTasks = read $ params !! 2
batches = runPar $ parMap runSplit (splits n k numTasks)
reducedNumbers = foldl S.union S.empty $ fmap S.fromList batches
print $ S.size reducedNumbers
The result is that the sequential version is way faster and it uses little memory, while the parallel version consumes a lot of memory and it is noticeable slower.
What might be the reasons causing this? Are threads a good approach for this problem? For example, every thread generates a (potentially large) list of integers and the main thread reduces the results; are threads expected to need much memory or are simply meant to produce simple results (i.e. only cpu-intensive computations)?
I compile my program with stack build --ghc-options -threaded --ghc-options -rtsopts --executable-profiling --library-profiling and run it with ./.stack-work/install/x86_64-osx/lts-6.1/7.10.3/bin/combinatorics 20 3 4 +RTS -pa -N4 -RTS for n=20, k=3 and numSplits=4. An example of the profiling report for the parallel version can be found here and for the sequential version here.

In your sequential version calling combs does not build up a list in memory since after length consumes an element it isn't needed anymore and is freed. Indeed, GHC may not even allocate storage for it.
For instance, this will take a while but won't consume a lot of memory:
main = print $ length [1..1000000000] -- 1 billion
In your parallel version you are generating sub-lists, concatenating them together, building Sets, etc. and therefore the results of each sub-task have to be kept in memory.
A fairer comparison would be to have each parallel task compute the length of the k-bit numbers in its assigned range, and then add up the results. That way the k-bit numbers found by each parallel task wouldn't have to be kept in memory and would operate more like the sequential version.
Update
Here is an example of how to use parMap. Note: under 7.10.2 I've had mixed success getting the parallelism to fire - sometimes it does and sometimes it doesn't. (Figured it out - I was using -RTS -N2 instead of +RTS -N2.)
{-
compile with: ghc -O2 -threaded -rtsopts foo.hs
compare:
time ./foo 26 +RTS -N1
time ./foo 26 +RTS -N2
-}
import Data.Bits
import Control.Parallel.Strategies
import System.Environment
nextKBitNumber :: Integer -> Integer
nextKBitNumber n
| n == 0 = 0
| otherwise = ripple .|. ones
where smallest = n .&. (-n)
ripple = n + smallest
newSmallest = ripple .&. (-ripple)
ones = (newSmallest `div` smallest) `shiftR` 1 - 1
combs :: Int -> Int -> [Integer]
combs n k = takeWhile (<= end) $ iterate nextKBitNumber start
where start = 2^k - 1
end = shift start (n-k)
main :: IO ()
main = do
( arg1 : _) <- getArgs
let n = read arg1
print $ parMap rseq (length . combs n) [1..n]

good approaches for this problem
What do you mean by this problem? If it's how to write, analyze and tune a parallel Haskell program, then this is required background reading:
Simon Marlow: Parallel and Concurrent Programming in Haskell
http://community.haskell.org/~simonmar/pcph/
in particular, Section 15 (Debugging, Tuning, ..)
Use threadscope! (a graphical viewer for thread profile information generated by the Glasgow Haskell compiler) https://hackage.haskell.org/package/threadscope

Using a custom generator vs Arbitrary instance in QuickCheck

Here's a simple function. It takes an input Int and returns a (possibly empty) list of (Int, Int) pairs, where the input Int is the sum of the cubed elements of any of the pairs.
cubeDecomposition :: Int -> [(Int, Int)]
cubeDecomposition n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
where m = truncate $ fromIntegral n ** (1/3)
-- cubeDecomposition 1729
-- [(1,12),(9,10)]
I want to test the property that the above is true; if I cube each element and sum any of the return tuples, then I get my input back:
import Control.Arrow
cubedElementsSumToN :: Int -> Bool
cubedElementsSumToN n = all (== n) d
where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecomposition n)
For runtime considerations, I'd like to limit the input Ints to a certain size when testing this with QuickCheck. I can define an appropriate type and Arbitrary instance:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Test.QuickCheck
newtype SmallInt = SmallInt Int
deriving (Show, Eq, Enum, Ord, Num, Real, Integral)
instance Arbitrary SmallInt where
arbitrary = fmap SmallInt (choose (-10000000, 10000000))
And then I guess I have to define versions of the function and property that use SmallInt rather than Int:
cubeDecompositionQC :: SmallInt -> [(SmallInt, SmallInt)]
cubeDecompositionQC n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
where m = truncate $ fromIntegral n ** (1/3)
cubedElementsSumToN' :: SmallInt -> Bool
cubedElementsSumToN' n = all (== n) d
where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecompositionQC n)
-- cubeDecompositionQC 1729
-- [(SmallInt 1,SmallInt 12),(SmallInt 9,SmallInt 10)]
This works fine, and the standard 100 tests pass as expected. But it seems unnecessary to define a new type, instance, and function when all I really need is a custom generator. So I tried this:
smallInts :: Gen Int
smallInts = choose (-10000000, 10000000)
cubedElementsSumToN'' :: Int -> Property
cubedElementsSumToN'' n = forAll smallInts $ \m -> all (== n) (d m)
where d = map (uncurry (+) . ((^3) *** (^3)))
. cubeDecomposition
Now, the first few times I ran this, everything worked fine, and all tests pass. But on subsequent runs I observed failures. Bumping up the test size reliably finds one:
*** Failed! Falsifiable (after 674 tests and 1 shrink):
0
8205379
I'm a bit confused here due to the presence of two shrunken inputs - 0 and 8205379 - returned from QuickCheck, where I would intuitively expect one. Also, those inputs work as predicted (on my show-able property, at least):
*Main> cubedElementsSumToN 0
True
*Main> cubedElementsSumToN 8205379
True
So it seems like obviously there's a problem in the property that uses the custom Gen I defined.
What have I done wrong?

I quickly realized that the property as I've written it is obviously incorrect. Here's the proper way to do it, using the original cubedElementsSumToN property:
quickCheck (forAll smallInts cubedElementsSumToN)
which reads quite naturally.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string