I am writing a simple deterministic random number generator based on xorshift. The goal here is not to get a cryptographically secure or statistically perfect (pseudo-)random number generator, but to be able to achieve the same deterministic sequence of semi-random numbers across programming languages.
My Haskell program looks as follows:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
module SimpleRNG where
import Data.Word (Word32)
import Data.Bits (xor, shift)
import System.Random (RandomGen(..))
import Control.Arrow
(|>) :: a -> (a -> b) -> b
(|>) x f = f x
infixl 0 |>
newtype SeedState = SeedState Word32
  deriving (Eq, Show, Enum, Bounded)

seed :: Integral a => a -> SeedState
seed = SeedState . fromIntegral

rand_r :: SeedState -> (Word32, SeedState)
rand_r (SeedState num) = (res, SeedState res)
  where
    res = num
      |> xorshift 13
      |> xorshift (-17)
      |> xorshift 5

xorshift :: Int -> Word32 -> Word32
xorshift amount x = x `xor` (shift x amount)
instance RandomGen SeedState where
  next seed_state = (first fromIntegral) $ rand_r seed_state
  genRange seed_state = (fromEnum (minBound `asTypeOf` seed_state),
                         fromEnum (maxBound `asTypeOf` seed_state))
  split seed_state@(SeedState num) = (seed_state', inverted_seed_state')
    where
      (_, seed_state') = next seed_state
      (_, inverted_seed_state') = next inverted_seed_state
      inverted_seed_state = SeedState (maxBound - num)
Now, for some reason, when running
take 10 $ System.Random.randoms (seed 42) :: [Word32]
it returns only the 'odd' results, compared to the output of the following Python program:
class SeedState(object):
    def __init__(self, seed = 42):
        self.data = seed

def rand_r(rng_state):
    num = rng_state.data
    num ^= (num << 13) % (2 ** 32)
    num ^= (num >> 17) % (2 ** 32)
    num ^= (num << 5) % (2 ** 32)
    rng_state.data = num
    return num

__global_rng_state = SeedState(42)

def rand():
    global __global_rng_state
    return rand_r(__global_rng_state)

def seed(seed):
    global __global_rng_state
    __global_rng_state = SeedState(seed)

if __name__ == '__main__':
    for x in range(0, 10):
        print(rand())
It seems like the internals of the System.Random module do some weird trickery with the result returned by the generator
(calling
map fst $ take 10 $ iterate (\(_, rng) -> rand_r rng) (rand_r $ seed 42)
gives the result I'd expect).
This is odd, since the type returned by the generator is already a Word32, so it could/should just be passed on unaltered without any remapping happening.
What is going on here, and is there a way to plug this xorshift-generator into System.Random in a way that returns the same results?
This is to do with the behaviour of System.Random.randoms, which repeatedly applies random to a RandomGen, not next.
class Random a where
  ...
  random :: (RandomGen g) => g -> (a, g)
The Random class is what allows you to reuse RandomGen instances across different result types, and the instance for Word32 (as well as nearly all other types) is defined as
instance Random Word32 where randomR = randomIvalIntegral; random = randomBounded
randomBounded just calls randomR, so the behaviour of random is decided by:
randomIvalIntegral (l,h) = randomIvalInteger (toInteger l, toInteger h)
randomIvalInteger is an interesting function; you can read the source here. It's actually what's causing your problem: the function will discard a certain number of intermediate values based on the range of the generator and the range being generated over.
To get the values you want, you just need to use next instead - the easiest way would just be to define
randoms' g = x : (randoms' g') where (x, g') = next g
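As a quick sanity check (a sketch assuming the SeedState, seed and rand_r definitions from the question plus the randoms' helper above), both of the following should now give the same first ten values as the Python program, modulo the Word32/Int round trip that next performs:
viaNext, viaRandR :: [Word32]
viaNext  = map fromIntegral . take 10 $ randoms' (seed 42)
viaRandR = map fst . take 10 $ iterate (\(_, g) -> rand_r g) (rand_r (seed 42))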
I have written this function that computes Collatz sequences, and I see wildly varying times of execution depending on the spin I give it. Apparently it is related to something called "memoization", but I have a hard time understanding what it is and how it works, and, unfortunately, the relevant article on HaskellWiki, as well as the papers it links to, have all proven to not be easily surmountable. They discuss intricate details of the relative performance of highly layman-indifferentiable tree constructions, while what I miss must be some very basic, very trivial point that these sources neglect to mention.
This is the code. It is a complete program, ready to be built and executed.
module Main where
import Data.Function
import Data.List (maximumBy)
size :: (Integral a) => a
size = 10 ^ 6
-- Nail the basics.
collatz :: Integral a => a -> a
collatz n | even n = n `div` 2
          | otherwise = n * 3 + 1

recollatz :: Integral a => a -> a
recollatz = fix $ \f x -> if (x /= 1)
  then f (collatz x)
  else x

-- Now, I want to do the counting with a tuple monad.
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)

remocollatz :: Integral a => a -> ([a], a)
remocollatz = fix $ \f x -> if x /= 1
  then f =<< mocollatz x
  else return x

-- Trivialities.
collatzLength :: Integral a => a -> Int
collatzLength x = (length . fst $ (remocollatz x)) + 1

collatzPairs :: Integral a => a -> [(a, Int)]
collatzPairs n = zip [1..n] (collatzLength <$> [1..n])

longestCollatz :: Integral a => a -> (a, Int)
longestCollatz n = maximumBy order $ collatzPairs n
  where
    order :: Ord b => (a, b) -> (a, b) -> Ordering
    order x y = snd x `compare` snd y
main :: IO ()
main = print $ longestCollatz size
With ghc -O2 it takes about 17 seconds, without ghc -O2 -- about 22 seconds to deliver the length and the seed of the longest Collatz sequence starting at any point below size.
Now, if I make these changes:
diff --git a/Main.hs b/Main.hs
index c78ad95..9607fe0 100644
--- a/Main.hs
+++ b/Main.hs
@@ -1,6 +1,7 @@
module Main where
import Data.Function
+import qualified Data.Map.Lazy as M
import Data.List (maximumBy)
size :: (Integral a) => a
@@ -22,10 +23,15 @@ recollatz = fix $ \f x -> if (x /= 1)
mocollatz :: Integral b => b -> ([b], b)
mocollatz n = ([n], collatz n)
-remocollatz :: Integral a => a -> ([a], a)
-remocollatz = fix $ \f x -> if x /= 1
- then f =<< mocollatz x
- else return x
+remocollatz :: (Num a, Integral b) => b -> ([b], a)
+remocollatz 1 = return 1
+remocollatz x = case M.lookup x (table mutate) of
+ Nothing -> mutate x
+ Just y -> y
+ where mutate x = remocollatz =<< mocollatz x
+
+table :: (Ord a, Integral a) => (a -> b) -> M.Map a b
+table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Trivialities.
Then it will take just about 4 seconds with ghc -O2, but I would not live long enough to see it complete without ghc -O2.
Looking at the details of cost centres with ghc -prof -fprof-auto -O2 reveals that the first version enters collatz about a hundred million times, while the patched one enters it just about one and a half million times. This must be the reason for the speedup, but I have a hard time understanding the inner workings of this magic. My best idea is that we replace a portion of expensive recursive calls with O(log n) map lookups, but I don't know if that's true, nor why it depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
P.S. There are two requirements for achieving automagical memoization, highlighted elsewhere on Stack Overflow:
Make a function to be memoized a top-level name.
Make a function to be memoized a monomorphic one.
In line with these requirements, I rebuilt remocollatz as follows:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = mutate x
mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x
Now it's as top level and as monomorphic as it gets. Running time is about 11 seconds, versus the similarly monomorphized table version:
remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x (table mutate) of
  Nothing -> mutate x
  Just y  -> y

mutate :: Int -> ([Int], Int)
mutate = \x -> remocollatz =<< mocollatz x

table :: (Int -> ([Int], Int)) -> M.Map Int ([Int], Int)
table f = M.fromList [ (x, f x) | x <- [1..size] ]
-- Running in less than 4 seconds.
I wonder why the memoization ghc is supposedly performing in the first case here is almost 3 times slower than my dumb table.
Can I haz an explanation of what happens here, and why the performance differs so vastly between ghc -O2 and plain ghc builds?
Disclaimer: this is a guess, not verified by viewing GHC core output. A careful answer would do so to verify the conjectures outlined below. You can try peering through it yourself: add -ddump-simpl to your compilation line and you will get copious output detailing exactly what GHC has done to your code.
You write:
remocollatz x = {- ... -} table mutate {- ... -}
where mutate x = remocollatz =<< mocollatz x
The expression table mutate in fact does not depend on x; but it appears on the right-hand side of an equation that takes x as an argument. Consequently, without optimizations, this table is recomputed each time remocollatz is called (presumably even from inside the computation of table mutate).
With optimizations, GHC notices that table mutate does not depend on x, and floats it to its own definition, effectively producing:
fresh_variable_name = table mutate
where mutate x = remocollatz =<< mocollatz x
remocollatz x = case M.lookup x fresh_variable_name of
{- ... -}
The table is therefore computed just once for the entire program run.
don't know why it [the performance] depends so much on some godforsaken compiler flags, while, as I see it, such performance swings should all follow solely from the language.
Sorry, but Haskell doesn't work that way. The language definition tells clearly what the meaning of a given Haskell term is, but does not say anything about the runtime or memory performance needed to compute that meaning.
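If you don't want this sharing to depend on the optimiser at all, you can perform the float-out by hand and bind the table at the top level yourself. Here is a sketch along the lines of the transformed code above, reusing the question's mocollatz and table definitions (memoTable is just an illustrative name):
-- A top-level binding with no arguments is a constant applicative form (CAF),
-- so it is computed at most once per program run, with or without -O2.
memoTable :: M.Map Int ([Int], Int)
memoTable = table mutate

remocollatz :: Int -> ([Int], Int)
remocollatz 1 = return 1
remocollatz x = case M.lookup x memoTable of
  Nothing -> mutate x
  Just y  -> y

mutate :: Int -> ([Int], Int)
mutate x = remocollatz =<< mocollatz x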
Another approach to memoization that works in some situations, like this one, is to use a boxed vector, whose elements are computed lazily. The function used to initialize each element can use other elements of the vector in its calculation. As long as the evaluation of an element of the vector doesn't loop and refer to itself, just the elements it recursively depends on will be evaluated. Once evaluated, an element is effectively memoized, and this has the further benefit that elements of the vector that are never referenced are never evaluated.
The Collatz sequence is a nearly ideal application for this technique, but there is one complication. The next Collatz value(s) in sequence from a value under the limit may be outside the limit, which would cause a range error when indexing the vector. I solved this by just iterating through the sequence until back under the limit and counting the steps to do so.
The following program takes 0.77 seconds to run unoptimized and 0.30 when optimized:
import qualified Data.Vector as V
limit = 10 ^ 6 :: Int
-- The Collatz function, which given a value returns the next in the sequence.
nextCollatz val
  | odd val = 3 * val + 1
  | otherwise = val `div` 2
-- Given a value, return the next Collatz value in the sequence that is less
-- than the limit and the number of steps to get there. For example, the
-- sequence starting at 13 is: [13, 40, 20, 10, 5, 16, 8, 4, 2, 1], so if
-- limit is 100, then (nextCollatzWithinLimit 13) is (40, 1), but if limit is
-- 15, then (nextCollatzWithinLimit 13) is (10, 3).
nextCollatzWithinLimit val = (firstInRange, stepsToFirstInRange)
  where
    firstInRange = head rest
    stepsToFirstInRange = 1 + (length biggerThanLimit)
    (biggerThanLimit, rest) = span (>= limit) (tail collatzSeqStartingWithVal)
    collatzSeqStartingWithVal = iterate nextCollatz val
-- A boxed vector holding Collatz length for each index. The collatzFn used
-- to generate the value for each element refers back to other elements of
-- this vector, but since the vector elements are only evaluated as needed and
-- there aren't any loops in the Collatz sequences, the values are calculated
-- only as needed.
collatzVec :: V.Vector Int
collatzVec = V.generate limit collatzFn
  where
    collatzFn :: Int -> Int
    collatzFn index
      | index <= 1 = 1
      | otherwise = (collatzVec V.! nextWithinLimit) + stepsToGetThere
      where
        (nextWithinLimit, stepsToGetThere) = nextCollatzWithinLimit index
main :: IO ()
main = do
  -- Use a fold through the vector to find the longest Collatz sequence under
  -- the limit, and keep track of both the maximum length and the initial
  -- value of the sequence, which is the index.
  let (maxLength, maxIndex) = V.ifoldl' accMaxLen (0, 0) collatzVec
      accMaxLen acc@(accMaxLen, accMaxIndex) index currLen
        | currLen <= accMaxLen = acc
        | otherwise            = (currLen, index)
  putStrLn $ "Max Collatz length below " ++ show limit ++ " is "
             ++ show maxLength ++ " at index " ++ show maxIndex
How should one derive a parameterized similarity so that it is convenient to use in Haskell?
The class should be such that the domain can be numeric or text (and possibly something else), and the parameter controlling the internals of the comparison function can also be of several types.
Below you may find one approach that uses two class parameters. What implications does this design entail if the goal is to define several "similarity or equality groups"? (What kind of use cases would be hard to implement compared to some alternative implementation?) In this example, the similarity groups of words could be defined to be edit distances of one, two, etc., and on Doubles to be different precisions.
Some of the methods, like quiteSimilar, take both numeric and textual inputs. Why not just use some distance? Some of the similarities should be definable by the user of the parameterized equality, e.g. on text (words) they could be based on synonyms.
And on Doubles, well, I don't yet know what kinds of comparisons will be needed. (Suggestions are welcome.) After equalities comes the question of how to compare the order of items so that similar items are deemed equal rather than larger or smaller; see the last line of the output.
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.Array
import qualified Data.Text as T
-- parameterized eq
class Peq a b where peq :: a -> b -> b -> Bool
instance Peq Double Double where peq = almostEqRelPrec
instance Peq Int T.Text where peq = editDistance
class Comment a where
  quiteSimilar :: a -> a -> T.Text

instance Comment Double where
  quiteSimilar a b = if peq (epsilon * 100::Double) a b then T.pack "alike" else T.pack "unalike"

instance Comment T.Text where
  quiteSimilar a b = if peq (1::Int) a b then T.pack "alike" else T.pack "unalike"
x1' x = quiteSimilar 0.25 (0.25 - x * epsilon :: Double)
x1 = quiteSimilar 0.25 (0.25 - 25 * epsilon :: Double)
x2 = quiteSimilar 0.25 (0.25 - 26 * epsilon :: Double)
x3' x = quiteSimilar 1e12 (1e12 - x * ulp 1e12 :: Double)
x3 = quiteSimilar 1e12 (1e12 - 181 * ulp 1e12 :: Double)
x4 = quiteSimilar 1e12 (1e12 - 182 * ulp 1e12 :: Double)
u181 = 181 * ulp 1e12 :: Double
main = do
  let a = 0.2 + 0.65 :: Double
      b = 0.85 :: Double
      s = T.pack "trial"
      t = T.pack "tr1al"
  putStrLn $ "0.2 + 0.65 = " ++ show a ++ " and compared to " ++ show b ++ ", it is " ++ T.unpack (quiteSimilar a b)
  putStrLn $ "Texts " ++ T.unpack s ++ " and " ++ T.unpack t ++ " are " ++ T.unpack (quiteSimilar s t)
  putStrLn $ "Note that " ++ show a ++ " > " ++ show b ++ " is " ++ show (a > b)

-- package Numeric.Limits contains this one
epsilon :: RealFloat a => a
epsilon = r
  where r = 1 - encodeFloat (m-1) e
        (m, e) = decodeFloat (1 `asTypeOf` r)

ulp :: RealFloat a => a -> a
ulp a = r
  where r = a - encodeFloat (m-1) e
        (m, e) = decodeFloat (a `asTypeOf` r)

almostEqRelPrec :: (RealFloat a) => a -> a -> a -> Bool
almostEqRelPrec maxRelPrec a b = d <= (largest * maxRelPrec)
  where
    d = abs $ a - b
    largest = max (abs a) (abs b)
editDistance :: Int -> T.Text -> T.Text -> Bool
editDistance i a b = i == editDistance' (show a) (show b)

-- from https://wiki.haskell.org/Edit_distance
-- see also https://hackage.haskell.org/package/edit-distance-0.2.2.1
editDistance' :: Eq a => [a] -> [a] -> Int
editDistance' xs ys = table ! (m,n)
  where
    (m,n) = (length xs, length ys)
    x = array (1,m) (zip [1..] xs)
    y = array (1,n) (zip [1..] ys)

    table :: Array (Int,Int) Int
    table = array bnds [(ij, dist ij) | ij <- range bnds]
    bnds = ((0,0),(m,n))

    dist (0,j) = j
    dist (i,0) = i
    dist (i,j) = minimum [table ! (i-1,j) + 1, table ! (i,j-1) + 1,
      if x ! i == y ! j then table ! (i-1,j-1) else 1 + table ! (i-1,j-1)]
On my machine, the output is:
0.2 + 0.65 = 0.8500000000000001 and compared to 0.85, it is alike
Texts trial and tr1al are alike
Note that 0.8500000000000001 > 0.85 is True
Edit:
Trying to rephrase the question: could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)? I have a feeling that multi-parameter classes may turn out to be difficult later on. Is this a needless fear? The first solution along this line that came to my mind was to define a similarity class with one parameter a, plus a class for functions having two parameters; the instances would then constrain one type to be the similarity-class parameter, while the other would be used by the actual method returning Bool.
Are there benefits to using the latter approach over the one presented? Or rather, what are the possible trade-offs between these approaches? And if there are still more ways to achieve this kind of thing, how do they compare?
could this be achieved more elegantly with a similarity class that has only one parameter a and not two (a and b)
Yes. Many MultiParamTypeClasses can be rewritten quite easily to single-param ones... by simply degrading the second parameter to an associated type family:
{-# LANGUAGE TypeFamilies #-}

class Peq b where
  type SimilarityThreshold b :: *
  peq :: SimilarityThreshold b -> b -> b -> Bool

instance Peq Double where
  type SimilarityThreshold Double = Double
  peq = almostEqRelPrec

instance Peq T.Text where
  type SimilarityThreshold T.Text = Int
  peq = editDistance
This is quite a bit more verbose, but indeed I tend to favour this style. The main difference is that the associated type family always assigns, unambiguously, a threshold type to each type of values being compared. This can save you some could-not-deduce type-inference trouble; however, it also means that you can't use two different metric types for a single type (but why would you, anyway).
Note that you can achieve exactly the same semantics by simply adding a fundep to your original class:
{-# LANGUAGE FunctionalDependencies #-}
class Peq a b | b -> a where
  peq :: a -> b -> b -> Bool
This is just a bit different in usage – again, I tend to favour the type-families approach: it is more explicit about what the parameters are for, while at the same time avoiding the second parameter turning up in the constraints of any Peq-polymorphic function.
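For what it's worth, here's how the type-families version might look at a call site (a minimal usage sketch, assuming the instances above plus the epsilon, almostEqRelPrec and editDistance helpers from the question; demo is just an illustrative name):
demo :: IO ()
demo = do
  -- The instance for Double fixes the threshold type to Double...
  print $ peq (epsilon * 100) (0.2 + 0.65 :: Double) 0.85
  -- ...and the instance for T.Text fixes it to Int (an edit distance).
  print $ peq (1 :: Int) (T.pack "trial") (T.pack "tr1al")
Because the instance determines the threshold type, only the compared values need an annotation (to settle numeric defaulting); the FunctionalDependencies version reads the same at the call site.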
Here's a simple function. It takes an input Int and returns a (possibly empty) list of (Int, Int) pairs, where the input Int is the sum of the cubed elements of any of the pairs.
cubeDecomposition :: Int -> [(Int, Int)]
cubeDecomposition n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
  where m = truncate $ fromIntegral n ** (1/3)
-- cubeDecomposition 1729
-- [(1,12),(9,10)]
I want to test the property that the above is true; if I cube each element and sum any of the return tuples, then I get my input back:
import Control.Arrow
cubedElementsSumToN :: Int -> Bool
cubedElementsSumToN n = all (== n) d
  where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecomposition n)
For runtime considerations, I'd like to limit the input Ints to a certain size when testing this with QuickCheck. I can define an appropriate type and Arbitrary instance:
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
import Test.QuickCheck
newtype SmallInt = SmallInt Int
  deriving (Show, Eq, Enum, Ord, Num, Real, Integral)

instance Arbitrary SmallInt where
  arbitrary = fmap SmallInt (choose (-10000000, 10000000))
And then I guess I have to define versions of the function and property that use SmallInt rather than Int:
cubeDecompositionQC :: SmallInt -> [(SmallInt, SmallInt)]
cubeDecompositionQC n = [(x, y) | x <- [1..m], y <- [x..m], x^3 + y^3 == n]
  where m = truncate $ fromIntegral n ** (1/3)

cubedElementsSumToN' :: SmallInt -> Bool
cubedElementsSumToN' n = all (== n) d
  where d = map (uncurry (+) . ((^3) *** (^3))) (cubeDecompositionQC n)
-- cubeDecompositionQC 1729
-- [(SmallInt 1,SmallInt 12),(SmallInt 9,SmallInt 10)]
This works fine, and the standard 100 tests pass as expected. But it seems unnecessary to define a new type, instance, and function when all I really need is a custom generator. So I tried this:
smallInts :: Gen Int
smallInts = choose (-10000000, 10000000)
cubedElementsSumToN'' :: Int -> Property
cubedElementsSumToN'' n = forAll smallInts $ \m -> all (== n) (d m)
  where d = map (uncurry (+) . ((^3) *** (^3)))
          . cubeDecomposition
Now, the first few times I ran this, everything worked fine, and all tests pass. But on subsequent runs I observed failures. Bumping up the test size reliably finds one:
*** Failed! Falsifiable (after 674 tests and 1 shrink):
0
8205379
I'm a bit confused here due to the presence of two shrunken inputs - 0 and 8205379 - returned from QuickCheck, where I would intuitively expect one. Also, those inputs work as predicted (on my show-able property, at least):
*Main> cubedElementsSumToN 0
True
*Main> cubedElementsSumToN 8205379
True
So it seems like obviously there's a problem in the property that uses the custom Gen I defined.
What have I done wrong?
I quickly realized that the property as I've written it is obviously incorrect: cubedElementsSumToN'' receives an n from QuickCheck's default Int generator and then draws a separate m from smallInts inside forAll, so it checks the decompositions of m against the unrelated n. It fails as soon as m has a cube decomposition and n differs from it, which is also why two inputs show up in the counterexample. Here's the proper way to do it, using the original cubedElementsSumToN property:
quickCheck (forAll smallInts cubedElementsSumToN)
which reads quite naturally.
Here is my first Haskell program. What parts would you write in a better way?
-- Multiplication table
-- Returns n*n multiplication table in base b
import Text.Printf
import Data.List
import Data.Char
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b = foldl (++) (verticalHeader n b w) (map (line n b w) [0..n])
  where
    lo = 2 * (logBase (fromIntegral b) (fromIntegral n))
    w = 1 + fromInteger (floor lo)

verticalHeader :: Int -> Int -> Int -> String
verticalHeader n b w = (foldl (++) tableHeader columnHeaders)
                       ++ "\n"
                       ++ minusSignLine
                       ++ "\n"
  where
    tableHeader = replicate (w+2) ' '
    columnHeaders = map (horizontalHeader b w) [0..n]
    minusSignLine = concat ( replicate ((w+1)* (n+2)) "-" )

horizontalHeader :: Int -> Int -> Int -> String
horizontalHeader b w i = format i b w

line :: Int -> Int -> Int -> Int -> String
line n b w y = (foldl (++) ((format y b w) ++ "|" )
                 (map (element b w y) [0..n])) ++ "\n"

element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w

toBase :: Int -> Int -> [Int]
toBase b v = toBase' [] v where
  toBase' a 0 = a
  toBase' a v = toBase' (r:a) q where (q,r) = v `divMod` b

toAlphaDigits :: [Int] -> String
toAlphaDigits = map convert where
  convert n | n < 10 = chr (n + ord '0')
            | otherwise = chr (n + ord 'a' - 10)

format :: Int -> Int -> Int -> String
format v b w = concat spaces ++ digits ++ " "
  where
    digits = if v == 0
               then "0"
               else toAlphaDigits ( toBase b v )
    l = length digits
    spaceCount = if (l > w) then 0 else (w-l)
    spaces = replicate spaceCount " "
Here are some suggestions:
To make the tabularity of the computation more obvious, I would pass the list [0..n] to the line function rather than passing n.
I would further split out the computation of the horizontal and vertical axes so that they are passed as arguments to mulTable rather than computed there.
Haskell is higher-order, and almost none of the computation has to do with multiplication. So I would change the name of mulTable to binopTable and pass the actual multiplication in as a parameter.
Finally, the formatting of individual numbers is repetitious. Why not pass \x -> format x b w as a parameter, eliminating the need for b and w?
The net effect of the changes I am suggesting is that you build a general higher-order function for creating tables for binary operators. Its type becomes something like
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
and you wind up with a much more reusable function—for example, Boolean truth tables should be a piece of cake.
Higher-order and reusable is the Haskell Way.
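For illustration, a rough sketch of what such a function could look like (the name binopTable is taken from the suggestion above; everything else, including the layout choices, is just one possible shape):
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
binopTable fmt op rows cols =
  unlines (header : separator : map row rows)
  where
    header    = pad "" ++ "|" ++ concatMap (pad . fmt) cols
    separator = replicate (length header) '-'
    row r     = pad (fmt r) ++ "|" ++ concatMap (pad . fmt . op r) cols
    pad s     = replicate (width - length s) ' ' ++ s ++ " "
    width     = maximum [length (fmt v) | v <- rows ++ cols ++ [op r c | r <- rows, c <- cols]]
A Boolean truth table is then a one-liner: putStr $ binopTable show (&&) [False, True] [False, True].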
You don't use anything from import Text.Printf.
Stylistically, you use more parentheses than necessary. Haskellers tend to find code more readable when it's cleaned of extraneous stuff like that. Instead of something like h x = f (g x), write h = f . g.
Nothing here really requires Int; (Integral a) => a ought to do.
foldl (++) x xs == concat (x : xs): I trust the built-in concat to work better than your implementation.
Also, you should prefer foldr when the combining function is lazy in its second argument, as (++) is – because Haskell is lazy, this reduces stack space (and also works on infinite lists).
Also, unwords and unlines are shortcuts for intercalate " " and concat . map (++ "\n") respectively, i.e. "join with spaces" and "join with newlines (plus a trailing newline)"; you can replace a couple of things with those.
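For instance, these equivalences are easy to check directly (a tiny self-contained example, not specific to the table code):
import Data.List (intercalate)

main :: IO ()
main = do
  -- All three comparisons print True.
  print $ foldl (++) "x " ["a", "b"] == concat ("x " : ["a", "b"])
  print $ unwords ["1", "2", "3"]    == intercalate " " ["1", "2", "3"]
  print $ unlines ["row1", "row2"]   == concat (map (++ "\n") ["row1", "row2"])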
Unless you use big numbers, w = length $ takeWhile (<= n) $ iterate (* b) 1 is probably faster. Or, in the case of a lazy programmer, let w = length $ toBase b n.
concat (replicate ((w+1) * (n+2)) "-") == replicate ((w+1) * (n+2)) '-' – not sure how you missed this one; you got it right just a couple of lines up.
You do the same thing with concat spaces, too. However, wouldn't it be easier to actually use the Text.Printf import and write printf "%*s " w digits?
Norman Ramsey gave excellent high-level (design) suggestions; Below are some low-level ones:
First, consult with HLint. HLint is a friendly program that gives you rudimentary advice on how to improve your Haskell code!
In your case HLint gives 7 suggestions (mostly about redundant brackets).
Modify your code according to HLint's suggestions until it likes what you feed it.
More HLint-like stuff:
concat (replicate i "-"). Why not replicate i '-'?
Consult with Hoogle whenever there is reason to believe that a function you need is already available in Haskell's libraries. Haskell comes with tons of useful functions so Hoogle should come in handy quite often.
Need to concatenate strings? Search for [String] -> String, and voila you found concat. Now go replace all those folds.
The previous search also suggested unlines. Actually, this even better suits your needs. It's magic!
Optional: pause and thank in your heart to Neil M for making Hoogle and HLint, and thank others for making other good stuff like Haskell, bridges, tennis balls, and sanitation.
Now, for every function that takes several arguments of the same type, make it clear which means what, by giving them descriptive names. This is better than comments, but you can still use both.
So
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b =
becomes
mulTable :: Int -> Int -> String
mulTable size base =
To soften the extra characters blow of the previous suggestion: When a function is only used once, and is not very useful by itself, put it inside its caller's scope in its where clause, where it could use the callers' variables, saving you the need to pass everything to it.
So
line :: Int -> Int -> Int -> Int -> String
line n b w y =
  concat
    $ format y b w
    : "|"
    : map (element b w y) [0 .. n]

element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
becomes
line :: Int -> Int -> Int -> Int -> String
line n b w y =
  concat
    $ format y b w
    : "|"
    : map element [0 .. n]
  where
    element x = format (y * x) b w
You can even move line into mulTable's where clause; imho, you should.
If you find a where clause nested inside another where clause troubling, then I suggest changing your indentation habits. My recommendation is to use consistent indentation of always 2 or always 4 spaces. Then you can easily see, everywhere, where the where in the other where is at.
Below's what it looks like (with a few other changes in style):
import Data.List
import Data.Char
mulTable :: Int -> Int -> String
mulTable size base =
  unlines $
  [ vertHeaders
  , minusSignsLine
  ] ++ map line [0 .. size]
  where
    vertHeaders =
      concat
      $ replicate (cellWidth + 2) ' '
      : map horizontalHeader [0 .. size]
    horizontalHeader i = format i base cellWidth
    minusSignsLine = replicate ((cellWidth + 1) * (size + 2)) '-'
    cellWidth = length $ toBase base (size * size)
    line y =
      concat
      $ format y base cellWidth
      : "|"
      : map element [0 .. size]
      where
        element x = format (y * x) base cellWidth

toBase :: Integral i => i -> i -> [i]
toBase base
  = reverse
  . map (`mod` base)
  . takeWhile (> 0)
  . iterate (`div` base)

toAlphaDigit :: Int -> Char
toAlphaDigit n
  | n < 10    = chr (n + ord '0')
  | otherwise = chr (n + ord 'a' - 10)

format :: Int -> Int -> Int -> String
format v b w =
  spaces ++ digits ++ " "
  where
    digits
      | v == 0    = "0"
      | otherwise = map toAlphaDigit (toBase b v)
    spaces = replicate (w - length digits) ' '
0) add a main function :-) at least rudimentary
import System.Environment (getArgs)
import Control.Monad (liftM)
main :: IO ()
main = do
  args <- liftM (map read) $ getArgs
  case args of
    (n:b:_) -> putStrLn $ mulTable n b
    _       -> putStrLn "usage: nntable n base"
1) run ghc or runhaskell with -Wall; run through hlint.
While hlint doesn't suggest anything special here (only some redundant brackets), ghc will tell you that you don't actually need Text.Printf here...
2) try running it with base = 1 or base = 0 or base = -1
If you want multiline comments use:
{- A multiline
comment -}
Also, never use foldl; use foldl' instead in cases where you are dealing with large lists which must be folded. It is more memory efficient.
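For example (a small illustration, not specific to the table code; the list length is arbitrary):
import Data.List (foldl')

main :: IO ()
main = do
  -- foldl (+) 0 would first build a deep chain of thunks
  -- (((0 + 1) + 2) + 3) + ... and only evaluate it at the end, which can
  -- exhaust memory on long lists; foldl' forces each intermediate sum,
  -- so it runs in constant space.
  print $ foldl' (+) 0 [1 .. 10 ^ 7 :: Int]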
A brief comment saying what each function does, its arguments and return value, is always good. I had to read the code pretty carefully to fully make sense of it.
Some would say if you do that, explicit type signatures may not be required. That's an aesthetic question, I don't have a strong opinion on it.
One minor caveat: if you do remove the type signatures, you'll automatically get the polymorphic Integral support ephemient mentioned, but you will still need one around toAlphaDigits because of the infamous "monomorphism restriction."