I wrote code for solving the local alignment problem with Smith–Waterman algorithm.
I want to do this with input of strings with length 10000, with reasonable memory(under 2GB ram) and reasonable time (under 5 minutes).
At first I was using bio library's built in function for this, and it runs way too slow and eat up 4GB of ram before I killed it.
Note the java program jAligner, which implements the same algorithm, can solve this problem with less than 1GB of memory and less than 20 seconds.
When I wrote an unboxed version of this, the program gives me <<loop>>. I think it's because the array need to access items in the array before the array gets built entirely.
So I wonder is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.
module LocalAlign where
--import Data.Array.Unboxed
import Data.Tuple
import Data.Array
localAffineAlignment :: (Char -> Char -> Int)
-> Int
-> Int
-> String
-> String
-> (Int, (String, String, String, String))
localAffineAlignment f g e s' t' = (score, best) where
n = length s'
m = length t'
s= array (0,n-1) $ zip [0..n-1] s'
t= array (0,m-1) $ zip [0..m-1] t'
table :: (Array (Int,Int) Int,Array (Int,Int) Int)
table = (c,d)
where --a :: UArray (Int,Int) Int
a = array ((0,0),(n,m)) [((x,y),a' x y)|x<-[0..n],y<-[0..m]] --s end with gap
b = array ((0,0),(n,m)) [((x,y),b' x y)|x<-[0..n],y<-[0..m]] --t end with gap
c = array ((0,0),(n,m)) [((x,y),fst (c' x y))|x<-[0..n],y<-[0..m]] -- best
d = array ((0,0),(n,m)) [((x,y),snd (c' x y))|x<-[0..n],y<-[0..m]] -- direction
a' i j
| i==0 || j==0 = inf
| otherwise = max (a!(i-1,j)-e) (c!(i-1,j)-g-e)
b' i j
| i==0 || j==0 = inf
| otherwise = max (b!(i,j-1)-e) (c!(i,j-1)-g-e)
c' i j
| min i j == 0 = (0,0)
| otherwise = maximum [(b!(i,j),3),(a!(i,j),2),(c!(i-1,j-1) + f u v,1),(0,0)]
where u = s!(i-1)
v = t!(j-1)
inf = -1073741824
score :: Int
score = maximum $ elems $ fst table
best :: (String, String, String, String)
best = (drop si $ take ei s',drop sj $ take ej t',b1,b2)
where (a,d') = table
(si,sj,b1,b2) = build ei ej [] []
(ei,ej) = snd $ maximum $ map swap $ assocs a
build x y ss tt
| o == 0 = (x,y,ss,tt)
| d == 1 = build (x-1) (y-1) (u:ss) (v:tt)
| d == 2 = build (x-1) y (u:ss) ('-':tt)
| otherwise = build x (y-1) ('-':ss) (v:tt)
where o = a!(x,y)
d = d'!(x,y)
u = s!(x-1)
v = t!(y-1)
is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.
Yes, of course. Use the same data structures and the same algorithms, and you will get same (or better, or worse, by constant factors) performance.
You are using (intermediate) lists and boxed arrays heavily. Consider using the vector package instead.
You might be interested in the MemoCombinators library, which makes doing dynamic programming much easier. You can basically write the algorithm without memoization, then just annotate which variables you want memoized, and the compiler takes it from there.
Related
Good morning everyone!
I'm using the following function as a fitting example of a function that needs to have a simple input and output. In this case it's a function that converts a number from decimal to binary form, as a list of digits no less, just because it is convenient later on.
I chose to write it like this, because even though a number goes in and a list comes out, another structure is needed as an intermediate step, that will hold the digits found so far and hold the quotient of the division, as input for the next step of the loop. I will clean up the necessary mess before outputing anything, though, by selecting the part of the structure that I'm interested in, in this case the second one , and not counters or other stuff, that I'm done with. (As I mentioned this is an example only, and it's not unusual in other cases to initialize the until loop with a triplet like (a,b,c), only to pick one of them at the end, as I see fit, with the help of additional function, like pickXof3.)
So there,
dec2Bin :: Int -> [Int]
dec2Bin num = snd $ until
(\(n,l) -> n <=0) -- test
(\(n,l) -> (fst $ division n, (snd $ division n):l)) -- main function
(num,[]) -- initialization
where division a = divMod a 2
I find it very convenient that Haskell, although lacking traditional for/while loops has a function like until, which reminds me very much of Mathematica's NextWhile, that I'm familiar with.
In the past I would write sth even uglier, like two functions, a "helper" one and a "main" one, like so
dec2BinHelper :: (Int,[Int]) -> (Int,[Int])
dec2BinHelper (n,l)
| n <= 0 = (n,l)
| otherwise = dec2BinHelper (fst $ division n, (snd $ division n):l)
where division a = divMod a 2
-- a function with the sole purpose to act as a front-end to the helper function, initializing its call parameters and picking up its output
dec2Bin :: Int -> [Int]
dec2Bin n = snd $ dec2BinHelper (n,[])
which I think is unnecessarily bloated.
Still, while the use of until allows me to define just one function, I get the feeling that it could be done even simpler/easier to read, perhaps in a way more fitting to functional programming. Is that so? How would you write such a function differently, while keeping the input and output at the absolutely essential values?
I strongly prefer your second solution. I'd start a clean-up with two things: use pattern matching, and use where to hide your helper functions. So:
dec2Bin :: Int -> [Int]
dec2Bin n = snd $ dec2BinHelper (n, []) where
dec2BinHelper (n, l)
| n <= 0 = (n, l)
| otherwise = dec2BinHelper (d, m:l)
where (d, m) = divMod n 2
Now, in the base case, you return a tuple; but then immediately call snd on it. Why not fuse the two?
dec2Bin :: Int -> [Int]
dec2Bin n = dec2BinHelper (n, []) where
dec2BinHelper (n, l)
| n <= 0 = l
| otherwise = dec2BinHelper (d, m:l)
where (d, m) = divMod n 2
There's no obvious reason why you should pass these arguments in a tuple, rather than as separate arguments, which is more idiomatic and saves some allocation/deallocation noise besides.
dec2Bin :: Int -> [Int]
dec2Bin n = dec2BinHelper n [] where
dec2BinHelper n l
| n <= 0 = l
| otherwise = dec2BinHelper d (m:l)
where (d, m) = divMod n 2
You can swap the arguments to dec2BinHelper and eta-reduce; that way, you will not be shadowing the definition of n.
dec2Bin :: Int -> [Int]
dec2Bin = dec2BinHelper [] where
dec2BinHelper l n
| n <= 0 = l
| otherwise = dec2BinHelper (m:l) d
where (d, m) = divMod n 2
Since you know that n > 0 in the recursive call, you can use the slightly faster quotRem in place of divMod. You could also consider using bitwise operations like (.&. 1) and shiftR 1; they may be even better, but you should benchmark to know for sure.
dec2Bin :: Int -> [Int]
dec2Bin = dec2BinHelper [] where
dec2BinHelper l n
| n <= 0 = l
| otherwise = dec2BinHelper (r:l) q
where (q, r) = quotRem n 2
When you don't have a descriptive name for your helper function, it's traditional to name it go or loop.
dec2Bin :: Int -> [Int]
dec2Bin = go [] where
go l n
| n <= 0 = l
| otherwise = go (r:l) q
where (q, r) = quotRem n 2
At this point, the two sides of the conditional are short enough that I'd be tempted to put them on their own line, though this is something of an aesthetic choice.
dec2Bin :: Int -> [Int]
dec2Bin = go [] where
go l n = if n <= 0 then l else go (r:l) q
where (q, r) = quotRem n 2
Finally, a comment on the name: the input isn't really in decimal in any meaningful sense. (Indeed, it's much more physically accurate to think of the input as already being in binary!) Perhaps int2Bin or something like that would be more accurate. Or let the type speak for itself, and just call it toBin.
toBin :: Int -> [Int]
toBin = go [] where
go l n = if n <= 0 then l else go (r:l) q
where (q, r) = quotRem n 2
At this point I'd consider this code quite idiomatic.
Here is a simple programming problem from SPOJ: http://www.spoj.com/problems/PROBTRES/.
Basically, you are asked to output the biggest Collatz cycle for numbers between i and j. (Collatz cycle of a number $n$ is the number of steps to eventually get from $n$ to 1.)
I have been looking for a Haskell way to solve the problem with comparative performance than that of Java or C++ (so as to fits in the allowed run-time limit). Although a simple Java solution that memoizes the cycle length of any already computed cycles will work. I haven't been successful at applying the idea to obtain a Haskell solution.
I have tried the Data.Function.Memoize, as well as home-brewed log time memoization technique using the idea from this post: Memoization in Haskell?. Unfortunately, memoization actually makes the computation of cycle(n) even slower. I believe the slow down comes from the overhead of haskell way. (I tried running with the compiled binary code, instead of interpreting.)
I also suspect that simply iterating numbers from i to j can be costly ($i,j\le10^6$). So I even tried precompute everything for the range query, using idea from http://blog.openendings.net/2013/10/range-trees-and-profiling-in-haskell.html. However, this still gives "Time Limit Exceeding" error.
Can you help to inform a neat competitive Haskell program for this?
Thanks!
>>> using the approach bellow, I could submit an accepted answer to SPOJ. You may check the entire code from here.
The problem has bounds 0 < n < 1,000,000. Pre-calculate all of them and store them inside an array; then freeze the array. The array can be used as its own cache / memoization space.
The problem would then reduce to a range query problem over an array, which can be done very efficiently using trees.
With the code bellow I can get Collatz of 1..1,000,000 in a fraction of a second:
$ time echo 1000000 | ./collatz
525
real 0m0.177s
user 0m0.173s
sys 0m0.003s
Note that collatz function below, uses mutable STUArray internally, but itself is a pure function:
import Control.Monad.ST (ST)
import Control.Monad (mapM_)
import Control.Applicative ((<$>))
import Data.Array.Unboxed (UArray, elems)
import Data.Array.ST (STUArray, readArray, writeArray, runSTUArray, newArray)
collatz :: Int -> UArray Int Int
collatz size = out
where
next i = if odd i then 3 * i + 1 else i `div` 2
loop :: STUArray s Int Int -> Int -> ST s Int
loop arr k
| size < k = succ <$> loop arr (next k)
| otherwise = do
out <- readArray arr k
if out /= 0 then return out
else do
out <- succ <$> loop arr (next k)
writeArray arr k out
return out
out = runSTUArray $ do
arr <- newArray (1, size) 0
writeArray arr 1 1
mapM_ (loop arr) [2..size]
return arr
main = do
size <- read <$> getLine
print . maximum . elems $ collatz size
In order to perform range queries on this array, you may build a balanced tree as simple as below:
type Range = (Int, Int)
data Tree = Leaf Int | Node Tree Tree Range Int
build_tree :: Int -> Tree
build_tree size = loop 1 cnt
where
ctz = collatz size
cnt = head . dropWhile (< size) $ iterate (*2) 1
(Leaf a) +: (Leaf b) = max a b
(Node _ _ _ a) +: (Node _ _ _ b) = max a b
loop lo hi
| lo == hi = Leaf $ if size < lo then minBound else ctz ! lo
| otherwise = Node left right (lo, hi) (left +: right)
where
i = (lo + hi) `div` 2
left = loop lo i
right = loop (i + 1) hi
query_tree :: Tree -> Int -> Int -> Int
query_tree (Leaf x) _ _ = x
query_tree (Node l r (lo, hi) x) i j
| i <= lo && hi <= j = x
| mid < i = query_tree r i j
| j < 1 + mid = query_tree l i j
| otherwise = max (query_tree l i j) (query_tree r i j)
where mid = (lo + hi) `div` 2
Here is the same as in the other answer, but with an immutable recursively defined array (and it also leaks slightly (can someone say why?) and so two times slower):
import Data.Array
upper = 10^6
step :: Integer -> Int
step i = 1 + colAt (if odd i then 3 * i + 1 else i `div` 2)
colAt :: Integer -> Int
colAt i | i > upper = step i
colAt i = col!i
col :: Array Integer Int
col = array (1, upper) $ (1, 1) : [(i, step i) | i <- [2..upper]]
main = print $ maximum $ elems col
all.
While trying to solve some programming quiz:
https://www.hackerrank.com/challenges/missing-numbers
, I came across with space leak.
Main function is difference, which implements multi-set difference.
I've found out that List ':' and Triples (,,) kept on heaps
with -hT option profiling. However, only big lists are difference's
two arguments, and it shrinks as difference keeps on tail recursion.
But the memory consumed by lists keeps increasing as program runs.
Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples also
keeps increasing, and I cannot find out why.
Though I've browsed similar 'space leak' questions in stackoverflow,
I couldn't grasp the idea. Surely I have much to study.
I appreciate any comments. Thank you.
p.s) executable is compiled with -O2 switch.
$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
.
import Data.List
import Data.Array
-- array (non-zero-count, start-offset, array_data)
array_size=101
myindex :: Int -> Int -> Int
myindex key offset
| key >= offset = key - offset
| otherwise = key - offset + array_size
mylookup x (_,offset,arr) = arr ! idx
where idx = myindex x offset
addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
where idx = myindex key offset
(_,prev_value) = arr ! idx
count' = case (prev_value, value) of
(0,0) -> count
(0,_) -> count + 1
(_,0) -> count - 1
otherwise -> count
difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
| count == 0 = []
| otherwise = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
where (_,v) = mylookup y m
new_m = if v == 0
then m
else addOrReplace y (v - 1) m
main = do
n <- readLn :: IO Int
pp <- getLine
m <- readLn :: IO Int
qq <- getLine
let p = map (read :: String->Int) . words $ pp
q = map (read :: String->Int) . words $ qq
startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
putStrLn . unwords . map show . sort $ difference startArray q p
[EDIT]
I seq'ed value and Array thanks to Carl's advice.
I attach heap diagram.
[original heap profiling]
[]1
[after seq'ing value v]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
[after seq'ing value v and Array]
difference m (x:xs) y = new_m `seq` difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
I see three main problems with this code.
First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array is horrible for this use case. O(1) lookups are useless when updates are O(n).
Speaking of, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input, it does this accidentally, in fact, by comparing the value against 0.
Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.
Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.
Consider the modified Euler problem #4 -- "Find the maximum palindromic number which is a product of two numbers between 100 and 9999."
rev :: Int -> Int
rev x = rev' x 0
rev' :: Int -> Int -> Int
rev' n r
| n == 0 = r
| otherwise = rev' (n `div` 10) (r * 10 + n `mod` 10)
pali :: Int -> Bool
pali x = x == rev x
main :: IO ()
main = print . maximum $ [ x*y | x <- nums, y <- nums, pali (x*y)]
where
nums = [9999,9998..100]
This Haskell solution using -O2 and ghc 7.4.1 takes about 18
seconds.
The similar C solution takes 0.1 second.
So Haskell is 180 times
slower. What's wrong with my solution? I assume that this type of
problems Haskell solves pretty well.
Appendix - analogue C solution:
#define A 100
#define B 9999
int ispali(int n)
{
int n0=n, k=0;
while (n>0) {
k = 10*k + n%10;
n /= 10;
}
return n0 == k;
}
int main(void)
{
int max = 0;
for (int i=B; i>=A; i--)
for (int j=B; j>=A; j--) {
if (i*j > max && ispali(i*j))
max = i*j; }
printf("%d\n", max);
}
The similar C solution
That is a common misconception.
Lists are not loops!
And using lists to emulate loops has performance implications unless the compiler is able to eliminate the list from the code.
If you want to compare apples to apples, write the Haskell structure more or less equivalent to a loop, a tail recursive worker (with strict accumulator, though often the compiler is smart enough to figure out the strictness by itself).
Now let's take a more detailed look. For comparison, the C, compiled with gcc -O3, takes ~0.08 seconds here, the original Haskell, compiled with ghc -O2 takes ~20.3 seconds, with ghc -O2 -fllvm ~19.9 seconds. Pretty terrible.
One mistake in the original code is to use div and mod. The C code uses the equivalent of quot and rem, which map to the machine division instructions and are faster than div and mod. For positive arguments, the semantics are the same, so whenever you know that the arguments are always non-negative, never use div and mod.
Changing that, the running time becomes ~15.4 seconds when compiling with the native code generator, and ~2.9 seconds when compiling with the LLVM backend.
The difference is due to the fact that even the machine division operations are quite slow, and LLVM replaces the division/remainder with a multiply-and-shift operation. Doing the same by hand for the native backend (actually, a slightly better replacement taking advantage of the fact that I know the arguments will always be non-negative) brings its time down to ~2.2 seconds.
We're getting closer, but are still a far cry from the C.
That is due to the lists. The code still builds a list of palindromes (and traverses a list of Ints for the two factors).
Since lists cannot contain unboxed elements, that means there is a lot of boxing and unboxing going on in the code, that takes time.
So let us eliminate the lists, and take a look at the result of translating the C to Haskell:
module Main (main) where
a :: Int
a = 100
b :: Int
b = 9999
ispali :: Int -> Bool
ispali n = go n 0
where
go 0 acc = acc == n
go m acc = go (m `quot` 10) (acc * 10 + (m `rem` 10))
maxpal :: Int
maxpal = go 0 b
where
go mx i
| i < a = mx
| otherwise = go (inner mx b) (i-1)
where
inner m j
| j < a = m
| p > m && ispali p = inner p (j-1)
| otherwise = inner m (j-1)
where
p = i*j
main :: IO ()
main = print maxpal
The nested loop is translated to two nested worker functions, we use an accumulator to store the largest palindrome found so far. Compiled with ghc -O2, that runs in ~0.18 seconds, with ghc -O2 -fllvm it runs in ~0.14 seconds (yes, LLVM is better at optimising loops than the native code generator).
Still not quite there, but a factor of about 2 isn't too bad.
Maybe some find the following where the loop is abstracted out more readable, the generated core is for all intents and purposes identical (modulo a switch of argument order), and the performance of course the same:
module Main (main) where
a :: Int
a = 100
b :: Int
b = 9999
ispali :: Int -> Bool
ispali n = go n 0
where
go 0 acc = acc == n
go m acc = go (m `quot` 10) (acc * 10 + (m `rem` 10))
downto :: Int -> Int -> a -> (a -> Int -> a) -> a
downto high low acc fun = go high acc
where
go i acc
| i < low = acc
| otherwise = go (i-1) (fun acc i)
maxpal :: Int
maxpal = downto b a 0 $ \m i ->
downto b a m $ \mx j ->
let p = i*j
in if mx < p && ispali p then p else mx
main :: IO ()
main = print maxpal
#axblount is at least partly right; the following modification makes the program run almost three times as fast as the original:
maxPalindrome = foldl f 0
where f a x | x > a && pali x = x
| otherwise = a
main :: IO ()
main = print . maxPalindrome $ [x * y | x <- nums, y <- nums]
where nums = [9999,9998..100]
That still leaves a factor 60 slowdown, though.
This is more true to what the C code is doing:
maxpali :: [Int] -> Int
maxpali xs = go xs 0
where
go [] m = m
go (x:xs) m = if x > m && pali(x) then go xs x else go xs m
main :: IO()
main = print . maxpali $ [ x*y | x <- nums, y <- nums ]
where nums = [9999,9998..100]
On my box this takes 2 seconds vs .5 for the C version.
Haskell may be storing that entire list [ x*y | x <- nums, y <- nums, pali (x*y)] where as the C solution calculates the maximum on the fly. I'm not sure about this.
Also the C solution will only calculate ispali if the product beats the previous maximum. I would bet Haskell calculates are palindrome products regardless of whether x*y is a possible max.
It seems to me that you are having a branch prediction problem. In the C code, you have two nested loops and as soon as a palindrome is seen in the inner loop, the rest of the inner loop will be skipped very fast.
The way you feed this list of products instead of the nested loops I am not sure that ghc is doing any of this prediction.
Another way to write this is to use two folds, instead of one fold over the flattened list:
-- foldl g0 0 [x*y | x<-[b-1,b-2..a], y<-[b-1,b-2..a], pali(x*y)] (A)
-- foldl g1 0 [x*y | x<-[b-1,b-2..a], y<-[b-1,b-2..a]] (B)
-- foldl g2 0 [ [x*y | y<-[b-1,b-2..a]] | x<-[b-1,b-2..a]] (C)
maxpal b a = foldl f1 0 [b-1,b-2..a] -- (D)
where
f1 m x = foldl f2 m [b-1,b-2..a]
where
f2 m y | p>m && pali p = p
| otherwise = m
where p = x*y
main = print $ maxpal 10000 100
Seems to run much faster than (B) (as in larsmans's answer), too (only 3x - 4x slower then the following loops-based code). Fusing foldl and enumFromThenTo definitions gets us the "functional loops" code (as in DanielFischer's answer),
maxpal_loops b a = f (b-1) 0 -- (E)
where
f x m | x < a = m
| otherwise = g (b-1) m
where
g y m | y < a = f (x-1) m
| p>m && pali p = g (y-1) p
| otherwise = g (y-1) m
where p = x*y
The (C) variant is very suggestive of further algorithmic improvements (that's outside the scope of the original Q of course) that exploit the hidden order in the lists, destroyed by the flattening:
{- foldl g2 0 [ [x*y | y<-[b-1,b-2..a]] | x<-[b-1,b-2..a]] (C)
foldl g2 0 [ [x*y | y<-[x, x-1..a]] | x<-[b-1,b-2..a]] (C1)
foldl g0 0 [ safehead 0 . filter pali $
[x*y | y<-[x, x-1..a]] | x<-[b-1,b-2..a]] (C2)
fst $ until ... (\(m,s)-> (max m .
safehead 0 . filter pali . takeWhile (> m) $
head s, tail s))
(0,[ [x*y | y<-[x, x-1..a]] | x<-[b-1,b-2..a]]) (C3)
safehead 0 $ filter pali $ mergeAllDescending
[ [x*y | y<-[x, x-1..a]] | x<-[b-1,b-2..a]] (C4)
-}
(C3) can stop as soon as the head x*y in a sub-list is smaller than the currently found maximum. It is what short-cutting functional loops code could achieve, but not (C4), which is guaranteed to find the maximal palindromic number first. Plus, for list-based code its algorithmic nature is more visually apparent, IMO.
Here is my first Haskell program. What parts would you write in a better way?
-- Multiplication table
-- Returns n*n multiplication table in base b
import Text.Printf
import Data.List
import Data.Char
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b = foldl (++) (verticalHeader n b w) (map (line n b w) [0..n])
where
lo = 2* (logBase (fromIntegral b) (fromIntegral n))
w = 1+fromInteger (floor lo)
verticalHeader :: Int -> Int -> Int -> String
verticalHeader n b w = (foldl (++) tableHeader columnHeaders)
++ "\n"
++ minusSignLine
++ "\n"
where
tableHeader = replicate (w+2) ' '
columnHeaders = map (horizontalHeader b w) [0..n]
minusSignLine = concat ( replicate ((w+1)* (n+2)) "-" )
horizontalHeader :: Int -> Int -> Int -> String
horizontalHeader b w i = format i b w
line :: Int -> Int -> Int -> Int -> String
line n b w y = (foldl (++) ((format y b w) ++ "|" )
(map (element b w y) [0..n])) ++ "\n"
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
toBase :: Int -> Int -> [Int]
toBase b v = toBase' [] v where
toBase' a 0 = a
toBase' a v = toBase' (r:a) q where (q,r) = v `divMod` b
toAlphaDigits :: [Int] -> String
toAlphaDigits = map convert where
convert n | n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w = concat spaces ++ digits ++ " "
where
digits = if v == 0
then "0"
else toAlphaDigits ( toBase b v )
l = length digits
spaceCount = if (l > w) then 0 else (w-l)
spaces = replicate spaceCount " "
Here are some suggestions:
To make the tabularity of the computation more obvious, I would pass the list [0..n] to the line function rather than passing n.
I would further split out the computation of the horizontal and vertical axes so that they are passed as arguments to mulTable rather than computed there.
Haskell is higher-order, and almost none of the computation has to do with multiplication. So I would change the name of mulTable to binopTable and pass the actual multiplication in as a parameter.
Finally, the formatting of individual numbers is repetitious. Why not pass \x -> format x b w as a parameter, eliminating the need for b and w?
The net effect of the changes I am suggesting is that you build a general higher-order function for creating tables for binary operators. Its type becomes something like
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
and you wind up with a much more reusable function—for example, Boolean truth tables should be a piece of cake.
Higher-order and reusable is the Haskell Way.
You don't use anything from import Text.Printf.
Stylistically, you use more parentheses than necessary. Haskellers tend to find code more readable when it's cleaned of extraneous stuff like that. Instead of something like h x = f (g x), write h = f . g.
Nothing here really requires Int; (Integral a) => a ought to do.
foldl (++) x xs == concat $ x : xs: I trust the built-in concat to work better than your implementation.
Also, you should prefer foldr when the function is lazy in its second argument, as (++) is – because Haskell is lazy, this reduces stack space (and also works on infinite lists).
Also, unwords and unlines are shortcuts for intercalate " " and concat . map (++ "\n") respectively, i.e. "join with spaces" and "join with newlines (plus trailing newline)"; you can replace a couple things by those.
Unless you use big numbers, w = length $ takeWhile (<= n) $ iterate (* b) 1 is probably faster. Or, in the case of a lazy programmer, let w = length $ toBase b n.
concat ( (replicate ((w+1)* (n+2)) "-" ) == replicate ((w+1) * (n+2)) '-' – not sure how you missed this one, you got it right just a couple lines up.
You do the same thing with concat spaces, too. However, wouldn't it be easier to actually use the Text.Printf import and write printf "%*s " w digits?
Norman Ramsey gave excellent high-level (design) suggestions; Below are some low-level ones:
First, consult with HLint. HLint is a friendly program that gives you rudimentary advice on how to improve your Haskell code!
In your case HLint gives 7 suggestions. (mostly about redundant brackets)
Modify your code according to HLint's suggestions until it likes what you feed it.
More HLint-like stuff:
concat (replicate i "-"). Why not replicate i '-'?
Consult with Hoogle whenever there is reason to believe that a function you need is already available in Haskell's libraries. Haskell comes with tons of useful functions so Hoogle should come in handy quite often.
Need to concatenate strings? Search for [String] -> String, and voila you found concat. Now go replace all those folds.
The previous search also suggested unlines. Actually, this even better suits your needs. It's magic!
Optional: pause and thank in your heart to Neil M for making Hoogle and HLint, and thank others for making other good stuff like Haskell, bridges, tennis balls, and sanitation.
Now, for every function that takes several arguments of the same type, make it clear which means what, by giving them descriptive names. This is better than comments, but you can still use both.
So
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b =
becomes
mulTable :: Int -> Int -> String
mulTable size base =
To soften the extra characters blow of the previous suggestion: When a function is only used once, and is not very useful by itself, put it inside its caller's scope in its where clause, where it could use the callers' variables, saving you the need to pass everything to it.
So
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map (element b w y) [0 .. n]
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
becomes
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map element [0 .. n]
where
element x = format (y * x) b w
You can even move line into mulTable's where clause; imho, you should.
If you find a where clause nested inside another where clause troubling, then I suggest to change your indentation habits. My recommendation is to use consistent indentation of always 2 or always 4 spaces. Then you can easily see, everywhere, where the where in the other where is at. ok
Below's what it looks like (with a few other changes in style):
import Data.List
import Data.Char
mulTable :: Int -> Int -> String
mulTable size base =
unlines $
[ vertHeaders
, minusSignsLine
] ++ map line [0 .. size]
where
vertHeaders =
concat
$ replicate (cellWidth + 2) ' '
: map horizontalHeader [0 .. size]
horizontalHeader i = format i base cellWidth
minusSignsLine = replicate ((cellWidth + 1) * (size + 2)) '-'
cellWidth = length $ toBase base (size * size)
line y =
concat
$ format y base cellWidth
: "|"
: map element [0 .. size]
where
element x = format (y * x) base cellWidth
toBase :: Integral i => i -> i -> [i]
toBase base
= reverse
. map (`mod` base)
. takeWhile (> 0)
. iterate (`div` base)
toAlphaDigit :: Int -> Char
toAlphaDigit n
| n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w =
spaces ++ digits ++ " "
where
digits
| v == 0 = "0"
| otherwise = map toAlphaDigit (toBase b v)
spaces = replicate (w - length digits) ' '
0) add a main function :-) at least rudimentary
import System.Environment (getArgs)
import Control.Monad (liftM)
main :: IO ()
main = do
args <- liftM (map read) $ getArgs
case args of
(n:b:_) -> putStrLn $ mulTable n b
_ -> putStrLn "usage: nntable n base"
1) run ghc or runhaskell with -Wall; run through hlint.
While hlint doesn't suggest anything special here (only some redundant brackets), ghc will tell you that you don't actually need Text.Printf here...
2) try running it with base = 1 or base = 0 or base = -1
If you want multiline comments use:
{- A multiline
comment -}
Also, never use foldl, use foldl' instead, in cases where you are dealing with large lists which must be folded. It is more memory efficient.
A brief comments saying what each function does, its arguments and return value, is always good. I had to read the code pretty carefully to fully make sense of it.
Some would say if you do that, explicit type signatures may not be required. That's an aesthetic question, I don't have a strong opinion on it.
One minor caveat: if you do remove the type signatures, you'll automatically get the polymorphic Integral support ephemient mentioned, but you will still need one around toAlphaDigits because of the infamous "monomorphism restriction."