Faster SumSquareDifference in Haskell

I am implementing a fractal image compression algorithm for binary images in Haskell. For this purpose I have to find, for a given range block (a sub-image), the closest image in a so-called domain pool, a list of lists of images. I compare images by calculating the sum square difference of their pixel values.
I use the Haskell Image Processing (HIP) library for reading and writing images.
compress :: Image VS X Bit -> Int -> [(Int, Int)]
compress img blockSize = zip dIndices tIndices
  where rImg = img
        dImg = downsample2 rImg
        rBlocks = (toBlocks rImg blockSize) :: [Image VS X Bit]
        dBlocks = (toBlocks dImg blockSize) :: [Image VS X Bit]
        dPool = (createDPool dBlocks) :: [[Image VS X Bit]]
        distanceLists = map (\x -> (map.map) (distance x) dPool) rBlocks
        dIndices = map (fst . getMinIndices) distanceLists
        tIndices = map (snd . getMinIndices) distanceLists

distance :: Image VS X Bit -> Image VS X Bit -> Int
distance x y = sumSquareDifference (toBinList x) (toBinList y)
  where toBinList = map (toNum . extractBitOfPixel) . concat . toLists

toLists :: MArray arr cs e => Image arr cs e -> [[Pixel cs e]]
toLists img = [[index img (i, j) | j <- [0 .. cols img - 1]] | i <- [0 .. rows img - 1]]

extractBitOfPixel :: Pixel X Bit -> Bit
extractBitOfPixel (PixelX b) = b

sumSquareDifference :: [Int] -> [Int] -> Int
sumSquareDifference a b = sum $ zipWith (\x y -> (x-y)^2) a b
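(getMinIndices is referenced in compress but never shown. Purely as an assumption about its behaviour, it presumably returns the outer and inner index of the smallest distance in a nested list, roughly like this sketch:)

import Data.List (minimumBy)
import Data.Ord (comparing)

-- Assumed helper, not from the original post: position of the minimum element
-- in a list of lists, returned as (outer index, inner index).
getMinIndices :: Ord a => [[a]] -> (Int, Int)
getMinIndices xss =
  fst $ minimumBy (comparing snd)
        [ ((i, j), x) | (i, xs) <- zip [0..] xss, (j, x) <- zip [0..] xs ]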
The performance of this code is really bad. Compressing a 256x256 image with a block size of 2 takes around 5 minutes despite compiling with -O2. Profiling shows me that most of the runtime is spent in the function distance, especially in sumSquareDifference, but also in toLists and toBinList:
binaryCompressionSimple +RTS -p -RTS
total time = 1430.89 secs (1430893 ticks @ 1000 us, 1 processor)
total alloc = 609,573,757,744 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
sumSquareDifference Main binaryCompressionSimple.hs:87:1-63 30.9 28.3
toLists Main binaryCompressionSimple.hs:66:1-90 20.3 47.0
distance.toBinList Main binaryCompressionSimple.hs:74:11-79 10.9 15.1
main Main binaryCompressionSimple.hs:(14,1)-(24,21) 7.3 0.0
compress Main binaryCompressionSimple.hs:(28,1)-(36,60) 6.9 0.0
distance Main binaryCompressionSimple.hs:(71,1)-(74,79) 5.7 0.9
compress.distanceLists.\ Main binaryCompressionSimple.hs:34:38-65 5.2 4.4
compress.distanceLists Main binaryCompressionSimple.hs:34:11-74 2.8 0.0
main.\ Main binaryCompressionSimple.hs:20:72-128 2.7 0.0
getMinIndices.getMinIndex Main binaryCompressionSimple.hs:116:11-53 2.7 1.8
sumSquareDifference.\ Main binaryCompressionSimple.hs:87:52-58 2.7 2.5
Is there a way to improve performance?
A block size of 2 means comparing 16384 range blocks, each against 131072 images of the domain pool, so sumSquareDifference will be called 16384 * 131072 = 2147483648 times, each time calculating the sum square difference of two lists of length 4. I realize this is a large number, but shouldn't the code be faster anyway (lazy evaluation of lists)? Is this a Haskell problem or an algorithm problem?
Edit:
I was able to at least improve the performance by a third by using:
distance :: Image VS X Bit -> Image VS X Bit -> Int
distance x y
  | x == y    = 0
  | otherwise = sumSquareDifference (toBinList x) (toBinList y)
  where toBinList = map (toNum . extractBitOfPixel) . concat . inlinedToLists
Edit 2:
I was able to increase the performance enormously by generating the distance lists with the function genDistanceList, which stops the calculation as soon as two identical images are found:
genDistanceList :: [[Image VS X Bit]] -> Image VS X Bit -> [[Int]]
genDistanceList dPool rBlock = nestedTakeWhileInclusive (/= 0) $ (map.map) (distance rBlock) dPool
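(nestedTakeWhileInclusive isn't shown in the post. Purely as an assumption about what such a helper might look like, here is a minimal sketch: a takeWhile that also keeps the first failing element, applied inside each inner list, and cutting the outer list off at the inner list that contained that element.)

-- Assumed helpers, not from the original post.
takeWhileInclusive :: (a -> Bool) -> [a] -> [a]
takeWhileInclusive _ [] = []
takeWhileInclusive p (x:xs)
  | p x       = x : takeWhileInclusive p xs
  | otherwise = [x]

nestedTakeWhileInclusive :: (a -> Bool) -> [[a]] -> [[a]]
nestedTakeWhileInclusive _ [] = []
nestedTakeWhileInclusive p (xs:xss)
  | all p xs' = xs' : nestedTakeWhileInclusive p xss
  | otherwise = [xs']
  where xs' = takeWhileInclusive p xs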

The absolute first thing to try is skipping the conversion to lists:
{-# INLINE numIndex #-}
numIndex :: Image VS X Bit -> (Int, Int) -> Int
numIndex img pos = toNum . extractBitOfPixel $ index img pos

distance :: Image VS X Bit -> Image VS X Bit -> Int
distance a b = sum
  [ (numIndex a pos - numIndex b pos)^2
  | i <- [0 .. rows a - 1]
  , j <- [0 .. cols a - 1]
  , let pos = (i, j)
  ]
Since you haven't provided us with a minimal reproducible example, it's impossible to tell what effect, if any, that would have. If you want better advice, provide better data.
EDIT
Looking through the haddocks for hip, I suspect the following will be even better still:
distance :: Image VS X Bit -> Image VS X Bit -> Int
distance a b = getX
             . fold (+)
             $ zipWith bitDistance a b

bitDistance :: Pixel X Bit -> Pixel X Bit -> Pixel X Int
bitDistance (PixelX a) (PixelX b) = PixelX (fromIntegral (a-b))
-- use (a-b)^2 when you switch to grayscale, but for Bit the squaring isn't needed
Here, the fold and zipWith are the ones provided by hip, not base.
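A usage note (my addition; the module layout is an assumption and may differ between hip versions): since these names clash with the Prelude's, it is easiest to import the library qualified next to whatever unqualified hip import the file already has, and pick the image versions explicitly:

-- Sketch only: assumes fold and zipWith are exported from Graphics.Image;
-- depending on the hip version they may come from Graphics.Image.Interface.
import qualified Graphics.Image as I

distance :: Image VS X Bit -> Image VS X Bit -> Int
distance a b = getX . I.fold (+) $ I.zipWith bitDistance a b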

Related

Haskell :: Recursion in Recursion for Loop in Loop (Part 2)

This question derives from a previous question and answer. You can find the link here: Haskell :: Recursion in Recursion for Loop in Loop (Part 1)
That question was answered, with a really nice explanation worth keeping for future reference. Credit to @user2407038 for his amazing skills. However, there is something interesting to ponder: recursion with more than two partitions. To make it clear, I've changed the data a little bit for simplicity. Here is how it looks:
Previously, the 2 red dots were generated by finding (min x, min y) and (max x, max y). To generate 4 red dots, (min x, min y), (max x, min y), (min x, max y) and (max x, max y) should be taken into consideration by a partition4. Visually it looks like this:
Considering that the maximum number of members for each group is 3, group 1 and group 4 exceed that number. A new group should be created from each of these groups. However, the trickier part is that these new groups should not compute the distance against the previous red dots:
The edited code from the previous question:
import Data.List (partition)

data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }

minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps =
  (Point minX minY
  ,Point maxX maxY)
  where minX = minimum $ map ptX ps
        maxX = maximum $ map ptX ps
        minY = minimum $ map ptY ps
        maxY = maximum $ map ptY ps

main = do
  let pointDistance :: Point -> Point -> Double
      pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2

      cluster1 :: [Point] -> [Cluster]
      cluster1 ps =
        let (mn, mx) = minMaxPoints ps
            (psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
        in [ Cluster psmn, Cluster psmx ]

      cluster :: [Point] -> [Cluster]
      cluster ps =
        cluster1 ps >>= \cl@(Cluster c) ->
          if length c > 5
            then cluster c
            else [cl]

      testPts :: [Point]
      testPts = map (uncurry Point)
        [ (1,0), (2,1), (0,2)
        , (5,2), (4,3), (4,4)
        , (8,2), (9,3), (10,2)
        , (11,4), (12,3), (13,3) ]

  mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
I've found that when the length c threshold is changed, the answer is not as expected. Perhaps I've edited it wrongly (sigh).
I'm still figuring out how to fit in partitionN code for partitioning into N groups, as suggested.
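For what it's worth, here is a rough sketch (my code, untested against the expected grouping above, and assuming pointDistance is lifted to the top level) of an N-way split that assigns every point to the nearest of N reference points, e.g. the four corner points:

import Data.List (minimumBy)
import Data.Ord (comparing)

-- Sketch: partition points by which of the reference points is nearest.
-- Empty groups are kept, so the result always has one Cluster per reference point.
partitionN :: [Point] -> [Point] -> [Cluster]
partitionN refs ps =
  [ Cluster [ p | p <- ps, nearestIdx p == i ] | i <- [0 .. length refs - 1] ]
  where
    nearestIdx p = fst $ minimumBy (comparing (pointDistance p . snd)) (zip [0 ..] refs)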

How much space does ridge regression require?

In Haskell, ridge regression can be expressed as:
import Numeric.LinearAlgebra

createReadout :: Matrix Double → Matrix Double → Matrix Double
createReadout a b = oA <\> oB
  where
    μ = 1e-4
    oA = (a <> (tr a)) + (μ * (ident $ rows a))
    oB = a <> (tr b)
However, this operation is very memory expensive. Here is a minimalistic example that requires more than 2GB on my machine and takes 3 minutes to execute.
import Numeric.LinearAlgebra
import System.Random

createReadout :: Matrix Double -> Matrix Double -> Matrix Double
createReadout a b = oA <\> oB
  where
    mu = 1e-4
    oA = (a <> (tr a)) + (mu * (ident $ rows a))
    oB = a <> (tr b)

teacher :: [Int] -> Int -> Int -> Matrix Double
teacher labelsList cols' correctRow = fromBlocks $ f <$> labelsList
  where ones = konst 1.0 (1, cols')
        zeros = konst 0.0 (1, cols')
        rows' = length labelsList
        f i | i == correctRow = [ones]
            | otherwise = [zeros]

glue :: Element t => [Matrix t] -> Matrix t
glue xs = fromBlocks [xs]

main :: IO ()
main = do
  let n = 1500 -- <- The constant to be increased
      m = 10000
      cols' = 12
  g <- newStdGen
  -- Stub data
  let labels = take m . map (`mod` 10) . randoms $ g :: [Int]
      a = (n >< (cols' * m)) $ take (cols' * m * n) $ randoms g :: Matrix Double
      teachers = zipWith (teacher [0..9]) (repeat cols') labels
      b = glue teachers
  print $ maxElement $ createReadout a b
  return ()
$ cabal exec ghc -- -O2 Test.hs
$ time ./Test
./Test 190.16s user 5.22s system 106% cpu 3:03.93 total
The problem is to increase the constant n, at least to n = 4000, while RAM is limited to 5 GB. What is the minimal space that the matrix inversion operation requires in theory? How can this operation be optimized in terms of space? Can ridge regression be efficiently replaced with a cheaper method?
Simple Gauss-Jordan elimination only takes space to store the input and output matrices plus constant auxiliary space. If I'm reading correctly, the matrix oA you need to invert is n x n so that's not a problem.
Your memory usage is completely dominated by storing the input matrix a, which uses at least 1500 * 120000 * 8 = 1.34 GB. n = 4000 would be 4000 * 120000 * 8 = 3.58 GB which is over half of your space budget. I don't know what matrix library you are using or how it stores its matrices, but if they are on the Haskell heap then GC effects could easily account for another factor of 2 in space usage.
Well, you can get away with 3*m + n*n space, but I'm not sure how numerically stable this will be.
The basis is the identity
inv(inv(Q) + A'*A) = Q - Q*A'*R*A*Q
where R = inv(I + A*Q*A')
If A is your A matrix and
Q = inv((mu*I)*(mu*I)) = I/(mu*mu)
then the solution to your ridge regression is
inv(inv(Q) + A'*A) * A'*b
A little more algebra shows
inv(inv(Q) + A'*A) = (I - A'*inv(mu2*I + A*A')*A)/mu2
where mu2 = mu*mu
Note that since A is n x m, A*A' is n x n.
So one algorithm would be:
Compute C = A*A' + mu2*I
Do a Cholesky decomposition of C, i.e. find an upper triangular U so that U'*U = C
Compute the vector y = A'*b
Compute the vector z = A*y
Solve U'*u = z for u
Solve U*v = u for v
Compute w = A'*v
Compute x = (y - w)/mu2
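To make the steps concrete, here is a sketch with hmatrix (my code, not the answerer's; it uses the library's generic solver <\> in place of the explicit Cholesky solves, and follows the symbols of the derivation above):

import Numeric.LinearAlgebra

-- Sketch of the algorithm above. a is n x m (n small, m large), b is conformable
-- with A'*b; mu2 is the squared regularisation constant from the derivation.
-- chol/cholSolve could replace the <\> if an explicit Cholesky solve is wanted.
ridgeSmallN :: Double -> Matrix Double -> Matrix Double -> Matrix Double
ridgeSmallN mu2 a b = scale (1 / mu2) (y - w)
  where
    c = (a <> tr a) + scale mu2 (ident (rows a))  -- C = A*A' + mu2*I, only n x n
    y = tr a <> b                                 -- y = A'*b
    z = a <> y                                    -- z = A*y
    v = c <\> z                                   -- solve C*v = z
    w = tr a <> v                                 -- w = A'*v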

Using force vs time / space efficiency

Hi, I am generating a sparse DAG of 1000 x 1000 nodes, each having ~4 directed edges. Here is the relevant code: Full Code with imports
The problem I am solving has values in the range [0, 1500]. I have hardcoded 1501 as the upper value for now. I am trying to calculate the longest path of edges in the DAG. However, these details are not a direct part of my question:
My question is about how to judge the usage of force or similar constructs while writing algorithms in Haskell:
type OutGoingEdges = Map.Map NodeId [ NodeId ]
type NodesData = Map.Map NodeId Node
type NodeId = Int

data DAG = AdjList
  { outGoingEdges :: OutGoingEdges
  , nodesData :: NodesData
  } deriving (Eq, Show)

makeDAG :: DAGDataPath -> IO (DAG, SourceNodes)
makeDAG filepath = do
  listOfListOfInts <- makeInteger <$> readLines filepath
  let [width, height] = head listOfListOfInts
      numNodes = width * height
      rows = (replicate width 1501) : (drop 1 listOfListOfInts) ++ [(replicate width 1501)]
      heightsWithNodeIdsRows = force . fmap (\ (row, rowId) -> fmap (\ (height, colId) -> (height, rowId * width + colId)) $ zip row [1..]) $ zip rows [1..]
      emptyGraph = AdjList Map.empty $ Map.fromList (fmap (\(h, nid) -> (nid, Node h)) . concat . tail . init $ heightsWithNodeIdsRows)
      emptyNodesWithEdges = Set.empty
      threeRowsInOneGo = zip3 heightsWithNodeIdsRows (drop 1 heightsWithNodeIdsRows) (drop 2 heightsWithNodeIdsRows)
      (graph, nodesWithInEdges) = DL.foldl' makeGraph (emptyGraph, emptyNodesWithEdges) threeRowsInOneGo
      sourceNodes = Set.difference (Set.fromList . Map.keys . nodesData $ graph) nodesWithInEdges
  -- traceShow [take 10 . Map.keys . nodesData $ graph] (return (Set.toList sourceNodes))
  -- traceShow graph (return (Set.toList sourceNodes))
  -- traceShow sourceNodes (return (Set.toList sourceNodes))
  return (graph, force $ Set.toList sourceNodes)

  where
    makeGraph (graphTillNow, nodesWithInEdges) (prevRow, row, nextRow) =
      let updownEdges = zip3 prevRow row nextRow
          (graph', nodesInEdges') = addEdges (graphTillNow, nodesWithInEdges) updownEdges
          leftRightEdges = zip3 ((1501, 0) : row) (drop 1 row) (drop 2 row)
          (graph'', nodesInEdges'') = addEdges (graph', nodesInEdges') leftRightEdges
          -- Next line is interesting... graph'' is a DAG and nodesInEdges'' is a Set NodeId
      in (graph'', nodesInEdges'')

    addEdges (g, n) edges =
      DL.foldl' (\ (!g', !n') ((p, pId), (c, cId), (n, nId)) ->
                   let (g'', n'') = if c > p
                                      then (makeEdge cId pId g', Set.insert pId n')
                                      else (g', n')
                       (g''', n''') = if c > n
                                        then (makeEdge cId nId g'', Set.insert nId n'')
                                        else (g'', n'')
                   in (g''', n'''))
                (g, n)
                edges
While profiling I found that if I use (force graph'', force nodesInEdges'') instead of (graph'', nodesInEdges''), my memory usage drops from 3 GB to 600 MB, but the running time of the program increases from 37 secs to 69 secs. These numbers are from the time command and from looking at Activity Monitor. I also checked with profiling and the results were similar.
I am compiling profiling builds with:
stack build --executable-profiling --library-profiling --ghc-options="-fprof-auto -auto-all -caf-all -fforce-recomp -rtsopts" --file-watch
I have ghc-7.10.3 and stack 1.1.2.
I think that force traverses the data structure again and again.
Can force be told not to traverse the graph if it is already fully evaluated?
Can I use some other strategy?
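Not an answer from the thread, but one strategy worth sketching: make the accumulator strict by construction, so there is little left for force to traverse. Strict fields plus the strict Map variant (type and field names taken from the question) keep things evaluated as foldl' goes:

import qualified Data.Map.Strict as Map  -- evaluates values to WHNF as they are inserted
import qualified Data.Set as Set         -- Data.Set is already strict in its elements

-- Same shape as the question's DAG, but with strict fields: building an AdjList
-- now forces both maps, so the record cannot hide a long chain of unevaluated
-- thunks, and the repeated force calls can be dropped (or at least become cheap).
data DAG = AdjList
  { outGoingEdges :: !OutGoingEdges
  , nodesData     :: !NodesData
  } deriving (Eq, Show)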
Sample Input:
2 2 -- width height
1 2
3 4
Output:
3
The output is the length of the longest path in the graph: [4 -> 2 -> 1], i.e. [(1,1), (0,1), (0,0)]. Just to remind you, the correctness of the program is not the question; space/time efficiency is. Thanks.

How do I stop randomness from pervading my code in Haskell?

I am attempting to implement the following algorithm, as detailed here.
1. Start with a flat terrain (initialize all height values to zero).
2. Pick a random point on or near the terrain, and a random radius between some predetermined minimum and maximum. Carefully choosing this min and max will make a terrain rough and rocky or smooth and rolling.
3. Raise a hill on the terrain centered at the point, having the given radius.
4. Go back to step 2, and repeat as many times as necessary. The number of iterations chosen will affect the appearance of the terrain.
However, I start to struggle once I get to the point where I have to select a random point on the terrain. This random point is wrapped in an IO monad, which is then passed up my chain of functions.
Can I cut the IO off at a certain point and, if so, how do I find that point?
The following is my (broken) code. I would appreciate any suggestions on improving it / stopping the randomness from infecting everything.
type Point = (GLfloat, GLfloat, GLfloat)
type Terrain = [Point]

flatTerrain :: Double -> Double -> Double -> Double -> Terrain
flatTerrain width length height spacing =
  [(realToFrac x, realToFrac y, realToFrac z)
  | x <- [-width,-1+spacing..width], y <- [height], z <- [-length,-1+spacing..length]]

hill :: Terrain -> Terrain
hill terrain = hill' terrain 100
  where hill' terrain 0 = terrain
        hill' terrain iterations = do
          raised <- raise terrain
          hill' (raise terrain) (iterations - 1)
        raise terrain = do
          point <- pick terrain
          map (raisePoint 0.1 point) terrain
        raisePoint r (cx,cy,cz) (px,py,pz) =
          (px, r^2 - ((cx - px)^2 + (cz - pz)^2), pz)

pick :: [a] -> IO a
pick xs = randomRIO (0, (length xs - 1)) >>= return . (xs !!)
The algorithm says that you need to iterate, and in each iteration select a random point and update the terrain. This can be viewed as generating a list of random points and using this list to update the terrain, i.e. the iteration that generates random numbers == a list of random numbers.
So you can do something like:
selectRandomPoints :: [Point] -> Int -> IO [Point] -- generate the given number of random points
updateTerrain :: Terrain -> [Point] -> Terrain

-- somewhere in IO
do
  pts <- selectRandomPoints allPts iterationCount
  let newTerrain = updateTerrain t pts
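A minimal sketch of what those two could look like (my filling-in, assuming the question's pick and raisePoint are available at the top level and reusing the question's 0.1 radius):

import Control.Monad (replicateM)
import Data.List (foldl')

-- all of the IO happens here, once, up front
selectRandomPoints :: [Point] -> Int -> IO [Point]
selectRandomPoints allPts n = replicateM n (pick allPts)

-- pure: raise a hill at each chosen point in turn
updateTerrain :: Terrain -> [Point] -> Terrain
updateTerrain = foldl' (\t p -> map (raisePoint 0.1 p) t)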
One of the most useful features of Haskell is being able to tell that a function is deterministic just from its type - it makes testing much easier. For this reason, I would base my design on limiting randomness as much as possible, and wrapping the core non-random functions with a random variant. This is easily done with the MonadRandom type class, which is the best way of writing code in Haskell that requires random values.
For fun, I wrote a console version of that hill generator. It is pretty basic, with a lot of hard-coded constants. However, it does provide a pretty cool ASCII terrain generator :)
Note with my solution all of the calculations are isolated in pure, non random functions. This could then be tested easily, as the result is deterministic. As little as possible occurs in the IO monad.
import Control.Monad
import Control.Monad.Random
import Data.List
import Data.Function (on)

type Point = (Double, Double, Double)
type Terrain = [Point]

-- Non random code

flatTerrain :: Double -> Double -> Double -> Double -> Terrain
flatTerrain width length height spacing =
  [(realToFrac x, realToFrac y, realToFrac z)
  | x <- [-width,-width+spacing..width], y <- [height], z <- [-length,-length+spacing..length]]

-- simple terrain displayer, uses ascii to render the area.
-- assumes the terrain points are all separated by the same amount
showTerrain :: Terrain -> String
showTerrain terrain = unlines $ map (concat . map showPoint) pointsByZ where
  pointsByZ = groupBy ((==) `on` getZ) $ sortBy (compare `on` getZ) terrain
  getZ (_, _, z) = z
  getY (_, y, _) = y
  largest = getY $ maximumBy (compare `on` getY) terrain
  smallest = getY $ minimumBy (compare `on` getY) terrain
  atPC percent = (largest - smallest) * percent + smallest
  showPoint (_, y, _)
    | y < atPC (1/5) = " "
    | y < atPC (2/5) = "."
    | y < atPC (3/5) = "*"
    | y < atPC (4/5) = "^"
    | otherwise = "#"

addHill :: Double  -- Radius of hill
        -> Point   -- Position of hill
        -> Terrain -> Terrain
addHill radius point = map (raisePoint radius point) where
  raisePoint :: Double -> Point -> Point -> Point
  -- I had to add max py here, otherwise new hills destroyed the
  -- old hills with negative values.
  raisePoint r (cx,cy,cz) (px,py,pz) = (px, max py (r^2 - ((cx - px)^2 + (cz - pz)^2)), pz)

-- Some random variants. IO is an instance of MonadRandom, so these functions can be run in IO.
-- They can also be run in any other monad that has a MonadRandom instance, so they are pretty
-- flexible.

-- creates a random point. Note that the ranges are hardcoded - an improvement would
-- be to be able to specify them, either through parameters, or through reading from a Reader
-- monad or similar
randomPoint :: (MonadRandom m) => m Point
randomPoint = do
  x <- getRandomR (-30, 30)
  y <- getRandomR (0, 10)
  z <- getRandomR (-30, 30)
  return (x, y, z)

addRandomHill :: (MonadRandom m) => Terrain -> m Terrain
addRandomHill terrain = do
  radius <- getRandomR (0, 8) -- hardcoded again
  position <- randomPoint
  return $ addHill radius position terrain

-- Add many random hills to the Terrain
addRandomHills :: (MonadRandom m) => Int -> Terrain -> m Terrain
addRandomHills count = foldr (>=>) return $ replicate count addRandomHill

-- testing code
test hillCount = do
  let terrain = flatTerrain 30 30 0 2
  withHills <- addRandomHills hillCount terrain
  -- let oneHill = addHill 8 (0, 3, 0) terrain
  -- putStrLn $ showTerrain oneHill
  putStrLn $ showTerrain withHills

main = test 200
Example output:
... .. ..*. .***^^^***.
... ... .***. .***^^^*^^*.
... .. .*^**......*^*^^^^.
. .***.***. ..*^^^*.
....*^^***^*. .^##^*.
..*.*^^^*****. .^###^..*
.**^^^^.***... .*^#^*.**
.***^##^**..*^^*.*****..**
....***^^##^*.*^##^****. ..
.......*^###^.*###^****.
.*********^###^**^##^***....
*^^^*^##^^^^###^.^^^*. .****..
*^^^^####*^####^..**. .******.
*^^^*####**^###*. .. .*******
*^#^^^##^***^^*. ...........***
*^^^**^^*..*... ..*******...***
.***..*^^*... ..*^^#^^^*......
...*^##^**. .*^^#####*.
.*^##^**....**^^####*. .***
.. ..*^^^*...*...**^^###^* *^#^
..****^^*. .... ...**###^*.^###
..*******.**. ..**^^^#^^..^###
.*****..*^^* ..**^##^**...*^##
.^^^^....*^^*..*^^^##^* ..**^^^
*###^*. .*^**..^###^^^*...*****
^####*.*..*^^*.^###^**.....*..
*###^**^**^^^*.*###^. .. .
.^^^***^^^^#^*.**^^**.
.....***^##^**^^^*^^*.
.*^^##^*^##^^^^^.
.*^^^^*.^##^*^^*.
Nope, you can't escape IO. Perhaps you can do all your randomness up front and rewrite your functions to take that randomness as a parameter; if not, you can use MonadRandom or similar to track a random seed or just put everything in IO.
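A sketch of the "randomness up front" approach (my code; the ranges are arbitrary, and addHill, Terrain and showTerrain are reused from the answer above):

import Control.Monad (replicateM)
import System.Random (randomRIO)

-- do all of the random generation in a single IO action...
randomHills :: Int -> IO [(Double, Point)]   -- (radius, centre) pairs
randomHills n = replicateM n $ do
  r <- randomRIO (0, 8)
  x <- randomRIO (-30, 30)
  z <- randomRIO (-30, 30)
  return (r, (x, 0, z))

-- ...and keep the terrain update purely functional
applyHills :: [(Double, Point)] -> Terrain -> Terrain
applyHills hills terrain = foldl (\t (r, c) -> addHill r c t) terrain hills

-- usage:
-- randomHills 200 >>= \hs -> putStrLn (showTerrain (applyHills hs (flatTerrain 30 30 0 2)))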

Is it possible to use Haskell to reasonably solve large DP problems

I wrote code for solving the local alignment problem with the Smith–Waterman algorithm.
I want to do this for input strings of length 10000, with reasonable memory (under 2 GB of RAM) and reasonable time (under 5 minutes).
At first I was using the bio library's built-in function for this, but it ran way too slowly and ate up 4 GB of RAM before I killed it.
Note that the Java program jAligner, which implements the same algorithm, can solve this problem with less than 1 GB of memory in less than 20 seconds.
When I wrote an unboxed version of this, the program gave me <<loop>>. I think it's because the array needs to access items in the array before the array is built entirely.
So I wonder whether it is even possible to write Haskell code with similar performance for this kind of large dynamic programming problem.
module LocalAlign where

--import Data.Array.Unboxed
import Data.Tuple
import Data.Array

localAffineAlignment :: (Char -> Char -> Int)
                     -> Int
                     -> Int
                     -> String
                     -> String
                     -> (Int, (String, String, String, String))
localAffineAlignment f g e s' t' = (score, best) where
  n = length s'
  m = length t'
  s = array (0, n-1) $ zip [0..n-1] s'
  t = array (0, m-1) $ zip [0..m-1] t'
  table :: (Array (Int,Int) Int, Array (Int,Int) Int)
  table = (c, d)
    where --a :: UArray (Int,Int) Int
          a = array ((0,0),(n,m)) [((x,y), a' x y) | x <- [0..n], y <- [0..m]] -- s ends with gap
          b = array ((0,0),(n,m)) [((x,y), b' x y) | x <- [0..n], y <- [0..m]] -- t ends with gap
          c = array ((0,0),(n,m)) [((x,y), fst (c' x y)) | x <- [0..n], y <- [0..m]] -- best
          d = array ((0,0),(n,m)) [((x,y), snd (c' x y)) | x <- [0..n], y <- [0..m]] -- direction
          a' i j
            | i == 0 || j == 0 = inf
            | otherwise = max (a!(i-1,j) - e) (c!(i-1,j) - g - e)
          b' i j
            | i == 0 || j == 0 = inf
            | otherwise = max (b!(i,j-1) - e) (c!(i,j-1) - g - e)
          c' i j
            | min i j == 0 = (0, 0)
            | otherwise = maximum [(b!(i,j), 3), (a!(i,j), 2), (c!(i-1,j-1) + f u v, 1), (0, 0)]
            where u = s!(i-1)
                  v = t!(j-1)
  inf = -1073741824
  score :: Int
  score = maximum $ elems $ fst table
  best :: (String, String, String, String)
  best = (drop si $ take ei s', drop sj $ take ej t', b1, b2)
    where (a, d') = table
          (si, sj, b1, b2) = build ei ej [] []
          (ei, ej) = snd $ maximum $ map swap $ assocs a
          build x y ss tt
            | o == 0 = (x, y, ss, tt)
            | d == 1 = build (x-1) (y-1) (u:ss) (v:tt)
            | d == 2 = build (x-1) y (u:ss) ('-':tt)
            | otherwise = build x (y-1) ('-':ss) (v:tt)
            where o = a!(x,y)
                  d = d'!(x,y)
                  u = s!(x-1)
                  v = t!(y-1)
is it even possible to write Haskell code with similar performance for this kind of larger dynamic programming problems.
Yes, of course. Use the same data structures and the same algorithms, and you will get the same (or better, or worse, by constant factors) performance.
You are using (intermediate) lists and boxed arrays heavily. Consider using the vector package instead.
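To make that suggestion concrete, here is a minimal sketch (my code, and deliberately the simple linear-gap recurrence rather than the affine one above) of filling the score table strictly with unboxed vectors. Because constructN builds the cells in order, each cell only reads entries that already exist, so the <<loop>> produced by a lazily self-referential unboxed array cannot arise:

import qualified Data.Vector.Unboxed as U

-- Row-major (n+1) x (m+1) score table for a simple local alignment with a
-- linear gap penalty.  f scores a character pair, gap is the gap cost.
dpTable :: (Char -> Char -> Int) -> Int -> String -> String -> U.Vector Int
dpTable f gap s' t' = U.constructN ((n + 1) * (m + 1)) cell
  where
    n = length s'
    m = length t'
    s = U.fromList s'
    t = U.fromList t'
    cell done =
      let k      = U.length done          -- index of the cell being filled
          (i, j) = k `divMod` (m + 1)
      in if i == 0 || j == 0
           then 0
           else maximum
                  [ 0
                  , done U.! ((i - 1) * (m + 1) + (j - 1)) + f (s U.! (i - 1)) (t U.! (j - 1))
                  , done U.! ((i - 1) * (m + 1) + j) - gap
                  , done U.! (i * (m + 1) + (j - 1)) - gap
                  ]

-- the local-alignment score is then simply U.maximum (dpTable f gap s' t')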
You might be interested in the MemoCombinators library, which makes doing dynamic programming much easier. You can basically write the algorithm without memoization, then just annotate which functions you want memoized, and the library takes it from there.
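To give a flavour of that suggestion (the library's standard example, not tied to the alignment code above):

import qualified Data.MemoCombinators as Memo

-- the recurrence is written naively; Memo.integral memoises it over its
-- Integer argument, so each value is computed only once
fib :: Integer -> Integer
fib = Memo.integral fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)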
