I said in a previous question that I didn't understand the source code of findIndices.
In fact I hadn't paid enough attention and hadn't noticed that there are two definitions of this function:
findIndices :: (a -> Bool) -> [a] -> [Int]
#if defined(USE_REPORT_PRELUDE)
findIndices p xs = [ i | (x,i) <- zip xs [0..], p x]
#else
-- Efficient definition, adapted from Data.Sequence
{-# INLINE findIndices #-}
findIndices p ls = build $ \c n ->
  let go x r k | p x       = I# k `c` r (k +# 1#)
               | otherwise = r (k +# 1#)
  in foldr go (\_ -> n) ls 0#
#endif /* USE_REPORT_PRELUDE */
I understand the first definition, the one I hadn't seen. I don't understand the second one. I have a few questions:
What is #if defined(USE_REPORT_PRELUDE)?
Can someone explain the second definition? What are build, I#, +# and 1#?
Why is the second definition inlined, but not the first one?
The CPP extension enables the C preprocessor, the same one used by the C programming language. Here, it is used to test whether the flag USE_REPORT_PRELUDE was set during compilation. Depending on that flag, the compiler compiles either the #if or the #else variant of the code.
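To make the mechanism concrete, here is a minimal sketch of a module that uses the same kind of switch. The names usingReportPrelude and findIndicesSketch are hypothetical; passing -DUSE_REPORT_PRELUDE to ghc defines the symbol and selects the first branch, and without it the else branch is compiled.

```haskell
{-# LANGUAGE CPP #-}
-- Hypothetical module mirroring the CPP switch in GHC's base.
-- Build with:  ghc -DUSE_REPORT_PRELUDE  to select the first branch.

-- Reports which branch was compiled in.
usingReportPrelude :: Bool
#if defined(USE_REPORT_PRELUDE)
usingReportPrelude = True
#else
usingReportPrelude = False
#endif

findIndicesSketch :: (a -> Bool) -> [a] -> [Int]
#if defined(USE_REPORT_PRELUDE)
-- Specification-style version, as in the Haskell Report.
findIndicesSketch p xs = [ i | (x, i) <- zip xs [0 ..], p x ]
#else
-- Alternative version: explicit recursion carrying the index.
findIndicesSketch p xs = go 0 xs
  where
    go _ [] = []
    go i (y : ys)
      | p y = i : go (i + 1) ys
      | otherwise = go (i + 1) ys
#endif
```

Both branches must compute the same result; only the code that gets compiled differs.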
build is a function which could be defined as
build f = f (:) []
So, writing build (\c n -> ...) essentially binds c to the "cons" (:), and n to the "nil" [].
This is not done for convenience: it is not convenient at all! However, GHC's optimizer works very well when build and foldr are combined (foldr/build fusion), so the code is written in this odd way to take advantage of that.
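Here is a sketch of the second definition with the primitive operations replaced by ordinary boxed Int arithmetic (k + 1 instead of k +# 1#, 0 instead of 0#), so it can be read and run as normal Haskell. myBuild is a hypothetical hand-rolled stand-in with the same type as GHC.Exts.build; findIndicesFused and findIndicesReport are illustrative names.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Same type as GHC.Exts.build: the argument must work for ANY "cons" and "nil".
myBuild :: (forall b. (a -> b -> b) -> b -> b) -> [a]
myBuild g = g (:) []

-- The efficient definition, using boxed Ints instead of I#, +# and 0#.
findIndicesFused :: (a -> Bool) -> [a] -> [Int]
findIndicesFused p ls = myBuild $ \c n ->
  let go x r k
        | p x = k `c` r (k + 1)   -- emit index k, continue with k + 1
        | otherwise = r (k + 1)   -- skip x, continue with k + 1
  in foldr go (\_ -> n) ls 0      -- start counting at 0

-- The Report definition, for comparison.
findIndicesReport :: (a -> Bool) -> [a] -> [Int]
findIndicesReport p xs = [ i | (x, i) <- zip xs [0 ..], p x ]
```

Because the lambda passed to myBuild never mentions (:) or [] directly, a consumer such as foldr can substitute its own operations for c and n.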
Further, I# is the low-level constructor of Int. When we normally write
x :: Int
x = 4+2
GHC implements x (very roughly) with a pointer to some memory that reads as unevaluated: 4+2. After x is forced the first time, this memory gets overwritten with evaluated: I# 6#. This is needed to implement laziness.
The "boxing" here refers to the indirection through a pointer.
Instead, the type Int# is a plain machine integer, with no pointers, no indirection and no unevaluated expressions. It is strict rather than lazy, but being lower-level it is more efficient. One creates such a value as follows (values of unlifted types like Int# cannot be bound at top level, so x' is a local binding here):
x :: Int
x = I# x'
  where
    x' :: Int#
    x' = 6#
Indeed, Int is defined as data Int = I# Int# (a data type rather than a newtype: the I# constructor is precisely the box).
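A minimal sketch of boxing and unboxing by hand, assuming GHC with the MagicHash extension. GHC.Exts exports the I# constructor, the Int# type and primitives such as +#; plusInt and six are hypothetical names.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- Unbox both arguments, add them with the primitive +#, box the result.
plusInt :: Int -> Int -> Int
plusInt (I# a) (I# b) = I# (a +# b)

-- Unlifted values cannot be bound at top level, so x' is local.
six :: Int
six = I# x'
  where
    x' :: Int#
    x' = 6#
```

In ordinary code you would simply write a + b and let GHC do this unboxing when it can prove it is safe.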
Keep in mind that this is not standard Haskell, but GHC-specific low-level details. In normal code, you should not need to use such unboxed types. In libraries, the authors do that to achieve a little more performance, but that's it.
Sometimes, even if our code only uses Ints, GHC is smart enough to automatically convert it to use Int#, achieving more efficiency by avoiding the boxing. This can be observed if we ask GHC to "dump Core" (for instance with -O -ddump-simpl), so that we can see the result of the optimization.
For instance, compiling
f :: Int -> Int
f 0 = 0
f n = n + f (n-1)
GHC produces a lower level version (this is GHC Core, not Haskell, but it is similar enough to be understood):
Main.$wf :: GHC.Prim.Int# -> GHC.Prim.Int#
Main.$wf = \ (ww_s4un :: GHC.Prim.Int#) ->
  case ww_s4un of ds_X210 {
    __DEFAULT ->
      case Main.$wf (GHC.Prim.-# ds_X210 1#) of ww1_s4ur { __DEFAULT ->
      GHC.Prim.+# ds_X210 ww1_s4ur
      };
    0# -> 0#
  }
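The Core above is the "worker" half of GHC's worker/wrapper transformation. As a rough illustration (not GHC's actual output), the same split can be written by hand: wf plays the role of Main.$wf and works only on Int#, while fWrapped unboxes, calls the worker, and boxes the result. wf and fWrapped are hypothetical names.

```haskell
{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#), (-#))

-- The original boxed definition.
f :: Int -> Int
f 0 = 0
f n = n + f (n - 1)

-- Worker: unboxed in, unboxed out, mirroring Main.$wf.
wf :: Int# -> Int#
wf 0# = 0#
wf n = n +# wf (n -# 1#)

-- Wrapper: unbox the argument, run the worker, box the result once.
fWrapped :: Int -> Int
fWrapped (I# n) = I# (wf n)
```

Like f, both versions loop forever on negative input.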
Notice the number of arguments to go: go x r k = ... is the same as go x r = \k -> .... This is the standard trick to arrange for left-to-right information flow while folding the list (go is used as the reducer function, in foldr go (\_ -> n) ls 0#). Here, the information is the counting of [0..], made explicit as the initial k=0 and the (k + 1) on each step. (k is an unfortunate naming choice; i seems better, since k is overloaded with the irrelevant meanings "constant" and "continuation", and not just "counter", which was probably the intended meaning here.)
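The same trick in isolation: foldr builds a function that is still waiting for the running counter, and the fold's result is applied to the starting value. indexed and foldlViaFoldr are illustrative names; the second shows that a left fold can be expressed this way too.

```haskell
-- Number elements left to right with foldr: each step receives the rest
-- of the fold (r) as a function of the next index.
indexed :: [a] -> [(Int, a)]
indexed xs = foldr step (\_ -> []) xs 0
  where
    step x r k = (k, x) : r (k + 1)

-- The accumulator of a left fold, threaded the same way.
foldlViaFoldr :: (b -> a -> b) -> b -> [a] -> b
foldlViaFoldr f z xs = foldr (\x r acc -> r (f acc x)) id xs z
```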
The foldr/build (sic) fusion (linked to by luqui in the comments) turns foldr c n $ findIndices p [a1,a2,...,an] into a loop, exposing the inner foldr of the findIndices definition and avoiding building the actual list structure of the result of the findIndices call:
build g = g (:) []
foldr c n $ build g = g c n

foldr c n $ findIndices p [a1,a2,...,an]
  =
foldr c n $ build g where {g c n = ...}
  =
g c n where {g c n = ...}
  =
foldr go (\_ -> n) [a1,a2,...,an] 0# where {go x r k = ...}
  =
go a1 (foldr go (\_ -> n) [a2,...,an]) 0#
  =
let { x=a1, r=foldr go (\_ -> n) [a2,...,an], k=0# }
in
  if | p x       -> c (I# k) (r (k +# 1#))   -- no 'cons' (`:`), only 'c'
     | otherwise -> r (k +# 1#)
  =
....
So you see, it's a standard trick to have foldr define a function which expects one more input argument, to arrange the left-to-right information flow while processing the input list.
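The equation foldr c n (build g) = g c n can be checked by hand with boxed Ints. GHC applies it automatically as a rewrite rule during optimisation; in this sketch the "fused" version is simply written out manually. myBuild, findIndicesG and the two sum functions are hypothetical names.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Same type as GHC.Exts.build.
myBuild :: (forall b. (a -> b -> b) -> b -> b) -> [a]
myBuild g = g (:) []

-- The "g" of findIndices, abstracted over the cons (c) and nil (n) it uses.
findIndicesG :: (a -> Bool) -> [a] -> (Int -> b -> b) -> b -> b
findIndicesG p ls c n = foldr go (\_ -> n) ls 0
  where
    go x r k
      | p x = c k (r (k + 1))
      | otherwise = r (k + 1)

-- Unfused: materialise the list of indices, then fold it.
sumIndicesUnfused :: (a -> Bool) -> [a] -> Int
sumIndicesUnfused p ls = foldr (+) 0 (myBuild (findIndicesG p ls))

-- Fused: foldr c n (build g) = g c n, so no intermediate list is built.
sumIndicesFused :: (a -> Bool) -> [a] -> Int
sumIndicesFused p ls = findIndicesG p ls (+) 0
```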
Everything with a hash sign is a "primitive", closer-to-machine-level entity: Int# is the machine integer type, I# is the constructor that boxes an Int# into an Int, 0# is a machine-level 0, +# is machine-level addition, and so on. This may not be exactly correct, but it should be close enough.
foldr/build fusion seems to be a particular case of transducer-based code transformation, which relies on the fact that nested folds can be fused by composing their reducers' transformers (aka transducers), as in
foldr c n $
foldr (tr2 c2) n2 $
foldr (tr3 c3) n3 xs
=
foldr (tr2 c) n $ -- fold "replaces" the constructor nodes with its reducer
foldr (tr3 c3) n3 xs -- so just use the outer reducer in the first place!
=
foldr (tr3 (tr2 c)) n xs
=
foldr ((tr3 . tr2) c) n xs
and, for a given g, g === foldr . tr for some appropriate choice of tr (with the input list left implicit), so that
foldr c n (build g) = g c n = (foldr . tr) c n = foldr (tr c) n
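A small sketch of the transducer view with two illustrative transducers, mapT and filterT (hypothetical names). A reducer here is a foldr-style step function, and a transducer maps one reducer to another. The nested version runs three folds; the fused version runs one fold with the composed reducer, matching the (tr3 . tr2) c step of the derivation.

```haskell
-- A foldr-style reducer.
type Reducer a r = a -> r -> r

-- map, as a reducer transformer: map f = foldr (mapT f (:)) []
mapT :: (a -> b) -> Reducer b r -> Reducer a r
mapT f c x r = c (f x) r

-- filter, as a reducer transformer: filter p = foldr (filterT p (:)) []
filterT :: (a -> Bool) -> Reducer a r -> Reducer a r
filterT p c x r = if p x then c x r else r

-- Three nested folds: filter, then map, then sum.
nested :: [Int] -> Int
nested xs = foldr (+) 0 (foldr (mapT (* 2) (:)) [] (foldr (filterT even (:)) [] xs))

-- One fold with the composed transducers applied to the outer reducer.
fused :: [Int] -> Int
fused = foldr ((filterT even . mapT (* 2)) (+)) 0
```

Note that the inner folds must use (:) and [] for this to hold: the outer fold then replaces exactly those constructors.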
As for USE_REPORT_PRELUDE, again, I can't say this with any authority, but I always assumed that it is the compilation flag which is enabled when the mock definitions from the Haskell Report are used as actual code, even though they were intended as an executable specification.
How it works: given a set of tuples (id, x, y), find the minimum and maximum of x and y; from these, two dots (the red points) are created. The tuples are then split into two groups based on their distance to the red dots.
No group may contain more than 5 dots. If a group does, it has to be split again. I've managed to write the recursion for the first phase, but I have no idea how to do it for the second phase. The second phase should look like this:
Based on these two groups, again find the minimum and maximum of x and y (for each group); from these, four dots (red points) are created. Within each group, the tuples are again split into two groups based on their distance to the red dots.
getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID (id, _, _) = id
idxy = [(id, x, y)]
createCluster id cs = [(id, minX, minY),(id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
  where minX = minimum $ map (\(_,x,_,_) -> x) cs
        maxX = maximum $ map (\(_,x,_,_) -> x) cs
        minY = minimum $ map (\(_,_,y,_) -> y) cs
        maxY = maximum $ map (\(_,_,y,_) -> y) cs
idCluster = [1]
cluster = createCluster (last idCluster) idxy
clusterThis (id,a,b) = case (a,b) of
  j | getDistance (a,b) (cluster!!0) < getDistance (a,b) (cluster!!1) &&
    -> (getTheClusterID (cluster!!0), a, b)
  j | getDistance (a,b) (cluster!!1) < getDistance (a,b) (cluster!!0) &&
    -> (getTheClusterID (cluster!!1), a, b)
  _ -> (getTheClusterID (cluster!!0), a, b)
groupAll = map clusterThis idxy
I am moving from imperative to functional programming. Sorry if my way of thinking is still imperative; I'm still learning.
EDIT:
To clarify, this is what the original data looks like: [image not preserved]
The basic principle to follow in writing such an algorithm is to write small, compositional programs; each program is then easy to reason about and test in isolation, and the final program can be written in terms of the smaller ones.
The algorithm can be summarized as follows:
Compute the points which bound the set of points.
Split the points into two clusters: one containing the points closer to the minimum point, the other containing all other points (equivalently, the points closer to the maximum point).
If any cluster contains more than 5 points, repeat the process on that cluster.
The presence of a 'repeat the process' step indicates this to be a divide and conquer problem.
I see no need for an ID for each point, so I've dispensed with this.
To begin, define datatypes for each type of data you will be working with:
import Data.List (partition)
data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }
This may seem silly for such simple data, but it can potentially save you quite a bit of confusion during debugging. Also note the import of a function we will be using later.
The 1st step:
minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps =
  (Point minX minY
  ,Point maxX maxY)
  where minX = minimum $ map ptX ps
        maxX = maximum $ map ptX ps
        minY = minimum $ map ptY ps
        maxY = maximum $ map ptY ps
This is essentially the same as your createCluster function.
The 2nd step:
pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
cluster1 :: [Point] -> [Cluster]
cluster1 ps =
  let (mn, mx) = minMaxPoints ps
      (psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
  in [ Cluster psmn, Cluster psmx ]
This function should be clear: it is a direct translation of the above statement of this step into code. The partition function takes a predicate and a list and produces two lists, the first containing all elements for which the predicate is true, and the second all elements for which it is false. pointDistance is essentially the same as your getDistance function.
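To see what partition does, here is a hand-written equivalent (myPartition is a hypothetical name, not the library's source verbatim). It makes a single pass and keeps the original relative order of the elements in both halves.

```haskell
import Data.List (partition)

-- One pass over the list; the lazy pattern ~(ts, fs) mirrors how
-- Data.List.partition stays lazy in the rest of the list.
myPartition :: (a -> Bool) -> [a] -> ([a], [a])
myPartition p = foldr select ([], [])
  where
    select x ~(ts, fs)
      | p x = (x : ts, fs)
      | otherwise = (ts, x : fs)
```

For example, both partition even [1 .. 10] and myPartition even [1 .. 10] give ([2,4,6,8,10],[1,3,5,7,9]).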
The 3rd step:
cluster :: [Point] -> [Cluster]
cluster ps =
  cluster1 ps >>= \cl@(Cluster c) ->
    if length c > 5
      then cluster c
      else [cl]
This also implements the statement above very directly. Perhaps the only confusing part is the use of >>=, which (here) has type [a] -> (a -> [b]) -> [b]; it simply applies the given function to each element of the given list and concatenates the results (equivalently, it is flip concatMap).
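A small illustration of list >>= on its own: a one-level, non-recursive analogue of cluster that splits any sublist longer than 2 and leaves the others alone. splitBig and splitBig' are hypothetical names; the second is the same function written with concatMap.

```haskell
-- (>>=) on lists: apply the function to every element, concatenate the results.
splitBig :: [[Int]] -> [[Int]]
splitBig xss = xss >>= \xs -> if length xs > 2 then [take 2 xs, drop 2 xs] else [xs]

-- The same thing with concatMap.
splitBig' :: [[Int]] -> [[Int]]
splitBig' = concatMap (\xs -> if length xs > 2 then [take 2 xs, drop 2 xs] else [xs])
```

For instance, splitBig [[1,2,3],[4],[5,6,7,8]] gives [[1,2],[3],[4],[5,6],[7,8]].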
Finally your test case (which I hope I've translated correctly from pictures to Haskell data):
testPts :: [Point]
testPts = map (uncurry Point)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts
Running this program produces
[(0.0,0.0),(1.0,0.0),(2.0,1.0),(0.0,2.0)]
[(5.0,2.0),(5.0,4.0),(4.0,3.0),(4.0,4.0)]
[(8.0,2.0),(9.0,3.0),(10.0,2.0)]
[(11.0,4.0),(12.0,3.0),(13.0,3.0),(13.0,5.0)]
Functional programmers love recursion, yet they go to great lengths to avoid writing it. Jeez, people, make up your minds!
I like to structure my code, to the extent possible, using common, well-understood combinators. I want to demonstrate a style of Haskell programming which leans heavily on standard tools to implement the boring parts of a program (mapping, zipping, looping) as tersely and generically as possible, freeing you up to focus on the problem at hand.
So don't worry if you don't understand everything here. I just want to show you what's possible! (And please ask if you have questions!)
Vectors
First things first: we're working with two-dimensional space, so we'll need two-dimensional vectors and some secondary school vector algebra to work with them.
I'm going to parameterise my vector by the scalar on which our vector space is built. This'll allow me to work with standard type classes like Functor, so I can delegate a lot of the work of building a vector algebra to the machine. I've turned on DeriveFunctor and DeriveFoldable, which allow me to utter the magic words deriving (Functor, Foldable).
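{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Control.Applicative (liftA2)
import Data.List (partition)

data Pair a = Pair {
    px :: a,
    py :: a
} deriving (Show, Functor, Foldable)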
Hereafter I'm going to avoid working explicitly with Pair, and program to an interface, not an implementation. This'll allow me to build a simple linear algebra library in a manner that's independent of the dimensionality of the vector space. I'll give example type signatures in terms of V2:
type V2 = Pair Double
Scalar multiplication: functors
A vector space is required to have two operations: scalar multiplication and vector addition. Scalar multiplication means multiplying each component of a vector by a constant scalar. If you view a vector as a container of components, it should be clear that this means "do the same thing to every element in a container" - that is, it's a mapping operation. That's what Functor is for.
-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f
Vector addition: zippy applicatives
Vector addition involves adding up the components of a vector point-wise. Thinking of a vector as a container of components, addition is a zipping operation - match up each element of the two vectors and add them up.
Applicative functors are functors with an additional "apply" operation. Thinking of a functor f as a container, Applicative's <*> :: f (a -> b) -> f a -> f b gives you a way to take a container of functions and apply it to a container of values to get a new container of values. It should be clear that one way to make Pair into an Applicative is to use zipping to apply functions to values.
instance Applicative Pair where
pure x = Pair x x
Pair f g <*> Pair x y = Pair (f x) (g y)
(For another example of a zippy applicative, see this answer of mine.)
Now that we have a way to zip two pairs, we can leverage a bit of standard Applicative machinery to implement vector addition.
-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)
Vector subtraction, which gives you a way to find the distance between two points, is defined in terms of multiplication and addition.
-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u
Dot products: foldable containers
2D Euclidean space is actually a Hilbert space - a vector space equipped with a way to measure lengths and angles in the form of a dot product. To take the dot product of two vectors, you multiply the components together and then add up the results. Once more, we'll be using Applicative to multiply the components, but that just gives us another vector: how do we implement "adding up the results"?
Foldable is the class of containers which admit an "aggregation" operation foldr :: (a -> b -> b) -> b -> f a -> b. The Prelude's sum works for any Foldable container (it can be expressed as foldr (+) 0), so:
-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u
This gives us a way to find the absolute length of a vector: dot it with itself and take the square root.
-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v
So the distance between two points is the modulus of the difference of the vectors.
dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)
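Since none of these functions mention Pair, they should work for any zippy Foldable Applicative. As a sketch of that claim, here they are (repeated so the block compiles on its own) applied to ZipList from Control.Applicative, which has a zipping Applicative instance and a Foldable instance, giving 3-D vectors with no new instances. One caveat: ZipList silently truncates vectors of different lengths.

```haskell
import Control.Applicative (ZipList (..), liftA2)

mul :: (Functor f, Num n) => n -> f n -> f n
mul k = fmap (k *)

add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)

minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u

dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum (liftA2 (*) v u)

modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt (v `dot` v)

dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)

-- Two points in 3-D space, as ZipLists.
p3, q3 :: ZipList Double
p3 = ZipList [1, 2, 2]
q3 = ZipList [0, 0, 0]
```

Here dist p3 q3 is sqrt (1 + 4 + 4) = 3.0.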
N-ary zipping: traversable containers
An axis-aligned (hyper-)rectangle can be defined by just two points. We'll represent the bounding box of a set of points as a Pair of vectors pointing to opposite corners of the bounding box.
Given a collection of vectors of components, we can find the opposite corners of the bounding box by finding the maximum and minimum of each component across the collection. This requires us to zip up, or transpose, a collection of vectors of components into a vector of collections of components. For this I'll use Traversable's sequenceA.
-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
let components = sequenceA vs
in Pair (minimum <$> components) (maximum <$> components)
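To see the transposition that sequenceA performs, here is a sketch using ZipList (a zippy Applicative from base) in place of Pair. boundingBoxT is a hypothetical variant of boundingBox that returns a tuple instead of a Pair, so the block needs no Pair type. Note that with ZipList, sequenceA of an empty list is an infinite ZipList of empty lists, so this assumes at least one point.

```haskell
import Control.Applicative (ZipList (..))

-- A list of points becomes a vector of per-coordinate lists.
components :: [ZipList Int] -> ZipList [Int]
components = sequenceA

-- Opposite corners of the axis-aligned bounding box.
boundingBoxT :: (Traversable t, Applicative f, Ord n) => t (f n) -> (f n, f n)
boundingBoxT vs =
  let cs = sequenceA vs
  in (minimum <$> cs, maximum <$> cs)
```

For example, components [ZipList [1,2], ZipList [3,4]] is ZipList [[1,3],[2,4]].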
Clustering
Now that we have a library for working with vectors, we can get down to the meaty part of the algorithm: dividing sets of points into clusters.
Partitioning
Let me rephrase the specification of the inner loop of your algorithm. You want to partition a set of points based on whether they're closer to the bottom-left corner of the set's bounding box or to the top-right corner. That's what partition does.
We can write a function, whichCluster, which uses dist (built from minus and modulus) to decide this for a single point, and then use partition to apply it to the whole set.
type Cluster = []
-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
  let Pair bottomLeft topRight = boundingBox vs
      whichCluster v = dist v bottomLeft <= dist v topRight
      (g1, g2) = partition whichCluster vs
  in [g1, g2]
Repetition, repetition, repetition
Now we want to repeatedly cluster until we don't have any groups larger than 5. Here's the plan. We'll keep track of two sets of clusters, those which are small enough, and those which require further sub-clustering. I'll use partition to sort a list of clusters into those which are small enough and those which need subclustering. I'll use the list monad's >>= :: [a] -> (a -> [b]) -> [b] (here [Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]), which maps a function over a list and flattens the result, to implement the notion of subclustering. And I'll use until to repeatedly subcluster until the set of remaining too-large clusters is empty.
-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
  where
    smallEnough xs = length xs <= maxSize
    splitLarge (small, remaining) =
      let (newSmall, large) = partition smallEnough remaining
      in (small ++ newSmall, large >>= cluster)
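The until/partition/>>= pattern does not depend on geometry. As a sketch, here is the same loop with a hypothetical splitter, halve, standing in for cluster: it cuts lists in half until no chunk is longer than maxSize. It assumes maxSize >= 1; otherwise a singleton would be split into [] and itself forever.

```haskell
import Data.List (partition)

-- Hypothetical stand-in for `cluster`: cut a list into two halves.
halve :: [a] -> [[a]]
halve xs = let (a, b) = splitAt (length xs `div` 2) xs in [a, b]

-- Same shape as smallClusters: (finished, still too large) pairs,
-- advanced by `until` until nothing is left to split.
smallChunks :: Int -> [a] -> [[a]]
smallChunks maxSize xs = fst $ until (null . snd) step ([], [xs])
  where
    smallEnough ys = length ys <= maxSize
    step (small, remaining) =
      let (newSmall, large) = partition smallEnough remaining
      in (small ++ newSmall, large >>= halve)
```

For example, smallChunks 2 [1 .. 5] gives [[1,2],[3],[4,5]].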
A quick test, cribbed from @user2407038's answer:
testPts :: [V2]
testPts = map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> smallClusters 5 testPts
[
[Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
[Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
[Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}],
[Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
]
There you go. Small clusters in n-dimensional space, all without a single recursive function.
Labelling
Part of the point of working with the Applicative and Foldable interfaces, rather than working with V2 directly, was so I could demonstrate the following little magic trick.
Your original code represented points as 3-tuples consisting of two Doubles for the location and an Int for the point's label, but my V2 has no label. Can we recover this? Well, since the code doesn't at any point mention any concrete types - just standard type classes - we can just build a new type for labelled vectors. As long as said type is a Foldable Applicative all of the above code will continue to work without modification!
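-- First comes from Data.Monoid: import Data.Monoid (First (..))
data Labelled m f a = Labelled m (f a) deriving (Show, Functor, Foldable)

-- Accessor for the label, used in the ghci test below.
lbl :: Labelled m f a -> m
lbl (Labelled m _) = m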
instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
pure = Labelled mempty . pure
Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)
The Monoid constraint is there because when combining actions you also need a way to combine their labels. I'm just going to use First - left-biased choice - because I'm not expecting the points' labels to be relevant to the zipping operations like modulus and boundingBox.
type LabelledV2 = Labelled (First Int) Pair Double
testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
[ (0,0), (1,0), (2,1), (0,2)
, (5,2), (5,4), (4,3), (4,4)
, (8,2), (9,3), (10,2)
, (11,4), (12,3), (13,3), (13,5) ]
ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]] -- try reordering testPts
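Why First is a safe choice of label monoid: it keeps the leftmost Just and ignores the rest, and its mempty is First Nothing. So when operations such as sequenceA inside boundingBox combine the labels of many points, one label survives rather than the combination failing. combined and allLabels are illustrative names.

```haskell
import Data.Monoid (First (..))

-- The leftmost Just wins; First Nothing is the identity.
combined :: First Int
combined = First Nothing <> First (Just 1) <> First (Just 2)

-- Combining a whole list of labels keeps only the first one.
allLabels :: First Int
allLabels = mconcat (map (First . Just) [7, 8, 9])
```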
I'm loading an RGB image from disk with JuicyPixels-repa. Unfortunately, the array representation of the image is Array F DIM3 Word8, where the inner dimension holds the RGB components of each pixel. That's a bit incompatible with existing repa image-processing algorithms, where an RGB image is Array U DIM2 (Word8, Word8, Word8).
I want to calculate the RGB histograms of the image, so I'm looking for a function with the signature:
type Hist = Array U DIM1 Int
histogram:: Array F DIM3 Word8 -> (Hist, Hist, Hist)
How can I fold my 3D array to get a 1D array for each color channel?
Edit:
The main problem is not converting from DIM3 to DIM2 for each channel (that's easily done with slicing). The problem is that I have to iterate over the source image (DIM2 or DIM3) and accumulate into a DIM1 array of a different shape (Z:.256) and extent.
So I can't use repa's foldS, as it reduces the dimension by one but keeps the same extent.
I also experimented with traverse, but it iterates over the extent of the destination array and provides a function to get pixels from the source image. That would lead to very inefficient code that counts the same pixels again for each color value.
A good way would be a simple fold over a Vector with the histogram type as accumulator, but unfortunately I don't have a U (unboxed) or V (vector) based array from which I can efficiently get a Vector. I have an Array F (foreign pointer).
Ok, I found a few minutes. Below, I cover four solutions and have made the worst solutions (the middle two, involving O(n) data conversion) really easy for you.
Let's Acknowledge the Dumb Solution
It's reasonable to start with the obvious. You could use Data.List.foldl to traverse the rows and columns, building up your histograms from initial zero arrays (untested/partial code follows):
foldl (\(histR, histG, histB) (row,col) ->
         let r = arr ! (Z:.row:.col:.0)
             g = arr ! (Z:.row:.col:.1)
             b = arr ! (Z:.row:.col:.2)
         in (incElem r histR, incElem g histG, incElem b histB))
      (zero,zero,zero)
      [ (row,col) | row <- [0..nrRow-1], col <- [0..nrCol-1] ]
...
  where (Z:.nrRow:.nrCol:._) = extent arr
I'm not sure how efficient this will be, but I suspect that it will do too much bounds checking. Switching to unsafeIndex should perform reasonably, provided the delayed histogram arrays (hist*) behave well under whatever implementation you pick for incElem.
You Can Build the Array You Want
Using traverse you can actually convert JP-Repa style arrays into DIM2 arrays with tuples for elements:
import Data.Array.Repa (Z (..), (:.) (..), DIM2)
import qualified Data.Array.Repa as R

main = do
  let arr = R.fromFunction (Z:.a:.b:.c) (\(Z:.i:.j:.k) -> i+j-k)
      a = 4 :: Int
      b = 4 :: Int
      c = 4 :: Int
      new = R.traverse arr
                       (\(Z:.r:.c:._) -> Z:.r:.c)  -- the extent
                       (\l idx -> (l (idx:.0)
                                  ,l (idx:.1)
                                  ,l (idx:.2)))
  print (R.computeS new :: R.Array R.U DIM2 (Int,Int,Int))
Could you point me to the body of code you talked about that uses this format? It would be simple to patch JP-Repa to include a function of this type.
You can build the Unboxed Vector You Mentioned
You mentioned an easy solution is to fold over unboxed vectors, but lamented that JP-repa doesn't provide an unboxed array. Luckily, conversion is simple:
toUnboxed :: Img a -> VU.Vector Word8
toUnboxed = R.toUnboxed . R.computeUnboxedS . R.delay . imgData
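Once the pixels are available as one flat sequence, one possible way to build the histograms is accumArray from Data.Array (the array package). This is a different technique from a hand-written fold, swapped in for illustration. The sketch assumes the vector has been turned into a list (for example with Data.Vector.Unboxed.toList) and that the channels are interleaved per pixel (3 for RGB, 4 for RGBA). channelHist, rgbHistograms and count are hypothetical names.

```haskell
import Data.Array (Array, accumArray, (!))
import Data.Word (Word8)

-- Counts per intensity value 0..255.
type Hist = Array Int Int

-- Histogram of channel ch, assuming interleaved data (R, G, B, R, G, B, ...)
-- with nChannels values per pixel.
channelHist :: Int -> Int -> [Word8] -> Hist
channelHist nChannels ch xs =
  accumArray (+) 0 (0, 255)
    [ (fromIntegral w, 1) | (i, w) <- zip [0 ..] xs, i `mod` nChannels == ch ]

rgbHistograms :: [Word8] -> (Hist, Hist, Hist)
rgbHistograms xs = (channelHist 3 0 xs, channelHist 3 1 xs, channelHist 3 2 xs)

-- How often a given intensity occurs.
count :: Hist -> Word8 -> Int
count h w = h ! fromIntegral w
```

This walks the data once per channel; for large images a single pass into mutable arrays would likely be faster.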
We Could Patch Repa
This is really only a problem because Repa doesn't have what I consider a normal traverse function. Repa's traverse is more of an array construction that happens to provide an indexing function into another array. We want traverse in the form:
newTraverse :: Array r sh e -> a -> (a -> sh -> e -> a) -> a
but of course this is actually just a malformed fold. So let's rename it and reorder the arguments:
foldAllIdxS :: (sh -> a -> e -> a) -> a -> Array r sh e -> a
which contrasts nicely with the (preexisting) foldAllS operation:
foldAllS :: (a -> a -> a) -> a -> Array r sh a -> a
Notice that our new fold has two critical characteristics. First, the result type is not required to match the element type, so we could start with a tuple of histograms. Second, our version of fold passes the index, which allows you to select which histogram in the tuple to update (if any).
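To show what such a fold buys you, here is a sketch of the same signature over Data.Array (array package) rather than Repa. foldAllIdx and channelHistograms are hypothetical names, histograms are kept in Data.IntMap.Strict (containers package), and the index layout (row, col, channel) with channels 0/1/2 = R/G/B is an assumption.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Array (Array, Ix, assocs, listArray)
import qualified Data.IntMap.Strict as IM
import Data.List (foldl')
import Data.Word (Word8)

-- Hypothetical analogue of foldAllIdxS: a strict left fold that also
-- passes each element's index; the result type is independent of e.
foldAllIdx :: Ix i => (i -> a -> e -> a) -> a -> Array i e -> a
foldAllIdx f z arr = foldl' (\acc (i, e) -> f i acc e) z (assocs arr)

type Hist = IM.IntMap Int

-- The channel component of the index picks which histogram to update.
channelHistograms :: Array (Int, Int, Int) Word8 -> (Hist, Hist, Hist)
channelHistograms = foldAllIdx step (IM.empty, IM.empty, IM.empty)
  where
    bump w = IM.insertWith (+) (fromIntegral w) 1
    -- Bang patterns force the previous maps, so no thunk chain builds up.
    step (_, _, ch) (!hr, !hg, !hb) w = case ch of
      0 -> (bump w hr, hg, hb)
      1 -> (hr, bump w hg, hb)
      2 -> (hr, hg, bump w hb)
      _ -> (hr, hg, hb) -- ignore extra channels such as alpha
```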
You can lazily use the latest JuicyPixels-Repa
To acquire your preferred Repa array format, or to acquire an unboxed vector, you can just use the newly uploaded JuicyPixels-Repa-0.6.
someImg <- readImage path :: IO (Either String (Img RGBA))
let img = either (error "Blah") id someImg
    uvec = toUnboxed img
    tupleArr = collapseColorChannel img
Now you can fold over the vector or use the tuple array directly, as you originally desired.
I also took an ugly stab at fleshing out the first, horribly naive, solution:
histograms :: Img a -> (Histogram, Histogram, Histogram, Histogram)
histograms (Img arr) =
  let (Z:.nrRow:.nrCol:._) = R.extent arr
      -- counts are Int: Word8 counters would wrap around after 255 hits
      zero = R.fromFunction (Z:.256) (\_ -> 0 :: Int)
      incElem idx x = RU.unsafeTraverse x id (\l i -> l i + if i==(Z:.fromIntegral idx) then 1 else 0)
  in Prelude.foldl (\(hR, hG, hB, hA) (row,col) ->
                      let r = R.unsafeIndex arr (Z:.row:.col:.0)
                          g = R.unsafeIndex arr (Z:.row:.col:.1)
                          b = R.unsafeIndex arr (Z:.row:.col:.2)
                          a = R.unsafeIndex arr (Z:.row:.col:.3)
                      in (incElem r hR, incElem g hG, incElem b hB, incElem a hA))
                   (zero,zero,zero,zero)
                   [ (row,col) | row <- [0..nrRow-1], col <- [0..nrCol-1] ]
I'm too wary of the performance of this code (3 traversals per index... I must be tired) to throw it into JP-Repa, but if you find it works well then let me know.