I have a function ascArr :: String -> BigData for parsing some big strict data from a string and another one, altitude :: BigData -> Pt -> Maybe Double, for getting something useful from the parsed data. I want to parse the big data once and then use the altitude function with the first argument fixed and the second one varying. Here's the code (TupleSections are enabled):
exampleParseAsc :: IO ()
exampleParseAsc = do
asc <- readFile "foo.asc"
let arr = ascArr asc
print $ map (altitude arr . (, 45)) [15, 15.01 .. 16]
This is all ok. Then I want to connect the two functions together and to use partial application for caching the big data. I use three versions of the same function:
parseAsc3 :: String -> Pt -> Maybe Double
parseAsc3 str = altitude d
where d = ascArr str
parseAsc4 :: String -> Pt -> Maybe Double
parseAsc4 str pt = altitude d pt
where d = ascArr str
parseAsc5 :: String -> Pt -> Maybe Double
parseAsc5 = curry (uncurry altitude . first ascArr)
And I call them like this:
exampleParseAsc2 :: IO ()
exampleParseAsc2 = do
asc <- readFile "foo.asc"
let alt = parseAsc5 asc
print $ map (alt . (, 45)) [15, 15.01 .. 16]
Only the parseAsc3 works like in the exampleParseAsc: Memory usage rises at the beginning (when allocating memory for the UArray in the BigData), then it is constant while parsing, then altitude quickly evaluates the result and then everything is done and the memory is freed. The other two versions are different: The memory usage rises multiple times until all the memory is consumed, I think that the parsed big data is not cached inside the alt closure. Could someone explain the behaviour? Why are the versions 3 and 4 not equivalent? In fact I started with something like parseAsc2 function and just after hours of trial I found out the parseAsc3 solution. And I am not satisfied without knowing the reason...
Here you can see all my effort (only the parseAsc3 does not consume whole the memory; parseAsc is a bit different from the others - it uses parsec and it was really greedy for memory, I'd be glad if some one explained me why, but I think that the reason is different than the main point of this question, you may just skip it):
type Pt = (Double, Double)
type BigData = (UArray (Int, Int) Double, Double, Double, Double)
parseAsc :: String -> Pt -> Maybe Double
parseAsc str (x, y) =
case parse ascParse "" str of
Left err -> error "no parse"
Right (x1, y1, coef, m) ->
let bnds = bounds m
i = (round $ (x - x1) / coef, round $ (y - y1) / coef)
in if inRange bnds i then Just $ m ! i else Nothing
where
ascParse :: Parsec String () (Double, Double, Double, UArray (Int, Int) Double)
ascParse = do
[w, h] <- mapM ((read <$>) . keyValParse digit) ["ncols", "nrows"]
[x1, y1, coef] <- mapM ((read <$>) . keyValParse (digit <|> char '.'))
["xllcorner", "yllcorner", "cellsize"]
keyValParse anyChar "NODATA_value"
replicateM 6 $ manyTill anyChar newline
rows <- replicateM h . replicateM w
$ read <$> (spaces *> many1 digit)
return (x1, y1, coef, listArray ((0, 0), (w - 1, h - 1)) (concat rows))
keyValParse :: Parsec String () Char -> String -> Parsec String () String
keyValParse format key = string key *> spaces *> manyTill format newline
parseAsc2 :: String -> Pt -> Maybe Double
parseAsc2 str (x, y) = if all (inRange bnds) (is :: [(Int, Int)])
then Just $ (ff * (1 - px) + cf * px) * (1 - py)
+ (fc * (1 - px) + cc * px) * py
else Nothing
where (header, elevs) = splitAt 6 $ lines str
header' = map ((!! 1) . words) header
[w, h] = map read $ take 2 header'
[x1, y1, coef, _] = map read $ drop 2 header'
bnds = ((0, 0), (w - 1, h - 1))
arr :: UArray (Int, Int) Double
arr = listArray bnds (concatMap (map read . words) elevs)
i = [(x - x1) / coef, (y - y1) / coef]
[ixf, iyf, ixc, iyc] = [floor, ceiling] >>= (<$> i)
is = [(ix, iy) | ix <- [ixf, ixc], iy <- [iyf, iyc]]
[px, py] = map (snd . properFraction) i
[ff, cf, fc, cc] = map (arr !) is
ascArr :: String -> BigData
ascArr str = (listArray bnds (concatMap (map read . words) elevs), x1, y1, coef)
where (header, elevs) = splitAt 6 $ lines str
header' = map ((!! 1) . words) header
[w, h] = map read $ take 2 header'
[x1, y1, coef, _] = map read $ drop 2 header'
bnds = ((0, 0), (w - 1, h - 1))
altitude :: BigData -> Pt -> Maybe Double
altitude d (x, y) = if all (inRange bnds) (is :: [(Int, Int)])
then Just $ (ff * (1 - px) + cf * px) * (1 - py)
+ (fc * (1 - px) + cc * px) * py
else Nothing
where (arr, x1, y1, coef) = d
bnds = bounds arr
i = [(x - x1) / coef, (y - y1) / coef]
[ixf, iyf, ixc, iyc] = [floor, ceiling] >>= (<$> i)
is = [(ix, iy) | ix <- [ixf, ixc], iy <- [iyf, iyc]]
[px, py] = map (snd . properFraction) i
[ff, cf, fc, cc] = map (arr !) is
parseAsc3 :: String -> Pt -> Maybe Double
parseAsc3 str = altitude d
where d = ascArr str
parseAsc4 :: String -> Pt -> Maybe Double
parseAsc4 str pt = altitude d pt
where d = ascArr str
parseAsc5 :: String -> Pt -> Maybe Double
parseAsc5 = curry (uncurry altitude . first ascArr)
Compiled with GHC 7.10.3, with -O optimization.
Thank you.
You can figure out what's happening by looking at the generated core from GHC. The evaluation semantics of optimized core are very predictable (unlike Haskell itself) so it is often a useful tool for performance analysis.
I compiled your code with ghc -fforce-recomp -O2 -ddump-simpl file.hs with GHC 7.10.3. You can look at the full output yoursefl but I've extracted the relevant bits:
$wparseAsc2
$wparseAsc2 =
\ w_s8e1 ww_s8e5 ww1_s8e6 ->
let { ...
parseAsc2 =
\ w_s8e1 w1_s8e2 ->
case w1_s8e2 of _ { (ww1_s8e5, ww2_s8e6) ->
$wparseAsc2 w_s8e1 ww1_s8e5 ww2_s8e6
}
The code above looks a little funny but is essentially Haskell. Note that the first thing parseAsc2 does is force its second argument to be evaluated (the case statement evaluates the tuple, which corresponds to the pattern match) - but not the string. The string won't be touched until deep inside $wParseAsc2 (definition omitted). But the part of the function that computes the "parse" is inside the lambda - it will be recomputed for every invocation of the function. You don't even have to look at what it is - the rules for evaluating core expressions are very prescriptive.
$wparseAsc
$wparseAsc =
\ w_s8g9 ww_s8gg ww1_s8gi -> ...
parseAsc
parseAsc =
\ w_s8g9 w1_s8ga ->
case w1_s8ga of _ { (ww1_s8gd, ww2_s8gi) ->
case ww1_s8gd of _ { D# ww4_s8gg ->
$wparseAsc w_s8g9 ww4_s8gg ww2_s8gi
}
}
The situation with parseAsc has little to do with Parsec*. This is much like version two - now both arguments are evaluated, however. This has little effect, however, on the performance, because the same problem is there - $wparseAsc is just a lambda, meaning all the work it does is done at every invocation of the function. There can be no sharing.
parseAsc3 =
\ str_a228 ->
let {
w_s8c1
w_s8c1 =
case $wascArr str_a228
of _ { (# ww1_s8gm, ww2_s8gn, ww3_s8go, ww4_s8gp #) ->
(ww1_s8gm, ww2_s8gn, ww3_s8go, ww4_s8gp)
} } in
\ w1_s8c2 ->
case w1_s8c2 of _ { (ww1_s8c5, ww2_s8c6) ->
$waltitude w_s8c1 ww1_s8c5 ww2_s8c6
}
Here is the "good" version. It takes a string, applies $wascArr to it, and then the string is never used again. This is crucial - if this function is partially applied to a string, you are left with let w_s = .. in \w1 -> ... - none of this mentions the string, so it can be garbage collected. The long lived reference is to w_s which is your "big data". And note: even if a reference to the string was maintained, and it could not be garbage collected, this version would still be substantially better - simply because it does not recompute the "parse" at each invocation of the function. This is the critical flaw - the fact that the string can be garbage collected immediately is extra.
parseAsc4 =
\ str_a22a pt_a22b ->
case pt_a22b of _ { (ww1_s8c5, ww2_s8c6) ->
$waltitude (ascArr str_a22a) ww1_s8c5 ww2_s8c6
}
Same issue as version two. Unlike version three, if you partially apply this, you get \w1 -> altitude (ascArr ...) ..., so ascArr is recomputed for every invocation of the function. It doesn't matter how you use this function - it simply won't work the way you want.
parseAsc5 = parseAsc4
Amazingly (to me), GHC figures out that parseAsc5 is precisely the same as parseAsc4! Well this one should be obvious then.
As for why GHC generates this particular core for this code, it really isn't easy to tell. In many cases the only way to guarantee sharing is to have explicit sharing in your original code. GHC does not do common subexpression elimination - parseAsc3 implements manual sharing.
*Maybe the parser itself has some performance issues too, but that isn't the focus here. If you have question about your Parsec parser (performance wise, or otherwise) I encourage you to ask a separate question.
Related
I started this function as follows:
type Place = (String, Float, Float, [Int])
distanceList :: Float -> Float-> [Place] -> [Float]
distanceList _ _ [] = []
distanceList degN degE ((location, float1, float2, rainfall):place) =
sqrt((float1-degN)^2 + (float2-degE)^2) : distanceList degN degE place
minDistance :: Float -> Float-> Float -> Float -> [Float] -> Float
minDistance _ _ _ _ [] = 0
minDistance degN degE float1 float2 (x:xs)
| x < minDistance degN degE float1 float2 xs = x
| otherwise = minDistance degN degE float1 float2 xs
closestPlace :: Float -> Float -> [Place] -> String
closestPlace _ _ [] = " An error as occured "
closestPlace degN degE ((location, float1, float2, rainfall):place)
| rainfall !!0 == 0 && pythag == distance = location
| otherwise = closestPlace degN degE place
where
pythag = sqrt((float1-degN)^2 + (float2-degE)^2)
distance = minDistance degN degE float1 float2 (distanceList
degN degE place)
i have test data which is passed through the Place. and the two floats are coordinates north and east, i need to use Pythagoras theorem to return a single String of which location is closest.
The code above runs but when thee closestPlace function runs it never gets passed the first part which is the error message, i just need the function to compare the Pythagorus output of the two functions above to and print the string of the location which is closest to the coordinates inputted.
the code to run this is as follows:
demo :: Int -> IO ()
demo 7 = putStrLn (closestPlace 55.0 (-5.3) testData)
You're pretty much there. As far as calculating the distance goes, one observation can be made:
If the distance to point A is greater than the distance to point B, the square of distance to A will be greater than the square of the distance to B.
This means that instead of comparing distances (and computing a square root unnecessarily), we can compare squares of them, which are calculated like so:
dist2 x1 y1 x2 y2 = (x2-x1)^2 + (y2-y1)^2
However, you also want to extract some additional metadata. Two common approaches are appending the sorting key and stripping it after, or using the comparison function directly. Here, the second approach will work quite well:
closestPlace :: Float -> Float -> [Place] -> Place
closestPlace x y = minimumBy (comparing (dist2P x y))
I've added a helper here called dist2P that allows us to compare a pair of points to a place directly; this is convenient, because we can then use comparing to automatically apply that on a pair of points that is then used by minimumBy to find the closest point.
dist2P :: Float -> Float -> Place -> Float
dist2P x y (_, x', y', _) = dist x y x' y'
That helper then gets partially applied with the source point.
Thanks to minimumBy we're also able to eta-reduce closestPlace, which is always nice.
And finally, I've decided to change the signature of your function to return Place instead of String. Why? Well, if it turns out you need any other information from the place, you can easily access it with a getter like so:
placeName :: Place -> String
placeName (s, _, _, _) = s
And similarly, your original function would thus be implemented as a composition of the two:
closestPlaceName :: Float -> Float -> [Place] -> String
closestPlaceName x y = placeName . closestPlace x y
all.
While trying to solve some programming quiz:
https://www.hackerrank.com/challenges/missing-numbers
, I came across with space leak.
Main function is difference, which implements multi-set difference.
I've found out that List ':' and Triples (,,) kept on heaps
with -hT option profiling. However, only big lists are difference's
two arguments, and it shrinks as difference keeps on tail recursion.
But the memory consumed by lists keeps increasing as program runs.
Triples is ephemeral array structure, used for bookkeeping the count of multiset's each element. But the memory consumed by triples also
keeps increasing, and I cannot find out why.
Though I've browsed similar 'space leak' questions in stackoverflow,
I couldn't grasp the idea. Surely I have much to study.
I appreciate any comments. Thank you.
p.s) executable is compiled with -O2 switch.
$ ./difference -hT < input04.txt
Stack space overflow: current size 8388608 bytes.
$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.6.3
.
import Data.List
import Data.Array
-- array (non-zero-count, start-offset, array_data)
array_size=101
myindex :: Int -> Int -> Int
myindex key offset
| key >= offset = key - offset
| otherwise = key - offset + array_size
mylookup x (_,offset,arr) = arr ! idx
where idx = myindex x offset
addOrReplace :: Int -> Int -> (Int, Int, Array Int (Int,Int)) -> (Int, Int, Array Int (Int,Int))
addOrReplace key value (count,offset,arr) = (count', offset, arr // [(idx,(key,value))])
where idx = myindex key offset
(_,prev_value) = arr ! idx
count' = case (prev_value, value) of
(0,0) -> count
(0,_) -> count + 1
(_,0) -> count - 1
otherwise -> count
difference :: (Int,Int,Array Int (Int,Int)) -> [Int] -> [Int] -> [Int]
difference (count,offset,arr) [] []
| count == 0 = []
| otherwise = [ k | x <- [0..array_size-1], let (k,v) = (arr ! x), v /= 0]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = addOrReplace x (v + 1) m
difference m [] (y:ys) = difference new_m [] ys
where (_,v) = mylookup y m
new_m = if v == 0
then m
else addOrReplace y (v - 1) m
main = do
n <- readLn :: IO Int
pp <- getLine
m <- readLn :: IO Int
qq <- getLine
let p = map (read :: String->Int) . words $ pp
q = map (read :: String->Int) . words $ qq
startArray = (0,head q, array (0,100) [(i,(0,0)) | i <- [0..100]] )
putStrLn . unwords . map show . sort $ difference startArray q p
[EDIT]
I seq'ed value and Array thanks to Carl's advice.
I attach heap diagram.
[original heap profiling]
[]1
[after seq'ing value v]
difference m (x:xs) y = difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
[after seq'ing value v and Array]
difference m (x:xs) y = new_m `seq` difference new_m xs y
where (_,v) = mylookup x m
new_m = v `seq` addOrReplace x (v + 1) m
I see three main problems with this code.
First (and not the cause of the memory use, but definitely the cause of generally poor performance) Array is horrible for this use case. O(1) lookups are useless when updates are O(n).
Speaking of, the values being stored in the Array aren't forced while difference is looping over its first input. They are thunks containing pointers to an unevaluated lookup in the previous version of the array. You can ensure that the value is evaluated at the same time the array is updated, in a variety of ways. When difference loops over its second input, it does this accidentally, in fact, by comparing the value against 0.
Third, difference doesn't even force the evaluation of the new arrays being created while traversing its first argument. Nothing requires the old array to be evaluated during that portion of the loop.
Both of those latter issues need to be resolved to fix the space leak. The first issue doesn't cause a space leak, just much higher overheads than needed.
Again I'm requesting comments about how a given design should be implemented in Haskell.
Thanks in advance to everyone providing helpful comments. Also I hope this could be an aid to other Haskell novices like me, having a practical sample code.
This time, we have a polymorphic function doSampling (in module Samples) that takes a generic function f and
a list of reals (indexes) and returns a Samples (indexes, values=f(indexes)). We want implement doSampling only once, as it doesn't matter if is f is a Polynomial or a Sinus. For that,
we have introduced an interface Function, and have Polynomial and Sinus types implement it.
The following is the design being implemented:
Edit 1:
There is a debate on the Function interface (class in Haskell). It has been suggested it is not actually necessary, as doSampling may take a "nude" function (Double -> Double) instead.
But, how to do it, if you need some extra state within the nude function (coeffs for a polynomial, amp+freq+phase for a sinus?
Edit 2:
Very good answers by kosmikus and by Chris Taylor. Thanks.
A key idea in both: have
doSampling :: (Double -> Double) -> [Double] -> Samples
This is: it takes a function (Double -> Double) (instead of Function) and list and returns samples.
My intention was to keep the state of Polynomials and Sinuses. That is not regarded in Chris answer, but it is in kosmikus'. On the other hand, the weak point in kosmikus version could be how to extend its Function definition if you don't have access to the source code.
I would also point out:
Chris' idea of encapsulating a polynomial or a sinus into a function (Double -> Double) by means of a factory function mkPolynomial or mkSinus that generates (using currying?) the desired function taking the apropriate parameters. (Although you can't consult the parameters later).
kosmikous' idea of using value to transform (also using currying?) a Function into a (Double -> Double)
Both answers are worth reading as they have other little Haskell tricks to reduce and simplify code.
In sum
Chris answers does not support keeping the state of a Polynomial or of a Sinus
kosmikus answers is not extensible: adding new type of functions (Cosinus ...)
my answer (being verbose) does overcome the previous downsides, and it would allow (this not necessary for the problem) impose Function types to have more associated-functions apart of value (in the sense of how an java-interfaces work).
My own approach
main (usage)
import Polynomial
import Sinus
import Function
import Samples
-- ...............................................................
p1 = Polynomial [1, 0, 0.5] -- p(x) = 1 + 0.5x^2
s1 = Sinus 2 0.5 3 -- f(x) = 2 sin(0.5x + 3)
-- ...............................................................
-- sample p1 from 0 to 5
m1 = doSampling p1 [0, 0.5 .. 5]
m2 = doSampling s1 [0, 0.5 .. 5]
-- ...............................................................
-- main
-- ...............................................................
main = do
putStrLn "Hello"
print $ value p1 2
print $ value s1 (pi/2)
print $ pairs m1
print $ pairs m2
Function
module Function where
-- ...............................................................
-- "class type" : the types belonging to this family of types
-- must implement the following functions:
-- + value : takes a function and a real and returns a real
-- ...............................................................
class Function f where
value :: f -> Double -> Double
-- f is a type variable, this is:
-- f is a type of the Function "family" not an actual function
Samples
module Samples where
import Function
-- ...............................................................
-- Samples: new data type
-- This is the constructor and says it requieres
-- two list, one for the indexes (xs values) and another
-- for the values ( ys = f (xs) )
-- this constructor should not be used, instead use
-- the "factory" function: new_Samples that performs some checks
-- ...............................................................
data Samples = Samples { indexes :: [Double] , values :: [Double] }
deriving (Show)
-- ...............................................................
-- constructor: it checks lists are equal size, and indexes are sorted
new_Samples :: [Double] -> [Double] -> Samples
new_Samples ind val
| (length ind) /= (length val) = samplesVoid
| not $ isSorted ind = samplesVoid
| otherwise = Samples ind val
-- ...............................................................
-- sample a funcion
-- it takes a funcion f and a list of indexes and returns
-- a Samples calculating the values array as f(indexes)
doSampling :: (Function f) => f -> [Double] -> Samples
doSampling f ind = new_Samples ind vals
where
vals = [ value f x | x <- ind ]
-- ...............................................................
-- used as "error" in the construction
samplesVoid = Samples [] []
-- ...............................................................
size :: Samples -> Int
size samples = length (indexes samples)
-- ...............................................................
-- utility function to get a pair (index,value) out of a Samples
pairs :: Samples -> [(Double, Double)]
pairs samples = pairs' (indexes samples) (values samples)
pairs' :: [Double] -> [Double] -> [(Double, Double)]
pairs' [] [] = []
pairs' [i] [v] = [(i,v)]
pairs' (i:is) (v:vs) = (i,v) : pairs' is vs
-- ...............................................................
-- to check whether a list is sorted (<)
isSorted :: (Ord t) => [t] -> Bool
isSorted [] = True
isSorted [e] = True
isSorted (e1:(e2:tail))
| e1 < e2 = isSorted (e2:tail)
| otherwise = False
Sinus
module Sinus where
-- ...............................................................
import Function
-- ...............................................................
-- Sinus: new data type
-- This is the constructor and says it requieres
-- a three reals
-- ...............................................................
data Sinus = Sinus { amplitude :: Double, frequency :: Double, phase :: Double }
deriving (Show)
-- ...............................................................
-- we say that a Sinus is a Function (member of the class Function)
-- and then, how value is implemented
instance Function Sinus where
value s x = (amplitude s) * sin ( (frequency s)*x + (phase s))
Polynomial
module Polynomial where
-- ...............................................................
import Function
-- ...............................................................
-- Polynomial: new data type
-- This is the constructor and says it requieres
-- a list of coefficients
-- ...............................................................
data Polynomial = Polynomial { coeffs :: [Double] }
deriving (Show)
-- ...............................................................
degree :: Polynomial -> Int
degree p = length (coeffs p) - 1
-- ...............................................................
-- we say that a Polynomial is a Function (member of the class Function)
-- and then, how value is implemented
instance Function Polynomial where
value p x = value' (coeffs p) x 1
-- list of coeffs -> x -> pw (power of x) -> Double
value' :: [Double] -> Double -> Double -> Double
value' (c:[]) _ pw = c * pw
value' (c:cs) x pw = (c * pw) + (value' cs x x*pw)
You certainly don't need the Function class. All this heavyweight class, instance, member variable fluff is one of the things that Haskell is designed to avoid. Pure functions can be much more flexible.
Here's a simple way of doing what you want.
type Sample = ([Double], [Double])
newSample xs vs
| isSorted xs && length xs == length vs = (indices, values)
| otherwise = ([], [])
pairs = uncurry zip
doSampling :: (Double -> Double) -> [Double] -> Sample
doSampling f xs = newSample xs (map f xs)
mkPolynomial :: [Double] -> (Double -> Double)
mkPolynomial coefs x = go coefs
where
go [] = 0
go (c:cs) = c + x * go cs
mkSinus :: Double -> Double -> Double -> (Double -> Double)
mkSinus amp freq phase x = amp * sin (freq * x + phase)
p1 = mkPolynomial [1, 0, 0.5] -- 1 + 0.5x^2
s1 = mkSinus 2 0.5 3 -- 2 sin(0.5x + 3)
m1 = doSampling p1 [0, 0.5 .. 5]
m2 = doSampling s1 [0, 0.5 .. 5]
main :: IO ()
main = do
print $ p1 2
print $ s1 (pi/2)
print $ pairs m1
print $ pairs m2
[Expanded my comment on request.]
I'd probably do this roughly as follows:
import Data.Functor
-- Use a datatype rather than a class. Yes, this makes it harder to
-- add new types of functions later, and in turn easier to define new
-- operations. ("expression problem")
data Function =
Sinus { amplitude :: Double, frequency :: Double, phase :: Double }
| Polynomial { coeffs :: [Double] }
deriving (Show)
-- Interpreting a Function as an actual function.
value :: Function -> (Double -> Double)
value (Sinus amp freq ph) x = amp * sin (freq * x + ph)
value (Polynomial cs) x = value' cs x
-- Rewrite value' to not require non-empty lists. This can also be
-- nicely written as a fold.
value' :: [Double] -> Double -> Double
value' [] _ = 0
value' (c:cs) x = c + x * value' cs x
data Samples = Samples { indexes :: [Double] , values :: [Double] }
deriving (Show)
-- Use Maybe to detect error conditions, instead of strange values
-- such as voidSamples.
newSamples :: [Double] -> [Double] -> Maybe Samples
newSamples ind val
| length ind /= length val = Nothing
| not $ isSorted ind = Nothing
| otherwise = Just (Samples ind val)
doSampling :: (Double -> Double) -> [Double] -> Maybe Samples
doSampling f ind = newSamples ind (map f ind)
isSorted :: (Ord t) => [t] -> Bool
isSorted [] = True
isSorted [e] = True
isSorted (e1:e2:es)
| e1 < e2 = isSorted (e2:es)
| otherwise = False
-- This is just zip.
pairs :: Samples -> [(Double, Double)]
pairs (Samples idxs vals) = zip idxs vals
p1 = Polynomial [1, 0, 0.5] -- p(x) = 1 + 0.5x^2
s1 = Sinus 2 0.5 3 -- f(x) = 2 sin(0.5x + 3)
m1 = doSampling (value p1) [0, 0.5 .. 5]
m2 = doSampling (value s1) [0, 0.5 .. 5]
-- The <$> maps over a Maybe.
main = do
putStrLn "Hello"
print $ value p1 2
print $ value s1 (pi/2)
print $ pairs <$> m1
print $ pairs <$> m2
I'm trying to write a Haskell program to parse huge text file (about 14Gb), but i can't understand how to make it free unused data from memory or not to make stack overflow during foldr. Here is the program source:
import qualified Data.ByteString.Lazy.Char8 as LBS
import qualified Data.ByteString.Lex.Lazy.Double as BD
import System.Environment
data Vertex =
Vertex{
vertexX :: Double,
vertexY :: Double,
vertexZ :: Double}
deriving (Eq, Show, Read)
data Extent =
Extent{
extentMax :: Vertex,
extentMin :: Vertex}
deriving (Eq, Show, Read)
addToExtent :: Extent -> Vertex -> Extent
addToExtent ext vert = Extent vertMax vertMin where
(vertMin, vertMax) = (makeCmpVert max (extentMax ext) vert, makeCmpVert min (extentMin ext) vert) where
makeCmpVert f v1 v2 = Vertex(f (vertexX v1) (vertexX v2))
(f (vertexY v1) (vertexY v2))
(f (vertexZ v1) (vertexZ v2))
readCoord :: LBS.ByteString -> Double
readCoord l = case BD.readDouble l of
Nothing -> 0
Just (value, _) -> value
readCoords :: LBS.ByteString -> [Double]
readCoords l | LBS.length l == 0 = []
| otherwise = let coordWords = LBS.split ' ' l
in map readCoord coordWords
parseLine :: LBS.ByteString -> Vertex
parseLine line = Vertex (head coords) (coords!!1) (coords!!2) where
coords = readCoords line
processLines :: [LBS.ByteString] -> Extent -> Extent
processLines strs ext = foldr (\x y -> addToExtent y (parseLine x)) ext strs
processFile :: String -> IO()
processFile name = do
putStrLn name
content <- LBS.readFile name
let (countLine:recordsLines) = LBS.lines content
case LBS.readInt countLine of
Nothing -> putStrLn "Can't read records count"
Just (recordsCount, _) -> do
print recordsCount
let vert = parseLine (head recordsLines)
let ext = Extent vert vert
print $ processLines recordsLines ext
main :: IO()
main = do
args <- getArgs
case args of
[] -> do
putStrLn "Missing file path"
xs -> do
processFile (head xs)
return()
Text file contains lines with three floating point numbers delimited with space character. This program always tries to occupy all free memory on a computer and crashes with out of memory error.
You are being too lazy. Vertex and Extent have non-strict fields, and all your functions returning a Vertex return
Vertex thunk1 thunk2
without forcing the components to be evaluated. Also addToExtent directly returns an
Extent thunk1 thunk2
without evaluating the components.
Thus none of the ByteStrings actually is released early to be garbage-collected, since the Doubles are not parsed from them yet.
When that is fixed by making the fields of Vertex and Extent strict - or the functions returning a Vertex resp. Extent forcing all parts of their input, you have the problem that
processLines strs ext = foldr (\x y -> addToExtent y (parseLine x)) ext strs
can't start assembling the result before the end of the list of lines is reached because then
(\x y -> addToExtent y (parseLine x))
is strict in its second argument.
However, barring NaNs and undefined values, if I didn't miss something, the result would be the same if you use a (strict!) left fold, so
processLines strs ext = foldl' (\x y -> addToExtent x (parseLine y)) ext strs
should produce the desired result without holding on to the data if Vertex and Extent get strict fields.
Ah, I did miss something:
addToExtent ext vert = Extent vertMax vertMin
where
(vertMin, vertMax) = (makeCmpVert max (extentMax ext) vert, makeCmpVert min (extentMin ext)
If that isn't a typo (what I expect it is), fixing that would be somewhat difficult.
I think it should be
(vertMax, vertMin) = ...
addToExtent is too lazy. A possible alternative definition is
addToExtent :: Extent -> Vertex -> Extent
addToExtent ext vert = vertMax `seq` vertMin `seq` Extent vertMax vertMin where
(vertMin, vertMax) = (makeCmpVert max (extentMax ext) vert, makeCmpVert min (extentMinext) vert) where
makeCmpVert f v1 v2 = Vertex(f (vertexX v1) (vertexX v2))
(f (vertexY v1) (vertexY v2))
(f (vertexZ v1) (vertexZ v2))
data Vertex =
Vertex{
vertexX :: {-# UNPACK #-} !Double,
vertexY :: {-# UNPACK #-} !Double,
vertexZ :: {-# UNPACK #-} !Double}
deriving (Eq, Show, Read)
The problem is that vertMin and vertMax are never evaluated until the entire file is processed - resulted in two huge thunks in Extent.
I also recommend changing the definition of Extent to
data Extent =
Extent{
extentMax :: !Vertex,
extentMin :: !Vertex}
deriving (Eq, Show, Read)
(though with these changes, the seq calls in addToExtent become redundant).
Here is my first Haskell program. What parts would you write in a better way?
-- Multiplication table
-- Returns n*n multiplication table in base b
import Text.Printf
import Data.List
import Data.Char
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b = foldl (++) (verticalHeader n b w) (map (line n b w) [0..n])
where
lo = 2* (logBase (fromIntegral b) (fromIntegral n))
w = 1+fromInteger (floor lo)
verticalHeader :: Int -> Int -> Int -> String
verticalHeader n b w = (foldl (++) tableHeader columnHeaders)
++ "\n"
++ minusSignLine
++ "\n"
where
tableHeader = replicate (w+2) ' '
columnHeaders = map (horizontalHeader b w) [0..n]
minusSignLine = concat ( replicate ((w+1)* (n+2)) "-" )
horizontalHeader :: Int -> Int -> Int -> String
horizontalHeader b w i = format i b w
line :: Int -> Int -> Int -> Int -> String
line n b w y = (foldl (++) ((format y b w) ++ "|" )
(map (element b w y) [0..n])) ++ "\n"
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
toBase :: Int -> Int -> [Int]
toBase b v = toBase' [] v where
toBase' a 0 = a
toBase' a v = toBase' (r:a) q where (q,r) = v `divMod` b
toAlphaDigits :: [Int] -> String
toAlphaDigits = map convert where
convert n | n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w = concat spaces ++ digits ++ " "
where
digits = if v == 0
then "0"
else toAlphaDigits ( toBase b v )
l = length digits
spaceCount = if (l > w) then 0 else (w-l)
spaces = replicate spaceCount " "
Here are some suggestions:
To make the tabularity of the computation more obvious, I would pass the list [0..n] to the line function rather than passing n.
I would further split out the computation of the horizontal and vertical axes so that they are passed as arguments to mulTable rather than computed there.
Haskell is higher-order, and almost none of the computation has to do with multiplication. So I would change the name of mulTable to binopTable and pass the actual multiplication in as a parameter.
Finally, the formatting of individual numbers is repetitious. Why not pass \x -> format x b w as a parameter, eliminating the need for b and w?
The net effect of the changes I am suggesting is that you build a general higher-order function for creating tables for binary operators. Its type becomes something like
binopTable :: (i -> String) -> (i -> i -> i) -> [i] -> [i] -> String
and you wind up with a much more reusable function—for example, Boolean truth tables should be a piece of cake.
Higher-order and reusable is the Haskell Way.
You don't use anything from import Text.Printf.
Stylistically, you use more parentheses than necessary. Haskellers tend to find code more readable when it's cleaned of extraneous stuff like that. Instead of something like h x = f (g x), write h = f . g.
Nothing here really requires Int; (Integral a) => a ought to do.
foldl (++) x xs == concat $ x : xs: I trust the built-in concat to work better than your implementation.
Also, you should prefer foldr when the function is lazy in its second argument, as (++) is – because Haskell is lazy, this reduces stack space (and also works on infinite lists).
Also, unwords and unlines are shortcuts for intercalate " " and concat . map (++ "\n") respectively, i.e. "join with spaces" and "join with newlines (plus trailing newline)"; you can replace a couple things by those.
Unless you use big numbers, w = length $ takeWhile (<= n) $ iterate (* b) 1 is probably faster. Or, in the case of a lazy programmer, let w = length $ toBase b n.
concat ( (replicate ((w+1)* (n+2)) "-" ) == replicate ((w+1) * (n+2)) '-' – not sure how you missed this one, you got it right just a couple lines up.
You do the same thing with concat spaces, too. However, wouldn't it be easier to actually use the Text.Printf import and write printf "%*s " w digits?
Norman Ramsey gave excellent high-level (design) suggestions; Below are some low-level ones:
First, consult with HLint. HLint is a friendly program that gives you rudimentary advice on how to improve your Haskell code!
In your case HLint gives 7 suggestions. (mostly about redundant brackets)
Modify your code according to HLint's suggestions until it likes what you feed it.
More HLint-like stuff:
concat (replicate i "-"). Why not replicate i '-'?
Consult with Hoogle whenever there is reason to believe that a function you need is already available in Haskell's libraries. Haskell comes with tons of useful functions so Hoogle should come in handy quite often.
Need to concatenate strings? Search for [String] -> String, and voila you found concat. Now go replace all those folds.
The previous search also suggested unlines. Actually, this even better suits your needs. It's magic!
Optional: pause and thank in your heart to Neil M for making Hoogle and HLint, and thank others for making other good stuff like Haskell, bridges, tennis balls, and sanitation.
Now, for every function that takes several arguments of the same type, make it clear which means what, by giving them descriptive names. This is better than comments, but you can still use both.
So
-- Returns n*n multiplication table in base b
mulTable :: Int -> Int -> String
mulTable n b =
becomes
mulTable :: Int -> Int -> String
mulTable size base =
To soften the extra characters blow of the previous suggestion: When a function is only used once, and is not very useful by itself, put it inside its caller's scope in its where clause, where it could use the callers' variables, saving you the need to pass everything to it.
So
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map (element b w y) [0 .. n]
element :: Int -> Int -> Int -> Int -> String
element b w y x = format (y * x) b w
becomes
line :: Int -> Int -> Int -> Int -> String
line n b w y =
concat
$ format y b w
: "|"
: map element [0 .. n]
where
element x = format (y * x) b w
You can even move line into mulTable's where clause; imho, you should.
If you find a where clause nested inside another where clause troubling, then I suggest to change your indentation habits. My recommendation is to use consistent indentation of always 2 or always 4 spaces. Then you can easily see, everywhere, where the where in the other where is at. ok
Below's what it looks like (with a few other changes in style):
import Data.List
import Data.Char
mulTable :: Int -> Int -> String
mulTable size base =
unlines $
[ vertHeaders
, minusSignsLine
] ++ map line [0 .. size]
where
vertHeaders =
concat
$ replicate (cellWidth + 2) ' '
: map horizontalHeader [0 .. size]
horizontalHeader i = format i base cellWidth
minusSignsLine = replicate ((cellWidth + 1) * (size + 2)) '-'
cellWidth = length $ toBase base (size * size)
line y =
concat
$ format y base cellWidth
: "|"
: map element [0 .. size]
where
element x = format (y * x) base cellWidth
toBase :: Integral i => i -> i -> [i]
toBase base
= reverse
. map (`mod` base)
. takeWhile (> 0)
. iterate (`div` base)
toAlphaDigit :: Int -> Char
toAlphaDigit n
| n < 10 = chr (n + ord '0')
| otherwise = chr (n + ord 'a' - 10)
format :: Int -> Int -> Int -> String
format v b w =
spaces ++ digits ++ " "
where
digits
| v == 0 = "0"
| otherwise = map toAlphaDigit (toBase b v)
spaces = replicate (w - length digits) ' '
0) add a main function :-) at least rudimentary
import System.Environment (getArgs)
import Control.Monad (liftM)
main :: IO ()
main = do
args <- liftM (map read) $ getArgs
case args of
(n:b:_) -> putStrLn $ mulTable n b
_ -> putStrLn "usage: nntable n base"
1) run ghc or runhaskell with -Wall; run through hlint.
While hlint doesn't suggest anything special here (only some redundant brackets), ghc will tell you that you don't actually need Text.Printf here...
2) try running it with base = 1 or base = 0 or base = -1
If you want multiline comments use:
{- A multiline
comment -}
Also, never use foldl, use foldl' instead, in cases where you are dealing with large lists which must be folded. It is more memory efficient.
A brief comments saying what each function does, its arguments and return value, is always good. I had to read the code pretty carefully to fully make sense of it.
Some would say if you do that, explicit type signatures may not be required. That's an aesthetic question, I don't have a strong opinion on it.
One minor caveat: if you do remove the type signatures, you'll automatically get the polymorphic Integral support ephemient mentioned, but you will still need one around toAlphaDigits because of the infamous "monomorphism restriction."