I built a large, old project, Pugs, with GHC 7.10.1 using stack build (with my own stack.yaml). Then I ran stack build --library-profiling --executable-profiling and .stack-work/install/x86_64-osx/nightly-2015-06-26/7.10.1/bin/pugs -e 'my $i=0; for (1..100_000) { $i++ }; say $i' +RTS -pa, which produced the following pugs.prof file:
Fri Jul 10 00:10 2015 Time and Allocation Profiling Report (Final)
pugs +RTS -P -RTS -e my $i=0; for (1..10_000) { $i++ }; say $i
total time = 0.60 secs (604 ticks # 1000 us, 1 processor)
total alloc = 426,495,472 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc ticks bytes
MAIN MAIN 92.2 90.6 557 386532168
CAF Pugs.Run 2.8 5.2 17 22191000
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc ticks bytes
MAIN MAIN 287 0 92.2 90.6 100.0 100.0 557 386532168
listAssocOp Pugs.Parser.Operator 841 24 0.0 0.0 0.0 0.0 0 768
nassocOp Pugs.Parser.Operator 840 24 0.0 0.0 0.0 0.0 0 768
lassocOp Pugs.Parser.Operator 839 24 0.0 0.0 0.0 0.0 0 768
rassocOp Pugs.Parser.Operator 838 24 0.0 0.0 0.0 0.0 0 768
postfixOp Pugs.Parser.Operator 837 24 0.0 0.0 0.0 0.0 0 768
termOp Pugs.Parser.Operator 824 24 0.0 0.5 0.7 1.2 0 2062768
insert Data.HashTable.ST.Basic 874 1 0.0 0.0 0.0 0.0 0 152
checkOverflow Data.HashTable.ST.Basic 890 1 0.0 0.0 0.0 0.0 0 80
readDelLoad Data.HashTable.ST.Basic 893 0 0.0 0.0 0.0 0.0 0 184
writeLoad Data.HashTable.ST.Basic 892 0 0.0 0.0 0.0 0.0 0 224
readLoad Data.HashTable.ST.Basic 891 0 0.0 0.0 0.0 0.0 0 184
_values Data.HashTable.ST.Basic 889 1 0.0 0.0 0.0 0.0 0 0
_keys Data.HashTable.ST.Basic 888 1 0.0 0.0 0.0 0.0 0 0
.. snip ..
MAIN accounts for 92.2% of the time, but I don't know what MAIN means. What does the MAIN label mean?
I was in the same spot a few days ago. What I deduced is the same thing: MAIN collects the cost of expressions that have no cost-centre annotation of their own. Its counts shrink significantly if you add -fprof-auto and -caf-all (also spelled -fprof-cafs). Those options will also let you find a lot of interesting things happening in your code.
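To see how costs move out of MAIN, cost centres can also be added by hand with SCC pragmas, which is what -fprof-auto does automatically for every binding not marked INLINE. A minimal sketch, assuming a hypothetical file Scc.hs: compile it with ghc -prof -rtsopts Scc.hs, run ./Scc +RTS -p, and compare the report with and without -fprof-auto.

```haskell
module Main (main) where

import Data.List (foldl')

-- No cost-centre annotation: when compiled with -prof but without
-- -fprof-auto, the cost of this fold is charged to an enclosing cost
-- centre instead of showing up under its own name.
sumTo :: Int -> Int
sumTo n = foldl' (+) 0 [1 .. n]

-- An explicit SCC pragma gives this expression its own cost centre,
-- labelled "squares" in the .prof report. Without -prof it is ignored.
sumSquares :: Int -> Int
sumSquares n = {-# SCC "squares" #-} foldl' (+) 0 (map (^ (2 :: Int)) [1 .. n])

main :: IO ()
main = do
  print (sumTo 1000000)
  print (sumSquares 1000000)
```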
I've got a Haskell program whose performance scales non-linearly (worse than O(n)).
I'm trying to find out whether memoization is taking place for a function. Can I verify this? I'm familiar with GHC profiling, but I'm not sure which values I should be looking at.
A workaround is to just plug in some values and observe the execution time, but that's not ideal.
As far as I know there is no automatic memoization in Haskell.
That said, a parameterless definition such as the following (a constant applicative form, or CAF) is evaluated at most once, and its value is then reused:
rightTriangles :: [(Integer, Integer, Integer)]
rightTriangles = [ (a,b,c) |
                   c <- [1..],
                   b <- [1..c],
                   a <- [1..b],
                   a^2 + b^2 == c^2]
If you try the following in GHCi twice, you'll see that the second call is much faster:
ghci> take 500 rightTriangles
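One way to check whether a value really is computed only once, which is what the question asks, is Debug.Trace: trace prints its message to stderr each time the traced expression is evaluated. A minimal sketch with a hypothetical CAF named expensiveTable. The explicit monomorphic type signature matters: since GHC 7.8, GHCi switches off the monomorphism restriction for bindings typed at the prompt, so an unannotated numeric binding can end up class-polymorphic and be recomputed on every use.

```haskell
module Main (main) where

import Debug.Trace (trace)

-- A top-level constant (CAF) with a monomorphic type. The trace message
-- is printed when the list is first forced, and only then.
expensiveTable :: [Integer]
expensiveTable = trace "evaluating expensiveTable" (map (^ (2 :: Int)) [1 .. 1000])

main :: IO ()
main = do
  print (sum expensiveTable) -- forces the CAF: the trace message appears here
  print (sum expensiveTable) -- reuses the already evaluated list
```

Running it should show the trace message once, before the first sum; seeing it twice would mean the value is being recomputed.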
Not really an answer, but it should still be helpful: memoization does not seem to make a difference to the profiling output in terms of function "entries", as the following basic example demonstrates:
module Main where

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

fibmemo = (map fib [0 ..] !!)

main :: IO ()
main = do
  putStrLn "Begin.."
  print $ fib 10
  -- print $ fibmemo 10
With the above code the profiling output is:
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
MAIN MAIN <built-in> 119 0 0.0 1.3 0.0 100.0
CAF Main <entire-module> 237 0 0.0 1.0 0.0 1.2
main Main Main.hs:(12,1)-(14,16) 238 1 0.0 0.2 0.0 0.2
fib Main Main.hs:(5,1)-(7,29) 240 177 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal <entire-module> 230 0 0.0 1.2 0.0 1.2
CAF GHC.IO.Encoding <entire-module> 220 0 0.0 5.4 0.0 5.4
CAF GHC.IO.Encoding.Iconv <entire-module> 218 0 0.0 0.4 0.0 0.4
CAF GHC.IO.Handle.FD <entire-module> 210 0 0.0 67.7 0.0 67.7
CAF GHC.IO.Handle.Text <entire-module> 208 0 0.0 0.2 0.0 0.2
main Main Main.hs:(12,1)-(14,16) 239 0 0.0 22.6 0.0 22.6
If we instead comment out print $ fib 10 and uncomment print $ fibmemo 10, we get:
individual inherited
COST CENTRE MODULE SRC no. entries %time %alloc %time %alloc
MAIN MAIN <built-in> 119 0 0.0 1.2 0.0 100.0
CAF Main <entire-module> 237 0 0.0 1.0 0.0 2.9
fibmemo Main Main.hs:9:1-29 240 1 0.0 1.6 0.0 1.6
fib Main Main.hs:(5,1)-(7,29) 242 177 0.0 0.0 0.0 0.0
main Main Main.hs:(12,1)-(15,20) 238 1 0.0 0.2 0.0 0.2
fibmemo Main Main.hs:9:1-29 241 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal <entire-module> 230 0 0.0 1.2 0.0 1.2
CAF GHC.IO.Encoding <entire-module> 220 0 0.0 5.3 0.0 5.3
CAF GHC.IO.Encoding.Iconv <entire-module> 218 0 0.0 0.4 0.0 0.4
CAF GHC.IO.Handle.FD <entire-module> 210 0 0.0 66.6 0.0 66.6
CAF GHC.IO.Handle.Text <entire-module> 208 0 0.0 0.2 0.0 0.2
main Main Main.hs:(12,1)-(15,20) 239 0 0.0 22.2 0.0 22.2
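A likely explanation for the identical entry counts is that fibmemo caches only the outermost call. fib still calls itself directly, so evaluating fibmemo 10 still performs all 177 recursive fib calls (177 = 2·fib(11) − 1 = 2·89 − 1). A sketch where the recursion goes back through the shared table instead; memoFib and fib' are hypothetical names, and Integer is used to avoid overflow:

```haskell
module Main (main) where

-- Memoized Fibonacci: the recursive calls go through the shared table
-- (via memoFib) instead of calling the plain function directly, so each
-- index is computed at most once while the CAF memoFib is alive.
memoFib :: Int -> Integer
memoFib = (table !!)
  where
    table = map fib' [0 ..]
    fib' 0 = 0
    fib' 1 = 1
    fib' n = memoFib (n - 1) + memoFib (n - 2)

main :: IO ()
main = do
  putStrLn "Begin.."
  print (memoFib 10)
  print (memoFib 50)
```

Profiled with -fprof-auto, I would expect the local fib' to be entered about once per index (11 times for n = 10) rather than 177 times, which should make the memoization visible in the entries column. I have not run this through the profiler, so treat that count as an expectation.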
I'm still fairly new to Haskell and learning new things every day. My problem is excessive memory usage during serialization with the Data.Binary library. Maybe I'm just using the library the wrong way, but I can't figure it out.
The idea is that I read binary data from disk, add new data, and write everything back to disk. Here's the code:
module Main where

import Data.Binary
import System.Environment
import Data.List (foldl')

data DualNo = DualNo Int Int deriving (Show)

instance Data.Binary.Binary DualNo where
  put (DualNo a b) = do
    put a
    put b
  get = do
    a <- get
    b <- get
    return (DualNo a b)

-- read DualNo from HDD
readData :: FilePath -> IO [DualNo]
readData filename = do
  no <- decodeFile filename :: IO [DualNo]
  return no

-- write DualNo to HDD
writeData :: [DualNo] -> String -> IO ()
writeData no filename = encodeFile filename (no :: [DualNo])

writeEmptyDataToDisk :: String -> IO ()
writeEmptyDataToDisk filename = writeData [] filename

-- feed the list with a new dataset
feedWithInputData :: [DualNo] -> [(Int, Int)] -> [DualNo]
feedWithInputData existData newData = foldl' func existData newData
  where
    func dataset (a,b) = DualNo a b : dataset

main :: IO ()
main = do
  [newInputData, toPutIntoExistingData] <- System.Environment.getArgs
  if toPutIntoExistingData == "empty"
    then writeEmptyDataToDisk "myData.dat"
    else return ()
  loadedData <- readData "myData.dat"
  -- '_' instead of 'otherwise', which here would just bind a fresh variable
  let newData = case newInputData of
        "dataset1" -> feedWithInputData loadedData dataset1
        "dataset2" -> feedWithInputData loadedData dataset2
        _          -> feedWithInputData loadedData dataset3
  writeData newData "myData.dat"

dataset1 = zip [1..100000] [2,4..200000]
dataset2 = zip [5,10..500000] [3,6..300000]
dataset3 = zip [4,8..400000] [6,12..600000]
I'm pretty sure there's a lot to improve in this code. But my biggest problem is the memory usage with big datasets.
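Part of the memory cost comes from how the list is represented on the heap: with plain lazy Int fields, each DualNo points to two separately allocated boxed Ints. A common remedy, sketched here as an assumption rather than a measured fix, is to make the fields strict and unpacked. The serialized format does not change, because the Binary instance still writes the same two Ints. Unpacking only takes effect when compiling with optimisation (-O or -O2, as in the profiling command used here).

```haskell
module Main (main) where

import Data.Binary (Binary (..), decode, encode)
import Data.List (foldl')

-- Strict, unpacked fields: with optimisation, each DualNo is a single
-- heap object holding two raw machine Ints, instead of a constructor
-- pointing to two separately boxed Int objects.
data DualNo = DualNo {-# UNPACK #-} !Int {-# UNPACK #-} !Int
  deriving (Show, Eq)

instance Binary DualNo where
  put (DualNo a b) = put a >> put b
  get = DualNo <$> get <*> get

-- Same behaviour as the original: new elements are consed onto the front.
feedWithInputData :: [DualNo] -> [(Int, Int)] -> [DualNo]
feedWithInputData = foldl' (\acc (a, b) -> DualNo a b : acc)

main :: IO ()
main = do
  let xs = feedWithInputData [] (zip [1 .. 5] [2, 4 ..])
      bytes = encode xs
  -- Round-trip check: decoding the encoded list gives the list back.
  print (decode bytes == xs)
```

The list cells themselves are still there, so this shrinks each element rather than removing the per-element overhead altogether.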
I profiled my program with GHC.
$ ghc -O2 --make -prof -fprof-auto -auto-all -caf-all -rtsopts -fforce-recomp Main.hs
$ ./Main dataset1 empty +RTS -p -sstderr
165,085,864 bytes allocated in the heap
70,643,992 bytes copied during GC
12,298,128 bytes maximum residency (7 sample(s))
424,696 bytes maximum slop
35 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 306 colls, 0 par 0.035s 0.035s 0.0001s 0.0015s
Gen 1 7 colls, 0 par 0.053s 0.053s 0.0076s 0.0180s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.059s ( 0.062s elapsed)
GC time 0.088s ( 0.088s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.003s ( 0.003s elapsed)
Total time 0.154s ( 0.154s elapsed)
%GC time 57.0% (57.3% elapsed)
Alloc rate 2,781,155,968 bytes per MUT second
Productivity 42.3% of total user, 42.5% of total elapsed
Looking at the prof-file:
Tue Apr 12 18:11 2016 Time and Allocation Profiling Report (Final)
Main +RTS -p -sstderr -RTS dataset1 empty
total time = 0.06 secs (60 ticks # 1000 us, 1 processor)
total alloc = 102,613,008 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
put Main 48.3 53.0
writeData Main 30.0 18.8
dataset1 Main 13.3 23.4
feedWithInputData Main 6.7 0.0
feedWithInputData.func Main 1.7 4.7
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 68 0 0.0 0.0 100.0 100.0
main Main 137 0 0.0 0.0 86.7 76.6
feedWithInputData Main 150 1 6.7 0.0 8.3 4.7
feedWithInputData.func Main 154 100000 1.7 4.7 1.7 4.7
writeData Main 148 1 30.0 18.8 78.3 71.8
put Main 155 100000 48.3 53.0 48.3 53.0
readData Main 147 0 0.0 0.1 0.0 0.1
writeEmptyDataToDisk Main 142 0 0.0 0.0 0.0 0.1
writeData Main 143 0 0.0 0.1 0.0 0.1
CAF:main1 Main 133 0 0.0 0.0 0.0 0.0
main Main 136 1 0.0 0.0 0.0 0.0
CAF:main2 Main 132 0 0.0 0.0 0.0 0.0
main Main 139 0 0.0 0.0 0.0 0.0
writeEmptyDataToDisk Main 140 1 0.0 0.0 0.0 0.0
writeData Main 141 1 0.0 0.0 0.0 0.0
CAF:main7 Main 131 0 0.0 0.0 0.0 0.0
main Main 145 0 0.0 0.0 0.0 0.0
readData Main 146 1 0.0 0.0 0.0 0.0
CAF:dataset1 Main 123 0 0.0 0.0 5.0 7.8
dataset1 Main 151 1 5.0 7.8 5.0 7.8
CAF:dataset4 Main 122 0 0.0 0.0 5.0 7.8
dataset1 Main 153 0 5.0 7.8 5.0 7.8
CAF:dataset5 Main 121 0 0.0 0.0 3.3 7.8
dataset1 Main 152 0 3.3 7.8 3.3 7.8
CAF:main4 Main 116 0 0.0 0.0 0.0 0.0
main Main 138 0 0.0 0.0 0.0 0.0
CAF:main6 Main 115 0 0.0 0.0 0.0 0.0
main Main 149 0 0.0 0.0 0.0 0.0
CAF:main3 Main 113 0 0.0 0.0 0.0 0.0
main Main 144 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 107 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 103 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 101 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 94 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 86 0 0.0 0.0 0.0 0.0
Now I add further data:
$ ./Main dataset2 myData.dat +RTS -p -sstderr
343,601,008 bytes allocated in the heap
175,650,728 bytes copied during GC
34,113,936 bytes maximum residency (8 sample(s))
971,896 bytes maximum slop
78 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 640 colls, 0 par 0.082s 0.083s 0.0001s 0.0017s
Gen 1 8 colls, 0 par 0.140s 0.141s 0.0176s 0.0484s
INIT time 0.001s ( 0.001s elapsed)
MUT time 0.138s ( 0.139s elapsed)
GC time 0.221s ( 0.224s elapsed)
RP time 0.000s ( 0.000s elapsed)
PROF time 0.000s ( 0.000s elapsed)
EXIT time 0.006s ( 0.006s elapsed)
Total time 0.370s ( 0.370s elapsed)
%GC time 59.8% (60.5% elapsed)
Alloc rate 2,485,518,518 bytes per MUT second
Productivity 39.9% of total user, 39.8% of total elapsed
Looking at the new prof-file:
Tue Apr 12 18:15 2016 Time and Allocation Profiling Report (Final)
Main +RTS -p -sstderr -RTS dataset2 myData.dat
total time = 0.14 secs (139 ticks # 1000 us, 1 processor)
total alloc = 213,866,232 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
put Main 41.0 50.9
writeData Main 25.9 18.0
get Main 25.2 16.8
dataset2 Main 4.3 11.2
readData Main 1.4 0.8
feedWithInputData.func Main 1.4 2.2
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 68 0 0.0 0.0 100.0 100.0
main Main 137 0 0.0 0.0 95.7 88.8
feedWithInputData Main 148 1 0.7 0.0 2.2 2.2
feedWithInputData.func Main 152 100000 1.4 2.2 1.4 2.2
writeData Main 145 1 25.9 18.0 66.9 68.9
put Main 153 200000 41.0 50.9 41.0 50.9
readData Main 141 0 1.4 0.8 26.6 17.6
get Main 144 0 25.2 16.8 25.2 16.8
CAF:main1 Main 133 0 0.0 0.0 0.0 0.0
main Main 136 1 0.0 0.0 0.0 0.0
CAF:main7 Main 131 0 0.0 0.0 0.0 0.0
main Main 139 0 0.0 0.0 0.0 0.0
readData Main 140 1 0.0 0.0 0.0 0.0
CAF:dataset2 Main 126 0 0.0 0.0 0.7 3.7
dataset2 Main 149 1 0.7 3.7 0.7 3.7
CAF:dataset6 Main 125 0 0.0 0.0 2.2 3.7
dataset2 Main 151 0 2.2 3.7 2.2 3.7
CAF:dataset7 Main 124 0 0.0 0.0 1.4 3.7
dataset2 Main 150 0 1.4 3.7 1.4 3.7
CAF:$fBinaryDualNo1 Main 120 0 0.0 0.0 0.0 0.0
get Main 143 1 0.0 0.0 0.0 0.0
CAF:main4 Main 116 0 0.0 0.0 0.0 0.0
main Main 138 0 0.0 0.0 0.0 0.0
CAF:main6 Main 115 0 0.0 0.0 0.0 0.0
main Main 146 0 0.0 0.0 0.0 0.0
CAF:main5 Main 114 0 0.0 0.0 0.0 0.0
main Main 147 0 0.0 0.0 0.0 0.0
CAF:main3 Main 113 0 0.0 0.0 0.0 0.0
main Main 142 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 107 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding 103 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 101 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 94 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 86 0 0.0 0.0 0.0 0.0
The more often I add new data, the higher the memory usage becomes. Of course I need more memory for a bigger dataset, but isn't there a better solution to this problem (like gradually writing data back to disk)?
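On the question of writing data back gradually: one possible approach, which is not part of the original code and uses a file format of my own choosing, is to drop the list's length prefix and store DualNo records back to back. New data can then be appended with BL.appendFile without first decoding and re-encoding what is already on disk. A minimal sketch, assuming the binary and bytestring packages; encodeRecords, getRecords, appendRecords, readRecords and the file name records.dat are all hypothetical.

```haskell
module Main (main) where

import Data.Binary (Binary (..))
import Data.Binary.Get (Get, isEmpty, runGet)
import Data.Binary.Put (runPut)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Lazy as BL

data DualNo = DualNo !Int !Int deriving (Show, Eq)

instance Binary DualNo where
  put (DualNo a b) = put a >> put b
  get = DualNo <$> get <*> get

-- Records are written back to back with no length prefix, so new records
-- can be appended without reading or rewriting the old ones.
-- (This differs from encodeFile's length-prefixed list format.)
encodeRecords :: [DualNo] -> BL.ByteString
encodeRecords = runPut . mapM_ put

-- Decode records until the input is exhausted.
getRecords :: Get [DualNo]
getRecords = do
  done <- isEmpty
  if done
    then return []
    else (:) <$> get <*> getRecords

-- Append only the new records to the end of the file.
appendRecords :: FilePath -> [DualNo] -> IO ()
appendRecords path = BL.appendFile path . encodeRecords

-- Read the whole file strictly (so its handle is closed before any later
-- append to the same file) and decode every record in it.
readRecords :: FilePath -> IO [DualNo]
readRecords path = runGet getRecords . BL.fromStrict <$> BS.readFile path

main :: IO ()
main = do
  appendRecords "records.dat" [DualNo a (2 * a) | a <- [1 .. 100000]]
  appendRecords "records.dat" [DualNo a (3 * a) | a <- [5, 10 .. 500000]]
  records <- readRecords "records.dat"
  print (length records)
```

Each run of this main appends again, so the printed count grows from run to run. Reading still loads every record into memory; the saving is only on the write side.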
Edit:
Actually, what bothers me most is the following observation:
I run the program for the first time and add new data to an existing (empty) file on my disk.
The size of the saved file on my disk is 1.53 MByte.
But (looking at the first prof-file) the program allocated more than 102 MByte. More than 50% was allocated by the put function from the Data.Binary package.
I run the program a second time and add new data to an existing (non-empty) file on my disk.
The size of the saved file on my disk is 3.05 MByte.
But (looking at the second prof-file) the program allocated more than 213 MByte. More than 66% was allocated by the put and get functions together.
=> Conclusion: In the first example I needed 102/1.53 = 66 times more memory running the program than space for the binary file on my disk.
In the second example I needed 213/3.05 = 69 times more memory running the program than space for the binary file on my disk.
Question:
Is the Data.Binary package for serialization so efficient (and awesome) that it can decrease the needed memory to such an extent?
Analogous question:
Do I really need so much more memory for loading the data in my program than space for the same data in a binary file on disk?
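For reference, the arithmetic behind these numbers can be sketched as follows. It assumes a 64-bit GHC, binary's encoding of each Int as 8 bytes, a list encoded as an 8-byte length followed by its elements, and the usual non-profiled heap layout (a profiled build adds extra header words to every object). Note also that "total alloc" in a .prof file is the cumulative amount allocated over the whole run, most of which the GC reclaims; the figure closer to "memory needed at once" is the maximum residency of 12,298,128 bytes that -sstderr reports for the first run.

```latex
\[
\begin{aligned}
\text{file}_1 &= 8 + 100\,000 \times (8 + 8) = 1\,600\,008\ \text{bytes} \approx 1.53\ \text{MiB} \\
\text{file}_2 &= 8 + 200\,000 \times (8 + 8) = 3\,200\,008\ \text{bytes} \approx 3.05\ \text{MiB} \\
\text{heap per element} &\approx \underbrace{3}_{(:)\ \text{cell}} + \underbrace{3}_{\texttt{DualNo}} + \underbrace{2 \times 2}_{\text{boxed }\texttt{Int}\text{s}} = 10\ \text{words} = 80\ \text{bytes} \\
100\,000 \times 80\ \text{bytes} &= 8\,000\,000\ \text{bytes} \approx 7.6\ \text{MiB}
\end{aligned}
\]
```

Under these assumptions, the live list alone takes about five times the 16 bytes each element occupies on disk, which is a much smaller factor than the 66x obtained by comparing total allocation with file size.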
I’m experiencing small CPU leaks using GHC 7.8.3 and Yesod 1.4.9.
When I run my site with time and stop it (Ctrl+C) after 1 minute without doing anything (just running, no requests at all), it has consumed 1 second of CPU time, which is approximately 1.7% of one CPU.
$ time mysite
^C
real 1m0.226s
user 0m1.024s
sys 0m0.060s
If I disable the idle garbage collector, it drops to 0.35 seconds (0.6% of CPU). Though that's better, it still consumes CPU without doing anything.
$ time mysite +RTS -I0 # Disable idle GC
^C
real 1m0.519s
user 0m0.352s
sys 0m0.064s
$ time mysite +RTS -I0
^C
real 4m0.676s
user 0m0.888s
sys 0m0.468s
$ time mysite +RTS -I0
^C
real 7m28.282s
user 0m1.452s
sys 0m0.976s
Compared to a cat command waiting indefinitely for something on the standard input:
$ time cat
^C
real 1m1.349s
user 0m0.000s
sys 0m0.000s
Is there anything else in Haskell that consumes CPU in the background?
Is it a leak from Yesod?
Or is it something that I have done in my program? (I have only added handler functions; I don't do any parallel computation.)
Edit 2015-05-31 19:25
Here’s the execution with the -s flag:
$ time mysite +RTS -I0 -s
^C 23,138,184 bytes allocated in the heap
4,422,096 bytes copied during GC
2,319,960 bytes maximum residency (4 sample(s))
210,584 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 30 colls, 0 par 0.00s 0.00s 0.0001s 0.0003s
Gen 1 4 colls, 0 par 0.03s 0.04s 0.0103s 0.0211s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.86s (224.38s elapsed)
GC time 0.03s ( 0.05s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.90s (224.43s elapsed)
Alloc rate 26,778,662 bytes per MUT second
Productivity 96.9% of total user, 0.4% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 3m44.447s
user 0m0.896s
sys 0m0.320s
And with profiling:
$ time mysite +RTS -I0
^C 23,024,424 bytes allocated in the heap
19,367,640 bytes copied during GC
2,319,960 bytes maximum residency (94 sample(s))
211,312 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 27 colls, 0 par 0.00s 0.00s 0.0002s 0.0005s
Gen 1 94 colls, 0 par 1.09s 1.04s 0.0111s 0.0218s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.00s (201.66s elapsed)
GC time 1.07s ( 1.03s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.02s ( 0.02s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 2.09s (202.68s elapsed)
Alloc rate 23,115,591 bytes per MUT second
Productivity 47.7% of total user, 0.5% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 3m22.697s
user 0m2.088s
sys 0m0.060s
mysite.prof:
Sun May 31 19:16 2015 Time and Allocation Profiling Report (Final)
mysite +RTS -N -p -s -h -i0.1 -I0 -RTS
total time = 0.05 secs (49 ticks # 1000 us, 1 processor)
total alloc = 17,590,528 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 98.0 93.7
acquireSeedSystem.\.\ System.Random.MWC 2.0 0.0
toByteString Data.Serialize.Builder 0.0 3.9
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 5684 0 98.0 93.7 100.0 100.0
createSystemRandom System.Random.MWC 11396 0 0.0 0.0 2.0 0.3
withSystemRandom System.Random.MWC 11397 0 0.0 0.1 2.0 0.3
acquireSeedSystem System.Random.MWC 11399 0 0.0 0.0 2.0 0.2
acquireSeedSystem.\ System.Random.MWC 11401 1 0.0 0.2 2.0 0.2
acquireSeedSystem.\.\ System.Random.MWC 11403 1 2.0 0.0 2.0 0.0
sndS Data.Serialize.Put 11386 21 0.0 0.0 0.0 0.0
put Data.Serialize 11384 21 0.0 0.0 0.0 0.0
unPut Data.Serialize.Put 11383 21 0.0 0.0 0.0 0.0
toByteString Data.Serialize.Builder 11378 21 0.0 3.9 0.0 4.0
flush.\ Data.Serialize.Builder 11393 21 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 11388 0 0.0 0.0 0.0 0.0
withSize.\ Data.Serialize.Builder 11389 21 0.0 0.0 0.0 0.0
runBuilder Data.Serialize.Builder 11390 21 0.0 0.0 0.0 0.0
runBuilder Data.Serialize.Builder 11382 21 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11372 174 0.0 0.1 0.0 0.1
CAF GHC.IO.Encoding 11322 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 11319 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 11318 0 0.0 0.2 0.0 0.2
CAF GHC.Event.Thread 11304 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 11292 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 11288 0 0.0 0.0 0.0 0.0
CAF GHC.TopHandler 11284 0 0.0 0.0 0.0 0.0
CAF GHC.Event.Control 11271 0 0.0 0.0 0.0 0.0
CAF Main 11263 0 0.0 0.0 0.0 0.0
main Main 11368 1 0.0 0.0 0.0 0.0
CAF Application 11262 0 0.0 0.0 0.0 0.0
CAF Foundation 11261 0 0.0 0.0 0.0 0.0
CAF Model 11260 0 0.0 0.1 0.0 0.3
unstream/resize Data.Text.Internal.Fusion 11375 35 0.0 0.1 0.0 0.1
CAF Settings 11259 0 0.0 0.1 0.0 0.2
unstream/resize Data.Text.Internal.Fusion 11370 20 0.0 0.1 0.0 0.1
CAF Database.Persist.Postgresql 6229 0 0.0 0.3 0.0 0.9
unstream/resize Data.Text.Internal.Fusion 11373 93 0.0 0.6 0.0 0.6
CAF Database.PostgreSQL.Simple.Transaction 6224 0 0.0 0.0 0.0 0.0
CAF Database.PostgreSQL.Simple.TypeInfo.Static 6222 0 0.0 0.0 0.0 0.0
CAF Database.PostgreSQL.Simple.Internal 6219 0 0.0 0.0 0.0 0.0
CAF Yesod.Static 6210 0 0.0 0.0 0.0 0.0
CAF Crypto.Hash.Conduit 6193 0 0.0 0.0 0.0 0.0
CAF Yesod.Default.Config2 6192 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11371 1 0.0 0.0 0.0 0.0
CAF Yesod.Core.Internal.Util 6154 0 0.0 0.0 0.0 0.0
CAF Text.Libyaml 6121 0 0.0 0.0 0.0 0.0
CAF Data.Yaml 6120 0 0.0 0.0 0.0 0.0
CAF Data.Yaml.Internal 6119 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11369 1 0.0 0.0 0.0 0.0
CAF Database.Persist.Quasi 6055 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11376 1 0.0 0.0 0.0 0.0
CAF Database.Persist.Sql.Internal 6046 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11377 6 0.0 0.0 0.0 0.0
CAF Data.Pool 6036 0 0.0 0.0 0.0 0.0
CAF Network.HTTP.Client.TLS 6014 0 0.0 0.0 0.0 0.0
CAF System.X509.Unix 6010 0 0.0 0.0 0.0 0.0
CAF Crypto.Hash.MD5 5927 0 0.0 0.0 0.0 0.0
CAF Data.Serialize 5873 0 0.0 0.0 0.0 0.0
put Data.Serialize 11385 1 0.0 0.0 0.0 0.0
CAF Data.Serialize.Put 5872 0 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 11387 1 0.0 0.0 0.0 0.0
CAF Data.Serialize.Builder 5870 0 0.0 0.0 0.0 0.0
flush Data.Serialize.Builder 11392 1 0.0 0.0 0.0 0.0
toByteString Data.Serialize.Builder 11391 0 0.0 0.0 0.0 0.0
defaultSize Data.Serialize.Builder 11379 1 0.0 0.0 0.0 0.0
defaultSize.overhead Data.Serialize.Builder 11381 1 0.0 0.0 0.0 0.0
defaultSize.k Data.Serialize.Builder 11380 1 0.0 0.0 0.0 0.0
CAF Crypto.Random.Entropy.Unix 5866 0 0.0 0.0 0.0 0.0
CAF Network.HTTP.Client.Manager 5861 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11374 3 0.0 0.0 0.0 0.0
CAF System.Random.MWC 5842 0 0.0 0.0 0.0 0.0
coff System.Random.MWC 11405 1 0.0 0.0 0.0 0.0
ioff System.Random.MWC 11404 1 0.0 0.0 0.0 0.0
acquireSeedSystem System.Random.MWC 11398 1 0.0 0.0 0.0 0.0
acquireSeedSystem.random System.Random.MWC 11402 1 0.0 0.0 0.0 0.0
acquireSeedSystem.nbytes System.Random.MWC 11400 1 0.0 0.0 0.0 0.0
createSystemRandom System.Random.MWC 11394 1 0.0 0.0 0.0 0.0
withSystemRandom System.Random.MWC 11395 1 0.0 0.0 0.0 0.0
CAF Data.Streaming.Network.Internal 5833 0 0.0 0.0 0.0 0.0
CAF Data.Scientific 5728 0 0.0 0.1 0.0 0.1
CAF Data.Text.Array 5722 0 0.0 0.0 0.0 0.0
CAF Data.Text.Internal 5718 0 0.0 0.0 0.0 0.0
Edit 2015-06-01 08:40
You can browse source code at the following repository → https://github.com/Zigazou/Ouep
I found a related bug in the Yesod bug tracker and ran my program like this:
myserver +RTS -I0 -RTS Development
And now idle CPU usage is down to almost nothing, compared to 14% or so before (on an ARM computer). The -I0 (that's capital I and zero) option turns off the idle garbage collector, which by default runs a major GC after 0.3 seconds of inactivity. I'm not sure about the implications for app responsiveness or memory usage, but for me at least this was definitely the culprit.
Is there any hidden option that will put cost centres in libraries? Currently I have set up my profiling like this:
cabal:
ghc-prof-options: -O2
                  -threaded
                  -fexcess-precision
                  -fprof-auto
                  -rtsopts
                  "-with-rtsopts=-N -p -s -h -i0.1"
exec:
# cabal sandbox init
# cabal install --enable-library-profiling --enable-executable-profiling
# cabal configure --enable-library-profiling --enable-executable-profiling
# cabal run
This works and creates the expected .prof file, .hp file and the summary when the program finishes.
The problem is that the .prof file doesn't contain anything that doesn't belong to the current project. My guess is that there is probably an option that will put cost centres in external library code?
My guess is that there is probably an option that will put cost centers in external library code?
Well, not by default. You need to add the cost centres when you compile the dependency. However, you can add -fprof-auto to the GHC options during cabal install:
$ cabal sandbox init
$ cabal install --ghc-option=-fprof-auto -p --enable-executable-profiling
Example
An example using the code from this question, saved as SO.hs:
$ cabal sandbox init
$ cabal install vector -p --ghc-options=-fprof-auto
$ cabal exec -- ghc --make SO.hs -prof -fprof-auto -O2
$ ./SO /usr/share/dict/words +RTS -s -p
$ cat SO.prof
Tue Dec 2 15:01 2014 Time and Allocation Profiling Report (Final)
Test +RTS -s -p -RTS /usr/share/dict/words
total time = 0.70 secs (698 ticks # 1000 us, 1 processor)
total alloc = 618,372,952 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
letterCount Main 40.3 24.3
letterCount.letters1 Main 13.2 18.2
basicUnsafeWrite Data.Vector.Primitive.Mutable 10.0 12.1
basicUnsafeWrite Data.Vector.Unboxed.Base 7.2 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 5.4 4.9
>>= Data.Vector.Fusion.Util 5.0 13.4
basicUnsafeIndexM Data.Vector.Unboxed.Base 4.9 0.0
basicUnsafeIndexM Data.Vector.Primitive 2.7 4.9
basicUnsafeIndexM Data.Vector.Unboxed.Base 2.3 0.0
letterCount.letters1.\ Main 2.0 2.4
>>= Data.Vector.Fusion.Util 1.9 6.1
basicUnsafeWrite Data.Vector.Unboxed.Base 1.7 0.0
letterCount.\ Main 1.3 2.4
readByteArray# Data.Primitive.Types 0.3 2.4
basicUnsafeNew Data.Vector.Primitive.Mutable 0.0 1.2
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 72 0 0.0 0.0 100.0 100.0
main Main 145 0 0.1 0.2 99.9 100.0
main.counts Main 148 1 0.0 0.0 99.3 99.6
letterCount Main 149 1 40.3 24.3 99.3 99.6
basicUnsafeFreeze Data.Vector.Unboxed.Base 257 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 259 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 258 1 0.0 0.0 0.0 0.0
letterCount.\ Main 256 938848 1.3 2.4 1.3 2.4
basicUnsafeWrite Data.Vector.Unboxed.Base 252 938848 1.3 0.0 5.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 253 938848 3.7 6.1 3.7 6.1
writeByteArray# Data.Primitive.Types 255 938848 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 254 938848 0.0 0.0 0.0 0.0
basicUnsafeRead Data.Vector.Unboxed.Base 248 938848 0.7 0.0 6.6 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 249 938848 5.4 4.9 5.9 7.3
readByteArray# Data.Primitive.Types 251 938848 0.3 2.4 0.3 2.4
primitive Control.Monad.Primitive 250 938848 0.1 0.0 0.1 0.0
>>= Data.Vector.Fusion.Util 243 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 242 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 237 938848 4.9 0.0 11.7 10.9
>>= Data.Vector.Fusion.Util 247 938848 1.9 6.1 1.9 6.1
basicUnsafeIndexM Data.Vector.Unboxed.Base 238 938848 2.3 0.0 5.0 4.9
basicUnsafeIndexM Data.Vector.Primitive 239 938848 2.7 4.9 2.7 4.9
indexByteArray# Data.Primitive.Types 240 938848 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 236 938849 3.4 7.3 3.4 7.3
unId Data.Vector.Fusion.Util 235 938849 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 234 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive.Mutable 233 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 222 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 223 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.ByteArray 226 3 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 214 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 215 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 212 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 220 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 216 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 217 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 218 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 219 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 211 1 0.0 0.0 0.0 0.0
letterCount.len Main 178 1 0.0 0.0 0.0 0.0
letterCount.letters1 Main 177 1 13.2 18.2 30.9 41.3
basicUnsafeFreeze Data.Vector.Unboxed.Base 204 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 210 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 207 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 206 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 205 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 208 0 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 200 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 203 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 201 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 202 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 193 938848 7.2 7.3 14.2 13.4
basicUnsafeWrite Data.Vector.Unboxed.Base 198 938848 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 194 938848 0.4 0.0 7.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 195 938848 6.3 6.1 6.6 6.1
writeByteArray# Data.Primitive.Types 197 938848 0.3 0.0 0.3 0.0
primitive Control.Monad.Primitive 196 938848 0.0 0.0 0.0 0.0
letterCount.letters1.\ Main 192 938848 2.0 2.4 2.0 2.4
>>= Data.Vector.Fusion.Util 191 938848 1.6 6.1 1.6 6.1
unId Data.Vector.Fusion.Util 190 938849 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 180 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 179 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Unboxed.Base 189 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 187 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 182 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 181 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Primitive.Mutable 183 0 0.0 1.2 0.0 1.2
sizeOf Data.Primitive 184 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 185 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 186 1 0.0 0.0 0.0 0.0
printCounts Main 146 1 0.4 0.2 0.4 0.2
basicUnsafeIndexM Data.Vector.Unboxed.Base 266 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 267 0 0.0 0.0 0.0 0.0
indexByteArray# Data.Primitive.Types 268 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 265 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 264 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 263 256 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 262 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 261 1 0.0 0.0 0.0 0.0
CAF Main 143 0 0.0 0.0 0.0 0.0
main Main 144 1 0.0 0.0 0.0 0.0
main.counts Main 150 0 0.0 0.0 0.0 0.0
letterCount Main 151 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 244 0 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 245 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 246 0 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 224 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 173 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 175 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 174 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 171 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 172 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 167 256 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Primitive.Mutable 168 256 0.0 0.0 0.0 0.0
writeByteArray# Data.Primitive.Types 170 256 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 169 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 165 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 164 257 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 156 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 162 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 157 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 158 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 159 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 160 1 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 153 1 0.0 0.0 0.0 0.0
elemseq Data.Vector.Unboxed.Base 152 1 0.0 0.0 0.0 0.0
printCounts Main 147 0 0.0 0.0 0.0 0.0
CAF Data.Vector.Internal.Check 142 0 0.0 0.0 0.0 0.0
doBoundsChecks Data.Vector.Internal.Check 213 1 0.0 0.0 0.0 0.0
doUnsafeChecks Data.Vector.Internal.Check 155 1 0.0 0.0 0.0 0.0
doInternalChecks Data.Vector.Internal.Check 154 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Fusion.Util 141 0 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 241 1 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 166 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Unboxed.Base 136 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 227 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 228 0 0.0 0.0 0.0 0.0
basicUnsafeCopy.sz Data.Vector.Primitive 229 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 230 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 231 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 232 1 0.0 0.0 0.0 0.0
CAF Data.Primitive.MachDeps 128 0 0.0 0.0 0.0 0.0
sIZEOF_INT Data.Primitive.MachDeps 161 1 0.0 0.0 0.0 0.0
CAF Text.Printf 118 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 112 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 109 0 0.1 0.0 0.1 0.0
CAF GHC.IO.Encoding 99 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 98 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 95 0 0.0 0.0 0.0 0.0
Unfortunately, you cannot specify --ghc-option=… as a flag on the dependencies.
You also need -prof.
The GHC User's Guide says: "There are a few other profiling-related compilation options. Use them in addition to -prof. These do not have to be used consistently for all modules in a program."
free -m
             total       used       free     shared    buffers     cached
Mem:          7974       6993        981          0        557        893
-/+ buffers/cache:       5542       2432
Swap:         2047          0       2047
As you can see, my system is using 5542 MB of memory, but when I use ps aux to check what is using it, I can't account for it.
ps aux | awk '$6 > 0{print $3, $4, $5, $6}'
%CPU %MEM VSZ RSS
0.0 0.0 10344 700
0.0 0.0 51172 2092
0.0 0.0 51172 1032
0.0 0.0 68296 1600
0.0 0.0 12692 872
0.0 0.0 33840 864
0.0 0.0 10728 376
0.0 0.0 8564 648
0.0 0.0 74856 1132
53.2 0.5 930408 45824
0.0 0.0 24236 1768
0.0 0.0 51172 2100
0.0 0.0 51172 1040
0.0 0.0 68296 1600
51.9 0.5 864348 42740
0.0 0.0 34360 2672
0.0 0.0 3784 528
0.0 0.0 3784 532
0.0 0.0 3784 528
0.0 0.0 3784 528
0.0 0.0 3784 532
0.0 0.0 65604 900
0.0 0.0 63916 832
0.0 0.0 94020 5980
0.0 0.0 3836 468
0.0 0.0 93736 4000
0.0 0.0 3788 484
0.0 0.0 3652 336
0.0 0.0 3652 336
0.0 0.0 3684 344
0.0 0.0 3664 324
0.0 0.0 19184 4880
0.0 0.0 3704 324
0.0 0.0 340176 1312
0.0 0.0 46544 816
0.0 0.0 10792 1092
0.0 0.0 3824 400
0.0 0.0 3640 292
0.0 0.0 3652 332
0.0 0.0 3652 332
0.0 0.0 3664 328
0.0 0.0 4264 1004
0.0 0.0 4584 2368
0.0 0.0 77724 3060
0.0 0.0 89280 2704
As you can see, the sum of RSS is 152.484 MB and the sum of VSZ is 3376.34 MB, so I don't know what is eating up the rest of the memory. Is it the kernel?
From my system:
$ grep ^S[^wh] /proc/meminfo
Slab: 4707412 kB
SReclaimable: 4602900 kB
SUnreclaim: 104512 kB
These three metrics describe memory held by the kernel's slab allocator. While SUnreclaim is, well, unreclaimable, SReclaimable is just like any other cache in the system: it will be made available to processes under memory pressure. Unfortunately, free does not seem to take it into account, as mentioned in detail in this older answer of mine, and this part of memory can easily grow to several GB...
If you really want to see how much memory your processes are using you could try going through the cache-emptying procedure described in my post - you can skip the swap-related parts, since your system does not appear to be using any swap memory anyway.
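To do this accounting programmatically, a small sketch can read /proc/meminfo and subtract the reclaimable parts, including SReclaimable, from the total. As far as I know this mirrors how newer versions of free count reclaimable slab as part of the cache. The field names are the kernel's; parseMeminfo and usedKiB are hypothetical names.

```haskell
module Main (main) where

import Data.Maybe (fromMaybe)

-- Parse /proc/meminfo lines such as "Slab:   4707412 kB" into
-- (field name, value) pairs; memory fields are reported in kB.
parseMeminfo :: String -> [(String, Integer)]
parseMeminfo = concatMap parseLine . lines
  where
    parseLine l = case words l of
      (key : value : _)
        | not (null key)
        , last key == ':'
        , [(n, "")] <- reads value -> [(init key, n)]
      _ -> []

-- Approximate "used" memory: total minus free, buffers, page cache
-- and reclaimable slab. Returns Nothing if a required field is missing.
usedKiB :: [(String, Integer)] -> Maybe Integer
usedKiB info = do
  total <- lookup "MemTotal" info
  free <- lookup "MemFree" info
  buffers <- lookup "Buffers" info
  cached <- lookup "Cached" info
  let reclaimable = fromMaybe 0 (lookup "SReclaimable" info)
  return (total - free - buffers - cached - reclaimable)

main :: IO ()
main = do
  info <- parseMeminfo <$> readFile "/proc/meminfo"
  print (usedKiB info)
```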