I am doing network performance tests and noticed that the interface's interrupt processing is not balanced across the 8 CPUs, so I want to make it more balanced.
I set the following files:
echo 11 > /proc/irq/16/smp_affinity
echo 22 > /proc/irq/17/smp_affinity
echo 44 > /proc/irq/18/smp_affinity
echo 88 > /proc/irq/19/smp_affinity
where 16, 17, 18, and 19 are the four IRQ numbers of my network interfaces.
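For reference, each smp_affinity value is a hexadecimal CPU bitmask, so the values above spread the four IRQs over pairs of CPUs (0x11 = CPUs 0 and 4, 0x22 = CPUs 1 and 5, and so on). A quick sketch for decoding such a mask:

```shell
# Decode the hex affinity mask 0x11 into a CPU list (bit n set => CPU n allowed)
mask=$((0x11))
for cpu in 0 1 2 3 4 5 6 7; do
  if [ $(( (mask >> cpu) & 1 )) -eq 1 ]; then
    printf '%d ' "$cpu"
  fi
done
echo
```

This prints 0 4, matching the two CPUs that IRQ 16 was pinned to.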
[root@localhost ~]# cat /proc/interrupts | grep ens
16: 30490 0 16838 427032 379 0 10678 0 IO-APIC-fasteoi vmwgfx, ens34, ens42
17: 799858 0 68176 0 78056 0 44715 0 IO-APIC-fasteoi ioc0, ens35, ens43, ens39
18: 2673 0 6149 0 7651 0 5585 0 IO-APIC-fasteoi uhci_hcd:usb2, snd_ens1371, ens40, ens44
19: 145769 1431206 0 0 0 0 305 0 IO-APIC-fasteoi ehci_hcd:usb1, ens41, ens45, ens33
But, sadly, I found the IRQs are still not balanced across the CPUs:
Tasks: 263 total, 2 running, 261 sleeping, 0 stopped, 0 zombie
%Cpu0 : 7.5 us, 10.0 sy, 0.0 ni, 65.3 id, 0.0 wa, 0.4 hi, 16.7 si, 0.0 st
%Cpu1 : 9.7 us, 15.0 sy, 0.0 ni, 59.1 id, 0.0 wa, 0.0 hi, 16.2 si, 0.0 st
%Cpu2 : 11.7 us, 21.6 sy, 0.0 ni, 66.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 10.4 us, 16.6 sy, 0.0 ni, 66.0 id, 0.0 wa, 0.0 hi, 6.9 si, 0.0 st
%Cpu4 : 10.9 us, 24.5 sy, 0.0 ni, 64.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 11.8 us, 29.4 sy, 0.0 ni, 58.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 9.0 us, 19.8 sy, 0.0 ni, 71.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 11.5 us, 22.6 sy, 0.0 ni, 65.9 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
So why do the IRQs not occur on all CPUs?
How can I balance IRQ processing across all CPUs?
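One thing worth checking first (a sketch; the IRQ numbers are taken from the question, and the files may be absent or unreadable on other machines): read the masks back after writing them, because the kernel can silently reject or ignore an affinity the interrupt hardware cannot honor.

```shell
# Read back the affinity masks the kernel actually kept for IRQs 16-19
for irq in 16 17 18 19; do
  printf 'IRQ %s: ' "$irq"
  cat "/proc/irq/$irq/smp_affinity" 2>/dev/null || echo '(no such IRQ on this machine)'
done
```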
In my production environment I have a server with 8 cores and around 40 GB+ of RAM. I am seeing that my Docker containers use a lot of CPU cycles, making other standalone services slow.
For example, we have a NiFi service running in a container using around 300% CPU (varying), and at times other containers such as the database and Kafka do the same.
Here is what I found in the docker container inspect output:
"Isolation": "",
"CpuShares": 0,
"Memory": 0,
"NanoCpus": 0,
"CgroupParent": "",
"BlkioWeight": 0,
"BlkioWeightDevice": null,
"BlkioDeviceReadBps": null,
"BlkioDeviceWriteBps": null,
"BlkioDeviceReadIOps": null,
"BlkioDeviceWriteIOps": null,
"CpuPeriod": 0,
"CpuQuota": 0,
"CpuRealtimePeriod": 0,
"CpuRealtimeRuntime": 0,
"CpusetCpus": "",
"CpusetMems": "",
"Devices": null,
"DeviceCgroupRules": null,
"DiskQuota": 0,
"KernelMemory": 0,
"MemoryReservation": 0,
"MemorySwap": 0,
"MemorySwappiness": null,
"OomKillDisable": false,
"PidsLimit": 0,
"Ulimits": null,
"CpuCount": 0,
"CpuPercent": 0,
"IOMaximumIOps": 0,
"IOMaximumBandwidth": 0
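Note that every CPU-related field above (CpuShares, NanoCpus, CpuPeriod, CpuQuota, CpusetCpus, ...) is zero or empty, so no CPU limit is applied to this container at all. One way to cap it is the CFS quota, where quota divided by period gives the number of CPUs the container may consume; the container name nifi below is hypothetical, and the 2-CPU cap is just an example value:

```shell
# CFS quota arithmetic: quota / period = number of CPUs the container may use
period=100000                  # scheduling period in microseconds (Docker's default)
cpus=2                         # cap the container at 2 of the 8 cores
quota=$((cpus * period))
echo "docker update --cpu-period=$period --cpu-quota=$quota nifi"
```

Recent Docker versions can also express the same limit directly, e.g. docker update --cpus 2 nifi.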
Here is the top output:
$ top
top - 01:08:47 up 15 days, 3:23, 1 user, load average: 5.73, 5.58, 5.44
Tasks: 320 total, 4 running, 314 sleeping, 0 stopped, 2 zombie
%Cpu0 : 15.9 us, 6.2 sy, 0.0 ni, 77.2 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu1 : 26.8 us, 3.1 sy, 0.0 ni, 69.4 id, 0.0 wa, 0.0 hi, 0.7 si, 0.0 st
%Cpu2 : 8.9 us, 5.2 sy, 0.0 ni, 84.9 id, 0.0 wa, 0.0 hi, 1.0 si, 0.0 st
%Cpu3 : 13.8 us, 6.0 sy, 0.0 ni, 78.8 id, 0.0 wa, 0.0 hi, 1.4 si, 0.0 st
%Cpu4 : 34.4 us, 3.8 sy, 0.0 ni, 61.1 id, 0.3 wa, 0.0 hi, 0.3 si, 0.0 st
%Cpu5 : 81.3 us, 4.9 sy, 0.0 ni, 11.3 id, 0.0 wa, 0.0 hi, 2.5 si, 0.0 st
%Cpu6 : 64.8 us, 2.5 sy, 0.0 ni, 31.7 id, 0.0 wa, 0.0 hi, 1.1 si, 0.0 st
%Cpu7 : 77.4 us, 5.0 sy, 0.0 ni, 16.5 id, 0.0 wa, 0.0 hi, 1.1 si, 0.0 st
KiB Mem: 57803616 total, 51826940 used, 5976676 free, 1047628 buffers
KiB Swap: 16773116 total, 77040 used, 16696076 free. 19288708 cached Mem
In general, services running directly on the server, like a web application, take some amount of CPU and memory. Will Docker containers completely use up the CPU cycles and make those other services slow?
If yes, what would be the best way to make both kinds of services use resources properly, without heavily loading the system?
Thanks in advance!
I have a question about Apache Spark. I set up an Apache Spark standalone cluster on my Ubuntu desktop. Then I added two lines to the spark-env.sh file: SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1. (I found that export is not necessary in the spark-env.sh file if I start the cluster after editing it.)
I wanted to have 4 worker instances on my single desktop and to let each of them occupy 1 CPU core. The result was like this:
top - 14:37:54 up 2:35, 3 users, load average: 1.30, 3.60, 4.84
Tasks: 255 total, 1 running, 254 sleeping, 0 stopped, 0 zombie
%Cpu0 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 1.7 us, 0.3 sy, 0.0 ni, 98.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 41.6 us, 0.0 sy, 0.0 ni, 58.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 0.3 us, 0.3 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 59.0 us, 0.0 sy, 0.0 ni, 41.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 16369608 total, 11026436 used, 5343172 free, 62356 buffers
KiB Swap: 16713724 total, 360 used, 16713364 free. 2228576 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10829 aaaaaa 20 0 42.624g 1.010g 142408 S 101.2 6.5 0:22.78 java
10861 aaaaaa 20 0 42.563g 1.044g 142340 S 101.2 6.7 0:22.75 java
10831 aaaaaa 20 0 42.704g 1.262g 142344 S 100.8 8.1 0:24.86 java
10857 aaaaaa 20 0 42.833g 1.315g 142456 S 100.5 8.4 0:26.48 java
1978 aaaaaa 20 0 1462096 186480 102652 S 1.0 1.1 0:34.82 compiz
10720 aaaaaa 20 0 7159748 1.579g 32008 S 1.0 10.1 0:16.62 java
1246 root 20 0 326624 101148 65244 S 0.7 0.6 0:50.37 Xorg
1720 aaaaaa 20 0 497916 28968 20624 S 0.3 0.2 0:02.83 unity-panel-ser
2238 aaaaaa 20 0 654868 30920 23052 S 0.3 0.2 0:06.31 gnome-terminal
I think the java processes in the first 4 lines are Spark workers. If that's correct, it's nice that there are four Spark workers and each of them is using one physical core (e.g., 101.2%).
But I see that 5 physical cores are in use. Among them, CPU0, CPU3, and CPU7 are fully used; I assume one Spark worker is running on each of those physical cores. That's fine.
However, the usage levels of CPU2 and CPU6 are 41.6% and 59.0%, respectively. They add up to 100.6%, so I think one worker's job is distributed across those two physical cores.
With SPARK_WORKER_INSTANCES=4 and SPARK_WORKER_CORES=1, is this a normal situation? Or is this a sign of some error or problem?
This is perfectly normal behavior. Whenever Spark uses the term core it actually means either a process or a thread, and neither of those is bound to a single core or processor.
In any multitasking environment, processes are not executed continuously. Instead, the operating system constantly switches between processes, with each one getting only a small share of the available processor time.
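You can observe this directly: a process is only tied to particular cores if something explicitly pins it there, which Spark does not do. A small sketch using taskset (from util-linux, assuming it is installed):

```shell
# Pin a command to CPU 0 and read back the affinity list the child inherited;
# without the taskset prefix, the list would normally cover every online CPU.
taskset -c 0 grep Cpus_allowed_list /proc/self/status
```

The command should report an allowed-CPU list containing only CPU 0. Without pinning like this, the scheduler is free to migrate each worker thread between cores, which is exactly why one worker's 100% can show up as 41.6% on CPU2 plus 59.0% on CPU6.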
I’m experiencing small CPU leaks using GHC 7.8.3 and Yesod 1.4.9.
When I run my site with time and stop it (Ctrl+C) after 1 minute of doing nothing (just running, with no requests at all), it has consumed 1 second of CPU time. That represents approximately 1.7% of a CPU.
$ time mysite
^C
real 1m0.226s
user 0m1.024s
sys 0m0.060s
If I disable the idle garbage collector, it drops to 0.35 seconds (0.6% CPU). That’s better, but it still consumes CPU while doing nothing.
$ time mysite +RTS -I0 # Disable idle GC
^C
real 1m0.519s
user 0m0.352s
sys 0m0.064s
$ time mysite +RTS -I0
^C
real 4m0.676s
user 0m0.888s
sys 0m0.468s
$ time mysite +RTS -I0
^C
real 7m28.282s
user 0m1.452s
sys 0m0.976s
Compared to a cat command waiting indefinitely for something on the standard input:
$ time cat
^C
real 1m1.349s
user 0m0.000s
sys 0m0.000s
Is there anything else in Haskell that consumes CPU in the background?
Is it a leak in Yesod?
Or is it something I have done in my program? (I have only added handler functions; I don’t do any parallel computation.)
Edit 2015-05-31 19:25
Here’s the execution with the -s flag:
$ time mysite +RTS -I0 -s
^C 23,138,184 bytes allocated in the heap
4,422,096 bytes copied during GC
2,319,960 bytes maximum residency (4 sample(s))
210,584 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 30 colls, 0 par 0.00s 0.00s 0.0001s 0.0003s
Gen 1 4 colls, 0 par 0.03s 0.04s 0.0103s 0.0211s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.86s (224.38s elapsed)
GC time 0.03s ( 0.05s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.90s (224.43s elapsed)
Alloc rate 26,778,662 bytes per MUT second
Productivity 96.9% of total user, 0.4% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 3m44.447s
user 0m0.896s
sys 0m0.320s
And with profiling:
$ time mysite +RTS -I0
^C 23,024,424 bytes allocated in the heap
19,367,640 bytes copied during GC
2,319,960 bytes maximum residency (94 sample(s))
211,312 bytes maximum slop
6 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 27 colls, 0 par 0.00s 0.00s 0.0002s 0.0005s
Gen 1 94 colls, 0 par 1.09s 1.04s 0.0111s 0.0218s
TASKS: 5 (1 bound, 4 peak workers (4 total), using -N1)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 1.00s (201.66s elapsed)
GC time 1.07s ( 1.03s elapsed)
RP time 0.00s ( 0.00s elapsed)
PROF time 0.02s ( 0.02s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 2.09s (202.68s elapsed)
Alloc rate 23,115,591 bytes per MUT second
Productivity 47.7% of total user, 0.5% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
real 3m22.697s
user 0m2.088s
sys 0m0.060s
mysite.prof:
Sun May 31 19:16 2015 Time and Allocation Profiling Report (Final)
mysite +RTS -N -p -s -h -i0.1 -I0 -RTS
total time = 0.05 secs (49 ticks @ 1000 us, 1 processor)
total alloc = 17,590,528 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
MAIN MAIN 98.0 93.7
acquireSeedSystem.\.\ System.Random.MWC 2.0 0.0
toByteString Data.Serialize.Builder 0.0 3.9
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 5684 0 98.0 93.7 100.0 100.0
createSystemRandom System.Random.MWC 11396 0 0.0 0.0 2.0 0.3
withSystemRandom System.Random.MWC 11397 0 0.0 0.1 2.0 0.3
acquireSeedSystem System.Random.MWC 11399 0 0.0 0.0 2.0 0.2
acquireSeedSystem.\ System.Random.MWC 11401 1 0.0 0.2 2.0 0.2
acquireSeedSystem.\.\ System.Random.MWC 11403 1 2.0 0.0 2.0 0.0
sndS Data.Serialize.Put 11386 21 0.0 0.0 0.0 0.0
put Data.Serialize 11384 21 0.0 0.0 0.0 0.0
unPut Data.Serialize.Put 11383 21 0.0 0.0 0.0 0.0
toByteString Data.Serialize.Builder 11378 21 0.0 3.9 0.0 4.0
flush.\ Data.Serialize.Builder 11393 21 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 11388 0 0.0 0.0 0.0 0.0
withSize.\ Data.Serialize.Builder 11389 21 0.0 0.0 0.0 0.0
runBuilder Data.Serialize.Builder 11390 21 0.0 0.0 0.0 0.0
runBuilder Data.Serialize.Builder 11382 21 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11372 174 0.0 0.1 0.0 0.1
CAF GHC.IO.Encoding 11322 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 11319 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 11318 0 0.0 0.2 0.0 0.2
CAF GHC.Event.Thread 11304 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 11292 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 11288 0 0.0 0.0 0.0 0.0
CAF GHC.TopHandler 11284 0 0.0 0.0 0.0 0.0
CAF GHC.Event.Control 11271 0 0.0 0.0 0.0 0.0
CAF Main 11263 0 0.0 0.0 0.0 0.0
main Main 11368 1 0.0 0.0 0.0 0.0
CAF Application 11262 0 0.0 0.0 0.0 0.0
CAF Foundation 11261 0 0.0 0.0 0.0 0.0
CAF Model 11260 0 0.0 0.1 0.0 0.3
unstream/resize Data.Text.Internal.Fusion 11375 35 0.0 0.1 0.0 0.1
CAF Settings 11259 0 0.0 0.1 0.0 0.2
unstream/resize Data.Text.Internal.Fusion 11370 20 0.0 0.1 0.0 0.1
CAF Database.Persist.Postgresql 6229 0 0.0 0.3 0.0 0.9
unstream/resize Data.Text.Internal.Fusion 11373 93 0.0 0.6 0.0 0.6
CAF Database.PostgreSQL.Simple.Transaction 6224 0 0.0 0.0 0.0 0.0
CAF Database.PostgreSQL.Simple.TypeInfo.Static 6222 0 0.0 0.0 0.0 0.0
CAF Database.PostgreSQL.Simple.Internal 6219 0 0.0 0.0 0.0 0.0
CAF Yesod.Static 6210 0 0.0 0.0 0.0 0.0
CAF Crypto.Hash.Conduit 6193 0 0.0 0.0 0.0 0.0
CAF Yesod.Default.Config2 6192 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11371 1 0.0 0.0 0.0 0.0
CAF Yesod.Core.Internal.Util 6154 0 0.0 0.0 0.0 0.0
CAF Text.Libyaml 6121 0 0.0 0.0 0.0 0.0
CAF Data.Yaml 6120 0 0.0 0.0 0.0 0.0
CAF Data.Yaml.Internal 6119 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11369 1 0.0 0.0 0.0 0.0
CAF Database.Persist.Quasi 6055 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11376 1 0.0 0.0 0.0 0.0
CAF Database.Persist.Sql.Internal 6046 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11377 6 0.0 0.0 0.0 0.0
CAF Data.Pool 6036 0 0.0 0.0 0.0 0.0
CAF Network.HTTP.Client.TLS 6014 0 0.0 0.0 0.0 0.0
CAF System.X509.Unix 6010 0 0.0 0.0 0.0 0.0
CAF Crypto.Hash.MD5 5927 0 0.0 0.0 0.0 0.0
CAF Data.Serialize 5873 0 0.0 0.0 0.0 0.0
put Data.Serialize 11385 1 0.0 0.0 0.0 0.0
CAF Data.Serialize.Put 5872 0 0.0 0.0 0.0 0.0
withSize Data.Serialize.Builder 11387 1 0.0 0.0 0.0 0.0
CAF Data.Serialize.Builder 5870 0 0.0 0.0 0.0 0.0
flush Data.Serialize.Builder 11392 1 0.0 0.0 0.0 0.0
toByteString Data.Serialize.Builder 11391 0 0.0 0.0 0.0 0.0
defaultSize Data.Serialize.Builder 11379 1 0.0 0.0 0.0 0.0
defaultSize.overhead Data.Serialize.Builder 11381 1 0.0 0.0 0.0 0.0
defaultSize.k Data.Serialize.Builder 11380 1 0.0 0.0 0.0 0.0
CAF Crypto.Random.Entropy.Unix 5866 0 0.0 0.0 0.0 0.0
CAF Network.HTTP.Client.Manager 5861 0 0.0 0.0 0.0 0.0
unstream/resize Data.Text.Internal.Fusion 11374 3 0.0 0.0 0.0 0.0
CAF System.Random.MWC 5842 0 0.0 0.0 0.0 0.0
coff System.Random.MWC 11405 1 0.0 0.0 0.0 0.0
ioff System.Random.MWC 11404 1 0.0 0.0 0.0 0.0
acquireSeedSystem System.Random.MWC 11398 1 0.0 0.0 0.0 0.0
acquireSeedSystem.random System.Random.MWC 11402 1 0.0 0.0 0.0 0.0
acquireSeedSystem.nbytes System.Random.MWC 11400 1 0.0 0.0 0.0 0.0
createSystemRandom System.Random.MWC 11394 1 0.0 0.0 0.0 0.0
withSystemRandom System.Random.MWC 11395 1 0.0 0.0 0.0 0.0
CAF Data.Streaming.Network.Internal 5833 0 0.0 0.0 0.0 0.0
CAF Data.Scientific 5728 0 0.0 0.1 0.0 0.1
CAF Data.Text.Array 5722 0 0.0 0.0 0.0 0.0
CAF Data.Text.Internal 5718 0 0.0 0.0 0.0 0.0
Edit 2015-06-01 08:40
You can browse the source code in the following repository → https://github.com/Zigazou/Ouep
Found a related bug in the Yesod bug tracker. Ran my program like this:
myserver +RTS -I0 -RTS Development
And now idle CPU usage is down to almost nothing, compared to 14% or so before (on an ARM computer). The -I0 option (that's I and zero) turns off periodic garbage collection, which defaults to every 0.3 seconds, I think. I'm not sure about the implications for app responsiveness or memory usage, but for me at least this was definitely the culprit.
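If something in the app depends on the idle GC (it is, for example, what lets the RTS notice deadlocked threads while the program is otherwise quiet), a middle ground is to lengthen its interval instead of disabling it. The interval below is an arbitrary example value, not a recommendation:

```shell
# Run the idle GC at most once every 60 seconds instead of every 0.3 s
myserver +RTS -I60 -RTS Development
```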
Is there any hidden option that will put cost centres in libraries? Currently I have set up my profiling like this:
cabal:
ghc-prof-options: -O2
-threaded
-fexcess-precision
-fprof-auto
-rtsopts
"-with-rtsopts=-N -p -s -h -i0.1"
exec:
# cabal sandbox init
# cabal install --enable-library-profiling --enable-executable-profiling
# cabal configure --enable-library-profiling --enable-executable-profiling
# cabal run
This works and creates the expected .prof file, .hp file and the summary when the program finishes.
The problem is that the .prof file doesn't contain anything that doesn't belong to the current project. My guess is that there is probably an option that will add cost centres to external library code?
My guess is that there is probably an option that will add cost centres to external library code?
Well, not by default. You need to add the cost centres when you compile the dependency. However, you can add -fprof-auto to the GHC options during cabal install:
$ cabal sandbox init
$ cabal install --ghc-option=-fprof-auto -p --enable-executable-profiling
Example
An example using the code from this question, saved in SO.hs:
$ cabal sandbox init
$ cabal install vector -p --ghc-options=-fprof-auto
$ cabal exec -- ghc --make SO.hs -prof -fprof-auto -O2
$ ./SO /usr/share/dict/words +RTS -s -p
$ cat SO.prof
Tue Dec 2 15:01 2014 Time and Allocation Profiling Report (Final)
Test +RTS -s -p -RTS /usr/share/dict/words
total time = 0.70 secs (698 ticks @ 1000 us, 1 processor)
total alloc = 618,372,952 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
letterCount Main 40.3 24.3
letterCount.letters1 Main 13.2 18.2
basicUnsafeWrite Data.Vector.Primitive.Mutable 10.0 12.1
basicUnsafeWrite Data.Vector.Unboxed.Base 7.2 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 5.4 4.9
>>= Data.Vector.Fusion.Util 5.0 13.4
basicUnsafeIndexM Data.Vector.Unboxed.Base 4.9 0.0
basicUnsafeIndexM Data.Vector.Primitive 2.7 4.9
basicUnsafeIndexM Data.Vector.Unboxed.Base 2.3 0.0
letterCount.letters1.\ Main 2.0 2.4
>>= Data.Vector.Fusion.Util 1.9 6.1
basicUnsafeWrite Data.Vector.Unboxed.Base 1.7 0.0
letterCount.\ Main 1.3 2.4
readByteArray# Data.Primitive.Types 0.3 2.4
basicUnsafeNew Data.Vector.Primitive.Mutable 0.0 1.2
individual inherited
COST CENTRE MODULE no. entries %time %alloc %time %alloc
MAIN MAIN 72 0 0.0 0.0 100.0 100.0
main Main 145 0 0.1 0.2 99.9 100.0
main.counts Main 148 1 0.0 0.0 99.3 99.6
letterCount Main 149 1 40.3 24.3 99.3 99.6
basicUnsafeFreeze Data.Vector.Unboxed.Base 257 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 259 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 258 1 0.0 0.0 0.0 0.0
letterCount.\ Main 256 938848 1.3 2.4 1.3 2.4
basicUnsafeWrite Data.Vector.Unboxed.Base 252 938848 1.3 0.0 5.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 253 938848 3.7 6.1 3.7 6.1
writeByteArray# Data.Primitive.Types 255 938848 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 254 938848 0.0 0.0 0.0 0.0
basicUnsafeRead Data.Vector.Unboxed.Base 248 938848 0.7 0.0 6.6 7.3
basicUnsafeRead Data.Vector.Primitive.Mutable 249 938848 5.4 4.9 5.9 7.3
readByteArray# Data.Primitive.Types 251 938848 0.3 2.4 0.3 2.4
primitive Control.Monad.Primitive 250 938848 0.1 0.0 0.1 0.0
>>= Data.Vector.Fusion.Util 243 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 242 938848 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 237 938848 4.9 0.0 11.7 10.9
>>= Data.Vector.Fusion.Util 247 938848 1.9 6.1 1.9 6.1
basicUnsafeIndexM Data.Vector.Unboxed.Base 238 938848 2.3 0.0 5.0 4.9
basicUnsafeIndexM Data.Vector.Primitive 239 938848 2.7 4.9 2.7 4.9
indexByteArray# Data.Primitive.Types 240 938848 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 236 938849 3.4 7.3 3.4 7.3
unId Data.Vector.Fusion.Util 235 938849 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 234 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive.Mutable 233 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 222 1 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 223 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.ByteArray 226 3 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 214 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 215 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 212 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 220 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 216 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 217 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 218 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 219 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 211 1 0.0 0.0 0.0 0.0
letterCount.len Main 178 1 0.0 0.0 0.0 0.0
letterCount.letters1 Main 177 1 13.2 18.2 30.9 41.3
basicUnsafeFreeze Data.Vector.Unboxed.Base 204 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 210 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 207 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 206 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 205 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 208 0 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 200 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 203 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 201 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 202 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 193 938848 7.2 7.3 14.2 13.4
basicUnsafeWrite Data.Vector.Unboxed.Base 198 938848 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 194 938848 0.4 0.0 7.0 6.1
basicUnsafeWrite Data.Vector.Primitive.Mutable 195 938848 6.3 6.1 6.6 6.1
writeByteArray# Data.Primitive.Types 197 938848 0.3 0.0 0.3 0.0
primitive Control.Monad.Primitive 196 938848 0.0 0.0 0.0 0.0
letterCount.letters1.\ Main 192 938848 2.0 2.4 2.0 2.4
>>= Data.Vector.Fusion.Util 191 938848 1.6 6.1 1.6 6.1
unId Data.Vector.Fusion.Util 190 938849 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 180 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 179 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Unboxed.Base 189 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 187 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 182 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 181 1 0.0 0.0 0.0 1.2
basicUnsafeNew Data.Vector.Primitive.Mutable 183 0 0.0 1.2 0.0 1.2
sizeOf Data.Primitive 184 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 185 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 186 1 0.0 0.0 0.0 0.0
printCounts Main 146 1 0.4 0.2 0.4 0.2
basicUnsafeIndexM Data.Vector.Unboxed.Base 266 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 267 0 0.0 0.0 0.0 0.0
indexByteArray# Data.Primitive.Types 268 256 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Primitive 265 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 264 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 263 256 0.0 0.0 0.0 0.0
basicLength Data.Vector.Unboxed.Base 262 1 0.0 0.0 0.0 0.0
basicLength Data.Vector.Primitive 261 1 0.0 0.0 0.0 0.0
CAF Main 143 0 0.0 0.0 0.0 0.0
main Main 144 1 0.0 0.0 0.0 0.0
main.counts Main 150 0 0.0 0.0 0.0 0.0
letterCount Main 151 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 244 0 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 245 0 0.0 0.0 0.0 0.0
basicUnsafeIndexM Data.Vector.Unboxed.Base 246 0 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 224 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Unboxed.Base 173 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 175 1 0.0 0.0 0.0 0.0
basicUnsafeFreeze Data.Vector.Primitive 174 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Unboxed.Base 171 1 0.0 0.0 0.0 0.0
basicUnsafeSlice Data.Vector.Primitive.Mutable 172 1 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Unboxed.Base 167 256 0.0 0.0 0.0 0.0
basicUnsafeWrite Data.Vector.Primitive.Mutable 168 256 0.0 0.0 0.0 0.0
writeByteArray# Data.Primitive.Types 170 256 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 169 256 0.0 0.0 0.0 0.0
>>= Data.Vector.Fusion.Util 165 256 0.0 0.0 0.0 0.0
unId Data.Vector.Fusion.Util 164 257 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Unboxed.Base 156 1 0.0 0.0 0.0 0.0
primitive Control.Monad.Primitive 162 1 0.0 0.0 0.0 0.0
basicUnsafeNew Data.Vector.Primitive.Mutable 157 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 158 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 159 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 160 1 0.0 0.0 0.0 0.0
upperBound Data.Vector.Fusion.Stream.Size 153 1 0.0 0.0 0.0 0.0
elemseq Data.Vector.Unboxed.Base 152 1 0.0 0.0 0.0 0.0
printCounts Main 147 0 0.0 0.0 0.0 0.0
CAF Data.Vector.Internal.Check 142 0 0.0 0.0 0.0 0.0
doBoundsChecks Data.Vector.Internal.Check 213 1 0.0 0.0 0.0 0.0
doUnsafeChecks Data.Vector.Internal.Check 155 1 0.0 0.0 0.0 0.0
doInternalChecks Data.Vector.Internal.Check 154 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Fusion.Util 141 0 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 241 1 0.0 0.0 0.0 0.0
return Data.Vector.Fusion.Util 166 1 0.0 0.0 0.0 0.0
CAF Data.Vector.Unboxed.Base 136 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Unboxed.Base 227 0 0.0 0.0 0.0 0.0
basicUnsafeCopy Data.Vector.Primitive 228 0 0.0 0.0 0.0 0.0
basicUnsafeCopy.sz Data.Vector.Primitive 229 1 0.0 0.0 0.0 0.0
sizeOf Data.Primitive 230 1 0.0 0.0 0.0 0.0
sizeOf# Data.Primitive.Types 231 1 0.0 0.0 0.0 0.0
unI# Data.Primitive.Types 232 1 0.0 0.0 0.0 0.0
CAF Data.Primitive.MachDeps 128 0 0.0 0.0 0.0 0.0
sIZEOF_INT Data.Primitive.MachDeps 161 1 0.0 0.0 0.0 0.0
CAF Text.Printf 118 0 0.0 0.0 0.0 0.0
CAF GHC.Conc.Signal 112 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Handle.FD 109 0 0.1 0.0 0.1 0.0
CAF GHC.IO.Encoding 99 0 0.0 0.0 0.0 0.0
CAF GHC.IO.Encoding.Iconv 98 0 0.0 0.0 0.0 0.0
CAF GHC.IO.FD 95 0 0.0 0.0 0.0 0.0
Unfortunately, you cannot specify --ghc-option=… as a flag for only the dependencies.
You also need -prof.
The GHC User's Guide says: "There are a few other profiling-related compilation options. Use them in addition to -prof. These do not have to be used consistently for all modules in a program."
free -m
total used free shared buffers cached
Mem: 7974 6993 981 0 557 893
-/+ buffers/cache: 5542 2432
Swap: 2047 0 2047
You can see that my system has used 5542 MB of memory, but when I use ps aux to check what is using it, I can't figure it out.
ps aux | awk '$6 > 0{print $3, $4, $5, $6}'
%CPU %MEM VSZ RSS
0.0 0.0 10344 700
0.0 0.0 51172 2092
0.0 0.0 51172 1032
0.0 0.0 68296 1600
0.0 0.0 12692 872
0.0 0.0 33840 864
0.0 0.0 10728 376
0.0 0.0 8564 648
0.0 0.0 74856 1132
53.2 0.5 930408 45824
0.0 0.0 24236 1768
0.0 0.0 51172 2100
0.0 0.0 51172 1040
0.0 0.0 68296 1600
51.9 0.5 864348 42740
0.0 0.0 34360 2672
0.0 0.0 3784 528
0.0 0.0 3784 532
0.0 0.0 3784 528
0.0 0.0 3784 528
0.0 0.0 3784 532
0.0 0.0 65604 900
0.0 0.0 63916 832
0.0 0.0 94020 5980
0.0 0.0 3836 468
0.0 0.0 93736 4000
0.0 0.0 3788 484
0.0 0.0 3652 336
0.0 0.0 3652 336
0.0 0.0 3684 344
0.0 0.0 3664 324
0.0 0.0 19184 4880
0.0 0.0 3704 324
0.0 0.0 340176 1312
0.0 0.0 46544 816
0.0 0.0 10792 1092
0.0 0.0 3824 400
0.0 0.0 3640 292
0.0 0.0 3652 332
0.0 0.0 3652 332
0.0 0.0 3664 328
0.0 0.0 4264 1004
0.0 0.0 4584 2368
0.0 0.0 77724 3060
0.0 0.0 89280 2704
As you can see, the sum of RSS is 152.484 MB and the sum of VSZ is 3376.34 MB, so I don't know what ate up the rest of the memory. The kernel?
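Incidentally, the RSS total above can be computed in one pipeline instead of by hand (RSS is column 6 of ps aux output, in KiB):

```shell
# Sum resident set size over all processes and report it in MiB
ps aux | awk 'NR > 1 { rss += $6 } END { printf "%.1f MiB total RSS\n", rss / 1024 }'
```

Even so, an RSS sum overcounts shared pages and ignores kernel memory, so it rarely matches what free reports.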
From my system:
$ grep ^S[^wh] /proc/meminfo
Slab: 4707412 kB
SReclaimable: 4602900 kB
SUnreclaim: 104512 kB
These three metrics are data structures held by the slab allocator. While SUnreclaim is, well, unreclaimable, SReclaimable is just like any other cache in the system: it will be made available to processes under memory pressure. Unfortunately, free does not seem to take it into account, as mentioned in detail in this older answer of mine, and this part of memory can easily grow to several GB...
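These counters come straight from /proc/meminfo, so you can check them on your own system with a one-liner like:

```shell
# Print the slab counters from /proc/meminfo (values are in kB)
awk '/^(Slab|SReclaimable|SUnreclaim):/ { print $1, $2, $3 }' /proc/meminfo
```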
If you really want to see how much memory your processes are using, you could try going through the cache-emptying procedure described in my post. You can skip the swap-related parts, since your system does not appear to be using any swap anyway.