How should I interpret the output of the GHC heap profiler? - haskell

I have a server process implemented in Haskell that acts as a simple in-memory DB. Client processes can connect, then add and retrieve data. The service uses more memory than I would expect, and I'm attempting to work out why.
The crudest metric I have is Linux "top". When I start the process I see a "VIRT" image size of ~27MB. After running a client to insert 60,000 data items, I see an image size of ~124MB.
Running the process to capture GC statistics (+RTS -S), I see initially
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
28296 8388 9172 0.00 0.00 0.00 0.32 0 0 (Gen: 1)
and on adding the 60k items I see the live bytes grow smoothly to
...
532940 14964 63672180 0.00 0.00 23.50 31.95 0 0 (Gen: 0)
532316 7704 63668672 0.00 0.00 23.50 31.95 0 0 (Gen: 0)
530512 9648 63677028 0.00 0.00 23.50 31.95 0 0 (Gen: 0)
531936 10796 63686488 0.00 0.00 23.51 31.96 0 0 (Gen: 0)
423260 10047016 63680532 0.03 0.03 23.53 31.99 0 0 (Gen: 1)
531864 6996 63693396 0.00 0.00 23.55 32.01 0 0 (Gen: 0)
531852 9160 63703536 0.00 0.00 23.55 32.01 0 0 (Gen: 0)
531888 9572 63711876 0.00 0.00 23.55 32.01 0 0 (Gen: 0)
531928 9716 63720128 0.00 0.00 23.55 32.01 0 0 (Gen: 0)
531856 9640 63728052 0.00 0.00 23.55 32.02 0 0 (Gen: 0)
529632 9280 63735824 0.00 0.00 23.56 32.02 0 0 (Gen: 0)
527948 8304 63742524 0.00 0.00 23.56 32.02 0 0 (Gen: 0)
528248 7152 63749180 0.00 0.00 23.56 32.02 0 0 (Gen: 0)
528240 6384 63756176 0.00 0.00 23.56 32.02 0 0 (Gen: 0)
341100 10050336 63731152 0.03 0.03 23.58 32.35 0 0 (Gen: 1)
5080 10049728 63705868 0.03 0.03 23.61 32.70 0 0 (Gen: 1)
This appears to be telling me that the heap has ~63MB of live data. That could well be consistent with the numbers from top, once you add on stack space, code space, GC overhead, etc.
So I attempted to use the heap profiler to work out what's making up
this 63MB. The results are confusing. Running with "+RTS -h", and looking at the
generated hp file, the last and largest snapshot has:
containers-0.3.0.0:Data.Map.Bin 1820400
bytestring-0.9.1.7:Data.ByteString.Internal.PS 1336160
main:KV.Store.Memory.KeyTree 831972
main:KV.Types.KF_1 750328
base:GHC.ForeignPtr.PlainPtr 534464
base:Data.Maybe.Just 494832
THUNK 587140
All of the other numbers in the snapshot are much smaller than this.
Adding these up gives a peak memory usage of ~6MB, which is also what the chart output shows.
Why is this inconsistent with the live bytes as shown in the GC statistics? It's
hard to see how my data structures could be requiring 63MB, and the
profiler says they are not. Where is the memory going?
Thanks for any tips or pointers on this.
Tim

I have a theory. My theory is that your program is using a lot of something like ByteStrings, and that because the main content of a ByteString is malloc'ed, it is not displayed while profiling. Thus you could run out of heap without the largest content of your heap showing up on the profiling graph.
To make matters even worse, when you grab substrings of ByteStrings, they by default retain the pointer to the originally allocated block of memory. So even if you are trying to store only a small fragment of some ByteString, you could end up retaining the whole of the originally allocated ByteString, and this won't show up on your heap profile.
That is my theory, anyway. I don't know enough about how GHC's heap profiler works or about how ByteStrings are implemented to know for certain. Maybe someone else can chime in and confirm or dispute my theory.
Edit 2: tibbe notes that the buffers used by ByteStrings are pinned. So if you are allocating/freeing lots of small ByteStrings, you can fragment your heap, meaning you run out of usable heap with half of it unallocated.
Edit: JaffaCake tells me that sometimes the heap profiler will not display the memory allocated by ByteStrings.
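If the substring-retention part of this theory applies to your code, one thing worth trying is copying fragments before storing them. A minimal sketch, assuming a hypothetical storeFragment function and slice length (neither is from the question):
import qualified Data.ByteString as B
-- B.take/B.drop share the parent ByteString's buffer, so storing a
-- slice keeps the whole original allocation alive; B.copy allocates a
-- fresh buffer holding only the slice.
storeFragment :: B.ByteString -> B.ByteString
storeFragment bs = B.copy (B.take 16 bs)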

You should use, e.g., hp2ps to get a graphical view of what's going on. Looking at the raw hp file is difficult.
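For example, if your profile ended up in server.hp (the file name here is just an assumption), running hp2ps -c server.hp should produce a server.ps chart you can open in a PostScript viewer.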

Not everything is included in the profile by default; for example, threads and stacks are not. Try with +RTS -xt.
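For example (the binary name is assumed here): ./server +RTS -h -xt -S, which should include thread objects (TSOs) and their stacks in the generated .hp file alongside the -S GC statistics.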

Related

How are Cassandra sessions allocated?

I see a Cassandra Java process on my Linux instance; it uses ~38 GB of memory and shows ~700 threads under it.
When connections are made to the database via Python or Java, do they become threads under the main Java process or separate OS processes?
When a cluster connection spawns multiple threads, do they also become threads under the main process? If so, how do I distinguish between connection threads and connection-spawning threads?
Does the memory allocated for session threads get allocated under non-heap memory?
Update:
@chris - here is the output of tpstats:
[username#hostname ~]$ nodetool tpstats
Pool Name Active Pending Completed Blocked All time blocked
ReadStage 0 0 110336013 0 0
ContinuousPagingStage 0 0 31 0 0
MiscStage 0 0 0 0 0
CompactionExecutor 0 0 4244757 0 0
MutationStage 0 0 25309020 0 0
GossipStage 0 0 2484700 0 0
RequestResponseStage 0 0 46705216 0 0
ReadRepairStage 0 0 2193356 0 0
CounterMutationStage 0 0 3563130 0 0
MemtablePostFlush 0 0 117717 0 0
ValidationExecutor 1 1 111176 0 0
MemtableFlushWriter 0 0 23843 0 0
ViewMutationStage 0 0 0 0 0
CacheCleanupExecutor 0 0 0 0 0
Repair#1953 1 3 1 0 0
MemtableReclaimMemory 0 0 28251 0 0
PendingRangeCalculator 0 0 6 0 0
AntiCompactionExecutor 0 0 0 0 0
SecondaryIndexManagement 0 0 0 0 0
HintsDispatcher 0 0 29 0 0
Native-Transport-Requests 0 0 110953286 0 0
MigrationStage 0 0 19 0 0
PerDiskMemtableFlushWriter_0 0 0 27853 0 0
Sampler 0 0 0 0 0
InternalResponseStage 0 0 21264 0 0
AntiEntropyStage 0 0 350913 0 0
Message type Dropped Latency waiting in queue (micros)
50% 95% 99% Max
READ 0 0.00 0.00 0.00 10090.81
RANGE_SLICE 0 0.00 0.00 10090.81 10090.81
_TRACE 0 N/A N/A N/A N/A
HINT 0 0.00 0.00 0.00 0.00
MUTATION 0 0.00 0.00 0.00 10090.81
COUNTER_MUTATION 0 0.00 0.00 0.00 10090.81
BATCH_STORE 0 0.00 0.00 0.00 0.00
BATCH_REMOVE 0 0.00 0.00 0.00 0.00
REQUEST_RESPONSE 0 0.00 0.00 0.00 12108.97
PAGED_RANGE 0 N/A N/A N/A N/A
READ_REPAIR 0 0.00 0.00 0.00 0.00
The connections go to a Netty service which should have threads equal to the number of cores, even if you have 10,000 connected clients. However, Cassandra was initially designed with a Staged Event Driven Architecture (SEDA), which sits somewhere between an async and a fully threaded model. It creates pools of threads to handle different types of tasks.
This does mean, however, that depending on the configuration in your yaml there can be a lot of threads. For example, by default there are up to 128 threads for the native transport pool, 32 concurrent readers, 32 concurrent writers, 32 counter mutations, etc., but if your cluster was tuned for SSDs these might be higher. With that in mind, a number of these pools use a shared pool (they show up as SharedWorkers) with the SEPExecutor (single executor pool). So during spikes many threads might be created, but they may not be utilized often.
nodetool tpstats will give you details on the different pools and how many threads are active, which can help identify which threads are being used, and whether they are being used at all. If that isn't enough, you can also use jstack (run it as the same user as the Cassandra process) to dump the stack traces. If it's too much to look through, there are tools like https://fastthread.io/ to make viewing it easier.
For what it is worth, 32 GB of memory and 700 threads doesn't sound like an issue.

Figuring out Linux memory usage

I've got a bit of a weird Linux memory usage situation I'm trying to figure out.
I've got two processes: nxtcapture and nxtexport. Neither of these processes really allocates much memory, but they each mmap a 1 TB file. nxtexport has no heap allocations (apart from during startup). nxtcapture writes sequentially to its file and nxtexport reads sequentially. Since nxtexport reads from the tail of what nxtcapture writes, I don't really have any read IO.
ing992:~# iostat -m
Linux 4.4.52-nxt (ing992) 05/25/17 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
29.17 1.99 0.96 0.06 0.00 67.82
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
loop0 0.02 0.00 0.00 2 0
sdf 16.47 0.06 0.85 4207 61442
sdf1 0.00 0.00 0.00 5 0
sdf2 0.01 0.00 0.00 77 0
sdf3 16.45 0.06 0.85 4115 61442
sdf4 0.00 0.00 0.00 7 0
sde 15.45 0.01 0.85 1032 61442
sde1 0.00 0.00 0.00 5 0
sde2 0.00 0.00 0.00 0 0
sde3 15.44 0.01 0.85 1017 61442
sde4 0.00 0.00 0.00 7 0
sdb 43.08 0.00 15.72 22 1136368
sda 43.07 0.00 15.72 21 1136406
sdc 43.42 0.04 15.72 2711 1136332
sdd 43.07 0.00 15.72 20 1136301
md127 0.01 0.00 0.00 77 0
md126 23.77 0.07 0.85 5132 61145
This is all great. However, looking at the memory usage I can see the following:
free -m
total used free shared buffers cached
Mem: 32020 31608 412 221 13 9655
-/+ buffers/cache: 21939 10081
Swap: 0 0 0
This shows that more than half of my memory is unavailable?! How is this possible? I understand that mmap will keep pages cached. But shouldn't such (non-dirty) pages be counted as available? What's going on here? How can I debug this?

Read mixed text file in MATLAB

I have a text file which contains numbers and characters, and, more importantly, it also has * which means repetition. For example:
data
-- comment
34*0.00 0.454 0.223
0.544 5*4.866
/
The above example starts with 34 zeros (34*0.00), then 0.454, then 0.223, then 0.544, and then 4.866 repeated 5 times, which means it has 34 + 1 + 1 + 1 + 5 = 42 numeric values. What is the best way to write a general code that can read such text files? Nothing else matters in the text file; only the numbers are relevant.
The first step is to read the data in. I'm assuming that the contents of your file look like this:
-- comment
34*0.00 0.454 0.223
0.544 5*4.866
For that format, you can use textscan like so:
fid = fopen('data.txt');
data = textscan(fid, '%s', 'CommentStyle', '--');
fclose(fid);
data = data{1};
And data will look like this when displayed:
data =
5×1 cell array
'34*0.00'
'0.454'
'0.223'
'0.544'
'5*4.866'
Now, there are a few different ways you could try to convert this into numeric data of the format you need. One (potentially horrifying) way is to use regexprep like so:
>> data = regexprep(data, '([\d\.]+)\*([\d\.]+)', ...
'${repmat([$2 blanks(1)], 1, str2num($1))}')
data =
5×1 cell array
'0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.0…'
'0.454'
'0.223'
'0.544'
'4.866 4.866 4.866 4.866 4.866 '
As you can see, it replicates each string in place as needed. Now, we can convert each cell of the cell array to a numeric value and concatenate them all together like so, using cellfun and str2num:
>> num = cellfun(@str2num, data, 'UniformOutput', false);
>> num = [num{:}]
num =
Columns 1 through 14
0 0 0 0 0 0 0 0 0 0 0 0 0 0
Columns 15 through 28
0 0 0 0 0 0 0 0 0 0 0 0 0 0
Columns 29 through 42
0 0 0 0 0 0 0.4540 0.2230 0.5440 4.8660 4.8660 4.8660 4.8660 4.8660

varnish round robin director not picking backends

I have varnish setup with 2 backend servers with a round-robin director.
The 2 backends are showing up in varnishstat and varnishadm as healthy.
varnishadm output:
Backend name Admin Probe
boot.app1 probe Healthy 5/5
boot.app2 probe Healthy 5/5
VCL Configuration:
probe ping {
.interval = 5s;
.timeout = 1s;
.threshold = 3;
.window = 5;
.url = "/ping";
}
backend app1 {
.host = "app-1.example.com";
.port = "80";
.probe = ping;
}
backend app2 {
.host = "app-2.example.com";
.port = "80";
.probe = ping;
}
new application_servers = directors.round_robin();
application_servers.add_backend(app1);
application_servers.add_backend(app2);
set req.backend_hint = application_servers;
varnishstat output:
VBE.boot.app1.happy ffffffffff VVVVVVVVVVVVVVVVVVVVVVVV
VBE.boot.app1.bereq_hdrbytes 66.17K 0.00 91.00 0.00 0.00 0.00
VBE.boot.app1.beresp_hdrbytes 76.72K 0.00 106.00 0.00 0.00 0.00
VBE.boot.app1.beresp_bodybytes 11.91M 0.00 16.50K 0.00 0.00 0.00
VBE.boot.app1.conn 251 0.00 . 251.00 251.00 251.00
VBE.boot.app1.req 251 0.00 . 0.00 0.00 0.00
VBE.boot.app2.happy ffffffffff VVVVVVVVVVVVVVVVVVVVVVVV
You can see from the varnishstat command that traffic appears to only be sent to the first server in the round-robin configuration. There are no other lines for the app2 server other than .happy.
Any thoughts on what would be causing the director to pick the first server every time?
Varnishstat -1 Output
MAIN.uptime 218639 1.00 Child process uptime
MAIN.sess_conn 5253150 24.03 Sessions accepted
MAIN.sess_drop 0 0.00 Sessions dropped
MAIN.sess_fail 0 0.00 Session accept failures
MAIN.client_req_400 0 0.00 Client requests received, subject to 400 errors
MAIN.client_req_417 0 0.00 Client requests received, subject to 417 errors
MAIN.client_req 1174495 5.37 Good client requests received
MAIN.cache_hit 61 0.00 Cache hits
MAIN.cache_hitpass 395 0.00 Cache hits for pass
MAIN.cache_miss 1927 0.01 Cache misses
MAIN.backend_conn 0 0.00 Backend conn. success
MAIN.backend_unhealthy 0 0.00 Backend conn. not attempted
MAIN.backend_busy 0 0.00 Backend conn. too many
MAIN.backend_fail 0 0.00 Backend conn. failures
MAIN.backend_reuse 7720 0.04 Backend conn. reuses
MAIN.backend_recycle 8926 0.04 Backend conn. recycles
MAIN.backend_retry 0 0.00 Backend conn. retry
MAIN.fetch_head 0 0.00 Fetch no body (HEAD)
MAIN.fetch_length 1350 0.01 Fetch with Length
MAIN.fetch_chunked 7572 0.03 Fetch chunked
MAIN.fetch_eof 3 0.00 Fetch EOF
MAIN.fetch_bad 0 0.00 Fetch bad T-E
MAIN.fetch_none 0 0.00 Fetch no body
MAIN.fetch_1xx 0 0.00 Fetch no body (1xx)
MAIN.fetch_204 0 0.00 Fetch no body (204)
MAIN.fetch_304 27 0.00 Fetch no body (304)
MAIN.fetch_failed 0 0.00 Fetch failed (all causes)
MAIN.fetch_no_thread 0 0.00 Fetch failed (no thread)
MAIN.pools 2 . Number of thread pools
MAIN.threads 20 . Total number of threads
MAIN.threads_limited 0 0.00 Threads hit max
MAIN.threads_created 1377 0.01 Threads created
MAIN.threads_destroyed 1357 0.01 Threads destroyed
MAIN.threads_failed 0 0.00 Thread creation failed
MAIN.thread_queue_len 0 . Length of session queue
MAIN.busy_sleep 3 0.00 Number of requests sent to sleep on busy objhdr
MAIN.busy_wakeup 3 0.00 Number of requests woken after sleep on busy objhdr
MAIN.busy_killed 0 0.00 Number of requests killed after sleep on busy objhdr
MAIN.sess_queued 1728 0.01 Sessions queued for thread
MAIN.sess_dropped 0 0.00 Sessions dropped for thread
MAIN.n_object 135 . object structs made
MAIN.n_vampireobject 0 . unresurrected objects
MAIN.n_objectcore 141 . objectcore structs made
MAIN.n_objecthead 146 . objecthead structs made
MAIN.n_waitinglist 17 . waitinglist structs made
MAIN.n_backend 6 . Number of backends
MAIN.n_expired 840 . Number of expired objects
MAIN.n_lru_nuked 0 . Number of LRU nuked objects
MAIN.n_lru_moved 52 . Number of LRU moved objects
MAIN.losthdr 0 0.00 HTTP header overflows
MAIN.s_sess 5253150 24.03 Total sessions seen
MAIN.s_req 1174495 5.37 Total requests seen
MAIN.s_pipe 0 0.00 Total pipe sessions seen
MAIN.s_pass 7025 0.03 Total pass-ed requests seen
MAIN.s_fetch 8952 0.04 Total backend fetches initiated
MAIN.s_synth 1165482 5.33 Total synthethic responses made
MAIN.s_req_hdrbytes 58007743 265.31 Request header bytes
MAIN.s_req_bodybytes 8324 0.04 Request body bytes
MAIN.s_resp_hdrbytes 250174363 1144.23 Response header bytes
MAIN.s_resp_bodybytes 658785662 3013.12 Response body bytes
MAIN.s_pipe_hdrbytes 0 0.00 Pipe request header bytes
MAIN.s_pipe_in 0 0.00 Piped bytes from client
MAIN.s_pipe_out 0 0.00 Piped bytes to client
MAIN.sess_closed 1170177 5.35 Session Closed
MAIN.sess_closed_err 5244623 23.99 Session Closed with error
MAIN.sess_readahead 0 0.00 Session Read Ahead
MAIN.sess_herd 3208 0.01 Session herd
MAIN.sc_rem_close 3518 0.02 Session OK REM_CLOSE
MAIN.sc_req_close 0 0.00 Session OK REQ_CLOSE
MAIN.sc_req_http10 1165458 5.33 Session Err REQ_HTTP10
MAIN.sc_rx_bad 0 0.00 Session Err RX_BAD
MAIN.sc_rx_body 0 0.00 Session Err RX_BODY
MAIN.sc_rx_junk 4079015 18.66 Session Err RX_JUNK
MAIN.sc_rx_overflow 0 0.00 Session Err RX_OVERFLOW
MAIN.sc_rx_timeout 276 0.00 Session Err RX_TIMEOUT
MAIN.sc_tx_pipe 0 0.00 Session OK TX_PIPE
MAIN.sc_tx_error 0 0.00 Session Err TX_ERROR
MAIN.sc_tx_eof 0 0.00 Session OK TX_EOF
MAIN.sc_resp_close 4688 0.02 Session OK RESP_CLOSE
MAIN.sc_overload 0 0.00 Session Err OVERLOAD
MAIN.sc_pipe_overflow 0 0.00 Session Err PIPE_OVERFLOW
MAIN.sc_range_short 0 0.00 Session Err RANGE_SHORT
MAIN.shm_records 92391706 422.58 SHM records
MAIN.shm_writes 24787122 113.37 SHM writes
MAIN.shm_flushes 4278 0.02 SHM flushes due to overflow
MAIN.shm_cont 72956 0.33 SHM MTX contention
MAIN.shm_cycles 30 0.00 SHM cycles through buffer
MAIN.backend_req 8952 0.04 Backend requests made
MAIN.n_vcl 3 0.00 Number of loaded VCLs in total
MAIN.n_vcl_avail 3 0.00 Number of VCLs available
MAIN.n_vcl_discard 0 0.00 Number of discarded VCLs
MAIN.bans 1 . Count of bans
MAIN.bans_completed 1 . Number of bans marked 'completed'
MAIN.bans_obj 0 . Number of bans using obj.*
MAIN.bans_req 0 . Number of bans using req.*
MAIN.bans_added 1 0.00 Bans added
MAIN.bans_deleted 0 0.00 Bans deleted
MAIN.bans_tested 0 0.00 Bans tested against objects (lookup)
MAIN.bans_obj_killed 0 0.00 Objects killed by bans (lookup)
MAIN.bans_lurker_tested 0 0.00 Bans tested against objects (lurker)
MAIN.bans_tests_tested 0 0.00 Ban tests tested against objects (lookup)
MAIN.bans_lurker_tests_tested 0 0.00 Ban tests tested against objects (lurker)
MAIN.bans_lurker_obj_killed 0 0.00 Objects killed by bans (lurker)
MAIN.bans_dups 0 0.00 Bans superseded by other bans
MAIN.bans_lurker_contention 0 0.00 Lurker gave way for lookup
MAIN.bans_persisted_bytes 16 . Bytes used by the persisted ban lists
MAIN.bans_persisted_fragmentation 0 . Extra bytes in persisted ban lists due to fragmentation
MAIN.n_purges 0 . Number of purge operations executed
MAIN.n_obj_purged 0 . Number of purged objects
MAIN.exp_mailed 2879 0.01 Number of objects mailed to expiry thread
MAIN.exp_received 2879 0.01 Number of objects received by expiry thread
MAIN.hcb_nolock 2383 0.01 HCB Lookups without lock
MAIN.hcb_lock 975 0.00 HCB Lookups with lock
MAIN.hcb_insert 975 0.00 HCB Inserts
MAIN.esi_errors 0 0.00 ESI parse errors (unlock)
MAIN.esi_warnings 0 0.00 ESI parse warnings (unlock)
MAIN.vmods 2 . Loaded VMODs
MAIN.n_gzip 0 0.00 Gzip operations
MAIN.n_gunzip 2945 0.01 Gunzip operations
MAIN.vsm_free 972480 . Free VSM space
MAIN.vsm_used 83961104 . Used VSM space
MAIN.vsm_cooling 1024 . Cooling VSM space
MAIN.vsm_overflow 0 . Overflow VSM space
MAIN.vsm_overflowed 0 0.00 Overflowed VSM space
MGT.uptime 218640 1.00 Management process uptime
MGT.child_start 1 0.00 Child process started
MGT.child_exit 0 0.00 Child process normal exit
MGT.child_stop 0 0.00 Child process unexpected exit
MGT.child_died 0 0.00 Child process died (signal)
MGT.child_dump 0 0.00 Child process core dumped
MGT.child_panic 0 0.00 Child process panic
MEMPOOL.busyobj.live 0 . In use
MEMPOOL.busyobj.pool 10 . In Pool
MEMPOOL.busyobj.sz_wanted 65536 . Size requested
MEMPOOL.busyobj.sz_actual 65504 . Size allocated
MEMPOOL.busyobj.allocs 8952 0.04 Allocations
MEMPOOL.busyobj.frees 8952 0.04 Frees
MEMPOOL.busyobj.recycle 8934 0.04 Recycled from pool
MEMPOOL.busyobj.timeout 2477 0.01 Timed out from pool
MEMPOOL.busyobj.toosmall 0 0.00 Too small to recycle
MEMPOOL.busyobj.surplus 0 0.00 Too many for pool
MEMPOOL.busyobj.randry 18 0.00 Pool ran dry
MEMPOOL.req0.live 0 . In use
MEMPOOL.req0.pool 10 . In Pool
MEMPOOL.req0.sz_wanted 65536 . Size requested
MEMPOOL.req0.sz_actual 65504 . Size allocated
MEMPOOL.req0.allocs 2622296 11.99 Allocations
MEMPOOL.req0.frees 2622296 11.99 Frees
MEMPOOL.req0.recycle 2622295 11.99 Recycled from pool
MEMPOOL.req0.timeout 1604 0.01 Timed out from pool
MEMPOOL.req0.toosmall 0 0.00 Too small to recycle
MEMPOOL.req0.surplus 0 0.00 Too many for pool
MEMPOOL.req0.randry 1 0.00 Pool ran dry
MEMPOOL.sess0.live 0 . In use
MEMPOOL.sess0.pool 10 . In Pool
MEMPOOL.sess0.sz_wanted 512 . Size requested
MEMPOOL.sess0.sz_actual 480 . Size allocated
MEMPOOL.sess0.allocs 2620824 11.99 Allocations
MEMPOOL.sess0.frees 2620824 11.99 Frees
MEMPOOL.sess0.recycle 2620823 11.99 Recycled from pool
MEMPOOL.sess0.timeout 2001 0.01 Timed out from pool
MEMPOOL.sess0.toosmall 0 0.00 Too small to recycle
MEMPOOL.sess0.surplus 0 0.00 Too many for pool
MEMPOOL.sess0.randry 1 0.00 Pool ran dry
MEMPOOL.req1.live 0 . In use
MEMPOOL.req1.pool 10 . In Pool
MEMPOOL.req1.sz_wanted 65536 . Size requested
MEMPOOL.req1.sz_actual 65504 . Size allocated
MEMPOOL.req1.allocs 2633786 12.05 Allocations
MEMPOOL.req1.frees 2633786 12.05 Frees
MEMPOOL.req1.recycle 2633785 12.05 Recycled from pool
MEMPOOL.req1.timeout 1589 0.01 Timed out from pool
MEMPOOL.req1.toosmall 0 0.00 Too small to recycle
MEMPOOL.req1.surplus 0 0.00 Too many for pool
MEMPOOL.req1.randry 1 0.00 Pool ran dry
MEMPOOL.sess1.live 0 . In use
MEMPOOL.sess1.pool 10 . In Pool
MEMPOOL.sess1.sz_wanted 512 . Size requested
MEMPOOL.sess1.sz_actual 480 . Size allocated
MEMPOOL.sess1.allocs 2632326 12.04 Allocations
MEMPOOL.sess1.frees 2632326 12.04 Frees
MEMPOOL.sess1.recycle 2632325 12.04 Recycled from pool
MEMPOOL.sess1.timeout 1908 0.01 Timed out from pool
MEMPOOL.sess1.toosmall 0 0.00 Too small to recycle
MEMPOOL.sess1.surplus 0 0.00 Too many for pool
MEMPOOL.sess1.randry 1 0.00 Pool ran dry
SMA.s0.c_req 93 0.00 Allocator requests
SMA.s0.c_fail 0 0.00 Allocator failures
SMA.s0.c_bytes 905611 4.14 Bytes allocated
SMA.s0.c_freed 849277 3.88 Bytes freed
SMA.s0.g_alloc 7 . Allocations outstanding
SMA.s0.g_bytes 56334 . Bytes outstanding
SMA.s0.g_space 6442394610 . Bytes available
SMA.Transient.c_req 2363316 10.81 Allocator requests
SMA.Transient.c_fail 0 0.00 Allocator failures
SMA.Transient.c_bytes 2083208664 9528.07 Bytes allocated
SMA.Transient.c_freed 2083145488 9527.79 Bytes freed
SMA.Transient.g_alloc 132 . Allocations outstanding
SMA.Transient.g_bytes 63176 . Bytes outstanding
SMA.Transient.g_space 0 . Bytes available
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.happy 18446744073709551615 . Happy health probes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.bereq_hdrbytes 1944414 8.89 Request header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.bereq_bodybytes 8324 0.04 Request body bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.beresp_hdrbytes 1608040 7.35 Response header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.beresp_bodybytes 154396823 706.17 Response body bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.pipe_hdrbytes 0 0.00 Pipe request header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.pipe_out 0 0.00 Piped bytes to backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.pipe_in 0 0.00 Piped bytes from backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.conn 4297 . Concurrent connections to backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app1.req 4297 0.02 Backend requests sent
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.happy 18446744073709551615 . Happy health probes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.bereq_hdrbytes 0 0.00 Request header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.bereq_bodybytes 0 0.00 Request body bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.beresp_hdrbytes 0 0.00 Response header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.beresp_bodybytes 0 0.00 Response body bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.pipe_hdrbytes 0 0.00 Pipe request header bytes
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.pipe_out 0 0.00 Piped bytes to backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.pipe_in 0 0.00 Piped bytes from backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.conn 0 . Concurrent connections to backend
VBE.58b9f33d-bb8c-4540-ab9e-73da4e8c1cf9.app2.req 0 0.00 Backend requests sent
LCK.backend.creat 7 0.00 Created locks
LCK.backend.destroy 0 0.00 Destroyed locks
LCK.backend.locks 194025 0.89 Lock Operations
LCK.backend_tcp.creat 2 0.00 Created locks
LCK.backend_tcp.destroy 0 0.00 Destroyed locks
LCK.backend_tcp.locks 34549 0.16 Lock Operations
LCK.ban.creat 1 0.00 Created locks
LCK.ban.destroy 0 0.00 Destroyed locks
LCK.ban.locks 1193862 5.46 Lock Operations
LCK.busyobj.creat 8951 0.04 Created locks
LCK.busyobj.destroy 8952 0.04 Destroyed locks
LCK.busyobj.locks 227907 1.04 Lock Operations
LCK.cli.creat 1 0.00 Created locks
LCK.cli.destroy 0 0.00 Destroyed locks
LCK.cli.locks 72890 0.33 Lock Operations
LCK.exp.creat 1 0.00 Created locks
LCK.exp.destroy 0 0.00 Destroyed locks
LCK.exp.locks 17964 0.08 Lock Operations
LCK.hcb.creat 1 0.00 Created locks
LCK.hcb.destroy 0 0.00 Destroyed locks
LCK.hcb.locks 3030 0.01 Lock Operations
LCK.lru.creat 2 0.00 Created locks
LCK.lru.destroy 0 0.00 Destroyed locks
LCK.lru.locks 6675 0.03 Lock Operations
LCK.mempool.creat 5 0.00 Created locks
LCK.mempool.destroy 0 0.00 Destroyed locks
LCK.mempool.locks 22011020 100.67 Lock Operations
LCK.objhdr.creat 1094 0.01 Created locks
LCK.objhdr.destroy 947 0.00 Destroyed locks
LCK.objhdr.locks 4780984 21.87 Lock Operations
LCK.pipestat.creat 1 0.00 Created locks
LCK.pipestat.destroy 0 0.00 Destroyed locks
LCK.pipestat.locks 0 0.00 Lock Operations
LCK.sess.creat 5250338 24.01 Created locks
LCK.sess.destroy 5252842 24.03 Destroyed locks
LCK.sess.locks 11 0.00 Lock Operations
LCK.smp.creat 0 0.00 Created locks
LCK.smp.destroy 0 0.00 Destroyed locks
LCK.smp.locks 0 0.00 Lock Operations
LCK.vbe.creat 1 0.00 Created locks
LCK.vbe.destroy 0 0.00 Destroyed locks
LCK.vbe.locks 72885 0.33 Lock Operations
LCK.vcapace.creat 1 0.00 Created locks
LCK.vcapace.destroy 0 0.00 Destroyed locks
LCK.vcapace.locks 0 0.00 Lock Operations
LCK.vcl.creat 1 0.00 Created locks
LCK.vcl.destroy 0 0.00 Destroyed locks
LCK.vcl.locks 33732 0.15 Lock Operations
LCK.vxid.creat 1 0.00 Created locks
LCK.vxid.destroy 0 0.00 Destroyed locks
LCK.vxid.locks 1348 0.01 Lock Operations
LCK.waiter.creat 2 0.00 Created locks
LCK.waiter.destroy 0 0.00 Destroyed locks
LCK.waiter.locks 43236 0.20 Lock Operations
LCK.wq.creat 3 0.00 Created locks
LCK.wq.destroy 0 0.00 Destroyed locks
LCK.wq.locks 16545779 75.68 Lock Operations
LCK.wstat.creat 1 0.00 Created locks
LCK.wstat.destroy 0 0.00 Destroyed locks
LCK.wstat.locks 5362050 24.52 Lock Operations
LCK.sma.creat 2 0.00 Created locks
LCK.sma.destroy 0 0.00 Destroyed locks
LCK.sma.locks 4726679 21.62 Lock Operations
Generally you would call .backend() to pass it as the backend_hint:
set req.backend_hint = application_servers.backend();
I would think it would be a syntax error not to do that, but it's possible that it simply returns the first backend when you use the director instance itself as a backend.
Where in your VCL is the line set req.backend_hint = application_servers.backend(); ?
It should be the first line in sub vcl_recv.
Also, what does the probe window look like? You listed 'happy' but if you run varnishstat -1 you should see the full probe window.
To help debug it further, put some syslog calls in your VCL. Use either vmod std or just inline them with:
C{
syslog(LOG_ERR, "I am at line X in my vcl");
}C
You need to turn on the inline C option; in Varnish 4 it defaults to off.
Found that the issue was related to the Puppet module used for deploying Varnish: it wasn't including the relevant template in the VCL file. I've submitted a pull request on GitHub to fix this.

Haskell: Leaking memory from ST / GC not collecting?

I have a computation inside ST which allocates memory through a Data.Vector.Unboxed.Mutable. The vector is never read or written, nor is any reference retained to it outside of runST (to the best of my knowledge). The problem I have is that when I run my ST computation multiple times, I sometimes seem to keep the memory for the vector around.
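For concreteness, a minimal sketch of the kind of setup being described (the names, vector size, and trivial result are illustrative only, not the actual code, which as noted later could not be reduced to a small reproducer):
import Control.Monad (forM_)
import Control.Monad.ST (runST)
import qualified Data.Vector.Unboxed.Mutable as VUM

-- ~128 MB of scratch space (16M Doubles), allocated inside ST and then
-- never touched again; only a small Int escapes runST.
compute :: Int -> Int
compute x = runST $ do
  _scratch <- VUM.replicate (16 * 1024 * 1024) (0 :: Double)
  return (x * 2)

main :: IO ()
main = forM_ [1 .. 20 :: Int] $ \i -> print (compute i)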
Allocation statistics:
5,435,386,768 bytes allocated in the heap
5,313,968 bytes copied during GC
134,364,780 bytes maximum residency (14 sample(s))
3,160,340 bytes maximum slop
518 MB total memory in use (0 MB lost due to fragmentation)
Here I call runST 20x with different values for my computation and a 128MB vector (again: unused, not returned or referenced outside of ST). The maximum residency looks good, basically just my vector plus a few MB of other stuff. But the total memory use indicates that I have four copies of the vector active at the same time. This scales perfectly with the size of the vector; for 256MB we get 1030MB as expected.
Using a 1GB vector runs out of memory (4x1GB + overhead > 32bit). I don't understand why the RTS keeps seemingly unused, unreferenced memory around instead of just GC'ing it, at least at the point where an allocation would otherwise fail.
Running with +RTS -S reveals the following:
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
134940616 13056 134353540 0.00 0.00 0.09 0.19 0 0 (Gen: 1)
583416 6756 134347504 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
518020 17396 134349640 0.00 0.00 0.09 0.19 0 0 (Gen: 1)
521104 13032 134359988 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
520972 1344 134360752 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
521100 828 134360684 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520812 592 134360528 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520936 1344 134361324 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520788 1480 134361476 0.00 0.00 0.10 0.20 0 0 (Gen: 0)
134438548 5964 268673908 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
586300 3084 268667168 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
517840 952 268666340 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520920 544 268666164 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520780 428 268666048 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520820 2908 268668524 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520732 1788 268668636 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
521076 564 268668492 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520532 712 268668640 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520764 956 268668884 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520816 420 268668348 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520948 1332 268669260 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520784 616 268668544 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
521416 836 268668764 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520488 1240 268669168 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520824 1608 268669536 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520688 1276 268669204 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520252 1332 268669260 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520672 1000 268668928 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
134553500 5640 402973292 0.00 0.00 0.29 0.58 0 0 (Gen: 0)
586776 2644 402966160 0.00 0.00 0.29 0.58 0 0 (Gen: 0)
518064 26784 134342772 0.00 0.00 0.29 0.58 0 0 (Gen: 1)
520828 3120 134343528 0.00 0.00 0.29 0.59 0 0 (Gen: 0)
521108 756 134342668 0.00 0.00 0.30 0.59 0 0 (Gen: 0)
Here it seems we have 'live bytes' exceeding ~128MB.
The +RTS -hy profile basically just says we allocate 128MB:
http://imageshack.us/a/img69/7765/45q8.png
I tried reproducing this behavior in a simpler program, but even when replicating the exact setup (ST, a Reader containing the vector, the same monad/program structure, etc.) the simple test program doesn't show this. When simplifying my big program, the behavior also eventually stops once I remove apparently completely unrelated code.
Qs:
Am I really keeping this vector around 4 times out of 20?
If yes, how do I actually tell, since +RTS -hy and maximum residency claim I'm not? And what can I do to stop this behavior?
If no, why is Haskell not GC'ing it rather than running out of address space / memory, and what can I do to stop this behavior?
Thanks!
I suspect this is a bug in GHC and/or the RTS.
First, I'm confident there is no actual space leak or anything like that.
Reasons:
The vector is never used anywhere. Not read, not written, not referenced. It should be collected once runST is done. Even when the ST computation returns a single Int which is immediately printed out to evaluate it, the memory issue still exists. There is no reference to that data.
Every profiling mode the RTS offers is in violent agreement that I never actually have more than a single vector's worth of memory allocated/referenced. Every statistic and pretty chart says that.
Now, here's the interesting bit. If I manually force the GC by calling System.Mem.performGC after every run of my function, the problem goes away, completely.
So we have a case where the runtime has GBs worth of memory which (demonstrably!) can be reclaimed by the GC and even according to its own statistic is not held by anybody anymore. When running out of its memory pool the runtime does not collect, but instead asks the OS for more memory. And even when that finally fails, the runtime still does not collect (which would reclaim GBs of memory, demonstrably) but instead chooses to terminate the program with an out-of-memory error.
I'm no expert on Haskell, GHC or GC. But this does look awfully broken to me. I'll report this as a bug.
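For reference, the workaround described above (an explicit collection between runs) looks roughly like this; runAll and its compute argument are illustrative names, not code from the question:
import Control.Monad (forM_)
import System.Mem (performGC)

-- Force a major GC after each run so the dead vector is reclaimed
-- before the next large allocation is attempted.
runAll :: (Int -> Int) -> IO ()
runAll compute = forM_ [1 .. 20 :: Int] $ \i -> do
  print (compute i)
  performGC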
