NFS hung suddenly, not able to restart service - ubuntu-14.04

NFS hung suddenly! I am not even able to restart the service:
# /etc/init.d/nfs-kernel-server restart
* Stopping NFS kernel daemon [ OK ]
* Unexporting directories for NFS kernel daemon... [ OK ]
* Exporting directories for NFS kernel daemon... [ OK ]
* Starting NFS kernel daemon ^C
Syslog says nfsd has been blocked for more than 120 seconds. How do I interpret these logs? What has gone wrong here?
# cat /var/log/syslog | grep nfs
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248354] INFO: task nfsd:2318 blocked for more than 120 seconds.
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248363] nfsd D 0000000000000000 0 2318 2 0x00000000
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248406] [<ffffffffa02a28c0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248418] [<ffffffffa02a3c11>] nfsd_vfs_write.isra.13+0x81/0x3b0 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248422] [<ffffffffa02bf691>] ? lookup_clientid+0x51/0x80 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248426] [<ffffffffa02bf766>] ? nfsd4_lookup_stateid+0xa6/0xc0 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248429] [<ffffffffa02a7773>] nfsd_write+0x93/0x110 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248433] [<ffffffffa02b2281>] nfsd4_write+0x1b1/0x250 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248437] [<ffffffffa02b35ca>] nfsd4_proc_compound+0x56a/0x7c0 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248440] [<ffffffffa029fd2b>] nfsd_dispatch+0xbb/0x200 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248455] [<ffffffffa029f71f>] nfsd+0xbf/0x130 [nfsd]
Sep 4 10:02:11 InksedgeDev kernel: [2427120.248458] [<ffffffffa029f660>] ? nfsd_destroy+0x80/0x80 [nfsd]
Output of iostat:
# iostat 5 5
Linux 3.13.0-29-generic (InksedgeDev) 09/04/2015 _x86_64_ (2 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
5.68 0.00 0.48 0.25 0.01 93.58
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvdap1 3.63 15.71 82.80 38138709 201066916
xvdb 0.01 0.05 0.01 119271 15231
xvdc 0.01 0.05 0.01 112329 14600
xvdf 4.25 155.14 175.05 376748353 425087092
md0 0.02 0.10 0.01 231288 29831
avg-cpu: %user %nice %system %iowait %steal %idle
0.30 0.00 0.20 49.75 0.00 49.75
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvdap1 0.00 0.00 0.00 0 0
xvdb 0.00 0.00 0.00 0 0
xvdc 0.00 0.00 0.00 0 0
xvdf 0.00 0.00 0.00 0 0
md0 0.00 0.00 0.00 0 0
What might be the reason? How do I troubleshoot this?
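The "blocked for more than 120 seconds" message means a task has sat in uninterruptible sleep ("D" state) longer than the kernel's hung-task timeout; the stack trace shows nfsd stuck inside nfsd_vfs_write, i.e. waiting on the underlying filesystem or disk. One way to start troubleshooting is to list everything currently stuck in D state; a minimal sketch, assuming a standard Linux /proc (the script is illustrative, not NFS tooling):

import glob

# Tasks in "D" (uninterruptible sleep) are stuck waiting on I/O;
# a hung nfsd thread, or whatever it is queued behind, shows up here.
for status_path in glob.glob('/proc/[0-9]*/status'):
    try:
        with open(status_path) as f:
            fields = dict(line.split(':', 1) for line in f if ':' in line)
        if fields.get('State', '').strip().startswith('D'):
            pid = status_path.split('/')[2]
            print(pid, fields['Name'].strip())
    except OSError:
        pass  # the process may have exited while we were scanning

For each PID printed, cat /proc/<pid>/stack (as root) shows where in the kernel it is blocked, which is the same information as the syslog trace above. The second iostat sample (roughly 50% iowait with zero tps on every device) also points at I/O that is queued but not completing.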

Related

Is there a way to convert a .html file (tab-delimited text file) to CSV without using pandas?

I have a .html file, but the file does not contain any tags. I want to convert this .html file to either Excel or CSV so that I can use pandas to carry out some operations.
Currently, I am unable to use pandas, as it searches for a table/tr tag when converting an HTML file to CSV or Excel.
An example HTML file (full page available here):
<title>Vdbench output/localhost-0.html</title><pre>
Slave summary report for slave=localhost-0
Console log: Slave stdout/stderr
Link to Run Definitions: run1 For loops: xfersize=4k
xfersize=16k
xfersize=32k
xfersize=64k
xfersize=128k
<a name="_1313953385"></a><i><b>19:28:33.002 Starting RD=run1; I/O rate: Uncontrolled MAX; elapsed=7200 warmup=600; For loops: xfersize=4k</b></i>
Mar 10, 2018 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
19:29:33.086 1 22213.93 86.77 4096 50.00 0.353 0.229 0.478 21.536 0.304 7.9 1.8 1.1
19:30:33.045 2 23145.18 90.41 4096 50.04 0.342 0.227 0.456 21.646 0.278 7.9 1.8 1.2
19:31:33.045 3 23540.93 91.96 4096 50.09 0.336 0.225 0.448 21.278 0.274 7.9 1.8 1.1
19:32:33.044 4 23456.57 91.63 4096 49.99 0.337 0.226 0.449 21.621 0.276 7.9 1.9 1.2
19:33:33.045 5 22726.32 88.77 4096 50.02 0.348 0.230 0.466 22.555 0.283 7.9 2.0 1.3
19:34:33.045 6 22475.53 87.80 4096 50.01 0.352 0.229 0.475 26.673 0.287 7.9 2.1 1.4
19:35:33.044 7 22852.70 89.27 4096 50.01 0.346 0.226 0.466 21.195 0.277 7.9 2.2 1.4
19:36:33.044 8 22929.10 89.57 4096 49.98 0.345 0.229 0.460 21.640 0.277 7.9 2.2 1.5
19:37:33.043 9 22884.20 89.39 4096 50.06 0.346 0.228 0.463 24.478 0.288 7.9 2.0 1.3
19:38:33.045 10 22891.77 89.42 4096 50.00 0.345 0.228 0.463 21.435 0.280 7.9 2.1 1.3
19:39:33.043 11 22358.80 87.34 4096 50.00 0.354 0.228 0.479 21.616 0.282 7.9 1.9 1.2
19:40:33.044 12 22611.53 88.33 4096 49.97 0.350 0.228 0.472 22.261 0.281 7.9 2.1 1.3
19:41:33.046 13 22450.95 87.70 4096 50.05 0.352 0.226 0.478 21.789 0.284 7.9 2.4 1.5
19:42:33.043 14 22610.82 88.32 4096 50.08 0.350 0.227 0.473 21.576 0.282 7.9 2.5 1.4
19:43:33.046 15 22271.13 87.00 4096 50.01 0.355 0.228 0.482 21.216 0.285 7.9 1.9 1.2
19:44:33.044 16 22264.43 86.97 4096 49.99 0.355 0.228 0.482 21.643 0.283 7.9 1.9 1.2
19:45:33.043 17 22710.52 88.71 4096 49.99 0.349 0.223 0.475 21.742 0.284 7.9 1.8 1.2
19:46:33.043 18 22806.60 89.09 4096 49.97 0.347 0.225 0.469 21.536 0.278 7.9 2.0 1.3
19:47:33.043 19 22353.10 87.32 4096 50.01 0.354 0.232 0.476 21.104 0.277 7.9 2.0 1.3
19:48:33.043 20 22285.80 87.05 4096 49.99 0.355 0.230 0.481 21.626 0.282 7.9 1.9 1.2
19:49:33.043 21 22611.10 88.32 4096 50.02 0.350 0.227 0.474 21.776 0.281 7.9 1.9 1.2
19:50:33.044 22 22283.10 87.04 4096 50.07 0.355 0.230 0.481 20.364 0.284 7.9 1.9 1.2
19:51:33.043 23 22308.05 87.14 4096 50.00 0.355 0.228 0.482 21.341 0.282 7.9 2.0 1.3
19:52:33.043 24 22244.08 86.89 4096 50.01 0.356 0.230 0.482 21.464 0.276 7.9 2.0 1.3
19:53:33.042 25 22502.08 87.90 4096 50.05 0.352 0.227 0.477 21.740 0.283 7.9 2.0 1.3
19:54:33.042 26 22170.22 86.60 4096 50.04 0.357 0.229 0.486 26.998 0.291 7.9 1.9 1.2
19:55:33.043 27 22216.00 86.78 4096 50.02 0.356 0.230 0.484 21.016 0.290 7.9 2.0 1.3
19:56:33.043 28 21692.47 84.74 4096 50.04 0.365 0.231 0.498 22.148 0.286 7.9 2.0 1.3
19:57:33.043 29 22002.18 85.95 4096 50.02 0.359 0.229 0.490 21.995 0.285 7.9 2.2 1.4
19:58:33.044 30 22139.08 86.48 4096 50.03 0.357 0.228 0.487 22.367 0.283 7.9 2.2 1.3
Mar 10, 2018 interval i/o MB/sec bytes read resp read write resp resp queue cpu% cpu%
rate 1024**2 i/o pct time resp resp max stddev depth sys+u sys
19:59:33.043 31 21665.45 84.63 4096 49.99 0.365 0.229 0.502 21.849 0.290 7.9 2.0 1.3
20:00:33.042 32 22208.17 86.75 4096 50.08 0.356 0.229 0.484 21.479 0.288 7.9 2.1 1.3
20:01:33.043 33 22254.08 86.93 4096 49.90 0.356 0.229 0.482 21.963 0.287 7.9 2.0 1.3
20:02:33.042 34 21757.10 84.99 4096 49.99 0.364 0.229 0.499 21.713 0.288 7.9 2.1 1.3
20:03:33.043 35 21884.17 85.49 4096 49.99 0.362 0.229 0.494 21.974 0.286 7.9 2.0 1.3
20:04:33.042 36 21920.02 85.63 4096 50.06 0.361 0.229 0.494 22.162 0.285 7.9 1.9 1.2
20:05:33.042 37 21375.17 83.50 4096 49.97 0.370 0.229 0.511 21.662 0.291 7.9 2.0 1.3
20:06:33.043 38 21563.78 84.23 4096 49.97 0.367 0.231 0.503 21.654 0.292 7.9 2.1 1.3
20:07:33.043 39 21792.60 85.13 4096 50.03 0.363 0.229 0.498 24.398 0.301 7.9 2.1 1.3
20:08:33.043 40 21515.17 84.04 4096 50.02 0.368 0.232 0.504 21.738 0.290 7.9 2.0 1.3
20:09:33.043 41 21761.85 85.01 4096 50.07 0.364 0.232 0.496 20.902 0.285 7.9 2.2 1.4
20:10:33.044 42 21676.10 84.67 4096 49.98 0.365 0.230 0.501 22.007 0.289 7.9 2.0 1.3
20:11:33.043 43 22036.17 86.08 4096 50.03 0.359 0.231 0.487 21.992 0.286 7.9 2.0 1.3
20:12:33.043 44 21638.15 84.52 4096 49.94 0.366 0.233 0.498 21.510 0.293 7.9 2.0 1.3
20:13:33.042 45 21951.72 85.75 4096 50.07 0.361 0.231 0.491 21.774 0.286 7.9 2.0 1.3
20:14:33.042 46 21327.32 83.31 4096 50.07 0.371 0.232 0.511 21.522 0.295 7.9 2.1 1.3
20:15:33.043 47 21736.30 84.91 4096 49.99 0.364 0.232 0.496 32.060 0.294 7.9 2.2 1.4
20:16:33.043 48 21526.45 84.09 4096 50.02 0.368 0.231 0.504 21.944 0.286 7.9 2.2 1.4
20:17:33.044 49 21509.20 84.02 4096 49.95 0.368 0.231 0.505 21.529 0.287 7.9 2.1 1.3
20:18:33.043 50 21079.62 82.34 4096 49.99 0.376 0.232 0.520 21.797 0.296 7.9 2.1 1.3
20:19:33.042 51 21683.97 84.70 4096 50.02 0.365 0.235 0.495 21.841 0.291 7.9 2.1 1.3
20:20:33.042 52 21517.90 84.05 4096 49.95 0.368 0.233 0.502 21.721 0.291 7.9 2.1 1.3
20:21:33.045 53 21582.05 84.30 4096 49.95 0.367 0.230 0.503 21.508 0.288 7.9 2.0 1.3
20:22:33.043 54 21693.77 84.74 4096 49.90 0.365 0.233 0.496 22.114 0.283 7.9 2.2 1.4
20:23:33.044 55 21809.38 85.19 4096 50.03 0.363 0.232 0.494 21.552 0.283 7.9 2.1 1.3
20:24:33.043 56 21323.70 83.30 4096 49.98 0.371 0.232 0.511 35.232 0.298 7.9 2.0 1.3
20:25:33.044 57 21538.38 84.13 4096 50.03 0.367 0.232 0.503 21.807 0.292 7.9 2.2 1.4
20:26:33.043 58 21482.22 83.91 4096 49.95 0.368 0.233 0.504 21.613 0.291 7.9 2.2 1.4
20:27:33.043 59 21915.38 85.61 4096 50.00 0.361 0.230 0.492 22.282 0.285 7.9 2.1 1.4
20:28:33.042 60 21716.03 84.83 4096 49.93 0.365 0.232 0.497 21.166 0.285 7.9 2.0 1.3
Can anyone let me know how I can convert this file?
If these files are consistent and have the same structure, what you can do is read the HTML file as a regular file in Python and strip away this part:
Slave summary report for slave=localhost-0
Console log: Slave stdout/stderr
Link to Run Definitions: run1 For loops: xfersize=4k
xfersize=16k
xfersize=32k
xfersize=64k
xfersize=128k
19:28:33.002 Starting RD=run1; I/O rate: Uncontrolled MAX; elapsed=7200 warmup=600; For loops: xfersize=4k
Then use the remainder as a text file that you can parse with pandas, as it really is just structured text.
Something like this might help you get started:
import pandas as pd
import io

with open('test.html') as f:
    content = f.readlines()

# Skip the header lines before the first data row (14 lines in this example).
body = content[14:]
# readlines() keeps the newlines, so a plain ''.join restores the text; the
# column-name lines were skipped, so don't treat the first data row as a header.
df = pd.read_csv(io.StringIO(''.join(body)), delim_whitespace=True, header=None)
print(df)
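If you want named columns you can assign them afterwards; the names below are only illustrative guesses at vdbench's two-line column header, so adjust them to your file:

# Hypothetical names reconstructed from the two-line header above:
df.columns = ['time', 'interval', 'io_rate', 'mb_per_sec', 'bytes_per_io',
              'read_pct', 'resp_time', 'read_resp', 'write_resp',
              'resp_max', 'resp_stddev', 'queue_depth',
              'cpu_total', 'cpu_sys']

Note that the header block repeats inside the data on long runs (see the second "Mar 10, 2018 interval ..." line above), so you may also need to filter those lines out before parsing.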
Vdbench has this built in: ./vdbench parseflat -i flatfile.html -o something.csv. Look in the PDF.
jim

Why is the latest version of sysstat not showing average values after Ctrl+C?

This is old output from my logs, showing average values after Ctrl+C:
#pidstat 1 -p `pgrep bgpd`
Linux 3.16.7-gd1a374d-dellz9100on (rtr1) Tuesday 29 May 2018 _x86_64_ (4 CPU)
05:07:01 UTC UID PID %usr %system %guest %CPU CPU Command
05:07:07 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
05:07:08 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
05:07:09 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
05:07:10 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
05:07:11 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
05:07:12 UTC 0 2144 0.00 0.00 0.00 0.00 2 bgpd
^C
Average: 0 2144 0.09 0.00 0.00 0.09 - bgpd
Now it is not showing average values:
# pidstat 1 -p `pgrep bgpd`
Linux 3.16.7-gd1a374d-dellz9100on (rtr1) 06/13/18 _x86_64_ (4 CPU)
07:32:51 PID %usr %system %guest %CPU CPU Command
07:32:56 2144 0.00 0.00 0.00 0.00 0 bgpd
07:32:57 2144 0.00 0.00 0.00 0.00 1 bgpd
07:32:58 2144 0.00 0.00 0.00 0.00 1 bgpd
^C
#
Version of pidstat:
root@rtr1:/home/ocnos# pidstat -V
sysstat version 10.0.5
(C) Sebastien Godard (sysstat <at> orange.fr)
root@rtr1:/home/ocnos#
The mail reply from the developer of this tool:
No, your previous version was not 9.x.
This feature (displaying average stats when Ctrl/C is hit) was added to pidstat in release 10.1.4.
Regards,
Sebastien.
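So on sysstat 10.0.5 the Ctrl+C average is simply not implemented. If you can run pidstat with a fixed count (e.g. pidstat 1 5), it should print the Average line itself; otherwise you can average the samples yourself. A minimal sketch, assuming the column layout of the second transcript above (the PID and process name are illustrative):

import subprocess

# Collect 5 one-second samples for the process and average the %CPU column.
# Layout assumed: time PID %usr %system %guest %CPU CPU Command
out = subprocess.run(['pidstat', '1', '5', '-p', '2144'],
                     capture_output=True, text=True).stdout
samples = []
for line in out.splitlines():
    parts = line.split()
    if parts and parts[0] != 'Average:' and parts[-1] == 'bgpd':
        try:
            samples.append(float(parts[-3]))  # %CPU is third from the end
        except ValueError:
            pass  # skip anything that doesn't parse as a sample row
print('Average %CPU:', sum(samples) / len(samples) if samples else 0.0)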

Figuring out Linux memory usage

I've got some slightly weird Linux memory usage that I'm trying to figure out.
I've got 2 processes: nxtcapture & nxtexport. Neither process really allocates much memory, but they each mmap a 1 TB file. nxtexport has no heap allocations (apart from during startup). nxtcapture writes sequentially to its file and nxtexport reads sequentially. Since nxtexport reads from the tail of nxtcapture, I don't really have any read IO.
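For anyone unfamiliar with the pattern, here is a minimal sketch of the producer side (the file name and the deliberately tiny size are illustrative): every page dirtied through a shared mapping lives in the page cache until the kernel writes it back, which is why the cache grows as the writer advances.

import mmap
import os

SIZE = 1 << 20  # 1 MiB for the sketch; the real file is 1 TB
fd = os.open('capture.dat', os.O_RDWR | os.O_CREAT, 0o644)
os.ftruncate(fd, SIZE)  # the file must be at least as large as the mapping
with mmap.mmap(fd, SIZE, flags=mmap.MAP_SHARED,
               prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
    m[0:5] = b'hello'  # dirties one page in the page cache
os.close(fd)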
ing992:~# iostat -m
Linux 4.4.52-nxt (ing992) 05/25/17 _x86_64_ (32 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
29.17 1.99 0.96 0.06 0.00 67.82
Device: tps MB_read/s MB_wrtn/s MB_read MB_wrtn
loop0 0.02 0.00 0.00 2 0
sdf 16.47 0.06 0.85 4207 61442
sdf1 0.00 0.00 0.00 5 0
sdf2 0.01 0.00 0.00 77 0
sdf3 16.45 0.06 0.85 4115 61442
sdf4 0.00 0.00 0.00 7 0
sde 15.45 0.01 0.85 1032 61442
sde1 0.00 0.00 0.00 5 0
sde2 0.00 0.00 0.00 0 0
sde3 15.44 0.01 0.85 1017 61442
sde4 0.00 0.00 0.00 7 0
sdb 43.08 0.00 15.72 22 1136368
sda 43.07 0.00 15.72 21 1136406
sdc 43.42 0.04 15.72 2711 1136332
sdd 43.07 0.00 15.72 20 1136301
md127 0.01 0.00 0.00 77 0
md126 23.77 0.07 0.85 5132 61145
This is all great. However, looking at the memory usage below, more than half of my memory shows as unavailable?! How is this possible? I understand that mmap will keep pages cached, but shouldn't such (non-dirty) pages be counted as available? What's going on here? How can I debug this?
free -m
total used free shared buffers cached
Mem: 32020 31608 412 221 13 9655
-/+ buffers/cache: 21939 10081
Swap: 0 0 0
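One way to see how much of that "used" figure is actually reclaimable page cache is to read /proc/meminfo directly; on a 4.4 kernel the MemAvailable field should be present. A minimal sketch (the fallback estimate is rough, not the kernel's exact heuristic):

# Parse /proc/meminfo and estimate how much memory is really available.
meminfo = {}
with open('/proc/meminfo') as f:
    for line in f:
        key, value = line.split(':', 1)
        meminfo[key] = int(value.split()[0])  # values are in kB

total = meminfo['MemTotal']
# MemAvailable (kernel >= 3.14) accounts for reclaimable page cache;
# fall back to a rough free+buffers+cached estimate if it is absent.
available = meminfo.get('MemAvailable',
                        meminfo['MemFree']
                        + meminfo.get('Buffers', 0)
                        + meminfo.get('Cached', 0))
print('total: %d kB, available: %d kB (%.1f%%)'
      % (total, available, 100.0 * available / total))

If available stays low even though Cached is large, check the Dirty and Unevictable fields of /proc/meminfo: dirty pages from a fast writer cannot be dropped until they have been written back.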

High CPU Usage in php-fpm

We have a very strange problem with our web project.
We use:
2 Intel(R) Xeon(R) CPU E5520 @ 2.27GHz
12 GB memory
We get about 20 hits per second. 4-5 requests per second are heavy ones – search requests.
We use nginx + php-fpm (5.3.22)
The MySQL server is installed on another machine.
Most of the time we have a load average of less than 10 and CPU usage of about 50%.
Sometimes CPU usage reaches about 95%, and after that the load average grows to 50 or more!
You can see the load average and CPU usage graphs here (my reputation is too low to post images):
Load Average
CPU Usage
We have to reload php-fpm (/etc/init.d/php-fpm reload) to normalize the situation.
This can happen 4-5 times per day.
I tried to use strace to examine this situation.
Sorry for the long logs! This is the output of the command strace -cp PID,
where PID is a random php-fpm process ID (we start 100 php-fpm processes).
These two results were captured at a moment of high CPU usage.
Process 17272 attached - interrupt to quit
Process 17272 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
65.56 0.008817 267 33 munmap
13.38 0.001799 900 2 clone
9.66 0.001299 2 589 read
7.43 0.000999 125 8 mremap
2.84 0.000382 1 559 96 access
0.59 0.000080 40 2 waitpid
0.29 0.000039 0 627 gettimeofday
0.16 0.000022 0 346 write
0.04 0.000006 0 56 getcwd
0.04 0.000005 0 348 poll
0.00 0.000000 0 55 open
0.00 0.000000 0 69 close
0.00 0.000000 0 17 chdir
0.00 0.000000 0 189 time
0.00 0.000000 0 28 lseek
0.00 0.000000 0 2 pipe
0.00 0.000000 0 17 times
0.00 0.000000 0 8 brk
0.00 0.000000 0 8 getrusage
0.00 0.000000 0 18 setitimer
0.00 0.000000 0 8 flock
0.00 0.000000 0 1 nanosleep
0.00 0.000000 0 11 rt_sigaction
0.00 0.000000 0 13 rt_sigprocmask
0.00 0.000000 0 6 pread64
0.00 0.000000 0 7 pwrite64
0.00 0.000000 0 33 mmap2
0.00 0.000000 0 18 4 stat64
0.00 0.000000 0 34 lstat64
0.00 0.000000 0 92 fstat64
0.00 0.000000 0 63 fcntl64
0.00 0.000000 0 53 clock_gettime
0.00 0.000000 0 1 socket
0.00 0.000000 0 1 1 connect
0.00 0.000000 0 9 accept
0.00 0.000000 0 1 send
0.00 0.000000 0 21 recv
0.00 0.000000 0 9 1 shutdown
0.00 0.000000 0 1 getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00 0.013448 3363 102 total
[root@hp-php ~]# strace -cp 30767
Process 30767 attached - interrupt to quit
Process 30767 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
52.88 0.016926 220 77 munmap
29.06 0.009301 2 4343 read
8.73 0.002794 466 6 clone
3.59 0.001149 0 5598 time
3.18 0.001017 0 3745 write
1.12 0.000358 0 7316 gettimeofday
0.64 0.000205 1 164 fcntl64
0.39 0.000124 21 6 waitpid
0.22 0.000070 0 1496 326 access
0.13 0.000041 0 3769 poll
0.03 0.000009 0 151 close
0.02 0.000008 0 114 clock_gettime
0.02 0.000007 0 110 getcwd
0.00 0.000000 0 112 open
0.00 0.000000 0 38 chdir
0.00 0.000000 0 47 lseek
0.00 0.000000 0 6 pipe
0.00 0.000000 0 38 times
0.00 0.000000 0 135 brk
0.00 0.000000 0 3 ioctl
0.00 0.000000 0 14 getrusage
0.00 0.000000 0 38 setitimer
0.00 0.000000 0 19 flock
0.00 0.000000 0 40 mlock
0.00 0.000000 0 40 munlock
0.00 0.000000 0 6 nanosleep
0.00 0.000000 0 27 rt_sigaction
0.00 0.000000 0 31 rt_sigprocmask
0.00 0.000000 0 13 pread64
0.00 0.000000 0 18 pwrite64
0.00 0.000000 0 78 mmap2
0.00 0.000000 0 111 10 stat64
0.00 0.000000 0 49 lstat64
0.00 0.000000 0 182 fstat64
0.00 0.000000 0 8 socket
0.00 0.000000 0 8 5 connect
0.00 0.000000 0 19 accept
0.00 0.000000 0 7 send
0.00 0.000000 0 66 recv
0.00 0.000000 0 3 recvfrom
0.00 0.000000 0 20 1 shutdown
0.00 0.000000 0 5 setsockopt
0.00 0.000000 0 4 getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00 0.032009 28080 342 total
Yes, our scripts read a lot of data; this is normal.
But why does munmap take so long?! Whenever we have the problem, munmap is ALWAYS at the top!
For comparison, here is the result of tracing a random php-fpm process in a normal situation:
Process 28606 attached - interrupt to quit
Process 28606 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
45.72 0.001816 1 2601 read
32.88 0.001306 435 3 clone
9.19 0.000365 0 2175 write
6.95 0.000276 0 7521 time
2.24 0.000089 0 4158 gettimeofday
2.01 0.000080 1 114 brk
0.28 0.000011 0 2166 poll
0.20 0.000008 0 833 155 access
0.20 0.000008 0 53 recv
0.18 0.000007 2 3 waitpid
0.15 0.000006 0 18 munlock
0.00 0.000000 0 69 open
0.00 0.000000 0 96 close
0.00 0.000000 0 29 chdir
0.00 0.000000 0 36 lseek
0.00 0.000000 0 3 pipe
0.00 0.000000 0 29 times
0.00 0.000000 0 10 getrusage
0.00 0.000000 0 5 munmap
0.00 0.000000 0 1 ftruncate
0.00 0.000000 0 29 setitimer
0.00 0.000000 0 1 sigreturn
0.00 0.000000 0 11 flock
0.00 0.000000 0 18 mlock
0.00 0.000000 0 5 nanosleep
0.00 0.000000 0 19 rt_sigaction
0.00 0.000000 0 24 rt_sigprocmask
0.00 0.000000 0 6 pread64
0.00 0.000000 0 12 pwrite64
0.00 0.000000 0 69 getcwd
0.00 0.000000 0 5 mmap2
0.00 0.000000 0 35 7 stat64
0.00 0.000000 0 41 lstat64
0.00 0.000000 0 96 fstat64
0.00 0.000000 0 108 fcntl64
0.00 0.000000 0 87 clock_gettime
0.00 0.000000 0 5 socket
0.00 0.000000 0 4 4 connect
0.00 0.000000 0 16 2 accept
0.00 0.000000 0 8 send
0.00 0.000000 0 15 shutdown
0.00 0.000000 0 4 getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00 0.003972 20541 168 total
Process 29168 attached - interrupt to quit
Process 29168 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
54.81 0.002366 1 1717 read
26.41 0.001140 1 1696 poll
8.29 0.000358 0 1662 write
7.37 0.000318 2 131 121 stat64
1.53 0.000066 0 3249 gettimeofday
1.18 0.000051 0 746 525 access
0.23 0.000010 0 27 fcntl64
0.19 0.000008 0 62 brk
0.00 0.000000 0 1 restart_syscall
0.00 0.000000 0 7 open
0.00 0.000000 0 16 close
0.00 0.000000 0 3 chdir
0.00 0.000000 0 1039 time
0.00 0.000000 0 1 lseek
0.00 0.000000 0 3 times
0.00 0.000000 0 3 ioctl
0.00 0.000000 0 1 getrusage
0.00 0.000000 0 4 munmap
0.00 0.000000 0 3 setitimer
0.00 0.000000 0 1 sigreturn
0.00 0.000000 0 1 flock
0.00 0.000000 0 1 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 2 pwrite64
0.00 0.000000 0 3 getcwd
0.00 0.000000 0 4 mmap2
0.00 0.000000 0 7 fstat64
0.00 0.000000 0 9 clock_gettime
0.00 0.000000 0 6 socket
0.00 0.000000 0 5 1 connect
0.00 0.000000 0 3 2 accept
0.00 0.000000 0 5 send
0.00 0.000000 0 64 recv
0.00 0.000000 0 3 recvfrom
0.00 0.000000 0 2 shutdown
0.00 0.000000 0 1 getsockopt
------ ----------- ----------- --------- --------- ----------------
100.00 0.004317 10489 649 total
And you can see that munmap is not at the top.
Now we have no idea how to solve this problem :(
We examined the following potential causes, and the answer to each is "no":
additional user activity
long script execution (several seconds)
using swap
Can you help us?
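Not an answer, but comparing these strace -c tables by eye is painful. A minimal, hedged sketch that parses two saved strace -c reports and prints the per-syscall time differences, largest first (the file names are hypothetical):

import re

def parse_strace_summary(path):
    """Parse `strace -c` output into {syscall: seconds}."""
    totals = {}
    with open(path) as f:
        for line in f:
            # Rows look like: " 65.56 0.008817 267 33 munmap"
            # (an optional errors column may appear before the name).
            m = re.match(r'\s*[\d.]+\s+(\d+\.\d+)\s+\d+\s+\d+'
                         r'(?:\s+\d+)?\s+(\w+)\s*$', line)
            if m and m.group(2) != 'total':
                totals[m.group(2)] = float(m.group(1))
    return totals

busy = parse_strace_summary('strace_busy.txt')      # hypothetical file
normal = parse_strace_summary('strace_normal.txt')  # hypothetical file
for name in sorted(set(busy) | set(normal),
                   key=lambda n: busy.get(n, 0.0) - normal.get(n, 0.0),
                   reverse=True):
    print('%-16s busy %.6f s vs normal %.6f s'
          % (name, busy.get(name, 0.0), normal.get(name, 0.0)))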

Can I measure memory taken by mod_perl?

Problem: my mod_perl application leaks memory and I cannot control it.
I run a mod_perl script under Ubuntu (production code).
Usually there are 8-10 script instances running concurrently.
According to the Unix "top" utility, each instance takes 55M of memory.
55M is a lot, but I was told here that most of this memory is shared.
The memory is leaking.
There is 512M of memory on the server.
Free memory decreases significantly within 24 hours of a reboot.
Test: free memory on the system while 10 scripts are running:
-after reboot: 270M
-in 24 hours since reboot: 50M
In those 24 hours, the memory taken by each script stays roughly the same - 55M (according to "top").
I don't understand where the memory leaks out,
and I don't know how I can find the leaks.
I do share memory: I preload all the modules required by the script in startup.pl.
One more test.
A very simple mod_perl script ("Hello world!") takes 52M (according to "top").
According to "Practical mod_perl", I can use the GTop utility to measure the real memory taken by mod_perl.
I made a very simple script that measures the memory with GTop.
It shows 54M of real memory taken by a very simple Perl script!
54 megabytes for "Hello world"?!
proc-mem-size: 59707392
proc-mem-share: 5259264
diff: 54448128
There must be something wrong in the way I measure mod_perl memory.
Please help!
This problem has been driving me mad for several days.
Here are snapshots of the "top" output right after reboot and 24 hours after reboot.
The processes are sorted by memory.
---- RIGHT AFTER REBOOT ----
top - 10:25:24 up 55 min, 2 users, load average: 0.10, 0.07, 0.07
Tasks: 59 total, 3 running, 56 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 97.3%id, 0.7%wa, 0.0%hi, 0.0%si, 2.0%st
Mem: 524456k total, 269300k used, 255156k free, 12024k buffers
Swap: 0k total, 0k used, 0k free, 71276k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2307 www-data 15 0 58500 27m 5144 S 0.0 5.3 0:02.02 apache2
2301 www-data 15 0 58492 27m 4992 S 0.0 5.3 0:02.09 apache2
2302 www-data 15 0 57936 26m 4960 R 0.0 5.2 0:01.74 apache2
2895 www-data 15 0 57812 26m 5048 S 0.0 5.2 0:00.98 apache2
2903 www-data 15 0 56944 26m 4792 S 0.0 5.1 0:01.12 apache2
2886 www-data 15 0 56860 26m 4784 S 0.0 5.1 0:01.20 apache2
2896 www-data 15 0 56520 26m 4804 S 0.0 5.1 0:00.85 apache2
2911 www-data 15 0 56404 25m 4768 S 0.0 5.1 0:00.87 apache2
2901 www-data 15 0 56520 25m 4744 S 0.0 5.1 0:00.84 apache2
2893 www-data 15 0 56608 25m 4740 S 0.0 5.1 0:00.73 apache2
2277 root 15 0 51504 22m 6332 S 0.0 4.5 0:01.02 apache2
2056 mysql 18 0 98628 21m 5164 S 0.0 4.2 0:00.64 mysqld
3162 root 15 0 6356 3660 1276 S 0.0 0.7 0:00.00 vi
2622 root 15 0 8584 2980 2392 R 0.0 0.6 0:00.07 sshd
3083 root 15 0 8448 2968 2392 S 0.0 0.6 0:00.06 sshd
3164 par 15 0 5964 2828 1868 S 0.0 0.5 0:00.05 proftpd
1 root 18 0 3060 1900 576 S 0.0 0.4 0:00.00 init
2690 root 17 0 4272 1844 1416 S 0.0 0.4 0:00.00 bash
3151 root 15 0 4272 1844 1416 S 0.0 0.4 0:00.00 bash
2177 root 15 0 8772 1640 520 S 0.0 0.3 0:00.00 sendmail-mta
2220 proftpd 15 0 5276 1448 628 S 0.0 0.3 0:00.00 proftpd
2701 root 15 0 2420 1120 876 R 0.0 0.2 0:00.09 top
1966 root 18 0 5396 1084 692 S 0.0 0.2 0:00.00 sshd
---- ROUGHLY 24 HOURS AFTER REBOOT ----
top - 17:45:38 up 23:39, 1 user, load average: 0.02, 0.09, 0.11
Tasks: 55 total, 2 running, 53 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 524456k total, 457660k used, 66796k free, 127780k buffers
Swap: 0k total, 0k used, 0k free, 114620k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16248 www-data 15 0 63712 35m 6668 S 0.0 6.8 0:23.79 apache2
19417 www-data 15 0 60396 31m 6472 S 0.0 6.2 0:10.95 apache2
19419 www-data 15 0 60276 31m 6376 S 0.0 6.1 0:11.71 apache2
19321 www-data 15 0 60480 29m 4888 S 0.0 5.8 0:11.51 apache2
21241 www-data 15 0 58632 29m 6260 S 0.0 5.8 0:05.18 apache2
22063 www-data 15 0 57400 28m 6396 S 0.0 5.6 0:02.05 apache2
21240 www-data 15 0 58520 27m 4856 S 0.0 5.5 0:04.60 apache2
21236 www-data 15 0 58244 27m 4868 S 0.0 5.4 0:05.24 apache2
22499 www-data 15 0 56736 26m 4776 S 0.0 5.1 0:00.70 apache2
2055 mysql 15 0 100m 25m 5656 S 0.0 5.0 0:20.95 mysqld
2277 root 18 0 51500 22m 6332 S 0.0 4.5 0:01.07 apache2
22686 www-data 15 0 53004 21m 4092 S 0.0 4.3 0:00.21 apache2
22689 root 15 0 8584 2980 2392 R 0.0 0.6 0:00.06 sshd
2176 root 15 0 8768 1928 736 S 0.0 0.4 0:00.00 sendmail-mta
1 root 18 0 3064 1900 576 S 0.0 0.4 0:00.02 init
22757 root 15 0 4268 1844 1416 S 0.0 0.4 0:00.00 bash
2220 proftpd 18 0 5276 1448 628 S 0.0 0.3 0:00.00 proftpd
22768 root 15 0 2424 1100 876 R 0.0 0.2 0:00.00 top
1965 root 15 0 5400 1088 692 S 0.0 0.2 0:00.00 sshd
2258 root 18 0 3416 1036 820 S 0.0 0.2 0:00.01 cron
1928 klog 25 0 2248 1008 420 S 0.0 0.2 0:00.04 klogd
1946 messageb 19 0 2648 804 596 S 0.0 0.2 0:01.63 dbus-daemon
1908 syslog 18 0 2016 716 556 S 0.0 0.1 0:00.17 syslogd
It doesn't actually look like the number of apache/mod_perl processes in existence, or the memory they use, has changed much between the two reports you posted. Look instead at the header lines: "cached" has grown from 71,276k to 114,620k and "buffers" from 12,024k to 127,780k. I am going to go out on a limb and guess that this is where your memory is going - Linux is using it for caching file I/O. You can think of the file I/O cache as essentially free memory, since Linux will make that memory available if processes need it.
You can also check that this is what's going on by performing
sync; echo 3 > /proc/sys/vm/drop_caches
as root to cause the memory in use by the caches to be released, and confirming that this causes the amount of free memory reported to revert to initial values.
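As for measuring what mod_perl itself really takes: "top" counts shared pages against every Apache child, which is why even "Hello world" looks like 52M. Summing the Private_Clean/Private_Dirty fields of /proc/<pid>/smaps gives a more honest per-child figure. A minimal sketch, assuming a kernel that exposes smaps (run it as root; the process-name match is illustrative):

import glob

def private_kb(pid):
    # Sum of private (unshared) mappings for a process, in kB.
    total = 0
    with open('/proc/%s/smaps' % pid) as f:
        for line in f:
            if line.startswith(('Private_Clean:', 'Private_Dirty:')):
                total += int(line.split()[1])
    return total

# Report private memory for every apache2 process.
for status_path in glob.glob('/proc/[0-9]*/status'):
    pid = status_path.split('/')[2]
    try:
        with open(status_path) as f:
            name = f.readline().split(':', 1)[1].strip()  # "Name:" line
        if name == 'apache2':
            print(pid, private_kb(pid), 'kB private')
    except OSError:
        pass  # process exited mid-scan, or permission denied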
