WHM Server receiving lots of "FAILED: cphulk"

I have a WHM server on GoDaddy.
I'm receiving quite a lot of emails (3-4 a day) about a process failing and then recovering by itself. It happens mostly to "cphulkd" but also to "lfd".
My server:
WHM version v68.0.33. It hosts two websites (one Moodle and one WordPress). 2 GB RAM, 60 GB HD.
This is the whole mail:
Server: s50-62-22-123.secureserver.net
Primary IP Address: 50.62.22.123
Service Name: cphulkd
Service Status: failed ⛔
Notification: The service “cphulkd” appears to be down.
Service Check Method: The system’s command to check or to restart this service failed.
Number of Restart Attempts: 1
Service Check Raw Output: (XID ejd2e7) The “cphulkd” service is down. The subprocess “/usr/local/cpanel/scripts/restartsrv_cphulkd” reported error number 255 when it ended.
Startup Log: Starting cPHulkd... Started. Starting PID 3789: cPhulkd - processor - dormant mode - accepting connections
Memory Information: Used 2.43 GB, Available 1.57 GB, Installed 4 GB
Load Information: 0.17 0.19 0.18
Uptime: 2 days, 18 hours, 59 minutes, and 37 seconds
IOStat Information:
avg-cpu: %user %nice %system %iowait %steal %idle
         0.62  0.11  0.12    0.17    0.00   98.99
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
Top Processes:
PID Owner CPU % Memory % Command
18850 root 2.45 2.29 spamd child
3452 root 0.94 2.35 /usr/local/cpanel/3rdparty/perl/524/bin/perl -T -w /usr/local/cpanel/3rdparty/bin/spamd --max-spare=1 --max-children=3 --allowed-ips=127.0.0.1,::1 --pidfile=/var/run/spamd.pid --listen=5
1488 mysql 0.52 7.49 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=s50-62-22-179.secureserver.net.err --open-files-limit=10000 --pid-file=/var/lib/mysql/s50-62-22-179.secureserver.net.pid
18854 dovecot 0.31 0.06 dovecot/auth
20291 root 0.07 0.71 lfd - sleeping
Any ideas?
What's weird is that the mail says I have 4 GB installed, but I only have 2 GB.

Related

Low CPU usage on ubuntu 14.04 and nodejs

I have two servers running the exact same nodejs application. I am doing load testing and I can't figure out why one of my servers will not utilize more CPU and RAM.
It is much slower when load testing yet it is not even close to utilizing all the free CPU and memory.
If I run top during the load test, these are the numbers I am getting
PID User PR NI VIRT RES SHR S %CPU %MEM TIME COMMAND
1308 ubuntu 20 0 1002524 87508 9788 S 5.3 4.3 0:03.06 nodejs
1307 ubuntu 20 0 925540 75288 9436 S 5.0 3.7 0:02.17 nodejs
1308 ubuntu 20 0 992076 77068 9788 S 14.0 3.8 0:03.48 nodejs
1307 ubuntu 20 0 937140 86904 9436 S 2.7 4.3 0:02.25 nodejs
1308 ubuntu 20 0 1012936 98000 9788 S 14.3 4.8 0:03.91 nodejs
1307 ubuntu 20 0 942940 92644 9436 S 1.0 4.5 0:02.28 nodejs
1307 ubuntu 20 0 943204 92976 9436 S 6.0 4.6 0:02.46 nodejs
1308 ubuntu 20 0 1011764 96804 9788 S 6.0 4.7 0:04.09 nodejs
1307 ubuntu 20 0 933644 83388 9436 S 8.6 4.1 0:02.72 nodejs
1308 ubuntu 20 0 1008720 93556 9788 S 5.3 4.6 0:04.25 nodejs
1308 ubuntu 20 0 1000184 85256 9788 S 8.6 4.2 0:04.51 nodejs
1307 ubuntu 20 0 944092 93988 9436 S 7.6 4.6 0:02.95 nodejs
1307 ubuntu 20 0 941748 91816 9436 S 15.0 4.5 0:03.40 nodejs
1308 ubuntu 20 0 1004832 90008 9788 S 1.3 4.4 0:04.55 nodejs
1307 ubuntu 20 0 933460 82632 9436 S 9.0 4.1 0:03.67 nodejs
Running two processes I don't see memory getting above 4.7% and CPU is at 14%.
It is taking twice as long to serve the exact same resources as a machine with one core and half the memory.
My other server is using 52% of its CPU. Granted, it has one core and the one above has two, but it doesn't seem like that would make the difference.
I downloaded cpufrequtils and set the GOVERNOR to performance but I don't think it is working. This is what I get when I run cpufreq-info
analyzing CPU 0:
no or unknown cpufreq driver is active on this CPU
maximum transition latency: 4294.55 ms.
analyzing CPU 1:
no or unknown cpufreq driver is active on this CPU
maximum transition latency: 4294.55 ms.
Here is the CPU
Intel(R) Core(TM)2 CPU 6300 @ 1.86GHz
Any ideas or hints would be appreciated
If both servers are running the same Node.js application, compare the other settings on the two machines, for example the output of ulimit -a. Are they the same?
Also, Node.js runs your JavaScript on a single thread, so a single process will not benefit from a dual-core or multi-core machine unless you use the cluster module to spread the work across cores.

ubuntu 14.04.1 server idle load average 1.00

Scratching my head here. Hoping someone can help me troubleshoot.
I have a Dell PowerEdge SC1435 server which had been running with a previous version of ubuntu for a while. (I believe it was 13.10 server x64)
I recently reformatted the drive (SSD) and installed ubuntu server 14.04.1 x64.
All seemed fine through the install but the machine hung on first boot at the end of the kernel output, just before I would expect the screen to clear and a logon prompt appear. There were no obvious errors at the end of the kernel output that I saw. (There was a message about "not using cpu thermal sensor that is unreliable" but that appears to be there regardless of whether it boots or not)
I gave it a good 5 minutes and then forced a reboot. To my surprise it booted to the logon prompt in about 1-2 seconds after bios post. I rebooted again and it seemed to pause for a few extra seconds where it hung before, but proceeded to the login screen. Rebooting again it was fast again. So at this point I thought it was just one of those random one-off glitches that I would never explain so I moved on.
I installed a few packages (exact same packages installed on the same OS version on other hardware), did apt upgrade and dist-upgrade then rebooted. It seemed to hang again so I drove to the datacentre and connected a console only to get a blank screen. Forced reboot again. (also setup ipmi for remote rebooting and got rid of the grub recordfail so it would not wait for me to press enter!)
That was very late last night. I came home, did a few reboots with no issue so went to bed.
Today I did a reboot again to check it and again it crashed somewhere. I remotely force rebooted it.
At this point I started digging a little more and immediately noticed something really strange.
top - 14:18:35 up 8 min, 1 user, load average: 1.00, 0.85, 0.45
Tasks: 148 total, 1 running, 147 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.1 us, 0.3 sy, 0.0 ni, 99.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 33013620 total, 338928 used, 32674692 free, 9740 buffers
KiB Swap: 3906556 total, 0 used, 3906556 free. 47780 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 33508 2772 1404 S 0.0 0.0 0:03.82 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.00 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/u16:0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.24 rcu_sched
9 root 20 0 0 0 0 S 0.0 0.0 0:00.02 rcuos/0
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/1
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuos/2
This server is completely unused and idle, yet it has a 1 minute load average of exactly 1.00?
As I watch the other values - the 5 minute and 15 minute also appear to be heading towards 1.00 so I assume they will all reach 1.00 at some point. (The "1 Running" is the top process)
I have never had this before and since I have no idea what is causing the startup crashing, I am assuming at this point that the two are likely related.
What I would like to do is identify (and hopefully eliminate) what is causing that false load average and my crashing issue.
So far I have been unable to identify what process could be waiting for a resource of some kind to generate that load average.
I would very much appreciate it if someone could help me to try and track it down.
top shows all processes pretty much always sleeping. Some occasionally pop up to the top of the list, but I think that's pretty normal. CPU usage mostly shows 100% idle, with very occasional dips to 99% or so.
nmon doesn't show me much; everything just looks idle.
iotop shows pretty much no traffic whatsoever (again, very occasional spots of disk access).
Interrupt frequency seems low: way below 100/sec from what I can see.
I saw numerous google discussions suggesting this:
echo 100 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
..no effect.
RAM in the server is ECC and test passes.
Server install was 'minimal' (F4 option) with OpenSSH server ticked during install.
Installed a few packages afterwards including vim, bcache-tools, bridge-utils, qemu, software-properties-common, open-iscsi, qemu-kvm, cpu-checker, socat, ntp and nodejs. (Think that is about it)
I have tried disabling and removing the bcache kernel module. no effect.
stopped iscsi service.. no effect. (although there is absolutely nothing configured on this server yet)
I will leave it there before this gets insanely long. If anyone could help me try to figure this out it would be very much appreciated.
Cheers,
James
The load average of 1.0 is an artefact of the bcache write-back thread staying in uninterruptible sleep. It may be corrected in 3.19 kernels or newer. See this Debian bug report for instance.
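On Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones, which is how a completely idle machine can sit at exactly 1.00. A minimal sketch for spotting such a task follows. It assumes a Linux /proc filesystem, and the function names are made up for illustration:

```python
import os

def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST closing parenthesis.
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state

def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return []  # no /proc here (not Linux, or not mounted)
    found = []
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were reading it
        if state == "D":
            found.append((pid, comm))
    return found

if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Running it a few times in a row on the idle server should show whether the same kernel thread is stuck in state D every time, which would be consistent with the bcache explanation above.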

OpCache not caching

I've recently activated OPcache but it doesn't appear to be working.
It's confirmed activated via phpinfo()
As you can see
0 hits
1 miss
1 cached script (opcached gui)
What am I missing?
The server is a CentOS 6.5 Linux VPS
PHP 5.5
A bit more info about opcache configuration
opcache_enabled true
cache_full false
restart_pending false
restart_in_progress false
used_memory 8.54 MB
free_memory 503.46 MB
wasted_memory 0 bytes
current_wasted_percentage 0.00%
buffer_size 4194304
used_memory 446.41 kB
free_memory 3.56 MB
number_of_strings 4895
num_cached_scripts 1
num_cached_keys 1
max_cached_keys 65407
hits 0
start_time Sat, 26 Jul 14 23:20:32 +0000
last_restart_time never
oom_restarts 0
hash_restarts 0
manual_restarts 0
misses 1
blacklist_misses 0
blacklist_miss_ratio 0.00%
opcache_hit_rate 0.00%
This looks like you are using cgi rather than mod_php5. The shared memory area (SMA) is used for both, but it only persists request-to-request for the latter.
I had this issue on a WHM/cPanel server today. As TerryE suggests, you are probably running CGI or suPHP. Change to DSO.

How to find out the cause of CPU 100% of Node.js server?

I'm running a Node.js server with socket.io. It's a simple chat server. It has been running for 2 years, so the software versions were pretty old and I updated them recently. Since the updates, the server frequently consumes 100% CPU. It worked well for 2 years, so I don't think the cause is the application code, but I cannot find out what the problem is.
Before I updated:
Node.js 0.8.14
socket.io 0.9.16
express 2.5.2
Now I'm using:
Node.js 0.10.28 ~ 0.11.13 (tried both)
socket.io 1.0.1
express 4.1.1
I tried benchmarking but couldn't reproduce the problem. I figured out that template rendering is pretty slow, but my chat server is for mobile apps, so it doesn't serve HTML pages much. Only the admin page uses the template engine, and the 100% CPU usage happens even when I am not viewing the admin pages.
Using strace, I got this:
strace -r -p 32224 -c
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
16.91 0.003417 35 97 futex
14.47 0.002923 8 347 72 epoll_ctl
14.10 0.002848 20 144 write
11.32 0.002286 15 152 read
6.27 0.001266 18 70 close
5.77 0.001165 19 61 61 connect
5.53 0.001117 6 183 clock_gettime
5.20 0.001051 117 9 munmap
4.65 0.000940 5 173 gettimeofday
4.19 0.000846 14 61 socket
3.72 0.000752 6 122 ioctl
3.36 0.000679 12 58 epoll_wait
2.34 0.000473 7 72 getsockopt
1.95 0.000394 56 7 mmap
0.22 0.000045 23 2 open
------ ----------- ----------- --------- --------- ----------------
100.00 0.020202 1558 133 total
However, I don't know how to analyze this report. epoll_ctl seems to be used by the event loop, and the epoll_ctl errors may be caused by the connect errors, right? I found that the connect syscall is for socket connections, but I cannot get any further.
This strace report covers 2 minutes. There weren't many users during that time, just 2~5.
Can I find the cause using this report, or do I have to find another way to debug?
There is the V8 Profiler that can output a report that can be read in Chrome Profiling tab. If you use PM2 and Keymetrics, it's really easy. Just install v8-profiler and pmx modules. Make sure to require the pmx module in the script and then start profiling via the Keymetrics site. You can always use the V8 Profiler alone to get the same report. It's a little more work though.
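As for reading the strace summary itself: by default the -c table reports CPU time spent inside system calls, and the total in the question is about 0.02 seconds for a two-minute trace. If the process really was at 100% CPU during that window, this would suggest the time is going to user-space code rather than the kernel, which is what a profiler is for. The 61 failed connect calls may simply be EINPROGRESS from non-blocking sockets, which strace counts as errors; that is an assumption, since the summary does not show errno values. A small sketch that pulls these numbers out of the table (the sample holds only three of the rows, and the function name is made up for illustration):

```python
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 16.91    0.003417          35        97           futex
 14.47    0.002923           8       347        72 epoll_ctl
  5.77    0.001165          19        61        61 connect
------ ----------- ----------- --------- --------- ----------------
100.00    0.020202                  1558       133 total
"""

TRACE_SECONDS = 120               # the question says the trace covers about 2 minutes
TOTAL_SYSCALL_SECONDS = 0.020202  # "total" row of the full summary

def parse_strace_summary(text):
    """Parse the table printed by `strace -c` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Skip short lines and the total row; header and rulers fail below.
        if len(parts) < 5 or parts[-1] == "total":
            continue
        try:
            seconds, calls = float(parts[1]), int(parts[3])
        except ValueError:
            continue
        # The errors column is blank when a syscall never failed.
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append({"syscall": parts[-1], "seconds": seconds,
                     "calls": calls, "errors": errors})
    return rows

rows = parse_strace_summary(SAMPLE)
kernel_share = TOTAL_SYSCALL_SECONDS / TRACE_SECONDS
print(f"time in syscalls: {kernel_share:.4%} of the trace")
for row in rows:
    if row["errors"]:
        print(row["syscall"], f'{row["errors"]}/{row["calls"]} calls failed')
```

To see which errno the failing calls actually return, running strace without -c on the same PID for a short while would show each call and its result.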

Cannot Understand the TOP command output on Hadoop Datanode

Hi, I just installed Cloudera Manager on my cluster: 1 namenode and 4 datanodes. Each data node has 64 GB RAM, a 24-core Xeon CPU, 16 1 TB SAS disks, etc.
I installed a brand new Red Hat Linux and upgraded to 6.5. Each disk has been logically set up as RAID0, since there is no JBOD option available on the array controller.
I am running a Hive query and below is the top output on a data node. I am confused and wondering if an experienced Hadoop admin could help me understand whether my cluster is working fine.
Why is there only 1 task running out of 897 while the other 896 are sleeping? There are 2271 mappers for that Hive query and it is only at 80% on the mapper side.
The load average is 8.66. I read here that if your computer is working hard, the load average should be around the number of cores. Is my datanode working hard enough?
69/70 of the memory is reported as "used", yet the active yarn processes seem to use fairly little memory. How could those 64 GB of memory be used up so easily?
Here is the top output:
top - 22:50:24 up 1 day, 8:24, 3 users, load average: 8.66, 8.50, 7.95
Tasks: 897 total, 1 running, 896 sleeping, 0 stopped, 0 zombie
Cpu(s): 32.3%us, 5.2%sy, 0.0%ni, 62.3%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 70096068k total, 69286800k used, 809268k free, 222268k buffers
Swap: 4194296k total, 0k used, 4194296k free, 61468376k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
439 yarn 20 0 1417m 591m 19m S 193.9 0.9 1:06.12 java
561 yarn 20 0 1401m 581m 19m S 193.2 0.8 0:19.75 java
721 yarn 20 0 1415m 561m 19m S 172.0 0.8 0:08.54 java
611 yarn 20 0 1415m 574m 19m S 127.0 0.8 0:16.87 java
354 yarn 20 0 1428m 595m 19m S 121.4 0.9 0:35.96 java
27418 yarn 20 0 1513m 483m 18m S 13.6 0.7 18:26.14 java
16895 hdfs 20 0 1438m 410m 18m S 9.6 0.6 103:23.70 java
3726 hdfs 20 0 860m 249m 21m S 1.7 0.4 2:12.28 java
I am fairly new at system admin and any metric tool or common sense will be much appreciated! Thanks!
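Two of the numbers in that top output can be put in perspective with simple arithmetic. In this older top layout the "used" figure includes buffers and page cache, which the kernel gives back when processes need memory, so subtracting them gives a rough idea of what processes actually hold. Dividing the load average by the 24 cores stated in the question gives the load per core. A minimal sketch using only values copied from the output above (the helper name is made up for illustration):

```python
import re

MEM_LINE = "Mem: 70096068k total, 69286800k used, 809268k free, 222268k buffers"
SWAP_LINE = "Swap: 4194296k total, 0k used, 4194296k free, 61468376k cached"
LOAD_1MIN = 8.66   # from the "load average" line
CORES = 24         # stated in the question

def kb_fields(line):
    """Map each label (total, used, ...) on a top summary line to its kB value."""
    return {label: int(value) for value, label in re.findall(r"(\d+)k (\w+)", line)}

mem = kb_fields(MEM_LINE)
cached_kb = kb_fields(SWAP_LINE)["cached"]

# "used" minus buffers and page cache approximates what processes really hold.
app_used_kb = mem["used"] - mem["buffers"] - cached_kb
load_per_core = LOAD_1MIN / CORES

print(f"held by processes: {app_used_kb / 1024**2:.1f} GiB")
print(f"load per core: {load_per_core:.2f}")
```

On these numbers most of the "used" memory is page cache rather than process memory, and the load is well below one runnable task per core. Whether that means the node is underused depends on how many containers YARN is configured to run, which the output does not show.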

Resources