Apache - High CPU usage after upgrading to Amazon Linux 2

Amazon Linux 1:
Server version: Apache/2.4.54 (Amazon)
Server built: Jul 11 2022 21:47:38
Server's Module Magic Number: 20120211:124
Server loaded: APR 1.6.3, APR-UTIL 1.5.4, PCRE 8.21 2011-12-12
PHP 7.3.30 (cli) (built: Oct 6 2021 20:34:22) ( NTS )
Amazon Linux 2:
Server version: Apache/2.4.54 ()
Server built: Jun 30 2022 11:02:23
Server's Module Magic Number: 20120211:124
Server loaded: APR 1.7.0, APR-UTIL 1.6.1, PCRE 8.32 2012-11-30
PHP 7.4.30 (cli) (built: Jun 23 2022 20:19:00) ( NTS )
The servers are configured via automation and loaded into ALB/ASGs.
Instance size is m4.large (2x vCPU, 8GiB Memory)
Auto-Scaling group is configured with Min:4 Max:8
This is what my httpd.conf file looks like:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
ServerLimit 256
MaxClients 256
MaxConnectionsPerChild 4000
</IfModule>
On Amazon Linux 1, the site works as expected. The ASG spawns 4 instances and they each hover at 40-60% CPU utilization during peak hours.
The same build on Amazon Linux 2 yields wildly different results: every instance immediately spawns a huge number of httpd processes.
The ASG scales up to 8 instances, every single one at 90%+ CPU usage. The instances start to lock up, which causes the target group to mark them "unhealthy" and the ASG to rotate them out endlessly. The website obviously does not work.
What could be causing them to behave so differently, and what steps can I take to mitigate this? I'm honestly pretty new to all this, so I don't know where to start.
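For anyone comparing the two builds, a quick way to confirm which MPM each one actually loads; a prefork-only <IfModule> block is silently ignored when the event or worker MPM is active (paths assume the stock Amazon Linux 2 httpd package):
# Print the MPM httpd will run with under the current config:
httpd -V | grep -i mpm
# Show which LoadModule line selects the MPM on Amazon Linux 2:
grep -v '^#' /etc/httpd/conf.modules.d/00-mpm.conf
If the two builds load different MPMs, the prefork limits above apply on one and not the other, which would make the process behaviour diverge.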

Related

How to confirm if TimeSync service is enabled on a RHEL 8.2 VM running in Azure?

I'm new to Linux, so I'm a bit confused about whether I have to do any best-practice time sync configuration with Azure or not.
From https://learn.microsoft.com/en-us/windows-server/networking/windows-time-service/accurate-time?redirectedfrom=MSDN#allowing-linux-to-use-hyper-v-host-time
The above link mentions: "For Linux guests running in Hyper-V, clients are typically configured to use the NTP daemon for time synchronization against NTP servers. If the Linux distribution supports the TimeSync version 4 protocol and the Linux guest has the TimeSync integration service enabled, then it will synchronize against the host time. This could lead to inconsistent time keeping if both methods are enabled."
How can I confirm this?
How can I confirm whether the TimeSync service is enabled on my RHEL 8.2 VM running in Azure?
Also, how can I confirm whether my NTP daemon is configured for time synchronization against NTP servers?
As part of my investigation I have run the following on the RHEL 8.2 VM (running in Azure).
My findings in this lab are that NTP is not configured directly: /etc/ntp.conf does not exist and (as recorded in earlier comments) the ntpq command is not found.
[user@vm-aep-dev-eastu ~]$ service ntpd status
Redirecting to /bin/systemctl status ntpd.service
Unit ntpd.service could not be found.
however "chrony" is active.Chrony appears to be synchronising the system clock with NTP servers.
systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2020-07-16 08:58:39 UTC; 7h ago
Other details:
$ /sbin/lsmod | egrep -i "^hv|hyperv"
hv_utils 36864 2
hv_balloon 28672 0
hyperv_fb 20480 1
hv_netvsc 86016 0
hv_storvsc 20480 4
hid_hyperv 16384 0
hyperv_keyboard 16384 0
hv_vmbus 114688 7 hv_balloon,hv_utils,hv_netvsc,hid_hyperv,hv_storvsc,hyperv_keyboard,hyperv_fb
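For reference, a way to see what chrony is actually syncing against (standard chronyc commands; the exact output differs per image):
# List chrony's sources; '^' marks NTP servers, '#' marks a local
# reference clock such as the Hyper-V-provided PTP device:
chronyc sources -v
# Check whether the config points at the host's PTP clock:
grep -i refclock /etc/chrony.conf
ls /dev/ptp*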
Thanks
From the document Time sync for Linux VMs in Azure,
On Ubuntu 19.10 and later versions, Red Hat Enterprise Linux, and
CentOS 8.x, chrony is configured to use a PTP source clock.
For more information about Red Hat and NTP, see Configure NTP.
If both chrony and VMICTimeSync sources are enabled simultaneously,
you can mark one as prefer, which sets the other source as a backup.
Because NTP services do not update the clock for large skews except
after a long period, the VMICTimeSync will recover the clock from
paused VM events far more quickly than NTP-based tools alone.
See here for more details.
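A sketch of what marking the PTP source as preferred might look like in /etc/chrony.conf; the device path and polling values here are illustrative (recent Azure kernels expose /dev/ptp_hyperv):
# Prefer the Hyper-V host clock (PHC); NTP servers stay as backups:
refclock PHC /dev/ptp_hyperv poll 3 dpoll -2 offset 0 prefer
After editing, restart chronyd (systemctl restart chronyd) and re-check chronyc sources.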

Ubuntu Xenial time discrepancy with VM and Windows host

I have three fresh installs of Ubuntu 16.04.2 LTS (Xenial) on Azure VMs. In the system log I noticed a time discrepancy, and the system is logging this every 5 seconds:
Mar 5 17:57:57 server1 systemd[1]: snapd.refresh.timer: Adding 2h 17min 4.279485s random time.
Mar 5 17:57:57 server1 systemd[1]: apt-daily.timer: Adding 5h 14min 48.381690s random time.
Mar 5 17:57:57 server1 systemd[19425]: Time has been changed
Mar 5 17:57:57 server1 systemd[37054]: Time has been changed
I have stopped the two services, apt-daily.timer and snapd.refresh.timer, and the "Time has been changed" messages still persist. It seems to be a time discrepancy between the VM and the host system. I am not sure how to address this. I also have Azure VMs of the same exact version that I installed over a month ago, and they don't show this error.
Thanks for any guidance on this.
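For anyone hitting the same messages, the usual first checks on a stock Xenial image (systemd-timesyncd is the default NTP client there, and hv_utils provides the Hyper-V host time integration):
# Is systemd-timesyncd running, and does it consider the clock synced?
timedatectl status
systemctl status systemd-timesyncd
# Is the Hyper-V guest time-sync integration loaded as well?
lsmod | grep hv_utils
If two mechanisms are active and disagree, the clock can get stepped repeatedly, which is what the "Time has been changed" lines record.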

Kurento Media Server 6.4 segmentation fault in libnice

I am using the latest Kurento Media Server (6.4) and a node.js app for one-to-one calls. However, the Kurento process crashes from time to time inside libnice (multiple crashes point to the same lib entries):
Segmentation fault (thread 139888166897408, pid 1093)
Stack trace:
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x28C74
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2B00A
[nice_output_stream_new]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x2BD09
[nice_agent_gather_candidates]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x13E8B
[nice_agent_gather_candidates]
/usr/lib/x86_64-linux-gnu/libnice.so.10:0x1431E
[g_simple_permission_new]
/usr/lib/x86_64-linux-gnu/libgio-2.0.so.0:0x75B21
[g_main_context_dispatch]
/lib/x86_64-linux-gnu/libglib-2.0.so.0:0x49EAA
[g_main_context_dispatch]
/lib/x86_64-linux-gnu/libglib-2.0.so.0:0x4A250
[g_main_loop_run]
/lib/x86_64-linux-gnu/libglib-2.0.so.0:0x4A572
[gst_nice_src_get_type]
/usr/lib/x86_64-linux-gnu/gstreamer-1.5/libgstnice15.so:0x2F99
[g_test_get_filename]
/lib/x86_64-linux-gnu/libglib-2.0.so.0:0x70965
[start_thread]
/lib/x86_64-linux-gnu/libpthread.so.0:0x8182
[clone]
/lib/x86_64-linux-gnu/libc.so.6:0xFA47D
At the same time, the most recent debug events recorded in the log files are ICE candidate pairings:
2016-03-17 08:43:12,095520 1093 [0x00007f3a407f0700] debug KurentoWebRtcEndpointImpl WebRtcEndpointImpl.cpp:214 newSelectedPairFull() <kmswebrtcendpoint43> New pair selected stream_id: 1, component_id: 1, local candidate: candidate:4 1 UDP 2013266431 222.250.42.158 55844 typ host, remote candidate: candidate:candidate:2003496507 1 UDP 25042687 222.250.42.151 51817 typ relay raddr 83.21.212.134 rport 51817
The application log shows that the error happens right after both SDP answers were generated, just before the startCommunication command is fired:
AppServer-0 Thu, 17 Mar 2016 08:43:11 GMT kms:CallMediaPipeline generateSdpOffer 16
AppServer-0 Thu, 17 Mar 2016 08:43:11 GMT kms:server pipeline:sdpAnswer:caller sdpReady
AppServer-0 Thu, 17 Mar 2016 08:43:11 GMT kms:CallMediaPipeline generateSdpOffer 15
AppServer-0 Thu, 17 Mar 2016 08:43:11 GMT kms:server pipeline:sdpAnswer:callee sdpReady
AppServer-0 reconnect to server 0 100 ff19ebc8-b114-495e-bf31-31f188f6ea8e
The full stack trace and log can be seen in this gist.
$ kurento-media-server -v
Version: 6.4.1~1.g3ffe480
Found modules:
Module: 'core' version '6.4.1~2.g4ed0cfc'
Module: 'elements' version '6.4.1~3.g8e842ad'
Module: 'filters' version '6.4.1~3.g06e2b4f'
# dpkg -l |grep -i libnice
ii libnice10:amd64 0.1.13.1~20160224182402.77.g7bbb87a.trusty amd64 ICE library (shared library)
I can reproduce this by using two Chrome browsers (Mac + Win) and making around 10-15 calls (call out - hang up - call out - hang up ...).
If anyone could give any hints, suggestions or directions, I would really appreciate that.
Thanks!
UPD: Problem solved after switching the TURN server from resiprocate-turn-server 1.9.7 to coturn 4.4.2.1.
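For reference, the TURN server KMS talks to is set in the WebRtcEndpoint module config; a sketch with placeholder credentials and address (the file path is the standard one for KMS 6.x):
# /etc/kurento/modules/kurento/WebRtcEndpoint.conf.ini
# Point KMS at the TURN server as user:password@host:port
turnURL=kurento:secret@203.0.113.10:3478
Restart the kurento-media-server service after changing it.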

Redis and memory

I am developing software for an embedded system (512 MB RAM). I'm using Redis to take the place of shared memory between processes inside a Django application.
We are talking about 150 values, stored every second, coming from a MODBUS device. They all have the same key and their expire time is 10 minutes.
After some hours of operation (typically a day), Redis ceases to function due to memory problems. Can someone help me out?
Output of ps aux | grep redis:
redis 1934 1.9 2.2 76216 8400 ? Ssl 07:49 10:37 /usr/bin/redis-server 127.0.0.1:6379
redis.info
# Server
redis_version:2.8.6
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:7cc01333adfd1c61
redis_mode:standalone
os:Linux 3.4.79+ armv7l
arch_bits:32
multiplexing_api:epoll
gcc_version:4.6.3
process_id:1934
run_id:be418b5a05b6670bb4bff9c73cc7126589d6b5c8
tcp_port:6379
uptime_in_seconds:33584
uptime_in_days:0
hz:10
lru_clock:858206
config_file:/etc/redis/redis.conf
# Clients
connected_clients:138
client_longest_output_list:54
client_biggest_input_buf:0
blocked_clients:0
# Memory
used_memory:6893800
used_memory_human:6.57M
used_memory_rss:6152192
used_memory_peak:47902480
used_memory_peak_human:45.68M
used_memory_lua:25600
mem_fragmentation_ratio:0.89
mem_allocator:jemalloc-3.0.0
# Persistence
loading:0
rdb_changes_since_last_save:1370469
rdb_bgsave_in_progress:0
rdb_last_save_time:1434611837
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:0
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
# Stats
total_connections_received:155
total_commands_processed:3207775
instantaneous_ops_per_sec:55
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:297
evicted_keys:0
keyspace_hits:2141758
keyspace_misses:12495
pubsub_channels:6
pubsub_patterns:0
latest_fork_usec:0
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
# CPU
used_cpu_sys:295.85
used_cpu_user:333.85
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
# Keyspace
db0:keys=2,expires=2,avg_ttl=3185003
db1:keys=937,expires=242,avg_ttl=585637
Snippet of redis-server.log:
[1969] 18 Jun 22:17:21.819 # Server started, Redis version 2.8.6
[1969] 18 Jun 22:17:21.919 * DB loaded from disk: 0.100 seconds
[1969] 18 Jun 22:17:21.919 * The server is now ready to accept connections on port 6379
[1969] 18 Jun 22:17:21.919 * The server is now ready to accept connections at /var/run/redis/redis.sock
[1969] 19 Jun 09:16:50.444 # Client addr=127.0.0.1:38745 fd=9 name= age=39516 idle=4330 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=13724 oll=415 omem=8509160 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1969] 19 Jun 09:20:54.056 # Client addr=127.0.0.1:38759 fd=14 name= age=4713 idle=4331 flags=N db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=13688 oll=415 omem=8509160 events=rw cmd=subscribe scheduled to be closed ASAP for overcoming of output buffer limits.
[1941] 19 Jun 09:17:17.134 # Unable to set the max number of files limit to 10032 (Operation not permitted), setting the max clients configuration to 3984.
Some lines from redis-cli monitor, in case someone finds them useful. Since the same keys are rewritten over and over again, the high amount of memory used and all those file descriptors puzzle me.
http://pastebin.com/rQqThUHF
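The "overcoming of output buffer limits" lines in the log point at slow pub/sub subscribers hitting Redis's per-class client output buffer cap (the default for pubsub clients is 32mb hard / 8mb soft over 60 seconds, and the logged clients show omem over 8 MB). Inspecting and raising it looks like this; the new values are illustrative, not a recommendation:
# Show the current per-class client output buffer limits:
redis-cli config get client-output-buffer-limit
# Raise the pubsub class at runtime (hard, soft, soft-seconds);
# mirror the change in /etc/redis/redis.conf to make it persistent:
redis-cli config set client-output-buffer-limit "pubsub 64mb 16mb 120"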

When I run a JavaFX application on Fedora Linux, my application crashes!

I have made a JavaFX application which runs fine on Windows and Mac OS, but when I run it on Fedora Linux the application crashes the whole system, with the following log.
1) What is the reason for the crash on Linux?
2) What might be the possible solution to this crash?
A fatal error has been detected by the Java Runtime Environment:
SIGSEGV (0xb) at pc=0x00840e58, pid=2114, tid=2694839152
JRE version: Java(TM) SE Runtime Environment (7.0_51-b13) (build 1.7.0_51-b13)
Java VM: Java HotSpot(TM) Client VM (24.51-b03 mixed mode linux-x86 )
Problematic frame: C [libc.so.6+0x2fe58] exit+0x38
Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again.
If you would like to submit a bug report, please visit: http://bugreport.sun.com/bugreport/crash.jsp
The crash happened outside the Java Virtual Machine in native code. See problematic frame for where to report the bug.
--------------- T H R E A D ---------------
Current thread (0xa0a8d800): JavaThread "JNativeHook Native Hook" [_thread_in_native, id=2306, stack(0xa01ff000,0xa0a00000)]
--------------- S Y S T E M ---------------
OS: Fedora release 14 (Laughlin)
uname: Linux 2.6.35.6-45.fc14.i686 #1 SMP Mon Oct 18 23:56:17 UTC 2010 i686
libc: glibc 2.12.90 NPTL 2.12.90
rlimit: STACK 8192k, CORE 0k, NPROC 1024, NOFILE 1024, AS infinity
load average: 20.56 6.52 4.06
/proc/meminfo: MemTotal: 1013996 kB MemFree: 112652 kB Buffers: 4224 kB Cached: 140000 kB
Memory: 4k page, physical 1013996k(112652k free), swap 1535996k(665220k free)
vm_info: Java HotSpot(TM) Client VM (24.51-b03) for linux-x86 JRE (1.7.0_51-b13), built on Dec 18 2013 18:49:34 by "java_re" with gcc 4.3.0 20080428 (Red Hat 4.3.0-8)
time: Mon Feb 10 16:29:44 2014
elapsed time: 15804 seconds
I am not including the whole log because it is too long to post. Please suggest possible solutions based on this excerpt of the crash log.
Please file a bug at https://github.com/kwhat/jnativehook with the entire crash log. Chances are the issue has already been fixed in the 1.2 trunk.
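When filing that report, note the log itself says core dumps are disabled; enabling them for a run looks like this (the jar name is a placeholder for the actual launcher):
# Allow unlimited-size core dumps in this shell, then relaunch;
# attach the resulting core file and hs_err_pid*.log to the bug:
ulimit -c unlimited
java -jar MyApp.jar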
