Observed an issue in which the system starts throwing "too many open files" after working fine for a few hours.
Observed that there are many TCP connections stuck in the "CLOSE_WAIT" state.
sudo lsof | grep ":http (CLOSE_WAIT)" | wc -l -> 16215.
The number is increasing with time, and in a few hours it will cross the maximum allowed limit.
I also ran a netstat command:
"netstat -ant | awk '{print $6}' | sort | uniq -c | sort -n" and output is -> 122 CLOSE_WAIT.
Why is the output from the netstat command so much lower than from the lsof command? Both return CLOSE_WAIT connections and should give approximately the same value.
Once I know that connections to a specific service are causing this issue, what should I do to identify the exact code where this is happening? I went through the client code for connecting to the service and I don't see any connection leakage.
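A sketch of commands that can help attribute the CLOSE_WAIT sockets to a remote endpoint and an owning process (root is needed for netstat to show the PID column; in lsof, -nP only suppresses name resolution and -p selects a process):
sudo netstat -antp | awk '$6 == "CLOSE_WAIT" {print $5, $7}' | sort | uniq -c | sort -rn
sudo lsof -nP -p <pid> | grep CLOSE_WAIT   # <pid> taken from the previous output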
I'm writing a bash script that will print on the screen all the latest logs from a service that has already died (or still lives, both situations must work). I know its name and don't have to guess.
I'm having difficulty getting, from journalctl, the latest PID of a process that has already died. I'm not talking about this:
journalctl | grep "<processname>"
This will give me all the logs that include processname in their text.
I've also tried:
journalctl | pgrep -f "<processname>"
This command gave me a list of numbers which supposedly should have included the PID of my process. It was not there.
These ideas came from searching for previous questions. I haven't found a question that answers specifically what I asked.
How can I extract the latest PID from journalctl for a specific process?
I figured this out.
First, you must be printing your PID in your logs. It doesn't appear there automatically. Then, you can use grep -E and awk to grab exactly the expression you want from your log:
Var=$(journalctl --since "24 hours ago" | grep -E "\[([0-9]+)\]" | tail -n 1 | awk '{print $5}' | awk -F"[][{}]" '{print $2}')
This one-liner takes the logs from the last 24 hours, uses grep -E to match a bracketed PID, uses tail -n 1 to grab the most recent matching line, and then uses awk twice: once to pick the field containing the PID and once to strip the surrounding brackets.
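If the process runs under a systemd unit, another option (a sketch, assuming a unit named <processname>.service) is to read the PID from the journal's own _PID metadata instead of parsing the message text; -o json emits one entry per line, so the last line is the most recent entry:
journalctl -u "<processname>.service" --since "24 hours ago" -o json | tail -n 1 | grep -oE '"_PID":"[0-9]+"' | grep -oE '[0-9]+'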
I have a bash script with these contents:
#!/bin/bash
while true; do
    netstat -antp | grep LISTEN | tr -s ' ' | cut -d ' ' -f 4 > /tmp/log
    sleep 100
done
Say I create a service which executes the script on boot. But when I use the ps -eo command I'm able to see the commands being executed. For example:
netstat -antp
grep LISTEN
tr -s ' '
cut -d ' ' -f 4
But I wish to suppress this output and hide the execution of these commands. Is there a way to do it?
Any other suggestions are welcome too. Thanks in advance!
You can't hide running processes from the system, at least not without some kernel hooks. Doing so is something typically only found in malware, so you'll not likely get much help.
There's really no reason to hide those processes from the system. If something in your script gets hung up, you'll want to see those processes to give you an idea of what's happening.
If there's a specific problem the presence of those processes is causing, you need to detail what that is.
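If the goal is merely less noise from short-lived helper commands (rather than real concealment, which isn't possible from user space), one untested sketch is to collapse the pipeline into a single awk call so the loop only ever spawns netstat and awk:
#!/bin/bash
while true; do
    # awk replaces grep/tr/cut: print the Local Address column of LISTEN lines
    netstat -antp | awk '/LISTEN/ {print $4}' > /tmp/log
    sleep 100
done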
Why doesn't read -t time out when reading from a pipe on RHEL 5 or RHEL 6?
Here is my example, which doesn't time out on my RHEL boxes while reading from the pipe:
tail -f logfile.log | grep 'something' | read -t 3 variable
If I'm correct, read -t 3 should time out after 3 seconds?
Many thanks in advance.
Chris
GNU bash, version 4.1.2(1)-release (x86_64-redhat-linux-gnu)
The solution given by chepner should work.
An explanation of why your version doesn't is simple: when you construct a pipe like yours, the data flows through the pipe from left to right. When your read times out, however, the programs on the left side keep running until they notice that the pipe is broken, and that happens only when they try to write to the pipe.
A simple example is this:
cat | sleep 5
After five seconds the pipe will be broken because sleep will exit, but cat will nevertheless keep running until you press return.
In your case that means that, until grep produces a result, your command will keep running despite the timeout.
While not a direct answer to your specific question, you will need to run something like
read -t 3 variable < <( tail -f logfile.log | grep "something" )
in order for the newly set value of variable to be visible after the pipeline completes. See if this times out as expected.
Since you are simply using read as a way of exiting the pipeline after a fixed amount of time, you don't have to worry about the scope of variable. However, grep may find a match without printing it within your timeout due to its own internal buffering. You can disable that (with GNU grep, at least), using the --line-buffered option:
tail -f logfile.log | grep --line-buffered "something" | read -t 3
Another option, if available, is the timeout command as a replacement for the read:
timeout 3 tail -f logfile.log | grep -q --line-buffered "something"
Here, we kill tail after 3 seconds, and use the exit status of grep in the usual way.
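As a usage sketch, the pipeline's exit status (which is grep's) makes it easy to branch on whether the pattern appeared within the window:
if timeout 3 tail -f logfile.log | grep -q --line-buffered "something"; then
    echo "found a match within 3 seconds"
else
    echo "timed out without a match"
fi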
I don't have a RHEL server to test your script on right now, but I would bet that read is exiting on timeout and working as it should. Try running:
grep 'something' | strace bash -c "read -t 3 variable"
and you can confirm that.
Is there a way to show all open files by IP address on linux?
I use this:
netstat -atun | awk '{print $5}' | cut -d: -f1 | sed -e '/^$/d' | sort | uniq -c | sort -n
to show all connections from IP sorted by number of connections.
How do I know what these IPs are hitting?
Thanks in advance!
If you can find a way to identify the process that has the socket open in netstat, you could use ls -l /proc/<pid>/fd to see which files that process has open. Of course, a lot of those files may not be accessed from the network; for example, your typical Apache server will have /var/log/httpd/access_log and /var/log/httpd/error_log open, and quite possibly some other files too. And of course, it will be a moment in time: which files that process has open five seconds or a millisecond from now may be quite different.
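As a rough sketch (assuming lsof is installed and you have root), you can go from an IP address seen in the netstat output to the owning process and then to its open file descriptors; the address below is just a placeholder:
sudo lsof -nP -i @203.0.113.10   # list processes with connections to that IP
sudo ls -l /proc/<pid>/fd        # then inspect that process's open files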
I presume you don't just let anything/anyone access the server, so I'm guessing it's a webserver or some such, in which case, it would probably be easier to put some code into your web interface to track who does what.
I'm looking to monitor some aspects of a farm of servers that are necessary for the application that runs on them.
Basically, I'm looking to have a file on each machine which, when accessed via HTTP (on a VLAN) with curl, will spit out the information I'm looking for, which I can then log into the database with a daemon that sits in a loop and checks the health of all the servers one by one.
The info I'm looking to get is:
<load>server load</load>
<free>md0 free space in MB</free>
<total>md0 total space in MB</total>
<processes># of nginx processes</processes>
<time>timestamp</time>
What's the best way of doing that?
EDIT: We are using Cacti and OpenNMS; however, what I'm looking for here is data that is necessary for the application that runs on these servers. I don't want to complicate it by having it rely on any third-party software to fetch this basic data, which can be gotten with a few Linux commands.
Make a cron entry that:
executes a shell script every few minutes (or whatever frequency you want)
saves the output in a directory that's published by the web server
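For example, a crontab entry along these lines would do both (the script path and web root are placeholders for whatever you actually use):
*/5 * * * * /usr/local/bin/health.sh > /var/www/html/health.xml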
Assuming your text is literally what you want, this will get you 90% of the way there:
#!/usr/bin/env bash
# 1-minute load average: fifth colon-separated field of uptime, first value before the comma
LOAD=$(uptime | cut -d: -f5 | cut -d, -f1)
# Free and total space in MB; point df at the md0 mount point instead of / if that is what you want to report
FREE=$(df -m / | tail -1 | awk '{ print $4 }')
TOTAL=$(df -m / | tail -1 | awk '{ print $2 }')
# Number of nginx processes; the [n] trick keeps grep from matching itself
PROCESSES=$(ps aux | grep "[n]ginx" | wc -l)
TIME=$(date)
cat <<-EOF
<load>$LOAD</load>
<free>$FREE</free>
<total>$TOTAL</total>
<processes>$PROCESSES</processes>
<time>$TIME</time>
EOF
Sample output:
<load> 0.05</load>
<free>9988</free>
<total>13845</total>
<processes>6</processes>
<time>Wed Apr 18 22:14:35 CDT 2012</time>
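The monitoring daemon can then pull each server's file over the VLAN with curl, for example (the URL is a placeholder):
curl -s http://10.1.2.3/health.xml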