Is there a way to list the contents of the varnish cache storage? Also, it would be nice to somehow list the most frequent cache hits.
I found a way to see the most frequent cache misses by listing what is being sent to the backend with:
varnishtop -b -i TxURL
It would be very useful to see what my top cache-hit URLs are.
Edit: I am using version: varnish-3.0.3 revision 9e6a70f
I think this can help you:
You can use the "Varnish:hitmiss" field of varnishncsa.
First capture a sample of logs with:
varnishncsa -F '%U%q %{Varnish:hitmiss}x' -n NAME -w /path/requests.logs
and then:
sort -k 1 /path/requests.logs | uniq -c | sort -k 1 -n -r | head -25
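If you only want to count the hits, you can filter on the hit/miss flag first; a small variation on the same pipeline, using the same sample file:
# the hit/miss flag is the last field of each line in requests.logs
awk '$NF == "hit" { print $1 }' /path/requests.logs | sort | uniq -c | sort -k 1 -n -r | head -25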
This feature is not included in Varnish itself, but you can easily add some scripting to get it.
The first thing you need is to run varnishncsa as a service, writing its output to a (daily) log file.
Then add at least %{Varnish:hitmiss}x and %U to the output format (see the varnishncsa documentation), for instance:
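One possible way to run it as a daemon writing to a file, with %{Varnish:hitmiss}x as the first column and %U as the second (the paths, pid file and instance name below are assumptions; check the varnishncsa man page for your version):
# -D daemonizes, -P writes a pid file, -a appends to the file given with -w
varnishncsa -n NAME -D -a -P /var/run/varnishncsa.pid \
    -w /var/log/varnish/varnishncsa.log \
    -F '%{Varnish:hitmiss}x %U %h %t "%r" %s %b'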
Finally, write a small script to compute your top URLs, for example something like this:
# assuming %{Varnish:hitmiss}x is the first column and %U the second
awk '$1 == "hit" { arr[$2]++ } END { for (k in arr) print arr[k] ";" k }' varnishncsa.log | sort -k 1 -nr | head
Feel free to adapt it to your specific needs.
I want to create a bash script which will take entries from a text file and process them as required.
I am trying to find out the last login time for each of several mail accounts. I have a text file, email.txt, that contains all the email addresses line by line, and I need to check the last login time for each of these accounts using the command below:
cat /var/log/maillog | grep 'mail1#address.com' | grep 'Login' | tail -1 > result.txt
So the result would be inside result.txt
Thanks,
I think you are looking for something like this.
while read email_id ; do grep "$email_id" /var/log/maillog | grep 'Login' | tail -1 >> result.txt ; done < email.txt
To handle all the email addresses in one pass (rather than repeatedly scanning what is probably a very large log file):
grep -f file_with_email_addrs /var/log/maillog
Whether this is worthwhile depends on the size of the log file and the number of e-mail addresses you are checking. A small log file or a small number of users will probably make this moot, but a big log file and a lot of users make it a good first step. It will, however, require some extra processing afterwards.
I am unsure about the format of the log lines or the relative distribution of Logins in the file, so what happens next depends on more information.
# If logins are rare?
grep Login /var/log/maillog | grep -f file_with_email_addrs | sort -u -k <column of email>
# If user count is small
grep -f file_with_email_addrs /var/log/maillog | awk '/Login/ {S[$<column of email>] = $0;} END {for (loop in S) {printf("%s\n",S[loop]);}}'
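As a concrete illustration, here is a one-pass sketch that keeps only the last Login line seen for each address listed in email.txt; it assumes the address appears verbatim somewhere in the log line:
# first input: the address list; second input: the log file
awk 'NR == FNR { want[$1]; next }
     /Login/   { for (e in want) if (index($0, e)) last[e] = $0 }
     END       { for (e in last) print last[e] }' email.txt /var/log/maillog > result.txt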
Your question is unclear: do you want to look up the last login for each mail address read from a file?
If so:
#!/bin/bash
while read mail_address1
do
    # Your command line here
    cat /var/log/maillog | grep "$mail_address1" | grep 'Login' | tail -1 >> result.txt
done < file_with_email_addrs
PS: I've changed the answer to reflect the suggestions (extremely valid ones) in the comments that advise against using a for i in $(cat file) style loop to loop through a file.
I want to have a real-time copy of /var/log/apache2/access.log so I can grep it, do hostname resolution, etc.
What's the best way to do this?
I am curious to see what kind of traffic is passing by
You could:
configure Apache to send its logs via syslog, then configure syslog to write separate log files (with a specific owner). Take a look at O'Reilly: Sending Apache httpd Logs to Syslog (a minimal configuration sketch follows at the end of this answer).
use tail -f, but make sure the commands that follow it are line-buffered so you see events immediately:
tail -f /var/log/apache2/access.log | grep --line-buffered "something" or
tail -f /var/log/apache2/access.log | sed -une "/something/p"
implement the tail -f | grep in Perl or Python (Perl is a good choice for grepping log files).
This sample is copied from perlfaq5; it assumes the log file has already been opened on the filehandle GWFILE:
for (;;) {
    for ($curpos = tell(GWFILE); <GWFILE>; $curpos = tell(GWFILE)) {
        # search for some stuff and put it into files
    }
    # sleep for a while
    seek(GWFILE, $curpos, 0);  # seek to where we had been
}
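For the first (syslog) option, a minimal configuration sketch; the facility, tag and file paths below are assumptions, so see the O'Reilly article for the full picture:
# Apache: pipe access log lines to syslog via logger
CustomLog "|/usr/bin/logger -t apache -p local6.info" combined
# rsyslog: write that facility to its own file
local6.*    /var/log/apache2/access_syslog.log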
Do this:
#Customize as appropriate:
tail -f /var/log/apache2/access.log | cut -f 1 -d ' ' &
tail -f /var/log/apache2/access.log | grep foo &
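Since you also mentioned hostname resolution: Apache ships a logresolve utility (packaged as apache2-utils on Debian/Ubuntu) that reads log lines on stdin and resolves the client IP in the first field, so a sketch could be:
# resolve client IP addresses to hostnames as lines arrive
tail -f /var/log/apache2/access.log | logresolve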
I have a page exported from a wiki and I would like to find all the links on that page using bash. All the links on that page are in the form [wiki:<page_name>]. I have a script that does:
...
# First search for the links to the pages
search=`grep '\[wiki:' pages/*`
# Check is our search turned up anything
if [ -n "$search" ]; then
# Now, we want to cut out the page name and find unique listings
uniquePages=`echo "$search" | cut -d'[' -f 2 | cut -d']' -f 1 | cut -d':' -f2 | cut -d' ' -f 1 | sort -u`
....
However, when presented with a grep result that has multiple [wiki: entries in it, it only pulls the last one and not any of the others. For example, if $search is:
Before starting the configuration, all the required libraries must be installed to be detected by Cmake. If you have missed this step, see the [wiki:CT/Checklist/Libraries "Libr By pressing [t] you can switch to advanced mode screen with more details. The 5 pages are available [wiki:CT/Checklist/Cmake/advanced_mode here]. To obtain information about ea - '''Installation of Cantera''': If Cantera has not been correctly installed or if you do not have sourced the setup file '''~/setup_cantera''' you should receive the following message. Refer to the [wiki:CT/FormulationCantera "Cantera installation"] page to fix this problem. You can set the Cantera options to OFF if you plan to use built-in transport, thermodynamics and chemistry.
then it only returns CT/FormulationCantera and doesn't give me any of the other links. I know this is due to using cut, so I need a replacement for the $uniquePages line.
Does anybody have any suggestions in bash? It can use sed or perl if needed, but I'm hoping for a one-liner to extract a list of page names if at all possible.
egrep -o '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//' | sort -u
Update: to strip everything after a space without using cut:
egrep -o '\[wiki:[^]]*]' pages/* | sed 's/\[wiki://;s/]//;s/ .*//' | sort -u
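If you prefer a single tool, a Perl one-liner does much the same job; it handles several links per line, stops the page name at a space or the closing bracket, and does not prefix file names when given several input files:
perl -nle 'print $1 while /\[wiki:([^\]\s]+)/g' pages/* | sort -u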
I'm looking to monitor, across a farm of servers, some aspects that are necessary for the application that runs on them.
Basically, I'm looking to have a file on each machine which, when accessed via HTTP (on a VLAN) with curl, will spit out the information I'm looking for, which I can then log into the database with a daemon that sits in a loop and checks the health of all the servers one by one.
The info I'm looking to get is:
<load>server load</load>
<free>md0 free space in MB</free>
<total>md0 total space in MB</total>
<processes># of nginx processes</processes>
<time>timestamp</time>
What's the best way of doing that?
EDIT: We are using Cacti and OpenNMS; however, what I'm looking for here is data that is necessary for the application that runs on these servers. I don't want to complicate it by relying on any third-party software to fetch basic data that can be gotten with a few Linux commands.
Make a cron entry that:
executes a shell script every few minutes (or whatever frequency you want)
saves the output in a directory that's published by the web server
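For example, a hypothetical crontab line (the script path, schedule and document root are assumptions):
# run the health script every 5 minutes and publish the result under the web root
*/5 * * * * /usr/local/bin/server-health.sh > /var/www/html/health.txt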
Assuming your text is literally what you want, this will get you 90% of the way there:
#!/usr/bin/env bash
LOAD=$(uptime | cut -d: -f5 | cut -d, -f1)
FREE=$(df -m / | tail -1 | awk '{ print $4 }')
TOTAL=$(df -m / | tail -1 | awk '{ print $2 }')
PROCESSES=$(ps aux | grep [n]ginx | wc -l)
TIME=$(date)
cat <<-EOF
<load>$LOAD</load>
<free>$FREE</free>
<total>$TOTAL</total>
<processes>$PROCESSES</processes>
<time>$TIME</time>
EOF
Sample output:
<load> 0.05</load>
<free>9988</free>
<total>13845</total>
<processes>6</processes>
<time>Wed Apr 18 22:14:35 CDT 2012</time>
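On the machine running your daemon, collecting the data then comes down to fetching that file from each server; the host names and paths below are assumptions:
# poll each server on the VLAN and append the reply to a collection log
for host in web01 web02 web03; do
    curl -s "http://$host/health.txt" >> /var/log/health-collector.log
done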
Someone told me to do this in order to keep track of the latest people hitting my server:
tail -f access.log
However, this also shows all the "includes": the JS files, the graphics, etc. What if I just want to see the pages that people hit? How do I filter that using tail -f?
You can pipe the output through grep or awk. For example, if all your pages have .php in the URL, you can try the following:
tail -f access.log | grep '\.php'
If your access logs include a referer field, the above will also match requests for images and other resources whose referer contains .php. We're only interested in .php in the request field, not in the referer field; with awk we can distinguish between the two.
tail -f access.log | awk '$7 ~ /\.php/ { print }'
You may need to adjust $7 if your log format is unusual.
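If you would rather hide the static assets than match one extension, you can invert the test; the list of extensions here is only an assumption about your site:
# drop requests for common static files, show everything else as it arrives
tail -f access.log | awk '$7 !~ /\.(css|js|png|jpe?g|gif|ico)([?]|$)/'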
if you're serving .php files:
tail -f access_log | grep ".php"
alternatively, if all your includes are in a folder named "include", for example:
tail -f access_log | grep -v "include"
or if you want to count hits to a certain file:
grep "filename" access_log -c