Collecting Varnish stats at the URL level - varnish

We are using Varnish to cache different URLs like:
/discovery/v1/search
/discovery/v1/suggest
/discovery/v1/recommend
/orders/ordersearch
Right now we are able to get Varnish stats like cache_hit and cache_miss for Varnish as a whole. Is there a way we can get the stats at the URL level, i.e. cache_hit, cache_miss, etc. for /discovery/v1/search and /discovery/v1/suggest separately?
Environment:
varnishd (varnish-4.1.0 revision 3041728)
Debian 8 (x86_64)

I would run varnishncsa if you are not already. Add this field to the DAEMON_OPTS (or command-line arguments) format string:
%{Varnish:handling}x
I changed the following single line in the init.d script for varnishncsa:
DAEMON_OPTS="-a -w $logfile -D -P $pidfile -F \"%h %l %u %t %r %s %b %D %{VCL_Log:Referer}x [%{X-Forwarded-For}i] %{Varnish:handling}x\""
This will give you "hit", "miss" or "pass" in the handling field of each line.
Then I would post-process the log file when logrotate runs.
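As a minimal sketch of that post-processing (assuming the exact format string above, where the URL is field 7 of each line and the handling verdict is the last field; the log path is hypothetical):
awk '{
    split($7, req, "?")           # field 7 is the URL part of %r; drop any query string
    counts[req[1] " " $NF]++      # last field is the handling verdict: hit, miss or pass
} END {
    for (k in counts) print counts[k], k
}' /var/log/varnish/varnishncsa.log | sort -rn
This prints one line per path/verdict pair, e.g. "1234 /discovery/v1/search hit".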

Related

How can I see in Varnish which agents (browsers or bot agents) request a certain URL

I have 2 boxes:
the first is the frontend - with nginx + Varnish
the second is the backend - Apache
How can I see, for a certain URL, the agents that request (hit) it in Varnish?
you want varnishncsa:
varnishncsa -q 'ReqURL eq "/whatever/url/you/want"' -F "%{User-agent}i"
this will give you real-time output; if you just want to see whatever you have in the backlog, add -d:
varnishncsa -q 'ReqURL eq "/whatever/url/you/want"' -F "%{User-agent}i" -d
If in doubt, man varnishncsa has all the info about the format string, and you can look at man vsl and man vsl-query for more info about the -q argument.
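To get a quick tally instead of raw lines, you can pipe the same command through sort and uniq (the URL is still a placeholder):
varnishncsa -d -q 'ReqURL eq "/whatever/url/you/want"' -F "%{User-agent}i" | sort | uniq -c | sort -rn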

How to see all Request URLs the server is doing (final URLs)

How can I list, from the command line, the URL requests that are made from the server (a *nix machine) to another machine?
For instance, I am on the command line of server ALPHA_RE .
I do a ping to google.co.uk and another ping to bbc.co.uk.
I would like to see, from the prompt:
google.co.uk
bbc.co.uk
so, not the IP address of the machine I am pinging, and NOT a URL from servers that pass my request on to google.co.uk or bbc.co.uk, but the actual final URLs.
Note that only packages available in the normal Ubuntu repositories can be used, and it has to work from the command line.
Edit
The ultimate goal is to see what API URLs a PHP script (run by a cron job) requests, and what API URLs the server requests 'live'.
These mainly make GET and POST requests to several URLs, and I am interested in knowing the params:
Does it make requests to:
foobar.com/api/whatisthere?and=what&is=there&too=yeah
or to:
foobar.com/api/whatisthathere?is=it&foo=bar&green=yeah
And do the cron jobs or the server make any other GET or POST requests?
And that, regardless of what response (if any) these APIs give.
Also, the API list is unknown - so you cannot grep for one particular URL.
Edit:
(The OLD ticket specified: Note that I cannot install anything on that server (no extra packages; I can only use the "normal" commands like tcpdump, sed, grep, ...) // but as getting this information with tcpdump is pretty hard, I have since made installing packages possible.)
You can use tcpdump and grep to get info about network traffic from the host; the following command line should get you all lines containing Host:
tcpdump -i any -A -vv -s 0 | grep -e "Host:"
If I run the above in one shell and start a Links session to Stack Overflow, I see:
Host: www.stackoverflow.com
Host: stackoverflow.com
If you want to know more about the actual HTTP request you can also add expressions to the grep for GET, PUT or POST requests (e.g. -e "GET"), which can get you some info about the relative URL (this should be combined with the host determined earlier to get the full URL).
EDIT:
based on your edited question I have tried to make some modifications:
first, a tcpdump approach:
[root@localhost ~]# tcpdump -i any -A -vv -s 0 | egrep -e "GET" -e "POST" -e "Host:"
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
E..v.[#.#.......h.$....P....Ga .P.9.=...GET / HTTP/1.1
Host: stackoverflow.com
E....x#.#..7....h.$....P....Ga.mP...>;..GET /search?q=tcpdump HTTP/1.1
Host: stackoverflow.com
And an ngrep one:
[root@localhost ~]# ngrep -d any -vv -w byline | egrep -e "Host:" -e "GET" -e "POST"
^[[B GET //meta.stackoverflow.com HTTP/1.1..Host: stackoverflow.com..User-Agent:
GET //search?q=tcpdump HTTP/1.1..Host: stackoverflow.com..User-Agent: Links
My test case was running links stackoverflow.com, putting tcpdump in the search field and hitting enter.
This gets you all URL info on one line. A nicer alternative might be to simply run a reverse proxy (e.g. nginx) on your own server, modify the hosts file (as shown in Adam's answer), and have the reverse proxy forward all queries to the actual host; you can then use the logging features of the reverse proxy to get the URLs, and those logs would probably be a bit easier to read.
EDIT 2:
If you use a command line such as:
ngrep -d any -vv -w byline | egrep -e "Host:" -e "GET" -e "POST" --line-buffered | perl -lne 'print $3.$2 if /(GET|POST) (.+?) HTTP\/1\.1\.\.Host: (.+?)\.\./'
you should see the actual URLs.
A simple solution is to modify your '/etc/hosts' file to intercept the API calls and redirect them to your own web server (note the hosts file format is IP address first, then hostname):
127.0.0.1 api.foobar.com
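With that entry in place, any simple local web server will show you the request lines as they arrive; a sketch, assuming Python 3 is available (this only works for plain HTTP, not HTTPS):
sudo python3 -m http.server 80
Each intercepted call is then logged to the terminal as a line containing the method, path and params, e.g. "GET /api/whatisthere?and=what&is=there HTTP/1.1".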

logstash file input failing to read file

I've been scratching my head over this for hours, and I'm getting kind of frustrated. I'm new to logstash, so I might be doing something wrong, but after a few hours working on this, I can't figure out what. I configured both agent and server using the chef-logstash cookbook.
I have two systems set up, an agent and a server. The agent reads files, filters them, then ships them off to the redis instance on the server. The server grabs incoming entries from redis and indexes them in elasticsearch (using embedded).
Here's my problem: I can use a simple config like the one below, type input on stdin, and everything ships off to the server just fine.
input { stdin { } }
output {
  redis {
    host => "192.168.33.11"
    data_type => "list"
    key => "logstash"
    codec => json
  }
  stdout { codec => rubydebug }
}
Everything gets picked up properly by the logstash running on my server (in Vagrant), the events get indexed, and I can see them in Kibana.
The agent is another story. My agent is started with 3 config files: input_file_nginx.conf, output_stdout.conf, and output_redis.conf. I found that the logs weren't getting to the redis on my server, so I tried to narrow it down. It was when I looked at the logs on my agent that I got really confused. As far as I could tell, nothing was getting read. Either that, or my output_stdout.conf is messed up.
Here's my input_file_nginx.conf
input {
  file {
    path => "/home/silkstart/logs/*.log"
    type => "nginx"
  }
}
For reference, the two files in there are nginx.silkstart.80.access.log and nginx.silkstart.80.error.log, which both have 644 permissions, so they should be readable.
And my output_stdout.conf
output {
  stdout {
    codec => rubydebug
  }
}
These were all generated using logstash_config from some ERB templates.
My instance came almost verbatim from the agent.rb example
logstash_service name do
  action [:enable]
  method "runit"
end
Here's the resulting config
#!/bin/sh
cd //opt/logstash/agent
exec 2>&1
# Need to set LOGSTASH_HOME and HOME so sincedb will work
LOGSTASH_HOME="/opt/logstash/agent"
GC_OPTS=""
JAVA_OPTS="-server -Xms198M -Xmx596M -Djava.io.tmpdir=$LOGSTASH_HOME/tmp/ "
LOGSTASH_OPTS="agent -f $LOGSTASH_HOME/etc/conf.d"
LOGSTASH_OPTS="$LOGSTASH_OPTS --pluginpath $LOGSTASH_HOME/lib"
LOGSTASH_OPTS="$LOGSTASH_OPTS -vv"
LOGSTASH_OPTS="$LOGSTASH_OPTS -l $LOGSTASH_HOME/log/logstash.log"
export LOGSTASH_OPTS="$LOGSTASH_OPTS -w 1"
HOME=$LOGSTASH_HOME exec chpst -u logstash:logstash $LOGSTASH_HOME/bin/logstash $LOGSTASH_OPTS
This is fairly similar to my server config, which works
#!/bin/sh
ulimit -Hn 65550
ulimit -Sn 65550
cd //opt/logstash/server
exec 2>&1
# Need to set LOGSTASH_HOME and HOME so sincedb will work
LOGSTASH_HOME="/opt/logstash/server"
GC_OPTS=""
JAVA_OPTS="-server -Xms1024M -Xmx218M -Djava.io.tmpdir=$LOGSTASH_HOME/tmp/ "
LOGSTASH_OPTS="agent -f $LOGSTASH_HOME/etc/conf.d"
LOGSTASH_OPTS="$LOGSTASH_OPTS --pluginpath $LOGSTASH_HOME/lib"
LOGSTASH_OPTS="$LOGSTASH_OPTS -l $LOGSTASH_HOME/log/logstash.log"
export LOGSTASH_OPTS="$LOGSTASH_OPTS -w 1"
HOME=$LOGSTASH_HOME exec chpst -u logstash:logstash $LOGSTASH_HOME/bin/logstash $LOGSTASH_OPTS
The only difference I can see here is
ulimit -Hn 65550
ulimit -Sn 65550
but I don't see why leaving those out should stop anything from working. They increase the number of file descriptors, but the default of 4096 should be plenty.
When I make some requests to the server to make sure the log has new stuff and then check the runit logs, they only point me to /opt/logstash/agent/log/logstash.log, the contents of which I have pasted at https://gist.github.com/jrstarke/384f192abdd93c0acf2a.
To really throw a wrench in things, if I sudo su logstash and run bin/logstash -f etc/conf.d from the command line, everything works as expected.
Any help would be greatly appreciated.
I managed to figure this out. For anyone else that's facing a similar issue, you will want to check your permissions on the files you're trying to access.
If you're accessing files that you have access to through group permissions, you're likely facing the same issue I did.
Look closely at this line
exec chpst -u logstash:logstash
What this tells us is that we want to run the program as user logstash, with the group permissions of group logstash. In my case, the group I needed was a supplementary group. The docs for chpst note that
If group consists of a colon-separated list of group names, chpst sets the group ids of all listed groups.
So if I wanted to run the program as user1 with both group1 and group2, that command would become
exec chpst -u user1:group1:group2
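Applied to my runit script above, the fix is a one-line change; a sketch, where adm stands in for whichever group actually owns the log files:
HOME=$LOGSTASH_HOME exec chpst -u logstash:logstash:adm $LOGSTASH_HOME/bin/logstash $LOGSTASH_OPTS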
I hope this helps anyone else that is running into the same issue I did.

Parallel SSH with Custom Parameters to Each Host

There are plenty of threads and documentation about parallel ssh, but I can't find anything on passing custom parameters to each host. Using pssh as an example, the hosts file is defined as:
111.111.111.111
222.222.222.222
However, I want to pass custom parameters to each host via a shell script, like this:
111.111.111.111 param1a param1b ...
222.222.222.222 param2a param2b ...
Or, better, the hosts and parameters would be split between 2 files.
Because this isn't common, is this a misuse of parallel ssh? Should I just create many ssh processes from my script? How should I approach this?
You could use GNU parallel.
Suppose you have a file argfile:
111.111.111.111 param1a param1b ...
222.222.222.222 param2a param2b ...
Then running
parallel --colsep ' ' ssh {1} prog {2} {3} ... :::: argfile
would run prog on each host with the corresponding parameters. It is important that the number of parameters be the same for each host.
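If the parameter count does vary per host, a plain while-read loop over the same argfile gives you the fan-out without GNU parallel (a minimal sketch; prog and argfile are the placeholders from above):
while read -r host params; do
    ssh "$host" prog $params &    # $params is deliberately unquoted so it splits into separate arguments
done < argfile
wait    # block until every background ssh has finished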
Here is a solution that you can use, after tailoring it to suit your needs:
#!/bin/bash
#filename: example.sh
#usage: ./example.sh <par1> <par2> <par3> ... <par6>
#set your ip addresses
traf1=1.1.1.1
traf2=2.2.2.2
traf3=3.3.3.3
#set some custom parameters for your scripts and use them as you wish.
#In this example, I use the first 6 command line parameters passed when running example.sh
ssh -T $traf1 -l username "/export/home/path/to/script.sh $1 $2" 1>traf1.txt 2>/dev/null &
echo "Fetching data from traffic server 2..."
ssh -T $traf2 -l username "/export/home/path/to/script.sh $3 $4" 1> traf2.txt 2>/dev/null &
echo "Fetching data from traffic server 3..."
ssh -T $traf3 -l username "/export/home/path/to/script.sh $5 $6" 1> traf3.txt 2>/dev/null &
#your script will block on the following line, and will only continue
#once all 3 remotely executed scripts have completed
wait
Keep in mind that the above requires that you set up passwordless login between the machines; otherwise the solution will break and prompt for password input.
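Setting that up is a one-time step per host, e.g.:
ssh-keygen -t ed25519    # generate a key pair, accepting the defaults
ssh-copy-id username@1.1.1.1    # install the public key on each traffic server
ssh-copy-id username@2.2.2.2
ssh-copy-id username@3.3.3.3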
If you can use Perl:
use Net::OpenSSH::Parallel;
use Data::Dumper;
my $pssh = Net::OpenSSH::Parallel->new;
$pssh->add_host('111.111.111.111');
$pssh->add_host('222.222.222.222');
# $cmd and the $paramNN variables are placeholders for your command and its arguments
$pssh->push('111.111.111.111', $cmd, $param11, $param12);
$pssh->push('222.222.222.222', $cmd, $param21, $param22);
$pssh->run;
if (my %errors = $pssh->get_errors) {
    print STDERR "ssh errors:\n", Dumper \%errors;
}

RRD print the timestamp of the last valid data

I have an RRD database storing ping responses from a wide range of network equipment.
How can I print on the graph the timestamp of the last valid entry in the RRD database, so that when a host is down I can see when it went down?
I use the following to create the RRD file:
rrdtool create terminal_1.rrd -s 60 \
DS:ping:GAUGE:120:0:65535 \
RRA:AVERAGE:0.5:1:2880
Use the lastupdate option of rrdtool.
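For example, this prints the time and value of the most recent update:
rrdtool lastupdate terminal_1.rrd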
Another solution exists if you only have one file per host: don't update your RRD if the host is down. You can then see the last update time with a plain ls or stat, as in:
ls -l terminal_1.rrd
stat --format %Y terminal_1.rrd
If you plan to use the RRD caching daemon (rrdcached), you have to use the last command so that pending updates are flushed first:
rrdtool last terminal_1.rrd
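If the daemon is listening on a socket, point the command at it; the socket path here is an assumption, so use whatever address your rrdcached was started with:
rrdtool last --daemon unix:/var/run/rrdcached.sock terminal_1.rrd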
