I have configured Varnish for my website and its working as site performance is excellent but does anybody know that there is any Varnish monitoring tool to see what content are getting stored in Varnish and what are hits & misses something similar to Memcached or APC monitoring tool.
Thanks,
-Nitin
The varnishhist tool is also great for monitoring hits vs. misses and response time simultaneously. This article gives a good overview of how to interpret the output since the manpage is pretty sparse.
To monitor Varnish, the most useful of the available tools directly available in Varnish is the command line varnishstat which gives you a detailed snapshot of Varnish’s current performance. It provides access to in-memory statistics such as cache hits and misses, resource consumption, threads created, and more.
If you add the -1 flag, varnishstat will exit after printing the list one time.
To list specific metrics values, pass them with the -f flag:
e.g. varnishstat -f field1,field2,field3
For example: varnishstat -f MAIN.cache_hit will display a continuously updating count of the cache hits.
Here is a series of articles with more details on how to monitor Varnish, what are the key metrics to watch, how to collect the metrics you want, etc...
There are plugins for virtually any monitoring tool available. For instance Munin, Nagios or New Relic can easily be setup with varnish plugins. If you want to monitor it in realtime on the command line you have the varnishstat command that comes with varnish.
Related
I have a requirement to motiror what was the CPU usage and memory usage of the system when perticular request came.
Is it possible using IIS logs or any other method/tool to do so?
We dont want the usage of IIS process we want the usage of whole system at that time.
You can use windows performance monitor to record cpu and memory usage (using data collector sets). Then, you can check in your IIS logs at what time the request in question came in and look up the recorded data in the performance monitor data collector set.
I don't think there is a tool which automatically combines the IIS log with system performance data. There are tools which include IIS monitoring, but those usually won't break reports down to a single request. If you want to do some further research you can use my list of 40 windows server performance monitoring tools as a starting point.
I want to create a tool, with which we can administer the server.There are two questions with in this question:
To administer access/hit rate of a server. That is to calculate how many times the server has been accessed from a particular time period and then may be generate some kind of graph to demonstrate the load at a particular time on a particular day.
However i don't have any idea, how i can gather these information.
A pretty vague idea is to
use a watch over access log(in case of apache) and then count the number of times the notification occurs and note down the time simultaneously
Parse access.log file every time and then generate the output(but access.log file can be very big, so not sure about this idea)
I am familiar with apache and hence the above idea is based on apache's access log and i don't have idea about other like nginx etc.
Hence i would like to know, if I can use the above procedure or is there any other way possible.
I would like to know when the server is reaching its limit. The idea of using top and then show the live result of cpu usage and ram usage via CPP
To monitor a web server the easiest way is probably to use some existing tool like webalizer.
http://www.webalizer.org/
To monitor other things like CPU and memory usage I would suggest snmpd together with some other tool like mrtg. http://oss.oetiker.ch/mrtg/
If you think that webalizer does not sample data often enough with its hourly statistics but the sample time of mrtg with 5 minutes would be better it is also possible to provide more data with snmpd by writing an snmpd extension. Such an extension could parse the apache log file with a rather small amount of code and give you all the graphing functionality for free from mrtg or some other tool processing snmp data.
I have setup graphite and statsd on a specific machine that will be dedicated for stats. Now, if I would like to connect my application servers to provide stats - what would be the best way?
I know that carbon does this for the stats machine already, but what do I do on the appservers that doesn't have graphite installed?
What I am looking for is to store load, disk usage and memory free/used.
running collectd (http://collectd.org/) with a graphite agent (https://github.com/indygreg/collectd-carbon) would be an excellent start to gather the information you're after.
There is an almost unlimited amount of ways to get your data into graphite.
You can find a list of tools that have known to work very well with graphite on the readthedocs.org page: http://graphite.readthedocs.org/en/0.9.10/tools.html
What solutions do you have in place for handling bandwidth billing for your vhosts on a shared environment in apache? If you are using log parsing, does your solution scale well when the logs become very very large? Anyone using any sort of module out there for this?
There exist certain modules for Apache 1.x and 2.x that will allow you to set a maximum on the transfer amount, most of them keep track using the scoreboard file that Apache generates (when mod_status is enabled with ExtendedStatus on). The one I still have bookmarked from when I was looking for one is mod_curb, however it is not complete and at the current moment in time looks to only work on a server-wide scale and not for individual virtual hosts.
Apache modules can be set to be outbound filters, so you could write a costume module that would sit at the end of the chain, and add up all the outgoing packets, using the data that APR provides you can then add it to a counter for that specific domain/sub-domain. After that you have a choice of what to do with the data.
For specific examples, take a look at mod_deflate that Apache provides, to see how it sits at the end of the chain and compresses everything but the headers the server sends out. This should give you a good start.
As for log based processing, it becomes slower the more logs exist. This is just the nature of the beast. When we were using a log based solution we had a custom perl script that ran every 15 minutes. Eventually it would take longer than 15 minutes to parse, and since we had proper locking after a while multiple of these log processing perl scripts were now running, all waiting on each other. We ended up re-writing it with a simple call to tail -F, which then let perl parse each and every request as it came in, while not entirely efficient, it worked. The upside of that was that we were now able to update traffic statistics in near realtime so that clients were updated sooner rather than later if they went over their limits.
You could go the poor man's route, and use Webalizer or Awstats. Both of these will give you an idea of traffic based off of access logs, and can be done on a per virtual host basis. In the case of Awstats, I know once you start doing 10GB+ of traffic daily, it starts to consume resources. You can always nice it, but then you'll get your data next week, rather than when you actually need it. In the past with Webalizer I've had to use some hackery to get it to handle large access logs, by chunking up the logs to smaller pieces that it could manage. It didn't provide as many useful metrics from what I've done with it, but I've also never needed to save a server from it :)
If virtual host does not have own IP, there is no easier way than logfile parsing. Just use mod_logio to calculate actual bytes transferred. mod_logio handles broken connections, compressed data etc. correctly. You should be able to parse logs realtime using piped logs. Use BufferedLogs to scale further (just check that parser handles lines broken when buffered correctly). Parser should save data periodically (like every minute) somewhere, just avoid locking issues as parsing must not slow down httpd. If httpd connections is spending time in L-state at server-status, you are too slow. After you have numbers, you can sum then further and then save data to billing system.
If you save billing logs as file too you can correct and doublecheck realtime traffic calculations. If you boot httpd you can end up missing some lines. But generally losing couple hundred requests is acceptable as it less than seconds worth on a high volume site.
There is modules that try to handle and limit bandwidth, like mod_cband and mod_bw. But they don't work when you have same vhost on multiple machines. I guess they would work ok on smaller scale.
If you have IP per vhost you could try IP based methods like feeding firewall logs to traffic calculator. Simple way is to use iptables.
Although we use IIS rather than apache we do use log file analysis for bandwidth billing (and bandwidth profiling / analysis). We use a custom application to load data collected in the log files in one hour increments, and act upon any required notifications or bandwidth overuse.
The log file loader runs as a low priority process, so as not to interupt operation of the server. Even on high usage servers with a large number of sites, processing takes less than 15 minutes, so we don't see scalability as a problem with this methodology.
There may be better ways of doing this, but this is perfectly adequate for what we need. I look forward to viewing the other responses.
It can be easily achieved with mod_cband. We've rewritten the module to fix a few bugs, provide true redundancy on restarts and incorporate FTP and Mail statistics.
http://www.howtoforge.com/mod_cband_apache2_bandwidth_quota_throttling
Well mod_cband would be great, except for when i'm using it, the max_connections (the overall, total value for every client combined), decides to crawl upwards until it hits the max value i've set. when it does reach the highest value, it just stays there and leaves all my clients receiving a constant "503 Service Temporarily Unavailable" error.
for example, i set "CbandSpeed 1000Mbps 500 1200", and the server connections crawls up to 1200 in about 8 hrs, then stays there. at this point, i count the total number of connections under Remote Clients in the mod_cband status window, and i see around 50. i've also used ps aux and i see around the same amount (~50) open http processes, which is normal, except for the fact that nobody can access the site at all because of the 503 errors.
Any ideas what could be wrong, or can this be fixed?
When running any kind of server under load there are several resources that one would like to monitor to make sure that the server is healthy. This is specifically true when testing the system under load.
Some examples for this would be CPU utilization, memory usage, and perhaps disk space.
What other resource should I be monitoring, and what tools are available to do so?
As many as you can afford to, and can then graph/understand/look at the results. Monitoring resources is useful for not only capacity planning, but anomaly detection, and anomaly detection significantly helps your ability to detect security events.
You have a decent start with your basic graphs. I'd want to also monitor the number of threads, number of connections, network I/O, disk I/O, page faults (arguably this is related to memory usage), context switches.
I really like munin for graphing things related to hosts.
I use Zabbix extensively in production, which comes with a stack of useful defaults. Some examples of the sorts of things we've configured it to monitor:
Network usage
CPU usage (% user,system,nice times)
Load averages (1m, 5m, 15m)
RAM usage (real, swap, shm)
Disc throughput
Active connections (by port number)
Number of processes (by process type)
Ping time from remote location
Time to SSL certificate expiry
MySQL internals (query cache usage, num temporary tables in RAM and on disc, etc)
Anything you can monitor with Zabbix, you can also attach triggers to - so it can restart failed services; or page you to alert about problems.
Collect the data now, before performance becomes an issue. When it does, you'll be glad of the historical baselines, and the fact you'll be able to show what date and time problems started happening for when you need to hunt down and punish exactly which developer made bad changes :)
I ended up using dstat which is vmstat's nicer looking cousin.
This will show most everything you need to know about a machine's health,
including:
CPU
Disk
Memory
Network
Swap
"df -h" to make sure that no partition runs full which can lead to all kinds of funky problems, watching the syslog is of course also useful, for that I recommend installing "logwatch" (Logwatch Website) on your server which sends you an email if weird things start showing up in your syslog.
Cacti is a good web-based monitoring/graphing solution. Very complete, very easy to use, with a large userbase including many large Enterprise-level installations.
If you want more 'alerting' and less 'graphing', check out nagios.
As for 'what to monitor', you want to monitor systems at both the system and application level, so yes: network/memory/disk i/o, interrupts and such over the system level. The application level gets more specific, so a webserver might measure hits/second, errors/second (non-200 responses), etc and a database might measure queries/second, average query fulfillment time, etc.
Beware the afore-mentioned slowquerylog in mysql. It should only be used when trying to figure out why some queries are slow. It has the side-effect of making ALL your queries slow while it's enabled. :P It's intended for debugging, not logging.
Think 'passive monitoring' whenever possible. For instance, sniff the network traffic rather than monitor it from your server -- have another machine watch the packets fly back and forth and record statistics about them.
(By the way, that's one of my favorites -- if you watch connections being established and note when they end, you can find a lot of data about slow queries or slow anything else, without putting any load on the server you care about.)
In addition to top and auth.log, I often look at mtop, and enable mysql's slowquerylog and watch mysqldumpslow.
I also use Nagios to monitor CPU, Memory, and logged in users (on a VPS or dedicated server). That last lets me know when someone other than me has logged in.
network of course :) Use MRTG to get some nice bandwidth graphs, they're just pretty most of the time.. until a spammer finds a hole in your security and it suddenly increases.
Nagios is good for alerting as mentioned, and is easy to get setup. You can then use the mrtg plugin to get alerts for your network traffic too.
I also recommend ntop as it shows where your network traffic is going.
A good link to get you going with Munin and Monit: link text
I typically watch top and tail -f /var/log/auth.log.