How do you handle bandwidth billing on shared servers in apache? - linux

What solutions do you have in place for handling bandwidth billing for your vhosts on a shared environment in apache? If you are using log parsing, does your solution scale well when the logs become very very large? Anyone using any sort of module out there for this?

There exist certain modules for Apache 1.x and 2.x that will allow you to set a maximum on the transfer amount, most of them keep track using the scoreboard file that Apache generates (when mod_status is enabled with ExtendedStatus on). The one I still have bookmarked from when I was looking for one is mod_curb, however it is not complete and at the current moment in time looks to only work on a server-wide scale and not for individual virtual hosts.
Apache modules can be set to be outbound filters, so you could write a costume module that would sit at the end of the chain, and add up all the outgoing packets, using the data that APR provides you can then add it to a counter for that specific domain/sub-domain. After that you have a choice of what to do with the data.
For specific examples, take a look at mod_deflate that Apache provides, to see how it sits at the end of the chain and compresses everything but the headers the server sends out. This should give you a good start.
As for log based processing, it becomes slower the more logs exist. This is just the nature of the beast. When we were using a log based solution we had a custom perl script that ran every 15 minutes. Eventually it would take longer than 15 minutes to parse, and since we had proper locking after a while multiple of these log processing perl scripts were now running, all waiting on each other. We ended up re-writing it with a simple call to tail -F, which then let perl parse each and every request as it came in, while not entirely efficient, it worked. The upside of that was that we were now able to update traffic statistics in near realtime so that clients were updated sooner rather than later if they went over their limits.

You could go the poor man's route, and use Webalizer or Awstats. Both of these will give you an idea of traffic based off of access logs, and can be done on a per virtual host basis. In the case of Awstats, I know once you start doing 10GB+ of traffic daily, it starts to consume resources. You can always nice it, but then you'll get your data next week, rather than when you actually need it. In the past with Webalizer I've had to use some hackery to get it to handle large access logs, by chunking up the logs to smaller pieces that it could manage. It didn't provide as many useful metrics from what I've done with it, but I've also never needed to save a server from it :)

If virtual host does not have own IP, there is no easier way than logfile parsing. Just use mod_logio to calculate actual bytes transferred. mod_logio handles broken connections, compressed data etc. correctly. You should be able to parse logs realtime using piped logs. Use BufferedLogs to scale further (just check that parser handles lines broken when buffered correctly). Parser should save data periodically (like every minute) somewhere, just avoid locking issues as parsing must not slow down httpd. If httpd connections is spending time in L-state at server-status, you are too slow. After you have numbers, you can sum then further and then save data to billing system.
If you save billing logs as file too you can correct and doublecheck realtime traffic calculations. If you boot httpd you can end up missing some lines. But generally losing couple hundred requests is acceptable as it less than seconds worth on a high volume site.
There is modules that try to handle and limit bandwidth, like mod_cband and mod_bw. But they don't work when you have same vhost on multiple machines. I guess they would work ok on smaller scale.
If you have IP per vhost you could try IP based methods like feeding firewall logs to traffic calculator. Simple way is to use iptables.

Although we use IIS rather than apache we do use log file analysis for bandwidth billing (and bandwidth profiling / analysis). We use a custom application to load data collected in the log files in one hour increments, and act upon any required notifications or bandwidth overuse.
The log file loader runs as a low priority process, so as not to interupt operation of the server. Even on high usage servers with a large number of sites, processing takes less than 15 minutes, so we don't see scalability as a problem with this methodology.
There may be better ways of doing this, but this is perfectly adequate for what we need. I look forward to viewing the other responses.

It can be easily achieved with mod_cband. We've rewritten the module to fix a few bugs, provide true redundancy on restarts and incorporate FTP and Mail statistics.
http://www.howtoforge.com/mod_cband_apache2_bandwidth_quota_throttling

Well mod_cband would be great, except for when i'm using it, the max_connections (the overall, total value for every client combined), decides to crawl upwards until it hits the max value i've set. when it does reach the highest value, it just stays there and leaves all my clients receiving a constant "503 Service Temporarily Unavailable" error.
for example, i set "CbandSpeed 1000Mbps 500 1200", and the server connections crawls up to 1200 in about 8 hrs, then stays there. at this point, i count the total number of connections under Remote Clients in the mod_cband status window, and i see around 50. i've also used ps aux and i see around the same amount (~50) open http processes, which is normal, except for the fact that nobody can access the site at all because of the 503 errors.
Any ideas what could be wrong, or can this be fixed?

Related

NodeJS Monitoring Website (Worker Threads?/Multi Process?)

I am doing small project of application that will monitor some servers.
It will base on telnet port check, ping, and also it will use libraries to connect directly to databases (MSSQL, Oracle, MySQL) to check their status.
I wonder what will be the best effective solution for this idea, currently with around 30 servers it works quite smooth, around 2.5sec to check status for all of them (running async). However I am worried that in the future with more servers it might get worse. Hence thinking about using some alternative like Worker Threads maybe? or some multi processing? Any ideas? Everything is happening in internal network so I do not expect huge latency.
Thank you in advance.
Have you ever tried the PM2 cluster mode:
https://pm2.keymetrics.io/docs/usage/cluster-mode/
The telnet stuff is TCP, which Node.js does very well using OS-level networking events. The connections to databases can vary. In the case of Oracle, you'll likely be using the node-oracledb. Those are SQL*Net connections that rely on the OCI libs and Node.js' thread pool. The thread pool defaults to four threads, but you can grow it up to 128 per Node.js process. See this doc for info:
https://oracle.github.io/node-oracledb/doc/api.html#-143-connections-threads-and-parallelism
Having said all that, other than increasing the size of the thread pool, I wouldn't recommend you make any changes. Why fight fires before they're burning? No need to over-engineer things. You're getting acceptable performance given the current number of servers you have.
How many servers do you plan to add in, say, 5 years? What's the difference in timing if you run the status checks for half of the servers vs all of them? Perhaps you could use that kind of data to make an educated guess as to where things would go.
As you add new ones, keep track of the total time to check the status. Is it slipping? If so, look into where the time is being spent and write the solution that will help.

How to get number of hits in server

I want to create a tool, with which we can administer the server.There are two questions with in this question:
To administer access/hit rate of a server. That is to calculate how many times the server has been accessed from a particular time period and then may be generate some kind of graph to demonstrate the load at a particular time on a particular day.
However i don't have any idea, how i can gather these information.
A pretty vague idea is to
use a watch over access log(in case of apache) and then count the number of times the notification occurs and note down the time simultaneously
Parse access.log file every time and then generate the output(but access.log file can be very big, so not sure about this idea)
I am familiar with apache and hence the above idea is based on apache's access log and i don't have idea about other like nginx etc.
Hence i would like to know, if I can use the above procedure or is there any other way possible.
I would like to know when the server is reaching its limit. The idea of using top and then show the live result of cpu usage and ram usage via CPP
To monitor a web server the easiest way is probably to use some existing tool like webalizer.
http://www.webalizer.org/
To monitor other things like CPU and memory usage I would suggest snmpd together with some other tool like mrtg. http://oss.oetiker.ch/mrtg/
If you think that webalizer does not sample data often enough with its hourly statistics but the sample time of mrtg with 5 minutes would be better it is also possible to provide more data with snmpd by writing an snmpd extension. Such an extension could parse the apache log file with a rather small amount of code and give you all the graphing functionality for free from mrtg or some other tool processing snmp data.

What are the most important statistics to look at when deploying a Node.js web-application?

First - a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered nodeJS, which made web-app creation feel a lot more like traditional programming. Now, I have a node.js app that I've been developing for some time that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of the web development, and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
QUESTION ONE
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
QUESTION TWO
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
QUESTION THREE
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running node.js in production nearly an year starting from 0.4 and currenty 0.8 series. Web app is express 2 and 3 based with mongo, redis and memcached.
Few facts.
node can not handle large v8 heap, when it grows over 200mb you will start seeing increased cpu usage
node always seem to leak memory, or at least grow large heap size without actually using it. I suspect memory fragmentation, as v8 profiling or valgrind shows no leaks in js space nor resident heap. Early 0.8 was awful in this respect, rss could be 1GB with 50MB heap.
hanging requests are hard to track. We wrote our middleware to monitor these especially as our app is long poll based
My suggestions.
use multiple instances per machine, at least 1 per cpu. Balance with haproxy, nginx or such with session affinity
write midleware to report hanged connections, ie ones that code never responded or latency was over threshold
restart instances often, at least weekly
write poller that prints out memory stats with process module one per minute
Use supervisord and fabric for easy process management
Monitor cpu, reported memory stats and restart on threshold
Whichever the type of web app, NodeJS or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN does the load time begin to increase as the number of concurrent users increase? A tipping point is reached when you get to a certain number of concurrent users, after which the server performance will start to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! This way you will be able to see how many daily users are visiting your website, and what is the growth of your visits from month-to-month which can help in predicting future expected visits and therefore expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 Development Tools available in all browsers. Open up your website in any browser and push F12 (or for Opera Ctrl+Shift+I), which should open up the browser's development tools. On Firefox make sure you have Firebug installed, on Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and then refresh your page. This will show you the number of HTTP requests, bandwidth usage per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load X the average number of pages load per user per day X Expected number of concurrent users = Total HTTP Requests to Server per Day
And...
Number of MBs transferred per page load X the average number of pages load per user per day X Expected number of concurrent users = Total Bandwidth Required per Day
I've always found it easier to calculate these figures on a daily basis and then extrapolate it to weeks and months.
Node.js is single threaded so you should definitely start a process for every cpu your machine has. Cluster is by far the best way to do this and has the added benefit of being able to restart died workers and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo is a great tools to monitor node processes.

Number of instances needed for windows azure application

I'm fairly new to Windows Azure and want to host a survey application that will be filled out by appr. 30.000 users simultaniously.
The application consists of 1 .aspx page that will be sent to the client once, asks 25 questions and will give a wrap-up of the given answers at the end. When the user has given the answer and hits the 'next question' buttons the given answer will be send via an .ashx handler to the server. The response is the next question and answers. The wrap-up is sent to the client after a full postback.
The answer is saved in an Azure Table that is partitioned so that each partition can hold a max of 450 users.
I would like to ask if someone can give an estimated guess about how many web-role instances we need to start in order to have this application keep running. (If that is too hard to say, is it more likely to start 5, 50 or 500 instances?)
What is a better way to go: 20 small instances or 5 large instances?
Thanks for your help!
The most obvious answer: you would be best served by testing this yourself and see how your application holds up. You can easily get performance counters and other diagnostics out of Windows Azure; for instance, you can connect Microsoft SCOM (System Center Operations Manager) to monitor your environment during test. Site Hammer is a simple load testing tool for Windows Azure (on MSDN code gallery).
Apart from this very obvious answer, I will share some guesstimates: given the type of load, you are probably better of with more small instances as opposed to a lower number of large ones, especially since you already have your storage partitioned. If you are really going to have 30K visitors simultaneously and give them a ~15 second interval between reading the questions & posting their answers you are looking at 2,000 requests per second. 10 nodes should be more than enough to handle that load. Remember that this is just a simple estimate, lacking any form of insight in your architecture, etc. For these types of loads, caching is a very good idea; it will dramatically increase the load each node can handle.
However, the best advice I can give you is to make sure that you are actively monitoring. It takes less than 30 minutes to spin up additional instances, so if you monitor your environment and/or make sure that you are notified whenever it starts to choke, you can easily upgrade your setup. Keep in mind that you do need to contact customer support to be able to go over 20 instances (this is a default limit, in place to protect you from over-spending).
Aside from the sage advice tijmenvdk gave you, let me add my opinion on instance size. In general, go with the smallest size that will support your app, and then scale out to handle increased traffic. This way, when you scale back down, your minimum compute cost is kept low. If you ran, say, a pair of extra-large instances as your baseline (since you always want minimum two instances to get the uptime SLA), your cost footprint starts at 0.12 x 8 x 2 = $1.92 per hour, even during low-traffic times. If you go with small instances, you'd be at 0.12 x 1 x 2 = $0.24 per hour.
Each VM size as associated CPU, memory, and local 9non-durable) disk storage, so pick the smallest size unit that your app works efficiently in.
For load/performance-testing, you might also want to consider a hosted solution such as Loadstorm.
How simultaneous are the requests in reality?
Will they all type the address in at exactly the same time?
That said, profile your app locally, this will enable you to estimate CPU, Network and Memory usage on Azure. Then, rather than looking at how many instances you need, look at how you can reduce the requirement! Apply these tips, and profile locally again.
Most performance tips have a tradeoff between cpu, memory or bandwith usage, the idea is to ensure that they scale equally. If you're application runs out of memory, but you have loads of CPU and network, dont
For a single page survey, ensure your html, css & js is minified, ensure its cacheable.
Combine them if possible, and to get really scaleable, push static files (css,js & images) to a CDN. This all reduces the number of requests the webserver has to deal with, and therefore reduces the number of webroles you will need = less network.
How does the ashx return the response? i.e. is it sending html, xml or json?
personally, I'd get it to return JSON, as this will require less network bandwidth, and most likely less server side processing = less mem and network.
Use Asyncronous API's to access azure storage (this uses IO completion ports to free up the iis thread to handle more requests until azure storage comes back = enabling cpu to scale)
tijmenvdk has already mentioned using queues to write. Do the list of questions change? if not, cache them, so that the app only has to read from table storage once on start-up and once for each client for the final wrap-up = saves network and cpu at the expense of memory.
All of these tips are equally applicable to a normal web application, on a single server or web-farm environment.
The point I'm trying to make is that what you can't measure, you cant improve, and measurement, improvement and cost all go hand in hand. Dynamic scaling will reduce costs, but fundamentally if your application hasn't been measured and resource usage optimised, asking how many instances you need is pointless.

Which resources should one monitor on a Linux server running a web-server or database

When running any kind of server under load there are several resources that one would like to monitor to make sure that the server is healthy. This is specifically true when testing the system under load.
Some examples for this would be CPU utilization, memory usage, and perhaps disk space.
What other resource should I be monitoring, and what tools are available to do so?
As many as you can afford to, and can then graph/understand/look at the results. Monitoring resources is useful for not only capacity planning, but anomaly detection, and anomaly detection significantly helps your ability to detect security events.
You have a decent start with your basic graphs. I'd want to also monitor the number of threads, number of connections, network I/O, disk I/O, page faults (arguably this is related to memory usage), context switches.
I really like munin for graphing things related to hosts.
I use Zabbix extensively in production, which comes with a stack of useful defaults. Some examples of the sorts of things we've configured it to monitor:
Network usage
CPU usage (% user,system,nice times)
Load averages (1m, 5m, 15m)
RAM usage (real, swap, shm)
Disc throughput
Active connections (by port number)
Number of processes (by process type)
Ping time from remote location
Time to SSL certificate expiry
MySQL internals (query cache usage, num temporary tables in RAM and on disc, etc)
Anything you can monitor with Zabbix, you can also attach triggers to - so it can restart failed services; or page you to alert about problems.
Collect the data now, before performance becomes an issue. When it does, you'll be glad of the historical baselines, and the fact you'll be able to show what date and time problems started happening for when you need to hunt down and punish exactly which developer made bad changes :)
I ended up using dstat which is vmstat's nicer looking cousin.
This will show most everything you need to know about a machine's health,
including:
CPU
Disk
Memory
Network
Swap
"df -h" to make sure that no partition runs full which can lead to all kinds of funky problems, watching the syslog is of course also useful, for that I recommend installing "logwatch" (Logwatch Website) on your server which sends you an email if weird things start showing up in your syslog.
Cacti is a good web-based monitoring/graphing solution. Very complete, very easy to use, with a large userbase including many large Enterprise-level installations.
If you want more 'alerting' and less 'graphing', check out nagios.
As for 'what to monitor', you want to monitor systems at both the system and application level, so yes: network/memory/disk i/o, interrupts and such over the system level. The application level gets more specific, so a webserver might measure hits/second, errors/second (non-200 responses), etc and a database might measure queries/second, average query fulfillment time, etc.
Beware the afore-mentioned slowquerylog in mysql. It should only be used when trying to figure out why some queries are slow. It has the side-effect of making ALL your queries slow while it's enabled. :P It's intended for debugging, not logging.
Think 'passive monitoring' whenever possible. For instance, sniff the network traffic rather than monitor it from your server -- have another machine watch the packets fly back and forth and record statistics about them.
(By the way, that's one of my favorites -- if you watch connections being established and note when they end, you can find a lot of data about slow queries or slow anything else, without putting any load on the server you care about.)
In addition to top and auth.log, I often look at mtop, and enable mysql's slowquerylog and watch mysqldumpslow.
I also use Nagios to monitor CPU, Memory, and logged in users (on a VPS or dedicated server). That last lets me know when someone other than me has logged in.
network of course :) Use MRTG to get some nice bandwidth graphs, they're just pretty most of the time.. until a spammer finds a hole in your security and it suddenly increases.
Nagios is good for alerting as mentioned, and is easy to get setup. You can then use the mrtg plugin to get alerts for your network traffic too.
I also recommend ntop as it shows where your network traffic is going.
A good link to get you going with Munin and Monit: link text
I typically watch top and tail -f /var/log/auth.log.

Resources