Server keeps crashing due to high RAM usage - realtime IP monitoring? - linux

Is there a way to monitor realtime IP traffic coming into my server and see how much bandwidth and RAM is being used?
Every once in a while I seem to get a DoS-type attack where my website becomes unresponsive, and I can't do anything until I request a hard reboot from my hosting company. I would like to see which IP addresses are connected to my server at the moment it becomes unresponsive from exhausted RAM, so that I can block those IPs from accessing my website in the future.
Thank you!

There are a lot of tools that you can use:
Command-line tools: iptraf, iftop, jnettop
Web interface tools: ntop (it runs as a daemon and shows many graphs and summaries of all traffic on your server)
The old Nagios (good and robust, but it may take you some time to get familiar with it)
You can find a good article on the best monitoring tools in the Unix/Linux world here: Linux performance monitoring tools.
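To answer the "which IPs are connected right now" part directly, a minimal sketch (not taken from the answer above) is to count established TCP connections per remote address straight from the kernel's /proc/net/tcp table. It assumes Linux and IPv4 only (IPv6 connections are listed in /proc/net/tcp6, which this does not read), and the function names are hypothetical. Running something like this from cron, or simply `ss -tn` by hand, before the box runs out of RAM gives you a list of candidate IPs to look at.

```python
import socket
import struct
from collections import Counter

TCP_ESTABLISHED = "01"  # state code the kernel uses in /proc/net/tcp


def hex_to_ipv4(hex_addr: str) -> str:
    """Turn the 8-hex-digit address field of /proc/net/tcp into dotted quad."""
    # The kernel prints the network-order address as a native-endian integer,
    # so packing it back in native byte order ("=I") recovers the bytes.
    return socket.inet_ntoa(struct.pack("=I", int(hex_addr, 16)))


def count_remote_ips(lines):
    """Count ESTABLISHED IPv4 connections per remote address."""
    counts = Counter()
    for line in lines[1:]:  # the first line is a column header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        remote_ip = hex_to_ipv4(fields[2].split(":")[0])
        counts[remote_ip] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:
        for ip, n in count_remote_ips(f.readlines()).most_common(20):
            print(f"{n:6d}  {ip}")
```

A single IP holding hundreds of connections is a much stronger blocking candidate than one with a handful; shared NATs and proxies can legitimately show several.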


How to limit account resource usage on a single dedicated CentOS 6 server

I have a CentOS 6 Linux box for hosting client websites.
Some of the websites spike on resources and the server becomes unresponsive.
Is it possible to write a script that can limit a particular account or process that uses a high load?
I don't know any Linux-related scripting, but if I can be pointed in the right direction, I can learn what's necessary.
I'm not sure, but have a look at cgroups (control groups).
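cgroups let the kernel cap memory and weight CPU for a group of processes. A minimal sketch, assuming the cgroup v1 memory and cpu controllers are mounted as separate hierarchies under a common root (libcgroup's cgconfig service on CentOS 6 typically mounts them under /cgroup; check `mount` or /etc/cgconfig.conf on your own box) and that it runs as root. `limit_process` and the group name are hypothetical. Writing a PID to `tasks` moves that one task; processes it forks afterwards inherit the group, but threads that already exist in a multi-threaded process are not moved.

```python
import os


def _write(path: str, value) -> None:
    """Write a single value to a cgroup control file."""
    with open(path, "w") as f:
        f.write(str(value))


def limit_process(cgroup_root: str, group: str, pid: int,
                  mem_bytes: int, cpu_shares: int = 512) -> None:
    """Put `pid` into a cgroup (v1) with a memory cap and a lower CPU weight.

    Assumes <cgroup_root>/memory and <cgroup_root>/cpu are the mounted
    v1 controller hierarchies (hypothetical layout; adjust to your system).
    """
    mem_dir = os.path.join(cgroup_root, "memory", group)
    cpu_dir = os.path.join(cgroup_root, "cpu", group)
    # On a real cgroupfs, mkdir makes the kernel create the control files.
    os.makedirs(mem_dir, exist_ok=True)
    os.makedirs(cpu_dir, exist_ok=True)
    # Hard memory limit for everything in the group.
    _write(os.path.join(mem_dir, "memory.limit_in_bytes"), mem_bytes)
    # Relative CPU weight; the default is 1024, so 512 means half the weight.
    _write(os.path.join(cpu_dir, "cpu.shares"), cpu_shares)
    # Move the task into both groups.
    for d in (mem_dir, cpu_dir):
        _write(os.path.join(d, "tasks"), pid)
```

For example, `limit_process("/cgroup", "client_site1", 12345, 512 * 1024**2)` would cap that PID at 512 MB. For per-account limits, libcgroup also ships the cgred daemon, which places a user's processes into a group automatically based on rules in /etc/cgrules.conf, so no script has to chase PIDs.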

What are the most important statistics to look at when deploying a Node.js web-application?

First, a little bit about my background: I have been programming for some time (10 years at this point) and am fairly competent when it comes to coding ideas up. I started working on web-application programming just over a year ago, and thankfully discovered Node.js, which made web-app creation feel a lot more like traditional programming. Now I have a Node.js app that I've been developing for some time and that is now running in production on the web. My main confusion stems from the fact that I am very new to the world of web development and don't really know what's important and what isn't when it comes to monitoring my application.
I am using a Joyent SmartMachine, and looking at the analytics options that they provide is a little overwhelming. There are so many different options and configurations, and I have no clue what purpose each analytic really serves. For the questions below, I'd appreciate any answer, whether it's specific to Joyent's Cloud Analytics or completely general.
Right now, my main concern is to figure out how my application is utilizing the server that I have it running on. I want to know if my application has the right amount of resources allocated to it. Does the number of requests that it receives make the server it's on overkill, or does it warrant extra resources? What analytics are important to look at for a NodeJS app for that purpose? (using both MongoDB and Redis on separate servers if that makes a difference)
What other statistics are generally really important to look at when managing a server that's in production? I'm used to programs that run once to do something specific (e.g. a raytracer that finishes running once it has computed an image), as opposed to web-apps which are continuously running and interacting with many clients. I'm sure there are many things that are obvious to long-time server administrators that aren't to newbies like me.
What's important to look at when dealing with NodeJS specifically? What are statistics/analytics that become particularly critical when dealing with the single-threaded event loop of NodeJS versus more standard server systems?
I have other questions about how databases play into the equation, but I think this is enough for now...
We have been running Node.js in production for nearly a year, starting from 0.4 and currently on the 0.8 series. The web app is based on Express 2 and 3, with Mongo, Redis and memcached.
A few facts:
Node cannot handle a large V8 heap well: when it grows over 200 MB you will start seeing increased CPU usage.
Node always seems to leak memory, or at least to grow a large heap without actually using it. I suspect memory fragmentation, as V8 profiling and valgrind show no leaks in JS space or in the resident heap. Early 0.8 was awful in this respect: RSS could be 1 GB with a 50 MB heap.
Hanging requests are hard to track. We wrote our own middleware to monitor them, especially as our app is long-poll based.
My suggestions:
Use multiple instances per machine, at least one per CPU. Balance them with HAProxy, nginx or similar, with session affinity.
Write middleware to report hung connections, i.e. ones the code never responded to or whose latency was over a threshold.
Restart instances often, at least weekly.
Write a poller that prints out memory stats from the process module (process.memoryUsage()) once per minute.
Use supervisord and Fabric for easy process management.
Monitor CPU and the reported memory stats, and restart on a threshold.
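The last suggestion (restart on a memory threshold) can live outside Node entirely. A minimal sketch, assuming Linux and that a supervisor such as supervisord (with autorestart enabled) brings the process back after it exits; `watch` and the 60-second interval are illustrative, not taken from the answer. It reads the resident set size (VmRSS) from /proc/<pid>/status, which is the number that keeps growing even when the V8 heap looks small.

```python
import os
import signal
import time


def rss_kib(status_text: str) -> int:
    """Extract the resident set size (kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   123456 kB"
    raise ValueError("no VmRSS line (kernel thread or exited process?)")


def over_threshold(status_text: str, limit_mib: int) -> bool:
    """True when RSS is above `limit_mib` mebibytes."""
    return rss_kib(status_text) > limit_mib * 1024


def watch(pid: int, limit_mib: int, interval_s: float = 60.0) -> None:
    """Poll once per interval; send SIGTERM when RSS exceeds the limit.

    Assumes something else (e.g. supervisord with autorestart=true)
    starts the process again. Raises FileNotFoundError if the PID is gone.
    """
    while True:
        with open(f"/proc/{pid}/status") as f:
            if over_threshold(f.read(), limit_mib):
                os.kill(pid, signal.SIGTERM)
                return
        time.sleep(interval_s)
```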
Whatever the type of web app, Node.js or otherwise, load testing will answer whether your application has the right amount of server resources. A good website I recently found for this is Load Impact.
The real question to answer is WHEN the load time begins to increase as the number of concurrent users increases. A tipping point is reached at a certain number of concurrent users, after which server performance starts to degrade. So load test according to how many users you expect to reach your website in the near future.
How can you estimate the amount of users you expect?
Installing Google Analytics or another analytics package on your pages is a must! It lets you see how many users visit your website daily and how your visits grow from month to month, which helps you predict future visits and therefore the expected load on your server.
Even if I know the number of users, how can I estimate actual load?
The answer is in the F12 developer tools available in all major browsers. Open your website in any browser and press F12 (or Ctrl+Shift+I in Opera), which should open the browser's developer tools. In Firefox make sure you have Firebug installed; in Chrome and Internet Explorer it should work out of the box. Go to the Net or Network tab and refresh your page. This will show you the number of HTTP requests and the bandwidth used per page load!
So the formula to work out daily server load is simple:
Number of HTTP requests per page load × average number of page loads per user per day × expected number of daily users = total HTTP requests to the server per day
Number of MB transferred per page load × average number of page loads per user per day × expected number of daily users = total bandwidth required per day
I've always found it easier to calculate these figures on a daily basis and then extrapolate them to weeks and months.
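The two per-day formulas can be written as a tiny calculator. This sketch reads the last factor as users per day (the result is a per-day total, so that is the count that belongs there), and the example numbers are purely hypothetical.

```python
def daily_load(requests_per_page, mb_per_page, pages_per_user_per_day, daily_users):
    """Return (HTTP requests per day, MB transferred per day)."""
    page_loads_per_day = pages_per_user_per_day * daily_users
    requests_per_day = requests_per_page * page_loads_per_day
    mb_per_day = mb_per_page * page_loads_per_day
    return requests_per_day, mb_per_day


if __name__ == "__main__":
    # Hypothetical site: 40 requests and 1.5 MB per page, 5 pages per user, 2000 users/day.
    reqs, mb = daily_load(40, 1.5, 5, 2000)
    print(reqs, mb)                # 400000 15000.0
    print(round(reqs / 86400, 1))  # average requests per second: 4.6
```

The per-second average hides peaks, since real traffic clusters in busy hours, so a load test should aim well above it.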
Node.js is single-threaded, so you should definitely start a process for every CPU your machine has. The cluster module is by far the best way to do this, and it has the added benefit of being able to restart workers that die and to detect unresponsive workers.
You also want to do load testing until your requests start timing out or exceed what you consider a reasonable response time. This will give you a good idea of the upper limit your server can handle. Blitz is one of the many options to have a look at.
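Hosted services like Load Impact or Blitz do this properly from many machines. As a rough do-it-yourself illustration of "raise concurrency until requests time out or get too slow", here is a minimal sketch using only the Python standard library; `latency_at`, `ramp`, the concurrency levels and the 1-second cut-off are all hypothetical choices. A single client machine can become the bottleneck itself, so treat its numbers as a lower bound on what the server can do.

```python
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str, timeout: float):
    """Return the seconds taken by one GET, or None if it failed or timed out."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return None
    return time.perf_counter() - start


def latency_at(url, concurrency, requests_per_level=50, timeout=10.0):
    """Fire `requests_per_level` GETs with `concurrency` parallel workers.

    Returns (median latency in seconds or None, number of failed requests).
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_get(url, timeout),
                                range(requests_per_level)))
    ok = [t for t in results if t is not None]
    return (statistics.median(ok) if ok else None), len(results) - len(ok)


def ramp(url, levels=(1, 5, 10, 25, 50), max_median_s=1.0):
    """Return the first concurrency level that fails or is too slow, else None."""
    for c in levels:
        median, failures = latency_at(url, c)
        print(f"concurrency={c:3d} median={median} failures={failures}")
        if failures or median is None or median > max_median_s:
            return c
    return None
```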
I have never used Joyent's statistics, but NodeFly and their node-nodefly-gcinfo are great tools for monitoring Node processes.

Remote monitoring of system stats with node.js

We have implemented a monitoring solution in node.js, which does some basic checks for database integrity and API up-time. We want to expand this system to collect basic system stats of our Linux servers like CPU and disc usage. Some of these servers are behind a firewall which is out of our control, with only some very basic ports open (ssh,ftp,http,https).
How can I gather the system information of these servers in Node.js? Are there monitoring systems that expose this information through a (secured) RESTful API?
I've had a lot of success with this SSH client written in JavaScript:
There are tons of available solutions for monitoring system stats: Nagios, Zabbix, Scout, Cacti. There are even some hosted ones like ServerDensity.
All of these systems should cover the top-level stats: CPU, RAM, disk I/O and network. They all have a plug-in infrastructure, so you can collect custom stats (API uptime, DB availability) and send them along with the regular stats.
If you're running on a cloud infrastructure somewhere, many providers offer this information "out of the box", generally in your account dashboard (see providers like Joyent or Azure).
The big question here is "what else do you need?"
Use NRPE from Nagios as a client on the box you want to monitor. It's fairly simple to set up and its API is documented.
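If a full monitoring system is more than you need, a small agent on each server could gather the top-level stats itself and hand them to your Node.js monitor as JSON over one of the open ports (e.g. behind an HTTPS endpoint with authentication, which is not shown here). A minimal sketch of just the collection step, assuming Linux; `collect_stats` and its field names are hypothetical.

```python
import json
import os
import shutil


def parse_meminfo(text: str) -> dict:
    """Return /proc/meminfo fields as {name: value}, values in kB where a unit is given."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[name] = int(parts[0])
    return info


def collect_stats(path: str = "/") -> dict:
    """Load averages, memory and disk usage for the filesystem holding `path`."""
    load1, load5, load15 = os.getloadavg()
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    disk = shutil.disk_usage(path)
    return {
        "load": {"1m": load1, "5m": load5, "15m": load15},
        "mem_total_kb": mem["MemTotal"],
        # MemAvailable only exists on newer kernels; fall back to MemFree.
        "mem_available_kb": mem.get("MemAvailable", mem["MemFree"]),
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
    }


if __name__ == "__main__":
    print(json.dumps(collect_stats(), indent=2))
```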

Could a web-scraper get around a good throttle protection?

Suppose that a data source sets a tight IP-based throttle. Would a web scraper have any way to download the data if the throttle starts rejecting its requests after as little as 1% of the data has been downloaded?
The only technique I could think of a hacker using here would be some sort of proxy system. But, it seems like the proxies (even if fast) would eventually all reach the throttle.
Update: Some people below have mentioned big proxy networks like Yahoo Pipes and Tor, but couldn't these IP ranges or known exit nodes be blacklisted as well?
A list of thousands of proxies can be compiled for FREE. IPv6 addresses can be rented for pennies. Hell, an attacker could boot up an Amazon EC2 micro instance for 2-7 cents an hour.
And you want to stop people from scraping your site? The internet doesn't work that way, and hopefully it never will.
I have seen IRC servers do a port scan on clients to see if ports 8080, 3128 or 1080 are open. However, there are proxy servers that use different ports, and there are also legitimate reasons to run a proxy server or to have these ports open, like running Apache Tomcat. You could bump it up a notch by using YAPH to see if a client is running a proxy server. In effect you'd be using an attacker's tool against them ;)
Someone using Tor would be hopping IP addresses every few minutes. I used to run a website where this was a problem, and resorted to blocking the IP addresses of known Tor exit nodes whenever excessive scraping was detected. You can implement this if you can find a regularly updated list of Tor exit nodes, for example,
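The approach described here (throttle per IP, and block known Tor exit nodes once excessive scraping shows up) could be sketched as a per-IP sliding-window limiter. This is a minimal sketch: `ScrapeGuard`, its thresholds, and how the exit-node list is obtained are all hypothetical, so the list is simply passed in as a set of IP strings.

```python
import time
from collections import defaultdict, deque


class ScrapeGuard:
    """Per-IP sliding-window request limiter with a Tor exit-node escalation.

    `exit_nodes` is assumed to come from a regularly refreshed list of
    Tor exit IPs; fetching that list is left to the caller.
    """

    def __init__(self, max_requests, window_s, exit_nodes=()):
        self.max_requests = max_requests
        self.window_s = window_s
        self.exit_nodes = set(exit_nodes)
        self.blocked = set()
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if this request should be served."""
        now = time.monotonic() if now is None else now
        if ip in self.blocked:
            return False
        q = self.hits[ip]
        # Drop timestamps that have slid out of the window.
        while q and now - q[0] >= self.window_s:
            q.popleft()
        if len(q) >= self.max_requests:
            # Excessive scraping detected: block this IP, and if it is a Tor
            # exit, block every known exit node, since the scraper will hop.
            self.blocked.add(ip)
            if ip in self.exit_nodes:
                self.blocked |= self.exit_nodes
            return False
        q.append(now)
        return True
```

Blocking permanently is a blunt choice; a real deployment would probably expire entries from `blocked` after a while.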
You could use a P2P crawling network to accomplish this task. There will be a lot of IPs available, and it won't matter if one of them becomes throttled. You may also combine many client instances using a proxy configuration, as suggested in previous answers.
I think you can use YaCy, an open-source P2P crawling network.
A scraper that wants the information will get the information. Timeouts, changing agent names, proxies, and of course EC2/RackSpace or any other cloud services that have the ability to start and stop servers with new IP addresses for pennies.
I've heard of people using Yahoo Pipes to do such things, essentially using Yahoo as a proxy to pull the data.
Maybe try running your scraper on Amazon EC2 instances. Every time you get throttled, start up a new instance (with a new IP) and kill the old one.
It depends on how much time the attacker has for obtaining the data. If most of the data is static, it might be worthwhile for an attacker to run his scraper for, say, 50 days. If he is on a DSL line where he can request a "new" IP address twice a day, a 1% limit would not harm him that much.
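The 50-day figure follows from simple arithmetic, assuming each fresh IP can fetch a full 1% of the data before the throttle kicks in and the site has no other way to link the addresses:

```latex
\text{days} = \frac{1}{\text{fraction per IP} \times \text{new IPs per day}} = \frac{1}{0.01 \times 2} = 50
```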
Of course, if you need the data more quickly (because it is outdated quickly), there are better ways (use EC2 instances, set up a BOINC project if there is public interest in the collected data, etc.).
Or run a pyramid scheme à la "get 10 people to run my crawler and you get PORN, or get 100 people to crawl it and you get LOTS OF PORN", as was quite common a few years ago with ad-filled websites. Because of the competition involved (who gets the most referrals), you might quickly get a lot of nodes running your crawler for very little money.

Which resources should one monitor on a Linux server running a web-server or database

When running any kind of server under load, there are several resources one would like to monitor to make sure the server is healthy. This is especially true when testing the system under load.
Some examples for this would be CPU utilization, memory usage, and perhaps disk space.
What other resources should I be monitoring, and what tools are available to do so?
As many as you can afford to, and can then graph, look at and understand. Monitoring resources is useful not only for capacity planning but also for anomaly detection, and anomaly detection significantly helps your ability to detect security events.
You have a decent start with your basic graphs. I'd also want to monitor the number of threads, number of connections, network I/O, disk I/O, page faults (arguably related to memory usage) and context switches.
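Context switches and page faults are exposed by the kernel as cumulative counters: the `ctxt` line in /proc/stat, and `pgfault`/`pgmajfault` in /proc/vmstat. Tools like vmstat and dstat already show them; if you want to feed them into your own graphs, a minimal sketch (assuming Linux; the function names are hypothetical) turns two samples into per-second rates.

```python
import time

VMSTAT_KEYS = ("pgfault", "pgmajfault")


def read_counters(stat_text: str, vmstat_text: str) -> dict:
    """Pick cumulative counters out of /proc/stat and /proc/vmstat text."""
    counters = {}
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            counters["ctxt"] = int(line.split()[1])  # context switches since boot
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in VMSTAT_KEYS:
            counters[parts[0]] = int(parts[1])  # page faults since boot
    return counters


def rates(before: dict, after: dict, elapsed_s: float) -> dict:
    """Convert two counter snapshots into per-second rates."""
    return {k: (after[k] - before[k]) / elapsed_s for k in before}


def sample(interval_s: float = 5.0) -> dict:
    """Take two snapshots `interval_s` apart and return the rates."""
    def snap():
        with open("/proc/stat") as s, open("/proc/vmstat") as v:
            return read_counters(s.read(), v.read())

    first = snap()
    time.sleep(interval_s)
    return rates(first, snap(), interval_s)
```

A sustained jump in `pgmajfault` (faults that needed disk I/O) is usually the more alarming of the two fault counters, since it often means the box has started swapping.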
I really like Munin for graphing things related to hosts.
I use Zabbix extensively in production, which comes with a stack of useful defaults. Some examples of the sorts of things we've configured it to monitor:
Network usage
CPU usage (% user, system and nice time)
Load averages (1m, 5m, 15m)
RAM usage (real, swap, shm)
Disc throughput
Active connections (by port number)
Number of processes (by process type)
Ping time from remote location
Time to SSL certificate expiry
MySQL internals (query cache usage, number of temporary tables in RAM and on disc, etc.)
Anything you can monitor with Zabbix, you can also attach triggers to, so it can restart failed services or page you about problems.
Collect the data now, before performance becomes an issue. When it does, you'll be glad of the historical baselines, and the fact you'll be able to show what date and time problems started happening for when you need to hunt down and punish exactly which developer made bad changes :)
I ended up using dstat, which is vmstat's nicer-looking cousin.
It shows almost everything you need to know about a machine's health.
Run "df -h" to make sure that no partition runs full, which can lead to all kinds of funky problems. Watching the syslog is of course also useful; for that I recommend installing "logwatch" (Logwatch Website) on your server, which sends you an email if weird things start showing up in your syslog.
Cacti is a good web-based monitoring/graphing solution. Very complete, very easy to use, with a large userbase including many large Enterprise-level installations.
If you want more 'alerting' and less 'graphing', check out Nagios.
As for 'what to monitor', you want to monitor systems at both the system and the application level: network, memory and disk I/O, interrupts and such at the system level. The application level gets more specific, so a web server might measure hits/second and errors/second (non-200 responses), and a database might measure queries/second, average query fulfillment time, etc.
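For a web server, hits/second and errors/second can be pulled straight from the access log. A minimal sketch, assuming the Apache/nginx "common" or "combined" log format (timestamps there already have one-second resolution, so the timestamp string works as the bucket); `per_second` and `LOG_RE` are hypothetical names.

```python
import re
from collections import Counter

# Start of a common/combined log line: host ident user [timestamp] "request" status
LOG_RE = re.compile(r'^\S+ \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')


def per_second(lines):
    """Return (hits, errors) Counters keyed by the log's timestamp string."""
    hits, errors = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip lines in other formats
        ts, status = m.group(1), m.group(2)
        hits[ts] += 1
        if status != "200":  # "error" as defined above: any non-200 response
            errors[ts] += 1
    return hits, errors
```

Counting every non-200 as an error also catches harmless 304s and redirects; in practice you may prefer `int(status) >= 500` (or `>= 400`).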
Beware the aforementioned slow query log in MySQL. It should only be used when trying to figure out why some queries are slow. It has the side effect of making ALL your queries slow while it's enabled :P It's intended for debugging, not logging.
Think 'passive monitoring' whenever possible. For instance, sniff the network traffic rather than monitor it from your server -- have another machine watch the packets fly back and forth and record statistics about them.
(By the way, that's one of my favorites -- if you watch connections being established and note when they end, you can find a lot of data about slow queries or slow anything else, without putting any load on the server you care about.)
In addition to top and auth.log, I often look at mtop, and enable MySQL's slow query log and watch mysqldumpslow.
I also use Nagios to monitor CPU, memory and logged-in users (on a VPS or dedicated server). That last one lets me know when someone other than me has logged in.
Network, of course :) Use MRTG to get some nice bandwidth graphs. They're just pretty most of the time... until a spammer finds a hole in your security and traffic suddenly increases.
Nagios is good for alerting, as mentioned, and is easy to set up. You can then use the MRTG plugin to get alerts for your network traffic too.
I also recommend ntop as it shows where your network traffic is going.
A good link to get you going with Munin and Monit:
I typically watch top and tail -f /var/log/auth.log.
