Tux, Varnish or Squid? [closed] - linux

We need a web content accelerator for static images to sit in front of our Apache web front-end servers.
Our previous hosting partner used Tux with great success and I like the fact that it's part of Red Hat Linux, which we're using, but its last update was in 2006 and there seems little chance of future development. Our ISP recommends we use Squid in a reverse caching proxy role.
Any thoughts between Tux and Squid? Compatibility, reliability and future support are as important to us as performance.
Also, I read in other threads here about Varnish; anyone have any real-world experience of Varnish compared with Squid, and/or Tux, gained in high-traffic environments?
Cheers
Ian
UPDATE: We're testing Squid now. Using ab to pull the same image 10,000 times with a concurrency of 100, both Apache on its own and Squid/Apache burned through the requests very quickly. But Squid made only a single request to Apache for the image then served them all from RAM, whereas Apache alone had to fork a large number of workers in order to serve the images. It looks like Squid will work well in freeing up the Apache workers to handle dynamic pages.

In my experience Varnish is much faster than Squid, but equally importantly it's much less of a black box than Squid is. Varnish gives you access to very detailed logs that are useful when debugging problems. Its configuration language is also much simpler and much more powerful than Squid's.

@Daniel, @MKUltra, to elaborate on Varnish's supposed problems with cookies: there aren't really any. It is completely normal not to cache a response that comes back with a cookie. Cookies are mostly meant to be used to distinguish different user preferences, so I don't think one would want to cache these (especially if they include some secret information like a session id or a password!).
If your server sends cookies with your .js files and images, that's a problem on your backend's side, not on Varnish's side. As referenced by @Daniel (link provided), you can force the caching of these files anyway, thanks to the really cool language/DSL integrated in Varnish...

If you're looking to push static images and a lot of them, you may want to look at some basics first.
Your application should ensure that all the correct headers are being passed, Cache-Control and Expires for example. That should result in clients' browsers caching those images locally and cutting down on your request count.
Use a CDN (if it's in your budget); this brings the images closer to your clients (generally) and will result in a better user experience for them. For the CDN to be a productive investment you'll again need to make sure all your necessary caching headers are properly set, as per the previous point.
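To make the headers point concrete, here is a minimal sketch in nginx terms (since that's where this answer ends up; Apache's mod_expires/mod_headers can do the same). The location pattern and lifetime are assumptions, not a recommendation:

    # Hypothetical sketch: long-lived caching headers for static images.
    location ~* \.(png|jpe?g|gif|ico)$ {
        expires 30d;                         # sets Expires and Cache-Control: max-age
        add_header Cache-Control "public";   # let browsers and shared caches keep copies
    }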
After all that, if you are still going to use a reverse proxy, I recommend using nginx in proxy mode over Varnish and Squid. Yes, Varnish is fast, and as fast as nginx, but what you're wanting to do is really quite simple; Varnish comes into its own when you want to do complex caching and ESI. So Keep It Simple, Stupid: nginx will do your job very nicely indeed, along the lines of the sketch below.
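A rough sketch of what "nginx in proxy mode" in front of Apache can look like for static images; the hostname, ports and cache sizes are assumptions to show the shape of it, not a drop-in config:

    # Hypothetical sketch: nginx caching static images in front of an Apache backend.
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=static:10m max_size=1g inactive=7d;

    server {
        listen 80;
        server_name static.example.com;                # assumed hostname

        location /images/ {
            proxy_pass        http://127.0.0.1:8080;   # Apache moved to a backend port
            proxy_cache       static;
            proxy_cache_valid 200 7d;                  # keep good responses for a week
            proxy_set_header  Host $host;
        }
    }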
I have no experience with Tux, so I can't comment on it sorry.

For what it's worth, I recently set up nginx as a reverse proxy in front of Apache on a 6-year-old low-power webserver (running Fedora Core 2) which was under a mild DDoS attack (10K req/sec). Page loading was snappy (<100ms), system load stayed low at around 20% CPU utilization, and memory consumption was very modest. The attack lasted a week, and visitors saw no ill effects.
Not bad for over half a million hits per minute sustained. Just be sure to log to /dev/null.
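In nginx the equivalent of logging to /dev/null is simply switching the access log off (or pointing it there). A hedged sketch, with the backend port assumed:

    # Hypothetical sketch: nginx fronting Apache with access logging disabled,
    # so a request flood does not also become a disk-write flood.
    server {
        listen 80;
        access_log off;                        # or: access_log /dev/null;

        location / {
            proxy_pass http://127.0.0.1:8080;  # assumed Apache backend port
        }
    }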

It's interesting that no one has mentioned Apache Traffic Server (formerly Yahoo! Traffic Server): http://trafficserver.apache.org/
Please have a look at it; it works beautifully.

We use Varnish on http://www.mangahigh.com and have been able to scale from around 100 concurrent pre-varnish to over 560 concurrent post-varnish (server load remained at 0 at this point, so there's plenty of space to grow!). Documentation for varnish could be better, but it is quite flexible once you get used to it.
Varnish is meant to be a lot faster than Squid (having never used Squid, I can't say for certain) - and http://users.linpro.no/ingvar/varnish/stats-2009-05-19 shows Twitter, Wikia, Hulu, perezhilton.com and quite a number of other big names also using it.

Both Squid and nginx are specifically designed for this. nginx is particularly easy to configure for a server farm, and can also be a frontend to FastCGI; a sketch of both follows.
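A hedged sketch of those two points, an upstream "farm" plus a FastCGI frontend; the backend addresses and socket path are made up for illustration:

    # Hypothetical sketch: nginx balancing across a small farm and fronting FastCGI.
    upstream app_farm {
        server 10.0.0.11:8080;                 # assumed backend hosts
        server 10.0.0.12:8080;
    }

    server {
        listen 80;

        location / {
            proxy_pass http://app_farm;        # round-robin across the farm
        }

        location ~ \.php$ {
            fastcgi_pass  unix:/var/run/php-fpm.sock;   # assumed FastCGI socket
            fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
            include       fastcgi_params;
        }
    }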

I've only used squid and can't compare. We use squid to cache an entire site on a server in the USA (all data gets pulled from a machine in Germany). It was pretty easy to set up and works nicely. I've found the documentation to be kind of lacking unless you already know what to look for.

Since you already have Apache serving the static and dynamic content, I would recommend going with Varnish.
That way you can use Apache to deliver the static content and Varnish to cache it for you. Varnish is very flexible, giving you both caching and load-balancing features for growing your website in the best way.

We are about to roll out a Varnish 2.01 server in front of an IIS 6 installation. The only caveat we've hit was with SSL (as Varnish can't handle SSL), so we've also installed nginx to handle those requests; a sketch of that piece follows below.
In all our testing we've seen a 66% increase in the amount of traffic the site can handle.
My only gripe is that Varnish doesn't handle cookies well, and the documentation is still a bit scattered.
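A rough sketch of the nginx-for-SSL arrangement described above, with the certificate paths, hostname and Varnish listen port all assumed:

    # Hypothetical sketch: nginx terminates SSL and hands plain HTTP to Varnish.
    server {
        listen 443 ssl;
        server_name www.example.com;                    # assumed hostname

        ssl_certificate     /etc/nginx/ssl/site.crt;    # assumed cert/key paths
        ssl_certificate_key /etc/nginx/ssl/site.key;

        location / {
            proxy_pass       http://127.0.0.1:6081;     # assumed Varnish listen port
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-Proto https;   # so the backend knows it was SSL
        }
    }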

Nobody mentions that Squid follows the HTTP specification to the letter (or at least they try to) whereas Varnish does not. In my opinion, this means Varnish is better suited for caching content for individual sites (by extensively tuning Varnish) and Squid is better for caching content for many sites (each of which will have to make their content "cachable" according to spec).

Related

Why do all those new languages have their own web servers?

I am kinda old school, and the first programming language for the web I saw was PHP, which everybody used with Apache. At that time I also knew ASP, which was used along with Microsoft IIS, and later ASP.NET, which runs on IIS as well.
Time passed, I went off to the ERP world and, when I came back (a few months ago), I learned Golang and Node.js and, to my surprise, they have their own web servers.
I can see many advantages in the built-in web servers, but every application needs to rewrite its own web server rules (I faced that recently when I needed to set up an HTTPS server using Express.js).
After some hard work to understand all the nuances of the HTTP protocol, I asked myself: what if I am doing it the wrong way? What if all the permissive rules I created on my dev server go to production? Maybe this is a useless concern. But maybe I am creating a fragile server that could be exploited by a naive hacker.
Using a server like Apache, it is harder to misuse security rules, because there are explicit settings for development and production environments. If the rules are hardcoded (as they are in Node or Go), an unaware developer can ship development rules to production and nobody will see it until something happens.
Any thoughts?
A web server focuses on speed and computing capacity. No matter how good the Java or PHP web stacks are, or how many established companies use them, as long as a new language such as Go provides better speed and capacity, more programmers will go for it.
By the way, running a web server in Go is really easy: it builds quickly and runs light, and goroutines help the server handle millions of client requests, which the older web languages can hardly do.
You can still use nginx or Apache in front of your Golang gateway for many reasons, including TLS termination (see the sketch below).
But for service-to-service communication it might be nice to talk to services directly, and the Golang HTTP web server is fast. It also supports HTTP/2 out of the box. Go leverages its goroutines to reduce OS overhead when handling many requests at once.
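A minimal sketch of that TLS-termination idea in nginx terms; the hostname, certificate paths and the Go service's port are assumptions:

    # Hypothetical sketch: nginx terminates TLS and proxies to a Go service on :8080.
    server {
        listen 443 ssl;
        server_name api.example.com;                    # assumed hostname

        ssl_certificate     /etc/nginx/ssl/api.crt;     # assumed cert/key
        ssl_certificate_key /etc/nginx/ssl/api.key;

        location / {
            proxy_pass       http://127.0.0.1:8080;     # Go http server listening here
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }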
Node.js and Golang do not really have their own web servers; these are just library packages that implement the HTTP protocol and open ports to provide services, much like Spring Web.
Nginx/IIS/Apache are true servers; a web server is just one component of them.
I think Spring is meant to cover the full range of application scenarios, including gateway, security, routing, packaging, runtime management and so on.
But when we have several different language platforms, we need nginx/Apache/Spring Gateway/Zuul or something similar to route between them.

Website with node.js, hosting architecture decision

I am planning to start my first website. It is a little HTML5+CSS+JS website with a backend running Node.js that serves data stored in MongoDB. I would like to know which one is the best solution, mostly regarding security:
Web hosting (SSL and Cloudflare) + a VPS serving on port 3000 (with SSL, Cloudflare, and Node.js holding the sensitive data: users, passwords and a local MongoDB).
Everything in the same VPS.
Any other approach you can suggest.
The thing is that in the first approach there are two elements in the architecture, so if someone wants to hack it I suppose it's more difficult. On the other hand, in the second approach, if the VPS is hacked then everything is hacked and they could get access to the passwords and the MongoDB database. I am quite obsessed with security as it is my first website, and I don't know what measures to take to protect my VPS (Node.js and MongoDB).
Furthermore, I would like to know, in terms of efficiency, which would be the best solution for, say, a 10MB website with 1,000 visits a day.
Regardless of how many actual servers you decide to deploy on, I'd strongly suggest not serving your site directly from Node.js. Instead, proxy it through a more robust HTTP server such as Apache, nginx or even lighttpd (see the sketch after this answer), for the very simple reason that the http module in Node.js was never meant to protect against worms, hacking attempts and various other malware.
I've written web servers from scratch myself and have noticed that, in general, you'll get your first hacking attempt within the first hour of putting your server online. You'll get around a dozen hacking attempts per day on the slowest days, and it goes up from there. These attempts are so common that most server software no longer logs them in access logs and simply blocks them.
From my own personal experience I estimate that around 5% to 10% of my bandwidth is consumed by failed hacking/infection attempts. That is when I'm not being actively attacked.
Security through obscurity is not good security. Especially since node's http module is not very obscure in the first place and someone is bound to find a hackable weakness one of these days.
Apart from security, you also waste fewer CPU cycles ignoring these hacking attempts in Apache or nginx than in Node.js, since you don't need to run any JavaScript code to handle them.
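A hedged sketch of that proxying advice for the Node.js app from the question (assuming it listens on port 3000; the hostname is a placeholder):

    # Hypothetical sketch: nginx in front of a Node.js app on port 3000,
    # so Node never faces raw internet traffic directly.
    server {
        listen 80;
        server_name www.example.com;                    # assumed hostname

        location / {
            proxy_pass         http://127.0.0.1:3000;   # Node/Express backend
            proxy_http_version 1.1;
            proxy_set_header   Host $host;
            proxy_set_header   X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }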
You can make the choice between the two architectures moot. Both architectures are hackable, and your data will be exposed.
If security is paramount, check out Mylar - it's a platform that protects data confidentiality even when an attacker gets full access to servers. Mylar stores only encrypted data on the server, and decrypts data only in users' browsers.
It runs on top of Meteor, which in turn runs on top of Node.js and uses MongoDB, so if your web app is small it should be easy to port the code. Meteor also stores passwords using bcrypt, one of the strongest password hashing algorithms available today.

G-Wan / Nginx / Squid / Varnish as HTTP Live Streaming Reverse Proxy

I'm planning to build a caching reverse proxy for HTTP Live Streaming (Apple HLS).
In my situation each segment file will be about 500-700KB. I have read a lot of articles reviewing the performance of popular web server software, but all of them test caching of small files. So does anybody have experience building a cache server for larger files (honestly, 700KB is not that large, I think)? Or is there a review article I missed that you can point me to?
I expect the answers are out there in review articles somewhere, but let me list my questions anyway.
If I increase the total number of segments, will this cause a performance decrease (since lookups take longer), and how serious would it be?
If I want to maximize throughput (let's say the 1Gbps I have), what server software and CPU should I choose? (This amounts to asking which server software can provide the highest throughput.)
As jeremy reminded me, caching time will really affect the hit rate and performance. For cached segments, should I set the caching time to the rotation time? (e.g. with segments 00-99.ts at 10s each, a .ts file should be replaced 990s after it was last updated, so the rotation time is 990s.) Or is there a better approach?
Thank you.
500-700KB files are still very small; I've had great success with both nginx and Varnish for this exact task.
You will want to make sure you have a fairly long expiration time on your .ts files (you want cache hits on these), and you want to set the expiration on your .m3u8 files to less than half of your segment length.
This is especially true if you are going to be using a CDN, since the CDN will (typically) honor the cache-control headers and you will want to limit the number of requests back to the origin.
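A hedged nginx sketch of that expiration split (long-lived .ts segments, short-lived playlists); the origin address and exact lifetimes are assumptions based on roughly 10-second segments:

    # Hypothetical sketch: different cache lifetimes for HLS segments and playlists.
    proxy_cache_path /var/cache/nginx/hls keys_zone=hls:10m max_size=10g inactive=1h;

    server {
        listen 80;

        location ~ \.ts$ {
            proxy_pass        http://127.0.0.1:8080;    # assumed origin/packager
            proxy_cache       hls;
            proxy_cache_valid 200 10m;                  # segments: cache for most of their lifetime
            expires           10m;
        }

        location ~ \.m3u8$ {
            proxy_pass        http://127.0.0.1:8080;
            proxy_cache       hls;
            proxy_cache_valid 200 4s;                   # playlists: under half the segment length
            expires           4s;
        }
    }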

Obfuscating server headers

I have a WSGI application running in PythonPaste. I've noticed that the default 'Server' header leaks a fair amount of information ("Server: PasteWSGIServer/0.5 Python/2.6").
My knee-jerk reaction is to change it... but I'm curious what others think.
Is there any utility in the server header, or benefit in removing it? Should I feel uncomfortable about giving away information on my infrastructure?
Thanks
Well "Security through Obscurity" is never a best practice; your equipment should be able to maintain integrity against an attacker that has extensive knowledge of your setup (barring passwords, console access, etc). Can't really stop a DDOS or something similar, but you shouldn't have to worry about people finding out you OS version, etc.
Still, no need to give away information for free. Fudging the headers may discourage some attackers, and, in cases like this where you're running an application that may have a known exploit crop up, there are significant benefits in not advertising that you're running it.
I say change it. Internally, you shouldn't see much benefit in leaving it alone, and externally you have a chance of seeing benefits if you change it.
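If the app ends up sitting behind nginx anyway (a pattern recommended throughout this thread), one hedged way to change what leaks out is at the proxy layer; the backend port is an assumption, and note that nginx already replaces the upstream's Server header with its own:

    # Hypothetical sketch: trim header leakage at the proxy layer.
    server {
        listen 80;
        server_tokens off;                              # "Server: nginx" without the version

        location / {
            proxy_pass        http://127.0.0.1:8080;    # assumed Paste/WSGI backend
            proxy_hide_header X-Powered-By;             # drop framework fingerprints, if any
            # The backend's "Server: PasteWSGIServer/..." header is not passed through;
            # clients see nginx's own Server header instead.
        }
    }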
Given the requests I find in my log files (like requests for IIS-specific bugs in Apache logs, and I'm sure IIS server logs show Apache-specific requests as well), there are many bots out there that don't care about any such header at all. I guess almost everything is brute force nowadays.
(And actually, as I've set up quite a few instances of Tomcat sitting behind IIS, for example, I guess I would not take the headers into account either if I were trying to hack my way into some server.)
And above all: when using free software, I find it kind of appropriate to give the makers some credit in statistics.
Masking your version number is an important security measure. You do not want to give the attacker any information about what software you are running. This feature is available in mod_security, the open-source web application firewall for Apache:
http://www.modsecurity.org/
Add this line to your mod_security configuration file:
SecServerSignature "IIS/6.0"

Is it good practice to hide web server information in HTTP headers?

This question is more security related than programming related, sorry if it shouldn't be here.
I'm currently developing a web application and I'm curious as to why most websites don't mind displaying their exact server configuration in HTTP headers, like versions of Apache and PHP, complete with "mod_perl, mod_python, ..." listings and so on.
From a security point of view, I'd prefer that it would be impossible to find out if I'm running PHP on Apache, ASP.NET on IIS or even Rails on Lighttpd.
Obviously "obscurity is not security" but should I be worried at all that visitors know what version of Apache and PHP my server is running ? Is it good practice or totally unnecessary to hide this information ?
Prevailing wisdom is to remove the server ID and the version; better yet, change them to another legitimate server ID and version - that way the attacker goes off trying IIS vulnerabilities against Apache or something like that. Might as well mislead the attacker.
But honestly, there are so many other clues to go by, I wonder about whether this is worth it. I suppose it could stop attackers using a search engine to find servers with known vulnerabilities.
(Personally, I don't bother on my HTTP server, but it's written in Java and much less vulnerable to the typical kinds of attack.)
I think you usually see those headers because the systems send them by default.
I routinely remove them as they provide no real value and could, as you suggested, reveal information about the server.
Hiding the information in the headers usually just slows down the lazy and ignorant villains. There are many ways to fingerprint a system.
Running nmap -O -sV against an IP will give you the OS and service versions with a fairly high degree of accuracy. The only extra info you're giving away by having your server advertise that information is which modules you have loaded.
It seems that some of the answers are missing an obvious advantage of turning off the headers.
Yes, you are all right; turning off the headers (and the status line present e.g. in directory listings) does not stop an attacker from finding out what software you use.
However, turning this information off prevents malware which uses google to look for vulnerable systems from finding you.
tldr: Don't use it as a (or even as THE) security-measure, but as a measure to drive away unwanted traffic.
I normally turn off Apache's long version information in the Server header with ServerTokens (e.g. ServerTokens Prod); it adds nothing useful.
One point which nobody has picked up on: it looks like better security to a prospective client, pen-testing company etc. if you're giving out less information from your web server.
So giving out less information boosts the perceived security (i.e. it shows you have actually thought about it and done something).
