Imagine there are scrapers crawling my website.
How can I ban them and still whitelist Googlebot?
I think I can find the IP ranges of Google's bots, and I am thinking of using Redis to store all of the day's accesses; if I see too many requests from the same IP in a short time, I ban that IP.
My stack is Ubuntu Server, Node.js and Express.
The main problem I see is that this detection sits behind Varnish, so the Varnish cache would have to be disabled. Any better ideas, or good thoughts?
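For reference, here is a rough sketch of the Redis idea (using the ioredis client is an assumption on my part; the window, threshold and 429 response are just placeholder choices):
var express = require('express');
var Redis = require('ioredis');

var app = express();
var redis = new Redis();

app.set('trust proxy', true);  // behind Varnish, so req.ip is taken from X-Forwarded-For

var WINDOW_SECONDS = 60;   // assumption: count hits per IP over a 1-minute window
var MAX_REQUESTS = 100;    // assumption: allow at most 100 hits per window

app.use(function (req, res, next) {
  var key = 'hits:' + req.ip;
  redis.incr(key, function (err, hits) {
    if (err) { return next(err); }
    if (hits === 1) { redis.expire(key, WINDOW_SECONDS); }  // start the window on the first hit
    if (hits > MAX_REQUESTS) {
      return res.status(429).send('Too Many Requests');     // "ban" for the rest of the window
    }
    next();
  });
});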
You can use a Varnish ACL [1]. It will possibly be a bit harder to maintain than in Apache, but it will surely work:
acl bad_boys {
    "192.0.2.0"/24;    // Your evil range
    "198.51.100.66";   // Another evil IP
}
// ...
sub vcl_recv {
    if (client.ip ~ bad_boys) {
        error 403 "Forbidden";
    }
    // ...
}
// ...
You can also whitelist, using the user agent or other techniques to make sure that what you block isn't Googlebot... but I would defend myself in Varnish rather than in Apache.
[1] https://www.varnish-cache.org/docs/3.0/reference/vcl.html#acls
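If you end up doing the whitelisting in the Node app instead of Varnish, the usual way to confirm a real Googlebot is a reverse DNS lookup followed by a forward confirmation, rather than maintaining IP ranges by hand. A rough sketch; the helper name is mine:
var dns = require('dns');

// Hypothetical helper: verify that an IP claiming to be Googlebot really is one.
function isRealGooglebot(ip, callback) {
  dns.reverse(ip, function (err, hostnames) {
    if (err || !hostnames || !hostnames.length) { return callback(false); }
    var host = hostnames[0];
    // Genuine Googlebot hosts end in googlebot.com or google.com.
    if (!/\.googlebot\.com$|\.google\.com$/.test(host)) { return callback(false); }
    // Forward-confirm: the hostname must resolve back to the original IP.
    dns.lookup(host, function (err2, address) {
      callback(!err2 && address === ip);
    });
  });
}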
You could stop the crawler using robots.txt:
User-agent: BadCrawler
Disallow: /
This solution works if the crawler follows the robots.txt specification.
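If you'd rather not manage a static file, a minimal sketch of serving that robots.txt from the Express app in your stack (the route body is just the example above):
// Assumes `app` is your Express application.
app.get('/robots.txt', function (req, res) {
  res.type('text/plain');
  res.send('User-agent: BadCrawler\nDisallow: /\n');
});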
I have a Meteor app that I'm whitelisting to just a specific IP.
So something like:
const handleRoute = (req, res, next) => {
  if (req.headers['x-forwarded-for'] === WHITELISTED_IP) {
    next();
  } else {
    res.writeHead(404);
    res.end();
  }
};
This works, and you get a 404 page.
However, this can let an attacker know that the site at least exists. I'd like to obfuscate that further if possible.
For example, if you go to some obscure site that doesn't exist, you'll probably see a splash page from your ISP. I'm guessing this is something the ISP puts in place when the DNS lookup fails.
I'm wondering if it's possible to still show something like that. This would be using the standard Node HTTP req/res API.
Thanks!
No, that's not possible. Once the control flow reaches your Node application, an attacker will know that it exists. They will be able to tell the difference between a page that is rendered by the browser on failure to look up a domain name in DNS, and a page you return to them. Besides, they won't be using browsers to investigate targets, so they will see quite a bit more than what a user in a browser would.
I think your best bet would be to copy and paste one of those annoying domain parking pages that web hosts put on a domain that has been purchased but isn't hosting a page yet. Ideally you would use the parking page of the registrar you used to acquire your domain, because it will be the most believable. And of course, try to replicate the entire response (including headers), not just the HTTP body. Unlike the idea of serving a fake "can't resolve domain" page, this one should be entirely possible.
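A hedged sketch of that idea with the plain Node HTTP API, assuming you have saved a local copy of the registrar's parking page as parking.html; the whitelisted IP and the Server header value are placeholders:
var http = require('http');
var fs = require('fs');

var WHITELISTED_IP = '203.0.113.10';                  // placeholder
var parkingPage = fs.readFileSync('parking.html');    // local copy of the registrar's parking page

http.createServer(function (req, res) {
  if (req.headers['x-forwarded-for'] === WHITELISTED_IP) {
    // Whitelisted visitor: hand the request to the real app here.
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    return res.end('real app response');
  }
  // Everyone else gets the parking page with deliberately generic headers.
  res.writeHead(200, {
    'Server': 'Apache',
    'Content-Type': 'text/html',
    'Content-Length': parkingPage.length
  });
  res.end(parkingPage);
}).listen(80);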
Somebody tried to get access to my TYPO3 backend. I already have the IP address of the attacker and want to block it from the backend.
I already tried to block the IP with .htaccess, but this doesn't work. I think the rules are overwritten by something else in the .htaccess file which I couldn't figure out yet.
A captcha is not a suitable solution at the moment.
Are there any good extensions for blocking IP addresses, or is there another way to avoid these brute-force attacks?
If you are really concerned about somebody being able to successfully gain access to the system, I suggest going the "whitelist" path instead of blacklisting single IPs.
TYPO3 has a built-in feature to block backend access for ALL IPs except some whitelisted ones.
To do this, just add the following to AdditionalConfiguration.php, putting in your own IP and the IPs (or subnets) of other users too.
$GLOBALS['TYPO3_CONF_VARS']['BE']['IPmaskList'] = 'x.x.x.x,y.y.y.*,z.z.*.*';
Other than that, just make sure you take the basic steps to make your backend more secure:
1) Force SSL for the backend:
$GLOBALS['TYPO3_CONF_VARS']['BE']['lockSSL'] = 2;
2) Implement a secure password policy for the backend users by using e.g. EXT:be_secure_pw
3) Secure the session cookies with the Secure and HttpOnly attributes:
$GLOBALS['TYPO3_CONF_VARS']['SYS']['cookieHttpOnly'] = 1;
$GLOBALS['TYPO3_CONF_VARS']['SYS']['cookieSecure'] = 1;
4) And last but not least: make sure you are using the most recent release of your TYPO3 version line, ideally a maintained LTS version.
Ideally you should block requests before PHP/MySQL is involved at all, so .htaccess is the correct way in my eyes. If it does not work, you should ask your hoster.
It sounds like you want to block the IP of the attacker and put measures in place to block known bad IPs. One of the main issues with blocking the attacker's IP is that it's fairly easy for an attacker to switch to a new IP address and launch a new attack.
There are services that provide lists of known bad IPs if you want to implement your own firewall.
Alternatively, you can place your site behind a solution such as Cloudflare, which has the ability to block IPs or entire countries. I know of businesses that block traffic from China and Russia, since they identified that most of their attacks came from those countries.
Background:
Our network structure brings all traffic into a Varnish installation, which then routes traffic to one of 5 different web servers, based on rules that a previous administrator set up. I don't have much experience with Varnish.
Last night we were being bombarded by requests to a specific file. This file is one that we limit to a specific set of servers, and it has a direct link to our master database, due to reasons. Obviously, this wasn't optimal, and our site was hit pretty hard because of it. What I attempted to do, and failed at, was to write a block of code in the Varnish VCL that would return a 500 response for every request to that file, which I could then comment out after the attack period ended.
Question:
What would that syntax be? I've done my googling, but at this point I think it's the fact that I don't know enough about Varnish to be able to word my search properly, so I'm not finding the information that I need.
You can define your own vcl_recv, prior to any other vcl_recv in your configuration, reload Varnish, and you should get the behaviour you're looking for.
sub vcl_recv {
    if (req.url ~ "^/path/to/file(\?.*)?$") {
        return (synth(500, "Internal Server Error"));
    }
}
I am working on a node.js app. It presents the art of a friend of mine. Besides some general information and a lot of pictures, it also includes a contact form for people who want to contact him about his work. I want to encrypt the contact page to protect the personal data that is sent through the form, but keep the rest of the site unencrypted to reduce its loading time.
I already managed to create an HTTPS server which encrypts my entire website. But I just want to have an encrypted connection for the contact page.
Does anyone know how to implement that?
One solution I thought of would be to create two node.js servers: one HTTP server and one HTTPS server, then run the main page on the HTTP server and the contact page on the HTTPS server. The link to the contact page would then include https, something like "https://www.domain.com/contact".
But somehow that doesn't feel like the right way so I am hoping that someone has a better solution.
Thank you for your help!
The notion that HTTPS is less performant than plain HTTP is a myth that needs to die.
On modern systems, the overhead of SSL/TLS is < 1% of CPU. In fact, with optimizations like SPDY, you might find that a TLS-secured connection actually performs better than plain HTTP. I'd be willing to wager that in reality, you'd be unable to notice any negative difference in performance.
Another common misconception is that browsers do not cache resources delivered over HTTPS. This is false. All browsers cache HTTPS content by default.
Google now uses HTTPS as a ranking signal, so all-HTTPS sites get a boost over plain HTTP.
The one and only thing you lose by going all HTTPS is intermediate caching proxies, but to be honest I wouldn't lose any sleep over this unless you know your site's users are on a network with a relatively slow link and a local caching proxy. Otherwise, this isn't really a concern.
The overhead associated with TLS is negligible. There is no reason to not just serve your entire site over HTTPS.
Once you're serving all of your content over HTTPS, you should still run an HTTP server that issues permanent redirects to the HTTPS site. The easiest thing to do is to run a separate Express app:
var express = require('express');

var http = express();
http.use(function(req, res) {
  // Permanently redirect every plain-HTTP request to the HTTPS site.
  res.redirect(301, 'https://mydomain.com' + req.url);
});
http.listen(80);
Your HTTPS server should also enable HSTS (HTTP Strict Transport Security):
app.use(function(req, res, next) {
  // Tell browsers to use HTTPS only for this domain for the next year.
  res.set('Strict-Transport-Security', 'max-age=31536000');
  next();
});
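For completeness, the HTTPS app itself would be bound with something like the following; the certificate paths are placeholders, and app is the Express app carrying the HSTS middleware above:
var https = require('https');
var fs = require('fs');

https.createServer({
  key: fs.readFileSync('/path/to/privkey.pem'),     // placeholder paths to your key/cert
  cert: fs.readFileSync('/path/to/fullchain.pem')
}, app).listen(443);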
I want to set up a maintenance page for my site (that visitors would see), but I also want to allow devs a way to still access the site and test things (though this last part, access for devs, is a request from a project manager... I'm not sure it's the best approach, since it seems like we should be testing on a staging server).
The site is Node.js based and runs behind an nginx server via a proxy_pass.
The way I would have done this under Apache is to permit a GET param to be passed in that would allow a dev to circumvent being redirected to the maintenance page. I can't seem to figure out how to do this under nginx with a proxy_pass.
I was able to get everything to redirect to the maintenance page, but images and CSS were broken and would not load. Additionally, I could not implement a GET param override.
Any suggestions on how to approach this? The various tutorials around the web and comments here on SO don't seem to work, and I suspect it has to do with the proxy_pass usage. Not certain.
*edit: I saw this post on SO, but my attempts to implement it ended up with the visitor being redirected to "/maintenance" and getting a server error instead of my maintenance page. Also, it doesn't address overriding the redirect.
This is going to be a question of how you decide to filter users. If you can filter access on IP address, cookie, or some other aspect of the request, then it's possible to use an if directive to redirect/rewrite all other users to the maintenance page. You mention using a GET parameter -- this condition would be an example of that (using $arg_PARAMETER as documented here):
server {
    if ($arg_secret != "123456") {
        rewrite ^(.*)$ /maintenance$1 break;
    }

    location /maintenance {
        # root directive etc
    }

    location / {
        # proxy_pass directive etc
    }
}
Or you could invert the condition and configuration, and only proxy_pass when the condition is true. However, `if` directives can be problematic (see http://wiki.nginx.org/IfIsEvil), so test before deploying.
As for the issue you've found with images and CSS not loading, you'll need to ensure that these maintenance resources continue to be served, because they were likely being caught by the redirection rules as well. An example location directive could look like this:
location ~ /(.*\.css|.*\.jpg) {
    root /var/www/maintenance;
}
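If nginx keeps fighting you, a rough equivalent is also possible at the Express level; the flag, secret and file path below are my own placeholders, not part of the nginx answer:
var MAINTENANCE = true;     // flip to false when maintenance is over
var SECRET = '123456';      // same idea as the nginx $arg_secret check above

app.use(function (req, res, next) {
  if (!MAINTENANCE || req.query.secret === SECRET) {
    return next();  // devs carrying ?secret=... get the real site
  }
  res.status(503).sendFile('/var/www/maintenance/index.html');
});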