Stopping a bot attack with a server-side solution (without a CAPTCHA or JavaScript) - security

I inherited some code that was recently attacked by repeated remote form submissions.
Initially I implemented some protection by setting a unique session auth token (not the session id). While I realize this specific attack is not CSRF, I adapted my solution from these posts (albeit dated).
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%28CSRF%29
http://tyleregeto.com/a-guide-to-nonce
http://shiflett.org/articles/cross-site-request-forgeries
I've also read existing posts on SO, such as Practical non-image based CAPTCHA approaches?
However, the attacker now requests the form page first, starting a valid session, and then passes the session cookie in the following POST request, and therefore has a valid session token. So that was a fail on my part.
I need to put some additional preventative measures in place. I'd like to avoid CAPTCHA (due to the poor user experience) and JavaScript solutions if possible. I've also considered referrer checks (can be faked), honeypots (hidden fields), as well as rate limiting (which an attacker can get around by throttling their requests). This attacker is persistent.
With that said, what would be a more robust solution?

If a human is specifically attacking your page, then you need to find what makes this attacker different from a regular user.
If they spam you with certain URLs or text or the like, block those after they are submitted.
You can also quarantine submissions: don't publish them for, say, 5 minutes. If within those 5 minutes you receive another submission to the same form from the same IP, discard both posts and block the IP.
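A minimal sketch of that quarantine, assuming a PDO connection $db and hypothetical pending_posts and blocked_ips tables:
// On submission: hold the post instead of publishing it immediately.
$ip     = $_SERVER['REMOTE_ADDR'];
$formId = 'contact';
$stmt = $db->prepare('SELECT COUNT(*) FROM pending_posts
                      WHERE ip = ? AND form_id = ? AND submitted_at > NOW() - INTERVAL 5 MINUTE');
$stmt->execute([$ip, $formId]);
if ($stmt->fetchColumn() > 0) {
    // A second submission within 5 minutes: discard both posts and block the IP.
    $db->prepare('DELETE FROM pending_posts WHERE ip = ? AND form_id = ?')->execute([$ip, $formId]);
    $db->prepare('INSERT INTO blocked_ips (ip) VALUES (?)')->execute([$ip]);
    exit;
}
$db->prepare('INSERT INTO pending_posts (ip, form_id, body, submitted_at) VALUES (?, ?, ?, NOW())')
   ->execute([$ip, $formId, $_POST['body'] ?? '']);
// A cron job would then publish pending posts once they are older than 5 minutes.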
A CAPTCHA is only good if you use a good one, because many custom home-made CAPTCHAs are now recognized automatically by specially crafted software.
To summarize: your problem needs not just technical but also social solutions, aimed at neutralizing the botmaster rather than merely preventing the bot from posting.

CAPTCHAs were invented for this exact reason: there is NO WAY to differentiate 100% reliably between a human and a bot.
You can throttle your users by incrementing a server-side counter, and when it reaches X attempts you can treat it as a bot attack and lock the client out. Then, once some time has elapsed (save the time of the attack as well), allow entry again.
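A rough sketch of that counter, kept in the session here for brevity (a shared store such as the database would be needed to catch attackers who drop their cookies); the limit and cooldown values are arbitrary:
session_start();

$limit    = 10;       // X attempts before we assume a bot
$cooldown = 15 * 60;  // lockout period in seconds

if (isset($_SESSION['locked_at']) && time() - $_SESSION['locked_at'] < $cooldown) {
    http_response_code(429);
    exit('Too many submissions. Try again later.');
}

$_SESSION['attempts'] = ($_SESSION['attempts'] ?? 0) + 1;
if ($_SESSION['attempts'] >= $limit) {
    $_SESSION['locked_at'] = time();   // remember when the attack happened
    http_response_code(429);
    exit('Too many submissions. Try again later.');
}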

I've thought a little about this myself.
I had an idea to extend the session auth token to also store a set of randomized form field names. So instead of
<input name="title" ... >
you'd get
<input name="aZ5KlMsle2" ... >
and then additionally add a bunch of trap fields, which are hidden via CSS.
If any of the traps are filled out, then it was not a normal user, but a bot examining your HTML source...
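A sketch of how the randomized names and trap fields could be wired up; the field names and the CSS class are made up for illustration:
session_start();

// When rendering the form: map real field names to random ones and generate some traps.
$_SESSION['field_map'] = ['title' => bin2hex(random_bytes(5)), 'body' => bin2hex(random_bytes(5))];
$_SESSION['traps']     = [bin2hex(random_bytes(5)), bin2hex(random_bytes(5))];

echo '<input name="' . $_SESSION['field_map']['title'] . '">';
foreach ($_SESSION['traps'] as $trap) {
    echo '<input name="' . $trap . '" class="offscreen">';   // hidden via CSS
}

// When handling the POST: reject if any trap was filled, then translate names back.
foreach ($_SESSION['traps'] as $trap) {
    if (!empty($_POST[$trap])) {
        exit;   // a field no human could see was filled in
    }
}
$title = $_POST[$_SESSION['field_map']['title']] ?? '';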

How about a hidden form field? If it gets filled in automatically by the bot, you accept the request but silently discard it.

Related

How can I keep spambots from getting past multiple web security measures?

I am trying to stop spam accounts from being created on my website. I run a website that has approximately 50-80k pageviews per month. It's a social media website. Users sign up and communicate with one another for free. We've been battling spam as of late even though we have implemented multiple security measures to counteract bots. I'd like to get any further suggestions, tips, and tricks that I can try, and also some help to see if I can identify whether these are people coming from clickfarms, etc. (i.e. real people or computers).
Problem:
The signup form is being completed and users are posting spam in their profile information. The spammer signs up for the website by completing the signup form, activates their account via an email account, logs into their account, and then completes their profile, putting spam in the description box with a link/URL to the website they are advertising (everything from ##$%S enlargement to random blogs, to web developer websites, etc.). If there were just one link they were posting we could detect it and ban them, but there isn't -- they are coming from multiple IPs, posting various links, using multiple email providers for activating the accounts, registering with information from multiple countries, and creating about 10-30 accounts per day.
Before implementing many security measures we were getting more like 100-200 fake accounts per day, but now we're down to 10-30... so we've seen some improvement, but the issue is still annoying me. So I'm half thinking now that the security measures are helping quite a bit, but that this is possibly humans still targeting our website and perhaps getting paid per signup, or something similar to that. Even if so, is there any way I could confirm they are humans versus bots?
Security measures:
I won't get into all of the details here (for security reasons), but I'll just indicate what we've done to counteract the spambots:
Created honeypots at various areas of our website which automatically ban based on IP
IP banning - based on known botter/spammer ip addresses
Duration detection from signup form pageload to form submission -- if it takes less than 5 seconds to complete our signup form, we treat you as a bot and prevent the signup (a minimal sketch of this check appears after this list)
Hidden checkbox in signup form -- there is a hidden checkbox in the signup form that is invisible to regular users (if a bot checks it we are automatically detecting and preventing the signup)
Google reCAPTCHA - We've enabled Google reCAPTCHA on our signup form as well
Email activation link - We send our users an activation email with a link that they have to click to sign up -- they are not able to sign into our website until they've activated their account.
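As an illustration of the form-timing check mentioned above, one common approach is to stamp the session when the form is rendered and compare on submit; the 5-second threshold matches the one described, everything else is illustrative:
session_start();

// When the signup form is rendered:
$_SESSION['signup_form_rendered_at'] = time();

// When the signup form is submitted:
$renderedAt = $_SESSION['signup_form_rendered_at'] ?? 0;
if ($renderedAt === 0 || time() - $renderedAt < 5) {
    // Form was never requested, or was completed in under 5 seconds: treat as a bot.
    http_response_code(403);
    exit;
}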
Future actions include:
Detecting what users are posting in their descriptions in their profiles and banning based on that -- string detection for banned words, etc.
Any other suggestions or tips or tricks? In all honesty, if spam bots are getting through all of those security measures above --
do you think they are just that intelligent?
do you think we're being targeted?
Also, any way I can determine if they are bots or real humans? Suggestions?
This is a perennial problem; over the years I've found that as I add more anti-spam measures, the spammers continually get better at circumventing my measures.
I recommend doing an analysis of your spam to figure out how you can detect it. The spam itself contains the key to how to outsmart it. Look at the patterns, the structure, and decide what information is most useful and how the easiest way is to filter it out. Your spam detection doesn't need to be perfect, but generally, you want to get as much as possible, while getting as few false positives as possible.
Also, to answer your question: even if you make your bot detection perfect, there will always be humans submitting spam. And humans are tough to outsmart, and you may always need some manual attention to deal with them.
You are already implementing a lot of measures. Here are some more I would suggest:
When a signup form is generated, put a hidden field with a unique hash generated from the user's browser info, including the user's IP, HTTP user agent, and the date. Then, when the form is submitted, check the hash. This one method eliminated a surprising amount of spam.
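A sketch of that hashed hidden field; the secret key and the field name are placeholders:
// When generating the form:
$secret = 'replace-with-a-long-random-secret';
$data   = $_SERVER['REMOTE_ADDR'] . '|' . $_SERVER['HTTP_USER_AGENT'] . '|' . date('Y-m-d');
echo '<input type="hidden" name="form_sig" value="' . hash_hmac('sha256', $data, $secret) . '">';

// When the form is submitted: recompute and compare.
$expected = hash_hmac('sha256',
    $_SERVER['REMOTE_ADDR'] . '|' . $_SERVER['HTTP_USER_AGENT'] . '|' . date('Y-m-d'), $secret);
if (!hash_equals($expected, $_POST['form_sig'] ?? '')) {
    exit;   // the hash doesn't match the client submitting the form
}
Note that including the calendar date means a form rendered just before midnight and submitted just after will fail; a rolling window like the one in the next suggestion avoids that.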
If you want to take the previous method even farther, use a custom, time-sensitive hash in the URL of your contact form, and have the link to this form be dynamically generated. This way, if a spammer stores the form's URL, it won't work, but the link will work for every legitimate user of the site.
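And a sketch of the time-sensitive link, using an hourly window that accepts the current or previous hour so a just-expired link doesn't break for a legitimate user:
$secret = 'replace-with-a-long-random-secret';

// When generating the page that links to the contact form:
$token = hash_hmac('sha256', (string) floor(time() / 3600), $secret);
echo '<a href="/contact.php?t=' . $token . '">Contact us</a>';

// In contact.php: accept the current or the previous hour's token.
$hour  = floor(time() / 3600);
$valid = [hash_hmac('sha256', (string) $hour, $secret),
          hash_hmac('sha256', (string) ($hour - 1), $secret)];
if (!in_array($_GET['t'] ?? '', $valid, true)) {
    http_response_code(404);   // a stored or stale URL won't work
    exit;
}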
Make it so newly created, non-trusted users cannot display any public profile information, such as URLs or even free text. With a site as small as yours you could require manual approval of each user, and if your userbase got bigger, you could use an automated reputation system, a lot like Stack Overflow and the other Stack Exchange sites use. This removes the incentive for spam. Also, I found an overwhelming majority of spammers only ever logged onto the site once. If you wait to do the manual approval of users until they have logged on twice, or have even returned to the site on another day (using a persistent cookie), you will filter out the vast majority of spammers and you will only have to do a small amount of manual approval work. Then have the system delete the unvalidated/inactive accounts after a certain amount of time.
Check for certain keywords or structures of info. I found an overwhelming majority of my spammers would use certain words or phrases that were never used by my legitimate users. Another one was entering a phone number in their profile, a common pattern among spammers, that no legitimate user ever did. Also look for signs of foul play like XSS attacks. A huge portion of spammers will, at some point, submit something that has a ton of HTML tags in it; you can either use the tags themselves to filter them out, or you can do something like stripping the HTML tags and then comparing string lengths, banning them if the difference is more than a small amount (i.e. allow someone to do something simple like a few <em></em> or <strong></strong> tags). Usually, if there are HTML tags in the entry, there's a ton of them. Also look for material with weird encodings or characters that don't make sense. This is often an attempt at a sophisticated SQL injection attack, XSS, or another type of hacking attempt.
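The tag-stripping length comparison could be as simple as the following; the 50-character allowance is arbitrary:
$raw      = $_POST['description'] ?? '';
$stripped = strip_tags($raw);

// If stripping HTML removed more than a small amount of text, the entry is
// mostly markup, which almost never comes from a legitimate user.
if (strlen($raw) - strlen($stripped) > 50) {
    // flag or ban the account
}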
Use external IP blacklists. AbuseIPDB is one example; it has an API that you can use to check new IPs before storing them in your temporary database. Their free plan allows checking up to 1000 IPs a day, and you can pay for more than that. It won't catch all the manual spam, but I find they catch a ton of the automated spam.
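A sketch of checking a new IP against the service; the endpoint, header, and response field below are taken from AbuseIPDB's v2 API as I understand it, so verify them against the current documentation before relying on this:
// Assumes you have an AbuseIPDB API key.
$ip = $_SERVER['REMOTE_ADDR'];
$ch = curl_init('https://api.abuseipdb.com/api/v2/check?ipAddress=' . urlencode($ip));
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => ['Key: YOUR_API_KEY', 'Accept: application/json'],
]);
$response = json_decode(curl_exec($ch), true);
curl_close($ch);

if (($response['data']['abuseConfidenceScore'] ?? 0) > 75) {
    // Known abusive IP: block the signup or require extra verification.
}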
Are they targeting you? Yes. They are targeting everyone. But any site with 50k+ pageviews a month has high enough volume to be an attractive target. The higher traffic you get, the more attractive a target you will be. Even some of my tiny sites have been targeted with surprisingly sophisticated attacks these days. Everyone needs to be on guard.
Good luck. I wish this weren't so much of a problem, but it is.

Safe Way To Register Users On Website

I have my first website. One of the first tasks I have encountered is creating a registration page to register a new user. I have concerns about "safe" ways to do this. Essentially a registration page is a window to do database inserts into a user table. I'm concerned about script kiddies getting a hold of my registration form and mercilessly pounding the database with false inserts.
A couple things I've researched and struggled with:
CAPTCHAs: I really wanted to be able to create my site without these, as from my research it sounds like they're about 20% effective at turning away bots while being guaranteed to anger real human users. If at all possible I'd like CAPTCHAs to be either non-existent on my site or to appear dynamically if it seems I'm being scripted against.
IP Spoofing - I toyed with the idea of checking based on IP so that if I get a lot of successive form submissions from the same IP I could give them a captcha. However, it is my understanding that it is trivial to spoof IP addresses and that checking for repeat submissions from someone who is appropriately spoofing would be ineffective.
Registration Confirmation via Email Link - You see this a lot on forums, etc. After the user registers you send them a confirmation link with a unique token to verify they have a real email box and haven't put in a fake one (or perhaps genuinely mistyped). While this may add some value around validating that a user is "real", you have already inserted into your user table, and thus script kiddies prevail at filling the database with useless information.
How do site developers prevent script kiddies from spamming their database with tons of useless users? If the assumptions I've made above are correct I don't see an effective way to prevent it. I have toyed with other ideas that after I think about them are all crap. The search terms I'm currently using aren't turning up many results so I apologize if this is an overplayed topic.
I don't totally agree with eliminating the CAPTCHA part; however, you can trap some bots in a honeypot. Make an input field which is invisible to the end user but still exists for bots. If the submitted form contains a value in the fake field, then ignore it; real users can't see invisible fields! :)
For example:
// jQuery: hide the decoy field so real users never see or fill it
$("#username").hide();
// HTML: the real field plus the decoy (honeypot) field
<input type="text" name="real-username">
<input type="text" name="username" id="username">
// PHP: if the decoy field came back filled in, it was a bot
if (!empty($_REQUEST['username'])) {
    die('Oops!');
}
Just remember that you need to ignore the username field; your real username is in real-username.
I have found that confirmation emails, combined with a clean-up task (that deletes all registrations older than x days which are not confirmed), will help. You won't be able to prevent all spam registrations, but a little bit of work in the DB will help keep the table small.
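The clean-up task can be a single scheduled query; the table and column names below are placeholders, and $db is assumed to be a PDO connection:
// Run daily from cron: remove accounts that never confirmed within 7 days.
$db->prepare('DELETE FROM users
              WHERE confirmed = 0
                AND created_at < NOW() - INTERVAL 7 DAY')
   ->execute();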

Voting system hack proof

I'm implementing a voting system like Stackoverflow's. How can I implement this so it is hack proof?
I've got some PHP that does database work according to the ajax request sent after the javascript parses it. Would doing a query to check the current vote state of a user be enough to avoid unauthorised votes?
It is definitely possible to implement a pretty reliable solution, but it must be done server-side.
The basic rule of security: you don't trust client data.
Move all your checks to PHP and make your JavaScript as dumb as
$(".vote").click(function(e) {
$.post('/vote.php', vote_data, function(result) {
// update UI according to returned result
}
}
It's a common thing, however, to still do checks on the client, but as a way to improve usability (marking required form fields that weren't filled) or to reduce server load (by not sending obviously incomplete data). These client checks are for the user's comfort, not for your security.
Answering to your updated question:
If you store full log of when which user voted for which question, then yes, it's pretty easy to prevent multiple voting (when user can vote for the same thing several times). Assuming, of course, that anonymous votes are not allowed.
But if you have a popular site, this log can get pretty big and be a problem. Some systems try to get away by disabling voting on old articles (and removing corresponding log entries).
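A sketch of that server-side check in vote.php, assuming a PDO connection $db and a votes table with a unique (user_id, post_id) key so repeat votes are rejected even under race conditions:
// vote.php - all validation happens here, never in the browser.
session_start();
if (empty($_SESSION['user_id'])) {
    http_response_code(403);   // anonymous votes are not allowed
    exit;
}

try {
    // The UNIQUE KEY (user_id, post_id) on the votes table makes repeats fail.
    $stmt = $db->prepare('INSERT INTO votes (user_id, post_id, value) VALUES (?, ?, ?)');
    $stmt->execute([
        $_SESSION['user_id'],
        (int) ($_POST['post_id'] ?? 0),
        ($_POST['value'] ?? '') === 'up' ? 1 : -1,
    ]);
    echo json_encode(['ok' => true]);
} catch (PDOException $e) {
    echo json_encode(['ok' => false, 'error' => 'already voted']);
}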
What if someone intentionally tries to hack me?
There are different types of attacks a malicious user can perform.
CSRF (cross-site request forgery)
The article lists some methods for preventing the attack. Modern Ruby on Rails has built-in protection, enabled by default. I don't know how it is in the PHP world.
Clickjacking
This attack tricks users into clicking on something that isn't what they think it is. For example, they may click "Play video", but the site will intercept this click and post on the user's wall instead.
There are some articles on the Web as well.
Wiki on clickjacking
5 ways to prevent clickjacking
Javascript to prevent clickjacking
NOTE: THIS IS AN ANSWER TO THE ORIGINAL QUESTION. Don't downvote it just because the OP radically changed his question.
It's a huge error even just thinking of relying on browser-side components to enforce application logic. Javascript should be used, in untrusted environments, exclusively for presentation purposes.
All application logic should be implemented, validated and enforced server-side.

Possible solutions for keeping track of anonymous users

I'm currently developing a web application that has one feature which allows input from anonymous users (no authorization required). I realize that this may prove to have security risks, such as repeated arbitrary inputs (e.g. spam) or users posting malicious content. So to remedy this I'm trying to create a system that keeps track of what each anonymous user has posted.
So far all I can think of is tracking by IP, but it seems as though it may not be viable due to dynamic IPs, are there any other solutions for anonymous user tracking?
I would recommend requiring them to answer a captcha before posting, or after an unusual number of posts from a single ip address.
"A CAPTCHA is a program that protects websites against bots by generating and grading tests >that humans can pass but current computer programs cannot. For example, humans can read >distorted text as the one shown below, but current computer programs can't"
That way the spammers are actual humans. That will slow the firehose to a level where you can weed out any that do get through.
http://www.captcha.net/
There are two main ways: client-side and server-side. Tracking the IP is all that I can think of server-side; client-side there are more accurate options, but they are all under the user's control, and he can re-anonymise himself (it's his machine, after all): cookies and local storage come to mind.
Drop a cookie with an ID on it. Sure, cookies can be deleted, but this at least gives you something.
My suggestion is:
Use cookies for tracking of user identity. As you yourself have said, due to dynamic IP addresses, you can't reliably use them for tracking user identity.
To detect and curb spam, use IP + user browser agent combination.
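A sketch combining both suggestions: a long-lived random cookie for identity plus an IP + user-agent fingerprint for spam detection (all names are illustrative):
// Identity: a random ID in a long-lived cookie.
if (empty($_COOKIE['anon_id'])) {
    $anonId = bin2hex(random_bytes(16));
    setcookie('anon_id', $anonId, time() + 60 * 60 * 24 * 365, '/');
} else {
    $anonId = $_COOKIE['anon_id'];
}

// Spam detection: a coarse fingerprint that survives cookie deletion.
$fingerprint = hash('sha256', $_SERVER['REMOTE_ADDR'] . '|' . ($_SERVER['HTTP_USER_AGENT'] ?? ''));

// Store both alongside each anonymous post for later rate limiting or review.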

Programmatic Bot Detection

I need to write some code to analyze whether or not a given user on our site is a bot. If it's a bot, we'll take some specific action. Looking at the user agent only works for friendly bots, since a bot can send any user agent it wants. I'm after the behaviors of unfriendly bots. Various ideas I've had so far are:
If you don't have a browser ID
If you don't have a session ID
Unable to write a cookie
Obviously, there are some cases where a legitimate user will look like a bot, but that's ok. Are there other programmatic ways to detect a bot, or either detect something that looks like a bot?
User agents can be faked. Captchas have been cracked. Valid cookies can be sent back to your server with page requests. Legitimate programs, such as Adobe Acrobat Pro can go in and download your web site in one session. Users can disable JavaScript. Since there is no standard measure of "normal" user behaviour, it cannot be differentiated from a bot.
In other words: it can't be done short of pulling the user into some form of interactive chat and hope they pass the Turing Test, then again, they could be a really good bot too.
Clarify why you want to exclude bots, and how tolerant you are of mis-classification.
That is, do you have to exclude every single bot at the expense of treating real users like bots? Or is it okay if bots crawl your site as long as they don't have a performance impact?
The only way to exclude all bots is to shut down your web site. A malicious user can distribute their bot to enough machines that you would not be able to distinguish their traffic from real users. Tricks like JavaScript and CSS will not stop a determined attacker.
If a "happy medium" is satisfactory, one trick that might be helpful is to hide links with CSS so that they are not visible to users in a browser, but are still in the HTML. Any agent that follows one of these "poison" links is a bot.
A simple test is JavaScript:
<script type="text/javascript">
document.write('<img src="/not-a-bot.' + 'php" style="display: none;">');
</script>
The not-a-bot.php can add something into the session to flag that the user is not a bot, then return a single pixel gif.
The URL is broken up to disguise it from the bot.
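The server side of that could be as small as the following; the file name matches the snippet above, and the session flag name is made up:
<?php
// not-a-bot.php - only clients that execute JavaScript and fetch images reach this.
session_start();
$_SESSION['not_a_bot'] = true;

// Return a 1x1 transparent GIF.
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');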
Here's an idea:
Most bots don't download CSS, JavaScript, and images. They just parse the HTML.
If you could keep track in a user's session whether or not they download all of the above, e.g. by routing all of the download requests through a script that logs the attempts, then you could quickly identify users that only download the raw html (very few normal users will do this).
You say that it is okay that some users appear as bots, therefore,
Most bots don't run JavaScript. Use JavaScript to make an Ajax-style call to the server that identifies this IP address as non-bot. Store that for a set period of time to identify future connections from this IP as good clients and to prevent further wasteful JavaScript calls.
For each session on the server you can determine if the user was at any point clicking or typing too fast. After a given number of repeats, set the "isRobot" flag to true and conserve resources within that session. Normally you don't tell the user that he's been robot-detected, since he'd just start a new session in that case.
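That per-session rate check could look roughly like this; the window and threshold are arbitrary:
session_start();

// Keep timestamps of the most recent requests in the session.
$_SESSION['hits'][] = microtime(true);
$_SESSION['hits']   = array_filter($_SESSION['hits'], function ($t) {
    return $t > microtime(true) - 10;
});

// More than 20 requests in 10 seconds is faster than any human clicks or types.
if (count($_SESSION['hits']) > 20) {
    $_SESSION['isRobot'] = true;   // quietly degrade service; don't tell the client
}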
Well, this is really for a particular page of the site. We don't want a bot submitting the form because it messes up tracking. Honestly, the friendly bots, Google, Yahoo, etc. aren't a problem, as they don't typically fill out the form to begin with. If we suspected someone of being a bot, we might show them a CAPTCHA image or something like that... If they passed, they're not a bot and the form submits...
I've heard things like putting the form in Flash, or making the submit depend on JavaScript, but I'd prefer not to prevent real users from using the site until I suspected they were a bot...
I think your idea with checking the session id will already be quite useful.
Another idea: You could check whether embedded resources are downloaded as well.
A bot which does not load images (e.g. to save time and bandwidth) should be distinguishable from a browser which typically will load images embedded into a page.
Such a check however might not be suited as a real-time check because you would have to analyze some sort of server log which might be time consuming.
Hey, thanks for all the responses. I think that a combination of a few suggestions will work well. Mainly, the hidden form element that times how fast the form was filled out, and possibly the "poison link" idea. I think that will cover most bases. When you're talking about bots, you're not going to find them all, so there's no point thinking that you will... Silly bots.
