How can I keep spambots from getting past multiple web security measures? - bots

I am trying to stop spam accounts from being created on my website. I run a website that has approximately 50-80k pageviews per month. It's a social media website. Users sign up and communicate with one another for free. We've been battling with spam as of late even though we have implemented multiple security measures to counteract bots. I'd like to get any further suggestions of tips and tricks that I can try and also some help to see if I can identify if these are people coming from clickfarms, etc. (i.e. real people or computers)
Problem:
Signup form being completed and users posting spam in their profile information. Spammer signs up for the website by completing the signup form, activates their account via an email account, Logs into their account, and then completes their profile, putting spam in the description box with a link/url to their website they are advertising (everything from ##$%S enlargment to random blogs, to web developer websites, etc.) If there was one link they were posting we could detect it and ban them but they are not -- They are coming from multiple IP's, posting various links, using multiple email provider addresses for activating the accounts, registering with information from multiple countries, and creating about 10-30 accounts per day. Before implementing many security measures we were getting moreso around 100-200 fake accounts per day, but now we're down to 10-30 ... so we've seen some improvement, but the issue is still annoying me. So I'm half thinking now that the security measures are helping quite a bit, but that this is possibly humans still targeting our website and perhaps getting paid per signup they do or something similar to that. Even if so, is there any way I could confirm they are humans versus bots?
Security measures:
I won't get into all of the details here (for security reasons), but I'll just indicate what we've done to counteract the spambots:
Created honeypots at various areas of our website which automatically ban based on IP
IP banning - based on known botter/spammer ip addresses
Duration detection of signup form pageload to form submission -- if less than 5 seconds to complete our signup form, we're confirming you're a bot and then preventing the signup
Hidden checkbox in signup form -- there is a hidden checkbox in the signup form that is invisible to regular users (if a bot checks it we are automatically detecting and preventing the signup)
Google re-Captcha - We've enabled Google re-Captcha in our signup form as well
Email activation link - We send our users an activation email with a link that they have to click on to signup -- they are not able to sign into our website until they've activated their account.
Future actions include:
Detecting what users are posting in their descriptions in their profiles and banning based on that -- string detection for banned words, etc.
Any other suggestions or tips or tricks? In all honesty, if spam bots are getting through all of those security measures above --
do you think they are just that intelligent?
do you think we're being targeted?
Also, any way I can determine if they are bots or real humans? Suggestions?

This is a perennial problem; over the years I've found that as I add more anti-spam measures, the spammers continually get better at circumventing my measures.
I recommend doing an analysis of your spam to figure out how you can detect it. The spam itself contains the key to how to outsmart it. Look at the patterns, the structure, and decide what information is most useful and how the easiest way is to filter it out. Your spam detection doesn't need to be perfect, but generally, you want to get as much as possible, while getting as few false positives as possible.
Also, to answer your one question, you can make your bot-detection perfect, but there will always be humans submitting spam. And humans are tough to outsmart, and you may always need some manual attention to do it.
You are already implementing a lot of measures. Here are some more I would suggest:
When a signup form is generated, put a hidden field with a unique hash generated from the user's browser info, including the user's IP, HTTP user agent, and the date. Then, when the form is submitted, check the hash. This one method eliminated a surprising amount of spam.
If you want to take the previous method even farther, use a custom, time-sensitive hash in the URL of your contact form, and have the link to this form be dynamically generated. This way, if a spammer stores the form's URL, it won't work, but the link will work for every legitimate user of the site.
Make it so newly created, non-trusted users, cannot display any public profile information, such as URL's or text even. With a site as small as yours you could require manual approval of each user, and if your userbase got bigger, you could use an automated reputation system, a lot like Stack Overflow and the other Stack Exchange sites use. This removes the incentive for spam. Also, I found an overwhelming majority of spammers only ever logged onto the site once. If you wait to do the manual approval of users, until they have logged on twice, or even have returned to the site on another day, using a persistent cookie, you will filter out the vast majority of spammers and you will only have to do a small amount of manual approval work. Then have the system delete the unvalidated/inactive accounts after a certain amount of time.
Check for certain keywords or structure of info. I found an overwhelming majority of my spammers would use certain words or phrases that were never used by my legitimate users. Another one was entering a phone number in their profile, a common pattern in spammers, that no legitimate user ever did. Also look for signs of foul play like XSS attacks. A huge portion of spammers will, at some point, submit something that has a ton of HTML tags in it, you can either use the tags itself to filter them out, or you can do something like stripping the HTML tags and then comparing string length and banning them if it's more than a small amount (i.e. allow someone to do something simple like a few <em></em> or <strong></strong> tags.) Usually, if there are HTML tags in the entry, there's a ton of it. Also look for material with weird encodings or characters that don't make sense. This is often an attempt at sophisticated SQL injection attacks, XSS, or other types of hacking attempts.
Use external IP blacklists. AbuseIPDB is one example; it has an API that you can use to check new IP's before storing them in your temporary database. Their free plan allows checking of up to 1000 IP's a day and you can pay for more than that. It won't catch all the manual spam but I find they catch a ton of the automated spam.
Are they targeting you? Yes. They are targeting everyone. But any site with 50k+ pageviews a month is high enough volume to be an attractive target. The higher traffic you get, the more attractive of a target you will be. Even some of my tiny sites have been targeted with suprisingly sophisticated attacks these days. Everyone needs to be on guard.
Good luck. I wish this weren't so much of a problem, but it is.

Related

How should signup form error responses be displayed

I have a subscription based application that is build using MERN. I've recently submitted the application to be security tested and one of the responses that I received was that the application should not specifically tell the user why their signup application has been rejected for all cases. For example, if they enter a username or email that has already been registered, I shouldn't return an error message that says "Sorry, this username is already registered", as this would allow the user to build a list of users and emails that have registered with our site.
I understand why we need to prevent this, but I don't understand how I can tell the user why there signup submission failed without telling them that it's because that email has already been registered. It seems pointless to reject their signup form without giving them a specific reason, does anyone know what the best thing to do here is?
I have a subscription based application that is build using MERN
The fact you're using MongoDB, Express, React and NodeJS is irrelevant to how your end-users and visitors use your product.
I've recently submitted the application to be security tested...
Watch out - most "security consultants" I've come across that offer to do "analysis" just run some commodity scripts and vulnerability scanners against a website and then lightly touch-up the generated reports to make them look hand-written.
one of the responses that I received was that the application should not specifically tell the user why their signup application has been rejected for all cases
Hnnnng - not in "all" cases, yes - but unfortunately usability and security tend to be opposite ends of a seesaw that you need to carefully balance.
If you're a non-expert or otherwise inexperienced, I'd ask your security-consultant for an exhaustive list of those cases where they consider harmful information-disclosure is possible and then you should run that list by your UX team (and your legal team) to have them weigh-in.
I'll add (if not stress) that the web-application security scene is full of security-theatre and cargo-cult-programming practices, and bad and outdated advice sticks around in peoples' heads for too long (e.g. remember how everyone used to insist on changing your password every ~90 days? not anymore: it turns out that due to human-factors reasons that changing passwords frequently is often less secure).
For example, if they enter a username or email that has already been registered, I shouldn't return an error message that says "Sorry, this username is already registered", as this would allow the user to build a list of users and emails that have registered with our site.
Before considering any specific scenarios, first consider the nature of your web-application and your threat-model and ask yourself if the damage to the end user-experience is justified by the security gains, or even if there's any actual security gained at all.
For example, and using that issue specifically (i.e. not informing users on the registration page if a username and/or e-mail address is already in-use), I'd argue that for a public Internet website with a general-audience that usernames (i.e. login-names, screen-names, etc) are not particularly sensitive, and they're usually mutable, so there is no real end-user harm by disclosing if a username is already taken or not.
...but the existence or details of an e-mail address in your user-accounts database generally should not be disclosed to unauthenticated visitors. However, I don't think this is really possible to hide from visitors: if someone completes your registration form with completely valid data (excepting an already-in-use e-mail address) and the website rejects the registration attempt with a vague or completely useless error message then a novice user is going to be frustrated and give-up (and think your website is just broken), while a malicious user (with even a basic knowledge of how web-applications work) is going to instantly know it's because the e-mail address is in-use because it will work when they submit a different e-mail address - ergo: you haven't actually gained any security benefit while simultaneously losing business because your registration process is made painfully difficult.
However, consider alternative approaches:
One possible alternative approach for this problem specifically is to make it appear that the registration was successful, but to not let the malicious user in until they verify the e-mail address via emailed link (which they won't be able to do if it isn't their address), and if it is just a novice-user who is already registered and didn't realise it then just send them an email reminding them of that fact. This approach might be preferable on a social-media site where it's important to not disclose anything relating to any other users' PII - but this approach probably wouldn't be appropriate for a line-of-business system.
Another alternative approach: don't have your own registration system: just use OIDC and let users authenticate and register via Google, Facebook, Apple, etc. This also saves your users from having to remember another password.
As for the risk of information-harvesting: I appreciate that bots that brute-force large amounts of form-submissions sounds like a good match for never revealing information, a better solution is to just add a CAPTCHA and to rate-limit clients (both by limiting total requests-per-hour as well as adding artificial delays to user registration processing (e.g. humans generally don't care if a registration form POST takes 500ms or 1500ms, but that 1000ms difference will drastically affect bots.
In all my time building web-applications, I've never encountered any serious attempts at information-harvesting via automated registration form or login form submissions: it's always just marketing spam, and adding a CAPTCHA (even without rate-limiting) was all that was needed to put an end to that.
(The "non-serious" attempts at information-harvesting that I have seen were things like non-technical human-users manually "brute-forcing" themselves by typing through their keyboard: they all give-up after a few dozen attempts).
I understand why we need to prevent this, but I don't understand how I can tell the user why there signup submission failed without telling them that it's because that email has already been registered. It seems pointless to reject their signup form without giving them a specific reason, does anyone know what the best thing to do here is?
I'm getting the feeling maybe you got scammed by your security "consultants" making-up overstated risks in their report to you - rather than your web-application actually being at risk of being exploited.

Preventing registered users from sharing passwords

Below is a proposal for dealing with a situation of website security. I am wondering whether it seems feasible, from both a technical and usability point of view. I want to make sure that the proposal does contain any glaring errors.
A. THE WEBSITE
The website in question is a school website where students may purchase various items. These students get an account on the website with a username and password, which they can use to login. Once they login, they have access to protected pages with private content which are unavailable to the public at large.
B. THE SECURITY CONCERN
The owners of the website wish to prevent a situation where a single student signs up on the website, obtains a username and password, and then circulates those credentials to a circle of friends who are then able to login to the website and illegally view the private content.
C. THE SOLUTION
The central idea we have come up with to deal with this security problem is to permit each student to login to the website on two devices only. Once a student logs in on two different devices, they are restricted to those two devices. If they then attempt to login on a third device, the system would simply not permit them to do so. It is our understanding that other websites offering private content, such as Netflix, use such an approach.
D. IMPLEMENTATION
Two ideas come to mind to implement the above security measure: a. IP address. b. Cookies. We rule out IP addresses, which can change, and choose cookies. Websites such as amazon.com allow their customers to login once, and then whenever they return to the website, they are always recognized. This is almost certainly achieved through cookies.
Thus each time a student logs in, we will store on their device a cookie. And we will also store this cookie in our database under that student's account. Thus each time a student logs in on any device, we will check whether the device they are currently logging in on contains the cookie we have stored for that student. If it does not, we will know that the student is logging in on a different device. We will thus be able to know how many devices the student is trying to login on.
E. DRAWBACKS
We have identified at least three possible drawbacks to this approach:
Clearing Cookies. People can, for a variety of reasons, choose to clear the cookies from their computer.
A bona fide person may occasionally not have access to their usual device, and wish to login on a different computer.
People do purchase new devices from time to time.
These are examples of situations where a bona fide user, for legitimate reasons, wishes to login, but will be unable to, due to the website's security restriction of two devices.
We have some ideas as to how to build logic into the system to deal with such situations, which we may implement in the future, but for the time being, we feel that such situations are sufficiently rare that we do not need to handle them programmatically.
Rather, for now, in the event that a student is locked out, they will get a screen with a message explaining why we have not allowed them into the system, and a button which they can click on which will automatically generate an email to the site administrators.
The email will inform them that a student wishes to login on a third device. The administators can then contact the student, and, if they are satisfied that the need is bona fide, they will be able to take steps from the CMS to allow that student in.
The size of the student body is sufficiently manageable that the above approach should be feasible.
F. FAIR WARNING
We will inform the students of these security measures when their account is activated, in order to prevent unpleasant surprises.
This is a typical example of trying to fix via IT a real-world policy problem; in this case, forbidding students to share their credentials.
The solution you are proposing limits usability (for the reasons which you already listed under Drawbacks) and does not ensure security because cookies - and especially content - can be copied around. In short, it is feasible, but can easily be circumvented. It's up to you to decide whether it is worth to implement it. It would probably be better to enforce the "no sharing credentials" rule e.g. set up random checks and send any student caught violating the rule to the Dean.

Safe Way To Register Users On Website

I have my first website. One of the first tasks I have encountered is creating a registration page to register a new user. I have concerns about "safe" ways to do this. Essentially a registration page is a window to do database inserts into a user table. I'm concerned about script kiddies getting a hold of my registration form and mercilessly pounding the database with false inserts.
A couple things I've researched and struggled with:
Captchas: I really wanted to be able to create my site without these as from my research it sounds like they're about 20% effective at turning away bots while they are guaranteed to anger real human users. If at all possible I'd like to make captcha's be either non-existent on my site or dynamically appear if it seems I'm being scripted against.
IP Spoofing - I toyed with the idea of checking based on IP so that if I get a lot of successive form submissions from the same IP I could give them a captcha. However, it is my understanding that it is trivial to spoof IP addresses and that checking for repeat submissions from someone who is appropriately spoofing would be ineffective.
Registration Confirmation via Email Link - You see this a lot on forums, etc. After the user registers you send them a confirmation link with a unique token to verify they have a real email box and haven't put in a fake one (or perhaps genuinely mis-typed). While this may add some value around validating a user is "real" you have already inserted into your user table and thus script kiddies prevail at filling a database with useless information.
How do site developers prevent script kiddies from spamming their database with tons of useless users? If the assumptions I've made above are correct I don't see an effective way to prevent it. I have toyed with other ideas that after I think about them are all crap. The search terms I'm currently using aren't turning up many results so I apologize if this is an overplayed topic.
I'm not totally agreed with eliminating the captcha part, however you can trap some bots in a Honeypot. Make an input field which is invisible to the end-user, however still exists for bots. If the submitted form contains the fake-field value then ignore it, real users can't see invisible fields! :)
For example:
// jQuery
$("#username").hide();
// HTML
<input type="text" name="real-username">
<input type="text" name="username" id="username">
// PHP
if (!empty($_REQUEST['username']))
die('Oops!');
Just remember that you need to ignore the username field, your real username is in real-username.
I have found the Confirmation Emails, combined with a clean up task (that deletes all registrations over x number of days which are not confirmed) will help. You won't be able to prevent all spam registrations, but a little bit of work in the DB will help keep the table small.

Possible solutions for keeping track of anonymous users

I'm currently developing a web application that has one feature while allows input from anonymous users (No authorization required). I realize that this may prove to have security risks such as repeated arbitrary inputs (ex. spam), or users posting malicious content. So to remedy this I'm trying to create a sort of system that keeps track of what each anonymous user has posted.
So far all I can think of is tracking by IP, but it seems as though it may not be viable due to dynamic IPs, are there any other solutions for anonymous user tracking?
I would recommend requiring them to answer a captcha before posting, or after an unusual number of posts from a single ip address.
"A CAPTCHA is a program that protects websites against bots by generating and grading tests >that humans can pass but current computer programs cannot. For example, humans can read >distorted text as the one shown below, but current computer programs can't"
That way the spammers are actual humans. That will slow the firehose to a level where you can weed out any that does get through.
http://www.captcha.net/
There's two main ways: clientside and serverside. Tracking IP is all that I can think of serverside; clientside there's more accurate options, but they are all under user's control, and he can reanonymise himself (it's his machine, after all): cookies and storage come to mind.
Drop a cookie with an ID on it. Sure, cookies can be deleted, but this at least gives you something.
My suggestion is:
Use cookies for tracking of user identity. As you yourself have said, due to dynamic IP addresses, you can't reliably use them for tracking user identity.
To detect and curb spam, use IP + user browser agent combination.

Importance of verifying user email on web signup

I know this question is crazy - but my employers client is demanding that email verification be removed from the sign up process (they feel it is impeding sign up).
I wanted to garner feedback from the programming community at large as to their experience and opinions regarding sign up and email verification - and the possible consequences of removing this safeguard.
I'm on their side -- 95% of the time websites don't actually need an e-mail address, they just collect it because all the other web registrations they've seen collect one. If you're worried about spam, use a captcha; e-mails are a horrible way to stop automated registrations. With sites like Mailinator to give people instant throwaway e-mails and BugMeNot to save people the hassle of dealing with registrations like yours, you should avoid making your registration any harder than it needs to be. Stack Overflow is a great example -- you don't even need to register to ask/answer questions
My guess is that robots will not bother going through a registration process. Your average simple-minded robot simply spams into a form that requires no other action (authentication, identification) at all. The mere act of asking for one or more extra clicks will prevent most simple-minded "attacks." If you look at the blog site for Coding Horror, they use a captcha with a constant capture word.
On the other hand, while a few extra clicks will deter dumb robots, they will not deter human spammers, jokers, griefers, etc. But then again, throwaway email addresses are pretty easy to come by, so if someone truly wants to fill your site with junk they can.
My conclusion is this: I guess you will get about 10% to 20% more "junk" on your pages, and between 5% and 25% more "desired" accesses, depending on how badly it was bothering your potential customers. Thus, I don't see any big harm in removing the email barrier.
Email is important to identify the user, for instance, when they forget their password. If email verification is setup in such a way that users are not able to log on until they verify their email address then I also think that it is impeding. The application should allow the user to log on and use the application and set it up so that the user needs to verify their email address in a fixed number of days, for example, a week. If they do not their account is suspended.
On the other hand if we have to remove the email verification then I think we would need to add a feature similar to the major email services that allow the user to reset their forgotten passwords in the absence of a valid email address.

Resources