Safe Way To Register Users On Website - security

I have my first website. One of the first tasks I have encountered is creating a registration page to register a new user. I have concerns about "safe" ways to do this. Essentially a registration page is a window to do database inserts into a user table. I'm concerned about script kiddies getting a hold of my registration form and mercilessly pounding the database with false inserts.
A couple things I've researched and struggled with:
Captchas: I really wanted to be able to create my site without these, as from my research it sounds like they're about 20% effective at turning away bots while they are guaranteed to anger real human users. If at all possible I'd like captchas to be either non-existent on my site or to appear dynamically only if it seems I'm being scripted against.
IP Spoofing - I toyed with the idea of checking based on IP so that if I get a lot of successive form submissions from the same IP I could give them a captcha. However, it is my understanding that it is trivial to spoof IP addresses and that checking for repeat submissions from someone who is appropriately spoofing would be ineffective.
Registration Confirmation via Email Link - You see this a lot on forums, etc. After the user registers, you send them a confirmation link with a unique token to verify they have a real email box and haven't put in a fake one (or perhaps genuinely mistyped). While this may add some value around validating that a user is "real", you have already inserted into your user table, and thus script kiddies prevail at filling the database with useless information.
How do site developers prevent script kiddies from spamming their database with tons of useless users? If the assumptions I've made above are correct I don't see an effective way to prevent it. I have toyed with other ideas that after I think about them are all crap. The search terms I'm currently using aren't turning up many results so I apologize if this is an overplayed topic.

I don't totally agree with eliminating the captcha part; however, you can trap some bots in a honeypot. Make an input field which is invisible to the end-user but still present for bots. If the submitted form contains a value in the fake field, ignore the submission - real users can't see invisible fields! :)
For example:
// HTML - the real field is real-username; the honeypot keeps the tempting name "username"
<input type="text" name="real-username">
<input type="text" name="username" id="username">
// jQuery - hide the honeypot so real users never see (or fill) it
$("#username").hide();
// PHP - a non-empty honeypot means a bot filled out the form
if (!empty($_REQUEST['username'])) {
    die('Oops!');
}
Just remember that you need to ignore the username field; your real username is in real-username.

I have found that Confirmation Emails, combined with a clean-up task (that deletes all registrations over x number of days old which are not confirmed), will help. You won't be able to prevent all spam registrations, but a little bit of work in the DB will help keep the table small.
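For example, a minimal sketch of such a clean-up task, assuming PHP with PDO/MySQL and a users table with confirmed and created_at columns (all names here are illustrative); run it daily from cron:

<?php
// Purge registrations that were never confirmed within the grace period.
$pdo = new PDO('mysql:host=localhost;dbname=mysite', 'dbuser', 'dbpass');

$deleted = $pdo->exec(
    "DELETE FROM users
     WHERE confirmed = 0
       AND created_at < NOW() - INTERVAL 7 DAY" // x = 7 days here
);
echo $deleted . " unconfirmed registrations purged\n";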

Related

How should signup form error responses be displayed

I have a subscription-based application that is built using MERN. I've recently submitted the application to be security tested, and one of the responses that I received was that the application should not specifically tell the user why their signup application has been rejected, for all cases. For example, if they enter a username or email that has already been registered, I shouldn't return an error message that says "Sorry, this username is already registered", as this would allow the user to build a list of users and emails that have registered with our site.
I understand why we need to prevent this, but I don't understand how I can tell the user why their signup submission failed without telling them that it's because that email has already been registered. It seems pointless to reject their signup form without giving them a specific reason; does anyone know what the best thing to do here is?
I have a subscription-based application that is built using MERN
The fact you're using MongoDB, Express, React and NodeJS is irrelevant to how your end-users and visitors use your product.
I've recently submitted the application to be security tested...
Watch out - most "security consultants" I've come across that offer to do "analysis" just run some commodity scripts and vulnerability scanners against a website and then lightly touch up the generated reports to make them look hand-written.
one of the responses that I received was that the application should not specifically tell the user why their signup application has been rejected for all cases
Hnnnng - not in all cases, no - but unfortunately usability and security tend to sit at opposite ends of a seesaw that you need to balance carefully.
If you're a non-expert or otherwise inexperienced, I'd ask your security-consultant for an exhaustive list of those cases where they consider harmful information-disclosure is possible and then you should run that list by your UX team (and your legal team) to have them weigh-in.
I'll add (if not stress) that the web-application security scene is full of security-theatre and cargo-cult-programming practices, and bad and outdated advice sticks around in people's heads for too long (e.g. remember how everyone used to insist on changing your password every ~90 days? Not anymore: it turns out that, for human-factors reasons, changing passwords frequently is often less secure).
For example, if they enter a username or email that has already been registered, I shouldn't return an error message that says "Sorry, this username is already registered", as this would allow the user to build a list of users and emails that have registered with our site.
Before considering any specific scenarios, first consider the nature of your web-application and your threat-model, and ask yourself if the damage to the end-user experience is justified by the security gains - or if there's any actual security gained at all.
For example, and using that issue specifically (i.e. not informing users on the registration page if a username and/or e-mail address is already in use), I'd argue that for a public Internet website with a general audience, usernames (i.e. login-names, screen-names, etc.) are not particularly sensitive, and they're usually mutable, so there is no real end-user harm in disclosing whether a username is already taken.
...but the existence or details of an e-mail address in your user-accounts database generally should not be disclosed to unauthenticated visitors. However, I don't think this is really possible to hide from visitors. If someone completes your registration form with completely valid data (except an already-in-use e-mail address) and the website rejects the attempt with a vague or completely useless error message, a novice user is going to be frustrated and give up (and think your website is just broken). Meanwhile, a malicious user with even a basic knowledge of how web-applications work will instantly know it's because the e-mail address is in use, because the form will work when they submit a different e-mail address. Ergo: you haven't actually gained any security benefit, while simultaneously losing business because your registration process is made painfully difficult.
However, consider alternative approaches:
One possible alternative approach for this problem specifically is to make it appear that the registration was successful, but not let the malicious user in until they verify the e-mail address via an emailed link (which they won't be able to do if it isn't their address). If it is just a novice user who is already registered and didn't realise it, just send them an email reminding them of that fact. This approach might be preferable on a social-media site where it's important to not disclose anything relating to any other users' PII - but it probably wouldn't be appropriate for a line-of-business system.
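A rough sketch of that flow, assuming plain PHP with PDO and mail() for delivery (table, column, and URL names are illustrative):

<?php
// Sketch only: respond identically whether or not the address is registered.
function handle_registration(PDO $pdo, string $email): void
{
    $stmt = $pdo->prepare('SELECT id FROM users WHERE email = ?');
    $stmt->execute([$email]);

    if ($stmt->fetch()) {
        // Already registered: remind them instead of creating a duplicate.
        mail($email, 'You already have an account',
            'Someone (hopefully you) tried to register with this address. ' .
            'You already have an account; use the password-reset page if needed.');
    } else {
        $token = bin2hex(random_bytes(32));
        $pdo->prepare('INSERT INTO users (email, confirmed, token) VALUES (?, 0, ?)')
            ->execute([$email, $token]);
        mail($email, 'Confirm your registration',
            'Click to confirm: https://example.com/confirm?token=' . $token);
    }

    // Either way the browser sees exactly the same response.
    echo 'Thanks! Check your e-mail to continue.';
}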
Another alternative approach: don't have your own registration system at all - just use OIDC and let users authenticate and register via Google, Facebook, Apple, etc. This also saves your users from having to remember another password.
As for the risk of information-harvesting: I appreciate that bots brute-forcing large numbers of form submissions sound like a good match for never revealing information, but a better solution is to just add a CAPTCHA and rate-limit clients, both by limiting total requests per hour and by adding artificial delays to user registration processing (e.g. humans generally don't care if a registration form POST takes 500ms or 1500ms, but that 1000ms difference will drastically slow down bots).
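The artificial-delay part is trivial to sketch in plain PHP (the 500-1500ms window is the one from the paragraph above):

<?php
// Add a random 500-1500ms delay before processing the registration POST.
// Humans barely notice it; a bot hammering the endpoint is slowed roughly 3x.
usleep(random_int(500000, 1500000)); // microseconds
// ...then continue with normal registration processing.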
In all my time building web-applications, I've never encountered any serious attempts at information-harvesting via automated registration form or login form submissions: it's always just marketing spam, and adding a CAPTCHA (even without rate-limiting) was all that was needed to put an end to that.
(The "non-serious" attempts at information-harvesting that I have seen were things like non-technical human-users manually "brute-forcing" themselves by typing through their keyboard: they all give-up after a few dozen attempts).
I understand why we need to prevent this, but I don't understand how I can tell the user why there signup submission failed without telling them that it's because that email has already been registered. It seems pointless to reject their signup form without giving them a specific reason, does anyone know what the best thing to do here is?
I'm getting the feeling maybe you got scammed by your security "consultants" making up overstated risks in their report to you - rather than your web-application actually being at risk of being exploited.

How can I keep spambots from getting past multiple web security measures?

I am trying to stop spam accounts from being created on my website. I run a website that has approximately 50-80k pageviews per month. It's a social media website. Users sign up and communicate with one another for free. We've been battling with spam as of late even though we have implemented multiple security measures to counteract bots. I'd like to get any further suggestions of tips and tricks that I can try and also some help to see if I can identify if these are people coming from clickfarms, etc. (i.e. real people or computers)
Problem:
Signup form being completed and users posting spam in their profile information. The spammer signs up for the website by completing the signup form, activates their account via an email account, logs into their account, and then completes their profile, putting spam in the description box with a link/URL to the website they are advertising (everything from ##$%S enlargement to random blogs to web-developer websites). If there were one link they were posting we could detect it and ban them, but there isn't: they are coming from multiple IPs, posting various links, using addresses from multiple email providers to activate the accounts, registering with information from multiple countries, and creating about 10-30 accounts per day.
Before implementing many security measures we were getting more like 100-200 fake accounts per day, but now we're down to 10-30, so we've seen some improvement. Still, the issue is annoying me, and I'm half thinking now that the security measures are helping quite a bit, but that this is possibly humans still targeting our website, perhaps getting paid per signup or something similar. Even if so, is there any way I could confirm they are humans versus bots?
Security measures:
I won't get into all of the details here (for security reasons), but I'll just indicate what we've done to counteract the spambots:
Created honeypots at various areas of our website which automatically ban based on IP
IP banning - based on known botter/spammer ip addresses
Duration detection from signup form pageload to form submission -- if it takes less than 5 seconds to complete our signup form, we conclude you're a bot and prevent the signup (see the sketch after this list)
Hidden checkbox in signup form -- there is a hidden checkbox in the signup form that is invisible to regular users (if a bot checks it we are automatically detecting and preventing the signup)
Google reCAPTCHA - We've enabled Google reCAPTCHA in our signup form as well
Email activation link - We send our users an activation email with a link that they have to click to activate -- they are not able to sign into our website until they've activated their account.
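A minimal sketch of a duration check like the third measure above, assuming plain PHP with sessions (the 5-second threshold mirrors the description):

<?php
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'GET') {
    // Form is being rendered: remember when it was served.
    $_SESSION['form_served_at'] = time();
} else {
    // Form is being submitted: reject anything completed suspiciously fast.
    $elapsed = time() - ($_SESSION['form_served_at'] ?? 0);
    if ($elapsed < 5) {
        http_response_code(403);
        exit('Form completed too quickly; signup rejected.');
    }
}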
Future actions include:
Detecting what users are posting in their descriptions in their profiles and banning based on that -- string detection for banned words, etc.
Any other suggestions or tips or tricks? In all honesty, if spam bots are getting through all of those security measures above --
do you think they are just that intelligent?
do you think we're being targeted?
Also, any way I can determine if they are bots or real humans? Suggestions?
This is a perennial problem; over the years I've found that as I add more anti-spam measures, the spammers continually get better at circumventing my measures.
I recommend doing an analysis of your spam to figure out how you can detect it. The spam itself contains the key to outsmarting it. Look at the patterns and the structure, and decide what information is most useful and what the easiest way to filter it out is. Your spam detection doesn't need to be perfect, but generally you want to catch as much as possible while getting as few false positives as possible.
Also, to answer your one question: even if you make your bot-detection perfect, there will always be humans submitting spam. Humans are tough to outsmart, and you may always need some manual attention to deal with them.
You are already implementing a lot of measures. Here are some more I would suggest:
When a signup form is generated, put a hidden field with a unique hash generated from the user's browser info, including the user's IP, HTTP user agent, and the date. Then, when the form is submitted, check the hash. This one method eliminated a surprising amount of spam.
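A sketch of that hidden-hash idea in plain PHP, assuming a server-side secret kept in your config (the secret and field name are illustrative):

<?php
// $secret is an assumption: a long random value kept server-side in config.
$secret = 'long-random-value-from-config';

function form_signature(string $secret): string
{
    $info = $_SERVER['REMOTE_ADDR'] . '|'
          . ($_SERVER['HTTP_USER_AGENT'] ?? '') . '|'
          . date('Y-m-d');
    return hash_hmac('sha256', $info, $secret);
}

// When rendering the form:
echo '<input type="hidden" name="sig" value="' . form_signature($secret) . '">';

// When handling the submission: recompute and compare in constant time.
if (!hash_equals(form_signature($secret), $_POST['sig'] ?? '')) {
    http_response_code(403);
    exit; // IP, user agent, or date changed between render and submit
}

One caveat: the date component rolls over at midnight, so a form served at 23:59 and submitted at 00:01 would be rejected; accept yesterday's hash too if that matters to you.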
If you want to take the previous method even further, use a custom, time-sensitive hash in the URL of your contact form, and have the link to this form be dynamically generated. This way, if a spammer stores the form's URL, it won't work, but the link will work for every legitimate user of the site.
Make it so newly created, non-trusted users cannot display any public profile information, such as URLs or even free text. With a site as small as yours you could require manual approval of each user, and if your userbase got bigger you could use an automated reputation system, a lot like Stack Overflow and the other Stack Exchange sites use. This removes the incentive to spam. Also, I found that an overwhelming majority of spammers only ever logged onto the site once. If you wait to do the manual approval of users until they have logged on twice, or have even returned to the site on another day (using a persistent cookie), you will filter out the vast majority of spammers and will only have to do a small amount of manual approval work. Then have the system delete the unvalidated/inactive accounts after a certain amount of time.
Check for certain keywords or structures in the submitted info. I found that an overwhelming majority of my spammers would use certain words or phrases that were never used by my legitimate users. Another tell was entering a phone number in their profile, a common pattern among spammers that no legitimate user ever followed. Also look for signs of foul play like XSS attacks. A huge portion of spammers will, at some point, submit something that has a ton of HTML tags in it; you can either use the tags themselves to filter them out, or you can strip the HTML tags and then compare string lengths, banning them if the difference is more than a small amount (i.e. still allow something simple like a few <em></em> or <strong></strong> tags). Usually, if there are HTML tags in the entry, there's a ton of them. Also look for material with weird encodings or characters that don't make sense; this is often a sign of attempted SQL injection, XSS, or other hacking.
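A sketch of the tag-stripping length comparison, assuming plain PHP (the 40-character allowance is an illustrative threshold):

<?php
$raw   = $_POST['description'] ?? '';
$clean = strip_tags($raw, '<em><strong>'); // keep only the simple tags we allow

// If stripping removed more than a small amount of markup, treat it as spam.
if (strlen($raw) - strlen($clean) > 40) {
    exit('Submission rejected.'); // or flag the account for review instead
}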
Use external IP blacklists. AbuseIPDB is one example; it has an API that you can use to check new IPs before storing them in your temporary database. Their free plan allows checking up to 1000 IPs a day, and you can pay for more than that. It won't catch all the manual spam, but I find it catches a ton of the automated spam.
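A sketch of such a lookup against AbuseIPDB's v2 check endpoint, assuming PHP with cURL (verify the endpoint and response fields against their current documentation; the 75-point score threshold is an assumption):

<?php
function ip_abuse_score(string $ip, string $apiKey): int
{
    $url = 'https://api.abuseipdb.com/api/v2/check?' . http_build_query([
        'ipAddress'    => $ip,
        'maxAgeInDays' => 90,
    ]);
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['Key: ' . $apiKey, 'Accept: application/json'],
    ]);
    $body = curl_exec($ch);
    curl_close($ch);

    $data = json_decode((string)$body, true);
    return (int)($data['data']['abuseConfidenceScore'] ?? 0);
}

if (ip_abuse_score($_SERVER['REMOTE_ADDR'], 'YOUR_API_KEY') >= 75) {
    exit('Signups from this address are not accepted.');
}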
Are they targeting you? Yes. They are targeting everyone. Any site with 50k+ pageviews a month is high enough volume to be an attractive target, and the more traffic you get, the more attractive a target you will be. Even some of my tiny sites have been targeted with surprisingly sophisticated attacks these days. Everyone needs to be on guard.
Good luck. I wish this weren't so much of a problem, but it is.

Stopping a bot attack server side solution (without a CAPTCHA or JavaScript)

I inherited some code that was recently attacked by repeated remote form submissions.
Initially I implemented some protection by setting a unique session auth token (not the session id). While I realize this specific attack is not CSRF, I adapted my solution from these posts (albeit dated).
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%28CSRF%29
http://tyleregeto.com/a-guide-to-nonce
http://shiflett.org/articles/cross-site-request-forgeries
I've also read existing posts on SO, such as Practical non-image based CAPTCHA approaches?
However, the attacker now requests the form page first, starting a valid session, and then passes the session cookie in the following POST request, therefore having a valid session token. So, fail on my part.
I need to put some additional preventative measures in place. I'd like to avoid CAPTCHAs (due to poor user experience) and JavaScript solutions if possible. I've also considered referrer checks (can be faked), honeypots (hidden fields), and rate limiting (which an attacker can evade by throttling their own submissions). This attacker is persistent.
With that said, what would be a more robust solution?
If a human is attacking your page specifically, then you need to find what makes this attacker different from a regular user.
If they spam you with certain URLs, text, or the like, block those after they are submitted.
You can also quarantine submissions - don't let them go live for, say, 5 minutes. If within those 5 minutes you receive another submission to the same form from the same IP, discard both posts and block the IP.
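A sketch of that quarantine logic, assuming PHP with an existing PDO connection and illustrative table names:

<?php
// $pdo is an existing PDO connection; table/column names are illustrative.
$ip = $_SERVER['REMOTE_ADDR'];

// Was there another submission from this IP in the last 5 minutes?
$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM quarantine
     WHERE ip = ? AND created_at > NOW() - INTERVAL 5 MINUTE');
$stmt->execute([$ip]);

if ($stmt->fetchColumn() > 0) {
    // Second post from the same IP inside the window: discard both, block the IP.
    $pdo->prepare('DELETE FROM quarantine WHERE ip = ?')->execute([$ip]);
    $pdo->prepare('INSERT INTO banned_ips (ip) VALUES (?)')->execute([$ip]);
} else {
    $pdo->prepare('INSERT INTO quarantine (ip, payload, created_at) VALUES (?, ?, NOW())')
        ->execute([$ip, $_POST['message'] ?? '']);
}
// A separate cron job publishes quarantined posts older than 5 minutes.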
CAPTCHA is good if you use a good CAPTCHA, because many custom home-made captchas are now recognized automatically by specially crafted software.
To summarize - your problem needs not just technical, but more social solutions, aiming at neutralizing the botmaster rather than preventing the bot from posting.
CAPTCHAs were invented for this exact reason: there is NO WAY to differentiate 100% between human and bot.
You can throttle your users by incrementing a server-side counter, and when it reaches X you can consider it a bot attack and lock the site out. Then, when some time has elapsed (save the time of the attack as well), allow entry.
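One way to sketch that counter in PHP, assuming the APCu extension is available (the threshold and window are assumptions):

<?php
$key   = 'attempts_' . $_SERVER['REMOTE_ADDR'];
$count = (int) apcu_fetch($key); // false on a cache miss, cast to 0

if ($count >= 20) {                       // X = 20 here
    http_response_code(429);
    exit('Too many attempts; try again later.');
}
apcu_store($key, $count + 1, 3600);       // the counter expires after an hour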
I've thought a little about this myself.
I had an idea to extend the session auth token to also store a set of randomized form variable names. So instead of
<input name="title" ... >
you'd get
<input name="aZ5KlMsle2" ... >
and then additionally add a bunch of trap fields, which are hidden via CSS.
If any of the traps are filled out, then it was not a normal user, but a bot examining your HTML source...
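A sketch of that randomized-name idea in PHP, assuming sessions (field and alias names are illustrative):

<?php
session_start();

if ($_SERVER['REQUEST_METHOD'] === 'GET') {
    // Rendering the form: alias the real field with a random name.
    $_SESSION['field_map'] = ['title' => 'f' . bin2hex(random_bytes(8))];
    echo '<input name="' . $_SESSION['field_map']['title'] . '">';
} else {
    // Handling the POST: translate the alias back to the real name.
    $title = $_POST[$_SESSION['field_map']['title'] ?? ''] ?? null;
    unset($_SESSION['field_map']); // one use per render
}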
How about a hidden form field? If it gets filled automatically by the bot, you accept the request, but dismiss it.

Importance of verifying user email on web signup

I know this question is crazy - but my employer's client is demanding that email verification be removed from the sign-up process (they feel it is impeding sign-up).
I wanted to garner feedback from the programming community at large as to their experience and opinions regarding sign up and email verification - and the possible consequences of removing this safeguard.
I'm on their side -- 95% of the time websites don't actually need an e-mail address; they just collect it because all the other web registrations they've seen collect one. If you're worried about spam, use a captcha; e-mails are a horrible way to stop automated registrations. With sites like Mailinator to give people instant throwaway e-mails and BugMeNot to save people the hassle of dealing with registrations like yours, you should avoid making your registration any harder than it needs to be. Stack Overflow is a great example -- you don't even need to register to ask/answer questions.
My guess is that robots will not bother going through a registration process. Your average simple-minded robot simply spams into a form that requires no other action (authentication, identification) at all. The mere act of asking for one or more extra clicks will prevent most simple-minded "attacks." If you look at the Coding Horror blog, it uses a captcha with a constant captcha word.
On the other hand, while a few extra clicks will deter dumb robots, they will not deter human spammers, jokers, griefers, etc. But then again, throwaway email addresses are pretty easy to come by, so if someone truly wants to fill your site with junk they can.
My conclusion is this: I guess you will get about 10% to 20% more "junk" on your pages, and between 5% and 25% more "desired" accesses, depending on how badly it was bothering your potential customers. Thus, I don't see any big harm in removing the email barrier.
Email is important for identifying the user, for instance when they forget their password. If email verification is set up in such a way that users are not able to log on until they verify their email address, then I agree that it is impeding. The application should allow the user to log on and use the application, and require them to verify their email address within a fixed number of days, for example a week. If they do not, their account is suspended.
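A sketch of that grace-period check at login time, assuming PHP and a users table with email_verified and created_at columns (names are illustrative):

<?php
// After the password check succeeds; $user is the row fetched from users.
$graceDays = 7;
$ageDays   = (time() - strtotime($user['created_at'])) / 86400;

if (!$user['email_verified'] && $ageDays > $graceDays) {
    exit('Account suspended: please verify your e-mail address to continue.');
}
// Otherwise log them in, perhaps with a reminder banner if still unverified.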
On the other hand, if we have to remove email verification entirely, then I think we would need to add a feature similar to the major email services that allows the user to reset a forgotten password in the absence of a valid email address.

Programmatic Bot Detection

I need to write some code to analyze whether or not a given user on our site is a bot. If it's a bot, we'll take some specific action. Looking at the user agent is not successful for anything but friendly bots, since you can specify any user agent you want in a bot. I'm after the behaviors of unfriendly bots. Various ideas I've had so far are:
If you don't have a browser ID
If you don't have a session ID
Unable to write a cookie
Obviously, there are some cases where a legitimate user will look like a bot, but that's ok. Are there other programmatic ways to detect a bot, or to detect something that merely looks like a bot?
User agents can be faked. Captchas have been cracked. Valid cookies can be sent back to your server with page requests. Legitimate programs, such as Adobe Acrobat Pro, can go in and download your web site in one session. Users can disable JavaScript. Since there is no standard measure of "normal" user behaviour, it cannot be differentiated from a bot's.
In other words: it can't be done, short of pulling the user into some form of interactive chat and hoping they pass the Turing Test - and even then, they could be a really good bot.
Clarify why you want to exclude bots, and how tolerant you are of mis-classification.
That is, do you have to exclude every single bot at the expense of treating real users like bots? Or is it okay if bots crawl your site as long as they don't have a performance impact?
The only way to exclude all bots is to shut down your web site. A malicious user can distribute their bot to enough machines that you would not be able to distinguish their traffic from real users. Tricks like JavaScript and CSS will not stop a determined attacker.
If a "happy medium" is satisfactory, one trick that might be helpful is to hide links with CSS so that they are not visible to users in a browser, but are still in the HTML. Any agent that follows one of these "poison" links is a bot.
A simple test is JavaScript:
<script type="text/javascript">
document.write('<img src="/not-a-bot.' + 'php" style="display: none;">');
</script>
The not-a-bot.php can add something into the session to flag that the user is not a bot, then return a single pixel gif.
The URL is broken up to disguise it from the bot.
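A sketch of what not-a-bot.php could look like (the session flag name is an assumption):

<?php
// not-a-bot.php: only clients that actually executed the JavaScript request this.
session_start();
$_SESSION['not_a_bot'] = true; // check this flag before accepting the form

// Return a single transparent 1x1 GIF.
header('Content-Type: image/gif');
echo base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');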
Here's an idea:
Most bots don't download CSS, JavaScript, and images; they just parse the HTML.
If you keep track in a user's session of whether they download all of the above, e.g. by routing all of the asset requests through a script that logs the attempts, then you can quickly identify users that only download the raw HTML (very few normal users will do this).
You say that it is okay that some users appear as bots; therefore:
Most bots don't run JavaScript. Use JavaScript to make an Ajax-like call to the server that identifies this IP address as NonBot. Store that for a set period of time to identify future connections from this IP as good clients and to prevent further wasteful JavaScript calls.
For each session on the server you can determine if the user was at any point clicking or typing too fast. After a given number of repeats, set the "isRobot" flag to true and conserve resources within that session. Normally you don't tell the user that he's been robot-detected, since he'd just start a new session in that case.
Well, this is really for a particular page of the site. We don't want a bot submitting the form because it messes up tracking. Honestly, the friendly bots (Google, Yahoo, etc.) aren't a problem, as they don't typically fill out the form to begin with. If we suspected someone of being a bot, we might show them a captcha image or something like that... if they passed, they're not a bot and the form submits...
I've heard of things like putting the form in Flash, or making the submit button JavaScript, but I'd prefer not to prevent real users from using the site until I suspect they're a bot...
I think your idea with checking the session id will already be quite useful.
Another idea: you could check whether embedded resources are downloaded as well.
A bot which does not load images (e.g. to save time and bandwidth) should be distinguishable from a browser, which will typically load the images embedded in a page.
Such a check, however, might not be suited as a real-time check, because you would have to analyze some sort of server log, which might be time-consuming.
Hey, thanks for all the responses. I think that a combination of a few suggestions will work well. Mainly, the hidden form element that times how fast the form was filled out, and possibly the "poison link" idea. I think that will cover most bases. When you're talking about bots, you're not going to find them all, so there's no point thinking that you will... silly bots.
