BDD passing Recaptcha and null value - Best Practices - cucumber

I have two doubts about BDD with Cucumber related to best practices.
I have a page to automate user registration.
You enter your personal data, such as: name, email, and confirmation
After that you select the options of your interest of the site (there are 10 comboboxes, can be as many as you want).
Insert a recaptcha and send.
I need to validate all cases of success, as well as failure.
So, here are my questions:
1) Page with recaptcha.
Since it is not possible to automate a recaptcha and this step naturally comes into my test, should I make a scenario for invalid recaptcha validation?
2) Is there any clever way for me to write a scenario exploring all the possible combinations of site interest options?
In my page:
( ) Economy
( ) Education
( ) Sports
( ) Recreation
( ) Travels
( ) ...
I want to be able to submit the test several times by testing 1 option selected, 2 options, 3 options, ..., all options.
But I just want to do it if there is a lean way to do it.
In other words: In Scenario Outline examples can I pass a null value in this case?

In line with what Thomas mentioned on Captcha I would say that this is one of the few things which cannot be automated to test (except the negative path).
I also agree with Thomas that you should not want to test every single possibility of the options using executable specifications, but rather use integration testing or possible even unit testing if the architecture of this part of the code allows that.
As for an actual executable scenario in Gherkin format I see something like the following for this functionality:
Given Paul supplied the incorrect Captcha
When he wants to register himself
Then he should not be registered
You can wonder whether we should use the implementation word Captcha in the scenario since it will be incorrect when we would substitute Captcha in our implementation for something else.
There could be a potential other scenario depending on whether or not someone is allowed to register when no options are selected:
Given Paul has not chosen any of the possible interest topics
When he wants to register himself
Then he should not be registered
notice the reuse of the sentences for the when and then part to allow for less test code.

Regarding captcha, I would probably verify that a broken captcha stops the user. Verifying the positive path is obviously hard since the captcha is there to stop bots and an automated verification is the same as a bot.
Regarding verifying all your options, I would see if I could do that below the surface. Doing this from the UI using a browser is slow and you are talking about 2^10 combinations. That's a lot of cases. If all combinations needs to be tested, test them against the controller instead. This is a case where a tool like Cucumber may not be your best option. A programming language may be better than Gherkin.
If you still want to use Cucumber, at least make it fast and avoid the browser. I wrote a blogpost about the right tool for the job. It might help you to understand why you don't have to go through to UI for all scenarios.

Related

Reusing cucumber steps in a large codebase/team

We're using cucumberJS on a fairly large codebase with hundreds of cucumber scenarios and we've been running into issues with steps reuse.
Since all the steps in Cucumber are global, it's quite difficult to write steps like "and I select the first item in the list" or similar that would be similarly high-level. We end up having to append "on homepage" (so: "I select the first item in the list of folders on homepage") which just feels wrong and reads wrong.
Also, I find it very hard to figure out what the dependencies between steps are. For example we use a "and I see " pattern for storing a page object reference on the world cucumber instance to be used in some later steps. I find that very awkward since those dependencies are all but invisible when reading the .feature files.
What's your tips on how to use cucumber within a large team? (Including "ditch cucumber and use instead" :) )
Write scenarios/steps that are about what you are doing and why you are doing it rather than about how you do things. Cucumber is a tool for doing BDD. The key word here is Behaviour, and its interpretation. The fundamental idea behind Cucumber and steps is that each piece of behaviour (the what) has a unique name and place in the application, and in the application context you can talk about that behaviour using that name without ambiguity.
So your examples should never be in steps because they are about HOW your do something. Good steps never talk about clicking or selecting. Instead they talk about the reason Why you are clicking or selecting.
When you follow this pattern you end up with fewer steps at a higher level of abstraction that are each focused on a particular topic.
This pattern is easy to implement, and moderately easy to maintain. The difficulty is that to write the scenarios you have to have a profound understanding of what you are doing and why its important so you can discover/uncover the language you need to express yourself distinctly, clearly and simply.
I'll give my standard example about login. I use this because we share an understanding of What login is and Why its important. Realise before you can login that you have to be registered and that is complex.
Scenario: Login
Given I am registered
When I login
Then I should be logged in
The implementation of this is interesting in that I delegate all work to helper methods
Given I am registered
#i = create_registered_user
end
When I login
login_as(user: #i)
end
Then I should be logged in
should_be_logged_in
end
Now your problem becomes one of managing helper methods. What you have is a global namespace with a large number of helper methods. This is now a code and naming problem and All you have to do is
keep the number of helper methods as small as possible
keep each helper method simple
ensure there is no ambiguity between method names
ensure there is no duplication
This is still a hard problem, but
- its not as hard as what you are dealing with
- getting to this point has a large number of additional benefits
- its now a code problem, lots of people have experience of managing code.
You can do all these things with
- naming discipline (all my methods above have login in their name)
- clever but controlled use of arguments
- frequent refactoring and code cleaning
The code of your helper methods will have
- the highest churn of all your application code
- the greatest need to be simple and clear
So currently your problem is not about Cucumber its about debt you have with your existing scenarios and their implementation. You have to pay of your debt if you want things to improve, good luck

Gherkin: Is it correct to repeat steps?

I am reading a lot about Gherkin, and I had already read that it was not good to repeat steps, and for this it is necessary to use the keyword "Background", but in the example of this page they are repeating the same "Given" again and again, Could it be that I am doing wrong? I need to know your opinion about it:
Like with several things, this a topic that will generate different opinions. On this particular example I would have moved the "Given that I select the post" to the Background section as this seems to be a pre-requisite to all scenarios on this feature. Of course this would leave the scenarios in the feature without an actual Given section but those would be incorporated from the Background section on execution.
I have also seen cases where sometimes the decision of moving steps to the Background is a trade-off between having more or less feature files and how these are structured. For example, if there are 10 scenarios for a particular feature with a lot of similar steps between them - but there are 1 or 2 scenarios which do not require a particular step, then those 1 or 2 scenarios would have to moved into a new feature file in order to have the exact same steps on the Background section of the original feature.
Of course it is correct to keep the scenarios like this. From a tester's perspective, the Scenarios/Test cases should run independently, therefore, you can keep these tests separately for each functionality.
But in case you are doing an integration testing, then some of these test cases can be merged, thus you can cover multiple test cases in one scenario.
And the "given" statement is repeating, therefore you can put that in the background, so you don't have to call it in each scenarios.
Note: These separate scenarios will be handy when you run the scripts separately with annotation tags, when you just have to check for a specific functionality, or a bug fix.

Should Assertions be performed in BDD Given and When

In Behavior Driven Development style of writing automated functional tests, it is generally understood that Givens should be the pre-condition that the system must be in, in order to begin the test, When should be the user action performed and Then should be asserting whether the observed matches the expected and fail or pass the test accordingly.
Off late my team has started performing assertions in Givens and Whens too which lead me to wonder if this is a correct practice.
For e.g -
Given a user with xyz privileges is logged in
When I click on the abc tab
Then records should be displayed
Should the Given in this test actually assert that the logged in user indeed has xyz privileges or assume the user has required privileges and just perform login
Also should the When assert that tab is visible before clicking?
If "logging in" is a behaviour that's interesting* and you need examples to illustrate it** then it should be a "When", with the context in which it happens being the "Given", and the outcome that results being the "Then".
This is the case for any behaviour you need to illustrate with an example.
However, sometimes it can be useful to make assertions in a Given, just to check that the context really is set up properly. Sometimes when people start adopting BDD the environments can be a bit flaky, and it's handy to know if it's your scenario finding a bug that made it fail, or something earlier in the process. So for that reason, you might find assertions there.
The Given doesn't concern itself with how the system got into that state, though. If it has assertions, it should merely be to check that it is.
Another form I've seen is a check that the system is in the correct state for the context, with corrective action if it isn't.
Note that these are largely interim patterns. They can be helpful while teams are adopting BDD and getting their pipelines and automated deployments into shape.
Assertions which check the results of the "When" are part of the outcome, so part of the "Then". I can't imagine a case where you'd need to check the results of a "When" without it being an outcome. If you've got one, please give me an example.
We discourage using clicking and UI details in scenarios. Work out what you're trying to achieve, and do that. Hide the clicking under the covers.
Most of the time automated scenarios aren't actually there to catch bugs; they're living documentation that helps people think about what they're trying to achieve and what the system already does, thus encouraging good design and preventing bugs in the first place.
I'd say something like "navigate to the ABC tab" and just do it; you'll get a relevant exception if it isn't there, and that won't happen as often as people reading the scenario will.
* It's logging in. It probably isn't.
** It's logging in. You probably don't.
Assertions in Givens and Whens are generally an indicator of immaturity of step definitions. So I might put one in a step that I am working on, but I wouldn't keep it there for very long.
I'd implement your step
Given a user with xyz privileges is logged in
as something like
'Given a user with xyz privileges is logged in' do
user = create_user(privileges: xyz)
login_as user: user
end
I would expect that the create_user method would be trusted pretty quickly and would not need an assertion. Same for the login_as method. (if methods like this aren't working properly you'd expect many scenarios to be broken)
Notice how this code makes it clear that there are two things going on and provides/uses an api that you would expect many other stepdefs to use. And how any assertion you might want to keep really belongs in the helper methods not in the step def.
There are no strict rules however in my experience I find it handy to define that there are no assertions there so it is clear what is actually tested and the test itself will run faster.
In the case if your abc tab is not displayed it shouldn't be clickable and then the test will fail either when identifying the object to click or when performing the next step(depends on the tools and method you use).
However you should make sure that the actual implementation of the test is not cheating and working with internals that are able to trigger a click even if the actual component isn't.
There is another point about the Given, there normally it is recommended to set up the environment for the test. This means you make sure that in your system there is a user and that user has been logged in. It makes no meaning to very that as you set it up, however if any failure occur you should fail fast to know what is wrong.
Generally I would be careful about using assertions in the Given or When steps in the automation code.
However, I use login steps like this all over the place in my code as - when used correctly - can turn the step into a piece of context rather than an assertion itself.
For instance:
Is this user logged in?
Yes => Do Nothing.
No => Log out, then Log in as this user
In the example you gave, we know that we need the specific user to be logged in for this scenario to work, however we do not know if there is a different user logged in (other scenarios may have been run before it). If you use this step as a check to make sure that the correct user is logged in (as in my example), it is part of the context and it will speed up run time of the automation as you won't have to log in before and log out after every scenario.

Gherkin A single feature for multiple roles

As my ubiquitous language I have some phrases like :
Feature : Display A Post
In order to be able to check mistakes in a post
As an admin or customer
I want to be able to view the post
Scenario : Display Post
When : I select a post
Then : the post should be viewed
Is that a right user story? Such scenarios may have some minimal differences at UI level. Should I violate the DRY principle and repeat the feature for another role?
Different users may need different requirements over time, and I think this is the reason we usually write user stories per the user role.So should I be worry about how the requirements may change over time for different roles or I can leave a single user story (and the same test code,production code, databse ...) with multiple roles and refactor when their requirements forced me to separate them ?
I am not sure what your problem here and will try to guess. So first, your first three lines is just a description and not real steps. This enables adding custom text that will not run.
As to your other 2 steps, it is very hard to say whether they are good or not. As you might have already noticed, you are not bound by Cucumber to have a specific scenario flow. Cucumber gives you the freedom to design and write your code the way it makes more sense to YOU and YOUR business logic.
Saying that, I see no issue in repeating similar steps to test another role. In order to make the feature file a bit more DRY you can use the Scenario Outline option. It might look something like this:
Scenario Outline: Display Post as <role>
When I select a post as <role>
Then the post should be viewed
Examples:
|role |
|role1|
|role2|
In this case, two scenarios run one after another while rolevalue changes according to the Examples list.
Now, in regards to your possible changes in future. You can't always predict what will happen in future and unless continuously changing current requirements is a normal practice for you or your team, I wouldn't worry too much about this. If sometime in future current scenarios will become obsolete, you will review them and rewrite them or add new ones accordingly.
If multiple roles are required in a feature, then that means it is an epic, not a feature. It is a must to break down each feature so it only has one role, and it can deliver a single value to a single group of users.
I think the problem here is your language which needs refinement to clarify what you want to do here and why its important.
It seems to me that as an admin looking to fix mistakes in a post that what I need to is to be able to change a post.
A similar thing applies for the customer (should that be author?). If you explore what they will do when a post has been authored with a mistake then you will probably find that different roles interact in different ways. You'll start to ask questions about what happens if the customer and the admin make fixes, and how the customer responds when the admin makes a fix that the customer doesn't like and all sorts of other scenarios.
If you do this you'll probably find that most of your duplication goes away, and you'll learn lots about the differences between customer and admin behaviour in this particular context.

Practical non-image based CAPTCHA approaches?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!
We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:
http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.
I have written a traditional CAPTCHA control for ASP.NET which we can re-use.
However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.
I've seen things like..
ASCII text captcha: \/\/(_)\/\/
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.
Ideas?
My favourite CAPTCHA ever:
A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:
<input type="hidden" name="antispam" value="lalalala" />
I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:
var antiSpam = function() {
if (document.getElementById("antiSpam")) {
a = document.getElementById("antiSpam");
if (isNaN(a.value) == true) {
a.value = 0;
} else {
a.value = parseInt(a.value) + 1;
}
}
setTimeout("antiSpam()", 1000);
}
antiSpam();
Then when the form is submitted, If the antispam value is still "lalalala", then I mark it as spam. If the antispam value is an integer, I check to see if it is above something like 10 (seconds). If it's below 10, I mark it as spam, if it's 10 or more, I let it through.
If AntiSpam = A Integer
If AntiSpam >= 10
Comment = Approved
Else
Comment = Spam
Else
Comment = Spam
The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting
The downside to this method is that it requires JavaScript, and if you don't have JavaScript enabled, your comment will be marked as spam, however, I do review comments marked as spam, so this is not a problem.
Response to comments
#MrAnalogy: The server side approach sounds quite a good idea and is exactly the same as doing it in JavaScript. Good Call.
#AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.
Unless I'm missing something, what's wrong with using reCAPTCHA as all the work is done externally.
Just a thought.
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
I like this idea, is there not any way we can just hook into the rep system? I mean, anyone with say +100 rep is likely to be a human. So if they have rep, you need not even bother doing ANYTHING in terms of CAPTCHA.
Then, if they are not, then send it, I'm sure it wont take that many posts to get to 100 and the community will instantly dive on anyone seem to be spamming with offensive tags, why not add a "report spam" link that downmods by 200? Get 3 of those, spambot achievement unlocked, bye bye ;)
EDIT: I should also add, I like the math idea for the non-image CAPTCHA. Or perhaps a simple riddle-type-thing. May make posting even more interesting ^_^
What about a honeypot captcha?
Avoid the worst CAPTCHAs of all time.
Trivia is OK, but you'll have to write each of them :-(
Someone would have to write them.
You could do trivia questions in the same way ReCaptcha does printed words. It offers two words, one of which it knows the answer to, another which it doesn't - after enough answers on the second, it now knows the answer to that too. Ask two trivia questions:
A woman needs a man like a fish needs a?
Orange orange orange. Type green.
Of course, this may need to be coupled with other techniques, such as timers or computed secrets. Questions would need to be rotated/retired, so to keep the supply of questions up you could ad-hoc add:
Enter your obvious question:
You don't even need an answer; other humans will figure that out for you. You may have to allow flagging questions as "too hard", like this one: "asdf ejflf asl;jf ei;fil;asfas".
Now, to slow someone who's running a StackOverflow gaming bot, you'd rotate the questions by IP address - so the same IP address doesn't get the same question until all the questions are exhausted. This slows building a dictionary of known questions, forcing the human owner of the bots to answer all of your trivia questions.
So, CAPTCHA is mandatory for all users
except moderators. [1]
That's incredibly stupid. So there will be users who can edit any post on the site but not post without CAPTCHA? If you have enough rep to downvote posts, you have enough rep to post without CAPTCHA. Make it higher if you have to. Plus there are plenty of spam detection methods you can employ without image recognition, so that it even for unregistered users it would never be necessary to fill out those god-forsaken CAPTCHA forms.
I saw this once on a friend's site. He is selling it for 20 bucks. It's ASCII art!
http://thephppro.com/products/captcha/
.oooooo. oooooooo
d8P' `Y8b dP"""""""
888 888 d88888b.
888 888 V `Y88b '
888 888 ]88
`88b d88' o. .88P
`Y8bood8P' `8bd88P'
CAPTCHA, in its current conceptualization, is broken and often easily bypassed. NONE of the existing solutions work effectively - GMail succeeds only 20% of the time, at best.
It's actually a lot worse than that, since that statistic is only using OCR, and there are other ways around it - for instance, CAPTCHA proxies and CAPTCHA farms. I recently gave a talk on the subject at OWASP, but the ppt is not online yet...
While CAPTCHA cannot provide actual protection in any form, it may be enough for your needs, if what you want is to block casual drive-by trash. But it won't stop even semi-professional spammers.
Typically, for a site with resources of any value to protect, you need a 3-pronged approach:
Throttle responses from authenticated users only, disallow anonymous posts.
Minimize (not prevent) the few trash posts from authenticated users - e.g. reputation-based. A human moderator can also help here, but then you have other problems - namely, flooding (or even drowning) the moderator, and some sites prefer the openness...
Use server-side heuristic logic to identify spam-like behavior, or better non-human-like behavior.
CAPTCHA can help a TINY bit with the second prong, simply because it changes the economics - if the other prongs are in place, it no longer becomes worthwhile to bother breaking through the CAPTCHA (minimal cost, but still a cost) to succeed in such a small amount of spam.
Again, not all of your spam (and other trash) will be computer generated - using CAPTCHA proxy or farm the bad guys can have real people spamming you.
CAPTCHA proxy is when they serve your image to users of other sites, e.g. porn, games, etc.
A CAPTCHA farm has many cheap laborers (India, far east, etc) solving them... typically between 2-4$ per 1000 captchas solved. Recently saw a posting for this on Ebay...
Be sure it isn't something Google can answer though. Which also shows an issue with that --order of operations!
What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trust-worthy person to start the web I suggest using this CAPTCHA to make sure he is absolutely and 100% human.
Rapidshare CAPTCHA - Riemann Hypothesis http://codethief.eu/kram/_/rapidshare_captcha2.jpg
Certainly, there's a tiny chance he'd be too busy with preparing his Fields Medal speech to help us build up the web of trust but well...
Asirra is the most adorable captcha ever.
Just make the user solve simple arithmetic expressions:
2 * 5 + 1
2 + 4 - 2
2 - 2 * 3
etc.
Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:
import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby
Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.
I've been using the following simple technique, it's not foolproof. If someone really wants to bypass this, it's easy to look at the source (i.e. not suitable for the Google CAPTCHA) but it should fool most bots.
Add 2 or more form fields like this:
<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />
Then use CSS to hide them:
.hideme {
display: none;
}
On submit check to see if those form fields have any data in them, if they do fail the form post. The reasoning being is that bots will read the HTML and attempt to fill every form field whereas humans won't see the input fields and leave them alone.
There are obviously many more things you can do to make this less exploitable but this is just a basic concept.
Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.
E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh
In this case "stack" would be the CAPTCHA.
There are obviously numerous variations on this idea.
Edit: Example variations to address some of the potential problems identified with this idea:
using randomly coloured letters instead of bold/italic.
using every second red letter for the CAPTCHA (reduces the possibility of bots identifying differently formatted letters to guess the CAPTCHA)
Although this similar discussion was started:
We are trying this solution on one of our frequently data mined applications:
A Better CAPTCHA Control (Look Ma - NO IMAGE!)
You can see it in action on our Building Inspections Search.
You can view Source and see that the CAPTCHA is just HTML.
I know that no one will read this, but what about the dog or cat CAPTCHA?
You need to say which one is a cat or a dog, machines can't do this..
http://research.microsoft.com/asirra/
Is a cool one..
I just use simple questions that anyone can answer:
What color is the sky?
What color is an orange?
What color is grass?
It makes it so that someone has to custom program a bot to your site, which probably isn't worth the effort. If they do, you just change the questions.
I personally do not like CAPTCHA it harms usability and does not solve the security issue of making valid users invalid.
I prefer methods of bot detection that you can do server side. Since you have valid users (thanks to OpenID) you can block those who do not "behave", you just need to identify the patterns of a bot and match it to patterns of a typical user and calculate the difference.
Davies, N., Mehdi, Q., Gough, N. : Creating and Visualising an Intelligent NPC using Game Engines and AI Tools http://www.comp.glam.ac.uk/ASMTA2005/Proc/pdf/game-06.pdf
Golle, P., Ducheneaut, N. : Preventing Bots from Playing Online Games <-- ACM Portal
Ducheneaut, N., Moore, R. : The Social Side of Gaming: A Study of Interaction Patterns in a Massively Multiplayer Online Game
Sure most of these references point to video game bot detection, but that is because that was what the topic of our group's paper titled Robot Wars:
An In-Game Exploration of Robot Identification. It was not published or anything, just something for a school project. I can email if you are interested. The fact is though that even if it is based on video game bot detection, you can generalize it to the web because there is a user attached to patterns of usage.
I do agree with MusiGenesis 's method of this approach because it is what I use on my website and it does work decently well. The invisible CAPTCHA process is a decent way of blocking most scripts, but that still does not prevent a script writer from reverse engineering your method and "faking" the values you are looking for in javascript.
I will say the best method is to 1) establish a user so that you can block when they are bad, 2) identify an algorithm that detects typical patterns vs. non-typical patterns of website usage and 3) block that user accordingly.
I have some ideas about that I like to share with you...
First Idea to avoid OCR
A captcha that have some hidden part from the user, but the full image is the two code together, so OCR programs and captcha farms reads the image that include the visible and the hidden part, try to decode both of them and fail to submit... - I have all ready fix that one and work online.
http://www.planethost.gr/IdeaWithHiddenPart.gif
Second Idea to make it more easy
A page with many words that the human must select the right one. I have also create this one, is simple. The words are clicable images, and the user must click on the right one.
http://www.planethost.gr/ManyWords.gif
Third Idea with out images
The same as previous, but with divs and texts or small icons. User must click only on correct one div/letter/image, what ever.
http://www.planethost.gr/ArrayFromDivs.gif
Final Idea - I call it CicleCaptcha
And one more my CicleCaptcha, the user must locate a point on an image. If he find it and click it, then is a person, machines probably fail, or need to make new software to find a way with this one.
http://www.planethost.gr/CicleCaptcha.gif
Any critics are welcome.
Best captcha ever! Maybe you need something like this for sign-up to keep the riff-raff out.
Recently, I started adding a tag with the name and id set to "message". I set it to hidden with CSS (display:none). Spam bots see it, fill it in and submit the form. Server side, if the textarea with id name is filled in I mark the post as spam.
Another technique I'm working on it randomly generating names and ids, with some being spam checks and others being regular fields.
This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)
Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.
Sufficiently simple, and it will be not difficult to code around it. I see two threats here:
random spambots and the human spambots that might back them up; and
bots created to game Stack Overflow
With simple arithmetics, you might beat off threat #1, but not threat #2.
I've had amazingly good results with a simple "Leave this field blank:" field. Bots seem to fill in everything, particularly if you name the field something like "URL". Combined with strict referrer checking, I've not had a bot get past it yet.
Please don't forget about accessibility here. Captchas are notoriously unusable for many people using screen readers. Simple math problems, or very trivial trivia (I liked the "what color is the sky" question) are much more friendly to vision-impaired users.
Simple text sounds great. Bribe the community to do the work! If you believe, as I do, that SO rep points measure a user's commitment to helping the site succeed, it is completely reasonable to offer reputation points to help protect the site from spammers.
Offer +10 reputation for each contribution of a simple question and a set of correct answers. The question should suitably far away (edit distance) from all existing questions, and the reputation (and the question) should gradually disappear if people can't answer it. Let's say if the failure rate on correct answers is more than 20%, then the submitter loses one reputation point per incorrect answer, up to a maximum of 15. So if you submit a bad question, you get +10 now but eventually you will net -5. Or maybe it makes sense to ask a sample of users to vote on whether the captcha questionis a good one.
Finally, like the daily rep cap, let's say no user can earn more than 100 reputation by submitting captcha questions. This is a reasonable restriction on the weight given to such contributions, and it also may help prevent spammers from seeding questions into the system. For example, you could choose questions not with equal probability but with a probability proportional to the submitter's reputation. Jon Skeet, please don't submit any questions :-)
Make an AJAX query for a cryptographic nonce to the server. The server sends back a JSON response containing the nonce, and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in JavaScript, copy the value into a hidden field. When the user POSTs the form, they now send the cookie back with the nonce value. Calculate the SHA1 hash of the nonce from the cookie, compare to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
This technique requires that the spammer sits down and figures out what's going on, and once they do, they still have to fire off multiple requests and maintain cookie state to get a comment through. Plus they only ever see the Set-Cookie header if they parse and execute the JavaScript in the first place and make the AJAX request. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam. Which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.
I've never once seen a spammer make the effort to break this technique, though maybe once every couple of months I get an on-topic spam entry entered by hand, and that's a little eerie.
1) Human solvers
All mentioned here solutions are circumvented by human solvers approach. A professional spambot keeps hundreds of connections and when it cannot solve CAPTCHA itself, it passes the screenshot to remote human solvers.
I frequently read that human solvers of CAPTCHAs break the laws. Well, this is written by those who do not know how this (spamming) industry works.
Human solvers do not directly interact with sites which CAPTCHAs they solve. They even do not know from which sites CAPTCHAs were taken and sent them. I am aware about dozens (if not hundreds) companies or/and websites offering human solvers services but not a single one for direct interaction with boards being broken.
The latter do not infringe any law, so CAPTCHA solving is completely legal (and officialy registered) business companies. They do not have criminal intentions and might, for example, have been used for remote testing, investigations, concept proofing, prototypong, etc.
2) Context-based Spam
AI (Artificial Intelligent) bots determine contexts and maintain context sensitive dialogues at different times from different IP addresses (of different countries). Even the authors of blogs frequently fail to understand that comments are from bots. I shall not go into many details but, for example, bots can webscrape human dialogues, stores them in database and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even humans.
The most voted answer telling:
*"The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting"*
as well honeypot answer and most answers in this thread are just plain wrong.
I daresay they are victim-doomed approaches
Most spambots work through local and remote javascript-aware (patched and managed) browsers from different IPs (of different countries) and they are quite clever to circumvent honey traps and honey pots.
The different problem is that even blog owners cannot frequently detect that comments are from bot since they are really from human dialogs and comments harvested from other web boards (forums, blog comments, etc)
3) Conceptually New Approach
Sorry, I removed this part as precipitated one
Actually it could be an idea to have a programming related captcha set. For example:
There is the possibility of someone building a syntax checker to bypass this but it's a lot more work to bypass a captcha. You get the idea of having a related captcha though.
What if you used a combination of the captcha ideas you had (choose any of them - or select one of them randomly):
ASCII text captcha: //(_)//
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
with the addition of placing the exact same captcha in a css hidden section of the page - the honeypot idea. That way, you'd have one place where you'd expect the correct answer and another where the answer should be unchanged.
I have to admit that I have no experience fighting spambots and don't really know how sophisticated they are. That said, I don't see anything in the jQuery article that couldn't be accomplished purely on the server.
To rephrase the summary from the jQuery article:
When generating the contact form on the server ...
Grab the current time.
Combine that timestamp, plus a secret word, and generate a 32 character 'hash' and store it as a cookie on the visitor's browser.
Store the hash or 'token' timestamp in a hidden form tag.
When the form is posted back, the value of the timestamp will be compared to the 32 character 'token' stored in the cookie.
If the information doesn't match, or is missing, or if the timestamp is too old, stop execution of the request ...
Another option, if you want to use the traditional image CAPTCHA without the overhead of generating them on every request is to pre-generate them offline. Then you just need to randomly choose one to display with each form.

Resources