Unwanted garbage input into HTML form via bot?

Unwanted garbage input into HTML form via bot? - security

I have a website where I have a job section. I allow applicants fill out job applications online. No login is required. The data input gets stored in a database.
I have NOT put any captcha or bot blocking mechanism in the HTML form. I understand that this is a dumb thing to do. But mine is a small website and I did not spend too much time programming this.
I see every once in a while garbage inputs into the application form fields like the following:
yRERRCEXEUOMCew
Some times the 'City' field in the data would have a valid input (such as New York)
I am trying to understand where does this input come from and what would anyone gain by doing this.
Thanks

It comes from spam bots and they just submit random information to check to see if it is a working form or can send email, etc. If you are looking for a non-intrusive method (i.e. no CAPTCHA or JavaScript) to prevent spam bots from submitting bogus data, I would highly recommend throttling form submissions. If you are using PHP, you could use code like this:
// Sessions needed to tie forms to specific users
session_start();
// Process form here
if ( isset($_POST['submit']) )
{
$now = time();
// See if the current time less the start time is less than or equal to 5 seconds
if ( ( $now - $_SESSION['start_time'] )
Note: this will not stop dedicated bots nor will it provide any real security. It will stop automatic flood bots though since they will not normally wait 5 seconds between submissions.
Hope this helps.

I am trying to understand where does this input come from ?
It can come from any where , a user can also input this as your app isn't validating this input.
and what would anyone gain by doing this.
No Idea
You can put captcha. or simply you can add two attribute
lets say
1,98 and one operation sign(for example , +) let the user perform and validate this thing # server.
Also See
Practical non-image based CAPTCHA approaches

Related

Preventing bots from doing form submissions

At my site, I present a form for visitor input. No login is required. I cannot require a login. So anyone browsing the site can submit the form. It also opens up the form to bots. I need to prevent the bots. I had asked the question on the following thread.
Unwanted garbage input from bots?
I did get some useful response. I read a few solutions to the this (captcha and non-captcha).
Mine is not a site where a I get significant traffic. My users are not terribly computer savvy. So I was thinking of doing something like this. I am not a very accomplished programmer and what I am saying here may be very stupid. But I am simply trying to learn, so please bear with me.
Every time I present the form, I generate a unique key (unix time + remote host IP). I store the key in a db table and I send out the form with the key being a hidden field on the form. When a form is submitted, I check to see if the value for the key is in the db table. If it is, I remove the key from the db table and I process the form. If the key is not in the db table, I discard the form and ask the user to do the operation again.
With every submission I also remove stale entries(where the users did not submit the form within a stipulated time). I will need to have some mechanism where I prevent the request for the form, from bots. Say for example, if I have n number of pending requests from a particular host, I ask people to request for the form after a few moments.
Will something like this work?

the bots will be able to request the hidden field and submit it anyway. try a non-re-captcha library so that your users don't get overwhelmed (recaptcha is overwhelming due to its extra goal of hijacking your users to do OCR of pretty illegible text).
however, since you ask for a non-captcha solution, i would propose that you measure the time between form request and form submission (with the hidden key). a bot would submit the form within a couple of seconds of request, but a human would not.
if you find that this simple approach does not work for your site then you can try something more complex.

You could also hide the form and then a user would have to click on a button to reveal it. Much like how twitter does it when you log in.

I wouldn't worry too much about bots submitting your form. It's not gonna happen. If you're terribly fearful then instead of a captcha ask a stupid question like "what is 1+1?" before a submission.

It all depends on how desperately the spammers want to submit junk to your form. Your method will work for the most stupid of bots, but as agks mehx pointed out it's trivial for a bot to load up the form and extract the field if someone bothers to take a minute or so to tweak their bot.
At the other end of the spectrum, there's little you can do to automatically stop the "pay people in certain countries the equivalent of 10¢/hr to spam every board they can find" tactic without locking things down to an extent that also prevents the general public from posting useful comments.

What about hashing form field names so the name is different each time? hash(Original field name + time stamp + secret salt) and the just pass the time stamp with the form, it will take ages for the bot to figure it out, especially if the salt is different per user and changes every couple of hours/days. Just an idea I had. Was wondering if you think it would stop bots?

Is this method of checking a "gift code" secure?

I have a backend that generates gift codes, each with a certain number of uses. Give these to a blogger or whatever, and their readership can redeem the code for a promotional item.
I'm working on the best way to check a codes validity without having collisions/dupes, or anything like that. I need to 1) validate the code 2) collect shipping info
My first draft was
A) Check code via a form, if good, proceed to address input. When input is received, save code and address/name etc.
This fails because if there are 74 uses on a 75 use code, 25 people could "validate" but not enter their address yet, and we'd end up with more than 75 valid redemptions.
My current solution looks more like:
B) Just have the code as the first field in the information gathering form, and when a valid code is typed in, ajaxify that and live check it against the DB. If the code is valid, it then shows the rest of the form, and that entry of the code is "claimed" for half an hour or something. If no DB entry w/in half an hour, it's then released.
This seems pretty complex, and I'm wondering if I'd need to do throttling against the ajax attempts to make sure people don't brute force a valid code.
Is this method secure, and/or are there any other blatantly obvious patterns I'm missing for this type of application?

Let everyone enter their gift-code and address, and then submit
In the backend, verify the address and the gift-code.
If the gift-code is valid and not exhausted, congratulate the user. Else apologise to them and suggest they buy it instead anyway.
Does it have to be more complicated than that?

Why don't you just have one form with all the information (redemption code and shipping info)?
Then, when the user submits, atomically (using transactions on your database) check if it's valid and commit the user's information.
If the code is no longer valid, just show a message like "Sorry, the redemption code you used has been depleted and is no longer valid."

Just wanted to add, if you're worried about bruteforcing attempts, you can require a captcha or javascript based hashcash value to be submitted along with the gift code. If you want to be as unobtrusive as possible, you can only require this for subsequent attempts after the first failed one.
One thing you might consider, is after the user enters a gift code, create an intermediate page that has more details about the offer, shows the number of claims remaining, and has some information about what will be required to complete the offer (address, creditcard, whatever). If the user chooses to claim the offer, have a 10-15 minute countdown (updated via javascript) on the data entry page for the address and other personal information, so the user knows that the offer might expire if they don't enter their information immediately.
Another thing to consider is implementing a "cancel" button that indicates the user can make the offer available for another user, without waiting for the countdown to expire.

Your current solution looks like the proper one, although I think you left out the method by which you associate the user association with the code. Still, providing the functionality of "reserving" a redemption of the code for a user is a good solution.

Option B seems reasonable. Just use a captcha rather than trying to throttle it. Captchas aren't perfect but it's less obnoxious than say misreading the code three times and then being denied the ability to try another for 24 hours. This will work particularly well if you're already planning on doing it AJAXy.
So -
User will fill in code field and captcha.
You'll confirm the captcha, then confrim the code.
Once successful, the user will fill in the other info and submit.
Using this method you could also probably only lock the code for something more like 5 minutes (ticket agency style) and show a timer on the form somewhere notifying the user.

Your A method (Check code via a form, if good, proceed to address input) looks very reasonable. Just combine it with B's "code is "claimed" for half an hour or something", and everything should work as you expect.
That is:
Customer enters code
Check code – if valid, and not already used MAX+ times, add an extra entry in code use table, with a timestamp that expires after x minutes.
Collect other info
On submit, permanently mark the code as used (remove entry expiration)
If customer never makes the order, the timestamped entry is removed (or ignored) after time x, and released for others to use.

We do a low end encryption (RC4) with a checksum added for this type of thing. Because RC4 generates a problematic character set, we also converted it to HEX. The combination is relatively secure and self checking. The decrypted value is just a number that we can verify in the database. This works with both our eMail reminders and gift certificates.

How can I prevent bulk vulnerability scanning without using a CAPTCHA component?

How can I prevent that forms can be scanned with a sort of massive vulnerability scanners like XSSME, SQLinjectMe (those two are free Firefox add-ons), Accunetix Web Scanner and others?
These "web vulnerability scanners" work catching a copy of a form with all its fields and sending thousands of tests in minutes, introducing all kind of malicious strings in the fields.
Even if you sanitize very well your input, there is a speed response delay in the server, and sometimes if the form sends e-mail, you vill receive thousands of emails in the receiver mailbox. I know that one way to reduce this problem is the use of a CAPTCHA component, but sometimes this kind of component is too much for some types of forms and delays the user response (as an example a login/password form).
Any suggestion?
Thanks in advance and sorry for my English!

Hmm, if this is a major problem you could add a server-side submission-rate limiter. When someone submits a form, store some information in a database about their IP address and what time they submitted the form. Then whenever someone submits the form, check the database to see if it's been "long enough" since the last time that IP address submitted the form. Even a fairly short wait like 10 seconds would seriously slow down this sort of automated probing. This database could be automatically cleared out every day/hour/whatever, you don't need to keep the data around for long.
Of course someone with access to a botnet could avoid this limiter, but if your site is under attack by a large botnet you probably have larger problems than this.

On top the rate-limiting solutions that others have offered, you may also want to implement some logging or auditing on sensitive pages and forms to make sure that your rate limiting actually works. It could be something simple like just logging request counts per IP. Then you can send yourself an hourly or daily digest to keep an eye on things without having to repeatedly check your site.

Theres only so much you can do... "Where theres a will theres a way", anything that you want the user to do can be automated and abused. You need to find a median when developing, and toss in a few things that may make it harder for abuse.
One thing you can do is sign the form with a hash, for example if the form is there for sending a message to another user you can do this:
hash = md5(userid + action + salt)
then when you actually process the response you would do
if (hash == md5(userid + action + salt))
This prevents the abuser from injecting 1000's of user id's and easily spamming your system. Its just another loop for the attacker to jump through.
Id love to hear other peoples techniques. CAPTCHA's should be used on entry points like registration. And the method above should be used on actions to specific things (messaging, voting, ...).
also you could create a flagging system, and anything the user does X times in X amount of time that may look fishy would flag the user, and make them do a CAPTCHA (once they enter it they are no longer flagged).

This question is not exactly like the other questions about captchas but I think reading them if you haven't already would be worthwhile. "Honey Pot Captcha" sounds like it might work for you.
Practical non-image based CAPTCHA approaches?
What can be done to prevent spam in forum-like apps?

Reviewing all the answers I had made one solution customized for my case with a little bit of each one:
I checked again the behavior of the known vulnerability scanners. They load the page one time and with the information gathered they start to submit it changing the content of the fields with malicious scripts in order to verify certain types of vulnerabilities.
But: What if we sign the form? How? Creating a hidden field with a random content stored in the Session object. If the value is submitted more than n times we just create it again. We only have to check if it matches, and if it don't just take the actions we want.
But we can do it even better: Why instead to change the value of the field, we change the name of the field randomly? Yes changing the name of the field randomly and storing it in the session object is maybe a more tricky solution, because the form is always different, and the vulnerability scanners just load it once. If we don’t get input for a field with the stored name, simply we don't process the form.
I think this can save a lot of CPU cycles. I was doing some test with the vulnerability scanners mentioned in the question and it works perfectly!
Well, thanks a lot to all of you, as a said before this solution was made with a little bit of each answer.

How to encourage a user to fill in long application forms?

What I can think of is pre-populating certain form input elements based on the user's geographical information.
What are other ways can you think of to speed up user input on long application forms?
Or at least keep them focus on completing the application form?

If you have a long form, try to prune it down. Don't ask them to fill in fields that you don't really need.
If the form spans several pages, give the user some feedback as to how many more pages there are. We users hate clicking on the continue button wondering if this will be the last page.
Never lose a field that they filled in, no matter what they do. This could have security implications if passwords are involved.
Use dropdowns to provide the user with options unless there are a lot of options that the user would have to scroll through or if the terms in the dropdown aren't widely accepted (e.g. dropdown filled with Systems Engineer, Solution Developer, IT Application... I just want Programmer.).
Provide help for fields that might be hard to fill in (or provide examples).

If it is possible in your case, just collect the bare minimum up front and then allow the user to use the basic features of your service.
For the user to upgrade to a better level of service, they will need to fill in the 2nd form with more detail.
How important it is to you to collect ALL that information up front ? It is worth losing customers by demanding too much from them ? Why not demand it later at a time more convenient to the user.

Creating a multi-step wizard offering only a small number of input fields per step. Ensure that they are aware of how far they have progressed in the sequence.
The psychology is that once a user is 'invested' in a task, they are more likely to continue. If you present the whole list of input fields at once, you scare them off.
Offering musings at each step (cartoon, humor, sayings etc) makes them move to the next step out of curiosity.

Users won't mind filling in long forms if and only if they feel that the questions that you ask are important: otherwise they will be discouraged, and become impatient with it.
Remember, in a web application people have very, very short attention spans. When the user starts feeling that you are asking too much, they're usually right.
Keep required information as few as possible: other info should only be optional, and you have to give something in return to the user to compel them to complete that information.

However you implement it, please please please use some kind of Ajax hearbeat to store their progress server side and repopulate it if it's lost. There is nothing more infuriating to a user that working through a long form and having a browser or network hiccup lose their entire submission.
Whenever it happens to me I generally never give it a second shot, because at that point recreating my submission isn't worth whatever I was signing up for.

Checklist:
Explain clearly the purpose of the form. (What's in it for them?)
Prune, prune, prune, and keep questions clearly relevant!
Give the user feedback on his/her progress (if the form is split over multiple pages)
Ask for as little as you can up-front and leave the rest for later.
Clearly mark required fields
Group fields logically.
Keep labels/headings brief and easy to understand.
Prefill as much as possible - but not too much.
Spread super long forms over multiple pages and allow backtracking.
Cleverly placed "Back", "Save" and "Cancel" buttons put people's minds at ease - even when redundant.
Provide friendly (but clear!) validation error messages, in a timely manner.
Allow the user to reclaim half-filled in forms - don't lose their data!
No matter what you do, do not include a reset button. :-)
Finally:
Explicitly tell the user when the process is finished. ("Thank you! Your application has been sent.")
Tell the user what will happen next. ("A confirmation e-mail has been sent to your e-mail address, and we'll process your application within two working days.")

use Ajax to populate and update the controls asynchronously.It will speedup the filling of long application forms.

Split it up into multiple pages - there's nothing quite so discouraging as seeing that you have another 100 questions to go.
Put validation on each input and check it onblur(). If they get to the end of the page and then it says "question #2 was incorrect", chances are they've forgotten what that one was anyway and it'll be more difficult to return to it. Plus, if they answer a series of similar inputs in a particular, incorrect way, you should let them know straight away (eg: entering dates as mm/dd/yyyy when you want dd/mm/yyy)

Split the form into several steps. It's like how someone is much more likely to read five 3-sentence paragraphs than one big 15-sentence paragraph of the same length.

I agree with tim; just let them fill in the bare minimum information and then leave the rest to profile updates. If any data is necessary for the service offered on your site, ask for it when they try to avail of the service (and no earlier).
That said, I wouldn't advocate the kind of forcing function that adam suggests. It pays to give your users the warm, fuzzy feeling that they are privileged and can use ALL of the services on your site. Although, if you look at it hard enough, adam's and my suggestions are pretty much the same.

If the application needs to include a lot of information, then make sure the user can save at any point, and log off, and log in later to complete the form. This would make more sense if some of the information is not necessarily easily available. Tax returns are an obvious example, where some of the data may need to be calculated, or the user must find the relevant documentation.
In some cases the user might use the same information in multiple applications. In that case it might make sense for the user to register their details (Name, Address, Telephone numbers, etc), which are automatically filled in on each application. For example, if you had a website for a recruitment agency, they may allow users to register their details, and then to apply for a particular job, they can just include a personal statement that applies to that job in particular.
As another consideration, if some information may be incorrect (particular if this is not always clear, such as a CAPTCHA, or a user name that must be unique), either separate it from the rest of the data, or otherwise make it so a mistake doesn't mean the rest of the information must be reentered.
These are basically ways of avoiding the user having to enter the same information twice.

Practical non-image based CAPTCHA approaches?

Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!
We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:
http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.
I have written a traditional CAPTCHA control for ASP.NET which we can re-use.
However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.
I've seen things like..
ASCII text captcha: \/\/(_)\/\/
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.
Ideas?

My favourite CAPTCHA ever:

A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:
<input type="hidden" name="antispam" value="lalalala" />
I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:
var antiSpam = function() {
if (document.getElementById("antiSpam")) {
a = document.getElementById("antiSpam");
if (isNaN(a.value) == true) {
a.value = 0;
} else {
a.value = parseInt(a.value) + 1;
}
}
setTimeout("antiSpam()", 1000);
}
antiSpam();
Then when the form is submitted, If the antispam value is still "lalalala", then I mark it as spam. If the antispam value is an integer, I check to see if it is above something like 10 (seconds). If it's below 10, I mark it as spam, if it's 10 or more, I let it through.
If AntiSpam = A Integer
If AntiSpam >= 10
Comment = Approved
Else
Comment = Spam
Else
Comment = Spam
The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting
The downside to this method is that it requires JavaScript, and if you don't have JavaScript enabled, your comment will be marked as spam, however, I do review comments marked as spam, so this is not a problem.
Response to comments
#MrAnalogy: The server side approach sounds quite a good idea and is exactly the same as doing it in JavaScript. Good Call.
#AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.

Unless I'm missing something, what's wrong with using reCAPTCHA as all the work is done externally.
Just a thought.

The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
I like this idea, is there not any way we can just hook into the rep system? I mean, anyone with say +100 rep is likely to be a human. So if they have rep, you need not even bother doing ANYTHING in terms of CAPTCHA.
Then, if they are not, then send it, I'm sure it wont take that many posts to get to 100 and the community will instantly dive on anyone seem to be spamming with offensive tags, why not add a "report spam" link that downmods by 200? Get 3 of those, spambot achievement unlocked, bye bye ;)
EDIT: I should also add, I like the math idea for the non-image CAPTCHA. Or perhaps a simple riddle-type-thing. May make posting even more interesting ^_^

What about a honeypot captcha?

Avoid the worst CAPTCHAs of all time.
Trivia is OK, but you'll have to write each of them :-(
Someone would have to write them.
You could do trivia questions in the same way ReCaptcha does printed words. It offers two words, one of which it knows the answer to, another which it doesn't - after enough answers on the second, it now knows the answer to that too. Ask two trivia questions:
A woman needs a man like a fish needs a?
Orange orange orange. Type green.
Of course, this may need to be coupled with other techniques, such as timers or computed secrets. Questions would need to be rotated/retired, so to keep the supply of questions up you could ad-hoc add:
Enter your obvious question:
You don't even need an answer; other humans will figure that out for you. You may have to allow flagging questions as "too hard", like this one: "asdf ejflf asl;jf ei;fil;asfas".
Now, to slow someone who's running a StackOverflow gaming bot, you'd rotate the questions by IP address - so the same IP address doesn't get the same question until all the questions are exhausted. This slows building a dictionary of known questions, forcing the human owner of the bots to answer all of your trivia questions.

So, CAPTCHA is mandatory for all users
except moderators. [1]
That's incredibly stupid. So there will be users who can edit any post on the site but not post without CAPTCHA? If you have enough rep to downvote posts, you have enough rep to post without CAPTCHA. Make it higher if you have to. Plus there are plenty of spam detection methods you can employ without image recognition, so that it even for unregistered users it would never be necessary to fill out those god-forsaken CAPTCHA forms.

I saw this once on a friend's site. He is selling it for 20 bucks. It's ASCII art!
http://thephppro.com/products/captcha/
.oooooo. oooooooo
d8P' `Y8b dP"""""""
888 888 d88888b.
888 888 V `Y88b '
888 888 ]88
`88b d88' o. .88P
`Y8bood8P' `8bd88P'

CAPTCHA, in its current conceptualization, is broken and often easily bypassed. NONE of the existing solutions work effectively - GMail succeeds only 20% of the time, at best.
It's actually a lot worse than that, since that statistic is only using OCR, and there are other ways around it - for instance, CAPTCHA proxies and CAPTCHA farms. I recently gave a talk on the subject at OWASP, but the ppt is not online yet...
While CAPTCHA cannot provide actual protection in any form, it may be enough for your needs, if what you want is to block casual drive-by trash. But it won't stop even semi-professional spammers.
Typically, for a site with resources of any value to protect, you need a 3-pronged approach:
Throttle responses from authenticated users only, disallow anonymous posts.
Minimize (not prevent) the few trash posts from authenticated users - e.g. reputation-based. A human moderator can also help here, but then you have other problems - namely, flooding (or even drowning) the moderator, and some sites prefer the openness...
Use server-side heuristic logic to identify spam-like behavior, or better non-human-like behavior.
CAPTCHA can help a TINY bit with the second prong, simply because it changes the economics - if the other prongs are in place, it no longer becomes worthwhile to bother breaking through the CAPTCHA (minimal cost, but still a cost) to succeed in such a small amount of spam.
Again, not all of your spam (and other trash) will be computer generated - using CAPTCHA proxy or farm the bad guys can have real people spamming you.
CAPTCHA proxy is when they serve your image to users of other sites, e.g. porn, games, etc.
A CAPTCHA farm has many cheap laborers (India, far east, etc) solving them... typically between 2-4$ per 1000 captchas solved. Recently saw a posting for this on Ebay...

Be sure it isn't something Google can answer though. Which also shows an issue with that --order of operations!

What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trust-worthy person to start the web I suggest using this CAPTCHA to make sure he is absolutely and 100% human.
Rapidshare CAPTCHA - Riemann Hypothesis http://codethief.eu/kram/_/rapidshare_captcha2.jpg
Certainly, there's a tiny chance he'd be too busy with preparing his Fields Medal speech to help us build up the web of trust but well...

Asirra is the most adorable captcha ever.

Just make the user solve simple arithmetic expressions:
2 * 5 + 1
2 + 4 - 2
2 - 2 * 3
etc.
Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:
import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby
Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.

I've been using the following simple technique, it's not foolproof. If someone really wants to bypass this, it's easy to look at the source (i.e. not suitable for the Google CAPTCHA) but it should fool most bots.
Add 2 or more form fields like this:
<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />
Then use CSS to hide them:
.hideme {
display: none;
}
On submit check to see if those form fields have any data in them, if they do fail the form post. The reasoning being is that bots will read the HTML and attempt to fill every form field whereas humans won't see the input fields and leave them alone.
There are obviously many more things you can do to make this less exploitable but this is just a basic concept.

Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.
E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh
In this case "stack" would be the CAPTCHA.
There are obviously numerous variations on this idea.
Edit: Example variations to address some of the potential problems identified with this idea:
using randomly coloured letters instead of bold/italic.
using every second red letter for the CAPTCHA (reduces the possibility of bots identifying differently formatted letters to guess the CAPTCHA)

Although this similar discussion was started:
We are trying this solution on one of our frequently data mined applications:
A Better CAPTCHA Control (Look Ma - NO IMAGE!)
You can see it in action on our Building Inspections Search.
You can view Source and see that the CAPTCHA is just HTML.

I know that no one will read this, but what about the dog or cat CAPTCHA?
You need to say which one is a cat or a dog, machines can't do this..
http://research.microsoft.com/asirra/
Is a cool one..

I just use simple questions that anyone can answer:
What color is the sky?
What color is an orange?
What color is grass?
It makes it so that someone has to custom program a bot to your site, which probably isn't worth the effort. If they do, you just change the questions.

I personally do not like CAPTCHA it harms usability and does not solve the security issue of making valid users invalid.
I prefer methods of bot detection that you can do server side. Since you have valid users (thanks to OpenID) you can block those who do not "behave", you just need to identify the patterns of a bot and match it to patterns of a typical user and calculate the difference.
Davies, N., Mehdi, Q., Gough, N. : Creating and Visualising an Intelligent NPC using Game Engines and AI Tools http://www.comp.glam.ac.uk/ASMTA2005/Proc/pdf/game-06.pdf
Golle, P., Ducheneaut, N. : Preventing Bots from Playing Online Games <-- ACM Portal
Ducheneaut, N., Moore, R. : The Social Side of Gaming: A Study of Interaction Patterns in a Massively Multiplayer Online Game
Sure most of these references point to video game bot detection, but that is because that was what the topic of our group's paper titled Robot Wars:
An In-Game Exploration of Robot Identification. It was not published or anything, just something for a school project. I can email if you are interested. The fact is though that even if it is based on video game bot detection, you can generalize it to the web because there is a user attached to patterns of usage.
I do agree with MusiGenesis 's method of this approach because it is what I use on my website and it does work decently well. The invisible CAPTCHA process is a decent way of blocking most scripts, but that still does not prevent a script writer from reverse engineering your method and "faking" the values you are looking for in javascript.
I will say the best method is to 1) establish a user so that you can block when they are bad, 2) identify an algorithm that detects typical patterns vs. non-typical patterns of website usage and 3) block that user accordingly.

I have some ideas about that I like to share with you...
First Idea to avoid OCR
A captcha that have some hidden part from the user, but the full image is the two code together, so OCR programs and captcha farms reads the image that include the visible and the hidden part, try to decode both of them and fail to submit... - I have all ready fix that one and work online.
http://www.planethost.gr/IdeaWithHiddenPart.gif
Second Idea to make it more easy
A page with many words that the human must select the right one. I have also create this one, is simple. The words are clicable images, and the user must click on the right one.
http://www.planethost.gr/ManyWords.gif
Third Idea with out images
The same as previous, but with divs and texts or small icons. User must click only on correct one div/letter/image, what ever.
http://www.planethost.gr/ArrayFromDivs.gif
Final Idea - I call it CicleCaptcha
And one more my CicleCaptcha, the user must locate a point on an image. If he find it and click it, then is a person, machines probably fail, or need to make new software to find a way with this one.
http://www.planethost.gr/CicleCaptcha.gif
Any critics are welcome.

Best captcha ever! Maybe you need something like this for sign-up to keep the riff-raff out.

Recently, I started adding a tag with the name and id set to "message". I set it to hidden with CSS (display:none). Spam bots see it, fill it in and submit the form. Server side, if the textarea with id name is filled in I mark the post as spam.
Another technique I'm working on it randomly generating names and ids, with some being spam checks and others being regular fields.
This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)

Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.
Sufficiently simple, and it will be not difficult to code around it. I see two threats here:
random spambots and the human spambots that might back them up; and
bots created to game Stack Overflow
With simple arithmetics, you might beat off threat #1, but not threat #2.

I've had amazingly good results with a simple "Leave this field blank:" field. Bots seem to fill in everything, particularly if you name the field something like "URL". Combined with strict referrer checking, I've not had a bot get past it yet.
Please don't forget about accessibility here. Captchas are notoriously unusable for many people using screen readers. Simple math problems, or very trivial trivia (I liked the "what color is the sky" question) are much more friendly to vision-impaired users.

Simple text sounds great. Bribe the community to do the work! If you believe, as I do, that SO rep points measure a user's commitment to helping the site succeed, it is completely reasonable to offer reputation points to help protect the site from spammers.
Offer +10 reputation for each contribution of a simple question and a set of correct answers. The question should suitably far away (edit distance) from all existing questions, and the reputation (and the question) should gradually disappear if people can't answer it. Let's say if the failure rate on correct answers is more than 20%, then the submitter loses one reputation point per incorrect answer, up to a maximum of 15. So if you submit a bad question, you get +10 now but eventually you will net -5. Or maybe it makes sense to ask a sample of users to vote on whether the captcha questionis a good one.
Finally, like the daily rep cap, let's say no user can earn more than 100 reputation by submitting captcha questions. This is a reasonable restriction on the weight given to such contributions, and it also may help prevent spammers from seeding questions into the system. For example, you could choose questions not with equal probability but with a probability proportional to the submitter's reputation. Jon Skeet, please don't submit any questions :-)

Make an AJAX query for a cryptographic nonce to the server. The server sends back a JSON response containing the nonce, and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in JavaScript, copy the value into a hidden field. When the user POSTs the form, they now send the cookie back with the nonce value. Calculate the SHA1 hash of the nonce from the cookie, compare to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
This technique requires that the spammer sits down and figures out what's going on, and once they do, they still have to fire off multiple requests and maintain cookie state to get a comment through. Plus they only ever see the Set-Cookie header if they parse and execute the JavaScript in the first place and make the AJAX request. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam. Which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.
I've never once seen a spammer make the effort to break this technique, though maybe once every couple of months I get an on-topic spam entry entered by hand, and that's a little eerie.

1) Human solvers
All mentioned here solutions are circumvented by human solvers approach. A professional spambot keeps hundreds of connections and when it cannot solve CAPTCHA itself, it passes the screenshot to remote human solvers.
I frequently read that human solvers of CAPTCHAs break the laws. Well, this is written by those who do not know how this (spamming) industry works.
Human solvers do not directly interact with sites which CAPTCHAs they solve. They even do not know from which sites CAPTCHAs were taken and sent them. I am aware about dozens (if not hundreds) companies or/and websites offering human solvers services but not a single one for direct interaction with boards being broken.
The latter do not infringe any law, so CAPTCHA solving is completely legal (and officialy registered) business companies. They do not have criminal intentions and might, for example, have been used for remote testing, investigations, concept proofing, prototypong, etc.
2) Context-based Spam
AI (Artificial Intelligent) bots determine contexts and maintain context sensitive dialogues at different times from different IP addresses (of different countries). Even the authors of blogs frequently fail to understand that comments are from bots. I shall not go into many details but, for example, bots can webscrape human dialogues, stores them in database and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even humans.
The most voted answer telling:
*"The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting"*
as well honeypot answer and most answers in this thread are just plain wrong.
I daresay they are victim-doomed approaches
Most spambots work through local and remote javascript-aware (patched and managed) browsers from different IPs (of different countries) and they are quite clever to circumvent honey traps and honey pots.
The different problem is that even blog owners cannot frequently detect that comments are from bot since they are really from human dialogs and comments harvested from other web boards (forums, blog comments, etc)
3) Conceptually New Approach
Sorry, I removed this part as precipitated one

Actually it could be an idea to have a programming related captcha set. For example:
There is the possibility of someone building a syntax checker to bypass this but it's a lot more work to bypass a captcha. You get the idea of having a related captcha though.

What if you used a combination of the captcha ideas you had (choose any of them - or select one of them randomly):
ASCII text captcha: //(_)//
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
with the addition of placing the exact same captcha in a css hidden section of the page - the honeypot idea. That way, you'd have one place where you'd expect the correct answer and another where the answer should be unchanged.

I have to admit that I have no experience fighting spambots and don't really know how sophisticated they are. That said, I don't see anything in the jQuery article that couldn't be accomplished purely on the server.
To rephrase the summary from the jQuery article:
When generating the contact form on the server ...
Grab the current time.
Combine that timestamp, plus a secret word, and generate a 32 character 'hash' and store it as a cookie on the visitor's browser.
Store the hash or 'token' timestamp in a hidden form tag.
When the form is posted back, the value of the timestamp will be compared to the 32 character 'token' stored in the cookie.
If the information doesn't match, or is missing, or if the timestamp is too old, stop execution of the request ...
Another option, if you want to use the traditional image CAPTCHA without the overhead of generating them on every request is to pre-generate them offline. Then you just need to randomly choose one to display with each form.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string