Related
I would like to implement CQRS and ES using Axon framework
I've got a pretty complex HTML form which represents recruitment process with six steps.
ES would be helpful to generate historical statistics for selected dates and track changes in form.
Admin can always perform several operations:
assign person responsible for each step
provide notes for each step
accept or reject candidate on every step
turn on/off SMS or email notifications
assign tags
Form update (difference only) is sent from UI application to backend.
Assuming I want to make changes only for servers side application, question is what should be a Command and what should be an Event, I consider three options:
Form patch is a Command which generates Form Update Event
Drawback of this solution is that each event handler needs to check if changes in form refers to this handler ex. if email about rejection should be sent
Form patch is a Command which generates several Events ex:. Interviewer Assigned, Notifications Turned Off, Rejected on technical interview
Drawback of this solution is that some events could be generated and other will not because of breaking constraints ex: Notifications Turned Off will succeed but Interviewer Assigned will fail due to assigning unauthorized user. Maybe I should check all constraints before commands generation ?
Form patch is converted to several Commands ex: Assign Interviewer, Turn Off Notifications and each command generates event ex: Interviewer Assigned, Notifications Turned Off
Drawback of this solution is that some commands can fail ex: Assign Interviewer can fail due to assigning unauthorized user. This will end up with inconsistent state because some events would be stored in repository, some will not. Maybe I should check all constraints before commands generation ?
The question I would call your attention to: are you creating an authority for the information you store, or are you just tracking information from the outside world?
Udi Dahan wrote Race Conditions Don't Exist; raising this interesting point
A microsecond difference in timing shouldn’t make a difference to core business behaviors.
If you have an unauthorized user in your system, is it really critical to the business that they be authorized before they are assigned responsibility for a particular step? Can the system really tell that the "fault" is that the responsibility was assigned to the wrong user, rather than that the user is wrongly not authorized?
Greg Young talks about exception reports in warehouse systems, noting that the responsibility of the model in that case is not to prevent data changes, but to report when a data change has produced an inconsistent state.
What's the cost to the business if you update the data anyway?
If the semantics of the message is that a Decision Has Been Made, or that Something In The Real World Has Changed, then your model shouldn't be trying to block that information from being recorded.
FormUpdated isn't a particularly satisfactory event, for the reason you mention; you have to do a bunch of extra work to cast it in domain specific terms. Given a choice, you'd prefer to do that once. It's reasonable to think in terms of translating events from domain agnostic forms to domain specific forms as you go along.
HttpRequestReceived ->
FormSubmitted ->
InterviewerAssigned
where the intermediate representations are short lived.
I can see one big drawback of the first option. One of the biggest advantage of CQRS/ES with Axon is scalability. We can add new features without worring about regression bugs. Adding new feature is the result of defining new commands, event and handlers for both of them. None of them should not iterfere with ones existing in our system.
FormUpdate as a command require adding extra logic in one of the handler. Adding new attribute to patch and in consequence to command will cause changes in current logic. Scalability is no longer advantage in that case.
VoiceOfUnreason is giving a very good explanation what you should think about when starting with such a system, so definitely take a look at his answer.
The only thing I'd like to add, is that I'd suggest you take the third option.
With the examples you gave, the more generic commands/events don't tell that much about what's happening in your domain. The more granular events far better explain what exactly has happened, as the event message its name already points it out.
Pulling Axon Framework in to the loop, I can also add a couple of pointers.
From a command message perspective, it's safe to just take a route and not over think it to much. The framework quite easily allows you to adjust the command structure later on. In Axon Framework trainings it is typically suggested to let a command message take the form of a specific action you're performing. So 'assigning a person to a step would typically be a AssignPersonToStepCommand, as that is the exact action you'd like the system to perform.
From events it's typically a bit nastier to decide later on that you want fine grained or generic events. This follows from doing Event Sourcing. Since the events are your source of truth, you'll thus be required to deal with all forms of events you've got in your system.
Due to this I'd argue that the weight of your decision should lie with how fine grained your events become. To loop back to your question: in the example you give, I'd say option 3 would fit best.
I have a couple of questions regarding if the following process can be considered use-cases.
Website where establishments can post events.
User can "follow" establishment, "attend" event.
etc...
On my index page, i have the following sections:
Recommended Events, Events recently created, Events from establishments that the user "follows", Top 10 establishments, Recent comments, Popular events and so on.. (all which i am pulling from a database)
Would the index page be considered a use case? And would all the sections i named be individual use-cases? Considering i already have a Consult establishment, and consult event use-case, would all the section fall into this category?
I have on the establishment page a button where the user can click and the user will follow the establishment and receive notifications. All the button does once clicked, is adds the user to a table (User_Preferences), pretty much like a "like" button or a follow button.
Would this be considered a use-case(Add to Preferences use case)?
When i visit an establishment page, i am pulling data from many tables, such as: beverages, music, artists_attended, food, etc.
On the use-case Consult establishment, would i need to include every individual information? consult_beverage, consults_music,consult_artist, consult food all included to consult establishment? or are they considered already in consult establishment?
Finally, would every page i create, Index,Establishment,Events,UserProfile, etc... would they all be considered a use-case? Consult Establishment, Consult Events, Manage Profile
thank you, any tips or help would be appreciated, i understand the concept of use cases, but i sometimes tend to overthink some uses cases. thanks for the help.
The index page itself is not a use case. A use case represents some interaction between an actor and the system, but the page and its sections are part of the system design. If you were to replace the web browser with a custom-written GUI application, the use cases should be essentially the same.
In this case, you seem to be creating the use cases after you've designed the system, which is probably what's tripping you up -- use cases are usually determined before the system is designed.
"Add to Preferences" seems like a good use case. How much work goes into realizing the use case is normally of little importance; what matters is whether the interaction provides some value to the actor. The complete set of use cases describes what the user can do with the system, not how many engineering hours were spent constructing it.
You should not incorporate details on the stored data in your use cases. If you find yourself doing that you need to take a step back and try to think a little more abstractly. What does the use case do for the actor, what does the actor want? To get information about an establishment? Then that's enough, you don't need to specify the exact information stored in the system. The important thing is that the actor wants the information and that the system provides it.
Use cases are part of system analysis, not system design. As such, there is no problem with having the same design component (page) realize several use cases. So you could for instance have use cases for "see recommended events", "see coming events for 'followed' establishments", "see coming 'attended' events", all being realized in different sections on the same page.
A page is never any use case. A use case is what brings value to an actor. Simple as that. If you can name the value then you got the name of the use case. If you can't name the value then you don't have a use case.
E.g. your 1st events page: I would assume that the use case behind it will be Find Event. Similarly you have to think of the other cases. On the opposite Login to Site is not a use case because it does not bring any value to the actor.
I'm developing a feature for a client in which users voluntarily take an important test online. The test is difficult and the users will be highly motivated to do well (think SATs or GRE, etc)... so there's also a high incentive to cheat. Apparently there are 3rd party services in which a human virtually monitors the test taker via a webcam, but they're really expensive and we don't quite have the budget. We still need to make it as hard as possible for a user to game the system. Some of the things we suspect they might try are:
Getting someone else to take the test for them (a pinch hitter).
Taking the test multiple times with different profiles to practice
and gain an unfair advantage.
Taking the test alongside friends or while in contact with a friends
to tell them the answers.
The question order will change, as well as the order of the answers. The test will be timed, and an "open book" format, so we're not really worried about the user looking things up online, but we can't have them sharing their screen and having others assist them. So the main concern at this point is ensuring that the user is, in fact, who they say they are (and not someone else).
Here are a few of the security measures we're considering:
Requiring the user's device to have a webcam, which we'll activate and either record/photograph the user during the test (with the user's consent of course).
Asking users to verify an arbitrary bank deposit amount (presumably via PayPal). There's nothing to stop them from opening up multiple bank accounts, but at least it's a big hassle.
Really scary terms of use that threaten legal action if the user is caught cheating.
QUESTION: Are there any other measure we can/should take to make sure our test is secure and the results are reliable?
CLARIFICATION: We realize that with enough resources and determination, any security system can eventually be beaten. The goal of this question is not to find a magically unbeatable solution, but to find ways to raise the stakes enough so that it won't be worth it for most users to cheat. In this spirit, I'd much prefer answers that focus on what can be done as opposed to what can't.
As you know there are many ways of cheating. Your goal is limit the possibility of cheating as much as possible. Cheating in online courses has been a hot topic.
A pinch hitter:
This type of attack can be conducted a number of ways. Even if you have a cam looking at the person, the video that the test taker is seeing could be mirrored on another screen. A pinch hitter could see the question and just read him the answers or otherwise feed answers the test taker in a covert channel.
Possible counters to this attack is to also enable the mic to see if they are talking to anyone. You can also record the screen while they take the test. This could prevent them from opening a chat window or viewing other unauthorized content. (Kind of like the Elance tracker)
user verification:
In order to register the person should attach a scanned copy of their photo-id. This way you are linking a photo of the person to a unique identifier, such as a drivers license number. Before the person starts taking the test, ask the user to look directly at the camera and make sure you get a good image of them that can be verified against their photo id.
A simple attack against this system is to use photoshop to modify the id. To make this attack more difficult you could verify their name against a credit/debit card transaction. The names should match on both cards.
An evercookie could be used to track machines to see if the same computer is being used. This could happen though legitimate reasons, but it could also be used to flag tests for further review. A variant on the evercookie is to drop a file with a random value or set a registry key with a random value to "mark" that machine.
As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
What are some popular spam prevention methods besides CAPTCHA?
I have tried doing 'honeypots' where you put a field and then hide it with CSS (marking it as 'leave blank' for anyone with stylesheets disabled) but I have found that a lot of bots are able to get past it very quickly. There are also techniques like setting fields to a certain value and changing them with JS, calculating times between load time and submit time, checking the referer URL, and a million other things. They all have their pitfalls and pretty much all you can hope for is to filter as much as you can with them while not alienating who you're here for: the users.
At the end of the day, though, if you really, really, don't want bots to be sending things through your form you're going to want to put a CAPTCHA on it - best one I've seen that takes care of mostly everything is reCAPTCHA - but thanks to India's CAPTCHA solving market and the ingenuity of spammers everywhere that's not even successful all of the time. I would beware using something that is 'ingenious' but kind of 'out there' as it would be more of a 'wtf' for users that are at least somewhat used to your usual CAPTCHAs.
Shocking, but almost every response here included some form of CAPTCHA. The OP wanted something different, I guess maybe he wanted something that actually works, and maybe even solves the real problem.
CAPTCHA doesn't work, and even if it did - its the wrong problem - humans can still flood your system, and by definition CAPTCHA wont stop that (cuz its designed only to tell if you're a human or not - not that it does that well...)
So, what other solutions are there? Well, it depends... on your system and your needs.
For instance, if all you're trying to do is limit how many times a user can fill out a "Contact Me" form, you can simply throttle how many requests each user can submit per hour/day/whatever. If your users are anonymous, maybe you need to throttle according to IP addresses, and occasionally blacklist an IP (though this too can be circumvented, and causes other problems).
If you're referring to a forum or blog comments (such as this one), well the more I use it the more I like the solution. A mix between authenticated users, authorization (based on reputation, not likely to be accumulated through flooding), throttling (how many you can do a day), the occasional CAPTCHA, and finally community moderation to cleanup the few that get through - all combine to provide a decent solution. (I wonder if Jeff can provide some info on how much spam and other malposts actually get through...?)
Another control to consider (dont know if they have it here), is some form of IDS/IPS - if you can detect and recognize spam, you can block THAT pattern. Moderation fills that need manually, here...
Note that any one of these does not prevent the spam, but incrementally lowers the probability, and thus the profitability. This changes the economic equation, and leaves CAPTCHA to actually provide enough value to be worth it - since its no longer worth it for the spammers to bother breaking it or going around it (thanks to the other controls).
Give the user the possibility to calculate:
What is the sum of 3 and 8?
By the way: Just surfed by an interesting approach of Microsoft Research: Asirra.
http://research.microsoft.com/asirra/
It shows you several pictures and you have to identify the pictures with a given motif.
Try Akismet
Captchas or any form of human-only questions are horrible from a usability perspective. Sometimes they're necessary, but I prefer to kill spam using filters like Akismet.
Akismet was originally built to thwart spam comments on WordPress blogs, but the API is capabable of being adapted for other uses.
Update: We've started using the ruby library Rakismet on our Rails app, Yarp.com. So far, it's been working great to thwart the spam bots.
A very simple method which puts no load on the user is just to disable the submit button for a second after the page has been loaded. I used it on a public forum which had continuous spam posts, and it stopped them since.
Ned Batchelder wrote up a technique that combines hashes with honeypots for some wickedly effective bot-prevention. No captchas, just code.
It's up at Stopping spambots with hashes and honeypots:
Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.
This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all. It can be augmented with content-based prevention such as Akismet, but I find it works very well all by itself.
http://chongqed.org/ maintains blacklists of active spam sources and the URLs being advertised in the spams. I have found filtering posts for the latter to be very effective in forums.
The most common ones I've observed orient around user input to solve simple puzzles e.g. of the following is a picture of a cat. (displaying pictures of thumbnails of dogs surrounding a cat). Or simple math problems.
While interesting I'm sure the arms race will also overwhelm those systems too.
You can use Recaptcha to at least make a captcha useful. Then you can make questions with simple verbal math problems or similar. Microsoft's Asirra makes you find pics of cats and dogs. Requiring a valid email address to activate an account stops spammers when they wouldn't get enough benefit from the service, but might deter normal users as well.
The following is unfeasible with today's technology, but I don't think it's too far off. It's also probably overkill for dealing with forum spam, but could be useful for account sign-ups, or any situation where you wanted to be really sure you were dealing with humans and they would be prepared for it to take a few minutes to complete the process.
Have 2 users who are trying to prove themselves human connect to each other via their webcams and ask them if the person they are seeing is human and live (i.e. not a recording), by getting them to, for example, mirror each other's movements, or write something on a piece of paper. Get everyone to do this a few times with different users, and throw a few recordings into the mix which they also have to identify correctly as such.
A popular method on forums is to simply queue the threads of members with less than 10 posts in a moderation queue. Of course, this doesn't help if you don't have moderators, or it's not a forum. A more general method is the calculation of hyperlink to text ratios. Often, spam posts contain a ton of hyperlinks, and you can catch a lot this way. In the same vein is comparing the content of consecutive posts. Simply do not allow consecutive posts that are extremely similar.
Of course, anyone with knowledge of the measures you take is going to be able to get around them. To be honest, there is little you can do if you are the target of a specific attack. Rather, you should focus on preventing more general, unskilled attacks.
For human moderators it surely helps to be able to easily find and delete all posts from some IP, or all posts from some user if the bot is smart enough to use a registered account. Likewise the option to easily block IP addresses or accounts for some time, without further administration, will lessen the administrative burden for human moderators.
Using cookies to make bots and human spammers believe that their post is actually visible (while only they themselves see it) prevents them (or trolls) from changing techniques. Let the spammers and trolls see the other spam and troll messages.
Javascript evaluation techniques like this Invisible Captcha system require the browser to evaluate Javascript before the page submission will be accepted. It falls back nicely when the user doesn't have Javascript enabled by just displaying a conventional CAPTCHA test.
Animated captchas' - scrolling text - still easy to recognize by humans but if you make sure that none of the frames offer something complete to recognize.
multiple choice question - All it takes is a ______ and a smile. idea here is that the user will have to choose/understand.
session variable - checking that a variable you put into a session is part of the request. will foil the dumb bots that simply generate requests but probably not the bots that are modeled like a browser.
math question - 2 + 5 = - this again is to ask a question that is easy to solve but prevents the bots ability to generate a response.
image grid - you create grid of images - select 1 or 2 of a particular type such as 3x3 grid picture of animals and you have to pick out all the birds on the grid.
Hope this gives you some ideas for your new solution.
A friend has the simplest anti-spam method, and it works.
He has a custom text box which says "please type in the number 4".
His blog is rather popular, but still not popular enough for bots to figure it out (yet).
Please remember to make your solution accessible to those not using conventional browsers. The iPhone crowd are not to be ignored, and those with vision and cognitive problems should not be excluded either.
Honeypots are one effective method. Phil Haack gives one good honeypot method, that could be used in principle for any forum/blog/etc.
You could also write a crawler that follows spam links and analyzes their page to see if it's a genuine link or not. The most obvious would be pages with an exact copy of your content, but you could pick out other indicators.
Moderation and blacklisting, especially with plugins like these ones for WordPress (or whatever you're using, similar software is available for most platforms), will work in a low-volume environment. If your environment is a low volume one, don't underestimate the advantage this gives you. Personally deciding what is reasonable content and what isn't gives you ultimate flexibility in spam control, if you have the time.
Don't forget, as others have pointed out, that CAPTCHAs are not limited to text recognition from an image. Visual association, math problems, and other non-subjective questions relayed through an image also qualify.
Sblam is an interesting project.
Invisble form fields. Make a form field that doesn't appear on the screen to the user. using display: none as a css style so that it doesn't show up. For accessibility's sake, you could even put hidden text so that people using screen readers would know not to fill it in. Bots almost always fill in all fields, so you could block any post that filled in the invisible field.
Block access based on a blacklist of spammers IP addresses.
Honeypot techniques put an invisible decoy form at the top of the page. Users don't see it and submit the correct form, bots submit the wrong form which does nothing or bans their IP.
I've seen a few neat ideas along the lines of Asira which ask you to identify which pictures are cats. I believe the idea originated from KittenAuth a while ago..
Use something like the google image labeler with appropriately chosen images such that a computer wouldn't be able to recognise the dominant features of it that a human could.
The user would be shown an image and would have to type words associated with it. They would keep being shown images until they have typed enough words that agreed with what previous users had typed for the same image. Some images would be new ones that they weren't being tested against, but were included to record what words are associated with them. Depending on your audience you could also possibly choose images that only they would recognise.
Mollom is supposedly good at stopping spam. Both personal (free) and professional versions are available.
I know some people mentioned ASIRRA, but if you go to all the adopt me links for the images, it will say on that linked page if its a cat or dog. So it should be relatively easy for a bot to just go to all the adoptme links. So its just a matter of time for that project.
just verify the email address and let google/yahoo etc worry about it
You could get some device ID software the41 has some fraud prevention software that can detect the hardware being used to access your site. I belive they use it to catch fraudsters but could be used to stop bots. Once you have identified an device being used by a bot you can just block that device. Last time a checked it can even trace your route throught he phone network ( Not your Geo-IP !! ) so can even block a post code if you want.
Its expensive through so prop. a better cheaper solution that is a little less big brother.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
It looks like we'll be adding CAPTCHA support to Stack Overflow. This is necessary to prevent bots, spammers, and other malicious scripted activity. We only want human beings to post or edit things here!
We'll be using a JavaScript (jQuery) CAPTCHA as a first line of defense:
http://docs.jquery.com/Tutorials:Safer_Contact_Forms_Without_CAPTCHAs
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
However, for people with JavaScript disabled, we still need a fallback and this is where it gets tricky.
I have written a traditional CAPTCHA control for ASP.NET which we can re-use.
However, I'd prefer to go with something textual to avoid the overhead of creating all these images on the server with each request.
I've seen things like..
ASCII text captcha: \/\/(_)\/\/
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
Maybe I'm just tilting at windmills here, but I'd like to have a less resource intensive, non-image based <noscript> compatible CAPTCHA if possible.
Ideas?
My favourite CAPTCHA ever:
A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:
<input type="hidden" name="antispam" value="lalalala" />
I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:
var antiSpam = function() {
if (document.getElementById("antiSpam")) {
a = document.getElementById("antiSpam");
if (isNaN(a.value) == true) {
a.value = 0;
} else {
a.value = parseInt(a.value) + 1;
}
}
setTimeout("antiSpam()", 1000);
}
antiSpam();
Then when the form is submitted, If the antispam value is still "lalalala", then I mark it as spam. If the antispam value is an integer, I check to see if it is above something like 10 (seconds). If it's below 10, I mark it as spam, if it's 10 or more, I let it through.
If AntiSpam = A Integer
If AntiSpam >= 10
Comment = Approved
Else
Comment = Spam
Else
Comment = Spam
The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting
The downside to this method is that it requires JavaScript, and if you don't have JavaScript enabled, your comment will be marked as spam, however, I do review comments marked as spam, so this is not a problem.
Response to comments
#MrAnalogy: The server side approach sounds quite a good idea and is exactly the same as doing it in JavaScript. Good Call.
#AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.
Unless I'm missing something, what's wrong with using reCAPTCHA as all the work is done externally.
Just a thought.
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
I like this idea, is there not any way we can just hook into the rep system? I mean, anyone with say +100 rep is likely to be a human. So if they have rep, you need not even bother doing ANYTHING in terms of CAPTCHA.
Then, if they are not, then send it, I'm sure it wont take that many posts to get to 100 and the community will instantly dive on anyone seem to be spamming with offensive tags, why not add a "report spam" link that downmods by 200? Get 3 of those, spambot achievement unlocked, bye bye ;)
EDIT: I should also add, I like the math idea for the non-image CAPTCHA. Or perhaps a simple riddle-type-thing. May make posting even more interesting ^_^
What about a honeypot captcha?
Avoid the worst CAPTCHAs of all time.
Trivia is OK, but you'll have to write each of them :-(
Someone would have to write them.
You could do trivia questions in the same way ReCaptcha does printed words. It offers two words, one of which it knows the answer to, another which it doesn't - after enough answers on the second, it now knows the answer to that too. Ask two trivia questions:
A woman needs a man like a fish needs a?
Orange orange orange. Type green.
Of course, this may need to be coupled with other techniques, such as timers or computed secrets. Questions would need to be rotated/retired, so to keep the supply of questions up you could ad-hoc add:
Enter your obvious question:
You don't even need an answer; other humans will figure that out for you. You may have to allow flagging questions as "too hard", like this one: "asdf ejflf asl;jf ei;fil;asfas".
Now, to slow someone who's running a StackOverflow gaming bot, you'd rotate the questions by IP address - so the same IP address doesn't get the same question until all the questions are exhausted. This slows building a dictionary of known questions, forcing the human owner of the bots to answer all of your trivia questions.
So, CAPTCHA is mandatory for all users
except moderators. [1]
That's incredibly stupid. So there will be users who can edit any post on the site but not post without CAPTCHA? If you have enough rep to downvote posts, you have enough rep to post without CAPTCHA. Make it higher if you have to. Plus there are plenty of spam detection methods you can employ without image recognition, so that it even for unregistered users it would never be necessary to fill out those god-forsaken CAPTCHA forms.
I saw this once on a friend's site. He is selling it for 20 bucks. It's ASCII art!
http://thephppro.com/products/captcha/
.oooooo. oooooooo
d8P' `Y8b dP"""""""
888 888 d88888b.
888 888 V `Y88b '
888 888 ]88
`88b d88' o. .88P
`Y8bood8P' `8bd88P'
CAPTCHA, in its current conceptualization, is broken and often easily bypassed. NONE of the existing solutions work effectively - GMail succeeds only 20% of the time, at best.
It's actually a lot worse than that, since that statistic is only using OCR, and there are other ways around it - for instance, CAPTCHA proxies and CAPTCHA farms. I recently gave a talk on the subject at OWASP, but the ppt is not online yet...
While CAPTCHA cannot provide actual protection in any form, it may be enough for your needs, if what you want is to block casual drive-by trash. But it won't stop even semi-professional spammers.
Typically, for a site with resources of any value to protect, you need a 3-pronged approach:
Throttle responses from authenticated users only, disallow anonymous posts.
Minimize (not prevent) the few trash posts from authenticated users - e.g. reputation-based. A human moderator can also help here, but then you have other problems - namely, flooding (or even drowning) the moderator, and some sites prefer the openness...
Use server-side heuristic logic to identify spam-like behavior, or better non-human-like behavior.
CAPTCHA can help a TINY bit with the second prong, simply because it changes the economics - if the other prongs are in place, it no longer becomes worthwhile to bother breaking through the CAPTCHA (minimal cost, but still a cost) to succeed in such a small amount of spam.
Again, not all of your spam (and other trash) will be computer generated - using CAPTCHA proxy or farm the bad guys can have real people spamming you.
CAPTCHA proxy is when they serve your image to users of other sites, e.g. porn, games, etc.
A CAPTCHA farm has many cheap laborers (India, far east, etc) solving them... typically between 2-4$ per 1000 captchas solved. Recently saw a posting for this on Ebay...
Be sure it isn't something Google can answer though. Which also shows an issue with that --order of operations!
What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trust-worthy person to start the web I suggest using this CAPTCHA to make sure he is absolutely and 100% human.
Rapidshare CAPTCHA - Riemann Hypothesis http://codethief.eu/kram/_/rapidshare_captcha2.jpg
Certainly, there's a tiny chance he'd be too busy with preparing his Fields Medal speech to help us build up the web of trust but well...
Asirra is the most adorable captcha ever.
Just make the user solve simple arithmetic expressions:
2 * 5 + 1
2 + 4 - 2
2 - 2 * 3
etc.
Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:
import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby
Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.
I've been using the following simple technique, it's not foolproof. If someone really wants to bypass this, it's easy to look at the source (i.e. not suitable for the Google CAPTCHA) but it should fool most bots.
Add 2 or more form fields like this:
<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />
Then use CSS to hide them:
.hideme {
display: none;
}
On submit check to see if those form fields have any data in them, if they do fail the form post. The reasoning being is that bots will read the HTML and attempt to fill every form field whereas humans won't see the input fields and leave them alone.
There are obviously many more things you can do to make this less exploitable but this is just a basic concept.
Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.
E.g. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh
In this case "stack" would be the CAPTCHA.
There are obviously numerous variations on this idea.
Edit: Example variations to address some of the potential problems identified with this idea:
using randomly coloured letters instead of bold/italic.
using every second red letter for the CAPTCHA (reduces the possibility of bots identifying differently formatted letters to guess the CAPTCHA)
Although this similar discussion was started:
We are trying this solution on one of our frequently data mined applications:
A Better CAPTCHA Control (Look Ma - NO IMAGE!)
You can see it in action on our Building Inspections Search.
You can view Source and see that the CAPTCHA is just HTML.
I know that no one will read this, but what about the dog or cat CAPTCHA?
You need to say which one is a cat or a dog, machines can't do this..
http://research.microsoft.com/asirra/
Is a cool one..
I just use simple questions that anyone can answer:
What color is the sky?
What color is an orange?
What color is grass?
It makes it so that someone has to custom program a bot to your site, which probably isn't worth the effort. If they do, you just change the questions.
I personally do not like CAPTCHA it harms usability and does not solve the security issue of making valid users invalid.
I prefer methods of bot detection that you can do server side. Since you have valid users (thanks to OpenID) you can block those who do not "behave", you just need to identify the patterns of a bot and match it to patterns of a typical user and calculate the difference.
Davies, N., Mehdi, Q., Gough, N. : Creating and Visualising an Intelligent NPC using Game Engines and AI Tools http://www.comp.glam.ac.uk/ASMTA2005/Proc/pdf/game-06.pdf
Golle, P., Ducheneaut, N. : Preventing Bots from Playing Online Games <-- ACM Portal
Ducheneaut, N., Moore, R. : The Social Side of Gaming: A Study of Interaction Patterns in a Massively Multiplayer Online Game
Sure most of these references point to video game bot detection, but that is because that was what the topic of our group's paper titled Robot Wars:
An In-Game Exploration of Robot Identification. It was not published or anything, just something for a school project. I can email if you are interested. The fact is though that even if it is based on video game bot detection, you can generalize it to the web because there is a user attached to patterns of usage.
I do agree with MusiGenesis 's method of this approach because it is what I use on my website and it does work decently well. The invisible CAPTCHA process is a decent way of blocking most scripts, but that still does not prevent a script writer from reverse engineering your method and "faking" the values you are looking for in javascript.
I will say the best method is to 1) establish a user so that you can block when they are bad, 2) identify an algorithm that detects typical patterns vs. non-typical patterns of website usage and 3) block that user accordingly.
I have some ideas about that I like to share with you...
First Idea to avoid OCR
A captcha that have some hidden part from the user, but the full image is the two code together, so OCR programs and captcha farms reads the image that include the visible and the hidden part, try to decode both of them and fail to submit... - I have all ready fix that one and work online.
http://www.planethost.gr/IdeaWithHiddenPart.gif
Second Idea to make it more easy
A page with many words that the human must select the right one. I have also create this one, is simple. The words are clicable images, and the user must click on the right one.
http://www.planethost.gr/ManyWords.gif
Third Idea with out images
The same as previous, but with divs and texts or small icons. User must click only on correct one div/letter/image, what ever.
http://www.planethost.gr/ArrayFromDivs.gif
Final Idea - I call it CicleCaptcha
And one more my CicleCaptcha, the user must locate a point on an image. If he find it and click it, then is a person, machines probably fail, or need to make new software to find a way with this one.
http://www.planethost.gr/CicleCaptcha.gif
Any critics are welcome.
Best captcha ever! Maybe you need something like this for sign-up to keep the riff-raff out.
Recently, I started adding a tag with the name and id set to "message". I set it to hidden with CSS (display:none). Spam bots see it, fill it in and submit the form. Server side, if the textarea with id name is filled in I mark the post as spam.
Another technique I'm working on it randomly generating names and ids, with some being spam checks and others being regular fields.
This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)
Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.
Sufficiently simple, and it will be not difficult to code around it. I see two threats here:
random spambots and the human spambots that might back them up; and
bots created to game Stack Overflow
With simple arithmetics, you might beat off threat #1, but not threat #2.
I've had amazingly good results with a simple "Leave this field blank:" field. Bots seem to fill in everything, particularly if you name the field something like "URL". Combined with strict referrer checking, I've not had a bot get past it yet.
Please don't forget about accessibility here. Captchas are notoriously unusable for many people using screen readers. Simple math problems, or very trivial trivia (I liked the "what color is the sky" question) are much more friendly to vision-impaired users.
Simple text sounds great. Bribe the community to do the work! If you believe, as I do, that SO rep points measure a user's commitment to helping the site succeed, it is completely reasonable to offer reputation points to help protect the site from spammers.
Offer +10 reputation for each contribution of a simple question and a set of correct answers. The question should suitably far away (edit distance) from all existing questions, and the reputation (and the question) should gradually disappear if people can't answer it. Let's say if the failure rate on correct answers is more than 20%, then the submitter loses one reputation point per incorrect answer, up to a maximum of 15. So if you submit a bad question, you get +10 now but eventually you will net -5. Or maybe it makes sense to ask a sample of users to vote on whether the captcha questionis a good one.
Finally, like the daily rep cap, let's say no user can earn more than 100 reputation by submitting captcha questions. This is a reasonable restriction on the weight given to such contributions, and it also may help prevent spammers from seeding questions into the system. For example, you could choose questions not with equal probability but with a probability proportional to the submitter's reputation. Jon Skeet, please don't submit any questions :-)
Make an AJAX query for a cryptographic nonce to the server. The server sends back a JSON response containing the nonce, and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in JavaScript, copy the value into a hidden field. When the user POSTs the form, they now send the cookie back with the nonce value. Calculate the SHA1 hash of the nonce from the cookie, compare to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
This technique requires that the spammer sits down and figures out what's going on, and once they do, they still have to fire off multiple requests and maintain cookie state to get a comment through. Plus they only ever see the Set-Cookie header if they parse and execute the JavaScript in the first place and make the AJAX request. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam. Which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.
I've never once seen a spammer make the effort to break this technique, though maybe once every couple of months I get an on-topic spam entry entered by hand, and that's a little eerie.
1) Human solvers
All mentioned here solutions are circumvented by human solvers approach. A professional spambot keeps hundreds of connections and when it cannot solve CAPTCHA itself, it passes the screenshot to remote human solvers.
I frequently read that human solvers of CAPTCHAs break the laws. Well, this is written by those who do not know how this (spamming) industry works.
Human solvers do not directly interact with sites which CAPTCHAs they solve. They even do not know from which sites CAPTCHAs were taken and sent them. I am aware about dozens (if not hundreds) companies or/and websites offering human solvers services but not a single one for direct interaction with boards being broken.
The latter do not infringe any law, so CAPTCHA solving is completely legal (and officialy registered) business companies. They do not have criminal intentions and might, for example, have been used for remote testing, investigations, concept proofing, prototypong, etc.
2) Context-based Spam
AI (Artificial Intelligent) bots determine contexts and maintain context sensitive dialogues at different times from different IP addresses (of different countries). Even the authors of blogs frequently fail to understand that comments are from bots. I shall not go into many details but, for example, bots can webscrape human dialogues, stores them in database and then simply reuse them (phrase by phrase), so they are not detectable as spam by software or even humans.
The most voted answer telling:
*"The theory being that:
A spam bot will not support JavaScript and will submit what it sees
If the bot does support JavaScript it will submit the form instantly
The commenter has at least read some of the page before posting"*
as well honeypot answer and most answers in this thread are just plain wrong.
I daresay they are victim-doomed approaches
Most spambots work through local and remote javascript-aware (patched and managed) browsers from different IPs (of different countries) and they are quite clever to circumvent honey traps and honey pots.
The different problem is that even blog owners cannot frequently detect that comments are from bot since they are really from human dialogs and comments harvested from other web boards (forums, blog comments, etc)
3) Conceptually New Approach
Sorry, I removed this part as precipitated one
Actually it could be an idea to have a programming related captcha set. For example:
There is the possibility of someone building a syntax checker to bypass this but it's a lot more work to bypass a captcha. You get the idea of having a related captcha though.
What if you used a combination of the captcha ideas you had (choose any of them - or select one of them randomly):
ASCII text captcha: //(_)//
math puzzles: what is 7 minus 3 times 2?
trivia questions: what tastes better, a toad or a popsicle?
with the addition of placing the exact same captcha in a css hidden section of the page - the honeypot idea. That way, you'd have one place where you'd expect the correct answer and another where the answer should be unchanged.
I have to admit that I have no experience fighting spambots and don't really know how sophisticated they are. That said, I don't see anything in the jQuery article that couldn't be accomplished purely on the server.
To rephrase the summary from the jQuery article:
When generating the contact form on the server ...
Grab the current time.
Combine that timestamp, plus a secret word, and generate a 32 character 'hash' and store it as a cookie on the visitor's browser.
Store the hash or 'token' timestamp in a hidden form tag.
When the form is posted back, the value of the timestamp will be compared to the 32 character 'token' stored in the cookie.
If the information doesn't match, or is missing, or if the timestamp is too old, stop execution of the request ...
Another option, if you want to use the traditional image CAPTCHA without the overhead of generating them on every request is to pre-generate them offline. Then you just need to randomly choose one to display with each form.