My Question
What are the best practices for creating a customized report based on user form input? Specifically, how do I create an easy-to-maintain system that takes user input collected in a form and generates multiple paragraphs explaining the results of an analysis?
Background
I am working on a very large multiyear project with a startup (which is my client). My job is to program the analysis and generate reports for users. The pipeline for data looks like this:
Users enter information into a form -> results are calculated based on user input -> reports sharing the analysis are displayed to users.
It is really important to my client that some of the analysis results are displayed in paragraphs with an informal, user-friendly tone. The challenge is that the form and analysis are quite complex and will only get more complex over time. An example of the type of template for the paragraphs looks something like this:
    resultsParagraphText = `Hi ${userName}. We found that the best ice cream flavor for you is ${bestIceCreamFlavor}. These other flavors ${otherFlavors} might be good for you. Here are the reasons why you might enjoy these flavors: ${reasonsWhyGoodFlavors}.
    However, we would not recommend these other flavors ${badFlavors}. Here are the reasons you should avoid these bad flavors: ${reasonsWhyBadFlavors}.`
These results paragraphs, of which there are many, have several minor problems which combined are significant:
If there is a bug in the code, minor visual errors would be visible to end users (capitalization errors, missing/extra commas, and so on).
A lot of string comparisons (e.g. if answers.previousFlavors.includes("Vanilla")) are required to generate the results paragraphs. Minor errors in the forms (e.g. vanilla in the form is not capitalized, so answers.previousFlavors.includes("Vanilla") returns false even when the user enters vanilla) can cause errors in the results paragraph.
Changes in different parts of the project (form, analysis) directly affect how the results paragraph is made. Bad types, differences in string values, and null or undefined values not being caught all have a direct impact on how the results paragraph is made.
There are many edge cases (e.g. what if the user has no other suitable good flavors? Then the sentence These other flavors ${otherFlavors} might be good for you. needs to be excluded; see the sketch after this list).
It is hard to write paragraphs that use templates and still keep an informal tone.
and so on.
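To illustrate the kind of defensive code each paragraph currently needs, here is a simplified sketch of my current approach (the names are illustrative, not the real project code):

    // Normalize free-text form values into canonical tokens once,
    // so downstream comparisons never depend on capitalization.
    const FLAVORS = ["vanilla", "chocolate", "strawberry"];

    function normalizeFlavor(input) {
      const needle = String(input).trim().toLowerCase();
      const canonical = FLAVORS.find((f) => f === needle);
      return canonical ?? null; // unknown values surface as null, not silent misses
    }

    // Assemble the paragraph from optional sentence fragments so
    // edge cases (e.g. no other suitable flavors) drop out cleanly.
    function resultsParagraph({ userName, bestFlavor, otherFlavors }) {
      const sentences = [
        `Hi ${userName}.`,
        `We found that the best ice cream flavor for you is ${bestFlavor}.`,
        otherFlavors.length > 0
          ? `These other flavors ${otherFlavors.join(", ")} might be good for you.`
          : null,
      ];
      return sentences.filter(Boolean).join(" ");
    }

This works, but multiplied across dozens of paragraphs it is exactly the maintenance burden I am asking about.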
I have charts and other ways to display results, and I have explained to the client the challenges of sharing the information in paragraph form.
What I am looking for
I need examples, how-tos, and best practices on how to build a maintainable system for generating customized paragraphs based on user input. I know how to solve each of the individual issues (as they are fairly simple), but in a large project this will become very hard to maintain.
Notes
I have no clue what tags to use for the post. Feel free to edit/add tags if you know more appropriate ones.
The project is planning to use machine learning in other parts of the project in the future. If there is an ML/AI solution that is useful, please tell me.
I am working primarily in JavaScript, Python, C, and R, but if there is a library or tool in any other language, please tell me. Finding a solution is very important to me and I would be willing to learn a lot to find the best solution.
To avoid this question being removed, I have rephrased it to avoid asking for personal opinion, instead asking for existing examples or how-tos. I can also imagine that others might find a solution fairly useful. If you can edit it to make the question less subjective, please do so.
If you have any questions or need clarification feel free to ask. Any help is appreciated.
I'm creating a Book store in Magento and am having trouble figuring out the best way to handle the Authors of a Book (which would be the product).
What I currently have is an Attribute called "authors" which is multi-select and a thousand [test] values. It's still manageable but does get a little slow when editing a product. Also, when adding an option/value to the authors attribute itself, a huge list is rendered in the HTML making this an inefficient solution.
Is there another approach I should take?
Is it possible to create an Author object (entity type?) which is associated to a product through a join table? If yes, can someone give me an explanation about how that is done or point me to some good documentation?
If I'd take the Author object approach, could that still be used in the layered navigation?
How would I show the list of all books for a single author?
Thanks in advance!
PS: I am aware of extensions like Improved Navigation but AFAIK it adds something like attributes to attributes themselves which is not what I'm looking for.
For Googlers: The same would apply for Artists of a music site or manufacturers.
If you create an author entity type, you'll just increase your work trying to add it to layered navigation, and I don't see a reason why it would be faster.
Your approach seems the best fit to the problem, given the way Magento is set up. How are you going to display 1,000 (which presumably pales in comparison to the actual list) authors in layered navigation?
Depending on the requirements, you could go the route of denormalizing the field and accepting text for it. That would still allow you to display it, search based on it, etc, but would eliminate the need to render every possible artist to manipulate the list. You could add a little code around selecting the proper artist (basically add an AJAX autocomplete to the backend field) to minimize typos as well.
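For the autocomplete idea, something along these lines could work in the browser (just a sketch; the /admin/authors/suggest endpoint and the renderSuggestions helper are hypothetical):

    // Debounced autocomplete for the author field: query the server
    // for matching authors instead of rendering the full list.
    const input = document.querySelector("#author-field"); // illustrative selector
    let timer;

    input.addEventListener("input", () => {
      clearTimeout(timer);
      timer = setTimeout(async () => {
        const res = await fetch(
          "/admin/authors/suggest?q=" + encodeURIComponent(input.value)
        );
        const names = await res.json(); // e.g. ["Asimov, Isaac", ...]
        renderSuggestions(names); // hypothetical UI helper
      }, 250);
    });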
Alternatively, you could write a simple utility to add a new artist to the system without some of the overhead of Magento's loading the list. To be honest, though, it seems that the lag that this has the potential to create on the frontend will probably outweigh the backend trouble.
Hope that helps!
Thanks,
Joe
Out of curiosity, I would love to know which tag cloud formats best serve the purpose of discovering more and more (relevant) content.
I am aware of 3 formats, but don't know which one is the best.
1) The Delicious one - color shading.
2) The standard one - font size variations.
3) The one on this site - numbers showing importance/usage.
So which one do you prefer, and why?
Edit:
Thanks to the answers below, I now have much more understanding of tag cloud visualization techniques.
4) Parallel Tag Clouds - a simple use of the parallel coordinates technique. I find it more organized and readable.
5) Voronoi diagram - more useful for identifying tag relationships and making decisions based on them. Doesn't serve our purpose of discovering relevant content.
6) Mind maps - they are good and can be employed to filter content step by step.
I found some more interesting techniques here - http://www.cs.toronto.edu/~ccollins/research/index.html
I really do think that depends on the content of the information and the audience. What's relevant to one is not relevant to another. If an audience is more specialized, then they will be more likely to think along the same lines, but it would still need to be analyzed and catered to by the content provider.
There are also multiple paths that a person can take to "discover more". Take the tag "DNS" for example. You could drill down to more specific details like "UDP Port 53" and "MX Record", or you could go sideways with terms like "IP address", "Hostname", and "URL". A Voronoi diagram shows clusters, but wouldn't handle the case where general terms could be related to many concepts, e.g. Hostname mapping to "DNS", "HTTP", "SSH", etc.
I've noticed that in certain tag clouds there's usually one or two items that are vastly larger than the others. Those sorts of things could be served by a mind map, where one central concept has others radiating out from it.
For the cases of lots of "main topics" where a mind map is inappropriate, there are parallel coordinates but that would be baffling to many net users.
I think that if we found an extremely well organized way of sorting clusters of tags while preserving links between generalities and specificities, that would be somewhat helpful to AI research.
In terms of which I personally prefer, I think the numeric approach is nice because infrequently referenced tags are still presented at a readable font size. I also think SO does it this way because they have vastly more tags to cover than the average size-based cloud a la the standard format.
I would go with #2 out of the options you listed above.
1 - The human eye recognizes and comprehends size differences much more effectively than color, when the color scale is along the same spectrum (i.e., various blues as opposed to discrete individual colors).
3 - Requires the user to scan the full list and mathematically compare each individual number while scanning. There is no real meaningful relationship between tags without a lot of work on the user's part.
So, going with #2, there are several considerations to take into account:
Keep the tags alphabetical. This affords the user another method of searching and establishes a known relationship between each (assuming they know the alphabet!). If they're unordered, it's just a crapshoot to find a single one.
If size comparison is absolutely critical (this usually isn't the case, as you can scale up each level by a certain percentage or pixel amount), use a monospaced font. Otherwise, certain letter combinations may end up looking larger than they actually are.
Don't include any commas, pipes, or other dividers. You're already going to have a lot of data in a small area - no need to clutter it up with debris. Space the tags out with a decent amount of padding, of course. Just don't double the number of visual elements by adding more than just the data.
Set a min/max font size and scale between those. There are situations where one tag may be so popular that visually it may appear exponentially larger than the others. Likewise, you don't want a tag to end up rendering at 1px! Set the min/max and adjust between as necessary.
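For what it's worth, the min/max scaling can be as simple as this (a sketch; the pixel bounds are arbitrary):

    // Map a tag's count onto a bounded font size, so one runaway
    // tag can't blow up the layout and rare tags stay readable.
    function fontSize(count, minCount, maxCount, minPx = 11, maxPx = 28) {
      if (maxCount === minCount) return minPx;
      const t = (count - minCount) / (maxCount - minCount); // 0..1
      return Math.round(minPx + t * (maxPx - minPx));
    }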
Size-adjusted Voronoi diagram - it shows which tags are inter-related.
My favorite tag cloud format is the Wordle format. It looks great and it also does a pretty good job of fitting a lot of tags in a small space.
Most popular web-sites that require you to log in have the authentication form on the right side of the page, more or less. As a right-handed person, I find it rather intuitive to look at and convenient to work with: I don't have to sprain my neck or move my mouse too much to select the username field (though of late, most pages do that by default, immediately after loading completes). Not being omniscient, I wonder how a left-handed person would react to the very same UI. Which begs the question: should flipping the forms for left-handed people not be part of the web-site design goals? Also, I guess it matters what language you are interacting in. For a language like English that reads left to right, having the form on your right probably makes more sense.
Some examples to look at with different layout of auth forms:
Facebook, Gmail, Y! - right
Buzzword - center
SOF - left
Feel free to share your $0.02. I'd also be interested to know if actual research has gone in to this.
Update (02/20): Some excellent posts there. Good time to summarize:
The story so far:
Most web-pages are static in terms of manoeuvrability.
Users have little/no choice on how content is served.
English being the lingua franca of the Internet, web sites have over time ended up using the left-to-right reading order of English as the default. This is in keeping with UI design guidelines.
Being left-handed puts you at unease when using such web-sites (not a general rule perhaps, but people have experienced issues).
Users tend to change habits rather than complain.
Clarification: Some of you seem to have misinterpreted my reference to mouse manoeuvre. It was supposed to serve as an example of what I think I'd take time to get adjusted to if things weren't the way they are. Cheers!
I'm left-handed. Even more, I'm the only one I know who uses the mouse on the left and swaps buttons (so the 'main' button is under my index finger). But I don't see why you think that anything on screen should move 'for our benefit'.
One thing I've found by obsessively observing other left-handed people's habits (I did my first semi-formal poll about this when I was 13 years old) is that there's a lot more variety among us than among right-handed people. So, if you want to do some 'multi-handed' ergonomics, you shouldn't assume anything; just allow for maximum flexibility.
I don't think it has anything to do with what's natural. At least not anymore. Whatever the original reason, it's self-perpetuating now. Most websites have the login form on the right-hand side of the page. Therefore, if you're striving for the goal of "don't make me think," you should put the login form on the right-hand side of the page...
...thus increasing the number of sites that have the login form on the right-hand side of the page and the strength of the suggestion to put the login form on the right-hand side of the page so your users won't have to think...
...thus increasing... you get the idea.
I'd guess the reason for sticking it in the upper-right corner is that it's an important thing to do on a page, but not nearly as important as the title for the page/website, and that goes in the top-left corner. It's all about reading order.
I doubt right-handedness or left-handedness makes any difference. Your hand and neck use independent muscle groups.
UI Design 101 dictates that you orient the controls of your user interface (desktop application, web page, etc.) using the natural reading order of your customers. For English users, this would entail a left-to-right, top-to-bottom approach. That is, the most important information should be in the top-left corner and the least important information should be in the bottom-right corner.
The reason various websites put their login controls at different locations has less to do with conformance with some industry standard than it does with what the website designers perceive to be the most important information.
Take Gmail for example. Google is more concerned with advertising their various products (Gmail, Web History, iGoogle, etc.) to new users than they are about you logging in. Hence, they tout their products in the place that most users look first - the top-left corner. If you've already got an account, you immediately skip over this and type your login credentials on the right-hand side. And remember that once you're logged in, you never see this screen again. With this approach, Google is clearly trying to accommodate new users, not existing users. From a business perspective, this makes sense.
You might find this page interesting.
"As the owner of a website you want people to be able to use your website easily and reach content quickly - which is your ultimate goal. Being consistent with other websites in terms of the positioning of menus and content will help your visitors, give them a better overall experience and reduce the likelihood of closing the browser in annoyance. Once you have confirmed the layout, you can by all means go wild with the content and design."
Having the login box on the right side also allows you to keep the left hand navigation bar the same for people who are logged in and those who are not. See ING Direct's website for an example.
I found this article by Joel very enlightening. The part about conforming with the leaders in your field in order to eschew confusion and frustration is particularly applicable to your question.
I am left-handed but I use the mouse with my right hand, always have... I feel awkward when I use it with my left now.
I'd never really thought about this before... interesting. I think it has to do with organization for a left-to-right language universe.
Since most languages move from left-to-right, it makes sense to have the site expand from the left to the right. Along those same lines, if you need to expand your menus, it makes sense to expand your menus from the left out to the right. However, user links (such as login, profile, etc.) are typically static. If you want to keep them out of the way of the rest of your navigation, better to put them on the other corner of the page - thus the right corner, rather than the left.
Edit: Sorry, I think I misinterpreted your question to mean "why are the login links on the upper right" rather than "why is the whole form on the right side of the page".
I am no expert in design, but I have an opinion on this subject.
The main reason for the alignment on the right probably has to do with the reading order of the majority of existing languages. Most sites don't cater to the minority of people; they only care about the majority.
Not being omniscient I wonder how a left-handed person would react to the very same UI
I don't think this matters much for most people. But I'm not the right kind of guy to talk about this... I write with my right hand :S
It would be great if all companies started to think about this subject, for a better world :D
It's choice, not chance, that determines your destiny.
I don't see how it matters which hand you use the mouse with (FWIW, I am right-handed but use the mouse with my left hand).
Which hand you use does not alter where your cursor rests on the screen, so you don't have to move it any further with one hand than with the other. The distance is the same - i.e. however far it is from the last thing you clicked on.
What are some popular spam prevention methods besides CAPTCHA?
I have tried doing 'honeypots' where you put a field and then hide it with CSS (marking it as 'leave blank' for anyone with stylesheets disabled) but I have found that a lot of bots are able to get past it very quickly. There are also techniques like setting fields to a certain value and changing them with JS, calculating times between load time and submit time, checking the referer URL, and a million other things. They all have their pitfalls and pretty much all you can hope for is to filter as much as you can with them while not alienating who you're here for: the users.
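For reference, the server-side half of a honeypot check is tiny. A minimal sketch, assuming an Express-style handler and a decoy field named website (both names are purely illustrative):

    // The form contains a decoy input, e.g.
    // <input name="website" style="display: none">,
    // which real users never fill in.
    app.post("/comment", (req, res) => {
      if (req.body.website) {
        // A bot filled in the hidden field: silently discard.
        return res.status(200).end();
      }
      saveComment(req.body); // hypothetical persistence function
      res.redirect("/thanks");
    });

As noted above, plenty of bots now detect this, so treat it as one filter among several.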
At the end of the day, though, if you really, really don't want bots to be sending things through your form, you're going to want to put a CAPTCHA on it - the best one I've seen that takes care of mostly everything is reCAPTCHA - but thanks to India's CAPTCHA-solving market and the ingenuity of spammers everywhere, that's not even successful all of the time. I would beware of using something that is 'ingenious' but kind of 'out there', as it would be more of a 'wtf' for users that are at least somewhat used to your usual CAPTCHAs.
Shocking, but almost every response here included some form of CAPTCHA. The OP wanted something different; I guess maybe he wanted something that actually works, and maybe even solves the real problem.
CAPTCHA doesn't work, and even if it did, it's the wrong problem: humans can still flood your system, and by definition CAPTCHA won't stop that (because it's designed only to tell whether you're a human or not - and it doesn't even do that well...).
So, what other solutions are there? Well, it depends... on your system and your needs.
For instance, if all you're trying to do is limit how many times a user can fill out a "Contact Me" form, you can simply throttle how many requests each user can submit per hour/day/whatever. If your users are anonymous, maybe you need to throttle according to IP addresses, and occasionally blacklist an IP (though this too can be circumvented, and causes other problems).
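A sketch of that kind of throttle (an in-memory counter keyed by IP; the limit and window are arbitrary placeholders, and production code would want a shared store):

    // Allow at most MAX_PER_HOUR submissions per IP per hour.
    const MAX_PER_HOUR = 5;
    const hits = new Map(); // ip -> array of recent timestamps

    function allowSubmission(ip) {
      const now = Date.now();
      const recent = (hits.get(ip) || []).filter((t) => now - t < 3600 * 1000);
      if (recent.length >= MAX_PER_HOUR) return false;
      recent.push(now);
      hits.set(ip, recent);
      return true;
    }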
If you're referring to a forum or blog comments (such as this one), well, the more I use this site the more I like its solution. A mix of authenticated users, authorization (based on reputation, which is not likely to be accumulated through flooding), throttling (how many you can do a day), the occasional CAPTCHA, and finally community moderation to clean up the few that get through - all combine to provide a decent solution. (I wonder if Jeff can provide some info on how much spam and other malposts actually get through...?)
Another control to consider (don't know if they have it here) is some form of IDS/IPS - if you can detect and recognize spam, you can block THAT pattern. Moderation fills that need manually here...
Note that no one of these prevents the spam, but each incrementally lowers the probability, and thus the profitability. This changes the economic equation, and leaves CAPTCHA to actually provide enough value to be worth it - since it's no longer worth it for the spammers to bother breaking it or going around it (thanks to the other controls).
Give the user something simple to calculate:
What is the sum of 3 and 8?
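A minimal sketch of that, generated server-side so the answer never appears in the page source (the names are illustrative):

    // Create a simple arithmetic challenge; keep the expected
    // answer in the session, not in the HTML.
    function makeChallenge(session) {
      const a = 1 + Math.floor(Math.random() * 9);
      const b = 1 + Math.floor(Math.random() * 9);
      session.expectedAnswer = a + b;
      return `What is the sum of ${a} and ${b}?`;
    }

    function checkChallenge(session, userInput) {
      return parseInt(userInput, 10) === session.expectedAnswer;
    }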
By the way: I just came across an interesting approach from Microsoft Research: Asirra.
http://research.microsoft.com/asirra/
It shows you several pictures and you have to identify the pictures with a given motif.
Try Akismet
Captchas or any form of human-only questions are horrible from a usability perspective. Sometimes they're necessary, but I prefer to kill spam using filters like Akismet.
Akismet was originally built to thwart spam comments on WordPress blogs, but the API is capable of being adapted for other uses.
Update: We've started using the ruby library Rakismet on our Rails app, Yarp.com. So far, it's been working great to thwart the spam bots.
A very simple method which puts no load on the user is just to disable the submit button for a second after the page has been loaded. I used it on a public forum which had continuous spam posts, and it has stopped them since.
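Something like this, as an illustrative sketch (the selector is made up):

    // Disable the submit button briefly after page load; most
    // bots post immediately, while humans take longer than a second.
    window.addEventListener("DOMContentLoaded", () => {
      const button = document.querySelector("#submit-button");
      button.disabled = true;
      setTimeout(() => { button.disabled = false; }, 1000);
    });

Note this only deters bots that actually render the page and click the button; a bot that POSTs directly would have to be caught by checking the elapsed time server-side as well.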
Ned Batchelder wrote up a technique that combines hashes with honeypots for some wickedly effective bot-prevention. No captchas, just code.
It's up at Stopping spambots with hashes and honeypots:
Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.
This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all. It can be augmented with content-based prevention such as Akismet, but I find it works very well all by itself.
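The gist of the technique, as I understand it (a hedged sketch, not Ned's actual code; the field names and hashing scheme are stand-ins):

    const crypto = require("crypto");

    const SECRET = "server-side secret"; // stand-in value

    // Derive a per-session name for the real field, so a bot
    // replaying a stale form or guessing names gives itself away.
    function fieldNames(sessionId) {
      const h = crypto.createHmac("sha256", SECRET).update(sessionId).digest("hex");
      return {
        realComment: "f_" + h.slice(0, 8), // real field, randomized name
        honeypot: "comment", // tempting decoy name for bots
      };
    }

    function isBot(sessionId, body) {
      const names = fieldNames(sessionId);
      // Bots fill the decoy or miss the randomized field entirely.
      return Boolean(body[names.honeypot]) || !(names.realComment in body);
    }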
http://chongqed.org/ maintains blacklists of active spam sources and the URLs being advertised in the spams. I have found filtering posts for the latter to be very effective in forums.
The most common ones I've observed are oriented around asking the user to solve simple puzzles, e.g. which of the following is a picture of a cat (displaying thumbnails of dogs surrounding a cat), or simple math problems.
While interesting, I'm sure the arms race will overwhelm those systems too.
You can use Recaptcha to at least make a captcha useful. Then you can make questions with simple verbal math problems or similar. Microsoft's Asirra makes you find pics of cats and dogs. Requiring a valid email address to activate an account stops spammers when they wouldn't get enough benefit from the service, but might deter normal users as well.
The following is unfeasible with today's technology, but I don't think it's too far off. It's also probably overkill for dealing with forum spam, but could be useful for account sign-ups, or any situation where you wanted to be really sure you were dealing with humans and they would be prepared for it to take a few minutes to complete the process.
Have 2 users who are trying to prove themselves human connect to each other via their webcams and ask them if the person they are seeing is human and live (i.e. not a recording), by getting them to, for example, mirror each other's movements, or write something on a piece of paper. Get everyone to do this a few times with different users, and throw a few recordings into the mix which they also have to identify correctly as such.
A popular method on forums is to simply queue the threads of members with less than 10 posts in a moderation queue. Of course, this doesn't help if you don't have moderators, or it's not a forum. A more general method is the calculation of hyperlink to text ratios. Often, spam posts contain a ton of hyperlinks, and you can catch a lot this way. In the same vein is comparing the content of consecutive posts. Simply do not allow consecutive posts that are extremely similar.
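A rough sketch of that ratio check (the thresholds are arbitrary placeholders):

    // Flag posts where links dominate the content.
    function looksSpammy(postHtml) {
      const links = (postHtml.match(/<a\s/gi) || []).length;
      const textLength = postHtml.replace(/<[^>]*>/g, "").trim().length;
      // Many links, or very little text per link, is suspicious.
      return links > 5 || (links > 0 && textLength / links < 40);
    }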
Of course, anyone with knowledge of the measures you take is going to be able to get around them. To be honest, there is little you can do if you are the target of a specific attack. Rather, you should focus on preventing more general, unskilled attacks.
For human moderators it surely helps to be able to easily find and delete all posts from some IP, or all posts from some user if the bot is smart enough to use a registered account. Likewise the option to easily block IP addresses or accounts for some time, without further administration, will lessen the administrative burden for human moderators.
Using cookies to make bots and human spammers believe that their post is actually visible (while only they themselves see it) prevents them (or trolls) from changing techniques. Let the spammers and trolls see the other spam and troll messages.
Javascript evaluation techniques like this Invisible Captcha system require the browser to evaluate Javascript before the page submission will be accepted. It falls back nicely when the user doesn't have Javascript enabled by just displaying a conventional CAPTCHA test.
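The general shape of such a check, assuming a hidden field that the script fills in (the field name and computation are illustrative, not the actual Invisible Captcha internals):

    // Client side: prove that JavaScript ran by filling a hidden
    // field with a value the server expects.
    document.addEventListener("DOMContentLoaded", () => {
      const field = document.querySelector('input[name="js_token"]');
      field.value = String(6 * 7); // server rejects the post unless this is "42"
    });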
Animated captchas - scrolling text that is still easy for humans to recognize, as long as you make sure that no single frame offers something complete to recognize.
Multiple choice question - "All it takes is a ______ and a smile." The idea here is that the user has to choose/understand.
Session variable - checking that a variable you put into the session is part of the request (see the sketch after this list). This will foil the dumb bots that simply generate requests, but probably not the bots that are modeled like a browser.
Math question - "2 + 5 =" - again, a question that is easy for a human to solve but prevents a bot's ability to generate a response.
Image grid - you create a grid of images and the user has to select 1 or 2 of a particular type, such as a 3x3 grid of animal pictures where you have to pick out all the birds.
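A minimal sketch of the session-variable check (assuming an Express app with session middleware; the names are illustrative):

    const crypto = require("crypto");

    // When rendering the form, stash a one-time token in the
    // session and emit it as a hidden field; verify on submit.
    app.get("/form", (req, res) => {
      req.session.formToken = crypto.randomBytes(16).toString("hex");
      res.render("form", { token: req.session.formToken });
    });

    app.post("/form", (req, res) => {
      const valid = req.body.token === req.session.formToken;
      delete req.session.formToken; // single use
      if (!valid) return res.status(400).send("Invalid submission");
      // ...process the post...
      res.redirect("/thanks");
    });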
Hope this gives you some ideas for your new solution.
A friend has the simplest anti-spam method, and it works.
He has a custom text box which says "please type in the number 4".
His blog is rather popular, but still not popular enough for bots to figure it out (yet).
Please remember to make your solution accessible to those not using conventional browsers. The iPhone crowd are not to be ignored, and those with vision and cognitive problems should not be excluded either.
Honeypots are one effective method. Phil Haack gives one good honeypot method, that could be used in principle for any forum/blog/etc.
You could also write a crawler that follows spam links and analyzes their page to see if it's a genuine link or not. The most obvious would be pages with an exact copy of your content, but you could pick out other indicators.
Moderation and blacklisting, especially with plugins like these ones for WordPress (or whatever you're using, similar software is available for most platforms), will work in a low-volume environment. If your environment is a low volume one, don't underestimate the advantage this gives you. Personally deciding what is reasonable content and what isn't gives you ultimate flexibility in spam control, if you have the time.
Don't forget, as others have pointed out, that CAPTCHAs are not limited to text recognition from an image. Visual association, math problems, and other non-subjective questions relayed through an image also qualify.
Sblam is an interesting project.
Invisible form fields. Make a form field that doesn't appear on screen to the user, using display: none as a CSS style so that it doesn't show up. For accessibility's sake, you could even put hidden text so that people using screen readers would know not to fill it in. Bots almost always fill in all fields, so you could block any post that filled in the invisible field.
Block access based on a blacklist of spammers IP addresses.
Honeypot techniques put an invisible decoy form at the top of the page. Users don't see it and submit the correct form, bots submit the wrong form which does nothing or bans their IP.
I've seen a few neat ideas along the lines of Asirra which ask you to identify which pictures are cats. I believe the idea originated from KittenAuth a while ago.
Use something like the Google Image Labeler with appropriately chosen images, such that a computer wouldn't be able to recognise the dominant features that a human could.
The user would be shown an image and would have to type words associated with it. They would keep being shown images until they have typed enough words that agreed with what previous users had typed for the same image. Some images would be new ones that they weren't being tested against, but were included to record what words are associated with them. Depending on your audience you could also possibly choose images that only they would recognise.
Mollom is supposedly good at stopping spam. Both personal (free) and professional versions are available.
I know some people mentioned ASIRRA, but if you go to all the "adopt me" links for the images, it will say on that linked page whether it's a cat or dog. So it should be relatively easy for a bot to just go to all the "adopt me" links. So it's just a matter of time for that project.
Just verify the email address and let Google/Yahoo etc. worry about it.
You could get some device ID software; the41 has some fraud prevention software that can detect the hardware being used to access your site. I believe they use it to catch fraudsters, but it could be used to stop bots. Once you have identified a device being used by a bot, you can just block that device. Last time I checked it can even trace your route through the phone network (not your Geo-IP!), so you can even block a postcode if you want.
It's expensive though, so probably a better, cheaper solution would be something a little less Big Brother.