Preferable Tag Cloud Visualization Formats - layout

Out of curiosity, I would love to know what tag clouds formats best serve the purpose of discovery of more and more (relevant)content?
I am aware of 3 formats, but don't know which one is the best.
1) delicious one - color shading
2) The standard one with font size variations -
3) The one on this site - numbers showing importance/usage.
So which ones do you prefer? and why?
Edit:
Thanks to the answers below, I now have much more understanding of tag cloud visualization techniques.
4) Parallel Tag Clouds - a simple use of parallel coordinates technique. I find it more organized and readable.
5) voroni diagram - more useful for identifying tag relationships and making decisions based on them. Doesn't serves our purpose of discovery of relevant content.
6) Mind maps - They are good and can be employed to step by step filter content.
I found some more interesting techniques here - http://www.cs.toronto.edu/~ccollins/research/index.html

I really do think that depends on the content of the information and the audience. What's relevant to one is not relevant to another. If an audience is more specialized, then they will be more likely to think along the same lines, but it would still need to be analyzed and catered to by the content provider.
There are also multiple paths that a person can take to "discover more". Take the tag "DNS" for example. You could drill down to more specific details like "UDP Port 53" and "MX Record", or you could go sideways with terms like "IP address" "Hostname" and "URL". A Voronoi diagram shows clusters, but wouldn't handle the case where general terms could be related to many concepts. Hostname mapping to "DNS", "HTTP", "SSH" etc.
I've noticed that in certain tag clouds there's usually one or two items that are vastly larger than the others. Those sorts of things could be served by a mind map, where one central concept has others radiating out from it.
For the cases of lots of "main topics" where a mind map is inappropriate, there are parallel coordinates but that would be baffling to many net users.
I think that if we found an extremely well organized way of sorting clusters of tags while preserving links between generalities and specificities, that would be somewhat helpful to AI research.
In terms of which I personally prefer, I think the numeric approach is nice because infrequently referenced tags are still presented at a readable font size. I also think SO does it this way because they have vastly more tags to cover than the average size based cloud a la the standard.

I would go with #2 out of the options you listed above.
1 - The human eye recognizes and comprehends size differences much more effectively than color, when the color scale is along the same spectrum (ie, various blues as opposed to discrete individual colors).
3 - Requires the user to scan the full list and mathematically compare each individual number while scanning. No real meaningful relationship between tags without a lot of work on the users part.
So, going with #2, there are several considerations to take into account:
Keep the tags alphabetical. This affords the user another method of searching and establishes a known relationship between each (assuming they know the alphabet!). If they're unordered, it's just a crapshoot to find a single one.
If size comparison is absolutely critical (this usually isn't the case, as you can scale up each level by a certain percentage or pixel amount), use a monospaced font. Otherwise, certain letter combinations may end up looking larger than they actually are.
Don't include any commas, pipes, or other dividers. You're already going to have a lot of data in a small area - no need to clutter it up with debris. Space the tags out with a decent amount of padding, of course. Just don't double the number of visual elements by adding more than just the data.
Set a min/max font size and scale between those. There are situations where one tag may be so popular that visually it may appear exponentially larger than the others. Likewise, you don't want a tag to end up rendering at 1px! Set the min/max and adjust between as necessary.

size adjusted voroni diagram
- it shows which tags are inter-related

My favorite tag cloud format is the Wordle format. It looks great and it also does a pretty good job of fitting a lot of tags in a small space.

Related

Handle different layout of document using kofax

I am new to KofaxTotalAgility solution, but i am well aware of OCR, OMR and recognition mechanism.
I have two forms in one folder, A and B.
both of them are identical, but due to manual scan there are slight axes change, say 20 pixel right shift, so Layout is slightly differ.
Layout of Image A and Image B are different, position of a form in a page are not fix.
I know, other solution like "abbyy fine reader", provide flexilayout where we can handle this by finding the text and setting up right left top down to automatically identify zones.
As i have started learning KofaxTotalAgility, i am unaware of all option provided by "kofax Transformation Designer".
My question is which Locator should i use, i am currently using/working-on advance zone locator and for one document(Image A) which i set as a reference, extraction is proper. But for other,(Image B) due to layout mismatch text/box field are not getting extracted.
Can anyone point out the right direction from where i can get this case handled properly.
I know, i am asking direct option/solution, any help is highly appreciable.
In general, Kofax Transformations has two groups of locators:
Deterministic. You tell the locator precisely what to do, and how to do it (similar to an imperative approach when programming)
Probabilistic. You just tell your locator what to extract, and it works out the rest (based on AI).
Here's a (non-exhaustive) diagram I created the other day:
When working with forms, you might be tempted to rely on forms-specific locators such as the Advanced Zone Locator. While this locator can account for fields "moving around", for example due to images being jolted, zoomed, or distorted, there are certain limitations. Other locators don't have these limitations - the format locator for example allows you to define a certain pattern (a Regular Expression) that should be matched along with a keyword that has to be found somewhere around that pattern.
For your example, you could create a regex like M|F|X, and then define "Gender" as the keyword that needs to be present on the left.
However, any locator that's ruled by determinism follows Murphy's law - at some point that keyword might change. There could be different languages. And maybe additional letters for certain genders might be added; ultimately breaking your extraction logic.
Enter AI - while Murphy's law still applies when using Group Locators, the difference here is that users can train the system to pick up the new data. Said locator will automatically work out the best way to extract that piece of data. If you used a format locator, the customer would need to get back to you to add additional expressions, or have the keywords changed.
In your particular case, I'd try to use a Trainable Group Locator first. If you already know what you're looking for - for example SSNs that you have somewhere in a database, go for the Database Locator. Use Format Locators as a last resort, as tempting as they may be. Advanced Zone Locators are useful when you deal with forms, but I find myself using them almost exclusively for handprint or checkbox recognition.

Level of Detail in 3D graphics - What are the pros and cons?

I understand the concept of LOD but I am trying to find out the negative side of it and I see no reference to that from Googling around. The only pro I keep coming across is that it improves performance by omitting details when an object is far and displaying better graphics when the object is near.
Seriously that is the only pro and zero con? Please advice. Tnks.
There are several kinds of LOD based on camera distance. Geometric, animation, texture, and shading variations are the most common (there are also LOD changes that can occur based on image size and, for gaming, hardware capabilities and/or frame rate considerations).
At far distances, models can change tessellation or be replaced by simpler models. Animated details (say, fingers) may simplify or disappear. Textures may move to simpler textures, bump maps vanish, spec/diffuse maps combines, etc. And shaders may also swap-put to reduce the number of texture inputs or calculation (though this is less common and may be less profitable, since when objects are far away they already fill fewer pixels -- but it's important for screen-filling entities like, say, a mountain).
The upsides are that your game/app will have to render less data, and in some cases, the LOD down-rezzed model may actually look better when far away than the more-complex model (usually because the more detailed model will exhibit aliasing when far away, but the simpler one can be tuned for that distance). This frees-up resources for the nearer models that you probably care about, and lets you render overall larger scenes -- you might only be able to render three spaceships at a time at full-res, but hundreds if you use LODs.
The downsides are pretty obvious: you need to support asset swapping, which can mean both the real-time selection of different assets and switching them but also the management (at times of having both models in your memory pipeline (one to discard, one to load)); and those models don't come from the air, someone needs to create them. Finally, and this is really tricky for PC apps, less so for more stable platforms like console gaming: HOW DO YOU MEASURE the rendering benefit? What's the best point to flip from version A of a model to B, and B to C, etc? Often LODs are made based on some pretty hand-wavy specifications from an engineer or even a producer or an art director, based on hunches. Good measurement is important.
LOD has a variety of frameworks. What you are describing fits a distance-based framework.
One possible con is that you will have inaccuracies when you choose an arbitrary point within the object for every distance calculation. This will cause popping effects at times since the viewpoint can change depending on orientation.

Identifying teeth area within a mouth region in an image

I am trying to do an image manipulation wherein the user would be prompted to enclose the mouth portion within an image. Once the user does that my application should identify the pixels that would identify the teeth (the color varying from white to yellow) and then I would like to brighten only those pixel. Could anyone give me a guidance on how to proceed?
Your question is quite honestly, very broad as an adequate answer will touch on a large number of areas.
Nevertheless, what you are trying to attempt is called Pattern Recognition. More specifically, your problem is geared towards image-analysis, dealing mainly in Template Matching:
Template matching is a technique in digital image processing for
finding small parts of an image which match a template image. It can
be used in manufacturing as a part of quality control, a way to
navigate a mobile robot, or as a way to detect edges in images.
The Template Matching page has a C-like language sample algorithm which demonstrates what you are attempting to do (identify a specific color within an image).
As for how to go about this, generally speaking you will have to load an image, store it into an array then try to manipulate it as the algorithm suggests:
One way to perform template matching on color images is to decompose
the pixels into their color components and measure the quality of
match between the color template and search image using the sum of the
absolute differences (SAD) computed for each color separately.
Of course, there are numerous projects in various languages that do that for you. My suggestion is to read up a bit more on the topic, pick a language, and attempt a solution using libraries as necessary.
One book that you might find to be very helpful is the classic Phillips: Image Processing in C even if you don't want to use C. Why? Because it pores over a lot of the algorithmic details in how they work, and how to implement them. And, its free too.

Evaluating the "Value" Attribute

I'm attempting to use the OpenAmplify API to evaluate the content of a URI. The point is to draw out the topics that are truly relevant to the article. Unfortunately, the topical analysis I'm getting back is:
Huge, and
Varied
Neither quality is terribly useful for what I'm trying to do because the signal to noise ratio is being heavily skewed towards noise. I'm analyzing web content, so there is a certain amount (perhaps a large amount) of irrelevant content (ads, etc.) involved. I get that.
Nonetheless, many of the topics being returned are either useless (utterly non-sensical, not even words), irrelevant (as in, where did that come from?) or too granular to provide any meaning or insight. I can probably filter out most of this noise using the value, um, value that is returned for each domain, subdomain, topic, et al, but I don't really know what it means.
Certainly I understand that the value it's a measure of "the prominence of the word in the text," but the number itself appears entirely arbitrary in a way that I prevents me saying something like "ignore any terms with a value less than 50" and have it carry any real meaning.
Are there any range criteria that I can use to help me understand how to use a topic's value score as a filtering threshold? Alternatively, is there another field that I should be using for this sort of filtration?
Thanks for your help.
From other channels, I've learned that the value attribute can't be evaluated the way I was hoping. It means different things for different signals and none are defined in such a way that are meaningful for this kind of requirement.

Web Interfaces: What is a good way to display many static fields?

My web applications have pages that display many static fields.
I know that poor layout invariably leads to information overload and poor readability.
My Question:
Are there any best-practices or heuristics for laying out a screen that contains many static fields?
Ordinarily, I would reference Bill Scott and Theresa Neil's excellent book, but I can't seem to find any guidance for this issue.
Here are some guidelines that I'm inclined to follow:
Group related fields.
Position the major (or parent) fields towards the top and the left. Position the minor (or child) fields towards the bottom and right.
Don't feel obliged to fill every pixel. Consider white space if it will improve readability.
Favor progressive disclosure wherever possible.
Consider an accordion control.
Perhaps a Details on Demand approach would work well. Ask yourself which data are absolutely and immediately relevant to the user and group those, while hiding the other data. You can always provide an 'Expand' link or control that would allow a user to view the details if desired.
(It's good to see that you're looking at interface design patterns. They are often overlooked!)
I work on HR software and have faced this problem many times. One thing that we keep seeing in feedback when we introduce collapse controls or any progressive disclosure pattern really is that our users don't like the way those types of "web 2.0" (their words lol) pages don't print out. So just a reminder if your user base still insists on printing large pages of data include a print media stylesheet.
Depending on how large your set of data is I'd seriously consider a some search functionality or a sorting mechanism. Many times when the data set is large different users have different priorities and allowing customization is the only way to satisfy a wide audience.
I think you pretty much answered your own question, especially the grouping of important fields tied to progressive disclosure.

Resources