Detecting presence (arrival/departure) with active RFID tags - rfid

Arrival is actually pretty simple: the tag comes into range of a receiver's antenna. Departure is what is causing the problems.
First some information about the setup we have.
Tags:
They work at 433 MHz. Every 1.5 seconds they transmit a "heartbeat"; on movement they go into a transmission burst mode, which lasts for as long as they are moving.
They transmit their ID, a transmission sequence number (1 to 255, repeating over and over), how long they have been in use, and input from the motion sensor, if any. We have no control over them whatsoever. They will continue doing what they do until their battery dies, and they are sealed shut.
Receiver forwards all that data + signal strength of a tag to our software. Software can work with several receivers. Currently we are using omnidirectional antennas.
How can we be sure that the tag has departed from premises?
Problems:
Sometimes two or more tags transmit a "heartbeat" at the same time and no signal is received. As the number of tags increases, these collisions happen more often. The tags mitigate this by randomly varying their heartbeat interval by several milliseconds, but collisions still occur, so we cannot rely on every "heartbeat" being received. That means I can't treat a tag not "checking in" for a certain period as a sign of departure; it could just be a gap caused by collisions.
The tag manufacturer advised that we use two receivers and set them up as a gate for tags to pass through. Based on the order in which tags pass through the "gates" we can tell in which direction they are going. The problem with our omnidirectional antennas is that sometimes the tag signal bounces off a building before it arrives at the receiver, so based on signal strength the tag looks farther away than it is.
Does anybody have a solution for reliably determining whether tags are coming or leaving? We can also set up the antennas in a different way.
I wrote the software that interprets data from the receivers, so that part can be manipulated in any way. But I'm out of ideas on how to interpret the information to get the reliability we need.
Right now the only idea is to try directional antennas, but I would like to try out all the options with the equipment we currently have.
Also, any literature suggestion that deals with active RFID tags is more than welcome; most of the books I've found deal with passive tag solutions.

As a top-level statement: if you need to track items leaving your site, your RFID technology is probably the wrong one. The technology you have is better suited to positional tracking of tags within a large area, e.g. a factory floor. Notwithstanding the above, here is my take:
A good approach to active RFID is to break your area down into zones that are tied to your business processes, for example:
Warehouse
Loading bay
Packing
Entry of a tag into a zone represents the start of a new process or perhaps the end of a process the tag is currently in. For example, moving from warehouse to the packing represents assembling a shipment, and movement into the loading bay initiates a shipment.
The crux of many RFID implementations is the installation and configuration of the RFID infrastructure to:
Map tag -> asset (which you have done)
Map tag read -> zone (and by inference asset -> zone)
Map movements between zones to steps in a business process (and therefore understand when an asset leaves the site, your goal)
There are a number of considerations: the physical characteristics of 433MHz signals, position of antennae, sensitivity of antennae and some tricks that some vendors have. After an optimal site configuration, then you may need to have some processing tricks on the tag reads that will pour in.
Dirty data
Always keep in mind that tag read data is dirty: RF interference (from unshielded motors, electric wiring, etc.), weather conditions and physical manipulation of tags (e.g. covering with metal) happen all the time.
RSSIs are like stock tickers - there is a lot of random/microeconomic noise on top of broad macroeconomic trends. To interpret movement, compute a linear regression over groups of reads rather than relying on a specific read's RSSI.
If you do see a tag broadcasting with a high RSSI, which then falls to medium then low and then disappears, you really can interpret that as the tag is leaving the range of the receiver. Is that off-site? Well, you need to consider the site's layout (the zones) and the positioning of receivers within the zones.
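As a rough illustration of the regression idea, here is a minimal Python sketch that fits a least-squares line to a short window of reads and reports the RSSI trend in dB per second. The function names and the -0.5 dB/s threshold are assumptions made for this example, not values from any vendor; the threshold would have to be tuned from a site survey.

```python
from typing import Sequence


def rssi_slope(times: Sequence[float], rssis: Sequence[float]) -> float:
    """Least-squares slope of RSSI against time, in dB per second."""
    n = len(times)
    if n < 2 or n != len(rssis):
        raise ValueError("need at least two (time, RSSI) pairs")
    mean_t = sum(times) / n
    mean_r = sum(rssis) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all reads share the same timestamp")
    sxy = sum((t - mean_t) * (r - mean_r) for t, r in zip(times, rssis))
    return sxy / sxx


def is_fading(times, rssis, threshold_db_per_s=-0.5):
    """True if the window shows a sustained downward RSSI trend.

    The threshold is a hypothetical starting point, not a calibrated value.
    """
    return rssi_slope(times, rssis) <= threshold_db_per_s
```

A falling trend on its own only says the tag is moving away from that receiver; whether that means "off-site" still depends on where the receiver sits in the zone layout.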
Trilateration
EDIT I had incorrectly used the term 'triangulation'. That refers to determining the position of something by knowing the angle it subtends from two or three known locations. In RFID, you use the distance, and as such it is called 'trilateration'.
In my experience, vendors selling the tag technology you describe have server software that determines the absolute position of the tags using the received RSSI. You should be able to obtain the position of the tag within 1-10m using such software. Determining if the tag is moving off-site is then easy.
To code this yourself:
First, each tag is pinging away when moving. These pings hit the receivers at almost the same time and are sent to the server. However, the messages can sometimes arrive out of order or interleaved with earlier and later reads from other receivers. To help correlate pings, each ping contains a sequence number. You are looking for tag reads from the same tag, with the same sequence number, received by three (or more) receivers. If there are more than three, pick the three with the largest RSSI.
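A minimal sketch of that correlation step in Python follows. The `Read` record and its field names are hypothetical (your receiver protocol will have its own), and because the sequence number wraps around at 255, a real implementation would also restrict each group to a short time window; that part is left out here.

```python
from collections import defaultdict
from typing import NamedTuple


class Read(NamedTuple):
    # Hypothetical record layout for one tag read forwarded by a receiver.
    tag_id: str
    seq: int
    receiver_id: str
    rssi: float


def strongest_three(reads):
    """Group reads by (tag, sequence number) and keep the three strongest.

    Pings heard by fewer than three distinct receivers are dropped.
    """
    groups = defaultdict(list)
    for r in reads:
        groups[(r.tag_id, r.seq)].append(r)

    result = {}
    for key, group in groups.items():
        # A receiver may report the same ping twice; keep its best read.
        best = {}
        for r in group:
            if r.receiver_id not in best or r.rssi > best[r.receiver_id].rssi:
                best[r.receiver_id] = r
        if len(best) >= 3:
            ranked = sorted(best.values(), key=lambda r: r.rssi, reverse=True)
            result[key] = ranked[:3]
    return result
```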
The distance is approximated from RSSI. This is not linear and subject to non-trivial random variation. A quick google turns up:
Given three approximate distances from three known points (the receivers' locations), you can then resolve the approximate position of the tag by trilateration (see "Trilateration using 3 latitude and longitude points, and 3 distances").
Now you have the absolute position of the tag. You can use these positions to track the absolute movement of the tag.
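To make the two steps above concrete, here is a hedged Python sketch. The formula the original answer pointed to is not reproduced here; this example substitutes the commonly used log-distance path-loss model, and `rssi_at_1m` and `path_loss_exponent` are hypothetical calibration constants you would have to measure on your own site. The trilateration works in flat x/y coordinates in metres rather than latitude/longitude, which is a simplifying assumption that is reasonable for a single site.

```python
def rssi_to_distance(rssi, rssi_at_1m=-40.0, path_loss_exponent=2.0):
    """Approximate distance in metres using the log-distance model.

    Both default constants are placeholders, not measured values.
    """
    return 10.0 ** ((rssi_at_1m - rssi) / (10.0 * path_loss_exponent))


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Solve for (x, y) given three receiver positions and distances.

    Subtracting the first circle equation from the other two leaves a
    2x2 linear system, solved here with Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    a1, b1 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    c1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    a2, b2 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    c2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("receivers are collinear; position is ambiguous")
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    return x, y
```

With noisy distances the three circles will not meet in a single point, so the result is only an estimate; averaging positions over several pings helps.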
To make this useful, you should position receivers so that you can reliably detect tags right up to the physical site boundaries. You should then determine a 'geofence' around your site, within receiver range. I would write a business rule that states:
If the last known position of a tag was outside the geofence, and
A tag read from the tag has not been detected in (say) 10s, then
Declare the tag has left the site.
By using the trilateration and geofence, you can focus the business logic on only those tags close to going awol. If you fail to receive your 1.5s ping only a few times from such a tag, it's highly likely that the tag has gone outside your receiver's range, and therefore off-site.
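The business rule above could be sketched like this in Python. The rectangular geofence, the `TagState` record and the 10 s default are all assumptions for the example; a real site would probably need a polygon fence and a timeout tuned to its collision rate.

```python
from dataclasses import dataclass


@dataclass
class TagState:
    # Last trilaterated position (metres) and time of the last read (seconds).
    x: float
    y: float
    last_seen: float


def has_left_site(state, now, fence, timeout_s=10.0):
    """Apply the rule: last position outside the fence AND silent for timeout_s.

    `fence` is a hypothetical axis-aligned rectangle (xmin, ymin, xmax, ymax).
    """
    xmin, ymin, xmax, ymax = fence
    inside = xmin <= state.x <= xmax and ymin <= state.y <= ymax
    silent_for = now - state.last_seen
    return (not inside) and silent_for >= timeout_s
```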
You're already aware that tag reads can sometimes come from reflections. If you have a lot of these, then your trilateration will be pretty poor. So this method works best when there are fairly large open spaces and minimal reflectors.
Some RFID vendors have all this built into their servers - processing this by writing your own code is (clearly) non-trivial.
Zone design using wide-area receivers
Logical design of zones can help the business logic layer. For example, suppose you have two zones (A and B) with two receivers (1 and 2):
     A          B
+----------+----------+
|          |          |
|    1     |    2     |
|          |          |
+----------+----------+
If you get tag reads from the tag at receiver 1, then one at receiver 2, how do you interpret that? Did tag T move into zone B, or just get a read at the extreme range of 2?
If you get a later read at 1, did the tag move back, or did it never move?
A better physical solution is:
     A          B
+----------+----------+
|          |          |
|    1     2     3    |
|          |          |
+----------+----------+
In this approach, a tag moving from A to B would get reads from the following receivers:
1 1 1 2 1 2 2 3 2 2 3 2 3 3 3 3 3
-------> time
From a programming logic point of view, a movement from A -> B has to traverse reads 1 -> 2 -> 3 (even though there is a lot of jitter). It gets even easier to interpret when you combine with RSSI.
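One way to express that logic, as a sketch only: treat the stream of receiver ids as a sequence, require that it starts near receiver 1, ends near receiver 3, and passes through 1, 2, 3 in order somewhere in between. The function names, the zone labels and the `window` size are assumptions for this example.

```python
from collections import Counter


def is_subsequence(path, reads):
    """True if `path` occurs in order (not necessarily adjacent) in `reads`."""
    it = iter(reads)
    return all(any(r == step for r in it) for step in path)


def detect_transition(reads, window=3):
    """Classify a run of receiver ids as 'A->B', 'B->A' or None.

    The first and last `window` reads are smoothed by majority vote so
    that a single stray read does not decide the outcome.
    """
    if len(reads) < 2 * window:
        return None
    start = Counter(reads[:window]).most_common(1)[0][0]
    end = Counter(reads[-window:]).most_common(1)[0][0]
    if start == 1 and end == 3 and is_subsequence([1, 2, 3], reads):
        return "A->B"
    if start == 3 and end == 1 and is_subsequence([3, 2, 1], reads):
        return "B->A"
    return None
```

Fed the read sequence shown above, this reports a movement from A to B; a tag that merely gets an occasional read at receiver 2 and then settles back at receiver 1 yields no transition.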
Portal design with directional receivers
You can create quite a good portal using two directional receivers (you will need to spend some time configuring the antenna and sensitivity carefully). Mount a receiver well above the door on both sides. Below is a schematic from the side. R1 and R2 are the receivers (and the rough read field is shown), and on the left is a worker pushing an asset through the door:
                ----> direction of motion
-------------------+----------------
             R1    |    R2
            /  \   |   /  \
  o        /    \     /    \
 |-++     /      \   /      \
 |\++    /        \ /        \
------------------------------------------
You should get a pattern of reads like this:
<nothing> 1 1 1 1 1 12 1 21 2 12 2 1 2 2 2 2 2 <nothing>
-------> time
This indicates a movement from receiver 1 to receiver 2.
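A simple way to turn such a pattern into a direction, sketched under assumptions: represent each ping as the set of receivers that heard it, then compare the average position in time of receiver 1's reads with that of receiver 2's. This centroid comparison is my own illustration rather than a vendor algorithm, and the function name is hypothetical.

```python
def portal_direction(pings):
    """Infer direction from a time-ordered list of sets of receiver ids.

    Each element holds the receivers (1 and/or 2) that heard one ping.
    Returns '1->2', '2->1', or None if the pattern is inconclusive.
    """
    idx1 = [i for i, heard in enumerate(pings) if 1 in heard]
    idx2 = [i for i, heard in enumerate(pings) if 2 in heard]
    if not idx1 or not idx2:
        return None  # the tag was only ever seen on one side of the door
    centre1 = sum(idx1) / len(idx1)
    centre2 = sum(idx2) / len(idx2)
    if centre1 < centre2:
        return "1->2"
    if centre2 < centre1:
        return "2->1"
    return None
```

Requiring reads from both receivers also filters out tags that approach the door and turn back without passing through.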
"Signposts"
Savi implementations often use "sign posts" to assist with location. The signpost emits a 123 kHz beam that illuminates a small area (like a doorway). The signpost also transmits a unique number identifying itself (the left door might be 1, while the right door might be 2). When the tag passes through the beam, it wakes up and re-broadcasts the number. The reader now knows which door the tag passed through.
Watch out for any metal in the surrounding area. 123 kHz travels extremely well down rebar in concrete walls, metal fences and rail tracks. We once had tags reporting themselves hundreds of meters from a signpost due to such effects.
With this approach you can implement a portal much like you would for passive.
Simulating signposts
If you don't have the ability to use signposts, then there is a dirty hack:
Stick a passive RFID tag to your active RFID tag
Install a passive RFID reader on each doorway
Passive RFID is actually very good in restricted spaces, so this implementation can work very well. It may cost the same as (or less than) a solution from your active RFID vendor.
If you're clever, you can use the EPC GIAI namespace for the passive tag ID and so burn it with the active tag ID. Both active and passive tags would then be identically named.
Physical considerations
433MHz tags have some interesting characteristics. Well-constructed receivers can get a read of tags within about 100m, which is a long way for RFID. In addition, 433MHz wraps itself around obstacles very well, especially metal ones. We could even read tags in the boot (trunk) of a car travelling at 50km/h - the signal propagates from the rubber seal.
When installing a reader to monitor a zone, you need to adjust its location and sensitivity very carefully to maximize the reads from tags within your zone, but also to minimize reads from outside your zone. This might be done in HW or in SW configuration (like dropping all reads below a particular RSSI).
One idea might be to move the receiver away from the area where your tags are exiting as in the layout below (R is the reader):
+-------------------------+-----------+
|      Warehouse          |  Exit     |
|                         .           |
|                         .
|    R                    .      R  --->
|                         .
|                         .           |
|                         |           |
+-------------------------+-----------+
It pays to do a RF site survey and spend enough time to properly understand how tags and readers work in an area. Getting the physical installation right is critical.
Another thing to do is to consider physical constrictions such as corridors and doorways and treat them as choke-points - map logical zones to them. Put a reader there, with a directional antenna and lowered sensitivity, tuned to cover just the constriction.
What no tag-reads actually means
If my experience of RFID has taught me anything, it is that you can get spurious reads at any time, and you need to treat everything with a degree of suspicion. For example, you might have a few seconds of missing reads from a given tag - this can mean anything:
A user accidentally putting a metal tin over the tag
A fork lift truck getting between tag and reader
An RF collision
A momentary network congestion
The battery dying or fading out (remember to check the low-battery flag in tag reads and ensure the business has a process to replace old tags).
Tag destroyed by a pallet being pushed into it
Stolen by someone wanting to resell it for scrap (Not a joke - this actually happened)
Oh yeah, it may be that the tag moved off-site.
If the tag has not been heard of in, say, 5 minutes, odds are that it's off site.
In most business processes that you would use this active tag technology for, a short delay before the system decides the tag is off-site is acceptable.
Conclusions
Site survey: spend time experimenting with readers in different locations. Walk around the site with a tag and see what reads you are actually getting. Use this to:
Logically segment your site into zones and locate receivers to most accurately position tags in zones
It's easier to determine movement between zones using several receivers; if possible, instrument physical constrictions such as doors and corridors as portals. As part of your RFID implementation, you might even want to install new walls or fences to create such constrictions. Consider a passive RFID for portals.
Beware of metal, especially large expanses of it.
You have dirty data. You need to compute linear regressions on the RSSIs to spot trends over short periods; you need to be able to forgive a small number of missing tag reads.
Make sure that there are business processes to handle dying batteries and sudden disappearances of tags.
Above all, this problem is best solved by getting the receivers installed in the best locations and configuring them carefully, then getting the software right. Trying to solve a bad site installation with software can cause premature ageing.
Disclosure: I worked 8 years for a major active RFID vendor.

Using directional antennas sounds like it may be a more reliable option, although this obviously depends on the precise layout of your premises.
As far as using your current omnidirectional receivers, there are a couple of options I can think of:
The first one, and likely the easiest, would be to collect some data on the average 'check-in' times you are seeing for on-site tags, possibly as a function of the number of on-site tags (if the number is likely to change dramatically, as your collision frequency will be related to the number of tags present). You can then analyse this data to see if you can choose a suitable cut-off time, after which you declare that a tag is no longer present. Obviously, exactly what cut-off you choose will depend on the data you see and your willingness to accept false positives - it could also be that any acceptable cut-off time lies outside your 3 minute window (although I suspect that if that is the case then your 3 minute window may not be viable).
Another, more difficult, option (or group of options more like), would be to utilise more historical information about each tag - for instance, look for tags whose signal strength gradually decreases and then disappears, or tags whose check-in time changes drastically, or perhaps utilise multiple receivers and look for patterns between receivers - such as tags which are only seen by one receiver and then disappear, or distinctive patterns of signal strength (indicating bearing) between receivers as tags go off-site.
Obviously the second option is really about looking for patterns, both over time and between receivers, and is likely to be much more labour (and analysis) intensive to implement. If you are able to capture enough good quality data you might be able to utilise machine-learning algorithms to identify relevant patterns.
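For the first option, a minimal sketch of how the cut-off might be derived from logged data. The quantile and safety factor are hypothetical knobs for trading false "departed" alarms against detection delay, not recommended values.

```python
def check_in_gaps(timestamps):
    """Gaps in seconds between consecutive reads of one on-site tag."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def cutoff_from_gaps(gaps, quantile=0.999, safety=2.0):
    """Pick a departure cut-off from observed gaps of tags known to be on site.

    Takes the gap at the given quantile and multiplies it by a safety factor.
    """
    if not gaps:
        raise ValueError("no gaps observed")
    ordered = sorted(gaps)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index] * safety
```

Collecting the gaps separately for different tag populations would show how quickly the cut-off has to grow as collisions become more frequent.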

We do this every day.
First question is: "How many tags do you have at a reader at any given time?". Collisions are more rare than you might think, but they do happen and tag over-population can be easily determined.
Our Software was written and might be using the same readers and tags that you are using. We set reader timeouts to determine when a tag is "away" or "offsite"; usually 30 seconds without the tag being read. Arrival of course is instantaneous when a tag is detected at the reader, then the tag is flagged "onsite".
We also have the option to use multiple readers; one at a gate and another on the parking lot or in the building, for example. The gate reader has a short timeout. If a tag passes the gate reader, it is read and then times out very quickly to flag the tag as "offsite". If a tag is then read by any other reader, the tag is then considered "onsite".
I can post links if you think it would be helpful, else you can search for RFID Track. It's an iOS app and there are settings posted for a demo server.
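The per-reader timeout idea could look roughly like this in Python. This is a sketch, not the code of the product mentioned; the class name, the reader ids and the 5 s gate timeout are assumptions, while the 30 s default follows the figure given above.

```python
class PresenceTracker:
    """Flags tags onsite/offsite using a timeout that depends on the reader."""

    def __init__(self, reader_timeouts, default_timeout=30.0):
        # reader_timeouts: e.g. {"gate": 5.0} gives the gate a short timeout.
        self.reader_timeouts = dict(reader_timeouts)
        self.default_timeout = default_timeout
        self.last_read = {}  # tag_id -> (reader_id, timestamp)

    def on_read(self, tag_id, reader_id, now):
        """Record the most recent read; arrival is immediate."""
        self.last_read[tag_id] = (reader_id, now)

    def status(self, tag_id, now):
        """Return 'onsite', 'offsite', or 'unknown' for a never-seen tag."""
        if tag_id not in self.last_read:
            return "unknown"
        reader_id, seen_at = self.last_read[tag_id]
        timeout = self.reader_timeouts.get(reader_id, self.default_timeout)
        return "offsite" if now - seen_at > timeout else "onsite"
```

Because only the last reader matters, a tag last heard at the gate goes "offsite" quickly, while one last heard in the building gets the longer grace period.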
Peter

Related

Optimization of resources in Excel

I'm struggling with a task of optimization of resources, and I wonder if someone knows any efficient way to find the optimal solution for it, using only Excel. I explain what I'm trying to achieve:
Suppose you are managing an assembly factory for metal tubes. Your raw material is standard size tubes, and then in your factory you need to cut these tubes according to a list of requests from clients, with very specific sizes. All tubes are of the same type, so we can reuse leftovers from each cut, if the length of that leftover is sufficient to satisfy any tube request.
We can also group small length requests to be made from one single tube, for example, on the attached list, we could use one 8 metre tube to deliver the last four entries (1,615+1,62+1,625+1,67), with 1,47 leftover wasted.
Assuming a long list of requests, and that the tubes supplied are 8 metres each, do you know of any way of calculating how many tubes I have to order to satisfy the list of requests, minimising the losses per each cut?
Example of request list, each entry is in metres
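This is the classic one-dimensional cutting-stock (bin-packing) problem. The question asks for Excel, where the same model could be set up for the Solver add-in; purely to illustrate the logic, here is a Python sketch of the first-fit-decreasing heuristic. It is a heuristic, so it gives a good but not guaranteed optimal tube count, it ignores material lost to the saw cut, and it works in whole millimetres to avoid rounding trouble. The function name is made up for the example.

```python
def first_fit_decreasing(requests_mm, stock_mm=8000):
    """Assign requested lengths to stock tubes, longest request first.

    Returns a list of tubes, each a list of the cut lengths taken from it.
    """
    tubes = []      # cuts assigned to each tube
    remaining = []  # unused length left on each tube
    for length in sorted(requests_mm, reverse=True):
        if length > stock_mm:
            raise ValueError(f"request {length} mm exceeds stock length")
        for i, free in enumerate(remaining):
            if length <= free:
                tubes[i].append(length)
                remaining[i] -= length
                break
        else:
            # No existing tube has room, so start a new one.
            tubes.append([length])
            remaining.append(stock_mm - length)
    return tubes


def total_waste(tubes, stock_mm=8000):
    """Sum of the leftover length over all tubes used."""
    return sum(stock_mm - sum(cuts) for cuts in tubes)
```

For the four entries quoted above (1615, 1620, 1625 and 1670 mm) this uses a single 8 m tube with 1470 mm left over, matching the hand calculation.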

Detecting damaged car parts

I am trying to build a system that on providing an image of a car can assess the damage percentage of it and also find out which parts are damaged in the car.
Is there any possible way to do this using Python and open-cv or tensorflow ?
The GitHub repositories I found that were relevant to my work are these
https://github.com/VakhoQ/damage-car-detector/tree/master/DamageCarDetector
https://github.com/neokt/car-damage-detective
But what they provide is a qualitative output( like they say the car damage is high or low), I wanted to print out a quantitative output( percentage of damage ) along with the individual part names which are damaged
Is this possible ?
If so please help me out.
Thank you.
To extend the good answers given by #yves-daoust: It is not a trivial task and you should not try to do it at once with one single approach.
You should question yourself how a human with a comparable task, i.e. say an expert who reviews these cars after a leasing contract, proceeds with this. Then you have to formulate requirements and also restrictions for your system.
For instance, an expert first checks for any visual occurrences and rates these. Then they may check technical issues which may well be hidden from optical sensors (e.g. whether the car is drivable, driving a round and estimating whether the engine is running smoothly, whether the steering geometry is aligned (i.e. whether the car manages to stay in line), whether there are any minor vibrations which should not be there, and so on), and they may also apply force (trying to manually shake the wheels to check whether the bearings are OK).
If you define your measurement system as restricted to just a normal camera sensor, you are somewhat limited in the extent to which your system is able to deliver.
If you just want to spot cosmetic damages, i.e. classification of scratches in paint and rims, I'd say a state of the art machine vision application should be able to help you to some extent:
First you'd need to detect the scratches. Bear in mind that detecting scratches, especially in the field with changing conditions (sunlight), may be very hard or even impossible for a cheap sensor. For example, to cope with reflections a system might need to make use of polarizing filters, and special-effect paints may interfere with your optical system to the point that you are not able to spot anything.
Secondly, after you detect the position and dimension of these scratches in the camera coordinates, you need to transform them into real world coordinates for getting to know the real dimensions of these scratches. It would also be of great use to know the exact location of the scratch on the car (which would require a digital twin of the car - which is not to be trivially done anymore).
After determining the extent of the scratch and its position on the car, you need to apply a cost model. Because some car parts are easily fixable, say a scratch in the bumper, just respray the bumper, but scratch in the C-Pillar easily is a repaint for the whole back quarter if it should not be noticeable anymore.
The same goes for bigger scratches / cracks: the optical detection model needs to be able to distinguish between scratches and cracks (which is very hard to do just by looking at them), and then the cost model can infer the cost, i.e. whether a bumper just needs a respray or needs complete replacement (because it is cracked and not just scratched). This cost model may seem easy, but bear in mind that it needs to be adapted to every car you "scan", because damage that is cheap to fix on one car body might be very hard to fix on a different car body. I'd say this might even be harder than spotting the initial scratches, because you'd need to obtain the construction plans / repair part lists of any vehicle you want to quote (the repair handbooks / repair part lists are mostly accessible if you are a registered mechanic, but they might cost licensing fees).
You see, this is a very complex problem which is composed of multiple hard sub-problems. The easiest, or probably the best, way to do this would be a bottom-up approach, i.e. starting with a simple "scratch detector" which just spots scratches in paint. Then go from there and you will easily see what is possible and what is not.

How to export specific price and volume data from the LMAX level 2 widget to excel

Background -
I am not a programmer.
I do trade spot forex on an intraday basis.
I am willing to learn programming
Specific Query -
I would like to know how to export into Excel in real time 'top of book' price and volume data as displayed on the LMAX level 2 widget/frame on -
https://s3-eu-west-1.amazonaws.com/lmax-widget/website-widget-quote-prof-flex.html?a=rTWcS34L5WRQkHtC
In essence I am looking to export
price and volume data where the coloured flashes occur.
price and volume data for when the coloured flashes do not occur.
I understand that 1) and 2) will encompass all the top of book prices and volume. However, I would like to keep 1) and 2) separate/distinguished as far as data collection is concerned.
Time period for which the collected data intends to be stored -> 2-3 hours.
What kind of languages do I need to know to do the above?
I understand that I need to be an advanced excel user too.
Long term goals -
I intend to use the above information to make discretionary intraday trading decisions.
In the long run I will get more involved with creating an algo or indicator to help with the decision making process, which would include the information above.
I have understood that one needs to know coding to get involved in activities such as the above. Hence I have started learning C++, more so to get a hang/feel for coding.
I have been searching all over the web as to where to start in this endeavor. However I am quite confused and overwhelmed with all the information.
Hence apart from the specific data export query, any additional guidelines would also be helpful.
As of now I use MT4 to trade. Hence I believe to do the above - I will need more than just MT4.
Any help would be highly appreciated.
Yes, MetaTrader4 is still not able ( in spite of all white-label-ed Terminals' OrderBook Add-On(s) marketing and PR efforts ) to provide an OrderBook-L2/DoM-data into your MQL4 / NewMQL4 algorithm for any decision making. Third party software tools' integration is needed to make MQL4-code aware of the real-time L2/DoM-data.
The LMAX widget has an impressive look and feel, but re-using it as an automated scanner that produces data for 1) and 2) for your Excel export would take a lot of programming effort. There may also be non-technical obstacles: legal or operational restrictions on running an automated scanner against such a data source. For example, one data publisher's policy restricts automated options-pricing scanners for options on { FTSE | CAC | AMS | DAX } to re-visiting the published online data no more than once every quarter of an hour, and blocks / blacklists scanners that exceed that. So care and proper data-source engineering are needed.
The size of the data collection is another issue. Excel limits the number of rows and columns in a worksheet (1,048,576 rows by 16,384 columns in current versions), and large CSV imports can hit these limits. L2/DoM data collected for 2-3 hours for just one FX Major may exceed them, as there are many records per second (tens, if not hundreds, with just a few milliseconds between them). Writing the collected records to disk as one static file typically takes several minutes, so a properly distributed data-flow design and non-blocking file I/O are a must.
Real-time system design, rather than a programming-language exercise, is the right angle from which to approach this problem. Having mastered a programming language is a great move. Nevertheless, robust real-time system design (and trading software is such a domain) requires, with all respect, a lot more insight and hands-on experience than it takes to make MQL4 code run multi-threaded and multi-process with a few DLL services for a Cloud/Grid-based distributed processing system.
How much real-time traffic is expected to be there?
For a rough idea of what the market can produce per second, per millisecond, per microsecond, let's view a NYNEX traffic analysis for one instrument:
One second can have this wild relief:
And once looking into 5-msec sampling:
How to export
Check if the data-source owner legally permits your automated processing.
Create your own real-time DataPump software, independent of the HTML-wrapped Widget
Create your own 'DB-store' to efficiently off-load scanned data-records from real-time DataPump
Test the live data-source >> DataPump >> DB-store performance & robustness on being able to serve error-free a 24/6 duty for several FX Majors in parallel
Integrate your DataPump fed DB-store local data-source for on-line/off-line interactions with your preferred { MT4 | Excel | quantitative-analytics } package
Integrate a monitoring of any production environment irregularity in your real-time processing pipeline, which may range from network issues, VPN / hosting issues, data-source availability issues to an unexpected change in the scanned data-source format/access conditions.
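To give a feel for the 'DB-store' step only, here is a small Python sketch using the standard-library `sqlite3` module. The table layout, column names and the `flashed` flag (to keep cases 1 and 2 from the question apart) are assumptions made for this example; it says nothing about how the records are obtained from the data source, which is the hard and legally sensitive part.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store for top-of-book records."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS top_of_book ("
        " ts REAL NOT NULL,"         # receive time, seconds
        " side TEXT NOT NULL,"       # 'bid' or 'ask'
        " price REAL NOT NULL,"
        " volume REAL NOT NULL,"
        " flashed INTEGER NOT NULL)" # 1 if the update flashed, else 0
    )
    return conn


def store_batch(conn, records):
    """Write a batch of (ts, side, price, volume, flashed) tuples in one transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO top_of_book VALUES (?, ?, ?, ?, ?)", records
        )


def fetch(conn, flashed):
    """Return the records of one category, oldest first."""
    cur = conn.execute(
        "SELECT ts, side, price, volume FROM top_of_book"
        " WHERE flashed = ? ORDER BY ts",
        (1 if flashed else 0,),
    )
    return cur.fetchall()
```

Batching the inserts keeps the real-time side from waiting on the disk for every record, and Excel (or any analytics package) can then pull just the slice it needs instead of the whole collection.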

Calculate distance to RFID tag?

Is there a way to calculate/estimate the physical distance to a long-distance passive RFID tag when reading it with a tag reader? E.g. to determine the order of books in a shelf, or telling if one object is close or far away.
If the answer is 'No - not according to the standard', would it be possible to build a reader with this feature? (I guess the only way to achieve this would be to measure the time between call and response very precisely).
It is possible, but the extent and precision depend on a lot of factors: reader and tag performance, the quality of the software, and the resources you are willing to invest in such software (both time and people in R&D).
There are mainly two ways this can be achieved. The first relies on the RSSI, which is basically the signal strength. The main difficulty with this indicator is that signal strength is influenced by many factors: reflections, whether the signal needs to pass through a wooden cabinet or a wall, the quality of the tag, etc.
The second uses the time at which the response to an enquiry is received (Time Difference of Arrival between tags). Given that you know the propagation speed of the signal, you can estimate the distance, provided you have a very precise timer. The problem is that this too is influenced by many factors: the mean time the tag needs to complete a cycle (which you should know, and which should be the same for every tag used), and the timer precision, since the hardware is not built for this purpose.
Naturally, a combination of both should be employed for maximum precision, and both are actually used by companies that rely on these algorithms to provide RTLS (Real Time Location Systems) applications through triangulation and trilateration.
For further information you can check: RTLS, RSSI, TDOA, Trilateration (and Multilateration).
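To show why the timer precision matters so much for the timing approach, here is a small Python sketch of a round-trip calculation. It assumes the simplest possible model (a fixed, known tag turnaround delay and propagation at the speed of light); the function names are made up for the example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def distance_from_round_trip(round_trip_s, tag_delay_s=0.0):
    """Distance in metres from a measured call-to-response time.

    tag_delay_s is the time the tag needs to respond, assumed known
    and identical for every tag.
    """
    flight_s = round_trip_s - tag_delay_s
    if flight_s < 0:
        raise ValueError("tag delay exceeds the measured round trip")
    # The signal covers the distance twice: out and back.
    return SPEED_OF_LIGHT_M_PER_S * flight_s / 2.0


def distance_error_for_timer(resolution_s):
    """Distance uncertainty caused by a timer with the given resolution."""
    return SPEED_OF_LIGHT_M_PER_S * resolution_s / 2.0
```

A timer resolution of 1 ns already corresponds to roughly 0.15 m of distance, so ordering books on a shelf this way needs sub-nanosecond timing and a very stable tag delay.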
It is possible. As far as I know the company below (I'm not working there, I just happen to know someone who worked there a year before):
http://www.lambda4.com/
is working on such a technology.
It may not be possible if you have a single reader; however, if you have multiple receivers and reasonably clear lines of sight, "estimating" the distance becomes possible by looking at signal strengths. It's not trivial though, since the power radiated by an RFID tag is not isotropic (i.e. not uniform in all directions) due to the antenna design; if you have three receivers and a uniform source of RF, you can solve for the distance, but when you add in the antenna pattern and other factors like signal path attenuation and multipath, it becomes really hard - especially when there are multiple devices in the vicinity.
This is at least in part because the RFID was not designed with an output pattern that helps optimize localization, such as a frequency chirp, short power bursts, or other modulation features that allow estimating the time of flight of the signal from source to receiver and back.
The general equation to find the distance to an RFID tag is Ploss = 20⋅log[(4 π ⋅ d)/λ]
In case of UHF RFID, the equation to find the gap or distance to passive tag from the reader is
Pgap = 22.6 (dB) + Patt, where 22.6 dB is the power for the near field (λ = c/f ≈ 35 cm, where f is the operating frequency) and Patt is the magnitude of the power attenuator:
22.6 + Patt = 20⋅log[(4 π ⋅ d)/λ]
In free space, the approximate distance to the RFID tag can be obtained by solving the above equation for d.
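Solving the equation above for d gives d = λ/(4π) ⋅ 10^(Ploss/20), taking log as the base-10 logarithm. A short Python sketch of that rearrangement follows; the function names are made up for the example, and the 22.6 dB and 35 cm defaults are simply the figures quoted in this answer, valid only under the free-space assumption.

```python
import math


def free_space_loss_db(distance_m, wavelength_m):
    """Ploss = 20 * log10(4 * pi * d / lambda)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m / wavelength_m)


def distance_from_loss(loss_db, wavelength_m):
    """Invert the free-space equation: d = lambda / (4 * pi) * 10^(Ploss / 20)."""
    return wavelength_m / (4.0 * math.pi) * 10.0 ** (loss_db / 20.0)


def distance_from_attenuator(p_att_db, wavelength_m=0.35, near_field_db=22.6):
    """Distance implied by 22.6 + Patt = 20 * log10(4 * pi * d / lambda)."""
    return distance_from_loss(near_field_db + p_att_db, wavelength_m)
```

Every additional 20 dB of attenuation corresponds to a tenfold increase in the estimated distance, which also shows how sensitive the result is to a few dB of measurement error.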

Preferable Tag Cloud Visualization Formats

Out of curiosity, I would love to know which tag cloud formats best serve the purpose of discovering more and more (relevant) content.
I am aware of 3 formats, but don't know which one is the best.
1) delicious one - color shading
2) The standard one with font size variations -
3) The one on this site - numbers showing importance/usage.
So which ones do you prefer? and why?
Edit:
Thanks to the answers below, I now have much more understanding of tag cloud visualization techniques.
4) Parallel Tag Clouds - a simple use of parallel coordinates technique. I find it more organized and readable.
5) Voronoi diagram - more useful for identifying tag relationships and making decisions based on them. Doesn't serve our purpose of discovery of relevant content.
6) Mind maps - They are good and can be employed to step by step filter content.
I found some more interesting techniques here - http://www.cs.toronto.edu/~ccollins/research/index.html
I really do think that depends on the content of the information and the audience. What's relevant to one is not relevant to another. If an audience is more specialized, then they will be more likely to think along the same lines, but it would still need to be analyzed and catered to by the content provider.
There are also multiple paths that a person can take to "discover more". Take the tag "DNS" for example. You could drill down to more specific details like "UDP Port 53" and "MX Record", or you could go sideways with terms like "IP address" "Hostname" and "URL". A Voronoi diagram shows clusters, but wouldn't handle the case where general terms could be related to many concepts. Hostname mapping to "DNS", "HTTP", "SSH" etc.
I've noticed that in certain tag clouds there's usually one or two items that are vastly larger than the others. Those sorts of things could be served by a mind map, where one central concept has others radiating out from it.
For the cases of lots of "main topics" where a mind map is inappropriate, there are parallel coordinates but that would be baffling to many net users.
I think that if we found an extremely well organized way of sorting clusters of tags while preserving links between generalities and specificities, that would be somewhat helpful to AI research.
In terms of which I personally prefer, I think the numeric approach is nice because infrequently referenced tags are still presented at a readable font size. I also think SO does it this way because they have vastly more tags to cover than the average size based cloud a la the standard.
I would go with #2 out of the options you listed above.
1 - The human eye recognizes and comprehends size differences much more effectively than color, when the color scale is along the same spectrum (ie, various blues as opposed to discrete individual colors).
3 - Requires the user to scan the full list and mathematically compare each individual number while scanning. No real meaningful relationship between tags without a lot of work on the user's part.
So, going with #2, there are several considerations to take into account:
Keep the tags alphabetical. This affords the user another method of searching and establishes a known relationship between each (assuming they know the alphabet!). If they're unordered, it's just a crapshoot to find a single one.
If size comparison is absolutely critical (this usually isn't the case, as you can scale up each level by a certain percentage or pixel amount), use a monospaced font. Otherwise, certain letter combinations may end up looking larger than they actually are.
Don't include any commas, pipes, or other dividers. You're already going to have a lot of data in a small area - no need to clutter it up with debris. Space the tags out with a decent amount of padding, of course. Just don't double the number of visual elements by adding more than just the data.
Set a min/max font size and scale between those. There are situations where one tag may be so popular that visually it may appear exponentially larger than the others. Likewise, you don't want a tag to end up rendering at 1px! Set the min/max and adjust between as necessary.
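The min/max scaling in the last point can be sketched as follows in Python. The 12 px and 36 px bounds are arbitrary example values, and the logarithmic option is my own addition for clouds where one tag dwarfs the rest; it assumes all counts are at least 1.

```python
import math


def font_size(count, min_count, max_count, min_px=12.0, max_px=36.0, log_scale=True):
    """Map a tag's usage count to a font size between min_px and max_px."""
    if max_count <= min_count:
        return min_px  # all tags equally popular
    # Clamp so an out-of-range count can never escape the size bounds.
    count = max(min_count, min(count, max_count))
    if log_scale:
        # Compresses the gap between one very popular tag and the rest.
        t = (math.log(count) - math.log(min_count)) / (
            math.log(max_count) - math.log(min_count)
        )
    else:
        t = (count - min_count) / (max_count - min_count)
    return min_px + t * (max_px - min_px)
```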
Size-adjusted Voronoi diagram
- it shows which tags are inter-related
My favorite tag cloud format is the Wordle format. It looks great and it also does a pretty good job of fitting a lot of tags in a small space.
