Text Display Implementation Across Multiple Platforms - text

I have been scouring the internet for weeks trying to figure out exactly how text (such as what you are reading right now) is displayed to the screen.
Results have been shockingly sparse.
I've come across the concepts of rasterization, bitmaps, vector graphics, etc. What I don't understand is how the underlying implementation works so uniformly across all systems (Windows, Linux, etc.) in a way we as humans understand. Is there a specification defined somewhere? Is the implementation code open source and viewable by the general public?
My understanding as of right now is this:
Create a font with an external drawing program, one character at a time
Add these characters into a font file that is understood by language-specific libraries
These characters are then read from the file as needed by the GPU and displayed to the screen in a linear fashion as defined by the parenting code.
Additionally, if characters are defined in a font file such as 'F, C, ..., Z', how are vector graphics (which rely on a set of coordinate points) supported? Without coordinate points, rasterization would seem the only option for size changes.
This is about as far as my assumptions/research goes.
If you are familiar with this topic and can provide a detailed answer that may be useful to myself and other readers, please answer at your discretion. I find it fascinating just how much code we take for granted that is remarkably complicated under the hood.

The following provides an overview (leaving out lots of gory details):
Two key components for display of text on the Internet are (i) character encoding and (ii) fonts.
Character encoding is a scheme by which characters, such as the Latin capital letter "A", are assigned a representation as a byte sequence. Many different character encoding schemes have been devised in the past. Today, what is almost ubiquitously used on the Internet is Unicode. Unicode assigns each character to a code point, which is an integer value; e.g., Unicode assigns LATIN CAPITAL LETTER A to the code point 65, or 41 in hexadecimal. By convention, Unicode code points are referred to using four to six hex digits with "U+" as a prefix. So, LATIN CAPITAL LETTER A is assigned to U+0041.
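As a small illustration of the code point idea, here is a sketch in Python; the helper name describe is made up for this example, while ord and str.encode are standard built-ins.

```python
def describe(ch: str) -> str:
    """Return the conventional U+XXXX notation for a single character."""
    code_point = ord(ch)            # integer code point assigned by Unicode
    return f"U+{code_point:04X}"    # at least four hex digits, more if needed


print(ord("A"))               # 65
print(describe("A"))          # U+0041
print("A".encode("utf-8"))    # b'A' (a single byte, 0x41)
```

The code point is the abstract number; UTF-8 is one of several ways to turn that number into a byte sequence.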
Fonts provide the graphical data used to display text. There have been various font formats created over the years. Today, what is ubiquitously used on the Internet are fonts that follow the OpenType spec (which is an extension of the TrueType font format created back around 1991).
What you see presented on the screen are glyphs. An OpenType font contains data for the glyphs, and also a table that maps Unicode code points to corresponding glyphs. More precisely, the character-to-glyph mapping (or 'cmap') table maps Unicode code points to glyph IDs. The code points are defined by Unicode; the glyph IDs are a font-internal implementation detail, and are used to look up the glyph descriptions and related data in other tables.
Glyphs in an OpenType font can be defined as bitmaps, or (far more common) as vector outlines (Bezier curves). There is an assumed coordinate grid for the glyph descriptions. The vector outlines, then, are defined as an ordered list of coordinate pairs for Bezier curve control points. When text is displayed, the vector outline is scaled onto a display grid, based on the requested text size (e.g., 10 point) and pixel sizing on the display. A rasterizer reads the control point data in the font, scales as required for the display grid, and generates a bitmap that is put onto the screen at an appropriate position.
One extra detail about displaying the rasterized bitmap: most operating systems or apps will use some kind of filtering to give glyphs a smoother and more legible appearance. For example, a grayscale anti-alias filter will set display pixels at the edges of glyphs to a gray level, rather than pure black or pure white, to make edges appear smoother when the scaled outline doesn't align exactly to the physical pixel boundaries—which is most of the time.
I mentioned "at an appropriate position". The font has metric (positioning) information for the font as a whole and for each glyph.
The font-wide metrics will include a recommended line-to-line distance for lines of text, and the placement of the baseline within each line. These metrics are expressed in the units of the font's glyph design grid; the baseline corresponds to y=0 within the grid. To start a line, the (0,0) design grid position is aligned to where the baseline meets the edge of a text container within the page layout, and the first glyph is positioned.
The font also has glyph metrics. One of the glyph metrics is an advance width for each given glyph. So, when the app is drawing a line of text, it has a starting "pen position" at the start of the line, as described above. It then places the first glyph on the line accordingly, and advances the pen position by the amount of that first glyph's advance width. It then places the second glyph using the new pen position, and advances again. And so on as glyphs are placed along the line.
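The pen-position loop described above can be sketched roughly like this. The advance widths in ADVANCE are invented values for illustration (a real layout engine reads them from the font and works with glyph IDs rather than characters), and layout_line is a hypothetical helper name.

```python
# Hypothetical advance widths in font design units (1000 units per em).
ADVANCE = {"H": 722, "i": 278, "!": 333}


def layout_line(text, units_per_em=1000, pixels_per_em=16.0, origin_x=0.0):
    """Place glyphs left to right; return (placements, final pen position)."""
    scale = pixels_per_em / units_per_em   # design units -> pixels
    pen_x = origin_x                       # pen starts at the line origin
    placed = []
    for ch in text:
        placed.append((ch, pen_x))         # draw this glyph at the pen position
        pen_x += ADVANCE[ch] * scale       # then advance by the glyph's width
    return placed, pen_x


placements, end_x = layout_line("Hi!")
for ch, x in placements:
    print(ch, round(x, 3))
```

Glyph substitution and positioning adjustments are deliberately left out of this sketch.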
There are (naturally) more complexities in laying out lines of text. What I described above is sufficient for English text displayed in a basic text editor. More generally, display of a line of text can involve substitution of the default glyphs with certain alternate glyphs; this is needed, for example, when displaying Arabic text so that characters appear cursively connected. OpenType fonts contain a "glyph substitution" (or 'GSUB') table that provides details for glyph substitution actions. In addition, the positioning of glyphs can be adjusted for various reasons; for example, to position a diacritic glyph correctly over a letter. OpenType fonts contain a "glyph positioning" ('GPOS') table that provides the position adjustment data. Operating system platforms and browsers today support all of this functionality so that Unicode-encoded text for many different languages can be displayed using OpenType fonts.
Addendum on glyph scaling:
Within the font, a grid is set up with a certain number of units per em. This is set by the font designer. For example, the designer might specify 1000 units per em, or 2048 units per em. The glyphs in the font and all the metric values (glyph advance width, default line-to-line distance, etc.) are set in font design grid units.
How does the em relate to what content authors set? In a word processing app, you typically set text size in points. In the printing world, a point is a well defined unit for length, approximately but not quite 1/72 of an inch. In digital typography, points are defined as exactly 1/72 of an inch. Now, in a word processor, when you set text size to, say, 12 points, that really means 12 points per em.
So, for example, suppose a font is designed using 1000 design units per em. And suppose a particular glyph is exactly 1 em wide (e.g., an em dash); in terms of the design grid units, it will be exactly 1000 units wide. Now, suppose the text size is set to 36 points. That means 36 points per em, and 36 points = 1/2", so the glyph will print exactly 1/2" wide.
When the text is rasterized, it's done for a specific target device, that has a certain pixel density. A desktop display might have a pixel (or dot) density of 96 dpi; a printer might have a pixel density of 1200 dpi. Those are relative to inches, but from inches you can get to points, and for a given text size, you can get to ems. You end up with a certain number of pixels per em based on the device and the text size. So, the rasterizer takes the glyph outline defined in font design units per em, and scales it up or down for the given number of pixels per em.
For example, suppose a font is designed using 1000 units per em, and a printer is 1000 dpi. If text is set to 72 points, that's 1" per em, and the font design units will exactly match the printer dots. If the text is set to 12 points, then the rasterizer will scale down so that there are 6 font design units per printer dot.
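The arithmetic in these examples can be checked with a few lines of Python. The function names are made up for this sketch; the only assumption beyond the text is the digital-typography definition of 72 points per inch already stated above.

```python
POINTS_PER_INCH = 72


def em_size_inches(point_size):
    """Physical size of one em for a given text size in points."""
    return point_size / POINTS_PER_INCH


def pixels_per_em(point_size, dpi):
    """Number of device pixels (or dots) that one em covers."""
    return point_size * dpi / POINTS_PER_INCH


def design_units_per_pixel(units_per_em, point_size, dpi):
    """How many font design units fall into one device pixel."""
    return units_per_em / pixels_per_em(point_size, dpi)


print(em_size_inches(36))                                # 0.5
print(pixels_per_em(72, 1000))                           # 1000.0
print(round(design_units_per_pixel(1000, 12, 1000), 3))  # 6.0
print(pixels_per_em(12, 96))                             # 16.0
```

The last line shows the desktop case: 12-point text on a 96 dpi display gets only 16 pixels per em, which is why hinting matters much more on screens than on printers.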
At that point, the details in the glyph outline might not align to whole units in the device grid. The rasterizer needs to decide which pixels/dots get ink and which do not. The font can include "hints" that affect the rasterizer behaviour. The hints might ensure that certain font details stay aligned, or the hints might be instructions to move a Bezier control point by a certain amount based on the current pixels-per-em.
For more details, see Digitizing Letterform Designs and Font Engine from Apple's TrueType Reference Manual, which goes into lots of detail.


Is it possible to have pixel hinting on vector graphics of an unknown size on a webpage?

Glyphs in typefaces for screens often use hinting to align the shapes with the screen pixels so the result has sharp edges. Could I do something similar with arbitrary vector graphics on a webpage?
I know that I can align lines with pixels in a vector graphic, but that works at only the default size and its integer multiples. My idea is that the graphic would have hinting similar to what is used in typefaces to have sharp edges at all sizes.
This could be used for icons, text decorations or list item markers and for prerendered math formulae. In the case of a formula, the hinting would be automatically derived from the hinting of glyphs in the typeface used to render the formula.
SVG supports two CSS properties for pixel alignment optimization:
shape-rendering handles the edges of graphic primitives and the anti-aliasing applied to them.
text-rendering handles the positioning of glyphs and the way font-internal rendering hints are applied.
Both are presentation attributes that can be used either in CSS styles or as XML attributes.
Both act under the caveat that the values of the properties are treated as hints, with the browser free to interpret them the optimal way.
There is not one solution that will work out in every situation. A prominent case is text rendered at an angle to a horizontal line, or text along a curved path. If you choose to optimizeLegibility, the individual glyphs will often be slightly rotated and moved away from their precise position and may not remain in a straight line. If you choose geometricPrecision, especially small fonts may suffer from degrading legibility.
For graphic primitives, the most pronounced effects show up for narrow (curved) strokes and for multiple graphical primitives that share a common edge (think of two rectangles next to each other). There, hinting (to turn anti-aliasing on with geometricPrecision or off with crispEdges) may help in some situations, but in others you still have to resort to wider strokes or overlapping areas.
Another fallback technique is to restrict the scaling of a graphic to certain integer multiples or fractions, so that you keep control over pixel alignment.
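The restricted-scaling fallback could be computed along these lines. This is only a sketch under an assumption that is not in the answer: every edge of the graphic sits on a coordinate that is a multiple of grid_step design units, so any scale that is a multiple of 1/grid_step lands those edges on whole pixels. The name snap_scale is hypothetical.

```python
from fractions import Fraction


def snap_scale(requested, grid_step=1):
    """Snap a requested scale factor to the nearest pixel-aligned factor.

    Assumes all edges lie on multiples of `grid_step` design units, so the
    allowed scale factors are the positive multiples of 1/grid_step.
    """
    step = Fraction(1, grid_step)
    multiples = max(1, round(Fraction(requested) / step))  # never scale to zero
    return multiples * step


print(snap_scale(2.6))      # 3
print(snap_scale(1.3, 4))   # 5/4
```

The result would then be applied as the actual size of the graphic (for example via its width and height), instead of the size the layout originally asked for.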

Are RGB images converted to sRGB automatically before being viewed in web browser?

If we have an RGB image, most browsers and, in fact, monitors only support sRGB space. I am trying to understand something important. Does the monitor/web then convert each of the pixels in the image to sRGB and then display it? Meaning we are actually seeing the sRGB version of the image.
Also, if that is the case, which formula can we use to do the conversion, and if we did the conversion ourself, I assume we would get an image that 'looks' exactly the same as the original?
The pixel values in an image file are just numbers. If the image is "in" a space larger than sRGB (such as ProPhoto), the pixel values will only be "converted" to sRGB if you have color management enabled, OR you perform the conversion yourself.
A browser or device will only convert tagged images of a non-sRGB colorspace TO sRGB IF there is a color management engine.
With no color management and a standard sRGB monitor, all images will display "as if" they were sRGB, regardless of their colorspace; i.e., non-sRGB images may display incorrectly.
Even with color management, if the image is untagged, it will be displayed as whatever default (usually sRGB) the system is set to use.
As for formulas: the conversion is known generally as "gamut mapping" — you are literally mapping the chromaticity components from one space to another. There are multiple techniques and methods which I will discuss below with links.
If you want to do your own colorspace conversions, take a look at Bruce Lindbloom's site. If you want a color management engine you can play around with, check out Argyll, and here is a list of open source tools.
EDIT TO ADD: Terms & Techniques
Because there are other answers here with some "alternate" (spurious) information, I'm editing my answer here to add and help clarify.
sRGB uses the same primary and whitepoint chromaticities as Rec.709 (aka HDTV). The tone response curve is slightly different.
sRGB was developed by HP and Microsoft in 1996, and was published as an international standard by the IEC in 1999 (IEC 61966-2-1).
The W3C (World Wide Web Consortium) set sRGB as the standard for all web content, defined in CSS Color.
Side note, "HTML" is not a standards organization, it is a markup language. sRGB was added to the CSS 3 standard.
Profiles do nothing if there is no color management system in place.
And to be clear (in reference to another answer): RGB is a color MODEL, sRGB is a color SPACE, and a parsable description like an ICC profile of sRGB is a color PROFILE.
Second, sRGB is the STANDARD for web content, and RGB values with no profile displayed in a web browser are nominally assumed to be sRGB, and thus interpreted as sRGB in most browsers. HOWEVER, if the user has wide gamut (non-sRGB monitors) and no color management, then a non-color managed browser is typically displaying at the display's colorspace which can have unexpected results.
RGB is an additive COLOR MODEL. It is so named as it is a tristimulus model that uses three narrow band light "colors" (red green and blue) which are chosen to stimulate each of the eye's cone types as independently as possible.
sRGB is a colorSPACE. A color space is a subset of a color model, but adding in specifics such as the chromaticities of the primary colors, the white point, and the tone response curve (aka TRC or gamma).
sRGB-IEC61966-2.1.icc is an ICC color PROFILE of the sRGB colorspace, used to inform color management software to the specifics such that appropriate conversion can take place.
Some profiles relate to a specific device like a printer.
Some profiles are used as a "working space".
An ICC profile includes some of the math related information to apply the profile to a given target.
Color Management is a system that uses information about device profiles and working color space to handle the conversion for output, viewing, soft proofing on a monitor, etc. See This Crash Course on CM
LUT or LookUp Table is another file type that can be used to convert images or apply color "looks".
Gamut mapping is the technique to convert, or map, the color coordinates of one color space to the coordinates of a different color space. The method for doing this depends on a number of factors, and the desired result or rendering intent.
Rendering intent means the approach used, as in, should it be perceptual? Absolute colorimetric? Relative with black point compensation? Saturation? See this discussion of mapping and Argyll.
Colorspace transforms and conversions are non-trivial. It's not as if you can just stick image data through a nominal formula and have a good result.
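To make that concrete, here is a rough sketch of the purely colorimetric part of an Adobe RGB (1998) to sRGB conversion. The matrices are the D65 values as I recall them from Bruce Lindbloom's site (treat the digits as an assumption and check them there), the function names are made up, and the final clip is the crudest possible stand-in for real gamut mapping.

```python
# Linear Adobe RGB (1998) -> CIE XYZ, and CIE XYZ -> linear sRGB (both D65).
ADOBE_TO_XYZ = (
    (0.5767309, 0.1855540, 0.1881852),
    (0.2973769, 0.6273491, 0.0752741),
    (0.0270343, 0.0706872, 0.9911085),
)
XYZ_TO_SRGB = (
    (3.2404542, -1.5371385, -0.4985314),
    (-0.9692660, 1.8760108, 0.0415560),
    (0.0556434, -0.2040259, 1.0572252),
)
ADOBE_GAMMA = 563 / 256   # Adobe RGB tone response, about 2.2
EPS = 1e-4                # tolerance for the rounded matrix entries


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))


def linear_to_srgb(l):
    """sRGB tone response curve (piecewise, per IEC 61966-2-1)."""
    if l <= 0.0031308:
        return 12.92 * l
    return 1.055 * l ** (1 / 2.4) - 0.055


def adobe_to_srgb(rgb8):
    """Convert an 8-bit Adobe RGB triple; return (sRGB triple, in_gamut)."""
    linear_adobe = tuple((v / 255) ** ADOBE_GAMMA for v in rgb8)
    linear_srgb = mat_vec(XYZ_TO_SRGB, mat_vec(ADOBE_TO_XYZ, linear_adobe))
    in_gamut = all(-EPS <= v <= 1 + EPS for v in linear_srgb)
    clipped = tuple(min(1.0, max(0.0, v)) for v in linear_srgb)  # naive clip
    return tuple(round(255 * linear_to_srgb(v)) for v in clipped), in_gamut


print(adobe_to_srgb((0, 255, 0)))   # pure Adobe RGB green is outside sRGB
```

The Adobe RGB green primary comes out with negative linear sRGB components, so the "formula" alone cannot represent it. Deciding what to do with such colours is exactly the job of gamut mapping and the rendering intent.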
Reading through the several links above will help, and I'd also like to suggest Elle Stone's site, particularly regarding profiles.
Note: I use what I think is the most common notation (as in the alternate notation in the previous answer): RGB is a colour model (so a formula to calculate various things, but without defined colorimetry, scale, and gamma); sRGB is a colour space, so it has a defined gamut, and with a colour space we know which colours can be described and which cannot; and a profile is a characterisation of a device (so it defines a device-specific colour space), an intent, and often also some calculation methods to transform colours.
sRGB was defined by computer manufacturers and software companies to standardize colours, but with old screens and low resolution it really didn't matter much. Note: they used the primaries and white point of Rec.709 (HDTV), but with a different gamma (viewing conditions are different: we watch TV and movies in darker rooms, while we use computers for work in brightly lit rooms).
So the normal way (before colour profiles): an image had 3 channels with values from 0 to 255 each, one for red, one for green, one for blue. This was sent directly to video memory, and the video card sent these values to the screen without modifying them (on digital RGB signals). The screen used the 3 channel values for the intensities of the 3 sub-pixels. Note: contrast and brightness controls (on CRT screens) permitted some correction.
Because the assumed colour space was sRGB (and screens were built to display sRGB), this was the standard, and it was standardized by HTML (as the default colour space). So if your browser has no explicit colour space for some content (e.g. for an image), it will assume sRGB and will not change the values.
Screens improved, content started to be created and modified on computers, and because many media have a different colour space, images started to specify their colour space: TV has a restricted range (16-235) and a different gamma (and white point); DCI-P3 (digital cinema) has a different gamma and wide-gamut primaries; printing often requires a wider gamut (forget small CMYK printers); printing photos also requires different dynamic ranges, gamma, white, and colour space.
So now (assuming an RGB image, but note that many images are not RGB, but YCC (e.g. JPEG)), an image should have its own profile, which tells us the colour characteristic of the camera (so which red is the value 255,0,0). A colour aware program will check the output profile, and the input profile, and it will adapt the colours, so that the final result is near to the intended colour.
So, if you have an unprofiled or an sRGB image, and no profile for your screen (or a fake sRGB profile): the value 255,0,0 will display the most intense and "red-dest" Red that your screen can display.
If you have an unprofiled image but a profile for the output screen, the result depends on the intent. With the "absolute" intent, the screen tries as best it can to match the colours according to sRGB; an out-of-gamut colour is simply shown as the nearest in-gamut colour. With the "relative" intent, many values are scaled so that you do not get clipped areas (the same colour for many out-of-gamut values). Eyes will correct, so you will adapt (and we adapt quickly, e.g. to unsaturated colour spaces such as sRGB). The other intents are more about graphics: the colours differ from the original but are kept as distinct as possible (for plots and comics this could be good).
If you have a profiled image, it is nearly the same, just that you will find more differences.
An AdobeRGB image (but without profile) will be displayed with the correct saturation on most wide-gamut screens (with wide gamut enabled), and it will be displayed as unsaturated on an sRGB screen (if there is no profile; the "absolute" and "perceptual" intents could correct the lack of saturation).
Conversely, an sRGB image displayed on an AdobeRGB screen will look too saturated. If the image has a profile, the image will be seen correctly.
In an RGB image (in the usual formats) you cannot have colours outside the gamut of that image: 255,0,0 and 0,255,0 and 0,0,255 are the primary colours of the image's colour space, so you can describe only colours within that colour space (assume sRGB if none is specified). This is not true of some formats where negative values, or values above the "white" value, are allowed, e.g. formats with floating-point values (OpenEXR).
Note: Wide gamut screens often have a hardware button to switch colour space, from the native one to sRGB (and back), because many applications were not compatible with colour profiles, but we still need browsers and mail.
If you are interested, the book by Giorgianni et al. is a good introduction: both authors worked at Kodak (so on film for photos and movies, but they also worked on creating the PhotoCD), so they dealt with a lot of problems with screens, colour spaces, and intent. ICC (the standard for profiles) is, in my opinion, the follow-up to that book; the ICC site has various information about the topic.
In very simple terms: RGB is a color space while sRGB is a color profile.
A profile interprets the RGB values to a specific context e.g. device, software, browser etc. to make sure you have consistent colors across a wide range of devices where a user might see your picture.
RGB values without a profile are basically useless because the displaying device or software has to guess how to map the RGB values to the display's color space. An equivalent is to imagine being given 100 bills of an unknown currency and being asked to convert them to your home currency. It doesn't work; you need to know how the values of the two map to each other.
So basically you don't have to worry about interpreting your images yourself. For the web, it seems the soundest approach is to always convert to the sRGB profile (the image's color space is still RGB) and let the browser do the interpretation.
You'll find seemingly up to date info with a graphic of the main browsers and their ability to correctly display the sRGB profile on this EIZO page.
PS: To add to the general confusion around color management – a color space might sometimes be called color model while a color profile might sometimes be called color space.

Why do we use the term DPI for matters involving images on computers

I'm told that DPI and Points are no longer relevant in terminology involving graphical displays on computer screens and mobile devices yet we use the term "High DPI Aware" and in Windows you can set the various DPI levels (96, 120, 144, 192).
Here is my understanding of the various terms that are used in displaying images on computer monitors and devices:
DPI = number of dots in one linear inch. But DPI refers to printers and printed images.
Resolution = the number of pixels that make up a picture whether it is printed on paper or displayed on a computer screen. Higher resolution provides the capability to display more detail. Higher DPI = Higher resolution, however, resolution does not refer to size, it refers to the number of pixels in each dimension.
DPI Awareness = an app takes the DPI setting into account, making it possible for an application to behave as if it knew the real size of the pixels.
Points and Pixels: (There are 72 points per inch.)
At 300 DPI, there are 300 pixels per inch. So 4.17 pixels = 1 point.
At 96 DPI there are 1.33 pixels in one point.
Is there a nice way to "crisply" describe the relationship between DPI, PPI, Points, and Resolution?
You are correct that DPI refers to the maximum amount of detail per unit of physical length.
Computer screens are devices that have a physical size, so we speak of the number of pixels per inch they have. Traditionally this value has been around 80 PPI, but now it can be up to 400 PPI.
The notion of "High DPI Aware" (e.g. Retina) is based on the fact that physical screen sizes don't change much over time (for example, there have been 10-inch tablets for more than a decade), but the number of pixels we pack into the screens is increasing. Because the size isn't increasing, it means the density - or the PPI - must be increasing.
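As a quick way to see the density numbers, PPI can be computed from a screen's pixel dimensions and its diagonal size. The function name and the sample screens are made up for this sketch.

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_inches):
    """Pixel density from the pixel dimensions and the physical diagonal."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_inches


print(round(pixels_per_inch(1920, 1080, 24), 1))   # a 24-inch desktop monitor
print(round(pixels_per_inch(2560, 1600, 10), 1))   # a 10-inch high-density tablet
```

Same kind of physical size, very different pixel counts, hence very different densities.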
Now when we want to display an image on a screen that has more pixels than an older screen, we can either:
Map the old pixels 1:1 onto the new screen. The physical image is smaller due to the increased density. People start to complain about how small the icons and text are.
Stretch the old image and fill in the extra details. The physical image is the same size, but now there are more pixels to represent the content. For example, this results in font curves being smoother and photographs showing more fine details.
The term DPI (Dots Per Inch) to refer to device or image resolution came into common use well before the invention of printers that could print multiple dots per pixel. I remember using it in the 1970's. The term PPI was invented later to accommodate the difference, but the old usage still lingers in places such as Windows which was developed in the 1980's.
The DPI assigned in Windows rarely corresponds to the actual PPI of the screen. It's merely a way to specify the intended scaling of elements such as fonts.
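Seen as a scaling factor, the Windows DPI levels from the question relate to the 96 DPI baseline like this. The sketch assumes 96 DPI means 100% scaling and 72 points per inch; the function names are invented for illustration.

```python
BASELINE_DPI = 96      # the Windows setting that corresponds to 100% scaling
POINTS_PER_INCH = 72


def scale_percent(dpi_setting):
    """UI scaling implied by a Windows DPI setting."""
    return dpi_setting * 100 / BASELINE_DPI


def points_to_pixels(points, dpi_setting):
    """Pixels used for a length given in points at a DPI setting."""
    return points * dpi_setting / POINTS_PER_INCH


for dpi in (96, 120, 144, 192):
    print(dpi, scale_percent(dpi), points_to_pixels(12, dpi))
# 96 100.0 16.0
# 120 125.0 20.0
# 144 150.0 24.0
# 192 200.0 32.0
```

A DPI-aware app does this kind of multiplication itself; an app that is not DPI-aware is laid out for 96 and then stretched by the system.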
DPI vs. resolution – What’s the difference?
The acronym dpi stands for dots per inch. Similarly, ppi stands for pixels per inch. So, why have two different acronyms for measuring roughly the same thing? Because there is a key difference between the two and if you don’t understand this difference it can have a negative impact on your digital signage project.
Part of the confusion between the two terms stems from the fact that many people who use them are lazy and tend to use the terms interchangeably. The simplest way of thinking about them is that one is digital (ppi) and represents what you see on the computer screen and the other is physical (dpi) for example, how an image appears when you print it out on a piece of paper.
I suggest you check this in-depth article on the technical details of this topic.

How to convert from alphabets to pixels

Do you know a program or script that converts from a letter to a matrix (consisting of 0 and 1) representing the letter?
For example, from the letter I to a matrix something like this (it's an LED panel showing the letter I):
Please let me know a way to create such a matrix other than typing it by hand.
The only solution is to use a font.
Well, for HW implementations I usually used the EGA/VGA 8x8 font
extracted from the gfx card BIOS; you can do it easily in an MS-DOS environment.
Another way is to extract the font programmatically from an image:
Draw the entire font to a bitmap (in a line or in a matrix ..., or use an already created one like mine). Use a fixed pitch and the font size most suitable for your needs, and do not forget that almost no modern font supports fixed pitch, so use OEM_CHARSET and the font named System. Set the colors properly (ideal is a black background and a white font), then read the image pixel by pixel and store it as a table of numbers. A pixel that is not the background color is a set pixel.
Do not compare against the font color, because of anti-aliasing and filters. Now read all characters and set/res the corresponding bit inside the font table. First compute the start x,y of the character in the image (from the ASCII code and the image organization), then do 2 nested 8-step x,y for loops (in the order given by your font[] organization)
and set/res the corresponding font[] bits at addresses 8*ASCII to 8*ASCII+7.
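A rough sketch of that extraction loop, written in Python for readability rather than as MCU code. It assumes the image is already loaded as a list of pixel rows, that the characters are laid out in a grid of 16 cells per row in ASCII order, and that each font byte holds one character row with the msb as the leftmost pixel. All names here are hypothetical.

```python
CELL = 8             # character cell is 8x8 pixels
CHARS_PER_ROW = 16   # assumed layout of the rendered font image


def extract_font(image, background=0, num_chars=256):
    """Build a font table: 8 bytes per character, one byte per pixel row."""
    font = [0] * (CELL * num_chars)
    for code in range(num_chars):
        x0 = (code % CHARS_PER_ROW) * CELL    # top-left corner of this cell
        y0 = (code // CHARS_PER_ROW) * CELL
        for y in range(CELL):
            row_bits = 0
            for x in range(CELL):
                row_bits <<= 1                # msb ends up as the leftmost pixel
                if image[y0 + y][x0 + x] != background:
                    row_bits |= 1             # anything not background is ink
            font[CELL * code + y] = row_bits
    return font
```

Comparing against the background colour, as recommended above, means anti-aliased edge pixels count as ink instead of being lost.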
I assume you use an MCU to control the LED panel.
The font organization in memory is usually such that an 8-bit number represents a single row of a character. Of course, if your LED panel is meant to display an animated scroll, then a column organization of the font (and of the HW implementation) will ease things a lot. If you have a 16-bit MCU and IO access, then you can use a 16-bit/pixel font size.
If you have more than 8 pixels and only an 8-bit MCU, you can still use 16-bit data, but the IO access will be in two steps via two IO ports instead of one. I strongly recommend whole data-wide IO access instead of setting/resetting individual IO lines; it's much quicker and can prevent flickering.
OK, here is my old 8x8 font I used back in the days ... I think this one was extracted from the EGA/VGA BIOS, but I am not sure ... it was too many years ago.
Now the fun part
const BYTE font[8*256]={ 0,0,0,0,0,0,0,0, ... }
Any character is represented as 8 numbers. If a bit is 0 it means paper (background pixel); if a bit is 1 it means ink (font pixel). There are several possible organizations (left to right, up to down, and their combinations).
OK, ( row-wise | left to right | up to down ) organization means:
the first number is the topmost row of the char
msb is the leftmost pixel
lsb is the rightmost pixel
so for example char '1' in 8x8 will be something like this (b means binary number):
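As a hedged illustration of this organization, here is a made-up 8x8 shape for '1' (it is not data taken from the font table above) together with a tiny helper that prints it the way the panel would show it.

```python
# Hypothetical glyph for '1': one byte per row, msb = leftmost pixel.
GLYPH_ONE = [
    0b00011000,
    0b00111000,
    0b00011000,
    0b00011000,
    0b00011000,
    0b00011000,
    0b01111110,
    0b00000000,
]


def render(rows, ink="#", paper="."):
    """Turn 8 row bytes into 8 strings, testing bits from msb to lsb."""
    return ["".join(ink if row & (0x80 >> x) else paper for x in range(8))
            for row in rows]


print("\n".join(render(GLYPH_ONE)))
# ...##...
# ..###...
# ...##...
# ...##...
# ...##...
# ...##...
# .######.
# ........
```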
When you have extracted all characters to the font table, save it as source code to a file which you will later include in your MCU code (it can be placed in EEPROM for pgm-code).
Now the algorithm to print a char on the LED panel is strongly dependent on your HW implementation,
so please post a circuit diagram of interconnection between LED panel and control system
specify target platform and language
specify desired functionality
I assume you want left moving scroll by pixel step
the best fit will be if your LED panel is driven by columns not rows
You can activate single column of LEDs by some data IO port (all bits can be active at a time) and selecting which one column is active is driven by another select IO port (only single bit can be active at a time). So in this case compute the start address of the column to display in font table:
address = (8*ASCII + x_offset)
send font[8*ASCII + x_offset] to data IO port
activate select IO port with the correct bit active
wait a while (1-10ms) ... so you can actually see the light; if the delay is too short there is no brightness, if the delay is too long there is flickering, so you need to experiment a little (it depends on the number of select bits).
deactivate select IO port
repeat with the next column
x_offset is the scrolling shift
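The column scan above could look roughly like this, written as a Python simulation rather than real MCU code. It assumes a column-organized font (each byte is one column of a character), and write_data and write_select are hypothetical stand-ins for the two IO port writes.

```python
import time


def text_columns(font, text):
    """Concatenate the 8 column bytes of every character in the text."""
    columns = []
    for ch in text:
        base = 8 * ord(ch)                 # start address of the character
        columns.extend(font[base:base + 8])
    return columns


def refresh_once(columns, x_offset, write_data, write_select,
                 panel_width=8, delay_s=0.002):
    """Light each panel column once, starting at scroll position x_offset."""
    for col in range(panel_width):
        write_data(columns[(x_offset + col) % len(columns)])  # all rows at once
        write_select(1 << col)             # exactly one select bit active
        time.sleep(delay_s)                # let the LEDs actually light up
        write_select(0)                    # deactivate before the next column
```

Calling refresh_once in a loop and increasing x_offset every few frames would give the left-moving scroll; the delay still has to be tuned by experiment as described above.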
If your HW implementation does not fit this scheme, don't worry:
just use bit SHIFT, AND, OR operations to create the data words in memory and then send them in a similar manner.
Hope it helps a little.
You could try to find a font that looks the way you want (probably a monospaced font such as Courier), draw/rasterize it with a certain size (8pt?), without anti-aliasing, and convert the resulting image to your matrix format.

Filling text outlines in Direct3D

I'm surprised that Google doesn't shed much light on this.
I'm creating a simple CAD viewer using Direct3D. Because of its nature (zoom functionality etc.), text elements must be vector text; I can't use textured polys.
I've called into gdi32.dll to get the glyphs and create quite reasonable text outlines from straight lines and bezier curves; however, the text isn't solid and the points aren't necessarily regular in any way. Enclosing characters (b, p, o, A, etc.) actually have more than one separate outline.
As a consequence, I can't just shoot the points into a vertex buffer and specify a primitive type.
All I can do at the moment is render the outlines as line strips, resulting in hollow text.
Can anyone suggest a good strategy for rendering solid vector text with their outlines?
Note that I interpolate the bezier curves into point lists (A lot of people use shaders/witchcraft).
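For reference, the interpolation step mentioned here can be sketched as below for a quadratic Bezier segment (the curve type used by TrueType outlines). This is an illustration in Python with a fixed segment count, not the poster's actual code, and flatten_quadratic is a made-up name.

```python
def flatten_quadratic(p0, p1, p2, segments=8):
    """Approximate a quadratic Bezier (p0, control p1, p2) by a point list."""
    points = []
    for i in range(segments + 1):
        t = i / segments
        a, b, c = (1 - t) ** 2, 2 * (1 - t) * t, t ** 2   # Bernstein weights
        points.append((a * p0[0] + b * p1[0] + c * p2[0],
                       a * p0[1] + b * p1[1] + c * p2[1]))
    return points


print(flatten_quadratic((0, 0), (1, 2), (2, 0), segments=4))
```

For a zooming viewer, the segment count would normally be chosen from the on-screen size of the curve instead of being fixed.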
You don't mention what version of DirectX you are using, but the utility function D3DXCreateText will create a 3D mesh for a given text in any TrueType font. If you want a 2D version, simply use no or minimal extrusion, and straight-on orthogonal projection.
If you need explicit outlines, you might be able to either (a) combine this approach with the Outline you already have, (b) draw the text twice at a slightly different scale (depending on current zoom level) or (c) use shaders to draw a pixel-perfect outline.
A screenshot of the exact look-and-feel you are after might help. My CAD drawings all have solid text, no outlines.
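If the outlines are filled by hand instead of through D3DXCreateText, the multiple-outline problem from the question (the holes in b, p, o, A) is commonly handled with an even-odd rule. This is a swapped-in technique that neither poster describes, shown here only as a Python sketch on already flattened contours; it is the same decision a scanline fill or a tessellator has to make before any triangles reach Direct3D.

```python
def inside_even_odd(point, contours):
    """Even-odd test: a point is ink if a ray to the right of it crosses
    the edges of all contours (outer shapes and holes) an odd number of times."""
    x, y = point
    inside = False
    for contour in contours:
        n = len(contour)
        for i in range(n):
            (x1, y1), (x2, y2) = contour[i], contour[(i + 1) % n]
            if (y1 > y) != (y2 > y):                       # edge spans the ray's y
                x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
                if x_cross > x:                            # crossing is to the right
                    inside = not inside
    return inside


# A square with a square hole, like a very blocky letter 'o'.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(3, 3), (7, 3), (7, 7), (3, 7)]
print(inside_even_odd((1, 1), [outer, hole]))   # True  (in the stroke)
print(inside_even_odd((5, 5), [outer, hole]))   # False (in the hole)
```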
I am creating text meshes with D3DXCreateText (Win32, DX9). They rotate nicely. However, they always seem to be the same size regardless of the height of the font that has been selected in the DC.
The mesh lines in smaller characters are aliased and don't look good on video cards without multisampling.
