I have been scouring the internet for weeks trying to figure out exactly how text (such as what you are reading right now) is displayed to the screen.
Results have been shockingly sparse.
I've come across the concepts of rasterization, bitmaps, vector graphics, etc. What I don't understand is how the underlying implementation works so uniformly across all systems (Windows, Linux, etc.) in a way we as humans understand. Is there a specification defined somewhere? Is the implementation code open source and viewable by the general public?
My understanding as of right now, is this:
Create a font with an external drawing program, one character at a time
Add these characters into a font file that is understood by language-specific libraries
These characters are then read from the file as needed by the GPU and displayed to the screen in a linear fashion as defined by the parenting code.
Additionally, if characters are defined in a font file such as 'F, C, ..., Z', how are vector graphics (which rely on a set of coordinate points) supported? Without coordinate points, rasterization would seem the only option for size changes.
This is about as far as my assumptions/research goes.
If you are familiar with this topic and can provide a detailed answer that may be useful to myself and other readers, please answer at your discretion. I find it fascinating just how much code we take for granted that is remarkably complicated under the hood.
The following provides an overview (leaving out lots of gory details):
Two key components for display of text on the Internet are (i) character encoding and (ii) fonts.
Character encoding is a scheme by which characters, such as the Latin capital letter "A", are assigned a representation as a byte sequence. Many different character encoding schemes have been devised in the past. Today, what is almost ubiquitously used on the Internet is Unicode. Unicode assigns each character to a code point, which is an integer value; e.g., Unicode assigns LATIN CAPITAL LETTER A to the code point 65, or 41 in hexadecimal. By convention, Unicode code points are referred to using four to six hex digits with "U+" as a prefix. So, LATIN CAPITAL LETTER A is assigned to U+0041.
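As a small illustration of the code point idea, Python's built-in `ord` exposes code points directly. The helper name `u_plus` is made up for this sketch, and UTF-8 is used only as one example of a byte-level encoding:

```python
def u_plus(ch: str) -> str:
    """Format one character's code point in the conventional U+XXXX form."""
    return f"U+{ord(ch):04X}"

print(ord("A"))                   # 65
print(u_plus("A"))                # U+0041
print("A".encode("utf-8"))        # b'A'
print("\u00e9".encode("utf-8"))   # b'\xc3\xa9' (e with acute accent, two bytes in UTF-8)
```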
Fonts provide the graphical data used to display text. There have been various font formats created over the years. Today, what is ubiquitously used on the Internet are fonts that follow the OpenType spec (which is an extension of the TrueType font format created back around 1991).
What you see presented on the screen are glyphs. An OpenType font contains data for the glyphs, and also a table that maps Unicode code points to corresponding glyphs. More precisely, the character-to-glyph mapping (or 'cmap') table maps Unicode code points to glyph IDs. The code points are defined by Unicode; the glyph IDs are a font-internal implementation detail, and are used to look up the glyph descriptions and related data in other tables.
Glyphs in an OpenType font can be defined as bitmaps, or (far more common) as vector outlines (Bezier curves). There is an assumed coordinate grid for the glyph descriptions. The vector outlines, then, are defined as an ordered list of coordinate pairs for Bezier curve control points. When text is displayed, the vector outline is scaled onto a display grid, based on the requested text size (e.g., 10 point) and pixel sizing on the display. A rasterizer reads the control point data in the font, scales as required for the display grid, and generates a bitmap that is put onto the screen at an appropriate position.
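To make the scaling step concrete, here is a minimal sketch that scales Bezier control points from the design grid to a pixel grid and evaluates a point on the curve. It assumes a quadratic Bezier segment (the kind used by TrueType outlines); the control point values and function names are invented for illustration, and a real rasterizer does far more (filling, hinting, and so on).

```python
def scale_point(p, units_per_em, ppem):
    """Scale a design-grid point to pixel coordinates for a given pixels-per-em."""
    return (p[0] * ppem / units_per_em, p[1] * ppem / units_per_em)

def quad_bezier(p0, p1, p2, t):
    """Evaluate a quadratic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1]
    return (x, y)

# Hypothetical control points on a 1000 units-per-em design grid.
design_points = [(0, 0), (500, 1000), (1000, 0)]

# Scale to 16 pixels per em, then sample the midpoint of the curve.
pixel_points = [scale_point(p, 1000, 16) for p in design_points]
print(pixel_points)                     # [(0.0, 0.0), (8.0, 16.0), (16.0, 0.0)]
print(quad_bezier(*pixel_points, 0.5))  # (8.0, 8.0)
```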
One extra detail about displaying the rasterized bitmap: most operating systems or apps will use some kind of filtering to give glyphs a smoother and more legible appearance. For example, a grayscale anti-alias filter will set display pixels at the edges of glyphs to a gray level, rather than pure black or pure white, to make edges appear smoother when the scaled outline doesn't align exactly to the physical pixel boundaries—which is most of the time.
I mentioned "at an appropriate position". The font has metric (positioning) information for the font as a whole and for each glyph.
The font-wide metrics will include a recommended line-to-line distance for lines of text, and the placement of the baseline within each line. These metrics are expressed in the units of the font's glyph design grid; the baseline corresponds to y=0 within the grid. To start a line, the (0,0) design grid position is aligned to where the baseline meets the edge of a text container within the page layout, and the first glyph is positioned.
The font also has glyph metrics. One of the glyph metrics is an advance width for each given glyph. So, when the app is drawing a line of text, it has a starting "pen position" at the start of the line, as described above. It then places the first glyph on the line accordingly, and advances the pen position by the amount of that first glyph's advance width. It then places the second glyph using the new pen position, and advances again. And so on as glyphs are placed along the line.
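The pen-position loop can be sketched in a few lines. The advance widths below are hypothetical values in design units, not taken from any real font, and `layout_line` is just an illustrative name:

```python
# Hypothetical advance widths in font design units.
ADVANCE_WIDTHS = {"H": 700, "i": 250, "!": 300}

def layout_line(text, advances, start_x=0):
    """Place each glyph at the current pen position, then advance the pen."""
    pen_x = start_x
    placed = []
    for ch in text:
        placed.append((ch, pen_x))
        pen_x += advances[ch]
    return placed, pen_x

placed, end_x = layout_line("Hi!", ADVANCE_WIDTHS)
print(placed)  # [('H', 0), ('i', 700), ('!', 950)]
print(end_x)   # 1250
```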
There are (naturally) more complexities in laying out lines of text. What I described above is sufficient for English text displayed in a basic text editor. More generally, display of a line of text can involve substitution of the default glyphs with certain alternate glyphs; this is needed, for example, when displaying Arabic text so that characters appear cursively connected. OpenType fonts contain a "glyph substitution" (or 'GSUB') table that provides details for glyph substitution actions. In addition, the positioning of glyphs can be adjusted for various reasons; for example, to position a diacritic glyph correctly over a letter. OpenType fonts contain a "glyph positioning" ('GPOS') table that provides the position adjustment data. Operating system platforms and browsers today support all of this functionality so that Unicode-encoded text for many different languages can be displayed using OpenType fonts.
Addendum on glyph scaling:
Within the font, a grid is set up with a certain number of units per em. This is set by the font designer. For example, the designer might specify 1000 units per em, or 2048 units per em. The glyphs in the font and all the metric values (glyph advance width, default line-to-line distance, etc.) are set in font design grid units.
How does the em relate to what content authors set? In a word processing app, you typically set text size in points. In the printing world, a point is a well defined unit for length, approximately but not quite 1/72 of an inch. In digital typography, points are defined as exactly 1/72 of an inch. Now, in a word processor, when you set text size to, say, 12 points, that really means 12 points per em.
So, for example, suppose a font is designed using 1000 design units per em. And suppose a particular glyph is exactly 1 em wide (e.g., an em dash); in terms of the design grid units, it will be exactly 1000 units wide. Now, suppose the text size is set to 36 points. That means 36 points per em, and 36 points = 1/2", so the glyph will print exactly 1/2" wide.
When the text is rasterized, it's done for a specific target device, that has a certain pixel density. A desktop display might have a pixel (or dot) density of 96 dpi; a printer might have a pixel density of 1200 dpi. Those are relative to inches, but from inches you can get to points, and for a given text size, you can get to ems. You end up with a certain number of pixels per em based on the device and the text size. So, the rasterizer takes the glyph outline defined in font design units per em, and scales it up or down for the given number of pixels per em.
For example, suppose a font is designed using 1000 units per em, and a printer is 1000 dpi. If text is set to 72 points, that's 1" per em, and the font design units will exactly match the printer dots. If the text is set to 12 points, then the rasterizer will scale down so that there are 6 font design units per printer dot.
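The arithmetic in these examples can be written out as a short sketch. It assumes digital points of exactly 1/72 inch, as described above; the function names are made up for illustration.

```python
def pixels_per_em(point_size, dpi):
    """Points are 1/72 inch, so pixels per em = point size * dpi / 72."""
    return point_size * dpi / 72.0

def design_units_per_pixel(units_per_em, point_size, dpi):
    """How many font design units fall on one device pixel (or printer dot)."""
    return units_per_em / pixels_per_em(point_size, dpi)

print(pixels_per_em(72, 1000))  # 1000.0 (one design unit per dot at 1000 units per em)
print(pixels_per_em(12, 96))    # 16.0 (12 pt text on a 96 dpi display)
print(round(design_units_per_pixel(1000, 12, 1000), 6))  # 6.0
```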
At that point, the details in the glyph outline might not align to whole units in the device grid. The rasterizer needs to decide which pixels/dots get ink and which do not. The font can include "hints" that affect the rasterizer behaviour. The hints might ensure that certain font details stay aligned, or the hints might be instructions to move a Bezier control point by a certain amount based on the current pixels-per-em.
For more details, see Digitizing Letterform Designs and Font Engine from Apple's TrueType Reference Manual, which goes into lots of detail.
I have been reading about canvas and SVG in HTML5. When it comes to the difference, it is mentioned that canvas is pixel based and SVG is vector based. I don't understand what is meant by this.
Thanks in advance
There are 2 ways to store an image on your computer:
Stored as pixels: the image is stored as a table of pixels, with a color stored in every cell of the table. Such images have a defined size (1 screen pixel for 1 table cell). If you want to reduce the size, an algorithm blends pixels together to render a smaller image. If you display the image larger than it is, you will see the individual pixels or the image will become blurred.
Stored as vectors: this kind of image has no inherent size. The file format stores shapes geometrically (positions, directions, and scale). When you want to display it, you specify a size and the computer processes the image to render it at that size. If you zoom in on the image (a line, for example), you will never see pixels: every time you zoom, the image is reprocessed, and a line stays a line.
How can I increase the perPixelTargetFind area in Fabric.js? On an iPad it is too hard to target a rect with a strokeWidth of 6-10. Increasing the target area by about 10 invisible pixels would be nice.
Well, you have to increase the padding of the object too.
targetFindTolerance only works when you are inside the object's natural bounding box.
Outside the bounding box, perPixelTargetFind will not work.
So adding some tolerance gives you more target area on the inside of the rectangle, but not on the outside.
Increasing the padding gives you a bigger bounding box, so targetFindTolerance can work both inside and outside the stroke.
Use the "targetFindTolerance" option. Read the docs before asking on SO =)
I have an image sequence (video). I would like to count the number of objects in the image sequence. The main objective is to count each object once, not in each and every frame, since an object may exist for several frames. My idea is to count the objects as they exit the screen, because there are fewer occlusions there. I am thinking of doing this by scanning the bottom part of the image for non-zero pixels.
I have a CV_FILLED binary image (from the rectangle function) that I want to scan, creating an instance of an object whenever an object is found. The scan will not cover each and every pixel along the horizontal line, just certain sections.
For example, we could scan over ranges of columns, then skip ahead by a margin.
A sample binary image is attached. This is an image obtained from the feed. I do not want to count only the objects in this image, but also those that are still coming.
A full picture of the detected objects is attached here. Your guidance or constructive criticism is welcome.
* I do not want to use CVBlob
If you don't want to use cvBlobLib, you could use the contour detection that is part of OpenCV.
There is a tutorial on the website.
The doc for the method is here. Your image seems pretty simple, but if you get blobs with occlusions and the like, you may want to look at the CV_RETR_EXTERNAL constant to get only the outer contours.
That is what I usually use, even though it takes a bit more work to use the results of the method.
Hope this helps.
If the squares do not overlap at the bottom, I suggest the following:
scan the very bottom row of the image and identify those connected pixels which are white. Each white line will correspond to one square. Save the center of the white line segment and its length. In the next frame, do the same and associate the corresponding line segments to the previous (same length and center very close). When you cannot find a corresponding line segment anymore, the square has moved out of the image which means you can increase your squares counter by one. Note that line segments at the right and left ends of the line will have decreasing length with every frame.
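A rough sketch of this idea in plain Python follows. It works on a list of bottom rows (one per frame) rather than on real OpenCV images, and it simplifies the matching to "center within `max_shift` pixels", ignoring the length check. The function names and the `max_shift` parameter are assumptions made for illustration. An object still touching the bottom row in the last frame is not counted.

```python
def white_segments(row):
    """Return (center, length) for each run of non-zero pixels in a row."""
    segments = []
    start = None
    for x, value in enumerate(row):
        if value and start is None:
            start = x                      # a white run begins
        elif not value and start is not None:
            segments.append(((start + x - 1) / 2.0, x - start))
            start = None                   # the run ended at x - 1
    if start is not None:                  # run touches the right edge
        segments.append(((start + len(row) - 1) / 2.0, len(row) - start))
    return segments

def count_exits(bottom_rows, max_shift=2.0):
    """Count segments seen in one frame that have no match in the next frame."""
    count = 0
    previous = []
    for row in bottom_rows:
        current = white_segments(row)
        for center, _length in previous:
            if not any(abs(center - c) <= max_shift for c, _ in current):
                count += 1                 # segment disappeared: object left
        previous = current
    return count

rows = [
    [0, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(count_exits(rows))  # 2
```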
Thx guys. I managed to solve this already. I used small ROIs along the paths of the squares, and called countNonZero() on each ROI.
I kept checking with boolean variables to see whether each ROI still had white pixels. If not, I incremented the counter. It worked well, and I was able to count.
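For other readers, that approach might look roughly like the sketch below. This is not the original code: it uses plain Python lists in place of OpenCV images, `count_nonzero` is a stand-in for calling countNonZero() on an ROI, and the ROI layout is invented.

```python
def count_nonzero(frame, roi):
    """Stand-in for countNonZero on a rectangular ROI given as (x, y, w, h)."""
    x, y, w, h = roi
    return sum(1 for row in frame[y:y + h] for v in row[x:x + w] if v)

def count_objects(frames, rois):
    """Increment the counter each time an ROI goes from occupied to empty."""
    occupied = [False] * len(rois)   # one boolean flag per ROI
    total = 0
    for frame in frames:
        for i, roi in enumerate(rois):
            has_white = count_nonzero(frame, roi) > 0
            if occupied[i] and not has_white:
                total += 1           # the object just left this ROI
            occupied[i] = has_white
    return total

# Two hypothetical 2x2 ROIs side by side in a 4x2 frame.
rois = [(0, 0, 2, 2), (2, 0, 2, 2)]
frames = [
    [[1, 1, 0, 0], [1, 1, 0, 0]],
    [[0, 0, 1, 0], [0, 0, 0, 0]],
    [[0, 0, 0, 0], [0, 0, 0, 0]],
]
print(count_objects(frames, rois))  # 2
```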
Thx for your input...
I'm looking for an idea for getting the most representative color in a grid of pixels. Is there an algorithm for this? I'm not sure whether the most representative color is one of the colors appearing in the grid, or whether the average of all the pixels is better.
alt text http://www.stan.mx/images/stackoverflowPixels.gif
Have a look at some color quantization algorithms. I found them to be the most effective method to generate palettes from photographs. Also, most image manipulation/processing libraries should have some fast quantization built in.
You are probably looking for the "average" as perceived by a human. First you need to convert your colors to a color space that is specially designed to be "perceptually uniform" (for calculating color "distances"), such as L*a*b*.
Then, each color is a point in 3D color space. Now you can find the "center" of the cloud of points and this is the "most representative color".
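A minimal sketch of that idea, assuming 8-bit sRGB input and the D65 white point for the conversion (the standard formulas, written out by hand here rather than taken from a library). The function names are made up. It returns both the centroid in L*a*b* and the input pixel nearest to it, in case you want the result to be a color that actually appears in the grid.

```python
def srgb_to_lab(rgb):
    """Convert an 8-bit sRGB triple to CIE L*a*b* (D65 white point)."""
    def to_linear(v):
        c = v / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (to_linear(v) for v in rgb)
    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3 * delta ** 2) + 4.0 / 29.0

    # Normalize by the D65 reference white before applying f.
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz))

def representative_color(pixels):
    """Return (Lab centroid, input pixel closest to that centroid)."""
    labs = [srgb_to_lab(p) for p in pixels]
    n = len(labs)
    center = tuple(sum(lab[i] for lab in labs) / n for i in range(3))

    def dist2(lab):
        return sum((lab[i] - center[i]) ** 2 for i in range(3))

    nearest = min(zip(pixels, labs), key=lambda pair: dist2(pair[1]))[0]
    return center, nearest

center, nearest = representative_color([(255, 0, 0), (255, 0, 0), (0, 0, 255)])
print(nearest)  # (255, 0, 0)
```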