Only writing visible points of an overplotted scatterplot to disk - python-3.x

I am creating matplotlib scatterplots of around 10000 points. At the point size I am using, this results in overplotting, i.e. some of the points will be hidden by the points that are plotted over them.
While I don't mind that I cannot see the hidden points, they are still written out redundantly when I save the figure to disk as PDF (or another vector format), resulting in a large file.
Is there a way to create a vector image where only the visible points would be written to the file? This would be similar to the concept of "flattening" / merging layers in photo editing software. (I would still like to retain the image as a vector format, so that I keep the ability to zoom in.)
Example plot:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(15)  # note: random.seed() would not seed numpy's RNG
df = pd.DataFrame({'x': np.random.normal(10, 1.2, 10000),
                   'y': np.random.normal(10, 1.2, 10000),
                   'color': np.random.normal(10, 1.2, 10000)})
df.plot(kind="scatter", x="x", y="y", c="color", s=80, cmap="RdBu_r")
plt.show()

tl;dr
I don't know of any simple solution such as
RemoveOccludedCircles(C)
The algorithm below requires some implementation, but it shouldn't be too bad.
Problem reformulation
While we could try to remove existing circles when adding new ones, I find it easier to think about the problem the other way round, processing all circles in reverse order and pretending to draw each new circle behind the existing ones.
The main problem then becomes: How can I efficiently determine whether one circle would be completely hidden by another set of circles?
Conditions
In the following, I will describe an algorithm for the case where the circles are sorted by size, such that larger circles are placed behind smaller circles. This includes the special case where all circles have the same size. An extension to the general case would actually be significantly more complicated, as one would have to maintain a triangulation of the intersection points. In addition, I will assume that no two circles have exactly the same properties (radius and position); such identical circles could easily be filtered out beforehand.
Datastructures
C: A set of visible circles
P: A set of control points
Control points will be placed in such a way that no newly placed circle can become visible unless either its center lies outside the existing circles or at least one control point falls inside the new circle.
Problem visualisation
In order to better understand the role of the control points, their maintenance and the algorithm, have a look at the following drawing:
Processing 6 circles
In the linked image, active control points are painted in red. Control points that are removed after each step are painted in green or blue, where blue points were created by computing intersections between circles.
In image g), the green area highlights the region in which the center of a circle of same size could be placed such that the corresponding circle would be occluded by the existing circles. This area was derived by placing circles on each control point and subtracting the resulting area from the area covered by all visible circles.
Control point maintenance
Whenever we add a circle to the canvas, we add four active control points, placed equidistantly on the border of the circle. Why four? Because no circle of the same or bigger size can be placed with its center inside the current circle without containing at least one of the four control points.
After placing one circle, the following invariant holds: a new circle is completely hidden by the existing circles if
Its center falls into a visible circle, and
No control point lies strictly inside the new circle.
In order to maintain this assumption while adding new circles, the set of control points needs to be updated after each addition of a visible circle:
Add 4 new control points for the new circle, as described before.
Add new control points at each intersection of the new circle with existing visible circles.
Remove all control points that lie strictly inside any visible circle.
This rule will maintain control points at the outer border of the visible circles in such a dense way that no new visible circle intersecting the existing circles can be placed without 'eating' at least one control point.
Pseudo-Code
AllCircles <- All circles, sorted from front to back
C <- {} // the set of visible circles
P <- {} // the set of control points
for X in AllCircles {
  if (Inside(center(X), C) AND Outside(P, X)) {
    // ignore circle, it is occluded!
  } else {
    C <- C + X
    P <- P + CreateFourControlPoints(X)
    P <- P + AllCuttingPoints(X, C)
    RemoveHiddenControlPoints(P, C)
  }
}
DrawCirclesInReverseOrder(C)
The functions 'Inside' and 'Outside' are a bit abstract here: 'Inside' returns true if a point is contained in one or more circles from a set of circles, and 'Outside' returns true if all points from a set of points lie outside of a circle. But none of the functions used should be hard to write out.
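To make that concrete, here is a minimal Python sketch of the pseudo-code for the equal-radius case, using plain floating-point tolerances instead of the exact arithmetic discussed below; all helper names are my own invention:
import math

EPS = 1e-9  # crude tolerance; see the note on exact arithmetic below

def strictly_inside(p, c, r):
    # point p strictly inside the circle with center c and radius r
    return math.hypot(p[0] - c[0], p[1] - c[1]) < r - EPS

def center_inside_any(p, centers, r):
    # 'Inside': true if p lies (non-strictly) inside any of the given circles
    return any(math.hypot(p[0] - c[0], p[1] - c[1]) <= r + EPS for c in centers)

def four_control_points(c, r):
    return [(c[0] + r, c[1]), (c[0] - r, c[1]), (c[0], c[1] + r), (c[0], c[1] - r)]

def cutting_points(c1, c2, r):
    # intersection points of two circles with equal radius r
    d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
    if d == 0 or d > 2 * r:
        return []
    mx, my = (c1[0] + c2[0]) / 2, (c1[1] + c2[1]) / 2
    h = math.sqrt(max(r * r - (d / 2) ** 2, 0.0))
    ux, uy = (c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d
    return [(mx - uy * h, my + ux * h), (mx + uy * h, my - ux * h)]

def visible_circles(all_circles, r):
    # all_circles: centers sorted from front to back; returns those that must be drawn
    C, P = [], []
    for X in all_circles:
        occluded = center_inside_any(X, C, r) and not any(strictly_inside(p, X, r) for p in P)
        if occluded:
            continue
        P.extend(four_control_points(X, r))
        for Y in C:
            P.extend(cutting_points(X, Y, r))
        C.append(X)
        P = [p for p in P if not any(strictly_inside(p, Y, r) for Y in C)]
    return list(reversed(C))  # draw these back to front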
Minor problems to be solved
How to determine in a numerically stable way whether a point is strictly inside a circle? -> This shouldn't be too hard to solve, as the control point coordinates are never more complicated than the solution of a quadratic equation. It is important, though, not to rely solely on floating-point representations, as these would be numerically insufficient and some control points would probably get lost, effectively leaving holes in the final plot. So keep a symbolic and precise representation of the control point coordinates. I would try SymPy to tackle this problem, as it seems to cover all the required math. The formula for intersecting circles can easily be found online, for example here.
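As a tiny illustration of the symbolic approach, the strict inside test can be decided exactly with SymPy as long as the coordinates are kept as exact expressions (rationals and square roots) rather than floats:
import sympy as sp

def strictly_inside_exact(px, py, cx, cy, r):
    # all arguments are exact SymPy expressions, so the comparison is decided
    # symbolically rather than with floating point
    return sp.simplify((px - cx)**2 + (py - cy)**2 - r**2) < 0

# the point (1/2, sqrt(3)/2) lies exactly on the unit circle, so it is not strictly inside
print(strictly_inside_exact(sp.Rational(1, 2), sp.sqrt(3) / 2, 0, 0, 1))  # False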
How to efficiently determine whether a circle contains any control point, or whether any visible circle contains the center of a new circle? -> In order to solve this, I would propose keeping all elements of P and C in grid-like structures, where the width and height of each grid cell equal the radius of the circles. On average, the number of active points and visible circles per grid cell should be in O(1), although it is possible to construct artificial setups with arbitrary numbers of elements per grid cell, which would turn the overall algorithm from O(N) into O(N * N).
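A rough sketch of such a grid in Python, with a cell size equal to the circle radius (names are mine); for any query point, only its own cell and the eight neighbouring cells ever need to be inspected:
from collections import defaultdict

def cell_of(p, cell_size):
    return (int(p[0] // cell_size), int(p[1] // cell_size))

def grid_insert(grid, p, cell_size):
    # grid is a defaultdict(list) mapping (i, j) cell indices to stored points/centers
    grid[cell_of(p, cell_size)].append(p)

def grid_neighbourhood(grid, p, cell_size):
    # all stored items whose cell touches the cell of p; with cell_size == r this is
    # enough to find anything within distance r of p
    i, j = cell_of(p, cell_size)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            yield from grid[(i + di, j + dj)]

points_grid = defaultdict(list)
grid_insert(points_grid, (1.3, 2.7), cell_size=1.0)
nearby = list(grid_neighbourhood(points_grid, (1.0, 2.0), cell_size=1.0))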
Runtime thoughts
As mentioned above, I would expect the runtime to scale linearly with the number of circles on average, because the number of visible circles and control points in each grid cell will be in O(1) unless the input is constructed in an evil way.
The data structures should be easily maintainable in memory if the circle radius isn't excessively small, and computing intersections between circles should also be quite fast. I'm curious about the final computation time, but I don't expect it to be much worse than drawing all circles naively a single time.

My best guess would be to use hexbin. Note that with a scatter plot, the dots that are plotted last are the only ones visible where points overlap. With a hexbin, all coinciding dots are averaged.
If interested, the centers of the hexagons can be used to again create a scatter plot, now showing only a minimal number of points.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
np.random.seed(15)
df = pd.DataFrame({'x': np.random.normal(10, 1.2, 10000),
                   'y': np.random.normal(10, 1.2, 10000),
                   'color': np.random.normal(10, 1.2, 10000)})
fig, ax = plt.subplots(ncols=4, gridspec_kw={'width_ratios': [10,10,10,1]})
norm = plt.Normalize(df.color.min(), df.color.max())
df.plot(kind="scatter", x="x", y="y", c="color", s=10, cmap="RdBu_r", norm=norm, colorbar=False, ax=ax[0])
hexb = ax[1].hexbin(df.x, df.y, df.color, cmap="RdBu_r", norm=norm, gridsize=80)
centers = hexb.get_offsets()
values = hexb.get_array()
ax[2].scatter(centers[:,0], centers[:,1], c=values, s=10, cmap="RdBu_r", norm=norm)
plt.colorbar(hexb, cax=ax[3])
plt.show()
Here is another comparison. The number of dots is reduced by a factor of 10, and the plot is more "honest", as overlapping dots are averaged.

Related

Networkx (or Graphviz) layout with fixed y positions

Are there any layout algorithms in networkx (or that I can call in Graphviz) that allow me to fix the Y-position of nodes in a DAG to a potentially different floating point value for each node, but spread out the X positions in some reasonable way (ideally attempting to minimise edge lengths or crossovers, although I suspect this might not be possible)? I can only find layouts that require nodes to be on discrete layers.
Added: Below is an example of the sort of graph topology I have, plotted using nx.kamada_kawai_layout. The thing is that these nodes have a "time" value (not shown here), which I want to plot on the Y axis. The edges are directed in time, so that a parent node (e.g. 54 here) is always older than its children (here 52 and 53). So I want to lay this out with the Y position given by the node "time", and the X position such that crossings are minimised, in as much as that's possible (I know this is NP-hard in general, but the layout below is actually doing a pretty good job).
p.s. usually all the leaf nodes, e.g. 2, 3, 7 here, are at time 0, so should be laid out at the bottom of the final layout.
p.p.s. Essentially what I would like to do is to imagine this as a spring diagram, "pick up" the root node (54) in the plot above and place it at the top of the page, with the topology dangling down, then adjust the Y-position of the children to their internal "time" values.
Edit 2. Thanks to @sroush below, I can get a decent layout with the dot graphviz engine:
A = nx.nx_agraph.to_agraph(G)
fig = plt.figure(1, figsize=(10, 10))
A.add_subgraph(ts.samples(), level="same", name="cluster")
A.layout(prog="dot")
pos = {n: [float(x) for x in A.get_node(n).attr["pos"].split(",")] for n in G.nodes()}
nx.draw_networkx(G, pos, with_labels=True)
But I then want to reposition the nodes slightly so instead of ranked times (the numbers) they use their actual, floating point times. Like this:
true_times = nx.get_node_attributes(G, 'time')
reposition = {node_id: np.array([pos[node_id][0], true_times[node_id]]) for node_id in true_times}
nx.draw_networkx(G, reposition, with_labels=True)
As you can see, that squashes the nodes together rather a lot. Is there any way to spread the horizontal positions of those nodes out so they don't bump into one another? I could perhaps cluster some onto the same layer and iterate, but that seems quite expensive.
The Graphviz dot engine can get you pretty close. This is usually described as a "timeline" issue. Here is a graph that is part of the Graphviz source that seems to do what you want: https://www.flickr.com/photos/kentbye/1155560169

How to tell if an xyY color lies within the CIE 1931 gamut?

I'm trying to plot the CIE 1931 color gamut using math.
I take a xyY color with Y fixed to 1.0 then vary x and y from 0.0 to 1.0.
If I plot the resulting colors as an image (ie. the pixel at (x,y) is my xyY color converted to RGB) I get a pretty picture with the CIE 1931 color gamut somewhere in the middle of it, like this:
xyY from 0.0 to 1.0:
Now I want the classic tongue-shaped image so my question is: How do I cull pixels outside the range of the CIE 1931 color gamut?
ie. How can I tell if my xyY color is inside/outside the CIE 1931 color range?
I happened upon this question while searching for a slightly different but related issue, and what immediately caught my eye is the rendering at the top. It's identical to the rendering I had produced a few hours earlier, and trying to figure out why it didn't make sense is, in part, what led me here.
For readers: the rendering is what results when you convert from {x ∈ [0, 1], y ∈ [0, 1], Y = 1} to XYZ, convert that color to sRGB, and then clamp the individual components to [0, 1].
At first glance, it looks OK. At second glance, it looks off... it seems less saturated than expected, and there are visible transition lines at odd angles. Upon closer inspection, it becomes clear that the primaries aren't smoothly transitioning into each other. Much of the range, for example, between red and blue is just magenta—both R and B are 100% for almost the entire distance between them. When you then add a check to skip drawing any colors that have an out-of-range component, instead of clamping, everything disappears. It's all out-of-gamut. So what's going on?
I think I've got this one small part of colorimetry at least 80% figured out, so I'm setting this out, greatly simplified, for the edification of anyone else who might find it interesting or useful. I also try to answer the question.
(⚠️ Before I begin, an important note: valid RGB display colors in the xyY space can be outside the boundary of the CIE 1931 2° Standard Observer. This isn't the case for sRGB, but it is the case for Display P3, Rec. 2020, CIE RGB, and other wide gamuts. This is because the three primaries need to add up to the white point all by themselves, and so even monochromatic primaries must be incredibly, unnaturally luminous compared to the same wavelength under equivalent illumination.)
Coloring the chromaticity diagram
The xy chromaticity diagram isn't just a slice through xyY space. It's intrinsically two dimensional. A point in the xy plane represents chromaticity apart from luminance, so to the extent that a color is shown at a point, it is there to represent, as best as possible, only the chromaticity, not any specific color. Normally the colors shown are the brightest, most saturated colors for that chromaticity, or whatever's closest in the display's color space, but that's an arbitrary design decision.
Which is to say: to the extent that there are illustrative colors drawn they're necessarily fictitious, in much the same way that coloring an electoral map is purely a matter of data visualization: a convenience to aid comprehension. It's just that, in this case, we're using colors to visualize one aspect of colorimetry, so it's super easy to conflate the two things.
(Image credit: Michael Horvath)
The falsity, and necessity thereof, of the colors becomes obvious when we consider the full 3D shape of the visible spectrum in the xyY space. The classic spectral locus ("horse shoe") can easily be seen to be the base of a quasi-Gibraltian volume, widest at the spectral locus and narrowing to a summit (the white point) at {Y = 1}. If viewed as a top-down projection, then colors located on and near the spectral locus would be very dark (although still the brightest possible color for that chromaticity), and would grow increasingly luminous towards the center. If viewed as a slice of the xyY volume, through a particular value of Y, the colors would be equally luminous but would grow brighter overall, and the shape of the boundary would shrink, again unevenly, with increasing Y, until it disappeared entirely. So far as I can tell, neither of these possibilities sees much, if any, practical use, interesting though they may be.
Instead, the diagram is colored inside out: the gamut being plotted is colored with maximum intensities (each primary at its brightest, and then linear mixtures in the interior) and out-of-gamut colors are projected from the inner gamut triangle to the spectral locus. This is annoying because you can't simply use a matrix transformation to turn a point on the xy plane into a sensible color, but in terms of actually communicating useful and somewhat accurate information it seems, unfortunately, to be unavoidable.
(To clarify: it is actually possible to move a single chromaticity point into the sRGB space, and color the chromaticity diagram pixel-by-pixel with the most brightly saturated sRGB colors possible—it's just more complicated than a simple matrix transformation. To do so, first move the three-coordinate xyz chromaticity into sRGB. Then clamp any negative values to 0. Finally, scale the components uniformly such that the maximum component value is 1. Be aware this can be much slower than plotting the whitepoint and the primaries and then interpolating between them, depending on your rendering method and the efficiency of your data representations and their operations.)
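As an illustration of that recipe (my own sketch, not from the original post), assuming the standard XYZ-to-linear-sRGB matrix for a D65 white point and the usual sRGB transfer function:
import numpy as np

# standard XYZ (D65) -> linear sRGB matrix
XYZ_TO_SRGB = np.array([[ 3.2406, -1.5372, -0.4986],
                        [-0.9689,  1.8758,  0.0415],
                        [ 0.0557, -0.2040,  1.0570]])

def chromaticity_to_display_rgb(x, y):
    # brightest sRGB color representing the chromaticity (x, y)
    xyz = np.array([x, y, 1.0 - x - y])   # the three-coordinate chromaticity
    rgb = XYZ_TO_SRGB @ xyz               # move it into linear sRGB
    rgb = np.clip(rgb, 0.0, None)         # clamp negative (out-of-gamut) components to 0
    if rgb.max() > 0:
        rgb = rgb / rgb.max()             # scale so the largest component is exactly 1
    # encode with the usual sRGB transfer function for display
    return np.where(rgb <= 0.0031308, 12.92 * rgb, 1.055 * rgb ** (1 / 2.4) - 0.055)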
Drawing the spectral locus
The most straightforward way to get the characteristic horseshoe shape is just to use a table of the empirical data.
(http://cvrl.ioo.ucl.ac.uk/index.htm, scroll down for the "historical" datasets that will most closely match other sources intended for the layperson. Their too-clever icon scheme for selecting data is that a dotted-line icon is for data sampled at 5nm, and a solid-line icon is for data sampled at 1nm.)
Construct a path with the points as vertices (you might want to trim some off the top; I cut it back to 700nm, the CIE RGB red primary), and use the resulting shape as a mask. With 1nm samples, a polyline should be smooth enough for nearly any resolution: there's no need for fitting bezier curves or whatnot.
(Note: only every 5th point shown for illustrative purposes.)
If all we want to do is draw the standard horse shoe bounded by the triangle {x = 0, y = 0}, {0, 1}, and {1, 0} then that should suffice. Note that we can save rendering time by skipping any coordinates where x + y >= 1. If we want to do more complex things, like plot the changing boundary for different Y values, then we're talking about the color matching functions that define the XYZ space.
Color matching functions
(Image credit: User:Acdx - Own work, CC BY-SA 4.0)
The ground truth for the XYZ space is in the form of three functions that map spectral power distributions to {X, Y, Z} tristimulus values. A lot of data and calculations went into constructing the XYZ space, but it all gets baked into these three functions, which uniquely determine the {X, Y, Z} values for a given spectrum of light. In effect, what the functions do is define 3 imaginary primary colors, which can't be created with any actual light spectrum, but can be mixed together to create perceptible colors. Because they can be mixed, every non-negative point in the XYZ space is meaningful mathematically, but not every point corresponds to a real color.
The functions themselves are actually defined as lookup tables, not equations that can be calculated exactly. The Munsell Color Science Laboratory (https://www.rit.edu/science/munsell-color-lab) provides 1nm resolution samples: scroll down to "Useful Color Data" under "Educational Resources." Unfortunately, it's in Excel format. Other sources might provide 5nm data, and anything more precise than 1nm is probably a modern reconstruction which might not commute with the 1931 space.
(For interest: this paper—http://jcgt.org/published/0002/02/01/—provides analytic approximations with error within the variability of the original human subject data, but they're mostly intended for specific use cases. For our purposes, it's preferable, and simpler, to stick with the empirically sampled data.)
The functions are referred to as x̅, y̅, and z̅ (or x bar, y bar, and z bar.) Collectively, they're known as the CIE 1931 2 Degree Standard Observer. There's a separate 1964 standard observer constructed from a wider 10 degree field-of-view, with minor differences, which can be used instead of the 1931 standard observer, but which arguably creates a different color space. (The 1964 standard observer shouldn't be confused with the separate CIE 1964 color space.)
To calculate the tristimulus values, you take the inner product of (1) the spectrum of the color and (2) the color matching function. This just means that every point (or sample) in the spectrum is multiplied by the corresponding point (or sample) in the color matching function, which serves to reweight the data. Then, you take the integral (or summation, more accurately, since we're dealing with discrete samples) over the whole range of visible light ([360nm, 830nm].) The functions are normalized so that they have equal area under their curves, so an equal energy spectrum (the sampled value for every wavelength is the same) will have {X = Y = Z}. (FWIW, the Munsell Color Lab data are properly normalized, but they sum to 106 and change, for some reason.)
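A sketch of that computation, assuming the spectrum and the color matching functions are NumPy arrays sampled at the same 1nm wavelengths (the names are mine):
import numpy as np

def tristimulus(spectrum, xbar, ybar, zbar):
    # inner product of the spectral power distribution with each color matching
    # function; with discrete 1nm samples the integral becomes a plain sum
    X = np.sum(spectrum * xbar)
    Y = np.sum(spectrum * ybar)
    Z = np.sum(spectrum * zbar)
    return X, Y, Z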
Taking another look at that 3D plot of the xyY space, we notice again that the familiar spectral locus shape seems to be the shape of the volume at {Y = 0}, i.e. where those colors are actually black. This now makes some sort of sense, since they are monochromatic colors, and their spectrums should consist of a single point, and thus when you take the integral over a single point you'll always get 0. However, that then raises the question: how do they have chromaticity at all, since the other two functions should also be 0?
The simplest explanation is that Y at the base of the shape is actually ever-so-slightly greater than zero. The use of sampling means that the spectrums for the monochromatic sources are not taken to be instantaneous values. Instead, they're narrow bands of the spectrum near their wavelengths. You can get arbitrarily close to instantaneous and still expect meaningful chromaticity, within the bounds of precision, so the limit as the sampling bandwidth goes to 0 is the ideal spectral locus, even if it disappears at exactly 0. However, the spectral locus as actually derived is just calculated from the single-sample values for the x̅, y̅, and z̅ color matching functions.
That means that you really just need one set of data—the lookup tables for x̅, y̅, and z̅. The spectral locus can be computed from each wavelength by just dividing x̅(wl) and y̅(wl) by x̅(wl) + y̅(wl) + z̅(wl).
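For example, given 1nm-sampled NumPy arrays for the three functions:
def spectral_locus(xbar, ybar, zbar):
    # one (x, y) chromaticity pair per sampled wavelength
    denom = xbar + ybar + zbar
    return xbar / denom, ybar / denom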
(Image credit: Apple, screenshot from ColorSync Utility)
Sometimes you'll see a plot like this, with a dramatically arcing, rainbow-colored line swooping up and around the plot, and then back down to 0 at the far red end of the spectrum. This is just the y̅ function plotted along the spectral locus, scaled so that y̅ = Y. Note that this is not a contour of the 3D shape of the visible gamut. Such a contour would be well inside the spectral locus through the blue-green range, when plotted in 2 dimensions.
Delineating the visible spectrum in XYZ space
The final question becomes: given these three color matching functions, how do we use them to decide if a given {X, Y, Z} is within the gamut of human color perception?
Useful fact: you can't have luminosity by itself. Any real color will also have a non-zero value for one or both of the other functions. We also know Y by definition has a range of [0, 1], so we're really only talking about figuring out whether {X, Z} is valid for a given Y.
Now the question becomes: what spectrums (simplified for our purposes: an array of 471 values, either 0 or 1, for the wavelengths [360nm, 830nm], band width 1nm), when weighted by y̅, will sum to Y?
The XYZ space is additive, like RGB, so any non-monochromatic light is equivalent to a linear combination of monochromatic colors at various intensities. In other words, any point inside of the spectral locus can be created by some combination of points situated exactly on the boundary. If you took the monochromatic CIE RGB primaries and just added up their tristimulus values, you'd get white, and the spectrum of that white would just be the spectrum of the three primaries superimposed, a thin band at the wavelength for each primary.
It follows, then, that every possible combination of monochromatic colors is within the gamut of human vision. However, there's a ton of overlap: different spectrums can produce the same perceived color. This is called metamerism. So, while it might be impractical to enumerate every possible individually perceptible color or spectrums that can produce them, it's actually relatively easy to calculate the overall shape of the space from a trivially enumerable set of spectrums.
What we do is step through the gamut wavelength-by-wavelength, and, for that given wavelength, we iteratively sum ever-larger slices of the spectrum starting from that point, until we either hit our Y target or run out of spectrum. You can picture this as going around a circle, drawing progressively larger arcs from one starting point and plotting the center of the resulting shape—when you get to an arc that is just the full circle, the centers coincide, and you get white, but until then the points you plot will spiral inward from the edge. Repeat that from every point on the circumference, and you'll have points spiraling in along every possible path, covering the gamut. You can actually see this spiraling in effect, sometimes, in 3D color space plots.
In practice, this takes the form of two loops, the outer loop going from 360 to 830, and the inner loop going from 1 to 470. In my implementation, what I did for the inner loop is save the current and last summed values, and once the sum exceeds the target I use the difference to calculate a fractional number of bands and push the outer loop's counter and that interpolated width onto an array, then break out of the inner loop. Interpolating the bands greatly smooths out the curves, especially in the prow.
Once we have the set of spectrums of the right luminance, we can calculate their X and Z values. For that, I have a higher order summation function that gets passed the function to sum and the interval. From there, the shape of the gamut on the chromaticity diagram for that Y is just the path formed by the derived {x, y} coordinates, as this method only enumerates the surface of the gamut, without interior points.
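Here is a rough Python sketch of those two loops (my own reconstruction, not the answerer's code); it assumes the color matching functions are 1nm-sampled NumPy arrays with ybar normalized so its total sum is 1, that the target Y lies in [0, 1], and, following the circle analogy above, that the band is allowed to wrap around from 830nm back to 360nm:
def gamut_boundary_at_Y(target_Y, xbar, ybar, zbar):
    # (x, y) chromaticity boundary of the visible gamut at luminance target_Y,
    # found by enumerating band-pass spectra of growing width
    n = len(ybar)                               # 471 samples for 360..830nm at 1nm
    boundary = []
    for start in range(n):                      # outer loop: starting wavelength
        cumulative = 0.0
        for width in range(1, n + 1):           # inner loop: band width in samples
            idx = (start + width - 1) % n       # band wraps around the spectrum
            previous = cumulative
            cumulative += ybar[idx]
            if cumulative >= target_Y:
                # interpolate a fractional last band so the sum hits target_Y exactly
                frac = (target_Y - previous) / ybar[idx] if ybar[idx] > 0 else 0.0
                full = [(start + k) % n for k in range(width - 1)]
                X = xbar[full].sum() + frac * xbar[idx]
                Z = zbar[full].sum() + frac * zbar[idx]
                s = X + target_Y + Z
                boundary.append((X / s, target_Y / s))
                break
    return boundary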
In effect, this is a simpler version of what libraries like the one mentioned in the accepted answer do: they create a 3D mesh via exhaustion of the continuous spectrum space and then interpolate between points to decide if an exact color is inside or outside the gamut. Yes, it's a pretty brute-force method, but it's simple, speedy, and effective enough for demonstrative and visualization purposes. Rendering a 20-step contour plot of the overall shape of the chromaticity space in a browser is effectively instantaneous, for instance, with nearly perfect curves.
There are a couple of places where a lack of precision can't be entirely smoothed over: in particular, two corners near orange are clipped. This is due to the shapes of the lines of partial sums in this region being a combination of (1) almost perfectly horizontal and (2) having a hard cusp at the corner. Since the points exactly at the cusp aren't at nice even values of Y, the flatness of the contours is more of a problem because they're perpendicular to the mostly-vertical line of the cusp, so interpolating points to fit any given Y is at its worst in this region. Another problem is that the points aren't uniformly distributed, being concentrated very near the cusp: the clipping of the corner corresponds to situations where an outlying point is interpolated. All these issues can clearly be seen in this plot (rendered with 20nm bins for clarity but, again, more precision doesn't eliminate the issue):
Conclusion
Of course, this is the sort of highly technical and pitfall-prone problem (PPP) that is often best outsourced to a quality 3rd party library. Knowing the basic techniques and science behind it, however, demystifies the entire process and helps us use those libraries effectively, and adapt our solutions as needs change.
You could use Colour and the colour.is_within_visible_spectrum definition:
>>> import numpy as np
>>> from colour import is_within_visible_spectrum
>>> is_within_visible_spectrum(np.array([0.3205, 0.4131, 0.51]))
array(True, dtype=bool)
>>> a = np.array([[0.3205, 0.4131, 0.51],
... [-0.0005, 0.0031, 0.001]])
>>> is_within_visible_spectrum(a)
array([ True, False], dtype=bool)
Note that this definition expects CIE XYZ tristimulus values, so you would have to convert your CIE xyY colourspace values to XYZ using the colour.xyY_to_XYZ definition.
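For example, roughly along these lines (a sketch; the xyY values are just placeholders):
import numpy as np
import colour

xyY = np.array([0.4325, 0.3788, 0.1034])   # placeholder xyY values
XYZ = colour.xyY_to_XYZ(xyY)
print(colour.is_within_visible_spectrum(XYZ))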

BoundingBox Shape

In my Android mapping activity, I have a parallelogram-shaped area, and I want to tell whether points (i.e. LatLng) are inside it. I've tried using the:
bounds = new LatLngBounds.Builder()
        .include(latlngNW)
        .include(latlngNE)
        .include(latlngSW)
        .include(latlngSE)
        .build();
and later
if (bounds.contains(currentLatLng) {
.....
}
but it is not that accurate. Do I need to create equations for lines connecting the four corners?
Thanks in advance.
The LatLngBounds appears to create a box from the points included. Given that the shape I'm trying to monitor is a parallelogram, you do need to create equations for each of the edges of the shape and use if statements to determine which side of the line a point lies on.
Not an easy solution!
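For illustration, here is a minimal sketch of that idea in Python (corner order and names are mine); each test checks the sign of a cross product, i.e. on which side of an edge the point falls:
def point_in_convex_quad(p, corners):
    # corners must be given in order around the shape (e.g. NW, NE, SE, SW);
    # the point is inside if it lies on the same side of every edge
    signs = []
    for i in range(4):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % 4]
        cross = (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)
        signs.append(cross >= 0)
    return all(signs) or not any(signs)

# usage: corners as (longitude, latitude) pairs, in order around the parallelogram
inside = point_in_convex_quad((10.01, 53.55),
                              [(10.0, 53.6), (10.1, 53.62), (10.11, 53.5), (10.01, 53.48)])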
If you wish to build a parallelogram-shaped bounding "box" from a collection of points, and you know the desired angles of the parallelogram's sides, your best bet is probably to define a 2D linear shear transform which will map one of those angles to horizontal, and the other to vertical. One may then feed the transformed points into normal "bounding box" routines, and feed the corners of the resulting box through the inverse of the above transform to get a bounding parallelogram.
Note that this approach is generally only suitable for parallelograms, not trapezoids. There are a few special cases where it could be used to find bounding trapezoids [e.g. if the top and bottom were horizontal, and the sides were supposed to converge at a known point (x0, y0), one could map x' = (x-x0)/(y-y0)], but for many kinds of trapezoids, the trapezoid formed by inverse-mapping the corners of a horizontal/vertical bounding rectangle may not properly bound the points that are supposed to be within it.
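A small Python sketch of the shear idea, assuming the parallelogram's sides should run along two known direction vectors u and v (all names are mine): transform the points into (u, v) coordinates, take an ordinary bounding box there, and map its corners back:
import numpy as np

def bounding_parallelogram(points, u, v):
    # points: (N, 2) array; u, v: the two edge directions of the desired parallelogram
    M = np.column_stack([u, v])          # maps (s, t) coordinates into the x/y plane
    inv = np.linalg.inv(M)
    st = points @ inv.T                  # express every point in (s, t) coordinates
    smin, tmin = st.min(axis=0)
    smax, tmax = st.max(axis=0)
    corners_st = np.array([[smin, tmin], [smax, tmin], [smax, tmax], [smin, tmax]])
    return corners_st @ M.T              # bounding box corners mapped back: a parallelogram

# usage: horizontal edges plus edges slanted at 45 degrees
pts = np.random.rand(100, 2)
corners = bounding_parallelogram(pts, u=np.array([1.0, 0.0]), v=np.array([1.0, 1.0]))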

How to draw a curve that goes both up and down?

I've used the following code to make 3 points, draw them to a bitmap, then draw the bitmap to the main form. However, it seems to always draw point 3 before point 2, because its Y co-ordinate is lower than point 2's. Is there a way to get around this, as I need a curve that curves up and down, rather than just up?
Bitmap bit = new Bitmap(490, 490);
Graphics g = Graphics.FromImage(bit);
Graphics form = this.CreateGraphics();
Pen p = new Pen(Color.Black);          // declared here so the snippet is self-contained
Point[] pntPoints = new Point[3];
pntPoints[0] = this.pictureBox1.Location;
pntPoints[1] = new Point(100, 300);
pntPoints[2] = new Point(200, 150);
g.DrawCurve(p, pntPoints);
form.DrawImage(bit, 0, 5);
g.Dispose();
bit.Dispose();
The Y-coordinate for point 3 is not lower, it's actually higher. The (0, 0) point of Graphics is in the top left corner, and the Y value increases from the top down rather than from the bottom up. So a point (0, 100) will be higher than (0, 200) on the resulting image.
If you want a curve that goes up and then down, you should place your first point at (0, 489), your second point at (100, 190) and your third point at (200, 340).
I recommend you put in a debug function that will mark and identify the points themselves, so you can see exactly where they are. A pixel in an off color, the index of the point, and the coordinates together will help you identify what is going where.
Now, I'm wondering, are those two points really supposed to be absolute, or are they supposed to be relative to the first point?

Algorithm for Polygon Image Fill

I want an efficient algorithm to fill a polygon with an image; specifically, I want to fill an image into a trapezoid. Currently I am doing it in two steps:
1) First perform a StretchBlt on the image,
2) Then perform a column-by-column vertical StretchBlt.
Is there any better method to implement this? Is there any generic and fast algorithm which can fill any polygon?
Thanks,
Sunny
I can't help you with the distortion part, but filling polygons is pretty simple, especially if they are convex.
For each Y scan line have a table indexed by Y, containing a minX and maxX.
For each edge, run a DDA line-drawing algorithm, and use it to fill in the table entries.
For each Y line, now you have a minX and maxX, so you can just fill that segment of the scan line.
The hard part is a mental trick - do not think of coordinates as specifying pixels. Think of coordinates as lying between the pixels. In other words, if you have a rectangle going from point 0,0 to point 2,2, it should light up 4 pixels, not 9. Most problems with polygon-filling revolve around this issue.
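A minimal Python sketch of that scanline approach for a convex polygon (my own illustration, leaving out the image-distortion part and sampling scanlines between pixel rows as suggested above):
def fill_convex_polygon(vertices, set_pixel):
    # vertices: list of (x, y) tuples in order around the polygon
    ys = [y for _, y in vertices]
    ymin, ymax = int(min(ys)), int(max(ys))
    minx = {y: float('inf') for y in range(ymin, ymax)}
    maxx = {y: float('-inf') for y in range(ymin, ymax)}
    n = len(vertices)
    for i in range(n):                       # walk every edge with a simple DDA
        (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % n]
        if y0 == y1:
            continue
        if y0 > y1:
            x0, y0, x1, y1 = x1, y1, x0, y0
        slope = (x1 - x0) / (y1 - y0)
        for y in range(int(y0), int(y1)):    # sample each scanline at its center
            x = x0 + slope * (y + 0.5 - y0)
            minx[y] = min(minx[y], x)
            maxx[y] = max(maxx[y], x)
    for y in range(ymin, ymax):              # fill each scanline between minX and maxX
        for x in range(int(minx[y]), int(maxx[y])):
            set_pixel(x, y)

# usage: fill_convex_polygon([(0, 0), (10, 0), (8, 6), (2, 6)], lambda x, y: print(x, y))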
ADDED: OK, it sounds like what you're really asking is how to stretch the image to a non-rectangular (but trapezoidal) shape. I would do it in terms of parameters s and t, going from 0 to 1. In other words, a location in the original rectangle is (x + w0*s, y + h0*t). Then define a function such that s and t also map to positions in the trapezoid, such as ((x+t*a) + w0*s*(1-t) + w1*s*t, y + h1*t). This defines a coordinate mapping between the two shapes. Then just scan x and y, converting to s and t, and mapping points from one to the other. You probably want to have a little smoothing filter rather than a direct copy.
ADDED to try to give a better explanation:
I'm supposing both your rectangle and trapezoid have top and bottom edges parallel with the X axis. The lower-left corner of the rectangle is <x0,y0>, and the lower-left corner of the trapezoid is <x1,y1>. I assume the rectangle's width and height are <w,h>.
For the trapezoid, I assume it has height h1, that its lower width is w0, and that its upper width is w1. I assume its left edge "slants" by a distance a, so that the position of its upper-left corner is <x1+a, y1+h1>. Now suppose you iterate <x,y> over the rectangle. At each point, compute s = (x-x0)/w and t = (y-y0)/h, which are both in the range 0 to 1. (I'll let you figure out how to do that without using floating point.) Then convert that to a coordinate in the trapezoid, as xt = ((x1 + t*a) + s*(w0*(1-t) + w1*t)), and yt = y1 + h1*t. Then <xt,yt> is the point in the trapezoid corresponding to <x,y> in the rectangle. Now I'll let you figure out how to do the copying :-) Good luck.
P.S. And please don't forget - coordinates fall between pixels, not on them.
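For reference, a short Python sketch of that mapping, using the same names as above (x0, y0, w, h for the rectangle; x1, y1, a, w0, w1, h1 for the trapezoid):
def rect_to_trapezoid(x, y, x0, y0, w, h, x1, y1, a, w0, w1, h1):
    # normalized coordinates within the source rectangle
    s = (x - x0) / w
    t = (y - y0) / h
    # corresponding point in the trapezoid: the left edge slants by a, and the
    # width interpolates from w0 at the bottom to w1 at the top
    xt = (x1 + t * a) + s * (w0 * (1 - t) + w1 * t)
    yt = y1 + h1 * t
    return xt, yt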
Would it be feasible to sidestep the problem and use OpenGL to do this for you? OpenGL can render to memory contexts and if you can take advantage of any hardware acceleration by doing this that'll completely dwarf any code tweaks you can make on the CPU (although on some older cards memory context rendering may not be able to take advantage of the hardware).
If you want to do this completely in software, Mesa may be an option.
