How can I translate an image with subpixel accuracy?

I have a system that requires moving an image on the screen. I am currently using a png and just placing it at the desired screen coordinates.
Because of a combination of the screen resolution and the required frame rate, some frames are identical because the image has not yet moved a full pixel. Unfortunately, the resolution of the screen is not negotiable.
I have a general understanding of how sub-pixel rendering works to smooth out edges but I have been unable to find a resource (if it exists) as to how I can use shading to translate an image by less than a single pixel.
Ideally, this would be usable with any image but if it was only possible with a simple shape like a circle or a ring, that would also be acceptable.

Sub-pixel interpolation is relatively simple. Typically you apply what amounts to an all-pass filter with a constant phase shift, where the phase shift corresponds to the required sub-pixel image shift. Depending on the required image quality you might use, e.g., a 5-point Lanczos or other windowed-sinc function, and then apply this in one or both axes depending on whether you want an X shift, a Y shift, or both.
E.g. for a 0.5 pixel shift the coefficients might be [ 0.06645, 0.18965, 0.27713, 0.27713, 0.18965 ]. (Note that the coefficients are normalised, i.e. their sum is equal to 1.0.)
To generate a horizontal shift you would convolve these coefficients with the pixels from x - 2 to x + 2, e.g.
const float kCoeffs[5] = { 0.06645f, 0.18965f, 0.27713f, 0.27713f, 0.18965f };

for (int y = 0; y < height; ++y)            // for each row
{
    for (int x = 2; x < width - 2; ++x)     // for each col (apart from 2 pixel border)
    {
        float p = 0.0f;                     // convolve pixel with Lanczos coeffs
        for (int dx = -2; dx <= 2; ++dx)
            p += in[y][x + dx] * kCoeffs[dx + 2];
        out[y][x] = p;                      // store interpolated pixel
    }
}

Conceptually, the operation is very simple. First you scale up the image (using whatever interpolation method you like), then you translate the result, and finally you subsample back down to the original image size.
The scale factor depends on the precision of the sub-pixel translation you want. If you want to translate by 0.5 pixels, you need to scale up the original image by a factor of 2 and then translate the resulting image by 1 pixel; if you want to translate by 0.25 pixels, you need to scale up by a factor of 4, and so on.
Note that this implementation is not efficient because when you scale up you end up calculating pixel values that you won't actually use because they're just dropped when you subsample back to the original image size. The implementation in Paul's answer is more efficient.
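Still, for clarity, a minimal sketch of that scale / translate / subsample pipeline (using OpenCV purely as an example; the helper name and the 4x / 0.25-pixel choices are illustrative):
#include <opencv2/opencv.hpp>

// Sketch: shift 'img' right by 0.25 pixels by scaling up 4x, translating the
// upscaled image by 1 pixel, then subsampling back to the original size.
cv::Mat shiftQuarterPixel(const cv::Mat &img)
{
    cv::Mat big, shifted, out;
    cv::resize(img, big, cv::Size(), 4.0, 4.0, cv::INTER_CUBIC);        // scale up
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, 1, 0, 1, 0);           // translate by 1 (upscaled) pixel in x
    cv::warpAffine(big, shifted, M, big.size());
    cv::resize(shifted, out, img.size(), 0, 0, cv::INTER_AREA);         // subsample back down
    return out;
}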


Pixel space depth offset in vertex shader

I'm trying to draw simple scaled points in my custom graphics engine. The points are scaled in pixel space, and the radius of the points are in pixels, but the position of the points fed to the draw function are in world coordinates.
So far, everything is working great, except for a depth clipping issue. The points are of constant size, regardless of how far away they are, which is done by offsetting the vertices in projected/clip space. However, when they are close to surfaces, they partially intersect them in the depth buffer.
Since these points represent world coordinates, I want them to use the depth buffer, and be hidden behind objects that are in front of them. However, when the point is close to a surface, I want to push it toward the camera, so it doesn't partially intersect it. I think it is easier to just always do this push, regardless of the point being close to a surface. What makes the most sense to me is to just push it by its radius, so that all of its vertices are exactly far enough away to avoid clipping into nearby surfaces.
The easiest way I've found to do this is to simply subtract from the Z value in the vertex shader, after transforming into view-projection space. However, I'm having some trouble converting my pixel radius into a depth offset. Regardless of the math I use, what works close up never seems to work far away. I'm thinking maybe this is due to how the z buffer is non-linear, but could be wrong.
Currently, the closest I've been to solving this is the following:
proj_vertex_pos.z -= point_pixel_radius / proj_vertex_pos.w * 100.0
I'm honestly not sure why 100.0 helps make this work yet. I added it simply because dividing the radius by w was too small of a value. Can anyone point me in the right direction? How do I convert my pixel distance into a depth distance? Especially if the depth distance changes scale depending on which depth you are at? Or am I just way off?
The solution was to convert my pixel-space radius into world-space units, since the z-buffer is still in world space, even after transforming by the view-projection transform. This can be done by converting pixels into a factor (factor = pixels / screen_size), then converting the factor into world-space units, which was a little more involved: I had to calculate the world-space size of the screen at a given distance, then multiply the factor by that to get world units. I can post the related code if anyone needs it. There's probably a simpler way to calculate it, but my brain always goes straight for factors.
The reason I was getting different results at different distances was mainly because I was only offsetting the z component of the clip position by the result. It's also necessary to offset the w component, to make the depth offset work at any distance (linear). However, in order to offset the w component, you first have to scale xy by w, modify w as needed, then divide xy by the new w. This resulted in making the math pretty involved, so I changed the strategy to offset the vertex before clip space, which requires calculating the distance to the camera in Z space manually, but it honestly ended up being about the same amount of math either way.
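Conceptually, the pixel-to-world conversion described above boils down to something like the following sketch (the function and parameter names are assumptions, not the engine's actual API, and a symmetric vertical FOV is assumed):
#include <cmath>

// Sketch: convert a screen-space radius in pixels to world units at a given camera-Z distance.
float pixelRadiusToWorld(float radiusPixels, float screenHeightPixels,
                         float verticalFovRadians, float camZDistance)
{
    float factor = radiusPixels / screenHeightPixels;                                        // pixels -> fraction of the screen
    float worldScreenHeight = 2.0f * camZDistance * std::tan(verticalFovRadians * 0.5f);     // screen height in world units at that depth
    return factor * worldScreenHeight;                                                       // fraction -> world units
}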
Here is the final vertex shader at the moment. Hopefully the global values make sense. I did not modify this to post it, so please forgive any silliness in my comments. EDIT: I had to make some edits to this, because I was accidentally moving the vertex along the camera-Z direction instead of directly toward the camera:
lerpPoint main(vinBake vin)
{
    // prepare output
    lerpPoint pin;
    // extract radius/size from input
    pin.InRadius = vin.TexCoord.y;
    // compute offset from vertex to camera
    float3 to_cam_offset = Scene.CamPos - vin.Position.xyz;
    // compute the Z distance of the camera from the vertex
    float cam_z_dist = -dot( Scene.CamZ, to_cam_offset );
    // compute the radius factor
    // + this describes what percentage of the screen is covered by our radius
    // + this removes it from pixel space into factor-space
    float radius_fac = Scene.InvScreenRes.x * pin.InRadius;
    // compute world-space radius by scaling with FieldFactor
    // + FieldFactor.x represents the world-space-width of the camera view at whatever distance we scale it by
    // + here, we scale FieldFactor.x by the camera z distance, which gives us the world radius, in world units
    // + we must multiply by 2 because FieldFactor.x only represents HALF of the screen
    float radius_world = radius_fac * Scene.FieldFactor.x * cam_z_dist * 2.0;
    // finally, push the vertex toward the camera by the world radius
    // + note: moving by radius will only work with surfaces facing the camera, since we are moving toward the camera, rather than away from the surface
    // + because of this, we also multiply by another 4, to compensate for nearby surface angles, but there is no scale that would work for every angle
    float3 offset = normalize(to_cam_offset) * (radius_world * -4.0);
    // generate projected position
    // + after this, x=-1 is left, x=+1 is right, y=-1 is bottom, and y=+1 is top of screen
    // + note that after this transform, w represents "distance from camera", and z represents "distance from near plane", both in world space
    pin.ClipPos = mul( Scene.ViewProj, float4( vin.Position.xyz + offset, 1.0) );
    // calculate radius of point, in clip space from our radius factor
    // + we scale by 2 to convert pixel radius into clip-radius
    float clip_radius = radius_fac * 2.0 * pin.ClipPos.w;
    // compute scaled clip-space offset and apply it to our clip-position
    // + vin.Prop.xy: -1,-1 = bottom-left, -1,1 = top left, 1,-1 = bottom right, 1,1 = top right (note: in clip-space, +1 = top, -1 = bottom)
    // + we scale by clipping depth (part of clip_radius) to retain constant scale, but this will give us a VERY LARGE result
    // + we scale by inverse resolution (clip_radius) to convert our input screen scale (eg, 1->1024) into a clip scale (eg, 0.001 to 1.0 )
    pin.ClipPos.x += vin.Prop.x * clip_radius;
    pin.ClipPos.y += vin.Prop.y * clip_radius * Scene.Aspect;
    // return result
    return pin;
}
Here is the other version that offsets z & w instead of changing things in world space. After edits above, this is probably the more optimal solution:
lerpPoint main(vinBake vin)
{
    // prepare output
    lerpPoint pin;
    // extract radius/size from input
    pin.InRadius = vin.TexCoord.y;
    // generate projected position
    // + after this, x=-1 is left, x=+1 is right, y=-1 is bottom, and y=+1 is top of screen
    // + note that after this transform, w represents "distance from camera", and z represents "distance from near plane", both in world space
    pin.ClipPos = mul( Scene.ViewProj, float4( vin.Position.xyz, 1.0) );
    // compute the radius factor
    // + this describes what percentage of the screen is covered by our radius
    // + this removes it from pixel space into factor-space
    float radius_fac = Scene.InvScreenRes.x * pin.InRadius;
    // compute world-space radius by scaling with FieldFactor
    // + FieldFactor.x represents the world-space-width of the camera view at whatever distance we scale it by
    // + here, we scale FieldFactor.x by the camera z distance, which gives us the world radius, in world units
    // + we must multiply by 2 because FieldFactor.x only represents HALF of the screen
    float radius_world = radius_fac * Scene.FieldFactor.x * pin.ClipPos.w * 2.0;
    // offset depth by our world radius
    // + we scale this extra to compensate for surfaces with high angles relative to the camera (since we are moving directly at it)
    // + notice we have to make the perspective divide before modifying w, then re-apply it after, or xy will be off
    pin.ClipPos.xy /= pin.ClipPos.w;
    pin.ClipPos.z -= radius_world * 10.0;
    pin.ClipPos.w -= radius_world * 10.0;
    pin.ClipPos.xy *= pin.ClipPos.w;
    // calculate radius of point, in clip space from our radius factor
    // + we scale by 2 to convert pixel radius into clip-radius
    float clip_radius = radius_fac * 2.0 * pin.ClipPos.w;
    // compute scaled clip-space offset and apply it to our clip-position
    // + vin.Prop.xy: -1,-1 = bottom-left, -1,1 = top left, 1,-1 = bottom right, 1,1 = top right (note: in clip-space, +1 = top, -1 = bottom)
    // + we scale by clipping depth (part of clip_radius) to retain constant scale, but this will give us a VERY LARGE result
    // + we scale by inverse resolution (clip_radius) to convert our input screen scale (eg, 1->1024) into a clip scale (eg, 0.001 to 1.0 )
    pin.ClipPos.x += vin.Prop.x * clip_radius;
    pin.ClipPos.y += vin.Prop.y * clip_radius * Scene.Aspect;
    // return result
    return pin;
}

Homogeneous coordinates and perspective-correctness?

Does the technique that Vulkan uses (and I assume other graphics libraries too) to interpolate vertex attributes in a perspective-correct manner require that the vertex shader normalize the homogeneous camera-space vertex position (i.e. divide through by the w-coordinate such that the w-coordinate is 1.0) prior to multiplication by a typical projection matrix of the form...
g/s 0 0 0
0 g 0 n
0 0 f/(f-n) -nf/(f-n)
0 0 1 0
...in order for perspective-correctness to work properly?
Or, will perspective-correctness continue to work on any homogeneous vertex position in camera-space (with a w-coordinate other than 1.0)?
(I didn't completely follow the perspective-correctness math, so it is unclear to me which is the case.)
Update:
In order to clarify terminology:
vec4 modelCoordinates = vec4(x_in, y_in, z_in, 1);
mat4 modelToWorld = ...;
vec4 worldCoordinates = modelToWorld * modelCoordinates;
mat4 worldToCamera = ...;
vec4 cameraCoordinates = worldToCamera * worldCoordinates;
mat4 cameraToProjection = ...;
vec4 clipCoordinates = cameraToProjection * cameraCoordinates;
output(clipCoordinates);
cameraToProjection is a matrix like the one shown in the question
The question is does cameraCoordinates.w have to be 1.0?
And consequently, does the last row of both the modelToWorld and worldToCamera matrices have to be 0 0 0 1?
You have this exactly backwards. Doing the perspective divide in the shader is what prevents perspective-correct interpolation. The rasterizer needs the perspective information provided by the W component to do its job. With a W of 1, the interpolation is done in window space, without any regard to perspective.
Provide a clip-space coordinate to the output of your vertex processing stage, and let the system do what it exists to do.
the vertex shader must normalize the homogeneous camera-space vertex position (i.e. divide through by the w-coordinate such that the w-coordinate is 1.0) prior to multiplication by a typical projection matrix of the form...
If your camera-space vertex position does not have a W of 1.0, then one of two things has happened:
You are deliberately operating in a post-projection world space or some similar construct. This is a perfectly valid thing to do, and the math for a camera space can be perfectly reasonable.
Your code is broken somewhere. That is, you intend for your world and camera space to be a normal, Euclidean, non-homogeneous space, but somehow the math didn't work out. Obviously, this is not a perfectly valid thing to do.
In both cases, dividing by W is the wrong thing to do. If your world space that you're placing a camera into is post-projection (such as in this example), dividing by W will break your perspective-correct interpolation, as outlined above. If your code is broken, dividing by W will merely mask the actual problem; better to fix your code than to hide the bug, as it may crop up elsewhere.
To see whether or not the camera coordinates need to be in normal form, let's represent the camera coordinates as multiples of w, so they are (wx,wy,wz,w).
Multiplying through by the given projection matrix, we get the clip coordinates (wxg/s, wyg, fwz/(f-n) - nfw/(f-n), wz).
Calculating the x-y framebuffer coordinates as per the fixed Vulkan formula, we get (P_x * xg/sz + O_x, P_y * gy/z + O_y). Notice this does not depend on w, so the position of a polygon's vertices in the framebuffer doesn't require the camera coordinates to be in normal form.
Likewise, the calculation of the barycentric coordinates of fragments within a polygon only depends on x,y in framebuffer coordinates, and so is also independent of w.
However, perspective-correct interpolation of fragment attributes does depend on W_clip of the vertices, as this is used in the formula given in the Vulkan spec. As shown above, W_clip is wz, which does depend on w and scales with it, so we can conclude that the camera coordinates must be in normal form (their w must be 1.0).
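As a quick numerical sanity check (a sketch with arbitrarily chosen g, s, n, f values, not anyone's production code): scaling the camera-space position by some factor leaves the post-divide x/y/z unchanged, while W_clip scales with it.
#include <cstdio>

int main()
{
    const double g = 1.0, s = 1.5, n = 0.1, f = 100.0;
    // Projection matrix of the form shown in the question:
    const double P[4][4] = { { g / s, 0.0, 0.0,         0.0              },
                             { 0.0,   g,   0.0,         n                },
                             { 0.0,   0.0, f / (f - n), -n * f / (f - n) },
                             { 0.0,   0.0, 1.0,         0.0              } };
    const double cam[4]    = { 2.0, 1.0, 5.0, 1.0 };    // camera-space position with w = 1
    const double scales[2] = { 1.0, 3.0 };              // compare w = 1 against w = 3

    for (int t = 0; t < 2; ++t)
    {
        double clip[4] = { 0.0, 0.0, 0.0, 0.0 };
        for (int r = 0; r < 4; ++r)
            for (int c = 0; c < 4; ++c)
                clip[r] += P[r][c] * cam[c] * scales[t];
        // Same normalized device coordinates either way, but W_clip differs:
        std::printf("ndc = (%f, %f, %f)  W_clip = %f\n",
                    clip[0] / clip[3], clip[1] / clip[3], clip[2] / clip[3], clip[3]);
    }
    return 0;
}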

detecting lane lines on a binary mask

I have a binary mask of a road. The mask is a little irregular (sometimes even more than depicted in the image).
I have tried HoughLinesP in OpenCV to detect boundary lines, but the boundary lines are not as expected. I tried erosion and dilation to smooth things out, but no luck. Also, since the path is curved, it becomes even more difficult to detect boundary lines using HoughLines. How can I modify the code to detect lines better?
import cv2
import numpy as np

# img2 (the binary road mask) and frame2 (the frame being drawn on) are assumed to be defined earlier
img2 = cv2.erode(img2, None, iterations=2)
img2 = cv2.dilate(img2, None, iterations=2)
can = cv2.Canny(img2, 150, 50)
lines = cv2.HoughLinesP(can, 1, np.pi / 180, 50, maxLineGap=50, minLineLength=10)
if lines is not None:
    for x in lines:
        #print(lines[0])
        #mask=np.zeros(frame2.shape,dtype=np.uint8)
        #roi=lines
        #cv2.fillPoly(mask,roi,(255,255,255))
        #cv2.imshow(mask)
        for x1, y1, x2, y2 in x:
            cv2.line(frame2, (x1, y1), (x2, y2), (255, 0, 0), 2)
You say that Hough is failing but you don't say why. Why is your output "not as expected"? In my experience, there are two critical points in Hough line detection: 1) the edge mask you pass to it, and 2) how you filter the resulting lines. You should be fine-tuning those two steps, and Hough should be enough for your problem.
I don't know what kind of problems the line detector is giving you, but suppose you are interested (as your question suggests) in other methods for lane detection. There are at least two things you could try: 1) a bird's eye transform of the road, which makes line detection much easier since all your lines become parallel (a quick sketch of the warp is shown below), and 2) contour detection (instead of lines).
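Option 1 is not developed further in this answer, but a minimal sketch of the warp could look like this (the four source points and the roadMask name are placeholders; you would pick the points from your own image):
//Bird's eye view: map four hand-picked points on the road to a rectangle:
cv::Point2f srcQuad[4] = { cv::Point2f(420, 300), cv::Point2f(580, 300),
                           cv::Point2f(900, 700), cv::Point2f(100, 700) }; //placeholder coordinates
cv::Point2f dstQuad[4] = { cv::Point2f(100, 0),   cv::Point2f(900, 0),
                           cv::Point2f(900, 700), cv::Point2f(100, 700) };
cv::Mat H = cv::getPerspectiveTransform( srcQuad, dstQuad );
cv::Mat birdsEye;
cv::warpPerspective( roadMask, birdsEye, H, roadMask.size() );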
Let's examine 2 and what kind of results you can obtain. I offer my answer in C++, but I'll make notes along the way. I try to highlight the important ideas, so you can implement them in your language of choice. However, if all you want is a CTRL+C and CTRL+V solution, this answer won't help you.
Ok, let's start by reading the image and converting it to binary. Our goal here is to first obtain the edges. Pretty standard stuff:
//Read input image:
std::string imagePath = "C://opencvImages//lanesMask.png";
cv::Mat testImage = cv::imread( imagePath );
//Convert BGR to Gray:
cv::Mat grayImage;
cv::cvtColor( testImage, grayImage, cv::COLOR_BGR2GRAY );
//Get binary image via Otsu:
cv::Mat binaryImage;
cv::threshold( grayImage, binaryImage, 0, 255, cv::THRESH_OTSU );
Now, simply pass this image to Canny's edge detector. The parameters are also pretty standard. As per Canny's documentation, the lower and upper thresholds are related by a factor of 3:
//Get Edges via Canny:
cv::Mat testEdges;
//Setup lower and upper thresholds for edge detection:
float lowerThreshold = 30;
float upperThreshold = 3 * lowerThreshold;
cv::Canny( binaryImage, testEdges, lowerThreshold, upperThreshold );
Your mask is pretty good; these are the edges Canny finds:
Now, here's where we try something different. We won't use Hough's line detection; instead, let's find the contours of the mask. Each contour is made of points. What we are looking for is actually lines, straight lines that can be fitted to these points. There's more than one method for achieving that. I propose K-means, a clustering algorithm.
The idea is that the points, as you can see, can be clustered in 4 groups: the vanishing point of the lanes (there should be 2 endpoints there) and the 2 starting points of the road. If we give K-means the points of the contour and tell it to cluster the data in 4 separate groups, we should get the means (locations) of those 4 points.
Let's try it out. The first step is to find the contours in the edges mask:
//Get contours:
std::vector< std::vector<cv::Point> > contours;
std::vector< cv::Vec4i > hierarchy;
cv::findContours( testEdges, contours, hierarchy, cv::RETR_TREE, cv::CHAIN_APPROX_SIMPLE, cv::Point(0, 0) );
K-means needs a specific data type on its input. I'll use a cv::Point2f vector to store all the contour points. Let's set up the variables used by K-means:
//Set up the data containers used by K-means:
cv::Mat centers; cv::Mat labels;
std::vector<cv::Point2f> points; //the data for clustering is stored here
Next, let's loop through the contours and store each point inside the Point2f vector, so we can further pass it to K-means. Let’s use the loop to also draw the contours and make sure we are not messing things up:
//Loop thru the found contours:
for( int i = 0; i < (int)contours.size(); i++ ){
    //Set a color & draw contours:
    cv::Scalar color = cv::Scalar( 0, 256, 0 );
    cv::drawContours( testImage, contours, i, color, 2, 8, hierarchy, 0, cv::Point() );
    //This is the current vector of points that is being processed:
    std::vector<cv::Point> currentVecPoint = contours[i];
    //Loop thru it and store each point as a float point inside a plain vector:
    for(int k = 0; k < (int)currentVecPoint.size(); k++){
        cv::Point currentPoint = currentVecPoint[k];
        //Push (store) the point into the vector:
        points.push_back( currentPoint );
    }
}
These are the contours found:
There, now, I have the contour points in my vector. Let's pass the info on to K-means:
//Setup K-means:
int clusterCount = 4; //Number of clusters to split the set by
int attempts = 5; //Number of times the algorithm is executed using different initial labels
int flags = cv::KMEANS_PP_CENTERS;
cv::TermCriteria criteria = cv::TermCriteria( cv::TermCriteria::MAX_ITER + cv::TermCriteria::EPS, 10, 0.01 );
//The call to kmeans:
cv::kmeans( points, clusterCount, labels, criteria, attempts, flags, centers );
And that's all. The result of K-means is in the centers matrix. Each row of the matrix should have 2 columns, denoting a point center. In this case, the matrix is of size 4 x 2. Let's draw that info:
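(The drawing code itself isn't shown here; a minimal sketch, assuming each row of centers holds a float x,y pair, could be:)
//Draw the cluster centers found by K-means:
for( int i = 0; i < clusterCount; i++ ){
    float cx = centers.at<float>(i, 0);
    float cy = centers.at<float>(i, 1);
    cv::circle( testImage, cv::Point( (int)cx, (int)cy ), 10, cv::Scalar(0, 0, 255), 5 );
}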
As expected, 4 center points, each is the mean of a cluster. Very cool, now, is this approximation enough for your application? Only you know that! You could work with those points and extend both lines, but that's a possible improvement of this result.

What is the focal length and image plane distance from this raytracing formula

I have a 4x4 camera matrix comprised of right, up, forward and position vectors.
I raytrace the scene with the following code that I found in a tutorial but don't really entirely understand it:
for (int i = 0; i < m_imageSize.width; ++i)
{
    for (int j = 0; j < m_imageSize.height; ++j)
    {
        u = (i + .5f) / (float)(m_imageSize.width - 1) - .5f;
        v = (m_imageSize.height - 1 - j + .5f) / (float)(m_imageSize.height - 1) - .5f;
        Ray ray(cameraPosition, normalize(u*cameraRight + v*cameraUp + 1 / tanf(m_verticalFovAngleRadian) *cameraForward));
I have a couple of questions:
How can I find the focal length of my raytracing camera?
Where is my image plane?
Why does cameraForward need to be multiplied by 1/tanf(m_verticalFovAngleRadian)?
Focal length is a property of lens systems. The camera model that this code uses, however, is a pinhole camera, which does not use lenses at all. So, strictly speaking, the camera does not really have a focal length. The corresponding optical properties are instead expressed as the field of view (the angle that the camera can observe; usually the vertical one). You could calculate the focal length of a camera that has an equivalent field of view with the following formula (see Wikipedia):
FOV = 2 * arctan(x / (2f))
FOV: diagonal field of view
x: diagonal of film; by convention 24x36 mm -> x = 43.266 mm
f: focal length
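Rearranged for f, a quick sketch of the equivalent focal length for a given field of view (using the 24x36 mm convention above; the 60 degree value is just an example):
#include <cmath>
#include <cstdio>

int main()
{
    const double kPi = 3.14159265358979323846;
    double fovRadians = 60.0 * kPi / 180.0;                  // example field of view
    double x = 43.266;                                       // film diagonal in mm (24x36 mm convention)
    double f = x / (2.0 * std::tan(fovRadians / 2.0));       // from FOV = 2 * arctan(x / (2f))
    std::printf("equivalent focal length: %.1f mm\n", f);    // roughly 37.5 mm for a 60 degree FOV
    return 0;
}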
There is no unique image plane. Any plane that is perpendicular to the view direction can be seen as the image plane. In fact, the projected images differ only in their scale.
For your last question, let's take a closer look at the code:
u = (i + .5f) / (float)(m_imageSize.width - 1) - .5f;
v = (m_imageSize.height - 1 - j + .5f) / (float)(m_imageSize.height - 1) - .5f;
These formulas calculate u/v coordinates between -0.5 and 0.5 for every pixel, assuming that the entire image fits in the box between -0.5 and 0.5.
u*cameraRight + v*cameraUp
... is just placing the x/y coordinates of the ray on the pixel.
... + 1 / tanf(m_verticalFovAngleRadian) *cameraForward
... is defining the depth component of the ray and ultimately the depth of the image plane you are using. Basically, this is making the ray steeper or shallower. Assume that you have a very small field of view, then 1/tan(fov) is a very large number. So, the image plane is very far away, which produces exactly this small field of view (when keeping the size of the image plane constant since you already set the x/y components). On the other hand, if the field of view is large, the image plane moves closer. Note that this notion of image plane is only conceptual. As I said, all other image planes are equally valid and would produce the same image. Another way (and maybe a more intuitive one) to specify the ray would be
u * tanf(m_verticalFovAngleRadian) * cameraRight
+ v * tanf(m_verticalFovAngleRadian) * cameraUp
+ 1 * cameraForward));
As you see, this is exactly the same ray (just scaled). The idea here is to set the conceptual image plane to a depth of 1 and scale the x/y components to adapt the size of the image plane. tan(fov) (with fov being the half field of view) is exactly the size of the half image plane at a depth of 1. Just draw a triangle to verify that. Note that this code is only able to produce square image planes. If you want to allow rectangular ones, you need to take into account the ratio of the side lengths.
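For completeness, here is a sketch of the same ray with the aspect ratio folded in (the variable names and the half-angle convention follow the snippet above; the aspect handling is my addition, not part of the original code):
float aspect = (float)m_imageSize.width / (float)m_imageSize.height;
float halfPlane = tanf(m_verticalFovAngleRadian);   // half height of the image plane at depth 1

Ray ray(cameraPosition, normalize(u * halfPlane * aspect * cameraRight
                                + v * halfPlane * cameraUp
                                + 1 * cameraForward));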

svg feGaussianBlur: correlation between stdDeviation and size

When I blur an object in Inkscape by, let's say, 10%, it gets a filter with an feGaussianBlur with a stdDeviation of 10% * size / 2.
However, the filter has a size of 124% (it is actually that big; Inkscape doesn't just add a bit to be on the safe side).
Where does this number come from? My guess would be 100% + 2.4 * (2*stdDeviation/size), but then where does this 2.4 come from?
From the SVG 1.1 spec:
This filter primitive performs a Gaussian blur on the input image.
The Gaussian blur kernel is an approximation of the normalized convolution:
G(x,y) = H(x)I(y)
where
H(x) = exp(-x^2 / (2*s^2)) / sqrt(2*pi*s^2)
and
I(y) = exp(-y^2 / (2*t^2)) / sqrt(2*pi*t^2)
with 's' being the standard deviation in the x direction and 't' being the standard deviation in the y direction, as specified by ‘stdDeviation’.
The value of ‘stdDeviation’ can be either one or two numbers. If two numbers are provided, the first number represents a standard deviation value along the x-axis of the current coordinate system and the second value represents a standard deviation in Y. If one number is provided, then that value is used for both X and Y.
Even if only one value is provided for ‘stdDeviation’, this can be implemented as a separable convolution.
For larger values of 's' (s >= 2.0), an approximation can be used: Three successive box-blurs build a piece-wise quadratic convolution kernel, which approximates the Gaussian kernel to within roughly 3%.
let d = floor(s * 3*sqrt(2*pi)/4 + 0.5)
... if d is odd, use three box-blurs of size 'd', centered on the output pixel.
... if d is even, two box-blurs of size 'd' (the first one centered on the pixel boundary between the output pixel and the one to the left, the second one centered on the pixel boundary between the output pixel and the one to the right) and one box blur of size 'd+1' centered on the output pixel.
Note: the approximation formula also applies correspondingly to 't'.
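For reference, a small sketch that evaluates the spec's box-blur size d for a given stdDeviation, plus the kernel extent that follows from three successive box blurs of that size (the extent estimate is ordinary convolution-support arithmetic, not a quote from the spec):
#include <cmath>
#include <cstdio>

int main()
{
    const double kPi = 3.14159265358979323846;
    double s = 10.0;                                                      // stdDeviation in user units
    int d = (int)std::floor(s * 3.0 * std::sqrt(2.0 * kPi) / 4.0 + 0.5);  // box-blur size per the spec formula
    double extent = 3.0 * (d - 1) / 2.0;                                  // support added per side by three box blurs of size d
    std::printf("d = %d, approx. extent per side = %.1f px\n", d, extent);
    return 0;
}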
