I have been trying to better match my SSAO to Crytek's original implementation. Their "Finding Next Gen - CryEngine 2" document states:
To reduce the sample count we vary the sample position for nearby
pixels. The initial sample positions are distributed around the origin
in a sphere and the variation is achieved by reflecting the sample
positions on a random 3D plane through the origin
I have implemented this as follows: Full Shader
vec3 plane = texture(texNoise, uvCoords * noiseScale).xyz - vec3(1.0);
for(int i = 0; i < ssao.sample_amount; ++i) {
    vec3 samplePos = reflect(TBN * ssao.samples[i].xyz, plane);
    samplePos = viewSpacePositions + samplePos * radius;
    // ...
}
My result appears far from the original Crytek SSAO implementation: the overall look is odd and occlusion seems to appear only on the right side.
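For reference, GLSL's reflect(I, N) computes I - 2.0 * dot(N, I) * N and expects N to be unit length. Below is a minimal sketch of how I understand the decode of the random plane normal, assuming the noise texture stores its vectors remapped into [0, 1]; this is an assumption on my part, not the Crytek code:

// Sketch only (assumes the noise texture stores vectors remapped into [0, 1]):
// decode back to [-1, 1] and normalize, since reflect() expects a unit-length normal.
vec3 plane = normalize(texture(texNoise, uvCoords * noiseScale).xyz * 2.0 - 1.0);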
I've been trying to learn screen-space techniques, specifically ray-marching ones, but I have been struggling to get a single working example to continue learning from and solidify my knowledge. I'm implementing screen-space shadows following this article, but my result is just a white image and I cannot understand why. The code makes sense to me, yet the result does not seem right. I can't work out where I might have gone wrong while attempting this screen-space ray-marching technique and would appreciate any insight that will help me continue learning.
Using Vulkan + GLSL
Full shader: screen_space_shadows.glsl
// calculate screen space shadows
float computeScreenSpaceShadow()
{
    vec3 FragPos = texture(gPosition, uvCoords).rgb;
    vec4 ViewSpaceLightPosition = camera.view * light.LightPosition;
    vec3 LightDirection = ViewSpaceLightPosition.xyz - FragPos.xyz;

    // Ray position and direction in view-space.
    vec3 RayPos = texture(gPosition, uvCoords).xyz; // ray start position
    vec3 RayDirection = normalize(-LightDirection.xyz);

    // Save original depth of the position
    float DepthOriginal = RayPos.z;

    // Ray step
    vec3 RayStep = RayDirection * STEP_LENGTH;

    float occlusion = 0.0;
    for(uint i = 0; i < MAX_STEPS; i++)
    {
        RayPos += RayStep;
        vec2 Ray_UV = ViewToScreen(RayPos);

        // Make sure the UV is inside screen-space
        if(!ValidRay(Ray_UV)) {
            return 1.0;
        }

        // Compute difference between ray and cameras depth
        float DepthZ = linearize_depth(texture(depthMap, Ray_UV).x);
        float DepthDelta = RayPos.z - DepthZ;

        // Check if camera cannot see the ray. Ray depth must be larger than camera depth = positive delta
        bool canCameraSeeRay = (DepthDelta > 0.0) && (DepthDelta < THICKNESS);
        bool occludedByOriginalPixel = abs(RayPos.z - DepthOriginal) < MAX_DELTA_FROM_ORIGINAL_DEPTH;
        if(canCameraSeeRay && occludedByOriginalPixel)
        {
            // Mark as occluded
            occlusion = 1.0;
            break;
        }
    }
    return 1.0 - occlusion;
}
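For context, the helpers referenced above (ViewToScreen, ValidRay, linearize_depth) live in the linked full shader; here is a minimal sketch of what I take them to do, for readers of just this snippet. The zNear/zFar uniforms are placeholders of mine for the camera's clip planes:

// Sketches of the helpers used above; assumptions, not the exact full-shader code.
vec2 ViewToScreen(vec3 viewPos)
{
    vec4 clipPos = camera.proj * vec4(viewPos, 1.0); // view space -> clip space
    clipPos.xy /= clipPos.w;                         // perspective divide -> NDC
    return clipPos.xy * 0.5 + 0.5;                   // NDC [-1, 1] -> UV [0, 1]
}

bool ValidRay(vec2 uv)
{
    return uv.x >= 0.0 && uv.x <= 1.0 && uv.y >= 0.0 && uv.y <= 1.0;
}

float linearize_depth(float d)
{
    // zNear/zFar are assumed camera clip-plane uniforms
    return zNear * zFar / (zFar + d * (zNear - zFar));
}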
Output
I am writing a spatial shader in Godot to pixelate an object.
Previously, I tried to draw outside of an object's bounds; however, that is only possible in CanvasItem shaders, and now I am going back to 3D shaders due to rendering annoyances (I am unable to selectively hide items without using the culling mask, which, being limited to 20 layers, is not an extensible solution).
My naive approach:
Define a pixel "cell" resolution (i.e. 3x3 real pixels)
For each fragment:
If the entire "cell" of real pixels is within the model's draw bounds, color the current pixel the same as the cell's lower-left pixel (the pixel whose coordinates are a multiple of the cell resolution).
If any pixel of the current "cell" is out of the draw bounds, set alpha to 1 to erase the entire cell.
Pseudo-code for people asking for code of the likely non-existent functionality that I am seeking:
int cell_size = 3;

fragment {
    // check within a cell to see if all pixels are part of the object being drawn to
    int erase_pixel = 0;
    for (int y = 0; y < cell_size; y++) {
        for (int x = 0; x < cell_size; x++) {
            vec2 cell_pixel = vec2(FRAGCOORD.x - (FRAGCOORD.x % cell_size) + x,
                                   FRAGCOORD.y - (FRAGCOORD.y % cell_size) + y);
            if (uv_in_model(cell_pixel) == false) {
                erase_pixel = 1;
            }
        }
    }
    albedo.a = erase_pixel;
}
tl;dr, is it possible to know if any given point will be called by the fragment function?
On your object's material there should be a property called Next Pass. Add a new SpatialMaterial in this section, open up its Flags and check Transparent and Unshaded, then right-click it to bring up the option to convert it to a ShaderMaterial.
Now open up the new ShaderMaterial's shader. The conversion should have created a shader with a fragment() function containing the line vec4 albedo_tex = texture(texture_albedo, base_uv);
In this line, you can replace "texture_albedo" with "SCREEN_TEXTURE" and "base_uv" with "SCREEN_UV". This should make the new shader look like nothing has changed, because the next pass material is just sampling the screen from the last pass.
Above that, make a variable called something along the lines of "pixelated" and set it to the following expression:
vec2 pixelated = floor(SCREEN_UV * scale) / scale; where scale is a float or vec2 containing the pixel size. Finally replace SCREEN_UV in the albedo_tex definition with pixelated.
After this, you can have a float depth which samples DEPTH_TEXTURE with pixelated like this:
float depth = texture(DEPTH_TEXTURE, pixelated).r;
This depth value will be very large for pixels that are just trying to render the background onto your object. So, add a conditional statement:
if (depth > 100000.0f) { ALPHA = 0.0f; }
As long as the flags on this new next pass shader were set correctly (transparent and unshaded) you should have a quick-and-dirty pixelator. I say this because it has some minor artifacts around the edges, but you can make scale a uniform variable and set it from the editor and scripts, so I think it works nicely.
"Testing if a pixel is modifiable" in your case means testing if the object should be rendering it at all with that depth conditional.
Here's the full shader with my modifications from the comments:
// NOTE: Shader automatically converted from Godot Engine 3.4.stable's SpatialMaterial.
shader_type spatial;
render_mode blend_mix,depth_draw_opaque,cull_back,unshaded;

//the size of pixelated blocks on the screen relative to pixels
uniform int scale;

void vertex() {
}

//vec2 representation of one used for calculation
const vec2 one = vec2(1.0f, 1.0f);

void fragment() {
    //scale SCREEN_UV up to the size of the viewport over the pixelation scale
    //assure scale is a multiple of 2 to avoid artefacts
    vec2 pixel_scale = VIEWPORT_SIZE / float(scale * 2);
    vec2 pixelated = SCREEN_UV * pixel_scale;
    //truncate the decimal place from the pixelated uvs and then shift them over by half a pixel
    pixelated = pixelated - mod(pixelated, one) + one / 2.0f;
    //scale the pixelated uvs back down to the screen
    pixelated /= pixel_scale;
    vec4 albedo_tex = texture(SCREEN_TEXTURE, pixelated);
    ALBEDO = albedo_tex.rgb;
    ALPHA = 1.0f;
    float depth = texture(DEPTH_TEXTURE, pixelated).r;
    if (depth > 10000.0f)
    {
        ALPHA = 0.0f;
    }
}
I have been trying to implement SSAO following the LearnOpenGL implementation. Their implementation uses a positions g-buffer to obtain the sample position's depth value, and I am wondering how I could use the depth buffer instead, since I already have one ready to use, in order to retrieve the depth value without the positions g-buffer. Below I show the LearnOpenGL approach using the positions texture and my attempt at using the depth buffer. I think I might be missing a step required for using the depth buffer, but I am unsure.
LearnOpenGL SSAO
Using Positions g-buffer
layout(binding = 7) uniform sampler2D positionsTexture;
layout(binding = 6) uniform sampler2D depthMap;
// ...
vec4 offset = vec4(samplePos, 1.0);
offset = camera.proj * offset; //transform sample to clip space
offset.xyz /= offset.w; // perspective divide
offset.xyz = offset.xyz * 0.5 + 0.5; // transform to range 0-1
float sampleDepth = texture(positionsTexture, offset.xy).z;
I want to use the depth buffer instead. This approach did not seem to work for me
float sampleDepth = texture(depthMap, offset.xy).x;
Update: 8/01
I have implemented a linearization function for the depth result but am still unable to obtain the right result. Am I missing something more?
float linearize_depth(float d, float zNear, float zFar)
{
    return zNear * zFar / (zFar + d * (zNear - zFar));
}
float sampleDepth = linearize_depth(texture(depthMap, offset.xy).z, zNear, zFar);
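In other words, what I am asking about would presumably look something like the sketch below: reconstructing the sample's view-space position from the depth buffer with an inverse projection so it can be compared against view-space depths directly. camera.invProj is an assumed inverse-projection uniform, not something from the LearnOpenGL code:

// Sketch only: rebuild a view-space position for offset.xy from the depth buffer.
float storedDepth = texture(depthMap, offset.xy).x;   // hardware depth in [0, 1]
// Clip-space position of that texel; this assumes a [0, 1] depth range (Vulkan-style).
// For OpenGL's default [-1, 1] depth range it would be storedDepth * 2.0 - 1.0.
vec4 clipPos = vec4(offset.xy * 2.0 - 1.0, storedDepth, 1.0);
vec4 viewPos = camera.invProj * clipPos;              // camera.invProj: assumed uniform
viewPos.xyz /= viewPos.w;                             // perspective divide
float sampleDepth = viewPos.z;                        // comparable to samplePos.z in view space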
Consider a simple depth-of-field filter (my actual use case is similar). It loops over the image and scatters every pixel over a circular neighborhood around it. The radius of the neighborhood depends on the depth of the pixel: the closer it is to the focal plane, the smaller the radius.
Note that I said "scatters" and not "gathers". In simpler image-processing applications, you normally use the "gather" technique to perform a uniform Gaussian blur. In other words, you loop over the neighborhood of each pixel and "gather" the nearby values into a weighted average. This works fine in that case, but if you make the blur kernel vary between pixels while still using "gathering", you'll get a somewhat unrealistic effect. Such "space-variant filtering" scenarios are where "scattering" differs from "gathering".
To be clear: the scatter algo is something like this:
init resultImage to black
loop over sourceImage
    var c = fetch current pixel from sourceImage
    var toAdd = c * weight // weight < 1
    loop over circular neighbourhood of current sourcepixel
        add toAdd to current neighbor from resultImage
My question is: if I do a direct translation of this pseudocode to OpenCL, will there be synchronization issues due to different work-items simultaneously writing to the same output pixel?
Does the answer vary depending on whether I'm using Buffers or Images?
The course I'm reading suggests that there will be synchronization issues. But OTOH I read the source of Mandelbulber 1.21-2, which does a straightforward OpenCL DOF just like my above pseudocode, and it seems to work fine.
(the relevant code is in mandelbulber-opencl-1.21-2.orig/usr/share/cl/cl_DOF.cl and it's as follows)
//*********************************************************
// MANDELBULBER
// kernel for DOF effect
//
//
// author: Krzysztof Marczak
// contact: buddhi1980#gmail.com
// licence: GNU GPL v3.0
//
//*********************************************************
typedef struct
{
    int width;
    int height;
    float focus;
    float radius;
} sParamsDOF;

typedef struct
{
    float z;
    int i;
} sSortZ;

//------------------ MAIN RENDER FUNCTION --------------------
kernel void DOF(__global ushort4 *in_image, __global ushort4 *out_image, __global sSortZ *zBuffer, sParamsDOF p)
{
    const unsigned int i = get_global_id(0);
    uint index = p.height * p.width - i - 1;
    int ii = zBuffer[index].i;
    int2 scr = (int2){ii % p.width, ii / p.width};
    float z = zBuffer[index].z;
    float blur = fabs(z - p.focus) / z * p.radius;
    blur = min(blur, 500.0f);
    float4 center = convert_float4(in_image[scr.x + scr.y * p.width]);
    float factor = blur * blur * sqrt(blur) * M_PI_F / 3.0f;
    int blurInt = (int)blur;
    int2 scr2;
    int2 start = (int2){scr.x - blurInt, scr.y - blurInt};
    start = max(start, 0);
    int2 end = (int2){scr.x + blurInt, scr.y + blurInt};
    end = min(end, (int2){p.width - 1, p.height - 1});
    for (scr2.y = start.y; scr2.y <= end.y; scr2.y++)
    {
        for (scr2.x = start.x; scr2.x <= end.x; scr2.x++)
        {
            float2 d = scr - scr2;
            float r = length(d);
            float op = (blur - r) / factor;
            op = clamp(op, 0.0f, 1.0f);
            float opN = 1.0f - op;
            uint address = scr2.x + scr2.y * p.width;
            float4 old = convert_float4(out_image[address]);
            out_image[address] = convert_ushort4(opN * old + op * center);
        }
    }
}
No, you can't do this without worrying about synchronization. If two work-items scatter to the same location without synchronization, you have a race condition and won't get correct results. The same applies to both buffers and images. With buffers you could use atomics, but they can slow down your code, especially when there is contention (and even when there isn't). AFAIK, read/write images don't have atomic operations.
I have a heightmap. I want to efficiently compute which tiles in it are visible from an eye at any given location and height.
This paper suggests that heightmaps outperform turning the terrain into some kind of mesh, but they sample the grid using Bresenham's line algorithm.
If I were to adopt that, I'd have to trace a line-of-sight Bresenham's line to each and every tile on the map. It occurs to me that it ought to be possible to reuse most of the calculations and compute the visibility map in a single pass if you fill outwards away from the eye - a scanline-fill kind of approach, perhaps?
But the logic escapes me. What would the logic be?
Here is a heightmap with the visibility from a particular vantage point (green cube) painted over it ("viewshed" as in "watershed"?):
Here is the O(n) sweep that I came up with. It seems to be the same as the one given in the paper linked in the answer below (How to compute the visible area based on a heightmap? - Franklin and Ray's method), except that in my case I walk from the eye outwards instead of walking the perimeter and doing Bresenham's lines towards the centre. To my mind, my approach should have much better caching behaviour, i.e. be faster, and use less memory, since it doesn't have to track a vector for each tile and only has to remember a scanline's worth:
typedef std::vector<float> visbuf_t;
inline void map::_visibility_scan(const visbuf_t& in, visbuf_t& out, const vec_t& eye, int start_x, int stop_x, int y, int prev_y) {
    const int xdir = (start_x < stop_x)? 1: -1;
    for(int x=start_x; x!=stop_x; x+=xdir) {
        const int x_diff = abs(eye.x-x), y_diff = abs(eye.z-y);
        const bool horiz = (x_diff >= y_diff);
        const int x_step = horiz? 1: x_diff/y_diff;
        const int in_x = x-x_step*xdir; // where in the in buffer would we get the inner value?
        const float outer_d = vec2_t(x,y).distance(vec2_t(eye.x,eye.z));
        const float inner_d = vec2_t(in_x,horiz? y: prev_y).distance(vec2_t(eye.x,eye.z));
        const float inner = (horiz? out: in).at(in_x)*(outer_d/inner_d); // get the inner value, scaling by distance
        const float outer = height_at(x,y)-eye.y; // height we are at right now in the map, eye-relative
        if(inner <= outer) {
            out.at(x) = outer;
            vis.at(y*width+x) = VISIBLE;
        } else {
            out.at(x) = inner;
            vis.at(y*width+x) = NOT_VISIBLE;
        }
    }
}

void map::visibility_add(const vec_t& eye) {
    const float BASE = -10000; // represents a downward vector that would always be visible
    visbuf_t scan_0, scan_out, scan_in;
    scan_0.resize(width);
    vis[eye.z*width+eye.x-1] = vis[eye.z*width+eye.x] = vis[eye.z*width+eye.x+1] = VISIBLE;
    scan_0.at(eye.x) = BASE;
    scan_0.at(eye.x-1) = BASE;
    scan_0.at(eye.x+1) = BASE;
    _visibility_scan(scan_0,scan_0,eye,eye.x+2,width,eye.z,eye.z);
    _visibility_scan(scan_0,scan_0,eye,eye.x-2,-1,eye.z,eye.z);
    scan_out = scan_0;
    for(int y=eye.z+1; y<height; y++) {
        scan_in = scan_out;
        _visibility_scan(scan_in,scan_out,eye,eye.x,-1,y,y-1);
        _visibility_scan(scan_in,scan_out,eye,eye.x,width,y,y-1);
    }
    scan_out = scan_0;
    for(int y=eye.z-1; y>=0; y--) {
        scan_in = scan_out;
        _visibility_scan(scan_in,scan_out,eye,eye.x,-1,y,y+1);
        _visibility_scan(scan_in,scan_out,eye,eye.x,width,y,y+1);
    }
}
Is it a valid approach?
It is using centre-points rather than looking at the slope between the 'inner' pixel and its neighbour on the side that the line of sight passes.
Could the trig used to scale the vectors and such be replaced by factor multiplication?
It could use an array of bytes, since the heights are themselves bytes.
It's not a radial sweep; it does a whole scanline at a time, moving away from the point, and it only uses a couple of scanlines' worth of additional memory, which is neat.
If it works, you could imagine distributing it nicely using a radial sweep of blocks: you have to compute the centre-most block first, but then you can distribute all the immediately adjacent blocks from it (they just need to be given the edge-most intermediate values), and then in turn get more and more parallelism.
So how to most efficiently calculate this viewshed?
What you want is called a sweep algorithm. Basically you cast rays (Bresenham's) to each of the perimeter cells, but keep track of the horizon as you go and mark any cells you pass on the way as being visible or invisible (and update the ray's horizon if visible). This gets you down from the O(n^3) of the naive approach (testing each cell of an nxn DEM individually) to O(n^2).
A more detailed description of the algorithm is given in section 5.1 of this paper (which you might also find interesting for other reasons if you aspire to work with really enormous heightmaps).