As far as I understand, the scalar layout qualifier in Vulkan GLSL should allow a simple array of vec3 values to work, apparently without any physical device features having to be specified. I've tried with and without enabling the scalar block layout feature, like so:
VkPhysicalDeviceScalarBlockLayoutFeatures layoutfeatures = {};
layoutfeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SCALAR_BLOCK_LAYOUT_FEATURES;
layoutfeatures.scalarBlockLayout = VK_TRUE;
VkPhysicalDeviceFeatures2 features2 = {};
features2.features = m_required_features;
features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
features2.pNext = &layoutfeatures;
VkDeviceCreateInfo create_info = {};
create_info.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
create_info.queueCreateInfoCount = static_cast<uint32_t>(queue_create_infos.size());
create_info.pQueueCreateInfos = queue_create_infos.data();
create_info.enabledExtensionCount = static_cast<uint32_t>(enabled_extensions_names.size());
create_info.ppEnabledExtensionNames = enabled_extensions_names.data();
create_info.pEnabledFeatures = &features2.features;
create_info.pNext = &features2;
create_info.flags = flags;
VkDevice device;
VUL_EXCEPT_RESULT(vkCreateDevice(physical_device, &create_info, pAllocator, &device));
but to no avail. I'm getting strange spec invalidation errors.
Here's example code:
#version 450
#extension GL_EXT_scalar_block_layout: require
#define WORKGROUP_SIZE 128
layout (local_size_x = WORKGROUP_SIZE, local_size_y = 1, local_size_z = 1) in;
layout(scalar, binding = 3) buffer MyBufferBlock{
vec3 example_array[];
};
void main(){
example_array[gl_GlobalInvocationID.x] = vec3(0.0);
}
and I get the following SPIR-V validation error:
"example.exe"
ERROR : SPEC_INVALIDATION - Message ID Number 7060244, Message ID String UNASSIGNED-CoreValidation-Shader-InconsistentSpirv:
Validation Error: [ UNASSIGNED-CoreValidation-Shader-InconsistentSpirv ] Object 0: handle = 0x6543e40, type = VK_OBJECT_TYPE_DEVICE; | MessageID = 0x6bbb14 | SPIR-V module not valid: Structure id 500 decorated as BufferBlock for variable in Uniform storage class must follow relaxed storage buffer layout rules: member 0 contains an array with stride 12 not satisfying alignment to 16
%MyBufferBlock = OpTypeStruct %_runtimearr_v3float
First, I don't understand why it is calling out my buffer as a uniform. And obviously I'm still confused about how this is a spec violation at all.
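To make the "stride 12 not satisfying alignment to 16" part of the message concrete, here is a minimal C++ sketch of the two array strides involved. The struct names and element_offset are hypothetical stand-ins, not Vulkan or GLSL API: under the std430 and relaxed rules a vec3 is aligned to 16 bytes, so array elements sit 16 bytes apart, while the scalar layout aligns it to its 4-byte component and packs elements 12 bytes apart.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for one element of a GLSL `vec3 example_array[]`.
struct Vec3Scalar { float x, y, z; };              // scalar layout: 4-byte alignment
struct alignas(16) Vec3Std430 { float x, y, z; };  // std430 / relaxed rules: 16-byte alignment

static_assert(sizeof(Vec3Scalar) == 12, "scalar layout packs vec3 elements 12 bytes apart");
static_assert(sizeof(Vec3Std430) == 16, "std430 pads each vec3 element to 16 bytes");

// Byte offset of element `index` in an array whose elements are `stride` bytes apart.
std::size_t element_offset(std::size_t index, std::size_t stride) {
    assert(stride % sizeof(float) == 0);  // strides here are whole numbers of floats
    return index * stride;
}
```

For example, element_offset(2, sizeof(Vec3Scalar)) gives 24 while element_offset(2, sizeof(Vec3Std430)) gives 32, which is the mismatch the validator reports when it checks the shader against the non-scalar rules.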
For the past few days I've been trying to implement mouse picking in my Vulkan game engine, and part of that is reading data from framebuffers. I can already read RGBA data successfully, but what I need is to read a specific attachment containing the ID of the entity.
Here is what my fragment shader looks like:
#version 450 core
struct VertexInput
{
vec4 Color;
vec2 TexCoord;
float TilingFactor;
};
// Inputs
layout(location = 0) in VertexInput Input;
layout(location = 3) in flat uint in_TexIndex;
layout(location = 4) in flat int in_EntityID;
layout(binding = 1) uniform sampler2D u_Samplers[10];
// Outputs
layout(location = 0) out vec4 o_Color;
layout(location = 1) out int o_EntityID; // <-- This is what I need
void main()
{
o_EntityID = in_EntityID;
o_Color = Input.Color * texture(u_Samplers[in_TexIndex], Input.TexCoord * Input.TilingFactor);
}
And I get this warning:
Validation layer: Validation Warning: [ UNASSIGNED-CoreValidation-Shader-OutputNotConsumed ] Object 0: handle = 0x5dbcf90000000065, type = VK_OBJECT_TYPE_SHADER_MODULE; | MessageID = 0x609a13b | fragment shader writes to output location 1 with no matching attachment
which is understandable.
I know I need to add an attachment to my framebuffer object, but I am not sure which one or how I should approach this problem. Any ideas? Thanks :)
I want to check the BIC usage in statistics.
My small example, saved as check_bic.cpp, is as follows:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
using namespace arma;
// [[Rcpp::export]]
List check_bic(const int N = 10, const int p = 20, const double seed=0){
arma_rng::set_seed(seed); // for reproducibility
arma::mat Beta = randu(p,N); //randu/randn:random values(uniform and normal distributions)
arma::vec Bic = randu(N);
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
vec behat = Beta.col(id); // fetch the id column of matrix Beta
List ret;
ret["Bic"] = Bic;
ret["ii"] = ii;
ret["id"] = id;
ret["Beta"] = Beta;
ret["behat"] = behat;
return ret;
}
Then I compile check_bic.cpp in R by
library(Rcpp)
library(RcppArmadillo);
sourceCpp("check_bic.cpp")
and it compiles successfully.
However, when I run
check_bic(10,20,0)
in R, it shows these errors:
error: Mat::operator(): index out of bounds
Error in check_bic(10, 20, 0) : Mat::operator(): index out of bounds
I checked the .cpp code line by line, and I guess the problem is probably at
uvec ii = find(Bic == min(Bic)); // may be just one or several elements
int id = ii(ii.n_elem); // fetch the last one element
since if uvec ii has only one element, then ii.n_elem may be NaN or something else in Rcpp (while it's OK in Matlab), and I don't know how to deal with this case. Any help?
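One thing worth checking here: Armadillo uses zero-based indexing, so the valid positions of ii are 0 to ii.n_elem - 1, and ii(ii.n_elem) is one past the end even when ii has a single element. The last element would be ii(ii.n_elem - 1). The sketch below illustrates the same indexing rule with std::vector as a stand-in for uvec, so it builds without Armadillo; find_min_indices and last_min_index are hypothetical helper names, not Armadillo functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for find(Bic == min(Bic)): indices of every element equal to the minimum.
std::vector<std::size_t> find_min_indices(const std::vector<double>& values) {
    assert(!values.empty());
    const double lowest = *std::min_element(values.begin(), values.end());
    std::vector<std::size_t> indices;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] == lowest) indices.push_back(i);
    }
    return indices;
}

// Last matching index. Valid positions run from 0 to size() - 1,
// so the last element is at size() - 1, not size().
std::size_t last_min_index(const std::vector<double>& values) {
    const std::vector<std::size_t> indices = find_min_indices(values);
    return indices[indices.size() - 1];
}
```

With the values {3.0, 1.0, 2.0, 1.0} the minimum occurs at positions 1 and 3, and last_min_index returns 3. With a single value it returns 0.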
I'm trying to create a blank bitmap in a Xamarin.iOS project using the CGBitmapContext constructor(s). However, no matter what I try I just get the error "System.Exception: Invalid parameters to context creation"
Example code:
const int width = 100;
const int height = 100;
const int bytesPerPixel = 4;
const int bitsPerComponent = 8;
var intBytes = width * height * bytesPerPixel;
byte[] pixelData = new byte[intBytes];
var colourSpace = CGColorSpace.CreateDeviceRGB();
using (var objContext = new CGBitmapContext(pixelData, width, height, bitsPerComponent, width * bytesPerPixel, colourSpace, CGBitmapFlags.ByteOrderDefault))
{
// ...
}
I have tried changing most of the parameters; I tried fixing the data block and passing it as an IntPtr. I tried using null as the first parameter so that the system would allocate the data. And I've tried various flags in the last parameter. I always get that same error. What parameter is wrong? And what needs to be changed in the code above to make it execute?
I changed the last parameter from CGBitmapFlags.ByteOrderDefault to CGBitmapFlags.PremultipliedLast, and the error disappeared.
Refer to the CGBitmapFlags Enumeration documentation.
As the documentation says:
This enumeration specifies the layout information for the component data in a bitmap.
This enumeration determines the in-memory organization of the data and includes the color model, whether there is an alpha channel present and whether the component values have been premultiplied.
I think we have to select the flag that matches the data layout, especially the color model.
For the same reason, if you choose CreateDeviceCmyk, CGBitmapFlags.None will be appropriate.
Consider a simple depth-of-field filter (my actual use case is similar). It loops over the image and scatters every pixel over its circular neighborhood. The radius of the neighborhood depends on the depth of the pixel: the closer it is to the focal plane, the smaller the radius.
Note that I said "scatters" and not "gathers". In simpler image processing applications, you normally use the "gather" technique to perform a uniform Gaussian blur. IOW, you loop over the neighborhood of each pixel, and "gather" the nearby values into a weighted average. This works fine in that case, but if you make the blur kernel vary between pixels while still using "gathering", you'll get a somewhat unrealistic effect. Such "space-variant filtering" scenarios are where "scattering" is different from "gathering".
To be clear: the scatter algo is something like this:
init resultImage to black
loop over sourceImage
    var c = fetch current pixel from sourceImage
    var toAdd = c * weight // weight < 1
    loop over circular neighbourhood of current sourcepixel
        add toAdd to current neighbor from resultImage
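For reference, the pseudocode above could be written as the following single-threaded C++ sketch. It has no synchronization issues because one loop does all the writes. The Image type, the scatter function, and the precomputed per-pixel radius vector are assumptions made for illustration. In the real filter the radius would come from each pixel's depth.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel image stored row by row.
struct Image {
    int width;
    int height;
    std::vector<float> pixels;

    Image(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0.0f) {}

    float& at(int x, int y) {
        return pixels[static_cast<std::size_t>(y) * static_cast<std::size_t>(width)
                      + static_cast<std::size_t>(x)];
    }
};

// Scatter: every source pixel adds a weighted copy of itself to each result
// pixel inside its own circular neighbourhood. radius[i] is assumed to be
// precomputed for source pixel i.
Image scatter(const Image& source, const std::vector<int>& radius, float weight) {
    assert(radius.size() == source.pixels.size());
    Image result(source.width, source.height);  // starts out black
    for (int y = 0; y < source.height; ++y) {
        for (int x = 0; x < source.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y)
                                  * static_cast<std::size_t>(source.width)
                                  + static_cast<std::size_t>(x);
            const float to_add = source.pixels[i] * weight;
            const int r = radius[i];
            for (int ny = y - r; ny <= y + r; ++ny) {
                for (int nx = x - r; nx <= x + r; ++nx) {
                    if (nx < 0 || ny < 0 || nx >= source.width || ny >= source.height) continue;
                    const int dx = nx - x;
                    const int dy = ny - y;
                    if (dx * dx + dy * dy > r * r) continue;  // outside the circle
                    result.at(nx, ny) += to_add;
                }
            }
        }
    }
    return result;
}
```

Note that the inner `+=` is a read-modify-write on a result pixel that several source pixels may target, which is exactly the operation that becomes contested once the outer loops run in parallel.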
My question is: if I do a direct translation of this pseudocode to OpenCL, will there be synchronization issues due to different work-items simultaneously writing to the same output pixel?
Does the answer vary depending on whether I'm using Buffers or Images?
The course I'm reading suggests that there will be synchronization issues. But OTOH I read the source of Mandelbulber 1.21-2, which does a straightforward OpenCL DOF just like my above pseudocode, and it seems to work fine.
(the relevant code is in mandelbulber-opencl-1.21-2.orig/usr/share/cl/cl_DOF.cl and it's as follows)
//*********************************************************
// MANDELBULBER
// kernel for DOF effect
//
//
// author: Krzysztof Marczak
// contact: buddhi1980#gmail.com
// licence: GNU GPL v3.0
//
//*********************************************************
typedef struct
{
int width;
int height;
float focus;
float radius;
} sParamsDOF;
typedef struct
{
float z;
int i;
} sSortZ;
//------------------ MAIN RENDER FUNCTION --------------------
kernel void DOF(__global ushort4 *in_image, __global ushort4 *out_image, __global sSortZ *zBuffer, sParamsDOF p)
{
const unsigned int i = get_global_id(0);
uint index = p.height * p.width - i - 1;
int ii = zBuffer[index].i;
int2 scr = (int2){ii % p.width, ii / p.width};
float z = zBuffer[index].z;
float blur = fabs(z - p.focus) / z * p.radius;
blur = min(blur, 500.0f);
float4 center = convert_float4(in_image[scr.x + scr.y * p.width]);
float factor = blur * blur * sqrt(blur)* M_PI_F/3.0f;
int blurInt = (int)blur;
int2 scr2;
int2 start = (int2){scr.x - blurInt, scr.y - blurInt};
start = max(start, 0);
int2 end = (int2){scr.x + blurInt, scr.y + blurInt};
end = min(end, (int2){p.width - 1, p.height - 1});
for (scr2.y = start.y; scr2.y <= end.y; scr2.y++)
{
for(scr2.x = start.x; scr2.x <= end.x; scr2.x++)
{
float2 d = scr - scr2;
float r = length(d);
float op = (blur - r) / factor;
op = clamp(op, 0.0f, 1.0f);
float opN = 1.0f - op;
uint address = scr2.x + scr2.y * p.width;
float4 old = convert_float4(out_image[address]);
out_image[address] = convert_ushort4(opN * old + op * center);
}
}
}
No, you can't do this without worrying about synchronization. If two work items scatter to the same location without synchronization, you have a race condition and won't get the correct results. The same is true for both buffers and images. With buffers you could use atomics, but they can slow down your code, especially when there is contention (but even when there is none). AFAIK, read/write images don't have atomic operations.
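To see why the unsynchronized version goes wrong, here is a small single-threaded C++ simulation of one possible interleaving of two work items updating the same output location. It is only an illustration: the function names are hypothetical, and a real device may interleave the reads and writes differently from run to run.

```cpp
#include <cassert>

// Two work items each do "old = out; out = old + contribution" on the same
// location. In this interleaving both read before either writes, so the
// first contribution is lost.
int interleaved_update(int initial, int add_a, int add_b) {
    assert(add_a != 0 && add_b != 0);  // both work items contribute something
    int out = initial;
    const int old_a = out;  // work item A reads
    const int old_b = out;  // work item B reads before A has written
    out = old_a + add_a;    // A writes its result
    out = old_b + add_b;    // B overwrites A's result
    return out;
}

// What an atomic add (or any other serialization) guarantees instead:
// each update sees the result of the previous one.
int serialized_update(int initial, int add_a, int add_b) {
    int out = initial;
    out += add_a;
    out += add_b;
    return out;
}
```

Starting from 10 and adding 1 and 2, the interleaved version ends at 12 while the serialized version ends at 13. A kernel that appears to work may simply be hitting collisions rarely enough that the lost updates are not visible in the image.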
I have a problem while reading a couple of positions in a double array from different threads.
I enqueue the execution with:
nelements = nx*ny;
err = clEnqueueNDRangeKernel(queue,kernelTvl2of,1,NULL,&nelements,NULL,0,NULL,NULL);
kernelTvl2of contains (among other things) this code:
size_t k = get_global_id(0);
(...)
u1_[k] = (float)u1[k];
(...)
barrier(CLK_GLOBAL_MEM_FENCE);
forwardgradient(u1_,u1x,u1y,k,nx,ny);
barrier(CLK_GLOBAL_MEM_FENCE);
and forwardgradient has the code:
void forwardgradient(global double *f, global double *fx, global double *fy, int ker,int nx, int ny){
unsigned int rowsnotlast = ((nx)*(ny-1));
if(ker<rowsnotlast){
fx[ker] = f[ker+1] - f[ker];
fy[ker] = f[ker+nx] - f[ker];
}
if(ker<nx*ny){
fx[ker] = f[ker+1] - f[ker];
if(ker==4607){
fx[0] = f[4607];
fx[1] = f[4608];
fx[2] = f[4608] - f[4607];
fx[3] = f[ker];
fx[4] = f[ker+1];
fx[5] = f[ker+1] - f[ker];
}
}
if(ker==(nx*ny)-1){
fx[ker] = 0;
fy[ker] = 0;
}
if(ker%nx == nx-1){
fx[ker]=0;
}
fx[6] = f[4608];
}
When I read back the first positions of fx, they are:
-6 0 6 -6 0 6 -6
And here's my problem: when I query fx[ker+1] or fx[4608] from the thread with id 4607, I get a '0' (second and fifth positions of the output array), but from other threads I get a '-6' (last position of the output array).
Does anyone have a clue about what I'm doing wrong, or where I could look?
Thanks a lot,
Anton
Within a kernel, global memory consistency is only achievable within a single work-group. This means that if a work-item writes a value to global memory, a barrier(CLK_GLOBAL_MEM_FENCE) only guarantees that other work-items within the same work-group will be able to read the updated value.
If you need global memory consistency across multiple work-groups, you need to split your kernel into multiple kernels.
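As a rough illustration of that split, here is a CPU-side C++ sketch in which each function stands in for one kernel. The names copy_pass and gradient_pass and the Gradient struct are hypothetical, and the boundary handling (zero on the last column and last row) is an assumption based on the question's code. The point is only the ordering: the second pass starts after the first has written every element, which is what two kernels enqueued one after the other on an in-order command queue would give you.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Gradient {
    std::vector<double> fx;
    std::vector<double> fy;
};

// Pass 1 (stand-in for a first kernel): write every element of f.
std::vector<double> copy_pass(const std::vector<double>& u1) {
    return std::vector<double>(u1.begin(), u1.end());
}

// Pass 2 (stand-in for a second kernel): forward differences that read
// neighbouring elements of f, all of which pass 1 has already written.
Gradient gradient_pass(const std::vector<double>& f, std::size_t nx, std::size_t ny) {
    assert(nx > 0 && ny > 0 && f.size() == nx * ny);
    Gradient g{std::vector<double>(f.size(), 0.0), std::vector<double>(f.size(), 0.0)};
    for (std::size_t k = 0; k < f.size(); ++k) {
        const bool last_column = (k % nx == nx - 1);
        const bool last_row = (k >= nx * (ny - 1));
        if (!last_column) g.fx[k] = f[k + 1] - f[k];   // difference along x
        if (!last_row)    g.fy[k] = f[k + nx] - f[k];  // difference along y
    }
    return g;
}
```

For a 3 x 2 grid holding 0, 6, 12 in the first row and 1, 7, 13 in the second, the x differences are 6, 6, 0 in each row and the y differences are 1, 1, 1 in the first row and 0 in the last.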