VC++ 64 Bit SetDIBits does not work

I recompiled existing VC++ code in VS2013 (.NET 4) with the target machine set to 64-bit. It compiles fine, but for some reason SetDIBits is not working: the images come up black. These are monochrome images. The exact same code works fine compiled as 32-bit in VS2008 (.NET 2.0). Any ideas would be helpful.
Sample code is below (this function returns an IntPtr handle to the DC, which is then used for additional drawing):
IntPtr GetGraphics(Bitmap ^src, BitmapData ^pData, IntPtr ^phBitmapOldIntPtr) {
    HDC hdc;
    IntPtr scan0;
    BitmapData ^data;
    System::Drawing::Rectangle rectSrc = System::Drawing::Rectangle(0, 0, src->Width, src->Height);
    BYTE *pBits;
    IntPtr pixels, hBitmapIntPtr, hdcIntPtr;
    BITMAPINFO *pbmi;
    int iScanLines;
    int iCopied;
    HBITMAP hBitmapSrc, hBitmapOld;
    hdc = CreateCompatibleDC(NULL);
    hBitmapSrc = CreateCompatibleBitmap(hdc, src->Width, src->Height);
    data = src->LockBits(rectSrc, ImageLockMode::ReadWrite, src->PixelFormat);
    pixels = data->Scan0;
    pBits = (BYTE*)pixels.ToPointer();
    //ZeroMemory(&pbmi, sizeof(BITMAPINFO));
    pbmi = (BITMAPINFO*)GlobalAlloc(GMEM_FIXED, sizeof(BITMAPINFO) + 2 * sizeof(RGBQUAD));
    pbmi->bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    pbmi->bmiHeader.biBitCount = 1;
    pbmi->bmiHeader.biHeight = -src->Height;
    pbmi->bmiHeader.biWidth = src->Width;
    pbmi->bmiHeader.biPlanes = 1;
    pbmi->bmiHeader.biCompression = BI_RGB;
    pbmi->bmiHeader.biSizeImage = 0;
    pbmi->bmiHeader.biXPelsPerMeter = 300;
    pbmi->bmiHeader.biYPelsPerMeter = 300;
    pbmi->bmiHeader.biClrUsed = 0;
    pbmi->bmiHeader.biClrImportant = 0;
    pbmi->bmiColors[0].rgbBlue = 0;
    pbmi->bmiColors[0].rgbGreen = 0;
    pbmi->bmiColors[0].rgbRed = 0;
    pbmi->bmiColors[1].rgbBlue = 255;
    pbmi->bmiColors[1].rgbGreen = 255;
    pbmi->bmiColors[1].rgbRed = 255;
    iScanLines = src->Height;
    iCopied = SetDIBits(hdc, hBitmapSrc, 0, iScanLines, pBits, pbmi, DIB_RGB_COLORS);
    GlobalFree(pbmi);
    if (pData) pData = data;
    hBitmapOld = (HBITMAP)SelectObject(hdc, hBitmapSrc);
    phBitmapOldIntPtr = IntPtr(hBitmapOld);
    hdcIntPtr = IntPtr((void*)hdc);
    return hdcIntPtr;
}
Thanks in advance!

The whole function can be replaced with:
src->Clone() to copy the bitmap to a compatible bitmap
Graphics::FromImage() to make a Graphics compatible with the clone
GetHdc() on the graphics, to get an HDC for your existing drawing functions
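A minimal sketch of that replacement in C++/CLI (assuming the caller keeps the clone and the Graphics alive while the HDC is in use; the names here are illustrative, not from the original code):
// Sketch only; fields keep the objects alive until drawing is finished.
Bitmap ^m_copy;
Graphics ^m_graphics;

IntPtr GetGraphicsHdc(Bitmap ^src)
{
    m_copy = safe_cast<Bitmap^>(src->Clone()); // compatible copy of the source bitmap
    m_graphics = Graphics::FromImage(m_copy);  // Graphics bound to the clone
    return m_graphics->GetHdc();               // HDC for the existing GDI drawing code
}
// When drawing is done: call m_graphics->ReleaseHdc(), then dispose of both objects.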

Related

E_INVALIDARG while IMFTransform::ProcessOutput

I am struggling to make a cartoon shader in Media Foundation, and to do so I need to convert the NV12 provided natively by my camera to RGB24. So far my attempts with IMFTransform look like this:
Setup:
inputVideoTypes = new MFT_REGISTER_TYPE_INFO;
inputVideoTypes->guidMajorType = MFMediaType_Video;
inputVideoTypes->guidSubtype = MFVideoFormat_NV12;
outputVideoTypes = new MFT_REGISTER_TYPE_INFO;
outputVideoTypes->guidMajorType = MFMediaType_Video;
outputVideoTypes->guidSubtype = MFVideoFormat_RGB24;
IMFActivate **transformActivateArray = NULL;
UINT32 MFTcount;
hr = MFTEnumEx(MFT_CATEGORY_VIDEO_PROCESSOR, MFT_ENUM_FLAG_ALL, inputVideoTypes, outputVideoTypes, &transformActivateArray, &MFTcount);
hr = VP->GetAttributes(&VPAttributes);
hr = VPAttributes->SetUINT32(MF_TOPOLOGY_ENABLE_XVP_FOR_PLAYBACK, TRUE);
hr = VP->SetInputType(0, streamType2, 0);
MediaFoundationSamples::LogMediaType(streamType2);
DWORD dwIndex = 4;
hr = VP->GetOutputAvailableType(0, dwIndex, &streamType3);
hr = MFSetAttributeSize(streamType3, MF_MT_FRAME_SIZE, 1280, 720);
hr = streamType3->SetUINT32(MF_MT_FIXED_SIZE_SAMPLES, 1);
hr = MFSetAttributeRatio(streamType3, MF_MT_FRAME_RATE, 30, 1);
hr = MFSetAttributeRatio(streamType3, MF_MT_PIXEL_ASPECT_RATIO, 1, 1);
streamType3->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, 1);
streamType3->SetUINT32(MF_MT_INTERLACE_MODE, 2);
MediaFoundationSamples::LogMediaType(streamType3);
hr = VP->SetOutputType(0, streamType3, 0);
hr = VP->GetInputStreamInfo(0, &InputInfo);
hr = VP->GetOutputStreamInfo(0, &OutputInfo);
In OnReadSample:
hr = VP->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL);
hr = VP->ProcessInput(0, sample, 0);
DWORD statusFlags;
hr = VP->GetOutputStatus(&statusFlags);
while (statusFlags == 0)
{
    hr = VP->ProcessInput(0, sample, 0);
    hr = VP->GetOutputStatus(&statusFlags);
}
DWORD outputStatus = 0;
IMFSample* outputSample;
MFCreateSample(&outputSample);
MFT_OUTPUT_DATA_BUFFER outputBuffer = {};
outputBuffer.pSample = outputSample;
hr = VP->ProcessOutput(0, 1, &outputBuffer, &outputStatus);
But the problem is that ProcessOutput returns hr = E_INVALIDARG, and I have no idea why. What seems weird is OutputInfo and InputInfo: both dwFlags are 0, but their cbSize values look normal.
Logs of MediaTypes:
input (streamType2):
MF_MT_FRAME_SIZE 1280 x 720
MF_MT_YUV_MATRIX 2
MF_MT_MAJOR_TYPE MFMediaType_Video
MF_MT_VIDEO_LIGHTING 3
MF_MT_VIDEO_CHROMA_SITING 1
MF_MT_AM_FORMAT_TYPE {F72A76A0-EB0A-11D0-ACE4-0000C0CC16BA}
MF_MT_FIXED_SIZE_SAMPLES 1
MF_MT_VIDEO_NOMINAL_RANGE 1
MF_MT_FRAME_RATE 30 x 1
MF_MT_PIXEL_ASPECT_RATIO 1 x 1
MF_MT_ALL_SAMPLES_INDEPENDENT 1
MF_MT_FRAME_RATE_RANGE_MIN 128849018881
MF_MT_VIDEO_PRIMARIES 2
MF_MT_INTERLACE_MODE 2
MF_MT_FRAME_RATE_RANGE_MAX 128849018881
{EA031A62-8BBB-43C5-B5C4-572D2D231C18} 1
MF_MT_SUBTYPE MFVideoFormat_NV12
output (streamType3):
MF_MT_FRAME_SIZE 1280 x 720
MF_MT_MAJOR_TYPE MFMediaType_Video
MF_MT_FIXED_SIZE_SAMPLES 1
MF_MT_FRAME_RATE 30 x 1
MF_MT_PIXEL_ASPECT_RATIO 1 x 1
MF_MT_ALL_SAMPLES_INDEPENDENT 1
MF_MT_INTERLACE_MODE 2
MF_MT_SUBTYPE MFVideoFormat_RGB24
Is anyone able to tell me what I am doing wrong?
Thank you!
You are trying to convert the buffers without setting up Direct3D awareness. That is fine for memory buffers, but in this mode you are typically expected to provide both the input and the output buffers yourself. The zero OutputInfo.dwFlags suggests exactly this.
So you are on the right track with your MFT_OUTPUT_DATA_BUFFER::pSample initialization, but what kind of sample are you submitting for output? It is a sample with no buffer attached; hence the invalid argument.
Use MFCreateMemoryBuffer to allocate memory for your output RGB24 sample, attach it to the sample, and then use that sample in the ProcessOutput call.
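A minimal sketch of that fix (the buffer size here is illustrative; prefer OutputInfo.cbSize when the MFT reports one):
IMFSample *outputSample = NULL;
IMFMediaBuffer *outputMediaBuffer = NULL;
// 1280 x 720 RGB24 is 3 bytes per pixel; use OutputInfo.cbSize if it is non-zero
DWORD cbOutput = OutputInfo.cbSize ? OutputInfo.cbSize : 1280 * 720 * 3;
hr = MFCreateSample(&outputSample);
hr = MFCreateMemoryBuffer(cbOutput, &outputMediaBuffer);
hr = outputSample->AddBuffer(outputMediaBuffer); // without a buffer the sample is an invalid output argument
MFT_OUTPUT_DATA_BUFFER outputBuffer = {};
outputBuffer.pSample = outputSample;
hr = VP->ProcessOutput(0, 1, &outputBuffer, &outputStatus);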

DirectX 11 changing the pixel bytes

Followed this guide here
I am tasked with "using map and unmap methods to draw a line across the screen by setting pixel byte data to rgb red values".
I have the sprite and background displaying but have no idea how to get the data.
I also tried doing this:
//Create device
D3D11_TEXTURE2D_DESC desc;
ZeroMemory(&desc, sizeof(D3D11_TEXTURE2D_DESC));
desc.Width = 500;
desc.Height = 300;
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
m_d3dDevice->CreateTexture2D(&desc, nullptr, &texture);
m_d3dDevice->CreateShaderResourceView(texture, 0, &textureView);
// Render
D3D11_MAPPED_SUBRESOURCE mapped;
m_d3dContext->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
data = (BYTE*)mapped.pData;
rows = (BYTE)sizeof(data);
std::cout << "hi" << std::endl;
m_d3dContext->Unmap(texture, 0);
The problem is that in this case the data array has size 0 but a valid pointer. Does this mean I am pointing to a texture that doesn't have any data, or am I not getting this?
Edit:
currently I found
D3D11_SHADER_RESOURCE_VIEW_DESC desc;
m_background->GetDesc(&desc);
desc.Buffer; // buffer
I felt the need to create an answer for this, because when I searched for how to do this, this question popped up first, and the supplied answer didn't really solve the problem for me and wasn't quite the way I wanted to do it anyway.
In my program I have a method as below.
void ContentLoader::WritePixelsToShaderIndex(uint32_t *data, int width, int height, int index)
{
    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
    desc.SampleDesc.Count = 1;
    desc.SampleDesc.Quality = 0;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.CPUAccessFlags = 0;
    desc.MiscFlags = 0;
    D3D11_SUBRESOURCE_DATA initData;
    initData.pSysMem = data;
    initData.SysMemPitch = width * 4;
    initData.SysMemSlicePitch = width * height * 4;
    Microsoft::WRL::ComPtr<ID3D11Texture2D> tex;
    Engine::device->CreateTexture2D(&desc, &initData, tex.GetAddressOf());
    Engine::device->CreateShaderResourceView(tex.Get(), NULL, ContentLoader::GetTextureAddress(index));
}
Then, using the code below, I tested drawing a blue square with a white line, and it works perfectly fine. The issue I was having was setting SysMemSlicePitch and SysMemPitch; after looking at the WICTextureLoader class I was able to figure out how the data is stored. So it appears that:
SysMemPitch = the size of one row, in bytes.
SysMemSlicePitch = the size of the whole image, in bytes.
(For example, for a 200 x 200 DXGI_FORMAT_R8G8B8A8_UNORM texture that is 200 * 4 = 800 bytes per row and 800 * 200 = 160,000 bytes per image.)
const int WIDTH = 200;
const int HEIGHT = 200;
const uint32_t RED = 255 | (0 << 8) | (0 << 16) | (255 << 24);
const uint32_t WHITE = 255 | (255 << 8) | (255 << 16) | (255 << 24);
const uint32_t BLUE = 0 | (0 << 8) | (255 << 16) | (255 << 24);
uint32_t *buffer = new uint32_t[WIDTH * HEIGHT];
bool flip = false;
for (int X = 0; X < WIDTH; ++X)
{
    for (int Y = 0; Y < HEIGHT; ++Y)
    {
        int pixel = X + Y * WIDTH;
        buffer[pixel] = flip ? BLUE : WHITE;
    }
    flip = true;
}
WritePixelsToShaderIndex(buffer, WIDTH, HEIGHT, 3);
delete [] buffer;
First of all, most of those functions return HRESULT values that you are ignoring. That's not safe, as you will miss important errors that invalidate the remaining code. You can use if (FAILED(...)) if you want, or you can use ThrowIfFailed, but you can't just ignore the return values in a functioning app.
HRESULT hr = m_d3dDevice->CreateTexture2D(&desc, nullptr, &texture);
if (FAILED(hr))
{
    // error!
}
hr = m_d3dDevice->CreateShaderResourceView(texture, 0, &textureView);
if (FAILED(hr))
{
    // error!
}
// Render
D3D11_MAPPED_SUBRESOURCE mapped;
hr = m_d3dContext->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
if (FAILED(hr))
{
    // error!
}
Second, you should enable the Debug Device and look for diagnostic output which will likely point you to the reason for the failure.
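For example, a minimal sketch of device creation with the debug layer enabled (assuming a plain D3D11CreateDevice call site, which may differ from your framework's):
ID3D11Device* device = nullptr;
ID3D11DeviceContext* context = nullptr;
UINT createFlags = 0;
#if defined(_DEBUG)
createFlags |= D3D11_CREATE_DEVICE_DEBUG; // debug layer prints diagnostics to the debugger output
#endif
HRESULT hr = D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr,
    createFlags, nullptr, 0, D3D11_SDK_VERSION,
    &device, nullptr, &context);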
sizeof(data) is always going to be 4 or 8 since data is a BYTE*, i.e. you are taking the size of a pointer. It has nothing to do with the size of your data array. The locked buffer pointed to by mapped.pData is going to be mapped.RowPitch * desc.Height bytes in size.
You have to copy your pixel data into it row by row. Depending on the format and other factors, mapped.RowPitch is not necessarily going to be 4 * desc.Width (4 bytes per pixel because you are using DXGI_FORMAT_B8G8R8A8_UNORM); it should be at least that big, but it could be bigger so that each row is aligned.
This is pseudo-code and not necessarily an efficient way to do it, but:
for (UINT y = 0; y < desc.Height; ++y)
{
    for (UINT x = 0; x < desc.Width; ++x)
    {
        // Find the memory location of the pixel at (x,y)
        int pixel = y * mapped.RowPitch + (x * 4);
        BYTE* blue = &data[pixel];
        BYTE* green = &data[pixel + 1];
        BYTE* red = &data[pixel + 2];
        BYTE* alpha = &data[pixel + 3];
        *blue = /* value between 0 and 255 */;
        *green = /* value between 0 and 255 */;
        *red = /* value between 0 and 255 */;
        *alpha = /* value between 0 and 255 */;
    }
}
You should take a look at DirectXTex which does a lot of this kind of row-by-row processing.

D3D11: Creating a cube map from 6 images

How do I create a cube map in D3D11 from 6 images? All the examples I've found use only one .dds. Specifically, how do I upload individual faces of the cube texture?
It works like this:
D3D11_TEXTURE2D_DESC texDesc;
texDesc.Width = description.width;
texDesc.Height = description.height;
texDesc.MipLevels = 1;
texDesc.ArraySize = 6;
texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = 0;
texDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
D3D11_SHADER_RESOURCE_VIEW_DESC SMViewDesc;
SMViewDesc.Format = texDesc.Format;
SMViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
SMViewDesc.TextureCube.MipLevels = texDesc.MipLevels;
SMViewDesc.TextureCube.MostDetailedMip = 0;
D3D11_SUBRESOURCE_DATA pData[6];
std::vector<vector4b> d[6]; // 6 images of type vector4b = 4 * unsigned char
for (int cubeMapFaceIndex = 0; cubeMapFaceIndex < 6; cubeMapFaceIndex++)
{
    d[cubeMapFaceIndex].resize(description.width * description.height);
    // fill with red color
    std::fill(
        d[cubeMapFaceIndex].begin(),
        d[cubeMapFaceIndex].end(),
        vector4b(255, 0, 0, 255));
    pData[cubeMapFaceIndex].pSysMem = &d[cubeMapFaceIndex][0]; // description.data;
    pData[cubeMapFaceIndex].SysMemPitch = description.width * 4;
    pData[cubeMapFaceIndex].SysMemSlicePitch = 0;
}
HRESULT hr = renderer->getDevice()->CreateTexture2D(&texDesc,
    description.data[0] ? &pData[0] : nullptr, &m_pCubeTexture);
assert(hr == S_OK);
hr = renderer->getDevice()->CreateShaderResourceView(
    m_pCubeTexture, &SMViewDesc, &m_pShaderResourceView);
assert(hr == S_OK);
This creates six "red" images, for the CubeMap.
I know this question is old, and there is already a solution.
Here is a code example that loads 6 textures from disk and puts them together as a cubemap:
Precondition:
ID3D11ShaderResourceView* srv = 0;
ID3D11Resource* srcTex[6];
A pointer to a ShaderResourceView and an array filled with the six textures from disk. I use the order right, left, top, bottom, front, back.
// Each element in the texture array has the same format/dimensions.
D3D11_TEXTURE2D_DESC texElementDesc;
((ID3D11Texture2D*)srcTex[0])->GetDesc(&texElementDesc);
D3D11_TEXTURE2D_DESC texArrayDesc;
texArrayDesc.Width = texElementDesc.Width;
texArrayDesc.Height = texElementDesc.Height;
texArrayDesc.MipLevels = texElementDesc.MipLevels;
texArrayDesc.ArraySize = 6;
texArrayDesc.Format = texElementDesc.Format;
texArrayDesc.SampleDesc.Count = 1;
texArrayDesc.SampleDesc.Quality = 0;
texArrayDesc.Usage = D3D11_USAGE_DEFAULT;
texArrayDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texArrayDesc.CPUAccessFlags = 0;
texArrayDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
ID3D11Texture2D* texArray = 0;
if (FAILED(pd3dDevice->CreateTexture2D(&texArrayDesc, 0, &texArray)))
    return false;
// Copy individual texture elements into texture array.
ID3D11DeviceContext* pd3dContext;
pd3dDevice->GetImmediateContext(&pd3dContext);
D3D11_BOX sourceRegion;
// Here I copy the mip-map levels of the textures
for (UINT x = 0; x < 6; x++)
{
    for (UINT mipLevel = 0; mipLevel < texArrayDesc.MipLevels; mipLevel++)
    {
        sourceRegion.left = 0;
        sourceRegion.right = (texArrayDesc.Width >> mipLevel);
        sourceRegion.top = 0;
        sourceRegion.bottom = (texArrayDesc.Height >> mipLevel);
        sourceRegion.front = 0;
        sourceRegion.back = 1;
        // test for overflow
        if (sourceRegion.bottom == 0 || sourceRegion.right == 0)
            break;
        pd3dContext->CopySubresourceRegion(texArray,
            D3D11CalcSubresource(mipLevel, x, texArrayDesc.MipLevels),
            0, 0, 0, srcTex[x], mipLevel, &sourceRegion);
    }
}
// Create a resource view to the texture array.
D3D11_SHADER_RESOURCE_VIEW_DESC viewDesc;
viewDesc.Format = texArrayDesc.Format;
viewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
viewDesc.TextureCube.MostDetailedMip = 0;
viewDesc.TextureCube.MipLevels = texArrayDesc.MipLevels;
if (FAILED(pd3dDevice->CreateShaderResourceView(texArray, &viewDesc, &srv)))
    return false;
If anyone reads this question again, maybe try this one. Warning: this function is not thread-safe, because I have to use the device's immediate context.

Color.HSBtoRGB missing in WinRT

I'm building a fractal application and need to generate a smooth color scheme, and I found a nice algorithm at Smooth spectrum for Mandelbrot Set rendering.
But that required me to call Color.HSBtoRGB and that method is not available in WinRT / Windows Store apps.
Is there some other built-in method to do this conversion?
Other tips on how to convert HSB to RGB?
I ended up using the HSB to RGB conversion algorithm found at http://www.adafruit.com/blog/2012/03/14/constant-brightness-hsb-to-rgb-algorithm/; I adapted the initial (long) version. Perhaps this can be further optimized, but for my purpose it was perfect!
As the hsb2rgb method is in C and I needed C#, I'm sharing my version here:
private byte[] hsb2rgb(int index, byte sat, byte bright)
{
    int r_temp, g_temp, b_temp;
    byte index_mod;
    byte inverse_sat = (byte)(sat ^ 255);
    index = index % 768;
    index_mod = (byte)(index % 256);
    if (index < 256)
    {
        r_temp = index_mod ^ 255;
        g_temp = index_mod;
        b_temp = 0;
    }
    else if (index < 512)
    {
        r_temp = 0;
        g_temp = index_mod ^ 255;
        b_temp = index_mod;
    }
    else if (index < 768)
    {
        r_temp = index_mod;
        g_temp = 0;
        b_temp = index_mod ^ 255;
    }
    else
    {
        r_temp = 0;
        g_temp = 0;
        b_temp = 0;
    }
    r_temp = ((r_temp * sat) / 255) + inverse_sat;
    g_temp = ((g_temp * sat) / 255) + inverse_sat;
    b_temp = ((b_temp * sat) / 255) + inverse_sat;
    r_temp = (r_temp * bright) / 255;
    g_temp = (g_temp * bright) / 255;
    b_temp = (b_temp * bright) / 255;
    byte[] color = new byte[3];
    color[0] = (byte)r_temp;
    color[1] = (byte)g_temp;
    color[2] = (byte)b_temp;
    return color;
}
To call it based on the code linked in the original post I needed to make some minor modifications:
private byte[] SmoothColors1(int maxIterationCount, ref Complex z, int iteration)
{
    double smoothcolor = iteration + 1 - Math.Log(Math.Log(z.Magnitude)) / Math.Log(2);
    byte[] color = hsb2rgb((int)(10 * smoothcolor), (byte)(255 * 0.6f), (byte)(255 * 1.0f));
    if (iteration >= maxIterationCount)
    {
        // Make sure the core is black
        color[0] = 0;
        color[1] = 0;
        color[2] = 0;
    }
    return color;
}

OpenCL image2d_t writing mostly zeros

I am trying to use OpenCL and image2d_t objects to speed up image convolution. When I noticed that the output was a blank image of all zeros, I simplified the OpenCL kernel to a basic read from the input and write to the output (shown below). With a little bit of tweaking, I got it to write a few scattered pixels of the image into the output image.
I have verified that the image is intact up until the call to read_imageui() in the OpenCL kernel. I wrote the image to GPU memory with CommandQueue::enqueueWriteImage() and immediately read it back into a brand new buffer in CPU memory with CommandQueue::enqueueReadImage(). The result of this call matched the original input image. However, when I retrieve the pixels with read_imageui() in the kernel, the vast majority of the pixels are set to 0.
C++ source:
int height = 112;
int width = 9216;
unsigned int numPixels = height * width;
unsigned int numInputBytes = numPixels * sizeof(uint16_t);
unsigned int numDuplicatedInputBytes = numInputBytes * 4;
unsigned int numOutputBytes = numPixels * sizeof(int32_t);
cl::size_t<3> origin;
origin.push_back(0);
origin.push_back(0);
origin.push_back(0);
cl::size_t<3> region;
region.push_back(width);
region.push_back(height);
region.push_back(1);
std::ifstream imageFile("hri_vis_scan.dat", std::ifstream::binary);
checkErr(imageFile.is_open() ? CL_SUCCESS : -1, "hri_vis_scan.dat");
uint16_t *image = new uint16_t[numPixels];
imageFile.read((char *) image, numInputBytes);
imageFile.close();
// duplicate our single channel image into all 4 channels for Image2D
cl_ushort4 *imageDuplicated = new cl_ushort4[numPixels];
for (int i = 0; i < numPixels; i++)
    for (int j = 0; j < 4; j++)
        imageDuplicated[i].s[j] = image[i];
cl::Buffer imageBufferOut(context, CL_MEM_WRITE_ONLY, numOutputBytes, NULL, &err);
checkErr(err, "Buffer::Buffer()");
cl::ImageFormat inFormat;
inFormat.image_channel_data_type = CL_UNSIGNED_INT16;
inFormat.image_channel_order = CL_RGBA;
cl::Image2D bufferIn(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, inFormat, width, height, 0, imageDuplicated, &err);
checkErr(err, "Image2D::Image2D()");
cl::ImageFormat outFormat;
outFormat.image_channel_data_type = CL_UNSIGNED_INT16;
outFormat.image_channel_order = CL_RGBA;
cl::Image2D bufferOut(context, CL_MEM_WRITE_ONLY, outFormat, width, height, 0, NULL, &err);
checkErr(err, "Image2D::Image2D()");
int32_t *imageResult = new int32_t[numPixels];
memset(imageResult, 0, numOutputBytes);
cl_int4 *imageResultDuplicated = new cl_int4[numPixels];
for (int i = 0; i < numPixels; i++)
    for (int j = 0; j < 4; j++)
        imageResultDuplicated[i].s[j] = 0;
std::ifstream kernelFile("convolutionKernel.cl");
checkErr(kernelFile.is_open() ? CL_SUCCESS : -1, "convolutionKernel.cl");
std::string imageProg(std::istreambuf_iterator<char>(kernelFile), (std::istreambuf_iterator<char>()));
cl::Program::Sources imageSource(1, std::make_pair(imageProg.c_str(), imageProg.length() + 1));
cl::Program imageProgram(context, imageSource);
err = imageProgram.build(devices, "");
checkErr(err, "Program::build()");
cl::Kernel basic(imageProgram, "basic", &err);
checkErr(err, "Kernel::Kernel()");
basic.setArg(0, bufferIn);
basic.setArg(1, bufferOut);
basic.setArg(2, imageBufferOut);
queue.finish();
cl_ushort4 *imageDuplicatedTest = new cl_ushort4[numPixels];
for (int i = 0; i < numPixels; i++)
{
    imageDuplicatedTest[i].s[0] = 0;
    imageDuplicatedTest[i].s[1] = 0;
    imageDuplicatedTest[i].s[2] = 0;
    imageDuplicatedTest[i].s[3] = 0;
}
double gpuTimer = clock();
err = queue.enqueueReadImage(bufferIn, CL_FALSE, origin, region, 0, 0, imageDuplicatedTest, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadImage()");
// Output from above matches input image
err = queue.enqueueNDRangeKernel(basic, cl::NullRange, cl::NDRange(height, width), cl::NDRange(1, 1), NULL, NULL);
checkErr(err, "CommandQueue::enqueueNDRangeKernel()");
queue.flush();
err = queue.enqueueReadImage(bufferOut, CL_TRUE, origin, region, 0, 0, imageResultDuplicated, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadImage()");
queue.flush();
err = queue.enqueueReadBuffer(imageBufferOut, CL_TRUE, 0, numOutputBytes, imageResult, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadBuffer()");
queue.finish();
OpenCL kernel:
__kernel void basic(__read_only image2d_t input, __write_only image2d_t output, __global int *result)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_TRUE | // Natural coordinates
                          CLK_ADDRESS_NONE |           // Clamp to zeros
                          CLK_FILTER_NEAREST;          // Don't interpolate
    int2 coord = (get_global_id(1), get_global_id(0));
    uint4 pixel = read_imageui(input, smp, coord);
    result[coord.s0 + coord.s1 * 9216] = pixel.s0;
    write_imageui(output, coord, pixel);
}
The coordinates in the kernel are currently mapped to (x, y) = (width, height).
The input image is a single-channel greyscale image with 16 bits per pixel, which is why I had to duplicate the channels to fit into OpenCL's Image2D. The output after convolution will be 32 bits per pixel, which is why numOutputBytes is set the way it is. Also, although the width and height look weird, the input image's dimensions are 9216x7824, so I'm only taking a portion of it to test the code first so that it doesn't take forever.
I added in a write to global memory after reading from the image in the kernel to see if the issue was reading the image or writing the image. After the kernel executes, this section of global memory also contains mostly zeros.
Any help would be greatly appreciated!
The documentation for read_imageui states that
Furthermore, the read_imagei and read_imageui calls that take integer coordinates must use a sampler with normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.
But you're creating a sampler with CLK_NORMALIZED_COORDS_TRUE (and yet you seem to be passing in non-normalized coordinates?).
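A corrected sampler for the kernel above might look like the following minimal sketch. (As an aside, note that an OpenCL vector literal needs the explicit (int2)(...) syntax; (get_global_id(1), get_global_id(0)) without the cast is the scalar comma operator.)
const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | // integer coords require non-normalized mode
                      CLK_ADDRESS_CLAMP_TO_EDGE |   // one of the addressing modes allowed with integer coords
                      CLK_FILTER_NEAREST;
int2 coord = (int2)(get_global_id(1), get_global_id(0));
uint4 pixel = read_imageui(input, smp, coord);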
