DirectX 11: changing the pixel bytes (Visual C++)

I followed this guide. I am tasked with "using Map and Unmap methods to draw a line across the screen by setting pixel byte data to RGB red values".
I have the sprite and background displaying, but I have no idea how to get at the pixel data.
I also tried doing this:
// Create texture
D3D11_TEXTURE2D_DESC desc;
ZeroMemory(&desc, sizeof(D3D11_TEXTURE2D_DESC));
desc.Width = 500;
desc.Height = 300;
desc.Format = DXGI_FORMAT_B8G8R8A8_UNORM;
desc.Usage = D3D11_USAGE_DYNAMIC;
desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;
desc.MiscFlags = 0;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
m_d3dDevice->CreateTexture2D(&desc, nullptr, &texture);
m_d3dDevice->CreateShaderResourceView(texture, 0, &textureView);
// Render
D3D11_MAPPED_SUBRESOURCE mapped;
m_d3dContext->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
BYTE* data = (BYTE*)mapped.pData;
BYTE rows = (BYTE)sizeof(data);
std::cout << "hi" << std::endl;
m_d3dContext->Unmap(texture, 0);
The problem is that in this case the data array reports size 0 even though the pointer is valid. Does this mean I am pointing to a texture that doesn't have any data, or am I not getting this?
Edit: currently I have found this:
D3D11_SHADER_RESOURCE_VIEW_DESC desc;
m_background->GetDesc(&desc);
desc.Buffer; // buffer

I felt the need to create an answer for this because, when I searched for how to do this, this question popped up first, and the supplied answer didn't really solve the problem for me and wasn't quite the way I wanted to do it anyway...
In my program I have a method as below.
void ContentLoader::WritePixelsToShaderIndex(uint32_t *data, int width, int height, int index)
{
D3D11_TEXTURE2D_DESC desc = {};
desc.Width = width;
desc.Height = height;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DEFAULT;
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
D3D11_SUBRESOURCE_DATA initData;
initData.pSysMem = data;
initData.SysMemPitch = width * 4;
initData.SysMemSlicePitch = width * height * 4;
Microsoft::WRL::ComPtr<ID3D11Texture2D> tex;
Engine::device->CreateTexture2D(&desc, &initData, tex.GetAddressOf());
Engine::device->CreateShaderResourceView(tex.Get(), NULL, ContentLoader::GetTextureAddress(index));
}
Then, using the code below, I tested drawing a blue square with a white line, and it works perfectly fine. The issue I was having was with setting SysMemPitch and SysMemSlicePitch; after looking in the WICTextureLoader class I was able to figure out how the data is stored:
SysMemPitch = the size of one row, in bytes.
SysMemSlicePitch = the size of the entire image, in bytes.
const int WIDTH = 200;
const int HEIGHT = 200;
const uint32_t RED = 255 | (0 << 8) | (0 << 16) | (255 << 24);
const uint32_t WHITE = 255 | (255 << 8) | (255 << 16) | (255 << 24);
const uint32_t BLUE = 0 | (0 << 8) | (255 << 16) | (255 << 24);
uint32_t *buffer = new uint32_t[WIDTH * HEIGHT];
bool flip = false;
for (int X = 0; X < WIDTH; ++X)
{
for (int Y = 0; Y < HEIGHT; ++Y)
{
int pixel = X + Y * WIDTH;
buffer[pixel] = flip ? BLUE : WHITE;
}
flip = true;
}
WritePixelsToShaderIndex(buffer, WIDTH, HEIGHT, 3);
delete [] buffer;

First of all, most of those functions return HRESULT values that you are ignoring. That's not safe, as you will miss important errors that invalidate the remaining code. You can use if (FAILED(...)) if you want, or you can use ThrowIfFailed, but you can't just ignore the return value in a functioning app.
HRESULT hr = m_d3dDevice->CreateTexture2D(&desc, nullptr, &texture);
if (FAILED(hr))
// error!
hr = m_d3dDevice->CreateShaderResourceView(texture, 0, &textureView);
if (FAILED(hr))
// error!
// Render
D3D11_MAPPED_SUBRESOURCE mapped;
hr = m_d3dContext->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
if (FAILED(hr))
// error!
Second, you should enable the Debug Device and look for diagnostic output which will likely point you to the reason for the failure.
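As a minimal sketch (assuming a plain D3D11CreateDevice call with raw device/context pointers; adjust for ComPtr as needed), requesting the debug layer looks like this:
// Sketch: create the device with D3D11_CREATE_DEVICE_DEBUG in debug builds
// so the runtime prints validation messages to the debugger output window.
UINT flags = 0;
#if defined(_DEBUG)
flags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
HRESULT hr = D3D11CreateDevice(
    nullptr,                    // default adapter
    D3D_DRIVER_TYPE_HARDWARE,
    nullptr,                    // no software rasterizer module
    flags,
    nullptr, 0,                 // default feature levels
    D3D11_SDK_VERSION,
    &m_d3dDevice,
    nullptr,                    // actual feature level not needed here
    &m_d3dContext);
if (FAILED(hr))
    // error!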
sizeof(data) is always going to be 4 or 8, since data is a BYTE*, i.e. the size of a pointer. It has nothing to do with the size of your data array. The locked buffer pointed to by mapped.pData is going to be mapped.RowPitch * desc.Height bytes in size.
You have to copy your pixel data into it row by row. Depending on the format and other factors, mapped.RowPitch is not necessarily going to be 4 * desc.Width; the 4 bytes per pixel comes from your use of DXGI_FORMAT_B8G8R8A8_UNORM. The row pitch should be at least that big, but it could be bigger to align the rows.
This is pseudo-code and not necessarily an efficient way to do it, but:
for(UINT y = 0; y < desc.Height; ++y )
{
for(UINT x = 0; x < desc.Width; ++x )
{
// Find the memory location of the pixel at (x,y)
int pixel = y * mapped.RowPitch + (x * 4);
BYTE* blue  = &data[pixel];
BYTE* green = &data[pixel + 1];
BYTE* red   = &data[pixel + 2];
BYTE* alpha = &data[pixel + 3];
*blue  = /* value between 0 and 255 */;
*green = /* value between 0 and 255 */;
*red   = /* value between 0 and 255 */;
*alpha = /* value between 0 and 255 */;
}
}
You should take a look at DirectXTex which does a lot of this kind of row-by-row processing.
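For the original task itself, here is a minimal sketch building on the code above (assuming the 500x300 dynamic BGRA texture and m_d3dContext from the question; error checks abbreviated). It rewrites the whole texture, as D3D11_MAP_WRITE_DISCARD requires, and colors one row red:
D3D11_MAPPED_SUBRESOURCE mapped;
HRESULT hr = m_d3dContext->Map(texture, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
if (SUCCEEDED(hr))
{
    BYTE* data = (BYTE*)mapped.pData;
    for (UINT y = 0; y < 300; ++y)
    {
        BYTE* row = data + y * mapped.RowPitch; // use RowPitch, not width * 4
        for (UINT x = 0; x < 500; ++x)
        {
            BYTE* pixel = row + x * 4;       // B8G8R8A8 layout
            pixel[0] = 0;                    // blue
            pixel[1] = 0;                    // green
            pixel[2] = (y == 150) ? 255 : 0; // red only on the line's row
            pixel[3] = 255;                  // alpha
        }
    }
    m_d3dContext->Unmap(texture, 0);
}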


D3D11: Creating a cube map from 6 images

How do I create a cube map in D3D11 from 6 images? All the examples I've found use only one .dds. Specifically, how do I upload individual faces of the cube texture?
It works like this:
D3D11_TEXTURE2D_DESC texDesc;
texDesc.Width = description.width;
texDesc.Height = description.height;
texDesc.MipLevels = 1;
texDesc.ArraySize = 6;
texDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
texDesc.SampleDesc.Count = 1;
texDesc.SampleDesc.Quality = 0;
texDesc.Usage = D3D11_USAGE_DEFAULT;
texDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texDesc.CPUAccessFlags = 0;
texDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
D3D11_SHADER_RESOURCE_VIEW_DESC SMViewDesc;
SMViewDesc.Format = texDesc.Format;
SMViewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
SMViewDesc.TextureCube.MipLevels = texDesc.MipLevels;
SMViewDesc.TextureCube.MostDetailedMip = 0;
D3D11_SUBRESOURCE_DATA pData[6];
std::vector<vector4b> d[6]; // 6 images of type vector4b = 4 * unsigned char
for (int cubeMapFaceIndex = 0; cubeMapFaceIndex < 6; cubeMapFaceIndex++)
{
d[cubeMapFaceIndex].resize(description.width * description.height);
// fill with red color
std::fill(
d[cubeMapFaceIndex].begin(),
d[cubeMapFaceIndex].end(),
vector4b(255,0,0,255));
pData[cubeMapFaceIndex].pSysMem = &d[cubeMapFaceIndex][0];// description.data;
pData[cubeMapFaceIndex].SysMemPitch = description.width * 4;
pData[cubeMapFaceIndex].SysMemSlicePitch = 0;
}
HRESULT hr = renderer->getDevice()->CreateTexture2D(&texDesc,
description.data[0] ? &pData[0] : nullptr, &m_pCubeTexture);
assert(hr == S_OK);
hr = renderer->getDevice()->CreateShaderResourceView(
m_pCubeTexture, &SMViewDesc, &m_pShaderResourceView);
assert(hr == S_OK);
This creates six "red" images for the cube map.
I know this question is old, and there is already a solution.
Here is a code example that loads 6 textures from disk and puts them together as a cubemap:
Precondition:
ID3D11ShaderResourceView* srv = 0;
ID3D11Resource* srcTex[6];
Pointer to a ShaderResourceView and an array filled with the six textures from disk. I use the order: right, left, top, bottom, front, back.
// Each element in the texture array has the same format/dimensions.
D3D11_TEXTURE2D_DESC texElementDesc;
((ID3D11Texture2D*)srcTex[0])->GetDesc(&texElementDesc);
D3D11_TEXTURE2D_DESC texArrayDesc;
texArrayDesc.Width = texElementDesc.Width;
texArrayDesc.Height = texElementDesc.Height;
texArrayDesc.MipLevels = texElementDesc.MipLevels;
texArrayDesc.ArraySize = 6;
texArrayDesc.Format = texElementDesc.Format;
texArrayDesc.SampleDesc.Count = 1;
texArrayDesc.SampleDesc.Quality = 0;
texArrayDesc.Usage = D3D11_USAGE_DEFAULT;
texArrayDesc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
texArrayDesc.CPUAccessFlags = 0;
texArrayDesc.MiscFlags = D3D11_RESOURCE_MISC_TEXTURECUBE;
ID3D11Texture2D* texArray = 0;
if (FAILED(pd3dDevice->CreateTexture2D(&texArrayDesc, 0, &texArray)))
return false;
// Copy individual texture elements into texture array.
ID3D11DeviceContext* pd3dContext;
pd3dDevice->GetImmediateContext(&pd3dContext);
D3D11_BOX sourceRegion;
// Here I copy the mipmap levels of the textures
for (UINT x = 0; x < 6; x++)
{
for (UINT mipLevel = 0; mipLevel < texArrayDesc.MipLevels; mipLevel++)
{
sourceRegion.left = 0;
sourceRegion.right = (texArrayDesc.Width >> mipLevel);
sourceRegion.top = 0;
sourceRegion.bottom = (texArrayDesc.Height >> mipLevel);
sourceRegion.front = 0;
sourceRegion.back = 1;
//test for overflow
if (sourceRegion.bottom == 0 || sourceRegion.right == 0)
break;
pd3dContext->CopySubresourceRegion(texArray, D3D11CalcSubresource(mipLevel, x, texArrayDesc.MipLevels), 0, 0, 0, srcTex[x], mipLevel, &sourceRegion);
}
}
// Create a resource view to the texture array.
D3D11_SHADER_RESOURCE_VIEW_DESC viewDesc;
viewDesc.Format = texArrayDesc.Format;
viewDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURECUBE;
viewDesc.TextureCube.MostDetailedMip = 0;
viewDesc.TextureCube.MipLevels = texArrayDesc.MipLevels;
if (FAILED(pd3dDevice->CreateShaderResourceView(texArray, &viewDesc, &srv)))
return false;
If anyone reads this question again, maybe try this one. Warning: this function is not thread-safe, because it has to use the immediate device context.
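To satisfy the precondition, the six source textures could, for example, be loaded with DirectX Tool Kit's WICTextureLoader (a sketch; the file names are placeholders, and all six images must share the same format and dimensions for the copy loop above to work):
#include <WICTextureLoader.h> // DirectX Tool Kit

const wchar_t* files[6] = { L"right.png", L"left.png", L"top.png",
                            L"bottom.png", L"front.png", L"back.png" };
for (int i = 0; i < 6; ++i)
{
    // loads each face as an ID3D11Resource; no per-face SRV is needed
    if (FAILED(DirectX::CreateWICTextureFromFile(pd3dDevice, files[i],
                                                 &srcTex[i], nullptr)))
        return false;
}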

ArrayFire frame search algorithm crash

I am new to ArrayFire and CUDA development in general; I just started using ArrayFire a couple of days ago after failing miserably with Thrust.
I am building an ArrayFire-based algorithm that is supposed to search for a single 32x32-pixel frame in a database of a couple hundred thousand 32x32 frames stored in device memory.
At first I initialize a matrix that has 1024 + 1 pixels as rows (I need the extra one to keep a frame group id) and a predefined number (in this case 1000) of frames, indexed by column.
Here's the function that performs the search. If I uncomment "pixels_uint32 = device_frame_ptr[pixel_group_idx];" the program crashes. The pointer seems to be valid, so I do not understand why this happens. Maybe there is something I do not know about accessing device memory this way?
#include <iostream>
#include <stdio.h>
#include <sys/types.h>
#include <arrayfire.h>
#include "utils.h"
using namespace af;
using namespace std;
/////////////////////////// CUDA settings ////////////////////////////////
#define TEST_DEBUG false
#define MAX_NUMBER_OF_FRAMES 1000 // maximum: (2,499,999 frames) x (1024 + 1 pixels per frame) x (2 bytes per pixel) = 5,124,997,950 bytes (~5 GB)
#define BLOB_FINGERPRINT_SIZE 1024 //32x32
//percentage of macroblocks that should match: 0.9 means 90%
#define MACROBLOCK_COMPARISON_OVERALL_THRESHOLD 768 //1024 * 0.75
//////////////////////// End of CUDA settings ////////////////////////////
array search_frame(array d_db_vec)
{
try {
uint number_of_uint32_for_frame = BLOB_FINGERPRINT_SIZE / 2;
// create one-element array to hold the result of the computation
array frame_found(1,MAX_NUMBER_OF_FRAMES, u32);
frame_found = 0;
gfor (array frame_idx, MAX_NUMBER_OF_FRAMES) {
// get the blob id; it's the last element of each frame's column
array blob_id = d_db_vec(number_of_uint32_for_frame, frame_idx); // addressing with (pixel_idx, frame_idx)
// define some hardcoded pixel to search for
uint8_t searched_r = 0x0;
uint8_t searched_g = 0x3F;
uint8_t searched_b = 0x0;
uint8_t b1 = 0;
uint8_t g1 = 0;
uint8_t r1 = 0;
uint8_t b2 = 0;
uint8_t g2 = 0;
uint8_t r2 = 0;
uint32_t sum1 = 0;
uint32_t sum2 = 0;
uint32_t *device_frame_ptr = NULL;
uint32_t pixels_uint32 = 0;
uint pixel_match_counter = 0;
//uint pixel_match_counter = 0;
array frame = d_db_vec(span, frame_idx);
device_frame_ptr = frame.device<uint32_t>();
for (uint pixel_group_idx = 0; pixel_group_idx < number_of_uint32_for_frame; pixel_group_idx++) {
// test to see if the whole matrix is traversed
// d_db_vec(pixel_group_idx, frame_idx) = 0;
/////////////////////////////// PROBLEMATIC CODE ///////////////////////////////////
pixels_uint32 = 0x7E007E0;
//pixels_uint32 = device_frame_ptr[pixel_group_idx]; //why does this crash the program?
// if I uncomment the above line the program tries to copy the u32 frame into the pixels_uint32 variable
// something goes wrong, since the pointer device_frame_ptr is not NULL and the elements should be there judging by the lines above
////////////////////////////////////////////////////////////////////////////////////
// splitting the first pixel into its components
b1 = (pixels_uint32 & 0xF8000000) >> 27; //(input & 11111000000000000000000000000000)
g1 = (pixels_uint32 & 0x07E00000) >> 21; //(input & 00000111111000000000000000000000)
r1 = (pixels_uint32 & 0x001F0000) >> 16; //(input & 00000000000111110000000000000000)
// splitting the second pixel into its components
b2 = (pixels_uint32 & 0xF800) >> 11; //(input & 00000000000000001111100000000000)
g2 = (pixels_uint32 & 0x07E0) >> 5; //(input & 00000000000000000000011111100000)
r2 = (pixels_uint32 & 0x001F); //(input & 00000000000000000000000000011111)
// checking if they are a match
sum1 = abs(searched_r - r1) + abs(searched_g - g1) + abs(searched_b - b1);
sum2 = abs(searched_r - r2) + abs(searched_g - g2) + abs(searched_b - b2);
// if they match, increment the local counter
pixel_match_counter = (sum1 <= 16) ? pixel_match_counter + 1 : pixel_match_counter;
pixel_match_counter = (sum2 <= 16) ? pixel_match_counter + 1 : pixel_match_counter;
}
bool is_found = pixel_match_counter > MACROBLOCK_COMPARISON_OVERALL_THRESHOLD;
// write down if the frame is a match or not
frame_found(0,frame_idx) = is_found ? frame_found(0,frame_idx) : blob_id;
}
// test to see if the whole matrix is traversed - this has to print zeroes
if (TEST_DEBUG)
print(d_db_vec);
// return the matches array
return frame_found;
} catch (af::exception& e) {
fprintf(stderr, "%s\n", e.what());
throw;
}
}
// make 2 green pixels
uint32_t make_test_pixel_group() {
uint32_t b1 = 0x0; //11111000000000000000000000000000
uint32_t g1 = 0x7E00000; //00000111111000000000000000000000
uint32_t r1 = 0x0; //00000000000111110000000000000000
uint32_t b2 = 0x0; //00000000000000001111100000000000
uint32_t g2 = 0x7E0; //00000000000000000000011111100000
uint32_t r2 = 0x0; //00000000000000000000000000011111
uint32_t green_pix = b1 | g1 | r1 | b2 | g2 | r2;
return green_pix;
}
int main(int argc, char ** argv)
{
info();
/////////////////////////////////////// CREATE THE DATABASE ///////////////////////////////////////
uint number_of_uint32_for_frame = BLOB_FINGERPRINT_SIZE / 2;
array d_db_vec(number_of_uint32_for_frame + 1, // fingerprint size + 1 extra u32 for blob id
MAX_NUMBER_OF_FRAMES, // number of frames
u32); // type of elements is 32-bit unsigned integer (unsigned) with the configuration RGBRGB (565565)
if (TEST_DEBUG == true) {
for (uint frame_idx = 0; frame_idx < MAX_NUMBER_OF_FRAMES; frame_idx++) {
for (uint pix_idx = 0; pix_idx < number_of_uint32_for_frame; pix_idx++) {
d_db_vec(pix_idx, frame_idx) = make_test_pixel_group(); // fill everything with green :D
}
}
} else {
d_db_vec = rand(number_of_uint32_for_frame + 1, MAX_NUMBER_OF_FRAMES);
}
cout << "Setting blob ids. \n\n";
for (uint frame_idx = 0; frame_idx < MAX_NUMBER_OF_FRAMES; frame_idx++) {
// set the blob id to 123456
d_db_vec(number_of_uint32_for_frame, frame_idx) = 123456; // blob_id = 123456
}
if (TEST_DEBUG)
print(d_db_vec);
cout << "Done setting blob ids. \n\n";
//////////////////////////////////// CREATE THE SEARCHED FRAME ///////////////////////////////////
// to be done, for now we use the hardcoded values at line 37-39 to simulate the searched pixel:
//37 uint8_t searched_r = 0x0;
//38 uint8_t searched_g = 0x3F;
//39 uint8_t searched_b = 0x0;
///////////////////////////////////////////// SEARCH /////////////////////////////////////////////
clock_t timer = startTimer();
for (int i = 0; i< 1000; i++) {
array frame_found = search_frame(d_db_vec);
if (TEST_DEBUG)
print(frame_found);
}
stopTimer(timer);
return 0;
}
Here is the console output with the line commented:
arrayfire/examples/helloworld$ ./helloworld
ArrayFire v1.9.1 (64-bit Linux, build 9af23ea)
License: Server (27000#server.accelereyes.com)
CUDA toolkit 5.0, driver 304.54
GPU0 Tesla C2075, 5376 MB, Compute 2.0
Memory Usage: 5312 MB free (5376 MB total)
Setting blob ids.
Done setting blob ids.
Time: 0.03 seconds.
Here is the console output with the line uncommented:
arrayfire/examples/helloworld$ ./helloworld
ArrayFire v1.9.1 (64-bit Linux, build 9af23ea)
License: Server (27000#server.accelereyes.com)
CUDA toolkit 5.0, driver 304.54
GPU0 Tesla C2075, 5376 MB, Compute 2.0
Memory Usage: 5312 MB free (5376 MB total)
Setting blob ids.
Done setting blob ids.
Segmentation fault
Thanks in advance for any help on this issue. I really tried everything but without success.
Disclaimer: I am the lead developer of ArrayFire. I see that you have posted on the AccelerEyes forums as well, but I am posting here to clear up some common issues with your code.
Do not use .device(), .host(), or .scalar() inside a GFOR loop. This causes divergences inside the GFOR loop, which GFOR was not designed for.
You cannot index into a device pointer. The pointer refers to a location on the GPU; when you do device_frame_ptr[pixel_group_idx], the system looks for the equivalent position on the CPU. This is the reason for your segmentation fault.
Use vectorized code. For example, you don't need the inner for loop of the gfor. Instead of doing b1 = (pixels_uint32 & 0xF8000000) >> 27; inside a for loop, you can do array B1 = (frame & 0xF8000000) >> 27;. That is, instead of bringing data back to the CPU and looping over it, you perform the entire operation on the GPU.
Don't use if-else or ternary operators inside GFOR. These cause divergences again. For example, use pixel_match_counter = sum(sum1 <= 16) + sum(sum2 <= 16); and frame_found(0, frame_idx) = is_found * frame_found(0, frame_idx) + (1 - is_found) * blob_id;. A rough sketch of this vectorized style follows.
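As an illustration of the last two points (a sketch only; exact ArrayFire syntax varies between versions, so the seq indexing and the sum reduction over the pixel dimension here are assumptions):
// searched_r, searched_g, searched_b as defined in the question's code
array frames = d_db_vec(seq(number_of_uint32_for_frame), span); // drop the blob id row
array b1 = (frames & 0xF8000000) >> 27;
array g1 = (frames & 0x07E00000) >> 21;
array r1 = (frames & 0x001F0000) >> 16;
array b2 = (frames & 0x0000F800) >> 11;
array g2 = (frames & 0x000007E0) >> 5;
array r2 =  frames & 0x0000001F;
array sum1 = abs(searched_r - r1) + abs(searched_g - g1) + abs(searched_b - b1);
array sum2 = abs(searched_r - r2) + abs(searched_g - g2) + abs(searched_b - b2);
// per-frame count of matching pixel pairs (sum reduces along dim 0)
array matches = sum(sum1 <= 16) + sum(sum2 <= 16);
array is_found = matches > MACROBLOCK_COMPARISON_OVERALL_THRESHOLD;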
I have answered the particular problem you are facing. If you have any follow-up questions, please follow up on our forums and/or our support email. Stack Overflow is good for asking a specific question, but not for debugging your entire program.

Color.HSBtoRGB missing in WinRT

I'm building a fractal application and need to generate a smooth color scheme, and I found a nice algorithm at Smooth spectrum for Mandelbrot Set rendering.
But that requires me to call Color.HSBtoRGB, and that method is not available in WinRT / Windows Store apps.
Is there some other built-in method to do this conversion?
Other tips on how to convert HSB to RGB?
I ended up using the HSB-to-RGB conversion algorithm found at http://www.adafruit.com/blog/2012/03/14/constant-brightness-hsb-to-rgb-algorithm/; I adapted the initial (long) version. Perhaps this can be further optimized, but for my purpose this was perfect!
As the hsb2rgb method is in C and I needed C#, I'm sharing my version here:
private byte[] hsb2rgb(int index, byte sat, byte bright)
{
int r_temp, g_temp, b_temp;
byte index_mod;
byte inverse_sat = (byte)(sat ^ 255);
index = index % 768;
index_mod = (byte)(index % 256);
if (index < 256)
{
r_temp = index_mod ^ 255;
g_temp = index_mod;
b_temp = 0;
}
else if (index < 512)
{
r_temp = 0;
g_temp = index_mod ^ 255;
b_temp = index_mod;
}
else if ( index < 768)
{
r_temp = index_mod;
g_temp = 0;
b_temp = index_mod ^ 255;
}
else
{
r_temp = 0;
g_temp = 0;
b_temp = 0;
}
r_temp = ((r_temp * sat) / 255) + inverse_sat;
g_temp = ((g_temp * sat) / 255) + inverse_sat;
b_temp = ((b_temp * sat) / 255) + inverse_sat;
r_temp = (r_temp * bright) / 255;
g_temp = (g_temp * bright) / 255;
b_temp = (b_temp * bright) / 255;
byte[] color = new byte[3];
color[0] = (byte)r_temp;
color[1] = (byte)g_temp;
color[2] = (byte)b_temp;
return color;
}
To call it based on the code linked in the original post I needed to make some minor modifications:
private byte[] SmoothColors1(int maxIterationCount, ref Complex z, int iteration)
{
double smoothcolor = iteration + 1 - Math.Log(Math.Log(z.Magnitude)) / Math.Log(2);
byte[] color = hsb2rgb((int)(10 * smoothcolor), (byte)(255 * 0.6f), (byte)(255 * 1.0f));
if (iteration >= maxIterationCount)
{
// Make sure the core is black
color[0] = 0;
color[1] = 0;
color[2] = 0;
}
return color;
}

OpenCL image2d_t writing mostly zeros

I am trying to use OpenCL and image2d_t objects to speed up image convolution. When I noticed that the output was a blank image of all zeros, I simplified the OpenCL kernel to a basic read from the input and write to the output (shown below). With a little bit of tweaking, I got it to write a few scattered pixels of the image into the output image.
I have verified that the image is intact up until the call to read_imageui() in the OpenCL kernel. I wrote the image to GPU memory with CommandQueue::enqueueWriteImage() and immediately read it back into a brand new buffer in CPU memory with CommandQueue::enqueueReadImage(). The result of this call matched the original input image. However, when I retrieve the pixels with read_imageui() in the kernel, the vast majority of the pixels are set to 0.
C++ source:
int height = 112;
int width = 9216;
unsigned int numPixels = height * width;
unsigned int numInputBytes = numPixels * sizeof(uint16_t);
unsigned int numDuplicatedInputBytes = numInputBytes * 4;
unsigned int numOutputBytes = numPixels * sizeof(int32_t);
cl::size_t<3> origin;
origin.push_back(0);
origin.push_back(0);
origin.push_back(0);
cl::size_t<3> region;
region.push_back(width);
region.push_back(height);
region.push_back(1);
std::ifstream imageFile("hri_vis_scan.dat", std::ifstream::binary);
checkErr(imageFile.is_open() ? CL_SUCCESS : -1, "hri_vis_scan.dat");
uint16_t *image = new uint16_t[numPixels];
imageFile.read((char *) image, numInputBytes);
imageFile.close();
// duplicate our single channel image into all 4 channels for Image2D
cl_ushort4 *imageDuplicated = new cl_ushort4[numPixels];
for (int i = 0; i < numPixels; i++)
for (int j = 0; j < 4; j++)
imageDuplicated[i].s[j] = image[i];
cl::Buffer imageBufferOut(context, CL_MEM_WRITE_ONLY, numOutputBytes, NULL, &err);
checkErr(err, "Buffer::Buffer()");
cl::ImageFormat inFormat;
inFormat.image_channel_data_type = CL_UNSIGNED_INT16;
inFormat.image_channel_order = CL_RGBA;
cl::Image2D bufferIn(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, inFormat, width, height, 0, imageDuplicated, &err);
checkErr(err, "Image2D::Image2D()");
cl::ImageFormat outFormat;
outFormat.image_channel_data_type = CL_UNSIGNED_INT16;
outFormat.image_channel_order = CL_RGBA;
cl::Image2D bufferOut(context, CL_MEM_WRITE_ONLY, outFormat, width, height, 0, NULL, &err);
checkErr(err, "Image2D::Image2D()");
int32_t *imageResult = new int32_t[numPixels];
memset(imageResult, 0, numOutputBytes);
cl_int4 *imageResultDuplicated = new cl_int4[numPixels];
for (int i = 0; i < numPixels; i++)
for (int j = 0; j < 4; j++)
imageResultDuplicated[i].s[j] = 0;
std::ifstream kernelFile("convolutionKernel.cl");
checkErr(kernelFile.is_open() ? CL_SUCCESS : -1, "convolutionKernel.cl");
std::string imageProg(std::istreambuf_iterator<char>(kernelFile), (std::istreambuf_iterator<char>()));
cl::Program::Sources imageSource(1, std::make_pair(imageProg.c_str(), imageProg.length() + 1));
cl::Program imageProgram(context, imageSource);
err = imageProgram.build(devices, "");
checkErr(err, "Program::build()");
cl::Kernel basic(imageProgram, "basic", &err);
checkErr(err, "Kernel::Kernel()");
basic.setArg(0, bufferIn);
basic.setArg(1, bufferOut);
basic.setArg(2, imageBufferOut);
queue.finish();
cl_ushort4 *imageDuplicatedTest = new cl_ushort4[numPixels];
for (int i = 0; i < numPixels; i++)
{
imageDuplicatedTest[i].s[0] = 0;
imageDuplicatedTest[i].s[1] = 0;
imageDuplicatedTest[i].s[2] = 0;
imageDuplicatedTest[i].s[3] = 0;
}
double gpuTimer = clock();
err = queue.enqueueReadImage(bufferIn, CL_FALSE, origin, region, 0, 0, imageDuplicatedTest, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadImage()");
// Output from above matches input image
err = queue.enqueueNDRangeKernel(basic, cl::NullRange, cl::NDRange(height, width), cl::NDRange(1, 1), NULL, NULL);
checkErr(err, "CommandQueue::enqueueNDRangeKernel()");
queue.flush();
err = queue.enqueueReadImage(bufferOut, CL_TRUE, origin, region, 0, 0, imageResultDuplicated, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadImage()");
queue.flush();
err = queue.enqueueReadBuffer(imageBufferOut, CL_TRUE, 0, numOutputBytes, imageResult, NULL, NULL);
checkErr(err, "CommandQueue::enqueueReadBuffer()");
queue.finish();
OpenCL kernel:
__kernel void basic(__read_only image2d_t input, __write_only image2d_t output, __global int *result)
{
const sampler_t smp = CLK_NORMALIZED_COORDS_TRUE | //Natural coordinates
CLK_ADDRESS_NONE | //Clamp to zeros
CLK_FILTER_NEAREST; //Don't interpolate
int2 coord = (get_global_id(1), get_global_id(0));
uint4 pixel = read_imageui(input, smp, coord);
result[coord.s0 + coord.s1 * 9216] = pixel.s0;
write_imageui(output, coord, pixel);
}
The coordinates in the kernel are currently mapped to (x, y) = (width, height).
The input image is a single channel greyscale image with 16 bits per pixel, which is why I had to duplicate the channels to fit into OpenCL's Image2D. The output after convolution will be 32 bits per pixel, which is why numOutputBytes is set to that. Also, although the width and height appear weird, the input image's dimensions are 9216x7824, so I'm only taking a portion of it to test the code first, so it doesn't take forever.
I added in a write to global memory after reading from the image in the kernel to see if the issue was reading the image or writing the image. After the kernel executes, this section of global memory also contains mostly zeros.
Any help would be greatly appreciated!
The documentation for read_imageui states that
Furthermore, the read_imagei and read_imageui calls that take integer coordinates must use a sampler with normalized coordinates set to CLK_NORMALIZED_COORDS_FALSE and addressing mode set to CLK_ADDRESS_CLAMP_TO_EDGE, CLK_ADDRESS_CLAMP or CLK_ADDRESS_NONE; otherwise the values returned are undefined.
But you're creating a sampler with CLK_NORMALIZED_COORDS_TRUE, while passing in non-normalized integer coordinates.
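A corrected kernel along those lines might look like this (a sketch; it also builds coord with an explicit (int2) vector literal, since a bare parenthesized pair is just the C comma operator and yields a single scalar):
__kernel void basic(__read_only image2d_t input, __write_only image2d_t output, __global int *result)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | // integer coordinates
                          CLK_ADDRESS_CLAMP_TO_EDGE |   // defined out-of-bounds behavior
                          CLK_FILTER_NEAREST;           // don't interpolate
    // explicit vector literal, not the comma operator
    int2 coord = (int2)(get_global_id(1), get_global_id(0));
    uint4 pixel = read_imageui(input, smp, coord);
    result[coord.x + coord.y * get_image_width(input)] = pixel.x;
    write_imageui(output, coord, pixel);
}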

unsigned char* buffer to System::Drawing::Bitmap

I'm trying to create a tool/asset converter that rasterises a font to a texture page for an XNA game using the FreeType2 engine.
Below, the first image is the direct output from the FreeType2 engine. The second image is the result after attempting to convert it to a System::Drawing::Bitmap.
(Images: target: http://www.freeimagehosting.net/uploads/fb102ee6da.jpg; current result: http://www.freeimagehosting.net/uploads/9ea77fa307.jpg)
Any hints/tips/ideas on what is going on here would be greatly appreciated. Links to articles explaining byte layout and pixel formats would also be helpful.
FT_Bitmap *bitmap = &face->glyph->bitmap;
int width = (face->glyph->metrics.width / 64);
int height = (face->glyph->metrics.height / 64);
// must be aligned on a 32 bit boundary or 4 bytes
int depth = 8;
int stride = ((width * depth + 31) & ~31) >> 3;
int bytes = (int)(stride * height);
// as *.bmp
array<Byte>^ values = gcnew array<Byte>(bytes);
Marshal::Copy((IntPtr)bitmap->buffer, values, 0, bytes);
Bitmap^ systemBitmap = gcnew Bitmap(width, height, PixelFormat::Format24bppRgb);
// create bitmap data, lock pixels to be written.
BitmapData^ bitmapData = systemBitmap->LockBits(Rectangle(0, 0, width, height), ImageLockMode::WriteOnly, systemBitmap->PixelFormat);
Marshal::Copy(values, 0, bitmapData->Scan0, bytes);
systemBitmap->UnlockBits(bitmapData);
systemBitmap->Save("Test.bmp");
Update: changed the PixelFormat to Format8bppIndexed.
FT_Bitmap *bitmap = &face->glyph->bitmap;
// stride must be aligned on a 32 bit boundary or 4 bytes
int depth = 8;
int stride = ((width * depth + 31) & ~31) >> 3;
int bytes = (int)(stride * height);
target = gcnew Bitmap(width, height, PixelFormat::Format8bppIndexed);
// create bitmap data, lock pixels to be written.
BitmapData^ bitmapData = target->LockBits(Rectangle(0, 0, width, height), ImageLockMode::WriteOnly, target->PixelFormat);
array<Byte>^ values = gcnew array<Byte>(bytes);
Marshal::Copy((IntPtr)bitmap->buffer, values, 0, bytes);
Marshal::Copy(values, 0, bitmapData->Scan0, bytes);
target->UnlockBits(bitmapData);
Ah ha, worked it out.
FT_Bitmap is an 8-bit image, so the correct PixelFormat was Format8bppIndexed, which resulted in this output:
(Image: rows not aligned to a 32-bit boundary: http://www.freeimagehosting.net/uploads/dd90fa2252.jpg)
A System::Drawing::Bitmap's rows need to be aligned on a 32-bit boundary.
I was calculating the stride but was not adding the padding when writing the bitmap. I copied the FT_Bitmap buffer to a byte[] and then wrote that to a MemoryStream, adding the necessary padding.
int stride = ((width * pixelDepth + 31) & ~31) >> 3;
int padding = stride - (((width * pixelDepth) + 7) / 8);
array<Byte>^ pad = gcnew array<Byte>(padding);
array<Byte>^ buffer = gcnew array<Byte>(size);
Marshal::Copy((IntPtr)source->buffer, buffer, 0, size);
MemoryStream^ ms = gcnew MemoryStream();
for (int i = 0; i < height; ++i)
{
ms->Write(buffer, i * width, width);
ms->Write(pad, 0, padding);
}
Pinned the memory so the GC would leave it alone.
// pin memory and create bitmap
GCHandle handle = GCHandle::Alloc(ms->ToArray(), GCHandleType::Pinned);
target = gcnew Bitmap(width, height, stride, PixelFormat::Format8bppIndexed, handle.AddrOfPinnedObject());
ms->Close();
As there is no greyscale variant of Format8bppIndexed, the image was still not correct:
(Image: output before applying a greyscale palette: http://www.freeimagehosting.net/uploads/8a883b7dce.png)
I then changed the bitmap palette to a 256-level greyscale:
// 256-level greyscale palette
ColorPalette^ palette = target->Palette;
for (int i = 0; i < palette->Entries->Length; ++i)
palette->Entries[i] = Color::FromArgb(i,i,i);
target->Palette = palette;
(Image: final result: http://www.freeimagehosting.net/uploads/59a745269e.jpg)
The final solution:
error = FT_Load_Char(face, ch, FT_LOAD_RENDER);
if (error)
throw gcnew InvalidOperationException("Failed to load and render character");
FT_Bitmap *source = &face->glyph->bitmap;
int width = (face->glyph->metrics.width / 64);
int height = (face->glyph->metrics.height / 64);
int pixelDepth = 8;
int size = width * height;
// stride must be aligned on a 32 bit boundary or 4 bytes
// padding is the number of bytes to add to make each row a 32bit aligned row
int stride = ((width * pixelDepth + 31) & ~31) >> 3;
int padding = stride - (((width * pixelDepth) + 7) / 8);
array<Byte>^ pad = gcnew array<Byte>(padding);
array<Byte>^ buffer = gcnew array<Byte>(size);
Marshal::Copy((IntPtr)source->buffer, buffer, 0, size);
MemoryStream^ ms = gcnew MemoryStream();
for (int i = 0; i < height; ++i)
{
ms->Write(buffer, i * width, width);
ms->Write(pad, 0, padding);
}
// pin memory and create bitmap
GCHandle handle = GCHandle::Alloc(ms->ToArray(), GCHandleType::Pinned);
target = gcnew Bitmap(width, height, stride, PixelFormat::Format8bppIndexed, handle.AddrOfPinnedObject());
ms->Close();
// 256-level greyscale palette
ColorPalette^ palette = target->Palette;
for (int i = 0; i < palette->Entries->Length; ++i)
palette->Entries[i] = Color::FromArgb(i,i,i);
target->Palette = palette;
FT_Done_FreeType(library);
Your "depth" value doesn't match the PixelFormat of the Bitmap. It needs to be 24 to match Format24bppRgb. The PF for the bitmap needs to match the PF and stride of the FT_Bitmap as well, I don't see you take care of that.
