Efficiently render single-byte-per-pixel data in OpenFL? - haxe

I have a byte array that stores one byte per pixel. To render it, I currently fill a BitmapData in a loop:
bitmapData.lock();
for (y in 0...height) for (x in 0...width) {
    var v = data[y * width + x];
    bitmapData.setPixel(x, y, v << 16 | v << 8 | v); // grayscale
}
bitmapData.unlock();
But this is very CPU-intensive. Is there a more efficient way, such as doing it in a shader?

You can look at the "TextRendering" sample from the Lime project to see an example of rendering from an alpha-only texture. As of writing, this is not inherently supported in OpenFL's renderer, but a pull request to this effect would be welcome.
The alternative would be using an OpenGLView with code similar to the "TextRendering" sample.

Related

Is there a more efficient way of texturing a circle?

I'm trying to create a randomly generated "planet" (circle), and I want the areas of water, land and foliage to be decided by Perlin noise, or something similar. Currently I have this (pseudo)code:
for (int radius = 0; radius < circleRadius; radius++) {
    for (float theta = 0; theta < TWO_PI; theta += 0.1) {
        float x = radius * cosine(theta);
        float y = radius * sine(theta);
        int colour = whateverFunctionIMake(x, y);
        setPixel(x, y, colour);
    }
}
Not only does this not work (there are "gaps" in the circle because of precision issues), it's also incredibly slow. Even if I increase the resolution by changing the increment to 0.01, pixels are still missing and it gets even slower (about 10 fps on my mediocre computer in Java with an increment of 0.01, which is certainly not acceptable for a game).
How might I achieve a similar result whilst being much less computationally expensive?
Thanks in advance.
Why not use:
(x-x0)^2 + (y-y0)^2 <= r^2
so simply:
int x0=?, y0=?, r=?; // your planet position and size
int x, y, xx, rr, col;
for (rr = r*r, x = -r; x <= r; x++)
    for (xx = x*x, y = -r; y <= r; y++)
        if (xx + (y*y) <= rr)
        {
            col = whateverFunctionIMake(x, y);
            setPixel(x0 + x, y0 + y, col);
        }
All on integers, no floating-point or slow operations, and no gaps... Do not forget to use a random seed for the coloring function...
[Edit1] some more stuff
Now, if you want speed, you need direct pixel access (on most platforms Pixels, SetPixel, PutPixels, etc. are slow because they perform a lot of extra work such as range checking and color conversions). Once you have direct pixel access, or render into your own array/image, you also need to clip against the screen (so you do not have to check whether each pixel is inside the screen) to avoid access violations when the circle overlaps the screen edge.
As mentioned in the comments, you can get rid of the x*x and y*y inside the loop by updating the previous value (as both x and y only ever increment). For more info see:
32bit SQRT in 16T without multiplication
the math is like this:
(x+1)^2 = (x+1)*(x+1) = x^2 + 2x + 1
so instead of xx = x*x we just do xx += x + x + 1 before x is incremented, or xx += x + x - 1 after x has already been incremented.
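A tiny standalone illustration of that identity (just to show the invariant; it is not part of the circle code below):
int x  = 0;
int xx = 0;          // invariant: xx == x*x
for (int i = 0; i < 10; i++)
{
    xx += x + x + 1; // (x+1)^2 = x^2 + 2*x + 1, applied before incrementing x
    x++;             // now xx == x*x again, with no multiplication used
}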
When put all together I got this:
void circle(int x,int y,int r,DWORD c)
{
    // my pixel access
    int **Pixels=Main->pyx;        // Pixels[y][x]
    int xs=Main->xs;               // resolution
    int ys=Main->ys;
    // circle
    int sx,sy,sx0,sx1,sy0,sy1;     // [screen]
    int cx,cy,cx0,cy0;             // [circle]
    int rr=r*r,cxx,cyy,cxx0,cyy0;  // [circle^2]
    // BBOX + screen clip
    sx0=x-r; if (sx0>=xs) return; if (sx0< 0) sx0=0;
    sy0=y-r; if (sy0>=ys) return; if (sy0< 0) sy0=0;
    sx1=x+r; if (sx1< 0) return; if (sx1>=xs) sx1=xs-1;
    sy1=y+r; if (sy1< 0) return; if (sy1>=ys) sy1=ys-1;
    cx0=sx0-x; cxx0=cx0*cx0;
    cy0=sy0-y; cyy0=cy0*cy0;
    // render
    for (cxx=cxx0,cx=cx0,sx=sx0;sx<=sx1;sx++,cxx+=cx,cx++,cxx+=cx)
        for (cyy=cyy0,cy=cy0,sy=sy0;sy<=sy1;sy++,cyy+=cy,cy++,cyy+=cy)
            if (cxx+cyy<=rr)
                Pixels[sy][sx]=c;
}
This renders a circle with a 512 px radius in ~35 ms, i.e. about 23.5 Mpx/s of fill rate on my setup (AMD A8-5500 3.2 GHz, Win7 64-bit, single-threaded 32-bit VCL/GDI app built with BDS2006 C++). Just change the direct pixel access to whatever style/API you use...
[Edit2]
To measure speed on x86/x64 you can use the RDTSC instruction. Here is some ancient C++ code I used ages ago (in a 32-bit environment without native 64-bit support):
double _rdtsc()
{
    LARGE_INTEGER x; // unsigned 64-bit integer variable from windows.h, I think
    DWORD l,h;       // standard unsigned 32-bit variables
    asm {
        rdtsc
        mov l,eax
        mov h,edx
    }
    x.LowPart=l;
    x.HighPart=h;
    return double(x.QuadPart);
}
It returns the number of clock ticks your CPU has elapsed since power-up. Beware that you should account for overflows, as on fast machines a 32-bit counter overflows within seconds. Also, each core has a separate counter, so set the affinity to a single CPU. On a variable-speed clock, warm up the CPU with some computation before measuring. To convert to time, just divide by the CPU clock frequency; to obtain it, do this:
t0=_rdtsc();
sleep(250);      // wait 250 ms
t1=_rdtsc();
fcpu=(t1-t0)*4;  // clocks per second (4 x 250 ms)
and measurement:
t0=_rdtsc();
// measured stuff
t1=_rdtsc();
time=(t1-t0)/fcpu;
If t1 < t0 you overflowed and need to add a constant to the result or measure again. Also, the measured process must take less than the overflow period. To enhance precision, account for OS timing granularity. For more info see:
Measuring Cache Latencies
Cache size estimation on your system? setting affinity example
Negative clock cycle measurements with back-to-back rdtsc?

Dump subtitle from AVSubtitle to a file

In FFmpeg, AVPicture is used to store image data via a data pointer and linesizes. That means all subtitles are stored as pictures inside FFmpeg. I have a DVB subtitle and I want to dump the subtitle pictures stored in AVPicture into a buffer. I know these subtitle images can be dumped using a for loop, fopen and sprintf, but I don't know how to dump the subtitle itself. I have to dump subtitles in the .ppm file format.
Can anyone help me dump the subtitle pictures from AVSubtitle into a buffer?
This process looks complex but is actually very simple.
AVSubtitle is a generic format that supports text and bitmap modes. The dvbsub format is, as far as I know, bitmap-only, and the bitmap format can differ, e.g. 16-color or 256-color mode, referred to as the CLUT_DEPTH.
I believe that (in current FFmpeg) the bitmaps are stored in the AVSubtitleRect structure, which is a member of AVSubtitle.
Assuming you have valid AVSubtitle packet(s), and if I understand correctly, you can do the following and it should work:
1) Check pkt->rects[0]->type (pkt here is a valid AVSubtitle packet). It must be of type SUBTITLE_BITMAP.
2) If so, the bitmap width and height can be read from pkt->rects[0]->w and pkt->rects[0]->h.
3) The bitmap data itself will be in pkt->rects[0]->data[0].
4) CLUT_DEPTH can be read from pkt->rects[0]->nb_colors.
5) And the CLUT itself (the color table) will be in pkt->rects[0]->data[1].
With this data you can construct a valid .bmp file that is viewable on a Windows or Linux desktop, but I leave that part to you.
PPM Info
First check this info about PPM format:
https://www.cs.swarthmore.edu/~soni/cs35/f13/Labs/extras/01/ppm_info.html
As I understand it, the PPM format uses RGB values (24 bits / 3 bytes per pixel). It looks to me like all you have to do is construct a header from the data obtained from the AVSubtitle packet above, and write a conversion function from dvbsub's indexed color buffer to RGB. I'm pretty sure there is ready-to-use code out there somewhere, but I'll explain anyway.
The picture frame data dvbsub uses is linear and every pixel is 1 byte (even in 16-color mode). That byte is actually an index into the RGB values stored in the Color Look-Up Table (CLUT); in 16-color mode there are 16 entries of 4 bytes each, where the first 3 are the R, G, B values and the 4th is alpha (transparency; if PPM doesn't support this, ignore it).
I'm not sure whether the decoded subtitle still has encoded YUV values; as I recall, it should be plain RGBA.
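To make that concrete, here is a minimal, untested sketch of dumping one AVSubtitleRect to a binary PPM (P6). It assumes the index-per-pixel layout and 32-bit CLUT entries described above; the function name is made up, and the channel order should be verified against your FFmpeg version:
#include <cstdio>
#include <cstdint>
extern "C" {
#include <libavcodec/avcodec.h>
}

// Hedged sketch: write one SUBTITLE_BITMAP rect as a 24-bit PPM (P6).
// Assumes data[0] holds one palette index per pixel and data[1] holds
// 32-bit CLUT entries; alpha is simply dropped because PPM has no alpha.
static void dump_rect_as_ppm(const AVSubtitleRect *rect, const char *path)
{
    std::FILE *f = std::fopen(path, "wb");
    if (!f) return;
    std::fprintf(f, "P6\n%d %d\n255\n", rect->w, rect->h);           // PPM header
    const uint32_t *clut = reinterpret_cast<const uint32_t *>(rect->data[1]);
    for (int y = 0; y < rect->h; ++y) {
        const uint8_t *line = rect->data[0] + y * rect->linesize[0]; // index data
        for (int x = 0; x < rect->w; ++x) {
            const uint32_t c = clut[line[x]];                        // CLUT lookup
            const uint8_t rgb[3] = {
                static_cast<uint8_t>((c >> 16) & 0xff),              // R (assuming ARGB packing)
                static_cast<uint8_t>((c >>  8) & 0xff),              // G
                static_cast<uint8_t>((c >>  0) & 0xff),              // B
            };
            std::fwrite(rgb, 1, 3, f);
        }
    }
    std::fclose(f);
}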
The encode_dvb_subtitles function in FFmpeg shows how this encoding is done, if you need it:
https://github.com/FFmpeg/FFmpeg/blob/a0ac49e38ee1d1011c394d7be67d0f08b2281526/libavcodec/dvbsub.c
Hope that helps.
As this is where I ended up when searching for answers to how to create a thumbnail of an AVSubtitle, here is what I ended up using in my test application. The code is optimized for readability only. I got some help from this question which had some sample code.
Using avcodec_decode_subtitle2() I get an AVSubtitle structure. This contains a number of rectangles. First I iterate over the rectangles to find the maximum of x + w and y + h to determine the width and height of the target frame.
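A rough sketch of that measuring step (sub here is the AVSubtitle filled in by avcodec_decode_subtitle2(); the variable names are mine):
// Determine the target frame size from the subtitle rectangles.
int width = 0, height = 0;
for (unsigned int i = 0; i < sub.num_rects; ++i) {
    const AVSubtitleRect *rect = sub.rects[i];
    if (rect->x + rect->w > width)  width  = rect->x + rect->w;
    if (rect->y + rect->h > height) height = rect->y + rect->h;
}
// width and height are then used to allocate the AV_PIX_FMT_RGBA frame below.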
The color table in data[1] is RGBA, so I allocate an AVFrame called frame in AV_PIX_FMT_RGBA format and shuffle the pixels over to it:
struct [[gnu::packed]] rgbaPixel {
    uint8_t r;
    uint8_t g;
    uint8_t b;
    uint8_t a;
};

// Copy the pixel buffers
for (unsigned int i = 0; i < sub.num_rects; ++i) {
    AVSubtitleRect* rect = sub.rects[i];
    for (int y = 0; y < rect->h; ++y) {
        int dest_y = y + rect->y;
        // data[0] holds index data
        uint8_t *in_linedata = rect->data[0] + y * rect->linesize[0];
        // In AVFrame, data[0] holds the pixel buffer directly
        uint8_t *out_linedata = frame->data[0] + dest_y * frame->linesize[0];
        rgbaPixel *out_pixels = reinterpret_cast<rgbaPixel*>(out_linedata);
        for (int x = 0; x < rect->w; ++x) {
            // data[1] contains the color map
            // compare libavcodec/dvbsubenc.c
            uint8_t colidx = in_linedata[x];
            uint32_t color = reinterpret_cast<uint32_t*>(rect->data[1])[colidx];
            // Now store the pixel in the target buffer
            out_pixels[x + rect->x] = rgbaPixel{
                .r = static_cast<uint8_t>((color >> 16) & 0xff),
                .g = static_cast<uint8_t>((color >>  8) & 0xff),
                .b = static_cast<uint8_t>((color >>  0) & 0xff),
                .a = static_cast<uint8_t>((color >> 24) & 0xff),
            };
        }
    }
}
I did manage to push that AVFrame through an image encoder to output it as a bitmap image, and it looked OK. I did get green areas where the alpha channel is, but that might be an artifact of the settings in the JPEG encoder I used.

unity3d: Use main camera's depth buffer for rendering another camera view

After my main camera renders, I'd like to use (or copy) its depth buffer to a (disabled) camera's depth buffer.
My goal is to draw particles onto a smaller render target (using a separate camera) while using the depth buffer after opaque objects are drawn.
I can't do this in a single camera, since the goal is to use a smaller render target for the particles for performance reasons.
Replacement shaders in Unity aren't an option either: I want my particles to use their existing shaders - I just want the depth buffer of the particle camera to be overwritten with a subsampled version of the main camera's depth buffer before the particles are drawn.
I didn't get any reply to my earlier question; hence, the repost.
Here's the script attached to my main camera. It renders all the non-particle layers and I use OnRenderImage to invoke the particle camera.
public class MagicRenderer : MonoBehaviour {
    public Shader particleShader; // shader that uses the main camera's depth buffer to depth test particle Z
    public Material blendMat;     // material that uses a simple blend shader
    public int downSampleFactor = 1;

    private RenderTexture particleRT;
    private static GameObject pCam;

    void Awake () {
        // make the main camera's depth buffer available to the shaders via _CameraDepthTexture
        camera.depthTextureMode = DepthTextureMode.Depth;
    }

    // Update is called once per frame
    void Update () {
    }

    void OnRenderImage(RenderTexture src, RenderTexture dest) {
        // create tmp RT
        particleRT = RenderTexture.GetTemporary (Screen.width / downSampleFactor, Screen.height / downSampleFactor, 0);
        particleRT.antiAliasing = 1;
        // create particle cam
        Camera pCam = GetPCam ();
        pCam.CopyFrom (camera);
        pCam.clearFlags = CameraClearFlags.SolidColor;
        pCam.backgroundColor = new Color (0.0f, 0.0f, 0.0f, 0.0f);
        pCam.cullingMask = 1 << LayerMask.NameToLayer ("Particles");
        pCam.useOcclusionCulling = false;
        pCam.targetTexture = particleRT;
        pCam.depth = 0;
        // Draw to particleRT's colorBuffer using mainCam's depth buffer
        // ?? - how do I transfer this camera's depth buffer to pCam?
        pCam.Render ();
        // pCam.RenderWithShader (particleShader, "Transparent"); // I don't want to replace the shaders my particles use; so shader replacement isn't an option.
        // blend mainCam's colorBuffer with particleRT's colorBuffer
        // Graphics.Blit(pCam.targetTexture, src, blendMat);
        // copy resulting buffer to destination
        Graphics.Blit (pCam.targetTexture, dest);
        // clean up
        RenderTexture.ReleaseTemporary(particleRT);
    }

    static public Camera GetPCam() {
        if (!pCam) {
            GameObject oldpcam = GameObject.Find("pCam");
            Debug.Log (oldpcam);
            if (oldpcam) Destroy(oldpcam);
            pCam = new GameObject("pCam");
            pCam.AddComponent<Camera>();
            pCam.camera.enabled = false;
            pCam.hideFlags = HideFlags.DontSave;
        }
        return pCam.camera;
    }
}
I've a few additional questions:
1) Why does camera.depthTextureMode = DepthTextureMode.Depth; end up drawing all the objects in the scene just to write to the Z-buffer? Using Intel GPA, I see two passes before OnRenderImage gets called:
(i) Z-PrePass, that only writes to the depth buffer
(ii) Color pass, that writes to both the color and depth buffer.
2) I re-rendered the opaque objects to pCam's RT using a replacement shader that writes (0,0,0,0) to the colorBuffer with ZWrite On (to overcome the depth buffer transfer problem). After that, I reset the layers and clear mask as follows:
pCam.cullingMask = 1 << LayerMask.NameToLayer ("Particles");
pCam.clearFlags = CameraClearFlags.Nothing;
and rendered them using pCam.Render().
I thought this would render the particles using their existing shaders with the ZTest.
Unfortunately, what I notice is that the depth-stencil buffer is cleared before the particles are drawn (despite me not clearing anything).
Why does this happen?
It's been 5 years, but I developed an almost complete solution for rendering particles into a smaller, separate render target. I am writing this for future visitors; a fair amount of knowledge is still required.
Copying the depth
First, you have to get the scene depth in the resolution of your smaller render texture.
This can be done by creating a new render texture with the color format "depth".
To write the scene depth to the low resolution depth, create a shader that just outputs the depth:
struct fragOut {
    float depth : DEPTH;
};

sampler2D _LastCameraDepthTexture;

fragOut frag (v2f i){
    fragOut tOut;
    tOut.depth = tex2D(_LastCameraDepthTexture, i.uv).x;
    return tOut;
}
_LastCameraDepthTexture is automatically filled by Unity, but there is a downside.
It only comes for free if the main camera renders with deferred rendering.
For forward shading, Unity seems to render the scene again just for the depth texture.
Check the frame debugger.
Then, add a post processing effect to the main camera that executes the shader:
protected virtual void OnRenderImage(RenderTexture pFrom, RenderTexture pTo) {
    Graphics.Blit(pFrom, mSmallerSceneDepthTexture, mRenderToDepthMaterial);
    Graphics.Blit(pFrom, pTo);
}
You can probably do this without the second blit, but it was easier for me for testing.
Using the copied depth for rendering
To use the new depth texture for your second camera, call
mSecondCamera.SetTargetBuffers(mParticleRenderTexture.colorBuffer, mSmallerSceneDepthTexture.depthBuffer);
Keep targetTexture empty.
You then must ensure the second camera does not clear the depth, only the color.
For this, disable clear on the second camera completely and clear manually like this
Graphics.SetRenderTarget(mParticleRenderTexture);
GL.Clear(false, true, Color.clear);
I recommend also rendering the second camera by hand. Disable it and call
mSecondCamera.Render();
after clearing.
Merging
Now you have to merge the main view and the separate layer.
Depending on your rendering, you will probably end up with a render texture with so-called premultiplied alpha.
To mix this with the rest, use a post processing step on the main camera with
fixed4 tBasis = tex2D(_MainTex, i.uv);
fixed4 tToInsert = tex2D(TransparentFX, i.uv);
//beware premultiplied alpha in insert
tBasis.rgb = tBasis.rgb * (1.0f- tToInsert.a) + tToInsert.rgb;
return tBasis;
Additive materials work out of the box, but alpha blended do not.
You have to create a shader with custom blending to create working alpha blended materials. The blending is
Blend SrcAlpha OneMinusSrcAlpha, One OneMinusSrcAlpha
This changes how the alpha channel is modified for every blend operation performed.
Results
(The original answer includes screenshots: additive blending in front of alpha blending and alpha blending in front of additive blending, each with the fx layer's RGB and alpha channels shown.)
I have not yet tested whether the performance actually increases.
If anyone has a simpler solution, let me know please.
I managed to reuse the camera's Z-buffer "manually" in the shader used for rendering. See http://forum.unity3d.com/threads/reuse-depth-buffer-of-main-camera.280460/ for more.
Just alter the particle shader you already use for particle rendering.

Accessing GPU memory in OpenCL/C++Amp

I need to find information about how the Unified Shader Array accesses GPU memory, to get an idea of how to use it effectively. The architecture diagram of my graphics card doesn't show it clearly.
I need to load a big image into GPU memory using C++ AMP and divide it into small pieces (like 4x4 pixels). Every piece should be computed by a different thread. I don't know how the threads share access to the image.
Is there any way of doing it such that the threads aren't blocking each other while accessing the image? Maybe they have their own memory that can be accessed exclusively?
Or maybe access to the unified memory is so fast that I shouldn't care about it (though I doubt it)? This is really important, because I need to compute about 10k subsets for every image.
For C++ AMP you want to load the data that each thread within a tile uses into tile_static memory before starting your convolution calculation. Because each thread accesses pixels which are also read by other threads, this allows you to do a single read for each pixel from (slow) global memory and cache it in (fast) tile_static memory, so that all of the subsequent reads are faster.
You can see an example of tiling for convolution here. The DetectEdgeTiled method loads all the data that it requires and then calls idx.barrier.wait() to ensure all the threads have finished writing data into tile_static memory. Then it executes the edge detection code, taking advantage of tile_static memory. There are many other examples of this pattern in the samples. Note that the loading code in DetectEdgeTiled is complex only because it must account for the additional pixels around the edge of the pixels being written in the current tile; it is essentially an unrolled loop, hence its length.
I'm not sure you are thinking about the problem in quite the right way. There are two levels of partitioning here. To calculate the new value for each pixel, the thread doing this work reads the block of surrounding pixels. In addition, blocks (tiles) of threads load larger blocks of pixel data into tile_static memory. Each thread in the tile then calculates the result for one pixel within the block.
void ApplyEdgeDetectionTiledHelper(const array<ArgbPackedPixel, 2>& srcFrame,
                                   array<ArgbPackedPixel, 2>& destFrame)
{
    tiled_extent<tileSize, tileSize> computeDomain = GetTiledExtent(srcFrame.extent);
    parallel_for_each(computeDomain.tile<tileSize, tileSize>(),
        [=, &srcFrame, &destFrame](tiled_index<tileSize, tileSize> idx) restrict(amp)
    {
        DetectEdgeTiled(idx, srcFrame, destFrame);
    });
}
void DetectEdgeTiled(
    tiled_index<tileSize, tileSize> idx,
    const array<ArgbPackedPixel, 2>& srcFrame,
    array<ArgbPackedPixel, 2>& destFrame) restrict(amp)
{
    const UINT shift = imageBorderWidth / 2;
    const UINT startHeight = 0;
    const UINT startWidth = 0;
    const UINT endHeight = srcFrame.extent[0];
    const UINT endWidth = srcFrame.extent[1];

    tile_static RgbPixel localSrc[tileSize + imageBorderWidth]
                                 [tileSize + imageBorderWidth];

    const UINT global_idxY = idx.global[0];
    const UINT global_idxX = idx.global[1];
    const UINT local_idxY = idx.local[0];
    const UINT local_idxX = idx.local[1];
    const UINT local_idx_tsY = local_idxY + shift;
    const UINT local_idx_tsX = local_idxX + shift;

    // Copy image data to tile_static memory. The if clauses are required to deal with threads that own a
    // pixel close to the edge of the tile and need to copy additional halo data.

    // This pixel
    index<2> gNew = index<2>(global_idxY, global_idxX);
    localSrc[local_idx_tsY][local_idx_tsX] = UnpackPixel(srcFrame[gNew]);

    // Left edge
    if (local_idxX < shift)
    {
        index<2> gNew = index<2>(global_idxY, global_idxX - shift);
        localSrc[local_idx_tsY][local_idx_tsX - shift] = UnpackPixel(srcFrame[gNew]);
    }
    // Right edge
    // Top edge
    // Bottom edge
    // Top Left corner
    // Bottom Left corner
    // Bottom Right corner
    // Top Right corner

    // Synchronize all threads so that none of them start calculation before
    // all data is copied onto the current tile.
    idx.barrier.wait();

    // Make sure that the thread is not referring to a border pixel
    // for which the filter cannot be applied.
    if ((global_idxY >= startHeight + 1 && global_idxY <= endHeight - 1) &&
        (global_idxX >= startWidth + 1 && global_idxX <= endWidth - 1))
    {
        RgbPixel result = Convolution(localSrc, index<2>(local_idx_tsY, local_idx_tsX));
        destFrame[index<2>(global_idxY, global_idxX)] = result;
    }
}
This code was taken from CodePlex and I stripped out a lot of the real implementation to make it clearer.
WRT #sharpneli's answer, you can use texture<> in C++ AMP to achieve the same result as OpenCL images. There is also an example of this on CodePlex.
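As a hedged illustration (not the CodePlex sample itself), a texture<>-based read might look roughly like this; the function and buffer names are invented for the example, and the per-pixel work is just a copy:
#include <amp.h>
#include <amp_graphics.h>
#include <vector>
using namespace concurrency;
using namespace concurrency::graphics;

// Sketch: read pixels through a texture<> (which goes through the GPU's
// texture path, much like an OpenCL image) and write results into a
// regular array<>.
void copy_through_texture(const std::vector<unsigned int>& srcPixels,
                          std::vector<unsigned int>& dstPixels,
                          int height, int width)
{
    texture<unsigned int, 2> srcTex(height, width, srcPixels.begin(), srcPixels.end());
    array<unsigned int, 2> dst(height, width);

    parallel_for_each(dst.extent, [&srcTex, &dst](index<2> idx) restrict(amp)
    {
        // Replace this copy with your per-pixel computation.
        dst[idx] = srcTex[idx];
    });

    copy(dst, dstPixels.begin());
}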
In this particular case you do not have to worry. Just use OpenCL images. GPUs are extremely good at simply reading images (thanks to texturing hardware). However, this method requires writing the result into a separate image, because you cannot read and write the same image in a single kernel. You should use this if you can perform the computation as a single pass (no need to iterate).
Another way is to access it as a normal memory buffer, load the parts needed by a wavefront (a group of threads running in sync) into local memory (which is blazingly fast), perform the computation, and write the complete end result back into unified memory afterwards. You should use this approach if you need to read and write values in the same image while computing. If you are not memory bound, you can still read the original values from a texture, then iterate in local memory and write the end results into a separate image.
Reads from unified memory are slow only if the buffer is not const * restrict and multiple threads read the same location. In general, if consecutive thread ids read consecutive locations, it's rather fast. However, if your threads both read from and write to unified memory, it's going to be slow.

Why is the transitive closure of matrix multiplication not working in this vertex shader?

This can probably be filed under premature optimization, but since the vertex shader executes on each vertex for each frame, it seems like something worth doing (I have a lot of vars I need to multiply before going to the pixel shader).
Essentially, the vertex shader performs this operation to convert a vector to projected space, like so:
// Transform the vertex position into projected space.
pos = mul(pos, model);
pos = mul(pos, view);
pos = mul(pos, projection);
output.pos = pos;
Since I'm doing this operation to multiple vectors in the shader, it made sense to combine those matrices into a cumulative matrix on the CPU and then flush that to the GPU for calculation, like so:
// VertexShader.hlsl
cbuffer ModelViewProjectionConstantBuffer : register (b0)
{
    matrix model;
    matrix view;
    matrix projection;
    matrix cummulative;
    float3 eyePosition;
};
...
// Transform the vertex position into projected space.
pos = mul(pos, cummulative);
output.pos = pos;
And on the CPU:
// Renderer.cpp
// now is also the time to update the cummulative matrix
m_constantMatrixBufferData->cummulative =
    m_constantMatrixBufferData->model *
    m_constantMatrixBufferData->view *
    m_constantMatrixBufferData->projection;
// NOTE: each of the above vars is an XMMATRIX
My intuition was that there was some mismatch of row-major/column-major, but XMMATRIX is a row-major struct (and all of its operators treat it as such) and mul(...) interprets its matrix parameter as row-major. So that doesn't seem to be the problem but perhaps it still is in a way that I don't understand.
I've also checked the contents of the cumulative matrix and they appear correct, further adding to the confusion.
Thanks for reading, I'll really appreciate any hints you can give me.
EDIT (additional information requested in comments):
This is the struct that I am using as my matrix constant buffer:
// a constant buffer that contains the 3 matrices needed to
// transform points so that they're rendered correctly
struct ModelViewProjectionConstantBuffer
{
    DirectX::XMMATRIX model;
    DirectX::XMMATRIX view;
    DirectX::XMMATRIX projection;
    DirectX::XMMATRIX cummulative;
    DirectX::XMFLOAT3 eyePosition;
    // and padding to make the size divisible by 16
    float padding;
};
I create the matrix stack in CreateDeviceResources (along with my other constant buffers), like so:
void ModelRenderer::CreateDeviceResources()
{
    Direct3DBase::CreateDeviceResources();

    // Let's take this moment to create some constant buffers
    ... // creation of other constant buffers

    // and lastly, the all mighty matrix buffer
    CD3D11_BUFFER_DESC constantMatrixBufferDesc(sizeof(ModelViewProjectionConstantBuffer), D3D11_BIND_CONSTANT_BUFFER);
    DX::ThrowIfFailed(
        m_d3dDevice->CreateBuffer(
            &constantMatrixBufferDesc,
            nullptr,
            &m_constantMatrixBuffer
        )
    );

    ... // and the rest of the initialization (reading in the shaders, loading assets, etc)
}
I write into the matrix buffer inside a matrix stack class I created. The client of the class calls Update() once they are done modifying the matrices:
void MatrixStack::Update()
{
    // then update the buffers
    m_constantMatrixBufferData->model = model.front();
    m_constantMatrixBufferData->view = view.front();
    m_constantMatrixBufferData->projection = projection.front();
    // NOTE: the eye position has no stack, as it's kept updated by the trackball

    // now is also the time to update the cummulative matrix
    m_constantMatrixBufferData->cummulative =
        m_constantMatrixBufferData->model *
        m_constantMatrixBufferData->view *
        m_constantMatrixBufferData->projection;

    // and flush
    m_d3dContext->UpdateSubresource(
        m_constantMatrixBuffer.Get(),
        0,
        NULL,
        m_constantMatrixBufferData,
        0,
        0
    );
}
Given your code snippets, it should work.
Possible causes of your problem:
Have you tried the inverted multiplication, projection * view * model? (See the sketch after this list.)
Are you sure you set cummulative correctly (register index = 12, offset = 192) in your constant buffer?
And the same for eyePosition (register index = 12, offset = 256)?
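For reference, a minimal sketch of the reversed concatenation suggested in the first point, using the member names from the question (this only swaps the order; whether it is the right fix depends on how the individual matrices are stored, e.g. whether they were already transposed for HLSL's default packing):
// Hypothetical re-ordering of the concatenation in MatrixStack::Update()
m_constantMatrixBufferData->cummulative =
    m_constantMatrixBufferData->projection *
    m_constantMatrixBufferData->view *
    m_constantMatrixBufferData->model;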

Resources