Loading thread using a second shared OpenGL context - multithreading

My plan was to create a loading thread in which I load resources for a game, such as 3D models, shaders, and textures. On the main thread I perform all the game logic and rendering. Then, on my loading thread, I create an sf::Context (SFML shared OpenGL context) which is used only for loading.
This works for loading shaders. However, the X server sometimes crashes when attempting to load models. I have narrowed the crash down to the glBufferData() call, and I have checked that there is nothing wrong with the data that I am sending.
Is it possible to call glBufferData() from a second thread using a second OpenGL context? If not, why is it possible to load shaders in the second context? If it is possible, what could be going wrong?
#include <iostream>
#include <thread>
#include <GL/glew.h>
#include <SFML/OpenGL.hpp>
#include <SFML/Graphics.hpp>
#include <X11/Xlib.h>
class ResourceLoader
{
public:
void Run()
{
sf::Context loadingContext;
loadingContext.setActive(true);
// Some test data.
float* testData = new float[3000];
for (unsigned int i = 0; i < 3000; ++i)
{
testData[i] = 0.0f;
}
// Create lots of VBOs containing our
// test data.
for (unsigned int i = 0; i < 1000; ++i)
{
// Create VBO.
GLuint testVBO = 0;
glGenBuffers(1, &testVBO);
std::cout << "Buffer ID: " << testVBO << std::endl;
// Bind VBO.
glBindBuffer(GL_ARRAY_BUFFER, testVBO);
// Crashes on this call!
glBufferData(
GL_ARRAY_BUFFER,
sizeof(float) * 3000,
&testData[0],
GL_STATIC_DRAW
);
// Unbind VBO.
glBindBuffer(GL_ARRAY_BUFFER, 0);
// Sleep for a bit.
std::this_thread::sleep_for(std::chrono::milliseconds(10));
}
delete[] testData;
}
};
int main()
{
XInitThreads();
// Create the main window.
sf::RenderWindow window(sf::VideoMode(800, 600), "SFML window", sf::Style::Default, sf::ContextSettings(32));
window.setVerticalSyncEnabled(true);
// Make it the active window for OpenGL calls.
window.setActive();
// Configure the OpenGL viewport to be the same size as the window.
glViewport(0, 0, window.getSize().x, window.getSize().y);
// Initialize GLEW.
glewExperimental = GL_TRUE; // OSX fix.
if (glewInit() != GLEW_OK)
{
window.close();
exit(1); // failure
}
// Enable Z-buffer reading and writing.
glEnable(GL_DEPTH_TEST);
glDepthMask(GL_TRUE);
// Create the resource loader.
ResourceLoader loader;
// Run the resource loader in a separate thread.
std::thread loaderThread(&ResourceLoader::Run, &loader);
// Detach the loading thread, allowing it to run
// in the background.
loaderThread.detach();
// Game loop.
while (window.isOpen())
{
// Event loop.
sf::Event event;
while (window.pollEvent(event))
{
if (event.type == sf::Event::Closed)
{
window.close();
}
}
// Clear screen.
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
// Switch to SFML's OpenGL state.
window.pushGLStates();
{
// Perform SFML drawing here...
sf::RectangleShape rect(sf::Vector2f(100.0f, 100.0f));
rect.setPosition(100.0f, 100.0f);
rect.setFillColor(sf::Color(255, 255, 255));
window.draw(rect);
}
// Switch back to our game rendering OpenGL state.
window.popGLStates();
// Perform OpenGL drawing here...
// Display the rendered frame.
window.display();
}
return 0;
}

I think the problem is that you call glBufferData before setting up GLEW, so the function pointer to glBufferData is not initialized. Please try this ordering for initializing your program:
Initialize the RenderWindow
Initialize GLEW
Start threads, and create additional contexts as required!
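Concretely, one defensive option is to run glewInit() in the loading thread as well, right after its context is made current, since GLEW's function pointers are only guaranteed valid for a context compatible with the one that was current at init time. This is an untested sketch against the question's own ResourceLoader, not a verified fix:

```cpp
#include <iostream>
#include <GL/glew.h>
#include <SFML/OpenGL.hpp>
#include <SFML/Window.hpp>

// Sketch: re-run glewInit() inside the loading thread, after the shared
// context is made current there. If the pointer to glBufferData was never
// loaded for this thread's context, this fills it in before any use.
void Run()
{
    sf::Context loadingContext;   // shares resources with the main context
    loadingContext.setActive(true);

    glewExperimental = GL_TRUE;
    if (glewInit() != GLEW_OK)
    {
        std::cerr << "glewInit failed in loading thread" << std::endl;
        return;
    }

    // ... glGenBuffers / glBufferData calls go here ...
}
```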

Related

Is there a way to create GLX context after Xlib's window creation?

I'm trying to create an OpenGL (GLX) context after the Xlib window has been created; that is, to separate the Xlib window creation and the OpenGL context creation into two different phases.
Win32 window/OpenGL context creation was rather simple, but I couldn't find any resource that illustrates the same process with Xlib/OpenGL on Linux.
This is how it's done for Xlib on Linux:
GLint glxAttribs[] = {
GLX_RGBA,
GLX_DOUBLEBUFFER,
GLX_DEPTH_SIZE, 24,
GLX_STENCIL_SIZE, 8,
GLX_RED_SIZE, 8,
GLX_GREEN_SIZE, 8,
GLX_BLUE_SIZE, 8,
GLX_SAMPLE_BUFFERS, 0,
GLX_SAMPLES, 0,
None
};
XVisualInfo* visual = glXChooseVisual(display, screenId, glxAttribs);
XSetWindowAttributes windowAttribs;
windowAttribs.border_pixel = BlackPixel(display, screenId);
windowAttribs.background_pixel = WhitePixel(display, screenId);
windowAttribs.override_redirect = True;
windowAttribs.colormap = XCreateColormap(display, RootWindow(display, screenId), visual->visual, AllocNone);
windowAttribs.event_mask = ExposureMask;
window = XCreateWindow(display, RootWindow(display, screenId), 0, 0, 320, 200, 0, visual->depth, InputOutput, visual->visual, CWBackPixel | CWColormap | CWBorderPixel | CWEventMask, &windowAttribs);
This is how it's done on Windows:
const WindowsWindow* pWin32Window = (const WindowsWindow*)pOwnerWindow;
HWND windowHandle = pWin32Window->GetWin32WindowHandle();
HDC windowDeviceContext = pWin32Window->GetWin32WindowDeviceContext();
/*
* Create pixel format
*/
PIXELFORMATDESCRIPTOR pfd = { sizeof(pfd),1 };
memset(&pfd, 0, sizeof(PIXELFORMATDESCRIPTOR));
pfd.nSize = sizeof(PIXELFORMATDESCRIPTOR);
pfd.dwFlags = PFD_DRAW_TO_WINDOW | PFD_SUPPORT_OPENGL | PFD_DOUBLEBUFFER;
pfd.iPixelType = PFD_TYPE_RGBA;
pfd.nVersion = 1;
pfd.cColorBits = OpenGLDeviceUtilsWin32::GetColorBits(desc.SwapchainBufferFormat);
pfd.cAlphaBits = OpenGLDeviceUtilsWin32::GetAlphaBits(desc.SwapchainBufferFormat);
pfd.cDepthBits = OpenGLDeviceUtilsWin32::GetDepthBits(desc.SwapchainDepthStencilBufferFormat);
pfd.cStencilBits = OpenGLDeviceUtilsWin32::GetStencilBits(desc.SwapchainDepthStencilBufferFormat);
pfd.cAuxBuffers = 3;
pfd.iLayerType = PFD_MAIN_PLANE;
const int pixelFormatIndex = ChoosePixelFormat(windowDeviceContext, &pfd);
ASSERT(pixelFormatIndex != 0,"OpenGLDevice","Invalid pixel format");
ASSERT(SetPixelFormat(windowDeviceContext, pixelFormatIndex, &pfd), "OpenGLDevice", "Win32 window rejected the specified pixel format");
HGLRC tempContext = wglCreateContext(windowDeviceContext);
ASSERT(tempContext != NULL, "OpenGLDevice", "Creation of wgl dummy context failed!");
wglMakeCurrent(windowDeviceContext, tempContext);
PFNWGLCREATECONTEXTATTRIBSARBPROC wglCreateContextAttribsARB = NULL;
wglCreateContextAttribsARB = (PFNWGLCREATECONTEXTATTRIBSARBPROC)wglGetProcAddress("wglCreateContextAttribsARB");
ASSERT(wglCreateContextAttribsARB != NULL, "OpenGLDevice", "WGL get proc address failed!");
But I would expect something like this:
Create the Xlib window
Check, via GLX attribs, whether the window can support the desired pixel format
Create the GLX context using that pixel format
But instead it goes:
Create the window with your specific GLX attribs
Create the GLX context
I wonder if there is a way to create the window without letting Xlib know we are going to use it for OpenGL, and to implement the OpenGL-specific setup separately from the window creation process.
I'm trying to create OpenGLx context after the Xlib's window creation.
I don't really see your problem. On Win32 the usual stanza is:
Create window
Select pixelformat
Set pixelformat on window
Get HDC from window and use it to create context
On GLX the stanza is:
Select visual for window
Create window that's compatible with visual
Create OpenGL context with the selected visual
Take note that in both Win32 and GLX there is no hard tie between the window and the OpenGL context. As long as the pixelformat/visual of an OpenGL context and a window are compatible, you can use them with each other.
The only difference between GLX and Win32 is how the pixelformat/visual is communicated to OpenGL context creation. In GLX it's done directly; in Win32 the pixelformat is communicated in a rather convoluted way, by means of the HDC of a window. And take note that in order to obtain a modern OpenGL context you actually have to go the route of OpenGL context creation with attributes, which works exactly the same in Win32 and GLX (with Win32 needing the added step of creating a dummy OpenGL context first in order to obtain the function pointer to wglCreateContextAttribsARB, which is directly available in GLX).
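For reference, the three GLX steps above can be sketched with the legacy glXCreateContext path. This is an illustrative, untested sketch with error handling omitted; window size and attribs are arbitrary:

```cpp
#include <X11/Xlib.h>
#include <GL/glx.h>

// Sketch of the GLX stanza: select a visual first, create a window
// compatible with it, then create the context from the same visual.
void createGlxWindow()
{
    Display* dpy = XOpenDisplay(nullptr);
    int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, GLX_DEPTH_SIZE, 24, None };

    // 1. Select visual for window
    XVisualInfo* vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);

    // 2. Create a window that's compatible with the visual
    XSetWindowAttributes swa = {};
    swa.colormap = XCreateColormap(dpy, DefaultRootWindow(dpy), vi->visual, AllocNone);
    Window win = XCreateWindow(dpy, DefaultRootWindow(dpy), 0, 0, 640, 480, 0,
                               vi->depth, InputOutput, vi->visual, CWColormap, &swa);

    // 3. Create the OpenGL context with the selected visual
    GLXContext ctx = glXCreateContext(dpy, vi, nullptr, True);
    glXMakeCurrent(dpy, win, ctx);
}
```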
Honestly, I do not understand your motivation.
Many implementations, like GLFW, get the Visual from the GLX/EGL APIs (including glXChooseFBConfig) and use it when creating a window. The GLX/EGL part can be abstracted by writing wrappers, so I don't see the need to go to the trouble of avoiding it.
That being said, it is still possible to avoid it, so I wrote the sample code for you.
// To build, execute the command below.
// c++ -Wall -Wextra -std=c++17 -o main main.cpp -lX11 -lGLX -lGL
#include <cstdio>
#include <chrono>
#include <thread>
#include <sys/time.h>
#include <unistd.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <GL/glx.h>
#include <GL/glxext.h>
#define OGL_MAJOR_VERSION 3
#define OGL_MINOR_VERSION 3
#define WINDOW_WIDTH 640
#define WINDOW_HEIGHT 360
#define FPS 60
static double get_time() {
static timeval s_tTimeVal;
gettimeofday(&s_tTimeVal, NULL);
double time = s_tTimeVal.tv_sec * 1000.0; // sec to ms
time += s_tTimeVal.tv_usec / 1000.0; // us to ms
return time;
}
struct TestWindowConfig {
int width = 640;
int height = 360;
};
class TestWindow final {
public:
explicit TestWindow(const TestWindowConfig& config) : m_config(config) {}
virtual ~TestWindow() {
if (m_display) {
if (m_xid) {
XDestroyWindow(m_display, m_xid);
}
XCloseDisplay(m_display);
}
}
bool create() {
m_display = XOpenDisplay(NULL);
if (!m_display) {
fprintf(stderr, "XOpenDisplay() failed\n");
return false;
}
XSetWindowAttributes x_attr;
x_attr.override_redirect = False;
x_attr.border_pixel = 0;
m_xid = XCreateWindow(m_display, DefaultRootWindow(m_display), 0, 0, m_config.width,
m_config.height, 0, CopyFromParent, InputOutput, CopyFromParent,
CWOverrideRedirect | CWBorderPixel, &x_attr);
if (!m_xid) {
fprintf(stderr, "XCreateWindow() failed\n");
return false;
}
XStoreName(m_display, m_xid, "X11-GLX Sample");
XMapWindow(m_display, m_xid);
m_wm_delete_window = XInternAtom(m_display, "WM_DELETE_WINDOW", True);
XSetWMProtocols(m_display, m_xid, &m_wm_delete_window, 1);
return true;
}
void show() const {
if (m_display && m_xid) {
XMapRaised(m_display, m_xid);
}
}
bool poll_events() {
if (!m_display) {
fprintf(stderr, "Display is null\n");
return false;
}
while (XPending(m_display) > 0) {
XEvent ev;
XNextEvent(m_display, &ev);
if (ev.type == ClientMessage) {
if ((Atom)ev.xclient.data.l[0] == m_wm_delete_window) {
m_should_close = true;
}
}
}
return true;
}
bool should_close() const { return m_should_close; }
Display* display() const { return m_display; }
Window xid() const { return m_xid; }
int screen_id() const { return DefaultScreen(m_display); }
private:
TestWindowConfig m_config;
Display* m_display = nullptr;
Window m_xid = 0;
Atom m_wm_delete_window;
bool m_should_close = false;
};
class TestGLContext final {
public:
explicit TestGLContext() = default;
virtual ~TestGLContext() = default;
bool create(const TestWindow& window) {
// clang-format off
int visual_attr[] = {
GLX_DRAWABLE_TYPE, GLX_WINDOW_BIT,
GLX_RENDER_TYPE, GLX_RGBA_BIT,
GLX_RED_SIZE, 8,
GLX_GREEN_SIZE, 8,
GLX_BLUE_SIZE, 8,
GLX_ALPHA_SIZE, 8,
GLX_DEPTH_SIZE, 0,
GLX_STENCIL_SIZE, 0,
GLX_DOUBLEBUFFER, True,
None
};
// clang-format on
int cfg_count;
auto fb_configs =
glXChooseFBConfig(window.display(), window.screen_id(), visual_attr, &cfg_count);
if (!fb_configs || (cfg_count < 1)) {
fprintf(stderr, "glXChooseFBConfig(): No config found\n");
return false;
}
PFNGLXCREATECONTEXTATTRIBSARBPROC glXCreateContextAttribsARB =
(PFNGLXCREATECONTEXTATTRIBSARBPROC)glXGetProcAddressARB(
(const GLubyte*)"glXCreateContextAttribsARB");
if (!glXCreateContextAttribsARB) {
fprintf(stderr, "Failed to load glXCreateContextAttribsARB\n");
return false;
}
// clang-format off
int ctx_attr[] = {
GLX_CONTEXT_PROFILE_MASK_ARB, GLX_CONTEXT_CORE_PROFILE_BIT_ARB,
GLX_CONTEXT_MAJOR_VERSION_ARB, OGL_MAJOR_VERSION,
GLX_CONTEXT_MINOR_VERSION_ARB, OGL_MINOR_VERSION,
0, 0
};
// clang-format on
m_ctx = glXCreateContextAttribsARB(window.display(), fb_configs[0], NULL, True, ctx_attr);
if (!m_ctx) {
fprintf(stderr, "Failed to create GLX Context\n");
return false;
}
m_should_destroy = true;
return true;
}
bool make_current(const TestWindow& window) {
if (glXMakeCurrent(window.display(), window.xid(), m_ctx) != True) {
fprintf(stderr, "glXMakeCurrent() Failed\n");
return false;
}
return true;
}
void swap_buffers(const TestWindow& window) { glXSwapBuffers(window.display(), window.xid()); }
static void* get_proc_address(const char* name) {
return reinterpret_cast<void*>(glXGetProcAddress((const GLubyte*)name));
}
void destroy(const TestWindow& window) {
glXDestroyContext(window.display(), m_ctx);
m_should_destroy = false;
}
bool should_destroy() const { return m_should_destroy; }
private:
GLXContext m_ctx;
bool m_should_destroy = false;
};
int main() {
// 1. Prepare Window and OpenGL Context
// In a typical design, TestWindow would own its GLContext.
// But to fit your needs, I separated them explicitly.
TestWindowConfig config{.width = WINDOW_WIDTH, .height = WINDOW_HEIGHT};
TestWindow window{config};
TestGLContext glctx{};
if (!window.create()) {
return 1;
}
if (!glctx.create(window) || !glctx.make_current(window)) {
if (glctx.should_destroy()) {
glctx.destroy(window);
}
return 1;
}
// 2. Load OpenGL functions
// Normally, you should use a loader library like glad.
// In this example, I omitted the loading part.
//
// if (!gladLoadGLLoader((GLADloadproc)glctx.get_proc_address)) {
// fprintf(stderr, "Failed to load OpenGL functions\n");
// return 1;
// }
// 3. Show the window and call OpenGL APIs
// As above, there are various problems in this implementation for real use.
window.show();
double last_time = get_time();
while (true) {
if (!window.poll_events() || window.should_close()) {
break;
}
auto delta_ms = get_time() - last_time;
if (auto diff = (1000.0 / FPS) - delta_ms; diff > 0) {
std::this_thread::sleep_for(std::chrono::milliseconds((long)diff));
continue;
}
// fprintf(stderr, "delta: %f\n", delta_ms);
glViewport(0, 0, config.width, config.height);
glClearColor(0.0f, 0.0f, 1.0f, 1.0f);
glClear(GL_COLOR_BUFFER_BIT);
glctx.swap_buffers(window);
last_time = get_time();
}
glctx.destroy(window);
return 0;
}

Is the NVIDIA h.264 encoder MFT leaking resources?

I've been struggling with a resource leak seemingly caused by NVIDIA's h.264 encoder MFT. Each time a frame is submitted to the encoder, the reference count of my D3D device is incremented by 1, and this reference is not given up even after shutting down the MFT. A bunch of threads are leaked as well.
I'm almost ready to bring this up with NVIDIA, but I'd like to first make sure there's nothing obvious I have missed. Please see my implementation below - I've tried to keep it as concise and clear as possible.
Arguments for why this might be a problem with NVIDIA's encoder:
This only happens with NVIDIA's encoder. No leak is observed when running on e.g. Intel's QuickSync.
Arguments for why this might be a problem in my code:
I've tried using a SinkWriter to write DXGI surfaces to a file in a similar fashion, and here the leak is not present. Unfortunately I don't have access to the source code of SinkWriter. I would be very happy if anyone could point me to some working sample code that I could compare against.
#pragma comment(lib, "D3D11.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "evr.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "Winmm.lib")
// std
#include <iostream>
#include <string>
// Windows
#include <windows.h>
#include <atlbase.h>
// DirectX
#include <d3d11.h>
// Media Foundation
#include <mfapi.h>
#include <mfplay.h>
#include <mfreadwrite.h>
#include <mferror.h>
#include <Codecapi.h>
// Error handling
#define CHECK(x) if (!(x)) { printf("%s(%d) %s was false\n", __FILE__, __LINE__, #x); throw std::exception(); }
#define CHECK_HR(x) { HRESULT hr_ = (x); if (FAILED(hr_)) { printf("%s(%d) %s failed with 0x%x\n", __FILE__, __LINE__, #x, hr_); throw std::exception(); } }
// Constants
constexpr UINT ENCODE_WIDTH = 1920;
constexpr UINT ENCODE_HEIGHT = 1080;
constexpr UINT ENCODE_FRAMES = 120;
void runEncode();
int main()
{
CHECK_HR(CoInitializeEx(NULL, COINIT_APARTMENTTHREADED));
CHECK_HR(MFStartup(MF_VERSION));
for (;;)
{
runEncode();
if (getchar() == 'q')
break;
}
CHECK_HR(MFShutdown());
return 0;
}
void runEncode()
{
CComPtr<ID3D11Device> device;
CComPtr<ID3D11DeviceContext> context;
CComPtr<IMFDXGIDeviceManager> deviceManager;
CComPtr<IMFVideoSampleAllocatorEx> allocator;
CComPtr<IMFTransform> transform;
CComPtr<IMFAttributes> transformAttrs;
CComQIPtr<IMFMediaEventGenerator> eventGen;
DWORD inputStreamID;
DWORD outputStreamID;
// ------------------------------------------------------------------------
// Initialize D3D11
// ------------------------------------------------------------------------
CHECK_HR(D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, D3D11_CREATE_DEVICE_VIDEO_SUPPORT | D3D11_CREATE_DEVICE_DEBUG, NULL, 0, D3D11_SDK_VERSION, &device, NULL, &context));
{
// Probably not necessary in this application, but maybe the MFT requires it?
CComQIPtr<ID3D10Multithread> mt(device);
CHECK(mt);
mt->SetMultithreadProtected(TRUE);
}
// Create device manager
UINT resetToken;
CHECK_HR(MFCreateDXGIDeviceManager(&resetToken, &deviceManager));
CHECK_HR(deviceManager->ResetDevice(device, resetToken));
// ------------------------------------------------------------------------
// Initialize hardware encoder MFT
// ------------------------------------------------------------------------
{
// Find the encoder
CComHeapPtr<IMFActivate*> activateRaw;
UINT32 activateCount = 0;
// Input & output types
MFT_REGISTER_TYPE_INFO inInfo = { MFMediaType_Video, MFVideoFormat_NV12 };
MFT_REGISTER_TYPE_INFO outInfo = { MFMediaType_Video, MFVideoFormat_H264 };
// Query for the adapter LUID to get a matching encoder for the device.
CComQIPtr<IDXGIDevice> dxgiDevice(device);
CHECK(dxgiDevice);
CComPtr<IDXGIAdapter> adapter;
CHECK_HR(dxgiDevice->GetAdapter(&adapter));
DXGI_ADAPTER_DESC adapterDesc;
CHECK_HR(adapter->GetDesc(&adapterDesc));
CComPtr<IMFAttributes> enumAttrs;
CHECK_HR(MFCreateAttributes(&enumAttrs, 1));
CHECK_HR(enumAttrs->SetBlob(MFT_ENUM_ADAPTER_LUID, (BYTE*)&adapterDesc.AdapterLuid, sizeof(LUID)));
CHECK_HR(MFTEnum2(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER, &inInfo, &outInfo, enumAttrs, &activateRaw, &activateCount));
CHECK(activateCount != 0);
// Choose the first returned encoder
CComPtr<IMFActivate> activate = activateRaw[0];
// Memory management
for (UINT32 i = 0; i < activateCount; i++)
activateRaw[i]->Release();
// Activate
CHECK_HR(activate->ActivateObject(IID_PPV_ARGS(&transform)));
// Get attributes
CHECK_HR(transform->GetAttributes(&transformAttrs));
}
// ------------------------------------------------------------------------
// Query encoder name (not necessary, but nice) and unlock for async use
// ------------------------------------------------------------------------
{
UINT32 nameLength = 0;
std::wstring name;
CHECK_HR(transformAttrs->GetStringLength(MFT_FRIENDLY_NAME_Attribute, &nameLength));
// IMFAttributes::GetString returns a null-terminated wide string
name.resize((size_t)nameLength + 1);
CHECK_HR(transformAttrs->GetString(MFT_FRIENDLY_NAME_Attribute, &name[0], (UINT32)name.size(), &nameLength));
name.resize(nameLength);
printf("Using %ls\n", name.c_str());
// Unlock the transform for async use and get event generator
CHECK_HR(transformAttrs->SetUINT32(MF_TRANSFORM_ASYNC_UNLOCK, TRUE));
CHECK(eventGen = transform);
}
// Get stream IDs (expect 1 input and 1 output stream)
{
HRESULT hr = transform->GetStreamIDs(1, &inputStreamID, 1, &outputStreamID);
if (hr == E_NOTIMPL)
{
inputStreamID = 0;
outputStreamID = 0;
hr = S_OK;
}
CHECK_HR(hr);
}
// ------------------------------------------------------------------------
// Configure hardware encoder MFT
// ------------------------------------------------------------------------
// Set D3D manager
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)));
// Set output type
CComPtr<IMFMediaType> outputType;
CHECK_HR(MFCreateMediaType(&outputType));
CHECK_HR(outputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
CHECK_HR(outputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
CHECK_HR(outputType->SetUINT32(MF_MT_AVG_BITRATE, 30000000));
CHECK_HR(MFSetAttributeSize(outputType, MF_MT_FRAME_SIZE, ENCODE_WIDTH, ENCODE_HEIGHT));
CHECK_HR(MFSetAttributeRatio(outputType, MF_MT_FRAME_RATE, 60, 1));
CHECK_HR(outputType->SetUINT32(MF_MT_INTERLACE_MODE, 2));
CHECK_HR(outputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE));
CHECK_HR(transform->SetOutputType(outputStreamID, outputType, 0));
// Set input type
CComPtr<IMFMediaType> inputType;
CHECK_HR(transform->GetInputAvailableType(inputStreamID, 0, &inputType));
CHECK_HR(inputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
CHECK_HR(inputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12));
CHECK_HR(MFSetAttributeSize(inputType, MF_MT_FRAME_SIZE, ENCODE_WIDTH, ENCODE_HEIGHT));
CHECK_HR(MFSetAttributeRatio(inputType, MF_MT_FRAME_RATE, 60, 1));
CHECK_HR(transform->SetInputType(inputStreamID, inputType, 0));
// ------------------------------------------------------------------------
// Create sample allocator
// ------------------------------------------------------------------------
{
CHECK_HR(MFCreateVideoSampleAllocatorEx(IID_PPV_ARGS(&allocator)));
CHECK(allocator);
CComPtr<IMFAttributes> allocAttrs;
CHECK_HR(MFCreateAttributes(&allocAttrs, 2));
CHECK_HR(allocAttrs->SetUINT32(MF_SA_D3D11_BINDFLAGS, D3D11_BIND_RENDER_TARGET));
CHECK_HR(allocAttrs->SetUINT32(MF_SA_D3D11_USAGE, D3D11_USAGE_DEFAULT));
CHECK_HR(allocator->SetDirectXManager(deviceManager));
CHECK_HR(allocator->InitializeSampleAllocatorEx(1, 2, allocAttrs, inputType));
}
// ------------------------------------------------------------------------
// Start encoding
// ------------------------------------------------------------------------
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, NULL));
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL));
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, NULL));
// Encode loop
for (int i = 0; i < ENCODE_FRAMES; i++)
{
// Get next event
CComPtr<IMFMediaEvent> event;
CHECK_HR(eventGen->GetEvent(0, &event));
MediaEventType eventType;
CHECK_HR(event->GetType(&eventType));
switch (eventType)
{
case METransformNeedInput:
{
CComPtr<IMFSample> sample;
CHECK_HR(allocator->AllocateSample(&sample));
CHECK_HR(transform->ProcessInput(inputStreamID, sample, 0));
// Dereferencing the device once after feeding each frame "fixes" the leak.
//device.p->Release();
break;
}
case METransformHaveOutput:
{
DWORD status;
MFT_OUTPUT_DATA_BUFFER outputBuffer = {};
outputBuffer.dwStreamID = outputStreamID;
CHECK_HR(transform->ProcessOutput(0, 1, &outputBuffer, &status));
DWORD bufCount;
DWORD bufLength;
CHECK_HR(outputBuffer.pSample->GetBufferCount(&bufCount));
CComPtr<IMFMediaBuffer> outBuffer;
CHECK_HR(outputBuffer.pSample->GetBufferByIndex(0, &outBuffer));
CHECK_HR(outBuffer->GetCurrentLength(&bufLength));
printf("METransformHaveOutput buffers=%d, bytes=%d\n", bufCount, bufLength);
// Release the sample as it is not processed further.
if (outputBuffer.pSample)
outputBuffer.pSample->Release();
if (outputBuffer.pEvents)
outputBuffer.pEvents->Release();
break;
}
}
}
// ------------------------------------------------------------------------
// Finish encoding
// ------------------------------------------------------------------------
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, NULL));
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, NULL));
CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL));
// Shutdown
printf("Finished encoding\n");
// I've tried all kinds of things...
//CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(nullptr)));
//transform->SetInputType(inputStreamID, NULL, 0);
//transform->SetOutputType(outputStreamID, NULL, 0);
//transform->DeleteInputStream(inputStreamID);
//deviceManager->ResetDevice(NULL, resetToken);
CHECK_HR(MFShutdownObject(transform));
}
I think the answer is “yes”.
I saw the problem before: Is it possible to shut down a D3D device?
To work around it, I stopped re-creating D3D devices. Instead I use a global CAtlMap collection. The keys are uint64_t values containing the LUID of the GPU, taken from the DXGI_ADAPTER_DESC::AdapterLuid field. The values are structures with two fields: CComPtr<ID3D11Device> and CComPtr<IMFDXGIDeviceManager>.
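That per-adapter cache can be sketched in portable terms like this. Device and DeviceManager are placeholders for CComPtr&lt;ID3D11Device&gt; and CComPtr&lt;IMFDXGIDeviceManager&gt;, and the std::unordered_map stands in for the CAtlMap; the real creation calls (D3D11CreateDevice, MFCreateDXGIDeviceManager) are omitted:

```cpp
#include <cstdint>
#include <memory>
#include <unordered_map>

struct Device {};         // placeholder for ID3D11Device
struct DeviceManager {};  // placeholder for IMFDXGIDeviceManager

struct CachedDevice {
    std::shared_ptr<Device> device;
    std::shared_ptr<DeviceManager> manager;
};

// Key: the 64-bit LUID from DXGI_ADAPTER_DESC::AdapterLuid.
std::unordered_map<uint64_t, CachedDevice> g_deviceCache;

CachedDevice& getDeviceForAdapter(uint64_t luid)
{
    auto it = g_deviceCache.find(luid);
    if (it == g_deviceCache.end()) {
        // First use of this GPU: create the device once and keep it alive
        // for the lifetime of the process, so the encoder's extra reference
        // never forces a device teardown.
        CachedDevice entry{ std::make_shared<Device>(), std::make_shared<DeviceManager>() };
        it = g_deviceCache.emplace(luid, std::move(entry)).first;
    }
    return it->second;
}
```

Each call with the same LUID returns the same cached device, so the encoder's leaked reference is only ever against a device you intended to keep alive anyway.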

thread sync using mutex and condition variable

I'm trying to implement a multi-threaded job with a producer and a consumer. Basically, when the consumer finishes the data, it should notify the producer so that the producer provides new data.
The tricky part is that in my current implementation the producer and consumer both notify and wait for each other, and I don't know how to implement this part correctly.
For example, see the code below,
mutex m;
condition_variable cv;
vector<int> Q; // this is the queue the consumer will consume
vector<int> Q_buf; // this is a buffer Q into which producer will fill new data directly
// consumer
void consume() {
while (1) {
if (Q.size() == 0) { // when consumer finishes data
unique_lock<mutex> lk(m);
// how to notify producer to fill up the Q?
...
cv.wait(lk);
}
// for-loop to process the elems in Q
...
}
}
// producer
void produce() {
while (1) {
// for-loop to fill up Q_buf
...
// once Q_buf is fully filled, wait until consumer asks to give it a full Q
unique_lock<mutex> lk(m);
cv.wait(lk);
Q.swap(Q_buf); // replace the empty Q with the full Q_buf
cv.notify_one();
}
}
I'm not sure the above code using a mutex and condition_variable is the right way to implement my idea; please give me some advice!
The code incorrectly assumes that vector<int>::size() and vector<int>::swap() are atomic. They are not.
Also, spurious wakeups must be handled by a while loop (or by the predicate overload of condition_variable::wait).
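For reference, the 2-argument (predicate) overload of wait folds that while loop in. A minimal standalone sketch, with illustrative names:

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex m;
std::condition_variable cv;
std::queue<int> q;

// Blocks until the queue is non-empty, then pops one element.
int pop_blocking() {
    std::unique_lock<std::mutex> lk(m);
    // Equivalent to: while (q.empty()) cv.wait(lk);
    // The predicate is re-checked after every wakeup, so spurious
    // wakeups are handled automatically.
    cv.wait(lk, [] { return !q.empty(); });
    int v = q.front();
    q.pop();
    return v;
}

void push(int v) {
    { std::lock_guard<std::mutex> lk(m); q.push(v); }
    cv.notify_one();
}
```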
Fixes:
mutex m;
condition_variable cv;
vector<int> Q;
// consumer
void consume() {
while(1) {
// Get the new elements.
vector<int> new_elements;
{
unique_lock<mutex> lk(m);
while(Q.empty())
cv.wait(lk);
new_elements.swap(Q);
}
// for-loop to process the elems in new_elements
}
}
// producer
void produce() {
while(1) {
vector<int> new_elements;
// for-loop to fill up new_elements
// publish new_elements
{
unique_lock<mutex> lk(m);
Q.insert(Q.end(), new_elements.begin(), new_elements.end());
cv.notify_one();
}
}
}
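A bounded, self-contained variant of the fix above that terminates cleanly may make the pattern easier to test. The batch contents and count are illustrative; a done flag (protected by the same mutex) lets the consumer exit:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

std::mutex m;
std::condition_variable cv;
std::vector<int> Q;          // shared buffer, guarded by m
bool done = false;           // set by the producer when finished, guarded by m
long long total = 0;         // sum of everything the consumer processed

void produce(int batches) {
    for (int b = 0; b < batches; ++b) {
        std::vector<int> new_elements{1, 2, 3};
        std::lock_guard<std::mutex> lk(m);
        Q.insert(Q.end(), new_elements.begin(), new_elements.end());
        cv.notify_one();
    }
    std::lock_guard<std::mutex> lk(m);
    done = true;
    cv.notify_one();
}

void consume() {
    while (true) {
        std::vector<int> batch;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return !Q.empty() || done; });
            if (Q.empty() && done)
                return;            // producer finished and nothing is left
            batch.swap(Q);         // take everything published so far
        }
        for (int x : batch)        // process outside the lock
            total += x;
    }
}
```

Spawning std::thread(consume) and std::thread(produce, N) and joining both leaves total equal to 6*N with these batch contents.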
Maybe this is close to what you want to achieve. I used two condition variables to notify producers and consumers of each other, and introduced a variable denoting whose turn it is:
#include <ctime>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
template<typename T>
class ReaderWriter {
private:
std::vector<std::thread> readers;
std::vector<std::thread> writers;
std::condition_variable readerCv, writerCv;
std::queue<T> data;
std::mutex mtx; // one mutex for both sides: 'data' and 'turn' are shared
size_t noReaders, noWriters;
enum class Turn { WRITER_TURN, READER_TURN };
Turn turn;
void reader() {
while (1) {
{
std::unique_lock<std::mutex> lk(mtx);
while (turn != Turn::READER_TURN) {
readerCv.wait(lk);
}
std::cout << "Thread : " << std::this_thread::get_id() << " consumed " << data.front() << std::endl;
data.pop();
if (data.empty()) {
turn = Turn::WRITER_TURN;
writerCv.notify_one();
}
}
}
}
void writer() {
while (1) {
{
std::unique_lock<std::mutex> lk(mtx);
while (turn != Turn::WRITER_TURN) {
writerCv.wait(lk);
}
int random_number = std::rand(); // note: seed std::srand once at startup, not per iteration
data.push(random_number);
std::cout << "Thread : " << std::this_thread::get_id() << " produced " << random_number << std::endl;
turn = Turn::READER_TURN;
}
readerCv.notify_one();
}
}
public:
ReaderWriter(size_t noReadersArg, size_t noWritersArg) : noReaders(noReadersArg), noWriters(noWritersArg), turn(ReaderWriter::Turn::WRITER_TURN) {
}
void run() {
int noReadersArg = noReaders, noWritersArg = noWriters;
while (noReadersArg--) {
readers.emplace_back(&ReaderWriter::reader, this);
}
while (noWritersArg--) {
writers.emplace_back(&ReaderWriter::writer, this);
}
}
~ReaderWriter() {
for (auto& r : readers) {
r.join();
}
for (auto& w : writers) {
w.join();
}
}
};
int main() {
ReaderWriter<int> rw(5, 5);
rw.run();
}
Here's a code snippet. Since the worker threads are already synchronized, the requirement of two buffers is ruled out, so a simple queue is used to simulate the scenario:
#include "conio.h"
#include <iostream>
#include <thread>
#include <mutex>
#include <queue>
#include <atomic>
#include <condition_variable>
using namespace std;
enum state_t{ READ = 0, WRITE = 1 };
mutex mu;
condition_variable cv;
atomic<bool> running;
queue<int> buffer;
atomic<state_t> state;
void generate_test_data()
{
const int times = 5;
static int data = 0;
for (int i = 0; i < times; i++) {
data = (data + 1) % 100; // note: the original "data = (data++) % 100;" is undefined behavior
buffer.push(data);
}
}
void ProducerThread() {
while (running) {
unique_lock<mutex> lock(mu);
cv.wait(lock, []() { return !running || state == WRITE; });
if (!running) return;
generate_test_data(); //producing here
lock.unlock();
//notify consumer to start consuming
state = READ;
cv.notify_one();
}
}
void ConsumerThread() {
while (running) {
unique_lock<mutex> lock(mu);
cv.wait(lock, []() { return !running || state == READ; });
if (!running) return;
while (!buffer.empty()) {
auto data = buffer.front(); //consuming here
buffer.pop();
cout << data << " \n";
}
//notify producer to start producing
if (buffer.empty()) {
state = WRITE;
cv.notify_one();
}
}
}
int main(){
running = true;
thread producer = thread([]() { ProducerThread(); });
thread consumer = thread([]() { ConsumerThread(); });
//simulating gui thread
while (!getch()){
}
running = false;
producer.join();
consumer.join();
}
Not a complete answer, but I think two condition variables could be helpful: one named buffer_empty that the producer thread waits on, and another named buffer_filled that the consumer thread waits on. I can't comment on the number of mutexes, how to loop, and so on, since I'm not sure about the details myself.
Accesses to shared variables should only be done while holding the mutex that protects them.
condition_variable::wait should check a condition.
The condition should be a shared variable protected by the mutex that you pass to condition_variable::wait.
The way to check the condition is to wrap the call to wait in a while loop, or to use the 2-argument overload of wait (which is equivalent to the while-loop version).
Note: These rules aren't strictly necessary if you truly understand what the hardware is doing. However, these problems get complicated quickly even with simple data structures, and it will be easier to prove that your algorithm works correctly if you follow them.
Your Q and Q_buf are shared variables. Due to Rule 1, I would prefer to have them as local variables declared in the function that uses them (consume() and produce(), respectively). There will be 1 shared buffer that will be protected by a mutex. The producer will add to its local buffer. When that buffer is full, it acquires the mutex and pushes the local buffer to the shared buffer. It then waits for the consumer to accept this buffer before producing more data.
The consumer waits for this shared buffer to "arrive", then it acquires the mutex and replaces its empty local buffer with the shared buffer. Then it signals to the producer that the buffer has been accepted so it knows to start producing again.
Semantically, I don't see a reason to use swap over move, since in every case one of the containers is empty anyway. Maybe you want to use swap because you know something about the underlying memory. You can use whichever you want and it will be fast and work the same (at least algorithmically).
This problem can be done with 1 condition variable, but it may be a little easier to think about if you use 2.
Here's what I came up with. Tested on Visual Studio 2017 (15.6.7) and GCC 5.4.0. I don't need to be credited or anything (it's such a simple piece), but legally I have to say that I offer no warranties whatsoever.
#include <thread>
#include <vector>
#include <mutex>
#include <condition_variable>
#include <chrono>

std::vector<int> g_deliveryBuffer;
bool g_quit = false;
std::mutex g_mutex; // protects g_deliveryBuffer and g_quit
std::condition_variable g_producerDeliver;
std::condition_variable g_consumerAccepted;

// consumer
void consume()
{
    // local buffer
    std::vector<int> consumerBuffer;
    while (true)
    {
        if (consumerBuffer.empty())
        {
            std::unique_lock<std::mutex> lock(g_mutex);
            while (g_deliveryBuffer.empty() && !g_quit) // if we beat the producer, wait for them to push to the delivery buffer
                g_producerDeliver.wait(lock);
            if (g_quit)
                break;
            consumerBuffer = std::move(g_deliveryBuffer); // get the buffer
        }
        g_consumerAccepted.notify_one(); // notify the producer that the buffer has been accepted
        // for-loop to process the elems in Q
        // ...
        consumerBuffer.clear();
        // ...
    }
}

// producer
void produce()
{
    std::vector<int> producerBuffer;
    while (true)
    {
        // for-loop to fill up Q_buf
        // ...
        producerBuffer = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 };
        // ...
        // once Q_buf is fully filled, wait until consumer asks to give it a full Q
        { // scope is for lock
            std::unique_lock<std::mutex> lock(g_mutex);
            g_deliveryBuffer = std::move(producerBuffer); // ok to push to delivery buffer; it is guaranteed to be empty
            g_producerDeliver.notify_one();
            while (!g_deliveryBuffer.empty() && !g_quit)
                g_consumerAccepted.wait(lock); // wait for consumer to signal for more data
            if (g_quit)
                break;
            // We will never reach this point if the buffer is not empty.
        }
    }
}

int main()
{
    // spawn threads
    std::thread consumerThread(consume);
    std::thread producerThread(produce);

    // run for 5 seconds
    std::this_thread::sleep_for(std::chrono::seconds(5));

    // signal that it's time to quit
    {
        std::lock_guard<std::mutex> lock(g_mutex);
        g_quit = true;
    }

    // one of the threads may be sleeping
    g_consumerAccepted.notify_one();
    g_producerDeliver.notify_one();

    consumerThread.join();
    producerThread.join();
    return 0;
}

Qt C++ Displaying images outside the GUI thread (Boost thread)

I am developing a C++ library that realizes its interface by means of Qt, using VS2015. On the library side, 3 boost threads continuously load images from 3 folders. I am trying to display these images in 3 different QLabel (or equivalent QWidgets), so the thread body consists of this functionality, in particular by exploiting the setPixmap method. Although the call to the function is protected by a boost mutex, I get exceptions, probably due to thread synchronization. Looking for a solution, I became aware that QPixmap is not thread-safe (non-reentrant). I also tried to use QGraphicsView, but it in turn relies on QPixmap, so I ran into the same problem.
So my question is: does an alternative to QPixmap exist to display images in Qt in a thread-safe manner?
I would recommend not doing multi-threading in GUI programming. Although Qt provides multi-threading support in general, IMHO the widgets are not well prepared for it.
Thus, to achieve image loaders which run concurrently in separate threads, I would suggest the following concept:
Each threaded image loader feeds a private buffer. The GUI inspects these buffers from time to time (using a QTimer) and updates its QPixmaps. As the buffers must be accessible from the respective image-loader thread as well as the GUI thread, they have to be mutex-guarded, of course.
My sample code testLoadImageMT.cc:
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>

#include <QtWidgets>

// manually added types (normally provided by glib)
typedef unsigned guint;
typedef unsigned char guint8;

// the fluffy-cat image sample
struct Image {
    guint width;
    guint height;
    guint bytes_per_pixel; /* 3:RGB, 4:RGBA */
    guint8 pixel_data[1];
};

extern "C" const Image fluffyCat;

class ImageLoader {
private:
    const Image &_img;
    std::atomic<bool> _exit;
    std::mutex _lock;
    QImage _qImg;
    std::thread _thread;

public: // main thread API
    ImageLoader(const Image &img = fluffyCat):
        _img(img), _exit(false),
        _qImg(img.width, img.height, QImage::Format_RGB888),
        _thread(&ImageLoader::loadImage, std::ref(*this))
    { }
    ~ImageLoader()
    {
        _exit = true;
        _thread.join();
    }
    ImageLoader(const ImageLoader&) = delete;

    void applyImage(QLabel &qLblImg)
    {
        std::lock_guard<std::mutex> lock(_lock);
        qLblImg.setPixmap(QPixmap::fromImage(_qImg));
    }

private: // thread private
    void loadImage()
    {
        for (;;) {
            { std::lock_guard<std::mutex> lock(_lock);
                _qImg.fill(0);
            }
            size_t i = 0;
            for (int y = 0; y < (int)_img.height; ++y) {
                for (int x = 0; x < (int)_img.width; ++x) {
                    const quint32 value
                        = _img.pixel_data[i + 2]
                        | (_img.pixel_data[i + 1] << 8)
                        | (_img.pixel_data[i + 0] << 16)
                        | (0xff << 24);
                    i += _img.bytes_per_pixel;
                    { std::lock_guard<std::mutex> lock(_lock);
                        _qImg.setPixel(x, y, value);
                    }
                    if (_exit) return; // important: make thread co-operative
                }
                std::this_thread::sleep_for(std::chrono::milliseconds(100)); // slow down CPU cooler
            }
        }
    }
};

int main(int argc, char **argv)
{
    // settings:
    enum { N = 3 }; // number of images loaded/displayed
    enum { Interval = 50 }; // update rate for GUI: 50 ms -> 20 Hz (round about)

    // build appl.
    qDebug() << "Qt Version: " << QT_VERSION_STR;
    QApplication app(argc, argv);

    // build GUI
    QWidget qMainWin;
    QVBoxLayout qVBox;
    QLabel *pQLblImgs[N];
    for (int i = 0; i < N; ++i) {
        qVBox.addWidget(
            new QLabel(QString::fromUtf8("Image %1").arg(i + 1)));
        qVBox.addWidget(
            pQLblImgs[i] = new QLabel());
    }
    qMainWin.setLayout(&qVBox);
    qMainWin.show();

    // build image loaders
    ImageLoader imgLoader[N];

    // install timer
    QTimer qTimer;
    qTimer.setInterval(Interval); // ms
    QObject::connect(&qTimer, &QTimer::timeout,
        [&imgLoader, &pQLblImgs]() {
            for (int i = 0; i < N; ++i) {
                imgLoader[i].applyImage(*pQLblImgs[i]);
            }
        });
    qTimer.start();

    // exec. application
    return app.exec();
}
Sorry, I used std::thread instead of boost::thread, as I have no experience with the latter, nor a working installation. I believe (hope) the differences will be marginal. QThread would have been the "Qt native" alternative, but again, no experience.
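For readers who do need boost::thread, the member and its construction should look much the same (a hypothetical sketch, not compiled against any particular Boost version; the class and function names are illustrative):

```cpp
// Hypothetical: the std::thread member replaced by boost::thread.
#include <boost/thread.hpp>

class ImageLoaderBoost {
    boost::thread _thread;
public:
    ImageLoaderBoost()
        : _thread(&ImageLoaderBoost::loadImage, this) // same ctor style as std::thread
    { }
    ~ImageLoaderBoost()
    {
        _thread.join(); // boost::thread also offers interrupt() as an extra
    }
private:
    void loadImage() { /* ... same body as in ImageLoader above ... */ }
};
```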
To keep things simple, I just copied data out of a linked binary image (instead of loading one from file or from anywhere else). Hence, a second file has to be compiled and linked to make this an MCVE – fluffyCat.cc:
/* GIMP RGB C-Source image dump (fluffyCat.cc) */
// manually added types (normally provided by glib)
typedef unsigned guint;
typedef unsigned char guint8;
extern "C" const struct {
guint width;
guint height;
guint bytes_per_pixel; /* 3:RGB, 4:RGBA */
guint8 pixel_data[16 * 16 * 3 + 1];
} fluffyCat = {
16, 16, 3,
"x\211s\215\232\200gw`fx`at[cx^cw^fu\\itZerWn|ap~cv\204jnzedq^fr^kzfhv^Ra"
"GRbMWdR\\jXer^qw_\311\256\226\271\253\235\275\264\252\315\277\260\304\255"
"\231u~i\213\225\207l{fly`jx\\^nRlz_z\206nlx`t~i\221\211s\372\276\243\375"
"\336\275\376\352\340\356\312\301\235\216\212judgwcl~f\212\226u}\206h\212"
"\224q\231\237z\232\236{\216\225v\225\230\200\306\274\244\376\360\327\376"
"\361\331\376\360\341\326\275\272\253\240\244{\203p\202\220xp~e{\204^\222"
"\230n\212\217g\240\242{\234\236z\214\222r\270\271\247\360\353\340\376\370"
"\336\376\363\334\375\357\336\310\254\262\232\223\234\\gRfrX\204\220z\212"
"\225g\225\232j\254\255\177\252\250{\225\226u\304\302\265\374\365\351\376"
"\375\366\376\367\341\376\361\320\374\346\324\306\241\242\237\232\235n{fj"
"xckyfu~fUX#VZCfnT\231\231\207\374\374\371\377\372\354\376\376\374\376\376"
"\372\376\362\332\375\340\301\341\300\264\260\253\262jvdbq\\XkVJTDNTCCG8O"
"TE\322\321\313\377\377\375\376\376\373\376\377\376\376\376\375\376\374\362"
"\376\360\342\344\311\306\250\244\254R_PL^HXkT<#2OP#`dP\217\220\177\374\374"
"\370\377\377\374\376\375\371\377\377\376\376\374\360\377\367\336\376\350"
"\316\342\303\274\246\236\245jtbXdQTdNQYGU\\KchV\317\315\302\377\376\372\377"
"\376\367\376\373\360\377\376\367\376\366\337\376\355\312\374\331\271\323"
"\263\251\216\214\214\\hTP^HL\\FR[LMXI^dW\355\352\342\376\375\366\377\374"
"\360\376\374\361\376\374\361\376\356\321\374\331\264\374\330\266\330\270"
"\260\200||Y`SLVE>K9BJ<CN?VYP\347\330\322\376\366\345\376\363\330\376\367"
"\337\377\372\350\374\342\314\326\243\210\375\350\314\352\317\304shc^`TV`"
"RVbT>B4IS?PTD\244\232\216\374\355\320\376\354\311\376\351\306\376\362\332"
"\374\344\321\267\206u\375\362\337\326\274\272\\POMNBT]LNZH:<*<A*TV>OI;\242"
"\222\207\340\304\243\375\335\262\372\336\272\376\361\334\320\241\212\374"
"\352\322\266\233\237c\\WFH;MR>\\`F~xP\220\214[pqE\211\202\\g]=\230\214`\313"
"\266\207\344\303\240\362\336\274\323\257\201\333\304\240\305\252\204\254"
"\232p\216\206\\\206\203U\232\224b\234\244b\246\257m\220\232`\224\227h~\202"
"W\206\213]\204\210W\227\227i|\177RvzNlsGrtJwtLz}N{\204RlxF",
};
I compiled and tested with VS2013 and Qt 5.9.2 on Windows 10 (64-bit).
I solved it using signals/slots: the non-GUI thread emits a signal instead of displaying the image, and the connected slot paints the QLabel inside the GUI thread!
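That approach can be sketched roughly as follows (illustrative names, not a complete program; it needs moc because of Q_OBJECT). The essential point is that a signal emitted from another thread is delivered over a queued connection by default, so the slot body runs in the GUI thread:

```cpp
#include <QImage>
#include <QLabel>
#include <QObject>
#include <QPixmap>

class Loader : public QObject {
    Q_OBJECT
signals:
    void imageReady(const QImage &img); // QImage may safely cross threads

public:
    void runInWorkerThread() {
        QImage img; // ... load the next image here ...
        emit imageReady(img); // delivered as a queued event to the GUI thread
    }
};

// In the GUI thread, at setup time (label is the receiver/context object,
// so the lambda runs in the GUI thread):
//   QObject::connect(&loader, &Loader::imageReady, label,
//       [label](const QImage &img) {
//           label->setPixmap(QPixmap::fromImage(img)); // safe here
//       });
```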

what could be causing this opengl segfault in glBufferSubData?

I've been whittling down this segfault for a while, and here's a pretty minimal reproducible example on my machine (below). I have a sinking feeling that it's a driver bug, but I'm very unfamiliar with OpenGL, so it's more likely I'm just doing something wrong.
Is this correct OpenGL 3.3 code? Should it be fine regardless of platform, compiler, and so on?
Here's the code, compiled with gcc -ggdb -lGL -lSDL2
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "GL/gl.h"
#include "GL/glext.h"
#include "SDL2/SDL.h"

// this section is for loading OpenGL things from later versions.
typedef void (APIENTRY *GLGenVertexArrays) (GLsizei n, GLuint *arrays);
typedef void (APIENTRY *GLGenBuffers) (GLsizei n, GLuint *buffers);
typedef void (APIENTRY *GLBindVertexArray) (GLuint array);
typedef void (APIENTRY *GLBindBuffer) (GLenum target, GLuint buffer);
typedef void (APIENTRY *GLBufferData) (GLenum target, GLsizeiptr size, const GLvoid* data, GLenum usage);
typedef void (APIENTRY *GLBufferSubData) (GLenum target, GLintptr offset, GLsizeiptr size, const GLvoid* data);
typedef void (APIENTRY *GLGetBufferSubData) (GLenum target, GLintptr offset, GLsizeiptr size, GLvoid* data);
typedef void (APIENTRY *GLFlush) (void);
typedef void (APIENTRY *GLFinish) (void);

GLGenVertexArrays glGenVertexArrays = NULL;
GLGenBuffers glGenBuffers = NULL;
GLBindVertexArray glBindVertexArray = NULL;
GLBindBuffer glBindBuffer = NULL;
GLBufferData glBufferData = NULL;
GLBufferSubData glBufferSubData = NULL;
GLGetBufferSubData glGetBufferSubData = NULL;

void load_gl_pointers() {
    glGenVertexArrays = (GLGenVertexArrays)SDL_GL_GetProcAddress("glGenVertexArrays");
    glGenBuffers = (GLGenBuffers)SDL_GL_GetProcAddress("glGenBuffers");
    glBindVertexArray = (GLBindVertexArray)SDL_GL_GetProcAddress("glBindVertexArray");
    glBindBuffer = (GLBindBuffer)SDL_GL_GetProcAddress("glBindBuffer");
    glBufferData = (GLBufferData)SDL_GL_GetProcAddress("glBufferData");
    glBufferSubData = (GLBufferSubData)SDL_GL_GetProcAddress("glBufferSubData");
    glGetBufferSubData = (GLGetBufferSubData)SDL_GL_GetProcAddress("glGetBufferSubData");
}
// end OpenGL loading stuff

#define CAPACITY (1 << 8)

// return nonzero if an OpenGL error has occurred.
int opengl_checkerr(const char* const label) {
    GLenum err;
    switch(err = glGetError()) {
    case GL_INVALID_ENUM:
        printf("GL_INVALID_ENUM");
        break;
    case GL_INVALID_VALUE:
        printf("GL_INVALID_VALUE");
        break;
    case GL_INVALID_OPERATION:
        printf("GL_INVALID_OPERATION");
        break;
    case GL_INVALID_FRAMEBUFFER_OPERATION:
        printf("GL_INVALID_FRAMEBUFFER_OPERATION");
        break;
    case GL_OUT_OF_MEMORY:
        printf("GL_OUT_OF_MEMORY");
        break;
    case GL_STACK_UNDERFLOW:
        printf("GL_STACK_UNDERFLOW");
        break;
    case GL_STACK_OVERFLOW:
        printf("GL_STACK_OVERFLOW");
        break;
    default: return 0;
    }
    printf(" %s\n", label);
    return 1;
}

int main(int nargs, const char* args[]) {
    printf("initializing..\n");
    SDL_Init(SDL_INIT_EVERYTHING);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MAJOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_MINOR_VERSION, 3);
    SDL_GL_SetAttribute(SDL_GL_CONTEXT_PROFILE_MASK, SDL_GL_CONTEXT_PROFILE_CORE);

    SDL_Window* const w =
        SDL_CreateWindow(
            "broken",
            SDL_WINDOWPOS_CENTERED, SDL_WINDOWPOS_CENTERED,
            1, 1,
            SDL_WINDOW_OPENGL
        );
    if(w == NULL) {
        printf("window was null\n");
        return 0;
    }

    SDL_GLContext context = SDL_GL_CreateContext(w);
    if(context == NULL) {
        printf("context was null\n");
        return 0;
    }
    load_gl_pointers();
    if(opengl_checkerr("init")) {
        return 1;
    }
    printf("GL_VENDOR: %s\n", glGetString(GL_VENDOR));
    printf("GL_RENDERER: %s\n", glGetString(GL_RENDERER));

    float* const vs = malloc(CAPACITY * sizeof(float));
    memset(vs, 0, CAPACITY * sizeof(float));

    unsigned int i = 0;
    while(i < 128000) {
        GLuint vertex_array;
        GLuint vertex_buffer;

        glGenVertexArrays(1, &vertex_array);
        glBindVertexArray(vertex_array);
        glGenBuffers(1, &vertex_buffer);
        glBindBuffer(GL_ARRAY_BUFFER, vertex_buffer);
        if(opengl_checkerr("gen/binding")) {
            return 1;
        }

        glBufferData(
            GL_ARRAY_BUFFER,
            CAPACITY * sizeof(float),
            vs, // initialize with `vs` just to make sure it's allocated.
            GL_DYNAMIC_DRAW
        );
        // verify that the memory is allocated by reading it back into `vs`.
        glGetBufferSubData(
            GL_ARRAY_BUFFER,
            0,
            CAPACITY * sizeof(float),
            vs
        );
        if(opengl_checkerr("creating buffer")) {
            return 1;
        }
        glFlush();
        glFinish();

        // segfault occurs here..
        glBufferSubData(
            GL_ARRAY_BUFFER,
            0,
            CAPACITY * sizeof(float),
            vs
        );
        glFlush();
        glFinish();

        ++i;
    }
    return 0;
}
When I bump the iterations from 64k to 128k, I start getting:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff754c859 in __memcpy_sse2_unaligned () from /usr/lib/libc.so.6
(gdb) bt
#0 0x00007ffff754c859 in __memcpy_sse2_unaligned () from /usr/lib/libc.so.6
#1 0x00007ffff2ea154d in ?? () from /usr/lib/xorg/modules/dri/i965_dri.so
#2 0x0000000000400e5c in main (nargs=1, args=0x7fffffffe8d8) at opengl-segfault.c:145
However, I can more than double the capacity (keeping the number of iterations at 64k) without segfaulting.
GL_VENDOR: Intel Open Source Technology Center
GL_RENDERER: Mesa DRI Intel(R) Haswell Mobile
I had a very similar issue when calling glGenTextures and glBindTexture. I tried debugging, and when I stepped through these lines I would get something like:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff26eaaa8 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
Note that prior to adding textures, I could successfully run programs with VBOs and VAOs and generate meshes fine. After looking into the answer suggesting switching from the xf86-video-intel driver to the xf86-video-fbdev driver, I would advise against it. (There really isn't that much info on this issue or on users facing segfaults on Linux with integrated Intel graphics cards; perhaps a good question to ask the folks over at Intel OpenSource.)
The solution I found was to stop using freeglut; switch to GLFW instead. Whether there actually is some problem with the Intel Linux graphics stack is beside the point; the solvable problem, it seems, is freeglut. If you want to use GLFW with your machine's most recent OpenGL core profile, you need just the following:
glfwWindowHint (GLFW_CONTEXT_VERSION_MAJOR, 3);
glfwWindowHint (GLFW_CONTEXT_VERSION_MINOR, 0);
glfwWindowHint (GLFW_OPENGL_FORWARD_COMPAT, GL_TRUE);
Setting forward-compat (although I've seen lots of posts arguing you shouldn't do this) means Mesa is free to select a core context, provided one sets the minimum context version to 3.0 or higher. I guess freeglut must be going wrong somewhere in its interactions with Mesa; if anyone can shed some light on this, that would be great!
This is a bug in the intel graphics drivers for Linux. Switching from the xf86-video-intel driver to xf86-video-fbdev driver solves the problem.
Edit: I'm not recommending switching to fbdev, just using it as an experiment to see whether the segfault goes away.

Resources