Divide a task into subtasks and assign to thread pool - visual-c++

I am trying to read an image, manipulate the pixel data (Gaussian blur or any other filter), and write the pixels to a new image. Since the images are large (>1 GB and sometimes >20 GB), I read them one full-width row at a time, so the reading is effectively block-wise. I now need a faster way to do the whole process. Would a thread pool be an effective solution? I cannot use other libraries for the image processing; we have our own engine built for that.
I have referred to a thread pool sample from CodeProject, and I am reading the image in the thread's Run function, but I am really not sure how it works.
HRESULT hRes = m_ObjPool.Init(10, 100); // spawning the thread pool
void CThreadObject::Run(CThreadPoolThreadCallback &pool)
{
    // I read and write my image here using nested for loops
    for (int i = 0; i < nImageHeight; ++i)
    {
        for (int j = 0; j < nImageWidth; j++)
        {
            Engine.ReadImage(params);
        }
    }
}
What I am trying to work out is how to hand tasks to the thread pool when the image is segmented into 10 or 100 parts (depending on the image size and block size).
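What I have in mind is something like the following (a sketch only: it uses std::async instead of the CodeProject pool, since I am not sure of its task-submission API, and the Engine/filter calls are just placeholders):
#include <algorithm>
#include <future>
#include <vector>

// One contiguous block of rows [firstRow, lastRow) of the image.
struct RowBlock
{
    int firstRow;
    int lastRow;
};

// Placeholder per-block work: read the rows, filter them, write them out.
void ProcessRows(RowBlock block)
{
    for (int row = block.firstRow; row < block.lastRow; ++row)
    {
        // Engine.ReadImage(rowParams);   // read one full-width row
        // ApplyFilter(rowBuffer);        // e.g. Gaussian
        // Engine.WriteImage(rowParams);  // write the row to the new image
    }
}

int main()
{
    const int nImageHeight = 100000;  // example height
    const int nBlocks = 100;          // 10 or 100 parts, depending on image/block size
    const int rowsPerBlock = (nImageHeight + nBlocks - 1) / nBlocks;

    std::vector<std::future<void>> tasks;
    for (int b = 0; b < nBlocks; ++b)
    {
        RowBlock block;
        block.firstRow = b * rowsPerBlock;
        block.lastRow = std::min(nImageHeight, block.firstRow + rowsPerBlock);
        // Each block becomes one task; with the CodeProject pool this would be
        // whatever its equivalent of "submit a work item" is.
        tasks.push_back(std::async(std::launch::async, ProcessRows, block));
    }
    for (auto& task : tasks)
        task.get();  // wait for all blocks to finish
    return 0;
}
Each task owns a disjoint row range, so the pixel data itself needs no locking; only the engine's read/write calls would need to be thread-safe (or serialized) if the underlying file access is not.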

Related

Xcode UIImage imageWithCGImage leaks memory

I have ARC enabled in my app and noticed that if I create a ton of images, my app would crash. As part of my investigation, I created a small project that reproduces the issue which can be found here. The meat of the sample project code is the following:
int width = 10;
int height = 10;
uint8_t data[100];
while (true)
{
    CGColorSpaceRef colorspace = CGColorSpaceCreateDeviceGray();
    CGContextRef context = CGBitmapContextCreate(data, width, height, 8, width, colorspace, kCGBitmapByteOrderDefault | kCGImageAlphaNone);
    CGImageRef cgimage = CGBitmapContextCreateImage(context);
    // Remove this line and memory is stable, keep it and you lose 15-20 MB per second. Why?
    UIImage* uiimage = [UIImage imageWithCGImage:cgimage];
    CGImageRelease(cgimage);
    CGContextRelease(context);
    CGColorSpaceRelease(colorspace);
}
While running this code, the sidebar in Xcode will show the total memory of the app increasing at around 15-20 MB per second. If you comment out the line that creates the UIImage, the leak disappears.
There are a number of questions on Stack Overflow about whether or not you should release a CGImage after creating a UIImage via imageWithCGImage, and it doesn't look like there is a real consensus. However, if I don't call CGImageRelease(cgimage), then the memory usage increases by over 100 MB per second, so I'm certain that manually releasing the image is the correct thing to do.
Since I have ARC enabled, I tried setting uiimage to nil after releasing everything, which didn't work. Not storing the return value of the call to imageWithCGImage: also doesn't prevent the leak.
Is there something fundamental I'm missing about how to use Core Graphics?
Is there something fundamental I'm missing about how to use Core Graphics?
It seems more likely that you are missing something fundamental about memory management.
Many Foundation / Cocoa framework calls, especially those that create ready-made objects for you, return objects that are autoreleased. That means you don't release the object yourself; it will be released for you later, automatically. But how is that possible? Such objects go into the autorelease pool, and they are released when the pool is drained, which happens later, when there is an opportunity. But you are looping continuously, so there is no such opportunity. So you need to wrap your troublesome line in an @autoreleasepool {} block so as to construct and drain your own pool on every pass through the loop.
Also note that there can be intermediate autoreleased objects of which you are unaware. The autorelease pool can help with those too.
See this section of my book for more information about autoreleased objects.

Unreal Engine 4: Adapting ReadPixels() to a multithreaded framework

I am trying to access pixel data and save images from an in-game camera to disk. Initially, the simple approach was to use a render target and subsequently RenderTarget->ReadPixels(), but as the native implementation of ReadPixels() contains a call to FlushRenderingCommands(), it would block the game thread until the image is saved. Being a computationally intensive operation, this was lowering my FPS way too much.
To solve this problem, I am trying to create a dedicated thread that can access the camera as a CaptureComponent, and then follow a similar approach. But as FlushRenderingCommands() can only be called from the game thread, I had to rewrite ReadPixels() without that call (in a non-blocking way of sorts, inspired by the tutorial at https://wiki.unrealengine.com/Render_Target_Lookup). Even then, my in-game FPS is jerky whenever an image is saved (I confirmed this is not because of the actual save-to-disk operation, but because of the pixel data access). My rewritten ReadPixels() function is below; I was hoping to get some suggestions as to what could be going wrong. I am not sure whether ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER can be called from a non-game thread, and whether that is part of my problem.
APIPCamera* cam = GameThread->CameraDirector->getCamera(0);
USceneCaptureComponent2D* capture = cam->getCaptureComponent(EPIPCameraType::PIP_CAMERA_TYPE_SCENE, true);
if (capture != nullptr) {
    if (capture->TextureTarget != nullptr) {
        FTextureRenderTargetResource* RenderResource = capture->TextureTarget->GetRenderTargetResource();
        if (RenderResource != nullptr) {
            width = capture->TextureTarget->GetSurfaceWidth();
            height = capture->TextureTarget->GetSurfaceHeight();
            // Read the render target surface data back.
            struct FReadSurfaceContext
            {
                FRenderTarget* SrcRenderTarget;
                TArray<FColor>* OutData;
                FIntRect Rect;
                FReadSurfaceDataFlags Flags;
            };
            bmp.Reset();
            FReadSurfaceContext ReadSurfaceContext =
            {
                RenderResource,
                &bmp,
                FIntRect(0, 0, RenderResource->GetSizeXY().X, RenderResource->GetSizeXY().Y),
                FReadSurfaceDataFlags(RCM_UNorm, CubeFace_MAX)
            };
            ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER(
                ReadSurfaceCommand,
                FReadSurfaceContext, Context, ReadSurfaceContext,
                {
                    RHICmdList.ReadSurfaceData(
                        Context.SrcRenderTarget->GetRenderTargetTexture(),
                        Context.Rect,
                        *Context.OutData,
                        Context.Flags
                    );
                });
        }
    }
}
EDIT: One more thing I have noticed is that the stuttering goes away if I disable HDR in my render target settings (but this results in low-quality images), so it seems plausible that the sheer size of the image data is still blocking one of the core threads because of the way I am implementing this.
It should be possible to call ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER from any thread, since it ultimately dispatches through the Task Graph. You can see this when you analyze the code this macro generates:
if (ShouldExecuteOnRenderThread())
{
    CheckNotBlockedOnRenderThread();
    TGraphTask<EURCMacro_##TypeName>::CreateTask().ConstructAndDispatchWhenReady(ParamValue1);
}
You should be cautious about accessing UObjects (like USceneCaptureComponent2D) from other threads, because they are managed by the garbage collector and owned by the game thread.
(...) but even then I am facing a problem with my in-game FPS being jerky whenever an image is saved
Did you check which thread is causing the FPS drop with the stat unit or stat unitgraph console command? You could also use the profiling tools for a more detailed insight and to make sure there are no other causes of the lag.
Edit:
I've found yet another method of accessing pixel data. Try this without actually copying the data in a for loop and check whether there is any improvement in FPS. It could be a bit faster because there is no pixel manipulation/conversion in between.
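Separately, if the goal is simply to avoid FlushRenderingCommands(), one option (a sketch only: the ReadFinished flag and the polling loop are not engine API, just one possible completion signal) is to let the enqueued render command signal when it is done, and have the capture thread wait on that signal before touching bmp, so the game thread is never blocked:
#include "HAL/ThreadSafeBool.h"

// Owned by the capture thread, alongside bmp.
FThreadSafeBool ReadFinished;

// Same context as in the question, extended with a pointer to the flag.
struct FReadSurfaceContext
{
    FRenderTarget* SrcRenderTarget;
    TArray<FColor>* OutData;
    FIntRect Rect;
    FReadSurfaceDataFlags Flags;
    FThreadSafeBool* Finished;
};

// Build the context exactly as in the question, with the extra field, e.g.:
// FReadSurfaceContext ReadSurfaceContext = { RenderResource, &bmp, Rect, Flags, &ReadFinished };

ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER(
    ReadSurfaceCommand,
    FReadSurfaceContext, Context, ReadSurfaceContext,
    {
        RHICmdList.ReadSurfaceData(
            Context.SrcRenderTarget->GetRenderTargetTexture(),
            Context.Rect,
            *Context.OutData,
            Context.Flags);
        *Context.Finished = true;  // OutData is now safe to consume
    });

// Back on the capture thread: wait for the flag, not for the render thread.
while (!ReadFinished)
{
    FPlatformProcess::Sleep(0.001f);
}
// bmp now holds the pixels; save to disk from here.
Blocking the dedicated capture thread this way is fine; the point is only that the game thread keeps running while the render thread performs the readback.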

Multithreading in DirectX 12

I am having a hard time digesting the concept of multithreaded rendering in DX12.
According to MSDN, one must write draw commands into direct command lists (preferably using bundles) and then submit those lists to a command queue.
It is also said that one can have more than one command queue for direct command lists, but it is unclear to me what the purpose of doing so is.
I already take full advantage of multithreading by building command lists on parallel threads, don't I? If so, why would I want to have more than one command queue associated with the device?
I suspect that improper management of command queues can lead to serious performance problems in later stages of rendering-library development.
The main benefit of DirectX 12 is that execution of commands is almost purely asynchronous: when you call ID3D12CommandQueue::ExecuteCommandLists, it kicks off the work in the command lists you pass in. This brings up another point, however. A common misconception is that rendering itself is somehow multithreaded now, and that is simply not true; all GPU work is still executed on the GPU. What is done on several threads is command-list recording, since you create an ID3D12GraphicsCommandList object for each thread that needs one.
An example:
DrawObject DrawObjects[10];
ID3D12CommandQueue* GCommandQueue = ...
// One command list per recording thread, shared here so ExecuteCommands can see them.
ID3D12GraphicsCommandList* clForThread1 = ...
ID3D12GraphicsCommandList* clForThread2 = ...
void RenderThread1()
{
    for (int i = 0; i < 5; i++)
        clForThread1->RecordDraw(DrawObjects[i]); // pseudocode: record this object's draw calls
}
void RenderThread2()
{
    for (int i = 5; i < 10; i++)
        clForThread2->RecordDraw(DrawObjects[i]); // pseudocode: record this object's draw calls
}
void ExecuteCommands()
{
    ID3D12GraphicsCommandList* cl[2] = { clForThread1, clForThread2 };
    GCommandQueue->ExecuteCommandLists(2, cl);
    GCommandQueue->Signal(...);
}
This example is a very rough use case, but that is the general idea: you can record your scene's objects on different threads to remove the CPU overhead of recording the commands.
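For concreteness, the ID3D12GraphicsCommandList* clForThread1 = ... placeholders above stand for a per-thread allocator/list pair, which could be created roughly like this (error handling omitted; the device pointer is assumed to exist already):
#include <windows.h>
#include <d3d12.h>

// Each recording thread gets its own allocator and command list; command
// allocators are not free-threaded, so they must not be shared across threads.
void CreateRecorderForThread(ID3D12Device* device,
                             ID3D12CommandAllocator** outAllocator,
                             ID3D12GraphicsCommandList** outList)
{
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                   IID_PPV_ARGS(outAllocator));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                              *outAllocator, nullptr, IID_PPV_ARGS(outList));
    // The list is created in the recording state: record draws on this thread,
    // then Close() it before it is passed to ExecuteCommandLists.
}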
Another useful thing, however, is that with this setup you can kick off rendering work on the GPU and start recording another command list while it runs.
An example:
void Render()
{
    ID3D12GraphicsCommandList* cl = ...
    cl->DrawObjectsInTheScene(...);
    CommandQueue->Execute(cl); // Just send it to the GPU to start rendering all the objects in the scene
    // And since the GPU has started rendering the scene, we can record our post processing in the meantime
    ID3D12GraphicsCommandList* cl2 = ...
    cl2->SetBloomPipelineState(...);
    cl2->SetResources(...);
    cl2->DrawOnScreenQuad();
    CommandQueue->Execute(cl2); // Submit the post processing once recording is done
}
The advantage here over DirectX 11 or OpenGL is that those APIs potentially just sit there recording and recording, and may not actually submit their commands until Present() is called, which forces the CPU to wait and incurs overhead.
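For reference, with the real signatures the submit-and-signal step from the rough example above looks something like this (assuming the fence, fence value, and Win32 event have already been created elsewhere, e.g. with ID3D12Device::CreateFence and CreateEvent):
// Submit the lists recorded on the worker threads, then signal a fence so the
// CPU can later tell when the GPU has finished consuming them.
ID3D12CommandList* lists[] = { clForThread1, clForThread2 };
GCommandQueue->ExecuteCommandLists(_countof(lists), lists);
GCommandQueue->Signal(fence, ++fenceValue);

// Only wait when the CPU actually needs the GPU to be done (for example before
// resetting the command allocators); otherwise keep recording the next frame.
if (fence->GetCompletedValue() < fenceValue)
{
    fence->SetEventOnCompletion(fenceValue, fenceEvent);
    WaitForSingleObject(fenceEvent, INFINITE);
}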

FPS drops when using .NET's ThreadPool

I have asked about this before, but didn't provide code because I didn't have an easy way to do so. However now I've started a new project in Unity and tried to replicate the behaviour without all the unnecessary baggage attached.
So this is my current setup:
public class Main : MonoBehaviour
{
    public GameObject calculatorPrefab;

    void Start ()
    {
        for (int i = 0; i < 10000; i++)
        {
            Instantiate(calculatorPrefab);
        }
    }
}

public class Calculator : MonoBehaviour
{
    void Start ()
    {
        ThreadPool.QueueUserWorkItem(DoCalculations);
    }

    void DoCalculations(object o)
    {
        // Just doing some pointless calculations so the thread actually has something to do.
        float result = 0;
        for (int i = 0; i < 1000; i++)
        {
            // Note that the loop count doesn't seem to matter at all, other than taking longer.
            for (int i2 = 0; i2 < 1000; i2++)
            {
                result = i * i2 * Mathf.Sqrt(i * i2 + 59);
            }
        }
    }
}
Both scripts are attached to GameObjects. The Main script is on a GameObject that's placed in the scene; at startup it creates a bunch of other GameObjects, which in turn each queue some random calculations on the ThreadPool. Obviously this produces a fairly big CPU spike at startup, but that's not the problem. The problem is that the main thread seems to be blocked by this. In other words, it produces horrible FPS. Why is that? Isn't it supposed to run in the background? Isn't the whole point of this to avoid making the main thread unresponsive?
I'm really struggling to figure out what I'm doing wrong, because as far as I see it, it doesn't get much simpler than this.
On the first frame you instantiate 10,000 prefabs. That is quite a load for a single frame. On the second frame you queue 10,000 work items on the thread pool. That is quite a number of jobs, and I am sure you are running into some upfront initialization costs.
The background task is also not that complex. I use background tasks for really long-running operations, for instance web calls and long-running calculations. I don't think your task really fits; in other words, the upfront cost exceeds the cost of running your calculation.
Try using a coroutine instead to break up your calculations and instantiations. I think that is a better solution for this particular background task.
Edit: Ran some tests per the comments below.
10k instantiates took on average (median) 104 milliseconds. The editor had a poor framerate and used about 15% of my i7's CPU capacity.
10k QueueUserWorkItem calls took on average (median) 23 milliseconds. The editor locked up for multiple seconds, and my CPU sat at a wonderful 99% utilization.
Conclusion
Queuing the work items has some cost, but not a lot. The problems are mainly with your instantiates. That, and why are we queuing thousands of work items for such a simple calculation?
I see the following problems with your code.
You are creating far too many background jobs at once, 10,000 to be precise. .NET won't run them all concurrently, but it is still perhaps not the best way to go. On my machine (8 logical cores) the initial maximum worker count reported by ThreadPool.GetMaxThreads() was 1023.
Each job is rather complex. Sqrt is not a cheap calculation, so it is no wonder it takes so long.
Unity has methods for updating and methods for drawing. The problem here is that your jobs are ongoing and thus drag down everything (updating, drawing, and everything in between), rather than the computation happening only during Update().
Taking your code and just running it in a stand-alone .NET app, it took 15 seconds to complete, maxing out all 8 of my cores.
However, changing
result = i * i2 * Mathf.Sqrt(i * i2 + 59)
...to:
result = i * i2 * i * i2 + 59;
...also maxed out all of my 8 cores as before but this time took 6 seconds.
You might ask, "well you took away to sqrt, what is your point". My point is I don't believe you realise how intensive a call Sqrt is particularly with this statement:
And it's still terrible. I even reduced the amount of objects being created from 10000 to 100, while increasing the loop count so it still takes a while. No real difference
Furthermore, scheduling so many jobs, regardless of tech, purely to update game objects won't scale. Game designers update in batches.
My suggestion:
Design tip
Generally, when there are a lot of calculations that must be performed for many objects, instead of doing them all in one frame, group them and spread them out over time. So for 10,000 objects, maybe use a batch size of 1,000 or 100? (Source: Cities: Skylines.)
Tell me more: Game Engine Architecture

Creating multithreads continuously

I have a string list containing file paths; the list has 80 elements. I want to keep 8 threads running continuously until all of the files in the list have been moved. Whenever a thread finishes its work, I want to create another one so that the thread count stays at 8.
Can anybody help me?
Unless each thread is writing to a different drive, having multiple threads copying files is slower than doing it with a single thread. The disk drive can only do one thing at a time. If you have eight threads all trying to write to the same disk drive, then it takes extra time to do disk head seeks and such.
Also, if you don't have at least eight CPU cores, then trying to run eight concurrent threads is going to require extra thread context switches. If you're doing this on a four-core machine, then you shouldn't have more than four threads working on it.
If you really need to have eight threads doing this, then put all of the file paths into a BlockingCollection, start eight threads, and have them go to work. That way you have eight persistent threads rather than starting and stopping threads all the time. Something like this:
BlockingCollection<string> filePaths = new BlockingCollection<string>();
List<Thread> threads = new List<Thread>();

// add paths to the queue
foreach (var path in ListOfFilePaths)
    filePaths.Add(path);
filePaths.CompleteAdding();

// start threads to process the paths
for (int i = 0; i < 8; ++i)
{
    Thread t = new Thread(CopyFiles);
    threads.Add(t);
    t.Start();
}

// threads are working. At some point you'll need to clean up:
foreach (var t in threads)
{
    t.Join();
}
Your CopyFiles method looks like this:
void CopyFiles()
{
    foreach (var path in filePaths.GetConsumingEnumerable())
    {
        CopyTheFile(path);
    }
}
Since you're working with .NET 4.0, you could use Task instead of Thread. The code would be substantially similar.
