Objective-C for loops with @autoreleasepool and ARC - Core Data

As part of an app that allows auditors to create findings and associate photos with them (saved as Base64 strings due to a limitation of the web service), I have to loop through all findings and their photos within an audit and set their sync value to true.
While I perform this loop I see a memory spike from around 40MB up to 500MB (for roughly 350 photos and 255 findings), and this number never goes down. On average our users are creating around 1000 findings and 500-700 photos before attempting to use this feature. I have attempted to use @autoreleasepool blocks to keep the memory down, but it never seems to get released.
for (Finding * __autoreleasing f in self.audit.findings) {
    @autoreleasepool {
        [f setToSync:@YES];
        NSLog(@"%@", f.idFinding);
        for (FindingPhoto * __autoreleasing p in f.photos) {
            @autoreleasepool {
                [p setToSync:@YES];
                p = nil;
            }
        }
        f = nil;
    }
}
The relationships and retain cycles look like this:
Audit has a strong reference to Finding
Finding has a weak reference to Audit and a strong reference to FindingPhoto
FindingPhoto has a weak reference to Finding
What am I missing that would let me loop through these objects and set their properties without causing such a huge memory spike? I'm assuming it has something to do with so many Base64 strings being loaded into memory during the loop and never being released.

So, first, make sure you have a batch size set on the fetch request. Choose a relatively small number, but not too small because this isn't for UI processing. You want to batch a reasonable number of objects into memory to reduce loading overhead while keeping memory usage down. Try 50 or 100 and see how it goes, then consider upping the batch size a little.
If all of the objects you're loading are managed objects then the correct way to evict them during processing is to turn them into faults. That's done by calling refreshObject:mergeChanges: on the context. BUT - that discards any changes, and your loop is specifically there to make changes.
So, what you should really be doing is batch saving the objects you've modified and then turning those objects back into faults to remove the data from memory.
So, in your loop, keep a counter of how many objects you've modified; save the context each time you hit that count, and then refresh (re-fault) all the objects processed so far. The batch size on the fetch and the batch size for saving should be the same number.
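A minimal sketch of that pattern, assuming an NSManagedObjectContext called context and using illustrative names (batchSize, processed) that are not from the question:

NSUInteger batchSize = 100; // keep equal to the fetch request's fetchBatchSize
NSMutableArray *processed = [NSMutableArray array];

for (Finding *f in self.audit.findings) {
    [f setToSync:@YES];
    [processed addObject:f];
    for (FindingPhoto *p in f.photos) {
        [p setToSync:@YES];
        [processed addObject:p];
    }

    if (processed.count >= batchSize) {
        NSError *error = nil;
        if ([context save:&error]) {
            // The changes are persisted, so it is now safe to turn the
            // objects back into faults and release their data (including
            // the Base64 strings) from memory.
            for (NSManagedObject *object in processed) {
                [context refreshObject:object mergeChanges:NO];
            }
            [processed removeAllObjects];
        }
    }
}

// Save and re-fault whatever is left from the last partial batch.
if (processed.count > 0) {
    NSError *error = nil;
    if ([context save:&error]) {
        for (NSManagedObject *object in processed) {
            [context refreshObject:object mergeChanges:NO];
        }
    }
}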

There's probably a big difference in size between your Finding objects and the associated images, so your primary aim should be to redesign your data model so that unfaulting (loading) a Finding object does not automatically load the Base64-encoded image.
That's actually one of the major strengths of Core Data: loading only part of an object hierarchy. Just move the Base64-encoded data into a managed object of its own so that Core Data does not load it with the photo; it will still be loaded on demand when the reference is touched.
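A minimal sketch of such a split; the PhotoData entity, the payload relationship, and the property names are assumptions for illustration:

#import <CoreData/CoreData.h>

// The heavy Base64 payload gets an entity of its own behind a to-one
// relationship, so it stays a fault until something actually reads it.
@interface PhotoData : NSManagedObject
@property (nonatomic, strong) NSString *base64String;
@end

@interface FindingPhoto : NSManagedObject
@property (nonatomic, strong) NSNumber *toSync;
@property (nonatomic, strong) PhotoData *payload; // remains a fault
@end

With this layout, setting the sync flag no longer drags the image data into memory: [photo setToSync:@YES] leaves PhotoData untouched, and only an access like photo.payload.base64String actually loads the blob.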

Related

Why do GBuffers need to be created for each frame in D3D12?

I have experience with D3D11 and want to learn D3D12. I am reading the official D3D12 multithreading example and don't understand why the shadow map (generated in the first pass as a DSV, consumed in the second pass as an SRV) is created for each frame (actually only 2 copies, as the FrameResource is reused every 2 frames).
The code that creates the shadow map resource is here, in the FrameResource class, instances of which are created here.
There is actually another resource that is created for each frame: the constant buffer. I kind of understand the constant buffer, because it is written by the CPU (D3D11 dynamic usage) and needs to remain unchanged until the GPU finishes using it, so there need to be 2 copies. However, I don't understand why the shadow map needs the same treatment, because it is only modified by the GPU (D3D11 default usage), and there are fence commands to separate reading from and writing to that texture anyway. As long as the GPU respects the fence, a single texture should be enough for it to work correctly. Where am I wrong?
Thanks in advance.
EDIT
According to the comment below, the "fence" I mentioned above should more accurately be called "resource barrier".
The key issue is that you don't want to stall the GPU for best performance. Double-buffering is a minimal requirement, but typically triple-buffering is better for smoothing out frame-to-frame rendering spikes, etc.
FWIW, the default behavior of DXGI Present is to stall only after you have submitted THREE frames of work, not two.
Of course, there's a trade-off between triple-buffering and input responsiveness, but if you are maintaining 60 Hz or better then it's likely not noticeable.
With all that said, you typically don't need to double-buffer depth/stencil buffers for rendering, although if you wanted the initial write of the depth buffer to overlap with the reads of the previous frame's depth-buffer passes, then you would want distinct buffers per frame for performance and correctness.
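For context, here is a minimal sketch (hypothetical helper and names, not from the sample) of the per-frame pacing that makes multiple FrameResource copies useful: before the CPU reuses a frame's resources to record new work, it waits on a fence until the GPU has finished the frame that last used them. With N frame resources the CPU can record up to N-1 frames ahead; with a single shadow map it could not record the next shadow pass while the GPU is still reading the previous one.

#include <d3d12.h>
#include <windows.h>

// Block the CPU only if the GPU has not yet finished the frame that last
// used this FrameResource.
void WaitForFrameResource(ID3D12Fence* fence, UINT64 fenceValueForFrame, HANDLE fenceEvent)
{
    if (fence->GetCompletedValue() < fenceValueForFrame)
    {
        fence->SetEventOnCompletion(fenceValueForFrame, fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
}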
The 'writes' are all complete before the 'reads' in DX12 because of the injection of the 'Resource Barrier' into the command-list:
void FrameResource::SwapBarriers()
{
    // Transition the shadow map from writeable to readable.
    m_commandLists[CommandListMid]->ResourceBarrier(1,
        &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
            D3D12_RESOURCE_STATE_DEPTH_WRITE,
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
}

void FrameResource::Finish()
{
    // Transition the shadow map back from readable to writeable.
    m_commandLists[CommandListPost]->ResourceBarrier(1,
        &CD3DX12_RESOURCE_BARRIER::Transition(m_shadowTexture.Get(),
            D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE,
            D3D12_RESOURCE_STATE_DEPTH_WRITE));
}
Note that this sample is a port/rewrite of the older legacy DirectX SDK sample MultithreadedRendering11, so it may be just an artifact of convenience to have two shadow buffers instead of just one.

In-memory string deduplication

Context: I'm writing something to process log data which involves loading several GB of data into memory and cross checking various things, finding correlations in data and writing the results out to another file. (This is essentially a cooking/denormalization step before loading into a Druid.io cluster.) I want to avoid having to write the information to a database for both performance and code simplicity - it is assumed that in the foreseeable future the volume of data processed at one time can be handled by adding memory to the machine.
My question is whether it is a good idea to attempt to explicitly deduplicate strings in my code, and if so, what a good approach would be. Many of the values in these log files are the exact same pieces of text (probably only about 25% of the text values in the file are unique, at a rough guess).
Since we're talking about gigabytes of data, and while RAM is cheap and swap is possible, there is still a limit, and if I'm careless I will very likely hit it. If I do something like this:
strstore := make(map[string]string)

// do some work that involves slicing and dicing some text, resulting in:
var a = "some string that we figured out that has about a 75% chance of being duplicate"
// note that there are about 10 other such variables calculated here; only "a" is shown for simplicity

if canonical, ok := strstore[a]; ok {
    a = canonical // reuse the existing copy
} else {
    strstore[a] = a // first occurrence becomes the canonical copy
}

// now do some more stuff with "a" and keep it in a struct which is in
// some map some place
It would seem to me that this would have the effect of "reusing" existing strings, at the cost of a hash lookup and compare. Seemingly a good trade off.
However, this might not be that helpful if the strings that are in fact new leave memory fragmented, with various holes left unclaimed.
I could also try to keep one big byte array/slice holding the character data and index into that, but it would make the code hard to write (especially having to mess around with conversions between []byte and string, which involve their own allocations), and I would probably just be doing a poor job of something that is really the Go runtime's domain anyway.
Looking for any advice on approaches to this problem, or if anyone's experience with this sort of thing has yielded particularly useful mechanisms to address this.
There are a lot of interesting data structures and algorithms that you could use here. It depends on what you are trying to do in the stats and processing stages.
I am not sure how compressible your logs are, but you could pre-process the data, again depending on your use cases: https://github.com/alecthomas/mph/blob/master/README.md
Take a look at some of these data structures as well for background:
https://github.com/Workiva/go-datastructures
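For what it's worth, the map-based interning sketched in the question can be wrapped in a small helper. A minimal sketch (the interner type and names are illustrative):

package main

import "fmt"

// interner keeps the first copy of each distinct string as the canonical one.
type interner map[string]string

// intern returns the canonical copy of s, storing s if it is new.
func (in interner) intern(s string) string {
    if canonical, ok := in[s]; ok {
        return canonical
    }
    in[s] = s
    return s
}

func main() {
    in := make(interner)
    prefix := "hello"
    a := in.intern(prefix + " world") // first occurrence: stored as canonical
    b := in.intern(prefix + " world") // duplicate: replaced by the canonical copy
    fmt.Println(a, b)                 // both now share one backing array
}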

Unexpected memory usage of List<T>

I always thought that the default constructor for List would initialize a list with a capacity of 4 and that the capacity would be doubled when adding the 5th element, etc...
In my application I make a lot of lists (a tree-like structure where each node can have many children). Some of these nodes won't have any children, and since my application was fast but also using a bit too much memory, I decided to use the constructor where I can specify the capacity, and set it to 1.
The strange thing is that memory usage when I start with a capacity of 1 is about 15% higher than when I use the default constructor. It can't be a better fit with 4, since doubling from 1 would give 1, 2, 4. As an extra test I tried starting with a capacity of 4; again the memory usage was 15% higher than with no specified capacity.
Now this really isn't a problem, but it bothers me that a pretty simple data structure that I've used for years has some extra logic that I didn't know about. Does anyone have an idea of the inner workings of List in this respect?
That's because with the default constructor the internal storage array is set to a shared empty array, while the capacity constructor allocates an array of the requested size immediately instead of deferring allocation to the first call to Add.
You can see this using a decompiler like JustDecompile:
public List(int capacity)
{
    if (capacity < 0)
    {
        ThrowHelper.ThrowArgumentOutOfRangeException(
            ExceptionArgument.capacity,
            ExceptionResource.ArgumentOutOfRange_NeedNonNegNum);
    }
    this._items = new T[capacity];
}

public List()
{
    this._items = List<T>._emptyArray;
}
If you look at the Add function, it calls EnsureCapacity, which enlarges the internal storage array if required. Obviously, if the array is initially set to the empty array, the first Add will create an array of the default size.
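A quick way to observe the difference (a small sketch; the printed capacities match the behavior described above):

using System;
using System.Collections.Generic;

class CapacityDemo
{
    static void Main()
    {
        var defaultList = new List<int>(); // backed by the shared empty array
        var sizedList = new List<int>(1);  // allocates int[1] immediately

        Console.WriteLine(defaultList.Capacity); // 0 - nothing allocated yet
        defaultList.Add(42);
        Console.WriteLine(defaultList.Capacity); // 4 - default size on first Add
        Console.WriteLine(sizedList.Capacity);   // 1
    }
}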

Xna Xbox framedrops when GC kicks in

I'm developing an app (an XNA game) for the Xbox, which is a pretty simple app. The start page contains tiles with moving GIF images. Those GIF images are actually all PNG images, which get loaded once by every tile and put into an array. Then, using a defined delay, these images are played (using a counter which increases every time the delay passes).
This all works well; however, I noticed a small lag every x seconds in the movement of the GIF images, so I started to add some benchmarking:
http://gyazo.com/f5fe0da3ff81bd45c0c52d963feb91d8
As you can see, the FPS is pretty low for such a simple program (this is in debug; when running the app on the Xbox itself, I get an average of 62 fps).
2 important settings:
Graphics.SynchronizeWithVerticalRetrace = false;
IsFixedTimeStep = false;
Changing IsFixedTimeStep to true increases the lag. The settings tile has wheels that rotate, and you can see the wheels jump back a little every x seconds. The same goes for SynchronizeWithVerticalRetrace: enabling it also increases the lag.
I noticed a connection between the lag and the moments the garbage collector kicks in: every time it kicks in, there is a lag...
Don't mind the MAX HMU (heap memory usage), as it takes its value from the start; the average is more realistic.
Here is another screen from the performance monitor. I don't understand much of this tool, as it's the first time I'm using it, but I hope it helps:
http://gyazo.com/f70a3d400657ac61e6e9f2caaaf17587
After a little research I found the culprit.
I have custom components that all derive from GameComponent and get added to the Components list of the main Game class.
This was one of two major problems: it caused everything to be updated, even things that didn't need an update. (The Draw method was the only one that kept the page state in mind and only drew when needed.)
I fixed this by using different "screens" (or pages, as I called them), which are now the only components that derive from GameComponent. I then update only the page which is active, and the custom components on that page get updated with it. Problem fixed.
The second big problem is the following:
I made a class which helps me position things on the screen relatively, with percentages, parent containers, aligns & v-aligns, etc.
That class had properties for sizes & vectors, but instead of saving the calculated value in a backing field, I recalculated it every time a property was accessed. Calculating complex things like that means following references (to parent & child containers, for example), which made it very hard on the CLR because it had a lot of work to do.
I rebuilt the whole positioning class into a fully optimized one, with flags for recalculating only when necessary, and instead of drops of 20fps I now get an average of 170+fps!
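A minimal sketch of that caching pattern (class and member names are illustrative, not from the original code):

using Microsoft.Xna.Framework;

// The relative-layout result is cached in a backing field and recomputed
// only when an input changes, instead of on every property access.
public class PositionedElement
{
    private Vector2 _position;
    private bool _positionDirty = true;

    public Vector2 Position
    {
        get
        {
            if (_positionDirty)
            {
                _position = CalculatePosition(); // expensive container walk
                _positionDirty = false;
            }
            return _position;
        }
    }

    // Call whenever a parent size, alignment or percentage changes.
    public void Invalidate()
    {
        _positionDirty = true;
    }

    private Vector2 CalculatePosition()
    {
        // ... relative layout math against parent/child containers ...
        return Vector2.Zero;
    }
}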

How to partially read from a TStringStream, free the read data from the stream and keep the rest (the unread data)?

What I want to do: let's suppose I have a TStringStream that has just read a string of 100 characters. If I call .ReadString(50), I will get the first 50 characters of the stream, and its cursor will be placed at position 51.
My question is: how do I toss characters 1 to 50 of this stream in a fast and clean way? I want to read the rest (51 to 100) later.
Thanks in advance.
You cannot do what you are hoping to do. The string stream's data is a Delphi string, which is stored as a single memory block. Memory blocks are atomic; they cannot be split, and you cannot free part of one.
If you really need to return memory to the memory manager then you should create a new string with the already processed data removed. You can then re-create your string stream with this new input and destroy the previous string stream.
Having said that, it's hard to see this doing much other than increasing your memory fragmentation. If the amounts of memory involved are large enough, and if the string stream persists for long enough, it just might be a sensible approach. Otherwise it sounds like an attempted optimisation that would actually hinder performance.
Perhaps some class other than string stream could be more appropriate but it's very hard to advise without knowing more details.
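A minimal sketch of that re-creation approach (OldStream and the 50-character cut-off are illustrative):

uses System.Classes;

procedure DropProcessedPrefix(var OldStream: TStringStream);
var
  Tail: string;
  NewStream: TStringStream;
begin
  // Copy the unread tail (characters 51..end) into a new stream, then
  // free the old one so its memory goes back to the memory manager.
  Tail := Copy(OldStream.DataString, 51, MaxInt);
  NewStream := TStringStream.Create(Tail);
  OldStream.Free;
  OldStream := NewStream; // reading now starts at what was character 51
end;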
You can't do this. If you really need to do this, you should write your own class that implements the stream-interface and which would let you process some data a little bit at a time and free whatever you want to free. Note that you would only be able to go through the data once, since you've now deleted your data. That is, seeking to the beginning again would become impossible, and your current stream "position" would be a lie.
In short, sounds like you're confused.
If I understand correctly, you wish to skip forward in the stream?
You can do:
Str.Position := Str.Position + 50;
Or like this:
Str.Seek(50,TSeekOrigin.soCurrent);
