Newbie to MonoTouch and 'native' iPhone development here. I'm trying to debug some memory leaks and having trouble doing so. My app is UIImage-heavy and I'm definitely leaking them, but I have trouble finding where.
I'm trying to use Instruments to find the leaks, but it looks like I can't see the symbols, just raw addresses. Any idea how to get Instruments to see the correct call stack? It would seem to me that since MonoTouch doesn't JIT-compile, it should be able to generate symbols for the code it generates.
Example of a call stack in Instruments:
0 libSystem.B.dylib calloc
1 libobjc.A.dylib _internal_class_createInstanceFromZone
2 libobjc.A.dylib class_createInstance
3 CoreFoundation +[NSObject(NSObject) allocWithZone:]
4 CoreFoundation +[NSObject(NSObject) alloc]
5 UIKit GetPartImages
6 UIKit -[UITextFieldRoundedRectBackgroundView _updateImages]
7 UIKit -[UITextFieldBackgroundView initWithFrame:active:]
8 UIKit -[UITextField setBorderStyle:]
9 0xdf40ced
10 0xdf41846
11 0xdf40f0e
12 0xdf3fc75
13 0x88b2051
14 iPhone mono_jit_runtime_invoke
15 iPhone mono_runtime_invoke
16 iPhone mono_runtime_invoke_array
17 iPhone ves_icall_InternalInvoke
18 0xcfc5d93
19 0xcfc57c5
20 0xcfc5648
21 0xcfc9083
22 0xcfc8f2d
23 0xcfc8d93
24 0xdcb02c1
25 0xdcb025b
26 0xdcb0034
I'm running a very "simple" test with JMH:
import java.util.UUID;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Fork;

public class RulesBenchmark {
    @Fork(value = 1, jvmArgs = {
            "--illegal-access=permit", "-Xms10G", "-Xmx10G",
            "-XX:+UnlockDiagnosticVMOptions", "-XX:+DebugNonSafepoints",
            "-XX:ActiveProcessorCount=7", "-XX:+UseNUMA",
            "-XX:DisableIntrinsic=_currentTimeMillis,_nanoTime",
            "-XX:+UnlockExperimentalVMOptions", "-XX:ConcGCThreads=5",
            "-XX:ParallelGCThreads=10", "-XX:+UseZGC", "-XX:+UsePerfData",
            "-XX:MaxMetaspaceSize=10G", "-XX:MetaspaceSize=256M" })
    @Benchmark
    public String generateRandom() {
        return UUID.randomUUID().toString();
    }
}
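For context, here is a sketch of how such a run can be launched with the standard JMH runner API. The 7-thread count matches the results quoted below; the main class itself is my assumption, not part of the original question.
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkMain {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(RulesBenchmark.class.getSimpleName())  // run only this benchmark class
                .threads(7)                                     // matches the 7-thread results below
                .build();
        new Runner(opt).run();
    }
}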
Maybe it's not that simple, because it uses random, but the same issue shows up in any other Java test.
On my home desktop
Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz, 12 threads (hyperthreading enabled), 64 GB RAM, NAME="Ubuntu" VERSION="20.04.2 LTS (Focal Fossa)"
Linux homepc 5.8.0-59-generic #66~20.04.1-Ubuntu SMP Thu Jun 17 11:14:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Performance with 7 threads:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 1312295.357 ± 27853.707 ops/s
Flame graph (async-profiler) result with 7 threads at home
I have an issue on Oracle Linux
Linux 5.4.17-2102.201.3.el8uek.x86_64 #2 SMP Fri Apr 23 09:05:57 PDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Intel(R) Xeon(R) Gold 6258R CPU @ 2.70GHz with 56 threads (hyperthreading disabled; it's the same when enabled and there are 112 CPU threads) and 1 TB RAM. I get half the performance (even when increasing threads). NAME="Oracle Linux Server" VERSION="8.4"
With 1 thread, I get very good performance:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 2377471.113 ± 8049.532 ops/s
Flame graph (async-profiler) result with 1 thread
But with 7 threads:
Benchmark Mode Cnt Score Error Units
RulesBenchmark.generateRandom thrpt 5 688612.296 ± 70895.058 ops/s
Flame graph (async-profiler) result with 7 threads
Maybe it's a NUMA issue, because there are 2 sockets but the system is configured with only 1 NUMA node:
numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55
node 0 size: 1030835 MB
node 0 free: 1011029 MB
node distances:
node 0
0: 10
But after disabling some CPU threads using:
for i in {12..55}
do
    echo '0' | sudo tee /sys/devices/system/cpu/cpu$i/online
done
Performance improved a little, but not much.
This is just a very "simple" test. On complex tests with real code, it's even worse.
It spends a lot of time on .annobin___pthread_cond_signal.start
I also deployed a Vagrant image with the same Oracle Linux and kernel versions on my home desktop and ran it with 10 CPU threads; performance was nearly the same (~1M ops/sec) as on my desktop. So it's not about the OS or kernel, but about some configuration.
Tested with several JDK versions and vendors (JDK 11 and above). Performance is slightly lower with the OpenJDK 11 from the YUM distribution, but not significantly.
Can you suggest some advice? Thanks in advance.
In essence, your benchmark tests the throughput of SecureRandom. The default implementation is synchronized (more precisely, it mixes input from /dev/urandom with the underlying provider's output).
The paradox is, more threads result in more contention, and thus lower overall performance, as the main part of the algorithm is under a global lock anyway. Async-profiler indeed shows that the bottleneck is the synchronization on a Java monitor: __lll_unlock_wake, __pthread_cond_wait, __pthread_cond_signal - all come from that synchronization.
The contention overhead definitely depends on the hardware, the firmware, and the OS configuration. Instead of trying to reduce this overhead (which can be hard: some day yet another security patch will arrive that makes syscalls 2x slower, for example), I'd suggest getting rid of the contention in the first place.
This can be achieved by installing a different, non-blocking SecureRandom provider, as shown in this answer. I won't give a recommendation on a particular SecureRandomSpi, as it depends on your specific requirements (throughput/scalability/security). I'll just mention that an implementation can be based on (see the sketch after this list):
rdrand
OpenSSL
raw /dev/urandom access, etc.
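For illustration only, here is a minimal sketch of the last option, assuming raw /dev/urandom access is acceptable for your security requirements. The names (UrandomProvider, UrandomSpi, the "Urandom" algorithm) are hypothetical, not a library API; a production implementation would also need to consider fork safety and stream lifetime.
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.security.Provider;
import java.security.SecureRandomSpi;
import java.security.Security;

// Hypothetical provider: SecureRandom backed directly by /dev/urandom,
// with no synchronized blocks on the Java side.
public final class UrandomProvider extends Provider {

    public UrandomProvider() {
        super("Urandom", "1.0", "SecureRandom reading raw /dev/urandom");
        put("SecureRandom.Urandom", UrandomSpi.class.getName());
    }

    public static final class UrandomSpi extends SecureRandomSpi {
        private final DataInputStream in;

        public UrandomSpi() {
            try {
                in = new DataInputStream(new FileInputStream("/dev/urandom"));
            } catch (IOException e) {
                throw new IllegalStateException("/dev/urandom not available", e);
            }
        }

        @Override
        protected void engineSetSeed(byte[] seed) {
            // The kernel seeds /dev/urandom; extra seed material is ignored here.
        }

        @Override
        protected void engineNextBytes(byte[] bytes) {
            try {
                // No Java monitor is taken: concurrent calls go straight to
                // read() syscalls instead of queuing on a global lock.
                in.readFully(bytes);
            } catch (IOException e) {
                throw new IllegalStateException(e);
            }
        }

        @Override
        protected byte[] engineGenerateSeed(int numBytes) {
            byte[] seed = new byte[numBytes];
            engineNextBytes(seed);
            return seed;
        }
    }
}
Registered with top priority via Security.insertProviderAt(new UrandomProvider(), 1), new SecureRandom() (and hence UUID.randomUUID()) should pick it up, unless your java.security configuration pins a different algorithm.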
I'm seeing output similar to that on the traceroute manpage, but I've always been curious why ping times sometimes go down between hops. That is, I'd always expect values to increase from hop to hop, like:
[yak 71]% traceroute nis.nsf.net.
traceroute to nis.nsf.net (35.1.1.48), 64 hops max, 38 byte packet
1 helios.ee.lbl.gov (128.3.112.1) 19 ms 19 ms 0 ms
2 lilac-dmc.Berkeley.EDU (128.32.216.1) 39 ms 39 ms 19 ms
3 lilac-dmc.Berkeley.EDU (128.32.216.1) 39 ms 39 ms 19 ms
4 ccngw-ner-cc.Berkeley.EDU (128.32.136.23) 39 ms 40 ms 39 ms
5 ccn-nerif22.Berkeley.EDU (128.32.168.22) 39 ms 39 ms 39 ms
6 128.32.197.4 (128.32.197.4) 40 ms 59 ms 59 ms
7 131.119.2.5 (131.119.2.5) 59 ms 59 ms 59 ms
8 129.140.70.13 (129.140.70.13) 99 ms 99 ms 80 ms
9 129.140.71.6 (129.140.71.6) 139 ms 239 ms 319 ms
10 129.140.81.7 (129.140.81.7) 220 ms 199 ms 199 ms
11 nic.merit.edu (35.1.1.48) 239 ms 239 ms 239 ms
But what is the reasoning for:
[yak 72]% traceroute allspice.lcs.mit.edu.
traceroute to allspice.lcs.mit.edu (18.26.0.115), 64 hops max
1 helios.ee.lbl.gov (128.3.112.1) 0 ms 0 ms 0 ms
2 lilac-dmc.Berkeley.EDU (128.32.216.1) 19 ms 19 ms 19 ms
3 lilac-dmc.Berkeley.EDU (128.32.216.1) 39 ms 19 ms 19 ms
4 ccngw-ner-cc.Berkeley.EDU (128.32.136.23) 19 ms 39 ms 39 ms
5 ccn-nerif22.Berkeley.EDU (128.32.168.22) 20 ms 39 ms 39 ms
6 128.32.197.4 (128.32.197.4) 59 ms 119 ms 39 ms
7 131.119.2.5 (131.119.2.5) 59 ms 59 ms 39 ms
8 129.140.70.13 (129.140.70.13) 80 ms 79 ms 99 ms
9 129.140.71.6 (129.140.71.6) 139 ms 139 ms 159 ms
10 129.140.81.7 (129.140.81.7) 199 ms 180 ms 300 ms
11 129.140.72.17 (129.140.72.17) 300 ms 239 ms 239 ms
12 * * *
13 128.121.54.72 (128.121.54.72) 259 ms 499 ms 279 ms
14 * * *
15 * * *
16 * * *
17 * * *
18 ALLSPICE.LCS.MIT.EDU (18.26.0.115) 339 ms 279 ms 279 ms
Why does the time from hop 3 to hop 4 go from 39 ms to 19 ms?
They are not just different "hops" but indeed totally different events, at different times, and possibly under different conditions.
The way traceroute works is, it sends a packet with a TTL of 1 and sees how long it takes for an answer to arrive. It repeats this three times to account for variation, which is possible and normal (note that even the three numbers for "the same hop" are sometimes very different, since they are completely separate probes, too).
Then, traceroute sends a packet with a TTL of 2 (and then 3, and 4, etc.) and sees how long it takes for an answer to arrive. In the meantime, the load on the first router may have dropped, so your packets are actually forwarded faster. Or the physical cable where your packets go out may not be busy because your roommate just finished downloading his porn (whereas before, your network card would have had to wait for the cable to go idle). This sort of thing happens all the time, and you have no way of knowing or controlling it.
There is no way of going back in time, so you can't change the fact that a second ago it took a bit longer for your packets to make it through. You're probing at two different times, so all you can get is a reasonable guesstimate. Yes, generally, the time from hop to hop should gradually increase, but it need not.
That doesn't mean that the round-trip time is really going down between hops 3 and 4. It only means that it went down in the meantime, while you independently tried to estimate how long 3 hops and 4 hops might take.
Most likely random delays between the lines. Note that the second and third packets sent for hops 3-4 actually go up, from 19 ms to 39 ms. They're all on the same network anyway, so most delays would be due to processing/congestion.
Another case that you occasionally see is that some servers give higher QoS priority to ICMP messages, superficially giving the appearance of better performance.
On a packet-switched network, packet delays are not deterministic, particularly if there's congestion. Each hop of the traceroute test is a separate test, so varying network conditions can easily cause later tests to have better latency than nearer hops.
That is to say, traceroute doesn't just send out three packets and then somehow figure out what the time was at each hop. It sends out packets deliberately designed to timeout after n hops and measures the time as n increases.
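As a rough, hypothetical illustration of that probing pattern (not of traceroute itself), here is a Java sketch: InetAddress.isReachable accepts a TTL cap, so each iteration is an independent, separately timed probe, just like traceroute's per-hop measurements. Unlike traceroute it cannot report which router answered, and without raw-socket privileges the JDK falls back from ICMP to a TCP probe, so treat the numbers as illustrative only. The host name and counts are arbitrary.
import java.io.IOException;
import java.net.InetAddress;

public class TtlProbeSketch {
    public static void main(String[] args) throws IOException {
        // Arbitrary target; pass a host name as the first argument to override.
        InetAddress target = InetAddress.getByName(args.length > 0 ? args[0] : "example.org");
        for (int ttl = 1; ttl <= 16; ttl++) {
            for (int probe = 1; probe <= 3; probe++) {  // three independent probes per TTL, like traceroute
                long start = System.nanoTime();
                boolean reached = target.isReachable(null, ttl, 2000);  // TTL-capped probe, 2 s timeout
                long rttMs = (System.nanoTime() - start) / 1_000_000;
                System.out.printf("ttl=%2d probe=%d reached=%-5b %d ms%n", ttl, probe, reached, rttMs);
            }
        }
    }
}
Each printed line is a separate measurement taken at a different moment, which is why a probe for a "farther" TTL can legitimately come back faster than one for a nearer TTL.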
I'm coming across a problematic leak which, thanks to Instruments, appears to be coming from CTFramesetterCreateWithAttributedString. The call stack is below.
1 CoreText -[_CTNativeGlyphStorage prepareWithCapacity:preallocated:]
2 CoreText -[_CTNativeGlyphStorage initWithCount:]
3 CoreText +[_CTNativeGlyphStorage newWithCount:]
4 CoreText TTypesetterAttrString::Initialize(__CFAttributedString const*)
5 CoreText TTypesetterAttrString::TTypesetterAttrString(__CFAttributedString const*)
6 CoreText TFramesetterAttrString::TFramesetterAttrString(__CFAttributedString const*)
7 CoreText CTFramesetterCreateWithAttributedString
The code generating this call stack is:
CFAttributedStringRef attrRef = (CFAttributedStringRef)self.attributedString;
CTFramesetterRef framesetter = CTFramesetterCreateWithAttributedString(attrRef);
CFRelease(attrRef);
...
CFRelease(framesetter);
self.attributedString gets released elsewhere. I feel like I'm correctly releasing everything else... Given all this, where could the leak be coming from? It's fairly significant for my purposes - 6-10 MB a pop. Thanks for any help.
Turns out this problem was substantially addressed by not using an ivar for a CFMutableArrayRef. We were using the Akosma Obj-C wrapper for CoreText, and in the AKOMultiColumnTextView class there's a CFMutableArrayRef _frames ivar which - for some reason - is hanging around even after CFRelease calls. Not having the _frames ivar is a bit expensive, but has drastically improved memory usage. Thanks Neevek for your contributions.
I am working on an app that makes extensive use of AVFoundation. Recently I did some leak checking with Instruments. The Leaks instrument was reporting a leak at a point in the code where I was instantiating a new AVPlayer, like this:
player1 = [AVPlayer playerWithPlayerItem:playerItem1];
To reduce the problem, I created an entirely new Xcode project for a single-view application, using ARC, and put in the following line:
AVPlayer *player = [[AVPlayer alloc] init];
This produces the same leak report in Instruments. Below is the stack trace. Does anybody know why a simple call to [[AVPlayer alloc] init] would cause a leak? Although I am using ARC, I tried turning it off and inserting the corresponding [player release]; instruction, and it makes no difference. This seems to be specific to AVPlayer.
0 libsystem_c.dylib malloc
1 libsystem_c.dylib strdup
2 libnotify.dylib token_table_add
3 libnotify.dylib notify_register_check
4 AVFoundation -[AVPlayer(AVPlayerMultitaskSupport) _iapdExtendedModeIsActive]
5 AVFoundation -[AVPlayer init]
6 TestApp -[ViewController viewDidLoad] /Users/jason/Synaptic Revival/Project Field Trip/software development/TestApp/TestApp/ViewController.m:22
7 UIKit -[UIViewController view]
--- 2 frames omitted ---
10 UIKit -[UIWindow makeKeyAndVisible]
11 TestApp -[AppDelegate application:didFinishLaunchingWithOptions:] /Users/jason/Synaptic Revival/Project Field Trip/software development/TestApp/TestApp/AppDelegate.m:24
12 UIKit -[UIApplication _callInitializationDelegatesForURL:payload:suspended:]
--- 3 frames omitted ---
16 UIKit _UIApplicationHandleEvent
17 GraphicsServices PurpleEventCallback
18 CoreFoundation __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__
--- 3 frames omitted ---
22 CoreFoundation CFRunLoopRunInMode
23 UIKit -[UIApplication _run]
24 UIKit UIApplicationMain
25 TestApp main /Users/jason/software development/TestApp/TestApp/main.m:16
26 TestApp start
This 48-byte leak is confirmed by Apple as a known issue, which lives not only in AVPlayer but also in UIScrollView (I have an app that happens to use both components).
Please see this thread for details:
Memory leak every time UIScrollView is released
Here's the link to Apple's answer on the thread (you may need a developer ID to sign in):
https://devforums.apple.com/thread/144449?start=0&tstart=0
Apple's brief quote:
This is a known bug that will be fixed in a future release.
In the meantime, while all leaks are obviously undesirable this isn't going to cause any user-visible problems in the real world. A user would have to scroll roughly 22,000 times in order to leak 1 megabyte of memory, so it's not going to impact daily usage.
It seems any component that refers to notify_register_check and notify_register_mach_port will cause this issue.
Currently no obvious workaround or fix can be found. It is confirmed that this issue remains in iOS 5.1 and 5.1.1. Hopefully Apple can fix it in iOS 6, because it is really annoying and destructive.
I have an instance of AVPlayer in my application. I use the time boundary observing feature:
[self setTimeObserver:[player addBoundaryTimeObserverForTimes:watchedTimes
                                                         queue:NULL
                                                    usingBlock:^{
    NSLog(@"A: %i", [timeObserver retainCount]);
    [player removeTimeObserver:timeObserver];
    NSLog(@"B: %i", [timeObserver retainCount]);
    [self setTimeObserver:nil];
}]];
The problem is that, according to Instruments, I am leaking some arrays and values somewhere around this code. I checked the retain count of the time-observing token returned by AVPlayer at the places marked A and B in the sample code. At point A the retain count is 2; at point B it increases to 3 (!). Adding a local autorelease pool does not change anything. I know that retain count is not a reliable metric, but this seems fishy. Any ideas about why the retain count increases, or about my leaks? The stack trace at the leak point looks like this:
0 libSystem.B.dylib calloc
1 libobjc.A.dylib _internal_class_createInstanceFromZone
2 libobjc.A.dylib class_createInstance
3 CoreFoundation __CFAllocateObject2
4 CoreFoundation +[__NSArrayI __new::]
5 CoreFoundation -[__NSPlaceholderArray initWithObjects:count:]
6 CoreFoundation +[NSArray arrayWithObjects:count:]
7 CoreFoundation -[NSArray sortedArrayWithOptions:usingComparator:]
8 CoreFoundation -[NSArray sortedArrayUsingComparator:]
9 AVFoundation -[AVPlayerOccasionalCaller initWithPlayer:times:queue:block:]
10 AVFoundation -[AVPlayer addBoundaryTimeObserverForTimes:queue:usingBlock:]
If I understand things correctly, AVPlayerOccasionalCaller is the “opaque” object returned by addBoundaryTimeObserverForTimes:queue:usingBlock:, or the time observer.
Do not use -retainCount.
The absolute retain count of an object is meaningless.
You should call release exactly the same number of times that you caused the object to be retained. No less (unless you like leaks) and, certainly, no more (unless you like crashes).
See the Memory Management Guidelines for full details.
In this specific case, the retain count you are printing is entirely irrelevant. removeTimeObserver: is probably retaining and autoreleasing the object. Doesn't really matter; it is an implementation detail.
When using the Leaks template in Instruments, note that the Allocations instrument is configured to record reference counts. When you have detected a "leak", look at the list of reference-count events for that object. There will likely be a stack where some code of yours triggers an extra retain. If not, it might be a framework bug.