Profiling arbitrary CUDA applications - Linux

I know of the existence of nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app that involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice way to get nvprof to attach to a running process.
There is a flag --profile-all-processes which does actually give me a message "NVPROF is profiling process 12345", but nothing further prints out. I am using CUDA 8.
How can I get a detailed performance breakdown of my CUDA kernels in this situation?

As the comments suggest, you simply have to make sure the CUDA profiler (now Nsight Systems or Nsight Compute, no longer nvprof) is started before the processes you want to profile. You could, for example, configure it to run on system startup.
Your inability to profile your application has nothing to do with it being an "app that involves lots of shared libraries" - the profiling tools profile such applications just fine.
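For example, with the CUDA 8 toolchain mentioned in the question, a workflow roughly like the following should work (the output path and application name are placeholders). Note that nvprof prints or writes its results only when a profiled process exits or explicitly flushes the profiler, which may be why "nothing further prints out" while the app is still running:

# Terminal 1: start the profiler before the target process exists;
# %p is expanded to the PID of each profiled process
nvprof --profile-all-processes -o /tmp/cuda-profile.%p

# Terminal 2: launch the application normally; when it exits, the
# per-process output files can be imported into the Visual Profiler (nvvp)
./my_cuda_app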

I've been looking for a process-attach solution too, but found no existing tool.
A possible direction is to use the lower-level CUDA profiling API (CUPTI) to build a tool, or to integrate it into your own tool. See the CUPTI documentation: https://docs.nvidia.com/cupti/r_main.html#r_dynamic_detach
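For reference, here is a minimal, untested sketch of what such a tool could look like using the CUPTI Activity API; it would live in the application itself or in a shared library injected into it. The record struct version (CUpti_ActivityKernel4 here) differs between CUPTI releases, and error checking is omitted:

#include <stdio.h>
#include <stdlib.h>
#include <cupti.h>

#define BUF_SIZE (32 * 1024)

/* CUPTI asks us for a buffer to fill with activity records. */
static void CUPTIAPI bufferRequested(uint8_t **buffer, size_t *size,
                                     size_t *maxNumRecords) {
    *buffer = (uint8_t *)malloc(BUF_SIZE);
    *size = BUF_SIZE;
    *maxNumRecords = 0;  /* 0 = as many records as fit in the buffer */
}

/* CUPTI hands back a filled buffer; walk the kernel records and print timings. */
static void CUPTIAPI bufferCompleted(CUcontext ctx, uint32_t streamId,
                                     uint8_t *buffer, size_t size,
                                     size_t validSize) {
    CUpti_Activity *record = NULL;
    while (cuptiActivityGetNextRecord(buffer, validSize, &record) == CUPTI_SUCCESS) {
        if (record->kind == CUPTI_ACTIVITY_KIND_KERNEL) {
            CUpti_ActivityKernel4 *k = (CUpti_ActivityKernel4 *)record;
            printf("%s: %llu ns\n", k->name,
                   (unsigned long long)(k->end - k->start));
        }
    }
    free(buffer);
}

void startKernelTracing(void) {
    cuptiActivityEnable(CUPTI_ACTIVITY_KIND_KERNEL);
    cuptiActivityRegisterCallbacks(bufferRequested, bufferCompleted);
}

void stopKernelTracing(void) {
    cuptiActivityFlushAll(0);  /* force delivery of any completed buffers */
}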

Related

"Skipped 75 frames! The application may be doing too much work on its main thread." running empty Compose app

I'm learning how to use Jetpack Compose in an Android project.
I just created a new project with the Empty Compose Activity template,
and after the build finished I ran the application on the Android Emulator.
It runs successfully, but the Run logs keep showing this info message:
I/Choreographer: Skipped 75 frames! The application may be doing too much work on its main thread.
I'm worried about this issue.
Can anyone please help me with this? I will be very thankful.
That's nothing to worry about. Emulator performance isn't necessarily representative of real device performance and is often slower due to the overhead of running a second operating system (Android) within your operating system. This is especially true if you don't have the emulator's various hardware acceleration options enabled.
Also, apps run from Studio are debuggable, which disables a number of the optimizations that ART (the Android runtime) would be able to perform on a release app. Plus it needs a bit of time to load the code into memory and perform just-in-time compilation of the Compose framework.
Bottom line: Don't worry about performance unless you see issues in release mode on a real device.

Debugging .NET Core under Linux

Currently I am trying to debug a .NET Core application under Linux.
The trouble is, it fails somewhere right in the beginning, and I cannot get where. Logging is impossible under current circumstances.
As far as I can see on the Internet, and on MSDN specifically (which studiously avoids any kind of systematic or sequential presentation), the only currently available options for Linux are:
debug remotely (would not do well in my case);
Rider EAP by Jetbrains (proprietary decision);
using lldb.
So, my questions are:
Is there any way to launch the .NET Core self-contained app (via the "dotnet Some.dll" command) in such a way that it instantly breaks (i.e. as if there was a breakpoint) at the entry point?
If not, how can one launch lldb for a .NET Core console application attached (since numerous examples and issues over the Internet all show attaching to the already-running .NET Core process)?
Once again, there is the dotnet-dump utility, but it too works only with already-running processes - so even dumps are an unavailable option for processes that crash almost instantly. I expected there might be a way to make it dump like the (imaginary) "dotnet-dump collect SomeInvocation.dll" alongside the (actually existing) "dotnet-dump collect --process-id 1234". Is there such a way?
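For context, the kind of lldb session meant in the second question would look roughly like this (the assembly path is a placeholder, and managed-level breakpoints would additionally need the SOS plugin for lldb):

lldb -- dotnet Some.dll
(lldb) process launch --stop-at-entry   # stops at the native entry point of the dotnet host
(lldb) continue                         # run until the crash, then inspect with bt (or SOS commands)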

Is it possible to profile only a plugin shared library without impacting the main program?

Is it possible to profile only a shared library without profiling the main program?
For example, I developed a plugin and I would like to profile it, but without having to profile the whole application. I just want to see the bottlenecks of my plugin. (Of course, I would like to profile it while the main application is running and has loaded my plugin...)
I'm working on Linux and I'm used to callgrind, but out of curiosity I'm also interested in the possibilities on other systems, so I'm keeping the question general.
I'm interested in this because the main program is quite slow, and I don't want to add profiling overhead on top of it, since I'm not interested in the main program's performance here...
On Linux, the perf statistical profiling tool has very low overhead (1-2%), so you can profile the entire application with perf record ./your_application and then analyze the generated perf.data profile with the perf report command. You can filter the perf report output to specific shared libraries or search for the function names of your plugin. Read more at http://www.brendangregg.com/perf.html
Callgrind is not just a profiler; it is a binary recompiler used to implement an exact, instrumentation-based profiler, and it imposes 10-20x overhead on any code, even when collection is not enabled.
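A sketch of the perf workflow described above (the application and plugin names are placeholders):

# sample the whole application, with call graphs, while it exercises the plugin
perf record -g ./your_application
# restrict the report to symbols from the plugin's shared object
perf report --dsos=libmyplugin.so
# alternatively, attach to an already-running instance for a fixed interval
perf record -g -p $(pidof your_application) -- sleep 30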
Your plugin only runs at certain times, right? Such as when the user requests certain activities? People use the manual-pause method with any IDE or debugger: pause the program during that time. The pauses will land in the plugin in proportion to how much time it uses. Before you pause there is no performance impact, because the app runs at full speed; while paused it is stopped, which you don't care about because you're diagnosing your plugin.
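On Linux, without an IDE, the same manual-pause idea can be done with gdb (the process and plugin names are placeholders):

gdb -p $(pidof your_application)   # attaching stops the process
(gdb) continue                     # let it run, then press Ctrl-C at a moment of interest
(gdb) thread apply all bt          # stacks that keep landing in your plugin are its hot spots
(gdb) continue                     # resume and repeat a handful of times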

Profiling Node.js web application on Linux

Which would be the best option to profile a Node.js application on Linux? I tried https://github.com/c4milo/node-webkit-agent and https://github.com/baryshev/look (this is based on nodetime), but they both seem pretty experimental. What surprises me the most is that the results reported by these tools differ.
The major disadvantages of look are that the heap snapshots aren't very relevant and you can't CPU-profile for more than one minute.
With node-webkit-agent, the Chrome browser runs out of memory.
I'm doing profiling while sending requests using JMeter to my web application.
Not sure if you're willing to use an online service instead of a module, but you could give http://nodefly.com/ a try; it's free and has worked quite well for me.

Performance of IcedTea 6 vs Sun's HotSpot 6

How does IcedTea 6's performance stand up against Sun's own HotSpot 6 on Linux systems? I tried searching Google, but Phoronix's test is the best I could find, and it is almost a year old now. Hopefully things have improved since then.
Also, once Sun completely open-sources the JVM, would it be possible to implement it for Linux platforms such that a main module (like the Quickstarter in the Consumer JRE) starts up with the OS and loads the minimal Java kernel, regardless of whether any Java apps are running, and then progressively loads other modules as necessary? That might improve startup times.
Posting these links so they are part of the answer: http://www.phoronix.com/scan.php?page=article&item=java_vm_performance&num=1 and http://www.phoronix.com/scan.php?page=article&item=os_threeway_2008&num=1
I'd expect Sun's implementation to be faster, but it really depends on all kinds of optimizations, so one version might be faster at operation X, but the next version might not be as fast.
EDIT:
Regarding kernel preloading: on Linux you may use preload or alternatives to speed up app loading without affecting overall system performance (loading a Quickstarter equivalent would keep memory occupied at all times). Also, as far as I know, Java loads lots of shared libraries that are shared between apps, so I don't really see the point of building OS-level support for this. I guess it's easy to make a simple app that loads some libraries and does nothing after that (a quickstarter), but I don't see this making a big difference when loading apps, and in some cases it might even slow down the system (I'm thinking about RAM usage and memory swapping).
