Nsight Compute can't profile Waveglow (PyTorch application)

I tried to profile https://github.com/NVIDIA/waveglow by this command:
nv-nsight-cu-cli --export ./nsight_output ~/.virtualenvs/waveglow/bin/python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6
The Python command comes from the instructions at https://github.com/NVIDIA/waveglow#generate-audio-with-our-pre-existing-model, and it works under Nsight Systems but not under Nsight Compute.
Profiling never finishes and keeps printing the log below, so I pressed Ctrl+C.
Also, it profiles only one kernel, although I have more kernels (checked with Nsight Systems).
...
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 286: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 287: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 288: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 289: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 290: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 291: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 292: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 293: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 294: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 295: 0%....50%....100% - 48 passes
==PROF== Profiling "weight_norm_fwd_first_dim_ker..." - 296: 0%....50%...^C
==PROF== Received signal, trying to shutdown target application
- 43 passes
==ERROR== Failed to profile kernel "weight_norm_fwd_first_dim_ker..." in process
==ERROR== An error occurred while trying to profile.
==ERROR== An error occurred while trying to profile
==PROF== Report: nsight_compute_result.nsight-cuprof-report
OS: CentOS Linux 7, Nsight Compute (2019.3.1, Build 26317742),
GPU: Tesla V100-PCIE-32GB
How can I fix this?

I don't think there is any error here; the tool behaves as expected. It is not profiling only one kernel: it has already profiled 296 kernel launches in your log output (which all appear to be from the same kernel function).
You can control the number or types of kernels that are profiled using e.g. the --launch-count or the --kernel-regex options. You can also control the metrics collected for each kernel using --metrics and --section, as collecting fewer metrics reduces the overhead of the tool.
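For example, a hedged sketch of how the run might be narrowed (the flags come from the Nsight Compute CLI documentation; the kernel-name pattern, skip/count values, and section name are assumptions chosen for illustration):
nv-nsight-cu-cli --export ./nsight_output \
    --kernel-regex "weight_norm" \
    --launch-skip 10 --launch-count 5 \
    --section SpeedOfLight \
    ~/.virtualenvs/waveglow/bin/python3 inference.py -f <(ls mel_spectrograms/*.pt) -w waveglow_256channels.pt -o . --is_fp16 -s 0.6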
See https://docs.nvidia.com/nsight-compute/NsightComputeCli/index.html#command-line-options for more available command line options.

Related

systemd aborts in clock_gettime

When booting systemd on a 5.4 or later kernel on a 32-bit CPU, systemd aborts:
Assertion 'clock_gettime(map_clock_id(clock_id), &ts) == 0' failed at ../src/basic/time-util.c:55, function now(). Aborting.
Why?
Enable CONFIG_COMPAT_32BIT_TIME when building the Linux kernel. Doing so enables the relevant syscall; when the syscall is unavailable, it returns -ENOSYS, which triggers the assertion.
The option became easier to disable around 5.4, and configurations such as allnoconfig now leave it disabled.
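A minimal sketch of how the option is typically switched on before rebuilding, assuming you configure from the top of the kernel source tree (scripts/config ships with the kernel):
# enable the 32-bit time syscalls, refresh the config, then verify
scripts/config --enable COMPAT_32BIT_TIME
make olddefconfig
grep CONFIG_COMPAT_32BIT_TIME .config   # expect CONFIG_COMPAT_32BIT_TIME=y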

Will setting the Linux locked-memory limit to unlimited have an adverse effect?

I am running an MPI job on a Linux server. I got this error:
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory. This typically can indicate that the
memlock limits are set too low. For most HPC installations, the
memlock limits should be set to "unlimited". The failure occured
here:
Local host: yw0431
OMPI source: ../../../../../ompi/mca/btl/openib/btl_openib_component.c:1216
Function: ompi_free_list_init_ex_new()
Device: mlx4_0
Memlock limit: 65536
You may need to consult with your system administrator to get this
problem fixed. This FAQ entry on the Open MPI web site may also be
helpful:
http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: yw0431
Local device: mlx4_0
--------------------------------------------------------------------------
[yw0431:20193] 11 more processes have sent help message help-mpi-btl-openib.txt / init-fail-no-mem
[yw0431:20193] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[yw0431:20193] 11 more processes have sent help message help-mpi-btl-openib.txt / error in device init
forrtl: error (78): process killed (SIGTERM)
This means my Linux server limits locked memory to 65536 KB (about 64 MB), but my job needs more; I think 2 GB should be enough.
I have found a solution that raises the limit:
ulimit -l unlimited
But I am worried that it will cause a system crash or other problems.
So, can I set "ulimit -l unlimited"?
If you set the ulimit to unlimited and your process starts using memory exhaustively, the OOM killer will kill your job to preserve system stability. I would set the limit to 80-90% of RAM instead of unlimited.
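For reference, a minimal sketch of how the memlock limit is usually inspected and raised persistently on a PAM-based system (the paths and values below are the conventional ones, not taken from your setup):
# show the current locked-memory limit for this shell, in KB
ulimit -l
# raise it for the current session (requires a sufficient hard limit or root)
ulimit -l unlimited
# make it persistent by adding these lines to /etc/security/limits.conf and re-logging in:
#   *   soft   memlock   unlimited
#   *   hard   memlock   unlimited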

Hadoop error log jvm sqoop

My mistake: after 6-8 hours of running Java programs I get this hs_err_pid6662.log file
and this output:
[testuser@apus ~]$ sh /home/progr/work/import.sh
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: Resource temporarily unavailable
The programs run every five minutes and try to import/export from Oracle.
How can I fix this?
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (gcTaskThread.cpp:48), pid=6662,
tid=0x00007f429a675700
#
--------------- T H R E A D ---------------
Current thread (0x00007f4294019000): JavaThread "Unknown thread"
[_thread_in_vm, id=6696, stack(0x00007f429a575000,0x00007f429a676000)]
Stack: [0x00007f429a575000,0x00007f429a676000], sp=0x00007f429a674550,
free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
code)
VM Arguments:
jvm_args: -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.11.1-
1.cdh5.11.1.p0.4/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -
Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.11.1-
1.cdh5.11.1.p0.4/lib/hadoop -Dhadoop.id.str= -
Dhadoop.root.logger=INFO,console -
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=/usr/java/jdk1.8.0_102
# JRE version: (8.0_102-b14) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode linux-
amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core
dumping, try "ulimit -c unlimited" before starting Java again
Memory: 4k page, physical 24591972k(6051016k free), swap 12369916k(11359436k
free)
I am running programs like sqoop-import and sqoop-export in Java every 5 minutes.
Example:
#!/bin/bash
hadoop jar /home/progr/import_sqoop/oracle.jar.
CDH version 5.11.1
java version jdk1.8.0_102
OS:Red Hat Enterprise Linux Server release 6.9 (Santiago)
Mem free:
             total       used       free     shared    buffers     cached
Mem:      24591972   20080336    4511636     132036     334456    2825792
-/+ buffers/cache:   16920088    7671884
Swap:     12369916    1008664   11361252
[Host Memory Usage screenshot omitted]
The maximum heap memory is (by default) limited to 1 GB. You need to increase this. From your log:
JRE version: (8.0_102-b14) (build )
jvm_args: -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.11.1-
1.cdh5.11.1.p0.4/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -
Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.11.1-
1.cdh5.11.1.p0.4/lib/hadoop -Dhadoop.id.str= -
Dhadoop.root.logger=INFO,console -
Try the following to increase this to 2048 MB (or higher if required):
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"
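In the cron-driven import.sh from the question, the export would typically go right before the hadoop invocation; a hedged sketch (the jar path is copied from the question, any remaining arguments stay as they were):
#!/bin/bash
# raise the client-side JVM heap picked up by the hadoop launcher
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"
hadoop jar /home/progr/import_sqoop/oracle.jar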
Reference:
Pig: Hadoop jobs Fail
https://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201104.mbox/%3C5FFFF0E4-B3BA-420A-ADE3-B422A66E8B11#yahoo-inc.com%3E

Tie System.map values to kernel addresses

I'm trying to boot a custom kernel on a BeagleBoneBlack. u-boot works, and loads stuff as follows:
U-Boot 2016.03 (Apr 26 2016 - 11:32:30 +0000)
Watchdog enabled
I2C: ready
DRAM: 512 MiB
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
*** Warning - bad CRC, using default environment
Net: <ethaddr> not set. Validating first E-fuse MAC
cpsw, usb_ether
Press SPACE to abort autoboot in 2 seconds
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
278 bytes read in 39 ms (6.8 KiB/s)
1: Linux grsec
Retrieving file: /boot/initramfs-grsec
5875398 bytes read in 349 ms (16.1 MiB/s)
Retrieving file: /boot/vmlinuz-4.4.8-grsec
3140944 bytes read in 211 ms (14.2 MiB/s)
append: BOOT_IMAGE=/boot/vmlinuz-4.4.8-grsec modules=loop,squashfs,sd-mod,usb-storage modloop=/boot/modloop-grsec console=ttyO0,115200n8
Retrieving file: /boot/dtbs/am335x-boneblack.dtb
31516 bytes read in 426 ms (71.3 KiB/s)
Kernel image @ 0x82000000 [ 0x000000 - 0x2fed50 ]
## Flattened Device Tree blob at 88000000
Booting using the fdt blob at 0x88000000
Loading Ramdisk to 8fa65000, end 8ffff6c6 ... OK
Loading Device Tree to 8fa5a000, end 8fa64b1b ... OK
Starting kernel ...
Everything looks good so far, I think, but the kernel fails to boot, and I can't get any output from the kernel even with low-level debugging enabled in the kernel options.
I've attached a J-Link JTAG debugger and was hoping to trace through to the problem, but I'm having trouble tying the System.map to the disassembly.
Here, for example, is the start of the System.map:
00000000 t __vectors_start
00000024 A cpu_ca8_suspend_size
00000024 A cpu_v7_suspend_size
0000002c A cpu_ca9mp_suspend_size
00001000 t __stubs_start
00001004 t vector_rst
00001020 t vector_irq
000010a0 t vector_dabt
00001120 t vector_pabt
000011a0 t vector_und
00001220 t vector_addrexcptn
00001240 t vector_fiq
00001240 T vector_fiq_offset
80204000 A swapper_pg_dir
80208000 T _text
80208000 T stext
8020808c t __create_page_tables
8020813c t __turn_mmu_on_loc
80208148 t __fixup_smp
802081b0 t __fixup_smp_on_up
802081d4 t __fixup_pv_table
80208228 t __vet_atags
80208280 T __idmap_text_start
80208280 T __turn_mmu_on
80208280 T _stext
So, taking __create_page_tables, I grep the source code under ./arch/arm/kernel with:
.../arm/arm/kernel$ grep __create_page_tables -rn
Binary file head.o matches
head.S:128: bl __create_page_tables
head.S:180:__create_page_tables:
head.S:355:ENDPROC(__create_page_tables)
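As an aside, a hedged alternative to grepping the sources is to read the link-time address straight from the symbol table of the unstripped vmlinux in the build tree (the cross-toolchain prefix below is an assumption):
arm-linux-gnueabihf-nm vmlinux | grep __create_page_tables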
So we're looking for the following at the symbol address:
__create_page_tables:
pgtbl r4, r8 # page table address
But the disassembler shows something different at the address I translate to, given that the kernel is loaded at 0x82000000.
How can I translate the kernel symbols to the debugger addresses?

unable to debug Jmeter through CLI

I'm trying to run JMeter from the command line on a CentOS VM like so:
./jmeter -n -t temp_cli/sampler.jmx -l temp_cli/results.xml -j temp_cli/j.log
I get :
INFO - jmeter.threads.JMeterThread: Thread is done: sampler flow 1-1
INFO - jmeter.threads.JMeterThread: Thread finished: sampler flow 1-1
DEBUG - jmeter.threads.ThreadGroup: Ending thread sampler 1-1
summary = 1 in 1s = 2.0/s Avg: 434 Min: 434 Max: 434 Err: 1 (100.00%)
Tidying up ... # Wed Apr 13 07:57:42 UTC 2016 (1460534262577)
... end of run
It is supposed to take more than 1 s, so I'm pretty sure something went wrong, but I don't get enough data about what.
I tried tail -f jmeter.log but saw no errors.
Does anyone know how I can get more information?
Your file results.xml will give you more details.
You can see there that you got a 100% error rate, so your single sample failed.
If you are running the test in non-GUI mode on a different machine from the one where you ran it in GUI mode, then you most probably did not install the plugin JARs.
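If results.xml still does not show enough, a hedged sketch of how more detail is commonly captured from the CLI; -L and the save-service properties are standard JMeter options, but the chosen values are assumptions for illustration:
# re-run with debug logging and keep response data for failed samples
./jmeter -n -t temp_cli/sampler.jmx -l temp_cli/results.xml -j temp_cli/j.log \
    -LDEBUG \
    -Jjmeter.save.saveservice.output_format=xml \
    -Jjmeter.save.saveservice.response_data.on_error=true \
    -Jjmeter.save.saveservice.assertion_results=all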

Resources