Analyse TF training speed - how to debug? - linux

I've trained the same model on two installations:
1. iMac (late 2013), 3.5 GHz i7, 32 GB RAM
2. One node in a Debian Slurm CPU cluster with 24 cores at 2.2 GHz, 64 GB RAM
Both run TF version 1.0.1 (CPU), and both were installed with pip. Training times for the model:
1. iMac: Total Time: 86.92s - 207,42s user 38,82s system 267% CPU 1:32,07
2. Linux: Total Time: 330.66s - real 5m38.478s user 22m14.264s sys 3m5.964s
Obviously not what I was expecting. How can I search for bottlenecks, or profile, and find out why the Linux setup is so slow?
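Before reaching for a profiler, the two timing summaries quoted above can be turned into comparable numbers. The following is a minimal sketch that uses only the figures from the question; the function name `cpu_summary` is made up for illustration.

```python
def cpu_summary(real_s, user_s, sys_s):
    """Return (total CPU seconds, average number of busy cores)."""
    cpu_s = user_s + sys_s
    return cpu_s, cpu_s / real_s

# Figures copied from the timing output in the question.
imac = cpu_summary(real_s=92.07, user_s=207.42, sys_s=38.82)
linux = cpu_summary(
    real_s=5 * 60 + 38.478,
    user_s=22 * 60 + 14.264,
    sys_s=3 * 60 + 5.964,
)

print(f"iMac:  {imac[0]:.1f} CPU-s, {imac[1]:.2f} cores busy on average")
print(f"Linux: {linux[0]:.1f} CPU-s, {linux[1]:.2f} cores busy on average")
print(f"CPU-time ratio Linux/iMac: {linux[0] / imac[0]:.1f}x")
```

On these numbers the Linux node burns roughly six times as much CPU time for the same job while keeping only about 4.5 of its 24 cores busy on average. That is a reading of the timings, not a diagnosis, but it suggests looking at threading overhead or contention first.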

Related

WineBottler returning error message every time I try to open an .exe file on Mac

I'm trying to open a plugin for SNAP (to process Sentinel-3 imagery) on my Mac - the plugin downloads as an .exe file which means I need to open it using WineBottler. Every time I try and open the file however, I get this error message:
###BOTTLING### default.sh
/var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources/bottler.sh: line 39: /Applications/Wine.app/Contents/Resources/bin/wine: Bad CPU type in executable
###BOTTLING### Gathering debug Info...
Versions
OS...........................: darwin21
Wine.........................:
WineBottler..................: 1.8.6
Wineticks....................: 20220411-next - sha256sum: b6370f13c4dc410023f2a4e4e9a4385d2a0420031666c2f30befccc9b39c8f65
Environment
PWD..........................: '/Applications/Wine.app/Contents/Resources/bin'
PATH.........................: /Applications/Wine.app/Contents/Resources/bin:/usr/bin:/bin:/usr/sbin:/sbin
USER.........................: hannah
HOME.........................: /Users/hannah
COMPUTERNAME.................: hannah’s MacBook Air
BUNDLERESOURCEPATH...........: /var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources
WINEPREFIX...................: /Applications/Wine.app/Contents/Resources
WINEPATH.....................: /Applications/Wine.app/Contents/Resources/bin
LD_LIBRARY_PATH..............: /Applications/Wine.app/Contents/Resources/lib:/opt/X11/lib:/usr/X11/lib
DYLD_FALLBACK_LIBRARY_PATH...: /Applications/Wine.app/Contents/Resources/lib:/usr/lib:/opt/X11/lib:/usr/X11/lib
SILENT.......................:
http_proxy...................:
https_proxy..................:
ftp_proxy....................:
socks5_proxy.................:
Bottle
TEMPLATE.....................:
BOTTLE.......................: /Users/hannah/Desktop/Untitled.app
INSTALLER_URL................: /Users/hannah/Desktop/iCOR_Setup_3.0.0.exe
INSTALLER_IS_ZIPPED..........: 0
INSTALLER_NAME...............: iCOR_Setup_3.0.0.exe
INSTALLER_ARGUMENTS..........:
REMOVE_MONO..................:
REMOVE_GECKO.................:
REMOVE_USERS.................:
REMOVE_INSTALLERS............:
WINETRICKS_ITEMS.............: winxp
DLL_OVERRIDES................:
EXECUTABLE_PATH..............: winefile
EXECUTABLE_ARGUMENTS.........:
EXECUTABLE_VERSION...........: 1.0.0
BUNDLE_COPYRIGHT.............: © Your Company
BUNDLE_IDENTIFIER............: com.yourcompany.yourapp
BUNDLE_CATEGORYTYPE..........: public.app-category.business
SILENT.......................:
Hardware:
Hardware Overview:
Model Name: MacBook Air
Model Identifier: MacBookAir7,2
Processor Name: Dual-Core Intel Core i5
Processor Speed: 1.6 GHz
Number of Processors: 1
Total Number of Cores: 2
L2 Cache (per Core): 256 KB
L3 Cache: 3 MB
Hyper-Threading Technology: Enabled
Memory: 4 GB
System Firmware Version: 476.0.0.0.0
OS Loader Version: 540.120.3~22
SMC Version (system): 2.27f2
Serial Number (system): C02QM1XWG941
Hardware UUID: EE27242F-C2B2-59E6-AAED-D598D1D61044
Provisioning UDID: EE27242F-C2B2-59E6-AAED-D598D1D61044
###BOTTLING### Create .app...
###BOTTLING### Enabling CoreAudio, Colors, Antialiasing and flat menus...
/var/folders/rz/rr6ytzhx5gl60f1v1tbc67xm0000gn/T/AppTranslocation/6CDA1855-FA78-4A2A-A976-2C1A539F36ED/d/WineBottler.app/Contents/Frameworks/WBottler.framework/Resources/bottler.sh: line 134: /Applications/Wine.app/Contents/Resources/bin/wine: Bad CPU type in executable
### LOG ### Command '/Applications/Wine.app/Contents/Resources/bin/wine regedit /tmp/reg.reg' returned status 126.
###ERROR### Command '/Applications/Wine.app/Contents/Resources/bin/wine regedit /tmp/reg.reg' returned status 126.
Task returned with status 1.
I've tried downloading the 'stable' version of WineBottler, and I've downloaded and re-downloaded it, to no avail: it always returns this message. I can't find any way around this, nor any recently posted question (most are from 2010-15 and their solutions are outdated).
Does anyone know what I can do to get around this and open it? It's driving me insane!!!
Thanks!
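One possible reading of "Bad CPU type in executable" is that the bundled `wine` binary was built for an architecture this macOS version no longer runs. The log reports darwin21 (macOS 12), and macOS 10.15 and later do not run 32-bit Intel executables. Whether this particular binary is 32-bit is an assumption, not something the log confirms. A minimal sketch for checking it by reading the file's first four bytes follows; the helper names are hypothetical.

```python
import struct

# Mach-O magic numbers, as seen when the first 4 bytes are read big-endian.
MAGICS = {
    0xFEEDFACE: "32-bit Mach-O (big-endian)",
    0xCEFAEDFE: "32-bit Mach-O (little-endian)",
    0xFEEDFACF: "64-bit Mach-O (big-endian)",
    0xCFFAEDFE: "64-bit Mach-O (little-endian)",
    0xCAFEBABE: "universal (fat) binary",
}

def classify_macho(header: bytes) -> str:
    """Classify a file from its first four bytes."""
    if len(header) < 4:
        return "too short"
    (magic,) = struct.unpack(">I", header[:4])
    return MAGICS.get(magic, "not a Mach-O file")

def classify_file(path: str) -> str:
    """Read the first four bytes of a file on disk and classify them."""
    with open(path, "rb") as fh:
        return classify_macho(fh.read(4))

# Header bytes of a little-endian 32-bit binary, used here as sample input.
print(classify_macho(bytes.fromhex("cefaedfe")))
```

On the affected Mac, `classify_file("/Applications/Wine.app/Contents/Resources/bin/wine")` (the path shown in the log) would report which kind of binary is installed. A universal binary needs a closer look, since it can contain several architectures.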

Pytorch not detecting multiple GPUs

I have 10 GPUs available and 1 GPU (e.g. GPU#9) is in use by another torch process. I would like to run another process on any of the remaining GPUs (e.g. GPU#2, GPU#3, GPU#4) but I always get the error message:
RuntimeError: CUDA error: all CUDA-capable devices are busy or unavailable
I tried several options and none of them worked:
OPTION 1: Selecting GPUs on python script
import os
# Set these before importing torch so CUDA only sees the selected devices.
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3,4"
import torch
print(f"[INFO] Using GPU: {torch.cuda.current_device()}")
print(f"[INFO] Available GPUs: {torch.cuda.device_count()}")
for d in range(torch.cuda.device_count()):
    print(torch.cuda.get_device_name(d))
It recognises the 3 GPUs and the process will now assign them as GPU#0, GPU#1 and GPU#2 within that process.
[INFO] Using GPU: 2,3,4
[INFO] Available GPUs: 10
GeForce GTX 1080 Ti
OPTION 2: Selecting GPU on command line:
CUDA_VISIBLE_DEVICES=2,3,4 python gdxray_cganTrainer.py
and I check on linux using
env | grep CUDA_VISIBLE_DEVICES
CUDA_VISIBLE_DEVICES=2,3,4
Neither of the two options works, as I still get the "CUDA-capable devices are busy or unavailable" error. It looks like it knows that the already running process on GPU#9 was assigned as GPU#0, and when I select GPU#2,3,4 the first GPU is also assigned as GPU#0 in the new process. But that shouldn't matter, as they are different processes.
I am using torch 1.8.0, python 3.8.8, cudatoolkit 11.1.1
Any help?
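Both options rely on `CUDA_VISIBLE_DEVICES` being in place before CUDA is initialised in the process. A minimal sketch of a guard for that ordering follows. The helper `select_gpus` is hypothetical, and the warning only covers the common case where `torch` was imported earlier in the same script.

```python
import os
import sys
import warnings

def select_gpus(gpu_ids):
    """Restrict this process to the given physical GPU indices."""
    if "torch" in sys.modules:
        # CUDA may already be initialised, in which case the change is ignored.
        warnings.warn("torch is already imported; the selection may have no effect")
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return os.environ["CUDA_VISIBLE_DEVICES"]

visible = select_gpus([2, 3, 4])
print(visible)

# Import torch only after this point; inside the process the selected
# devices are renumbered starting from 0.
```

If the error persists with the variable set correctly, the compute mode that `nvidia-smi` reports for each GPU is another thing worth checking, since exclusive modes can make a device unavailable to a second process. That is a guess about this setup, not something the question confirms.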

"Microsoft Windows 10 Enterprise 2016 LTSB 10.0.14393 Version 1607" fails on boot in qemu/kvm (proxmox)

I have played with different versions of Windows 10 inside qemu/kvm (Proxmox) and all of them work fine except "Microsoft Windows 10 Enterprise 2016 LTSB 10.0.14393 Version 1607".
I don't think the problem is connected with Proxmox itself. As far as I know, Proxmox is a stable and reliable system that uses qemu/kvm under the hood, so let's focus on qemu/kvm. In any case, my Proxmox versions are listed below.
root@home:~# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
pve-kernel-4.15: 5.3-1
pve-kernel-4.15.18-10-pve: 4.15.18-32
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-44
libpve-guest-common-perl: 2.0-19
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-36
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-2
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-33
pve-container: 2.0-33
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-17
pve-firmware: 2.0-6
pve-ha-manager: 2.0-6
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 3.10.1-1
qemu-server: 5.0-45
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.12-pve1~bpo1
I have not found any similar thread on the Proxmox forum, which is why I am posting my problem here.
This is a clean, original MSDN ISO from Microsoft with confirmed hash sums (installed more than 10 times).
Steps to reproduce:
Create a VM with the following configuration
root@home:~# cat /etc/pve/qemu-server/102.conf
bios: ovmf
boot: dcn
bootdisk: scsi0
cores: 8
cpu: host
efidisk0: local-lvm:vm-102-disk-0,size=4M
ide2: iso-backs:iso/MS DaRT 10 Eng x86 x64.iso,media=cdrom,size=600320K
machine: q35
memory: 8192
name: win10-test
net0: virtio=C2:25:D9:DD:F2:4F,bridge=vmbr0
numa: 1
ostype: win10
scsi0: local-lvm:vm-102-disk-1,size=100G
scsi1: external:vm-102-disk-0,size=100G
scsihw: virtio-scsi-pci
smbios1: uuid=9d455cbf-1fa2-495f-928d-3935ec39c245
sockets: 1
usb0: host=1c4f:0002
usb1: host=09da:9090
vmgenid: 40cd47b6-35c4-47ab-8f9e-ed2acb618fcc
Install the latest virtio drivers (scsi, netkvm, balloon, qemu-fwcfg)
Accept disk auto-partitioning (4 partitions will be created for this ISO)
Wait for the installation to finish and reboot the system
The boot will get stuck at the Proxmox logo
However, I can always boot from the live CD (MS DaRT); to do that I need to manually choose the hard disk from the "Use a device" menu.
Once it has loaded successfully, there is a chance it will boot again an indefinite number of times. I can't figure out the reason for this behavior.
I have tried to work around the problem by installing GRUB, but nothing changed: I can still load the system through the live CD, and the default boot process still gets stuck at random.
Event Viewer errors (repeatable):
Distributed COM Event_id: 10016
Eventlog Event_id: 1101
Kernel-Power Event_id: 41
Eventlog Event_id: 6008
Kernel-Power Event_id: 13
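To make the VM configuration above easier to reason about, here is a minimal sketch that parses a few of its lines and decodes the `boot: dcn` string. The letter meanings follow the legacy QEMU-style boot string (a = floppy, c = hard disk, d = CD-ROM, n = network). Whether OVMF firmware honours this order in the same way as a legacy BIOS is not something the post establishes. The helper names are hypothetical.

```python
BOOT_LETTERS = {"a": "floppy", "c": "hard disk", "d": "CD-ROM", "n": "network"}

def parse_conf(text):
    """Parse 'key: value' lines into a dict, splitting on the first colon only."""
    conf = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf

def boot_order(conf):
    """Translate the legacy boot string into a list of device classes."""
    return [BOOT_LETTERS.get(ch, f"unknown ({ch})") for ch in conf.get("boot", "")]

# A few lines copied from the configuration in the question.
sample = """bios: ovmf
boot: dcn
bootdisk: scsi0
ide2: iso-backs:iso/MS DaRT 10 Eng x86 x64.iso,media=cdrom,size=600320K
scsi0: local-lvm:vm-102-disk-1,size=100G"""

conf = parse_conf(sample)
print(boot_order(conf))
print(conf["bootdisk"])
```

With this configuration the CD-ROM comes before the hard disk in the requested order, and `bootdisk` names `scsi0` as the disk to boot from.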

Hadoop error log jvm sqoop

My problem: after 6-8 hours of running Java programs I get this log, hs_err_pid6662.log,
and this:
[testuser@apus ~]$ sh /home/progr/work/import.sh
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: retry: Resource temporarily unavailable
/usr/bin/hadoop: fork: Resource temporarily unavailable
The programs run every five minutes and import from or export to Oracle.
How can I fix this?
# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (gcTaskThread.cpp:48), pid=6662, tid=0x00007f429a675700
#
--------------- T H R E A D ---------------
Current thread (0x00007f4294019000): JavaThread "Unknown thread" [_thread_in_vm, id=6696, stack(0x00007f429a575000,0x00007f429a676000)]
Stack: [0x00007f429a575000,0x00007f429a676000], sp=0x00007f429a674550, free space=1021k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
VM Arguments:
jvm_args: -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -
Launcher Type: SUN_STANDARD
Environment Variables:
JAVA_HOME=/usr/java/jdk1.8.0_102
# JRE version: (8.0_102-b14) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.102-b14 mixed mode linux-amd64 compressed oops)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
Memory: 4k page, physical 24591972k(6051016k free), swap 12369916k(11359436k free)
I am running Java programs such as sqoop-import and sqoop-export every 5 minutes.
Example:
#!/bin/bash
hadoop jar /home/progr/import_sqoop/oracle.jar.
CDH version 5.11.1
java version jdk1.8.0_102
OS:Red Hat Enterprise Linux Server release 6.9 (Santiago)
Mem free:
             total       used       free     shared    buffers     cached
Mem:      24591972   20080336    4511636     132036     334456    2825792
-/+ buffers/cache:   16920088    7671884
Swap:     12369916    1008664   11361252
[Image: Host Memory Usage]
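The shell's `fork: retry: Resource temporarily unavailable` lines and the JVM's `Cannot create GC thread` message both concern creating new processes or threads. On Linux, a fork that fails this way (EAGAIN) is commonly tied to a limit on the number of processes or threads, so that limit may be worth checking alongside memory. A minimal sketch follows, assuming it is run as the same user that launches the jobs (`testuser` in the prompt above); the helper name is hypothetical.

```python
import resource

def nproc_limits():
    """Return the (soft, hard) per-user process limit as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return fmt(soft), fmt(hard)

soft, hard = nproc_limits()
print(f"max user processes: soft={soft}, hard={hard}")
```

If jobs started every five minutes do not exit, their processes and threads would accumulate against this limit over several hours. That matches the timing described in the question, but it remains an assumption to verify.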
The maximum heap size is limited to 1 GB by default (note the -Xmx1000m in the JVM arguments from your log). You need to increase it.
JRE version: (8.0_102-b14) (build )
jvm_args: -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.11.1-1.cdh5.11.1.p0.4/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -
Try the following to increase this to 2048 MB (or higher if required).
export HADOOP_CLIENT_OPTS="-Xmx2048m ${HADOOP_CLIENT_OPTS}"
Reference:
Pig: Hadoop jobs Fail
https://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201104.mbox/%3C5FFFF0E4-B3BA-420A-ADE3-B422A66E8B11#yahoo-inc.com%3E
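To confirm which heap limit a given JVM invocation actually received, the `jvm_args` line of the hs_err log can be inspected. Below is a minimal sketch, assuming the JVM honours the last `-Xmx` it is given when several are present; the helper name `max_heap_bytes` is hypothetical.

```python
import re

_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

def max_heap_bytes(jvm_args):
    """Return the last -Xmx value in bytes, or None if no -Xmx is present."""
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", jvm_args)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS.get(unit.lower(), 1)

# Shortened version of the jvm_args line from the log.
logged = "-Xmx1000m -Dhadoop.log.file=hadoop.log -Dhadoop.root.logger=INFO,console"
heap = max_heap_bytes(logged)
print(f"max heap: {heap / 1024 ** 2:.0f} MiB")
```

After exporting `HADOOP_CLIENT_OPTS` as shown, running the same check on the arguments of a newly started job shows whether the larger value took effect.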

Three.js shader error 1282

I receive the following error when trying to render using three.js.
THREE.WebGLProgram: shader error: 1282 gl.VALIDATE_STATUS false
gl.getProgramInfoLog not linked.
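The number 1282 in the message is a raw WebGL/OpenGL error code. A minimal lookup sketch, using the standard error constants, shows what it stands for; the helper name is hypothetical.

```python
# Standard OpenGL / WebGL error codes.
GL_ERRORS = {
    0x0500: "INVALID_ENUM",
    0x0501: "INVALID_VALUE",
    0x0502: "INVALID_OPERATION",
    0x0505: "OUT_OF_MEMORY",
    0x0506: "INVALID_FRAMEBUFFER_OPERATION",
    0x9242: "CONTEXT_LOST_WEBGL",
}

def gl_error_name(code):
    """Map a numeric GL error code to its symbolic name."""
    return GL_ERRORS.get(code, f"unknown (0x{code:04X})")

print(gl_error_name(1282))
```

The code maps to INVALID_OPERATION, which on its own only says that a call was not allowed in the current state. Together with "not linked" it indicates that the shader program failed to link, without saying why.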
I have gotten my project running on a Windows 10 laptop and am now trying to get it running on Windows Server 2012. I suspect the error might be due to the server not having a real GPU, but I am no OpenGL developer and it's all Chinese to me. This is what DXDIAG says:
Display Devices
Card name: RDPUDD Chained DD
Manufacturer:
Chip type:
DAC type:
Device Type: n/a
Device Key: Enum\ROOT\BASICRENDER
Display Memory: n/a
Dedicated Memory: n/a
Shared Memory: n/a
Current Mode: 1920 x 1080 (32 bit) (32Hz)
Driver Name:
Driver File Version: ()
Driver Version:
DDI Version: 9Ex
Feature Levels:
Driver Model:
Graphics Preemption:
Compute Preemption:
Driver Attributes: Final Retail
Driver Date/Size: , 0 bytes
WHQL Logo'd: n/a
WHQL Date Stamp: n/a
Device Identifier: {D7B71AF4-43CC-11CF-3921-F003ADC2CB35}
Vendor ID: 0x1414
Device ID: 0x008C
SubSys ID: 0x00000000
Revision ID: 0x0000
Driver Strong Name:
Rank Of Driver:
Video Accel:
DXVA2 Modes:
Deinterlace Caps: n/a
D3D9 Overlay: Not Supported
DXVA-HD: Not Supported
DDraw Status: Not Available
D3D Status: Enabled
AGP Status: Not Available
So the process is, as usual, to install node-gyp, which is risky business. That requires the Visual C++ Build Tools, which are installed on the server. Python 2.7 is installed and I've run the commands stated in the Installation section: https://github.com/nodejs/node-gyp
Then I use headless-gl to create the WebGL context in Three.js. That requires d3dcompiler_47.dll, which I added to the system32 folder. https://github.com/stackgl/headless-gl
The thing is that the server is pretty locked down. I am not able to npm install modules because of a self-signed certificate, probably since I am not allowed to do much on the server. Now I'm pretty much stuck. I don't know if my error is due to some compiler not being installed, a module that needs to be installed on the server, node-gyp messing with me, or something else. It's a complete mess. But the project does work on my own machine. Has anyone encountered something like this and could point me in a direction? I would be grateful!
