While installing TensorFlow on Ubuntu as the Keras backend, should I install it in a virtual environment or in the main Python installation? - keras

siddharth@siddharth-HP-EliteBook-8460p:~$ source ./venv/bin/activate
(venv) siddharth@siddharth-HP-EliteBook-8460p:~$ python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
2020-06-07 13:40:24.858083: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-06-07 13:40:24.858216: E tensorflow/stream_executor/cuda/cuda_driver.cc:313] failed call to cuInit: UNKNOWN ERROR (303)
2020-06-07 13:40:24.858349: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (siddharth-HP-EliteBook-8460p): /proc/driver/nvidia/version does not exist
2020-06-07 13:40:25.024713: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2593965000 Hz
2020-06-07 13:40:25.025838: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f0574000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-07 13:40:25.025876: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
tf.Tensor(-1066.3622, shape=(), dtype=float32)
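For reference, a minimal sketch of the virtual-environment route used in the session above (assuming Python 3 and the venv module are available; the libcuda warnings are expected on a machine without an NVIDIA driver, and TensorFlow falls back to the CPU):

# create and activate an isolated environment (the directory name is arbitrary)
python3 -m venv ~/venv
source ~/venv/bin/activate
pip install --upgrade pip
pip install tensorflow        # recent TensorFlow bundles Keras as tf.keras
# quick smoke test
python -c "import tensorflow as tf; print(tf.__version__)"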

Related

Kernel modules not loaded during boot

I am observing that some kernel modules are not being loaded with the latest kernel, 5.15.34-v7.
I have built a core-image-base from meta-raspberrypi (0135a02), and while trying to access the camera using Picamera I got some errors. The errors mainly complain about the mmal drivers not being present.
root@raspberrypi3:~# python3
Python 3.10.4 (main, Mar 23 2022, 20:25:24) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from picamera import PiCamera
>>> camera = PiCamera()
mmal: mmal_vc_shm_init: could not initialize vc shared memory service
mmal: mmal_vc_component_create: failed to initialise shm for 'vc.camera_info' (7:EIO)
mmal: mmal_component_create_core: could not create component 'vc.camera_info' (7)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.10/site-packages/picamera/camera.py", line 408, in __init__
    self._init_revision(options)
  File "/usr/lib/python3.10/site-packages/picamera/camera.py", line 480, in _init_revision
    with mo.MMALCameraInfo() as camera_info:
  File "/usr/lib/python3.10/site-packages/picamera/mmalobj.py", line 2425, in __init__
    super(MMALCameraInfo, self).__init__()
  File "/usr/lib/python3.10/site-packages/picamera/mmalobj.py", line 696, in __init__
    mmal_check(
  File "/usr/lib/python3.10/site-packages/picamera/exc.py", line 184, in mmal_check
    raise PiCameraMMALError(status, prefix)
picamera.exc.PiCameraMMALError: Failed to create MMAL component b'vc.camera_info': I/O error
>>>
>>>
root@raspberrypi3:~#
After digging through my system I found an older build (I don't know why I hadn't deleted it, but thankfully it gave some insight into the issue). I booted that image and everything works fine.
So I checked out the commit the older build was made from (63a3d8cb17c5d1affe8f2848f45fcc6a706f9412), and the camera worked fine (though I had to make a few changes, which are not significant for this issue). While analyzing the boot logs I found that the latest build (0135a02) does not load all the drivers.
I have also observed that the kernel modules are compressed in the 5.15.34 kernel, e.g. /lib/modules/5.15.34-v7/kernel/drivers/usb/gadget/libcomposite.ko.xz, and when trying to load a module with modprobe I get the following error:
root@raspberrypi3:~# ls /lib/modules/5.15.34-v7/kernel/drivers/usb/gadget/legacy/
g_acm_ms.ko.xz g_cdc.ko.xz g_hid.ko.xz g_midi.ko.xz g_printer.ko.xz g_webcam.ko.xz gadgetfs.ko.xz
g_audio.ko.xz g_ether.ko.xz g_mass_storage.ko.xz g_multi.ko.xz g_serial.ko.xz g_zero.ko.xz
root@raspberrypi3:~# modprobe gadgetfs
modprobe: FATAL: Module gadgetfs not found in directory /lib/modules/5.15.34-v7
My question is: what changes have happened to the kernel, and where, between 63a3d8cb17c5d1affe8f2848f45fcc6a706f9412 (5.10) and 0135a02 (5.15), so that I can look into them and adapt the changes required?
Note: All the commit hashes which are mentioned above are of meta-raspberrypi repo.
Logs
lsmod logs
5.15.34
root@raspberrypi3:~# lsmod
Module Size Used by
root@raspberrypi3:~#
5.10.81
root@raspberrypi3:~# lsmod
Module Size Used by
rfcomm 49152 2
cmac 16384 3
algif_hash 16384 1
nfc 86016 0
aes_arm_bs 24576 2
crypto_simd 16384 1 aes_arm_bs
cryptd 24576 2 crypto_simd
algif_skcipher 16384 1
af_alg 28672 6 algif_hash,algif_skcipher
bnep 20480 2
hci_uart 40960 1
btbcm 16384 1 hci_uart
bluetooth 421888 31 hci_uart,bnep,btbcm,rfcomm
ecdh_generic 16384 2 bluetooth
ecc 36864 1 ecdh_generic
ipv6 503808 26
brcmfmac 331776 0
brcmutil 24576 1 brcmfmac
sha256_generic 16384 0
bcm2835_v4l2 49152 0
cfg80211 782336 1 brcmfmac
bcm2835_codec 40960 0
bcm2835_isp 32768 0
v4l2_mem2mem 36864 1 bcm2835_codec
rfkill 32768 4 bluetooth,nfc,cfg80211
bcm2835_mmal_vchiq 36864 3 bcm2835_isp,bcm2835_codec,bcm2835_v4l2
videobuf2_dma_contig 20480 2 bcm2835_isp,bcm2835_codec
videobuf2_vmalloc 16384 1 bcm2835_v4l2
videobuf2_memops 16384 2 videobuf2_dma_contig,videobuf2_vmalloc
videobuf2_v4l2 32768 4 bcm2835_isp,bcm2835_codec,bcm2835_v4l2,v4l2_mem2mem
videobuf2_common 61440 5 bcm2835_isp,bcm2835_codec,bcm2835_v4l2,v4l2_mem2mem,videobuf2_v4l2
raspberrypi_hwmon 16384 0
videodev 253952 6 bcm2835_isp,bcm2835_codec,videobuf2_common,bcm2835_v4l2,v4l2_mem2mem,videobuf2_v4l2
mc 45056 6 bcm2835_isp,bcm2835_codec,videobuf2_common,videodev,v4l2_mem2mem,videobuf2_v4l2
vc_sm_cma 32768 2 bcm2835_isp,bcm2835_mmal_vchiq
uio_pdrv_genirq 16384 0
uio 20480 1 uio_pdrv_genirq
fixed 16384 0
root@raspberrypi3:~#
Make sure you have kernel-modules installed:
IMAGE_INSTALL_append = " kernel-modules"
EDIT
The package that provides all kernel modules is kernel-modules; alternatively, each module is available in a separate package kernel-module-<module_name>. In meta-raspberrypi, kernel-modules is set as a package that is not essential for boot (a recommendation rather than a hard dependency), which means that if the package is not found the board should still boot normally:
meta-raspberrypi/conf/machine/include/rpi-base.inc
MACHINE_EXTRA_RRECOMMENDS += "kernel-modules udev-rules-rpi"
In previous meta-raspberrypi branches, there was an rpi image recipe, rpi-basic-image.bb:
# Base this image on core-image-minimal
include recipes-core/images/core-image-minimal.bb

# Include modules in rootfs
IMAGE_INSTALL += " \
    kernel-modules \
"

SPLASH = "psplash-raspberrypi"
IMAGE_FEATURES += "ssh-server-dropbear splash"

do_image:prepend() {
    bb.warn("The image 'rpi-basic-image' is deprecated, please use 'core-image-base' instead")
}
So, the only thing needed to integrate the kernel modules is the kernel-modules package, either via an image recipe as in the example above, or in
local.conf
IMAGE_INSTALL_append = " kernel-modules"
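After rebuilding and booting the new image, a quick check on the target that the modules actually landed in the rootfs and that modprobe can resolve them (a sketch; gadgetfs is just the module from the question, and depmod is only needed if the module dependency files were not regenerated):
root@raspberrypi3:~# ls /lib/modules/$(uname -r)/kernel/drivers/usb/gadget/legacy/
root@raspberrypi3:~# depmod -a        # rebuild modules.dep if it is stale or missing
root@raspberrypi3:~# modprobe gadgetfs
root@raspberrypi3:~# lsmod | grep gadgetfs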

qemu: installing ubuntu through ISO gets stuck, shows "SVM" CPU bit warning

I am trying to install Ubuntu into one of the qcow2 images I have created, using the command below:
sudo qemu-system-x86_64 -enable-kvm -nographic -smp 8 -m 8G -cpu qemu64 -cdrom ubuntu-19.10-live-server-amd64.iso -boot d ubuntu-19.10-live-server-amd64.qcow2
First it spits out a warning, and then just hangs
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
SeaBIOS (version 1.13.0-1ubuntu1)
iPXE (http://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+BFF8C9F0+BFECC9F0 CA00
Booting from DVD/CD...
ISOLINUX 6.04 20190226 ETCD Copyright (C) 1994-2015 H. Peter Anvin et al
Loading bootlogo...
Initializing gfx code...
I have searched a lot and found a number of suggested solutions and possible causes, but none of them worked.
1) I have also tried with Ubuntu 20, but got the same error.
2) VT-x not enabled? It is enabled; lscpu shows:
Virtualization: VT-x
Hypervisor vendor: KVM
Flags: .. vmx ..
3) Try with -cpu qemu64. That is what the command above already uses; it did not work.
4) Use qemu-system-i386 instead of qemu-system-x86_64. That fails with a different error:
This kernel requires an x86-64 CPU, but only detected an i686 CPU.
Unable to boot - please use a kernel appropriate for your CPU.
5) I did find out that the "SVM" CPU bit corresponds to "AMD Secure Virtual Machine", which confused me since my CPU is an Intel Haswell.
Help!
I found what the issue was.
When using nested virtualization, the option -cpu host works. This tells qemu to present the same CPU model as its host, which in our case is itself a VM and therefore mostly exposes the outer host's CPU model too.
The above setting works,
unless you are using nested virtualization on top of VirtualBox and trying to run qemu inside that VM. In that case, to make it work, you have to skip -enable-kvm and the -cpu option altogether. It makes the qemu VM run slowly, but it works.
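For reference, the two working variants described above, reusing the original command line (a sketch; the ISO and image names are the ones from the question):

# KVM usable inside the (nested) guest: pass the host CPU model through
sudo qemu-system-x86_64 -enable-kvm -cpu host -nographic -smp 8 -m 8G \
    -cdrom ubuntu-19.10-live-server-amd64.iso -boot d ubuntu-19.10-live-server-amd64.qcow2

# inside a VirtualBox guest without working nested KVM: drop -enable-kvm and -cpu (pure emulation, slower)
sudo qemu-system-x86_64 -nographic -smp 8 -m 8G \
    -cdrom ubuntu-19.10-live-server-amd64.iso -boot d ubuntu-19.10-live-server-amd64.qcow2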

Cannot dlopen some GPU libraries. Skipping registering GPU devices

TensorFlow is only using the CPU and won't use the GPU. I assume it's because it expects CUDA 10.0 and finds 10.2.
I had installed 10.2, but have purged it and installed 10.0.
I'm running Ubuntu 19.10, an AMD Ryzen 2700 CPU, and an RTX 2080 SUPER.
I have installed the 440 NVIDIA driver. It says CUDA version 10.2 when I check with nvidia-smi and nvcc --version.
From pip3: tensorflow-gpu 1.14.0
tensorflow-datasets 2.0.0
tensorflow-estimator 1.14.0
tensorflow-metadata 0.21.1
From Nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 208... Off | 00000000:08:00.0 On | N/A |
| 0% 48C P8 13W / 250W | 369MiB / 7979MiB | 3% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1110 G /usr/lib/xorg/Xorg 18MiB |
| 0 1611 G /usr/lib/xorg/Xorg 73MiB |
| 0 1816 G /usr/bin/gnome-shell 108MiB |
| 0 2655 C python3 115MiB |
+-----------------------------------------------------------------------------+
From nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
But when I check version.txt I get 10.0.130:
cat /usr/local/cuda/version.txt
CUDA Version 10.0.130
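The disagreement between nvidia-smi, nvcc and version.txt usually just reflects which toolkit each tool is looking at (nvidia-smi reports the highest CUDA version the driver supports, not the installed toolkit). A quick way to see where each number comes from (a sketch; the paths are the usual CUDA install locations and may differ):

ls -l /usr/local/cuda        # which toolkit the symlink (and version.txt) points at
which nvcc && nvcc --version # the nvcc found first on PATH may belong to a different toolkit
echo $LD_LIBRARY_PATH        # the TensorFlow log below shows this pointing at cuda-10.2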
I check the devices with:
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
result:
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 4810338588393992961
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 7271419476897292826
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 4332706623198547606
physical_device_desc: "device: XLA_GPU device"
]
How do I register the 10.0.130 version?
Is that the reason why it won't run on the GPU? It's super slow on the 8-core CPU. Any advice?
Here is the log:
2020-02-13 14:11:31.411277: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-02-13 14:11:31.440150: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3193485000 Hz
2020-02-13 14:11:31.441076: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5625b689c790 executing computations on platform Host. Devices:
2020-02-13 14:11:31.441123: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-02-13 14:11:31.443001: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-02-13 14:11:31.472935: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:31.473407: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.487124: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-02-13 14:11:31.496148: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-02-13 14:11:31.498873: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-02-13 14:11:31.514842: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-02-13 14:11:31.525992: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:31.526183: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:31.618627: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:31.618655: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-02-13 14:11:31.618662: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-02-13 14:11:31.620367: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:31.621395: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5625b732d5f0 executing computations on platform CUDA. Devices:
2020-02-13 14:11:31.621407: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
[name: "/device:CPU:0"
device_type: "CPU"
memory_limit: 268435456
locality {
}
incarnation: 13330791690361361129
, name: "/device:XLA_CPU:0"
device_type: "XLA_CPU"
memory_limit: 17179869184
locality {
}
incarnation: 11872341970779952422
physical_device_desc: "device: XLA_CPU device"
, name: "/device:XLA_GPU:0"
device_type: "XLA_GPU"
memory_limit: 17179869184
locality {
}
incarnation: 15007819717683015571
physical_device_desc: "device: XLA_GPU device"
]
WARNING:tensorflow:From pokeGAN.py:172: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead.
WARNING:tensorflow:From pokeGAN.py:174: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.
WARNING:tensorflow:From pokeGAN.py:77: The name tf.get_variable is deprecated. Please use tf.compat.v1.get_variable instead.
2020-02-13 14:11:33.799163: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-02-13 14:11:33.799597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.845
pciBusID: 0000:08:00.0
2020-02-13 14:11:33.799646: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:33.799658: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-02-13 14:11:33.799669: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-02-13 14:11:33.799684: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-02-13 14:11:33.799695: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-02-13 14:11:33.799706: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-02-13 14:11:33.799777: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
2020-02-13 14:11:33.799786: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2020-02-13 14:11:33.800016: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-13 14:11:33.800028: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
WARNING:tensorflow:From pokeGAN.py:203: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead.
2020-02-13 14:11:34.197990: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
WARNING:tensorflow:From /home/node/.local/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
WARNING:tensorflow:From pokeGAN.py:211: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
total training sample num:91
batch size: 64, batch num per epoch: 1, epoch num: 5000
start training...
Judging from your logs, it looks like TensorFlow finds the correct CUDA version, but the cuDNN library is missing:
2020-02-13 14:11:31.474361: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-02-13 14:11:31.526168: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudnn.so.7'; dlerror: libcudnn.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-10.2/lib64
Have you installed the correct version of cuDNN? As you can see in the TensorFlow tested-configurations table,
tensorflow-gpu 1.14 also requires cuDNN 7.4.
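A quick way to check whether cuDNN 7 is visible to the dynamic loader, and to re-test after installing cuDNN 7.4 for CUDA 10.0 (a sketch; the library path is an assumption and depends on how cuDNN was installed):

# is libcudnn.so.7 known to the linker at all?
ldconfig -p | grep libcudnn
# the log shows LD_LIBRARY_PATH pointing at cuda-10.2; point it at the 10.0 toolkit instead
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
# with tensorflow-gpu 1.14, this should print True once cuDNN 7.4 is found
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"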
The only thing that worked for me to solve this issue was to completely remove CUDA and reinstall it again.

SLURM, using srun to print outputs

I am using srun to run my program; however, it does not print the output.
me#home:~$ srun -p K80q --gres=gpu:1 -N 1 python3 main.py
2019-05-15 19:56:43.305156: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-05-15 19:56:43.543516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:85:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-05-15 19:56:43.543567: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2019-05-15 19:56:43.900189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-05-15 19:56:43.900248: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2019-05-15 19:56:43.900257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2019-05-15 19:56:43.900619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10761 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:85:00.0, compute capability: 3.7)
I only got the above output, and it does not print the information I expected. How can I fix it?
By the way, with a simple test script:
import tensorflow

if __name__ == '__main__':
    for i in range(10):
        print('Hello')
it prints Hello 10 times.
Update:
After 20 minutes, it did output the information I expected. How can I make the output appear immediately?
Try the -u option of srun:
-u, --unbuffered
By default the connection between slurmstepd and the user launched application is over a pipe. The stdio output written by the application is buffered by the glibc until it is flushed or the output is set as unbuffered. See setbuf(3). If this option is specified the tasks are executed with a pseudo terminal so that the application output is unbuffered.
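Applied to the command from the question (and, as an aside, python3 -u forces unbuffered output from the Python side as well):

srun -u -p K80q --gres=gpu:1 -N 1 python3 main.py
# or make Python itself flush immediately:
srun -p K80q --gres=gpu:1 -N 1 python3 -u main.py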

No userspace chardev available when using pwm-ir-tx module

I'm attempting to use a PWM output pin as an IR transmitter on a RAMIPS SoC with the pwm-ir-tx kernel module. I'm running Linux 4.14.37 and have added the following entry to the dts file:
pwm_ir_tx1: pwm-ir-transmitter1 {
    compatible = "pwm-ir-tx";
    pwms = <&pwm 1 100>;
};
I'm loading the rc-core and pwm-ir-tx kernel modules:
lsmod | grep pwm
pwm_ir_tx 2032 0
pwm_mediatek_ramips 1744 1
rc_core 19348 2 pwm_ir_tx
When the pwm-ir-tx module loads, the kernel logs:
[ 3754.108259] rc rc0: PWM IR Transmitter as /devices/platform/pwm-ir-transmitter1/rc/rc0
The sysfs nodes appear to be loaded correctly:
ls -la /sys/class/rc/
drwxr-xr-x 2 root root 0 May 29 00:18 .
drwxr-xr-x 23 root root 0 Jan 1 1970 ..
lrwxrwxrwx 1 root root 0 May 29 01:16 rc0 -> ../../devices/platform/pwm-ir-transmitter1/rc/rc0
But there is no userspace (chardev) LIRC device listed in /dev, so I'm not sure how I'm supposed to interact with the device. Ideally I'd like to use the Remote Controller API, but this requires a chardev to be present in /dev.
ls /dev
autofs mtd2ro network_throughput
console mtd3 null
cpu_dma_latency mtd3ro port
full mtd4 ptmx
gpiochip0 mtd4ro pts
gpiochip1 mtd5 random
gpiochip2 mtd5ro shm
gpiochip3 mtd6 tty
i2c-0 mtd6ro ttyS0
kmsg mtdblock0 ttyS1
log mtdblock1 ttyS2
memory_bandwidth mtdblock2 urandom
mtd0 mtdblock3 watchdog
mtd0ro mtdblock4 watchdog0
mtd1 mtdblock5 zero
mtd1ro mtdblock6
mtd2 network_latency
I've tried loading the lirc_dev module both before and after the pwm-ir-tx module, but still nothing appears in /dev. The following output appears when I load the lirc_dev module:
[ 4775.367966] lirc_dev: IR Remote Control driver registered, major 251
But there is still no LIRC userspace device in /dev. I'm thinking the lirc_dev module is required since it provides the LIRC userspace API, but there doesn't appear to be any connection between it and the pwm-ir-tx module, and it's not creating any LIRC chardevs in /dev.
The pwm_ir_tx module seems to be more or less a piggy-back on the pwm driver, and the pwm driver is available through /sys/class/pwm/. See https://www.kernel.org/doc/Documentation/pwm.txt.
BTW, not having a lirc link in rc0/ is not uncommon - not all drivers implement it.
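If experimenting with the pin through the sysfs PWM interface is enough, here is a minimal sketch based on the pwm.txt document linked above (pwmchip0 is an assumption; channel 1 matches the pwms = <&pwm 1 100> entry, and the period/duty values are placeholders for a roughly 38 kHz carrier):

echo 1 > /sys/class/pwm/pwmchip0/export
echo 26316 > /sys/class/pwm/pwmchip0/pwm1/period       # period in nanoseconds (~38 kHz)
echo 13158 > /sys/class/pwm/pwmchip0/pwm1/duty_cycle   # 50% duty cycle
echo 1 > /sys/class/pwm/pwmchip0/pwm1/enable
echo 0 > /sys/class/pwm/pwmchip0/pwm1/enable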
After reading some of the rc source files in the kernel, it became obvious that the ir-lirc-codec module was also required.
Basically, the pwm-ir-tx driver is registered with type RC_DRIVER_IR_RAW_TX. When the rc-core module registers a driver of type RC_DRIVER_IR_RAW or RC_DRIVER_IR_RAW_TX, it calls the ir_raw_event_prepare function, which in turn tries to load the ir-lirc-codec module. Once this module was available, the following kernel logs appeared:
[ 10.004460] lirc_dev: IR Remote Control driver registered, major 251
[ 10.131011] IR LIRC bridge handler initialized
[ 10.471561] rc rc0: PWM IR Transmitter as /devices/platform/pwm-ir-transmitter1/rc/rc0
[ 10.487456] rc rc0: lirc_dev: driver ir-lirc-codec (pwm-ir-tx) registered at minor = 0
And in /dev there is a lirc chardev device available:
ls /dev/li*
/dev/lirc0
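With /dev/lirc0 present, the chardev can be exercised through the LIRC interface, for example with ir-ctl from v4l-utils (a sketch; it assumes ir-ctl is installed on the target, and the NEC scancode is just a placeholder):

# list the features the transmitter advertises
ir-ctl -d /dev/lirc0 --features
# transmit a test scancode
ir-ctl -d /dev/lirc0 -S nec:0x404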
