Can I set a quota on a single hypervisor, for example on memory_mb?
I want to set the hypervisor's memory_mb to a value smaller than the true one (32126), because I need to reserve some resources for the host system itself.
root@l-pc:~# openstack hypervisor show l-pc -c free_ram_mb -c memory_mb -c memory_mb_used -c running_vms
+----------------+-------+
| Field          | Value |
+----------------+-------+
| free_ram_mb    | 27518 |
| memory_mb      | 32126 |
| memory_mb_used | 4608  |
| running_vms    | 1     |
+----------------+-------+
root@l-pc:~# free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        9.8G         17G        119M        3.9G         20G
Swap:           14G          0B         14G
I know the openstack quota set command can limit resources for a project, but not for a single hypervisor.
root@l-pc:~# openstack hypervisor -h
Command "hypervisor" matches:
hypervisor list
hypervisor show
hypervisor stats show
Is there a way to do this, something like an openstack hypervisor set command, to limit the resources consumed on one hypervisor?
You can use the reserved_host_memory_mb parameter in nova.conf.
It takes any positive integer, representing the amount of memory in MB to reserve for the host.
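As a minimal sketch, the setting goes in the [DEFAULT] section of nova.conf on the compute node. The 4096 below is only an illustrative value, not a recommendation; the default is 512.

```ini
[DEFAULT]
# Illustrative value (assumption): keep 4 GB of RAM back for the host OS.
reserved_host_memory_mb = 4096
```

Restart the nova-compute service afterwards so that the new value is picked up (the exact service name depends on the distribution). As far as I know, memory_mb keeps reporting the physical total of 32126; the reservation is instead counted against what the scheduler treats as available.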
Torch Error:
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 10.73 GiB total capacity; 9.47 GiB already allocated; 347.56 MiB free; 9.51 GiB reserved in total by PyTorch)
I checked the GPU with nvidia-smi, which shows no other running processes and a memory usage of 10/10989 MiB.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:04:00.0 Off |                  N/A |
| 22%   30C    P8    10W / 230W |     10MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
I have tried the following three methods:
with torch.no_grad();
torch.cuda.empty_cache();
reducing batch_size.
None of them worked.
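I assume you checked the GPU allocation after the "CUDA out of memory" error, when the failed process had presumably already exited and released its memory. torch.no_grad() only reduces memory during evaluation or inference, where it stops autograd from keeping activations for a backward pass; it cannot be wrapped around a training step. How much memory you need depends on the model and the problem you are solving.
Try monitoring the CUDA memory with watch -n1 nvidia-smi while the script runs, and if you can, post the code of your dataloader and training loop so that we can assist you. In general, reducing the batch size and detaching unnecessary tensors should help.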
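Memory can also be checked from inside the process, which is where the 9.47 GiB in the error message is being counted. The sketch below assumes PyTorch 1.4 or later (for torch.cuda.memory_reserved); the helper names to_mib and cuda_memory_report are made up for this example.

```python
def to_mib(num_bytes):
    """Convert a byte count to MiB."""
    return num_bytes / (1024 ** 2)


def cuda_memory_report(device=0):
    """Summarize this process's CUDA memory use.

    Returns None when PyTorch or a CUDA device is not available.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    allocated = torch.cuda.memory_allocated(device)  # bytes held by live tensors
    reserved = torch.cuda.memory_reserved(device)    # bytes held by the caching allocator
    total = torch.cuda.get_device_properties(device).total_memory
    return (f"allocated={to_mib(allocated):.0f} MiB "
            f"reserved={to_mib(reserved):.0f} MiB "
            f"total={to_mib(total):.0f} MiB")
```

Calling print(cuda_memory_report()) after the forward pass, after backward(), and after the optimizer step should show which stage pushes the usage up.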
With the command free -g, I can get the total, used, and free RAM in Linux. But I want to understand which tasks or processes are using the most memory, so that I can free some of it up.
             total       used       free     shared    buffers     cached
Mem:           125        121          4          0          6         94
-/+ buffers/cache:         20        105
Swap:           31          0         31
Run the top command,
then press Shift+F to choose the sort field
(press a for PID information). Pressing Shift+M sorts by memory usage directly.
Also check:
ps -eo pmem,vsz,pid
See man ps for details on pmem, vsz, and pid.
Hope it helps.
Thanks for the question!
You can use the command below to list running processes sorted by memory use, highest first (the -n makes the sort numeric rather than lexical):
ps -eo pmem,pcpu,rss,vsize,args | sort -k 1 -nr | less
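The same ranking can be sketched in Python, which makes the numeric comparison explicit. The function name top_by_rss is hypothetical, and it expects the text printed by ps -eo pid,rss,args (RSS in KiB).

```python
def top_by_rss(ps_output, limit=5):
    """Return the `limit` largest processes as (rss_kib, pid, command) tuples.

    `ps_output` is the text printed by `ps -eo pid,rss,args`.
    """
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header and any line that does not start with two integers.
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[1]), int(parts[0]), parts[2]))
    rows.sort(reverse=True)  # numeric sort, largest RSS first
    return rows[:limit]
```

To feed it live data, capture the output with subprocess.run(["ps", "-eo", "pid,rss,args"], capture_output=True, text=True).stdout and pass that string in.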
lsblk provides output in this format:
NAME                       MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sr0                         11:0    1  1024M  0 rom
sda                          8:0    0   300G  0 disk
sda1                         8:1    0   500M  0 part /boot
sda2                         8:2    0 299.5G  0 part
vg_data1-lv_root (dm-0)    253:0    0    50G  0 lvm  /
vg_data2-lv_swap (dm-1)    253:1    0   7.7G  0 lvm  [SWAP]
vg_data3-LogVol04 (dm-2)   253:2    0  46.5G  0 lvm
vg_data4-LogVol03 (dm-3)   253:3    0  97.7G  0 lvm  /map1
vg_data5-LogVol02 (dm-4)   253:4    0  97.7G  0 lvm  /map2
sdb                          8:16   0    50G  0 disk
For a mounted volume, say /map1, how do I directly get the physical volume associated with it? Is there a direct command to fetch this information?
There is no direct command to show that information for a mount. You can run
lvdisplay -m
which will show which physical volumes are currently being used by each logical volume.
Remember, though, that there is no such thing as a direct association between a logical volume and a physical volume. Logical volumes are associated with volume groups. Volume groups have a pool of physical volumes over which they can distribute any logical volume. If you always want to know that a given LV is on a given PV, you have to restrict the VG to only having that one PV, which rather misses the point. You can use pvmove to push extents off a PV (sometimes useful for maintenance), but you can't stop new extents being created on it when logical volumes are extended or created.
As to why there is no such potentially useful command...
LVM is not ZFS. ZFS is a complete storage and filesystem management system, managing both storage (at several levels of abstraction) and the mounting of filesystems. LVM, in contrast, is just one layer of the Linux storage stack: it works at the block-device level, below the filesystem. It provides a layer of abstraction on top of physical storage devices and makes no assumption about how the logical volumes are used.
Leaving the grep/awk/cut/whatever to you, this will show which PVs each LV actually uses:
lvs -o +devices
You'll get a separate line for each PV used by a given LV, so if an LV has extents on three PVs you will see three lines for that LV. The PV device node path is followed, in parentheses, by the starting extent of the data on that PV.
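For the grep/awk part, here is one possible sketch in Python instead. It assumes the output comes from lvs --noheadings -o vg_name,lv_name,devices, and the function name pvs_by_lv is made up for this example.

```python
def pvs_by_lv(lvs_output):
    """Map "vg/lv" to the sorted list of PV device paths it has extents on.

    `lvs_output` is the text printed by
    `lvs --noheadings -o vg_name,lv_name,devices`.
    """
    mapping = {}
    for line in lvs_output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # blank line, or an LV with no devices reported
        vg, lv, devices = fields
        pvs = mapping.setdefault(f"{vg}/{lv}", set())
        for dev in devices.split(","):
            # "/dev/sda2(0)" -> "/dev/sda2": drop the starting-extent suffix
            pvs.add(dev.split("(")[0])
    return {name: sorted(pvs) for name, pvs in mapping.items()}
```

Striped LVs list several devices on one line separated by commas, which is why each devices field is split again before the extent suffix is removed.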
I need to emphasize that there is no direct relation between a mount point (logical volume) and a physical volume in LVM. This is one of its design goals.
You can, however, traverse the associations between the logical volume, the volume group, and the physical volumes assigned to that group. This only tells you that the data is stored somewhere on those physical volumes, not where exactly.
I couldn't find a command that produces this output directly, but you can cobble something together using mount, lvdisplay, vgdisplay, and awk or sed:
mp=/mnt; vgdisplay -v "$(lvdisplay "$(mount | awk -v mp="$mp" '$3==mp{print $1}')" | awk '/VG Name/{print $3}')"
I'm using the shell variable mp to pass the mount point to the command. It is assigned in a separate statement because a prefix assignment (mp=/mnt vgdisplay ...) is not visible inside the command substitutions. (You need to execute the command as root or using sudo.)
For my test-scenario it outputs:
...
--- Volume group ---
VG Name vg1
System ID
Format lvm2
Metadata Areas 2
Metadata Sequence No 2
VG Access read/write
VG Status resizable
...
VG Size 992.00 MiB
PE Size 4.00 MiB
Total PE 248
Alloc PE / Size 125 / 500.00 MiB
Free PE / Size 123 / 492.00 MiB
VG UUID VfOdHF-UR1K-91Wk-DP4h-zl3A-4UUk-iB90N7
--- Logical volume ---
LV Path /dev/vg1/testlv
LV Name testlv
VG Name vg1
LV UUID P0rgsf-qPcw-diji-YUxx-HvZV-LOe0-Iq0TQz
...
Block device 252:0
--- Physical volumes ---
PV Name /dev/loop0
PV UUID Qwijfr-pxt3-qcQW-jl8q-Q6Uj-em1f-AVXd1L
PV Status allocatable
Total PE / Free PE 124 / 0
PV Name /dev/loop1
PV UUID sWFfXp-lpHv-eoUI-KZhj-gC06-jfwE-pe0oU2
PV Status allocatable
Total PE / Free PE 124 / 123
If you only want to display the physical volumes you might pipe the results of the above command to sed:
above command | sed -n '/--- Physical volumes ---/,$p'
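# Find the device backing /map1; if it is a device-mapper (LVM) device, list its PVs, otherwise print the device itself
dev=$(df -P /map1 | tail -n 1 | awk '{print $1}')
echo "$dev" | grep -q '^/dev/mapper' && lvdisplay -m "$dev" 2>/dev/null | awk '/Physical volume/{print $3}' || echo "$dev"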
As it says in the description I have installed cudaHashcat-1.33 on an AWS g2.2xlarge instance.
I used the .run file to install the CUDA Toolkit and then ran the deviceQuery test, as explained in the official documentation (http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html#running-binaries).
Then I installed cudaHashcat-1.33, following these instructions:
sudo apt-get install p7zip-full
wget http://hashcat.net/files/cudaHashcat-1.33.7z
7za x cudaHashcat-1.33.7z
cd cudaHashcat-1.33
Then I tried to run ~/cudaHashcat-1.33/cudaExample0.sh and got this output:
cudaHashcat v1.33 starting...
Device #1: GRID K520, 4095MB, 797Mhz, 8MCU
Device #1: WARNING! Kernel exec timeout is not disabled, it might cause you errors of code 702
Hashes: 6494 hashes; 6494 unique digests, 1 unique salts
Bitmaps: 16 bits, 65536 entries, 0x0000ffff mask, 262144 bytes
Applicable Optimizers:
* Zero-Byte
* Precompute-Init
* Precompute-Merkle-Demgard
* Meet-In-The-Middle
* Early-Skip
* Not-Salted
* Not-Iterated
* Single-Salt
* Scalar-Mode
* Raw-Hash
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
ERROR: cuModuleLoad() 209
A second example is this one, where I actually use the file I want to attack.
ubuntu@ip-172-31-58-154:~$ ~/maskprocessor/src/mp64.bin ?l?l?l?l?l?l?l?l | ~/cudaHashcat-1.33/cudaHashcat64.bin -m 2500 xxx.hccap
cudaHashcat v1.33 starting...
Device #1: GRID K520, 4095MB, 797Mhz, 8MCU
Device #1: WARNING! Kernel exec timeout is not disabled, it might cause you errors of code 702
Hashes: 1 hashes; 1 unique digests, 1 unique salts
Bitmaps: 8 bits, 256 entries, 0x000000ff mask, 1024 bytes
Rules: 1
Applicable Optimizers:
* Zero-Byte
* Single-Hash
* Single-Salt
Watchdog: Temperature abort trigger set to 90c
Watchdog: Temperature retain trigger set to 80c
ERROR: cuModuleLoad() 209
nvidia-smi
[root@ip-xxx-xxx-xxx-xxx cudaHashcat-1.33]$ nvidia-smi
Wed Mar 4 19:07:35 2015
+------------------------------------------------------+
| NVIDIA-SMI 340.32     Driver Version: 340.32         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GRID K520           On   | 0000:00:03.0     Off |                  N/A |
| N/A   43C    P8    17W / 125W |      10MiB / 4095MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|  No running compute processes found                                         |
+-----------------------------------------------------------------------------+
If someone knows what is going on, I'd appreciate any help.
So after a lot of searching through forums I finally found an answer. @Robert Crovella, thanks for pointing out that the driver was the wrong one. It turns out that finding the Linux drivers for NVIDIA is not that easy, but I came across this page, which then led me to NVIDIA's Linux drivers. Just download the driver required for your architecture (if you use wget, click on 'Download' first, since there is an acceptance page). After that, run 'chmod +x nvidia-driver.run' and then install it with 'sudo ./nvidia-driver.run'.
Hope that my experience helps someone else.
I'm running Ubuntu Trusty 14.04 on a new machine with 8 GB of RAM, and it seems to lock up periodically with nothing in the syslog file. I've installed Nagios and have been watching the graphs, and memory usage climbs from 7% to 72% in a span of just 10 minutes. Only node processes are running on the server. In top, every process shows normal memory consumption, and even after I stop the node processes, memory utilization stays the same.
free agrees, claiming I'm using more than 5.7G of memory:
free -h
             total       used       free     shared    buffers     cached
Mem:          7.8G       6.5G       1.3G       2.2M       233M       612M
-/+ buffers/cache:       5.7G       2.1G
Swap:         2.0G         0B       2.0G
Summing the resident memory of all processes, however, gives a much smaller figure (in KiB):
# ps -e -orss=,args= | sort -b -k1,1n | awk '{total = total + $1}END{print total}'
503612
If the processes only total 500 MiB, where's the rest of the memory going?
I found a solution to this, so I'm posting an update:
echo 2 > /proc/sys/vm/drop_caches
This resolved my issue, so I have added it to cron to run every 5 minutes on each of my Ubuntu servers.
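Writing 2 to drop_caches asks the kernel to free reclaimable slab objects such as dentries and inodes, which would explain why the memory never showed up under any process. As a rough sketch in Python, the kernel-side caches can be read from /proc/meminfo before and after dropping them. The field names are the ones /proc/meminfo uses; the function names are made up for this example.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of {field: value}, values in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            info[name.strip()] = int(rest.split()[0])
    return info


def kernel_cache_summary(info):
    """Return (page_cache_kb, reclaimable_slab_kb, unreclaimable_slab_kb)."""
    page_cache = info.get("Buffers", 0) + info.get("Cached", 0)
    return page_cache, info.get("SReclaimable", 0), info.get("SUnreclaim", 0)
```

On a live system, pass open("/proc/meminfo").read() to parse_meminfo. A large SReclaimable value that shrinks after the echo would be consistent with the behaviour described above.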