Sorted output of free space of partition - linux

Sorted output of free space for each partition.
In other words, a table with at least two columns: partition name and free (unused) storage space, sorted by free space.
Motivation: I want to back up files and use a hard drive that still has as much space as possible.
What I tried:
df -h | sort -h -r
I also found this, which sounds great:
diskutil info disk1s4 | awk '/Free Space:.* GB/ {print $3,$4}'
but it does not work on my Manjaro Linux system: "Command not found".

First of all, I guess you meant "partition" rather than "hard drive".
df -l --output=source,avail|sed '/\/dev\//!d'|sort -nr -k2
Notes:
-l lists only local filesystems, so network shares are omitted.
The sed part removes the header line and any filesystem not under /dev/; adjust this if you want something else.
sort -nr -k2 sorts numerically on the second column (available space, in 1K blocks by default), largest first.
Mount points are not listed; if you need them, add target to the --output list.
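As a variation on the command above, here is a minimal sketch that keeps human-readable sizes and also shows the mount point. It assumes GNU df and GNU sort (for --output and the h sort key); the function name sort_by_avail is made up for illustration.

```shell
# Hypothetical helper: reads df-style rows on stdin, keeps only /dev/ devices,
# and sorts on the second column (available space), largest first.
sort_by_avail() {
    sed '/^\/dev\//!d' | sort -k2,2hr
}

# Example call (GNU df):
#   df -l -h --output=source,avail,target | sort_by_avail
```

The h key makes sort compare suffixed sizes such as 900M, 20G and 1.5T by magnitude, which a plain numeric sort would get wrong.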

Related

Calculating the total disk usage for a set of subdirectories across a directory tree?

Here is the scenario.
Imagine we have this style of directory structure -
/snapshots/201801/users/tom/
/snapshots/201802/users/harry/
/snapshots/201803/users/chris/
and so on.
What I'm trying to get in my output is the total disk usage of the name directories but factoring in all of the "20180x" folders into the total. At the end this output should be sorted by highest disk usage to lowest.
As a very basic example if I was to do -
du -h --max-depth=1 /snapshots/2018*/users/ | sort -h
The output would show the total directory disk usage for each "201801", "201802", "201803" directory in the output as it goes through them.
What I actually want is the output to instead show the total of the disk usage across "201801", "201802", "201803" for a given user and then sort it by size.
So, for example, the resulting output would need to show disk space totals calculated like this: 20GB = the total size of "/tom" across all of the available "201801", "201802", "201803" folders in the directory tree. Imagine that "201801", "201802", "201803" each represent a particular version of the user's folder, and we want to calculate totals for each user across all of them. We can assume the "201801", "201802", "201803" style folders will always be found at the same depth in the tree.
20GB /tom
10GB /harry
900MB /chris
Hopefully this makes sense. Please let me know if you have a better title I could use for this question as I'm not sure of the best terminology to use to explain what I'm trying to do in just a sentence.
Thanks!
Using GNU utilities du and sort, you can write a script something along these lines:
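#!/bin/bash
users=(tom harry chris)
for user in "${users[@]}"; do
    # -c appends a grand-total line across all matching snapshot directories
    usage=$(du -hc /snapshots/*/users/"$user")
    # keep only the last line (the total)
    usage=${usage##*$'\n'}
    # print the size (the text before the first blank) and the user name
    printf "%s\t%s\n" "${usage%%[[:blank:]]*}" "$user"
done | sort -hr   # largest total first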
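If the user names are not known in advance, one possible variation is to let du report every user directory in kilobytes and sum the sizes per name with awk. This is only a sketch: it assumes the directory layout from the question and directory names without tabs or newlines, and sum_per_user is a name made up here.

```shell
# Hypothetical helper: reads "size<TAB>path" lines (as printed by du -sk),
# sums the sizes per final path component, and prints totals in KiB, largest first.
sum_per_user() {
    awk -F'\t' '
        { n = split($2, parts, "/"); total[parts[n]] += $1 }
        END { for (user in total) printf "%.0f\t%s\n", total[user], user }
    ' | sort -k1,1nr
}

# Example call:
#   du -sk /snapshots/*/users/* | sum_per_user
```

Summing plain kilobyte counts avoids having to add up human-readable values such as 900M and 1.5G.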

How to get the capacity and free space of a directory which has another mount in CentOS

For example, take the directory "/app/source".
There is a 100GB filesystem mounted on "/".
So when I run "df /app/source", it reports a capacity of 100GB.
Then there is a directory "/app/source/video" with another 100GB filesystem mounted on it.
Is there an easy way to get the real capacity (200GB) of "/app/source"?
/app/source does not have a capacity of 200G, so you cannot expect to see that. Its "real" capacity is 100G, because the underlying disk capacity is 100G. If it had a capacity of 200G, you would expect to be able to store 200G of data in /app/source, but you cannot! You can store 100G in /app/source and 100G in /app/source/video.
If what you really want is to merge the capacity of both partitions, you could use LVM.
Merging only the reported numbers, which you could do with a simple script or alias (see below), would give you misleading information, and you might then try to add files to a full partition.
If you still need the combined total in the end, something like this may help:
df -h --total /app/source /app/source/video | awk '/^total/ {print $2}'

How to get disk name that contain specific partition

If I know that a partition is, for example, /dev/sda1, how can I get the name of the disk that contains it (/dev/sda in this case)?
The output should be only the path to the disk (like '/dev/sda').
EDIT: It shouldn't use string manipulation.
You can use the shell's built-in string chopping:
$ d=/dev/sda1
$ echo ${d%%[0-9]*}
/dev/sda
$ d=/dev/sda11212
$ echo ${d%%[0-9]*}
/dev/sda
This works only for some disk names. The pattern chops everything from the first digit onward, so a name that contains digits before the partition number is cut too short (for example, /dev/mmcblk0p1 becomes /dev/mmcblk).
What is the exact specification to separate a disk name from a partition name?
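You can use sed to get the disk. For names like /dev/sda1, where the partition name is the disk name followed by a number, it's easy:
echo "/dev/sda1" | sed 's/[0-9]*//g'
which produces the output /dev/sda. Note that this strips every digit, so it does not work for names such as /dev/mmcblk0p1.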
Another command you can use to obtain disk information is lsblk. Just typing it without args prints out all info pertaining to your disks and partitions.
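Since the question rules out string manipulation, another option is to ask the kernel. On systems with util-linux, lsblk -no PKNAME /dev/sda1 prints the parent kernel device name (sda). The sketch below does the equivalent through sysfs, assuming the usual Linux layout in which /sys/class/block/<partition> is a symlink to a directory inside its disk's directory. The function name parent_disk and its optional second argument (an alternate sysfs root, handy for trying it on a fake tree) are made up for this illustration.

```shell
# Hypothetical helper: print the disk that contains the given partition.
# $1 = partition device (e.g. /dev/sda1), $2 = sysfs root (assumed default: /sys)
parent_disk() {
    pd_part=$(basename "$1")
    pd_sys=${2:-/sys}
    # Resolve the symlink to the partition's real directory in the device tree
    pd_dir=$(readlink -f "$pd_sys/class/block/$pd_part") || return 1
    # Only partitions carry a "partition" attribute file; whole disks do not
    [ -e "$pd_dir/partition" ] || return 1
    # The directory above the partition's directory belongs to the whole disk
    echo "/dev/$(basename "$(dirname "$pd_dir")")"
}

# Example call:
#   parent_disk /dev/sda1
```

Because the answer comes from the device tree rather than from the name, this also handles names such as nvme0n1p2 or mmcblk0p1.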

efficient sort | uniq for the case of a large number of duplicates

Summary : is there a way to get the unique lines from a file and the number of occurrences more efficiently than using a sort | uniq -c | sort -n?
Details: I often pipe to sort | uniq -c | sort -n when doing log analysis to get a general trending of which log entries show up the most / least etc. This works most of the time - except when I'm dealing with a very large log file that ends up with a very large number of duplicates (in which case sort | uniq -c ends up taking a long time).
Example: The specific case I'm facing right now is for getting a trend from an 'un-parametrized' mysql bin log to find out which queries are run the most. For a file of a million entries which I pass through a grep/sed combination to remove parameters - resulting in about 150 unique lines - I spend about 3 seconds grepping & sedding, and about 15s sorting/uniq'ing.
Currently, I've settled on a simple C++ program that maintains a map of <line, count>, which does the job in less than a second, but I was wondering whether a utility for this already exists.
I'm not sure what the performance difference will be, but you can replace the sort | uniq -c with a simple awk script. Since you have many duplicates and it hashes instead of sorting, I'd imagine it's faster:
awk '{c[$0]++}END{for(l in c){print c[l], l}}' input.txt | sort -n

Best way to extract number of partitions?

Assuming that there are only primary partitions on a disk, what is the best way to find the current number of partitions?
Is there any better way than:
fdisk -l > temp
#Following returns first column of the last line of temp e.g. /dev/sda4
lastPart=$(tail -n 1 temp | awk '{print $1}')
totalPartitions=$(echo ${lastPart:8})
The $totalPartitions variable sometimes ends up empty. That's why I was wondering if there is a more reliable way to find the current number of partitions.
What about:
totalPartitions=$(grep -c 'sda[0-9]' /proc/partitions)
?
(Where sda is the name of the disk you're interested in, replacing it as appropriate)
I found this question while I was writing a script to safely wipe test and re-provision storage, which is sometimes a memory card, so mmcblk0p1 is often the format of its partitions.
Here's my answer:
diskICareAbout="sda"
totalPartitions="$( ls /sys/block/${diskICareAbout}/*/partition | wc -l )"
/proc/partitions is archaic and flat. The sys filesystem can communicate the hierarchical nature of partitions well enough that grep is not needed.
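The same sysfs lookup can be written without parsing ls output, which also gives a clean 0 for a disk with no partitions. A minimal sketch; count_partitions and its optional second argument (an alternate sysfs root, assumed to default to /sys) are hypothetical names introduced here.

```shell
# Hypothetical helper: count the partitions of a disk via sysfs.
# $1 = disk name without /dev/ (e.g. sda or mmcblk0), $2 = sysfs root (assumed default: /sys)
count_partitions() {
    cp_disk=$1
    cp_sys=${2:-/sys}
    cp_count=0
    # Each partition directory under the disk contains a "partition" attribute file
    for cp_file in "$cp_sys/block/$cp_disk"/*/partition; do
        # If nothing matched, the pattern stays literal and this test fails
        if [ -e "$cp_file" ]; then
            cp_count=$((cp_count + 1))
        fi
    done
    echo "$cp_count"
}

# Example call:
#   totalPartitions=$(count_partitions mmcblk0)
```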
You can use partx for this.
partx -g /dev/<disk> | wc -l
will return the total number of partitions (-g omits the header line). To get the last partition on a disk, use
partx -rgo NR -n -1:-1 /dev/<disk>
which may be useful if there are gaps in the partition numbers. -r omits aligning spaces, and -o specifies the comma-separated columns to include. -n specifies a range of partitions start:end, where -1 is the last partition.

Resources