I'm writing a little bash script that shows the total size in MB, the number of files, the folder number, and the folder name.
I have almost everything except the size in MB:
du -a -h | cut -d/ -f2 | sort | uniq -c
It shows something like this:
4 01 folder 01
6 02 folder 02
11 03 folder 03
13 04 folder 04
16 05 folder 05
.....
15 13 folder 13
1 5.7G .
As you can see, the columns are: number of files, folder number, and folder name.
I want this:
300M 4 01 folder 01
435M 6 02 folder 02
690M 11 03 folder 03
780M 13 04 folder 04
1.6G 16 05 folder 05
.....
15 13 folder 13
1 5.7G .
Thank you in advance.
P.S. Is there some way to show a name over each column, like this?
M F # name
300M 4 01 folder 01
435M 6 02 folder 02
690M 11 03 folder 03
780M 13 04 folder 04
1.6G 16 05 folder 05
.....
15 13 folder 13
1 5.7G .
How about this?
echo -e "Size\tFiles\tDirectory"; paste <(du -sh ./*/ | sort -k2 | cut -f1) <(find ./*/ | cut -d/ -f2 | uniq -c | sort -k2 | awk '{print ($1-1)"\t"$2}') | sort -nk2
Sample output:
Size Files Directory
172M 36 callrecords
17M 747 manual
83M 2251 input
7.5G 16867 output
Explanation:
Add the header:
echo -e "Size\tFiles\tDirectory";
The <(COMMAND) construct allows the output of a command to be used as if it were a file. paste takes two files and prints them side by side, so here we are pasting together the outputs of two commands. The first is this:
<(du -sh ./*/ | sort -k2 | cut -f1)
This finds the size of each subfolder of the current folder, summarising everything inside it. The output is then sorted by folder name, and the first column (the size) is extracted. This gives us a list of the sizes of the subfolders of the current folder, sorted by name.
The second command is this:
<(find ./*/ | cut -d/ -f2 | uniq -c | sort -k2 | awk '{print ($1-1)"\t"$2}')
This is similar to your original command: it lists everything below the current directory, truncates the paths to the first sublevel, and counts the resulting lines to give the sub-folders of the current folder along with the number of entries within each. The list is then sorted by folder name, and the awk command formats the result, subtracting 1 from each count because the folder itself is included. We can then paste the two lists together to get the (almost) final output.
Finally, we use sort -nk2 on the output of the paste command to sort by number on the 2nd field - ie the number of files.
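If you would like the columns lined up neatly, one option (a sketch, assuming the common column utility from util-linux is available) is to group the header together with the body and pipe everything through column -t:
{ echo -e "Size\tFiles\tDirectory"; paste <(du -sh ./*/ | sort -k2 | cut -f1) <(find ./*/ | cut -d/ -f2 | uniq -c | sort -k2 | awk '{print ($1-1)"\t"$2}') | sort -nk2; } | column -t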
I would like a Linux command that keeps only the 5 most recent files whose names start with REF, deletes the older files that also start with REF, and does not touch any other files.
For example, in my folder I have:
-rw-r--r-- 1 0 Jan  1 2022 File_0
-rw-r--r-- 1 0 Jan  1 2022 REF_1
-rw-r--r-- 1 0 Feb  1 2022 REF_2
-rw-r--r-- 1 0 Mar  1 2022 REF_3
-rw-r--r-- 1 0 Apr  1 2022 REF_4
-rw-r--r-- 1 0 May  1 2022 REF_5
-rw-r--r-- 1 0 Jun  1 2022 REF_6
-rw-r--r-- 1 0 Jul  1 2022 file_7
-rw-r--r-- 1 0 Aug  1 2022 file_8
-rw-r--r-- 1 0 Sep  1 2022 REF_9
The command should remove only:
-rw-r--r-- 1 0 Jan  1 2022 REF_1
-rw-r--r-- 1 0 Feb  1 2022 REF_2
... and should keep the other files. I tried ls -t REF* | head -n+4 | xargs rm REF*, but this command deletes all files that start with REF! (The trailing REF* is passed to rm as an extra set of arguments, so rm removes every REF file no matter what xargs feeds it.)
What command can I use?
Using zsh (available on many Linux distributions and also on AIX from IBM's AIX Toolbox for Open Source Software), you could simply:
rm REF*(om[6,-1])
This uses zsh's powerful globbing (filename generation) abilities to:
gather the list of files starting with REF
sort the files by their modification time (newest first) with om
keep the five newest files by selecting the 6th and remaining files with [6,-1]
pass that list of files to rm
Test it first with a simple print -l REF*(om[6,-1]) to see which files would be collected.
See Glob Qualifiers for more about zsh's glob qualifiers.
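With the sample listing above, the test prints exactly the two files that should be deleted:
% print -l REF*(om[6,-1])
REF_2
REF_1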
Logrotate was mentioned, so why not use it? Because it cannot handle this naming pattern: it rejects the unquoted glob (the 0x2a in the error below is the * character).
$ cat log.conf
REF* {
rotate 5
}
$ logrotate log.conf
error: log.conf:1 keyword 'REF' not properly separated, found 0x2a
Here is a complete script that handles filenames safely (spaces are fine, though embedded newlines would still confuse the sort step).
find . -name 'REF_*' -print0 | \
xargs -0 stat -c "%Y %n" | \
sort -n | \
head -n -5 | \
sed -e 's/^[0-9]* //' | \
tr '\12' '\0' | \
xargs -0 rm
First, use find with NUL terminators to fetch the list of files.
Then use xargs to run stat, prepending the Unix timestamp to each name.
Use sort to order them oldest first.
Use head -n -5 to select everything except the last 5 lines, i.e. all but the 5 newest files.
Use sed to strip the temporary Unix timestamp.
Use tr to convert the newlines back to NULs.
Finally, xargs to delete the unwanted files.
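If you have GNU tools throughout, a fully NUL-safe variant is possible, since GNU find can print the timestamp itself and GNU sort, head, and sed all accept NUL-delimited records (a sketch, assuming recent GNU versions of these utilities, restricted to the current directory with -maxdepth 1):
find . -maxdepth 1 -name 'REF_*' -printf '%T@ %p\0' | \
sort -z -n | \
head -z -n -5 | \
sed -z 's/^[^ ]* //' | \
xargs -0 -r rm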
I want to combine directories and sub-directories and sum up the first column, as follows:
original output:
8 ./.g/apps/panel/mon/lt/prefs
12 ./.g/apps/panel/mon/lt
40 ./.g/apps/panel/mon
44 ./.g/apps/panel
88 ./.g/apps
112 ./.g
4 ./.g
4 ./.pof
20 ./.local/share/applications
4 ./.local/share/m/packages
8 ./.local/share/m
4 ./.local/share/Trash/info
4 ./.local/share/Trash/files
12 ./.local/share/Trash
44 ./.local/share
new output:
308 ./.g
4 ./.pof
96 ./.local/share
The original command is du -k, and I have been trying with awk and cut but failing.
Edit: I got up to here:
du -k | awk '{print $1}' | cut -d "/" -f 1
Now I'm struggling to merge similar lines and sum up the first column.
P.S. This is just an example output.
Thank you.
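For the merging and summing itself, a small awk script gets you most of the way. A sketch based on your example (it assumes no spaces in directory names, groups by the first path component below ., so the last group comes out as ./.local rather than ./.local/share, and it sums every line du emits under each group, which matches the totals in your expected output):
du -k | awk '{ split($2, p, "/"); sum[p[1] "/" p[2]] += $1 } END { for (d in sum) print sum[d], d }' | sort -k2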
Use du -d 1 to list the cumulative size of each directory one level below the current one:
du -h -d 1
The -h flag makes the counts human-readable.
You can also try this command:
du -sh *
Try
du -sk .g .pof .local/share
The -s switch means summarize: du will traverse all the files, all the way down the folders inside, and report just the grand total. (The -k switch prints the size in kilobytes; thanks Romeo Ninov.)
You have to manually specify each folder you want to know the grand total of.
If you type, for example
du -sk .
it will output just a single number, accounting for the current folder (and below) file sizes.
If you type
du -sk *
the result will depend on what your shell expands * to (usually all the files and folders not starting with a dot (.) in the current folder).
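Since the directories in your example (.g, .pof, and .local) all start with a dot, a bare * will not match them in bash. A minimal sketch, assuming bash, using the dotglob option to include them:
shopt -s dotglob
du -sk */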
I've scoured various message boards to understand why I can't use a variable as input to a command in certain scenarios. Is it a stdin issue or limitation? And why does using echo or a here string fix the problem?
For example,
~$ myvar=$(ls -l)
~$ grep Jan "$myvar"
grep: total 9
-rwxr-xr-x 1 jvp jvp 561 Feb 2 23:59 directoryscript.sh
-rw-rw-rw- 1 jvp jvp 0 Jan 15 10:30 example1
drwxrwxrwx 2 jvp jvp 0 Jan 19 21:54 linuxtutorialwork
-rw-rw-rw- 1 jvp jvp 0 Jan 15 13:08 littlefile
-rw-rw-rw- 1 jvp jvp 0 Jan 19 21:54 man
drwxrwxrwx 2 jvp jvp 0 Feb 2 20:33 projectbackups
-rwxr-xr-x 1 jvp jvp 614 Feb 2 20:41 projectbackup.sh
drwxrwxrwx 2 jvp jvp 0 Feb 2 20:32 projects
-rw-rw-rw- 1 jvp jvp 0 Jan 19 21:54 test1
-rw-rw-rw- 1 jvp jvp 0 Jan 19 21:54 test2
-rw-rw-rw- 1 jvp jvp 0 Jan 19 21:54 test3: File name too long
As you can see I get the error... 'File name too long'
Now, I am able to get this to work by using either:
echo "$myvar" | grep Jan
grep Jan <<< "$myvar"
However, I'm really after a better understanding of why things are this way. Perhaps I'm missing something basic about command substitution, or about what counts as acceptable standard input.
The grep utility can operate:
on files whose names are provided on the command line, after the regular expression used for matching, or
on a stream supplied on its standard input.
You are doing this:
myvar=$(ls -l)
grep Jan "$myvar"
This provides the content of variable myvar as an argument to the grep command, and since it is not a file name, it does not work.
There are many ways to achieve your goal. Here are a few examples.
Use the content of the variable as a stream connected to the standard input of grep, with one of the following methods (all producing the same output):
grep Jan <<<"$myvar"
echo "$myvar" | grep Jan
grep Jan < <(echo "$myvar")
Avoid the variable altogether, and send the output of ls directly to grep:
ls -l | grep Jan
grep Jan < <(ls -l)
Provide grep with an expression that actually is a file name:
grep Jan <(ls -l)
The <(ls -l) expression is syntax that causes a FIFO (first-in, first-out) special file to be created; the ls -l command sends its output to that FIFO. The expression is converted by Bash to an actual file name that can be used for reading.
To clear up any confusion, the two statements below (already shown above) look similar, but are fundamentally very different:
grep Jan <(ls -l)
grep Jan < <(ls -l)
In the first one, grep receives a file name as an argument and reads that file. In the second case, the additional < (the whitespace between the two < characters is important) creates a redirection that reads the FIFO and feeds its contents to the standard input of grep. There is a FIFO in both cases, but it is presented to the command in totally different ways.
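You can see the file name Bash substitutes by echoing it (on Linux, Bash typically exposes the pipe as an entry under /dev/fd; the exact number may vary):
$ echo <(ls -l)
/dev/fd/63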
I think there's a fundamental misunderstanding of how Unix tools/Bash operates here.
It appears what you're trying to do here is store the output of ls in a variable (which is something you shouldn't do for other reasons) and trying to grep across the string stored inside that variable using grep.
This is not how grep works. If you look at the man page for grep, it says:
SYNOPSIS
grep [OPTIONS] PATTERN [FILE...]
grep [OPTIONS] [-e PATTERN | -f FILE] [FILE...]
DESCRIPTION
grep searches the named input FILEs for lines containing a match to
the given PATTERN. If no files are specified, or if the file “-” is
given, grep searches standard input. By default, grep prints the
matching lines.
Note that it specifically says "grep searches the named input FILEs".
Then it goes on to say "If no files are specified [...] grep searches standard input".
In other words, by definition grep does not search strings given on the command line. It searches files (or its standard input). Therefore you cannot hand grep the text to search as a bash variable expansion.
When you type
grep Jan "$myvar"
Based on the syntax, grep thinks "Jan" is the PATTERN and the entire string in "$myvar" is a FILEname. Hence the error File name too long.
When you write
echo "$myvar" | grep Jan
What you're now doing is making bash write the contents of "$myvar" to standard output. The | (pipe operator) in bash connects the stdout (standard output) of the echo command to the stdin (standard input) of the grep command. As noted above, when you omit the FILE parameter, grep searches its stdin by default, which is why this works.
grep takes files as command-line parameters, not direct strings. You do indeed need the echo (or an equivalent) to make grep search the contents of your variable.
I have a bunch of log files in a folder. When I cd into the folder and look at the files it looks something like this.
$ ls -lhat
-rw-r--r-- 1 root root 5.3K Sep 10 12:22 some_log_c48b72e8.log
-rw-r--r-- 1 root root 5.1M Sep 10 02:51 some_log_cebb6a28.log
-rw-r--r-- 1 root root 1.1K Aug 25 14:21 some_log_edc96130.log
-rw-r--r-- 1 root root 406K Aug 25 14:18 some_log_595c9c50.log
-rw-r--r-- 1 root root 65K Aug 24 16:00 some_log_36d179b3.log
-rw-r--r-- 1 root root 87K Aug 24 13:48 some_log_b29eb255.log
-rw-r--r-- 1 root root 13M Aug 22 11:55 some_log_eae54d84.log
-rw-r--r-- 1 root root 1.8M Aug 12 12:21 some_log_1aef4137.log
I want to look at the most recent messages in the most recent log file. I can manually copy the name of the most recent log and then run tail on it, and that works.
$ tail -n 100 some_log_c48b72e8.log
This involves manual labor, though, so instead I would like to use some bash-fu to do it.
I currently found this way to do it:
filename="$(ls -lat | sed -n 2p | tail -c 30)"; tail -n 100 $filename
It works, but I am bummed out that I need to save data into a variable to do it. Is it possible to do this in bash without saving intermediate results into a variable?
tail -n 100 "$(ls -at | head -n 1)"
You do not need ls to actually print timestamps; you just need it to sort by them (ls -t). I added the -a option because it was in your original code, but note that it is not necessary unless your log files are "dot files", i.e. start with a . (which they shouldn't).
Using ls this way saves you from parsing the output with sed and tail -c. (And you should not try to parse the output of ls.) Just pick the first file in the list (head -n 1), which is the newest. Putting the command substitution in quotation marks saves you from the more common problems like spaces in the filename. (If you have newlines or similar in your filenames, fix your filenames. :-D )
Instead of saving into a variable, you can use command substitution in-place.
A truly ls-free solution:
tail -n 100 < <(
    for f in *; do
        [[ $f -nt $newest ]] && newest=$f
    done
    cat "$newest"
)
There's no need to initialize newest, since any file will be newer than the null file named by the empty string.
It's a bit verbose, but it's guaranteed to work with any legal file name. Save it to a shell function for easier use:
tail_latest () {
    local dir=${1:-.}
    local size=${2:-100}
    local newest f
    for f in "$dir"/*; do
        [[ $f -nt $newest ]] && newest=$f
    done
    tail -n "$size" "$newest"
}
Some examples:
# Default of 100 lines from newest file in the current directory
tail_latest
# 200 lines from the newest file in another directory
tail_latest /some/log/dir 200
A plug for zsh: glob qualifiers let you sort the results of a glob directly, making it much easier to get the newest file.
tail -n 100 *(om[1,1])
om sorts the results by modification time (newest first). [1,1] limits the range of files matched to the first. (I think Y1 should do the same, but it kept giving me an "unknown file attribute" error.)
Without parsing ls, you'd use stat
tail -n 100 "$(stat -c "%Y %n" * | sort -nk1,1 | tail -1 | cut -d" " -f 2-)"
Will break if your filenames contain newlines.
Version 2: newlines are OK:
tail -n 100 "$(
    stat --printf "%Y:%n\0" * |
    sort -z -t: -k1,1nr |
    { IFS=: read -d '' time filename; echo "$filename"; }
)"
You can also try this:
ls -1t | head -n 1 | xargs tail -c 50
Explanation:
ls -1t -- list the files one per line, newest first.
head -n 1 -- take the first, i.e. newest, file.
xargs tail -c 50 -- show the last 50 characters of that file.
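Note that plain xargs splits its input on whitespace, so this breaks on filenames containing spaces. With GNU xargs you can split on newlines instead (a sketch):
ls -1t | head -n 1 | xargs -d '\n' tail -c 50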
Let's say we have this output:
126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas
2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas
1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas
4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd
1 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27201]: aaadas
I would like to sort this output by its date and time fields.
Thank you very much.
Martin
For GNU sort: sort -k2M -k3n -k4
-k2M sorts by second column by month (this way "March" comes before "April")
-k3n sorts by third column in numeric mode (so that " 9" comes before "10")
-k4 sorts by the fourth column.
See more details in the manual.
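Applied to the sample above (assuming it is saved in a file called logfile, a name chosen just for illustration), this gives:
$ sort -k2M -k3n -k4 logfile
126 Mar 8 07:45:09 nod1 /sbin/ccccilio[12712]: INFO: sadasdasdas
1 Mar 8 17:20:01 nod1 /usr/sbin/cron[1826]: asdasdas
4 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27199]: aaaasdsd
1 Mar 9 06:24:01 nod1 /USR/SBIN/CRON[27201]: aaadas
2 Mar 9 08:16:22 nod1 /sbin/zzzzo[12712]: sadsdasdas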
A little off-topic, but anyway: this is only useful when working within file trees.
ls -l -r --sort=time
From this you could create a one-liner which, for example, deletes the oldest backup in town (the filename is field 9 of ls -l output):
ls -l -r --sort=time | grep backup | head -n1 | while read line; do oldbackup=$(echo $line | awk '{print $9}'); rm $oldbackup; done
Days need a numeric (not lexical) sort, so it should be sort -s -k 2M -k 3n -k 4,4.
You can use the sort command:
cat $logfile | sort -M -k 2
That means: sort by month (-M), starting from the second column (-k 2).
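Incidentally, the cat is not needed; sort can read the file directly:
sort -M -k 2 "$logfile"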