Count directories and subdirectories - linux

I want to merge directories with their sub-directories and sum up the first column, as follows:
original output:
8 ./.g/apps/panel/mon/lt/prefs
12 ./.g/apps/panel/mon/lt
40 ./.g/apps/panel/mon
44 ./.g/apps/panel
88 ./.g/apps
112 ./.g
4 ./.g
4 ./.pof
20 ./.local/share/applications
4 ./.local/share/m/packages
8 ./.local/share/m
4 ./.local/share/Trash/info
4 ./.local/share/Trash/files
12 ./.local/share/Trash
44 ./.local/share
new output:
308 ./.g
4 ./.pof
96 ./.local/share
The original command is du -k; I've been trying with awk and cut but failing.
Edit: I got this far:
du -k | awk '{print $1}' | cut -d "/" -f 1
Now I'm struggling to merge similar lines and sum up the first column.
P.S. This is just an example of the output.
Thank you.
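For reference, the merging step can be sketched with awk, grouping each line by its top-level path component. Note one caveat: du's per-directory lines are already cumulative, so summing them double-counts real disk usage; this reproduces the arithmetic of the desired output above, not actual sizes.

```shell
# Group each "size path" line from du -k by its top-level component
# ("./name") and sum the first column per group.
du -k | awk '
  { split($2, parts, "/"); sum["./" parts[2]] += $1 }
  END { for (dir in sum) print sum[dir], dir }
'
```

This prints `./.local` rather than `./.local/share`; use a deeper key (e.g. `parts[2] "/" parts[3]`) if you want to group one level further down.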

Use du -d 1 to list the cumulative size of each directory one level below the current one.
du -h -d 1
The -h flag makes the sizes human-readable.

You can try this command:
du -sh *

Try
du -sk .g .pof .local/share
The -s switch means summary: du will walk all the files, all the way down into subfolders, and report just the grand total. (The -k switch prints sizes in kilobytes; thanks Romeo Ninov.)
You have to manually specify each folder you want to know the grand total of.
If you type, for example
du -sk .
it will output just a single number, accounting for the current folder (and below) file sizes.
If you type
du -sk *
the result will depend on what your shell expands * to (usually all the files and folders not starting with a dot (.) in the current folder).
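Since * skips dotfiles in most shells, hidden entries (like the .g and .pof folders in the question) need extra care. One way, assuming bash, is the dotglob option:

```shell
# Enable dotglob so * also matches names beginning with a dot (bash only),
# then summarize every entry in the current directory.
shopt -s dotglob
du -sk -- *
```

The `--` guards against filenames that start with a dash being parsed as options.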

Remove duplicates from INSANE BIG WORDLIST

What is the best way of doing this? It's a 250 GB text file, one word per line.
Input:
123
123
123
456
456
874
875
875
8923
8932
8923
Output wanted:
123
456
874
875
8923
8932
I need to keep one copy of each duplicated line. I DON'T want both copies of a duplicate removed; if there are two of the same line, remove just one, always keeping one unique line.
What I do now:
$ cat final.txt | sort | uniq > finalnoduplicates.txt
I'm running this in a screen session. Is it working? I don't know, because when I check the size of the output file, it's 0:
123user@instance-1:~$ ls -l
total 243898460
-rw-rw-r-- 1 123user 249751990933 Sep 3 13:59 final.txt
-rw-rw-r-- 1 123user 0 Sep 3 14:26 finalnoduplicates.txt
123user@instance-1:~$
But when I check htop cpu value of the screen running this command is at 100%.
Am I doing something wrong?
You can do this using just sort.
$ sort -u final.txt > finalnoduplicates.txt
You can simplify this further and just have sort do all of it:
$ sort -u final.txt -o finalnoduplicates.txt
Finally, since your input file is purely numerical data, you can tell sort, via the -n switch, to compare numerically, which can further improve the overall performance of this task:
$ sort -nu final.txt -o finalnoduplicates.txt
sort's man page
-n, --numeric-sort
compare according to string numerical value
-u, --unique
with -c, check for strict ordering; without -c, output only the
first of an equal run
-o, --output=FILE
write result to FILE instead of standard output
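Two practical notes for an input this large. First, sort cannot emit anything until it has consumed all of its input, so the output file sitting at 0 bytes mid-run is expected, not a sign of failure. Second, GNU sort's -T and -S flags let you point its temporary files at a filesystem with enough free space and give it a larger memory buffer (the /mnt/bigdisk/tmp path below is a placeholder):

```shell
# Placeholder temp dir: sort may need temp space on the order of the input
sort -nu -T /mnt/bigdisk/tmp -S 50% final.txt -o finalnoduplicates.txt
```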
I found out about an awesome tool called Duplicut. The whole point of the project is to remove duplicates from huge wordlists while keeping memory usage bounded.
It is pretty simple to install; here is the GitHub link:
https://github.com/nil0x42/duplicut

List all directories sorted by size in descending order

I have a requirement to sort all directories of the current directory in descending order by size.
I tried following
du -sh * | sort -rg
It lists the folders with their sizes, but it's not sorting correctly: a 100 MB directory should be listed before a 200 KB one.
Any help would be appreciated.
-g is for floats. For human-readable output use human-readable sort:
du -sh * | sort -rh
If you have numfmt utility from coreutils, you can use numeric sort with numfmt formatting:
du -s * | sort -rn | numfmt --to=iec -d$'\t' --field=1
I prefer to just go straight to comparing bytes.
du -b * | sort -nr
du -b reports bytes.
sort -n sorts numerically. Obviously, -r reverses.
My /tmp before I clean it -
104857600 wbxtra_RESIDENT_07202018_075931.wbt
815372 wbxtra_RESIDENT_07192018_075744.wbt
215310 Slack Crashes
148028 wbxtra_RESIDENT_07182018_162525.wbt
144496 wbxtra_RESIDENT_07182018_163507.wbt
141688 wbxtra_RESIDENT_07182018_161957.wbt
56617 Notification Cache
20480 ~DFFA6E4895E749B423.TMP
16384 ~DF543949D7B4DF074A.TMP
13254 AdobeARM.log
3614 PhishMeOutlookReporterLoader.log
3448 msohtmlclip1/01
3448 msohtmlclip1
512 ~DF92FFF2C02995D884.TMP
28 ExchangePerflog_8484fa311d504d0fdcd6c672.dat
0 WPDNSE
0 VPMECTMP
0 VBE
Don't ask the machine to process human data:
du -s * | sort -rg

Why does du give different results?

I use the du command very often. Recently, I used it to find the 5 heaviest files in the root directory of my server, with the following command:
sudo du -ah / | sort -nr | head -n 5
Result was:
1016K /var/cache/apt/archives/fonts-dejavu-core_2.35-1_all.deb
1016K /bin/bash
1008K /usr/src/linux-aws-headers-4.4.0-1052/fs
1008K /usr/src/linux-aws-headers-4.4.0-1049/fs
1004K /var/awslogs/lib/python2.7/site-packages/botocore/data/ec2/2016-09-15/
I then removed -h, and observed an entirely different result:
sudo du -a / | sort -nr | head -n 5
Result:
2551396 /
1189240 /usr
894000 /var
541836 /usr/lib
406276 /var/lib
From the man page of du,
-h, --human-readable
print sizes in human readable format (e.g., 1K 234M 2G)
According to my understanding, including or excluding -h should not change which entries are listed, only the size format.
Could you help me understand why this happens?
1016K is numerically greater than 2M. sort -n just extracts 1016 and 2.
Try sort -h if your sort supports it.
The sort command does not take the K/M/G suffixes into account, and sorts all "K" values before all "M" values.
Probably with -h your directories' sizes are something like 2G or 4G, while files remain 1016K or 1008K, and 2 is smaller than 1016, hence directories get sorted to the end of the list.
Without -h the total sizes of directories (actually: directory trees) are clearly greater than the sizes of the files inside them, hence directories are sorted to the top of the list.
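The difference is easy to demonstrate directly: sort -n compares only the leading digits, while sort -h (GNU sort) understands the size suffixes:

```shell
# Numeric sort: 2M sorts before 1016K, because 2 < 1016
printf '1016K\n2M\n' | sort -n
# Human-numeric sort: 1016K correctly sorts before 2M
printf '1016K\n2M\n' | sort -h
```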

How can I list (ls) the 5 last modified files in a directory?

I know ls -t will list all files by modified time. But how can I limit these results to only the last n files?
Try using head or tail. If you want the 5 most-recently modified files:
ls -1t | head -5
The -1 (that's a one) says one file per line and the head says take the first 5 entries.
If you want the last 5 try
ls -1t | tail -5
The accepted answer lists only the filenames, but to get the top 5 files one can also use:
ls -lht | head -6
where:
-l outputs in a list format
-h makes output human readable (i.e. file sizes appear in kb, mb, etc.)
-t sorts output by placing most recently modified file first
head -6 shows 5 files because ls -l prints a "total" block-count line first.
I think this is a slightly more elegant and possibly more useful approach.
Example output:
total 26960312
-rw-r--r--@ 1 user staff 1.2K 11 Jan 11:22 phone2.7.py
-rw-r--r--@ 1 user staff 2.7M 10 Jan 15:26 03-cookies-1.pdf
-rw-r--r--@ 1 user staff 9.2M 9 Jan 16:21 Wk1_sem.pdf
-rw-r--r--@ 1 user staff 502K 8 Jan 10:20 lab-01.pdf
-rw-rw-rw-@ 1 user staff 2.0M 5 Jan 22:06 0410-1.wmv
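If the "total" line is unwanted, a small variation strips it with tail before taking the newest five:

```shell
# Skip ls -l's leading "total" line, then keep the 5 most recent entries
ls -lht | tail -n +2 | head -5
```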
Use tail command:
ls -t | tail -n 5
By default ls -t sorts output from newest to oldest, so the combination of commands to use depends in which direction you want your output to be ordered.
For the newest 5 files ordered from newest to oldest, use head to take the first 5 lines of output:
ls -t | head -n 5
For the newest 5 files ordered from oldest to newest, use the -r switch to reverse ls's sort order, and use tail to take the last 5 lines of output:
ls -tr | tail -n 5
Note: ls -t sorts by last-modified time by default, not creation time; ls -ltc sorts by last status change (ctime) instead. To list the newest n entries: ls -lt | head -n "$n".
None of the other answers worked for me: the results mixed folders and files, which is not what I expected.
The solution that worked for me was:
find . -type f -mmin -10 -ls
This lists all files under the current directory modified in the last 10 minutes. It won't limit itself to the last 5 files, but it might help nevertheless.
If you want to watch the last five modified files, refreshing every 2 seconds, with the total number of files shown at the top, use this:
watch 'ls -Art | wc -l ; ls -ltr | tail -n 5'

Linux: How to list information about files or directories (size, permissions, number of files by type) in total

Suppose I am in the current directory and want to list the total number of files, along with their sizes and permissions, and the number of files by type.
Here is a sample output:
Print information about "/home/user/poker"
total number of file : 83
pdf files : 5
html files : 9
text files : 15
unknown : 5
NB: any file without an extension can be considered unknown.
I hope to use some simple commands like ls, cut, sort, uniq (just examples), putting each different extension in a file and using wc -l to count the lines.
Or do I need to use grep, awk, or something else?
I hope to get everybody's advice. Thank you!
The best way is to use file to output only the MIME type and pass it to awk.
file * -ib | awk -F'[;/.]' '{print $(NF-1)}' | sort -n | uniq -c
On my home directory it produces this output.
35 directory
3 html
1 jpeg
1 octet-stream
1 pdf
32 plain
5 png
1 spreadsheet
7 symlink
1 text
1 x-c++
3 x-empty
1 xml
2 x-ms-asf
4 x-shellscript
1 x-shockwave-flash
If you think text/x-c++ and text/plain should be counted the same, use this:
file * -ib | awk -F'[;/.]' '{print $1}' | sort -n | uniq -c
6 application
6 image
45 inode
40 text
2 video
Change the {print $1} part according to your need to get the appropriate output.
You need bash.
files=(*)
pdfs=(*.pdf)
echo "${#files[@]}"
echo "${#pdfs[@]}"
echo "$((${#files[@]}-${#pdfs[@]}))"
find . -type f | xargs -n1 basename | fgrep . | sed 's/.*\.//' | sort | uniq -c | sort -n
That gives you a recursive list of file extensions. If you want only the current directory, add -maxdepth 1 to the find command.
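For the counts-by-extension report the asker described, a plain shell loop also works; per their definition, names without a dot are reported as unknown (a sketch for the current directory only):

```shell
# Count regular files in the current directory by extension;
# files with no "." in their name are reported as "unknown".
for f in *; do
  [ -f "$f" ] || continue
  case $f in
    *.*) echo "${f##*.}" ;;
    *)   echo unknown ;;
  esac
done | sort | uniq -c
```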
