How do I strip the full path, retaining only the filename? - linux

I am using the following find command to list all the files recursively within a folder and sort them by size (highest size on top):
find . -not -path '*/\.*' -not -name '*.nfo' -type f -exec du -h {} + | sort -r -h
The command works well, but I need to strip the full path from each result, retaining only the filename.
E.g.
Dir/AnotherDir/file.mp4 should be listed as file.mp4
Generally when I have to do this in a find command, I simply use -printf '%f\n', but that can't be used here because the files are being printed by du, not by find.

Just post-process the data:
find ... | sort ... | sed -E 's#[[:space:]].*/# #'
or
... | awk '{printf "%s\t%s\n", $1, $NF}' FS='\t|/'
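To see what the sed variant does, here's a quick check with a fabricated du-style line (the tab and path are made up for illustration):
printf '4.0M\t./Dir/AnotherDir/file.mp4\n' | sed -E 's#[[:space:]].*/# #'
# prints: 4.0M file.mp4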

Related

UNIX: Use a single find command to search files larger than 4 MiB, then pipe the output to a sort command

I currently have a question I am trying to answer, shown below. Here is what I have come up with, but it doesn't appear to be working:
find /usr/bin -type f -size +4194304c | sort -n
Am I on the right track with the above?
Question:
Use a single find command to search for all files larger than 4 MiB in
/usr/bin, printing the listing in a long format. Pipe this output to a sort command
which will sort the list from largest to smallest
I'd fiddle with the -printf command-line switch, something like this:
find YOUR_CONDITION_HERE -printf '%s %p\n' | sort -n
Here %s stands for the size in bytes, and %p for the file name.
You can trim the sizes off later, e.g. using cut:
find -type f -size +4194304c -printf '%s %p\n' | sort -n | cut -f 2 -d ' '
But given that you need the long listing format, I guess you'll be adding more fields to printf's argument; see the sketch below.
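For example, a sketch of a fuller, long-ish listing (assuming GNU find; the field choices here are mine, not from the question):
# permissions, owner, group, size, mtime, path; sort by the 4th field (size), descending
find /usr/bin -type f -size +4194304c -printf '%M %u %g %10s %TY-%Tm-%Td %p\n' | sort -k4,4 -n -r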
Related topic: https://superuser.com/questions/294161/unix-linux-find-and-sort-by-date-modified
You are on the right track, but the find command will only output the name of the file, not its size. This is why sort will sort them alphabetically.
To sort by size, you can output the file list and then pass it to ls with xargs like this:
find /usr/bin -type f -size +4194304c | xargs ls -S
If you want ls to output the file list in a single column, you can replace the -S with -S1 (adding the -1 flag). The command would become:
find /usr/bin -type f -size +4194304c | xargs ls -S1
To make your command robust against all filenames, I would suggest using -print0 (it separates paths with the null character, the only character that cannot appear in a path on Linux). The command would become:
find /usr/bin -type f -size +4194304c -print0 | xargs -0 ls -S1
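If you'd rather not involve ls at all, a NUL-safe sketch using GNU find and GNU sort (assuming both are available):
# print size<TAB>path as NUL-terminated records, sort numerically descending
find /usr/bin -type f -size +4194304c -printf '%s\t%p\0' | sort -z -n -r | tr '\0' '\n'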
You could also try
find /usr/bin -type f -size +4194304c -ls | sort -n -k7
and if you want the results reversed then try
find /usr/bin -type f -size +4194304c -ls | sort -r -n -k7
Or another option
find /usr/bin -type f -size +4194304c -exec ls -lSd {} +

How to use "find" and "grep" to get the file size too?

I have this script:
find test -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n' > result-swf-files.csv
This will search the directory "test" (and its subdirectories) for all HTML files which contain the string ".swf", and will write a CSV file with the results.
But I want to get the file size too, on the same line. Right now the script outputs the grep result (which doesn't have the file size) on one line, and the printf result (which includes the file size) on another.
How do I add an option to grep to get the file size?
A less verbose way is to use recursive grep (if your system supports it):
grep -rl --include="*.htm*" ".swf" test|xargs ls -l|awk '{ print $9 "," $5 }'
Explanation:
grep recursively using the -rl flags
include only files matching the pattern "*.htm*"
search for the string ".swf" in each htm* file
search only under the "test" directory
pipe the result to xargs, where each filename becomes an argument to the "ls -l" command
Then use awk to keep only the filename and file size. The comma between the 9th and 5th columns in awk's print gives the CSV output.
Feel free to replace "ls -l" with human-readable variants such as "ls -lk" or "ls -lh".
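If filenames may contain spaces, a NUL-safe sketch of the same idea (assuming GNU grep for -Z/--null and GNU stat):
grep -rlZ --include="*.htm*" ".swf" test | xargs -0 stat -c '%n,%s'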
Alternatively, in your script, you can keep only the second output line for each file (the one that contains the size) by piping through grep like this: grep "[0-9] [KB]"
Below is the complete command:
find . -type f \( -iname \*.html -o -iname \*.htm -o -iname \*.xhtml \) -exec grep -il ".swf" {} \; -printf '%k KB - \t %p\n'| grep "[0-9] [KB]"
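Alternatively, a sketch that avoids the second grep pass entirely by using grep -q as a test inside find, so only the size line is ever printed (assuming GNU find for -printf):
find . -type f \( -iname '*.html' -o -iname '*.htm' -o -iname '*.xhtml' \) -exec grep -qi '.swf' {} \; -printf '%k KB,%p\n'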
find . -name '*PATTERN*.gz' -print0 | xargs -0 ls -lh
This gives you ls output for all the files you want.

Output file size for all files of certain type in directory recursively?

I am trying to get the total size of all PDF files recursively within a directory. I have tried running the command below from within the directory, but the recursion does not seem to work properly: it only reports on the files in the current directory and does not include the subdirectories. I am expecting the result to be near 100 GB, but the command reports only about 200 MB of files.
find . -name "*.pdf" | xargs du -sch
Please help!
Use stat -c %n,%s to get the file name and size of the individual files. Then use awk to sum the sizes and print the total.
$ find . -name '*.pdf' -exec stat -c %n,%s {} \; | awk -F, '{sum+=$2}END{print sum}'
In fact you don't need %n, since you want only the sum:
$ find . -name '*.pdf' -exec stat -c %s {} \; | awk '{sum+=$1}END{print sum}'
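A small speed-up, assuming GNU stat: let find batch the arguments with + so stat isn't spawned once per file:
$ find . -name '*.pdf' -exec stat -c %s {} + | awk '{sum+=$1}END{print sum}'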
You can get the sum of sizes of all files using:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/'| bc
First, find finds all the files as specified and for each file runs stat, which prints the file size. Then I substitute every newline with a '+' using tr, replace the trailing '+' with a newline again using sed, and pass the whole expression to bc, which prints the sum.
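If you'd like the total human-readable and GNU coreutils' numfmt is available, a sketch that converts the byte count at the end:
find . -name '*.pdf' -exec stat -c %s {} \; | tr '\n' '+' | sed 's/+$/\n/' | bc | numfmt --to=iec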

List newest file, by type (.txt), after searching recursively, in a terminal

I'm trying to get my terminal to return the latest .txt file, with the path intact. I've been researching ls, grep, find, and tail, using the '|' (pipe) to pass results from one utility to the next. The end result would be a working path + filename that I could pass to my text editor.
I've been getting close with tests like this:
find . | grep '.txt$' | tail -1
...but I haven't had luck with grep returning the newest file - is there a flag I'm missing?
Trying to use find & ls isn't exactly working either:
find . -name "*.txt" | ls -lrth
...the ls lists the current directory instead of the results of my find query.
Please help!
You're so very close.
vi "$(find . -name '*.txt' -exec ls -t {} + | head -1)"
find /usr/share -name '*.txt' -printf '%C+ %p\n' | sort -r | head -1 | sed 's/^[^ ]* //'
If you have bash 4+:
ls -t ./**/*.txt | head -1
To edit the latest txt file:
vim $(ls -t ./**/*.txt | head -1)
PS: you need to enable shopt -s globstar in your .bashrc or .profile.
You can use the stat command to print each file with its modification time and name, then sort:
find . -name "*.txt" -exec stat -c "%y %N" {} \; | sort

Measure disk space of certain file types in aggregate

I have some files across several folders:
/home/d/folder1/a.txt
/home/d/folder1/b.txt
/home/d/folder1/c.mov
/home/d/folder2/a.txt
/home/d/folder2/d.mov
/home/d/folder2/folder3/f.txt
How can I measure the grand total amount of disk space taken up by all the .txt files in /home/d/?
I know du will give me the total space of a given folder, and ls -l will give me the size of individual files, but what if I want one grand total for all the .txt files in /home/d/, including both folder1 and folder2 and their subfolders like folder3?
find folder1 folder2 -iname '*.txt' -print0 | du --files0-from - -c -s | tail -1
This will report disk space usage in bytes by extension:
find . -type f -printf "%f %s\n" |
awk '{
    PARTSCOUNT=split( $1, FILEPARTS, "." );
    EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION]+=$2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
Output:
3250 png
30334451 mov
57725092729 m4a
69460813270 3gp
79456825676 mp3
131208301755 mp4
Simple:
du -ch *.txt
If you just want the total space taken to show up, then:
du -ch *.txt | tail -1
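Note that this only counts .txt files in the current directory. With bash's globstar enabled it can recurse, a sketch (subject to the shell's argument-length limit):
shopt -s globstar
du -ch **/*.txt | tail -1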
Here's a way to do it (in Linux, using GNU coreutils du and Bash syntax), avoiding bad practice:
total=0
while read -r line
do
    size=($line)
    (( total+=size ))
done < <( find . -iname "*.txt" -exec du -b {} + )
echo "$total"
If you want to exclude the current directory, use -mindepth 2 with find.
Another version that doesn't require Bash syntax:
find . -iname "*.txt" -exec du -b {} + | awk '{total += $1} END {print total}'
Note that these won't work properly with file names which include newlines (but those with spaces will work).
macOS
Use the tool du with the parameter -I to exclude all other files.
Linux
-X, --exclude-from=FILE
exclude files that match any pattern in FILE
--exclude=PATTERN
exclude files that match PATTERN
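For instance, with GNU du on Linux, a sketch that totals everything except one unwanted type (repeat --exclude for each pattern you don't want):
du -ch --exclude='*.mov' /home/d | tail -1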
This will do it:
total=0
for file in *.txt
do
    space=$(ls -l "$file" | awk '{print $5}')
    let total+=space
done
echo $total
With GNU find:
find /home/d -type f -name "*.txt" -printf "%s\n" | awk '{s+=$0}END{print "total: "s" bytes"}'
Building on ennuikiller's answer, this will handle spaces in names. I needed to do this and get a little report:
find -type f -name "*.wav" | grep export | ./calc_space
#!/bin/bash
# calc_space
echo SPACE USED IN MEGABYTES
echo
total=0
while read FILE
do
    du -m "$FILE"
    space=$(du -m "$FILE" | awk '{print $1}')
    let total+=space
done
echo $total
A one-liner for those with GNU tools on bash:
for i in $(find . -type f | perl -ne 'print $1 if m/\.([^.\/]+)$/' | sort -u); do echo "$i"": ""$(du -hac **/*."$i" | tail -n1 | awk '{print $1;}')"; done | sort -h -k 2 -r
You must have globstar enabled (the ** glob in the du call needs it):
shopt -s globstar
If you want dot files to work, you must run
shopt -s dotglob
Sample output:
d: 3.0G
swp: 1.3G
mp4: 626M
txt: 263M
pdf: 238M
ogv: 115M
i: 76M
pkl: 65M
pptx: 56M
mat: 50M
png: 29M
eps: 25M
etc
My solution to get the total size of all text files in a given path and its subdirectories (using a Perl one-liner):
find /path -iname '*.txt' | perl -lane '$sum += -s $_; END {print $sum}'
I like to use find in combination with xargs:
find . -name "*.txt" -print0 |xargs -0 du -ch
Add tail if you only want to see the grand total:
find . -name "*.txt" -print0 |xargs -0 du -ch | tail -n1
For anyone wanting to do this on macOS at the command line, you need a variation based on the -print0 argument instead of -printf. Some of the answers above address that, but this will do it comprehensively by extension (note that BSD stat's %z format prints the size in bytes):
find . -type f -print0 | xargs -0 stat -f "%N %z" |
awk '{
    PARTSCOUNT=split( $1, FILEPARTS, "." );
    EXTENSION=PARTSCOUNT == 1 ? "NULL" : FILEPARTS[PARTSCOUNT];
    FILETYPE_MAP[EXTENSION]+=$2
}
END {
    for( FILETYPE in FILETYPE_MAP ) {
        print FILETYPE_MAP[FILETYPE], FILETYPE;
    }
}' | sort -n
There are several potential problems with the accepted answer:
it does not descend into subdirectories (without relying on non-standard shell features like globstar)
in general, as pointed out by Dennis Williamson below, you should avoid parsing the output of ls
namely, if the user or group (columns 3 and 4) have spaces in them, column 5 will not be the file size
if you have a million such files, this will spawn two million subshells, and it'll be sloooow
As proposed by ghostdog74, you can use the GNU-specific -printf option to find to achieve a more robust solution, avoiding all the excessive pipes, subshells, Perl, and weird du options:
# the '%s' format string means "the file's size"
find . -name "*.txt" -printf "%s\n" \
| awk '{sum += $1} END{print sum " bytes"}'
Yes, yes, solutions using paste or bc are also possible, but not any more straightforward.
On macOS, you would need to use Homebrew or MacPorts to install findutils, and call gfind instead. (I see the "linux" tag on this question, but it's also tagged "unix".)
Without GNU find, you can still fall back to using du:
find . -name "*.txt" -exec du -k {} + \
| awk '{kbytes+=$1} END{print kbytes " Kbytes"}'
…but you have to be mindful of the fact that du's default output is in 512-byte blocks for historical reasons (see the "RATIONALE" section of the man page), and some versions of du (notably, macOS's) will not even have an option to print sizes in bytes.
Many other fine solutions here (see Barn's answer in particular), but most suffer the drawback of being unnecessarily complex or depending too heavily on GNU-only features (and maybe in your environment, that's OK!).
