Bash: How to find directories of given files (Linux)

I have a folder with several subfolders, and some of those subfolders contain text files named example_1.txt, example_2.txt and so on. For example, example_1.txt may be found in subfolder1; some subfolders contain no text files at all.
How can I list all directories that contain a text file whose name starts with example?
I can find all those files by running this command:
find . -name "example*"
But what I need is the directories these files are located in. I would need a list like subfolder1, subfolder4, subfolder8 and so on. I'm not sure how to do that.

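You can use the following command, which prints the first-level subfolder of each match and then removes duplicates with sort -u (-mindepth 2 skips files directly in the current folder):
find . -mindepth 2 -type f -name "example*" | awk -F / '{print $2}' | sort -u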

Find all files in subfolders, using -mindepth 2 to skip files in the current folder (.).
Get each file's directory with xargs dirname.
Sort the list and make the folders unique with sort -u.
Print only the last path component with awk (the delimiter is / and the last field is $NF), adding "," after every subfolder.
Translate newlines into blanks with tr.
Remove the trailing ", " with sed.
list=$(find ./ -mindepth 2 -type f -name "example*"|xargs dirname|sort -u|awk -F/ '{print $NF","}'|tr '\n' ' '|sed 's/, $//')
echo "$list"
flow, over, stack

I suggest a find command that prints only the directory path of each match.
Then sort the paths.
Then remove duplicates.
find . -type f -name "example*" -printf "%h\n"|sort|uniq
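A sketch of the same idea that stays safe for directory names containing spaces or newlines, by keeping the stream NUL-separated until the final display step. It assumes GNU find (-printf) and GNU sort (-z); list_match_dirs is a hypothetical helper name, not an existing command.

```shell
#!/bin/sh
# list_match_dirs [ROOT] [PATTERN]
# Hypothetical helper; assumes GNU find (-printf) and GNU sort (-z).
# Prints each directory under ROOT that directly contains at least one
# regular file whose name matches PATTERN, once per directory.
list_match_dirs() {
    root=${1:-.}
    pattern=${2:-'example*'}
    # %h is the directory part of each match; \0 keeps unusual names intact
    find "$root" -type f -name "$pattern" -printf '%h\0' |
        sort -zu |      # sort and drop duplicate directories
        tr '\0' '\n'    # one directory per line, only for display
}
```

For a tree like the one in the question, list_match_dirs . 'example*' prints ./subfolder1, ./subfolder4 and so on, one per line; pipe the result through sed 's|.*/||' if you only want the last path component.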

Related

How to use GNU find command to find files by pattern and list files in order of most recent modification to least?

I want to use the GNU find command to find files based on a pattern, and then have them displayed in order of the most recently modified file to the least recently modified.
I understand this:
find / -type f -name '*.md'
But what would I add to sort the files from the most recently modified to the least?
find can't sort files, so you can instead output the modification time plus filename, sort on modification time, then remove the modification time again:
find . -type f -name '*.md' -printf '%T@ %p\0' | # Print time+name
sort -rnz | # Sort numerically, descending
cut -z -d ' ' -f 2- | # Remove time
tr '\0' '\n' # Optional: make human readable
This uses \0-separated entries to avoid problems with any kind of filenames. You can pass this directly and safely to a number of tools, but here it instead pipes to tr to show the file list as individual lines.
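As a hedged sketch, the same pipeline can be wrapped in a reusable function that also limits the output to the N most recent files; head -z keeps the stream NUL-separated until the last step. It assumes GNU find, sort, head and cut (for the -z options); newest_files is a hypothetical name.

```shell
#!/bin/sh
# newest_files [ROOT] [PATTERN] [COUNT]
# Hypothetical helper; assumes GNU find, sort, head and cut (-z support).
# Prints up to COUNT files matching PATTERN under ROOT, newest first.
newest_files() {
    root=${1:-.}
    pattern=${2:-'*.md'}
    count=${3:-10}
    find "$root" -type f -name "$pattern" -printf '%T@ %p\0' |
        sort -rnz |              # newest modification time first
        head -z -n "$count" |    # keep only the first COUNT records
        cut -z -d ' ' -f 2- |    # drop the timestamp, keep the path
        tr '\0' '\n'             # one path per line for display
}
```

For example, newest_files . '*.md' 5 lists the five most recently modified Markdown files under the current directory.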
find <dir> -name "*.md" -printf "%Ts - %h/%f\n" | sort -rn
This prints the modification time in epoch format (%Ts), the directory (%h) and the file name (%f); piping through sort -rn then sorts in reverse numeric order, so the newest files come first.
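Pipe the output of find to xargs and ls:
find / -type f -name '*.md' | xargs ls -1t
This breaks on names containing spaces (use find -print0 with xargs -0 in that case), and if the list is long enough that xargs runs ls more than once, each batch is sorted separately.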

Search filenames for a list of patterns and copy to destination

I have a list of patterns in patterns.txt, and I want to search a folder for file names containing any of them.
patterns.txt:
254b
0284ee
001ty
288qa
I want to search a folder for filenames containing any of these patterns in its filename and copy all found files to a destination directory.
So far I have found a way to list the files:
set -f; find ./ -type f \( $(printf -- ' -o -iname *%s*' $(cat patterns.txt) | cut -b4-) \); set +f
This finds all files matching the patterns in my patterns.txt file, but how do I copy them to a new folder?
Assuming the target folder does not need to preserve the original hierarchy (or that the input directory has no subdirectories), combining find, grep and xargs should work:
find . -type f -print0 |
grep -z -i -F -f patterns.txt |
xargs -0 -s1000 cp -t /new/folder
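This has the advantage of batching the copies, which is efficient for a large number of files. Using NUL to separate file names allows any special character in a file name. Note that grep matches against the whole path, so a pattern that appears in a directory name selects every file under that directory.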
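If the patterns should be matched against the file name only, rather than the whole path, one option is to build the -iname alternatives from patterns.txt as a real argument list instead of relying on word splitting and set -f. This is a sketch: copy_matching is a hypothetical name, each pattern is treated as a glob (so * or ? inside it act as wildcards), and cp -t is a GNU extension.

```shell
#!/bin/sh
# copy_matching PATTERNS_FILE DEST [ROOT]
# Hypothetical helper; assumes find with -iname and GNU cp (-t).
# Copies every regular file under ROOT whose name contains any pattern
# from PATTERNS_FILE (case-insensitive) into DEST, flattening the tree.
copy_matching() {
    patterns_file=$1
    dest=$2
    root=${3:-.}
    set --                                       # reuse "$@" to collect find arguments
    while IFS= read -r pat || [ -n "$pat" ]; do  # also reads a last line without newline
        [ -n "$pat" ] || continue                # skip blank lines
        [ "$#" -gt 0 ] && set -- "$@" -o         # OR between alternatives
        set -- "$@" -iname "*$pat*"
    done < "$patterns_file"
    [ "$#" -gt 0 ] || return 0                   # no patterns, nothing to copy
    # {} + hands many files to each cp invocation
    find "$root" -type f \( "$@" \) -exec cp -t "$dest" {} +
}
```

For example, copy_matching patterns.txt /new/folder . copies the matches from the current directory tree. Keep DEST outside ROOT, otherwise find may also visit the copies.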

How to count number of files in each directory?

I am able to list all the directories by
find ./ -type d
I attempted to list the contents of each directory and count the number of files in each directory by using the following command
find ./ -type d | xargs ls -l | wc -l
But this summed the total number of lines returned by
find ./ -type d | xargs ls -l
Is there a way I can count the number of files in each directory?
This prints the file count per directory for the current directory level:
du -a | cut -d/ -f2 | sort | uniq -c | sort -nr
Assuming you have GNU find, let it find the directories and let bash do the rest:
shopt -s nullglob   # an empty directory counts as 0, not as its literal pattern
find . -type d -print0 | while IFS= read -r -d '' dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
find . -type f | cut -d/ -f2 | sort | uniq -c
find . -type f finds all items of type file in the current folder and its subfolders
cut -d/ -f2 cuts out their top-level folder (files directly in the current folder show up under their own name)
sort sorts the list of folder names
uniq -c returns the number of times each folder name has been counted
You could arrange to find all the files, remove the file names, leaving you a line containing just the directory name for each file, and then count the number of times each directory appears:
find . -type f |
sed 's%/[^/]*$%%' |
sort |
uniq -c
The only gotcha in this is if you have any file names or directory names containing a newline character, which is fairly unlikely. If you really have to worry about newlines in file names or directory names, I suggest you find them, and fix them so they don't contain newlines (and quietly persuade the guilty party of the error of their ways).
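If you do have to cope with newlines in names, GNU tools can carry the whole pipeline on NUL-terminated records instead. A minimal sketch, assuming GNU find (-printf), sort -z and uniq -z; count_files_per_dir is a hypothetical name. It counts regular files per containing directory, so directories without any files do not appear in the output.

```shell
#!/bin/sh
# count_files_per_dir [ROOT]
# Hypothetical helper; assumes GNU find, sort and uniq (-z support).
count_files_per_dir() {
    # %h is the directory of each regular file, one NUL-terminated record each
    find "${1:-.}" -type f -printf '%h\0' |
        sort -z |        # group identical directories together
        uniq -zc |       # count each group, still NUL-terminated
        tr '\0' '\n'     # convert to lines only for display
}
```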
If you're interested in the count of the files in each sub-directory of the current directory, counting any files in any sub-directories along with the files in the immediate sub-directory, then I'd adapt the sed command to print only the top-level directory:
find . -type f |
sed -e 's%^\(\./[^/]*/\).*$%\1%' -e 's%^\.\/[^/]*$%./%' |
sort |
uniq -c
The first pattern captures the start of the name, the dot, the slash, the name up to the next slash and the slash, and replaces the line with just the first part, so:
./dir1/dir2/file1
is replaced by
./dir1/
The second substitution handles files directly in the current directory: they have no further slash after ./, and those lines are replaced by ./. The sort and count then work on just the directory names.
Here's one way to do it, but probably not the most efficient.
find -type d -print0 | xargs -0 -n1 bash -c 'echo -n "$1:"; ls -1 "$1" | wc -l' --
This gives output like the following: each directory name followed by the number of entries in it. Note that the count also includes subdirectories, which may not be what you want.
./c/fa/l:0
./a:4
./a/c:0
./a/a:1
./a/a/b:0
This is a slightly modified version of Sebastian's answer that uses find instead of du (to avoid the file-size calculation that du performs but that is never used here):
find ./ -mindepth 2 -type f | cut -d/ -f2 | sort | uniq -c | sort -nr
The -mindepth 2 parameter excludes files in the current directory. If you remove it, you'll see a bunch of lines like the following:
234 dir1
123 dir2
1 file1
1 file2
1 file3
...
1 fileN
(much like the du-based variant does)
If you do need to count the files in current directory as well, use this enhanced version:
{ find ./ -mindepth 2 -type f | cut -d/ -f2 | sort && find ./ -maxdepth 1 -type f | cut -d/ -f1; } | uniq -c | sort -nr
The output will be like the following:
234 dir1
123 dir2
42 .
Everyone else's solution has one drawback or another.
find -type d -readable -exec sh -c 'printf "%s " "$1"; ls -1UA "$1" | wc -l' sh {} ';'
Explanation:
-type d: we're interested in directories.
-readable: We only want them if it's possible to list the files in them. Note that find will still emit an error when it tries to search for more directories in them, but this prevents calling -exec for them.
-exec sh -c BLAH sh {} ';': for each directory, run this script fragment, with $0 set to sh and $1 set to the directory name.
printf "%s " "$1": portably and minimally print the directory name, followed by only a space, not a newline.
ls -1UA: list the files, one per line, in directory order (to avoid stalling the pipe), excluding only the special directories . and ..
wc -l: count the lines
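Running one sh per directory can be slow on large trees. A variation, offered as a sketch, passes many directories to each sh with {} + and loops over them inside the script; the output format stays the same. It keeps the GNU-specific -readable and ls -U from the answer above; count_entries is a hypothetical name.

```shell
#!/bin/sh
# count_entries [ROOT]
# Hypothetical helper; assumes GNU find (-readable) and GNU ls (-U).
# Prints "DIR N", where N counts all entries in DIR, hidden ones included.
count_entries() {
    find "${1:-.}" -type d -readable -exec sh -c '
        for d; do                # iterate over the batch of directories
            printf "%s " "$d"
            ls -1UA "$d" | wc -l
        done' sh {} +
}
```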
This can also be done by looping over a glob instead of using find:
for f in */; do echo "$f -> $(ls "$f" | wc -l)"; done
Explanation:
for f in */; - loop over all (non-hidden) directories in the current directory
do echo "$f -> - print out each directory name
$(ls "$f" | wc -l) - call ls for this directory and count the lines
This should return the directory name followed by the number of files in the directory.
findfiles() {
echo "$1" $(find "$1" -maxdepth 1 -type f | wc -l)
}
export -f findfiles
find ./ -type d -exec bash -c 'findfiles "$0"' {} \;
Example output:
./ 6
./foo 1
./foo/bar 2
./foo/bar/bazzz 0
./foo/bar/baz 4
./src 4
The export -f is required because find's -exec can only run external programs, not shell functions: you have to invoke bash explicitly, and the function defined in the current shell has to be exported so that the new bash process can see it.
My answer is a little different: thanks to find's options, you can actually be much more flexible. Just try:
find . -type f -printf "%h\n" | sort | uniq -c
With the "%h" option to "-printf", find prints only the directory of the files it found. Then sort and count with "uniq -c". This prints the number of search result entries with the same directory, per directory.
Using further options of find, you can do more. For example, to get an overview of how many files in which directory were modified on which date, use:
find . -newermt "2022-01-01 00:00:00" -type f -printf "%TY-%Tm-%Td %h\n" | sort | uniq -c
This finds all files modified since 1 January 2022, prints (with -printf) the modification date and the directory, then sorts and counts them. Each line of the result has the number of files, the date of modification (without time) and the directory.
Note that -printf is a GNU find extension, so it is not available in every version of find (macOS find, for example, lacks it).
I combined @glenn jackman's answer and @pcarvalho's answer (pcarvalho's version in the comments is broken because the backtick character was interpreted as formatting).
My script accepts a path as an argument, sorts the directory list the way ls -l does, and handles spaces in file names.
#!/bin/bash
OLD_IFS="$IFS"
IFS=$'\n'
for dir in $(find "$1" -maxdepth 1 -type d | sort);
do
files=("$dir"/*)
printf "%5d,%s\n" "${#files[@]}" "$dir"
done
IFS="$OLD_IFS"
My first answer on Stack Overflow; I hope it helps someone ^_^
This is another way to walk the directory structure and report a count for every directory at every depth:
find . -type d | awk '{print "echo -n \""$0" \"; ls -l \""$0"\" | grep -v total | wc -l"}' | sh
find . -type f -printf '%h\n' | sort | uniq -c
gives for example:
5 .
4 ./aln
5 ./aln/iq
4 ./bs
4 ./ft
6 ./hot
I tried some of the others here, but they included subfolders in the file count when I only wanted the files. This prints ./folder/path<tab>nnn with the number of files, not including subfolders, for each subfolder in the current folder.
find . -type d -print0 | while IFS= read -r -d '' d
do
printf '%s\t%s\n' "$d" "$(find "$d" -maxdepth 1 -type f | wc -l)"
done
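The loop above starts a second find for every directory. As a bash-only sketch (shopt is not POSIX), the count can instead be done with one glob per directory; it includes hidden files, reports 0 for directories without files, and deliberately does not count symbolic links. count_regular_files is a hypothetical name.

```shell
#!/usr/bin/env bash
# count_regular_files [ROOT]
# Hypothetical helper; bash only. Prints "DIR<TAB>N" for every directory,
# where N counts regular files directly inside DIR (hidden ones included,
# subdirectories and symlinks excluded).
count_regular_files() (
    # The parentheses run the whole function in a subshell, so the
    # shopt settings below do not leak into the caller.
    shopt -s nullglob dotglob   # no-match globs vanish; * matches dotfiles
    find "${1:-.}" -type d -print0 |
    while IFS= read -r -d '' d; do
        n=0
        for f in "$d"/*; do
            # count regular files only, not directories or symlinks
            if [[ -f $f && ! -L $f ]]; then
                n=$((n + 1))
            fi
        done
        printf '%s\t%d\n' "$d" "$n"
    done
)
```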
This gives the overall count (py here is the third-party pythonpy command-line tool):
for file in */; do echo "$file -> $(ls "$file" | wc -l)"; done | cut -d ' ' -f 3 | py --ji -l 'numpy.sum(l)'
A fast command that recursively traverses files to count the images in a directory, grouped by image extension:
find . -type f | sed -e 's/.*\.//' | sort | uniq -c | sort -n | grep -Ei '(tiff|bmp|jpeg|jpg|png|gif)$'
Credits: https://unix.stackexchange.com/a/386135/354980
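The command above takes the extension from the whole path and groups extensions case-sensitively, so JPG and jpg are counted on separate lines. A sketch that looks only at the file name (%f, GNU find) and folds case before counting; count_by_extension is a hypothetical name, and the extension list is the one from the command above.

```shell
#!/bin/sh
# count_by_extension [ROOT]
# Hypothetical helper; assumes GNU find (-printf) and GNU grep.
# Counts image files per extension, folding case so JPG and jpg group.
count_by_extension() {
    find "${1:-.}" -type f -name '*.*' -printf '%f\n' |
        sed 's/.*\.//' |                  # keep only the extension
        tr '[:upper:]' '[:lower:]' |      # fold case before grouping
        grep -Ex 'tiff|bmp|jpeg|jpg|png|gif' |   # filter before sorting
        sort | uniq -c | sort -n
}
```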
I edited the while-read script above to exclude all node_modules directories inside the analyzed one.
This can be used to check whether the number of files in a project exceeds the maximum number the file watcher can handle.
find . -type d ! -path "*node_modules*" -print0 | while IFS= read -r -d '' dir; do
files=("$dir"/*)
printf "%5d files in directory %s\n" "${#files[@]}" "$dir"
done
To check the maximum files that your system can watch:
cat /proc/sys/fs/inotify/max_user_watches
On slow systems, add the node_modules folder to your IDE's or editor's excluded paths; ideally, the count of the remaining files should not exceed the maximum (which can be raised).
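Filtering with ! -path "*node_modules*" still makes find walk every node_modules tree and only discards the results afterwards. A sketch using -prune instead, which stops find from descending into those directories at all; printing one dot per file and counting bytes keeps the total correct even for names containing newlines. count_project_files is a hypothetical name, and -printf is GNU-specific.

```shell
#!/bin/sh
# count_project_files [ROOT]
# Hypothetical helper; assumes GNU find (-printf).
# Counts regular files under ROOT without descending into node_modules.
count_project_files() {
    # -prune stops at each node_modules directory; everything else falls
    # through to -o and prints one "." if it is a regular file.
    find "${1:-.}" -type d -name node_modules -prune -o -type f -printf '.' |
        wc -c
}
```

To compare with the watch limit, print both numbers: echo "$(count_project_files .) files, limit $(cat /proc/sys/fs/inotify/max_user_watches)".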
Easy Method:
find ./|grep "Search_file.txt" |cut -d"/" -f2|sort |uniq -c
In my case I needed the count at subfolder level, so I did:
du -a | cut -d/ -f3 | sort | uniq -c | sort -nr
An easy way to recursively count files of a given type; in this case, all .jpg files in the current directory and its subfolders:
find . -name '*.jpg' -print | wc -l
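OMG, why the complex commands? Just use something like:
find whatever_folder | wc -l
(Note that this prints a single total for the whole tree, counting directories as well as files, rather than a count per directory.)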

List the file and its base directory

I have some files in my folder: /home/sample/**/*.pdf, *.doc, *.xls, etc. ('**' means some sub-sub directory).
I need a shell script or Linux command to list the files in the following manner:
pdf_docs/xx.pdf
documents/xx.doc
excel/xx.xls
pdf_docs, documents and excel are directories located at various depths in /home/sample, like:
/home/sample/12091/pdf_docs/xx.pdf
/home/sample/documents/xx.doc
/home/excel/V2hm/1001/excel/xx.xls
You can try this:
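for i in '*.pdf' '*.doc' '*.xls'; do find /home/sample/ -name "$i"; done | awk -F/ '{print $(NF-1) "/" $NF}'
I've added an awk command that prints only the last two fields (separated by /) of each result. The patterns are quoted so that the shell does not expand them against files in the current directory.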
Something like this?
for i in '*.pdf' '*.doc' '*.xls'; do
find /home/sample/ -name "$i";
done | perl -lnwe '/([^\/]+\/[^\/]+)$/&&print $1'
How about this?
find /home/sample -type f -regex '^.*\.\(pdf\|doc\|xls\)$'
This takes into account spaces in file names and the case of the extension:
for a in '*.pdf' '*.doc' '*.xls'; do find /home/sample/ -type f -iname "$a" -exec basename {} \; ; done
EDIT: edited to match only files (-type f).
You don't need to call out to an external program to chop the pathname like you're looking for:
$ filename=/home/sample/12091/pdf_docs/xx.pdf
$ echo ${filename%/*/*}
/home/sample/12091
$ echo ${filename#${filename%/*/*}?}
pdf_docs/xx.pdf
So,
find /home/sample \( -name \*.doc -o -name \*.pdf -o -name \*.xls \) -print0 |
while IFS= read -r -d '' pathname; do
echo "${pathname#${pathname%/*/*}?}"
done
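One refinement of the parameter-expansion trick above, as a sketch: quoting the inner expansion makes it match literally, so directory names containing *, ? or [ cannot be misread as patterns. last_two_components is a hypothetical helper name; it uses only POSIX parameter expansion.

```shell
#!/bin/sh
# last_two_components PATH
# Hypothetical helper; POSIX sh only.
# Prints the final directory and file name of PATH, e.g. pdf_docs/xx.pdf.
last_two_components() {
    p=$1
    head=${p%/*/*}              # everything before the last two components
    # "$head" is quoted so it is removed literally, not as a glob pattern;
    # a PATH with fewer than two slashes is printed unchanged.
    printf '%s\n' "${p#"$head"/}"
}
```

Inside the loop above, the echo line would become last_two_components "$pathname".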

Copy the three newest files under one directory (recursively) to another specified directory

I'm using bash.
Suppose I have a log file directory /var/myprogram/logs/.
Under this directory I have many sub-directories and sub-sub-directories that include different types of log files from my program.
I'd like to find the three newest files (modified most recently), whose name starts with 2010, under /var/myprogram/logs/, regardless of sub-directory and copy them to my home directory.
Here's what I would do manually
1. Go through each directory and run ls -lt 2010* to see which files starting with 2010 were modified most recently.
2. Once I go through all directories, I'd know which three files are the newest. So I copy them manually to my home directory.
This is pretty tedious, so I wondered if maybe I could somehow pipe some commands together to do this in one step, preferably without using shell scripts?
I've been looking into find, ls, head and awk, which I might be able to use, but I haven't figured out the right way to glue them together.
Let me know if I need to clarify. Thanks.
Here's how you can do it:
find -type f -name '2010*' -printf "%C@\t%P\n" |sort -r -k1,1 |head -3 |cut -f 2-
This outputs a list of files prefixed by their last change time, sorts them based on that value, takes the top 3 and removes the timestamp.
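The command above only lists the files; the question also asks to copy them. A sketch that finishes the job while staying NUL-safe, assuming GNU find, sort, head, cut, xargs and cp (-t). It switches to %T@, the modification time the question asks about, instead of the status change time (%C) used above; copy_newest is a hypothetical name.

```shell
#!/bin/sh
# copy_newest SRC DEST [COUNT]
# Hypothetical helper; assumes GNU find, sort, head, cut, xargs and cp.
# Copies the COUNT most recently modified files named 2010* under SRC
# into DEST.
copy_newest() {
    src=$1
    dest=$2
    count=${3:-3}
    find "$src" -type f -name '2010*' -printf '%T@\t%p\0' |
        sort -z -rn -k1,1 |        # newest modification time first
        head -z -n "$count" |      # keep the top COUNT records
        cut -z -f 2- |             # drop the tab-separated timestamp
        xargs -0 -r cp -t "$dest"  # -r: run nothing if there are no matches
}
```

For the question's layout this would be called as copy_newest /var/myprogram/logs ~.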
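Your answers feel very complicated. How about:
for FILE in `find . -type d`; do ls -t -1 -p "$FILE" | grep -v "/" | head -n3 | xargs -I{} mv "$FILE/{}" ..; done;
or, laid out nicely:
for FILE in `find . -type d`;
do
ls -t -1 -p "$FILE" | grep -v "/" | grep "^2010" | head -n3 | xargs -I{} mv "$FILE/{}" ~;
done;
Note that this takes the three newest files in each directory rather than the three newest overall, moves rather than copies them, and breaks on directory names containing spaces.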
My "shortest" answer after quickly hacking it up.
for file in $(find . -iname '*.php' -mtime -1 | xargs ls -l | awk '{ print $6" "$7" "$8" "$9 }' | sort | sed -n '1,3p' | awk '{ print $4 }'); do cp "$file" ../; done
The main command stored in $() does the following:
Find all files recursively in current directory matching (case insensitive) the name *.php and having been modified in the last 24 hours.
Pipe to ls -l, required to be able to sort by modification date, so we can have the first three
Extract the modification date and file name/path with awk
Sort these files based on datetime
With sed print only the first 3 files
With awk print only their name/path
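Use the result in a for loop whose action copies each file to the desired location.
Caveat: sort compares the ls date columns as plain text, so the order is not reliably chronological, and ascending order puts the oldest files first; ls -t or find -printf '%T@' sorts by the actual timestamp.
Or use @Hasturkun's variant, which popped up as a response while I was editing this post :)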
