Find but do not descend into directories containing the searched files - linux

I have several projects configured by a pom.xml or similar.
I would like to use the Linux find command to locate these projects, e.g. with find -name pom.xml.
This, however, takes some time because of the deep paths. I would like to use find -prune to stop searching in subdirectories once the file has been found, but -prune only stops on matched directories, not on matched files.
Is there a way to get find to stop descending when the directory already contains one of the searched files?
For clarification, this is what I do without find:
pfind() {
    parent=$1 && shift
    for file in "$@" ; do          # "$@" (not "$#") iterates over the remaining arguments
        path=$parent/$file
        if [ -e "$path" ] ; then
            echo "$path"
            return 0               # found a match; do not descend into this directory
        fi
    done
    for dir in "$parent"/* ; do    # glob directly instead of word-splitting $(echo ...)
        if [ -d "$dir" ] ; then
            pfind "$dir" "$@"
        fi
    done
}
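A call could look like this (the paths and file names here are hypothetical examples):

    pfind ~/projects pom.xml build.xml
    # prints e.g. /home/user/projects/app/pom.xml, then skips everything below app/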
But I'd rather use a simple find invocation so it is easier for others to understand and extend.

With GNU find, -print -quit prints the first match and then stops immediately:
find . -name pom.xml -print -quit
If you want to speed up the search, you can also work with locate, which queries a prebuilt database instead of scanning the file system.
You can update the database by running updatedb.
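For example (updatedb usually requires root and is typically also run daily by cron):

    sudo updatedb        # refresh the locate database
    locate pom.xml       # list indexed paths containing "pom.xml"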

One line of Python (Python 3):
python3 -c 'import os, sys; [print(p) or d.clear() for p, d, f in os.walk(sys.argv[1]) if sys.argv[2] in f]' "$PWD" pom.xml
Clearing d in place stops os.walk from descending below a directory that already contains a match.
os.walk idea from https://stackoverflow.com/a/37267686/1888983

Related

How to delete older files but keep recent ones during backup?

I have a remote server that copies 30-some backup files to a local server every day, and I want to remove the old backups if and only if a newer backup has been copied successfully.
With the different approaches I tried, I managed to erase older files, but I ran into the problem that as soon as one new backup was found, ALL older ones were deleted.
I have something like (picture this with 20 virtual machines):
vm001-2019-08-01.bck
vm001-2019-07-28.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck
...
And I want to erase all of them but keep only the most recent one for each machine.
I.e., erase:
vm001-2019-07-28.bck
vm002-2019-07-29.bck
vm004-2019-07-31.bck
vm004-2019-07-30.bck
vm004-2019-07-29.bck
and keep only:
vm001-2019-08-01.bck
vm002-2019-08-01.bck
vm003-2019-07-29.bck
vm004-2019-08-01.bck
The problem I had is that if I have any recent backup of any machine, files like vm003-2019-07-29.bck get deleted because they are older, even though they belong to a different machine.
I know there are several variants of this question on the site, but I can't quite get this to work.
I've been trying variants of this code:
#!/bin/bash
for i in ./*.bck
do
echo "found" "$i"
if [[ -n $(find "$i" -type f -mmin -1440) ]]
then
echo "$i"
find "$i" -type f -mmin +1440 -exec rm -f "$i" {} +
fi
done
(The echos are for debugging purposes only)
At this time, this code finds the newer and the older files, but doesn't delete anything. If I use find "$i" -type f -mmin +1440 -exec echo "$i" {} +, it never prints anything, as if find "$i" were not finding anything; but when I run it as a standalone command in the terminal (minus the -exec part), it does.
I've tested this script by generating files with different timestamps using touch -d, but I had no success.
Unless you add the -name test before the filename, find is going to consider "$i" to be the name of a directory to search in. So your find command should be:
find -name "$i" -type f -mmin -1440
which will search in the current directory. Or
find /path/to/dir -name "$i" -type f -mmin -1440
which will search in the directory /path/to/dir.
But, based on BashFAQ/099, I would do this to delete all but the newest file for each VM (untested):
#!/bin/bash
declare -A newest   # associative array storing the name of the newest file for each VM
for f in *
do
    vm=${f%%-*}     # extract the VM name from the filename (i.e. vm001 from vm001-2019-08-01.bck)
    if [[ -f $f && $f -nt "${newest[$vm]}" ]]
    then
        newest["$vm"]=$f
    fi
done
for f in *
do
    vm=${f%%-*}
    if [[ -f $f && $f != "${newest[$vm]}" ]]
    then
        rm "$f"
    fi
done
This is set up to run against files in the current directory. It assumes that the files are named as shown in the question (the VM name is separated from the rest of the file name by a hyphen). Using an associative array requires Bash 4 or later.
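As a quick illustration of the ${f%%-*} expansion used above to derive the VM name:

    f=vm001-2019-08-01.bck
    echo "${f%%-*}"    # prints "vm001": the longest suffix matching -* (everything from the first hyphen) is removed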

Using if to check whether a directory exists, and do something if not

I am trying to move directories from $DIR1 to $DIR2 if $DIR2 does not already contain a directory with the same name.
This is what I currently have:
if [[ ! $(ls -d /$DIR2/* | grep test) ]]
then
    mv $DIR1/test* /$DIR2
fi
First, it gives
ls: cannot access //data/lims/PROCESSING/*: No such file or directory
when $DIR2 is empty; however, it still works.
Secondly, when I run the shell script twice, it doesn't let me move directories with a similar name.
For example, in $DIR1 I have test-1 test-2 test-3.
When the script runs for the first time, all three directories move to $DIR2.
After that I do mkdir test-4 in $DIR1 and run the script again.
It does not let me move test-4, because my test thinks test-4 is already there: I am grepping for everything containing "test".
How can I work around this and move test-4?
Firstly, you can check whether a directory exists using bash's built-in 'True if directory exists' test:
test="/some/path/maybe"
if [ -d "$test" ]; then
echo "$test is a directory"
fi
However, you want to test if something is not a directory. You've shown in your code that you already know how to negate the expression:
test="/some/path/maybe"
if [ ! -d "$test" ]; then
echo "$test is NOT a directory"
fi
You also seem to be using ls to get a list of files. Parsing ls output is fragile; a glob does the same job safely. Perhaps you want to loop over the entries and do something with the ones that are not directories?
dir="/some/path/maybe"
for test in "$dir"/*; do
    if [ ! -d "$test" ]; then
        echo "$test is NOT a directory."
    fi
done
A good place to look for bash stuff like this is Machtelt Garrels' guide. His page on the various expressions you can use in if statements helped me a lot.
Moving directories from a source to a destination if they don't already exist in the destination:
For the sake of readability I'm going to refer to your DIR1 and DIR2 as src and dest. First, let's declare them:
src="/place/dir1/"
dest="/place/dir2/"
Note the trailing slashes. We'll append the names of folders to these paths so the trailing slashes make that simpler. You also seem to be limiting the directories you want to move by whether or not they have the word test in their name:
filter="test"
So, let's first loop through the directories in source that pass the filter; if they don't exist in dest let's move them there:
for dir in "$src"*"$filter"*/; do    # directories under $src whose names contain $filter
    name=$(basename "$dir")
    if [ ! -d "$dest$name" ]; then
        mv "$src$name" "$dest"
    fi
done
I hope that solves your issue. But be warned: @gniourf_gniourf posted a link in the comments that should be heeded!
If you need to mv some directories to another location according to some pattern, then you can use find:
find . -type d -name "test*" -exec mv -t /tmp/target {} +
Details:
-type d - search only for directories
-name "test*" - the name pattern to search for
-exec - run a command on the find results
-t, --target-directory=DIRECTORY - move all SOURCE arguments into DIRECTORY
There are many examples of -exec and xargs usage.
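For instance, an equivalent sketch of the same move using xargs (GNU find and mv assumed):

    find . -type d -name "test*" -print0 | xargs -0 mv -t /tmp/target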
And if you do not want to overwrite files, then add the -n option to the mv command:
find . -type d -name "test*" -exec mv -n -t /tmp/target {} +
-n, --no-clobber do not overwrite an existing file

Removing Colons From Multiple Files on Linux

I am trying to take some directories and transfer them from Linux to Windows. The problem is that the file names on Linux contain colons, and I need to copy these directories (I cannot rename them directly, since they are needed as they are on the server) over to names that Windows can use. For example, the name of a directory on the server might be:
IAPLTR2b-ERVK-LTR_chr9:113137544-113137860_-
while I need it to be:
IAPLTR2b-ERVK-LTR_chr9-113137544-113137860_-
I have about sixty of these directories, and I have collected their names with absolute paths in a file I call directories.txt. I need to walk through this file, changing the colons to hyphens. Thus far, my attempt is this:
#!/bin/bash
$DIRECTORIES=`cat directories.txt`
for $i in $DIRECTORIES;
do
cp -r "$DIRECTORIES" "`echo $DIRECTORIES | sed 's/:/-/'`"
done
However I get the error:
./my_shellscript.sh: line 10: =/bigpartition1/JKim_Test/test_bs_1/129c-test-biq/IAPLTR1_Mm-ERVK-LTR_chr10:104272652-104273004_+.fasta: No such file or directory ./my_shellscript.sh: line 14: `$i': not a valid identifier
Can anyone here help me identify what I am doing wrong and maybe what I need to do?
Thanks in advance.
This monstrosity will rename the directories in situ:
find tmp -depth -type d -exec sh -c '[ -d "{}" ] && echo mv "{}" "$(echo "{}" | tr : -)"' \;
I use -depth so that find processes the deepest subdirectories first.
The [ -d "{}" ] test is necessary because once a directory has been renamed, a path that find recorded earlier may no longer exist.
Change "echo mv" to "mv" if you're satisfied it will do what you want.

Linux - shell script for finding and listing all writable files in a directory tree

Here is the code that I have so far:
echo $(pwd > adress)
var=$(head -1 adress)
rm adress
found=0 #Flag
fileshow()
{
cd $1
for i in *
do
if [ -d $i ]
then
continue
elif [ -w $i ]
then
echo $i
found=1
fi
done
cd ..
}
fileshow $1
if [ $found -eq 0 ]
then
clear
echo "$(tput setaf 1)There arent any executable files !!!$(tput sgr0)"
fi
It works, but it finds files only in the current directory.
I was told that I need some kind of recursion to loop through all subdirectories, but I don't know how to do it.
So if anyone can help me, I will be very grateful.
Thanks!
The effect of your script is to find the files below the current working directory that are not directories and are writable by the current user. This can be achieved with the command:
find ./ -type f -writable
The advantage of using -type f is that it also excludes symbolic links and other special kinds of file, if that's what you want. If you want all files that are not directories (as suggested by your script), then you can use:
find ./ ! -type d -writable
If you want to sort these files (added question, assuming lexicographic ascending order), you can use sort:
find ./ -type f -writable | sort
If you want to use these sorted filenames for something else, the canonical pattern would be (to handle filenames with embedded newlines and other seldom-used characters):
while read -r -d $'\0'; do
echo "File '$REPLY' is an ordinary file and is writable"
done < <(find ./ -type f -writable -print0 | sort -z)
If you're using a very old version of find that does not support the handy -writable predicate (added in version 4.3 in 2005), then you only have file permissions to go on. You then have to be clear about what you mean by “writable” in the specific context (writable to whom?), and you can replace the -writable predicate with the -perm predicates described in @gregb's answer. If you decide that you mean “writable by anyone” you could use -perm /u=w,g=w,o=w or -perm /222, but there's actually no way of getting all the benefits of -writable using permissions alone. Also note that the + form of permission tests to -perm is deprecated and should no longer be used; the / form should be used instead.
You could use find:
find /path/to/directory/ -type f -perm -o=w
Where the -o=w implies that each file has the "other write-permission" set.
or,
find /path/to/directory/ -type f -perm /u+w,g+w,o+w
Where /u+w,g+w,o+w implies that each file has at least one of the user, group, or other write-permission bits set.
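A quick way to see the difference between the two tests (hypothetical demo files; the chmod values make one file other-writable and the other not):

    touch a.txt b.txt
    chmod 646 a.txt    # other users may write
    chmod 644 b.txt    # other users may only read
    find . -maxdepth 1 -type f -perm -o=w            # prints only ./a.txt
    find . -maxdepth 1 -type f -perm /u+w,g+w,o+w    # prints ./a.txt and ./b.txt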

A bash script to run a program for directories that do not have a certain file

I need a Bash script that executes a program for all directories that do not have a specific file, and creates the output file in the same directory. The program needs an input file, which exists in every directory under the name *.DNA.fasta. Suppose I have the following directories, which may also contain subdirectories:
dir1/a.protein.fasta
dir2/b.protein.fasta
dir3/anyfile
dir4/x.orf.fasta
I have started by finding the directories that don't have that specific file, whose name matches *.protein.fasta.
In this case I want dir3 and dir4 to be listed (since they do not contain *.protein.fasta).
I have tried this code:
find . -maxdepth 1 -type d \! -exec test -e '{}/*protein.fasta' \; -print
but it seems I missed something; it does not work.
I also do not know how to proceed with the rest of the task.
This is a tricky one.
I can't think of a good solution. But here's a solution, nevertheless. Note that this is guaranteed not to work if your directory or file names contain newlines, and it's not guaranteed to work if they contain other special characters. (I've only tested with the samples in your question.)
Also, I haven't included a -maxdepth because you said you need to search subdirectories too.
#!/bin/bash
# Create an associative array
declare -A excludes
# Build an associative array of directories containing the file
while read -r line; do
excludes[$(dirname "$line")]=1
echo "excluded: $(dirname "$line")" >&2
done <<EOT
$(find . -name "*protein.fasta" -print)
EOT
# Walk through all directories, print only those not in array
find . -type d \
| while read -r line ; do
if [[ ! ${excludes[$line]} ]]; then
echo "$line"
fi
done
For me, this returns:
.
./dir3
./dir4
All of which are directories that do not contain a file matching *.protein.fasta. Of course, you can replace the last echo "$line" with whatever you need to do with these directories.
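For example, the final loop could invoke the program from your question directly (Programfoo and the *.DNA.fasta input are placeholders taken from your description):

    find . -type d \
        | while read -r line ; do
            if [[ ! ${excludes[$line]} ]]; then
                Programfoo "$line"/*.DNA.fasta
            fi
        done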
Alternatively:
If what you're really looking for is just the list of top-level directories that do not contain the matching file in any subdirectory, the following bash one-liner may be sufficient:
for i in *; do test -d "$i" && ( find "$i" -name '*protein.fasta' | grep -q . || echo "$i" ); done
Applied to your use case, with Programfoo standing in for your actual program and the *.DNA.fasta glob supplying its input file:
#!/bin/bash
for dir in *; do
    test -d "$dir" && ( find "$dir" -name '*protein.fasta' | grep -q . || Programfoo "$dir"/*.DNA.fasta )
done
