Shell script find command for matching file names with one or the other word - linux

I am looking for a command for bash shell script to find the list of files in a directory whose names start with either WordA or WordB and end with digits.
I have this command with WordA and duplicating the same code with WordB.
find /log/ -name 'WordA*[[:digit:]]'
I tried putting Or condition in various formats such as (WordA|WordB), [WordA|WordB], [(WordA)|(WordB)], [(WordA)||(WordB)] to match WordA or WordB followed by digits.
None of them worked.

You need the -regextype option supported by the find command; in this case, use the posix-extended type. From the man page:
-regextype type
Changes the regular expression syntax understood by -regex and -iregex
tests which occur later on the command line. Currently-implemented
types are emacs (this is the default), posix-awk, posix-basic,
posix-egrep and posix-extended.
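So your command should be (the leading .*/ is there because -regex matches against the whole path, not just the file name):
find /log/ -regextype posix-extended -regex '.*/(WordA|WordB)[^/]*[[:digit:]]'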

In the find man page, under header Expressions, you'll find (no pun intended) the following
Operators
Operators join together the other items within the expression.
They include for example -o (meaning logical OR) and -a (meaning
logical AND). Where an operator is missing, -a is assumed.
Thus the answer to your problem appears to be
find /log/ -name 'WordA*[[:digit:]]' -o -name 'WordB*[[:digit:]]'
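A minimal sketch of the -o form, run against a throwaway directory (the demo variable and the sample file names are made up for illustration). Because the implicit -a binds more tightly than -o, the two -name tests are grouped with escaped parentheses so that -type f applies to both:

```shell
# Hypothetical sample files in a scratch directory.
demo=$(mktemp -d)
touch "$demo/WordA_log1" "$demo/WordB_log22" "$demo/WordA_log" "$demo/WordC_log3"

# \( ... \) groups the two -name tests; without the grouping,
# -type f would only be ANDed with the first -name.
matches=$(find "$demo" -type f \( -name 'WordA*[[:digit:]]' -o -name 'WordB*[[:digit:]]' \) | sort)
printf '%s\n' "$matches"

rm -rf "$demo"
```

Each test after -o needs its own -name; a bare pattern there is not treated as another name to match.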

Related

'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this already somewhere, but I am struggling to do so. I want to find some log files with names like
myfile_3.log
however I only want to find the ones with numbers in a certain range. I tried things like this:
find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77
Any other ideas?
If you want to match an integer range using a regular expression, use the -regex option in your find command.
For example to match all files from 0 to 67, use this:
find <path> -regextype egrep -regex '.*/myfile_([0-9]|[1-5][0-9]|6[0-7])\.log'
There are 3 parts in the regex:
[0-9] matches the range 0-9
[1-5][0-9] matches the range 10-59
6[0-7] matches the range 60-67
Note the option -regextype egrep, which enables extended regular expressions.
Note also that -regex matches the whole file name, including the path, which is why the regex begins with .*
You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:
parallel find . -name "myfile_{}.log" ::: {0..67}
In case you are wondering why I say it is not that efficient: it starts 68 separate instances of find, each looking for a different number in the file name. That may be OK, though.
The following will find all files named myfile_X.log, where X is a number in the range 0-67.
find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Explanation:
-type f restricts the results to regular files.
| pipes the filepath(s) to grep for filtering.
grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" performs an extended (-E) regexp to find the last part of the path (i.e. the filename) which:
begins with myfile_
followed with a digit(s) ranging from 0-67.
ends with .log
Edit:
Alternatively, as suggested by @ghoti in the comments, you can use the -regex option of find instead of piping to grep. For example (the -E flag is the BSD/macOS spelling; GNU find uses -regextype posix-extended instead):
find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Note: The regexp is very similar to the one in the grep example above. However, it begins with .*/ to match all parts of the file path up to and including the final forward slash. For some reason, unknown to me, the .*/ part is not necessary with grep.[1]
Footnotes:
[1] If any readers know why the ERE used with find's -regex option requires the initial .* and the same ERE with grep does not, then please leave a comment. You'll make me sleep better at night ;)
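On the footnote's question, the likely explanation is that the two tools anchor differently: grep reports a line when the pattern matches anywhere in it, whereas find's -regex has to match the entire path. A small sketch, assuming GNU find (for -regextype) and a made-up scratch directory:

```shell
demo=$(mktemp -d)
touch "$demo/myfile_5.log" "$demo/myfile_99.log"

# grep searches within each line, so a pattern starting at "/myfile_" is enough.
with_grep=$(find "$demo" -type f | grep -E '/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$')

# find -regex must match the whole path: without the leading ".*" nothing matches.
without_prefix=$(find "$demo" -regextype posix-extended -type f \
    -regex '/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$')
with_prefix=$(find "$demo" -regextype posix-extended -type f \
    -regex '.*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$')

printf 'grep:            %s\n' "$with_grep"
printf 'find without .*: %s\n' "${without_prefix:-<no match>}"
printf 'find with .*:    %s\n' "$with_prefix"

rm -rf "$demo"
```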
One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:
find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'
You cannot represent a general range with a regular expression, although you can craft a regex for a specific range. It is better to use find to get the files with a number and filter the output with another tool that performs the range check, such as awk or a shell loop:
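START=0
END=67
# Read NUL-delimited paths so unusual file names are handled safely.
while IFS= read -r -d '' file
do
    # Keep only the number between "myfile_" and ".log"; the leading .* drops the directory part.
    N=$(printf '%s\n' "$file" | sed 's/.*myfile_\([0-9]\+\)\.log$/\1/')
    # Skip names whose middle part is not purely numeric.
    case $N in ''|*[!0-9]*) continue ;; esac
    if [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name "myfile_*.log" -print0)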
In that script, you perform a find of all the files that have the desired pattern, then you loop through the found files and sed is used to capture the number in the filename. Finally, you compare that number with your range limits. If the comparisons succeed, the file is printed.
There are many other answers that give you a regex for the specific range in the example, but they are not general. None of them allows for easy modification of the range involved.
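Since awk is mentioned above but not shown, here is a rough sketch of the same range check done in awk. The lo and hi variable names and the scratch directory are illustrative assumptions, and because the pipeline is line-based it assumes file names without newlines:

```shell
demo=$(mktemp -d)
touch "$demo/myfile_0.log" "$demo/myfile_67.log" "$demo/myfile_68.log" "$demo/myfile_x.log"

in_range=$(find "$demo" -type f -name 'myfile_*.log' |
    awk -F/ -v lo=0 -v hi=67 '{
        n = $NF                      # last path component, e.g. myfile_42.log
        sub(/^myfile_/, "", n)       # drop the prefix
        sub(/\.log$/, "", n)         # drop the suffix
        # keep only purely numeric names that fall inside [lo, hi]
        if (n ~ /^[0-9]+$/ && n + 0 >= lo && n + 0 <= hi) print
    }' | sort)
printf '%s\n' "$in_range"

rm -rf "$demo"
```

Changing the range then only means changing the two -v assignments.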

Finding files of fixed length

I am trying to find file names that start with the letter 'a' and are of length 6. I have tried many variations, the latest one being:
find /usr/bin -type f -regex "^[a]" > grep {6}
However I get an error message of:
find: paths must precede expression: {6}
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
What am I doing wrong?
Without any regexes, just globbing:
find /usr/bin -type f -name 'a?????'
References:
Findutils manual: Shell pattern matching
Bash manual, Filename expansion and pattern matching
I would use the following command which is using extended posix regexes:
find /usr/bin -type f -regextype posix-extended -regex '.*/a.{5}'
Let me explain the pattern from the end:
.{5} matches five arbitrary characters
a matches a literal a
the / matches the path delimiter right before the filename
.* is the path, in this case /usr/bin
Btw, a simpler command, which does not even require a special regex type, would be:
find /usr/bin -type f -regex '.*/a.....'
Reading the pattern from the end again:
..... are five arbitrary characters
a is a literal a
.*/ is the preceding path
No trailing $ is needed, because -regex always has to match the whole path.
Another thing. While your regex is wrong and grep is not required at all, why do you get this strange error message?
You are using find ... > grep where I think you wanted to use find ... | grep. Note that > will redirect the output of the find command to a file. In this case a file named grep. If you want to redirect the output of the find command into the input of a grep command you need to use the pipe symbol find ... | grep.
A > filename redirection can appear anywhere in a command line; it does not have to be at the end. That's why {6} is interpreted as the last argument to find. Since this argument is not expected there, find assumes that you accidentally passed a search path at the end, which is a common mistake. Hence the error message.
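The redirection behaviour described above is easy to reproduce in a scratch directory. This sketch uses echo instead of find; the directory and the word hello are arbitrary:

```shell
demo=$(mktemp -d)

(
    cd "$demo" || exit 1
    # "> grep" is a redirection to a file literally named "grep";
    # "{6}" is left over as one more argument to echo.
    echo hello > grep {6}
)

captured=$(cat "$demo/grep")
printf 'file "grep" contains: %s\n' "$captured"

rm -rf "$demo"
```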

How to find files with specific pattern in directory with specific number? Linux

I've got folders named folder1 all the way up to folder150 and maybe beyond.. but I only want to find the complete path to text files in some of the folders (for example folder1 to folder50).
I thought a command like the following might work, but it is incorrect.
find '/path/to/directory/folder{1..50}' -name '*.txt'
The solution doesn't have to use find, as long as it does the correct thing.
find /path/to/directory/folder{1..50} -name '*.txt' 2>/dev/null
Or only basename
find /path/to/directory/folder{1..50} -name '*.txt' -exec basename {} \; 2>/dev/null
Or basename without .txt
find /path/to/directory/folder{1..50} -name '*.txt' -exec basename {} .txt \; 2>/dev/null
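Note that 2>/dev/null hides every error message, not only the ones for folders that do not exist. As an alternative sketch that needs no brace expansion, a loop can skip the missing folders explicitly. The base variable and the sample tree are hypothetical stand-ins for /path/to/directory:

```shell
base=$(mktemp -d)
mkdir "$base/folder1" "$base/folder50" "$base/folder51"
touch "$base/folder1/a.txt" "$base/folder50/b.txt" "$base/folder51/c.txt"

found=$(
    i=1
    while [ "$i" -le 50 ]; do
        dir="$base/folder$i"
        # Only search folders that actually exist.
        if [ -d "$dir" ]; then
            find "$dir" -name '*.txt'
        fi
        i=$((i + 1))
    done
)
printf '%s\n' "$found"

rm -rf "$base"
```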
V. Michel's answer directly solves your problem; to complement it with an explanation:
Bash's brace expansion is only applied to unquoted strings; your solution attempt uses a single-quoted string, whose contents are by definition interpreted as literals.
Contrast the following two statements:
# WRONG:
# {...} inside a single-quoted (or double-quoted) string: interpreted as *literal*.
echo 'folder{1..3}' # -> 'folder{1..3}'
# OK:
# Unquoted use of {...} -> *brace expansion* is applied.
echo 'folder'{1..3} # -> 'folder1 folder2 folder3'
Note how only the brace expression is left unquoted in the 2nd example above, which demonstrates that you can selectively mix quoted and unquoted substrings in Bash.
It is worth noting that it is - and can only be - Bash that performs brace expansion here, and find only sees the resulting, literal paths.[1]
find only accepts literal paths as filename operands.
(Some of find's primaries (tests), such as -name and -path, do support globs (as demonstrated in the question), but not brace expansion; to ensure that such globs are passed through intact to find, without premature expansion by Bash, they must be quoted; e.g., -name '*.txt')
[1] After Bash performs brace expansion, globbing (pathname expansion) may occur in addition, as demonstrated in ehaymore's answer; folder{?,[1-4]?,50} is brace-expanded to tokens folder?, folder[1-4]?, and folder50, the first two of which are subject to globbing, due to containing pattern metacharacters (?, [...]). Whether globbing is involved or not, the target program ultimately only sees the resulting literal paths.
You can give multiple directories to the find command, each matching part of the pattern you're looking for. For example,
find /path/to/directory/folder{?,[1-4]?,50} -name '*.txt'
which expands to three patterns:
folder? (matches 0-9)
folder[1-4]? (matches 10-49)
folder50
The question mark is a single-character wildcard.

Linux find command shell expansion

I have just a little question I don't understand with the find command.
I can do this :
[root#hostnaoem# ❯❯❯ls /proc/*/fd
But this give me an error :
[root#hostnaoem# ❯❯❯ find /proc/*/fd -ls
find: `/proc/*/fd': No such file or directory
even if I use "/proc//fd", /proc/""/fd or "/proc/*/fd"
I've searched for what the find documentation says about shell expansion, but I found nothing. Can someone tell me why?
Thanks
If you just RTFM, you'll learn that the syntax for find is:
find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]
The usually used subset of that is:
find whereToSearch (-howToSearch arg)*
To find all files|directories named fd in /proc:
find /proc -name fd
-name is the most common howToSearch expression:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS
CONFORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
(Note the last sentence.)
If your pattern contains slashes, you need -path or -wholename (same thing):
find /proc/ -wholename '/proc/[0-9]*/fd' 2>/dev/null
Other expressions you might want to use are:
-type
-depth, -mindepth, -maxdepth
-user, -uid
See find(1) to learn more about each search expression. If you want to search the in-terminal manual (man find or man 1 find), you can use the / character to enter search mode (like Ctrl+F in most GUI apps).
Usage of ls with globbing (*) is generally a code smell. Unless you use the -d flag, it'll list the contents of the directories that match the glob pattern in addition to the matches.
I find the echo globpattern form generally more convenient for viewing the results of a glob pattern match.
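To illustrate the point about ls and globs, here is a small comparison in a scratch directory (the tree is invented for the example). printf is used in place of echo so that each match lands on its own line:

```shell
demo=$(mktemp -d)
mkdir -p "$demo/101/fd" "$demo/202/fd"
touch "$demo/101/fd/0" "$demo/202/fd/1"

# Plain ls descends into each matching directory and lists its contents.
ls "$demo"/*/fd

# ls -d lists the matching directories themselves.
ls -d "$demo"/*/fd

# printf shows exactly what the shell expanded the glob to.
expanded=$(printf '%s\n' "$demo"/*/fd)
printf '%s\n' "$expanded"

rm -rf "$demo"
```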
This works (quote the pattern so that the shell leaves it for find to match):
[root#hostname # ❯❯❯ find /proc/ -path '/proc/*/fd' -ls
Regards.

Remove special characters in linux files

I have a lot of *.java and *.xml files, but someone wrote some comments and strings with Spanish characters. I have been searching the web for how to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example. How can I remove these characters from many other files in subfolders?
If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
-execdir is more secure than -exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for exec/execdir.
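For the backup mentioned in the -i note, GNU sed accepts a suffix directly after -i. A cautious sketch on a scratch copy; the file name Sample.java and the .bak suffix are arbitrary choices, and plain -exec is used here only to keep the example short:

```shell
demo=$(mktemp -d)
printf 'String s = "España";\n' > "$demo/Sample.java"

# -i.bak edits in place and keeps the original as Sample.java.bak.
find "$demo" -type f \( -iname '*.java' -o -iname '*.xml' \) \
    -exec sed -i.bak 's/[áíéóúñ]//g' {} +

edited=$(cat "$demo/Sample.java")
backup=$(cat "$demo/Sample.java.bak")
printf 'edited: %s\nbackup: %s\n' "$edited" "$backup"

rm -rf "$demo"
```

Once the result has been checked, the *.bak files can be removed with a second find.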
You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
(Note the added -i and the {} \; at the end.)
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.
tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
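Piping your input through tr -d áíéóúñ will probably do what you want, at least for a single-byte encoding such as ISO-8859-1. GNU tr works on bytes, so in UTF-8 files it would delete the individual bytes of these characters, which can corrupt other accented characters.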
Why are you trying to remove only characters with diacritics? It is probably worth removing all characters with codes outside the range 0-127, so the removal command would be LC_ALL=C sed 's/[\x80-\xFF]//g' (GNU sed) if you're sure that your files should not contain any non-ASCII bytes.
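A byte-oriented sketch of that idea using tr with octal ranges, which avoids locale-dependent bracket expressions. LC_ALL=C is set so that every byte is treated as a single character; the sample string is made up:

```shell
# \303\261 is the UTF-8 encoding of ñ; both bytes are >= 0x80 (octal 200).
cleaned=$(printf 'Espa\303\261a\n' | LC_ALL=C tr -d '\200-\377')
printf '%s\n' "$cleaned"
```

This removes every non-ASCII byte, so it is only appropriate when the files are meant to be pure ASCII.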

Resources