How to find files with specific pattern in directory with specific number? Linux


I've got folders named folder1 all the way up to folder150, and maybe beyond, but I only want to find the complete paths of the text files in some of the folders (for example, folder1 to folder50).
I thought a command like the following might work, but it is incorrect.
find '/path/to/directory/folder{1..50}' -name '*.txt'
The solution doesn't have to use find, as long as it does the correct thing.

find /path/to/directory/folder{1..50} -name '*.txt' 2>/dev/null
Or only basename
find /path/to/directory/folder{1..50} -name '*.txt' -exec basename {} \; 2>/dev/null
Or basename without .txt
find /path/to/directory/folder{1..50} -name '*.txt' -exec basename {} .txt \; 2>/dev/null

V. Michel's answer directly solves your problem; to complement it with an explanation:
Bash's brace expansion is only applied to unquoted strings; your solution attempt uses a single-quoted string, whose contents are by definition interpreted as literals.
Contrast the following two statements:
# WRONG:
# {...} inside a single-quoted (or double-quoted) string: interpreted as *literal*.
echo 'folder{1..3}' # -> 'folder{1..3}'
# OK:
# Unquoted use of {...} -> *brace expansion* is applied.
echo 'folder'{1..3} # -> 'folder1 folder2 folder3'
Note how only the brace expression is left unquoted in the 2nd example above, which demonstrates that you can selectively mix quoted and unquoted substrings in Bash.
It is worth noting that it is - and can only be - Bash that performs brace expansion here, and find only sees the resulting, literal paths.[1]
find only accepts literal paths as filename operands.
(Some of find's primaries (tests), such as -name and -path, do support globs (as demonstrated in the question), but not brace expansion; to ensure that such globs are passed through intact to find, without premature expansion by Bash, they must be quoted; e.g., -name '*.txt')
[1] After Bash performs brace expansion, globbing (pathname expansion) may occur in addition, as demonstrated in ehaymore's answer; folder{?,[1-4]?,50} is brace-expanded to the tokens folder?, folder[1-4]?, and folder50, the first two of which are subject to globbing because they contain pattern metacharacters (?, [...]). Whether globbing is involved or not, the target program ultimately only sees the resulting literal paths.
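To see the quoting rule in isolation, here is a minimal sketch that needs no real directories. The variable names (quoted, unquoted, count) are only illustrative, and /path/to/directory is the placeholder path from the question; nothing has to exist on disk because brace expansion is purely textual.

```shell
#!/usr/bin/env bash
# Quoted braces stay literal; unquoted braces are expanded by Bash
# before the command ever runs.
quoted=$(echo 'folder{1..3}')
unquoted=$(echo 'folder'{1..3})

echo "$quoted"     # the literal string, braces included
echo "$unquoted"   # three separate words joined by spaces

# Count how many starting-point arguments find would receive.
# The paths need not exist: brace expansion does not look at the filesystem.
set -- /path/to/directory/folder{1..50}
count=$#
echo "$count"
```

Because the expansion does not consult the filesystem, find still complains about any of the 50 folders that are missing, which is why the accepted answer appends 2>/dev/null.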

You can give multiple directories to the find command, each matching part of the pattern you're looking for. For example,
find /path/to/directory/folder{?,[1-4]?,50} -name '*.txt'
which expands to three patterns:
folder? (matches 0-9)
folder[1-4]? (matches 10-49)
folder50
The question mark is a single-character wildcard.
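A small sandbox run may make the expansion easier to follow. This is only a sketch: the temporary directory, the file name notes.txt and the variable names are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox: folder1 .. folder60, each holding one text file.
base=$(mktemp -d)
for i in {1..60}; do
    mkdir "$base/folder$i"
    touch "$base/folder$i/notes.txt"
done

# Brace expansion yields folder?, folder[1-4]? and folder50;
# the shell then globs the first two against the existing directories.
matches=$(find "$base"/folder{?,[1-4]?,50} -name '*.txt' | wc -l)
echo "$matches"   # one file per folder from folder1 to folder50

rm -rf "$base"
```

Unlike folder{1..50}, the glob-based form only produces paths for folders that actually exist, so no errors need to be suppressed.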

Related

'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this already somewhere, but I am struggling to do so. I want to find some log files with names like
myfile_3.log
however I only want to find the ones with numbers in a certain range. I tried things like this:
find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77
Any other ideas?

If you want to match an integer range using a regular expression, use the -regex option in your find command.
For example to match all files from 0 to 67, use this:
find <path> -regextype egrep -regex '.*file([0-5][0-9]|6[0-7])\.txt'
There are 2 parts in the regex:
[0-5][0-9] matches the two-digit numbers 00-59 (a single digit such as 7 is not matched unless it is zero-padded as 07)
6[0-7] matches the range 60-67
Note the option -regextype egrep (GNU find), which selects extended regular expressions.
Note also that -regex matches against the whole path, not just the filename; that is the reason for the .* at the beginning of the regex.
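As a rough check of the idea, here is a sketch adapted to the myfile_N.log names from the question, with a third alternative added for unpadded single digits. It assumes GNU find (for -regextype) and uses a throwaway directory; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sandbox with myfile_0.log .. myfile_70.log (names from the question).
dir=$(mktemp -d)
for i in {0..70}; do touch "$dir/myfile_$i.log"; done

# GNU find only: -regex must match the whole path, hence the leading '.*/'.
# Three alternatives cover 0-9, 10-59 and 60-67 without zero padding.
in_range=$(find "$dir" -regextype posix-extended \
    -regex '.*/myfile_([0-9]|[1-5][0-9]|6[0-7])\.log' | wc -l)
echo "$in_range"   # files 0 through 67

rm -rf "$dir"
```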

You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:
parallel find . -name "*file{}.txt" ::: {0..67}
In case you are wondering why I say it is not that efficient: it starts 68 instances of find, each looking for a different number in the filename, and each one walks the whole directory tree. That may be OK for a small tree.

The following will find all files named myfile_X.log, where X is a number from 0 to 67.
find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Explanation:
-type f finds files whose type is file.
| pipes the filepath(s) to grep for filtering.
grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" performs an extended (-E) regexp to find the last part of the path (i.e. the filename) which:
begins with myfile_
is followed by one or two digits forming a number from 0 to 67.
ends with .log
Edit:
Alternatively, as suggested by @ghoti in the comments, you can utilize the -regex option in the find command instead of piping to grep. For example:
find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Note: The regexp is very similar to the one in the grep example above. However, it begins with .*/ to match all parts of the filepath up to and including the final forward slash. For some reason, unknown to me, the .*/ part is not necessary with grep [1]. Also note that the -E flag shown here belongs to BSD/macOS find; GNU find uses -regextype posix-extended instead.
Footnotes:
[1] If any readers know why the ERE utilized with find's -regex option requires the initial .* and the same ERE with grep does not, then please leave a comment. You'll make me sleep better at night ;)

One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:
find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'
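One caveat when combining these -o alternatives with other tests: -and binds more tightly than -o, so the alternatives usually need to be grouped with escaped parentheses. A minimal sketch in a throwaway directory follows; the sample file names and the directory named myfile_66.log are invented for the demonstration.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
touch "$dir"/myfile_{5,42,67,68,100}.log
mkdir "$dir/myfile_66.log"      # a directory that happens to match the pattern

# Without the parentheses, -type f would apply only to the first -name test.
grouped=$(find "$dir" -type f \( -name 'myfile_[0-9].log' \
    -o -name 'myfile_[1-5][0-9].log' \
    -o -name 'myfile_6[0-7].log' \) | sort)
count=$(printf '%s\n' "$grouped" | wc -l)
echo "$count"   # myfile_5.log, myfile_42.log and myfile_67.log

rm -rf "$dir"
```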

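You cannot represent a general range with a single regular expression, although you can craft a regex for a specific range. It is better to use find to get the files that have a number in their name and then filter the output with another tool that performs the range check, such as sed plus a shell comparison (or awk).
START=0
END=67
# Read NUL-delimited paths so that unusual file names are handled safely.
while IFS= read -r -d '' file
do
    # Extract the number from the basename; N stays empty if there is none.
    N=$(basename "$file" | sed -n 's/^myfile_\([0-9][0-9]*\)\.log$/\1/p')
    if [ -n "$N" ] && [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name "myfile_*.log" -print0)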
In that script, you perform a find of all the files that have the desired pattern, then you loop through the found files and sed is used to capture the number in the filename. Finally, you compare that number with your range limits. If the comparisons succeed, the file is printed.
There are many other answers that give you a regex for the specific range in the example, but they are not general. None of them allows for easy modification of the range involved.
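The same filtering idea can be written without sed, using Bash parameter expansion. This is only a sketch: the function name in_range and the sample files are hypothetical, and it assumes Bash plus a sort that supports -z (GNU and BSD both do).

```shell
#!/usr/bin/env bash
# in_range FILE START END: succeed if FILE is named myfile_<N>.log
# with START <= N <= END.
in_range() {
    local name=${1##*/}          # strip leading directories
    local n=${name#myfile_}      # drop the prefix
    n=${n%.log}                  # drop the suffix
    [[ $name == myfile_*.log && $n =~ ^[0-9]+$ ]] || return 1
    (( 10#$n >= $2 && 10#$n <= $3 ))   # 10# forces base 10 for names like 08
}

dir=$(mktemp -d)
touch "$dir"/myfile_{3,67,68,abc}.log

selected=()
while IFS= read -r -d '' file; do
    in_range "$file" 0 67 && selected+=("${file##*/}")
done < <(find "$dir" -name 'myfile_*.log' -print0 | sort -z)
printf '%s\n' "${selected[@]}"

rm -rf "$dir"
```

Changing the range is then just a matter of passing different START and END arguments, which is the point this answer makes about generality.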

Linux find command shell expansion

I have just a little question I don't understand with the find command.
I can do this :
[root#hostnaoem# ❯❯❯ls /proc/*/fd
But this give me an error :
[root#hostnaoem# ❯❯❯ find /proc/*/fd -ls
find: `/proc/*/fd': No such file or directory
even if I use "/proc//fd", /proc/""/fd or "/proc/*/fd"
I've searched for what the find documentation says about shell expansion, but I found nothing. Can someone tell me why?
Thanks

If you just RTFM, you'll learn that the syntax for find is:
find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]
The usually used subset of that is:
find whereToSearch (-howToSearch arg)*
To find all files|directories named fd in /proc:
find /proc -name fd
-name is the most common howToSearch expression:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS CON‐
FORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
(Note the last sentence.)
If your pattern contains slashes, you need -path or -wholename (same thing):
find /proc/ -wholename '/proc/[0-9]*/fd' 2>/dev/null
Other expressions you might want to use are:
-type
-depth, -mindepth, -maxdepth
-user, -uid
See find(1) to learn more about each search expression. If you want to search the in-terminal manual (man find or man 1 find), you can use the / character to enter search mode (like Ctrl+F in most GUI apps).
Usage of ls with globbing (*) is generally a code smell. Unless you use the -d flag, it'll list the contents of the directories that match the glob pattern in addition to the matches.
I find the echo globpattern form generally more convenient for viewing the results of a glob pattern match.
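The difference between a glob expanded by the shell and a pattern matched by find can be reproduced safely outside /proc. The sketch below uses a throwaway directory with an invented layout (101/fd, 202/fd, 303/other) and assumes a find that supports -maxdepth, as GNU and BSD find do.

```shell
#!/usr/bin/env bash
root=$(mktemp -d)
mkdir -p "$root"/{101,202}/fd "$root/303/other"

# 1) Unquoted: the shell expands the glob; find receives two literal starting points.
by_shell=$(find "$root"/*/fd -maxdepth 0 | wc -l)

# 2) Quoted after -path: the shell passes the pattern through and find matches it.
by_find=$(find "$root" -path "$root/*/fd" | wc -l)

# 3) Quoted as a starting point: find looks for a directory literally named '*'.
if find "$root/*/fd" -maxdepth 0 >/dev/null 2>&1; then
    literal=found
else
    literal=missing
fi

echo "$by_shell $by_find $literal"
rm -rf "$root"
```

Case 3 is what produces the "No such file or directory" message in the question: the pattern reached find unexpanded and find does not glob its starting points.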

This works (note that the pattern must be quoted so that the shell does not expand it before find sees it):
[root#hostname # ❯❯❯ find /proc/ -path '/proc/*/fd' -ls
Regards.

Find files with a certain extension that exceeds a certain file size

I'm having trouble with the find command in bash.
I'm trying to find a file that ends with .c and has a file size bigger than 2000 bytes. I thought it would be:
find $HOME -type f -size +2000c .c$
But obviously that isn't correct.
What am I doing wrong?

find $HOME -type f -name "*.c" -size +2000c
Have a look at the -name switch in the man page:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS CON‐
FORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
Note the suggestion at the end to always enclose the pattern in quotes. The order of the tests is not relevant here. Have another look at the man page:
EXPRESSIONS
The expression is made up of options (which affect overall operation
rather than the processing of a specific file, and always return true),
tests (which return a true or false value), and actions (which have
side effects and return a true or false value), all separated by opera‐
tors. -and is assumed where the operator is omitted.
If the expression contains no actions other than -prune, -print is per‐
formed on all files for which the expression is true.
So the tests are, by default, connected with an implicit -and operator: they all have to be true for a file to be reported, and their order doesn't change the result. The order becomes relevant only for more complicated expressions that use operators other than -and.
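To check the claim about ordering, here is a small sketch with three invented files in a temporary directory; the file names and sizes are arbitrary choices for the demonstration.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
head -c 100  /dev/zero > "$dir/small.c"    # too small
head -c 3000 /dev/zero > "$dir/big.c"      # matches both tests
head -c 3000 /dev/zero > "$dir/big.txt"    # big enough, wrong extension

# The two tests are joined by an implicit -and, so either order
# selects the same file.
name_first=$(find "$dir" -type f -name '*.c' -size +2000c)
size_first=$(find "$dir" -type f -size +2000c -name '*.c')
echo "$name_first"

rm -rf "$dir"
```

Note that +2000c means strictly more than 2000 bytes; the c suffix selects bytes rather than the default 512-byte blocks.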

Try this:
find "$HOME" -type f -size +2000c -name '*.c'

Try the following:
find "$HOME" -type f -size +2000c -name '*.c'

renaming with find

I managed to find several files with the find command.
The files have names like file_sakfksanf.txt, file_afsjnanfs.pdf, file_afsnjnjans.cpp.
Now I want to rename them, using rename together with -exec, to
mywish_sakfksanf.txt, mywish_afsjnanfs.pdf, mywish_afsnjnjans.cpp
so that only the prefix is changed. I have been trying for some time, so please don't blame me for being stupid.

If you read through the -exec section of the man pages for find you will come across the {} string that allows you to use the matches as arguments within -exec. This will allow you to use rename on your find matches in the following way:
find . -name 'file_*' -exec rename 's/file_/mywish_/' {} \;
From the manual:
-exec command ;
Execute command; true if 0 status is returned. All following
arguments to find are taken to be arguments to the command until an
argument consisting of `;' is encountered. The string `{}' is replaced
by the current file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it is alone, as
in some versions of find. Both of these constructions might need to
be escaped (with a `\') or quoted to protect them from expansion by
the shell. See the EXAMPLES section for examples of the use of the
-exec option. The specified command is run once for each matched file. The command is executed in the starting directory. There are
unavoidable security problems surrounding use of the -exec action;
you should use the -execdir option instead.
Although you asked for a find/exec solution, as Mark Reed suggested, you might want to consider piping your results to xargs. If you do, make sure to use the -print0 option with find and either the -0 or --null option with xargs, to avoid unexpected behaviour resulting from whitespace or shell metacharacters appearing in your file names. Also consider using the + version of -exec (also in the manual): it is specified by POSIX, so it should be portable if you want to run your command elsewhere (though that is not always true), and it builds its command line in a way similar to xargs, which should result in fewer invocations of rename.
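If the rename utility is not available, the + form of -exec can be combined with a small inline shell loop instead. This is a sketch rather than a drop-in replacement: the sandbox directory and the sample file names are invented, and it relies only on sh, mv and parameter expansion.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkdir "$dir/file_archive"       # a directory that must NOT be renamed
touch "$dir/file_sakfksanf.txt" "$dir/file_archive/file_afsjnanfs.pdf"

# -exec ... {} + hands many paths to a single sh invocation, as xargs would.
find "$dir" -type f -name 'file_*' -exec sh -c '
    for path do
        name=${path##*/}                                  # basename only
        mv -- "$path" "${path%/*}/mywish_${name#file_}"   # swap the prefix
    done
' sh {} +

renamed=$(find "$dir" -type f -exec basename {} \; | sort)
echo "$renamed"

rm -rf "$dir"
```

Working on the basename rather than the whole path avoids accidentally rewriting a parent directory whose name also contains file_.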

Don't think there's a way you can do this with just find, you'll need to create a script:
#!/bin/bash
# Build the new path by replacing the first "file_" with "mywish_".
NEW=$(printf '%s\n' "$1" | sed -e 's/file_/mywish_/')
mv -- "$1" "$NEW"
Then you can:
find ./ -name 'file_*' -exec my_script {} \;

Remove special characters in linux files

I have a lot of *.java and *.xml files, but someone wrote some comments and strings in them with Spanish characters. I've been searching the web for a way to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example. How can I remove these characters from many other files in subfolders?

If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
-execdir is more secure than -exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for exec/execdir.

You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
                         ^^                 ^^
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.

tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard out‐
put.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
Piping your input through tr -d 'áíéóúñ' will probably do what you want (GNU tr works on bytes, so this is only reliable for single-byte encodings such as ISO-8859-1).

Why are you trying to remove only characters with diacritics? It is probably worth removing all characters with codes outside the range 0-127, so the removal regexp would be s/[\x80-\xFF]//g (GNU sed syntax, run under LC_ALL=C so that bytes are matched individually), provided you're sure that your files should not contain any non-ASCII characters.
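Stripping every non-ASCII byte can also be done with tr, which accepts octal ranges. The sketch below assumes UTF-8 input; the sample file Demo.java and its contents are invented, and since tr cannot edit in place the loop writes to a temporary file and moves it back.

```shell
#!/usr/bin/env bash
# 'ni\303\261o' is "niño" encoded as UTF-8 (ñ is the two bytes 0xC3 0xB1).
cleaned=$(printf 'ni\303\261o\n' | LC_ALL=C tr -d '\200-\377')
echo "$cleaned"

# Apply the same filter to every .java and .xml file below a directory.
dir=$(mktemp -d)
printf '// a\303\261o\n' > "$dir/Demo.java"
find "$dir" -type f \( -name '*.java' -o -name '*.xml' \) -exec sh -c '
    for f do
        LC_ALL=C tr -d "\200-\377" < "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
    done
' sh {} +
result=$(cat "$dir/Demo.java")
echo "$result"

rm -rf "$dir"
```

This deletes the accented letters outright rather than replacing them with unaccented equivalents, which matches what the question asks for but may not be what you want for identifiers or user-visible strings.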
