'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this already somewhere, but I am struggling to do so. I want to find some log files with names like
myfile_3.log
however I only want to find the ones with numbers in a certain range. I tried things like this:
find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77
Any other ideas?

If you want to match an integer range using a regular expression, use the -regex option in your find command.
For example to match all files from 0 to 67, use this:
find <path> -regextype egrep -regex '.*file([0-5][0-9]|6[0-7])\.txt'
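There are 2 parts in the regex:
[0-5][0-9] matches the two-digit numbers 00-59
6[0-7] matches the range 60-67
As written, single-digit names without a leading zero (e.g. file7.txt) are not matched; add [0-9] as a third alternative if they should be.
Note the option -regextype egrep, which selects extended regular expressions.
Note also that -regex matches against the whole path, not just the filename; that is the reason for the .* at the beginning of the regex.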

You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:
parallel find . -name "*file{}.txt" ::: {0..67}
In case you are wondering why I say it is not that efficient: it runs 68 separate instances of find, each looking for a different number in the filename... but that may be ok.

The following will find all files named myfile_X.log, where the X part is a number ranging from 0 to 67.
find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Explanation:
-type f restricts the results to regular files.
| pipes the filepath(s) to grep for filtering.
grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" performs an extended (-E) regexp to find the last part of the path (i.e. the filename) which:
begins with myfile_
followed by a number ranging from 0 to 67.
ends with .log
Edit:
Alternatively, as suggested by @ghoti in the comments, you can utilize the -regex option in the find command instead of piping to grep. The -E flag below is BSD/macOS find syntax; GNU find uses -regextype posix-extended instead. For example:
find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Note: The regexp is very similar to the grep example shown previously. However, it begins with .*/ to match all parts of the filepath up to and including the final forward slash. For some reason, unknown to me, the .*/ part is not necessary with grep [1].
Footnotes:
[1] If any readers know why the ERE utilized with find's -regex option requires the initial .* and the same ERE with grep does not - then please leave a comment. You'll make me sleep better at night ;)
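On the footnote's question, the difference appears to come down to searching versus whole-string matching: grep reports a line if the pattern matches anywhere in it, while GNU find documents -regex as a match on the whole path. A small sketch of this, using grep's -x flag (match the entire line) to imitate find's behaviour; the path is made up for illustration:

```shell
path=./logs/myfile_12.log      # hypothetical path, for illustration only
re='/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$'

# grep -c prints the number of matching lines; "|| true" absorbs the
# non-zero exit status grep returns when nothing matches.
search=$(printf '%s\n' "$path" | grep -cE "$re" || true)        # pattern may match anywhere
whole=$(printf '%s\n' "$path" | grep -cxE "$re" || true)        # pattern must cover the whole line
prefixed=$(printf '%s\n' "$path" | grep -cxE ".*$re" || true)   # leading .* absorbs "./logs"

echo "search=$search whole=$whole prefixed=$prefixed"
```

Only the whole-line match without the leading .* finds nothing, which mirrors what find -regex does with the same expression.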

One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:
find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'

You cannot represent a general range with a regular expression, although you can craft a regex for a specific range. It is better to use find to get the files with a number and filter the output with another tool that performs the range check, such as awk or, as here, a shell loop with sed.
START=0
END=67
while IFS= read -r -d '' file
do
    # Print only the number; N stays empty when the name has no numeric part
    N=$(printf '%s\n' "$file" | sed -n 's/.*myfile_\([0-9][0-9]*\)\.log$/\1/p')
    if [ -n "$N" ] && [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name "myfile_*.log" -print0)
In that script, you perform a find of all the files that have the desired pattern, then you loop through the found files and sed is used to capture the number in the filename. Finally, you compare that number with your range limits. If the comparisons succeed, the file is printed.
There are many other answers that give you a regex for the specific range in the example, but they are not general. None of them allows for easy modification of the range involved.
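The same range check can also be done without sed, using only parameter expansion. The following is a minimal sketch rather than a drop-in replacement: the function names in_range and list_in_range are made up for this example, it is written in plain POSIX shell (so it also runs in bash), and because it reads find's output line by line it assumes file names without embedded newlines.

```shell
# in_range PATH START END
# Succeeds if the file name is myfile_<N>.log and START <= N <= END.
in_range() {
    _name=${1##*/}                 # drop the directory part
    _num=${_name#myfile_}          # drop the prefix ...
    _num=${_num%.log}              # ... and the suffix
    [ "myfile_$_num.log" = "$_name" ] || return 1   # prefix or suffix was missing
    case $_num in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
    esac
    [ "$_num" -ge "$2" ] && [ "$_num" -le "$3" ]
}

# list_in_range DIR START END: print the matching paths, one per line.
list_in_range() {
    find "$1" -type f -name 'myfile_*.log' | while IFS= read -r _file; do
        if in_range "$_file" "$2" "$3"; then
            printf '%s\n' "$_file"
        fi
    done
}
```

Calling list_in_range . 0 67 covers the range from the question; any other range only needs different arguments.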

Related

Count number of files (with arbitrary filenames) in a given directory

I am trying to count the number of files in a given directory matching a specific name pattern. While this initially sounded like a no-brainer the issue turned out to be more complicated than I ever thought because the filenames can contain spaces and other nasty characters.
So, starting from an initial find -name "${filePattern}" | wc -l I have by now reached this expression:
find . -maxdepth 1 -regextype posix-egrep -regex "${filePattern}" -print0 | wc -l --files0-from=-
The -maxdepth option restricts the search to the current directory only. The -print0 option of find and the --files0-from option of wc emit and accept, respectively, filenames which are null-byte terminated. This is supposed to take care of possible special characters contained in the filenames.
BUT: the --files0-from= option interprets the strings as filenames, and wc thus counts the lines contained in those files. What I simply need is the number of files themselves (i.e. the number of null-byte terminated strings emitted by find). For that wc would need a -l0 (or possibly a -w0) option, which it doesn't seem to have. Any idea how I can count just the number of those names/strings?
And - yes: I realized that the syntax for the filePattern has to be different in the two variants. The former one uses shell syntax while the latter one requires "real" regex-syntax. But that's OK and actually what I want: it allows me to search for multiple file patterns in one go. The question is really just to count null-byte terminated strings.

You could delete all the non-NUL characters with tr, then count the number of characters remaining.
find . -maxdepth 1 -regextype posix-egrep -regex "${filePattern}" -print0 | tr -cd '\0' | wc -c
If you're dealing with a small-to-medium number of files, an alternate solution would be to store the matches in an array and check the array size. (As you touch on in your question, this would use glob syntax rather than regexes.)
files=(*foo* *bar*)
echo "${#files[@]}"
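To see why counting NUL bytes is more robust than counting lines, here is a small sketch that compares the two on the same directory. The function names count_naive and count_nul are invented for this example, and it uses a glob with -name rather than the regex from the question to keep it short.

```shell
# count_naive DIR GLOB: counts lines, so a newline inside a name is counted twice
count_naive() {
    find "$1" -maxdepth 1 -type f -name "$2" | wc -l
}

# count_nul DIR GLOB: counts the NUL terminators, exactly one per file name
count_nul() {
    find "$1" -maxdepth 1 -type f -name "$2" -print0 | tr -cd '\0' | wc -c
}
```

In a directory holding one file named "a b.log" and one whose name contains a newline, count_nul reports 2 while count_naive reports 3.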

List files with names that contain alphabetic characters and any other symbols (i.e numbers, punctuation, etc.) and sort them by size

I need help modifying a command I have already written.
This is what I was able to achieve:
find -type f -name '*[:alpha:]*' -exec ls -ltu {} \; | sort -k 5 -n -r
However, this command also finds filenames that consist solely of alphabetic characters, so I need to get rid of them too. I have tried doing something like this to the code:
find -type f -name '*[:alpha:]*' -and ! -name '[:alpha:]' -exec ls -ltu {} \; | sort -k 5 -n -r
But it does nothing. I understand that something is wrong with my name formatting, but I have no idea how to fix it.

Character classes like [:alpha:] may only be used inside a bracket expression [..], e.g. [0-9_[:alpha:]]. They may not be used alone.
[:alpha:] by itself is a bracket expression equivalent to [ahlp:] and matches any of the characters "ahlp" or a colon. It does not match alphabetic characters.
To find files whose names contain both at least one alphabetic and at least one non-alphabetic character:
find dir -type f -name '*[[:alpha:]]*' -name '*[^[:alpha:]]*'
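The question also asked for the result sorted by size. One possible way to combine the two, sketched under the assumption that GNU find is available (-printf is a GNU extension); the function name mixed_by_size is made up for this example, and [!...] is used as the portable glob spelling of the negated bracket expression:

```shell
# mixed_by_size DIR
# Print "size<TAB>path" for regular files whose names contain at least one
# letter and at least one non-letter, largest first.
mixed_by_size() {
    find "$1" -type f -name '*[[:alpha:]]*' -name '*[![:alpha:]]*' \
        -printf '%s\t%p\n' | sort -k1,1nr
}
```

Note that a dot counts as a non-alphabetic character, so a name such as notes.txt is included.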

Command Linux to copy files from a certain weekday

I am figuring out a command to copy files that are modified on a Saturday.
find -type f -printf '%Ta\t%p\n'
This way the line starts with the weekday.
When I combine this with an 'egrep' command using a regular expression (starts with "za"), it shows only the lines which start with "za".
find -type f -printf '%Ta\t%p\n' | egrep "^(za)"
("za" is a Dutch abbreviation for "zaterdag", which means Saturday.)
This works just fine.
Now I want to copy the files with this command:
find -type f -printf '%Ta\t%p\n' -exec cp 'egrep "^(za)" *' /home/richard/test/ \;
Unfortunately it doesn't work.
Any suggestions?

The immediate problem is that -printf and -exec are independent of each other. You want to process the result of -printf to decide whether or not to actually run the -exec part. Also, of course, passing an expression in single quotes simply passes a static string, and does not evaluate the expression in any way.
The immediate fix to the evaluation problem is to use a command substitution instead of single quotes, but the problem that the -printf function's result is not available to the command substitution still remains (and anyway, the command substitution would happen before find runs, not while it runs).
A common workaround would be to pass a shell script snippet to -exec, but that still doesn't expose the -printf function to the -exec part.
find whatever -printf whatever -exec sh -c '
case $something in za*) cp "$1" "$0"; esac' "$DEST_DIR" {} \;
so we have to figure out a different way to pass the $something here.
(The above uses a cheap trick to pass the value of $DEST_DIR into the subshell so we don't have to export it. The first argument to sh -c ... ends up in $0.)
Here is a somewhat roundabout way to accomplish this. We create a format string which can be passed to sh for evaluation. In order to avoid pesky file names, we print the inode numbers of matching files, then pass those to a second instance of find for performing the actual copying.
find \( -false $(find -type f \
-printf 'case %Ta in za*) printf "%%s\\n" "-o -inum %i";; esac\n' |
sh) \) -exec cp -t "$DEST_DIR" {} \+
Using the inode number means any file name can be processed correctly (including one containing newlines, single or double quotes, etc) but may increase running time significantly, because we need two runs of find. If you have a large directory tree, you will probably want to refactor this for your particular scenario (maybe run only in the current directory, and create a wrapper to run it in every directory you want to examine ... thinking out loud here; not sure it helps actually).
This uses features of GNU find which are not available e.g. in *BSD (including OSX). If you are not on Linux, maybe consider installing the GNU tools.

What you can do is a shell expansion. Something like
cp $(find -type f -printf '%Ta\t%p\n' | egrep "^(za)" | cut -f2-) "$DEST_DIR"
The cut -f2- drops the weekday column, so the command substitution yields just the file paths, and this will copy all the files that match your criteria to whatever you set $DEST_DIR to.
EDIT As mentioned in the comments, this won't work if your filenames contain spaces. If that's the case, you can do something like this:
find -type f -printf '%Ta\t%p\n' | egrep "^(za)" | while IFS=$'\t' read -r day file; do cp "$file" "$DEST_DIR"; done
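A further variant lets the shell snippet passed to -exec look up the weekday itself, which avoids both the second find run and the parsing of find's output. This is only a sketch and assumes GNU date, whose -r FILE option reports a file's modification time; the function name copy_by_weekday is made up here. It compares the numeric weekday (%u, where 1 is Monday and 6 is Saturday), so it does not depend on localized names such as "za".

```shell
# copy_by_weekday SRC_DIR WEEKDAY DEST_DIR
# Copy every regular file under SRC_DIR that was last modified on the given
# ISO weekday (1 = Monday ... 7 = Sunday) into DEST_DIR.
copy_by_weekday() {
    find "$1" -type f -exec sh -c '
        day=$1 dest=$2
        shift 2
        for f do
            if [ "$(date -r "$f" +%u)" = "$day" ]; then
                cp -- "$f" "$dest"
            fi
        done' sh "$2" "$3" {} +
}
```

For the question's case that would be copy_by_weekday . 6 /home/richard/test/. The destination should live outside the searched tree, otherwise the copies can be found and copied again.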

Replace a part of statement with another in whole source code

I am trying to find the whole source code for occurrences of, say, "MY_NAME" and want to replace it with, say, "YOUR_NAME". I already know the files and the line numbers where they occur and i want to make a patch for the same so that anyone running the patch can do the same. Can anyone please help?

You can do it from the console. Use find to locate the target files, and then state what you want to replace with what. For example:
find . -type f -print0 | xargs -0 perl -pi -e 's/MY_NAME/YOUR_NAME/g'

It might be easier to do a sed command, and then generate a patch.
sed -e '12s/MY_NAME/YOUR_NAME/g;32s/MY_NAME/YOUR_NAME/g' file > file2
This will replace MY_NAME with YOUR_NAME on lines 12 and 32, and save the output into file2.
You can also generate a sed script if there are many changes:
#!/bin/sed -f
12s/MY_NAME/YOUR_NAME/g
32s/MY_NAME/YOUR_NAME/g
Then, for applying to many files, you should use find:
find -type f '(' -iname "*.c" -or -iname "*.h" ')' -exec "./script.sed" '{}' \;
Hope this helps =)

Use the command diff to create a patch-file that can then be distributed and applied with the patch-command.
man diff will give you a lot of information on the process.
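As a rough sketch of that workflow for a single file: make the replacement into a temporary copy with sed, then let diff record the difference as a unified patch. The function name make_patch and the .new/.patch suffixes are choices made for this example only.

```shell
# make_patch FILE
# Write FILE.patch, a unified diff that replaces MY_NAME with YOUR_NAME in
# FILE. FILE itself is left untouched.
make_patch() {
    sed 's/MY_NAME/YOUR_NAME/g' "$1" > "$1.new"
    # diff exits with status 1 when the files differ, which is the expected
    # case here, so that status is not treated as a failure.
    diff -u "$1" "$1.new" > "$1.patch" || [ "$?" -eq 1 ]
    rm -f "$1.new"
}
```

Whoever receives the patch can then apply it with patch FILE < FILE.patch. For a whole source tree, the usual approach is to compare an untouched copy with the modified one using diff -ru and to apply the result with patch -p1.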

Remove special characters in linux files

I have a lot of *.java and *.xml files, but someone wrote some comments and strings with Spanish characters. I have been searching the web for how to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example. How can I remove these characters from many other files in subfolders?

If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
execdir is more secure than exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for exec/execdir.
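Since sed -i overwrites the files, it may be worth keeping backups on the first run. A minimal sketch, assuming a sed that accepts a suffix attached directly to -i (GNU sed does); the function name strip_accents and the .bak suffix are arbitrary choices for this example:

```shell
# strip_accents DIR
# Delete the listed accented characters from every *.java and *.xml file
# under DIR, keeping the original of each file as <name>.bak.
strip_accents() {
    find "$1" -type f \( -name '*.java' -o -name '*.xml' \) \
        -exec sed -i.bak 's/[áíéóúñ]//g' {} +
}
```

Once the result has been checked, the backups can be removed with find DIR -name '*.bak' -delete.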

You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
                         ^^                 ^^
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.

tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
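Piping your input through tr -d áíéóúñ will probably do what you want. Be aware that GNU tr works on bytes, so in a UTF-8 file it deletes the individual bytes of these characters and can mangle other multibyte characters that share them.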

Why are you trying to remove only characters with diacritical marks? It is probably worth removing all characters with codes outside the range 0-127, so with GNU sed in the C locale the removal regexp will be s/[\x80-\xFF]//g, if you're sure that your files should not contain non-ASCII characters.
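The same idea can be expressed with tr, which sidesteps regex escape syntax altogether. This is a sketch only: the function name strip_non_ascii and the .tmp suffix are made up for the example, and it deletes every byte that is not a tab, newline, carriage return or printable ASCII character, so any non-ASCII text is lost, not just Spanish letters.

```shell
# strip_non_ascii FILE
# Rewrite FILE keeping only tab (\11), newline (\12), carriage return (\15)
# and printable ASCII (\40-\176); every other byte is deleted.
strip_non_ascii() {
    LC_ALL=C tr -cd '\11\12\15\40-\176' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

It handles one file per call, so for a tree it would be run from a loop or from find's -exec via sh -c.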
