List files with names that contain alphabetic characters and any other symbols (i.e. numbers, punctuation, etc.) and sort them by size - linux

I need help modifying a command I have already written.
This is what I was able to achieve:
find -type f -name '*[:alpha:]*' -exec ls -ltu {} \; | sort -k 5 -n -r
However, this command also finds filenames that consist solely of alphabetic characters, so I need to get rid of those too. I tried doing something like this:
find -type f -name '*[:alpha:]*' -and ! -name '[:alpha:]' -exec ls -ltu {} \; | sort -k 5 -n -r
But it does nothing. I understand that something is wrong with my -name pattern, but I have no idea how to fix it.

Character classes like [:alpha:] may only be used within bracket expressions [..], e.g. [0-9_[:alpha:]]. They may not be used alone.
[:alpha:] by itself is a bracket expression equivalent to [ahlp:] and matches any of the characters "ahlp" or a colon. It does not match alphabetic characters.
To find filenames that contain at least one alphabetic and at least one non-alphabetic character:
find dir -type f -name '*[[:alpha:]]*' -name '*[^[:alpha:]]*'
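Plugged back into the original pipeline, a sketch of the complete command (the -exec ls -ltu and the size-keyed sort are taken from the question):
# both -name tests must match, so purely alphabetic names are excluded
find dir -type f -name '*[[:alpha:]]*' -name '*[^[:alpha:]]*' -exec ls -ltu {} \; | sort -k 5 -n -r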

Related

List files that start with one or two numeric characters and a dash [duplicate]

I want to select all the files in the current folder that start with either one or two numerical characters.
e.g.:
1- filenameA
2- filenameB
....
17- filenameT
18- filenameU
I would like to know how to select only the files that start with up to 2 numerical characters.
So far I have tried
$ find . -name '[0-9]{1,2}*'
But it doesn't return anything, and I don't understand why. My reasoning when writing this command was:
[0-9] selects any string starting with a digit from 0 to 9
{1,2} and this can be repeated 1x or 2x
What am I getting wrong?
My inelegant solution so far:
run the two commands below to tackle first the [0-9] range and then the [10-99] range:
$ find . -name '[0-9]-*'
$ find . -name '[0-9][0-9]-*'
You don't need find for that.
$ touch 7-foo 42-bar 69-baz
$ printf '%s\n' [0-9]{,[0-9]}-*
7-foo
42-bar
69-baz
$ shopt -s nullglob
$ printf '%s\n' {0..99}-*
7-foo
42-bar
69-baz
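For clarity, brace expansion happens before pathname expansion, so the first pattern is just shorthand for two globs; a hypothetical session with the same three files:
$ echo [0-9]{,[0-9]}-*
7-foo 42-bar 69-baz
$ echo [0-9]-* [0-9][0-9]-*
7-foo 42-bar 69-baz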
You can use GNU find with the -regex option, like this:
find . -maxdepth 1 -type f -regex '.*/[0-9][0-9]?-.*'
Details:
-maxdepth 1: In current directory only
-type f: Match files only
-regex '.*/[0-9][0-9]?-.*': Match 1 or 2 digits at the start of the filename, followed by a -
If you don't have GNU find then you may use:
find . -maxdepth 1 -type f \( -name '[0-9][0-9]-*' -o -name '[0-9]-*' \)
In your initial code, you are trying to use a regular expression where find's -name is expecting you to use shell pattern matching notation.
Your original "INelegant solution" using -name '[0-9]*' will fail because [0-9]* matches all files starting with a digit not just those with only one digit. Your updated solution should work better and can be written as a single command:
find \( -name '[0-9]-*' -o -name '[0-9][0-9]-*' \) ...
Alternatively, with POSIX find, you could search for filenames that start with a digit but exclude any whose third character is a digit:
find . -type f -name '[0-9]*' ! -name '??[0-9]*'
Not descending into subdirectories is slightly more complicated if your find does not have the -maxdepth option:
find . ! -name . -prune -type f -name '[0-9]*' ! -name '??[0-9]*'
! -name . matches everything except the starting directory. Applying -prune to them avoids the sub-directory descent.

'find' files containing an integer in a specified range (in bash)

You'd think I could find an answer to this already somewhere, but I am struggling to do so. I want to find some log files with names like
myfile_3.log
however I only want to find the ones with numbers in a certain range. I tried things like this:
find <path> -name myfile_{0..67}.log #error: find: paths must precede expression
find <path> -name myfile_[0-67].log #only return 0-7, not 67
find <path> -name myfile_[0,67].log #only returns 0,6,7
find <path> -name myfile_*([0,67]).log # returns only 0,6,7,60,66,67,70,76,77
Any other ideas?
If you want to match an integer range using a regular expression, use the -regex option in your find command.
For example to match all files from 0 to 67, use this:
find <path> -regextype egrep -regex '.*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log'
There are 3 parts in the regex:
[0-9] matches the single digits 0-9
[0-5][0-9] matches the range 00-59
6[0-7] matches the range 60-67
Note the option -regextype egrep to get extended regular expressions.
Note also that the option -regex matches the whole filename, including the path; that is the reason for the .* at the beginning of the regex.
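The anchoring is easy to demonstrate (a sketch, reusing the myfile_N.log names from the question); it is also why the grep variant shown further down needs no leading .*, since grep matches a substring anywhere in its input line:
# matches nothing: -regex must cover the entire path, e.g. ./myfile_3.log
find . -regextype egrep -regex 'myfile_[0-9]+\.log'
# matches: the leading .*/ covers the directory part of the path
find . -regextype egrep -regex '.*/myfile_[0-9]+\.log'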
You can do this simply and concisely, but admittedly not very efficiently, with GNU Parallel:
parallel find . -name "myfile_{}.log" ::: {0..67}
In case you are wondering why I say it is not that efficient: it starts 68 parallel instances of find, each looking for a different number in the filename... but that may be ok.
The following will find all files named myfile_X.log, where the X part is a number in the range 0-67.
find <path> -type f | grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Explanation:
-type f matches regular files only.
| pipes the filepath(s) to grep for filtering.
grep -E "/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$" performs an extended (-E) regexp to find the last part of the path (i.e. the filename) which:
begins with myfile_
followed by a number in the range 0-67,
ends with .log
Edit:
Alternatively, as suggested by @ghoti in the comments, you can utilize the -regex option in the find command instead of piping to grep. For example:
find -E <path> -type f -regex ".*/myfile_([0-9]|[0-5][0-9]|6[0-7])\.log$"
Note: The regexp is very similar to the grep example shown previously. However, it begins with .*/ to match all parts of the filepath up to and including the final forward slash. For some reason, unknown to me, the .*/ part is not necessary with grep¹.
Footnotes:
¹ If any readers know why the ERE utilized with find's -regex option requires the initial .* and the same ERE with grep does not, then please leave a comment. You'll make me sleep better at night ;)
One possibility is to build up the range from several ranges that can be matched by glob patterns. For example:
find . -name 'myfile_[0-9].log' -o -name 'myfile_[1-5][0-9].log' -o -name 'myfile_6[0-7].log'
You cannot represent a general range with a regular expression, although you can craft a regex for a specific range. It is better to use find to get the files with a number and filter the output with another tool that performs the range checking, like awk.
START=0
END=67
while IFS= read -r -d '' file
do
    # extract the numeric part, e.g. ./myfile_3.log -> 3
    N=$(echo "$file" | sed 's/.*myfile_\([0-9]\{1,\}\)\.log$/\1/')
    if [ "$N" -ge "$START" ] && [ "$N" -le "$END" ]
    then
        echo "$file"
    fi
done < <(find <path> -name 'myfile_*.log' -print0)
In that script, you perform a find of all the files that have the desired pattern, then you loop through the found files and sed is used to capture the number in the filename. Finally, you compare that number with your range limits. If the comparisons succeed, the file is printed.
There are many other answers that give you a regex for the specific range in the example, but they are not general: unlike this script, they do not allow for easy modification of the range involved.

Recursive search on predefined file name in different folders

So I have this file structure:
/var/www/html/website1/app/config/database.php
/var/www/html/website2/app/config/database.php
...
What I need is to find a certain string in every database.php file. I've tried
grep -nrw "string" /var/www/html/*/app/config/*
but it doesn't seem to recognize the path.
I'm just wondering if there is a way to achieve what I'm after.
find /var -maxdepth 6 -mindepth 6 -name database.php -type f | xargs grep -nrw "string"
If you expect "newlines or or types of white space" in the filenames:
find /var -maxdepth 6 -mindepth 6 -name database.php -type f -print0 | xargs -0 grep -nrw "string"
If all the files you'll be searching through will follow the pattern of the ones in your question, then you don't need a recursive search, you can do this with globbing alone. Your * can exist (and be expanded) anywhere in the path, not just at the end.
grep -n -F -w "string" /var/www/html/*/app/config/database.php
This has the advantage of ONLY looking for the paths that you've specified as important.
Note that since you've said you're searching for a string rather than a RE, the -F option may be appropriate.
If this doesn't find files which you believe should exist, you might try something like this:
find /var/www/html/*/app -name database.php -print
to verify that you're actually looking in the right place for this file.

How can I search for files in directories that contain spaces in names, using "find"?

I use this script:
#!/bin/bash
for i in `find "/tmp/1/" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
do
    for j in `ls "$i" | grep sh | sed 's/\.txt//g'`
    do
        find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
    done
done
but files and directories that contain spaces in their names are not processed.
This will grab all the files that have spaces in them
$ ls
more space  nospace  stillnospace  this is space
$ find -type f -name "* *"
./this is space
./more space
I don't know how to achieve your goal, but given your actual solution, the problem is not really with find but with the for loops, since spaces are taken as delimiters between items.
find has a useful option for those cases:
from man find:
-print0
True; print the full file name on the standard output, followed by a null character
(instead of the newline character that -print uses). This allows file names
that contain newlines or other types of white space to be correctly interpreted
by programs that process the find output. This option corresponds to the -0
option of xargs.
As the man page says, this matches the -0 option of xargs. Several other standard tools have an equivalent option. You probably have to rewrite your complex pipeline around those tools in order to cleanly process file names containing spaces.
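For instance, a sketch in a GNU userland (both pipelines survive white space in names):
# xargs -0 consumes null-delimited input
find /tmp/1 -iname '*.txt' -print0 | xargs -0 ls -l
# GNU sort -z reads and writes null-delimited records
find /tmp/1 -iname '*.txt' -print0 | sort -z | xargs -0 printf '%s\n'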
In addition, see bash "for in" looping on null delimited string variable to learn how to use a for loop with null-terminated arguments.
Do it like this
find . -type f -name "* *"
Instead of . you can specify the path where you want to find files matching your criteria.
Your first for loop is:
for i in `find "/tmp/1" -iname "*.txt" | sed 's/[0-9A-Za-z]*\.txt//g'`
If I understand it correctly, it is looking for all text files in the /tmp/1 directory, and then attempting to remove the file name with the sed command, right? This would cause a single directory with multiple .txt files to be processed by the inner for loop more than once. Is that what you want?
Instead of using sed to get rid of the filename, you can use dirname instead. Also, later on, you use sed to get rid of the extension. You can use basename for that.
for i in `find "/tmp/1" -iname "*.txt"`; do
    path=$(dirname "$i")
    for j in `ls "$path" | grep POD`; do
        file=$(basename "$j" .txt)
        # Do whatever you want with the file
    done
done
This doesn't solve the problem of having a single directory processed multiple times, but if it is an issue for you, you can use the for loop above to store the directory names in an array instead and then remove duplicates with sort and uniq, as sketched below.
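A minimal sketch of that idea (assumes bash 4+ for mapfile and no newlines in path names; sort -u stands in for sort | uniq):
# collect each .txt file's directory once, de-duplicated
mapfile -t dirs < <(find /tmp/1 -iname '*.txt' -exec dirname {} \; | sort -u)
for path in "${dirs[@]}"; do
    echo "processing $path"
done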
Use while read loop with null-delimited pathname output from find:
#!/bin/bash
while IFS= read -rd '' i; do
    while IFS= read -rd '' j; do
        find "/tmp/2/" -iname "$j.sh" -exec echo cp '{}' "$i" \;
    done < <(exec find "$i" -maxdepth 1 -mindepth 1 -name '*POD*' -not -name '*.txt' -printf '%f\0')
done < <(exec find /tmp/1 -iname '*.txt' -not -iname '[0-9A-Za-z]*.txt' -print0)
Never use for i in $(find...) or similar, as it'll fail for file names containing white space, as you saw.
Use find ... | while IFS= read -r i instead.
It's hard to say without sample input and expected output but something like this might be what you need:
find "/tmp/1/" -iname "*.txt" |
while IFS= read -r i
do
i="${i%%[0-9A-Za-z]*\.txt}"
for j in "$i"/*sh*
do
j="${j%%\.txt}"
find "/tmp/2/" -iname "$j.sh" -exec cp {} "$i" \;
done
done
The above will still fail for file names that contain newlines. If you have that situation and can't fix the file names, then look into the -print0 option for find, and piping it to xargs -0.

Remove special characters in linux files

I have a lot of *.java and *.xml files, but a guy wrote some comments and Strings with Spanish characters. I've been searching the web for how to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example. How can I remove these characters from many other files in subfolders?
If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
-execdir is more secure than -exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup; a sketch follows this list.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for -exec/-execdir.
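For instance, a backup sketch with GNU sed (the suffix must be glued to -i):
# edits in place, keeping the original as DefaultAuthoritiesPopulator.java.bak
sed -i.bak 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java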
You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
(note the added -i flag and the {} \; at the end)
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.
tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard out‐
put.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
Piping your input through tr -d 'áíéóúñ' will probably do what you want. Note, though, that GNU tr only handles single-byte characters, so this is reliable only when the files use a single-byte encoding such as Latin-1.
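One way to wire that up with find (a sketch; assumes a single-byte encoding, and uses a temporary file because tr cannot edit in place):
find . -type f \( -name '*.java' -o -name '*.xml' \) -exec sh -c '
    for f; do
        tr -d "áíéóúñ" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
    done
' sh {} +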
Why are you trying to remove only characters with diacritic signs? It is probably worth removing all characters with codes outside the range 0-127, so the removal regexp would be s/[\x80-\xFF]//g, if you're sure that your files should not contain anything beyond ASCII.
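Applied recursively, a sketch (assumes GNU sed; LC_ALL=C forces byte-wise matching so [\x80-\xFF] covers every non-ASCII byte):
LC_ALL=C find . -type f \( -name '*.java' -o -name '*.xml' \) -execdir sed -i 's/[\x80-\xFF]//g' '{}' ';'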
