Finding files of fixed length - linux

I am trying to find file names that start with the letter 'a' and are of length 6. I have tried many variations, the latest one being:
find /usr/bin -type f -regex "^[a]" > grep {6}
However I get an error message of:
find: paths must precede expression: {6}
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
What am I doing wrong?

Without any regexes, just globbing:
find /usr/bin -type f -name 'a?????'
References:
Findutils manual: Shell pattern matching
Bash manual, Filename expansion and pattern matching
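A minimal sketch of how each ? counts exactly one character, run against a throwaway directory rather than /usr/bin. The directory and file names are made up for illustration:

```shell
# Scratch directory with hypothetical file names; nothing here touches /usr/bin.
demo_dir=$(mktemp -d)
touch "$demo_dir/abcdef" "$demo_dir/abc" "$demo_dir/abcdefg" "$demo_dir/bcdefg"

# 'a?????' = a literal a followed by exactly five more characters.
# The single quotes stop the shell from expanding the pattern before find sees it.
six_char_matches=$(find "$demo_dir" -type f -name 'a?????')
echo "$six_char_matches"

rm -r "$demo_dir"
```

Only abcdef should be listed: abc and abcdefg have the wrong length, and bcdefg does not start with a.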

I would use the following command, which uses POSIX extended regular expressions:
find /usr/bin -type f -regextype posix-extended -regex '.*/a.{5}'
Let me explain the pattern from the end:
.{5} matches five arbitrary characters
a matches a literal a
the / matches the path delimiter right before the filename
.* is the path, in this case /usr/bin
Btw, a simple command which does not even require a special regex engine would be:
find /usr/bin -type f -regex '.*/a.....'
no $ is needed at the end: -regex has to match the whole path, so the pattern is implicitly anchored at both ends
..... are five arbitrary characters
a is a literal a
.*/ is the preceding path
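The following sketch illustrates why the leading .*/ matters: find compares a -regex pattern with the entire path, not just the file name. The scratch directory and file names are assumptions made for the demonstration:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/abcdef" "$demo_dir/abcdefg"

# Without a leading .*/ the pattern is compared with the full path
# (which starts with the directory), so nothing matches.
name_only_hits=$(find "$demo_dir" -type f -regex 'a.....')

# With .*/ the directory part is consumed and a..... has to cover
# the whole base name, so only the six-character name matches.
full_path_hits=$(find "$demo_dir" -type f -regex '.*/a.....')

echo "name only: [$name_only_hits]"
echo "full path: [$full_path_hits]"

rm -r "$demo_dir"
```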
Another thing. While your regex is wrong and grep is not required at all, why do you get this strange error message?
You are using find ... > grep where I think you wanted to use find ... | grep. Note that > will redirect the output of the find command to a file. In this case a file named grep. If you want to redirect the output of the find command into the input of a grep command you need to use the pipe symbol find ... | grep.
A > filename redirection can appear anywhere in a command line; it does not have to be at the end. That is why {6} is interpreted as the last argument to find. Since this argument is not expected there, find assumes that you accidentally passed a search path at the end, which is a common mistake. Hence the message.
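To see that a redirection in the middle of a command line is removed by the shell and the remaining words are still passed as arguments, here is a small sketch with echo. The temporary file is only there for the demonstration:

```shell
out_file=$(mktemp)

# The redirection sits between the two words. The shell strips it out,
# and echo still receives "one" and "two" as its arguments.
echo one > "$out_file" two

redirect_result=$(cat "$out_file")
echo "$redirect_result"   # prints: one two

rm -f "$out_file"
```

In the question, find plays the role of echo, the file named grep plays the role of the temporary file, and {6} is the stray trailing argument.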

Related

How to delete files that match a specific pattern in Linux?

I have a set of images like these
12345-image-1-medium.jpg 12345-image-2-medium.png 12345-image-3-large.jpg
What pattern should I write to select these images and delete them?
I also have these images, which I do not want to select:
12345-image-profile-small.jpg 12345-image-profile-medium.jpg 12345-image-profile-large.png
I have tried this regex, but it did not work:
1234-image-[0-9]+-small.*
I think bash does not support regexes the way JavaScript, Go, Python or Java do.
for pic in 12345-image-[0-9]*.{jpg,png}; do rm -- "$pic"; done
for more information on wildcards take a look here
So long as you do NOT have filenames with embedded '\n' character, then the following find and grep will do:
find . -type f | grep '^.*/[[:digit:]]\{1,5\}-image-[[:digit:]]\{1,5\}'
It will find all files below the current directory and match (1 to 5 digits) followed by "-image-" followed by another (1 to 5 digits). In your case with the following files:
$ ls -1
123-image-99999-small.jpg
12345-image-1-medium.jpg
12345-image-2-medium.png
12345-image-3-large.jpg
12345-image-profile-large.png
12345-image-profile-medium.jpg
12345-image-profile-small.jpg
The files you request are matched in addition to 123-image-99999-small.jpg, e.g.
$ find . -type f | grep '^.*/[[:digit:]]\{1,5\}-image-[[:digit:]]\{1,5\}'
./123-image-99999-small.jpg
./12345-image-3-large.jpg
./12345-image-2-medium.png
./12345-image-1-medium.jpg
You can use the above in a command substitution to remove the files, e.g.
$ rm $(find . -type f | grep '^.*/[[:digit:]]\{1,5\}-image-[[:digit:]]\{1,5\}')
The remaining files are:
$ ls -1
12345-image-profile-large.png
12345-image-profile-medium.jpg
12345-image-profile-small.jpg
If Your find Supports -regextype
If your find supports the -regextype option, which lets you choose the regular expression syntax, you can use -regextype grep for grep syntax and remove the files with a pattern similar to the one above and the -execdir action, e.g.
$ find . -type f -regextype grep -regex '^.*/[[:digit:]]\+-image-[[:digit:]]\+.*$' -execdir rm '{}' +
I do not know whether this is supported by BSD, Solaris, etc., so check before turning it loose in a script. Also note that [[:digit:]]\+ tests for one or more digits and is not limited to 5 digits as shown in your question.
OK, I solved it with this pattern:
12345-image-*[0-9]-*
e.g.:
rm -rf 12345-image-*[0-9]-*
It matches every file name that starts with 12345-image-, followed by any characters ending in a digit, then a - symbol, and anything after that.
As I found out, this is globbing in bash, not regex.
I also found this app really useful.
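Before running rm with a glob, it can help to preview what the pattern expands to. This is a sketch in a scratch directory, using the file names from the question; the printf dry run is a suggestion, not part of the original answers:

```shell
demo_dir=$(mktemp -d)
touch "$demo_dir/12345-image-1-medium.jpg" "$demo_dir/12345-image-2-medium.png" \
      "$demo_dir/12345-image-3-large.jpg" "$demo_dir/12345-image-profile-small.jpg" \
      "$demo_dir/12345-image-profile-medium.jpg" "$demo_dir/12345-image-profile-large.png"

# Dry run: printf prints one line per path the glob expands to.
printf '%s\n' "$demo_dir"/12345-image-*[0-9]-*

# Same pattern, now for real. -- guards against names that start with a dash.
rm -- "$demo_dir"/12345-image-*[0-9]-*

remaining_files=$(ls "$demo_dir")
echo "$remaining_files"

rm -r "$demo_dir"
```

The three profile images should be the only files left, because none of them has a digit directly before a - after the 12345-image- prefix.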

BASH grep pattern filename in end of line

I just started with bash and I am writing a search script that searches for files and greps for a pattern. Simple idea:
find $HOME -type f | grep $1
The current script matches everything that contains $1 (files and directories). I only want to match the pattern in the file name; I don't want to match the directories in the path. I tried lots of advanced expressions with symbols like "/.*^$" etc. to grep in a specific part, but honestly this is a bit hard for a new user.
The cut tool is not an option because I want the path of the file.
EDIT:
Correct Example:
$ ./search test
/home/user/documents/test.txt
/home/user/downloads/blahtestblah.py
Incorrect example:
$ ./search test
/home/user/test/whatever.txt
In the incorrect example grep matched the keyword in the path, matching a directory.
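$ anchors the match at the end of the line, and [^/]* matches any run of characters that contains no slash, so the pattern below only matches when $1 appears after the last /. Try:
find "$HOME" -type f | grep "$1[^/]*$"
You could also use the -regex switch to find instead of piping the output to grep. Note that -regex has to match the whole path:
find "$HOME" -type f -regex ".*/[^/]*$1[^/]*"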
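If a shell pattern is enough and a full regex is not needed, -name is another option, since it only ever looks at the last path component. A minimal sketch follows; search_by_name is a hypothetical helper name, and the directory layout is made up to mirror the examples in the question:

```shell
# Hypothetical helper: list regular files under a directory whose base name
# contains the given text.
search_by_name() {
    # $1 = directory to search, $2 = text to look for
    # (glob characters inside $2 are treated as wildcards by find)
    find "$1" -type f -name "*$2*"
}

demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/documents" "$demo_dir/downloads" "$demo_dir/test"
touch "$demo_dir/documents/test.txt" "$demo_dir/downloads/blahtestblah.py" \
      "$demo_dir/test/whatever.txt"

search_hits=$(search_by_name "$demo_dir" test | sort)
echo "$search_hits"

rm -r "$demo_dir"
```

test/whatever.txt is not listed, because the keyword only occurs in a directory name.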

Difference between these two commands (w & w/out "") and why?

In linux, I have a file named test2 in my directory which I created using the touch command.
When I run the command
find . -name "*test*" -ls
It doesn't give me an error, but when I run
find . -name *test* -ls
It gave me an error
find: paths must precede expression: test2
Usage: find [-H] [-L] [-P] [-Olevel] [-D help|tree|search|stat|rates|opt|exec] [path...] [expression]
Why is this?
*test* gets glob expanded by your shell (into more than one token).
Whereas no glob expansion happens in "*test*" because the surrounding " symbols prevent globbing.
Your shell is intercepting *test* and looking for files and directories in the current directory that match that expression, before it passes the expanded list to find. find expects only a single string in that spot, whereas the expanded list may be 0 or many strings.
With quotes, the shell ignores the asterisks and passes the raw string *test* to find, which then uses those asterisks as wildcards as you'd expect.
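A quick way to see the difference is to count how many arguments a command actually receives. The sketch below uses a hypothetical count_args helper and a scratch directory containing two matching files:

```shell
# Hypothetical helper that reports how many arguments it was given.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/test2" "$demo_dir/mytest"

# Each command substitution runs in a subshell, so the cd does not leak out.
unquoted_count=$(cd "$demo_dir" && count_args *test*)     # shell expands the glob first
quoted_count=$(cd "$demo_dir" && count_args "*test*")     # pattern passed through as one string

echo "unquoted: $unquoted_count, quoted: $quoted_count"   # prints: unquoted: 2, quoted: 1

rm -r "$demo_dir"
```

With the unquoted form, find sees the second file name as an unexpected extra argument after -name, which is what triggers the "paths must precede expression" error.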

Linux find command shell expansion

I have just a little question I don't understand with the find command.
I can do this :
[root#hostnaoem# ❯❯❯ls /proc/*/fd
But this gives me an error:
[root#hostnaoem# ❯❯❯ find /proc/*/fd -ls
find: `/proc/*/fd': No such file or directory
even if I use "/proc//fd", /proc/""/fd or "/proc/*/fd"
I've searched for what the find documentation says about shell expansion, but I found nothing. Can someone tell me why?
Thanks
If you just RTFM, you'll learn that the syntax for find is:
find [-H] [-L] [-P] [-D debugopts] [-Olevel] [path...] [expression]
The usually used subset of that is:
find whereToSearch (-howToSearch arg)*
To find all files|directories named fd in /proc:
find /proc -name fd
-name is the most common howToSearch expression:
-name pattern
Base of file name (the path with the leading directories
removed) matches shell pattern pattern. The metacharacters
(`*', `?', and `[]') match a `.' at the start of the base name
(this is a change in findutils-4.2.2; see section STANDARDS
CONFORMANCE below). To ignore a directory and the files under it,
use -prune; see an example in the description of -path. Braces
are not recognised as being special, despite the fact that some
shells including Bash imbue braces with a special meaning in
shell patterns. The filename matching is performed with the use
of the fnmatch(3) library function. Don't forget to enclose
the pattern in quotes in order to protect it from expansion by
the shell.
(Note the last sentence.)
If your pattern contains slashes, you need -path or -wholename (same thing):
find /proc/ -wholename '/proc/[0-9]*/fd' 2>/dev/null
Other expressions you might want to use are:
-type
-depth, -mindepth, -maxdepth
-user, -uid
See find(1) to learn more about each search expression. If you want to search the in-terminal manual (man find or man 1 find), you can use the / character to enter search mode (like Ctrl+F in most GUI apps).
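A sketch of the -path form against a scratch tree that loosely imitates the /proc layout. The directory names 101, 202 and self are invented for the example, so that it does not depend on which processes happen to be running:

```shell
demo_root=$(mktemp -d)
mkdir -p "$demo_root/101/fd" "$demo_root/202/fd" "$demo_root/self/fd" "$demo_root/101/task"

# -path compares the pattern with the whole path, slashes included.
# The double quotes let $demo_root expand but keep [0-9]* away from the shell.
numeric_fd_dirs=$(find "$demo_root" -type d -path "$demo_root/[0-9]*/fd" | sort)
echo "$numeric_fd_dirs"

rm -r "$demo_root"
```

Only the fd directories below the numeric entries are listed; self/fd is skipped because s is not a digit.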
Usage of ls with globbing (*) is generally a code smell. Unless you use the -d flag, it'll list the contents of the directories that match the glob pattern in addition to the matches.
I find the echo globpattern form generally more convenient for viewing the results of a glob pattern match.
This works (note the quotes, which keep the shell from expanding the pattern before find sees it):
[root#hostname # ❯❯❯ find /proc/ -path '/proc/*/fd' -ls
Regards.

Remove special characters in linux files

I have a lot of *.java and *.xml files, but someone wrote some comments and strings with Spanish characters. I have been searching the web for a way to remove them.
I tried find . -type f -exec sed 's/[áíéóúñ]//g' DefaultAuthoritiesPopulator.java just as an example, how can i remove these characters from many other files in subfolders?
If that's what you really want, you can use find, almost as you are using it.
find -type f \( -iname '*.java' -or -iname '*.xml' \) -execdir sed -i 's/[áíéóúñ]//g' '{}' ';'
The differences:
The path . is implicit if no path is supplied.
This command only operates on *.java and *.xml files.
execdir is more secure than exec (read the man page).
-i tells sed to modify the file argument in place. Read the man page to see how to use it to make a backup.
{} represents a path argument which find will substitute in.
The ; is part of the find syntax for exec/execdir.
You're almost there :)
find . -type f -exec sed -i 's/[áíéóúñ]//g' {} \;
                         ^^                 ^^
From sed(1):
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if extension supplied)
From find(1):
-exec command ;
Execute command; true if 0 status is returned. All
following arguments to find are taken to be arguments to
the command until an argument consisting of `;' is
encountered. The string `{}' is replaced by the current
file name being processed everywhere it occurs in the
arguments to the command, not just in arguments where it
is alone, as in some versions of find. Both of these
constructions might need to be escaped (with a `\') or
quoted to protect them from expansion by the shell. See
the EXAMPLES section for examples of the use of the -exec
option. The specified command is run once for each
matched file. The command is executed in the starting
directory. There are unavoidable security problems
surrounding use of the -exec action; you should use the
-execdir option instead.
tr is the tool for the job:
NAME
tr - translate or delete characters
SYNOPSIS
tr [OPTION]... SET1 [SET2]
DESCRIPTION
Translate, squeeze, and/or delete characters from standard input, writing to standard output.
-c, -C, --complement
use the complement of SET1
-d, --delete
delete characters in SET1, do not translate
-s, --squeeze-repeats
replace each input sequence of a repeated character that is listed in SET1 with a
single occurrence of that character
Piping your input through tr -d 'áíéóúñ' will probably do what you want.
Why are you trying to remove only characters with diacritics? It is probably worth removing every byte outside the range 0-127 if you are sure your files should contain nothing but plain ASCII. With GNU sed the substitution would be s/[\x80-\xFF]//g, run with LC_ALL=C so that the range is interpreted as single bytes.
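The same idea can be sketched with tr instead of sed, assuming the files are UTF-8 encoded: every byte of a multi-byte UTF-8 character is 0x80 or higher, so deleting that byte range in the C locale removes the accented characters whole. The sample text is made up:

```shell
# "a\303\261o" is "año" written with octal escapes for its UTF-8 bytes, so the
# example does not depend on the encoding of the script itself.
# LC_ALL=C makes tr treat its input as plain bytes; \200-\377 is 0x80-0xFF.
ascii_only=$(printf 'a\303\261o ma\303\261ana\n' | LC_ALL=C tr -d '\200-\377')
echo "$ascii_only"   # prints: ao maana
```

Since tr only reads standard input, applying this to many files means writing to a temporary file and moving it over the original, whereas sed -i edits the file directly.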
