What is file globbing? - linux

I was just wondering what is file globbing? I have never heard of it before and I couldn't find a definition when I tried looking for it online.

Globbing is the * and ? and some other pattern matchers you may be familiar with.
Globbing interprets the standard wild card characters * and ?, character lists in square brackets, and certain other special characters (such as ^ for negating the sense of a match).
When the shell sees a glob, it will perform pathname expansion and replace the glob with matching filenames when it invokes the program.
For an example of the * operator, say you want to copy all files with a .jpg extension in the current directory to somewhere else:
cp *.jpg /some/other/location
Here *.jpg is a glob pattern that matches all files ending in .jpg in the current directory. It's equivalent to (and much easier than) listing the current directory and typing in each file you want manually:
$ ls
cat.jpg dog.jpg drawing.png recipes.txt zebra.jpg
$ cp cat.jpg dog.jpg zebra.jpg /some/other/location
Note that globbing may look similar to regular expressions, but the two are not the same.
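A minimal sketch of both points, assuming a scratch directory created with mktemp that holds the five files from the listing above (the variable names are made up for illustration). It shows that the unquoted glob is replaced by file names before the command runs, and that a glob and a regular expression need different syntax to match the same name:

```shell
# Hypothetical scratch directory holding the files from the listing above.
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch cat.jpg dog.jpg drawing.png recipes.txt zebra.jpg

# Unquoted: the shell replaces the glob with the matching names
# before the command runs; "set --" stores them as $1, $2, ...
set -- *.jpg
expanded="$*"
echo "unquoted: $expanded"

# Quoted: the same characters are passed through as one literal string.
set -- '*.jpg'
literal="$*"
echo "quoted: $literal"

# Glob vs. regular expression: the same name needs different patterns.
name=cat.jpg
case $name in
  c*.jpg) glob_match=yes ;;   # glob: * alone means "any string"
  *)      glob_match=no ;;
esac
# Regex: "any string" is .* and a literal dot has to be escaped.
if printf '%s\n' "$name" | grep -Eq '^c.*\.jpg$'; then
  regex_match=yes
else
  regex_match=no
fi
echo "glob: $glob_match, regex: $regex_match"

cd / && rm -rf "$workdir"
```

The glob is matched by the shell itself, while the regular expression is handed as a plain string to grep, which does the matching.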

Related

How do I exclude a character in Linux

Write a wildcard to match all files (does not matter the files are in which directory, just ask for the wildcard) named in the following rule: starts with a string “image”, immediately followed by a one-digit number (in the range of 0-9), then a non-digit char plus anything else, and ends with either “.jpg” or “.png”. For example, image7.jpg and image0abc.png should be matched by your wildcard while image2.txt or image11.png should not.
My folder contained these files: imag2gh.jpeg image11.png image1agb.jpg image1.png image2gh.jpg image2.txt image5.png image70.jpg image7bn.jpg Screenshot .png
If my command works, it should only display image1agb.jpg image1.png image2gh.jpg image5.png image70.jpg image7bn.jpg
This is the command I used (ls -ad image[0-9][^0-9]*{.jpg,.png}), but I'm only getting image1agb.jpg image2gh.jpg image7bn.jpg, so I'm missing image1.png and image5.png. I also tried:
ls -ad image[0-9][!0-9]*{.jpg,.png}
Info
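Character ranges like [0-9] are usually seen in RegEx statements, but they are also valid in shell globs (wildcards). The pattern misses image1.png and image5.png because [!0-9] must consume exactly one character: in image1.png that character is the dot, so the remaining "png" can no longer match the alternative .png.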
Possible solution
Pipe the output of ls -a1 to the standard input of the grep command (which does support RegEx).
Use a RegEx statement to make grep filter filenames.
ls -a1|grep "image"'[[:digit:]]\+[[:alpha:]]*\.\(png\|jpg\)'
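For comparison, here is a glob-only sketch that follows the rule as stated in the question (one digit, then either the extension directly or a non-digit followed by anything), so image11.png and image70.jpg are excluded. It assumes a scratch directory with the file names from the question; the variable names are made up for illustration. The patterns are used in a case statement so that the example also runs in a plain POSIX shell:

```shell
LC_ALL=C; export LC_ALL          # fixed sort order for the glob results
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
# File names copied from the question.
touch imag2gh.jpeg image11.png image1agb.jpg image1.png image2gh.jpg \
      image2.txt image5.png image70.jpg image7bn.jpg 'Screenshot .png'

matched=""
for f in image[0-9]*; do
  case $f in
    # digit directly followed by the extension, e.g. image1.png
    image[0-9].jpg | image[0-9].png)
      matched="$matched $f" ;;
    # digit, one non-digit, anything, then the extension, e.g. image1agb.jpg
    image[0-9][!0-9]*.jpg | image[0-9][!0-9]*.png)
      matched="$matched $f" ;;
  esac
done
matched=${matched# }             # drop the leading space
echo "$matched"

cd / && rm -rf "$workdir"
```

In Bash the same alternatives can probably be written as a single brace pattern, image[0-9]{,[!0-9]*}.{jpg,png}, with the caveat that brace alternatives without a match are passed to the command literally unless nullglob is set.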

Wildcards as shell parameters

I know how regex and wildcards work in general, but I don't really understand why you can use them as parameters.
ls /[!\(][!\(][!\(]/
command results in the following output
...
com.apple.launchd.AIPZ6SAfpO
com.apple.launchd.HarlOx3LWS
com.apple.launchd.VmTi5KDz1h
powerlog
/usr/:
X11 include libexec sbin standalone
bin lib local share
/var/:
agentx empty log netboot rwho
at folders ma networkd spool
audit install mail root tmp
backups jabberd msgs rpc vm
db lib mysql run yp
From my understanding, /[!\(][!\(][!\(]/ should match every three-character folder name that does not contain an opening parenthesis.
But why can I use it as parameter?
You can't use regular expressions as parameters (or rather, the shell will not treat a string as a regular expression when placed in a parameter). The unquoted glob /[!\(][!\(][!\(]/ matches, in order:
A slash.
Three characters, none of which is an opening parenthesis.
A slash.
In other words, directories directly under / with three-character names that do not contain ( anywhere.
The shell expands globs to zero (in case of Bash's nullglob, for example) or more arguments which may be passed to execve, as in this command:
$ strace -fe execve echo *
execve("/usr/bin/echo", ["echo", "directory1", "directory2"], 0x7ffcff705ce8 /* 44 vars */) = 0
No, you don't know. Shell patterns are described in glob(3), while regular expressions (a more elaborate concept) are described in regex(3): two different libraries used for similar purposes. sh(1) doesn't use regular expressions at all when substituting parameters; it only uses glob(3)-style pattern matching.
Because that's how the shell works. Any arguments containing (unquoted) glob characters/expressions, are expanded to filenames. That's what happens in, say rm *.txt (since * is a glob character), and that's what happens in ls /[!\(][!\(][!\(]/ (since [abc] is a glob expression).
They're not regular expressions, though. See e.g. https://mywiki.wooledge.org/glob for the syntax.
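A small sketch of that behaviour, assuming a scratch directory with two entries named like those in the strace output above. count_args is a hypothetical helper that only reports how many arguments it received, which makes the expansion done by the shell visible:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
mkdir directory1 directory2

# Hypothetical helper: prints the number of arguments it was called with.
count_args() {
  echo "$#"
}

unquoted=$(count_args *)       # the shell expands * first: two arguments
quoted=$(count_args '*')       # quoting suppresses expansion: one literal argument
nomatch=$(count_args *.none)   # nothing matches: by default the pattern itself is
                               # passed on (Bash's nullglob would make this zero)

echo "unquoted=$unquoted quoted=$quoted nomatch=$nomatch"

cd / && rm -rf "$workdir"
```

The helper never sees the * character in the first call; it only receives the names the shell substituted for it.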

Find command with quotation marks results in "no such file"

In my directory there are the files:
file1.txt fix.log fixRRRRRR.log fixXXXX.log output.txt
In order to understand the find command, I tried a lot of things; among others, I wanted to use two wildcards. The goal was to find files that start with an f and have an extension starting with an l.
$ find . f*.l*
./file1.txt
./fix.log
./fixRRRRRR.log
./output.txt
./fixXXXX.log
fix.log
fixRRRRRR.log
fixXXXX.log
I read in a forum answer that quotation marks should be used with find, so I tried find . "f*.l*", with the result:
./file1.txt
./fix.log
./fixRRRRRR.log
./output.txt
./fixXXXX.log
It also reports: find: ‘f*.l*’: No such file or directory
What am I doing wrong, where is my error in reasoning?
Thanks for an answer.
find doesn't work like that. In general find's call form looks like:
find [entry1] [entry2] ... [expressions ...]
Where an entry is a starting point where find starts the search for files.
In your case, you haven't actually supplied any expressions.
In the first command (without quotes), the shell expands the wildcards to a list of matching files (in the current directory), then passes the list to find as arguments. So find . f*.l* is essentially equivalent to find . fix.log fixRRRRRR.log fixXXXX.log. As a result, find treats all of those arguments as directories/files to search (not patterns to search for). It lists all files under . (that is, everything), then all files under fix.log (it's not a directory, so that's just the file itself), then all files under fixRRRRRR.log, and finally all files under fixXXXX.log.
In the second one (with quotes) it searches for all files beneath the current directory (.) and tries the same for the file literally called "f*.l*".
What you are most likely looking for is the "-name" expression, which may be used like this:
find . -name "f*.l*"
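As a quick check, the following sketch recreates the five files from the question in a scratch directory (an assumption for illustration) and runs the -name form. The output is sorted only to make it deterministic:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch file1.txt fix.log fixRRRRRR.log fixXXXX.log output.txt

# Quoting keeps the shell from expanding the pattern, so find receives
# it intact and applies it to the name of every file it visits.
found=$(find . -name 'f*.l*' | LC_ALL=C sort)
printf '%s\n' "$found"

cd / && rm -rf "$workdir"
```

Here . is the only starting point and the quoted pattern is part of the expression, which is the call form described above.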

Find all PHP files in the current folder that contain a string

How could I show names of all PHP files in the current folder that contain the string "Form.new" in a Linux system?
I have tried grep "Form.new" .
You need to search recursively or use * instead of ., depending on whether you want to search only the files directly inside that directory or also deeper levels. So:
grep -r "Form\.new" .
or
grep "Form\.new" *
Assuming that your PHP files have a .php extension, the following will do the trick:
grep "Form\.new" *.php
Like #LaughDonor mentioned, it's good practice to escape the dot; otherwise, the dot is interpreted as "any character" by grep, so "Form.new" also matches "Form_new", "Form-new", "Form:new", "FormAnew", etc.
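Since the question asks for the names of the files rather than the matching lines, grep's -l option (print only the names of files with a match) may be the closer fit. A minimal sketch, assuming three made-up sample files in a scratch directory:

```shell
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
# Hypothetical sample files.
echo 'x = Form.new' > a.php
echo 'x = Form_new' > b.php
echo 'x = Form.new' > notes.txt

unescaped=$(grep -l 'Form.new' *.php)    # dot matches any character: a.php and b.php
escaped=$(grep -l 'Form\.new' *.php)     # literal dot: a.php only
fixed=$(grep -lF 'Form.new' *.php)       # -F treats the pattern as a fixed string

printf 'unescaped:\n%s\nescaped:\n%s\nfixed:\n%s\n' "$unescaped" "$escaped" "$fixed"

cd / && rm -rf "$workdir"
```

The *.php glob is expanded by the shell, so grep only ever opens the PHP files; notes.txt is never searched.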

Using wildcards to exclude files with a certain suffix

I am experimenting with wildcards in bash and tried to list all the files that start with "xyz" but do not end with ".TXT", but I am getting incorrect results.
Here is the command that I tried:
$ ls -l xyz*[!\.TXT]
It is not listing the files with names "xyz" and "xyzTXT" that I have in my directory. However, it lists "xyz1", "xyz123".
It seems like adding [!\.TXT] after "xyz*" made the shell look for something that starts with "xyz" and has at least one character after it.
Any ideas why this is happening and how to correct this command? I know it can be achieved using other commands, but I am especially interested in knowing why it is failing and whether it can be done using just wildcards.
These commands will do what you want
shopt -s extglob
ls -l xyz!(*.TXT)
shopt -u extglob
The reason why your command doesn't work is that xyz*[!\.TXT], which is equivalent to xyz*[!\.TX], means xyz followed by any sequence of characters (*) and finally one character that is not in the set {., T, X}. It therefore matches names such as 'xyz1' and 'xyz123', but not 'xyz' (there is no final character left to match) or 'xyzTXT' (its final character is T).
I don't think this is doable with only wildcards.
Your command isn't working because it means:
Match everything that starts with xyz, followed by anything, and that does not end with one of the characters ., T or X (the backslash only escapes the dot). The second T doesn't count, because what you put inside [] is read as a set of characters and not as a string, as you thought.
You also don't need to escape the ., since it has no special meaning in a wildcard.
At least, this is my knowledge of wildcards.
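The difference can be checked with a small sketch, assuming a scratch directory containing the file names mentioned in the question plus one name that really ends in .TXT. The first part shows what the bracket pattern matches; the second part is a portable filter (a case statement inside a loop) that excludes the .TXT suffix without relying on extglob:

```shell
LC_ALL=C; export LC_ALL          # fixed sort order for the glob results
workdir=$(mktemp -d) || exit 1
cd "$workdir" || exit 1
touch xyz xyz1 xyz123 xyzTXT xyz.TXT

# [!...] stands for exactly one character that is not in the list,
# so a name must have a last character other than ., T or X.
set -- xyz*[!.TXT]
bracket="$*"
echo "bracket pattern: $bracket"

# Portable alternative: expand xyz* and skip names ending in .TXT.
wanted=""
for f in xyz*; do
  case $f in
    *.TXT) ;;                    # suffix to exclude
    *) wanted="$wanted $f" ;;
  esac
done
wanted=${wanted# }               # drop the leading space
echo "without .TXT suffix: $wanted"

cd / && rm -rf "$workdir"
```

The loop keeps xyz and xyzTXT, which the bracket pattern drops, and skips only xyz.TXT.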
