Listing entries in a directory using grep - linux

I'm trying to list all entries in a directory whose names contain ONLY upper-case letters. Directories need "/" appended.
#!/bin/bash
cd ~/testfiles/
ls | grep -r *.*
Since grep by default looks for upper-case letters only (right?), I'm just recursively searching through the directories under testfiles for all names that contain only upper-case letters.
Unfortunately this doesn't work.
As for appending directories, I'm not sure why I need to do this. Does anyone know where I can start with some detailed explanations on what I can do with grep? Furthermore how to tackle my problem?

No, grep does not only consider uppercase letters.
Your question is a bit unclear, for example:
from your usage of the -r option, it seems you want to search recursively, however you don't say so. For simplicity I assume you don't need to; consider looking into @twm's answer if you need recursion.
you want to look for uppercase (letters) only. Does that mean you don't want to accept any other (non-letter) characters that are still valid in file names (like digits, dashes, dots, etc.)?
since you don't say that it is not permissible to have only one file per line, I am assuming it is OK (thus using ls -1).
The naive solution would be:
ls -1 | grep "^[[:upper:]]\+$"
That is, print all lines containing only uppercase letters. In my TEMP directory that prints, for example:
ALLBIG
LCFEM
WPDNSE
This however would exclude files like README.TXT or FILE001, which depending on your requirements (see above) should most likely be included.
Thus, a better solution would be:
ls -1 | grep -v "[[:lower:]]\+"
That is, print all lines not containing a lowercase letter. In my TEMP directory that prints, for example:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM
WPDNSE
~DFA0214428CD719AF6.TMP
Finally, to "properly mark" directories with a trailing '/', you could use the -F (or --classify) option.
ls -1F | grep -v "[[:lower:]]\+"
Again, example output:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM/
WPDNSE/
~DFA0214428CD719AF6.TMP
Note that a different option would be to use find (e.g. find ! -regex ".*[a-z].*"), if you can live with its different output format.
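If parsing ls output is a concern, the same filter can be sketched with a plain shell glob. This is only a sketch: the function name list_no_lower and the demo files are made up for illustration, and like the ls variants above it skips hidden entries.

```shell
#!/bin/bash
# Demo directory with made-up names (an assumption, not from the question).
demo=$(mktemp -d)
mkdir "$demo/LCFEM" "$demo/lower"
touch "$demo/ALLBIG" "$demo/ALLBIG-01.TXT" "$demo/readme.txt"

# Print every entry of directory $1 whose name has no lowercase letter,
# appending "/" to directories.
list_no_lower() {
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue      # an unmatched glob stays literal
        name=${entry##*/}                # strip the leading path
        case $name in
            *[[:lower:]]*) continue ;;   # skip names with a lowercase letter
        esac
        if [ -d "$entry" ]; then
            printf '%s/\n' "$name"
        else
            printf '%s\n' "$name"
        fi
    done
}

result=$(list_no_lower "$demo")
printf '%s\n' "$result"
rm -rf "$demo"
```

With the demo files above this lists ALLBIG, ALLBIG-01.TXT and LCFEM/, and leaves out lower and readme.txt.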

The exact regular expression depends on the output format of your ls command. Assuming that you do not use an alias for ls, you can try this:
ls -R | grep -o -w "[A-Z]*"
Note that with -R, ls will recursively list directories and files under the current directory. The grep option -o tells grep to print only the matched part of the text. The -w option tells grep to match whole words only. The "[A-Z]*" is a regexp that matches only upper-case words.
Note that this regexp will also match parts of TEST.txt as well as TEXT.TXT: because of -o, only the upper-case words themselves (TEST, TEXT, TXT) are printed, not the full names. In other words, it only considers the parts of names that are formed by upper-case letters.

It's ls which lists the files, not grep, so that is where you need to specify that you want "/" appended to directories. Use ls --classify to append "/" to directories.
grep is used to process the results from ls (or some other source, generally speaking) and only show lines that match the pattern you specify. It is not limited to uppercase characters. You can limit it to just upper case characters and "/" with grep -E '^[A-Z/]*$' or, if you also want numbers, periods, etc., you could instead filter out lines that contain lowercase characters with grep -v -E '[a-z]'.
As grep is not the program which lists the files, it is not where you want to perform the recursion. ls can list paths recursively if you use ls -R. However, you're just going to get the last component of the file paths that way.
You might want to consider using find to handle the recursion. This works for me:
find . -exec ls -d --classify {} \; | egrep -v '[a-z][^/]*/?$'
I should note, using ls --classify to append "/" to the end of directories may also append some other characters to other types of paths that it can classify. For instance, it may append "*" to the end of executable files. If that's not OK, but you're OK with listing directories and other paths separately, this could be worked around by running find twice - once for the directories and then again for other paths. This works for me:
find . -type d | egrep -v '[a-z][^/]*$' | sed -e 's#$#/#'
find . -not -type d | egrep -v '[a-z][^/]*$'

Related

Finding multiple strings in directory using linux commands

If I have two strings, for example "class" and "btn", what is the linux command that would allow me to search for these two strings in the entire directory?
To be more specific, let's say I have a directory that contains a few folders with a bunch of .php files. My goal is to be able to search throughout those .php files so that it prints out only files that contain "class" and "btn" in one line. Hopefully this clarifies things better.
Thanks,
I normally use the following to search for strings inside my source code. It searches for the string and shows the exact line number where that text appears, which is very helpful when searching source code files. You can always pipe the output to another grep to filter it further.
grep -rn "text_to_search" directory_name/
example:
$ grep -rn "angular" menuapp
$ grep -rn "angular" menuapp | grep some_other_string
output would be:
menuapp/public/javascripts/angular.min.js:251://# sourceMappingURL=angular.min.js.map
menuapp/public/javascripts/app.js:1:var app = angular.module("menuApp", []);
grep -rE 'class|btn' /path/to/directory
grep is used to search for a string in a file. With the -r flag, it recursively searches all files in a directory. The -E flag enables the | alternation, so this matches lines containing either string.
Or, alternatively using the find command to "identify" the files to be searched instead of using grep in recursive mode:
find /path/to/your/directory -type f -exec grep "text_to_search" {} +
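The question asks for lines that contain both strings, whereas an alternation such as 'class|btn' matches lines containing either one. A possible sketch for the "both on one line" case, assuming a grep that supports --include (GNU and BSD grep do) and a made-up directory layout:

```shell
#!/bin/bash
# Made-up demo tree (an assumption for illustration only).
demo=$(mktemp -d)
mkdir "$demo/views"
printf '<a class="btn btn-primary">Save</a>\n' > "$demo/views/form.php"
printf '<div class="header">\n<span id="btn"></span>\n' > "$demo/views/page.php"
printf 'class and btn on one line, but not a PHP file\n' > "$demo/notes.txt"

# Match either order on the same line; -l prints only the names of
# matching files, --include restricts the search to .php files.
both=$(grep -rlE --include='*.php' 'class.*btn|btn.*class' "$demo")
printf '%s\n' "$both"
rm -rf "$demo"
```

Only views/form.php is reported here: page.php has the two strings on different lines, and notes.txt is not a .php file. Replacing -l with -n prints the matching lines with their line numbers instead.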

Using grep to identify a pattern

I have several documents hosted on a cloud instance. I want to extract all words conforming to a specific pattern into a .txt file. This is the pattern:
ABC123A
ABC123B
ABC765A
and so on. Essentially the words start with a specific character string 'ABC', have a fixed number of numerals, and end with a letter. This is my code:
grep -oh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
When I execute the query, it runs for several hours without generating any output. I have over 1100 documents. However, when I run this query:
grep -r ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
the list of files with the strings is generated in a matter of seconds.
What do I need to correct in my query? Also, what is causing the delay?
UPDATE 1
Based on the answers, it's evident that the command is missing the file name on which it needs to be executed. I want to run the code on multiple document files (>1000)
The documents I want searched are in multiple sub-directories within a directory. What is a good way to search through them? Doing
grep -roh ABC[0-9].*[a-zA-Z]$ > /home/user/abcLetterMatches.txt
only returns the file names.
UPDATE 2
If I use the updated code from the answer below:
find . -exec grep -oh "ABC[0-9].*[a-zA-Z]$" >> ~/abcLetterMatches.txt {} \;
I get a no file or directory error
UPDATE 3
The pattern can be anywhere in the line.
You can use this regexp :
~/ grep -E "^ABC[0-9]{3}[A-Z]$" docs > filename
ABC123A
ABC123B
ABC765A
There is no delay; grep is just waiting for the input you didn't give it (without a file argument it reads from standard input by default). You can correct your command by supplying a filename argument:
grep -oh "ABC[0-9].*[a-zA-Z]$" file.txt > /home/user/abcLetterMatches.txt
Source (man grep):
SYNOPSIS
grep [OPTIONS] PATTERN [FILE...]
To perform the same grepping on several files recursively, combine it with find command:
find . -type f -exec grep -oh "ABC[0-9].*[a-zA-Z]$" {} \; >> ~/abcLetterMatches.txt
This does what you ask for:
grep -hr '^ABC[0-9]\{3\}[A-Za-z]$'
-h to not get the filenames.
-r to search recursively. If no directory is given (as above) the current one is used. Otherwise just specify one as the last argument.
Quotes around the pattern to avoid accidental globbing, etc.
^ at the beginning of the pattern to — together with $ at the end — only match whole lines. (Not sure if this was a requirement, but the sample data suggests it.)
\{3\} to specify that there should be three digits.
No .* as that would match a whole lot of other things.
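Since the pattern may appear anywhere in a line (UPDATE 3), the ^ and $ anchors would have to go in that case. A sketch under that assumption, with made-up sample documents; -o prints only the matched text and -E allows {3} without backslashes:

```shell
#!/bin/bash
# Made-up sample documents (an assumption for illustration only).
docs=$(mktemp -d)
mkdir "$docs/reports"
printf 'order ABC123A shipped\nnothing here\n' > "$docs/one.txt"
printf 'see ABC765A and ABC123B for details\n' > "$docs/reports/two.txt"

# -r: recurse, -h: no file names, -o: only the matched part, -E: extended regexp.
# sort makes the order independent of the directory traversal order.
matches=$(grep -rhoE 'ABC[0-9]{3}[A-Za-z]' "$docs" | sort)
printf '%s\n' "$matches"
rm -rf "$docs"
```

Redirecting the grep output with > /home/user/abcLetterMatches.txt instead of capturing it in a variable would write the matches to the file from the question.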

grep recursively for a specific file type on Linux

Can we search for a term (e.g. "onblur") recursively in some folders, but only in specific files (html files)?
grep -Rin "onblur" *.html
This returns nothing. But,
grep -Rin "onblur" .
returns "onblur" search result from all available files, like in text(".txt"), .mako, .jinja etc.
Consider checking this answer and that one.
Also this might help you: grep certain file types recursively | commandlinefu.com.
The command is:
grep -r --include="*.[ch]" pattern .
And in your case it is:
grep -r --include="*.html" "onblur" .
grep -r --include "*.html" onblur .
Got it from :
How do I grep recursively?
You might also like ag 'the silver searcher' -
ag --html onblur
it searches by regexp and is recursive in the current directory by default, and has predefined sets of extensions to search - in this case --html maps to .htm, .html, .shtml, .xhtml. Also ignores binary files, prints filenames, line numbers, and colorizes output by default.
Some options -
-Q --literal
    Do not parse PATTERN as a regular expression. Try to match it literally.
-S --smart-case
    Match case-sensitively if there are any uppercase letters in PATTERN,
    case-insensitively otherwise. Enabled by default.
-t --all-text
    Search all text files. This doesn't include hidden files.
--hidden
    Search hidden files. This option obeys ignored files.
For the list of supported filetypes run ag --list-file-types.
The only thing it seems to lack is being able to specify a filetype with an extension, in which case you need to fall back on grep with --include.
To be able to grep only from .py files by typing grepy mystring I added the following line to my bashrc:
alias grepy='grep -r --include="*.py"'
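Also note that grep accepts the following:
grep mystring *.html
for an .html search in the current folder
grep mystring */*.html
for a search one directory level down (excluding any file in the current dir!).
grep mystring .*/*/*.html
for a search that starts from directories matching .* and goes one further level down. Each * matches a single path component, so neither glob is truly recursive; both only reach a fixed depth.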
Have a look at this answer instead, to a similar question: grep, but only certain file extensions
This worked for me. In your case just type the following:
grep -inr "onblur" --include \*.html ./
consider that
grep: command
-r: recursively
-i: ignore-case
-n: each output line is preceded by its relative line number in the file
--include \*.html: escape with \ so that the shell passes *.html to grep unexpanded instead of expanding it against .html files in the current directory
./: start at current directory.
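With grep -Rin "onblur" *.html the shell expands *.html against the current directory only, so .html files in subdirectories are never handed to grep. A small sketch of the --include form with a made-up directory tree, assuming a grep that supports --include (GNU and BSD grep do):

```shell
#!/bin/bash
# Made-up demo tree (an assumption for illustration only).
demo=$(mktemp -d)
mkdir -p "$demo/site/pages"
printf '<input onBlur="check()">\n' > "$demo/site/pages/form.html"
printf 'onblur is mentioned here too\n' > "$demo/site/notes.txt"

# The quoted pattern reaches grep unexpanded and is applied to every
# file name found while recursing; -l prints matching file names only.
found=$(cd "$demo" && grep -rli --include='*.html' 'onblur' .)
printf '%s\n' "$found"
rm -rf "$demo"
```

Only ./site/pages/form.html is reported, even though notes.txt also contains the term.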

How to tell how many files match description with * in unix

Pretty simple question: say I have a set of files:
a1.txt
a2.txt
a3.txt
b1.txt
And I use the following command:
ls a*.txt
It will return:
a1.txt a2.txt a3.txt
Is there a way in a bash script to tell how many results will be returned when using the * pattern? In the above example if I were to use a*.txt the answer should be 3 and if I used *1.txt the answer should be 2.
Comment on using ls:
I see all the other answers attempt this by parsing the output of
ls. This is very unpredictable because this breaks when you have
file names with "unusual characters" (e.g. spaces).
Another pitfall would be, it is ls implementation dependent. A
particular implementation might format output differently.
There is a very nice discussion on the pitfalls of parsing ls output on the bash wiki maintained by Greg Wooledge.
Solution using bash arrays
For the above reasons, using bash syntax would be the more reliable option. You can use a glob to populate a bash array with all the matching file names. Then you can ask bash the length of the array to get the number of matches. The following snippet should work.
files=(a*.txt) && echo "${#files[@]}"
To save the number of matches in a variable, you can do:
files=(a*.txt)
count="${#files[@]}"
One more advantage of this method is you now also have the matching files in an array which you can iterate over.
Note: Although I keep repeating bash syntax above, the same should work in other shells that support arrays, such as ksh and zsh; arrays are not part of POSIX sh.
You can't know ahead of time, but you can count how many results are returned. I.e.
ls -l *.txt | wc -l
ls -l will display the directory entries matching the specified wildcard, wc -l will give you the count.
You can save the value of this command in a shell variable with either
num=$(ls -l *.txt | wc -l)
or
num=`ls -l *.txt | wc -l`
and then use $num to access it. The first form is preferred.
You can use ls in combination with wc:
ls a*.txt | wc -l
The ls command lists the matching files one per line, and wc -l counts the number of lines.
I like suvayu's answer, but there's no need to use an array:
count() { echo $#; }
count *
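One caveat with counting the glob's expansion directly: when nothing matches, the shell leaves the pattern as a literal word, so the count comes out as 1. A sketch that guards against this, using the file names from the question in a temporary directory; count_matches is a hypothetical helper name:

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/a1.txt" "$demo/a2.txt" "$demo/a3.txt" "$demo/b1.txt"

# Receives the expanded glob as arguments. If the first argument does not
# exist, the pattern matched nothing and was passed through literally.
count_matches() {
    [ -e "$1" ] || [ -L "$1" ] || { echo 0; return; }
    echo "$#"
}

count_a=$(count_matches "$demo"/a*.txt)
count_1=$(count_matches "$demo"/*1.txt)
count_z=$(count_matches "$demo"/z*.txt)
echo "$count_a $count_1 $count_z"    # prints: 3 2 0
rm -rf "$demo"
```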
In order to count files that might have unpredictable names, e.g. containing new-lines, non-printable characters etc., I would use the -print0 option of find and awk with RS='\0':
num=$(find . -maxdepth 1 -print0 | awk -v RS='\0' 'END { print NR }')
Adjust the options to find to refine the count, e.g. if the criteria is files starting with a lower-case a with .txt extension in the current directory, use:
find . -maxdepth 1 -type f -name 'a*.txt' -print0
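Handling of a NUL record separator may not be portable across awk implementations, so a variant is to count the NUL bytes themselves. A sketch, assuming a find with -maxdepth and -print0 (GNU and BSD find have both) and made-up file names:

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/a1.txt" "$demo/b1.txt"
touch "$demo/$(printf 'a\nodd.txt')"     # a name containing a newline

# Each matching path is followed by exactly one NUL byte; tr keeps only
# those bytes and wc counts them.
num=$(find "$demo" -maxdepth 1 -type f -name 'a*.txt' -print0 | tr -cd '\0' | wc -c)
num=$((num))                             # strip any padding wc may add
echo "$num"
rm -rf "$demo"
```

The file with the embedded newline is counted once, which a line-based count with wc -l would get wrong.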

Looking for tool to search text in files on command line

Hello
I'm looking for a script or program that searches files (e.g. php, html, etc.) for keywords or a pattern and shows where the matching files are.
I use the command cat /home/* | grep "keyword"
but I have too many folders and files, and this command causes a high system load :/
I need this script to find fake websites (paypal, ebay, etc)
find /home -exec grep -s "keyword" {} \; -print
You don't really say what OS (and shell) you are using. You might want to retag your question to help us out.
Because you mention cat | ... , I am assuming you are using a Unix/Linux variant, so here are some pointers for looking at files. (bmargulies' solution is good too.)
I'm looking for a script or program that searches files for keywords or a pattern
grep is the basic program for searching files for text strings. Its usage is
grep [-options] 'search target' file1 file2 .... filen
(Note that 'search target' contains a space. If you don't surround a search target containing spaces with double or single quotes, you will have a minor error to debug.)
(Also note that 'search target' can use a wide range of wild-card characters, like ., ?, +, *, and many more, but that is beyond the scope of your question.) ... anyway ...
As I guess you have discovered, you can only cram so many files at a time into the command line, even when using wild-card filename expansion. Unix/Linux almost always has a utility that can help with that:
startDir=/home
find ${startDir} -print | xargs grep -l 'Search Target'
This, as one person will be happy to remind you, will require further enhancements if your filenames contain whitespace characters or newlines.
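For names with whitespace, one common enhancement is to pass NUL-delimited names from find to xargs. A sketch, assuming find supports -print0 and xargs supports -0 (the GNU and BSD versions do); the directory and file names are made up:

```shell
#!/bin/bash
# Made-up demo tree standing in for /home (an assumption).
startDir=$(mktemp -d)
mkdir "$startDir/my site"
printf 'Search Target appears here\n' > "$startDir/my site/index page.html"
printf 'nothing relevant\n' > "$startDir/other.php"

# -type f skips directories; -print0 and -0 keep names with spaces intact.
hits=$(find "$startDir" -type f -print0 | xargs -0 grep -l 'Search Target')
printf '%s\n' "$hits"
rm -rf "$startDir"
```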
The options available for grep can vary wildly based on which OS you are using. If you're lucky, you can type the following to get the man page for your local grep.
man grep
If you don't have your page buffer setup for a large size, you might need to do
man grep | page
so you can see the top of the 'document'. Press any key to advance to the next page and when you are at the end of the document, the last key press returns you to the command prompt.
Some options that most greps have that might be useful to you are
-i (ignore case)
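-l (list filenames only, i.e. the files where the text is found)
There is also fgrep (equivalent to grep -F), which treats the search targets as fixed strings rather than regular expressions. It is sometimes read as 'file' grep
because you can give it a file of search targets to scan for, and is used like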
fgrep [-other_options] -f srchTargetsFile file1 file2 ... filen
I need this script to find fake websites (paypal, ebay, etc)
Final solution
you can make a srchFile like
paypal.fake.com
ebay.fake.com
etc.fake.com
and then combined with above, run the following
startDir=/home
find ${startDir} -print | xargs fgrep -il -f srchFile
Some greps require that the -fsrchFile be run together.
Now you are finding all files starting at /home, searching with fgrep for paypal, ebay, etc. in all files. The -l says it will ONLY print the filename where a match is found. You can remove the -l and then you will see the output of what is found, prepended with the filename.
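Put together as a self-contained sketch, with made-up host names and files in a temporary directory. It uses grep -F, the current spelling of fgrep, and grep's own -r in place of find and xargs, which assumes a grep with recursive search (GNU and BSD grep have it):

```shell
#!/bin/bash
# Made-up stand-ins for /home and the list of search targets (assumptions).
startDir=$(mktemp -d)
srchFile=$(mktemp)
printf 'paypal.fake.com\nebay.fake.com\n' > "$srchFile"

mkdir "$startDir/user1" "$startDir/user2"
printf '<a href="http://PayPal.fake.com/login">Log in</a>\n' > "$startDir/user1/index.php"
printf '<p>An ordinary page</p>\n' > "$startDir/user2/index.html"

# -r: recurse, -F: fixed strings, -i: ignore case,
# -l: file names only, -f: read the search targets from a file.
suspects=$(grep -rFil -f "$srchFile" "$startDir")
printf '%s\n' "$suspects"
rm -rf "$startDir" "$srchFile"
```

Only user1/index.php is listed; the match is found despite the different capitalisation because of -i.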
IHTH.

Resources