Regular expression to not match two file types in Bash

I'm trying to list the files in a directory in bash that are not .html or .js.
I've tried both of the following methods, but neither works:
ls !(*.html|*.js)
ls | grep -v '\.(html|js)$'

There's yet another way to do it. bash has an option for extended glob patterns:
shopt -s extglob
ls !(*.html|*.js)
(Note that this is still a glob pattern, not a regular expression -- for example, * means "any string", not "zero or more of the preceding thing").
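A minimal sketch of the extended glob in action, assuming bash and a throwaway directory with made-up file names. It uses printf with the glob instead of ls so that the result is one name per line; the ordering of the names depends on the locale.

```shell
#!/usr/bin/env bash
# Build a scratch directory with a few sample files (names are hypothetical).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch index.html app.js notes.txt README

# extglob has to be enabled before the line using !( ... ) is parsed,
# so keep shopt on its own earlier line rather than inside the same block.
shopt -s extglob

# Expand the negated pattern: everything that is neither *.html nor *.js.
others=$(printf '%s\n' !(*.html|*.js))
echo "$others"
```

The same pattern works as an argument to ls once extglob is on.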

If your version of ls supports the -I flag:
ls -I '*.js' -I '*.html'
From the man page:
-I, --ignore=PATTERN
do not list implied entries matching shell PATTERN
Otherwise, use find:
find . -maxdepth 1 -type f ! \( -name "*.html" -o -name "*.js" \)
For formatting add:
-printf "%f\n"
If the filenames need to be piped, you only need to change the -printf format:
-printf '%f\0' | xargs -0 ...

Use extended-regexp with the -E option:
ls | grep -E -v '\.(html|js)$'
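To see why the original grep attempt printed everything, here is a small sketch comparing the two modes on a made-up list of names (the names stand in for ls output). Without -E, the parentheses and the bar are ordinary characters, so no line matches and nothing is filtered out.

```shell
# Hypothetical file names standing in for ls output.
names=$(printf '%s\n' index.html app.js notes.txt data.json)

# Basic regex: "(", "|" and ")" are literal characters, so nothing is excluded.
basic=$(printf '%s\n' "$names" | grep -v '\.(html|js)$')

# Extended regex: the group and the alternation behave as intended.
extended=$(printf '%s\n' "$names" | grep -E -v '\.(html|js)$')

printf 'basic:\n%s\n\nextended:\n%s\n' "$basic" "$extended"
```

Because the pattern is anchored with $, a name such as data.json is kept.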

The -I flag can filter ls output:
ls -I '*.html' -I '*.js'
or
ls | grep -v -e '\.html$' -e '\.js$'
From the man page:
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern; useful to protect patterns beginning with -.

Related

How do I change a string in all subdirectories in files with the same name (e.g. data.txt) in Linux using the terminal?

find . -name "data.txt" -print0 | grep -rl "pa028" ./ |xargs -0 sed -i '' -e 's/pa028/pa014/g'
I tried to replace pa028 with pa014 in every file named "data.txt" in all subdirectories. Can you please correct me?

You can't put grep between find -print0 and xargs -0 because grep operates on lines, and this pipeline contains null-separated text instead of lines. Additionally, grep -r . will ignore the standard input you so expensively set up find to produce.
find . -name "data.txt" -exec grep -q "pa028" {} \; -print0 |
xargs -r -0 sed -i '' -e 's/pa028/pa014/g'
The logic here is to use -exec grep -q as a predicate to find so we produce a null-terminated list of matching files (for which the -exec returns true) to pass to xargs -r -0. (The -r option is important, too; you get weird errors if xargs runs anyway even though find produced no output.)
There is an extension to GNU grep to operate on null-terminated strings with -z and print null-terminated file names with -Z -l but that's a fairly recent development, so I'm not yet prepared to recommend that.
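The predicate approach can be tried safely on a scratch tree. This is only a sketch: the directory layout and file contents are invented, it assumes GNU xargs for -r, and it uses sed -i.bak (which keeps a backup) rather than the -i '' form shown above.

```shell
# Scratch tree with two data.txt files; only one contains the old string.
work=$(mktemp -d)
mkdir -p "$work/a" "$work/b c"
printf 'id=pa028\n' > "$work/a/data.txt"
printf 'id=zz999\n' > "$work/b c/data.txt"

# grep -q acts as a test: -print0 is reached only for files that match.
find "$work" -name 'data.txt' -exec grep -q 'pa028' {} \; -print0 |
  xargs -r -0 sed -i.bak -e 's/pa028/pa014/g'

changed=$(cat "$work/a/data.txt")
echo "$changed"
```

Only the matching file is rewritten, so only it gets a .bak copy.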

Bash script: Using grep to find a string that is also an option

I call the program with the text I want to find, so programname '-r'
Then, within the script I have text="${1}"
find . -r -name "hi.*" -exec grep -l "${text}" {} \;
The second half of that simplifies to grep -l -r and it waits for another input
How do I specify that -r is the string to be found, and not an option?

The POSIX standard mandates that grep support an option -e that forces the following argument to be treated as a regular expression, rather than another option.
find . -name "hi.*" -exec grep -l -e "$text" {} \;

Add -- after the -l [or your last valid option]. That stops option processing in grep so that your text will be interpreted as a string to search for and not an option:
find . -name "hi.*" -exec grep -l -- "${text}" {} \;
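Both suggestions (-e and --) can be checked without find at all. A small sketch, using an invented two-line sample file:

```shell
# Sample file: one line contains the literal text "-r".
sample=$(mktemp)
printf 'rm -r build\nls -l\n' > "$sample"

text='-r'

# -e marks the next argument as the pattern, even if it starts with a dash.
with_e=$(grep -e "$text" "$sample")

# -- ends option parsing, so "$text" is taken as the pattern.
with_dashes=$(grep -- "$text" "$sample")

printf '%s\n%s\n' "$with_e" "$with_dashes"
```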

Bash script to recursively find and replace in files [duplicate]

How do I find and replace every occurrence of:
subdomainA.example.com
with
subdomainB.example.com
in every text file under the /home/www/ directory tree recursively?

find /home/www \( -type d -name .git -prune \) -o -type f -print0 | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
-print0 tells find to print each of the results separated by a null character, rather than a new line. In the unlikely event that your directory has files with newlines in the names, this still lets xargs work on the correct filenames.
\( -type d -name .git -prune \) is an expression which completely skips over all directories named .git. You could easily expand it, if you use SVN or have other folders you want to preserve -- just match against more names. It's roughly equivalent to -not -path .git, but more efficient, because rather than checking every file in the directory, it skips it entirely. The -o after it is required because of how -prune actually works.
For more information, see man find.
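A quick way to convince yourself that -prune really skips .git is to run the pipeline on a scratch tree. This sketch uses invented paths and sed -i.bak so that each edited file keeps a backup:

```shell
# Scratch tree: one file inside .git and one regular page.
site=$(mktemp -d)
mkdir -p "$site/.git" "$site/pages"
printf 'url = subdomainA.example.com\n' > "$site/.git/config"
printf 'host subdomainA.example.com\n' > "$site/pages/index.txt"

# The .git directory is pruned, so nothing below it reaches xargs.
find "$site" \( -type d -name .git -prune \) -o -type f -print0 |
  xargs -0 sed -i.bak 's/subdomainA\.example\.com/subdomainB.example.com/g'

cat "$site/pages/index.txt" "$site/.git/config"
```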

The simplest way for me is
grep -rl oldtext . | xargs sed -i 's/oldtext/newtext/g'

Note: Do not run this command on a folder including a git repo - changes to .git could corrupt your git index.
find /home/www/ -type f -exec \
sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
Compared to other answers here, this is simpler than most and uses sed instead of perl, which is what the original question asked for.

All the tricks are almost the same, but I like this one:
find <mydir> -type f -exec sed -i 's/<string1>/<string2>/g' {} +
find <mydir>: look up in the directory.
-type f:
File is of type: regular file
-exec command {} +:
This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending
each selected file name at the end; the total number of invocations of the command will be much less than the number of
matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of
`{}' is allowed within the command. The command is executed in the starting directory.
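The difference between the two -exec terminators is easy to observe by counting how often the command starts. A minimal sketch with three invented files, where the command simply prints "run" each time it is invoked:

```shell
# Three empty files in a scratch directory.
dir=$(mktemp -d)
touch "$dir/one" "$dir/two" "$dir/three"

# With \; the command is started once per file.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | grep -c run)

# With + the file names are appended to a single command line.
batched=$(find "$dir" -type f -exec sh -c 'echo run' sh {} + | grep -c run)

echo "per file: $per_file, batched: $batched"
```

With many thousands of files, find may still split the batch into several invocations, much as xargs does.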

For me the easiest solution to remember is https://stackoverflow.com/a/2113224/565525, i.e.:
sed -i '' -e 's/subdomainA/subdomainB/g' $(find /home/www/ -type f)
NOTE: -i '' solves OSX problem sed: 1: "...": invalid command code .
NOTE: If there are too many files to process you'll get Argument list too long. The workaround - use find -exec or xargs solution described above.

cd /home/www && find . -type f -print0 |
xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'

For anyone using silver searcher (ag)
ag SearchString -l0 | xargs -0 sed -i 's/SearchString/Replacement/g'
Since ag ignores git/hg/svn file/folders by default, this is safe to run inside a repository.

This one is compatible with git repositories, and a bit simpler:
Linux:
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'
Mac:
git grep -l 'original_text' | xargs sed -i '' -e 's/original_text/new_text/g'
(Thanks to http://blog.jasonmeridth.com/posts/use-git-grep-to-replace-strings-in-files-in-your-git-repository/)

To cut down on files to recursively sed through, you could grep for your string instance:
grep -rl <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g
If you run man grep you'll notice you can also pass an --exclude-dir="*.git" flag if you want to omit searching through .git directories, avoiding git index issues as others have politely pointed out.
Leading you to:
grep -rl --exclude-dir="*.git" <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g

A straightforward method if you need to exclude directories (--exclude-dir=.folder) and might also have file names with spaces (handled by using a null byte as the separator for both grep -Z and xargs -0):
grep -rlZ oldtext . --exclude-dir=.folder | xargs -0 sed -i 's/oldtext/newtext/g'

One more nice one-liner as an extra, using git grep:
git grep -lz 'subdomainA.example.com' | xargs -0 perl -i'' -pE "s/subdomainA.example.com/subdomainB.example.com/g"

Simplest way to replace (all files, directory, recursive)
find . -type f -not -path '*/\.*' -exec sed -i 's/foo/bar/g' {} +
Note: Sometimes you might need to ignore some hidden files i.e. .git, you can use above command.
If you want to include hidden files use,
find . -type f -exec sed -i 's/foo/bar/g' {} +
In both cases, the string foo will be replaced with the new string bar.

find /home/www/ -type f -exec perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
find /home/www/ -type f will list all files in /home/www/ (and its subdirectories).
The "-exec" flag tells find to run the following command on each file found.
perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
is the command run on the files (many at a time). The {} gets replaced by file names.
The + at the end of the command tells find to build one command for many filenames.
Per the find man page:
"The command line is built in much the same way that
xargs builds its command lines."
Thus it's possible to achieve your goal (and handle filenames containing spaces) without using xargs -0, or -print0.

I just needed this and was not happy with the speed of the available examples. So I came up with my own:
cd /var/www && ack-grep -l --print0 subdomainA.example.com | xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'
Ack-grep is very efficient at finding relevant files. This command handled ~145 000 files with ease, whereas the others took so long I couldn't wait for them to finish.

or use the blazing fast GNU Parallel:
grep -rl oldtext . | parallel sed -i 's/oldtext/newtext/g' {}

grep -lr 'subdomainA.example.com' | while IFS= read -r file; do sed -i "s/subdomainA.example.com/subdomainB.example.com/g" "$file"; done
I guess most people don't know that they can pipe something into a "while read file" loop, which avoids those nasty -print0 args while preserving spaces in filenames. (IFS= and -r keep leading whitespace and backslashes intact; names containing newlines still break.)
Adding an echo before the sed lets you see which files will change before actually doing it.
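A dry run of the loop might look like the sketch below. The scratch directory and file names are invented, and the loop only reports what it would edit instead of calling sed:

```shell
# Scratch directory: one matching file whose name contains a space.
tree=$(mktemp -d)
printf 'subdomainA.example.com\n' > "$tree/my notes.txt"
printf 'nothing here\n' > "$tree/other.txt"

# IFS= and -r keep each line of grep's output intact as one file name.
preview=$(grep -lr 'subdomainA\.example\.com' "$tree" |
  while IFS= read -r file; do
    echo "would edit: $file"
  done)

echo "$preview"
```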

Try this:
sed -i 's/subdomainA/subdomainB/g' `grep -ril 'subdomainA' *`

According to this blog post:
find . -type f | xargs perl -pi -e 's/oldtext/newtext/g;'

#!/usr/local/bin/bash -x
find * /home/www -type f | while read files
do
sedtest=$(sed -n '/^/,/$/p' "${files}" | sed -n '/subdomainA/p')
if [ "${sedtest}" ]
then
sed s'/subdomainA/subdomainB/'g "${files}" > "${files}".tmp
mv "${files}".tmp "${files}"
fi
done

If you do not mind using vim together with grep or find tools, you could follow up the answer given by user Gert in this link --> How to do a text replacement in a big folder hierarchy?.
Here's the deal:
Recursively grep for the string that you want to replace in a certain path, and take only the complete path of each matching file (that would be the $(grep 'string' 'pathname' -Rl) part).
(optional) if you want to make a pre-backup of those files on centralized directory maybe you can use this also: cp -iv $(grep 'string' 'pathname' -Rl) 'centralized-directory-pathname'
after that you can edit/replace at will in vim following a scheme similar to the one provided on the link given:
:bufdo %s#string#replacement#gc | update

You can use awk to solve this as below,
for file in `find /home/www -type f`
do
awk '{gsub(/subdomainA.example.com/,"subdomainB.example.com"); print $0;}' $file > ./tempFile && mv ./tempFile $file;
done
hope this will help you !!!

For replace all occurrences in a git repository you can use:
git ls-files -z | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
See List files in local git repo? for other options to list all files in a repository. The -z options tells git to separate the file names with a zero byte, which assures that xargs (with the option -0) can separate filenames, even if they contain spaces or whatnot.

A bit old school but this worked on OS X.
There are a few tricks:
• Will only edit files with extension .sls under the current directory
• . must be escaped to ensure sed does not evaluate them as "any character"
• , is used as the sed delimiter instead of the usual /
Also note this is to edit a Jinja template to pass a variable in the path of an import (but this is off topic).
First, verify your sed command does what you want (this will only print the changes to stdout, it will not change the files):
for file in $(find . -name '*.sls' -type f); do echo -e "\n$file: "; sed 's,foo\.bar,foo/bar/\"+baz+\"/,g' $file; done
Edit the sed command as needed, once you are ready to make changes:
for file in $(find . -name '*.sls' -type f); do echo -e "\n$file: "; sed -i '' 's,foo\.bar,foo/bar/\"+baz+\"/,g' $file; done
Note the -i '' in the sed command, I did not want to create a backup of the original files (as explained in In-place edits with sed on OS X or in Robert Lujo's comment in this page).
Happy seding folks!

To avoid also changing
NearlysubdomainA.example.com
subdomainA.example.comp.other
while still changing
subdomainA.example.com.IsIt.good
(which may not be desirable, given the idea of a domain root), anchor the match with word boundaries:
find /home/www/ -type f -exec sed -i 's/\bsubdomainA\.example\.com\b/subdomainB.example.com/g' {} \;
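The effect of the word boundaries can be checked on the three example strings directly. This sketch assumes GNU sed, since \b is a GNU extension, and feeds the strings through a pipe instead of editing files:

```shell
# The three host names discussed above, one per line.
input=$(printf '%s\n' \
  'NearlysubdomainA.example.com' \
  'subdomainA.example.comp.other' \
  'subdomainA.example.com.IsIt.good')

# \b requires a word boundary on both sides of the matched name.
output=$(printf '%s\n' "$input" |
  sed 's/\bsubdomainA\.example\.com\b/subdomainB.example.com/g')

echo "$output"
```

Only the third line is rewritten, because a dot after "com" counts as a boundary while the letter "p" does not.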

Here's a version that should be more general than most; it doesn't require find (using du instead), for instance. It does require xargs, which are only found in some versions of Plan 9 (like 9front).
du -a | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'
If you want to add filters like file extensions use grep:
du -a | grep "\.scala$" | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'

For Qshell (qsh) on IBMi, not bash as tagged by OP.
Limitations of qsh commands:
find does not have the -print0 option
xargs does not have -0 option
sed does not have -i option
Thus the solution in qsh:
PATH='your/path/here'
SEARCH=\'subdomainA.example.com\'
REPLACE=\'subdomainB.example.com\'
for file in $( find ${PATH} -P -type f ); do
TEMP_FILE=${file}.${RANDOM}.temp_file
if [ ! -e ${TEMP_FILE} ]; then
touch -C 819 ${TEMP_FILE}
sed -e 's/'$SEARCH'/'$REPLACE'/g' \
< ${file} > ${TEMP_FILE}
mv ${TEMP_FILE} ${file}
fi
done
Caveats:
Solution excludes error handling
Not Bash as tagged by OP

If you wanted to use this without completely destroying your SVN repository, you can tell 'find' to ignore all hidden files by doing:
find . \( ! -regex '.*/\..*' \) -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'

Using combination of grep and sed
for pp in $(grep -Rl looking_for_string)
do
sed -i 's/looking_for_string/something_other/g' "${pp}"
done

perl -p -i -e 's/oldthing/new_thingy/g' `grep -ril oldthing *`

To change multiple files (saving a backup of each as *.bak):
perl -p -i.bak -e "s/\|/x/g" *
This takes every file in the current directory and replaces | with x.
It is known as a "Perl pie" (easy as pie).

How to grep lines that end with .c or .cpp?

I have a file as below, and I want to grep for lines having a .c or .cpp extension. I have tried cat file | grep ".c", but I am getting all types of extensions as output. Please shed some light on this. Thanks in advance.
file contents are below:
/dir/a/b/cds/main.c
/dir/a/f/cmdss/file.cpp
/dir/a/b/cds/main.h
/dir/a/f/cmdss/file.hpp
/dir/a/b/cdys/main_abc.c
/dir/a/f/cmfs/file_123.cpp

grep supports regular expressions.
$ grep -E '\.(c|cpp)$' input
-E means 'Interpret PATTERN as an extended regular expression'
\. means a dot .
() is a group
c|cpp is an alternative
$ is the end of the line
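Applied to the sample from the question, the pattern can be tried like this. The sketch writes the six paths to a temporary file, which stands in for the asker's input file:

```shell
# Recreate the file list from the question in a temporary file.
list=$(mktemp)
cat > "$list" <<'EOF'
/dir/a/b/cds/main.c
/dir/a/f/cmdss/file.cpp
/dir/a/b/cds/main.h
/dir/a/f/cmdss/file.hpp
/dir/a/b/cdys/main_abc.c
/dir/a/f/cmfs/file_123.cpp
EOF

# Keep only lines that end in ".c" or ".cpp".
matches=$(grep -E '\.(c|cpp)$' "$list")
echo "$matches"
```

The escaped dot matters here: a bare . would match any character, which is why the original grep ".c" also matched directory names such as cds.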

$ grep -E '\.cp{2}?' testfile1
/dir/a/b/cds/main.c
/dir/a/f/cmdss/file.cpp
/dir/a/b/cdys/main_abc.c
/dir/a/f/cmfs/file_123.cpp
$
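This variant may also be useful. Here p{2} means 'the character p exactly twice after the c', and the trailing ? makes that pair optional. Note that the pattern is not anchored with $, so it would also match a name such as file.css.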

Also you can use --include parameter like below
grep -r --include \*.hpp --include \*.cpp your_search_pattern .

The Android framework defines a bash function named cgrep. It searches the project directory recursively, and it's much faster than using grep -r.
Usage:
cgrep <expression to find>
It greps only C/C++ header and source files.
function cgrep()
{
find . -name .repo -prune -o -name .git -prune -o -type f \( -name '*.c' -o -name '*.cc' -o -name '*.cpp' -o -name '*.h' \) -print0 | xargs -0 grep --color -n "$@"
}
You can paste this into your .bashrc file, or run the pipeline directly in the shell.

Unix Command to List files containing string but NOT containing another string

How do I recursively view a list of files that has one string and specifically doesn't have another string? Also, I mean to evaluate the text of the files, not the filenames.
Conclusion:
As per comments, I ended up using:
find . -name "*.html" -exec grep -lR 'base\-maps' {} \; | xargs grep -L 'base\-maps\-bot'
This returned files with "base-maps" and not "base-maps-bot". Thank you!!

Try this:
grep -rl <string-to-match> | xargs grep -L <string-not-to-match>
Explanation: grep -lr makes grep recursively (r) output a list (l) of all files that contain <string-to-match>. xargs loops over these files, calling grep -L on each one of them. grep -L will only output the filename when the file does not contain <string-not-to-match>.
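Here is a small sketch of that pipeline on three invented files, using the strings from the question. Only the file that has "base-maps" without "base-maps-bot" should be reported:

```shell
# Scratch directory with one file per case.
docs=$(mktemp -d)
printf 'base-maps\n' > "$docs/a.html"
printf 'base-maps\nbase-maps-bot\n' > "$docs/b.html"
printf 'unrelated\n' > "$docs/c.html"

# First grep lists files containing the wanted string;
# grep -L then keeps those that lack the unwanted one.
wanted=$(grep -rl 'base-maps' "$docs" | xargs grep -L 'base-maps-bot')
echo "$wanted"
```

File names with spaces would need the -Z and -0 options on grep and xargs.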

The use of xargs in the answers above is not necessary; you can achieve the same thing like this:
find . -type f -exec grep -q <string-to-match> {} \; -not -exec grep -q <string-not-to-match> {} \; -print
grep -q means run quietly but return an exit code indicating whether a match was found; find can then use that exit code to determine whether to keep evaluating the rest of its expression. If -exec grep -q <string-to-match> {} \; returns 0 (a match), find goes on to evaluate -not -exec grep -q <string-not-to-match> {} \;. If that second grep returns nonzero (no match), -not makes the test true and find goes on to -print, which prints the name of the file.
As another answer has noted, using find in this way has major advantages over grep -Rl where you only want to search files of a certain type. If, on the other hand, you really want to search all files, grep -Rl is probably quicker, as it uses one grep process to perform the first filter for all files, instead of a separate grep process for each file.
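The find-only variant can be sketched the same way. The directory and files are invented for the demonstration, and each -exec grep -q acts as a true/false test on the current file:

```shell
# Scratch directory with one file per case.
pages=$(mktemp -d)
printf 'base-maps\n' > "$pages/a.html"
printf 'base-maps\nbase-maps-bot\n' > "$pages/b.html"
printf 'unrelated\n' > "$pages/c.html"

# -print is reached only if the first grep matches and the second does not.
found=$(find "$pages" -type f \
  -exec grep -q 'base-maps' {} \; \
  -not -exec grep -q 'base-maps-bot' {} \; \
  -print)
echo "$found"
```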

These answers seem off, as they match BOTH strings. The following command should work better:
grep -l <string-to-match> * | xargs grep -c <string-not-to-match> | grep ':0$'

Here is a more generic construction:
find . -name <nameFilter> -print0 | xargs -0 grep -Z -l <patternYes> | xargs -0 grep -L <patternNo>
This command outputs files whose name matches <nameFilter> (adjust find predicates as you need) which contain <patternYes>, but do not contain <patternNo>.
The enhancements are:
It works with filenames containing whitespace.
It lets you filter files by name.
If you don't need to filter by name (one often wants to consider all the files in current directory), you can strip find and add -R to the first grep:
grep -R -Z -l <patternYes> | xargs -0 grep -L <patternNo>

find . -maxdepth 1 -name "*.py" -exec grep -L "string-not-to-match" {} \;
This command lists all ".py" files in the current directory that don't contain "string-not-to-match".

To find lines that match string A but do not also contain string B or string C, I use the following (the quotes allow a search string to contain a space):
grep -r <string A> | grep -v -e <string B> -e "<string C>" | awk -F ':' '{print $1}'
Explanation: grep -r recursively prints all matching lines in the output format
filename: line
grep -v then drops the lines that also contain string B or string C (each given with -e). Finally, awk prints only the first field (the filename), using the colon as the field separator (-F).
