grep and sed with spaces in filenames - linux

Currently, I have
grep -irl $schema $WORKDIR/ | xargs sed -i 's/'"$schema"'/EXI1/gI'
which doesn't work for filenames with spaces.
Any ideas how to search and replace recursively across all files?
Thanks

Add the -Z (aka --null) flag to grep, and the -0 (also aka --null) flag to xargs.
This will output NUL terminated file names, and tell xargs to read NUL terminated arguments.
eg.
grep -irlZ "$schema" "$WORKDIR"/ | xargs -0 sed -i 's/'"$schema"'/EXI1/gI'
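As a fuller sketch (assuming GNU grep, xargs and sed, with purely illustrative values for the variables), you can also add -r so xargs skips sed entirely when grep finds nothing:
# hypothetical values, for illustration only
schema='oldname'
WORKDIR='/tmp/work dir'
# -Z NUL-terminates the filenames; -r0 makes xargs split on NUL and skip empty input
grep -irlZ -- "$schema" "$WORKDIR"/ | xargs -r0 sed -i "s/$schema/EXI1/gI"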

find with sed should work:
find "$WORKDIR"/ -type f -exec sed -i.bak "s/$schema/EXI1/gI" '{}' +
OR
find "$WORKDIR"/ -type f -print0 | xargs -0 sed -i.bak "s/$schema/EXI1/gI"
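Since -i.bak leaves a backup next to every edited file, you may want a cleanup pass afterwards; a sketch, assuming GNU find for -delete:
# remove the .bak backups once you have verified the replacements
find "$WORKDIR"/ -type f -name '*.bak' -delete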

Related

Why does Perl delete file content when used with find?

Can anyone see why Perl deletes all file content when it is used together with find?
echo stning >> test.tex
echo stning >> test.tex
find . -type f -name \*.tex -print0 | xargs -0 perl -i -ne 's/stning/sætning/g'
cat test.tex
The last command doesn't return anything, and that's the issue.
You need -p, not -n. The -n flag only reads, but doesn't print.
find . -type f -name \*.tex -print0 | xargs -0 perl -i -pe 's/stning/sætning/g'
You can easily remember this with the mnemonic perl pie, which is perl -p -i -e or shorter perl -pi -e.
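A quick demonstration of the difference, using a hypothetical file demo.tex:
printf 'stning\n' > demo.tex
perl -i -ne 's/stning/sætning/g' demo.tex   # -n never prints, so -i writes back an empty file
printf 'stning\n' > demo.tex
perl -i -pe 's/stning/sætning/g' demo.tex   # -p prints each (possibly modified) line
cat demo.tex                                # -> sætning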
You can achieve the same result with sed itself, since you just need an in-place replacement.
find . -type f -name \*.tex -print0 | xargs -0 sed -i 's/stning/sætning/g'

Grep - How to concatenate filename to each returned line of file content?

I have a statement which
Finds a set of files
Cats their contents out
Then greps their contents
It is this pipeline:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name="
produces an output such as:
<start-node name="Start" secure="false"/>
<start-node name="Run" secure="false"/>
What I was hoping to get is something like:
filename1-<start-node name="Start" secure="false"/>
filename2-<start-node name="Run" secure="false"/>
An easier way may be to execute grep directly on the result of find, without xargs and cat:
grep -i "start-node name=" `find . -iname "*Test_*" -type f`
Because you cat all the files into a single stream, grep doesn't have any filename information. You want to give all the filenames to grep as arguments:
find ... | xargs grep "<start-node name=" /dev/null
Note two additional changes - I've dropped the -i flag, as it appears you're inspecting XML, and that's not case-insensitive; I've added /dev/null to the list of files, so that grep always has at least two files of input, even if find only gives one result. That's the portable way to get grep to print filenames.
Now, let's look at the find command. Instead of finding all files, then filtering through grep, we can use the -iregex predicate of GNU find:
find . -iregex '.*Test_.*' \( -type 'f' -o -type 'l' \) | xargs grep ...
The mixed-case pattern suggests your filenames aren't really case-insensitive, and you might not want to grep symlinks (I'm sure you don't want directories and special files passed through), in which case you can simplify (and can use portable find again):
find . -name '*Test_*' -type 'f' | xargs grep ...
Now protect against the kind of filenames that trip up pipelines, and you have
find . -name '*Test_*' -type 'f' -print0 \
| xargs -0 grep -e "<start-node name=" -- /dev/null
Alternatively, if you have GNU grep, you don't need find at all:
grep --recursive --include '*[Tt]est_*' -e "<start-node name=" .
If you just need to number them rather than show the real filename:
find . | grep -i "Test_" | xargs cat | grep -i "start-node name=" | awk '{print "filename" NR "-" $0}'
From man grep:
-H Always print filename headers with output lines.
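As a sketch, bolting -H onto the earlier pipeline gives filename-prefixed matches directly (note the separator is a colon, not a dash):
find . -name '*Test_*' -type f -print0 | xargs -0 grep -H "start-node name="
# ./filename1:<start-node name="Start" secure="false"/>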

Bash script to recursively find and replace in files [duplicate]

How do I find and replace every occurrence of:
subdomainA.example.com
with
subdomainB.example.com
in every text file under the /home/www/ directory tree recursively?
find /home/www \( -type d -name .git -prune \) -o -type f -print0 | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
-print0 tells find to print each of the results separated by a null character, rather than a new line. In the unlikely event that your directory has files with newlines in the names, this still lets xargs work on the correct filenames.
\( -type d -name .git -prune \) is an expression which completely skips over all directories named .git. You could easily expand it, if you use SVN or have other folders you want to preserve -- just match against more names. It's roughly equivalent to -not -path .git, but more efficient, because rather than checking every file in the directory, it skips it entirely. The -o after it is required because of how -prune actually works.
For more information, see man find.
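For example, an illustrative variant that prunes several VCS directories at once:
find /home/www \( -type d \( -name .git -o -name .svn \) -prune \) -o -type f -print0 |
  xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'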
The simplest way for me is
grep -rl oldtext . | xargs sed -i 's/oldtext/newtext/g'
Note: Do not run this command on a folder including a git repo - changes to .git could corrupt your git index.
find /home/www/ -type f -exec \
sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
Compared to other answers here, this is simpler than most and uses sed instead of perl, which is what the original question asked for.
All the tricks are almost the same, but I like this one:
find <mydir> -type f -exec sed -i 's/<string1>/<string2>/g' {} +
find <mydir>: look in the given directory.
-type f: only consider regular files.
-exec command {} +: from the find man page: "This variant of the -exec action runs the specified command on the selected files, but the command line is built by appending each selected file name at the end; the total number of invocations of the command will be much less than the number of matched files. The command line is built in much the same way that xargs builds its command lines. Only one instance of `{}' is allowed within the command. The command is executed in the starting directory."
For me the easiest solution to remember is https://stackoverflow.com/a/2113224/565525, i.e.:
sed -i '' -e 's/subdomainA/subdomainB/g' $(find /home/www/ -type f)
NOTE: -i '' solves the OS X problem sed: 1: "...": invalid command code.
NOTE: If there are too many files to process you'll get Argument list too long. The workaround: use the find -exec or xargs solutions described above.
cd /home/www && find . -type f -print0 |
xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'
For anyone using silver searcher (ag)
ag SearchString -l0 | xargs -0 sed -i 's/SearchString/Replacement/g'
Since ag ignores git/hg/svn file/folders by default, this is safe to run inside a repository.
This one is compatible with git repositories, and a bit simpler:
Linux:
git grep -l 'original_text' | xargs sed -i 's/original_text/new_text/g'
Mac:
git grep -l 'original_text' | xargs sed -i '' -e 's/original_text/new_text/g'
(Thanks to http://blog.jasonmeridth.com/posts/use-git-grep-to-replace-strings-in-files-in-your-git-repository/)
To cut down on files to recursively sed through, you could grep for your string instance:
grep -rl <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g
If you run man grep you'll notice you can also define an --exclude-dir="*.git" flag if you want to omit searching through .git directories, avoiding git index issues as others have politely pointed out.
Leading you to:
grep -rl --exclude-dir="*.git" <oldstring> /path/to/folder | xargs sed -i s^<oldstring>^<newstring>^g
A straightforward method if you need to exclude directories (--exclude-dir=.folder) and might also have file names with spaces (handled by NUL separators: grep -Z paired with xargs -0):
grep -rlZ oldtext . --exclude-dir=.folder | xargs -0 sed -i 's/oldtext/newtext/g'
One more nice one-liner as an extra, using git grep.
git grep -lz 'subdomainA.example.com' | xargs -0 perl -i'' -pE "s/subdomainA.example.com/subdomainB.example.com/g"
Simplest way to replace (all files, directory, recursive)
find . -type f -not -path '*/\.*' -exec sed -i 's/foo/bar/g' {} +
Note: sometimes you may need to ignore hidden files such as .git; in that case, use the command above.
If you want to include hidden files use,
find . -type f -exec sed -i 's/foo/bar/g' {} +
In both cases the string foo will be replaced with the new string bar.
find /home/www/ -type f -exec perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
find /home/www/ -type f will list all files in /home/www/ (and its subdirectories).
The "-exec" flag tells find to run the following command on each file found.
perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g' {} +
is the command run on the files (many at a time). The {} gets replaced by file names.
The + at the end of the command tells find to build one command for many filenames.
Per the find man page: "The command line is built in much the same way that xargs builds its command lines."
Thus it's possible to achieve your goal (and handle filenames containing spaces) without using xargs -0, or -print0.
I just needed this and was not happy with the speed of the available examples. So I came up with my own:
cd /var/www && ack-grep -l --print0 subdomainA.example.com | xargs -0 perl -i.bak -pe 's/subdomainA\.example\.com/subdomainB.example.com/g'
ack-grep is very efficient at finding relevant files. This command replaced text in ~145,000 files in a breeze, whereas others took so long I couldn't wait for them to finish.
or use the blazing fast GNU Parallel:
grep -rl oldtext . | parallel sed -i 's/oldtext/newtext/g' {}
grep -lr 'subdomainA.example.com' | while read file; do sed -i "s/subdomainA.example.com/subdomainB.example.com/g" "$file"; done
I guess most people don't know that they can pipe into a "while read file" loop; it avoids those nasty -print0/-0 arguments while preserving spaces in filenames.
Adding an echo before the sed also lets you see which files will change before actually doing it.
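A sketch of that loop, hardened with IFS= and read -r so leading whitespace and backslashes in names survive (newlines in names are still a problem), with the echo left in as a dry-run preview:
grep -lr 'subdomainA.example.com' . | while IFS= read -r file; do
  echo "editing: $file"   # preview; drop this line once you trust the list
  sed -i "s/subdomainA.example.com/subdomainB.example.com/g" "$file"
done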
Try this:
sed -i 's/subdomainA/subdomainB/g' `grep -ril 'subdomainA' *`
According to this blog post:
find . -type f | xargs perl -pi -e 's/oldtext/newtext/g;'
#!/usr/local/bin/bash -x
find /home/www -type f | while IFS= read -r files
do
    # capture any lines containing the search string; empty means no match
    sedtest=$(sed -n '/subdomainA/p' "${files}")
    if [ -n "${sedtest}" ]
    then
        sed 's/subdomainA/subdomainB/g' "${files}" > "${files}".tmp
        mv "${files}".tmp "${files}"
    fi
done
If you do not mind using vim together with grep or find, you could follow the answer given by user Gert in this link: How to do a text replacement in a big folder hierarchy?.
Here's the deal:
Recursively grep for the string that you want to replace in a certain path, and take only the complete path of each matching file (that would be the $(grep 'string' 'pathname' -Rl) part).
(optional) If you want to make a backup of those files in a centralized directory first, you can also use: cp -iv $(grep 'string' 'pathname' -Rl) 'centralized-directory-pathname'
After that you can edit/replace at will in vim, following a scheme similar to the one provided in the link given:
:bufdo %s#string#replacement#gc | update
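A sketch of the whole round trip (the filenames must be free of whitespace for the $(...) expansion to survive):
# open every matching file as a vim buffer
vim $(grep -Rl 'string' /path/to/tree)
# then, inside vim:
#   :bufdo %s#string#replacement#gc | update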
You can use awk to solve this as below:
for file in `find /home/www -type f`
do
awk '{gsub(/subdomainA.example.com/,"subdomainB.example.com"); print $0;}' "$file" > ./tempFile && mv ./tempFile "$file";
done
Hope this helps!
To replace all occurrences in a git repository you can use:
git ls-files -z | xargs -0 sed -i 's/subdomainA\.example\.com/subdomainB.example.com/g'
See List files in local git repo? for other options to list all files in a repository. The -z option tells git to separate the file names with a zero byte, which ensures that xargs (with the option -0) can separate filenames, even if they contain spaces or whatnot.
A bit old school but this worked on OS X.
There are a few tricks:
• Will only edit files with extension .sls under the current directory
• Each . must be escaped to ensure sed does not evaluate it as "any character"
• , is used as the sed delimiter instead of the usual /
Also note this is to edit a Jinja template to pass a variable in the path of an import (but this is off topic).
First, verify your sed command does what you want (this will only print the changes to stdout, it will not change the files):
for file in $(find . -name '*.sls' -type f); do echo -e "\n$file: "; sed 's,foo\.bar,foo/bar/\"+baz+\"/,g' "$file"; done
Edit the sed command as needed, once you are ready to make changes:
for file in $(find . -name '*.sls' -type f); do echo -e "\n$file: "; sed -i '' 's,foo\.bar,foo/bar/\"+baz+\"/,g' "$file"; done
Note the -i '' in the sed command, I did not want to create a backup of the original files (as explained in In-place edits with sed on OS X or in Robert Lujo's comment in this page).
Happy seding folks!
Just to avoid also changing
NearlysubdomainA.example.com
subdomainA.example.comp.other
but still changing
subdomainA.example.com.IsIt.good
(though maybe that is not good, given the idea behind the domain root), anchor the pattern with word boundaries:
find /home/www/ -type f -exec sed -i 's/\bsubdomainA\.example\.com\b/subdomainB.example.com/g' {} \;
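A quick check of the boundary behaviour (GNU sed; \b is a GNU extension):
printf '%s\n' 'NearlysubdomainA.example.com' \
              'subdomainA.example.comp.other' \
              'subdomainA.example.com.IsIt.good' |
  sed 's/\bsubdomainA\.example\.com\b/subdomainB.example.com/g'
# only the third line is rewritten, because "." is not a word character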
Here's a version that should be more general than most; it doesn't require find (using du instead), for instance. It does require xargs, which is only found in some versions of Plan 9 (like 9front).
du -a | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'
If you want to add filters like file extensions use grep:
du -a | grep "\.scala$" | awk -F' ' '{ print $2 }' | xargs sed -i -e 's/subdomainA\.example\.com/subdomainB.example.com/g'
For Qshell (qsh) on IBMi, not bash as tagged by OP.
Limitations of qsh commands:
find does not have the -print0 option
xargs does not have -0 option
sed does not have -i option
Thus the solution in qsh:
DIR='your/path/here'   # do not name this PATH: reassigning PATH breaks command lookup
SEARCH='subdomainA.example.com'
REPLACE='subdomainB.example.com'
for file in $( find ${DIR} -P -type f ); do
TEMP_FILE=${file}.${RANDOM}.temp_file
if [ ! -e ${TEMP_FILE} ]; then
touch -C 819 ${TEMP_FILE}
sed -e 's/'$SEARCH'/'$REPLACE'/g' \
< ${file} > ${TEMP_FILE}
mv ${TEMP_FILE} ${file}
fi
done
Caveats:
Solution excludes error handling
Not Bash as tagged by OP
If you wanted to use this without completely destroying your SVN repository, you can tell 'find' to ignore all hidden files by doing:
find . \( ! -regex '.*/\..*' \) -type f -print0 | xargs -0 sed -i 's/subdomainA.example.com/subdomainB.example.com/g'
Using combination of grep and sed
for pp in $(grep -Rl looking_for_string .)
do
sed -i 's/looking_for_string/something_other/g' "${pp}"
done
perl -p -i -e 's/oldthing/new_thingy/g' `grep -ril oldthing *`
To change multiple files (add .bak after -i, i.e. -i.bak, if you want a backup saved as *.bak):
perl -p -i -e "s/\|/x/g" *
This takes all files in the current directory and replaces every | with x.
It's called a “Perl pie” (easy as pie).
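The same idea with backups kept, as a sketch:
# -i.bak saves each original as file.bak before editing in place
perl -pi.bak -e 's/\|/x/g' *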

xargs inconsistent behavior and -n1 parameter

I have a shell script
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
which outputs file names not matching the regexp in this way:
file1.java
file2.java
...
The way I understand it, it works as follows: find finds the needed files and concatenates their names with \0. Then xargs splits the output of find on \0 and feeds the names to grep one by one.
Then I wanted to add one more stage and get only basename of the files. I modified the script:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 basename
but got an error. I started investigating and made a temporary output:
find . -name "*.java" -print0 | xargs -0 grep -LzZ 'regular_expression' | xargs -0 echo basename
and got this:
basename ./file1.java ./file2.java ./subdir/file1.java ./subdir/file2.java
So, the filenames were not split on \0. I can't see why they are split when xargs runs grep but not when xargs runs basename.
I got a workaround by using -n1 in the latter xargs. But I still don't understand why I needed it (given I didn't use it in the xargs with grep) and what this parameter does.
Hope you can explain what -n1 does and why I needed it in the latter usage but not in the former with grep.
-n1 tells xargs to run the given command once per argument.
So if you have something like
echo file1 file2 file3 | xargs basename
That's equivalent to
basename file1 file2 file3
But if you do
echo file1 file2 file3 | xargs -n1 basename
That will cause xargs to run:
basename file1
basename file2
basename file3
As for xargs's -0 flag, that's an alias for the --null option, which tells xargs to split its input on \0 instead of on the default whitespace. You needed it after find because -print0 emits \0 separators; and since your grep uses -Z, its output is NUL-separated as well, so the second xargs -0 does split correctly. The actual problem lies with basename, as the next answer explains.
The filenames were split by \0. The difference is in the commands you're using. xargs normally takes its standard input, breaks it into a list (here, by splitting on NUL), and then passes that list as extra arguments to your command. So when you do this:
find . -name "*.java" -print0 | xargs -0 grep -Lz 'regular_expression'
What actually runs is this:
grep -Lz 'regular_expression' file1.java file2.java file3.java...
Here, the -z matters little: it only changes what grep treats as the line terminator in its input data (NUL instead of newline); the filenames themselves arrive as arguments, not on stdin.
So, when you add another xargs that runs basename, you get this:
basename file1.java file2.java file3.java...
But while grep will take any number of filename arguments, basename takes only one (a second argument is treated as a suffix to strip, and more than two arguments is an error).
That's where -n 1 comes in: it tells xargs to break its list of arguments into chunks (of 1), and run the command multiple times. So what runs now is:
basename file1.java
basename file2.java
basename file3.java
...
And all the output is concatenated together onto stdout.
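As an alternative to -n 1: GNU coreutils basename accepts -a (--multiple), which treats every argument as a name, so the original pipeline works unchanged. A sketch:
find . -name "*.java" -print0 \
  | xargs -0 grep -LzZ 'regular_expression' \
  | xargs -0 basename -a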

Write output out of grep into a file on Linux?

find . -name "*.php" | xargs grep -i -n "searchstring" >output.txt
Here I am trying to write the matches into a file, but it is not happening...
How about appending results using >>?
find . -name "*.php" | xargs grep -i -n "searchstring" >> output.txt
I haven't got a Linux box with me right now, so I'll try to improvise.
the xargs grep -i -n "searchstring" bothers me a bit.
Perhaps you meant xargs -I {} grep -i "searchstring" {}, or just xargs grep -i "searchstring"?
Since -n as grep's argument merely prefixes each output line with its line number, I doubt this is what you needed.
This way, your final code would be
find . -name "*.php" | xargs grep -i "searchstring" >> output.txt
find . -name "*.php" -exec grep -i -n "function" {} \; >output.txt
But you won't know what file it came from. You might want:
find . -name "*.php" -exec grep -i -Hn "function" {} \; >output.txt
instead.
I guess that you have spaces in the php filenames. If you hand them to grep through xargs in the way that you do, the names get split into parts and grep interprets those parts as filenames which it then cannot find.
There is a solution for that. find has a -print0 option that instructs find to separate results by a NUL byte and xargs has a -0 option that instructs xargs to expect a NUL byte as separator. Using those you get:
find . -name "*.php" -print0 | xargs -0 grep -i -n "searchstring" > output.txt
Try using line-buffered
grep --line-buffered
[edit]
I ran your original command on my box and it seems to work fine, so I'm not sure anymore.
Looks fine to me. What happens if you remove >output.txt?
If you're searching trees of source code, please consider using ack. To do what you're doing in ack, regardless of there being spaces in filenames, you'd do:
ack --php -i searchstring > output.txt
I always use the following command. It displays the output on the console and also writes it to the file:
grep -r "string to be searched" . 2>&1 | tee /your/path/to/file/filename.txt
Check free disk space with
$ df -Th
There might not be enough free space on your disk.
