how to batch sed files and redirect results - linux

I wonder if there is a one-liner for batch processing a set of files in one folder and redirect the results into a different folder.
I tried something like this:
find input_dir/ -name "PATTERN" | xargs -I {} sed 's:foo:bar:g' > output_dir/{}
For example, input_dir/ has files A, B, and C, and I would like to end up with processed files A, B, and C in output_dir/, with the same file names.
My hope was to use {} to substitute the file names and build the output file paths, but this didn't work.
Does anyone know how to fix this, or a better way of doing it?
Thanks!

My technique for this is to write a shell script that does the job, and then run it via find. For example, your actions could be written into a script munger.sh:
#!/bin/sh
for file in "$@"
do
    output="output_dir/$(basename "$file")"
    sed -e 's:foo:bar:g' "$file" > "$output"
done
The find command becomes:
find input_dir -name "PATTERN" -exec sh munger.sh {} +
This runs the script with the file names as arguments, bundling a conveniently large number of file names into each invocation of the shell script. If you're not going to need it again, you can simply remove munger.sh when you're done.
Yes, you can do all sorts of contortions to execute the command the way you want (perhaps using find … -exec bash -c "the script to be executed" arg0 {} +), but it is often harder than writing a relatively simple script, using it, and throwing it away. There tend to be fewer problems with quoting, for example, when you run an explicit script than when you try to write the script on the command line. If you find yourself fighting with single quotes, double quotes and backslashes (or back-quotes), it is time to use a simple script as shown.
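To give an idea of what that inline-script contortion would look like in this particular case, here is an untested sketch (same output_dir layout as above; the matched files arrive in the inline script as "$@"):
find input_dir -name "PATTERN" -exec sh -c '
    for file in "$@"; do
        sed -e "s:foo:bar:g" "$file" > "output_dir/$(basename "$file")"
    done
' sh {} +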

Using GNU Parallel it looks like this:
find input_dir/ -name "PATTERN" | parallel sed s:foo:bar:g {} '>' output_dir/{/}
If the sed command contains special characters, you need to quote them twice:
find input_dir/ -name "PATTERN" | parallel sed 's:foo.\*:bar:g' {} '>' output_dir/{/}

In two steps:
find input_dir/ -name "PATTERN" -exec cp -t output_dir/ {} +
then
sed 's:foo:bar:g' -i output_dir/*
or, if output_dir could contain files not matching "PATTERN":
find output_dir -name "PATTERN" -exec sed -e 's:foo:bar:g' -i {} +

Related

using grep in single-line files to find the number of occurrences of a word/pattern

I have json files in the current directory, and subdirectories. All the files have a single line of content.
I want a list of all files that contain the word XYZ, along with the number of times it occurs in each file.
I want to print the list according to the following format:
file_name pattern_occurence_times
It should look something like:
.\x1\x2\file1.json 3
.\x1\file3.json 2
The problem is that grep counts the NUMBER of lines containing XYZ, not the number of occurrences.
Since the whole content of the files is always contained in a single line, the count is always 1 (if the pattern occurs in the file).
I used this command for that:
find . -type f -name "*.json" -exec grep --files-with-match -i 'xyz' {} \; -exec grep -wci 'xyz' {} \;
I wrote some Python code, and it works, but I would like to know if there is any way of doing that using find and grep or any other command-line tools.
Thanks
The classical approach to this problem is the pipeline grep -o regex file | wc -l. However, to execute a pipeline in find's -exec you have to run a shell (e.g. sh -c ... ). But all these things together will only print the number of matches, not the file names. Also, files with no matches have to be filtered out.
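For illustration, a rough, untested sketch of that pipeline-in-find approach (assuming *.json files as in the question); note that it prints only the counts, one per line, including zeros, with no file names:
find . -type f -name '*.json' -exec sh -c 'grep -oi xyz "$1" | wc -l' sh {} \;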
Because of all of this I think a single awk command would be preferable:
find ... -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
END {if(c>0) print FILENAME " " c}' {} \;
Here the tolower($0) emulates grep's -i option. Make sure to write your search pattern xyz only in lowercase.
If you want to combine this with subsequent filters in find you can add else exit 1 at the end of the last awk block to continue (inside find) only with the printed files.
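A hedged sketch of that combination (using -ls as an arbitrary follow-up action, which then only runs for files that contained a match):
find . -type f -exec awk '{$0=tolower($0); c+=gsub(/xyz/,"")}
    END {if (c>0) print FILENAME " " c; else exit 1}' {} \; -ls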
Use the -o option of grep in conjunction with wc, e.g.:
find . -name "*.json" | while read -r f ; do
    echo "$f" : $(grep -ow XYZ "$f" | wc -l)
done

No such file or directory when piping. Each command works separately, but not when piping

I have 2 folders: folder_a & folder_b. In each of these folders there are a bunch of files. I am trying to use sed to move all of these files out of these folders and into the directory I am currently working in.
My folder structure looks like this:
mytest:
a:
1.txt
2.txt
3.txt
b:
4.txt
5.txt
The command I am trying to use is:
find . -type d ! -iname '*.*' # find all folders other than root
| sed -r 's/.*/&\/*/' # add '/*' to each of the arguments
| sed -r 'p;s/.*/./' # output: a/* . b/* .
| xargs -n 2 mv # should be creating two commands: 'mv a/* .' and 'mv b/* .'
Unfortunately I get an error:
mv: cannot stat './aaa/*': No such file or directory
I also get the same error when I try this other strategy (using ls instead of mv):
for dir in */; do
ls $dir;
done;
Even if I use sed to replace the spaces in each directory name with '\ ', or surround the directory names with quotes I get the same error.
I'm not sure if these 2 examples are related in my misunderstanding of bash but they both seem to demonstrate my ignorance of how bash translates the output from one command into the input of another command.
Can anyone shed some light on this?
Update: Completely rewritten.
As @EtanReisner and @melpomene have noted, mv */* . or, more specifically, mv a/* b/* . is the most straightforward solution, but you state that this is in part a learning exercise, so the remainder of the answer shows an efficient find-based solution and explains the problem with the original command.
An efficient find-based solution
Generally, if feasible, it's best and most efficient to let find itself do the work, without involving additional tools; find's -exec action is like a built-in xargs, with {} representing the path at hand (with terminator \;) / all paths (with +):
find . -type f -exec echo mv -t . {} +
To be safe, this will just print the mv commands that would be executed; remove the echo to actually execute them.
This will execute a single[1] mv command to which all matching files are passed, and -t . moves them all to the current dir.
[1] If the resulting command line is too long (which is unlikely), it is split up into multiple commands, just as with xargs.
Operating on files (-type f) bypasses the need for globbing, as find will then enumerate all files for you (it also bypasses the need to exclude . explicitly).
Note that this solution works on entire subtrees, not just (immediate) subdirectories.
It's tempting to consider turning on Bash 4's globstar option and using mv */** ., but that won't work, because it will attempt to move directories as well, not just the files in them.
A caveat re -exec with +: it only works if {} - the placeholder for all paths - is the token immediately before the +.
Since you're on Linux, we can satisfy this condition by specifying the target folder for mv with the option -t before the {}; on BSD-based systems such as OSX, you cannot do that, because mv doesn't support -t there, so you'd have to use the terminator \;, which means that mv is called once for every path and is therefore much slower.
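On such systems, a minimal sketch of that slower fallback (again, prepend echo to mv for a dry run) would be:
find . -type f -exec mv {} . \;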
Why your command didn't work:
As @EtanReisner points out in a comment, xargs invokes the command specified without (implicitly) involving a shell, so globbing won't work; you can verify this with the following command:
echo '*' | xargs echo # -> '*' - NO globbing
If we leave the globbing issue aside, additional work would have been necessary to make your xargs command work correctly with folder names with embedded spaces (or other shell metacharacters):
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -n 2 echo mv # NOTE: still won't work due to lack of globbing
Note how the (combined) sed command now produces a single output line '<input-path>'/* ., with the input path enclosed in embedded single-quotes, which is required for xargs to recognize <input-path> as a single argument, even if it contains embedded spaces.
(If your filenames contain single-quotes, you'd have to do more work; also note that since now all arguments for a given dir. are on a single line, you could use xargs -L 1 ....)
Also note how -mindepth 1 (only process paths at the subdirectory level or below) is used to skip processing of . itself.
The only way to make globbing happen is to get the shell involved:
find . -mindepth 1 -type d |
sed -r "s/.*/'&'\/* ./" | # -> '<input-path>'/* . (including single-quotes)
xargs -I {} sh -c 'echo mv {}' # works, but is inefficient
Note the use of xargs' -I option to treat each input line as its own argument ({} is a self-chosen placeholder for the input).
sh -c invokes the (default) shell to execute the resulting command, at which globbing does happen.
However, overall, this is quite inefficient:
A pipeline with 3 segments is used.
A shell instance is invoked for every input path, which in turn calls the mv utility.
Compare this to the efficient find-only solution above, which (typically) creates only 2 processes in total.

Search&Replace into multiple files with the name of the containing folder

I have multiple folders with names :
1_1,1_2,...,2_1,...,
each of these folders contains the same file with the name file.sh. The file has the following form :
job_name=NAME
Partition = Long
I want to use a search&replace command in the terminal (Linux) for all my folders, like for example the following
find . -type f -name "file.sh" -print |xargs sed -i 's/job_name/REPLACED_TEXT/g'
and in the position of the REPLACED_TEXT I want the name of the folder. For example, inside folder 1_1, there will be the file.sh file with the modified form:
job_name=1_1
Partition = Long
I haven't found a solution for that yet.
You didn't specify how many subdirectories you might have to traverse, e.g.
./1_1/file.sh
./1_2/file.sh
./a/b/c/1_1/file.sh
So for this I'll just assume one subdirectory like so:
./1_1/file.sh
./1_2/file.sh
Something like the below should get you started; it's not tested, just written off the top of my head. It's written as a bash script, but you can turn it into one big long command. Make sure to back up your directory first in case the script has unpredictable results.
for i in $(find . -type f -name "file.sh");
do
    subdir=$(echo "$i" | awk -F/ '{print $2}')
    sed -e "s/job_name=NAME/job_name=$subdir/" "$i" > "$i.bak"
    mv "$i.bak" "$i"
done
You can try this line to print all the sed commands you want to execute:
find . -type f -name 'file.sh' | \
sed 's=\(.*\)/\([^/]*\)=sed -i "s/NAME/\1/" \"&\"='
For each file we found, it extracts the name of its directory and creates a sed command able to replace NAME with it.
Output should be something like:
sed -i "s/NAME/1_1/" "1_1/file.sh"
sed -i "s/NAME/1_2/" "1_2/file.sh"
Then, if it looks good to you, you can repeat with the e command for sed, which will make the outer sed execute its result (i.e. inner sed command), like this:
find . -type f -name 'file.sh' | \
sed 's=\(.*\)/\([^/]*\)=sed -i "s/NAME/\1/" \"&\"=e'
# 'e' command added here -------------------------^

Linux command output as a parameter of another command

I would like to pass the output list of elements of a command as a parameter of another command. I have found some other pages:
How to display the output of a Linux command on stdout and also pipe it to another command?
Use output of bash command (with pipe) as a parameter for another command
but they seem to be more complex.
I just would like to copy a file to every result of a call to the Linux find command.
What is wrong here?:
find . -name myFile 2>&1 | cp /home/myuser/myFile $1
Thanks
This is what you want:
find . -name myFile -exec cp /home/myuser/myFile {} ';'
A breakdown / explanation of this:
find: invoking the find command
.: start search from current working directory.
Since no depth flags are specified, this will search recursively for all subfolders
-name myFile: find files with the explicit name myFile
-exec: for the search results, perform additional commands with them
cp /home/myuser/myFile {}: copies /home/myuser/myFile over each result returned by find; think of {} as the place where each search result goes.
';': terminates the command given to -exec; it is quoted so the shell doesn't interpret the semicolon itself.
There are a couple of ways to solve this, depending on whether you need to worry about files with spaces or other special characters in their names.
If none of the filenames have spaces or special characters (they consist only of letters, numbers, dashes, and underscores), then the following is a simple solution that will work. You can use $(command) to execute a command and substitute the results into the arguments of another command. The shell will split the result on spaces, tabs, or newlines; the for loop assigns each value to $f in turn and runs the command on each value.
for f in $(find . -name myFile)
do
cp something $f
done
If you do have spaces or tabs, you could use find's -exec option. You pass -exec command args, putting {} where you want the filename to be substituted, and ending the arguments with a ;. You need to quote the {} and ; so that the shell doesn't interpret them.
find . -name myFile -exec cp something "{}" \;
Sometimes -exec is not sufficient. For example, in this question, they wanted to use Bash parameter expansion to compute the filename. In order to do that, you need to pass -exec bash -c 'your command', but then you will run into quoting problems with the {} substitution. To solve this, you can use -print0 from find to print the results delimited with null characters (which are invalid in filenames), and pipe it to a while read loop that splits parameters on nulls:
find . -name myFile -print0 | (while IFS= read -r -d '' f; do
    cp something "$f"
done)
The pipe will send the output of one program to the input of another. cp does not read from its input stream at the terminal, it merely uses the arguments on the command line.
You want to either use xargs with the pipe or find's exec argument instead of pipes.
find . -name myFile 2>&1 | xargs -I {} cp /home/myuser/myFile {}
Note: the -I {} option defines {} as the placeholder; you could alternatively use some other placeholder if it conflicts with the command to be executed.
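For example, a sketch with a different (arbitrarily chosen) placeholder name:
find . -name myFile | xargs -I DEST cp /home/myuser/myFile DEST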

Unix: traverse a directory

I need to traverse a directory so starting in one directory and going deeper into difference sub directories. However I also need to be able to have access to each individual file to modify the file. Is there already a command to do this or will I have to write a script? Could someone provide some code to help me with this task? Thanks.
The find command is just the tool for that. Its -exec flag or -print0 in combination with xargs -0 allows fine-grained control over what to do with each file.
Example: Replace all foo's by bar's in all files in /tmp and subdirectories.
find /tmp -type f -exec sed -i -e 's/foo/bar/' '{}' ';'
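A roughly equivalent sketch using the -print0 / xargs -0 combination mentioned above, which is safe for file names containing spaces:
find /tmp -type f -print0 | xargs -0 sed -i -e 's/foo/bar/'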
for i in $(find) ; do
    if [ -d "$i" ] ; then do something with a directory ; fi
    if [ -f "$i" ] ; then do something with a file, etc. ; fi
done
This will return the whole tree (recursively) in the current directory in a list that the loop will go through.
This can be easily achieved by mixing find, xargs, sed (or other file modification command).
For example:
$ find /path/to/base/dir -type f -name '*.properties' | xargs sed -i -e '/^#/d'
This will select all files with the .properties file extension.
The xargs command feeds the file paths generated by the find command into the sed command.
The sed command deletes all lines starting with # in those files (as fed to it by xargs).
Command combination in this way is very flexible.
For example, the find command has many different parameters, so you can filter by user name, file size, file path (e.g. under a /test/ subfolder), or file modification time.
Another dimension of flexibility is how and what to change in your files. For example, the sed command allows you to make changes to a file by applying substitutions (specified via regular expressions). Similarly, you could use gzip to compress the files. And so on ...
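As a hedged illustration of that flexibility (the path, size, and age thresholds below are made up), this selects only .log files under a ./test/ subfolder owned by the current user, larger than 1 MB and modified within the last 7 days, and compresses them with gzip:
find ./test -type f -name '*.log' -user "$USER" -size +1M -mtime -7 -exec gzip {} +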
You would usually use the find command. On Linux, you have the GNU version, of course, which has many extra (and useful) options. Either version will allow you to execute a command (e.g. a shell script) on the files as they are found.
The exact details of how to make changes to the file depend on the change you want to make to the file. That is probably best scripted, with find running the script:
POSIX or GNU:
find . -type f -exec your_script '{}' +
This will run your script once per group of files, with those file names provided as arguments. If you want to process one file at a time, replace the + with ';' (or \;).
I am assuming SearchMe is the example directory name you need to traverse completely.
I am also assuming, since it was not specified, that the files you want to modify are all text files. Is this correct?
In such scenario I would suggest using the command:
find SearchMe -type f -exec vi {} \;
If you are not familiar with vi editor, just use another one (nano, emacs, kate, kwrite, gedit, etc.) and it should work as well.
Bash 4+
shopt -s globstar
for file in **
do
    if [ -f "$file" ]; then
        : # do some processing to your file here,
          # where the find command can't do it conveniently
    fi
done
