Linux - Get a line from one file and the corresponding line from a second file and pass both into the cp command

I have two .txt files.
'target.txt' is a list of target files
'destination.txt' is a list (on corresponding lines) of destinations.
I'd like to create a command that does the following:
cp [line 1 from target.txt] [line 1 from destination.txt]
For each line of the files.

paste target.txt destination.txt | sed -e 's/^/cp /' > cp.cmds
Then, after inspecting cp.cmds for correctness, you can just run it as a shell script.
sh cp.cmds

The paste command merges two files by concatenating corresponding lines.
paste target.txt destination.txt | while read target dest; do
cp $target $dest
done
This will not work if any of the filenames contain spaces, though. If that's a requirement, I would use awk to read the first file into an array, then when reading the second file print a cp command with the corresponding lines and quotes around them, and pipe this to sh to execute it.
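A minimal sketch of that awk approach (assuming none of the file names contain double quotes):
awk 'NR == FNR { src[FNR] = $0; next } { printf "cp \"%s\" \"%s\"\n", src[FNR], $0 }' target.txt destination.txt | sh
NR == FNR is true only while the first file is read, so its lines land in the src array; each line of the second file then produces a quoted cp command, which sh executes.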

To handle whitespace in the filenames:
paste -d\\n target.txt destination.txt | xargs -d\\n -n2 -x cp
paste -d\\n interleaves lines of the argument files
xargs -d\\n -n2 reads two complete lines at a time and applies them as two arguments at the end of the command line. The -d flag disables all special processing of quotes, apostrophes and backslashes in the input lines, as well as the eof character (by default _).
The -d command-line option to xargs is a GNU extension. If you are stuck with a POSIX-standard xargs, you can use the following alternative, courtesy of the Open Group (see example 2, near the end of the page):
paste -d\\n target.txt destination.txt |
sed 's/[^[:alnum:]]/\\&/g' |
xargs -E "" -n 2 -x cp
The sed command backslash-escapes every non-alphanumeric character
xargs -E "" disables the end-of-file character handling.

Related

Move a file list based upon grep pattern in command line

I want to pass each line of output from a command as an argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g.:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
For the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
In other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; xargs specifically avoids this by passing only as many arguments as it safely can to cp at a time, and running cp multiple times if necessary.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
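With GNU grep and xargs, NUL delimiters sidestep the newline problem entirely; a sketch, assuming GNU tools:
grep -lZ "pattern" input | xargs -0 -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
grep -Z terminates each file name with a NUL byte instead of a newline, and xargs -0 splits on NUL, so any legal file name passes through intact.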
#!/bin/bash
for f in files; do
if grep -q PATTERN "$f"; then
echo cp -v "$f" "${f}.bac"
fi
done
files can be *.txt or *.text, which means files ending in .txt or .text; replace that with whatever you want or need, and of course replace PATTERN with your pattern. Remove echo once you're satisfied with the output. For a recursive solution take a look at the bash shell option globstar.
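A sketch of the recursive variant:
#!/bin/bash
shopt -s globstar   # bash 4+; makes ** match any depth of subdirectories
for f in **/*.txt; do
if grep -q PATTERN "$f"; then
echo cp -v "$f" "${f}.bac"
fi
done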

How to print File name in the same file?

I have a file with around 5 lines and I want to have the file name printed at the end of every line.
for file in *.txt
do
sed -i "1s/\$/${file%%.*}/" "$file"
done
The above code only writes the file name on the first line; I want to have the file name on every line.
The above code only writes the file name on the first line
This is what the 1 at the beginning of the sed command does: it is an address that selects the lines processed by the command.
In your case, the s command applies only to the first line (because of 1 in front of the command). Remove the 1 from the command and it will apply to all lines of the file:
for file in *.txt
do
sed -i "s/\$/${file%%.*}/" "$file"
done
Read more about sed at https://www.gnu.org/software/sed/manual/sed.html.
Given that you have already learned sed, typing man sed on your terminal will refresh your memory about its commands.
This is a bit hacky, but it does the trick (bash):
filename=<filename>; len=$(wc -l $filename | cut -f1 -d' '); for i in $(seq $len); do echo $filename; done | paste $filename -
And this is cleaner, but needs python installed:
python -c "import sys; print('\n'.join(line.rstrip() + '\t' + sys.argv[1] for line in open(sys.argv[1])))" <filename>
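For what it's worth, awk can also do this directly, since it knows the current input file as FILENAME (a sketch; the sub() call strips the extension the same way ${file%%.*} does above):
for file in *.txt; do
awk '{ name = FILENAME; sub(/\..*$/, "", name); print $0 name }' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done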

Linux command to replace set of lines for a group of files under a directory

I need to replace the first 4 header lines of only 250 selected Erlang files (with extension .erl), but there are 400 Erlang files in total in the directory and its subdirectories; I need to avoid modifying the files which don't need the change.
I have the list of file names that are to be modified, but I don't know how to make my Linux command use them.
sed -i '1s#.*#%% This Source Code Form is subject to the terms of the Mozilla Public#' *.erl
sed -i '2s#.*#%% License, v. 2.0. If a copy of the MPL was not distributed with this file,#' *.erl
sed -i '3s#.*#%% You can obtain one at http://mozilla.org/MPL/2.0/.#' *.erl
sed -i '4s#.*##' *.erl
In the above commands, instead of passing *.erl I want to pass the list of file names which I need to modify; doing that one by one would take me more than 3 days to complete.
Is there any way to do this?
Iterate over the shortlisted file names using awk and use xargs to run sed on them. You can apply multiple sed commands to a file using the -e option.
awk '{print $1}' your_shortlisted_file_lists | xargs sed -i -e 'first_sed' -e 'second_sed'
xargs appends the file names produced by awk to the end of the sed command line.
Try this:
< file_list.txt xargs -L 1 sed -i -e 'first_cmd' -e 'second_cmd' ...
Not answering your question, but a suggestion for improvement: four sed invocations to replace the header are inefficient. I would instead write the new header into a file and do the following
sed -i -e '1,3d' -e '4{r header' -e 'd}' file
which replaces the first four lines of the file with the contents of the header file.
Another concern with your current s### approach is that you have to watch for the special characters \, & and your delimiter # in the replacement text.
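To apply the header-file version to only the shortlisted files, loop over the list (a sketch, assuming the names are in file_list.txt, one per line):
while IFS= read -r file; do
sed -i -e '1,3d' -e '4{r header' -e 'd}' "$file"
done < file_list.txt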
You can apply the sed c (for change) command to each file of your list:
while read file; do
sed -i '1,4 c\
%% This Source Code Form is subject to the terms of the Mozilla Public\
%% License, v. 2.0. If a copy of the MPL was not distributed with this file,\
%% You can obtain one at http://mozilla.org/MPL/2.0/.\
' "$file"
done < filelist
Let's say you have a file called file_list.txt with all file names as content:
file1.txt
file2.txt
file3.txt
file4.txt
You can simply read all lines into a variable (here: files) and then iterate through each one:
files=`cat file_list.txt`
for file in $files; do
echo "do something with $file"
done
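Note that word-splitting a variable like this breaks on file names containing spaces; reading the list line by line is safer:
while IFS= read -r file; do
echo "do something with $file"
done < file_list.txt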

clean letters and characters in files leaving only numbers using bash

I am reading files and I am doing something like:
cat file | sed s/\ //g |awk '$0 !~ /[^0-9]/'
With this line I want to remove everything that is not a number.
But I have a problem: when the file is not sorted the command works fine, but with a sorted file it does not work; the output is empty.
Can anyone help me?
Using grep -o '[0-9]+' does not work because:
I have a file like:
311435ll3e
kk13322;.
erre433
The output is:
311435
3
13322
433
The 3 ends up on a line of its own instead of staying with the digits from the same input line; the output that I need is:
3114353
13322
433
As a general rule, there is no reason to have both awk and sed appearing in the same pipe, due to a large overlap of capability, and frequently the same is true of awk/grep/sed combinations.
If you just want to suppress the non-digit characters within lines of characters, use (e.g.) sed -e 's/[^0-9]//g' file, or if you want to do it in place with no backup, sed -i -e 's/[^0-9]//g' file, or in place with a backup to a .bak file, sed -i.bak -e 's/[^0-9]//g' file.
To suppress blank lines, you can append |egrep -v '^$' after the sed, but it's more efficient to just use sed's d command to delete the pattern space and start next cycle if the pattern space is empty. For example,
sed -e 's/[^0-9]//g; /^$/d' file
does a d if the line is empty after substitution.
The form suggested in 1_CR's comment,
sed -e 's/[^0-9]//g' -e '/./!d'
is an alternative. That form tests if the line has at least one character in it, and if so does not do a d.
If you want to suppress everything in the file that's not digits, use tr -cd 0-9 < file. This suppresses line feeds also.
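If you want to drop everything except digits but keep the line structure, include the newline in the set, e.g. tr -cd '0-9\n' < file. Lines that contained no digits at all will then remain as empty lines, which the sed d command above would remove.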
Note, the form tr -cd [0-9] < file or tr -cd '[0-9]' < file is not correct; it will fail to suppress ] and [ characters because tr will regard them as part of SET1.

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified, and it is irrelevant whether the first n characters actually contain the matching string.
grep -l pattern *.txt |
while read line; do
echo -n "$line: ";
head -c $n "$line";
echo;
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed provide an option, like -r (GNU/Linux) and -E (FreeBSD), that enables modern-style regular expressions. This makes it possible to specify the number of characters you want to print numerically.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
The extra /dev/null argument forces grep to print the file name prefix even when only one file matches the glob. Note that this solution is a lot more efficient than others proposed here, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$1"
shift
grep -l "$pattern" "$@" |
while read file
do
echo "$file:" $(dd if="$file" bs=1 count="$n" 2>/dev/null)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that #litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd be pretty archaic). I note that the POSIX version of head only supports -n and a number of lines; the -c option is probably a GNU extension.
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
grep targetToMatch ${FILE} > /dev/null
if ( $status == 0 ) then
echo -n "${FILE}: "
head -c25 ${FILE}
endif
end
2) GNU [FSF] head contains a --verbose [-v] switch, and GNU grep and xargs offer --null, to accommodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --
