How to move files where the first line contains a string? - linux

I am currently using the following command:
grep -l -Z -E '.*?FindMyRegex' /home/user/folder/*.csv | xargs -0 -I{} mv {} /home/destination/folder
This works fine. The problem is it uses grep on the entire file.
I would like to use the grep command on the FIRST line of the file only.
I have tried putting head -1 file | at the beginning, but it did not work.

A change I would make to your script: piping head's output into grep -l won't work here, because grep -l on standard input prints "(standard input)" instead of the file name, so xargs would never see the real path. Test the first line with grep -q instead and move the file on success:
for file in /home/user/folder/*.csv; do
    head -1 "$file" | grep -qE '.*?FindMyRegex' && mv "$file" /home/destination/folder
done

You can also try sed '1q' file.csv | grep ... to search for the regexp in the first line only.

You don't need grep or find, as long as your files don't have embedded newlines.
I don't know an easy way off the top of my head to get sed to delimit with nulls.
mv $( for f in /home/user/folder/*.csv; do
    sed -ns '1 { /yourPattern/F; q; }' "$f"
done ) /home/destination/folder/
EDIT
Rewrote with a loop. This will run a separate instance of sed to check each file, but at least it shouldn't read beyond the first line. It will fail with a usage error from mv if there are no hits (see the guarded variant after the option notes below).
You might need -E depending on your regex.
-n says don't print records from the files.
-s says treat each file as a distinct input - this is so the filenames aren't always the first one.
This does require GNU sed for the F.
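To guard against the no-hits case, you could capture the list first (a sketch; $files is deliberately left unquoted so the shell splits it into one argument per file name, which still breaks on names containing whitespace):
files=$( for f in /home/user/folder/*.csv; do
    sed -ns '1 { /yourPattern/F; q; }' "$f"
done )
[ -n "$files" ] && mv $files /home/destination/folder/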

gawk 'FNR==1{if($0~/PATTERN/) printf "mv %s %s\n", FILENAME, "/target"; nextfile}' /path/*.csv
First of all, in your regex .*?FindMyRegex, the leading .*? doesn't make any sense and can be removed.
The above awk (gawk) one-liner builds up mv file target command lines for you. You can check them, and if you are satisfied with them, pipe the output to sh to execute the commands.
Replace PATTERN with your regex pattern, and /target with the real target dir.
The one-liner assumes that the filenames don't contain special characters (e.g. spaces); if they do, add quotes around the names in the mv command, as sketched below.
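For example, a sketch of the quoted variant, piped straight to sh:
gawk 'FNR==1{if($0~/PATTERN/) printf "mv \"%s\" \"%s\"\n", FILENAME, "/target"; nextfile}' /path/*.csv | sh
This still breaks on names that themselves contain double quotes; the NUL-delimited xargs approach below is more robust.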

Using GNU awk to find the filenames and piping them NUL-delimited into xargs:
gawk -v pattern="myRegex" '
FNR == 1 {if ($0 ~ pattern) printf "%s\0", FILENAME; nextfile}
' *.csv | xargs -0 echo mv -t destination
If it looks OK, remove "echo"

Try this Shellcheck-clean Bash code:
#! /bin/bash
shopt -s nullglob   # Globs that match nothing expand to nothing
shopt -s dotglob    # Globs match files whose names start with '.'
dest=/home/destination/folder
for file in *.csv ; do
    head -n 1 -- "$file" | grep -qE '.*?FindMyRegex' && mv -- "$file" "$dest"
done
shopt -s nullglob prevents an error if there are no .csv files in the directory.
shopt -s dotglob ensures that files whose name starts with '.' are handled.
The -- in the options for head and mv ensures that files whose names begin with - are handled correctly.
The quotes in "$file" and "$dest" ensure that names that contain whitespace (actually $IFS) characters (including newlines) or glob metacharacters are handled correctly.
Note that the .*? in the regular expression is probably redundant, and may not do what you think it does (grep -E doesn't do non-greedy matching).

Related

Move a file list based upon grep pattern in command line [duplicate]

I want to pass each line of output from a command as an argument to a second command, e.g.:
grep "pattern" input
returns:
file1
file2
file3
and I want to copy these outputs, e.g.:
cp file1 file1.bac
cp file2 file2.bac
cp file3 file3.bac
How can I do that in one go? Something like:
grep "pattern" input | cp $1 $1.bac
You can use xargs:
grep 'pattern' input | xargs -I% cp "%" "%.bac"
You can use $() to interpolate the output of a command. So, you could use kill -9 $(grep -hP '^\d+$' $(ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }')) if you wanted to.
In addition to Chris Jester-Young's good answer, I would say that xargs is also a good solution for these situations:
grep ... `ls -lad ... | awk '{ print $9 }'` | xargs kill -9
will do it. All together:
grep -hP '^\d+$' `ls -lad /dir/*/pid | grep -P '/dir/\d+/pid' | awk '{ print $9 }'` | xargs kill -9
For completeness, I'll also mention command substitution and explain why this is not recommended:
cp $(grep -l "pattern" input) directory/
(The backtick syntax cp `grep -l "pattern" input` directory/ is roughly equivalent, but it is obsolete and unwieldy; don't use that.)
This will fail if the output from grep produces a file name which contains whitespace or a shell metacharacter.
Of course, it's fine to use this if you know exactly which file names the grep can produce, and have verified that none of them are problematic. But for a production script, don't use this.
Anyway, for the OP's scenario, where you need to refer to each match individually and add an extension to it, the xargs or while read alternatives are superior anyway.
In the worst case (meaning problematic or unspecified file names), pass the matches to a subshell via xargs:
grep -l "pattern" input |
xargs -r sh -c 'for f; do cp "$f" "$f.bac"; done' _
... where obviously the script inside the for loop could be arbitrarily complex.
In the ideal case, the command you want to run is simple (or versatile) enough that you can simply pass it an arbitrarily long list of file names. For example, GNU cp has a -t option to facilitate this use of xargs (the -t option allows you to put the destination directory first on the command line, so you can put as many files as you like at the end of the command):
grep -l "pattern" input | xargs cp -t destdir
which will expand into
cp -t destdir file1 file2 file3 file4 ...
for as many matches as xargs can fit onto the command line of cp, repeated as many times as it takes to pass all the files to cp. (Unfortunately, this doesn't match the OP's scenario; if you need to rename every file while copying, you need to pass in just two arguments per cp invocation: the source file name and the destination file name to copy it to.)
So in other words, if you use the command substitution syntax and grep produces a really long list of matches, you risk bumping into ARG_MAX and "Argument list too long" errors; but xargs will specifically avoid this by instead copying only as many arguments as it can safely pass to cp at a time, and running cp multiple times if necessary instead.
The above will still work incorrectly if you have file names which contain newlines. Perhaps see also https://mywiki.wooledge.org/BashFAQ/020
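In that case, a NUL-delimited pipeline is safer; a sketch, assuming GNU grep (-Z) and GNU xargs (-0, -r):
grep -lZ "pattern" input |
xargs -0 -r sh -c 'for f; do cp -- "$f" "$f.bac"; done' _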
#!/bin/bash
for f in files; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done
Here files is a placeholder: it can be *.txt or *.text (i.e. files ending in .txt or .text), or whatever glob you want/need; likewise, replace PATTERN with yours. Remove echo if you're satisfied with the output. For a recursive solution, take a look at the bash shell option globstar, sketched below.
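A recursive sketch using globstar (assumes bash 4+; PATTERN and the *.txt glob are placeholders, as above):
#!/bin/bash
shopt -s globstar nullglob
for f in **/*.txt; do
    if grep -q PATTERN "$f"; then
        echo cp -v "$f" "${f}.bac"
    fi
done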

How to not escape spaces and backslashes in echo and while in bash?

I'm passing two positional args to a script; both args are paths. The problem is that sometimes a path looks like m i sc . . . . .. . . with dots and spaces in it, and sometimes there is even a backslash in directory names.
So I tried to get the arguments in two ways: directly, and via the positional-parameter list.
SOURCE_ARG=$1
DESTINATION_ARG=$2
and
ARG_COUNT=0
for POSITIONAL_ARGUMENTS in "${@}"
do
    ((ARG_COUNT++))
    ARGUMENT_ARRAY[$ARG_COUNT]=$POSITIONAL_ARGUMENTS
done
In the loop, I iterate over the output of the commands that the paths are forwarded to.
while IFS= read -r dir
do
    echo "${ARGUMENT_ARRAY[1]}"
    echo "${dir}"
    while IFS= read -r item
    do
        # do some stuff
    done < <(ls -A "$dir"/)
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
When I use echo "${ARGUMENT_ARRAY[1]}" I get the path exactly as I need it, but when I use the loop iteration variable with echo "${dir}", all the spaces come out escaped, so the other commands cannot do their jobs on that path.
What I'm asking is: how can I get $dir within the loop to come out like echo "${ARGUMENT_ARRAY[1]}" above, i.e. with all spaces and backslashes intact?
Thanks to @Barmar in the comments.
The only reason the filenames come out escaped is that du prints them with escapes, so the $dir variable receives the already-escaped form and the original special characters are no longer available to the rest of the loop.
Now that we know the problem was caused by using du in my script:
while IFS= read -r dir
do
    # do sth
done < <(du -hP "$SOURCE_ARG" | awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g")
We can change the du to find and the problem is solved:
while IFS= read -r dir
do
    # do sth
done < <(find "$SOURCE_ARG" -type d)
PS 1:
Another problem, which surfaced when I wanted to print the lines to check whether they were OK (i.e. while debugging), was with echo itself.
So be sure to try printf "%s\n" "$dir" instead of echo, as some versions of echo process escape sequences.
echo "${dir}"
printf "%s\n" "$dir"
PS 2:
Also, if a filename has more than one space in a row, the way I used awk collapses them into a single space:
awk '{$1=""; print $0}' | grep -v "^.$" | sed "s/^ //g"
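If directory names may even contain newlines, a NUL-delimited loop (a sketch pairing find -print0 with read -d '') is safer than the line-based loop above:
while IFS= read -r -d '' dir
do
    # do sth
done < <(find "$SOURCE_ARG" -type d -print0)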

How to apply my sed command to some lines of all my files?

I've 95 files that look like:
2019-10-29-18-00/dev/xx;512.00;0.4;/var/x/xx/xxx
2019-10-29-18-00/dev/xx;512.00;0.68;/xx
2019-10-29-18-00/dev/xx;512.00;1.84;/xx/xx/xx
2019-10-29-18-00/dev/xx;512.00;80.08;/opt/xx/x
2019-10-29-18-00/dev/xx;20480.00;83.44;/var/x/x
2019-10-29-18-00/dev/xx;3584.00;840.43;/var/xx/x
2019-10-30-00-00/dev/xx;2048.00;411.59;/
2019-10-30-00-00/dev/xx;7168.00;6168.09;/usr
2019-10-30-00-00/dev/xx;3072.00;1036.1;/var
2019-10-30-00-00/dev/xx;5120.00;348.72;/tmp
2019-10-30-00-00/dev/xx;20480.00;2033.19;/home
2019-10-30-12-00;/dev/xx;5120.00;348.72;/tmp
2019-10-30-12-00;/dev/hd1;20480.00;2037.62;/home
2019-10-30-12-00;/dev/xx;512.00;0.43;/xx
2019-10-30-12-00;/dev/xx;3584.00;794.39;/xx
2019-10-30-12-00;/dev/xx;512.00;0.4;/var/xx/xx/xx
2019-10-30-12-00;/dev/xx;512.00;0.68;/xx
2019-10-30-12-00;/dev/xx;512.00;1.84;/var/xx/xx
2019-10-30-12-00;/dev/xx;512.00;80.08;/opt/xx/x
2019-10-30-12-00;/dev/xx;20480.00;83.44;/var/xx/xx
2019-10-30-12-00;/dev/x;3584.00;840.43;/var/xx/xx
For some lines I've 2019-10-29-18-00/dev and for some other lines, I've 2019-10-30-12-00;/dev/
I want to add the ; before the /dev/ where it is missing, so for that I use this sed command:
sed 's/\/dev/\;\/dev/'
But how can I apply this command to each line where the ; is missing? I tried this:
for i in $(cat /home/xxx/xxx/xxx/*.txt | grep -e "00/dev/")
do
    sed 's/\/dev/\;\/dev/' $i > $i
done
But it doesn't work... Can you help me?
Could you please try the following with GNU awk, if you are OK with it.
awk -i inplace '/00\/dev\//{gsub(/00\/dev\//,"00;/dev/")} 1' *.txt
sed solution: tested with GNU sed on a few files and it worked fine.
sed -i.bak '/00\/dev/s/00\/dev/00\;\/dev/g' *.txt
This might work for you (GNU sed & parallel):
parallel -q sed -i 's#;*/dev#;/dev#' ::: *.txt
or if you prefer:
sed -i 's#;*/dev#;/dev#' *.txt
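A quick way to see what the ;* idiom does (a sketch using two of the sample lines):
printf '%s\n' '2019-10-29-18-00/dev/xx' '2019-10-30-12-00;/dev/xx' | sed 's#;*/dev#;/dev#'
Both lines come out with exactly one semicolon before /dev: the ;* matches zero or more semicolons, so lines that already have the separator are left unchanged.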
Ignore lines with ;/dev.
sed '/;\/dev/{p;d}; s^/dev^;/dev^'
The /;\/dev/ checks if the line has ;/dev. If it does, do: p - print the current line, and d - delete it and start the next cycle.
You can use any character as the delimiter of the s command in sed. Also, there is no need to escape the semicolon: \; can be written as just ;.
how can I apply this command to each line where the ; is missing? I tried this
Don't edit a file by redirecting to the same file ($i > $i). Think about it: how can you rewrite and read from the same file at the same time? You can't; the resulting file will in most cases be empty, because the > $i redirection "executes" first, truncating the file, and only then does sed $i start running, reading an empty file. Use a temporary file (sed ... "$i" > temp.txt; mv temp.txt "$i") or use the GNU extension option -i to sed to edit in place.
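A quick demonstration of that truncation (a sketch using a throwaway demo.txt):
printf 'hello\n' > demo.txt
sed 's/hello/world/' demo.txt > demo.txt
wc -c demo.txt   # reports 0 demo.txt
The shell opens and truncates demo.txt before sed ever reads it, so the file ends up empty.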
What you really want to do is:
grep -l '00/dev/' /home/xxx/xxx/xxx/*.txt |
xargs -n1 sed -i '/;\/dev/{p;d}; s^/dev^;/dev^'
grep -l prints the list of files that match the pattern, then xargs runs sed on each single file (-n1), and sed -i edits the files in place.
The grep filtering can be eliminated in your case; we can accomplish the task with sed alone:
for f in $(cat /home/xxx/xxx/xxx/*.txt)
do
    [[ -f "$f" ]] && sed -Ei '/00\/dev/ s/([^;])(\/dev)/\1;\2/' "$f"
done
The easiest way would be to adjust your regex so that it's looking a bit wider than '/dev/', e.g.
sed -i -E 's|([0-9])/dev|\1;/dev|'
(note that I'm taking advantage of sed's flexible approach to delimiters on substitute. Also, -E changes the group syntax)
Alternatively, sed lets you filter which lines it handles:
sed -i '/[0-9]\/dev/ s/\/dev/;\/dev/'
This uses the same substitution you already have but only applied on lines that match the filter regex

Linux: Removing files that don't contain all the words specified

Inside a directory, how can I delete files that lack any of the words specified, so that only files that contain ALL the words are left? I tried to write a simple bash shell script using grep and rm commands, but I got lost. I am totally new to Linux; any help would be appreciated.
How about:
grep -L foo *.txt | xargs rm
grep -L bar *.txt | xargs rm
If a file does not contain foo, then the first line will remove it.
If a file does not contain bar, then the second line will remove it.
Only files containing both foo and bar should be left
-L, --files-without-match
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.
See also @Mykola Golubyev's post for placing this in a loop.
list='Word1 Word2 Word3 Word4 Word5'
for word in $list
do
    grep -L "$word" *.txt | xargs rm
done
In addition to the answers above: use the newline character as the xargs delimiter to handle file names with spaces!
grep -L "$word" "$file" | xargs -d '\n' rm
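If the names may contain newlines too, GNU grep's -Z option prints NUL-terminated file names that pair with xargs -0 (a sketch, assuming GNU grep and xargs):
grep -LZ "$word" *.txt | xargs -0 -r rm --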
grep -L word *.txt | xargs rm
To do the same by matching filenames (not the contents of files, as most of the solutions above do), you can use the following:
for file in `ls --color=never | grep -ve "\(foo\|bar\)"`
do
    rm $file
done
As per comments:
for file in `ls`
shouldn't be used. The below does the same thing without using ls:
for file in *
do
    if [ x`echo $file | grep -ve "\(test1\|test3\)"` == x ]; then
        rm $file
    fi
done
The -ve reverses the search for the regexp pattern for either foo or bar in the filename.
Any further words to be added to the list need to be separated by \|
e.g. one\|two\|three
First, remove the file-list:
rm flist
Then, for each of the words, add the file to the filelist if it contains that word:
grep -l WORD * >>flist
Then sort, uniqify and get a count:
sort flist | uniq -c >flist_with_count
All those files in flist_with_count whose count is less than the number of words should be deleted. The format will be:
2 file1
7 file2
8 file3
8 file4
If there were 8 words, then file1 and file2 should be deleted. I'll leave the writing/testing of the script to you.
Okay, you convinced me, here's my script:
#!/bin/bash
rm -rf flist
for word in fopen fclose main ; do
    grep -l ${word} *.c >>flist
done
rm $(sort flist | uniq -c | awk '$1 != 3 {print $2}')
This removes the files in the directory that didn't have all three words.
You could try something like this but it may break
if the patterns contain shell or grep meta characters:
(in this example one two three are the patterns)
for f in *; do
    cmd=true
    for p in one two three; do
        cmd="fgrep \"$p\" \"$f\" && $cmd"
    done
    eval "$cmd" >/dev/null || rm "$f"
done
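A variant without eval (a sketch) sidesteps the metacharacter problem by never building a command string:
for f in *; do
    ok=1
    for p in one two three; do
        fgrep -q "$p" "$f" || { ok=0; break; }
    done
    [ "$ok" -eq 1 ] || rm "$f"
done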
This will remove all files that contain neither the word Ping nor Sent:
grep -L 'Ping\|Sent' * | xargs rm

Linux using grep to print the file name and first n characters

How do I use grep to perform a search which, when a match is found, will print the file name as well as the first n characters in that file? Note that n is a parameter that can be specified and it is irrelevant whether the first n characters actually contains the matching string.
grep -l pattern *.txt |
while read line; do
    echo -n "$line: "
    head -c $n "$line"
    echo
done
Change -c to -n if you want to see the first n lines instead of bytes.
You need to pipe the output of grep to sed to accomplish what you want. Here is an example:
grep mypattern *.txt | sed 's/^\([^:]*:.......\).*/\1/'
The number of dots is the number of characters you want to print. Many versions of sed often provide an option, like -r (GNU/Linux) and -E (FreeBSD), that allows you to use modern-style regular expressions. This makes it possible to specify numerically the number of characters you want to print.
N=7
grep mypattern *.txt /dev/null | sed -r "s/^([^:]*:.{$N}).*/\1/"
Note that this solution is a lot more efficient than the others proposed, which invoke multiple processes.
There are few tools that print 'n characters' rather than 'n lines'. Are you sure you really want characters and not lines? The whole thing can perhaps be best done in Perl. As specified (using grep), we can do:
pattern="$1"
shift
n="$1"
shift
grep -l "$pattern" "$@" |
while read file
do
    echo "$file:" $(dd if="$file" bs=1 count="$n" 2>/dev/null)
done
The quotes around $file preserve multiple spaces in file names correctly. We can debate the command line usage, currently (assuming the command name is 'ngrep'):
ngrep pattern n [file ...]
I note that @litb used 'head -c $n'; that's neater than the dd command I used. There might be some systems without head (but they'd be pretty archaic). I note that the POSIX version of head only supports -n and a number of lines; the -c option is probably a GNU extension.
Two thoughts here:
1) If efficiency was not a concern (like that would ever happen), you could check $status [csh] after running grep on each file. E.g.: (For N characters = 25.)
foreach FILE ( file1 file2 ... fileN )
    grep targetToMatch ${FILE} > /dev/null
    if ( $status == 0 ) then
        echo -n "${FILE}: "
        head -c25 ${FILE}
    endif
end
2) GNU [FSF] head contains a --verbose [-v] switch. It also offers --null, to accommodate filenames with spaces. And there's '--', to handle filenames like "-c". So you could do:
grep --null -l targetToMatch -- file1 file2 ... fileN |
xargs --null head -v -c25 --
