I have a directory with many text files. I want to search for a given string in specific lines of the files (like searching for 'abc' in only the 2nd and 3rd line of each file). Then, when I find a match, I want to print line 1 of the matching file.
My approach: I'm doing a grep search with the -n option, storing the output in a different file, and then searching that file for the line number. Then I'm trying to get the file name and print its first line.
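Roughly, what I'm doing looks like this (the pattern 'abc', the *.txt glob, and the matches.txt scratch file are just placeholders):
# Record file:line-number:matched-line for every match (grep prefixes names when given more than one file)
grep -n 'abc' *.txt > matches.txt
# Keep only matches on line 2 or 3 and print line 1 of each matching file
while IFS=: read -r file lineno rest; do
    if [ "$lineno" = 2 ] || [ "$lineno" = 3 ]; then
        head -n1 "$file"
    fi
done < matches.txt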
Using this approach I'm not able to get the name of the right file, and even if I could, the approach is very lengthy.
Is there a better and faster solution?
Eg.
1.txt
file 1
one
two
2.txt
file 2
two
three
I want to search for "two" in line 2 of each file using grep and then print the first line of the file with a match. In this example that would be 2.txt, and the output should be "file 2".
I know it is easier using sed/awk but is there any way to do this using grep?
Use sed instead (GNU sed):
parse.sed
1h                # Save the first line to hold space
2,3 {             # On lines 2 and 3
  /my pattern/ {  # Match `my pattern`
    x             # If there is a match bring back the first line
    p             # and print it
    :a; n; ba     # Loop to the end of the file
  }
}
Run it like this:
sed -snf parse.sed file1 file2 ...
Or as a one-liner:
sed -sn '1h; 2,3 { /my pattern/ { x; p; :a; n; ba; } }' file1 file2 ...
You might want to emit the filename as well, e.g. with your example data:
parse2.sed
1h                # Save the first line to hold space
2,3 {             # On lines 2 and 3
  /two/ {         # Match `two`
    F             # Output the filename of the file currently being processed
    x             # If there is a match bring back the first line
    p             # and print it
    :a; n; ba     # Loop to the end of the file
  }
}
Run it like this:
sed -snf parse2.sed file1 file2 | paste -d: - -
Output:
file1:file 1
file2:file 2
$ awk 'FNR==2{if(/one/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 1
$ awk 'FNR==2{if(/two/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 2
FNR holds the line number within the current file being read
use FNR>=2 && FNR<=3 if you need a range of lines (see the sketch after this list)
FNR==1{line=$0} saves the contents of the first line for later use
nextfile should be supported by most implementations, but the solution will still work (more slowly, though) if you need to remove it
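For example, a sketch of the range variant against the sample files from the question (note that 1.txt matches too, since its line 3 is two):
$ awk 'FNR==1{line=$0} FNR>=2 && FNR<=3 && /two/{print line; nextfile}' 1.txt 2.txt
file 1
file 2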
With grep and bash:
# Grep for a pattern and print filename and line number
grep -Hn one file[12] |
# Loop over matches where f=filename, n=match-line-number and s=matched-line
while IFS=: read f n s; do
# If match was on line 2 or line 3
# print the first line of the file
(( n == 2 || n == 3 )) && head -n1 "$f"
done
Output:
file 1
Only using grep, cut and | (pipe):
grep -rnw pattern dir | grep ":line_num:" | cut -d':' -f 1
Explanation
grep -rnw pattern dir
It returns the name of the file(s) where the pattern was found, along with the line number.
Its output will be something like this:
path/to/file/file1(.txt):8:some pattern 1
path/to/file/file2(.txt):4:some pattern 2
path/to/file/file3(.txt):2:some pattern 3
Now I'm using another grep to get the file with the right line number (e.g. a file that contains the pattern on line 2):
grep -rnw pattern dir | grep ":2:"
Its output will be:
path/to/file/file3(.txt):2:line
Now I'm using cut to get the filename
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1
It will output the file name like this
path/to/file/file3(.txt)
P.S. - If you want to remove the "path/to/file/" prefix from the filename, you can use rev, then cut, then rev again. You can try this yourself or see the code below.
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1 | rev | cut -d'/' -f 1 | rev
Related
I would like to match all lines from a file containing a word, and take all lines under until reaching two newline characters in a row.
I have the following sed code to cut and paste specific lines, but not subsequent lines:
sed 's|.*|/\\<&\\>/{w results\nd}|' teststring | sed -file.bak -f - testfile
How could I modify this to take all subsequent lines?
For example, say I wanted to match lines with 'dog', the following should take the first 3 lines of the 5:
The best kind of an animal is a dog, for sure
-man's best friend
-related to wolves
Racoons are not cute
Is there a way to do this?
This should do:
awk '/dog/ {f=1} /^$/ {f=0} f {print > "new"} !f {print > "tmp"}' file && mv tmp file
It sets f to true if the word dog is found, and back to false if a blank line is found.
If f is true, print to the new file.
If f is false, print to the tmp file.
Finally, move the tmp file over the original file.
Edit: Can be shortened a bit:
awk '/dog/ {f=1} /^$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file
Edit 2: as requested, add a blank line before every section in the new file:
awk '/dog/ {f=1;print ""> "new"} /^$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file
If the original file contains tabs or spaces instead of just a blank line after each dog section, change /^$/ to /^[ \t]*$/.
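For example, the whitespace-tolerant variant of the shortened command would be:
awk '/dog/ {f=1} /^[ \t]*$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file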
This might work for you (GNU sed):
sed 's|.*|/\\<&\\>/ba|' stringFile |
sed -f - -e 'b;:a;w resultFile' -e 'n;/^$/!ba' file
Build a set of regexps from the stringFile and send matches to :a. Then write the matched line and any further lines until an empty line (or end of file) to the resultFile.
N.B. The results could be sent directly to resultFile, using:
sed 's#.*#/\\<&\\>/ba#' stringFile |
sed -nf - -e 'b;:a;p;n;/^$/!ba' file > resultFile
To cut the matches from the original file use:
sed 's|.*|/\\<&\\>/ba|' stringFile |
sed -f - -e 'b;:a;N;/\n\s*$/!ba;w resultFile' -e 's/.*//p;d' file
Is this what you're trying to do?
$ awk -v RS= '/dog/' file
The best kind of an animal is a dog, for sure
-man's best friend
-related to wolves
Could you please try the following.
awk '/dog/{count="";found=1} found && ++count<4' Input_file > temp && mv temp Input_file
I have files containing below format, generated by another system
12;453453;TBS;OPPS;
12;453454;TGS;OPPS;
12;453455;TGS;OPPS;
12;453456;TGS;OPPS;
20;787899;THS;CLST;
33;786789;
I have to check whether the last line starts with 33; if it does, I have to copy the file/files to the other location, else discard the file.
Currently I am doing it as below:
tail -1 abc.txt >> c.txt
awk '{print substr($0,0,2)}' c.txt
Then the output is saved to another variable and the copy is done based on that.
Can anyone suggest a simpler way?
Thank you!
R/
Imagine you have the following input file:
$ cat file
a
b
c
d
e
agc
Then you can run the following commands (awk, sed, grep, cut, or a bash substring) to get the first 2 chars of the last line:
AWK
$ awk 'END{print substr($0,0,2)}' file
ag
SED
$ sed -n '$s/^\(..\).*/\1/p' file
ag
GREP
$ tail -1 file | grep -oE '^..'
ag
CUT
$ tail -1 file | cut -c '1-2'
ag
BASH SUBSTRING
line=$(tail -1 file); echo ${line:0:2}
All of those commands do what you are looking for. The awk command does the operation on the last line of the file itself, so you do not need tail any more. The sed command extracts the last line of the file into its pattern buffer, replaces everything that is not the first 2 chars with nothing, and then prints the pattern buffer (the first 2 chars of the last line). Another solution is just to tail the last line of the file and extract the first 2 chars using grep. By piping commands like this you can also do it in one step without using intermediate variables or files.
Now if you want to put everything in one script this becomes:
$ more file check_2chars.sh
::::::::::::::
file
::::::::::::::
a
b
c
d
e
33abc
::::::::::::::
check_2chars.sh
::::::::::::::
#!/bin/bash
s1=$(tail -1 file | cut -c 1-2) #you can use other commands from this post
s2=33
if [ "$s1" == "$s2" ]
then
echo "match" #implement the copy/discard logic
fi
Execution:
$ ./check_2chars.sh
match
I will let you implement the copy/discard logic
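For example, a minimal sketch of that logic, assuming "discard" means delete and /path/to/dest is a placeholder destination:
#!/bin/bash
s1=$(tail -1 file | cut -c 1-2)
if [ "$s1" == "33" ]
then
    cp file /path/to/dest/   # last line starts with 33: copy the file
else
    rm file                  # otherwise discard it
fi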
Given the task of either copying or deleting files based on their contents, shell variables aren't necessary.
Using sed's F (file name) command and xargs, the whole task can be done in just one line:
find | xargs -l sed -n '${/^33/!F}' | xargs -r rm ; cp * dest/dir/
Or preferably, with GNU sed:
sed -sn '${/^33/!F}' * | xargs -r rm ; cp * dest/dir/
Or if all the filenames contain no whitespace:
rm -r $(sed -sn '${/^33/!F}' *) ; cp * dest/dir/
That assumes all the files in the current directory are to be tested.
sed looks at the last line ($) of every file, and runs what's in the curly braces.
If any of those last lines do not begin with 33 (/^33/!), sed outputs just those unwanted file names (F).
Supposing the unwanted files are named foo and baz -- those are piped to xargs which runs rm foo baz.
At this point the only files left should be copied over to dest/dir/: cp * dest/dir/.
It's efficient: cp and rm each need to be run only once.
If a shell variable must be used, here are two more methods:
Using tail and bash, store first two chars of the last line to $n:
n="$(tail -1 abc.txt)" n="${n:0:2}"
Here's a more portable POSIX shell version:
n="$(tail -1 abc.txt)" n="${n%${n#??}}"
You may explicitly test with sed for the last line ($) starting with 33 (/^33.*/):
echo " 12;453453;TBS;OPPS;
12;453454;TGS;OPPS;
12;453455;TGS;OPPS;
12;453456;TGS;OPPS;
20;787899;THS;CLST;
33;786789;" | sed -n "$ {/^33.*/p}"
33;786789;
If you store the result in a variable, you may test it for being empty or not:
lastline33=$(echo " 12;453453;TBS;OPPS;
12;453454;TGS;OPPS;
12;453455;TGS;OPPS;
12;453456;TGS;OPPS;
20;787899;THS;CLST;
33;786789;" | sed -n "$ {/^33.*/p}")
echo $(test -n "$lastline33" && echo not null || echo null)
not null
You probably want the regular expression to contain the semicolon, because otherwise it would also match 330, 331, ... 339, 33401345 and so on. Maybe that can be ruled out by the context, but to me it seems a good idea:
lastline33=$(sed -n "$ {/^33;.*/p}" abc.txt)
I have a use case where I need to search and replace the last occurrence of a string in a file and write the changes back to the file. The case below is a simplified version of that use case:
I'm attempting to reverse the file, make some changes, reverse it back again, and write the result to the file. I've tried the following snippet:
tac test | sed s/a/b/ | sed -i '1!G;h;$!d' test
test is a text file with contents:
a
1
2
3
4
5
I was expecting this command to make no changes to the order of the file, but it has actually reversed the contents to:
5
4
3
2
1
b
How can I make the substitution while retaining the order of the file?
You can tac your file, apply the substitution on the first occurrence of the desired pattern, tac again, and redirect the result to a temporary file before renaming it to the original name:
tac file | sed '0,/a/{s//b/}' | tac > tmp && mv tmp file
Another way is to use grep to get the number of the last line that contains the text you want to change, then use sed to change that line:
$ linno=$( grep -n 'abc' <file> | tail -1 | cut -d: -f1 )
$ sed -i "${linno}s/abc/def/" <file>
Try cat test | rev | sed '1!G;h;$!d' | rev
Or you can use only a sed command:
For example, if you want to replace ABC with DEF:
You need to add 'g' to the end of your sed:
sed -e 's/\(.*\)ABC/\1DEF/g'
This tells sed to replace every occurrence of your regex ("globally") instead of only the first occurrence.
You should also add a $, if you want to ensure that it is replacing the last occurrence of ABC on the line:
sed -e 's/\(.*\)ABC$/\1DEF/g'
EDIT
Or simply add another | tac to your command:
tac test | sed s/a/b/ | sed '1!G;h;$!d' | tac
Here is a way to do this in a single command using awk.
First input file:
cat file
a
1
2
3
4
a
5
Now this awk command:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file
a
1
2
3
4
b
5
To save the output back into the original file, use:
awk '{a[i++]=$0} END{p=i; while(i--) if (sub(/a/, "b", a[i])) break;
for(i=0; i<p; i++) print a[i]}' file > $$.tmp && mv $$.tmp file
Another in awk. First a test file:
$ cat file
a
1
a
2
a
and solution:
$ awk '
$0=="a" && NR>1 { # when we meet "a"
print b; b="" # output and clear buffer b
}
{
b=b (b==""?"":ORS) $0 # gather the buffer
}
END { # in the end
sub(/^a/,"b",b) # replace the leading "a" in buffer b with "b"
print b # output buffer
}' file
a
1
a
2
b
Writing back happens by redirecting the output to a temp file which then replaces the original file (awk ... file > tmp && mv tmp file), or, if you are using GNU awk v. 4.1.0+, you can use in-place editing (awk -i inplace ...).
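For instance, a sketch of the in-place variant, assuming GNU awk 4.1.0+ and the same program condensed to one line:
gawk -i inplace '$0=="a" && NR>1{print b; b=""} {b=b (b==""?"":ORS) $0} END{sub(/^a/,"b",b); print b}' file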
I need to convert the content below in multiple files. The text need not be the same, but it will be in the same format and length.
File 1:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
File 2:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
TO
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
Basically, converting rows to columns and appending to a new file while adding commas to separate them.
I am trying cat filename | tr '\n' ',' but the results all get added on the same line, like this:
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX,XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
Use:
paste -sd, file1 file2 .... fileN
#e.g.
paste -sd, *.txt file*
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
and if you need the empty line after each one
paste -sd, file* | sed G
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
Short perl variant:
perl -pe 'eof||s|$/|,|' files....
You need to insert an echo after tr. Use a script like this:
for f in file1 file2; do
tr '\n' ',' < "$f"; echo
done > files.output
Use a for loop:
for f in file*; do sed ':a;N;$!ba;s/\n/,/g' < "$f"; done
The sed code was taken from sed: How can I replace a newline (\n)?. tr '\n' ',' didn't work on my limited test setup.
perl -ne 'chomp; print $_ . (($. % 8) ? "," : "\n")' f*
where:
-n reads the file line by line but doesn't print each line
-e executes the code from the command line
8 is the number of lines in each file
f* is a glob for the files (replace it with something that will select all of your files). If you need a specific order, you will probably need something more complicated here.
Looking for a better way of extracting a number from the last line in a file.
Sample content:
# Provider added by blah
#
security.provider.3=org.bouncycastle145.jce.provider.BouncyCastleProvider
#
# Provider added by blah
#
security.provider.4=org.bouncycastle145.jce.provider.BouncyCastleProvider
#
# Provider added by blah
#
security.provider.79=org.bouncycastle145.jce.provider.BouncyCastleProvider
I would like to parse the last line in the file and return the number after:
security.provider.
This is what I'm using and it seems to only find the first digit after:
security.provider.:
tail -1 filename | cut -c19
I know I can use:
tail -1 filename | cut -c19,20
but I wouldn't know if the number is a single digit or double, etc.
You can use sed as:
tail -1 file | sed -r 's/security\.provider\.([0-9]+).*/\1/'
You can do that using just a sed one liner:
sed -ne '$s/security\.provider\.\([0-9]\+\).*/\1/p' <file>
Assuming your final line is always exactly "security.provider.xxx=" and you want xxx:
$ sed -n \$p file | cut -d= -f 1 | cut -d. -f 3
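With the sample data above, this should output 79:
$ sed -n \$p file | cut -d= -f 1 | cut -d. -f 3
79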