How to do something like grep -B to select only one line? - linux

Everything is in the title. Basically, let's say I have this pattern:
some text lalala
another line
much funny wow grep
I grep funny and I want my output to be "lalala"
Thank you

One possible answer is to use either ed or ex to do this (it is trivial in them):
ed - yourfile <<< 'g/funny/.-2p'
(Or replace ed with ex. You might have red, the restricted editor, too; it can't modify files.) This looks for the pattern /funny/ globally, and whenever it is found, prints the line 2 before the matching line (that's the .-2p part). Or, if you want the most recent line containing 'lalala' before the line matching 'funny':
ed - yourfile <<< 'g/funny/?lalala?p'
The only problem is if you're trying to process standard input rather than a file; then you have to save the standard input to a file and process that file, which spoils the concurrency.
You can't do negative offsets in sed (though GNU sed allows you to do positive offsets, so you could use sed -n '/lalala/,+2p' file to get the 'lalala' to 'funny' lines (which isn't quite what you want) based on finding 'lalala', but you cannot find the 'lalala' lines based on finding 'funny'). Standard sed does not allow offsets at all.
If you need to print just the IP address found on a line 8 lines before the pattern-matching line, you need a slightly more involved ed script, but it is still doable:
ed - yourfile <<< 'g/funny/.-8s/.* //p'
This uses the same basic mechanism to find the right line, then runs a substitute command to remove everything up to the last space on the line and print the modified version. Since there isn't a w command, it doesn't actually modify the file.

Since grep -B always prints the full block of preceding lines for each match, you'll have to pipe the output into something like another grep or Awk.
grep -B 2 "funny" file|awk 'NR==1{print $NF; exit}'
You could also just use Awk.
awk -v s="funny" '/[[:space:]]lalala$/{n=NR+2; o=$NF}NR==n && $0~s{print o}' file
For the specific example of an IP address 8 lines before the match as mentioned in your comment:
awk -v s="funny" '
/[[:space:]][0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$/ {
n=NR+8
ip=$NF
}
NR==n && $0~s {
print ip
}' file
These Awk solutions first find the output field you might want, then print the output only if the word you want exists in the nth following line.

Here's an attempt at a slightly generalized Awk solution. It maintains a circular queue of the last q lines and prints the line at the head of the queue when it sees a match.
#!/bin/sh
: ${q=8}
e=$1
shift
awk -v q="$q" -v e="$e" '{ m[(NR%q)+1] = $0 }
$0 ~ e { print m[((NR+1)%q)+1] }' "${@--}"
Adapting to a different default (I set it to 8) or proper option handling (currently, you'd run it like q=3 ./qgrep regex file) as well as remembering (and hence printing) the entire line should be easy enough.
(I also didn't bother to make it work correctly if you see a match in the first q-1 lines. It will just print an empty line then.)
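A minimal sketch of one way to handle that edge case, assuming any POSIX awk: keep the last n+1 lines in a ring buffer and print only when the line n positions back actually exists. The function name before_match and its argument order are made up for illustration, and a regex passed with -v goes through awk's string escape processing, so backslashes need doubling.

```shell
# before_match N REGEX [FILE...]: print the line N lines before each match.
# (Hypothetical helper; reads standard input when no FILE is given.)
before_match() {
    n=$1 re=$2
    shift 2
    awk -v n="$n" -v re="$re" '
        $0 ~ re && NR > n { print buf[(NR - n) % (n + 1)] }  # line NR-n, if it exists
        { buf[NR % (n + 1)] = $0 }                            # remember the last n+1 lines
    ' "$@"
}

printf '%s\n' 'some text lalala' 'another line' 'much funny wow grep' |
    before_match 2 funny
# prints: some text lalala
```

Unlike the ed approach, this works on a pipe, and a match in the first n lines prints nothing rather than an empty line.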

Related

How to truncate rest of the text in a file after finding a specific text pattern, in unix?

I have an HTML page which I downloaded on Unix using wget. I need to remove all of the text after the words "Check list", and then grep some data from what remains. I can't think of a way to remove the text after a keyword. If I do
s/Check list.*//g
it just removes the rest of that line; I want everything below it to be gone too. How do I do this?
The other solutions you have so far require non-POSIX-mandatory tools (GNU sed, GNU awk, or perl), so YMMV with their availability, and they will read the whole file into memory at once.
These will work in any awk in any shell on every Unix box and only read 1 line at a time into memory:
awk -F 'Check list' '{print $1} NF>1{exit}' file
or:
awk 'sub(/Check list.*/,""){f=1} {print} f{exit}' file
With GNU awk for multi-char RS you could do:
awk -v RS='Check list' '{print; exit}' file
but that would still read all of the text before Check list into memory at once.
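As a sketch of the same line-at-a-time idea with the keyword treated as a literal string rather than a regex (my variation, not the answer's method), index() can locate the keyword and substr() can cut the line. The helper name truncate_at is hypothetical; any POSIX awk should do.

```shell
# truncate_at KEYWORD [FILE...]: print input up to, but not including, the first
# occurrence of KEYWORD, then stop reading. (Hypothetical helper name.)
truncate_at() {
    kw=$1
    shift
    awk -v kw="$kw" '
        (i = index($0, kw)) { print substr($0, 1, i - 1); exit }  # cut the line, then quit
        { print }                                                 # lines before the keyword
    ' "$@"
}

printf '%s\n' 'keep me' 'keep this Check list drop this' 'drop me too' |
    truncate_at 'Check list'
```

Because of the exit, nothing after the keyword line is read, so this also works on a pipe straight from wget.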
Depending on which sed version you have, maybe
sed -z 's/Check list.*//'
The /g flag is useless as you only want to replace everything once.
If your sed does not have the -z option (which says to use the ASCII null character as line terminator instead of newline; this hinges on your file not containing any actual nulls, but that should trivially be true for any text file), try Perl:
perl -0777 -pe 's/Check list.*//s'
Unlike sed -z, this explicitly says to slurp the entire file into memory (the argument to -0 is the octal character code of a terminator character, but 777 is not a valid terminator character at all, so it always reads the entire file as a single "line") so this works even if there are spurious nulls in your file. The final s flag says to include newline in what . matches (otherwise s/.*// would still only substitute on the matching physical line).
I assume you are aware that removing everything will violate the integrity of the HTML file; it needs there to be a closing tag for every start tag near the beginning of the document (so if it starts with <html><body> you should keep </body></html> just before the end of the file, for example).
With GNU awk you could set RS so that the whole file is read as a single record, set the field separator to a regex with word boundaries around the keyword, and then print just the first field:
awk -v RS="^$" -v FS='\\<check_list\\>' '{print $1}' Input_file
You can use q to instruct GNU sed to quit, thus ending processing. Consider the following simple example. Let the content of file.txt be
123
456
789
and say you want to jettison everything beyond 5, then you could do
sed '/5/{s/5.*//;q}' file.txt
which gives output
123
4
Explanation: for a line containing 5, substitute 5 and everything after it with the empty string (i.e. delete it), then q. Note that lowercase q is used so that the altered line is printed before quitting.
(tested in GNU sed 4.7)

Can we do this in perl?

I want the awk one-liner below translated to Perl. Is it possible?
awk '{ for(i=1;i<=NF;i++){if(i==NF){printf("%s\n",$NF);}else {printf("%s\t",$i)}}}' file.txt | awk 'NR > 1'
The first awk command removes the leading empty column and the next one removes the first line.
Below is the head of file.txt
#FILEOUTPUT
1 137442 2324
2326 139767 4169
6491 143936 94
The output i get from those commands is below
1 137442 2324
2326 139767 4169
6491 143936 94
Thanks,
Karthic
@Alex got the usage of $. right (it is not a very common Perl idiom, though a useful one as we see), but they didn't handle the extra spaces correctly.
Awk is all about understanding what the fields are and then manipulating the fields, and as part of that it does a lot of whitespace canonicalization.
Perl, OTOH, usually doesn't involve itself in field separation and a lot of users like to do that themselves, but it does support this Awk behavior using the -a flag.
So a simple implementation of the above Awk line noise might look like this:
perl -anle 'print join("\t",@F) if $. > 1' file.txt
Explanation:
-a: trigger field separation using the default field separator (which works well in this case) or whatever -F says (like Awk).
-n : iterate over the input lines (same as what the outermost {} do in Awk). A common alternative is -p which would mean to iterate over the input lines and then print out whatever the line buffer has after running the code.
-l : When printing, add a new line at the end of the text (makes things like this slightly easier to work with)
-e : here's a script.
Then we just take the separated field array (@F) and join it. Often devs like to just address certain fields with $F[<index>], but here we don't need to loop - we can just take the list as is and pipe it to join().
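For comparison, the question's two awk processes can also be collapsed into one. A minimal sketch, assuming whitespace-separated input as in the sample (the leading-space lines below are invented test data, and drop_header_retab is a hypothetical name): assigning $1 = $1 makes awk rebuild the record with OFS, which is the same canonicalization that perl -a plus join performs.

```shell
# drop_header_retab [FILE...]: skip line 1 and re-join each remaining line's
# fields with tabs. (Hypothetical helper name.)
drop_header_retab() {
    awk -v OFS='\t' 'NR > 1 { $1 = $1; print }' "$@"
}

printf '%s\n' '#FILEOUTPUT' ' 1 137442 2324' ' 2326 139767 4169' |
    drop_header_retab
```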

How to input a command's result as a string argument in sed

I want to execute a command like the following in my bash terminal:
sed -i '6i `sed '1!d' input.in`' out
with which I can insert at line 6 of the file out (replacing it in place with the -i option) the result of the sed '%1!d' input.in command. I haven't found anything useful, and have tried `com`, $(com) and com | sed -i '6i ' out, where com stands for sed '%1!d' input.in. I don't mind changing the syntax of the whole command, but I want it to be a one-liner on the terminal that uses sed.
Thanks for listening,
awaiting your answer.
For EdMorton:
Example Input:
input.in:
into a lake.
out:
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
Desired Output:
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
into a lake.
Try using r on standard input instead of i.
sed '%1!d' input.in |
sed -i '6r /dev/stdin' out
If your platform doesn't support /dev/stdin or /dev/fd/0, see if your sed supports - to mean standard input ... or, in the worst case, resort to a temporary file.
As commenters have already pointed out, %1!d does not appear to be a valid command in most sed dialects, but that is basically unimportant here. (If you mean to print just the first line, maybe you mean sed '1!d', although sed 'p;q' does that more efficiently.)
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk.
Given this modified input file
$ cat input.in
a Windows folder C:\Windows\Temp
Here is what the sed solution you posted in your comments does:
$ sed '1!d' input.in > temp.of.in && sed "6i `cat temp.of.in`" out
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
a Windows folder C:WindowsTemp
and here is what an awk solution does more efficiently and accurately and without a temp file:
$ awk 'NR==1{x=$0;nextfile} FNR==6{print x} 1' input.in out
Mary was runing around a pond and fell
into a lake.
Mary fell into a what?
a Windows folder C:\Windows\Temp
Notice the awk solution preserved the path-separator backslashes while the sed one stripped them. Also note that you should really add && rm temp.of.in to the end of your sed command line to clean up the temp file and you should be using $(..) to execute your command, not obsolete backticks.
The awk solution uses GNU awk for ;nextfile, with other awks you'd replace that with }NR==FNR{next or similar but since you are using GNU sed I assume you have GNU awk too.
Note that if you DID have a burning desire to use sed and accept it won't exactly reproduce the input, there are simpler, more efficient ways to do what your current script does, e.g.:
sed "6i $(head -1 input.in)" out
or even your original idea, just rewritten to remove the obsolete backticks and negative logic of 1!d:
sed "6i $(sed -n '1p' input.in)" out
But seriously - just use awk. For anything other than simple substitutions on individual lines it's much more robust, efficient, clear, portable, extensible, etc. etc. than sed.
EDIT To address the questions in your comments:
Can you explain the arguments on awk.
There are no arguments, just a script that says: If this is the first line read from the first file, save it in variable x, then move on to the next file. If this is line 6 of the 2nd file, print the contents of variable x. For every line of the 2nd file, print it (the 1 is idiomatic but a bit tricky at first glance - it's a true condition so it invokes the default action of printing the current input, equivalent to just writing {print}).
how can i replace the out file with the output (without using '>') as the option -i does on sed and avoid printing it to stdout?
Just like GNU sed has -i, GNU awk has -i inplace. Be careful though because, just like with sed, it applies to every input file, so if you don't print the contents of the first file then when the script is done the first file will be empty. There are various ways to deal with that, including simply printing the lines from file 1 or turning inplace editing on/off in BEGINFILE/ENDFILE blocks, see https://www.gnu.org/software/gawk/manual/gawk.html#Extension-Sample-Inplace, but IMHO awk 'script' file1 file2 > temp && mv temp file2 is the simplest and clearest as well as being portable to all awks/seds/whatever.
Also if there is a multiline solution like "take lines 1 to 4" from "input.in" and drop them on line 6 of "out"?
No problem:
awk '
NR==FNR { if (NR<=4) x=x $0 ORS; else nextfile }
FNR==6 { printf "%s", x }
{ print }
' input.in out
I changed the 1 from the previous script to { print } for clarity.
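The temp-file-and-mv pattern recommended above can be wrapped up as follows. This is only a sketch under stated assumptions: the function name insert_lines and its argument order are hypothetical, mktemp is assumed to be available, SRC is assumed to be non-empty (otherwise the NR==FNR test is also true for DEST), and the replaced file takes on the temp file's permissions. It uses next rather than nextfile so it does not depend on GNU awk.

```shell
# insert_lines SRC DEST AT [COUNT]: insert the first COUNT lines (default 1) of SRC
# before line AT of DEST, rewriting DEST via a temporary file. (Hypothetical helper.)
insert_lines() {
    src=$1 dest=$2 at=$3 count=${4:-1}
    tmp=$(mktemp) || return 1
    if awk -v at="$at" -v count="$count" '
        NR == FNR { if (NR <= count) x = x $0 ORS; next }  # collect lines from SRC only
        FNR == at { printf "%s", x }                       # emit them before line AT of DEST
        { print }
    ' "$src" "$dest" > "$tmp"
    then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example: insert_lines input.in out 6      (first line of input.in)
#          insert_lines input.in out 6 4    (lines 1 to 4 of input.in)
```

If DEST has fewer than AT lines, nothing is inserted, which matches the behavior of the awk scripts above.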

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a lot of extra stuff after the "version" field:
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line in full, but I need only the specific fields mentioned, as shown below, ignoring the rest of the data on each line.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It skips everything up to the first comma (\K discards whatever has been matched so far from the reported match) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything .*, followed by partition and any number of non-}, then } map and anything else, grab the part from partition up to but not including the brace \(...\) as group 1". The replacement text is just group 1 \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understood, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to show only the lines where a substitution was made. The substitution replaces the whole line with just the part you need.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.
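If the wanted fields are not guaranteed to be adjacent, the same "pick out the fillings" idea can be sketched with a match() loop in POSIX awk. The helper name pick_fields is hypothetical, and the sketch assumes values never contain an escaped double quote. A key name is also matched when it is the tail of a longer name (for example state inside my_state), so treat it as a starting point rather than a parser.

```shell
# pick_fields 'key1|key2|...' [FILE...]: print only the named key="value" pairs of
# each line, joined with commas. (Hypothetical helper name.)
pick_fields() {
    keys=$1
    shift
    awk -v keys="$keys" '
        BEGIN { re = "(" keys ")=\"[^\"]*\"" }   # e.g. (partition|state)="[^"]*"
        {
            out = ""; s = $0
            while (match(s, re)) {               # take each key="value" pair in turn
                out = out (out == "" ? "" : ",") substr(s, RSTART, RLENGTH)
                s = substr(s, RSTART + RLENGTH)
            }
            if (out != "") print out             # skip lines with none of the keys
        }
    ' "$@"
}

printf '%s\n' 'soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0' |
    pick_fields 'partition|is_active|state|is_default|composite'
# prints: partition="test",is_active="true",state="on",is_default="true",composite="test123"
```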

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one byte (-1) from the end of the file, operating on the file in place, just as >> appends to a file in place.
The problem with this approach is that it removes the trailing newline, if there is one, instead of the last character.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file does not end with a newline
then
truncate -s-1 file # remove one char, as the question requests
else
truncate -s-2 file # remove the last two characters
echo "" >> file # add the trailing newline back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads through the whole file first (taking a long time with large files), and then you need a temporary file (of the same size as the original), which you then move over the original:
sed '$ s/.$//' file > tempfile
mv tempfile file
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all the data in an array, then prints it out, changing only the last line.
Just a remark: sed -i does not really edit the file in place; it writes a new file and then replaces the original with it.
So if you are tailing the file, you may get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a test file on my Desktop and put your text inside it. This test file is saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character in the last line
#!/bin/bash
# Assumes the file does NOT end with a newline, so the newline count is one less than the line count.
no_of_new_line_characters=$(wc -l < '/root/Desktop/old_file.txt')
no_of_lines=$((no_of_new_line_characters + 1))
# Copy every line except the last one unchanged.
sed -n "1,${no_of_new_line_characters}p" '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
# Append the last line with its final character removed.
sed -n "${no_of_lines}p" '/root/Desktop/old_file.txt' | sed 's/.$//' >> '/root/Desktop/my_new_file'
Opening the new file I created showed the following output:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed '$ s/.$//' filename | tee newFilename
This should do the job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first approach, reading the whole file into memory, can be generally quite useful (less so for this particular problem). You can then use regexes which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.
