Print log in Unix starting from text - Linux

I have a huge log file and I want to copy the log starting from a given text to my local directory.
The log file name is
'cktrm.ecg-2015-12-21.gz'
The command I ran was
bash-3.2$ gzgrep '2665941' cktrm.ecg-2015-12-21.gz > ~/log.txt
but this only copies the lines that contain 2665941 to the text file.
What I need is to copy everything starting from that text to the end of the file.
For example:
...
log.info [id = 2665941]
log.debug ....
log.debug ...
log.debug [add to id 2665941]
...
What is currently printed to the text file is
log.info [id = 2665941]
log.debug [add to id 2665941]
What I need is
log.info [id = 2665941]
log.debug ....
log.debug ...
log.debug [add to id 2665941]
...

You can use a short awk program that prints its input starting from where it finds a pattern:
zcat cktrm.ecg-2015-12-21.gz | awk "/2665941/ {printLog = 1}
printLog { print }" > ~/log.txt
Note: this is a single command, but it spans two lines. If you press Enter in the middle of a quoted string, bash will give you a secondary prompt and you'll be able to continue the command on the next line, where of course you'll need to close the quote.
The zcat uncompresses your file and its output is the input of the awk command.
The awk script means:
When the current line matches the given pattern (in this case 2665941), set the printLog variable to 1.
When the printLog variable is set to something nonzero, print the current line.
This means that from the moment it finds the first occurrence of the pattern, it will print all lines.
Finally, the output of awk is redirected to your requested file.
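For reference, the same "print from the first match to the end" selection can be written as a single sed range:
zcat cktrm.ecg-2015-12-21.gz | sed -n '/2665941/,$p' > ~/log.txt
Here -n suppresses sed's default printing, and the address range /2665941/,$ selects everything from the first matching line through the last line, printing it with p.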

If you know the specific format of the start and end markers, you can use awk to achieve what you need:
cat > test-file
hear no evil
no evil here
start see no evil
know no evil
no evil known here
end see no evil
no know evil
^D
To display only the lines between 'start see no evil' and 'end see no evil'
awk '/start/,/end/ {print}' test-file
results in the following output
start see no evil
know no evil
no evil known here
end see no evil
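For comparison, the same range selection with sed prints the identical four lines:
sed -n '/start/,/end/p' test-file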

awk, sed, grep specific strings from a file in Linux

Here is part of the complete file that I am trying to filter:
Hashmode: 13761 - VeraCrypt PBKDF2-HMAC-SHA256 + XTS 512 bit + boot-mode (Iterations: 200000)
Speed.#2.........: 2038 H/s (56.41ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#3.........: 2149 H/s (53.51ms) # Accel:128 Loops:32 Thr:256 Vec:1
Speed.#*.........: 4187 H/s
The aim is to print the following:
13761 VeraCrypt PBKDF2-HMAC-SHA256 4187 H/s
Here is what I tried.
The complete file is called complete.txt
cat complete.txt | grep Hashmode | awk '{print $2,$4,$5}' > mode.txt
Output:
13761 VeraCrypt PBKDF2-HMAC-SHA256
Then:
cat complete.txt | grep Speed.# | awk '{print $2,$3}' > speed.txt
Output:
4187 H/s
Then:
paste mode.txt speed.txt
The issue is that the lines do not match up. There are approximately 200 modes to filter within the file 'complete.txt'.
I also have a feeling that this can be done using a much simpler command with sed or awk.
I am guessing you are looking for something like the following.
awk '/Hashmode:/ { if (label) print label, speed; label = $2 " " $4 " " $5 }
     /Speed\.#/  { speed = $2 " " $3 }
     END         { if (label) print label, speed }' complete.txt
We match up each Hashmode line with the last Speed.# line that follows it, then print when we see a new Hashmode or reach the end of the file. (Failing to print the last one is a common beginner bug.)
This might work for you (GNU sed):
sed -E '/Hashmode:/{:a;x;s/^[^:]*: (\S+) -( \S+ \S+ ).*\nSpeed.*:\s*(\S+ \S+).*/\1\2\3/p;x;h;d};H;$!d;ba' file
If a line contains Hashmode, swap in the hold space, use pattern matching to manipulate its contents into the desired format and print it, swap back to the pattern space, copy the current line to the hold space, and delete the current line.
Otherwise, append the current line to the hold space and delete the current line, unless the current line is the last line of the file, in which case process it as if it contained Hashmode.
N.B. The first time Hashmode is encountered, nothing is output. Subsequent matches and the end-of-file condition will be the only times printing occurs.
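For completeness, the grep + paste approach from the question can also be made to line up. The problem is that grep Speed.# matches every per-device speed line, not just the total, so speed.txt ends up with more lines than mode.txt. Anchoring the second grep to the Speed.#* summary line gives exactly one speed line per Hashmode block (a sketch, assuming every block ends with a Speed.#* total):
grep 'Hashmode:' complete.txt | awk '{print $2, $4, $5}' > mode.txt
grep 'Speed\.#\*' complete.txt | awk '{print $2, $3}' > speed.txt
paste -d' ' mode.txt speed.txt
With the sample input, either approach should yield: 13761 VeraCrypt PBKDF2-HMAC-SHA256 4187 H/s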

How can I use a shell script to replace the name of a file with that file's contents

Say I have a file which includes an arbitrary number of other filenames (in recognizable delimiters), eg
original-file-contents-which-should-remain
{{filename.txt}}
more-untouchable-contents
{{dir/myfile.md}}
How can I use a shell script to replace the filenames (and delimiters), that is, {{filename.txt}} and {{dir/myfile.md}} with the contents of the respective files?
I've tried using sed, and while it works if I hardcode the target file name, it cannot capture the file name from the regex -- i.e., the following removes {{myfile}} but does not insert the contents of ./myfile (I guess regex captures only work with s/old/new commands):
sed -e "/{{\(.*\)}}/ { r \1" -e "d}" somefile
If you have GNU sed, then you can use the e flag for the s command like this:
sed -f - input_file <<'EOS'
s/^\s*{{\([^']*\)}}\s*$/cat '\1'/e
EOS
The pattern containing the filename is first replaced with the shell command cat 'filename', then the command is run and its output is piped back into the pattern space.
As you can see, the sed program is loaded from a literal here-doc instead of the usual single-quoted string: this lets us use single-quotes to enclose the filename in the shell command while keeping the whole sed command clean (and unless you fully trust the content of your input files, single-quotes are safer than double-quotes). Also for safety, the address regex forbids single-quotes in the filename and uses both begin- and end-of-line anchors to ensure that the shell command is not polluted by possible leading/trailing parts of the line.
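A quick sanity check of that command (a sketch, assuming GNU sed; the file names here are made up for the demo):
$ printf 'keep this\n{{part.txt}}\nkeep this too\n' > input_file
$ echo 'spliced contents' > part.txt
$ sed -f - input_file <<'EOS'
s/^\s*{{\([^']*\)}}\s*$/cat '\1'/e
EOS
keep this
spliced contents
keep this too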
As pointed out by KamilCuk, you can use GNU awk as well. It comes with a library of additional functions, such as readfile, which is nice here to avoid the shell command.
gawk -i readfile '
! match($0, /^\s*\{\{(.*)}}\s*$/, refs) { print ; next }
{ printf "%s", readfile(refs[1]) }
' input_file
(updated for possible spaces around the patterns and unneeded escaping thanks to the portable answer from Ed Morton)
This will work with any awk in any shell on every UNIX box given input like in your example:
awk '
match($0,/\{\{.*}}/) {
    # extract the file name between the {{ }} delimiters
    fname = substr($0,RSTART+2,RLENGTH-4)
    # copy that file line by line in place of the placeholder
    while ( (getline < fname) > 0 ) {
        print
    }
    close(fname)
    next
}
{ print }
' file
e.g.
$ cat tam
When chapman billies leave the street,
And drouthy neebors neebors meet,
As market-days are wearing late,
{{foo}}
We think na on the lang Scots miles,
The mosses, waters, slaps, and stiles,
{{bar}}
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
$ cat foo
And folk begin to tak the gate;
While we sit bousin, at the nappy,
And gettin fou and unco happy,
$ cat bar
That lie between us and our hame,
Whare sits our sulky, sullen dame,
$ awk '
match($0,/\{\{.*}}/) {
fname = substr($0,RSTART+2,RLENGTH-4)
while ( (getline < fname) > 0 ) {
print
}
close(fname)
next
}
{ print }
' tam
When chapman billies leave the street,
And drouthy neebors neebors meet,
As market-days are wearing late,
And folk begin to tak the gate;
While we sit bousin, at the nappy,
And gettin fou and unco happy,
We think na on the lang Scots miles,
The mosses, waters, slaps, and stiles,
That lie between us and our hame,
Whare sits our sulky, sullen dame,
Gathering her brows like gathering storm,
Nursing her wrath to keep it warm.
Sample input courtesy of Rabbie Burns.
Using vim's version of ex:
ex -c 'g/{{/s/{{\([^}]*\)}}/\=readfile(submatch(1))/g' -c 'x!' input.txt
The g/{{/ command runs the substitution on every line containing {{; the \= replacement evaluates a Vim expression, and readfile() returns the referenced file's contents as a list of lines, which the substitution inserts in place of the match; x! then writes the result and exits.

Get text between, but not including, header and footer using awk or sed

Suppose I have a file myfile.txt, with the following contents:
1234
5678
start
stuff
stop
9871
I would like to get the data between the header 'start' and the footer 'stop' but not including these borders (so in this case, my result would just be the line 'stuff'). Using awk and sed, I tried the following:
awk '/start/ { show=1 } show; /stop/ { show=0 }' myfile.txt
sed -n '/start/,/stop/p' myfile.txt
But these include the header and footer in the output. How can I do it so that I don't retain the header and footer - but only the info in between?
Just reverse the order of the tests:
$ awk '/stop/{show=0} show; /start/ { show=1 }' myfile.txt
stuff
How it works
/stop/{show=0}
Any time we encounter a line that matches the regex stop, we set the variable show to 0 (false).
show;
If show is true, print the line.
In more detail, show is a condition, meaning that it is evaluated and, if true, an action is performed. Since we don't explicitly specify an action, the default action is performed, which is print $0.
Since no action is explicitly specified, we need to follow show with ; in order to separate it from the next pattern-action statement.
/start/ { show=1 }
Any time we encounter a line that matches the regex start, we set the variable show to 1 (true).
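An equivalent spelling that some find easier to read skips the header line explicitly with next:
awk '/start/{show=1; next} /stop/{show=0} show' myfile.txt
On the start line it sets the flag and immediately moves on to the next line; on the stop line it clears the flag before the print test, so neither border is printed.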
With GNU sed
sed '/start/,/stop/!d;//d' myfile.txt
Here /start/,/stop/!d deletes every line outside the range, and the empty regex in //d reuses the most recently matched regex, so the start and stop lines themselves (which just matched the range addresses) are deleted too.
Another sed command, also GNU sed:
echo "1234
5678
start
stuff
stop
9871" | sed -n '/start/,/stop/p' | sed '1d;$d'
stuff
There is no problem in programming that couldn't be solved with another layer of sed. :)

How to remove all lines from a text file starting at first empty line?

What is the best way to remove all lines from a text file starting at first empty line in Bash? External tools (awk, sed...) can be used!
Example
1: ABC
2: DEF
3:
4: GHI
Line 3 and 4 should be removed and the remaining content should be saved in a new file.
With GNU sed:
sed '/^$/Q' "input_file.txt" > "output_file.txt"
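If your sed lacks the GNU-specific Q command, a portable equivalent uses q (which is POSIX); with -n nothing is auto-printed, so quitting on the blank line prints nothing extra:
sed -n '/^$/q;p' "input_file.txt" > "output_file.txt"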
With AWK:
$ awk '/^$/{exit} 1' test.txt > output.txt
Contents of output.txt
$ cat output.txt
ABC
DEF
Walkthrough: for lines that match ^$ (start-of-line immediately followed by end-of-line, i.e. an empty line), exit the whole script. Every other line satisfies the pattern 1, whose default action is to print the whole line -- of course, we never get that far once a line has made us exit.
Bet there are some more clever ways to do this, but here's one using bash's 'read' builtin. The question asks us to keep the lines before the blank in one file and send the lines after the blank to another. You could send some of standard output one place and some another if you are willing to use 'exec' and reroute stdout mid-script, but I'm going to take a simpler approach and use a command-line argument to say where the post-blank data should go:
#!/bin/bash
# script takes as argument the name of the file to send data to once a blank
# line is found
found_blank=0
while IFS= read -r stuff; do
    if [ -z "$stuff" ]; then
        found_blank=1
    fi
    if [ "$found_blank" -eq 1 ]; then
        echo "$stuff" >> "$1"
    else
        echo "$stuff"
    fi
done
run it like this:
$ ./delete_from_empty.sh rest_of_stuff < demo
output is:
ABC
DEF
and 'rest_of_stuff' has the blank line followed by
GHI
if you want the before-blank lines to go somewhere else besides stdout, simply redirect:
$ ./delete_from_empty.sh after_blank < input_file > before_blank
and you'll end up with two new files: after_blank and before_blank.
Perl version
perl -e '
    open $fh, ">", "stuff";
    open $efh, ">", "rest_of_stuff";
    while (<>) {
        if ($_ !~ /\w+/) {
            $fh = $efh;
        }
        print $fh $_;
    }
' demo
This creates two output files and iterates over the demo data. When it hits a blank line, it flips the output from one file to the other.
Creates
stuff:
ABC
DEF
rest_of_stuff:
<blank line>
GHI
Another awk would be:
awk -v RS= '1;{exit}' file
By setting the record separator RS to the empty string, we define the records as paragraphs separated by sequences of empty lines. It is now easy to adapt this to select the nth block, with n passed in on the command line (e.g. -v n=2):
awk -v RS= '(FNR==n){print;exit}' file
There is a problem with this method when processing files with DOS line endings (CRLF): there will be no empty lines, as every line will always contain a CR. But this problem applies to all the presented methods.
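A simple workaround for such files is to strip the carriage returns before processing, e.g. in front of the GNU sed solution above (tr -d '\r' removes every CR, which is what you want for CRLF text files):
tr -d '\r' < input_file.txt | sed '/^$/Q' > output_file.txt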

awk command with pipelining - some doubts

I know that in the code below the pipe will pass the output of one command to the next. But I have a doubt about how the awk stages execute. My doubt is:
Will each awk block iterate through all the lines in the file, or will it process the lines one by one? More clearly, is it as I assumed:
1) The 1st awk block reads the 1st line.
2) It prints that line if the condition is satisfied (passing the output to the next awk block).
3) Otherwise it does nothing.
4) The next awk block receives this output and processes that particular line.
5) It is written to the file receipt.tmp.
Is it processing this way, or:
1) The 1st awk block iterates through all the lines in the file.
2) It passes the output to the next awk block.
3) The next awk block operates on the output passed by the 1st awk block.
Please help me. I have no way to run these commands. Thanks in advance!
cat > /tmp/pay.dat
grep -v '^TRAILER' /tmp/pay.dat
| \
awk '{
if (substr($0,145,2) != "CA")
{
print $0
}
}'
|\
awk 'BEGIN{OFS=""} \
{
if (substr($0,38,1) == "X") \
{
print substr($0,1,37), "S", substr($0,39)
} \
else {
print $0
}
}' > /tmp/receipt.tmp
Either and/or both.
What? How?
Each awk will iterate over the lines given to it - the first awk receives lines that don't start with "TRAILER", the second receives the lines that the first gives to it. The processes execute in parallel, each reading and writing data as it pleases. (A process that tries to read data that has not yet been written will sleep until that data is available.)
The order in which any side effects happen is unpredictable, depending on system process scheduling (including current load), pipe buffer sizes, awk execution overhead, etc.
Shellscript formatting
The grep and the first awk are on their own lines, which do not end in pipes or backslashes. That's not a pipeline, it's just a bunch of commands. And if you're using the Bourne shell or any shell descended from it, quoted strings don't need backslashes - they continue until interrupted by a closing quote.
Try something like this:
# This assumes that your data is already in "/tmp/pay.dat".
grep -v "^TRAILER" /tmp/pay.dat |
awk 'your first
awk script' |
awk 'your second
awk script' > /tmp/receipt.tmp
(In a Bourne-derived shell, lines ending in | are automatically continued - no trailing backslash required.)
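Filling in the scripts from the question, the whole thing could look like this (a sketch; the column offsets are copied from the question's code):
grep -v '^TRAILER' /tmp/pay.dat |
awk 'substr($0,145,2) != "CA"' |
awk 'BEGIN { OFS="" }
     substr($0,38,1) == "X" { print substr($0,1,37), "S", substr($0,39); next }
     { print }' > /tmp/receipt.tmp
In awk, a pattern with no action prints the matching line by default, so the first stage keeps only the records whose columns 145-146 are not "CA"; the second stage replaces the "X" in column 38 with "S" and passes everything else through unchanged.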
