Edit a text file on Unix - linux

I have a text file with the following format:
Wind River Linux glibc_cgl (cgl) 3.0.3
Build label: NDPGSN_5_0_SRC_GSN_LINUX_GPBOS_2
Build host: eselnvlx1114
Build date: Mon Mar 18 23:24:08 CET 2013
Installed: Fri Jun 20 02:22:08 EEST 2014
Last login: Fri Aug 8 11:37:08 2014 from 172
gsh list_imsins
=== sysadm#eqm01s14p2 ANCB ~ # gsh list_imsin
ps Class Identifiers |
---------------------------------------
A imsins -imsi 20201
A imsins -imsi 20205
A imsins -imsi 20210
A imsins -imsi 204045
I want to extract the numbers next to -imsi. The output would look like:
20201
20205
20210
204045
And after that process the output further, which I've already done. At first I was informed that the text format was static, so I wrote the following script:
for (( r=1; r<5; r++));
do
awk 'NR>12' IMSI$r.txt | awk '{print $NF "\r"}' > N$r
awk 'NR>12' IMSI$r.txt | awk '{print $NF "\r"}' >> out
done
I had 2 files as output because I needed to use both for other purposes.
Is there any way to make the script more flexible, to deal with dynamic text files?
As a possible solution, is it possible to make the script look for the phrase -imsi and grab the record after it? And continue doing so until it finds the end of file?
I tried doing that using grep and awk but I never got the right output. If you have any other ideas to do that please share.

I would go for something like:
$ awk '/-imsi/ {print $NF}' file
20201
20205
20210
204045
This prints the last word on those lines containing -imsi.
You can also use grep with a look-behind, to print the numbers after -imsi.
$ grep -Po '(?<=-imsi )[0-9]*' file
20201
20205
20210
204045
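To keep the original requirement of two outputs (a per-file N$r plus a combined file) while using the pattern-based match, one awk pass per file with tee is enough. A minimal sketch, assuming the input files are named IMSI1.txt, IMSI2.txt, ... with the same "-imsi <number>" layout (stand-in sample data is generated inline here):

```shell
# Generate stand-in input files (assumption: the real IMSI$r.txt files
# carry "-imsi <number>" as the last field on matching lines).
for r in 1 2; do
    printf 'A imsins -imsi 2020%s\n' "$r" > "IMSI$r.txt"
done

: > out    # truncate the combined output file
for r in 1 2; do
    # one awk pass: tee writes the per-file copy N$r while the
    # stream continues on into the combined file
    awk '/-imsi/ {print $NF}' "IMSI$r.txt" | tee "N$r" >> out
done

cat out
# 20201
# 20202
```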

Related

Generate a single output in AWK script

I have this code in a test.awk file:
{FS=","} {gsub(/\/[0-9]{1,2}\/[0-9]{4}/,"");}1
{FS=","} {gsub(/:[0-9][0-9]/,"");}1
The code makes transformations in a dataset from a dataset.csv file.
I want that using the following command at the shell, returns me
a newdataset.csv with all the modifications:
gawk -f test.awk dataset.csv
Put both commands in the same block.
BEGIN {FS=","}
{ gsub(/\/[0-9]{1,2}\/[0-9]{4}/,"");
gsub(/:[0-9][0-9]/,"");
}1
You could also do them in the same regexp with alternation, since the replacement is the same.
And since you never do anything that operates on individual fields, there's no need to set the field separator.
{gsub(/:[0-9][0-9]|\/[0-9]{1,2}\/[0-9]{4}/, "")}1
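To actually produce the newdataset.csv the question asks for, redirect the one-block version. A sketch using an interval-free regex (so it also runs under older mawk/nawk), with a single made-up sample row standing in for dataset.csv:

```shell
# Made-up sample row (an assumption about dataset.csv's shape)
printf 'event,12/31/2046,10:30:45\n' > dataset.csv

# one block: strip /M/DDDD-style date parts and :SS time components
awk '{gsub(/:[0-9][0-9]|\/[0-9][0-9]?\/[0-9][0-9][0-9][0-9]/, "")}1' \
    dataset.csv > newdataset.csv

cat newdataset.csv    # -> event,12,10
```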
@bramaawk : what this -
echo 'abc:12/31/2046def' |
awk '{gsub(/:[0-9][0-9]|\/[0-9]{1,2}\/[0-9]{4}/, "")}1'
abcdef
.. looks like to any awk is -
# gawk profile, created Sun Jun 5 10:59:06 2022
# Rule(s)
1 {
1 gsub(/:[0-9][0-9]|\/[0-9]{1,2}\/[0-9]{4}/,"",$0)
}
1 1 { # 1
1 print
}
what I'm suggesting is to streamline those into 1 single block :
awk 'gsub(":[0-9]{2}|[/][0-9]{1,2}[/][0-9]{4}",_)^_'
so that awk only needs to see :
# gawk profile, created Sun Jun 5 10:59:34 2022
# Rule(s)
1 gsub(":[0-9]{2}|[/][0-9]{1,2}[/][0-9]{4}",_,$0)^_ { # 1
1 print
}
Instead of 2 boolean evaluations (or the poorly-termed "patterns") and 2 action blocks, make it 1 of each.
To make your solution generic for gawk+mawk+nawk, just do
{m,n,g}awk NF++ FS=':[0-9][0-9]|[/][0-9][0-9]?[/][0-9][0-9][0-9][0-9]' OFS=

How to use multiple field separators in awk in CentOS minimal install

I have an input log file that looks like this:
Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada
Sep 24 22:46:26 192.168.1.9 cts4348 ADD afro438,Adam Froese,Men,Hockey,HO,canada
Sep 24 22:47:09 192.168.1.9 cts4348 ADD atra522,Allison Track,CT,canada
I would like to output just the column that has "ADD" and the two columns that follow, which are the username and full name. After I pull that information I will generate an account based on the username, with a comment containing the full name. I need to use the space and "," as field separators.
The command I am currently using is:
cat cts4348 | awk -F' ' -v OFS=',' '{print $6 " " $7 $8}'
And here is a sample of my output:
ADD ahay844,AkeemHaynes,Men,Athletics,AT,canada
ADD afro438,AdamFroese,Men,Hockey,HO,canada
ADD atra522,AllisonTrack,CT,canada
Thank you in advance for any help you can provide
Using awk
This approach sets the field separator to be either ADD or ,:
$ awk -F' ADD |,' '/ADD/{print "ADD", $2, $3}' File
ADD ahay844 Akeem Haynes
ADD afro438 Adam Froese
ADD atra522 Allison Track
Because space-separation is not used, this will work even if the person has a middle name.
Limitation: If the other fields were to contain space-A-D-D-space, then the output might be wrong.
Using sed
$ sed -nE '/ ADD /{s/([^ ]* ){5}//; s/(,[^,]*),.*/\1/p}' File
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
On lines containing ADD, this uses two substitute commands:
s/([^ ]* ){5}// removes the first five space-separated fields.
s/(,[^,]*),.*/\1/ removes everything from the second comma onward, keeping only the username and full name.
Again, because space-separation is not used, this will work even if the person has a middle name.
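As a quick runnable check of the sed version, using one sample line from the question saved as File:

```shell
# one sample line from the question, standing in for the real log
printf 'Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada\n' > File

sed -nE '/ ADD /{s/([^ ]* ){5}//; s/(,[^,]*),.*/\1/p}' File
# -> ADD ahay844,Akeem Haynes
```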
awk -F'[ ,]' '{print $6,$7","$8,$9}' file
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
with grep
$ cat ip.txt
Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada
Sep 24 22:46:26 192.168.1.9 cts4348 ADD afro438,Adam Froese,Men,Hockey,HO,canada
Sep 24 22:47:09 192.168.1.9 cts4348 ADD atra522,Allison Track,CT,canada
$ grep -o 'ADD[^,]*,[^,]*' ip.txt
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
ADD[^,]* ADD followed by zero or more non-comma characters
, comma
[^,]* zero or more non-comma characters
Since * is greedy, it will try to match as many characters as possible
awk with split:
$ awk -F, '{ split($1, a, " "); print "ADD", a[length(a)] "," $2 }' file.txt
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
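Since the stated end goal is creating an account from the username and full name, the extracted pair can be fed straight into a read loop. A sketch (it only echoes a hypothetical useradd invocation instead of running it, and generates its own sample log line):

```shell
# sample log line standing in for the cts4348 file (an assumption)
printf 'Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada\n' > cts4348

# split on " ADD " or "," so the full name may contain spaces
awk -F' ADD |,' '/ ADD /{print $2 "," $3}' cts4348 |
while IFS=, read -r user fullname; do
    # echo instead of executing, so the sketch is safe to run
    echo useradd -c "$fullname" "$user"
done > cmds.txt

cat cmds.txt    # -> useradd -c Akeem Haynes ahay844
```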

Specific fields using cut or Awk

How to cut a specific field from a line?
The problem is I can't use cut -d ' ' -f 1,2,3,4,5,9,10,11,12,13,14, since the field changes.
Let's say I have a file called /var/log/test, and one of the lines inside the file looks like this :
Apr 12 07:48:11 172.89.92.41 %ASA-5-713120: Group = People, Username = james.robert, IP = 219.89.259.32, PHASE 2 COMPLETED (msgid=9a4ce822)
I only need to get the Username and Time/Date (please note the columns keep changing; that's why I need to match the Username = james.robert and Apr 12 07:48:11).
When I use :
grep "james" /var/log/test | cut -d ' ' -f 1,2,3,4,5,9,10,11,12,13,14
it doesn't work for me. So it has to match the username and print only the username and date/time. Any suggestions?
Ok so when I use this :
awk -F'[ ,]' '$12~/username/{print $1,$2,$3,$12}' /var/log/test
it works for some users but not others, because the fields keep moving.
The sample output of this command is :
Apr 12 06:00:39 james.robert
But when I try this command on this username, it doesn't work. See below :
Here is another example where the above command doesn't show anything:
Apr 8 12:16:13 172.24.32.1 %ASA-6-713228: Group = people, Username = marry.tarin, IP = 209.157.190.11, Assigned private IP address 192.168.237.38 to remote user
if your file is structured consistently
awk -F'[ ,]' '{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
if you need to match the username, using your sample input
$ awk -F'[ ,]' '$12~/james/{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
UPDATE
OK, your spaces are not consistent; to fix it, change the -F:
$ awk -F' +|,' '{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
Apr 8 12:16:13 marry.tarin
you can add the /pattern/ to restrict the match to users as above. Note the change in -F option.
-F' +|,' sets the field separator to spaces (one or more) or comma,
the rest is counting the fields and picking up the right one to print.
/pattern/ will filter the lines that match the regex pattern, which can be constrained to a certain field only (e.g. 12) by $12~/pattern/
if your text may contain mixed case and you want to be case insensitive, use tolower() function, for example
$ awk -F' +|,' 'tolower($12)~/patterninlowercase/{print $1,$2,$3,$12}' file
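Putting the updated separator together as a runnable check, with the two sample lines from the question saved as test.log:

```shell
# the two sample lines from the question
cat > test.log <<'EOF'
Apr 12 07:48:11 172.89.92.41 %ASA-5-713120: Group = People, Username = james.robert, IP = 219.89.259.32, PHASE 2 COMPLETED (msgid=9a4ce822)
Apr 8 12:16:13 172.24.32.1 %ASA-6-713228: Group = people, Username = marry.tarin, IP = 209.157.190.11, Assigned private IP address 192.168.237.38 to remote user
EOF

# ' +|,' treats a run of spaces or a single comma as the separator
awk -F' +|,' '{print $1,$2,$3,$12}' test.log
# Apr 12 07:48:11 james.robert
# Apr 8 12:16:13 marry.tarin
```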
With sed:
sed -r 's/^([A-Za-z]{3} [0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}).*(Username = [^,]*).*/\1 \2/g' file
You could use awk to delimit by comma and then use substr() and length() to get at the pieces you care about:
awk -F"," '{print substr($1,1,15), substr($2, 13, length($2)-12)}' /var/log/test
With gawk
awk '{u=gensub(/.*(Username = [^,]*).*/,"\\1","g",$0);if ( u ~ "james") {print u,$1,$2,$3}}' file
The following perl will print the date and username delimited by a tab. Add additional valid username characters to [\w.]:
perl -ne '
print $+{date}, "\t", $+{user}, "\n" if
/^(?<date>([^\s]+\s+){2}[^\s]+).*\bUsername\s*=\s*(?<user>[\w.]+)/
'
Varying amounts of tabs and spaces are allowed.

Linux/bash parse text output, select fields, ignore nulls in one field only

I've done my requisite 20 searches but I can't quite find an example that includes the 'ignore null' part of what I'm trying to do. Working on a Linux-ish system that uses bash and has grep/awk/sed/perl and the other usual suspects. Output from a job is in the format:
Some Field I Dont Care About = Nothing Interesting
Another Field That Doesnt Matter = 216
Name = The_Job_name
More Useless Stuff = Blah blah
Estimated Completion = Aug 13, 2015 13:30 EDT
Even Yet Still More Nonsense = Exciting value
...
Jobs not currently active will have a null value for estimated completion time. The field names are long, and multi-word names contain spaces as shown. The delimiter is always "=" and it always appears in the same column, padded with spaces on either side. There may be dozens of jobs listed, and there are about 36 fields for each job. At any given time there are only one or two active, and those are the ones I care about.
I am trying to get the value for the 'Name' field and the value of the 'Estimated Completion' field on a single line for each record that is currently active, hence ignoring nulls, like this:
Job_04 Aug 13, 2015 13:30 EDT
Job_21 Aug 09, 2015 10:10 EDT
...
I started with <command> | grep '^Name\|^Estimated' which got me the lines I care about.
I have moved on to awk -F"=" '/^Name|^Estimated/ {print $2}', which gets the values by themselves. This is where it starts to go awry: I tried to join every other line using awk -F"=" '/^Name|^Estimated/ {print $2}' | sed 'N;s/\n/ /', but the output from that is seriously wonky. On top of this, I am not sure whether I should be looking for blank lines and eliminating them (and the preceding line) to get rid of the nulls at this point, or whether it would be better to read the values into variables and printf them.
I'm not a Perl guy, but if that would be a better approach I'd be happy to shift gears and go in that direction. Any thoughts or suggestions appreciated, Thanks!
Some Field I Dont Care About = Nothing Interesting
Another Field That Doesnt Matter = 216
Name = Job_4119
More Useless Stuff = Blah blah
Estimated Completion =
Even Yet Still More Nonsense = Exciting value
...
I can't comment, not enough reputation...
But I think something like this will work in your print command:
{printf "%s,",$2;next}{print;}
Or use paste command?
paste -s -d",\n" file
You can do something like:
awk -F"=" '/^Name/ {name=$2} /^Estimated/ { print name, $2}' file
if they always come in the same order: name first, estimate next.
You can then add a NULL check to the last field and don't print the line if it matches like:
awk -F"=" '/^Name/ {name=$2} /^Estimated/ { if($2 != "") {print name, $2}}' file
$ awk -F'\\s*=\\s*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file
The_Job_name Aug 13, 2015 13:30 EDT
Replace \\s with [[:space:]] if you aren't using gawk, i.e.:
$ awk -F'[[:space:]]*=[[:space:]]*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file
and if your awk doesn't even support character classes then GET A NEW AWK but in the meantime:
$ awk -F'[ \t]*=[ \t]*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file
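As a self-contained check of that approach, with two fabricated records (one active, one with a null estimate) standing in for the job listing:

```shell
# fabricated records in the question's "Field = Value" layout
cat > jobs.txt <<'EOF'
Name = Job_04
Estimated Completion = Aug 13, 2015 13:30 EDT
Name = Job_21
Estimated Completion =
EOF

# remember the last Name seen; print it only when the estimate is non-empty
awk -F'[[:space:]]*=[[:space:]]*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' jobs.txt
# -> Job_04 Aug 13, 2015 13:30 EDT
```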

How to get input to long command in the end

Would it be possible to give the input to a very long command at the end of the command? The example below explains my query more clearly.
Currently, while grepping, I have to do it like this:
zgrep -i "A VERY LONG TEXT" file |awk '{print $1}'
Every time I have to move the cursor back to "A VERY LONG TEXT" to change the pattern. I want to alter the command so that "A VERY LONG TEXT" comes at the end, where I can quickly change it:
command1 |command2 |some_magic "A VERY LONG TEXT"
I know I can achieve this by using cat and then grepping; I'm wondering if there is an alternate way to do this. Maybe by assigning it to a temp variable?
EXAMPLE 2:
I need to get a real-time timestamp for all the commands and their output in my session files, so I have to use the command below. But before executing any command I have to move my cursor to unbuffer and change the command. Is there any way I can alter the command so that I can enter my commands at the end of the line?
/yumm 194>unbuffer ls -lrt | awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; }'
2014-10-01 10:38:19 total 0
2014-10-01 10:38:19 -rw-rw-r-- 1 user bcam 0 Oct 1 10:37 1
2014-10-01 10:38:19 -rw-rw-r-- 1 user bcam 0 Oct 1 10:38 test1
2014-10-01 10:38:19 -rw-rw-r-- 1 user bcam 0 Oct 1 10:38 test2
2014-10-01 10:38:19 -rw-rw-r-- 1 user bcam 0 Oct 1 10:38 test3
2014-10-01 10:38:19 -rw-rw-r-- 1 user bcam 0 Oct 1 10:38 test4
yumm 195>
In short, I need some command to timestamp all the commands I execute and their output.
What if you just set this text to a variable?
mystring="A VERY LONG TEXT"
zgrep -i "$mystring" file | awk '{print $1}'
note you need double quotes to make it work
Based on your edit, you can also do:
awk '{ print strftime("%Y-%m-%d %H:%M:%S"), $0; }' <<< "$(unbuffer ls -ltr)"
When editing and re-submitting a command, use:
Ctrl-A to move the cursor back to the start of the line quickly
Ctrl-E to move to the end of the line quickly
Alt-F to move forwards one word
Alt-B to move backwards one word
Or use the fc command to open the last command in an editor, allowing you to edit it (say, with vi commands); when you save it, it gets re-submitted for execution.
These shortcut keys may help you.
Ctrl-a: move to beginning of line
Ctrl-e: move to end of line
Ctrl-b: move to previous character
Ctrl-f: move to next character
Ctrl-p: previous command (same as "UP" key in bash history)
Ctrl-n: next command (same as "DOWN" key in bash history)
Ctrl-h: delete backward character
Ctrl-d: delete the character under the cursor
Ctrl-k: delete characters after the cursor
You can easily picture these if you know the Emacs editor: they use the same key bindings as Emacs.
You can define a function, e.g., in your ~/.bash_profile:
some_magic() {
zgrep "$1" file | awk '{print $1}'
}
And use it the following way:
some_magic "A VERY LONG TEXT"
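A runnable check of the idea, with a made-up gzipped file standing in for the question's file (name and contents are assumptions):

```shell
# made-up compressed input standing in for the question's file
printf 'hello world 1\nother line 2\n' | gzip > file.gz

some_magic() {
    # the pattern now comes last on the caller's command line
    zgrep -i "$1" file.gz | awk '{print $1}'
}

some_magic "HELLO WORLD"    # -> hello
```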
As for your second example: as soon as a command's output is piped, it gets buffered by the pipe, so the timestamp acquired on the other side of the pipe is wrong. Anyway, if you don't mind a slightly-off timestamp, you can use this function:
some_other_magic() {
$1 | awk '{print strftime("%Y-%m-%d %H:%M:%S"), $0}'
}
And use it the following way:
some_other_magic "ls -lrt"
