Specific fields using cut or awk - Linux

How to cut a specific field from a line?
The problem is I can't use cut -d ' ' -f 1,2,3,4,5,9,10,11,12,13,14, since the field positions change.
Let's say I have a file called /var/log/test, and one of the lines inside the file looks like this:
Apr 12 07:48:11 172.89.92.41 %ASA-5-713120: Group = People, Username = james.robert, IP = 219.89.259.32, PHASE 2 COMPLETED (msgid=9a4ce822)
I only need to get the Username and the Time/Date (please note the columns keep changing, which is why I need to match on Username = james.robert and on Apr 12 07:48:11).
When I use:
grep "james" /var/log/test | cut -d ' ' -f 1,2,3,4,5,9,10,11,12,13,14
it doesn't work for me. So it has to match the username and print only the username and date/time. Any suggestions?
OK, so when I use this:
awk -F'[ ,]' '$12~/username/{print $1,$2,$3,$12}' /var/log/test
it works for some users but not others, because the fields keep moving.
The sample output of this command is :
Apr 12 06:00:39 james.robert
But when I try this command with another username, it doesn't work. Here is an example line for which the above command shows nothing:
Apr 8 12:16:13 172.24.32.1 %ASA-6-713228: Group = people, Username = marry.tarin, IP = 209.157.190.11, Assigned private IP address 192.168.237.38 to remote user

If your file is structured consistently:
awk -F'[ ,]' '{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
If you need to match the username, using your sample input:
$ awk -F'[ ,]' '$12~/james/{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
UPDATE
OK, your spaces are not consistent; to fix that, change the -F option:
$ awk -F' +|,' '{print $1,$2,$3,$12}' file
Apr 12 07:48:11 james.robert
Apr 8 12:16:13 marry.tarin
You can add a /pattern/ to restrict the match to particular users, as above. Note the change in the -F option.
-F' +|,' sets the field separator to one or more spaces, or a comma;
the rest is counting the fields and picking the right ones to print.
/pattern/ filters the lines that match the regex; the match can be constrained to a certain field (e.g. field 12) with $12~/pattern/.
If your text may contain mixed case and you want the match to be case-insensitive, use the tolower() function, for example:
$ awk -F' +|,' 'tolower($12)~/patterninlowercase/{print $1,$2,$3,$12}' file
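If the username's field position varies from line to line, you can also avoid hard-coding $12 altogether by scanning the fields for the Username label. A minimal sketch, assuming the label Username always immediately precedes the = and the value:
$ awk -F' +|,' '{for (i=1; i<=NF; i++) if ($i == "Username") {print $1, $2, $3, $(i+2); next}}' file
Apr 12 07:48:11 james.robert
Apr 8 12:16:13 marry.tarin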

With sed:
sed -r 's/^([A-Za-z]{3} [0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2}).*(Username = [^,]*).*/\1 \2/g' file

You could use awk to delimit by comma and then use substr() and length() to get at the pieces you care about:
awk -F"," '{print substr($1,1,15), substring($3, 13, length($3)-12)}' /var/log/test

With gawk
awk '{u=gensub(/.*(Username = [^,]*).*/,"\\1","g",$0);if ( u ~ "james") {print u,$1,$2,$3}}' file
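If gawk (and hence gensub()) isn't available, a rough POSIX-awk equivalent, sketched here with match() and substr(), would be:
awk 'match($0, /Username = [^,]*/) {u = substr($0, RSTART+11, RLENGTH-11); if (u ~ "james") print u, $1, $2, $3}' file
Here 11 is the length of the literal "Username = " prefix, so the substr() call keeps only the value that follows it.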

The following perl will print the date and username delimited by a tab. Add additional valid username characters to [\w.]:
perl -ne '
print $+{date}, "\t", $+{user}, "\n" if
/^(?<date>([^\s]+\s+){2}[^\s]+).*\bUsername\s*=\s*(?<user>[\w.]+)/
'
Varying amounts of tabs and spaces are allowed.
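For example, run against the log file from the question:
perl -ne 'print $+{date}, "\t", $+{user}, "\n" if /^(?<date>([^\s]+\s+){2}[^\s]+).*\bUsername\s*=\s*(?<user>[\w.]+)/' /var/log/test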


How to convert an uneven TAB separated input file to CSV or PSV using sed command?
28828082-1 04/08/19 08:48 04/11/19 12:37 04/12/19 16:22 4/15-4/16 04/17/19 2 9 LCO W OIP 04/08/19 08:53 21 1 58.00 9 222 79 FEDX FEDXH SL3 484657064673 0410099900691041119 SMITHFIELD RI 02917 "41.890066 , -71.548680" YES
Above is one row. I tried using sed -r 's/^\s+//;s/\s+/|/g', but the result was not as expected.
gawk to the rescue!
$ awk -vFPAT='([^[:space:]]+)|("[^"]+")' -v OFS='|' '$1=$1' file
28828082-1|04/08/19|08:48|04/11/19|12:37|04/12/19|16:22|4/15-4/16|04/17/19|2|9|LCO|W|OIP|04/08/19|08:53|21|1|58.00|9|222|79|FEDX|FEDXH|SL3|484657064673|0410099900691041119|SMITHFIELD|RI|02917|"41.890066 , -71.548680"|YES
Define the field pattern (FPAT) as either a run of non-space characters or a quoted value that may include spaces (but not escaped quotes); set the output field separator to the pipe character; and use $1=$1 to force the record to be re-parsed with the new separators. The assignment also serves as the pattern, so a line is printed only if $1 is neither empty nor zero.
A better version would be ... '{$1=$1; print}'.
Of course, if all the field delimiters are tabs and the quoted string doesn't include any tabs, it's much simpler.
Your question isn't clear, but is this what you're trying to do?
$ printf 'now\t"is the winter"\tof\t"our discontent"\n' > file
$ cat file
now "is the winter" of "our discontent"
$ tr '\t' ',' < file
now,"is the winter",of,"our discontent"
$ tr '\t' '|' < file
now|"is the winter"|of|"our discontent"
Your initial attempt was very close:
sed 's/[[:space:]]\+/|/g' input.txt
Explanation:
[[:space:]] Match a single whitespace character such as space/tab/CR/newline.
\+ Match one or more of the preceding expression.
Update:
If you want to split only on runs of two or more whitespace characters:
sed 's/[[:space:]]\{2,\}/|/g' input.txt
\{2,\} Match two or more of the preceding expression.
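One caveat: unlike the FPAT approach above, the s/[[:space:]]\+/|/g version also splits inside quoted fields that contain spaces. A quick illustration with the quoted coordinate field from the sample row:
$ echo '"41.890066 , -71.548680" YES' | sed 's/[[:space:]]\+/|/g'
"41.890066|,|-71.548680"|YES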

How to count occurrences of a specific word in a group of files with bash/shell script

I have two text files, simple.txt and simple1.txt, with the following data in them:
simple.txt:
hello
hi hi hello
this
is it
simple1.txt:
hello hi
how are you
[]$ tr ' ' '\n' < simple.txt | grep -i -c '\bh\w*'
4
[]$ tr ' ' '\n' < simple1.txt | grep -i -c '\bh\w*'
3
These commands show the number of words that start with "h" for each file, but I want to display the total count, i.e. 7, the total across both files. Can I do this in a single command/shell script?
P.S.: I had to write two commands as tr does not take two file names.
Try this, the straightforward way:
cat simple.txt simple1.txt | tr ' ' '\n' | grep -i -c '\bh\w*'
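With the two sample files above, this prints the combined total:
7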
This alternative requires no pipelines:
$ awk -v RS='[[:space:]]+' '/^h/{i++} END{print i+0}' simple.txt simple1.txt
7
How it works
-v RS='[[:space:]]+'
This tells awk to treat any run of whitespace as the record separator, so each word becomes its own record.
/^h/{i++}
For any record (word) that starts with h, we increment variable i by 1.
END{print i+0}
After we have finished reading all the files, we print out the value of i; adding 0 ensures that 0 is printed (rather than an empty line) if nothing matched.
It is not the case that tr accepts only one filename: it does not accept any filenames at all (it always reads from stdin). That's why, even in your solution, you didn't pass a filename to tr, but used input redirection.
In your case, I think you can replace tr by fmt, which does accept filenames:
fmt -1 simple.txt simple1.txt | grep -i -c -w 'h.*'
(I also changed the grep a bit, because I personally find it more readable this way, but that is a matter of taste.)
Note that both solutions (mine and your original ones) count a string consisting of letters and one or more non-space characters - for instance haaaa.hbbbbbb.hccccc - as a single "block", i.e. it adds only 1 to the count of "h"-words, not 3. Whether or not that is the desired behaviour is up to you to decide.
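If you would instead want each such embedded component counted separately, one option (assuming GNU grep, since \b and \w are GNU extensions) is to extract every match with -o and count the lines:
cat simple.txt simple1.txt | grep -o -i '\bh\w*' | wc -l
For the sample files this also prints 7, but it would count haaaa.hbbbbbb.hccccc as 3 rather than 1.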

How to use multiple field separators in awk in CentOS minimal install

I have an input log file that looks like this:
Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada
Sep 24 22:46:26 192.168.1.9 cts4348 ADD afro438,Adam Froese,Men,Hockey,HO,canada
Sep 24 22:47:09 192.168.1.9 cts4348 ADD atra522,Allison Track,CT,canada
I would like to output just the column that has "ADD" and the two columns that follow which is the username and full name. After I pull that information I will be generating an account based on the username and a comment with the full name. I need to use the "space" and "," as a field separator.
The command I am currently using is:
cat cts4348 | awk -F' ' -v OFS=',' '{print $6 " " $7 $8}'
And here is a sample of my output:
ADD ahay844,AkeemHaynes,Men,Athletics,AT,canada
ADD afro438,AdamFroese,Men,Hockey,HO,canada
ADD atra522,AllisonTrack,CT,canada
Thank you in advance for any help you can provide
Using awk
This approach sets the field separator to be either " ADD " (with its surrounding spaces) or ,:
$ awk -F' ADD |,' '/ADD/{print "ADD", $2, $3}' File
ADD ahay844 Akeem Haynes
ADD afro438 Adam Froese
ADD atra522 Allison Track
Because space-separation is not used, this will work even if the person has a middle name.
Limitation: If the other fields were to contain space-A-D-D-space, then the output might be wrong.
Using sed
$ sed -nE '/ ADD /{s/([^ ]* ){5}//; s/(,[^,]*),.*/\1/p}' File
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
On lines containing ADD, this uses two substitute commands:
s/([^ ]* ){5}// removes the first five space-separated fields.
s/(,[^,]*),.*/\1/ removes everything after the second comma-separated field.
Again, because space-separation is not used, this will work even if the person has a middle name.
Another awk variant:
awk -F'[ ,]' '{print $6,$7","$8,$9}' file
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
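Note that this one splits the name on spaces too, so it prints exactly two name words; unlike the approaches above, it would drop part of a name containing a middle name.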
With grep
$ cat ip.txt
Sep 24 22:44:57 192.168.1.9 cts4348 ADD ahay844,Akeem Haynes,Men,Athletics,AT,canada
Sep 24 22:46:26 192.168.1.9 cts4348 ADD afro438,Adam Froese,Men,Hockey,HO,canada
Sep 24 22:47:09 192.168.1.9 cts4348 ADD atra522,Allison Track,CT,canada
$ grep -o 'ADD[^,]*,[^,]*' ip.txt
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
ADD[^,]* ADD followed by zero or more non-comma characters
, comma
[^,]* zero or more non-comma characters
Since * is greedy, it will try to match as many characters as possible
awk with split:
$ awk -F, '{ split($1, a, " "); print "ADD", a[length(a)] "," $2 }' file.txt
ADD ahay844,Akeem Haynes
ADD afro438,Adam Froese
ADD atra522,Allison Track
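Here split($1, a, " ") breaks the first comma-separated field into space-separated pieces, a[length(a)] picks out the last piece (the username), and $2 is the full name.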

Linux/bash parse text output, select fields, ignore nulls in one field only

I've done my requisite 20 searches but I can't quite find an example that includes the 'ignore null' part of what I'm trying to do. Working on a Linux-ish system that uses bash and has grep/awk/sed/perl and the other usual suspects. Output from a job is in the format:
Some Field I Dont Care About = Nothing Interesting
Another Field That Doesnt Matter = 216
Name = The_Job_name
More Useless Stuff = Blah blah
Estimated Completion = Aug 13, 2015 13:30 EDT
Even Yet Still More Nonsense = Exciting value
...
Jobs not currently active will have a null value for estimated completion time. The field names are long, and multi-word names contain spaces as shown. The delimiter is always "=" and it always appears in the same column, padded with spaces on either side. There may be dozens of jobs listed, and there are about 36 fields for each job. At any given time there are only one or two active, and those are the ones I care about.
I am trying to get the value for the 'Name' field and the value of the 'Estimated Completion' field on a single line for each record that is currently active, hence ignoring nulls, like this:
Job_04 Aug 13, 2015 13:30 EDT
Job_21 Aug 09, 2015 10:10 EDT
...
I started with <command> | grep '^Name\|^Estimated' which got me the lines I care about.
I have moved on to awk -F"=" '/^Name|^Estimated/ {print $2}', which gets the values by themselves. This is where it starts to go awry: I tried to join every other line using awk -F"=" '/^Name|^Estimated/ {print $2}' | sed 'N;s/\n/ /', but the output from that is seriously wonky. On top of that, I am not sure whether I should be looking for blank lines and eliminating them (and the preceding line) to get rid of the nulls at this point, or whether it is better to read the values into variables and printf them.
I'm not a Perl guy, but if that would be a better approach I'd be happy to shift gears and go in that direction. Any thoughts or suggestions appreciated, Thanks!
Some Field I Dont Care About = Nothing Interesting
Another Field That Doesnt Matter = 216
Name = Job_4119
More Useless Stuff = Blah blah
Estimated Completion =
Even Yet Still More Nonsense = Exciting value
...
I can't comment, not enough reputation...
But I think something like this will work in your awk command:
awk -F"=" '/^Name/ {printf "%s,", $2; next} /^Estimated/ {print $2}' file
Or use paste command?
paste -s -d",\n" file
You can do something like:
awk -F"=" '/^Name/ {name=$2} /^Estimated/ { print name, $2}' file
if they always come in the same order: name first, estimate next.
You can then add a null check on the last field, and not print the line if it is empty, like:
awk -F"=" '/^Name/ {name=$2} /^Estimated/ { if($2 != "") {print name, $2}}' file
$ awk -F'\\s*=\\s*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file
The_Job_name Aug 13, 2015 13:30 EDT
Replace \\s with [[:space:]] if you aren't using gawk, i.e.:
$ awk -F'[[:space:]]*=[[:space:]]*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file
and if your awk doesn't even support character classes then GET A NEW AWK but in the meantime:
$ awk -F'[ \t]*=[ \t]*' '{a[$1]=$2} /^Estimated/ && $2{print a["Name"], $2}' file

How to filter the required content from a string in Linux?

I have a string like:
sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext.
Could someone tell me how to extract the number 1342.32 from the above string in Linux?
$ echo "sometext BASEDIR=/someword/1342.32 sometext." |
sed "s/[^0-9.]//g"
> 1342.32.
The sed command searches for anything not in the set "0123456789" or ".", and replaces it with nothing (deletes it). It does this in global mode, so it doesn't stop on the first match.
This is enough if you're just trying to read it. If you're trying to feed the number into another command and need a real number, you will need to clean it up:
$ ... | cut -f 1-2 -d "."
> 1342.32
cut splits the input on the delimiter, then selects fields 1 and 2 (numbered from one). So "1.2.3.4" would return "1.2".
If sometext is always delimited from the surrounding fields by whitespace, try this:
awk '{for (i=1;i<=NF;i++) if ($i ~ /BASEDIR/) print $i}' log.txt |
  awk -F/ '{for (i=1;i<=NF;i++) if ($i ~ /^[0-9.]+$/) print $i}'
The snippet above assumes that your data is contained in a file called log.txt and organised into records and fields, awk-wise.
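With the sample string from the question stored in log.txt, the pipeline prints:
1342.32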
This also works if digits appear in sometext before BASEDIR, as well as if the input has additional lines:
sed -n 's,.*BASEDIR=\(/\w*\)*/\([0-9.]*\).*,\2,p'
-n do not output lines without BASEDIR…
\(/\w*\)* group of / and someword, repeated
\([0-9.]*\) group of repeated digit or decimal point
\2 replacement of everything matched (the entire line) with the 2nd group
p print the result
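For example (assuming GNU sed, since \w is a GNU extension):
$ echo "sometext sometext BASEDIR=/someword/someword/someword/1342.32 sometext sometext." | sed -n 's,.*BASEDIR=\(/\w*\)*/\([0-9.]*\).*,\2,p'
1342.32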
