I want to search for a string in a file; the string is dd//mm. I need to count the number of occurrences of this string. How can I use the string in awk? At present I use something like this, but the result is empty:
awk ' /$1/$2/ {i++}END{print i}' filename.text
Sample contents of the file:
09/Oct/2012 filecontentesfilecontetn
09/OCt/2012 filecontentesfilecontetn
08/OCt/2012 filecontentesfilecontetn
Assuming your awk is GNU:
$ awk '/'$1'\/'$2'/{i++}END{print i}' IGNORECASE=1 file
2
Did you try:
awk '/dd\/\/mm/ { i++; } END { print i; }' filename.txt
If all you're doing is counting, you can use grep for that.
$ grep -c '09/Oct' file.txt
The -c option tells grep to count the lines that match the pattern (a line with several matches is still counted once).
But if you want to use awk and pass the string in from a shell script, you can use an awk variable with -v:
#!/bin/sh
string="09/Oct"
awk -v string="$string" '$0 ~ string {i++} END {print i}' file
If you also need to match things in lower case (per the input shown in your question), you can convert everything to one case before comparison:
awk -v string="$string" 'BEGIN{string=tolower(string)} tolower($0) ~ string {i++} END {print i}' file
This should work with all versions of awk, not just the GNU variety (gawk).
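One caveat with the -v approach: `$0 ~ string` treats the value as a regular expression, and `print i` prints an empty line when nothing matched. Here is a minimal sketch that does a literal, case-insensitive substring search with index() instead. The function name count_lines_containing is made up for illustration, and the sample lines are abbreviated from the question.

```shell
# count_lines_containing STRING [FILE...]
# Hypothetical helper: counts lines containing STRING literally,
# ignoring case. Reads stdin when no FILE is given.
count_lines_containing() {
    s=$1
    shift
    awk -v s="$s" '
        BEGIN { s = tolower(s) }
        index(tolower($0), s) { n++ }   # index() is a plain substring search, not a regex
        END { print n + 0 }             # n + 0 prints 0 instead of an empty line
    ' "$@"
}

printf '%s\n' \
    '09/Oct/2012 filecontentesfilecontetn' \
    '09/OCt/2012 filecontentesfilecontetn' \
    '08/OCt/2012 filecontentesfilecontetn' |
    count_lines_containing '09/Oct'
```

Like the versions above, this counts matching lines, not the total number of occurrences.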
Related
cat a.txt
a.b.c.d.e.google.com
x.y.z.google.com
rev a.txt | awk -F. '{print $2,$3}' | rev
This is showing:
e google
x google
But I want this output
a.b.c.d.e.google
b.c.d.e.google
c.d.e.google
e.google
x.y.z.google
y.z.google
z.google
With your shown samples, please try the following awk code. It was written and tested in GNU awk and should work in any awk.
awk '
BEGIN{
FS=OFS="."
}
{
nf=NF
for(i=1;i<(nf-1);i++){
print
$1=""
sub(/^[[:space:]]*\./,"")
}
}
' Input_file
Here is one more awk solution:
awk -F. '{while (!/^[^.]+\.[^.]+$/) {print; sub(/^[^.]+\./, "")}}' file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using sed
$ sed -En 'p;:a;s/[^.]+\.(.*([^.]+\.){2}[[:alpha:]]+$)/\1/p;ta' input_file
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
Using bash:
IFS=.
while read -ra a; do
for ((i=${#a[@]}; i>2; i--)); do
echo "${a[*]: -i}"
done
done < a.txt
Gives:
a.b.c.d.e.google.com
b.c.d.e.google.com
c.d.e.google.com
d.e.google.com
e.google.com
x.y.z.google.com
y.z.google.com
z.google.com
(I assume the lack of d.e.google.com in your expected output is a typo?)
For a shorter and arguably simpler solution, you could use Perl.
To auto-split the line on the dot character into the @F array, and then print the range you want:
perl -F'\.' -le 'print join(".", @F[0..$#F-1])' a.txt
-F'\.' will auto-split each input line into the @F array. It will split on the given regular expression, so the dot needs to be escaped to be taken literally.
$#F is the index of the last element of the array (one less than the number of elements). So @F[0..$#F-1] is the range of elements from the first one ($F[0]) to the penultimate one. If you wanted to leave out both "google" and "com", you would use @F[0..$#F-2], etc.
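Combining the two ideas from this thread (one line per suffix, with the final label dropped) should reproduce the output requested in the question. A sketch in plain awk, assuming the missing d.e.google line in the expected output is indeed a typo; the function name suffixes_without_tld is made up for illustration.

```shell
# Hypothetical helper: for each dotted name, print every suffix that still
# has at least two labels once the final label (e.g. "com") is dropped.
suffixes_without_tld() {
    awk -F. '{
        for (i = 1; i <= NF - 2; i++) {      # starting label of each suffix
            line = $i
            for (j = i + 1; j < NF; j++)     # j < NF leaves out the last label
                line = line "." $j
            print line
        }
    }' "$@"
}

printf '%s\n' 'a.b.c.d.e.google.com' 'x.y.z.google.com' | suffixes_without_tld
```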
I am able to trim and transpose the data below with sed, but it takes considerable time. I hope it would be faster with awk. Any suggestions are welcome.
Input Sample Data:
[INX_8_60L ] :9:Y
[INX_8_60L ] :9:N
[INX_8_60L ] :9:Y
[INX_8_60Z ] :9:Y
[INX_8_60Z ] :9:Y
Required Output:
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
Just use awk, e.g.
awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
Which will be orders of magnitude faster. It just picks out the substring (e.g. "INX_8_60L") using substr and match. n is simply used as a false/true (0/1) flag to prevent outputting a "!" before the first string.
Example Use/Output
With your data in file you would get:
$ awk -v n=0 '{printf (n?"!%s":"%s", substr ($0,2,match($0,/[ \t]+/)-2)); n=1} END {print ""}' file
INX_8_60L!INX_8_60L!INX_8_60L!INX_8_60Z!INX_8_60Z
Which appears to be what you are after. (Note: I'm not sure what your separator character is, so just change above as needed) If not, let me know and I'm happy to help further.
Edit Per-Changes
Including the '?' isn't difficult, and I just copied the character, so you would now have:
awk -v n=0 '{s=substr($0,2,match($0,/[ \t]+/)-2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1}
END {print ""}' file
Example Output
INX?_8_60L¦INX?_8_60L¦INX?_8_60L¦INX?_8_60Z¦INX?_8_60Z
And to simplify, just operating on the first field as in @JamesBrown's answer, that would reduce to:
awk -v n=0 '{s=substr($1,2); sub(/_/,"?_",s); printf n?"¦%s":"%s", s; n=1} END {print ""}' file
Let me know if that needs more changes.
Don't start so many sed commands; separate the sed operations with semicolons instead.
Try to process the data in a single job and avoid regex. Below, substr() reads the statically sized first block, and the ? is inserted while outputting.
$ awk '{
b=b (b==""?"":";") substr($1,2,3) "?" substr($1,5)
}
END {
print b
}' file
Output:
INX?_8_60L;INX?_8_60L;INX?_8_60L;INX?_8_60Z;INX?_8_60Z
If the fields are not that static in size:
$ awk '
BEGIN {
FS="[[_ ]" # split field with regex
}
{
printf "%s%s?_%s_%s",(i++?";":""), $2,$3,$4 # output semicolons and fields
}
END {
print ""
}' file
Performance of solutions for 20 M records:
Former:
real 0m8.017s
user 0m7.856s
sys 0m0.160s
Latter:
real 0m24.731s
user 0m24.620s
sys 0m0.112s
sed can be very fast when used gingerly, so for simplicity and speed you might wish to consider:
sed -e 's/ .*//' -e 's/\[INX/INX?/' | tr '\n' '|' | sed -e '$s/|$//'
The second call to sed is there to satisfy the requirement that there is no trailing |.
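If paste is available (it is specified by POSIX), the second sed call could probably be dropped, because paste -s only puts the delimiter between lines and so never leaves a trailing one. A sketch under that assumption; the function name transpose_names is made up, and the sample lines are taken from the question.

```shell
# Hypothetical helper: trim each line to its name, insert the "?",
# then join all lines with "|" (no trailing delimiter to clean up).
transpose_names() {
    sed -e 's/ .*//' -e 's/\[INX/INX?/' "$@" | paste -sd'|' -
}

printf '%s\n' \
    '[INX_8_60L ] :9:Y' \
    '[INX_8_60L ] :9:N' \
    '[INX_8_60Z ] :9:Y' |
    transpose_names
```

Note that paste -d takes single-byte delimiters in some implementations, so a multi-byte separator such as ¦ may not work here.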
Another solution using GNU awk:
awk -F'[[ ]+' '
{printf "%s%s",(o?"¦":""),gensub(/INX/,"INX?",1,$2);o=1}
END{print ""}
' file
The field separator is set (with the -F option) so that the wanted parameter ends up in $2.
The main statement prints the modified parameter with the ? character inserted.
The variable o keeps track of whether the delimiter ¦ needs to be printed.
I am trying to get a substring in a string that is in a large line of data.
The regex (INC............) matches the substring I am trying to get the value of at https://regexr.com/, but I am unable to get the value of the substring into a variable or print it out.
The part of the string around this value is
......TemplateID2":null,"Incident Number":"INC000006743193","Priority":"High","mc_ueid":null,"Assint......
I am getting the error char 26: unknown option to `s' when I try this or the entire string is printed out.
cat /tmp/file1 | sed -n 's/\(INC............\)/\1/p'
cat /tmp/file1 | sed -n 's/./*\(INC............).*/\1/'
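Your first command prints the whole line because it replaces the match with itself. Your second has ./* instead of .*, so the unescaped / ends the pattern early and sed reports an unknown option (the closing parenthesis is also missing its backslash). Using sed, you need to match, and thereby remove, what precedes and follows the string: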
sed 's/.*\(INC............\).*/\1/' file
But you can also use grep, if your implementation supports the -o option:
grep -o 'INC............' file
Perl can be used, too:
perl -lne 'print $1 if /(INC............)/' file
That looks like JSON. If it's got {braces} around it which you cut out before posting (tsk tsk), you should definitely use jq if it's available. That said, this page needs some awk!
POSIX (works everywhere):
awk 'match($0, /INC[^"]+/) {print substr($0, RSTART, RLENGTH)}' /tmp/file1
GNU (works on GNU/Linux):
gawk 'match($0, /INC[^"]+/, a) {print a[0]}' /tmp/file1
If you have more than one match per line (GNU):
gawk '{while(match($0=substr($0, RSTART+RLENGTH), /INC[0-9]+/, a)) print a[0]}' /tmp/file1
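The multi-match loop can also be written without the GNU array argument to match(). A minimal sketch for any POSIX awk, assuming the incident numbers are "INC" followed by digits as in the sample; the function name extract_incidents is made up for illustration.

```shell
# Hypothetical helper: print every INC<digits> token, one per line,
# even when a single input line contains several of them.
extract_incidents() {
    awk '{
        rest = $0
        while (match(rest, /INC[0-9]+/)) {
            print substr(rest, RSTART, RLENGTH)     # the matched token
            rest = substr(rest, RSTART + RLENGTH)   # continue after the match
        }
    }' "$@"
}

printf '%s\n' '"Incident Number":"INC000006743193","Other":"INC000000000042"' |
    extract_incidents
```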
I'm fairly new to using the Linux shell.
I want to reduce the amount of pipes I used to extract the following data.
V 190917135635Z 1005 unknown /C=DE/ST=City/L=City/O=something/OU=Somewhat/CN=someserver.com/emailAddress=test@toast.com
My goal is to put the following values into a separate file
190917135635 someserver.com
The command I use right now is fairly long, piped and looks like this
grep -v '^R' $file | awk '{print $2, $6}' | awk -F'[=|/]' '{print $1, $3}' | awk '{print $1, $3}' | awk -F 'Z ' '{print $1, $2}' > sdata.txt
(The file contains other lines starting with 'R' so I exclude those in my grep)
Is this a legit way of doing it?
Is there a way to get this in a shorter command?
Thanks a lot!
Another awk solution, using match() to find the CN entry and substr() to extract it for printing, if it exists:
$ awk '!/^R/{
print $2,
(match($0,/CN=[^/]+/)?substr($0,RSTART+3,RLENGTH-3):"") # 3==length("CN=")
}' file
Output:
190917135635Z someserver.com
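Note that the output above still carries the trailing Z, whereas the question asks for 190917135635. A minimal sketch that also strips it, assuming the timestamp is always the second whitespace-separated field. The regex is passed as a string so that the / inside the bracket expression cannot be mistaken for the end of a regex literal in stricter awks. The function name extract_cn is made up for illustration.

```shell
# Hypothetical helper: for lines not starting with R, print the timestamp
# (without its trailing Z) followed by the CN value, if any.
extract_cn() {
    awk '!/^R/ {
        ts = $2
        sub(/Z$/, "", ts)                              # drop the trailing Z
        cn = ""
        if (match($0, "CN=[^/]+"))
            cn = substr($0, RSTART + 3, RLENGTH - 3)   # 3 == length("CN=")
        print ts, cn
    }' "$@"
}

printf '%s\n' \
    'V 190917135635Z 1005 unknown /C=DE/ST=City/L=City/O=something/OU=Somewhat/CN=someserver.com/emailAddress=test@toast.com' |
    extract_cn > sdata.txt
cat sdata.txt
```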
It looks like some of your data fields are used for creating SSL certificates, so many fields (city, organization name, etc.) might contain spaces. That may be why you needed so many awk commands. Here is one way that might help you overcome these issues: instead of transforming your existing code logic, the goal is to find the domain name by searching for the substring CN= and fetching its corresponding value.
awk '
!/^R/{
start = index($0, "CN=")+3
end = index(substr($0, start), "/")
domain = end ? substr($0, start, end-1) : substr($0, start)
print $2, domain
}
' file.txt
Where:
we use index() to find the start-position of the substring CN=, +3 will be the starting point of the domain name
then we search the next / to get the end-position of this domain. if it's at the end of the line, there will be no / and thus end will be '0'
then we get the domain name between the substring CN= and the next '/' by using substr($0, start, end-1) or the end of line by using substr($0, start).
A short version:
awk '!/^R/{s=index($0, "CN=")+3; e=index(substr($0, s), "/"); print $2, substr($0, s, e ? e-1 : 253)}' file.txt
where 253 is the longest possible domain name which might be enough to fit your needs.
Update:
Actually, it's much easier to just use match(), but the point is the same:
awk '!/^R/{if(match($0, "/CN=([^/]*)")) print $2, substr($0, RSTART+4, RLENGTH-4)}' file.txt
If this:
$ awk -F'[[:space:]/=]+' '!/^R/{print $2+0, $16}' file
190917135635 someserver.com
isn't all you need, then update your question to clarify your requirements and provide more truly representative sample input/output.
Using GNU sed:
sed -E -n '/^R/d; s/^[A-Za-z]\s+([0-9]+)\s+[0-9]+\s+.*\/CN=(.*)\/.*/\1 \2/p' input_file > new_file
EDIT: Strictly assuming that the OP's Input_file is the same as the shown samples, one could try the following.
awk -F"[ =/Z]" '!/^R/{print $8,$37}' Input_file
For fun :), in case one wants to follow the OP's approach, one could try the following.
awk '
!/^R/{
val=$2 OFS $5
split(val,array,"[ /Z]")
val1=array[1] OFS array[9] OFS array[10]
split(val1,array1,"[ =]")
print array1[1],array1[3]
}
' Input_file
You are using $6 in the second awk command, which means your 5th column potentially has spaces inside, unlike the sample data you showed; it is also extracting the CN= part (the certificate's Common Name).
So here's a more compatible and more exact sed way which does not require GNU sed:
sed -n -e '/^R/!{' -e 's|^[^[:space:]]*[[:space:]]*\([^[:space:]Z][^[:space:]Z]*\).*/CN=\([^/][^/]*\).*|\1 \2|p;}'
If you just want digits in the second column and it begins with digit, then you can change to use this:
sed -n -e '/^R/!{' -e 's|^[^[:space:]]*[[:space:]]*\([0-9][0-9]*\).*/CN=\([^/][^/]*\).*|\1 \2|p;}'
I would like to extract the numbers a field contains.
For example, field $5 looks like [u8789] and I would need 8789.
I already know it can be done with echo "[u8789]"|awk -F'[^0-9]*' '$0=$2'
But I need the same inside an awk script, and I have not found out how to get the expected result without calling awk from the shell.
Thanks.
I would use grep:
grep -o '[[:digit:]]\+'
When awk should be used and you have gawk, you can use the FPAT variable:
gawk '{print $1}' FPAT='[0-9]+'
"From inside" of an awk script without the help of FPAT or delimiter artistry, I would use gsub(), like this:
awk '{gsub(/[^[:digit:]]/, "", $0)}1'
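The gsub() call above strips non-digits from the whole record. Since the question is about field $5 specifically, here is a sketch that works on a copy of that field, so the rest of the line stays intact. The function name digits_of_field5 and the sample line are made up for illustration.

```shell
# Hypothetical helper: print only the digits contained in field 5.
digits_of_field5() {
    awk '{
        n = $5                   # work on a copy so $0 is not rebuilt
        gsub(/[^0-9]/, "", n)    # delete everything that is not a digit
        print n
    }' "$@"
}

echo 'a b c d [u8789] e' | digits_of_field5
```

The same three lines can be placed inside any larger awk script, which avoids piping through a second awk process.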
Further reading:
http://www.staff.science.uu.nl/~oostr102/docs/nawk/nawk_92.html
(g)awk scripts
Your question is not very clear...
I believe you want a script.
Following 2 examples
I) get the first 2 numbers in each line (following @hek2mgl)
#!/usr/bin/gawk -f
BEGIN { FPAT="[0-9]+"}
{ print $1,$2}
II) get all the numbers inside brackets [...]
#!/usr/bin/gawk -f
BEGIN { RS="["; FS="]"; }
{ print $1}