I have a file
#InboxPulse.jmx
request.threads3=10
request.loop=10
duration=300
request.ramp=6
#LaunchPulse.jmx
request.threads1=20
request.loop1=5
duration1=300
request.ramp1=6
#BankRetail.jmx
request.threads2=30
request.loop2=7
duration2=300
request.ramp2=6
I would like to capture the values for
request.threads2
request.threads1
request.threads3
into another file like this:
10
20
30
I tried this
awk '/request.threads[0-9]{1,10}=/{print $NF}' build.properties >> sum.txt
It gives the output as:
request.threads3=10
request.threads1=20
request.threads2=30
How can I get the desired output?
Split on the = sign, match on field 1 (with the dot escaped so it matches literally), print field 2:
awk -F'=' '$1 ~ /^request\.threads[0-9]+$/ {print $2}' build.properties >> sum.txt
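Run against the properties file from the question (recreated here so the snippet is self-contained), this prints the values in file order:

```shell
# Recreate a subset of the sample build.properties from the question
printf 'request.threads3=10\nrequest.loop=10\nrequest.threads1=20\nrequest.loop1=5\nrequest.threads2=30\n' > build.properties

# Split on '=', match on field 1, print field 2
awk -F'=' '$1 ~ /^request\.threads[0-9]+$/ {print $2}' build.properties
# prints 10, 20, 30 (one per line)
```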
1) Extracting values
$ grep -oP 'request\.threads\d+=\K\d+' build.properties
10
20
30
Append > sum.txt to the command to save the output to a file.
2) If sum of those values is needed
$ perl -lne '($v)=/request\.threads\d+=(\d+)/; $s+=$v; END{print $s}' build.properties
60
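If GNU grep or perl is not available, a plain awk sketch can compute the same sum (using the same split-on-= idea as the first answer):

```shell
# Sum all request.threads* values; the +0 guards against an empty input
awk -F'=' '$1 ~ /^request\.threads[0-9]+$/ {sum += $2} END {print sum + 0}' build.properties
```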
Related
How can I add an index to a csv file using awk? For example, let's assume I have a file
data.txt
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
I would like to add another column, which is the index. Basically I would like an output of
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I was trying to use awk '{for (i=1; i<=NF; i++) print $i}' but it does not seem to be working right. What is the best way to add just a comma to the first line, and an incrementing number plus a comma to the rest of the lines?
You may use this awk solution:
awk '{print (NR == 1 ? "" : NR-2) "," $0}' file
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Use this Perl one-liner:
perl -pe '$_ = ( $. > 1 ? ($. - 2) : "" ) . ",$_";' data.txt > out.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
$. : Current input line number.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
I would use GNU AWK for this task in the following way. Let file.txt content be
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
then
awk 'BEGIN{OFS=","}{print NR==1?"":i++,$0}' file.txt
gives output
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Explanation: first I inform GNU AWK that the output field separator (OFS) is ,, so the arguments to print will be joined with that character. Then for each line I use the so-called ternary operator, condition?valueiftrue:valueiffalse, to decide the 1st argument to print: for the 1st line (NR==1) it is an empty string, and for every other line it is the counter i, whose value is returned first and then increased by 1 (post-increment). The 2nd argument to print is always the whole original line ($0).
(tested in gawk 4.2.1)
gawk 'sub("^",substr(++_",",3^(NF~NR)))' FS='^$' \_=-2
mawk 'sub("^",++_+NF ? _",":",")' FS='^$' \_=-2
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
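Those two are deliberately golfed; an ungolfed spelling of the same idea (the header gets only a leading comma, data rows get a zero-based index) might look like:

```shell
# Header line: just a leading comma; data lines: zero-based index, then comma
awk 'NR == 1 {print "," $0; next} {print NR-2 "," $0}' file.txt
```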
What I want to achieve:
grep: extract lines with the contig number and length
awk: remove "length:" from column 2
sort: sort by length (in descending order)
Current code
grep "length:" test_reads.fa.contigs.vcake_output | awk -F:'{print $2}' |sort -g -r > contig.txt
Example content of test_reads.fa.contigs.vcake_output:
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
Expected output
>Contig_0 99995
>Contig_11 42
With your shown samples, please try the following awk + sort solution.
awk -F'[: ]' '/^>/{print $1,$3}' Input_file | sort -nrk2
Explanation: the awk program reads Input_file with the field separator set to : OR space; whenever a line starts with >, it prints the 1st and 3rd fields. That output is then piped (as standard input) to the sort command, which sorts numerically and in reverse on the 2nd field to get the required output.
Here is a gnu-awk solution that does it all in a single command without invoking sort:
awk -F '[:[:blank:]]' '
$2 == "length" {arr[$1] = $3}
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (i in arr)
print i, arr[i]
}' file
>Contig_0 99995
>Contig_11 42
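Note that ordering by index sorts by contig name, which only coincidentally matches the requested descending-length order here. gawk can also order the traversal by the array values instead (a gawk-only sketch):

```shell
# gawk-only: traverse the END loop in descending numeric order of the values (lengths)
awk -F '[:[:blank:]]' '
$2 == "length" {arr[$1] = $3}
END {
  PROCINFO["sorted_in"] = "@val_num_desc"
  for (i in arr)
    print i, arr[i]
}' test_reads.fa.contigs.vcake_output
```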
Perhaps this, combining grep and awk:
awk -F '[ :]' '$2 == "length" {print $1, $3}' file | sort ...
Assumptions:
if more than one row has the same length then additionally sort the 1st column using 'version' sort
Adding some additional lines to the sample input:
$ cat test_reads.fa.contigs.vcake_output
>Contig_0 length:99995
ATTTATGCCGTTGGCCACGAATTCAGAATCATATTA
>Contig_11 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_17 length:93
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_837 ignore-this-length:1000000
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
>Contig_8 length:42
ACTCTGAGTGATCTTGGCGTAATAGGCCTGCTTAATGATCGT
One sed/sort idea:
$ sed -En 's/(>[^ ]+) length:(.*)$/\1 \2/p' test_reads.fa.contigs.vcake_output | sort -k2,2nr -k1,1V
Where:
-En - enable extended regex support and suppress normal printing of input data
(>[^ ]+) - (1st capture group) - > followed by 1 or more non-space characters
length: - a space followed by length:
(.*) - (2nd capture group) - 0 or more characters (following the colon)
$ - end of line
\1 \2/p - print 1st capture group + <space> + 2nd capture group
-k2,2nr - sort by 2nd (space-delimited) field in reverse numeric order
-k1,1V - sort by 1st (space-delimited) field in Version order
This generates:
>Contig_0 99995
>Contig_17 93
>Contig_8 42
>Contig_11 42
I have these output values of an Arduino sensor saved to a text file like this
9 P2.5=195.60 P10=211.00
10 P2.5=195.70 P10=211.10
11 P2.5=195.70 P10=211.10
2295 P2.5=201.20 P10=218.20
2300 P2.5=201.40 P10=218.40
...
...
And I want to extract each column to its own text file.
Expected Output: 3 text Files Number.txt, P25.txt and P10.txt where
Number.txt contains
9
10
11
2295
2300
P25.txt contains
195.60
195.70
195.70
201.20
201.40
and P10.txt contains
211.00
211.10
211.10
218.20
218.40
PS: the file has more than just 5 lines, so the code should be applied to every line.
Here is how you could do it:
$ grep -Po '^[0-9.]+' data.txt > Number.txt
$ grep -Po '(?<=P2\.5=)[0-9.]+' data.txt > P25.txt
$ grep -Po '(?<=P10=)[0-9.]+' data.txt > P10.txt
^: Assert position at the start of the line.
[0-9.]+: Matches a digit or a dot, between one and unlimited times, as many times as possible.
(?<=): Positive lookbehind.
P2\.5=: Matches P2.5=.
P10=: Matches P10=.
-o: Print only matching part.
-P: Perl style regex.
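A quick self-contained check with two of the sample lines (GNU grep is required for -P; the filename data.txt is assumed):

```shell
printf '9 P2.5=195.60 P10=211.00\n10 P2.5=195.70 P10=211.10\n' > data.txt
grep -Po '^[0-9.]+' data.txt            # 9 and 10
grep -Po '(?<=P2\.5=)[0-9.]+' data.txt  # 195.60 and 195.70
grep -Po '(?<=P10=)[0-9.]+' data.txt    # 211.00 and 211.10
```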
Use awk, which can open files itself rather than rely on standard output.
awk '{sub("P2.5=", "", $2);
sub("P10=", "", $3);
print $1 > "Number.txt";
print $2 > "P25.txt";
print $3 > "P10.txt"; }' data.txt
or
awk '{print $1 > "Number.txt";
print substr($2, 6) > "P25.txt";
print substr($3, 5) > "P10.txt"; }' data.txt
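For example, with two of the sample lines in data.txt, the second variant produces the three files in one pass (a quick check):

```shell
printf '9 P2.5=195.60 P10=211.00\n10 P2.5=195.70 P10=211.10\n' > data.txt
awk '{print $1 > "Number.txt"
      print substr($2, 6) > "P25.txt"       # drop the leading "P2.5="
      print substr($3, 5) > "P10.txt"}' data.txt   # drop the leading "P10="
cat P25.txt   # 195.60 and 195.70
```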
I have a question about using awk and grep to parse log files.
The log file contains some strings with figures, e.g.
Amount: 20
Amount: 30.1
And I use grep to parse the lines with keyword "Amount", and then use awk to get the amount and do a sum:
the command is like:
cat mylog.log | grep Amount | awk -F 'Amount: ' '{sum+=$2}END{print sum}'
It works fine for me. However, sometimes the mylog.log file does not contain the keyword 'Amount'. In this case, I want to print 0, but the above awk command prints nothing. How can I make awk print something when grep returns nothing?
You can use this:
awk '/^Amount/ {amount+=$2} END {print amount+0}' file
With the +0 trick you make it print 0 in case the value is not set.
Explanation
There is no need for grep + awk: awk alone can grep (and do much more!):
/^Amount/ {} on lines starting with "Amount", perform what is in {}.
amount+=$2 add field 2's value to the counter "amount".
END {print amount+0} after processing the whole file, print the value of amount. Doing +0 makes it print 0 if it wasn't set before.
Note also there is no need to set 'Amount' as the field separator. It suffices with the default one (the space).
Test
$ cat a
Amount: 20
Amount: 30.1
$ awk '/^Amount/ {amount+=$2} END {print amount+0}' a
50.1
$ cat b
hello
$ awk '/^Amount/ {amount+=$2} END {print amount+0}' b
0
If your line only contains "Amount: 20" then use #fedorqui's solution, but if it's more like "The quick brown fox had Amount: 20 bananas" then use:
awk -F'Amount:' 'NF==2{sum+=$2} END{print sum+0}' file
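For example, with an assumed mixed file (mixed.txt here) where the amount can appear mid-line:

```shell
printf 'The quick brown fox had Amount: 20 bananas\nAmount: 30.1\nno amount here\n' > mixed.txt
# NF==2 holds only on lines that contain "Amount:" exactly once
awk -F'Amount:' 'NF==2 {sum += $2} END {print sum + 0}' mixed.txt   # prints 50.1
```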
Awk one-liner,
awk -F 'Amount: ' '/Amount:/{print "1";sum+=$2}!/Amount:/{print "0"}END{print sum}' file
The above awk command prints 1 for lines that contain the string Amount and 0 for lines that don't. In addition, whenever Amount is found on a line, the value (column 2) is added to the sum variable. Finally, the value of sum is printed at the end.
Example:
$ cat file
Amount: 20
Amount: 30.1
foo bar
adbcksjc
sbcskcbks
cnskncsnc
$ awk -F 'Amount: ' '/Amount:/{print "1";sum+=$2}!/Amount:/{print "0"}END{print sum}' file
1
1
0
0
0
0
50.1
I have this file:
$ cat file
1515523 A45678BF141 A11269151
2234545 A45678BE145 A87979746
5432568 A45678B2123 A40629187
7234573 A45678B4154 A98879129
8889568 A45678B5123 A13409137
9234511 A45678B9176 A23589941
3904568 A45678B7123 A52329165
3234555 A45678B1169 A23589497
9643568 A45678B6123 A39969112
1234547 A45678B2132 A40579243
and this script:
cat file | awk '{FS = " "} {print $1" "$3" "$5}'| awk '{
n = split($3, a, "");
s = "";
for (i = 1; i <= n; i += 2) s = s a[i+1] a[i];
print $1, substr($2, length($2)-3, 4), s
}'| cut -d" " -f3,1 > output
And when I open the output with vi, I have:
1515523 F141 11621915^M
2234545 E145 78797964^M
5432568 2123 04261978^M
7234573 4154 89781992^M
8889568 5123 31041973^M
9234511 9176 32859914^M
3904568 7123 25231956^M
3234555 1169 32854979^M
9643568 6123 93691921^M
1234547 2132 04752934^M
I don't know why I obtain ^M. Also, when I run this awk snippet:
cat imei | awk '{FS=" "} {print $2","$1}'
the output is wrong: it does not exchange the columns, and the second column is not printed. Any ideas on what may be happening?
There are carriage returns (^M or Control-M) in the data file. It probably came from a Windows machine at some point.
When you print $2","$1 (which concatenates $2, a string containing a comma, and then $1; it took me a couple of looks to see what it was really doing), the carriage return at the end of $2 makes the second column overwrite the first.
Look at the data file with od -c or similar tools to see the carriage returns in it.
You can use dos2unix or tr or various other techniques to convert the file from DOS/Windows format to Unix format.
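For instance, tr is available essentially everywhere, while dos2unix may not be installed; deleting the carriage returns looks like this (dosfile.txt/unixfile.txt are assumed names):

```shell
printf 'abc\r\ndef\r\n' > dosfile.txt    # simulated DOS-format input
tr -d '\r' < dosfile.txt > unixfile.txt  # strip every carriage return
od -c unixfile.txt                       # verify: no \r remains
```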
Also, given the data format shown, I'd expect not to use -F " " (or the FS = " ", which is equivalent), so that you have columns $1, $2, and $3, which is more obvious than working with columns 1, 3, 5 as shown. You could set OFS to double-blank if you wanted the output with two blanks between columns.
$ dos2unix file
$ awk '{split($3,a,""); print $1, substr($2,8), a[3]a[2]a[5]a[4]a[7]a[6]a[9]a[8]}' file
1515523 F141 11621915
2234545 E145 78797964
5432568 2123 04261978
7234573 4154 89781992
8889568 5123 31041973
9234511 9176 32859914
3904568 7123 25231956
3234555 1169 32854979
9643568 6123 93691921
1234547 2132 04752934
Since you are using awk, you do not need dos2unix.
Simply insert
gsub(/\r/,"");
as the first statement in your awk script.
It cleans up each line as it is read, so subsequent matching or processing never sees any carriage-return characters.
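For instance, prepending it to the compact awk rewrite shown earlier keeps the whole conversion in one process (a sketch; split with an empty separator needs gawk or a recent mawk):

```shell
awk '{gsub(/\r/, "")                       # strip DOS carriage returns first
      split($3, a, "")                     # split field 3 into single characters
      print $1, substr($2, 8), a[3] a[2] a[5] a[4] a[7] a[6] a[9] a[8]}' file
```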
How about a Perl 'one-liner' (with continuation lines):
$ dos2unix file
$ perl -lane \
'$xxxx = substr($F[1],-4);
@c = split(//,$F[2]);
print "$F[0] $xxxx $c[2]$c[1]$c[4]$c[3]$c[6]$c[5]$c[8]$c[7]"' file