How to make a loop with multiple columns in shell? - linux

I have a file with three columns (ID number, x, y)
ifile.txt
1 32.2 21.4
4 33.2 43.5
5 21.3 45.6
12 22.3 32.5
32 21.5 56.3
43 33.4 23.4
44 23.3 22.3
55 22.5 32.4
I would like to loop over columns 2 and 3 so that it reads like:
for x=32.2 and y=21.4; do execute a fortran program
for x=33.2 and y=43.5; do execute the same program
and so on
My following script works, but I need a more efficient way:
s1=1                    # serial number
s2=$(wc -l < ifile.txt) # total number of lines to loop over
while [ $s1 -le $s2 ]
do
x=$(awk 'NR=='$s1' {print $2}' ifile.txt)
y=$(awk 'NR=='$s1' {print $3}' ifile.txt)
cat << EOF > myprog.f
...
take value of x and y
...
EOF
ifort myprog.f
./a.out
(( s1++ ))
done
Kindly note: myprog.f is generated within the script by a cat here-document, for example:
cat << EOF > myprog.f
....
....
take value of x and y
....
....
EOF

A simple way to read the file in bash is:
while read -r _ x y; do
echo "x is $x, y is $y"
# your Fortran code execution
done < ifile.txt
which prints:
x is 32.2, y is 21.4
x is 33.2, y is 43.5
x is 21.3, y is 45.6
x is 22.3, y is 32.5
x is 21.5, y is 56.3
x is 33.4, y is 23.4
x is 23.3, y is 22.3
x is 22.5, y is 32.4

It looks like you're creating Fortran source code in each loop iteration with the loop variables baked into the source code, compiling it, and then invoking it, which is quite inefficient.
Instead, you should create a Fortran program once, and have it accept arguments.
(I don't know Fortran, and you haven't stated a specific compiler, but perhaps this GNU Fortran documentation will get you started.)
Assuming you have such a program and its path is ./a.out, you can invoke awk combined with xargs as follows, passing the 2nd ($2) and 3rd ($3) fields as arguments:
awk '{ print $2, $3 }' file | xargs -n 2 ./a.out
awk '{ print $2, $3 }' prints the 2nd and 3rd whitespace-separated fields from each input line, separated by a space.
xargs -n 2 takes pairs of values from awk's output and invokes ./a.out with each pair as arguments. (This approach relies on the values having no embedded whitespace, which is the case here.)
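Combining that with the while read loop from the first answer, a minimal sketch of the whole flow might look like this (assuming myprog.f is rewritten once to read x and y from its command line, per the documentation pointer above):
ifort myprog.f              # compile once, outside the loop
while read -r _ x y; do
    ./a.out "$x" "$y"       # pass x and y as command-line arguments
done < ifile.txt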

Related

In AWK `rec=rec","$i` doesn't work as expected, where $i is each field in a record

I have my vmstat output on a Linux box as such:
# cat vmstat.out
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 2675664 653028 3489156 0 0 1 19 22 7 5 1 94 0 0
I intend to keep the value under each field in comma-separated format along with a timestamp (of course, to use it as a CSV file to be later transferred to our very loving MS Excel). So basically this is what I want:
Expected Output:
2016,05,19,23,53,58,1,0,0,2675664,653028,3489156,0,0,1,19,22,7,5,1,94,0,0
Script:
cat vmstat.out | awk 'BEGIN{"date +'%Y,%m,%d,%H,%M,%S'"| getline dt;}{if (NR> 2) {i=1;while (i < NF) {rec=rec","$i; i++;} print dt,rec;}}'
Output that I get from my script:
2016,05,19,23,53,58 ,1,0,0,2675664,653028,3489156,0,0,1,19,22,7,5,1,94,0
Note the extra space in 58 ,1 and the last 0 missing from the expected output. I know the part in my script that is messing things up is: rec=rec","$i
How to get around this ?
No need to reinvent awk features:
$ awk -v OFS=, 'BEGIN{time=strftime("%Y,%m,%d,%H,%M,%S")}
NR>2{$1=$1; print time,$0}' file
2016,05,19,15,12,29,1,0,0,2675664,653028,3489156,0,0,1,19,22,7,5,1,94,0,0
The extra space in 58 ,1 is because the comma in print dt,rec tells awk to print a space (OFS) between dt (which ends in 58) and rec (which starts with ,1); it has nothing to do with rec=rec","$i.
The missing last field is because you're telling awk to stop looping before the last field. Changing while (i < NF) to while (i <= NF) would have fixed that, but the loop's not necessary at all (see below).
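A quick illustration of that off-by-one (a throwaway example, not from the original post):
$ echo 'a b c' | awk '{r=""; for (i=1; i<NF; i++) r=r","$i; print "lt:" r}'
lt:,a,b
$ echo 'a b c' | awk '{r=""; for (i=1; i<=NF; i++) r=r","$i; print "le:" r}'
le:,a,b,c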
I'm assuming you don't have GNU awk or you'd be using strftime() instead of date.
Don't have shell call awk to call shell to call date and then a pipe to getline (which you're using unsafely btw, see http://awk.freeshell.org/AllAboutGetline):
awk 'BEGIN{"date +'%Y,%m,%d,%H,%M,%S'"| getline dt;} {script}'
Just have shell call date:
awk -v dt=$(date +'%Y,%m,%d,%H,%M,%S') '{script}'
and after getting rid of the UUOC the full script is simply:
$ awk -v dt=$(date +'%Y,%m,%d,%H,%M,%S') -v OFS=, 'NR>2{$1=dt OFS $1; print}' vmstat.out
2016,05,19,14,53,05,1,0,0,2675664,653028,3489156,0,0,1,19,22,7,5,1,94,0,0
i <= NF will take care of the missing trailing 0.
Instead of looping over the fields, a more awk'ish way of doing the same thing is to set OFS - Output Field Separator to ",".
awk '
BEGIN{OFS="," ; "date +'%Y,%m,%d,%H,%M,%S'"| getline dt;}
{if (NR> 2) {$1=$1 ; print dt,$0;}}
' vmstat.out
One small glitch with that is that awk doesn't rebuild $0 with the new OFS until a field is changed. Setting $1=$1 is enough to force awk to do that (see: setting the output field separator in awk).
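A quick demonstration of that $1=$1 rebuild (illustrative):
$ echo 'a b c' | awk 'BEGIN{OFS=","} {print; $1=$1; print}'
a b c
a,b,c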

Remove lines containing non-numeric entries in bash

I have a sample data file (sample.log) that has entries
0.0262
0.0262
0.7634
5.7262
0.abc02
I need to filter out the lines that contain non-numeric data; in the lines above, that is the last entry.
I tried this
sed 's/[^0-9]//g' sample.log
It removes the non-numeric data but also removes the decimal points; the output I get is
00262
00262
07634
57262
How can I retain the original values while eliminating the non-numeric lines? Can I use tr or awk?
You can't do this job robustly with sed or grep or any other tool that doesn't understand numbers; you need awk instead:
$ cat file
1e3
1f3
0.1.2.3
0.123
$ awk '$0==($0+0)' file
1e3
0.123
The best you could do with a sed solution would be:
$ sed '/[^0-9.]/d; /\..*\./d' file
0.123
which removes all lines that contain anything other than a digit or period, then all lines that contain 2 or more periods (e.g. an IP address), but that still can't recognize exponent notation as a number.
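For example, the sed version throws away the perfectly valid number 1e3 because of the letter e, while the awk version above keeps it:
$ printf '1e3\n0.123\n' | sed '/[^0-9.]/d; /\..*\./d'
0.123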
If you have hex input data and GNU awk (see @dawg's comment below):
$ echo "0x123" | awk --non-decimal-data '$0==($0+0){printf "%s => %f\n", $0, ($0+0)}'
0x123 => 291.000000
In awk:
awk '/^[[:digit:].]+$/{print $0}' file
Or, you negate that (and add potential + or - if that is in your strings):
awk '/[^[:digit:].+-]/{next} 1' file
Or, same logic with sed:
sed '/[^[:digit:].+-]/d' file
Ed Morton's solution is robust. Given:
$ cat nums.txt
1e6
.1e6
1E6
.001
.
0.001
.1.2
1abc2
0.0
-0
-0.0
0x123
0223
011
NaN
inf
abc
$ awk '$0==($0+0) {printf "%s => %f\n", $0, ($0+0)}
$0!=($0+0) {notf[$0]++;}
END {for (e in notf) print "\""e"\""" not a float"}' /tmp/nums.txt
1e6 => 1000000.000000
.1e6 => 100000.000000
1E6 => 1000000.000000
.001 => 0.001000
0.001 => 0.001000
0.0 => 0.000000
-0 => 0.000000
-0.0 => 0.000000
0x123 => 291.000000
0223 => 223.000000
011 => 11.000000
NaN => nan
inf => inf
".1.2" not a float
"1abc2" not a float
"abc" not a float
"." not a float
You can do it easily with grep if you discard any line that contains any letter:
grep -v '[a-z]' test
Use:
$ sed -i '/.*[a-z].*/d' sample.log
This might work for you (GNU sed):
sed '/[^0-9.]/d' file
However, this may give a false positive on, say, an IP address, as it allows more than one period.
Using your test data:
sed '/^[0-9]\.[0-9]\{4\}$/!d' file
Would only match a digit, followed by a . followed by 4 digits.
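Against the sample data at the top of this question, that stricter pattern keeps exactly the numeric lines:
$ sed '/^[0-9]\.[0-9]\{4\}$/!d' sample.log
0.0262
0.0262
0.7634
5.7262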

Sort and counting a column without uniq in bash

I want to add a count of only the first column using bash, without doing uniq, like this:
input:
58311s2727 NC_000082.6 100.00 50
58311s2727 NC_000083.6 100.00 60
58311s2727 NC_000084.6 100.00 70
58310s2691 NC_000080.6 100.00 30
58310s2691 NC_000081.6 100.00 20
58308s2441 NC_000074.6 100.00 50
output:
3 58311s2727 NC_000082.6 100.00 50
3 58311s2727 NC_000083.6 100.00 60
3 58311s2727 NC_000084.6 100.00 70
2 58310s2691 NC_000080.6 100.00 30
2 58310s2691 NC_000081.6 100.00 20
1 58308s2441 NC_000074.6 100.00 50
I tried:
sort input.txt | cut -f1 | uniq -c
but the output is not what I want. I want to know if there is a simple way to solve this.
With sorted input, you can simply use awk, capturing the set of lines that have the same key and printing the previous set out when the key changes. Handling EOF is a tad messy; you have to repeat the printing. You could write an awk function to do the printing, but it is almost overkill for something this simple.
script.awk
$1 != old_key { if (n_keys > 0) for (i = 0; i < n_keys; i++) print n_keys, saved[i]; n_keys = 0 }
{ saved[n_keys++] = $0; old_key = $1 }
END { if (n_keys > 0) for (i = 0; i < n_keys; i++) print n_keys, saved[i] }
Example runs
For the sample input input.txt (which is already grouped), the output is:
$ awk -f script.awk input.txt
3 58311s2727 NC_000082.6 100.00 50
3 58311s2727 NC_000083.6 100.00 60
3 58311s2727 NC_000084.6 100.00 70
2 58310s2691 NC_000080.6 100.00 30
2 58310s2691 NC_000081.6 100.00 20
1 58308s2441 NC_000074.6 100.00 50
$
If you want it sorted, sort it first:
$ sort input.txt | awk -f script.awk
1 58308s2441 NC_000074.6 100.00 50
2 58310s2691 NC_000080.6 100.00 30
2 58310s2691 NC_000081.6 100.00 20
3 58311s2727 NC_000082.6 100.00 50
3 58311s2727 NC_000083.6 100.00 60
3 58311s2727 NC_000084.6 100.00 70
$
Note that amongst other advantages, this can process data from a pipeline because it doesn't need to process the file twice, unlike at least one of the other solutions, which is currently accepted. It also only keeps as many lines in memory as there are lines in the biggest group of a common key, so even fairly big files are unlikely to stress the memory on the system. (The sort probably imposes more memory load than the awk does.)
script2.awk
Using a function, and some white space, the code becomes:
function dump_keys( i) {
if (n_keys > 0)
{
for (i = 0; i < n_keys; i++)
print n_keys, saved[i]
}
n_keys = 0
}
$1 != old_key { dump_keys() }
{ saved[n_keys++] = $0; old_key = $1 }
END { dump_keys() }
The variable i is local to the function (a quirk of awk). I could simply omit it from the argument list since i is not used elsewhere in the script.
This produces the same output as script.awk.
I would do this in awk. But as Aaron said, it will require reading the input twice, since the first time you hit a particular line, you don't know how many other times you'll hit it.
$ awk 'NR==FNR{a[$1]++;next} {print a[$1],$0}' inputfile inputfile
This goes through the file the first time, populating an array with a counter of the first field. Then it goes through a second time, printing the count along with each line.
You can adjust the print statement to suite your formatting requirements (perhaps replacing it with printf).
If you don't want to use awk, and really want this to work natively in bash, you could use a couple of one-liners with for loops to achieve nearly the same results:
$ declare -A a
$ while read word therest; do ((a[$word]++)); done < inputfile
$ while read word therest; do printf "%5d\t%s\t%s\n" "${a[$word]}" "$word" "$therest"; done < inputfile
The declare -A is required because $a needs to be an associative array, with the first word of each line as the key. awk, on the other hand, treats every array as associative. Note that this solution does not maintain your whitespace.
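Run against the sample input above, the second loop prints something like this (%5d right-aligns the count; the format string puts tabs between the first three columns):
    3   58311s2727  NC_000082.6 100.00 50
    3   58311s2727  NC_000083.6 100.00 60
    3   58311s2727  NC_000084.6 100.00 70
    2   58310s2691  NC_000080.6 100.00 30
    2   58310s2691  NC_000081.6 100.00 20
    1   58308s2441  NC_000074.6 100.00 50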
Without uniq, you'll have to read the input twice. There are ways to do that in pure BASH, but this is when I'd switch to a proper scripting language like Python 2:
import codecs
from collections import Counter

filename = '...'
encoding = '...'  # file encoding

# First pass: count occurrences of the first column.
counter = Counter()
with codecs.open(filename, 'r', encoding) as fh:
    for line in fh:
        parts = line.split(' ')
        counter[parts[0]] += 1

# Second pass: prepend the count to each line.
with codecs.open(filename, 'r', encoding) as fh:
    for line in fh:
        parts = line.split(' ')
        count = counter[parts[0]]
        print '%d %s' % (count, line),

How can I remove the string and use the number in front of it

I have a file like this. I use Ubuntu and the terminal.
1345345 dfgdfg
1345 dfgdfg
13445 dfgdfg
1345345 ddfg
15345 df
145 dfgdfg
45 dfgdfg
15 dfgdfg
I want to create a script that removes the strings and then divides or multiplies each number, printing the result next to it, like this:
1345345 *3 or /3 result =
1345 *3 or /3
13445 *3 or /3
1345345 *3 or /3
15345 *3 or /3
145 *3 or /3
45 *3 or /3
15 *3 or /3
This is for a file with 50 or more entries, and the result should then be output to a new text file.
All of this is on Ubuntu.
Thanks
A very basic example would be something like this:
cat input | sed -r 's/ *([0-9]+).*/\1/' | xargs perl -e 'for($c=0;$c<=$#ARGV;$c++) {print ($ARGV[$c] . ": " . $ARGV[$c] * 3 . "\n");}'
(input is a file that contains your data)
gives:
1345345: 4036035
1345: 4035
13445: 40335
1345345: 4036035
15345: 46035
145: 435
45: 135
15: 45
You'll need to flesh it out more to serve your complete purpose, no doubt, but that's supposed to be part of the fun.
So let's break it down.
we pipe (using |) the contents of our input file into a sed regular expression that only extracts the first numbers and ignores everything else: cat input | sed -r 's/ *([0-9]+).*/\1/'
it matches numbers found after any number of spaces, including none ( *), since the example had a few when I copied it
with ([0-9]+)
that may be followed by anything else .*
and replaces the complete string with what it found; that's what the s/input/replace/ construct is about
this would land you with the following result:
1345345
1345
13445
1345345
15345
145
45
15
you wish to perform an operation on this data: for this you generally need a programming language, like python, perl, ruby or whatever else suits your needs (some things your shell will do just fine for you). I used perl here, which gives us | xargs perl -e 'for($c=0;$c<=$#ARGV;$c++) {print ($ARGV[$c] . ": " . $ARGV[$c] * 3 . "\n");}'
So again we pipe the data to our next command with |
to send output from a previous pipe to another program as arguments, we use xargs; it's as simple as prepending your command with it
next, your program loops (for($c=0;$c<=$#ARGV;$c++) { ... }) through the command-line arguments provided, performs your action (here we perform * 3) and prints the result out (print ($ARGV[$c] . ": " . $ARGV[$c] * 3 . "\n");).
Once you have your data, redirect it to a new file (not yet done here).
Alternatively you could also use grep or many other programs; that's the beauty of *nix, it has many tools. The basic concept you're looking for, however, is filtering your data, working on it, and spitting it out again.
Using perl : (in this example, I'm not expecting a space between number and string)
perl -lne 's/\W//; print "$_ x 3 = ", $_ * 3' file.txt | tee newfile.txt
Using awk :
awk '{print $1 " x 3 = " $1 * 3}' file.txt | tee newfile.txt
Using only bash :
while read n x; do echo "$n x 3 = $((n * 3))"; done < file.txt | tee newfile.txt

Using awk to print all columns from the nth to the last

This line worked until I had whitespace in the second field.
svn status | grep '\!' | gawk '{print $2;}' > removedProjs
Is there a way to have awk print everything in $2 or greater? ($3, $4... until we don't have any more columns?)
I suppose I should add that I'm doing this in a Windows environment with Cygwin.
Print all columns:
awk '{print $0}' somefile
Print all but the first column:
awk '{$1=""; print $0}' somefile
Print all but the first two columns:
awk '{$1=$2=""; print $0}' somefile
There's a duplicate question with a simpler answer using cut:
svn status | grep '\!' | cut -d' ' -f2-
-d specifies the delimiter (space), -f specifies the list of columns (all starting with the 2nd)
You could use a for-loop to loop through printing fields $2 through $NF (built-in variable that represents the number of fields on the line).
Edit:
Since "print" appends a newline, you'll want to buffer the results:
awk '{out = ""; for (i = 2; i <= NF; i++) {out = out " " $i}; print out}'
Alternatively, use printf:
awk '{for (i = 2; i <= NF; i++) {printf "%s ", $i}; printf "\n"}'
awk '{out=$2; for(i=3;i<=NF;i++){out=out" "$i}; print out}'
My answer is based on VeeArr's, but I noticed it started with a white space before it would print the second column (and the rest). As I only have 1 reputation point, I can't comment on it, so here it goes as a new answer:
Start with "out" as the second column and then add all the other columns (if they exist). This goes well as long as there is a second column.
Most awk solutions leave a space. The options here avoid that problem.
Option 1
A simple cut solution (works only with single delimiters):
command | cut -d' ' -f3-
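To see that caveat: with cut, every single delimiter starts a new field, so a run of two spaces produces an empty field and shifts the numbering (illustrative):
$ echo 'a  b c' | cut -d' ' -f3-
b c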
Option 2
Forcing an awk re-calculation sometimes removes the added leading space (OFS) left by removing the first fields (works with some versions of awk):
command | awk '{ $1=$2="";$0=$0;} NF=NF'
Option 3
Printing each field formatted with printf will give more control:
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in"|awk -v n=2 '{ for(i=n+1;i<=NF;i++) printf("%s%s",$i,i==NF?RS:OFS);}'
3 4 5 6 7 8
However, all previous answers change all repeated FS between fields to OFS. Let's build a couple of options that do not do that.
Option 4 (recommended)
A loop with sub to remove fields and delimiters at the front.
And using the value of FS instead of space (which could be changed).
Is more portable, and doesn't trigger a change of FS to OFS:
NOTE: The ^[FS]* is to accept an input with leading spaces.
$ in=' 1 2 3 4 5 6 7 8 '
$ echo "$in" | awk '{ n=2; a="^["FS"]*[^"FS"]+["FS"]+";
for(i=1;i<=n;i++) sub( a , "" , $0 ) } 1 '
3 4 5 6 7 8
Option 5
It is quite possible to build a solution that does not add extra (leading or trailing) whitespace, and preserve existing whitespace(s) using the function gensub from GNU awk, as this:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{ print(gensub(a""b""c,"",1)); }'
3 4 5 6 7 8
It also may be used to swap a group of fields given a count n:
$ echo ' 1 2 3 4 5 6 7 8 ' |
awk -v n=2 'BEGIN{ a="^["FS"]*"; b="([^"FS"]+["FS"]+)"; c="{"n"}"; }
{
d=gensub(a""b""c,"",1);
e=gensub("^(.*)"d,"\\1",1,$0);
print("|"d"|","!"e"!");
}'
|3 4 5 6 7 8 | ! 1 2 !
Of course, in such case, the OFS is used to separate both parts of the line, and the trailing white space of the fields is still printed.
NOTE: [FS]* is used to allow leading spaces in the input line.
I personally tried all the answers mentioned above, but most of them were a bit complex or just not right. The easiest way to do it from my point of view is:
awk -F" " '{ for (i=4; i<=NF; i++) print $i }'
Where -F" " defines the delimiter for awk to use. In my case is the whitespace, which is also the default delimiter for awk. This means that -F" " can be ignored.
Where NF defines the total number of fields/columns. Therefore the loop will begin from the 4th field up to the last field/column.
Where $N retrieves the value of the Nth field. Therefore print $i will print the current field/column based on the loop count.
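One thing to be aware of (my note, not part of the original answer): print $i emits each field on its own line. A minimal variation that keeps the fields on one line is:
awk '{ for (i=4; i<=NF; i++) printf "%s%s", $i, (i==NF ? ORS : OFS) }'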
awk '{ for(i=3; i<=NF; ++i) printf $i""FS; print "" }'
lauhub proposed this correct, simple and fast solution here
This was irritating me so much that I sat down and wrote a cut-like field-specification parser, tested with GNU Awk 3.1.7.
First, create a new Awk library script called pfcut, with e.g.
sudo nano /usr/share/awk/pfcut
Then, paste in the script below and save. After that, this is what the usage looks like:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-4"); }'
t1 t2 t3 t4
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("2-"); }'
t2 t3 t4 t5 t6 t7
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk -f pfcut --source '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
To avoid typing all that, I guess the best one can do (see otherwise Automatically load a user function at startup with awk? - Unix & Linux Stack Exchange) is to add an alias to ~/.bashrc; e.g. with:
$ echo "alias awk-pfcut='awk -f pfcut --source'" >> ~/.bashrc
$ source ~/.bashrc # refresh bash aliases
... then you can just call:
$ echo "t1 t2 t3 t4 t5 t6 t7" | awk-pfcut '/^/ { pfcut("-2,4,6-"); }'
t1 t2 t4 t6 t7
Here is the source of the pfcut script:
# pfcut - print fields like cut
#
# sdaau, GNU GPL
# Nov, 2013
function spfcut(formatstring)
{
  # parse format string
  numsplitscomma = split(formatstring, fsa, ",");
  numspecparts = 0;
  split("", parts); # clear/initialize array (for e.g. `tail` piping into `awk`)
  for(i=1;i<=numsplitscomma;i++) {
    commapart=fsa[i];
    numsplitsminus = split(fsa[i], cpa, "-");
    # assume here a range is always just two parts: "a-b"
    # also assume user has already sorted the ranges
    #print numsplitsminus, cpa[1], cpa[2]; # debug
    if(numsplitsminus==2) {
      if ((cpa[1]) == "") cpa[1] = 1;
      if ((cpa[2]) == "") cpa[2] = NF;
      for(j=cpa[1];j<=cpa[2];j++) {
        parts[numspecparts++] = j;
      }
    } else parts[numspecparts++] = commapart;
  }
  n=asort(parts); outs="";
  for(i=1;i<=n;i++) {
    outs = outs sprintf("%s%s", $parts[i], (i==n)?"":OFS);
    #print(i, parts[i]); # debug
  }
  return outs;
}
function pfcut(formatstring) {
  print spfcut(formatstring);
}
Would this work?
awk '{print substr($0,length($1)+1);}' < file
It leaves some whitespace in front though.
Printing out columns starting from #2 (the output will have no trailing space in the beginning):
ls -l | awk '{sub(/[^ ]+ /, ""); print $0}'
echo "1 2 3 4 5 6" | awk '{ $NF = ""; print $0}'
This one uses awk to print all except the last field.
This is what I preferred from all the recommendations:
Printing from the 6th to last column.
ls -lthr | awk '{out=$6; for(i=7;i<=NF;i++){out=out" "$i}; print out}'
or
ls -lthr | awk '{ORS=" "; for(i=6;i<=NF;i++) print $i;print "\n"}'
If you need specific columns printed with an arbitrary delimiter:
awk '{print $3 " " $4}'
col#3 col#4
awk '{print $3 "anything" $4}'
col#3anythingcol#4
So if a column contains whitespace it becomes two columns, but you can join columns with any delimiter or with none.
Perl solution:
perl -lane 'splice @F,0,1; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,1 cleanly removes column 0 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element (see the example run after this list)
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[1:]) + '\n') for line in sys.stdin]" < file
I want to extend the proposed answers to the situation where fields are delimited by possibly several whitespace characters, which I suppose is the reason the OP is not using cut.
I know the OP asked about awk, but a sed approach would work here (example with printing columns from the 5th to the last):
pure sed approach
sed -r 's/^\s*(\S+\s+){4}//' somefile
Explanation (an example run follows the list):
s/// is the standard command to perform substitution
^\s* matches any consecutive whitespace at the beginning of the line
\S+\s+ means a column of data (non-whitespace chars followed by whitespace chars)
(){4} means the pattern is repeated 4 times.
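An illustrative run, stripping the first 4 whitespace-delimited columns:
$ echo '  one   two three four five six' | sed -r 's/^\s*(\S+\s+){4}//'
five six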
sed and cut
sed -r 's/^\s+//; s/\s+/\t/g' somefile | cut -f5-
This just replaces consecutive whitespace with a single tab before cutting.
tr and cut:
tr can also be used to squeeze consecutive characters with the -s option.
tr -s '[:blank:]' <somefile | cut -d' ' -f5-
If you don't want to reformat the part of the line that you don't chop off, the best solution I can think of is written in my answer in:
How to print all the columns after a particular number using awk?
It chops what is before the given field number N and prints all the rest of the line, including field number N, maintaining the original spacing (it does not reformat). It doesn't matter if the string of the field also appears somewhere else in the line.
Define a function:
fromField () {
awk -v m="\x01" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
Output maintains everything, including trailing spaces
In your particular case:
svn status | grep '\!' | fromField 2 > removedProjs
If your file/stream does not contain new-line characters in the middle of the lines (you could be using a different Record Separator), you can use:
awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
The first case will fail only in files/streams that contain the rare hexadecimal char number 1
This awk function returns substring of $0 that includes fields from begin to end:
function fields(begin, end, b, e, p, i) {
b = 0; e = 0; p = 0;
for (i = 1; i <= NF; ++i) {
if (begin == i) { b = p; }
p += length($i);
e = p;
if (end == i) { break; }
p += length(FS);
}
return substr($0, b + 1, e - b);
}
To get everything starting from field 3:
tail = fields(3);
To get section of $0 that covers fields 3 to 5:
middle = fields(3, 5);
The b, e, p, i nonsense in the function parameter list is just an awk way of declaring local variables.
All of the other answers given here and in linked questions fail in various ways given various possible FS values. Some leave leading and/or trailing white space, some convert every FS to the OFS, some rely on semantics that only apply when FS is the default value, some rely on negating FS in a bracket expression which will fail given a multi-char FS, etc.
To do this robustly for any FS, use GNU awk for the 4th arg to split():
$ cat tst.awk
{
split($0,flds,FS,seps)
for ( i=n; i<=NF; i++ ) {
printf "%s%s", flds[i], seps[i]
}
print ""
}
$ printf 'a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -f tst.awk
c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]' -f tst.awk
b c d
$ printf ' a b c d\n' | awk -v n=3 -F'[ ]+' -f tst.awk
b c d
$ printf 'a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
c###d
$ printf '###a###b###c###d\n' | awk -v n=3 -F'###' -f tst.awk
b###c###d
Note that I'm using split() above because its 3rd arg is a field separator, not just a regexp like the 2nd arg to match(). The difference is that field separators have additional semantics beyond regexps, such as skipping leading and/or trailing blanks when the separator is a single blank char; if you wanted to emulate the above with a while(match()) loop or any form of *sub(), you'd need to write code to implement those semantics, whereas split() already implements them for you.
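A small illustration of the seps array the script relies on (GNU awk only, since the 4th arg to split() is a gawk extension):
$ echo 'a  b   c' | gawk '{n=split($0,flds,FS,seps); for (i=1; i<n; i++) printf "[%s][%s]", flds[i], seps[i]; print "[" flds[n] "]"}'
[a][  ][b][   ][c]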
The awk examples look complex here; here is simple Bash shell syntax:
command | while read -a cols; do echo ${cols[@]:1}; done
Where 1 is your nth column counting from 0.
Example
Given this content of file (in.txt):
c1
c1 c2
c1 c2 c3
c1 c2 c3 c4
c1 c2 c3 c4 c5
here is the output:
$ while read -a cols; do echo ${cols[@]:1}; done < in.txt
c2
c2 c3
c2 c3 c4
c2 c3 c4 c5
This works if you are using Bash; you can use as many x placeholders as there are elements you wish to discard, and it ignores multiple spaces if they are not escaped.
while read x b; do echo "$b"; done < filename
Perl:
@m=`ls -ltr dir | grep ^d | awk '{print \$6,\$7,\$8,\$9}'`;
foreach $i (@m)
{
  print "$i\n";
}
UPDATE :
if you wanna use no function calls at all while preserving the spaces and tabs in between the remaining fields, then do :
echo " 1 2 33 4444 555555 \t6666666 " |
{m,g}awk ++NF FS='^[ \t]*[^ \t]*[ \t]+|[ \t]+$' OFS=
=
2 33 4444 555555 6666666
===================
You can make it a lot more straightforward :
svn status | [m/g]awk '/!/*sub("^[^ \t]*[ \t]+",_)'
svn status | [n]awk '(/!/)*sub("^[^ \t]*[ \t]+",_)'
Automatically takes care of the grep earlier in the pipe, as well as trimming out the extra FS after blanking out $1, with the added bonus of leaving the rest of the original input untouched instead of having tabs overwritten with spaces (unless that's the desired effect).
If you're very certain $1 does not contain special characters that need regex escaping, then it's even easier :
mawk '/!/*sub($!_"[ \t]+",_)'
gawk -c/P/e '/!/*sub($!_"""[ \t]+",_)'
Or if you prefer customizing FS+OFS to handle it all :
mawk 'NF*=/!/' FS='^[^ \t]*[ \t]+' OFS='' # this version uses OFS
This should be a reasonably comprehensive awk-field-sub-string-extraction function that
returns substring of $0 based on input ranges, inclusive
clamp in out of range values,
handle variable length field SEPs
has speedup treatments for:
completely no inputs, returning $0 directly
input values resulting in guaranteed empty string ("")
FROM-field == 1
FS = "" that has split $0 out by individual chars
(so the FROM <(_)> and TO <(__)> fields behave like cut -c rather than cut -f)
original $0 restored, w/o overwriting FS seps with OFS
{m,g}awk '{
   print "\n|---BEFORE-------------------------\n" \
   ($0) "\n|----------------------------\n\n [" \
   fld2(2, 5) "]\n [" fld2(3) "]\n [" fld2(4, 2) \
   "]<----------------------------------------------should be \
   empty\n [" fld2(3, 11) "]<------------------------should be \
   capped by NF\n [" fld2() "]\n [" fld2((OFS=FS="")*($0=$0)+11, \
   23) "]<-------------------FS=\"\", split by chars \
   \n\n|---AFTER-------------------------\n" ($0) \
   "\n|----------------------------"
}
function fld2(_,__,___,____,_____)
{
   if (+__==(_=-_<+_ ?+_:_<_) || (___=____="")==__ || !NF) {
      return $_
   } else if (NF<_ || (__=NF<+__?NF:+__)<(_=+_?_:!_)) {
      return ___
   } else if (___==FS || _==!___) {
      return ___<FS \
         ? substr("",$!_=$!_ substr("",__=$!(NF=__)))__ \
         : substr($(_<_),_,__)
   }
   _____=$+(____=___="\37\36\35\32\31\30\27\26\25"\
                     "\24\23\21\20\17\16\6\5\4\3\2\1")
   NF=__
   if ($(!_)~("["(___)"]")) {
      gsub("..","\\&&",___) + gsub(".",___,____)
      ___=____
   }
   __=(_) substr("",_+=_^=_<_)
   while(___!="") {
      if ($(!_)!~(____=substr(___,--_,++_))) {
         ___=____
         break }
      ___=substr(___,_+_^(!_))
   }
   return \
   substr("",($__=___ $__)==(__=substr($!_,
          _+index($!_,___))),_*($!_=_____))(__)
}'
Those <TAB> markers are actual \t (\011) characters, relabeled for display clarity.
|---BEFORE-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
[2 33 4444 555555]
[33]
[]<---------------------------------------------- should be empty
[33 4444 555555 6666666]<------------------------ should be capped by NF
[ 1 2 33 4444 555555 <TAB>6666666 ]
[ 2 33 4444 555555 <TAB>66]<------------------- FS="", split by chars
|---AFTER-------------------------
1 2 33 4444 555555 <TAB>6666666
|----------------------------
I wasn't happy with any of the awk solutions presented here because I wanted to extract the first few columns and then print the rest, so I turned to perl instead. The following code extracts the first two columns, and displays the rest as is:
echo -e "a b c d\te\t\tf g" | \
perl -ne 'my #f = split /\s+/, $_, 3; printf "first: %s second: %s rest: %s", #f;'
The advantage compared to the perl solution from Chris Koknat is that really only the first n elements are split off from the input string; the rest of the string isn't split at all and therefore stays completely intact. My example demonstrates this with a mix of spaces and tabs.
To change the amount of columns that should be extracted, replace the 3 in the example with n+1.
ls -la | awk '{o=$1" "$3; for (i=5; i<=NF; i++) o=o" "$i; print o }'
from this answer is not bad but the natural spacing is gone.
Please then compare it to this one:
ls -la | cut -d' ' -f4-
Then you'd see the difference.
Even ls -la | awk '{$1=$2=""; print}' which is based on the answer voted best thus far is not preserve the formatting.
Thus I would use the following, which also allows explicit selective columns in the beginning:
ls -la | cut -d' ' -f1,4-
Note that every space counts for columns too, so for instance in the below, columns 1 and 3 are empty, 2 is INFO and 4 is 2014-10-11:
$ echo " INFO  2014-10-11 10:16:19 main " | cut -d' ' -f1,3
 
$ echo " INFO  2014-10-11 10:16:19 main " | cut -d' ' -f2,4
INFO 2014-10-11
$
If you want formatted text, chain your commands with echo and use $0 to print the last field.
Example:
for i in {8..11}; do
s1="$i"
s2="str$i"
s3="str with spaces $i"
echo -n "$s1 $s2" | awk '{printf "|%3d|%6s",$1,$2}'
echo -en "$s3" | awk '{printf "|%-19s|\n", $0}'
done
Prints:
| 8| str8|str with spaces 8 |
| 9| str9|str with spaces 9 |
| 10| str10|str with spaces 10 |
| 11| str11|str with spaces 11 |
The top-voted answer by zed_0xff did not work for me.
I have a log where after $5 with an IP address can be more text or no text. I need everything from the IP address to the end of the line should there be anything after $5. In my case, this is actually within an awk program, not an awk one-liner so awk must solve the problem. When I try to remove the first 4 fields using the solution proposed by zed_0xff:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{$1=$2=$3=$4=""; printf "[%s]\n", $0}'
it spits out wrong and useless response (I added [..] to demonstrate):
[ 37.244.182.218 one two three]
There are even some suggestions to combine substr with this wrong answer, but that only complicates things. It offers no improvement.
Instead, if columns are fixed width until the cut point and awk is needed, the correct answer is:
echo " 7 27.10.16. Thu 11:57:18 37.244.182.218" | awk '{printf "[%s]\n", substr($0,28)}'
which produces the desired output:
[37.244.182.218 one two three]
