How to print third column to last column? - linux

I'm trying to remove the first two columns (which I'm not interested in) from a DbgView log file. I can't seem to find an example that prints from column 3 onwards to the end of the line. Note that each line has a variable number of columns.

...or a simpler solution: cut -f 3- INPUTFILE. Just add the correct delimiter (-d) and you get the same effect.
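For example, on a hypothetical space-separated file (note that cut treats every single space as a delimiter, so runs of spaces or tabs will shift the fields, unlike awk):
$ printf 'a b c d e\nf g h i j\n' | cut -d ' ' -f 3-
c d e
h i j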

awk '{for(i=3;i<=NF;++i)print $i}'

awk '{ print substr($0, index($0,$3)) }'
solution found here:
http://www.linuxquestions.org/questions/linux-newbie-8/awk-print-field-to-end-and-character-count-179078/

Jonathan Feinberg's answer prints each field on a separate line. You could use printf to rebuild the record for output on the same line, but you can also just move the fields a jump to the left.
awk '{for (i=1; i<=NF-2; i++) $i = $(i+2); NF-=2; print}' logfile
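For example, on a made-up line (shrinking NF so that the record is rebuilt works in gawk and most modern awks, but is not guaranteed by POSIX):
$ echo 'one two three four five' | awk '{for (i=1; i<=NF-2; i++) $i = $(i+2); NF-=2; print}'
three four five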

awk '{$1=$2=$3=""}1' file
NB: this method will leave "blanks" in fields 1, 2 and 3, but that is not a problem if you just want to look at the output.
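A quick check with made-up input shows the leftover separators:
$ echo 'a b c d e' | awk '{$1=$2=$3=""}1'
   d e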

If you want to print the columns from the 3rd onwards, for example, on the same line, you can use:
awk '{for(i=3; i<=NF; ++i) printf "%s ", $i; print ""}'
For example:
Mar 09:39 20180301_123131.jpg
Mar 13:28 20180301_124304.jpg
Mar 13:35 20180301_124358.jpg
Feb 09:45 Cisco_WebEx_Add-On.dmg
Feb 12:49 Docker.dmg
Feb 09:04 Grammarly.dmg
Feb 09:20 Payslip 10459 %2828-02-2018%29.pdf
It will print:
20180301_123131.jpg
20180301_124304.jpg
20180301_124358.jpg
Cisco_WebEx_Add-On.dmg
Docker.dmg
Grammarly.dmg
Payslip 10459 %2828-02-2018%29.pdf
As we can see, the payslip filename, even though it contains spaces, shows up on the correct line.

What about the following line:
awk '{$1=$2=$3=""; print}' file
Based on @ghostdog74's suggestion. Mine behaves better when you filter lines, e.g.:
awk '/^exim4-config/ {$1=""; print }' file

awk -v m="\x0a" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
This chops off everything before the given field number N and prints the rest of the line, including field N, maintaining the original spacing (it does not reformat). It doesn't matter if the field's text also appears somewhere else in the line, which is the problem with daisaa's answer.
Define a function:
fromField () {
awk -v m="\x0a" -v N="$1" '{$N=m$N; print substr($0,index($0,m)+1)}'
}
And use it like this:
$ echo " bat bi iru lau bost " | fromField 3
iru lau bost
$ echo " bat bi iru lau bost " | fromField 2
bi iru lau bost
The output maintains everything, including trailing spaces.
This works well for files where '\n' is the record separator, so you don't have that newline character inside the lines. If you want to use it with other record separators, then use:
awk -v m="\x01" -v N="3" '{$N=m$N ;print substr($0, index($0,m)+1)}'
for example. It works well with almost all files, as long as they don't contain the character 0x01 inside the lines.

awk '{a=match($0, $3); print substr($0,a)}'
First you find the position where the third column starts.
With substr you print the whole line ($0) starting at that position (in this case a) through the end of the line.
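For example, with a made-up line (keep in mind $3 is used as a regular expression here, so metacharacters in the field, or the same text occurring earlier in the line, can throw it off):
$ echo 'one two three four' | awk '{a=match($0, $3); print substr($0,a)}'
three four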

The following awk command prints the fields from a given position (here the 6th) to the end of each line, followed by a newline character:
awk '{for( i=6; i<=NF; i++ ){printf( "%s ", $i )}; printf( "\n"); }'
Below is an example that lists the contents of the /usr/bin directory, keeps the last 3 lines, and then prints the last 4 columns of each line using awk:
$ ls -ltr /usr/bin/ | tail -3
-rwxr-xr-x 1 root root 14736 Jan 14 2014 bcomps
-rwxr-xr-x 1 root root 10480 Jan 14 2014 acyclic
-rwxr-xr-x 1 root root 35868448 May 22 2014 skype
$ ls -ltr /usr/bin/ | tail -3 | awk '{for( i=6; i<=NF; i++ ){printf( "%s ", $i )}; printf( "\n"); }'
Jan 14 2014 bcomps
Jan 14 2014 acyclic
May 22 2014 skype

Perl solution:
perl -lane 'splice @F,0,2; print join " ",@F' file
These command-line options are used:
-n loop around every line of the input file, do not automatically print every line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace
-e execute the perl code
splice @F,0,2 cleanly removes columns 0 and 1 from the @F array
join " ",@F joins the elements of the @F array, using a space in-between each element
If your input file is comma-delimited rather than space-delimited, use -F, together with -lane
Python solution:
python -c "import sys;[sys.stdout.write(' '.join(line.split()[2:]) + '\n') for line in sys.stdin]" < file
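A quick check of the Python one-liner with made-up input (python may be python3 on your system):
$ printf '1 2 three four\n5 6 seven\n' | python -c "import sys;[sys.stdout.write(' '.join(line.split()[2:]) + '\n') for line in sys.stdin]"
three four
seven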

Well, you can easily accomplish the same effect using a regular expression. Assuming the separator is a space, it would look like:
awk '{ sub(/[^ ]+ +[^ ]+ +/, ""); print }'
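For example, with made-up input (adjust the pattern if the lines may start with spaces or be tab-separated):
$ echo 'one two three four' | awk '{ sub(/[^ ]+ +[^ ]+ +/, ""); print }'
three four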

awk '{print ""} {for(i=3;i<=NF;++i) printf "%s ", $i}'

A bit late here, but none of the above seemed to work. Try this: using printf, it inserts spaces between the fields. I chose not to print a newline at the end.
awk '{for(i=3;i<=NF;++i) printf("%s ", $i) }'

awk '{for (i=4; i<=NF; i++) printf("%s ", $i); printf("\n");}'
prints the fields from the 4th to the last, in the same order they were in the original file

In Bash you can use the following syntax with positional parameters:
while read -a cols; do echo ${cols[@]:2}; done < file.txt
Learn more: Handling positional parameters at Bash Hackers Wiki
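For example (read -a is bash-specific; the input line here is made up):
$ echo 'col1 col2 col3 col4 col5' | while read -a cols; do echo ${cols[@]:2}; done
col3 col4 col5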

If it's only about ignoring the first two fields, and you don't want the leading space left over from masking those fields (like some of the answers above do):
awk '{gsub($1" "$2" ",""); print;}' file

awk '{$1=$2=""}1' FILENAME | sed 's/^\s\+//'
The first two columns are cleared and sed removes the leading spaces.

In AWK columns are called fields, hence NF is the key
all rows:
awk -F '<column separator>' '{print $(NF-2)}' <filename>
first row only:
awk -F '<column separator>' 'NR<=1{print $(NF-2)}' <filename>
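Note that $(NF-2) is the third field counting from the end of the line, not everything from column 3 onwards; a quick check with a made-up line:
$ echo 'a b c d e' | awk -F ' ' '{print $(NF-2)}'
c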

Related

Count number of ';' in column

I use the following command to count the number of ; in the first line of a file:
awk -F';' '(NR==1){print NF;}' $filename
I would like to do the same for all lines in the file. That is to say, count the number of ; on every line in the file.
What I have :
$ awk -F';' '(NR==1){print NF;}' $filename
11
What I would like to have :
11
11
11
11
11
11
A straightforward method to count ; per line would be:
awk '{print gsub(/;/,"&")}' Input_file
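gsub returns the number of substitutions it made, so replacing ; with itself simply counts the semicolons. For instance, with two made-up lines:
$ printf 'a;b;c\nno semicolons here\n' | awk '{print gsub(/;/,"&")}'
2
0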
To remove empty lines try:
awk 'NF{print gsub(/;/,"&")}' Input_file
To do this the OP's way, subtract 1 from the value of NF:
awk -F';' '{print (NF-1)}' Input_file
OR
awk -F';' 'NF{print (NF-1)}' Input_file
I'd say you can solve your problem with the following:
awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
You want to keep a running count of all the occurrences made (variable a).
As NF will return the number of fields, which is one more than the number of separators, you'll need to subtract 1 for each line. This is the NF-1 part.
However, you don't want to count "-1" for empty lines, where NF is 0. To skip those you need the if (NF) part.
Here's a (perhaps contrived) example:
$ cat test.txt
;;
; ; ; ;;
; asd ;;a
a ; ;
$ awk -F';' '{if (NF) {a += NF-1;}} END {print a}' test.txt
12
Notice the empty line at the end (to test the empty-line, no-separator case).
A different approach using tr and wc:
$ tr -cd ';' < file | wc -c
42
Your code returns a number one more than the number of semicolons; NF is the number of fields you get from splitting on a semicolon (so for example, if there is one semicolon, the line is split in two).
If you want to add up this number across all lines, that's easy:
awk -F ';' '{ sum += NF-1 } END { print sum }' "$filename"
If the number of fields is consistent, you could also just count the lines and multiply:
awk -F ';' 'END { print NR * (NF-1) }' "$filename"
But that's obviously wrong if you can't guarantee that all lines contain exactly the same number of fields.

Set an external variable in awk

I have written a script in which I want to count the number of columns in data.txt . My problem is I am unable to set the x in awk script.
Any help would be highly appreciated.
while read p; do
x=1;
echo $p | awk -F' ' '{x=NF}'
echo $x;
file="$x"".txt";
echo $file;
done <$1
data.txt file:
4495125 94307025 giovy115p@live.it 94307025.094307025 12443
stazla deva1a23@gmail.com 1992/.:\1
1447585 gioao_87@hotmail.it h1st@1
saknit tomboro#seznam.cz 1233 1990
Expected output:
5.txt
3.txt
3.txt
4.txt
My output:
1.txt
1.txt
1.txt
1.txt
You just cannot export a variable set inside Awk back to the shell context. In your example, the value of x set inside Awk (containing NF) will not be reflected outside.
You need to use command substitution ($(..)) syntax to get the value of NF and use it later:
x=$(echo "$p" | awk '{print NF}')
Now x will contain the column count of each line. Note that you don't need to use -F' ', since a single space is the default delimiter in awk anyway.
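Putting that back into the original loop, a minimal sketch (reading the data.txt from the question) would be:
while read -r p; do
  x=$(echo "$p" | awk '{print NF}')   # field count of the current line
  file="$x.txt"
  echo "$file"
done < data.txt
Note that blank lines would produce 0.txt here; the Awk-only version below avoids that.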
Besides, your requirement can be done fully in Awk itself.
awk 'NF{print NF".txt"}' file
Here the NF{..} ensures that the actions inside {..} are applied only to non-empty rows. Then for each row we print the field count with the extension .txt appended.
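Run against the data.txt from the question, this produces the expected output:
$ awk 'NF{print NF".txt"}' data.txt
5.txt
3.txt
3.txt
4.txt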
Awk processes a line at a time -- processing each line in a separate Awk script inside a shell while read loop is horrendously inefficient. See also https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice
Maybe something like this:
awk '{ print >(NF ".txt") }' data.txt
to create a file with the five-column rows in 5.txt, the four-column ones in 4.txt, the three-column rows in 3.txt, etc., one file for each distinct column count.
The Awk variable NF contains the number of fields (by default, Awk splits fields on runs of whitespace -- use -F to change to some other separator) and the expression (NF ".txt") simply produces a string catenation of the number of fields with the suffix .txt which we pass as a file name to the print redirection.
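With the question's data.txt, this would leave one file per distinct field count, something like:
$ awk '{ print >(NF ".txt") }' data.txt
$ ls *.txt
3.txt  4.txt  5.txt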
With bash:
while read p; do p=($p); echo "${#p[@]}.txt"; done < file
or shorter:
while read -a p; do echo "${#p[@]}.txt"; done < file
Output:
5.txt
3.txt
3.txt
4.txt

How to cut word which is having three digits in a file (100) - shell scripting

I have a file with the data below. I want to get the queue names (FID.MAGNET.ERROR.*) that have a depth of 100 or more. Please help me here.
The file is named MQData.
Which command should I use to get the queue names whose depth is 100 or more (three or more digits)?
"Three digits" and ">= 100" have different meanings: 0000 has more than 3 digits but a value of 0, though perhaps your data won't have such cases.
If the length is what matters, I would do awk 'length($1)>2{print $2}' file
If the value is what you are looking at, I would do awk '($1+0)>=100{print $2}' file
The $1+0 makes sure that if $1 has leading zeros, the comparison is still done numerically. Take a look at this example:
kent$ awk 'BEGIN{if("01001"+0>100)print "OK";else print "NOK"}'
OK
kent$ awk 'BEGIN{if("01001">100)print "OK";else print "NOK"}'
NOK
awk '$1 >= 100 {print $2}' MQData
Does that work?
You can skip lines with grep -v. I use echo -e to create a multi-line stream.
echo -e "1 xx\n22 yy\n333 zz\n100 To be deleted" | grep -Ev "^. |^.. |^100 "

How to efficiently get 10% of random lines out of the large file in Linux?

I want to output a random 10% of the total lines of a file. For instance, if file a has 1,000,000 lines then I want to output a random 100,000 lines from the file (100,000 being 10% of 1,000,000).
There is an easy way to do this, provided the file is small:
randomLine=`wc -l a | awk '{printf("%d\n",($1/10))}'`
sort -R a | head -n $randomLine
But using sort -R is very slow; it does a full random shuffle of the whole file. My file has 10,000,000 lines, so sorting takes too much time. Is there any way to achieve a cheaper, not perfectly random but efficient sampling?
Edit Ideas:
Sampling one line out of every ten lines would be acceptable, but I don't know how to do this with a shell script.
Read line by line and if
echo $RANDOM%100 | bc
is greater than 20, then output the line (using a number greater than 10 to ensure we get no less than 10% of the lines), and once 10% of the lines have been output, stop. But I don't know how to read line by line using a shell script.
Edit Description
The reason I want to use a shell script is that my file contains \r characters. The newline character in the file should be \n, but the readline() function in Python and Java treats both \r and \n as newline characters, which doesn't fit my need.
Let's create a random list of X numbers from 1 to Y. You can do it with:
shuf -i 1-Y -nX
In your case,
shuf -i 1-1000000 -n100000
Then you feed those line numbers to awk, so that it prints only those lines:
awk 'FNR==NR {a[$1]; next} {if (FNR in a) print}' <(shuf -i 1-1000000 -n100000) file
Explanation
FNR==NR {a[$1]; next} loops through the shuf results and stores them in the a[] array.
{if (FNR in a) print} if the line number of the second parameter (the file) is found in the array a[], print it.
Sample with Y=10, X=2
$ cat a
1 hello
2 i am
3 fe
4 do
5 rqui
6 and
7 this
8 is
9 sample
10 text
$ awk 'FNR==NR {a[$1]; next} {if (FNR in a) print}' <(shuf -i 1-10 -n2) a
2 i am
9 sample
$ awk 'FNR==NR {a[$1]; next} {if (FNR in a) print}' <(shuf -i 1-10 -n2) a
4 do
6 and
Improvement
As plundra suggested in comments:
shuf -n $(( $(wc -l < $FILENAME) / 10 )) $FILENAME
I think this is the best way:
file=your file here
lines_in_file=`wc -l < $file`
lines_wanted=$(($lines_in_file/10))
shuf -n $lines_wanted $file
Another creative solution:
echo $RANDOM generates a random number between 0 and 32767
Then, you can do:
echo $(($RANDOM*100000/32767+1))
.. to obtain a random number between 1 and 100000 (as nwellnhof points out in comments below, it's not any number from 1 to 100000, but one of 32768 possible numbers between 1 and 100000, so it's kind of a projection...)
So:
file=your file here
lines_in_file=`wc -l $file | awk '{print $1}'`
lines_wanted=$(($lines_in_file/10))
for i in `seq 1 $lines_wanted`
do line_chosen=$(($RANDOM*${lines_in_file}/32767+1))
sed "${line_chosen}q;d" $file
done
I have this script that will give you roughly 1/x of the lines.
#!/usr/bin/perl -w
use strict;
my $ratio = shift;
while (<>) {
print if ((rand) <= 1 / $ratio);
}
For a large enough input, and assuming a uniform distribution of rand's outputs, this gives you roughly 1/$ratio of the lines.
Assuming you call this random_select_ratio.pl, run it like this to get 10% of the lines:
random_select_ratio.pl 10 my_file
or
cat my_file | random_select_ratio.pl 10
Just run this awk script with the file as input.
BEGIN { srand() }{ if (rand() < 0.10) print $0; }
It's been a while since I used awk, but I do believe that should do it.
And in fact it does work exactly as expected. Approximately 10% of the lines are output. On my Windows machine using GNU awk, I ran:
awk "BEGIN { srand() }{ if (rand() < 0.10) print $0; }" <numbers.txt >nums.txt
numbers.txt contained the numbers 1 through 1,000,000, one per line. Over multiple runs, the file nums.txt typically contained about 100,200 items, which works out to 10.02%.
If there's a problem with what awk considers a line, you can always change the record separator, that is, RS = "\n"; but that should already be the default on a Linux machine.
Here's one way to do Edit idea 1. in bash:
while readarray -n10 a; do
[ ${#a[@]} = 0 ] && break
printf "%s" "${a[${RANDOM: -1:1}]}"
done < largefile.txt
Kinda slow, though it was about 2.5x faster than the sort -R method on my machine.
We use readarray to read from the input stream 10 lines at a time into an array. Then we use the last digit of $RANDOM as an index into that array and print the resulting line.
Using the readarray/printf combo should ensure the \r characters are passed through unmodified, as in the edited requirement.

Separating Awk input in Unix

I am trying to write an Awk program that takes two dates whose parts are separated by /, for example 3/22/2013, and breaks them into three separate numbers so that I can work with the 3, the 22 and the 2013 separately.
I would like the program to be called like
awk -f program_file 2/23/2013 4/15/2013
so far I have:
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
}
This will accept both dates, but I am not sure how to break them up. Additionally, the above program must be called with nawk; with awk it says it cannot open 2/23/2013.
Thanks in advance.
You cannot do it that way, since awk thinks you have given it two files as input; that is, your date strings are treated as filenames. That's why you got that error message.
If the two dates are stored in shell variables, you could do:
awk -vd1="$d1" -vd2="$d2" 'BEGIN{split(d1,one,"/");split(d2,two,"/");...}{...}'
The ... part is your logic. In the line above, the split parts are stored in the arrays one and two. For example, to print just the elements of one:
kent$ d1=2/23/2013
kent$ d2=4/15/2013
kent$ awk -vd1="$d1" -vd2="$d2" 'BEGIN{split(d1,one,"/");split(d2,two,"/"); for(x in one)print one[x]}'
2
23
2013
Or, as others suggested, you could use awk's FS, but then you have to do it this way:
kent$ echo $d1|awk -F/ '{print $1,$2,$3}'
2 23 2013
If you pass the two vars in one shot, -F/ won't work unless they (the two dates) are on different lines.
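For example, using the d1 and d2 set above and feeding them on separate lines:
kent$ printf '%s\n%s\n' "$d1" "$d2" | awk -F/ '{print $1, $2, $3}'
2 23 2013
4 15 2013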
hope it helps
How about it?
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $1}'
2
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $2}'
23
[root@01 opt]# echo 2/23/2013 | awk -F[/] '{print $3}'
2013
You could decide to use / as a field separator, and pass -F / to GNU awk (or to nawk)
If you're on a machine with nawk and awk, there's a chance you're on Solaris and using /bin/awk or /usr/bin/awk, both of which are old, broken awk which must never be used. Use /usr/xpg4/bin/awk on Solaris instead.
Anyway, to your question:
$ cat program_file
BEGIN {
d1 = ARGV[1]
d2 = ARGV[2]
split(d1,array,/\//)
print array[1]
print array[2]
print array[3]
exit
}
$ awk -f program_file 2/23/2013 4/15/2013
2
23
2013
There may be better approaches though. Post some more info about what you're trying to do if you'd like help.
