Using awk with variable - linux

I am using this awk command to extract three columns from a text file.
awk 'BEGIN {FS="\t";OFS=","}; {print $1,$3,$10}' $FILENAME > $OUTPUT
I wish to specify the column numbers as a variable separately so it will be easier to modify in the future like this:
COLUMNS=$1,$3,$10
awk 'BEGIN {FS="\t";OFS=","}; {print $COLUMNS}' $FILENAME > $OUTPUT
However, this prints every column, not only the three I specified. How do I do this properly?

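Like this? Store the field list in a shell variable (single-quoted, so $1 stays literal) and double-quote the awk program so that the shell expands the variable before awk runs: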
$ more file
a,b,c,d,e
1,2,3,4,5
$ a='$1,$2,$NF'
$ awk -F, "{print $a}" file
a b e
1 2 5
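One caveat with the double-quoted approach is that whatever the shell variable holds is pasted into the awk program as code. A minimal sketch of an alternative, assuming the columns are given as plain numbers: pass the list with -v and let awk split it. The function name pick_columns and the awk variable cols are made up for illustration; FS and OFS are the ones from the question.

```shell
# pick_columns LIST [FILE...]
# Print the fields named in LIST (comma-separated field numbers) from
# tab-separated input, joined with commas. Reads stdin if no FILE is given.
# "pick_columns" and "cols" are hypothetical names, not part of awk.
pick_columns() {
    cols=$1
    shift
    awk -v cols="$cols" '
        BEGIN { FS = "\t"; OFS = ","; n = split(cols, c, ",") }
        {
            # Print each requested field, OFS between them, ORS at the end.
            for (i = 1; i <= n; i++)
                printf "%s%s", $(c[i]), (i < n ? OFS : ORS)
        }' "$@"
}

# Example: fields 1 and 3 of tab-separated input.
printf 'a\tb\tc\n1\t2\t3\n' | pick_columns 1,3
```

With this shape the column list can be changed in one place (for example pick_columns 1,3,10 "$FILENAME" > "$OUTPUT") without editing the awk program itself.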

Related

How to merge column output to the end of a row in the previous column?

I have a .csv file containing three columns and I need to merge the value of column 2 with the end of the row of column 1.
The .csv file contains thousands of rows and this needs to be done for each row.
I've tried using awk, but I'm finding it difficult to get the code right:
cat file.csv | awk '{print $1, $2}'
awk '{if ($2!= " ") {print $1+$2 }}'
These of course don't work.
Sample input:
The command used to produce the actual output is simply:
cat test.csv
[2,4,5,6,2,34,61,32,34,54,34, 22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23, 34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34, 21] 0.347643
Desired Output:
col1 col2
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643
Replace "comma followed by one or more spaces" with "comma":
sed 's/, \{1,\}/,/' file.csv
sed 's/, */,/g' file.csv
Print columns $1 and $2 as $1 (optionally separate with a tab):
awk '{print $1 $2, $3}' OFS='\t' file.csv
You can try:
awk '{printf("%s%s\t%s\n",$1,$2,$3)}' file.csv
The only problem I see is spaces after a comma where you don't want them.
$: sed -E 's/,\s+/,/' file.csv
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643
Add -i (after the -E) to make it an in-place edit.
$: sed -Ei 's/,\s+/,/' file.csv
$: cat file.csv
[2,4,5,6,2,34,61,32,34,54,34,22] 0.144354
[3,4,6,4,5,6,7,1,2,3,4,53,23,34] 0.332453
[2,43,6,2,1,2,5,8,9,0,8,6,34,21] 0.347643
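The \s in the sed command above is a GNU extension. If that is a concern, the same cleanup can be sketched in awk with gsub and a plain bracket expression. This is only an illustration; the function name squeeze_commas is made up.

```shell
# squeeze_commas [FILE...]
# Remove spaces or tabs that directly follow a comma, on every line.
# "squeeze_commas" is a hypothetical helper name.
squeeze_commas() {
    awk '{ gsub(/,[ \t]+/, ","); print }' "$@"
}

# Example with one line from the sample input.
printf '%s\n' '[2,4,5,6,2,34,61,32,34,54,34, 22] 0.144354' | squeeze_commas
```

Unlike the sed commands without the g flag, gsub replaces every match on the line, not only the first.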

find records longer/shorter than a particular col

this is my file: FILEABC.txt
Name|address|age|country
john|london|12|UK
adam|newyork|39|US|X12|123
jake|madrid|45|ESP
ram|delhi
joh|cal|34|US|788
I wanted to find the header count in the file, so I have this command:
cat FILEABC.txt | awk --field-separator='|' '{print NF}' | sort -n |uniq -c
The result I get for this command is:
cat FILEABC.txt | awk --field-separator='|' '{print NF}' | sort -n |uniq -c
1 2
3 4
1 5
1 6
My requirement: how do I find the records in my file that have only 2 fields, 4 fields, and so on?
For example,
if I want to see the records having only 2 columns:
ram|delhi
if I want to see the records having more than 4 columns:
adam|newyork|39|US|X12|123
If you want to print only the records that have 2 fields, the following may help:
awk -F"|" 'NF==2' Input_file
If you need the lines with more than 4 fields, change the condition to NF>4; for more than 5 fields use NF>5, and so on.
Explanation: -F"|" sets the field separator to a pipe. NF is a built-in awk variable that holds the total number of fields in the current line, so NF==2 checks whether the line has exactly 2 fields. An awk program consists of condition/action pairs; no action is given here, so when the condition is true awk applies the default action, which prints the current line.
In awk, the variable NF gives the total number of fields in the current record. By default awk splits fields on runs of whitespace (spaces and tabs); if you change FS, NF is computed using the separator you specify. So what you can do is
awk -v FS='|' 'NF==2' infile
Which is same as
# Usual Syntax : awk 'condition { action }' infile
awk -v FS='|' 'NF==2{ print }' infile
For more than 4 fields,
awk -v FS='|' 'NF > 4' infile
You can also use grep to filter 2-column records:
grep '^[^|]*|[^|]*$' FILEABC.txt
It will output:
ram|delhi
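If the goal is to find every record whose field count differs from the header's, the NF test can be compared against the first line instead of a hard-coded number. A small sketch, assuming the first line is the header; the function name odd_records and the variable want are made up for illustration.

```shell
# odd_records [FILE...]
# Print each record whose field count differs from the header line,
# prefixed with its own field count. Input is pipe-delimited.
# "odd_records" and "want" are hypothetical names.
odd_records() {
    awk -F'|' '
        NR == 1    { want = NF; next }   # remember the header field count
        NF != want { print NF ": " $0 }  # report records that do not match
    ' "$@"
}

# Example with a few lines shaped like FILEABC.txt.
printf '%s\n' 'Name|address|age|country' 'john|london|12|UK' \
    'ram|delhi' 'joh|cal|34|US|788' | odd_records
```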

awk print number of row only in uniq column

I have data set like this:
1 A
1 B
1 C
2 A
2 B
2 C
3 B
3 C
And I have a script which calculates:
Number of occurrences of the search string
Number of rows
awk -v search="A" \
'BEGIN{count=0} $2 == search {count++} END{print count "\n" NR}' input
That works perfectly fine.
I would like to add to my awk one-liner the number of unique values in the first column.
So the output should be separated by \n:
2
8
3
I can do this in separate awk code, but I am not able to integrate it to my original awk code.
awk '{a[$1]++}END{for(i in a){print i}}' input | wc -l
Any idea how to integrate it into one awk solution without piping?
Looks like you want this:
awk -v search="A" '{a[$1]++}
$2 == search {count++}
END{OFS="\n";print count+0, NR, length(a)}' file
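length() applied to an array works in gawk, mawk and recent versions of the one-true-awk, but older awks may reject it. A sketch of a variant that avoids it by counting each first-column value the first time it is seen; the function name summarize and the variable names seen and distinct are made up for illustration.

```shell
# summarize SEARCH [FILE...]
# Print three lines: matches of SEARCH in column 2, total rows,
# and number of distinct values in column 1.
# "summarize", "seen" and "distinct" are hypothetical names.
summarize() {
    search=$1
    shift
    awk -v search="$search" '
        !seen[$1]++  { distinct++ }   # true only the first time $1 appears
        $2 == search { count++ }
        END { print count + 0; print NR; print distinct + 0 }
    ' "$@"
}

# Example with the data set from the question.
printf '%s\n' '1 A' '1 B' '1 C' '2 A' '2 B' '2 C' '3 B' '3 C' | summarize A
```

The "+ 0" forces a numeric 0 to be printed when nothing matched, the same trick the answer above uses for count.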

Removing last column from rows that have three columns using bash

I have a file that contains several lines of data. Some lines contain three columns, but most contain only two. All lines are single-tab separated. For those that contain three columns, the third column is typically redundant and contains the same data as the second so I'd like to remove it.
I imagine awk or cut would be appropriate, but I'm drawing a blank on how to test the row for three columns so my script will only work on those rows. I know awk is a very powerful language with logic and whatnot built into it, I'm just not that strong with it.
I looked at a similar question, but I'm not sure what is going on with the awk answer. Should the -4 be -1 since I only want to remove one column? What about if the row has two columns; will it remove the second even though I don't want to do anything?
I modified it to what I think it would be:
awk -F"\t" -v OFS="\t" '{ for (i=1;i<=NF-4;i++){ print $i }}'
But when I run it (with the file) nothing happens. If I change it to NF-1 or NF-2 I get some output, but it is only a handful of lines and only the first column.
Can anyone clue me into what I should be doing?
If you just want to remove the third column, you could just print the first and the second:
awk -F '\t' '{print $1 "\t" $2}'
And it's similar to cut:
cut -f 1,2
The awk variable NF gives you the number of fields. So an expression like this should work for you.
awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
Running it on an input file like so
a,b,c
x,y
u,v,w
l,m
gives me
$ cat test | awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
a,b
x,y
u,v
l,m
This might work for you (GNU sed):
sed 's/\t[^\t]*//2g' file
Restricts the file to two columns.
awk 'NF==3{print $1"\t"$2}NF==2{print}' your_file
Tested below:
> cat temp
1 2
3 4 5
6 7
8 9 10
>
> awk 'NF==3{print $1"\t"$2}NF==2{print}' temp
1 2
3 4
6 7
8 9
>
Or, more simply, in awk:
awk 'NF==3{print $1"\t"$2}NF==2' your_file
Or you can also go with perl:
perl -lane 'print "$F[0]\t$F[1]"' your_file
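The question says the third column is "typically" redundant, so it may be worth dropping it only when it really repeats the second. A minimal sketch under that assumption, for tab-separated input; the function name drop_dup_third is made up for illustration.

```shell
# drop_dup_third [FILE...]
# For tab-separated input: if a line has exactly three fields and the
# third is identical to the second, print only the first two fields.
# All other lines are printed unchanged.
# "drop_dup_third" is a hypothetical helper name.
drop_dup_third() {
    awk -F'\t' -v OFS='\t' '
        # Appending "" forces a string comparison rather than a numeric one.
        NF == 3 && ($3 "") == ($2 "") { print $1, $2; next }
        { print }
    ' "$@"
}

# Example: first line has a duplicate third column, third line does not.
printf 'a\tb\tb\nc\td\ne\tf\tg\n' | drop_dup_third
```

Lines whose third column differs are left alone, so nothing is lost silently.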

unix - count of columns in file

Given a file with data like this (i.e. stores.dat file)
sid|storeNo|latitude|longitude
2|1|-28.03720000|153.42921670
9|2|-33.85090000|151.03274200
What would be a command to output the number of column names?
i.e. In the example above it would be 4. (number of pipe characters + 1 in the first line)
I was thinking something like:
awk '{ FS = "|" } ; { print NF}' stores.dat
but it prints a count for every line instead of just the first, and for the first line it returns 1 instead of 4.
awk -F'|' '{print NF; exit}' stores.dat
Just quit right after the first line.
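As for why the attempt in the question reports 1 for the first line: FS is assigned inside a per-record action, and by the time that action runs the first record has already been split with the default separator. Setting FS with -F or in a BEGIN block avoids this. A short sketch; the function name header_columns is made up for illustration.

```shell
# header_columns [FILE...]
# Print the number of pipe-delimited fields in the first line only.
# "header_columns" is a hypothetical helper name.
header_columns() {
    # BEGIN runs before any input is read, so FS is in effect for line 1.
    awk 'BEGIN { FS = "|" } { print NF; exit }' "$@"
}

# Example with the first lines of stores.dat from the question.
printf '%s\n' 'sid|storeNo|latitude|longitude' \
    '2|1|-28.03720000|153.42921670' | header_columns
```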
This is a workaround (for me: I don't use awk very often):
Display the first row of the file containing the data, replace all pipes with newlines and then count the lines:
$ head -1 stores.dat | tr '|' '\n' | wc -l
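If the columns are separated by whitespace and contain no spaces themselves, you can pipe the first line to wc -w. For pipe-delimited data such as stores.dat, convert the pipes to spaces first (for example with tr '|' ' ').
wc is "word count"; with -w it counts the whitespace-separated words in its input. If you send it only one line, it tells you the number of columns.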
You could try
cat FILE | awk '{print NF}'
Perl solution similar to Mat's awk solution:
perl -F'\|' -lane 'print $#F+1; exit' stores.dat
I've tested this on a file with 1000000 columns.
If the field separator is whitespace (one or more spaces or tabs) instead of a pipe:
perl -lane 'print $#F+1; exit' stores.dat
If you have python installed you could try:
python -c 'import sys;f=open(sys.argv[1]);print(len(f.readline().split("|")))' \
stores.dat
This is usually what I use for counting the number of fields:
head -n 1 file.name | awk -F'|' '{print NF; exit}'
Select any row in the file (in the example below, it's the 2nd row) and count the number of columns, where the delimiter is a space:
sed -n 2p text_file.dat | tr ' ' '\n' | wc -l
Proper pure bash way
Simply counting columns in file
Under bash, you could simply:
IFS=\| read -ra headline <stores.dat
echo ${#headline[@]}
4
This is a lot quicker because there are no forks, and it is reusable because $headline holds the full header line. You could, for example:
printf " - %s\n" "${headline[@]}"
- sid
- storeNo
- latitude
- longitude
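Note: this syntax correctly handles spaces and other characters in column names.
Alternative: strict check for the maximum number of columns across all rows
What if some rows contain extra columns?
This command strips everything except separators and newlines, then finds the longest remaining line, which gives the maximum separator count (wc -L is not in POSIX; GNU coreutils provides it):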
tr -dc $'\n|' <stores.dat |wc -L
3
If there are at most 3 separators, then there are at most 4 fields. Alternatively, consider that
each separator (|) is preceded by a "before" and followed by an "after", each trimmed to one letter, so that the longest line's length equals the field count directly:
tr -dc $'\n|' <stores.dat|sed 's/./b&a/g;s/ab/a/g;s/[^ab]//g'|wc -L
4
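Where wc -L is not available, the same "widest row" check can be sketched in awk by tracking the largest NF seen. This is an alternative technique, not the tr pipeline above; the function name max_columns is made up for illustration.

```shell
# max_columns [FILE...]
# Print the largest number of pipe-delimited fields found on any line.
# "max_columns" is a hypothetical helper name.
max_columns() {
    awk -F'|' 'NF > max { max = NF } END { print max + 0 }' "$@"
}

# Example: rows with 4, 6 and 2 fields.
printf '%s\n' 'a|b|c|d' '1|2|3|4|5|6' 'x|y' | max_columns
```

Comparing this result with the header's field count shows quickly whether any row carries extra columns.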
Counting columns in a CSV file
Under bash, you may use the csv loadable builtin:
enable -f /usr/lib/bash/csv csv
IFS= read -r line <file.csv
csv -a fields <<<"$line"
echo ${#fields[@]}
4
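For more information, see How to parse a CSV file in Bash?
Based on Cat Kerr's response.
This command works on Solaris (as written it counts whitespace-separated fields; add -F'|' for pipe-delimited data such as stores.dat):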
awk '{print NF; exit}' stores.dat
You may try this, which counts the separators in the first line (add 1 to get the number of columns):
head -1 stores.dat | grep -o \| | wc -l
