Extract value based on column header from comma-separated file using bash - Linux

I want to extract the 1st value from a CSV for a specific column name using bash. For example, I want to extract the first value of column "bb". Columns can be in any order.
aa,bb,cc
1,2,3
4,5,6
The output should be 2.

Awk solution:
awk -F',' 'NR == 1{ for(i=1; i<=NF; i++) if ($i == "bb") { pos = i; break } }
NR == 2{ print $pos; exit }' file.csv
The output:
2
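For reuse, the same header-lookup logic can be wrapped in a small shell function. This is a sketch: the function name first_value_of is hypothetical, it assumes a simple CSV without quoted fields containing commas, and unlike the one-liner above it exits with status 1 when the header is missing instead of printing the whole second line (which is what $pos does when pos was never set).

```shell
# first_value_of COLUMN FILE   (hypothetical helper, not a standard command)
# Prints the first data value under header COLUMN in a simple CSV.
# Assumes no quoted fields containing commas.
first_value_of() {
    awk -F',' -v col="$1" '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if ($i == col) { pos = i; break }
            if (!pos) exit 1          # header not found: fail instead of printing $0
            next
        }
        NR == 2 { print $pos; exit }  # first data row only
    ' "$2"
}

# Demo with the sample data from the question
tmp=$(mktemp)
printf 'aa,bb,cc\n1,2,3\n4,5,6\n' > "$tmp"
first_value_of bb "$tmp"    # prints 2
rm -f "$tmp"
```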

Using csvkit (selecting the column by name, so its position does not matter):
csvcut -c bb file.csv | awk 'NR==2'
Output:
2


Sum of 2nd and 3rd columns for same value in 1st column

I want to sum the values in the 2nd and 3rd columns for rows with the same value in the 1st column:
1555971000 6 1
1555971000 0 2
1555971300 2 0
1555971300 3 0
The output should look like:
1555971000 6 3
1555971300 5 0
I have tried the command below:
awk -F" " '{b[$2]+=$1} END { for (i in b) { print b[i],i } } '
but this only seems to work for one column.
Here is another way that reads Input_file 2 times and prints the output in the same order as the keys first appear in Input_file.
awk 'FNR==NR{a[$1]+=$2;b[$1]+=$3;next} ($1 in a){print $1,a[$1],b[$1];delete a[$1]}' Input_file Input_file
If the data is in file d and is already grouped (sorted) by the first column, this works without sorting (tested with GNU awk):
awk 'BEGIN{f=1} {if($1==a||f){b+=$2;c+=$3;f=0} else{print a,b,c;b=$2;c=$3} a=$1} END{print a,b,c}' d
If it is not sorted, sort it first with GNU awk's asort():
awk '{w[NR]=$0} END{asort(w);f=1;for(;i++<NR;){split(w[i],v);if(v[1]==a||f){f=0;b+=v[2];c+=v[3]} else{print a,b,c;b=v[2];c=v[3];} a=v[1]} print a,b,c;}' d
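If GNU awk's asort() is not available, a possible portable variant is to let sort(1) bring equal first fields together and then sum each run in a single awk pass. A sketch, assuming whitespace-separated input; sum_by_first_field is a hypothetical name, and LC_ALL=C is set only so the grouping is byte-wise and predictable.

```shell
# sum_by_first_field [FILE...]   (hypothetical helper; reads stdin if no FILE)
# sort(1) groups equal first fields, then awk sums columns 2 and 3
# for each run of equal keys. Works with any POSIX awk (no asort()).
sum_by_first_field() {
    LC_ALL=C sort -k1,1 "$@" | awk '
        NR == 1 || $1 != key {                 # a new key starts here
            if (NR > 1) print key, s2, s3      # flush the previous group
            key = $1; s2 = 0; s3 = 0
        }
        { s2 += $2; s3 += $3 }                 # add this row to the group
        END { if (NR > 0) print key, s2, s3 }  # flush the last group
    '
}

# Demo: unsorted rows from the question, read from stdin.
# Prints "1555971000 6 3" and then "1555971300 5 0".
printf '1555971300 2 0\n1555971000 0 2\n1555971000 6 1\n1555971300 3 0\n' |
    sum_by_first_field
```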
You can do it with awk, provided the input is grouped by the first field: save the fields of the first record, then for each subsequent record compare its first field with the saved one. If it matches, add fields two and three to the running sums and continue; if it does not, output the saved first field and the running sums and start over with the current record, e.g.
awk '{
if ($1 == a) {
b+=$2; c+=$3;
}
else {
if (NR > 1) print a, b, c;
a=$1; b=$2; c=$3;
}
} END { print a, b, c; }' file
With your input in file, you can copy and paste the foregoing into your terminal and obtain the following:
Example Use/Output
$ awk '{
> if ($1 == a) {
> b+=$2; c+=$3;
> }
> else {
> if (NR > 1) print a, b, c;
> a=$1; b=$2; c=$3;
> }
> } END { print a, b, c; }' file
1555971000 6 3
1555971300 5 0
Using awk Arrays
A shorter, more succinct alternative using arrays, which does not require your input to be in sorted order, would be:
awk '{a[$1]+=$2; b[$1]+=$3} END{ for (i in a) print i, a[i], b[i] }' file
(same output, although for (i in a) visits the keys in an unspecified order, so the lines may come out in a different order)
Because it uses arrays, the summing works equally well if your data file contains the lines in random order, e.g.
1555971300 2 0
1555971000 0 2
1555971000 6 1
1555971300 3 0
Another awk solution that works regardless of the order of the records, sorted or not, and prints the keys in the order they first appear:
awk '{r[$1]++}
r[$1]==1{o[++c]=$1}
{f[$1]+=$2;s[$1]+=$3}
END{for(i=1;i<=c;i++){print o[i],f[o[i]],s[o[i]]}}' file
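If the real file has more than two value columns, the order-preserving idea above can be generalised so that every column after the first is summed. A sketch under the assumption that all columns after the first are numeric; sum_all_columns is a hypothetical name, and a row with fewer columns simply contributes nothing to the missing ones.

```shell
# sum_all_columns [FILE...]   (hypothetical helper; reads stdin if no FILE)
# Sums every column after the first, grouped by $1, printing the keys
# in the order they first appear.
sum_all_columns() {
    awk '
        !($1 in seen) { seen[$1] = 1; order[++n] = $1 }   # first appearance
        {
            if (NF > width) width = NF                    # widest row so far
            for (i = 2; i <= NF; i++) sum[$1, i] += $i
        }
        END {
            for (k = 1; k <= n; k++) {
                line = order[k]
                for (i = 2; i <= width; i++)
                    line = line " " (sum[order[k], i] + 0)  # +0 prints 0, not ""
                print line
            }
        }
    ' "$@"
}

# Demo with the question's data
printf '1555971000 6 1\n1555971000 0 2\n1555971300 2 0\n1555971300 3 0\n' |
    sum_all_columns
```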
Assuming when you wrote:
awk -F" " '{b[$2]+=$1} END { for (i in b) { print b[i],i } } '
you meant to write:
awk '{ b[$1]+=$2 } END{ for (i in b) print i,b[i] }'
It shouldn't be a huge leap to figure out:
$ awk '{ b[$1]+=$2; c[$1]+=$3 } END{ for (i in b) print i,b[i],c[i] }' file
1555971000 6 3
1555971300 5 0
Please get the book "Effective Awk Programming", 4th Edition, by Arnold Robbins and just read a paragraph or 2 about fields and arrays.

Find duplicate lines based on column and print both lines and their numbers with awk

I have the following file:
userID PWD_HASH
test 1234
admin 1234
user 6789
abcd 5555
efgh 6666
root 1234
Using AWK, I need to find both the original lines and their duplicates, with row numbers, so that I get output like:
NR $0
1 test 1234
2 admin 1234
6 root 1234
I have tried the following, but it does not print the correct row number with NR:
awk 'n=x[$2]{print NR" "n;print NR" "$0;} {x[$2]=$0;}' file.txt
Any help would be appreciated!
$ awk '
($2 in a) { # look for duplicates in $2
if(a[$2]) { # if the first one has not been output yet
print a[$2] # output the first, stored one
a[$2]="" # mark it as output
}
print NR,$0 # print the duplicated one
next # skip the storing part that follows
}
{
a[$2]=NR OFS $0 # store the first of each with NR and full record
}' file
Output (with the header in file):
2 test 1234
3 admin 1234
7 root 1234
Using GAWK, you can do this with the construct below:
awk '
NR > 1 { # skip the header line
a[$2][NR-1 " " $0] = 1 # hash -> "line-number line"
}
END {
for (i in a)
if (length(a[i]) > 1) # more than one line shares this hash
for (j in a[i])
print j
}
' Input_File.txt
Create a 2-dimensional array (gawk's arrays of arrays).
In the first dimension, store PWD_HASH; in the second dimension, store the line number (NR-1, not counting the header) concatenated with the whole line ($0).
To display only the duplicates, use the length(a[i]) > 1 condition.
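Arrays of arrays are a gawk extension. A sketch of a portable equivalent for other awks keeps, per hash, a count and a newline-joined string of "line-number line" entries. print_duplicate_groups is a hypothetical name; as with for (i in a) above, the order of the groups is unspecified, while the lines within a group keep their file order.

```shell
# print_duplicate_groups [FILE...]   (hypothetical helper; reads stdin if no FILE)
# POSIX-awk take on the arrays-of-arrays idea: for each hash ($2) keep a
# count and a newline-joined list of "line-number line" strings, then
# print only the lists with 2 or more entries.
print_duplicate_groups() {
    awk '
        NR > 1 {                                   # skip the header line
            cnt[$2]++
            grp[$2] = grp[$2] (cnt[$2] > 1 ? ORS : "") (NR - 1) " " $0
        }
        END {
            for (h in cnt)                         # group order is unspecified
                if (cnt[h] > 1) print grp[h]
        }
    ' "$@"
}

# Demo with the file from the question
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' |
    print_duplicate_groups
```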
Could you please try the following.
awk '
FNR==NR{
a[$2]++
b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0
next
}
a[$2]>1{
print b[$2,FNR]
}
' Input_file Input_file
Output will be as follows.
1 test 1234
2 admin 1234
6 root 1234
Explanation of the above code:
awk ' ##Starting awk program here.
FNR==NR{ ##Checking condition FNR==NR, which is TRUE only while Input_file is read the first time.
a[$2]++ ##Creating an array named a indexed by $2 and incrementing its value by 1 each time the same index is seen.
b[$2,FNR]=FNR==1?FNR:(FNR-1) OFS $0 ##Creating array b indexed by $2,FNR; its value is the line number (FNR-1, so the header is not counted) followed by the whole line.
next ##Using next to skip all further statements.
}
a[$2]>1{ ##Checking if a[$2] is greater than 1; this is executed while Input_file is read the 2nd time.
print b[$2,FNR] ##Printing the value of array b for index $2,FNR.
}
' Input_file Input_file ##Passing Input_file's name 2 times.
Without using awk, but with GNU coreutils tools:
tail -n+2 file | nl | sort -k3n | uniq -D -f2
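tail removes the first line (the header).
nl adds line numbers.
sort sorts numerically on the 3rd field (the hash).
uniq -D prints all the duplicated lines, comparing only from the 3rd field on (-f2 skips the first two fields).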
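Because the pipeline sorts by the hash, the duplicates come out grouped by hash rather than in file order. One possible tweak, assuming GNU coreutils as above, is a final sort -n on the line number that nl added; dups_in_file_order is a hypothetical name.

```shell
# dups_in_file_order FILE   (hypothetical helper; needs GNU uniq for -D)
# Same pipeline as above, plus a numeric sort on the nl line number so the
# duplicated lines are listed in their original file order.
dups_in_file_order() {
    tail -n +2 "$1" | nl | sort -k3n | uniq -D -f2 | sort -n
}

# Demo with the sample file from the question
tmp=$(mktemp)
printf 'userID PWD_HASH\ntest 1234\nadmin 1234\nuser 6789\nabcd 5555\nefgh 6666\nroot 1234\n' > "$tmp"
dups_in_file_order "$tmp"
rm -f "$tmp"
```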

Linux SHELL script, read each row for different number of columns

I have a file with values like these:
1 value1.1 value1.2
2 value2.1
3 value3.1 value3.2 value3.3
I need to read values from it in a shell script, but the number of columns differs from row to row!
I know that if, for example, I want to read the second column, I can do it like this (with the row number as the input parameter):
$ awk -v key=1 '$1 == key { print $2 }' input.txt
value1.1
But as I mentioned, the number of columns is different for each row.
How can I make this read dynamic?
For example:
If the input parameter is 1, I should read the columns of the first row, so the output should be
value1.1 value1.2
If the input parameter is 2, I should read the columns of the second row, so the output should be
value2.1
If the input parameter is 3, I should read the columns of the third row, so the output should be
value3.1 value3.2 value3.3
The point is that the number of columns is not fixed, and I should read the columns of that specific row up to the end of the row.
Thank you
Then you can simply print that row without its leading row number:
awk -v key=1 'NR==key { $1 = ""; print substr($0, 2) }' input.txt
UPDATED
If you want to process the column data individually, there are several ways.
With awk you can say something like:
awk -v key=3 'NR==key {
for (i=2; i<=NF; i++) # start at 2 to skip the row number in $1
printf "column %d = %s\n", i-1, $i
}' input.txt
which outputs:
column 1 = value3.1
column 2 = value3.2
column 3 = value3.3
In awk you can access each column value directly with $1, $2, $3, ... or indirectly with $i, where the variable i holds the column number.
If you prefer going with bash, try something like:
line=$(awk -v key=3 'NR==key' input.txt)
set -- $line # split into columns (unquoted on purpose)
shift # drop the leading row number
for ((i=1; i<=$#; i++)); do
echo "column $i = ${!i}"
done
which outputs the same results.
In bash, indirect access is a little more involved: ${!i} expands to the value of the parameter whose name is held in i (here a positional parameter number such as 1, 2, 3).
Hope this helps.
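A bash-only sketch that avoids awk altogether: read -r -a splits each line into an array, so the row can be matched on its first field and the remaining fields printed by index. print_row_columns is a hypothetical name; it assumes whitespace-separated fields, that the first field is the row label, and that the file ends with a newline; it returns 1 when no row matches.

```shell
# print_row_columns KEY FILE   (hypothetical helper; requires bash)
# Finds the row whose first field equals KEY and prints each remaining
# field as "column N = value". Returns 1 if no row matches.
print_row_columns() {
    local key=$1 file=$2 i
    local -a fields
    while read -r -a fields; do
        [[ ${#fields[@]} -gt 0 && ${fields[0]} == "$key" ]] || continue
        for ((i = 1; i < ${#fields[@]}; i++)); do
            echo "column $i = ${fields[i]}"
        done
        return 0
    done < "$file"
    return 1    # no row with that key
}

# Demo with the file from the question
tmp=$(mktemp)
printf '1 value1.1 value1.2\n2 value2.1\n3 value3.1 value3.2 value3.3\n' > "$tmp"
print_row_columns 3 "$tmp"
rm -f "$tmp"
```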

awk: print the number of unique values in a column

I have a data set like this:
1 A
1 B
1 C
2 A
2 B
2 C
3 B
3 C
And I have a script which calculates:
the number of occurrences of the search string
the number of rows
awk -v search="A" \
'BEGIN{count=0} $2 == search {count++} END{print count "\n" NR}' input
That works perfectly fine.
I would like to add to my awk one-liner the number of unique values in the first column.
So the output, separated by \n, should be:
2
8
3
I can do this in separate awk code, but I am not able to integrate it to my original awk code.
awk '{a[$1]++}END{for(i in a){print i}}' input | wc -l
Any idea how to integrate it into one awk solution without piping?
Looks like you want this:
awk -v search="A" '{a[$1]++}
$2 == search {count++}
END{OFS="\n";print count+0, NR, length(a)}' file
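length() applied to an array is an extension that not every awk has historically supported. A sketch that avoids it counts each first-column value the first time it is seen (the common !seen[$1]++ idiom); count_stats is a hypothetical wrapper name.

```shell
# count_stats SEARCH FILE   (hypothetical helper)
# Prints, one per line: occurrences of SEARCH in column 2, the number of
# rows, and the number of distinct values in column 1.
count_stats() {
    awk -v search="$1" '
        !seen[$1]++  { uniq++ }     # first time this column-1 value appears
        $2 == search { count++ }    # matches of the search string
        END { print count + 0; print NR; print uniq + 0 }
    ' "$2"
}

# Demo with the data set from the question (prints 2, 8, 3)
tmp=$(mktemp)
printf '1 A\n1 B\n1 C\n2 A\n2 B\n2 C\n3 B\n3 C\n' > "$tmp"
count_stats A "$tmp"
rm -f "$tmp"
```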

Insert values of a column into another column

I have a tab-delimited .txt file with two columns and a long list of values in both columns:
col1 col2
1 a
2 b
3 c
... ...
I want to convert this to:
col1
1
a
2
b
3
c
That is, the values from column 2 should be inserted into column 1 at the correct positions.
Is there any way to do this, maybe using awk, or something else through the command line?
You can ask awk to print the first column and then the second column. Using a separate print for each ensures there is a newline between them:
awk -F"\t" '{print $1; print $2}' file
Or use the following if you want to print only the 1st column of the first (header) line:
awk -F"\t" 'NR==1 {print $1; next} {print $1; print $2}' file
The second command returns the following for your given input:
col1
1
a
2
b
3
c
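This should do it; assigning $1=$1 forces awk to rebuild the record using OFS (a newline here), and the always-true pattern 7 prints each rebuilt record: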
awk -F"\t" -v OFS="\n" '{$1=$1}7' file
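Since the conversion only swaps each tab for a newline, tr(1) can do the same job as the $1=$1 version without awk. A sketch; tabs_to_lines is a hypothetical name, and like the OFS="\n" command it also puts the header's col1 and col2 on separate lines.

```shell
# tabs_to_lines FILE   (hypothetical helper)
# Replaces every tab with a newline, so each row's two columns end up on
# consecutive lines.
tabs_to_lines() {
    tr '\t' '\n' < "$1"
}

# Demo with the question's layout
tmp=$(mktemp)
printf 'col1\tcol2\n1\ta\n2\tb\n3\tc\n' > "$tmp"
tabs_to_lines "$tmp"
rm -f "$tmp"
```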
