insert values of a column into another column - linux

I have a tab-delimited .txt file with two columns and a long list of values in both columns:
col1 col2
1 a
2 b
3 c
... ...
I want to convert this to:
col1
1
a
2
b
3
c
So that the values from column 2 are inserted into column 1 at the correct positions.
Is there any way to do this, maybe using awk, or something else through the command line?

You can ask awk to print the first column and then the second column. Using a separate print for each field ensures you get a newline between them:
awk -F"\t" '{print $1; print $2}' file
Or the following if you want to print only the 1st column of the first (header) line:
awk -F"\t" 'NR==1 {print $1; next} {print $1; print $2}' file
The second command returns the following for your given input:
col1
1
a
2
b
3
c

This should do:
awk -F"\t" -v OFS="\n" '{$1=$1}7' file

Related

Grep csv with arguments from another csv

If I have the following in file1.csv (space delimited):
RED 4 VWX
BLUE 2 MNO
BLUE 7 DEF
PURPLE 6 JKL
BLACK 8 VWX
BROWN 1 MNO
RED 1 GHI
RED 7 ABC
And the following in file2.csv (comma delimited):
BROWN,2
RED,5
YELLOW,8
Is there a way to use file2.csv to search file1.csv for matching lines? Currently, if I want to use line 1 terms from file2.csv to search file1.csv, I have to manually enter the following:
grep "BROWN" file1.csv | grep "2"
I would like to automate this search to find lines in file1.csv that match BOTH items on a given line in file2.csv. I have tried some awk commands, but am having a hard time using awk output as an argument to grep. I am running all this in a standard Mac terminal (so I guess I'm using bash?). Any help is greatly appreciated. Thank you!
An awk one-liner:
awk 'FNR==NR{a[$1]=$2; next} ($1 in a) && a[$1]==$2' FS=, file2.csv FS=" " file1.csv
FNR==NR{a[$1]=$2; next}: reads the first input file, here file2.csv, and creates an associative array a with column 1 of file2.csv as keys and the item numbers as values. The FS=, and FS=" " assignments on the command line take effect before the file that follows each of them, so file2.csv is split on commas and file1.csv on spaces.
($1 in a) && a[$1]==$2: while iterating over the second input file, i.e. file1.csv, checks whether the column 1 value exists as a key in array a. If it exists, checks whether the item number matches. If it does, the expression evaluates to 1 and the line is printed.
Or
Simply using grep:
grep -wf <(tr "," " " <file2.csv) file1.csv
Here we replace , with a space in file2.csv using tr, and use each resulting line as a pattern to search file1.csv with the -f option provided by our lovely grep.
-w matches on word boundaries, so that ABC 1 won't match ABC 123.
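To see exactly which patterns grep receives, you can run the tr step on its own; with the file2.csv above it produces:
$ tr "," " " < file2.csv
BROWN 2
RED 5
YELLOW 8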
You can use awk to do the matching.
awk 'BEGIN { FS=","; }
NR == FNR { a[$1,$2]++; next; }
FNR == 1 { FS=" "; }
($1,$2) in a' file2.csv file1.csv
The second line creates an array whose keys are the value pairs in file2.csv. The third line changes the field separator to space when we reach file1.csv and reassigns $0 so the current line is re-split (changing FS alone only affects the records read afterwards), and the last line matches any line where the first two fields form a key in the array.

awk: print number of rows and number of unique values in a column

I have data set like this:
1 A
1 B
1 C
2 A
2 B
2 C
3 B
3 C
And I have a script which calculates:
Number of occurrences of the search string
Number of rows
awk -v search="A" \
'BEGIN{count=0} $2 == search {count++} END{print count "\n" NR}' input
That works perfectly fine.
I would like to add the number of unique values in the first column to my awk one-liner.
So the output should be separated by \n:
2
8
3
I can do this with separate awk code, but I am not able to integrate it into my original awk code.
awk '{a[$1]++}END{for(i in a){print i}}' input | wc -l
Any idea how to integrate it into one awk solution without piping?
Looks like you want this:
awk -v search="A" '{a[$1]++}
$2 == search {count++}
END{OFS="\n";print count+0, NR, length(a)}' file

AWK: write to a new column based on if/else of another column

I have a CSV file with columns A,B,C,D. Column D contains values on a scale of 0 to 1. I want to use AWK to write to a new column E based on the values in column D.
For example:
if the value in column D < 0.7, the value in column E = 0.
if the value in column D >= 0.7, the value in column E = 1.
I am able to print the output of column E but am not sure how to write it to a new column. It's possible to write the output of my code to a new file and then paste it back into the old file, but I was wondering if there is a more efficient way. Here is my code:
awk -F"," 'NR>1 {if ($3>=0.7) $4= "1"; else if ($3<0.7) $4= "0"; print $4;}' test_file.csv
The awk command below should give you the intended output:
awk -F "," '{if($4>=0.7)print $0",1";else if($4<0.7)print $0",0"}' yourfile.csv > test_file.csv
You can use:
awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv
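For example, with a hypothetical test_file.csv (column names and data invented for illustration), NR>1 leaves the header untouched and the final 1 pattern prints every line:
$ cat test_file.csv
A,B,C,D
foo,1,2,0.9
bar,3,4,0.3
$ awk -F, 'NR>1 {$0 = $0 FS (($4 >= 0.7) ? 1 : 0)} 1' test_file.csv
A,B,C,D
foo,1,2,0.9,1
bar,3,4,0.3,0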

How to use awk to compute (x-y)/(x+y) from the column 2 values of rows with the same column 1 value in 2 csv files in Ubuntu?

I am using Ubuntu and we have a csv file1.csv with 2 columns that looks like
a,1
b,2
c,3
...
and another file2.csv with 2 columns that looks like
a,4
b,3
d,2
...
Some column 1 values appear in file1.csv but not in file2.csv and vice versa, and these values should not be in result.csv. Say the column 2 value in file1.csv is x and the column 2 value on the file2.csv row with the same column 1 key is y. How can I use awk in Ubuntu to compute (x-y)/(x+y) for each matching pair, to get a result.csv like this:
a,-0.6
b,-0.2
-0.6 is computed by (1-4)/(1+4)
-0.2 is computed by (2-3)/(2+3)
What about this?
$ awk 'BEGIN{FS=OFS=","} FNR==NR {a[$1]=$2; next} {if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)}' f1 f2
a,-0.6
b,-0.2
Explanation
BEGIN{FS=OFS=","} sets the input and output field separators to comma.
FNR==NR {a[$1]=$2; next} while processing the first file, stores in the array a[] the values as a[first col]=second col.
{if ($1 in a) print $1,(a[$1]-$2)/(a[$1]+$2)} while looping through the second file, on each line: checks whether the first column is stored as a key in the a[] array; if so, prints (x-y)/(x+y), where x is the stored value and y is the current second column.
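If you want to control the number of decimal places, a printf variant of the same logic (a sketch, not required by the question) would be:
awk 'BEGIN{FS=","} FNR==NR {a[$1]=$2; next} $1 in a {printf "%s,%.1f\n", $1, (a[$1]-$2)/(a[$1]+$2)}' f1 f2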

Removing last column from rows that have three columns using bash

I have a file that contains several lines of data. Some lines contain three columns, but most contain only two. All lines are single-tab separated. For those that contain three columns, the third column is typically redundant and contains the same data as the second so I'd like to remove it.
I imagine awk or cut would be appropriate, but I'm drawing a blank on how to test the row for three columns so my script will only work on those rows. I know awk is a very powerful language with logic and whatnot built into it, I'm just not that strong with it.
I looked at a similar question, but I'm not sure what is going on with the awk answer. Should the -4 be -1 since I only want to remove one column? And what if a row has only two columns; will it remove the second even though I don't want anything done to it?
I modified it to what I think it would be:
awk -F"\t" -v OFS="\t" '{ for (i=1;i<=NF-4;i++){ print $i }}'
But when I run it (with the file) nothing happens. If I change it to NF-1 or NF-2 I get some output, but it is only a handful of lines and only the first column.
Can anyone clue me into what I should be doing?
If you just want to remove the third column, you could just print the first and the second:
awk -F '\t' '{print $1 "\t" $2}'
And similarly with cut:
cut -f 1,2
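Since the file is tab-separated and tab is cut's default delimiter, no -d option is needed; for example:
$ printf 'a\tb\tc\nx\ty\n' | cut -f 1,2
a b
x y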
The awk variable NF gives you the number of fields. So an expression like this should work for you.
awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
Running it on an input file like so
a,b,c
x,y
u,v,w
l,m
gives me
$ cat test | awk -F, 'NF == 3 {print $1 "," $2} NF != 3 {print $0}'
a,b
x,y
u,v
l,m
This might work for you (GNU sed):
sed 's/\t[^\t]*//2g' file
Restricts the file to two columns.
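A quick check (GNU sed understands \t, and the 2g flag applies the substitution from the second match to the end of the line):
$ printf '1\t2\t3\n4\t5\n' | sed 's/\t[^\t]*//2g'
1 2
4 5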
awk 'NF==3{print $1"\t"$2}NF==2{print}' your_file
Tested below:
> cat temp
1 2
3 4 5
6 7
8 9 10
>
> awk 'NF==3{print $1"\t"$2}NF==2{print}' temp
1 2
3 4
6 7
8 9
>
Or in a much simpler way in awk:
awk 'NF==3{print $1"\t"$2}NF==2' your_file
Or you can also go with perl:
perl -lane 'print "$F[0]\t$F[1]"' your_file
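Note that the perl one-liner always prints exactly the first two whitespace-separated fields, for two- and three-column lines alike; on the temp file above it produces the same output as the awk versions:
$ perl -lane 'print "$F[0]\t$F[1]"' temp
1 2
3 4
6 7
8 9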
