comparing two columns and printing a new file using awk - linux

I have two files with 4 columns each. I am trying to compare the second column of the first file with the second column of the second file. I found on some websites how to do it, and it works, but I have a problem printing a new file containing the whole second file plus the 3rd and 4th columns from the first file. I have tried the following syntax:
awk 'NR==FNR{label[$2]=$2;date[$2]=$3;date[$2]=$4;next}; ($2==label[$2]){print $0" "date[$2]}' file1 file2
I was only able to add the 4th column from the first file. Where am I making a mistake?

Could you please try the following; since no samples were given, it has not been tested.
awk 'NR==FNR{label[$2]=$2;date[$2]=$3;date[$2]=$3 OFS $4;next}; ($2==label[$2]){print $0" "date[$2]}' file1 file2
Basically you need to change date[$2]=$4 to date[$2]=$3 OFS $4 to get both the 3rd and 4th fields in the output.
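For illustration only, with hypothetical sample files (none were given in the question): suppose file1 is
x id1 2006-11-28 A
x id2 2006-12-01 B
and file2 is
y id1 foo bar
y id3 foo bar
Then the command prints every file2 line whose second column also appears as a second column in file1, followed by that key's 3rd and 4th fields from file1:
y id1 foo bar 2006-11-28 A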

Related

AWK - Show lines where column contains a specific string

I have a document (.txt) composed like that.
info1: info2: info3: info4
And I want to show some information by column.
For example, I have different information in the "info3" field, and I want to see only the lines that contain "test" in the "info3" column.
I think I have to use sort but I'm not sure.
Any idea?
The previous answers assume that the third column is exactly equal to test. It looks like you were looking for lines where the value includes test. We need to use awk's match function:
awk -F: 'match($3, "test")' file
You can use awk for this. Assuming your columns are delimited by : and column 3 has entries equal to test, the command below lists only those lines with that value.
awk -F':' '$3=="test"' input-file
Assuming that the spacing is consistent, and you're looking for only test in the third column, use
grep ".*:.*: test:.*" file.txt
Or to take care of any spacing that might occur
grep ".*:.*: *test *:.*" file.txt

compare the length of fields of a csv file using awk

I want to compare the length of the fields of a csv file that contains 2 columns, and keep only the lines in which the length of the field in the second column exceeds the one in the first column. For example, if I have the following csv file
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
RTYUIOJ;GHYU
I want to get this as the result:
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
like this?
kent$ awk -F';' 'length($2)>length($1)' file
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
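The pattern-only program relies on awk's default action of printing the current line; written out explicitly it is equivalent to:
awk -F';' 'length($2) > length($1) { print $0 }' file
Redirect the output (e.g. to a new file) if you want to keep only those lines.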

grep in two files returning two columns

I have a big file like this:
79597700
79000364
79002794
79002947
And other big file like this:
79597708|11
79000364|12
79002794|11
79002947|12
79002940|12
Then I need the numbers from the second file that also appear in the first file, but with the second column included, something like:
79000364|12
79002794|11
79002947|12
79002940|12
(The MSISDNs that appear in the first file and also appear in the second file, but I need to return both columns of the second file.)
Can anyone help me? A plain grep does not work for me because it returns only the MSISDN without the second column,
and comm is not possible because the rows differ between the files.
Try this:
grep -f bigfile1 bigfile2
Using awk:
awk -F"|" 'FNR==NR{f[$0];next}($1 in f)' file file2
Source: return common fields in two files
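The awk one-liner, written out with comments (same logic):
awk -F"|" '
  FNR==NR { f[$0]; next }   # first file: store each MSISDN as an array key
  ($1 in f)                 # second file: print lines whose first |-separated field was stored
' file file2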

Awk item from column one, then awk again using the result in column two?

I have a CSV that I need to search using a provided key that is in the first column of said CSV, and then I need to run awk again, searching by column 2, and return all matching data.
So: I'd awk with the first key, and it would return just the value of the second column [so just that cell]. Then I'd awk using that cell's contents and have it return all matching rows.
I have almost no bash/awk scripting experience so please bear with me. :)
Input:
KEY1,TRACKINGKEY1,TRACKINGNUMBER1-1,PACKAGENUM1-1
,TRACKINGKEY1,TRACKINGNUMBER1-2,PACKAGENUM1-2
,TRACKINGKEY1,TRACKINGNUMBER1-3,PACKAGENUM1-3
,TRACKINGKEY1,TRACKINGNUMBER1-4,PACKAGENUM1-4
,TRACKINGKEY1,TRACKINGNUMBER1-5,PACKAGENUM1-5
KEY2,TRACKINGKEY2,TRACKINGNUMBER2-1,PACKAGENUM2-1
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
Command:
awk -v key=KEY1 -F' *,' '$1==key{f=1} $1 && $1!=key{f=0} f{print $3}' file
Output:
TRACKINGNUMBER1-1
TRACKINGNUMBER1-2
TRACKINGNUMBER1-3
TRACKINGNUMBER1-4
TRACKINGNUMBER1-5
That's what I've tried. I'd like to awk so that if I search for KEY1, TRACKINGKEY1 is returned, then awk with TRACKINGKEY1 and output each full matching row.
Sorry, I should have been more clear. For example - if I searched for KEY3 I'd like the output to be:
KEY3,TRACKINGKEY3,TRACKINGNUMBER3-1,PACKAGENUM3-1
,TRACKINGKEY3,TRACKINGNUMBER3-2,PACKAGENUM3-2
So what I want is I'd search for KEY3 initially, and it would return TRACKINGKEY3. I'd then search for TRACKINGKEY3 and it would return each full row with said TRACKINGKEY3 in it.
Does this do what you want?
awk -v key=KEY3 -F ',' '{if($1==key)tkey=$2;if($2==tkey)print}' file
It only makes a single pass through the file, not the multiple passes you described, but the output matches what you requested. When it finds the specified key in the first column, it grabs the tracking key from the second column. It then prints every line that matches this tracking key.
A shorter way to achieve the same thing is by using awk's implicit printing:
awk -v key=KEY3 -F ',' '$1==key{tkey=$2}$2==tkey' file
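The same program spread over several lines with comments (same behavior):
awk -v key=KEY3 -F ',' '
  $1 == key { tkey = $2 }   # remember the tracking key when the search key appears in column 1
  $2 == tkey                # pattern with no action: print every row whose column 2 matches that tracking key
' file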

Chunk a large file based on regex (Linux)

I have a large text file and I want to chunk it into smaller files based on the distinct values of a column. Columns are separated by commas (it's a csv file) and there are lots of distinct values:
e.g.
1012739937,2006-11-28,d_02245211
1012739937,2006-11-28,d_02238545
1012739937,2006-11-28,d_02236564
1012739937,2006-11-28,d_01918338
1012739937,2006-11-28,d_02148765
1012739937,2006-11-28,d_00868949
1012739937,2006-11-28,d_01908448
1012740478,1998-06-26,d_01913689
1012740478,1998-06-26,i_4869
1012740478,1998-06-26,d_02174766
I want to chunk the file into smaller files such that each file contains the records belonging to one year (one for records of 2006, one for records of 1998, etc.)
(Here we may have a limited number of years, but I want to do the same thing with a larger number of distinct values of a specific column.)
You can use awk:
awk -F, '{split($2,d,"-");print > d[1]}' file
Explanation:
-F, tells awk that input fields are separated by ','
split($2,d,"-") splits the second column (the date) by '-'
and puts the bits into the array 'd'
print > d[1] prints the whole input line into a file named after the year
A quick awk solution, if slightly fragile (assumes the second column, if it exists, always starts yyyy)
awk -F, '$2{print > (substr($2,1,4) ".csv")}' test.in
It will split input into files yyyy.csv; make sure they don't exist in your current directory or they will be overwritten.
A different awk take: use a slightly more complicated field separator:
awk -F '[,-]' '{print > $2}' file
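If the column has very many distinct values, awk can hit the per-process limit on open files. A minimal sketch of one workaround, assuming it is acceptable to append to any pre-existing output files:
awk -F, '{
  split($2, d, "-")
  out = d[1] ".csv"
  print >> out     # append, because the file is reopened on every line
  close(out)       # release the descriptor so many distinct values do not exhaust the limit
}' file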
