I want to compare the length of fields in a CSV file that contains 2 columns, and keep only the lines in which the length of the field in the second column exceeds the length of the field in the first column. For example, if I have the following CSV file:
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
RTYUIOJ;GHYU
I want to get as a result:
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
Like this?
kent$ awk -F';' 'length($2)>length($1)' file
ABRTYU;ABGTYUI
GHYUI;GTYIOKJ
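If the file has a header line that must always be kept, a small variant of the same idea (assuming the same semicolon-separated layout) would be:
awk -F';' 'NR==1 || length($2)>length($1)' file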
I have two files with 4 columns each. I am trying to compare the second column of the first file with the second column of the second file. I found on some websites how to do it, and it works, but I have a problem printing a new file containing the whole second file plus the 3rd and 4th columns from the first file. I have tried the following syntax:
awk 'NR==FNR{label[$2]=$2;date[$2]=$3;date[$2]=$4;next}; ($2==label[$2]){print $0" "date[$2]}' file1 file2
I was only able to add the 4th column from the first file. Where did I make a mistake?
Could you please try the following? Since no samples were given, it is not tested.
awk 'NR==FNR{label[$2]=$2;date[$2]=$3 OFS $4;next} ($2==label[$2]){print $0" "date[$2]}' file1 file2
Basically you need to change date[$2]=$4 to date[$2]=$3 OFS $4 to get both the 3rd and 4th fields in the output.
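For illustration only, since the question gives no samples, hypothetical inputs and the resulting output could look like this:
$ cat file1
r1 id1 2020-01-01 100
r2 id2 2020-02-01 200
$ cat file2
s1 id1 foo bar
s2 id3 baz qux
$ awk 'NR==FNR{label[$2]=$2;date[$2]=$3 OFS $4;next} ($2==label[$2]){print $0" "date[$2]}' file1 file2
s1 id1 foo bar 2020-01-01 100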
So I have a large CSV file (gigabytes in size) with multiple columns; the first two columns are:
Invoice number|Line Item Number
I want a Unix / Linux / Ubuntu command which can merge these two columns into a new column separated by ':'. So, for example, if the invoice number is 64789544 and the Line Item Number is 234533, then my merged value should be
64789544:234533
Can it really be achieved? If yes, can the merged column be added back to the source CSV file?
You can use the following sed command:
$ cat large.csv
Invoice number|Line Item Number|Other1|Other2
64789544|234533|abc|134
64744123|232523|cde|awc
$ sed -i.bak 's/^\([^|]*\)|\([^|]*\)/\1:\2/' large.csv
$ cat large.csv
Invoice number:Line Item Number|Other1|Other2
64789544:234533|abc|134
64744123:232523|cde|awc
Just be aware that it takes a backup of your input file (just in case), so you need to have enough space in your file system.
Explanation:
s/^\([^|]*\)|\([^|]*\)/\1:\2/ matches the first two fields of your CSV (delimited by |) and replaces the separator between them with : using back references, which merges the two columns.
If you are sure about what you are doing, you can change -i.bak to -i to avoid taking a backup of the CSV file.
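If instead you want to keep the two original columns and append the merged value as a new last column (the question mentions adding it back), an awk sketch, assuming the same |-delimited layout:
awk -F'|' -v OFS='|' '{print $0, $1":"$2}' large.csv > merged.csv
Unlike sed -i, awk has no portable in-place mode, so the output goes to a new file (merged.csv here is just a placeholder name).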
Perhaps with this simple sed; without the g flag it replaces only the first | on each line, which is exactly the separator between the first two columns:
sed 's/|/:/' infile
I have a big file like this:
79597700
79000364
79002794
79002947
And another big file like this:
79597708|11
79000364|12
79002794|11
79002947|12
79002940|12
Then I need the numbers that appear in the second file that are also in the first file, but with the second column, something like:
79000364|12
79002794|11
79002947|12
79002940|12
(The MSISDNs that appear in the first file and also appear in the second file; I need to return both columns of the second file.)
Can anyone help me? A grep does not work for me because it returns only the MSISDN without the second column, and comm is not possible because the rows differ between the two files.
Try this:
grep -f bigfile1 bigfile2
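Since the patterns are plain numbers, it is safer to tell grep to treat them as fixed strings rather than regular expressions:
grep -Ff bigfile1 bigfile2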
Using awk:
awk -F"|" 'FNR==NR{f[$0];next}($1 in f)' file file2
Source: return common fields in two files
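The same logic spelled out with comments (using file1/file2 as hypothetical names for the question's first and second file):
awk -F'|' '
    FNR == NR { f[$0]; next }   # first file: store each MSISDN as an array key
    $1 in f                     # second file: print lines whose first field was stored
' file1 file2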
I have a requirement where I get files from a source with a varying number of delimited fields, and I need to normalize them to one standard number of delimited fields.
source file1:
AA,BB,CC,0,0
AC,BD,DB,1,0
EE,ER,DR,0,0
What I want to do is append an extra 3 zeros at the end of each row:
AA,BB,CC,0,0,0,0,0
AC,BD,DB,1,0,0,0,0
EE,ER,DR,0,0,0,0,0
The source file always contains fewer columns than needed. Can anyone help with this?
Thanks In Advance
Try this; it will append the given string to each line of the file:
sed '1,$ s/$/,0,0,0/' infile > outfile
Here is what I tried: sed can do it in place with the -i flag
sed -i "s/$/,0,0,0/g" file
I have a large text file and I want to chunk it into smaller files based on the distinct values of a column. Columns are separated by commas (it's a CSV file) and there are many distinct values, e.g.:
1012739937,2006-11-28,d_02245211
1012739937,2006-11-28,d_02238545
1012739937,2006-11-28,d_02236564
1012739937,2006-11-28,d_01918338
1012739937,2006-11-28,d_02148765
1012739937,2006-11-28,d_00868949
1012739937,2006-11-28,d_01908448
1012740478,1998-06-26,d_01913689
1012740478,1998-06-26,i_4869
1012740478,1998-06-26,d_02174766
I want to chunk the file into smaller files such that each file contains the records belonging to one year (one for the records of 2006, one for the records of 1998, etc.).
(Here we may have a limited number of years, but I want to do the same thing with a larger number of distinct values of a specific column.)
You can use awk:
awk -F, '{split($2,d,"-");print > d[1]}' file
Explanation:
-F, tells awk that input fields are separated by ','
split($2,d,"-") splits the second column (the date) by '-'
and puts the bits into the array 'd'
print > d[1] prints the whole input line into a file named after the year
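One caveat, since the question mentions lots of distinct values: some awk implementations limit the number of simultaneously open output files. A sketch that closes each file after writing, slower but safe with any number of groups:
awk -F, '{
    split($2, d, "-")   # d[1] holds the year
    print >> d[1]       # append, because the file is reopened on every write
    close(d[1])         # release the file descriptor right away
}' file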
A quick awk solution, if slightly fragile (it assumes that the second column, when present, always starts with yyyy); note that awk's substr is 1-indexed:
awk -F, '$2{print > (substr($2,1,4) ".csv")}' test.in
It will split input into files yyyy.csv; make sure they don't exist in your current directory or they will be overwritten.
A different awk take: use a slightly more complicated field separator, so that the year itself becomes the second field:
awk -F '[,-]' '{print > $2}' file