I have to write a script that cuts the first column of each row and pastes it at the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
How can I do this using a Linux script file?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation:
^([^,]*) matches the first field (everything before the first comma), and \1 in the replacement refers back to that match
(.*)$ matches the remaining part of the line after the first comma, and \2 refers back to that match
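For anyone who wants to try the same pattern outside sed, here is a minimal Python sketch. The function name move_first_field_to_end is made up for illustration; the regular expression and the \2,\1 replacement are the ones from the sed command.

```python
import re

def move_first_field_to_end(line: str) -> str:
    # Group 1 is the first comma-free field, group 2 is the rest of the line;
    # the replacement emits them in swapped order, as in the sed command.
    return re.sub(r'^([^,]*),(.*)$', r'\2,\1', line)

sample = "63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'"
print(move_first_field_to_end(sample))
```

A line without any comma does not match the pattern and is returned unchanged.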
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
EDIT: Reading the comments following your question, it looks like you're swapping more columns than just moving the first to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
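The same swap-or-move-to-end idea can be sketched in Python on a list of fields. This is only an illustration of the logic, not a replacement for the awk script; the 1-based indices mirror awk's field numbering, and the sample row is a shortened version of the input.

```python
def swap(fields, i, j):
    """Swap 1-based fields i and j; if j is past the end, move field i to the end."""
    if j > len(fields):
        # Removing field i shifts the later fields left, then it is appended.
        fields.append(fields.pop(i - 1))
    else:
        fields[i - 1], fields[j - 1] = fields[j - 1], fields[i - 1]

row = "63,male,typ_angina,145,233,t".split(",")
swap(row, 1, len(row) + 1)  # like swap(1,NF+1): move column 1 to the end
swap(row, 2, 5)             # like swap(2,5): exchange columns 2 and 5
print(",".join(row))
```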
Why use sed or awk? The shell can handle this easily:
while IFS= read -r l;do printf '%s\n' "${l#*,},${l%%,*}";done <infile
If it's a Windows file with \r line endings:
while IFS= read -r l;do f=${l%[[:cntrl:]]};printf '%s\n' "${f#*,},${l%%,*}";done <infile
If you want to rewrite the file in place:
printf "%s\n" "$(while IFS= read -r l;do f=${l%[[:cntrl:]]};printf "%s\n" "${f#*,},${l%%,*}";done <infile)">infile
I have a text file with many rows and columns and I want to grep a column by the 'column name'.
M121 M125 M123 M124 M131 M126 M211 N
0.41463252 1.00296561 -0.1713496 0.15923644 -1.49682602 -1.9478695 1.45223392 …
-0.46775802 0.14591103 1.122446 0.83648981 -0.3038532 -1.1841548 2.18074729 …
0.67736835 2.12969375 -0.8187298 0.13582824 -1.49290987 -0.6798428 1.04353114 …
0.08673344 -0.40437672 1.8441559 -0.63679375 0.47998832 0.1702844 0.54029264 …
-0.32606297 -0.95551833 0.6157599 0.02819133 1.44818627 -0.9528659 0.09207864 …
-0.51781121 0.88806507 -0.2913757 -0.00463802 0.05037374 0.953773 0.01244763 …
-0.25724472 0.05119051 0.2109025 -0.26083822 -0.52094072 -0.938595 -0.01275275 …
1.94348766 -1.83607523 1.2010512 -0.54109756 -0.88323831 -0.6263788 -0.96973544 …
0.1900408 -0.61025656 0.4586306 -0.69181051 -0.90713834 0.3589271 0.6870383 …
0.54866057 -0.03861159 -1.505861 0.54871682 -0.24602601 -0.3941754 0.85673905 …
For example, I want to grep the M211 column, but I don't know its column number. I tried:
awk '$i == "M211"' filename or awk '$0 == "M211"' filename
awk: illegal field $(), name "i"
input record number 1, filename
source line number 1
Is there any solution? Thank you.
awk solution: iterate over the column names on the first line of the input file and save the column number if it matches the desired name, then print that column for every line. There is no output if no match is found.
$ awk 'NR==1{ for(i=1;i<=NF;i++){if($i=="M125")c=i;} if(c==0)exit; }
{print $c}' ip.txt
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Similar solution with perl
$ perl -lane '@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1; exit if !@i;
print @F[@i]' ip.txt
M123
-0.1713496
1.122446
-0.8187298
1.8441559
0.6157599
-0.2913757
0.2109025
1.2010512
0.4586306
-1.505861
@i = grep {$F[$_] eq "M123"} 0..$#F if $.==1 : for the header line, get the index of the column whose value matches the string M123
exit if !@i : exit if no match is found
print @F[@i] : print the matched column
assumes there'll be only one column match
for multiple matches, use
perl -lane '@i = grep {$F[$_] =~ /^(M121|M126)$/} 0..$#F if $.==1; exit if !@i;
print join " ", @F[@i]' ip.txt
Another in awk:
$ awk 'NR==1 {for(i=NF;i>0;i--) if($i=="M125") break; if(!i) exit} {print $i}' file
M125
1.00296561
0.14591103
2.12969375
-0.40437672
-0.95551833
0.88806507
0.05119051
-1.83607523
-0.61025656
-0.03861159
Explained:
NR==1 { # for the first record
for(i=NF;i>0;i--) # iterate fields backwards for change
if($i=="M125") break # until desired column, remember i
if (!i) exit # if column not found, exit
}
{print $i} # print value from ith field
If you are more familiar with Python:
import csv
column_name = "M125"
# Python 3: open in text mode; newline="" is what the csv module expects
with open("file", newline="") as f:
    data_dict = csv.DictReader(f, delimiter=" ")
    print(column_name)
    for item in data_dict:
        print(item[column_name])
To do anything with columns ("fields" in awk) by name rather than number you should first create an array that maps the field name to number and from then on just access the fields using that array indexed by the field name(s) rather than accessing them directly by field number(s):
$ awk 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f["M124"])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
or if you don't want to hard-code the column name:
$ awk -v c=M124 'NR==1{for (i=1;i<=NF;i++) f[$i]=i} {print $(f[c])}' file
M124
0.15923644
0.83648981
0.13582824
-0.63679375
0.02819133
-0.00463802
-0.26083822
-0.54109756
-0.69181051
0.54871682
and to print any number of columns in the order you choose:
$ awk -v cols='M211 M124' 'NR==1{for (i=1;i<=NF;i++) f[$i]=i; n=split(cols,c)} {for (i=1;i<=n;i++) printf "%s%s", $(f[c[i]]), (i<n ? OFS : ORS)}' file
M211 M124
1.45223392 0.15923644
2.18074729 0.83648981
1.04353114 0.13582824
0.54029264 -0.63679375
0.09207864 0.02819133
0.01244763 -0.00463802
-0.01275275 -0.26083822
-0.96973544 -0.54109756
0.6870383 -0.69181051
0.85673905 0.54871682
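If it helps to see the name-to-number mapping spelled out, here is a small Python sketch of the same idea: build a dictionary from column name to position once, then look fields up by name. The function name select_columns and the trimmed four-column table are just for illustration, and it assumes every requested name exists in the header (an unknown name raises KeyError).

```python
def select_columns(lines, wanted):
    rows = [line.split() for line in lines]
    # Map each header name to its 0-based position, like f[$i]=i in the awk script.
    index = {name: pos for pos, name in enumerate(rows[0])}
    # Emit the requested columns, in the requested order, for every row.
    return [" ".join(row[index[name]] for name in wanted) for row in rows]

table = [
    "M121 M125 M123 M124",
    "0.41463252 1.00296561 -0.1713496 0.15923644",
    "-0.46775802 0.14591103 1.122446 0.83648981",
]
for out in select_columns(table, ["M124", "M121"]):
    print(out)
```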
I have 2 files. Basically, I want to match the column names from File 1 with the column names listed in File 2. The resulting output file should have data for the columns that match File 2 and a Null value for the remaining column names in File 2.
Example:
file1
Name|Phone_Number|Location|Email
Jim|032131|xyz|xyz@qqq.com
Tim|037903|zzz|zzz@qqq.com
Pim|039141|xxz|xxz@qqq.com
File2
Location
Name
Age
Based on these 2 files, I want to create new file which has data in the below format:
Output:
Location|Name|Age
xyz|Jim|Null
zzz|Tim|Null
xxz|Pim|Null
Is there a way to get this result using join, awk or sed? I tried with join but couldn't get it working.
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==FNR { names[++numNames] = $0; next }
FNR==1 {
for (nameNr=1;nameNr<=numNames;nameNr++) {
name = names[nameNr]
printf "%s%s", name, (nameNr<numNames?OFS:ORS)
}
for (i=1;i<=NF;i++) {
name2fldNr[$i] = i
}
next
}
{
for (nameNr=1;nameNr<=numNames;nameNr++) {
name = names[nameNr]
fldNr = name2fldNr[name]
printf "%s%s", (fldNr?$fldNr:"Null"), (nameNr<numNames?OFS:ORS)
}
}
$ awk -f tst.awk file2 file1
Location|Name|Age
xyz|Jim|Null
zzz|Tim|Null
xxz|Pim|Null
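For comparison, the same logic can be sketched in Python, assuming both files are small enough to hold in memory as lists of lines. The function name project and its sep and missing parameters are made up for this example, and the sample data is a trimmed copy of file1 without the Email column.

```python
def project(header_names, table_lines, sep="|", missing="Null"):
    header = table_lines[0].split(sep)
    # Map each column name in file1 to its 0-based position.
    pos = {name: i for i, name in enumerate(header)}
    out = [sep.join(header_names)]
    for line in table_lines[1:]:
        fields = line.split(sep)
        # Use the real value when file1 has the column, otherwise the placeholder.
        out.append(sep.join(fields[pos[n]] if n in pos else missing
                            for n in header_names))
    return out

file1 = ["Name|Phone_Number|Location", "Jim|032131|xyz", "Tim|037903|zzz"]
file2 = ["Location", "Name", "Age"]
print("\n".join(project(file2, file1)))
```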
Get the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
I'd suggest using csvcut, which is part of CSVKit (https://csvkit.readthedocs.org), along the lines of the following:
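#!/bin/bash
HEADERS=File2
PSV=File1
# Join the wanted column names from File2 into a comma-separated list
headers=$(tr '\n' , < "$HEADERS" | sed 's/,$//' )
# Append the missing Age column filled with Null, then let csvcut select and order the columns
awk '-F|' '
BEGIN {OFS=FS}
NR==1 {print $0,"Age"; next}
{print $0, "Null"}' "$PSV" |
csvcut "-d|" -c "$headers"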
I realize this may not be entirely satisfactory, but csvcut doesn't currently have options to handle missing columns or translate missing data to a specified value.
I have a file file1 with the following content
{"name":"clio5", "value":"13"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"12"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
I want to look for the line which contains the word clio5 and then replace the found line by the following string
string='{"name":"clio5", "value":"1568688554"}'
$ string='{"name":"clio5", "value":"1568688554"}'
$ awk -F'"(:|, *)"' -v string="$string" 'BEGIN{split(string,s)} {print ($2==s[2]?string:$0)}' file
{"name":"clio5", "value":"1568688554"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"12"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
$ string='{"name":"citroen_c3", "value":"1568688554"}'
$ awk -F'"(:|, *)"' -v string="$string" 'BEGIN{split(string,s)} {print ($2==s[2]?string:$0)}' file
{"name":"clio5", "value":"13"}
{"name":"citroen_c4", "value":"23"}
{"name":"citroen_c3", "value":"1568688554"}
{"name":"golf4", "value":"16"}
{"name":"golf3", "value":"8"}
Updated the above based on @dogbane's comment so it will work even if the text contains "s. It will still fail if the text can contain ":" (with appropriate escapes) but that seems highly unlikely and the OP can tell us if it's a valid concern.
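If the caveat about quotes and colons inside the text is a real concern, one option is to parse each line as JSON instead of splitting on separators. A minimal Python sketch, assuming every line of the file is a complete JSON object with a "name" key (the function name replace_by_name is invented for this example):

```python
import json

def replace_by_name(lines, replacement):
    # The name to look for comes from the replacement string itself.
    target = json.loads(replacement)["name"]
    # Swap in the replacement wherever the parsed name is an exact match.
    return [replacement if json.loads(line)["name"] == target else line
            for line in lines]

lines = [
    '{"name":"clio5", "value":"13"}',
    '{"name":"citroen_c4", "value":"23"}',
]
string = '{"name":"clio5", "value":"1568688554"}'
print("\n".join(replace_by_name(lines, string)))
```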
First, extract the name part from your $string:
NAME=$(printf '%s\n' "$string" | sed 's/[^:]*:"\([^"]*\).*/\1/')
Then use $NAME to replace the matching line:
sed -i "/\<$NAME\>/s/.*/$string/" file1
Use awk like this:
awk -v str="$string" -F '[,{}:]+' '{
split(str, a);
# compare as strings; with ~ the field would be treated as a regex
if (a[3] == $3)
print str;
else print
}' file.json
here is column 6 in a file:
ttttttttttt
tttttttttt
ttttttttt
tttttttattt
tttttttttt
ttttttttttt
How can I use awk to print the lines that include "a"?
If you only want to search the sixth column, use:
awk '$6 ~ /a/' file
If you want the whole line, any of these should work:
awk '/a/' file
grep a file
sed '/^[^a]*$/d' file
If you wish to print only those lines in which the 6th column contains "a", then this would work:
awk '$6~/a/' file
If it is an exact match that you're looking for (which yours is not):
$6 == "a"
http://www.pement.org/awk/awk1line.txt
is an excellent resource
awk can also tell you where the pattern is in the column:
awk '{++line_num}{ if ( match($6,"a")) { print "found a at position",RSTART, " line " ,line_num} }' file
though this example will only show the first "a" in column 6; a for loop would be needed to show all instances (I think)
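As a sketch of the loop mentioned above, here is how finding every occurrence could look in Python. The positions are 1-based to line up with awk's RSTART; the function name positions is made up, and the sample line with five placeholder columns in front is invented for the example.

```python
def positions(text, needle):
    found = []
    start = text.find(needle)
    while start != -1:
        found.append(start + 1)  # 1-based, like awk's RSTART
        # Continue searching just past the previous hit.
        start = text.find(needle, start + 1)
    return found

line = "c1 c2 c3 c4 c5 tttttttattt"
column6 = line.split()[5]
print(positions(column6, "a"))
```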
You could try
gawk '{ if ( $1 ~ /a/ ) { print $1 } }' filename