Related
How can I add an index to a csv file using awk? For example lets assume I have a file
data.txt
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
I would like to add another column, which is the index. Basically I would like an output of
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I was trying to use awk '{for (i=1; i<=NF; i++) print $i}' but it does not seem to be working right. And what is the best way to just add a comma for the first line but add incrementing number and a comma to the rest of the lines?
You may use this awk solution:
awk '{print (NR == 1 ? "" : NR-2) "," $0}' file
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Use this Perl one-liner:
perl -pe '$_ = ( $. > 1 ? ($. - 2) : "" ) . ",$_";' data.txt > out.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
$. : Current input line number.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
I would use GNU AWK for this task following way, let file.txt content be
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
then
awk 'BEGIN{OFS=","}{print NR==1?"":i++,$0}' file.txt
gives output
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Explanation: firstly I inform GNU AWK that output field separator (OFS) is ,, so arguments to print will be concatenated using that character. Then for each line I use so-called ternary operator i.e. condition?valueiftrue:valueiffalse to decide what will be 1st argument, for 1st line (NR==1) it is empty string for all else it is counter which will be first returned then increased by 1, 2nd argument to print is always whole original line ($0).
(tested in gawk 4.2.1)
gawk 'sub("^",substr(++_",",3^(NF~NR)))' FS='^$' \_=-2
mawk 'sub("^",++_+NF ? _",":",")' FS='^$' \_=-2
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I have the following output to grep the value in this case "225". This value is actually a variable $pd so it could change depending on users input" It could be integer numbers or an alphanumeric character case-insensitive exact match. Example if value of variable is "225" then a "0225" or "11225" its not a valid output from the file Im reading it.
Input File:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.223.10|2000-H1|1/1/3|Undiscoverable|Unkwn
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore
10.20.225.10|2000-H1|1/1/8|DeviceY_2250_|Kenmore
Desired Output File:
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
If user input is "lg"; then it should output the line without not ignoring it because the input file has "lg" in uppercase. (This part is already fixed on the script).
Desired Output:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
$ awk -F'|' -v n='225' '$4 ~ n' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or if you don't want a partial match (e.g. against 1225) then one way is:
$ awk -F'|' -v n='225' '$4 ~ ("(^|[^0-9])" n "([^0-9]|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or:
$ awk -F'|' -v n='225' '$4 ~ ("(^|_)" n "(_|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
There are other possibilities too. The right solution depends on the requirements you haven't told us about and will pass or fail when using input other then you've shown us yet.
awk
awk -F"|" -v var="[A-Za-z].225_" '$4 ~ var{print}'
sed
sed -n '/[A-Za-z].225./p'
grep
grep '[A-Za-z].225.'
Output
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
Using sed:
sed -n '/^\([^|]*\|\)\{3\}[^|]*225/p' < input
Explanation:
the -n option disables automatic output at the end of each sed cycle
the pattern matches arbitrary contents of the first three (\{3\}) columns of data via the \(parenthesized\) pattern [^|]*\| -- any number of non-delmiter characters followed by the column delimiter
it matches additional input at the beginning of the fourth column, but not spanning columns, with a similar subexpression: [^|]*
then comes the literal text you want to match
the p command after the pattern causes the line to be printed to sed's output in the event that it matches the pattern
There's almost certainly an awk solution too, but in Perl it's this:
$ perl -aF'\|' -ne '$F[3] =~ 225 and print' < input
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
-a: Autosplit the input into array #F
-F'\|: Set the autosplit delimiter to |
-n: Run code for each line in the input file
-e: Here's the code to run
$F[3]: The 4th element of the autosplit array #F
=~: Regex match
and print: Print the input line if the regex matches
Update: You can get the string you're interested in from a command line parameter by assigning it in a BEGIN block.
$ perl -aF'\|' -ne 'BEGIN { $x = shift } $F[3] =~ $x and print' 225 < input
I want to replace the ">" with variable names staring with ">" and ends with ".". But the following code is not printing the variable names.
for f in *.fasta;
do
nam=$(basename $f .fasta);
awk '{print $f}' $f | awk '{gsub(">", ">$nam."); print $0}'; done
Input of first file sample01.fasta:
cat sample01.fasta:
>textofDNA
ATCCCCGGG
>textofDNA2
ATCCCCGGGTTTT
Output expected:
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
$ awk 'FNR==1{fname=FILENAME; sub(/[^.]+$/,"",fname)} sub(/^>/,""){$0=">" fname $0} 1' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Compared to the other answers you've got so far, the above will work in any awk, only does the file name calculation once per input file rather than once per line or once per >-line, won't fail if the file name contains other .s, won't fail if the file name contains &, and won't fail if the file name doesn't contain the string fasta..
Or like this? You don't really need the looping and basename or two awk invocations.
awk '{stub=gensub( /^([^.]+\.)fasta.*/ , "\\1", "1",FILENAME ) ; gsub( />/, ">"stub); print}' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Explanation: awk has knowledge of the filename it currently operates on through the built-in variable FILENAME; I strip the .fasta extension using gensub, and store it in the variable stub. The I invoke gsub to replace ">" with ">" and the content of my variable stub. After that I print it.
As Ed points out in the comments: gensub is a GNU extension and won't work on other awk implementations.
Could you please try following too.
awk '/^>/{split(FILENAME,array,".");print substr($0,1,1) array[1]"." substr($0,2);next} 1' Input_file
Explanation: Adding explanation for above code here.
awk '
/^>/{ ##Checking condition if a line starts from > then do following.
split(FILENAME,array,".") ##Using split function of awk to split Input_file name here which is stored in awk variable FILENAME.
print substr($0,1,1) array[1]"." substr($0,2) ##Printing substring to print 1st char then array 1st element and then substring from 2nd char to till last of line.
next ##next will skip all further statements from here.
}
1 ##1 will print all lines(except line that are starting from >).
' sample01.fasta ##Mentioning Input_file name here.
I have to write a script file to cut the following column and paste it the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
how can I do this? using a Linux script file?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation,
^([^,]*) would match the first field which separated by commas, and \1 behind refer to the match
(.*)$ would be the remainding part except the first comma, and \2 would refer to the match
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
EDIT: Reading your comments following your question, looks like your swapping more columns than just the first to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Why using sed or awk, the shell can handle this easily
while read l;do echo ${l#*,},${l%%,*};done <infile
If it's a win file with \r
while read l;do f=${l%[[:cntrl:]]};echo ${f#*,},${l%%,*};done <infile
If you want to keep the file in place.
printf "%s" "$(while read l;do f=${l%[[:cntrl:]]};printf "%s\n" "${f#*,},${l%%,*}";done <infile)">infile
I have two files that look like:
**file1.txt**
"a","1","11","111"
"b","2","22","222"
"c","3","33","333"
"d","4","44","444"
"e","5","55","555"
"f","6","66","666"
**file2.txt**
"b"
"d"
"a"
"c"
"e"
"f"
I need to create a script that changes the order of file1 and begin with the order of file2. e.g.:
"b","2","22","222"
"d","4","44","444"
"a","1","11","111"
"c","3","33","333"
"e","5","55","555"
"f","6","66","666"
I created a command that looks like:
nawk '/^("b")/' file1 ; nawk '/^("d")/' file1 ; nawk '/^("a")/' file1 ; nawk '/^("c")/' file1 ; nawk '/^("e")/' file1 ; nawk '/^("f")/' file1
It does the trick, however I would like to further automate it, but don't know how to proceed. How could I create a command or variable that would look at line 1 of file2("b") and put it the above command, then look at line 2 of file2("d"), and put it in the above command, and so on. Basically if possible, I would like the command to look at file 2 and fill in the blanks in the above command. Any other more convenient commands you guys can suggest would be appreciated. Note that I currently have to manually insert the letters from file 2 in the above command.
The actual file may contain well over 100 lines
awk -F, 'NR==FNR { a[$1]=$0; next }
($1 in a) { print a[$1] }' file1 file2
This reads all of file1 into memory, then prints in the order of file2. If file1 is very large, this may not be feasible.
This is a common Awk idiom; search the many near-duplicates if you need a more detailed explanation.