How can I add an index to a CSV file using awk? For example, let's assume I have a file
data.txt
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
I would like to add another column, which is the index. Basically I would like an output of
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I was trying to use awk '{for (i=1; i<=NF; i++) print $i}' but it does not seem to be working right. And what is the best way to just add a comma for the first line but add incrementing number and a comma to the rest of the lines?
You may use this awk solution:
awk '{print (NR == 1 ? "" : NR-2) "," $0}' file
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Use this Perl one-liner:
perl -pe '$_ = ( $. > 1 ? ($. - 2) : "" ) . ",$_";' data.txt > out.txt
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-p : Loop over the input one line at a time, assigning it to $_ by default. Add print $_ after each loop iteration.
$. : Current input line number.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
perldoc perlvar: Perl predefined variables
I would use GNU AWK for this task in the following way. Let the content of file.txt be
col1,col2,col3
a1,b1,c1
a2,b2,c2
a3,b3,c3
then
awk 'BEGIN{OFS=","}{print NR==1?"":i++,$0}' file.txt
gives output
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
Explanation: first I tell GNU AWK that the output field separator (OFS) is a comma, so the arguments to print are joined with that character. Then, for each line, I use the ternary operator (condition ? value_if_true : value_if_false) to choose the first argument: for the first line (NR==1) it is the empty string; for every other line it is a counter, whose current value is returned before it is increased by 1. The second argument to print is always the whole original line ($0).
(tested in gawk 4.2.1)
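For reuse in a script, the same idea could be wrapped in a shell function. This is only a sketch: the name add_index is made up for illustration, and it assumes the first line of the input is always a header.

```shell
# Hypothetical helper: prefix each CSV line with a zero-based index,
# leaving the index cell of the header line empty.
add_index() {
    awk 'BEGIN { OFS = "," }
    {
        idx = (NR == 1 ? "" : NR - 2)   # header gets "", data rows 0, 1, 2, ...
        print idx, $0
    }' "$@"
}

# Reads the named files, or standard input when no file is given.
printf '%s\n' 'col1,col2,col3' 'a1,b1,c1' 'a2,b2,c2' | add_index
```

Computing the index from NR keeps the function stateless; a separate counter, as in the answer above, works just as well.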
gawk 'sub("^",substr(++_",",3^(NF~NR)))' FS='^$' \_=-2
mawk 'sub("^",++_+NF ? _",":",")' FS='^$' \_=-2
,col1,col2,col3
0,a1,b1,c1
1,a2,b2,c2
2,a3,b3,c3
I need to grep a file for a value, in this case "225". The value is actually a variable, $pd, so it changes depending on the user's input. It can be an integer or an alphanumeric string, and the match must be exact and case-insensitive. For example, if the variable is "225", then "0225" and "11225" are not valid matches in the file I'm reading.
Input File:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.223.10|2000-H1|1/1/3|Undiscoverable|Unkwn
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
10.20.223.10|2000-H1|1/1/8|DeviceY_01225_|Kenmore
10.20.225.10|2000-H1|1/1/8|DeviceY_2250_|Kenmore
Desired Output File:
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
If the user input is "lg", the matching lines should still be output rather than ignored, even though the input file has "LG" in uppercase. (This part is already fixed in the script.)
Desired Output:
10.20.223.10|2000-H1|1/1/2|DeviceX_4021|LG
10.20.225.10|2000-H1|1/1/5|DeviceZ_2050|LG
$ awk -F'|' -v n='225' '$4 ~ n' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or if you don't want a partial match (e.g. against 1225) then one way is:
$ awk -F'|' -v n='225' '$4 ~ ("(^|[^0-9])" n "([^0-9]|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
or:
$ awk -F'|' -v n='225' '$4 ~ ("(^|_)" n "(_|$)")' file
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
There are other possibilities too. The right solution depends on requirements you haven't told us about, and any of these may pass or fail on input other than what you've shown us.
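If the value comes from user input, one option is to avoid regular expressions altogether and compare strings, so that regex metacharacters in the input cannot change the match. The sketch below rests on assumptions the question does not confirm: the value is looked for only as a whole "_"-delimited piece of the 4th field or as the whole 5th field, and match_token is a hypothetical name.

```shell
# Hypothetical helper: print lines where the token equals a whole
# "_"-delimited piece of field 4, or the whole of field 5, ignoring case.
#   $1 = token (e.g. the value of $pd); remaining arguments = input files.
match_token() {
    token=$1
    shift
    awk -F'|' -v n="$token" '
        BEGIN { n = tolower(n) }             # force a string, lower-cased
        {
            hit = (tolower($5) == n)         # whole 5th field, e.g. "LG"
            cnt = split(tolower($4), parts, "_")
            for (i = 1; i <= cnt && !hit; i++)
                if (parts[i] == n) hit = 1   # exact "_"-delimited piece
        }
        hit                                  # print the line on a match
    ' "$@"
}
```

Called as match_token "$pd" file, this uses string comparison, so "0225" and "2250" do not match "225".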
awk
awk -F"|" -v var="[A-Za-z].225_" '$4 ~ var{print}'
sed
sed -n '/[A-Za-z].225./p'
grep
grep '[A-Za-z].225.'
Output
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
Using sed:
sed -n '/^\([^|]*|\)\{3\}[^|]*225/p' < input
Explanation:
the -n option disables automatic output at the end of each sed cycle
the pattern matches arbitrary contents of the first three (\{3\}) columns of data via the \(parenthesized\) pattern [^|]*| -- any number of non-delimiter characters followed by the column delimiter (written as a plain |, because GNU sed treats \| as alternation in a basic regular expression)
it matches additional input at the beginning of the fourth column, but not spanning columns, with a similar subexpression: [^|]*
then comes the literal text you want to match
the p command after the pattern causes the line to be printed to sed's output in the event that it matches the pattern
There's almost certainly an awk solution too, but in Perl it's this:
$ perl -aF'\|' -ne '$F[3] =~ 225 and print' < input
10.20.223.10|2000-H1|1/1/8|DeviceY_225_|Kenmore
-a: Autosplit the input into array @F
-F'\|': Set the autosplit delimiter to |
-n: Run code for each line in the input file
-e: Here's the code to run
$F[3]: The 4th element of the autosplit array @F
=~: Regex match
and print: Print the input line if the regex matches
Update: You can get the string you're interested in from a command line parameter by assigning it in a BEGIN block.
$ perl -aF'\|' -ne 'BEGIN { $x = shift } $F[3] =~ $x and print' 225 < input
I have a file that's basically an INI/CFG file that looks like this:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=foo
attribute4=barfoo
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234
Each 'thing' has a set of attributes that could include all the same ones or a subset thereof.
I am trying to write a small bash script that will replace the attributes for 'thing-c' with a predefined block ($a1, $a2 and $a3 are generated elsewhere in the wider script):
NEW_BLOCK="[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}"
I can find the right block with sed like this:
THING_BLOCK=$(sed -nr "/^\[thing-c\]/ { :l /^\s*[^#].*/ p; n; /^\[/ q; b l; }" ./myThingFile)
I'm not sure whether I've gone down a rabbit hole with this, and I'm pretty sure there is a better way of doing it.
What I want to do is:
sed "s/${THING_BLOCK}/${NEW_BLOCK}/"
But I can't quite figure out the multiline aspect of this, and I'm not sure what the best route to take is.
Is there a way to do this sort of multiline find and replace with sed (or a better way with bash)?
Is there a way to do this sort of multiline find and replace ...
Yes there is indeed a better way, albeit using awk:
awk -v blk="$NEW_BLOCK" -v RS= '{ORS = RT} $1 == "[thing-c]" {$0 = blk} 1' file
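Using -v RS= sets an empty record separator, which puts awk in paragraph mode: records are split on blank lines rather than on every newline, so each section is read as one record (this assumes the sections in the file are separated by blank lines). ORS = RT reuses the separator that was actually matched; note that RT is specific to GNU awk.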
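Where gawk is not available, a portable variant of the paragraph-mode approach might look like the sketch below. Assumptions: sections are separated by blank lines, replace_block is a hypothetical name, and the replacement block is passed through the environment (ENVIRON) rather than -v, because -v applies escape-sequence processing to its value.

```shell
# Hypothetical helper: replace one blank-line-separated section.
#   $1 = section header (e.g. "[thing-c]"), $2 = replacement block,
#   remaining arguments = input files (standard input if none).
replace_block() {
    name=$1
    blk=$2
    shift 2
    # RS= (empty) selects paragraph mode; ORS puts a blank line after each record.
    BLK=$blk awk -v name="$name" -v RS= -v ORS='\n\n' '
        $1 == name { $0 = ENVIRON["BLK"] }   # swap in the new block
        { print }
    ' "$@"
}
```

A call such as replace_block '[thing-c]' "$NEW_BLOCK" ./myThingFile prints the edited file to standard output. Unlike the RT version, this always separates sections with exactly one blank line and leaves one after the last section.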
Another awk. Store the replacement to file2 and:
$ awk -v RS="" '
NR==FNR {
b=$0
next
}
$1~/thing-c/ {
$0=b
}
{
print (++c==1?"":ORS) $0
}' file2 file1
Output:
[thing-a]
attribute1=foo
attribute2=bar
attribute3=foobar
attribute4=barfoo
[thing-b]
attribute1=dog
attribute3=foofoo
attribute4=castles
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
[thing-d]
attribute1=123455
attribute2=dogs
attribute3=biscuits
attribute4=1234
If you want to use sed (IMHO awk is better here), you need "nice" data: no special characters that sed would try to interpret, and no [ inside the thing-c block.
I tested with
read -d '' -r NEW_BLOCK <<END
[thing-c]
attribute1=${a1}
attribute2=${a2}
attribute3=${a3}
END
For my solution I first need to replace newlines in $NEW_BLOCK with the two characters \n.
echo "This is the replacement string: ${NEW_BLOCK//$'\n'/\\n}"
With the "multi-line" option "-z" you can do
sed -rz "s/\[thing-c\][^[]*/${NEW_BLOCK//$'\n'/\\n}\n\n/" myThingFile
I want to replace the ">" with the variable name, starting with ">" and ending with ".". But the following code is not printing the variable names.
for f in *.fasta;
do
nam=$(basename $f .fasta);
awk '{print $f}' $f | awk '{gsub(">", ">$nam."); print $0}'; done
Input of first file sample01.fasta:
cat sample01.fasta:
>textofDNA
ATCCCCGGG
>textofDNA2
ATCCCCGGGTTTT
Output expected:
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
$ awk 'FNR==1{fname=FILENAME; sub(/[^.]+$/,"",fname)} sub(/^>/,""){$0=">" fname $0} 1' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Compared to the other answers you've got so far, the above will work in any awk, only does the file name calculation once per input file rather than once per line or once per >-line, won't fail if the file name contains other .s, won't fail if the file name contains &, and won't fail if the file name doesn't contain the string fasta..
Or like this? You don't really need the looping and basename or two awk invocations.
awk '{stub=gensub( /^([^.]+\.)fasta.*/ , "\\1", "1",FILENAME ) ; gsub( />/, ">"stub); print}' *.fasta
>sample01.textofDNA
ATCCCCGGG
>sample01.textofDNA2
ATCCCCGGGTTTT
Explanation: awk knows the name of the file it is currently operating on through the built-in variable FILENAME; I strip the fasta extension using gensub and store the result in the variable stub. Then I invoke gsub to replace ">" with ">" followed by the content of stub. After that I print the line.
As Ed points out in the comments: gensub is a GNU extension and won't work on other awk implementations.
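For awks without gensub, the same approach can be sketched with plain sub on a copy of FILENAME. The function name prefix_headers is made up, and stripping a leading directory is an extra assumption that is not part of the answer above.

```shell
# Hypothetical helper: prefix every ">" header line with the file's base name.
prefix_headers() {
    awk '
        FNR == 1 {                       # once per input file
            stub = FILENAME
            sub(/^.*\//, "", stub)       # drop any leading directory
            sub(/\.fasta$/, "", stub)    # drop only a trailing ".fasta"
        }
        /^>/ { $0 = ">" stub "." substr($0, 2) }
        { print }
    ' "$@"
}
```

It would be called as prefix_headers *.fasta, with no loop or basename needed.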
Could you please try following too.
awk '/^>/{split(FILENAME,array,".");print substr($0,1,1) array[1]"." substr($0,2);next} 1' Input_file
Explanation: Adding explanation for above code here.
awk '
/^>/{ ##Checking condition if a line starts from > then do following.
split(FILENAME,array,".") ##Using split function of awk to split Input_file name here which is stored in awk variable FILENAME.
print substr($0,1,1) array[1]"." substr($0,2) ##Printing substring to print 1st char then array 1st element and then substring from 2nd char to till last of line.
next ##next will skip all further statements from here.
}
1 ##1 will print all lines(except line that are starting from >).
' sample01.fasta ##Mentioning Input_file name here.
I have to write a script that cuts the first column of each row and pastes it at the end of the same row in a new .arff file. I guess the file type doesn't matter.
Current file:
63,male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50'
67,male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1'
The output should be:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
How can I do this using a Linux script file?
sed -r 's/^([^,]*),(.*)$/\2,\1/' Input_file
Brief explanation:
^([^,]*) matches the first comma-separated field, and \1 in the replacement refers to that match
(.*)$ matches the remainder of the line after the first comma, and \2 refers to that match
Shorter awk solution:
$ awk -F, '{$(NF+1)=$1;sub($1",","")}1' OFS=, input.txt
gives:
male,typ_angina,145,233,t,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,asympt,160,286,f,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Explanation:
{$(NF+1)=$1 # add extra field with value of field $1
sub($1",","") # search for string "$1," in $0, replace it with ""
}1 # print $0
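One caveat with sub($1",","") is that the value of $1 is used as a regular expression, so a first field containing characters such as . or * could match more than intended. A sketch that avoids this by shifting the fields instead (first_to_last is a hypothetical name):

```shell
# Hypothetical helper: move the first comma-separated field to the end
# of each line without treating its value as a regular expression.
first_to_last() {
    awk 'BEGIN { FS = OFS = "," }
    {
        first = $1
        for (k = 1; k < NF; k++) $k = $(k + 1)   # shift fields left by one
        $NF = first                              # old first field goes last
        print
    }' "$@"
}
```

Usage would be first_to_last input.txt; the field count stays the same, so no extra column is created.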
EDIT: Reading the comments under your question, it looks like you're swapping more columns than just moving the first to the end of the line. You might consider using a swap function that you call multiple times:
func swap(i,j){s=$i; $i=$j; $j=s}
However, this won't work whenever you want to move a column to the end of the line. So let's change that function:
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
So now you can do this:
$ cat tst.awk
BEGIN{FS=OFS=","}
{swap(1,NF+1); swap(2,5)}1
func swap(i,j){
s=$i
if (j>NF){
for (k=i;k<NF;k++) $k=$(k+1)
$NF=s
} else {
$i=$j
$j=s
}
}
and:
$ awk -f tst.awk input.txt
male,t,145,233,typ_angina,left_vent_hyper,150,no,2.3,down,0,fixed_defect,'<50',63
male,f,160,286,asympt,left_vent_hyper,108,yes,1.5,flat,3,normal,'>50_1',67
Why use sed or awk? The shell can handle this easily:
while IFS= read -r l; do printf '%s\n' "${l#*,},${l%%,*}"; done <infile
If it's a Windows file with \r line endings:
while IFS= read -r l; do f=${l%[[:cntrl:]]}; printf '%s\n' "${f#*,},${l%%,*}"; done <infile
If you want to rewrite the file in place:
printf "%s" "$(while read l;do f=${l%[[:cntrl:]]};printf "%s\n" "${f#*,},${l%%,*}";done <infile)">infile
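A more defensive way to rewrite the file is to write to a temporary file first and copy it back only if the loop succeeded. The sketch below assumes mktemp is available; rotate_first_field is a made-up name.

```shell
# Hypothetical helper: move the first comma-separated field of every line
# to the end, rewriting the file given as $1.
rotate_first_field() {
    file=$1
    tmp=$(mktemp) || return 1
    cr=$(printf '\r')
    # "|| [ -n "$line" ]" also handles a last line without a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        line=${line%"$cr"}                        # drop a trailing CR, if any
        printf '%s\n' "${line#*,},${line%%,*}"
    done <"$file" >"$tmp" && cat "$tmp" >"$file"  # copy back only on success
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Copying the result back with cat, rather than moving the temporary file over the original, keeps the original file's permissions.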