My data:
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
I have to transpose the values inside the "..." delimiters two by two, using the shell, and output the result (2 columns) to a new file whose name is the digit in the next-to-last column. I have to do this for each line of my input file.
What I would like:
$ ls
1 2 3 4 5 6 7 8
Example: cat 1
1 2
3 4
5 64
3 9
cat 2:
3 4
5 6
6 2
cat 3:
2 3
4 5
Bonus: if I could get the last word of each line (the last column) as the name of each new file, it would be perfect.
OK, it took some time, but I finally solved your problem with the code below:
#!/bin/bash
while read -r LINE; do
    # filename: first field after the ",,,,," separator, quotes stripped
    FILE_NAME=$(echo "${LINE##*,,,,,}" | cut -d ',' -f 1 | tr -d '"')
    # data: everything before the ",,,,," separator, quotes stripped, commas turned into spaces
    DATA=$(echo "${LINE%%,,,,,*}" | tr -d '"' | tr ',' ' ')
    i=1
    for num in $DATA; do
        echo -n "$num"
        if [[ $((i % 2)) == 0 ]]; then
            echo ""        # newline after every second value
        else
            echo -n " "    # space between the two values of a pair
        fi
        i=$((i + 1))
    done > "$FILE_NAME"
done < input.txt
In my solution I assume that your input is placed in the file input.txt and that all of your input lines use ,,,,, as the separator. It works like a charm with your sample input.
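For example, against the sample data (a sketch; the script name transpose.sh is just an assumption):
$ bash transpose.sh
$ ls
1  2  3  input.txt  transpose.sh
$ cat 1
1 2
3 4
5 64
3 9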
Assuming there are no colons in the input (choose a different temporary delimiter if necessary) the first part can be done with:
awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
e.g.:
$ cat input
"1,2,3,4,5,64,3,9",,,,,1,aine
"2,3,4,5",,,,,3,bb
"3,4,5,6,6,2",,,,,2,ff
$ awk '{s = ""; n = split($2,k,","); for(i = 1; i <= n; i+=2 ) { s = sprintf( "%s%c%s:%s", s, s ? ":" : "", k[i+1], k[i])} $2 = s}1' FS=\" OFS=\" input | sort -t , -k6n | tr : ,
"2,1,4,3,64,5,9,3",,,,,1,aine
"4,3,6,5,2,6",,,,,2,ff
"3,2,5,4",,,,,3,bb
But it's not clear why you want to do the first part at all when you can just skip straight to part 2 with:
awk '{n = split($2,k,","); m = split($3, j, ","); fname = j[6];
for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input
My answer can't keep up with the changes to the question! If you are outputting the lines into files, then there is no need to sort on the penultimate column. If you want the filenames to be the final column, it's not clear why you ever mentioned using the penultimate column at all. Just change fname in the above to j[7] to get the final column.
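For reference, that variant (the same command with j[7], which also covers the bonus request of naming the files after the last word) looks like:
awk '{n = split($2,k,","); m = split($3, j, ","); fname = j[7];
      for( i = 1; i <= n; i+=2 ) printf("%d %d\n", k[i+1], k[i]) > fname}' FS=\" input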
I have a text file:
$cat ifile.txt
this is a text file
assign x to 9 and y to 10.0702
define f(x)=x+y
I would like to comment out (disable) the original line, then divide the x-value by 2 and multiply the y-value by 2.
My desired output is
$cat ofile.txt
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
Here 5 is calculated from 9/2 and rounded up to the next integer, and 20.1404 is calculated from 10.0702 × 2 and not rounded.
I am thinking of the following way, but can't write a script.
if [ line contains "assign x to" ]; then new_x_value=[next word]/2
if [ line contains "and y to" ]; then new_y_value=[next word]x2
if [ line contains "assign x to" ];
then disable it and add a line "assign x to new_x_value and y to new_y_value"
Would you please try the following:
#!/bin/bash
pat="(assign x to )([[:digit:]]+)( and y to )([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Output:
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
The regex (assign x to )([[:digit:]]+)( and y to )([[:digit:].]+) matches
a literal string, followed by digits, followed by a literal string,
and followed by digits including decimal point.
The bc command (${BASH_REMATCH[2]} + 1) / 2 calculates the ceiling
of the input divided by 2.
The next bc command ${BASH_REMATCH[4]} * 2 multiplies the input by 2.
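For example, with the sample values from the question:
$ echo "(9 + 1) / 2" | bc
5
$ echo "10.0702 * 2" | bc
20.1404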
The reason I picked bash is just that it supports backreferences in regexes, and it is easier to parse and reuse the captured parameters than with awk. As often pointed out, bash is not suitable for processing large files for performance reasons. If you plan to process large or multiple files, it is recommended to use another language such as perl.
With perl you can say:
perl -pe 's|(assign x to )([0-9]+)( and y to )([0-9.]+)|
"#$&\n" . $1 . int(($2 + 1) / 2) . $3 . $4 * 2|ge' ifile.txt > ofile.txt
[EDIT]
If your ifile.txt looks like:
this is a text file
assign x to 9 and y to 10.0702 45
define f(x)=x+y
There is more than one space before the numbers.
One more value exists at the end (after the whitespace).
Then please try the following instead:
pat="(assign x to +)([[:digit:]]+)( and y to +)([[:digit:].]+)( +)([[:digit:].]+)"
while IFS= read -r line; do
    if [[ $line =~ $pat ]]; then
        echo "#$line"
        x2=$(echo "(${BASH_REMATCH[2]} + 1) / 2" | bc)
        y2=$(echo "${BASH_REMATCH[4]} * 2" | bc)
        y3=$(echo "${BASH_REMATCH[6]} * 2" | bc)
        echo "${BASH_REMATCH[1]}$x2${BASH_REMATCH[3]}$y2${BASH_REMATCH[5]}$y3"
    else
        echo "$line"
    fi
done < ifile.txt > ofile.txt
Result:
this is a text file
#assign x to 9 and y to 10.0702 45
assign x to 5 and y to 20.1404 90
define f(x)=x+y
The plus sign after a whitespace is a regex quantifier that specifies repetition; in this case it matches one or more whitespace characters.
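A quick way to see that quantifier in action at an interactive bash prompt (a small sketch reusing a fragment of the pattern above):
$ pat="(and y to +)([[:digit:].]+)"
$ [[ "and y to   10.0702" =~ $pat ]] && echo "${BASH_REMATCH[2]}"
10.0702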
One in awk:
awk '
/assign/ {                                        # when assign met in record
    for(i=1;i<=NF-1;i++)                          # iterate from the beginning
        if($i=="to" && $(i-1)=="x")               # if to x
            $(i+1)=((v=$(i+1)/2)>(u=int(v))?u+1:u)    # ceil of division
        else if($i=="to" && $(i-1)=="y")          # if to y
            $(i+1)*=2                             # multiply by 2
}1' file                                          # output
Output:
this is a text file
assign x to 5 and y to 20.1404
define f(x)=x+y
Sanity checking of the ceiling calculation left as homework...
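If you want to do that homework quickly, a minimal check of the ceiling expression on a few values (a sketch, assuming any POSIX awk) could look like:
$ awk 'BEGIN{for(v=1;v<=5;v++){c=((w=v/2)>(u=int(w))?u+1:u); print v, c}}'
1 1
2 1
3 2
4 2
5 3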
awk '{if(match($0,/^assign/)){b=$0;n=split($0,a," ");a[4]=int((a[4]+1)/2);a[8]=a[8]*2;c=a[1];for(i=2;i<=n;i++){c=c " " a[i]};$0="#" b "\n" c} print}'
Demo:
:>awk '{if(match($0,/^assign/)){b=$0;n=split($0,a," ");a[4]=int((a[4]+1)/2);a[8]=a[8]*2;c=a[1];for(i=2;i<=n;i++){c=c " " a[i]};$0="#" b "\n" c} print}' <ifile
this is a text file
#assign x to 9 and y to 10.0702
assign x to 5 and y to 20.1404
define f(x)=x+y
:>
Explanation:
awk '{
if(match($0, /^assign/))        <--- $0 is the whole input record. ^ is start of line.
                                     We are checking whether the record starts with "assign"
{b=$0;                          <-- Save the original record in variable b
n=split($0,a," ");              <-- Split the record on spaces into array a; n is the number of fields
a[4]=int((a[4]+1)/2);           <-- Halve the x value (index 4), rounding up
a[8]=a[8]*2;                    <-- Double the y value (index 8)
c=a[1];
for(i=2;i<=n;i++)               <-- Loop over the array in its original order
{c=c " " a[i]};                 <-- Rebuild the modified line by concatenating the values of a
$0 = "#" b "\n" c               <-- Replace the record with the commented-out original, a newline ("\n"), and the new line
}
print }'
I want to make pairs of words that share the same identifier column. My file is similar to this example:
A ID.1
B ID.2
C ID.1
D ID.1
E ID.2
F ID.3
The result I want is:
A C ID.1
A D ID.1
B E ID.2
C D ID.1
Note that I don't want to obtain the same word pair in the opposite order. In my real file some words appear more than once, with different identifiers.
I tried this code, which works but takes a lot of time (and I don't know whether it produces redundancies):
counter=2
cat filtered_go_annotation.txt | while read f1 f2; do
tail -n +$counter go_annotation.txt | grep $f2 | awk '{print "'$f1' " $1}';
((counter++))
done > go_network2.txt
The tail is used to skip the lines that have already been read.
Awk solution:
awk '{ a[$2] = ($2 in a? a[$2] FS : "") $1 }
     END {
         for (k in a) {
             len = split(a[k], items);
             for (i = 1; i <= len; i++)
                 for (j = i+1; j <= len; j++)
                     print items[i], items[j], k
         }
     }' filtered_go_annotation.txt
The output:
A C ID.1
A D ID.1
C D ID.1
B E ID.2
With GNU awk for sorted_in and true multi-dimensional arrays:
$ cat tst.awk
{ vals[$2][$1] }
END {
PROCINFO["sorted_in"] = "#ind_str_asc"
for (i in vals) {
for (j in vals[i]) {
for (k in vals[i]) {
if (j != k) {
print j, k, i
}
}
delete vals[i][j]
}
}
}
$ awk -f tst.awk file
A C ID.1
A D ID.1
C D ID.1
B E ID.2
I wonder if this would work (in GNU awk):
$ awk '
($2 in a) && !($1 in a[$2]) {   # if ID.x is found in a and $1 is not yet in a[ID.x]
    for(i in a[$2])             # loop over all existing entries of a[ID.x]
        print i,$1,$2           # and output the current word paired with each previous match
}
{
    a[$2][$1]                   # hash to a
}' file
A C ID.1
A D ID.1
C D ID.1
B E ID.2
In two steps:
$ sort -k2 file > file.s
$ join -j2 file.s{,} | awk '!(a[$2,$3]++ + a[$3,$2]++){print $2,$3,$1}'
A C ID.1
A D ID.1
C D ID.1
B E ID.2
If your input is large, it may be faster to solve it in steps, e.g.:
# Create temporary directory for generated data
mkdir workspace; cd workspace
# Split original file
awk '{ print $1 > $2 }' ../infile
# Find all combinations
perl -MMath::Combinatorics \
     -n0777aE \
     '
       $c=Math::Combinatorics->new(count=>2, data=>[@F]);
       while(@C = $c->next_combination) {
         say join(" ", @C) . " " . $ARGV
       }
     ' *
Output:
C D ID.1
C A ID.1
D A ID.1
B E ID.2
Perl solution using regex backtracking:
perl -n0777E '/^([^ ]*) (.*)\n(?:.*\n)*?([^ ]*) (\2)\n(?{say"$1 $3 $2"})(?!)/mg' foo.txt
For the meaning of the flags, see perl -h.
^([^ ]*) (.*)\n : matches a line containing at least one space; the first capturing group is everything left of the first space, the second capturing group is everything to its right.
(?:.*\n)*? : matches (without capturing) zero or more lines, lazily, so the following pattern is tried before more lines are consumed.
([^ ]*) (\2)\n : similar to the first match, using the backreference \2 to find a later line with the same key.
(?{say"$1 $3 $2"}) : code to display the captured groups.
(?!) : makes the match fail so that the regex backtracks and goes on to find the remaining pairs.
Note that it could be shortened a bit
perl -n0777E '/^(\S+)(.+)[\s\S]*?^((?1))(\2)$(?{say"$1 $3$2"})(?!)/mg' foo.txt
Yet another awk, making use of the redefinition of $0. This makes the solution of RomanPerekhrest a bit shorter:
{a[$2]=a[$2] FS $1}
END { for(i in a) { $0=a[i]; for(j=1;j<NF;j++)for(k=j+1;k<=NF;++k) print $j,$k,i} }
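For reference, run as a complete command it would look like this (a sketch; the input filename file is an assumption, and the output order of for(i in a) is unspecified):
$ awk '{a[$2]=a[$2] FS $1}
       END { for(i in a) { $0=a[i]; for(j=1;j<NF;j++)for(k=j+1;k<=NF;++k) print $j,$k,i} }' file
A C ID.1
A D ID.1
C D ID.1
B E ID.2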
I have a text file mytext.txt, each line of the text is a sentence:
the quick brown fox jumps over the lazy dog
colorless green ideas sleep furiously
Then I have a dictionary file dict.txt like this:
the: A
quick: B
brown: C
fox: D
jumps: E
over: F
lazy: G
dog: H
colorless: I
green: J
ideas: K
sleep: L
furiously: M
I want to replace each word in mytext.txt with the value in dict.txt, like this:
A B C D E F A G H
I J K L M
How can I do it using awk or sed?
If your dict.txt does not have any special chars, a very fast solution is to convert the content of dict.txt into a sed expression:
sed 's#^#s/#;s#: #/#;s#$#/g;#' dict.txt
will result in
s/the/A/g;
s/quick/B/g;
s/brown/C/g;
s/fox/D/g;
s/jumps/E/g;
s/over/F/g;
s/lazy/G/g;
s/dog/H/g;
s/colorless/I/g;
s/green/J/g;
s/ideas/K/g;
s/sleep/L/g;
s/furiously/M/g;
Now this can be used as a script for another sed:
sed -f <(sed 's#^#s/#;s#: #/#;s#$#/g;#' dict.txt) mytext.txt
output:
A B C D E F A G H
I J K L M
But be aware that if the dict file contains any characters special to sed (/, \, ., *, and so on) it won't work.
Edit: added the g to sed
Update:
If only whole words should be replaced, this will do the trick, because \b looks for word boundaries:
sed -f <(sed 's#^#s/\\b#;s#: #\\b/#;s#$#/g;#' dict.txt) mytext.txt
Thanks @jm666 for pointing this out.
Edit2:
If the dict.txt file is very long, my original version might fail.
The version from @SLePort fixed this, thanks.
I previously used "$()" instead of -f <().
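That earlier variant was roughly this (a sketch; it expands the generated sed commands on the command line, which is presumably why a very long dict.txt could make it fail):
sed "$(sed 's#^#s/#;s#: #/#;s#$#/g;#' dict.txt)" mytext.txt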
$ awk -F'[: ]' 'FNR==NR{a[$1]=$NF;next}{for(i in a)gsub(i,a[i])}1' dist mytext
OR
$ awk -F'[: ]' 'FNR==NR{ a[$1]=$NF; next }
{ for(i=1;i<=NF;i++) if($i in a)$i=a[$i] }1' dist mytext
Input
$ cat mytext
the quick brown fox jumps over the lazy dog
colorless green ideas sleep furiously
$ cat dist
the: A
quick: B
brown: C
fox: D
jumps: E
over: F
lazy: G
dog: H
colorless: I
green: J
ideas: K
sleep: L
furiously: M
Output
$ awk -F'[: ]' 'FNR==NR{a[$1]=$NF;next}{for(i in a)gsub(i,a[i])}1' dist mytext
A B C D E F A G H
I J K L M
$ awk -F'[: ]' 'FNR==NR{a[$1]=$NF; next}
{ for(i=1; i<=NF;i++) if($i in a)$i=a[$i] }1' dist mytext
A B C D E F A G H
I J K L M
Here is another alternative with awk and sed:
$ sed -f <(awk -F': ' '{print "s/\\b" $1 "\\b/" $2 "/g"}' dict) file
A B C D E F A G H
I J K L M
I have a file that looks like:
ignoretext
START
a b
c d
e
END
ignoretext
START
f g h
i
END
ignoretext
I want to translate that into rows of:
a b c d e
f g h i
Here is one way to do it with awk:
awk '/END/ {ORS=RS;print "";f=0} f; /START/ {ORS=" ";f=1}' file
a b c d e
f g h i
Added a version that does not leave a space at the end of the line. There may be a shorter way to do this:
awk 'a && !/END/ {printf FS} /END/ {print "";f=a=0} f {printf "%s",$0;a++} /START/ {f=1}' file
a b c d e
f g h i
Here is another variant using GNU sed:
sed -n '/START/,/END/{:a;/START/d;/END/!{N;ba};s/\n/ /g;s/ END//;p}' file
a b c d e
f g h i
In a more readable format, with explanation:
sed -n '                  # Suppress default printing
  /START/,/END/ {         # For the range between /START/ and /END/
    :a;                   # Create a label a
    /START/d              # If the line contains START, delete it
    /END/! {              # Until a line with END is seen
      N                   # Append the next line to the pattern space
      ba                  # Branch back to label a to repeat
    }
    s/\n/ /g              # Replace all newlines with spaces
    s/ END//              # Remove the END tag
    p                     # Print the pattern space
  }' file
Jotne's awk solution is probably the cleanest, but here's one way you can do it with GNU's version of sed:
sed -ne '/START/,/END/{/\(START\|END\)/!H}' \
    -e '/END/{s/.*//;x;s/\n/ /g;s/^ *\| *$//p}'
$ awk 'f{ if (/END/) {print rec; rec=sep=""; f=0} else {rec = rec sep $0; sep=" "} } /START/{f=1}' file
a b c d e
f g h i