How to replace a number to another number in a specific column using awk - linux

This is probably basic but I am completely new to command-line and using awk.
I have a file like this:
1 RQ22067-0 -9
2 RQ34365-4 1
3 RQ34616-4 1
4 RQ34720-1 0
5 RQ14799-8 0
6 RQ14754-1 0
7 RQ22101-7 0
8 RQ22073-1 0
9 RQ30201-1 0
I want the 0s to change to 1 in column3. And any occurence of 1 and 2 to change to 2 in column3. So essentially only changing numbers in column 3. But I am not changing the -9.
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
I have tried using (see below) but it has not worked
>> awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
>> awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
Thank you.

With this code in your question:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
you're running both commands on the same input file and writing their
output to the same output file so only the output of the 2nd script
will be present in the output, and
you're trying to change 0 to 1
first and THEN change 1 to 2 so the $3s that start out as 0 would
end up as 2, you need to change the order of the operations.
This is what you should be doing, using your existing code:
awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
For example:
$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The gsub() should also just be sub()s as you only want to perform each substitution once, and you don't need to enclose the numbers in quotes so you could just do:
awk '{sub(1,2,$3); sub(0,1,$3)}1' file

You can check the value of column 3 and then update the field value.
Check for 1 as the first rule because if the first check is for 0, the value will be set to 1 and the next check will set the value to 2 resulting in all 2's.
awk '
{
if($3==1) $3 = 2
if($3==0) $3 = 1
}
1' file
Output
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1

With your shown samples and ternary operators try following code. Simple explanation would be, checking condition if 3rd field is 1 then set it to 2 else check if its 0 then set it to 0 else keep it as it is, finally print the line.
awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file
Generic solution: Adding a Generic solution here, where we can have 3 awk variables named: fieldNumber in which you could mention all field numbers which we want to check for. 2nd one is: existValue which we want to match(in condition) and 3rd one is: newValue new value which needs to be there after replacement.
awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
num=split(fieldNumber,arr1,",")
num1=split(existValue,arr2,",")
num2=split(newValue,arr3,",")
for(i=1;i<=num1;i++){
value[arr2[i]]=arr3[i]
}
}
{
for(i=1;i<=num;i++){
if($arr1[i] in value){
$arr1[i]=value[$arr1[i]]
}
}
}
1
' Input_file

This might work for you (GNU sed):
sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file
Surround 3rd column by newlines.
Make a copy.
Replace all 0's by 1's and all 1's by 2's.
Append the original.
Pattern match on newlines and replace the 3rd column in the original by the 3rd column in the amended line.

Also with awk:
awk 'NR > 1 {s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
the substitutions are made with sub() on a copy of $3 and then the copy with the changes is assigned to $3.

When you don't like the simple
sed 's/1$/2/; s/0$/1/' file
you might want to play with
sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e' file

Related

How to extract the number after specific word using awk?

I have several lines of text. I want to extract the number after specific word using awk.
I tried the following code but it does not work.
At first, create the test file by: vi test.text. There are 3 columns (the 3 fields are generated by some other pipeline commands using awk).
Index AllocTres CPUTotal
1 cpu=1,mem=256G 18
2 cpu=2,mem=1024M 16
3 4
4 cpu=12,gres/gpu=3 12
5 8
6 9
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21
Please note there are several empty fields in this file.
what I want to achieve is to extract the number after the first gres/gpu= in each line (if no gres/gpu= occurs in this line, the default number is 0) using a pipeline like: cat test.text | awk '{some_commands}' to output 4 columns:
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
Firstly: awk do not need cat, it could read files on its' own. Combining cat and awk is generally discouraged as useless use of cat.
For this task I would use GNU AWK following way, let file.txt content be
cpu=1,mem=256G
cpu=2,mem=1024M
cpu=12,gres/gpu=3
cpu=13,gres/gpu=4,gres/gpu:ret6000=2
mem=12G,gres/gpu=3,gres/gpu:1080ti=1
then
awk 'BEGIN{FS="gres/gpu="}{print $2+0}' file.txt
output
0
0
0
3
0
0
4
3
Explanation: I inform GNU AWK that field separator (FS) is gres/gpu= then for each line I do print 2nd field increased by zero. For lines without gres/gpu= $2 is empty string, when used in arithmetic context this is same as zero so zero plus zero gives zero. For lines with at least one gres/gpu= increasing by zero provokes GNU AWK to find longest prefix which is legal number, thus 3 (4th line) becomes 3, 4, (7th line) becomes 4, 3, (8th line) becomes 3.
(tested in GNU Awk 5.0.1)
With your shown samples in GNU awk you can try following code. Written and tested in GNU awk. Simple explanation would be using awk's match function where using regex gres\/gpu=([0-9]+)(escaping / here) and creating one and only capturing group to capture all digits coming after =. Once match is found printing current line followed by array's arr's 1st element +0(to print zero in case no match found for any line) here.
awk '
FNR==1{
print $0,"GPUAllocated"
next
}
{
match($0,/gres\/gpu=([0-9]+)/,arr)
print $0,arr[1]+0
}
' Input_file
Using sed
$ sed '1s/$/\tGPUAllocated/;s~.*gres/gpu=\([0-9]\).*~& \t\1~;1!{\~gres/gpu=[0-9]~!s/$/ \t0/}' input_file
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3
awk '
BEGIN{FS="\t"}
NR==1{
$(NF+1)="GPUAllocated"
}
NR>1{
$(NF+1)=FS 0
}
/gres\/gpu=/{
split($0, a, "=")
gp=a[3]; gsub(/[ ,].*/, "", gp)
$NF=FS gp
}1' test.text
Index AllocTres CPUTotal GPUAllocated
1 cpu=1,mem=256G 18 0
2 cpu=2,mem=1024M 16 0
3 4 0
4 cpu=12,gres/gpu=3 12 3
5 8 0
6 9 0
7 cpu=13,gres/gpu=4,gres/gpu:ret6000=2 20 4
8 mem=12G,gres/gpu=3,gres/gpu:1080ti=1 21 3

Recode column values in unix

1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859115 2258379 24636 Yes 06S14028968 13 1 1 2
1859116 2255037 21608 Yes 06S14028969 11 0 2 3
1859117 2268746 34027 Yes 06S14028970 10 0 2 1
Above is the example of my data set. I want to replace the values of 7th column in a way that 1 should be replaced by 2 and 0 should be replaced by 1. So the outcome i am expecting should be like following.
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859115 2258379 24636 Yes 06S14028968 13 2 1 2
1859116 2255037 21608 Yes 06S14028969 11 1 2 3
1859117 2268746 34027 Yes 06S14028970 10 1 2 1
I have tried using this approach
awk 'NR==1{$10="Pheno";print;next}\
$7 == "1" {$10="2"};\
$7 == "0" {$10="1"}1' old.txt |column -t > new.txt
and then removing the first row and extracting columns of interest. But i need straight forward way.
This could be simply done by putting a simple condition to check if NF(number of fields in each line) is greater OR equal to 7 then increment 7th field with 1 and print edited/non-edited line then(by doing this we can avoid adding 1 if number of fields are lesser than 7 in any line).
awk 'NF>=7{$7+=1} 1' Input_file

missing number from two squence

How do I findout missing number from two sequence using bash script
from example I have file which contain following data
1 1
1 2
1 3
1 5
2 1
2 3
2 5
output : missing numbers are
1 4
2 2
2 4
This awk one-liner gives the requested output for the specified input:
$ awk '$2!=l2+1&&$1==l1{for(i=l2+1;i<$2;i++)print l1,i}{l1=$1;l2=$2}' file
1 4
2 2
2 4
a solution using grep:
printf "%s\n" {1..2}" "{1..5} | grep -vf file

How can I separate some repeated patterns in a row into multiple rows using bash script?

I have some problem with bash script.
I've got a string which has some repeated patterns like this.
1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 ...
Each fields is separated by tab key.
I want it to look like this...
1 2 3 4
1 2 3 4
1 2 3 4
…
How can I solve this problem using bash script like cut, sed, awk ... ?
I've tried some command like cut -f 'seq 4, 4, 40' example.txt
It doesn't work...
It looks very easy but so difficult to me...
You can use sed like this:
s='1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4'
p='1 2 3 4'
echo "$s"|sed "s/$p\s*/&\n/g"
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
Live Demo: http://ideone.com/P59OCJ
Here's a pure bash solution:
IFS=$'\t' set -- $(<input_file)
seen=()
while [[ $1 ]]; do
if (( ${seen[$1]} )); then # If we've seen the value before, start a new line.
echo
unset seen
fi
printf '%s ' "$1"
seen[$1]=1
shift
done
If you know the ending number of your sequence beforehand, you can do something like:
LAST_NUMBER=4
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
Just replace 4 with the last number from the sequence
If you don't know the number, you have to search through it using the following:
#!/bin/bash
declare -A CHECKED_NUMBERS
LAST_NUMBER=
while read LINE; do
SPLIT_LINE=$(cut -d" " -f1- <<< "$LINE")
for number in $SPLIT_LINE; do
if [ "${CHECKED_NUMBERS[$number]}" == "1" ]; then
LAST_NUMBER=$number
else
CHECKED_NUMBERS[$number]=1
fi
done
done < example.txt
# do the replacement
sed -e "s/$LAST_NUMBER\t*/&\n/g" < example.txt
An awk version
awk '{for (i=1;i<=NF;i++) {printf "%s"(i%4?" ":"\n"),$i}}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
An gnu awk version
awk -v RS="\t" '{printf "%s"(NR%4?" ":"\n"),$0}' file
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
xargs may help:
kent$ echo "1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4"|xargs -n4
1 2 3 4
1 2 3 4
1 2 3 4
1 2 3 4
This might work for you:
printf "%s\t%s\t%s\t%s\n" $string
or you want the fields space separated:
printf "%s %s %s %s\n" $string

Records filtering

I have this kind of file file-1:
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
3 1 3.1876823799523781
3 2 3.1876823799523781
3 3 3.1876823799523781
3 4 3.1876823799523781
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
and I have antoher file too file-2
1
2
4
I would like to filter those records from file-1, in which the values of the first colum are the same as in file-2, so I would like to get this output
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
Can anybody help a little?
awk 'NR==FNR{f2[$1];next}$1 in f2' file-2 file-1
Very simple using join:
join file-1 file-2
The files must be sorted for join to work. The sort is based on text, not numeric values, so you may need to sort into a temp file first. Something like:
sort file-2 > sorted.tmp
sort file-1 | join - sorted.tmp
You can use the -f option in grep to read patterns from a file. But first you must change the patterns so that they match the first field only. You can do this by using sed to add a ^ to the beginning and a space to the end of each pattern in file-2, and using process substitution in your command.
The complete command is:
grep -f <(sed -e "s/^/^/g" -e "s/$/ /g" file-2) file-1
This might work for you:
sed 's/.*/\/^& \/p/' file-2 | sed -nf - file-1
Here is another way to do in awk:
awk 'NR==FNR{a[$1];next} !($1 in a){next}1' file-2 file-1

Resources