How to add specific columns to all text files in a directory in Linux? - linux

Can't find a solution, although thousands of variants of this question have been asked before.
I have several text files in a directory. I want to add one column to the beginning of each file. The added column for the first file is a column of 0's, for the second file it is a column of 1's, for the third file it is a column of 2's etc.
So, how to turn this:
0 2 3 2
3 3 3 1
4 3 4 2
to this:
0 0 2 3 2
0 3 3 3 1
0 4 3 4 2
and this:
2 3 4 3
2 3 3 5
5 4 1 2
to this:
1 2 3 4 3
1 2 3 3 5
1 5 4 1 2
in a loop?
I tried the following without any success:
#!/bin/bash
path=/prosjekt/tvs/QSexpt1_16K
jj=0
for file in "$path"/*.lsf;
do
awk '{$1=$(($jj)); print}' $file >> qq.txt
$jj=$(($jj+1))
done

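Try this:
#!/bin/bash
path=/prosjekt/tvs/QSexpt1_16K
jj=0
for file in "$path"/*.lsf; do
# pass the shell counter to awk with -v and print it as a new first column
awk -v jj="$jj" '{print jj, $0}' "$file" >> qq.txt
jj=$((jj+1))
done
There were three problems in your attempt: $jj=$(($jj+1)) is not a valid assignment, because the variable name on the left-hand side must be written without the $; a shell variable is not expanded inside single quotes, so awk never saw the value of jj; and assigning to $1 overwrites the existing first column instead of adding a new one.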
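As an alternative to the shell loop, the counter can live inside awk itself. This is only a sketch under assumptions, not the method from the answer above: the sample files a.lsf and b.lsf are made up, and the idea relies on awk resetting FNR to 1 at the start of every input file.

```shell
#!/bin/bash
# Sketch: prefix every line with a per-file index using a single awk call.
# a.lsf and b.lsf are hypothetical stand-ins for the real *.lsf files.
tmp=$(mktemp -d)
printf '0 2 3 2\n3 3 3 1\n' > "$tmp/a.lsf"
printf '2 3 4 3\n2 3 3 5\n' > "$tmp/b.lsf"

# FNR is 1 on the first line of each file, so idx counts files starting at 0.
awk 'FNR == 1 { idx = seen++ } { print idx, $0 }' "$tmp"/*.lsf > "$tmp/qq.txt"
cat "$tmp/qq.txt"
```

Note that an empty input file never produces a line with FNR == 1, so it would not consume an index in this version.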

Related

How to replace a number to another number in a specific column using awk

This is probably basic but I am completely new to command-line and using awk.
I have a file like this:
1 RQ22067-0 -9
2 RQ34365-4 1
3 RQ34616-4 1
4 RQ34720-1 0
5 RQ14799-8 0
6 RQ14754-1 0
7 RQ22101-7 0
8 RQ22073-1 0
9 RQ30201-1 0
I want the 0s in column 3 to change to 1, and any occurrence of 1 or 2 in column 3 to change to 2. So essentially I am only changing numbers in column 3, and I am not changing the -9.
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
I have tried the following, but it has not worked:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
Thank you.
With this code in your question:
awk '{gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
awk '{gsub("1","2",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
you have two problems:
1. You're running both commands on the same input file and writing their output to the same output file, so only the output of the 2nd command will be present in the output.
2. You're trying to change 0 to 1 first and THEN change 1 to 2, so the $3s that start out as 0 would end up as 2. You need to change the order of the operations.
This is what you should be doing, using your existing code:
awk '{gsub("1","2",$3); gsub("0","1",$3)}1' PRS_with_minus9.pheno.txt > PRS_with_minus9_modified.pheno
For example:
$ awk '{gsub("1","2",$3); gsub("0","1",$3)}1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The gsub() should also just be sub()s as you only want to perform each substitution once, and you don't need to enclose the numbers in quotes so you could just do:
awk '{sub(1,2,$3); sub(0,1,$3)}1' file
You can check the value of column 3 and then update the field value.
Check for 1 as the first rule because if the first check is for 0, the value will be set to 1 and the next check will set the value to 2 resulting in all 2's.
awk '
{
if($3==1) $3 = 2
if($3==0) $3 = 1
}
1' file
Output
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
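The ordering pitfall described above can also be avoided with an if / else if chain, because a line that has already matched one test is not tested again. A minimal sketch, assuming whitespace-separated input; remap_col3 is a made-up helper name, not something from the question:

```shell
#!/bin/bash
# Sketch: remap column 3 with an if / else if chain so the order of the tests is irrelevant.
remap_col3() {
    awk '{
        if ($3 == 0)      $3 = 1   # 0 becomes 1 ...
        else if ($3 == 1) $3 = 2   # ... and is not tested a second time
        print
    }'
}

printf '1 RQ22067-0 -9\n2 RQ34365-4 1\n4 RQ34720-1 0\n' | remap_col3
```

With the three sample lines, the -9 is left alone, the 1 becomes 2, and the 0 becomes 1.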
With your shown samples, try the following code, which uses ternary operators. A simple explanation: if the 3rd field is 1, set it to 2; otherwise, if it is 0, set it to 1; otherwise keep it as it is. Finally, print the line.
awk '{$3=$3==1?2:($3==0?1:$3)} 1' Input_file
Generic solution: here is a generic version that takes three awk variables. fieldNumber is a comma-separated list of the field numbers to check, existValue is a comma-separated list of the values to match, and newValue is the list of replacements for those values, in the same order.
awk -v fieldNumber="3" -v existValue="1,0" -v newValue="2,1" '
BEGIN{
num=split(fieldNumber,arr1,",")
num1=split(existValue,arr2,",")
num2=split(newValue,arr3,",")
for(i=1;i<=num1;i++){
value[arr2[i]]=arr3[i]
}
}
{
for(i=1;i<=num;i++){
if($arr1[i] in value){
$arr1[i]=value[$arr1[i]]
}
}
}
1
' Input_file
This might work for you (GNU sed):
sed -E 's/\S+/\n&\n/3;h;y/01/12/;G;s/.*\n(.*)\n.*\n(.*)\n.*\n.*/\2\1/' file
Surround 3rd column by newlines.
Make a copy.
Replace all 0's by 1's and all 1's by 2's.
Append the original.
Pattern match on newlines and replace the 3rd column in the original by the 3rd column in the amended line.
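The reason the y command does not run into the ordering problem is that it maps every character in one pass, whereas chained s commands each work on the result of the previous one. A small sketch to illustrate the difference (the input string is made up):

```shell
#!/bin/bash
# Sketch: y/01/12/ maps 0->1 and 1->2 in a single pass, so a 0 never ends up as 2.
one_pass=$(echo '0 1 0' | sed 'y/01/12/')
# Two chained s commands, by contrast, feed the result of the first into the second.
chained=$(echo '0 1 0' | sed 's/0/1/g; s/1/2/g')
echo "y command: $one_pass"
echo "chained s: $chained"
```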
Also with awk:
awk '{s=$3;sub(/1/,"2",s);sub(/0/,"1",s);$3=s} 1' file
1 RQ22067-0 -9
2 RQ34365-4 2
3 RQ34616-4 2
4 RQ34720-1 1
5 RQ14799-8 1
6 RQ14754-1 1
7 RQ22101-7 1
8 RQ22073-1 1
9 RQ30201-1 1
The substitutions are made with sub() on a copy of $3, and the modified copy is then assigned back to $3.
If you don't like the simple
sed 's/1$/2/; s/0$/1/' file
you might want to play with the following (GNU sed only, because it relies on the e flag):
sed -E 's/(.*)([01])$/echo "\1$((\2+1))"/e' file

Merge one-line texts into a data-frame with basic ubuntu shell commands

I have let's say two files Input1.txt and Input2.txt. Each of them is a text file containing a single line of 5 numbers separated by a tab.
For instance Input1.txt is
1 2 3 4 5
and Input2.txt is
6 7 8 9 10
The output that I desire is Output.txt :
Input1 1 2 3 4 5
Input2 6 7 8 9 10
So I want to merge the files in a table with an extra first column containing the names of the original files. Obviously I have more than 2 files (actually 1000) and I would like to make it with a for loop. You can assume that all my files are named as Input*.txt with * between 1 and 1000 and that they are all in the same directory.
I know how to do it with R, but I would like to do it with a few basic commands in the Ubuntu shell. Is it feasible? Thanks for any help.
Assuming the line in Input1.txt, Input2.txt, etc. is terminated with a newline character, you can use
for i in Input*.txt
do
printf "%s " "$i"
cat "$i"
done > Output.txt
The result is
Input1.txt 1 2 3 4 5
Input2.txt 6 7 8 9 10
If you want to get Input1 etc. without .txt you can use
printf "%s " "${i%.txt}"
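One thing to watch with Input*.txt: the glob sorts names as text, so Input10.txt comes before Input2.txt. If the rows should appear in numeric order, a counting loop is one option. This is a sketch with made-up sample files in a temporary directory; the upper bound of 10 is only for the demonstration and would be 1000 for the real data.

```shell
#!/bin/bash
# Sketch: merge Input1.txt .. InputN.txt in numeric order, prefixing each row with the base name.
# The temporary directory and the three sample files are hypothetical.
tmp=$(mktemp -d)
printf '1\t2\t3\n' > "$tmp/Input1.txt"
printf '4\t5\t6\n' > "$tmp/Input2.txt"
printf '7\t8\t9\n' > "$tmp/Input10.txt"

for ((n = 1; n <= 10; n++)); do
    f="$tmp/Input$n.txt"
    [ -f "$f" ] || continue                    # skip numbers that have no file
    printf 'Input%d\t%s\n' "$n" "$(cat "$f")"  # name, a tab, then the single data line
done > "$tmp/Output.txt"
cat "$tmp/Output.txt"
```

The name and the data are joined with a tab here so the result stays tab-separated like the inputs.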

Bash code to structure proteomics data

I need help restructuring my dataset so that I can perform the downstream analysis. I am currently working with proteomics data and want to perform a comparative analysis. The problem is the protein ids. In general, one protein can have more than one id, and the ids are separated by ";". I need to print the entire line once for each of the protein's ids. For example:
Input file :
tom dick harry jan
a;b;c 1 2 3 4
d;e 4 5 7 3
desirable output:
tom dick harry jan
a 1 2 3 4
b 1 2 3 4
c 1 2 3 4
d 4 5 7 3
e 4 5 7 3
many many thanks in advance
$ awk 'NR==1{$0="key "$0} {split($1,a,/;/); for (i=1; i in a; i++) { $1=a[i]; print } }' file | column -t
key tom dick harry jan
a 1 2 3 4
b 1 2 3 4
c 1 2 3 4
d 4 5 7 3
e 4 5 7 3
You can trivially remove the word "key" from the output if you don't like it but IMHO having some columns with and some without headers is a very bad idea - just makes any further processing more difficult.
#!/bin/bash
read -r header
printf "%4s %s\n" "" "$header"
# read fails at end of input, which ends the loop
while read -r ids values
do
for id in $(tr ';' ' ' <<< "$ids")
do
printf "%-4s %s\n" "$id" "$values"
done
done
This reads the header and prints it (just slightly differently formatted). It then reads each remaining line and prints one output line for each id given at the beginning of that line. To find the ids, the ids string is split on the semicolon (;).
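The same splitting can be done without calling tr, by letting read split on the semicolon through IFS. A minimal sketch, assuming the input arrives on standard input; expand_ids and parts are made-up names:

```shell
#!/bin/bash
# Sketch: expand "a;b;c 1 2 3 4" into one line per id, splitting with IFS instead of tr.
expand_ids() {
    local header ids values id parts
    IFS= read -r header
    printf '%-4s %s\n' "" "$header"
    # read returns non-zero at end of input, which ends the loop.
    while read -r ids values; do
        IFS=';' read -r -a parts <<< "$ids"   # split the id list on semicolons
        for id in "${parts[@]}"; do
            printf '%-4s %s\n' "$id" "$values"
        done
    done
}

printf 'tom dick harry jan\na;b;c 1 2 3 4\nd;e 4 5 7 3\n' | expand_ids
```

Setting IFS only in front of the read command keeps the change local to that one command.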

Missing numbers from two sequences

How do I find the missing numbers in two sequences using a bash script?
For example, I have a file which contains the following data:
1 1
1 2
1 3
1 5
2 1
2 3
2 5
output : missing numbers are
1 4
2 2
2 4
This awk one-liner gives the requested output for the specified input:
$ awk '$2!=l2+1&&$1==l1{for(i=l2+1;i<$2;i++)print l1,i}{l1=$1;l2=$2}' file
1 4
2 2
2 4
a solution using grep:
printf "%s\n" {1..2}" "{1..5} | grep -vf file
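With plain grep -vf, every line of the file is used as a regular expression that may match anywhere in a line, so a pattern like "1 1" would also match "11 1" or "1 10" once the numbers get larger. A hedged variant that compares whole lines as fixed strings; the ranges 1..2 and 1..5 are taken from the example and the temporary file stands in for the real one:

```shell
#!/bin/bash
# Sketch: list every expected "group number" pair, then drop the pairs present in the data.
tmp=$(mktemp -d)
printf '%s\n' '1 1' '1 2' '1 3' '1 5' '2 1' '2 3' '2 5' > "$tmp/file"

# -F: patterns are fixed strings, -x: a pattern must match the whole line,
# -v: print the lines that do not match, -f: read the patterns from the file.
missing=$(printf '%s\n' {1..2}' '{1..5} | grep -vxFf "$tmp/file")
echo "$missing"
```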

Records filtering

I have this kind of file file-1:
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
3 1 3.1876823799523781
3 2 3.1876823799523781
3 3 3.1876823799523781
3 4 3.1876823799523781
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
and I also have another file, file-2:
1
2
4
I would like to keep only those records from file-1 whose first-column value also appears in file-2, so I would like to get this output:
1 1 1.1552422143268792
1 2 1.1552422143268792
1 3 1.1552422143268792
1 4 1.1552422143268792
2 1 2.1906014042706916
2 2 2.1906014042706916
2 3 2.1906014042706916
2 4 2.1906014042706916
2 1 4.1906014042706916
2 2 4.1906014042706916
2 3 4.1906014042706916
2 4 4.1906014042706916
4 1 0.6213184222668061
4 2 0.6213184222668061
4 3 0.6213184222668061
4 4 0.6213184222668061
Can anybody help a little?
awk 'NR==FNR{f2[$1];next}$1 in f2' file-2 file-1
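For readers new to this idiom, here is the same one-liner spelled out with comments. It is a sketch with shortened sample data; keep_ids is a made-up wrapper name. NR counts lines over all inputs while FNR restarts for each file, so NR == FNR is only true while the first file is being read.

```shell
#!/bin/bash
# Sketch: the two-file awk idiom, spelled out. Usage: keep_ids ID_FILE DATA_FILE
keep_ids() {
    awk '
        NR == FNR { ids[$1]; next }   # first file: remember each id, print nothing
        $1 in ids                     # second file: print lines whose column 1 was remembered
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf '1\n2\n4\n' > "$tmp/file-2"
printf '1 1 1.15\n2 1 2.19\n3 1 3.18\n4 1 0.62\n' > "$tmp/file-1"
keep_ids "$tmp/file-2" "$tmp/file-1"
```

One caveat: if the id file is empty, NR == FNR also holds for the data file, so nothing is printed, which happens to be the sensible result for this particular filter.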
Very simple using join:
join file-1 file-2
The files must be sorted for join to work. The sort is based on text, not numeric values, so you may need to sort into a temp file first. Something like:
sort file-2 > sorted.tmp
sort file-1 | join - sorted.tmp
You can use the -f option in grep to read patterns from a file. But first you must change the patterns so that they match the first field only. You can do this by using sed to add a ^ to the beginning and a space to the end of each pattern in file-2, and using process substitution in your command.
The complete command is:
grep -f <(sed -e "s/^/^/g" -e "s/$/ /g" file-2) file-1
This might work for you:
sed 's/.*/\/^& \/p/' file-2 | sed -nf - file-1
Here is another way to do in awk:
awk 'NR==FNR{a[$1];next} !($1 in a){next}1' file-2 file-1
