Replace column between two files in Unix - linux

How can i replace the 6th column of my following data.fam file as following with 3rd second column of my .txt file.
My data.fam file looks like this
20481 20481 0 0 2 -9
20483 20483 0 0 1 1
20488 20488 0 0 2 1
20492 20492 0 0 1 1
20493 20493 0 0 1 -9
20498 20498 0 0 2 -9
data.txt file looks like this.
20481 2 1
20488 2 1
20483 2 1
20493 1 0
22822 2 1
20498 -9 -9
22692 1 0
The output.fam should be like
20481 20481 0 0 2 1
20483 20483 0 0 1 1
20488 20488 0 0 2 1
20492 20492 0 0 1 0
20493 20493 0 0 1 0
20498 20498 0 0 2 1
I have tried awk 'FNR==NR{a[$1]=$2;next}{$6=a[$3]}1' data.txt data.fam >output.fam
But the output is without the 6th column as following at all like the following. Where i am doing wrong??
20481 20481 0 0 2
20483 20483 0 0 1
20488 20488 0 0 2
20492 20492 0 0 1
20493 20493 0 0 1
20498 20498 0 0 2

Related

make lower half of a n*n list zero without using any functions in python

I tried to solve it by using 2 for loops and an if statement . But i was unable to get the desired output.
INPUT-
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
thislist=[1]*10
thislist=[thislist]*10
print(thislist)
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
print()
for i in range(10):
for j in range(10):
if i>j:
thislist[i][j]=0
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
This was the output i got:
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 1
but when i made a list using the below method i got the desired output.
thislist=[[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1]]
print(thislist)
for i in range(10):
for j in range(10):
if i>j:
thislist[i][j]=0
for i in range(10):
for j in range(10):
print(thislist[i][j], end = " ")
print()
note-This is the desired OUTPUT-
1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1
0 0 1 1 1 1 1 1 1 1
0 0 0 1 1 1 1 1 1 1
0 0 0 0 1 1 1 1 1 1
0 0 0 0 0 1 1 1 1 1
0 0 0 0 0 0 1 1 1 1
0 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 0 1
Can someone explain whats the difference between the above 2 codes?
As you pointed out, the problem comes from the manner you created your list of list. In your first example, you do something like this:
list1 = [1]*10
list_of_list1=[list1]*10
list_of_list1 is actually a list of shallow copies of the original list1. Then if you modify a value in list_of_list1, the modification will occurs in all the rows of list_of_list1.
The opposit of a shallow copy is a deep copy. You might want to search more info on the Internet about this topic
In the mean time, you can simply try this.
thislist = []
for row in range(10):
list1 = [1]*10
thislist.append(list1)
But I usually go with numpy when it is available.

Python: How do I assign string values to numeric data?

Running the four commands results in below output, from a dataframe called cancer.
$ print("\n target")
$ print(cancer.target)
$ print("\n target_names")
$ print(cancer.target_names)
target
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 1 0 1 1 1 1 1 0 0 1 0 0 1 1 1 1 0 1 0 0 1 1 1 1 0 1 0 0
1 0 1 0 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 0 1 1 1 1 0 1 1 0 1 1
1 1 1 1 1 1 0 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0 1 0 0 1 1 0 1 1 0 1 1 1 1 0 1
1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 0 0 1 1 0 0 1 1 1 1 0 1 1 0 0 0 1 0
1 0 1 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 1 1 0 1 0 0 0 0 1 1 0 0 1 1
1 0 1 1 1 1 1 0 0 1 1 0 1 1 0 0 1 0 1 1 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 0 1 1 0 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1 1 0 0 0 1 1
1 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0
0 1 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 0 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1
1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 1 1 1 0 1 1
0 1 0 1 1 0 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 0 1 0 1 1 0 1 1 1 1 1 0 0 1 0 1 0 1 1 1 1 1 0 1 1 0 1 0 1 0 0
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1]
target_names
['malignant' 'benign']
How would I be able to assign "malignant" to 0 and "benign" to 1, and vice versa?
You can map it by using a dictionary
target=[0,0,0,0,0,1,1,1,0,1,0,0,1]
d={0:'malignant',1:'benign'}
target=[d[t] for t in target]
print(target)
['malignant', 'malignant', 'malignant', 'malignant', 'malignant', 'benign', 'benign', 'benign', 'malignant', 'benign', 'malignant', 'malignant', 'benign']
What so you mean by assign?
Something like this?
for i in range(len(cancer)):
if target[i]==0:
target[i]=target_names[0]
elif target[i]==1:
target[i]=target_names[1]

J Language rank of power function

t=:1
test=: monad define
t=.y
t=. t, 0
)
testloop=: monad def'test^:y t'
testloop 1
1 0
testloop 2
1 0 0
testloop 10
1 0 0 0 0 0 0 0 0 0 0
In order to simplify this
(testloop 0),(testloop 1), (testloop 2), ...
110100100010000...
I tried
, testloop"0 (i.10)
but it gives
1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0...
It seems like I have a problem with a rank, I can't figure out which one to use.
I would be grateful if you could help me on this issue.
Thank you!
This is not so much a rank problem as the fact that the results are padded with zeros so that the row lengths match.
testloop 1
1 0
testloop 2
1 0 0
testloop"0 [ 1 2
1 0 0
1 0 0
testloop"0 [ 1 2 3
1 0 0 0
1 0 0 0
1 0 0 0
If I redefine your test and testloop to add a different appending digit, we can see how the padding is working.
test2 =: 3 : 0
​t=. y
​t=. t,2
​)
test2loop=: monad def'test2^:y t'
test2loop"0 [1
1 2
test2loop"0 [2
1 2 2
test2loop"0 [ 1 2 NB. 0 padded in first row
1 2 0
1 2 2
test2loop"0 [ 1 2 3 NB. 0's padded in first two rows
1 2 0 0
1 2 2 0
1 2 2 2
To get around the padding issue I will use each=: &.> so that the results are boxed before combining to avoid the padding.
testloop each 1 2 3
+---+-----+-------+
|1 0|1 0 0|1 0 0 0|
+---+-----+-------+
testloop each i. 10
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
|1|1 0|1 0 0|1 0 0 0|1 0 0 0 0|1 0 0 0 0 0|1 0 0 0 0 0 0|1 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0|1 0 0 0 0 0 0 0 0 0|
+-+---+-----+-------+---------+-----------+-------------+---------------+-----------------+-------------------+
using ; to unbox and ravel the results
; testloop each i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
To be honest I would be more inclined to use the fact that complex numbers used as the left argument of # introduce 0's for padding. The number of 0's depends on the imaginary value of the complex number.
1j0 # 1
1
1j1 # 1
1 0
1j2 # 1
1 0 0
test3=: monad def '(1 j. y)#1'
test3 1
1 0
test3 2
1 0 0
test3 1 2
1 0 1 0 0
test3 i. 10
1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0

How to find which combinations have the most frequency?

I have an SPSS data set with 500+ respondents and 18 symptoms that they could have.
Each symptom has its own variable Symptom01 = 1 means they have the symptom 1 Symptom02 = 0 means they dont have the symptom 2 etc etc
What I want to know is what combination of 3 symptoms is more frequent in my data set. For example how many people have symptom 1, 5 and 6; how many people have symptom 1, 2 and 3, etc.
I doesn't mean that they only have those symptoms. Theey could have others. I just want to know which group of 3 symptoms is more frequent in my dataset.
It's a lot of combinations so how would you do this?
Can someone help me?
Please note the macro below uses the variable names Symptom1, Symptom2 etc' instead of "Symptom01", "Symptom02"...
First creating some sample data to work on:
data list list/Symptom1 to Symptom18.
begin data
1 0 0 1 1 0 0 1 1 0 0 0 0 0 1 1 1 1
1 1 1 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0
0 1 1 0 1 1 1 1 1 1 1 0 1 0 0 1 0 0
1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0
0 0 0 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1
1 0 1 1 1 1 1 0 1 1 0 0 0 1 1 1 0 1
1 0 0 0 1 0 1 1 0 0 0 1 0 1 0 0 1 0
0 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1
1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0
end data.
Now defining a macro with three loops:
EDIT - this version accounts for repeating combinations of symptoms
define AllCombsOf3 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3)= !concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 & !concat("Symptom",!vr3)=1 .
!ifend
!doend
!doend
!doend
!enddefine.
Running the macro and displaying wanted results:
AllCombsOf3.
means C_1_2_3 to C_16_17_18.
EDIT 2 - new macro for a four symptom version
define AllCombsOf4 ()
!do !vr1=1 !to 18
!do !vr2=!vr1 !to 18
!do !vr3=!vr2 !to 18
!do !vr4=!vr3 !to 18
!if (!vr2<>!vr1 !and !vr2<>!vr3 !and !vr3<>!vr4) !then
compute !concat("C_",!vr1,"_",!vr2,"_",!vr3,"_",!vr4)=
!concat("Symptom",!vr1)=1 & !concat("Symptom",!vr2)=1 &
!concat("Symptom",!vr3)=1 & !concat("Symptom",!vr4)=1 .
!ifend
!doend !doend !doend !doend
!enddefine.
AllCombsOf4.
means C_1_2_3_4 to C_15_16_17_18.

if cell 1 contains X, then cell 2 should equal W, if cell 1 contains Z, then look at next criteria

I am trying to transform 8 columns of dummy variables into one column of a 8 level rank.
I am trying to do so with this formular:
=IF(OR(A1="1");"1";IF(OR(B1="1");"2";IF(OR(C1="1");"3";IF(OR(D1="1");"4";IF(OR(E1="1");"5";IF(OR(F1="1");"6";IF(OR(G1="1");"7";IF(OR(H1="1");"8";""))))))))
Here is a view on the table col. 1 to 8 is the data and col.9 is what I would like my command to return:
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
0 0 0 0 1 0 0 0 5
0 1 0 0 0 0 0 0 2
1 0 0 0 0 0 0 0 1
0 0 0 0 0 0 1 0 7
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
1 0 0 0 0 0 0 0 1
0 0 0 0 1 0 0 0 5
I have used these other stackoverflow questions as inspriration for the structure.
But it does not work, I don't get an error message, but I also don't get the right output.
Anyone who can see where the problem arises? - Would be much appreciated :)
Best wishes,
Mathilde
Use the MATCH() Function:
=MATCH(1,A1:H1,0)
It appears you use ; instead of , for the delimiter. If so use this.
=MATCH(1;A1:H1;0)

Resources