How to calculate correlation coefficient with R? [duplicate] - excel

Possible Duplicate:
correlation matrix in r
I have an Excel sheet with 700 columns and 25 rows; a sample of my file is shown below. I would like to calculate the correlation coefficient between A1 and A700, A2 and A700, A3 and A700, A4 and A700, and so on. What is the easiest way to do this with R?
A1 A2 A3 A4 ---- --- A700
A 2.7 5 4 34 34
B 5.67 7.8 6 45 25.6
C 2.3 -9 12.5 13 2.8
D 5.6 6 -56 2.5 -66.7
E 7.8 5 20 6.7 -56.8
--
--
--
--

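This might do the trick (read.xls comes from the gdata package):
library(gdata)
dat <- read.xls(file)   # file: path to the workbook; assumes the sheet holds only the 700 numeric columns
x <- dat[1:699]         # columns A1 to A699
y <- dat[700]           # column A700
cor(x, y)               # 699 x 1 matrix: correlation of each column with A700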
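For comparison, here is a minimal Python sketch of the same "every column against the last column" computation, written out with the Pearson formula so the arithmetic is visible. It assumes the columns are already loaded as lists of numbers and uses only the five visible rows and the columns A1 to A4 and A700 of the sample (the elided columns are left out); `pearson` and `corr_with_last` are hypothetical helper names.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def corr_with_last(columns):
    """Correlate every column with the last one (A1..A699 against A700)."""
    *others, target = columns
    return [pearson(col, target) for col in others]


# Visible part of the sample sheet (hypothetical subset; A5..A699 omitted)
cols = {
    "A1": [2.7, 5.67, 2.3, 5.6, 7.8],
    "A2": [5, 7.8, -9, 6, 5],
    "A3": [4, 6, 12.5, -56, 20],
    "A4": [34, 45, 13, 2.5, 6.7],
    "A700": [34, 25.6, 2.8, -66.7, -56.8],
}
names = list(cols)
r = corr_with_last(list(cols.values()))
for name, value in zip(names[:-1], r):
    print(f"{name} vs A700: {value:.3f}")
```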

Have a look at ?cor.
For example:
m <- matrix(rnorm(100), nrow=10)
cor(m)
This gives you the correlations between every pair of columns (here a 10 x 10 matrix).
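What cor(m) computes can be sketched in Python by standardizing each column (subtract the mean, divide by the sample standard deviation) and taking scaled dot products of the standardized columns; the result is symmetric with ones on the diagonal. This is an illustrative sketch with a small made-up matrix, and `standardize` and `cor_matrix` are hypothetical names, not R functions.

```python
from statistics import mean, stdev


def standardize(col):
    """z-scores of a column using the sample standard deviation."""
    m, s = mean(col), stdev(col)
    return [(v - m) / s for v in col]


def cor_matrix(columns):
    """Pairwise Pearson correlations of the given columns, like cor(m) in R."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    # r_ij = sum(z_i * z_j) / (n - 1)
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


cols = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0], [4.0, 3.0, 2.0, 1.0]]
m = cor_matrix(cols)
for row in m:
    print(["%.3f" % v for v in row])
```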

Related

Python Pandas: How to insert a new column which is a sum of next 'n' (can be a fraction also) values of another column?

I have a DataFrame, say named 'test', storing the following data:
Week Stock(In Number of Weeks) Demand (In Units)
0 W01 2.4 37
1 W02 3.6 33
2 W03 2.0 46
3 W04 5.8 45
4 W05 4.6 56
5 W06 3.0 38
6 W07 5.0 45
7 W08 7.5 54
8 W09 4.3 35
9 W10 2.2 38
10 W11 2.0 50
11 W12 6.0 37
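I want to insert a new column in this DataFrame which, for every row, is the sum of the next "Stock(In Number of Weeks)" rows of the column "Demand (In Units)", starting at that row, where a fractional number of weeks takes that fraction of the last row.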
That is, for this DataFrame:
for row 0, the new column should be the sum of 2.4 rows of "Demand (In Units)", i.e. 37 + 33 + 0.4*46
for row 1, it should be 33 + 46 + 45 + 0.6*56
for row 2, it should be 46 + 45
...
for row 7, it should be 54 + 35 + 38 + 50 + 37 (fewer than 7.5 rows remain, so all the remaining rows are summed)
...
and so on.
Effectively, I want my dataframe to have a new column as follows:
Week Stock(In Number of Weeks) Demand (In Units) Stock (In Units)
0 W01 2.4 37 88.4
1 W02 3.6 33 157.6
2 W03 2.0 46 91.0
3 W04 5.8 45 266.0
4 W05 4.6 56 214.0
5 W06 3.0 38 137.0
6 W07 5.0 45 222.0
7 W08 7.5 54 214.0
8 W09 4.3 35 160.0
9 W10 2.2 38 95.4
10 W11 2.0 50 87.0
11 W12 6.0 37 37.0
Can somebody suggest a way to achieve this?
I can do it by iterating over each row, but that would be very slow for the millions of rows I want to process at a time.
The code I am using right now is:
import numpy as np

for i in range(len(test)):
    if int(np.floor(test.loc[i, 'Stock(In Number of Weeks)'])) >= len(test[i:]):
        # fewer rows remain than the stock covers: sum all remaining rows
        number_of_full_rows = len(test[i:])
        fraction_of_last_row = 0
        y = 0
    else:
        number_of_full_rows = int(np.floor(test.loc[i, 'Stock(In Number of Weeks)']))
        fraction_of_last_row = test.loc[i, 'Stock(In Number of Weeks)'] - number_of_full_rows
        # partial contribution of the row just after the whole rows
        y = test.loc[i+number_of_full_rows, 'Demand (In Units)'] * fraction_of_last_row
    x = np.sum(test[i:i+number_of_full_rows]['Demand (In Units)'])
    test.loc[i, 'Stock (In Units)'] = x+y
I also tried this approach on some test data:
import numpy as np

def func(r, col):
    n = int(r['Stock(In Number of Weeks)'])        # number of whole weeks
    f = float(r['Stock(In Number of Weeks)'] - n)  # fractional part
    i = r.name                                      # row index value (assumes a default 0..N-1 index)
    z = np.zeros(len(col))                          # weight vector, initially all zeros
    v = np.hstack((np.ones(n), np.array(f)))        # vector of ones followed by the fraction
    e = min(len(v), len(z[i:]))                     # truncate at the end of the frame
    z[i:i+e] = v[:e]                                # place the weights starting at row i
    r['Stock (In Units)'] = np.dot(col, z)          # scalar product of demand and weights
    return r

df = df.apply(lambda r: func(r, df['Demand (In Units)'].values), axis=1)
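One way to avoid the per-row loop (an alternative I am suggesting, not something from the question) is prefix sums: with prefix[k] = demand[0] + ... + demand[k-1], the whole rows starting at row i sum to prefix[min(i + n, N)] - prefix[i], and the partial row adds frac * demand[i + n] only when that row exists. The NumPy sketch below assumes a positional 0..N-1 row order; `stock_in_units` is a hypothetical name.

```python
import numpy as np


def stock_in_units(weeks, demand):
    """For each row i, sum demand over the next weeks[i] rows starting at i,
    weighting the last partial row by the fractional part of weeks[i]."""
    weeks = np.asarray(weeks, dtype=float)
    demand = np.asarray(demand, dtype=float)
    n_rows = len(demand)
    idx = np.arange(n_rows)

    # prefix[k] = demand[0] + ... + demand[k-1], with prefix[0] = 0
    prefix = np.concatenate(([0.0], np.cumsum(demand)))

    full = np.floor(weeks).astype(int)      # number of whole rows
    frac = weeks - full                      # weight of the partial row
    end = np.minimum(idx + full, n_rows)     # clip at the end of the frame
    whole_part = prefix[end] - prefix[idx]

    # the partial row only counts if it is still inside the frame;
    # 0 is a dummy index so the fancy indexing stays valid
    has_partial = (idx + full) < n_rows
    partial_row = np.where(has_partial, idx + full, 0)
    partial = np.where(has_partial, frac * demand[partial_row], 0.0)
    return whole_part + partial


weeks = [2.4, 3.6, 2.0, 5.8, 4.6, 3.0, 5.0, 7.5, 4.3, 2.2, 2.0, 6.0]
demand = [37, 33, 46, 45, 56, 38, 45, 54, 35, 38, 50, 37]
print(stock_in_units(weeks, demand))
```

With the DataFrame from the question this would be used as test['Stock (In Units)'] = stock_in_units(test['Stock(In Number of Weeks)'], test['Demand (In Units)']). The work is O(N) instead of O(N x weeks); over millions of rows the cumulative sum can accumulate some floating-point rounding error.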

How to find exponential formula coefficients?

I have the following pairs of values:
X Y
1 2736
2 3124
3 3560
4 4047
5 4594
6 5205
7 5890
8 6658
9 7518
10 8480
18 21741
32 108180
35 152237
36 170566
37 191068
38 214087
39 239838
40 268679
When I put these pairs into Excel, I get an exponential formula:
Y = 2559*e^(0.1167*X)
with an accuracy of 99.98%.
Is there a way to ask Excel for a formula in the following format:
Y = (A/B)*C^X-D
If not, is it possible to convert the above formula into the one I want?
Note that I am not familiar with Matlab.
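As far as I know, Excel's exponential trendline is an ordinary least-squares fit of ln(Y) against X, so Y = a*e^(k*X) comes from a straight-line fit on the log scale. The Python sketch below fits that model to the pairs above; `fit_exponential` is a hypothetical helper, and I have not checked that it reproduces Excel's 2559 and 0.1167 to every digit.

```python
import math

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 18, 32, 35, 36, 37, 38, 39, 40]
ys = [2736, 3124, 3560, 4047, 4594, 5205, 5890, 6658, 7518, 8480,
      21741, 108180, 152237, 170566, 191068, 214087, 239838, 268679]


def fit_exponential(xs, ys):
    """Least-squares fit of ln(y) = ln(a) + k*x, i.e. y = a*e^(k*x)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mx = sum(xs) / n
    ml = sum(logs) / n
    k = (sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(ml - k * mx)  # intercept back-transformed from the log scale
    return a, k


a, k = fit_exponential(xs, ys)
print(f"Y = {a:.0f}*e^({k:.4f}*X)")
```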
You already have it!
A = 2559
B = 1
C = exp(0.1167) ≈ 1.1238
D = 0
You'll see that it is equivalent to your formula Y = 2559*e^(0.1167*X), because e^(0.1167*X) = (e^0.1167)^X.
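A quick numerical check of that identity in Python: `to_abcd` is a hypothetical helper that maps a and k of y = a*e^(k*x) to A, B, C, D. B = 1 and D = 0 are just one valid choice; any B works as long as A is scaled by the same factor, since only A/B matters.

```python
import math


def to_abcd(a, k):
    """Rewrite y = a*e^(k*x) as y = (A/B)*C^x - D."""
    return a, 1.0, math.exp(k), 0.0


A, B, C, D = to_abcd(2559, 0.1167)
for x in (1, 10, 40):
    original = 2559 * math.exp(0.1167 * x)
    converted = (A / B) * C ** x - D
    print(x, original, converted)
```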

How to write a maximize or minimize function in J

For example, I want to maximize the expected-return function
E[r] = w1r1 + w2r2 and solve for the optimal weights w1 and w2.
The only constraint you have really given is w1 + w2 = 1.
w1 =. 0.25
(, -.) w1
0.25 0.75
That takes care of both w1 and w2, given the value of w1.
(r1,r2) +/@:* (w1,w2) calculates r1w1 + r2w2:
r1 =. 5
r2 =. 10
(r1,r2) (+/@:* (,-.)) w1
8.75
(r1,r2) (+/@:* (,-.)) 0.9
5.5
(r1,r2) (+/@:* (,-.)) 0.01
9.95
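The same two-weight calculation in Python, as a sketch: `expected_return` is a hypothetical helper, and restricting w1 to the range 0 to 1 (no short positions) is my assumption, not a stated constraint. Because E[r] is linear in w1, the maximum on that range sits at an endpoint, which is why only r1 and r2 themselves decide the answer.

```python
def expected_return(r1, r2, w1):
    """E[r] = w1*r1 + w2*r2 with w2 = 1 - w1 (the stated constraint)."""
    return w1 * r1 + (1 - w1) * r2


r1, r2 = 5, 10
for w1 in (0.25, 0.9, 0.01):
    print(w1, expected_return(r1, r2, w1))

# E[r] is linear in w1, so on 0 <= w1 <= 1 the maximum is at an endpoint
best_w1 = max((0.0, 1.0), key=lambda w: expected_return(r1, r2, w))
print("best w1:", best_w1)
```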
If you really want to maximize, you would need equations for the values of r1 and r2 and take those into account as well, but perhaps I have misunderstood your question?
Responding to the comment below: if the constraint w1 + w2 = 1 still holds, the problem reduces to summing the values in r1 and r2; whichever sum is bigger gets a weight of 1 and the other gets a weight of 0.
r1=.2 4 6 3 2
r2=.2.1 4 6 3 2
r3=.2 4 6 3 2.3
r1 (,-.)@:>/@:(+/@:,.) r2
0 1
r2 (,-.)@:>/@:(+/@:,.) r1
1 0
r3 (,-.)@:>/@:(+/@:,.) r2
1 0
'w1 w2'=.r3 (,-.)@:>/@:(+/@:,.) r2
w1
1
w2
0
'w1 w2'=.r1 (,-.)@:>/@:(+/@:,.) r2
w1
0
w2
1
(r1,.r2) +/@:,@:(+ . *) (0 1) NB. w1=.0 w2=.1
17.1
(r1,.r2) +/@:,@:(+ . *) (1 0) NB. w1=.1 w2=.0
17
(r1,.r2) +/@:,@:(+ . *) (0.5 0.5) NB. w1=.0.5 w2=.0.5
17.05
Based on the follow-up comment below, I would approach it in one of two ways. I could dig up all my linear programming texts from the 1980s and work out the definitive mathematical solution (including degenerate cases and local maxima/minima), or I could use the same technique as above for a case larger than n=2. I'm going with the second option.
Let's first look at the r matrix, which will be a set of constants. For this example I am taking a random 5 x 10 matrix with values from 1 to 10.
r=. >: ? 5 10 $ 10
r
4 4 8 1 4 3 6 9 6 2
2 6 5 4 4 7 5 10 4 6
2 4 9 10 1 1 9 8 2 7
5 6 5 4 7 9 2 6 10 6
10 3 6 2 10 2 7 10 4 2
The trick I am going to use is to find the column with the highest average and multiply it by the largest value of w. Column averages are easy to get in J with (+/ % #):
(+/ % #) r
4.6 4.6 6.6 4.2 5.2 4.4 5.8 8.6 5.2 4.6
Then grade the list down to get the ranking used to reorder the columns of the original r matrix. The leading 7 means that column 7 (7 {"1 r) has the largest average, and so on.
\:@:(+/ % #) r
7 2 6 4 8 0 1 9 5 3
I use this in turn to reorder the columns of r with {"1, since I am working on columns. The result is that the column with the largest average is on the left and the one with the smallest average is on the right.
(\:@:(+/ % #) {"1 ]) r
9 8 6 4 6 4 4 2 3 1
10 5 5 4 4 2 6 6 7 4
8 9 9 1 2 2 4 7 1 10
6 5 2 7 10 5 6 6 9 4
10 6 7 10 4 10 3 2 2 2
Once I have that, the next step is to build the w vector. Since all the largest averages are now on the left, I make the leftmost values of w as large as possible within the noted constraints.
w=. 0.2 0.2 0.2 0.2 0.15 0.01 0.01 0.01 0.01 0.01
#w NB. w1 through w10
10
+/w NB. sum of the values in w
1
>./w NB. largest value in w
0.2
<./w NB. smallest value in w
0.01
Because the columns of r have been reordered, + . * multiplies each row of r element-wise by w, giving w1r1, w2r2, w3r3 ... w10r10 for every row, with the largest weights on the columns with the largest averages:
(({"1~ \:@:(+/ % #)) r) + . * w
1.8 1.6 1.2 0.8 0.9 0.04 0.04 0.02 0.03 0.01
2 1 1 0.8 0.6 0.02 0.06 0.06 0.07 0.04
1.6 1.8 1.8 0.2 0.3 0.02 0.04 0.07 0.01 0.1
1.2 1 0.4 1.4 1.5 0.05 0.06 0.06 0.09 0.04
2 1.2 1.4 2 0.6 0.1 0.03 0.02 0.02 0.02
To get the total weighted value, ravel all the values and then sum them:
+/ , (({"1~ \:@:(+/ % #)) r) + . * w
31.22
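The column-ranking idea can be written as a greedy allocation: start every weight at its lower bound, then pour the remaining budget into the highest-average columns first, each up to the cap. Because the objective is linear in w and the constraints are per-weight bounds plus sum(w) = 1, this greedy fill should be optimal (the same argument as the fractional knapsack). The bounds 0.01 and 0.2 are my reading of the w vector above, not constraints stated in the source, and `greedy_weights` is a hypothetical name.

```python
def greedy_weights(scores, lo, hi):
    """Maximise sum(w[j] * scores[j]) with lo <= w[j] <= hi and sum(w) == 1."""
    n = len(scores)
    if not (n * lo <= 1 <= n * hi):
        raise ValueError("no feasible weights for these bounds")
    w = [lo] * n
    remaining = 1 - n * lo
    # sorted() is stable, so tied columns keep their original order (like J's \:)
    for j in sorted(range(n), key=lambda j: -scores[j]):
        if remaining <= 1e-12:
            break
        extra = min(hi - lo, remaining)
        w[j] += extra
        remaining -= extra
    return w


# The 5 x 10 r matrix printed in the answer
r = [
    [4, 4, 8, 1, 4, 3, 6, 9, 6, 2],
    [2, 6, 5, 4, 4, 7, 5, 10, 4, 6],
    [2, 4, 9, 10, 1, 1, 9, 8, 2, 7],
    [5, 6, 5, 4, 7, 9, 2, 6, 10, 6],
    [10, 3, 6, 2, 10, 2, 7, 10, 4, 2],
]
col_means = [sum(col) / len(col) for col in zip(*r)]
w = greedy_weights(col_means, lo=0.01, hi=0.2)
# total of w[j] * r[i][j] over all entries, i.e. the J result +/ , ...
total = sum(w[j] * row[j] for row in r for j in range(len(w)))
print([round(v, 2) for v in w])
print(round(total, 2))
```

The weights come back in the original column order, so no explicit reordering of r is needed; the total should match the 31.22 from the J session.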

Random effects, glmer, nested design

I have the following question:
I am analyzing the number of seed capsules among different genotypes (A, B and C).
I have 4 replicates for each genotype, and in each replicate I have 8 plants. Here is an example of the data:
Genotype Replicate_ID Plant_ID Seed_capsules
A 1 1 6
A 1 2 10
A 1 3 15
B 2 1 100
B 2 2 40
B 2 3 63
C 3 1 80
C 3 2 90
C 3 3 100
I fitted a glmer model to the data, but I am not sure whether my random effect should be replicate_ID, plant_ID, or both. Here is what I have tried so far:
library(lme4)
freplicate_ID <- factor(newdata$replicate_ID)
fplant_ID <- factor(newdata$plant_ID)
m5 <- glmer(seed_capsules ~ pop_ID + (1 | freplicate_ID/fplant_ID), family = poisson, data = newdata)
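In lme4, (1 | freplicate_ID/fplant_ID) is shorthand for (1 | freplicate_ID) + (1 | freplicate_ID:fplant_ID), so each plant is identified by its (replicate, plant) pair. That matters here because Plant_ID restarts at 1 in every replicate: a bare plant term would treat "plant 1" in different replicates as the same plant. The Python sketch below only counts the groups in the nine sample rows to make this concrete; it does not fit a model.

```python
# Sample rows from the question: (Genotype, Replicate_ID, Plant_ID)
rows = [("A", 1, 1), ("A", 1, 2), ("A", 1, 3),
        ("B", 2, 1), ("B", 2, 2), ("B", 2, 3),
        ("C", 3, 1), ("C", 3, 2), ("C", 3, 3)]

replicates = {rep for _, rep, _ in rows}
# plant labels on their own are reused across replicates
plant_labels = {plant for _, _, plant in rows}
# nesting (replicate:plant) gives every physical plant its own group
nested_plants = {(rep, plant) for _, rep, plant in rows}

print(len(replicates), len(plant_labels), len(nested_plants))
```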
Further, how do I obtain diagnostic plots for such a model? How do I compare the genotypes? Do I need to run lsmeans on the model, like this for example:
lsmeans(m5, pairwise ~ pop_ID, data = newdata)
Thank you in advance for your help.
Best,
Anna

Excel SUMIF based on array using text string

Is there a way to use the address of a cell containing a text string as the array criteria in the following formula?
=SUM(SUMIF(A5:A10,{1,22,3},E5:E10))
So if, instead of {1,22,3}, I enter "1, 22, 3" in cell A2, the formula becomes
=SUM(SUMIF(A5:A10,A2,E5:E10))
I have tried this, but I get 0 as the result (see C16).
A B C D E F G H
1 Tree
2 {1,22,3} 1
3 22
4 Tree Profit 3
5 1 105
6 2 96
7 1 105
8 1 75
9 2 76.8
10 1 45
11
12 330 =SUM(SUMIF(A5:A10,{1,22,3},B5:B10))
13
14 330 =SUMPRODUCT(SUMIF(A5:A10,E2:E3,B5:B10))
15
16 0 =SUM(SUMIF(A5:A10,A2,B5:B10))
17 NB: Custom Format "{"#"}" on Cell A2 I enter 1,22,3 so it displays {1,22,3}
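Roughly, SUM(SUMIF(range, {1,22,3}, sum_range)) runs one SUMIF per array element and then adds the partial sums, whereas a cell holding the text 1,22,3 is a single criterion that no tree number equals, which would explain the 0. The Python sketch below emulates only plain equality criteria (no wildcards or comparison operators) on the first table's data; `sumif` and `sum_sumif` are hypothetical helpers, not Excel functions.

```python
def sumif(criteria_range, criterion, sum_range):
    """Emulate Excel SUMIF for a plain equality criterion."""
    return sum(s for c, s in zip(criteria_range, sum_range) if c == criterion)


def sum_sumif(criteria_range, criteria, sum_range):
    """SUM(SUMIF(range, {c1, c2, ...}, sum_range)): one SUMIF per criterion, then add."""
    return sum(sumif(criteria_range, c, sum_range) for c in criteria)


trees = [1, 2, 1, 1, 2, 1]             # A5:A10
profit = [105, 96, 105, 75, 76.8, 45]  # B5:B10
print(sum_sumif(trees, [1, 22, 3], profit))
print(sumif(trees, "1,22,3", profit))  # the whole string is one criterion
```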
OK, so after some further searching (see Excel string to criteria) and some trial and error, I have come up with the following solution.
Using Name Manager, I created a defined name called GetList which refers to:
=EVALUATE(Sheet1!$A$3) NB: Cell A3 has this formula in it =TEXT(A2,"{#}")
I then used the following formula:
=SUMPRODUCT(SUMIF($A$5:$A$12,GetList,$B$5:$B$12))
which gives the desired result of 321, the same as the other two formulas (see D12 below).
If anyone can suggest a better solution then feel free to do so.
Thanks to Dennis for his reply to my original post regarding the table.
A B C D E
1 Tree
2 1,22,3 1
3 {1,22,3} =TEXT(A2,"{#}") 22
4 Tree Profit 3
5 11 105
6 22 96
7 1 105
8 3 75
9 2 76.8
10 1 45
11
12 321 =SUMPRODUCT(SUMIF($A$5:$A$12,GetList,$B$5:$B$12))
13
14 321 =SUM(SUMIF(A5:A10,{1,22,3},B5:B10))
15
16 321 =SUMPRODUCT(SUMIF(A5:A10,E2:E3,B5:B10))
17
18 0 =SUM(SUMIF(A5:A10,A2,B5:B10))
19 NB: Custom Format "{"#"}" on Cell A2 I enter 1,22,3 so it displays {1,22,3}
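The GetList name essentially turns the text 1,22,3 into the array {1,22,3} before SUMIF sees it. Outside Excel the same idea is just splitting the string into numbers; this Python sketch (hypothetical helper names, plain equality criteria only) applies it to the second table and reproduces the 321.

```python
def parse_list(text):
    """Turn text like '1,22,3' (or '1, 22, 3') into [1, 22, 3]."""
    return [int(part.strip()) for part in text.split(",")]


def sum_sumif(criteria_range, criteria, sum_range):
    """One equality SUMIF per criterion, then add the partial sums."""
    return sum(s for c in criteria
               for t, s in zip(criteria_range, sum_range) if t == c)


trees = [11, 22, 1, 3, 2, 1]           # A5:A10 of the second table
profit = [105, 96, 105, 75, 76.8, 45]  # B5:B10
print(sum_sumif(trees, parse_list("1,22,3"), profit))
```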
