Fetching data from Excel in MATLAB - excel

I am trying to read a column with more than 17500 rows from Excel. The problem is that when I read it in MATLAB, I do not get the whole matrix with all the data; the data starts somewhere in the middle.
The real problem is that I have to add up 4 numbers in the column, take their average, save it in another column, then proceed to the next consecutive set of 4 numbers and repeat until the end. How can I do that in MATLAB? Please help me solve this problem, as I am just a rookie. Thank you.
What I have done so far is this:
clc
g=xlsread('Data.xlsx',1,'E1:E17500');
x=1;
for i = 1:(17500/4) %as steps has to be stepped at 4 since we need avg of 4
y{i}=((g{x}+g{x+1}+g{x+2}+g{x+3})/4);
x=x+4;
end
xlswrite('Data.xlsx', y, 1, 'F1:F4375');

I see several things here. xlsread with one output gives you a numeric matrix of doubles (not a cell array), so you should address entries with () and not with {}. The for-loop can be omitted if we use reshape to create a matrix with dimensions 4x4375. Then we calculate the average of the 4 values in each column directly with mean (evaluated over the first dimension). To get a column vector again, we transpose the result of mean using '.
Here is the code:
g = xlsread('Data.xlsx',1,'E1:E17500');
y = mean(reshape(g,4,[]),1)';
xlswrite('Data.xlsx',y,1,'F1:F4375');
To see in detail what happens within the code, let's see the results of each step using random data for g:
Code:
rng(4);
g = randi(10,12,1)
a = reshape(g,4,[])
b = mean(a,1)
y = b'
Result:
g =
10
6
10
8
7
3
10
1
3
5
8
2
a =
10 7 3
6 3 5
10 10 8
8 1 2
b =
8.5000 5.2500 4.5000
y =
8.5000
5.2500
4.5000
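As a language-neutral cross-check of the reshape/mean logic, here is a minimal sketch in plain Python (not MATLAB) that averages consecutive, non-overlapping blocks of 4 values. The helper name block_means is hypothetical, and the input is the same sample vector g used above.

```python
def block_means(values, block_size=4):
    """Average consecutive, non-overlapping blocks of `block_size` values."""
    if len(values) % block_size != 0:
        # Same restriction as reshape(g, 4, []): the length must divide evenly.
        raise ValueError("length must be a multiple of block_size")
    return [
        sum(values[start:start + block_size]) / block_size
        for start in range(0, len(values), block_size)
    ]


# Same sample data as the MATLAB walkthrough.
g = [10, 6, 10, 8, 7, 3, 10, 1, 3, 5, 8, 2]
y = block_means(g, 4)
print(y)  # [8.5, 5.25, 4.5]
```

The result matches the y vector computed step by step in MATLAB.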

Related

Excel Formula for finding Y maximum in a given X range

I have an XY table in excel. I would like to find the maximum Y value in a given X range. Example data given below. What equation can I use, in an unrelated cell, to output the max Y value in between the X range 2:6
X Y
1 4
2 7
3 0
4 8
5 4
6 3
Using MAXIFS:
=MAXIFS(B:B,A:A,">=2",A:A,"<=6")
As noted by @ScottCraner, if your version of Excel does not support MAXIFS, see this thread for alternatives.
As I understand your question, you want the largest number in column Y that also lies between the smallest and largest values in column X:
=MAXIFS(B2:B7;B2:B7;">="&MIN(A2:A7);B2:B7;"<="&MAX(A2:A7))
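For readers who want to check what the first MAXIFS formula computes, here is a small plain-Python sketch of the same filter-then-maximum logic on the example data. The function name max_y_in_x_range is hypothetical, and returning None for an empty match is a choice of this sketch, not a description of Excel's behavior.

```python
def max_y_in_x_range(xs, ys, x_min, x_max):
    """Return the largest y whose x lies in [x_min, x_max], or None if none match."""
    candidates = [y for x, y in zip(xs, ys) if x_min <= x <= x_max]
    return max(candidates) if candidates else None


# Example table from the question.
xs = [1, 2, 3, 4, 5, 6]
ys = [4, 7, 0, 8, 4, 3]

result = max_y_in_x_range(xs, ys, 2, 6)
print(result)  # 8
```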

Use a split function in every row of one column of a data frame

I have a rather big pandas data frame (more than 1 million rows) with columns containing either strings or numbers. Now I would like to split the strings in one column before the expression "is applied".
An example to explain what I mean:
What I have:
a b description
2 4 method A is applied
10 5 titration is applied
3 1 computation is applied
What I am looking for:
a b description
2 4 method A
10 5 titration
3 1 computation
I tried the following,
df.description = df.description.str.split('is applied')[0]
But this didn't bring the desired result.
Any ideas how to do it? :-)
You are close; you need .str[0] to take the first piece of every row:
df.description = df.description.str.split(' is applied').str[0]
Alternative solution:
df.description = df.description.str.extract(r'(.*)\s+is applied', expand=False)
print (df)
a b description
0 2 4 method A
1 10 5 titration
2 3 1 computation
But for better performance use list comprehension:
df.description = [x.split(' is applied')[0] for x in df.description]
You can also use str.replace:
df.description = df.description.str.replace(' is applied','')
df
a b description
0 2 4 method A
1 10 5 titration
2 3 1 computation
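To see why the original attempt with a bare [0] did not work, here is a plain-Python analogy that uses lists instead of a pandas Series. It assumes the default integer index, so that [0] on the split result picks the first row rather than the first piece of every row; the variable names are illustrative only.

```python
descriptions = [
    "method A is applied",
    "titration is applied",
    "computation is applied",
]

# Splitting every row gives one list of pieces per row.
pieces = [text.split(" is applied") for text in descriptions]

# Indexing the outer container with [0] selects the first ROW's pieces ...
first_row_pieces = pieces[0]

# ... whereas the goal is the first PIECE of every row (what .str[0] does).
first_piece_per_row = [parts[0] for parts in pieces]

print(first_row_pieces)     # ['method A', '']
print(first_piece_per_row)  # ['method A', 'titration', 'computation']
```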

How can the AVERAGEIFS function be translated into MATLAB?

I am working at moving my data over from Excel to Matlab. I have some data that I want to average based on multiple criteria. I can accomplish this by looping, but want to do it with matrix operations only, if possible.
Thus far, I have managed to do so with a single criterion, using accumarray as follows:
data=[
1 3
1 3
1 3
2 3
2 6
2 9];
accumarray(data(:,1),data(:,2))./accumarray(data(:,1),1);
Which returns:
3
6
Corresponding to the averages of items 1 and 2, respectively. I have at least three other columns that I need to include in this averaging but don't know how I can add that in. Any help is much appreciated.
For your single column, you don't need to call accumarray twice; you can provide a function handle to mean as the fourth input:
mu = accumarray(data(:,1), data(:,2), [], @mean);
For multiple columns, you can use the row indices as the second input to accumarray and then use those from within the anonymous function to access the rows of your data to operate on.
data = [1 3 5
1 3 10
1 3 8
2 3 7
2 6 9
2 9 12];
tmp = accumarray(data(:,1), 1:size(data, 1), [], @(rows){mean(data(rows,2:end), 1)});
means = cat(1, tmp{:});
% 3.0000 7.6667
% 6.0000 9.3333
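The grouping idea behind the accumarray call can be cross-checked with a short plain-Python sketch: collect the rows that share a key, then average each remaining column. This is not MATLAB, and the helper grouped_column_means is a hypothetical name; the data is the 6x3 example above.

```python
from collections import defaultdict


def grouped_column_means(rows):
    """Group rows by their first entry and average every remaining column."""
    groups = defaultdict(list)
    for key, *values in rows:
        groups[key].append(values)
    return {
        key: [sum(column) / len(column) for column in zip(*members)]
        for key, members in sorted(groups.items())
    }


data = [
    [1, 3, 5],
    [1, 3, 10],
    [1, 3, 8],
    [2, 3, 7],
    [2, 6, 9],
    [2, 9, 12],
]

means = grouped_column_means(data)
for key, row in means.items():
    print(key, [round(value, 4) for value in row])
# 1 [3.0, 7.6667]
# 2 [6.0, 9.3333]
```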

Ignore #N/As in Excel LINEST function with multiple independent variables (known_x's)

I am trying to find the equation of a plane of best fit to a set of x,y,z data using the LINEST function. Some of the z data is missing, meaning that there are #N/As in the z column. For example:
A B C
(x) (y) (z)
1 1 1 5.1
2 2 1 5.4
3 3 1 5.7
4 1 2 #N/A
5 2 2 5.2
6 3 2 5.5
7 1 3 4.7
8 2 3 5
9 3 3 5.3
I would like to do =LINEST(C1:C9,A1:B9), but the #N/A causes this to return a value error.
I found a solution for a single independent variable (one column of known_x's, i.e. fitting a line to x,y data), but I have not been able to extend it for two independent variables (two known_x's columns, i.e. fitting a plane to x,y,z data). The solution I found is here: http://www.excelforum.com/excel-general/647448-linest-question.html, and the formula (slightly modified for my application) is:
=LINEST(
N(OFFSET(C1:C9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
N(OFFSET(A1:A9,SMALL(IF(ISNUMBER(C1:C9),ROW(C1:C9)-ROW(C1)),
ROW(INDIRECT("1:"&COUNT(C1:C9)))),0,1)),
)
which is equivalent to =LINEST(C1:C9,A1:A9), ignoring the row containing the #N/A.
The formula from the posted link could probably be adapted but it is unwieldy. Least squares with missing data can be viewed as a regression with weight 1 for numeric values and weight 0 for non-numeric values. Based on this observation you could try this (with Ctrl+Shift+Enter in a 1x3 range):
=LINEST(IF(ISNUMBER(C1:C9),C1:C9,),IF(ISNUMBER(C1:C9),CHOOSE({1,2,3},1,A1:A9,B1:B9),),)
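LINEST returns the coefficients in reverse column order, here -0.2 (for y in column B), 0.3 (for x in column A) and 5 (the constant), so the equation of the plane is z=0.3x-0.2y+5. This can be checked against the results of using LINEST(C1:C8,A1:B8) with the error row removed.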
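As an independent check of the fit, here is a minimal plain-Python sketch that drops the row with the missing z value and solves the least-squares normal equations for z = a*x + b*y + c. It is not what LINEST does internally; the function name fit_plane and the use of None for the #N/A cell are assumptions of this sketch.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c, skipping rows whose z is None."""
    rows = [(x, y, 1.0, z) for x, y, z in points if z is not None]
    n = 3
    # Augmented normal equations [A^T A | A^T z] for the unknowns (a, b, c).
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * r[3] for r in rows)]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda idx: abs(m[idx][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for other in range(n):
            if other != col:
                factor = m[other][col] / m[col][col]
                m[other] = [v - factor * p for v, p in zip(m[other], m[col])]
    return tuple(m[i][n] / m[i][i] for i in range(n))


# (x, y, z) rows from the question; None stands in for the #N/A cell.
points = [
    (1, 1, 5.1), (2, 1, 5.4), (3, 1, 5.7),
    (1, 2, None), (2, 2, 5.2), (3, 2, 5.5),
    (1, 3, 4.7), (2, 3, 5.0), (3, 3, 5.3),
]

a, b, c = fit_plane(points)
print(round(a, 6), round(b, 6), round(c, 6))  # 0.3 -0.2 5.0
```

The sample data lies exactly on a plane, so the fitted coefficients reproduce every non-missing z value.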

How do I get rid of NaNs in MATLAB?

I have files with many empty cells, which appear as NaNs when I use cell2mat. The problem is that when I need the average values, I cannot work with the data because the NaNs cause an error. Excel overlooks such values, so how do I do the same in MATLAB?
In addition, I am writing a file using xlswrite:
xlswrite('test.xls',M);
I have data in all rows except 1. How do I write:
M(1,:) = ('time', 'count', 'length', 'width')
In other words, I want M(1,1)='time', M(1,2)='count', and so on. I have data from M(2,1) to M(10,20). How can I do this?
As AP correctly points out, you can use the function isfinite to find and keep only finite values in your matrix. You can also use the function isnan. However, removing values from your matrix can have the unintended consequence of reshaping your matrix into a row or column vector:
>> mat = [1 2 3; 4 NaN 6; 7 8 9] % A sample 3-by-3 matrix
mat =
1 2 3
4 NaN 6
7 8 9
>> mat = mat(~isnan(mat)) % Removing the NaN gives you an 8-by-1 vector
mat =
1
4
7
2
8
3
6
9
Another alternative is to use some functions from the Statistics Toolbox (if you have access to it) that are designed to deal with matrices containing NaN values. Since you mention taking averages, you may want to check out nanmean:
>> mat = [1 2 3; 4 NaN 6; 7 8 9];
>> nanmean(mat)
ans =
4 5 6 % The column means computed by ignoring NaN values
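The NaN-ignoring column mean can also be sketched without any toolbox. The following plain-Python version (not MATLAB) mirrors the idea on the same 3-by-3 sample; nan_column_means is a hypothetical helper, and returning NaN for an all-NaN column is a choice of this sketch.

```python
import math


def nan_column_means(matrix):
    """Column means that skip NaN entries; an all-NaN column yields NaN."""
    means = []
    for column in zip(*matrix):
        kept = [value for value in column if not math.isnan(value)]
        means.append(sum(kept) / len(kept) if kept else math.nan)
    return means


mat = [
    [1, 2, 3],
    [4, math.nan, 6],
    [7, 8, 9],
]

col_means = nan_column_means(mat)
print(col_means)  # [4.0, 5.0, 6.0]
```

Unlike removing NaN entries from the matrix itself, this keeps the column structure intact, which is the point made about reshaping above.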
EDIT: To answer your additional question on the use of xlswrite, this sample code should illustrate one way you can write your data:
C = {'time','count','length','width'}; % A cell array of strings
M = rand(10,20); % A 10-by-20 array of random values
xlswrite('test.xls',C); % Writes C to cells A1 through D1
xlswrite('test.xls',M,'A2:T11'); % Writes M to cells A2 through T11
Use the isfinite function to get rid of all NaNs and infinities:
A=A(isfinite(A))
To write column headers above the data, write both to the same file:
% create the cell array containing the column headers
columnHeader = {'Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 5',' '};
% write the column headers first
xlswrite('myFile1.xls', columnHeader);
% write the data directly underneath the column headers
xlswrite('myFile1.xls',M,'Sheet1','A2');
Statistics Toolbox has several statistical functions to deal with NaN values. See nanmean, nanmedian, nanstd, nanmin, nanmax, etc.
You can set NaNs to an arbitrary number like so:
mat(isnan(mat))=7 % my lucky number of choice.
May be too late, but...
x = [1 2 3; 4 inf 6; 7 -inf NaN];
x(find(x == inf)) = 0; % for inf
x(find(x == -inf)) = 0; % for -inf
x(find(isnan(x))) = 0; % for NaN
