I'm very new to MATLAB and still learning the basics. I'm trying to write a script which calculates the distance between two peaks in a waveform. That part I have managed to do, and I have used xlswrite to put the values I obtained into an Excel file.
For each file, I have between about 50 and 250 columns, with just two rows: the second row has the numerical value, and the first has the column headings, copied from the original Excel files I extracted the data from.
Some of the columns have similar, but not identical, headings, e.g. 'green227RightEyereading3' and 'green227RightEyereading4' etc. Is there a way I can group columns with similar headings, e.g. those which have the same number/colour in the heading (i.e. green227) and either 'right eye' or 'left eye', and calculate an average of their numerical values? Link to file here: https://www.dropbox.com/s/ezpyjr3raol31ts/SampleBatchForTesting.xls?dl=0
[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File','C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
[~,name,ext] = fileparts(Excel_file);
sheet = 2;
FullXLSfile = [PathName, Excel_file];
[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
HowManyWide = size(txt_data);
NumberOfTitles = HowManyWide(1,2);
xlRangeA = txt_data;
Chickens = {'Test'};
for f = 1:xlRangeA; %%defined as top line of cells on sheet;
Text = xlRangeA{f};
HyphenLocations = find(Text == '-');
R = HyphenLocations(1,1) -1;
Chick = Text(1:R);
Chick = cellstr(Chick);
B = length(Chick);
TF = strncmp(Chickens,Chick,B);
if any(TF == 1); %do nothing
else
Chickens = {Chickens;Chick};
end
end
Here also is a link to the file that is created when I run my entire script. The values below the headings are the calculated thicknesses of the tissue I'm analysing. https://www.dropbox.com/s/4p6iu9kk75ecyzl/Choroid_Thickness.xls?dl=0
Thanks very much
If the differing characters are located at the very end (or the very beginning) of the heading, you can use the built-in strncmp function and compare only part of the string. See more here. But please provide some code and a part of your Excel file. It would help.
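A minimal sketch of the strncmp idea, assuming headings shaped like the ones quoted in the question. The heading list and the prefix are made up for illustration and are not taken from the uploaded file:

```matlab
% Hypothetical headings in the style described in the question
headings = {'green227RightEyereading3', 'green227RightEyereading4', ...
            'green227LeftEyereading1'};
prefix = 'green227RightEye';   % assumed common prefix for one group

% strncmp compares only the first n characters of each heading
isSameGroup = strncmp(headings, prefix, length(prefix));

% Logical mask: true for the two RightEye headings, false for the LeftEye one
disp(isSameGroup)
```

The resulting logical vector can be used directly to index the row of numerical values that sits under the headings.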
Also, if I am not mistaken, you are saving all the data to Excel and then reading it back in order to sort it. Maybe you should consider saving only the final result to Excel. It will save you some time, especially if you want to run your script many times.
EDIT:
Here is the code I came up with. It is not the best possible solution for sure, but it works with the file you uploaded. I have omitted the unnecessary lines and variables. The code works only if the numbers of each reading have the same amount of digits. They can be 4 digits as long as every entry has 4 digits. Since in each file you have waves of the same color, the only thing that you care about is whether the reading was recorded with the left or the right eye (correct?). Based on that and the code you wrote, the comparison concerns the part of the string that contains the words "Right" or "Left", i.e. the characters between the hyphens.
[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File',...
'C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
sheet = 1;
FullXLSfile = [PathName,Excel_file];
[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
%% data manipulation
NumberOfTitles = length(txt_data);
TextToCompare = txt_data{1};
r1 = 1; % counter for Readings1 vector
r2 = 1; % counter for Readings2 vector
for ff = 1:NumberOfTitles % in your code xlRangeA is a cell vector not a number!
Text = txt_data{ff};
HyphenLocations = find(Text == '-');
Text = Text(HyphenLocations(1,1):HyphenLocations(1,2)); % take only the part that contains the "eye" information
TextToCompare = TextToCompare(HyphenLocations(1,1):HyphenLocations(1,2)); % same here
if strcmp(Text, TextToCompare) % true only when the two "eye" substrings are identical
Readings1(r1) = number_data(ff); % store the numerical value in a vector
r1 = r1 + 1; % increase the counter of this vector
else
Readings2(r2) = number_data(ff); % same here
r2 = r2 + 1;
end
TextToCompare = txt_data{1}; % TextToCompare re-initialized for the next comparison
end
mean_readings1 = mean(Readings1); % Find the mean of the grouped values
mean_readings2 = mean(Readings2);
I am positive that this can be done in a more efficient and elegant way. I don't know exactly what kind of calculations you want to do, so I only included the mean values as an example. Inside the if statement you can also store the txt_data if you need it. Below I have also included a second way, which I find more elegant. Just substitute the %%data manipulation part with the part below if you want to test it:
%% more elegant way
Text_Vector = char(txt_data);
TextToCompare2 = txt_data{1};
HyphenLocations2 = find(TextToCompare2 == '-');
TextToCompare2 = TextToCompare2(HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = Text_Vector(:,HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = cellstr(Text_Vector);
dummy = strcmpi(Text_Vector,TextToCompare2);
Readings1 = number_data(dummy);
Readings2 = number_data(~dummy);
I hope this helps.
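For more than two groups (for example several colours/numbers as well as left and right eye), one possible generalisation is to derive a group key from each heading and average per key with unique and accumarray. This is only a sketch: the heading format with two hyphens and the values are assumptions for illustration, not the contents of the uploaded file.

```matlab
% Hypothetical headings and values (assumed format: name-Eye-reading)
headings = {'green227-RightEye-reading3', 'green227-RightEye-reading4', ...
            'green227-LeftEye-reading1',  'green227-LeftEye-reading2'};
values   = [0.20 0.30 0.50 0.70];

% Group key = everything before the last hyphen, e.g. 'green227-RightEye'
keys = regexprep(headings, '-[^-]*$', '');

% unique returns the sorted group names and, for each heading, its group index
[groupNames, ~, groupIdx] = unique(keys);

% accumarray applies @mean to all values that share a group index
groupMeans = accumarray(groupIdx(:), values(:), [], @mean);

for g = 1:numel(groupNames)
    fprintf('%s: %.3f\n', groupNames{g}, groupMeans(g));
end
```

Because the key is computed from the heading text, no assumption about the number of digits in each reading is needed here.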
The code
ite = 5 ;
cell = 5;
MRJIT = xlsread('5 Node.xlsm',1,'L62: P67');
max_col = 5 ;
for m=1:ite
for n=1:max_col
a = randi(cell)
b = randi(cell)
while (eq(a,b) ||(MRJIT(a,n)==0 && MRJIT(b,n)==0))
if (a~=b)&&(MRJIT(a,n)> 0 || MRJIT(b,n)>0)
break;
end
a = randi(cell)
b = randi(cell)
end
MRJIT([a,n b,n]) = MRJIT([b,n a,n]) %swap value
end
end
Code explanation
There are 5 columns in this table, 5 node.xls.
The point of this code is to swap the values of 2 cells in each column of the table above. The cells are selected by choosing 2 random numbers, a and b, but the swap should only happen if at least one of the selected cell values is not zero. If both cell values are zero, the code needs to select another 2 random numbers until one of the selected cell values is not zero.
The Question
1. Why does the code get stuck in the while loop? When I force-stop the program, it shows that some of the a and b values are neither the same nor equal to zero, but it stays stuck in the while loop.
2. Why does the program only run on column 1 and not the others?
This statement
MRJIT([a,n b,n]) = MRJIT([b,n a,n])
does not swap two values. [a,n b,n] is the same as [a,n,b,n]. That is, you are addressing three values using linear indexing (one of them twice). Alternatives: use sub2ind to compute linear indices to your two values, so you can swap them in one statement like you tried, or use a temporary variable to store one value and swap them by indexing one item at a time. There is no direct way in MATLAB to index two elements in one operation unless the elements are on the same row or column (except using linear indices, of course).
Using the sub2ind alternative, you could write:
a = sub2ind(size(MRJIT),a,n); % sub2ind needs the matrix size as its first argument
b = sub2ind(size(MRJIT),b,n);
MRJIT([a,b]) = MRJIT([b,a]);
Note the difference between MRJIT([a,b]) and MRJIT(a,b).
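To make the difference concrete, here is a small sketch on a stand-in matrix. magic(4) and the chosen rows and column are assumptions for illustration; they are not the contents of MRJIT.

```matlab
M = magic(4);            % stand-in for MRJIT
a = 1; b = 3; n = 2;     % hypothetical rows a, b and column n

linearPick    = M([a, b]);   % linear indexing: 1st and 3rd elements of M (column-major)
subscriptPick = M(a, b);     % subscript indexing: row 1, column 3

% Swap M(a,n) and M(b,n) through their linear indices
ia = sub2ind(size(M), a, n);
ib = sub2ind(size(M), b, n);
M([ia, ib]) = M([ib, ia]);

% Both elements are in the same column, so this form swaps them back again
M([a, b], n) = M([b, a], n);
```

With magic(4), M(1,2) is 2 and M(3,2) is 7 before the first swap, so after it they hold 7 and 2; the last statement restores the original order.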
The other alternative is:
tmp = MRJIT(a,n);
MRJIT(a,n) = MRJIT(b,n);
MRJIT(b,n) = tmp;
--
As an aside, you might be able to improve (speed up) the way you find a and b by (not tested):
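a = randi(cell); % start from a valid index; MRJIT(0,n) would be an indexing error
while (MRJIT(a,n)==0)
a = randi(cell);
end
b = randi(cell); % note: this version requires both cells to be non-zero
while (a==b || MRJIT(b,n)==0)
b = randi(cell);
end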
I am trying to average a specific value in a long string and cannot figure out how to pull a value out of the middle of the string. I would like to pull out the 27 from this string, and the corresponding value from the other strings, and add them up.
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19
2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13
2015-10-4,40,37,34,38,36,33,100,96,92,30.23,30.19,30.14,2,1,0,6,0,,0.13,8,Fog-Rain,27
2015-10-5,46,38,30,38,34,30,100,91,61,30.19,30.08,29.93,10,7,0,6,2,,T,6,Fog-Rain,23
fid = fopen('MonthlyHistory.html');
for i=1:2
str = fgets(fid);
c = strsplit(str,',');
mean=mean+c;
end
fprintf('Average Daily Temperature: %d\n',mean);
Method 1: use readtable
I'm guessing this is pulled from Weather Underground? Take your CSV file and make sure it is saved with a .csv extension. Then what I would do is:
my_data = readtable('MonthlyHistory.csv');
This reads the whole file into the highly convenient table variable type. Then you can do:
average_daily_temp = my_data.MeanTemperatureF; %or whatever it is called in the table
I find tables are a super convenient way to keep track of tabular data. (plus readtable is pretty good).
Method 2: continue your approach...
fid = fopen('mh2.csv');
str = fgets(fid); % May need to read off a few lines to get to the
str = fgets(fid); % numbers
my_data = []; %initialize an empty array
while(true)
str = fgets(fid); % read off a line
if ~ischar(str) % fgets returns -1 (not a char array) at end of file
break; %exit loop
end
ca = strsplit(str,','); % split string into a cell array of strings
my_data(end+1,:) = str2double(ca{3}); % convert the 3rd element to a number and store it
end
fclose(fid);
Now my_data is an array holding the 3rd element of each line.
You can also use textscan. It might let you simplify your code as well; for a single string, it works like this:
S='2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35'
T=textscan(S,'%s','Delimiter',',')
str2double(T{1}{3}) %// the value we want is the 3rd field
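Extending the single-string case to the average the question asks for could look like the sketch below. It uses three of the sample lines from the question held in a cell array rather than read from a file; the variable names are made up for illustration.

```matlab
% Three sample lines from the question, kept in memory for the sketch
rows = { ...
    '2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35', ...
    '2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19', ...
    '2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13'};

temps = zeros(1, numel(rows));
for k = 1:numel(rows)
    T = textscan(rows{k}, '%s', 'Delimiter', ',');  % split one line into fields
    temps(k) = str2double(T{1}{3});                 % 3rd field is the value we want
end

avgTemp = mean(temps);
fprintf('Average Daily Temperature: %.2f\n', avgTemp);
```

The accumulator is named temps and the result avgTemp on purpose: assigning to a variable called mean, as in the question, hides the built-in mean function.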
I have a simple question that I need help with. My code, I believe, is almost complete, but I'm having trouble with a specific line.
I have an assignment question (2 parts) that asks me to find whether a protein (string), has the specified motif (substring) at that particular location (location). This is the first part, and the function and code looks like this:
function output = Motif_Match(motif,protein,location)
%This code will print a '1' if the motif occurs in the protein starting
%at the given location, else it will print a '0'
for k = 1:location %Iterates through specified location
if protein(1, [k, k+1]) == motif; % if the location matches the protein and motif
output = 1;
else
output = 0;
end
end
This part I was able to get correctly, and example of this is as follows:
p = 'MGNAAAAKKGN'
m = 'GN'
Motif_Match(m,p,2)
ans =
1
The second part of the question, which I am stuck on, is to take the motif and protein and return a vector containing the locations at which the motif occurs in the protein. To do this, I am using calls to my previous code and I am not supposed to use any functions that make this easy such as strfind, find, hist, strcmp etc.
My code for this, so far, is:
function output = Motif_Find(motif,protein)
[r,c] = size(protein)
output = zeros(r,c)
for k = 1:c-1
if Motif_Match(motif,protein,k) == 1;
output(k) = protein(k)
else
output = [];
end
end
I believe something is wrong at line 6 of this code. My thinking is that I want the output to give me the locations, and that the code on this line is incorrect, but I can't seem to think of anything else. An example of what should happen is as follows:
p = 'MGNAAAAKKGN';
m = 'GN';
Motif_Find(m,p)
ans =
2 10
So my question is, how can I get my code to give me the locations? I've been stuck on this for quite a while and can't seem to get anywhere with this. Any help will be greatly appreciated!
Thank you all!
you are very close.
output(k) = protein(k)
should be
output(k) = k
This is because we want just the location k of the match. Using protein(k) gives us the character at position k in the protein string.
Also, the very last thing I would do is return only the nonzero elements. The easiest way is to use the find command with the vector/matrix as its only argument,
so after your loop just do this
output = find(output); %returns only non zero elements
edit
I just noticed another problem: output = []; sets output to an empty array. This isn't what you want; I think you meant output(k) = 0;, and this is why you weren't getting the result you expected. But since you already initialized the whole array to zeros, you don't need that branch at all. All together, the code should look like this. I also replaced your size with length, since your proteins are linear sequences, not 2D matrices.
function output = Motif_Find(motif,protein)
protein_len = length(protein)
motif_len = length(motif)
output = zeros(1,protein_len)
%notice here I changed this to motif_length. think of it this way, if the
%length is 4, we don't need to search the last 3,2,or 1 protein groups
for k = 1:protein_len-motif_len + 1
if Motif_Match(motif,protein,k) == 1;
output(k) = k;
%we don't really need these lines, since the array already has zeros
%else
% output(k) = 0;
end
end
%returns only nonzero elements
output = find(output);
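Since the assignment rules out find, one alternative is to append each matching location as it is found, so no filtering step is needed at the end. The sketch below is self-contained: the inner character-by-character comparison is an assumed stand-in for a call to Motif_Match, not the asker's function.

```matlab
protein = 'MGNAAAAKKGN';
motif   = 'GN';
nP = length(protein);
nM = length(motif);

locations = zeros(1, 0);            % empty row vector, grown as matches are found
for k = 1:(nP - nM + 1)
    % Stand-in for Motif_Match(motif, protein, k): compare character by character
    isMatch = true;
    for j = 1:nM
        if protein(k + j - 1) ~= motif(j)
            isMatch = false;
            break;
        end
    end
    if isMatch
        locations(end + 1) = k;     % store the location, not the character
    end
end

disp(locations)
```

For the example strings this collects the positions 2 and 10, which is the expected answer given in the question.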
In order to make this question easier to describe I have provided the following example code, which is similar to the actual data I am working with:
clear all
AirT = {rand(32,1),rand(32,1),rand(32,1),rand(32,1)};
SolRad = {rand(32,1),rand(32,1),rand(32,1),rand(32,1)};
Rain = {rand(32,1),rand(32,1),rand(32,1),rand(32,1)};
Location = {'England','Wales','Scotland','Ireland'};
points = {'old','old','old','new'};
CorrVariables = {'AirT','SolRad','Rain'};
for i = 1:length(Location);
Data = @(location) struct('Location',location,CorrVariables{1},AirT{i},...
CorrVariables{2},SolRad{i},CorrVariables{3},Rain{i});
D(i) = Data(Location{i});
end
FieldName = {D.Location};
R = corrcoef([D.AirT],'rows','pairwise');
R_Value = [Location(nchoosek(1:size(R,1),2)) num2cell(nonzeros(tril(R,-1)))];
q = points(nchoosek(1:size(R,1),2));
%to calculate the combination of these points we need to convert the
%cell into a matrix.
Re = [R_Value q];
From this example I would like to create another cell array in column 6 of Re which is dependent on the strings in columns 4 and 5. So, if columns 4 and 5 in Re are equal, such as 'old' 'old', then column 6 should show 'old'. However, if the cells differ, e.g. 'old' 'new', then I would like the new cell array (i.e. column 6 in Re) to state 'old/new'.
How would this be possible?
From your description I think the clearest approach is to use a combination of string concatenation and regular expressions.
First combine columns 4 and 5 into a new column:
newColumn = strcat(Re(:,4), '/', Re(:,5));
Now look for the repeated pattern and replace with the first token matched:
newColumn = regexprep(newColumn, '(\w+)/\1', '$1');
Combine into existing cell matrix:
Re = [Re, newColumn];
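A small worked example of the same two steps on stand-in columns (the col4 and col5 values are made up). This variant also anchors the pattern with ^ and $, which is an addition to the approach above: it makes sure the whole string is a repeated label, so a pair such as 'bold/old' would be left untouched.

```matlab
% Stand-ins for Re(:,4) and Re(:,5)
col4 = {'old'; 'old'; 'new'};
col5 = {'old'; 'new'; 'new'};

% Step 1: join the two labels with a slash
combined = strcat(col4, '/', col5);      % {'old/old'; 'old/new'; 'new/new'}

% Step 2: collapse 'x/x' to 'x'; \1 refers back to the first captured token
combined = regexprep(combined, '^(\w+)/\1$', '$1');

disp(combined)
```

Rows whose two labels differ do not match the pattern and therefore keep the 'old/new' form.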
I am trying to read the file with the following format which repeats itself (but I have cut out the data even for the first repetition because of it being too long):
1.00 'day' 2011-01-02
'Total Velocity Magnitude RC - Matrix' 'm/day'
0.190189 0.279141 0.452853 0.61355 0.757833 0.884577
0.994502 1.08952 1.17203 1.24442 1.30872 1.36653
1.41897 1.46675 1.51035 1.55003 1.58595 1.61824
Download the actual file with the complete data here
This is my code which I am using to read the data from the above file:
fid = fopen(file_name); % open the file
dotTXT_fileContents = textscan(fid,'%s','Delimiter','\n'); % read it as string ('%s') into one big array, row by row
dotTXT_fileContents = dotTXT_fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing 'Total Velocity Magnitude RC - Matrix' 'm/day'
data_starts = strmatch('''Total Velocity Magnitude RC - Matrix'' ''m/day''',...
dotTXT_fileContents); % data_starts contains the line numbers wherever 'Total Velocity Magnitude RC - Matrix' 'm/day' is found
ndata = length(data_starts); % total no. of data values will be equal to the corresponding no. of '** K' read from the .txt file
%# loop through the file and read the numeric data
for w = 1:ndata-1
%# read lines containing numbers
tmp_str = dotTXT_fileContents(data_starts(w)+1:data_starts(w+1)-3); % stores the content from file dotTXT_fileContents of the rows following the row containing 'Total Velocity Magnitude RC - Matrix' 'm/day' in form of string
%# convert strings to numbers
tmp_str = tmp_str{:}; % store the content of the string which contains data in form of a character
%# assign output
data_matrix_grid_wise(w,:) = str2num(tmp_str); % convert the part of the character containing data into number
end
To give you an idea of pattern of data in my text file, these are some results from the code:
data_starts =
2
1672
3342
5012
6682
8352
10022
ndata =
7
Therefore, my data_matrix_grid_wise should contain 1672-2-2-1(for a new line)=1667 rows. However, I am getting this as the result:
data_matrix_grid_wise =
Columns 1 through 2
0.190189000000000 0.279141000000000
0.423029000000000 0.616590000000000
0.406297000000000 0.604505000000000
0.259073000000000 0.381895000000000
0.231265000000000 0.338288000000000
0.237899000000000 0.348274000000000
Columns 3 through 4
0.452853000000000 0.613550000000000
0.981086000000000 1.289920000000000
0.996090000000000 1.373680000000000
0.625792000000000 0.859638000000000
0.547906000000000 0.743446000000000
0.562903000000000 0.759652000000000
Columns 5 through 6
0.757833000000000 0.884577000000000
1.534560000000000 1.714330000000000
1.733690000000000 2.074690000000000
1.078000000000000 1.277930000000000
0.921371000000000 1.080570000000000
0.934820000000000 1.087410000000000
Where am I wrong? In my final result, I should get data_matrix_grid_wise composed of 10000 elements instead of 36 elements. Thanks.
Update: How can I include the number before 'day' i.e. 1,2,3 etc. on a line just before the data_starts(w)? I am using this within the loop but it doesn't seem to work:
days_str = dotTXT_fileContents(data_starts(w)-1);
days_str = days_str{1};
days(w,:) = sscanf(days_str(w-1,:), '%d %*s %*s', [1, inf]);
The problem is in the line tmp_str = tmp_str{:};. MATLAB behaves strangely when handling chars this way. A short solution is to replace the last two statements in the loop with the next two lines:
y = cell2mat( cellfun(@(z) sscanf(z,'%f'),tmp_str,'UniformOutput',false));
data_matrix_grid_wise(w,:) = y;
The problem is with the last 2 statements. When you do tmp_str{:} you convert the cell array to a comma-separated list of strings. If you assign this list to a single variable, only the first string is assigned. So tmp_str will now hold only the first row of data.
Here is what you can do instead of the last 2 lines:
tmp_mat = cellfun(@str2num, tmp_str, 'uniformoutput',0);
data_matrix_grid_wise(w,:) = cell2mat(tmp_mat);
However, you will have a problem with the concatenation (cell2mat), since not all of your rows have the same number of columns. It is up to you how to solve that.
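One possible way around the unequal row lengths, if each block is meant to become a single row of numbers, is to join all lines of the block into one string and scan it once. This is a sketch under that assumption; the two short lines below are stand-ins for the contents of tmp_str.

```matlab
% Stand-in for one block of text lines with different numbers of values
tmp_str = {'0.190189 0.279141 0.452853'; ...
           '0.994502 1.08952'};

% Join the lines with spaces, then read every number in one pass.
% sscanf returns a column vector, so transpose it to get a row.
allNumbers = sscanf(strjoin(tmp_str', ' '), '%f')';

disp(size(allNumbers))   % 1 row, one column per value in the block
```

Inside the loop, the resulting row could then be stored with data_matrix_grid_wise(w,:) = allNumbers;, which works as long as every block contains the same total number of values.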