Trying to read a text file...but not getting all the contents - string

I am trying to read a file with the following format, which repeats itself (I have truncated the data, even for the first repetition, because it is too long):
1.00 'day' 2011-01-02
'Total Velocity Magnitude RC - Matrix' 'm/day'
0.190189 0.279141 0.452853 0.61355 0.757833 0.884577
0.994502 1.08952 1.17203 1.24442 1.30872 1.36653
1.41897 1.46675 1.51035 1.55003 1.58595 1.61824
Download the actual file with the complete data here
This is my code which I am using to read the data from the above file:
fid = fopen(file_name); % open the file
dotTXT_fileContents = textscan(fid,'%s','Delimiter','\n'); % read it as string ('%s') into one big array, row by row
dotTXT_fileContents = dotTXT_fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing 'Total Velocity Magnitude RC - Matrix' 'm/day'
data_starts = strmatch('''Total Velocity Magnitude RC - Matrix'' ''m/day''',...
dotTXT_fileContents); % data_starts contains the line numbers wherever 'Total Velocity Magnitude RC - Matrix' 'm/day' is found
ndata = length(data_starts); % total no. of data values will be equal to the corresponding no. of '** K' read from the .txt file
%# loop through the file and read the numeric data
for w = 1:ndata-1
%# read lines containing numbers
tmp_str = dotTXT_fileContents(data_starts(w)+1:data_starts(w+1)-3); % stores the content from file dotTXT_fileContents of the rows following the row containing 'Total Velocity Magnitude RC - Matrix' 'm/day' in form of string
%# convert strings to numbers
tmp_str = tmp_str{:}; % store the content of the string which contains data in form of a character
%# assign output
data_matrix_grid_wise(w,:) = str2num(tmp_str); % convert the part of the character containing data into number
end
To give you an idea of pattern of data in my text file, these are some results from the code:
data_starts =
2
1672
3342
5012
6682
8352
10022
ndata =
7
Therefore, my data_matrix_grid_wise should contain 1672-2-2-1(for a new line)=1667 rows. However, I am getting this as the result:
data_matrix_grid_wise =
Columns 1 through 2
0.190189000000000 0.279141000000000
0.423029000000000 0.616590000000000
0.406297000000000 0.604505000000000
0.259073000000000 0.381895000000000
0.231265000000000 0.338288000000000
0.237899000000000 0.348274000000000
Columns 3 through 4
0.452853000000000 0.613550000000000
0.981086000000000 1.289920000000000
0.996090000000000 1.373680000000000
0.625792000000000 0.859638000000000
0.547906000000000 0.743446000000000
0.562903000000000 0.759652000000000
Columns 5 through 6
0.757833000000000 0.884577000000000
1.534560000000000 1.714330000000000
1.733690000000000 2.074690000000000
1.078000000000000 1.277930000000000
0.921371000000000 1.080570000000000
0.934820000000000 1.087410000000000
Where am I wrong? In my final result, I should get data_matrix_grid_wise composed of 10000 elements instead of 36 elements. Thanks.
Update: How can I include the number before 'day' i.e. 1,2,3 etc. on a line just before the data_starts(w)? I am using this within the loop but it doesn't seem to work:
days_str = dotTXT_fileContents(data_starts(w)-1);
days_str = days_str{1};
days(w,:) = sscanf(days_str(w-1,:), '%d %*s %*s', [1, inf]);

The problem is in the line tmp_str = tmp_str{:}; MATLAB behaves strangely when handling chars. A short fix is to replace that line and the str2num assignment with the following two lines:
y = cell2mat( cellfun(@(z) sscanf(z,'%f'),tmp_str,'UniformOutput',false));
data_matrix_grid_wise(w,:) = y;

The problem is with the last two statements. When you write tmp_str{:}, you convert the cell array into a comma-separated list of strings. If you assign this list to a single variable, only the first string is assigned, so tmp_str ends up holding only the first row of data.
Here is what you can do instead of the last two lines:
tmp_mat = cellfun(@str2num, tmp_str, 'uniformoutput',0);
data_matrix_grid_wise(w,:) = cell2mat(tmp_mat);
However, you will have a problem with the concatenation (cell2mat), since not all of your rows have the same number of columns. How you solve that is up to you.
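One way around the unequal row lengths, offered only as a sketch: join all lines of a block into one string and let sscanf parse every number in a single call. The sample lines below are made up for illustration, and the names block_lines, all_values, and day_value are hypothetical. The last two lines apply the same idea to the header line that carries the day number.

```matlab
% Hypothetical sample block: the last line is shorter than the others
block_lines = {'0.19 0.28 0.45'; '0.99 1.09 1.17'; '1.42 1.47'};

% Assigning the comma-separated list to one variable keeps only the first line
first_only = block_lines{1};

% Join every line (each followed by a space), then parse all numbers at once
all_values = sscanf(sprintf('%s ', block_lines{:}), '%f').';  % 1-by-8 row vector

% Read only the leading number of a header line such as: 1.00 'day' 2011-01-02
header_line = '1.00 ''day'' 2011-01-02';
day_value = sscanf(header_line, '%f', 1);
```

If the blocks differ in length, storing each result in a cell array (for example data{w} = all_values) avoids the dimension mismatch that a matrix row assignment would raise.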

Related

Can you group data with similar written column headings on xlswrite, matlab?

Very new to matlab and still learning the basics. I'm trying to write a script which calculates the distance between two peaks in a waveform. That part I have managed to do, and I have used xlswrite to put the values I have obtained onto an excel file.
For each file, I have between about 50-250 columns, with just two rows: the second row has the numerical value, and the first has the column headings, copied from original excel files I extracted the data from.
Some of the columns have similar, but not identical, headings, e.g. 'green227RightEyereading3' and 'green227RightEyereading4'. Is there a way I can group columns with similar headings, e.g. those which have the same number/colour in the heading (i.e. green227) and either 'right eye' or 'left eye', and calculate an average of their numerical values? Link to file here: https://www.dropbox.com/s/ezpyjr3raol31ts/SampleBatchForTesting.xls?dl=0
[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File','C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
[~,name,ext] = fileparts(Excel_file);
sheet = 2;
FullXLSfile = [PathName, Excel_file];
[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
HowManyWide = size(txt_data);
NumberOfTitles = HowManyWide(1,2);
xlRangeA = txt_data;
Chickens = {'Test'};
for f = 1:xlRangeA; %%defined as top line of cells on sheet;
Text = xlRangeA{f};
HyphenLocations = find(Text == '-');
R = HyphenLocations(1,1) -1;
Chick = Text(1:R);
Chick = cellstr(Chick);
B = length(Chick);
TF = strncmp(Chickens,Chick,B);
if any(TF == 1); %do nothing
else
Chickens = {Chickens;Chick};
end
end
Here is also a link to the file that is created when I run my entire script. The values below the headings are the calculated thicknesses of the tissue I'm analysing. https://www.dropbox.com/s/4p6iu9kk75ecyzl/Choroid_Thickness.xls?dl=0
Thanks very much
If the differing characters are located at the very end (or the very beginning) of the heading, you can use the built-in strncmp function and compare only part of the string. But please provide some code and a part of your Excel file. It would help.
Also, if I am not mistaken, you are saving all the data into Excel and then reading it back in order to sort it. Maybe you should consider saving only the final result in Excel. It will save you some time, especially if you want to run your script many times.
EDIT:
Here is the code I came up with. It is not the best possible solution for sure, but it works with the file you uploaded. I have omitted the unnecessary lines and variables. The code works only if the numbers of each reading have the same amount of digits. They can be 4 digits as long as every entry has 4 digits. Since in each file you have waves of the same color, the only thing that you care about is whether the reading was recorded with the left or the right eye (correct?). Based on that and the code you wrote, the comparison concerns the part of the string that contains the words "Right" or "Left", i.e. the characters between the hyphens.
[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File',...
'C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
sheet = 1;
FullXLSfile = [PathName,Excel_file];
[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
%% data manipulation
NumberOfTitles = length(txt_data);
TextToCompare = txt_data{1};
r1 = 1; % counter for Readings1 vector
r2 = 1; % counter for Readings2 vector
for ff = 1:NumberOfTitles % in your code xlRangeA is a cell vector not a number!
Text = txt_data{ff};
HyphenLocations = find(Text == '-');
Text = Text(HyphenLocations(1,1):HyphenLocations(1,2)); % take only the part that contains the "eye" information
TextToCompare = TextToCompare(HyphenLocations(1,1):HyphenLocations(1,2)); % same here
if (Text == TextToCompare)
Readings1(r1) = number_data(ff); % store the numerical value in a vector
r1 = r1 + 1; % increase the counter of this vector
else
Readings2(r2) = number_data(ff); % same here
r2 = r2 + 1;
end
TextToCompare = txt_data{1}; % TextToCompare re-initialized for the next comparison
end
mean_readings1 = mean(Readings1); % Find the mean of the grouped values
mean_readings2 = mean(Readings2);
I am positive that this can be done in a more efficient and elegant way. I don't know exactly what kind of calculations you want to do, so I only included the mean values as an example. Inside the if statement you can also store the txt_data if you need it. Below I have also included a second way, which I find more elegant. Just substitute the %% data manipulation part with the part below if you want to test it:
%% more elegant way
Text_Vector = char(txt_data);
TextToCompare2 = txt_data{1};
HyphenLocations2 = find(TextToCompare2 == '-');
TextToCompare2 = TextToCompare2(HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = Text_Vector(:,HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = cellstr(Text_Vector);
dummy = strcmpi(Text_Vector,TextToCompare2);
Readings1 = number_data(dummy);
Readings2 = number_data(~dummy);
I hope this helps.
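For more than two groups, a sketch along the following lines may help. It assumes headings of the hypothetical form 'colourNumber-Eye-reading' (the real file's headings may differ) and uses the part before the second hyphen as the group key. The variables headings, values, group_keys, and group_means are made up for illustration.

```matlab
% Hypothetical headings and values; the real ones would come from xlsread
headings = {'green227-Right-3', 'green227-Right-4', 'green227-Left-1', 'red101-Left-2'};
values   = [10 20 30 50];

% Group key: everything up to (not including) the second hyphen
group_keys = regexp(headings, '^[^-]*-[^-]*', 'match', 'once');

% idx maps every heading to the position of its key in the sorted unique list
[groups, ~, idx] = unique(group_keys);

% Average the values that share a key
group_means = accumarray(idx(:), values(:), [], @mean);
```

Here groups holds the three distinct keys in sorted order, and group_means(k) is the average for groups{k}.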

Matlab substring

I am trying to average a specific value from a long string and cannot figure out how to pull a value out of the middle of the string. I would like to pull the 27 out of this string, and the corresponding value out of the other strings, and add them up:
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35
2015-10-2,37,32,27,32,27,23,92,80,67,30.35,30.31,30.28,10,10,7,7,4,,T,8,Rain-Snow,19
2015-10-3,39,36,32,35,33,29,100,90,79,30.30,30.17,30.11,10,7,0,8,3,,0.21,8,Fog-Rain,13
2015-10-4,40,37,34,38,36,33,100,96,92,30.23,30.19,30.14,2,1,0,6,0,,0.13,8,Fog-Rain,27
2015-10-5,46,38,30,38,34,30,100,91,61,30.19,30.08,29.93,10,7,0,6,2,,T,6,Fog-Rain,23
fid = fopen('MonthlyHistory.html');
for i=1:2
str = fgets(fid);
c = strsplit(str,',');
mean=mean+c;
end
fprintf('Average Daily Temperature: %d\n',mean);
Method 1: use readtable
I'm guessing this is pulled from Weather Underground? Take your CSV file and make sure it is saved with a .csv extension. Then what I would do is:
my_data = readtable('MonthlyHistory.csv');
This reads the whole file into the highly convenient table variable type. Then you can do:
average_daily_temp = my_data.MeanTemperatureF; %or whatever it is called in the table
I find tables are a super convenient way to keep track of tabular data. (plus readtable is pretty good).
Method 2: continue your approach...
fid = fopen('mh2.csv');
str = fgets(fid); % May need to read off a few lines to get to the
str = fgets(fid); % numbers
my_data = []; %initialize an empty array
while(true)
str = fgets(fid); % read off a line
if(str == -1) % if str is -1, which signifies end of file
break; %exit loop
end
ca = strsplit(str,','); % split string into a cell array of strings
my_data(end+1,:) = str2num(ca{3}); % convert the 3rd element to a number and store it
end
fclose(fid);
Now my_data is an array holding the 3rd element of each line.
You can also use textscan, which might let you simplify your code as well. For a single string, it works like this:
S='2015-10-1,33,27,20,29,24,20,96,85,70,30.51,30.40,30.13,10,9,4,10,6,,T,5,Snow,35'
T=textscan(S,'%s','Delimiter',',')
str2double(T{1}{3}) %// the value we want is the 3rd field
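To get from a single value to the average the question asks for, a minimal sketch could look like the one below. It uses shortened, made-up rows instead of the real file. Two details are worth noting: the accumulator should not be called mean, because that shadows the built-in function, and strsplit collapses consecutive delimiters by default, which would shift the field positions after an empty field such as ",,".

```matlab
% Shortened sample rows (the real lines have many more fields)
rows = {'2015-10-1,33,27,20', '2015-10-2,37,32,27', '2015-10-3,39,36,32'};

temps = zeros(1, numel(rows));
for k = 1:numel(rows)
    % Keep empty fields so that field positions stay stable
    fields = strsplit(rows{k}, ',', 'CollapseDelimiters', false);
    temps(k) = str2double(fields{3});   % third field is the value of interest
end

avg_temp = mean(temps);                 % (27 + 32 + 36) / 3
fprintf('Average Daily Temperature: %.2f\n', avg_temp);
```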

Optimizing count of occurrence of a string

I have to count how often a certain string is contained in a cell array. The problem is that the code is way too slow: it takes almost 1 second to do this.
uniqueWordsSize = 6; % just a sample number
wordsCounter = zeros(uniqueWordsSize, 1);
uniqueWords = unique(words); % words is a cell-array
for i = 1:uniqueWordsSize
wordsCounter(i) = sum(strcmp(uniqueWords(i), words));
end
What I'm currently doing is to compare every word in uniqueWords with the cell-array words and use sum in order to calculate the sum of the array which gets returned by strcmp.
I hope someone can help me to optimize that.... 1 second for 6 words is just too much.
EDIT: ismember is even slower.
You can drop the loop completely by using the third output of unique together with hist:
words = {'a','b','c','a','a','c'}
[uniqueWords,~,wordOccurrenceIdx]=unique(words)
nUniqueWords = length(uniqueWords);
counts = hist(wordOccurrenceIdx,1:nUniqueWords)
uniqueWords =
'a' 'b' 'c'
wordOccurrenceIdx =
1 2 3 1 1 3
counts =
3 1 2
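As an alternative sketch, accumarray can do the counting directly from the same index vector, which avoids hist (the MATLAB documentation marks hist as not recommended in newer releases). The variable name idx is just a shorter stand-in for wordOccurrenceIdx.

```matlab
words = {'a','b','c','a','a','c'};

% idx(k) is the position of words{k} in the sorted list of unique words
[uniqueWords, ~, idx] = unique(words);

% Add 1 to the bin of each word; the result has one count per unique word
counts = accumarray(idx(:), 1).';
```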
Here is a trickier way that avoids explicit for loops:
clc
close all
clear all
Paragraph=lower(fileread('Temp1.txt'));
AlphabetFlag=Paragraph>=97 & Paragraph<=122; % finding alphabets
DelimFlag=find(AlphabetFlag==0); % considering non-alphabets delimiters
WordLength=[DelimFlag(1), diff(DelimFlag)];
Paragraph(DelimFlag)=[]; % removing the delimiters from the text
Words=mat2cell(Paragraph, 1, WordLength-1); % cut the paragraph into words
[SortWords, Ia, Ic]=unique(Words); %finding unique words and their subscript
Bincounts = histc(Ic,1:size(Ia, 1));%finding their occurence
[SortBincounts, IndBincounts]=sort(Bincounts, 'descend');% finding their frequency
FreqWords=SortWords(IndBincounts); % sorting words according to their frequency
FreqWords(1)=[];SortBincounts(1)=[]; % dealing with remaining white space
Freq=SortBincounts/sum(SortBincounts)*100; % frequency percentage
%% plot
NMostCommon=20;
disp(Freq(1:NMostCommon))
pie([Freq(1:NMostCommon); 100-sum(Freq(1:NMostCommon))], [FreqWords(1:NMostCommon), {'other words'}]);

How do I read a delimited file with strings/numbers with Octave?

I am trying to read a text file containing digits and strings using Octave. The file format is something like this:
A B C
a 10 100
b 20 200
c 30 300
d 40 400
e 50 500
but the delimiter can be space, tab, comma or semicolon. The textread function works fine if the delimiter is space/tab:
[A,B,C] = textread ('test.dat','%s %d %d','headerlines',1)
However, it does not work if the delimiter is a comma or semicolon. I tried to use dlmread:
dlmread ('test.dat',';',1,0)
but it does not work because the first column is a string.
Basically, with textread I can't specify the delimiter and with dlmread I can't specify the format of the first column. Not with the versions of these functions in Octave, at least. Has anybody ever had this problem before?
textread does allow you to specify the delimiter: it honors the property arguments of strread. The following code worked for me:
[A,B,C] = textread( 'test.dat', '%s %d %d' ,'delimiter' , ',' ,'headerlines' ,1 )
I couldn't find an easy way to do this in Octave currently. You could use fopen() to loop through the file and manually extract the data. I wrote a function that would do this on arbitrary data:
function varargout = coltextread(fname, delim)
% Initialize the variable output argument
varargout = cell(nargout, 1);
% Initialize elements of the cell array to nested cell arrays
% This syntax is needed because {:} produces a comma-separated list
[varargout{:}] = deal(cell());
fid = fopen(fname, 'r');
while true
% Get the current line
ln = fgetl(fid);
% Stop if EOF
if ln == -1
break;
endif
% Split the line string into components and parse numbers
elems = strsplit(ln, delim);
nums = str2double(elems);
nans = isnan(nums);
% Special case of all strings (header line)
if all(nans)
continue;
endif
% Find the indices of the NaNs
% (i.e. the indices of the strings in the original data)
idxnans = find(nans);
% Assign each corresponding element in the current line
% into the corresponding cell array of varargout
for i = 1:nargout
% Detect if the current index is a string or a num
if any(ismember(idxnans, i))
varargout{i}{end+1} = elems{i};
else
varargout{i}{end+1} = nums(i);
endif
endfor
endwhile
endfunction
It accepts two arguments: the file name, and the delimiter. The function is governed by the number of return variables that are specified, so, for example, [A B C] = coltextread('data.txt', ';'); will try to parse three different data elements from each row in the file, while A = coltextread('data.txt', ';'); will only parse the first elements. If no return variable is given, then the function won't return anything.
The function ignores rows that have all-strings (e.g. the 'A B C' header). Just remove the if all(nans)... section if you want everything.
By default, the 'columns' are returned as cell arrays, although the numbers within those arrays are actually converted numbers, not strings. If you know that a cell array contains only numbers, then you can easily convert it to a column vector with: cell2mat(A)'.
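If the delimiter is not known in advance, a regular expression can split on any of the candidates at once. The following is only a sketch on in-memory sample rows (in practice each row would come from fgetl); the variables rows, names, and nums are made up for illustration, and the approach assumes that no field contains a space, comma, or semicolon itself.

```matlab
% Sample rows using three different delimiters
rows = {'a;10;100', 'b,20,200', 'c 30 300'};

names = cell(numel(rows), 1);
nums  = zeros(numel(rows), 2);
for k = 1:numel(rows)
    % Split on any run of whitespace, commas, or semicolons
    parts = regexp(rows{k}, '[\s,;]+', 'split');
    names{k}   = parts{1};                 % first column stays a string
    nums(k, :) = str2double(parts(2:3));   % remaining columns become numbers
end
```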

Cell to Char in MATLAB doesn't work

I have used this code to read data from a plaintext file:
[race sex age namef] = textread('Fusion.txt', '%s %s %d %s');
I convert race from cell to char using race = char(race); to do a string comparison (if(strcmp(race(k),'W')==1)), and it works as expected. I also need to convert namef to char, but when I do, MATLAB returns 0 for every element of namef.
Here is a sample of my file:
W M 50 00001_930831_fb_a.ppm
W M 30 00002_930831_fa.ppm
W M 30 00002_930831_fb.ppm
W M 30 00002_931230_fa.ppm
W M 30 00002_931230_fb.ppm
W M 31 00002_940128_fa.ppm
W M 31 00002_940128_fb.ppm
Why is this happening?
From your question it is not clear whether conversion to char is necessary later on. For what you want to do, it is OK to compare to the individual elements of the cells race or namef:
strcmp(race{k}, 'W')
strcmp(namef{k}, '00002_930831_fa.ppm')
Since strcmp operates on cell arrays of strings as well, you can also do things such as strcmp(race, 'W').
Since what you're doing should work fine, you're probably missing one thing: the last column in your file has multiple characters, so you need to access the whole row of the resulting string matrix, rather than a single element:
race = char(race); %// cell to character array of size [N,1]
namef = char(namef); %// cell to character array of size [N,M], padding added
for k=1:size(race,1)
condition_col1 = strcmp(race(k),'W')==1;
condition_col4 = strcmp(strtrim(namef(k,:)),'00002_930831_fa.ppm');
%// ... code goes here
end
If you use namef(k), you'll get the first character of each row, i.e. '0'. So namef(k,:) is my main point.
Also note that I added strtrim to the condition: turning to a character array will pad the fields to the length of the longest element (since matrices have to be rectangular).
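The padding and indexing behaviour can be checked on two of the sample file names. This is a small illustrative sketch; the variable names as_char, one_char, full_row, and is_match are made up.

```matlab
namef = {'00001_930831_fb_a.ppm'; '00002_930831_fa.ppm'};

as_char  = char(namef);              % 2-by-21 char matrix, second row padded with spaces
one_char = as_char(2);               % linear index: a single character, here '0'
full_row = strtrim(as_char(2, :));   % whole second row with the padding removed

% Comparing on the cell array directly needs no conversion or trimming
is_match = strcmp(namef, '00002_930831_fa.ppm');
```

Keeping namef as a cell array and calling strcmp on it directly sidesteps the padding issue altogether.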

Resources