read complicated .txt file into Matlab - string

I would like to read a .txt file into Matlab.
One of the columns contains both letters and numbers.
(So I guess one way to read this column is as a string.)
The problem is I also need to find out numbers which are larger than 5 within that column.
e.g. The .txt looks like
12 1
21 2
32 7
11 a
03 b
22 4
13 5
31 6
i.e. Ultimately, I would like to get
32 7
31 6
How can I get it?? Any experts, please help!

You can read the contents of the file into a cell array of strings using TEXTSCAN, convert the strings to numeric values using CELLFUN and STR2NUM (characters like 'a' and 'b' will result in the empty matrix []), remove rows of the cell array that have any empty cells in them, then convert the remaining data into an N-by-2 matrix using CELL2MAT:
fid = fopen('junk.txt','r'); %# Open the file
data = textscan(fid,'%s %s','CollectOutput',true); %# Read the data as strings
fclose(fid); %# Close the file
data = cellfun(@str2num,data{1},'UniformOutput',false); %# Convert to numbers
data(any(cellfun('isempty',data),2),:) = []; %# Remove empty cells
data = cell2mat(data); %# Convert to N-by-2 array
The matrix data will now look like this, given your sample file in the question:
>> data
data =
12 1
21 2
32 7
22 4
13 5
31 6
And you can get the rows that have a value greater than 5 in the second column like so:
>> data(data(:,2) > 5,:)
ans =
32 7
31 6
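A variation on the same idea, as a minimal sketch: str2double returns NaN for text that is not a number, so the cell-array cleanup can be replaced by a NaN check. The temporary file name below is hypothetical and only exists to make the example self-contained.

```matlab
% Sketch only: write the sample data to a temporary (hypothetical) file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '12 1', '21 2', '32 7', '11 a', '03 b', '22 4', '13 5', '31 6');
fclose(fid);

% Read both columns as strings into one N-by-2 cell array.
fid = fopen(fname, 'r');
raw = textscan(fid, '%s %s', 'CollectOutput', true);
fclose(fid);
delete(fname);

vals = str2double(raw{1});             % N-by-2 double; 'a' and 'b' become NaN
vals = vals(~any(isnan(vals), 2), :);  % drop rows that contain a NaN
result = vals(vals(:,2) > 5, :);       % keep rows whose second column exceeds 5
disp(result)
```

Unlike str2num, str2double does not evaluate its input as MATLAB code, which can be a safer choice for files of unknown origin.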

fid = fopen('txt.txt','r');
Aout = [];
while(1)
[a1,count1] = fscanf(fid,'%s',1);
[a2,count2] = fscanf(fid,'%s',1);
if(count1 < 1 || count2 < 1)
break;
end
if(~isempty(str2num(a2)) && str2num(a2) > 5 && ~isempty(str2num(a1)))
Aout = [ Aout ; str2num(a1) str2num(a2) ];
end
end
fclose(fid);
Violates the unspoken rule of growing a Matlab variable during a loop, but it's text processing anyway so you probably won't notice the slowness.
Edit: Had too many errors in previous version, had to start fresh.

Related

Matlab Split-String

Hello,
I have a little problem.
I have a txt file with over 200mb.
It looks like:
%Hello World
%second sentences
%third;
%example
12.02.2014
;-400;-200;200
;123;233;434
%Hello World
%second sentences
%third
%example
12.02.2014
;-410;200;20300
;63;23;43
;23;44;78213
..
... ...
I need only the Values after the semicolon like:
Value1{1,1}=[-400]; Value{1,2}=[-200]; and Value{1,3}=[200]
Value2{1,1}=[123]; Value{1,2}=[233]; and Value{1,3}=[434]
and so on.
Does someone have an idea how I can split the values into a cell array or vector?
Thus, the variables must be:
Var1=[-400 -200 200;
123 233 434]
Var2=[
-410 200 20300;
63 23 43;
23 44 78213]
I want to separate the values after every date into a different variable. For example, when I have 55 dates, I will have 55 variables.
This could be one approach assuming a uniformly structured data (3 valid numbers per row) -
%// Read in entire text data into a cell array
data = importdata('sample.txt','');
%// Remove empty lines
data = data(~cellfun('isempty',data))
%// Find boundaries based on delimiter "%example"
exmp_delim_matches = arrayfun(@(n) strcmp(data{n},'%example'),1:numel(data))
bound_idx = [find(exmp_delim_matches) numel(exmp_delim_matches)+1] %// +1 so a data row on the very last line is kept
%// Find lines that start with delimiter ";"
matches_idx = find(arrayfun(@(n) strcmp(data{n}(1),';'),1:numel(data)))
%// Select lines that start with character ";" and split lines based on it
%// Split selected lines based on the delimiter ";"
split_data = regexp(data(matches_idx),';','split')
%// Collect all cells data into a 1D cell array
all_data = [split_data{:}]
%// Select only non-empty cells and convert them to a numeric array.
%// Finally reshape into a format with 3 numbers per row as final output
out = reshape(str2double(all_data(~cellfun('isempty',all_data))),3,[]).' %//'
%// Separate out lines based on the earlier set bounds
out_sep = arrayfun(@(n) out(matches_idx>bound_idx(n) & ...
matches_idx<bound_idx(n+1),:),1:numel(bound_idx)-1,'Uni',0)
%// Display results for verification
celldisp(out_sep)
Code run -
out_sep{1} =
-400 -200 200
123 233 434
out_sep{2} =
-410 200 20300
63 23 43
23 44 78213
A brute force approach would be to open up your file, then read each line one at a time. With each line, you check to see if the first character is a semi-colon and if it is, split up the string by the ; delimiter from the second character of the line up until the end. You will receive a cell array of strings, so you'd have to convert this into an array of numbers. Because you will probably have each line containing a different amount of numbers, let's store each array into a cell array where each element in this cell array will contain the numbers per line. As such, do something like this. Let's assume your text file is stored in text.txt:
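fid = fopen('text.txt');
if fid == -1
error('Cannot find file');
end
nums = {};
while true
st = fgetl(fid);
if ~ischar(st) % fgetl returns the number -1 at end of file
break;
end
if ~isempty(st) && st(1) == ';' % skip blank lines and lines without a leading ;
st_split = strsplit(st(2:end), ';');
arr = cellfun(@str2num, st_split); % convert each token to a number
nums{end+1} = arr; % store this line's numbers in its own cell
end
end
fclose(fid);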
Let's go through the above code slowly. We first use fopen to open up the file for reading. We check to see if the ID returned from fopen is -1 and if that's the case, we couldn't find or open the file so spit out an error. Next, we declare an empty cell array called nums which will store our numbers that you are getting when parsing your text file.
Now, until we reach the end of the file, get one line of text starting from the top of the file and we proceed to the end. We use fgetl for this. If we read a -1, this means we have reached the end of the file, so get out of the loop. Else, we check to see if the first character is ;. If it is, then we take a look at the second character until the end of this line, and split the string based on the ; character with strsplit. The result of this will be a cell array of strings where each element is the string representation of your number. You need to convert this cell array back into a numeric array, and so what you would need to do is apply str2num to each element in this cell. You can either use a loop to go through each cell, or you can conveniently use [cellfun](http://www.mathworks.com/help/matlab/ref/cellfun.html) to allow you to go through each element in this cell and convert the string representation into a numeric value. The resulting output of cellfun will give you a numeric array representation of each value delimited by the ; character for that line. We then place this array into a single cell stored in nums.
The end result of this entire code will give you numeric arrays that are based on what you are looking for stored in nums.
Warning
I am assuming that your text file only has numbers delimited by ; characters if we encounter a line that starts with ;. If this is not the case, then my code will not work. I'm assuming this isn't the case!
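If the rows also need to be grouped per date, one possible extension is sketched below. It assumes that every run of consecutive ;-lines forms one block and that all rows in a block hold the same number of values. The names lines, groups and inBlock are illustrative, and the sample lines are held in a cell array here instead of being read from a file.

```matlab
% Sketch only: sample lines stand in for the contents of the text file.
lines = {'%Hello World', '%example', '12.02.2014', ';-400;-200;200', ';123;233;434', ...
         '%Hello World', '%example', '12.02.2014', ';-410;200;20300', ';63;23;43', ';23;44;78213'};

groups = {};        % one numeric matrix per block of ';' lines
inBlock = false;    % true while consecutive ';' lines are being read
for k = 1:numel(lines)
    st = lines{k};
    if ~isempty(st) && st(1) == ';'
        % Split after the leading ';' and convert the tokens to numbers.
        row = str2double(strsplit(st(2:end), ';'));
        if ~inBlock
            groups{end+1} = row;              % first row of a new block
            inBlock = true;
        else
            groups{end} = [groups{end}; row]; % append to the current block
        end
    else
        inBlock = false;                      % any other line ends the block
    end
end
celldisp(groups)
```

Each cell of groups then holds the matrix for one date, in file order.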

Matlab - printing selected lines from a text file in new text file

I have a text file whose contents are for example:
[1] John where are you bring me copy number = 1 4 5
[2] Hi, Sam what is the cost of your calculator = 200 500 800
[3] Nancy, bring me the no. of copy that I gave you = 4 8 5
[4] Hi, how many litres of milk you have = 10 12 6
[5] Peter, give your copy numb = 23 45 32
& so on.
I am interested in those lines which contains the word ‘copy’. The code below gives me the line number in the text file that contains the word ‘copy’.
fid = fopen('my_file.txt', 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);
% Search for string ‘copy’ and find all rows that matches it
D = strfind(C{1}, 'copy');
rows = find(~cellfun('isempty', D));
I want to print those lines with the word ‘copy’ in another text file.
First, you need to select the content you want:
C2=C{1}(rows);
fprintf accepts a comma-separated list of arguments, which lets you write all the selected lines with a single call:
fid2=fopen(....
fprintf(fid2,'%s\n',C2{:});
fclose(fid2);
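Put together, the whole round trip might look like the sketch below. The input and output file names are hypothetical temporary files created just for the demonstration; the sample lines are taken from the question.

```matlab
% Sketch only: both file names are hypothetical temporary files.
inFile  = [tempname '.txt'];
outFile = [tempname '.txt'];

% Create a small input file from the sample lines.
fid = fopen(inFile, 'w');
fprintf(fid, '%s\n', ...
    '[1] John where are you bring me copy number = 1 4 5', ...
    '[2] Hi, Sam what is the cost of your calculator = 200 500 800', ...
    '[3] Nancy, bring me the no. of copy that I gave you = 4 8 5', ...
    '[4] Hi, how many litres of milk you have = 10 12 6', ...
    '[5] Peter, give your copy numb = 23 45 32');
fclose(fid);

% Read every line into a cell array of strings.
fid = fopen(inFile, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);

% Keep only the lines that contain the word 'copy'.
rows = find(~cellfun('isempty', strfind(C{1}, 'copy')));
C2 = C{1}(rows);

% Write the selected lines, one per line, to the output file.
fid2 = fopen(outFile, 'w');
fprintf(fid2, '%s\n', C2{:});
fclose(fid2);
```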

Converting data to Excel from Matlab

In Matlab, one of my variables produces a series of numbers as follows:
t =
1.0e-07 *
Columns 1 through 4
0.000002188044002 0.000011853757224 0.000043123777130 0.000134856642090
Columns 5 through 8
0.000414700915105 0.001479279377534 0.003134050793671 0.008617995925603
Columns 9 through 12
0.065830078792745 0.087987267599604 0.106338163623915 0.121617374878836
Columns 13 through 16
0.134520178924611 0.145518794399287 0.155035638788571 0.163042823513867
Columns 17 through 18
0.170181805020581 0.172442168463983
How can I produce them in one column in order to easily copy and paste to Excel?
try
format long g
t'
or else just double click on t in your workspace and you'll get a datagrid (the variable editor) that you can just copy and paste out of
Try using this:
fprintf ('%g\n', t);
Which won't have the leading spaces you get from format long g; t'
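Note that %g prints about 6 significant digits by default. If the full precision matters in Excel, a wider format can be used, as in this small sketch (the three values are copied from the question; col and txt are illustrative names):

```matlab
% Sketch only: a short excerpt of the values from the question.
t = 1.0e-07 * [0.000002188044002 0.000011853757224 0.000043123777130];

col = t(:);                     % force a column regardless of t's orientation
txt = sprintf('%.15g\n', col);  % one value per line, up to 15 significant digits
disp(txt)
```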

matlab string vector / array handling (multiplication and str2num)

I would like to understand if this is really correct, or if this might be an issue in matlab.
I create a string vector/array via:
>>a=['1','2';'3','4']
It returns:
a =
12
34
Now I would like to convert the content from string to number and multiply this with a number:
>>6*str2num(a)
The result looks like this:
ans =
72
204
I don't understand why the comma-separated elements (strings) are concatenated instead of being handled separately. If you use numbers instead of strings, they are handled separately. Then it looks like this:
>> a=[1,2;3,4]
a =
1 2
3 4
>> 6*a
ans =
6 12
18 24
I would expect the same results. Any ideas ?
Thanks
Have you read about how string handling is done in MATLAB?
Basically, multiple strings can only be stored as a column vector (of strings). If attempted to store as a row vector, they will be concatenated. This is why strings '1' and '2' are being concatenated, as well as '3' and '4'. Also note, that this is only possible if all resulting strings are of the same length.
I'm not sure what you're trying to do, but if you want to store strings as a matrix (that is, multiple strings in a row), consider storing them in a cell array, for instance:
>> A = {'1', '2'; '3', '4'}
A =
'1' '2'
'3' '4'
>> cellfun(@str2num, A)
ans =
1 2
3 4
I would say that using a cell array as @EitanT suggests would probably be the best solution for you.
However, it is possible to handle strings (or rather characters) like the way you tried by manually inserting spaces and lining up the number of characters.
For example
>> a=['1 2';'3 4']
produces
a =
1 2
3 4
and using
>> 6*str2num(a)
produces
ans =
6 12
18 24
Converting between a matrix and a string using
b=[1,2;3,10000];
num2str(b)
spaces are inserted automatically and the characters are lined up properly. This produces
ans =
1 2
3 10000
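To make the concatenation visible, the following sketch inspects the character matrix from the question. It also shows one way to treat every character as its own digit, which only works when each character is a single digit 0 to 9 (the names whole and digits are illustrative):

```matlab
a = ['1','2';'3','4'];   % same as ['12';'34']: a 2-by-2 char matrix
sz = size(a);            % [2 2], two rows of two characters each

whole  = str2num(a);     % each row is read as one number: [12; 34]
digits = a - '0';        % subtracting the code of '0' gives [1 2; 3 4]

disp(6 * whole)          % 72 and 204, as in the question
disp(6 * digits)         % 6 12; 18 24
```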

is it possible to index character strings pulled from an external .txt file?

I'm in a basic MATLAB college course, and need some help with my code.
There's an external .txt file with names in it, with a corresponding number assigned to each name. My goal is to place all the first names, last names, and numbers into arrays, find the lowest number in the 'number' array, get the corresponding index, and print the first and last name that go with that number.
the text file reads 25 different names and numbers
(i.e.:
Bob
Smith
17
Jane
Doe
23
Bill
Johnson
13
...etc...)
here is my general code so far:
1 clear
2
3 clc
4
5
6 fid1=fopen('facedata.txt','rt');
7
8 for index = 1:1:25
9 firstn(index) = fgetl(fid1);
10 lastn(index) = fgetl(fid1);
11 number(index) = fscanf(fid1,'%f');
12 end
13
14 [distmin,I] = min(dist);
15 fprintf('%5.4f %10s %10.0f', distmin, firstn(I), I);
My hope is for the code to run through, get matlab to recognize '13' as the lowest number, and print 'bill johnson' to the screen, but if I run the code, matlab says there are errors
'Subscripted assignment dimension mismatch.' @ line 9.
and
'Index exceeds matrix dimensions.' @ the firstn(I) in line 15.
any ideas?? i know this is crazy long, but any help would be appreciated! :]
The command fgetl means read a line from the text file. Therefore your code is reading 2x25 = 50 lines of text. How do you know that your file has this many lines in it? You should read a new line, process it, and repeat until you reach the end of the file:
fid = fopen('fgetl.m');
tline = fgetl(fid);
while ischar(tline)
disp(tline)
tline = fgetl(fid);
end
fclose(fid);
However, this would not do what you want. You should rather use fscanf to read data in the format you want. You want to read two consecutive strings (first name, last name) and an integer number. So you can use
A = fscanf(fid, '%s %s %d', [3 inf]);
to read three items at a time and repeat until the end of the file.
I answered my own question earlier today, but here's what I found if anyone is interested:
you have to index a list of strings with curly braces (a cell array) instead of parentheses.
i.e.:
for index = 1:1:25
firstn{index} = fgetl(fid1);
end
fprintf('%10s', firstn{index});
fprintf will print the name stored at whichever index is supplied.
thanks anyway kavka :]
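For completeness, here is a minimal sketch of the whole task using cell arrays. It assumes the file consists of complete three-line records (first name, last name, number); the temporary file below is hypothetical and stands in for facedata.txt.

```matlab
% Sketch only: a hypothetical temporary file with three sample records.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'Bob', 'Smith', '17', 'Jane', 'Doe', '23', 'Bill', 'Johnson', '13');
fclose(fid);

firstn = {};   % cell arrays hold strings of different lengths
lastn  = {};
number = [];

fid1 = fopen(fname, 'rt');
tline = fgetl(fid1);
while ischar(tline)                            % fgetl returns -1 at end of file
    firstn{end+1} = tline;                     % line 1 of the record: first name
    lastn{end+1}  = fgetl(fid1);               % line 2: last name
    number(end+1) = str2double(fgetl(fid1));   % line 3: the number
    tline = fgetl(fid1);                       % first line of the next record
end
fclose(fid1);
delete(fname);

[minval, I] = min(number);                     % smallest number and its index
fprintf('%s %s %g\n', firstn{I}, lastn{I}, minval);
```

Reading until fgetl stops returning text avoids hard-coding the record count of 25.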

Resources