I have devised a function in MATLAB that lets me (or so I thought) extract data from a text file that looks like this (at least the beginning):
G1 50
G2 50
M-0.35 0
M-0.05 0.013
M3.3 0.1
M9.75 0.236
M17.15 0.425
M25.85 0.666
M35.35 0.958
The idea is to map the letter at the start of each line to an index vector (only the values next to M are really of interest to me), and to collect the two numbers on those lines into vectors.
The end of the processing works well, but the values I end up with are sometimes far from the real ones.
For instance, instead of [0 0.013 0.1 0.236 0.425 0.666 0.958] I get [0 0.013 0.1010 0.237 0.426 0.666 0.959].
That alone is not such an issue, but the problem is much worse for the first column: instead of a maximum at 119, it doesn't reach 90. I had code that worked properly with integers, but now that I'm using floats it fails every time.
I will try to show only the relevant parts of the code:
nom_essai = 'test.txt';
fid1 = fopen(nom_essai, 'rt');
tableau = textscan(fid1, '%s %.5f ', 'HeaderLines', 1, 'CollectOutput', true); % a few lines are skipped because they hold parameters, which I read elsewhere in the code
colonne_force = tableau{1}; % get the first column
colonne_deplacement = tableau{2}; % get the second column
indice = 2*found_G + found_F + 3*found_R; % result of the processing on colonne_force: maps each letter to an index, so I can keep the period next to G and the 2 values next to M
force = linspace(0,0,length(n_indices)); % initialization
deplacement = linspace(0,0,length(n_indices)); % initialization
temps = linspace(0,0,length(n_indices)); % initialization
for k = 1:length(colonne_force) % k runs over the columns read from the file, j over my output vectors
    if indice(k) == 2 % a G is found => sampling period
        T = colonne_deplacement(k); % keep the period next to G
    elseif indice(k) == 1 % an F is found: skip it
    elseif indice(k) == 3 % an R is found: skip it
    else % an M is found: I need the values on these lines
        j = j + 1;
        deplacement(j) = colonne_deplacement(k); % keep the value in the second column
        M = strsplit(colonne_force{k}, 'M'); % split the string 'MXXX' at the M
        force(j) = str2double(M{2}); % take the part after the M and convert it to double
    end
end
The kind of precision I would like is to keep values like [M108.55 23.759] with up to 3 decimal digits.
Thank you in advance. Feel free to ask for more information if I failed to include the part of the code that contains the problem.
Modifying your code a bit:
nom_essai = 'test.txt';
fid1 = fopen(nom_essai, 'rt');
tableau = textscan(fid1, '%s %f ', 'HeaderLines', 1, 'CollectOutput', true); % use %f so that no significant figures are lost
colonne_force = tableau{1}; % get the first column
colonne_deplacement = tableau{2}; % get the second column
% Check which entries contain an M
hasM = cellfun(@(x) any(x == 'M'), colonne_force);
column2 = colonne_deplacement(hasM);
column1 = colonne_force(hasM);
column1 = cellfun(@(x) str2double(x(2:end)), column1); % drop the M and convert to double
The precision is retained.
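To check the conversion step without the data file, here is a minimal sketch run on a small hand-made sample. The sample values are hypothetical stand-ins for what textscan would return; only the filtering and the str2double conversion are taken from the answer above.

```matlab
% Hypothetical sample mirroring the two columns returned by textscan
colonne_force = {'G1'; 'M-0.35'; 'M108.55'};
colonne_deplacement = [50; 0; 23.759];

% Keep only the rows whose label starts with M
hasM = cellfun(@(x) x(1) == 'M', colonne_force);

% Drop the leading M and convert the rest to double
force = cellfun(@(x) str2double(x(2:end)), colonne_force(hasM));
deplacement = colonne_deplacement(hasM);

% Print with 3 decimals, one M line per row
fprintf('%.3f %.3f\n', [force deplacement].');
```

Printing with an explicit format such as %.3f avoids confusing the stored value with the way the Command Window happens to display it.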
How do I format the following numbers, which are in a vector?
For instance, the numbers I have are:
23.02567
0.025679
and I would like to format them like this:
0.230256700+E02
0.025679000+E00
First, note that this is not the proper way to format numbers in scientific or engineering notation. Those numbers should always have exactly one digit in front of the decimal point, unless the exponent is required to be a multiple of 3 (i.e. a power of 1000, corresponding to one of the SI prefixes). If, however, you have to use this format, you could write your own format string for it.
>>> x, e = 23.02567, 2
>>> "%f%sE%02d" % (x/10**e, "+" if e >= 0 else "-", abs(e))
'0.230257+E02'
>>> x, e = 0.025679, -1
>>> "%f%sE%02d" % (x/10**e, "+" if e >= 0 else "-", abs(e))
'0.256790-E01'
This is assuming that the exponent, e, is given. If the exponent does not matter, you could also use the proper %E format and just replace E+ with +E:
>>> ("%E" % x).replace("E+", "+E").replace("E-", "-E")
'2.567900-E02'
Is it possible to format the output of sprintf as shown below, or should I use another function?
Say I have a variable dt = 9.765625e-05 and I want to use sprintf to build a string to use when saving, say, a figure:
fig = figure(nfig);
plot(x,y);
figStr = sprintf('NS2d_dt%e',dt);
saveas(fig,figStr,'pdf')
The dot causes problems for me: some systems misinterpret the file format.
using
figStr = sprintf('NS2d_dt%.2e',dt);
then
figStr = NS2d_dt9.77e-05
using
figStr = sprintf('NS2d_dt%.e',dt);
then
figStr = NS2d_dt1e-04
which is not precise enough. I would like something like this
using
figStr = sprintf('NS2d_dt%{??}e',dt);
then
figStr = NS2d_dt9765e-08
Essentially, the only way to get your desired output is with some manipulation of the value or of the strings. So here are two solutions for you: the first uses string manipulation and the second manipulates the value. Hopefully these two approaches will help you reason out solutions to other problems, particularly the number manipulation.
String Manipulation
Solution
fmt = @(x) sprintf('%d%.0fe%03d', (sscanf(sprintf('%.4e', x), '%d.%de%d').' .* [1 0.1 1]) - [0 0.5 3]);
Explanation
First I use sprintf to print the number in a defined format
>> sprintf('%.4e', dt)
ans =
9.7656e-05
then sscanf to read it back in making sure to remove the . and e
>> sscanf(sprintf('%.4e', dt), '%d.%de%d').'
ans =
9 7656 -5
before printing it back we perform some manipulation of the data to get the correct values for printing
>> (sscanf(sprintf('%.4e', dt), '%d.%de%d').' .* [1 0.1 1]) - [0 0.5 3]
ans =
9 765.1 -8
and now we print
>> sprintf('%d%.0fe%03d', (sscanf(sprintf('%.4e', dt), '%d.%de%d').' .* [1 0.1 1]) - [0 0.5 3])
ans =
9765e-08
Number Manipulation
Solution
orderof = @(x) floor(log10(abs(x)));
fmt = @(x) sprintf('%.0fe%03d', x*(10^(abs(orderof(x))+3))-0.5, orderof(x)-3);
Explanation
First I create an anonymous orderof function which tells me the order (the number after e) of the input value. So
>> dt = 9.765625e-05;
>> orderof(dt)
ans =
-5
Next we manipulate the number to convert it to a 4-digit integer; this is the effect of adding 3 in
>> floor(dt*(10^(abs(orderof(dt))+3)))
ans =
9765
Finally, before printing the value we need to figure out the new exponent with
>> orderof(dt)-3
ans =
-8
and printing will give us
>> sprintf('%.0fe%03d', floor(dt*(10^(abs(orderof(dt))+3))), orderof(dt)-3)
ans =
9765e-08
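Note that abs(orderof(x)) in the one-liner above only gives the right shift when the exponent is negative, as it is for dt. A minimal sketch of the same idea that divides by the shifted power of ten instead, so it also covers positive exponents, is given below. The name fmt4 is hypothetical, and the sketch assumes x is positive and nonzero.

```matlab
dt = 9.765625e-05;

% Decimal exponent of x (the number after "e" in scientific notation)
orderof = @(x) floor(log10(abs(x)));

% Shift x so that four digits sit before the decimal point, truncate,
% and lower the exponent by 3 to compensate (assumes x > 0)
fmt4 = @(x) sprintf('%de%03d', floor(x / 10^(orderof(x) - 3)), orderof(x) - 3);

disp(fmt4(dt))
disp(fmt4(123.456))
```

Truncating with floor matches the 9765e-08 requested in the question; rounding would give 9766e-08 instead.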
Reading your question,
The punctuation mark dot presents me with problems, some systems mistake the format of the file.
it seems to me that your actual problem is that when you build the file name using, for example
figStr = sprintf('NS2d_dt%.2e',dt);
you get
figStr = NS2d_dt9.77e-05
and then, when you use that string as the file name, the text after the . is interpreted as the extension and .pdf is not appended, so in Explorer you cannot open the file by double-clicking on it.
Considering that changing the representation of the number dt from 9.765e-05 to 9765e-08 seems quite weird, you can try the following approach:
use the print function to save your figure as .pdf
add .pdf to the format specifier
This should give you both the right file extension and the right format for the dt value.
peaks
figStr = sprintf('NS2d_dt_%.2e.pdf',dt);
print(gcf,'-dpdf', figStr )
Hope this helps.
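If the goal is only to keep the dot out of the file name while leaving the digits untouched, another option, not taken from the answers here, is to substitute the dot in the formatted string. A minimal sketch; the replacement character 'p' is an arbitrary assumption:

```matlab
dt = 9.765625e-05;

% Format as usual, then replace the decimal point by a harmless character
figStr = strrep(sprintf('NS2d_dt%.2e', dt), '.', 'p');

disp(figStr)   % e.g. NS2d_dt9p77e-05
```

The resulting string contains no dot, so saveas(fig, figStr, 'pdf') can append the .pdf extension itself.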
figStr = sprintf('NS2d_dt%1.4e',dt)
figStr =
NS2d_dt9.7656e-05
Specify the number (1.4 here) as MinimumFieldWidth (dot) DigitsAfterDecimal: with %e, the part after the dot sets how many digits follow the decimal point.
Regarding your request:
A = num2str(dt); %// convert to string
B = A([1 3 4 5]); %// extract first four digits
C = A(end-2:end); %// extract power
fspec = 'NS2d_dt%de%d'; %// format spec
sprintf(fspec ,str2num(B),str2num(C)-3)
NS2d_dt9765e-8
I have a homework program I have run into a problem with. We basically have to take a word (such as MATLAB) and have the function give us the correct score value for it using the rules of Scrabble. There are other things involved such as double word and double point values, but what I'm struggling with is converting to ASCII. I need to get my string into ASCII form and then sum up those values. We only know the bare basics of strings and our teacher is pretty useless. I've tried converting the string into numbers, but that's not exactly working out. Any suggestions?
function[score] = scrabble(word, letterPoints)
doubleword = '#';
doubleletter = '!';
doublew = [findstr(word, doubleword)]
trouble = [findstr(word, doubleletter)]
word = char(word)
gameplay = word;
ASCII = double(gameplay)
score = lower(sum(ASCII));
Building on Francis's post, what I would recommend you do is create a lookup array. You can certainly convert each character into its ASCII equivalent, but then what I would do is have an array where the input is the ASCII code of the character you want (with a bit of modification), and the output will be the point value of the character. Once you find this, you can sum over the points to get your final point score.
I'm going to leave out double points, double letters, blank tiles and that whole gamut of fun stuff in Scrabble for now in order to get what you want working. By consulting Wikipedia, this is the point distribution for each letter encountered in Scrabble.
1 point: A, E, I, O, N, R, T, L, S, U
2 points: D, G
3 points: B, C, M, P
4 points: F, H, V, W, Y
5 points: K
8 points: J, X
10 points: Q, Z
What we're going to do is convert your word into lower case to ensure consistency. Now, if you take a look at the letter a, this corresponds to ASCII code 97. You can verify that by using the double function we talked about earlier:
>> double('a')
97
As there are 26 letters in the alphabet, this means that going from a to z should go from 97 to 122. Because MATLAB starts indexing arrays at 1, what we can do is subtract 96 from each of our characters so that we obtain the numerical position of each character, from 1 to 26.
Let's start by building our lookup table. First, I'm going to define a whole bunch of strings. Each string denotes the letters that are associated with each point in Scrabble:
string1point = 'aeionrtlsu';
string2point = 'dg';
string3point = 'bcmp';
string4point = 'fhvwy';
string5point = 'k';
string8point = 'jx';
string10point = 'qz';
Now, we can take each of the strings, convert it to double, subtract 96, then assign the point value to each of the corresponding locations. Let's create our lookup table like so:
lookup = zeros(1,26);
lookup(double(string1point) - 96) = 1;
lookup(double(string2point) - 96) = 2;
lookup(double(string3point) - 96) = 3;
lookup(double(string4point) - 96) = 4;
lookup(double(string5point) - 96) = 5;
lookup(double(string8point) - 96) = 8;
lookup(double(string10point) - 96) = 10;
I first create an array of length 26 through the zeros function. I then figure out where each letter goes and assign to each letter their point values.
Now, the last thing you need to do is take a string, convert it to lower case to be sure, convert each character into its ASCII equivalent, subtract 96, look up the point value of each letter, then sum up the values. If we are given... say... MATLAB:
stringToConvert = 'MATLAB';
stringToConvert = lower(stringToConvert);
ASCII = double(stringToConvert) - 96;
value = sum(lookup(ASCII));
Lo and behold... we get:
value =
10
The last line of the above code is crucial. Basically, ASCII will contain a bunch of indexing locations where each number corresponds to the numerical position of where the letter occurs in the alphabet. We use these positions to look up what point / score each letter gives us, and we sum over all of these values.
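The steps above can be folded into one reusable function handle. This is a minimal sketch: the name scoreWord is hypothetical, the table is the same letter values written out in alphabetical order, and the input is assumed to contain only the letters A to Z (no '#' or '!' markers).

```matlab
% Point value of each letter, in alphabetical order:
%         a b c d e f g h i j k l m n o p q  r s t u v w x y z
lookup = [1 3 3 2 1 4 2 4 1 8 5 1 3 1 1 3 10 1 1 1 1 4 4 8 4 10];

% Lower-case the word, map a..z to 1..26, look up and sum the points
scoreWord = @(word) sum(lookup(double(lower(word)) - 96));

disp(scoreWord('MATLAB'))   % 10, as in the walkthrough above
```

Any character outside a to z would produce an invalid index, so markers for double letters or double words have to be stripped or handled before this lookup.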
Part #2
The next part where double point values and double words come to play can be found in my other StackOverflow post here:
Calculate Scrabble word scores for double letters and double words MATLAB
Convert from string to ASCII:
>> myString = 'hello, world';
>> ASCII = double(myString)
ASCII =
104 101 108 108 111 44 32 119 111 114 108 100
Sum up the values:
>> total = sum(ASCII)
total =
1160
The MATLAB help for char() says (emphasis added):
S = char(X) converts array X of nonnegative integer codes into a character array. Valid codes range from 0 to 65535, where codes 0 through 127 correspond to 7-bit ASCII characters. The characters that MATLAB® can process (other than 7-bit ASCII characters) depend upon your current locale setting. To convert characters into a numeric array, use the double function.
ASCII chart here.
I have a huge CSV file (more than a few gigabytes) and would like to read it in MATLAB and process each line. Reading the file in its entirety is impossible, so I use this code to read it line by line:
fileName = 'input.txt';
inputfile = fopen(fileName);
while 1
tline = fgetl(inputfile);
if ~ischar(tline)
break
end
end
fclose(inputfile);
This yields a cell array of size (1,1) with the line as a string. What I would like is to convert this cell to a normal array with just the numbers.
For example:
input.csv:
0.0,0.0,3.201,0.192
2.0,3.56,0.0,1.192
0.223,0.13,3.201,4.018
End result in Matlab for the first line:
A = [0.0,0.0,3.201,0.192]
I tried converting tline with double(tline) but this yields completely different results. Also tried using a regex but got stuck there. I got to the point where I split up all values into a different cell in one array. But converting to double with str2double yields only NaNs...
Any tips? Preferably without any loops since it already takes a while to read the entire file.
You are looking for str2num
>> A = '0.0,0.0,3.201,0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
>> A = '0.0 0.0 3.201 0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
>> A = '0.0 0.0 , 3.201 , 0.192';
>> str2num(A)
ans =
0 0 3.2010 0.1920
i.e., it is quite agnostic to the input format.
However, I would not advise this for your use case. For your problem, I'd do
C = dlmread('input.txt', ',', [0 0 0 3]) % for first line; the range [R1 C1 R2 C2] is zero-based
C = dlmread('input.txt',',') % for entire file
or
[a,b,c,d] = textread('input.txt','%f,%f,%f,%f',1) % for first line
[a,b,c,d] = textread('input.txt','%f,%f,%f,%f') % for entire file
if you want all columns in separate variables:
a = 0
b = 0
c = 3.201
d = 0.192
or
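fid = fopen('input.txt','r');
% Use ONE of the following calls; the delimiter must be given for comma-separated data
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ','); % for first line only
C = textscan(fid, '%f %f %f %f', N, 'Delimiter', ','); % for first N lines
C = textscan(fid, '%f %f %f %f', 1, 'Delimiter', ',', 'HeaderLines', N-1); % for Nth line only
fclose(fid);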
all of which are much more easily expandable (things like this, whatever they are, tend to grow bigger over time :). dlmread in particular is much less prone to errors than writing your own parsing code, given the empty lines, missing values and other nuisances that are very common in most data sets.
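Since the file in the question is too large to load at once, the textscan variant can also be called repeatedly on the same file identifier, each call picking up where the previous one stopped. The following is a minimal sketch of that pattern; the small temporary file, the block size and the column-sum "processing" are all stand-ins chosen for illustration.

```matlab
% Write a small sample file (stand-in for the multi-gigabyte CSV)
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%g,%g,%g,%g\n', [0 0 3.201 0.192; 2 3.56 0 1.192; 0.223 0.13 3.201 4.018].');
fclose(fid);

blockSize = 2;            % lines per block (hypothetical; tune to available memory)
rowsRead = 0;
colSums = zeros(1, 4);    % example of per-block processing

fid = fopen(fname, 'r');
while ~feof(fid)
    % Read at most blockSize lines into one numeric matrix
    C = textscan(fid, '%f %f %f %f', blockSize, 'Delimiter', ',', 'CollectOutput', true);
    block = C{1};
    if isempty(block), break; end
    rowsRead = rowsRead + size(block, 1);
    colSums = colSums + sum(block, 1);
end
fclose(fid);
delete(fname);
```

Each block arrives as an ordinary numeric matrix, so no per-line string conversion is needed inside the loop.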
Try
data = dlmread('input.txt',',')
It will do exactly what you want to do.
If you still want to convert a string to a vector (here tline, the line returned by fgetl):
line_data = sscanf(tline,'%g,',inf)
This will read the entire comma-separated string and convert each number.
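As a quick check of the per-line conversion, here is a minimal sketch applied to the first line of the example file. The str2double variant is an addition for comparison; it works here because the line is first split into one cell per number.

```matlab
tline = '0.0,0.0,3.201,0.192';    % first line of the example file

% sscanf returns a column vector; transpose it to get a row like A in the question
A = sscanf(tline, '%g,').';

% Alternative: split at the commas, then convert each piece
B = str2double(strsplit(tline, ','));

disp(A)
```

Calling str2double on the whole unsplit line returns NaN, because the full string is not a single valid number.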
I am trying to read the file with the following format which repeats itself (but I have cut out the data even for the first repetition because of it being too long):
1.00 'day' 2011-01-02
'Total Velocity Magnitude RC - Matrix' 'm/day'
0.190189 0.279141 0.452853 0.61355 0.757833 0.884577
0.994502 1.08952 1.17203 1.24442 1.30872 1.36653
1.41897 1.46675 1.51035 1.55003 1.58595 1.61824
Download the actual file with the complete data here
This is my code which I am using to read the data from the above file:
fid = fopen(file_name); % open the file
dotTXT_fileContents = textscan(fid,'%s','Delimiter','\n'); % read it as string ('%s') into one big array, row by row
dotTXT_fileContents = dotTXT_fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing 'Total Velocity Magnitude RC - Matrix' 'm/day'
data_starts = strmatch('''Total Velocity Magnitude RC - Matrix'' ''m/day''',...
dotTXT_fileContents); % data_starts contains the line numbers wherever 'Total Velocity Magnitude RC - Matrix' 'm/day' is found
ndata = length(data_starts); % total no. of data values will be equal to the corresponding no. of '** K' read from the .txt file
%# loop through the file and read the numeric data
for w = 1:ndata-1
%# read lines containing numbers
tmp_str = dotTXT_fileContents(data_starts(w)+1:data_starts(w+1)-3); % stores the content from file dotTXT_fileContents of the rows following the row containing 'Total Velocity Magnitude RC - Matrix' 'm/day' in form of string
%# convert strings to numbers
tmp_str = tmp_str{:}; % store the content of the string which contains data in form of a character
%# assign output
data_matrix_grid_wise(w,:) = str2num(tmp_str); % convert the part of the character containing data into number
end
To give you an idea of pattern of data in my text file, these are some results from the code:
data_starts =
2
1672
3342
5012
6682
8352
10022
ndata =
7
Therefore, my data_matrix_grid_wise should contain 1672-2-2-1(for a new line)=1667 rows. However, I am getting this as the result:
data_matrix_grid_wise =
Columns 1 through 2
0.190189000000000 0.279141000000000
0.423029000000000 0.616590000000000
0.406297000000000 0.604505000000000
0.259073000000000 0.381895000000000
0.231265000000000 0.338288000000000
0.237899000000000 0.348274000000000
Columns 3 through 4
0.452853000000000 0.613550000000000
0.981086000000000 1.289920000000000
0.996090000000000 1.373680000000000
0.625792000000000 0.859638000000000
0.547906000000000 0.743446000000000
0.562903000000000 0.759652000000000
Columns 5 through 6
0.757833000000000 0.884577000000000
1.534560000000000 1.714330000000000
1.733690000000000 2.074690000000000
1.078000000000000 1.277930000000000
0.921371000000000 1.080570000000000
0.934820000000000 1.087410000000000
Where am I wrong? In my final result, I should get data_matrix_grid_wise composed of 10000 elements instead of 36 elements. Thanks.
Update: How can I include the number before 'day' i.e. 1,2,3 etc. on a line just before the data_starts(w)? I am using this within the loop but it doesn't seem to work:
days_str = dotTXT_fileContents(data_starts(w)-1);
days_str = days_str{1};
days(w,:) = sscanf(days_str(w-1,:), '%d %*s %*s', [1, inf]);
The problem is in the line tmp_str = tmp_str{:}; MATLAB behaves in a surprising way when a cell array is expanded like this. A short solution is to replace the last two statements of the loop with the next two lines:
y = cell2mat( cellfun(@(z) sscanf(z,'%f'),tmp_str,'UniformOutput',false));
data_matrix_grid_wise(w,:) = y;
The problem is with the last 2 statements. When you do tmp_str{:} you convert the cell array to a comma-separated list of strings. If you assign this list to a single variable, only the first string is assigned. So tmp_str will then hold only the first row of data.
Here is what you can do instead of the last 2 lines:
tmp_mat = cellfun(@str2num, tmp_str, 'uniformoutput',0);
data_matrix_grid_wise(w,:) = cell2mat(tmp_mat);
However, you will have a problem with the concatenation (cell2mat), since not all of your rows have the same number of columns. How to solve that is up to you.
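One way around the unequal row lengths, offered as a sketch rather than as the answerers' method, is to join all the row strings of a block into one string and let sscanf read every number in a single pass. The sample rows below are hypothetical and deliberately ragged.

```matlab
% Hypothetical block of row strings with unequal numbers of values
tmp_str = {'0.190189 0.279141 0.452853'; '0.994502 1.08952'};

% Join the rows with spaces, then parse every number at once
joined = sprintf('%s ', tmp_str{:});
values = sscanf(joined, '%f').';   % 1-by-N row vector

disp(numel(values))
```

This gives one row vector per block, which fits data_matrix_grid_wise(w,:) as long as every block holds the same total number of values.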