Igor Pro 8, functions for comparing strings - string

Hi I'm pretty new to using Igor Pro. I'm looking for some help on writing a procedure for a task.
I have 4 waves, two are text waves and two are numeric waves(one of which has no data yet). I need to write a function that will compare the two text waves and if they are equal, have igor pull data from one of the numeric waves and put it in the correct point to match the text wave it's coupled with.
To make it visually conceptually
twave1 twave2
nwave1 nwave2
twave1 is a list of all isotopes up to neptunium but they're not in order, and nwave1 is their corresponding mass values. (both on table1)
twave2 is the same list of isotopes but ordered properly (i.e. 1H, 2H, 3H, 4H...3He, 4He...ect) and nwave2 is empty (both on table2)
so the goal is to create a function that will sort through twave1 and twave2, and if they match, pull the data from nwave1 into nwave2, so that the masses match with the correct isotopes on table2. So table2 will have the correctly ordered isotopes and now the mass data as well, in the correct places.
Any help would be greatly appreciated; this is where I've gotten so far
function assignMEf()
wave ME, ME_FRIB
wave isotope_FRIB, isotope
variable len = numpnts(ME)
variable i, j
variable ME_current, iso_current
for(i=0; i<len; i+=1)
ME_current = ME[i]
iso_current = isotope[i]
for(j=0; j<4254; j+=1)
if(iso_current == isotope_frib[j])
ME_frib = ME[i]
endif
endfor
endfor
end

If I understood correctly, the two waves you want at the end are isotope and ME. Your code was close to working, however you need to tell Igor when you declare a text wave that it is a text wave, by using the /t flag. I simplified the code a bit more:
function assignMEf()
wave ME, ME_FRIB
wave/t isotope, isotope_FRIB
variable len = numpnts(ME)
variable i, j
for(i = 0; i < len; i += 1)
for(j = 0; j < len; j += 1)
if(stringmatch(isotope[i],isotope_frib[j]))
ME[i] = ME_FRIB[j]
endif
endfor
endfor
end
This code is not very robust but works for what you'd like to do.
To test the code, here is my MWE:
•Make/O/N=(10) ME_FRIB = p
•Make/O/N=(10) ME = NaN
•Make/O/N=(10)/T isotope_FRIB = "iso" + num2str(10 - p)
•Duplicate/O isotope_FRIB,isotope
•Sort isotope,isotope
•Edit isotope_FRIB,ME_FRIB,isotope,ME
•assignmef()

I don't think stringmatch is the right choice here. It uses wildcard matching but the OP AFAIU wants match/no-match so !cmpstr is a better choice.

Related

Error: "can't multiply sequence by non-int of type 'float'"

It says the error is when h (altitude) is between 11000 and 25000, so I only posted the initial stuff outside all my if loops and the specific loop where the problem is happening. Here is my code:
import math;
T = 0.0;
P = 0.0;
hString = ("What is the altitude in meters?");
h = int(hString);
e = math.exp(0.000157*h);
elif 11000 < h < 25000:
T = -56.46;
P = (22.65)*[(1.73)-e];
When you use mathematical operations you need to be careful with brackets.
P = (22.65)*((1.73)-e); #will be right way of using
[ ] using will create a list which you, do not need in this program.
Here is a link which will help you learn much more about type conversions and proper use of brackets while doing mathematics on it.
Also in your code you have not used
hString =input ("What is the altitude in meters?");
h = int(hString);
input will allow you to take value from user and then int(your_input) will help you convert to integer
The square brackets in the last line ([(1.73)-e]) create a list. In this case, it's a list with one element, namely (1.73)-e. I imagine you intended those to be parens. Make that change and it will work.
The final line becomes:
P = (22.65)*((1.73)-e);

Can you group data with similar written column headings on xlswrite, matlab?

Very new to matlab and still learning the basics. I'm trying to write a script which calculates the distance between two peaks in a waveform. That part I have managed to do, and I have used xlswrite to put the values I have obtained onto an excel file.
For each file, I have between about 50-250 columns, with just two rows: the second row has the numerical value, and the first has the column headings, copied from original excel files I extracted the data from.
Some of the columns have similar, but not identical, headings, e.g. 'green227RightEyereading3' and 'green227RightEyereading4' etc. Is there a way I can group columns with similar headings, e.g. which have the same number/colour in the heading (I.e.green227) and either 'right eye' or 'left eye', and calculate an average of their numerical values? Link to file here: >https://www.dropbox.com/s/ezpyjr3raol31ts/SampleBatchForTesting.xls?dl=0>
>[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File','C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
>[~,name,ext] = fileparts(Excel_file);
>sheet = 2;
>FullXLSfile = [PathName, Excel_file];
>[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
>HowManyWide = size(txt_data);
>NumberOfTitles = HowManyWide(1,2);
>xlRangeA = txt_data;
>Chickens = {'Test'};
>for f = 1:xlRangeA; %%defined as top line of cells on sheet;
>Text = xlRangeA{f};
>HyphenLocations = find(Text == '-');
>R = HyphenLocations(1,1) -1;
>Chick = Text(1:R);
>Chick = cellstr(Chick);
>B = length(Chick);
>TF = strncmp(Chickens,Chick,B);
>if any(TF == 1); %do nothing
>else
>Chickens = {Chickens;Chick};
>end
>end
Here also is a link to the file that is created when I run my entire script. The values below the headings are the calculated thickesses of the tissue I'm analysing. https://www.dropbox.com/s/4p6iu9kk75ecyzl/Choroid_Thickness.xls?dl=0
Thanks very much
If the different characters are located at the very end (or the very beginning) of the heading, you can go with strncmp buit-in function and compare only part of the string. See more here. But please, provide some code and a part of your excel file. It would help.
Also, if I am not mistaken, you are saving all the data into excel and then re-call it again in order to sort it. Maybe you should consider saving only the final result in excel, it will save you some time, especially if you want to run your script many times.
EDIT:
Here is the code I came up with. It is not the best possible solution for sure, but it works with the file you uploaded. I have omitted the unnecessary lines and variables. The code works only if the numbers of each reading have the same amount of digits. They can be 4 digits as long as every entry has 4 digits. Since in each file you have waves of the same color, the only thing that you care about is whether the reading was recorded with the left or the right eye (correct?). Based on that and the code you wrote, the comparison concerns the part of the string that contains the words "Right" or "Left", i.e. the characters between the hyphens.
[Excel_file,PathName] = uigetfile('*.xls', 'Pick a File',...
'C:\Users\User\Documents\Optometry\Year 3\Dissertation\A-scan3');
sheet = 1;
FullXLSfile = [PathName,Excel_file];
[number_data,txt_data,raw_data] = xlsread(FullXLSfile,sheet);
%% data manipulation
NumberOfTitles = length(txt_data);
TextToCompare = txt_data{1};
r1 = 1; % counter for Readings1 vector
r2 = 1; % counter for Readings2 vector
for ff = 1:NumberOfTitles % in your code xlRangeA is a cell vector not a number!
Text = txt_data{ff};
HyphenLocations = find(Text == '-');
Text = Text(HyphenLocations(1,1):HyphenLocations(1,2)); % take only the part that contains the "eye" information
TextToCompare = TextToCompare(HyphenLocations(1,1):HyphenLocations(1,2)); % same here
if (Text == TextToCompare)
Readings1(r1) = number_data(ff); % store the numerical value in a vector
r1 = r1 + 1; % increase the counter of this vector
else
Readings2(r2) = number_data(ff); % same here
r2 = r2 + 1;
end
TextToCompare = txt_data{1}; % TextToCompare re-initialized for the next comparison
end
mean_readings1 = mean(Readings1); % Find the mean of the grouped values
mean_readings2 = mean(Readings2);
I am positive that this can be done in a more efficient and delicate way. I don't know exactly what kind of calculations you want to do so I only included the mean values as an example. Inside the if statement you can also store the txt_data if you need it. Below I have also included a second way which I find more delicate. Just substitute the %%data manipulation part with the part below if you want to test it:
%% more delicate way
Text_Vector = char(txt_data);
TextToCompare2 = txt_data{1};
HyphenLocations2 = find(TextToCompare2 == '-');
TextToCompare2 = TextToCompare2(HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = Text_Vector(:,HyphenLocations2(1,1):HyphenLocations2(1,2));
Text_Vector = cellstr(Text_Vector);
dummy = strcmpi(Text_Vector,TextToCompare2);
Readings1 = number_data(dummy);
Readings2 = number_data(~dummy);
I hope this helps.

Finding location of specified substring in a specified string (MATLAB)

I have a simple question that I need help on. My code,I believe, is almost complete but im having trouble with the a specific line of code.
I have an assignment question (2 parts) that asks me to find whether a protein (string), has the specified motif (substring) at that particular location (location). This is the first part, and the function and code looks like this:
function output = Motif_Match(motif,protein,location)
%This code wil print a '1' if the motif occurs in the protein starting
at the given location, else it wil print a '0'
for k = 1:location %Iterates through specified location
if protein(1, [k, k+1]) == motif; % if the location matches the protein and motif
output = 1;
else
output = 0;
end
end
This part I was able to get correctly, and example of this is as follows:
p = 'MGNAAAAKKGN'
m = 'GN'
Motif_Match(m,p,2)
ans =
1
The second part of the question, which I am stuck on, is to take the motif and protein and return a vector containing the locations at which the motif occurs in the protein. To do this, I am using calls to my previous code and I am not supposed to use any functions that make this easy such as strfind, find, hist, strcmp etc.
My code for this, so far, is:
function output = Motif_Find(motif,protein)
[r,c] = size(protein)
output = zeros(r,c)
for k = 1:c-1
if Motif_Match(motif,protein,k) == 1;
output(k) = protein(k)
else
output = [];
end
end
I belive something is wrong at line 6 of this code. My thinking on this is that I want the output to give me the locations to me and that this code on this line is incorrect, but I can't seem to think of anything else. An example of what should happen is as follows:
p = 'MGNAAAAKKGN';
m = 'GN';
Motif_Find(m,p)
ans =
2 10
So my question is, how can I get my code to give me the locations? I've been stuck on this for quite a while and can't seem to get anywhere with this. Any help will be greatly appreciated!
Thank you all!
you are very close.
output(k) = protein(k)
should be
output(k) = k
This is because we want just the location K of the match. Using protien(k) will gives us the character at position K in the protein string.
Also the very last thing I would do is only return the nonzero elements. The easiest way is to just use the find command with no arguments besides the vector/matrix
so after your loop just do this
output = find(output); %returns only non zero elements
edit
I just noticed another problem output = []; means set output to an empty array. this isn't what you want i think what you meant was output(k) = 0; this is why you weren't getting the result you expected. But REALLY since you already made the whole array zeros, you don't need that at all. all together, the code should look like this. I also replaced your size with length since your proteins are linear sequences, not 2d matricies
function output = Motif_Find(motif,protein)
protein_len = length(protein)
motif_len = length(motif)
output = zeros(1,protein_len)
%notice here I changed this to motif_length. think of it this way, if the
%length is 4, we don't need to search the last 3,2,or 1 protein groups
for k = 1:protein_len-motif_len + 1
if Motif_Match(motif,protein,k) == 1;
output(k) = k;
%we don't really need these lines, since the array already has zeros
%else
% output(k) = 0;
end
end
%returns only nonzero elements
output = find(output);

Matlab, order cells of strings according to the first one

I have 2 cell of strings and I would like to order them according to the first one.
A = {'a';'b';'c'}
B = {'b';'a';'c'}
idx = [2,1,3] % TO FIND
B=B(idx);
I would like to find a way to find idx...
Use the second output of ismember. ismember tells you whether or not values in the first set are anywhere in the second set. The second output tells you where these values are located if we find anything. As such:
A = {'a';'b';'c'}
B = {'b';'a';'c'}
[~,idx] = ismember(A, B);
Note that there is a minor typo when you declared your cell arrays. You have a colon in between b and c for A and a and c for B. I placed a semi-colon there for both for correctness.
Therefore, we get:
idx =
2
1
3
Benchmarking
We have three very good algorithms here. As such, let's see how this performs by doing a benchmarking test. What I'm going to do is generate a 10000 x 1 random character array of lower case letters. This will then be encapsulated into a 10000 x 1 cell array, where each cell is a single character array. I construct A this way, and B is a random permutation of the elements in A. This is the code that I wrote to do this for us:
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, ones(10000,1), 1);
B = A(randperm(10000));
Now... here comes the testing code:
clear all;
close all;
letters = char(97 + (0:25));
rng(123); %// Set seed for reproducibility
ind = randi(26, [10000, 1]);
lettersMat = letters(ind);
A = mat2cell(lettersMat, 1, ones(10000,1));
B = A(randperm(10000));
tic;
[~,idx] = ismember(A,B);
t = toc;
fprintf('ismember: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~,idx] = max(bsxfun(#eq,char(A),char(B)'));
t = toc;
fprintf('bsxfun: %f\n', t);
clear idx; %// Make sure test is unbiased
tic;
[~, indA] = sort(A);
[~, indB] = sort(B);
idx = indB(indA);
t = toc;
fprintf('sort: %f\n', t);
This is what I get for timing:
ismember: 0.058947
bsxfun: 0.110809
sort: 0.006054
Luis Mendo's approach is the fastest, followed by ismember, and then finally bsxfun. For code compactness, ismember is preferred but for performance, sort is better. Personally, I think bsxfun should win because it's such a nice function to use ;).
This seems to be significantly faster than using ismember (although admittedly less clear than #rayryeng's answer). With thanks to #Divakar for his correction on this answer.
[~, indA] = sort(A);
[~, indB] = sort(B);
idx = indA(indB);
I had to jump in as it seems runtime performance could be a criteria here :)
Assuming that you are dealing with scalar strings(one character in each cell), here's my take that works even when you have not-commmon elements between A and B and uses the very powerful bsxfun and as such I am really hoping this would be runtime-efficient -
[v,idx] = max(bsxfun(#eq,char(A),char(B)'));
idx = v.*idx
Example -
A =
'a' 'b' 'c' 'd'
B =
'b' 'a' 'c' 'e'
idx =
2 1 3 0
For a specific case when you have no not-common elements between A and B, it becomes a one-liner -
[~,idx] = max(bsxfun(#eq,char(A),char(B)'))
Example -
A =
'a' 'b' 'c'
B =
'b' 'a' 'c'
idx =
2 1 3

Sorting strings in MATLAB like Windows 7 sorts filenames in explorer (respecting numerals mid-string)

What is the best way to sort strings in MATLAB, respecting numerals that could be present mid-string?
The following example illustrates my problem. 401 is a numerically higher value than 6. Therefore, the Ie401sp2 string should be listed after the Ie6 string when sorting in ascending order. In this example, note how the following strings, which contain numerals, are sorted.
---Matlab--- (Not sorting the way I want)
Ie4_01
Ie4_128
Ie401sp2
Ie5
Ie501sp2
Ie6
---Windows 7--- (The way I want MATLAB to sort)
Ie4_01
Ie4_128
Ie5
Ie6
Ie401sp2
Ie501sp2
Windows 7 respects the relative values of the numerals that appear mid-string. What is the best way to do this in Matlab? I am trying to avoid going off on a minor tangent to reinvent the wheel.
This is a bit of a hacky version, but it roughly works:
function x = sortit(x)
% get a sortable version of each element of x
hacked_x = cellfun( #iSortVersion, x, 'UniformOutput', false );
% sort those, discard the sorted output
[~, idx] = sort( hacked_x );
% re-order input by sorted order.
x = x(idx);
end
% convert a string with embedded numbers into a sortable string
function hack = iSortVersion( in )
pieces = regexp( in, '(\d+|[^\d]+)', 'match' );
pieces = cellfun( #iZeroPadNumbers, pieces, 'UniformOutput', false );
hack = [ pieces{:} ];
end
% if a string containing a number, pad with lots of zeros
function nhack = iZeroPadNumbers( in )
val = str2double(in);
if isnan(val)
nhack = in;
else
nhack = sprintf( '%030d', val );
end
end

Resources