MATLAB: Split strings into numeric values

Hello,
I have a small problem.
I have a text file of over 200 MB.
It looks like this:
%Hello World
%second sentences
%third;
%example
12.02.2014
;-400;-200;200
;123;233;434
%Hello World
%second sentences
%third
%example
12.02.2014
;-410;200;20300
;63;23;43
;23;44;78213
..
... ...
I only need the values after the semicolons, like:
Value1{1,1}=[-400]; Value1{1,2}=[-200]; and Value1{1,3}=[200]
Value2{1,1}=[123]; Value2{1,2}=[233]; and Value2{1,3}=[434]
and so on.
Does anyone have an idea how I can split the values into a cell array or vector?
So the variables should be:
Var1 = [-400 -200 200;
        123 233 434];
Var2 = [-410 200 20300;
        63 23 43;
        23 44 78213];
I want to start a new variable after every date. For example, if there are 55 dates, I should end up with 55 variables.

This could be one approach, assuming uniformly structured data (3 valid numbers per row):
%// Read in entire text data into a cell array
data = importdata('sample.txt','');
%// Remove empty lines
data = data(~cellfun('isempty',data))
%// Find boundaries based on delimiter "%example"
exmp_delim_matches = arrayfun(@(n) strcmp(data{n},'%example'),1:numel(data))
%// Use numel+1 as the last bound so that the final data line is included
bound_idx = [find(exmp_delim_matches) numel(exmp_delim_matches)+1]
%// Find lines that start with delimiter ";"
matches_idx = find(arrayfun(@(n) strcmp(data{n}(1),';'),1:numel(data)))
%// Split the selected lines based on the delimiter ";"
split_data = regexp(data(matches_idx),';','split')
%// Collect all cells data into a 1D cell array
all_data = [split_data{:}]
%// Select only non-empty cells and convert them to a numeric array.
%// Finally reshape into a format with 3 numbers per row as final output
out = reshape(str2double(all_data(~cellfun('isempty',all_data))),3,[]).'
%// Separate out lines based on the earlier set bounds
out_sep = arrayfun(@(n) out(matches_idx>bound_idx(n) & ...
    matches_idx<bound_idx(n+1),:),1:numel(bound_idx)-1,'Uni',0)
%// Display results for verification
celldisp(out_sep)
Output:
out_sep{1} =
-400 -200 200
123 233 434
out_sep{2} =
-410 200 20300
63 23 43
23 44 78213
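For comparison, here is a minimal Python sketch of the same grouping idea (Python is used only for illustration; the answer itself is MATLAB). It assumes that each block starts at a date line in dd.mm.yyyy form, rather than at the %example line used above, and that every data line starts with ;. The helper name parse_blocks is hypothetical.

```python
import re

# Assumption: a new block starts at any line that is a date in dd.mm.yyyy form.
DATE_LINE = re.compile(r"^\d{2}\.\d{2}\.\d{4}$")

def parse_blocks(lines):
    """Group ';'-prefixed data lines into one list of rows per date block."""
    blocks = []
    for raw in lines:
        line = raw.strip()
        if DATE_LINE.match(line):
            blocks.append([])  # start a new block at each date
        elif line.startswith(";") and blocks:
            # split on ';' and drop the empty token produced by the leading ';'
            row = [float(tok) for tok in line.split(";") if tok]
            blocks[-1].append(row)
    return blocks

sample = """%Hello World
%second sentences
%third;
%example
12.02.2014
;-400;-200;200
;123;233;434
%Hello World
%second sentences
%third
%example
12.02.2014
;-410;200;20300
;63;23;43
;23;44;78213"""

for i, block in enumerate(parse_blocks(sample.splitlines()), start=1):
    print(f"Var{i} =", block)
```

For a 200 MB file, passing the open file object (for example parse_blocks(open("sample.txt"))) instead of a list lets Python read the lines one at a time rather than loading everything into memory.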

A brute-force approach would be to open your file and read it one line at a time. For each line, check whether the first character is a semicolon; if it is, split the rest of the line (from the second character to the end) on the ; delimiter. This gives you a cell array of strings, which you then convert into an array of numbers. Because each line will probably contain a different number of values, store each line's array in its own cell of a cell array. Assuming your text file is called text.txt, do something like this:
fid = fopen('text.txt');
if fid == -1
    error('Cannot open file');
end
nums = {};
while true
    st = fgetl(fid);
    if ~ischar(st) %// fgetl returns -1 (not a string) at end of file
        break;
    end
    if ~isempty(st) && st(1) == ';'
        st_split = strsplit(st(2:end), ';');
        arr = cellfun(@str2double, st_split); %// one number per token
        nums{end+1} = arr; %// each line's numbers go into their own cell
    end
end
fclose(fid);
Let's go through the above code slowly. We first use fopen to open the file for reading. If the ID returned by fopen is -1, the file could not be found or opened, so we throw an error. Next, we declare an empty cell array called nums, which will store the numbers we get while parsing the text file.
Then, until we reach the end of the file, we read one line of text at a time, from the top of the file down, using fgetl. At the end of the file fgetl returns -1 instead of a character string, so when ischar fails we leave the loop. Otherwise, we check whether the line is non-empty and its first character is ;. If it is, we take the second character through the end of the line and split it on the ; character with strsplit. The result is a cell array of strings, where each element is the string representation of one of your numbers. To convert this cell array back into a numeric array, apply str2double to each element. You could loop over the cells, or you can conveniently use cellfun (http://www.mathworks.com/help/matlab/ref/cellfun.html) to go through each element and convert its string representation into a numeric value. The output of cellfun is a numeric array holding every value delimited by ; on that line. We then place this array into a new cell at the end of nums. Finally, fclose closes the file.
At the end, nums holds one numeric array per data line, which is what you are looking for.
Warning
This code assumes that every line starting with ; contains only numbers separated by ; characters. If that is not the case in your file, the code will not work as intended.
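The same line-by-line loop, sketched in Python for illustration (the function name read_semicolon_rows is hypothetical). As in the MATLAB version, it assumes every line that starts with ; holds only numbers separated by ;, so a trailing ; (an empty token) would raise a ValueError here.

```python
import io

def read_semicolon_rows(fh):
    """Return one list of numbers per line that starts with ';' (like nums above)."""
    nums = []
    for line in fh:                      # reads one line at a time, like fgetl
        line = line.strip()
        if line.startswith(";"):         # empty lines are skipped safely
            tokens = line[1:].split(";") # drop the leading ';', then split
            nums.append([float(t) for t in tokens])
    return nums

# io.StringIO stands in for open("text.txt") so the sketch runs without a file.
demo = io.StringIO("%example\n12.02.2014\n;-400;-200;200\n\n;123;233;434\n")
print(read_semicolon_rows(demo))
```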

Related

Excel: Find only hexadecimal values in a cell

I'm a newbie in Excel.
I have a list of names, some of which end with hexadecimal values and some of which don't.
My goal is to see only the names that end with hexadecimal values (maybe by filtering out the others).
Column:
BFAXSPOINTDEVBAUHOFLAN2AD
BFAXSQLBAUHOFLAN207
BFAXSQLDEVBAUHOFLAN27A
BFREPDEVBAUHOFLAN258
BFREPORTINGBAUHOFLAN20B
COBALTSEA02900
COBALTSEAVHOST900
DIRECTO8000
DIRECTO9000
DIRECTODCDIRECTOLA009
DYNAMAEBSSISE006
SURVEYEBSSISE006
KVMSRV00
KVMSRV01
KVMSRV02
ASR
CACTI
DBSYNC
DTV
and so on...
The HEX2DEC function will help you achieve what you want: it attempts to convert a hexadecimal number into a decimal. If the input is not valid hex, it produces an error.
The key is knowing how many digits you expect your hex value to have: is it the last 5 characters, the last 10, etc.? Also note that there is a risk that random text or numbers will be treated as hexadecimal when that's not really what they represent [but that's a problem with the question as you have laid it out; going solely on the text provided, all we can check is whether a particular cell yields a valid hexadecimal].
The full formula would look like this [assuming your data starts in A1 and your hexadecimal numbers are expected to be 6 characters long; this goes in B1 and is copied down]:
=ISERROR(HEX2DEC(RIGHT(A1,6)))
This takes the 6 rightmost characters of a cell and attempts to convert them from hex to decimal. If that fails, the formula returns TRUE [because of ISERROR]; if it succeeds, it returns FALSE.
Then filter column B on FALSE to see only the names whose last 6 characters are valid hex.
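A rough Python equivalent of the check, useful for seeing how the chosen width changes the result. It returns True where the formula above would return FALSE (it is the negation of the ISERROR result), and it assumes HEX2DEC accepts the digits 0-9 and the letters A-F in either case. The function name tail_is_hex is hypothetical.

```python
import re

HEX_RE = re.compile(r"[0-9A-Fa-f]+")

def tail_is_hex(name, width=6):
    """True if the last `width` characters are valid hex, i.e. NOT ISERROR(HEX2DEC(RIGHT(...)))."""
    tail = name[-width:]  # like RIGHT(A1, width); shorter names are taken whole
    return HEX_RE.fullmatch(tail) is not None

names = ["BFAXSQLBAUHOFLAN207", "BFREPORTINGBAUHOFLAN20B", "DIRECTO9000", "CACTI", "ASR"]
for n in names:
    print(n, tail_is_hex(n, width=3), tail_is_hex(n, width=6))
```

With width=6, BFAXSQLBAUHOFLAN207 fails because its last six characters are LAN207; with width=3 it passes, which is why the expected length of the hex suffix matters.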
Consider the following UDF:
Public Function EndsInHex(r As Range) As Boolean
    Dim s As String, CH As String
    s = r(1).Text
    CH = Right(s, 1)
    If CH Like "[A-F]" Or CH Like "[0-9]" Then
        EndsInHex = True
    Else
        EndsInHex = False
    End If
End Function
For a string to end in a hex value, its last character must be a hex digit (0-9 or A-F). Note that this checks only that one character.
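The UDF's rule is simple enough to mirror in a few lines of Python, which makes its main limitation easy to see: only the last character is examined. The name ends_in_hex is hypothetical, and the uppercase-only check assumes VBA's default Option Compare Binary, under which Like is case-sensitive.

```python
HEX_DIGITS_UPPER = "0123456789ABCDEF"

def ends_in_hex(s):
    """Mirror of the EndsInHex UDF: True if the last character is 0-9 or A-F."""
    # An empty string has no last character, so it cannot end in hex.
    return len(s) > 0 and s[-1] in HEX_DIGITS_UPPER

for name in ["BFREPORTINGBAUHOFLAN20B", "DIRECTO9000", "CACTI", "DBSYNC", "DTV"]:
    print(name, ends_in_hex(name))
```

For example, DBSYNC comes out True because C is a hex digit, even though that name has no hex suffix.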

How to print a number within a string in matlab

I would like to use the command text to type numbers within 57 hexagons. I want to use a loop:
for mm=1:57
text(x(m),y(m),'m')
end
where x(m) and y(m) are the coordinates of the text .
The script above types the string "m" and not the value of m. What am I doing wrong?
Jubobs pretty much told you how to do it: use the num2str function. BTW, there is a small typo in your for loop; you meant to use mm:
for mm=1:57
    text(x(mm),y(mm),num2str(mm));
end
The reason I decided to post an answer at all is that you can also do this vectorized, without a loop. What you can do is place each number into a character array where each row holds one number, and then use text to print all of the numbers at once.
m = sprintfc('%2d', 1:57);
d = reshape([m{:}], 2, 57).';
text(x, y, d);
The (undocumented!) function sprintfc takes a formatting specifier and an array and creates a cell array of strings, where each cell is the string version of the corresponding element of the array you supply. To ensure that the character array has the same number of columns in every row, I make each string take up 2 characters, so any number less than 10 gets a leading blank space. I then convert the cell array of strings into a character array by unpacking it into a comma-separated list of strings, concatenating them, and reshaping the result so that each row holds one label. Finally, I call text with all of the x and y pairs and the corresponding labels in d, which places them all on the screen together.
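The fixed-width trick can be checked outside MATLAB. This Python sketch (illustration only) builds the same right-aligned two-character labels as sprintfc('%2d', 1:57), joins them like [m{:}], and cuts the result back into rows like reshape(..., 2, 57).'; the round trip only works because every label has the same width.

```python
# Right-aligned, width-2 labels, like sprintfc('%2d', 1:57).
labels = [f"{n:2d}" for n in range(1, 58)]

# Like [m{:}]: one long row of 114 characters.
flat = "".join(labels)

# Like reshape(flat, 2, 57).': cut back into 57 rows of 2 characters each.
rows = [flat[i:i + 2] for i in range(0, len(flat), 2)]

print(rows[:3], rows[-1])
```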

How to extract excel column and import it to MATLAB?

Hello, I have an Excel file with columns up to "CD". My code works perfectly for Excel files with 26 columns, but beyond that it doesn't work.
[ia ib] = ismember(header, {item});
letter = find(ia)+'A'-1;
cell = fprintf('%c:%c', letter, letter);
out = xlsread('filename', cell);
This code works until I get to Z:Z. When I get to AA, AB, AC, ... it doesn't work. How do I extract the AA, CD, BG columns?
It doesn't work because you assume that the column letter for your header is only one character, as indicated by:
letter = find(ia) + 'A' - 1;
What you are doing is essentially building the ASCII code for a capital letter between A and Z. This will obviously fail if the header is in a column whose name has more than one letter. What you need to do is build a dictionary of all two-letter combinations from AA to ZZ. If the output of find(ia) goes beyond column Z, use it to index into this dictionary to extract the right sequence of characters, then use that sequence to index into your Excel sheet.
Referencing this question, I'm going to use Rody Oldenhuis's answer. Construct a dictionary of all possible two-character combinations:
x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
y will be a 676 x 2 character matrix where each row is a unique two-letter combination from A-Z (AA, AB, etc.). Because unique sorts the rows, they come out in exactly the order Excel uses for columns beyond Z: AA, AB, AC, ... AZ, BA, BB, BC, ... BZ, ..., ZX, ZY, ZZ. Next, we check whether the found index is between 1 and 26. If it is, you can use your previous code. If it isn't, we do what we outlined above. Note that I have to subtract 26 from the found index so that it indexes correctly into this character array. Assuming that header has all unique entries, we can do:
[ia ib] = ismember(header, {item});
index = find(ia, 1);
if index <= 26 %// Check if we are within columns A - Z
    letter = char(index + 'A' - 1);
else %// If not, we are at a column that is beyond Z.
    x = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    y = unique(nchoosek(repmat(x, 1,26), 2), 'rows');
    index = index - 26; %// Subtract 26 to index into the character array
    letter = y(index,:);
end
ce = sprintf('%s:%s', letter, letter); %// e.g. 'CD:CD'
out = xlsread('filename', ce);
Note that I changed your fprintf call to sprintf, since you want to store the string representation of the cells you want to access; fprintf (in your case) prints to the screen, which is probably not what you want. I've also renamed the variable cell to ce, because cell is an actual function in MATLAB.
Also note that I've changed the %c formatting string to %s as the header may consist of more than one character.
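The same dictionary idea, sketched in Python for illustration: itertools.product yields the two-letter combinations in the same AA, AB, ..., ZZ order. The function names are hypothetical. The second function is a different technique, not from the answer: the usual bijective base-26 conversion, which also handles three-letter columns beyond ZZ.

```python
from itertools import product
from string import ascii_uppercase

# Two-letter "dictionary" AA..ZZ in Excel column order (like y in the MATLAB answer).
TWO_LETTERS = ["".join(p) for p in product(ascii_uppercase, repeat=2)]

def column_letters(index):
    """1-based column index -> Excel letters, valid for columns 1..702 (A..ZZ)."""
    if index <= 26:
        return ascii_uppercase[index - 1]
    return TWO_LETTERS[index - 27]  # subtract 26, then shift to a 0-based index

def column_letters_any(index):
    """Alternative (not from the answer): bijective base-26, works past ZZ."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = ascii_uppercase[rem] + letters
    return letters

col = column_letters(82)
print(f"{col}:{col}")  # range string, like sprintf('%s:%s', letter, letter)
```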

I want to extract only last two numeric values from a string variable in SAS

I want to extract only the last two digits from a string variable and assign them to a new variable. I have already extracted all the digits from the string with the code below and assigned them to a new variable, but ultimately I only want the last two digits. Is there a better way to do this?
UI_DUM = input(compress(Prod_Desc,,"kd"),best.);
One more question: how do I create a temporary variable for intermediate calculations in SAS?
Here is the code.
You are doing it right: remove the characters and keep only the digits. The same is done for the variable "temp1" in the code below.
In the second step, the LENGTH function calculates the total length of the string, which now contains only digits. In the third step, the SUBSTR function extracts the last two digits.
If you want to do it in one statement, the "final" variable shows how.
LENGTH function - Returns the length of a non-blank character string, excluding trailing blanks, and returns 1 for a blank character string.
COMPRESS function with the "kd" option - keeps only digits.
COMPRESS(<source><, chars><, modifiers>)
Modifier - specifies a character constant, variable, or expression in which each non-blank character modifies the action of the COMPRESS function. Blanks are ignored. The following characters can be used as modifiers.
d or D adds digits to the list of characters.
k or K keeps the characters in the list instead of removing them
substr function - Extracts a substring from an argument -
SUBSTR(string, position<,length>)
data _null_;
Test_string="ada13117a1w11da1286s";
temp1=compress(Test_string, , 'kd');
temp2=length(temp1);
temp3=substr(temp1,temp2-1,2);
final=substr(compress(Test_string, , 'kd'),length(compress(Test_string, , 'kd'))-1,2);
put _all_;
run;
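The three steps (keep digits, take the length, take the last two characters) can be mirrored in Python to check the logic; this is an illustration, not SAS, and last_two_digits is a hypothetical name. One difference: with fewer than two digits, this sketch simply returns whatever digits exist, whereas the SAS SUBSTR call would get a start position below 1, so a guard is worth adding in SAS.

```python
import string

def last_two_digits(test_string):
    """Mirror of the SAS steps; variable names follow the SAS example."""
    temp1 = "".join(ch for ch in test_string if ch in string.digits)  # compress(..., , 'kd')
    temp2 = len(temp1)                                                # length(temp1)
    # substr(temp1, temp2-1, 2) uses a 1-based start; here it is 0-based and clamped at 0
    temp3 = temp1[max(temp2 - 2, 0):]
    return temp3

print(last_two_digits("ada13117a1w11da1286s"))
```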
Regarding the temp variable: SAS has no special temporary-variable declaration. Just use any variable name and drop it from the final dataset with the DROP= option, like below:
data test(drop = temp); /*temp is used in the step but not kept in test*/
set have; /*hypothetical input dataset containing balance*/
temp = 2*balance; /*just for example*/
/*use temp in further calculations*/
run;
A somewhat different take:
data want;
set have;
UI_DUM = input(compress(Prod_Desc,,"kd"),best.);
UI_DUM_last2 = mod(UI_DUM,100);
run;
You could, of course, do all of that in one line as well. This uses the numeric modulo function to give you the last two digits: for a non-negative integer, the value modulo 100 is its final two digits (note that a leading zero, as in 05, becomes 5).
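A small Python comparison (illustration only; the helper names are hypothetical) of the substring and modulo approaches. They agree except when the second-to-last digit is 0, where the modulo version drops the leading zero. Keep in mind that SAS stores numbers as 8-byte floating point, so a very long digit string may not convert exactly in SAS; Python integers have no such limit, so this sketch does not reproduce that case.

```python
import string

def keep_digits(s):
    """Like compress(s, , 'kd'): keep only the characters 0-9."""
    return "".join(ch for ch in s if ch in string.digits)

def last_two_substr(s):
    return keep_digits(s)[-2:]  # character result, e.g. "05"

def last_two_mod(s):
    digits = keep_digits(s)
    return int(digits) % 100 if digits else None  # numeric result, e.g. 5

for s in ["ada13117a1w11da1286s", "ab1205"]:
    print(s, last_two_substr(s), last_two_mod(s))
```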

Counting the occurence of substrings in matlab

I have a cell array, something like this: P= {Face1 Face6 Scene6 Both9 Face9 Scene11 Both12 Face15}. I would like to count how many Face values, Scene values, and Both values are in P. I don't care about the numbers after the text (i.e., Face1 and Face23 would be counted as two Face values). I tried the following (for Face), but I got the error "If any of the input arguments are cell arrays, the first must be a cell array of strings and the second must be a character array".
strToSearch='Face';
numel(strfind(P,strToSearch));
Does anyone have any suggestion? Thank you!
Use regexp to find strings that start (^) with the desired text (such as 'Face'). The result is a cell array where each cell contains 1 (the match position) if there is a match, or [] otherwise. So determine whether each cell is nonempty (~cellfun('isempty', ...) gives a logical 1 for nonempty cells and 0 for empty cells), and add up the results (sum):
>> P = {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
>> sum(~cellfun('isempty', regexp(P, '^Face')))
ans =
4
>> sum(~cellfun('isempty', regexp(P, '^Scene')))
ans =
2
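The regexp approach translates directly into a short Python sketch (illustration only; count_prefix is a hypothetical name). The Counter line is an alternative not taken from the answer: it strips the trailing digits from every name and counts all labels in one pass.

```python
import re
from collections import Counter

P = ["Face1", "Face6", "Scene6", "Both9", "Face9", "Scene11", "Both12", "Face15"]

def count_prefix(cells, prefix):
    """Number of cells starting with prefix, like sum(~cellfun('isempty', regexp(P, '^Face')))."""
    pattern = re.compile("^" + re.escape(prefix))
    return sum(1 for c in cells if pattern.search(c))

# Alternative (not from the answer): strip trailing digits and count every label at once.
label_counts = Counter(re.sub(r"\d+$", "", c) for c in P)

print(count_prefix(P, "Face"), count_prefix(P, "Scene"))
print(label_counts)
```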
Your example works with a small tweak, provided every element of P is a string; you get the error you saw if the cell array contains any non-string values.
P= {'Face1' 'Face6' 'Scene6' 'Both9' 'Face9' 'Scene11' 'Both12' 'Face15'};
strToSearch='Face';
n = strfind(P,strToSearch);
numel([n{:}])
(returns 4)
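Note that numel([n{:}]) counts occurrences of the search text rather than matching cells; it gives the same answer here only because each name contains Face at most once. A short Python illustration (hypothetical helper names) makes the difference visible. The counting function records every start index, overlapping ones included, which is assumed to match strfind and does not matter for a pattern like Face.

```python
P = ["Face1", "Face6", "Scene6", "Both9", "Face9", "Scene11", "Both12", "Face15"]

def strfind_count(text, pattern):
    """Count every start index of pattern in text (overlapping matches included)."""
    count, start = 0, text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count

def total_occurrences(cells, pattern):
    # like numel([n{:}]) after n = strfind(P, pattern)
    return sum(strfind_count(c, pattern) for c in cells)

print(total_occurrences(P, "Face"))
print(total_occurrences(["FaceFace2"], "Face"))  # one cell, two occurrences
```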
