How to split a string into substrings of length n?

How would I split a string into substrings of length n in MATLAB?
For example:
Input: "ABCDEFGHIJKL", with substrings of length 3
Output: {ABC}, {DEF}, {GHI}, {JKL}

If the string length is not a multiple of n, you will probably need a loop or arrayfun:
x = 'ABCDEFGHIJK'; % length 11
n = 3;
result = arrayfun(@(k) x(k:min(k+n-1, end)), 1:n:numel(x), 'UniformOutput', false)
Alternatively, accumarray can be used:
x = 'ABCDEFGHIJK';
n = 3;
result = accumarray(floor((0:numel(x)-1).'/n)+1, x, [], @(t) {t.'}).';
Either of the above gives, in this example,
result =
1×4 cell array
{'ABC'} {'DEF'} {'GHI'} {'JK'}
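For comparison, here is the same chunking idea sketched in Python rather than MATLAB; `chunks` is a hypothetical helper name. Python slices may run past the end of a string, so the shorter last piece needs no special case, which is the job `min(k+n-1, end)` does above.

```python
from typing import List


def chunks(s: str, n: int) -> List[str]:
    """Split s into consecutive pieces of length n; the last piece may be shorter."""
    if n <= 0:
        raise ValueError("n must be positive")
    # range(0, len(s), n) yields the start of each piece, like 1:n:numel(x) in MATLAB;
    # slicing past the end is safe, which covers the min(k+n-1, end) case
    return [s[k:k + n] for k in range(0, len(s), n)]


print(chunks("ABCDEFGHIJK", 3))  # ['ABC', 'DEF', 'GHI', 'JK']
```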

A regular expression can do the job here:
str = 'abcdefgh'
pattern = '.{1,3}' % match 3 characters at a time; if fewer than 3 are left, take the rest
res = regexp(str, pattern, 'match')
which gives:
res =
1×3 cell array
{'abc'} {'def'} {'gh'}
If you only want to match complete groups of 3 characters:
pattern = '.{3}' % this outputs {'abc'} {'def'} but not {'gh'}
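The same two regular expressions behave identically in Python's `re` module, shown here as a sketch for comparison (Python, not MATLAB). One caveat: `.` does not match a newline unless `re.DOTALL` is passed.

```python
import re

s = "abcdefgh"
# '.{1,3}' is greedy: it takes 3 characters when it can, otherwise whatever is left
print(re.findall(r".{1,3}", s))  # ['abc', 'def', 'gh']
# '.{3}' only matches complete groups, so the trailing 'gh' is dropped
print(re.findall(r".{3}", s))    # ['abc', 'def']
```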

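This should do it, provided the length of the string is a multiple of 3 (reshape raises an error otherwise):
str = 'ABCDEFGHIJKL';
result = cellstr(reshape(str, 3, [])')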

Related

Matlab String Conversion to Array

I have a cell array of strings in this format:
sLine =
{
[1,1] = 13-Jul-16,10.46,100.63,15.7,54.4,55656465
[1,2] = 12-Jul-16,10.47,100.64,15.7,54.4,55656465
[1,3] = 11-Jul-16,10.48,100.65,15.7,54.4,55656465
[1,4] = 10-Jul-16,10.49,100.66,15.7,54.4,55656465
}
Each element is a string ("13-Jul-16,10.46,100.63,15.7,54.4,55656465" is a single string).
I need to convert this into 6 vectors, something like
[a b c d e f] = ...
so that, for example, the first column would be
a = [13-Jul-16;12-Jul-16;11-Jul-16;10-Jul-16]
I tried the cell2mat function, but instead of separating the fields into matrix elements, it concatenates everything into one long string:
cell2mat(sLine)
ans =
13-Jul-16,10.46,100.63,15.7,54.4,5565646512-Jul-16,10.47,100.64,15.7,54.4,5565646511-Jul-16,10.48,100.65,15.7,54.4,5565646510-Jul-16,10.49,100.66,15.7,54.4,55656465
So, how can I solve this?
Update
I obtained sLine with the following steps:
pFile = urlread('http://www.google.com/finance/historical?q=BVMF:PETR4&num=365&output=csv');
sLine = strsplit(pFile,'\n');
sLine(:,1)=[];
Update
Thanks to @Suever I can now get the date column. The latest version of the code is:
pFile = urlread('http://www.google.com/finance/historical?q=BVMF:PETR4&num=365&output=csv');
pFile=strtrim(pFile);
sLine = strsplit(pFile,'\n');
sLine(:,1)=[];
split_values = regexp(sLine, ',', 'split');
values = cat(1, split_values{:});
values(:,1)
Your data is all strings, so you will need to do some string manipulation rather than using cell2mat.
You will want to split each element at the commas and then concatenate the results.
sLine = {'13-Jul-16,10.46,100.63,15.7,54.4,55656465',
'12-Jul-16,10.47,100.64,15.7,54.4,55656465',
'11-Jul-16,10.48,100.65,15.7,54.4,55656465',
'10-Jul-16,10.49,100.66,15.7,54.4,55656465'};
split_values = cellfun(@(x)strsplit(x, ','), sLine, 'uniformoutput', 0);
values = cat(1, split_values{:});
values(:,1)
% {
% [1,1] = 13-Jul-16
% [2,1] = 12-Jul-16
% [3,1] = 11-Jul-16
% [4,1] = 10-Jul-16
% }
For a more concise version, you can use regexp instead of strsplit to do the splitting, since regexp accepts a cell array as input:
split_values = regexp(sLine, ',', 'split');
values = cat(1, split_values{:});
Update
The problem with the code you posted is that the input has a trailing newline, so when you split on newlines the last element of sLine is empty, which causes the issue. Use strtrim on pFile before creating the cell array to remove the trailing newline:
sLine = strsplit(strtrim(pFile), '\n');
sLine(:,1) = [];
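The same split-then-transpose idea, sketched in Python rather than MATLAB for illustration. The sample `text` is made up, and treating the first line as a header is an assumption based on `sLine(:,1) = []` above.

```python
# Made-up CSV text: a header line, two data lines, and a trailing newline
text = (
    "Date,Open,High,Low,Close,Volume\n"
    "13-Jul-16,10.46,100.63,15.7,54.4,55656465\n"
    "12-Jul-16,10.47,100.64,15.7,54.4,55656465\n"
)

# strip() drops the trailing newline, so splitting does not leave an empty last row
lines = text.strip().split("\n")
# Skip the assumed header line and split every data line on commas
rows = [line.split(",") for line in lines[1:]]
# zip(*rows) transposes rows into columns, one tuple per field
a, b, c, d, e, f = (list(col) for col in zip(*rows))
# The fields are still strings; convert the numeric ones explicitly
b_values = [float(v) for v in b]
print(a)         # ['13-Jul-16', '12-Jul-16']
print(b_values)  # [10.46, 10.47]
```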

Compare 1 string with a cell array of strings with indexes (Matlab)

I have one string and one cell array of strings:
F = 'ABCD'
R = {'ACBD','CDAB','CABD'};
I would like to compare the string F with all of the strings in R as follows: F(1)='A' and R{1}(1)='A', so we count 1 (because they have the same value 'A'); F(2)='B' and R{1}(2)='C', so we count 0 (because they have different values); and so on until the end of all the strings.
This gives same = 2, dif = 2 for 'ABCD' and 'ACBD'.
How can I compare F with all the elements of R using this rule and get total(same) and total(dif)?
Assuming all strings in R have the same length as F, you can use cellfun:
same = cellfun( @(r) sum(F==r), R )
This gives
2 0 1
that is, the number of matching positions for each string in R. If you want dif:
dif = numel(F)-same;
If you want the totals:
tot_same = sum(same);
tot_dif = sum(dif);
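For readers outside MATLAB, the same position-by-position count can be sketched in Python. Like the cellfun version it assumes equal lengths; note that `zip` would silently ignore extra characters if the lengths differed.

```python
F = "ABCD"
R = ["ACBD", "CDAB", "CABD"]

# Number of positions where F and r hold the same character
same = [sum(a == b for a, b in zip(F, r)) for r in R]
# Remaining positions differ
dif = [len(F) - s for s in same]
tot_same, tot_dif = sum(same), sum(dif)
print(same, tot_same, tot_dif)  # [2, 0, 1] 3 9
```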

Find substring of string without knowing the length of the string

I have a string x: x = "{abc}{def}{ghi}"
I need to print the string between the second { and the second }, in this case def. How can I do this without knowing the length of the string? For example, the string x could also be "{abcde}{fghij}{klmno}".
This is where pattern matching is useful:
local x = "{abc}{def}{ghi}"
local result = x:match(".-{.-}.-{(.-)}")
print(result)
.- matches zero or more characters, non-greedy. The whole pattern .-{.-}.-{(.-)} captures what's between the second { and the second }.
Try also x:match(".-}{(.-)}"), which is simpler.
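A rough Python counterpart of the Lua pattern, for comparison. Python's `re` is a different engine from Lua patterns: `.*?` plays the role of Lua's non-greedy `.-`, and the braces are escaped for clarity. `second_group` is a hypothetical helper name.

```python
import re


def second_group(x):
    """Return the text between the second { and the second }, or None if there is no second group."""
    # (.*?) is non-greedy, so each match stops at the nearest closing brace
    groups = re.findall(r"\{(.*?)\}", x)
    return groups[1] if len(groups) >= 2 else None


print(second_group("{abc}{def}{ghi}"))  # def
```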
I would go about it in a different manner:
local i, x, result = 1, "{abc}{def}{ghi}"
for w in x:gmatch '{(.-)}' do
  if i == 2 then
    result = w
    break
  else
    i = i + 1
  end
end
print( result )
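The iterate-and-count approach translates to Python with `re.finditer`, which yields matches lazily; `itertools.islice` skips ahead and stops at the n-th match, much like the `break` in the Lua loop. This is a sketch; `nth_group` is a hypothetical name and n is 1-based.

```python
import re
from itertools import islice


def nth_group(x, n):
    """Return the text inside the n-th {...} group (n >= 1), or None if there are fewer groups."""
    matches = re.finditer(r"\{(.*?)\}", x)
    # Skip the first n-1 matches and take the next one, without scanning further
    match = next(islice(matches, n - 1, None), None)
    return match.group(1) if match else None


print(nth_group("{abc}{def}{ghi}", 2))  # def
```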

Is it possible to concatenate a string with series of number?

I have a string (e.g. 'STA') and I want to make a cell array that concatenates the string with each number from 1 to X.
I want the code to do something like the for loop below:
for i = 1:Num
a = [{a} {strcat('STA',num2str(i))}]
end
I want the end results to be in the form of {<1xNum cell>}
a = 'STA1' 'STA2' 'STA3' ...
(I want to set this to a uitable in the ColumnFormat array)
ColumnFormat = {{a},... % 1
'numeric',... % 2
'numeric'}; % 3
I'm not sure about starting with STA1, but this should get you a list that starts with STA (from which I guess you could remove the first entry).
N = 5;
[X{1:N+1}] = deal('STA');
a = genvarname(X);
a = a(2:end);
You can do it with a combination of the NUM2STR (converts numbers to strings), CELLSTR (converts strings to a cell array), STRTRIM (removes extra spaces) and STRCAT (combines with another string) functions.
You need (:) to make sure the numeric vector is a column, so that num2str puts each number on its own row.
x = 1:Num;
a = strcat( 'STA', strtrim( cellstr( num2str(x(:)) ) ) );
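For comparison, a Python sketch of the same labels; `numbered_labels` is a hypothetical name. String formatting converts each number without padding, so there is no equivalent of the strtrim step.

```python
def numbered_labels(prefix, count):
    """Return [prefix1, prefix2, ..., prefixN] as a list of strings."""
    # f-strings format each integer without padding, so no trimming is needed
    return [f"{prefix}{i}" for i in range(1, count + 1)]


print(numbered_labels("STA", 3))  # ['STA1', 'STA2', 'STA3']
```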
As an alternative that also works for matrices with more dimensions, I use this helper function:
function c = num2cellstr(xx, varargin)
%Converts matrix of numeric data to cell array of strings
c = cellfun(@(x) num2str(x,varargin{:}), num2cell(xx), 'UniformOutput', false);
Try this:
N = 10;
a = cell(1,N);
for i = 1:N
    a(i) = {['STA',num2str(i)]};
end

String lexicographical permutation and inversion

Consider the following function on a string:
#include <string>
using std::string;

// Counts the pairs (i, j) with i < j and S[i] > S[j]
int F(const string& S)
{
    int N = S.size();
    int T = 0;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++)
            if (S[i] > S[j])
                T++;
    return T;
}
A string S0 of length N with all pairwise distinct characters has a total of N! unique permutations.
For example, "bac" has the following 6 permutations:
bac
abc
cba
bca
acb
cab
Consider these N! strings in lexicographical order:
abc
acb
bac
bca
cab
cba
Now consider the application of F to each of these strings:
F("abc") = 0
F("acb") = 1
F("bac") = 1
F("bca") = 2
F("cab") = 2
F("cba") = 3
Given some string S1 of this set of permutations, we want to find the next string S2 in the set, that has the following relationship to S1:
F(S2) == F(S1) + 1
For example, if S1 == "acb" (F = 1), then S2 == "bca" (F = 1 + 1 = 2).
One way to do this would be to start at one past S1 and iterate through the list of permutations looking for F(S) = F(S1)+1. This is unfortunately O(N!).
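The brute-force search described above can be sketched in Python as a reference for checking faster methods on short strings. It assumes all characters are distinct; `F` mirrors the C++ loops and `next_brute` is a hypothetical name. It relies on `itertools.permutations` emitting permutations of a sorted input in lexicographic order.

```python
from itertools import permutations


def F(s):
    """Number of inversions: pairs i < j with s[i] > s[j] (same loops as the C++ F)."""
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])


def next_brute(s1):
    """First permutation after s1 (lexicographically) with F == F(s1) + 1, or None.

    Costs O(N! * N^2), so it is only practical for short strings.
    """
    target = F(s1) + 1
    # permutations() of a sorted sequence are generated in lexicographic order
    for p in map("".join, permutations(sorted(s1))):
        if p > s1 and F(p) == target:
            return p
    return None


print(next_brute("acb"))  # bca
```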
By what O(N) function on S1 can we calculate S2 directly?
Suppose the length of S1 is n. The largest possible value of F(S1) is n(n-1)/2. If F(S1) = n(n-1)/2, S1 is the last permutation and there is no next one. If F(S1) < n(n-1)/2, there is at least one character x that is bigger than the character y immediately before it. Find such an x with the lowest index and swap x and y. Let's look at an example:
S1 == "acb" (F = 1): 1 < 3, so there is a character x that is bigger than the character y before it. The x with the smallest index is c; swapping it with a (which is smaller than c) finishes the algorithm, giving S2 = "cab", F(S2) = 2.
Now test it with S2 = "cab": x = b, y = a, so S3 = "cba".
Finding x is not hard: iterate over the input with a variable min; while the current character is smaller than min, set min to that character and move on. The first character that is bigger than min is x.
Here it is in C#:
// Swaps the first adjacent pair (y, x) with y < x; returns a message if the string is fully descending
string NextString(string input)
{
    char min = input[0];
    int i = 1;
    // Skip the strictly decreasing prefix
    while (i < input.Length && input[i] < min)
    {
        min = input[i];
        i++;
    }
    if (i == input.Length) return "There is no next item";
    char x = input[i], y = input[i - 1];
    // Prefix before y, then x and y swapped, then everything after x
    return input.Substring(0, i - 1) + x + y + input.Substring(i + 1);
}
Here's the outline of an algorithm for a solution to your problem.
I'll assume you have a function that directly returns the n-th permutation (given n) and its inverse, i.e. a function that returns n given a permutation. Let these be perm(n) and perm'(n) respectively.
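If you do not have such functions, one standard way to build them is the factorial number system (Lehmer code). The Python sketch below is an illustration, not the answerer's code: it assumes distinct characters, uses 1-based n to match the table that follows, and the names `perm(letters, n)` and `rank(s)` (standing in for perm') are hypothetical.

```python
from math import factorial


def perm(letters, n):
    """Return the n-th (1-based) lexicographic permutation of the distinct characters in letters."""
    pool = sorted(letters)
    k = n - 1  # 0-based rank
    if not 0 <= k < factorial(len(pool)):
        raise ValueError("n out of range")
    out = []
    for i in range(len(pool) - 1, -1, -1):
        # Each choice of the next character covers i! permutations of the rest
        idx, k = divmod(k, factorial(i))
        out.append(pool.pop(idx))
    return "".join(out)


def rank(s):
    """Inverse of perm: the 1-based lexicographic position of s among permutations of its characters."""
    pool = sorted(s)
    k = 0
    for i, ch in enumerate(s):
        idx = pool.index(ch)  # number of smaller characters still unused
        k += idx * factorial(len(s) - 1 - i)
        pool.pop(idx)
    return k + 1


print(perm("abcd", 3), rank("acbd"))  # acbd 3
```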
If I've figured it correctly, when you have a 4-letter string to permute the function F goes like this:
F("abcd") = 0
F("abdc") = 1
F(perm(3)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(perm(7)) = 1
F(...) = 2
F(...) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(perm(13)) = 2
F(...) = 3
F(...) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(perm(19)) = 3
F(...) = 4
F(...) = 4
F(...) = 5
F(...) = 5
F(perm(24)) = 6
In words, when you go from 3 letters to 4 you get 4 copies of the table of values of F, adding (0,1,2,3) to the (1st,2nd,3rd,4th) copy respectively. In the 2nd copy, for example, you already have one inversion from putting the 2nd letter in the 1st place; this is simply added to the other inversions, which follow the same pattern as for the original 3-letter strings.
From this outline it shouldn't be too difficult (though I don't have time right now) to write the function F. Strictly speaking, the inverse of F isn't a function because it would be multi-valued, but given n and F(n) there are only a few cases for finding m such that F(m) == F(n)+1. These cases are:
n == N! where N is the number of letters in the string, there is no next permutation;
F(n+1) < F(n), the sought-for solution is perm(n+(N-1)!);
F(n+1) == F(n), the solution is perm(n+2);
F(n+1) > F(n), the solution is perm(n+1).
I suspect some of this only works for 4-letter strings, and that some of these terms will need adjusting for K-letter permutations.
This is not O(n), but it is O(n²) (where n is the number of elements in the permutation, 3 in your example).
First, notice that whenever you place a character in your string, you already know how much it will increase F: it's the number of characters smaller than it that haven't been added to the string yet.
This gives us another algorithm to calculate F(n):
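def get_inversions(S1):
    used = set()  # letters (as 0-based indices) already placed
    inv = 0
    for ch in S1:
        character = ord(ch) - ord('a')
        # placing ch adds one inversion per smaller letter that is still to come
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv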
This is not much better than the original version, but it is useful when inverting F. You want the first string that is lexicographically larger, so it makes sense to copy the original string and change it only when necessary. When a change is required, we should also change the string by the least amount possible.
To do so, use the fact that the largest value of F for a string with n letters is n(n-1)/2. Whenever keeping the original letter at some position would leave more inversions to place than the remaining letters can produce, we must swap in a different letter at that position. Code in Python:
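# Assumes S1 is a permutation of the first len(S1) lowercase letters ('a', 'b', ...)
used = set()  # letters (as 0-based indices) already placed

def get_inversions(S1):
    inv = 0
    for ch in S1:
        character = ord(ch) - ord('a')
        # inversions contributed: smaller letters not placed yet
        cnt = sum(1 for x in range(character) if x not in used)
        inv += cnt
        used.add(character)
    return inv

def f_recursive(n, S1, inv, ign):
    # n letters remain to be placed and inv inversions must still be produced;
    # ign is True once the result has moved past S1, so any letter may be used
    if n == 0:
        return ""
    # the other n-1 letters give at most (n-1)(n-2)/2 inversions, so this
    # position must contribute at least delta
    delta = inv - (n - 1) * (n - 2) // 2
    if ign:
        cnt = 0
        ch = 0
    else:
        ch = ord(S1[len(S1) - n]) - ord('a')
        cnt = sum(1 for x in range(ch) if x not in used)
    for letter in range(ch, len(S1)):
        if letter not in used:
            if cnt < delta:
                cnt += 1
                continue
            used.add(letter)
            if letter != ch:
                ign = True
            return chr(letter + ord('a')) + f_recursive(n - 1, S1, inv - cnt, ign)

def F_inv(S1):
    used.clear()
    inv = get_inversions(S1)
    used.clear()
    return f_recursive(len(S1), S1, inv + 1, False)

print(F_inv("acb"))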
It can also be made to run in O(n log n) by replacing the innermost loop with a data structure such as a binary indexed tree.
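The binary indexed tree idea can be sketched as follows. This is an illustrative version, not the answerer's code: it assumes distinct characters, ranks them by sorting instead of using ord(ch) - ord('a'), and keeps a Fenwick tree of already-placed ranks, so "smaller letters not yet used" becomes (rank - 1) minus a prefix sum, each step costing O(log n). `get_inversions_bit` is a hypothetical name.

```python
def get_inversions_bit(s):
    """Count inversions of s in O(n log n) using a binary indexed (Fenwick) tree."""
    rank = {ch: i + 1 for i, ch in enumerate(sorted(s))}  # ranks 1..n
    n = len(s)
    tree = [0] * (n + 1)  # Fenwick tree over ranks of characters already placed

    def add(i):
        # Mark rank i as placed
        while i <= n:
            tree[i] += 1
            i += i & -i

    def prefix(i):
        # Number of placed ranks in 1..i
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    inv = 0
    for ch in s:
        r = rank[ch]
        # Smaller characters not yet placed = (r - 1) - (smaller characters already placed)
        inv += (r - 1) - prefix(r - 1)
        add(r)
    return inv


print(get_inversions_bit("bca"))  # 2
```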
Did you try swapping two neighboring characters in the string? It seems that it can help solve the problem. If you swap S[i] and S[i+1], where S[i] < S[i+1], then F(S) increases by exactly one, because no other pair of indices changes its relative order.
If I'm not mistaken, F calculates the number of inversions of the permutation.
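A quick numerical check of the neighbor-swap claim, in Python (the helper names `F` and `swap` are made up for illustration). The second case shows why the swapped characters must be neighbors: a character between them can change its order relative to both ends as well.

```python
def F(s):
    """Number of inversions (pairs i < j with s[i] > s[j])."""
    n = len(s)
    return sum(1 for i in range(n) for j in range(i + 1, n) if s[i] > s[j])


def swap(s, i, j):
    """Return a copy of s with positions i and j exchanged."""
    chars = list(s)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


# Neighbors in ascending order: exactly one new inversion
print(F("acb"), F(swap("acb", 0, 1)))  # 1 2
# Non-neighbors: the middle character's order changes too
print(F("abc"), F(swap("abc", 0, 2)))  # 0 3
```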
