Octave string manipulation - string

I have a problem in Octave.
I want to find all distinct pairs of two adjacent letters in a text (no spaces, only letters).
For example:
my text = "abcdabcd"
I want to get an array (or vector?) that looks like: ab bc cd da
How do I do this in the easiest way possible?
Thanks for your help.

You can use the unique() function to do this. The only trick is creating the list of two-character pairs, which can be done by stacking the string in two rows, the second shifted by one character.
str = "abcdabcd";
str(2,:) = shift (str, -1);
str(:,end) = []; # remove last column
unique (str', "rows")
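For comparison, the same shift-and-deduplicate idea can be sketched in Python. This is only an illustration under the assumption that "pairs" means adjacent letters, as in the example; the function name unique_pairs is made up for this sketch.

```python
def unique_pairs(text):
    """Return the distinct adjacent two-letter pairs of text, sorted."""
    # zip pairs each character with its successor, which plays the role
    # of the shifted second row in the Octave answer
    pairs = {a + b for a, b in zip(text, text[1:])}
    return sorted(pairs)


print(unique_pairs("abcdabcd"))  # ['ab', 'bc', 'cd', 'da']
```

Because zip stops at the shorter input, no wrap-around pair is produced, so there is nothing to trim afterwards.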


How to check if all values in the list are the same length? And if not how to add extra digits to equalise those values?

I'm doing a little coding in Python, and I ran into the issue that some of my values are not the same length.
Desired length is 15 characters
for example:
string = ['110000111100111', '100100110011011', '001101100110', '01011010001110', '111100111001', '1101100111011']
Some of these values are different, and I want to add zeros to equalise them to the same length. Specifically by adding some zeros to those values that are shorter.
Can someone give me a hand on this?
Many thanks!
I tried comparing them and finding shorter values in the list. I'm new to this.
Try this:
s = ['110000111100111', '100100110011011', '001101100110', '01011010001110', '1100111001', '1101100111011']
s = [x.zfill(15) for x in s]
zfill() will pad a string with zeros on the left, up to the desired string length.
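The question also asks how to check the lengths first. A minimal sketch, assuming a shortened list and the target length of 15 from the question; it also shows str.ljust in case the zeros should go on the right instead of the left (the question does not say which side). The variable names are illustrative only.

```python
values = ['110000111100111', '001101100110', '1101100111011']
target = 15

# Which entries are shorter than the target length?
short = [v for v in values if len(v) < target]

# str.ljust pads on the right; zfill pads on the left
padded_right = [v.ljust(target, '0') for v in values]

# Verify that every entry now has the target length
all_same = all(len(v) == target for v in padded_right)
print(short, padded_right, all_same)
```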

How to split string by same multiple delimiters in Java

I am trying to split the string 'id#namespace' by #.
There is this special case that the id has a format name#gmail which makes the string I am trying to split look like name#gmail#namespace.
Is there any way to achieve that only split by the last # which will give me name#gmail and namespace?
If it is always the last occurrence, use lastIndexOf to find the index of the last # and then use substring.
Something like this:
int idx = string.lastIndexOf("#");
// idx + 1 skips the '#' itself so it is not included in the second part
String[] splitStrings = {string.substring(0, idx), string.substring(idx + 1)};
You could match the last # using a negative lookahead (?!...) to assert that no more # follow:
#(?!.*#)
System.out.println("name#gmail#namespace".split("#(?!.*#)")[0]); //name#gmail
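The split-on-last-delimiter logic is language independent. As a cross-check, here is a small Python sketch of it; the helper name split_last and its behaviour when no delimiter is present are assumptions made for illustration, not part of the Java answers above.

```python
def split_last(s, sep="#"):
    """Split s at the last occurrence of sep; the separator is dropped."""
    head, found, tail = s.rpartition(sep)
    if not found:
        # No separator at all: treat the whole string as the first part
        return s, ""
    return head, tail


print(split_last("name#gmail#namespace"))
print(split_last("id#namespace"))
```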

Finding mean of ascii values in a string MATLAB

The string I am given is as follows:
scrap1 =
a le h
ke fd
zyq b
ner i
You'll notice there are 2 blanks in each row, each representing a space (ASCII 32). I need to find the mean ASCII value in each column without taking the spaces (32) into account. So first I would convert with double(scrap1), but then how do I find the mean without taking the spaces into account?
If it's only the ASCII 32 you want to omit:
d = double(scrap1);
result = mean(d(d~=32)); %// logical indexing to remove unwanted value, then mean
You can remove the intermediate spaces in the string with scrap1(scrap1 == ' ') = ''; this replaces any space in the input with an empty string. Then you can do the conversion to double and average the result.
You can probably use a regular expression ('\s') to find the spaces and ignore them.
findSpace = regexp(scrap1, '\s', 'ignore')
% I am not sure about the 'ignore' option; this is just what comes to mind. You can read more about regexp by typing doc regexp.
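The question asks for a mean per column, whereas the snippets above average over all remaining characters at once. A per-column version of the same "skip the spaces" idea is sketched below in Python. The rows are hypothetical (the original scrap1 spacing is not recoverable here), and column_means is a made-up name.

```python
def column_means(rows, skip=" "):
    """Mean character code of each column, ignoring the skip character."""
    means = []
    for column in zip(*rows):  # iterate column by column
        codes = [ord(ch) for ch in column if ch != skip]
        # A column made only of spaces has no defined mean
        means.append(sum(codes) / len(codes) if codes else float("nan"))
    return means


rows = ["ab c", "d ef", "gh i"]  # hypothetical equal-length rows
print(column_means(rows))  # [100.0, 101.0, 101.0, 102.0]
```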

Replace multiple substrings using strrep in Matlab

I have a big string (around 25M characters) where I need to replace multiple substrings of a specific pattern in it.
Frame 1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Frame 7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
The substring I need to remove is the 'Frame #' and it occurs around 7670 times. I can give multiple search strings in strrep, using a cell array
strrep(text,{'Frame 1','Frame 2',..,'Frame 7670'},';')
However that returns a cell array, where in each cell, I have the original string with the corresponding substring of one of my input cell changed.
Is there a way to replace multiple substrings from a string, other than using regexprep? I noticed that it is considerably slower than strrep, that's why I am trying to avoid it.
With regexprep it would be:
regexprep(text,'Frame \d*',';')
and for a string of 25MB it takes around 47 seconds to replace all the instances.
EDIT 1: added the equivalent regexprep command
EDIT 2: added the size of the string for reference, the number of occurrences of the substring, and the execution time of regexprep
Ok, in the end I found a way to go around the problem. Instead of using regexprep to change the substring, I remove the 'Frame ' substring (including whitespace, but not the number)
rawData = strrep(text,'Frame ','');
This results in something like this:
1
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
2
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
7670
0,0,0,0,0,1,2,34,0
0,1,2,3,34,12,3,4,0
...........
Then, I change all the commas (,) and newline characters (\n) into a semicolon (;), using again strrep, and I create a big vector with all the numbers
rawData = strrep(rawData,sprintf('\r\n'),';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,';;',';');
rawData = strrep(rawData,',',';');
rawData = textscan(rawData,'%f','Delimiter',';');
then I remove the unnecessary numbers (1,2,...,7670), since they are located at a specific point in the array (each frame contains a specific amount of numbers).
rawData{1}(firstInstance:spacing:lastInstance)=[];
And then I go on with my manipulations. It seems that the additional strrep calls and the removal of the values from the array are much, much faster than the equivalent regexprep. With a string of 25M chars, regexprep takes about 47 seconds for the whole operation, while this workaround takes only 5 seconds!
Hope this helps somehow.
I think that this can be done using only textscan, which is known to be very fast. By specifying a 'CommentStyle', the 'Frame #' lines are stripped out. This may only work because these 'Frame #' lines are on their own lines. This code returns the raw data as one big vector:
s = textscan(text,'%f','CommentStyle','Frame','Delimiter',',');
s = s{:}
You may want to know how many elements are in each frame or even reshape the data into a matrix. You can use textscan again (or before the above) to get just the data for the first frame:
f1 = textscan(text,'%f','CommentStyle','Frame 1','Delimiter',',');
f1 = f1{:}
In fact, if you just want the elements from the first line, you can use this:
l1 = textscan(text,'%f,','CommentStyle','Frame 1')
l1 = l1{:}
However, the other nice thing about textscan is that you can use it to read in the file directly (it looks like you may be using some other means currently) using just fopen to get an FID. Thus the string data text doesn't have to be in memory.
Using regular expressions:
result = regexprep(text,'Frame [0-9]+','');
It's possible to avoid regular expressions as follows. I use strrep with suitable replacement strings that act as masks. The obtained strings are equal-length and are assured to be aligned, and can thus be combined into the final result using the masks. I've also included the ; you want. I don't know if it will be faster than regexprep or not, but it's definitely more fun :-)
% Data
text = 'Hello Frame 1 test string Frame 22 end of Frame 2 this'; %//example text
rep_orig = {'Frame 1','Frame 2','Frame 22'}; %//strings to be replaced.
%//May be of different lengths
% Computations
rep_dest = cellfun(@(s) char(zeros(1,length(s))), rep_orig, 'uni', false);
%//series of char(0) of same length as strings to be replaced (to be used as mask)
aux = cell2mat(strrep(text,rep_orig.',rep_dest.'));
ind_keep = all(double(aux)); %//keep characters according to mask
ind_semicolon = diff(ind_keep)==1; %//where to insert ';'
ind_keep = ind_keep | [ind_semicolon 0]; %// semicolons will also be kept
result = aux(1,:); %//for now
result(ind_semicolon) = ';'; %//include `;`
result = result(ind_keep); %//remove unwanted characters
With these example data:
>> text
text =
Hello Frame 1 test string Frame 22 end of Frame 2 this
>> result
result =
Hello ; test string ; end of ; this

Python Challenge # 2 = removing characters from a string

I have the code:
theory = """}#)$[]_+(^_#^][]_)*^*+_!{&$##]((](}}{[!$#_{&{){
*_{^}$#!+]{[^&++*#!]*)]%$!{#^&%(%^*}#^+__])_$#_^#[{{})}$*]#%]{}{][#^!#)_[}{())%)
())&##*[#}+#^}#%!![#&*}^{^(({+#*[!{!}){(!*#!+#[_(*^+*]$]+#+*_##)&)^(#$^]e#][#&)(
%%{})+^$))[{))}&$(^+{&(#%*#&*(^&{}+!}_!^($}!(}_##++$)(%}{!{_]%}$!){%^%%#^%&#([+[
_+%){{}(#_}&{&++!#_)(_+}%_#+]&^)+]_[#]+$!+{#}$^!&)#%#^&+$#[+&+{^{*[#]#!{_*[)(#[[
]*!*}}*_(+&%{&#$&+*_]#+#]!&*#}$%)!})#&)*}#(#}!^(]^#}]#&%)![^!$*)&_]^%{{}(!)_&{_{
+[_*+}]$_[##_^]*^*##{&%})*{&**}}}!_!+{&^)__)#_#$#%{+)^!{}^#[$+^}&(%%)&!+^_^#}^({
*%]&#{]++}#$$)}#]{)!+#[^)!#[%#^!!"""
#theory = open("temp.txt")
key = "##!$%+{}[]_-&*()*^#/"
new2 =""
print()
for letter in theory:
    if letter not in key:
        new2 += letter
print(new2)
This is a test piece of code to solve the python challenge #2: http://www.pythonchallenge.com/pc/def/ocr.html
The only trouble is, the code I wrote seems to leave lots of whitespace, but I'm not sure why.
Any ideas on how to remove the unnecessary whitespace? In other words, I want the code to return "e", not " e ".
The challenge is to find a rare character. You could use collections.Counter for that:
from collections import Counter
c = Counter(theory)
print(c.most_common()[-1])
Output
('e', 1)
The unnecessary whitespace could be removed using .strip():
new2.strip()
Adding '\n' to the key works too.
The best approach would be to use the regular expression library, like so:
import re
characters = re.findall("[a-zA-Z]", sourcetext)
print ("".join(characters))
The resulting string will contain ONLY alphabetic characters.
If you look at the distribution of characters (using collections.Counter), you get:
6000+ each of )#(]#_%[}!+$&{*^ (which you are correctly excluding from the output)
1220 newlines (which you are not excluding from the output)
1 each of — no, I'm not going to give away the answer
Just add \n to your key variable to exclude the unwanted newlines. This will leave you with just the rare (i.e., 1 occurrence only) characters you need.
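The distribution check described above can be sketched as follows. The sample string is a tiny hypothetical stand-in for the puzzle text, so the answer is not given away; only collections.Counter from the standard library is used.

```python
from collections import Counter

# Hypothetical stand-in for the puzzle text: frequent symbols,
# newlines, and a single rare character
sample = "#$%\n#$%e\n$%#\n"

counts = Counter(sample)

# Keep the characters that occur exactly once, in their original order
rare = "".join(ch for ch in sample if counts[ch] == 1)

print(counts["\n"])  # newlines are counted like any other character
print(rare)
```

Here the newline count shows up next to the symbol counts, which is how the stray whitespace in the output can be spotted.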
P.S., it's highly inefficient to concatenate strings in a loop. Instead of:
new2 = ""
for letter in theory:
    if letter not in key:
        new2 += letter
write:
new2 = ''.join(letter for letter in theory if letter not in key)
The theory string contains several newlines. They get printed by your code. You can either get rid of the newline, like this:
theory = "}#)$[]_+(^_#^][]_)*^*+_!{&$##]((](}}{[!$#_{&{){" \
"*_{^}$#!+]{[^&++*#!]*)]%$!{#^&%(%^*}#^+__])_$#_^#[{{})}$*]#%]{}{][#^!#)_[}{())%)" \
"())&##*[#}+#^}#%!![#&*}^{^(({+#*[!{!}){(!*#!+#[_(*^+*]$]+#+*_##)&)^(#$^]e#][#&)(" \
"%%{})+^$))[{))}&$(^+{&(#%*#&*(^&{}+!}_!^($}!(}_##++$)(%}{!{_]%}$!){%^%%#^%&#([+[" \
"_+%){{}(#_}&{&++!#_)(_+}%_#+]&^)+]_[#]+$!+{#}$^!&)#%#^&+$#[+&+{^{*[#]#!{_*[)(#[[" \
"]*!*}}*_(+&%{&#$&+*_]#+#]!&*#}$%)!})#&)*}#(#}!^(]^#}]#&%)![^!$*)&_]^%{{}(!)_&{_{" \
"+[_*+}]$_[##_^]*^*##{&%})*{&**}}}!_!+{&^)__)#_#$#%{+)^!{}^#[$+^}&(%%)&!+^_^#}^({" \
"*%]&#{]++}#$$)}#]{)!+#[^)!#[%#^!!"
or you can filter them out, like this:
key = "##!$%+{}[]_-&*()*^#/\n"
Both work fine (yes, I tested).
A simpler way to output the answer is:
print(''.join(c for c in theory if c not in key))
and in your case you might want to add the newline character to key to also filter it out:
key += "\n"
It is better to work in reverse and keep only the letters, something like this:
out = []
for i in theory:
    a = ord(i)
    # keep 'a'-'z' (97-122) and 'A'-'Z' (65-90), bounds inclusive
    if 97 <= a <= 122 or 65 <= a <= 90:
        out.append(i)
print(''.join(out))
Or better, use a regexp.
