How to remove several substrings within a string in MATLAB? - string

I'm trying to implement, in a different way, something I can already do with some custom MATLAB functions. Suppose we have the string 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'. I know how to remove each lowercase substring with
regexprep(String, '[a-z]*', '')
But I want to understand how to get the indices of these substrings and use them to check and remove the substrings, maybe with a for loop, so I'm investigating how to do it.
regexp gives the indices:
[Start,End] = regexp(Seq,'[a-z]{1,}');
but I can't figure out how to use them to check these sequences and eliminate them.

With the indexing approach you get several start and end indices (two in your example), so you need a loop to remove the corresponding sections from the string. You should remove them from last to first, otherwise indices that haven't been used yet will become invalid as you remove sections:
x = 'AAAAAAAAAAAaaaaaaaaaaaTTTTTTTTTTTTTTTTsssssssssssTTTTTTTTTT'; % input
y = x; % initialize result
[Start, End] = regexp(x, '[a-z]{1,}');
for k = numel(Start):-1:1 % note: from last to first
y(Start(k):End(k)) = []; % remove section
end

Related

Shuffle and unscramble a string (Lua)

I have a function to shuffle strings from another article adapted to reorder the characters with a table of predefined numbers. It works perfectly, so I also needed a function to unscramble this string using the number table, but I have no idea how to do this, especially after having tried and failed several times.
Shuffle function:
randomValues = {}
for i = 1, 60 do
table.insert(randomValues, 1, math.random())
end
function shuffle(str)
math.randomseed(4)
local letters = {}
local idx = 0
for letter in str:gmatch'.[\128-\191]*' do
idx = idx + 1
table.insert(letters, {letter = letter, rnd = randomValues[idx]})
end
table.sort(letters, function(a, b) return a.rnd < b.rnd end)
for i, v in ipairs(letters) do
letters[i] = v.letter
end
return table.concat(letters)
end
Any tips?
I will assume that all you're trying to do is:
Split a string into unicode characters.
Shuffle these characters.
Restore the characters to their original position.
I have separated the splitting up of unicode characters and doing the actual shuffle, to make it a bit easier to follow.
1. Splitting the characters
Starting off with the splitting of characters:
-- Splits a string into a table of unicode characters.
local function splitLetters(str)
local letters = {}
for letter in str:gmatch'.[\128-\191]*' do
table.insert(letters, letter)
end
return letters
end
This is mostly copied from the first part of your function.
2. Shuffling the table of characters
Now that we have a nice table of characters that we can work with, it's time to shuffle them. Shuffling a list can be done by going through each character in order and swapping it with a randomly chosen (but still unshuffled) item. While we do that, we also keep a table of all indices that got swapped, which I call swapTable here.
-- Shuffles in place and returns a table, which can be used to unshuffle.
local function shuffle(items)
local swapTable = {}
for i = 1, #items - 1 do
-- Swap item i with a random item from position i onward (including itself).
local j = math.random(i, #items)
items[i], items[j] = items[j], items[i]
-- Keep track of each swap so we can undo it.
table.insert(swapTable, j)
-- Everything up to i is now random.
-- The last iteration can be skipped, as it would always swap with itself.
-- See #items - 1 at the top of the loop.
end
return swapTable
end
3. Restoring the letters to their original positions
Using this swapTable, it is now pretty straightforward to just do the whole shuffle again, but in reverse.
-- Restores a previous shuffle in place.
local function unshuffle(items, swapTable)
-- Go through the swap table backwards, as we need to do everything in reverse.
for i = #swapTable, 1, -1 do
-- Do the same as before, but using the swap table.
local j = swapTable[i]
items[i], items[j] = items[j], items[i]
end
end
A full example using all those functions
Using those few functions (and table.concat to build up the list of letters into a string again) we can do everything you want:
-- Make our output reproducible
math.randomseed(42)
-- Split our test string into a table of unicode characters
local letters = splitLetters("Hellö Wörld! Höw are yoü?")
-- Shuffle them in-place, while also getting the swapTable
local swapTable = shuffle(letters)
-- Print out the shuffled string
print(table.concat(letters)) --> " rH?doröWüle Hl lwa eyöö!"
-- Unshuffle them in-place using the swapTable
unshuffle(letters, swapTable)
-- And we're back to the original string
print(table.concat(letters)) --> "Hellö Wörld! Höw are yoü?"
Creating the swapTable upfront
In your example, you generate the swapTable upfront (and it also works slightly differently for you). You can of course split that part out and have your shuffle function work similarly to how unshuffle is currently implemented. Let me know if you want me to elaborate on that.
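As a rough illustration of generating the swap table upfront, here is a minimal sketch in Python rather than Lua. The helper name make_swap_table and the seed value are assumptions made for this example, not part of the code above. The table is built once, and shuffle and unshuffle simply replay it in opposite directions.

```python
import random

def make_swap_table(n, seed):
    """Build a swap table for a sequence of length n (hypothetical helper)."""
    rng = random.Random(seed)
    # Entry i holds the index j (i <= j < n) that position i is swapped with.
    return [rng.randrange(i, n) for i in range(n - 1)]

def shuffle(items, swap_table):
    """Apply the swaps front to back, in place."""
    for i, j in enumerate(swap_table):
        items[i], items[j] = items[j], items[i]

def unshuffle(items, swap_table):
    """Undo the swaps by replaying them back to front, in place."""
    for i in range(len(swap_table) - 1, -1, -1):
        j = swap_table[i]
        items[i], items[j] = items[j], items[i]

letters = list("Hellö Wörld!")
table = make_swap_table(len(letters), seed=42)  # seed chosen arbitrarily
shuffle(letters, table)
scrambled = "".join(letters)
unshuffle(letters, table)
restored = "".join(letters)
print(scrambled)
print(restored)
```

Because the table depends only on the length and the seed, the same table can be regenerated later to unscramble a string without storing it.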

MATLAB cell to string

I am trying to read an Excel sheet and then find the cells that are not empty and contain date information, by looking for two '/' characters in a string,
but MATLAB keeps raising errors about the cell type:
"Undefined operator '~=' for input arguments of type 'cell'."
"Undefined function 'string' for input arguments of type 'cell'."
"Undefined function 'char' for input arguments of type 'cell'."
MyFolderInfo = dir('C:\');
filename = 'Export.xls';
[num,txt,raw] = xlsread(filename,'A1:G200');
for i = 1:length(txt)
if ~isnan(raw(i,1))
if sum(ismember(char(raw(i,1)),'/')) == 2
A(i,1) = raw(i,1);
end
end
end
Please help me fix it.
There are multiple issues with your code. Since raw is a cell array, you can't run isnan on it; isnan is for numerical arrays. Since all you're interested in is cells with text in them, you don't need to use raw at all; any blank cells will not be present in txt.
My approach is to create a logical array, has_2_slashes, and then use it to extract the elements from raw that have two slashes in them.
Here is my code. I generalized it to read multiple columns since your original code only seemed to be written to handle one column.
filename = 'Export.xls';
[~, ~, raw] = xlsread(filename, 'A1:G200');
[num_rows, num_cols] = size(raw);
has_2_slashes = false(num_rows, num_cols);
for row = 1:num_rows
for col = 1:num_cols
has_2_slashes(row, col) = sum(ismember(raw{row, col}, '/')) == 2;
end
end
A = raw(has_2_slashes);
cellfun(@numel,strfind(txt,'/'))
should give you a numerical array where the (i,j)th element contains the number of slashes. For example,
>> cellfun(@numel,strfind({'a','b';'/','/abc/'},'/'))
ans =
0 0
1 2
The key here is to use strfind.
Now you may want to expand a bit in your question on what you intend to do next with txt; in other words, specify the desired output in more detail, which is always a good thing to do. If you intend to read the dates, it may be better to parse them upfront, for example with regexp or datetime, rather than building an array that maps to where the dates are. As is, applying ans>=2 next gives you a logical array that lets you extract the matched entries.

Is there anything else used instead of slicing the String?

This is one of the practice problems from Problem solving section of Hackerrank. The problem statement says
Steve has a string of lowercase characters in range ascii[‘a’..’z’]. He wants to reduce the string to its shortest length by doing a series of operations. In each operation he selects a pair of adjacent lowercase letters that match, and he deletes them.
For example : 'aaabbccc' -> 'ac' , 'abba' -> ''
I have tried solving this using slicing, but this gives me a timeout error on larger strings. Is there anything else I could use?
My code:
s = list(input())
i=1
while i<len(s):
    if s[i]==s[i-1]:
        s = s[:i-1]+s[i+1:]
        i = i-2
    i+=1
if len(s)==0:
    print("Empty String")
else:
    print(''.join(s))
This gives me a "terminated due to timeout" message.
Thanks for your time :)
Building a new sequence for every deletion can be expensive,
as each slice-and-concatenate has O(N) linear cost in the length of the string.
Consider processing "aa" * int(1e6).
You will write on the order of 1e12 characters to memory
by the time you're finished.
Take a moment (well, take linear time) to
copy each character over to a mutable list element:
[c for c in giant_string]
Then you can perform dup processing by writing a tombstone
of "" to each character you wish to delete,
using just constant time.
Finally, in linear time you can scan through the survivors using "".join( ... )
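A minimal sketch of a linear-time version follows. Note that it swaps the tombstone idea for a stack, which is a closely related technique: each character is pushed or popped at most once, so no slicing is needed. The function name reduce_pairs is an assumption made for this example.

```python
def reduce_pairs(s):
    """Delete adjacent equal pairs until none remain; return the result."""
    stack = []
    for ch in s:
        if stack and stack[-1] == ch:
            # ch matches the last surviving character, so both are deleted.
            stack.pop()
        else:
            stack.append(ch)
    return "".join(stack)

result = reduce_pairs("aaabbccc")
print(result or "Empty String")
```

Popping the stack automatically exposes the previous surviving character, which is what the i = i-2 step in the question was trying to achieve.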
One other possible solution is to use regex. The pattern ([a-z])\1 matches a duplicate lowercase letter. The implementation would involve something like this:
import re
pattern = re.compile(r'([a-z])\1')
while pattern.search(s): # While match is found
    s = pattern.sub('', s) # Remove all matches from "s"
I'm not an expert at efficiency, but this seems to write fewer strings to memory than your solution. For the case of "aa" * int(1e6) that J_H mentioned, it will only write one, thanks to pattern.sub replacing all occurrences at once.

Set position according to part of an object name

I'm trying to script something in Blender 3D using Python.
I've got a bunch of objects in my scene and want to translate them using the numerical part of their object name.
First of all, I collect objects from the scene by matching a part of their name.
root_obj = [obj for obj in scene.objects if fnmatch.fnmatchcase(obj.name, "*_Root")]
This gives me a list with: [bpy.data.objects['01_Root'],bpy.data.objects['02_Root'],bpy.data.objects['03_Root'],bpy.data.objects['00_Root']]
My goal is to move these objects by 15 times the numerical part of their name. So '00_Root' doesn't have to move, but '01_Root' has to move 15 Blender units and '02_Root' 30 Blender units.
How do I extract the number part of the names and use it as a translation value?
I'm pretty new to Python, so I would appreciate all the help I can get.
A string is a sequence of characters, and each character can be accessed by an index starting at 0: name[0] is the first character and name[1] the second. As with a list, you can use slicing to get a portion of it. If the value is always the first two characters, you can get it with name[:2] and then turn it into an integer with int() or a float with float(). Combined, that becomes:
val = int(name[:2])
You then have a number you can calculate the new location with.
obj.location.x = val * 15
If the number of digits in the name might vary, you can use split() to break the string on a specific separator character. This returns a list of the items between the separators, so take the first item and turn it into an integer.
name = '02_item'
val = int(name.split('_')[0])
Using split also allows multiple values in a name.
name = '2_12_item'
val1 = int(name.split('_')[0])
val2 = int(name.split('_')[1])
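Putting the pieces together, here is a small sketch that runs outside Blender. The names list is a stand-in for the obj.name values from the question, and offset_for and SPACING are hypothetical names introduced only for this example.

```python
# Stand-in for the obj.name values collected from the scene (assumption).
names = ["01_Root", "02_Root", "03_Root", "00_Root"]
SPACING = 15  # Blender units per step, as stated in the question

def offset_for(name, spacing=SPACING):
    """Return the offset encoded by the numeric prefix before the first '_'."""
    prefix = name.split("_")[0]
    return int(prefix) * spacing

offsets = {name: offset_for(name) for name in names}
print(offsets)
```

Inside Blender, the same helper could be applied in a loop over root_obj, assigning offset_for(obj.name) to obj.location.x for each object.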

Lua Pattern Exclusion

I have a predefined code, e.g. "12-345-6789", and wish to match the first and last portions with Lua patterns, e.g. "12-6789". Excluding the second number set and its hyphen should work, but I am having trouble figuring out how to do that with patterns, or whether it is possible at all.
I know I could capture each individually like so
code = "12-345-6789"
first, middle, last = string.match(code, "(%d+)-(%d+)-(%d+)")
and use that but it would require a lot of code rewriting on my part. I would ideally like to take the current table of pattern matches and add it to be used with string.match
lcPart = { "^(%d+)", "^(%d+%-%d+)", "(%d+)$", ?new pattern here? }
code = "12-345-6789"
newCode = string.match(code, lcPart[4])
You can't do this with one capture, but it's trivial to splice the results of two captures together:
local first, last = string.match(code, "(%d+)%-%d+%-(%d+)")
local newid = first .. "-" .. last
If you're trying to match against a list of patterns, it may be better to refactor it into a list of functions instead:
local matchers = {
function(s) return string.match(s, "^(%d+)") end,
function(s) return string.match(s, "^(%d+%-%d+)") end,
-- ...
function(s)
local first, last = string.match(s, "(%d+)%-%d+%-(%d+)")
-- Return nil when the pattern does not match, instead of concatenating nil.
return first and (first .. "-" .. last)
end,
}
for _,matcher in ipairs(matchers) do
local match = matcher(code)
if match then
-- do something
end
end
I know this is an old thread, but someone might still find this useful.
If you need only the first and last sets of digits, separated by a hyphen, you could use string.gsub for that:
local code = "12-345-6789"
local result = string.gsub(code, "(%d+)%-%d+%-(%d+)", "%1-%2")
This leaves result set to the string "12-6789", built from the first and second captures of the pattern.
