Set of strings efficient implementation - string

Is there an easy way to create a set of strings in Matlab?
I am going through a list of filepaths and want to get all names of folders at a specific level.
But since in some folders there are several files, I get these folders several times.
I know there would be the possibility to create a cell array and check every time if the current folder name is already in the array, and if not, add it.
Another option would be to use the java HashSet class.
But is there any easy inbuilt Matlab way to do something like that?
I can't use a Vector since it would create a vector of chars not strings.

Unfortunately there's nothing as efficient as Java Set implementations.
But you can use the set operations on a cell array of strings: either call union each time you add an element, or collect everything with duplicates and call unique once at the end.
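The two strategies are not specific to Matlab. As a rough illustration only, here is a Python sketch (not Matlab code) that contrasts merging on every addition with deduplicating once at the end. The file paths, the `LEVEL` constant and the `folder_at` helper are made up for this example.

```python
from pathlib import PurePosixPath

# Hypothetical file paths; several files share the same folder.
paths = [
    "data/run1/a.txt",
    "data/run1/b.txt",
    "data/run2/c.txt",
    "data/run3/d.txt",
    "data/run2/e.txt",
]
LEVEL = 1  # index of the path component to collect (0 would be "data")


def folder_at(path, level):
    """Return the path component at the given depth (hypothetical helper)."""
    return PurePosixPath(path).parts[level]


# Strategy 1: merge into the set on every addition (like calling union each time).
seen = set()
for p in paths:
    seen |= {folder_at(p, LEVEL)}

# Strategy 2: collect with duplicates, then deduplicate once (like calling unique).
with_duplicates = [folder_at(p, LEVEL) for p in paths]
unique_sorted = sorted(set(with_duplicates))

print(unique_sorted)
```

Both strategies end with the same folder names; the second touches the collection only once, which tends to be the simpler choice when all paths are known up front.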

You could use the rdir script... MATLAB file exchange to the rescue!
Use it like this:
listing = rdir(name);
The function returns a structure listing similar to the built-in dir command.
It should save you the headache of iterating through a directory tree yourself.

How about "unique":
x = {'dog', 'cat', 'cat', 'fish', 'horse', 'bird', 'rat', 'rat'};
x_set=unique(x)
x_set =
'bird' 'cat' 'dog' 'fish' 'horse' 'rat'

Related

Doing Apply(str) to int creates a bunch of \n's?

I have a dataframe like this labeled by year:
I would like to change these ints to strs because they are actually categorical variables. However, this is the result:
X_train['YrSold'] = X_train['YrSold'].apply(str)
I would prefer to get rid of all those \n's automatically in the .apply(str) process, as opposed to postprocessing each column via regex. Seems like the latter would have more room for error.
The solution was to use astype(str) instead of apply(str).

Recommended data structure to store a changeable sequence with a number

I am trying to build an FP-tree and am unsure which data structure I should use to record each prefix path and its occurrence. A prefix path is a sequence of items, like ('coffee','milk','bear'), and its occurrence is an int. I list the two requirements for the data structure below so that you don't need to go deep into FP-trees:
The occurrence of a prefix path needs to be looked up frequently, so a dict like {prefix_path : occurrence} is probably the best way to store them.
The prefix path needs to be updated (re-ranked and filtered) in a conditional FP-tree.
I have looked at other people's work on GitHub and found that they use {tuple(['coffee','milk','bear']):occurrence} or {frozenset(['coffee','milk','bear']):occurrence} to do so. However, when a prefix path is updated, they need to convert the tuple or frozenset into a list and then convert it back. I don't think this is very Pythonic.
I am wondering whether there is a better way to store a prefix path with its occurrence.
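One possible way to meet both requirements, offered as a sketch and not as an established FP-growth implementation, is to keep immutable tuple keys in a Counter and build a fresh table for each conditional tree instead of editing keys in place. The `condition` function, the `keep` set and the `rank` mapping are names invented for this example.

```python
from collections import Counter

# Hypothetical prefix paths with their occurrence counts.
prefix_counts = Counter({
    ("coffee", "milk", "bear"): 3,
    ("coffee", "bear"): 2,
    ("milk",): 1,
})


def condition(prefix_counts, keep, rank):
    """Return a new table: drop items not in `keep`, reorder the rest by `rank`.

    Paths that become identical after the update have their counts merged;
    paths that become empty are dropped.
    """
    updated = Counter()
    for path, count in prefix_counts.items():
        filtered = (item for item in path if item in keep)
        new_path = tuple(sorted(filtered, key=rank.__getitem__))
        if new_path:
            updated[new_path] += count
    return updated


# Assumed ranking for the conditional tree (lower value comes first).
rank = {"bear": 0, "coffee": 1, "milk": 2}
conditional = condition(prefix_counts, keep={"coffee", "bear"}, rank=rank)

print(conditional)
```

The original table is left untouched, lookups stay O(1) on average, and the filter and re-rank step is a single expression per path.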

How to save strings in matrixes in matlab

I want to have a matrix/cell, that has strings inside that I can access and use later as strings.
For instance, I have one variable (MyVar) and one cell (site) with names inside:
MyVar=-9999;
site={'New_York'; 'Lisbon'; 'Sydney'};
Then I want to do something like:
SitePosition=strcat(site{1},'_101'}
and then do this
save(sprintf('SitePosition%d',MyVar),);
This doesn't work at all! Is there a way to keep strings in a matrix and access them so that I can keep working with them as strings?
This:
MyVar=-9999; site={'New_York'; 'Lisbon'; 'Sydney'};
SitePosition = strcat(site{1},'_101');
save(sprintf('SitePosition%d',MyVar));
Works fine and yields SitePosition-9999.mat. Note the syntax changes in lines 2 and 3.
Is there something else you're expecting?
EDIT: Based on your comment
Check out the documentation for save regarding saving specific variables
New example:
MyVar=-9999;
site={'New_York'; 'Lisbon'; 'Sydney'};
SitePosition = strcat(site{1},'_101');
save(SitePosition,'MyVar');
Creates New_York_101.mat with only the variable MyVar in it.

Checking if values in List is part of String

I have a string like this:
val a = "some random test message"
I have a list like this:
val keys = List("hi","random","test")
Now, I want to check whether the string a contains any of the values from keys. How can we do this using the built-in library functions of Scala?
(I know I could split a into a List and then check it against the keys list. But I'm looking for a simpler way of solving it using standard library functions.)
Something like this?
keys.exists(a.contains(_))
Or even more idiomatically
keys.exists(a.contains)
The simple case is to test substring containment (as remarked in rarry's answer), e.g.
keys.exists(a.contains(_))
You didn't say whether you actually want to find whole word matches instead. Since rarry's answer assumed you didn't, here's an alternative that assumes you do.
val a = "some random test message"
val words = a.split(" ")
val keys = Set("hi","random","test") // could be a List (see below)
words.exists(keys contains _)
Bear in mind that a list of keys is only efficient when the list is small. With a list, the contains method typically scans the list linearly until it finds a match or reaches the end.
For larger numbers of items, a set is not only preferable but also a truer representation of the information. Sets are typically implemented with hashing, so a lookup needs little or no linear searching.
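The difference between substring containment and whole-word matching is easy to miss. The following Python sketch mirrors the two Scala approaches above for illustration; the second string `b` is an added example chosen so that the two checks disagree.

```python
a = "some random test message"
keys = ["hi", "random", "test"]

# Substring containment, analogous to keys.exists(a.contains).
has_substring = any(k in a for k in keys)

# Whole-word matching: split the string and test each word against a set,
# analogous to words.exists(keys contains _).
key_set = set(keys)
has_word = any(w in key_set for w in a.split(" "))

# An added example where the two checks differ:
# "hi" occurs inside "this" but is not a word of the sentence.
b = "this is it"
substring_hit = any(k in b for k in keys)
word_hit = any(w in key_set for w in b.split(" "))

print(has_substring, has_word, substring_hit, word_hit)
```

With a set, each word lookup is a hash lookup, so the whole-word check scales with the number of words and not with the number of keys.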

How to format a flat string with integers in it in erlang?

In erlang, I want to format a string with integers in it and I want the result to be flattened. But I get this:
io_lib:format("sdfsdf ~B", [12312]).
[115,100,102,115,100,102,32,"12312"]
I can get the desired result by using the code below but it is really not elegant.
lists:flatten(io_lib:format("sdfsdf ~B", [12312])).
"sdfsdf 12312"
Is there a better way of formatting strings with integers in them, so that they are flat? Ideally, using only one function?
You flatten a list using lists:flatten/1 as you've done in your example.
If you can accept a binary, list_to_binary/1 is quite efficient:
1> list_to_binary(io_lib:format("sdfsdf ~B", [12312])).
<<"sdfsdf 12312">>
However, question why you need a flat list in the first place. If it is just cosmetics, you don't need it. io:format/1,2,3 and most other port functions (gen_tcp etc.) accept so-called deep IO lists (nested lists with characters and binaries):
2> io:format([115,100,102,115,100,102,32,"12312"]).
sdfsdf 12312ok
There is an efficiency reason that io_lib:format returns deep lists. Basically it saves a call to lists:flatten.
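To make the idea of a deep list concrete outside Erlang, here is a small Python sketch. It is an analogy only: Python strings stand in for Erlang's nested character lists, and `flatten` and `write_deep` are names made up for this example. It shows that a consumer can walk the nested structure directly, so building a flat copy first is extra work.

```python
import io


def flatten(deep):
    """Build one flat string from a nested list of character codes and strings."""
    out = []
    for item in deep:
        if isinstance(item, int):
            out.append(chr(item))       # a single character code
        elif isinstance(item, str):
            out.append(item)            # an already-flat piece
        else:
            out.append(flatten(item))   # a nested list
    return "".join(out)


def write_deep(deep, stream):
    """Write a nested list to a stream without building a flat copy first."""
    for item in deep:
        if isinstance(item, int):
            stream.write(chr(item))
        elif isinstance(item, str):
            stream.write(item)
        else:
            write_deep(item, stream)


# Same shape as the io_lib:format result shown in the question.
deep = [115, 100, 102, 115, 100, 102, 32, "12312"]

flat = flatten(deep)

buffer = io.StringIO()
write_deep(deep, buffer)

print(flat)
print(buffer.getvalue())
```

Both paths produce the same text; the second never materialises the intermediate flat value, which is the saving the answer refers to.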
Ask yourself why you want the list flattened. If you are going to print the list or send it to a port or write it to a file, all those operations handle deep lists.
If you really need a flattened list for some reason, then just flatten it. Or you can create your own my_io_lib:format that returns flattened lists if you think it important.
(If you only want to flatten the list for debugging reasons then either print your strings with ~s, or create a flattener in an erlang module named user_default. Something like this:
-module(user_default).
-compile(export_all).
%% either this:
fl(String) ->
    lists:flatten(String).
%% or this:
pp(String) ->
    io:format("~s~n", [String]).
Then you can use fl/1 and pp/1 in the Erlang shell (as long as user_default.beam is in your path of course).)

Resources