I have a string array, for instance:
arr = ['hello'; 'world'; 'hello'; 'again'; 'I----'; 'said-'; 'hello'; 'again']
How can I extract the most frequent string, which is 'hello' in this example?
First, use a cell array rather than a string array:
arr = {'hello', 'world'; 'hello', 'again'; 'I----', 'said-'; 'hello', 'again'};
Second, use unique to get the unique strings (this doesn't work on a string array, which is why I suggest the cell):
[unique_strings, ~, string_map]=unique(arr);
Then use mode on the string_map variable to find the most common values:
most_common_string=unique_strings(mode(string_map));
It is better to use cell arrays and regexp function; the behavior of string arrays may not be what you expect.
arr = {'hello', 'world'; 'hello', 'again'; 'I----', 'said-'; 'hello', 'again'};
If you use
hellos = sum(~cellfun('isempty', regexp(arr, 'hello')));
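it counts the cells in arr that contain 'hello'. Note that sum works column by column, so for a 2-D cell array like the one above you get one count per column.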
So far, I have:
my_list = ['hello', 'oi']
comparison_list = ['this hellotext', 'this oitext']
for w in my_list:
    if w in comparison_list: print('yes')
However, nothing prints because no element in my_list equals any element in comparison_list.
So how do I make this check match a substring as well as a whole element?
Ideal output:
yes
yes
You are currently checking whether the complete string is an element of the list. Instead, check whether the string occurs inside each comparison string. A simple approach is to rewrite the loop as below:
for w in my_list:
    # Check every comparison string; any() is true if at least one of them contains w
    if any([True for word in comparison_list if w in word]):
        print('yes')
It's because you're comparing w to the list elements themselves. If you want to find w inside each string in comparison_list, you can use any:
my_list = ['hello', 'oi', 'abcde']
comparison_list = ['this hellotext', 'this oitext']
for w in my_list:
    if any(w in s for s in comparison_list):
        print('yes')
    else:
        print('no')
I added a string to your list and handled the 'no' case in order to get an output for each element.
Output:
yes
yes
no
Edited Solution:
Apologies for older solution, I was confused somehow.
Using the re module, we can use re.search to check whether any of the strings occurs in an item. Build the pattern by joining all the strings with | using str.join; | means 'or', so the pattern matches if any of the searched strings is present. Then iterate over comparison_list and test each item. Note that I am assuming my_list contains no regex special characters.
import re
reg = '|'.join(my_list)
for item in comparison_list:
    print(bool(re.search(reg, item)))
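The answer above assumes my_list has no regex special characters. Here is a minimal sketch of one way to drop that assumption, using re.escape from the standard library. The extra entries 'a.b', 'axb' and 'a.b here' are made up for illustration and are not from the question:

```python
import re

# 'a.b', 'axb' and 'a.b here' are hypothetical extras added for this sketch.
my_list = ['hello', 'oi', 'a.b']
comparison_list = ['this hellotext', 'this oitext', 'axb', 'a.b here']

# Escape each term so that characters such as '.' or '|' match literally.
pattern = re.compile('|'.join(re.escape(term) for term in my_list))

# One boolean per comparison string: does any term occur inside it?
results = [bool(pattern.search(item)) for item in comparison_list]
print(results)
```

Without re.escape, the '.' in 'a.b' would match any character, so 'axb' would also count as a hit.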
Problem to solve:
Write a function that will find all the anagrams of a word from a list. You will be given two inputs: a word and an array of words. You should return an array of all the anagrams, or an empty array if there are none.
Solution Tested:
a = ['aabb', 'abcd', 'bbaa', 'dada']
b = ['abab']
listA = []
sorted_defaultword = sorted(b[0])
print (sorted_defaultword)
for i in range(len(a)):
    #print(a[i])
    sorted_word = sorted(a[i])
    #print (sorted_word)
    if (sorted_word == sorted_defaultword):
        listA.append(a[i])
print (listA)
Test Output:
['a', 'a', 'b', 'b']
['aabb', 'bbaa']
Using the test, I then tried to write my function, but it does not work. Can someone please suggest why?
def anagrams(word, words):
    sorted_defaultword = sorted(word[0])
    anagram_List = []
    for i in range(len(words)):
        sorted_word = sorted(words[i])
        if (sorted_word == sorted_defaultword):
            anagram_List.append(words[i])
    return anagram_List
Why is this failing when I put it in a function?
You are passing the wrong type of argument to the function.
Test.assert_equals(anagrams('abba', ['aabb', 'abcd', 'bbaa', 'dada']), ['aabb', 'bbaa'])
Here you pass the first parameter as a string, while the function expects a list: with a string, word[0] is only its first character, 'a'.
Change your call to:
Test.assert_equals(anagrams(['abba'], ['aabb', 'abcd', 'bbaa', 'dada']), ['aabb', 'bbaa'])
Note that I have just wrapped 'abba' in a list, because your function expects a list.
If you want to keep your original call, change this line in your function from sorted_defaultword = sorted(word[0]) to sorted_defaultword = sorted(word).
And this should do the job.
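For reference, a minimal sketch of the second option (sorting the whole word instead of word[0]), written with a list comprehension. The names target and candidate are my own and not from the question:

```python
def anagrams(word, words):
    """Return the entries of `words` that are anagrams of `word`."""
    # Sort the whole word; word[0] would be only its first character.
    target = sorted(word)
    return [candidate for candidate in words if sorted(candidate) == target]


print(anagrams('abba', ['aabb', 'abcd', 'bbaa', 'dada']))
```

With this version the original call, which passes 'abba' as a plain string, works unchanged.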
I'm in the process of learning Python, and with a practical example I've come across a problem I can't seem to find the solution for.
The error I get with the following code is
'list' object has no attribute 'upper'.
def to_upper(oldList):
    newList = []
    newList.append(oldList.upper())
words = ['stone', 'cloud', 'dream', 'sky']
words2 = (to_upper(words))
print (words2)
Since the upper() method is defined for string only and not for list, you should iterate over the list and uppercase each string in the list like this:
def to_upper(oldList):
    newList = []
    for element in oldList:
        newList.append(element.upper())
    return newList
This solves the issue with your code; however, there are more compact versions if you want to uppercase a list of strings.
The map function, map(f, iterable). In this case your code will look like this:
words = ['stone', 'cloud', 'dream', 'sky']
words2 = list(map(str.upper, words))
print (words2)
A list comprehension, [func(i) for i in iterable]. In this case your code will look like this:
words = ['stone', 'cloud', 'dream', 'sky']
words2 = [w.upper() for w in words]
print (words2)
You can use the list comprehension notation and apply the upper method to each string in words:
words = ['stone', 'cloud', 'dream', 'sky']
words2 = [w.upper() for w in words]
Or alternatively use map to apply the function:
words2 = list(map(str.upper, words))
As far as I know, the upper() method is implemented for strings only. You have to call it on each element of the list, not on the list itself.
It's great that you're learning Python! In your example, you are trying to uppercase a list. If you think about it, that simply can't work. You have to uppercase the elements of that list. Additionally, you are only going to get an output from your function if you return a result at the end of the function. See the code below.
Happy learning!
def to_upper(oldList):
    newList = []
    for l in oldList:
        newList.append(l.upper())
    return newList
words = ['stone', 'cloud', 'dream', 'sky']
words2 = (to_upper(words))
print (words2)
Is there a better way to do this? Note: part1, part2 and part3 are string variables defined elsewhere (they can be null).
def list = [part1, part2, part3]
list.removeAll([null])
def ans = list.join()
The desired result is a concatenated string with null values left out.
You can do this:
def ans = [part1, part2, part3].findAll({it != null}).join()
You might be able to shrink the closure down to just {it} depending on how your list items will evaluate according to Groovy Truth, but this should make it a bit tighter.
Note: The GDK javadocs are a great resource.
If you use findAll with no parameters, it returns every "truthy" value, so this should work:
def ans = [part1, part2, part3].findAll().join()
Notice that findAll will filter out empty strings (because they are evaluated as false in a boolean context), but that doesn't matter in this case, as the empty strings don't add anything to join() :)
If this is a simplified question and you want to keep empty string values, you can use findResults{ it }.
Alternatively, you can do this as a fold operation with inject:
def ans = [part1, part2, part3].inject('') { result, element ->
    result + (element ?: '')
}
This iterates the whole list and concatenates each successive element to a result, with logic to use the empty string for null elements.
You could use grep:
groovy:000> list = ['a', 'b', null, 'c']
===> [a, b, null, c]
groovy:000> list.grep {it != null}.join()
===> abc
I am doing string processing in Matlab, and I usually use cell arrays to store the individual words in the text.
Example:
a = {'this', 'is', 'an', 'array', 'of', 'strings'}
To search for the word 'of' in this array, I loop through the array and check each element against my word. This does not scale: with a large dataset, my array a grows large and looping through its elements is slow. Is there a smarter way, perhaps a better native data structure in Matlab, that would make this faster?
A map container is one option. I don't know what specific sort of string processing you intend to do, but here's an example of how you can store each string as a key associated with a vector of the index positions of that word in a cell array:
a = {'this', 'is', 'an', 'array', 'of', 'strings', 'this', 'is'};
strMap = containers.Map();  %# Create container
for index = 1:numel(a)      %# Loop over words to add
    word = a{index};
    if strMap.isKey(word)
        strMap(word) = [strMap(word) index];  %# Add to an existing key
    else
        strMap(word) = index;                 %# Make a new key
    end
end
You could then get the index positions of a word:
>> indices = strMap('this')
indices =
1 7 %# Cells 1 and 7 contain 'this'
Or check if a word exists in the cell array (i.e. if it is a key):
>> strMap.isKey('and')
ans =
0 %# 'and' is not present in the cell array