How to compare character frequency and return the most occurring character? - python-3.x

I am trying to build a function which returns the most frequent character in a given string, and it's working pretty nicely, but how do I return None if two or more characters share the highest frequency?
Like for input: 'abac'
Expected output is: 'a'
and for input: 'abab'
Expected output is: None
I have tried using a dictionary to store character frequency and then returning the element with largest value.
def most_occuring_char(str1):
    count = {}
    max = 0
    c = ''
    for char in str1:
        if char in count.keys():
            count[char]+=1
        else:
            count[char] = 1
    for char in str1:
        if max < count[char]:
            max = count[char]
            c = char
    return c
I don't know how to check whether elements of the count dictionary have the same frequency.

You can replace the manual counting in the dict with collections.Counter.
You basically only have to add a check to see if the maximum count is unique (if so, return the char with maximum number of occurrences) or not (if so, return None):
from collections import Counter

def most_occurring_char(string):
    counter = Counter(string)
    max_char_count = max(counter.values())
    is_unique = len([char_count for char_count in counter.values() if char_count == max_char_count]) == 1
    if is_unique:
        char = [char for char, count in counter.items() if count == max_char_count][0]
        return char
    return None

# Tests
assert most_occurring_char('abac') == 'a'
assert most_occurring_char('abab') is None
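A more compact variant, sketched here as an alternative rather than the answer's own code, asks Counter.most_common(2) for the two highest counts. It also handles the empty string, where max(counter.values()) in the code above would raise a ValueError.

```python
from collections import Counter

def most_occurring_char(string):
    # most_common(2) returns up to two (char, count) pairs, highest count first
    top = Counter(string).most_common(2)
    if not top:
        return None  # empty string: nothing to count
    if len(top) == 2 and top[0][1] == top[1][1]:
        return None  # two characters tie for the highest count
    return top[0][0]

print(most_occurring_char('abac'))  # a
print(most_occurring_char('abab'))  # None
print(most_occurring_char(''))      # None
```

Only the top two counts need to be compared: if they differ, the first one is the unique maximum.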

Once you have a dictionary containing the counts of every character (after your first for loop), you can inspect this to determine whether certain counts are the same or not.
If you wish to return None only when all the character counts are the same, you could extract the values (i.e. the character counts) from your dictionary, sort them so they are in numerical order, and compare the first and last values. Since they are sorted, if the first and last values are the same, so are all the intervening values. This can be done using the following code:
count_values = sorted(count.values())
if count_values[0] == count_values[-1]: return None
If you wish to return None whenever there is no single most frequent character, you could instead compare the last value of the sorted list to the second-to-last. If these are equal, there are two or more characters that occur most frequently. The code for this is very similar to the code above; the length check avoids an IndexError when the string contains only one distinct character.
count_values = sorted(count.values())
if len(count_values) > 1 and count_values[-1] == count_values[-2]: return None
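Putting the pieces together, a minimal sketch of the asker's function with the tie check added might look like this. It swaps in dict.get for the counting and max(count, key=count.get) for the second loop; those substitutions are choices made for brevity, not part of the answer.

```python
def most_occuring_char(str1):
    # count each character, as in the original first loop
    count = {}
    for char in str1:
        count[char] = count.get(char, 0) + 1
    if not count:
        return None  # empty string
    count_values = sorted(count.values())
    # equal top two counts mean there is no single most frequent character
    if len(count_values) > 1 and count_values[-1] == count_values[-2]:
        return None
    return max(count, key=count.get)

print(most_occuring_char('abac'))  # a
print(most_occuring_char('abab'))  # None
```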

Another possibility:
def most_occuring_char(s):
    from collections import Counter
    d = Counter(s)
    k = sorted(d, key=lambda x: d[x], reverse=True)
    if len(k) == 1: return k[0]
    return None if len(k) == 0 or d[k[0]] == d[k[1]] else k[0]
#Test
print(most_occuring_char('abac')) #a
print(most_occuring_char('abab')) #None (same frequencies)
print(most_occuring_char('x')) #x
print(most_occuring_char('abcccba')) #c
print(most_occuring_char('')) #None (empty string)

Related

Two Sum function not working with recurring elements in list

I'm trying to complete "Two Sum", which goes as such:
Write a function that takes an array of numbers (integers for the tests) and a target number. It should find two different items in the array that, when added together, give the target value. The indices of these items should then be returned in a tuple like so: (index1, index2).
Efficiency of my code aside, this is what I have so far:
def two_sum(numbers, target):
    for i in numbers:
        for t in numbers:
            if i + t == target:
                if numbers.index(i) != numbers.index(t):
                    return (numbers.index(i), numbers.index(t))
    return False
It works for inputs such as:
>>> two_sum([1,2,3,4,5,6,7,8,9,10], 11)
(0, 9)
But when I try a list of numbers that have recurring numbers that add up to the target, the code doesn't work:
>>> two_sum([2, 2], 4)
False
The code, for some reason that I cannot figure out, does not reach index [1] of the list, and thus returns False.
Why is that?
The list method index() always returns the first occurrence of an item in a list, so for [2, 2] numbers.index(i) != numbers.index(t) evaluates to 0 != 0, which is False.
You should use the builtin enumerate() to store the indices while looping over the list.
def two_sum(numbers, target):
    for i, number_a in enumerate(numbers):
        for j, number_b in enumerate(numbers):
            if number_a + number_b == target and i != j:
                return (i, j)
    return False
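Since the question sets efficiency aside, note that the nested loops are O(n²). A common alternative, not taken from the answers here, is a single pass with a dict that maps each value to the index where it first appeared; for each number you look up its complement. It may return a different valid pair than the nested loops: for the first example it returns (4, 5) (values 5 and 6) rather than (0, 9).

```python
def two_sum(numbers, target):
    seen = {}  # value -> index of its first occurrence
    for j, number in enumerate(numbers):
        complement = target - number
        if complement in seen:
            # seen[complement] < j, so the two indices are always different
            return (seen[complement], j)
        seen.setdefault(number, j)
    return False

print(two_sum([2, 2], 4))  # (0, 1)
```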
'''
Alternatively, let both loops finish and collect the result in a list (you can't
append to a tuple), then convert the list to a tuple once the loops complete.
Compare positions from enumerate() rather than numbers.index(), which always
finds the first occurrence of a value.
'''
def two_sum(numbers, target):
    result = []
    for i, a in enumerate(numbers):
        for j, b in enumerate(numbers):
            # record only the first matching pair of distinct positions
            if a + b == target and i != j and not result:
                result.append(i)
                result.append(j)
    if len(result) > 0:
        return tuple(result)
    else:
        return False
Your code looks fine except this part:
if numbers.index(i) != numbers.index(t):
    return (numbers.index(i), numbers.index(t))
return False
Because the index method returns only the first occurrence of a value, numbers.index(i) and numbers.index(t) are always equal when i and t have the same value, so the function returns False. The index of the value 2 is always 0 in the list, even though there is another 2 at index 1.
Source: https://www.w3schools.com/python/ref_list_index.asp
What you want to do is this:
def two_sum(numbers, target):
    i_index = 0
    for i in numbers:
        t_index = 0  # restart the inner index for every outer value
        for t in numbers:
            if i + t == target:
                if i_index != t_index:
                    return (i_index, t_index)
            t_index += 1
        i_index += 1
    return False
This way the indices are tracked independently of the values.
def pairs_sum_to_target(list1, list2, target):
    '''
    This function is about a game: it accepts a target integer named target and
    two lists of integers (list1 and list2).
    The function returns all pairs of indices in the form [i, j]
    where list1[i] + list2[j] == target.
    In other words, it returns the pairs of indices whose values sum to target.
    Important: in this game list1 and list2 always have the same number of
    elements, and the pairs are returned in order.
    '''
    pairs = []  # starts empty; stores the index pairs whose values sum to target
    # for every index in list1, loop over every index in list2 and compare the sum with target
    for i, value1 in enumerate(list1):
        for j, value2 in enumerate(list2):
            if value1 + value2 == target:  # list1[i] + list2[j] == target: record [i, j] in order
                pairs.append([i, j])
    return pairs
Sample input #1
"""This is one example input for list1, list2 and target, for properly testing this function."""
list1 = [1,-2,4,5,9]
list2 = [4,2,-4,-4,0]
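The same nested loop can be written as a list comprehension. The sample above does not include a target, so target = 5 below is an arbitrary value chosen only for illustration.

```python
def pairs_sum_to_target(list1, list2, target):
    # every [i, j] with list1[i] + list2[j] == target, in loop order
    return [[i, j]
            for i, value1 in enumerate(list1)
            for j, value2 in enumerate(list2)
            if value1 + value2 == target]

list1 = [1, -2, 4, 5, 9]
list2 = [4, 2, -4, -4, 0]
target = 5  # hypothetical target; not given in the sample
print(pairs_sum_to_target(list1, list2, target))  # [[0, 0], [3, 4], [4, 2], [4, 3]]
```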

Need help writing a function which returns a dictionary with the keys as recursive digit sums

So I have written a function which calculates the sum of the digits of the number passed to it. Now I am trying to write another function which returns a dictionary whose keys are the values from my digitsum function and whose values are how many times each digit sum occurs. Any ideas on how to go about writing the second function?
from collections import Counter

def digitsum(x):
    if x < 10:
        return x
    else:
        return (x%10) + digitsum(x//10)

def digitsumdictionary(lnum=0, hnum=100):
    L = [digitsum(num) for num in range(lnum, hnum)]
    counter = dict(Counter(L))
    return counter
The digitsum function is called once per digit of the number, so you can find the number of calls simply with len(str(num)). But if you want to count the calls as the function recurses, try this:
def digitsum(x, count=1):
    if x < 10:
        return { x : count }
    else:
        # recurse once and unpack the single {digit_sum: calls} entry it returns
        (rest_sum, calls), = digitsum(x//10, count+1).items()
        return {(x%10) + rest_sum : calls}
Setting the count to 1 or 0 initially, includes or excludes the first call respectively.
The below code returns a list of dictionaries of the desired output.
[digitsum(i) for i in range(10)]

how to find the most occurring sentence from a string data frame in python 3

import pandas as pd

df = pd.DataFrame({
'Name': ['Ann', 'Juh', 'Jeo', 'Sam'],
'Age': [43,29,42,59],
'Task1': ['drafting a letter', 'Sending', 'Pasting', 'Sending'],
'Task2': ['Sending', 'Packing', 'Sending', 'Pasting'],
'Task3': ['Packing', 'Letter Drafting', 'Packing', 'Letter Drafting']
})
In the above string data frame, I need to check the occurrence of the given conditions.
condition = ["reading", "drafting a letter","packing book","sorting","sending","counting"]
For this I made a new column that combines Task1, Task2 and Task3 using
df['NewTask'] = df[df.columns[2:]].apply(
lambda x: ','.join(x.dropna().astype(str)),
axis=1)
And I applied the logic obtained from
https://www.geeksforgeeks.org/sentence-that-contains-all-the-given-phrases/
and I am getting
Phrase1:count=0, plus the corresponding index values.
Phrase2:count=1 etc..
Now I need to find the most commonly occurring phrase and the most commonly occurring pair of phrases from condition in df. The data frame above is a sample.
The logic I used to get the count of each phrase separately is:
def getRes(sent, ph):
    sentHash = dict()
    # Loop for adding hashed sentences to sentHash
    for s in range(1, len(sent)+1):
        sentHash[s] = set(sent[s-1].split())
    # For each phrase
    for p in range(0, len(ph)):
        print("Phrase"+str(p + 1)+":")
        # Get the list of words
        wordList = ph[p].split()
        res = []
        # Then check in every sentence
        for s in range(1, len(sentHash)+1):
            wCount = len(wordList)
            # Every word in the phrase
            for w in wordList:
                if w in sentHash[s]:
                    wCount -= 1
            # If every word in the phrase matches
            if wCount == 0:
                # add the (already 1-based) sentence index to the result array
                res.append(s)
        if len(res) == 0:
            print("NONE")
        else:
            print('% s' % ' '.join(map(str, res)))

def main():
    sent = df['NewTask']
    condition = ["reading", "drafting a letter","Packing","pasting","Sending","counting"]
    getRes(sent,condition)

main()
To count the rows that match each condition, you can filter your dataframe to only the rows where one of your tasks equals the condition and then count those rows:
condition2 = {}
for criteria in condition:
    condition2[criteria] = df.loc[(df['Task1'] == criteria) | (df['Task2'] == criteria) | (df['Task3'] == criteria)].shape[0]
If you would prefer to use your new column, you can check it for the task name instead, though this is less robust: str.contains matches substrings and is case-sensitive by default.
condition2 = {}
for criteria in condition:
    condition2[criteria] = df.loc[df['NewTask'].str.contains(criteria)].shape[0]
To identify common pairs of tasks, one approach is to use the itertools module to create every possible combination of two tasks and then count how many rows contain both tasks.
import itertools
combinations = itertools.combinations(condition, 2)
You can then find the rows where both of these tasks are carried out in the same way as before.
pairs = {}
for i in combinations:
    # & combines the two boolean masks element-wise (logical AND)
    pairs[i] = df.loc[df['NewTask'].str.contains(i[0]) & df['NewTask'].str.contains(i[1])].shape[0]
To print the most common pair and its count, you can use:
best_pair = max(pairs, key=pairs.get)
print(best_pair, pairs[best_pair])
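If you only need the counts, the same idea can be sketched in plain Python with collections.Counter and itertools.combinations, without pandas. This is an alternative, not the answer's code; it assumes the task lists are copied from the sample DataFrame, uses the condition list from main(), lowercases everything so that 'Sending' matches 'sending', and uses exact matching instead of substring matching.

```python
from collections import Counter
from itertools import combinations

# task columns of the sample DataFrame, one list per row (copied by hand; assumption)
rows = [
    ['drafting a letter', 'Sending', 'Packing'],
    ['Sending', 'Packing', 'Letter Drafting'],
    ['Pasting', 'Sending', 'Packing'],
    ['Sending', 'Pasting', 'Letter Drafting'],
]
condition = ["reading", "drafting a letter", "Packing", "pasting", "Sending", "counting"]

def count_tasks_and_pairs(rows, condition):
    wanted = [c.lower() for c in condition]  # case-insensitive comparison (assumption)
    single, pairs = Counter(), Counter()
    for row in rows:
        tasks = {t.lower() for t in row}
        present = [c for c in wanted if c in tasks]  # keeps condition order
        single.update(present)
        pairs.update(combinations(present, 2))  # every pair present in this row
    return single, pairs

single, pairs = count_tasks_and_pairs(rows, condition)
print(single.most_common(1))  # [('sending', 4)]
print(pairs.most_common(1))   # [(('packing', 'sending'), 3)]
```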

Python 3.xx - Deleting consecutive numbers/letters from a string

I actually need help evaluating what is going on with the code which I wrote.
It is meant to function like this:
input: remove_duple('WubbaLubbaDubDub')
output: 'WubaLubaDubDub'
another example:
input: remove_duple('aabbccdd')
output: 'abcd'
I am still a beginner and I would like to know both what is wrong with my code and an easier way to do it. (There are some lines in the code which were part of my efforts to visualize what was happening and debug it)
def remove_duple(string):
    to_test = list(string)
    print (to_test)
    icount = 0
    dcount = icount + 1
    for char in to_test:
        if to_test[icount] == to_test[dcount]:
            del to_test[dcount]
            print ('duplicate deleted')
            print (to_test)
            icount += 1
        elif to_test[icount] != to_test[dcount]:
            print ('no duplicated deleted')
            print (to_test)
            icount += 1
    print ("".join(to_test))
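Don't modify a list (e.g. del to_test[dcount]) while you are iterating over it: deleting an element shifts the remaining elements left, so the loop skips items. Also note that dcount is set once, before the loop, and never updated, so every comparison is against to_test[1]. The appropriate way to deal with this is to build a new list containing only the values you want.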
A fix for your code could look like:
In []:
def remove_duple(s):
    new_list = []
    for i in range(len(s)-1):  # one less than length to avoid IndexError
        if s[i] != s[i+1]:
            new_list.append(s[i])
    if s:  # handle passing in an empty string
        new_list.append(s[-1])  # need to add the last character
    return "".join(new_list)  # return it (print it outside the function)
remove_duple('WubbaLubbaDubDub')
Out[]:
'WubaLubaDubDub'
As you are looking to step through the string two characters at a time, you can simply zip the string with itself shifted by one and keep the first character of each pair whenever the two characters differ, e.g.:
In []:
import itertools as it
def remove_duple(s):
    return ''.join(x for x, y in it.zip_longest(s, s[1:]) if x != y)
remove_duple('WubbaLubbaDubDub')
Out[]:
'WubaLubaDubDub'
In []:
remove_duple('aabbccdd')
Out[]:
'abcd'
Note: you need itertools.zip_longest() or you will drop the last character. The default fillvalue of None is fine for a string.
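Another option, not mentioned in the answers, is itertools.groupby, which groups runs of equal consecutive characters; keeping one key per run removes the consecutive duplicates and needs no special handling for the last character or the empty string.

```python
from itertools import groupby

def remove_duple(s):
    # groupby yields (char, run_iterator) for each run of identical consecutive chars
    return ''.join(char for char, _ in groupby(s))

print(remove_duple('WubbaLubbaDubDub'))  # WubaLubaDubDub
print(remove_duple('aabbccdd'))          # abcd
```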

How to compare two strings where the order of the characters doesn't matter

I'm looking for a function that compares two strings without regard to the order of their characters. E.g.:
'abc' == 'bca'
True
Does python have some built-in function like this?
Strings in python are iterables that return the characters in the string. Like any iterable, they can be wrapped with sorted. If you compare those wrappers, you'd essentially be comparing whether both strings have the same characters in them, regardless of their original order:
>>> s = 'abc'
>>> t = 'bca'
>>> s == t
False
>>> sorted(s) == sorted(t)
True
For smaller strings you could call sorted on both strings:
def is_equal(a, b):
    return len(a) == len(b) and sorted(a) == sorted(b)
For sorted(a) == sorted(b) to be True, both strings must already have the same length and the same characters, so the len check is just a cheap early exit.
Or, if the string lengths are the same, use a Counter to compare the character frequencies:
from collections import Counter
def is_equal(a, b):
    return len(a) == len(b) and Counter(a) == Counter(b)
If the strings have different lengths, they cannot be equal; if they have the same length, every character in a must appear in b with the same frequency. This solution is O(n), as opposed to O(n log n) for calling sorted.
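The frequencies matter: a comparison with set(), which a reader might try as a shortcut, discards repeated characters. This small check, not from the original answers, contrasts the two.

```python
from collections import Counter

def is_equal(a, b):
    # same length and same count for every character
    return len(a) == len(b) and Counter(a) == Counter(b)

# set() keeps only distinct characters, so it cannot tell 'aab' from 'abb'
print(set('aab') == set('abb'))  # True
print(is_equal('aab', 'abb'))    # False
print(is_equal('abc', 'bca'))    # True
```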
