Calculating the occurrence of unknown values in a list - python-3.x

I very often face the following problem:
I have a list with unknown elements in it (but each element is of the same type, e.g. str) and I want to count the occurrences of each element. Sometimes I also want to do something with the counts, so I usually store them in a dictionary.
My problem is that I cannot "auto-initialize" a dictionary entry with += 1, so I first have to check whether the given element is already in the dictionary.
My usual go-to solution:
dct = {}
for i in iterable:
    if i in dct:
        dct[i] += 1
    else:
        dct[i] = 1
Is there a simpler solution to this problem?

Yes! A defaultdict.
from collections import defaultdict
dct = defaultdict(int)
for i in iterable:
    dct[i] += 1
Docs: https://docs.python.org/3.3/library/collections.html#collections.defaultdict
You can auto-initialise with other types too:
d = defaultdict(str)
d[i] += 'hello'
If you're just counting things, you could use a Counter instead:
from collections import Counter
c = Counter(iterable) # c is a subclass of dict
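As a rough comparison, here is a small sketch that runs the three approaches on a made-up list and checks that they agree. The name `words` and its contents are only illustrative; `dict.get` and `Counter.most_common` are standard library calls.

```python
from collections import Counter, defaultdict

# Illustrative input; any iterable of hashable items works.
words = ["a", "b", "a", "c", "b", "a"]

# 1. Plain dict, with dict.get supplying a default of 0 for new keys.
plain = {}
for w in words:
    plain[w] = plain.get(w, 0) + 1

# 2. defaultdict(int): a missing key starts at int() == 0.
dd = defaultdict(int)
for w in words:
    dd[w] += 1

# 3. Counter does the counting loop for you.
c = Counter(words)

print(plain == dict(dd) == dict(c))  # True: all three hold the same counts
print(c.most_common(2))              # [('a', 3), ('b', 2)]
```

If you only need counts, `Counter` is probably the shortest option; `defaultdict` is more general because the default factory can be any callable, such as `list` or `set`.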

Related

Number of elements in a nested sublist (starting from the first Index)

My code is as follows :
Nums=[['D'],['A','B'],['A','C'],['C','A']]
Output should be D=0
A=2
C=1
B=0
I have tried as follows:
nums=[['D'],['A','B'],['A','C'],['C','A']]
d=dict()
for i in nums:
    for j in i:
        if len(i)==1:
            d[j]=0
        else:
            d[j]=1
print(d)
Am I on the right path to choose a dictionary to count the path?
Please post your suggestion in any data-structure
import collections
seen_dict = collections.Counter([x[0] for x in Nums if len(x) > 1])
To get a dictionary holding each element's number of occurrences minus one, you can use a dictionary comprehension over a Counter:
from collections import Counter # import the Counter class
flat = sum(nums, []) # flattens the nested list
count = Counter(flat).items() # counts the elements (a view of (element, count) pairs)
result = {c[0]:c[1]-1 for c in count} # dictionary comprehension returning each count minus one
Compressed form:
result = {c[0]:c[1]-1 for c in Counter(sum(nums, [])).items()}
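A variant of the same idea, as a sketch: `itertools.chain.from_iterable` flattens the sublists without the repeated list copying that `sum(nums, [])` performs, which may matter for long inputs. The variable names here are illustrative.

```python
from collections import Counter
from itertools import chain

nums = [['D'], ['A', 'B'], ['A', 'C'], ['C', 'A']]

# Flatten lazily, then count every element.
counts = Counter(chain.from_iterable(nums))

# Subtract one from each count, as in the answer above.
result = {key: n - 1 for key, n in counts.items()}
print(result)  # {'D': 0, 'A': 2, 'B': 0, 'C': 1}
```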

Why is a KeyError raised even though the respective key exists?

def countLetter(string):
    dic = dict()
    for char in string:
        dic[char] = (1,dic[char]+1)[char in dic]
    print(dic)
countLetter('aabcb')
Here, I'm trying to count the number of times each letter has occurred, but line 4 raises an error.
It raises a KeyError.
The problem is that this line:
dic[char] = (1,dic[char]+1)[char in dic]
is eagerly evaluating dic[char]+1 as part of constructing the tuple to index before you get around to testing if char in dic to select the element of the tuple. So it dies with a KeyError before your test has a chance to prevent the failing lookup. To make it lazy, you could do:
dic[char] = dic[char] + 1 if char in dic else 1
or you could just use a method designed for this to avoid the explicit test:
dic[char] = dic.get(char, 0) + 1
Though this particular pattern is made even simpler with collections.Counter:
import collections
def countLetter(string):
    print(collections.Counter(string)) # print(dict(collections.Counter(string))) if it must look like a dict
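To see the eager versus lazy difference in isolation, here is a small sketch. The helper names `eager_count` and `lazy_count` are made up for this illustration and return the dictionary instead of printing it.

```python
def eager_count(string):
    """Tuple-indexing version: both tuple elements are built before indexing."""
    dic = {}
    for char in string:
        # dic[char] + 1 is evaluated even when char is not in dic yet.
        dic[char] = (1, dic[char] + 1)[char in dic]
    return dic


def lazy_count(string):
    """Conditional expression: only the selected branch is evaluated."""
    dic = {}
    for char in string:
        dic[char] = dic[char] + 1 if char in dic else 1
    return dic


try:
    eager_count('aabcb')
except KeyError as exc:
    print('KeyError for key:', exc)  # KeyError for key: 'a'

print(lazy_count('aabcb'))  # {'a': 2, 'b': 2, 'c': 1}
```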

Regex Output Count

I am trying to count the output of a regex search I am conducting on a dataset but for some reason my count is off by a lot. I was wondering what I am doing wrong and how I can get an official count. I should have around 1500 matches but I keep getting an error that says "'int' object is not iterable".
import re
with open('Question 1 Logfile.txt', 'r') as h:
    results = []
    count = []
    for line in h.readlines():
        m = re.search(r'(((May|Apr)(\s*)\w+\s\w{2}:\w{2}:\w{2}))', line)
        t = re.search(r'(((invalid)(\s(user)\s\w+)))',line)
        i = re.search(r'(((from)(\s\w+.\w+.\w+.\w+)))', line)
        if m and t and i:
            count += 1
            print(m.group(1),' - ',i.group(4),' , ',t.group(4))
    print(count)
You want to increment the number of times you satisfy a condition over a series of loop iterations. The confusion here seems to be how exactly to do that, and what variable to increment.
Here's a small example that captures the difficulty you've encountered, as described in OP and in OP comments. It's meant as a learning example, but it does also provide a couple of options for a solution.
count = []
count_int = 0
for _ in range(2):
    try:
        count += 1
    except TypeError as e:
        print("Here's the problem with trying to increment a list with an integer")
        print(str(e))
    print("We can, however, increment a list with additional lists:")
    count += [1]
    print("Count list: {}\n".format(count))
    print("Most common solution: increment int count by 1 per loop iteration:")
    count_int +=1
    print("count_int: {}\n\n".format(count_int))
print("It's also possible to check the length of a list you incremented by one element per loop iteration:")
print(len(count))
Output:
"""
Here's the problem with trying to increment a list with an integer
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1]
Most common solution: increment int count by 1 per loop iteration:
count_int: 1
Here's the problem with trying to increment a list with an integer
'int' object is not iterable
We can, however, increment a list with additional lists:
Count list: [1, 1]
Most common solution: increment int count by 1 per loop iteration:
count_int: 2
It's also possible to check the length of a list you incremented by one element per loop iteration:
2
"""
Hope that helps. Good luck learning Python!
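Applied to the original problem, the fix could look roughly like the sketch below. The sample lines are invented stand-ins for the log file, and the patterns are simplified versions of the ones in the question (redundant groups dropped, dots escaped), so treat them as assumptions and not as the exact format of your data.

```python
import re

# Simplified versions of the question's three patterns (an assumption).
DATE = re.compile(r'(May|Apr)\s*\w+\s\w{2}:\w{2}:\w{2}')
USER = re.compile(r'invalid\suser\s\w+')
ADDR = re.compile(r'from\s\w+\.\w+\.\w+\.\w+')

# Invented sample lines standing in for 'Question 1 Logfile.txt'.
lines = [
    "May 10 06:55:46 host sshd[101]: Failed password for invalid user admin from 192.168.1.5 port 22",
    "Apr 29 11:02:13 host sshd[102]: Accepted password for alice from 10.0.0.7 port 22",
    "May 11 23:59:01 host sshd[103]: Failed password for invalid user test from 172.16.0.9 port 22",
]

count = 0  # an int, not a list, so += 1 works
for line in lines:
    if DATE.search(line) and USER.search(line) and ADDR.search(line):
        count += 1

print(count)  # 2: only the first and third lines contain "invalid user"
```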

Creating a list of objects from a dictionary

First of all, I have a function that counts words in a text file, and a program that creates a dictionary based on how many occurrences of each word are in that text file. The program is
def counter (AllWords):
    d = {}
    for word in AllWords:
        if word in d.keys():
            d[word] = d[word] + 1
        else:
            d[word] = 1
    return d;
f = open("test.txt", "r")
AllWords = []
for word in f.read().split():
    AllWords.append(word.lower())
print(counter(AllWords))
Now given that dictionary, I want to create a list of objects such that the objects will have two instance variables, the word (string) and how many times it appears (integer). Any help is appreciated!
What about:
list(d.items())
It will create a list of tuples like:
[('Foo',3),('Bar',2)]
Or you can define your own class:
class WordCount:
    def __init__(self,word,count):
        self.word = word
        self.count = count
and use a list comprehension:
[WordCount(*item) for item in d.items()]
So here you create a list of WordCount objects.
Nevertheless, your counter(..) function is actually not necessary: Python already has a Counter:
from collections import Counter
which is "a dictionary with things" so to speak: you can simply construct it like:
Counter(AllWords)
No need to reinvent the wheel to count items.
What about a quasi one-liner to do all the heavy lifting, using of course collections.Counter and the mighty str.split?
import collections
with open("text.txt") as f:
    c = collections.Counter(f.read().split())
Now c contains the pairs: word, number of occurrences of the word
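Putting the two answers together, a minimal sketch could look like this. It uses a `dataclass` in place of the hand-written class, and the string `text` is a made-up stand-in for the file contents, so both are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WordCount:
    """Record with the two fields the question asks for."""
    word: str
    count: int


text = "the cat and the hat and the bat"  # stand-in for f.read()
counts = Counter(text.lower().split())

# most_common() yields (word, count) pairs sorted by descending count.
objects = [WordCount(word, n) for word, n in counts.most_common()]

print(objects[0].word, objects[0].count)  # the 3
```

The dataclass generates `__init__`, `__repr__` and `__eq__` for you; the plain class from the answer above works just as well if you prefer to write those yourself.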

How to find the smallest and largest frequency or word in a list of objects in Python

I have an object that represents a list of objects. Each of these represents a word and its frequency of occurrence in a file.
Currently I'm getting an error that says "object is not iterable".
# each object in the list looks like this
# word = "hello", 4
def max(self):
    max_list = [None, 0]
    for item in WordList:
        if item.get_freq() > max_list[1]:
            max_list[0] = item.get_word()
            max_list[1] = item.get_freq()
    return max_list
How do I find the max and min frequency of these objects?
Note: this is in a class WordList, and get_word and get_freq are in the class that created the objects in the list.
Your question is not clear to me. Using 'object' in the title is at least once too many. The function does not use self. If WordList is a class, you cannot iterate it. Etc. However, I will try to give you an answer to what I think you are asking, which you might be able to adapt.
def minmax(items):
    """Return min and max frequency words in iterable items.
    Items represent a word and frequency accessed as indicated.
    """
    it = iter(items)
    # Initialize result variables
    try:
        item = next(it)
        min_item = max_item = item.get_word(), item.get_freq()
    except StopIteration:
        raise ValueError('cannot minmax empty iterable')
    # Update result variables
    for item in it:
        word = item.get_word()
        freq = item.get_freq()
        if freq < min_item[1]:
            min_item = word, freq
        elif freq > max_item[1]:
            max_item = word, freq
    return min_item, max_item
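If two passes over the data are acceptable, the built-in min and max with a key function do the same job in less code. The sketch below assumes the items expose get_word() and get_freq() as described in the question; the `WordFreq` class is a hypothetical stand-in, since the real class was not shown.

```python
class WordFreq:
    """Hypothetical stand-in for the objects described in the question."""

    def __init__(self, word, freq):
        self._word = word
        self._freq = freq

    def get_word(self):
        return self._word

    def get_freq(self):
        return self._freq


items = [WordFreq("hello", 4), WordFreq("world", 1), WordFreq("python", 7)]

# key= tells min/max which value to compare.
# Both raise ValueError when given an empty list.
least = min(items, key=lambda item: item.get_freq())
most = max(items, key=lambda item: item.get_freq())

print(least.get_word(), least.get_freq())  # world 1
print(most.get_word(), most.get_freq())    # python 7
```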

Resources