So first of all, I have a function that counts the words in a text file, and a program that builds a dictionary of how many times each word occurs in that file. The program is:
def counter(AllWords):
    d = {}
    for word in AllWords:
        if word in d:
            d[word] = d[word] + 1
        else:
            d[word] = 1
    return d

# read the file, lower-casing every word so counting is case-insensitive
with open("test.txt", "r") as f:
    AllWords = []
    for word in f.read().split():
        AllWords.append(word.lower())
print(counter(AllWords))
Now, given that dictionary, I want to create a list of objects, each with two instance variables: the word (string) and how many times it appears (integer). Any help is appreciated!
What about:
list(d.items())
It will create a list of tuples like:
[('Foo',3),('Bar',2)]
Or you can define your own class:
class WordCount:
    def __init__(self, word, count):
        self.word = word
        self.count = count
and use list comprehension:
[WordCount(*item) for item in d.items()]
So here you create a list of WordCount objects.
Nevertheless, your counter(..) function is actually not necessary: Python already has a Counter, a dict subclass that maps each item to its count. You can simply construct it like:
from collections import Counter
Counter(AllWords)
No need to reinvent the wheel to count items.
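The two ideas in this answer can be combined. The following is only a sketch: the `all_words` sample list is hypothetical and stands in for the words read from the file, and the `__repr__` method is added purely so the printed list is readable.

```python
from collections import Counter


class WordCount:
    """Pairs a word with the number of times it appears."""

    def __init__(self, word, count):
        self.word = word
        self.count = count

    def __repr__(self):
        return f"WordCount({self.word!r}, {self.count})"


# Hypothetical sample input standing in for the words read from the file.
all_words = ["foo", "bar", "foo", "foo", "bar"]

# most_common() yields (word, count) pairs, highest count first.
word_counts = [WordCount(word, count)
               for word, count in Counter(all_words).most_common()]
print(word_counts)  # [WordCount('foo', 3), WordCount('bar', 2)]
```

Each element of `word_counts` exposes `.word` and `.count`, which is what the question asks for.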
What about a quasi one-liner to do all the heavy lifting, using of course collections.Counter and the mighty str.split ?
import collections
with open("test.txt") as f:
    c = collections.Counter(f.read().split())
Now c contains the pairs (word, number of occurrences of the word).
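Note that this one-liner does not lower-case the words, so "The" and "the" would be counted separately. A minimal sketch of a case-insensitive variant, using a hypothetical in-memory string in place of the file contents:

```python
from collections import Counter

# Hypothetical text standing in for the result of f.read().
text = "The cat saw the other cat and the dog"

# Lower-case before splitting so "The" and "the" are counted together.
c = Counter(text.lower().split())

print(c.most_common(2))  # [('the', 3), ('cat', 2)]
```

`most_common(n)` returns the `n` highest counts as a list of `(word, count)` tuples.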
My code is as follows :
Nums=[['D'],['A','B'],['A','C'],['C','A']]
Output should be D=0
A=2
C=1
B=0
I have tried as follows:
nums = [['D'], ['A', 'B'], ['A', 'C'], ['C', 'A']]
d = dict()
for i in nums:
    for j in i:
        if len(i) == 1:
            d[j] = 0
        else:
            d[j] = 1
print(d)
Am I on the right track in choosing a dictionary for this count?
Suggestions using any data structure are welcome.
import collections
seen_dict = collections.Counter(x[0] for x in nums if len(x) > 1)  # count first elements of multi-item lists
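The Counter above only stores keys it has actually counted, so D and B are absent from it. A sketch of how the full expected output could be produced, relying on the fact that a Counter returns 0 for missing keys (the `labels` and `result` names are introduced here for illustration):

```python
import collections

nums = [['D'], ['A', 'B'], ['A', 'C'], ['C', 'A']]
seen = collections.Counter(x[0] for x in nums if len(x) > 1)

# Collect every label that appears anywhere, keeping first-seen order.
labels = list(dict.fromkeys(label for sub in nums for label in sub))

# A Counter returns 0 for keys it has never seen, so D and B need no special case.
result = {label: seen[label] for label in labels}
print(result)  # {'D': 0, 'A': 2, 'B': 0, 'C': 1}
```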
To obtain a dictionary holding each element's number of occurrences minus one, you can apply a dictionary comprehension to a Counter from the collections module:
from collections import Counter  # import the Counter class
flat = sum(nums, [])  # concatenates the sub-lists into one flat list
count = Counter(flat).items()  # (element, count) pairs
result = {c[0]: c[1] - 1 for c in count}  # dictionary comprehension subtracting one from each count
Compressed form:
result = {c[0]:c[1]-1 for c in Counter(sum(nums, [])).items()}
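One caveat: `sum(nums, [])` builds a new list at every step, which gets slow for many sub-lists. A sketch of the same computation using `itertools.chain.from_iterable` instead, which is a swapped-in flattening technique and not part of the original answer:

```python
from collections import Counter
from itertools import chain

nums = [['D'], ['A', 'B'], ['A', 'C'], ['C', 'A']]

# chain.from_iterable flattens lazily instead of building intermediate lists.
counts = Counter(chain.from_iterable(nums))

# Subtract one from every count, as in the answer above.
result = {key: count - 1 for key, count in counts.items()}
print(result)  # {'D': 0, 'A': 2, 'B': 0, 'C': 1}
```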
I very often face the following problem:
I have a list with unknown elements in it (but each element is of the same type, e.g. str) and I want to count the occurrences of each element. Sometimes I also want to do something with the occurrence counts, so I usually store them in a dictionary.
My problem is that I cannot "auto-initialize" a dictionary entry with += 1, so I first have to check whether the given element is already in the dictionary.
My usual go to solution:
dct = {}
for i in iterable:
    if i in dct:
        dct[i] += 1
    else:
        dct[i] = 1
Is there a simpler solution to this problem?
Yes! A defaultdict.
from collections import defaultdict
dct = defaultdict(int)
for i in iterable:
    dct[i] += 1
You can auto-initialise with other types too:
d = defaultdict(str)
d[i] += 'hello'
Docs: https://docs.python.org/3.3/library/collections.html#collections.defaultdict
If you're just counting things, you could use a Counter instead:
from collections import Counter
c = Counter(iterable) # c is a subclass of dict
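For comparison, here is a sketch showing that a plain dict with `dict.get`, a `defaultdict(int)`, and a `Counter` all produce the same counts. The `iterable` sample data is hypothetical, and the `dict.get` variant is an extra option not mentioned in the answer.

```python
from collections import Counter, defaultdict

iterable = ["a", "b", "a", "c", "a"]  # hypothetical sample data

# Plain dict: get() supplies 0 when the key is missing.
plain = {}
for item in iterable:
    plain[item] = plain.get(item, 0) + 1

# defaultdict(int): missing keys start at int() == 0.
by_default = defaultdict(int)
for item in iterable:
    by_default[item] += 1

# Counter: does the counting in one call.
counted = Counter(iterable)

print(plain)  # {'a': 3, 'b': 1, 'c': 1}
```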
I have a list of sequences to find in sequencing data. I run a for loop to find the matching sequences in a dataset and use Counter() to get the most frequent one. But I found that Counter() seems to include data from previous loop iterations instead of counting each one separately.
ls = ['AGC', 'GCT', 'TAC', 'CGT']
dataset.txt contains a bunch of sequences such as "AGTAGCTTT", "AGTTAGC"......
from collections import Counter
def xfind(seq):
    ls2 = []
    with open("dataset.txt", 'r') as f:
        for line in f:
            if seq in line:
                ls2.append(line)
    cnt = Counter()
    for l in ls2:
        cnt[l] += 1
    print(cnt.most_common()[0])
for l2 in ls:
    xfind(l2)
The results look like:
('AGTAGCTTT', 2)
('AGTAGCTTT', 5)
It should be:
('AGTAGCTTT', 2)
('GCT...', 3)
I'm not sure you fully understand your code, and I don't think you are using Counter the way it is intended to be used.
You start by checking whether the substring is in the sequence (line) for each line of the text file, and if it is, you add that line to a list, ls2.
Then for every element of that list (which are the whole lines/sequences from the text file) you add 1 to the counter for that key. You do this in a loop, when the whole point of Counter is that you can simply call:
cnt = Counter(ls2)
This all means that you are reporting the most common line in the file among those that contain the given subsequence.
Now it is actually a bit hard to say what your exact output should be, without knowing what your dataset.txt looks like.
I would start by tidying up the code a little:
from collections import Counter
subsequences = ['AGC', 'GCT', 'TAC', 'CGT']
def xfind(subseq):
    contains_ss = []
    with open("dataset.txt", 'r') as f:
        for line in f:
            if subseq in line:
                contains_ss.append(line)
    cnt = Counter(contains_ss)
    print(cnt.most_common()[0])
for ss in subsequences:
    xfind(ss)
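Since the contents of dataset.txt are unknown, the logic can be tried on in-memory data. The sketch below is a variant, not the original function: `most_common_match` is a hypothetical name, it takes the lines as an argument instead of opening the file, strips trailing newlines, and returns None when nothing matches (indexing `most_common()[0]` on an empty Counter would raise an IndexError).

```python
from collections import Counter


def most_common_match(lines, subseq):
    """Return (line, count) for the most frequent line containing subseq, or None."""
    # strip() removes the trailing newline that lines read from a file carry.
    cnt = Counter(line.strip() for line in lines if subseq in line)
    top = cnt.most_common(1)
    return top[0] if top else None


# Hypothetical data standing in for the contents of dataset.txt.
lines = ["AGTAGCTTT\n", "AGTTAGC\n", "AGTAGCTTT\n", "TTGCTAA\n"]

for ss in ["AGC", "GCT", "TAC"]:
    print(ss, most_common_match(lines, ss))
```

A fresh Counter is built on every call, so counts from one subsequence cannot leak into the next.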
I want to write a function that takes a list of words and keys and outputs those keys as dictionary keys with any words starting with that letter attached.
How could this be achieved using simple python 3 code?
eg. takes (['apples', 'apple', 'bananna', 'fan'], 'fad')
returns {'a' : ['apple', 'apples'], 'f' : ['fan']}
so far i have tried:
def dictionary(words, char_keys):
    char_keys = remove_duplicates(char_keys)
    ret = {}
    keys_in_dict = []
    words = sorted(words)
    for word in words:
        if word[0] in char_keys and word[0] not in keys_in_dict:
            ret[word[0]] = word
            keys_in_dict.append(word[0])
        elif word[0] in keys_in_dict:
            ret[word[0]] += (word)
    return ret
This gives roughly the right output, but each value is a single concatenated string rather than a list of strings.
If the input is a list of strings, you can check if the char is in the dict, if yes, append the word, otherwise add a list with the word:
def dictionary(inpt):
    result = {}
    for word in inpt:
        char = word[0]
        if char in result:
            result[char].append(word)
        else:
            result[char] = [word]
    return result
The modern way to do this is to use a collections.defaultdict with list as argument.
from collections import defaultdict
def dictionary(inpt):
    result = defaultdict(list)
    for word in inpt:
        result[word[0]].append(word)
    return result
I'm not sure whether your input list consists only of strings or can also include sub-lists of strings (and I'm not sure why "fad" disappeared in your example). Obviously, the latter scenario would take some more effort. For simplicity, I assume it contains only strings; here's a piece of code that hopefully points in the right direction:
d = {}
for elem in input_list[0]:  # input_list[0] holds the words, input_list[1] the key characters
    if elem[0] in input_list[1]:
        lst = d.get(elem[0], [])
        lst.append(elem)
        d[elem[0]] = lst  # key on the first letter, not the whole word
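Wrapped as a function and run on the example from the question, the same idea might look like the sketch below. It assumes every word is a non-empty string, sorts the words so each list comes out in alphabetical order, and uses `dict.setdefault` in place of the `get`/assign pair.

```python
def dictionary(words, char_keys):
    """Group words by first letter, keeping only letters found in char_keys."""
    result = {}
    for word in sorted(words):
        first = word[0]
        if first in char_keys:
            # setdefault creates the list the first time a letter is seen.
            result.setdefault(first, []).append(word)
    return result


print(dictionary(['apples', 'apple', 'bananna', 'fan'], 'fad'))
# {'a': ['apple', 'apples'], 'f': ['fan']}
```

The key 'd' is absent from the result because no word starts with it, which matches the expected output in the question.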
I am writing code that lets the user choose a .txt file as input. So, for example, if the text read:
"I am you. You ArE I."
I would like my code to create a dictionary that resembles this:
{I: 2, am: 1, you: 2, are: 1}
The words in the file should appear as the keys, and the number of times each one occurs as the values. Capitalization should be irrelevant, so are = ARE = ArE = arE, etc.
This is my code so far. Any suggestions/help?
file = input("\n Please select a file")
name = open(file, 'r')
dictionary = {}
with name:
    for line in name:
        (key, val) = line.split()
        dictionary[int(key)] = val
Take a look at the examples in this answer:
Python : List of dict, if exists increment a dict value, if not append a new dict
You can use collections.Counter() to trivially do what you want, but if for some reason you can't use that, you can use a defaultdict or even a simple loop to build the dictionary you want.
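As a rough sketch of the two fallback options mentioned here, applied to the sample text from the question. Replacing only "." with a space is a simplification that happens to be enough for this one sentence; real text would need fuller punctuation handling.

```python
from collections import defaultdict

text = "I am you. You ArE I."  # the sample text from the question

# Lower-case, turn the full stops into spaces, then split on whitespace.
words = text.lower().replace(".", " ").split()

# Variant 1: defaultdict(int) starts every missing key at 0.
counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Variant 2: a plain dict with an explicit membership test.
plain = {}
for word in words:
    if word in plain:
        plain[word] += 1
    else:
        plain[word] = 1

print(dict(counts))  # {'i': 2, 'am': 1, 'you': 2, 'are': 1}
```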
Here is code that solves your problem. This will work in Python 3.1 and newer.
from collections import Counter
import string

def filter_punctuation(s):
    return ''.join(ch if ch not in string.punctuation else ' ' for ch in s)

def lower_case_words(f):
    for line in f:
        line = filter_punctuation(line)
        for word in line.split():
            yield word.lower()

def count_key(tup):
    """
    key function to make a count dictionary sort into descending order
    by count, then case-insensitive word order when counts are the same.
    tup must be a tuple in the form: (word, count)
    """
    word, count = tup
    return (-count, word.lower())

dictionary = {}
fname = input("\nPlease enter a file name: ")
with open(fname, "rt") as f:
    dictionary = Counter(lower_case_words(f))
print(sorted(dictionary.items(), key=count_key))
From your example I could see that you wanted punctuation stripped away. Since we are going to split the string on white space, I wrote a function that filters punctuation to white space. That way, if you have a string like hello,world this will be split into the words hello and world when we split on white space.
The function lower_case_words() is a generator, and it reads an input file one line at a time and then yields up one word at a time from each line. This neatly puts our input processing into a tidy "black box" and later we can simply call Counter(lower_case_words(f)) and it does the right thing for us.
Of course you don't have to print the dictionary sorted, but I think it looks better this way. I made the sort order put the highest counts first, and where counts are equal, put the words in alphabetical order.
With your suggested input, this is the resulting output:
[('i', 2), ('you', 2), ('am', 1), ('are', 1)]
Because of the sorting it always prints in the above order.
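The sort key can also be tried in isolation. A small sketch with a hand-written `counts` dictionary (hypothetical, in deliberately scrambled order) and a lambda standing in for `count_key`:

```python
counts = {"am": 1, "you": 2, "are": 1, "i": 2}

# Negating the count sorts high counts first; the word breaks ties alphabetically.
ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
print(ordered)  # [('i', 2), ('you', 2), ('am', 1), ('are', 1)]
```

Because the key is a tuple, Python compares the negated counts first and only falls back to the word when two counts are equal.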