Count occurrences of item in JSON element grouped by another element - python-3.x

I am trying to count the number of occurrences of an item (Activity) in a JSON file, grouped by another item (Source). Example JSON below.
"No": "9",
"Time": "08:12",
"Source": "location1",
"Dest": "location3",
"Activity": "fast"
My code so far counts the occurrences of each Activity:
from collections import Counter
import json
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
    c = Counter(item['Activity'] for item in json_data)
The code correctly counts and outputs below.
Counter({'fast': 8, 'medium': 1, 'slow': 1})
I would now like to count each occurrence of Activity again, but grouped by location, so the output should be something like:
location 1 Fast: 8, Medium: 1, Slow: 2
location 2 Fast: 6, Medium: 3, Slow: 4
I have tried the code below but the output is not correct (see below)
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
    for item in json_data:
        if item['Source'] == 'location1':
            c = Counter(item['Activity'])
Counter({'f': 1, 'a': 1, 's': 1, 't': 1})
Counter({'s': 1, 'l': 1, 'o': 1, 'w': 1})

You can put an if inside the generator expression passed to Counter to filter the items. Your second attempt passes a single string to Counter, so it counts the characters of that string. I pasted your code with the fix below:
from collections import Counter
import json
with open('dataset_3.json', 'r') as json_file:
    json_data = json.load(json_file) # loads json data
c = Counter(item['Activity'] for item in json_data if item['Source'] == 'location1')
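The filter above handles one hard-coded Source at a time. To get the counts for every location in one pass, one option is a dictionary of Counters. This is only a minimal sketch: it assumes every record has Source and Activity keys as in the question, and the sample records are made up to stand in for json.load(json_file).

```python
from collections import Counter, defaultdict

# Hypothetical sample records standing in for json.load(json_file)
json_data = [
    {"Source": "location1", "Activity": "fast"},
    {"Source": "location1", "Activity": "slow"},
    {"Source": "location2", "Activity": "fast"},
    {"Source": "location1", "Activity": "fast"},
]

# One Counter per Source, created automatically on first access
counts_by_source = defaultdict(Counter)
for item in json_data:
    counts_by_source[item["Source"]][item["Activity"]] += 1

for source, counter in counts_by_source.items():
    print(source, dict(counter))
```

Each inner Counter behaves like the single Counter in the question, so methods such as most_common() still work per location.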


Python nested dictionary Issue when iterating

I have five lists of words, which act as the values in a dictionary whose keys are the IDs of the documents.
For each document, I would like to apply some calculations and display the values and results of the calculation in a nested dictionary.
So far so good, I managed to do everything but I am failing in the easiest part.
The resulting nested dictionary seems to contain only the last element of each of the five lists, rather than all the elements.
Could anybody explain where I am going wrong?
This is the original dictionary data_docs:
{'doc01': ['simpl', 'hello', 'world', 'test', 'python', 'code'],
'doc02': ['today', 'wonder', 'day'],
'doc03': ['studi', 'pac', 'today'],
'doc04': ['write', 'need', 'cup', 'coffe'],
'doc05': ['finish', 'pac', 'use', 'python']}
This is the result I am getting (for example, 'simpl', 'hello', 'world', 'test' and 'python' are missing from doc01):
{'doc01': {'code': 0.6989700043360189},
'doc02': {'day': 0.6989700043360189},
'doc03': {'today': 0.3979400086720376},
'doc04': {'coffe': 0.6989700043360189},
'doc05': {'python': 0.3979400086720376}}
And this is the code:
def tfidf(data, idf_score): # function, 2 dictionaries as parameters
    tfidf = {} # dict for output
    for word, val in data.items(): # for each word and value in data_docs (first dict)
        for v in val: # for each value in each list
            a = val.count(v) # count the number of times that it appears in that list
            scores = {v: a * idf_score[v]} # dictionary that will act as value in the nested dict
            tfidf[word] = scores # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
tfidf(data_docs, idf_score)
Did you mean to do this?
def tfidf(data, idf_score): # function, 2 dictionaries as parameters
    tfidf = {} # dict for output
    for word, val in data.items(): # for each word and value in data_docs (first dict)
        scores = {} # <---- a new dict for each outer iteration
        for v in val: # for each value in each list
            a = val.count(v) # count the number of times that it appears in that list
            scores[v] = a * idf_score[v] # <---- keep adding items to the dictionary
        tfidf[word] = scores # final dictionary, the key is doc01, doc02... and the value the above dict
    return tfidf
... see my changes with <----- arrow :)
{'doc01': {'simpl': 1,
'hello': 1,
'world': 1,
'test': 1,
'python': 1,
'code': 1},
'doc02': {'today': 1, 'wonder': 1, 'day': 1},
'doc03': {'studi': 1, 'pac': 1, 'today': 1},
'doc04': {'write': 1, 'need': 1, 'cup': 1, 'coffe': 1},
'doc05': {'finish': 1, 'pac': 1, 'use': 1, 'python': 1}}
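The same idea (one inner dictionary per document, filled with every word) can also be written as a nested dict comprehension. This is just a sketch of an alternative spelling, not a different algorithm. The inputs below are hypothetical, and the idf values are made up for illustration.

```python
def tfidf(data, idf_score):
    # Outer comprehension: one entry per document ID.
    # Inner comprehension: one entry per distinct word in that document.
    return {
        doc_id: {word: words.count(word) * idf_score[word] for word in words}
        for doc_id, words in data.items()
    }

# Hypothetical inputs; the idf values are invented for this example
data_docs = {
    "doc02": ["today", "wonder", "day"],
    "doc03": ["studi", "pac", "today"],
}
idf_score = {"today": 0.5, "wonder": 1.0, "day": 1.0, "studi": 1.0, "pac": 0.5}

result = tfidf(data_docs, idf_score)
print(result)
```

Because the inner dictionary is built in a single expression, there is no place left for the "overwrite on every iteration" mistake from the question.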

while loop with dynamic list in python using a dictionary

I have a dictionary with lots of data from a CSV file where the key is the row number and the value is a list that contains the column data.
Starting from a value in column 2 of one row (key), I want to take that row's column 4 value, look for it in column 2 of another row, take that row's column 4 value, and continue in this way until the chain ends.
My code is this:
dict_FH_coord = self.dict_FH()
list_test = [site_SI]
while len(list_test) > 0:
    for ele in list_test:
        for cle, val in dict_FH_coord.items():
            list_test = []
            if val[2] == ele:
But this code does not work: it stops after the first iteration and only finds the elements linked directly to the starting site_SI.
Is there a way to iterate repeatedly over list_test while it changes, so that I can solve my problem?
If you want to modify a list in place, you should do something like:
a = [1, 2, 3]
a[:] = [1, 2]
In your case, the only way to use this inside the loop (without letting the list grow indefinitely) is:
if val[2] == ele:
    list_test[:] = list_test[1:]+[val[4]]
list_test[:] = list_test[1:]
But it won't work as intended, because the previous iteration ends at index 1 (for ele in list_test:) and list_test would never change in size.
These two cases cannot be combined.
I suggest using a Queue, but be careful to avoid links in your data that loop forever:
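from queue import Queue
dict_FH_coord = {
    1: [0, 1, 'some_site', 3, 'some_site_2'],
    2: [0, 1, 'some_site_2', 3, 'some_site_3'],
    3: [0, 1, 'some_site_2', 3, 'some_site_4'],
    4: [0, 1, 'some_site_3', 3, 'some_site_5'],
}
site_SI = 'some_site'
unvisited = Queue()
unvisited.put(site_SI)  # assumed: seed the queue with the starting site
list_def = list()
while not unvisited.empty():
    ele = unvisited.get()
    for cle, val in dict_FH_coord.items():
        if val[2] == ele:
            # assumed completion: the original snippet is cut off at this point
            list_def.append(val[4])  # record the linked site
            unvisited.put(val[4])    # and follow its own links later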
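The warning about looping links can be handled by remembering which sites have already been queued. Below is a minimal sketch of that idea, assuming the same row layout (site in column 2, linked site in column 4). The function name follow_links, the use of collections.deque instead of queue.Queue, and row 5 of the sample data are all assumptions made for illustration.

```python
from collections import deque

def follow_links(rows, start):
    """Return every site reachable from start, in breadth-first order."""
    unvisited = deque([start])
    seen = {start}          # sites already queued, so cycles cannot loop forever
    found = []
    while unvisited:
        ele = unvisited.popleft()
        for val in rows.values():
            if val[2] == ele and val[4] not in seen:
                seen.add(val[4])
                found.append(val[4])
                unvisited.append(val[4])
    return found

rows = {
    1: [0, 1, 'some_site', 3, 'some_site_2'],
    2: [0, 1, 'some_site_2', 3, 'some_site_3'],
    3: [0, 1, 'some_site_2', 3, 'some_site_4'],
    4: [0, 1, 'some_site_3', 3, 'some_site_5'],
    5: [0, 1, 'some_site_5', 3, 'some_site'],  # hypothetical row linking back to the start
}
print(follow_links(rows, 'some_site'))
```

Without the seen set, row 5 would put the starting site back in the queue and the loop would never finish.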

Task : Find unique elements in an array. Count their occurrences. Find the numbers that occur less than 10 times in an array of 5000 elements

I tried a few solutions:
1.) I used np.unique:
uniqueValues, indexList, occurCount = np.unique(desired_array,
    return_index=True, return_counts=True)
However, indexList only gives the first occurrence of each number. For example, if 33 occurred at indices 20, 56 and 3000, indexList would only show index 20. Since 33 occurs fewer than 10 times (3 times), I need all of its locations.
2.) I decided to use a dictionary to find all the index locations, but this is not working:
for i in range(5000):
    if not d.get(i):
        d[desired_array[i]]=[i]
    else:
        indices = d[desired_array[i]]
        indices.append(i)
This job screams for collections.Counter:
from collections import Counter
desired_array = [1, 2, 3, 1, 3, 5, 3]
result = Counter(desired_array)
print(result)
This will print out the unique elements and the count of occurrences:
Counter({3: 3, 1: 2, 2: 1, 5: 1})
You can replace
for i in range(1250):
    var = desired_array[i]
    if not d.get(var):
        d[var] = []
    # print(var)
    s = d[var]
    s.append(i)
with
for i in range(1250):
    var = desired_array[i]
    d.setdefault(var, []).append(i)
According to the documentation dict.setdefault(key, default):
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
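Putting setdefault to work on the original task: the sketch below collects every index for each value, then keeps only the values that occur fewer times than a threshold. The names positions, rare and max_count are hypothetical, and a small list with a threshold of 3 stands in for the 5000-element array and the limit of 10.

```python
desired_array = [1, 2, 3, 1, 3, 5, 3]
max_count = 3  # hypothetical threshold; the task in the question uses 10

# Map each value to the list of every position where it occurs
positions = {}
for i, value in enumerate(desired_array):
    positions.setdefault(value, []).append(i)

# Keep only the values that occur fewer than max_count times
rare = {value: idx for value, idx in positions.items() if len(idx) < max_count}
print(rare)
```

The length of each index list is the occurrence count, so no separate counting pass is needed.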
To write a CSV file, it's best to use the standard csv.writer:
import csv
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
If you want to write the key/value pairs of your dict to the CSV file, you need to write something like:
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in d.items():  # d is the dict of indices built above
        writer.writerow((k, v))

Need help to convert some keys from a tuple dictionary in another dictionary

I have a csv file that I read with csv module in a csv.DictReader().
I have an output like this:
{'biweek': '1', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526822.1365'}
{'biweek': '2', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526995.246'}
{'biweek': '3', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527170.1981'}
{'biweek': '4', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527347.0136'}
And I need to get the 'loc' as key for a new dict and the count of the 'loc' as values for that new dict, as the 'loc' have a lot of repetitions in the file.
with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    cities = set()
    cnt = 0
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        for (k, v) in data.items():
            if data['loc'] in cities:
                cnt += 1
            new_dct[data['loc']] = cnt + 1
I have done this. I got the keys alright, but I didn't get the count right.
My results:
{'BALTIMORE': 29, 'BOSTON': 59, 'BRIDGEPORT': 89, 'BUFFALO': 134, 'CLEVELAND': 174}
I know pandas is a very good tool but I need the code with csv module.
If anyone could help me get the count right, I would appreciate it.
Thank you!
You can use collections.Counter to count occurrences of the cities in the CSV file. Counter.keys() will also give you all the cities found in the CSV:
import csv
from collections import Counter
with open('csvtest.csv') as fh:
    reader = csv.DictReader(fh)
    c = Counter(row['loc'] for row in reader)
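To try the Counter approach without the real file, the sketch below feeds csv.DictReader an in-memory CSV. The sample rows are only illustrative (the BOSTON row is made up), and io.StringIO stands in for the opened file.

```python
import csv
import io
from collections import Counter

# Hypothetical in-memory CSV standing in for the real file
sample = io.StringIO(
    "biweek,year,loc,cases,pop\n"
    "1,1906,BALTIMORE,NA,526822.1365\n"
    "2,1906,BALTIMORE,NA,526995.246\n"
    "1,1906,BOSTON,NA,600000\n"
)

reader = csv.DictReader(sample)
c = Counter(row['loc'] for row in reader)

# A Counter is a dict subclass; dict() gives a plain dictionary if one is needed
new_dct = dict(c)
print(new_dct)
```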
You are updating a global counter and not the counter for the specific location. You are also iterating each column of each row and updating it for no reason.
Try this:
with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    cities = set()
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1
This line: new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1 will get the last counter for that city and increment the number by one. If the counter does not exist yet, the function get will return 0.

How to create a dictionary from CSV using a loop function?

I am trying to create several dictionaries out of a table of comments from a CSV with the following columns:
I need to create a dictionary for every row (hopefully using a loop so I don't have to create them all manually), where the dictionary keys are:
However, I cannot figure out a fast way to do this. I tried creating a list of dictionaries using the following code:
# Import libraries
import csv
import json
import pprint
# Open file
reader = csv.DictReader(open('Comments.csv', 'rU'))
# Create list of dictionaries
dict_list = []
for line in reader:
    dict_list.append(line)  # assumed loop body; the original snippet is cut off here
However, I now do not know how to access the dictionaries or whether the key-value pairs are matched properly, since in the following image:
The ID, ReviewType and Comment do not seem to be showing as dictionary keys.
The Comment value seems to be showing as a list of half-sentences.
Is there any way to just create one dictionary for each row instead of a list of dictionaries?
Note: I did look at this question, however it didn't really help.
Here you go. I split each comment into a list of words.
# Import libraries
import csv
import json
import pprint
# Open file
def readPerfReviewCSVToDict(csvPath):
    reader = csv.DictReader(open(csvPath, 'rU'))
    perfReviewsDictionary = []
    for line in reader:
        perfReviewsDictionary.append(line)  # assumed loop body; the original snippet is cut off here
    perfReviewsDictionaryWithCommentsSplit = []
    for item in perfReviewsDictionary:
        itemId = item["id"]
        itemType = item["type"]
        itemComment = item["comments"]
        itemCommentDictionary = []
        itemCommentDictionary = itemComment.split()  # split the comment on whitespace
        perfReviewsDictionaryWithCommentsSplit.append({'id':itemId, 'type':itemType, 'comments':itemCommentDictionary})
    return perfReviewsDictionaryWithCommentsSplit
dict_list = readPerfReviewCSVToDict("test.csv")
The output is:
[{'comments': ['test', 'ape', 'dog'], 'id': '1', 'type': 'Test'},
{'comments': ['dog'], 'id': '2', 'type': 'Test'}]
Since you haven't given a reproducible example, with a sample DataFrame, I've created one for you
import pandas as pd
df = pd.DataFrame([[1, "Contractor", "Please post"], [2, "Developer", "a reproducible example"]])
df.columns = ['ID', 'ReviewType', 'Comment']
On your computer, instead of doing this, type:
df = pd.read_csv(file_path)
to read in the csv file as a pandas DataFrame.
Now I will create a list called dictList, which is empty initially. I am going to populate it with a dictionary for each row in the DataFrame df.
dictList = []
#Iterate over each row in df
for i in df.index:
    #Creating an empty dictionary for each row
    rowDict = {}
    #Populating it (df.loc[i, column] looks up one cell by row label and column name)
    rowDict['ID'] = df.loc[i, 'ID']
    rowDict['ReviewType'] = df.loc[i, 'ReviewType']
    rowDict['Comment'] = df.loc[i, 'Comment']
    #Once I'm done populating it, I will append it to the list
    dictList.append(rowDict)
    #Go to the next row and repeat.
Now, iterating over the list of dictionaries we have created for my example:
for i in dictList:
    print(i)
We get
{'ID': 1, 'ReviewType': 'Contractor', 'Comment': 'Please post'}
{'ID': 2, 'ReviewType': 'Developer', 'Comment': 'a reproducible example'}
Do you want this?
DICT = {}
for line in reader:
    DICT[line['ID']] = line
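Keying the rows by ID means each row's dictionary can be looked up directly instead of by position in a list. Here is a small sketch, assuming the ID, ReviewType and Comment column names from the question and that every ID is unique. The in-memory sample and the name by_id are made up for illustration.

```python
import csv
import io

# Hypothetical in-memory CSV; column names follow the question
sample = io.StringIO(
    "ID,ReviewType,Comment\n"
    "1,Contractor,Please post\n"
    "2,Developer,a reproducible example\n"
)

reader = csv.DictReader(sample)
# One dictionary per row, stored under that row's ID (IDs are read as strings)
by_id = {line['ID']: dict(line) for line in reader}

print(by_id['2']['Comment'])
```

If two rows share an ID, the later row silently replaces the earlier one, so this only fits data where the ID column is unique.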
