I have 5 lists of words, which basically act as values in a dictionary where the keys are the IDs of the documents.
For each document, I would like to apply some calculations and display the values and results of the calculation in a nested dictionary.
So far so good, I managed to do everything but I am failing in the easiest part.
When showing the resulting nested dictionary, it seems it's only iterating over the last element of each of the 5 lists, and therefore not showing all the elements...
Could anybody explain to me where I am going wrong?
This is the original dictionary data_docs:
{'doc01': ['simpl', 'hello', 'world', 'test', 'python', 'code'],
'doc02': ['today', 'wonder', 'day'],
'doc03': ['studi', 'pac', 'today'],
'doc04': ['write', 'need', 'cup', 'coffe'],
'doc05': ['finish', 'pac', 'use', 'python']}
This is the result I am getting (missing 'simpl', 'hello', 'world', 'test', 'python' in doc01, for example):
{'doc01': {'code': 0.6989700043360189},
'doc02': {'day': 0.6989700043360189},
'doc03': {'today': 0.3979400086720376},
'doc04': {'coffe': 0.6989700043360189},
'doc05': {'python': 0.3979400086720376}}
And this is the code:
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each document ID and word list in data_docs (first dict)
        for v in val:  # for each word in the list
            a = val.count(v)  # count the number of times it appears in that list
            scores = {v: a * idf_score[v]}  # dictionary that will act as value in the nested dict
            tfidf[word] = scores  # final dictionary: the key is doc01, doc02, ... and the value is the dict above
    return tfidf
tfidf(data_docs, idf_score)
Thanks,
Did you mean to do this?
def tfidf(data, idf_score):  # function, 2 dictionaries as parameters
    tfidf = {}  # dict for output
    for word, val in data.items():  # for each document ID and word list in data_docs (first dict)
        scores = {}  # <---- a new dict for each outer iteration
        for v in val:  # for each word in the list
            a = val.count(v)  # count the number of times it appears in that list
            scores[v] = a * idf_score[v]  # <---- keep adding items to the dictionary
        tfidf[word] = scores  # final dictionary: the key is doc01, doc02, ... and the value is the dict above
    return tfidf
... see my changes with the <---- arrows :)
Returns:
{'doc01': {'simpl': 1,
'hello': 1,
'world': 1,
'test': 1,
'python': 1,
'code': 1},
'doc02': {'today': 1, 'wonder': 1, 'day': 1},
'doc03': {'studi': 1, 'pac': 1, 'today': 1},
'doc04': {'write': 1, 'need': 1, 'cup': 1, 'coffe': 1},
'doc05': {'finish': 1, 'pac': 1, 'use': 1, 'python': 1}}
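The question doesn't show how idf_score is built, but the numbers in its expected output are consistent with an inverse document frequency of log10(N / document frequency). A minimal sketch under that assumption, to exercise the fixed function:
import math

# Hypothetical reconstruction of idf_score (not shown in the question):
# idf(word) = log10(number of documents / number of documents containing the word)
N = len(data_docs)
df = {}
for words in data_docs.values():
    for w in set(words):  # set() so each document counts a word at most once
        df[w] = df.get(w, 0) + 1
idf_score = {w: math.log10(N / df[w]) for w in df}

print(tfidf(data_docs, idf_score))
# e.g. 'python' appears in 2 of the 5 documents, so its score is 1 * log10(5/2) ≈ 0.3979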
I have a dictionary with lots of data from a CSV file where the key is the row number and the value is a list that contains the column data.
What I want to do is this: starting from a value in column 2 of one line (key), I take that line's column 4 value, then look for that value in column 2 of another line (key), take that line's column 4 value, and keep going until the last column 4 value is no longer found in any column 2.
My code is this:
dict_FH_coord = self.dict_FH()
site_SI = 'some_site'
list_test = [site_SI]
while len(list_test) > 0:
    for ele in list_test:
        for cle, val in dict_FH_coord.items():
            list_test = []
            if val[2] == ele:
                list_test.append(val[4])
                list_def.append(val)
                print(val)
But this code does not work: it stops after the first pass and finds only the elements linked to the starting site_SI.
Is there a way to do successive iterations with the list list_test, which grows dynamically, to solve my problem?
If you want to modify a list 'on the fly', you should do something like:
a = [1, 2, 3]
a[:] = [1, 2]
In your case, the only way to use this inside the loop (avoiding infinite growth of the list) would be:
if val[2] == ele:
    list_test[:] = list_test[1:] + [val[4]]
    list_def.append(val)
else:
    list_test[:] = list_test[1:]
But it won't work as intended, because the previous iteration ends at index 1 (for ele in list_test:), and list_test would never change in size.
These two cases cannot be merged with each other.
I suggest using a Queue instead, but be careful to avoid infinite loops caused by cyclic links inside your data:
from queue import Queue

dict_FH_coord = {
    1: [0, 1, 'some_site', 3, 'some_site_2'],
    2: [0, 1, 'some_site_2', 3, 'some_site_3'],
    3: [0, 1, 'some_site_2', 3, 'some_site_4'],
    4: [0, 1, 'some_site_3', 3, 'some_site_5'],
}

site_SI = 'some_site'
unvisited = Queue()
unvisited.put(site_SI)
list_def = list()

while not unvisited.empty():
    ele = unvisited.get()
    for cle, val in dict_FH_coord.items():
        if val[2] == ele:
            unvisited.put(val[4])
            list_def.append(val)
            print(val)
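If the data can contain cycles (for example a site that eventually links back to itself), one possible guard is a visited set; this is a sketch on the same made-up dict, not part of the original answer:
from queue import Queue

visited = {site_SI}            # sites already queued, used to break cycles
unvisited = Queue()
unvisited.put(site_SI)
list_def = []

while not unvisited.empty():
    ele = unvisited.get()
    for cle, val in dict_FH_coord.items():
        if val[2] == ele and val[4] not in visited:
            visited.add(val[4])
            unvisited.put(val[4])
            list_def.append(val)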
I tried a few solutions:
1.)
import numpy as np  # needed for np.unique

uniqueValues, indexList, occurCount = np.unique(
    desired_array, return_index=True, return_counts=True)
print(uniqueValues, indexList, occurCount)
However, indexList only gives the first occurrence of each number. For example: if the number 33 occurred at indices 20, 56, and 3000, indexList would only show that it occurred at 20. Since 33 occurs fewer than 10 times, i.e. 3 times, I need all the locations.
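For what it's worth, numpy can also return every position of a given value directly with np.where; a small sketch on made-up data:
import numpy as np

desired_array = np.array([33, 7, 33, 5, 33])    # made-up sample
positions = np.where(desired_array == 33)[0]
print(positions)                                # [0 2 4]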
2.) I decided to use a dictionary to find all the index locations. But this is not working.
for i in range(5000):
    if not d.get(i):
        d[desired_array[i]] = [i]
    else:
        indices = d[desired_array[i]]
        indices.append(i)
This job screams for collections.Counter:
from collections import Counter
desired_array = [1, 2, 3, 1, 3, 5, 3]
result = Counter(desired_array)
print(result)
This will print out the unique elements and the count of occurrences:
Counter({3: 3, 1: 2, 2: 1, 5: 1})
You can replace
for i in range(1250):
    var = desired_array[i]
    if not d.get(var):
        d[var] = []
        # print(var)
    s = d[var]
    s.append(i)
with
for i in range(1250):
    var = desired_array[i]
    d.setdefault(var, []).append(i)
According to the documentation for dict.setdefault(key, default):
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
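A collections.defaultdict(list) achieves the same grouping without an explicit default; a small sketch with made-up data:
from collections import defaultdict

desired_array = [33, 7, 33, 5, 33]      # made-up sample
d = defaultdict(list)
for i, value in enumerate(desired_array):
    d[value].append(i)
print(dict(d))                          # {33: [0, 2, 4], 7: [1], 5: [3]}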
To write a CSV file, it's best to use the standard csv.writer:
import csv

with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
If you want to write the key/value pairs of your dict to the CSV file, you need something like:
with open('some.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for k, v in desired_array.items():
        writer.writerow((k, v))
I have a CSV file that I read with the csv module using csv.DictReader().
I have an output like this:
{'biweek': '1', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526822.1365'}
{'biweek': '2', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '526995.246'}
{'biweek': '3', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527170.1981'}
{'biweek': '4', 'year': '1906', 'loc': 'BALTIMORE', 'cases': 'NA', 'pop': '527347.0136'}
And I need to build a new dict that uses each 'loc' as a key and the count of that 'loc' as the value, since the 'loc' values repeat many times in the file.
with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    cities = set()
    cnt = 0
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        cities.add(data.get('loc'))
        for (k, v) in data.items():
            if data['loc'] in cities:
                cnt += 1
                new_dct[data['loc']] = cnt + 1
print(new_dct)
example_file:
biweek,year,loc,cases,pop
1,1906,BALTIMORE,NA,526822.1365
2,1906,BALTIMORE,NA,526995.246
3,1906,BALTIMORE,NA,527170.1981
4,1906,BALTIMORE,NA,527347.0136
5,1906,BALTIMORE,NA,527525.7134
6,1906,BALTIMORE,NA,527706.3183
4,1906,BOSTON,NA,630880.6579
5,1906,BOSTON,NA,631295.9457
6,1906,BOSTON,NA,631710.8403
7,1906,BOSTON,NA,632125.3403
8,1906,BOSTON,NA,632539.4442
9,1906,BOSTON,NA,632953.1503
10,1907,BRIDGEPORT,NA,91790.75578
11,1907,BRIDGEPORT,NA,91926.14732
12,1907,BRIDGEPORT,NA,92061.90153
13,1907,BRIDGEPORT,NA,92198.01976
14,1907,BRIDGEPORT,NA,92334.50335
15,1907,BRIDGEPORT,NA,92471.35364
17,1908,BUFFALO,NA,413661.413
18,1908,BUFFALO,NA,413934.7646
19,1908,BUFFALO,NA,414208.4097
20,1908,BUFFALO,NA,414482.3523
21,1908,BUFFALO,NA,414756.5963
22,1908,BUFFALO,NA,415031.1456
23,1908,BUFFALO,NA,415306.0041
24,1908,BUFFALO,NA,415581.1758
25,1908,BUFFALO,NA,415856.6646
6,1935,CLEVELAND,615,890247.9867
7,1935,CLEVELAND,954,890107.9192
8,1935,CLEVELAND,965,889967.7823
9,1935,CLEVELAND,872,889827.5956
10,1935,CLEVELAND,814,889687.3781
11,1935,CLEVELAND,717,889547.1492
12,1935,CLEVELAND,770,889406.9283
13,1935,CLEVELAND,558,889266.7346
I have done this. I got the keys alright, but I didn't get the count right.
My results:
{'BALTIMORE': 29, 'BOSTON': 59, 'BRIDGEPORT': 89, 'BUFFALO': 134, 'CLEVELAND': 174}
I know pandas is a very good tool, but I need the code to use the csv module.
If any of you guys could help me get the count right, I'd appreciate it.
Thank you!
Paulo
You can use collections.Counter to count occurrences of the cities in the CSV file. Counter.keys() will also give you all cities found in the CSV:
import csv
from collections import Counter

with open('csvtest.csv') as fh:
    reader = csv.DictReader(fh)
    c = Counter(row['loc'] for row in reader)

print(dict(c))
print('Cities={}'.format([*c.keys()]))
Prints:
{'BALTIMORE': 6, 'BOSTON': 6, 'BRIDGEPORT': 6, 'BUFFALO': 9, 'CLEVELAND': 8}
Cities=['BALTIMORE', 'BOSTON', 'BRIDGEPORT', 'BUFFALO', 'CLEVELAND']
You are updating a global counter rather than a per-location counter. You are also iterating over each column of each row and updating the counter for no reason.
Try this:
with open('Dalziel2015_data.csv') as fh:
    new_dct = {}
    cities = set()
    reader = csv.DictReader(fh)
    for row in reader:
        data = dict(row)
        new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1
print(new_dct)
This line: new_dct[data['loc']] = new_dct.get(data['loc'], 0) + 1 gets the current count for that city and increments it by one. If the count does not exist yet, get returns 0 instead.
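As a quick illustration of that get-with-default counting pattern on made-up input:
counts = {}
for city in ['BALTIMORE', 'BOSTON', 'BALTIMORE']:
    counts[city] = counts.get(city, 0) + 1
print(counts)  # {'BALTIMORE': 2, 'BOSTON': 1}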
I am trying to create several dictionaries out of a table of comments from a CSV with the columns ID, ReviewType, and Comment.
I need to create a dictionary for every row (hopefully using a loop so I don't have to create them all manually), where the dictionary keys are:
ID
ReviewType
Comment
However, I cannot figure out a fast way to do this. I tried creating a list of dictionaries using the following code:
# Import libraries
import csv
import json
import pprint
# Open file
reader = csv.DictReader(open('Comments.csv', 'rU'))
# Create list of dictionaries
dict_list = []
for line in reader:
    dict_list.append(line)
pprint.pprint(dict_list)
However, now I do not know how to access the dictionaries or whether the key/value pairs are matched properly, because in the printed output:
ID, ReviewType and Comment do not seem to be showing as dictionary keys.
The Comment value seems to be showing as a list of half-sentences.
Is there any way to just create one dictionary for each row instead of a list of dictionaries?
Note: I did look at this question, however it didn't really help.
Here you go. I put the comments into an array:
# Import libraries
import csv
import json
import pprint

# Open file
def readPerfReviewCSVToDict(csvPath):
    reader = csv.DictReader(open(csvPath, 'rU'))
    perfReviewsDictionary = []
    for line in reader:
        perfReviewsDictionary.append(line)

    perfReviewsDictionaryWithCommentsSplit = []
    for item in perfReviewsDictionary:
        itemId = item["id"]
        itemType = item["type"]
        itemComment = item["comments"]
        itemCommentDictionary = itemComment.split()
        perfReviewsDictionaryWithCommentsSplit.append(
            {'id': itemId, 'type': itemType, 'comments': itemCommentDictionary})
    return perfReviewsDictionaryWithCommentsSplit

dict_list = readPerfReviewCSVToDict("test.csv")
pprint.pprint(dict_list)
The output is:
[{'comments': ['test', 'ape', 'dog'], 'id': '1', 'type': 'Test'},
{'comments': ['dog'], 'id': '2', 'type': 'Test'}]
Since you haven't given a reproducible example with a sample DataFrame, I've created one for you:
import pandas as pd
df = pd.DataFrame([[1, "Contractor", "Please post"], [2, "Developer", "a reproducible example"]])
df.columns = ['ID', 'ReviewType', 'Comment']
On your computer, instead of doing this, type:
df = pd.read_csv(file_path)
to read in the csv file as a pandas DataFrame.
Now I will create a list called dictList, initially empty, and populate it with a dictionary for each row in the DataFrame df:
dictList = []

# Iterate over each row in df
for i in df.index:
    # Creating an empty dictionary for each row
    rowDict = {}

    # Populating it
    rowDict['ID'] = df.at[i, 'ID']
    rowDict['ReviewType'] = df.at[i, 'ReviewType']
    rowDict['Comment'] = df.at[i, 'Comment']

    # Once I'm done populating it, I will append it to the list
    dictList.append(rowDict)
    # Go to the next row and repeat.
Now, iterating over the list of dictionaries we have created for my example:
for i in dictList:
    print(i)
We get
{'ID': 1, 'ReviewType': 'Contractor', 'Comment': 'Please post'}
{'ID': 2, 'ReviewType': 'Developer', 'Comment': 'a reproducible example'}
Do you want this?
DICT = {}
for line in reader:
    DICT[line['ID']] = line
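Assuming the column is literally named ID and its values come back as strings from csv.DictReader (e.g. '1', '2'), individual rows can then be looked up directly:
row = DICT['1']          # the whole row dict for ID '1'
print(row['Comment'])    # a single field of that row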