Suppose ,I have a dictionary
key={'a':5}
Now ,I want to add values to it cumulatively without overwriting the current value but adding on to it.How to do it?
I am giving an instance:
for post in doc['post']:
if 'wow' in post:
value=2
for reactor in post['wow']['reactors']:
dict_of_reactor_ids.update({reactor['img_id']:value})
if 'sad' in post:
value=2
for reactor in post['sad']['reactors']:
dict_of_reactor_ids.update({reactor['img_id']:value})
Suppose if the dictionary is like this in first iteration
dict_of_reactor_ids={101:2,102:1}
and NOW I want to increase the value of 101 key by 3 ,then how to do that.
dict_of_reactor_ids={101:5,102:1}
Now in second iteration of post ,I want to add values to the current values in dictionary without overwriting the current value.
I have tried update method but I think it just updates the whole value instead of adding onto it.
Sounds like a typical case of Counter:
>>> from collections import Counter
>>> c = Counter()
>>> c["a"] += 1 # works even though "a" is not yet present
>>> c.update({"a": 2, "b": 2}) # possible to do multiple updates
{"a": 3, "b": 2}
In your case the benefit is that it works even when the key is not already in there (default value is 0), and it allows updates of multiple values at once, whereas update on a normal dict would overwrite the value as you've noticed.
You can also use defaultdict, it "defaults" when there is not yet an existing key-value pair and you still use the cumulative add +=:
from collections import defaultdict
dict_of_reactor_ids = defaultdict(int)
dict_of_reactor_ids[101] += 2
dict_of_reactor_ids[102] += 1
dict_of_reactor_ids['101'] += 3
print(dict_of_reactor_ids['101'])
5
Related
Let's say I have a python dictionary {apple: 15, orange: 22, juice: 7, blueberry:19} and I want to reorder the dictionary according to the order of this list [juice, orange, blueberry, apple]. How would I do this cleanly/efficiently?
Note: the list could very well have more items than the dictionary.
O(n) method is to simply loop over your list of ordered keys:
{k: d[k] for k in keys}
If you want to maintain the same mutable reference to the dictionary (for whatever reason), just delete the key and add it in a loop. It will naturally order itself correctly.
As mentioned in the comments, dictionaries don't have an order in Python. Assuming you want to get the dictionary values in the same order as the list, you could do something like this. It checks to see if the key is in the dictionary first, so extra items in the list won't matter
d = {apple: 15, orange: 22, juice: 7, blueberry:19}
order = [juice, orange, blueberry, apple]
for key in order:
if key in d:
print(d[key])
Alternatively as #ewong mentioned, you can use an OrderedDict, which tracks the order a key is added. It seems like to re-order one you have to create a new OrderedDict, so in your case you could potentially create one from your original dictionary.
from collections import OrderedDict
ordered_d = OrderedDict([(key, d[key]) for key in order if key in d])
how to make a Multidimensional Dictionary with multiple keys and value and how to print its keys and values?
from this format:
main_dictionary= { Mainkey: {keyA: value
keyB: value
keyC: value
}}
I tried to do it but it gives me an error in the manufacturer. here is my code
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
Here is my error:
File "E:/Programming Study/testupdate.py", line 19, in campany
car_dict[manufacturer] [type]= [( sedan, hatchback, sports)]
KeyError: 'Nissan'
And my printing code is:
for manufacuted_by, type,sedan,hatchback, sports in cabuyao_dict[bgy]:
print("Manufacturer Name:", manufacuted_by)
print('-' * 120)
print("Car type:", type)
print("Sedan:", sedan)
print("Hatchback:", hatchback)
print("Sports:", sports)
Thank you! I'm new in Python.
I think you have a slight misunderstanding of how a dict works, and how to "call back" the values inside of it.
Let's make two examples for how to create your data-structure:
car_dict = {}
car_dict["Nissan"] = {"types": ["sedan", "hatchback", "sports"]}
print(car_dict) # Output: {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}}
from collections import defaultdict
car_dict2 = defaultdict(dict)
car_dict2["Nissan"]["types"] = ["sedan", "hatchback", "sports"]
print(car_dict2) # Output: defaultdict(<class 'dict'>, {'Nissan': {'types': ['sedan', 'hatchback', 'sports']}})
In both examples above, I first create a dictionary, and then on the row after I add the values I want it to contain. In the first example, I give car_dict the key "Nissan" and set it's values to a new dictionary containing some values.
In the second example I use defaultdict(dict) which basically has the logic of "if i am not given a value for key then use the factory (dict) to create a value for it.
Can you see the difference of how to initiate the values inside of both of the different methods?
When you called car_dict[manufacturer][type] in your code, you hadn't yet initiated car_dict["Nissan"] = value, so when you tried to retrieve it, car_dict returned a KeyError.
As for printing out the values, you can do something like this:
for key in car_dict:
manufacturer = key
car_types = car_dict[key]["types"]
print(f"The manufacturer '{manufacturer}' has the following types:")
for t in car_types:
print(t)
Output:
The manufacturer 'Nissan' has the following types:
sedan
hatchback
sports
When you loop through a dict, you are looping through only the keys that are contained in it by default. That means that we have to retrieve the values of key inside of the loop itself to be able to print them correctly.
Also as a side note: You should try to avoid using Built-in's names such as type as variable names, because you then overwrite that functions namespace, and you can have some problems in the future when you have to do comparisons of types of variables.
I am trying to make a program that finds out how many integers in a list are not the integer that is represented the most in that list. To do that I have a command which creates a dictionary with every value in the list and the number of times it is represented in it. Next I try to create a new list with all items from the older list except the most represented value so I can count the length of the list. The problem is that I cannot access the most represented value in the dictionary as I get an error code.
import operator
import collections
a = [7, 155, 12, 155]
dictionary = collections.Counter(a).items()
b = []
for i in a:
if a != dictionary[max(iter(dictionary), key=operator.itemgetter(1))[0]]:
b.append(a)
I get this error code: TypeError: 'dict_items' object does not support indexing
The variable you called dictionary is not a dict but a dict_items.
>>> type(dictionary)
<class 'dict_items'>
>>> help(dict.items)
items(...)
D.items() -> a set-like object providing a view on D's items
and sets are iterable, not indexable:
for di in dictionary: print(di) # is ok
dictionary[0] # triggers the error you saw
Note that Counter is very rich, maybe using Counter.most_common would do the trick.
I have a nested dictionary and to simplify the problem the sample dictionary below should get the value of key 2 without calling its parent key 1.
dictionary = {1: {2: 4}}
print(dictionary.some_builtin_function(2)) # must print 4
The way you get a value from a nested dictionary in Python as in the example above is:
dictionary[1][2]
dictionary[1] calls the first element of the dictionary. It could really be named anything such as
d = {"parent": {"child": 4}}
in this example you would call, 4 by using the command:
d["parent"]["child"]
This is nice because it allows you to name your dictionary keys rather then just calling them by number.
I'm very new in python (I usually write in php). I want to understand how to store information in an associative array, and if you can explain me whats the difference of "tuples", "arrays", "dictionary" and "list" will be wonderful (I tried to read different source but I still not caching it).
So This is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2'] #this contain the string to
# be searched in linesreader
data = {'type':[],'id':[]} #here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader: #every line in this csv have a url like
#www.test.com/?test_string1&id=123456
current_row_string = str(row)
for needle in nidless_keys:
current_needle = str(needle)
if current_needle in current_row_string:
data[current_needle[current_row_string[-8:]]) += 1 # also I
#need to count per every id how much rows there are.
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string[-8] is a url which the last 8 digit of the url is an ID.
So the array should looks like this at the end of the script:
test_string1 = 123456 = 20
= 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type I need here to store the information?
Can you tell me how to resolve this script?
It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
This way, you would have to add a special case when a key in the results dictionary isn't found to add an empty Counter. This is a common problem, so there is a specialized form of dictionary in the collections module called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv',newline="") as csvfile:
linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
for row in linesreader:
for key in keys:
if key in row:
id = row[-6:] # ID's are six digits in your example.
# The first index is into the dict, the second into the Counter.
result[key][id] += 1
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only cover briefly what it does. (You might also want to check out the relevant HOWTO) At first glance it looks like complete gibberish, but it is actually a complete language.
In looks for two things in a line. Anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things and see if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Using findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, test)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, test)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....: result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, test)
for key, id in intermediate:
result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.
A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array is essentially the same as a list, but limited to basic datatypes. My impression is that they only exists for performance reasons, don't think I've ever used one. If performance is that critical to you, you probably don't want to use python in the first place.
A tuple is a fixed-length sequence of values (whereas lists and arrays can grow).
Lets take them one by one.
Lists:
List is a very naive kind of data structure similar to arrays in other languages in terms of the way we write them like:
['a','b','c']
This is a list in python , but seems very similar to array structure.
However there is a very large difference in the way lists are used in python and the usual arrays.
Lists are heterogenous in nature. This means that we can store any kind of data simultaneously inside it like:
ls = [1,2,'a','g',True]
As you can see, we have various kinds of data within a list and is a valid list.
However, one important thing about them is that we can access the list items using zero based indices. So we can write:
print ls[0],ls[3]
output: 1 g
Dictionary:
This datastructure is similar to a hash map data structure. It contains a (key,Value) pair. An empty dictionary looks like:
dc = {}
Now, to store a key,value pair, e.g., ('potato',3),(tomato,5), we can do as:
dc['potato'] = 3
dc['tomato'] = 5
and we saved the data in the dictionary dc.
The important thing is that we can even store another data structure element like a list within a dictionary like:
dc['list1'] = ls , where ls is the list defined above.
This shows the power of using dictionary.
In your case, you have difined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
Talking a bit about your script, the expression :
current_row_string[-8:]
doesn't make a sense. The index should have been -6 instead of -8 that would give you the id part of the current row.
This part is the id and should have been stored in a variable say :
id = current_row_string[-6:]
Further action can be performed as seen the answer given by Roland.