Can't figure how to stop dictionary keys from overwriting themselves - python-3.x

I'm trying to create a dictionary and my dictionary keys keep overwriting themselves. I don't understand how I can handle this issue.
Here's the script:
import MDAnalysis as mda
u = mda.Universe('rps5.prmtop', 'rps5.inpcrd')
ca = u.select_atoms('protein')
charges = ca.charges
atom_types = ca.names
resnames = ca.resnames
charge_dict = {}
for i in range(len(charges)):
    #print(i+1 ,resnames[i], atom_types[i], charges[i])
    charge_dict[resnames[i]] = {}
    charge_dict[resnames[i]][atom_types[i]] = charges[i]
print(charge_dict)
The charges, atom_types and resnames are all lists, with the same number of elements.
I want my dictionary to look like this: charge_dict[resname][atom_types] = charges (charge_dict['MET']['CA'] = 0.32198, for example).
Could you please help me with this issue?

Without actually seeing a complete problem description, my guess is that your final result is that each charge_dict[name] is a dictionary with just one key. That's not because the keys "overwrite themselves". Your program overwrites them explicitly: charge_dict[resnames[i]] = {}.
What you want is to only reset the value for that key if it is not already set. You could easily do that by first testing if resnames[i] not in charge_dict:, but the Python standard library provides an even simpler mechanism: collections.defaultdict. A defaultdict is a dictionary with an associated default value creator. So you can do the following:
from collections import defaultdict
charge_dict = defaultdict(dict)
After that, you won't need to worry about initializing charge_dict[name] because a new dictionary will automatically spring into existence when the default value function (dict) is called.
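As a minimal sketch of the whole loop with that change applied: the three lists below are made-up sample values standing in for ca.resnames, ca.names and ca.charges, so the snippet runs without MDAnalysis or the input files.

```python
from collections import defaultdict

# Made-up sample data standing in for ca.resnames, ca.names and ca.charges.
resnames = ['MET', 'MET', 'ALA']
atom_types = ['N', 'CA', 'CA']
charges = [0.1592, 0.0221, 0.0337]

charge_dict = defaultdict(dict)
for resname, atom_type, charge in zip(resnames, atom_types, charges):
    # The inner dict is created on first access and reused afterwards,
    # so earlier atom entries for the same residue name are kept.
    charge_dict[resname][atom_type] = charge

print(dict(charge_dict))
```

Note that all residues sharing a name also share one inner dictionary, so if the same atom name occurs again under the same residue name, the later charge replaces the earlier one.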

Related

How to convert all dict key from str to float

I have got this current dictionary :
mydict = { "123.23":10.50, "45.22":53, "12":123 }
and I would like to get this dictionary (with keys as floats):
mydict = { 123.23:10.50, 45.22:53, 12:123 }
I know that I can iterate over key and recreate a new dict like this:
new_dict = {}
for k in mydict.keys():
    new_dict[float(k)]=mydict[k]
but I expect that it may be possible to convert dict key "inline" ( without to have to recreate a new dict ) ...
What is the most efficient method to do it ?
I suggest using a dictionary comprehension, which is easy to understand, as follows:
my_dict = { "123.23":10.50, "45.22":53, "12":123 }
my_dict = {float(i):j for i,j in my_dict.items()}
print(my_dict) # {123.23: 10.5, 45.22: 53, 12.0: 123}
Use comprehension :
new_dict = { float(k): v for k, v in mydict.items() }
I expect that it may be possible to convert dict key "inline" ( without to have to recreate a new dict ) ...
What is the most efficient method to do it ?
Unless it materially matters to your runtime and you have time to spend profiling and trying out various configurations, I'd strongly recommend just creating a second dict with a dict comprehension and focusing on concerns that actually matter. Because dict views are "live", updating the dict while you iterate its keys directly may have odd side effects: you might iterate it twice, first over the original keys and then over the keys you added, or the iteration might break entirely as deletions lead to internal storage compaction and invalidate the iterator.
So to change the key types without creating a new dict, you need to first copy the keys to a list, then iterate that and move values from one key to another:
for k in list(mydict.keys()):
    mydict[float(k)] = mydict.pop(k)
However because of the deletions this may or may not be more efficient than creating a new dict with the proper layout, so the "optimisation" would be anything but.
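One case where the in-place version does matter is when other references to the same dict object must see the converted keys. The sketch below illustrates that; the name alias is introduced here only for the demonstration.

```python
mydict = {"123.23": 10.50, "45.22": 53, "12": 123}
alias = mydict  # a second reference to the same dict object

# Snapshot the keys first so the loop never iterates a dict that is changing.
for k in list(mydict):
    mydict[float(k)] = mydict.pop(k)

print(mydict)
print(alias is mydict)  # the alias still points at the same, now converted, dict
```

A dict comprehension would instead bind mydict to a new object and leave alias pointing at the old string-keyed one.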

Reading and writing to a dictionary dynamically

I am trying to read value from dictionary and then write to a different one.
The following works, but is hardcoded
this_application['bounces'] = {}
this_application['bounces']['month'] = {}
this_application['bounces']['month']['summary'] = {}
try:
    got_value = application.ga_data['ga:bounces']['ga:month']['summary']['recent']
except:
    got_value = ""
this_application['bounces']['month']['summary']['recent'] = got_value
What I want to do though, is pass in a from and to list (as I will have lots of these).
I was imagining the input would be something like this
{"ga_data": [{"from": "ga:bounces.ga:month.summary.recent", "to": "bounces.month.summary.recent"},{"from": "ga:sessions.ga:month.summary.recent", "to": "sessions.month.summary.recent"}]}
In which case it would do the above twice (with checking for existing dictionaries etc). I am fine on the checking etc., it is how to use the above that I am stuck on.
Any help would be appreciated
Thanks
You can use defaultdict but there are some special things to consider when doing it. If you read a value that does not exist it will add an empty value to the dict.
import collections
nested_dict = lambda: collections.defaultdict(nested_dict)
d = nested_dict()
d[1][2][3] = 'Hello, dictionary!'
print(d[2]) # I read the value and thus added the key to the dict
print(d[1][2][3]) # Prints Hello, dictionary!
print(d.keys()) # Shows that [1, 2] are keys
Credit To: How can I get Python to automatically create missing key/value pairs in a dictionary?
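For the from/to mapping itself, one possible approach with plain dicts is sketched below. The helper names get_path and set_path are hypothetical, the sample ga_data values are made up, and the sketch assumes no key contains a dot, since the dot is used as the path separator.

```python
def get_path(source, dotted_path, default=""):
    """Follow a dotted path through nested dicts; return default if a step is missing."""
    current = source
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return default
        current = current[part]
    return current


def set_path(target, dotted_path, value):
    """Create intermediate dicts as needed, then store value under the last key."""
    *parents, last = dotted_path.split(".")
    current = target
    for part in parents:
        current = current.setdefault(part, {})
    current[last] = value


# Sample data shaped like the question's; the value 42 is made up.
ga_data = {"ga:bounces": {"ga:month": {"summary": {"recent": 42}}}}
mapping = [
    {"from": "ga:bounces.ga:month.summary.recent", "to": "bounces.month.summary.recent"},
    {"from": "ga:sessions.ga:month.summary.recent", "to": "sessions.month.summary.recent"},
]

this_application = {}
for entry in mapping:
    set_path(this_application, entry["to"], get_path(ga_data, entry["from"]))

print(this_application)
```

Unlike the nested defaultdict, reading a missing path here does not add empty entries to the source dictionary; a missing value simply becomes the empty string, as in the hardcoded version.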

Create nested python dictionary and assign value inline?

I know that one can do something like this
d = {}
d['key1'] = {'Innerkey1':{'Response':'value','Type':'value2'}}
However, I need something like this
d['Key1']['Innerkey1'] = {'Response':'value','Type':'value2'}
as I'm consistently adding new inner keys depending on various factors, and if I were to just do
d['Key1'] = {'NewInnerkey2': {'Response':'value','Type':'value2'}}
it overwrites and replaces innerkey1.
I suppose I can initialize d['key1'] first and use
.append({'NewInnerkey2': {'Response':'value','Type':'value2'}}),
however, there are many different spots where a new primary or inner key may or may not need to be initialized, so it would lead to quite a bit of fluff to check whether it has been initialized.
Any ideas?
You can use a defaultdict:
from collections import defaultdict
d = defaultdict(dict)
d['key1']['innerkey'] = {'Response':'value','Type':'value1'} # won't throw errors
d['key2']['innerkey2'] = {'Response':'value','Type':'value2'} # won't overwrite the value for innerkey
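To show the case from the question, where two inner keys are added under the same primary key, here is a short sketch. It also includes a plain-dict alternative based on dict.setdefault; the variable name plain is introduced only for this example.

```python
from collections import defaultdict

d = defaultdict(dict)
d['key1']['Innerkey1'] = {'Response': 'value', 'Type': 'value1'}
d['key1']['NewInnerkey2'] = {'Response': 'value', 'Type': 'value2'}
print(sorted(d['key1']))  # both inner keys are present

# Plain-dict alternative: setdefault creates the inner dict only if it is missing.
plain = {}
plain.setdefault('key1', {})['Innerkey1'] = {'Response': 'value', 'Type': 'value1'}
plain.setdefault('key1', {})['NewInnerkey2'] = {'Response': 'value', 'Type': 'value2'}
```

The setdefault form avoids the defaultdict side effect of creating an entry whenever a missing key is merely read.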

List, tuples or dictionary, differences and usage, How can I store info in python

I'm very new to Python (I usually write PHP). I want to understand how to store information in an associative array, and it would be wonderful if you could explain the difference between "tuples", "arrays", "dictionaries" and "lists" (I tried reading different sources but I'm still not catching it).
So This is my code:
#!/usr/bin/python3.4
import csv
import string
nidless_keys = dict()
nidless_keys = ['test_string1','test_string2'] # this contains the strings to
                                               # be searched in linesreader
data = {'type':[],'id':[]} # here I want to store my information
with open('path/to/csv/file.csv',newline="") as csvfile:
    linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
    for row in linesreader: # every line in this csv has a url like
                            # www.test.com/?test_string1&id=123456
        current_row_string = str(row)
        for needle in nidless_keys:
            current_needle = str(needle)
            if current_needle in current_row_string:
                data[current_needle][current_row_string[-8:]] += 1 # also I
                # need to count per every id how many rows there are.
In conclusion:
my_data_stored = [current_needle][current_row_string[-8]]
current_row_string is a URL, and its last 8 characters are an ID.
So the array should look like this at the end of the script:
test_string1 = 123456 = 20
             = 256468 = 15
test_string2 = 123155 = 10
Edit 1:
Which type do I need here to store the information?
Can you tell me how to fix this script?
It seems you want to count how many times an ID in combination with a test string occurs.
There can be multiple ID/count combinations associated with every test string.
This suggests that you should use a dictionary indexed by the test strings to store the results. In that dictionary I would suggest to store collections.Counter objects.
Done naively, you would have to add a special case that inserts an empty Counter whenever a key isn't found in the results dictionary. This is a common problem, so the collections module provides a specialized form of dictionary called defaultdict.
import collections
import csv
# Using a tuple for the keys so it cannot be accidentally modified
keys = ('test_string1', 'test_string2')
result = collections.defaultdict(collections.Counter)
with open('path/to/csv/file.csv',newline="") as csvfile:
    linesreader = csv.reader(csvfile,delimiter=',',quotechar="|")
    for row in linesreader:
        line = ','.join(row) # csv.reader yields a list of fields; rejoin it to search it as text
        for key in keys:
            if key in line:
                id = line[-6:] # IDs are six digits in your example.
                # The first index is into the dict, the second into the Counter.
                result[key][id] += 1
There is an even easier way, by using regular expressions.
Since you seem to treat every row in a CSV file as a string, there is little need to use the CSV reader, so I'll just read the whole file as text.
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
pattern = r'\?(.*)&id=(\d+)'
The pattern is a regular expression. This is a large topic in and of itself, so I'll only cover briefly what it does. (You might also want to check out the relevant HOWTO.) At first glance it looks like complete gibberish, but it is actually a complete language.
It looks for two things in a line: anything between ? and &id=, and a sequence of digits after &id=.
I'll be using IPython to give an example.
(If you don't know it, check out IPython. It is great for trying things out and seeing if they work.)
In [1]: import re
In [2]: pattern = r'\?(.*)&id=(\d+)'
In [3]: text = """www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=123456
....: www.test.com/?test_string1&id=234567
....: www.test.com/?foo&id=234567
....: www.test.com/?foo&id=123456
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234
....: www.test.com/?foo&id=1234"""
The text variable points to the string which is a mock-up for the contents of your CSV file.
I am assuming that:
every URL is on its own line
ID's are a sequence of digits.
If these assumptions are wrong, this won't work.
Using findall to extract every match of the pattern from the text.
In [4]: re.findall(pattern, text)
Out[4]:
[('test_string1', '123456'),
('test_string1', '123456'),
('test_string1', '234567'),
('foo', '234567'),
('foo', '123456'),
('foo', '1234'),
('foo', '1234'),
('foo', '1234')]
The findall function returns a list of 2-tuples (that is key, ID pairs). Now we just need to count those.
In [5]: import collections
In [6]: result = collections.defaultdict(collections.Counter)
In [7]: intermediate = re.findall(pattern, text)
Now we fill the result dict from the list of matches that is the intermediate result.
In [8]: for key, id in intermediate:
....:     result[key][id] += 1
....:
In [9]: print(result)
defaultdict(<class 'collections.Counter'>, {'foo': Counter({'1234': 3, '123456': 1, '234567': 1}), 'test_string1': Counter({'123456': 2, '234567': 1})})
So the complete code would be:
import collections
import re
with open('path/to/csv/file.csv') as datafile:
    text = datafile.read()
result = collections.defaultdict(collections.Counter)
pattern = r'\?(.*)&id=(\d+)'
intermediate = re.findall(pattern, text)
for key, id in intermediate:
    result[key][id] += 1
This approach has two advantages.
You don't have to know the keys in advance.
ID's are not limited to six digits.
A brief summary of the python data types you mentioned:
A dictionary is an associative array, aka hashtable.
A list is a sequence of values.
An array is essentially the same as a list, but limited to basic datatypes. My impression is that they only exist for performance reasons; I don't think I've ever used one. If performance is that critical to you, you probably don't want to use Python in the first place.
A tuple is a fixed-length sequence of values (whereas lists and arrays can grow).
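As a quick illustration of the four types side by side, here is a small sketch. All variable names and values are made up for the example, and "array" is taken to mean the standard library's array module.

```python
from array import array

counts = {'potato': 3, 'tomato': 5}   # dictionary: look up values by key
items = [1, 'a', True]                # list: ordered, can grow, any mix of types
items.append(2.5)
numbers = array('i', [1, 2, 3])       # array: can grow, but one basic type only (here C int)
numbers.append(4)
point = (10, 20)                      # tuple: ordered, fixed once created

try:
    point[0] = 99                     # tuples do not support item assignment
except TypeError as exc:
    print('tuple is immutable:', exc)
```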
Let's take them one by one.
Lists:
A list is a very simple kind of data structure, similar to arrays in other languages in the way we write it:
['a','b','c']
This is a list in Python, and it looks very similar to an array.
However, there is a large difference between the way lists are used in Python and the usual arrays.
Lists are heterogeneous in nature. This means that we can store different kinds of data in one list at the same time:
ls = [1,2,'a','g',True]
As you can see, we have various kinds of data within one list, and it is a valid list.
However, one important thing about them is that we can access the list items using zero-based indices. So we can write:
print(ls[0], ls[3])
output: 1 g
Dictionary:
This data structure is similar to a hash map. It contains (key, value) pairs. An empty dictionary looks like:
dc = {}
Now, to store key/value pairs, e.g., ('potato',3) and ('tomato',5), we can write:
dc['potato'] = 3
dc['tomato'] = 5
and we saved the data in the dictionary dc.
The important thing is that we can even store another data structure element like a list within a dictionary like:
dc['list1'] = ls , where ls is the list defined above.
This shows the power of using dictionary.
In your case, you have defined a dictionary like this:
data = {'type':[],'id':[]}
This means that your dictionary will consist of only two keys and each key corresponds to a list, which are empty for now.
Talking a bit about your script, the expression:
current_row_string[-8:]
doesn't make sense. The index should have been -6 instead of -8; that would give you the id part of the current row.
This part is the id and should have been stored in a variable, say:
id = current_row_string[-6:]
Further action can be performed as shown in the answer given by Roland.

Python3 access previously created object

I'm new to programming in general and I need some help accessing a previously created instance of a class. I did some searching on SO but I could not find anything... Maybe it's just because I should not try to do that.
for s in servers:
    c = rconprotocol.Rcon(s[0], s[2],s[1])
    t = threading.Thread(target=c.connect)
    t.start()
    c.messengers(allmessages, 10)
Now, what can I do if I want to call a function on "c" ?
Thanks, Hugo
You're creating several different objects that you briefly name c as you go through the loop. If you want to be able to access more than the last of them, you'll need to save them somewhere that won't be overwritten. Probably the best approach is to use a list to hold the successive values, but depending on your specific needs another data structure might make sense too (for instance, using a dictionary you could look up each value by a specific key).
Here's a trivial adjustment to your current code that will save the c values in a list:
c_list = []
for s in servers:
    c = rconprotocol.Rcon(s[0], s[2],s[1])
    t = threading.Thread(target=c.connect)
    t.start()
    c.messengers(allmessages, 10)
    c_list.append(c)
Later you can access any of the c values with c_list[index], or by iterating with for c in c_list.
A slightly more Pythonic version might use a list comprehension rather than append to create the list (this also shows what a loop over c_list later on might look like):
c_list = [rconprotocol.Rcon(s[0], s[2],s[1]) for s in servers]
for c in c_list:
    t = threading.Thread(target=c.connect)
    t.start()
    c.messengers(allmessages, 10)
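The dictionary variant mentioned above could look roughly like the sketch below. FakeRcon is a hypothetical stand-in for rconprotocol.Rcon so that the snippet runs on its own; the meaning of its three arguments (host, password, port) and the sample servers list are assumptions, not taken from the library.

```python
import threading


class FakeRcon:
    """Hypothetical stand-in for rconprotocol.Rcon, used only to make the sketch runnable."""

    def __init__(self, host, password, port):
        self.host, self.password, self.port = host, password, port
        self.connected = False

    def connect(self):
        self.connected = True


# Made-up (host, port, password) tuples, matching the s[0], s[2], s[1] order used above.
servers = [('alpha.example', 2302, 'secret1'), ('beta.example', 2303, 'secret2')]

connections = {}
threads = []
for host, port, password in servers:
    c = FakeRcon(host, password, port)
    t = threading.Thread(target=c.connect)
    t.start()
    threads.append(t)
    connections[host] = c  # keep the object reachable under its host name

for t in threads:
    t.join()  # wait for every connect call to finish

print(connections['beta.example'].connected)
```

Keying by host means a later call such as connections['beta.example'].connect() reaches one specific object without having to remember its position in a list.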
