I have the following for instance:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
and I would like to get a dictionary like this:
x = [{'A': [1,2], 'B': [1,2,3], 'C':[1], 'D': [1]}]
Do you have any idea how I could get this please?
You could use a collections.defaultdict of sets to collect unique values, then convert the final result to a dictionary with values as lists using a dict comprehension:
from collections import defaultdict
lst = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
result = defaultdict(set)
for dic in lst:
    for key, value in dic.items():
        result[key].add(value)
print({key: list(value) for key, value in result.items()})
Output:
{'A': [1, 2], 'B': [1, 2, 3], 'C': [1], 'D': [1]}
Although it's probably better to add your data directly to the defaultdict to begin with, instead of creating a list of singleton dictionaries (I don't recommend this data structure) and then converting the result.
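A minimal sketch of that suggestion. It assumes the data is available as (key, value) pairs before it ever becomes singleton dictionaries; the pairs list is hypothetical:

```python
from collections import defaultdict

# Hypothetical input: the raw (key, value) pairs, before they are wrapped
# into singleton dictionaries.
pairs = [('A', 1), ('A', 1), ('A', 2), ('B', 1), ('B', 1), ('B', 2),
         ('B', 3), ('C', 1), ('D', 1)]

# Add each value straight into a per-key set; duplicates are discarded.
result = defaultdict(set)
for key, value in pairs:
    result[key].add(value)

# sorted() makes the list order deterministic, since sets are unordered.
final = {key: sorted(values) for key, values in result.items()}
print(final)
```

Using sorted() rather than list() is a deliberate choice here: a set has no guaranteed order, so sorting is the simplest way to get a predictable result.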
Using dict.setdefault
Ex:
x = [{'A':1},{'A':1},{'A':2},{'B':1},{'B':1},{'B':2},{'B':3},{'C':1},{'D':1}]
res = {}
for i in x:
    for k, v in i.items():
        res.setdefault(k, set()).add(v)
# or: res = [{k: list(v) for k, v in res.items()}]
print(res)
Output:
{'A': {1, 2}, 'B': {1, 2, 3}, 'C': {1}, 'D': {1}}
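If the order in which values first appear matters, a small variant of the setdefault approach can use a dict as an ordered set instead of a set. This is only a sketch: the helper name group_unique is hypothetical, and it returns the list-wrapped shape shown in the question:

```python
def group_unique(dicts):
    """Merge singleton dicts into one dict of unique values per key.

    Hypothetical helper: values keep the order in which they first appear.
    """
    res = {}
    for d in dicts:
        for k, v in d.items():
            # A dict's keys are unique and keep insertion order,
            # so the inner dict behaves like an ordered set.
            res.setdefault(k, {})[v] = None
    return [{k: list(seen) for k, seen in res.items()}]


x = [{'A': 1}, {'A': 1}, {'A': 2}, {'B': 1}, {'B': 1}, {'B': 2},
     {'B': 3}, {'C': 1}, {'D': 1}]
print(group_unique(x))
```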
I have a text file below:
A test B echo C delete
A test B echo C delete D modify
A test B echo C delete
I want to parse the text file above, convert it to a list of lists, and then to a list of dictionaries.
The expected list of lists is:
[['A', 'test', 'B', 'echo', 'C', 'delete'], ['A', 'test', 'B', 'echo', 'C', 'delete', 'D', 'modify'], ['A', 'test', 'B', 'echo', 'C', 'delete']]
The expected final result is:
[{'A':'test','B':'echo','C':'delete'},{'A':'test','B':'echo','C':'delete','D': 'modify'},{'A':'test', 'B':'echo', 'C':'delete'}]
This is my script:
#!/usr/bin/python3
def listToDict(list):
    listDict = {list[i]: list[i + 1] for i in range (0, len(list), 2)}
    return listDict
def parse_file(filepath):
    string_to_listoflist = []
    with open(filepath, 'r') as file_object:
        lines = file_object.readlines()
        for line in lines:
            string_to_listoflist.append(line.rstrip().split())
        dictionary = listToDict(string_to_listoflist)
        print(dictionary)
if __name__ == '__main__':
    filepath = 'log.txt'
    parse_file(filepath)
The above script produces the following error:
Traceback (most recent call last):
  File "parse.py", line 19, in <module>
    parse_file(filepath)
  File "parse.py", line 14, in parse_file
    dictionary = listToDict(string_to_listoflist)
  File "parse.py", line 4, in listToDict
    listDict = {list[i]: list[i + 1] for i in range (0, len(list), 2)}
  File "parse.py", line 4, in <dictcomp>
    listDict = {list[i]: list[i + 1] for i in range (0, len(list), 2)}
TypeError: unhashable type: 'list'
Next, I added another loop over the list of lists:
#!/usr/bin/python3
def listToDict(list):
    listDict = {list[i]: list[i + 1] for i in range (0, len(list), 2)}
    return listDict
def parse_file(filepath):
    string_to_listoflist = []
    dictionary = {}
    with open(filepath, 'r') as file_object:
        lines = file_object.readlines()
        for line in lines:
            string_to_listoflist.append(line.rstrip().split())
        for e in string_to_listoflist:
            dictionary = listToDict(e)
        print(dictionary)
if __name__ == '__main__':
    filepath = 'log.txt'
    parse_file(filepath)
The script above produces an unexpected result, even though I define the dictionary variable before the loop:
{'A': 'test', 'B': 'echo', 'C': 'delete'}
Then I moved the print call inside the loop:
#!/usr/bin/python3
def listToDict(list):
    listDict = {list[i]: list[i + 1] for i in range (0, len(list), 2)}
    return listDict
def parse_file(filepath):
    string_to_listoflist = []
    dictionary = {}
    with open(filepath, 'r') as file_object:
        lines = file_object.readlines()
        for line in lines:
            string_to_listoflist.append(line.rstrip().split())
        for e in string_to_listoflist:
            dictionary = listToDict(e)
            print(dictionary)
if __name__ == '__main__':
    filepath = 'log.txt'
    parse_file(filepath)
The script above gives this result, which is also not what I expected:
{'A': 'test', 'B': 'echo', 'C': 'delete'}
{'A': 'test', 'B': 'echo', 'C': 'delete', 'D': 'modify'}
{'A': 'test', 'B': 'echo', 'C': 'delete'}
Can anyone help me resolve this issue?
Thanks
In your first attempt, your variable string_to_listoflist is a list of lists.
When you pass it to your function listToDict, the function iterates on the parent level of the list instead of iterating over each list within the parent list. Thus, the first entry attempted in the dictionary is
['A', 'test', 'B', 'echo', 'C', 'delete']:['A', 'test', 'B', 'echo', 'C', 'delete', 'D', 'modify']
rather than your intended
'A':'test'
This causes the error you observe, TypeError: unhashable type: 'list', because a list (which is mutable) is being used as a dictionary key, and dictionary keys must be hashable. Lists are not hashable; immutable types such as strings and tuples are.
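A tiny demonstration of the difference between a list key and a tuple key (the key and value here are illustrative only):

```python
# A list is mutable, so it is unhashable and cannot be a dictionary key.
try:
    {['A', 'test']: 'x'}
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = ''
print(error_message)

# A tuple with the same items is immutable and hashable, so it works as a key.
good = {('A', 'test'): 'x'}
print(good)
```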
Adding the extra loop over each element of the parent list is the correct way to resolve this. However, if you want your final result to be inside a list, you simply need to append each result to a list.
In other words, something like the following:
dictionaries = []
for e in string_to_listoflist:
    dictionary = listToDict(e)
    dictionaries.append(dictionary)
print(dictionaries)
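Putting the pieces together, here is a minimal end-to-end sketch of the fixed parser. It pairs tokens with zip instead of index arithmetic, and uses io.StringIO in place of log.txt so it runs on its own; the function names list_to_dict and parse_lines are illustrative, not from the original script:

```python
import io


def list_to_dict(tokens):
    """Turn ['A', 'test', 'B', 'echo'] into {'A': 'test', 'B': 'echo'}."""
    # Even-index tokens are keys, odd-index tokens are values.
    # zip stops at the shorter slice, so an unpaired last token is dropped.
    return dict(zip(tokens[::2], tokens[1::2]))


def parse_lines(file_object):
    """Return one dictionary per non-blank line of file_object."""
    dictionaries = []
    for line in file_object:
        tokens = line.split()
        if tokens:  # skip blank lines
            dictionaries.append(list_to_dict(tokens))
    return dictionaries


# io.StringIO stands in for open('log.txt') so the sketch is self-contained.
sample = io.StringIO(
    "A test B echo C delete\n"
    "A test B echo C delete D modify\n"
    "A test B echo C delete\n"
)
print(parse_lines(sample))
```

With the real file, the same function works on an open file object, e.g. with open('log.txt') as f: print(parse_lines(f)).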
You can use the re module to build the desired dictionaries.
For example:
import re
with open('file.txt', 'r') as f_in:
    out = [dict(re.findall(r'([A-Z]+) ([^\s]+)', line)) for line in f_in]
print(out)
Prints:
[{'A': 'test', 'B': 'echo', 'C': 'delete'}, {'A': 'test', 'B': 'echo', 'C': 'delete', 'D': 'modify'}, {'A': 'test', 'B': 'echo', 'C': 'delete'}]
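To see what the pattern does on a single line, here is a small sketch. It also makes visible an assumption baked into the pattern: keys must be runs of uppercase letters followed by a single space:

```python
import re

line = "A test B echo C delete D modify"
# Each match yields a (key, value) tuple: an uppercase word, a space,
# then a run of non-whitespace characters.
pairs = re.findall(r'([A-Z]+) ([^\s]+)', line)
print(pairs)
# dict() accepts an iterable of (key, value) tuples.
print(dict(pairs))
```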
I am trying to key in a value, for example 10, and filter the dictionary so that only the entries whose number is greater than that value remain, giving the result:
'b':['sam',20], 'c':['rose',30], 'd':['mary',40], 'e':['jon',50]
Below is the code I am trying:
h = int(input("Enter Value: "))
ini_dict = {'a':['abc',10], 'b':['sam',20], 'c':['rose',30], 'd':['mary',40], 'e':['jon',50]}
# printing initial dictionary
print ("initial lists", str(ini_dict))
result = dict(filter(lambda x: x[1]>h, ini_dict.items()))
result = dict(result)
print("resultant dictionary : ", str(result))
I encountered this error: "TypeError: '>' not supported between instances of 'list' and 'int'".
Besides this, I tried changing
result = dict(filter(lambda x: x[1]>h, ini_dict.items()))
into
result = dict(filter(lambda x,y:x,y[1]>h, ini_dict.items()))
and got an error saying y is undefined.
Thank you for your help!
Here is the issue: filter passes each item of ini_dict.items() to the lambda as a (key, value) tuple, so x[1] selects the entire list value (for example ['sam', 20]). To access the number inside that list, use x[1][1]. Modifying the code accordingly:
h = int(input("Enter Value: "))
ini_dict = {'a':['abc',10], 'b':['sam',20], 'c':['rose',30], 'd':['mary',40], 'e':['jon',50]}
# printing initial dictionary
print ("initial lists", str(ini_dict))
result = dict(filter(lambda x: x[1][1]>h, ini_dict.items()))
result = dict(result)
print("resultant dictionary : ", str(result))
The output:
Enter Value: 10
initial lists {'a': ['abc', 10], 'b': ['sam', 20], 'c': ['rose', 30], 'd': ['mary', 40], 'e': ['jon', 50]}
resultant dictionary : {'b': ['sam', 20], 'c': ['rose', 30], 'd': ['mary', 40], 'e': ['jon', 50]}
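On the lambda x,y attempt: filter calls its function with one argument per item, here a (key, value) tuple, so a two-parameter lambda cannot receive it. Also, lambda x,y:x,y[1]>h parses as separate arguments to filter, which is why y is reported as undefined. A dictionary comprehension can unpack the tuple directly. A sketch, with the threshold hard-coded to 10 instead of read via input():

```python
h = 10  # threshold; the original reads it with int(input(...))
ini_dict = {'a': ['abc', 10], 'b': ['sam', 20], 'c': ['rose', 30],
            'd': ['mary', 40], 'e': ['jon', 50]}

# Unpack each item into the key and its [name, number] list, then keep
# only the entries whose number is greater than the threshold.
result = {key: [name, number]
          for key, (name, number) in ini_dict.items()
          if number > h}
print(result)
```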
I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary, with the elements of the first column as keys and the elements of the other columns in the same row as values.
DataFrame:
ID A B C
0 p 1 3 2
1 q 4 3 2
2 r 4 0 9
Output should be like this:
Dictionary:
{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}
The to_dict() method sets the column names as dictionary keys so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this.
to_dict() also accepts an 'orient' argument which you'll need in order to output a list of values for each column. Otherwise, a dictionary of the form {index: value} will be returned for each column.
These steps can be done with the following line:
>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
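A self-contained version of that one-liner, rebuilding the question's DataFrame inline. One caveat worth knowing: because .T turns each row into a single column, a row that mixes integer and float columns would be upcast to a common dtype (ints become floats). The example data here is all integers, so this does not arise:

```python
import pandas as pd

# The DataFrame from the question.
df = pd.DataFrame({'ID': ['p', 'q', 'r'],
                   'A': [1, 4, 4],
                   'B': [3, 3, 0],
                   'C': [2, 2, 9]})

# After set_index('ID') and .T, each original row is a column named by its ID,
# so to_dict('list') maps each ID to that row's values.
result = df.set_index('ID').T.to_dict('list')
print(result)
```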
In case a different dictionary format is needed, here are examples of the possible orient arguments. Consider the following simple DataFrame:
>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
a b
0 red 0.500
1 yellow 0.250
2 blue 0.125
Then the options are as follows.
dict - the default: column names are keys, values are dictionaries of index:data pairs
>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'},
'b': {0: 0.5, 1: 0.25, 2: 0.125}}
list - keys are column names, values are lists of column data
>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'],
'b': [0.5, 0.25, 0.125]}
series - like 'list', but values are Series
>>> df.to_dict('series')
{'a': 0 red
1 yellow
2 blue
Name: a, dtype: object,
'b': 0 0.500
1 0.250
2 0.125
Name: b, dtype: float64}
split - a dictionary with the keys 'index', 'columns' and 'data', whose values are the index labels, the column names and the row data (one list per row) respectively
>>> df.to_dict('split')
{'columns': ['a', 'b'],
'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
'index': [0, 1, 2]}
records - each row becomes a dictionary where key is column name and value is the data in the cell
>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5},
{'a': 'yellow', 'b': 0.25},
{'a': 'blue', 'b': 0.125}]
index - like 'records', but a dictionary of dictionaries with keys as index labels (rather than a list)
>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
1: {'a': 'yellow', 'b': 0.25},
2: {'a': 'blue', 'b': 0.125}}
Should a dictionary like:
{'red': 0.5, 'yellow': 0.25, 'blue': 0.125}
be required from a DataFrame like:
a b
0 red 0.500
1 yellow 0.250
2 blue 0.125
the simplest way is:
dict(df.values)
Working snippet:
import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
dict(df.values)
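Note that dict(df.values) only works for a DataFrame with exactly two columns: with more columns each row has more than two items, and dict() raises a ValueError. A sketch comparing it with zipping the two columns explicitly:

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})

# df.values is a 2-D array; dict() treats each two-item row as (key, value).
from_values = dict(df.values)

# Zipping the two columns gives the same mapping without building an
# object-dtype array of the whole frame.
from_zip = dict(zip(df['a'], df['b']))

print(from_values)
print(from_zip)
```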
Follow these steps:
Suppose your dataframe is as follows:
>>> df
A B C ID
0 1 3 2 p
1 4 3 2 q
2 4 0 9 r
1. Use set_index to set the ID column as the DataFrame index.
df.set_index("ID", drop=True, inplace=True)
2. Use the orient=index parameter to have the index as dictionary keys.
dictionary = df.to_dict(orient="index")
The results will be as follows:
>>> dictionary
{'p': {'A': 1, 'B': 3, 'C': 2}, 'q': {'A': 4, 'B': 3, 'C': 2}, 'r': {'A': 4, 'B': 0, 'C': 9}}
3. If you need each row as a list, run the following code, choosing the column order you want:
column_order = ["A", "B", "C"]  # your preferred order of columns
d = {}  # initialize the new dictionary as an empty dictionary
for k in dictionary:
    d[k] = [dictionary[k][column_name] for column_name in column_order]
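The three steps can be condensed into a short sketch. It rebuilds the example DataFrame inline and avoids inplace=True by reassigning the result of set_index; column_order is the same user-chosen list as in step 3:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 4, 4], 'B': [3, 3, 0], 'C': [2, 2, 9],
                   'ID': ['p', 'q', 'r']})

# Steps 1 and 2: ID becomes the index, then each row becomes {column: value}.
by_id = df.set_index('ID').to_dict(orient='index')

# Step 3: pick the values out of each row dictionary in a chosen column order.
column_order = ['A', 'B', 'C']
d = {k: [row[c] for c in column_order] for k, row in by_id.items()}
print(d)
```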
Try using zip:
import pandas as pd
df = pd.read_csv("file")
d = dict([(i, [a, b, c]) for i, a, b, c in zip(df.ID, df.A, df.B, df.C)])
print(d)
Output:
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
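The same idea written as a dictionary comprehension, which avoids building an intermediate list of tuples. The inline DataFrame is an assumed stand-in for pd.read_csv("file"), using the question's data:

```python
import pandas as pd

# Inline stand-in for pd.read_csv("file"), using the question's data.
df = pd.DataFrame({'ID': ['p', 'q', 'r'], 'A': [1, 4, 4],
                   'B': [3, 3, 0], 'C': [2, 2, 9]})

# zip walks the four columns in parallel, one row at a time.
d = {i: [a, b, c] for i, a, b, c in zip(df.ID, df.A, df.B, df.C)}
print(d)
```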
If you don't mind the dictionary values being tuples, you can use itertuples:
>>> {x[0]: x[1:] for x in df.itertuples(index=False)}
{'p': (1, 3, 2), 'q': (4, 3, 2), 'r': (4, 0, 9)}
For my use (node names with xy positions), I found user4179775's answer to be the most helpful/intuitive:
import pandas as pd
df = pd.read_csv('glycolysis_nodes_xy.tsv', sep='\t')
df.head()
nodes x y
0 c00033 146 958
1 c00031 601 195
...
xy_dict_list=dict([(i,[a,b]) for i, a,b in zip(df.nodes, df.x,df.y)])
xy_dict_list
{'c00022': [483, 868],
'c00024': [146, 868],
... }
xy_dict_tuples=dict([(i,(a,b)) for i, a,b in zip(df.nodes, df.x,df.y)])
xy_dict_tuples
{'c00022': (483, 868),
'c00024': (146, 868),
... }
Addendum
I later returned to this issue, for other, but related, work. Here is an approach that more closely mirrors the [excellent] accepted answer.
node_df = pd.read_csv('node_prop-glycolysis_tca-from_pg.tsv', sep='\t')
node_df.head()
node kegg_id kegg_cid name wt vis
0 22 22 c00022 pyruvate 1 1
1 24 24 c00024 acetyl-CoA 1 1
...
Convert Pandas dataframe to a [list], {dict}, {dict of {dict}}, ...
Per accepted answer:
node_df.set_index('kegg_cid').T.to_dict('list')
{'c00022': [22, 22, 'pyruvate', 1, 1],
'c00024': [24, 24, 'acetyl-CoA', 1, 1],
... }
node_df.set_index('kegg_cid').T.to_dict('dict')
{'c00022': {'kegg_id': 22, 'name': 'pyruvate', 'node': 22, 'vis': 1, 'wt': 1},
'c00024': {'kegg_id': 24, 'name': 'acetyl-CoA', 'node': 24, 'vis': 1, 'wt': 1},
... }
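For the dict-of-dicts case, to_dict('index') produces the same mapping without the transpose. A sketch on a tiny inline stand-in for node_df (only the two rows shown in the head() output above; the TSV file itself is not reproduced):

```python
import pandas as pd

# Two rows copied from the node_df.head() output above.
node_df = pd.DataFrame({
    'node': [22, 24],
    'kegg_id': [22, 24],
    'kegg_cid': ['c00022', 'c00024'],
    'name': ['pyruvate', 'acetyl-CoA'],
    'wt': [1, 1],
    'vis': [1, 1],
})

via_transpose = node_df.set_index('kegg_cid').T.to_dict('dict')
# orient='index' keys the outer dict by index label directly, no .T needed.
via_index = node_df.set_index('kegg_cid').to_dict('index')
print(via_index == via_transpose)
```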
In my case, I wanted to do the same thing but with selected columns from the Pandas dataframe, so I needed to slice the columns. There are two approaches.
Directly:
(see: Convert pandas to dictionary defining the columns used fo the key values)
node_df.set_index('kegg_cid')[['name', 'wt', 'vis']].T.to_dict('dict')
{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
... }
"Indirectly:" first, slice the desired columns/data from the Pandas dataframe (again, two approaches),
node_df_sliced = node_df[['kegg_cid', 'name', 'wt', 'vis']]
or
node_df_sliced2 = node_df.loc[:, ['kegg_cid', 'name', 'wt', 'vis']]
which can then be used to create a dictionary of dictionaries:
node_df_sliced.set_index('kegg_cid').T.to_dict('dict')
{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
... }
Most of the answers do not handle the case where an ID appears more than once in the DataFrame. If IDs can be duplicated in df, you can use lists to store the values (i.e. a list of lists), grouped by ID:
{k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()] for k,g in df.groupby('ID')}
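A runnable sketch of that grouping on hypothetical data in which ID 'p' appears twice. The row-wise variant at the end is an alternative shape, not part of the original answer:

```python
import pandas as pd

# Hypothetical data in which ID 'p' appears twice.
df = pd.DataFrame({'ID': ['p', 'q', 'p'],
                   'A': [1, 4, 7],
                   'B': [3, 3, 8],
                   'C': [2, 2, 9]})

# Column-wise: one list per column, covering every row that shares the ID.
by_column = {k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()]
             for k, g in df.groupby('ID')}

# Row-wise alternative: one [A, B, C] list per row that shares the ID.
by_row = {k: g[['A', 'B', 'C']].values.tolist() for k, g in df.groupby('ID')}

print(by_column)
print(by_row)
```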
A dictionary comprehension with the iterrows() method can also be used to get the desired output:
result = {row.ID: [row.A, row.B, row.C] for (index, row) in df.iterrows()}
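iterrows() returns each row as a Series, so it does not preserve dtypes across a row, and it is generally slower than itertuples(). A sketch showing both, with the question's data built inline:

```python
import pandas as pd

df = pd.DataFrame([['p', 1, 3, 2], ['q', 4, 3, 2], ['r', 4, 0, 9]],
                  columns=['ID', 'A', 'B', 'C'])

# iterrows() yields (index, row Series) pairs; the index is unused here.
with_iterrows = {row.ID: [row.A, row.B, row.C] for _, row in df.iterrows()}

# itertuples() yields namedtuples whose fields are named after the columns.
with_itertuples = {row.ID: [row.A, row.B, row.C]
                   for row in df.itertuples(index=False)}
print(with_itertuples)
```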
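Another option is to zip the ID column with the remaining values of each row:
import pandas as pd
df = pd.DataFrame([['p',1,3,2], ['q',4,3,2], ['r',4,0,9]], columns=['ID','A','B','C'])
# tolist() turns each NumPy row into a plain Python list of ints
my_dict = {k: v.tolist() for k, v in zip(df['ID'], df.drop(columns='ID').values)}
print(my_dict)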
with output
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
With this method, the column names of the DataFrame become the keys and each column's values (as a list) become the values:
data_dict = dict()
for col in dataframe.columns:
    data_dict[col] = dataframe[col].values.tolist()
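That loop builds the same mapping as the built-in 'list' orient of to_dict. A sketch checking this on a small hypothetical stand-in for dataframe:

```python
import pandas as pd

# Hypothetical stand-in for `dataframe` in the snippet above.
dataframe = pd.DataFrame({'ID': ['p', 'q', 'r'], 'A': [1, 4, 4]})

data_dict = dict()
for col in dataframe.columns:
    # Each column name maps to that column's values as a plain list.
    data_dict[col] = dataframe[col].values.tolist()

# The 'list' orient produces the same mapping in one call.
print(data_dict == dataframe.to_dict('list'))
```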
DataFrame.to_dict() converts a DataFrame to a dictionary.
Example
>>> df = pd.DataFrame(
...     {'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])
>>> df
   col1  col2
a     1  0.50
b     2  0.75
>>> df.to_dict()
{'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}
See the pandas documentation for DataFrame.to_dict for details.