I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary, with the elements of the first column as keys and the elements of the other columns in the same row as values.
DataFrame:
ID A B C
0 p 1 3 2
1 q 4 3 2
2 r 4 0 9
Output should be like this:
Dictionary:
{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}
The to_dict() method sets the column names as dictionary keys so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this.
to_dict() also accepts an 'orient' argument which you'll need in order to output a list of values for each column. Otherwise, a dictionary of the form {index: value} will be returned for each column.
These steps can be done with the following line:
>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
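As a self-contained check of the one-liner above, here is a minimal sketch that rebuilds the DataFrame from the question (the construction call is an assumption; the question only shows the printed frame):

```python
import pandas as pd

# Rebuild the question's DataFrame (construction is assumed from the printed table).
df = pd.DataFrame({
    'ID': ['p', 'q', 'r'],
    'A': [1, 4, 4],
    'B': [3, 3, 0],
    'C': [2, 2, 9],
})

# Make ID the index, transpose so each ID becomes a column,
# then collect each column's values into a list.
result = df.set_index('ID').T.to_dict('list')
print(result)
```

Note that transposing a frame whose columns have different dtypes upcasts the values to a common dtype, so this works most cleanly when the value columns share one type.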
In case a different dictionary format is needed, here are examples of the possible orient arguments. Consider the following simple DataFrame:
>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
a b
0 red 0.500
1 yellow 0.250
2 blue 0.125
Then the options are as follows.
dict - the default: column names are keys, values are dictionaries of index:data pairs
>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'},
'b': {0: 0.5, 1: 0.25, 2: 0.125}}
list - keys are column names, values are lists of column data
>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'],
'b': [0.5, 0.25, 0.125]}
series - like 'list', but values are Series
>>> df.to_dict('series')
{'a': 0 red
1 yellow
2 blue
Name: a, dtype: object,
'b': 0 0.500
1 0.250
2 0.125
Name: b, dtype: float64}
split - splits columns/data/index as keys with values being column names, data values by row and index labels respectively
>>> df.to_dict('split')
{'columns': ['a', 'b'],
'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
'index': [0, 1, 2]}
records - each row becomes a dictionary where key is column name and value is the data in the cell
>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5},
{'a': 'yellow', 'b': 0.25},
{'a': 'blue', 'b': 0.125}]
index - like 'records', but a dictionary of dictionaries with keys as index labels (rather than a list)
>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
1: {'a': 'yellow', 'b': 0.25},
2: {'a': 'blue', 'b': 0.125}}
If you need a dictionary like:
{'red': 0.5, 'yellow': 0.25, 'blue': 0.125}
from a DataFrame like:
a b
0 red 0.500
1 yellow 0.250
2 blue 0.125
the simplest way (for a DataFrame with exactly two columns) is:
dict(df.values)
A working snippet:
import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
dict(df.values)
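dict(df.values) only works when the frame has exactly two columns, because each row must unpack into a (key, value) pair. A minimal sketch of two alternatives that name the key and value columns explicitly, so they keep working if more columns are added (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})

# Pair the two columns explicitly.
mapping = dict(zip(df['a'], df['b']))

# Or make 'a' the index and convert the 'b' Series to a dict.
mapping_via_index = df.set_index('a')['b'].to_dict()

print(mapping)
```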
Follow these steps:
Suppose your dataframe is as follows:
>>> df
A B C ID
0 1 3 2 p
1 4 3 2 q
2 4 0 9 r
1. Use set_index to set the ID column as the dataframe index.
df.set_index("ID", drop=True, inplace=True)
2. Use the orient=index parameter to have the index as dictionary keys.
dictionary = df.to_dict(orient="index")
The results will be as follows:
>>> dictionary
{'p': {'A': 1, 'B': 3, 'C': 2}, 'q': {'A': 4, 'B': 3, 'C': 2}, 'r': {'A': 4, 'B': 0, 'C': 9}}
3. If you need each sample as a list, choose the column order and run the following code:
column_order = ["A", "B", "C"] # Determine your preferred order of columns
d = {} # Initialize the new dictionary as an empty dictionary
for k in dictionary:
    d[k] = [dictionary[k][column_name] for column_name in column_order]
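The three steps can also be sketched in a compact form. This version is an illustration, not the answer's exact code: it uses the non-inplace form of set_index and a dictionary comprehension, and the DataFrame construction is assumed from the printed table.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 4, 4],
    'B': [3, 3, 0],
    'C': [2, 2, 9],
    'ID': ['p', 'q', 'r'],
})

column_order = ['A', 'B', 'C']  # preferred order of the values in each list

# Steps 1 and 2: index by ID, then build {ID: {column: value}}.
by_id = df.set_index('ID').to_dict(orient='index')

# Step 3: flatten each inner dict into a list in the chosen column order.
d = {key: [row[col] for col in column_order] for key, row in by_id.items()}
print(d)
```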
Try using zip:
df = pd.read_csv("file")
d = dict([(i, [a, b, c]) for i, a, b, c in zip(df.ID, df.A, df.B, df.C)])
print(d)
Output:
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
If you don't mind the dictionary values being tuples, you can use itertuples:
>>> {x[0]: x[1:] for x in df.itertuples(index=False)}
{'p': (1, 3, 2), 'q': (4, 3, 2), 'r': (4, 0, 9)}
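If lists are needed after all, the same itertuples pattern can wrap the slice in list(). A minimal sketch, assuming ID is the first column as in the question:

```python
import pandas as pd

df = pd.DataFrame(
    [['p', 1, 3, 2], ['q', 4, 3, 2], ['r', 4, 0, 9]],
    columns=['ID', 'A', 'B', 'C'],
)

# row[0] is the ID; row[1:] holds the remaining values of that row.
as_tuples = {row[0]: tuple(row[1:]) for row in df.itertuples(index=False)}
as_lists = {row[0]: list(row[1:]) for row in df.itertuples(index=False)}
print(as_lists)
```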
For my use (node names with xy positions) I found @user4179775's answer to be the most helpful / intuitive:
import pandas as pd
df = pd.read_csv('glycolysis_nodes_xy.tsv', sep='\t')
df.head()
nodes x y
0 c00033 146 958
1 c00031 601 195
...
xy_dict_list=dict([(i,[a,b]) for i, a,b in zip(df.nodes, df.x,df.y)])
xy_dict_list
{'c00022': [483, 868],
'c00024': [146, 868],
... }
xy_dict_tuples=dict([(i,(a,b)) for i, a,b in zip(df.nodes, df.x,df.y)])
xy_dict_tuples
{'c00022': (483, 868),
'c00024': (146, 868),
... }
Addendum
I later returned to this issue, for other, but related, work. Here is an approach that more closely mirrors the [excellent] accepted answer.
node_df = pd.read_csv('node_prop-glycolysis_tca-from_pg.tsv', sep='\t')
node_df.head()
node kegg_id kegg_cid name wt vis
0 22 22 c00022 pyruvate 1 1
1 24 24 c00024 acetyl-CoA 1 1
...
Convert Pandas dataframe to a [list], {dict}, {dict of {dict}}, ...
Per accepted answer:
node_df.set_index('kegg_cid').T.to_dict('list')
{'c00022': [22, 22, 'pyruvate', 1, 1],
'c00024': [24, 24, 'acetyl-CoA', 1, 1],
... }
node_df.set_index('kegg_cid').T.to_dict('dict')
{'c00022': {'kegg_id': 22, 'name': 'pyruvate', 'node': 22, 'vis': 1, 'wt': 1},
'c00024': {'kegg_id': 24, 'name': 'acetyl-CoA', 'node': 24, 'vis': 1, 'wt': 1},
... }
In my case, I wanted to do the same thing but with selected columns from the Pandas dataframe, so I needed to slice the columns. There are two approaches.
Directly:
(see: Convert pandas to dictionary defining the columns used for the key values)
node_df.set_index('kegg_cid')[['name', 'wt', 'vis']].T.to_dict('dict')
{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
... }
"Indirectly:" first, slice the desired columns/data from the Pandas dataframe (again, two approaches),
node_df_sliced = node_df[['kegg_cid', 'name', 'wt', 'vis']]
or
node_df_sliced2 = node_df.loc[:, ['kegg_cid', 'name', 'wt', 'vis']]
which can then be used to create a dictionary of dictionaries
node_df_sliced.set_index('kegg_cid').T.to_dict('dict')
{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
... }
Most of the answers do not handle the case where an ID appears more than once in the DataFrame. If ID can be duplicated in the DataFrame df, store the values for each ID in a list of lists, grouped by ID:
{k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()] for k,g in df.groupby('ID')}
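To make the shape of that result concrete, here is a minimal sketch on a small frame with a duplicated ID (the sample rows are made up for illustration). It also shows a row-wise variant, which is an alternative layout rather than part of the answer above:

```python
import pandas as pd

# Hypothetical data: 'p' appears twice.
df = pd.DataFrame(
    [['p', 1, 3, 2], ['p', 5, 6, 7], ['q', 4, 3, 2]],
    columns=['ID', 'A', 'B', 'C'],
)

# The expression from the answer: one inner list per column.
by_column = {
    k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()]
    for k, g in df.groupby('ID')
}

# Alternative layout: one inner list per original row.
by_row = {k: g[['A', 'B', 'C']].values.tolist() for k, g in df.groupby('ID')}

print(by_column)
print(by_row)
```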
Dictionary comprehension & iterrows() method could also be used to get the desired output.
result = {row.ID: [row.A, row.B, row.C] for (index, row) in df.iterrows()}
df = pd.DataFrame([['p',1,3,2], ['q',4,3,2], ['r',4,0,9]], columns=['ID','A','B','C'])
my_dict = {k:list(v) for k,v in zip(df['ID'], df.drop(columns='ID').values)}
print(my_dict)
with output
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
With this method, the column names of the dataframe become the keys and each column's values (as a list) become the values.
data_dict = dict()
for col in dataframe.columns:
    data_dict[col] = dataframe[col].values.tolist()
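For reference, this loop produces the same structure as to_dict('list'). A minimal sketch comparing the two on a small made-up frame:

```python
import pandas as pd

dataframe = pd.DataFrame({'a': ['red', 'yellow'], 'b': [0.5, 0.25]})

# The loop above, written as a dictionary comprehension.
data_dict = {col: dataframe[col].tolist() for col in dataframe.columns}

# The built-in equivalent.
builtin = dataframe.to_dict('list')

print(data_dict)
```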
DataFrame.to_dict() converts DataFrame to dictionary.
Example
>>> df = pd.DataFrame(
{'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])
>>> df
col1 col2
a 1 0.50
b 2 0.75
>>> df.to_dict()
{'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}
See the pandas documentation for DataFrame.to_dict for details.
Related
I have a CSV file with a single column. Each cell of that column holds a string containing many values, and I want to split that string into multiple columns.
Here is some example data:
df = pd.DataFrame({'column1':[{'A':2,'B':3,'c':2}]})
print(df)
column1
0 {'A': 2, 'B': 3, 'c': 2}
1 {'A': 3, 'B': 5, 'c': 10}
I want this output:
df = pd.DataFrame({'A':[2],'B':[3],'c':[2]})
try this:
pd.DataFrame([*df['column1'].apply(eval)])
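First convert the string that looks like a dictionary into an actual dictionary. json.loads only accepts valid JSON (double-quoted keys and strings); for a Python-style literal with single quotes, as in the example, use ast.literal_eval instead:
import ast
my_dict = ast.literal_eval(column1)  # column1 is the string from one cell
# Gives you {'A': 2, 'B': 3, 'c': 2}
Then convert that dictionary to a dataframe:
pd.DataFrame([my_dict])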
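Putting the two steps together for a whole column, here is a minimal sketch. It assumes the cells are strings holding Python-style dict literals like the ones printed in the question, and it uses ast.literal_eval, which only evaluates literals and is therefore safer than eval on file contents:

```python
import ast

import pandas as pd

# Assumed input: each cell is a string that looks like a dict.
df = pd.DataFrame({
    'column1': ["{'A': 2, 'B': 3, 'c': 2}", "{'A': 3, 'B': 5, 'c': 10}"],
})

# Parse every string into a real dict, then expand the dicts into columns.
parsed = df['column1'].apply(ast.literal_eval)
expanded = pd.DataFrame(parsed.tolist(), index=df.index)
print(expanded)
```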
I am trying to write a function that works with both a Series and a DataFrame.
dct= {10: 0.5, 20: 2, 30: 3,40:4}
#Defining the function
def funtion_dict(row,dict1):
    total_area=row['total_area']
    if total_area.round(-1) in dict1:
        return dict1.get(total_area.round(-1))*total_area
#checking function in a test situation
row = pd.DataFrame(
{
'total_area': [53, 14.8, 94, 77, 12],
'b': [5, 4, 3, 2, 1],
'c': ['X', 'Y', 'Y', 'Y', 'Z'],
}
)
print(funtion_dict(row,dct))
I keep getting the error "'Series' objects are mutable, thus they cannot be hashed". Please help.
This is the expected behavior because you are trying to use a "Series" as a lookup for a dictionary which is not allowed.
From your code,
dct= {10: 0.5, 20: 2, 30: 3,40:4}
df = pd.DataFrame({
'total_area': [53, 14.8, 94, 77, 12],
'b': [5, 4, 3, 2, 1],
'c': ['X', 'Y', 'Y', 'Y', 'Z'],
})
If you want to add another column to your data frame with multipliers matched from a dictionary, you can do it like so:
df['new_column'] = df['total_area'].round(-1).map(dct) * df['total_area']
which will then give you
total_area b c new_column
0 53.0 5 X NaN
1 14.8 4 Y 7.4
2 94.0 3 Y NaN
3 77.0 2 Y NaN
4 12.0 1 Z 6.0
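If a plain function is still preferred over map, the key point is that the dictionary lookup must receive one number at a time, not a whole Series. A minimal sketch under that assumption (the function name area_times_multiplier is hypothetical):

```python
import math

import pandas as pd

dct = {10: 0.5, 20: 2, 30: 3, 40: 4}
df = pd.DataFrame({
    'total_area': [53, 14.8, 94, 77, 12],
    'b': [5, 4, 3, 2, 1],
    'c': ['X', 'Y', 'Y', 'Y', 'Z'],
})

def area_times_multiplier(total_area, table):
    """Look up the multiplier for one scalar area; NaN if there is no match."""
    key = round(total_area, -1)  # e.g. 14.8 -> 10.0, which matches the key 10
    factor = table.get(key)
    return math.nan if factor is None else factor * total_area

# Apply the scalar function element by element to the column.
df['new_column'] = df['total_area'].apply(lambda area: area_times_multiplier(area, dct))
print(df)
```

Rows whose rounded area is not a key in the dictionary (50, 90 and 80 here) come out as NaN, matching the table above.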
I have a list of the dictionary as follows:
[{"A":5,"B":10},
{"A":6,"B":13},
{"A":10,"B":5}]
I want to sort this list in descending order by the value of B. The output should look like this:
[{"A":6,"B":13},
{"A":5,"B":10},
{"A":10,"B":5}]
How to do that?
You can sort lists by the results of applying a function to each element: https://docs.python.org/3.9/library/functions.html#sorted
>>> data = [{"A":5,"B":10},
... {"A":6,"B":13},
... {"A":10,"B":5}]
>>> sorted(data, key=lambda dct: dct["B"], reverse=True)
[{'A': 6, 'B': 13}, {'A': 5, 'B': 10}, {'A': 10, 'B': 5}]
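An equivalent sketch uses operator.itemgetter from the standard library as the key function instead of a lambda, and also shows the in-place list.sort variant:

```python
from operator import itemgetter

data = [{"A": 5, "B": 10},
        {"A": 6, "B": 13},
        {"A": 10, "B": 5}]

# sorted() returns a new list and leaves `data` untouched.
descending = sorted(data, key=itemgetter("B"), reverse=True)

# list.sort() sorts a copy in place here, so the original order is kept in `data`.
in_place = list(data)
in_place.sort(key=itemgetter("B"), reverse=True)

print(descending)
```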
I'm trying to convert a pandas dataframe into a dictionary, but I need a specific output format. I have read and reviewed many other answers but can't resolve it. My dataframe looks like:
label Min Max Prom Desv. Est. Cr Tz Cpk Zup Zlow PPM % OOS # Datos
0 test1 1.25 1.46 1.329 0.0426 1.161 -0.023 0.697 2.090 3.077 19354 2 268
1 test2 4.80 5.50 5.110 0.1368 0.774 -1.097 0.926 2.778 4.972 2735 0 268
2 test3 2.58 2.96 2.747 0.0709 0.760 -1.029 0.973 2.918 4.977 1762 0 268
I've tried this (and other options, but this is the closest to the desired output):
dict = df.set_index('label').groupby('label').apply(lambda g: g.values.tolist()).to_dict()
And I got:
{'test1': [[1.25, 1.46, 1.329, 0.0426, 1.161, -0.023, 0.697, 2.09, 3.077, 19354.0, 2.0, 268.0]],
'test2': [[4.8, 5.5, 5.11, 0.1368, 0.774, -1.097, 0.926, 2.778, 4.972, 2735.0, 0.0, 268.0]],
'test3': [[2.58, 2.96, 2.747, 0.0709, 0.76, -1.0290, 0.973, 2.918, 4.977, 1762.0, 0.0, 268.0]]}
But what I'm looking for is something like:
{'label':'test1', 'cols':[1.25, 1.46, 1.329, 0.0426, 1.161, -0.023, 0.697, 2.09, 3.077, 19354.0, 2.0, 268.0]},
{'label':'test2', 'cols': [4.8, 5.5, 5.11, 0.1368, 0.774, -1.097, 0.926, 2.778, 4.972, 2735.0, 0.0, 268.0]},
{'label':'test3', 'cols': [2.58, 2.96, 2.747, 0.0709, 0.76, -1.0290, 0.973, 2.918, 4.977, 1762.0, 0.0, 268.0]}
Many thanks in advance for any idea or suggestion.
You can use a lambda function to build the output you want:
df.apply(lambda x: {'label':x.label, 'cols': x.tolist()[1:]}, axis=1).tolist()
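The same output can also be built without apply, by zipping the label column with the remaining values of each row. A minimal sketch on a cut-down version of the frame (only two of the numeric columns are kept, to stay short):

```python
import pandas as pd

# Reduced sample: just the label plus the Min and Max columns.
df = pd.DataFrame({
    'label': ['test1', 'test2', 'test3'],
    'Min': [1.25, 4.80, 2.58],
    'Max': [1.46, 5.50, 2.96],
})

# One list of values per row, excluding the label column.
value_rows = df.drop(columns='label').values.tolist()

records = [
    {'label': label, 'cols': cols}
    for label, cols in zip(df['label'], value_rows)
]
print(records)
```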
Well, reading the title of your question literally, there's always .to_dict():
>>> df = pd.DataFrame([dict(a=1, b=2), dict(a=3, b=4), dict(a=5, b=6)])
>>> df
a b
0 1 2
1 3 4
2 5 6
>>> df.to_dict()
{'a': {0: 1, 1: 3, 2: 5}, 'b': {0: 2, 1: 4, 2: 6}}
But your example suggests you're looking for a list of dicts,
as might conveniently be produced by iterrows or itertuples:
>>> df = pd.DataFrame([dict(a=1, b=2), dict(a=3, b=4), dict(a=5, b=6)])
>>> df
a b
0 1 2
1 3 4
2 5 6
>>>
>>> for i, row in df.iterrows():
...     print(dict(row), list(row))
...
{'a': 1, 'b': 2} [1, 2]
{'a': 3, 'b': 4} [3, 4]
{'a': 5, 'b': 6} [5, 6]
>>>
>>> for row in df.itertuples(index=False):
...     print(dict(row._asdict()))
...
{'a': 1, 'b': 2}
{'a': 3, 'b': 4}
{'a': 5, 'b': 6}
Using list(row)[1:], to skip past the label, would probably fit the bill for you.
I have a dataframe with empty columns and a corresponding dictionary which I would like to update the empty columns with based on index, column:
import pandas as pd
import numpy as np
dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 6, 2], [3, 4, 1]])
dataframe.columns = ['x', 'y', 'z']
additional_cols = ['a', 'b', 'c']
for col in additional_cols:
    dataframe[col] = np.nan
x y z a b c
0 1 2 3
1 4 5 6
2 7 8 9
3 4 6 2
4 3 4 1
for row, column in x.iterrows():
    # calculations to return dictionary y
    y = {"a": 5, "b": 6, "c": 7}
    df.loc[row, :].map(y)
Basically after performing the calculations using columns x, y, z I would like to update columns a, b, c for that same row :)
I could use a function as such but as far as the pandas library and a method for the DataFrame object I am not sure...
def update_row_with_dict(dictionary, dataframe, index):
    for key in dictionary.keys():
        dataframe.loc[index, key] = dictionary.get(key)
The function above with correct indentation:
def update_row_with_dict(df,d,idx):
    for key in d.keys():
        df.loc[idx, key] = d.get(key)
A shorter version would be:
def update_row_with_dict(df,d,idx):
    df.loc[idx, list(d.keys())] = list(d.values())
For your code snippet the syntax would be:
import pandas as pd
import numpy as np
dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9], [4, 6, 2], [3, 4, 1]])
dataframe.columns = ['x', 'y', 'z']
additional_cols = ['a', 'b', 'c']
for col in additional_cols:
    dataframe[col] = np.nan
for idx in dataframe.index:
    y = {'a':1,'b':2,'c':3}
    update_row_with_dict(dataframe,y,idx)
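When every row's dictionary can be computed up front, another option is to collect the dictionaries into a second DataFrame and join it on, instead of writing cell by cell. A minimal sketch; the compute function and its formulas are hypothetical stand-ins for the real calculation, and the new columns are assumed not to exist yet:

```python
import pandas as pd

dataframe = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['x', 'y', 'z'])

def compute(row):
    """Hypothetical per-row calculation returning the new column values."""
    return {
        'a': row['x'] + row['y'],
        'b': row['y'] * row['z'],
        'c': row['x'] - row['z'],
    }

# One dict per row, aligned to the original index, then joined as new columns.
results = pd.DataFrame(
    [compute(row) for _, row in dataframe.iterrows()],
    index=dataframe.index,
)
dataframe = dataframe.join(results)
print(dataframe)
```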