pandas df from a dictionary of lists - python-3.x

I have a dictionary of two-element lists and I would like to transform it into a 3-column pandas df.
This dict
{
'Abg': [2, 0],
'Aidi': [1, 2],
'Geng': [0, 0],
}
into this df
0     1  2
Abg   2  0
Aidi  1  2
Geng  0  0
How do I do that?

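Solution found (originally written with pd.DataFrame.from_items, which was deprecated in pandas 0.23 and removed in 1.0; DataFrame.from_dict does the same job):
import pandas as pd

# orient='index' makes each key a row label and each list that row's values
df = pd.DataFrame.from_dict(name_dict,
                            orient='index',
                            columns=['A', 'B'])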
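A self-contained version of the from_dict approach, run on the question's dictionary. The rename_axis/reset_index step is an optional extra, assuming the keys should end up as an ordinary first column (matching the three-column layout sketched in the question); the column name 'name' is a hypothetical choice.

```python
import pandas as pd

# Dictionary from the question: each key maps to a two-element list.
name_dict = {
    'Abg': [2, 0],
    'Aidi': [1, 2],
    'Geng': [0, 0],
}

# orient='index': keys become row labels, list items fill columns A and B.
df = pd.DataFrame.from_dict(name_dict, orient='index', columns=['A', 'B'])

# Optional: turn the row labels into a regular column ('name' is hypothetical),
# giving three columns in total.
df3 = df.rename_axis('name').reset_index()
print(df3)
```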

Related

How to get duplicated values in a data frame when the column is a list?

Good morning!
I have a data frame with several columns. One of these columns, data, contains lists. Below is a small example (id is just placeholder data):
df =
id data
0 a [1, 2, 3]
1 h [3, 2, 1]
2 bf [1, 2, 3]
What I want is to get the rows with duplicated values in column data. In this example, I should get rows 0 and 2, because their data values are the same (the list [1, 2, 3]). However, this can't be done with df.duplicated(subset=['data']) because list is an unhashable type.
I know it could be done by taking two rows at a time and comparing their data directly, but my real data frame can have 1000 rows or more, so comparing them one by one isn't practical.
Hope someone knows how!
Thank you very much in advance!
IIUC, we can create a new DataFrame from df['data'] and then check it with DataFrame.duplicated.
You can use:
m = pd.DataFrame(df['data'].tolist()).duplicated(keep=False)
df.loc[m]
id data
0 a [1, 2, 3]
2 bf [1, 2, 3]
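A self-contained version of the answer above, run on the question's sample frame. One small change is my own assumption rather than part of the answer: passing index=df.index, so the boolean mask still lines up with df if it does not have the default 0..n-1 index.

```python
import pandas as pd

# Sample frame from the question: the 'data' column holds lists.
df = pd.DataFrame({'id': ['a', 'h', 'bf'],
                   'data': [[1, 2, 3], [3, 2, 1], [1, 2, 3]]})

# Spread each list across its own columns (0, 1, 2); reusing df.index keeps
# the resulting boolean mask aligned with df's rows.
expanded = pd.DataFrame(df['data'].tolist(), index=df.index)

# keep=False flags every row that has a duplicate, including the first one.
m = expanded.duplicated(keep=False)
print(df.loc[m])
```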
Expanding on Quang's comment:
Try
In [2]: elements = [(1,2,3), (3,2,1), (1,2,3)]
...: df = pd.DataFrame.from_records(elements)
...: df
Out[2]:
0 1 2
0 1 2 3
1 3 2 1
2 1 2 3
In [3]: # Add a new column of tuples
...: df["new"] = df.apply(lambda x: tuple(x), axis=1)
...: df
Out[3]:
0 1 2 new
0 1 2 3 (1, 2, 3)
1 3 2 1 (3, 2, 1)
2 1 2 3 (1, 2, 3)
In [4]: # Remove duplicate rows (Keeping the first one)
...: df.drop_duplicates(subset="new", keep="first", inplace=True)
...: df
Out[4]:
0 1 2 new
0 1 2 3 (1, 2, 3)
1 3 2 1 (3, 2, 1)
In [5]: # Remove the new column if not required
...: df.drop("new", axis=1, inplace=True)
...: df
Out[5]:
0 1 2
0 1 2 3
1 3 2 1
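The tuple idea from this answer can also be applied directly to the question's list column, without first expanding it into numeric columns. Because tuples are hashable, duplicated(keep=False) can then select the duplicated rows (which is what the question asks for) instead of dropping them. A minimal sketch, assuming the same sample data as the question:

```python
import pandas as pd

df = pd.DataFrame({'id': ['a', 'h', 'bf'],
                   'data': [[1, 2, 3], [3, 2, 1], [1, 2, 3]]})

# Convert each list to a tuple so duplicated() can hash it.
as_tuples = df['data'].apply(tuple)

# Rows whose list appears more than once (every occurrence is flagged).
dups = df[as_tuples.duplicated(keep=False)]
print(dups)

# Or drop the repeats, keeping the first occurrence, as the answer does.
deduped = df[~as_tuples.duplicated(keep='first')]
print(deduped)
```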

How to convert a 2d numpy array into a 1d numpy array by summing the values, without using a for loop?

Is there a numpy function that can combine a 2d numpy array into a 1d numpy array by summing the values? I want to do it without using a for loop.
Example:
[[1 0 0 0 0], [0 1 0 0 0]] => [1 1 0 0 0]
Just use the ndarray method sum with axis=0, which sums down the rows and returns one total per column:
import numpy as np

arr2d = np.array([[1, 3, 8, 2, 0], [0, 1, 0, 5, 1]])
arr1d = arr2d.sum(axis=0)
print(arr1d)  # [1 4 8 7 1]
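To make the axis choice concrete, here is the question's own example. axis=0 collapses the rows and leaves one total per column, which is the requested result; axis=1 would instead collapse the columns and give one total per row.

```python
import numpy as np

# The 2-row example from the question.
a = np.array([[1, 0, 0, 0, 0],
              [0, 1, 0, 0, 0]])

col_sums = a.sum(axis=0)  # collapse axis 0: one total per column
row_sums = a.sum(axis=1)  # collapse axis 1: one total per row
print(col_sums)  # [1 1 0 0 0]
print(row_sums)  # [1 1]
```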

Is there any function to create pairs of values from a column in pandas?

I need to pair consecutive values in a particular column, e.g. turn 3 2 2 4 2 2 into [3,2] [2,2] [2,4] [4,2] [2,2], across the whole data set.
Expected output
[[3, 2], [2, 2], [2, 4], [4, 2], [2, 2]], with each pair in a separate column like Pair 1, Pair 2, Pair 3, ...
import pandas as pd

content = pd.read_csv('temp2.csv')
df = pd.DataFrame(content, columns=['V2', 'V3', 'V4', 'V5', 'V6', 'V7'])
def get_pairs(x):
    arr = x.split(' ')
    return list(map(list, zip(arr, arr[1:])))
df['pairs'] = df.applymap(get_pairs)
df
IIUC, you can use a list comprehension and zip:
import pandas as pd

# Setup
df = pd.DataFrame([3, 2, 2, 4, 2, 2], columns=['col1'])
[[x, y] for x, y in zip(df.loc[:, 'col1'], df.loc[1:, 'col1'])]
Alternatively, using map and the list constructor:
list(map(list, zip(df.loc[:, 'col1'], df.loc[1:, 'col1'])))
[out]
[[3, 2], [2, 2], [2, 4], [4, 2], [2, 2]]
Or, if each cell holds a space-separated string (as in the setup below), you could use applymap with your own function (in pandas 2.1+, applymap is deprecated in favour of DataFrame.map):
# Setup
df = pd.DataFrame(['3 2 2 4 2 2', '1 2 3 4 5 6'], columns=['col1'])
# col1
# 0 3 2 2 4 2 2
# 1 1 2 3 4 5 6
def get_pairs(x):
    arr = x.split(' ')
    return list(map(list, zip(arr, arr[1:])))
df['pairs'] = df.applymap(get_pairs)
[out]
col1 pairs
0 3 2 2 4 2 2 [[3, 2], [2, 2], [2, 4], [4, 2], [2, 2]]
1 1 2 3 4 5 6 [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
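The question also asks for each pair in its own column (Pair 1, Pair 2, ...). A possible sketch, assuming the space-separated string layout from the setup above; the helper name to_int_pairs and the "Pair N" labels are hypothetical. Because split() returns strings, the helper converts the values to int before pairing them.

```python
import pandas as pd

df = pd.DataFrame(['3 2 2 4 2 2', '1 2 3 4 5 6'], columns=['col1'])

def to_int_pairs(s):
    # Hypothetical helper: split() yields strings, so convert to int
    # before pairing each value with its right-hand neighbour.
    vals = [int(v) for v in s.split()]
    return [[a, b] for a, b in zip(vals, vals[1:])]

pairs = df['col1'].apply(to_int_pairs)

# Spread each row's list of pairs over separate columns; rows with fewer
# pairs would be padded with missing values.
wide = pd.DataFrame(pairs.tolist(), index=df.index)
wide.columns = [f'Pair {i + 1}' for i in range(wide.shape[1])]

result = df.join(wide)
print(result)
```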

Is there a way to extract code that constructs a data frame from the data frame?

I am looking for a way to recover, from a loaded data frame, the code that would construct it.
Consider the following process.
# Code to construct a df:
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]},
index=['falcon', 'dog', 'spider', 'fish'])
# Obtain the df output:
df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
I am looking for an automated reverse process. Suppose I start with the df, which I load from a CSV file (example below, same df as above).
df = pd.read_csv('/path_to_data/df.csv', sep='\t')
df
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
At this point, is there a way to extract the code listed below, which would construct the df, assuming that I did not have that code to begin with?
df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]},
index=['falcon', 'dog', 'spider', 'fish'])
This is not always useful, but I am curious whether it can be done, for portability purposes. For instance, it would allow sharing a single Jupyter notebook without referencing anything external, making the data analysis fully self-contained and reproducible.
You can get the column data with df.to_dict('list') and the row labels with df.index:
In [9]: df
Out[9]:
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
In [10]: df.to_dict('list')
Out[10]:
{'num_legs': [2, 4, 8, 0],
'num_wings': [2, 0, 0, 0],
'num_specimen_seen': [10, 2, 1, 8]}
In [11]: df.index
Out[11]: Index(['falcon', 'dog', 'spider', 'fish'], dtype='object')
In [12]: new_df = pd.DataFrame(df.to_dict('list'), index=df.index)
In [13]: new_df
Out[13]:
num_legs num_wings num_specimen_seen
falcon 2 2 10
dog 4 0 2
spider 8 0 1
fish 0 0 8
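Going one step further, the two pieces can be combined into the source code string itself, which could then be pasted into a notebook cell. A minimal sketch: frame_to_code is a hypothetical helper, and it assumes the values are plain Python scalars whose repr() is valid Python. It does not preserve dtypes, index names, or a MultiIndex.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4, 8, 0],
                   'num_wings': [2, 0, 0, 0],
                   'num_specimen_seen': [10, 2, 1, 8]},
                  index=['falcon', 'dog', 'spider', 'fish'])

def frame_to_code(frame, name='df'):
    """Hypothetical helper: return Python source that rebuilds `frame`.

    Relies on repr() of the column lists and index labels, so it only
    round-trips simple values (ints, floats, strings, bools).
    """
    data = frame.to_dict('list')
    index = frame.index.tolist()
    return f"{name} = pd.DataFrame({data!r},\n    index={index!r})"

code = frame_to_code(df)
print(code)
```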

How to normalize data in Python's pandas [duplicate]

This question already has answers here:
Pandas: convert categories to numbers
(6 answers)
Label encoding multiple columns with the same category
(4 answers)
Closed 4 years ago.
I have a problem normalizing data in pandas.
In [37]:
import pandas as pd # data processing
from IPython.display import display
This is the dataset I have ...
In [37]:
d = {'FTR': ['W', 'D', 'L', 'W'], 'HTG': [3, 0, 1, 2], 'ATG': [0, 0, 2, 0], 'HTN': ['Alpha', 'Alpha', 'Alpha', 'Beta'], 'ATN': ['Beta', 'Chi', 'Epsilon', 'Alpha']}
df = pd.DataFrame(data=d)
display(df)
FTR HTG ATG HTN ATN
0 W 3 0 Alpha Beta
1 D 0 0 Alpha Chi
2 L 1 2 Alpha Epsilon
3 W 2 0 Beta Alpha
... and I would like the data to look like this:
d = {'FTR': ['W', 'D', 'L', 'W'], 'HTG': [3, 0, 1, 2], 'ATG': [0, 0, 2, 0], 'HTN': [1, 1, 1, 2], 'ATN': [2, 22, 5,1]}
df = pd.DataFrame(data=d)
display(df)
FTR HTG ATG HTN ATN
0 W 3 0 1 2
1 D 0 0 1 22
2 L 1 2 1 5
3 W 2 0 2 1
Any idea?
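The target values look like 1-based positions in the Greek alphabet (Alpha=1, Beta=2, Epsilon=5, Chi=22), applied with one shared mapping to both HTN and ATN. Assuming that is the intended rule, Series.map with a lookup dict produces the desired frame. If no fixed mapping is intended, the second part sketches a generic alternative in the spirit of the linked duplicates: pd.factorize over both columns at once, so a team gets the same code in either column (the codes then follow sorted order, not the Greek alphabet).

```python
import pandas as pd

d = {'FTR': ['W', 'D', 'L', 'W'], 'HTG': [3, 0, 1, 2], 'ATG': [0, 0, 2, 0],
     'HTN': ['Alpha', 'Alpha', 'Alpha', 'Beta'],
     'ATN': ['Beta', 'Chi', 'Epsilon', 'Alpha']}
df = pd.DataFrame(data=d)

# Assumption: codes are 1-based positions in the Greek alphabet.
greek = ['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta',
         'Iota', 'Kappa', 'Lambda', 'Mu', 'Nu', 'Xi', 'Omicron', 'Pi', 'Rho',
         'Sigma', 'Tau', 'Upsilon', 'Phi', 'Chi', 'Psi', 'Omega']
code = {name: i + 1 for i, name in enumerate(greek)}

encoded = df.copy()
for col in ['HTN', 'ATN']:
    encoded[col] = encoded[col].map(code)  # same mapping for both columns
print(encoded)

# Generic alternative: factorize both columns together so they share codes.
n = len(df)
codes, uniques = pd.factorize(pd.concat([df['HTN'], df['ATN']]), sort=True)
shared = df.copy()
shared['HTN'] = codes[:n] + 1  # first n codes belong to HTN
shared['ATN'] = codes[n:] + 1  # remaining codes belong to ATN
print(shared)
```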
