Converting a pandas dataframe to a dictionary of lists - python-3.x

I have the following problem. I have a pandas dataframe that was populated from a pandas.read_sql.
The dataframe looks like this:
PERSON GRADES
20 A 70
21 A 23
22 A 67
23 B 08
24 B 06
25 B 88
26 B 09
27 C 40
28 D 87
29 D 11
And I would need to convert it to a dictionary of lists like this:
{'A':[70,23,67], 'B':[08,06,88,09], 'C':[40], 'D':[87,11]}
I've tried for 2 hours and now I'm out of ideas. I think I'm missing something very simple.

Use groupby with apply(list), then to_dict:
df.groupby('PERSON').GRADES.apply(list).to_dict()
Out[286]: {'A': [70, 23, 67], 'B': [8, 6, 88, 9], 'C': [40], 'D': [87, 11]}
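For reference, here is a minimal self-contained sketch of the same approach. The dataframe is rebuilt by hand from the question's sample (an assumption, since the original came from pandas.read_sql), and the name grades_by_person is only illustrative. Note that integers cannot keep leading zeros, so 08 becomes 8; if the column actually holds strings, the lists would contain '08' and so on instead.

```python
import pandas as pd

# Sample data rebuilt from the question; the real frame came from pandas.read_sql.
df = pd.DataFrame(
    {
        "PERSON": ["A", "A", "A", "B", "B", "B", "B", "C", "D", "D"],
        "GRADES": [70, 23, 67, 8, 6, 88, 9, 40, 87, 11],
    },
    index=range(20, 30),
)

# Group the grades by person, turn each group into a list,
# then convert the resulting Series into a plain dict.
grades_by_person = df.groupby("PERSON")["GRADES"].apply(list).to_dict()
print(grades_by_person)
```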

Related

How to extract and remove a few ranges of indices from a two-dimensional numpy array in Python

I am stuck on a problem that I need to resolve. I have created a two-dimensional matrix from a continuous range of values. Now I want to extract a few ranges of indices from that 2D matrix. Suppose I have a matrix like:
a = [[ 12 4 35 0 26 15 100]
[17 37 29 87 46 95 120]]
Now I want to delete some parts based on their indices, for example indices 2 to 5 and 8:10. After deleting, the returned array should still be two-dimensional. Thank you in advance.
I have tried many approaches, such as numpy stacking and concatenating, but I cannot solve the problem.
Deleting columns of a numpy array is relatively straightforward.
Using a corrected example from the question, it looks like this:
import numpy as np
a = np.array([
[12, 4, 35, 0, 26, 15, 100],
[17, 37, 29, 87, 46, 95, 120]])
print('first array:')
print(a)
# delete columns 2 and 3 (axis=1) from every row
b = np.delete(a, [2,3], 1)
print('second array:')
print(b)
which gives this:
first array:
[[ 12 4 35 0 26 15 100]
[ 17 37 29 87 46 95 120]]
second array:
[[ 12 4 26 15 100]
[ 17 37 46 95 120]]
So columns 2 and 3 (zero-based) have been removed in the above example.
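To remove several index ranges at once, as the question asks, one option is to build the column indices with np.r_ and pass them to np.delete. This is only a sketch: the ranges 2:4 and 5:7 are chosen for illustration because the example array has just seven columns, so the question's 8:10 would be out of bounds.

```python
import numpy as np

a = np.array([
    [12, 4, 35, 0, 26, 15, 100],
    [17, 37, 29, 87, 46, 95, 120],
])

# np.r_ concatenates slice ranges into one index array: [2, 3, 5, 6].
cols_to_drop = np.r_[2:4, 5:7]

# axis=1 removes whole columns, so the result stays two-dimensional.
trimmed = np.delete(a, cols_to_drop, axis=1)
print(trimmed)
```

np.delete returns a new array and leaves a unchanged.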

How to split columns by date using Python

df.head(7)
df
Month,ward1,ward2,...ward30
Apr-19, 20, 30, 45
May-19, 18, 25, 42
Jun-19, 25, 19, 35
Jul-19, 28, 22, 38
Aug-19, 24, 15, 40
Sep-19, 21, 14, 39
Oct-19, 15, 18, 41
to:
Month, ward1
Apr-19, 20
May-19, 18
Jun-19, 25
Jul-19, 28
Aug-19, 24
Sep-19, 21
Oct-19, 15
Month,ward2
Apr-19, 30
May-19, 25
Jun-19, 19
Jul-19, 22
Aug-19, 15
Sep-19, 14
Oct-19, 18
Month, ward30
Apr-19, 45
May-19, 42
Jun-19, 35
Jul-19, 38
Aug-19, 40
Sep-19, 39
Oct-19, 41
How can I group the data date-wise in Python using pandas?
I have a dataframe df that contains a datetime column and 30 other columns. I want to split it so that each of those columns is paired with its date, but I am facing some difficulties.
Try using a dictionary comprehension to hold your separate dataframes.
dfs = {col : df.set_index('Month')[[col]] for col in (df.set_index('Month').columns)}
print(dfs['ward1'])
ward1
Month
Apr-19 20
May-19 18
Jun-19 25
Jul-19 28
Aug-19 24
Sep-19 21
Oct-19 15
print(dfs['ward30'])
ward30
Month
Apr-19 45
May-19 42
Jun-19 35
Jul-19 38
Aug-19 40
Sep-19 39
Oct-19 41
One straightforward way would be to set the date column as the index and separate out every other column:
data.set_index('Month', inplace =True)
data_dict = {col: data[col] for col in data.columns}
You can also create each new DataFrame explicitly:
data1 = pd.DataFrame()
data1['Month'] = df['Month']
data1['ward1'] = df['ward1']
data1.head()
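The approaches above can be combined into one short sketch that keeps Month as a regular column in each piece, matching the layout requested in the question. The sample frame is a shortened, hand-built stand-in for the real data, and ward_frames is an illustrative name.

```python
import pandas as pd

# Shortened stand-in for the question's data (three months, three wards).
df = pd.DataFrame(
    {
        "Month": ["Apr-19", "May-19", "Jun-19"],
        "ward1": [20, 18, 25],
        "ward2": [30, 25, 19],
        "ward30": [45, 42, 35],
    }
)

# One two-column DataFrame (Month plus a single ward) per ward column.
ward_frames = {col: df[["Month", col]] for col in df.columns.drop("Month")}

print(ward_frames["ward2"])
```

Keeping the pieces in a dictionary avoids creating 30 separately named variables.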

Can anyone please explain what ‘def displayRMagicSquare(magic_square)’ is asking for exactly?

I want to achieve the expected output by creating two functions. I have included some guidelines within the functions.
def createRMagicSquare(row1,row2,row3,row4):
    '''
    A function that creates a Ramanujan magic square
    Returns
    ----------
    '''
    return magic_square
def displayRMagicSquare(magic_square):
    '''
    A function that displays a Ramanujan magic square
    Parameters
    ----------
    '''
c = createRMagicSquare([23, 22, 18 ,87], [89, 27, 9 ,25] , [90, 24 ,89 ,16] , [19, 46 ,23 ,11])
displayRMagicSquare(c)
Expected output
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11
You can already produce the expected output inside createRMagicSquare(), so I think displayRMagicSquare() is not strictly needed.
You are passing 4 lists to createRMagicSquare(),
createRMagicSquare([23, 22, 18 ,87], [89, 27, 9 ,25] , [90, 24 ,89 ,16] , [19, 46 ,23 ,11])
and they are grouped into a single list,
magic_square = [row1,row2,row3,row4]
and the line print(" ".join(map(str,i))) converts each list into a string of space-separated elements:
23 22 18 87
89 27 9 25
90 24 89 16
19 46 23 11
If you still want to use two functions to print the expected output, then:
def createRMagicSquare(row1,row2,row3,row4):
    magic_square = [row1,row2,row3,row4]
    return magic_square
def displayRMagicSquare(magic_square):
    for i in magic_square:
        print(" ".join(map(str,i)))
    return magic_square
c = createRMagicSquare([23, 22, 18 ,87], [89, 27, 9 ,25] , [90, 24 ,89 ,16] , [19, 46 ,23 ,11])
displayRMagicSquare(c)

Splitting a time-formatted object doesn't work with Python and pandas

I have these simple lines of code:
print(df['Duration'])
df['Duration'].str.split(':')
print(df['Duration'])
Here are the values I get from each print:
00:58:59
00:27:41
00:27:56
Name: Duration, dtype: object
Why is the split not working here? What am I missing?
str.split doesn't modify the column in place; it returns a new Series, so you need to assign the result to something:
import pandas as pd
df = pd.DataFrame({'Duration':['00:58:59', '00:27:41', '00:27:56'], 'other':[10, 20, 30]})
df['Duration'] = df['Duration'].str.split(':')
print(df)
Prints:
Duration other
0 [00, 58, 59] 10
1 [00, 27, 41] 20
2 [00, 27, 56] 30
If you want to expand the columns of DataFrame by splitting, you can try:
import pandas as pd
df = pd.DataFrame({'Duration':['00:58:59', '00:27:41', '00:27:56'], 'other':[10, 20, 30]})
df[['hours', 'minutes', 'seconds']] = df['Duration'].str.split(':', expand=True)
print(df)
Prints:
Duration other hours minutes seconds
0 00:58:59 10 00 58 59
1 00:27:41 20 00 27 41
2 00:27:56 30 00 27 56
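Note that the pieces produced by str.split are still strings. If numbers are needed, a possible follow-up (a sketch that goes beyond the original answer, with total_seconds as an illustrative column name) is to cast the expanded columns to integers and compute with them:

```python
import pandas as pd

df = pd.DataFrame({"Duration": ["00:58:59", "00:27:41", "00:27:56"]})

# Split into three string columns, then cast them to integers.
parts = df["Duration"].str.split(":", expand=True).astype(int)
parts.columns = ["hours", "minutes", "seconds"]

# Combine the numeric parts into a single duration in seconds.
df["total_seconds"] = parts["hours"] * 3600 + parts["minutes"] * 60 + parts["seconds"]
print(df)
```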

How to sort a pandas dataframe by the standard deviations of its columns?

Given the following example:
import pandas as pd
from statistics import stdev
d = pd.DataFrame({"a": [45, 55], "b": [5, 95], "c": [30, 70]})
stds = [stdev(d[c]) for c in d.columns]
With output:
In [87]: d
Out[87]:
a b c
0 45 5 30
1 55 95 70
In [91]: stds
Out[91]: [7.0710678118654755, 63.63961030678928, 28.284271247461902]
I would like to be able to sort the columns of the dataframe by their standard deviations, resulting in the following:
b c a
0 5 30 45
1 95 70 55
You are looking for:
d.iloc[:,(-d.std()).argsort()]
Out[8]:
b c a
0 5 30 45
1 95 70 55
You can get the column order like this:
column_order = d.std().sort_values(ascending=False).index
# >>> column_order
# Index(['b', 'c', 'a'], dtype='object')
And then sort the columns like this:
d[column_order]
b c a
0 5 30 45
1 95 70 55
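The second approach can be wrapped in a small reusable function. The helper name sort_columns_by_std and its ascending parameter are hypothetical, introduced here only for illustration. Both pandas' std() and statistics.stdev compute the sample standard deviation, so the ordering matches the stds list from the question.

```python
import pandas as pd


def sort_columns_by_std(frame, ascending=False):
    """Return frame with its numeric columns ordered by standard deviation.

    Hypothetical helper; non-numeric columns are dropped from the result.
    """
    order = frame.std(numeric_only=True).sort_values(ascending=ascending).index
    return frame[order]


d = pd.DataFrame({"a": [45, 55], "b": [5, 95], "c": [30, 70]})
sorted_d = sort_columns_by_std(d)
print(sorted_d)
```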
