Sum of all the column values in the given dataframe and display output in a new data frame - python-3.x

I have tried the below code:
import pandas as pd
dataframe = pd(C1,columns=['School-A','School-B','School-C','School-D','School-E'])
sum_column = daeframet.sum(axis=0)
print (sum_column)
I am getting the below error
TypeError: 'module' object is not callable

The error is coming from calling the module pd as a function. It's difficult to know which function you should be calling from pandas without knowing what C1 is, but if it is a dictionary or a pandas data frame, try:
import pandas as pd
# common to abbreviate dataframe as df
df = pd.DataFrame(C1, columns=['School-A','School-B','School-C','School-D','School-E'])
sum_column = df.sum(axis=0)
print(sum_column)

Calling sum returns a Series, not a DataFrame. There are many ways to convert the result; let's try using select_dtypes and the to_frame() method:
import numpy as np
import pandas as pd
np.random.seed(5)
df = pd.DataFrame({'class' : ['first','second','third','fourth','fifth'],
'School A' : np.random.randint(1,50,5),
'School B' : np.random.randint(1,50,5),
'School C' : np.random.randint(1,50,5),
'School D' : np.random.randint(1,50,5),
'School E' : np.random.randint(1,50,5)})
print(df)
class School A School B School C School D School E
0 first 36 10 49 16 14
1 second 15 9 31 40 12
2 third 48 37 17 17 2
3 fourth 39 40 8 28 48
4 fifth 17 28 13 45 31
new_df = (df.select_dtypes(include='int').sum(axis=0).to_frame()
.reset_index().rename(columns={0 : 'Total','index' : 'School'}))
print(new_df)
School Total
0 School A 155
1 School B 124
2 School C 118
3 School D 146
4 School E 107
Edit
It seems there are some typos in your code. The corrected version:
import pandas as pd
dataframe = pd.DataFrame(C1,columns=['School-A','School-B','School-C','School-D','School-E'])
sum_column = dataframe.sum(axis=0)
print (sum_column)
This returns the sums as a Series, and it also "sums" the text column by way of string concatenation:
class firstsecondthirdfourthfifth
School A 155
School B 124
School C 118
School D 146
School E 107
dtype: object
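If the concatenated text column is unwanted, one option is to pass numeric_only=True to sum. The following is a minimal sketch, not the original answer's code; it assumes a small illustrative frame built from two of the school columns shown above, and the names totals, School and Total are chosen here for illustration.

```python
import pandas as pd

# Illustrative data: two of the school columns from the example above.
df = pd.DataFrame({
    'class': ['first', 'second', 'third', 'fourth', 'fifth'],
    'School A': [36, 15, 48, 39, 17],
    'School B': [10, 9, 37, 40, 28],
})

# numeric_only=True skips the text column instead of concatenating it.
# rename_axis names the index, reset_index turns the Series into a DataFrame.
totals = (df.sum(numeric_only=True)
            .rename_axis('School')
            .reset_index(name='Total'))
print(totals)
```

The result is a two-column DataFrame (School, Total) with one row per numeric column.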

Related

Generate conditional lists of lists in Pandas, "Pythonically"

I want to generate a conditional list of lists. The number of embedded lists is determined by the number of unique conditions, and each embedded list contains values from a given condition.
I can generate this list of lists using a for-loop; see the code below. However, I am looking for a faster and more Pythonic (i.e., no for-loop) approach.
import pandas as pd
from random import randint
example_conditions = ["A","A","B","B","B","C","D","D","D","D"]
example_values = [randint(-100,100) for _ in example_conditions ]
df = pd.DataFrame({
"conditions":example_conditions,
"values": example_values
})
lol = []
for condition in df["conditions"].unique():
    sublist = df.loc[df["conditions"]==condition]["values"].values.tolist()
    lol.append(sublist)
Thanks!
Try:
x = df.groupby("conditions")["values"].agg(list).to_list()
print(x)
Prints:
[[-1, 78], [33, 74, -79], [59], [-32, -2, 52, -66]]
Input dataframe:
conditions values
0 A -1
1 A 78
2 B 33
3 B 74
4 B -79
5 C 59
6 D -32
7 D -2
8 D 52
9 D -66
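One difference from the original loop is worth noting: groupby sorts the group keys by default, while unique() keeps them in order of first appearance. The sketch below, which uses made-up data whose conditions are deliberately out of alphabetical order, shows how sort=False may be used to keep the loop's ordering.

```python
import pandas as pd

# Made-up data with conditions that are not in alphabetical order.
df = pd.DataFrame({
    "conditions": ["B", "A", "B", "C", "A"],
    "values": [1, 2, 3, 4, 5],
})

# Default behaviour: groups come back sorted by key (A, B, C).
sorted_groups = df.groupby("conditions")["values"].agg(list).to_list()

# sort=False: groups come back in order of first appearance (B, A, C),
# matching a loop over df["conditions"].unique().
ordered_groups = df.groupby("conditions", sort=False)["values"].agg(list).to_list()

print(sorted_groups)
print(ordered_groups)
```

With the question's data the two results coincide, because the conditions already appear in sorted order.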

Apply styles to specific cells in Pandas MultiIndex Dataframe based on value comparison

I have a pandas MultiIndex dataframe that looks something like this:
in [1]
import pandas as pd
import numpy as np
iterables = [['Chemistry', 'Math', 'English'],['Semester_1', 'Semester_2']]
columns = pd.MultiIndex.from_product(iterables)
index = ['Gabby', 'Sam', 'Eric', 'Joe']
df = pd.DataFrame(data=np.random.randint(50, 100, (len(index), len(columns))), index=index, columns=columns)
df
out[1]
Chemistry Math English
Semester_1 Semester_2 Semester_1 Semester_2 Semester_1 Semester_2
Gabby 86 80 63 50 87 75
Sam 57 84 91 84 60 87
Eric 67 64 52 96 84 70
Joe 51 68 74 69 85 86
I am trying to find students whose grades dropped by more than 10 points in the last semester, color the cells containing the bad grade red, and export the whole table to Excel. For example, Gabby's Math grade dropped 13 points in the second semester, so I would like the cell containing "50" to be colored red.
Here is the full output I'm expecting.
I have tried the following:
def color_values(row):
    change = row['Semester_1'] - row['Semester_2']
    color = 'red' if change > 10 else ''
    return 'color: ' + color
for subject in ['English', 'Algebra', 'Geometry']:
    df = df.style.apply(color_values, axis=1, subset=[subject])
However I'm getting the following error:
AttributeError Traceback (most recent call last)
<ipython-input-5-e83756bce6ef> in <module>
1 for subject in ['English', 'Algebra', 'Geometry']:
----> 2 df = df.style.apply(color_values, axis=1, subset=[subject])
AttributeError: 'Styler' object has no attribute 'style'
I cannot figure out a way to do this. Please help.
In the first iteration of your loop, when subject is "English", you rebind df to a Styler object (df.style.apply returns a Styler).
In the second iteration you then call .style on df, which is now a Styler rather than a DataFrame, hence the AttributeError. Keep the Styler in a separate variable instead:
styler = df.style
for subject in ['English', 'Algebra', 'Geometry']:
    styler.apply(color_values, axis=1, subset=[subject])
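Beyond the AttributeError, the styling function itself may need adjusting: with MultiIndex columns, each cell is addressed by a (subject, semester) pair, and a function passed to Styler.apply is expected to return styles with the same shape as its input. The following is a sketch under those assumptions, not the original answer's code. It uses the grades from the question's table, and the helper name highlight_drops and its threshold parameter are hypothetical.

```python
import pandas as pd

iterables = [['Chemistry', 'Math', 'English'], ['Semester_1', 'Semester_2']]
columns = pd.MultiIndex.from_product(iterables)
index = ['Gabby', 'Sam', 'Eric', 'Joe']
# Grades copied from the table in the question.
data = [[86, 80, 63, 50, 87, 75],
        [57, 84, 91, 84, 60, 87],
        [67, 64, 52, 96, 84, 70],
        [51, 68, 74, 69, 85, 86]]
df = pd.DataFrame(data, index=index, columns=columns)

def highlight_drops(frame, threshold=10):
    """Return a frame of CSS strings with the same shape as `frame`.

    Hypothetical helper: marks the Semester_2 cell red when the grade
    fell by more than `threshold` points relative to Semester_1.
    """
    styles = pd.DataFrame('', index=frame.index, columns=frame.columns)
    for subject in frame.columns.get_level_values(0).unique():
        drop = frame[(subject, 'Semester_1')] - frame[(subject, 'Semester_2')]
        styles.loc[drop > threshold, (subject, 'Semester_2')] = 'color: red'
    return styles

styles = highlight_drops(df)
print(styles)
```

To use it for styling, the function can be passed to the Styler for the whole table, for example df.style.apply(highlight_drops, axis=None), and the resulting Styler can then be written out with its to_excel method.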

How to add new Column in Python pandas dataframe by searching keyword value given in list?

I want to add a new column to the dataframe based on keywords identified in an existing column.
This is the current data (dataframe name = df):
Topic Count
0 This is Python 39
1 This is SQL 6
2 This is Paython Pandas 98
3 import tkinter 81
4 Learning Python 94
5 SQL Working 85
6 Pandas and Work 67
7 This is Pandas 30
8 Computer 20
9 Mobile Work 55
10 Smart Mobile 69
My desired output as below
Topic Count Groups
0 This is Python 39 Python_Group
1 This is SQL 6 SQL_Group
2 This is Paython Pandas 98 Python_Group
3 import tkinter 81 Python_Group
4 Learning Python 94 Python_Group
5 SQL Working 85 SQL_Group
6 Pandas and Work 67 Python_Group
7 This is Pandas 30 Python_Group
8 Computer 20 Devices_Group
9 Mobile Work 55 Devices_Group
10 Smart Mobile 69 Devices_Group
How the Groups column value is determined
The groups are assigned based on keywords found in the Topic column:
if a particular keyword is found in Topic, the corresponding group name is assigned to that row.
List of keywords for the Topic column:
Python_Group = ['Python','Pandas','tkinter']
SQL_Group = ['SQL', 'Select']
Devices_Group = ['Computer','Mobile']
I have tried below code for it:
df['Groups'] = [
'Python Group' if "Python" in x
else 'Python Group' if "Pandas" in x
else 'Python Group' if "tkinter" in x
else 'SQL Group' if "SQL" in x
else 'Devices Group' if "Computer" in x
else 'Devices Group' if "Mobile" in x
else '000'
for x in df['Topic']]
print(df)
The code above gives me the desired output, but I want something shorter and faster: the real dataframe has 2MM+ records, and writing 1k+ lines of code to define the grouping is impractical.
Is there any way to make use of the lists of keywords for the Topic column?
OR
Is there any custom function that can help with this iterative process?
Code 2: another attempt, tried after consulting Stack Overflow experts:
d = pd.read_excel('Map.xlsx').to_dict('list')
keyword_groups = {x:k for k, v in d.items() for x in v}
pat = '({})'.format('|'.join(keyword_groups)) #This line is giving an error
df['Groups'] = (df['Topic'].str.extract(pat, expand=False)
.map(keyword_groups)
.fillna('000'))
The Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-131-543675c0b403> in <module>
3
4 keyword_groups = {x:k for k, v in d.items() for x in v}
----> 5 pat = '({})'.format('|'.join(keyword_groups))
6 pat
TypeError: sequence item 5: expected str instance, float found
Thanks for your help.
One way could be to consider maintaining your groups and keywords in a dict:
d = {'Python_Group': ['Python','Pandas','tkinter'],
'SQL_Group': ['SQL', 'Select'],
'Devices_Group': ['Computer','Mobile']}
From here, you could easily reverse this to a "keyword: Group" dict.
keyword_groups = {x:k for k, v in d.items() for x in v}
# {'Python': 'Python_Group',
# 'Pandas': 'Python_Group',
# 'tkinter': 'Python_Group',
# 'SQL': 'SQL_Group',
# 'Select': 'SQL_Group',
# 'Computer': 'Devices_Group',
# 'Mobile': 'Devices_Group'}
Then you can use Series.str.extract to find these keywords using regex and map them to the correct group. Use fillna to catch any non-matching groups.
pat = '({})'.format('|'.join(keyword_groups))
df['Groups'] = (df['Topic'].str.extract(pat, expand=False)
.map(keyword_groups)
.fillna('000'))
[out]
Topic Count Groups
0 This is Python 39 Python_Group
1 This is SQL 6 SQL_Group
2 This is Paython Pandas 98 Python_Group
3 import tkinter 81 Python_Group
4 Learning Python 94 Python_Group
5 SQL Working 85 SQL_Group
6 Pandas and Work 67 Python_Group
7 This is Pandas 30 Python_Group
8 Computer 20 Devices_Group
9 Mobile Work 55 Devices_Group
10 Smart Mobile 69 Devices_Group
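The TypeError in the question ("expected str instance, float found") is consistent with NaN values among the keywords, which can happen when the spreadsheet columns have unequal lengths and the shorter ones are padded. The sketch below assumes that is the cause. It replaces the Excel file with a hypothetical in-memory dict, skips non-string keywords, and escapes each keyword with re.escape in case it contains regex metacharacters.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel('Map.xlsx').to_dict('list'):
# shorter columns are padded with NaN (a float).
d = {'Python_Group': ['Python', 'Pandas', 'tkinter'],
     'SQL_Group': ['SQL', 'Select', np.nan],
     'Devices_Group': ['Computer', 'Mobile', np.nan]}

# Keep only real string keywords so '|'.join never sees a float.
keyword_groups = {kw: group
                  for group, kws in d.items()
                  for kw in kws
                  if isinstance(kw, str)}

# Escape each keyword before building the alternation pattern.
pat = '({})'.format('|'.join(re.escape(kw) for kw in keyword_groups))

df = pd.DataFrame({'Topic': ['This is Python', 'SQL Working',
                             'Smart Mobile', 'Unrelated']})
df['Groups'] = (df['Topic'].str.extract(pat, expand=False)
                           .map(keyword_groups)
                           .fillna('000'))
print(df)
```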
You can do this using np.select, which takes three arguments: a list of conditions, a list of results, and the default value to use when no condition matches. When several conditions match a row, the first one wins.
import numpy as np
Python_Group = ['Python','Pandas','tkinter']
SQL_Group = ['SQL', 'Select']
Devices_Group = ['Computer','Mobile']
conditions = [
df['Topic'].str.contains('|'.join(Python_Group))
,df['Topic'].str.contains('|'.join(SQL_Group))
,df['Topic'].str.contains('|'.join(Devices_Group))
]
results = [
"Python_Group"
,"SQL_Group"
,"Devices_Group"
]
df['Groups'] = np.select(conditions, results, '000')
#output:
Topic Count Groups
0 This is Python 39 Python_Group
1 This is SQL 6 SQL_Group
2 This is Paython Pandas 98 Python_Group
3 import tkinter 81 Python_Group
4 Learning Python 94 Python_Group
5 SQL Working 85 SQL_Group
6 Pandas and Work 67 Python_Group
7 This is Pandas 30 Python_Group
8 Computer 20 Devices_Group
9 Mobile Work 55 Devices_Group
10 Smart Mobile 69 Devices_Group

Get total of Pandas column and row

I have a Pandas data frame, as shown below,
a b c
A 100 60 60
B 90 44 44
A 70 50 50
Now I would like to get the totals per column and per row, skipping c, as shown below:
a b sum
A 170 110 280
B 90 44 134
I do not know how to do this. Please help me, thank you.
My example dataframe is:
df = pd.DataFrame(dict(a=[100, 90,70], b=[60, 44,50],c=[60, 44,50]),index=["A", "B","A"])
(
# select the columns with a list; recent pandas versions reject the tuple form ['a','b']
df.groupby(level=0)[['a','b']].sum()
.assign(sum=lambda x: x.sum(axis=1))
)
Use:
#remove unnecessary column
df = df.drop(columns='c')
#get sum of rows
df['sum'] = df.sum(axis=1)
#get sum per index (df.sum(level=0) was removed in pandas 2.0)
df = df.groupby(level=0).sum()
print (df)
a b sum
A 170 110 280
B 90 44 134
df["sum"] = df[["a","b"]].sum(axis=1) #Row-wise sum of "a" and "b"
df[["a", "b", "sum"]] #show all columns but not "c"
The pandas way is:
#create sum column
df['sum'] = df['a']+df['b']
#remove column c
df = df[['a', 'b', 'sum']]

Remove index from dataframe using Python

I am trying to create a Pandas Dataframe from a string using the following code -
import pandas as pd
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
data = input_string
df = pd.DataFrame([x.split(';') for x in data.split('\n')])
print(df)
I am getting the following result -
0 1 2
0 A B C
1 0 34 88
2 2 45 200
3 3 47 65
4 4 32 140
5 None None
But I need something like the following -
A B C
0 34 88
2 45 200
3 47 65
4 32 140
I added "index = False" while creating the dataframe like -
df = pd.DataFrame([x.split(';') for x in data.split('\n')],index = False)
But, it gives me an error -
TypeError: Index(...) must be called with a collection of some kind, False was passed
How is this achievable?
Use read_csv with StringIO and the index_col parameter to set the first column as the index:
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
from io import StringIO  # pd.compat.StringIO is no longer available in current pandas
df = pd.read_csv(StringIO(input_string), sep=';', index_col=0)
print (df)
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
Your solution can be fixed by calling split with its default parameter (arbitrary whitespace), passing all lists except the first to DataFrame with the first list as the columns parameter, and, if the first column should become the index, adding DataFrame.set_index:
L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index('A')
print (df)
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
For a general solution, pass the first value of the first list to set_index:
L = [x.split(';') for x in input_string.split()]
df = pd.DataFrame(L[1:], columns=L[0]).set_index(L[0][0])
EDIT:
You can make A the name of the columns axis instead of the index name:
df = df.rename_axis(df.index.name, axis=1).rename_axis(None)
print (df)
A B C
0 34 88
2 45 200
3 47 65
4 32 140
import pandas as pd
input_string="""A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""
data = input_string
df = pd.DataFrame([x.split(';') for x in data.split()])
df.columns = df.iloc[0]
df = df.iloc[1:].rename_axis(None, axis=1)
df.set_index('A',inplace = True)
df
output
B C
A
0 34 88
2 45 200
3 47 65
4 32 140
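If the goal is only to display the table without the row labels, rather than to make column A the index, another possibility is to keep the default index and hide it when printing. This is a minimal sketch under that assumption; it uses the standard-library io.StringIO and DataFrame.to_string(index=False), and the variable name text is chosen here for illustration.

```python
from io import StringIO

import pandas as pd

input_string = """A;B;C
0;34;88
2;45;200
3;47;65
4;32;140
"""

# Parse the semicolon-separated text; the first line becomes the header.
df = pd.read_csv(StringIO(input_string), sep=';')

# index=False omits the row labels from the rendered text.
text = df.to_string(index=False)
print(text)
```

The DataFrame still has its default integer index internally; only the printed representation omits it.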

Resources