Is it possible to count the positive values in each column of a dataframe?
I tried to do this with 'count':
import pandas as pd
import numpy as np
np.random.seed(18)
df = pd.DataFrame(np.random.randint(-10,10,size=(5, 4)), columns=list('ABCD'))
print(df)
   A  B  C  D
0  0  9 -5  7
1  4  8 -8 -2
2 -8  7 -5  5
3  0  0  1 -6
4 -6  1 -9 -7
positive_count = df.gt(0).count()
print(positive_count)
A 5
B 5
C 5
D 5
dtype: int64
The "gt" (greater than) comparison doesn't seem to work.
I tried 'value_counts', and it works for column 'A' in this example:
positive_count = df['A'].gt(0).value_counts()[1]
But I would like to get this result for all columns at one time.
Does anyone have an idea to help me?
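A minimal sketch of one way to do this for all columns at once. The frame below is typed in by hand from the printed output above rather than regenerated from the seed, which is an assumption. count() counts non-null entries, and both True and False are non-null, which is why every column reports 5. Summing the boolean mask counts only the True values.

```python
import pandas as pd

# Values copied by hand from the printed frame above (assumption: that print is accurate)
df = pd.DataFrame(
    {
        "A": [0, 4, -8, 0, -6],
        "B": [9, 8, 7, 0, 1],
        "C": [-5, -8, -5, 1, -9],
        "D": [7, -2, 5, -6, -7],
    }
)

mask = df.gt(0)                # boolean frame: True where the value is > 0

non_null_count = mask.count()  # counts True and False alike, so 5 per column
positive_count = mask.sum()    # True counts as 1, False as 0

print(positive_count)
```

Note that gt(0) is strict, so zeros are not counted as positive; df.ge(0).sum() would include them.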
Related
In my dataframe I want to place rows that share the same value in column x side by side (horizontally).
Here is my dataframe:
df=pd.DataFrame({'x':[-2,-4,-6,-7,-9,-2,-4,-6,-7,-9],'dd':[1,2,3,4,5,6,7,8,9,10]})
df_out:
df=pd.DataFrame({'x':[-2,-4,-6,-7,-9],'dd':[1,2,3,4,5],'dd1':[6,7,8,9,10]})
Use GroupBy.cumcount for counter with reshape by Series.unstack:
df = (df.set_index(['x', df.groupby('x').cumcount()])['dd']
.unstack()
.sort_index(ascending=False)
.add_prefix('dd')
.reset_index())
print (df)
x dd0 dd1
0 -2 1 6
1 -4 2 7
2 -6 3 8
3 -7 4 9
4 -9 5 10
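To make the role of the counter more visible, here is a hedged sketch of the same idea written with pivot instead of set_index/unstack. The helper column name n is hypothetical. cumcount numbers the rows within each x group (0 for the first occurrence, 1 for the second), and that number becomes the new column label.

```python
import pandas as pd

df = pd.DataFrame({'x': [-2, -4, -6, -7, -9, -2, -4, -6, -7, -9],
                   'dd': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Position of each row within its x group: 0 for the first time x appears, 1 for the second
counter = df.groupby('x').cumcount()

# 'n' is a hypothetical helper column holding that counter
wide = (df.assign(n=counter)
          .pivot(index='x', columns='n', values='dd')
          .add_prefix('dd'))

print(wide)
```

pivot sorts the index in ascending order, so the rows come out from -9 to -2 unless they are re-sorted as in the answer above.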
I have a dataframe:
df = pd.DataFrame({'A':[1,1,1], 'B':[2012,3014,3343], 'C':[12,13,45], 'D':[111,222,444]})
but I need to join the last 3 columns in consecutive order horizontally and assign the result to the first column, something like this:
df2 = pd.DataFrame({'A':[1,1,1,2,2,2], 'Fusion3':[2012,12,111,3014,13,222]})
I have tried .melt, but I am struggling and would be grateful for any ideas or comments.
From the desired output, I'm assuming that the initial dataframe should have 1,2,3 in the A column rather than 1,1,1.
import pandas as pd
df= pd.DataFrame({'A':[1,2,3], 'B':[2012,3014,3343], 'C':[12,13,45], 'D':[111,222,444]})
df = df.set_index('A')
df = df.stack().droplevel(1)
will give you this series:
A
1 2012
1 12
1 111
2 3014
2 13
2 222
3 3343
3 45
3 444
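If a two-column dataframe is wanted rather than a series, one possible follow-up is to name the series and reset its index. This is only a sketch: the column name Fusion3 is taken from the question, and the 1,2,3 values in A are the same assumption as above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2012, 3014, 3343],
                   'C': [12, 13, 45], 'D': [111, 222, 444]})

out = (df.set_index('A')
         .stack()             # one row per (A, original column) pair, in row order
         .droplevel(1)        # drop the original column names (B, C, D)
         .rename('Fusion3')   # name taken from the question's desired output
         .reset_index())      # turn the A index back into a regular column

print(out)
```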
Check melt
out = df.melt('A').drop(columns='variable')
Out[15]:
A value
0 1 2012
1 2 3014
2 3 3343
3 1 12
4 2 13
5 3 45
6 1 111
7 2 222
8 3 444
I'm trying to shift only the first row one cell to the right, so that the dates start under the column numbered 1.
I'm also trying to remove the trailing '\n' with the code below, but it's not working. Any help, please?
income_df2 = income_df2.replace(r'[\$,)]','', regex=True )\
.replace( r'[(]','-', regex=True)\
.replace( '', 'NaN', regex=True)
Yes, you can shift the first row of a dataframe one column to the right. Use iloc to select that row across all columns, which returns a pd.Series; then use shift to move the values of that series one position, and assign the shifted series back to the first row of the dataframe.
df.iloc[0, :] = df.iloc[0, :].shift()
MCVE:
import pandas as pd
import numpy as np
df = pd.DataFrame([[*'ABCD']+[np.nan],[1,2,3,4,5],[5,6,7,9,10],[11,12,13,14,15]])
df
# Input DataFrame
# 0 1 2 3 4
# 0 A B C D NaN
# 1 1 2 3 4 5.0
# 2 5 6 7 9 10.0
# 3 11 12 13 14 15.0
df.iloc[0, :] = df.iloc[0, :].shift()
df
# Output DataFrame
# 0 1 2 3 4
# 0 NaN A B C D
# 1 1 2 3 4 5
# 2 5 6 7 9 10
# 3 11 12 13 14 15
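The question also asked about the trailing '\n', which none of the replace calls shown there target. A minimal sketch of one way to strip it, using a small made-up frame as a stand-in for income_df2 (the column names and values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for income_df2; the real data is not shown in the question
income_df2 = pd.DataFrame({'2019': ['$1,200\n', '(300)\n'],
                           '2018': ['$950\n', '$40\n']})

# With regex=True, replace does substring matching inside string cells;
# r'\n$' matches a newline at the very end of the cell
cleaned = income_df2.replace(r'\n$', '', regex=True)

print(cleaned)
```

The same call can be chained with the other replace steps from the question.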
I have a dataframe of many columns as given below
df =
index P1 Q1 W1 P2 Q2 W2 P3 Q3 W3
0 1 -1 2 3 0 -4 -4 4 0
1 2 -5 8 9 3 -7 -8 9 6
2 -4 -5 3 4 5 -6 -7 8 8
I want to compute row wise difference between max and min in P columns.
df['P_dif'] = max (P1,P2,P3) - min (P1,P2,P3)
My expected output
df =
index P1 Q1 W1 P2 Q2 W2 P3 Q3 W3 P_dif
0 1 -1 2 3 0 -4 -4 4 0 7 # 3-(-4)
1 2 -5 8 9 3 -7 -8 9 6 17 # 9-(-8)
2 -4 -5 3 4 5 -6 -7 8 8 11 # 4-(-7)
My present code
df['P_dif'] = df[df.columns[::3]].apply(lambda g: g.max()-g.min())
My present output
print(df['P_dif'])
NaN
NaN
NaN
The NaN values most likely come from index alignment rather than from missing data in rows you haven't shown us.
The lambda you're applying operates on columns rather than rows, so its result is indexed by P1, P2 and P3; assigning that to df['P_dif'] aligns on the row labels 0, 1 and 2, none of which match. The column-wise behaviour can be seen in the following transcript:
>>> import pandas
>>> data = [[1,-1,2,3,0,-4,-4,4,0],[2,-5,8,9,3,-7,-8,9,6],[-4,-5,3,4,5,-6,-7,8,8]]
>>> df=pandas.DataFrame(data,columns=['P1','Q1','W1','P2','Q2','W2','P3','Q3','W3'])
>>> df
P1 Q1 W1 P2 Q2 W2 P3 Q3 W3
0 1 -1 2 3 0 -4 -4 4 0
1 2 -5 8 9 3 -7 -8 9 6
2 -4 -5 3 4 5 -6 -7 8 8
>>> df[df.columns[::3]].apply(lambda g: g.max()-g.min())
P1 6 # 2 - -4 -> 6
P2 6 # 9 - 3 -> 6
P3 4 # -4 - -8 -> 4
Note the output specifying the P1, P2 and P3 values and the stuff I've added as comments to the right, to show that it's the maximal difference of the column rather than the row.
You can get the information you need with the following:
>>> import numpy
>>> numpy.ptp(numpy.array(df[['P1', 'P2', 'P3']]), axis=1)
array([7, 17, 11], dtype=int64)
I don't doubt someone more familiar than I am with Pandas and Numpy could improve on that, so feel free to edit this answer if that's the case.
You can use DataFrame.max and DataFrame.min with axis=1 to calculate the max and min values across the chosen columns for each row:
computed_cols = df.loc[:, ['P1', 'P2', 'P3']]
df['P_dif'] = computed_cols.max(axis=1) - computed_cols.min(axis=1)
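As a hedged sketch of why the original apply call produced NaN and how a small change fixes it: without axis=1 the lambda receives whole columns, so the result is labelled P1, P2, P3 and cannot be aligned with the row labels 0, 1, 2. Selecting the P columns with filter and a regex is an assumption about how the real columns are named.

```python
import pandas as pd

data = [[1, -1, 2, 3, 0, -4, -4, 4, 0],
        [2, -5, 8, 9, 3, -7, -8, 9, 6],
        [-4, -5, 3, 4, 5, -6, -7, 8, 8]]
cols = ['P1', 'Q1', 'W1', 'P2', 'Q2', 'W2', 'P3', 'Q3', 'W3']
df = pd.DataFrame(data, columns=cols)

# Assumption: the columns of interest are exactly those named P<number>
p_cols = df.filter(regex=r'^P\d+$')

# Default axis=0: one value per column, indexed by P1, P2, P3
per_column = p_cols.apply(lambda g: g.max() - g.min())

# axis=1: one value per row, indexed like df, so the assignment aligns
per_row = p_cols.apply(lambda g: g.max() - g.min(), axis=1)
df['P_dif'] = per_row

print(df['P_dif'])
```

The vectorised max(axis=1) - min(axis=1) form above avoids calling a Python function per row, so it is usually the better choice on large frames.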
I have an Excel sheet, which I read in like this:
import pandas as pd
import numpy as np
excel_file = 'test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0)
print(df)
it shows:
name value
0 a 10000.000000
1 b 20000.000000
2 c 30000.000000
3 d 40000.000000
4 e 50000.000000
5 f 1.142857
6 g 1.285714
How can I format the numeric output like %.2f? Can I do it directly when calling print(df), for example by passing something like %.2f, or must I first modify the content, write it back to df and then print again?
UPDATE:
I tried the answer below; df['value'].apply("{0:.2f}".format) doesn't work:
import pandas as pd
import numpy as np
excel_file = 'test.xlsx'
df = pd.read_excel(excel_file, sheet_name=0)
print(df['value'])
df['value'].apply("{0:.2f}".format)
print(df['value'])
print(df)
it shows:
0 10000.000000
1 20000.000000
2 30000.000000
3 40000.000000
4 50000.000000
5 1.142857
6 1.285714
Name: value, dtype: float64
0 10000.000000
1 20000.000000
2 30000.000000
3 40000.000000
4 50000.000000
5 1.142857
6 1.285714
Name: value, dtype: float64
name value
0 a 10000.000000
1 b 20000.000000
2 c 30000.000000
3 d 40000.000000
4 e 50000.000000
5 f 1.142857
6 g 1.285714
pd.set_option('display.float_format', lambda x: '%.2f' % x) works:
0 10000.00
1 20000.00
2 30000.00
3 40000.00
4 50000.00
5 1.14
6 1.29
Name: value, dtype: float64
name value
0 a 10000.00
1 b 20000.00
2 c 30000.00
3 d 40000.00
4 e 50000.00
5 f 1.14
6 g 1.29
You can change pandas' display float_format with pd.set_option, like this:
pd.set_option('display.float_format', lambda x: '%.2f' % x)
If you want to change the formatting of just one column, you can use apply with a format function, like this:
df['value'].apply("{0:.2f}".format)
# Output
0 10000.00
1 20000.00
2 30000.00
3 40000.00
4 50000.00
5 1.14
6 1.29
Name: value, dtype: object
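This also explains why the apply call appeared not to work in the question's update: apply returns a new Series and leaves df untouched, so the result has to be printed or assigned. A small sketch, with the values typed in from the question and a hypothetical column name value_str:

```python
import pandas as pd

df = pd.DataFrame({'name': list('abcdefg'),
                   'value': [10000.0, 20000.0, 30000.0, 40000.0, 50000.0,
                             1.142857, 1.285714]})

# apply returns a new Series of strings; df['value'] itself is unchanged
formatted = df['value'].apply('{0:.2f}'.format)

# Assign it (here to a new, hypothetical column) to make it part of the frame
df['value_str'] = formatted

print(df)
```

Keeping the formatted text in a separate column leaves the numeric column available for further arithmetic.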
The default precision is 6. You can override this with pandas.set_option:
pd.set_option('display.precision', 2)
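If the formatting should apply to a single print rather than globally, two options that may fit are DataFrame.to_string with a float_format callable, or a temporary pd.option_context. This is a sketch with the question's values typed in by hand; neither approach modifies the stored numbers.

```python
import pandas as pd

df = pd.DataFrame({'name': list('abcdefg'),
                   'value': [10000.0, 20000.0, 30000.0, 40000.0, 50000.0,
                             1.142857, 1.285714]})

# Option 1: format only this rendering of the frame
text = df.to_string(float_format='{:.2f}'.format)
print(text)

# Option 2: change the display option only inside the with block
with pd.option_context('display.float_format', '{:.2f}'.format):
    print(df)

# Outside the block the default display settings apply again
print(df)
```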