How to filter a data frame from each first non NaN value until next and sum values from corresponding column? - python-3.x

I am struggling with the following data frame:
Activity Duration (mins)
BREAK/REST 120
AVAILABILITY 57
WORK 13
DRIVING 10
WORK 31
DRIVING 100
DRIVING 81
DRIVING 106
BREAK/REST 89
BREAK/REST 4
I am trying to find total duration for similar consecutive activities. Following is the output I am trying to achieve.
Activity Duration (mins)
BREAK/REST 120
AVAILABILITY 57
WORK 13
DRIVING 10
WORK 31
DRIVING 287
BREAK/REST 93
I am doing something like this:
import pandas as pd
df = pd.read_excel('reformed_data.xlsx')
df['Activity'].mask((df['Activity'].shift()==df['Activity']), inplace=True)
I am stuck at this point and don't know how to proceed. Please help! :(

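If I understand correctly, you can use shift and cumsum to build a group key that increments every time the activity changes, then aggregate within each group:
# Use the exact column name from your frame (the question shows 'Duration (mins)').
s = (
    df.groupby(df['Activity'].ne(df['Activity'].shift()).cumsum())
      .agg({'Activity': 'first', 'Duration(mins)': 'sum'})
)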
s
Out[185]:
Activity Duration(mins)
Activity
1 BREAK/REST 120
2 AVAILABILITY 57
3 WORK 13
4 DRIVING 10
5 WORK 31
6 DRIVING 287
7 BREAK/REST 93
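
The key point in the answer above is that only consecutive equal activities share a group. As a rough cross-check that does not need pandas, the same run-grouping idea can be sketched with `itertools.groupby` from the standard library, which also starts a new group whenever the key changes. The `rows` list is just the sample data from the question typed in by hand, and the variable names are illustrative.

```python
from itertools import groupby

# Sample rows copied from the question: (activity, duration in minutes).
rows = [
    ("BREAK/REST", 120), ("AVAILABILITY", 57), ("WORK", 13), ("DRIVING", 10),
    ("WORK", 31), ("DRIVING", 100), ("DRIVING", 81), ("DRIVING", 106),
    ("BREAK/REST", 89), ("BREAK/REST", 4),
]

# groupby starts a new group each time the key changes, so only
# consecutive rows with the same activity are merged together.
totals = [
    (activity, sum(duration for _, duration in run))
    for activity, run in groupby(rows, key=lambda row: row[0])
]

for activity, total in totals:
    print(activity, total)
```

Because the grouping is by run rather than by value, the two separate WORK and DRIVING stretches stay separate, which matches the output requested in the question.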

Related

Python: How to find nth minimum value from a dataframe column?

I have a dataframe like the one below:
Store Row_no
11 56
11 57
11 58
12 89
12 90
12 91
12 92
For each store, I need to get the 3rd minimum value from Row_no. The expected output is below.
Store Row_no
11 58
12 91
I have tried df.Row_no.nsmallest(3), but it works differently. Any help will be appreciated. Thank you!
Use DataFrame.sort_values with GroupBy.nth:
df = df.sort_values(['Store','Row_no']).groupby('Store', as_index=False).nth(2)
print (df)
Store Row_no
2 11 58
5 12 91
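
The sort-then-pick-position logic can also be sketched in plain Python, which may help when checking what the pandas call should return. This is only an illustration on the sample data from the question; `by_store`, `n`, and `third_smallest` are names made up for the sketch.

```python
from collections import defaultdict

# Sample data from the question: (Store, Row_no).
rows = [(11, 56), (11, 57), (11, 58), (12, 89), (12, 90), (12, 91), (12, 92)]

# Collect the Row_no values for each store.
by_store = defaultdict(list)
for store, row_no in rows:
    by_store[store].append(row_no)

# Sort each store's values and take the n-th smallest (index n - 1).
# Stores with fewer than n rows are skipped.
n = 3
third_smallest = {
    store: sorted(values)[n - 1]
    for store, values in by_store.items()
    if len(values) >= n
}

print(third_smallest)  # {11: 58, 12: 91}
```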

Find out minimum value of specific columns in a row in MS Excel

My table in Excel looks something like this:
abcd 67 94 52 89 24
efgh 23 45 93 54 34
ijkl 64 83 23 45 92
mnop 34 45 10 66 53
This is a student database containing marks obtained in various subjects. I need to calculate the percentage in each row such that, out of the 5 subjects, the first subject is always included along with the 3 other subjects with the highest marks.
Example: abcd 67 94 52 89 24 75.5%
Here 75.5% = (67+94+52+89)/4 = 302/4 = 75.5, where 24, being the lowest, has been excluded, and 67 has to be included even if it were the lowest.
What I need is the lowest value in that particular row (excluding the first column, of course), so that I can sum all the marks, subtract this lowest mark, and use the result to calculate the percentage.
Any help/suggestion would be appreciated. Thank You.
You'll need to adjust this for your columns, but the idea is to sum the entire range, subtract the minimum of the columns after the first, and divide by the count of the range minus one. That gives you the average.
This formula uses the 5 values in columns B through F (67 94 52 89 24), drops the minimum of C through F, and results in 75.5:
=(SUM(B3:F3)-MIN(C3:F3))/(COUNT(B3:F3)-1)
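
To check the arithmetic behind the formula, here is a small Python sketch of the same calculation for the first example row. It assumes the marks sit in five columns with the mandatory subject first, as in the question.

```python
# Marks for student "abcd"; the first subject must always be counted.
marks = [67, 94, 52, 89, 24]

# Mirror of (SUM(B3:F3) - MIN(C3:F3)) / (COUNT(B3:F3) - 1):
# the minimum is taken over every subject except the first.
rest = marks[1:]
total = sum(marks) - min(rest)
percentage = total / (len(marks) - 1)

print(total, percentage)  # 302 75.5
```

Taking the minimum over `rest` rather than over all marks is what keeps the first subject in the total even when it is the lowest mark in the row.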

SUMIFS/SUMPRODUCT for 2D data with multiple possible values in 2nd direction

I have been struggling with the following:
I have a data sheet from which I want to sum the amounts per week for a group of projects, where the group of projects is user input. Schematically, this "data" sheet looks like this:
A B C D E F G
1 YEAR 2017 2017 2017 2017 2017 2017
2 WEEK 40 41 42 43 44 45
3 ProjectA 100 101 102 104 100 85
4 ProjectB 80 80 85 82 80 82
5 ProjectC 60 60 60 60 60 60
6 ProjectD 105 108 112 116 120 122
Next, which projects to sum is user input, so in another sheet ("projects"), the user would enter:
A
1 ProjectA
2 ProjectC
3
4
5
Then in the third sheet, I would have to show the summed data per week:
A B C D E F
1 2017 2017 2017 2017 2017 2017
2 40 41 42 43 44 45
3
Now the big question is, what formula could I use in row 3 of this last sheet?
What I have tried so far is: (in A3)
{=SUM(IF(data!B1:G1=A1;IF(data!B2:G2=A2;IF(data!A3:A6=projects!A1:A5;data!B3:G6))))}
This gives me a #N/A error. If I replace projects!A1:A5 by projects!A1, everything works fine, but then that's not much of a summation anymore :)
I have tried other versions with SUMIFS and SUMPRODUCT but those get me nowhere closer to where I'd like to be.
So, any help would be greatly appreciated.
(One last note, I am not able/allowed to change or add anything in the "data" sheet)
Use SUMPRODUCT:
=SUMPRODUCT((Data!$B$2:$G$2=A2)*(Data!$B$1:$G$1=A1)*(ISNUMBER(MATCH(Data!$A$3:$A$6,projects!$A:$A,0))),Data!$B$3:$G$6)
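
The formula multiplies three true/false tests (week matches, year matches, project is in the user's list) by the amounts and sums the result. A plain-Python sketch of that logic on the sample data might look like the following. The function name `weekly_total` and the data layout are assumptions made for illustration, and blank input cells are represented as empty strings.

```python
# Header rows and amounts from the "data" sheet in the question.
years = [2017, 2017, 2017, 2017, 2017, 2017]
weeks = [40, 41, 42, 43, 44, 45]
amounts = {
    "ProjectA": [100, 101, 102, 104, 100, 85],
    "ProjectB": [80, 80, 85, 82, 80, 82],
    "ProjectC": [60, 60, 60, 60, 60, 60],
    "ProjectD": [105, 108, 112, 116, 120, 122],
}

# User input from the "projects" sheet; blank cells are empty strings.
selected = ["ProjectA", "ProjectC", "", "", ""]


def weekly_total(year, week):
    """Sum amounts where year, week and project all match, SUMPRODUCT-style."""
    total = 0
    for project, row in amounts.items():
        for y, w, amount in zip(years, weeks, row):
            # Each comparison is True/False, which multiplies as 1/0.
            total += (y == year) * (w == week) * (project in selected) * amount
    return total


totals = [weekly_total(2017, week) for week in weeks]
print(totals)  # [160, 161, 162, 164, 160, 145]
```

The membership test `project in selected` plays the role of `ISNUMBER(MATCH(...))`: it turns "is this project in the list" into a 1 or 0 instead of an element-by-element comparison of two ranges of different sizes.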

Excel Rank Multiple Columns

I'm facing an issue with ranking in Excel, particularly with regard to tie breaking. I tried several options, but I guess they don't fit my issue. It's quite simple really; I'll explain:
The Data:
1 2 3 4 5 6 7 8 9 10
87 83 74 95 69 90 73 0 74 85
121 121 96 121 121 121 121 83 121 121
As you can see, it's easy for me to rank the first line (I'm working in columns instead of rows for the data). When I use the RANK function, it gives the following result:
3 5 6 1 9 2 8 10 6 4
Which is correct.
The problem arises in the second line. There are ties because all of them reach the maximum of 121:
1 1 9 1 1 1 1 10 1 1
What I would like to do is use the first row as a tie breaker. So even if there is a tie, the first line (which was originally text but is now a sequence from 1 to 10) could serve as a secondary criterion to order the rank, giving the following ranking line:
1 2 9 3 4 5 6 10 7 8
Could one achieve this result?
Thank You very much in advance.
You need a helper row to break the tie. You can add a fraction of the first row to the second row to create a new row, and then use the new row to rank:
A4 = A3+(A2/(MAX($A$2:$J$2)+1))
Using the MAX I ensure the fraction is less than 1 which is adequate to break ties in this case.
A6 = RANK(A4,$A$4:$J$4)
You can hide the helper row if you don't want to show it.
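
As a rough check of the helper-row idea, the sketch below applies the same arithmetic in Python. It assumes that row 2 in the formulas is the first score row (used as the tie breaker) and row 3 is the row with the ties, which is one reading of the cell references in the answer. Under that reading a higher tie-breaker value ranks first, so the result differs from a tie break on the 1 to 10 sequence.

```python
# Assumed layout: row 2 holds the tie-breaker scores, row 3 the tied scores.
tie_breaker = [87, 83, 74, 95, 69, 90, 73, 0, 74, 85]
primary = [121, 121, 96, 121, 121, 121, 121, 83, 121, 121]

# Dividing by max + 1 keeps the added fraction strictly below 1,
# so it can reorder ties but never overtake a higher primary score.
scale = max(tie_breaker) + 1
helper = [p + t / scale for p, t in zip(primary, tie_breaker)]

# Descending rank, like RANK(A4, $A$4:$J$4): one plus the number
# of helper values that are strictly greater.
ranks = [1 + sum(other > value for other in helper) for value in helper]

print(ranks)  # [3, 5, 9, 1, 8, 2, 7, 10, 6, 4]
```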

How to format a number to appear as percentage in Excel

So let's say I have a few numbers in a sheet:
a b c d
1 33 53 23 11
2 42 4 83 64
3 75 3 48 38
4 44 0 22 45
5 2 34 76 6
6
7 Total 85
I would like to display those numbers so that the cell value still holds the original figure (A1 = 33)
but the cell displays both the number and a percentage from the total (B7) eg
a b c d
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
I know how to format a cell as a percentage, but I can't figure out how to display both original values, the calculated percentage value (value/total*100), but not change the cell value so I could still sum the cells in the end (eg. A6 =SUM(A1:A5) = 196)
Does anyone have an idea? I was hoping there could be a way to duplicate and calculate the figure using text formatting, but I can't get anything to work.
I'm guessing this is a trivial answer and maybe not what you're looking for, but why not just add a column for each of the columns you have now?
a a' b b' c c' d d'
1 33 (39%) 53 (62%) 23 (27%) 11 (13%)
2 42 (49%) 4 (5%) 83 (98%) 64 (75%)
3 75 (88%) 3 (4%) 48 (56%) 38 (45%)
4 44 (52%) 0 (0%) 22 (26%) 45 (53%)
5 2 (2%) 34 (40%) 76 (89%) 6 (7%)
6
7 Total 85
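
The helper-column idea keeps the numeric cells untouched and builds the display text separately. A small Python sketch of that separation, using the sample grid and the total of 85 from the question, could look like this; `grid`, `labels`, and `column_a_sum` are illustrative names.

```python
# Numeric values from the sheet (rows 1-5, columns a-d) and the total in B7.
total = 85
grid = [
    [33, 53, 23, 11],
    [42, 4, 83, 64],
    [75, 3, 48, 38],
    [44, 0, 22, 45],
    [2, 34, 76, 6],
]

# Display text lives in a separate structure, so the numbers stay summable.
labels = [[f"{value} ({value / total:.0%})" for value in row] for row in grid]

for row in labels:
    print("  ".join(row))

# The original numbers are unchanged and can still be summed.
column_a_sum = sum(row[0] for row in grid)
print(column_a_sum)  # 196
```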
@Ari's answer seems to meet the requirements in your question: it does not repeat information beyond the example you gave for the output, and it should be viable for up to around 8000 or so columns (unless you are on a very old version of Excel). Jerry's comment is also correct that what you want to achieve, in the way you want to achieve it, is not possible.
However, there are other approaches that might be acceptable substitutes. One is to copy your data and use Paste Special with Operation Divide, either elsewhere or over the top of your data. Pasting over the top shows either the values or the percentages at any one time; pasting elsewhere duplicates your data. Pasting over the top would also require something like Operation Multiply to revert to the values, plus reformatting each time, for the cells to appear as in your example.
Another is to use a PivotTable with some calculated fields and both are shown below:
I appreciate neither is exactly what you are asking for.
