I have the table below:
Month LoB Score Rank
Jan A 1
Jan B 2
Feb B 1
Feb B 2
Jan A 2
Mar C 1
Feb A 3
Jan A 3
Mar C 2
Mar A 1
Mar C 3
I want to rank the scores by Month and LoB. For example, in Jan for LoB A the highest score should get Rank 1; similarly, in Jan for LoB B the highest score should get Rank 1.
I understand that the INDEX and ROW functions are to be used in conjunction with RANK.EQ, but I am unable to put it all together. I would appreciate any help on this.
Thank you
Assuming row 1 is the header row and the data lies in the range A2:C12, try this:
In D2, enter
=SUMPRODUCT(($A$2:$A$12=A2)*($B$2:$B$12=B2)*($C$2:$C$12>C2))+1
and copy it down.
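The formula counts, for each row, how many rows share its Month and LoB but have a strictly higher Score, then adds 1, so tied scores share a rank as with RANK.EQ. A plain-Python sketch of the same counting logic on the sample data from the question (the name group_rank is only illustrative):

```python
# Sample rows from the question: (Month, LoB, Score).
rows = [
    ("Jan", "A", 1), ("Jan", "B", 2), ("Feb", "B", 1), ("Feb", "B", 2),
    ("Jan", "A", 2), ("Mar", "C", 1), ("Feb", "A", 3), ("Jan", "A", 3),
    ("Mar", "C", 2), ("Mar", "A", 1), ("Mar", "C", 3),
]

def group_rank(rows):
    """Rank = 1 + number of rows with the same Month and LoB and a higher Score."""
    ranks = []
    for month, lob, score in rows:
        higher = sum(1 for m, l, s in rows if m == month and l == lob and s > score)
        ranks.append(higher + 1)
    return ranks

for row, rank in zip(rows, group_rank(rows)):
    print(*row, rank)
```

As in the worksheet, each row is compared against every other row, which is fine for small tables.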
Peace be upon you, and good morning.
When I use the RANK formula =RANK(K5,K5:K34), I get incorrect positions.
Marks obtained   Position   (total marks: 350)
290 29
346 9 (this student obtained 346 marks; why is he in 9th position? He should be in 4th position)
250 30
343 20
345 13
342 21
334 26
346 9
345 13
346 9
346 9
348 5
350 1
349 3
335 24
345 13
335 24
348 5
339 22
295 28
350 1
345 13
348 5
344 18
345 13
338 23
347 2
349 3
297 27
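For comparison, Excel's RANK gives tied values the same rank and skips the positions they occupy, so a mark's rank is 1 plus the number of strictly higher marks. The plain-Python sketch below (function names are illustrative) applies that rule to the 29 marks listed above, alongside a dense ranking in which ties share a rank and no positions are skipped:

```python
# Marks as listed in the question, in the same order.
marks = [290, 346, 250, 343, 345, 342, 334, 346, 345, 346, 346, 348, 350, 349,
         335, 345, 335, 348, 339, 295, 350, 345, 348, 344, 345, 338, 347, 349, 297]

def competition_rank(value, values):
    """RANK / RANK.EQ style, descending: 1 + count of strictly larger values."""
    return 1 + sum(1 for v in values if v > value)

def dense_rank(value, values):
    """Dense ranking: 1 + count of distinct strictly larger values."""
    return 1 + len({v for v in values if v > value})

# Show both rankings for the six highest distinct marks.
for m in sorted(set(marks), reverse=True)[:6]:
    print(m, competition_rank(m, marks), dense_rank(m, marks))
```

Under the RANK rule, 346 comes 9th because eight marks (two 350s, two 349s, three 348s and one 347) are higher than it.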
I am having difficulty creating two columns, "Home Score" and "Away Score", in the Wikipedia table I am trying to parse.
I tried the following script with two try-except-else statements to see if that would work.
import pandas as pd

test_matches = pd.read_html('https://en.wikipedia.org/wiki/List_of_Wales_national_rugby_union_team_results')
test_matches = test_matches[1]
test_matches['Year'] = test_matches['Date'].str[-4:].apply(pd.to_numeric)
test_matches_worst = test_matches[(test_matches['Winner'] != 'Wales') & (test_matches['Year'] >= 2007) & (test_matches['Competition'].str.contains('Nations'))]
try:
    test_matches_worst['Home Score'] = test_matches_worst['Score'].str.split("–").str[0].apply(pd.to_numeric)
except:
    print("let's try again")
else:
    test_matches_worst['Home Score'] = test_matches_worst['Score'].str.split("-").str[0].apply(pd.to_numeric)
try:
    test_matches_worst['Away Score'] = test_matches_worst['Score'].str.split("–").str[1].apply(pd.to_numeric)
except:
    print("let's try again")
else:
    test_matches_worst['Away Score'] = test_matches_worst['Score'].str.split("-").str[1].apply(pd.to_numeric)
test_matches_worst['Margin'] = (test_matches_worst['Home Score'] - test_matches_worst['Away Score']).abs()
test_matches_worst.sort_values('Margin', ascending=False).reset_index(drop = True)#.head(20)
However, I get a KeyError, and when I shorten the code, the "Home Score" column is not displayed in the DataFrame. What is the best way to handle this particular table and generate the columns I want? Any assistance would be greatly appreciated. Thanks in advance.
The problem with the data you collected is the hyphen versus the dash. Except for the last row, every score separator is an 'En Dash' (U+2013), not a 'Hyphen' (U+002D):
sep = r'[-\u2013]'  # either a hyphen (U+002D) or an en dash (U+2013)
# df is test_matches_worst
df[['Home Score','Away Score']] = df['Score'].str.split(sep, expand=True).astype(int)
df['Margin'] = df['Home Score'].sub(df['Away Score']).abs()
Output:
>>> df[['Score', 'Home Score', 'Away Score', 'Margin']]
Score Home Score Away Score Margin
565 9–19 9 19 10
566 21–9 21 9 12
567 32–21 32 21 11
568 23–20 23 20 3
593 21–16 21 16 5
595 15–17 15 17 2
602 30–17 30 17 13
604 20–26 20 26 6
605 27–12 27 12 15
614 19–26 19 26 7
618 28–9 28 9 19
644 22–30 22 30 8
656 26–3 26 3 23
658 29–18 29 18 11
666 16–21 16 21 5
679 16–16 16 16 0
682 25–21 25 21 4
693 16–21 16 21 5
694 29–13 29 13 16
696 20–18 20 18 2
704 12–6 12 6 6
705 37–27 37 27 10
732 24–14 24 14 10
733 23–27 23 27 4
734 33–30 33 30 3
736 10–14 10 14 4
737 32–9 32 9 23
739 13–24 13 24 11
745 32–30 32 30 2
753 29-7 29 7 22
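Note: you will probably receive a SettingWithCopyWarning, because the new columns are assigned to test_matches_worst, which was produced by filtering test_matches.
To avoid it, append .copy() to the line that creates test_matches_worst.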
Bonus
Pandas functions such as to_datetime, to_timedelta and to_numeric accept a Series as their argument, so you can avoid apply:
test_matches['Year'] = pd.to_numeric(test_matches['Date'].str[-4:])
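To see why a character class fixes the split, here is a plain-Python sketch (standard re module only, with an illustrative parse_score helper) that splits a score on either separator and computes the margin for a few of the scores shown in the output:

```python
import re

# Character class matching either separator: hyphen-minus (U+002D) or en dash (U+2013).
SEP = re.compile(r"[-\u2013]")

def parse_score(score):
    """Split a score such as '9–19' or '29-7' into (home, away, margin)."""
    home, away = (int(part) for part in SEP.split(score.strip()))
    return home, away, abs(home - away)

# Two en-dash scores and the hyphenated last row from the output.
for s in ["9\u201319", "16\u201316", "29-7"]:
    print(s, *parse_score(s))
```

A score containing anything other than exactly one separator would raise a ValueError here, which makes malformed rows easy to spot.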
I have two files.
file1:
4
14
18
45
53
60
64
102
106
158
162
file2:
28 1 2
54 1 2
90 1 1
103 1 1
155 1 17
191 1 1
235 1 1
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
529 57 3
542 53 1
560 58 6
562 164 25
568 164 5
I want to extract the lines from file2 whose second column matches a value in file1.
So the expected output will be:
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Many of the solutions I found online use Python or Perl; however, I want to do this with Linux command-line tools. Any ideas?
This should do it:
awk 'FNR==NR{a[$1]++; next} $2 in a' file1 file2
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Explanation:
We pass both files to awk (the order matters here: file1 must come first).
As long as we are reading the first file (FNR==NR), we store each value as a key of the array a (a[$1]++).
When we reach the second file, we check whether the value in its second column ($2) is a key in the array; if it is, we print the line.
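The awk one-liner is a two-pass lookup: build a set of keys from file1, then keep the file2 lines whose second field is in that set. A plain-Python sketch of the same logic follows; the function name is illustrative, and open file objects could be passed in place of the in-memory line lists used here:

```python
def filter_by_second_column(key_lines, data_lines):
    """Return the data lines whose 2nd field equals the 1st field of some key line."""
    # Pass 1: collect the keys (like a[$1]++ in the awk script).
    keys = {line.split()[0] for line in key_lines if line.strip()}
    matched = []
    # Pass 2: keep lines whose second field is a key (like `$2 in a`).
    for line in data_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] in keys:
            matched.append(line.rstrip("\n"))
    return matched

# Contents of file1 (one value per line) and file2 from the question.
file1_lines = "4 14 18 45 53 60 64 102 106 158 162".split()
file2_lines = """28 1 2
54 1 2
90 1 1
103 1 1
155 1 17
191 1 1
235 1 1
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
529 57 3
542 53 1
560 58 6
562 164 25
568 164 5""".splitlines()

matched = filter_by_second_column(file1_lines, file2_lines)
for line in matched:
    print(line)
```

Like the awk version, the comparison is on the field text, and file2 lines keep their original order.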
I've got a dataset containing data values associated with times (amongst other categories), and I'd like to add an accumulated value column, that is, for each ID, the sum of all values up to and including that time. So, taking something like this:
ID YEAR VALUE
0 A 2018 144
1 B 2018 147
2 C 2018 164
3 D 2018 167
4 A 2019 167
5 B 2019 109
6 C 2019 183
7 D 2019 121
8 A 2020 136
9 B 2020 187
10 C 2020 170
11 D 2020 188
and adding a column like this:
ID YEAR VALUE CUMULATIVE_VALUE
0 A 2018 144 144
1 B 2018 147 147
2 C 2018 164 164
3 D 2018 167 167
4 A 2019 167 311
5 B 2019 109 256
6 C 2019 183 347
7 D 2019 121 288
8 A 2020 136 447
9 B 2020 187 443
10 C 2020 170 517
11 D 2020 188 476
For example, in row 7 the CUMULATIVE_VALUE is the sum of the two VALUEs for ID="D" in 2018 and 2019 (but not 2020): 167 + 121 = 288.
I've looked at cumsum(), but I can't see how to use it in this specific case, so the best I've come up with is this:
import numpy as np
import pandas as pd
np.random.seed(0)
ids=["A","B","C","D"]
years=[2018,2019,2020]
df = pd.DataFrame({"ID": np.tile(ids, 3),
                   "YEAR": np.repeat(years, 4),
                   "VALUE": np.random.randint(100,200,12)})
print(df)
df["CUMULATIVE_VALUE"] = None
for id in ids:
    for year in years:
        df.loc[(df.ID==id) & (df.YEAR==year), "CUMULATIVE_VALUE"] = \
            df[(df.ID==id) & (df.YEAR <= year)].VALUE.sum()
print(df)
But I'm sure there must be a better, more efficient way of doing this. Any ideas?
You can use groupby to group by ID and then take the cumsum of VALUE within each group:
df['CUMULATIVE_VALUE'] = df.groupby('ID').VALUE.cumsum()
ID YEAR VALUE CUMULATIVE_VALUE
0 A 2018 144 144
1 B 2018 147 147
2 C 2018 164 164
3 D 2018 167 167
4 A 2019 167 311
5 B 2019 109 256
6 C 2019 183 347
7 D 2019 121 288
8 A 2020 136 447
9 B 2020 187 443
10 C 2020 170 517
11 D 2020 188 476
If the years are not sorted, sort the rows first:
df = df.sort_values(['ID','YEAR']).reset_index(drop=True)
df['CUMULATIVE_VALUE'] = df.groupby('ID')['VALUE'].cumsum()
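For reference, the grouped running total that groupby('ID')['VALUE'].cumsum() produces can be sketched in plain Python without pandas. This assumes each (ID, YEAR) pair appears only once, as in the example, and the name grouped_cumsum is illustrative:

```python
from collections import defaultdict

def grouped_cumsum(rows):
    """Running total of VALUE per ID, accumulated in YEAR order.

    rows: list of (ID, YEAR, VALUE) tuples; results come back in input row order.
    """
    running = defaultdict(int)
    totals = [0] * len(rows)
    # Visit row positions sorted by YEAR so each total only covers earlier years.
    for i in sorted(range(len(rows)), key=lambda k: rows[k][1]):
        id_, _year, value = rows[i]
        running[id_] += value
        totals[i] = running[id_]
    return totals

rows = [("A", 2018, 144), ("B", 2018, 147), ("C", 2018, 164), ("D", 2018, 167),
        ("A", 2019, 167), ("B", 2019, 109), ("C", 2019, 183), ("D", 2019, 121),
        ("A", 2020, 136), ("B", 2020, 187), ("C", 2020, 170), ("D", 2020, 188)]

print(grouped_cumsum(rows))
# [144, 147, 164, 167, 311, 256, 347, 288, 447, 443, 517, 476]
```

Sorting indices by YEAR instead of sorting the rows themselves keeps the output aligned with the original order, so the rows do not need to be pre-sorted.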
I'm trying to create a dynamic rolling 12 month cash flow in Excel.
Let's say the month name is in cell A1.
Underneath cell A1, I have a list of cash flow expense items in the rows, with the expenses listed by month across the columns. I have a separate column at the end that totals up 12 months of expenses based on the month name in cell A1.
So, if cell A1 says Jun-18, I want to add up the expenses for each row item from Jun-18 to May-19. Or, if cell A1 says Sep-18, I want to add up the expenses for each row item from Sep-18 to Aug-19.
I don't know how to do this; can anyone please advise?
Thanks for your help,
M
You can use SUM with OFFSET and MATCH.
Given the example data below (laid out as I understand it from your question), you can use the following formula. In this example the formula is in B1, and A1 contains the start month to sum from.
=SUM(OFFSET(A2,MATCH(A1,A2:A100)-1,4,12,1))
You may want to read up on how OFFSET works, as I don't have time to explain it fully right now. To use this formula in your worksheet, you will need to change the number 4, which is the number of columns to the right of the matched date that the sum is taken from (here column E, which contains the monthly totals).
The 12 is the number of rows (months) to sum, starting from the matched month. Because MATCH is used without a match type, the dates in column A must be in ascending order.
    A        B          C          D          E
1   Jul-18   8638.21
2            Expense1   Expense2   Expense3   Total
3   Jan-18   1          1          1          3
4   Feb-18   2          2          2          6
5   Mar-18   3541       531        51         4123
6   May-18   100000     31         351        100382
7   Jun-18   846        8          321        1175
8   Jul-18   1          153        12         166
9   Aug-18   0          8          21         29
10  Sep-18   0          65         8          73
11  Oct-18   54         321        1          376
12  Nov-18   321        123        1          445
13  Dec-18   1          321        2          324
14  Jan-19   546        0          51         597
15  Feb-19   132        51         15         198
16  Mar-19   12         321        51         384
17  Apr-19   51         123        321        495
18  May-19   5161       3.21       351        5515.21
19  Jun-19   21         3          12         36
20  Jul-19   321        1          1351       1673
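The OFFSET/MATCH formula finds the row of the start month and sums the next 12 monthly totals. A plain-Python sketch of that logic on the example data, with an illustrative rolling_total helper; note that it uses an exact match (list.index) rather than MATCH's default approximate match, so the start month must appear in the list:

```python
# Column A (months) and column E (monthly totals) from the example sheet.
months = ["Jan-18", "Feb-18", "Mar-18", "May-18", "Jun-18", "Jul-18",
          "Aug-18", "Sep-18", "Oct-18", "Nov-18", "Dec-18", "Jan-19",
          "Feb-19", "Mar-19", "Apr-19", "May-19", "Jun-19", "Jul-19"]
totals = [3, 6, 4123, 100382, 1175, 166, 29, 73, 376, 445, 324, 597,
          198, 384, 495, 5515.21, 36, 1673]

def rolling_total(months, totals, start, window=12):
    """Sum `window` consecutive monthly totals, starting at month `start`."""
    first = months.index(start)  # exact match; raises ValueError if start is missing
    # Like OFFSET(..., 12, 1): take at most `window` rows from the matched row.
    return sum(totals[first:first + window])

print(round(rolling_total(months, totals, "Jul-18"), 2))  # 8638.21
```

Near the end of the data, the slice simply returns fewer than 12 rows, much as OFFSET would pick up empty cells below the last month.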
I've been trying to figure out how to SUM the top 2 values of an array using SUMPRODUCT, but I also want to add a criterion so that only values in rows matching a specific string are included. I thought I could combine SUMPRODUCT and SUMIF, but I have been unsuccessful.
Position Age ADP Trend Value
QB 23 241 84.2 21
QB 35 185 -37.5 142
QB 27 300 25 19
QB 26 300 25 19
QB 32 300 25 19
RB 22 98 -2.2 1051
RB 24 69 0.3 1929
RB 24 238 6 25
RB 26 300 25 19
RB 26 300 25 19
WR 22 300 25 19
WR 24 300 25 19
WR 26 232 -17 36
WR 25 300 25 19
WR 28 300 25 19
WR 23 9 -4.2 8591
WR 23 178 21.4 161
WR 23 38 8.5 4679
WR 26 222 102.8 53
WR 23 300 25 19
WR 26 300 25 19
TE 26 117 -18.7 617
TE 36 193 -30.3 119
TE 26 199 -22.5 105
TE 24 300 25 19
What I want is to SUM the top two values under the Value column IF the Position = QB.
How can I accomplish this?
Cheers!
Use this array formula:
=SUM(LARGE(IF(A2:A26="QB",E2:E26,""),1),LARGE(IF(A2:A26="QB",E2:E26,""),2))
Press CTRL+SHIFT+ENTER to evaluate the formula, as it is an array formula.
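The same idea as a plain-Python sketch, assuming the rows are (Position, Value) pairs taken from the table: filter by position (the IF part), then add the two largest matches (the two LARGE calls). The function name sum_top_n_if is illustrative only:

```python
import heapq

# (Position, Value) pairs from the table in the question.
rows = [("QB", 21), ("QB", 142), ("QB", 19), ("QB", 19), ("QB", 19),
        ("RB", 1051), ("RB", 1929), ("RB", 25), ("RB", 19), ("RB", 19),
        ("WR", 19), ("WR", 19), ("WR", 36), ("WR", 19), ("WR", 19),
        ("WR", 8591), ("WR", 161), ("WR", 4679), ("WR", 53), ("WR", 19),
        ("WR", 19), ("TE", 617), ("TE", 119), ("TE", 105), ("TE", 19)]

def sum_top_n_if(rows, wanted, n=2):
    """Sum the n largest values whose position equals `wanted`."""
    # Filter step, like IF(Position="QB", Value, "").
    matching = [value for position, value in rows if position == wanted]
    # Sum of the n largest, like LARGE(..., 1) + LARGE(..., 2).
    return sum(heapq.nlargest(n, matching))

print(sum_top_n_if(rows, "QB"))  # 163
```

Unlike LARGE, which returns #NUM! when k exceeds the number of matching values, heapq.nlargest simply returns fewer items, so a position with a single row would sum just that one value.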