INDEX / MATCH / VLOOKUP Assistance - excel

The below table is the first 29 rows in an XLSX I'm working on, which basically aims to calculate the costs of exported call data.
The data in the table below is the result of population from a PowerShell script, which combines Rate Data from a CSV (to calculate call charges) with Call Data from the customer's daily call stats in another CSV.
Rate Data:
Column F [Destination] contains every known Country Code.
Column E [Rate] contains a Rate value for each Country Code in column F, which will be used to calculate the call cost at the end.
Call Export Data:
Column C [Callee Number] contains the original phone number that was called (Callee).
Column H [Callee Country Code] takes the first few digits of the number in Column C for the next step.
Required goal that I'm quite frankly stuck on:
Column I is what I'm working on.
I need a formula that takes the dialled country code in Column H and finds the country code in Column F that is MOST SIMILAR to it (it doesn't need to be an exact match). Once found, it should return the value on the same row in Column E [Rate].
Column I should then be populated with the correct Rate for the Number in Column C / H.
Formulas I've tried:
=INDEX($A$2:$K$100000,VLOOKUP(H2,$F$2:$F$100000,5,TRUE))
=INDEX($G$2:$G$100000,MATCH(H2,$F$2:$F$100000,0))
I'm not great with Excel, and I'm sure using 100000 to select the whole column is poor practice.
Thanks for any help :)
Start time | Customer | Callee Number | Country | Rate | Destination | Duration (Minutes) | Callee Country Code | Rate for call | Cost of call
2020-09-01T07:25:30.5190000Z | Name1 | +44*** | AFGHANISTAN | 1.415 | 93 | 0 | 44 | |
2020-09-01T08:05:52.6250000Z | Name2 | +442476****** | AFGHANISTAN | 1.415 | 9320 | 0.383333333 | 442 | |
2020-09-01T08:33:49.6530000Z | Name3 | +441509****** | AFGHANISTAN | 1.415 | 9321 | 0.7 | 441 | |
2020-09-01T08:35:18.5300000Z | Name4 | +441509****** | AFGHANISTAN | 1.415 | 9322 | 0.766666667 | 441 | |
2020-09-01T08:43:45.3300000Z | Name5 | +447976****** | AFGHANISTAN | 1.415 | 9323 | 1.85 | 447 | |
2020-09-01T08:47:29.9630000Z | Name6 | +442476****** | AFGHANISTAN | 1.415 | 9324 | 2.533333333 | 442 | |
2020-09-01T08:57:43.2680000Z | Name7 | +447875****** | AFGHANISTAN | 1.415 | 9325 | 3.633333333 | 447 | |
2020-09-01T09:04:42.8230000Z | Name8 | +441212****** | AFGHANISTAN | 1.415 | 9326 | 4.916666667 | 441 | |
2020-09-01T09:15:32.7220000Z | Name9 | +441923****** | AFGHANISTAN | 1.415 | 9327 | 1.9 | 441 | |
2020-09-01T09:30:36.4750000Z | Name10 | +441923****** | AFGHANISTAN | 1.415 | 9328 | 5.8 | 441 | |
2020-09-01T09:58:12.8380000Z | Name11 | +442476****** | AFGHANISTAN | 1.415 | 9370 | 0.516666667 | 442 | |
2020-09-01T10:03:04.1270000Z | Name12 | +442476****** | AFGHANISTAN | 1.415 | 9375 | 13.51666667 | 442 | |
2020-09-01T10:27:49.6090000Z | Name13 | +442476****** | AFGHANISTAN | 1.415 | 9377 | 2.716666667 | 442 | |
2020-09-01T11:04:21.7850000Z | Name14 | +442476****** | AFGHANISTAN | 1.415 | 9378 | 1.6 | 442 | |
2020-09-01T11:13:31.9810000Z | Name15 | +442070****** | AFGHANISTAN | 1.415 | 9379 | 9.816666667 | 442 | |
2020-09-01T11:46:53.4730000Z | Name16 | +442476****** | ALAND ISLANDS | | 247 | 0.283333333 | 442 | |
2020-09-01T11:47:14.9110000Z | Name17 | +442476****** | ALBANIA | 0.537 | 355 | 0.866666667 | 442 | |
2020-09-01T12:30:38.4380000Z | Name18 | +442476****** | ALBANIA | 0.537 | 3554 | 0.25 | 442 | |
2020-09-01T12:30:59.5190000Z | Name19 | +442476****** | ALBANIA | 0.537 | 35567 | 0.283333333 | 442 | |
2020-09-01T12:31:34.3300000Z | Name20 | +442476****** | ALBANIA | 0.537 | 35568 | 0.283333333 | 442 | |
2020-09-01T12:35:20.8430000Z | Name21 | +442476****** | ALBANIA | 0.537 | 35569 | 0.3 | 442 | |
2020-09-01T12:37:36.5550000Z | Name22 | +442476****** | ALGERIA | 0.537 | 213 | 1.366666667 | 442 | |
2020-09-01T12:42:07.9660000Z | Name23 | +447723****** | ALGERIA | 0.537 | 21321 | 1.466666667 | 447 | |
2020-09-01T13:13:37.7610000Z | Name24 | +441926****** | ALGERIA | 0.537 | 21355 | 3.283333333 | 441 | |
2020-09-01T13:44:57.3190000Z | Name25 | +442476****** | ALGERIA | 0.537 | 21356 | 0.15 | 442 | |
2020-09-01T13:46:39.2640000Z | Name26 | +442476****** | ALGERIA | 0.537 | 21366 | 0.15 | 442 | |
2020-09-01T13:58:14.1340000Z | Name27 | +442476****** | ALGERIA | 0.537 | 21369 | 6.2 | 442 | |
2020-09-01T13:58:30.5560000Z | Name28 | +442476****** | ALGERIA | 0.537 | 21377 | 0.583333333 | 442 | |

I copied the table into a workbook and came up with this formula:
=IF(LEFT($C1;4)=$H1;INDEX($A$1:$I$28;ROW($H1);5);"")
This formula checks whether the first 4 characters of the phone number in column C equal the value in column H. If they do, it takes the row number of the H cell and returns the value from column E (Rate) on that row. If they don't, it returns an empty string.
I don't know if you mistyped the question, but I see no similarities between columns H and F. You wrote:
"I need a formula that effectively looks for the dialled country code in Column H and looks for the country code that's the MOST SIMILAR to it (Doesn't need to be exact) in Column F."
So I wrote the formula for columns C and H instead.
BUT there is one downside to this formula: column H must contain exactly the first 4 characters of the number, including the plus sign, e.g. +442, +447, etc.
Feel free to change the ranges, columns and rows to match your Excel table.
Just a tip: if you want to select entire columns, write A:A, B:B, etc. in the formula, or just click on the column header and Excel inserts the reference automatically. The same works for rows.

Couple of suggestions to improve your Excel:
To work well with Excel, I would suggest using helper columns for the required sub-calculations, since no one can understand lengthy formulas. Better yet, use small lookup tables for reference.
You need two helper columns: one for the country code (extracted from the phone number) and one for the rate.
So I would suggest adding a table with "Country Code", "Country Name" and "Rate". It is going to be a small table, and you can look your data up from there.
Please note the following:
A phone number is a string. It's not a number. Look at the cells in "Callee Number". Are they formatted as Numbers? Text? It's important to make sure they are formatted as TEXT. Once they are, you can start manipulating them properly.
You're right, and as was noted, if you want to search an entire column, just write F:F and don't specify a start and end row if there isn't really one. That's also a good reason to use a small lookup table: Excel needs to search less data to find the required information.
Test your results in order to make sure that the VLOOKUP or INDEX/MATCH (or XLOOKUP, the new formula on the block) are doing what is expected of them :)
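One way to read "most similar" country code is longest-prefix matching: take the dialled number and find the longest Destination code that it starts with. Below is a minimal Python sketch of that idea, which could be used to sanity-check whatever formula you end up with. The function name `longest_prefix_rate`, the unmasked phone numbers and the 44 / 447 rates are hypothetical (the question's sample rows contain no UK entries); the other rates are copied from the sample table.

```python
def longest_prefix_rate(number, rates):
    """Return (prefix, rate) for the longest key in `rates` that `number` starts with."""
    digits = number.lstrip("+")  # phone numbers are text, so treat them as strings
    # Try the longest candidate prefix first, then shorten one digit at a time.
    for length in range(len(digits), 0, -1):
        prefix = digits[:length]
        if prefix in rates:
            return prefix, rates[prefix]
    return None, None  # no destination code matches


# Destination -> Rate, copied from the sample rows in the question.
rates = {"93": 1.415, "9320": 1.415, "355": 0.537, "35567": 0.537, "213": 0.537}
# Hypothetical UK entries, added only so the sample numbers can match something.
rates.update({"44": 0.01, "447": 0.05})

print(longest_prefix_rate("+447976123456", rates))  # matches "447" rather than "44"
print(longest_prefix_rate("+442476000000", rates))  # falls back to "44"
```

Note that an approximate-match VLOOKUP (last argument TRUE) compares values by sort order, which is not the same thing as prefix matching, so results from the two approaches can differ.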

Related

Sum rows with same values and write it in new cell

I have the following table:
OrderNumber  Value
123          2
123          3
333          5
333          6
555          8
555          9
My goal is to sum the values of all rows that share the same OrderNumber (e.g. for OrderNumber 123 the sum should be 5) and output the result in a new column.
The output should be like this:
OrderNumber  Value  Result
123          2      5
123          3      5
333          5      11
333          6      11
555          8      17
555          9      17
I've seen some formulas beginning with =SUM(A2:A6;A2;B2:B6). It is important to me that the search criterion is dynamic, because my table has about 1k rows.
Do you have any references or suggestions?
You need the SUMIF() function.
=SUMIF($A$2:$A$7,A2,$B$2:$B$7)
If you are a Microsoft 365 user, you can try BYROW() to fill the whole column in one go.
=BYROW(A2:A7,LAMBDA(x,SUMIF(A2:A7,x,B2:B7)))
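For anyone who wants to check the expected Result column outside Excel, here is a small Python sketch of what SUMIF does per row: total the values for each OrderNumber, then attach that total to every row with the same OrderNumber. The function name `add_group_totals` is made up for this illustration; the data is the sample from the question.

```python
from collections import defaultdict

# (OrderNumber, Value) pairs from the question's sample table.
rows = [(123, 2), (123, 3), (333, 5), (333, 6), (555, 8), (555, 9)]


def add_group_totals(rows):
    """Return (OrderNumber, Value, Result) where Result is the per-order sum."""
    totals = defaultdict(int)
    for order_number, value in rows:
        totals[order_number] += value  # first pass: sum per OrderNumber
    # Second pass: every row gets the total of its own OrderNumber.
    return [(order_number, value, totals[order_number]) for order_number, value in rows]


result = add_group_totals(rows)
for row in result:
    print(*row)
```

The two passes mirror the formula: the criterion (A2) is taken from the current row, while the sum ranges cover the whole table.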
This is the exact reason why the "Subtotals" feature was invented.

Print list from a dataframe based on another columns value

I would like to slice the dataframe according to a condition. I want to keep the area names where the length of the code is 5 or 3.
The dataframe AreaCode is shown below:
     codes  area
0    113    Leeds
2    115    Nottingham
3    116    Leicester
...  ...    ...
596  1985   Warminster
597  1986   Bungay
598  1987   Ebbsfleet
This is the code I wrote, but it didn't work.
# print([AreaCode['codes']>4])
for i in AreaCode['codes']:
    if len(i)>4:
        print(AreaCode['area'][i])
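Two things likely trip up the loop above: if the codes are stored as integers, `len(i)` raises a TypeError, and `AreaCode['area'][i]` looks up the code value as if it were a row label. A minimal sketch of the intended filter in plain Python, using only the sample rows shown (the list `area_codes` and the function `areas_with_code_length` are names invented for this example):

```python
# (code, area) pairs taken from the visible rows of the sample dataframe.
area_codes = [
    (113, "Leeds"),
    (115, "Nottingham"),
    (116, "Leicester"),
    (1985, "Warminster"),
    (1986, "Bungay"),
    (1987, "Ebbsfleet"),
]


def areas_with_code_length(rows, lengths=(3, 5)):
    """Keep the area names whose code has one of the given digit counts."""
    # str() makes the length check work whether codes are ints or strings.
    return [area for code, area in rows if len(str(code)) in lengths]


print(areas_with_code_length(area_codes))
```

In pandas itself, the same condition can be written as a boolean mask, for example `AreaCode[AreaCode['codes'].astype(str).str.len().isin([3, 5])]['area']`, assuming the column names shown in the question.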

difference between two column of a dataframe

I am new to Python and would like to find the difference between two columns of a dataframe.
What I want is the difference between two columns, reported alongside a third column. For example, I have a dataframe Soccer which lists all the soccer teams with the goals scored for and against each club. I want to find the goal difference along with the team name, i.e. (Goals Diff=goalsFor-goalsAgainst).
Pos Team Seasons Points GamesPlayed GamesWon GamesDrawn \
0 1 Real Madrid 86 5656 2600 1647 552
1 2 Barcelona 86 5435 2500 1581 573
2 3 Atletico Madrid 80 5111 2614 1241 598
GamesLost GoalsFor GoalsAgainst
0 563 5947 3140
1 608 5900 3114
2 775 4534 3309
I tried creating a function and then iterating through each row of a dataframe as below:
for index, row in football.iterrows():
    ##pdb.set_trace()
    goalsFor=row['GoalsFor']
    goalsAgainst=row['GoalsAgainst']
    teamName=row['Team']
    if not total:
        totals=np.array(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))
    else:
        total= total.append(Goal_diff_count_Formal(int(goalsFor), int(goalsAgainst), teamName))
return total
def Goal_diff_count_Formal(gFor, gAgainst, team):
    goalsDifference=gFor-gAgainst
    return [team, goalsDifference]
However, I would like to know if there is a quicker way to get this, something like
dataframe['goalsFor'] - dataframe['goalsAgainst'] #along with the team name in the dataframe
Solution if the values in the Team column are unique: set Team as the index, compute the difference, and select a team by its index label:
df = df.set_index('Team')
s = df['GoalsFor'] - df['GoalsAgainst']
print (s)
Team
Real Madrid 2807
Barcelona 2786
Atletico Madrid 1225
dtype: int64
print (s['Atletico Madrid'])
1225
Solution if the Team column may contain duplicates:
I believe you need to group by Team and aggregate with sum first, then compute the difference:
#change sample data for Team in row 3
print (df)
Pos Team Seasons Points GamesPlayed GamesWon GamesDrawn \
0 1 Real Madrid 86 5656 2600 1647 552
1 2 Barcelona 86 5435 2500 1581 573
2 3 Real Madrid 80 5111 2614 1241 598
GamesLost GoalsFor GoalsAgainst
0 563 5947 3140
1 608 5900 3114
2 775 4534 3309
df = df.groupby('Team')[['GoalsFor','GoalsAgainst']].sum()
df['diff'] = df['GoalsFor'] - df['GoalsAgainst']
print (df)
GoalsFor GoalsAgainst diff
Team
Barcelona 5900 3114 2786
Real Madrid 10481 6449 4032
EDIT:
s = df['GoalsFor'] - df['GoalsAgainst']
print (s)
Team
Barcelona 2786
Real Madrid 4032
dtype: int64
print (s['Barcelona'])
2786

Excel need to sum distinct id's value

I am struggling to find the sum of distinct id's value. Example given below.
Week TID Ano Points
1 111 ANo1 1
1 112 ANo1 1
2 221 ANo2 0.25
2 222 ANo2 0.25
2 223 ANo2 0.25
2 331 ANo3 1
2 332 ANo3 1
2 333 ANo3 1
2 999 Ano9 0.25
2 998 Ano9 0.25
3 421 ANo4 0.25
3 422 ANo4 0.25
3 423 ANo4 0.25
3 531 ANo5 0.5
3 532 ANo5 0.5
3 533 ANo5 0.5
From the above data I need to produce the result below. Could anyone help with an Excel formula, please?
Week Points_Sum
1 1
2 1.50
3 0.75
You say "sum of distinct id's value"? All the IDs are different so I'm assuming you want to sum for each different "Ano" within the week?
=SUM(IF(FREQUENCY(IF(A$2:A$17=F2,MATCH(C$2:C$17,C$2:C$17,0)),ROW(A$2:A$17)-ROW(A$2)+1),D$2:D$17))
confirmed with CTRL+SHIFT+ENTER
where F2 contains a specific week number
Assumes that each "Ano" will always have the same points value
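To double-check the expected Points_Sum values, the same logic can be sketched in Python: within each week, count the points of each distinct "Ano" only once. This is only an illustration under the same assumption as above (each "Ano" always carries the same points value); the function name `weekly_distinct_points` is made up, and the rows are the sample data from the question.

```python
# (Week, TID, Ano, Points) rows from the question's sample data.
rows = [
    (1, 111, "ANo1", 1), (1, 112, "ANo1", 1),
    (2, 221, "ANo2", 0.25), (2, 222, "ANo2", 0.25), (2, 223, "ANo2", 0.25),
    (2, 331, "ANo3", 1), (2, 332, "ANo3", 1), (2, 333, "ANo3", 1),
    (2, 999, "Ano9", 0.25), (2, 998, "Ano9", 0.25),
    (3, 421, "ANo4", 0.25), (3, 422, "ANo4", 0.25), (3, 423, "ANo4", 0.25),
    (3, 531, "ANo5", 0.5), (3, 532, "ANo5", 0.5), (3, 533, "ANo5", 0.5),
]


def weekly_distinct_points(rows):
    """Sum Points per Week, counting each (Week, Ano) pair only once."""
    seen = set()
    totals = {}
    for week, _tid, ano, points in rows:
        if (week, ano) in seen:
            continue  # this Ano was already counted for this week
        seen.add((week, ano))
        totals[week] = totals.get(week, 0) + points
    return totals


totals = weekly_distinct_points(rows)
print(totals)  # {1: 1, 2: 1.5, 3: 0.75}
```

The totals agree with the Points_Sum table in the question (1, 1.50 and 0.75).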
Probably not the most efficient solution... but this array formula works:
= SUMPRODUCT(IF($A$2:$A$15=$F2,$D$2:$D$15),1/MMULT((IF($A$2:$A$15=$F2,$D$2:$D$15)=
TRANSPOSE(IF($A$2:$A$15=$F2,$D$2:$D$15)))+0,(ROW($A$2:$A$15)>0)+0))
Note this is an array formula, so you have to press Ctrl+Shift+Enter after typing this formula instead of just Enter.
This formula is in cell G2 and dragged down.

Sum of the values in the previous cell/s, if current cell/s is an error

I've been trying to work out how to get the sum of the values in the previous cell/s, if the current cell/s is an error. Example below:
ABC ASD BHP WER THY SUM SUM of previous Error
1 789 564 654 546 654 3207 0
2 103 123 213 123 654 1216 0
3 546 N/A 879 654 654 2733 123
4 654 N/A N/A N/A 987 1641 1533
5 665 N/A N/A N/A 987 1652 0
Any help would be appreciated.
Thanks
Assuming that your "N/A"s represent real #N/A errors and that your data is in columns A to F starting in row B, this array formula, entered with Ctrl-Shift-Enter, should do it:
=SUM(IF(ISNA(A2:F2),IF(ISNA(A1:F1),"",A1:F1),""))
That looks like a text value in your example, in which case you could use this formula in G2 copied down
=SUMIF(A2:F2,"N/A",A1:F1)
...but if those are actual #N/A errors, you can use this version
=SUMIFS(A1:F1,A2:F2,"#N/A",A1:F1,"<>#N/A")
[revised due to comments]
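As a cross-check of the "SUM of previous Error" column, here is a rough Python sketch of the rule the formulas implement: for each row, add up the previous row's values in the columns where the current row is an error, skipping positions where the previous row is also an error. `NA` (None) stands in for Excel's #N/A, and the function name `sum_previous_for_errors` is hypothetical; the numbers are the ABC to THY columns from the example.

```python
NA = None  # stand-in for an Excel #N/A error

# Columns ABC, ASD, BHP, WER, THY from the example (SUM column left out).
rows = [
    [789, 564, 654, 546, 654],
    [103, 123, 213, 123, 654],
    [546, NA, 879, 654, 654],
    [654, NA, NA, NA, 987],
    [665, NA, NA, NA, 987],
]


def sum_previous_for_errors(rows):
    """For each row, sum the previous row's values where the current cell is NA."""
    results = []
    for i, current in enumerate(rows):
        # The first row has no predecessor, so treat it as all NA.
        previous = rows[i - 1] if i > 0 else [NA] * len(current)
        total = sum(
            prev
            for cur, prev in zip(current, previous)
            if cur is NA and prev is not NA
        )
        results.append(total)
    return results


print(sum_previous_for_errors(rows))  # [0, 0, 123, 1533, 0]
```

The output matches the last column of the example table.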

Resources