How can I get my aggregated exposure by identifiers across a hierarchy? - activepivot

Let's say I have the following data:
Trade data:
TradeId,CptyID,Exposure
T1 , C3, 100
T2 , C2, 50
T3 , C6, 200
Business hierarchy data:
CptyID,L1-Acronym,L2-Acronym,L3-Acronym
C3, H1, H2, H3
C2, H4, H5, H2
C6, H4, H5, H6
ID mapping:
Acronym,CptyID,Identifier
H1 , C1, B1
H2 , C2, B2
H3 , C3, B3
H4 , C4, B4
H5 , C5, B5
H6 , C6, B6
That is, the hierarchies look like this:
level Acronym(Identifier)
L1 H1(B1) H4(B4)
L2 H2(B2) H5(B5)
L3 H3(B3) H2(B2) H6(B6)
Trade T1 T2 T3
I would like to get the exposure by identifier (B1, B2, B3, B4, B5, B6), where Exp(B1) = Exp(T1), Exp(B2) = Exp(T1) + Exp(T2), and so on.
Joining them together doesn't work. It would give me three facts:
TradeID, CptyID, Exposure, L1-Acronym, L2-Acronym, L3-Acronym, Identifier
T1 , C3 , 100, H1, H2, H3, B3
T2 , C2 , 50, H4, H5, H2, B2
T3 , C6 , 200, H4, H5, H6, B6
and this gives the wrong results, because I only get exposures for the identifiers at Level 3:
Identifier,ResultInLive,ExpectedResult
B1 , Null, 100 (Null because I have no facts associated directly with B1)
B2 , 50, 150
B3 , 100, 100
B4 , Null, 250
B5 , Null, 250
B6 , 200, 200
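The plain-Python sketch below makes the intended semantics concrete. It is not ActivePivot code. It only reproduces the two result columns above from the sample data. It assumes each trade contributes its full exposure once to every distinct identifier on its counterparty's hierarchy path, which is how the expected B2 = 150 arises. In relational terms, this expansion of each trade to all of its ancestors' identifiers is a many-to-many bridge, not a one-to-one join on CptyID.

```python
from collections import defaultdict

# Sample data from the question.
trades = [("T1", "C3", 100), ("T2", "C2", 50), ("T3", "C6", 200)]
# CptyID -> (L1, L2, L3) acronyms.
hierarchy = {
    "C3": ("H1", "H2", "H3"),
    "C2": ("H4", "H5", "H2"),
    "C6": ("H4", "H5", "H6"),
}
# ID mapping: acronym -> (CptyID, Identifier).
id_mapping = {
    "H1": ("C1", "B1"), "H2": ("C2", "B2"), "H3": ("C3", "B3"),
    "H4": ("C4", "B4"), "H5": ("C5", "B5"), "H6": ("C6", "B6"),
}


def join_result():
    """What a plain join on CptyID yields: exactly one identifier per trade."""
    cpty_to_identifier = {cpty: ident for cpty, ident in id_mapping.values()}
    totals = defaultdict(int)
    for _trade_id, cpty, exposure in trades:
        totals[cpty_to_identifier[cpty]] += exposure
    return dict(totals)


def expected_result():
    """Each trade counts once for every distinct identifier on its path."""
    totals = defaultdict(int)
    for _trade_id, cpty, exposure in trades:
        # A set, so an identifier repeated on one path is counted once (assumption).
        identifiers = {id_mapping[acronym][1] for acronym in hierarchy[cpty]}
        for ident in identifiers:
            totals[ident] += exposure
    return dict(totals)


print(join_result())
print(expected_result())
```

The first function gives the "ResultInLive" column (B2, B3 and B6 only), and the second gives the "ExpectedResult" column.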
Another difficulty is that these dimensions can have a lot of members (more than 300K).
Kind regards,
Christophe

Thanks for your answer!
Each level of my business hierarchy data is an "entity" that has an identifier.
For instance, let's only consider trade T1, which has an exposure of 100. I have a hierarchy of three levels:
the first level is H1, which has an identifier = B1
the second level is H2, which has an identifier = B2
the third and lowest level is H3, which has an identifier = B3
What we are trying to achieve is an identifier dimension with members B1, B2, B3... carrying the right exposure.
Hence, in this case :
B3 would have an exposure of 100 coming from T1 => Exposure(B3) = Exposure(T1)
B2, which is B3's parent, would also have an exposure of 100 coming from T1 => Exposure(B2) = Exposure(T1)
B1, which is B2's parent, would also have an exposure of 100 coming from T1 => Exposure(B1) = Exposure(T1)
Joining on CptyID doesn't give us the expected result, because the underlying fact would be:
TradeID, CptyID, Exposure, L1-Acronym, L2-Acronym, L3-Acronym, Identifier
T1 , C3 , 100, H1, H2, H3, B3
Therefore, in ActivePivot Live, we would see :
Identifier,ResultInAPLive,ExpectedResult
B1 , Null, 100 (Null because there are no facts associated directly with B1)
B2 , Null, 100 (Null because there are no facts associated directly with B2)
B3 , 100, 100 (given by the trade fact)
In the first post, I also wanted to illustrate that the same identifier can appear in two different hierarchies.
For instance :
L1 H1(B1) H4(B4)
L2 H2(B2) H5(B5)
L3 H3(B3) H2(B2) H6(B6)
Trade T1 T2 T3
We can see that B2 is present at L2 of the first hierarchy and at L3 of the second one.
Therefore, we would expect Exposure(B2) = Exposure(T1) + Exposure(T2) = 150.
Kind regards

Related

Multiply a number by previous cell and different column

I would like to know whether it is possible to multiply column A by column B and then keep multiplying the previous result by column B, quickly rather than doing it manually.
For example:
A1 = 200
B2 = 0.8
C1 = A1 * B2 = 160 OR C1 = 200 * 0.8
C2 = C1 * B2 = 128
C3 = C2 * B2 = 102.4
C4 = C3 * B2 = 81.92
Etc...
You can use:
=SCAN(200,SEQUENCE(4),LAMBDA(a,b,a*0.8))
Note that you can use references for both 200 and 0.8.
You can use exponents to build the formula you describe from the row number. If you put this in the first result cell (C1 in your example) and drag it down, ROW() supplies the exponent, so each row multiplies A1 by B2 one more time:
=$A$1*($B$2)^row()
Example in Google Sheets (the same formula works in Excel).
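Both answers produce the same sequence, start × factor^n. The Python sketch below checks this with the example values (200 and 0.8). The function names are illustrative only. Floating-point results are compared with a tolerance, because repeated multiplication and exponentiation can differ in the last digit.

```python
import math


def running_product(start, factor, n):
    """What the SCAN formula does: each result is the previous result times factor."""
    results = []
    current = start
    for _ in range(n):
        current *= factor
        results.append(current)
    return results


def closed_form(start, factor, n):
    """The =$A$1*($B$2)^ROW() approach, with row numbers 1..n."""
    return [start * factor ** row for row in range(1, n + 1)]


iterative = running_product(200, 0.8, 4)
direct = closed_form(200, 0.8, 4)
matches = all(math.isclose(a, b) for a, b in zip(iterative, direct))
print(iterative, matches)
```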

pyspark - Read files with custom delimiter to RDD?

I am new to PySpark, and I'm trying to read a file and merge RDD rows into one row.
Assuming that I have the following text file:
A1 B1 C1
A2 B2 C2 D3
A3 X1 YY1
DELIMITER_ROW
Z1 B1 C1 Z4
X2 V2 XC2 D3
DELIMITER_ROW
T1 R1
M2 MB2 NC2
S3 BB1
AQ3 Q1 P1
Now I want to combine all the rows that appear in each section (between DELIMITER_ROW markers) into one row, and return a list of these merged rows.
I want to create this kind of list:
[[A1 B1 C1 A2 B2 C2 D3 A3 X1 YY1]
[Z1 B1 C1 Z4 X2 V2 XC2 D3]
[T1 R1 M2 MB2 NC2 S3 BB1 AQ3 Q1 P1]]
How can it be done in PySpark using an RDD?
For now I know how to read the file and filter out the delimiter rows:
DELIMITER_ROW = "DELIMITER_ROW"  # the literal marker line used in the file
sc.textFile(pathToFile).filter(lambda line: DELIMITER_ROW not in line).collect()
but I don't know how to reduce/merge/combine/group the rows in each section into one row.
Thanks.
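Rather than reading line by line and splitting afterwards, you can set the Hadoop record delimiter (textinputformat.record.delimiter) so that each section between DELIMITER_ROW markers is read as a single record, and then split each record. In PySpark this is done through the underlying Java context, before the file is read:
# spark.sparkContext.hadoopConfiguration is the Scala API; PySpark goes through _jsc
spark.sparkContext._jsc.hadoopConfiguration().set("textinputformat.record.delimiter", "DELIMITER_ROW")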
Hope this helps!
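Once each section arrives as one record, the remaining step is to collapse its lines into a single row. The sketch below shows that step in plain Python on the sample file, so it can be checked without a cluster. `merge_record` and `merge_sections` are hypothetical names. In PySpark, the per-record function would then be applied with something like `sc.textFile(pathToFile).map(merge_record).filter(bool).collect()` after the delimiter has been set. This assumes `textFile` picks up the configured record delimiter, as the answer above relies on.

```python
from typing import List

SAMPLE = """A1 B1 C1
A2 B2 C2 D3
A3 X1 YY1
DELIMITER_ROW
Z1 B1 C1 Z4
X2 V2 XC2 D3
DELIMITER_ROW
T1 R1
M2 MB2 NC2
S3 BB1
AQ3 Q1 P1
"""


def merge_record(record: str) -> str:
    """Collapse one section (newlines and spaces) into a single space-separated row."""
    return " ".join(record.split())


def merge_sections(text: str, delimiter: str = "DELIMITER_ROW") -> List[str]:
    # Splitting on the delimiter mimics the effect of textinputformat.record.delimiter;
    # sections that contain only whitespace are dropped.
    merged = (merge_record(section) for section in text.split(delimiter))
    return [row for row in merged if row]


for row in merge_sections(SAMPLE):
    print(row)
```

If you want each section as a list of tokens rather than one string, use `record.split()` on its own instead of joining.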

Add A1 to C1 if B1 = Specific Number

I will try to be as clear and concise as possible. I am working on a spreadsheet in which I have item prices listed in the range A1:A40. B1:B40 holds a number (1, 2, 3, etc.) that corresponds to a purchase category (groceries, gas, etc.). Now I want one cell, such as C1, to add up all the values in the A range whose corresponding B value equals a specific number.
For example:
A1 = $5.00 | B1 = 1 | C1 = The sum of A1:A3 where the corresponding B value is equal to 1 (in this case B1 and B3, so C1 = A1 + A3)
A2 = $2.50 | B2 = 2 | C2 = The sum of A1:A3 where the corresponding B value is equal to 2 (in this case only B2, so C2 = A2)
A3 = $4.00 | B3 = 1 | C3 =
Use the SUMIF function:
SUMIF(range, criteria, [sum_range])
In cell C1, enter the formula =SUMIF(B:B,1,A:A) (and =SUMIF(B:B,2,A:A) in C2 for category 2).
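SUMIF pairs each cell in the criteria range with the cell in the same position of the sum range. The Python sketch below mimics that pairing for the three example rows. It only handles an exact-match criterion, whereas Excel's SUMIF also accepts criteria such as ">5" or wildcards. The `sumif` helper is an illustrative name, not a library function.

```python
def sumif(criteria_range, criterion, sum_range):
    """Sum the entries of sum_range whose paired criteria_range entry equals criterion."""
    return sum(value for key, value in zip(criteria_range, sum_range) if key == criterion)


prices = [5.00, 2.50, 4.00]   # A1:A3
categories = [1, 2, 1]        # B1:B3

print(sumif(categories, 1, prices))  # 9.0 (C1)
print(sumif(categories, 2, prices))  # 2.5 (C2)
```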

Logical calculation in Excel

I need advice/help. I am working on a calculation in Excel with data like the following:
   |  A   |  B   |  C   |  D   |  E   |   F   |  G   |      H
 1 | A275 | A277 | A273 | A777 | A777 | TOTAL | A222 | GRAND TOTAL
 2 |  5   |  7   |  4   |  3   |  4   |   7   |  7   |
Now I want to sum row 2 based on the header.
Here is the condition.
If A1 <> B1 then take A1, if B1 <> C1 then take B1, if C1 <> D1 then C1, so on.
But the tricky part is...
If D1<>E1 then D1 else (if E1<>F1 then E1 else (if F1 = "TOTAL" then F1 else(if F1<>G1 then F1)))
In short, H2 should be 30, not 37.
Added comments:------------------------------------
So basically, if A1<>B1 then take A1, but if A1=B1 then take B1. The same rule then applies to B1: if B1<>C1 then take B1, but if B1=C1 then take C1, and likewise for C1. The stopping point is "TOTAL". Along with this logic, I need to check whether any cell in row 1 is "TOTAL" and, if so, take the value in that column. "TOTAL" can be in any cell of row 1.
So from above table my calculation will be 5(A2) + 7(B2) + 4(C2) + 7(F2) + 7(G2) = 30
In this calculation I have not included D2 and E2. D1=E1, so D is skipped and E is considered instead. E1<>F1, so I would normally take E2, but because F1 is "TOTAL" I took F2 instead of D2 and E2.
I hope this makes sense. (Sorry, I know it's confusing.)
I have data in more than 100 columns.
Can this be achieved using Macro?
------------------------------------------------------------
Another pain point is that the data and headers are dynamic, so I can't rely on a fixed layout. The logic has to handle dynamic data and headers.
Any help or suggestion will be greatly appreciated.
I achieved the results you want with this.
Add a helper row. In cell A3 write this formula and drag it to the right:
=IF(OR(A$1=B$1,B$1="TOTAL"),0,1)
Calculate the sum in, say, cell H4 (not H2, because a formula that refers to the entire row 2 would create a circular reference):
=SUMIF($3:$3,1,$2:$2)
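The helper row works column by column and only ever looks one header to the right, so it does not depend on how many columns there are or where "TOTAL" sits. The Python sketch below mirrors the two formulas on the example table, as a way to check the logic. The function names are illustrative. The cell to the right of the last header is treated as blank, and `None` stands for the empty H2.

```python
def helper_flags(headers):
    """Mirror =IF(OR(A$1=B$1,B$1="TOTAL"),0,1) for every column."""
    flags = []
    for i, header in enumerate(headers):
        # The cell to the right of the last header is blank.
        next_header = headers[i + 1] if i + 1 < len(headers) else ""
        # Excel's = compares text case-insensitively, so casefold() both sides.
        same = header.casefold() == next_header.casefold()
        next_is_total = next_header.casefold() == "total"
        flags.append(0 if same or next_is_total else 1)
    return flags


def conditional_total(headers, values):
    """Mirror =SUMIF($3:$3,1,$2:$2); None stands for a blank cell in row 2."""
    flags = helper_flags(headers)
    return sum(v for f, v in zip(flags, values) if f == 1 and v is not None)


headers = ["A275", "A277", "A273", "A777", "A777", "TOTAL", "A222", "GRAND TOTAL"]
values = [5, 7, 4, 3, 4, 7, 7, None]  # H2 is left blank

print(helper_flags(headers))
print(conditional_total(headers, values))  # 30
```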

Microsoft Excel If Statements

I have altered a statement I got from a previous answer a bit and it now looks like this:
=IF(C6=$R$3,IF(D6<=0.99,$U$2,IF(AND(D6>0.99,D6<=4.99),$U$3,IF(AND(D6>4.99,D6<=14.99),$U$4,IF(AND(D6>14.99,D3<=29.99),$U$5,IF(AND(D6>29.99,D6<99.99),$U$6,""))))),$S$8)
It all works fine until I change the value in cell D6 to, say, £45, at which point it still picks up the figure in cell U5.
Can you or anyone else help me tweak this so that it works? I need a statement to do the following:
If C2=R2 and D2 is < T2 then U2, if D2 is >T but T3 but < T4 then U4 if D2 is > T4 but < T5 then U5, if D2 is > T5 but < T6 then U6 BUT if C2 does not equal R2 then S8
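One likely cause is visible in the formula in the question: its fourth condition tests D3<=29.99 rather than D6<=29.99. The Python transcription below is a sketch that assumes D3 is blank, so it compares as 0, as Excel treats empty cells in numeric comparisons. It shows that with D6 = 45 this condition is already true, so the formula stops at U5. With D6 used in that test, it moves on to U6. The "X" values passed for C6 and R3 are placeholders.

```python
def question_formula(c6, d6, r3, d3=0, *, use_d6_everywhere=False):
    """Python transcription of the question's nested IF.

    Returns the name of the cell the formula would pick. d3 defaults to 0,
    which is how Excel treats a blank D3 in a numeric comparison.
    """
    if c6 != r3:
        return "S8"
    if d6 <= 0.99:
        return "U2"
    if 0.99 < d6 <= 4.99:
        return "U3"
    if 4.99 < d6 <= 14.99:
        return "U4"
    # The original formula tests D3 here, not D6.
    fourth_test = d6 if use_d6_everywhere else d3
    if d6 > 14.99 and fourth_test <= 29.99:
        return "U5"
    if 29.99 < d6 < 99.99:
        return "U6"
    return ""


print(question_formula("X", 45, "X"))                          # U5
print(question_formula("X", 45, "X", use_d6_everywhere=True))  # U6
```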
Take all your problems and rip them apart:
If C2=R2 and D2 is < T2 then U2, if D2 is >T but T3 but < T4 then U4 if D2 is > T4 but < T5 then U5, if D2 is > T5 but < T6 then U6 BUT if C2 does not equal R2 then S8
Start with this, using NA() to represent the parts that haven't been completed yet (this shows the #N/A value in the cell):
=IF(C2=R2,NA(),S8)
Add the lookup based on D2:
=IF(C2=R2,IF(D2<T2,U2,NA()),S8)
Assuming that the next part is D2 > T2 and D2 < T3 (although strictly this formula says D2 >= T2) and that the result is U3:
=IF(C2=R2,IF(D2<T2,U2,IF(D2<T3,U3,NA())),S8)
Now add between T3 and T4:
=IF(C2=R2,IF(D2<T2,U2,IF(D2<T3,U3,IF(D2<T4,U4,NA()))),S8)
Between T4 and T5:
=IF(C2=R2,IF(D2<T2,U2,IF(D2<T3,U3,IF(D2<T4,U4,IF(D2<T5,U5,NA())))),S8)
Finally between T5 and T6:
=IF(C2=R2,IF(D2<T2,U2,IF(D2<T3,U3,IF(D2<T4,U4,IF(D2<T5,U5,IF(D2<T6,U6,NA()))))),S8)
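The finished formula is a chain of strict "less than" tests, so it picks the first band whose upper bound T exceeds D2. The Python sketch below expresses the same rule with a binary search over the thresholds. The threshold values are hypothetical, borrowed from the question's original formula, and `None` stands for the remaining NA() case (D2 >= T6).

```python
from bisect import bisect_right


def tier_lookup(c2, r2, d2, thresholds, results, s8):
    """Sketch of the final nested IF.

    thresholds holds T2..T6 in ascending order, results holds U2..U6.
    Returns None where the formula would still give NA() (D2 >= T6).
    """
    if c2 != r2:
        return s8
    # bisect_right counts the thresholds <= d2, i.e. how many D2<Tn tests
    # fail before the first one succeeds.
    index = bisect_right(thresholds, d2)
    if index == len(thresholds):
        return None
    return results[index]


T = [0.99, 4.99, 14.99, 29.99, 99.99]  # hypothetical T2..T6
U = ["U2", "U3", "U4", "U5", "U6"]
print(tier_lookup("X", "X", 45, T, U, "S8"))    # U6
print(tier_lookup("X", "X", 4.99, T, U, "S8"))  # U4, because the test is D2<T3
```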
We still have NA() because you haven't defined the behaviour for C2=R2 and D2 >= T6.
As Stobor said in the comment to your original question, using VLOOKUP would be much better - see http://office.microsoft.com/en-us/excel/HP052093351033.aspx for details
Your current structure in the T and U columns won't work with VLOOKUP because "the next largest value that is less than lookup value is returned".
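This means VLOOKUP would return U2 when you wanted U3, U3 when you wanted U4, and so on (and #N/A for any D2 below T2). To solve this, turn column T into lower bounds: move every entry in the T column down by one row and put 0 (or any value no greater than the smallest possible D2) into T2, so that each T value is the start of the band for the U value on the same row. Values at or above the old T6 (now in T7) would then pick up U7, so put a dummy value or =NA() there if those should still be flagged.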
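Approximate-match VLOOKUP returns the row whose first-column value is the largest one not exceeding the lookup value. The Python sketch below emulates that rule with the `bisect` module on hypothetical thresholds (the ones from the question's formula). It shows which U cell a given D2 lands on when the T values are used as upper bounds, the way the nested IF uses them, versus as lower bounds. `vlookup_approx` is an illustrative name, and 0 is an assumed minimum.

```python
from bisect import bisect_right


def vlookup_approx(value, first_column, return_column):
    """Emulate VLOOKUP(value, table, col, TRUE) on a first column sorted ascending.

    Picks the row with the largest first-column entry <= value;
    returns None where Excel would return #N/A.
    """
    row = bisect_right(first_column, value) - 1
    if row < 0:
        return None
    return return_column[row]


U = ["U2", "U3", "U4", "U5", "U6"]

# T2..T6 as upper bounds, the way the nested IF uses them (hypothetical values).
upper_bounds = [0.99, 4.99, 14.99, 29.99, 99.99]
print(vlookup_approx(45, upper_bounds, U))   # U5, but the nested IF gives U6
print(vlookup_approx(0.5, upper_bounds, U))  # None (#N/A), but the nested IF gives U2

# The same bands expressed as lower bounds.
lower_bounds = [0, 0.99, 4.99, 14.99, 29.99]
print(vlookup_approx(45, lower_bounds, U))   # U6
print(vlookup_approx(0.5, lower_bounds, U))  # U2
```

With lower bounds, any value of 29.99 or more maps to U6 in this five-row sketch. A row for values at or above 99.99 has to be added explicitly if those should behave differently.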
