Array formulas: nested ifs and same row calculation - excel

There are 2 inputs: A1 and B1.
In column D, there are many types of objects A.
In column G, there are many types of objects B.
Here's what the formula is supposed to do:
If (D2 is 'A1' and G2 is 'B1') then, if (E2 is bigger than F2), subtract E2 and F2 (5 - 4, in this example), otherwise subtract F2 to E2 (like what happens in line 12).
If there is no match, don't do anything and just skip the row.
I would like to do this as an array formula (Ctrl+Shift+Enter), so it would sum everything in the end.
In this example, the output would be -1, because (5 - 4) + (2 - 4) = -1.
So far, I have the following:
{=SUM(IF((D2:D12="A1")+(G2:G12="B1");E2:E12-F2:F12;0))}
But it doesn't work properly as I'm not sure how Excel reads the subtraction part. I want to be able to subtract the values for the row where the combination was found.

If all you need is to have Column E subtracted by Column F for all matches then consider the following Array Formula:
=SUM((D2:D12=$B$2)*(G2:G12=$B$3)*(E2:E12-F2:F12))
(This can be updated with extra checks on what to subtract if needed)
This will SUM all of the subtractions (Column E) - (Column F) that contain a match to your inputs.
Here is the breakdown:
D2:D12=$B$2 and G2:G12=$B$3 will produce arrays containing 1's for a match and 0's for non-match:
{A1,A2,A3, -,A1, -, -,A4,A5,A1,A1} {B1, -,B1, -,B4, -, -,B6,B5,B2,B1}
V V V V V V V
{1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1 } {1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1 }
E2:E12-F2:F12 will result in a 3rd array consisting of the subtracted values:
{5, 5, 3, 1, 3, 3, 7, 3, 9, 7, 4}
-{4, 3, 4, 5, 6, 5, 9, 6, 7, 8, 2}
={1, 2,-1,-4,-3,-2,-2,-3, 2,-1, 2}
Multiplying all of them will result like so:
{1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1}
x{1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1}
x{1, 2,-1,-4,-3,-2,-2,-3, 2,-1, 2}
={1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2}
Then of course SUM will do its job:
SUM({1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2}) = 3
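The same row-wise arithmetic can be checked outside Excel. What follows is a minimal Python sketch, assuming the column values shown in the breakdown above and assuming the inputs in $B$2 and $B$3 hold "A1" and "B1". The list names are illustrative only.

```python
# Column values copied from the breakdown above; "-" stands for a non-matching cell.
col_d = ["A1", "A2", "A3", "-", "A1", "-", "-", "A4", "A5", "A1", "A1"]
col_g = ["B1", "-", "B1", "-", "B4", "-", "-", "B6", "B5", "B2", "B1"]
col_e = [5, 5, 3, 1, 3, 3, 7, 3, 9, 7, 4]
col_f = [4, 3, 4, 5, 6, 5, 9, 6, 7, 8, 2]

# Assumed contents of the input cells $B$2 and $B$3.
input_a, input_b = "A1", "B1"

# A comparison yields True/False, which behave as 1/0 when multiplied,
# so each non-matching row contributes 0 to the sum.
products = [
    (d == input_a) * (g == input_b) * (e - f)
    for d, g, e, f in zip(col_d, col_g, col_e, col_f)
]
total = sum(products)

print(products)  # [1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(total)     # 3
```

Only the first and last rows match both inputs, so only their differences survive the multiplication.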

If I understood your question correctly, then your answer would be:
=SUM((D2:D12="A1")*(G2:G12="B1")*ABS(E2:E12-F2:F12))
Remember that to Excel TRUE is the same thing as the number 1 and FALSE is 0.
So in my formula, any row where either the D or the G column does not match is multiplied by 0.
Also, your rule about the E and F columns sounds to me like
Subtract the smaller number from the bigger one
which is the same as:
|4-5|=1
Or in Excel formula notation:
ABS(4-5)
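To see how the ABS version differs from a plain signed subtraction, here is a small Python sketch. The three rows are hypothetical (they are not taken from the question's sheet) and are chosen so that one matching row has E smaller than F.

```python
# Hypothetical rows in the order (D, G, E, F); not the asker's real data.
rows = [
    ("A1", "B1", 5, 4),  # match, E > F
    ("A2", "B1", 9, 1),  # no match, contributes 0
    ("A1", "B1", 2, 4),  # match, E < F
]

# Signed version: mirrors (D=...)*(G=...)*(E-F)
signed_total = sum((d == "A1") * (g == "B1") * (e - f) for d, g, e, f in rows)

# Absolute version: mirrors (D=...)*(G=...)*ABS(E-F)
abs_total = sum((d == "A1") * (g == "B1") * abs(e - f) for d, g, e, f in rows)

print(signed_total)  # -1
print(abs_total)     # 3
```

Which of the two totals is wanted depends on how the rule about E and F is meant to be read.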

Related

Array element shift

I have a question. Sorry if it's very simple, I'm new to this and have struggled for several hours to do this without success.
a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
I am trying to divide the first element of a1 by the second element of a2, the second element of a1 by the third element of a2, the third element of a1 by the fourth element of a2, etc...it's a long list but this is a short form.
The new array or list should be something like this:
a3 = [1/2, 2/3, 3/4, 4/5, 5/6, 6/7, 7/8, 8/9, 9/10]
Here is my code:
a1_new = a1[:-1]
a2_new = a1[1:]
a3 = a1_new/a2_new
return a3
The answer is not correct.
What is a better way to do this?
In Excel 365
={1,2,3,4,5,6,7,8,9,10}/{2,3,4,5,6,7,8,9,10,11}
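For the Python side of the question, plain lists do not support `/`, so the division has to be done element by element. A minimal sketch, assuming a1 and a2 are ordinary Python lists as in the question:

```python
a1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a2 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Pair element i of a1 with element i+1 of a2, then divide each pair.
a3 = [x / y for x, y in zip(a1[:-1], a2[1:])]

print(len(a3))  # 9
print(a3[0])    # 0.5
```

Note that the shifted operand is taken from a2, whereas the code in the question slices a1 twice.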

Pyspark: How to count the number of each equal distance interval in RDD

I have an RDD[Double]. I want to divide its range into k equal-width intervals, then count the number of elements that fall into each interval.
For example, the RDD is like [0,1,2,3,4,5,6,6,7,7,10]. I want to divide it into 10 equal intervals, so the intervals are [0,1), [1,2), [2,3), [3,4), [4,5), [5,6), [6,7), [7,8), [8,9), [9,10].
As you can see, each element of the RDD falls into one of the intervals. Then I want to count the elements in each interval. Here, there is one element in each of [0,1),[1,2),[2,3),[3,4),[4,5),[5,6), and both [6,7) and [7,8) have two elements. [9,10] has one element.
Finally, I expect an array like array([1,1,1,1,1,1,2,2,0,1]).
Try this. I have assumed that the first element of each range is inclusive and the last is exclusive. Please confirm this. For example, when considering the range [0,1] and the element is 0, the condition is element >= 0 and element < 1.
# Assumes array_range is a list of (lower, upper) pairs, e.g. [(0, 1), (1, 2), ..., (9, 10)]
values = rdd.collect()  # collect once instead of once per interval
countElementsWithinRange = []
for lower, upper in array_range:
    counter = 0
    for element in values:
        # lower bound inclusive, upper bound exclusive
        if lower <= element < upper:
            counter += 1
    countElementsWithinRange.append(counter)
print(values)
# [0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10]
print(countElementsWithinRange)
# [1, 1, 1, 1, 1, 1, 2, 2, 0, 0]
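The last count above is 0 because the upper bound is exclusive, so the value 10 is never counted. The sketch below is plain Python, not Spark code, and closes the last interval on the right, which gives the array the question expects. The function name `equal_width_counts` is made up for illustration. PySpark's built-in `RDD.histogram` is documented to use the same convention (last bucket closed), so it may be worth checking for large data.

```python
def equal_width_counts(values, k):
    """Count values in k equal-width intervals between min and max.

    Every interval is [lower, upper) except the last, which also
    includes its upper bound.
    """
    lo, hi = min(values), max(values)
    counts = [0] * k
    width = (hi - lo) / k
    for v in values:
        if width == 0:
            idx = 0  # all values identical: put everything in the first bin
        else:
            # Clamp so that v == hi lands in the last interval.
            idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    return counts


data = [0, 1, 2, 3, 4, 5, 6, 6, 7, 7, 10]
counts = equal_width_counts(data, 10)
print(counts)  # [1, 1, 1, 1, 1, 1, 2, 2, 0, 1]
```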

Count number of 2 consecutive negative values in Excel

Per the title I want to count the number of times 2 consecutive negative values exist in a series of returns.
To illustrate, the formula should return 2 given the series
0, -1, 3, -2, -1, -4, 5, -6
and should return 1 given the series
0, -1, 3, -2, -1, 5, -6
Use COUNTIFS with staggered ranges:
=COUNTIFS(A2:A8,"<0",A3:A9,"<0")
Change the , to ; if needed based on your Excel version.
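The staggered-range idea, where each value is paired with the one directly after it, can be sketched in Python as follows. The function name is hypothetical and the two series are the ones from the question.

```python
def count_consecutive_negative_pairs(series):
    # Pair each value with its successor, like A2:A8 against A3:A9.
    return sum(1 for cur, nxt in zip(series, series[1:]) if cur < 0 and nxt < 0)


first = [0, -1, 3, -2, -1, -4, 5, -6]
second = [0, -1, 3, -2, -1, 5, -6]

print(count_consecutive_negative_pairs(first))   # 2
print(count_consecutive_negative_pairs(second))  # 1
```

Like the COUNTIFS formula, this counts overlapping pairs, so a run of three negatives counts as two.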

Replace indicator values with actual values

I have a numpy array like this
array([[0, 0, 1],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
and an array with values
array([1, 2, 3, 4])
I would like to replace the ones in the first two-dimensional array with the corresponding values in the second array. Each row of the first array has exactly one 1, and there is only 1 replacement in the second array.
Result:
array([[0, 0, 1],
       [2, 0, 0],
       [0, 3, 0],
       [0, 0, 4]])
I would like an elegant solution to achieve this, without loops and such.
Let's say a is the 2D data array and b the second 1D array.
An elegant solution would be -
a[a==1] = b
For performance, leveraging the fact that there's exactly one 1 per row, we could also use indexing -
a[np.arange(len(a)),a.argmax(1)] = b
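A runnable sketch of both approaches on the arrays from the question. Copies are used here (an addition for the demo) so the two results can be compared side by side; the original one-liners modify a in place.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
b = np.array([1, 2, 3, 4])

# Approach 1: boolean mask. Masked positions are visited in row-major
# order, so with one 1 per row they line up with b.
by_mask = a.copy()
by_mask[by_mask == 1] = b

# Approach 2: integer indexing with the column of the 1 in each row.
by_index = a.copy()
by_index[np.arange(len(a)), a.argmax(axis=1)] = b

print(by_mask)
print(np.array_equal(by_mask, by_index))  # True
```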
Selectively assign per row
If we want to selectively mask and assign values per row, we could use one more level of masking. So, let's say we have the rows to be selected as -
select_rows = np.array([1,3])
Then, we could do -
rowmask = np.isin(np.arange(len(a)),select_rows)
So, the replacement with the first approach would be -
a[(a==1) & rowmask[:,None]] = b[rowmask]
And for the second one -
a[np.arange(len(a))[rowmask],a.argmax(1)[rowmask]] = b[rowmask]
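A short runnable check of the row-selective variants, using the arrays from the question and rows 1 and 3 as the selection. The copies are only there so both variants start from the same data.

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]])
b = np.array([1, 2, 3, 4])

select_rows = np.array([1, 3])
rowmask = np.isin(np.arange(len(a)), select_rows)

# Variant 1: combine the value mask with the row mask (broadcast over columns).
out_mask = a.copy()
out_mask[(out_mask == 1) & rowmask[:, None]] = b[rowmask]

# Variant 2: index only the selected rows and their argmax columns.
out_index = a.copy()
out_index[np.arange(len(a))[rowmask], a.argmax(axis=1)[rowmask]] = b[rowmask]

print(out_mask)
# Rows 0 and 2 keep their original 1; rows 1 and 3 receive 2 and 4.
```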

Need to sum a column if 2 or 3 columns contain a specific text

I've the following data set and want to add the values that reflect "ABC" in any cell.
Column 1       Column 2       Column 3    Column 4    Column 5
ABC is good    CNN            $150        ABC         NBA
Better life    N-H            $40         LIT         MNM
Nice Job       ABC is good    $35         MN          ABC
Poor           H-I            $200        ITL         ABC
Best           TI             $120        SQL         ABC
Poor life      N-T            $40         LT          NM
Great          BE             $800        ABC         BEF
The sum it should return is $150 + $35 + $200 + $120 + $800 = $1305, because each of those rows has the text "ABC" somewhere in its cells. I tried using a SUMIF(FIND) formula, but it gives me a #VALUE! error.
Any thoughts?
Short Answer
Use this array formula:
=SUMPRODUCT(IF(IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0)+IF(LEN(SUBSTITUTE(B:B,"ABC",""))<LEN(B:B),1,0)+IF(LEN(SUBSTITUTE(D:D,"ABC",""))<LEN(D:D),1,0)+IF(LEN(SUBSTITUTE(E:E,"ABC",""))<LEN(E:E),1,0)>0,1,0),C:C)
Note: array formulas are entered with ctrl + shift + enter
Explanation
To test whether or not a cell contains ABC we can use the SUBSTITUTE formula combined with LEN to test the difference between the string lengths:
LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A)
We can then wrap that in an IF statement to get a nice array of 1's and 0's
IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0)
If we mapped this out for your data it would look like this:
IF(LEN(SUBSTITUTE(A:A,"ABC",""))<LEN(A:A),1,0) = {0, 1, 0, 0, 0, 0, 0, 0}
IF(LEN(SUBSTITUTE(B:B,"ABC",""))<LEN(B:B),1,0) = {0, 0, 0, 1, 0, 0, 0, 0}
IF(LEN(SUBSTITUTE(D:D,"ABC",""))<LEN(D:D),1,0) = {0, 1, 0, 0, 0, 0, 0, 1}
IF(LEN(SUBSTITUTE(E:E,"ABC",""))<LEN(E:E),1,0) = {0, 0, 0, 1, 1, 1, 0, 0}
Sum                                            = {0, 2, 0, 2, 1, 1, 0, 1}
All we have to do then is check if the number in the array is >0 and multiply it by column C using SUMPRODUCT:
{0, 2, 0, 2, 1, 1, 0, 1 }
>0 {0, 1, 0, 1, 1, 1, 0, 1 }
*C:C {0, 150, 40, 35, 200, 120, 40, 800}
= {0, 150, 0, 35, 200, 120, 0, 800}
-----------------------------------------
SUM = 1305
Since we are looking for ABC in any of the cells, we can use CONCATENATE and FIND to join the cells of each row together and then look for ABC in the new string. This saves a ton of code and simplifies the logic. It also makes it easier to expand to more cells.
Ranges for reference
Formula in G1. This is an array formula (enter with CTRL+SHIFT+ENTER).
=SUM(IF(ISERR(FIND("ABC",CONCATENATE(A1:A7,B1:B7,D1:D7,E1:E7))), 0, C1:C7))
How it works
CONCATENATE joins the cells of each row into a single string, one string per row
FIND looks for ABC in each of those strings. It will return a number if found and an error (#VALUE!) otherwise.
ISERR checks if the error was returned
IF decides if the value in column C should be returned or a 0 based on that error
SUM takes all of those numbers and adds them
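The "does any cell in the row contain ABC" logic can be sketched in Python as follows. The data is copied from the question's table with the dollar amounts written as plain integers, and the variable names are illustrative only.

```python
# (Column 1, Column 2, Column 3 amount, Column 4, Column 5)
rows = [
    ("ABC is good", "CNN",         150, "ABC", "NBA"),
    ("Better life", "N-H",          40, "LIT", "MNM"),
    ("Nice Job",    "ABC is good",  35, "MN",  "ABC"),
    ("Poor",        "H-I",         200, "ITL", "ABC"),
    ("Best",        "TI",          120, "SQL", "ABC"),
    ("Poor life",   "N-T",          40, "LT",  "NM"),
    ("Great",       "BE",          800, "ABC", "BEF"),
]

total = 0
for col1, col2, amount, col4, col5 in rows:
    # Add the amount once if any text cell of the row contains "ABC".
    if any("ABC" in cell for cell in (col1, col2, col4, col5)):
        total += amount

print(total)  # 1305
```

Checking each cell separately avoids a false match that joining the cells could produce when "ABC" is split across two neighbouring cells (for example "AB" followed by "C").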
