I have a table with 2 columns filled with numbers, like this:
A | B
-----
1 | 2
3 | 1
4 | 3
5 | 2
1 | 2
I would like to know how to count the rows in which there is a '1' in A and, out of those, how many have a '2' in the corresponding row in B.
So for the example the result would be 2, because there is a 1&2 in the first row and a 1&2 in the last row.
The equivalent in code would be something like:
% MATLAB syntax
A = [1; 3; 4; 5; 1];
B = [2; 1; 3; 2; 2];
count = 0;  % named count so the built-in sum is not shadowed
for i = 1:length(A)
    if A(i) == 1 && B(i) == 2
        count = count + 1;
    end
end
In this case, count is the result that I want.
I was hoping to do something like SUM(IF(AND(A1:A5=1,B1:B5=2),1,0))
Notes: This is for an assignment, the rules are simply no macros, just one formula without partial results in other cells.
Thank you for your answers.
There are so many ways and as the comments state, COUNTIFS() would be the simplest and most effective...
As you provided a coded example, I thought I would try to reproduce your logic as closely as I can with an array formula like this (confirm it with Ctrl+Shift+Enter while still in the formula bar):
=IFERROR(SUM(IF(IF(A1:A5=1,B1:B5)=2,1)),0)
We build an array of either FALSE or the corresponding B:B cell content using the inner IF (IF(A1:A5=1,B1:B5)), then compare that array against 2 in the outer IF (IF([innerIf]=2,1)) to get an array of FALSE or 1, which we then sum to get the result. I think it will work as is, with FALSE counting as 0, but as I wrote this as a pseudo-formula I wrapped it in an IFERROR() just in case (if errors still occur, provide 0 as the value_if_false argument of the IF() statements).
The issue with AND() is that it doesn't work element-wise in array constructs: it collapses the whole array into a single TRUE or FALSE, or at least I have never got it to produce an array result.
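For readers who think in code, here is a rough Python sketch of what the nested IF array formula does, next to the direct count that COUNTIFS(A1:A5,1,B1:B5,2) would give. The variable names are made up for illustration; the data is the example table from the question.

```python
# Columns A and B from the question's example table.
col_a = [1, 3, 4, 5, 1]
col_b = [2, 1, 3, 2, 2]

# Inner IF: keep B where A == 1, otherwise False, like IF(A1:A5=1,B1:B5).
inner = [b if a == 1 else False for a, b in zip(col_a, col_b)]

# Outer IF plus SUM: count 1 for every kept value equal to 2.
# "is not False" is needed because Python treats False == 0 as true.
nested_count = sum(1 for value in inner if value is not False and value == 2)

# Direct count with both conditions at once, the COUNTIFS way.
direct_count = sum(1 for a, b in zip(col_a, col_b) if a == 1 and b == 2)

print(nested_count, direct_count)
```

Both counts agree on the sample data, which is why COUNTIFS is the simpler choice when nothing else is required.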
In pyspark, I'm trying to replace multiple text values in the calc column (a formula) with the values of the columns whose names appear in that formula.
To be clear, here is an example:
Input:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |param_1-param_2
|Cell 3 |Cell 4 |param_2/param_1
Output needed:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |Cell 1-Cell 2
|Cell 3 |Cell 4 |Cell 4/Cell 3
In the column calc, the default value is a formula. It can be as simple as the ones shown above, or it can be something like "2*(param_8-param_4)/param_2-(param_3/param_7)".
What I'm looking for is a way to substitute every param_x with the value of the column that has that name.
I've tried a lot of things but nothing works, and most of the time when I use replace or regexp_replace with a column as the replacement value, I get the error "Column is not iterable".
Moreover, the columns param_1, param_2, ..., param_x are generated dynamically, and a calc value can reference some of these columns but not necessarily all of them.
Could you help me with a dynamic solution?
Thank you so much.
Best regards
Update: Turned out I misunderstood the requirement. This would work:
# assumes: from pyspark.sql import functions as F
for exp in ["regexp_replace(calc, '"+col+"', "+col+")" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
Yet Another Update: To Handle Null Values add coalesce:
for exp in ["coalesce(regexp_replace(calc, '"+col+"', "+col+"), calc)" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
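As a side note, the per-row substitution itself can be sketched in plain Python, outside Spark, to show the intended result. This is only an illustration: substitute_params and the dict-per-row layout are hypothetical. It also uses word boundaries, since a plain pattern such as 'param_1' would match inside 'param_10' as well.

```python
import re

def substitute_params(row, formula_key="calc"):
    """Return the formula with each column name replaced by that row's value."""
    formula = row[formula_key]
    names = [name for name in row if name != formula_key]
    # Longest names first, with \b boundaries, so param_1 never matches
    # inside param_10.
    names.sort(key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A single pass, so substituted values are never scanned again.
    return pattern.sub(lambda m: str(row[m.group(1)]), formula)

rows = [
    {"param_1": "Cell 1", "param_2": "Cell 2", "calc": "param_1-param_2"},
    {"param_1": "Cell 3", "param_2": "Cell 4", "calc": "param_2/param_1"},
]
results = [substitute_params(row) for row in rows]
print(results)
```

Null handling is left out here; the coalesce in the Spark version above covers that case.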
------- Keeping the below section for a while just for reference -------
You can't do that directly, as you won't be able to use a column value as an expression unless you collect it into a Python object (which is obviously not recommended).
This would work for the example:
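from pyspark.sql import functions as F, Window  # assumes an active SparkSession named spark

df = spark.createDataFrame([["1", "2", "param_1 - param_2"], ["3", "4", "2*param_1 + param_2"]]).toDF("param_1", "param_2", "calc")
df.show()
# Number the rows so that each formula gets its own CASE branch.
df = df.withColumn("row_num", F.row_number().over(Window.orderBy(F.lit("dummy"))))
# Collect only the formulas, keyed by row number.
as_dict = {row["row_num"]: row["calc"] for row in df.select("row_num", "calc").collect()}
# Build a single CASE expression with one WHEN branch per row.
branches = " ".join(f"WHEN row_num = '{k}' THEN ({v})" for k, v in as_dict.items())
expression = f"CASE {branches} ELSE NULL END"
df.withColumn("Result", F.expr(expression)).show()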
Hi.
How can I return the "company name" (column H) in column B if any of the "Prefix" values (column G) is found in the "con no" (column A)?
A sample of the outcome needed is in column B.
Sample:
620011113 = DD
CN1234 = BB
Thanks
=INDEX($H:$H,AGGREGATE(15,6,ROW($G$1:$G$7)/(--(FIND($G$1:$G$7,$A2)=1)*--(LEN($G$1:$G$7)>0)),1),1)
Breaking this down, the INDEX retrieves the Nth item from Column H (Company name). To find the value of N, we are using the AGGREGATE function.
AGGREGATE is a weird function - it lets us use things like MAX or LARGE or SUM while ignoring any error values. In this case, we will be using it for SMALL (first argument, 15), while Ignoring Error Values (second argument, 6). We will want the very smallest value, so the fourth argument will be 1. (If we wanted the second smallest, it would be 2, and so on)
=INDEX($H:$H,AGGREGATE(15,6, <SOMETHING> ,1),1)
So, all we need now is a list of values to compare! To make things slightly simpler, I'll break that bit of the code out for you here:
ROW($G$1:$G$7) / (--(FIND($G$1:$G$7,$A2)=1) * --(LEN($G$1:$G$7)>0))
There are 3 parts to this. The first, ROW($G$1:$G$7), is the actual value we want to retrieve: these will be the Row Numbers for each Prefix that matches your value. On its own, however, it will be all the row numbers. Since we are skipping errors, we want any Rows that don't match the prefix to throw an error. The easiest way to do this is to Divide by Zero.
At the start of --(FIND($G$1:$G$7,$A2)=1) and --(LEN($G$1:$G$7)>0) we have a double-negative. This is a quick way to convert True and False to 1 and 0. Only when both tests are True will we not divide by 0, as this table shows:
A | B | A*B
1 | 1 | 1
1 | 0 | 0
0 | 1 | 0
0 | 0 | 0
Starting with the second test first (it's easier), we have LEN($G$1:$G$7)>0 - basically "don't look at blank cells".
The other test (FIND($G$1:$G$7,$A2)=1) will search for the Prefix in the Con No, and return where it is found (or a #VALUE! error if it isn't). We then check "is this at position 1" - in other words, "Is this at the start of the Con No, rather than in the middle". We don't want to say Con No CNQ6060 is part of Company AA instead of Company BB by mistake!
So, if the Prefix is at the Start of the Con No, AND it isn't Blank (because there is an infinite amount of Nothing Before, After, and Between every number and letter), then we get it added to our list of Rows. We then take the smallest row (i.e. closest to the top; change AGGREGATE(15 to AGGREGATE(14 if you want the closest to the bottom!), and use that to get the Company Name.
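The same logic can be sketched in Python to make the two tests explicit. The prefix table below is hypothetical sample data (the real one lives in columns G and H), and company_for is a made-up name.

```python
def company_for(con_no, prefixes):
    """Return the company of the first non-blank prefix that con_no starts with."""
    for prefix, company in prefixes:
        # "prefix" being non-empty mirrors LEN(...)>0: an empty prefix would
        # match every string. startswith mirrors FIND(...)=1.
        if prefix and con_no.startswith(prefix):
            return company
    return None  # the Excel formula would return an error here

# Hypothetical prefix table, top to bottom, including one blank row.
PREFIXES = [("Q6", "AA"), ("CN", "BB"), ("", ""), ("6200", "DD")]

print(company_for("620011113", PREFIXES))
print(company_for("CN1234", PREFIXES))
print(company_for("CNQ6060", PREFIXES))
```

Returning on the first hit corresponds to taking the smallest row number with AGGREGATE(15, ...).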
You could try the formula below:
=VLOOKUP(IF(LEFT(A3,1)="6",LEFT(A3,4),IF(LEFT(A3,1)="C",LEFT(A3,2),IF(LEFT(A3,1)="E",LEFT(A3,7)))),$G$3:$H$7,2,0)
Keep in mind that you have to type ' before the cell values in columns A and G to convert them to text; otherwise VLOOKUP will not return the correct results.
I believe the conditions will make the formula quite long, and I am not really good at writing long formulas.
I use 6 columns: D, E, M, N, O, P.
Sample data:
D3=123456 (this value varies; it can also be 12345, 12345A or 123456A)
E3=1
M3=31
N3=_
O3=00
P3=0
The formula is based on column D (this is the field whose value changes).
If the length of D3 is 6, this is the formula I have so far:
=IF(LEN(D3)=6,CONCATENATE(M3,D3,N3,O3,E3),CONCATENATE(M3,D3,O3,E3))
The outcome for this will be 31123456_001. If D3 is changed to 123456A (the else branch of the formula, which does not concatenate N3), then the outcome will be 31123456A001.
I have added column P so that I can use it to concatenate to the format that I need.
There are a few more conditions I need to add:
1. If D3 = 12345, the outcome will be 31012345_001 (concatenate M3,P3,D3,N3,O3,E3)
2. If D3 = 12345A, the outcome will be 31012345A001 (concatenate M3,P3,D3,O3,E3)
3. In a D3 value such as 12345A, the trailing letter can be anything from A to Z.
Here is the full list of conditions and outcomes that I need in one formula:
1. D3 = 123456 then the outcome will be 31123456_001
2. D3 = 123456A then outcome will be 31123456A001
3. D3 = 12345 then outcome will be 31012345_001
4. D3 = 12345A then outcome will be 31012345A001
Additional info:
These are just formats: the digits can be any combination, and the trailing letter can be A-Z.
D3 = 123456
D3 = 123456A
D3 = 12345
D3 = 12345A
As I couldn't quite catch all the conditions and outcomes, here is an example of how your formula could look:
=IF(LEN(D3)=5,Outcome_1_Concatenation,IF(LEN(D3)=7,Outcome_2_Concatenation,IF(ISNUMBER(VALUE(RIGHT(D3,1))),Outcome_3_Concatenation,Outcome_4_Concatenation)))
Outcome_1_Concatenation => replace with formula when LEN = 5
Outcome_2_Concatenation => replace with formula when LEN = 7
Outcome_3_Concatenation => replace with formula when LEN = 6 and all are numbers
Outcome_4_Concatenation => replace with formula when LEN = 6 and last is character
If you give all examples in a condition => outcome list, I would be glad to help further.
I would look at creating a lookup table range with 3 options for lengths of 5,6,7.
I named my lookup table range "Length".
First setup this lookup table like this:
5 |
=CONCATENATE(M$3,P$3,D$3,IF(ISNUMBER(VALUE(RIGHT(D$3,1))),N$3,""),O$3,E$3)
6 |
=CONCATENATE(M$3,IF(ISNUMBER(VALUE(RIGHT(D$3,1))),"",P$3),D$3,IF(ISNUMBER(VALUE(RIGHT(D$3,1))),N$3,""),O$3,E$3)
7 |
=CONCATENATE(M$3,D$3,IF(ISNUMBER(VALUE(RIGHT(D$3,1))),N$3,""),O$3,E$3)
For any D3 value, it is checking if that last character is a letter, and if not it will insert N3, otherwise it leaves it out.
Also, for any 6 character value, it checks if the last character is a letter, and if so, it will insert P3, otherwise it leaves it out.
Then, your output formula should be:
=VLOOKUP(LEN(D3),Length,2,FALSE)
This makes it clean and simple.
This is your formula plus the added conditions 1 and 2:
=IF(D3=12345,CONCATENATE(M3,P3,D3,N3,O3,E3),IF(D3="12345A",CONCATENATE(M3,P3,D3,O3,E3),IF(LEN(D3)=6,CONCATENATE(M3,D3,N3,O3,E3),CONCATENATE(M3,D3,O3,E3))))
If you want a more generalized version, you can check whether D3 is a number, its length, and whether it ends with a letter, and replace the nested IFs according to your needs.
I got my answer. It's:
=IF(AND(LEN(D3)>=6,ISNUMBER(RIGHT(D3,1)*1)),M3&D3&N3&O3&E3,IF(AND(LEN(D3)<6,ISNUMBER(RIGHT(D3,1)*1)),M3&P3&D3&N3&O3&E3,IF(AND(LEN(D3)=6,ISTEXT(RIGHT(D3,1))),M3&P3&D3&O3&E3,M3&D3&O3&E3)))
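For comparison, the four condition/outcome pairs from the question can be sketched as one small Python function. The name build_code and the default arguments (taken from the sample values of M3, N3, O3, E3 and P3) are illustrative assumptions, not part of any answer above.

```python
def build_code(d, m="31", n="_", o="00", e="1", p="0"):
    """Build the formatted code from the D value and the fixed parts."""
    d = str(d)
    has_letter = d[-1].isalpha()           # trailing A-Z, as in 12345A
    digits = d[:-1] if has_letter else d   # the numeric part only
    pad = p if len(digits) == 5 else ""    # P pads 5-digit numbers to 6
    separator = "" if has_letter else n    # N is used only without a letter
    return m + pad + d + separator + o + e

for sample in ("123456", "123456A", "12345", "12345A"):
    print(sample, "->", build_code(sample))
```

Splitting the rule into "pad when there are 5 digits" and "separator when there is no letter" avoids enumerating every length explicitly.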
I've got two data sets: Data-A and Data-B.
Data-A
A B C D Start_Date End_Date
N C P 1 23-05-2015 27-05-2015
N C K 1 30-05-2015 07-06-2015
N C Ke 1 09-06-2015 28-06-2015
N C Ch 1 14-07-2015 25-07-2015
N C Th 1 29-06-2015 13-07-2015
N C Po 2 23-05-2015 27-05-2015
N C Kan 2 30-05-2015 08-06-2015
Data-B
X D Date A B C
444 1 09-07-2015
455 1 20-07-2015
1542 1 28-06-2015
2321 1 21-07-2015
2744 1 01-07-2015
7455 2 25-05-2015
12454 2 02-06-2015
18568 2 24-05-2015
28329 2 03-06-2015
28661 2 31-05-2015
Values in Data-B are missing, and I need to fill them using a conditional INDEX/MATCH or VLOOKUP such that column D (Data-B) is matched and Date (Data-B) satisfies Start_Date <= Date <= End_Date.
Desired Output:
X D Date A B C
444 1 09-07-2015 N C Th
455 1 20-07-2015 N C Ch
1542 1 28-06-2015 N C Ke
2321 1 21-07-2015 N C Ch
2744 1 01-07-2015 N C Th
7455 2 25-05-2015 N C Po
12454 2 02-06-2015 N C Kan
18568 2 24-05-2015 N C Po
28329 2 03-06-2015 N C Kan
28661 2 31-05-2015 N C Kan
Proof of Concept
In order to achieve the above I used the AGGREGATE function. It is a normal formula that performs array like calculations. The following formula will return the results from the first row that matches your criteria.
=INDEX(A$2:A$8,AGGREGATE(15,6,ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8)),1)-1)
This assumes your table Data-A starts in A1 and includes 1 header row. The formula can be placed in the first cell under A in Data-B and copied down and to the right as needed.
UPDATE Formula explained
The aggregate function performs array calculations within its brackets for certain subfunctions. There are about 19 different subfunctions. Subfunctions 14 and 15 both perform array calculations. This is a nice feature, since it does array-like calculations while being a regular formula.
Since I wanted the first row that met your criteria, I opted to use the SMALL function, or subfunction 15, for the first argument. Basically I am telling the aggregate function to generate a list and sort it in ascending order.
The second argument has a value of 6, which tells the aggregate to ignore any results from the array that generate errors. This will come in very handy if we can make the results we do not want turn into errors.
Now we are getting into the array portion of the formula. You can take this next part of the equation and highlight the appropriate rows in a neighbouring column and enter it as a CONTROL+SHIFT+ENTER (CSE) formula. As long as you do this in the top cell the array formula will propagate to the remainder of the selected cells and show you the results of the array. Also check the formula bar to see if { } appeared around your formula. You cannot add the { } manually.
{=ROW($D$2:$D$8)/(($J2=$D$2:$D$8)*($E$2:$E$8<=$K2)*($K2<=$F$2:$F$8))}
What this will do is determine the current row and then will divide it by the results of our conditions. You can also try each of the following conditions in a separate column as CSE formulas in the same manner described above to see their results.
($J2=$D$2:$D$8)
($E$2:$E$8<=$K2)
($K2<=$F$2:$F$8)
These on their own will give you either TRUE or FALSE for each row. The interesting thing, and this applies to Excel formulas in general, is that when you perform a math operation on a Boolean, Excel converts TRUE to 1 and FALSE to 0 (in the other direction, 0 is treated as FALSE and any other number as TRUE). You will also note that the logic checks are separated by *. In this case * acts like an AND operator, as only when all results are TRUE will you get an answer of 1. (+ acts like an OR operator.)
Now, if you remember from earlier, 6 said to ignore all errors. Any row that does not meet our logic checks results in a division by 0, since not all of its checks came out as TRUE (1), and those errors get ignored. After that, only the row numbers that met our criteria are left inside the aggregate's array.
After the logic check there is a ,1 for the next argument. In this case we are telling the aggregate to return the 1st number in the list which is the first row number that met our criteria. If we wanted the third number, this would be ,3 instead.
So aggregate is returning the first row number of the results we want. When this is paired with an INDEX function, we can use the result to tell us which row of the INDEX range to look in. In this case we said we wanted to look in the range A$2:A$8. The aggregate function is telling us how many rows to go down in that range. If the range had started in row 1, we would not have to do anything. But since there is a header row, we need to adjust the result of the aggregate function by subtracting 1 for the header row (in reality you need to subtract the row number of the row above the start of your data). This is why you see the -1 after the aggregate function.
Now, if you pay attention to the locks on the range, you will notice I did not lock the A in A$2:A$8. I did this so that I could copy the formula to the right and the column A address would update as I did. This only works because you were keeping the columns in the same order. If the order had changed, I would have changed the index from a 1D array to a 2D array and used a MATCH function to line up the column headers.
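To make the matching rule concrete, here is a small Python sketch of the same lookup: take the first Data-A row whose D matches and whose date range contains the date. The data is copied from the question; the function names are made up for this illustration.

```python
from datetime import datetime

def parse(text):
    """Parse a dd-mm-yyyy string into a date."""
    return datetime.strptime(text, "%d-%m-%Y").date()

# Data-A from the question: (A, B, C, D, Start_Date, End_Date).
DATA_A = [
    ("N", "C", "P", 1, "23-05-2015", "27-05-2015"),
    ("N", "C", "K", 1, "30-05-2015", "07-06-2015"),
    ("N", "C", "Ke", 1, "09-06-2015", "28-06-2015"),
    ("N", "C", "Ch", 1, "14-07-2015", "25-07-2015"),
    ("N", "C", "Th", 1, "29-06-2015", "13-07-2015"),
    ("N", "C", "Po", 2, "23-05-2015", "27-05-2015"),
    ("N", "C", "Kan", 2, "30-05-2015", "08-06-2015"),
]

def lookup(d_key, date_text):
    """Return (A, B, C) of the first row matching D and containing the date."""
    date = parse(date_text)
    for a, b, c, d, start, end in DATA_A:
        # All three checks must hold, like the * between the conditions.
        if d == d_key and parse(start) <= date <= parse(end):
            return (a, b, c)
    return None  # no match; the Excel formula would return an error

print(lookup(1, "09-07-2015"))
print(lookup(2, "02-06-2015"))
```

Scanning top to bottom and returning the first hit plays the role of AGGREGATE(15, 6, ..., 1).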
I have been trying to get this array function to output (non-zero) minimum values in the 'FINAL DATA' AE column. Can you see a structural error in this formula?
=IF($C$4="All EMEA",
MIN(IF('FINAL DATA'!$2:$AE$250000<>0,
('FINAL DATA'!$J$2:$J$250000=$C$4)*('FINAL DATA'!$E$2:$E$250000=$E$4)*( 'FINAL DATA'!$AE$2:$AE$250000))),
MIN(IF('FINAL DATA'!$AE$2:$AE$250000<>0,
('FINAL DATA'!$K$2:$K$250000=$C$4)*('FINAL DATA'!$E$2:$E$250000=$E$4)*( 'FINAL DATA'!$AE$2:$AE$250000)))
)
Using <>0 eliminates zeroes and blanks, so that isn't the problem (although if you only want to eliminate blanks and keep zero as a valid return value, you should use <>"").
You can't multiply the conditions with the number range because by multiplying you get zeroes for any rows where the conditions are not satisfied, use multiple IFs instead, like this:
=MIN(IF('FINAL DATA'!$AE$2:$AE$250000<>0,IF('FINAL DATA'!$J$2:$J$250000=$C$4,IF('FINAL DATA'!$E$2:$E$250000=$E$4,'FINAL DATA'!$AE$2:$AE$250000))))
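The difference between multiplying and nesting can be shown with a short Python sketch. The rows below are hypothetical sample data, and the names are made up; only the idea carries over to the formula.

```python
# Hypothetical (region, value) rows standing in for the sheet columns.
rows = [("EMEA", 0), ("EMEA", 7), ("APAC", 5), ("EMEA", 6), ("APAC", 3)]
target = "EMEA"

# Multiplying the condition by the value: rows that fail the condition
# become 0 and stay in the list, so they win the MIN.
multiplied = [(region == target) * value for region, value in rows if value != 0]
wrong = min(multiplied)

# Nested IFs: rows that fail any test are dropped instead of turned into 0.
filtered = [value for region, value in rows if value != 0 and region == target]
# default=0 mimics Excel's MIN returning 0 when nothing qualifies.
right = min(filtered, default=0)

print(multiplied, wrong)
print(filtered, right)
```

In the Excel version the dropped rows become FALSE, which MIN ignores, whereas the zeros produced by multiplication are counted.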
Second line, you have !$2, no column specified.
MIN(IF('FINAL DATA'!$2:$AE$250000<>0,
Also, it looks like you are trying to run a single If comparison against a range, which I don't think will work the way you are trying to use it.
Barry has identified the core problem (tests returning 0 to the MIN function).
Here's a refactor of your formula (still an array formula) that solves this and is quite a bit shorter:
=MIN(IF(($S:$S<>0)*($E:$E=$E$4)*(IF($C$4="All EMEA",$J:$J,$K:$K)=$C$4),
($S:$S)))
Note that this (as would your original formula, when fixed) will return 0 if there are no qualifying values >0 in the ranges.
You can eliminate the zeros by using an IF() function in an array formula. Consider the following:
A
Row -----
1 0
2 7
3 5
4 6
5
6 3
The array formula =MIN(IF($A$1:$A$6>0,$A$1:$A$6)) will return 3 because the 0 and blank cell are eliminated with the >0 portion of the if statement.