Excel - mark duplicate value and if required or not - excel

I have a data like this:
ID        ID-Name  CountUnique  Required  Available  Results
1-Line 1  Line 1   1            1         Y          Y
2-Line 1  Line 1   0            0         N          Y-Duplicate
3-Line 1  Line 1   0            1         N          Y-Duplicate
1-Line 2  Line 2   1            0         N          N-Duplicate
2-Line 2  Line 2   0            1         N          N
3-Line 2  Line 2   0            1         N          N-Duplicate
I am using Excel and I want to use an IF condition to determine when an ID-Name occurs more than once and whether it is available or not. If it has duplicates and is available, I want Y (for the row that is available) and Y-Duplicate in the Results column for all other rows with the same ID-Name (regardless of whether those rows are available or not); if it is not available, the same logic applies with N and N-Duplicate.
How can I do this for the whole sheet?
My attempts were based on the following logic. If I can get the individual steps to work, I can combine them. The issue I noticed is that the formula also needs to take the ID-Name into account.
Current formulas:
=IF(AND([@Available]="Y",[@CountUnique]=1),"Y","Y-Duplicate")
=IF(AND([@Available]="Y",[@CountUnique]=0),"Y-Duplicate","")
=IF(AND([@Available]="N",[@CountUnique]=1),"N","N-Duplicate")
=IF(AND([@Available]="N",[@CountUnique]=0),"N-Duplicate","")
Thanks.

I am not sure I understood your logic properly, but just in case, note that you may benefit from your CountUnique field.
My formula in Results is:
=IF(C2=1;E2;E2&"-Duplicate")

Related

Excel - assign values based on the first unique item

I have got an excel question that I can not answer. Here is my table:
ID Key  Count Unique  Available  Text    Results
1       0                        Text-1  Dupe-Y
2       1             Y          Text-1  Y
3       0                        Text-1  Dupe-Y
4       0                        Text-1  Dupe-Y
5       1             N          Text-2  N
6       1             Y          Text-3  Y
7       0                        Text-2  Dupe-N
8       0             Duplicate  Text-2  Dupe-N
9       0             Duplicate  Text-2  Dupe-N
10      0             Y          Text-2  Dupe-N
ID Key is just a unique key.
Count Unique marks the first time each value in the Text column appears. Available can be Y, N, or Duplicate, and Text is the main column I need to analyze. For the first occurrence of each Text value (Count Unique = 1), Results should be the value in Available; when Count Unique is 0, Results is either Dupe-Y or Dupe-N, depending on the Available value of that first occurrence.
I tried a formula like this one but got stuck after some initial progress: =IF(B2=0,"",IFERROR(IF(COUNTIF(D:D,D2)>1,IF(COUNTIF($D:$D,D2)=1,"",C2),1),1))
Note that Results is the column I need to populate, with a formula that is not affected by how the data is sorted (or whether it is sorted at all).
I guess you already have all those values and just need a formula for the Results column.
My formula works only if the data is sorted as in your example. If the sorting changes, the formula will fail.
My formula is:
=IF(B2=1;D2;"Dupe-"&RIGHT(G1;1))
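The sort-independence the question asks for can be sketched outside Excel as a two-pass lookup. The Python below is only an illustration, not the answerer's method: the dict keys `count_unique`, `available`, and `text` are hypothetical names for the question's columns, and it assumes exactly one row per Text value has Count Unique = 1.

```python
def fill_results(rows):
    """Return the Results value for each row, independent of row order."""
    # Pass 1: remember the Available value of the first-occurrence row
    # (Count Unique = 1) for every Text value.
    first_available = {
        row["text"]: row["available"]
        for row in rows
        if row["count_unique"] == 1
    }
    # Pass 2: the first-occurrence row keeps that value; every other row
    # with the same Text gets the "Dupe-" prefix.
    results = []
    for row in rows:
        value = first_available[row["text"]]
        results.append(value if row["count_unique"] == 1 else "Dupe-" + value)
    return results


# Sample rows from the question: (ID Key, Count Unique, Available, Text)
sample = [
    (1, 0, "", "Text-1"), (2, 1, "Y", "Text-1"), (3, 0, "", "Text-1"),
    (4, 0, "", "Text-1"), (5, 1, "N", "Text-2"), (6, 1, "Y", "Text-3"),
    (7, 0, "", "Text-2"), (8, 0, "Duplicate", "Text-2"),
    (9, 0, "Duplicate", "Text-2"), (10, 0, "Y", "Text-2"),
]
rows = [
    {"id": i, "count_unique": c, "available": a, "text": t}
    for i, c, a, t in sample
]
print(fill_results(rows))
```

Because the lookup is keyed on the Text value rather than on the row above, reversing or shuffling `rows` gives each row the same result.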

Unique function in python

for i in df["col"].unique():
...
Is the unique function called on every iteration of the loop here, or is it called just once, with the result stored in memory?
I am asking because, if unique were executed on every iteration, i could take the same value in the next iteration as in the previous one.
For example, if df["col"].unique() gives [1,2] the first time it is called, i would be 1 in the first iteration; if unique were called again in the second iteration, i might get 1 again.
The construction you are using evaluates .unique() once, before the loop starts, and then loops over the result of that call (provided it is iterable).
If you want a function to be evaluated on every iteration, you could use a structure like:
results = [x.function() for x in items]
Check this link for more information:
https://opensource.com/article/18/3/loop-better-deeper-look-iteration-python
Here is a quick code example to see how it works :
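import numpy as np
import pandas as pd

# 2x2 DataFrame of zeroes with a 1 at row 0, column '1'
df = pd.DataFrame(0, index=np.arange(2), columns=['1', '2'])
df.iloc[0, 0] = 1
for i in df['1'].unique():
    print(f'unique values : {i}')
    print(df)
    df.iloc[1, 0] = 2  # modify the column we are iterating over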
First, we create a 2x2 DataFrame of zeroes with a 1 at position [0][0] (row 0, column '1'):
   1  2
0  1  0
1  0  0
Then we call unique to get all the unique values of column '1' (i.e. 1 and 0).
During the loop, we change the value of one of the cells in column '1' (the column we iterate over). But as you can see in the output, this does not add any iteration to the loop.
This means that df['1'].unique() stores its result before the loop iterates over it, just as MdBrainz said, and that modifying a value during the loop does not change how many times the loop iterates.
Output:
unique values : 1
   1  2
0  1  0
1  0  0
unique values : 0
   1  2
0  1  0
1  2  0
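The same behaviour can be checked without pandas. In the sketch below, `unique_values` is a hypothetical stand-in for `Series.unique()` that counts how often it runs; the point is only that Python evaluates the expression after `in` once, before the first iteration.

```python
calls = 0


def unique_values(values):
    """Stand-in for Series.unique(): count calls, return distinct values in order."""
    global calls
    calls += 1
    return list(dict.fromkeys(values))


data = [1, 0]
seen = []
for value in unique_values(data):
    seen.append(value)
    data[1] = 2  # mutate the source while the loop is running

print(calls)  # 1: the function ran once, not once per iteration
print(seen)   # [1, 0]: the loop still walks the values computed up front
```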

Counting digit in column based on subject

I am just using formulas in excel and was wondering how you could count all the 0s until a 1 is reached, and then start the process over again, based on subject number. If this is not possible simply with formulas, how could I write a VBA code for this?
Right now I am trying to use,
=IF(OR(F4=0,F3=1),"",COUNTIFS($A$2:A2, $A$2,$F$2:F2,0)-SUM($I$2:I2))
which I input in I3 and I change the COUNTIFS($A$#:A#, $A$#...) part for each subject number.
This seems to work with the exception of the last grouping, as it won't output a number before the next subject.
Example Data:
subid  yes  number_yes(output)
1      0
1      0
1      0    3
1      1
1      0    1
1      1
1      0
2      0
2      0    2
2      1
2      0
2      0
3
etc.
A blank cell is numerically zero and that is one of your accepted conditions. Differentiate between blanks and zero values.
=IF(AND(F4<>"",OR(F4=0,F3=1)),"",COUNTIFS($A$2:A2,$A$2,$F$2:F2,0)-SUM($I$2:I2))
This is based on @Jeeped's answer. If you use -SUMIF($A$2:A2,A3,$I$2:I2) instead of -SUM($I$2:I2), you don't need to adjust this part for each subject number. Just use the following formula in I3 and copy it down.
=IF(AND(F4<>"",OR(F4=0,F3=1)),"",COUNTIFS($A$2:A3,A3,$F$2:F3,0)-SUMIF($A$2:A2,A3,$I$2:I2))
Note that I also changed the second parameter in the COUNTIFS to A3.
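For comparison, the counting rule itself (count consecutive 0s, restart after a 1 or when the subject changes) can be sketched in Python. This is an illustration under assumptions rather than a translation of the formulas above: it assumes the rows are (subid, yes) pairs with no blank cells, and it also reports a trailing run of 0s that ends at a subject change instead of at a 1.

```python
from itertools import groupby


def zero_runs(rows):
    """Yield (subid, length) for each unbroken run of 0s within a subject."""
    # groupby() with no key groups consecutive equal (subid, yes) pairs, so a
    # new run starts whenever either the subject or the value changes.
    for (subid, yes), run in groupby(rows):
        if yes == 0:
            yield subid, sum(1 for _ in run)


# (subid, yes) pairs from the example data, subjects 1 and 2 only
rows = [
    (1, 0), (1, 0), (1, 0), (1, 1), (1, 0), (1, 1), (1, 0),
    (2, 0), (2, 0), (2, 1), (2, 0), (2, 0),
]
print(list(zero_runs(rows)))
# [(1, 3), (1, 1), (1, 1), (2, 2), (2, 2)]
```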

Conditional formatting in Excel for totals

X Y Z
1 5 0
1 4 0
1 9 1
2 5 0
2 4 0
2 **8** 1
Basically, I have an Excel table with 3 variables. X is a group variable, Y is a value, and Z is a dummy variable that indicates whether the Y value in that row is a total. Is there any way to write a conditional formatting rule so that discrepancies between SUM(Y) over the same X and the supposed total are highlighted?
In the table above, the third row would not be marked because 5+4=9, so there is no discrepancy, but the 6th row would be marked because 5+4!=8, so that one should be highlighted. I appreciate your help!
Use SUMIFS();
Assuming your data starts in A1
=AND(SUMIFS($B:$B,$A:$A,$A1,$C:$C,0)<>$B1,$C1=1)
Apply it to B:B.
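The rule that formula expresses (flag a total row whose Y differs from the sum of the non-total Y values sharing its X) can be sketched in Python as follows. The function name and the tuple layout (X, Y, Z) are assumptions made for this illustration.

```python
def total_mismatches(rows):
    """Return one flag per row: True where a total row disagrees with its group's sum."""
    # Sum Y over the non-total rows (Z = 0) of each group X, like SUMIFS.
    sums = {}
    for x, y, z in rows:
        if z == 0:
            sums[x] = sums.get(x, 0) + y
    # Only total rows (Z = 1) can be flagged, mirroring the AND(..., $C1=1) part.
    return [z == 1 and sums.get(x, 0) != y for x, y, z in rows]


# (X, Y, Z) rows from the question
rows = [(1, 5, 0), (1, 4, 0), (1, 9, 1), (2, 5, 0), (2, 4, 0), (2, 8, 1)]
print(total_mismatches(rows))
# [False, False, False, False, False, True]
```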

Excel split given number into sum of other numbers

I'm trying to write formulae that will split a given number into the sum of 4 other numbers.
The other numbers are 100,150,170 and 200 so the formula would be
x = a*100+b*150+c*170+d*200 where x is the given number and a,b,c,d are integers.
My spreadsheet is set up so that column B holds the x values and columns C, D, E, F hold a, b, c, d respectively (see below).
B | C | D | E | F |
100 1 0 0 0
150 0 1 0 0
200 0 0 0 1
250 1 1 0 0
370 0 0 1 1
400 0 0 0 2
I need formulae for columns C,D,E,F (which are a,b,c,d in the formula)
Your help is greatly appreciated.
UPDATE:
Based on the research below, for input numbers greater than 730 and/or for any input number that can actually be split, use the following formulas:
100s: =CHOOSE(MOD(ROUNDUP([@number]/10;0); 20)+1;
      0;1;1;0;1;1;0;1;0;0;1;0;0;1;0;0;1;0;1;1)
150s: =CHOOSE(MOD(ROUNDUP([@number]/10;0); 10)+1;
      0;0;1;1;0;1;1;0;0;1)
170s: =CHOOSE(MOD(ROUNDUP([@number]/10;0); 5)+1;
      0;3;1;4;2)
200s: =CEILING(([@number]-930)/200;1) +
      CHOOSE(MOD(ROUNDUP([@number]/10;0); 20)+1;
      4;1;2;0;2;3;1;3;1;2;4;2;3;0;2;3;0;3;0;1)
MOD(x; 20) returns a number from 0 to 19. CHOOSE(x;a;b;...) returns the x-th of the arguments that follow x (1 => a, 2 => b, ...), which is why the formulas add 1 to the MOD result.
See more info about CHOOSE.
Use , instead of ; depending on your Windows language and region settings.
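As a sanity check, the four formulas can be transcribed into Python and tested against the identity x = a*100 + b*150 + c*170 + d*200. This is a sketch, not part of the original answer: the list names are made up, the values are copied from the CHOOSE argument lists, `math.ceil` stands in for ROUNDUP and CEILING (which assumes the behaviour of recent Excel versions for negative arguments), and a 0-based list index replaces CHOOSE's 1-based one.

```python
import math

# Values copied from the CHOOSE(...) argument lists above.
HUNDREDS = [0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1]
ONE_FIFTIES = [0, 0, 1, 1, 0, 1, 1, 0, 0, 1]
ONE_SEVENTIES = [0, 3, 1, 4, 2]
TWO_HUNDREDS_OFFSET = [4, 1, 2, 0, 2, 3, 1, 3, 1, 2, 4, 2, 3, 0, 2, 3, 0, 3, 0, 1]


def split(number):
    """Return (a, b, c, d): the counts of 100s, 150s, 170s and 200s."""
    tens = math.ceil(number / 10)           # ROUNDUP(number/10; 0)
    a = HUNDREDS[tens % 20]                 # CHOOSE(MOD(...; 20)+1; ...)
    b = ONE_FIFTIES[tens % 10]
    c = ONE_SEVENTIES[tens % 5]
    d = math.ceil((number - 930) / 200) + TWO_HUNDREDS_OFFSET[tens % 20]
    return a, b, c, d


def recombine(counts):
    a, b, c, d = counts
    return 100 * a + 150 * b + 170 * c + 200 * d


print(split(940))  # (0, 0, 2, 3)
# Every multiple of 10 from 740 to 5000 recombines to the input.
print(all(recombine(split(n)) == n for n in range(740, 5001, 10)))  # True
```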
Let's start with the assumption that you prefer 200s over 170s over 150s over 100s, i.e. 300=200+100 instead of 300=2*150, and follow the logical conclusions:
The result set can contain at most one 100, at most one 150, at most four 170s and an unlimited number of 200s (I started with nine 170s because 1700=8x200+100, but in reality there were at most four).
There are only 20 possible subsets of (100s, 150s, 170s): 2*2*5 options.
930 is the largest input number without any 200s in the result set.
Based on observation of the data points, the subset repeats periodically for
number = 740*k + 10*l, k>1, l>0. I'm not an expert on reverse-guessing periodic functions from data, but here is my work in progress (charted data points are from the table at the bottom of this answer).
The functions are probably more complicated; if I manage to get them right, I'll update the answer.
Anyway, for numbers smaller than 740, more tweaking of the formulas or a lookup table is needed (e.g. there is no way to get 730, so the result should be the same as for 740).
Here is my solution based on lookup formulas:
Following are the Python script I used to generate the data points, the formulas from the picture, and the table itself in CSV format (sorted as needed by the MATCH function):
headers = ("100s", "150s", "170s", "200s")
table = {}
for c200 in range(30, -1, -1):
    for c170 in range(9, -1, -1):
        for c150 in range(1, -1, -1):
            for c100 in range(1, -1, -1):
                nr = 200*c200 + 170*c170 + 150*c150 + 100*c100
                if nr not in table and nr <= 6000:
                    table[nr] = (c100, c150, c170, c200)
print("number\t" + "\t".join(headers))
for r in sorted(table):
    c100, c150, c170, c200 = table[r]
    print("{:6}\t{:2}\t{:2}\t{:2}\t{:2}".format(r, c100, c150, c170, c200))
__________
=IF(E$1<740; 0; INT((E$1-740)/200))
=E$1 - E$2*200
=MATCH(E$3; table[number]; -1)
=INDEX(table[number]; E$4)
=INDEX(table[100s]; E$4)
=INDEX(table[150s]; E$4)
=INDEX(table[170s]; E$4)
=INDEX(table[200s]; E$4) + E$2
__________
number,100s,150s,170s,200s
940,0,0,2,3
930,1,1,4,0
920,0,1,1,3
910,0,0,3,2
900,1,0,0,4
890,0,1,2,2
880,0,0,4,1
870,1,0,1,3
860,0,1,3,1
850,1,1,0,3
840,1,0,2,2
830,0,1,4,0
820,1,1,1,2
810,1,0,3,1
800,0,0,0,4
790,1,1,2,1
780,1,0,4,0
770,0,0,1,3
760,1,1,3,0
750,0,1,0,3
740,0,0,2,2
720,0,1,1,2
710,0,0,3,1
700,1,0,0,3
690,0,1,2,1
680,0,0,4,0
670,1,0,1,2
660,0,1,3,0
650,1,1,0,2
640,1,0,2,1
620,1,1,1,1
610,1,0,3,0
600,0,0,0,3
590,1,1,2,0
570,0,0,1,2
550,0,1,0,2
540,0,0,2,1
520,0,1,1,1
510,0,0,3,0
500,1,0,0,2
490,0,1,2,0
470,1,0,1,1
450,1,1,0,1
440,1,0,2,0
420,1,1,1,0
400,0,0,0,2
370,0,0,1,1
350,0,1,0,1
340,0,0,2,0
320,0,1,1,0
300,1,0,0,1
270,1,0,1,0
250,1,1,0,0
200,0,0,0,1
170,0,0,1,0
150,0,1,0,0
100,1,0,0,0
0,0,0,0,0
Assuming that you want as many of the highest values as possible (so 500 would be 2*200 + 100), try this approach, assuming the numbers to split are in B2 and down:
Insert a header row with the 4 numbers, e.g. 100, 150, 170 and 200 in the range C1:F1
Now in F2 use this formula:
=INT(B2/F$1)
and in C2, copied across to E2:
=INT(($B2-SUMPRODUCT(D$1:$G$1,D2:$G2))/C$1)
Now you can copy the formulas in C2:F2 down all rows
That should give the results from your table
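The largest-first rule in these formulas can be sketched in Python to see what it does with a particular input. `greedy_split` is a hypothetical name, and the code mirrors the INT divisions above rather than reproducing them exactly. It also returns whatever is left over, because some inputs (250, for example) leave a remainder under this rule even though they can be split as 100 + 150.

```python
def greedy_split(number, parts=(200, 170, 150, 100)):
    """Take as many of each part as fit, largest first; return counts and remainder."""
    counts = {}
    remainder = number
    for part in parts:  # same order as filling column F, then E, D and C
        counts[part], remainder = divmod(remainder, part)
    return counts, remainder


print(greedy_split(500))  # ({200: 2, 170: 0, 150: 0, 100: 1}, 0)
print(greedy_split(370))  # ({200: 1, 170: 1, 150: 0, 100: 0}, 0)
print(greedy_split(250))  # ({200: 1, 170: 0, 150: 0, 100: 0}, 50)
```

A non-zero remainder is the signal that the input needs a different combination, such as one taken from the lookup table.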
