Nested SEARCH and IF grid - excel

Column A lists the categories for individual products. Each product will have between 1 to 14 categories. The goal is to split all categories into separate cells (Columns B through O), to become easier to sort.
I've created formulas for columns B through O, to SEARCH for hyphens "-" and separate each category into its own column. Here's what the output should look like (except for the bottom row):
A B C D E F G ... O
Categories Cat 1 Cat 2 Cat 3 Cat 4 Cat 5 Cat 6 ... Cat 14
UX UX
CFT-WET CFT WET
WEM-US-CFT WEM US CFT
NC-US-CFT NC US CFT
TP-OB-SB-WEB TP OB SB WEB
DB-B-FC DB B FC
P-TP-SB-CP-DT P TP SB CP DT
DP-S-OB-WB-SB-FC DP S OB WB SB FC
P-TP-SB-CP-WEB-WS-S-TP-OB-C-CT-G-FC-MCB
I initially built these formulas:
Col B: =if(iserror(search("-",$A5)),$A5,search("-",$A5))
Col C: =if(iserror(search("-",$A5,sum(search("-",$A5)+1))),"",mid($A5,sum(search("-",$A5),1),sum(search("-",$A5,sum(search("-",$A5)+1)),-search("-",$A5),-1)))
Col D: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5)+1)+1),-search("-",$A5,search("-",$A5)+1),-1)))
Col E: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5)+1)+1),-1)))
Col F: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1),-1)))
Col G: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1),-1)))
Col H: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1),-1)))
Col I: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1),-1)))
Col J: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1),-1)))
Col K: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1),-1)))
Col L: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1),-1)))
Col M: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-1)))
Col N: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-1)))
Col O: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)),"",mid($A5,sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1)+1),-1)))
...but these fail to recognize the final category for each row. Col B's formula has no problems, so I began to edit the formulas in Columns C to O:
Col C: =if(iserror(search("-",$A5,sum(search("-",$A5)+1))),if($B5<>"",right($A5,sum(len($A5),-search("-",$A5))),""),mid($A5,sum(search("-",$A5),1),sum(search("-",$A5,sum(search("-",$A5)+1)),-search("-",$A5),-1)))
Col D: =if(iserror(search("-",$A5,search("-",$A5,search("-",$A5)+1)+1)),if($C5<>"",if(iserror(search("-",$A5,search("-",$A5)+1)),"",right($A5,len(sum($A5,-search("-",$A5,search("-",$A5)+1))))),""),mid($A5,sum(search("-",$A5,search("-",$A5)+1),1),sum(search("-",$A5,search("-",$A5,search("-",$A5)+1)+1),-search("-",$A5,search("-",$A5)+1),-1)))
It appears this solution will work for all situations except when Column A has only one category. When that happens, a #VALUE! error will populate all columns to the right of Col B. How do I solve this?
For now, I'm stuck with this:
A B C D E F G ... O
Categories Cat 1 Cat 2 Cat 3 Cat 4 Cat 5 Cat 6 ... Cat 14
UX UX #VALUE! #VALUE! #VALUE! #VALUE! #VALUE! ... #VALUE!
CFT-WET CFT WET
WEM-US-CFT WEM US CFT
NC-US-CFT NC US CFT
TP-OB-SB-WEB TP OB SB WEB
DB-B-FC DB B FC
P-TP-SB-CP-DT P TP SB CP DT
DP-S-OB-WB-SB-FC DP S OB WB SB FC
P-TP-SB-CP-WEB-WS-S-TP-OB-C-CT-G-FC-MCB

Here's what I would use
I'm assuming row 1 of your spreadsheet contains "Cat 1", "Cat 2" etc. This formula relies on there being a space between "Cat" and the category number. I'm also assuming column A contains the concatenated string.
Note this requires that no category will ever have the character : in it. If there is a risk of this, you can modify the ":" to become any other character in the formula below (the below example goes in cell B2
=IF(LEN($A2)-LEN(SUBSTITUTE($A2,"-",""))+1>=VALUE(MID(B$1,SEARCH(" ",B$1)+1,100)), MID("-"&$A2&"-",SEARCH(":",SUBSTITUTE("-"&$A2&"-","-",":",VALUE(MID(B$1,SEARCH(" ",B$1)+1,100))))+1,SEARCH(":",SUBSTITUTE("-"&$A2&"-","-",":",VALUE(MID(B$1,SEARCH(" ",B$1)+1,100)+1)))-SEARCH(":",SUBSTITUTE("-"&$A2&"-","-",":",VALUE(MID(B$1,SEARCH(" ",B$1)+1,100))))-1),"")
A brief explanation: the last optional term of SUBSTITUTE is instance_num, which means you can change the nth instance of "-" to ":" and then search on that. This keeps the formula pretty simple. I've also added a "-" before and after the string in the formula to avoid having to use any if statements (there will always be a trailing and preceding "-")

Related

Pandas Filter rows by comparing columns A1 with A2

CHR
SNP
BP
A1
A2
OR
P
8
rs62513865
101592213
T
C
1.00652
0.8086
8
rs79643588
106973048
A
T
1.01786
0.4606
I have this table example, and I want to filter rows by comparing column A1 with A2.
If this four conditions happen, delete the line
A1
A2
A
T
T
A
C
G
G
C
(e.g. line 2 in the first table).
How can i do that using python Pandas ?
here is one way to do it
Combine the two columns for each of the two DF. Make it a list in case of the second DF and search the first combination in the second one
df[~(df['A1']+df['A2']).str.strip()
.isin(df2['A1']+df2['A2'].tolist())]
CHR SNP BP A1 A2 OR P
0 8 rs62513865 101592213 T C 1.00652 0.8086
keeping
Assuming df1 and df2, you can simply merge to keep the common values:
out = df1.merge(df2)
output:
CHR SNP BP A1 A2 OR P
0 8 rs79643588 106973048 A T 1.01786 0.4606
dropping
For removing the rows, perform a negative merge:
out = (df1.merge(df2, how='outer', indicator=True)
.loc[lambda d: d.pop('_merge').eq('left_only')]
)
Or merge and get the remaining indices to drop (requires unique indices):
out = df1.drop(df1.reset_index().merge(df2)['index'])
output:
CHR SNP BP A1 A2 OR P
0 8.0 rs62513865 101592213.0 T C 1.00652 0.8086
alternative approach
As it seems you have nucleotides and want to drop the cases that do not match a A/T or G/C pair, you could translate A to T and C to G in A1 and check that the value is not identical to that of A2:
m = df1['A1'].map({'A': 'T', 'C': 'G'}).fillna(df1['A1']).ne(df1['A2'])
out = df1[m]

How to create a one to many relationship?

The title may be confusing/misleading; I'm frankly having trouble trying to say what I need in a concise manner.
I have 2 lists of distinct values in Excel.
List A:
1
2
3
List B:
C
D
E
I need to create a sheet that shows a one to many relationship where List A is the 'One' and list B is the 'Many'. So the result would be something like :
Ouput:
1 C
1 D
1 E
2 C
2 D
2 E
3 C
3 D
3 E
The results are not concatenated and are in their own cols/rows. Any suggestions?
Assuming list 1 is in A1:A3, list to is in B1:B3. Then in D1 put :
=IF(CEILING(ROW()/ROWS($A$1:$A$3),1)>ROWS($A$1:$A$3),"",INDIRECT("A"&CEILING(ROW()/ROWS($A$1:$A$3),1),TRUE))
and in E1 :
=IF(CEILING(ROW()/ROWS($B$1:$B$3),1)>ROWS($B$1:$B$3),"",INDIRECT("B"&IF(MOD(ROW(),ROWS($B$1:$B$3))=0,ROWS($B$1:$B$3),MOD(ROW(),ROWS($B$1:$B$3))),TRUE))
and drag both downwards.
Idea : Use row() to 'guide' how the which cell will indirect() address to. You can test the given mod() and ceiling function separately to 'examine' how the pattern works. [do ask if you didn't get it.] (:
please share if it works/not.

Can I use SUMPRODUCT to accomplish this?

Need to sum a range based on if a value is in a column and one of a set of values is in another column, or vice versa.
e.g. I have The following table:
A B C D
M C C 1
F C C 2
S N C 3
S N N 4
M - C 5
N C C 6
M C N 7
If (Column A contains "M" or "S") AND ((Column B contains "C" AND Column C Contains "C" Or "N" Or "-") OR (Column C contains "C" AND Column B Contains "C" Or "N" Or "-")) Then Sum column D
So from my table my results would be
1 + 3 + 5 + 7 = 16
You can use SUMPRODUCT like this:
=SUMPRODUCT(ISNUMBER(MATCH(A2:A10,{"M","S"},0)*MATCH(B2:B10&"^"&C2:C10,{"C^C","C^N","C^-","N^C","-^C"},0))+0,D2:D10)
MATCH is used to check for both valid possibilities in column A and then all 5 possibilities for concatenated columns B and C - if those conditions are met then column D will be summed. Extend column ranges as required but preferably don't use whole columns
.....or shorter with SUMIFS like this:
=SUM(SUMIFS(D:D,A:A,{"M";"S"},B:B,{"C","C","C","N","-"},C:C,{"C","N","-","C","C"}))
For that version you can use whole columns with no loss of efficiency.
Note that in this version all the separators in the array constants are commas EXCEPT for the semi-colon in {"M";"S"} which needs to be that way
I would add a fifth column with a condition for the current line returning the value in D if all conditions are true or 0 otherwise.
=Iif(AND(Or($A1 = "M", $A1 = "S"),OR(AND($B1 = "C",Or($C1 = "C",$C1 = "N",$C1 = "-")), AND($C1 = "C",OR($B1 = "C",$B1 = "N",$B1 = "-")))),$D1,0)
Then in a cell somewhere write =sum($E:$E). With your example, I get 16, as intended.

How to align and merge matching values, from various columns, in excel

I have had some success matching data from various columns and creating a new data output. Here is what I typically start with;
COL A COL B COL C COL D
ITEM VALUE ITEM VALUE2
---- ---- ---- ----
A 1 B 100
B 2 A 200
C 3 F 300
G 4 E 400
H 5 C 500
J 6 M 600
And I can achieve this result using VLOOKUP;
COL E COL F COL G
ITEM VALUE VALUE2
---- ---- ----
A 1 200
B 2 100
C 3 500
G 4
H 5
J 6
But what I'm really after is this: both matching AND merging;
COL E COL F COL G
ITEM VALUE VALUE2
---- ---- ----
A 1 200
B 2 100
C 3 500
E 400
F 300
G 4
H 5
J 6
M 600
I'm punching a tad above my weight class with this one, so any help would be greatly appreciated.
With something like:
=COUNTIF(A:A,C2)
in a column and copied down you should get indication of which items are not already present in ColumnA. Sort with C:D on that basis and simply copy the items associated with 0 to ColumnA and their values to ColumnD, then sort and apply your VLOOKUP.
The following is based on: http://www.get-digital-help.com/2009/05/25/create-a-drop-down-list-containing-only-unique-distinct-alphabetically-sorted-text-values-using-excel-array-formula/#comment-74005
It's a great site for clever use of Excel formulas and does a good job of explaining the monster formulas below, creating dynamic named ranges, and how to work with array formulas. Look at it!
My solution is not quite as elegant as I'd like but it works. So, you'll need to create some named ranges.
One for the items in COL A (e.g. "Items1") and one for the items in COL C (e.g. "Items2")
In COL G we'll have another List "AllItems1". In COL H we'll have another List "AllItems2". You'll need to make a named range of "AllItems1" too.
The array formula in COL G is:
=IFERROR(IFERROR(INDEX(Items1,MATCH(0,IF(MAX(NOT(COUNTIF($G$1:$G1,Items1))* COUNTIF(Items1,">"&Items1)+1))=(COUNTIF(Items1,">"&Items1)+1),0,1),0)),INDEX(Items2,MATCH(0,IF(MAX(NOT(COUNTIF($G$1:$G1,Items2))*(COUNTIF(Items2,">"&Items2)+1))=(COUNTIF(Items2,">"&Items2)+1),0,1),0))),"")
Which is a mouthful. This cascades through each list (i.e. Items1 and Items2) and gives you a non-repetitive sorta alphabetized list of the items in "Items1" and "Items2".
COL G is really just an intermediate step to get us to COL H. I haven't found a way to combine two lists without doing this step. Let me know if you find a better way.
To get a non-repetitive AND alphabetized list of the items we put the following array formula in COL H:
=IFERROR(INDEX(AllItems1,MATCH(0,IF(MAX(NOT(COUNTIF($H$1:$H1,AllItems1))*(COUNTIF(AllItems1,">"&AllItems1)+1))=(COUNTIF(AllItems1,">"&AllItems1)+1),0,1),0)),"")
Then we do VLOOKUP s off of "AllItems2".
Put the following VLOOKUP in COL I :
=IFERROR(VLOOKUP($H2,$A$2:$B$7,2,0),"")
And put the following VLOOKUP in COL J:
=IFERROR(VLOOKUP($H2,$C$2:$D$7,2,0),"")
Of course, you could make "AllItems2" a named range also and use that in the VLOOKUP s.
I think that gets you what you want.

How to sum constants if the values of a row contian a specific value in excel?

I have the following row in excel:
12 4 12p 12a 12b
I need to sum this elements with their values from the legend.
12 = 12;
4 = 4;
12p = 12,5;
12a = 12,2;
12b = 12,3;
For example
=12 + 4 + 12,5 + 12,2 + 12,3
Any ideas?
If you have all the elements within one cell as a single string of text, the optimal approach would be to start by using text-to-column to split them up. So you'll have 12 in A, 4 in B, 12p in C, 12a in D, 12b in E. If that's not an option, I can show you string manipulations that can be an alternative.
You'll need to turn your "legend" into a look-up table, (perhaps on sheet2?), with column A having: p, a, b, etc.. and column B having the relative values.
Once that's done, place this formula on sheet1, in F column:
=A2+IFERROR(VLOOKUP(RIGHT(A2),Sheet2!$A:$B,2,FALSE),0)
Then drag it to the right 5 times, and it will have the values of the elements "translated".
You can sum the translated range easily.

Resources