Excel Conditional Data and Text Manipulation - excel

I have an Excel spreadsheet with a number of columns, the most important of which are called "sequence", "modifications" and "signal". The "sequence" column contains entries that repeat once for each different "modification". Each sequence with a given "modification" is assigned a certain "signal" value.
Sequence  Modification   Signal
ABCDEF    None           100
ABCDEF    Carba          200
ABCDEF    NEIAA          300
ABCDEF    NEIAA,Carba    400
ABCDEFG   None           400
ABCDEFG   Carba          600
ABCDEFG   NEIAA          700
ABCDEFG   NEIAA, Carba   800
ABCDEFG   2XNEIAA        900
The task I am having a problem with consists of the following steps: cluster the rows that share a sequence but differ in modification; get the total signal for that cluster; divide the signal of each individual entry by that total to get a percentage; split the percentages into entries with and without the NEIAA tag; and sum the values for all entries with the NEIAA tag, reporting the result as the final "% MODIFICATION" value for that cluster.
For example, sequence "ABCDEF" has a total signal of 1000, with 30% and 40% belonging to entries with the NEIAA tag, so the total % modification for this cluster is 70%. Similarly, for sequence "ABCDEFG" the total % modification is equal to 100*(700+800+900)/(400+600+700+800+900).
Either a formula or VBA would work for me.
My sample data is as below:

With
=SUMIFS(C:C,A:A,A2)
you can sum all signals for a sequence (which you already have), and with
=SUMIFS(C:C,A:A,A2,B:B,"*NEIAA*")
you can sum only the ones whose modification includes NEIAA.
Putting everything together (the result should appear only once, on the first row of each sequence, while the formula stays draggable), just put this in F2:
=IF(AND(COUNTIF($A$1:A1,A2)=0,LEN(A2)>0),SUMIFS(C:C,A:A,A2,B:B,"*NEIAA*")/SUMIFS(C:C,A:A,A2),"")
COUNTIF($A$1:A1,A2)=0 checks that the value in column A appears for the first time, and LEN(A2)>0 skips blank cells ;)
The result is a fraction, so format the cell as a percentage (or multiply by 100) to get the % value.
If you still have questions, just ask.
EDIT:
Assuming everything is shifted one column to the right and column A now holds a unique keyword, so that each combination of columns A and B is treated the way column A alone was before, you can try this (put it in G2 and then auto-fill down as far as you need):
=IF(AND(COUNTIFS($B$1:B1,B2,$A$1:A1,A2)=0,LEN(B2)>0),SUMIFS(D:D,B:B,B2,C:C,"*NEIAA*",A:A,A2)/SUMIFS(D:D,B:B,B2,A:A,A2),"")
As said:
- everything is shifted to the right (insert a column in front of everything)
- column A now holds the "run", so the formula sums everything having the same "run" AND "sequence" (and the "NEIAA" part)
If you still have any questions, just ask :)
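For anyone who wants to cross-check the formula results outside Excel, here is a minimal Python sketch of the same calculation on the sample data from the question. The function name `percent_modified` is made up for illustration. One difference to keep in mind: the `"*NEIAA*"` wildcard in SUMIFS is case-insensitive, while the substring test below is case-sensitive.

```python
from collections import defaultdict

# Sample rows from the question: (sequence, modification, signal)
ROWS = [
    ("ABCDEF", "None", 100),
    ("ABCDEF", "Carba", 200),
    ("ABCDEF", "NEIAA", 300),
    ("ABCDEF", "NEIAA,Carba", 400),
    ("ABCDEFG", "None", 400),
    ("ABCDEFG", "Carba", 600),
    ("ABCDEFG", "NEIAA", 700),
    ("ABCDEFG", "NEIAA, Carba", 800),
    ("ABCDEFG", "2XNEIAA", 900),
]


def percent_modified(rows, tag="NEIAA"):
    """Return {sequence: % of total signal carried by rows whose modification contains tag}."""
    total = defaultdict(float)   # plays the role of SUMIFS(C:C,A:A,A2)
    tagged = defaultdict(float)  # plays the role of SUMIFS(C:C,A:A,A2,B:B,"*NEIAA*")
    for sequence, modification, signal in rows:
        total[sequence] += signal
        if tag in modification:
            tagged[sequence] += signal
    # Skip sequences with zero total signal to avoid dividing by zero
    return {seq: 100.0 * tagged[seq] / total[seq] for seq in total if total[seq]}


result = percent_modified(ROWS)
for sequence, percent in result.items():
    print(sequence, round(percent, 2))
```

Like the formula, this counts "2XNEIAA" as an NEIAA entry because it only checks for the substring.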

Related

Count Matches in an Array, Duplicates Once

I have an array with a number of columns but am looking to count based on two columns in particular. I'm looking to have a function that will count how many blank products (column C) are in production (Column B). See image below, the desired output here would be 2 (111 and 333 are in production. 111 appears twice but should be counted only once).
Case Example Image
I'm sure there is a better way to do this, but this will get the job done.
=COUNT(UNIQUE(C1:C4*--(B1:B4="In Production")))-1
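Note that the -1 in the formula removes the 0 that rows not in production contribute to the UNIQUE list, so it assumes at least one such row exists. The same "count distinct values that meet a condition" idea can be sketched in Python. The rows below are hypothetical, since the original data was only shown in an image, and `count_unique_in_production` is an illustrative name.

```python
# Hypothetical rows standing in for the image: (status, product)
ROWS = [
    ("In Production", 111),
    ("In Production", 111),
    ("Shipped", 222),
    ("In Production", 333),
]


def count_unique_in_production(rows, status="In Production"):
    """Count distinct products whose status matches, ignoring empty products."""
    distinct = {product for state, product in rows
                if state == status and product is not None}
    return len(distinct)


print(count_unique_in_production(ROWS))
```

Using a set means duplicates collapse automatically and no correction term is needed.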

Excel multilevel array formula with partial string matches to sum resultant cells

I've been trying to sort this for over a day now without much luck. I have successfully used SUMIFS, INDEX, MATCH, COUNTIF, "--" etc array functions previously and am not a novice, but also not an expert on these. I can't seem to weave these together correctly, and likely on an altogether incorrect path.
Basically, I am trying to aggregate data from multiple spreadsheets, requiring a mapping of various items (rows) into a canonical form for summing.
The image here shows a representative, but simplified version of my quest. Each "region" on this example spreadsheet (Final..., Mapping, DataSet1, DataSet2) is actually in different spreadsheets, and there are several sheets with 50-150 rows in each xlsx.
Note that the names in Column B are quite arbitrary (meaning not all P1's have an 'x' pattern, as shown here with x1, x2, etc.). Do not rely on any pattern in the names, except that the x, y, z in the Mapping table are substrings (case-insensitive, trailing match) of the names in Column B in the DataSets.
And in the image, the Final Result Table (summed manually) is what I want to compute via an (array) formula. A single formula would be ideal (I have many spreadsheets from which the monthly data is being pulled, so I can't readily modify them, but I can create an interim spreadsheet if required, so I am open to helper columns or helper rows).
Here's the process. For each name (B3-B5) in the Final Result Table, I want to sum the name from its components as follows:
Look up all the matches in the Mapping Table (so for P1, the formula =IF($C$10:$C$15=$B3, $B$10:$B$15,"") gives {"x1";"";"";"x2";"";"x3"}).
I then want to search for each of x1, x2, and x3 in B19:B26 to get rows 21, 22, 24, 25, 26 in DataSet1, and in B31:B35 to get row 32 in DataSet2, and then add up the Jan totals into C3 (effectively, C3=C21+C22+C24+C25+C26+C32). Same for P2 and P3, and through Feb, Mar, ...
I am stuck on how to remove blank or 0 or Div0 or such "error rows" from the interim result in 2, and also need to use 2 arrays of different sizes (3 valid rows in example 2 above, ignoring blanks) to search many rows in DataSets. I tried SEARCH("*"&IF($C$10:$C$15=$B3, $B$10:$B$15,""), $B$19:$B$26) but get unexpected results. I have tried to replace text in the interim result {"x1";"";"";"x2";"";"x3"} with TRUE/FALSE, and 1/0, etc. to help with INDEX or MATCH, but am stymied by errors in downstream ("surrounding") formulas.
Thanks in advance.
Here is a solution without resorting to nasty (imo) CSE formulas.
= SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,2),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
+
SUMPRODUCT($C$31:$F$35*(COUNTIFS($B$10:$B$15, RIGHT($B$31:$B$35,2),$C$10:$C$15,$B3)>0)*($C$30:$F$30=C$2))
There is one SUMPRODUCT for each data set. If possible, it would be better to put all your data sets into a single table with a column identifying which data set each row belongs to.
It works by taking each value in your data set and multiplying it by whether the 2 rightmost characters of its name appear in your mapping table for that P code, and by whether the value is in the correct month. Each product is therefore 0 if either condition is false, and SUMPRODUCT returns the sum of what remains.
UPDATE IN RESPONSE TO OP COMMENTS
If the X, Y, Z codes are not always 2 characters long but the first part is ALWAYS 8 characters, you can easily amend the:
RIGHT($B$19:$B$26,2)
to be:
RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8)
Making the formula for the first data set:
=SUMPRODUCT($C$19:$F$26*(COUNTIFS($B$10:$B$15, RIGHT($B$19:$B$26,LEN($B$19:$B$26)-8),$C$10:$C$15,$B3)>0)*($C$18:$F$18=C$2))
And you can amend for other data sets and simply add them together.
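To make the logic of the SUMPRODUCT/COUNTIFS approach easier to follow, here is a rough Python sketch of the same idea. Everything in it is an assumption made for illustration: the data values, the 8-character name prefix, the mapping entries other than x1, x2 and x3 for P1, and the function name `total_for`.

```python
# Hypothetical mapping rows (substring, canonical name). Only the P1 entries
# x1, x2 and x3 come from the question; the rest are made up.
MAPPING = [("x1", "P1"), ("y1", "P2"), ("z1", "P3"),
           ("x2", "P1"), ("y2", "P2"), ("x3", "P1")]

# Hypothetical data sets: (name, {month: value}). Each name is assumed to be
# an 8-character prefix followed by the code.
DATASET_1 = [
    ("AAAAAAAAx1", {"Jan": 10, "Feb": 1}),
    ("BBBBBBBBy1", {"Jan": 20, "Feb": 2}),
    ("CCCCCCCCX2", {"Jan": 30, "Feb": 3}),
]
DATASET_2 = [
    ("DDDDDDDDx3", {"Jan": 40, "Feb": 4}),
    ("EEEEEEEEz1", {"Jan": 50, "Feb": 5}),
]


def total_for(code, month, mapping, datasets, prefix_len=8):
    """Sum one month's values over all rows whose name suffix maps to code."""
    # Codes mapped to this canonical name, lower-cased because COUNTIFS
    # compares text case-insensitively
    wanted = {sub.lower() for sub, target in mapping if target == code}
    total = 0
    for dataset in datasets:  # one SUMPRODUCT per data set in the formula
        for name, by_month in dataset:
            # Equivalent of RIGHT(name, LEN(name) - 8)
            if name[prefix_len:].lower() in wanted:
                total += by_month.get(month, 0)
    return total


print(total_for("P1", "Jan", MAPPING, [DATASET_1, DATASET_2]))
```

Looping over a list of data sets mirrors the advice above about keeping all data sets in one table: adding a data set does not change the logic.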
Nice challenge! Are you willing to drop all your tables (DataSet1, DataSet2...) into one spreadsheet, so that we can refer to just one single range for each month?
Here's one solution (hopefully a good starting point) - array formula (Ctrl+Shift+Enter):
=SUMPRODUCT(IFERROR(IF(TRANSPOSE(IF($B3=$C$10:$C$15,$B$10:$B$15,""))=RIGHT($B$18:$B$36,2),C$18:C$36,0),0))

How to find the index of remaining columns if the data is repetitive

I have data entries like this (image: Data entries).
Now I need to find the 10 smallest values and also get the corresponding person, area and date along with them.
I used the SMALL function to find the 10 smallest values. Then I used the INDEX and MATCH functions to get the corresponding row entries. The problem is that some values repeat, so these functions return the row of the first 2 for all the remaining 2s. How can I solve this?
In F2 use Rank like this, so you have unique numbers:
=RANK(C2,$C$2:$C$21,1)+ROW()/1000
In G2 use SMALL to pull the smallest of the ranked numbers, and copy it down 10 rows.
=SMALL($F$2:$F$21,ROW(A1))
Now you can pull person, date, real hours and area with an INDEX/MATCH in H2, copied across and down.
=INDEX(A$2:A$21,MATCH($G2,$F$2:$F$21,0))
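The trick above is to make tied values distinct by adding a small row-dependent amount (ROW()/1000, which stays below 1 only while the data sits in fewer than 1000 rows). A minimal Python sketch of the same tie-breaking idea follows. The sample rows and the name `smallest_n` are hypothetical, since the original data was only shown in an image.

```python
# Hypothetical rows: (person, hours)
ROWS = [("Ann", 5), ("Bob", 2), ("Cid", 2), ("Dee", 7), ("Eve", 2)]


def smallest_n(rows, n, key_index):
    """Return the n rows with the smallest value, keeping tied rows distinct."""
    # Pair each value with its row position so equal values become unique
    # keys, which is what adding ROW()/1000 to the rank does in the sheet
    keyed = sorted((row[key_index], position) for position, row in enumerate(rows))
    return [rows[position] for _, position in keyed[:n]]


print(smallest_n(ROWS, 3, 1))
```

Because the position is part of the sort key, each of the tied 2s maps back to its own row instead of the first match.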

Excel - find the biggest gap between numbers in rows

I have an Excel file with >12500 rows in one column.
Each row contains a string of 20 random numbers like these:
2,3,4,6,7,8,12,13,14,24,30,42,45,46,48,50,56,58,**59**,61
1,2,6,8,11,12,13,16,17,21,24,27,28,33,34,42,44,48,58,61
3,7,10,13,14,15,18,21,23,24,25,29,30,34,37,48,51,56,57,60
8,11,13,16,17,19,21,27,29,35,36,39,42,44,46,50,53,54,57,60
2,4,7,9,21,26,28,30,32,34,35,37,38,39,43,44,50,60,61,62
10,13,15,18,21,22,23,24,25,26,40,42,48,49,51,52,56,**59**,61,62
1,2,4,7,14,15,18,20,24,29,30,32,35,41,42,50,52,55,58,62
1,4,8,9,10,12,17,24,25,33,37,41,43,44,46,49,52,**59**,61,62
1,2,4,6,9,12,15,17,21,24,30,31,32,36,41,44,47,48,51,58
2,7,10,12,15,16,20,24,25,27,30,33,39,44,45,52,54,55,58,60
5,7,10,11,20,22,24,31,32,33,36,38,39,41,43,47,50,52,56,58
3,6,8,9,14,15,19,21,25,28,34,37,39,45,47,54,55,56,57,**59**
1,2,3,4,5,8,14,15,18,20,23,31,33,37,42,45,46,51,52,55
I need to know the biggest gap between rows in which a number has not repeated. For example, if I search for a number (e.g. 59), I need to know the largest gap between two rows containing 59.
In this example it's a 4-row gap between 59s.
I hope I have made myself clear.
Seems like a fun problem which admits a simple but not quite obvious answer. First, make sure that the data is in 20 columns (use the Text to Columns feature under the Data tab). Using your example, I came up with a spreadsheet that looks like:
V1 holds the target number. The formulas are in column U.
In U1 I entered:
=IF(ISNA(MATCH($V$1,A1:T1,0)),1,0)
This formula uses MATCH to test whether the value in V1 lies in the range to the left of it. If it doesn't, MATCH returns #N/A, and ISNA checks for this error value. If the error is present, the overall formula returns 1 (since there is now 1 consecutive row without the target number); otherwise it returns 0.
The formula in U2 is similar with a little twist:
=IF(ISNA(MATCH($V$1,A2:T2,0)),1+U1,0)
The same basic logic -- but rather than returning 1 if the target number isn't present it adds 1 to the number above. The formula is then copied down the rest of the range. It has the effect of keeping a running total of consecutive rows without the target value. This running total is reset to 0 whenever a row with the target value is encountered.
The final ingredient requires no comment. In U14 I just have
=MAX(U1:U13)
which is the number you are looking for (assuming that the maximum number of consecutive rows without the target number is what you are looking for, even if this occurs either at the top or bottom of the data. If you want the largest gap that is literally between two rows where the number occurs, the logic would need to be made more complex).
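The running-total logic in column U can be sketched in a few lines of Python, using the 13 sample rows from the question with the highlighting around 59 removed. The function name `longest_absence` is made up for illustration. Like the spreadsheet version, it also counts runs at the very top or bottom of the data.

```python
SAMPLE = """\
2,3,4,6,7,8,12,13,14,24,30,42,45,46,48,50,56,58,59,61
1,2,6,8,11,12,13,16,17,21,24,27,28,33,34,42,44,48,58,61
3,7,10,13,14,15,18,21,23,24,25,29,30,34,37,48,51,56,57,60
8,11,13,16,17,19,21,27,29,35,36,39,42,44,46,50,53,54,57,60
2,4,7,9,21,26,28,30,32,34,35,37,38,39,43,44,50,60,61,62
10,13,15,18,21,22,23,24,25,26,40,42,48,49,51,52,56,59,61,62
1,2,4,7,14,15,18,20,24,29,30,32,35,41,42,50,52,55,58,62
1,4,8,9,10,12,17,24,25,33,37,41,43,44,46,49,52,59,61,62
1,2,4,6,9,12,15,17,21,24,30,31,32,36,41,44,47,48,51,58
2,7,10,12,15,16,20,24,25,27,30,33,39,44,45,52,54,55,58,60
5,7,10,11,20,22,24,31,32,33,36,38,39,41,43,47,50,52,56,58
3,6,8,9,14,15,19,21,25,28,34,37,39,45,47,54,55,56,57,59
1,2,3,4,5,8,14,15,18,20,23,31,33,37,42,45,46,51,52,55
"""

# One set of integers per row, the equivalent of splitting into 20 columns
ROWS = [{int(x) for x in line.split(",")} for line in SAMPLE.splitlines()]


def longest_absence(rows, target):
    """Longest run of consecutive rows that do not contain target."""
    longest = current = 0
    for row in rows:
        # Reset on a hit, otherwise extend the run (the column U formula)
        current = 0 if target in row else current + 1
        longest = max(longest, current)  # the final MAX over column U
    return longest


print(longest_absence(ROWS, 59))  # 4
```

For 59 this gives 4, matching the gap stated in the question.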

separate Last Name, First Name and Middle Initial in three different columns

I have a file which contains Last Name, First Name MI for about 5000 people.
I need to split them in 3 different columns.
The issue I am facing is that sometimes there is more than one first name. For example, I have a person listed as Davis, Mary Ann L.
I want Davis in one column,
Mary Ann in another column and L in the third column. Basically, check whether each word after the comma has more than 1 character. If it has more than 1, treat it as part of the first name. If it has exactly 1 character, treat it as the middle initial.
How can I achieve this?
In your case, I would start with the "Text to Columns" command. Just select the whole column, then choose Data -> Text to Columns. Choose "Delimited", then Next, then select "Space".
After this, I would look through the processed data to get a picture. I assume that most records will already be OK at this point, and the records that are exceptions to the standard should be easily identifiable. You could even filter for them.
Only then, in a third step, I'd write a formula which processes the columns you have created in the first step.
Or, possibly a formula is not necessary at all. Possibly you can just easily filter and process some of the exceptions manually.
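If a scripted approach is an option, the rule described in the question (a one-character word after the comma is the middle initial) can be sketched in Python. This is only an illustration under assumptions: the name `split_name` is made up, only the last word is checked for being an initial, and a trailing period is stripped from it.

```python
def split_name(full_name):
    """Split 'Last, First [More] [M.]' into (last, first, middle_initial)."""
    last, _, rest = full_name.partition(",")
    tokens = rest.split()
    middle = ""
    # Treat the final word as the middle initial only if it is a single
    # character (ignoring a trailing period) and is not the only word
    if len(tokens) > 1 and len(tokens[-1].rstrip(".")) == 1:
        middle = tokens.pop().rstrip(".")
    return last.strip(), " ".join(tokens), middle


print(split_name("Davis, Mary Ann L."))
```

Records that do not fit the pattern (no comma, several initials, suffixes) would still need the manual review described above.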
