Dealing with a variable number of samples associated with identifying values - Excel

I'm a new member and not an Excel ace!
My problem (simplified here; the real dataset is much larger) is that I have five identification values (id1,id2,id3,id4,id5) in column A, each identifying a cast. In column B I have 19 samples for the 5 casts in column A. However, the sample size of each cast varies from 2 to 6. So I want to obtain 19 values in which each cast identification value is repeated once per sample. In the end I want column A to hold the identification values (id1,id1,id2,id2,id2,id2,id3,id3,id3,id3,id3,id4,id4,id4,id4,id4,id4,id5,id5), matching column B, which holds the sample numbers (1,2,1,2,3,4,1,2,3,4,5,1,2,3,4,5,6,1,2). In short, I want each identification value to be repeated for each of its samples.
Thank you!

I think you might be best served by instead creating a "CAST+Sample" column that would uniquely identify each row. However, the formula below, entered into A3, should produce the desired effect from your example.
=IF(B3=1,"ID"&VALUE(RIGHT(A1,1))+1,A2)
This will only work when there are fewer than 10 casts, because RIGHT(A1,1) reads only the last digit of the ID.

Okay, so it was worth giving example data. Assuming your samples are in column B and you want your IDs in column A, put this formula in cell A2 and drag it down as far as you have samples.
=IF(VALUE(B2)=1,IF(LEFT(A1,2)="ID","ID"&RIGHT(A1,LEN(A1)-2)+1,"ID1"),IF(LEFT(A1,2)="ID",A1,"ID1"))
Here is the screenshot of how it will look after you place this formula.
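The logic behind the formula above can also be sketched outside Excel. The Python below is only an illustration, not part of either answer: the function name `fill_cast_ids` and the `prefix` parameter are hypothetical, and it assumes that a sample number of 1 always marks the start of a new cast.

```python
def fill_cast_ids(sample_numbers, prefix="ID"):
    """Return one cast label per sample number.

    Assumption: a sample number of 1 starts a new cast. As in the
    formula, the very first row is labelled with cast 1 even if its
    sample number is not 1.
    """
    labels = []
    cast_index = 0
    for number in sample_numbers:
        if number == 1 or cast_index == 0:
            cast_index += 1  # move on to the next cast
        labels.append(f"{prefix}{cast_index}")
    return labels


# Sample numbers from the question (19 samples across 5 casts).
samples = [1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 1, 2]
labels = fill_cast_ids(samples)
print(labels)
```

Unlike the first formula, this does not depend on the number of digits in the ID, so it is not limited to fewer than 10 casts.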

Related

Attempting to Determine Progress Between 3 Columns Which May Contain Blank Values

I am a teacher trying to track the progress of my students' grades over 3 assessments using Excel. Unfortunately, absences can cause blank values to occur within the columns. For example, with P2, Q2 and R2 containing data and S2 holding the progress made between P2 and R2, I would use =R2-P2. That works, but not if a student was absent and has no value in P2 or R2. Ideally, I would like to keep the formula consistent throughout the entire progress column.
As shown in the example, columns P, Q and R are the assessments and S is the overall progress. In column S, I would like to show the overall growth made between these assessments. However, students sometimes cannot attend an assessment, so the cell is blank. The formula mentioned by Spectral Instance worked when both P and R had values, but not when P or R was missing. The last two rows show where the formula did not work correctly. In those instances, I would like to show progress using only the assessments the student did take. Is there a formula that will work universally no matter which of the 3 columns is missing data?
This is shown in the example in the photo that is attached.
Can someone please provide insight as to how I can correctly track progress?
Assuming that you have headers in Row 1, that data begins in Row 2, and that the only data down the columns is score data — delete everything in Col D (including the header). Then place the following formula in D1:
=ArrayFormula({"Progress";IF(MMULT(IF(A2:C>0,1,0),{1;1;1})<2,,REGEXEXTRACT(A2:A&"~"&B2:B&"~"&C2:C,"(\d+)~*$")-REGEXEXTRACT(A2:A&"~"&B2:B&"~"&C2:C,"^~*(\d+)"))})
This one formula will return the header (which you can change as you like within the formula itself) and all results for all rows where there are at least two entries. If there are fewer than two entries, null will be returned. You will not drag this formula down.
Understand that this is an array formula that will "own" all of Col D. So you will not be able to type anything into Col D manually without "breaking the array" (which will result in an error in cell D1 and all rows of formulaic data being blank).
As to how the formula works, MMULT (as used here) just counts valid values row-by-row. IF uses that count to determine if there are at least two values.
The two commas together are actually "comma-null" (meaning null is returned if the count of values is less than 2 in any row).
Otherwise, REGEXEXTRACT acts on a concatenation of all cell values with a tilde ~ separating them. The first value present is subtracted from the last value present.
I cannot fully explain MMULT nor REGEX2 expressions in this post, but I trust the gist is clear.
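For readers who find the row logic easier to follow in code, here is a minimal Python sketch of the same rule: subtract the first recorded score from the last recorded score, and return nothing when fewer than two scores exist. The function name `progress` and the use of `None` for a blank cell are assumptions made for illustration only.

```python
def progress(scores):
    """Last recorded score minus first recorded score.

    Assumption: a blank cell is represented as None. Returns None
    when fewer than two scores are present, mirroring the
    "comma-null" branch of the formula.
    """
    present = [score for score in scores if score is not None]
    if len(present) < 2:
        return None
    return present[-1] - present[0]


# Hypothetical rows: (assessment 1, assessment 2, assessment 3)
print(progress([50, 60, 70]))      # all three taken
print(progress([None, 60, 75]))    # first assessment missed
print(progress([40, 55, None]))    # last assessment missed
print(progress([None, None, 80]))  # only one score, no progress to report
```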
Once again, thank you all for your insight and feedback. I apologize if this is a small fix that I am maybe missing.

Selecting data In Excel as a group between certain cells and dividing each by group value to find percentage

Background: I have a table describing an imaginary formulation. The main ingredients/materials are Flavour A, Flavour B, an Emulsion and a Gel mix. These are highlighted in bold in the table (see attached image).
Flavour A makes up 54% of the total formulation and is made up of sub-components: water, Benzoic acid, HCl and Sodium.
Flavour B makes up 10% of the formulation and is not made up of any sub-components, i.e. it is 100%.
The Emulsion makes up 19% of the formulation, consisting of Water, Oil, NaCl, an Ester and a Blue Dye.
The Gel mix makes up the final 17% of the formulation and consists of Gel A, Gel B, a gum texture and purified water.
The attached image also shows the percentage of each sub-component making up each ingredient.
Although this is a made up example the data I am provided with is presented in the same way.
Problem: I wish to create a new column D that, looking at this data, automatically calculates each sub-ingredient's share of the total formulation from its share of the main ingredient. For example, the Oil/Lipid is 63% of the emulsion, which is 19% of the total formulation: (0.63*0.19)*100 = 11.97 is the desired output in column D. Similarly, a hyphen in column C indicates the ingredient is 100% and should return a value equal to its percentage of the entire formulation, e.g. 54% returns 54. The total of these values will not add up to 100, since the output will contain values for each original material as well as its constituent sub-components.
The way the data is formatted is however what makes this quite challenging.
What I have tried so far:
Firstly, I cannot offer any existing code, simply because I don't know how to go about this. All I can think of so far is that a hyphen in column C identifies the start of a sub-component list, and the next hyphen identifies the end. Each value between these two then needs to be independently divided by the cell adjacent to the hyphen in column B and multiplied by 100 (note that, if necessary, the hyphen can be changed to 1.0 or 100%). I'm wondering if the data can be filtered in some way (FILTERXML?) but I'm not sure.
The desired outcome column in the attached image shows the values I am trying to achieve in column D. These were achieved by manually calculating each value; however, that is what I am trying to avoid here (apologies if there are any mistakes).
Any help is really appreciated (even if some elaborate work around). Equally however if you don't think this is possible let me know.
Thanks very much.
One simple way to achieve this is to add a formula within a helper column which checks whether a cell in Column B is empty and if it is, takes the value from above, and if it is not, takes the value from Column B, e.g. 54%. So in Column E add the following:
=IF(B3="", E2, B3)
And then in Column D add the following which uses your helper column to get your desired result:
=IF(C3="-", B3, E3*C3)
This looks to see if there is a "-" in column C and if there is, takes the total value from column B, and if there is not, multiplies the respective percentages together.
Please try this formula. Paste to D3 and copy down.
=LOOKUP(2,1/($B$1:$B3<>""),$B:$B)*IF(ISNUMBER($C3),$C3/100,1)
The formula works on the assumption that cells in column B are vertically merged and that column C has a non-numeric character in it (such as a hyphen) to identify the caption row of each segment.
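Both answers rely on the same idea: carry the most recent main-ingredient percentage down the rows and multiply it by each sub-component's share. A rough Python sketch of that idea follows. The function name `absolute_percentages` and the tuple layout are hypothetical, and `None` stands in for both a blank cell in column B and a hyphen in column C. Only the figures quoted in the question are used as data.

```python
def absolute_percentages(rows):
    """Compute each row's share of the total formulation, in percent.

    rows: (name, percent_of_formulation, percent_of_ingredient)
      - percent_of_formulation is None on sub-component rows (blank in column B)
      - percent_of_ingredient is None on main-ingredient rows (hyphen in column C)
    """
    results = []
    parent_percent = None
    for name, of_formulation, of_ingredient in rows:
        if of_formulation is not None:
            parent_percent = of_formulation  # a new main ingredient starts here
        if parent_percent is None:
            raise ValueError(f"{name!r} appears before any main ingredient")
        if of_ingredient is None:
            results.append((name, parent_percent))
        else:
            results.append((name, parent_percent * of_ingredient / 100))
    return results


# Figures taken from the question; other sub-components omitted.
table = [
    ("Flavour A", 54, None),
    ("Flavour B", 10, None),
    ("Emulsion", 19, None),
    ("Oil", None, 63),
]
for name, value in absolute_percentages(table):
    print(name, value)
```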

Excel automatically converting 7 digit CAS number to another number (date?)

Problem: I am working with two lists, one called HYPHEN and one called CAS Number, in columns A and B respectively.
Column C uses a formula that combines columns A and B in sequence: if a hyphen is present in column A, it is inserted first, the adjacent CAS number is inserted below it, and so on until all hyphens and CAS numbers are included. I've attached an image to better explain this, and the formula to replicate it is given below.
A CAS number is a unique identifier for a material/chemical and is usually written as 000-00-0; however, occasionally you get materials with CAS numbers of the form 0000-00-0 (or other variations).
For the most part column C is correct, because all but one of the CAS numbers are in the usual format. However, as highlighted in red, 6132-04-3 is being converted to 1545801.
What I have tried:
I have realised that 6132-04-3 is being converted to 03/04/6132 so I'm pretty sure that this is being recognised as a date which is causing the problem. I have tried to format the cells to all be a text format, I have added a comma before the CAS number but nothing returns the desired value of 6132-04-3 and instead always returns 1545801.
To replicate the issue: Column A and B can have any data entered. To replicate the output of column C the formula is given below:
Formula for Column C:
=FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",TRUE,A2:B26),",","</b><b>")&"</b></a>","//b")
(Formula provided by @Gary's Student on Stack Overflow)
Any thoughts on how to prevent the red CAS number being converted when it is sorted in Column C would be really appreciated.
This is a crude way to fix it by adding then removing an arbitrary character:
=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN(",",TRUE,IF(A2:B26="","","x"&A2:B26)),",","</b><b>")&"</b></a>","//b"),2,99)
If you have the issue of some of your strings containing a comma, just use a different separator:
=MID(FILTERXML("<a><b>"&SUBSTITUTE(TEXTJOIN("|",TRUE,IF(A2:B26="","","x"&A2:B26)),"|","</b><b>")&"</b></a>","//b"),2,99)
Looks like you could use:
Formula in D2:
=SUBSTITUTE(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),"'","")
Or:
=MID(FILTERXML("<t><s>'"&TEXTJOIN("</s><s>'",,A2:B10)&"</s></t>","//s"),2,99)
I can suggest you this:
Go to Format Cells ---> Number ---> Custom ---> Type
In this "Type" field write this #000-00-0
Press "OK"

Excel - Location and prioritisation of values in a range

Hi and thanks very much for taking the time to read/respond.
I'm struggling to adapt a very advanced formula given by tigeravatar.
I have an almost identical issue, but have the following possible states:
A, B, C, D, and corresponding levels of priority.
In a given range, multiple entries in any of these four categories could be made. But I need to return only the highest value regardless of all other entries.
Here's the original formula.
=INDEX({"","D","C","B","A"},MATCH(SUMPRODUCT({4,3,2,1},--(COUNTIF('Sheet 002'!E29:E32,{"A","B","C","D"})>0)),{0,1,2,3,4}))
The only problem is that, with the formula above, if B and C co-occur it displays A, and B and C are displayed only if they occur in isolation.
Thanks in advance for your and any inputs you're willing to share!
If you were willing to have a helper column, you could convert the letters to numbers using =CODE(). This column can then be ranked by taking the minimum (A=65, B=66, etc.) Note that CODE() is case-sensitive. You can find the highest ranking letter by using a formula like this:
=INDEX(E29:E32,MATCH(MIN(F29:F32),F29:F32,0))
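The helper-column approach boils down to "pick the entry that ranks first in a fixed order". A minimal Python sketch of that rule is shown here for comparison. The function name `highest_priority` and the `order` parameter are hypothetical, and it assumes A outranks B, B outranks C, and so on, as in the question.

```python
def highest_priority(entries, order=("A", "B", "C", "D")):
    """Return the entry that ranks first in `order`, or "" if none is present.

    Blank cells and any value not listed in `order` are ignored.
    """
    present = [entry for entry in entries if entry in order]
    if not present:
        return ""
    return min(present, key=order.index)


print(highest_priority(["C", "B", "", ""]))  # B and C co-occur
print(highest_priority(["D", "", "", ""]))   # a single entry
print(highest_priority(["", "", "", ""]))    # nothing entered
```

Because only the best-ranked entry is selected, co-occurring values cannot add up to a higher category. In the original formula, B and C together sum to 3+2=5, which MATCH maps to the last position, "A".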

Taking average of certain values in one Excel column based on values in another

I have a (large) array of data in Excel of which I need to compute the average value of certain values in one column, based on the values of another column. For example, here's a snippet of my data:
So specifically, I want to take the average of the F635 mean values corresponding with Row values of 1. To take it a step further, I want this to continue to Row values of 2, Row values of 3 etc.
I'm not familiar with how to run code in Excel but have attempted to solve this by using the following:
=IF($C = "1", AVERAGE($D:$D), "")
which (to my understanding) can be interpreted as "if the values (anywhere) in column C are equal to 1, then take the average of the corresponding values in column D."
Of course, as I try this I get a formula error from Excel.
Any guidance would be incredibly appreciated. Thanks in advance.
For more complicated cases, I would use an array-formula. This one is simple enough for the AVERAGEIF formula. For instance =AVERAGEIF(A1:A23;1;B1:B23)
Array formulas allow for more elaborate ifs. To replicate the above, you could do =SUM(IF($A$1:$A$23=1;$B$1:$B$23;0))/COUNT(IF($A$1:$A$23=1;$B$1:$B$23)). Note that the IF inside COUNT has no else-value: non-matching rows then yield FALSE, which COUNT ignores, whereas a 0 would be counted as a number.
This looks like more work, but you can create extremely elaborate if-statements. Instead of hitting ENTER, press CTRL+SHIFT+ENTER when entering the formula. Use * between criteria to replicate AND, or + for OR. Example: SUM(IF(($A$1:$A$23="apple")*($B$1:$B$23="green");$C$1:$C$23;0)) tallies values for green apples in C1:C23.
Your sample data includes three columns with potential ifs so my guess is that you're going to need array formulas at some point.
Excel already has a built-in function for exactly this use: AVERAGEIF().
=AVERAGEIF(C:C,1,D:D)
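Since the question asks for the same average for Row values of 1, 2, 3 and so on, it may help to see the calculation for all groups at once. The Python below is a small sketch of what AVERAGEIF computes for each distinct key. The function name `average_by_group` and the sample numbers are made up for illustration and are not taken from the asker's data.

```python
from collections import defaultdict


def average_by_group(rows):
    """Average the values for each distinct key.

    rows: iterable of (key, value) pairs, e.g. (Row number, F635 mean).
    Equivalent to evaluating AVERAGEIF once per distinct key.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for key, value in rows:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Hypothetical data: (Row value, F635 mean)
data = [(1, 10.0), (1, 20.0), (2, 30.0), (2, 50.0), (3, 7.0)]
averages = average_by_group(data)
print(averages)
```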
