variable length substring of a string with specific beginning and ending - excel

I need to extract a substring of variable length from the middle of a string (cell) in Excel.
The criteria are:
it always starts with a specific set of characters (in this example "Ingredients:")
it always ends just before a specific set of characters (in this example "Table of Nutritional Information").
The length can be anything from 1 word to about 500.
It could be an Excel formula or even VBA. But I am a complete beginner with VBA, so please give specific advice there.
My example cell content is like this:
We could tell you that our Beanz are hard to beat. That they're brimming with deliciously rich, tomatoey flavour. But you already know that. Because you know what Beanz Meanz...
Heinz baked beans don't just taste great, but are nutritious too; high in fibre, high in protein and low in fat, as well as contributing to 1 of your 5 a day. Packed full of quality ingredients... it has to be Heinz. Love our Heinz Beanz as much as we do? Discover the rest of our range, including organic and no added sugar varieties!
Heinz Beanz come in a variety of multipacks, perfect for when you need to feed the whole family!
1 of your 5 a day.
No artificial colours, flavours or preservatives.
Suitable for Vegetarians and Vegans.
Naturally high in protein and fibre.
Gluten free and low in fat.
Ingredients:
Beans (51%), Tomatoes (34%), GRAIN, Water, Sugar, Spirit Vinegar, Modified Corn Flour, Salt, Spice Extracts, Herb Extract.
Suitable for Vegetarians. Free From Artificial Flavours.
Empty unused contents into a suitable covered container. Keep refrigerated and use within 2 days.
Table of Nutritional Information
Per 100g Per 1/2 can %RI*
Energy 329kJ 682kJ -
78kcal 162kcal 8%
Fat 0.2g 0.4g 1%
- of which saturates <0.1g <0.1g <1%
Carbohydrate 12.5g 25.9g 10%
- of which sugars 4.7g 9.8g 11%
Fibre 3.7g 7.7g -
Protein 4.7g 9.7g 19%
Salt 0.6g 1.2g 21%
*RI per serving. Reference intake of an average adult (8400kJ/2000kcal)
The desired outcome would be:
Ingredients:
Beans (51%), Tomatoes (34%), GRAIN, Water, Sugar, Spirit Vinegar, Modified Corn Flour, Salt, Spice Extracts, Herb Extract.
Suitable for Vegetarians. Free From Artificial Flavours.
Empty unused contents into a suitable covered container. Keep refrigerated and use within 2 days.

Let's say your example cell is A1; then in another cell you can use the following (if your Excel uses a comma as the argument separator, replace each ; with ,):
=TRIM(MID(A1;SEARCH("Ingredients:";A1);SEARCH("Table of Nutritional Information";A1)-SEARCH("Ingredients:";A1)))
You will probably have to adapt it a little to get rid of the trailing line breaks.
This is how it works:
SEARCH("Ingredients:";A1) finds the position of the first occurrence of the text Ingredients: and returns it as a number. This is the starting point for extracting text with MID.
SEARCH("Table of Nutritional Information";A1) does the same, but for the text Table of Nutritional Information. This is the end point of the extracted text.
Step 2 - Step 1 returns how many characters to extract, starting at Step 1.
TRIM just deletes extra spaces. Note that extra spaces are not the same as line breaks.
In this case, to get rid of the trailing line breaks, just subtract an extra 5:
=TRIM(MID(A1;SEARCH("Ingredients:";A1);SEARCH("Table of Nutritional Information";A1)-5-SEARCH("Ingredients:";A1)))
This returns the exact output you want, but I don't know if it will work with all your inputs.
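For readers who want to check the logic outside Excel, here is a minimal Python sketch of the same idea: find the start marker, find the end marker, take everything in between. The function name `extract_between` is made up for illustration. Like SEARCH, it ignores case. Unlike the formula, it looks for the end marker only after the start marker, and it strips trailing line breaks instead of subtracting a fixed 5.

```python
import re


def extract_between(text: str, start: str, end: str) -> str:
    """Return the text from `start` up to (not including) `end`.

    Returns an empty string if either marker is missing.
    """
    # Case-insensitive search, mirroring Excel's SEARCH.
    start_match = re.search(re.escape(start), text, re.IGNORECASE)
    if start_match is None:
        return ""
    # Assumption: the end marker is only looked for after the start marker.
    end_pattern = re.compile(re.escape(end), re.IGNORECASE)
    end_match = end_pattern.search(text, start_match.end())
    if end_match is None:
        return ""
    # strip() removes surrounding spaces AND line breaks.
    return text[start_match.start():end_match.start()].strip()


cell = (
    "Gluten free and low in fat.\n"
    "Ingredients:\n"
    "Beans (51%), Tomatoes (34%), Water.\n"
    "Keep refrigerated and use within 2 days.\n"
    "Table of Nutritional Information\n"
    "Per 100g Per 1/2 can %RI*"
)
result = extract_between(cell, "Ingredients:", "Table of Nutritional Information")
```

Because the slice is stripped rather than shortened by a fixed count, it does not depend on how many line breaks precede the end marker.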

Assuming the source data is in column A, put the criteria headers "Ingredients" and "Table of Nutritional Information" in B1 and C1.
Then, in B2, enter this formula and copy it down:
=MID(LEFT($A2,FIND(C$1,$A2)-1),FIND(B$1,$A2)+LEN(B$1)+1,599)

Related

How can I determine the 'total cost' from a tiered pricing structure using standard formulas in Excel?

I'm trying to evaluate various tiered pricing structures (for say, electricity plans) using Excel (more-or-less) to see what costing/plan is 'optimal', given some existing usage data I have.
Consider an example 'Table of Usage & Rates' (with fictitious but easily manipulated values):-
For a daily usage value of 120, we'd have 100 (in the 1st tier) and 20 (in the 2nd tier). The amount used within a tier gets charged at a certain rate (the 'factor')... and each 'tier charge' is added together to form a total charge for the day.
So, we can calculate:-
100 x 8 = 800 ...a part of the total
20 x 4 = 80 ...another part of the total
...and that's all, giving a total of 880.
...but how to do that in a single formula within a cell?
I've done some pretty decent explorations for a few hours today, as I can't nut out how to deal with this... and most suggestions talk about multiple =IF formulas (cumbersome and unscalable - I shouldn't need to recode cell contents if I split/add another tier)... and suggestions with =VLOOKUP just don't 'click' with me ( = I don't understand them).
I'm actually using 'PlanMaker', a component of Softmaker's 'Office 2021' product to create/maintain this spreadsheet.. and there is no VBA-like plugin available.
I'd appreciate a method of attack, if anyone can suggest something, please...
So:
=product(100,8)+product(20,4)
or, if we assume the amounts used start in A9 and the factors start in B9, then =product(A9,B9)+product(A10,B10)+product(A11,B11)
which sums the per-tier results.
You can also use:
=sumproduct(A9:A11,B9:B11)
for the same result; it needs only one cell and a lot less typing.
You can include a 3rd array in sumproduct (or as many as needed) such as a binary value to include in the calculation or not.
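The SUMPRODUCT approach assumes the usage has already been split into per-tier amounts. As a rough sketch of the whole calculation, including the split, here is the same arithmetic in Python. The tier table is an assumption reconstructed from the worked example in the question (the first 100 units at a factor of 8, everything above that at 4); `tiered_cost` and `TIERS` are illustrative names only.

```python
def tiered_cost(usage, tiers):
    """Total charge for `usage` under a tiered pricing structure.

    `tiers` is a list of (tier_size, rate) pairs in order; a tier_size of
    None means the tier has no upper limit (typically the last tier).
    """
    total = 0
    remaining = usage
    for size, rate in tiers:
        if remaining <= 0:
            break
        # Amount of usage that falls inside this tier.
        used = remaining if size is None else min(remaining, size)
        total += used * rate
        remaining -= used
    return total


# Assumed from the question's example: 100 units at 8, the rest at 4.
TIERS = [(100, 8), (None, 4)]

daily_charge = tiered_cost(120, TIERS)  # 100 x 8 + 20 x 4
```

Adding or splitting a tier only changes the `TIERS` list, not the calculation, which is the scalability the question asks for.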

How to sum total hours in a row while skipping certain values?

I study wildlife and currently, I am doing an analysis regarding how long my focal species goes off of the mountain (its main habitat) and into human settlements.
Here is a picture with the data: data
Anyways, as you can see there are three coloured columns. Yellow is data, green is time, and blue is whether the animal is on or off the mountain (with red being when the animal is off).
As you can see, this one particular animal went off on several occasions. In this case, he went off the mountain three times but stayed off at various lengths. As I have thousands of data points, I essentially would like to determine how long each "off the mountain" event lasted. That is, since I consider every time the animal went off the mountain to be a separate event, I would like to determine how long the animal was off the mountain for each excursion, separately. In this case, the animal went off three times and I would like to total those three events individually.
So, as stated, an event would be every single occasion that the animal left the mountain, stayed there (for however long), and eventually made its way back up.
Any help would be greatly appreciated.
The simplest way would be just to count how many consecutive "off" periods there are in a particular run following an "on" period then multiply by 3 hours 20 minutes which you could do like this (starting in (say) K2)
=IF(AND(G1="On",G2="Off"), MATCH("On",G3:G$100,0)*TIME(3,20,0)*24,0)
You could take it further by looking at the individual times of the fixes as well to get an upper and lower limit (e.g. for the first excursion it could be between 3 hours 20 minutes and 10 hours 40 minutes roughly).
Upper limit
=IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0))-J1)*24,0)
Lower limit
=IFERROR(IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0)-1)-J2)*24,0),0)
where my column J contains a datetime value formed by adding the date and time in columns A and B together.
This raises an issue about what happens when the animal is still off-mountain at the end of its data (currently gives #N/A because MATCH is unable to find a cell containing "On"). Would need to decide how to treat this case if it ever occurs in practice.
Note when there is only one off-mountain measurement the lower limit is zero because in theory the animal could have left immediately before the measurement and returned immediately afterwards.
EDIT
To address the above issue where the animal is still off-mountain at the end of its data (and looking at the sample data it looks as if a different animal's data is immediately following the first animal's data) you would need this
=IF(AND(G1="On",G2="Off"), IFERROR(MATCH(1,(G3:G$100="On")*(E3:E$100=E2),0),MATCH(TRUE,E3:E$100<>E2,0))*TIME(3,20,0)*24,0)
which would have to be entered as an array formula using Ctrl+Shift+Enter.
You could argue that you might need to do some averaging for an incomplete off-mountain excursion like this which would make it even more complicated, but this is an Excel answer and can't go too far into the rights or wrongs of the analysis.
I guess a good starting-point would be knowing how you gather these statistics in the first place.
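As a cross-check on the "count consecutive Off fixes and multiply by 3 hours 20 minutes" approach, here is a small Python sketch of the same counting logic. It makes two assumptions that differ slightly from the formulas: a run of "Off" at the very start of the data is counted as an excursion, and a run still open at the end of the data is reported as-is rather than producing an error. The function name is illustrative.

```python
from datetime import timedelta

# Fix interval taken from the answer above: 3 hours 20 minutes.
FIX_INTERVAL = timedelta(hours=3, minutes=20)


def excursion_durations(states, interval=FIX_INTERVAL):
    """Return one duration per run of consecutive "Off" fixes."""
    durations = []
    run_length = 0
    for state in states:
        if state == "Off":
            run_length += 1
        elif run_length:
            # An "On" fix closes the current excursion.
            durations.append(run_length * interval)
            run_length = 0
    if run_length:
        # Assumption: an excursion still open at the end of the data
        # is reported with the fixes seen so far.
        durations.append(run_length * interval)
    return durations


states = ["On", "Off", "Off", "On", "Off", "On", "On", "Off", "Off", "Off"]
durations = excursion_durations(states)
```

With several animals in one column, the list of states would need to be split per animal first, which is what the extra condition on column E does in the array formula.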

Excel: Allocating Streams to Students based on GPA and Choice Preference

I have a total of 64 students to be allocated to 19 different streams on the basis of their GPA. Students have also indicated their preference for the streams. Only 5 students can be allocated to first three streams (A-C), 4 students to Stream D and the remaining fifteen streams get 3 students each. I have added the data to an excel sheet but have no idea how to process that to achieve the above results. The data is organized like this. I'm happy to share the original file but couldn't find way to upload that.
This could be solved automatically in a vba macro but the slightly faster version is doing it a bit fast and dirty with formulas. It starts like you have already, ranking by gpa.
Insert 2 rows below each student. In these rows you will calculate the "filling" of the streams. If you have a bazillion students, you should write a small macro to copy and paste so you get the extra rows after every student. A shame that Excel does not accept pasting empty values as no-change values.
Either change the rankings by search/replace or create a new sheet where the "desired allocation" value is HIGHEST for the MOST WANTED stream.
Cell C2 should be like this:
=20-'originalsheet'!C2
We are now working on the new sheet
So you have these rows: (student - allocation - availability - student - allocation - availability)
In the allocation line, you will fill the allocation of the student above, in the availability row you will fill the current available spaces.
The first student gets allocated to his or her choice.
The allocation row gets a formula that checks whether the student's cell holds the highest wish value among the streams that still have spaces. So: Row 2: first student, Row 3: allocation, Row 4: new availability, Row 5: next student's wishes, Row 6: allocation. We use this formula in cell C6:
=if(sumproduct(max(($C5:$U5)*(($C4:$U4)>0)))=C5;1;0)
Then the next line, new availability is simply
=C4-C6
copy paste the formulas downwards. You may get trouble with students on exactly similar GPAs but your ranking from top to bottom is what sets the priority in this scheme. I don't have the original file and I don't have time to build it from an image but it looked like it worked for my tiny test-setup.
Slightly easier than doing it by hand, but again; if you have a bazillion students and are doing this allocation often - ask in the vba section.
Summary:
Rank by GPA
Change so highest wish is highest number
Insert 2 rows for each student
Calculate manually for first student, should not be difficult...
Use formulas and make certain the cell references are not bungled up.
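The scheme summarised above is a greedy allocation: go down the GPA ranking and give each student the most-wanted stream that still has a space. A minimal Python sketch of that logic follows, useful for checking the spreadsheet result on a small sample. The capacities come from the question; the stream names E to S and the sample students are assumptions made up for illustration.

```python
def allocate(students, capacity):
    """Greedy allocation by GPA.

    `students` is a list of (name, gpa, preferences), where preferences
    lists stream names from most wanted to least wanted.
    `capacity` maps stream name -> number of places.
    Returns a dict name -> stream (None if every listed stream is full).
    """
    remaining = dict(capacity)
    allocation = {}
    # sorted() is stable, so students with equal GPA keep their input order.
    for name, _gpa, prefs in sorted(students, key=lambda s: s[1], reverse=True):
        allocation[name] = None
        for stream in prefs:
            if remaining.get(stream, 0) > 0:
                allocation[name] = stream
                remaining[stream] -= 1
                break
    return allocation


# Capacities from the question: A-C take 5, D takes 4, the other 15 take 3.
# The names E..S for the remaining streams are an assumption.
capacity = {s: 5 for s in "ABC"}
capacity["D"] = 4
capacity.update({s: 3 for s in "EFGHIJKLMNOPQRS"})

# Tiny hypothetical example with a single place per stream.
sample = [
    ("Ann", 3.9, ["A", "B"]),
    ("Bob", 3.5, ["A", "B"]),
    ("Cy", 3.7, ["A", "B"]),
]
sample_result = allocate(sample, {"A": 1, "B": 1})
```

As in the spreadsheet version, ties on GPA are broken by the order in which the students are listed.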

Microsoft Excel 2007 Always round up even if the decimal is under 0.5

So I'm creating a spreadsheet that determines the cost of materials and the number of each material needed to complete a desired project, using input from myself. Right now the desired project is a wall that is 250x9 and requires replacing all the 4x8 sheets of wood with OSB and installing vinyl siding. The issue I'm running into is that I cannot get it to always round up. By that I mean even if the value is 1.1 it should round up.
In this specific case I am buying nails for my nail gun in boxes of 2,000, and each sheet of OSB will have 32 nails in it. If the 250x9 area requires 70.3125 sheets of OSB, I still have to buy 71 sheets. Those 71 sheets require 2,272 nails, so I need 1.136 boxes of nails. However, I can't seem to get it to show this as 2 boxes, because again I still need to purchase more than one box to complete the project.
So, with that being said, if I take the number of OSB sheets needed, 70.3125, and place it in a formula with a ROUNDUP function, it still rounds down (it gives me a headache that there is a ROUNDUP and a ROUNDDOWN function and it will still round down on me). Perhaps the way I am using it in the formula is incorrect, I'm not sure. So let me walk through the formulas used, and you can let me know if I'm doing something wrong or if there is a function or set of functions that I can use to solve this issue.
=SUM(((B30*C30)+(B35*C35)+(E30*F30)+(H30*I30))/(E9*G9))
This says that if I add Wall1 L*W, Wall2 L*W, Wall3 L*W and Wall4 L*W and divide by OSB H*W, I get the number of sheets needed, which in this case is basically 2250/32. But it's programmed so that I can enter the information for individual walls in different areas and have it output the total square footage for each wall, with an individual breakdown per wall of material needed and the associated cost per square foot of material. The point is that I take the result, 70.3125, move it to a different workbook under "Sheets OSB Needed", and in that cell I have
=ROUNDUP(Sheet1!A9,1)
Here I'm asking it to round up A9, which is the result of the above formula, in intervals of 1. But the output is still 70 instead of 71, and it is much the same with the nails needed. That can be calculated in a few different ways, but regardless, the number of nails needed divided by 2000 yields a decimal value of less than 1.5, and with much the same formula it too gives me 1 instead of 2. I suppose I could achieve my desired result with TRUNC and MOD functions working together, using multiple cells to output the different portions of the data. But is there a way to do this that doesn't use up so many cells?
C7
=Trunc(A9)
Removes Decimal from 70.3125
C8
=MOD(A9,1)
Outputs decimals from 70.3125
C9
=IF(C8<1,"1",C8)
If Decimals are < a whole number make it a whole number
C3
=SUM(C7+C9)
Add the whole number to the Trunc Number to get value desired.
I'm already seeing an issue with this: if there are no decimals in the sheets needed, wouldn't it always add one, because the decimal part would be 0? How can I handle this? Isn't there an easier way to do this, or a way to nest it all (or at least mostly) into one calculation without creating a circular reference of some sort?
You need to change the second parameter to 0. ROUNDUP(70.3125, 1) is 70.4; the decimal must be getting dropped elsewhere or lost in formatting.
ROUNDUP(70.3125, 0) will give 71.
The second parameter of ROUNDUP is the number of decimal places. So to round to integers it should be 0, not 1.
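To double-check the expected numbers, here is the same "always round up" arithmetic as a short Python sketch, using the figures from the question (a 250x9 wall, 4x8 sheets, 32 nails per sheet, 2,000 nails per box). `math.ceil` plays the role of ROUNDUP(x, 0) for positive values; the variable names are illustrative.

```python
import math

wall_area = 250 * 9        # 2250 square feet
sheet_area = 4 * 8         # 32 square feet per sheet of OSB
nails_per_sheet = 32
nails_per_box = 2000

# Round up to whole sheets: 70.3125 becomes 71.
sheets_needed = math.ceil(wall_area / sheet_area)

# Nails are based on the sheets actually bought.
nails_needed = sheets_needed * nails_per_sheet

# Round up to whole boxes: 1.136 becomes 2.
boxes_needed = math.ceil(nails_needed / nails_per_box)

# A value that is already whole stays unchanged, unlike TRUNC plus 1.
whole_sheets = math.ceil(70.0)
```

The last line covers the concern raised in the question: rounding up a value with no decimal part does not add an extra unit.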

How can I implement 'balanced' error spreading functionality in Excel?

I have a requirement in Excel to spread small (i.e. pennies) monetary rounding errors fairly across the members of my club.
The error arises when I deduct money from members; e.g. £30 divided between 21 members is £1.428571... requiring £1.43 to be deducted from each member, totalling £30.03, in order to hit the £30 target.
The approach that I want to take, continuing the above example, is to deduct £1.42 from each member, totalling £29.82, and then deduct the remaining £0.18 using an error spreading technique to randomly take an extra penny from 18 of the 21 members.
This immediately made me think of Reservoir Sampling, and I used the information here: Random selection,
to construct the test Excel spreadsheet here: https://www.dropbox.com/s/snbkldt6e8qkcco/ErrorSpreading.xls, on Dropbox, for you guys to play with...
The problem I have is that each row of this spreadsheet calculates the error distribution independently of every other row, and this causes some members to contribute more than their fair share of extra pennies.
What I am looking for is a modification to the Reservoir Sampling technique, or another balanced / 2 dimensional error spreading methodology that I'm not aware of, that will minimise the overall error between members across many 'error spreading' rows.
I think this is one of those challenging problems that has a huge number of other uses, so I'm hoping you geniuses have some good ideas!
Thanks for any insight you can share :)
Will
I found a solution. Not very elegant, though.
You have to use two matrices. In the first you put completely random numbers, generated with =RAND(), and in the second you select the n largest values.
Say that F30 holds the first
=RAND()
cell.
(I have experimented with your sheet.)
Just copy a column of n values (8 in your sheet) into column A.
In cell F52 you put:
=IF(RANK(F30,$F30:$Z30)<=$A52, 1, 0)
So far, if you drag the formulas across and down, you have the same situation as in your sheet (only less elegant and efficient).
But starting from the second row of random numbers you can compensate for the pennies already disbursed.
In cell F31 you put:
=RAND()-SUM(F$52:F52)*0.5
(Pay attention to the $: each random number should carry a correction based on the pennies already spent.)
If the $ signs are right, you should be fine dragging the formulas across and down. You could also parametrize the 0.5 and experiment with other values. With 0.5 I get an error factor (the equivalent of your cell AB24) between 1 and 2.
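The idea above (a random score per member, reduced by 0.5 for every extra penny that member has already paid, then picking the n highest scores) can be sketched in Python to see how it behaves over many rows. This is only an illustration under assumptions: amounts are in whole pence, every row is split between the same members, and the names `spread_pennies` and `weight` are made up here.

```python
import random


def spread_pennies(targets_pence, n_members, weight=0.5, seed=None):
    """Split each target (in pence) between members, spreading the
    leftover pennies with a penalty for members who already paid extra.

    Returns (rows, extra_paid): one list of deductions per target, and
    the number of extra pennies each member has paid in total.
    """
    rng = random.Random(seed)
    extra_paid = [0] * n_members
    rows = []
    for target in targets_pence:
        base, extra = divmod(target, n_members)  # e.g. 3000 -> 142 rem 18
        # Random score minus a correction for pennies already spent.
        scores = [rng.random() - weight * extra_paid[i] for i in range(n_members)]
        ranked = sorted(range(n_members), key=lambda i: scores[i], reverse=True)
        row = [base] * n_members
        for i in ranked[:extra]:  # the `extra` highest scores pay a penny more
            row[i] += 1
            extra_paid[i] += 1
        rows.append(row)
    return rows, extra_paid


# Seven deductions of 30.00 pounds between 21 members, as in the question.
rows, extra_paid = spread_pennies([3000] * 7, 21, seed=1)
```

With a weight of 0.5, a member who is two or more pennies ahead can never outrank one who is behind, so the gap between the most and least charged members stays at 2 pennies or fewer. That is consistent with the error factor between 1 and 2 reported above.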

Resources