I study wildlife, and I am currently analysing how long my focal species spends off the mountain (its main habitat) and in human settlements.
Here is a picture with the data: data
Anyway, as you can see, there are three coloured columns. Yellow is the date, green is the time, and blue is whether the animal is on or off the mountain (with red being when the animal is off).
As you can see, this particular animal went off the mountain on several occasions: three times in this case, staying off for varying lengths of time. As I have thousands of data points, I would like to determine how long each "off the mountain" event lasted. Since I consider every time the animal left the mountain to be a separate event, I want the duration of each excursion worked out separately. In this case the animal went off three times, and I would like to total those three events individually.
So, as stated, an event would be every single occasion that the animal left the mountain, stayed there (for however long), and eventually made its way back up.
Any help would be greatly appreciated.
The simplest way would be just to count how many consecutive "Off" readings there are in a particular run following an "On" reading, then multiply by 3 hours 20 minutes (the interval between fixes). You could do that like this (starting in, say, K2):
=IF(AND(G1="On",G2="Off"), MATCH("On",G3:G$100,0)*TIME(3,20,0)*24,0)
You could take it further by looking at the individual times of the fixes as well to get an upper and lower limit (e.g. for the first excursion it could be between 3 hours 20 minutes and 10 hours 40 minutes roughly).
Upper limit
=IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0))-J1)*24,0)
Lower limit
=IFERROR(IF(AND(G1="On",G2="Off"), (INDEX(J3:J$100,MATCH("On",G3:G$100,0)-1)-J2)*24,0),0)
where my column J contains a datetime value formed by adding the date and time in columns A and B together.
This raises the issue of what happens when the animal is still off the mountain at the end of its data (currently the formula gives #N/A because MATCH cannot find a cell containing "On"). You would need to decide how to treat this case if it ever occurs in practice.
Note that when there is only one off-mountain measurement, the lower limit is zero, because in theory the animal could have left immediately before the measurement and returned immediately afterwards.
EDIT
To address the above issue where the animal is still off the mountain at the end of its data (and looking at the sample data, it appears that a different animal's data immediately follows the first animal's), you would need this:
=IF(AND(G1="On",G2="Off"), IFERROR(MATCH(1,(G3:G$100="On")*(E3:E$100=E2),0),MATCH(TRUE,E3:E$100<>E2,0))*TIME(3,20,0)*24,0)
which would have to be entered as an array formula using Ctrl+Shift+Enter.
You could argue that you might need to do some averaging for an incomplete off-mountain excursion like this which would make it even more complicated, but this is an Excel answer and can't go too far into the rights or wrongs of the analysis.
I guess a good starting-point would be knowing how you gather these statistics in the first place.
Hopefully the title makes some sense; I'm trying to wrap my head around the logic and I'm not quite sure how to phrase the question. I'll try to give a brief explanation of the end goal without overcomplicating it with unnecessary details.
I have a table of survey score averages for every month per person, and a corresponding table with the number of surveys each person received for each month. The logic is essentially: multiply each month's score by that month's number of surveys, sum those products, and divide by the total number of surveys within the time period to get the person's true average. Where things get a little complicated is that I have to include the ability to set a custom date range and return the value. So sometimes I might be looking at the average for Jan-Apr, other times just Feb-Mar, etc.
I think SUMPRODUCT is going to get what I need done, but I'm running into issues trying to write it out. I've written it several different ways and none of them worked, so here's the one that best conveys what I'm trying to do:
=SUMPRODUCT(--(F7:I7,L7:O7>=C2),--(F7:I7,L7:O7<=C3),--(E8:E12,K8:K12=B9),tbl_average[[Jan-20]:[Apr-20]],tbl_surveys[[Jan-20]:[Apr-20]])
I super appreciate any assistance I can get on this. I'm hoping the end result is not nearly as difficult as I'm making it out to be.
Some additional information:
I'm going to be using this same process to calculate multiple metrics across multiple worksheets. In the test example each of the tables will most likely be on a different sheet. The dashboard with the calculated results will contain everyone's names and will be filtered and rearranged frequently, so I need to make sure we're always matching directly on names and not just on relative rows. Basically, in my example Agent 1 is always lined up on row 8, but that's not always going to be the case. Agent 1 could be in row 8 on Sheet 1, row 10 on Sheet 2, and row 12 on Sheet 3, and I need all the correct values to multiply and sum against one another.
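For what it's worth, one pattern that fits this description (a sketch only, reusing the ranges from the attempt above: month dates in F7:I7 and L7:O7, names in E8:E12 and K8:K12, averages in F8:I12, survey counts in L8:O12, the start and end dates in C2 and C3, and the agent's name in B9) is to let MATCH find each agent's row by name, INDEX return that whole row, and SUMPRODUCT apply the date filter:
=SUMPRODUCT(($F$7:$I$7>=$C$2)*($F$7:$I$7<=$C$3)*INDEX($F$8:$I$12,MATCH($B9,$E$8:$E$12,0),0)*INDEX($L$8:$O$12,MATCH($B9,$K$8:$K$12,0),0))/SUMPRODUCT(($L$7:$O$7>=$C$2)*($L$7:$O$7<=$C$3)*INDEX($L$8:$O$12,MATCH($B9,$K$8:$K$12,0),0))
The numerator is the sum of score times survey count for the months inside the date range, the denominator is the total number of surveys in the same range, and because each row is located by MATCH on the name, the agents can sit on different rows (or different sheets, with sheet prefixes added to the ranges). This assumes the row-7 headers are real dates and that empty months hold zeros rather than text.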
I work for a hospital that is part of a larger network. We were recently asked by our corporate overlords to address the use of a specific laboratory test. In general, this test should only be performed daily, which should be taken to mean a 24-hour period from the last draw. Sometimes, however, depending on when people arrive at the hospital (e.g. 7pm), and in the interest of bundling labs into a single draw, the test may be drawn sooner to coincide with routine testing, i.e. at 5am. It would never otherwise be necessary to repeat it within a short (8-hour) window, particularly on the same day.
We have been asked to check whether we are adhering to this general practice, as testing any more frequently than that, say within 12 hours of a previous test, has no real clinical value and thus adds unnecessary cost.
To address this issue I was given a dataset that among other items includes all instances the lab was performed including collection date and time.
Please see the HIPAA-safe example below (to be clear, this is not real data and the identifiers are not real); the actual dataset has over 4,174 entries corresponding to 1,328 unique persons. Everyone had at least one test performed; not everyone had more than one.
I THINK what I want to do is an IF formula that reads the row above to 1) check whether it is the same person and 2) if so, subtract the timestamps to display the relevant difference in time, which I can then filter, turn into a histogram, etc. Does this seem like a reasonable approach? Is there a preferable method to facilitate the analysis? Do any other forms of analysis come to mind?
=IF(B2=B1, D2-D1, "n/a")
example data set with formula:
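One possible refinement of that idea (a sketch only, assuming the rows are sorted by person and then by collection date-time, with the person identifier in column B and the combined collection date-time in column D, as in the formula above) is to convert the difference to hours and flag the intervals of interest directly:
=IF(B2=B1,(D2-D1)*24,"")
and, for the 12-hour rule specifically,
=IF(AND(B2=B1,(D2-D1)*24<12),"within 12h","")
From there a filter, pivot table or histogram over the flagged rows gives the adherence rate.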
By the looks of it you should consider taking the values under "Results" into account, assuming there is a band that might be considered 'normal' readings. The "one in 24 hours is sufficient" rule of thumb may well be appropriate for a series of values within the 'normal' band, but not so much if readings are close to a 'danger level'.
That is, in some cases a higher-than-standard frequency of monitoring may be in the patient's interest, even if not hospital policy. So it may be worth separating the "less than 24 hours" readings into those where the higher frequency provided information of little value (e.g. readings remaining within a 'normal' band) and those that crossed into or out of the band and/or showed large changes in value. This, though, may be more a matter of statistical analysis than programming, and depends on whether any action might be taken as a result of such "extra" readings.
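As a rough illustration only (assuming, as above, the person in column B and the date-time in column D, with the result value in column E and the band limits in two assumed cells $H$1 and $H$2), the repeats within 24 hours could be split along those lines with something like:
=IF(AND(B2=B1,(D2-D1)*24<24),IF(AND(E1>=$H$1,E1<=$H$2)<>AND(E2>=$H$1,E2<=$H$2),"crossed band","no band change"),"")
Anything tagged "crossed band" (or showing a large jump in value between the two draws) would be worth separating out before treating the extra draw as low-value.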
Example data with desired outcome that I need to calculate
I have 12 items of a certain current value. I have a 'soft' cap of $1,000,000 for these values. Some of the items fall above, and some below this cap level.
I have an amount of money (for this example $900,000) that I want to distribute amongst only the items that fall below the cap (in this example 6 items), with the aim of bringing the value of these items up to but not over the cap value.
If I distribute the $900,000 evenly over these 6 items (each receiving $150,000), you can see that items 2 and 9 would then be over the $1,000,000 cap. So items 2 and 9 should only receive $100,000 to raise their value to the cap, and the remaining 4 items would then receive an equal share of the remaining pool of money ($700,000 / 4 = $175,000).
So I need a formula to check every item to see whether it needs a distribution (i.e. it is below the cap) and then portion out the pool of money as illustrated above in the desired distribution column.
Note: The pool of money to be distributed can change. Also the number of items below the cap can change. The cap value itself can change.
I am hoping to avoid VBA or Solver because the spreadsheet could be used on other people's computers.
Hopefully this makes sense. Thanks.
EDIT:
So far I have been able to get close by adding a helper column and using the following formula:
=IF(SUM($F$6:F14)=$D$23,0,E15*MIN(D15,($D$23-SUM($F$6:F14))/SUM(E15:$E$18)))
Working example when values are sorted.
This seems to work when the values are sorted in descending order, as shown in the example image above, but it seems to break when the values are more randomly ordered, which is likely to happen (as in the original post).
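One way to avoid the dependence on sort order (a sketch only, with an assumed layout: current values in B2:B13, the cap in $H$1, the pool in $H$2, and the pool no larger than the total shortfall below the cap) is to do the distribution in passes with helper columns, each dragged down to row 13.
In C2 (headroom, zero for items already at or above the cap):
=MAX(0,$H$1-B2)
In D2 (first pass: an even share of the pool, limited to each item's headroom):
=MIN(C2,$H$2/MAX(1,COUNTIF($C$2:$C$13,">0")))
In E2 (second pass: whatever is left over, shared among the items still below the cap):
=IF(C2>D2,MIN(C2-D2,($H$2-SUM($D$2:$D$13))/MAX(1,SUMPRODUCT((($C$2:$C$13-$D$2:$D$13)>0)*1))),0)
In F2 (the distribution):
=D2+E2
In the example above two passes are enough: items 2 and 9 stop at their $100,000 of headroom in the first pass, and the leftover $100,000 is split $25,000 each over the other four, giving them $175,000 apiece. If several items were to hit the cap over successive passes, the same second-pass pattern can be repeated in further columns until the leftover reaches zero.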
Just to give you an idea of how Solver can be set up to do a capital budgeting model, here is one; the screenshot also shows Solver and its settings:
I have a problem trying to come out with a solution for my job.
I work at a warehouse and we work on containers. Each container has X amount of boxes and X amount of paperwork for the boxes. We are also given a due date.
I wanted to use some kind of algorithm or formula, but I don't know how to set one up.
Let's say I have 1,000 containers, 700,000 boxes and 28,000 pieces of paperwork to be done in the next 7 days across 2 different shifts. For those containers I already have a list of the 1,000 and what needs to be done for each, but I want to use Excel to distribute the work quickly based on categories. Each of those 1,000 also has a due date. So let's say I want to do 72 containers per shift (144 per day), along with 50,000 boxes and 2,000 pieces of paperwork per shift.
My first priority is the due date: if a container is close to the current date (say within 1 day), it takes priority regardless of how many boxes or papers it has. Beyond that, given the targets of 50,000 boxes and 2,000 pieces of paperwork per shift, I want to enter those numbers into cells and have Excel automatically select the best 72 containers per shift: due-date containers first, then whichever containers come closest to adding up to 50,000 boxes and 2,000 papers. Each container and its information is written on one row, with headers on the columns. I want it done this way so that I don't have to count each container manually and can distribute the same amount of work to each shift. That way my 1-hour process can be done in 5 minutes.
I don't know if I am asking too much, but I just need a sample that I can then test on a file.
Sorry for providing the wrong information right away. I've included a sample of what I am trying to achieve.
Instead of 72 containers, in the example I lowered it to 11 per shift, but the box and paper counts didn't change.
Container List
What this shows is that the yellow-highlighted containers were selected because their due date is very close to the current date. The orange-highlighted ones were selected to complete the 11-container goal while getting as close as possible to 50k boxes and 2k load slips. They were then sorted for email purposes. Another 11 were also already selected for the 2nd shift, keeping the same goal in mind as for the 1st shift. Any remaining containers will be ignored, at least until the next day.
Is this possible, or am I thinking too far ahead of myself?
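As a rough sketch of that greedy "fill the first shift" logic (assuming the list is sorted by due date, with the due date in column A, boxes in column B, paperwork in column C, the per-shift container, box and paperwork targets in assumed cells $H$1, $H$2 and $H$3, and this flag formula in E2 dragged down):
=IF(A2-TODAY()<=1,"Shift 1",IF(AND(COUNTIF($E$1:E1,"Shift 1")<$H$1,SUMIF($E$1:E1,"Shift 1",$B$1:B1)+B2<=$H$2,SUMIF($E$1:E1,"Shift 1",$C$1:C1)+C2<=$H$3),"Shift 1",""))
Anything due within a day is taken regardless of size; after that, a container is added only while the running container, box and paperwork totals for the shift stay under the targets. A second column with the same pattern, skipping rows already marked "Shift 1", would pick the 2nd shift, and whatever is left rolls over to the next day. This is a greedy approximation rather than a true optimiser, so it won't always land exactly on 50,000 boxes and 2,000 papers, but it removes the manual counting.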
I have a requirement in Excel to spread small (i.e. penny) monetary rounding errors fairly across the members of my club.
The error arises when I deduct money from members; e.g. £30 divided between 21 members is £1.428571... requiring £1.43 to be deducted from each member, totalling £30.03, in order to hit the £30 target.
The approach that I want to take, continuing the above example, is to deduct £1.42 from each member, totalling £29.82, and then deduct the remaining £0.18 using an error spreading technique to randomly take an extra penny from 18 of the 21 members.
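In spreadsheet terms (with, say, the total to be deducted in an assumed cell B1 and the member count in B2), the base deduction and the number of extra pennies fall out of
=ROUNDDOWN(B1/B2,2)
for the £1.42 per member, and
=ROUND((B1-B2*ROUNDDOWN(B1/B2,2))*100,0)
for the 18 members who each need to contribute an extra penny.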
This immediately made me think of Reservoir Sampling, and I used the information here: Random selection,
to construct the test Excel spreadsheet here: https://www.dropbox.com/s/snbkldt6e8qkcco/ErrorSpreading.xls, on Dropbox, for you guys to play with...
The problem I have is that each row of this spreadsheet calculates the error distribution independently of every other row, and this causes some members to contribute more than their fair share of extra pennies.
What I am looking for is a modification to the Reservoir Sampling technique, or another balanced / two-dimensional error-spreading methodology that I'm not aware of, that will minimise the overall error between members across many 'error spreading' rows.
I think this is one of those challenging problems that has a huge number of other uses, so I'm hoping you geniuses have some good ideas!
Thanks for any insight you can share :)
Will
I found a solution. Not very elegant, though.
You have to use two matrices. In the first you generate completely random numbers with =RAND(), and in the second you pick the n largest values.
Say that in F30 you have the first
=RAND()
cell.
(I have experimented with your sheet.)
Just copy a column of the n values (8 in your sheet) into column A.
In cell F52 you put:
=IF(RANK(F30,$F30:$Z30)<=$A52, 1, 0)
Up to this point, if you drag the formulas across and down, you have the same situation as in your sheet (only less elegant and efficient).
But starting from the second row of random numbers you can compensate for the pennies already disbursed.
In cell F31 you put:
=RAND()-SUM(F$52:F52)*0.5
(Pay attention to the $: each random number gets a correction based on the pennies already taken from that member.)
If the $ signs are right, you should be OK dragging the formulas across and down. You could also parameterise the 0.5 and experiment with other values. With 0.5 I get an error factor (the equivalent of your cell AB24) between 1 and 2.