Predictive formula - statistics

I would like to predict how much we should keep in a box, the "Magical-box".
The "Magical-box" should be able to predict the value of the next deposit into it.
For example:
DepositNbr  Coins  MagicBox
1           6      6
2           4      4
3           10     13    <==> the prediction process may start from the third deposit
4           13     8
5           8      23
6           23     2
7           2      ...
Is there any way to perform this prediction based on past or present values? Any formula (Markov chain, normal distribution, regression...) is welcome.

Related

Calculate production capacity per product/day up to goal

I have the following data.
Available resources data per day:
     A          B    C .. N
2    resources       day
3                    1  2  3  4  5  6  7  8  9  10  11  12
4    empl.1          8  8  4  2  2  4  4  8  8
5    empl.2          8  4  4  8  4  8
6    empl.3
(Empty cells in rows 4-6 are not shown; the values of each row are listed in order, not under their actual day.)
And different products, each with its production per hour (per employee) and its required quantity:
Row  product (P/Q)  production/hour (R)  required qty (S)
4    prod.1         1                    60
5    prod.2         1                    6
6    prod.3         2                    4
From this data I want to calculate the number of products that can be produced per day, based on the employees available that day and the production rate of each product, until the goal for that product is reached.
Edit: the calculation in the original post only gave the hours spent per product per day, not the quantity of products produced; also, the MOD part gave wrong results when the daily produced quantity exceeded the goal.
I use the following formula to calculate the above (entered in C11 and dragged to the right):
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
reached,--(prod<$S$4:$S$6),
dayprod,IFERROR(SUM(C4:C6)/SUM(reached*$R$4:$R$6),0)*reached*$R$4:$R$6,
IF(prod+dayprod>$S$4:$S$6,dayprod-((prod+dayprod)-$S$4:$S$6),dayprod))
This results in the following:
     A        B   C   D   E   F   G   H   I   J   K   L   M   N
9    product      day
10                1   2   3   4   5   6   7   8   9   10  11  12
11   prod.1       2   8   4   6   2   8   4   0   8   12  6   0
12   prod.2       2   4   0   0   0   0   0   0   0   0   0   0
13   prod.3       4   0   0   0   0   0   0   0   0   0   0   0
This formula sums the hours of the employees available that day and divides them over the products that have not reached their goal yet.
Once a goal is reached, the available hours are divided over the remaining products.
Screenshot of the data + current result:
Now the problem I'm having is the following:
If the goal for a product is reached partway through the day, the dayprod-((prod+dayprod)-$S$4:$S$6) part of the formula caps that product's production for the day at the amount still needed. The available hours, however, were divided over every product that still needed production at the start of the day, so the hours freed up by the capped product are not reused. Take the following example:
prod.1, day 2: value 8
prod.2, day 2: value 4
The 8 for prod.1 is calculated with both prod.1 and prod.2 still needing production, and each takes 1 hour per person per unit.
With 16 hours available that day, that means a capacity of 8 for each.
But the challenge lies in the goal being reached partway through the day.
In fact, the first 4 hours are used by both employees to produce 4 of each product.
For the last 4 hours both employees can focus on prod.1, which then yields not 4 but 4 + 4 = 8 units in those hours, for a total of 12 for prod.1 rather than the 8 currently calculated.
How can I get the formula to add the remaining time to the remaining products?
Original post, prior to the edit, containing an error (it calculated only the hours spent per product per day, not the number of products):
I use the following formula to calculate the above (used in C11 and dragged to the right):
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
reached,--(prod<$S$4:$S$6),
dayprod,IFERROR(SUM(C4:C6)/SUM(reached*$R$4:$R$6),0)*reached*$R$4:$R$6,
IF(prod+dayprod>$S$4:$S$6,dayprod-MOD(prod+dayprod,$S$4:$S$6),dayprod))
This results in the following:
     A        B   C   D   E   F   G   H   I   J   K   L   M   N
9    product      day
10                1   2   3   4   5   6   7   8   9   10  11  12
11   prod.1       2   8   4   6   2   8   4   0   8   12  6   0
12   prod.2       2   4   0   0   0   0   0   0   0   0   0   0
13   prod.3       4   0   0   0   0   0   0   0   0   0   0   0
This formula sums the hours of the employees available that day and divides them over the products that have not reached their goal yet.
Once a goal is reached, the available hours are divided over the remaining products.
Screenshot of the data + current result:
Now the problem I'm having is the following:
If the goal for a product is reached partway through the day, the MOD part of the formula calculates the remaining quantity for that product for that day. The available hours, however, are divided over every product that still needs production. Take the following example:
prod.1, day 2: value 8
prod.2, day 2: value 4
The 8 for prod.1 is calculated with both prod.1 and prod.2 still needing production, and each takes 1 hour per person per unit.
With 16 hours available that day, that means a capacity of 8 for each.
But the challenge lies in the goal being reached partway through the day.
In fact, the first 4 hours are used by both employees to produce 4 of each product.
For the last 4 hours both employees can focus on prod.1, resulting in a total of 12 being produced for prod.1, not 8.
I racked my brain getting this far, but from here I could use some help.
How can I get the MOD part of the formula to add the remaining time to the remaining products?
I was able to find a solution to my problem.
I had to take the result of the formula already in place and check whether the running total up to and including the current day exceeds the goal. If so, I needed the time difference between the day's uncapped production and the production actually needed that day to reach the goal. That difference is production time to be added to the remaining product(s) that have not reached their goal yet, even after adding that day's production.
This results in the following formula, entered in C11 and dragged to the right:
=LET(
prod,BYROW($B11:B13,LAMBDA(r,SUM(r))),
prodhour,$R$4:$R$6,
goal,$S$4:$S$6,
reached,--(prod<goal),
dayprod,(IFERROR(SUM(C$4:C$6)/SUM(reached*prodhour),0)*reached*prodhour)*prodhour,
preres,IF(prod+dayprod>goal,dayprod-((prod+dayprod)-goal),dayprod),
timecorr,(dayprod*(dayprod<>preres)-preres*(dayprod<>preres))/prodhour,
reachedcorr,reached*(timecorr=0),
dayprodcorr,(IFERROR(SUM(timecorr)/SUM(reachedcorr*prodhour),0)*reachedcorr*prodhour)*prodhour,
IF(prod+dayprod>=goal,dayprod-((prod+dayprod)-goal),dayprod+dayprodcorr))
Here preres is the previous result (the point where I got stuck in the opening post).
The corr parts take care of the correction when the goal is reached for a product and production time is still left over.
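The correction described above can also be checked outside Excel. The following is a minimal Python sketch, not a translation of the formula: the function name allocate_day and its arguments are made up for illustration. Hours are split in proportion to each product's production rate, as in the formula. As an assumption that goes beyond the single correction pass in the formula, freed-up hours are redistributed repeatedly until no hours or no unfinished products remain.

```python
def allocate_day(hours, rates, goals, produced):
    """Quantity produced per product for one day (illustrative sketch).

    hours    -- total employee hours available that day
    rates    -- units per hour for each product
    goals    -- required quantity for each product
    produced -- quantity already produced before this day
    """
    eps = 1e-9
    made = [0.0] * len(rates)
    remaining_hours = float(hours)
    while remaining_hours > eps:
        # Products that still need production at this point of the day.
        active = [i for i in range(len(rates))
                  if produced[i] + made[i] < goals[i] - eps]
        if not active:
            break
        total_rate = sum(rates[i] for i in active)
        used = 0.0
        for i in active:
            share = remaining_hours * rates[i] / total_rate       # hours offered
            need = (goals[i] - produced[i] - made[i]) / rates[i]  # hours to goal
            spent = min(share, need)
            made[i] += spent * rates[i]
            used += spent
        # Hours not spent (because a goal was hit) go into the next round.
        remaining_hours -= used
    return made


# Day 2 from the question: 16 hours, 2/2/4 already produced on day 1.
day2 = allocate_day(16, rates=[1, 1, 2], goals=[60, 6, 4], produced=[2, 2, 4])
print(day2)  # [12.0, 4.0, 0.0]
```

With the day 2 numbers this gives 12 for prod.1 and 4 for prod.2, which is the outcome the question asks for.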

How to generate the equal number of groups per week that have members that change the group every week

I have been trying to create an Excel sheet that I can use to assign members to a group each week. I need to make sure that each member is in a different group every week.
Below is my sheet, and here is the formula I use in B3:
=INDEX(UNIQUE(RANDARRAY(2, 10, 1, 11)), SEQUENCE(1), {1,2,3,4,5,6,7,8,9,10,11})
The issue I cannot fix is that the groups differ in size. Any ideas, please?
Week 1 2
PPLs
1 7 8
2 6 11
3 1 3
4 2 7
5 9 7
6 2 10
7 4 8
8 4 5
9 10 8
10 8 9
11 3 6
12 8 6
13 7 9
14 7 8
15 5 8
16 5 8
17 8 8
18 8 10
19 4 9
20 3 2
21 10 9
22 2 10
23 10 6
24 9 3
25 4 9
26 7 6
27 10 7
28 7 7
29 10 5
30 2 5
31 6 6
32 8 8
33 4 4
34 9 10
35 5 9
36 9 7
37 5 7
38 10 9
39 2 10
40 6 5
41 9 2
Two thoughts: simple & (somewhat) repeatable, OR no Excel & random.
[ simple & (somewhat) repeatable ]
2 sets only.
Set 1: group 1 is {members 1-10}, group 2 is {members 11-20} .. group 10 is {members 91-100}
Set 2: group 1 is {members 1,11,21,31,41,51,61,71,81,91}, group 2 is {members 2,12,22,32,42,52,62,72,82,92} .. group 10 is {members 10,20,30,40,50,60,70,80,90,100}
P.S.: although member 1 is always in group 1, I'd consider it a valid group change, since ALL other members of group 1 have changed.
[ no Excel & random ]
Ever heard of 10x10 sudoku? It is widely available online. I have no specific one in mind, but sudoku IS what I meant by "no Excel".
example 10x10 sudoku solution :
How to use it:
Get 5 different (different is important) solved 10x10 sudoku grids and stack them on top of each other. This gives a table of 10 columns x 50 rows.
Sort the rows by the first column, which should then read (from the top, in column 1): 1,1,1,1,1,2,2,2,2,2,3,3,3,3,3, .. 9,9,9,9,9,10,10,10,10,10
Since we have 50 members (10 groups of 5 members each), column 1 is the assigned group for each of the 50 members (rows) in week 1, column 2 is the assigned group for week 2, and so on.
If every week must be a different group, then by week 11 each member comes back to his/her original group. OR .. use another set of five 10x10 sudokus, perhaps?
(If the idea is unclear, please ask.)
AFAIK, by the rules of sudoku itself, each member (row) will have a different group each week (column), and each group number WILL appear exactly 5 times in each week (column). This solves the
the fact that the groups differ in sizes
part. (Please share if this doesn't work, though.)
Ref: I used https://sudokuspoiler.azurewebsites.net/Sudoku/Sudoku10 to solve a puzzle from http://www.sudoku4me.com/sudoku%2010x10.php (this solver needs 0 to be replaced with 10). As long as it is a valid sudoku solution, it should be fine. Some sudokus use letters instead of numbers; the idea still applies.
P.S.: I tried Excel's built-in random number generator to generate a ranked (sorted) list, but still ended up unable to get a consistent 5 members per group with a different group each week (same trouble as OP). I have fond memories of 9x9 sudoku; glad it came in handy for this solution.
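The two properties the sudoku idea relies on (no repeated number in a row, every number equally often in a column) hold for any Latin square, so a simple one can be generated instead of solved. The sketch below is an illustration under assumptions, not the method from the answer: it uses a cyclic shift rather than sudoku grids, the function name weekly_groups is made up, and it assumes the number of members is a multiple of the number of groups and that there are no more weeks than groups.

```python
from collections import Counter


def weekly_groups(n_members, n_groups, n_weeks):
    """schedule[m][w] is the 1-based group of member m in week w.

    Cyclic rule: every member moves on to the next group each week.
    """
    return [[(m + w) % n_groups + 1 for w in range(n_weeks)]
            for m in range(n_members)]


schedule = weekly_groups(n_members=50, n_groups=10, n_weeks=10)

# Every week, each of the 10 groups holds exactly 5 members.
for week in range(10):
    sizes = Counter(row[week] for row in schedule)
    assert set(sizes.values()) == {5}

# No member is ever placed in the same group twice.
assert all(len(set(row)) == 10 for row in schedule)
```

One trade-off of the cyclic rule is that members who start in the same group stay together every week; stacking different shuffled grids, as suggested above, mixes the members as well.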

generate normalized discrete values for feature engineering

I have a dataframe with one column storing discrete values, shown as follows. I would like to create another column storing the normalized values. For instance, for 4050 the corresponding entry would be 4. Is there an efficient way to do that instead of writing my own function? Does sklearn have any functions for generating normalized values?
Based on your comment:
there are around 20 different values, and the range is from 1000 to 9999, so I would like to use every 1000 as a category
This isn't really normalization in the strict sense of the word. However, to do that, you can easily use floor division (//):
df['new_column'] = df['values']//1000
For example:
>>> df
values
0 2021
1 8093
2 9870
3 4508
4 2645
5 1441
6 8888
7 8921
8 7292
9 8571
>>> df['new_column'] = df['values']//1000
>>> df
values new_column
0 2021 2
1 8093 8
2 9870 9
3 4508 4
4 2645 2
5 1441 1
6 8888 8
7 8921 8
8 7292 7
9 8571 8
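For a self-contained version of the example above, the sketch below rebuilds the same ten values and also shows pd.cut as an alternative when explicit bin edges are preferred. The column name binned and the chosen edges (every 1000 from 1000 to 10000) are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"values": [2021, 8093, 9870, 4508, 2645, 1441, 8888, 8921, 7292, 8571]}
)

# Floor division, as in the answer: 4508 -> 4
df["new_column"] = df["values"] // 1000

# Alternative with explicit, left-closed bins [1000, 2000), ..., [9000, 10000)
edges = list(range(1000, 11000, 1000))   # 10 edges -> 9 bins
labels = list(range(1, 10))              # one label per bin
df["binned"] = pd.cut(df["values"], bins=edges, right=False, labels=labels)

print(df)
```

Floor division is the shorter route when the categories are exactly the thousands; pd.cut becomes useful once the bins are uneven or need their own labels.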

Using Excel to allocate values based off their rank while remaining within constraints

I am trying to create a resource calculator that can tell me how many people I need to put on each section, depending on the current work waiting and the work coming in, prioritizing the sections that have the most work waiting.
Upper Limit  Allocation  Prod  Ranking
12           [to calc]   28%   1
15                       18%   2
5                        17%   3
4                        8%    4
2                        6%    5
3                        .2%   6
4                        .2%   6
Similar to the other question, I have a constraint: I only have so much to allocate. For this example we will use 38 as the amount to be allocated.
I have used the formula from the other answer:
=MIN(A2,$E$1-SUMIF($D$2:$D$8,"<"&D2,$B$2:$B$8))
Where E1 contains the total to be allocated.
I have two issues with this formula:
1) I require a minimum of at least 1 person in each of these sections.
I have tried using a MAX function to simply set this value, but that leads to the allocated resources exceeding the total amount.
What formula would I need so that it accounts for the total available to allocate, the minimum requirement for each section, and the maximum limit for each section?
2) It only returns whole numbers. Would there be a way to retrieve more precise results, maybe by changing it to a % distribution?
UL Alloc Rank Capacity Lower Limit
2 1 15 93 1
3 1 15
4 1 15
6 6 8
1 1 15
2 1 15
4 4 9
2 2 7
4 4 4
15 15 2
12 12 10
12 12 1
1 1 11
13 13 5
6 6 6
5 1 15
5 5 3
1 1 14
2 2 13
3 3 12
3 1 15
Reference: Using the Excel's Rank() function to calculate allocations based on ranking and constraints
Simply subtract the minimum (100 in this example) from every term and add it back at the end:
=MIN(A2-100,($E$1-100*COUNTA($A$2:$A$8))-(SUMIF($D$2:$D$8,"<"&D2,$B$2:$B$8)-COUNTIF($D$2:$D$8,"<"&D2)*100))+100
What is returned depends on your entries in Column A and in E1. You can change Column A based on a percentage distribution and the formula will return the corresponding values.
Edit:
If you put your lower threshold in F2 and your constraint in E2, and use this formula
=MIN(A2-$F$2,($E$2-$F$2*COUNTA($A$2:$A$8))-(SUMIF($D$2:$D$8,"<"&D2,$B$2:$B$8)-COUNTIF($D$2:$D$8,"<"&D2)*$F$2))+$F$2
the result looks like this:
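The idea behind the formula (give every section the lower threshold first, then hand out what is left in rank order up to each upper limit) can be sketched in Python to check the numbers. This is an illustration under assumptions, not a translation of the formula: the function name allocate is made up, tied ranks are served one after another in list order, and the total is assumed to be at least the threshold times the number of sections.

```python
def allocate(upper_limits, ranks, total, floor):
    """Floor-first allocation in rank order (illustrative sketch)."""
    n = len(upper_limits)
    alloc = [floor] * n                 # everyone gets the minimum first
    remaining = total - floor * n       # what is left to hand out by rank
    for i in sorted(range(n), key=lambda k: ranks[k]):
        extra = max(0, min(upper_limits[i] - floor, remaining))
        alloc[i] += extra
        remaining -= extra
    return alloc


# Data from the question: 38 people to allocate, at least 1 per section.
result = allocate(
    upper_limits=[12, 15, 5, 4, 2, 3, 4],
    ranks=[1, 2, 3, 4, 5, 6, 6],
    total=38,
    floor=1,
)
print(result)       # [12, 15, 5, 3, 1, 1, 1]
print(sum(result))  # 38
```

Because nothing in the loop requires integers, passing fractional limits or a fractional floor returns fractional allocations, which is one way to get the more precise results asked about in point 2.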

What type of ANOVA

What type of ANOVA should I use for 5 treatment groups? I have data for the number of colds reported as a function of vitamin C dose:
0mg 250mg 500mg 100mg 2000mg
5 6 4 6 3
6 5 6 6 0
2 4 2 3 1
5 4 5 0 3
This is a pretty simple one-way ANOVA, one factor with five treatments. Be aware that you have pretty low sample size in each group, so your power is low.
Also be aware that your data are counts (integers, not continuous), so you may need to log-transform the response or use a Poisson model.
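To make the one-way layout concrete, here is a minimal pure-Python sketch that computes the F statistic for the table above. The function name one_way_anova_f is made up for illustration, the dose labels are copied as given in the question, and no p-value is computed; a statistics package would normally be used for that.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Columns of the table in the question, one list per dose.
colds = {
    "0mg": [5, 6, 2, 5],
    "250mg": [6, 5, 4, 4],
    "500mg": [4, 6, 2, 5],
    "100mg": [6, 6, 3, 0],
    "2000mg": [3, 0, 1, 3],
}
f_stat, df_b, df_w = one_way_anova_f(list(colds.values()))
print(round(f_stat, 2), df_b, df_w)  # 1.67 4 15
```

The single factor is the dose, with five levels and four observations each, which is why the degrees of freedom come out as 4 and 15.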

Resources