Historic data with missing dates in excel - excel

Everyone,
I have an Excel sheet which I have imported from my ERP program. It contains data about deviations in raw materials which were noted at specific dates.
There are 40 different materials and data was gathered throughout the last year.
The raw data looks like this:
Material name | Date | Deviation
Blue dye |2014.05.01| 50
Yellow dye |2014.07.02|-40
Blue dye |2014.07.04| 10
How can I transform this data into a stock-type chart which shows cumulative deviations throughout the year (i.e. if Blue dye is always positive, how much had added up by each date)?
I have figured out how to sum up the deviations with their previous values. I have also transformed the table so that each material has its deviations in a separate row:
Material name1|Date1|Date2|Date3
|50 |-10 |20
Material name2|Date2|Date5|Date6
|5 |10 |-100
The problem is that the deviations don't happen on the same dates. If they were noted on the same day every week, this would not be hard at all. As it is, one material might not have a deviation for a month or two, while another fluctuates every couple of days. I would need to somehow interpolate the data in between the dates, so that every day of the year is filled in.
I would appreciate any ideas, at this point I'm just stuck...

I thought the above may have been a little vague, so I've put together a quick example at the link below. There are 3 tabs:
1 for the raw data
2 to get the differences by date & material
3 to show stock holding each day by material (with no change should there be no change)
Assuming you wanted to graph this info by date/product you should have no problem doing this from the example.
Hope this is of more help!
http://www.filedropper.com/materialexample

I would suggest keeping the original data but creating a table on a new tab: list every day of the year in column A from A2 down, then list the 40 materials across in B1, C1, D1 and so on.
Then, starting in the first cell (B2), use SUMIF with AND-style conditions (SUMIFS handles both conditions in one function) to match the date in A2 and the material in B1. If there is a match, add or subtract the difference. Perhaps start one cell lower down and use B1, C1 etc. for the starting number to perform calculations. This should give you the holding at any one point.
Hope to have helped :)
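The same day-by-day table can be sketched outside Excel. The Python below is only an illustration: the function name `daily_cumulative` is made up for this example, and it assumes the dates in the question are written as year.month.day. Days without an entry add nothing, so the previous running total is carried forward rather than interpolated.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_cumulative(rows, start, end):
    """Return {material: [running total for every day from start to end]}."""
    # Sum deviations that share the same material and day.
    per_day = defaultdict(float)
    for material, day, deviation in rows:
        per_day[(material, day)] += deviation

    materials = sorted({material for material, _, _ in rows})
    n_days = (end - start).days + 1
    result = {}
    for material in materials:
        total = 0.0
        series = []
        for offset in range(n_days):
            day = start + timedelta(days=offset)
            # A day with no entry adds 0, so the previous total carries forward.
            total += per_day.get((material, day), 0.0)
            series.append(total)
        result[material] = series
    return result

# Sample rows from the question (dates assumed to be year.month.day).
rows = [
    ("Blue dye", date(2014, 5, 1), 50),
    ("Yellow dye", date(2014, 7, 2), -40),
    ("Blue dye", date(2014, 7, 4), 10),
]
totals = daily_cumulative(rows, date(2014, 5, 1), date(2014, 7, 4))
```

Each list in `totals` has one value per calendar day, so every material shares the same X axis when charted. Rows dated outside the start/end range are ignored by this sketch.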

Related

Which is the best Data Mining model to extrapolate known values to missing values in a table? (General question)

I am working on a little data mining project (I am still a Data Science student, not a professional). Maybe you can help me to choose a proper model for my task.
So, let's say we have a table with three columns and around 4000 rows:
YEAR | COLOR | NAME
1900 | Green | David
1901 | Yellow | Sarah
1902 | Green | ???
1902 | Red | Sarah
… | … | …
2020 | Purple | John
Any value for any field can be repeated in the dataset (also Year values).
In the first two columns we don't have missing values, but we only have around 20% of the Name values in the third column. The Name value depends somewhat on the first two columns (not a causal relation).
My goal is to extrapolate the available Name values to the whole table and get a range of occurrences for each Name value (for example in a boxplot).
I have imagined a process like this, although I am not sure it makes sense statistically (any objections and suggestions are appreciated):
For every unknown NAME value, the algorithm randomly chooses one of the already known NAME values. The odds of a particular NAME value being chosen depend on the variables YEAR and COLOR. For instance, if 'David' values tend to be correlated with low Year values AND with 'Green' or 'Purple' values for Color, the algorithm gives 'David' a higher probability of being chosen if the input values for Year and Color are "1900, Purple".
When the above process ends, the number of occurrences for each name is counted.
The above process is applied 30 times and the results for each name are displayed in a boxplot.
However, I don't know which model is best for implementing an idea like this. I have drawn the process in a simple Paint drawing:
[Image: Possible output for the task]
Which approach do you think would be good for this task? I appreciate any help.
I think you have the process down, it's converting the data which may be the first hurdle.
I would look at using OrdinalEncoder (from sklearn.preprocessing import OrdinalEncoder) to convert the data from categorical to numeric.
You could then use a random number generator to produce a number within the range defined by the encoding which would randomly select a name.
Loop through this 30 times with a for loop to achieve the result.
It also looks like you will need to provide the ranking values for year and colour prior to building out your code. From here you would just provide bands, for example, if year > 1985, etc within your for loop to specify the names.
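A rough sketch of the process described in the question, using only the Python standard library rather than sklearn. The scoring rule is an assumption made purely for illustration (one point of smoothing, plus a point for each known row of that name with the same color, plus a point for each within `year_window` years). The function name `impute_counts` and its parameters are hypothetical as well.

```python
import random
from collections import Counter

def impute_counts(rows, runs=30, year_window=10, seed=0):
    """Fill missing names by weighted random draws; return one Counter per run."""
    rng = random.Random(seed)
    known = [(y, c, n) for y, c, n in rows if n is not None]
    missing = [(y, c) for y, c, n in rows if n is None]
    names = sorted({n for _, _, n in known})

    def weight(candidate, year, color):
        # ASSUMED scoring rule, not taken from the question: start at 1 so
        # every name stays possible, then add a point per known row of this
        # name that shares the color and a point per row close in year.
        score = 1.0
        for ky, kc, kn in known:
            if kn == candidate:
                score += (kc == color) + (abs(ky - year) <= year_window)
        return score

    # The weights depend only on the data, so compute them once per missing row.
    weights = [[weight(n, y, c) for n in names] for y, c in missing]
    base = Counter(n for _, _, n in known)

    results = []
    for _ in range(runs):
        counts = Counter(base)
        for w in weights:
            counts[rng.choices(names, weights=w, k=1)[0]] += 1
        results.append(counts)
    return results

# Rows shown in the question; None marks the unknown name.
rows = [
    (1900, "Green", "David"),
    (1901, "Yellow", "Sarah"),
    (1902, "Green", None),
    (1902, "Red", "Sarah"),
    (2020, "Purple", "John"),
]
results = impute_counts(rows)
```

The 30 counters in `results` give, for each name, the 30 occurrence counts that would feed one box of the boxplot. Whether this kind of weighting is statistically sound is a separate question, and the sketch does not settle it.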

Determining how many entries were present on each day for a constantly shifting daily dataset

I have searched a lot for a solution to this, but have found nothing. Maybe that's because it's a little hard to describe, or at least, I'm having trouble describing it for a search engine.
I have two columns of dates, the first column is the date a purchase order was received to be inspected, the second is the date that purchase order was accepted or rejected. What I would like is a graph with dates on the X-axis, and then the number of purchase orders in the queue on that day on the Y-axis.
Some purchase orders are completed that day, so they would still be counted, but they might not get addressed for days or weeks, so they would be counted on all those days until they were addressed.
I've been trying to do this with a formula, but am stumped. I feel like I might need to use multiple formulas, or go over to VBA, but my VBA is a little limited.
Edit: Here is a sample dataset.
Date In Date Out
9/1/18 9/1/18
9/1/18 9/1/18
9/1/18 9/2/18
9/1/18 9/3/18
9/2/18 9/2/18
9/2/18 9/4/18
So, it would be 4 for 9/1/18, 4 for 9/2/18, 2 for 9/3/18, and 1 for 9/4/18.
I have tried using COUNTIFS, but I don't know how to check between the two columns for the "between" dates.
If your data is in columns A and B, put your dates in column C (the X axis of your chart), then in column D you can write =COUNTIFS($A$1:$A$1000,"<="&C1,$B$1:$B$1000,">="&C1). COUNTIFS only counts a row when all of its conditions are met, so this counts the orders whose Date In is on or before the date in C1 and whose Date Out is on or after it.
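The counting rule from the question (an order is in the queue on every day from its Date In to its Date Out, inclusive) can be checked against the sample data with a short Python sketch. The function name `queue_per_day` is made up for this example, and the dates are assumed to be month/day/year.

```python
from datetime import date, timedelta

def queue_per_day(orders):
    """Count the orders open on each day, inclusive of both end dates."""
    first = min(d_in for d_in, _ in orders)
    last = max(d_out for _, d_out in orders)
    counts = {}
    day = first
    while day <= last:
        # An order counts on this day if it arrived on or before it
        # and was resolved on or after it.
        counts[day] = sum(1 for d_in, d_out in orders if d_in <= day <= d_out)
        day += timedelta(days=1)
    return counts

# Sample dataset from the question (assumed month/day/year).
orders = [
    (date(2018, 9, 1), date(2018, 9, 1)),
    (date(2018, 9, 1), date(2018, 9, 1)),
    (date(2018, 9, 1), date(2018, 9, 2)),
    (date(2018, 9, 1), date(2018, 9, 3)),
    (date(2018, 9, 2), date(2018, 9, 2)),
    (date(2018, 9, 2), date(2018, 9, 4)),
]
counts = queue_per_day(orders)
print(list(counts.values()))  # [4, 4, 2, 1]
```

The result matches the 4, 4, 2, 1 given in the question for 9/1/18 through 9/4/18.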

Distribution of time values randomly in a table Excel - Modeling Power Grid

I am working on a model of the charging load of electric vehicles. I am attaching a link to an Excel workbook for better understanding.
Column B contains random time values.
Columns G to P represent houses, and each house can have 1 car, so each time value needs to be placed in one column. When a car is plugged in, its load stays constant for 3 cells.
I want Excel to randomly distribute these cars, e.g. 4 cars to 4 houses, and leave the others blank.
What I can think of is to assign each time a random house, then use an IF formula with the AND function to match the random times with the time series, with a second condition to match the random houses with columns 1-10.
The problem I am facing is that the formula gives a #VALUE! error and only works in the rows which have a randomly generated time in front of them (screenshot). I know there is a very small thing that I am missing. Please help me find it.
Regards
workbook
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(AND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))>=$F6,INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))<=$F6+TIME(0,30,0)),11,""))
The two elements in the AND find the house number in column C and return the corresponding time in column B.
The first element compares the time in F to that time. The second element compares the time + 30 minutes to F (three cells). If it's between those two times, it gets an 11.
The ISNA makes sure that the house in question is on the list. You could also use an IFERROR, but I prefer the precision of ISNA.
Update
If you want the values to wrap around, you need to OR compare to the next day.
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,30,0),5))),11,""))
That formula structure looks like
=If(isna(),"",if(or(and(today,today),and(tomorrow,tomorrow)),11,""))
This formula is already getting too big. If you triple it for your three voltages, it will be huge. You should consider writing a UDF in VBA. It won't be as quick to calculate, but will probably be more maintainable.
If you want to stick with a formula, you could put the wattage in row 4 above the house number. Then in another table, list the wattages and minutes to charge. So in, say, B12:C14 you have
3.7 120
11 30
22 15
Now where you have 11 in your formula, you'd have G$4, and in the two places you have TIME(0,30,0), you'd have TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0). I re-arranged some stuff to make it more 'readable' (but it's still pretty tough) and here's the final formula:
=IF(ISNA(MATCH(G$5,$C$6:$C$9,FALSE)),"",IF(OR(AND(ROUND($F6,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5)),AND(ROUND($F6+1,5)>=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE)),5),ROUND($F6+1,5)<=ROUND(INDEX($B$6:$B$9,MATCH(G$5,$C$6:$C$9,FALSE))+TIME(0,INDEX($C$12:$C$14,MATCH(G$4,$B$12:$B$14,FALSE)),0),5))),G$4,""))
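The window test in that formula may be easier to follow outside Excel. The Python sketch below is not the UDF itself, only an illustration of the same logic. The names `CHARGE_MINUTES`, `minutes` and `load_at` are invented for this example, and it swaps the formula's "today OR tomorrow" comparison for modulo arithmetic on minutes, which gives the same wrap-around at midnight.

```python
from datetime import time

# Rating -> minutes to charge, copied from the lookup table in B12:C14.
CHARGE_MINUTES = {3.7: 120, 11: 30, 22: 15}

def minutes(t):
    """Minutes since midnight for a datetime.time value."""
    return t.hour * 60 + t.minute

def load_at(slot, plug_in, rating):
    """Return the rating if `slot` lies in the charging window, else None."""
    duration = CHARGE_MINUTES[rating]
    # Minutes elapsed since plug-in, wrapped so that a window which
    # crosses midnight still matches slots on the following day.
    elapsed = (minutes(slot) - minutes(plug_in)) % (24 * 60)
    # Both ends are inclusive, like the >= and <= in the formula.
    return rating if elapsed <= duration else None

print(load_at(time(18, 30), time(18, 0), 11))   # 11
print(load_at(time(18, 45), time(18, 0), 11))   # None
print(load_at(time(0, 10), time(23, 50), 11))   # 11
```

A house that is not on the list (the ISNA branch) would simply never call `load_at`, so its column stays blank.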

I'm trying to calculate the average with multiple columns (days and firms) and rows

I'm currently trying to calculate the average daily sales of a firm. The example below is small, but in reality I have 280 days and over 100 firms. I've tried VLOOKUP(firmname A, (cells), sales), but it obviously only brings back one number. SUM(VLOOKUP) is also not the best choice.
Could anyone point me in the right direction ... SUMIF?
http://i66.tinypic.com/2wmnrbn.png
Assuming the data is located at B2:D11 (firm names in column C, sales in column D, as the formula expects) and the list of firms is at F2:F5, enter this formula in G3 and copy it down to the last record:
=AVERAGEIFS($D$3:$D$11,$C$3:$C$11,$F3)
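For comparison, the same per-firm average can be sketched in Python. The sample records are invented (the original screenshot is not reproduced here) and `average_sales_by_firm` is a hypothetical name. Each record is assumed to be a (day, firm, sales) triple.

```python
from collections import defaultdict

def average_sales_by_firm(records):
    """Return {firm: mean of its sales values}, one AVERAGEIFS result per firm."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _day, firm, sales in records:
        totals[firm] += sales
        counts[firm] += 1
    return {firm: totals[firm] / counts[firm] for firm in totals}

# Made-up sample data: (day, firm, sales).
records = [
    (1, "Firm A", 100),
    (1, "Firm B", 40),
    (2, "Firm A", 140),
    (2, "Firm B", 60),
]
averages = average_sales_by_firm(records)
print(averages)  # {'Firm A': 120.0, 'Firm B': 50.0}
```

Like AVERAGEIFS, this averages over the rows that exist for a firm, so a day with no row for that firm is left out of the average rather than counted as zero.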

Excel Formulae needed to calculate weight loss

Some friends and I are taking part in a weight loss challenge this year and I will be recording monthly weigh-ins and body measurements. I need to find a calculation which will work out the differences in inches and pounds.
I have the item titles in column B from row 10 down to row 17. The first one, in row 10, is weight, which is recorded in pounds.
The months then go across from column C, starting with January and ending with December in column N.
The total loss in column O needs to update after every monthly entry.
Unfortunately I cannot post a picture of the table as I'm new to this group.
I've tried other formulas suggested to people with similar problems but they don't work for me.
Can anyone help?
Many Thanks
Helen
Bit hard to work it out from your description but I think you are looking for
=C10-MIN(D10:N10)
That assumes the largest figure will always be in column C and will update every time a new entry is placed in the row.
If the weight might go up (not that you are going to fail the challenge) you could use
=C10 - LOOKUP(1,1/(D10:N10<>""),D10:N10)
This should do the trick. (And you can copy down to other rows as necessary)
=INDEX(C10:N10,1,COUNT(C10:N10))-C10
INDEX used here, returns the value from the range C10:N10 in the first and only row, where the column is determined by the count of values already entered. So if you have values entered for 4 months, the formula will take April's value and subtract January's value.
A negative number will represent weight loss. A positive number means weight gain.
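The "latest entry minus January" idea can be written as a small Python sketch. The name `total_change` is made up for this example, blank months are represented as None, and the sample weights are invented. Like the formulas above, it assumes January has been filled in.

```python
def total_change(monthly):
    """Latest recorded value minus the first, or None if nothing is entered."""
    entered = [value for value in monthly if value is not None]
    if not entered:
        return None
    return entered[-1] - entered[0]

# Invented example: four months entered, eight still blank.
weights = [200, 195, 192, 190] + [None] * 8
print(total_change(weights))  # -10
```

As with the INDEX formula, a negative result means a loss and a positive result means a gain.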
Total fat loss :
(AVERAGE(C10:N10) - C10)*2
