Normalization of data in Excel from 0 to 1 - excel

For my class we are learning normalization of data. My proffessor gave an example of how to do do it with the data below (first table below). But he didn't really show how to get the numbers of the second table in Excel. Can someone show me how he got those numbers? The numbers in the second table are from 0 to 1, where 0 is the wrost and 1 is the best.
https://i.stack.imgur.com/cBs1c.png
DRAW BEST TO WORST CHART ON BOARD – WITH LINEARY NORMALIZATION BETWEEN
https://i.stack.imgur.com/mnmoP.png

To normalize you only need to divide by the greatest value in each column. Thus the biggest value will be equal to one.

Related

Counting data in Excel to calculate probabilities of intersections

I have the following problem and hope someone can give me a hint: I have an Excel sheet with three columns. In the first column I have a country code, in the second column I have a sector code (~50 sector codes per country and more than 30 countries). The third column includes a 0/1-Dummy. I would like to know the probability that the Dummy is one for sector 1 AND sector 2 (intersection). For that I need to know how often a 1 occurs in sector 1 and in sector 2.
The final output should be a conditional probability, and I think calculating it with the well know formula P(A|B)=P(intersection A and B)/P(B) is the easiest way - however, if there are easier ways to calculate the conditional probability, I would be very grateful as well.
In a simplified version the problem looks as follows, where I would like to know the probability that a AND b are 1:
screenshot of simplified table
Thanks in advance!
Just to get things started, I suggest you pivot the data first, then divide the number of rows with a=1 and b=1 by the number of rows (countries) in the table using
=COUNTIFS(G3:G5,1,H3:H5,1)/COUNT(G3:G5)

Excel Rolling Mean of 3 Similar Consecutive Observations

I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is if the row I'm on is a trending value I want to take the 3 previous trending values and find them mean of them, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to take the average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months without a complete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate. 2) I've added some extra 'data' for the sake of demo-ing. 3) In your example you shared, the last two values in your 'Y' column it looks like you may have summed but have forgotten to divide.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated time is in column A as follows:
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Our sheet now looks like this:
3) Your three element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing ctrl + shift + enter. Once that is done fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This takes averages the most recent three values and allows for a skip of up to three time steps of outlier data per your problem statement. For an idea of how the completed sheet looks:
This was a fun challenge, I have some ideas for a more efficient formula but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
3Alt) In cell G2 enter a number of previous values to average, for this example I am sticking with 3. In cell E4 enter the following array expression (ctrl+shift+enter) and drag to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average different periods of time. See example of the final value below.

Statistic with Median

So I have an excel-sheet where I have different values, for example:
I want to judge these objects by their values. The values should be between 0 and 1 so in the end I can draw a Matrix. So far so good. What you could do is just take the maximum value and divide the value of the object with that maximum value. For the final result I just take the average of all 3 values.
Now I have the problem, that if one value is too big, it effects the whole situation. So I know, that the Median tries to resolve this, but how can I use this, to get the percentages/values between 0 and 1? And is there an easy way to do this in Excel?
Does excel not have a median function?
Otherwise, you can sort and find the middle value if the number of rows is odd, or the two middle values if the number of rows is even and take the average of those two values to get the median.

How to generate random numbers within a normal distribution using Excel

I want to use the RAND() function in Excel to generate a random number between 0 and 1.
However, I would like 80% of the values to fall between 0 and 0.2, 90% of the values to fall between 0 and 0.3, 95% of the values to fall between 0 and 0.5, etc.
This reminds me that I took an applied statistics course once upon a time, but not of what was actually in the course...
How is the best way to go about achieving this result using an Excel formula. Alternatively, what is this kind of statistical calculation called / any other pointers that I can Google around for.
=================
Use case:
I have a single column of meter readings, which I would like to duplicate 7 times (each column for a new month). each column has 55 000 rows. While the meter readings need to vary for each month, when taken as a time series, each meter number should have 7 realistic readings.
The aim is to produce realistic data to turn into heat maps (i.e. flag outlying meter readings)
I don't think that there is a formula which would fit exactly to your requirements. I would use a very straightforward solution:
Generate 80% of data using =RANDBETWEEN(0,20)/100
Generate 10% of data using =RANDBETWEEN(20,30)/100
Generate 5% of data using =RANDBETWEEN(30,50)/100
and so on
You can easily change the precision of generated data by modifying the parameters, for example: =RANDBETWEEN(0,2000)/10000 will generate data with up to 4 digits after decimal point.
UPDATE
Use a normal distribution for the use case, for example:
=NORMINV(RAND(), 20, 5)
where 20 is a mean value and 5 is a standard deviation.

Plotting an upper envelope function in Excel

I have got data points of a sound sample from Audacity which I exported to a .txt file and imported in Excel. Is it possible to plot an upper envelope function in Excel?
(In the end I have to determine the reverberation time, so the time in which the loudness decreases with 60dB.)
For a decaying oscillation such as this or a damped pendulum, the decay envelope can be found by looking at the difference between each sampled reading. You then need to see where the is a change in gradient from +ve to -ve. (or zero on one side). To do this some logical operators are used.
Method:
Consider data starting at column A row 5 for time and Column B row 5 for first data value.
Create a column which has the value minus the previous value. [ C6=B6-B5 etc.]
Next do a column for the gradient change giving a "flag" of 1 for a positive to negative or zero inflection [D7= IF (AND(C7>0,C6<=0],1,0)
This should produce a column of data that corresponds to the peaks.
In the next columns use the flag to get the original co-ordinates to display
[E7 =IF (D7=1,A7,"")] For time and [F7 IF(D7=1,B7,"")]
Copy this data using "Value" only in the paste option to yet another column.
Filter this to exclude the null data.
Beware of aliasing in data set.
Not the most elegant of solutions (and some steps can be linked -for clarity shown separately) but it works.
Tech99m

Resources