count how often a piece of information appears and calculate that average - excel

I do not want to know the traditional frequency or the traditional averages; so I'll give an example below:
I have this data:
1
3
5
5
2
3
5
5
1
3
The analysis that I would like to obtain is the following:
for example number 1 appears once every eight rows, number 3 appears once every four rows, number 5 appears twice every two rows....
I did it by hand, but now I have more than 21000 rows of data and I'm stuck.
I searched but cannot find a function that does this. Before developing my own, I decided to ask for guidance on how to achieve it.

I believe that I was able to achieve the desired result:
The formula is:
=IF(CONCATENATE("1-",MATCH(D1,INDIRECT(ADDRESS(MATCH(D1,A1:A17,0)+1,1,4)&":A17"),0))="1-1",CONCATENATE("2-",MATCH(D1,INDIRECT(ADDRESS(MATCH(D1,A1:A17,0)+2,1,4)&":A17"),0)-1),CONCATENATE("1-",MATCH(D1,INDIRECT(ADDRESS(MATCH(D1,A1:A17,0)+1,1,4)&":A17"),0)))
Note that the IF function solves the duplicates (like the number 5). In case you have triplicates you will have to add another instance of IF and adjust the formula accordingly.
Hope that helps!

Well this doesn't exactly reproduce your results, but you could start by looking at the max and min separation of the numbers:
=IF(COUNTIF(A$1:A$10,C2)<=1,"",MIN(IF((ROW(A$1:INDEX(A$1:A$10,COUNTIF(A$1:A$10,C2)+1))>1)
*(ROW(A$1:INDEX(A$1:A$10,COUNTIF(A$1:A$10,C2)+1))<=COUNTIF(A$1:A$10,C2)),
FREQUENCY(IF(A$1:A$10<>C2,ROW(A$1:A$10)),IF(A$1:A$10=C2,ROW(A$1:A$10)))))+1)
=IF(COUNTIF(A$1:A$10,C2)<=1,"",MAX(IF((ROW(A$1:INDEX(A$1:A$10,COUNTIF(A$1:A$10,C2)+1))>1)
*(ROW(A$1:INDEX(A$1:A$10,COUNTIF(A$1:A$10,C2)+1))<=COUNTIF(A$1:A$10,C2)),
FREQUENCY(IF(A$1:A$10<>C2,ROW(A$1:A$10)),IF(A$1:A$10=C2,ROW(A$1:A$10)))))+1)
This gives the min or max number of rows between each occurrence of the particular number.
These must be entered as array formulas using Ctrl+Shift+Enter.
You could add other statistics (such as the mean or standard deviation) the same way, although the average can be calculated simply as (lastrow-firstrow)/(count-1). For example, for 5 it would be (8-3)/(4-1)=5/3.
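As a cross-check outside Excel, the (lastrow-firstrow)/(count-1) shortcut can be sketched in Python. This is only an illustration: the function name `mean_gaps` is made up here, and rows are numbered from 1 to match the worksheet.

```python
from collections import defaultdict

def mean_gaps(values):
    """Return {value: average row distance between consecutive occurrences}."""
    rows = defaultdict(list)
    for row, value in enumerate(values, start=1):  # 1-based rows, like Excel
        rows[value].append(row)
    # The gaps telescope, so their mean is (last row - first row) / (count - 1).
    return {
        value: (r[-1] - r[0]) / (len(r) - 1)
        for value, r in rows.items()
        if len(r) > 1  # a value seen only once has no gap
    }

data = [1, 3, 5, 5, 2, 3, 5, 5, 1, 3]
gaps = mean_gaps(data)
```

With the sample data from the question, the value 2 is left out because it occurs only once.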


What function to use for this difficult excel calculation for the roulette wheel?

I am a complete Excel and math noob, and I want a cell in Excel that displays the "Pelayo number", which is used in calculating bias in a roulette wheel. You can read more about it here: https://www.roulette-bet.com/2015/06/the-roulette-bias-winning-method.html
Let me explain briefly what I want. My sheet has two columns: one contains the numbers on a roulette wheel and the second contains the frequency of each number. At the top is the number of spins (852). The number at the bottom (23,02.....) is the expected frequency of each number. The table is dynamic, constantly evolving as I enter new data.
Now I want a cell to display the total number of positives. Which is calculated like this:
If there have been 300 spins, each numbers has to have been spun 300/36 = 8.33 in order to be breaking even. This means those which have been spun 8 times are losing a little, and those which have showed 9 times are winning something. If a number has appeared 14 times it is clear it has 14-8.33 = 5.67 which we will express in an abbreviated form like +5. Let’s suppose the exact same situation has occurred for 6 other numbers also, they all will make a total sum of 5.67 + 5.67 + 5.67 + 5.67 + 5.67 + 5.67+ 5.67 = 39.69. as no other number has been spun over 9 times, then we say the amount of total positives at this table at 300 spins is +39.
TL;DR: Ideally something like this: select all the numbers in G6:G42 that are bigger than the value in G50, subtract the expected frequency (G50) from each of them, and then add the results up.
I tried to solve it but just couldn't find a tutorial anywhere.
I'll break this down for you, and show you a few helpful Excel concepts along the way.
Especially if you are a beginner, I'd recommend using a helper column. Helper columns are great ways to break down complicated functions into smaller, more manageable parts.
In H6, write =IF(G6>G$50,G6-G$50,0). That IF statement sets us up for our sum, giving either the amount by which G6 exceeds the expected frequency in G50, or 0. The $ will be cleared up in a moment.
Then, hover your mouse over that cell, and a little square box will appear in the lower right corner of H6. Grab that tiny box, and drag it down to H42. This fills in the formula, adjusting all of the numbers relatively as you go. Note that the 50 stayed constant, however - that's what the $ did!
Column H is now your helper column. It doesn't find your answer, but it gets an important intermediate step done.
Finally, wherever you'd like your answer, write =SUM(H6:H42), and you should be well on your way.
=SUMIF(G6:G42,">"&G50,G6:G42)-COUNTIF(G6:G42,">"&G50)*G50
It sums the values that are greater than G50, then subtracts the G50 value as many times as there were values in the sum.
For example, suppose G50 is 23.02 and you have the values 20, 21, 22, 23, 24, 25.
It would calculate (24+25)-2*23.02 = 2.96, because only 24 and 25 are above 23.02.
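The same "total positives" arithmetic can be sketched in Python to sanity-check a worksheet result. The function name `total_positives` is hypothetical; it simply sums the excess of each frequency over the expected frequency.

```python
def total_positives(frequencies, expected):
    """Sum of (frequency - expected) over the frequencies above expected."""
    return sum(f - expected for f in frequencies if f > expected)

# Values from the example above, with an expected frequency of 23.02.
example = total_positives([20, 21, 22, 23, 24, 25], 23.02)
```

This mirrors the SUMIF/COUNTIF formula: only entries strictly above the expected frequency contribute.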

Excel Rolling Mean of 3 Similar Consecutive Observations

I'm trying to find the rolling mean of time series while ignoring values that do not follow the trend.
x
869
1570
946
0
1136
So, what I would want the result to look like is...
x | y
869 | 0
1570 | 0
946 | 1128.33
3 | 0
1136 | 1217.33 ([1136+1570+946]/3)
900 | 2982 ([946+1136+900]/3)
860 | 2896
The tough part here is that if the row I'm on is a trending value, I want to take the 3 most recent trending values and find their mean, but if it's a non-trending value I want it to just zero out. Sometimes I might have to skip 2 or 3 previous lines to get 3 trending values to average as well.
So far I've been using array, RC formulas in a VBA macro form, but I'm not sure I could use RC here or if it has to be something else completely. Any help would be greatly appreciated.
I believe I can help you with your problem. First three notes:
1) It appears to me that you are trying to do DCA on smoothed production profiles, ignoring months with an incomplete record or no data. I'm making this assumption since you mentioned this was time series data but didn't give a sample rate.
2) I've added some extra 'data' for the sake of the demo.
3) In the example you shared, the last two values in your 'Y' column look like they were summed but not divided.
The solution I came up with has three parts: 1) create a metric to identify 'outliers'; 2) flag 'outliers'; 3) smooth non-flagged data. Let's establish some worksheet infrastructure and say that your production values are in column B and the associated times are in column A.
Part 1) In column 'C', estimate a rough data value based on a trend approximated from two points on either side of your current time step. Subtract the actual value from this approximation. The result will always be positive and quite large for a timestep with little or no production.
=(INTERCEPT(B1:B6,A1:A6)+(A4*SLOPE(B1:B6,A1:A6)))-B4
Part 2) In column 'D', add a condition for when the value computed above is larger than the actual data point. Have it use '0' to identify a point that shouldn't be included in your average. Copy this down to the end of your data as well.
=IF(C4>B4,0,1)
Part 3) Your three-element average can now be computed. In the last cell of column 'E', enter the following array formula. You have to accept this formula by pressing Ctrl+Shift+Enter. Once that is done, fill the column from bottom to top:
=IFERROR(IF(D17=1,AVERAGE(INDEX(B12:B17,MATCH(2,1/(FIND(1,D12:D17)))),INDEX(B12:B16,MATCH(2,1/(FIND(1,D12:D16)))-COUNTIF(D17,"=0")),INDEX(B12:B15,MATCH(2,1/(FIND(1,D12:D15)))-COUNTIF(D16:D17,"=0"))),0),"")
This averages the most recent three values and allows for a skip of up to three time steps of outlier data, per your problem statement.
This was a fun challenge. I have some ideas for a more efficient formula, but this should get the job done. Please let me know how this works for you!
Cheers
[EDIT]
An alternative approach which allows the user to specify the number of previous entries to include is detailed below. This is a more general (preferred alternative) and picks up in place of the previously described step 3.
Part 3, alternative) In cell G2, enter the number of previous values to average; for this example I am sticking with 3. In cell E4, enter the following array expression (Ctrl+Shift+Enter) and drag it to the end of column E:
=IFERROR(IF(D4=1,SUM(INDEX(D:D,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):D4 * INDEX(B:B,LARGE(($D$4:D4=1)*ROW($D$4:D4),$G$2)):B4)/$G$2,0),"")
This uses the LARGE function to find the 'nth' largest value, where n is the number of preceding values from the current time-step to average. Then it builds a range that extends from the found cell to the current time step. Then it multiplies the flags (0's and 1's) by each month's production value, sums them and divides by n. In this way months flagged as bad are set to 0 and not included in the sum.
This is a much cleaner way to achieve the desired result and has the flexibility to average over different periods of time.
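The idea behind the alternative step 3 can be sketched in Python. This is a rough illustration, not a translation of the worksheet formula: the flags are taken as given (1 = keep, 0 = outlier), the function name `rolling_mean_flagged` is made up, and `None` stands in for the empty string the formula returns when there is not enough history.

```python
def rolling_mean_flagged(values, flags, n=3):
    """Mean of the n most recent flagged values; 0 for flagged-out rows."""
    result = []
    kept = []  # values with flag 1 seen so far, oldest first
    for value, flag in zip(values, flags):
        if flag != 1:
            result.append(0)      # outlier rows are zeroed out
            continue
        kept.append(value)
        if len(kept) < n:
            result.append(None)   # fewer than n usable values so far
        else:
            result.append(sum(kept[-n:]) / n)
    return result

# First rows of the question's data, with the zero row flagged as an outlier.
means = rolling_mean_flagged([869, 1570, 946, 0, 1136, 900], [1, 1, 1, 0, 1, 1])
```

Changing `n` plays the same role as changing the value in cell G2.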

Excel: Dividing numbers and using the remainders

Little issue I'm having that I'm hoping someone can help me with please?
So I have 3 columns in Excel. Each Column (A/B/C) contains either "high" / "Medium" / "Low" scored issues. However, if you have 3 Low issues, this is grouped together, and this becomes 1 Medium Issue for example.
The difficulty I'm having is writing a formula that will do this for me. Obviously I could just divide the number of Low issues I have by 3, but in the case where I have 7 Low issues, It should result with 2 Mediums and 1 remaining Low. I've tried using the "Mod" function, but that only returns the remainder.
What I need is a formula that will say "If you have 7 Low issues (3 Low = 1 Medium), then you have 2 Medium and 1 Low". The Medium issues would then be added to the Medium column (Col B), and the remaining Low issue would be counted in the Low column (Col C).
I hope this explanation makes sense, fingers crossed one of you might be able to help me! Thank you in advance
As requested, a screenshot!
If I understand you correctly, I think you should be able to adapt the following formulas to meet your needs.
To get the number of occurrences of the word "Low" in column A:
=COUNTIF(A:A, "=Low")
To get the number of "Mediums" from 3 occurrences of "Low" in column A, round down the above number divided by 3:
=FLOOR(COUNTIF(A:A, "=Low")/3,1)
To get the remaining "Lows" after groupings of 3 into "Mediums", use MOD:
=MOD(COUNTIF(A:A, "=Low"),3)
Finally, if you wanted one "Mediums" count, i.e. adding the remaining "Mediums" which aren't grouped into "Highs", you would use a combination of the above formulas for what is left after grouping to "Highs" with what is gained from grouping of "Lows".
It sounds like you were nearly there with =MOD(); it just needed a little tweak:
For the high column:
=COUNTA(A2:A8)+FLOOR(COUNTA(B2:B8)/3,1)
For the medium column:
=FLOOR(COUNTA(C2:C8)/3,1)+MOD(COUNTA(B2:B8),3)
For the low column:
=MOD(COUNTA(C2:C8),3)
It's exactly like the long addition you do at school, where each column carries over to the one to the left of it (except in base 3 instead of base 10). I'm not sure the existing answers cover the case where a carry from one column causes a further carry from the next column, so here is another answer.
In the totals row (e.g. for the medium column), in (say) C12:
=COUNTA(C2:C10)+INT(D12/3)
Then use mod as before
=MOD(C12,3)
except that in the high column you don't want to use MOD so it's just
=B12
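The carry logic can be sketched in Python to show the cascading case. The function name `roll_up` and its `group` parameter are hypothetical; the inputs are the raw counts in each column.

```python
def roll_up(high, medium, low, group=3):
    """Carry every `group` lows into one medium, and every `group` mediums into one high."""
    medium_total = medium + low // group       # mediums after the carry from lows
    high_total = high + medium_total // group  # highs after the carry from mediums
    return high_total, medium_total % group, low % group
```

For instance, 2 mediums and 3 lows roll up to a single high: the lows carry into a third medium, which in turn carries into the high column.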

Excel AVERAGEIFS else statement

I'm trying to perform an AVERAGEIFS formula on some data, but there are 2 possible results and as far as I can tell AVERAGEIFS doesn't deal with that situation.
I basically want to have an ELSE inside it.
At the moment I have 2 ranges of data:
The first column only contains values 'M-T' and 'F' (Mon-Thurs and Fri).
The second column contains a time.
The times on the rows with an 'F' value in column 1 are an hour behind the rest.
I want to take an average of all the times, adjusting for the hour delay on Fridays.
So for example I want it to take an average of all the times, but subtract 1 hour from the values which are in a row with an 'F' value in it.
The way I've been doing it so far is by having 2 separate results for each day, then averaging them again for a final one:
=AVERAGEIFS(G3:G172, B3:B172, "M-T")
=AVERAGEIFS(G3:G172, B3:B172, "F")
I want to combine this into just one result.
The closest I can get is the following:
=AVERAGE(IF(B3:B172="M-T",G3:G172,((G3:G172)-1/24)))
But this doesn't produce the correct result.
Any advice?
Try this
=(SUMPRODUCT(G3:G172)-(COUNTIF(B3:B172,"=F")/24))/COUNTIF(B3:B172,"<>""""")
EDIT
Explaining various steps in the formula as per sample data in the snapshot.
SUMPRODUCT(G3:G17) sums all the values from G3 to G17. It gives a value of 4.635416667, which after formatting as [h]:mm shows 111:15.
The OP wants the Friday times to be one hour less, so in column H of the sample data I have reduced the Friday values by one hour. A similar SUMPRODUCT on H3:H17 gives 4.510416667, which after formatting as [h]:mm shows 108:15. That is exactly three hours less, for the three occurrences of Friday in the sample data.
=COUNTIF(B3:B17,"=F") counts the occurrences of Friday in the range B3:B17, which is 3. Hence 3 hours have to be subtracted. Excel stores times as fractions of a 24-hour day, so the COUNTIF() value is divided by 24. This gives 0.125, which matches the difference between 4.635416667 and 4.510416667.
Demonstration column H is for illustrative purposes only. The Friday-adjusted total (108:15 in the sample data) has to be divided by the number of data points to get the average. The number of data points is calculated by =COUNTIF(B3:B17,"<>""""") with a check for empty cells.
Thus 108:15 divided by 15 data points gives 7:13 as the answer.
Revised EDIT, based upon suggestions by @Tom Sharpe
@TomSharpe has been kind enough to point out the shortcomings in the method I proposed. COUNTIF(B3:B172,"<>""""") counts too many cells and is not advised; COUNTA(B3:B172) or COUNT(G3:G172) is preferable. A better formula for the average, as per his suggestion, gives accurate results:
=AVERAGE(IF(B3:B172="M-T",G3:G172,((G3:G172)-1/24)))
This is an array formula. It has to be entered with Ctrl+Shift+Enter, and the cell then has to be formatted as a time.
If your column of M-T and F is named Day and your column of times is named TIME then:
=SUMPRODUCT(((Day="M-T")*TIME + (Day="F")*(TIME-1/24)))/COUNT(TIME)
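The same adjustment can be sketched in Python. This assumes, as in Excel, that times are stored as fractions of a day, so one hour is 1/24; the function name `adjusted_average` is made up for the illustration.

```python
HOUR = 1 / 24  # Excel stores times as fractions of a 24-hour day

def adjusted_average(days, times):
    """Average of all times, with one hour subtracted from rows marked 'F'."""
    adjusted = [t - HOUR if d == "F" else t for d, t in zip(days, times)]
    return sum(adjusted) / len(adjusted)
```

Because every row is adjusted before a single average is taken, each data point carries equal weight regardless of how many Fridays there are.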
One simple solution would be to create a separate column that maps the time column and performs the adjustment there. Then average this new column.
Is that an option?
Ended up just combining the two averageifs. No idea why I didn't just do that from the start:
=((AVERAGEIFS(G$3:G171, $B$3:$B171, "F")-1/24)+AVERAGEIFS(G$3:G171, $B$3:$B171, "M-T"))/2

Percentage Calculation Formula

I am trying to calculate a score that has 5 sections (with a minimum of 3 questions per section). Some of the answers can be N/A. Each of the first 4 parts should be (total Yes answers) / (total Yes + No answers) * 10, and the 5th part should be multiplied by 60 instead. Finally, I need to sum all 5 parts to get a final score.
My Solution : (first 4 parts)
=IFERROR((COUNTIF(D70:D72,"Yes")/SUM(COUNTIF(D70:D72,"Yes"),(COUNTIF(D70:D72,"No"))))*10,"N/A")
The above formula continues for other 3 sections with different range values
5th part :
=IFERROR((COUNTIF(D70:D72,"Yes")/SUM(COUNTIF(D70:D72,"Yes"),(COUNTIF(D70:D72,"No"))))*60,"N/A")
Final Score :
=AGGREGATE(9,6,(G30,G42,G52,G64,G73))/100
I tested my formula with one section entirely N/A and the others all Yes. This gives me a result of 90% rather than 100%.
My question is: if one of the parts is completely N/A, how should I ignore that section and still get 100%?
In your final calculation, rather than dividing by 100 you need to divide by the sum of the possible points. There will be a more elegant way to do this, but since there are only five of them:
=AGGREGATE(9,6,(G30,G42,G52,G64,G73))/SUM(IF(ISERROR(G30),0,10),IF(ISERROR(G42),0,10),IF(ISERROR(G52),0,10),IF(ISERROR(G64),0,10),IF(ISERROR(G73),0,60))
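The "divide by the possible points" idea can be sketched in Python. This is an illustration only: the function name `final_score` and the data layout (a list of answer lists paired with their weights) are assumptions, not part of the worksheet.

```python
def final_score(sections):
    """sections: list of (answers, weight) pairs; answers are 'Yes', 'No' or 'N/A'."""
    earned = 0.0
    possible = 0.0
    for answers, weight in sections:
        yes = answers.count("Yes")
        no = answers.count("No")
        if yes + no == 0:
            continue  # section is entirely N/A: contributes no points either way
        earned += weight * yes / (yes + no)
        possible += weight
    return earned / possible if possible else None
```

A section that is entirely N/A drops out of both the numerator and the denominator, so the remaining sections can still reach 100%.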
Yes! It finally worked! Thanks a lot for your guidance!
=IFERROR((SUM(G30,G42,G52,G64,G73)/SUM(IF(G30="#N/A",0,10),IF(G42="#N/A",0,10),IF(G52="#N/A",0,10),IF(G64="#N/A",0,10),IF(G73="#N/A",0,60))), "-")
