Finding Percentile from Data set and eliminating outliers - excel

Disclaimer, I am completely new to excel coding VBA etc and I am having a tough time even figuring out what I have to do.
In short, I have lengthy amount of raw data, and I need to determine the 95th and 5th percentile for each day (i.e. 14/3/2017), and subsequently remove rows with the values above/below the 95th and 5th percentile. It is tedious to find the percentile value for each day and use the or function to remove the outliers. I appreciate all help and please understand that I am very new to this hence I may not understand what you reply or where to put any code.
Thank you for all help

Related

Subtracting times across a day in excel

I am working on the capstone project in of the Google Career Certificate in Data Analytics. I am using Microsoft Excel. I have to calculate the ride length based on the start and end ride times. I've inputted the formula =F2(end time)-D2(start time) which returns the ride length. Going through my entire list I have some areas where the start time is like 11pm and the end time is 1am and this is returning ###### because it is a negative number with the regular formula. I've found a modified formula that can kind of do the conversion I am looking for but it is still a bit problematic. The modified formula is =(F2-D2+(F2<D2))*24 and it seems to give an accurate ride length if I reformat the answer to number. The issue is the rest of my data is in time format and the modified ones are in number format. If I convert the number values to time, the ride length values are inaccurate.
It is tricky to make the numeric value change as well due to me using a formula. I can correct them one by one after I save Excel and it no longer stores the numbers as the formula, but there are lots of data points to change and that would be time consuming. I'm hoping to find a more concise way to solve this problem. Maybe with a better formula.
[Snippet of the chart 1
Just like everything in life, there are multiple ways to achieve things. I would have formatted the date and time into a single cell; but. if you're gathering the data from another source, that's understandable.
A simple IF statement here will work. IF the days are one apart, then take '1' day off the starting time, else do your original formula:
=IF(E4-C4=1,F4-(D4-1),F4-D4)

Having problems spreading expense over months with partial month calculation

I am writing a nested if statement that will calculate monthly expense based on an "Expense Frequency" option. Most of them see to be working but I have two that will not work for some reason.
The first problem is the "Fixed" option - I want this to put the whole expense rate in the first month of the expense period. For some reason it is not triggering in the first month. I feel like this might be a simple fix?
The second problem is a little more complex. It is the "Spread Amount" option. When the months are full calendar months the calculation is easy, divide total expense amount by # of months. But when I factor in partial months the calculation diverges from what I am looking for. The shorter the time duration, the larger the variance is from the total expense. When I stretch the expense over a longer period the variance shrinks. It is a pretty complex calculation (I think) and basically I want to use days out of the month for the calculation in the partial months (first and/or last) and then whole months in the middle. I have attempted this in my attached spreadsheet and I thought it might work but it isn't. Is anyone able to help me out here? I would even be satisfied with a succinct explanation of why this calculation isn't possible / doesn't make sense / cannot be done so I can explain this to my boss. I am providing a cash reward for this if it can be handled in the next three hours. Please help!! Thank you!
This is my formula
=IF($I9="Spread Amount",IF(AND($J9<=O$6,$K9>=O$5),IF(EOMONTH($J9,0)=O$6,(($M9)/(DATEDIF($J9,$K9,"m")+1)((O$6-$J9+1)/O$4)),IF(EOMONTH($K9,0)=O$6,IF(EOMONTH($K9,0)=O$6,(($M9)/(DATEDIF($J9,$K9,"m")+1)(($K9-O$5+1)/O$4))),($M9-IF(ISERROR(HLOOKUP($K9,$6:9,$XFD9,0)),HLOOKUP(EOMONTH($K9,0),$6:9,$XFD9,0),0)-IF(ISERROR(HLOOKUP($J9,$6:9,$XFD9,0)),HLOOKUP(EOMONTH($J9,0),$6:9,$XFD9,0),0))/(DATEDIF($J9,$K9,"m")))),0),IF(O$7>$K9,0,IF($I9="EOQ",IF(OR(MONTH(O$7)=3,MONTH(O$7)=6,MONTH(O$7)=9,MONTH(O$7)=12),$M9,0),IF($I9="Spread Evenly",IF(AND($J9<=O$6,$K9>=O$5),IF(EOMONTH($J9,0)=O$6,$M9*((O$6-$J9+1)/O$4),IF(EOMONTH($K9,0)=O$6,$M9*(($K9-O$5+1)/O$4),$M9)),IF($I9="Fixed",IF(AND($J9>=O$5,$J9<=O$6),$M9,0),IF($I9="Repeat Annually",IF(MONTH($J9)=MONTH(O$6),$M9,0),0))),IF($I9="Odd Month",IF(ISODD(MONTH(O$6)),$M9,0),IF($I9="Daily",IF(AND($J9<=O$6,$K9>=O$5),IF(EOMONTH($J9,0)=O$6,($M9/($K9-$J9+1))(O$6-$J9+1),IF(EOMONTH($K9,0)=O$6,($M9/($K9-$J9+1))($K9-O$5+1),($M9/($K9-$J9+1))*O$4)),0))))))*1)
This is what the spreadsheet looks like

Excel Index Function

I just want to thank you guys in advance. I think you guys are doing a great job in helping people out with programming stuff. Pats on the back for all of you.
Here is what I've been working on: I have daily stock price return data on about 4000 stocks. I want to add them to my portfolio after observing their performance for 12 months. I will choose the top 10% best performers and bottom 10% worst performers. I will create multiple portfolios over a period of time. I have done that with no problem.
I want to use the INDEX function to calculate the daily return of my portfolio. Not all 4000 stocks are in my portfolio, about 300 stocks are in my portfolio at any given time. The daily portfolio returns will be calculated by multiplying the weights (they are equal weighted, so 1/300) to that stock's return on the specific date. I assume it has to do with a combination of INDEX, SUMPRODUCT, and IF or MATCH functions.
I have been thinking this for a long time and I just can't get to the bottom of it. I have attached pictures for a portion of what I was working on. I think will give you a good picture of what I'm trying to do. I bet this is such an easy thing for you guys. I hope you can help me out! Thanks again!
PICTURES:IN or OUT portfolio & Stock's individual returns
Charles
Not sure I understood your problem, but here is a trial suggestion:
You get data for 4000 stocks while you are monitoring 300. So, you need to find the correct one within your sheet (there will be 3700 that will not match anything).
If you have your stocks listed in, say, column "A", you could use the function LOOKUP (well explained in the Web). If you need to get the row of your stock, you can use the function MATCH.
If this is not what you are looking for, it means that I (at least) did not understand you, so you would need to add details to your question.

Excel VBA large data runtime issue

I have large scale data (700K rows), and I'm trying to count the number
of appearance of a word within the rows, and do so for also many times (50K iterations).
I'm wondering if Excel is appropriate platform, using VBA or maybe COUNTIFS, or should I use different Platform?
If so, is there a platform that has similarity points to Excel and VBA?
Thanks!
With your small sentences in column A and the 700k lines in column A of Sheet1, this formula will count the occurrences. It's an array formula and must be entered with Ctrl+Shift+Enter.
=SUM(--NOT(ISERR(FIND(A2,Sheet1!$A$1:$A$700000))))
To calculate 200 small sentences took about 20 seconds on my machine. If that's an indication, it will take about 1.5 hours to calculate 50k small sentences. You should probably find a better tool or at least hit calculate right before you leave for lunch. Definitely test it on a smaller number to make sure it gives you the answers you want. If you don't have to do this often, maybe 1.5 hours is palatable.

how to specify the first cell with data in it inside an if statement in Excel

I keep a running avg of kids grades in Excel over a 2 week period. The way i have the code now
AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-29,):OFFSET('1'!E4,COUNTA('1'!D:D),))
it returns an error if i don't have 2 weeks of data. I found a way around this by doing this
=IFERROR(AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-29,):OFFSET('1'!E4,COUNTA('1'!D:D),)),IFERROR(AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-28,):OFFSET('1'!E4,COUNTA('1'!D:D),)),IFERROR(AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-27,....
Im Sure there is a better way to do this any help would be appreciated.
As pointed out in the comments - it's difficult to give a definitive answer without some more knowledge - it's not easy to tell, for instance, where the numeric data starts (E4 or E5)?
Firstly you can simplify your original formula which appears to AVERAGE the last 30 rows of data (not sure how 30 rows equates to 2 weeks of data) - you can do that with just:
=AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-29,0,30))
Now I assume the error comes about when COUNTA('1'!D:D) is < 29 so you can simply add an If function which AVERAGES all the data if that function returns a number < 29, i.e.
=IF(COUNTA('1'!D:D)<29,AVERAGE('1'!E4:E33),AVERAGE(OFFSET('1'!E4,COUNTA('1'!D:D)-29,0,30)))
That formula may need some small adjustments to cater for the specifics of your layout but the general approach is valid

Resources