Assistance with Excel Formatting/Formula/Equations - excel

I have been building an automated trading robot, which I integrated by way of a virtual GUI.
I want to be able to analyze its data using Excel. The problem is that I don't know how to organize the data in Excel so that it gives me the most vital information, and I thought there would be no better place to ask than Stack Overflow. I have also been looking for the "right" YouTube video that describes what I am looking for. These are the figures I want:
1.) Count for maximum loss trade sequence (i.e. if the longest run of losses in a row is three, the count is 3.)
2.) Count for maximum win trade sequence (same as #1, but for wins.)
3.) Count for "pages" loss trade sequence (every 10 trades equals 1 page. If 6 of the 10 trades are losses, that is 1 loss page; if the next ten trades contain the same number of losses or more, we have 2 pages of loss trade sequence.)
4.) Count for "pages" win trade sequence (every 10 trades equals 1 page. Same as #3, but for wins.)
5.) Count for "pages" draw trade sequence (every 10 trades equals 1 page. A draw page has 5 winning and 5 losing trades out of the 10.)
6.) Maximum win pages in a row (how many pages we won in a row, based on the #4 calculation.)
7.) Maximum lost pages in a row (how many pages we lost in a row, based on the #3 calculation.)
8.) Average of winning trades # (averaging the total number of wins in a row.)
9.) Average of losing trades # (averaging the total number of losses in a row.)
The examples below are numbered to match the list above.
1.) H:27-H:22 is the maximum lost trade sequence = 6
2.) H:1-H:5 is the maximum win sequence = 5
3.) (Rounding) is the total of losing pages = 2 (locations H:39-H:30 and H:29-H:20)
4.) (Rounding) is the total of winning pages = 2 (locations H:50-H:41 and H:10-H:1)
5.) (Rounding) is the total of drawn pages = 1 (location H:60-H:69)
6.) There were no win pages in a row
7.) Two loss pages in a row (H:39-H:20)
8.) (Would take count of wins / sequential wins count), i.e. if we had 10 sequences of winning in a row out of 100 wins, this would equal 1 every ten times as the average
9.) (Would take count of losses / sequential losses count), the same thing except for losses
Any help would be greatly appreciated, heck, even a YouTube video. I have been working on this program for a while, and I learned the hard way that I need better data management. Thanks in advance!
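It may help to pin the nine figures down outside the sheet before turning them into Excel formulas. The Python sketch below is only an illustration under several assumptions that are not in the post: each trade is recorded as "W" or "L" in chronological order, a page is a win or loss page when that outcome has the majority of its 10 trades, an incomplete last page is ignored, and items 8 and 9 are read as total wins (or losses) divided by the number of separate streaks. All function names are made up for the example.

```python
from itertools import groupby

def run_lengths(seq, outcome):
    """Lengths of every unbroken run of `outcome` in `seq`, in order."""
    return [len(list(run)) for value, run in groupby(seq) if value == outcome]

def classify_pages(trades, page_size=10):
    """Label each complete page 'W', 'L', or 'D' (equal wins and losses)."""
    pages = []
    for start in range(0, len(trades) - page_size + 1, page_size):
        page = trades[start:start + page_size]
        wins, losses = page.count("W"), page.count("L")
        pages.append("W" if wins > losses else "L" if losses > wins else "D")
    return pages

def summarize(trades, page_size=10):
    """Return the nine statistics from the list, keyed by a descriptive name."""
    win_runs = run_lengths(trades, "W")
    loss_runs = run_lengths(trades, "L")
    pages = classify_pages(trades, page_size)
    return {
        "max_loss_streak": max(loss_runs, default=0),                      # 1
        "max_win_streak": max(win_runs, default=0),                        # 2
        "loss_pages": pages.count("L"),                                    # 3
        "win_pages": pages.count("W"),                                     # 4
        "draw_pages": pages.count("D"),                                    # 5
        "max_win_pages_in_row": max(run_lengths(pages, "W"), default=0),   # 6
        "max_loss_pages_in_row": max(run_lengths(pages, "L"), default=0),  # 7
        # Assumed reading of 8 and 9: total wins (losses) / number of streaks.
        "avg_win_streak": sum(win_runs) / len(win_runs) if win_runs else 0.0,
        "avg_loss_streak": sum(loss_runs) / len(loss_runs) if loss_runs else 0.0,
    }

# Made-up sample of 30 trades (3 pages), oldest trade first.
sample = list("WWWLLWLLLL" + "WWWWWWLLLW" + "WLWLWLWLWL")
stats = summarize(sample)
```

If the sheet lists the newest trade at the top, the column would need to be reversed first so that the pages line up the same way.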
"Excel Sheet"

Related

Number of days for delivery and number of orders delivered in two separate columns. Is there a way to get summary statistics about orders?

I've had a bit of trouble explaining this, so please bear with me. I'm also very new to Excel, so if there's a simple fix, I apologize in advance!
I have two columns, one listing number of days starting from 0 and increasing consecutively. The other column has the number of orders delivered. The two correspond to each other. For example, I've typed out how it would look below. It would mean that there were 100 orders delivered in 1 day, 150 orders delivered in 2 days, 800 orders delivered in 3 days, etc.
Is there a way to get summary statistics (mean, median, mode, upper and lower quartiles) for the number of days it took for the average order to get delivered? The only way I can think of solving this is to manually punch in "1" 100 times, "2" 150 times, etc. into a new column and take median, mean, and upper & lower quartile from that, but that seems extremely inefficient. Would I use a pivot table for this? Thank you in advance!
I tried using the data analysis add-on and doing summary statistics that way, but it didn't work. It just gave me the mean, median, mode, and quartiles of each individual column. It would have given me 3 for median number of days for delivery and 300 for median number of orders.
Method 1
The mean is just the frequency-weighted average of the values:
=SUMPRODUCT(A2:A6,B2:B6)/SUM(B2:B6)
The mode is the value with the highest frequency:
=INDEX(A2:A6,MATCH(MAX(B2:B6),B2:B6,0))
The quartiles and median (or any other quantile, by varying the value of p) can be computed from first principles, following this reference:
=LET(p,0.25,
values,A2:A6,
freq,B2:B6,
N,SUM(freq),
h,(N+1)*p,
floorh,FLOOR(h,1),
ceilh,CEILING(h,1),
frac,h-floorh,
cusum,SCAN(0,SEQUENCE(ROWS(values)),LAMBDA(a,c,IF(c=1,0,a+INDEX(freq,c-1)))),
xlower,XLOOKUP(floorh-1,cusum,values,,-1),
xupper,XLOOKUP(ceilh-1,cusum,values,,-1),
xlower+(xupper-xlower)*frac)
Method 2
If you don't like doing it this way, you can always expand the data like this:
=AVERAGE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
=MODE(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),1)
=MEDIAN(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1))
and
=QUARTILE.EXC(XLOOKUP(SEQUENCE(SUM(B2:B6),1,0),SCAN(0,SEQUENCE(ROWS(A2:A6)),LAMBDA(a,c,IF(c=1,0,INDEX(B2:B6,c-1)+a))),A2:A6,,-1),3)
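To sanity-check either method outside Excel, the same numbers can be reproduced in a few lines of Python. This is only a sketch: the counts for day 0 and day 4 are made up (the question gives just the 1, 2 and 3 day figures), and `weighted_quantile` is a hypothetical helper that mirrors the (N+1)*p rule of the LET formula without expanding the data. It assumes p is chosen so that the ranks floor(h) and ceil(h) fall between 1 and N.

```python
import bisect
import math
import statistics
from itertools import accumulate

days = [0, 1, 2, 3, 4]
orders = [20, 100, 150, 800, 600]  # day 0 and day 4 counts are invented

def weighted_quantile(values, freq, p):
    """Quantile at rank h = (N + 1) * p, read straight from the frequency table."""
    n = sum(freq)
    h = (n + 1) * p
    lo, hi = math.floor(h), math.ceil(h)
    ends = list(accumulate(freq))  # 1-based rank of the last item in each group

    def order_stat(k):
        # k-th smallest observation (1-based): first group whose end rank >= k
        return values[bisect.bisect_left(ends, k)]

    x_lo, x_hi = order_stat(lo), order_stat(hi)
    return x_lo + (x_hi - x_lo) * (h - lo)

# Method 1: work from the table directly.
mean = sum(v * f for v, f in zip(days, orders)) / sum(orders)
mode = days[orders.index(max(orders))]
q1, median, q3 = (weighted_quantile(days, orders, p) for p in (0.25, 0.5, 0.75))

# Method 2: expand the table into one row per order and use library functions.
expanded = [v for v, f in zip(days, orders) for _ in range(f)]
expanded_quartiles = statistics.quantiles(expanded, n=4, method="exclusive")
```

With these made-up counts both routes agree, which is the point of the check rather than the particular values.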

How to calculate confidence interval and sample size

My company has 1000 locations. We will be conducting a survey (let's say asking "yes" or "no" about something) using a sample of about 250 locations. Based on the results, we hope to estimate the proportion of all locations that is "yes". Say, for example, that after surveying 250 the proportion of "yes" is 70% and "no" is 30%. I would like to construct a 95% confidence interval to estimate the proportion of all locations that is "yes".
Question 1 - Do I still use the regular confidence interval calculation for a population proportion, i.e. p_hat +/- z*SQRT(p_hat*(1-p_hat)/n), or is there another formula, since my population "N" in this case is 1000?
Question 2 - Is there a statistical calculation/guidance to determine the correct number of locations to survey in the first place?
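For what it's worth, the usual textbook adjustment when the sample is a sizeable share of a small population is the finite population correction, which multiplies the standard error by sqrt((N-n)/(N-1)). The Python sketch below illustrates that adjustment and the matching sample-size rule; it assumes simple random sampling without replacement and the normal approximation, and the function names are hypothetical, not part of the question.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, population=None, confidence=0.95):
    """Normal-approximation CI for a proportion, optionally with the
    finite population correction (FPC) when `population` is given."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))  # FPC
    margin = z * se
    return p_hat - margin, p_hat + margin

def sample_size(margin, population, p=0.5, confidence=0.95):
    """Sample size for a target margin of error, corrected for a finite
    population. p=0.5 is the most conservative guess for the proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

plain = proportion_ci(0.70, 250)                       # ignores N
corrected = proportion_ci(0.70, 250, population=1000)  # uses N = 1000
needed = sample_size(0.05, population=1000)            # target +/- 5 points
```

With 250 of 1000 locations surveyed the corrected interval comes out somewhat narrower than the plain one, because a quarter of the population has already been observed.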

In Excel, how do I find the margin of error and confidence intervals for surveys with different sample sizes and population sizes?

I'm calculating the NPS (Net Promoter Scores) for about 50 different sessions at a recent event. Each session was attended by about 50-500 people, and the number of survey responses for each session ranges between 15-400.
If I know:
The number of respondents for each session (sample size)
The number of attendees for each session (population size)
The NPS score for each session (average rating, basically—more info below)
How can I figure out the margin of error and/or confidence intervals for each session in Excel?
What formula would I use where, for example, X = sample size, Y = population size, and Z = avg rating?
I don't need this to be incredibly correct as long as I'm in the ballpark—so you can ignore the NPS part which might throw things off slightly:
This is slightly complicated by the fact that NPS is a weird metric. It asks "How likely would you be to recommend X to a friend or colleague?" with a scale from 0-10 (10 = extremely likely, 0 = not at all likely). You then count every 10 and 9 as a "promoter," count every 8 and 7 as a "neutral" or "passive," and count everything between 6 and 0 as a "detractor."
You then get the NPS by subtracting the detractors from the promoters, dividing that number by the total responses, then multiplying it by 100, so: ((Promoters - Detractors)/(Total responses))*100. NPS sort of flattens every response to a +1, 0, or -1, so it might complicate the calculations.
Assume I've already calculated the NPS for each session. I'm trying to figure out the margin of errors and/or confidence intervals for each session using Excel.
So, for example, my data would look like this:
Again, you can ignore the NPS stuff if it makes it easier and just assume it's an average rating where people were asked to rate each session from -100 to +100. What function(s) would I use in Excel to find the margin of error and/or confidence intervals for each session, given the sample size and target population size, and the average rating?
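One common way to approach this is to treat each response as +1, 0 or -1, as described above, so the variance of a single response is p_promoter + p_detractor - (p_promoter - p_detractor)^2. The Python sketch below builds a margin of error from that, with a finite population correction for the number of attendees. It is an illustration under assumptions: respondents are taken to be a random sample of attendees, the normal approximation is used (shaky for sessions with only 15 responses), and it needs the promoter and detractor counts rather than the NPS alone. The function name and the example numbers are invented.

```python
import math
from statistics import NormalDist

def nps_margin(promoters, detractors, respondents, attendees, confidence=0.95):
    """Return (NPS, margin of error), both in NPS points on the -100..100 scale."""
    p = promoters / respondents
    d = detractors / respondents
    nps = p - d
    variance = p + d - nps ** 2  # variance of one +1/0/-1 response
    se = math.sqrt(variance / respondents)
    # Finite population correction: shrinks to 0 when everyone responded.
    fpc = math.sqrt((attendees - respondents) / (attendees - 1))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return 100 * nps, 100 * z * se * fpc

# Invented session: 300 attendees, 100 responses, 60 promoters, 20 detractors.
score, margin = nps_margin(60, 20, 100, 300)
```

If only the NPS itself is on hand, replacing the variance with 1 - nps^2 gives a conservative upper bound, since the promoter and detractor shares can never add up to more than 1.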

Simulation in Excel using probability

I am trying to create a spreadsheet that can find the most likely probability that a student scored a specific grade on a test.
Each grade is scored by exactly one student, and each student scores exactly one grade.
I have limited information about each student.
There are 5 students (1,2,3,4,5)
and the grades possible are only (100,90,80,70,60)
In the spreadsheet a 1 denotes that the student DIDN'T score that grade.
Does anyone know how to make a simulation that I can find the most likely probability of what student scored what grade?
Link:
https://docs.google.com/spreadsheets/d/1a8uUIRzUKsY3DolTM1A0ISqMd-42WCUCiDsxmUT5TKI/edit?usp=sharing
Based on your response in comments, each student has an equal likelihood of getting each grade. No simulation is necessary.
If you want to simulate it anyway, don't use Excel*. Create a vector of students, and pair it with a shuffled vector of the grades. Lather, rinse, repeat as many times as you want to see that the student-to-grade matching is uniformly distributed.
* - To get an idea of how bad Excel can be for random variate generation, enable the Analysis ToolPak, go to "Data -> Data Analysis" on the ribbon, and select "Random Number Generation". Fill in the dialog to ask for 10 variables, 2000 random numbers, and a "Normal" distribution, leave the mean and std dev at 0 and 1, and enter a "Random Seed" value of 123. You will find that the resulting table contains 3 instances of the value "-9.35764". Values that extreme should occur about once per twenty thousand years if you generate a billion a second. Getting three of them is so extreme that it should happen once per 10^30 times the current estimated age of the universe. Conclude that a) it's your lucky day, or b) Excel sucks at random numbers, and despite being informed about this as far back as 1998 Microsoft hasn't bothered to fix it.
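The shuffle-and-pair idea can be sketched in Python roughly as follows. This is a minimal illustration that assumes, as stated above, that every student is equally likely to get every grade; the function name, trial count and seed are arbitrary choices for the example.

```python
import random
from collections import Counter

def simulate(trials, seed=0):
    """Estimate P(student scored grade) by repeatedly shuffling the grades."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    students = [1, 2, 3, 4, 5]
    grades = [100, 90, 80, 70, 60]
    counts = Counter()
    for _ in range(trials):
        shuffled = grades[:]   # copy, so the original order is untouched
        rng.shuffle(shuffled)  # a uniformly random one-to-one assignment
        counts.update(zip(students, shuffled))
    return {pair: count / trials for pair, count in counts.items()}

freqs = simulate(20000, seed=1)
```

Every (student, grade) pair should come out close to 1/5, which is just the "no simulation is necessary" conclusion seen from the other side.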

Monte Carlo Simulation using Excel Solver

I am trying to figure out the optimal number of products I should make per day, by displaying the values in a chart and then using the chart to find the optimal number of products to make per day.
Cost of production: $4
Sold for: $12
Leftovers sold for $1
So the ideal profit for a product is $8, but it could be -$3 if it's left over at the end of the day.
Daily demand has a mean of 150 and a standard deviation of 30.
I have been able to generate a list of random numbers for how many products are demanded each day using: NORMINV(RAND(),mean,std_dev)
but I don't know where to go from here to work out the amount sold from the number of products made that day.
The number sold on a given day is min(# produced, daily demand).
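As a rough sketch of that rule in code (Python here, purely for illustration), one day's profit follows directly from the prices in the question. The function name and parameter names are my own.

```python
def daily_profit(produced, demand, cost=4.0, price=12.0, salvage=1.0):
    """Profit for one day, given how many were made and how many were wanted."""
    sold = min(produced, demand)  # you can't sell what you didn't make
    leftover = produced - sold    # unsold items go for the salvage price
    return price * sold + salvage * leftover - cost * produced

sold_out_day = daily_profit(produced=150, demand=200)  # demand exceeds supply
slow_day = daily_profit(produced=150, demand=100)      # 50 items left over
```

In Excel terms, the sold quantity is just MIN(produced, demand) applied to each simulated day.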
ADDENDUM
The decision variable is a choice you make: "I will produce 150 each day", or "I will produce 145 each day". You told us in the problem statement that daily demand is a random outcome with a mean of 150 and a SD of 30. Let's say you go with producing 150, the mean of demand. Since it's the mean of a symmetric distribution, half the time you will sell everything you made and have no losses, but in most of those cases you actually could have sold more and made more money. You can't sell products you didn't make, so your profit is capped at selling 150 on those days. The other half of the time, you won't sell all 150 and will take a loss on the unsold items, reducing your profit a bit. The actual profit on any given day is a random variable, because it is determined by random demand.
Since profit is random, you can calculate your average earnings across many days based on the assumption that you produce 150. You can also average earnings based on the assumption that you produce 140 per day, or 160 per day, or any other number. It sounds like you've been asked to plot those average earnings versus how many you decided to produce, and choose a production level that results in the highest long-term average earnings.
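The averaging described above could be prototyped like this before building it in Excel. It is a sketch under a few assumptions of mine: demand is drawn from a normal distribution, rounded to whole units and floored at zero, production levels are tried in steps of 10, and the same random demands are reused for every level so the comparison between levels is less noisy. None of the names come from the question.

```python
import random

COST, PRICE, SALVAGE = 4, 12, 1  # figures from the question

def average_profit(produced, days=20000, mean=150, sd=30, seed=42):
    """Average daily profit over many simulated days at a fixed production level."""
    rng = random.Random(seed)  # same seed -> same demand sequence for every level
    total = 0.0
    for _ in range(days):
        demand = max(0, round(rng.gauss(mean, sd)))
        sold = min(produced, demand)
        total += PRICE * sold + SALVAGE * (produced - sold) - COST * produced
    return total / days

# Average profit for each candidate production level; these are the chart points.
levels = range(100, 221, 10)
results = {q: average_profit(q) for q in levels}
best = max(results, key=results.get)
```

Plotting `results` as production level against average profit gives the chart the exercise seems to ask for, and `best` is the level at its peak on this coarse grid.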
