How can I obtain the probability that one person will die at or before a certain age if I only have the average of the life expectancy?
For instance, a person is 45 years old. The life expectancy is 60 years. Can I find the probability the person will die at age 45 (not before 44, but not after 45)?
Keeping in mind the comments above and after some searching, I found these datasets
(WHO Life tables by country)
which helped me with the prediction I wanted. In fact, it is apparently impossible to determine the probability of survival from life expectancy alone; these datasets provide the survival probability by age, broken down by gender and country.
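For anyone landing here later, this is roughly how a life table turns into the probability asked about. It is only a sketch: the `survivors` numbers are made up for illustration (they are NOT WHO figures), and it assumes a table with a single-year survivors column (usually written l_x). Published tables may be grouped by age interval, in which case the lookups would need adapting.

```python
# Hypothetical l_x values: survivors to exact age x out of 100,000 births.
# These numbers are invented for illustration and are NOT real WHO data.
survivors = {44: 91200, 45: 90800, 46: 90350}


def prob_death_within_year(lx, age):
    """q_x: chance that someone alive at exact age `age` dies before age + 1."""
    return (lx[age] - lx[age + 1]) / lx[age]


def prob_death_between(lx, current_age, start, end):
    """Chance that someone alive at `current_age` dies between ages start and end."""
    return (lx[start] - lx[end]) / lx[current_age]


q45 = prob_death_within_year(survivors, 45)
print(f"P(die during age 45 | alive at 45) = {q45:.4f}")
```

The key point is that the calculation needs the whole survival curve, not just its mean.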
Related
Hey guys, I need your help. I want to predict rice production in India using a simple regression. For this I have a dataset with the yield and production data for the last 40 years. As explanatory variables I have daily data on rainfall, temperature, etc. Now to my problem. Obviously the number of observations of the y-variable (40) does not match the number of observations of the x-variables (about 15,000). Thus a regression is not feasible as the data stand. What is the best way to proceed?
Average the weather data over the year and thus estimate the y-variable, i.e. a kind of undersampling of the x-variable. Of course, this means that important data such as outliers are lost.
Add the annual production values for each weather entry in the associated year. This would give us the same y value 365 times. Doesn't sound reasonable to me either.
What other ideas do you guys have? If interested, I'll be happy to attach the datasets as well.
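To make the first option concrete, here is a rough sketch of collapsing daily weather into one row per year. The records, the `heavy_rain_mm` threshold, and the feature names are all hypothetical. Keeping a yearly maximum and a count of extreme days alongside the mean is one possible way to hold on to some of the outlier information that a plain average loses; it is not the only way to do it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (year, rainfall_mm, temperature_c). Not real data.
daily = [
    (2001, 0.0, 31.0), (2001, 12.5, 29.0), (2001, 80.0, 27.5),
    (2002, 3.0, 33.0), (2002, 0.0, 35.5), (2002, 20.0, 30.0),
]
# Hypothetical annual yields, one value per year.
annual_yield = {2001: 2.1, 2002: 1.8}


def yearly_features(records, heavy_rain_mm=50.0):
    """Collapse daily weather into one feature dict per year."""
    by_year = defaultdict(list)
    for year, rain, temp in records:
        by_year[year].append((rain, temp))
    features = {}
    for year, rows in by_year.items():
        rain = [r for r, _ in rows]
        temp = [t for _, t in rows]
        features[year] = {
            "rain_total": sum(rain),
            "rain_max": max(rain),
            # Count of days above the (assumed) heavy-rain threshold.
            "heavy_rain_days": sum(r > heavy_rain_mm for r in rain),
            "temp_mean": mean(temp),
            "temp_max": max(temp),
        }
    return features


features = yearly_features(daily)
# One row per year: weather features lined up with that year's yield.
rows = [(year, features[year], annual_yield[year]) for year in sorted(annual_yield)]
```

With only 40 years of y-values, the number of yearly features probably has to stay small.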
The given dataset has the columns below:
age sex bmi children smoker region charges
19 female 27.9 0 no southwest 19393.03
I just need the name of the graph, along with the parameters to use, to answer each of the questions below.
1. What is the 5 point summary of the numerical attributes?
2. Distribution of the bmi column
3. Measure of skewness of the bmi column
4. Distribution of the categorical columns
5. Do charges of people who smoke differ significantly from the people who don't?
6. Does bmi of males differ significantly from that of females?
7. Is the proportion of smokers significantly different in different genders?
8. Is the distribution of bmi across women with no children, one child and two children the same?
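For the first few items, a small sketch of what the numbers behind the plots look like. The `bmi` values are hypothetical stand-ins for the column. A five-number summary is what a box plot draws, a histogram is the usual choice for the distribution, and the skewness here is the plain moment-based version (other definitions with small-sample corrections exist).

```python
from statistics import mean, quantiles

# Hypothetical bmi values standing in for the dataset column.
bmi = [27.9, 33.8, 33.0, 22.7, 28.9, 25.7, 33.4, 27.7, 29.8, 25.8]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    q1, median, q3 = quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


summary = five_number_summary(bmi)
print("five-number summary:", summary)
print("skewness:", skewness(bmi))
```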
I recommend https://bookdown.org/ndphillips/YaRrr/ as a good introduction to R that includes a big section on data visualisation.
The Doobie Brothers garage band is planning a concert. Tickets are set at $20. Based on what other bands have done, they figure they should sell 350 tickets, but that could fluctuate. They figure the standard deviation of sales at 50 tickets. No shows are uniformly distributed between 1 and 10. Fixed costs are 5000.
How profitable is the concert likely to be?
So I am able to enter the Excel formula for revenue 50*20 and subtract 5000 for FC, but I am having trouble working out how to account for the no-show costs. I know that I have to use the RANDBETWEEN(1,10) formula, but I am not sure whether it gets multiplied or divided by something. Again, I am looking for what to do with the formula in the context of a profit equation.
If it helps, the mean for the number of tickets sold is 350 and stdev is 50, so I used that to get the number of attendees in a simulated sense...That is, NORM.INV(RAND(),350,50)
Of course, this problem may not be realistic in real life because promoters keep the money, but for the purposes of the problem...just assume that no promoters exist here.
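Here is one possible way to put the pieces together as a simulation. The problem statement does not say what a no-show costs, so this sketch ASSUMES each no-show gets a full refund of the ticket price; if no-shows keep nothing back, the term simply drops out. The function names and the trial count are my own choices.

```python
import random

TICKET_PRICE = 20
FIXED_COSTS = 5000


def simulate_profits(trials=20000, seed=1):
    """Simulate concert profit under the refund-per-no-show assumption."""
    rng = random.Random(seed)
    profits = []
    for _ in range(trials):
        tickets_sold = max(0.0, rng.gauss(350, 50))  # like NORM.INV(RAND(),350,50)
        no_shows = rng.randint(1, 10)                # like RANDBETWEEN(1,10)
        # ASSUMPTION: every no-show is refunded, so only the rest bring in revenue.
        paying = max(0.0, tickets_sold - no_shows)
        profits.append(TICKET_PRICE * paying - FIXED_COSTS)
    return profits


profits = simulate_profits()
mean_profit = sum(profits) / len(profits)
loss_share = sum(p < 0 for p in profits) / len(profits)
print(f"average profit: {mean_profit:.0f}, share of losing runs: {loss_share:.3f}")
```

Under that assumption the profit equation in a spreadsheet cell would be 20*(tickets - no_shows) - 5000, recalculated many times.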
I am working on a tool for Fantasy Football that calculates the average value a player offers per million pounds of cost. It essentially boils down to their average points per game divided by their cost.
So for example, a player who costs £10m and scores an average of 5 points per game offers 0.5 points per game per million. Whereas a player who costs £8m and scores an average of 5 points per game offers 0.625 points per game per million. Clearly the player who costs £8m is better value.
My problem is, players are capable of scoring negatively, and so how do I account for that in calculating the value of a player?
To give another example, a player who costs £10m and scores an average of -2 points per game offers -0.2 points per game per million. Whereas a player who costs £8m and scores an average of -2 points per game offers -0.25 points per game per million.
Now the player who costs £10m appears to be better value because their PPG/£m is higher. This shouldn't be true, they can't be better value if they cost more but score the same points. So if I have a list of players sorted by their value, calculated in this manner, some players will incorrectly show higher than players that are technically better value.
Is there a way to account for this problem? Or is just an unfortunate fact of the system I'm using?
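To show the issue with the exact numbers from the examples above, here is a short sketch. The player labels A to D are placeholders.

```python
# (label, cost in £m, average points per game), taken from the examples above.
players = [
    ("A", 10.0, 5.0),
    ("B", 8.0, 5.0),
    ("C", 10.0, -2.0),
    ("D", 8.0, -2.0),
]


def value_per_million(cost_m, ppg):
    """Points per game per £1m of cost."""
    return ppg / cost_m


ranked = sorted(players, key=lambda p: value_per_million(p[1], p[2]), reverse=True)
for label, cost_m, ppg in ranked:
    print(label, cost_m, ppg, value_per_million(cost_m, ppg))
```

For the positive scorers the cheaper player B ranks first, as expected. For the negative scorers the more expensive player C ranks above the cheaper player D, which is the inversion described.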
One simple trick is to change your PPG/£m formula slightly: use the ratio of the square of the player's average points to their cost.
If you are particular about the scale, take its positive square root.
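A quick sketch of that suggestion applied to the numbers from the question. The function names are mine. Note that squaring discards the sign, so under this formula a player averaging -2 gets the same value as one averaging +2 at the same cost; whether that is acceptable depends on how the list is used.

```python
import math


def squared_value(cost_m, ppg):
    """Square of average points divided by cost in £m, as suggested above."""
    return ppg ** 2 / cost_m


def rooted_value(cost_m, ppg):
    """Positive square root of the squared value."""
    return math.sqrt(squared_value(cost_m, ppg))


# The two negative scorers from the question.
print(squared_value(10.0, -2.0))  # £10m player
print(squared_value(8.0, -2.0))   # £8m player, now the higher of the two
```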
I am trying to figure out the optimal number of products to make per day, by displaying the values in a chart and then using the chart to find that optimal number.
Cost of production: $4
Sold for: $12
Leftovers sold for $1
So the ideal profit for a product is $8, but it could be -$3 if it's left over at the end of the day.
The daily demand of sales has a mean of 150 and a standard deviation of 30.
I have been able to generate a list of random numbers for the daily demand using: NORMINV(RAND(),mean,std_dev)
but I don't know where to go from here to figure out the amount sold from the amount of products made that day.
The number sold on a given day is min(# produced, daily demand).
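As a sketch, one day's profit follows directly from that rule and the prices in the question. The function name is just illustrative.

```python
COST = 4      # cost of production per unit
PRICE = 12    # regular selling price
SALVAGE = 1   # price for leftovers


def daily_profit(produced, demand):
    """Profit for one day given the production decision and realized demand."""
    sold = min(produced, demand)
    leftover = produced - sold
    return PRICE * sold + SALVAGE * leftover - COST * produced


print(daily_profit(150, 200))  # demand exceeds production: everything sells
print(daily_profit(150, 100))  # 50 units left over and sold at the salvage price
```

In a spreadsheet the same thing would be a MIN of the production cell and the simulated demand cell, with leftovers as the difference.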
ADDENDUM
The decision variable is a choice you make: "I will produce 150 each day", or "I will produce 145 each day". You told us in the problem statement that daily demand is a random outcome with a mean of 150 and a SD of 30. Let's say you go with producing 150, the mean of demand. Since it's the mean of a symmetric distribution, half the time you will sell everything you made and have no losses, but in most of those cases you actually could have sold more and made more money. You can't sell products you didn't make, so your profit is capped at selling 150 on those days. The other half of the time, you won't sell all 150 and will take a loss on the unsold items, reducing your profit a bit. The actual profit on any given day is a random variable, because it is determined by random demand.
Since profit is random, you can calculate your average earnings across many days based on the assumption that you produce 150. You can also average earnings based on the assumption that you produce 140 per day, or 160 per day, or any other number. It sounds like you've been asked to plot those average earnings versus how many you decided to produce, and choose a production level that results in the highest long-term average earnings.
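A minimal sketch of that procedure, assuming normally distributed demand as stated in the question. The range of production levels, the number of simulated days, and the seed are arbitrary choices of mine. The same simulated demands are reused for every production level so the comparison between levels is not blurred by different random draws.

```python
import random

COST, PRICE, SALVAGE = 4, 12, 1


def daily_profit(produced, demand):
    """Profit for one day: sales are capped by what was produced."""
    sold = min(produced, demand)
    leftover = produced - sold
    return PRICE * sold + SALVAGE * leftover - COST * produced


def average_profit_by_level(levels, days=20000, seed=42):
    """Average simulated profit for each candidate production level."""
    rng = random.Random(seed)
    # Demand ~ Normal(150, 30), floored at zero; reused for every level.
    demands = [max(0.0, rng.gauss(150, 30)) for _ in range(days)]
    return {q: sum(daily_profit(q, d) for d in demands) / days for q in levels}


averages = average_profit_by_level(range(100, 221, 5))
best = max(averages, key=averages.get)
for q, avg in averages.items():
    print(q, round(avg, 1))
print("best production level in this run:", best)
```

Plotting `averages` (production level on the x-axis, average profit on the y-axis) gives the chart described, and the peak of the curve is the level to pick.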