I'm comparing machinery data from before and after an upgrade and evaluating the performance gain.
I have data (GT Power, ST Power) for both the pre- and post-upgrade conditions. These are my variables: x = GT Power and y = ST Power.
The question I want to answer is: after the upgrade, at the same GT Power (independent variable), how much did the ST Power (dependent variable) increase?
Having run a regression analysis as shown in the table below, I can take the difference between the pre and post regression lines over one full year. From there I can estimate the energy gain as a percentage, by calculating the differences in predicted ST Power while using the pre-condition GT Power values.
In the graph the blue lines are the pre data and the green lines are the post data. How can I express the error in my analysis, given that there are two regression analyses and both have errors? Also, is my method the best way to do this comparison?
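One way to quantify the combined uncertainty is to fit the two regressions separately, evaluate both at the same GT Power values, and combine the two prediction standard errors in quadrature (the two fits are independent). A minimal sketch, assuming simple linear fits and hypothetical file/column names:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical CSVs with columns "gt_power" and "st_power"
pre = pd.read_csv("pre_upgrade.csv")
post = pd.read_csv("post_upgrade.csv")

def fit(df):
    X = sm.add_constant(df["gt_power"])
    return sm.OLS(df["st_power"], X).fit()

m_pre, m_post = fit(pre), fit(post)

# Evaluate both fitted lines at the same (pre-condition) GT Power values
grid = sm.add_constant(pre["gt_power"])
p_pre = m_pre.get_prediction(grid)
p_post = m_post.get_prediction(grid)

gain = p_post.predicted_mean - p_pre.predicted_mean
# Independent fits: the SE of the difference is the quadrature sum of the SEs
se_gain = np.sqrt(p_pre.se_mean**2 + p_post.se_mean**2)
lo, hi = gain - 1.96 * se_gain, gain + 1.96 * se_gain  # ~95% interval

The percentage gain and its interval can then be obtained by dividing by the pre-condition prediction.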
I am attempting to do a difference in independent means t-test. I am using international large-scale assessment data and thus need to use the pv command:
pv, pv(low_bench*) jkzone(jk_zone) jkrep(jk_rep) weight(total_student_weight) jrr timss: reg #pv [aw=#w]
estimates store low_lang_sa06
I then repeat this for another year and save the estimate as low_lang_sa16.
I would like to know 1) how to conduct a difference in independent means test in Stata (between years) from these saved estimates, and/or 2) how to use the standard error of each estimate to conduct a t-test (if that is possible). If these questions don't make sense, I would appreciate an explanation.
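If the two years come from independent samples, a common approach is to compare the two stored coefficients directly using their standard errors; a minimal sketch of the test statistic (with b and SE taken from low_lang_sa06 and low_lang_sa16):

$$ z \;=\; \frac{\hat b_{16} - \hat b_{06}}{\sqrt{\mathrm{SE}(\hat b_{16})^2 + \mathrm{SE}(\hat b_{06})^2}} $$

This is referred to a standard normal (or a t distribution with an approximate degrees-of-freedom correction); the jackknife (JRR) standard errors already reflect the complex sampling design within each year.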
I am using the meta-analysis from the Strang (2013) paper below to inform an economic model. It includes a meta-analysis of reductions in reoffending. The results are presented as Standardised Mean Differences. This measure can be converted to an odds ratio using the Cochrane formula (linked below).
https://restorativejustice.org.uk/sites/default/files/resources/files/Campbell%20RJ%20review.pdf
https://handbook-5-1.cochrane.org/chapter_12/12_6_3_re_expressing_smds_by_transformation_to_odds_ratio.htm
This is unconventional because an odds ratio would normally be applied to the probability of a dichotomous outcome occurring. What is the most methodologically valid way of transforming the results of this meta-analysis, such that they can be incorporated within an economic model?
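For reference, the transformation in the linked Cochrane Handbook section (12.6.3) assumes the continuous outcome has an underlying logistic distribution and re-expresses the SMD as a log odds ratio:

$$ \ln(\mathrm{OR}) \;=\; \frac{\pi}{\sqrt{3}}\,\mathrm{SMD} \;\approx\; 1.81 \times \mathrm{SMD} $$

The same multiplier is applied to the confidence limits of the SMD before exponentiating.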
I am trying to create a quadratic model in Excel with tennis ranks data.
When I run the automatic trendline function it gives me a model with negative y-values, which obviously cannot occur for ranks.
How do I tell Excel to keep model y-values >=0?
Thank you!
See the screenshot below.
There are several advantages to understanding the formulation/construction of the quadratic trend. For instance, replicating the 'automatic trendline' using LINEST as follows gives you control over the individual terms and can highlight any graphical errors (the LINEST coefficients are placed in cells $I$3:$L$3):
=$L$3+$K$3*D3+$J$3*D3^2+$I$3*D3^3
This evaluates a cubic regression (white dots), which coincides with Excel's 'automatic' trendline.
Summary of possible issues
There are several potential issues and remedies you can consider, depending on the goodness of fit, the data in question, etc. A non-exhaustive list of issues you may be encountering includes the following:
Issue | Remedy
1) Overfitting | Reduce the number of terms (e.g. order = 2 instead of 3)
2) Wrong fit | Attempt a lognormal / other fit
3) Negative values on the left | Set the intercept
4) Graphical error | Use a scatter chart; sort x-values (ascending)
5) Outliers | Various: exclude/adjust, fit a separate curve (Extreme Value Theory), or manually adjust polynomial terms, noting the reduction in goodness of fit
1) Overfitting
In the trendline options, reduce the polynomial order (see screenshot):
2) Lognormal | Other
Transform the data or consider other fits/curves (you can also place the y and x axes on a logarithmic scale, which will automatically remove negatives, although consider outliers and the impact on R-squared / goodness of fit).
3) Negative values on the left
In certain circumstances, negative values at the left of the chart can be removed by setting the intercept to an appropriate value.
4) Graphical error
It's often easier to use a scatter chart, with the x-values sorted in ascending order as described above (the regression parameters may otherwise be affected).
5) Outliers
It may be that you're fitting to one or two outliers. Consider reducing the complexity/number of terms, or adjusting/omitting the outliers as appropriate. There is an entire branch of statistics that deals with the distribution of extreme values/outliers (Extreme Value Theory), which is beyond the scope of this answer.
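For readers who prefer to experiment outside Excel, a minimal Python sketch (synthetic rank data, purely illustrative) of remedies 1 and 2: refit with a lower polynomial order, or fit on a log scale so the back-transformed predictions stay positive:

import numpy as np

# Synthetic example: x = match index, y = rank (always >= 1)
rng = np.random.default_rng(0)
x = np.arange(1.0, 51.0)
y = np.maximum(1.0, 120 - 2.2 * x + 0.02 * x**2 + rng.normal(0, 5, x.size))

# Remedy 1: reduce the polynomial order (quadratic instead of cubic)
fit2 = np.polyval(np.polyfit(x, y, deg=2), x)

# Remedy 2: fit the log of y; exp() of the prediction is always positive
fit_log = np.exp(np.polyval(np.polyfit(x, np.log(y), deg=2), x))

print("min of quadratic fit:", fit2.min())
print("min of log-scale fit:", fit_log.min())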
Other remarks:
Rounding errors in the coefficients displayed by the automatic trendline can lead to inaccuracies, as can human error when re-typing the displayed equation; this is another reason to prefer LINEST / the exact formulation.
I ran an impulse response analysis on a value-weighted stock index and a few other variables in Python and got the following results:
I am not sure how to interpret these results.
Can anyone please help me out?
You might want to check the book "New Introduction to Multiple Time Series Analysis" by Helmut Lütkepohl (2005) for a fairly dense theoretical treatment of the method.
In the meantime, here is a simple way to interpret your plots. Say your variables are VW, SP500, oil, uts, prod, cpi, n3 and usd. They are all part of the same system; what the impulse response analysis does is assess how much a shock to one variable impacts another, independently of the other variables, so each panel shows a pairwise shock from one variable to another. Your first plot is VW -> VW, which is essentially an autocorrelation plot. Now look at the other plots: apparently SP500 exerts the largest impact on VW (you can see a peak in the blue line reaching 0.25). The y-axis is in standard deviations and the x-axis is in lag periods, so in your example a shock to SP500 causes a 0.25 standard-deviation change in VW at whatever lag is shown on your x-axis (I can't read it from your figure). Similarly, you can see n3 negatively impacting VW at a given period.
There is an interesting link (which you probably already know) that shows an example of applying Python's statsmodels VAR implementation to impulse response analysis.
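For completeness, a minimal statsmodels sketch (hypothetical DataFrame with one column per variable, e.g. VW, SP500, oil, ...) that produces impulse response plots of the kind described:

import pandas as pd
from statsmodels.tsa.api import VAR

# Hypothetical data: one column per variable, rows indexed by time,
# already transformed to be stationary (e.g. returns / differences)
df = pd.read_csv("system.csv", index_col=0, parse_dates=True)

model = VAR(df)
results = model.fit(maxlags=12, ic="aic")  # lag order chosen by AIC

# Impulse responses 10 periods ahead; orth=True uses orthogonalised
# (Cholesky) shocks, so the ordering of the columns matters
irf = results.irf(10)
irf.plot(orth=True)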
I used this method to assess how one variable impacts another in a plant-water-atmosphere system; the paper below includes some explanation and the interpretation of similar plots, so take a look:
Use of remote sensing indicators to assess effects of drought and human-induced land degradation on ecosystem health in Northeastern Brazil
Good luck!
I'm trying to work out how to solve what seems like a simple problem, but I can't convince myself of the correct method.
I have time-series data representing the pdf of power output (P) varying over time, along with the cdf and quantile functions: f(P,t), F(P,t) and q(p,t). I need to find the pdf, cdf and quantile function for the energy in a given time interval [t1,t2] from this data - say e(), E() and qe().
Clearly energy is the integral of the power over [t1,t2], but how do I best calculate e, E and qe?
My best guess is that since q(p,t) is a power, I should generate qe by integrating q over the time interval, and then calculate the other distributions from that.
Is it as simple as that, or do I need to get to grips with stochastic calculus?
Additional details for clarification
The data we're getting is a time series of 'black-box' forecasts for f(P), F(P) and q(p) at each time t, where P is the instantaneous power, and there will be around 100 forecasts for the interval I'd like to get e() for. By 'black-box' I mean that there will be a function I can call to evaluate f, F and q, but I don't know the underlying distribution.
The black-box functions are almost certainly interpolating output data from the model that produces the power forecasts, but we don't have access to that. I would guess that it won't be anything straightforward, since it comes from a chain of non-linear transformations. It's actually wind farm production forecasts: the wind speeds may be normally distributed, but multiple terrain and turbine transformations will change that.
Further clarification
(I've edited the original text to remove confusing variable names in the energy distribution functions.)
The forecasts will be provided as follows:
The interval [t1,t2] that we need e, E and qe for is sub-divided into 100 (say) sub-intervals k=1...100. For each k we are given a distinct f(P), call them f_k(P). We need to calculate the energy distributions for the interval from this set of f_k(P).
Thanks for the clarification. From what I can tell, you don't have enough information to solve this problem properly. Specifically, you need to have some estimate of the dependence of power from one time step to the next. The longer the time step, the less the dependence; if the steps are long enough, power might be approximately independent from one step to the next, which would be good news because that would simplify the analysis quite a bit. So, how long are the time steps? An hour? A minute? A day?
If the time steps are long enough to be independent, the distribution of the total energy is the distribution of a sum of 100 variables, which will be very nearly normal by the central limit theorem. It's easy to work out the mean and variance of the total energy in this case.
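Concretely, writing W for the total energy (to avoid a clash with the cdf E()), and assuming the 100 sub-intervals have length Δt with independent powers P_k whose means μ_k and variances σ_k² come from the forecast distributions f_k:

$$ W = \Delta t \sum_{k=1}^{100} P_k, \qquad \mathbb{E}[W] = \Delta t \sum_{k=1}^{100} \mu_k, \qquad \mathrm{Var}(W) = \Delta t^2 \sum_{k=1}^{100} \sigma_k^2, $$

and by the central limit theorem W is approximately normal with that mean and variance, from which e(), E() and qe() follow directly.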
Otherwise, the distribution will be more complicated. My guess is that with positive temporal dependence the variance estimated under the independent-steps assumption will be too small -- the actual variance would be somewhat larger.
From what you say, you don't have any information about the temporal dependence. Maybe you can find, or derive from some other source, an estimate of the autocorrelation function -- I wouldn't be surprised if that question has already been studied for wind power. I also wouldn't be surprised if a general version of this problem has already been studied -- perhaps search for something like "distribution of a sum of autocorrelated variables." You might get some interest in that question on stats.stackexchange.com.
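If you would rather not rely on the normal approximation, a Monte Carlo sketch under the same independence assumption is straightforward given the black-box quantile functions (the q_k below are stand-ins for the real forecast functions; with an autocorrelation estimate, the independent uniforms could be replaced by draws from a Gaussian copula):

import numpy as np
from scipy.stats import lognorm

# Stand-in quantile functions, one per sub-interval k; in practice these
# would be the black-box forecast functions q_k(p)
q_fns = [lambda p, s=0.4 + 0.002 * k: lognorm(s, scale=10).ppf(p)
         for k in range(100)]

dt = 0.25          # assumed sub-interval length (e.g. hours)
n_sims = 100_000

rng = np.random.default_rng(0)
u = rng.uniform(size=(n_sims, len(q_fns)))           # independent uniforms
powers = np.column_stack([q(u[:, k]) for k, q in enumerate(q_fns)])
energy = dt * powers.sum(axis=1)                      # energy ~= dt * sum(P_k)

# Empirical quantiles of the interval energy (qe); e() and E() follow from
# a histogram / empirical cdf of the same samples
print(np.quantile(energy, [0.05, 0.5, 0.95]))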