How are the control variables chosen when using economic growth as the explanatory variable? What will affect economic growth?

When doing empirical analysis in econometrics, I don't know how to choose the control variables. What conditions need to be satisfied between the explanatory variables?

Related

Differences in Differences Parallel Trends

I want to measure whether the impact of a company's headquarters country on my dependent variable (goodwill paid) is stronger during recessions. After some research, I found that a difference-in-differences analysis could solve my problem. However, online examples always show a diagram (see example under: https://www.google.com/url?sa=i&url=https%3A%2F%2Fwww.publichealth.columbia.edu%2Fresearch%2Fpopulation-health-methods%2Fdifference-difference-estimation&psig=AOvVaw1yMN6knTtOEahZ9vstJpnV&ust=1676208292554000&source=images&cd=vfe&ved=0CAwQjRxqFwoTCLjbrNDIjf0CFQAAAAAdAAAAABAE ) with the "treatment" and "parallel trends": two lines that increase or decrease in the same way until the treatment, after which one line increases or decreases more than the other.
My question now is what is my treatment and what is my control variable in my example? The treatment cannot be recessions because otherwise I just have the treatment group after the treatment and the control group before the recessions. If you think another statistical test may be better, I would be happy to consider that.
Furthermore, I just want to make sure that I specified my model correctly: Goodwill Paid = B0 + B1*Recession + B2*Country + B3*(Recession*Country)
Would that tell me whether the impact of the country is stronger during recessions?
Thanks a lot for your help.
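A minimal sketch of how the interaction coefficient in the model above can be read, assuming both Recession and Country are coded as 0/1 dummies (the question does not say how they are coded). The deal data and the helper name cell_mean are hypothetical. With two dummies and their interaction the model is saturated, so the OLS coefficients equal differences of cell means, and B3 is the country gap during recessions minus the country gap outside recessions.

```python
from statistics import mean

# Hypothetical deals: (recession, country, goodwill_paid). Both regressors
# are 0/1 dummies; the numbers are invented purely for illustration.
deals = [
    (0, 0, 10.0), (0, 0, 12.0),
    (0, 1, 14.0), (0, 1, 16.0),
    (1, 0, 8.0), (1, 0, 10.0),
    (1, 1, 18.0), (1, 1, 20.0),
]


def cell_mean(recession, country):
    """Average goodwill paid in one recession/country cell."""
    return mean(g for r, c, g in deals if r == recession and c == country)


# In a saturated model with two dummies and their interaction, the OLS
# coefficients equal these differences of cell means.
b0 = cell_mean(0, 0)                      # baseline: no recession, country 0
b1 = cell_mean(1, 0) - b0                 # recession effect for country 0
b2 = cell_mean(0, 1) - b0                 # country gap outside recessions
b3 = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(b0, b1, b2, b3)
```

A B3 that differs from zero is what would indicate that the country effect changes during recessions. Whether the difference is statistically significant needs standard errors from a regression package, which this sketch does not compute.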

Can changing constraints be used with the scikit-optimize API?

The canonical use case for scikit-optimize is an optimization objective given a fixed set of hyperparameters, where skopt is given full control to explore the space. However, one may wish to simultaneously expose a variable to skopt and fix it to a certain value on a subsequent iteration because it is outside of one's control. Is this possible using the current API?
Hypothetical use case:
We wish to maximize bike sale profit. Price is a free parameter to be optimized. The rain forecast is outside of our control, but we wish to control for it in skopt.

Which statistical method to choose?

I want to find out if the level of education has an effect on the answer to the question: "Do you think the climate is changing?"
My level of education variable has 3 levels and there are 5 different possible answers to the question (probably changing, definitely changing etc.).
I am not sure which statistical method is appropriate here.
This could depend on how you record your "climate change opinion" variable. If you keep it as an ordinal categorical variable, you could use ordinal logistic regression.
You could keep both variables categorical and conduct a Chi-Square Test of Homogeneity.
A path for more specific interpretations would be to assign a numerical value to this variable, such as definitely not changing = 1, maybe changing = 3, definitely changing = 5.
Your null hypothesis could be: "The mean climate change opinion is the same for each education level."
Alternative: "At least one education level has a different mean climate change opinion."
You can perform an F-test (one-way ANOVA) across the 3 education groups to see whether there is evidence that at least one group differs significantly from the others. From there, you can use the Tukey HSD method to compare each pair of education levels. This is like performing a t-test between each pair of groups, with an adjustment for multiple comparisons.
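As a rough illustration of the F-test described above, here is a sketch that computes the one-way ANOVA F statistic by hand. The opinion scores, the group labels and the function name one_way_anova_f are all hypothetical. It stops at the F statistic and its degrees of freedom; a p-value or a Tukey HSD comparison would need a statistics package.

```python
from statistics import mean

# Hypothetical opinion scores (1 = definitely not changing ... 5 = definitely
# changing), grouped by education level. The data are invented.
groups = {
    "low": [1, 2, 2, 3],
    "medium": [2, 3, 3, 4],
    "high": [3, 4, 4, 5],
}


def one_way_anova_f(samples):
    """Return (F, df_between, df_within) for a list of samples."""
    all_values = [x for s in samples for x in s]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    df_between = len(samples) - 1
    df_within = len(all_values) - len(samples)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df_between, df_within = one_way_anova_f(list(groups.values()))
print(f_stat, df_between, df_within)
```

The statistic is then compared against the F distribution with (df_between, df_within) degrees of freedom. Note that this treats the 1 to 5 coding as an interval scale, which is the assumption the answer makes when it assigns numerical values.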

How does SAS pick reference group when using CLASS statement?

I have a categorical variable that can take on about 200 different values. Is it good practice to create dummies for only specific values of this variable? I know that the other values are rarely used, and in a correlation analysis they are not significant in predicting Y. The example is: there are about 200 different add-ons, the outcome variable is Sale (success vs. no success), and the model is a logistic regression. I want to see whether any of these add-ons are more popular among customers and therefore more likely to lead to a sale. The other independent variables are: how much the customer already pays on a monthly basis, where the customer comes from, and which location the sales agent comes from.
How does SAS pick reference group when using CLASS statement?
By default, PROC LOGISTIC picks the last value in sort order as the reference level (REF=LAST). This can be changed with the ref= option.
class var(ref='B');
Is it good practice to create dummies for only specific characteristics of this variable?
That's a question better asked on Cross Validated.

Learning Optimal Parameters to Maximize a Reward

I have a set of examples, which are each annotated with feature data. The examples and features describe the settings of an experiment in an arbitrary domain (e.g. number-of-switches, number-of-days-performed, number-of-participants, etc.). Certain features are fixed (i.e. static), while others I can manually set (i.e. variable) in future experiments. Each example also has a "reward" feature, which is a continuous number bounded between 0 and 1, indicating the success of the experiment as determined by an expert.
Based on this example set, and given a set of static features for a future experiment, how would I determine the optimal value to use for a specific variable so as to maximise the reward?
Also, does this process have a formal name? I've done some research, and this sounds similar to regression analysis, but I'm still not sure if it's the same thing.
The process is called "design of experiments." There are various techniques that can be used depending on the number of parameters, and whether you are able to do computations between trials or if you have to pick all your treatments in advance.
full factorial - try each combination, the brute force method
fractional factorial - eliminate some of the combinations in a pattern and use regression to fill in the missing data
Plackett-Burman, response surface - more sophisticated methods, trading off statistical effort for experimental effort
...and many others. This is an active area of statistical research.
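To make the brute-force option in the list above concrete, here is a minimal sketch of a full factorial design built with itertools.product. The factor names echo the examples in the question, but the levels and the function name full_factorial are hypothetical.

```python
from itertools import product

# Hypothetical factors and levels, named after the question's examples.
factors = {
    "number_of_switches": [1, 2, 3],
    "number_of_days_performed": [7, 14],
    "number_of_participants": [10, 20],
}


def full_factorial(levels_by_factor):
    """Return one dict per run, covering every combination of levels."""
    names = list(levels_by_factor)
    return [
        dict(zip(names, combo))
        for combo in product(*levels_by_factor.values())
    ]


design = full_factorial(factors)
print(len(design))  # 3 * 2 * 2 = 12 runs
```

The run count is the product of the number of levels per factor, which is why fractional designs become attractive as factors are added.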
Once you've built a regression model from the data in your experiments, you can find an optimum by applying the usual numerical optimization techniques.
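A sketch of that last step, assuming a regression model has already been fitted. The function predicted_reward stands in for the fitted model, and its form and coefficients are invented. The static feature is held fixed and only the adjustable feature is searched. A plain grid search is used here as one simple choice of numerical optimizer, not as the recommended one.

```python
def predicted_reward(static, x):
    """Hypothetical fitted model: reward as a function of one static
    feature and one adjustable feature x. The coefficients are invented."""
    peak = 2.0 + 0.5 * static           # best x shifts with the static feature
    raw = 0.9 - 0.05 * (x - peak) ** 2
    return max(0.0, min(1.0, raw))      # rewards are bounded in [0, 1]


def best_setting(static, lo, hi, steps=1000):
    """Grid-search x in [lo, hi] for the highest predicted reward."""
    candidates = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda x: predicted_reward(static, x))


# The static feature is given for the future experiment; only x is chosen.
x_star = best_setting(static=4.0, lo=0.0, hi=10.0)
print(x_star, predicted_reward(4.0, x_star))
```

With several adjustable features, the same idea applies, but the grid grows quickly and a gradient-based or derivative-free optimizer is usually a better fit.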
