I have a question about my understanding of repeated cross-sections versus panel data.
Is the Stata command xtreg, fe the same as regress with all possible fixed effects included as dummies?
The assumption here is that the dataset is a balanced panel.
So can I treat this panel as a repeated cross-section once I include the fixed effects?
The slope coefficients will be the same. This is an easy fact to verify: compare the output of
webuse nlswork
xtreg ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure in 1/100, fe
reg ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure i.idcode in 1/100
You can use regress, though it's not computationally efficient. You will probably need to increase matsize to at least N+K. You should also consider clustering your standard errors on the panel id.
You might also want to take a look at areg:
areg ln_w age c.age#c.age ttl_exp c.ttl_exp#c.ttl_exp tenure c.tenure#c.tenure in 1/100, absorb(idcode)
It is designed for datasets with many groups, but not a number of groups that increases with the sample size. The intercept will be slightly different too.
I want to measure whether the impact of a company's headquarters country on my dependent variable (goodwill paid) is stronger during recessions. After some research, I found that a difference-in-differences analysis could solve my problem. However, online explanations always show a diagram (see the example at https://www.publichealth.columbia.edu/research/population-health-methods/difference-difference-estimation) with a "treatment" and "parallel trends": two lines that increase or decrease in the same way until the treatment, after which one line increases/decreases more than the other.
My question now is: what are my treatment and control groups in this example? The treatment cannot be recessions, because then I would just have the treatment group after the recession and the control group before it. If you think another statistical test would be better, I would be happy to consider it.
Furthermore, I just want to make sure that I specified my model correctly: Goodwill Paid = B0 + B1·Recession + B2·Country + B3·(Recession × Country)
Would that tell me whether the impact of the country is stronger during recessions?
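To make the model concrete, here is a minimal sketch of how it could be estimated in Python; the data and column names are my own assumptions, not from any real dataset:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical example data; recession is a 0/1 dummy, country is the HQ country.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "recession": rng.integers(0, 2, n),
    "country": rng.choice(["US", "DE", "JP"], n),
})
df["goodwill_paid"] = 100 + 5 * df["recession"] + rng.normal(0, 10, n)

# recession * C(country) expands to both main effects plus their interaction,
# so the recession:country coefficients (the B3 terms) test whether the
# country effect on goodwill paid differs during recessions.
model = smf.ols("goodwill_paid ~ recession * C(country)", data=df).fit()
print(model.summary())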
Thanks a lot for your help.
I am plotting students' data from different schools to see the difference between male and female student numbers in some majors. I am using Python; I have already plotted the data for some schools, and as I expected male numbers are generally higher. Then I realized that each school has a different total number of students. Does my work make any sense when the sample sizes are different? If not, may I have some suggestions for changes?
Now I see the issue. Look: you have two classes, where the first has 2 men and the second has 20, and you have their marks. The 2 men both score 90/100, while the 20 marks in the second class range from 40 to 80. Would it be correct to say "the first class did much better on the test than the second"? Of course not.
To address this, check the minimum of your sample sizes. If it looks too small, set that comparison aside, because you do not have enough data to say anything. And report each total sample size via a legend, a text annotation, or the title; either way, it will show the reliability of your results.
This question is not about programming but about statistics; still, I will try to answer.
An important question you did not address: what are you doing this for? If you are asking "Are there more men than women in the population (in this case, population = all persons in the major program)?", then the individual schools do not matter, and you can pool the samples and work with them as one (but don't forget to pool them).
But you may instead be asking: "Are there differences between the schools in my samples?" In that case, pooling is not correct. For this purpose I highly recommend a horizontal bar plot with stacked=True for each school, normalized to percentages; then the difference in sample sizes won't be a problem, as the sketch below shows.
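For example, a minimal sketch with made-up school counts (the school names and numbers are just illustrations):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts per school; totals deliberately differ.
counts = pd.DataFrame(
    {"male": [120, 45, 300], "female": [80, 55, 250]},
    index=["School A", "School B", "School C"],
)

# Normalize each school's row to percentages so different totals are comparable.
percents = counts.div(counts.sum(axis=1), axis=0) * 100

ax = percents.plot.barh(stacked=True)
ax.set_xlabel("% of students")
# Annotate each bar with the school's total, so the reliability is visible.
for i, total in enumerate(counts.sum(axis=1)):
    ax.text(101, i, f"n={total}", va="center")
plt.tight_layout()
plt.show()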
Please, if you ask a question, include some code; three rows of the data and one plot from a sample would be very helpful.
I have a dataset I am working on, currently in the data-cleaning stage, where one of the attributes (features) has values in various units. For example, some of the values are as follows:
1 kg; 6 LB; 900 gms; 32 oz; etc.
If I use the standard scaler it will not be fair, as the values are in different units and cannot be treated as-is.
Please suggest how to handle such data.
I would recommend converting the different values to the same unit first. For example, you can convert all values to kg, or whatever unit suits you best, and then apply the standard scaler.
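A minimal sketch of that conversion (the column name and the unit spellings handled here are assumptions based on the question's examples):

import pandas as pd

# Conversion factors to kilograms for the units seen in the data.
TO_KG = {"kg": 1.0, "lb": 0.453592, "gms": 0.001, "g": 0.001, "oz": 0.0283495}

def to_kg(value: str) -> float:
    """Parse strings like '6 LB' or '900 gms' and return kilograms."""
    amount, unit = value.strip().split()
    return float(amount) * TO_KG[unit.lower()]

df = pd.DataFrame({"weight": ["1 kg", "6 LB", "900 gms", "32 oz"]})
df["weight_kg"] = df["weight"].apply(to_kg)
print(df)
# Once everything is in kg, a scaler such as sklearn's StandardScaler applies fairly.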
Thanks all. I did some research and found that I need to convert the various units into standard units following international norms, i.e. SI units (https://www.nist.gov/pml/weights-and-measures/metric-si/si-units); the same suggestion was given by #sharmajee499.
I am moving ahead with this approach. It will take a fair amount of manual code, but there seems to be no direct, short and easy way.
Please do post if you have a better solution.
If Yelp wanted to understand whether ratings help users pick a listing, and we use CTR as the success metric for an A/B test, how do we know that a significant change in CTR is due to the ratings alone and not to other parts of the listing, like the reviews?
Do we have to do some kind of user segmentation instead of randomly assigning users before running the A/B test?
Randomization takes care of all variables other than the treatment. A test of statistical significance takes care of the choice between the treatment and chance. Only when you can't run a randomized trial do you need to control for other differentiators.
You generally want to trust randomization for most experiments. Randomization is an unbiased process that, with enough users, controls for all possible confounding factors, both known (e.g. age, gender, and OS) and unknown (e.g. personality, hair color, and sophistication), making comparisons between test and control groups balanced and fair. Since both groups are exposed and measured simultaneously, A/B testing also corrects for temporal and seasonal effects. Statistically significant differences between the test and control groups can then be directly attributed to the change being tested. I wrote more about this in a blog post.
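As a minimal sketch of that significance test, assuming simple click and impression counts per group (the numbers are made up):

from statsmodels.stats.proportion import proportions_ztest

clicks = [520, 480]            # clicks in test and control
impressions = [10000, 10000]   # impressions (or users) in each group

# Two-sided test: is the difference in CTR larger than chance would explain?
stat, p_value = proportions_ztest(count=clicks, nobs=impressions)
print(f"z = {stat:.2f}, p = {p_value:.4f}")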
Going with custom user segmentation is usually reserved for rare instances where randomization can be expected to produce unbalanced groups. For example, suppose you split a room of 100 people into two groups, but Bill Gates and Elon Musk are in the room. Depending on the metric you want to measure, they can severely skew things, and randomization will put both billionaires in the same group half the time. That is a scenario where it's worth doing a custom segmentation and enforcing that they end up in different groups. But this sort of thing is rare and rarely affects binary metrics like CTR.
For an ecommerce website, how do you measure whether a change to your site actually improved usability? What kinds of measurements should you gather, and how would you set up a framework for making this testing part of development?
Multivariate testing and reporting is a great way to actually measure these kinds of things.
It allows you to test what combination of page elements has the greatest conversion rate, providing continual improvement on your site design and usability.
Google Web Optimiser has support for this.
Use methods similar to those you used to identify the usability problems to begin with: usability testing. Typically you identify your use cases and then run a lab study evaluating how users go about accomplishing certain goals. Lab testing typically works well with 8-10 people.
A more informative methodology we have adopted to understand our users is anonymous data collection (you may need user permission; make your privacy policies clear, etc.). This simply means evaluating which buttons/navigation menus users click and how users delete something (e.g., when changing a quantity, do more users enter 0 and update the quantity, or hit X?). This is a bit more complex to set up; you have to develop an infrastructure to hold this data (which is really just counters, e.g. "Times clicked X: 138838383, Times entered 0: 390393") and allow data points to be created as needed to plug into the design.
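As a minimal sketch of such counters (the event names are hypothetical; a real system would persist them to a database):

from collections import Counter

ui_events = Counter()

def track(event: str) -> None:
    """Increment a named counter each time a UI event fires."""
    ui_events[event] += 1

# Simulated clicks from the quantity-change example above:
track("delete_via_zero_quantity")
track("delete_via_x_button")
track("delete_via_x_button")
print(ui_events.most_common())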
To push the measurement of a UI change's improvement upstream from the end user (where data gathering can take a while) to design or implementation, some simple heuristics can be used:
Is the number of actions it takes to perform a scenario lower? (If yes, it has improved.) Measurement: # of steps reduced/added.
Does the change reduce the number of kinds of input devices used (even if the # of steps is the same)? By this I mean: if you take something that relied on both the mouse and keyboard and change it to rely only on the mouse or only on the keyboard, then you have improved usability. Measurement: change in # of devices used.
Does the change make different parts of the website consistent? E.g., if one part of the e-commerce site loses changes made while you are not logged on and another part does not, this is inconsistent; changing them to have the same behavior improves usability (preferably toward the more fault-tolerant behavior!). Measurement: make a graph (a flow chart, really) mapping the ways a particular action could be done; improvement is a reduction in the # of edges on the graph, as the sketch after this list shows.
And so on... find some general UI tips, figure out some metrics like the above, and you can approximate usability improvement.
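A minimal sketch of that edge-count metric (the flows are hypothetical):

# Each graph maps a UI state to the states reachable from it.
before = {"cart": ["set_qty_0", "click_x"], "set_qty_0": ["update"], "click_x": [], "update": []}
after = {"cart": ["click_x"], "click_x": []}

def edge_count(graph: dict) -> int:
    # Total number of edges in an adjacency-list graph.
    return sum(len(v) for v in graph.values())

print("edges removed:", edge_count(before) - edge_count(after))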
Once you have these design approximations of user improvement, and then gather longer term data, you can see if there is any predictive ability for the design-level usability improvements to the end-user reaction (like: Over the last 10 projects, we've seen an average of 1% quicker scenarios for each action removed, with a range of 0.25% and standard dev of 0.32%).
The first way can be fully subjective or partly quantified: user complaints and positive feedback. The problem with this is that you may have strong biases when it comes to filtering that feedback, so you had better make it as quantitative as possible. Having a ticketing system to file every report from users and gathering statistics about each version of the interface might be useful. Just get your statistics right.
The second way is to measure the difference via a questionnaire about the interface given to end users. Answers to each question should be a set of discrete values, and then again you can gather statistics for each version of the interface.
The latter way may be much harder to set up (designing a questionnaire, and possibly a controlled environment for it, as well as guidelines for interpreting the results, is a craft in itself), but the former makes it unpleasantly easy to mess up the measurements. For example, you have to consider that the number of tickets you get for each version depends on how long that version is used, and that not all time ranges are equal (e.g., a whole class of critical issues may never be discovered before the third or fourth week of usage, or users might tend not to file tickets in the first days of use even if they find issues, etc.).
Torial stole my answer. Still, if there is a measure of how long it takes to do a certain task, and the time is reduced while the task is still completed, then that's a good thing.
Also, if there is a way to record the number of cancels, that would work too.