I am a beginner with SPSS. When I need to describe the characteristics of the people who took part in a survey, should I produce pie charts, histograms and other graphs for age, gender, etc., and also find the mode, median, mean and standard deviation? Or is there another approach?
This isn't a technical question but a statistical one, so next time it would be better to ask a question like this on Cross Validated.
To choose the appropriate types of graphs and statistical values you must first determine the measurement level of your variables. See: Levels of Measurement on www.socialresearchmethods.net
Age and income, for instance, are ratio/scale-level variables, so a histogram would be an appropriate graph, and the mean and standard deviation would be good characteristic values.
Religion, on the other hand, is a nominal variable, so the "mean" of this variable would be meaningless. In this case the "mode" would be an appropriate characteristic value (though it will only tell you which religion is most common among your survey participants). A pie chart would be a meaningful graph here.
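As a rough illustration of matching the summary value to the measurement level, here is a minimal Python sketch (outside SPSS). The variable names and the survey values are made up for the example; they are not from the question.

```python
import statistics

# Hypothetical survey responses (illustrative values only).
ages = [23, 35, 41, 29, 35, 52, 47, 35]
religions = ["A", "B", "A", "C", "A", "B", "A", "C"]

# Ratio/scale variable: mean and standard deviation are meaningful.
age_mean = statistics.mean(ages)
age_sd = statistics.stdev(ages)  # sample standard deviation (divides by n - 1)

# Nominal variable: only category counts and the mode are meaningful.
religion_mode = statistics.mode(religions)
religion_counts = {r: religions.count(r) for r in sorted(set(religions))}

print(f"age: mean={age_mean:.2f}, sd={age_sd:.2f}")
print(f"religion: mode={religion_mode}, counts={religion_counts}")
```

The counts are what a pie chart or bar chart of the nominal variable would display.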
I have two data sets (each with data points and standard deviations) and want to check whether they are statistically different. What kind of test would be appropriate?
Thank you!
The answer depends on how the samples were obtained. If the blue and red samples come from the same randomly selected group of items, measured at different times, then the paired two-sample t-test applies. If they come from different groups, the unpaired two-sample t-test is suitable. Both choices assume that the blue and red samples are normally distributed or can be transformed to a normal distribution, for example by a logarithmic transformation. Otherwise, you need a non-parametric test: the Mann-Whitney test for unpaired samples, or the Wilcoxon signed-rank test for paired ones. The data values to be used are the Output percentages given the same Input value. Data values should be continuous, as they are in your case.
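To make the paired/unpaired distinction concrete, here is a minimal Python sketch that computes the two t statistics by hand. The function names are illustrative, the unpaired version assumes equal variances (pooled Student's t), and no p-value is computed; the statistic would still have to be compared against a t distribution with n - 1 (paired) or n1 + n2 - 2 (unpaired) degrees of freedom.

```python
import math
import statistics


def paired_t(before, after):
    """t statistic for paired samples (same items measured twice)."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def unpaired_t(x, y):
    """Student's t statistic for two independent samples (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled_var = (
        (nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)
    ) / (nx + ny - 2)
    std_err = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / std_err


# Hypothetical measurements, for illustration only.
print(paired_t([1, 2, 3, 4], [2, 4, 5, 7]))
print(unpaired_t([1, 2, 3], [2, 3, 4]))
```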
I want to know the best approach to a regression analysis in which all the data are text. I have the following data set.
My feature columns are: Strength, area of development, leadership, satisfactory.
The values of these columns come from a predefined set of texts, e.g. "Continuous Improvement,Self-Development,Coaching and Mentoring,Creativity,Adaptability".
Based on the values in these columns, I want to predict the label (overall Performance): Outstanding, Exceeding Expectation or Meeting Expectation.
What would be the best approach to deal with this dataset?
I am trying to understand whether there is any difference between two terms, viz. descriptive statistics and descriptive analytics. Googling didn't give a clear picture of what is common and what is different between them.
It appears that both summarize and analyse the data with the help of statistics.
So does it mean that they are just the same? A statistician may prefer to say descriptive statistics, while a data scientist may call it descriptive analytics.
Both are the same.
Descriptive statistics summarizes or describes characteristics of a data set.
Descriptive statistics consists of two basic categories of measures:
measures of central tendency
measures of variability or spread.
Measures of central tendency describe the center of a data set. Examples are the mean, median, and mode, which capture the most typical values in the analyzed data set.
Measures of variability or spread describe the dispersion of data within the data set, that is, its shape and spread. Examples are the range, quartiles, absolute deviation, and variance.
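The two categories can be sketched in a few lines of Python using only the standard library. The data values are hypothetical, and "absolute deviation" is taken here to mean the mean absolute deviation around the mean, which is an assumption about what the answer intends.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data set

# Measures of central tendency.
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of variability or spread.
data_range = max(data) - min(data)
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
mean_abs_dev = sum(abs(x - mean) for x in data) / len(data)
variance = statistics.variance(data)  # sample variance (divides by n - 1)

print(mean, median, mode)
print(data_range, (q1, q2, q3), mean_abs_dev, variance)
```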
I would like to simulate the performance of a baseball player. I know his expected performance for every future year and the standard deviations of those performances (based on regression analysis). At first, I was thinking of using the NORMINV(RAND(),REF,REF) function in Excel, but the underlying distribution of baseball players' performances is dramatically right-skewed. Is there a way that I can perform this sort of analysis in Excel or some other free or low-cost software? The end goal here is for the simulation to use the right-skewed distribution. Thanks very much.
R has lots of tools to do this sort of analysis, though you'd have to look through the docs to figure out how to use them. R is free and open source (GPL-licensed), including for commercial use.
If you have a cumulative distribution table (one that is evenly spaced and sufficiently detailed), then you can easily generate random values from this distribution in Excel by looking up a uniform random number generated by RAND() in your distribution table and taking the corresponding "x-axis" value.
=OFFSET($A$1,MATCH(RAND(),$B$2:$B$102),0)
A1 is the cell just above the table of "x-axis" values.
B2:B102 is the cumulative distribution table.
This is a simplified example. Some small modifications may be needed to handle edge-cases and adjust for biases.
If you have enough empirical data you should be able to create the cumulative distribution table.
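The same table-lookup idea (inverse transform sampling from an empirical cumulative distribution) can be sketched in Python. The function names and the sample data are assumptions made for illustration. Unlike the MATCH formula, which picks the last row whose cumulative value is at or below the random number, this variant picks the first row whose cumulative probability is at least the random number, which avoids a failed lookup for very small random numbers.

```python
import bisect
import random


def build_cdf_table(samples):
    """Return (sorted x values, cumulative probabilities) from empirical data."""
    xs = sorted(samples)
    n = len(xs)
    cdf = [(i + 1) / n for i in range(n)]
    return xs, cdf


def draw(xs, cdf, rng=random):
    """Draw one value by looking up a uniform random number in the CDF table."""
    u = rng.random()  # uniform in [0, 1)
    idx = bisect.bisect_left(cdf, u)  # first row with cumulative probability >= u
    return xs[min(idx, len(xs) - 1)]  # guard against falling off the table


# Hypothetical right-skewed performance data, for illustration only.
observed = [1, 1, 2, 2, 2, 3, 3, 4, 6, 12]
xs, cdf = build_cdf_table(observed)
simulated = [draw(xs, cdf) for _ in range(1000)]
```

Every simulated value is one of the observed values, so the simulated distribution keeps the skew of the empirical data; a finer table or interpolation between rows would be needed for values in between.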
I didn't know which Stack Exchange site to put this on, so I put it here. I am trying to determine whether there is a correlation between the size of a school and the major that the school specializes in.
To do this, I programmatically collected and analyzed data. For my report, I need to make a few graphs in Excel, but I have no clue how to do this.
What I'm looking for is a scatter plot with quantitative values on the Y-axis (the school size) and qualitative values on the X-axis, where I would like every major to be listed out (kind of like a bar graph). From there, I want to plot a point above the major that a school specializes in, with the height of the point equal to the school's student size.
Any help?
Edit:
Here is my sample data set. I want it to have categories that are to the right of the data, and points on the graph that correspond.
When you say "correlation" between X and Y, I think regression.
I would recommend doing an X-Y scatter plot and asking Excel to add a trend line. Not only will you get a least squares fit for the "best" line for your data, you can also display a goodness-of-fit value that tells you whether or not there's a relationship. The correlation coefficient ranges from -1 to +1; the closer it is to -1 or +1, the stronger the linear relationship, and a value near 0 indicates little or no linear relationship. (Excel's trendline option reports R-squared, the square of the correlation coefficient, which ranges from 0 to 1.)
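For reference, the least squares line and the correlation coefficient behind a trend line can be computed by hand, as in this minimal Python sketch. It assumes both variables are numeric, so a categorical variable such as the major would first need some numeric coding, and the function name and data are hypothetical.

```python
import math


def linear_fit(xs, ys):
    """Return (slope, intercept, r) of the least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical points, for illustration only.
slope, intercept, r = linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept, r, r ** 2)  # r ** 2 is the R-squared value
```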