I didn't know which Stack Exchange site to put this on, so I put it here. I am trying to determine whether there is a correlation between the size of a school and the major that the school specializes in.
In order to do this, I programmatically collected and analyzed data. To make my report, I need to make a few graphs in Excel, but I have no clue how to do this.
What I'm looking for is a scatter plot with quantitative values (the school size) on the Y-axis and qualitative values on the X-axis, where every major is listed out (kind of like a bar graph). From there, I want to plot a point above the major that a school specializes in, and have that point be as high as the school's student count.
Any help?
Edit:
Here is my sample data set. I want it to have categories that are to the right of the data, and points on the graph that correspond.
When you say "correlation" between X and Y, I think regression.
I would recommend doing an X-Y scatter plot and asking Excel to add a trend line. Not only will you get a least squares fit for the "best" line for your data, you can also have Excel display the R-squared value, which for a straight-line fit is the square of the correlation coefficient and tells you whether or not there's a relationship. The correlation coefficient ranges from -1 to +1; the closer it is to either -1 or +1, the stronger the linear relationship, and a value near 0 means little or no linear relationship.
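To make the numbers behind a trend line concrete, here is a minimal Python sketch of a least squares line and the Pearson correlation coefficient. The data values are made up for illustration, and the function names are my own; this is not what Excel runs internally, just the textbook formulas.

```python
import math

def least_squares_fit(xs, ys):
    """Return (slope, intercept, r) for a straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Pearson correlation coefficient, always between -1 and +1.
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical sample values, not taken from the question's data set.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept, r = least_squares_fit(xs, ys)
print(slope, intercept, r, r ** 2)
```

The sign of r gives the direction of the relationship, and r squared is the value a straight-line trend line reports as R-squared. Note that this only makes sense when both axes are numeric.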
I am trying to draw a bell curve (normal distribution curve) from the data set provided, but I cannot get Excel to create it. Can anyone help me create it?
https://docs.google.com/spreadsheets/d/1ipDo6WlbmDUBZuuS4ya3ZGD7mkP_vnbByK3KvyLbJ88/edit?usp=sharing
The above file can be used as the data set for creating the curve. Can someone explain the procedure for making a curve from this data set in Excel?
If your data is normally distributed, its histogram should resemble a bell curve.
By "Trying to draw a Bell Curve/Normal Distribution curve", are you referring to a line diagram?
Remember, a bell curve is what the histogram of normally distributed data looks like. If you inserted a histogram of your data, would that be enough?
If not, you could calculate the mean and standard deviation of your data, then make a column of positions at different numbers of standard deviations from the mean, together with the value you expect the normal curve to have at each one.
You could then incorporate that into your histogram. Use a "Combo" chart and plot the histogram on one axis and a line for your calculated values on the other (you can make the line smooth if you think it's too sharp). You could also decrease the distance between your calculated values (1.1, 1.2, ... standard deviations) instead of, let's say, halves of standard deviations.
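The calculated column described above can be sketched in Python as follows. This is only an illustration under my own assumptions: the sample data is invented, the function names are hypothetical, and the curve is scaled by the number of points times the histogram bin width so that it sits on the same scale as the histogram counts.

```python
import math
import statistics

def normal_pdf(x, mean, sd):
    """Density of the normal distribution with the given mean and SD."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def expected_curve(data, bin_width, step=0.5, span=3.0):
    """Return (x, expected count) pairs from mean - span*SD to mean + span*SD."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample SD, the same quantity as Excel's STDEV.S
    n = len(data)
    n_steps = int(round(2 * span / step))
    points = []
    for i in range(n_steps + 1):
        k = -span + i * step          # number of SDs away from the mean
        x = mean + k * sd
        # density * n * bin_width converts the curve to histogram-count units
        points.append((x, normal_pdf(x, mean, sd) * n * bin_width))
    return points

# Invented sample data, not the spreadsheet from the question.
data = [2, 4, 4, 4, 5, 5, 7, 9]
for x, expected in expected_curve(data, bin_width=1.0):
    print(round(x, 2), round(expected, 3))
```

A smaller `step` gives a smoother line, which is the same effect as decreasing the distance between the calculated values.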
Unfortunately, the data you provided is not at all normally distributed.
So you can't create a bell curve based on this data, no.
I currently have a data set of x and y coordinates (the position of an animal in an arena) over a period of time. I used the coordinates to plot a scatter plot of what that looks like. However, instead of having every single coordinate as a separate point, I was wondering if there is a way to create a heat map of the points, so that the higher the likelihood of the animal being in a specific area (similar coordinates), the warmer the color. I am hoping for the final product to be a depiction of the arena with a gradient of colors based on how likely the animal is to explore each region.
Well, with that many points, I don't know if Excel is the right choice if you want to color-code the points. The site https://app.rawgraphs.io/ has some really cool graphing capabilities. I use it when I need Sankey diagrams or something unusual that Excel cannot easily handle.
Here I used 1500 x/y points between 0 and 20. Then I selected the graph type called "Contour Plot".
Would this work?
Or here's a Hexagonal Binning chart of the same data...
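Whatever tool draws the picture, the underlying step is binning the points and counting how many land in each cell. Here is a minimal Python sketch of that step using square cells (not hexagons); the function names, the cell size and the sample coordinates are all assumptions made for illustration.

```python
from collections import Counter

def bin_points(points, cell_size):
    """Count how many (x, y) points fall in each square grid cell."""
    counts = Counter()
    for x, y in points:
        # Floor division gives the integer index of the cell along each axis.
        cell = (int(x // cell_size), int(y // cell_size))
        counts[cell] += 1
    return counts

def occupancy(counts):
    """Convert raw counts to the fraction of all points in each cell."""
    total = sum(counts.values())
    return {cell: c / total for cell, c in counts.items()}

# Invented coordinates standing in for the animal's tracked positions.
points = [(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (3.2, 3.9)]
counts = bin_points(points, cell_size=1.0)
print(occupancy(counts))
```

Each cell's fraction is what the color scale would be mapped to: the larger the fraction, the warmer the color.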
I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern is that the distribution of the x-values is not uniform. There are many more values close to t=0, but at t=5 (for example), the frequency of data points is much lower.
Another concern: what happens if two values fall within one bin? I assume I would need the average of these values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly distributed, interpolating prior to computing the mean is one way to avoid biasing towards times where you have more frequent samples. Note that interpolation will likely reduce the range of your values, because the interpolated points aren't likely to fall exactly at the times of your measured points. This has a greater effect on the extreme statistics (e.g. the 5th and 95th percentiles) than on the mean. If you plan on going this route, you'll need the interp1 and mean functions.
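As a rough illustration of the interpolate-then-average approach, here is a standard-library Python sketch. It is not MATLAB's interp1, only a hand-written linear interpolation playing the same role, and the function names and sample curves are assumptions of mine. Curves that do not cover a given grid time are skipped at that time rather than extrapolated.

```python
import bisect
import statistics

def interp(t_query, ts, xs):
    """Linearly interpolate (ts, xs) at t_query; ts must be sorted ascending."""
    if t_query < ts[0] or t_query > ts[-1]:
        return None  # outside the measured range: do not extrapolate
    i = bisect.bisect_right(ts, t_query) - 1
    if i >= len(ts) - 1:
        return xs[-1]  # t_query is exactly the last measured time
    frac = (t_query - ts[i]) / (ts[i + 1] - ts[i])
    return xs[i] + frac * (xs[i + 1] - xs[i])

def average_curves(curves, grid):
    """curves: list of (ts, xs) pairs. Returns (t, mean, sd) for each grid time."""
    result = []
    for t in grid:
        values = []
        for ts, xs in curves:
            v = interp(t, ts, xs)
            if v is not None:
                values.append(v)
        if not values:
            continue  # no curve covers this time
        mean = statistics.mean(values)
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        result.append((t, mean, sd))
    return result

# Two invented curves sampled at different times.
curves = [([0, 1, 2], [0, 10, 20]), ([0, 2], [10, 30])]
for t, mean, sd in average_curves(curves, grid=[0, 0.5, 1, 2]):
    print(t, mean, round(sd, 3))
```

Because every curve is evaluated on the same grid, the mean and standard deviation at each time use one value per curve, so densely sampled regions do not dominate.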
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
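% Time between consecutive measurements
timeStep = diff(t);
% Weight each value by the interval that follows it, then normalize
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);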
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
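For a worked example of the time-weighted mean, here is a small Python sketch with made-up numbers. The function name is hypothetical. Each value is weighted by the time until the next measurement, so the last value carries no weight.

```python
def time_weighted_mean(t, x):
    """Mean of x where each value is weighted by the interval that follows it."""
    steps = [t[i + 1] - t[i] for i in range(len(t) - 1)]
    weighted_sum = sum(dt * xi for dt, xi in zip(steps, x[:-1]))
    return weighted_sum / sum(steps)

# Invented measurements: intervals of 1, 2 and 3 seconds.
t = [0, 1, 3, 6]
x = [10, 20, 30, 40]
# (1*10 + 2*20 + 3*30) / (1 + 2 + 3) = 140 / 6
print(time_weighted_mean(t, x))
```

Values that are held for longer intervals pull the result towards themselves, which is how this method compensates for uneven sampling.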
I am trying to plot a Pareto plot in Spotfire.
I would like it to look like this (I have used some very basic input data):
But so far I can only make it look like this:
I have done this by creating a hierarchy in 'column properties' and ordering the names from lowest to highest experience, then flipping both the x and y axes so that my curve looks like a Pareto plot, but it is not quite the same.
Is there a more efficient way to do this in Spotfire that would display the y axis correctly and also allow the names to go along the bottom?
Any help is greatly appreciated.
Thanks
I think I have an answer for you. With this approach you do not have to flip the y-axis, and the names appear on the x-axis, but I still use a hierarchy. I'm not sure it's more efficient, but here goes.
First, I created a Hierarchy defined as:
CREATE NESTED HIERARCHY [New hierarchy (2)]
[yearsOfExp] AS [yearsOfExp],
[name] AS [name]
This allowed me to order the category axis by years of experience.
Secondly, I created a calculated column defined as:
Sum([yearsOfExp]) over (AllNext([yearsOfExp]))
I then created a line chart. The hierarchy is the x axis and the calculated column is the y axis. When setting the x axis, be sure to "reverse scale".
I hope this does what you're looking for. Good luck. Any questions, just ask.
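To show what the calculated column works out to, here is a small Python sketch of my reading of it: for each row, in order of years of experience, the total of that row's value and every value after it. The names and numbers are invented, and the sketch ignores how Spotfire treats rows that share the same yearsOfExp value.

```python
def sum_over_all_next(values):
    """For each position, total of that value and every value after it."""
    totals = []
    running = 0
    # Walk from the last row backwards, accumulating as we go.
    for v in reversed(values):
        running += v
        totals.append(running)
    totals.reverse()
    return totals

# Hypothetical rows, already sorted by years of experience (ascending).
rows = [("Ann", 1), ("Bob", 3), ("Cy", 6)]
totals = sum_over_all_next([years for _, years in rows])
for (name, _), total in zip(rows, totals):
    print(name, total)
```

Read in reverse order, as on a reversed x axis, these totals rise from the largest single value to the grand total, which is the cumulative line of a Pareto plot.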
Please don't eat me because of this question :)
I have some data in Excel and I would like to make a graphical representation of it. The structure of my data:
Person ID: from 1 to 485. For every person, there is one parameter such as average jumping distance, another parameter such as height, and finally a class to which the person belongs (1, 2 or 3).
To assign persons to classes, I used the k-means algorithm.
Now I would like to make a graph of this result. How can I do it in Excel (or with another tool), please?
Thank you
I would use a scatter (XY chart with markers and no lines). Plot average jumping distance on one axis, height on the second axis. Then for the classes I would separate all the data into 3 series and use different colors for each series. I would adjust the marker size to see which one works best with the data.
Here is a quick example to give you an idea of how it would look. It's not as easy as just clicking once to insert the chart from the data, though.
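The step of separating the data into one series per class can be sketched in Python like this. The row layout (ID, jumping distance, height, class) follows the question's description, but the function name and the sample values are assumptions for illustration.

```python
from collections import defaultdict

def split_by_class(rows):
    """rows: iterable of (person_id, distance, height, cluster).

    Returns a dict mapping each cluster label to its list of
    (distance, height) points, one list per chart series.
    """
    series = defaultdict(list)
    for person_id, distance, height, cluster in rows:
        series[cluster].append((distance, height))
    return dict(series)

# Invented rows; the real data has 485 persons in classes 1, 2 or 3.
rows = [
    (1, 2.1, 170, 1),
    (2, 2.5, 182, 2),
    (3, 1.9, 165, 1),
]
for cluster, points in sorted(split_by_class(rows).items()):
    print(cluster, points)
```

Each resulting list would then be plotted as its own scatter series with its own marker color.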