How to use Excel column chart for datasets that have very different scales - excel

There are 2 datasets that have values in the interval [0; 1]. I need to visualize these 2 datasets in Excel as a column chart. The problem is that some data points have values such as 0.0001 or 0.0002, while other data points have values such as 0.8 or 0.9. The difference is huge, so it's impossible to see the data points with small values. What could be the solution? Should I use a logarithmic scale? I would appreciate any example.

Two possible ways are below:
Graph the smaller data set as a second series against a right-hand Y axis (with the same ratio from min to max as the left-hand series).
Multiply the smaller data set by 1000 and compare the multiplied data set to the larger one.
Note that the logarithm of a fraction between 0 and 1 is negative, so a log scale isn't really an option here.
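As a rough illustration of the second suggestion and of the remark about logarithms, here is a minimal Python sketch. The sample values and the helper name `rescale_small_series` are made up for this example; the factor of 1000 is the one suggested above.

```python
import math

def rescale_small_series(values, factor=1000):
    """Multiply every value by `factor` so the series becomes visible
    next to a much larger series (hypothetical helper)."""
    return [v * factor for v in values]

# Made-up sample data in [0, 1], for illustration only.
small = [0.0001, 0.0002, 0.0005]
large = [0.8, 0.9, 0.75]

scaled = rescale_small_series(small)

# The base-10 logarithm of any value strictly between 0 and 1 is negative.
logs = [math.log10(v) for v in small + large]

print(scaled)
print(logs)
```

If the rescaled series is charted, the axis title or legend should state the factor so readers do not mistake the scaled values for the raw ones.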

Related

Normalisation or Standardisation for detecting outlier?

When should I use min-max scaling (normalisation) and when should I use standardisation (z-scores) for data pre-processing?
I know that normalisation brings the range of a feature down to 0 to 1, and the z-score brings it down to about -3 to 3, but I am unsure which of the two techniques to use for detecting outliers in data.
Let us briefly agree on the terms:
The z-score tells us how many standard deviations a given element of a sample is away from the mean.
Min-max scaling is the method of rescaling a range of measurements to the interval [0, 1].
By those definitions, the z-score usually spans an interval much larger than [-3, 3] if your data follows a long-tailed distribution. On the other hand, a plain normalization does indeed limit the range of the possible outcomes, but will not help you to find outliers, since it just bounds the data.
What you need for outlier detection are thresholds above or below which you consider a data point to be an outlier. Many programming languages offer violin plots or box plots, which nicely show your data distribution. The methods behind these plots implement a common choice of thresholds:
Box and whisker [of the box plot] plots quartiles, and the band inside the box is always the second quartile (the median). But the ends of the whiskers can represent several possible alternative values, among them:
the minimum and maximum of all of the data [...]
one standard deviation above and below the mean of the data
the 9th percentile and the 91st percentile
the 2nd percentile and the 98th percentile.
All data points outside the whiskers of the box plots are plotted as points and considered outliers.
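To make the difference concrete, here is a minimal Python sketch of both rescalings and of a threshold-based outlier check. The sample data, the function names, and the cut-off of 3 standard deviations are assumptions chosen for illustration, not a recommendation from the answer above.

```python
from statistics import fmean, pstdev

def z_scores(values):
    """Distance of each value from the mean, in population standard deviations."""
    mean = fmean(values)
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]

def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def outliers_by_z(values, threshold=3.0):
    """Values whose |z-score| exceeds `threshold` (an assumed cut-off)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]

# Made-up sample: 19 ordinary readings and one extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
        10, 9, 12, 11, 10, 10, 11, 9, 10, 100]

scaled = min_max(data)          # bounded to [0, 1], but flags nothing by itself
flagged = outliers_by_z(data)   # needs an explicit threshold to flag anything
print(flagged)
```

Min-max scaling only squeezes the data into [0, 1]; a point is labelled an outlier only once a threshold is applied to it.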

Excel - Plot average of two plots with inconsistent time (X) axis

I have managed to plot two different data sets on the same axis; however, I'm also looking to plot another line showing their average.
The main problem is that both data sets have different X (time) values so it's not possible to add an average column at the end and plot that. (See the highlighted row 22 for example, corresponding Time values are different)
Is there any way I can plot an average of two plots on the same axis?
One idea that might work is to place the values of both series, one above the other in two new columns, sort this new data according to time, smooth it, then plot the smoothed combined data. Alternatively, you could do the smoothing by simply plotting the new sorted series, adding a moving average trendline to it, then change the formatting of the new series so that it is no longer visible (but the trendline is). Something like this:
In the above picture, series 3 is the plot of the sorted aggregate data of series 1 and 2. If you change the formatting of series 3 so that there is no line, you get something like this:
For my relatively small mock data sets, the results are admittedly poor (it was based on just 25 data points in each series), but if you have a large amount of closely spaced data, and you play around with the moving average window size, you might get something acceptable. If not, you should probably just interpolate both datasets to obtain two consistent time series.
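The "stack, sort, smooth" idea can be sketched outside Excel as well. The following Python snippet is only an illustration under assumptions: the function name and sample points are invented, and the trailing window (each average covers the current point and the preceding ones) is meant to resemble a moving average trendline, not to reproduce Excel's output exactly.

```python
def merged_moving_average(series_a, series_b, window=5):
    """Combine two lists of (time, value) pairs, sort them by time, and
    return a trailing moving average over `window` consecutive points."""
    merged = sorted(series_a + series_b)
    smoothed = []
    for i in range(window - 1, len(merged)):
        chunk = merged[i - window + 1 : i + 1]
        avg = sum(value for _, value in chunk) / window
        smoothed.append((merged[i][0], avg))
    return smoothed

# Made-up series sampled at different times.
series_1 = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
series_2 = [(1.0, 2.0), (3.0, 4.0)]

print(merged_moving_average(series_1, series_2, window=2))
```

A larger window gives a smoother curve but lags further behind the data, which is the trade-off to experiment with.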

Averaging many curves with different x and y values

I have several curves that contain many data points. The x-axis is time and let's say I have n curves with data points corresponding to times on the x-axis.
Is there a way to get an "average" of the n curves, despite the fact that the data points are located at different x-points?
I was thinking maybe something like using a histogram to bin the values, but I am not sure which code to start with that could accomplish something like this.
Can Excel or MATLAB do this?
I would also like to plot the standard deviation of the averaged curve.
One concern is: The distribution amongst the x-values is not uniform. There are many more values closer to t=0, but at t=5 (for example), the frequency of data points is much less.
Another concern. What happens if two values fall within 1 bin? I assume I would need the average of these values before calculating the averaged curve.
I hope this conveys what I would like to do.
Any ideas on what code I could use (MATLAB, EXCEL etc) to accomplish my goal?
Since your series are not uniformly distributed, interpolating prior to computing the mean is one way to avoid biasing towards times where you have more frequent samples. Note that by definition, interpolation will likely reduce the range of your values, i.e. the interpolated points aren't likely to fall exactly at the times of your measured points. This has a greater effect on the extreme statistics (e.g. 5th and 95th percentiles) than on the mean. If you plan on going this route, you'll need the interp1 and mean functions.
An alternative is to do a weighted mean. This way you avoid truncating the range of your measured values. Assuming x is a vector of measured values and t is a vector of measurement times in seconds from some reference time then you can compute the weighted mean by:
timeStep = diff(t);  % time elapsed between consecutive measurements
weightedMean = sum(timeStep .* x(1:end-1)) / sum(timeStep);  % time-weighted mean (scalar)
As mentioned in the comments above, a sample of your data would help a lot in suggesting the appropriate method for calculating the "average".
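For readers without MATLAB, both approaches can be sketched in plain Python. Everything here is illustrative: the function names, the common time grid, and the sample curves are assumptions, and linear interpolation is just one possible choice.

```python
from bisect import bisect_right
from statistics import fmean, pstdev

def interp_linear(t_query, times, values):
    """Linear interpolation; `times` must be strictly increasing."""
    if not times[0] <= t_query <= times[-1]:
        raise ValueError("t_query lies outside the measured time range")
    i = bisect_right(times, t_query)
    if i == len(times):  # t_query equals the last measured time
        return values[-1]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t_query - t0) / (t1 - t0)

def average_curves(curves, grid):
    """Interpolate every curve onto `grid`; return (t, mean, std) per grid time."""
    result = []
    for t in grid:
        samples = [interp_linear(t, times, values) for times, values in curves]
        result.append((t, fmean(samples), pstdev(samples)))
    return result

def time_weighted_mean(times, values):
    """Weight each value by the time until the next measurement."""
    steps = [b - a for a, b in zip(times, times[1:])]
    return sum(dt * v for dt, v in zip(steps, values)) / sum(steps)

# Made-up curves, each given as (times, values), sampled at different times.
curves = [([0, 2, 4], [0, 2, 4]), ([0, 1, 4], [2, 3, 6])]
for row in average_curves(curves, grid=[0, 1, 2, 3, 4]):
    print(row)
```

The grid has to stay inside the time range shared by all curves, because this sketch refuses to extrapolate.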

How to produce the data points for a circle in Excel using ROW INDIRECT

The page linked to here has been a great help to me. The method of using the named function (=(ROW(INDIRECT("1:361"))-1)*PI()/180) to produce the circle data points is very slick compared to my original method that was to calculate them individually, writing them in to rows.
My data set includes some 50k rows of data, each one defining a circle. The set is divided into 50 groups and I need to plot one circle from each group as selected via a scroll bar controlling a LOOKUP routine.
Please can someone suggest how I might modify the function (=(ROW(INDIRECT("1:361"))-1)*PI()/180) to reduce the number of data points it produces? I want to reduce the computing load and also, it's not practical to display & format data markers with such high data density. My existing circles are produced with just 18 coordinate pairs and are satisfactorily rounded.
Thanks in advance. Steve.
This would give you 19 data points, with 0° and 360° as the start/end points and another point every 20°:
=(ROW(INDIRECT("1:19"))-1)*PI()/9
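The same angle sequence can be checked outside Excel. This Python sketch is an illustration only: the function name and its parameters are hypothetical. It generates 19 coordinate pairs at multiples of PI/9 radians (20°), with the last point closing the circle.

```python
import math

def circle_points(cx, cy, radius, n_segments=18):
    """Return n_segments + 1 (x, y) pairs on a circle; the first and last
    points coincide so that a line chart draws a closed outline."""
    step = 2 * math.pi / n_segments  # PI/9 radians (20 degrees) for 18 segments
    return [(cx + radius * math.cos(k * step),
             cy + radius * math.sin(k * step))
            for k in range(n_segments + 1)]

points = circle_points(0.0, 0.0, 1.0)
print(len(points))
```

Changing `n_segments` trades smoothness against the number of points to plot, which mirrors changing "1:19" and the divisor 9 in the formula.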

Excel graphing and axis

I have a scatterplot with values that range from 2 to -2. The catch is that 1 is the "zero-point". In other words, the smallest positive value is 1.01 and the negative value closest to zero is -1.01. How can I edit the axis of the graph so that 0 is replaced with 1?
If you have a version of Excel later than 2007 I'd suggest splitting the positives and negatives into separate series and plotting one on a secondary axis (not that I know whether or not that would work!), but with 2007 I have not been able to place one vertical axis above the horizontal and another below. Instead the best I could manage was to use two separate charts:
by again splitting up the series, careful positioning and judicious use of a text box for 0.
At least this way you are not constrained by the outer limits.
Based on some very specific conditions you can print the zero-point as 1 using a custom number format: You have to set the axis options to be fixed at -2 (minimum) and 2 (maximum) with a major unit of 2 as well. This ensures that you only have the three values -2, 0 and 2 on the vertical/y-axis. Why is this important? Well, custom number formats can easily distinguish between positive, negative and zero values, which is exactly what you have with -2, 0 and 2.
Here's a visual of the input/output:
The custom number format is set to 2;-2;1, thereby formatting all positive numbers to 2, all negative numbers to -2 and zero to 1.
If all you want is to replace the axis label "0" with "1" (as in the answer by Werner), then you can use the following (similar to this):
1. Add X and Y values for a dummy series, with 3 points. If the minimum value in your X-axis is xm, your points are (xm, -2), (xm, 0), (xm, 2).
2. Add cells with the 3 labels that you will use for the dummy series: "-2", "1", "2".
3. Go to the chart, and remove the tick labels of the Y-axis.
4. Add a series with the 3 dummy data points.
5. Add the labels to the data points. You can use references to the cells of item 2, or enter explicit labels. Entering each label (either a reference or an explicit label) is tedious when you have many data points. Check this, and in particular Rob Bovey's add-in. It is excellent.
6. Format the dummy series so it is visually ok (e.g., small, hairline crosses, no line).
You can use variations on this. For instance, you can add extra points to your dummy series, with corresponding labels. Gridlines would match the dummy series.
But I think this is not appropriate, as the locations of your data points will be inconsistent with the scales.
What is appropriate is having an interrupted axis, where the interval (-1,1) is eliminated.
The answer by pnuts aims at that.
I propose something different, with the advantage of using only one chart:
Create a column where you add 2, only to negative Y-values. Use that column as your new Y-values.
Use the same trick as above, with your dummy series now being (xm, 0), (xm, 1), (xm, 2), and the labels the same as above.
You can use additional points in your dummy series.
You can use this technique to create an arbitrary number of axis interruptions. The formula for the "fake" Y-values would be more complicated, with IFs to detect the interval corresponding to each point, and suitable linear transformations to account for the change in scale for each interval (assuming linear scales; no mixing linear-log). But that is all.
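The single-interruption case described above (add 2 to the negative Y-values only) can be sketched in Python to check the arithmetic. The function name and the sample values are hypothetical; in Excel the same mapping would be a helper column with an IF formula.

```python
def fake_y(y, shift=2.0):
    """Map a raw Y-value onto the plotted axis: negative values are shifted
    up by `shift`, positive values are left unchanged."""
    return y + shift if y < 0 else y

# Made-up raw values on both sides of the removed interval (-1, 1).
raw = [-2.0, -1.5, -1.01, 1.01, 1.5, 2.0]
plotted = [fake_y(y) for y in raw]

# Dummy-series positions on the plotted axis and the labels shown there.
tick_labels = {0.0: "-2", 1.0: "1", 2.0: "2"}

print(plotted)
```

With this mapping the raw range [-2, -1) lands on [0, 1) and the raw range (1, 2] stays in place, so all plotted values fall between 0 and 2.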
PS: see also the links below. I still think my alternative is better.
http://peltiertech.com/broken-y-axis-in-excel-chart/
http://ksrowell.com/blog-visualizing-data/2013/08/12/how-to-simulate-a-broken-axis-value-axis/
http://www.tushar-mehta.com/excel/newsgroups/broken_y_axis/tutorial/index.html#Rescale%20and%20hide%20the%20y-axis

Resources