When Excel determines the axis values it will use to represent your data in a chart, the values are 'evenly distributed'.
For example, if you plot the following series in an Excel line chart:
[0.22,0.33,0.44,0.55,0.66,0.77,0.88,0.99,1.1,1.21,1.32,1.43,1.54,1.65,1.76,1.87,1.98,2.09,2.2]
Excel determines that the y-axis values should be [0,0.5,1,1.5,2,2.5].
What technique or formula is used to determine these values?
After some experimenting, I conclude that Excel:
1) Keeps the Y axis starting at zero unless you explicitly tell it otherwise.
2) Sets the Y axis maximum one major tick higher than your largest value.
3) Chooses the major tick from a set of "preferred" units (.1, .2, .5, 1, 2, 5, 10, 50, 100, etc). This part seems more arbitrary: it uses the smallest preferred unit that results in between 5 and 10 major ticks while meeting the requirements above.
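The three observations above can be sketched in Python. This is only an illustration of the rules as described, not Excel's actual algorithm. The function name `axis_ticks`, the strict 1-2-5 sequence of preferred units, and the reading of "between 5 and 10 major ticks" as the number of intervals between ticks are all assumptions made for the sketch.

```python
import math


def axis_ticks(values, min_intervals=5, max_intervals=10):
    """Return (axis_max, unit, ticks) for a zero-based axis.

    Hypothetical helper: it follows the three observed rules, it is not
    taken from Excel itself.
    """
    largest = max(values)
    if largest <= 0:
        raise ValueError("this sketch only handles a positive maximum")

    # Start two decades below the data maximum so the first candidate
    # unit always produces far too many ticks.
    exponent = math.floor(math.log10(largest)) - 2
    while True:
        for mantissa in (1, 2, 5):  # assumed "preferred" unit pattern
            unit = mantissa * 10.0 ** exponent
            # Rule 2: the axis ends one major tick above the largest value.
            intervals = math.floor(largest / unit + 1e-9) + 1
            # Rule 3: smallest unit that gives 5 to 10 intervals.
            if min_intervals <= intervals <= max_intervals:
                # Rule 1: ticks start at zero.
                ticks = [round(i * unit, 10) for i in range(intervals + 1)]
                return ticks[-1], unit, ticks
        exponent += 1


data = [0.22, 0.33, 0.44, 0.55, 0.66, 0.77, 0.88, 0.99, 1.1, 1.21,
        1.32, 1.43, 1.54, 1.65, 1.76, 1.87, 1.98, 2.09, 2.2]
axis_max, unit, ticks = axis_ticks(data)
print(ticks)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
```

For the series in the question, a unit of 0.2 would need 12 intervals, so the sketch settles on 0.5, which matches the axis Excel produced.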
Please have a look at:
Reasonable optimized chart scaling
And
Algorithm for "nice" grid line intervals on a graph
There are more questions like this.
I want to write a program to transform temperature-dependent data into an Arrhenius plot. An Arrhenius plot shows the logarithm of a thermally activated property versus the reciprocal temperature, 1/T. Most people are not used to reading 1/T, which is why most such plots also show the corresponding temperature on a second axis, usually at the top of the graph. The output should look like this:
Picture Source
The second axis is only there for readability and corresponds to the primary axis through the relation:
primary=1/secondary
secondary=1/primary
What I am not able to do in Excel VBA (Excel 2010) is the reciprocal second x-axis. There is no predefined axis scaling like this: the ScaleType property of an axis only offers xlScaleLinear and xlScaleLogarithmic. Is there a way to do this?
A secondary problem is that this:
Dim CH As Chart
Set CH = Tabelle2.ChartObjects(1).Chart
CH.ChartType = xlXYScatterLinesNoMarkers
With CH
.HasAxis(xlCategory, xlSecondary) = True
End With
does not seem to work, which suggests that an XY scatter plot does not have a secondary x-axis enabled.
I could try to add the labels and ticks myself using forms, but this seems like too much pain. I cannot be the only one who has encountered this problem.
Problem 1: How to format an axis reciprocal (1/x)?
Problem 1b: How to properly add a second x axis in a XYscatterplot?
You can do this by creating a fake axis using a series with data labels (inspired by https://peltiertech.com/secondary-axes-that-work-proportional-scales/):
Columns A and B are your data. Column C matches the x-ticks of your primary x-axis, column D is =1/C2 etc., and column E is the y-axis maximum for your chart. Now create a new series from columns C and E and format it to have no line. In this case I chose the + marker, but you can create your own vertical-line marker if you want it to be exact. Then add data labels taken from the range in column D.
I don't think you'll find another way to do it without this hack, but it's really not that hard and it doesn't require VBA, which is always a plus for the readability and auditability of your workbook.
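To make the helper-column layout concrete, here is a small Python sketch that builds the same three columns (C, D and E) for a list of tick positions. The function name `fake_axis_rows` and the sample tick values are hypothetical. In the workbook itself these would simply be cell formulas.

```python
def fake_axis_rows(primary_ticks, y_max):
    """Build (column C, column D, column E) rows for the fake top axis.

    C: x position of a primary-axis tick
    D: label shown at that position, the reciprocal 1/C
    E: y value of the marker, i.e. the chart's y-axis maximum
    """
    rows = []
    for x in primary_ticks:
        if x == 0:
            raise ValueError("1/x is undefined for a tick at zero")
        rows.append((x, 1.0 / x, y_max))
    return rows


# Hypothetical ticks on a 1/T axis (in 1/K) and a y-axis maximum of 12.
rows = fake_axis_rows([0.001, 0.002, 0.004], 12.0)
for x, label, y in rows:
    print(f"x={x}  label={label:.0f}  y={y}")
```

Each row becomes one point of the extra series: plotted at (C, E), with the data label taken from D.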
Another alternative would be to use the data labels to write the actual temperature to the data points:
Column C has the temperature in Celsius.
Since the Arrhenius plot is only defined as ln(k) against 1/T, this would be a good option I think.
Most of us may be familiar with normal distribution curves. For those who are new to front-loaded and back-loaded normal distributions, I will give some background before stating my problem.
Front-Loaded Distribution: As demonstrated below, it has a rapid start. For example, when more resources are assumed to be consumed early in a project, cost/hours are distributed aggressively at the start of the project.
Back-Loaded Distribution: In contrast to the front-loaded distribution, it starts out with a lower slope and becomes increasingly steep towards the end of the project. For example, when most resources are assumed to be consumed late in the project.
In the above charts, the green line is the S-curve, which represents the cumulative distribution (utilization of resources over the proposed time), and the blue columns represent the isolated distribution of resources (cost/hours) in each period.
For reference, I am providing the bell curve / standard normal distribution (where mean = median) chart (below) and the associated formula to begin with.
Problem Statement: I was able to generate the normal distribution curve (see below, with formulae); however, I am unable to find a solution for front-loaded or back-loaded curves.
How can I skew a normal distribution to the right (front-loaded / positively skewed, which means the mean is greater than the median) or to the left (back-loaded / negatively skewed, which means the mean is less than the median)?
Formula explained:
Cell B8 denotes an arbitrarily chosen standard deviation. It affects the kurtosis of normal distribution. In the above screenshot, I am choosing the range of the normal distribution to be from -3SD to 3SD.
Cells B9 to B18 hold evenly spaced Z-scores, using the formula:
=B8-((2*$B$8)/Period)
Cells C9 to C18 hold the normal distribution based on the Z-score and the Amount, using the formula:
=(NORMSDIST(B9)-NORMSDIST(B8))*Amount/(1-2*NORMSDIST($B$8))
Update: Following one of the links in the comments, the closest I got is the situation below. The issue is highlighted with the yellow pattern: because of the volatile RAND() function, the charts are not as smooth as they should be. Since my formula above does not create a zigzag pattern, I am sure we can have a skewed normal distribution that is smooth too!
Note:
I am using Excel 2016, so any newly introduced function that solves my problem is welcome. I am also happy to use UDFs.
The numbers of the front-loaded and back-loaded distributions are notional and could vary. I am only interested in the shape of the resulting chart.
Kindly help!
You can generate the curve using the methods below and use the numbers they produce for your requirement.
With formulae
The curve
Notes:
If you want to change the bins, you have to drag the cells down or up in order to complete the series.
If you want to change the total cost, you can change the multiplier.
If you want to change the tilt of the curve, you can change the divider in column C, which is currently set to 2. If it is -2, the tilt changes direction. You can experiment with different numbers; the direction depends on whether the divider is less than or greater than zero.
For copy-paste:
=A2+180/($G$3-1)
=RADIANS(A2)
=$G$4*SIN(B2 + SIN(B2)/2)
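The three formulas above can be mirrored in Python to see how the divider tilts the curve. This is a sketch under assumptions: the function name `skewed_sine_curve` is made up, the angle column is assumed to start at 0 degrees in A2, and `bins` and `multiplier` stand in for cells G3 and G4.

```python
import math


def skewed_sine_curve(bins, multiplier, divider=2.0):
    """Evaluate multiplier * SIN(B + SIN(B) / divider) over 0..180 degrees."""
    if bins < 2:
        raise ValueError("need at least two bins to span 0..180 degrees")
    step = 180.0 / (bins - 1)               # column A: =A2+180/($G$3-1)
    curve = []
    for i in range(bins):
        angle = math.radians(i * step)      # column B: =RADIANS(A2)
        # column C: =$G$4*SIN(B2 + SIN(B2)/2), with the divider as a parameter
        curve.append(multiplier * math.sin(angle + math.sin(angle) / divider))
    return curve


front_loaded = skewed_sine_curve(bins=11, multiplier=100.0, divider=2.0)
back_loaded = skewed_sine_curve(bins=11, multiplier=100.0, divider=-2.0)
print([round(v, 1) for v in front_loaded])
print([round(v, 1) for v in back_loaded])
```

With a positive divider the peak falls before the midpoint (front-loaded); flipping the sign mirrors the curve so the peak falls after it (back-loaded).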
I used the actual mathematical formulas to arrive at the result, which looks to me like what you wanted to achieve. The orange cells in the 'Skewed' section are the ones that can be changed to vary the degree and direction of the skew. Some pictures for demonstration are below, followed by the formulas used.
Formulas in row 5, column
B:=(A5*$A$2)+0 (0 is the mean, you can change as you like)
C:=(1/($A$2* SQRT(2*PI())))*EXP(-(B5^2)/2)
D:=0.5*(1+ERF(B5/SQRT(2)))
E:=$A$1*C5
F: =(A5*$A$2*(1+$F$2*SIN((F4*PI())/(2*$F$4))))+0 (0 is the mean, you can change as you like)
G:=(1/($A$2* SQRT(2*PI())))*EXP(-((F5+$G$2)^2)/2)
H:=0.5*(1+ERF((B5+$G$2)/SQRT(2)))
I:=$A$1*G5
If you want to make sure the bins always have a value in them, you can use the following approach, which uses normal distributions and simply changes the mean and the standard deviation to get a curve that you want.
Changing the mean moves the peak to the left or right. Changing the standard deviation makes the quantities more uniform or more variable. I've used 0-1000 as my default range in the example below, but it should be easy to modify the formula to bring any value you want. NOTE in order to fulfill your requirement that all bins must be non-zero, you need to manually adjust the numbers till you get a curve that suits.
Yellow cells are for data entry, green cells are a count (so if you add bins, they would need to be numbered according to the sequence).
Formula in cell B7 (copied down to cell B16):
=NORMDIST($A7*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)-NORMDIST($A6*1000/MAX($A$6:$A$17),$B$3,$B$4,TRUE)
Formula in cell C7 (copied down to cell C16):
=IF(A7=MAX($A$6:$A$17),$C$5-SUM(C$6:C6),ROUND(B7/SUM($B$7:$B$17)*$C$5,0))
Adding new bins is simple enough and is still based on a 0-1000 range, so you don't need to change any numbers other than adding rows and copying down the formulae:
The above example is also showing how a narrow standard deviation and a high mean combine to make the starting bins have very little quantity. But there is still a value (as long as count is big enough).
You may want to pre-define the different skewness selections if this is going to be used by other people (make column B dependent on a lookup, for example) but hopefully this is extensible enough for your needs.
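The two worksheet formulas above can be approximated in Python with the standard library's `statistics.NormalDist`. This is a sketch under assumptions: the function name `allocate` and its parameters are hypothetical, the bins are taken to be equal slices of the 0-1000 range, and Python's `round` treats exact halves differently from Excel's ROUND, so individual bins can differ by one unit.

```python
from statistics import NormalDist


def allocate(total, bins, mean, sd, scale=1000.0):
    """Split `total` over `bins` following a normal curve on 0..scale."""
    dist = NormalDist(mean, sd)
    edges = [i * scale / bins for i in range(bins + 1)]
    # Column B: probability mass between consecutive bin edges.
    weights = [dist.cdf(edges[i + 1]) - dist.cdf(edges[i]) for i in range(bins)]
    weight_sum = sum(weights)
    # Column C: rounded share of the total for every bin but the last ...
    amounts = [round(w / weight_sum * total) for w in weights[:-1]]
    # ... and the last bin takes whatever remains, so the sum is exact.
    amounts.append(total - sum(amounts))
    return amounts


# A low mean pulls the peak towards the early bins (front-loaded).
amounts = allocate(total=1000, bins=10, mean=250, sd=150)
print(amounts, sum(amounts))
```

Moving `mean` towards the upper end of the range gives the back-loaded shape, and a larger `sd` flattens the curve.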
If you are open to a Python answer, I can give you code that uses Python (SciPy and pandas) to generate random observations from a skewed normal distribution and then bin (bucket) them for you. The following Python script captures the use case and can also be registered as a COM server, so it is creatable from VBA.
import numpy as np
import pandas as pd
from scipy.stats import skewnorm


class PythonSkewedNormal(object):
    _reg_clsid_ = "{1583241D-27EA-4A01-ACFB-4905810F6B98}"
    _reg_progid_ = 'SciPyInVBA.PythonSkewedNormal'
    _public_methods_ = ['GeneratePopulation', 'BinnedSkewedNormal']

    def GeneratePopulation(self, a, sz):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        return skewnorm.rvs(a, size=sz).tolist()

    def BinnedSkewedNormal(self, a, sz, bins):
        # https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.random.seed.html
        np.random.seed(10)
        # https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.stats.skewnorm.html
        pop = skewnorm.rvs(a, size=sz)
        bins2 = np.array(bins)
        bins3 = pd.cut(pop, bins2)
        table = pd.value_counts(bins3, sort=False)
        table.index = table.index.astype(str)
        return table.reset_index().values.tolist()


if __name__ == '__main__':
    print("Registering COM server...")
    import win32com.server.register
    win32com.server.register.UseCommandLine(PythonSkewedNormal)
And the VBA client code
Option Explicit

Sub TestPythonSkewedNormal()
    Dim skewedNormal As Object
    Set skewedNormal = CreateObject("SciPyInVBA.PythonSkewedNormal")

    Dim lSize As Long
    lSize = 100

    Dim shtData As Excel.Worksheet
    Set shtData = ThisWorkbook.Worksheets.Item("Sheet3") '<--- change sheet to your circumstances
    shtData.Cells.Clear

    Dim vBins
    vBins = Array(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)
    'Stop

    Dim vBinnedData
    vBinnedData = skewedNormal.BinnedSkewedNormal(-5, lSize, vBins)

    Dim rngData As Excel.Range
    Set rngData = shtData.Cells(2, 1).Resize(UBound(vBins) - LBound(vBins), 2)
    rngData.Value2 = vBinnedData
    'Stop
End Sub
Sample output
(-5, -4] 0
(-4, -3] 0
(-3, -2] 4
(-2, -1] 32
(-1, 0] 57
(0, 1] 7
(1, 2] 0
(2, 3] 0
(3, 4] 0
(4, 5] 0
Original code deposited on my blog
Based on usmanhaq's answer, I made a VBA macro for distribution curve simulation, corrected for 100% scaling of the front-loading and back-loading curves.
Click here to go to the GitHub library.
I have some data
20,10.00
21,10.00
22,10.00
23,09.00
00,10.00
01,10.00
...
I want to graph the first value on the x axis and the second value on the y axis. I want the y axis to be set automatically, but I want the x axis to follow the order of my data, e.g. 20, 21, ..., 0, 1, ... instead of 0, 1, ..., 23.
I thought I would do this with xticlabels, stating plot "filename" using xticlabels(1):2 or, as inspired by this, 1:2:xticlabels(1). Neither has the desired effect. What am I to do?
Yes, you must use xticlabels to add individual labels, but you must still specify some value for the x-axis. If you know that the rows all have the same spacing, use the zeroth column as the x-value:
plot "filename" using 0:2:xticlabels(1)
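To illustrate what the zeroth column does, the following Python sketch pairs each row with its running index and keeps the first field only as a label. The function name `index_positions` is hypothetical; gnuplot does the equivalent internally when you write `using 0:2:xticlabels(1)`.

```python
def index_positions(lines):
    """Turn 'label,value' lines into (x, label, y) with x = row index."""
    points = []
    for row_index, line in enumerate(lines):
        label, value = line.strip().split(",")
        # x is the position of the row in the file, not the label's value,
        # so the axis follows the file order (20, 21, ..., 00, 01).
        points.append((row_index, label, float(value)))
    return points


sample = ["20,10.00", "21,10.00", "22,10.00", "23,09.00", "00,10.00", "01,10.00"]
points = index_positions(sample)
for x, label, y in points:
    print(x, label, y)
```

Because the x position is the row index, the labels appear in file order even when their numeric values wrap around or are unordered.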
For my specific case, setting xrange [23:0] will suffice. However, this is not dynamic, as it does not work for unordered values, so I am still curious how the problem would otherwise be solved.
I want to create an automated scatter plot. This is the first example table: based on the step size, I end up measuring A, B, C, D for a specific frequency. In this scatter plot, which I created manually, you can see that I want to plot C vs. A for a particular frequency.
But I need to do this automatically, as the number of rows can change with the step size. Here, since the step size decreased, the number of samples increased, and the scatter plot now needs to update the number of A and C values it plots.
Is there a formula I can use without using any macros?
The relation between the step size and frequency is: number of samples for a single frequency = 360 / step size. So for a step size of 60 you will have six entries of frequency 100 and six of 200.
You can use formulas to define chart ranges if you hide the formulas in named ranges. Combine that with the fact that #N/A values are not plotted and you can get this to work without VBA.
For your example graph you could define two named ranges as follows:
Name: A_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$A$3,0,0,360/Sheet1!$B$1,1),NA())
and
Name: C_100
Refers To: =IF(Sheet1!$E$3:$E$100=100,OFFSET(Sheet1!$C$3,0,0,360/Sheet1!$B$1,1),NA())
Then set the X and Y axis of the chart to SheetName!A_100 and SheetName!C_100
The IF statement filters out all the points that are not at frequency 100. If you have a formula for selecting the frequency, replace "Sheet1!$E$3:$E$100=100" with that.
The OFFSET function takes the first cell in the column and expands the number of rows according to your 360/step size formula.
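The filtering idea behind the two named ranges can be sketched in Python: keep a value where the frequency matches and substitute a not-a-number placeholder elsewhere, much as NA() makes Excel skip a point. The function name `masked_series` and the sample values are hypothetical, and the sketch does not model the OFFSET sizing.

```python
import math


def masked_series(a_values, c_values, frequencies, target_frequency):
    """Return (xs, ys) with NaN wherever the row is not at the target frequency."""
    xs, ys = [], []
    for a, c, frequency in zip(a_values, c_values, frequencies):
        keep = frequency == target_frequency
        # NaN plays the role of #N/A: the point stays in the series
        # but would not be drawn by a plotting tool that skips NaN.
        xs.append(a if keep else math.nan)
        ys.append(c if keep else math.nan)
    return xs, ys


a = [0, 60, 120, 0, 60]
c = [1.0, 2.0, 3.0, 4.0, 5.0]
freq = [100, 100, 100, 200, 200]
xs, ys = masked_series(a, c, freq, target_frequency=100)
print(xs)
print(ys)
```

The series keeps its full length, so the chart range never has to be resized; only the rows at the chosen frequency carry plottable values.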
I have a data series with 5 decimals, such as 0,58861. When I plot it with an XYChart as a LineChart, I only see 3 digits on the axis, e.g. 0,58 or .584.
I have also tried to change the font size
yAxis.setTickLabelFont(Font.font("Arial", 5));
with no result; I always have 3 digits plotted.
Below are two pictures to show this behavior.
How do I set more decimals on the Y axis?
So what is wrong with the numbers plotted on the y-axis? You can think of 0.58 as 0.58000 etc. These are just major ticks, not the points (numbers) of your data series. If you want to see more ticks, and hence more precise values, change the tickUnit parameter:
NumberAxis(double lowerBound, double upperBound, double tickUnit)
But if your data range (upperBound - lowerBound) is large relative to tickUnit, the graph's readability will degrade. To overcome this, adjust the lowerBound and upperBound values to show only part of your data series, as if zoomed in.
To change the tick label font and size, try:
yAxis.setTickLabelFont(Font.font("Arial", FontWeight.MEDIUM, 18));
Changing the font size will not help. You have to change the tick label formatter using the setTickLabelFormatter function.