In Stata, how can I combine box plots of different widths? - graphics

I'm trying to combine several box plots across categories of different size.
Here is an example illustrating the problem:
sysuse auto
graph box mpg, by(rep78, rows(1)) name(g1, replace )
graph box mpg, by(foreign, rows(1)) name(g2, replace )
graph combine g1 g2 , ycom r(2)
This gives me the following results.
Everything works according to the manual so far, but I have two problems with this output.
Firstly - aesthetics. Personally, I think plots with the same width across rows would look better.
Secondly, and more importantly - on more complex graphs the font size for categories, axes, etc. is also sized proportionally. So even if I specify, let's say - medium size of axis label on all graphs - some of them will be slightly bigger or smaller.
I was wondering if there is an option to programmatically force the box plots in the second row to have the same width as those in the first.

Is this what you want? It is based on a trick, but the trick is quite general.
sysuse auto, clear
expand 2
gen what = cond(_n <= 74, rep78, 6 + foreign)
label def what 6 Domestic 7 Foreign
label val what what
graph box mpg, by(what, note("Repair record and Foreign") row(2) holes(8 9 10))
The logic is that:
1. The two categorical variables are combined lengthwise. That ensures that each box plot will be the same size.
2. By specifying holes, we persuade graph box to put graphs on two rows.
I guess that your label size problem will disappear once 1 is solved.
For even more flexibility, you may need to abandon graph box and use twoway instead. A detailed discussion was given by me in the Stata Journal in 2009: you can go straight to http://www.stata-journal.com/sjpdf.html?articlenum=gr0039

Related

Is there a way in Microsoft Excel to give specific bins different bin widths when making a histogram plot?

I am trying to give specific bins on my histogram plot different bin widths. Is there a way to say I want bin 1 to have a width 1-10 and bin 20 to have a width of 300-1000?
Excel's built-in histogram tool only allows equal bin widths. Instead, we must create a "variable width column chart", as explained by Jon Peltier. This can be a tedious and error-prone process if you've got a lot of bins.
Video tutorial for Excel 2016. The main steps are as follows:
Create a cascade table:
should turn into:
Note: the dummy + Label columns aren't required, but they help with labeling
generates a stacked area chart (a type of Area chart)
Then change the Primary Axis's category to Time-Scale to straighten the areas into bars. As explained by Jon Peltier, this is because:
This is somewhat misleading, as Excel time-scale axes only consider
dates and ignore times.
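As a rough illustration of the cascade idea, here is a small Python sketch that turns bin edges and counts into the corner points of each bar, which is the kind of table an area chart on a date-scale axis can draw as variable-width columns. The function name cascade_points and the sample bins are hypothetical, and the exact column layout of the table used in the answer is not shown above, so treat this as one plausible way to build it rather than the method from the tutorial.

```python
def cascade_points(edges, counts):
    """Return (x, y) corner points that outline one rectangle per bin.

    edges  -- ascending bin boundaries, one more entry than counts
    counts -- bar height for each bin
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than counts")
    points = []
    for left, right, height in zip(edges, edges[1:], counts):
        # Rise at the left edge, run across the bin, drop at the right edge.
        points.extend([(left, 0), (left, height), (right, height), (right, 0)])
    return points


# Hypothetical bins: 1-10, 10-300 and 300-1000 (values are made up).
for x, y in cascade_points([1, 10, 300, 1000], [5, 12, 3]):
    print(x, y)
```

Each bin contributes four rows, so the table grows quickly. That is part of why the manual version of this process is tedious with many bins.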

Stacked bar graph showing only every other column

I am trying to create a 100% stacked bar graph in Excel, however the resulting graph is only showing information for every other x-value. The x-axis has 134 values, so I am not sure if the size is the problem or if it is something else.
The X axis labels of a category axis (like in a column or bar chart) are dynamically adjusted to the available space. When there are many columns/bars, not every column/bar will have a label. You can test that by making the chart wider/higher, to see how the X axis labels appear and disappear.
Having over 130 categorical items on an X axis is not good data visualisation. It's a rather horrible experience for the poor reader. Excel's behaviour is actually useful here. By not displaying all labels, it makes such a chart fairly unusable, and you may be inspired to think of better ways to visualise the data.
Maybe several smaller charts for segments of the data would be an option.
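If you go the route of several smaller charts, the split itself is easy to script before charting. This is a minimal Python sketch, assuming the category labels are available as a list. The function name split_into_segments and the limit of 30 labels per chart are arbitrary choices for illustration.

```python
def split_into_segments(categories, max_per_chart):
    """Split a list of category labels into consecutive chunks."""
    if max_per_chart < 1:
        raise ValueError("max_per_chart must be at least 1")
    return [categories[i:i + max_per_chart]
            for i in range(0, len(categories), max_per_chart)]


# 134 hypothetical labels, matching the size mentioned in the question.
labels = [f"item {n}" for n in range(1, 135)]
segments = split_into_segments(labels, 30)
print([len(s) for s in segments])  # [30, 30, 30, 30, 14]
```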

How to draw line X=1?

I know how to draw a line with scatter plot options where X is the independent and Y the dependent variable.
In the scatter plot of that data I need to add another line: X=2. I have the following data:
But how to draw a line X=1 ?
Maybe you want something like this:
I hear that charting differs between Excel versions more than many other aspects of the program, and that my version (Excel 2007) is perhaps one of the least ‘friendly’, which may be part of the reason for “not very easy”, but the principle is as @Bill the Lizard has described. In view of some weird behaviour with (my?) Excel 2007, however, I recommend being careful about the sequence in which the lines are drawn.
First I suggest getting your chart right for all aspects but the green line. Then add another series with X values of 1 and 1 and Y values of 10 and -2 (or whatever the limits are of your chosen y-axis as displayed). Select and copy that array (four cells), select your chart, then choose Paste Special… and add the cells as New Series, Columns, Categories (X Values) in First Column, OK.
This should add a vertical line of the same chart type as the existing one (i.e. XY (Scatter), Scatter with Straight Lines and Markers). The colour can be changed by selecting that series (click on it and choose Format Data Series…, Line Color, etc.), and presumably you would want the markers removed. It was these that at first refused to disappear to order for me, but persistence paid off. Click on either of the data points and, under Marker Options, choose None for Marker Type. If necessary, repeat for the other data point, and keep repeating if required!
Also, I selected what was showing as Series3 (text) in the legend and deleted that.
Forgot to mention that for anything to do with Excel charts Jon Peltier is the ultimate authority (eg) and that an alternative approach is to use an error bar and a secondary vertical axis.
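To make the helper series concrete: a vertical line at X=1 is just two points that share the same X value and span the displayed y-axis. The Python sketch below builds that four-cell array. The function name vertical_line_series is made up, and the limits 10 and -2 are the example values from the answer above.

```python
def vertical_line_series(x, y_min, y_max):
    """Return the two (x, y) points that draw a vertical line at x."""
    if y_min >= y_max:
        raise ValueError("y_min must be below y_max")
    return [(x, y_max), (x, y_min)]


# Tab-separated rows, ready to paste into two worksheet columns.
for x, y in vertical_line_series(1, -2, 10):
    print(f"{x}\t{y}")
```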

Excel Chart doesn't keep format

I have a table (came from a pivot table) where I have formatted the column 4 cells to show 1 billion as 1. But when I select the table and insert a chart, I am getting my units in millions. So the 14.8 billion number for Mexico is showing up as 14,800 on the chart. Why might this be happening and how can I fix this? This is also making all my other bars negligibly small. Note that the first three columns are not in billions and are totally different things. Some are percentages, some are other small numbers.
Table:
Chart:
You need a secondary horizontal axis and some formatting on the Axes.
In Excel 2013
First change the Chart Type to Combo and select Clustered Bar for both sets of data, then check Secondary Axis for the Percentage Series.
Then set up the axis limits so they match, e.g.
Percentage: min -.5 max 2
Billions: min -5e9 max 20e9
Then set the percentage format on the source data to a custom Number format of "";(0)%;0%
Then set the Billions format as 0,,,;"";0
You will get something like this:
EDIT
Now that we have the general principles, we can apply them to your specific data.
I will also switch to Excel 2010 to show the different menus.
The data selection looks like this
Select the non-Billion series (plural!) and check the secondary axis
If the larger data is always positive then you can use custom formatting to clean up the axis
Align the primary and secondary axes so that the grid lines match on both
The end result is clean and readable.
Mixing percentages and numbers for the smaller numbers is not handled by this but I would suggest that that would be confusing anyway?
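The example limits given earlier (percentage -.5 to 2, billions -5e9 to 20e9) line up because both axes have the same ratio of minimum to maximum, -0.25, so zero and the gridlines fall in the same places. A small Python sketch of that arithmetic, with a hypothetical function name, in case you need to work out matching limits for other data:

```python
def matching_secondary_min(primary_min, primary_max, secondary_max):
    """Secondary-axis minimum that puts zero where the primary axis has it.

    Both axes end up with the same min/max ratio, which is what keeps
    the zero line and evenly spaced gridlines aligned.
    """
    if primary_max == 0:
        raise ValueError("primary_max must be non-zero")
    return secondary_max * primary_min / primary_max


# Limits from the example above: billions on one axis, percentages on the other.
print(matching_secondary_min(-5e9, 20e9, 2))  # -0.5
```

This only computes the number. The limits still have to be typed into Format Axis by hand on each axis.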
The simplest way to fix this might be to plot cells containing the billions values divided by 10^9 rather than to plot the billions themselves, though via a secondary axis may be possible.
Using Excel 2007. For the purple bars, the example on the left uses ColumnE values, on the right ColumnF values. E1 contains =F1/10^9 and F1 contains =14800000000:
It appears that there are 3 questions here: 1) "Why might this be happening", 2) "how can I fix this", and 3) something like "how can I plot data which lie on two widely differing ranges, and make them all reasonably visible anyway", even if there was no explicit question on this.
There are several ways to solve issue #2 about the units (e.g., billions) and numbers (e.g., 14.8 vs. 14,800.0) shown in the axis, each one with its own pros and cons:
Use Format Axis -> Axis Options -> Display units.
This might be the answer to your issue #1 as well: you might have the following selection: Display units -> Millions, with Show display units... unchecked. Otherwise, I wouldn't know why your chart shows what it shows.
Use faked tick marks, as indicated in the (excellent) site of Jon Peltier
http://peltiertech.com/Excel/Charts/ArbitraryAxis.html
It gives detailed instructions on how to create tick marks on an axis with arbitrary labels (which may be text, numbers, etc.), which is more generic than what the OP wants here. In this particular case, the labels will be the desired numbers.
Create new cells containing data that would be plotted exactly the way you want.
As for your issue #3, I guess the only option is to have a Secondary Axis (see the answer by pnuts).
Thus, to come up with the best final chart, you might use a combination of one of the options I gave here and a secondary axis.
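To see why a Display units setting would explain the 14,800 in the question, here is the arithmetic as a short Python sketch. The dictionary lists only a few of the unit choices, and the names are mine, not Excel's API. The point is just that the axis label is the cell value divided by the chosen unit.

```python
# Subset of the display-unit choices; the divisors are the assumption here.
DISPLAY_UNITS = {
    "None": 1,
    "Thousands": 1e3,
    "Millions": 1e6,
    "Billions": 1e9,
}


def axis_label_value(value, display_units):
    """Value an axis would show for a cell under the given display units."""
    return value / DISPLAY_UNITS[display_units]


mexico = 14.8e9  # the 14.8 billion figure from the question
print(axis_label_value(mexico, "Millions"))  # 14800.0
print(axis_label_value(mexico, "Billions"))  # 14.8
```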

Partially missing gridlines on log-scale charts in Excel 2007

I'm using Excel 2007 to create a log-scale chart of numbers (specifically the Zimbabwean dollar exchange rate) over time. I'm using an x-y scatterplot and noticing one odd quirk.
The range of y values (numbers) spans a factor of about 10^30. On every chart I make using this data, half the gridlines are missing. Specifically, only the gridlines corresponding to the largest values show up. In fact, regardless of the total range only the top factor of 10^13 or so have gridlines. This is not dependent on the log base.
Am I doing something wrong? Is this a known bug? I can't find any references to this issue on google or microsoft's bug reports.
A silly workaround as well, but if you are going to be presenting your graph in PowerPoint, you can make the background color of the graph "no fill" and then paste it into PowerPoint (I paste it as a PDF). You can then draw grid lines and match them up with the ticks on the y-axis. Arrange your graph with "bring to front" when you are finished drawing so that the lines won't appear in front of your data. You can group it all to make sure the lines don't shift while making your presentation and so that they resize properly if you resize your graph.
I'm having the same problem, it's definitely a bug.
Try a sequence 1, 10, 100, 1e+12, 1e+30 vs 0..4 and plot it as an x,y scatter: the grid is clearly messed up even on a linear scale, and on a log scale you get the behaviour you described.
My workaround was to transform the values and depict them scaled down (by a factor of one million). That way the data the graph is handling is never above 10e9 (the value at which I started to hit issues).
So, my suggestion is: graph a Log version of the data (and clearly make a legend for it)
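A sketch of that transformation in Python, assuming the values are strictly positive: take log10 of each value and plot the result on an ordinary linear axis, so Excel never has to draw log gridlines at all. The function name log10_series is hypothetical, and the sample data is the test sequence mentioned above.

```python
import math


def log10_series(values):
    """Return log10 of each value, for plotting on a linear axis."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 is only defined for positive values")
    return [math.log10(v) for v in values]


# The test sequence from this answer; a value of 12 on the new axis means 1e12.
for original, logged in zip([1, 10, 100, 1e12, 1e30],
                            log10_series([1, 10, 100, 1e12, 1e30])):
    print(original, round(logged, 6))
```

The axis title or legend then needs to say that the plotted quantity is log10 of the original, as suggested above.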
I was able to replicate your problem and come up with a pseudo-workaround.
The formatting goes a bit funny, but all the lines show up if you right-click on the axis, select Format Axis. Under the Axis Options, there is a Horizontal Axis Crosses setting. Changing it from Automatic to Maximum Axis Value causes all the gridlines to appear.
Ran into same thing: Will not show log grid lines for y-axis ranging below 1e-7. Have need for dynamic range of 1e5 down to 1e-15. Tagging auto or max will show grid, but puts axis labels in non-useful place for display.
My workaround: used Open Office to get what I needed. Could not find useful solution in Excel 2010.
