Power BI reduce bubble size on maps

I have a list of cities that I'm plotting on a map in Power BI. The client asked if there is anything we can do to reduce the bubble sizes, but I already have them set to 1%. Even cities with only 1 or 2 records show up with a fairly large bubble. In the image below you can see that a value of 2 is sized similarly to values as high as 2000. Any ideas on how to deal with this?

@aGuy It looks like there are already significant variations in the sizes of bubbles on the map (e.g., larger bubbles over Los Angeles and the southern tip of FL).
If the bubble sizes are such that one can't see a difference between 2 and 2000 calls, maybe adding a calculated column that describes call ranges (e.g., low (0-2,000), medium (2,001-10,000), and high (> 10,000)) and using the new calculated column as a legend on the visual will help the client differentiate between these groups?
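A minimal sketch of the binning such a calculated column would perform, written in Python purely to illustrate the boundaries (in Power BI itself this would be a DAX calculated column; the band edges are just the examples above):

# Illustrative only: band edges and labels mirror the example ranges above.
def call_band(calls):
    """Bucket a call count into a legend-friendly range."""
    if calls <= 2000:
        return "low (0-2,000)"
    elif calls <= 10000:
        return "medium (2,001-10,000)"
    return "high (> 10,000)"

for n in (2, 2000, 14800):
    print(n, "->", call_band(n))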

Related

Excel, Determine where data takes a dive

I'm trying to:
- determine where, in a set of measurement data, the data takes a dive...
- ... so I can plot a vertical line and
- ... plot a horizontal line in the graph.
I have no problem doing the 2nd and 3rd bullet points above on my own, so that's taken care of.
The problem I need help with is the first bullet point - determining WHERE the data takes a dive - WHERE the data crosses a threshold that basically says, "Whatever it is you're measuring is no longer performing as expected."
Here's what I'm doing:
I take measurements with a measuring device that logs them to its internal memory; when the test session is complete, I download that measurement data to my computer as a CSV.
I pull that CSV into an XLS and plot the data on a graph (see attached image).
Here's what I want to do:
If you look at the attached image I would like to find the value where the data DEFINITELY crosses BELOW the horizontal line so I can say, "Here is where the device being tested 'gave up the ghost' and was no longer able to perform as desired."
What the data roughly looks like:
Each measurement set will have the rough look and feel of the attached image but slightly different each time. (because each object I am testing will have roughly the same performance characteristics but they all have their own manufacturing defects and variations.)
The data set for the attached image contains 7,000 measurements.
I never really know where the horizontal line will be.
Examples of the data sets I have gotten in the past several tests look like this:
(394 to 0)
(390000 to 0)
(3.88 to 0)
(375000 to 0)
(39.55 to 0)
(59200 to 0)
and each data set will have about 1,000 to 7,000 measurements.
Here's how I was trying to solve this issue:
I was using SLOPE() and trying to latch onto where the slope of the line took a dive / started to become steeply negative (a near-vertical drop), the idea being that once the slope gets sufficiently large and negative the data MUST be taking a dive. That didn't really work.
I was also looking at using STDEV.P() in Excel and feeding it the entire data set. Then I looked at doing the same thing but feeding it only the first 10, 30, or 60 measurements, but then I thought: we never really know just how many measurements will come through. Then I thought I would use the first 10% of the measurements and feed that to STDEV.P().
Please let me know what you think of this and please let me know of any ideas you may have.
Thanks.
Something like this should work to flag when the decay rate increases.
To find what 'direction' your data is going in, you need the derivative.
Excel doesn't have a derivative formula, but you can set one up pretty easily by using (change in y)/(change in x), as demonstrated here:
http://faculty.educ.ubc.ca/sanderson/lab/CLFbiom/demo/diff.htm
I would then add a formula which counts how many data rows you have (=COUNTA(A:A) or similar),
then use that to get a step of 10% of your data,
then check the value of the derivative in a cell against the cell 10% further down. If it's still negative there (to account for the slight downhill at first), then you'll know the dive is real.
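A minimal sketch of that check, assuming the measurements sit in a plain Python list at a constant sampling interval (the function name, the zero threshold, and the 10% default are illustrative):

def find_dive_start(y, frac=0.10, threshold=0.0):
    """Return the first index where the derivative is negative both here
    and frac of the data further on, i.e. a sustained decline."""
    n = len(y)
    step = max(1, int(n * frac))                     # 10% of the data
    deriv = [y[i + 1] - y[i] for i in range(n - 1)]  # (change in y)/(change in x), dx = 1
    for i in range(len(deriv) - step):
        if deriv[i] < threshold and deriv[i + step] < threshold:
            return i
    return None

# Example: flat plateau at 394, then a straight dive to 0.
data = [394.0] * 500 + [394.0 - 4.0 * k for k in range(1, 99)] + [0.0] * 100
print(find_dive_start(data))   # prints an index near 500, the start of the dive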
The right way to go about this is to model the data with an unknown discontinuity, something like "if time < break_time then (some constant plus noise) else (decaying exponential)". Maximum likelihood estimation for that model might require iteration or other operations which are clumsy in Excel -- maybe you should consider VB or Python or some other programming language. I.e., choose the tool to fit the problem and not the other way around.
See Seber and Wild, "Nonlinear Regression", for an extensive discussion of models with discontinuities.
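A rough sketch of this changepoint idea, assuming Gaussian noise so that maximum likelihood reduces to least squares; the break is found by a brute-force scan, and the names and synthetic data are illustrative:

import numpy as np

def fit_break(y):
    """For each candidate break, fit a constant before it and a decaying
    exponential (via a log-linear fit) after it; keep the break with the
    smallest total squared error."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    best_sse, best_b = np.inf, None
    for b in range(10, n - 10):                  # candidate break points
        head, tail = y[:b], y[b:]
        sse = ((head - head.mean()) ** 2).sum()  # constant-plus-noise part
        t = np.arange(len(tail))
        pos = tail > 0                           # log fit needs positive values
        if pos.sum() >= 2:
            k, loga = np.polyfit(t[pos], np.log(tail[pos]), 1)
            sse += ((tail - np.exp(loga + k * t)) ** 2).sum()
        else:
            sse += (tail ** 2).sum()
        if sse < best_sse:
            best_sse, best_b = sse, b
    return best_b

# Synthetic example: plateau at 394, exponential decay from index 600.
rng = np.random.default_rng(0)
t = np.arange(1000)
y = np.where(t < 600, 394.0, 394.0 * np.exp(-0.05 * (t - 600)))
print(fit_break(y + rng.normal(0, 2, size=1000)))   # should be close to 600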
If your data can be generally characterized as having:
(A) a more or less flat plateau region, followed by
(B) a downward trending region
then a basic strategy could be to start at the end of the data and march towards the beginning one point at a time, checking to see that the values are increasing. Once they stop increasing, you've found the break point.
The strategy assumes (unwisely?) that the downward trending region is smooth/noiseless. To make the solution more robust to noise, you could compare values that are 5 apart, or 10 apart, or whatever interval works to filter out the noise. Or you could use a moving average.
This strategy could potentially be made more efficient by starting the search somewhere in the middle of the data, but still in the downward trending portion. If you know (based on experience) that any value that is (say) 0.5X the maximum is in the downward trending portion, you could start the search there.
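A minimal sketch of this strategy, comparing values k apart for some robustness to noise (the names and example data are illustrative):

def find_plateau_end(y, k=10):
    """March from the end toward the beginning, comparing values k apart.
    While y[i] > y[i + k] we are still in the downward-trending region;
    once the values stop increasing (leftwards), we have hit the plateau."""
    for i in range(len(y) - 1 - k, -1, -1):
        if y[i] <= y[i + k]:          # no longer increasing toward the left
            return i + k              # roughly the end of the plateau
    return 0

# Example: a flat plateau at 100 followed by a straight dive to 0.
data = [100.0] * 50 + [100.0 - 5.0 * j for j in range(1, 21)]
print(find_plateau_end(data))   # prints an index near 50, the break point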
Hope that helps.
It appears as though you want to detect when the slope changes from something near zero to something negative. One way to detect this is to calculate the 2nd derivative of the values (calculate the slope of the slope). The 2nd derivative should be near zero in the flat portion of the data AND in the downward trending portion of the data. It should go negative at the break point. So finding the minimum (most negative) value of the 2nd derivative should locate the break point.
To implement this, you probably will need to filter noise. So calculate the first derivative (slope) over some suitable window of data:
=SLOPE(moving window of say 25 raw values)
Then calculate the second derivative (slope of slope):
=SLOPE(moving window of say 25 slope values)
Then look for the minimum.
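A sketch of this in Python, using a least-squares slope over a moving window as the analogue of Excel's SLOPE over 25 values (the window size, the names, and the rough index correction are illustrative):

import numpy as np

def moving_slope(y, window=25):
    """Least-squares slope of each length-`window` run of values."""
    x = np.arange(window)
    return np.array([np.polyfit(x, y[i:i + window], 1)[0]
                     for i in range(len(y) - window + 1)])

def find_break(y, window=25):
    d1 = moving_slope(np.asarray(y, dtype=float), window)  # first derivative
    d2 = moving_slope(d1, window)                          # second derivative
    # The most negative second derivative marks the break; add `window`
    # as a rough correction for the window alignment.
    return int(np.argmin(d2)) + window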
Hope that helps.

How to produce the data points for a circle in Excel using ROW INDIRECT

The page linked to here has been a great help to me. The method of using the named function (=(ROW(INDIRECT("1:361"))-1)*PI()/180) to produce the circle data points is very slick compared to my original method, which was to calculate them individually, writing them into rows.
My data set includes some 50k rows of data, each one defining a circle. The set is divided into 50 groups and I need to plot one circle from each group as selected via a scroll bar controlling a LOOKUP routine.
Please can someone suggest how I might modify the function (=(ROW(INDIRECT("1:361"))-1)*PI()/180) to reduce the number of data points it produces? I want to reduce the computing load and also, it's not practical to display & format data markers with such high data density. My existing circles are produced with just 18 coordinate pairs and are satisfactorily rounded.
Thanks in advance. Steve.
This would give you 19 data points, with 0 and 360 degrees as the start/end points and another point every 20 degrees:
=(ROW(INDIRECT("1:19"))-1)*PI()/9
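More generally (an illustrative extension, not part of the original answer): for n segments the same pattern is =(ROW(INDIRECT("1:"&(n+1)))-1)*2*PI()/n, with n hardcoded or held in a defined name. The formula above is the n = 18 case, since 2*PI()/18 = PI()/9.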

Excel Chart doesn't keep format

I have a table (which came from a pivot table) where I have formatted the column 4 cells to show 1 billion as 1. But when I select the table and insert a chart, I am getting my units in millions, so the 14.8 billion number for Mexico shows up as 14,800 on the chart. Why might this be happening, and how can I fix it? This is also making all my other bars negligibly small. Note that the first three columns are not in billions and are totally different things: some are percentages, some are other small numbers.
Table:
Chart:
You need a secondary horizontal axis and some formatting on the Axes.
In Excel 2013
First change the Chart Type to Combo and select Clustered Bar for both sets of data, then check Secondary Axis for the Percentage series.
Then set up the axis limits so they match, e.g.
Percentage: min -.5 max 2
Billions: min -5e9 max 20e9
Then set the percentage format on the source data to a custom number format of "";(0)%;0%
Then set the Billions format as 0,,,;"";0
(The semicolons separate the positive, negative, and zero sections of a custom format, and each trailing comma scales the displayed value down by 1,000, so ,,, shows the value in billions.)
You will get something like this:
EDIT
Now that we have the general principles, we can apply them to your specific data.
I will also switch to Excel 2010 to show the different menus.
The data selection looks like this
Select the non-Billion series (plural!) and check the secondary axis
If the larger data is always positive then you can use custom formatting to clean up the axis
Align the primary and secondary axes so that the grid lines match on both
The end result is clean and readable.
Mixing percentages and numbers for the smaller values is not handled by this, but I would suggest that would be confusing anyway.
The simplest way to fix this might be to plot cells containing the billions values divided by 10^9, rather than plotting the billions themselves, though an approach via a secondary axis may also be possible.
Using Excel 2007. For the purple bars, the example on the left uses ColumnE values, the one on the right ColumnF values. E1 contains =F1/10^9 and F1 contains 14800000000:
It appears that there are 3 questions here: 1) "Why might this be happening?", 2) "How can I fix this?", and 3) something like "How can I plot data which lie on two widely differing ranges and make them all reasonably visible?", even though there was no explicit question on this.
There are several ways to solve issue #2 about the units (e.g., billions) and numbers (e.g., 14.8 vs. 14,800.0) shown in the axis, each one with its own pros and cons:
Use Format Axis -> Axis Options -> Display units.
This might be the answer to your issue #1 as well: you might have the following selection: Display units -> Millions, with Show display units... unchecked. Otherwise, I wouldn't know why your chart shows what it shows.
Use faked tick marks, as described on Jon Peltier's (excellent) site:
http://peltiertech.com/Excel/Charts/ArbitraryAxis.html
It gives detailed instructions on how to create tick marks on an axis with arbitrary labels (which may be text, numbers, etc.), which is more generic than what the OP wants here. In this particular case, the labels will be the desired numbers.
Create new cells containing data that would be plotted exactly the way you want.
As for your issue #3, I guess the only option is to have a Secondary Axis (see the answer by pnuts).
Thus, to come up with the best final chart, you might use a combination of one of the options I gave here and a secondary axis.

Way to reduce geopoints?

Does anyone have any handy algorithms that could be used to reduce the number of geo-points?
I am using a list of 2,000,000 postcodes which come with their own geo-point. I am using them to collect data from an API to be used offline. The program is written in C++.
I have to go through each postcode, calculate a bounding box based on the postcode's location, and then send it to the API, which gives me some data near to that postcode.
However, 2,000,000 is a lot to process, and some of the postcodes are next to each other or close enough to each other that they would share some of the same data.
So far I've come up with two ways I could reduce them, but I am not sure if they would work:
1 - Have the program use a data structure to record which postcode overlaps which, and then run a routine a few times to remove the overlapping ones one by one until we are left with only non-overlapping postcodes.
2 - Start at the top-left geo-point of the UK and slowly increment it by the rough size of a postcode area until we have covered the entire UK.
Is there an easy way to reduce the number of postcodes so that as few of them overlap as possible, whilst still making sure I get data covering as much of the UK as possible? I was thinking there may be a handy algorithm for this that people use elsewhere.
You can use a quadtree, in particular a quadkey. A quadkey plots the points along a space-filling curve, which is similar to sorting the points into a grid. You can then traverse the grid to search deeper in the tree, or search around a centre point. You could also use a database with a spatial index. How well this works depends on how much the data overlap, but with a quadtree you can choose the size of the grid.
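The question mentions C++, but here is a minimal sketch of the grid idea in Python, assuming plain (lat, lon) tuples; the cell size and sample coordinates are illustrative. Each point is snapped to a grid cell of a chosen size and one postcode is kept per cell:

def thin_points(points, cell_deg=0.05):
    """Keep roughly one point per cell_deg x cell_deg grid cell."""
    kept = {}
    for lat, lon in points:
        key = (int(lat // cell_deg), int(lon // cell_deg))  # grid cell id
        kept.setdefault(key, (lat, lon))    # first point wins in each cell
    return list(kept.values())

postcodes = [(51.501, -0.142), (51.502, -0.140), (53.480, -2.242)]
print(thin_points(postcodes))   # the two nearby London points collapse to one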

In Stata, how can I combine box plots of different widths?

I'm trying to combine several box plots across categories of different sizes.
Here is an example illustrating the problem:
sysuse auto
graph box mpg, by(rep78, rows(1)) name(g1, replace )
graph box mpg, by(foreign, rows(1)) name(g2, replace )
graph combine g1 g2 , ycom r(2)
This gives me the following results.
All works according to the manual so far, but I have two problems with this output.
Firstly, aesthetics: personally, I think plots with the same width across rows would look better.
Secondly, and more importantly: on more complex graphs the font size for categories, axes, etc. is also scaled proportionally. So even if I specify, say, a medium size for the axis labels on all graphs, some of them will end up slightly bigger or smaller.
I was wondering if there is an option to programmatically force the second row of box plots to have the same width as the first one.
Is this what you want? It is based on a trick, but the trick is quite general.
sysuse auto, clear
expand 2
gen what = cond(_n <= 74, rep78, 6 + foreign)
label def what 6 Domestic 7 Foreign
label val what what
graph box mpg, by(what, note("Repair record and Foreign") row(2) holes(8 9 10))
The logic is that:
1. The two categorical variables are combined lengthwise. That ensures that each box plot will be the same size.
2. By specifying holes, we persuade graph box to put graphs on two rows.
I guess that your label size problem will disappear once 1 is solved.
For even more flexibility, you may need to abandon graph box and use twoway instead. I gave a detailed discussion in the Stata Journal in 2009; you can go straight to http://www.stata-journal.com/sjpdf.html?articlenum=gr0039
