I am trying to compute the average cell size for the following set of points, as seen in the picture. The picture was generated using gnuplot:
gnuplot> plot "debug.dat" using 1:2
The points are almost aligned on a rectangular grid, but not quite. There seems to be a bias (jitter?) of, say, 10-15% along either X or Y. How would one efficiently compute a proper partition into tiles so that there is virtually only one point per tile? The tile size would be expressed as (tilex, tiley). I say "virtually" because the 10-15% bias may have moved a point into an adjacent tile.
Just for reference, I have manually sorted (hopefully correct) and extracted the first 10 points:
-133920,33480
-132480,33476
-131044,33472
-129602,33467
-128162,33463
-139679,34576
-138239,34572
-136799,34568
-135359,34564
-133925,34562
Just for clarification, a valid tile as per the above description would be (1435,1060), but I am really looking for a quick automated way.
Let's do this for X coordinate only:
1) sort the X coordinates
2) look at the deltas between consecutive X coordinates. These deltas will fall into two categories: either they correspond to spaces between two columns, or to spaces between crosses within the same column. Your goal is to find a threshold that separates the long spaces from the short ones. This can be done by finding the threshold that splits the deltas into two groups whose means are the furthest apart (I think)
3) once you have the threshold, separate the points into columns. A column starts and ends at a delta larger than the threshold you measured previously
4) calculate average position of each detected column
5) take deltas between subsequent columns. Now, the problem is that you may get a stray point that would break your columns. Use a median to get the strays out.
6) You should have a robust estimate of your gridX
Example, using your data, looking at axis X:
-133920 -132480 -131044 -129602 -128162 -139679 -138239 -136799 -135359 -133925
Sorted + deltas:
5 1434 1436 1440 1440 1440 1440 1440 1442
Here you can see that there is a very obvious threshold between the small delta (5) and the large ones (1434 and up). 1434 will define your spacing here.
Split the points into columns:
-139679|-138239|-136799|-135359|-133925 -133920|-132480|-131044|-129602|-128162
1440 1440 1440 1434 5 1440 1436 1442 1440
Almost all points are alone, except the two -133925 -133920.
The average grid line positions are:
-139679 -138239 -136799 -135359 -133922.5 -132480 -131044 -129602 -128162
Sorted deltas:
1436.0 1436.5 1440.0 1440.0 1440.0 1440.0 1442.0 1442.5
Median:
1440
Which is the correct answer for your SMALL data set, IMHO.
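For what it's worth, the steps above could be sketched in Python roughly like this. It is only a sketch under assumptions: the names find_threshold and grid_spacing are made up for illustration, the threshold is placed midway between the two groups of deltas, and the input is assumed to have at least three points spanning at least two columns.

```python
from statistics import mean, median

def find_threshold(deltas):
    """Split the sorted deltas into two groups whose means are furthest apart
    and return a value midway between the two groups."""
    s = sorted(deltas)
    best_gap, best_k = -1.0, 1
    for k in range(1, len(s)):
        gap = mean(s[k:]) - mean(s[:k])
        if gap > best_gap:
            best_gap, best_k = gap, k
    return (s[best_k - 1] + s[best_k]) / 2

def grid_spacing(coords):
    """Estimate the grid pitch along one axis (steps 1 to 6)."""
    xs = sorted(coords)                                   # step 1
    deltas = [b - a for a, b in zip(xs, xs[1:])]          # step 2
    threshold = find_threshold(deltas)
    columns = [[xs[0]]]                                   # step 3
    for prev, cur in zip(xs, xs[1:]):
        if cur - prev > threshold:
            columns.append([cur])      # large delta: a new column starts
        else:
            columns[-1].append(cur)    # small delta: same column
    centers = [mean(col) for col in columns]              # step 4
    # step 5: the median of the column-to-column deltas ignores strays
    return median(b - a for a, b in zip(centers, centers[1:]))

xs = [-133920, -132480, -131044, -129602, -128162,
      -139679, -138239, -136799, -135359, -133925]
print(grid_spacing(xs))
```

On the ten sample X coordinates this returns 1440, the same value as the hand calculation. The Y coordinates can be passed through the same function to get the tile height.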
Related
I want to create a line graph with the same variables for two different machines. There are 16 variables in total, so overlapping 16 lines would look messy. I don't want the user to have to copy and paste two charts at the end of each week to email, so I am wondering if it would be possible to create a chart like the one attached. Any help is awesome, thank you!
Yes, but it may not be as pretty as you want. Just assign half the series to the secondary axis, then change the display ranges for the two axes. Here is how you figure out what to set them to.
Axis 1 minimum = actual data minimum - actual data maximum
Axis 1 maximum = actual data maximum
Axis 2 minimum = actual data minimum
Axis 2 maximum = actual data maximum * 2
Since you have percentages in your example, this would be:
Primary axis minimum = -100%, maximum = 100%
Secondary axis minimum = 0%, maximum = 200%
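To see why these ranges stack the two groups of series, here is a small Python sketch of the same arithmetic. The function names stacked_axis_ranges and axis_fraction are made up for this illustration; nothing here is a spreadsheet API.

```python
def stacked_axis_ranges(data_min, data_max):
    """Return ((primary_min, primary_max), (secondary_min, secondary_max))
    using the formulas given above."""
    primary = (data_min - data_max, data_max)
    secondary = (data_min, data_max * 2)
    return primary, secondary

def axis_fraction(value, axis_range):
    """Vertical position of a value on an axis: 0 is the bottom, 1 is the top."""
    low, high = axis_range
    return (value - low) / (high - low)

# Percentages from 0 to 100, as in the example
primary, secondary = stacked_axis_ranges(0, 100)
print(primary, secondary)
print(axis_fraction(0, primary), axis_fraction(100, primary))      # primary series
print(axis_fraction(0, secondary), axis_fraction(100, secondary))  # secondary series
```

With data running from 0 to 100, the series on the primary axis land in the top half of the plot area and the series on the secondary axis land in the bottom half.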
Then you can play around with where the labels display etc to try to clean it up, but it won't look nearly as good as 2 charts (which you have ruled out).
I'm designing an application that is supposed to plot multiple graphs of sensor data on a single tablet screen. All the graphs should share a common x-axis that displays time (1 sec to 2 mins), but the y-axis data is different for each plot. I was able to plot all the graphs, but I'm not sure how to display a common x-axis for them. Has anyone tried doing this?
You can definitely display multiple plots (each containing its own graph space) with identical domain (xAxis) labeling. The key to doing this is constraining each plot's boundaries in the same way.
Let's say that for the sake of this example your series data uses a time offset in milliseconds as its x values. For a range of 1 second to 2 minutes that gives us:
xMin = 1000 (1 second)
xMax = 120000 (2 minutes)
which translates to:
plot.setDomainBoundaries(1000, 120000, BoundaryMode.FIXED);
If you're using real timestamps instead of offsets, the same principle applies. You'll just have to decide what the starting timestamp should be and then calculate the ending timestamp by adding 120000 to it.
I have designed a NetLogo model that outputs the number of turtles in each run. The number of turtles increases with ticks and then levels off at a value N. I ran the model 50 times, and the 50 resulting N values vary from 9 to 12. I have to report the result with a graph showing the number of turtles increasing with the ticks. For one simulation it levels off at 9 (N = 9), and for another at 10 (N = 10).
Which of the 50 simulations should I draw the graph for?
or
Should I take the average of the 50 values at each tick and draw a graph of that?
What is the right approach to convey that, in my result confirmed by 50 simulations, the number of turtles increases with ticks and then becomes constant (at a value between 9 and 12, depending on the simulation)?
Thank you.
The point of doing multiple simulations is to average out the stochastic effects. Without seeing your data, the most appropriate graph is probably one that averages your variable of interest (e.g. final turtle count, or turtle count at each tick). If you want to compare scenarios, that average should be taken across the simulations that run the same scenario (that is, have the same starting parameters).
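As a rough illustration of averaging at each tick, here is a Python sketch. The three short runs below are invented numbers, not real model output, and the function names are made up; it assumes every run recorded a count at the same ticks.

```python
from statistics import mean

def average_per_tick(runs):
    """runs: one list of turtle counts per simulation, all of equal length.
    Returns the mean count at each tick across the runs."""
    return [mean(counts) for counts in zip(*runs)]

def range_per_tick(runs):
    """Minimum and maximum count at each tick, to show the spread between runs."""
    return [(min(counts), max(counts)) for counts in zip(*runs)]

# Hypothetical data: 3 runs of 5 ticks each, levelling off at 9, 10 and 12
runs = [
    [1, 4, 7, 9, 9],
    [1, 5, 8, 10, 10],
    [2, 6, 9, 12, 12],
]
print(average_per_tick(runs))
print(range_per_tick(runs))
```

Plotting the per-tick mean together with the min-max band (or a standard deviation band) shows both the common trend and the fact that the plateau differs between runs.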
I am trying to plot the envelope (maximum) values of a series of data. What I need is not the maximum value of the y-axis as the value of x-axis increase but an envelope or spectrum which joins only the maximum points as the values of x-axis increase.
My data look like:
If I ask for the maximum y-values as the values of the x-axis increase, I get this (the black line is the maximum of all data as x ascends):
But I need a line which joins only the successive maximum points up to x=30, and then the maximum values as they descend (from x=30 to x=100). The curve I need should be smooth; it should not follow the values of the data but only join each maximum to the next.
The next curve is the envelope, but only after the absolute maximum point. To the left of the absolute maximum point, the envelope is not the desired one:
After posting my questions (as comments), I think the following will do what you want (here I'm assuming I understood what you need):
1) At any point along the X axis, you already know how to recognize a maximum,
2) If (1) is correct, you will take into account a maximum (i.e. make it part of the envelope curve) if and only if:
a) All the points to the right are lower than the current maximum, and/or
b) All the points to the left are lower than the current maximum.
Intuitively, this should work.
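Outside of a spreadsheet, the rule in (2) could be sketched in Python as below. This is just an illustration under my reading of the rule: a point is kept if it is higher than every point to its left, or higher than every point to its right. The function name and the sample values are made up.

```python
def envelope_points(xs, ys):
    """Keep the points that are higher than everything to their left
    or higher than everything to their right."""
    n = len(ys)
    keep = [False] * n

    best = float("-inf")
    for i in range(n):                 # left to right: rising side
        if ys[i] > best:
            keep[i] = True
            best = ys[i]

    best = float("-inf")
    for i in reversed(range(n)):       # right to left: falling side
        if ys[i] > best:
            keep[i] = True
            best = ys[i]

    return [(x, y) for x, y, k in zip(xs, ys, keep) if k]

# Made-up sample with its absolute maximum in the middle
xs = list(range(10))
ys = [1, 3, 2, 5, 4, 6, 3, 4, 2, 1]
print(envelope_points(xs, ys))
```

Joining the returned points gives a curve that only rises up to the absolute maximum and only falls after it.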
EDIT:
Assuming that data is arranged in columns, say between B and D and rows 10 to 100, define in cell E10 the following:
=IF(AND(MAX(B10:D10)>MAX(B9:D9),MAX(B10:D10)>MAX(B11:D11)),MAX(B10:D10),"")
This formula returns a value where there is a local maximum in rows 11 to 99, and a blank otherwise. Then drag the formula down to row 100 and voilà!
Note that the first and last point (i.e. rows 10 and 100) might yield a wrong result though. To prevent that, just alter the formula in those two rows.
Hope this is what you were looking for.
I'm trying to draw a graph where the y-axis is disk sizes.
And I have sizes ranging from 2 kilobytes through about 22 petabytes.
Represented as numbers that is ~2000 to 22e12
This looks pretty bad on a chart axis.
I could set the scale to "thousands" and then I'd be left with numbers between 2 and 22e9 and the reader is left to do the math that 22e9 (thousand) bytes is 22 petabytes and stuff like that.
But that's not intuitive.
So I tried a custom format.
I know that I can do
[Red][>1000000000];[Blue][>1000000]
but only two can be provided in this way.
I also know that I can do stuff for positive, negative and zero as well.
But is there a way in which I can accomplish the following:
(a) cell values are numbers, sizes in bytes, kilobytes or some such unit
(b) graph shows y axis with these numbers
(c) y-axis is logarithmic (very important)
(d) the y-axis labels are converted to K, M, G or P bytes as appropriate
If you think you have a solution, please verify it with this sample data:
1990, 2050
1992, 21246
1993, 208557
1996, 20971520
2000, 306184192
2012, 1.75922E+14
Your graph should be an X-Y Scatter (with lines)
Your graph should include the numbers in the first column as the x-axis on a linear scale
Your graph should include the numbers in the second column as the y-axis on a logarithmic scale
Your graph should have y-axis legends like "1K", "10K", "100K", "1M", "10M", "100M", ... "1P" and so on at the appropriate points.
This same solution would also be obviously applicable for money, where you want to show numbers in thousands, millions or billions with the appropriate suffix and a small number.
Try this to convert a string value of the form 99.9G to the number 99.9E9:
=CHOOSE(SEARCH(RIGHT(B5),"kMG"), 10^3,10^6,10^9)*VALUE(LEFT(B5,LEN(B5)-1))
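For comparison, the same conversion could look like this in Python. It is a sketch that mirrors the formula above: only the k, M and G suffixes are handled, the match is case-insensitive as with SEARCH, and the name parse_size is made up.

```python
# Same three suffixes as the "kMG" string in the formula
MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9}

def parse_size(text):
    """Convert a string such as '99.9G' into a number.
    The last character is the suffix, the rest is the numeric part."""
    text = text.strip()
    suffix = text[-1].lower()
    return float(text[:-1]) * MULTIPLIERS[suffix]

print(parse_size("2k"))
print(parse_size("1.5M"))
print(parse_size("99.9G"))
```

A string without one of the three suffixes raises an error here, much as the formula returns an error value when SEARCH finds no match.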