Excel VBA How do I compare values in two large datasets with a mass defect/error? - excel

I am active in the field of analytical chemistry for my internship and wish to compare large datasets (two columns up to 15,000 rows). The main idea of this is that I have two columns with mass data (with 4 decimals) in which a macro should look for each mass in column one in column two, but with a mass defect/error.
What this means is that the value does not perfectly correspond (due to instrumental errors in measurements) but should fall within a lower limit and an upper limit. When the macro has cycled through dataset2 to check each value, dataset1's checked cell is offset to the next (.Offset(1,0)) to repeat the search for that specific value. The upperlimit and lowerlimit are automatically adjusted.
To give an example (code format used, otherwise the table would not display correctly):
Value to check from dataset1 is 101.1048, mass error is 5 ppm (parts-per-million, 0.000005%),
so the lower limit is 101.1043 and the upper limit is 101.1053. So in the example shown below,
the mass in dataset2 falls within the boundaries, after which the macro should sum the intensity
(linked to the mass column) of all mass values from dataset2 that fall within the dataset1 limits
for the checked cell. So SumIntensity=105+209 in the example, if no corresponding value is found,
the intensity of the dataset1 is used (so 100).
**Dataset1** " **Dataset2**
Mass ' Intensity " Mass ' Intensity
''''''''''''''''''''''''''''''''''''''''''''''''''''
101.1048 ' 100 " 101.1045 ' 105
101.1272 ' 300 " 101.1051 ' 209
I am however not well experienced with VBA (I have only written some basic macros to compare values in the same dataset with a mass defect/error) and after countless attempts I have not yet been able to get the macro to work.
My current code is as follows but keeps crashing (most likely due to the loops):
Sub CompareColumnsTest2()
Dim wscalc, wsdata, wscontrol As Worksheet
Set wscalc = Sheet2
Set wsdata = Sheet1
Set wscontrol = Sheet4
''-----------------------------------------------------------
''Compares datasets 1 and 2 in two steps:
''Looks up each Rounded Mass from dataset1 in dataset2 and substracting the relative intensity respectively
''Looks up each Rounded Mass from dataset 2 in dataset1 and if NOT present in dataset 1, copies Rounded Mass and (negative) Intensity
wscalc.Range("B3:B" & wscalc.Range("B" & Rows.Count).End(xlUp).Row).Copy
wscalc.Range("K3").PasteSpecial (xlPasteValues)
''Step one
Dim refcl, refint, massdefect, lowerlimit, upperlimit As Range
Set refcl = wscalc.Range("B3")
Set refint = wscalc.Range("D3")
Set pastecell = wscalc.Range("L3")
Set massdefect = wscontrol.Range("D4")
Set lowerlimit = wscalc.Range("Z2")
Set upperlimit = wscalc.Range("Z4")
Set checkcl = wscalc.Range("G3")
Set checkint = wscalc.Range("I3")
Dim refclnext, refintnext, checkclnext, pastecellnext As Range, sumint As Long
Do While Not IsEmpty(refcl)
Set refclnext = refcl.Offset(1, 0)
Set refintnext = refint.Offset(1, 0)
Set pastecellnext = pastecell.Offset(1, 0)
Set checkclnext = checkcl.Offset(1, 0)
Set checkintnext = checkint.Offset(1, 0)
sumint = 0
lowerlimit.Value = refcl / (1 + (massdefect / 1000000))
upperlimit.Value = refcl * (1 + (massdefect / 1000000))
Do While Not IsEmpty(checkcl)
If checkcl <= upperlimit And checkcl >= lowerlimit Then
sumint = sumint + checkint
End If
Set checkcl = checkclnext
Set checkint = checkintnext
Loop
Set pastecell.Value = refint - sumint
Set refcl = refclnext
Set refint = refintnext
Set pastecell = pastecellnext
Loop
End Sub
I hope my description is clear enough to be able to help me out. I do not ask of you to completely rewrite my code as that would ofcourse take a lot of time, but any tips/modifications would be highly appreciated.
Best,
JamesLooks
Edit 1:
Here are some screenshots showing some data and how the sheet is organized for cell references.
Overview of data and sheet layout,
Cells used for upper and lower limit

As suggested in the comments by Naresh Bhople I used excel functions (IF and SUMIFS) and incorporated these in a macro, which has solved my issue.
Best,
JamesLooks

Related

Is it possible to concatenate data from a row and a column together to display on the axis of a line graph?

In Excel, if I have data entered in a matrix format with rows representing years and the columns representing quarters, is there a method to concatenate the row/column names together when selecting the axis label range of a line graph?
Data Entry
Graph
Select Axis Label
In regards to the image of the graph, I have just selected the cells containing Qtr 1, Qtr 2, Qtr 3, Qtr 4 over and over again to get the output.
The simple solution would be to just create a new column containing the desired data. However, the spreadsheet I'm working on has dozens of similar data entry points, which is why I'm looking to see if there is another solution.
Note: I don't seem to have enough reputation to embed images yet, so I can only link.
In the image below, I've shown part of your data table and a rearranged version of it. With the rearranged table, the graph is made naturally using a scatter plot, at the right.
Notice that each quarter is shown by the minor gridlines. This may not be what you want, but it fits very well with how Excel does things.
Here's some code to convert your type of data table to the rearranged version. To run it, first select the data portion of your data table and then run the code. Let me know if you have questions.
Sub qtrTable()
Dim r As Range, qtr As Range, i As Integer, j As Integer
Set qtr = Selection
Set r = Application.InputBox("Select new table location", "Upper left", qtr(1, 1).Offset(0, 6).Address, , , , , Type:=8)
For i = 1 To qtr.Rows.Count
For j = 0 To 3
r.Offset(j, 0) = qtr(i, 1) + j * 0.25
r.Offset(j, 1) = qtr(i, j + 2)
Next j
Set r = r.Offset(4, 0)
Next i
End Sub

How to set Excel column widths to a certain number of pixels?

I have the following data set on a worksheet:
SheetName|ColumnIndex|Pixels
---------+-----------+------
abc |1 |50
abc |2 |150
def |1 |125
For each sheet, I'd like to set the column width to the appropriate number of pixels, using something like:
Sub setColumn (sheetName As string, columnIndex As long, pixels As long)
width=getWidthInCharacters(pixels)
ThisWorkbook.Sheets(sheetName).Cells(1, columnIndex).EntireColumn.ColumnWidth = width
End Sub
I haven't been able to figure out how to write the getWidthInCharacters() function. How do I convert pixels to characters, or possibly set .ColumnWidth to pixels directly?
I'm sorry to tell you, but in my experience, you can't. Column width is measured in points, and whilst you can - in theory - convert points to pixels, Excel won't listen very precisely when you assign them. They also seem to vary somewhat from monitor to monitor. Basically, points are fractions of inches, pixels are dots on the screen. Windows has a notion (right or wrong) of how many pixels there are to a point given a particular output device.
You can write a function that tweaks column width, but usually the approach has to be
Find the smallest contextual value that excel is willing to increment a column width by (say, store the original value, then assign .ColumnWidth = dblOriginal + 0.01. Check if columnwidth has changed - if it has, you just made a 1-pixel adjustment. If it hasn't, you need a bigger number than 0.01.
Find a final column width in pixels that you want, and repeat this first step until you've incremented the column width that many times.
Check the result, and see if it looks OK.
Word of warning: this is horrible, slow, and not good code, and if they've fixed column widths in versions of Excel after 2010, then you might be lucky and just be able to use a pixels-to-points function, convert and assign. There are some around, just in my experience they didn't give me consistent results on different screens on the same machine. Really weird that one.
.ColumnWidth does not depend on theme font selection, but pixel width does: https://support.microsoft.com/en-us/kb/214123
'pixel width of column A
Debug.Print (Columns("A").Width / 72) * ThisWorkbook.WebOptions.PixelsPerInch
After some reading and thinking, my solution:
Sub setColumnWidth(rColumnWidth As Range, iPixelWidth As Integer)
' set column width by pixels
' check status ScreenUpdating
Dim bScreenUpdatingState As Boolean
bScreenUpdatingState = Application.ScreenUpdating
' set status ScreenUpdating
If bScreenUpdatingState = True Then Application.ScreenUpdating = False
Dim iPointsPerInch As Byte
iPointsPerInch = 72
Dim iPixelsPerInch As Byte
iPixelsPerInch = ThisWorkbook.WebOptions.PixelsPerInch
' check 2 column widths: get iPointDelta
Dim rColumn As Range
Set rColumn = rColumnWidth.EntireColumn
rColumn.ColumnWidth = 1
Dim iPoint_1 As Single
iPoint_1 = rColumn.Width
rColumn.ColumnWidth = 2
Dim iPoint_2 As Single
iPoint_2 = rColumn.Width
Dim iPointDelta As Single
iPointDelta = iPoint_2 - iPoint_1
' set column width to iPixelWidth
Dim iPoint_New As Single
iPoint_New = iPixelWidth / iPixelsPerInch * iPointsPerInch
Dim iChar_New As Single
iChar_New = (iPoint_New - (iPointDelta - 1.5)) / iPointDelta
rColumn.ColumnWidth = iChar_New
' reset status ScreenUpdating
If bScreenUpdatingState = True Then Application.ScreenUpdating = True
End Sub
To run the sub setColumnWidth:
Sub call_setColumnWidth()
Dim r As Range
Set r = ActiveSheet.Range("C1")
setColumnWidth r, 70
End Sub

Color cells by absolute value in a range in Excel 2010

I'm looking to color a table of values in Excel 2010 by their absolute value. Basically, if I have the table:
...the cells are colored by the cell's raw value. What I would like to do is color by the cell's absolute value, so with the cell coloring of this table:
...but with the values of the first table (the real values). Any ideas on how one might do this? Through the GUI or with VBA?
I don't think that there is any way to do this with three colors (red, yellow, green), but you can do it with two colors (for example yellow and green). Simply make the color for the low value and the color for the high value the same. That way, the cells with the lower absolute value will have the middle color and cells with the higher absolute value will have the other color.
Select Your data
Conditional Formatting
Color Scale
More Rules
Select "3-Point Scale" under Format Style
Change the colors so that the Maximum and Minimum colors are the same
Here is my solution to this problem. The conditional format formula reads
=AND(ABS(B3)>0,ABS(B3)<=500)
for the darkest green, the scale changes to 500 to 1000, 1000 to 1500, and finally 1500 to 2000 for the red band.
Conditional Formats
Color Scale Values
Here is a picture of the dataset that I used to test these conditional formats:
A variation on this simple conditional formatting illustration may work for you.
Highlight the whole of the data range (you need the top LH cell to be the anchor for relative addressing) and enter the Formula: in 'relative notation' i.e. cell references without the dollar signs. You also have to consider the order of the rules.
The uppermost formula is obscured but reads =(ABS(B3)>39) * (ABS(B3)<41) Note that the * symbol applies an AND operation.
Ok, I have a solution that works with 3 color conditioning. Basically you supply a region to my code. It then creates two ranges, one of neg numbers and one of positive ones. It then applies conditional formatting
red-low yellow-mid green-high to the positive range and
red-high yellow-mid green-low to the negative range.
It was a quick solution so its sloppy and not robust (for instance it only works in columns A-Z because of a lazy ascii conversion for column numbers), but it works. (i'd post a pic but I don't have enough points)
---------------------edit-------------------------------
#pnuts is right, unless the data is symmetric this solution wont work as is. so with that in mind I came up with a new solution. First I will explain the general idea, then basically just dump the code, if you understand the logic the code should be fairly clear. It is a rather involved solution for such a seemingly simple problem, but isn't that always the way? :-P
We are still using the basic idea of the original code, create a negative range and apply colorscale to it, then create a positive range and apply the inverted color scale to it. As seen below
Negative ........... 0 ................ positive
green yellow red | red yellow green
So with our skewed data data_set={-1,-1,-2,-2,-2,-2,-3,-4,1,5,8,13} what I do is mirror the the extreme value. In this case 13, so now data_set={-13,-1,-1,-2,-2,-2,-2,-3,-4,1,5,8,13} Notice the additional -13 element. I assume you have a button to enact this macro so I store the extra -13 in a cell that is underneath the button so even though its there it isn't visible (yeah I know they can move the button etc, but it was the easiest thing I could think of)
Well that's all well and good green maps to 13 AND -13 but the color gradient is based on percentiles (in fact the color bar code uses the 50th percentile to determine the midpoint, or in our case where the yellow section is)
Selection.FormatConditions(1).ColorScaleCriteria(2).Value = 50
so with our distribution {-13,-1,-1,-2,-2,-2,-2,-3,-4,1,5,8,13} we could start seeing the yellow in the positive range around the number 8.5 Since 8.5 is 50th percentile. but in the neg range (even if we add a mirrored -13) the 50th percentile is -2, so our yellow in the negative range would start at 2!! Hardly ideal. just like pnuts mentioned, but we are getting closer. if you have fairly symmetric data this issue won't be present, but again we are looking at worst case of skewed datasets
What I did next is statistically match the midpoints....or at least their colors. So since our extreme value (13) is in the positive range we leave the yellow at the 50th percentile and try to mirror it to the negative range by changing what percentile the yellow color appears at (if the negative range had the extreme value we would leave the yellow at that 50th percentile and try to mirror it to the positive range). That means in our negative range we want to shift our yellow (50th percentile) from -2 to a number around -8.5 so it matches the positive range. I wrote a function called
Function iGetPercentileFromNumber(my_range As Range, num_to_find As Double) That does just that! More Specifically it takes a range and reads the values into an array. It then adds num_to_find to the array and figures out what percentile num_to_find belongs to as an integer 0-100 (hence the i in the function name). Again using our example data we would call something like
imidcolorpercentile = iGetPercentileFromNumber(negrange with extra element -13, -8.5)
Where the -8.5 is the negative(50th percentile number of positive range = 8.5). Don't worry the code automatically supplies the ranges and the numbers, this is just for your understanding. The function would add -8.5 to our array of negative values {-13,-1,-1,-2,-2,-2,-2,-3,-4,-8.5} then figure out what percentile it is.
Now we take that percentile and pass it in as the midpoint for our negrange conditional formatting. so we changed the yellow from 50th percentile
Selection.FormatConditions(1).ColorScaleCriteria(2).Value = 50
to our new value
Selection.FormatConditions(1).ColorScaleCriteria(2).Value = imidcolorpercentile 'was 50
which now deskewed the colors!! we have basically created a symmetric in appearance color bar. Even if our numbers are far from symmetric.
Ok, I know that was a TON to read and digest. but here are the main takeaways this code
- uses full 3-color conditional formatting (not simply setting the two extreme colors the same to look like abs value)
- creates symmetric color ranges by using a obstructed cell (say under a button) to hold the extreme values
- uses statistical analysis to match the color gradients even in skewed data sets
both steps are necessary and neither one on its own is sufficient to create a true mirror color scale
Since this solution requires statistical analysis of the data set, you would need to run it again any time you changed a number (which was actually the case before, I just never said it)
and now the code. Put it in vba or some other highlighting program. It is nearly impossible to read as is ..... takes deep breath
Sub main()
Dim Rng As Range
Dim Cell_under_button As String
Set Rng = Range("A1:H10") 'change me!!!!!!!
Cell_under_button = "A15"
Call AbsoluteValColorBars(Rng, Cell_under_button)
End Sub
Function iGetPercentileFromNumber(my_range As Range, num_to_find As Double)
If (my_range.Count <= 0) Then
Exit Function
End If
Dim dval_arr() As Double
'this is one bigger than the range becasue we will add "num_to_find" to it
ReDim dval_arr(my_range.Count + 1)
Dim icurr_idx As Integer
Dim ipos_num As Integer
icurr_idx = 0
'creates array of all the numbers in your range
For Each cell In my_range
dval_arr(icurr_idx) = cell.Value
icurr_idx = icurr_idx + 1
Next
'adds the number we are searching for to the array
dval_arr(icurr_idx) = num_to_find
'sorts array in descending order
dval_arr = BubbleSrt(dval_arr, False)
'if match_type is 0, MATCH finds an exact match
ipos_exact = Application.Match(CLng(num_to_find), dval_arr, 0)
'there is a runtime error that can crop up when num_to_find isn't formated as long
'so we converted it, if it was a double we may not find an exact match so ipos_Exact
'may fail. now we have to find the closest numbers below or above clong(num_to_find)
'If match_type is -1, MATCH finds the value <= num_to_find
ipos_small = Application.Match(CLng(num_to_find), dval_arr, -1)
If (IsError(ipos_small)) Then
Exit Function
End If
'sorts array in ascending order
dval_arr = BubbleSrt(dval_arr, True)
'now we find the index of our mid color point
'If match_type is 1, MATCH finds the value >= num_to_find
ipos_large = Application.Match(CLng(num_to_find), dval_arr, 1)
If (IsError(ipos_large)) Then
Exit Function
End If
'barring any crazy errors descending order = reverse order (ascending) so
ipos_small = UBound(dval_arr) - ipos_small
'to minimize color error we pick the value closest to num_to_find
If Not (IsError(ipos_exact)) Then
'barring any crazy errors descending order = reverse order (ascending) so
'since the index was WRT descending subtract that from the length to get ascending
ipos_num = UBound(dval_arr) - ipos_exact
Else
If (Abs(dval_arr(ipos_large) - num_to_find) < Abs(dval_arr(ipos_small) - num_to_find)) Then
ipos_num = ipos_large
Else
ipos_num = ipos_small
End If
End If
'gets the percentile as an integer value 0-100
iGetPercentileFromNumber = Round(CDbl(ipos_num) / my_range.Count * 100)
End Function
'fairly well known algorithm doesn't need muxh explanation
Public Function BubbleSrt(ArrayIn, Ascending As Boolean)
Dim SrtTemp As Variant
Dim i As Long
Dim j As Long
If Ascending = True Then
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
If ArrayIn(i) > ArrayIn(j) Then
SrtTemp = ArrayIn(j)
ArrayIn(j) = ArrayIn(i)
ArrayIn(i) = SrtTemp
End If
Next j
Next i
Else
For i = LBound(ArrayIn) To UBound(ArrayIn)
For j = i + 1 To UBound(ArrayIn)
If ArrayIn(i) < ArrayIn(j) Then
SrtTemp = ArrayIn(j)
ArrayIn(j) = ArrayIn(i)
ArrayIn(i) = SrtTemp
End If
Next j
Next i
End If
BubbleSrt = ArrayIn
End Function
Sub AbsoluteValColorBars(Rng As Range, Cell_under_button As String)
negrange = ""
posrange = ""
'deletes existing rules
Rng.FormatConditions.Delete
'makes a negative and positive range
For Each cell In Rng
If cell.Value < 0 Then
' im certain there is a better way to get the column character
negrange = negrange & Chr(cell.Column + 64) & cell.Row & ","
Else
' im certain there is a better way to get the column character
posrange = posrange & Chr(cell.Column + 64) & cell.Row & ","
End If
Next cell
'removes trailing comma
If Len(negrange) > 0 Then
negrange = Left(negrange, Len(negrange) - 1)
End If
If Len(posrange) > 0 Then
posrange = Left(posrange, Len(posrange) - 1)
End If
'finds the data extrema
most_pos = WorksheetFunction.Max(Range(posrange))
most_neg = WorksheetFunction.Min(Range(negrange))
'initial values
neg_range_percentile = 50
pos_range_percentile = 50
'if the negative range has the most extreme value
If (most_pos + most_neg < 0) Then
'put the corresponding positive number in our obstructed cell
Range(Cell_under_button).Value = -1 * most_neg
'and add it to the positive range, to reskew the data
posrange = posrange & "," & Cell_under_button
'gets the 50th percentile number from neg range and tries to mirror it in pos range
'this should statistically skew the data
the_num = WorksheetFunction.Percentile_Inc(Range(negrange), 0.5)
pos_range_percentile = iGetPercentileFromNumber(Range(posrange), -1 * the_num)
Else
'put the corresponding negative number in our obstructed cell
Range(Cell_under_button).Value = -1 * most_pos
'and add it to the positive range, to reskew the data
negrange = negrange & "," & Cell_under_button
'gets the 50th percentile number from pos range and tries to mirror it in neg range
'this should statistically skew the data
the_num = WorksheetFunction.Percentile_Inc(Range(posrange), 0.5)
neg_range_percentile = iGetPercentileFromNumber(Range(negrange), -1 * the_num)
End If
'low red high green for positive range
Call addColorBar(posrange, False, pos_range_percentile)
'high red low green for negative range
Call addColorBar(negrange, True, neg_range_percentile)
End Sub
Sub addColorBar(my_range, binverted, imidcolorpercentile)
If (binverted) Then
'ai -> array ints
adcolor = Array(8109667, 8711167, 7039480)
' green , yellow , red
Else
adcolor = Array(7039480, 8711167, 8109667)
' red , yellow , greeb
End If
Range(my_range).Select
'these were just found using the record macro feature
Selection.FormatConditions.AddColorScale ColorScaleType:=3
Selection.FormatConditions(Selection.FormatConditions.Count).SetFirstPriority
'assigns a color for the lowest values in the range
Selection.FormatConditions(1).ColorScaleCriteria(1).Type = _
xlConditionValueLowestValue
With Selection.FormatConditions(1).ColorScaleCriteria(1).FormatColor
.Color = adcolor(0)
.TintAndShade = 0
End With
'assigns color to... midpoint of range
Selection.FormatConditions(1).ColorScaleCriteria(2).Type = _
xlConditionValuePercentile
Selection.FormatConditions(1).ColorScaleCriteria(2).Value = imidcolorpercentile 'originally 50
With Selection.FormatConditions(1).ColorScaleCriteria(2).FormatColor
.Color = adcolor(1)
.TintAndShade = 0
End With
'assigns colors to highest values in the range
Selection.FormatConditions(1).ColorScaleCriteria(3).Type = _
xlConditionValueHighestValue
With Selection.FormatConditions(1).ColorScaleCriteria(3).FormatColor
.Color = adcolor(2)
.TintAndShade = 0
End With
End Sub
I am going to borrow heavily from the answer of #barryleajo (won't hurt my feelings if you select that answer). As was stated in that answer the order of the conditional formatting is the key, start with the smallest absolute values and work your way up. The difference between that answer and this one is that there is no need to use an "and" statement, since the OP seems to indicate that all values within a certain range of absolute value should receive the same color format. Here is a small example:

Generated Data Label values from worksheet sometimes don't show

I am generating values in Excel 2010, initially putting them into an array, then copying them into a worksheet for use as datalabels for a logarithmix x-axis (actually calling Chart Labeller to do that, but this also happens when i manually apply through Excel). For the most part this works fine without problems. In certain instances, however, some, but not all, of the data labels do not visible show, even though the data in the worksheet is there, manually selecting the data labels shows an invisible label selected.
What I found out, and I think this may be a bug in Excel, when I go to the worksheet, and re-type in the value that is not showing up on the chart, it then shows up on the chart.
Here are my dim's for the array:
Dim chart_labeler_info_x()
Here is how I populate the array:
'Assuming we are going to do the x-axis
ReDim chart_labeler_info_x(1 To x_axis_interval_num, 1 To 3)
For k = 1 To x_axis_interval_num
'Column 1 is the new chart label value, column 2 is the y value of the new series , column 3 is the x value(equivalent to 111...)
'--------------------------------------------------------
chart_labeler_info_x(k, 1) = suf_ize(10 ^ (Log(x_axis.MinimumScale) / Log(10#) + (k - 1)))
chart_labeler_info_x(k, 2) = y_axis.MinimumScale
chart_labeler_info_x(k, 3) = 10 ^ (Log(x_axis.MinimumScale) / Log(10#) + (k - 1))
'--------------------------------------------------------
Next k
Here is how I initialize the range on the worksheet:
Set new_labeler_ws_x_axis = Sheets.Add
new_labeler_ws_x_axis.Name = Chart_for_series & "Eng_Labels_X_Axis"
new_labeler_ws_x_axis.Range("a1:c" & x_axis_interval_num).Value = chart_labeler_info_x
new_labeler_ws_x_axis.Range("a1:c" & x_axis_interval_num).Font.Name = "Arial"
new_labeler_ws_x_axis.Range("a1:c" & x_axis_interval_num).Font.Size = 7
I also create a new series attached to this range:
With ActiveChart.SeriesCollection.NewSeries
.XValues = Sheets(new_labeler_ws_x_axis.Name).Range("C1:C" & x_axis_interval_num)
.Values = Sheets(new_labeler_ws_x_axis.Name).Range("B1:B" & x_axis_interval_num)
.Name = "=""Labeller_x"""
.Border.Color = RGB(0, 0, 0)
.Format.line.Visible = True
End With
The data that is generated in the worksheet looks like this:
1m 100 0.001
10m 100 0.01
100m 100 0.1
1 100 1
10 100 10
100 100 100
1k 100 1000
Column 1 has the values that will be used as the new data labels. Column 2 is the y-value, Column 3 is the actual x-value. (I can attach the worksheet if that helps.)
Here is an image of what I am talking about:
You notice that the 1k data label that should be there, is not visible.
I can make the 1k data label appear one of two ways:
Extend the maximum value for the series, in this case to 10,000 (10k) in which case the 1k label shows.
Manually go to the worksheet, select the cell that has the 1000 value, re-enter the value 1000 and press return, the data label then shows up as 1k.
Some other interesting anomalies, when the maximum value is 100, the data label for 100 disappears also. As the maximum value is increased beyond 1000, there seems to be no problems the data labels all show themselves.
I have tried changing the number format, which general, to number, with two decimal places, no luck. Changing to text and back, no luck.
I think this is a bug, but haven't found in info, can any of the experts out there shine some light on this?
I found a solution, while somewhat of a hack, i think it underlines what the problem may actually be, and maybe someone can suggest a more elegant solution.
I added the last line of this code block:
chart_labeler_info_x(k, 1) = suf_ize(10 ^ (Log(x_axis.MinimumScale) / Log(10#) + (k - 1)))
chart_labeler_info_x(k, 2) = y_axis.MinimumScale
chart_labeler_info_x(k, 3) = 10# ^ (Log(x_axis.MinimumScale) / Log(10#) + (k - 1#))
'This line did the trick
If chart_labeler_info_x(k, 3) >= 1 Then chart_labeler_info_x(k, 3) = Round(chart_labeler_info_x(k, 3), 0)
As i said previously, we had found that manually updating the value in the cell, caused the labels to be visible. We tried applying the Round function to the value in the cell, and that worked, so i put it as a check in the code, for values of 1 and higher.
It appears that, even though the cell value shown is 1000, internally, it must not be. (I checked .value and .value2, they both reported 1000.) The bug i think lies in this happening SOMETIMES. If the maximum value of the series is increased, the 1k label appears, even though it it the same math being used to generate the values.
Maybe someone can explain why this is happening and offer a more elegant solution!
Thanks,
Russ

VBA: Generating Data that mimics specific parameters (Avg, StdDev..etc)

I have modified a VBA array function given to me here: Excel Generate Normalized Data
That question will explain what I am after.
Download the excel I am using to completely understand:
http://www.mediafire.com/?smq5tl9poitdacc
I am using the following data (The left side are values I enter for the data to be based upon, the right side is the results of the generated data):
As you can see, the % Diff is very good for Avg Click, but Click/Time is off when there is a high Day StdDev (Day +/-). The difference when there is a low Day Stddev is close to 0.
I think this is because the var NoClickDaysPerClick_Running_Avg becomes inaccurate because the NoClickDays_Total (Which is used indirectly to determine the running avg) is "guessed" at the start, and needs to be reassessed each click because the high StdDev adds randomness and the original "guess" becomes more and more inaccurate.
I am not sure if this is the problem, or if it is how I can even solve it.
I am just looking for advice on the best way to do what it is I want. I am not sure why the stdDev are so far off either, but thats not a big deal. I'd rather have a more accurate Click/Time than anything else- reguardless what the Day StdDev is.
here is the function in my VBA:
Function ClickSpacer(Total_Days As Long, ClicksPerDay_Desired_Avg As Double, Clicks_Desired_Deviation As Double, Clicks_Min As Integer, Clicks_Max As Integer, TotalClicksOverTotalDays_Desired_Avg As Double, NoClickDays_Desired_Deviation As Double, NoClickDays_Min As Integer, NoClickDays_Max As Integer)
Dim Day_Array() As Integer
ReDim Day_Array(1 To Total_Days, 1 To 1)
Dim NumDaysToGetClicks As Double
Dim ClickOffset As Long
Dim Clicks_Total As Long
Dim Clicks_SoFar As Long
Dim Clicks_Remaining As Long
Dim NoClickDaysPerClick_Desired_Avg As Double
' Number of clicks that are needed to Achieved desired Avg of clicks over time
Clicks_Total = Round(Total_Days * TotalClicksOverTotalDays_Desired_Avg, 0)
' Number of days in which the user has to click atleast once to achieve desired Avg. clicks per day
NumDaysToGetClicks = Round(Clicks_Total / ClicksPerDay_Desired_Avg, 0)
' The number of non-click days in order fill out the total days
NoClickDays_Total = Round(Total_Days - NumDaysToGetClicks, 0)
' The guessimated average of non-click days per click to fill out total non-click days
' This is never used, just used for comparsion of the running Avg
NoClickDaysPerClick_Desired_Avg = NoClickDays_Total / NumDaysToGetClicks
'This variable is here to achieved closer results to the desired StdDev.
'a higher multiplyer will not limit the deviation but just give an average deviation
'For example, if the Average was 3 with a +/- 2, then with a StdDevMulti of 1
'ALL numbers will be 1 (3-2) through 5 (3+2) with an avg of 3 and stddev of 2, the numbers will NEVER exceed the StdDev.
'With a StdDevMulti of 2, the numbers will be 0 through 6, but should still have an
'Avg deviation of 2.
StdDevMulti = 1
NoClickDays_Desired_Deviation = NoClickDays_Desired_Deviation * StdDevMulti
Clicks_Desired_Deviation = Clicks_Desired_Deviation * StdDevMulti
'Set the obvious defaults
ClickedDaysSoFar = 0
Clicks_SoFar = 0
NoClickDays_SoFar = 0
'Give the ClickOffset a starting value
ClickOffset = NoClickDaysPerClick_Desired_Avg
Do
'used to find the "running" average of days not clicked
NoClickDays_Remaining = NoClickDays_Total - NoClickDays_SoFar
'used to find the "running" average of clicks per day
Clicks_Remaining = (Clicks_Total - Clicks_SoFar)
'used in both "running" averages mentioned above and also will
'mark the end of the while loop.
RemainingClickedDays = (NumDaysToGetClicks - ClickedDaysSoFar)
' Find what the average num. click should be based on the remaining
' and then apply the deviation. Only accept a click below its max
' above its min.
Do
' Generate a random number between -1 and 1
SignChanger = Rnd() - Rnd()
' Apply the randomized StdDev
Clicks_Deviation = Clicks_Desired_Deviation * SignChanger
'Figure out the "running" average
ClicksPerDay_Running_Avg = Clicks_Remaining / RemainingClickedDays
'Figure out a click value and round to the nearest whole number
Generated_Clicks = Round(ClicksPerDay_Running_Avg + Clicks_Deviation, 0)
' Make sure it meets the requirements, if not, try again
Loop While Generated_Clicks < Clicks_Min Or Generated_Clicks > Clicks_Max
' Set the click value to the spaced-out array index
Day_Array(ClickOffset, 1) = Generated_Clicks
'Find a random space based upon the "running" avg. and desired deviation
'Make sure it between the min and max required.
Do
' Generate a random number between -1 and 1
SignChanger = Rnd() - Rnd()
' Apply the randomized StdDev
NoClickDays_Deviation = NoClickDays_Desired_Deviation * SignChanger
'Figure out the "running" average
NoClickDaysPerClick_Running_Avg = NoClickDays_Remaining / RemainingClickedDays
'Figure out a space value and round to the nearest whole number
Generated_NoClickDays = Round(NoClickDaysPerClick_Running_Avg + NoClickDays_Deviation, 0)
' Make sure it meets the requirements, if not, try again
Loop While Generated_NoClickDays < NoClickDays_Min Or Generated_NoClickDays >= NoClickDays_Max
'Define the array index based upon the spacing previously generated.
' Make sure to "add" upon the already known index. Add 1 because you
'have to account for the index the click occupies
ClickOffset = ClickOffset + Generated_NoClickDays + 1
'These should be self-explaintory
ClickedDaysSoFar = ClickedDaysSoFar + 1
Clicks_SoFar = Clicks_SoFar + Generated_Clicks
NoClickDays_SoFar = NoClickDays_SoFar + Generated_NoClickDays
Loop While ClickOffset < Total_Days And RemainingClickedDays > 0
'Set the array equal to the clicks so that it returns the array as
'we want. Ideally this will be just replace Total_Days fields under
'the base, so not to require a array-function. Neither of these work:
'ClickSpacer = Range("P1:P" & UBound(Day_Array) + 1).Value
'Range("P1:P" & UBound(Day_Array) + 1) = Application.Transpose(Day_Array)
ClickSpacer = Day_Array
End Function
I think your assumption is correct. The "problem" with the code you have above is that it uses StdDev as the basis for generating random numbers, so the standard deviation will tend to be accurate and the mean will be less accurate.
If you want more accuracy with the mean and less with the standard deviation, then you'll have to "flip" how numbers are generated: they'll need to center around your desired mean and use the desired standard deviation as a guide, rather than the other way around.
I have an idea about how this can be done, but it will take more concentration than I can apply at work, so I'll have to come back and edit this later. I'll see what I can do.

Resources