Ranking two dimensions of data - excel

I have a set of data in two dimensions, like so:
A B C
A - 9 4
B 24 - 13
C 3 12 -
It represents relationships between two entities. I would like to return a list of those values ranked, such as:
AB:4
AC:5
BA:1
BC:2
CA:6
CB:3
Any thoughts on the best way to approach this?

Create a copy of your source matrix with, assuming layout as below,
=IFERROR(RANK(B2,$B$2:$D$4),"")
in G2, copied down and across to I4. From this 2D version of your rankings, create a PivotTable with multiple consolidation ranges (a "reverse pivot"; the legacy wizard may be reachable with Alt+D, P). Double-click the cell where the Totals row and column intersect to drill down to a flat list. If the results are copied back into the source sheet as below (for convenience), add =Q2&R2&": "&S2 in U2 (or adjust accordingly) and copy down to suit:
May be more appropriate for larger datasets!
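For readers who want to check the expected output outside Excel, here is a minimal Python sketch of the same idea: flatten the matrix into pairs, then rank each value the way RANK does (1 plus the number of strictly larger values). The names `labels`, `matrix` and `rank_pairs` are made up for this illustration, and the sketch is not a replacement for the PivotTable steps above.

```python
# Matrix from the question; None marks the empty diagonal.
labels = ["A", "B", "C"]
matrix = [
    [None, 9, 4],
    [24, None, 13],
    [3, 12, None],
]


def rank_pairs(labels, matrix):
    """Return (pair, rank) tuples in row-major order; rank 1 is the largest value."""
    flat = [
        (labels[r] + labels[c], value)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value is not None
    ]
    values = [value for _, value in flat]
    # Same convention as Excel's RANK in descending order:
    # 1 + the number of strictly larger values, so ties share a rank.
    return [(pair, 1 + sum(v > value for v in values)) for pair, value in flat]


for pair, rank in rank_pairs(labels, matrix):
    print(f"{pair}:{rank}")
```

With the question's data this prints the six pairs in the order AB, AC, BA, BC, CA, CB, which matches the list the asker wants.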

You could also use VBA to create a flat table from your summary table. From there, it is easy to rank the pairs by their numerical value, using CONCATENATE and RANK.
Let's assume this is your starting table:
Here is the VBA code that would transform this pivot table to a flat table:
(Go to Developer tab -> Visual Basic -> Insert -> Module -> Copy-paste the code here)
Then click on Run (green Play sign)
Sub ReversePivotTable()
    ' Before running this, select a cell inside a summary table that has
    ' row labels in its first column and column headers in its first row.
    ' The output table will have three columns.
    Dim SummaryTable As Range, OutputRange As Range
    Dim OutRow As Long
    Dim r As Long, c As Long
    Dim totalRow As Variant, totalCol As Variant

    Set SummaryTable = ActiveCell.CurrentRegion
    If SummaryTable.Count = 1 Or SummaryTable.Rows.Count < 3 Then
        MsgBox "Select a cell within the summary table.", vbCritical
        Exit Sub
    End If

    ' Leave out a "Totals" row and a "Total" column if the table has them
    totalRow = Application.Match("Totals", SummaryTable.Columns(1), 0)
    totalCol = Application.Match("Total", SummaryTable.Rows(1), 0)
    If Not IsError(totalRow) Then Set SummaryTable = SummaryTable.Resize(RowSize:=totalRow - 1)
    If Not IsError(totalCol) Then Set SummaryTable = SummaryTable.Resize(ColumnSize:=totalCol - 1)

    SummaryTable.Select
    On Error Resume Next ' cancelling the InputBox makes the Set statement fail
    Set OutputRange = Application.InputBox(prompt:="Select a cell for the 3-column output", Type:=8)
    On Error GoTo 0
    If OutputRange Is Nothing Then Exit Sub

    ' Convert the range
    OutRow = 2
    Application.ScreenUpdating = False
    OutputRange.Range("A1:C1") = Array("Column1", "Column2", "Column3")
    For r = 2 To SummaryTable.Rows.Count
        For c = 2 To SummaryTable.Columns.Count
            OutputRange.Cells(OutRow, 1) = SummaryTable.Cells(r, 1)
            OutputRange.Cells(OutRow, 2) = SummaryTable.Cells(1, c)
            OutputRange.Cells(OutRow, 3) = SummaryTable.Cells(r, c)
            OutputRange.Cells(OutRow, 3).NumberFormat = SummaryTable.Cells(r, c).NumberFormat
            OutRow = OutRow + 1
        Next c
    Next r
    Application.ScreenUpdating = True
End Sub
You will get a flat table like this:
Then, use CONCATENATE and RANK to get a column of pairs and ranks.
This would be the final result:

Related

Excel index match with a condition to choose a specific column in a 3d matrix (formula or VBA)

I have two tables in Excel, one with categories and listings, and another with points based on the category and listing threshold. It goes as follows:
Categories table:
|ID |CATEGORY|LISTINGS|Points|
|001|A       |56      |      |
|002|C       |120     |      |
|003|A       |4       |      |
|004|B       |98      |      |
Points table:
|Category|tier1 |tier2 |tier3 |
|A       |Tier 1|Tier 2|Tier 3|
|Range   |1-30  |31-90 |91-   |
|Points  |10    |20    |30    |
|B       |Tier 1|Tier 2|Tier 3|
|Range   |1-25  |26-100|101-  |
|Points  |10    |20    |30    |
|C       |Tier 1|Tier 2|Tier 3|
|Range   |1-40  |41-80 |81-   |
|Points  |10    |20    |30    |
I started with an INDEX MATCH formula pointing at the points:
=INDEX(Points!A1:D11, MATCH(Categories!B2, Points!A1:A11, 0)+2)
--> the +2 is to get the points directly
I also thought of evaluating the thresholds with this formula:
=IF(Categories!C2 >= NUMBERVALUE(LEFT(Points!D3, FIND("-",Points!D3)-1)),Points!D4, IF(Categories!C2 >=NUMBERVALUE(LEFT(Points!C3, FIND("-",Points!C3)-1)),Points!C4, Points!B4))
I thought that nesting the IFs (an else-if chain that starts at the highest tier) would make it faster.
Could someone help me populate the Points column in the Categories table? VBA code is also acceptable. The tables are in different sheets.
José, your original Match formula is the right starting place. It locates the particular sub-table to do a further lookup on. But in order to make this formula a whole lot simpler, can we change the ranges (1-30, 31-90, 91-) to have just their starting points (1, 31, 91)? If we do that simple change then we can use the approximate lookup feature of HLookup to easily and compactly specify the Lookup:
=HLookup($C2,Offset(Points!$A$1,Match($B2,Points!$A:$A,0),1,2,3),2,True)
In the middle of this formula you can see your original Match function to locate the correct sub-table based on the category. We need to feed that start point to OFFSET() to create a table range useful to the HLookup. Offset takes an anchor cell reference (top left of the Points table), number of rows to count down from there (result of the Match), number of columns to the right (1), the number of rows in the range (2), and the number of columns in the range (3).
The Hlookup is just like a VLookup, but for tables arranged left-to-right not top-to-bottom. The True as the last parameter is very important as it tells HLookup to use the range lookup instead of an exact match.
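To make the range-lookup behaviour concrete, here is a small Python sketch of what an approximate match does once the ranges are reduced to their starting points: it picks the last tier whose start is less than or equal to the listing count. The `TIERS` dictionary and the `points_for` function are hypothetical names for this illustration only; the tier starts and points are taken from the question's tables.

```python
from bisect import bisect_right

# Tier start points and points per category (ranges reduced to their lower bounds).
TIERS = {
    "A": ([1, 31, 91], [10, 20, 30]),
    "B": ([1, 26, 101], [10, 20, 30]),
    "C": ([1, 41, 81], [10, 20, 30]),
}


def points_for(category, listings):
    """Approximate-match lookup: use the last tier whose start is <= listings."""
    starts, points = TIERS[category]
    idx = bisect_right(starts, listings) - 1
    if idx < 0:
        return None  # below the first tier start; HLOOKUP would give #N/A here
    return points[idx]


rows = [("001", "A", 56), ("002", "C", 120), ("003", "A", 4), ("004", "B", 98)]
for row_id, category, listings in rows:
    print(row_id, category, listings, points_for(category, listings))
```

This is also why the tier starts must be sorted in ascending order, both for `bisect_right` here and for HLookup with True as its last argument.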
A slightly verbose formula which takes the data as originally formatted (using Excel 365 Let):
=LET(ranges,INDEX(Points!B$2:D$12,MATCH(B2,Points!A$1:A$12,0),0),
leftRanges,VALUE(LEFT(ranges,FIND("-",ranges)-1)),
points,INDEX(Points!B$2:D$12,MATCH(B2,Points!A$1:A$12,0)+1,0),
INDEX(points,MATCH(C2,leftRanges)))
Since it's urgent, here is what I have so far; I cannot continue before tomorrow. Assuming you can at least put the tiers next to the category (A on the same line as its tiers), this will get you the first tier. You just need to copy the whole formula into the "nextCol" placeholder, with minor modifications, to get the next tiers.
If you really cannot change the source, you need to add an additional offset in the first MATCH.
=IF(AND(D1>NUMBERVALUE(LEFT(INDIRECT("B"&MATCH(E1,A:A,0)),FIND("-",INDIRECT("B"&MATCH(E1,A:A,0)))-1)),D1<NUMBERVALUE(RIGHT(INDIRECT("B"&MATCH(E1,A:A,0)),2))),INDEX(B:D,MATCH(E1,A:A,0)+1,1),"nextCol")
Option Explicit
Sub Score()
    Dim wsP As Worksheet, wsC As Worksheet, dict As Object
    Dim iLastRow As Long, r As Long, i As Long, j As Long
    Dim sCat As String, nListing As Long
    Dim data, ar
    Set wsP = Sheets("Points")
    Set wsC = Sheets("Categories")
    Set dict = CreateObject("Scripting.Dictionary")
    ' load the Points sheet into memory
    iLastRow = wsP.Cells(wsP.Rows.Count, 1).End(xlUp).Row
    data = wsP.Range("A1:D" & iLastRow).Value2
    ' category -> row number of its "Range" row
    For i = 2 To UBound(data)
        If data(i, 1) = "Range" Then
            dict.Add Trim(data(i - 1, 1)), i
        End If
    Next
    ' scan the Categories sheet and fill in its Points column
    iLastRow = wsC.Cells(wsC.Rows.Count, 1).End(xlUp).Row
    For r = 2 To iLastRow
        sCat = Trim(wsC.Cells(r, "B"))
        nListing = wsC.Cells(r, "C")
        If dict.exists(sCat) Then
            i = dict(sCat)
            For j = 4 To 2 Step -1 ' highest tier first
                ar = Split(data(i, j), "-") ' lower-upper limit
                If nListing >= CLng(ar(0)) Then ' check lower limit
                    ' update points
                    wsC.Cells(r, "D") = data(i + 1, j)
                    Exit For
                End If
            Next
        Else
            MsgBox "No match '" & sCat & "'", vbCritical, "ERROR row " & r
        End If
    Next
    MsgBox "done"
End Sub
Please try the following code. It uses arrays, so it works only in memory and should be very fast. Set shC and shP to your real sheets; I only use the active sheet and the next one for testing:
Sub GetPoints()
Dim shC As Worksheet, shP As Worksheet, lastRC As Long, lastRP As Long, arrBC, arrP, arrPP, arrFin
Dim i As Long, j As Long, p As Long, k As Long
Set shC = ActiveSheet 'use here your Categories sheet
Set shP = shC.Next 'use here your Points sheet
lastRC = shC.Range("A" & shC.rows.count).End(xlUp).row
lastRP = shP.Range("A" & shP.rows.count).End(xlUp).row
arrBC = shC.Range("B2:C" & lastRC).Value 'put the range B:C in an array
arrP = shP.Range("A2:D" & lastRP).Value 'put all the range in an array
ReDim arrFin(1 To UBound(arrBC), 1 To 1) 'redim the array to keep processed values
For i = 1 To UBound(arrBC) 'iterate over the Categories rows:
    For j = 1 To UBound(arrP) 'iterate over the Points rows:
        If arrP(j, 1) = arrBC(i, 1) Then 'if the category is found:
            For p = 2 To 4 'iterate over the tiers in the "Range" row below it
                arrPP = Split(arrP(j + 1, p), "-") 'split the element by "-" to determine the interval
                If arrPP(1) <> "" Then 'tier1 and tier2: both limits are present
                    If arrBC(i, 2) >= CLng(arrPP(0)) And arrBC(i, 2) <= CLng(arrPP(1)) Then
                        arrFin(i, 1) = arrP(j + 2, p): Exit For 'store the points on the matching output row
                    End If
                Else 'tier3: open-ended upper limit
                    If arrBC(i, 2) >= CLng(arrPP(0)) Then
                        arrFin(i, 1) = arrP(j + 2, p): Exit For 'store the points on the matching output row
                    End If
                End If
            Next p
            Exit For 'category handled, go to the next Categories row
        End If
    Next j
Next i
'drop the final array result at once:
shC.Range("D2").Resize(UBound(arrFin), 1).Value = arrFin
End Sub

In excel, is there a way to create similar tables based on the days you enter

I need to automatically create a report based on dates and portfolios.
For example, if I enter 2 dates and 2 portfolios like this
The report should look like this:
So if we enter 3 dates and 4 portfolios, the report should have 3 tables, each with 4 portfolios...
I'm OK with doing it manually, but ideally it would be automatic (I think it can be done through VBA, but I'm not quite sure). Can you help? Thank you.
You have to move your data from where it is stored to the Project sheet.
I guess the date doesn't make any difference to the portfolio, does it?
If so, it's quite easy. I don't know how your data is stored, but let's assume it is laid out as shown in your screenshot.
Option Base 1 'array will start at 1 instead of 0
Public Setting As Worksheet, ListPortfolios As Worksheet, Project As Worksheet
Public RangeSelectDates As Range, RangeSelectPortfolios As Range, RowOfCodePortfolios As Range
Public ArraySelectDates(), ArraySelectPortfolios(), ArrayOfCodePortfolios(), ArrayPortfolio(), ArrayProject()
Public PortfolioCode$
Dim i%, j%, k%, r%, c%
Sub Automate()
Set Setting = Worksheets("Setting")
Set ListPortfolios = Worksheets("ListPortfolios")
Set Project = Worksheets("Project")
'First, read the portfolio code and dates to import and store in array
With Setting
Set RangeSelectDates = .Cells(4, 5).CurrentRegion
Set RangeSelectPortfolios = .Cells(4, 8).CurrentRegion
End With
ArraySelectDates = RangeSelectDates
ArraySelectPortfolios = RangeSelectPortfolios 'store the range in a Array
ReDim ArrayProject(1 To 24, 1 To 1)
'Now, create an array with the names of the portfolios where you have stored them. I don't know how your data is stored.
'I assume you've got it as the Project sheet result it's shown and also at "ListPortfolios" sheet
With ListPortfolios
Set RowOfCodePortfolios = .Rows(5)
End With
ArrayOfCodePortfolios = RowOfCodePortfolios 'store the row in a Array
k = 0 'means no value is found
For i = LBound(ArraySelectPortfolios) To UBound(ArraySelectPortfolios) 'Navigate to all the Portfolios Selected
'the portfolio codes are stored in the "second column" of the array, say PortfolioCode is the name of the portfolio
PortfolioCode = ArraySelectPortfolios(i, 2)
For j = LBound(Application.Transpose(ArrayOfCodePortfolios)) To UBound(Application.Transpose(ArrayOfCodePortfolios)) 'now navigate to where your portfolios are stored
If ArrayOfCodePortfolios(1, j) = PortfolioCode Then 'if match, create a new array with the whole portfolio
With ListPortfolios
ArrayPortfolio = .Range(.Cells(1, j), .Cells(24, j + 2)) 'I don't know the size of your data. I assume that the first column is the same of where the portfoliocode is stored and its size is 24 rows x 3 columns
End With
'now, copy it to the Project Portfolio
ReDim Preserve ArrayProject(1 To 24, 1 To 3 + k * 3)
For r = 1 To 24 'from the r (row) one to 24th. I don't know how your data is stored
For c = 1 To 3 'from the column 1 to the 3rd of each portfolio
ArrayProject(r, c + k * 3) = ArrayPortfolio(r, c) 'built the result for each portfolio found
Next c
Next r
k = k + 1 'one value is found, let's go for the next one if so
End If
Next j
Next i
If k <> 0 Then 'if any value is found then
For i = 1 To UBound(ArraySelectDates) 'let's place the date and print to the excel
ArrayProject(2, 1) = ArraySelectDates(i, 2) 'paste the date into the array
With Project
.Range(.Cells(1, 4 + 1 + (i - 1) * k), .Cells(24, UBound(Application.Transpose(ArrayProject)) + 3 + (i - 1) * k)) = ArrayProject 'print the array
'1+(i-1)*k is the first column + which date are we copying times portfolio codes found
End With
Next i
End If
End Sub
There's no error handling, so it may crash if there are no input values. But first, make it work.

Count duplicates and copy results

I run a performance database and have gotten stuck with a way to track repeat offenders.
In a Results sheet is all the data, I want to create a macro that goes through the results, filters column C to each staff number and count how many times they have a "Fail" result in column D.
If they have a count of 2 or above I want the sheet to copy their name in column B and staff no in column c to the first available line in a different sheet called "Flagged" with the fail count in a 3rd column.
My data runs from rows b8 to b10008 and I have 300 staff who could be assessed
Thanks in advance!
Set up your source data as a table (Ctrl+T with cell in range selected). Add a helper column with the formula:
=SUMPRODUCT(--([Fail/Pass]="FAIL"),--([Staff No]=[@[Staff No]]))>=2
or, with plain column references instead of table column names:
=SUMPRODUCT(--(D:D="FAIL"),--(C:C=[@[Staff No]]))>=2
Create your pivottable, Alt+N+V, using compact report layout, and add your helper column to the page field and filter on True. Add name and staff No to the row fields and remove subtotals.
As it is an Excel table you can add more rows and the formula will autofill down. You then just refresh the pivottable to update your flagged list.
Data:
Fields:
Compact design layout and no subtotals.
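The helper column boils down to "count the FAIL rows per staff number and keep those with two or more". As a rough illustration of that logic outside Excel, here is a Python sketch; the sample rows and the names `results` and `flag_repeat_fails` are invented for the example and are not part of the workbook.

```python
from collections import Counter

# Hypothetical sample rows: (name, staff_no, result)
results = [
    ("Ann", 101, "Fail"),
    ("Bob", 102, "Pass"),
    ("Ann", 101, "Fail"),
    ("Cat", 103, "Fail"),
    ("Bob", 102, "Fail"),
    ("Ann", 101, "Pass"),
]


def flag_repeat_fails(rows, threshold=2):
    """Return (name, staff_no, fail_count) for staff with at least `threshold` fails."""
    fails = Counter()
    names = {}
    for name, staff_no, result in rows:
        names.setdefault(staff_no, name)  # remember the first name seen per staff number
        if result.strip().upper() == "FAIL":
            fails[staff_no] += 1
    return [(names[s], s, n) for s, n in fails.items() if n >= threshold]


print(flag_repeat_fails(results))
```

Each staff member appears at most once in the output, which is the same effect the PivotTable row fields give you.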
I would recommend to make use of arrays and loop your data that way, it should be nearly instant (comparing to looping in the sheet itself).
Keep in mind this is not fully tested, but it should get you pretty close to what you are trying to achieve:
Sub flagged()
    Dim arrData As Variant, arrFails As Variant
    Dim failCnt As Long, i As Long, j As Long, x As Long, nextRow As Long
    Dim seen As Boolean
    Dim shResults As Worksheet, shFlagged As Worksheet
    Set shResults = ActiveWorkbook.Sheets("Results")
    Set shFlagged = ActiveWorkbook.Sheets("Flagged")
    ReDim arrFails(0 To 300, 0 To 2)
    'Columns B:D hold name, staff number and result
    arrData = shResults.Range("B8:D10008").Value
    For i = LBound(arrData) To UBound(arrData)
        'Skip blank rows and staff numbers already counted on an earlier row
        seen = IsEmpty(arrData(i, 2))
        For j = LBound(arrData) To i - 1
            If seen Then Exit For
            If arrData(j, 2) = arrData(i, 2) Then seen = True
        Next j
        If Not seen Then
            'Count every FAIL row for this staff number
            failCnt = 0
            For j = i To UBound(arrData)
                If arrData(j, 2) = arrData(i, 2) Then
                    If UCase(arrData(j, 3)) = "FAIL" Then failCnt = failCnt + 1
                End If
            Next j
            If failCnt >= 2 Then
                arrFails(x, 0) = arrData(i, 1)
                arrFails(x, 1) = arrData(i, 2)
                arrFails(x, 2) = failCnt
                x = x + 1
            End If
        End If
    Next i
    'Append the flagged staff below the last used row of the Flagged sheet
    nextRow = shFlagged.Cells(shFlagged.Rows.Count, 1).End(xlUp).Row + 1
    For i = 0 To x - 1
        For j = 0 To 2
            shFlagged.Cells(nextRow + i, j + 1).Value = arrFails(i, j)
        Next j
    Next i
End Sub
EDIT: changed the size of the dimension to accommodate 3 columns. Also I've initially done this to look for sorted data by staff number, but given is not that much data, that doesn't matter much, so I've edited out the code accordingly.

Collect and subtotal duplicate rows in a Variant 2D array

I have a dataset in Excel consisting of 20 columns and a varying number of rows, ranging from 20,000 - 50,000.
Each row is a collection of items with one column denoting the quantity of items in the collection and another column denoting the total combined weight of the collection.
Some of these rows are completely identical in all columns, while some are identical in all but quantity and weight.
I want to create a macro that runs through the dataset and "stacks" rows that are duplicate on all other parameters than quantity and weight, and sums these two up.
In other words, a macro that converts this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 5 | 12.5 |
| A | 1 |...| C | 2 | 5.0 |
| A | 1 |...| C | 3 | 7.5 |
| B | 2 |...| C | 1 | 2.3 |
| B | 2 |...| C | 2 | 4.6 |
To this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 10 | 25.0 |
| B | 2 |...| C | 3 | 6.9 |
I know this is possible to do in a simple pivot table, but for a number of reasons this is not viable in this case.
Since I'm dealing with a large dataset, I want to load it all into memory at once rather than reading and writing line by line to speed up performance (as suggested in tip #13 in this great article http://www.databison.com/how-to-speed-up-calculation-and-improve-performance-of-excel-and-vba/). However I'm stuck as to how to make the row operations on the data stored in memory.
So far my code looks like this:
Dim r, c, LastRow As Integer
Dim temp_range As Variant
LastRow = Cells(65536, 2).End(xlUp).Row
'Load the data set into memory
temp_range = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 20)).Value
'Run through the data set from bottom to top and bulk identical rows together
For r = UBound(temp_range) To LBound(temp_range)
For i = r - 1 To LBound(temp_range)
'PSEUDO CODE START HERE
If row temp_range(r) = row temp_range(i) Then
temp_range(i,19) = temp_range(r,19) + temp_range(i,19)
temp_range(i,20) = temp_range(r,19) + temp_range(i,20)
Delete row temp_range(r)
Exit For
End if
'PSEUDO CODE END HERE
Next i
Next r
I'm stuck at the pseudo-code section highlighted in the code. I simply don't know how to compare the rows, copy quantity and weight from one row to another, and then delete the duplicate row in the variant holding the range in memory.
Use Microsoft Query (SQL) in Excel:
Such data manipulation operations are ideal for SQL queries. No need of going through the data line by line using VBA:
SELECT S1.Param1, S1.Param2, S1.Param18, SUM(S1.Quantity), SUM(S1.Weight)
FROM [Sheet1$] AS S1 GROUP BY Param1, Param2,Param18
This should run quickly and efficiently via OLE DB, because the grouping is done by the query engine. VBA code that loops over the rows will usually be less efficient.
To refresh the query at any time simply run the following code from VBA:
Set ws = ActiveSheet
ws.QueryTables(1).Refresh BackgroundQuery:=False
Where ws is the worksheet where you locate the query table.
Either use the Microsoft Query from Data->From other sources->From Microsoft Query or feel free to use my Add-In: http://www.analystcave.com/excel-tools/excel-sql-add-in-free/
See if you can F8 through this. You will need to go into the VBE's Tools ► References and add Microsoft Scripting Runtime for the use of the Scripting.Dictionary object.
Sub Stack_Dupes()
Dim r As Long, c As Long, v As Long, k As Long
Dim vKEYs As Variant, vITMs As Variant, vTMP1 As Variant, vTMP2 As Variant
Dim sKEY As String, sITM As String
Dim dITMs As New Scripting.dictionary
With ActiveSheet
With .Cells(1, 1).CurrentRegion
.Rows(1).Cells.Copy _
Destination:=.Cells(.Rows.Count + 3, 1)
With .Offset(1, 0).Resize(.Rows.Count - 1, .Columns.Count)
vKEYs = .Cells.Resize(.Rows.Count, .Columns.Count - 2).Value2
vITMs = .Cells.Offset(0, .Columns.Count - 2).Resize(.Rows.Count, 2).Value2
For r = LBound(vKEYs, 1) To UBound(vKEYs, 1)
sKEY = Join(Application.Index(vKEYs, r, 0), ChrW(8203))
sITM = Join(Application.Index(vITMs, r, 0), ChrW(8203))
If dITMs.Exists(sKEY) Then
vTMP1 = Split(dITMs.Item(sKEY), ChrW(8203))
vTMP2 = Split(sITM, ChrW(8203))
vTMP1(0) = CDbl(vTMP1(0)) + CDbl(vTMP2(0))
vTMP1(1) = CDbl(vTMP1(1)) + CDbl(vTMP2(1))
sITM = Join(vTMP1, ChrW(8203))
dITMs.Item(sKEY) = sITM
Else
dITMs.Add Key:=sKEY, Item:=sITM
End If
Next r
ReDim vKEYs(1 To (.Columns.Count - 2), 1 To dITMs.Count)
ReDim vITMs(1 To 2, 1 To dITMs.Count)
For Each vTMP1 In dITMs.Keys
k = k + 1
vTMP2 = Split(vTMP1, ChrW(8203))
For v = LBound(vTMP2) To UBound(vTMP2)
vKEYs(v + 1, k) = vTMP2(v)
Next v
vTMP2 = Split(dITMs.Item(vTMP1), ChrW(8203))
For v = LBound(vTMP2) To UBound(vTMP2)
vITMs(v + 1, k) = vTMP2(v)
Next v
Next vTMP1
.Cells.Offset(.Rows.Count + 3, 0).Resize(UBound(vKEYs, 2), .Columns.Count - 2) = Application.Transpose(vKEYs)
.Cells.Offset(.Rows.Count + 3, .Columns.Count - 2).Resize(UBound(vITMs, 2), 2) = Application.Transpose(vITMs)
End With
End With
End With
dITMs.RemoveAll: Set dITMs = Nothing
End Sub
Results are written underneath the sample data as follows:
        
I would be interested in the timed results on larger data sets if you find time to post them back here.
I found this article (http://sitestory.dk/excel_vba/arrays-and-ranges.htm, go to middle of the page) on how to delete identical rows that I built some code on. It doesn't solve my original question 100% since it doesn't compare entire rows in the array but rather each column separately in each row, but it turned out to have quite good performance nonetheless.
Here's the code:
'I couldn't get the final step of pasting the output array into the new sheet working properly without declaring this option, otherwise the data would be pasted one cell to the right and below where I wanted it.
Option Base 1
Dim r, i, c, LastRow, DeletedRows As Integer
Dim input_array, output_array As Variant
Dim identical As Boolean
Dim s As Worksheet
Dim NewRange As Range
LastRow = Cells(65536, 2).End(xlUp).Row
'Load the data set into memory, consisting of 20 columns of data and a 21th column with no data that is used for marking rows for deletion.
input_array = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 21)).Value
DeletedRows = 0
'Run through the data set from bottom to top comparing rows one at a time, copy Quantity and Weight values and mark rows for deletion
For r = UBound(input_array) To 2 Step -1
For i = r - 1 To 2 Step -1
'Assume row r and i are identical
identical = True
'Run through columns of r and i, flag if non-identical value is found and stop the loop (col 18 is ignored, since this is the "Quantity" column, while col 20 is the "Weight" column)
For c = 1 To 18
If input_array(r, c) <> input_array(i, c) And c <> 18 Then
identical = False
Exit For
End If
Next c
' If no non-identical columns were found, add "Quantity" and "Weight" from row r to row i and mark row r for deletion
If identical Then
input_array(i, 18) = input_array(i, 18) + input_array(r, 18)
input_array(i, 20) = input_array(i, 20) + input_array(r, 20)
input_array(r, 21) = "_DELETE_"
DeletedRows = DeletedRows + 1
Exit For
End If
Next i
Next r
' Resize the new array to the size of the old array minus the number of deleted rows
ReDim output_array(UBound(input_array) - DeletedRows, 20)
' Copy rows not marked as deleted from old array to new array
i = 1
For r = 1 To UBound(input_array)
If input_array(r, 21) <> "_DELETE_" Then
For c = 1 To 20
output_array(i, c) = input_array(r, c)
Next c
i = i + 1
End If
Next r
' Create new sheet and
Set s = Sheets.Add
Set NewRange = s.Range("A2").Resize(UBound(output_array), 20)
NewRange = output_array
The macro takes about 30 seconds to reduce 20,000 rows to 3,000 on my computer, which I think is quite good considering the large amounts of data.
This can be done in milliseconds, not seconds. I'm also using arrays and a dictionary object, but in a simpler implementation, and in my experience it is faster than running SQL against the sheet. It can compare any number of columns; just make sure they are all included in the KeyIn variable as a concatenated string. To keep the function simple, I assume the value to sum is in column 4; you can adjust the code for multiple values in other columns. I write from one array to another (InAy to OutAy), and the dictionary determines whether a row already exists. The magic happens in the dictionary's Item property: when a new OutAy row is written, I store its row number (r) as the item for that key. When the key already exists, I retrieve that row number with d.Item(KeyIn) and update OutAy(r, 4) with the sum of the existing value and the new value KeyVal.
This solves the same problem as the SQL aggregate query: "Select a, b, c, sum(d) from data group by a, b, c"
Note: add a reference to Microsoft Scripting Runtime (Tools -> References).
sub somesub()
...
data = Range("WhereYourDataIs") 'create data array
Range("WhereYourDataIs").clear 'assumes you'll output to same location
data = RemoveDupes(data) 'removedupes and sum values
Range("A2").Resize(UBound(data), UBound(data, 2)) = data 'A2 assumes your data headers begin in row 1, column 1
...
End Sub
Function RemoveDupes(InAy As Variant) As Variant
Dim d As Scripting.Dictionary
Set d = New Scripting.Dictionary
ReDim OutAy(1 To UBound(InAy), 1 To 4)
r = 1
For i = 1 To UBound(InAy)
KeyIn = ""
KeyVal = InAy(i, 4) 'the value field to sum/aggregate if exists
For c = 1 To 3 'a, b, c metadata to roll up
KeyIn = KeyIn & InAy(i, c)
Next c
If d.Exists(KeyIn) Then
OutAy(d.item(KeyIn), 4) = OutAy(d.item(KeyIn), 4) + KeyVal 'd.item(KeyIn) is r, set when OutAy row was added. Same as OutAy(r,4)=OutAy(r,4) + KeyVal
Else:
d.Add KeyIn, r 'r is set as the item value referencing the row of the OutAy when it was first added. The reference is used when .Exists is true
For c = 1 To 4
OutAy(r, c) = InAy(i, c)
Next c
r = r + 1
End If
Next
RemoveDupes = OutAy
End Function
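The key-to-output-row trick described above is not specific to VBA. Here is a hedged Python sketch of the same idea, using the sample rows from the question (Param1, Param2, Param18, Quantity, Weight); `stack_duplicates`, `key_cols` and `sum_cols` are names chosen for this illustration. One deliberate difference: the key is a tuple of the column values rather than a concatenated string, so that values such as "ab" + "c" and "a" + "bc" cannot collide.

```python
def stack_duplicates(rows, key_cols, sum_cols):
    """Merge rows that agree on key_cols, summing the columns listed in sum_cols."""
    out = []
    row_of = {}  # key tuple -> index of that key's row in `out`
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in row_of:
            target = out[row_of[key]]  # row written when the key was first seen
            for c in sum_cols:
                target[c] += row[c]
        else:
            row_of[key] = len(out)
            out.append(list(row))  # copy, so the input rows are left untouched
    return out


data = [
    ["A", 1, "C", 5, 12.5],
    ["A", 1, "C", 2, 5.0],
    ["A", 1, "C", 3, 7.5],
    ["B", 2, "C", 1, 2.3],
    ["B", 2, "C", 2, 4.6],
]
for row in stack_duplicates(data, key_cols=[0, 1, 2], sum_cols=[3, 4]):
    print(row)
```

Each input row is visited once and each dictionary lookup is close to constant time, which is why this approach scales much better than comparing every row against every other row.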

creating a joined matrix from a list of data in vba (like outer join)

I have 2 columns that look like:
field group1
a 1.2
b 0.2
c 2.4
field group2
a 0.2
c 0.8
field group3
c 0.6
d 0.8
and so forth. I have been pondering about this for a while but can't seem to find a good way.
Is there an efficient way to make the dataset look like:
|field|group1|group2|group3|
|a    |1.2   |0.2   |      |
|b    |0.2   |      |      |
|c    |2.4   |0.8   |0.6   |
|d    |      |      |0.8   |
and so forth. Any help or idea?
For a one-off, you can probably do it just with formulae to identify which groups a row is in and then pivot, as described by others in the comments to your question.
However, for repeated use / less hassle the below should work.
This works on your test data and outputs on a new sheet according to your desired output in the question.
It works in memory so it should have good performance when scaled up to thousands of cells.
Sub blah()
'Declarations
Dim outWs As Worksheet
Dim inArr, outArr
Dim vector(), groups()
Dim outC As Collection
Dim currentGroup As Long
Dim i As Long, j As Long
Dim key
'load data
inArr = Selection.Value
Set outC = New Collection
'iterate through
For i = LBound(inArr, 1) To UBound(inArr, 1)
If inArr(i, LBound(inArr, 2)) Like "field*" Then 'new group
currentGroup = currentGroup + 1
ReDim Preserve groups(1 To currentGroup)
groups(currentGroup) = inArr(i, LBound(inArr, 2) + 1) 'assign group name
Else 'is a record/field
key = inArr(i, LBound(inArr, 2))
'retrieve existing, ignoring the exception thrown if key does not exist
On Error Resume Next
vector = outC(key)
If Err.Number = 5 Then 'error raised when key does not exist
ReDim vector(0 To currentGroup)
vector(0) = key 'add key
Else
outC.Remove (key) 'the reference of item is immutable so we must remove and add again
ReDim Preserve vector(0 To currentGroup) 'resize vector
End If
On Error GoTo 0
vector(currentGroup) = inArr(i, LBound(inArr, 2) + 1) 'assign value to current group in vector
outC.Add vector, key 'add to results
Erase vector
End If
Next i
'Process our results collection into an array suitable for dumping to a sheet
ReDim outArr(1 To outC.Count, 1 To currentGroup + 1)
For i = 1 To outC.Count
For j = 0 To UBound(outC(i))
outArr(i, j + LBound(outArr, 2) - LBound(outC(i))) = outC(i)(j)
Next j
Next i
'dump data
With ActiveWorkbook.Worksheets.Add
.Range(.Cells(1, 2), .Cells(1, 1 + UBound(groups))).Value = groups
.Range(.Cells(2, 1), .Cells(1 + UBound(outArr, 1), UBound(outArr, 2))).Value = outArr
End With
Exit Sub
End Sub
I hope that helps.
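For comparison, the same reshaping can be sketched in a few lines of Python, which may help when checking the macro's output. This is only an illustration under the assumption that every block starts with a row whose first cell is exactly "field"; `join_groups` and `stacked` are made-up names, and missing combinations are filled with an empty string.

```python
def join_groups(rows):
    """Pivot stacked (field, value) blocks into one row per field."""
    groups = []  # group names in order of appearance
    table = {}   # field -> {group name: value}
    for field, value in rows:
        if field == "field":  # a header row starts a new group
            groups.append(value)
        else:
            table.setdefault(field, {})[groups[-1]] = value
    header = ["field"] + groups
    body = [[f] + [vals.get(g, "") for g in groups] for f, vals in table.items()]
    return [header] + body


stacked = [
    ("field", "group1"), ("a", 1.2), ("b", 0.2), ("c", 2.4),
    ("field", "group2"), ("a", 0.2), ("c", 0.8),
    ("field", "group3"), ("c", 0.6), ("d", 0.8),
]
for line in join_groups(stacked):
    print(line)
```

Fields appear in the order they are first seen, so "d" ends up last even though it only occurs in group3.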
So I have an idea. It's not beautiful, but it will probably work...
Copy your whole field column and paste it into a fresh sheet, then use the Data tab and hit Remove Duplicates. If you transpose that so your top row is Field, a, b, c, d, you can drop in a formula that's something like this (untested): "=INDEX(Sheet1!B:B, MATCH($B$1,Sheet1!A1:A3,0))"
The search range in MATCH is intentionally small and left without $ so that if you drag this formula down it will search a little further (A2:A4, A3:A5, etc.). Once you have all of them, just find/replace all the #N/As, remove the blanks, and you're good.
If I have time I will try to put together a little macro that would be a lot cleaner...
In outline: Create a copy of your group1 column, filter it for values greater than 0 and delete these. Fill the blanks with the respective groups and then pivot.
i would rearrange data first, with a macro, this way:
Sub sa()
For Each cl In Range("B2:B1000").Cells
If IsNumeric(cl.Value) And Not IsEmpty(cl.Value) Then
If Not IsNumeric(cl.Offset(-1, 0).Value) Then
cl.Offset(0, 1).Value = cl.Offset(-1, 0).Value
Else
cl.Offset(0, 1).Value = cl.Offset(-1, 1).Value
End If
End If
Next
End Sub
such that data would be rearranged with this column assignment:
[field] [value] [group]
Then it would be easy to do what you want: just create a pivot table. Tell me in the comments if you need further help.
