I don't know the proper term for this situation, so I made one up.
As an example, suppose there are two columns:
Col A Col B
----------------------
| date 1 | 2020-02-03|
| date 2 | 2020-03-12|
| date 3 | 2020-04-25|
======================
I'd like a function that generates the following result:
Col C
----------------------
| date 1 = 2020-02-03|
| date 1 = 2020-03-12|
| date 1 = 2020-04-25|
| date 2 = 2020-02-03|
| date 2 = 2020-03-12|
| date 2 = 2020-04-25|
| date 3 = 2020-02-03|
| date 3 = 2020-03-12|
| date 3 = 2020-04-25|
======================
It's like a concatenated union of every row in one column with every value in the other, with the option to add an extra string in between (" = ", for example).
Add column headers to your first table. (My existing code requires a header row and I'm too lazy to modify it, so please bear with me.)
Change the value of paraVal in the Sub below to the range of your first table (including the headers) and run it. The combinations are written directly below your first table, so make sure there is enough empty space for the result. Then combine the resulting columns however you like (e.g. TEXTJOIN, CONCAT, ...).
Sub CombinationTable()
    Dim paraVal As Range
    Dim paraInfo() As Long
    Dim rowTtl As Long
    Dim colIdx As Long
    Dim repIdx As Long
    Dim colIdxG As Long
    Dim rowIdxG As Long
    Dim rowStartG As Long
    Dim rowEndG As Long
    Dim colCell1 As Range
    Dim colCellN As Range
    Dim repeat As Range
    Set paraVal = Range("F4", "G7")
    On Error GoTo 0
    With paraVal
        rowTtl = 1
        ReDim paraInfo(1 To .Columns.Count)
        For colIdx = 1 To .Columns.Count
            If IsEmpty(.Cells(.Rows.Count - 1, colIdx)) Then ' .rows.count-1 = 1st value
                paraInfo(colIdx) = 0
            Else
                Set colCellN = .Cells(.Rows.Count, colIdx)
                Set colCell1 = colCellN.End(xlUp)
                paraInfo(colIdx) = colCellN.Row - colCell1.Row ' no +1 bcoz last row is header, not value
                rowTtl = rowTtl * paraInfo(colIdx)
            End If
        Next colIdx
        rowStartG = .Row + .Rows.Count
        rowEndG = rowStartG + rowTtl - 1
        For colIdx = 1 To .Columns.Count
            If paraInfo(colIdx) > 0 Then
                rowTtl = rowTtl / paraInfo(colIdx)
                rowIdxG = rowStartG
                colIdxG = .Columns(colIdx).Column
                Set colCellN = .Columns(colIdx).Cells(.Columns(colIdx).Rows.Count).Offset(-paraInfo(colIdx)).Resize(paraInfo(colIdx))
                For Each colCell1 In colCellN.Cells
                    ActiveSheet.Range(ActiveSheet.Cells(rowIdxG, colIdxG), ActiveSheet.Cells(rowIdxG + rowTtl - 1, colIdxG)).Value = colCell1.Value
                    rowIdxG = rowIdxG + rowTtl
                Next colCell1
                Set repeat = ActiveSheet.Range(ActiveSheet.Cells(rowStartG, colIdxG), ActiveSheet.Cells(rowIdxG - 1, colIdxG))
                If colIdx > 1 Then
                    repeat.Copy Destination:=ActiveSheet.Range(ActiveSheet.Cells(rowIdxG, colIdxG), ActiveSheet.Cells(rowEndG, colIdxG))
                End If
            End If
        Next colIdx
    End With
End Sub
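If only the single concatenated column from the question is needed, two nested loops over in-memory arrays are enough. The following is a minimal sketch, not the answer's method: CrossJoin, CrossJoinDemo, the ranges A2:A4 / B2:B4 / D2 and the "yyyy-mm-dd" date format are all assumptions made for illustration.

```vba
Option Explicit

' Hypothetical helper: pairs every label with every value and returns the
' results as a one-column 2-D array that can be written straight to a range.
Function CrossJoin(labels As Variant, vals As Variant, _
                   Optional sep As String = " = ") As Variant
    Dim out() As Variant
    Dim v As Variant
    Dim i As Long, j As Long, n As Long

    ReDim out(1 To UBound(labels, 1) * UBound(vals, 1), 1 To 1)
    For i = 1 To UBound(labels, 1)            ' outer loop: Col A
        For j = 1 To UBound(vals, 1)          ' inner loop: Col B
            v = vals(j, 1)
            ' Assumed format for cells that hold true dates
            If VarType(v) = vbDate Then v = Format$(v, "yyyy-mm-dd")
            n = n + 1
            out(n, 1) = labels(i, 1) & sep & v
        Next j
    Next i
    CrossJoin = out
End Function

Sub CrossJoinDemo()
    Dim res As Variant
    ' Assumed layout: labels in A2:A4, dates in B2:B4, output from D2 down
    res = CrossJoin(Range("A2:A4").Value, Range("B2:B4").Value)
    Range("D2").Resize(UBound(res, 1), 1).Value = res
End Sub
```

Because the inputs are read with .Value from multi-cell ranges, both arguments arrive as 1-based 2-D arrays, which is what the function expects.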
I have a dataset in Excel consisting of 20 columns and a varying number of rows, ranging from 20,000 - 50,000.
Each row is a collection of items with one column denoting the quantity of items in the collection and another column denoting the total combined weight of the collection.
Some of these rows are completely identical in all columns, while some are identical in all but quantity and weight.
I want to create a macro that runs through the dataset and "stacks" rows that are duplicate on all other parameters than quantity and weight, and sums these two up.
In other words, a macro that converts this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 5 | 12.5 |
| A | 1 |...| C | 2 | 5.0 |
| A | 1 |...| C | 3 | 7.5 |
| B | 2 |...| C | 1 | 2.3 |
| B | 2 |...| C | 2 | 4.6 |
To this:
|Param1|Param2|...|Param18|Quantity|Weight|
| A | 1 |...| C | 10 | 25.0 |
| B | 2 |...| C | 3 | 6.9 |
I know this is possible to do in a simple pivot table, but for a number of reasons this is not viable in this case.
Since I'm dealing with a large dataset, I want to load it all into memory at once rather than reading and writing line by line to speed up performance (as suggested in tip #13 in this great article http://www.databison.com/how-to-speed-up-calculation-and-improve-performance-of-excel-and-vba/). However I'm stuck as to how to make the row operations on the data stored in memory.
So far my code looks like this:
Dim r As Long, i As Long, c As Long, LastRow As Long 'Long, because Integer overflows above 32,767 rows
Dim temp_range As Variant
LastRow = Sheets("1.1").Cells(Sheets("1.1").Rows.Count, 2).End(xlUp).Row
'Load the data set into memory
temp_range = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 20)).Value
'Run through the data set from bottom to top and bulk identical rows together
For r = UBound(temp_range) To LBound(temp_range) Step -1
    For i = r - 1 To LBound(temp_range) Step -1
        'PSEUDO CODE START HERE
        If row temp_range(r) = row temp_range(i) Then
            temp_range(i,19) = temp_range(r,19) + temp_range(i,19)
            temp_range(i,20) = temp_range(r,20) + temp_range(i,20)
            Delete row temp_range(r)
            Exit For
        End if
        'PSEUDO CODE END HERE
    Next i
Next r
I'm stuck at the pseudo-code section marked in the code. I simply don't know how to compare the rows, add the quantity and weight of one row to another, and then delete the duplicate row in the variant that holds the range in memory.
Use Microsoft Query (SQL) in Excel:
Data manipulation like this is ideal for SQL queries. There is no need to go through the data line by line in VBA:
SELECT S1.Param1, S1.Param2, S1.Param18, SUM(S1.Quantity), SUM(S1.Weight)
FROM [Sheet1$] AS S1 GROUP BY Param1, Param2,Param18
This runs quickly and efficiently via OLE DB. VBA code that loops over the rows will generally be much less efficient.
To refresh the query at any time simply run the following code from VBA:
Set ws = ActiveSheet
ws.QueryTables(1).Refresh BackgroundQuery:=False
Here ws is the worksheet that holds the query table.
Either use the Microsoft Query from Data->From other sources->From Microsoft Query or feel free to use my Add-In: http://www.analystcave.com/excel-tools/excel-sql-add-in-free/
See if you can F8 through this. You will need to go into the VBE's Tools ► References and add Microsoft Scripting Runtime for the use of the Scripting.Dictionary object.
Sub Stack_Dupes()
    Dim r As Long, c As Long, v As Long, k As Long
    Dim vKEYs As Variant, vITMs As Variant, vTMP1 As Variant, vTMP2 As Variant
    Dim sKEY As String, sITM As String
    Dim dITMs As New Scripting.dictionary
    With ActiveSheet
        With .Cells(1, 1).CurrentRegion
            .Rows(1).Cells.Copy _
                Destination:=.Cells(.Rows.Count + 3, 1)
            With .Offset(1, 0).Resize(.Rows.Count - 1, .Columns.Count)
                vKEYs = .Cells.Resize(.Rows.Count, .Columns.Count - 2).Value2
                vITMs = .Cells.Offset(0, .Columns.Count - 2).Resize(.Rows.Count, 2).Value2
                For r = LBound(vKEYs, 1) To UBound(vKEYs, 1)
                    sKEY = Join(Application.Index(vKEYs, r, 0), ChrW(8203))
                    sITM = Join(Application.Index(vITMs, r, 0), ChrW(8203))
                    If dITMs.Exists(sKEY) Then
                        vTMP1 = Split(dITMs.Item(sKEY), ChrW(8203))
                        vTMP2 = Split(sITM, ChrW(8203))
                        vTMP1(0) = CDbl(vTMP1(0)) + CDbl(vTMP2(0))
                        vTMP1(1) = CDbl(vTMP1(1)) + CDbl(vTMP2(1))
                        sITM = Join(vTMP1, ChrW(8203))
                        dITMs.Item(sKEY) = sITM
                    Else
                        dITMs.Add Key:=sKEY, Item:=sITM
                    End If
                Next r
                ReDim vKEYs(1 To (.Columns.Count - 2), 1 To dITMs.Count)
                ReDim vITMs(1 To 2, 1 To dITMs.Count)
                For Each vTMP1 In dITMs.Keys
                    k = k + 1
                    vTMP2 = Split(vTMP1, ChrW(8203))
                    For v = LBound(vTMP2) To UBound(vTMP2)
                        vKEYs(v + 1, k) = vTMP2(v)
                    Next v
                    vTMP2 = Split(dITMs.Item(vTMP1), ChrW(8203))
                    For v = LBound(vTMP2) To UBound(vTMP2)
                        vITMs(v + 1, k) = vTMP2(v)
                    Next v
                Next vTMP1
                .Cells.Offset(.Rows.Count + 3, 0).Resize(UBound(vKEYs, 2), .Columns.Count - 2) = Application.Transpose(vKEYs)
                .Cells.Offset(.Rows.Count + 3, .Columns.Count - 2).Resize(UBound(vITMs, 2), 2) = Application.Transpose(vITMs)
            End With
        End With
    End With
    dITMs.RemoveAll: Set dITMs = Nothing
End Sub
Results are written underneath the sample data, three rows below the last data row.
I would be interested in the timed results on larger data sets if you find time to post them back here.
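One simple way to collect such timings is VBA's built-in Timer function. The sketch below is only an illustration: TimeMacro is a hypothetical helper name, and it assumes the macro to be measured can be called by name through Application.Run and that the run does not cross midnight (Timer resets to zero then).

```vba
' Hypothetical helper: runs a macro by name and returns the elapsed seconds.
Function TimeMacro(macroName As String) As Double
    Dim t As Double
    t = Timer                       ' seconds elapsed since midnight
    Application.Run macroName       ' e.g. "Stack_Dupes"
    TimeMacro = Timer - t
End Function

Sub TimeStackDupes()
    Debug.Print "Stack_Dupes took " & _
                Format$(TimeMacro("Stack_Dupes"), "0.00") & " s"
End Sub
```

The result is printed to the Immediate window (Ctrl+G in the VBE).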
I found this article (http://sitestory.dk/excel_vba/arrays-and-ranges.htm, go to middle of the page) on how to delete identical rows that I built some code on. It doesn't solve my original question 100% since it doesn't compare entire rows in the array but rather each column separately in each row, but it turned out to have quite good performance nonetheless.
Here's the code:
'I couldn't get the final step of pasting the output array into the new sheet working properly without declaring this option, otherwise the data would be pasted one cell to the right and below where I wanted it.
Option Base 1
Dim r As Long, i As Long, c As Long, LastRow As Long, DeletedRows As Long 'Long, because Integer overflows above 32,767
Dim input_array As Variant, output_array As Variant
Dim identical As Boolean
Dim s As Worksheet
Dim NewRange As Range
LastRow = Sheets("1.1").Cells(Sheets("1.1").Rows.Count, 2).End(xlUp).Row
'Load the data set into memory, consisting of 20 columns of data and a 21st column with no data that is used for marking rows for deletion.
input_array = Sheets("1.1").Range(Sheets("1.1").Cells(2, 1), Sheets("1.1").Cells(LastRow, 21)).Value
DeletedRows = 0
'Run through the data set from bottom to top comparing rows one at a time, copy Quantity and Weight values and mark rows for deletion
For r = UBound(input_array) To 2 Step -1
    'Compare against every row above r, including the first array row
    For i = r - 1 To 1 Step -1
        'Assume row r and i are identical
        identical = True
        'Run through columns of r and i, flag if non-identical value is found and stop the loop (col 18 is ignored, since this is the "Quantity" column, while col 20 is the "Weight" column)
        For c = 1 To 18
            If input_array(r, c) <> input_array(i, c) And c <> 18 Then
                identical = False
                Exit For
            End If
        Next c
        ' If no non-identical columns were found, add "Quantity" and "Weight" from row r to row i and mark row r for deletion
        If identical Then
            input_array(i, 18) = input_array(i, 18) + input_array(r, 18)
            input_array(i, 20) = input_array(i, 20) + input_array(r, 20)
            input_array(r, 21) = "_DELETE_"
            DeletedRows = DeletedRows + 1
            Exit For
        End If
    Next i
Next r
' Resize the new array to the size of the old array minus the number of deleted rows
ReDim output_array(UBound(input_array) - DeletedRows, 20)
' Copy rows not marked as deleted from old array to new array
i = 1
For r = 1 To UBound(input_array)
    If input_array(r, 21) <> "_DELETE_" Then
        For c = 1 To 20
            output_array(i, c) = input_array(r, c)
        Next c
        i = i + 1
    End If
Next r
' Create a new sheet and write the output array to it
Set s = Sheets.Add
Set NewRange = s.Range("A2").Resize(UBound(output_array), 20)
NewRange.Value = output_array
The macro takes about 30 seconds to reduce 20,000 rows to 3,000 on my computer, which I think is quite good considering the large amounts of data.
This can be done in milliseconds, not seconds. I'm also using arrays and a dictionary object, but in a simpler implementation. It's faster than using SQL against the sheet. It can compare any number of columns; just make sure they are all included in the KeyIn variable as a concatenated string. To keep the function simple, I assume the value to sum is in column 4. You can adjust the code for multiple values in other columns.
I write from one array to another (InAy to OutAy), and the dictionary determines whether a row already exists. The magic happens in the dictionary's Item property: when a new OutAy row is written, I store its row number (r) as the item for that key. When the key already exists, I retrieve the row where it was written to OutAy with d.Item(KeyIn) and update OutAy(r, 4) with the sum of the existing value and the new value, KeyVal.
This produces the same result as the SQL aggregate query "Select a, b, c, sum(d) from data group by a, b, c".
Note: add a reference to Microsoft Scripting Runtime (Tools -> References).
Sub somesub()
    ...
    data = Range("WhereYourDataIs").Value 'create data array
    Range("WhereYourDataIs").Clear 'assumes you'll output to same location
    data = RemoveDupes(data) 'remove dupes and sum values
    Range("A2").Resize(UBound(data), UBound(data, 2)).Value = data 'A2 assumes your data headers begin in row 1, column 1
    ...
End Sub
Function RemoveDupes(InAy As Variant) As Variant
    Dim d As Scripting.Dictionary
    Dim OutAy() As Variant
    Dim KeyIn As String, KeyVal As Variant
    Dim i As Long, c As Long, r As Long
    Set d = New Scripting.Dictionary
    ReDim OutAy(1 To UBound(InAy), 1 To 4)
    r = 1
    For i = 1 To UBound(InAy)
        KeyIn = ""
        KeyVal = InAy(i, 4) 'the value field to sum/aggregate if exists
        For c = 1 To 3 'a, b, c metadata to roll up
            KeyIn = KeyIn & InAy(i, c) & "|" 'the delimiter keeps "ab" & "c" distinct from "a" & "bc"
        Next c
        If d.Exists(KeyIn) Then
            OutAy(d.Item(KeyIn), 4) = OutAy(d.Item(KeyIn), 4) + KeyVal 'd.Item(KeyIn) is r, set when OutAy row was added. Same as OutAy(r,4)=OutAy(r,4) + KeyVal
        Else
            d.Add KeyIn, r 'r is set as the item value referencing the row of the OutAy when it was first added. The reference is used when .Exists is true
            For c = 1 To 4
                OutAy(r, c) = InAy(i, c)
            Next c
            r = r + 1
        End If
    Next i
    RemoveDupes = OutAy
End Function
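The remark that the code can be adjusted for multiple value columns can be sketched as follows. This is an illustration under assumptions, not the answerer's code: StackRows and its keyCols parameter are hypothetical names, the key columns are assumed to come first with every remaining column summed, and the input is assumed to be a 1-based 2-D array with at least one row. It uses late binding, so no library reference is needed.

```vba
' Hypothetical generalisation: the first keyCols columns form the key and
' all remaining columns are summed per key.
Function StackRows(inAy As Variant, keyCols As Long) As Variant
    Dim d As Object
    Dim tmp() As Variant, outAy() As Variant
    Dim i As Long, c As Long, r As Long, n As Long
    Dim nCols As Long, key As String

    Set d = CreateObject("Scripting.Dictionary")   ' late bound
    nCols = UBound(inAy, 2)
    ReDim tmp(1 To UBound(inAy, 1), 1 To nCols)

    For i = 1 To UBound(inAy, 1)
        key = ""
        For c = 1 To keyCols
            ' The separator keeps "ab" & "c" distinct from "a" & "bc"
            key = key & inAy(i, c) & vbNullChar
        Next c
        If d.Exists(key) Then
            r = d(key)                             ' row already written
            For c = keyCols + 1 To nCols
                tmp(r, c) = tmp(r, c) + inAy(i, c)
            Next c
        Else
            n = n + 1
            d.Add key, n                           ' remember the output row
            For c = 1 To nCols
                tmp(n, c) = inAy(i, c)
            Next c
        End If
    Next i

    ' Copy only the n rows that were actually used
    ReDim outAy(1 To n, 1 To nCols)
    For r = 1 To n
        For c = 1 To nCols
            outAy(r, c) = tmp(r, c)
        Next c
    Next r
    StackRows = outAy
End Function
```

For the 20-column layout in the question (18 parameters, then Quantity and Weight) the call would look like StackRows(data, 18), where data is the array read from the sheet.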
I have an Excel sheet that looks like this:
3 | latitude | 46.142737
3 | longitude| -57.608968
8 | latitude | 43.142737
8 | longitude| -52.608968
15 | latitude | 41.142737
15 | longitude| -59.608968
I need the end result to look like this:
3 | 46.142737, -57.608968
8 | 43.142737, -52.608968
15 | 41.142737, -59.608968
It can be concatenated based on every other row, or based on the integer value in the first column.
VBA suggestions? Thank you.
Edit: There is no actual "|" in my Excel sheet. The "|" is meant to be a visual cue representing a new column.
You could read the data into an array and then write that to a range
Code:
Sub Example()
    Dim i As Long
    Dim x As Long
    Dim arry As Variant
    ReDim arry(1 To 2, 1 To 1) As Variant
    For i = 1 To ActiveSheet.UsedRange.Rows.Count
        If Cells(i, 1).Row Mod 2 = 1 Then
            x = x + 1
            ReDim Preserve arry(1 To 2, 1 To x) As Variant
            arry(1, x) = Cells(i, 1).Value
            arry(2, x) = Cells(i, 3).Value & ", " & Cells(i + 1, 3).Value
        End If
    Next
    arry = WorksheetFunction.Transpose(arry)
    Sheets("Sheet2").Select
    Range(Cells(1, 1), Cells(UBound(arry), UBound(arry, 2))).Value = arry
End Sub
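The question also allows grouping by the integer in the first column instead of by every other row. A minimal sketch of that variant, using a dictionary keyed on the ID, could look like this. JoinById and JoinByIdDemo are hypothetical names; the sketch assumes the data starts in A1 with no header row, that the value sits in column 3, and that each latitude row appears before its longitude row.

```vba
' Hypothetical helper: joins the column-3 values of all rows that share the
' same ID in column 1 and returns a two-column array (ID, joined values).
Function JoinById(data As Variant, Optional sep As String = ", ") As Variant
    Dim d As Object
    Dim out() As Variant
    Dim k As Variant
    Dim i As Long, n As Long

    Set d = CreateObject("Scripting.Dictionary")   ' keeps insertion order
    For i = 1 To UBound(data, 1)
        If d.Exists(data(i, 1)) Then
            d(data(i, 1)) = d(data(i, 1)) & sep & data(i, 3)
        Else
            d.Add data(i, 1), CStr(data(i, 3))
        End If
    Next i

    ReDim out(1 To d.Count, 1 To 2)
    For Each k In d.Keys
        n = n + 1
        out(n, 1) = k
        out(n, 2) = d(k)
    Next k
    JoinById = out
End Function

Sub JoinByIdDemo()
    Dim res As Variant
    res = JoinById(ActiveSheet.Range("A1").CurrentRegion.Value)
    ' Sheet2 is used as the output sheet, as in the answer above
    Worksheets("Sheet2").Range("A1").Resize(UBound(res, 1), 2).Value = res
End Sub
```

Unlike the row-parity approach, this does not depend on the rows of one ID being adjacent.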
I have the following values in a spreadsheet:
Printer Name | Pages | Copies
HP2300 | 2 | 1
HP2300 | 5 | 1
Laser1 | 2 | 2
Laser1 | 3 | 4
HP2300 | 1 | 1
How can I get the total number of pages printed (pages * copies) on each printer like this:
Printer Name | TotalPages |
HP2300 | 8 |
Laser1 | 16 |
I managed to create a list counting the number of times a printer was used to print:
Sub UniquePrints()
    Application.ScreenUpdating = False
    Dim Dict As Object
    Set Dict = CreateObject("scripting.dictionary")
    Dim varray As Variant, element As Variant
    varray = Sheets("Prints").Range("E:E").Value
    For Each element In varray
        If Dict.exists(element) Then
            Dict.Item(element) = Dict.Item(element) + 1
        Else
            Dict.Add element, 1
        End If
    Next
    Sheets("Stats").Range("D6").Resize(Dict.Count, 1).Value = _
        WorksheetFunction.Transpose(Dict.keys)
    Sheets("Stats").Range("E6").Resize(Dict.Count, 1).Value = _
        WorksheetFunction.Transpose(Dict.items)
    Application.ScreenUpdating = True
End Sub
How can I calculate the total pages for each print (row) (pages*copies) and save that in the dictionary instead of just adding 1?
Thank you for your help
Read in the columns E:G rather than just E and use the second dimension of that array to add pages * copies, rather than adding 1.
Sub UniquePrints()
    Dim Dict As Object
    Dim vaPrinters As Variant
    Dim i As Long
    Set Dict = CreateObject("scripting.dictionary")
    vaPrinters = Sheets("Prints").Range("E2:G6").Value
    For i = LBound(vaPrinters, 1) To UBound(vaPrinters, 1)
        If Dict.exists(vaPrinters(i, 1)) Then
            Dict.Item(vaPrinters(i, 1)) = Dict.Item(vaPrinters(i, 1)) + (vaPrinters(i, 2) * vaPrinters(i, 3))
        Else
            Dict.Add vaPrinters(i, 1), vaPrinters(i, 2) * vaPrinters(i, 3)
        End If
    Next i
    Sheets("Stats").Range("D6").Resize(Dict.Count, 1).Value = _
        WorksheetFunction.Transpose(Dict.keys)
    Sheets("Stats").Range("E6").Resize(Dict.Count, 1).Value = _
        WorksheetFunction.Transpose(Dict.items)
End Sub
It's possible to use an array formula to get cells populated:
={SUMPRODUCT(IF($A$2:$A$6=$F2;1;0);$B$2:$B$6;$C$2:$C$6)}
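Enter the formula in the formula bar and confirm it with Ctrl+Shift+Enter. The curly braces are inserted by Excel, not typed by the user. The formula can then be copied to other cells. Depending on your regional settings, the argument separator may be a comma instead of a semicolon.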
I've got data that looks like this:
BOB | 4
BOB | 3
BOB | 7
MARY | 1
JOE | 2
JOE | 1
MIKE | 6
I want to end up with data that looks like this:
BOB | 4 | 3 | 7
MARY | 1 | |
JOE | 2 | 1 |
MIKE | 6 | |
The problem is, how do I account for the variable number of times a name shows up?
I came up with the following code. It feels like it could be cleaner.
This will work for any selected block of data on your sheet (assuming it is pre-sorted). It outputs on the same sheet in the same area.
Sub WrapDuplicates()
    Dim data(), i As Long, startCell As Range, rwCnt As Long, col As Long
    data = Selection.Value 'pull selected data into an array
    Set startCell = Selection.Cells(1, 1) 'get reference to write results to
    Selection.ClearContents 'remove original data
    startCell = data(1, 1) 'output first name
    startCell.Offset(0, 1) = data(1, 2) 'output first value
    rwCnt = 0
    col = 2
    For i = 2 To UBound(data) 'loop through array, check if name is same or not and output accordingly
        If data(i, 1) = data(i - 1, 1) Then
            startCell.Offset(rwCnt, col) = data(i, 2)
            col = col + 1
        Else
            rwCnt = rwCnt + 1
            col = 2
            startCell.Offset(rwCnt, 0) = data(i, 1)
            startCell.Offset(rwCnt, 1) = data(i, 2)
        End If
    Next i
End Sub
I'm assuming you want to do this in code based on the excel-vba tag in your post.
I'm also assuming the data is sorted by name, or you are OK with sorting it by name before the code executes.
Source is in sheet 1, target is in sheet 2. Code is in Excel VBA. I tested with your sample data, dropping this subroutine in the ThisWorkbook section of the Excel codebehind and pressing play.
The target header gets rewritten every time, which isn't ideal from a performance perspective, but I don't think it is a problem otherwise. If it does become a problem, you could wrap it in an If statement that checks whether the target column index is 2.
Sub ColumnsToRows()
    Dim rowHeading
    Dim previousRowHeading
    Dim sourceRowIndex
    Dim targetRowIndex
    Dim targetColumnIndex
    sourceRowIndex = 1
    targetRowIndex = 1
    targetColumnIndex = 2
    rowHeading = Sheet1.Cells(sourceRowIndex, 1)
    previousRowHeading = rowHeading
    While Not rowHeading = ""
        If Not previousRowHeading = rowHeading Then
            targetRowIndex = targetRowIndex + 1
            targetColumnIndex = 2
        End If
        Sheet2.Cells(targetRowIndex, 1) = rowHeading
        Sheet2.Cells(targetRowIndex, targetColumnIndex) = Sheet1.Cells(sourceRowIndex, 2)
        previousRowHeading = rowHeading
        sourceRowIndex = sourceRowIndex + 1
        targetColumnIndex = targetColumnIndex + 1
        rowHeading = Sheet1.Cells(sourceRowIndex, 1)
    Wend
End Sub
I'm a developer, not an Excel guru. There may be some Excel function, pivot table, or some other Excel magic that does this for you automatically.
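Both answers above assume the data is sorted by name. If sorting is not an option, a dictionary can remember which output row each name was given. The sketch below is one possible approach under assumptions, not either answerer's code: WrapByName and WrapByNameDemo are hypothetical names, the input is a 1-based two-column array (name, value), and names are compared case-sensitively.

```vba
' Hypothetical helper: one output row per distinct name, with that name's
' values spread across the following columns in order of appearance.
Function WrapByName(data As Variant) As Variant
    Dim rowOf As Object, cnt As Object
    Dim out() As Variant
    Dim i As Long, r As Long, maxCnt As Long
    Dim nm As String

    Set rowOf = CreateObject("Scripting.Dictionary")  ' name -> output row
    Set cnt = CreateObject("Scripting.Dictionary")    ' name -> values seen

    ' Pass 1: give each new name the next output row and find the widest row
    For i = 1 To UBound(data, 1)
        nm = CStr(data(i, 1))
        If Not rowOf.Exists(nm) Then rowOf.Add nm, rowOf.Count + 1
        cnt(nm) = cnt(nm) + 1
        If cnt(nm) > maxCnt Then maxCnt = cnt(nm)
    Next i

    ReDim out(1 To rowOf.Count, 1 To maxCnt + 1)
    cnt.RemoveAll

    ' Pass 2: put each value into the next free column of its name's row
    For i = 1 To UBound(data, 1)
        nm = CStr(data(i, 1))
        r = rowOf(nm)
        cnt(nm) = cnt(nm) + 1
        out(r, 1) = data(i, 1)
        out(r, cnt(nm) + 1) = data(i, 2)
    Next i
    WrapByName = out
End Function

Sub WrapByNameDemo()
    Dim res As Variant
    ' Assumes the two source columns are selected; writes to Sheet2 from A1
    res = WrapByName(Selection.Value)
    Sheet2.Range("A1").Resize(UBound(res, 1), UBound(res, 2)).Value = res
End Sub
```

Unused cells in shorter rows stay Empty, so they appear blank when the array is written to the sheet.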