Combining duplicate entries with unique data in Excel - excel

I have an Excel database and I'm trying to avoid manually combining duplicate data. I've got a bunch of listings that are essentially the same aside from the tags column. What I'd like to do is combine these 5 listings into 1 listing, making the categories a comma-separated list in a single cell.
Turn this
into this
Is there any way of achieving this? My document has a couple thousand listings, so I'm obviously trying to avoid the manual edit route. I'm an Excel novice, so any hand holding or tutorials you could point me to would be appreciated.

This can also be done using formulas. For my example to work, the data would need to be sorted by the first column and there would need to be a header row.
You would need two more columns (C & D). First, add a formula that essentially says to concatenate the data in column B if data in column A is the same as the row above it, otherwise reset the concatenation. The next column would contain a formula to identify the final concatenations so you can sort later.
This is how I would do it with listings and categories in columns A & B (again, the data would need to be sorted by column A and there would need to be a header row):
Here are the results. Now I would copy the entire range and paste values into another sheet. The rows with zero in column D are the ones I'd want to use. Sorting by column D would float them to the top.
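The two helper formulas themselves are not shown above, so here is one possible reconstruction as a small macro, not necessarily the exact formulas the answer used. It assumes a header in row 1, data sorted by column A, listings in column A and categories in column B; the name AddHelperFormulas is made up for this sketch.

```vba
Option Explicit

' Sketch: fill helper columns C and D on the active sheet.
' Assumes a header row, data sorted by column A, listings in A, categories in B.
Sub AddHelperFormulas()
    Dim lastRow As Long
    With ActiveSheet
        lastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
        If lastRow < 2 Then Exit Sub   ' nothing below the header
        ' C: extend the running list while the listing repeats, otherwise restart it
        .Range("C2:C" & lastRow).Formula = "=IF($A2=$A1,C1&"", ""&$B2,$B2)"
        ' D: 0 on the last row of each listing (the complete list), 1 on every other row
        .Range("D2:D" & lastRow).Formula = "=IF($A2=$A3,1,0)"
    End With
End Sub
```

The same two formulas can be typed into C2 and D2 by hand and filled down; the macro only saves that step.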

This will (should) generate a new sheet from your source sheet with the duplicates concatenated.
To use the following code, add it to a new module in the VBA Editor.
A shortcut to open the VBA Editor is Alt+F11 (Windows) or Alt+Fn+F11 (Mac).
Once the editor is open, add a new module by selecting Module from the Insert menu in the main menu bar. It should automatically open the module ready to accept code. If not, select it in the Project Explorer (it will be named "ModuleN", where N is the next available number).
I'm not sure whether "Scripting.Dictionary" is available on OS X, but it can't hurt to try.
Option Explicit

Sub Main()
    Dim Source As Worksheet: Set Source = ThisWorkbook.Worksheets("Sheet1")
    Dim Destination As Worksheet: Set Destination = ThisWorkbook.Worksheets("Sheet2")
    ' Maps each listing (column A) to its row number on the destination sheet
    Dim Records As Object: Set Records = CreateObject("Scripting.Dictionary")
    Dim Data As Variant
    Dim Index As Long
    Dim Row As Long: Row = 1   ' Long rather than Integer, so it cannot overflow past 32,767 rows

    ' Read columns A:B of the used rows into an array in one go
    Data = Source.Range("A1", "B" & Source.Rows(Source.UsedRange.Rows.Count).Row).Value2

    For Index = LBound(Data, 1) To UBound(Data, 1)
        If Records.Exists(Data(Index, 1)) Then
            ' Listing already written: append this tag to its existing cell
            Destination.Cells(Records(Data(Index, 1)), 2).Value2 = Destination.Cells(Records(Data(Index, 1)), 2).Value2 & ", " & Data(Index, 2)
        Else
            ' First time this listing is seen: start a new destination row
            Records.Add Data(Index, 1), Row
            Destination.Cells(Row, 1).Value2 = Data(Index, 1)
            Destination.Cells(Row, 2).Value2 = Data(Index, 2)
            Row = Row + 1
        End If
    Next Index

    Set Records = Nothing
End Sub
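If CreateObject("Scripting.Dictionary") fails on a Mac (as far as I know the Scripting runtime is a Windows component), a built-in Collection can stand in for the dictionary. The following is only a sketch of that idea, not a drop-in replacement: CombineTags and MainWithCollection are names invented here, empty keys are skipped, and Collection keys are compared case-insensitively, so "ABC" and "abc" would be merged.

```vba
Option Explicit

' Sketch: combine the tags of duplicate keys in a 2-D array
' (column 1 = listing, column 2 = tag), using a Collection for the lookup.
' Returns a 2-column array with one row per distinct listing, in order of
' first appearance; rows after the last distinct listing are left Empty.
Function CombineTags(Data As Variant) As Variant
    Dim Positions As New Collection   ' listing -> row number in Result
    Dim Result() As Variant
    Dim i As Long, n As Long, pos As Long
    Dim key As String

    ReDim Result(1 To UBound(Data, 1) - LBound(Data, 1) + 1, 1 To 2)
    For i = LBound(Data, 1) To UBound(Data, 1)
        key = CStr(Data(i, 1))
        If Len(key) > 0 Then
            pos = 0
            On Error Resume Next      ' an unknown key raises an error and leaves pos = 0
            pos = Positions(key)
            On Error GoTo 0
            If pos = 0 Then
                n = n + 1
                Positions.Add n, key
                Result(n, 1) = Data(i, 1)
                Result(n, 2) = Data(i, 2)
            Else
                Result(pos, 2) = Result(pos, 2) & ", " & Data(i, 2)
            End If
        End If
    Next i
    CombineTags = Result
End Function

' Usage sketch with the same sheet names as the macro above
Sub MainWithCollection()
    Dim Source As Worksheet: Set Source = ThisWorkbook.Worksheets("Sheet1")
    Dim Combined As Variant
    Combined = CombineTags(Source.Range("A1", "B" & Source.UsedRange.Rows.Count).Value2)
    ThisWorkbook.Worksheets("Sheet2").Range("A1").Resize(UBound(Combined, 1), 2).Value2 = Combined
End Sub
```

Because the result is built in memory and written back in a single assignment, this version also avoids touching the destination sheet once per source row.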

Related

What is the most effective way to copy cells from one Sheet to another

I am working on a report of reports. That is, I have a main report (sheet 'Rm') that holds the result values from the sub-reports (sheets 'Rs_1' ... 'Rs_n').
Rm takes only 2 columns from each Rs_i, and they are pasted sequentially to the right for each sub-report (Rs) found.
Since there are more than 100,000 cells and more than 100 sub-reports to copy and paste, I would like to make this as fast as possible.
So my question is: what is the fastest way in VBA to copy columns B and F, i.e. Worksheets(Rs_1).Range("B14:B500,F14:F500"), out of Worksheets(Rs_1).Range("A14:F500") into the next two columns of the main report, Worksheets(Rm).Range("E15:F501")?
For Each i_Rs In ActiveWorkbook.Worksheets
If i_Rs.Name = mainReportName Then
'do Nothing on Main Report
Else
'-->take report's Order Nr and Part Nr
i_Rs_Nr = GetNumeric(i_Rs.Range(Rs_NrPosRng).Value)
i_Rs_PartNr = GetNumeric(i_Rs.Range(Rs_ParNrPosRng).Value)
'-->get Rs big Range
Set i_Rs_BigRng = FindStringRng("A:H", "Characteristic", i_Rs.Name)
Set i_Rs_BigRng = Range(i_Rs_BigRng, i_Rs_BigRng.End(xlToRight).End(xlDown))
'-->set the actual range of MainReport to paste value from
For j_Rm = 1 To Rm_BigRng.row Step 2 ' loop on every two columns
Set j_RmRng = Range(Rm_BigRng(1, 1), Rm_BigRng(qtyCharacsRows, 2))
'************************************************
'*** HERE logic to take the sub from each report and copy-paste values
'************************************************
Set j_RmPartNrRng = Range(Rm_BigRng(1, 1), Rm_BigRng(1, 1))
Set j_RmPartNrRng = Cells(j_RmRng.row - 2, j_RmRng.Column + 1)
'-->write part number into j_Rm
j_RmPartNrRng.Value = i_Rs_PartNr
'**** get sub range for each Rs
Set i_Rs_BigRng = Range(i_Rs_BigRng(2, 1), i_Rs_BigRng(qtyCharacsRows + 1, i_Rs_BigRng.Columns.count)) ' Need optimization to erase first row
Set i_Rs_subRng = Union(i_Rs_BigRng.Columns(2), i_Rs_BigRng.Columns(6)) ' need columns in letters
'*** Copy Rs("B:B,F:F").values into Rm("E:F").values
j_RmRng.Value = i_Rs_subRng.Value 'Error: the first column is copying good the second is just a copy of the first
Next j_Rm
End If
Next
If you don't need cell formatting then the fastest way would be to avoid Range.Copy entirely and just directly pass values using arrays.
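When a Range contains more than one cell, Range.Value returns an array of Variant values, which you can write back to a worksheet by assigning it to another range's Value. In this case, since you don't need to manipulate or process the values, you can skip the array and directly do Range1.Value = Range2.Value, which transfers all values as long as both ranges are the same size and each is a single contiguous block.
So as far as I know, the fastest way to transfer the data would be Range1.Value = Range2.Value.
For your code, copy each column separately. Value on a multi-area range such as Range("B14:B500,F14:F500") returns only the first area, which would explain why your second column came out as a copy of the first:
Worksheets(Rm).Range("E15:E501").Value = Worksheets(Rs_1).Range("B14:B500").Value
Worksheets(Rm).Range("F15:F501").Value = Worksheets(Rs_1).Range("F14:F500").Value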
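For non-contiguous source columns, keep in mind that Value on a multi-area range only returns the first area, so each area has to be transferred on its own. A minimal sketch of that, assuming the areas should land side by side starting at one target cell (CopyAreasSideBySide is a made-up helper name, not a built-in):

```vba
Option Explicit

' Sketch: copy the values of each area of Source into consecutive columns,
' starting at the top-left cell of Target. No formatting is copied.
Sub CopyAreasSideBySide(Source As Range, Target As Range)
    Dim area As Range
    Dim offsetCols As Long
    For Each area In Source.Areas
        ' Size the destination block to match this area, then assign values directly
        Target.Cells(1, 1).Offset(0, offsetCols) _
              .Resize(area.Rows.Count, area.Columns.Count).Value = area.Value
        offsetCols = offsetCols + area.Columns.Count
    Next area
End Sub
```

With the names from the question, a call would look like CopyAreasSideBySide Worksheets(Rs_1).Range("B14:B500,F14:F500"), Worksheets(Rm).Range("E15"), with the target cell moved two columns to the right for each further sub-report.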

Selecting rows in a table based on same cell values in a column and looping through the table

I have a Workbook with 2 Sheets. One Sheet is a table that gets updated every month with invoicing information and one sheet is a preset for an invoice.
I have working VBA code that takes a (manual) selection of rows and then copies the relevant information from these rows to the preset invoice in the second sheet. This then gets automatically exported as a PDF and saved to a folder.
This code/module works flawlessly. The selection of rows is based on a transaction number that is in a column. I select (manually) all rows with the same transaction number.
Now instead of manually going through the rows and selecting the relevant rows based on a transaction number and then starting the module, I'd like to automate this process.
What I have so far:
Sub PrintAll()
Dim i As Long, j As Long
Dim Arr
Dim DB As Worksheet
Set DB = Sheets("Data")
'~~> Set Range here
Arr = DB.Range("A2:X400").Value
For i = 1 To UBound(Arr, 1)
For j = 2 To UBound(Arr, 2)
If Arr(i, 7) = Arr(i + 1, 7) Then
Arr(i, j).Select
Call InvoiceCreator.InvoiceCreator
End If
Next j
Next i
End Sub
The idea is to select an array based on an if statement that the values in column 7 are the same and then running the module based on the selected array. This might be completely wrong so I'm asking you guys for your experience regarding this.
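One way to approach this, sketched under the assumption that rows with the same transaction number sit next to each other (i.e. the table is sorted by column G) and that InvoiceCreator works on whatever rows are currently selected: find where each run of equal values ends, select those sheet rows, and call the module once per run. RunEnd and PrintAllGrouped are names invented for this sketch.

```vba
Option Explicit

' Returns the last array row of the run of equal values in column col
' that starts at startRow.
Function RunEnd(Arr As Variant, startRow As Long, col As Long) As Long
    Dim i As Long
    i = startRow
    Do While i < UBound(Arr, 1)
        If Arr(i + 1, col) <> Arr(startRow, col) Then Exit Do
        i = i + 1
    Loop
    RunEnd = i
End Function

Sub PrintAllGrouped()
    Dim DB As Worksheet: Set DB = Sheets("Data")
    Dim Arr As Variant
    Dim first As Long, last As Long

    Arr = DB.Range("A2:X400").Value
    DB.Activate                       ' Select only works on the active sheet
    first = 1
    Do While first <= UBound(Arr, 1)
        last = RunEnd(Arr, first, 7)
        If Len(Arr(first, 7)) > 0 Then    ' skip blank transaction numbers
            ' Array row 1 is sheet row 2, hence the +1
            DB.Rows((first + 1) & ":" & (last + 1)).Select
            Call InvoiceCreator.InvoiceCreator
        End If
        first = last + 1
    Loop
End Sub
```

Unlike Arr(i, j).Select, which cannot work because array elements are plain values rather than cells, this selects real worksheet rows, and it never reads past the last array row.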

Remove columns based on their name (first value)

I'm using macros to quickly search a large table of student data and consolidate it into a single cell for use in seating plans (I'm a teacher). Most of it works but I have a problem with selecting just the data I need.
Steps:
1. Remove data.
2. Run a formula to check if students fit into particular groups and consolidate their information
3. Format
Different subjects and year groups have different layouts for their data and so this step is causing me problems. I've tried using absolute cell references in step 2 but this doesn't work as sometimes the information that should be in column D is in column E etc.
What I want to be able to do is have a macro that checks the first value in the column (i.e. the title) and, if it doesn't match one of a predetermined list, deletes the whole column along with its data.
Dim rng As Range
For Each rng In Range("everything")
    If rng.Value = "Test" Or rng.Value = "Test1" Then
        rng.EntireColumn.Hidden = True
    End If
Next rng
I think I could use something like this if I could change the output from hiding columns to deleting them?
re: What I want to be able to do is have a macro that checks the first value in the column (i.e. the title) and, if it doesn't match one of a predetermined list, deletes the whole column along with its data.
To delete all columns NOT WITHIN the list:
Sub del_cols()
    Dim c As Long, vCOL_LBLs As Variant
    vCOL_LBLs = Array("BCD", "CDE", "DEF")
    With Worksheets("Sheet7") '<~~ set this worksheet reference properly!
        For c = .Cells(1, Columns.Count).End(xlToLeft).Column To 1 Step -1
            If IsError(Application.Match(.Cells(1, c), vCOL_LBLs, 0)) Then
                .Columns(c).Delete
            End If
        Next c
    End With
End Sub
To delete all columns WITHIN the list:
Sub del_cols_in_list() ' renamed so both macros can live in the same module
    Dim v As Long, vCOL_LBLs As Variant
    vCOL_LBLs = Array("BCD", "CDE", "DEF")
    With Worksheets("Sheet7") '<~~ set this worksheet reference properly!
        For v = LBound(vCOL_LBLs) To UBound(vCOL_LBLs)
            ' Keep deleting until this label no longer appears in row 1
            Do While Not IsError(Application.Match(vCOL_LBLs(v), .Rows(1), 0))
                .Cells(1, Application.Match(vCOL_LBLs(v), .Rows(1), 0)).EntireColumn.Delete
            Loop
        Next v
    End With
End Sub

Excel function to search a string for a multiple keywords

I have two tables. One of them has server names. The other has timestamps (first table, column A below) and text strings (first table, column B below). I want to search those strings for the keywords listed in the server table (second table below). If a match is found, the function should write to the cell the name from the header of the column that contains the keyword.
Example
I want to complete column System in Blue table. So for example C2 should show GreenSys and C8 - RedSys.
I have tried using the SEARCH function, but it looks like it tries to match the whole table against the string if I pass the table as an argument. VLOOKUP doesn't work either, as I am using two tables. What's the best way for me to get this working?
If you change the way you have the data setup so that it is a bit more Excel-friendly, this can be rather easily accomplished.
The lookup sheet should look like this (the formula below has this as 'Sheet2'):
Then on your main data sheet, in cell C2 and copied down:
=IF(SUMPRODUCT(COUNTIF(B2,"*"&Sheet2!$A$2:$A$7&"*")),INDEX(Sheet2!B:B,SUMPRODUCT(COUNTIF(B2,"*"&Sheet2!$A$2:$A$7&"*")*ROW(Sheet2!$A$2:$A$7))),"")
The results look like this:
Assuming that all server names start with "Serv", this should work without using VBA:
=MID(B1,SEARCH("Serv",B1,1),IF(ISERROR(SEARCH(" ",B1,SEARCH("Serv",B1,1))),LEN(B1)-SEARCH("Serv",B1,1)+1,SEARCH(" ",B1,SEARCH("Serv",B1,1))-SEARCH("Serv",B1,1)))
Essentially, the formula searches for the keyword "Serv" and then reads up to the next space (or to the end of the string if there is none) to return the full name.
As someone else mentioned, it would be easier to do with vba but then again there is a benefit of not having macros.
Can you try this formula in cell C2?
=IF(SUMPRODUCT((B2=Sheet2!$A$2:$D$4)*COLUMN(Sheet2!$A$1:$D$1))>0,
INDEX(Sheet2!$A$1:$D$1,SUMPRODUCT((B2=Sheet2!$A$2:$D$4)*COLUMN(Sheet2!$A$1:$D$1)))
,"")
I have assumed that the second table is on Sheet2 and that the data runs up to column D, starting with the headers at A1, in the format you describe.
EDIT:
I can see you have amended the original post, and my answer no longer meets the specifications. Therefore I think it is best that I delete it.
EDIT2:
Added VBA solution. Assumptions:
Keyword (database) table in Sheet2
Main data table, which receives the results, in Sheet1
Headers of the Sheet2 table in the 1st row
The below code was tested, it should be OK but needs error handling:
Sub moveData()
Dim rngDestination As Range
Dim lRowCounter As Long, lColCounter As Long, lValueCounter As Long, lLastRow As Long
Dim vOriginArray As Variant, vValuesArray As Variant, vDestinationArray As Variant
' Database table in Sheet2
vOriginArray = Sheet2.UsedRange.Value
' Destination table in Sheet1
With Sheet1
lLastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
' Put the values we need to compare into an array
vValuesArray = .Range(.Cells(2, 2), .Cells(lLastRow, 2)).Value
Set rngDestination = .Range(.Cells(2, 3), .Cells(lLastRow, 3))
End With
' We will store the values to an array first and then
' back to the sheet, it is faster this way
ReDim vDestinationArray(1 To rngDestination.Rows.Count, 1 To 1)
' Loop through all rows and columns, exclude header row
For lRowCounter = 2 To UBound(vOriginArray, 1)
For lColCounter = LBound(vOriginArray, 2) To UBound(vOriginArray, 2)
' For each entry, find which values match and store them
For lValueCounter = 1 To UBound(vValuesArray, 1)
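' Skip blank keyword cells: InStr reports a match for an empty search string
If Len(vOriginArray(lRowCounter, lColCounter)) > 0 And InStr(1, vValuesArray(lValueCounter, 1), vOriginArray(lRowCounter, lColCounter), vbTextCompare) > 0 Then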
vDestinationArray(lValueCounter, 1) = vOriginArray(1, lColCounter)
End If
Next lValueCounter
Next lColCounter
Next lRowCounter
' Put the data back to excel
With rngDestination
.ClearContents
.Value = vDestinationArray
End With
End Sub

Convert row with columns of data into column with multiple rows in Excel 2007

I have a row of data as follows:
header1 header2 header3 header4 header5
row key datavalue1 datavalue2 datavalue3 datavalue4 datavalue5....
so basically, I have a denormalized data set where the datavalues may or may not be empty on a row-by-row basis. I need to normalize them.
ie
12345678 NULL 10 3 NULL 14
would become:
12345678 header2 10
12345678 header3 3
12345678 header5 14
I could do this by using a paste special transform, but I have thousands of rows and I'd need to make sure that I get the right row key for each. furthermore, each row has a bunch of descriptives associated with it that I need copied over with each datavalue.
What is the easiest way to convert each row of columns such that I have multiple rows of a single column with all non-empty datavalues plus the associated datavalue reference? I need to be able to pivot the dataset.
If you have five "header" columns, enter these formulas
H1: =OFFSET($A$1,INT((ROW()-1)/5)+1,0)
I1: =OFFSET($A$1,0,IF(MOD(ROW(),5)=0,5,MOD(ROW(),5)))
J1: =INDEX($A$1:$F$9,MATCH(H1,$A$1:$A$9,FALSE),MATCH(I1,$A$1:$F$1,FALSE))
Copy H1:J?? and paste special values over the top. Sort on column J and delete anything that's a zero. If you have legitimate zeros in the data, then you first need to replace blank cells with some unique string that you can delete later.
If you have more columns, then replace the '5' in all the above formulas with whatever number you have.
Seems to me that part of what you are trying to do is to "de-pivot" a pivot table. I've found this tip to be a tremendous help when I've had to do similar tasks: http://spreadsheetpage.com/index.php/tip/creating_a_database_table_from_a_summary_table/
Note that in Excel 2007, you can get to the old Excel 2003 pivot table wizard using the keystrokes Alt+D, P .
Excel has a transpose feature which may address your needs. It's pretty hidden and a bit clumsy but likely easier than delving into VBA. Here's an excerpt from Excel 2007 Help:
Switch (transpose) columns and rows
If data is entered in columns or rows, but you want to rearrange that data into rows or columns instead, you can quickly transpose the data from one to the other.
For example, the regional sales data that is organized in columns appears in rows after transposing the data, as shown in the following graphics.
1. On the worksheet, do the following:
To rearrange data from columns to rows, select the cells in the columns that contain the data.
To rearrange data from rows to columns, select the cells in the rows that contain the data.
2. On the Home tab, in the Clipboard group, click Copy.
Keyboard shortcut: To copy the selected data, you can also press CTRL+C.
Note: You can only use the Copy command to rearrange the data. To complete this procedure successfully, do not use the Cut command.
3. On the worksheet, select the first cell of the destination rows or columns into which you want to rearrange the copied data.
Note: Copy areas (copy area: The cells that you copy when you want to paste data into another location. After you copy cells, a moving border appears around them to indicate that they've been copied.) and paste areas (paste area: The target destination for data that's been cut or copied by using the Office Clipboard.) cannot overlap. Make sure that you select a cell in a paste area that falls outside of the area from which you copied the data.
4. On the Home tab, in the Clipboard group, click the arrow below Paste, and then click Transpose.
5. After the data is transposed successfully, you can delete the data in the copy area.
Tip: If the cells that you transpose contain formulas, the formulas are transposed and cell references to data in transposed cells are automatically adjusted. To make sure that formulas continue to refer correctly to data in nontransposed cells, use absolute references in the formulas before you transpose them.
For more information, see Switch between relative, absolute, and mixed references.
Let's look at a possible solution in VBA. I think this will really help. Here are a few things you should know about my code.
You'll need to put this code in a code module in VBA (the same place where Macros go)
Look at what I named the sheets: Original and Normalized. You'll either want to change your sheet names or the code
I'm checking for values with a string field of NULL. If the cell is empty, you'll want to check for If IsEmpty(rngCurrent.Value) Then instead.
'
Sub NormalizeSheet()
Dim wsOriginal As Worksheet
Dim wsNormalized As Worksheet
Dim strKey As String
Dim clnHeader As Collection
Dim lngColumnCounter As Long
Dim lngRowCounterOriginal As Long
Dim lngRowCounterNormalized As Long
Dim rngCurrent As Range
Dim varColumn As Variant
Set wsOriginal = ThisWorkbook.Worksheets("Original") 'This is the name of your original worksheet'
Set wsNormalized = ThisWorkbook.Worksheets("Normalized") 'This is the name of the new worksheet'
Set clnHeader = New Collection
wsNormalized.Cells.ClearContents 'This deletes the contents of the destination worksheet'
lngColumnCounter = 2
lngRowCounterOriginal = 1
Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
' We'll loop through just the headers to get a collection of header names'
Do Until IsEmpty(rngCurrent.Value)
clnHeader.Add rngCurrent.Value, CStr(lngColumnCounter)
lngColumnCounter = lngColumnCounter + 1
Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
Loop
'Here we'll reset our Row Counter and loop through the entire data set'
lngRowCounterOriginal = 2
lngRowCounterNormalized = 1
lngColumnCounter = 1
Do While Not IsEmpty(wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter))
Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
strKey = rngCurrent.Value ' Get the key value from the current cell'
lngColumnCounter = 2
'This next loop parses the denormalized values for each row'
Do While Not IsEmpty(wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter))
Set rngCurrent = wsOriginal.Cells(lngRowCounterOriginal, lngColumnCounter)
'We're going to check to see if the current value'
'is equal to NULL. If it is, we won't add it to'
'the Normalized Table.'
If rngCurrent.Value = "NULL" Then
'Skip it'
Else
'Add this item to the normalized sheet'
wsNormalized.Range("A" & lngRowCounterNormalized).Value = strKey
wsNormalized.Range("B" & lngRowCounterNormalized).Value = clnHeader(CStr(lngColumnCounter))
wsNormalized.Range("C" & lngRowCounterNormalized).Value = rngCurrent.Value
lngRowCounterNormalized = lngRowCounterNormalized + 1
End If
lngColumnCounter = lngColumnCounter + 1
Loop
lngRowCounterOriginal = lngRowCounterOriginal + 1
lngColumnCounter = 1 'We reset the column counter here because we're on a new row'
Loop
End Sub
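For thousands of rows, the same normalization can also be done in memory and written back in one step. The sketch below is an alternative to the cell-by-cell loop above, not a rewrite of it: UnpivotArray and NormalizeWithArrays are names invented here, and it assumes a 1-based array as returned by Range.Value, at least two rows and two columns, row keys in the first column, headers in the first row, and no error values in the data. Cells that are empty or contain the text NULL are both skipped, so blanks in the middle of a row do not cut the row short. Extra descriptive columns are not handled.

```vba
Option Explicit

' Sketch: unpivot a 2-D array into (key, header, value) rows.
' Unused rows at the end of the result are left Empty.
Function UnpivotArray(Data As Variant) As Variant
    Dim Result() As Variant
    Dim r As Long, c As Long, n As Long
    Dim s As String

    ' Worst case: every data cell produces one output row
    ReDim Result(1 To (UBound(Data, 1) - 1) * (UBound(Data, 2) - 1), 1 To 3)
    For r = 2 To UBound(Data, 1)
        For c = 2 To UBound(Data, 2)
            s = CStr(Data(r, c))          ' an empty cell becomes ""
            If Len(s) > 0 And s <> "NULL" Then
                n = n + 1
                Result(n, 1) = Data(r, 1) ' row key
                Result(n, 2) = Data(1, c) ' header name
                Result(n, 3) = Data(r, c) ' data value
            End If
        Next c
    Next r
    UnpivotArray = Result
End Function

' Usage sketch with the sheet names used above
Sub NormalizeWithArrays()
    Dim out As Variant
    out = UnpivotArray(ThisWorkbook.Worksheets("Original").Range("A1").CurrentRegion.Value)
    With ThisWorkbook.Worksheets("Normalized")
        .Cells.ClearContents
        .Range("A1").Resize(UBound(out, 1), 3).Value = out
    End With
End Sub
```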
I would create a VBA macro that loops through each row and outputs the data to another sheet. This would let you create your pivot table on the new sheet once the data has been written out.
Not sure how familiar you are with VBA, but this could pretty easily be done by loading the data into an array (or collection of objects if you really want to do it correctly) and writing it back out.
Here is a link to a good VBA document.
http://social.msdn.microsoft.com/Forums/en/isvvba/thread/d712dbdd-c876-4fe2-86d2-7d6323b4262c
Edit
Please note this is not meant to be a fully working solution but really a generic framework to help you in the right direction.
As a generic example that does a lot of what you would need to do (not the best way, but probably the easiest for a beginner), something like this should get you started, although it is hard to say without seeing more of your worksheet.
Sub RowsToColumns()
    Application.ScreenUpdating = False
    Dim srcWrkSheet As Worksheet
    Dim destWrkSheet As Worksheet
    Dim srcRowNumber As Long
    Dim srcColNumber As Long
    Dim destRowNumber As Long
    Set srcWrkSheet = Sheets("YourSourceWorkSheetName")
    Set destWrkSheet = Sheets("YourDestinationWorkSheetName")
    srcRowNumber = 1
    srcColNumber = 1
    destRowNumber = 1
    'Loop until a blank cell is encountered in column 1 of the source sheet
    Do Until srcWrkSheet.Cells(srcRowNumber, 1).Value = ""
        'Placeholder logic: copy two source cells into two destination columns
        destWrkSheet.Cells(destRowNumber, 1).Value = "Header 1 " & srcWrkSheet.Cells(srcRowNumber, srcColNumber).Value
        destWrkSheet.Cells(destRowNumber, 2).Value = "Header 2 " & srcWrkSheet.Cells(srcRowNumber, srcColNumber + 1).Value
        srcRowNumber = srcRowNumber + 1
        destRowNumber = destRowNumber + 1
    Loop
    Application.ScreenUpdating = True
End Sub

Resources