I have two problems:
I need to copy a table from a .docx file that has paragraph numbering in column A. The first row of the table is always merged (A-C). The table can have any number of rows but always follows the same format.
.docx table Ex:
A B C
|'title...'|
|1.| T | F |
|2.| F | T |
|3.| T | T |
I know this code looks at (2, 1) but it does not return that table numbering '1.'. It just returns (2, 1) as a blank cell.
Ideally it would return the values of '1' (Without the period).
When I run the full code it passes through to 'Next iCol' the first time and then errors at 'Cells(resultRow, iCol)...' with: "The Requested member of the collection does not exist". I am thinking it has something to do with the first row being merged so Cell(1,2) does not exist but I am not sure of the solution.
CODE IN QUESTION:
ElseIf .Found = True Then
    For iRow = 1 To wrdDoc.Tables(3).Rows.Count
        For iCol = 1 To wrdDoc.Tables(3).Columns.Count
            Cells(resultRow, iCol) = WorksheetFunction.Clean(wrdDoc.Tables(3).Cell(iRow, iCol).Range.Text)
        Next iCol
        resultRow = resultRow + 1
    Next iRow
    resultRow = resultRow + 1
End If
The final results in Excel should match the .docx table without the column A period. If it is easier, the 'title' can just be placed in A1 with the rest of the table to follow.
A B C
|'title..'|
|1| T | F |
|2| F | T |
|3| T | T |
OR
A B C
|title| | |
|1 | T | F |
|2 | F | T |
|3 | T | T |
Thank you for your help and time.
If the first cell in each row is formatted as "numbered list" then you can read the value like this:
Dim r As Long
With wrdDoc.Tables(3)
    For r = 2 To .Rows.Count
        Debug.Print .Cell(r, 1).Range.ListFormat.ListValue
    Next r
End With
Or use ListFormat.ListString if the list uses (eg) A, B, C, ...
Try something along the lines of:
Dim i As Long
With wrdDoc.Tables(3).Range
    For i = 1 To .Cells.Count
        If .Cells(i).RowIndex > iRow Then resultRow = resultRow + 1
        iRow = .Cells(i).RowIndex: iCol = .Cells(i).ColumnIndex
        ActiveSheet.Cells(resultRow, iCol) = WorksheetFunction.Clean(.Cells(i).Range.Text)
    Next
End With
Note the inclusion of a worksheet reference in the code - you may need to define that differently.
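The idea behind the loop above (visit only the cells that actually exist and place each one by its own row and column index) can be sketched outside of Word. The following is a minimal Python illustration, not Word automation: `cells_to_grid` and the sample triples are hypothetical names and data, standing in for the `RowIndex`, `ColumnIndex` and cleaned text of each cell.

```python
def cells_to_grid(cells):
    """Place (row, col, text) triples into a rectangular grid.

    Rows and columns are 1-based, as in Word. Positions with no cell
    (for example the absent cells of a merged row) stay empty.
    """
    n_rows = max(row for row, _, _ in cells)
    n_cols = max(col for _, col, _ in cells)
    grid = [[""] * n_cols for _ in range(n_rows)]
    for row, col, text in cells:
        grid[row - 1][col - 1] = text.strip()
    return grid


# Hypothetical sample: row 1 is merged, so only cell (1, 1) exists.
sample = [
    (1, 1, "title"),
    (2, 1, "1"), (2, 2, "T"), (2, 3, "F"),
    (3, 1, "2"), (3, 2, "F"), (3, 3, "T"),
]
grid = cells_to_grid(sample)
```

Because nothing ever asks for cell (1, 2), the merged first row cannot trigger a "member of the collection does not exist" error.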
The code as posted works fine for me - The first output cell in Excel contains the first cell's text from the Word table. If that cell contains automatic numbering, though, the automatic number is not output. That is consistent with your own code.
The problem with trying to capture content that is included with automatic numbering is that such numbering is usually followed by a tab, which Excel will at best interpret as a column separator. For example:
wrdDoc.Tables(3).Range.Copy
xlWkSht.Cells(resultRow, 1).PasteSpecial xlPasteValues
resultRow = resultRow + wrdDoc.Tables(3).Range.Rows.Count
I am currently starting a project where I need to manage contractual requirements for a large-scale engineering and construction contract.
Unfortunately, all of the identified project requirements were delivered via PDF (there are thousands of Reqs...). I’ve since taken these PDFs and converted them to spreadsheets in Excel. I will eventually use .CSV files to import these into our RM Tool.
My problem is that all of the project requirement PDFs were written for ease of readability- not so much for use in spreadsheet form. Every section is written like a numbered list, which is fine, but I do not need to have the requirements decomposed to the level they are.
I need to take the “outline, list-numbered format” of the docs and be able to have the child requirements (sub items in example below), combined (concatenated) into rows based on their sections and list numbering.
Right now, I am doing all of this merging of rows by hand, but I don't see how I can get the task done quick enough.
Here is an example of how the PDFs look from the client:
Section 1-1.1: General
A. The contractor shall do “this”, then “that”.
“This” will cost less than this much money
“That” will cost less than this amount of money
a. If “that” costs more, it should not be added
b. Another option is “this”
B. The contractor shall name “this”...
The name should use proper grammar
C. The contractor shall complete work before 2022
Section 1-2.1: Materials
A. The contractor shall use these three materials:
Aluminum
Steel
Cement
a. Cement should be gray only
__i. Gray coloring must be this shade
__ii. Gray cement must not lose coloring
b. Cement should be mixed on-site
Section 1-2.2: Material Suppliers
A. Aluminum must be supplied by “ABC, Inc”
B. Steel must be supplied by “DEF, Inc”
C. Cement must be supplied by "GHI, Inc"
Table 1-1: Supplier Contact Info
[TABLE HERE]
Section 1-3.1: Landscaping
...
Here is how the Excel docs look in their main state. Note that we have added attributes to support each requirement item row, so there is not just one or two columns in these sheets:
| [Col A] Requirement List Item | [Col B] Requirement Text |
|-------------------------------|-------------------------------------------------|
| Section 1-1.1 | General |
| A. | The contractor shall do “this”, then “that”. |
| 1 | “This” will cost less than this much money |
| 2 | “That” will cost less than this amount of money |
| a. | If “that” costs more, it should not be added |
| b. | Another option is “this” |
| B. | The contractor shall name “this”... |
| 1 | The name should use proper grammar |
| C. | The contractor shall complete work before 2022 |
| Section 1-2.1 | Materials |
| A. | The contractor shall use these three materials: |
| 1 | Aluminum |
| 2 | Steel |
| 3 | Cement |
| a. | Cement should be gray only |
| i. | Gray coloring must be this shade |
| ii. | Gray cement must not lose coloring |
| b. | Cement should be mixed on-site |
| Section 1-2.2 | Material Suppliers |
| A. | Aluminum must be supplied by “ABC, Inc” |
| B. | Steel must be supplied by “DEF, Inc” |
| C. | Cement must be supplied by "GHI, Inc" |
| Table 1-1 | [Table Here] |
| Section 1-3.1 | Landscaping |
| ... | ... |
As you can see, the current format I have just has each line item in a different row, with no clear indication of whether or not it's a parent or child item. It is easy to tell what items exist within each Section, but we need to determine what items are included for each top level (Level 1 being A, B, C, D items).
Finally, here is what we’re hoping to have for an end result. We want to just have the rows of data sorted by their section number and then their top-level child requirements (Levels A,B,C,D in this example):
FINAL EXAMPLE IMAGE
Does anyone have an idea on how to create a Python script to handle this task? I have been trying to figure this out for a week and a half, but I can’t seem to find the right functions to use. I have looked up Excel functions/formulas, and different uses for Concat and Merge/TextJoin.
TLDR: I want to take the Project Requirement data given to us in numbered list format (from PDFs) and organize it in Excel spreadsheets to group/concatenate/merge rows based only on the section and top-level sub-item data (please see examples above).
Thank you for any help/advice. I am really interested in learning script-writing to make my job easier, I just am at a loss as to where to start with something like this.
Here is the VBA code that is unable to complete the task. It seems to break after 2-3 sections of data:
Sub GroupLists()
    Dim startRow As Integer
    Dim lastRow As Integer
    Dim outputRow As Integer
    outputRow = 0
    Dim outputStr As String
    Dim col_list As Integer
    Dim col_list_str As String
    Dim col_content As Integer
    Dim col_content_str As String
    Dim currentList As String
    Dim i As Integer
    Dim j As Integer
    Dim parseSheet As String
    Dim outputSheet As String
    Dim startList As Boolean
    startList = False
    Dim indent As Integer
    indent = 0
    'regex patterns
    Dim regexObject As RegExp
    Set regexObject = New RegExp
    Dim pattern_listHeader As String
    Dim pattern_listUpAlpha As String
    Dim pattern_listLowAlpha As String
    Dim pattern_listNum As String
    pattern_listHeader = "[\d]-"
    pattern_listUpAlpha = "[A-Z]"
    pattern_listLowAlpha = "[a-z]"
    'configurable variables
    parseSheet = "Input"
    outputSheet = "Output"
    col_list = 2
    col_content = 3
    outputRow = 1
    startRow = Worksheets("Control").Cells(2, 1).Value
    lastRow = Worksheets("Control").Cells(2, 2).Value
    For i = startRow To lastRow
        col_list_str = Worksheets(parseSheet).Cells(i, col_list).Value
        col_content_str = Worksheets(parseSheet).Cells(i, col_content).Value
        With regexObject
            .Pattern = "^[\d]-"
        End With
        If regexObject.Test(col_list_str) = True Then
            If startList = True Then
                'write output to row, then write current row to output row
                Worksheets(outputSheet).Cells(outputRow, 1).Value = outputStr
                outputRow = outputRow + 1
            End If
            outputStr = Worksheets(parseSheet).Cells(i, col_content)
            'write to row
            Worksheets(outputSheet).Cells(outputRow, 1).Value = outputStr
            'increment row
            outputRow = outputRow + 1
            outputStr = ""
            startList = False
        End If
        With regexObject
            .Pattern = "^[A-Z]."
        End With
        If regexObject.Test(col_list_str) = True Then
            If startList = True Then
                startList = False
            Else
                startList = True
            End If
            outputStr = outputStr & col_list_str & col_content_str & vbNewLine
            'outputRow = outputRow + 1
        End If
        If startList = True Then
            With regexObject
                .Pattern = "^[a-z]."
            End With
            If regexObject.Test(col_list_str) = True Then
                'indent = 1
                outputStr = outputStr & col_list_str & col_content_str & vbNewLine
                'outputRow = outputRow + 1
            End If
            With regexObject
                .Pattern = "^[0-9]."
            End With
            If regexObject.Test(col_list_str) = True Then
                'indent = indent + 1
                outputStr = outputStr & col_list_str & col_content_str & vbNewLine
                'outputRow = outputRow + 1
            End If
        End If
    Next i
    If startList = True Then
        Worksheets(outputSheet).Cells(outputRow, 1).Value = outputStr
    End If
End Sub
Typical duplication shown below (in response to comments, for @Dy.Lee).
As I wrote so many comments I decided to post a guide answer and delete my comments i.e. provide some pointers to one possible solution.
Logic:
I would read everything into an array and loop the rows of that. Apply logic that processes rows according to rules e.g. does it start with Section.....
is it a Capital letter at start of col 1 of array... is it a number...
A nice idea is probably to use helper functions. E.g.
A helper function which grabs the lines upper case after Section down to next upper case (stop before) and returns that text (as an array).
Then pass that function return value to another function which handles the alphanumeric based indentation levels (using appropriate Chr$()) etc
Another function call to concatenate those values into single string - which will be the value associated with a given key.
Then update the dictionary with the key value pair.
This way you can create a dictionary of key A: associated text from last of helper function calls and write out to sheet at end.
You will need additional logic for tables etc. Tables would be added to dictionary as an array with key being value from first column
At the end you can loop and write out to sheet. I think caution is needed with writing out tables as they may take more than one row. In that case, have an additional helper function that calculates last populated row anywhere in sheet (or target column range) and ensure next key:value pair is written out to after the table by adding + 1 to last populated row number.
When you write out the table, you will need to test the dictionary key for it containing 'table', or that its value is an array; you will need to resize the target cell to the size of the array e.g. if dict(key) = results then targetCell.Resize(UBound(results, 1), UBound(results, 2)) = results
Think about how any hyperlinks (where the friendly name differs from the destination) or additional metadata will be passed around.
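Since the question asks about Python, the grouping logic described above can be sketched roughly as follows. This is a minimal illustration under assumptions, not a finished solution: `group_requirements` is a hypothetical name, the input is assumed to be (list item, text) pairs in document order, tables are passed through as a single row, and no indentation is added to child lines. It returns a list of pairs rather than a dictionary because keys such as "A." repeat in every section and would overwrite each other in a plain dictionary.

```python
import re


def group_requirements(rows):
    """Merge child rows into their top-level (A., B., ...) parent.

    rows: list of (list_item, text) pairs in document order.
    Section and Table rows pass through unchanged; each capital-letter
    row absorbs the rows beneath it until the next parent or section.
    """
    out = []
    current = None  # [key, [lines]] for the parent being built

    def flush():
        nonlocal current
        if current is not None:
            out.append((current[0], "\n".join(current[1])))
            current = None

    for item, text in rows:
        item = item.strip()
        if item.startswith(("Section", "Table")):
            flush()
            out.append((item, text))
        elif re.fullmatch(r"[A-Z]\.", item):
            flush()
            current = [item, [text]]
        elif current is not None:
            current[1].append(f"{item} {text}")
        else:
            # Child row with no parent yet: keep it as its own row
            out.append((item, text))
    flush()
    return out


rows = [
    ("Section 1-1.1", "General"),
    ("A.", "The contractor shall do this."),
    ("1", "This will cost less."),
    ("a.", "If that costs more, skip it."),
    ("B.", "The contractor shall name this."),
    ("Section 1-2.1", "Materials"),
    ("A.", "Use these materials:"),
    ("1", "Aluminum"),
]
result = group_requirements(rows)
```

Each entry of `result` corresponds to one output row: the key goes in the first column and the joined text in the second.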
Supplying sample data:
I think I would suggest you use a markdown table generator to supply the current data (the input data) to save people time having to write out their own data. You can paste from Excel into markdown generator, press generate table, then copy to clipboard that table, use edit to insert into question. Highlight table just pasted in and press Ctrl + K to indent properly.
Regex:
Regex is not required from what I can see. Data resides in separate cells in source.
Tagging:
I think remove the python tag unless you have python code to add.
Resources:
https://bettersolutions.com/vba/strings-characters/builtin-constants.htm
Some useful constants regarding indentation
| VBA.Constants | Chr | Comments |
|---------------|-----------------------|----------------------------------------------------------------------------------------------------------------|
| vbCr | Chr(13) | Carriage return character |
| vbLf | Chr(10) | Linefeed character |
| vbCrLf | Chr(13) + Chr(10) | Carriage return - linefeed combination |
| vbNewLine | Chr(13) + Chr(10) | New line character |
| vbNullChar | Chr(0) | Character having a value of 0. |
| vbNullString | String having value 0 | Not the same as a zero-length string (""); used for calling external procedures. |
| vbTab | Chr(9) | Tab character |
| vbBack | Chr(8) | Backspace character |
| vbFormFeed | Chr(12) | Word VBA Manual - manual page break ? |
| vbVerticalTab | Chr(11) | Word VBA Manual - manual line break (Shift + Enter) |
Dictionaries -
https://learn.microsoft.com/en-us/office/vba/language/reference/user-interface-help/dictionary-object
http://www.snb-vba.eu/VBA_Dictionary_en.html
Passing and returning arrays - http://www.cpearson.com/excel/passingandreturningarrays.htm
Arrays and ranges - http://www.cpearson.com/excel/ArraysAndRanges.aspx
Typed functions - What's the difference between Trim() and Trim$() in VBA? re: Chr$ v Chr.
Alternative approach:
If working with arrays and dictionary (which should be quick) doesn't work for you, the logic regarding how to separate out your chunks/indentations can still apply in a loop over the actual rows in the sheet. You would need to keep track however of where you are writing out to based on last row in destination range (you can use a different last row helper function and pass in the column to use to determine the last populated row in).
Try,
Sub setGroup()
    Dim Ws As Worksheet, toWs As Worksheet
    Dim vDB As Variant, vR() As Variant
    Dim vResult(), vRow()
    Dim s As String, s2 As String
    Dim sName As String
    Dim i As Long, j As Long, r As Long
    Dim cnt As Long, n As Long, k As Integer
    Dim st As Long, et As Long
    Set Ws = Sheets(1) 'Data Sheet
    Set toWs = Sheets(2) 'Result Sheet
    vDB = Ws.Range("a1").CurrentRegion
    r = UBound(vDB, 1)
    ReDim vRow(1 To r)
    'Record the row number of every Section, Table and top-level (A., B., ...) item
    For i = 2 To r
        s = vDB(i, 1)
        If s Like "[A-Z].*" Or s Like "Section*" Or s Like "Table*" Then
            n = n + 1
            vRow(n) = i
        End If
    Next i
    If vRow(n) < r Then
        ReDim Preserve vRow(1 To n + 1)
        vRow(n + 1) = r
    Else
        ReDim Preserve vRow(1 To n)
    End If
    cnt = UBound(vRow)
    ReDim vResult(1 To r, 1 To 2)
    n = 0
    sName = vDB(1, 1)
    For j = 1 To cnt - 1
        k = 0
        Erase vR
        st = vRow(j)
        If j = cnt - 1 Then
            et = vRow(j + 1)
        Else
            et = vRow(j + 1) - 1
        End If
        For i = st To et
            s = vDB(i, 1)
            s2 = ""
            '*** Set spacing for each group
            If s Like "[A-Z].*" Or s Like "Section*" Or s Like "Table*" Then
            Else
                If IsNumeric(s) Then
                    s2 = Space(4) & s & "."
                ElseIf s Like "[a-z].*" And Not (s Like "i*") Then
                    s2 = Space(8) & s
                Else
                    s2 = Space(12) & s
                End If
            End If
            'Collect data according to conditions.
            If s Like "Section*" Then
                n = n + 1
                vResult(n, 1) = vDB(i, 1)
                vResult(n, 2) = vDB(i, 2)
            ElseIf s Like "[A-Z].*" Then
                n = n + 1
                k = k + 1
                sName = s
                ReDim Preserve vR(1 To k)
                vR(k) = vDB(i, 2)
            ElseIf s Like "Table*" Then
                n = n + 2
                vResult(n - 1, 1) = vDB(i, 1)
                vResult(n - 1, 2) = "Supplier Contact Info"
                vResult(n, 2) = vDB(i, 2)
            Else
                k = k + 1
                ReDim Preserve vR(1 To k)
                vR(k) = s2 & " " & vDB(i, 2)
            End If
        Next i
        If k Then
            vResult(n, 1) = sName
            vResult(n, 2) = Join(vR, vbCrLf)
        Else
        End If
    Next j
    With toWs
        .Cells.Clear
        .Range("a1").Resize(n, 2) = vResult
    End With
End Sub
I have a table which has some string keys and numerical values like this-
-----------------
| Keys | Scores |
-----------------
| k1 | 10 |
| k2 | 15 |
| k3 | 8 |
-----------------
Now there's another table which has comma separated keys like this-
--------------------
| Keys | Total |
--------------------
| k1,k2 | |
| k3 | |
| k1,k2,k3 | |
--------------------
I want to fill the "Total" column by referencing the first table. Is this possible in Excel with formulas or VBScript?
This formula iterates the parts and uses SUMIFS to return the number to SUMPRODUCT:
=SUMPRODUCT(SUMIFS(B:B,A:A,TRIM(MID(SUBSTITUTE(E2,",",REPT(" ",999)),(ROW($ZZ$1:INDEX($ZZ:$ZZ,LEN(E2)-LEN(SUBSTITUTE(E2,",",""))+1))-1)*999+1,999))))
No vba or named range workarounds needed.
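The logic the formula implements (split the cell on commas, look up each part, add the matches) may be easier to follow written out step by step. Here is a minimal Python sketch of that logic; `total_for_keys` is a hypothetical helper, and the scores dictionary simply mirrors the sample table from the question.

```python
def total_for_keys(key_list, scores):
    """Sum the scores of every key in a comma-separated string."""
    keys = [k.strip() for k in key_list.split(",") if k.strip()]
    # Unknown keys contribute 0, like a SUMIFS with no matching row
    return sum(scores.get(k, 0) for k in keys)


scores = {"k1": 10, "k2": 15, "k3": 8}
totals = [total_for_keys(s, scores) for s in ["k1,k2", "k3", "k1,k2,k3"]]
print(totals)  # [25, 8, 33]
```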
Here's one approach which uses the Evaluate method to create an array that consists of the criteria that will be passed to SUMPRODUCT(SUMIF(...)).
1) First, select cell F2.
2) Then, define the following name (Ribbon >> Formulas >> Defined Names >> Define Name)...
Name: MyArray
Refers to: =EVALUATE("{"""&SUBSTITUTE(SUBSTITUTE('Sheet1'!$E2," ",""),",",""",""")&"""}")
Click OK
Change the sheet name accordingly.
3) Then, enter the following formula in F2, and copy down:
=SUMPRODUCT(SUMIF($A$2:$A$4,MyArray,$B$2:$B$4))
You could use:
Code:
Option Explicit
Sub test()
    Dim i As Long, y As Long, w As Long
    Dim arrLookingValues As Variant, arrDataStored As Variant, arrValues As Variant
    Dim Total As Long
    With ThisWorkbook.Worksheets("Sheet1")
        'Set an array which store Keys & Scores
        arrDataStored = .Range("A2:B4")
        'Set an array which store what we are looking for
        arrLookingValues = .Range("D2:D4")
        For i = LBound(arrLookingValues) To UBound(arrLookingValues)
            Total = 0
            'Set an array with the values of each row we are looking for
            arrValues = Split(arrLookingValues(i, 1), ",")
            'Loop the array which store the values we are looking for
            For y = LBound(arrValues) To UBound(arrValues)
                'Loop the array which store both Keys & Scores
                For w = LBound(arrDataStored) To UBound(arrDataStored)
                    'If we find a match
                    If Trim(arrValues(y)) = arrDataStored(w, 1) Then
                        'Add Total
                        Total = Total + arrDataStored(w, 2)
                        Exit For
                    End If
                Next w
            Next y
            'Print the result in column E
            .Range("E" & i + 1).Value = Total
        Next i
    End With
End Sub
I have managed to get to a point with a data set where I have a list of items delimited with a "|" symbol. I am now trying to separate each item in the list into the corresponding column; however, the identifier of the column is a bit of text of variable length at the end of each value.
Example Data (all in one column):
Column A
40.00A|24.00QS|8.00J[a]
40.00A|12.00J|8.00J[a]
20.00A|4.00V
30.00A|12.00CS|8.00QS
Desired Outcome:
+-------+-------+------+-------+-------+------+
| A | QS | J[a] | J | CS | V |
+-------+-------+------+-------+-------+------+
| 40.00 | 24.00 | 8.00 | | | |
| 40.00 | | 8.00 | 12.00 | | |
| 20.00 | | | | | 4.00 |
| 30.00 | 8.00 | | | 12.00 | |
+-------+-------+------+-------+-------+------+
The number of trailing characters that define columns is fixed to 6 (A,QS,J[a],J,CS & V), so I know at the beginning how many columns I will need.
I have some ideas on how to do it directly through formulas, but it would require me to split out the items into individual columns by the delimiter, then use some sort of IF statement on some additional columns. I would prefer to avoid helper columns. I also looked at the following link, but it doesn't solve the problem, as it assumes the value matches the column heading (I can correct that, but I feel like there is a faster VBA solution here):
How to split single column (with unequal values) to multiple columns sorted according to values from the original single column?
I have been reading about Regular Expressions, and I suspect there is a solution there, but I can't quite figure out how to sort the result.
Once I have this data set up, it is a small task to unpivot it and get the data in a proper tabular format using Power Query.
Thanks in advance!
Since the headers are fixed, you can simply try it like this (change the rows and columns of the source and destination data to suit your requirements):
Option Explicit
Sub test()
    Dim Ws As Worksheet, SrcLastrow As Long, TrgRow As Long, Rw As Long
    Dim Headers As Variant, xLine As Variant
    Dim i As Long, j As Long
    Set Ws = ThisWorkbook.ActiveSheet
    'Column A assumed to have the texts
    SrcLastrow = Ws.Range("A" & Rows.Count).End(xlUp).Row
    TrgRow = 2
    Headers = Array("A", "QS", "J[a]", "J", "CS", "V")
    For Rw = 1 To SrcLastrow
        xLine = Split(Ws.Cells(Rw, 1).Value, "|")
        For i = 0 To UBound(xLine)
            xLine(i) = Trim(xLine(i))
            For j = 0 To UBound(Headers)
                If Right(xLine(i), Len(Headers(j))) = Headers(j) Then
                    Ws.Range("D" & TrgRow).Offset(0, j).Value = Replace(xLine(i), Headers(j), "") ' The output data table was assumed to be at Column D
                End If
            Next j
        Next i
        TrgRow = TrgRow + 1
    Next
End Sub
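For comparison, the same suffix-matching idea can be sketched in Python. This is an illustration under assumptions rather than a port of the macro: `split_by_suffix` is a hypothetical name, and headers are tried longest first, a precaution that the six headers in the question do not strictly need but that avoids a short header matching the tail of a longer one.

```python
HEADERS = ["A", "QS", "J[a]", "J", "CS", "V"]


def split_by_suffix(line, headers=HEADERS):
    """Split 'value+suffix' items on '|' and file each under its suffix."""
    row = {h: "" for h in headers}
    # Longest headers first, so a longer suffix wins over a shorter one
    ordered = sorted(headers, key=len, reverse=True)
    for item in line.split("|"):
        item = item.strip()
        for h in ordered:
            if item.endswith(h):
                row[h] = item[: -len(h)]
                break
    return row


first = split_by_suffix("40.00A|24.00QS|8.00J[a]")
last = split_by_suffix("30.00A|12.00CS|8.00QS")
```

Each returned dictionary holds one output row, keyed by column header, with empty strings for the columns that had no item.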
I have a rather significant amount of data in a spreadsheet that is poorly arranged. The current format has a company, a product name, and the ingredients listed afterwards. The ingredients all have their own column without a header. For instance, and I apologize this is not reflected below since I'm terrible at markup language, Column A would be labeled Manufacturer, Column B would be labeled as Product Name, Column C would be labeled Ingredients but then the rest of the columns are unlabeled.
Ultimately, I need to move the data to a new sheet, where data only appears in columns A, B, and C. The number of ingredients each product has varies.
I hope that the desired format helps.
Current Format:
1| Acme Inc. | ABC123 | Water | Sugar | Eggs | Salt
2| Acme Inc. | BCD456 | Cornmeal | Salt
3| JJ Baking | JJ4567 | Flour | Nuts | Fruit
Desired Format:
1| Acme Inc. | ABC123 | Water
2| Acme Inc. | ABC123 | Sugar
3| Acme Inc. | ABC123 | Eggs
4| Acme Inc. | ABC123 | Salt
5| Acme Inc. | BCD456 | Cornmeal
6| Acme Inc. | BCD456 | Salt
7| JJ Baking | JJ4567 | Flour
8| JJ Baking | JJ4567 | Nuts
9| JJ Baking | JJ4567 | Fruit
Here's a short one that should work:
Sub test()
    Dim lastRow&, lastCol&, noItems&
    Dim i&, k&
    ' This macro will assume your column A and B are constant, and your items will start in column C
    lastRow = Cells(Rows.Count, 1).End(xlUp).Row
    For i = lastRow To 1 Step -1
        lastCol = Cells(i, Columns.Count).End(xlToLeft).Column
        noItems = WorksheetFunction.CountA(Range(Cells(i, 3), Cells(i, lastCol)))
        ' Rows with a single item need no new rows; without this check the
        ' Insert range below would span rows i and i + 1 and add unwanted blank rows
        If noItems > 1 Then
            ' Now we know how many items, so add the info to the new rows.
            ' Start with the name and col B
            Range(Cells(i + 1, 1), Cells(i + noItems - 1, 1)).EntireRow.Insert
            Range(Cells(i, 1), Cells(i + noItems - 1, 2)).FillDown
            For k = 1 To noItems - 1
                Cells(i + k, 3).Value = Cells(i, 3 + k).Value
                Cells(i, 3 + k).Value = ""
            Next k
        End If
    Next i
End Sub
It will look in column C through [whatever column in that row is the last one, going right], then create new rows to fit the amount of items in there.
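The reshaping itself (one output row per ingredient, with the manufacturer and product repeated) is easy to see in a few lines of Python. This is only a sketch of the transformation, assuming the rows have already been read into lists; `unpivot` is a hypothetical name and the data is the sample from the question.

```python
def unpivot(rows):
    """Turn [manufacturer, product, ingredient, ingredient, ...] rows
    into one (manufacturer, product, ingredient) row per ingredient."""
    out = []
    for manufacturer, product, *ingredients in rows:
        for ingredient in ingredients:
            if ingredient:  # skip blank cells
                out.append((manufacturer, product, ingredient))
    return out


wide = [
    ["Acme Inc.", "ABC123", "Water", "Sugar", "Eggs", "Salt"],
    ["Acme Inc.", "BCD456", "Cornmeal", "Salt"],
    ["JJ Baking", "JJ4567", "Flour", "Nuts", "Fruit"],
]
long_rows = unpivot(wide)
```

Unlike the macro, this builds a new list instead of inserting rows in place, which matches the stated goal of moving the data to a new sheet.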
This should do the trick; I put my many assumptions in the code comments.
My main one is that there are no gaps between columns with data (a gap would be, e.g., Column D filled out, Column E blank, Column F filled out).
I also assume that Column A has no blank rows within the data, because the macro stops at the first blank cell it finds there. Please fill in your worksheet name where it says 'Customize to your sheet name'.
Public Sub ReOrder()
    Dim sheet As Worksheet
    'Long rather than Integer, so sheets with more than 32,767 rows do not overflow
    Dim row As Long
    Dim col As Long
    Dim offset As Long
    row = 2
    'Customize to your sheet name
    Set sheet = ThisWorkbook.Worksheets("Sheet1")
    'I am assuming there are no 'blanks' between rows or columns
    'If there are such 'blanks' use UsedRange.Rows or UsedRange.Columns
    'Then skip over the blanks with an if statement
    'Keep processing while we see data in the first column
    While (sheet.Cells(row, 1).Value <> "")
        'We only need to make a new row if anything past column C is filled out with something
        col = 4
        offset = 1
        While (sheet.Cells(row, col).Value <> "")
            'Insert new row
            sheet.Rows(row + offset).EntireRow.Insert shift:=xlDown
            'Assign Column values to the new row
            sheet.Cells(row + offset, 1).Value = sheet.Cells(row, 1).Value
            sheet.Cells(row + offset, 2).Value = sheet.Cells(row, 2).Value
            sheet.Cells(row + offset, 3).Value = sheet.Cells(row, col).Value
            'Remove Column value from the source row
            sheet.Cells(row, col).Value = ""
            col = col + 1
            offset = offset + 1
        Wend
        row = row + 1
    Wend
End Sub
I’m guessing the 3a3n codes are company specific. Assuming ABC1231 is in B1 of Sheet1, insert a new Row1 and apply the technique detailed here, selecting Range in “Step 2b of 3” to be B1: to the end of your data. When you get to the Table you may filter to select and delete the rows blank in Column Value.
In B2 enter:
=INDEX(Sheet3!A:A,MATCH(A4,Sheet3!B:B,0))
Select and Copy Table then Paste Special…, Values over the top and switch the order of Columns A and B.
I have looked at several other stack overflow questions, however I haven't found the answer I am looking for.
I have an excel sheet with 25000 rows in the following format:
userid | taskid | taskcode
1 | 3 | K
1 | 4 | O
1 | 4 | L
1 | 5 | O
2 | 3 | O
What I want to do is identify any rows that are duplicated considering userid and taskid, in the above example I would expect the second and third rows to be highlighted.
I'm not sure how feasible it is, I'd ideally like to identify the duplicate row containing the taskcode O rather than L.
My end goal is to remove all duplicate userid / taskid rows with the taskcode O regardless of how it is achieved (highlighting then sorting by highlights or using vba).
Ok solved it after messing around - I'll leave this answer here for future reference unless there is a more efficient way identified.
Step 1
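Create a combined cell with both userid and taskid values using =A2&"|"&B2 in column D. The separator keeps pairs such as userid 1 / taskid 11 and userid 11 / taskid 1 from producing the same key.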
Step 2
Use the built-in conditional formatting for highlighting duplicates.
Step 3
Custom sort so that the red-highlighted duplicate cells come first, then sort the taskcode in a custom order (O, K, L).
Step 4
Simply highlight all the rows (which are now in blocks) that you wish to delete.
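The same steps can be expressed compactly in code. The sketch below is in Python and is only an illustration of the logic: `drop_duplicate_o` is a hypothetical name, the rows are the sample from the question, and the (userid, taskid) pair is used as a tuple key instead of a concatenated string.

```python
from collections import Counter


def drop_duplicate_o(rows):
    """Remove rows with taskcode 'O' whose (userid, taskid) pair
    occurs more than once; leave every other row untouched."""
    counts = Counter((userid, taskid) for userid, taskid, _ in rows)
    return [
        (userid, taskid, code)
        for userid, taskid, code in rows
        if not (counts[(userid, taskid)] > 1 and code == "O")
    ]


rows = [(1, 3, "K"), (1, 4, "O"), (1, 4, "L"), (1, 5, "O"), (2, 3, "O")]
kept = drop_duplicate_o(rows)
print(kept)  # [(1, 3, 'K'), (1, 4, 'L'), (1, 5, 'O'), (2, 3, 'O')]
```

Note that if every row in a duplicated group carries the code O, this version removes all of them; adjust the condition if one of them should survive.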
If you would like to try VBA instead:
Sub duplicates_separation()
    Dim shtIn As Worksheet
    Dim lastRow As Long, r As Long

    Set shtIn = ThisWorkbook.Sheets("Sheet1")
    'Columns assumed: A = userid, B = taskid, C = taskcode; row 1 holds the headers
    lastRow = shtIn.Cells(shtIn.Rows.Count, "A").End(xlUp).Row

    'Loop bottom-up so that deleting a row does not shift the rows still to be checked
    For r = lastRow To 2 Step -1
        'A row is a duplicate when its userid/taskid pair occurs more than once
        If Application.WorksheetFunction.CountIfs( _
                shtIn.Range("A2:A" & lastRow), shtIn.Cells(r, "A").Value, _
                shtIn.Range("B2:B" & lastRow), shtIn.Cells(r, "B").Value) > 1 Then
            'Of the duplicates, delete only the rows whose taskcode is O
            If shtIn.Cells(r, "C").Value = "O" Then shtIn.Rows(r).Delete
        End If
    Next r
End Sub