I have 2 tabs of data with a unique identifier. The identifier is not in any particular order. I need my vlookup / index / match to show me all the identifiers that are not present in tab 2.
Reason: I am working somewhere whose systems failed a data transfer. I have to compare the data that existed before with the data currently on the system. Any data that is missing, I will need to add to the new system.
Example;
Tab1 Column A:
123456,
654321,
789456,
456789.
Tab2 Column B:
654321,
123456,
456789.
In Tab 3, I want Excel to tell me that 789456 is not present in Tab 2.
As you can see in the above example, the unique identifiers could be in any order, so I cannot put both columns side by side and match them row by row. I need it to look through the whole column.
All the tutorials I have seen assume that column A is in the same order as column B.
I have 70,000 rows to go through.
Any help would be appreciated.
Thanks in advance.
To do it with a formula you will want a helper column in the First tab.
In an empty column, I used column B, put the following in the second row:
=IF(ISERROR(VLOOKUP(A2,Sheet2!B:B,1,FALSE)),MAX($B$1:B1)+1,"")
This will create a column of numbers that increment on the ones not found in sheet two.
At this point you can simply filter the new column for anything that is non-blank and get your list.
If you want to do it with a formula in the Third tab then use this formula that refers to the helper column on the first tab:
=IFERROR(INDEX(Sheet1!A:A,MATCH(ROW(1:1),Sheet1!B:B,0)),"")
Then copy/drag down sufficient to get blanks.
With 70,000 items I would avoid array formulas, as they will slow the calculation down and may even crash Excel.
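The helper-column idea above can be sketched outside Excel to show what the two formulas compute. This is only an illustration in Python, not part of the workbook; the function name missing_with_helper is hypothetical, and the sample IDs are the ones from the question.

```python
def missing_with_helper(tab1_ids, tab2_ids):
    """Mimic the helper column: number each tab 1 ID that is absent from tab 2."""
    present = set(tab2_ids)      # plays the role of the VLOOKUP against Sheet2!B:B
    helper = []                  # helper column: a running counter or "" per row
    counter = 0
    for ident in tab1_ids:
        if ident not in present:
            counter += 1         # same idea as MAX($B$1:B1)+1
            helper.append(counter)
        else:
            helper.append("")
    # Third tab: output row k pulls the ID whose helper value equals k,
    # which is what INDEX/MATCH(ROW(1:1), ...) does when copied down.
    compact = [tab1_ids[helper.index(k)] for k in range(1, counter + 1)]
    return helper, compact


helper, compact = missing_with_helper(
    [123456, 654321, 789456, 456789],
    [654321, 123456, 456789],
)
print(helper)   # ['', '', 1, '']
print(compact)  # [789456]
```

The running counter is what lets the third tab list the missing IDs without gaps.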
You could try using something like this:
=IFERROR(VLOOKUP(<value cell>, 'Tab2'!B:B, 1, FALSE), FALSE)<>FALSE
Copy all the values from tab 1 column A into tab 3 column A. In tab 3 column B, paste the above formula in every row that has a value in column A, replacing <value cell> with the column A cell in the same row. The formula attempts to look up the value from tab 1 in tab 2. If the value is missing, VLOOKUP generates an error, which IFERROR catches and replaces with FALSE. Finally, the result is compared with FALSE, so the formula returns TRUE if the value is present in tab 2 and FALSE if it is missing from tab 2.
From this point you can use a column filter in tab 3 to show only the rows with a FALSE value, which are the values present in tab 1 but missing from tab 2. (Filtering on TRUE instead shows the values present in both tabs.)
A solution for this is COUNTIF(). In a spare column of Sheet1, next to the first identifier, the formula would be:
=COUNTIF(Sheet2!B:B,Sheet1!A1)
After applying that to all rows, just filter for the rows that have the value 0.
This macro will produce a compact list in Sheet3:
Sub WhatsMissing()
    Dim s1 As Worksheet, s2 As Worksheet, s3 As Worksheet
    Dim r2 As Range, N As Long, K As Long, i As Long
    Dim v As Variant
    Set s1 = Sheets("Sheet1")
    Set s2 = Sheets("Sheet2")
    Set s3 = Sheets("Sheet3")
    Set r2 = s2.Range("B:B")                        'column to search in Sheet2
    K = 1                                           'next output row in Sheet3
    N = s1.Cells(Rows.Count, "A").End(xlUp).Row     'last used row in Sheet1 column A
    With Application.WorksheetFunction
        For i = 1 To N
            v = s1.Cells(i, "A").Value
            If .CountIf(r2, v) = 0 Then             'identifier not found in Sheet2
                s3.Cells(K, "A").Value = v
                K = K + 1
            End If
        Next i
    End With
End Sub
I've been trying so hard to clean up this csv data for a coworker.
I’m going to walk through what the data usually looks like and then walk through the steps I’ve done and then bring up what I’m currently struggling with… Bear with me as this is my first post (and I have no background in vba and everything is self-taught by Google).
So the data export is a csv which can be opened in excel broken out by several columns. The column in question is column G, which essentially has multiple data sets (1 – 219) for the same menu item (row).
For example:
A B C D E F G
Chicken Soup {1;$6.00;59;$9.00;88;$6.00}
Beef Soup {1;$8.00;59;$12.00;88;$8.00}
Duck Soup {1;$6.00;59;$6.00;88;$6.00}
Egg Soup {1;$8.00;59;$9.00;88;$8.00}
Water {1;$0.00}
French Onion Soup {1;$16.00;59;$15.00;88;$12.00}
Chili Soup {1;$17.00;84;$17.00}
So in column G, as you can tell, there are multiple prices. The format is:
{Column number ; $ Price ; Column number ; $ Price ; etc.}
Regex: .[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9].|[0-9]{1,2},[$][0-9]{1,3}[.][0-9][0-9]
The first goal was to parse out the data in the column into the row, in a format that is true to the csv (so it can be combined and resubmitted).
For example: (imagine there is a semi colon between each data set, as there should be in the final result)
{1;$21.00}
{1;$16.00}
{1;$12.00 5;$12.00 8;$12.00}
{1;$18.00 6;$18.00 8;$18.00}
{1;$10.00 6;$7.00 9;$12.00 11;$10.00}
{1;$20.00 6;$20.00 8;$20.00}
{1;$5.49 3;$3.99 10;$4.99 12;$4.99}
{1;$18.99}
{1;$21.00}
{1;$21.00}
To accomplish this goal, I wrote a macro that:
Copies column G from “Sheet1” and inputs to new sheet “Sheet2” in A1
Replace all “;$” with “,$” to help separate each data set by itself instead of having it broken out column name then dollar sign in two different columns
Text to columns macro splitting on “;” (and inputs results starting B1 so I can keep A1 with all the data sets in one column in case I need it) – also if you know how to keep the semi colon here, that would be helpful so I don’t have to re-add it in the future
Replace All from b1 to end of data set "," to ";" <-- to bring it back to original formatting
Copies the Data from B1 to last cell with data (data is not in order, the 50th row could have 219 columns and then the last row could only have 150) and pastes this data into column G of “rp items” (therefore overriding the existing data and shifting the columns as far right as the last column used.
However, when I showed my coworker what I’ve done, he wanted the leading number (column number) to correspond to the Columns (since data starts in column G, this will be column 1, H will be 2 etc). Therefore looking something like this so he can filter the column by the all the items that have that column number:
For example, this photo is how the outcome should look
So, now the goal is to create a macro that…
Loops through B1:B in sheet “STEP ONE” (column B starting at B1 then C1 then when blank in that row go to next row)
While (B1 (or next row) is blank, do nothing, end macro)
If B1 (or active cell) is not blank, read the cell value to extract column; copy the cell’s contents, paste in “STEP TWO” sheet in the same row as the active cell, but offset by the column number from cell value.
Toggle back to main sheet, goes to next cell within that row – if blank, go to next row and repeat until all data is done.
To give you some background, I have more than 25,000 lines of data (menu items) and the longest column (I believe is 219). So far, I’ve mostly been trying pieces of scripts I’ve found online but none of them are doing similar to what I need and I don’t know how to write enough code to just write the script out myself. I believe I’ll need to have to establish a variable: the column name (not sure if I can extract this using the regex code I found out) and then use that in the offset...
But my range needs to be dynamic and loop…
Please let me know if you can assist – I’ve been stuck on this for like a week!
Thank you all so much for reading – if I can provide extra detail please let me know.
For example you could do it this way:
Sub Tester()
    Dim arr, i As Long, c As Range, v, col, price
    For Each c In Range("G2:G4").Cells
        v = Replace(Replace(c.Value, "{", ""), "}", "") 'remove braces
        If Len(c.Value) > 0 Then                        'anything to process?
            arr = Split(v, ";")                         'split on ;
            For i = 0 To UBound(arr) - 1 Step 2         'loop 2 at a time
                col = CLng(Trim(arr(i)))                'column number
                price = Trim(arr(i + 1))                'price
                c.Offset(0, col).Value = col & ";" & price
            Next i
        End If
    Next c
End Sub
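To make the split-and-pair logic of the macro easier to follow, here is a rough equivalent in Python. It is only a sketch: the function name place_prices is hypothetical, and it returns the column offsets as dictionary keys instead of writing to worksheet cells.

```python
def place_prices(cell_text):
    """Map each column number to its "column;price" text for one cell,
    e.g. "{1;$6.00;59;$9.00}"."""
    inner = cell_text.strip().strip("{}")        # remove the braces
    if not inner:
        return {}                                # nothing to process
    parts = [p.strip() for p in inner.split(";")]
    placed = {}
    # Walk the parts two at a time: column number first, then its price.
    for i in range(0, len(parts) - 1, 2):
        col = int(parts[i])
        placed[col] = f"{col};{parts[i + 1]}"
    return placed


print(place_prices("{1;$6.00;59;$9.00;88;$6.00}"))
# {1: '1;$6.00', 59: '59;$9.00', 88: '88;$6.00'}
```

Each key corresponds to the offset the macro passes to c.Offset(0, col).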
I've run into a bit of a road block. I get a .PDF output from an accounting program and copy/paste the data into excel, then convert text to columns. I am trying to match the GL code with the totals for that specific account. Columns A, B, and C show the state of my data prior to sorting it, and the lines under Intended Output show how I would like the data to output.
I am trying to automate this process, so I can paste data into columns A, B, & C in the raw format and have it automatically spit out the required numbers in the format of the Intended Output. The GL codes remain the same, but the numbers and the number of rows will change. I've color coded them for ease of review.
Thank you very much in advance!
Using a combination of the following formulas you can create a list of filtered results. It works on the principle that the Data1 text you want to pull is the only text with a "-" in it, and that the totals you are pulling from Data2 and Data3 are the only numbers in their columns. Any change to that pattern will most likely break the formulas. Note that the formulas will not copy formatting.
IFERROR
INDEX
AGGREGATE
ROW
ISNUMBER
FIND
Let's assume the output will be placed in a small table with E2 being the upper-left data location.
In E2 use the following formula and copy down as needed:
=IFERROR(INDEX(A:A,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(FIND("-",$A$1:$A$30)),ROW(A1))),"")
In F2 use the following formula and copy to the right 1 column and down as needed:
=IFERROR(INDEX(B:B,AGGREGATE(15,6,ROW($A$1:$A$30)/ISNUMBER(B$1:B$30),ROW(A1))),"")
AGGREGATE performs array like calculations. As such, do not use full column references such as A:A in it as it can lead to excess calculations. Be sure to limit it to the range you are looking at.
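As a way to picture what AGGREGATE(15,6,ROW(...)/condition,k) returns, here is a small Python sketch: it finds the row number of the k-th cell that satisfies a condition, which INDEX then turns into a value. The function name kth_matching_row and the sample column contents are made up for illustration; they are not the data from the question.

```python
def kth_matching_row(values, predicate, k):
    """Return the 1-based row number of the k-th value that satisfies
    predicate, or None when there are fewer than k matches."""
    rows = [row for row, value in enumerate(values, start=1) if predicate(value)]
    return rows[k - 1] if 1 <= k <= len(rows) else None


# Hypothetical column A: only the GL code lines contain a "-".
column_a = ["1000-10 Cash", "Opening", "Total", "2000-20 Payables", "Total"]

def has_dash(text):
    return "-" in text

print(kth_matching_row(column_a, has_dash, 1))  # 1
print(kth_matching_row(column_a, has_dash, 2))  # 4
print(kth_matching_row(column_a, has_dash, 3))  # None (IFERROR turns this into "")
```

Rows that fail the condition drop out of the list, just as the division by FALSE produces errors that AGGREGATE option 6 ignores.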
Try this procedure:
Public Sub bruce_wayne()
    'Assumptions
    '1. Data spreadsheet will ALWAYS have the structure shown in the question
    '2. The key word "Total" (or whatever else it might be) is otherwise NOT found
    '   anywhere else in the 1st data column
    '3. output is written to the same sheet as the data
    '4. As written, invoked when data sheet is the active sheet
    '5. set the 1st 3 constants to the appropriate values
    Const sData2ReadTopLeft = "A1" 'Top left cell of data to process
    Const sData2WriteTopLeft = "J2" 'Top left cell of where to write output
    Const sSearchText = "Total" 'Keyword for summary data
    '*******************
    Const sReplaceText = "Wakanda"
    Dim r2Search As Range
    Dim sAccountCode As String
    Dim rSearchText As Range
    Dim iRowsProcessed As Integer
    Set r2Search = Range(sData2ReadTopLeft).EntireColumn
    sAccountCode = Range(sData2ReadTopLeft).Offset(1, 0).Value
    iRowsProcessed = 0
    Do While Application.WorksheetFunction.CountIf(r2Search, sSearchText) > 0
        Set rSearchText = r2Search.Find(sSearchText)
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 0) = sAccountCode
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 1) = rSearchText.Offset(0, 1).Value
        Range(sData2WriteTopLeft).Offset(iRowsProcessed, 2) = rSearchText.Offset(0, 2).Value ' add this if there are more summary columns to return
        'last two lines could be collapsed into a single line; at the expense of readability..
        rSearchText.Value = sReplaceText 'so that next search will find the next instance of the trigger text
        iRowsProcessed = iRowsProcessed + 1
        sAccountCode = rSearchText.Offset(1, 0).Value
    Loop
    r2Search.Replace what:=sReplaceText, Replacement:=sSearchText
End Sub
I have a large table that only shows a single type of information: whether or not a species of plant was present at a particular study site. I have 500+ species listed in the first column, and 30 sites as column names. The table is populated with a simple "Y" or "N" to show presence. Example:
Scientific Name Old Wives Beach Dadi Orote N Airstrip
Abelmoschus moschatus N N N
Abrus precatorius Y N Y
Abutilon indicum N N N
However, the species list contains some species that do not occur at any sites, rendering a row full of "N"s, like the 1st and 3rd rows in the example above. I need to delete those rows in order to make the table more manageable.
Is there any way to achieve this without a long IF AND statement?
Inspired by pnuts' comment: in a new column, use a COUNTIF() formula. For example, =COUNTIF(B2:AE2,"Y"), assuming the row/column headers are in row 1 and column A and the data is in the range B2:AE501+.
If you then select the entire range, including the headers and the new formula column, and add filters, you can show only the rows where the count of Y's is 0. Once only the 0's are showing, you can select the entire rows and delete them (using Right-Click, Delete) without affecting the non-zero rows.
At this point, if you no longer need the count column, you can turn off the filter and delete the column, but I wouldn't be surprised if you find the count comes in handy for some other reason.
Alternatively, you could just use the filter to HIDE the 0 rows rather than delete them. That way you don't remove the data altogether, but it's no longer in your way.
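The count-then-filter idea is easy to check on the sample rows from the question. The following Python sketch is only an illustration of the logic (the function name drop_all_n_rows is hypothetical); it keeps a row when the count of "Y" across the site columns is greater than 0.

```python
def drop_all_n_rows(rows):
    """Keep only rows with at least one "Y" among the site columns
    (every column after the species name)."""
    return [row for row in rows if row[1:].count("Y") > 0]


table = [
    ["Abelmoschus moschatus", "N", "N", "N"],
    ["Abrus precatorius", "Y", "N", "Y"],
    ["Abutilon indicum", "N", "N", "N"],
]
print(drop_all_n_rows(table))
# [['Abrus precatorius', 'Y', 'N', 'Y']]
```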
The code below is one way to do this, assuming there are no gaps in the data. The animated gif steps through to demonstrate how it works. You should remove the .select statements once you understand it.
Sub deleteIfAllN()
    Dim plantR As Range, r As Range, allN As Boolean
    Set plantR = Range("A2")                    'first species name
    While plantR <> ""
        plantR.Select
        Set r = plantR.Offset(0, 1)             'first site column in this row
        allN = True
        Do
            r.Select
            If r <> "N" Then                    'found a site where the species occurs
                allN = False
                Exit Do
            End If
            Set r = r.Offset(0, 1)
        Loop Until r = ""
        Set plantR = plantR.Offset(1, 0)        'move down before any delete
        Rows(plantR.row - 1).Select
        If allN Then Rows(plantR.row - 1).Delete
    Wend
End Sub
You can use the Advanced Filter
Set up your data and criterion area as below
For the example you posted, the formula would be:
=COUNTIF($B8:$D8,"N")<>3
For 30 columns, just modify the range and the count.
Before
After
I chose to filter in place
Note that there is also an option to Copy to another location which would place the results of the filter in another location.
I have two columns in Excel:
Column A
Row 1 Apple
Row 2 Blueberry
Row 3 Strawberry
Column B
Row 1 App
Row 2 Application
Row 3 Appendage
I would like to use Column B to see if any cells within it exist within the given cell in Column A. So far, I have used the VLOOKUP and MATCH functions and I can't seem to get either to work properly, but MATCH seems to be the one I should be using. I tried using wildcards on Column B and it returns a value error. Here is what I have:
=MATCH(A1,"*"&B:B&"*",0)
Your help is greatly appreciated!
There is a natural VBA solution. In a standard code module place:
Function PartialMatch(v As Variant, R As Range) As Variant
    Dim i As Long
    For i = 1 To R.Cells.Count
        If v Like "*" & R.Cells(i).Value & "*" Then
            PartialMatch = i
            Exit Function
        End If
    Next i
    PartialMatch = CVErr(xlErrNA)
End Function
Then where you want it in a spreadsheet you can use the formula:
=PartialMatch(A1,B:B)
It will give the index of the first partial match, if any exists, or #N/A if it doesn't. Note that a blank cell counts as a partial match, so you might want to make sure that the range that you pass the function contains no blanks (so don't pass the whole column). That, or redefine what you mean by a partial match.
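The behaviour described above, including the blank-cell caveat, can be seen in a short Python sketch of the same loop. The function name partial_match is hypothetical, and Python's substring test stands in for the VBA Like operator with surrounding wildcards.

```python
def partial_match(value, candidates):
    """Return the 1-based index of the first candidate contained in value,
    or None when there is no partial match (the #N/A case)."""
    for index, candidate in enumerate(candidates, start=1):
        if candidate in value:       # "" is contained in every string
            return index
    return None


column_b = ["App", "Application", "Appendage"]
print(partial_match("Apple", column_b))        # 1
print(partial_match("Blueberry", column_b))    # None
print(partial_match("Blueberry", ["", "App"])) # 1, because a blank always matches
```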
I've got a spreadsheet full of names and peoples' roles, like the one below:
Role Name Change
1 A Yes
2 A No
5 A N/Ap
1 B Yes
3 B No
2 C Yes
4 C No
I have to come up with a spreadsheet like the one below:
1 2 3 4 5 6
A Yes
B
C
Basically, it should retrieve the information from the first spreadsheet and lay it out clearly on the second one.
There are way too many names and roles to do it manually. VLOOKUP won't work and I've tried MATCH and INDEX.
An alternative to @RocketDonkey's answer (but thanks for the more complete desired result!) could be to string together Role and Name, say in a column inserted between B & C in Sheet1 (because I think OP wants a separate sheet for the results):
C2: =A2&B2 copied down as required
then use a lookup in Sheet2!B2:
=IFERROR(VLOOKUP(B$1&$A2,Sheet1!$C$2:$D$8,2,FALSE),"")
copied across and down as required.
This assumes the grid for the results (as in question) has been constructed (and that there are 7 rows with data - adjust $8 as necessary otherwise.)
Agree with @Melanie that if you can force your data into a structure that can be interpreted as numbers (1 being yes, 0 being no, for example), Pivot tables are far and away the easiest way (since they will display numbers as the values, not the text). *(see below)
If you want to display arbitrary text, you could try this:
=IF(
SUMPRODUCT(--($A$2:$A$8=F$1),--($B$2:$B$8=$E2),ROW($A$2:$A$8))=0,"",
INDEX(
$A$1:$C$8,
SUMPRODUCT(--($A$2:$A$8=F$1),--($B$2:$B$8=$E2),ROW($A$2:$A$8)),
3))
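This checks whether the SUMPRODUCT over the three arrays totals 0, which happens when no Role/Name combination is matched (Name: C, Role: 5, for instance), and if so, it returns "". Otherwise, it returns the value in the Change column. This assumes each Role/Name combination appears at most once, since the SUMPRODUCT would otherwise add the matching row numbers together.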
*A ‘pivot table option’ would be to represent the Change as a number (eg as formula in D2 copied down). Then create a pivot table from (in the example) A1:D8, with fields as shown. Copy the pivot table to a different sheet with Paste Special/Values (though shown in F11:K15 of same sheet in example). Then in that other sheet select row starting with Name A and as far down as required, Replace -1 with No, 1 with Yes and 0 with N/Ap.
AMENDED
You can use array formulas to reorganize your table without having to change its structure. Assuming the data is in the range A2:C8 on Sheet1 and the result table is to be in range A1:G4 on Sheet2, the following formula would be the first entry (role 1 and name A) in the result table.
=IFERROR(INDEX(Sheet1!$A$2:$C$8,MATCH(B$1&$A2,Sheet1!$A$2:$A$8&Sheet1!$B$2:$B$8,0),3),"-")
The MATCH formula returns the row number in which the role/name combination 1A occurs. The INDEX function returns the contents of the cell at the row number found by the MATCH formula and the column number 3, i.e., the Change column of your data table. The IFERROR returns "-" if the role/name combination is not in the data table.
Be sure to enter the formula using the Control-Shift-Enter key combination. Then copy the formula to the remaining cells of the result table.
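For readers who want to check the expected grid, here is a small Python sketch of the same lookup: it builds a key from role and name, keeps the first match (as MATCH does), and fills "-" where a combination is absent. The function name pivot_changes is hypothetical; the records are the sample rows from the question.

```python
def pivot_changes(records, roles, names):
    """Build a name -> list-of-changes grid, one entry per role, with "-"
    where the role/name combination does not occur."""
    lookup = {}
    for role, name, change in records:
        # setdefault keeps the first occurrence, like MATCH with match type 0
        lookup.setdefault(f"{role}{name}", change)
    return {
        name: [lookup.get(f"{role}{name}", "-") for role in roles]
        for name in names
    }


records = [
    (1, "A", "Yes"), (2, "A", "No"), (5, "A", "N/Ap"),
    (1, "B", "Yes"), (3, "B", "No"),
    (2, "C", "Yes"), (4, "C", "No"),
]
grid = pivot_changes(records, roles=range(1, 7), names=["A", "B", "C"])
print(grid["A"])  # ['Yes', 'No', '-', '-', 'N/Ap', '-']
print(grid["B"])  # ['Yes', '-', 'No', '-', '-', '-']
print(grid["C"])  # ['-', 'Yes', '-', 'No', '-', '-']
```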
The data table on Sheet1:
The result table on Sheet2:
Since the question has an Excel-VBA tag, I thought I would round out the types of solution by adding one in VBA :) The following code is not elegant, but if you need a code-based approach, give it a try :)
Code:
Option Explicit
Public Sub sortAndPivot()
    Dim d As Object
    Dim ws As Worksheet
    Dim sourceArray As Variant, pvtArray As Variant, v As Variant
    Dim maxRole As Long
    Dim i As Long, j As Long, k As Long, m As Long
    Set d = CreateObject("Scripting.Dictionary")
    Set ws = Worksheets("Sheet3") '-- set according to your sheet
    '-- you could enhance by using an input box to select the range
    sourceArray = Application.WorksheetFunction.Transpose(ws.Range("B3:D9").Value)
    '-- max role number
    maxRole = Application.WorksheetFunction.Max(ws.Range("B3:B9"))
    '-- find unique name list
    For i = LBound(sourceArray, 2) To UBound(sourceArray, 2)
        If Not d.exists(sourceArray(2, i)) Then
            d.Add sourceArray(2, i), i
        End If
    Next i
    ReDim pvtArray(d.Count, maxRole)
    pvtArray(0, 0) = "Name"
    '-- add unique names from dictionary
    j = 1
    For Each v In d.keys
        pvtArray(j, 0) = v
        j = j + 1
    Next
    '-- add unique Role number list
    For i = UBound(pvtArray, 2) To LBound(pvtArray, 2) + 1 Step -1
        pvtArray(0, i) = i
    Next i
    '-- sort into the correct positions
    For k = LBound(pvtArray, 1) + 1 To UBound(pvtArray, 1)
        For m = LBound(pvtArray, 2) + 1 To UBound(pvtArray, 2)
            For i = LBound(sourceArray, 2) To UBound(sourceArray, 2)
                If pvtArray(k, 0) = sourceArray(2, i) Then
                    If pvtArray(0, m) = sourceArray(1, i) Then
                        pvtArray(k, m) = sourceArray(3, i)
                    End If
                End If
            Next i
        Next m
    Next k
    'Output the processed array into the Sheet in pivot view.
    Range("F2").Resize(UBound(pvtArray) + 1, _
        UBound(Application.Transpose(pvtArray))) = pvtArray
    Set d = Nothing
End Sub
Results:
There is another way to go about it without VBA. If you create another column that concatenates the first two in the first spreadsheet, like so:
Role Name Change CheckColumn
1 A Yes 1A
2 A No 2A
5 A N/Ap 5A
1 B Yes 1B
3 B No 3B
2 C Yes 2C
4 C No 4C
Then you can use the Offset and Match functions together to find the change in the 2nd sheet. So assuming your data is laid from cell A1, the formula in cell B2 would be:
=iferror(offset(Sheet1!$A$1,match(B$1&$A2,sheet1!$D:$D,0)-1,2),"")
Alternatively, if you put the concatenated column in sheet1 before the role column, you can use vlookup in sheet2, with the formula being:
=iferror(vlookup(B$1&$A2,sheet1!$A:$D,4,false),"")