Cross reference two data sources for matches in Excel 2010

Firstly, thank you for checking my question. I'm new to doing anything advanced in Excel, so I'm a bit lost.
I am trying to match names from two different sources that have the same data structure. There are 3 columns: LastName, FirstName, MiddleName. I added a fourth column to denote which organization the record came from, put both sources into one table, and made a pivot table out of it. That works well enough, but I'm having a hard time generating any useful data from it.
There are two main objectives once I have them matched:
1. I need the percentage of records that match.
2. I need to be able to filter out the ones that matched so I can investigate the ones that didn't.
Here is a small example.
+-------------+-----------+------------+------+
| LastName | FirstName | MiddleName | Org. |
+-------------+-----------+------------+------+
| Jones | Mike | Anthony | Org1 |
| Black | Marry | | Org1 |
| Zeek | Winston | E | Org1 |
| Jones | Mike | A | Org2 |
| Black-Smith | Marry | | Org2 |
| Zeek | Winston | E | Org2 |
+-------------+-----------+------------+------+
As you can see, only Winston E Zeek would really match, because all three names are exactly the same. Mike Jones won't match because the middle names differ (Anthony vs. A), and Black and Black-Smith won't match because they are technically different last names. These data issues are fine at this stage, because they are exactly what I'm trying to identify in a larger data set.
Maybe Excel isn't the best tool for this without using VBA? I'm not familiar with VBA, which is why I haven't tried it yet, and unfortunately I have limited time.
How can I solve this matching problem?
Any assistance and guidance will be appreciated.

Here's a quick idea:
1. Sort the data by last name, first name, middle name. That should put same or similar names next to each other.
2. Add a column that, for each row, has a worksheet function like =IF(A3=A2,1,0). This indicates whether the row matches the one above it.
3. Sum the new column. That tells you the number of matches. Divide by the total number of rows to get your percentage.
You can modify the function in step 2 to make the match as tight as you want, for example =IF(AND(A3=A2,B3=B2,C3=C2),1,0) to require the last, first and middle names to all match.
Advantage: No VBA needed. Disadvantage: It requires some manual work and interpretation.
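The steps above can be checked outside Excel with a short Python sketch. It assumes each record is a (LastName, FirstName, MiddleName, Org) tuple as in the example table, and uses the strict version of the step-2 formula (all three names must equal the row above). The function name `flag_matches` is hypothetical. Because each matched pair contributes only one flag, dividing by the row count of a single source, rather than by all rows, gives the share of that source that matched.

```python
from typing import List, Tuple

Record = Tuple[str, str, str, str]  # (LastName, FirstName, MiddleName, Org)


def flag_matches(records: List[Record]) -> List[Tuple[Record, int]]:
    """Sort by name, then flag a row 1 if its three name fields equal the row above."""
    rows = sorted(records, key=lambda r: (r[0], r[1], r[2]))
    flagged = []
    for i, row in enumerate(rows):
        # Like =IF(AND(A3=A2,B3=B2,C3=C2),1,0): compare with the previous row only.
        match = 1 if i > 0 and row[:3] == rows[i - 1][:3] else 0
        flagged.append((row, match))
    return flagged


data = [
    ("Jones", "Mike", "Anthony", "Org1"),
    ("Black", "Marry", "", "Org1"),
    ("Zeek", "Winston", "E", "Org1"),
    ("Jones", "Mike", "A", "Org2"),
    ("Black-Smith", "Marry", "", "Org2"),
    ("Zeek", "Winston", "E", "Org2"),
]

flagged = flag_matches(data)
matches = sum(flag for _, flag in flagged)
print(matches, "match(es),", f"{matches / len(flagged):.0%} of all rows")

# Filter out both rows of every matched pair; what is left needs investigating.
matched_names = {row[:3] for row, flag in flagged if flag}
to_check = [row for row, _ in flagged if row[:3] not in matched_names]
```

With the example data only the two Winston E Zeek rows drop out, leaving the Jones and Black rows to review.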

Related

Excel: How do I automatically extract updated info from the main sheet to individual sheets

I am currently doing a sales summary covering lots of customers, and I am trying to find a way to automatically copy values from the main sheet to each customer's individual sheet, as there are too many customers for me to do that by hand.
The main sheet looks like this, with headers:
| Serial | Date   | Customer Name | Product Info |
| 001    | Jan4th | Mike          | Apple        |
I am trying to create a formula so that each individual sheet extracts only the rows for one customer (e.g. Mike), with the other sheets doing the same for other customers.
It would help a lot if the formula could also pick up values that are added later. The other methods I found only distribute the values that already exist, so whenever new values are added I have to repeat the process, which is not efficient given the number of customers I have to summarize.
If not a formula, any other method would help too, but VBA is a bit beyond my capability, so if you suggest it, detailed steps on how to use it would be much appreciated.
If anyone can come up with anything I would be grateful. Thank you for your attention.
I have tried copy and Paste Link, but the linked cells do not pick up new values added after the paste.
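What the question asks for, one view per customer that picks up new rows, amounts to grouping the main sheet's rows by the Customer Name column and rebuilding the groups whenever the main sheet changes. The Python sketch below shows only that grouping logic. It assumes rows are (Serial, Date, Customer Name, Product Info) tuples; `split_by_customer` is a hypothetical name, it does not read or write Excel, and the rows after the first are made-up sample data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, str, str]  # (Serial, Date, Customer Name, Product Info)


def split_by_customer(main_sheet: List[Row]) -> Dict[str, List[Row]]:
    """Group main-sheet rows by customer name, keeping their original order."""
    sheets: Dict[str, List[Row]] = defaultdict(list)
    for row in main_sheet:
        sheets[row[2]].append(row)
    return dict(sheets)


main = [
    ("001", "Jan4th", "Mike", "Apple"),
    ("002", "Jan5th", "Anna", "Pear"),     # hypothetical sample row
    ("003", "Jan6th", "Mike", "Orange"),   # hypothetical sample row
]
per_customer = split_by_customer(main)
```

Because the groups are recomputed from the full main sheet each time, a newly added row lands in its customer's group without repeating any copy-paste step.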

VBA - Remove duplicates which contain less information

First question on Stack, but not my first visit!
Basically I have this huge Excel database (>24 000 rows, merged from different tables) I have been working on for weeks and now that I'm done adding new entries, I have to clean it by removing a lot of duplicates.
The array/table is structured in the following manner:
+---------+-------+--------------------+------------+-------------------+
| Company | Name  | Address            | Phone      | Email             |
+---------+-------+--------------------+------------+-------------------+
| Baij&Co | Steve | 458 Preston avenue | 4156854789 | steve@baij&co.com |
+---------+-------+--------------------+------------+-------------------+
I did look into conventional methods, but they don't quite solve my problem, for example:
Using the "Remove Duplicates" Excel button by selecting all columns to make sure I only keep unique values
Using the filtering method to identify the duplicates and then remove them.
However, my goal is to remove the duplicates for which the given row(s) contains the minimal amount of information, as shown in this example:
+---------+-------+--------------------+------------+-------------------+
| Company | Name  | Address            | Phone      | Email             |
+---------+-------+--------------------+------------+-------------------+
| Baij&Co | Steve | (blank)            | 4156854789 | steve@baij&co.com |
| Baij&Co | Steve | (blank)            | (blank)    | steve@baij&co.com |
| Baij&Co | Steve | 458 Preston avenue | 4156854789 | steve@baij&co.com |
+---------+-------+--------------------+------------+-------------------+
Here, I would like to remove the 1st AND 2nd rows, as they contain less information (missing address and/or phone) about the same contact.
Does that make sense?
I only know the basics of VBA (like creating a userform to add a new contact and fill out the entered information in the right cells) but I struggle with advanced algorithms.
As far as I know, the built-in VBA method cannot be customized beyond selecting the columns in which to look for duplicates:
Sheets("Database").Range("ContactsTable").RemoveDuplicates Columns:=Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), Header:=xlNo
Any ideas?
Thanks fellas!
So I followed Tim Williams's suggestion (which is actually similar to Scott's) and did the following:
I realized that email addresses were the unique identifier (or primary key) and I have to delete rows that don't contain any (as it becomes useless to have a contact file without contact information).
I added a column named "Count" and inserted the following formula:
=COUNTIF(N:N; N2)
--> Here, "N:N" is the column containing all email addresses, and "N2" is the email cell of the first data row (the formula is then filled down).
I then sorted the table in descending order on the new "Count" column to put the most frequent addresses first.
Then I used Excel's "Remove Duplicates" tool and selected the email address column.
As a result, 10 000 rows were removed (out of 24 000). One thing is certain: the table now contains unique contacts based on email address. Sadly, however, I will never know for sure whether the most complete row was kept for each contact (unless I spend days comparing both databases row by row).
Problem solved, I guess! Although I would be interested in a VBA script that does the same (to learn the algorithmic side), if anyone knows how :-)
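Counting how often each email appears groups the duplicates but does not measure how complete each copy is. The algorithm the question asks for can be sketched as follows, written in Python rather than VBA and assuming the email column is the key: score each row by its number of non-blank cells (similar to COUNTA across the row), keep the highest-scoring row per email, and drop rows without an email. `keep_most_complete` and the column position `EMAIL` are hypothetical; emails are compared case-insensitively, and ties keep the first row seen.

```python
from typing import Dict, List, Sequence

EMAIL = 4  # assumed position of the Email column in each row


def filled(row: Sequence[str]) -> int:
    """Number of non-blank cells in a row."""
    return sum(1 for cell in row if cell.strip())


def keep_most_complete(rows: List[Sequence[str]]) -> List[Sequence[str]]:
    """Keep, per email address, the row with the most non-blank cells."""
    best: Dict[str, Sequence[str]] = {}
    for row in rows:
        email = row[EMAIL].strip().lower()
        if not email:
            continue  # rows without an email are dropped, as in the question
        # Replace the kept row only if this one has strictly more data.
        if email not in best or filled(row) > filled(best[email]):
            best[email] = row
    return list(best.values())


rows = [
    ["Baij&Co", "Steve", "", "4156854789", "steve@baij&co.com"],
    ["Baij&Co", "Steve", "", "", "steve@baij&co.com"],
    ["Baij&Co", "Steve", "458 Preston avenue", "4156854789", "steve@baij&co.com"],
]
result = keep_most_complete(rows)
print(result)
```

In VBA the same idea would use a Scripting.Dictionary keyed by email in place of the Python dict.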
Thanks again!

How can I make a fixed-width multiline table in vim/neovim?

I want to write a text file for a video script. I want to format the text like a table. It needs to have two columns and any number of rows. I want the text in the 'cells' to be multiline but have a fixed width for the columns.
Here's the effect I'm trying to achieve with three columns. I don't need the scene numbers: Example Script
VimWiki is the best I've found so far, but its columns aren't fixed width and it's difficult or impossible to reflow the text.
Any suggestions for a better way to do this?
I wonder whether the markdown table style would suit your needs. An example is below. I use vim-table-mode to draw these tables; once a table has been created, new rows you add to it automatically match the previous column widths. Here is the demo for this plugin, and its GitHub repo can be found here.
| App | Usage | NeedTimeMaster? |
| ------------------- | --------------- | ----------------- |
| Telegram | tec. Chat | No |
| Mail | Communications | No |
| Chrome | Browser | No |
| Things | Tasks | Maybe(slightly) |
| Books | Books | No |
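The markdown table above keeps each cell on one line. For the layout the question describes (two fixed-width columns whose cells wrap over several lines), a small script using Python's standard `textwrap` module can generate the text, which could then be pasted into a Vim buffer. This is a sketch only: `render_two_columns`, the default widths and the sample script text are all hypothetical.

```python
import textwrap
from itertools import zip_longest
from typing import List, Tuple


def render_two_columns(rows: List[Tuple[str, str]], width1: int = 30, width2: int = 40) -> str:
    """Render (left, right) text pairs as a table with two fixed-width, wrapping columns."""
    border = "+" + "-" * (width1 + 2) + "+" + "-" * (width2 + 2) + "+"
    lines = [border]
    for left, right in rows:
        # Wrap each cell to its column width; a cell may span several lines.
        left_lines = textwrap.wrap(left, width1) or [""]
        right_lines = textwrap.wrap(right, width2) or [""]
        for l, r in zip_longest(left_lines, right_lines, fillvalue=""):
            lines.append(f"| {l:<{width1}} | {r:<{width2}} |")
        lines.append(border)
    return "\n".join(lines)


print(render_two_columns([
    ("Wide shot of the city at dawn, slowly panning left.", "Narrator introduces the topic."),
    ("Close-up on the presenter.", "Presenter explains the first point in a few short sentences."),
], width1=20, width2=25))
```

To reflow after editing, the cell text would be changed in the script's input and the table regenerated, since the script does not parse an existing table.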

Searching in Excel for certain values, if found give text from cell to the left of where we found the value

First let me explain what I want to achieve.
I currently have an Excel like this:
Names | Standards
James | Standard 1
James | Standard 2
James | Standard 3
Francis | Standard 1
Francis | Standard 2
Francis | Standard 3
Leon | Standard 2
Leon | Standard 3
Peter | Standard 2
Michael | Standard 3
And I want to create something like this:
Standard | Name 1 | Name 2 | Name 3 | Name 4
Standard 1 | James | Francis | |
Standard 2 | James | Francis | Leon | Peter
Standard 3 | James | Francis | Leon | Michael
My real Excel has more than 300 standards, so I would like to automate this using Excel Formula. I know this is possible, but I haven't used Excel in a while, so I could use a push in the right direction.
A couple of things I need (I think):
I need to count how many times people in the Names column mention each standard, so I know that I need 2 names for Standard 1 and 4 for Standard 3. I think I can do this with the COUNTIF function.
We need to find the location of each standard. I think I can do this with the MATCH function, which gives the location of the first match in my original sheet. By sorting my original sheet A-Z by standard and combining that with the COUNTIF result, I know where all the matches are (first match + COUNTIF - 1 = location of the last match, and everything in between is also that standard).
For the first name that mentions a standard, I will reference the cell to the left of the first match (because the names are in the column to the left of the standards). For the second name, I will reference the cell to the left of the cell below the first match, and so on, until I have as many names as COUNTIF returned. So I need an IF statement that makes sure that if 2 people mention Standard 1, that row gets only 2 names and 2 cells containing "".
How will I reference the cells? With another IF statement that uses this: Excel Reference To Current Cell. Correct me if I am wrong, but can't I then just say THIS.CELL = the cell location I found (probably using INDIRECT here)?
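The plan above (sort by standard, find the first match, count the matches, then read that many consecutive rows) can be checked with a Python sketch before building the formulas. Here `list.index` stands in for MATCH with exact matching and `list.count` for COUNTIF; `names_per_standard` is a hypothetical name. Python's sort is stable, so names keep their original order within each standard.

```python
from typing import Dict, List, Tuple


def names_per_standard(pairs: List[Tuple[str, str]], slots: int) -> Dict[str, List[str]]:
    """For each standard, list the names that mention it, padded with "" up to `slots`."""
    # Sort by standard so equal standards are contiguous (like sorting the sheet A-Z).
    rows = sorted(pairs, key=lambda p: p[1])
    standards = [std for _, std in rows]
    result: Dict[str, List[str]] = {}
    for std in dict.fromkeys(standards):          # unique standards, in sorted order
        first = standards.index(std)              # like MATCH(std, column, 0), but 0-based
        count = standards.count(std)              # like COUNTIF(column, std)
        # The last match is at first + count - 1.
        names = [rows[i][0] for i in range(first, first + count)]
        result[std] = names + [""] * (slots - count)
    return result


data = [
    ("James", "Standard 1"), ("James", "Standard 2"), ("James", "Standard 3"),
    ("Francis", "Standard 1"), ("Francis", "Standard 2"), ("Francis", "Standard 3"),
    ("Leon", "Standard 2"), ("Leon", "Standard 3"),
    ("Peter", "Standard 2"), ("Michael", "Standard 3"),
]
table = names_per_standard(data, slots=4)
```

With the example data this reproduces the target layout from the question, including the two empty cells for Standard 1.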
This is just me brainstorming, but I would love to know if people have any other ideas for my problem or have some feedback for my current plan.
An important thing to mention is that I want to do this with Excel formulas. I do realise that this isn't always the best approach, but VBA is not an option at the moment. I am also not worried about performance, because I'll just copy all the values once the formulas have found all the names.
Thanks in advance!
Depending on the layout you want, I think you should use a pivot table. Drag the 'Standards' and 'Names' fields to the 'rows' box, then right-click a standard and choose 'Field Settings' > 'Layout and Print' > 'Show item labels in tabular form'.
If you definitely need the data in the format from your question, I would edit the pivot table by dragging the 'Names' field to the 'columns' box. Then drag the 'Standards' field from the field list a second time and drop it into the 'values' box.
In the space underneath the pivot table, use an IF formula to copy a name only where there is a 1. This approach will obviously be quite fragile, so if you can make do with the first approach, I think you will run into fewer problems in the future.
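The second pivot layout is effectively a Standards by Names grid holding 1 where a name mentions a standard, and the IF step keeps the names whose cell is 1. A Python sketch of that two-stage idea follows; `build_grid` and `names_with_one` are hypothetical names. Unlike a row of IF formulas, which leaves blank cells where there is no 1, this sketch also closes the gaps.

```python
from typing import Dict, List, Tuple


def build_grid(pairs: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, Dict[str, int]]]:
    """Pivot-style grid: grid[standard][name] is 1 if that name mentions the standard, else 0."""
    names = sorted({name for name, _ in pairs})   # pivot columns, sorted A-Z
    grid: Dict[str, Dict[str, int]] = {}
    for name, std in pairs:
        grid.setdefault(std, {n: 0 for n in names})[name] = 1
    return names, grid


def names_with_one(names: List[str], row: Dict[str, int]) -> List[str]:
    """The IF step: keep a name only where its cell holds 1."""
    return [n for n in names if row[n] == 1]


pairs = [("James", "Standard 1"), ("Francis", "Standard 1"), ("Leon", "Standard 2")]
names, grid = build_grid(pairs)
```

Note that the names come out in the pivot's A-Z column order, not in the order of the source sheet.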

Use heading as column

I have a spreadsheet laid out similar to this:
Weekly Report | 25/06/2012
-------------------------------
Name | Course |
-------------------------------
Peter | Maths |
-------------------------------
John | English |
-------------------------------
James | History |
-------------------------------
Each week a new report is sent, sometimes with different people and sometimes with the same people on different courses. I want to use SSIS to create an extra column showing the date, which is usually in cell B3 of the spreadsheet.
So the final result would look like this:
Weekly Report | 25/06/2012
--------------------------------------------
Name | Course | Date |
--------------------------------------------
Peter | Maths | 25/06/2012 |
--------------------------------------------
John | English | 25/06/2012 |
--------------------------------------------
James | History | 25/06/2012 |
-------------------------------------------
Hopefully I've explained myself. I am rather new to SSIS, so I don't know whether this is really obvious or something more difficult.
Thanks for clarifying your question. You have two steps: extracting the date and turning it into a column. I would probably use an Execute SQL task to query the Excel sheet and map the date to a package variable. Then you can use that variable in a Derived Column transformation to add it to the data set being processed.
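The two steps in this answer (read the report date once, then add it to every row) can be illustrated outside SSIS. The Python sketch below assumes the sheet has already been read into a list of rows, with the date in the second cell of the first row as in the example layout. Since the question says the date is usually in B3, its position is a parameter. `add_report_date` is hypothetical and only stands in for the Execute SQL task plus Derived Column transformation.

```python
from typing import List


def add_report_date(sheet: List[List[str]], date_row: int = 0, date_col: int = 1,
                    header_row: int = 1) -> List[List[str]]:
    """Copy the report date into a new Date column for every data row under the header."""
    report_date = sheet[date_row][date_col]       # step 1: extract the date once
    header = sheet[header_row] + ["Date"]
    # Step 2: append the date to each data row (the "derived column").
    data = [row + [report_date] for row in sheet[header_row + 1:]]
    return [header] + data


sheet = [
    ["Weekly Report", "25/06/2012"],
    ["Name", "Course"],
    ["Peter", "Maths"],
    ["John", "English"],
    ["James", "History"],
]
for row in add_report_date(sheet):
    print(" | ".join(row))
```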
Finally, you might want to consider not using SSIS at all. If your source is Excel and your destination is MySQL, then using SSIS means you also need SQL Server, so depending on your environment, writing your own script or program might be simpler.
