How do I parse every string while comparing two excel spreadsheets?


Good evening,
I'm attempting to compare two Excel spreadsheets using the IF and MATCH functions as follows:
=IF(ISERROR(MATCH(fromADP!$C2,fromSMS!$A$2:$A$4792,0)),"No match found",fromADP!$C2)
I have two worksheets (fromADP and fromSMS). I'm trying to compare them to find out which records in the fromADP worksheet also appear in the fromSMS worksheet. The MATCH function offers only three options for the match_type argument. I'm using 0, although I'm not sure I understand exactly how the other two options work. I tried them, though, without the results I wanted.
When I use match_type 0 I only get one match, but this is an exact match (as I would expect). My problem is that some of the records do in fact exist in both worksheets, but with minor differences (for example, "Tony's" vs. "Tonny's" or "Jimmy's Trucking, LLC" vs. "Jimmy's Trucking").
So I'm wondering: is there another way to do this, or could there perhaps be a VBScript that would parse each string in my lookup_value? That way, I could find the records that differ only slightly.
I'm afraid I may simply have to pull out my ruler and pencil and comb through the spreadsheets line by line. Any help would be appreciated.
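On the match_type question: MATCH with 1 returns the position of the largest value less than or equal to lookup_value (the array must be sorted ascending), -1 returns the smallest value greater than or equal to it (the array must be sorted descending), and 0 requires an exact match. Neither 1 nor -1 measures similarity; they only find the nearest neighbour in sort order, which is why they don't help with "Tony's" vs. "Tonny's". The Python sketch below mimics those three rules under simplifying assumptions: no wildcards, and comparisons are case-sensitive (Excel's MATCH ignores case). `excel_match` is a hypothetical helper name, not an Excel API, and it returns None where Excel would return #N/A.

```python
from bisect import bisect_right

def excel_match(lookup_value, lookup_array, match_type=1):
    """Rough imitation of Excel's MATCH; returns a 1-based position or None (#N/A).

    Assumptions: no wildcard support, case-sensitive comparison, and the array
    is already sorted the way MATCH requires for match_type 1 or -1.
    """
    if match_type == 0:
        # Exact match: first position whose value equals lookup_value.
        for i, value in enumerate(lookup_array):
            if value == lookup_value:
                return i + 1
        return None
    if match_type == 1:
        # Ascending array: position of the largest value <= lookup_value.
        i = bisect_right(lookup_array, lookup_value)
        return i if i > 0 else None
    if match_type == -1:
        # Descending array: position of the smallest value >= lookup_value.
        # Negating turns it into an ascending sequence (numbers only).
        negated = [-v for v in lookup_array]
        i = bisect_right(negated, -lookup_value)
        return i if i > 0 else None
    raise ValueError("match_type must be 1, 0 or -1")

print(excel_match("Tonny's", ["Jimmy's Trucking", "Tony's"], 0))  # None: no exact match
print(excel_match(25, [10, 20, 30], 1))                           # 2: position of 20
```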

Hi all,
Thanks to the solution offered here, I was able to use the Fuzzy Lookup Add-in for Excel to accomplish this task, so my question is answered and my issue resolved.
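The Fuzzy Lookup add-in scores how similar two strings are instead of requiring an exact match. To illustrate the same idea outside Excel, here is a minimal Python sketch using the standard library's `difflib.SequenceMatcher`. This is a different similarity measure from the one the add-in uses, and the 0.6 cutoff and the sample lists are assumptions for illustration only.

```python
import difflib

def best_fuzzy_match(value, candidates, cutoff=0.6):
    """Return (best_candidate, score), or (None, 0.0) if no score reaches cutoff.

    Uses SequenceMatcher.ratio() on lower-cased strings as a stand-in
    similarity measure (not the Fuzzy Lookup add-in's algorithm).
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        score = difflib.SequenceMatcher(None, value.lower(), candidate.lower()).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score < cutoff:
        return None, 0.0
    return best, best_score

# Hypothetical sample data modelled on the examples in the question.
from_adp = ["Tony's", "Jimmy's Trucking, LLC", "Acme Corp"]
from_sms = ["Tonny's", "Jimmy's Trucking", "Widget Works"]
for name in from_adp:
    print(name, "->", best_fuzzy_match(name, from_sms))
```

Each lookup compares against every candidate, so for large sheets this is slow; it is meant to show the scoring idea, not to replace the add-in.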

Related

Replacing numeric values in Excel sheet with text values from other sheet

I am using SurveyMonkey for a questionnaire. Most of my data uses a regular scale from 0 to 6, plus an "Other" option that people can choose if they prefer not to answer the item. However, when I download the data, SurveyMonkey automatically assigns a value of 0 to that non-answer category, and it appears this can't be changed.
As a result, I can't tell whether a zero in my numeric dataset actually means zero or means that the participant chose not to answer the question. The only way to find out is to look at another file that contains the labels of the participants' answers (all answers are given as their corresponding labels, so this file is missing all non-labeled answers...).
This leads me to my problem: I have two Excel files of the same size. I need a way to find certain text values in one dataset (scattered randomly across it) and replace the numeric values at the same positions in the other dataset with those text values.
I thought it would be possible to just find all the values and copy and paste them in the same pattern, but I can't find a way to do that. I feel like I'm missing an obvious solution, but after searching for quite a while I really couldn't find an answer to my specific question.
I have never worked with macros or more advanced Excel programming before, but I do have some general programming knowledge. I hope I explained this well; I would be very thankful for any suggestions or scripts that could help me out here!
Thank you!
Alex

I don't know how your Excel file is organised, but if it's like the legacy Condensed format, all you should need to do is select the column corresponding to a given question (if that's what you have) and use Find and Replace to replace every 0 (with "Match entire cell contents" checked) with the text you want.
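For the position-by-position replacement described in the question, here is a minimal Python sketch. It assumes both files have been exported (for example to CSV) and read into lists of rows with identical shapes, and that the non-answer label is "Other"; `overlay_labels` and the sample grids are hypothetical.

```python
def overlay_labels(numeric_grid, label_grid, labels_to_copy):
    """Return a copy of numeric_grid in which every cell whose label (same row
    and column in label_grid) is in labels_to_copy is replaced by that label."""
    if len(numeric_grid) != len(label_grid) or any(
        len(a) != len(b) for a, b in zip(numeric_grid, label_grid)
    ):
        raise ValueError("both grids must have the same shape")
    result = []
    for num_row, label_row in zip(numeric_grid, label_grid):
        result.append([
            label if label in labels_to_copy else value
            for value, label in zip(num_row, label_row)
        ])
    return result

# Hypothetical data: the first 0 is a non-answer, the second is a genuine 0.
numeric = [[3, 0, 5], [0, 6, 0]]
labels = [["Agree", "Other", "Strongly agree"],
          ["Disagree", "Strongly agree", "Other"]]
print(overlay_labels(numeric, labels, {"Other"}))
# [[3, 'Other', 5], [0, 6, 'Other']]
```

Only zeros whose matching label is "Other" are replaced, so genuine zeros survive, which a blanket replace of every 0 in a column would not guarantee.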

"Fuzzy Lookup" add in results

I'm using Excel 2010 and the Microsoft "Fuzzy Lookup" add-in to compare a column from each of 2 worksheets. The first worksheet has around 48,000 rows (x 3 columns); the second has around 23,000 rows (x 5 columns). Fuzzy Lookup compares one column from each and returns a similarity score between the two.
The fuzzy lookup appears to run without a problem, and the results - in most cases - appear to be correct. For example:
W2-NK22/16 in one worksheet shows a 0.97 similarity to W2NK2216.
But not in all cases. Some pairs I expected to have some degree of similarity instead get 0.000 from the add-in. For example:
761689700000
should have some degree of similarity to:
761689700000EN4239
but the Fuzzy Lookup add-in returns 0.000 for it. Both fields are formatted as text, neither has leading or trailing spaces, and the first 12 characters are identical.
I have uninstalled and reinstalled the add-in, and have used the default settings. The only other Fuzzy Lookup setting I have changed is Configure > Global > UseApproximateIndexing; I have set it to both False and True, and neither had any effect.
I have hundreds of examples like the one above that show 0.000 similarity, but upon inspection appear to be very similar. Rows before & after them show various degrees of similarity.
Any thoughts or ideas as to why this doesn't appear to function correctly, or a better way to do this approximate match would be appreciated.
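For cases like this one, where one ID is simply the other with a suffix appended, a plain prefix test outside the add-in catches them regardless of the similarity score. A minimal Python sketch, assuming both columns are exported as lists of text IDs; `prefix_matches` and the sample IDs are hypothetical, and this does not replicate or diagnose Fuzzy Lookup itself.

```python
from bisect import bisect_left

def prefix_matches(short_ids, long_ids):
    """Map each ID in short_ids to the IDs in long_ids that start with it.

    All strings sharing a prefix form one contiguous block in sorted order,
    so bisect jumps to the start of that block and only the block is scanned.
    """
    ordered = sorted(long_ids)
    result = {}
    for prefix in short_ids:
        i = bisect_left(ordered, prefix)
        hits = []
        while i < len(ordered) and ordered[i].startswith(prefix):
            hits.append(ordered[i])
            i += 1
        result[prefix] = hits
    return result

matches = prefix_matches(["761689700000"],
                         ["W2NK2216", "761689700000EN4239", "761689700001"])
print(matches)  # {'761689700000': ['761689700000EN4239']}
```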

Trying to add content even though this case is 2 years old. Hopefully someone else can use it.
For transformations, tokenization, etc., look in the folder where Fuzzy Lookup is installed. There is an example file there called Portfolio.xlsx and a corresponding Readme.docx file; both are very helpful. Frankly, the documentation for the Fuzzy Lookup add-in is terrible (but it is free). The Readme file describes an entitlement called "EditTransformationProvider" that might help with this kind of problem.
I've implemented Fuzzy Lookup in a couple of processes at my work, and we have saved hundreds of man-hours working in Excel. It's no joke.

How to optimize COUNTIFS with very large data

I would like to create a report that looks like the picture below.
My data has around 500,000 cells (and it will keep growing).
Right now I'm using Excel's COUNTIFS function, but it takes a very long time to calculate (I can't turn off automatic calculation).
The main value is a date, and the dates span about 3 years, so I need a lot of formulas to cover the whole range.
The picture below shows the data source: the top one cannot be changed, while the bottom one is the one I created myself (and can change). I use WEEKNUM to convert dates to week numbers.
Is there a better formula, or any other way to make this file faster? All kinds of suggestions are welcome!
I was thinking about using a PivotTable, but I don't know how to build one from this kind of data source.
P.S. VBA is the last resort.
You can download an example file here: https://www.mediafire.com/?t21s8ngn9mlme2d
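What a PivotTable does here is essentially count every (week, category) combination in one pass over the data, instead of having each COUNTIFS cell re-scan the whole range. A minimal Python sketch of that idea, assuming each record is a (date, category) pair (the real columns of the source data aren't shown here); `excel_weeknum` and `count_by_week` are hypothetical helpers, with `excel_weeknum` following WEEKNUM's default return_type 1 (weeks start on Sunday, and the week containing January 1 is week 1).

```python
from collections import Counter
from datetime import date

def excel_weeknum(d):
    """Week number like Excel's WEEKNUM(d) with the default return_type 1."""
    jan1 = date(d.year, 1, 1)
    jan1_offset = (jan1.weekday() + 1) % 7  # Sunday=0 ... Saturday=6
    day_of_year = d.timetuple().tm_yday
    return (day_of_year - 1 + jan1_offset) // 7 + 1

def count_by_week(records):
    """Count (year, week, category) keys in a single pass over the records.
    The year is part of the key because the data spans about 3 years."""
    return Counter((d.year, excel_weeknum(d), category) for d, category in records)

# Hypothetical records; 2023-01-01 is a Sunday.
records = [(date(2023, 1, 1), "A"), (date(2023, 1, 7), "A"), (date(2023, 1, 8), "B")]
counts = count_by_week(records)
print(counts[(2023, 1, "A")], counts[(2023, 2, "B")])  # 2 1
```

The cost grows with the number of records once, rather than with records multiplied by report cells.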

I'll post this answer with the disclaimer that the results depend entirely on the size of the data set. Turning automatic calculation off and on is the best approach, but your question rules that out, so keep reading.
Your question made me curious, so I tried it and timed it. I set up two columns of over 100,000 random numbers between 1 and 1000 and then used COUNTIFS to count where the two columns were equal. I wrote a macro that turns off automatic calculation, inserts the start time, calculates, and then inserts the finish time. I highlighted the time difference in yellow.
First I tried your way, with two criteria in COUNTIFS:
Then I tried combining (concatenating) the two columns to see whether a single COUNTIF criterion and data range would be faster. It isn't; see the result below:
Finally, realizing what was going on, I made the criterion match only the FIRST digit of the number to look for, essentially reducing the number of characters to check per cell. This had a positive result; see below:
Therefore, my suggestion is to limit the length of the values you are comparing in any way possible. You are mostly looking at dates, so you might have to get creative, but this seems to be the best option short of switching to manual calculation.

I have worked with Excel sheets of a similar size. Especially if you use the data on a regular basis, I would heartily recommend switching to a proper database (SQL-based, Access, or whatever fits your purpose). It does wonders for speed, and you won't run into Excel's size limits. :-)
You can import the data you have now fairly easily.
I am happy as a clam with my PostgreSQL db.
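As a small illustration of the database route, here is a sketch using Python's built-in `sqlite3` module as a stand-in for PostgreSQL or Access; the table name, columns, and sample rows are hypothetical. Note that SQLite's `strftime('%W', ...)` numbers weeks starting on Monday (days before the first Monday are week 00), which differs from Excel's default WEEKNUM.

```python
import sqlite3

# Hypothetical rows: ISO date strings and a category.
rows = [("2023-01-02", "A"), ("2023-01-03", "A"), ("2023-01-09", "B")]

conn = sqlite3.connect(":memory:")  # pass a file path for a persistent database
conn.execute("CREATE TABLE events (event_date TEXT, category TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
# An index on the grouped columns helps once the table grows large.
conn.execute("CREATE INDEX idx_events ON events (event_date, category)")

# One GROUP BY query produces every count that would otherwise need its own COUNTIFS.
query = """
    SELECT strftime('%Y-%W', event_date) AS week, category, COUNT(*) AS n
    FROM events
    GROUP BY week, category
    ORDER BY week, category
"""
for week, category, n in conn.execute(query):
    print(week, category, n)
# 2023-01 A 2
# 2023-02 B 1
```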

Improve Vlookup on large file

I have a very large file that I've reduced as much as possible, to 3 columns and 80k rows.
I need to perform a VLOOKUP to bring in values from column 1 or 2 that match values in some other spreadsheets.
The thing is, Excel doesn't seem to handle such large searches, and it stops responding; the computer has 4 GB of RAM and a quad-core CPU, and not much else is running at the same time.
As far as I understand, since I'm not looking for exact matches, I shouldn't use INDEX/MATCH.
The only thing I thought might help, though I'm not sure, is dividing the file into 2-4 parts and asking Excel for many parallel searches instead of one big one. Could this work?
What else should I try?
Thanks!!!

Sort your data and use TRUE as the 4th VLOOKUP argument. This makes VLOOKUP use a binary search rather than a linear search, which is lightning fast.
If you need to handle missing data, you will need to use the double VLOOKUP trick; see:
http://fastexcel.wordpress.com/2012/03/29/vlookup-tricks-why-2-vlookups-are-better-than-1-vlookup/
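With TRUE, VLOOKUP on sorted data returns the row of the last key less than or equal to the lookup value, so a missing key silently returns its nearest smaller neighbour. The double VLOOKUP trick guards against that, roughly as =IF(VLOOKUP(A2,Table,1,TRUE)=A2, VLOOKUP(A2,Table,2,TRUE), NA()): the first lookup checks that the key found really equals the lookup value, and only then does the second one fetch the result. A minimal Python sketch of the same logic using binary search; `sorted_vlookup` and the sample data are hypothetical.

```python
from bisect import bisect_right

def sorted_vlookup(value, keys, results, default=None):
    """Exact-match lookup on sorted keys via binary search.

    Step 1 mirrors VLOOKUP(..., TRUE): find the last key <= value.
    Step 2 mirrors the second check: accept it only if it equals value.
    """
    i = bisect_right(keys, value) - 1
    if i >= 0 and keys[i] == value:
        return results[i]
    return default  # missing key, like the NA() branch of the formula

# Hypothetical table, already sorted by key.
keys = ["A100", "B200", "C300"]
prices = [1.5, 2.0, 3.25]
print(sorted_vlookup("B200", keys, prices))  # 2.0
print(sorted_vlookup("B150", keys, prices))  # None, although "A100" sorts just before it
```

Each lookup costs about log2(80,000), roughly 17 comparisons, instead of scanning up to 80,000 rows.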

Append to Excel 2010 List

I have two lists that I need to compare, producing a resulting list with all values and no duplicates. I've been trying to write this in VBA without knowing in advance how long each column will be.
Ex:
Recent_ID Prior_ID
76000 76000
76010 76300
76020 76020
Result should be:
76000
76010
76020
76300
I see a lot of other posts about this subject but I can't seem to find one that has a good and generic answer.

My friendly neighborhood search engine told me that this VBA code should probably do what you want without too much trouble.
http://mrspreadsheets.com/1/post/2012/08/vba-code-snippet-16-combine-two-columns-into-one-and-remove-duplicates.html
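The logic is short regardless of language: collect every non-blank value from both columns into a set (which drops duplicates), then sort. A minimal Python sketch of that logic, not the linked VBA code; `combine_unique` is a hypothetical name, and it assumes all values in the columns are of one comparable type.

```python
def combine_unique(*columns):
    """Merge any number of columns into one sorted list without duplicates.
    Column lengths need not be known in advance; blanks (None or "") are skipped."""
    seen = set()
    for column in columns:
        for value in column:
            if value not in (None, ""):
                seen.add(value)
    return sorted(seen)

recent_ids = [76000, 76010, 76020]
prior_ids = [76000, 76300, 76020]
print(combine_unique(recent_ids, prior_ids))  # [76000, 76010, 76020, 76300]
```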
