I'm trying to use fuzzy lookup to match a list of correct names against a set of "dirty" names. But apparently VBA uses only one core of my processor, and it takes too long because I am running it on at least 5,000 names.
Here's a link to the fuzzy code: https://www.mrexcel.com/forum/excel-questions/195635-fuzzy-matching-new-version-plus-explanation.html#post955137
I also researched "multi-threading" solutions for VBA and found that there's no native way of doing it, but someone built an alternative using some scripts.
Here's the link for the multithreading vba script tool: https://analystcave.com/excel-vba-multithreading-tool/
Now all I need to do is integrate the lookup code into this multithreading script so that the function runs faster. I assume this is possible, right?
Can someone help me with this? I only learned VBA through googling and reading other people's code, and this VBA multithreading tool is quite complicated for a beginner like me.
Thank you very much!
I'm not qualified to address the multithreading, but about your speed issue: are you running the code directly on the spreadsheet?
A better method is to read the entire table or range into an array and run the code on it there, while it's in memory. It runs MUCH faster that way. Then write the results back to the spreadsheet.
Here's some info on pulling the data into an array:
Creating an Array from a Range in VBA
http://www.cpearson.com/excel/ArraysAndRanges.aspx
You'll have to fiddle with the rest of your code, but basically you'll treat the array as if it were a table.
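To make the "work in memory, write back once" idea concrete, here is a minimal sketch. It is written in Python rather than VBA purely to keep it short, and it uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer, not the algorithm from the MrExcel post. The sample names and the helper names `best_match` and `match_all` are made up for illustration. In VBA, the equivalent pattern would be reading the range into a Variant array, looping over the array, and assigning the result array back to a range in a single step.

```python
from difflib import SequenceMatcher


def best_match(dirty, clean_names):
    """Return (closest clean name, similarity score) for one dirty name."""
    best_name, best_score = None, 0.0
    for clean in clean_names:
        # Stand-in similarity measure; swap in your own fuzzy function here.
        score = SequenceMatcher(None, dirty.lower(), clean.lower()).ratio()
        if score > best_score:
            best_name, best_score = clean, score
    return best_name, best_score


def match_all(dirty_names, clean_names):
    """Match every dirty name against in-memory lists, no cell access in the loop."""
    return [best_match(dirty, clean_names) for dirty in dirty_names]


# Hypothetical data standing in for two worksheet columns read once into memory.
clean = ["Andrew Hill", "Maria Lopez", "John Smith"]
dirty = ["Andrw Hil", "maria lopes", "Jon Smith"]

results = match_all(dirty, clean)
for name, (match, score) in zip(dirty, results):
    print(f"{name!r} -> {match!r} ({score:.2f})")
```

The point of the sketch is the shape of the loop: all comparisons happen on plain in-memory values, and the results are collected so they can be written to the sheet in one operation at the end.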
Below is an excerpt from the Microsoft website. I believe their C#-based Fuzzy Lookup add-in for Excel is multi-threaded and much faster than the code you linked. Why reinvent the wheel when a better option is available?
The Fuzzy Lookup Add-In for Excel was developed by Microsoft Research and performs fuzzy matching of textual data in Microsoft Excel. It can be used to identify fuzzy duplicate rows within a single table or to fuzzy join similar rows between two different tables. The matching is robust to a wide variety of errors including spelling mistakes, abbreviations, synonyms and added/missing data. For instance, it might detect that the rows “Mr. Andrew Hill”, “Hill, Andrew R.” and “Andy Hill” all refer to the same underlying entity, returning a similarity score along with each match. While the default configuration works well for a wide variety of textual data, such as product names or customer addresses, the matching may also be customized for specific domains or languages. The following libraries are required and will be installed if necessary:
.NET 4.5
VSTO 4.0
I have recently observed an issue regarding my data in a column that I use to perform data validation on my spreadsheet.
There is nothing wrong with the formula, nor is there anything wrong with the use of data validation.
It should be looking for duplicate entries, which works fine.
The issue is that it no longer recognizes input made from a smartphone using the Excel app.
What I did was retype the cell text from my PC, and it worked perfectly.
Is there a way that I can continue using this technique (Data validation) without having to re-enter data from a PC in order for it to process?
Certainly! Yes, that is possible.
But... with all the possibilities in today's world, is your current strategy the one that is the best for you?
That is something I cannot answer for you.
That is something I cannot enumerate for you.
But... There is something that I can introduce to you.
PowerQuery
PowerQuery was a free add-on for Excel 2010 and 2013 and it has been baked directly into Excel for more than half a decade. So, if you're using the mobile app then you probably have a modern version of Excel with PowerQuery right at your fingertips.
Your first step is to determine how you want to make your data available for Excel to get. Go to the Data tab on the ribbon and review your options in the "Get External Data" group.
It doesn't matter if free data is your Creed and your most intimate moments are publicly available through your raw data feed. Or if paranoia is the reason why you constantly drive around the block scraping SSIDs before squirreling them away to SQL server for detailed analysis. Or if you're using a USB cable to transfer photos to your PC because your mom walked in on you without knocking and was so disgusted by what she saw on your desktop that you're banned from the family LAN... For life. None of that matters because Excel can connect to your data in so many ways that one of them will be perfect for you.
There is a sense of familiarity when importing your data into PowerQuery. It's not unlike following those timeless MS wizards, but nothing prepares you for the uncanny sensation of being dropped into the PowerQuery editor. It is simultaneously the same as Excel and different from Excel, and it may be the closest you ever come to visiting a parallel universe. Many of the same tools are available but they behave just slightly differently. And in some cases, like the Text to Columns tool, it is light years ahead of Excel and you will find yourself cursing at MS for not using it as a replacement for the old tool.
When you're done transforming your data, you'll have a tight, clean table. But the real prize is that you have a fully automated pipeline from source to product.
I figured out that the phone user included extra spaces when inputting the data.
So I used the TRIM() function, which removes the extra spaces before, after, and between words, and that did the job.
The root cause was that the tested data contained additional spaces, so the entries were not recognized as duplicates.
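As a small illustration of why the phone entries were not seen as duplicates, the sketch below mimics what TRIM() does to a string. It is written in Python only for demonstration; `excel_like_trim` and the sample values are hypothetical. Like Excel's TRIM, it only handles the ordinary space character, so a non-breaking space (character 160) would survive and need separate handling.

```python
def excel_like_trim(text):
    """Approximate Excel's TRIM(): drop leading/trailing spaces and
    collapse runs of spaces between words down to a single space."""
    # Split on the plain ASCII space only, then discard the empty pieces
    # that leading, trailing, or repeated spaces leave behind.
    return " ".join(piece for piece in text.split(" ") if piece)


# Hypothetical entries: the same name typed on a PC and on a phone.
typed_on_pc = "Acme Ltd"
typed_on_phone = " Acme  Ltd "

print(typed_on_phone == typed_on_pc)                   # raw values differ
print(excel_like_trim(typed_on_phone) == typed_on_pc)  # equal once trimmed
```

Applying the same clean-up to both the existing entries and the new input before comparing them is what lets the duplicate check treat the two values as the same.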
I am new to Alteryx and am trying to use it for analysing unstructured data. I have a column of description in text form and I intend to use the K-Means Clustering tool for topic modelling. For K-means to work on text, I will need to convert my text into a Document Term Matrix (DTM) so that they appear as continuous variables to the clustering tool. However, I am struggling to find a way I can convert my text to a DTM.
Does anyone know a way to do so? I am currently looking at the R tool but am not exactly sure how to start with that either. Hoping that all of you experts here can help me out!
I have looked through posts on text analysis and realized that most fall back on the Microsoft Azure ML Text Analysis Macro. However, I would like to avoid the macro (so that I am not restricted to a limited number of runs each month, which would not scale) and instead use tools that are available in Alteryx.
Thanks to everyone in advance!
With Alteryx being more of a pictorial drag-and-drop workflow, it's not trivial to explain here. However, I've created the following workflow and included the workflow file itself on the Alteryx forum here. The workflow computes term frequencies from inauguration speeches but should apply to any collection of documents. It just splits the words based on various non-numeric characters and does a summary. This is what the workflow looks like:
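For readers who cannot open the workflow, the split-and-summarize idea can be sketched outside Alteryx as follows. This is an illustrative Python sketch, not the workflow itself: the function names, the sample documents, and the choice to split on anything that is not a letter are all assumptions. The output is a document-term matrix with one row per document and one column per term, which is the numeric shape a clustering tool expects.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on any run of non-letter characters."""
    return [word for word in re.split(r"[^a-z]+", text.lower()) if word]


def document_term_matrix(documents):
    """Return (vocabulary, matrix); matrix[i][j] is the count of
    vocabulary[j] in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


# Hypothetical description column with two rows.
docs = ["The pump failed. Pump replaced.", "Replaced the filter"]
vocab, dtm = document_term_matrix(docs)

print(vocab)
for row in dtm:
    print(row)
```

In practice you would probably also drop very common words and very rare terms before clustering, since raw counts for words like "the" tend to dominate the distances.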
I'm developing an add-in for Excel using the Office Add-ins platform. In this add-in I'm writing data to a range using the setSelectedDataAsync** function. It works fine, but after the data is written, I'm not able to delete or edit the cells (although I can select new ranges) unless I click anywhere outside the worksheet or double click a cell. I think it is an issue with Excel not regaining focus correctly (the filename in the top of the app remains grayed out).
Some users seem to think that Excel becomes unresponsive, which is a problem.
Is this a known issue? Is there a work around for this?
** I have noticed that setSelectedDataAsync is way, way quicker than setting range.values to a matrix and then ctx.sync(). Am I losing some important functionality by not using the latter method?
This is not a known issue (unable to interact with worksheet after setting the data). We can look into that.
Surprised to hear that setSelectedDataAsync works faster than the range.values set. The batched syntax allows you to combine not just one instruction, but many related instructions such as setting number format, font, background, etc. and you can do a single sync() to send all instructions in one batch. So, it is more efficient when you combine related instructions together.
There is no restriction on which API you use as such; however, the Excel 1.1 API version was introduced with Office 2016, and there have been many releases since then, incrementally adding new features along the way.
The setSelectedDataAsync() API was designed to work across hosts such as Excel, Word, etc. and hence doesn't go as deep in terms of setting number formats and other formatting.
What I want to do: Generate a report in Word based on unique data that I manually enter for different clients.
I collect at least 100 variables of data for different clients. I must write a report for each client that contains this information.
What I have tried in the past: I tried to streamline this process by using Excel to enter the data in select cells and run the Mail Merge function, which would then export the unique data into a templated Word document.
Problem: Unfortunately, this process is prone to error and has a tendency to crash my computer.
Question: Is there a way that I can successfully make this a seamless process?
Note: I do NOT have any programming knowledge whatsoever but I am here because I think a non-programming approach is simply not efficient. I am hoping I can reach a solution to this issue by teaching myself basic programming principles. Is this possible?
Yes - one way is to first add a reference to the Microsoft Word object library in the VBA editor. Then you can set up a Word document with bookmarks. Then, for each piece of data you would like to insert:
Doc.Bookmarks("Bookmarknamehere").Select
App.Selection.TypeText "ClientDataHere"
You will have to define the Word application and document variables (App and Doc above) for this to work.
We have a SharePoint 2007 deployment which will have a substantially large document library. My client wants the ability to export this library to an Excel spreadsheet, but specifically wants the ability to divide the spreadsheet into several worksheets based on a specific field. Is this possible to accomplish in WSS 3.0, through the object model or otherwise?
There is an out-of-the-box Export to Spreadsheet, but it does not appear to support automated subdivision of the list items into separate worksheets. I do not know if the Excel Services that come with MOSS are capable of it, but we do not have MOSS so we cannot consider it an option for now.
EDIT
It seems that by mentioning "out-of-the-box", I am implying that I'd prefer something quick and simple. Let's dispel that. I do a lot of heavy work in the object model. I only mentioned the Export to Spreadsheet because that's the only available method I know of off-hand, and its options are limited. So I am comfortable with whatever level of work is suggested.
I should also note that keeping the list linked with the spreadsheet is undesired. We want to be able to download the spreadsheet as a reference. Because of the number of people who will be working on the list, it would be absolute chaos to try and synchronize all of the linked files. My client has agreed that it'll be easier to handle obsolete copies than to try some synchronized system.
The solution also needs to be deployable. So things which do not tailor to an individual site are best.
You won't be able to do this OOTB. You will have to write some code to iterate through the records of the list either using
The SharePoint OM - Better performance and richer API but has to run on a Web Front End
The web service - Can run on any machine
Then you can build up the Excel spreadsheet either by
Using the Excel object model (aka Automation) if this is a quick kludge running from a workstation - but Excel wasn't designed to be used from an unattended server and/or at high volume, so you may also want to look at
A 3rd party component such as SpreadsheetGear to generate the Excel spreadsheet files.
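Whichever access method and spreadsheet library you pick, the core of the job is grouping the list items by the chosen field before writing one worksheet per group. The sketch below shows only that grouping step, in Python for brevity; it does not call SharePoint or Excel, and the field name "Department", the sample items, and both helper functions are hypothetical. It also cleans the group value so it is usable as a worksheet name, since Excel limits sheet names to 31 characters and disallows the characters \ / ? * [ ] and :.

```python
import re
from collections import defaultdict


def sheet_name_for(value):
    """Turn a field value into a string that Excel accepts as a sheet name."""
    name = "(blank)" if value in (None, "") else str(value)
    name = re.sub(r"[\\/?*\[\]:]", "_", name)  # replace forbidden characters
    return name[:31]                            # enforce the length limit


def group_rows_by_field(rows, field):
    """Map each sheet name to the list of rows that belong on that sheet."""
    sheets = defaultdict(list)
    for row in rows:
        sheets[sheet_name_for(row.get(field))].append(row)
    return dict(sheets)


# Hypothetical list items as they might come back from the list query.
items = [
    {"Name": "budget.xls", "Department": "Finance"},
    {"Name": "policy.doc", "Department": "HR/Payroll"},
    {"Name": "forecast.xls", "Department": "Finance"},
    {"Name": "notes.txt", "Department": ""},
]

grouped = group_rows_by_field(items, "Department")
for sheet, rows in sorted(grouped.items()):
    print(sheet, [row["Name"] for row in rows])
```

Note that truncating or replacing characters can make two different field values collide on the same sheet name, so a real implementation would want to detect that case.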
A good bet is to quickly create views for your items (using whatever filters you want) mirroring your desired worksheets and then export those views into Excel. Those views update with the list and you can manually grab new versions later. Still manual, but OOTB and no Excel hacking needed.
I posted this on SharePoint Overflow. One of the answers I received there was very useful, regarding the utility of the Open XML SDK. Thank you to those who answered... I looked over your suggestions. My client has decided to go with this one because it does not cost money to implement (as Spreadsheet Gear or datapresentation's plugin would).