My problem is as follows:
Imagine I have a large Excel dataset with a company identifier in column A and long text strings in column B. Each cell in column B contains many words; think of it as the company description. I need to find fuzzy duplicates in this dataset. For example:
In cell B2 I have:
"Amazon.com Inc (Amazon) is an online retailer and web service provider. The company provides products such as apparel, auto and industrial items, beauty and health products, electronics, grocery, books, games, jewellery, kids and baby products, movies, music, sports goods, toys, tools and other related products."
and in cell B222 I have:
"%& COMPANY DESCRIPTION: Amazon is an online retailer and web service provider. Amazon provides products such as apparel, auto and industrial items, beauty and health products, electronics, grocery, books, games, jewellery, kids and baby products, movies, music, sports goods, toys, tools and other related products. Amazon is a great company."
So my question is: is there a fast way to find B222 and flag in B2 that it has a fuzzy duplicate in B222, e.g. with an 80% match?
I have tried multiple tools, such as Ablebits and the Levenshtein distance in VBA. However, I am not 100% satisfied with the results.
Thank you for any help!
Best,
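One way to approach the B2/B222 example outside Excel is to compare the descriptions word by word rather than character by character. The following is only a minimal sketch, assuming the two columns have been exported to a list of (identifier, description) pairs; the function names and the 0.8 threshold are illustrative, and the score comes from Python's standard difflib module, not from Ablebits or a Levenshtein distance.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def similarity(a, b):
    """Score in [0, 1] based on matching runs of word tokens."""
    matcher = SequenceMatcher(None, normalize(a), normalize(b), autojunk=False)
    return matcher.ratio()


def find_fuzzy_duplicates(rows, threshold=0.8):
    """rows: list of (identifier, description) pairs (assumed export of columns A and B).

    Returns a list of (id_a, id_b, score) for every pair at or above the threshold.
    """
    matches = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            score = similarity(rows[i][1], rows[j][1])
            if score >= threshold:
                matches.append((rows[i][0], rows[j][0], round(score, 2)))
    return matches
```

Note that this compares every pair of rows, so the running time grows quadratically with the number of descriptions; for a very large sheet some pre-filtering (for example, only comparing rows that share several rare words) would probably be needed.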
Related
I have no programming experience, yet I am in desperate need of help. I manage a large restaurant/catering company where, once a month, I have to take an inventory of more than 1,000 products. I have Excel sheets that are divided into categories; for the kitchen, for example, there are Fruits And Vegetables, Dairy, Frozen, Fresh Meat, etc.
It is really hard to go around with a laptop and do the inventory of more than 1,000 items manually. Since I have a barcode scanner, I was wondering whether I could just go around and scan the products, so that after I scan a product's barcode, Excel takes me to the row of the product that corresponds to the scanned barcode.
Right now, if I scan a barcode while in Excel, it just adds the scanned value to whatever field I am currently in. So the scanner works; all I need is to capture that scan and compare it to all the barcodes associated with products, so that Excel takes me to the QTY field of that particular product.
For example, if I scan a Star Anis product with barcode 23135165, Excel should find that product, which as per the picture is in field B6, and then take me to field F6, which is where the quantity should be entered.
So, in short: I scan a product that is in dry storage, Excel finds the product across 6 different sheets via the corresponding barcode, and takes me to the field where I can enter the quantity of that product :)
This would save me hours and hours of work :)
Can this be done? Thank you :)
This is a picture of one of the sheets that I use when taking an inventory.
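The lookup described above (scanned barcode to the QTY cell on whichever sheet holds the product) can be sketched in a few lines. This is only an illustration of the logic, not an Excel macro: it assumes each sheet has been read into a list of rows, that barcodes sit in column B and quantities in column F as in the Star Anis example, and the function and sheet names are hypothetical.

```python
def build_barcode_index(sheets, barcode_col=1):
    """Map each barcode to (sheet name, 1-based row number).

    sheets: dict of sheet name -> list of rows, each row a list of cell values.
    barcode_col=1 assumes barcodes are in column B (0-based index 1).
    """
    index = {}
    for sheet_name, rows in sheets.items():
        for row_number, row in enumerate(rows, start=1):
            if len(row) > barcode_col and row[barcode_col] not in (None, ""):
                index[str(row[barcode_col]).strip()] = (sheet_name, row_number)
    return index


def quantity_cell(index, scanned, qty_col_letter="F"):
    """Return the address of the quantity cell for a scanned barcode, or None."""
    hit = index.get(str(scanned).strip())
    if hit is None:
        return None
    sheet_name, row_number = hit
    return f"{sheet_name}!{qty_col_letter}{row_number}"
```

Building the index once and looking each scan up in it keeps every scan fast, even with more than 1,000 products spread over several sheets.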
I have data in an Excel sheet that is essentially a list of all orders for my company over its life span (approximately 14 thousand orders, if it matters). The following fields are relevant for what I'm currently trying to do.
Purchase date (i.e. 6/23/19)
Customer ID (unique ID given to each customer; this ID is constant across all of a customer's purchases)
Product category (widgets, woozits, etc.)
Sales person (John Doe, Jane Doe, etc.)
What I'm trying to figure out is our repeat purchase rate by category, and then by sales person.
So ideally I'd like to be able to determine something like
Product category: Widgets
20% of people whose first purchase from us is a widget purchased something else later on.
Of widgets sold by John Doe to first-time customers, 15% of the customers purchased something else from us later on.
Of widgets sold by Jane Doe to first-time customers, 25% of the customers purchased something else from us later on.
So basically I'm trying to figure out whether different sales reps have better repeat purchase rates on their orders. However, we must split this by product category, as our repeat purchase rate is going to vary widely by category (and some sales reps only sell items from certain categories, so it would be unfair to compare across categories).
I believe to do this I need to figure out how to say something like
"Find every widget John Doe sold; see how many have customer IDs that did not appear on an earlier date; then see what % of those customer IDs appear at a later date, regardless of the next product category or sales person they purchased from."
Hopefully someone can help. I apologize if I didn't explain something particularly well; if there's any confusion I can try my best to clarify.
Thank you!
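The calculation described in this question can be sketched as follows. It is a minimal illustration under stated assumptions: the orders have been exported as (purchase date, customer ID, product category, sales person) tuples, a "repeat" means any order on a strictly later date than the customer's first order, and if a customer has several orders on their first date, the one listed first counts as the first purchase. The function name repeat_rates is hypothetical.

```python
from collections import defaultdict
from datetime import date


def repeat_rates(orders):
    """Repeat purchase rate per category and per (category, sales person).

    orders: iterable of (purchase_date, customer_id, category, sales_person).
    Returns a dict keyed by (category, None) for the category-wide rate and
    (category, sales_person) for the per-rep rate.
    """
    by_customer = defaultdict(list)
    for purchase_date, customer_id, category, sales_person in orders:
        by_customer[customer_id].append((purchase_date, category, sales_person))

    counts = defaultdict(lambda: [0, 0])  # key -> [first-time customers, repeaters]
    for purchases in by_customer.values():
        purchases.sort(key=lambda p: p[0])  # stable sort: ties keep input order
        first_date, category, sales_person = purchases[0]
        came_back = any(p[0] > first_date for p in purchases[1:])
        for key in ((category, None), (category, sales_person)):
            counts[key][0] += 1
            counts[key][1] += int(came_back)

    return {key: repeaters / first_timers
            for key, (first_timers, repeaters) in counts.items()}
```

Each customer is attributed only to the category and sales person of their first order, so a rep is credited with a repeat regardless of who sold the later order, which matches the "regardless of the next product category or sales person" wording in the question.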
I have a controlled assessment at school, and one of the tasks requires prices for different qualities of pet food and bedding.
I have tried using Insert > Spinner on the Developer tab, but it only works for numeric values, and I need high, medium, and low qualities of pet food and bedding.
Here is a screenshot of the animal table:
I am trying to measure the similarity between two English sentences. Among the Jaccard, Dice, Exact, and Cosine string matching algorithms, which is the best for string matching or for determining closeness?
Sentence 1: Online shopping for electronics, computer parts, apple accessories, health & beauty, video games, cell phone accessories, home & garden and more at tmart.com. we provide wide selections of products at best price for worldwide free shipping.
Sentence 2: Shop for electronics, apparels & more using our Flipkart app Free shipping & COD.
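To make the comparison concrete, here is a minimal sketch of three of the measures named in the question, computed on word tokens. The tokenization (lowercase, alphanumeric runs only) and the function names are assumptions for illustration; Jaccard and Dice use sets of words, while cosine uses word counts.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard(a, b):
    """Shared words divided by all distinct words."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


def dice(a, b):
    """Twice the shared words divided by the sum of both vocabulary sizes."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return 2 * len(sa & sb) / (len(sa) + len(sb)) if sa or sb else 1.0


def cosine(a, b):
    """Cosine of the angle between the two word-count vectors."""
    ca, cb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0
```

On word sets, Dice is a monotonic function of Jaccard (D = 2J / (1 + J)), so the two always rank sentence pairs in the same order; cosine differs mainly in that it takes repeated words into account. An exact match, by contrast, only distinguishes identical from non-identical strings.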
Based on football data, I am looking to create a league table that ranks all the teams. The only difference from a normal table is that, each game week, teams earn points based on the current ranking of the team they are playing.
So week by week, the Excel document (with data added) will need to work out league positions, work out who plays whom, and then allocate points based on the result and on the opponent's ranking that week (which would be accumulated up to that week).
I.e., if Arsenal (ranked 1st) played Stoke (ranked 18th) and Stoke won, then the score Stoke would receive for that week would be based entirely on how high Arsenal was in the league.
This is a broad question, but I'll take a crack at answering it.
If you have a worksheet with rankings by week, say with the team in column A and the rank for each week in the columns after it, you can refer to it with a relative cell reference. If the ranking needs to be calculated, that could be done on the weekly ranking sheet. A formula on a worksheet for the calculated points might end up being something like:
='points'!B2/'weeklyrank'!B2
This would divide points by a team's rank in the prior week, thus awarding more points for higher ranked teams, assuming that is the objective. There are all kinds of ways to tweak this. Hopefully this gets you started.
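To see how dividing by rank behaves across a week, here is a small sketch of the same idea outside the spreadsheet. It is only one reading of the formula: it assumes the rank used is the opponent's rank from before the week's games, that rank 1 is the best team, and that ties in cumulative points are broken alphabetically; all function names are hypothetical.

```python
def rank_teams(totals):
    """Map team -> rank (1 = best) from cumulative points; ties broken by name."""
    ordered = sorted(totals, key=lambda team: (-totals[team], team))
    return {team: position for position, team in enumerate(ordered, start=1)}


def weighted_points(base_points, opponent_rank):
    """Divide the base points by the opponent's rank, as in the formula above."""
    return base_points / opponent_rank


def play_week(totals, results):
    """Apply one game week.

    totals: dict of team -> cumulative points before this week.
    results: list of (team, opponent, base_points) entries, one per team per game.
    Ranks are taken from the totals before the week, so every game in the
    week is scored against the same table.
    """
    ranks = rank_teams(totals)
    updated = dict(totals)
    for team, opponent, base_points in results:
        updated[team] += weighted_points(base_points, ranks[opponent])
    return updated
```

With a 3-point win, beating the top-ranked team is worth 3 / 1 = 3 points, while beating the 18th-ranked team is worth 3 / 18, or about 0.17 points, which shows how steeply a plain division falls off; a gentler weighting may be preferable.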
If you have a question about how to calculate the rankings, I would post it as another question, but with more detail.