Excel string similarity/keyword comparison - excel

I have an Excel sheet with two datasets, each with two columns: MinistryName and Revenue. I need to match MinistryName in Dataset-1 with that in Dataset-2 and note the revenue.
At first I thought this would be easy and that I could use VLOOKUP to grab all the values from the second dataset, but then I realized that the names differ slightly, so VLOOKUP didn't work. Take a look at the sample values from the two datasets below.
Dataset-1                                           Revenue
-----------------------------------------------------------
High Office of Anti Corruption                        78.67
Central Statistics organization                        6.56
National Academy of Sciences                          54.21
Dataset-2                                           Revenue
-----------------------------------------------------------
The High office of Oversight and Anti Corruption      86.00
Central Statistics Office                             12.40
Science Academy                                       75.91
There is a lot of data in the spreadsheet with similar values in the Name column. I know that, for example,
Central Statistics organization is logically the same as Central Statistics Office, but how can I compare them in Excel? With VLOOKUP, Excel will always treat them as different.
Is there any string comparison function in Excel with which I can compare keywords or compare two strings for similarity? I want to pick the Revenue values from Dataset-2 and put them after the Revenue column in Dataset-1. A solution using a VBA function would also be useful.

Microsoft's Fuzzy Lookup Add-In for Excel may be what you need here. It performs "fuzzy" matching, that is, it pairs up strings that are not identical but are within a certain measure of similarity to each other.
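For anyone who wants to see the idea outside the add-in, here is a minimal Python sketch of one possible similarity measure: the Jaccard overlap of the words in each name, after lower-casing and dropping a few filler words. This is not a description of how the add-in scores matches; the function names, the stop-word list and the choice of Jaccard are assumptions made for illustration. The data are the sample rows from the question.

```python
import re

# Filler words ignored when comparing names (an assumed, hand-picked list).
STOP_WORDS = {"the", "of", "and"}


def tokens(name):
    """Lower-case a name and return its set of significant words."""
    words = re.findall(r"[a-z]+", name.lower())
    return {w for w in words if w not in STOP_WORDS}


def similarity(a, b):
    """Jaccard similarity: shared words divided by all distinct words."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(name, candidates):
    """Return the candidate most similar to name, together with its score."""
    best = max(candidates, key=lambda c: similarity(name, c))
    return best, similarity(name, best)


dataset1 = {
    "High Office of Anti Corruption": 78.67,
    "Central Statistics organization": 6.56,
    "National Academy of Sciences": 54.21,
}
dataset2 = {
    "The High office of Oversight and Anti Corruption": 86.00,
    "Central Statistics Office": 12.40,
    "Science Academy": 75.91,
}

# For each Dataset-1 name, record the closest Dataset-2 name and its revenue.
matched = {}
for name in dataset1:
    match, score = best_match(name, dataset2)
    matched[name] = (match, dataset2[match], score)
    print(f"{name!r} -> {match!r} (score {score:.2f}, revenue {dataset2[match]})")
```

In practice you would probably also set a minimum score and review anything below it by hand, since the best available candidate is not always a true match.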

Related

Excel: Sum of Total Population per Continent (using Excel formulas - without Pivot)

I was playing a bit with data on the topic of the day, Covid-19, and downloaded some data to do some analytics from:
https://www.ecdc.europa.eu/sites/default/files/documents/COVID-19-geographic-disbtribution-worldwide.xlsx
And this is what I managed to do using Power Query and Pivot tables:
From this data (just a small piece of a huge dataset):
Pivot tables are great and you can do a lot with them, as you can see, BUT what I want is to do the calculation with Excel formulas: the sum of the Total Population per Continent. The dataset contains a lot of countries, each with its own Population, and I can't get it right using =MAX/IF/SUMIF. I just want to know how to do it this way for myself!
I am sure it isn't that hard, but I am a bit out of logic right now =)
I hope you got the point!!
First Edit:
=SUMIF doesn't work in the first place, because the Population is repeated for each country for every day, so it sums the same country's population once per day. That is not what I need; I need each country's population counted only once in the total.
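The double counting described in the edit can be illustrated with a small Python sketch outside Excel: keep one population figure per country, then sum by continent. The rows and population figures are made-up placeholders, not values from the ECDC file, and the field layout is an assumption.

```python
from collections import defaultdict

# (date, country, continent, population): one row per country per day.
# Placeholder figures for illustration only.
rows = [
    ("2020-04-01", "Country A", "Europe", 100),
    ("2020-04-02", "Country A", "Europe", 100),
    ("2020-04-01", "Country B", "Europe", 250),
    ("2020-04-02", "Country B", "Europe", 250),
    ("2020-04-01", "Country C", "Asia", 400),
]

# Naive conditional sum: counts each country once per day (the SUMIF problem).
naive_europe = sum(pop for _d, _c, continent, pop in rows if continent == "Europe")

# Step 1: keep a single population value per (continent, country).
population_by_country = {}
for _date, country, continent, population in rows:
    population_by_country[(continent, country)] = population

# Step 2: sum those unique values per continent.
population_by_continent = defaultdict(int)
for (continent, _country), population in population_by_country.items():
    population_by_continent[continent] += population

print("naive Europe total:", naive_europe)
print("per continent:", dict(population_by_continent))
```

In a worksheet, the same two-step idea would correspond to first building a list with one row per country (for example with Remove Duplicates) and then running SUMIF over that list.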

turning rows to columns in Excel

I have an Excel datasheet with more than 1,000,000 rows and 80 columns. The datasheet contains sales information for a chain with more than 1,700 stores nationwide. Each store is repeated 52 (weeks in a year) * about 30 (products sold in that given week) * 2 (two years) times. I want to convert the rows corresponding to products to columns. I can't do that using transverse because the products sold each week might not be exactly the same as those sold the next week. Do you have any solutions?
Thanks
I just made a very simplified version of that Excel file. The problem is that the products sold are not the same each week: there is a limited set of products, but only some of the items are sold each week.
https://drive.google.com/open?id=1B2vjIL2hemfQNrCz0X6u_pzi7Euy6IWa3Lj0_HzDXDE
This isn't much of an answer yet - but it either will become one or I'll delete it, depending on the OP's response.
I'm thinking that transverse/transpose is the wrong term for what you're trying to do.
Perhaps you're just trying to better organize/visualize this data, something similar to one of these Pivot Tables:
or
These are just two of the infinite ways you can organize data in a Pivot Table.
Is that similar to what you're trying to do? If so I'll share some more info.
If this quantity of data is going to keep coming your way, what you really need is to start using an Access database to get this under control and be able to report on it properly (and easily, once it's set up).
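If the goal is indeed one row per store and week with one column per product, the reshaping can be sketched in Python as follows. The field names, the sample rows and the choice to leave a blank for products not sold in a given week are all assumptions; the real file has many more columns.

```python
# Long format: one row per (store, week, product). Sample values are invented.
long_rows = [
    ("Store 1", 1, "Apples", 10),
    ("Store 1", 1, "Bread", 4),
    ("Store 1", 2, "Apples", 7),
    ("Store 1", 2, "Cheese", 3),
    ("Store 2", 1, "Bread", 9),
]

# Column set = every product that appears anywhere, in a stable order.
products = sorted({product for _s, _w, product, _q in long_rows})

# Collect quantities per (store, week).
cells = {}
for store, week, product, qty in long_rows:
    cells.setdefault((store, week), {})[product] = qty

# Wide format: products not sold in a given week become None (a blank cell).
header = ["Store", "Week"] + products
wide_rows = [
    [store, week] + [sold.get(p) for p in products]
    for (store, week), sold in sorted(cells.items())
]

print(header)
for row in wide_rows:
    print(row)
```

Because the column list is built from every product seen in the data, weeks with different product mixes still line up under the same headers, which is also how a Pivot Table with the product in the column area behaves.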

Is there a way to obtain a count of instances where a criteria is fulfilled in a pivot table in Excel 2016

I'm analyzing a set of clinical data in Excel that includes
the level of care that a variety of individuals have received
the date of the receipt of that care
the date that the care ended
The question I'm trying to answer in a report is how many individuals who received a higher level of care were connected to a lower level of care within 14 days.
I've organized the data in such a way that by creating a pivot table, it organizes the data nicely. However, in this dataset individuals have had multiple instances of each level of care that may or may not be connected within 14 days to a lower level of care.
Granted, the dataset is small enough to count this out by hand, but I foresee having to do this many times in the future with possibly much larger datasets.
As such, I'm wondering if there's a way to automate this process. I can almost conceptualize a nested if statement to flag the instances prior to developing the pivot table, and then count these flags as follows:
As everything is already sorted two ways, by individual and then by date, I might be able to do something like IF(levelofcare <> levelofcare of the cell above, IF(date of admission - date of discharge of the cell above <= 14, 1, 0), 0), then generate the pivot table and sum that column.
However, I feel this would be rather inaccurate considering the data and that sometimes, the "level of care" field isn't a standardized string.
I would add columns to the data that calculate what you need to show in the pivot table.
Probably one column that shows whether an individual received a higher level of care (an IF statement that gives a yes or no answer), and a second column that shows, where the first column is yes, whether the individual received a lower level within 14 days.
These can then be added to the pivot.
Hypothetical answer as no data was provided in the question - happy to edit if data provided.
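As an illustration of such a helper column, here is a minimal Python sketch of the flag logic. It assumes each record holds an individual, a numeric level of care (a higher number meaning a higher level), an admission date and a discharge date, and that "connected within 14 days" means the next admission starts no more than 14 days after the previous discharge. All names and sample records are hypothetical.

```python
from datetime import date

# (individual, level, admitted, discharged); invented sample records.
records = [
    ("A", 3, date(2016, 1, 1), date(2016, 1, 10)),
    ("A", 1, date(2016, 1, 15), date(2016, 2, 1)),   # 5 days after discharge
    ("B", 3, date(2016, 3, 1), date(2016, 3, 5)),
    ("B", 2, date(2016, 4, 1), date(2016, 4, 9)),    # 27 days after discharge
    ("C", 2, date(2016, 5, 1), date(2016, 5, 3)),
    ("C", 3, date(2016, 5, 4), date(2016, 5, 20)),   # step up, not down
]

MAX_GAP_DAYS = 14


def flag_step_downs(rows):
    """Return the sorted rows and a 0/1 flag per row.

    A row is flagged 1 when it is a lower level of care that starts within
    MAX_GAP_DAYS of the same individual's previous discharge from a higher level.
    """
    rows = sorted(rows, key=lambda r: (r[0], r[2]))  # by individual, then admission
    flags = []
    previous = None
    for person, level, admitted, discharged in rows:
        flag = 0
        if previous is not None and previous[0] == person:
            gap = (admitted - previous[3]).days
            if level < previous[1] and 0 <= gap <= MAX_GAP_DAYS:
                flag = 1
        flags.append(flag)
        previous = (person, level, admitted, discharged)
    return rows, flags


sorted_rows, flags = flag_step_downs(records)
individuals_flagged = {row[0] for row, f in zip(sorted_rows, flags) if f}
print(flags, individuals_flagged)
```

If the "level of care" field is free text rather than a standardized value, it would first need to be mapped to a consistent level (for example through a lookup table) before a comparison like this is reliable.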

Dynamic Excel 2007 Dashboard Without VBA

Morning guys,
I'm hoping that one (or more) of you can help me.
I have been tasked with creating a dashboard which needs to display trends and have a dynamic frontsheet, preferably with drop-down or data forms so as to update a chart / graph.
The information itself is incredibly limited - the scope of the document is tracking a value (0-4) assigned to a staff member's ability to fulfill a task, e.g. 'Quotes - 4', 'Cancellation - 2' and so on. So the metrics are limited to:
Month (a worksheet for each month of the year and one front for the dashboard)
Team (Presently 6 teams, but this is likely to increase over time, so hopefully the solution facilitates relatively easy incorporation of new teams)
Employee (Self explanatory)
Task (Presently 25, but as above - subject to change)
Score (the 0-4 value referred to above)
So as you can see, it's a very simple dataset. The sheets are presently set out with six grids with data validation lists for determining Team and Score (dropdowns for easy data input), with the Task being pre-written and the employee entered manually by the user.
What I'm hoping to do is have a frontsheet with dynamic tables that update accordingly when a dropdown and/or data form is changed. The key focus is on getting the staff members up to 4s for all tasks, so ultimately, the charts will display trends for the individual teams (one chart for each team - 6 charts) on a month-on-month basis and also a dynamic table which can reflect specific information (e.g. employee performance on a specific month, or number of '3s' achieved by a specific team to date).
I've read a reasonable amount on this, but seem to have overwhelmed myself with the sheer number of options. However, the options can be narrowed given that I'm working on a large corporate network that doesn't really facilitate downloads (so add-ins or anything extraneous to Excel 2007 'out-the-box' isn't an option) and preferably without the use of VBA (1. I'm quite a novice insofar as VBA, 2. Easy distribution and maintenance of the document might be marred by VBA?), though I appreciate that my requirements may dictate VBA to be essential.
Does anyone have any suggestions around how best to proceed creation of this dashboard?
Any and all help is appreciated and I apologise as a newbie if I've contravened any conventions around forum etiquette.
Thank you all for your time,
Rob
There are a couple of things that you need to consider in a task such as this:
a) what sort of output do you require?
b) how are you going to manage the data?
For a) I'd separate it further into the basics of what's required (time series charts of employee and/or team performances [how will team performance be measured? average, % achieving 4, or ?]) and then the bells and whistles of drop-downs. Focus on the basics first; the whizzy stuff can come later. Getting b) right is vital - you are going to be extracting subsets of the data to build the charts you want to display. Get b) wrong and you'll just create a horrible task for yourself.
In your position I would consider re-organising the data into the form of a table. Excel's help defines what is meant by a table, but in essence it is a list of your observations where each observation simply comprises the score for a particular month/team/employee/task combination (so each observation comprises 5 values). The observations are arranged as successive rows of the table with the first row being the header row which will contain suitable labels such as "Month", "Team", "Employee", "Task", "Score". The real advantage of using a table such as this is that Excel provides a heap of in-built facilities for manipulating them - look up the help for Sort and Filter on the Data tab. In your case there is an even more compelling reason for using a table - you can use the Pivot Table and Pivot Chart facilities for analysing and displaying the data. If you have not used these before, some time and effort spent learning about them will pay dividends. Once your data is organised and you know how to use Pivot Tables and Charts you should be able to prototype some output very quickly.
If you do decide to organise your data as a table you can still keep a nice friendly looking grid of 6 team "tables" (different from Excel's use of the word) as a data entry facility to enter each month's scores by employee and task. You will need to find a way of getting each month's data from the data entry "tables" to the main data table. (The easiest way would be to use a bit of spare worksheet under the data entry tables to reproduce the entered data as a series of observation rows and then use Paste Special Values to append these rows to the end of the main table of observations. You can use VBA to automate the copy/paste operation if you want; you just need to figure out a way of identifying how many observations are currently in the main table and precisely where you want the paste to end up - COUNT() or COUNTA() is a useful friend here.) The main problem to avoid (whether automated or not) is appending the same entered data more than once to the main data table.
Have a look at http://www.mediafire.com/download/x64swkp689k10a1/DataEntrytoTable.xlsx for a simple example of some of the above thoughts
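The table layout and the "don't append twice" rule described above can be sketched in Python. This only illustrates the shape of the data and is not something to run inside Excel 2007; the team, employee and task values are invented, and using the month/team/employee/task combination as the duplicate key is an assumption.

```python
# Main table: one observation per row -> (month, team, employee, task, score).
main_table = [
    ("Jan", "Team 1", "Alice", "Quotes", 4),
    ("Jan", "Team 1", "Alice", "Cancellation", 2),
]

# One month's data-entry grid for one team: employee -> {task: score}.
entry_grid = {
    "Alice": {"Quotes": 4, "Cancellation": 3},
    "Bob": {"Quotes": 1, "Cancellation": 2},
}


def grid_to_observations(month, team, grid):
    """Flatten a data-entry grid into observation rows."""
    return [
        (month, team, employee, task, score)
        for employee, scores in grid.items()
        for task, score in scores.items()
    ]


def append_new(table, observations):
    """Append only observations whose key is not already in the table."""
    seen = {row[:4] for row in table}
    added = 0
    for row in observations:
        if row[:4] not in seen:
            table.append(row)
            seen.add(row[:4])
            added += 1
    return added


feb = grid_to_observations("Feb", "Team 1", entry_grid)
first = append_new(main_table, feb)   # rows are new, so they are appended
second = append_new(main_table, feb)  # same rows again, so nothing is added
print(first, second, len(main_table))
```

Keeping every observation in one long table like this is what lets a Pivot Table slice by month, team, employee or task without any restructuring.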

Calculating Percent of Total in Power Pivot Model

I have created a Power Pivot table as shown in the picture. I want to calculate the quarter-over-quarter sales change. To do that I have to divide, for example, the sales of corporate family "Acer" in 2012Q4 by the sum over all corporate families. I am using a calculated measure to do this, but I am not sure what formula to use.
My need is to create two columns, one for 2012Q4 percent of total and one for 2013Q1 percent of total. Then I will create another measure to find the difference. So the formula for 2012Q4 should be like this 1624442 / (1624442+22449+1200+16123) . Any idea which function can help me do it?
It sounds like you are measuring the change in the percent of total for each corporate family from quarter to quarter. You will need to create 3 calculated measures. I'm not sure what your model looks like so I can't give you the exact formula, but here is the idea.
CurrentQtr%ofTotal:= DIVIDE(SUM('Sales'[Units]), CALCULATE(SUM('Sales'[Units]), ALL('Product'[Corporate Family])))
PrevQtr%ofTotal:= DIVIDE(CALCULATE(SUM('Sales'[Units]), DATEADD(DimDate[DateKey], -1, QUARTER)),
CALCULATE(SUM('Sales'[Units]), DATEADD(DimDate[DateKey], -1, QUARTER), ALL('Product'[Corporate Family])))
Change%ofTotal:= DIVIDE(([CurrentQtr%ofTotal]-[PrevQtr%ofTotal]),[PrevQtr%ofTotal])
I used the DIVIDE function because it handles divide-by-zero errors. You use the ALL function to remove the filter on the Corporate Family column from the filter context. The Change%ofTotal measure is just there to find the difference. I'm calculating the % change, but you may just want to subtract.
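To make the arithmetic concrete, here is a small Python sketch of what the three measures compute. Only the four 2012Q4 figures come from the question; the family labels other than "Acer" and all of the 2013Q1 figures are invented placeholders.

```python
# Units sold per corporate family. The 2012Q4 numbers are from the question;
# family labels other than "Acer" and all 2013Q1 numbers are placeholders.
units = {
    "2012Q4": {"Acer": 1624442, "Family B": 22449, "Family C": 1200, "Family D": 16123},
    "2013Q1": {"Acer": 1500000, "Family B": 30000, "Family C": 2000, "Family D": 18000},
}


def percent_of_total(quarter):
    """Share of each family in the quarter's total (the ALL(...) denominator)."""
    total = sum(units[quarter].values())
    return {family: sold / total for family, sold in units[quarter].items()}


prev_share = percent_of_total("2012Q4")
curr_share = percent_of_total("2013Q1")

changes = {}
for family in units["2013Q1"]:
    difference = curr_share[family] - prev_share[family]
    # Relative change, as in Change%ofTotal; None stands in for a blank on zero.
    relative = difference / prev_share[family] if prev_share[family] else None
    changes[family] = (difference, relative)
    print(family, f"{prev_share[family]:.4f}", f"{curr_share[family]:.4f}", f"{difference:+.4f}")
```

The first element of each `changes` entry is the plain difference in share and the second is the relative change, matching the choice mentioned above between subtracting and calculating a % change.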
Here's the link to a good blog post on time intelligence. And here's one on calculating percent of total.
For percentages please follow the tutorial on the Tech on the Net.
Adding another column where you calculate a difference between two pivot columns will not work - this column is "unpivotable", as it relies on a column definition. You would need to copy and paste the pivot as values to another worksheet and do the extra calculation there.
