Report generation based on multi lookup and dynamic columns - excel

I am a little stuck with a report I am trying to generate in Excel and was hoping someone could help.
Here is a summary of what I am trying to do:
Table 1 has one column called people (it’s basically a list of employees)
Table 2 has one column called countries (it’s basically a list of relevant countries)
Table 3 has three columns called person, country and date.
There is one entry for every person each time they review a country.
So the data will look something like:
PERSON | COUNTRY | DATE
John | uk | 10/01/2013
Paul | uk | 15/01/2013
John | France | 15/01/2013
Bob | Spain | 16/01/2013
The report I need to produce is one which shows who has/hasn't checked each country.
So the columns will be ‘Person’, uk, France, Spain (and any other unique value from the country table).
There will then be one single row for each person with a Yes/No value in the relevant column if that person has reviewed that country i.e. Table 3 contains a value that matches that value for the person and country.
So to be clear the report should be similar to:
PERSON | UK | FRANCE | SPAIN
John | Yes | Yes | No
Paul | Yes | No | No
Bob | No | No | Yes
In summary I can split this into two problems:
How to generate a table that has a column for every unique value in another table (country in the explanation above)
How to do a double lookup i.e. IF EXISTS in TABLE 3 ‘person’=john & ‘country’=uk then return ‘Yes’, otherwise return ‘No’
I’m happy to keep this in Excel or make use of SQL reporting, i.e. move my data to SQL first.

It's kind of a wonky formula but =sumproduct() will do a dual lookup.
=IF(SUMPRODUCT(--($K$2:$K$5=$K13), --($L$2:$L$5=M$10)),"Yes","No")
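The Person/Country/Date table is located in the range K1:M5, and the results table is located in the range K10:N13; the formula above is the one in cell M13. I had a workbook open and put it in the corner. (Nobody puts sumproduct in the corner)
The gist is, -- turns TRUE and FALSE into 1 and 0. SUMPRODUCT multiplies the two results line by line and adds up the products. A row where both comparisons are true contributes 1 x 1 = 1, so the sum is non-zero when a match exists, and that funnels back into the IF for a Yes or No. You'll have to be mindful of the $ in the formula: the absolute references keep the lookup ranges, the person column and the country header row fixed when the formula is copied across the results table.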
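For readers who find the double lookup easier to follow outside a worksheet, here is a minimal Python sketch of the same idea, using the sample rows from the question. The names `reviews`, `has_reviewed` and `report` are made up for illustration. One difference to be aware of: Excel's `=` comparison ignores case, while Python's `==` does not.

```python
# Review log mirroring the sample PERSON | COUNTRY | DATE rows.
reviews = [
    ("John", "uk", "10/01/2013"),
    ("Paul", "uk", "15/01/2013"),
    ("John", "France", "15/01/2013"),
    ("Bob", "Spain", "16/01/2013"),
]
people = ["John", "Paul", "Bob"]
countries = ["uk", "France", "Spain"]


def has_reviewed(person, country):
    # Same idea as SUMPRODUCT(--(persons=person), --(countries=country)):
    # each comparison becomes 1 or 0, the pairs are multiplied row by row,
    # and the products are summed.
    matches = sum(int(p == person) * int(c == country) for p, c, _ in reviews)
    return "Yes" if matches else "No"


# One row per person, one Yes/No value per country.
report = {person: [has_reviewed(person, c) for c in countries] for person in people}

for person, row in report.items():
    print(person, *row, sep=" | ")
```

The column list is built from the country table, so adding a country there adds a column to the report without touching the lookup itself.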

Related

Excel VBA for Data Cleaning

I want to find a way to automate our data cleaning (de-duping) process using excel VBA. Currently, our team gets a patient list from the clinics to de-dup since we find variations of duplicate records.
The reports we get from the clinic come as an Excel spreadsheet. We use only specific columns from the spreadsheet, since we do not need everything in it. I've used multiple functions to remove duplicates because the patient records are entered manually, so the same person can show up in the list in a lot of different ways.
It's okay if they have different addresses and insurance info because people move and switch insurers. The conditions we focus on are whether someone has the same date of birth, first name and last name. We're not too strict on the last name either because last names change. In that case, we make sure that the date of birth, the first name and the address are the same. We need at least three elements to match to consider two records the same person; otherwise we treat them as different people and leave the records alone.
The list we work with consists of the following columns:
First name, last name, DOB, Address, City, State, Zip, Primary Insurance.
| First Name | Last Name | DOB | Address | City | State | Zip | Primary Insurance |
|------------|-----------|--------------|------------------|----------|------------|--------|--------------------|
| John | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Jon | Smith | 01/01/1990 | 123 ABC Street | Denver | Colorado | 87880 | Humana Insurance |
| Anthony | Bowen | 02/02/1992 | 456 ABC Street | Austin | Texas | 78632 | Aetna |
| Tony | Bowen | 02/02/1992 | 456 ACC Str | Austin | Texas | 78632 | Aetna |
Currently I sort the entire sheet of data from DOB oldest to newest, first name a-z, last name a-z.
From there I apply a formula:
=if(and(C2=C3, B2=B3, A2=A3),"Check","") and fill it down through all the rows. I filter on the blank cells to clear the formulas from them, then un-filter so I can jump down to the next cell flagged with "Check" and compare that row with the one below it.
I spot check to make sure the formula is picking all the true duplicates and then filter to see all the "Check" and remove all rows at the same time.
Then I move on to applying another formula:
=if(and(C2=C3,left(B2,2)=left(B3,2),left(A2,2)=left(A3,2)),"Check","")
This is to match by same DOB, first two letters of first name and last name. Same process by jumping down to the next row that is flagged down with "Check" to review the two consecutive rows.
Again, spot check to make sure they're true duplicates, then filter to see all "check" and mass delete.
I would like to do this by having macro embedded buttons and the button will help grab the flagged down records to another tab of worksheet and removing those duplicated records from the original working sheet. So this way, it's removed from the original list but the user can still go to the other tab to review those removed duplicates if they want to.
I'd appreciate any suggestions anyone can give.
Thank you!
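The second flagging pass described in the question (same DOB, same first two letters of first and last name, compared between consecutive sorted rows) can be sketched in Python to make the logic explicit before porting it to a macro. This is only an illustration under assumptions: the dictionary keys, `prefix_key` and the date interpretation of the sample DOBs are made up here, and it is not a VBA solution.

```python
from datetime import date

# Sample records from the question (only the columns used for matching).
records = [
    {"first": "John", "last": "Smith", "dob": date(1990, 1, 1)},
    {"first": "Jon", "last": "Smith", "dob": date(1990, 1, 1)},
    {"first": "Anthony", "last": "Bowen", "dob": date(1992, 2, 2)},
    {"first": "Tony", "last": "Bowen", "dob": date(1992, 2, 2)},
]


def prefix_key(rec, n=2):
    # DOB plus the first n letters of first and last name, case-insensitive.
    return (rec["dob"], rec["first"][:n].lower(), rec["last"][:n].lower())


# Sort by DOB, then first name, then last name, as in the manual process.
ordered = sorted(records, key=lambda r: (r["dob"], r["first"].lower(), r["last"].lower()))

# Flag a row when it matches the row directly below it.
flags = []
for current, following in zip(ordered, ordered[1:] + [None]):
    is_match = following is not None and prefix_key(current) == prefix_key(following)
    flags.append("Check" if is_match else "")

for rec, flag in zip(ordered, flags):
    print(rec["first"], rec["last"], rec["dob"], flag)
```

On the sample data this flags John/Jon Smith but not Anthony/Tony Bowen, because "An" and "To" differ, so nickname pairs like that would still need a manual check or a different rule.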

How to get two+ rows to link together? Excel 2010 (Example)

I have a parts list with competitor pricing. One part number brings multiple brands up with the location of the company.
As you can see from the picture, I have part numbers for one item with three companies. I want to sort by part type. So for example I want to list only the brake pads. When I do this the blanks get sent to the bottom, but the blanks are not really blanks because they have additional info with them for that part number.
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7
Part No | Company A | Price | Company B | Price | Company C | Price
4656546 | Brand A | $5 | Brand A | $5 | Brand A | $5
(BLANK) | Brand b | $8 | Brand b | $8 | Brand b | $8
I have tried to use a helper column, but I have 1,000+ rows.
Does anyone know if you can link or have a relationship between two+ rows?
I hope you understand; if not, I can try to explain better.
I assume that a "blank" in PartNo means "take the PartNo from the cell above" ...
In order to normalize the PartNo (= get rid of the blanks) use another PartNo-Normalized column (e.g. [K:K]) and normalize as follows:
K1 ="PartNo-Normalized"
K2:Kxx =IF(A2<>"",A2,K1)
Next, convert all formulas in [K:K] into values (Copy, then Paste Special - Values) before sorting, as a sort operation will destroy the calculated values.
After conversion to values it's safe to sort, and you can create a filter on that column.
Depending on how well organized your data is, it might be a good idea to add one more column and fill it with 1, 2, 3, 4, 5 ... before any sorting so you can restore the original sort order just in case something nasty happens.
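The fill-down rule behind the helper column (keep the part number if present, otherwise reuse the one from the row above) can be sketched in Python as follows. The function name `fill_down` and the second part number are invented for the example.

```python
def fill_down(part_numbers):
    # Mirrors =IF(A2<>"",A2,K1): keep a non-blank value, otherwise
    # carry forward the most recent value from the rows above.
    filled = []
    previous = ""
    for value in part_numbers:
        if value != "":
            previous = value
        filled.append(previous)
    return filled


# "7781230" is a hypothetical second part number added for illustration.
column_a = ["4656546", "", "", "7781230", ""]
print(fill_down(column_a))
```

Once every row carries its own part number, rows belonging to the same part stay together under any sort or filter.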

Transpose just some columns in excel

I have a worksheet with columns similar to the below
name | id | contact | category | week 1 | week 2 | week 3 | ... |week 52
What I need to do is transpose the 'week' columns into rows, so I end up with:
name | id | contact | category | week
With an entry for each week as a row in the s/sheet - thus making a long list on rows with the column data for each week.
example current format:
jones | 12345 | simon | electronics | 100 | 120| 130| 110 | ..........150
Required format
jones | 12345 | simon | electronics | 100
jones | 12345 | simon | electronics | 120
jones | 12345 | simon | electronics | 130
jones | 12345 | simon | electronics | 110
...
jones | 12345 | simon | electronics | 150
I have tried the usual excel transpose (via paste) but cannot get the first few columns to stay static, whilst transposing the week columns
Ideally I would like to achieve this within excel, but I can import the data into a mysql database and use that if the solution would be easier that way
Hope this makes sense
[added examples]
I would do the work on a second sheet, which uses the INDIRECT function to do the lookups for you:
http://www.excelfunctions.net/Excel-Indirect-Function.html
Start by setting up some indexes on the new sheet - we will use these to indirectly look up into the original sheet and pull the data across.
I would count up to 52 again and again in column A, starting with a 1 in A2, and using this formula in A3 (copied down):
=if(A2=52,1,A2+1)
This would be my count of the weeks per person.
In column B, I would count my people, starting with a 1 in B2, and using this formula in B3 (copied down):
=if(A3=1,B2+1,B2)
This gives me the row and column offsets to use in the INDIRECT function to fetch the data from your original sheet.
Now the fun part - matching these row and column offsets to your actual data.
Let's assume your original data is in a sheet called "original". This is where we need to look up the data.
We will map the original column A into the new sheet's column C. So C2 can hold this formula:
=indirect("original!R"&($B2+1)&"C1",false)
What you are doing there is looking in the row that you calculated in the B column (formula above), and looking in the first column of that row (i.e. column A) - this is where the Name is stored.
Similarly, the "id", "contact" and "category" columns get mapped to new sheet columns D, E, F, using modifications of that formula:
=indirect("original!R"&($B2+1)&"C2",false)
=indirect("original!R"&($B2+1)&"C3",false)
=indirect("original!R"&($B2+1)&"C4",false)
Only the column offset gets changed in these updates.
To pull the weekly data across, we use a similar formula; the difference is that now we get to use the newly calculated column A, where we counted up from 1 to 52 over and over.
So G2 becomes:
=indirect("original!R"&($B2+1)&"C"&(4+$A2),false)
Copy this all down as far as you need, and hide columns A and B.
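The index arithmetic behind the two counter columns and the INDIRECT lookups can be sketched in Python. This is an illustration only: it uses 3 weeks instead of 52 to keep the data short, and the second data row, `cell` and `long_rows` are made up for the example.

```python
WEEKS = 3  # the worksheet in the question has 52

# Stand-in for the "original" sheet: header row plus one row per person.
original = [
    ["name", "id", "contact", "category", "week 1", "week 2", "week 3"],
    ["jones", 12345, "simon", "electronics", 100, 120, 130],
    ["smith", 67890, "anna", "toys", 90, 95, 105],  # hypothetical second row
]


def cell(row, col):
    # 1-based lookup, like INDIRECT("original!R" & row & "C" & col, FALSE).
    return original[row - 1][col - 1]


long_rows = []
n_people = len(original) - 1
for k in range(n_people * WEEKS):
    week = k % WEEKS + 1     # column A counter: 1..WEEKS, repeating
    person = k // WEEKS + 1  # column B counter: increases when week resets
    fixed = [cell(person + 1, c) for c in (1, 2, 3, 4)]  # name, id, contact, category
    long_rows.append(fixed + [cell(person + 1, 4 + week)])

for row in long_rows:
    print(row)
```

The `+ 1` on the row skips the header, and `4 + week` skips the four fixed columns, which is exactly what `($B2+1)` and `(4+$A2)` do in the formulas.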

In Excel, how can I use a formula to merge rows with common values and concatenate differences?

Using the example from this question:
Excel - Merge rows with common values and concatenate the differences in one column
How can I change:
Customer Name | NEW YORK | ALBANY
Customer Name | NEW YORK | CLINTON
Customer Name | NEW YORK | COLUMBIA
Customer Name | NEW YORK | DELAWARE
Customer Name | NEW YORK | DUTCHESS
Customer Name | VERMONT | BENNINGTON
Customer Name | VERMONT | CALEDONIA
Customer Name | VERMONT | CHITTENDEN
Customer Name | VERMONT | ESSEX
Customer Name | VERMONT | FRANKLIN
to this:
Customer Name | VERMONT | BENNINGTON,CALEDONIA,CHITTENDEN,ESSEX,FRANKLIN
Customer Name | NEW YORK | ALBANY,CLINTON,COLUMBIA,DELAWARE,DUTCHESS
where | denotes a cell. The answer given in the question above was for a macro. I need to create a manageable template and most people do not know how to manage macros. Thus, I need a formula to do this. Can anyone help out?
If your data is sorted by state, you can use a formula column and a helper column to make it simple. In the first column, you concatenate the column C values for each state
in D2:
=if(B2=B1,D1&C2&",",C2&",")
in the second column, you can put a flag that tells you when the list for a state is finished (the row below belongs to a different state)
=if(B2=B3,"","State Complete")
You can filter on the State Complete value and get your results.
If you're trying to go a lot more fancy than that, you'll need macros or user-defined functions.
Similar but a single formula (and a 'wheeze'), assuming ALBANY is in C2, etc:
=IF(B1=B2,D1&","&C2,C2)&IF(B2<>B3,".","")
The full stop (period) is to identify the last of each set (which are assumed to be sorted and for ColumnC TRIMmed if necessary - also assumes no full stops in Column C).
A formula will not delete rows so either filter to select rows with cells containing full stops or copy the column, Paste Special Values over the top, filter and delete those not containing a full stop. I'd prefer the latter as presumably ColumnC itself should be deleted.
The periods could be removed with Find and Replace.
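The running-concatenation idea used in these answers can be traced in a short Python sketch: build up the list while the state stays the same, and keep only the last row of each state. It assumes the rows are already sorted by state, as the formulas do; `merge_sorted` is a name invented for the example.

```python
rows = [
    ("Customer Name", "NEW YORK", "ALBANY"),
    ("Customer Name", "NEW YORK", "CLINTON"),
    ("Customer Name", "NEW YORK", "COLUMBIA"),
    ("Customer Name", "NEW YORK", "DELAWARE"),
    ("Customer Name", "NEW YORK", "DUTCHESS"),
    ("Customer Name", "VERMONT", "BENNINGTON"),
    ("Customer Name", "VERMONT", "CALEDONIA"),
    ("Customer Name", "VERMONT", "CHITTENDEN"),
    ("Customer Name", "VERMONT", "ESSEX"),
    ("Customer Name", "VERMONT", "FRANKLIN"),
]


def merge_sorted(rows):
    merged = []
    running = ""
    for i, (name, state, county) in enumerate(rows):
        # Like IF(B1=B2, D1&","&C2, C2): extend the list or start a new one.
        same_as_previous = i > 0 and rows[i - 1][1] == state
        running = running + "," + county if same_as_previous else county
        # Like IF(B2<>B3, ...): the last row of a state holds the full list.
        is_last_of_state = i == len(rows) - 1 or rows[i + 1][1] != state
        if is_last_of_state:
            merged.append((name, state, running))
    return merged


merged = merge_sorted(rows)
for row in merged:
    print(" | ".join(row))
```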

Counting the number of older siblings in an Excel spreadsheet

I have a longitudinal spreadsheet of adolescent growth.
ID | CollectionDate | DOB | MOTHER ID | Sex
1 | 1Aug03 | 3Apr90 | 12 | 1
1 | 4Sept04 | 3Apr90 | 12 | 1
1 | 1Sept05 | 3Apr90 | 12 | 1
2 | 1Aug03 | 21Dec91 | 12 | 0
2 | 4Sept04 | 21Dec91 | 12 | 0
2 | 1Sept05 | 21Dec91 | 12 | 0
3 | 1Aug03 | 30Jan89 | 23 | 0
3 | 4Sept04 | 30Jan89 | 23 | 0
This is a sample of how my data is formatted and some of the variables that I have. As you can see, since it is longitudinal, each individual has multiple measurements. In the actual database there are over 10 measurements per individual and over 250 individuals.
What I am wanting to do is input a value signifying the number of older brothers and older sisters each individual has. That is why I have included the Mother ID (because it represents genetic relatedness) and sex. These new variable columns would just say how many older siblings of each sex each individual has. Is there a formula that I could use to do this quickly?
=COUNTIFS($B:$B,"<>"&$B2,$H:$H,$H2,$AI:$AI,$AI2,$J:$J,"<"&$J2)
Create a column named Distinct with this formula
=1/COUNTIF([ID],[#ID])
Then you can find all the older 0-sexed siblings like this
=SUMPRODUCT(([DOB]<[#DOB])*([MOTHERID]=[#MOTHERID])*([Sex]=0)*([Distinct]))
Note that I made the data a Table and used table notation. If you're not familiar, [COLUMNNAME] refers to the whole column and [#COLUMNNAME] refers to the value in that column on the current row (Excel 2010 and later write the current-row reference as [@COLUMNNAME]). It's similar to saying $A:$A and A2 if you're dealing with column A.
The first formula gives you a value to count that will always add up to 1 for a particular ID. So ID=1 has three lines and Distinct will result in .33333 for each line. When you add up the three lines you get 1. This is similar to a SELECT DISTINCT in SQL parlance.
The SUMPRODUCT formula sums [Distinct] for every row where the DOB is earlier than the current DOB (i.e. an older sibling), the Mother is the same as the current Mother, and the Sex is zero.
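To see why the fractional weights count each sibling exactly once, here is a small Python sketch of the same technique on the sample data, counting siblings with an earlier date of birth. The tuple layout and the names `rows_per_id` and `older_siblings` are assumptions for illustration; `Fraction` is used so the thirds add up to exactly 1.

```python
from collections import Counter
from datetime import date
from fractions import Fraction

# (ID, DOB, mother ID, sex), one tuple per measurement row, as in the sample.
rows = (
    [(1, date(1990, 4, 3), 12, 1)] * 3
    + [(2, date(1991, 12, 21), 12, 0)] * 3
    + [(3, date(1989, 1, 30), 23, 0)] * 2
)

# Number of measurement rows per individual; 1/count is the "Distinct" weight.
rows_per_id = Counter(row[0] for row in rows)


def older_siblings(current, sex):
    _, current_dob, current_mother, _ = current
    total = sum(
        Fraction(1, rows_per_id[row_id])
        for row_id, dob, mother, row_sex in rows
        if dob < current_dob and mother == current_mother and row_sex == sex
    )
    return int(total)


# Individual 2 has one older sibling (individual 1, coded sex 1).
print(older_siblings(rows[3], sex=1))
print(older_siblings(rows[3], sex=0))
```

Individual 1 appears on three rows, each weighted 1/3, so the three rows together contribute exactly one sibling to the count.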
I have a possible solution. It involves adding two columns -- One for "# older siblings" and one for "unique?". So here are all the headings I have currently:
A -- ID
B -- CollectionDate
C -- DOB
D -- MOTHER ID
E -- Sex
F -- # older siblings
G -- unique?
In G2, I added the following formula:
=IF(A2=A1,0,1)
And dragged down. As long as the data is sorted by ID, this will only display "1" once for each unique person.
In F2, I added the following formula:
=COUNTIFS(G:G,"=1",D:D,"="&D2,C:C,"<"&C2)
And dragged down. It seemed to work correctly for the sample data you provided.
The stipulations are:
You would need the two columns.
The data would need to be sorted by ID
I hope this helps.
You need a formula like this (for example, for row 2):
=COUNTIFS($A:$A,"<>"&$A2,$E:$E,$E2,$D:$D,$D2,$C:$C,"<"&$C2)
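Assuming E:E is the column for sex, D:D is the column for mother ID and C:C is the column for DOB. Note that the $E:$E,$E2 criterion limits the count to older siblings of the same sex as the current row, and that every measurement row of an older sibling is counted, not just one per individual.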
Write this formula in H2 cell for example and drag it down.

Resources