Reporting Row and Column Names Connected to a Value (Microsoft Access) - excel

I'm well out of my depth on this one, as I don't use databases much. I hope it suffices that I've tried to pay it forward by helping people with InDesign and Photoshop on other websites!
I can use Access or Excel for what follows.
I have data that looks like:
year President Vice
1980 Reagan Bush Sr.
1984 Reagan Bush Sr.
1988 Bush Sr. Quayle
1992 Clinton Gore
1996 Clinton Gore
2000 Bush Jr. Cheney
2004 Bush Jr. Cheney
2008 Obama Biden
2012 Obama Biden
I want a report that looks like:
Biden: Vice 2008, 2012
Bush Jr.: President 2000, 2004
Bush Sr.: President 1988; Vice 1980, 1984
Cheney: Vice 2000, 2004
Clinton: President 1992, 1996
Gore: Vice 1992, 1996
Obama: President 2008, 2012
Quayle: Vice 1988
Reagan: President 1980, 1984
I'm having trouble figuring out how to identify a common name that may appear anywhere on the table, and how to grab the row and column labels for the report.
This is a simplified version of the real data, which doesn't have to do with politicians. There are actually ten column labels that are relevant, not just two. "Bush Sr." gives an example of one person holding two different offices.
There are not currently any cases where the same name appears in two different columns in the same row, but I'd prefer not to rule out the possibility, unless it's dramatically more complex to allow that.
Thanks!

The first thing we need to do is convert that data from "few rows, many columns" to "few columns, many rows" via a UNION query. (I saved your test data in a table called [N_column_table].)
SELECT [year], "President" AS office, [President] AS person
FROM [N_column_table]
UNION ALL
SELECT [year], "Vice" AS office, [Vice] AS person
FROM [N_column_table]
If you save that query as "3_column_data" then you can use it just like a table in other queries, reports, etc. (You will have to add ~8 more UNION ALL constructs when you build your query for the real data.)
So now our data looks like this:
year office person
1980 President Reagan
1984 President Reagan
1988 President Bush Sr.
1992 President Clinton
1996 President Clinton
2000 President Bush Jr.
2004 President Bush Jr.
2008 President Obama
2012 President Obama
1980 Vice Bush Sr.
1984 Vice Bush Sr.
1988 Vice Quayle
1992 Vice Gore
1996 Vice Gore
2000 Vice Cheney
2004 Vice Cheney
2008 Vice Biden
2012 Vice Biden
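For readers who want to see the same "wide to long" reshaping outside Access, here is a minimal Python sketch of what the UNION ALL query does. It is only an illustration: the names `wide_rows`, `OFFICE_COLUMNS` and `unpivot` are made up for this example, and only three of the sample rows are included to keep it short.

```python
# Wide rows as in the question: one row per year, one column per office.
# Only three sample rows are shown here to keep the sketch short.
wide_rows = [
    {"year": 1980, "President": "Reagan", "Vice": "Bush Sr."},
    {"year": 1984, "President": "Reagan", "Vice": "Bush Sr."},
    {"year": 1988, "President": "Bush Sr.", "Vice": "Quayle"},
]

# Hypothetical list of office columns; the real data would name all ten here.
OFFICE_COLUMNS = ["President", "Vice"]


def unpivot(rows, office_columns):
    """Return (year, office, person) tuples, one per office column per row."""
    long_rows = []
    for office in office_columns:  # one pass per column, like one UNION ALL branch
        for row in rows:
            long_rows.append((row["year"], office, row[office]))
    return long_rows


long_rows = unpivot(wide_rows, OFFICE_COLUMNS)
for long_row in long_rows:
    print(long_row)
```

Adding the other eight columns would then mean extending the column list rather than writing eight more SELECT branches.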
Now, as for "gluing together" the offices and years, we'll need to use a little VBA function for that. Create a Module in Access, and paste in the following code:
Public Function ListTerms(person As String) As String
Dim cdb As DAO.Database
Dim rstOffice As DAO.Recordset, rstYear As DAO.Recordset
Dim result As String, yearString As String
Const YearSeparator = ", "
Const OfficeSeparator = "; "
Set cdb = CurrentDb
result = ""
Set rstOffice = cdb.OpenRecordset( _
"SELECT DISTINCT office " & _
"FROM 3_column_data " & _
"WHERE person=""" & Replace(person, """", """""", 1, -1, vbBinaryCompare) & """ " & _
"ORDER BY 1")
Do While Not rstOffice.EOF
yearString = ""
Set rstYear = cdb.OpenRecordset( _
"SELECT DISTINCT [year] " & _
"FROM 3_column_data " & _
"WHERE person=""" & Replace(person, """", """""", 1, -1, vbBinaryCompare) & """ " & _
"AND office=""" & Replace(rstOffice!Office, """", """""", 1, -1, vbBinaryCompare) & """ " & _
"ORDER BY 1")
Do While Not rstYear.EOF
If Len(yearString) > 0 Then
yearString = yearString & YearSeparator
End If
yearString = yearString & rstYear!Year
rstYear.MoveNext
Loop
rstYear.Close
Set rstYear = Nothing
If Len(result) > 0 Then
result = result & OfficeSeparator
End If
result = result & rstOffice!Office & " " & yearString
rstOffice.MoveNext
Loop
rstOffice.Close
Set rstOffice = Nothing
Set cdb = Nothing
ListTerms = result
End Function
Now we can use that function in a query to list each person and their terms in office:
SELECT personlist.[person], ListTerms(personlist.[Person]) as terms
FROM (SELECT DISTINCT person FROM 3_column_data) personlist
which returns
person terms
Biden Vice 2008, 2012
Bush Jr. President 2000, 2004
Bush Sr. President 1988; Vice 1980, 1984
Cheney Vice 2000, 2004
Clinton President 1992, 1996
Gore Vice 1992, 1996
Obama President 2008, 2012
Quayle Vice 1988
Reagan President 1980, 1984
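As a cross-check of the grouping logic in ListTerms, here is a rough Python sketch that builds the same report from the three-column data. It is not a replacement for the Access solution; `long_rows` and `list_terms` are names invented for this example. Using sets gives the same effect as SELECT DISTINCT, so a name that appears in two columns of the same row is handled without extra work.

```python
from collections import defaultdict

# The three-column data shown above, as (year, office, person) tuples.
long_rows = [
    (1980, "President", "Reagan"), (1984, "President", "Reagan"),
    (1988, "President", "Bush Sr."), (1992, "President", "Clinton"),
    (1996, "President", "Clinton"), (2000, "President", "Bush Jr."),
    (2004, "President", "Bush Jr."), (2008, "President", "Obama"),
    (2012, "President", "Obama"),
    (1980, "Vice", "Bush Sr."), (1984, "Vice", "Bush Sr."),
    (1988, "Vice", "Quayle"), (1992, "Vice", "Gore"),
    (1996, "Vice", "Gore"), (2000, "Vice", "Cheney"),
    (2004, "Vice", "Cheney"), (2008, "Vice", "Biden"),
    (2012, "Vice", "Biden"),
]


def list_terms(rows, year_sep=", ", office_sep="; "):
    """Map each person to a string such as 'President 1988; Vice 1980, 1984'."""
    terms = defaultdict(lambda: defaultdict(set))  # person -> office -> years
    for year, office, person in rows:
        terms[person][office].add(year)  # a set drops duplicate years

    report = {}
    for person in sorted(terms):
        parts = []
        for office in sorted(terms[person]):
            years = year_sep.join(str(y) for y in sorted(terms[person][office]))
            parts.append(f"{office} {years}")
        report[person] = office_sep.join(parts)
    return report


report = list_terms(long_rows)
for person, terms in report.items():
    print(f"{person}: {terms}")
```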

Related

Proper syntax for calling a particular UDF

This refers to the SO question "Find all other cells with same adjacent element". The data is reproduced below to avoid cross-referencing.
I have an excel spreadsheet with the following columns
• A: City
• B: State
• C: Other cities that are in the same state as column A
For example, the result may look like this:
City           State          Other cities in State
Philadelphia   Pennsylvania   Pitsburgh
Pitsburgh      Pennsylvania   Philadelphia
San Diego      California     Palo Alto, Mountain View, LA, San Jose, Houston
Palo Alto      California     San Jose, Mountain View, San Diego
Mountain View  California     San Jose, LA, Palo Alto, San Diego
LA             California     San Jose, Mountain View, Palo Alto, San Diego
San Jose       California     LA, Mountain View, Palo Alto, San Diego
Austin         Texas          Houston, Dallas
Houston        Texas          Austin, Dallas
Dallas         Texas          Dallas, Houston
It was answered by user4039065, who advised using a UDF. The code is as follows.
Option Explicit
Function CITYJOIN(rst As Range, sst As String, rct As Range, _
Optional sct As String = "", _
Optional bIncludeSelf As Boolean = False, _
Optional delim As String = ", ")
Dim r As Long
Static dict As Object
If dict Is Nothing Then
Set dict = CreateObject("Scripting.Dictionary")
dict.compareMode = vbTextCompare
End If
dict.RemoveAll
'truncate any full column references to the .UsedRange
Set rst = Intersect(rst, rst.Parent.UsedRange)
'set the cities to the same size as the states
Set rct = rct.Resize(rst.Rows.Count, rst.Columns.Count)
'loop through the cells to create unique dictionary keys
For r = 1 To rst.Cells.Count
If LCase(rst(r).Value2) = LCase(sst) Then
dict.Item(StrConv(rct(r).Value2, vbProperCase)) = vbNullString
End If
Next r
'get rid of 'self-city'
If Not bIncludeSelf Then
'only remove the key if it is present; .Remove raises an error otherwise
If dict.Exists(sct) Then dict.Remove sct
End If
'return a delimited string
CITYJOIN = Join(dict.keys, delim)
End Function
It gives the correct answer when used in a worksheet with the following formula.
=CITYJOIN(B:B,B2,A:A,A2)
My level in VBA is elementary, and I want to understand the function fully by stepping through it with the F8 key. With this in view I wrote the following test sub.
Sub test()
Call CITYJOIN("B: B", B2, "A: A", A2)
'CITYJOIN B: B , B2, A: A , A2
End Sub
I am getting a compile error, reported at the following line in the function code.
CITYJOIN = Join(dict.keys, delim)
Can someone provide the proper code for the test sub and explain the mistake in my version above?
Thanks
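CITYJOIN expects Range objects for its first and third arguments and strings for the second and fourth. Your test sub passes the strings "B: B" and "A: A" where ranges are required, and B2 and A2 are read as undeclared variables rather than cells, so the code does not compile. Please call the function this way:
Sub test()
Debug.Print CITYJOIN(Range("B:B"), Range("B2").Value, Range("A:A"), Range("A2").Value)
End Sub
Then see the result in the Immediate Window (Ctrl+G in the VBE, the Visual Basic for Applications editor).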
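For anyone trying to follow what CITYJOIN does before stepping through it, the same logic can be sketched in Python. This is only an approximation for reading purposes: `city_join` and its parameter names are invented here, and Python's `str.title()` is assumed to behave closely enough to `StrConv(..., vbProperCase)` for simple city names.

```python
def city_join(states, state, cities, self_city="", include_self=False, delim=", "):
    """Join the distinct cities whose state matches, optionally dropping one city."""
    seen = {}  # a dict keeps insertion order, much like the Dictionary keys in VBA
    for row_state, row_city in zip(states, cities):
        if row_state.lower() == state.lower():  # case-insensitive state match
            seen[row_city.title()] = None       # the key is the proper-cased city
    if not include_self:
        seen.pop(self_city.title(), None)       # no error if the key is missing
    return delim.join(seen)


# Sample columns taken from the Texas rows of the table above.
states = ["Texas", "Texas", "Texas"]
cities = ["Austin", "Houston", "Dallas"]

print(city_join(states, "Texas", cities, "Austin"))
```

The two list arguments play the role of the B:B and A:A ranges, and the two string arguments play the role of the B2 and A2 values.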

Combine SIMILAR value rows using Python Pandas

Suppose I have the following DataFrame:
company money
jack & jill, Boston, MA 02215 51
jack & jill, MA 02215 49
Now, I know that these 2 rows refer to the same company, so I want to merge them and also sum the money:
company money
jack & jill, Boston, MA 02215 100
I don't care about the format of the company name, as long as the duplicates get merged and the money gets added.
How should I go about this? Is there a library out there that merges SIMILAR value rows and sums the corresponding quantitative value?
If the company column follows a consistent pattern, i.e. the text before the first comma is the company name, you can use something like the following:
import pandas as pd
df = pd.DataFrame({'company':['jack & jill, Boston, MA 02215','jack & jill, MA 02215','Google, New Jersey', 'Google'],
'money':[51,49, 33, 22]})
# keep only the text before the first comma as the company name
df['company'] = df['company'].apply(lambda x: x.split(",")[0])
new_df = df.groupby(['company'])['money'].sum().reset_index()
print(new_df)
Output:
company money
0 Google 55
1 jack & jill 100
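If the names do not share a clean prefix, one possible fallback is approximate string matching. The sketch below uses only the standard library's `difflib.SequenceMatcher` rather than pandas; the function name `merge_similar`, the greedy "first name seen wins" strategy, and the 0.8 threshold are all assumptions made for illustration and would need tuning on real data.

```python
from difflib import SequenceMatcher


def merge_similar(rows, threshold=0.8):
    """Greedily merge rows whose names are similar and sum their money."""
    totals = {}  # representative name -> summed money
    for name, money in rows:
        for rep in totals:
            ratio = SequenceMatcher(None, rep.lower(), name.lower()).ratio()
            if ratio >= threshold:
                totals[rep] += money  # fold into the first similar name seen
                break
        else:
            totals[name] = money      # no similar name yet: start a new group
    return totals


rows = [
    ("jack & jill, Boston, MA 02215", 51),
    ("jack & jill, MA 02215", 49),
    ("Google, New Jersey", 33),
]
totals = merge_similar(rows)
for company, money in totals.items():
    print(company, money)
```

A threshold that is too low will merge unrelated companies, and one that is too high will miss short variants, so the merged groups are worth checking by eye.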

Excel - Search from range, not specific cell?

I have a range of cells in column I2:I8:
WILEY
ELSEVIER
SPRINGER
TAYLOR
SAGE
OXFORD
CAMBRIDGE
I want to use the SEARCH function on column G, so that it'll search for any one of the values in this range, and return true/false to column H if it finds anything.
The problem is, that the values in column G are also longer, and the string in column I will only be a substring of the column G text.
Column G contains (for example):
BLACKWELL PUBL LTD
ISRAEL MEDICAL ASSOC JOURNAL
PERGAMON-ELSEVIER SCIENCE LTD
PERGAMON-ELSEVIER SCIENCE LTD
MOSBY, INC
OXFORD UNIV PRESS
CELL PRESS
AMER COLL PHYSICIANS
NATURE PUBLISHING GROUP
COLD SPRING HARBOR LAB PRESS, PUBLICATIONS DEPT
AMER COLL PHYSICIANS
MASSACHUSETTS MEDICAL SOC
WILEY-BLACKWELL
BLACKWELL PUBLISHING INC
AMER ASSOC ADVANCEMENT SCIENCE
OXFORD UNIV PRESS
MASSACHUSETTS MEDICAL SOC
OXFORD UNIV PRESS
ACADEMIC PRESS INC ELSEVIER SCIENCE
ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD
So for example, each time the word Wiley, Oxford, Elsevier etc. appears in column G (such as in OXFORD UNIV PRESS or WILEY-BLACKWELL or ACADEMIC PRESS INC ELSEVIER SCIENCE), it will return true in column H.
I have built the following formulas:
=(ISNUMBER(SEARCH(($I$2:$I$8),G2)))
=(ISNUMBER(SEARCH(($I$2:$I$2:$I$3:$I$3:$I$4:$I$4:$I$5:$I$5:$I$6:$I$6:$I$7:$I$7:$I$8:$I$8),G23)))
But they do not seem to work.
Any suggestions?
Example of wanted result:
BLACKWELL PUBL LTD FALSE WILEY
ISRAEL MEDICAL ASSOC JOURNAL FALSE ELSEVIER
PERGAMON-ELSEVIER SCIENCE LTD TRUE SPRINGER
PERGAMON-ELSEVIER SCIENCE LTD TRUE TAYLOR
MOSBY, INC FALSE SAGE
OXFORD UNIV PRESS TRUE OXFORD
CELL PRESS FALSE CAMBRIDGE
AMER COLL PHYSICIANS FALSE
NATURE PUBLISHING GROUP FALSE
AMER COLL PHYSICIANS FALSE
MASSACHUSETTS MEDICAL SOC FALSE
WILEY-BLACKWELL TRUE
BLACKWELL PUBLISHING INC FALSE
AMER ASSOC ADVANCEMENT SCIENCE FALSE
OXFORD UNIV PRESS TRUE
MASSACHUSETTS MEDICAL SOC FALSE
OXFORD UNIV PRESS TRUE
ACADEMIC PRESS INC ELSEVIER SCIENCE TRUE
ACADEMIC PRESS LTD- ELSEVIER SCIENCE LTD TRUE
NATURE PUBLISHING GROUP FALSE
ELSEVIER SCIENCE BV TRUE
MOSBY-ELSEVIER TRUE
MASSACHUSETTS MEDICAL SOC FALSE
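Wrap your formula in SUMPRODUCT() so that SEARCH is evaluated against every cell in the list, then test whether at least one of them matched. With the list in I2:I8 and the first text in G2, as in your question:
=SUMPRODUCT(--ISNUMBER(SEARCH($I$2:$I$8,G2)))>0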
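The idea behind the formula (count how many keywords occur as a substring and check that the count is above zero) can be illustrated with a short Python sketch. The names `KEYWORDS` and `contains_any` are invented for this example; the matching is case-insensitive, as SEARCH is.

```python
# Keywords taken from the list in the question.
KEYWORDS = ["WILEY", "ELSEVIER", "SPRINGER", "TAYLOR", "SAGE", "OXFORD", "CAMBRIDGE"]


def contains_any(text, keywords):
    """Return True if any non-empty keyword occurs in text, ignoring case."""
    haystack = text.casefold()
    return any(k.casefold() in haystack for k in keywords if k)


for publisher in ["BLACKWELL PUBL LTD", "PERGAMON-ELSEVIER SCIENCE LTD", "WILEY-BLACKWELL"]:
    print(publisher, contains_any(publisher, KEYWORDS))
```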

Excel INDEX MATCH when one of multiple criteria is within a range of numbers

To identify duplicates in a large list of personal records, I'm replacing all names with a CONCATENATE of Name and Date of Birth (DOB). Here is the sheet I'm referencing (DOBs!):
DOBs! -----------------------------------------
C D E F G I K
Name Years Start End # Yrs DOB NewNameDOB
Sally Adams 2014-2014 2014 2014 1 1968-1204 Sally Adams1968-1204
John Agnew 2014-2014 2014 2014 1 1979-0419 John Agnew1979-0419
Bob Anderson 2013-2014 2013 2014 2 1965-0402 Bob Anderson1965-0402
Antonio Andrews 2014-2014 2014 2014 1 1955-0716 Antonio Andrews1955-0716
Julie Assan 2012-2014 2012 2014 3 1978-0805 Julie Assan1978-0805
On the main sheet (Employees!), each person has a row of data for each active year. Work 14 years, you have 14 lines of data to track.
Employees! -----------------------------------------
C D **E** F G H I J...
Year Name NewNameDOB Dept
2013 Julie Assan Julie Assan1978-0805 East
1998 Mike Rogers Duplicate in Same Year Main
1999 Mike Rogers Duplicate in Same Year Main
2000 Mike Rogers Mike Rogers1969-0510 Main
2001 Mike Rogers Mike Rogers1969-0510 Main
As mentioned, I need to separate duplicate names from 10395 records (like Mike Rogers and Mike Rogers). Employees! column E will now identify the employees as Julie Assan1978-0805 and Julie Assan1980-0131 (for example).
Today we take my first step, using the years they worked in order to solve 99% of the duplicates. After this, only a few duplicate names will be left who worked at the same time as each other, which I'll have to handle manually.
If the Employees! sheet has a 2013 record for "Julie Assan," then the first step is to check DOBs! to find any Julie Assans who worked in 2013. My new column E in Employees! will take the current 2013 record of Julie Assan, and find any matches in DOBs! where C (name) matches Julie Assan, E <= 2013, and F >= 2013. Usually, there will be only one match, and it will tell me that is Julie Assan1978-0805. Sometimes, there will be two Mike Rogers who worked during the same year, and it should tell me "Duplicate in Same Year".
On the Employees! sheet column E, I've started with this...
=index(DOBs!$k$2:$k$10395,match($d3&$c3,DOBs!$c$2:$c$10395& ??? ,0)
Not sure where to go with this formula, whether that means adding "IFs" or something different.
edited to explain in great depth
=IF(COUNTIFS(DOBs!C$2:C$10395,Employees!D2,DOBs!E$2:E$10395,"<="&Employees!C2,DOBs!F$2:F$10395,">="&Employees!C2)>1,"Duplicate in Same Year",INDEX(DOBs!K$2:K$10395,MATCH(TRUE,IF(DOBs!C$2:C$10395=Employees!D2,IF(DOBs!E$2:E$10395<=Employees!C2,IF(DOBs!F$2:F$10395>=Employees!C2,TRUE))),0)))
Enter it as an array formula by confirming with Ctrl+Shift+Enter, then autofill down. It first uses COUNTIFS to count the DOBs! rows whose name matches and whose Start/End range covers the year, and returns "Duplicate in Same Year" if there is more than one. Otherwise, it uses INDEX/MATCH to find the NewNameDOB.
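The decision the formula makes can be written out as a small Python sketch, which may make the three conditions easier to follow. The names `DOBS` and `resolve` are invented here, and the Mike Rogers rows use made-up year ranges and one made-up DOB purely to trigger the duplicate case; only the Julie Assan and Bob Anderson rows come from the sample sheet.

```python
# (name, start_year, end_year, new_name_dob), mirroring DOBs! columns C, E, F, K.
DOBS = [
    ("Julie Assan", 2012, 2014, "Julie Assan1978-0805"),
    ("Bob Anderson", 2013, 2014, "Bob Anderson1965-0402"),
    ("Mike Rogers", 1998, 2001, "Mike Rogers1969-0510"),  # hypothetical year range
    ("Mike Rogers", 1998, 1999, "Mike Rogers1950-0101"),  # hypothetical second person
]


def resolve(name, year, dobs):
    """Return the NewNameDOB for name in year, or flag same-year duplicates."""
    matches = [
        new_name_dob
        for dob_name, start, end, new_name_dob in dobs
        if dob_name == name and start <= year <= end
    ]
    if len(matches) > 1:
        return "Duplicate in Same Year"  # the COUNTIFS(...) > 1 branch
    if len(matches) == 1:
        return matches[0]                # the INDEX/MATCH branch
    return None                          # no match; the formula would show #N/A


print(resolve("Julie Assan", 2013, DOBS))
print(resolve("Mike Rogers", 1998, DOBS))
print(resolve("Mike Rogers", 2000, DOBS))
```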

T-SQL Search In String for specific words

Short and to the point:
I am using SQL Server 2008 R2. I have a table with columns "product name" and "product size". The size of the product is recorded in its name, like this:
Ellas Kitchen Apple & Ginger Baby Cookies 120g
Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g
Elastoplast Spray Plaster 32.5ml
Ellas Kitchen Stage 1 Butternut Squash Each
the size of this product should be:
120g
4 x 12g
32.5ml
N/A
(some of the products may have no size in their name, and for those the size should be set to "N/A")
I want to write a T-SQL statement that updates the product size by extracting it from the product name.
I have done this in JavaScript, but to do things properly I have to write an SQL statement, and that's my problem. I have found it very difficult to work with "regular expressions" in T-SQL.
I have seen an example of how to get only the number from a string, but have no idea how to adapt it to my case.
Declare @ProductName varchar(100)
Select @ProductName = 'dsadadsad15234Nudsadadmbers'
Select @ProductName = SubString(@ProductName, PATINDEX('%[0-9]%', @ProductName), Len(@ProductName))
Select @ProductName = SubString(@ProductName, 0, PATINDEX('%[^0-9]%', @ProductName))
Select @ProductName
I will appreciate any example or idea.
Thanks, in advance.
EDIT:
Thanks for your reply,xQbert.
I have not included all possible formats, because if I have a working example with few of them I think I will be able to do for all. Anyway, in order to give more details here are the possible situations:
( Inumber + "x" + Dnumber + word)* + (_)* + (Dnumber + word)*
- * means 0 or more
where word can be - g, kg, ml, cl, pack
where Inumber is integer
where Dnumber is double
where _ is space
For example:
12 x 100g
100ml
2 x kg
And the size (if there is one) is always at the end of the name:
Product name + product size
For example:
Organix Organic Chick Pea & Red Pepper Lasagne 190g
Organix Organic Vegetable & Pork Risotto 250g
Organix Rice Cakes apple 50g
Organix Rusks 7m+ 6 Pack
Organix Savoury Biscuits Cheese & Onion Each
Organix Savoury Biscuits Tomato & Basil Each
Organix Stage 1 Squash & Chicken 2 x 120g
PATINDEX is not regex, and you have limited logic processing in T-SQL compared to .NET. Have you considered CLR integration?
http://msdn.microsoft.com/en-us/library/ms131089(SQL.100).aspx
This one is from 2005, but it is an example of regex in SQL via CLR integration.
http://msdn.microsoft.com/en-us/library/ms345136(v=SQL.90).aspx
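To make the target pattern concrete before porting it to a CLR function, here is a rough Python sketch of a regular expression for the sizes described in the question. It is an illustration, not T-SQL: the names `SIZE_PATTERN` and `extract_size` are invented, the unit list is the one given in the question, and the pattern assumes the size sits at the very end of the name and always contains a number (so a form like "2 x kg" is not covered).

```python
import re

# Optional "<integer> x " multiplier, then a number and a unit, at the end of the name.
SIZE_PATTERN = re.compile(
    r"((?:\d+\s*x\s*)?\d+(?:\.\d+)?\s*(?:kg|g|ml|cl|pack))\s*$",
    re.IGNORECASE,
)


def extract_size(product_name):
    """Return the trailing size text, or 'N/A' when the name has no size."""
    match = SIZE_PATTERN.search(product_name)
    return match.group(1) if match else "N/A"


names = [
    "Ellas Kitchen Apple & Ginger Baby Cookies 120g",
    "Ellas Kitchen Apple, Raisin 'n' Cinnamon Bakey Bakies 4 x 12g",
    "Elastoplast Spray Plaster 32.5ml",
    "Ellas Kitchen Stage 1 Butternut Squash Each",
    "Organix Rusks 7m+ 6 Pack",
]
for name in names:
    print(extract_size(name))
```

The same pattern string could be reused in a .NET Regex inside a CLR function, subject to checking the syntax differences between the two regex engines.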
