I have a large table of two columns in Excel 2010. Column A is the user, column B is the person who invited the user. Usernames are alphanumeric, including some which are just numeric. The earliest users don't have an inviter.
User | Parent
-------------
AAA |
BBB |
CCC | AAA
DDD | BBB
EEE | DDD
FFF | DDD
GGG | FFF
HHH |
III | GGG
What I would like to do is have a formula which allows me to go to grandparent (and great-grand-parent, and beyond), so I'm trying to find a formula-based solution which uses mixed relative and absolute columns where appropriate.
The above chain would go to a maximum of four, but I have reason to believe my data set goes to no more than 20 levels deep at maximum. I would like to find a formula or combination of formulas that get me to this (and, as I said, beyond):
USER | PARENT | P2 | P3 | P4 | ...
AAA | |
BBB | |
CCC | AAA |
DDD | BBB |
EEE | DDD | BBB |
FFF | DDD | BBB |
GGG | FFF | DDD | BBB
HHH |
III | GGG | FFF | DDD | BBB
...
I've tried various methods combining VLOOKUP, MATCH, and INDEX commands, with and without a key row of user ID numbers (since some of those solutions without a numeric column broke down when faced with the fact that "0" was a valid username, which makes error trapping more difficult). I can get to P2 pretty reliably, but I can't ever seem to get to P3 without it breaking down. Incidentally, the formulas I've tried are very CPU-intensive, given the data goes to nearly 400,000 rows, but calculation time doesn't concern me much. My brute-force methods aren't working. There are several somewhat similar questions on stackoverflow, but they're asking for slightly different things, and I haven't been able to adapt any of them.
If this can be done via standard functions, that would be preferable to VBA (which I am not familiar with), even if the calculation time is longer, as it would increase my ability to maintain it when I need to revisit this issue next year.
Try this formula: =IFERROR(VLOOKUP(C5,UserParent,2,FALSE) & "",""), replacing UserParent with your absolutely referenced column pair (e.g. $B$5:$C$30) or an appropriate named range. Copy it down and across your grandparent columns.
I'm betting this is the approach that you tried before, but you end up with a bunch of zeroes in the output. The juicy bit in this formula is the & "". This forces the empty cells in your parent column to be treated as empty strings rather than zero-valued cells when VLOOKUP does its work. This removes all those zeroes that dork up the output.
I was able to make it work with a bunch of random alphanumerics, but without sample data, this is the best I could do.
As you have noted, the existence of 0 as a valid username is a real problem, since 0 is also the value that VLOOKUP() (and equivalently INDEX(,MATCH())) returns for names with no parents.
An alternative strategy is to use some dummy value which does not appear in either the User or Parent column, such as -99999, to signify the absent parent and to add this in place of any empty cell in the Parent column of the User/Parent table. Also add a row to this table with this same dummy value in both the User and Parent columns. Now you will only get a zero returned by VLOOKUP if 0 is genuinely the parent of the cell whose parent you are attempting to find. You will detect when there are no more "levels" when all the values in the column are equal to the dummy value.
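The lookup logic behind both answers can be sketched outside Excel to check what the P2, P3, ... columns should contain. This is a minimal Python illustration, not the formula itself: the function name ancestor_chain and the use of None as the "no parent" marker are assumptions made for the example. Comparing against None (rather than testing truthiness) plays the same role as the dummy value above, so a username of "0" is still treated as a real user.

```python
def ancestor_chain(parents, user, max_depth=20):
    """Return [parent, grandparent, ...] for user, stopping at a root."""
    chain = []
    seen = {user}
    current = parents.get(user)
    # Compare against None, not truthiness, so "0" stays a valid username.
    while current is not None and len(chain) < max_depth:
        if current in seen:  # guard against accidental cycles in the data
            break
        chain.append(current)
        seen.add(current)
        current = parents.get(current)
    return chain


# Sample data from the question; None marks "no inviter".
parents = {
    "AAA": None, "BBB": None, "CCC": "AAA", "DDD": "BBB", "EEE": "DDD",
    "FFF": "DDD", "GGG": "FFF", "HHH": None, "III": "GGG",
}

for user in parents:
    print(user, ancestor_chain(parents, user))
```

Each returned list corresponds to one row of the PARENT, P2, P3, ... columns in the desired output.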
First question on Stack, but not my first visit!
Basically I have this huge Excel database (>24 000 rows, merged from different tables) I have been working on for weeks and now that I'm done adding new entries, I have to clean it by removing a lot of duplicates.
The array/table is structured in the following manner :
+---------+-------+--------------------+------------+-------------------+
| Company | Name  | Address            | Phone      | Email             |
+---------+-------+--------------------+------------+-------------------+
| Baij&Co | Steve | 458 Preston avenue | 4156854789 | steve#baij&co.com |
+---------+-------+--------------------+------------+-------------------+
I did search through conventional methods but they don't exactly answer my problem, such as:
Using the "Remove Duplicates" Excel button by selecting all columns to make sure I only keep unique values
Using the filtering method to identify the duplicates and then remove them.
However, my goal is to remove the duplicates for which the given row(s) contains the minimal amount of information, as shown in this example:
+---------+-------+--------------------+------------+-------------------+
| Company | Name  | Address            | Phone      | Email             |
+---------+-------+--------------------+------------+-------------------+
| Baij&Co | Steve | (blank)            | 4156854789 | steve#baij&co.com |
| Baij&Co | Steve | (blank)            | (blank)    | steve#baij&co.com |
| Baij&Co | Steve | 458 Preston avenue | 4156854789 | steve#baij&co.com |
+---------+-------+--------------------+------------+-------------------+
Here, I would like to remove the 1st AND 2nd row as they contain less information (missing address & phone entry) about the same contact.
Does it make sense?
I only know the basics of VBA (like creating a userform to add a new contact and fill out the entered information in the right cells) but I struggle with advanced algorithms.
I just know the VBA-related function cannot be customized, apart from selecting the columns in which I want to remove the duplicates:
Sheets("Database").Range("ContactsTable").RemoveDuplicates Columns:=Array(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), Header:=xlNo
Any ideas?
Thanks fellas!
So I followed @Tim Williams's suggestion (which is similar to Scott's actually) and did the following:
I realized that email addresses were the unique identifier (or primary key) and I have to delete rows that don't contain any (as it becomes useless to have a contact file without contact information).
I added a column named "Count" and inserted the following formula:
=COUNTIF(N:N; N2)
Here, "N:N" is the column containing all email addresses, and "N2" is the first data cell.
I then sorted the table in descending order on the new "Count" column to put the most frequent addresses first.
Then I used Excel's "Remove Duplicates" tool and selected the email address column.
As a result, 10 000 rows were removed (out of 24 000). The table now certainly contains unique contacts based on the email address. Sadly, I will never know for sure whether the most complete row was kept for each contact (unless I spend days comparing both databases, row by row).
Problem solved I guess! Although I would be interested in a VBA-script to do the same (to learn on the algorithm aspect) if anyone knows anything about it :-)
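For the algorithm itself, here is a minimal sketch in Python rather than VBA (a deliberate swap, since the logic is the same in either language). It assumes the table has been exported as a list of dictionaries, one per row, and that the email column is named "Email". The function name keep_fullest and the case-insensitive comparison of addresses are assumptions made for the example. For every email address it keeps the row with the most non-blank cells, which is the guarantee the COUNTIF approach above cannot give.

```python
def keep_fullest(rows, key="Email"):
    """For each distinct key value, keep the row with the most non-blank cells."""
    best = {}
    for row in rows:
        # Assumption: addresses are compared case-insensitively, ignoring padding.
        k = (row.get(key) or "").strip().lower()
        if not k:
            continue  # rows without an email are dropped, as described in the post
        filled = sum(1 for v in row.values() if v is not None and str(v).strip())
        # Strict ">" means the first row wins when two rows are equally complete.
        if k not in best or filled > best[k][0]:
            best[k] = (filled, row)
    return [row for _, row in best.values()]


contacts = [
    {"Company": "Baij&Co", "Name": "Steve", "Address": "",
     "Phone": "4156854789", "Email": "steve#baij&co.com"},
    {"Company": "Baij&Co", "Name": "Steve", "Address": "",
     "Phone": "", "Email": "steve#baij&co.com"},
    {"Company": "Baij&Co", "Name": "Steve", "Address": "458 Preston avenue",
     "Phone": "4156854789", "Email": "steve#baij&co.com"},
]
print(keep_fullest(contacts))
```

Note that this keeps one whole row per contact; it does not merge information scattered across several partial rows.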
Thanks again!
I'm trying to sort a text file in this manner:
6 aaa
4 bbb
2 ccc
2 ddd
That is, each line sorted first in numeric descending order (the number indicates the number of occurrences of the word on the right), and if multiple words are repeated the same number of times, I'd like to have those words sorted alphabetically.
What I have:
6 aaa
4 bbb
2 ddd
2 ccc
When I try sort -nr | sort -V it kind of does what I want but in ascending order.
2 ccc
2 ddd
4 bbb
6 aaa
What's a clean way to accomplish this?
I think you just need to specify that the numeric reverse sort only applies to the first field:
$ sort -k1,1nr file
6 aaa
4 bbb
2 ccc
2 ddd
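-k1,1[OPTS] defines a sort key that starts and ends at field 1, so OPTS (here n for numeric and r for reverse) apply only to that first field. Lines that tie on that key are then ordered by sort's last-resort comparison of the whole line, which follows the global ordering options. Since no other options were passed here, that is the default lexicographic sort.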
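The same two-level ordering can be written as an explicit sort key, which may make the behaviour easier to see. This is an illustrative Python sketch, not a replacement for the sort command; the function name sort_counts is made up for the example, and Python compares strings by code point rather than by locale collation, so results can differ from sort for non-ASCII text.

```python
def sort_counts(lines):
    """Sort 'COUNT WORD' lines by count descending, then by whole line ascending."""
    def key(line):
        count = int(line.split(maxsplit=1)[0])
        # Negating the count reverses only the numeric part of the key;
        # ties fall back to plain ascending comparison of the line.
        return (-count, line)
    return sorted(lines, key=key)


print(sort_counts(["6 aaa", "4 bbb", "2 ddd", "2 ccc"]))
```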
Maybe using tac? (Not a shell expert here, just remembering uni days...)
sort -nr | sort -V | tac
First let me explain what I want to achieve.
I currently have an Excel like this:
Names | Standards
James | Standard 1
James | Standard 2
James | Standard 3
Francis | Standard 1
Francis | Standard 2
Francis | Standard 3
Leon | Standard 2
Leon | Standard 3
Peter | Standard 2
Michael | Standard 3
And I want to create something like this:
Standard | Name 1 | Name 2 | Name 3 | Name 4
Standard 1 | James | Francis | |
Standard 2 | James | Francis | Leon | Peter
Standard 3 | James | Francis | Leon | Michael
My real Excel has more than 300 standards, so I would like to automate this using Excel Formula. I know this is possible, but I haven't used Excel in a while, so I could use a push in the right direction.
Couple of things I need (I think):
Need to count how many times people in the names column mention a standard. So I want to know that I need 2 names for standard 1 and 4 for standard 3. I think I can do this by using the COUNTIF method.
We need to search for the location of the standards. I think I can do this by using the MATCH function. This gives us the location of the first match in my original Excel. By sorting my original Excel a-z by standard and combining it with the COUNTIF result I know where all the matches are (first match + COUNTIF - 1 = location of the last match, and everything in between is also that standard).
For the first name that mentioned a standard, I will reference the cell left of the first match (because the names are in the cell to the left of the standard I found). For the second name I will reference the cell left of the cell below the first match. I keep doing this till I find as many names as COUNTIF mentioned. So I need an IF statement that makes sure that if 2 people mention standard 1, its row gets only 2 names followed by 2 cells containing "".
How will I reference the cells? By another IF statement that uses the technique from "Excel Reference To Current Cell". Correct me if I am wrong, but can't I then just say THIS.CELL = the cell location I found (probably should use INDIRECT here?).
This is just me brainstorming, but I would love to know if people have any other ideas for my problem or have some feedback for my current plan.
An important thing to mention is that I want to do this using Excel formulas. I do realise that this isn't always the best, but VBA is not an option at the moment. I am also not worried about performance issues, because I think I'll just copy all the values after I have found all the names using formulas.
Thanks in advance!
Depending on how you want to have the layout, I think you should use a pivot table. Drag the 'Standards' and 'Names' fields to the 'rows' data box and then right-click on a standard, click 'Field Settings' - 'Layout and Print' - 'Show item labels in tabular form'.
If you definitely need the data in the format in your question, I would edit the pivot table by dragging the 'names' field to the 'columns' data box. Then drag the 'standards' field from the field list above a second time and duplicate it in the 'values' box.
In the space underneath the pivot table, use an IF formula to only copy the name if there is a 1. This kind of approach will obviously be quite fragile, so if you can make do with the first approach, I think you will run into fewer problems in the future.
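To check what the finished layout should look like for the sample data, the grouping can be sketched outside Excel. This is a hedged Python illustration only; the function name names_by_standard is made up for the example, and it sorts standards as plain text, so "Standard 10" would come before "Standard 2" in a real list.

```python
def names_by_standard(pairs):
    """Turn (name, standard) pairs into rows of [standard, name 1, name 2, ...]."""
    grouped = {}
    for name, standard in pairs:
        grouped.setdefault(standard, []).append(name)
    # Pad every row with "" so all rows have the same number of name columns.
    width = max((len(names) for names in grouped.values()), default=0)
    return [
        [standard] + names + [""] * (width - len(names))
        for standard, names in sorted(grouped.items())
    ]


pairs = [
    ("James", "Standard 1"), ("James", "Standard 2"), ("James", "Standard 3"),
    ("Francis", "Standard 1"), ("Francis", "Standard 2"), ("Francis", "Standard 3"),
    ("Leon", "Standard 2"), ("Leon", "Standard 3"),
    ("Peter", "Standard 2"), ("Michael", "Standard 3"),
]
for row in names_by_standard(pairs):
    print(" | ".join(row))
```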
I have a spreadsheet in Excel that contains a "Member ID" in the first column, with six variables, related to this Member ID, in the next six columns.
I need to somehow convert these columns into rows, but still have the Member ID column at the beginning of each row.
Here's the data as it stands (there are 5000 rows, hence hoping to find an automated solution):
MEMBER 1 | AAA | BBB | CCC | DDD | EEE | FFF
MEMBER 2 | BBB | ZZZ | FFF | AAA | RRR | SSS
MEMBER 3 | YYY | FFF | OOO | MMM | PPP | AAA
And here's the format that I need:
MEMBER 1 AAA
MEMBER 1 BBB
MEMBER 1 CCC
MEMBER 1 DDD
MEMBER 1 EEE
MEMBER 1 FFF
MEMBER 2 BBB
MEMBER 2 ZZZ
MEMBER 2 FFF
MEMBER 2 AAA
MEMBER 2 RRR
MEMBER 2 SSS
MEMBER 3 YYY
MEMBER 3 FFF
MEMBER 3 OOO
MEMBER 3 MMM
MEMBER 3 PPP
MEMBER 3 AAA
I attempted to follow the steps in this question: "Split multiple excel columns into rows"; however, that seems to only work for numeric values and not text.
Any help that anyone can give me would be hugely appreciated, I'm stumped as to how to do this. Thanks so much in advance!
This question is stale but I just needed it, so I'll answer.
I found the solution by Chris Chua on Quora, please upvote him there if this is helpful: https://www.quora.com/How-do-I-convert-multiple-column-data-into-a-column-with-multiple-rows-in-excel
I'll also copy-paste it here in case of a broken link:
Apologies. I can’t really see the writings clearly, but I’m assuming that you want to unpivot the column headers as part of the data set.
Here I am suggesting a solution using Microsoft Excel’s Power Query feature. If you have Excel 2016, the feature is found under Data | Get & Transform ribbon command. Otherwise if you have Excel 2013 or the Professional Plus version of 2010, you can download the Power Query add-in from here.
Step 1: Select all the data, press Ctrl+T to create a table, and under Create Table, make sure that My table has headers is selected.
Step 2: In Excel 2016, go to Data | Get & Transform | From Table. For Power Query add-in for Excel 2013 and 2010, look for something in the Power Query tab ribbon that says From Table.
Step 3: The Power Query interface will open up. Click on the header of No. Column, hold Ctrl and click on the header of the Name Column. With both columns now selected, right click on the headers and select Unpivot Other Columns.
Step 4: Notice that the data is now unpivoted (which is what you wanted). Double click on the Attribute and Value headers to rename them.
Step 5: Click Home | Close & Load.
Step 6: Excel will load the new cleaned up table onto a new sheet. Ta-da you’re done!
To update any new data in the future:
Suppose you now have new data in your original table.
Go to the new cleaned up table, right click and select Refresh.
The new data will be updated immediately.
This should be doable: try saving the Excel file as CSV, then write a script that reformats that CSV to your liking, then import the resulting CSV back into Excel.
There are many scripting languages, such as bash, PowerShell and Python, and plenty of tutorials on how to do different things with them. Python is beginner friendly.
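As a rough idea of what such a script could look like, here is a minimal Python sketch using only the standard csv module. It assumes the exported file has no header row, with the Member ID in the first column and the variables in the remaining columns; the function name unpivot and the shortened sample data are made up for the example. For a real file you would open the input and output with open(..., newline="") instead of using the in-memory strings shown here.

```python
import csv
import io


def unpivot(rows):
    """Yield (member_id, value) pairs, one per non-empty value cell."""
    for row in rows:
        if not row:
            continue  # skip blank lines in the export
        member_id, *values = row
        for value in values:
            if value != "":
                yield (member_id, value)


# In-memory stand-ins for the exported CSV and the output file.
sample = "MEMBER 1,AAA,BBB,CCC\nMEMBER 2,BBB,ZZZ,FFF\n"
reader = csv.reader(io.StringIO(sample))
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(unpivot(reader))
print(out.getvalue())
```

Because the values are treated as plain strings, text and numeric cells are handled the same way.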
Thanks Spidey for your help. I managed to get it sorted using Sublime Text rather than Excel.
For those interested, I just copied the six columns into Sublime Text, did a find & replace for the gap/space/tab character between each column and turned this gap into a return/new line. That gave me a giant list of variables.
I then copied in the first column (Member ID), selected all the rows and did "Split selection into lines", which gave me a cursor at the end of each line. I then just selected each line and did a copy -> return -> paste, six times. This gave me a giant list of each Member ID showing up six times.
Then it was just a matter of (in Excel) pasting that long Member ID list into Column A, with the long list of variables (from the first step) into Column B. Done!
Firstly, thank you for checking my question. I'm new to doing anything advanced in Excel so I'm a bit lost.
I am trying to match names from two different sources that have the same data structure. There are 3 columns: LastName, FirstName, MiddleName. I added a fourth column to denote which organization the record came from, put both sources into one table and made a pivot out of it, which works well enough, but I'm having a hard time generating any useful data from it.
There are two main objectives once I have them matched.
I need a percentage of matching.
I need to be able to filter out the ones that matched so I can investigate the ones that didn't.
Here is a small example.
+-------------+-----------+------------+------+
| LastName | FirstName | MiddleName | Org. |
+-------------+-----------+------------+------+
| Jones | Mike | Anthony | Org1 |
| Black | Marry | | Org1 |
| Zeek | Winston | E | Org1 |
| Jones | Mike | A | Org2 |
| Black-Smith | Marry | | Org2 |
| Zeek | Winston | E | Org2 |
+-------------+-----------+------------+------+
As you can see out of the list only Winston E Zeek would really match because all three names are exactly the same. Mike Jones won't match because the listed middle names are wrong and Black and Black-Smith won't match because they are technically different last names. These issues with the data are fine at this stage because those are exactly what I'm trying to identify with a larger data set.
Maybe Excel isn't the best for this issue without using VBA? I'm not familiar with VBA which is why I haven't tried it yet and I unfortunately have limited time.
How can I solve this matching problem?
Any assistance and guidance will be appreciated.
Here's a quick idea:
1. Sort the data by last name, first name, middle name. That should put same/similar names next to each other.
2. Add a column that, for each row, has a worksheet function like =IF(A3=A2,1,0). This will indicate if this row matches the one above.
3. Sum the new column... That will tell you the number of matches. Divide by the total number of rows, to get your percentage.
You can modify the function in step 2, to indicate as tight of a match that you want.
Advantage: No VBA needed. Disadvantage: It requires some manual work and interpretation.
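For comparison, the exact-match version of this check can be sketched in a few lines of Python. Note that this swaps the sort-and-compare-neighbours technique for a set intersection, and that the percentage definition (matched names divided by all distinct names across both sources) is an assumption; divide by a different total if that suits your report better. The function name match_report is made up for the example.

```python
def match_report(records):
    """Split full names into those present in both organisations and the rest."""
    by_org = {}
    for last, first, middle, org in records:
        by_org.setdefault(org, set()).add((last, first, middle))
    org_a, org_b = sorted(by_org)  # assumes exactly two organisations
    matched = by_org[org_a] & by_org[org_b]
    everyone = by_org[org_a] | by_org[org_b]
    unmatched = everyone - matched
    # Assumed definition: share of distinct names that appear in both sources.
    percent = 100.0 * len(matched) / len(everyone) if everyone else 0.0
    return matched, unmatched, percent


records = [
    ("Jones", "Mike", "Anthony", "Org1"),
    ("Black", "Marry", "", "Org1"),
    ("Zeek", "Winston", "E", "Org1"),
    ("Jones", "Mike", "A", "Org2"),
    ("Black-Smith", "Marry", "", "Org2"),
    ("Zeek", "Winston", "E", "Org2"),
]
matched, unmatched, percent = match_report(records)
print(sorted(matched), percent)
print(sorted(unmatched))
```

The unmatched set is the list to investigate by hand, which covers the second objective in the question.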