Excel 2007. How can I convert names into unique IDs? - excel

I am currently building a dataset in Excel, which I will use to run a panel regression (fixed-effects model) in Stata.
However, I cannot work out how to convert people's names into unique IDs. My dataset has names in column B, and I want to generate a new column A containing the number that corresponds to each name in column B. Doing this manually is not an option, since there are more than 1,000 rows. Two particular problems are that the names are in Korean, so most of them have the same number of characters, and that the same person appears in several rows. Is there any method I could use?

If I understand your question correctly, I would do the following.
Step 1: Use the Advanced Filter to extract the unique names
You can place these results on the same sheet or on a different sheet (as per Scott Craner's comment). Both options are listed below.
Filter Names on the same sheet
Select all the data in column B, then click Data > Sort & Filter > Advanced.
Select the option to Copy to another location, set "Copy to" to a blank cell in column B at the very bottom of your worksheet, several rows below your data (when started from the data sheet, Advanced Filter can only copy results to that same sheet), and then select Unique records only. Then click OK.
Filter Names on a different sheet
Add a new sheet and then, from that sheet, click Data > Sort & Filter > Advanced.
Select the option to Copy to another location.
Set the "List range" to the column on your dataset sheet containing the names, and set the "Copy to" range to B1 on your new sheet.
Select Unique records only and then click OK.
This will paste a new range that has all the unique names in your list.
Step 2: Assign unique IDs to the names in the unique list
This can easily be done by entering '1' in the column C cell next to the first name in the unique list and '2' next to the second name, selecting those two cells, and then dragging the fill handle at the bottom right corner of the selection down to the bottom of the unique names list.
Now you have a range (i.e. unique names and IDs) that VLOOKUP can use to populate an ID column in your dataset.
Step 3: Use VLOOKUP to populate IDs for the rows in your dataset
For example, if your unique names and IDs are in the range B1200:C1500, you can enter the following formula in column A (the column that will hold your unique IDs) in the first data row of your dataset:
=VLOOKUP(B2, $B$1200:$C$1500,2,0)
After you drag this formula down your entire dataset, you'll have the corresponding unique ID for each name.
Step 4: Cleanup
Copy your column A (which should be all VLOOKUP formulas) and paste it back as Values only, so the formulas are replaced by their results.
Delete the range of unique names and IDs at the bottom of the sheet (or the new sheet you created for it).
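For readers who would rather script the same idea outside Excel, here is a minimal Python sketch of Steps 1 to 3. It is not part of the answer above; the function name `assign_ids` and the sample Korean names are made up for illustration. It numbers each distinct name in order of first appearance, so repeated people get the same ID.

```python
def assign_ids(names):
    """Map each distinct name to an integer ID, in order of first appearance."""
    ids = {}      # plays the role of the unique-names / ID lookup range
    result = []   # plays the role of the new column A
    for name in names:
        if name not in ids:
            ids[name] = len(ids) + 1   # next unused ID
        result.append(ids[name])
    return result, ids


# Hypothetical sample data: the same person may appear in several rows.
names = ["김민준", "이서연", "김민준", "박지호", "이서연"]
column_a, lookup = assign_ids(names)
print(column_a)  # [1, 2, 1, 3, 2]
```

Names are compared as exact strings here, so the fact that the Korean names have the same length does not matter; stray leading or trailing spaces, however, would produce separate IDs.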

Related

How to delete duplicates within a large database on a column by column basis

I have a large set of data (over 3000 columns) for work, with text in every cell. The columns are unrelated to each other. Within each column there are potentially duplicates, and I need to keep only the first instance of each value. However, I cannot find a way to highlight duplicates on a column-by-column basis: when the whole dataset is selected, Excel treats the rows as related data and looks for duplicates row by row. I have tried using macros (I am a total novice), but they don't work.
Image shows the columns of data with some duplicates in the columns.
If you use a recent version of Excel (Excel 365 or Excel 2021), you could use the UNIQUE function, which returns an array of the unique elements in a range.
Just duplicate the sheet and in the copy delete everything below the lines with "Processor 1" and "Processor 2". Then in the first column use UNIQUE referring to the first respective column of the original sheet.
Just fill the formula right (Ctrl + R) and in the new sheet each column will have only the unique elements.
You can then paste the whole resulting table as values and delete the original one.
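As an illustration of what applying UNIQUE to every column produces, the following Python sketch deduplicates each column independently and keeps the first instance of each value. It assumes the data has already been read into a list of columns; `unique_per_column` and the sample values are hypothetical.

```python
def unique_per_column(columns):
    """For each column (a list of cell values), keep only first occurrences."""
    # dict.fromkeys keeps insertion order, so the first instance of each value survives.
    return [list(dict.fromkeys(column)) for column in columns]


# Hypothetical sample: two unrelated columns with their own duplicates.
columns = [
    ["a", "b", "a", "c"],
    ["x", "x", "y"],
]
deduped = unique_per_column(columns)
print(deduped)  # [['a', 'b', 'c'], ['x', 'y']]
```

Because every column is processed on its own, the resulting columns can have different lengths, just as the spilled UNIQUE results do.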

Copy multiple cells from two separate sheets based on input

I have three sheets:
I manually copy values from column A in Sheet 1 into column A in Sheet 3 and from Column A in Sheet 2 into Column B in Sheet 3 (as shown in the attached images).
The values are selected based on Column A and Column E. E.g. in Sheet 3, Q001-1 S1 from Sheet 1 is copied with G001-3 S3 and G002-2 S2.
I would like to simplify this process as much as possible because it is time-consuming, but I cannot find a way to do this.
Is this even possible, using formulas or VBA?
If I understood correctly, you need to perform a join in Excel using the cell ID as the criterion.
I'd use Power Query instead of VBA because it is the tool made for exactly this kind of task:
Select the original data and format it as a table (Home tab > Format as Table). Do this for both data sources.
Load both tables into Power Query: click inside the table, then go to Data > Get & Transform Data and select From Table. Repeat this step for both tables.
In the query for table 1, go to the Home tab and select Merge, then choose the cell ID column as the matching criterion (keep the default left join).
Click the expand icon on the new column to expand the merged data, and remove any unnecessary columns.
Close and load (top left) your query into a new sheet (Sheet 3). Load only the output table, not the original tables, since they aren't needed there.
To update the results later, simply add new rows to the original tables and then click Data > Refresh All. The results will appear in Sheet 3.
I hope it helps!
Emanuele
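To make the join in the answer above concrete, here is a rough Python sketch of a left join on a shared key. It only illustrates the logic and is not what Power Query runs internally; the column names `cell_id`, `question` and `guide` and the sample rows are assumptions, since the original screenshots are not available.

```python
def left_join(left_rows, right_rows, key):
    """Left join two lists of dicts on `key`; unmatched left rows get None."""
    # Index the right table by key so that each lookup is fast.
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    right_columns = [c for c in (right_rows[0] if right_rows else {}) if c != key]

    joined = []
    for row in left_rows:
        matches = index.get(row[key])
        if matches:
            for match in matches:   # one output row per matching right row
                merged = dict(row)
                merged.update({c: match[c] for c in right_columns})
                joined.append(merged)
        else:                       # keep the left row, fill right columns with None
            merged = dict(row)
            merged.update({c: None for c in right_columns})
            joined.append(merged)
    return joined


# Hypothetical sample tables standing in for Sheet 1 and Sheet 2.
sheet1 = [
    {"cell_id": "S1", "question": "Q001-1"},
    {"cell_id": "S2", "question": "Q002-1"},
    {"cell_id": "S4", "question": "Q003-1"},
]
sheet2 = [
    {"cell_id": "S1", "guide": "G001-1"},
    {"cell_id": "S2", "guide": "G002-2"},
    {"cell_id": "S2", "guide": "G003-2"},
]
sheet3 = left_join(sheet1, sheet2, "cell_id")
for row in sheet3:
    print(row)
```

A left row with several matches is repeated once per match, and a left row with no match is kept with empty right-hand values, which is the behaviour to expect from a left join.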
If your plan is arbitrary or based on criteria that can't be determined mathematically, it can't be solved with code alone. If you'd like to plan your arrangement on a 4th sheet, however, you can write conditional formulas in each cell on sheet 3 that evaluate your plan on sheet 4 and compare it to the cells containing the strings S1, S2, and S3. In cell A2 of sheet 3 you could paste the following:
=IF((EXACT(Sheet1!E2,Sheet4!A2)),Sheet1!A2,IF((EXACT(Sheet1!E3,Sheet4!A2)),Sheet1!A3,Sheet1!A4))
It evaluates the corresponding cell on sheet 4 for an S1 or S2; if it sees neither, it assumes S3. It then chooses the corresponding cell on sheet 1. The formula in column B evaluates the corresponding cells on sheet 4 using the same trick:
=IF((EXACT(Sheet2!E2,Sheet4!B2)),Sheet2!A2,IF((EXACT(Sheet2!E3,Sheet4!B2)),Sheet2!A3,Sheet2!A4))
It then chooses the corresponding cell on sheet 2. You can drag these cells down as far as you need.
I wrote a working model and posted it to GoogleDocs

Excel formulas and conditional lookups based on multiple criteria and sheets

I have 2 sheets:
sheet_a is a styled print-ready layout for a single data record
sheet_b is a bulk data table which is continually growing. Each row corresponds to a single complete record
Currently I am using a VLOOKUP to collect the data from sheet_b and put it in the respective cells in sheet_a. I have a drop down list on sheet_a which allows me to select a single record at a time to view.
Now I want to introduce a second drop down list to sheet_a where I want to select 1 of 4 specific conditions relating to the value of a cell in a specific column of each record on sheet_b.
I only want the entries that meet this condition in sheet_b to be made available in the range of records I can view on sheet_a.
Can anyone help?
As I understand it, you are looking for a way to filter the list that is used in the drop-down on Sheet_A.
Add a column to the source data and calculate, or mark manually, which of the four conditions each record belongs to.
On Sheet_a, add a drop-down field where the user can select one of the four conditions. Name this cell "criterion".
In the source data table, add a helper column with a formula that returns the row number if the current row matches the selected criterion. Something like this, copied down:
=IF(B2=criterion,ROW(),"")
Create another helper column that contains only the items that match the criterion, using a formula like this, copied down:
=INDEX(Data,SMALL($E$2:$E$18,ROW(A1)))
Define a dynamic range name called "FilteredList" that contains only the values of the result list, not the errors. The formula for "FilteredList" is:
=Sheet1!$F$2:INDEX(Sheet1!$F:$F,MATCH("zzzzz",Sheet1!$F:$F,1))
Change the drop-down that is currently used to select a record so that it sources its values from =FilteredList.
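The two helper columns above can be read as a small algorithm: mark the row number of every matching record, then pull the matches out in order of row number. A minimal Python sketch of that logic follows; the field names `record_id` and `condition` and the sample records are hypothetical.

```python
def filtered_list(records, criterion):
    """Return the record IDs whose condition equals the selected criterion."""
    # Helper column 1: the row position if the row matches, else an empty string
    # (the counterpart of =IF(B2=criterion,ROW(),"")).
    helper = [i if rec["condition"] == criterion else ""
              for i, rec in enumerate(records)]
    # Helper column 2: take the matching row numbers from smallest to largest
    # and look up each record (the counterpart of INDEX(..., SMALL(...))).
    row_numbers = sorted(n for n in helper if n != "")
    return [records[n]["record_id"] for n in row_numbers]


# Hypothetical bulk data table.
records = [
    {"record_id": "R1", "condition": "open"},
    {"record_id": "R2", "condition": "closed"},
    {"record_id": "R3", "condition": "open"},
    {"record_id": "R4", "condition": "pending"},
]
open_records = filtered_list(records, "open")
print(open_records)  # ['R1', 'R3']
```

The returned list corresponds to what the "FilteredList" range would feed into the record drop-down.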

Copy duplicate values from one column to a separate sheet

I have a list of names in one column in sheetA. I want to copy the duplicate names from sheet A to sheet B and also provide a count of those duplicate names beside each copied name in sheet B.
I’m guessing a manual approach is acceptable, so I would suggest a helper column in sheetA with =COUNTIF(A:A,A1) copied down (assuming your list of names starts in A1). For each row this counts the number of rows that contain that row’s name.
Copy columns A:B into your sheetB with Paste Special > Values, then use Data > Data Tools > Remove Duplicates, and filter for and delete all rows with 1 in the count column.
The easiest method I can think of would be a pivot table: rows = names, values = count of names, with the table filtered for counts greater than 1. Then, when you want to run a fresh report, paste in the original information and refresh the table.
Probably way too late to be helpful, but maybe for the next person...
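Both answers above boil down to counting how often each name occurs and keeping only the names that occur more than once. A minimal Python sketch of that logic, with a made-up function name and sample names:

```python
from collections import Counter


def duplicate_counts(names):
    """Return (name, count) pairs for names that appear more than once."""
    counts = Counter(names)   # the counterpart of COUNTIF for every name
    # Keep first-appearance order and drop names that occur only once.
    return [(name, n) for name, n in counts.items() if n > 1]


# Hypothetical sample list standing in for the names column of sheetA.
sheet_a = ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]
sheet_b = duplicate_counts(sheet_a)
print(sheet_b)  # [('Ann', 3), ('Bob', 2)]
```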

Trying to add up values but have multiple entries

I am trying to look up a value in one column and pull the number from another column.
Of course, I could use a simple VLOOKUP or MATCH.
However, the first column of data has multiple entries that are the same. If I use VLOOKUP, it is just going to pull the first matching number from the second column.
I need to pull every matching number from the second column and add them together, even though there are multiple entries.
If there is a way to consolidate the multiple entries in the first column while also summing the numbers in the second, that would be great.
I would recommend a Pivot Table. To create one, select a cell in your data range (which needs to have column names in the first row). Choose Insert / Pivot Table from the Ribbon and select the New Worksheet option for the location.
In the Pivot Table list on the new worksheet, drag the name of the first column to the Row Labels box and the name of the second column to the Values box. The name in the Values box should turn to Sum of <2nd column name>.
The Pivot Table will now show a sorted list of the column 1 values and the summed values of column 2. In the example, you'll see that
Does SUMIF do what you are looking for?
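Whether done with a Pivot Table or with SUMIF, the underlying operation is grouping rows by the first column and summing the second. A minimal Python sketch of that consolidation, using hypothetical sample data:

```python
def sum_by_key(rows):
    """Consolidate (key, number) rows into one total per key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value   # add to the running total
    return totals


# Hypothetical sample: the first column has repeated entries.
rows = [("apples", 3), ("pears", 2), ("apples", 5), ("pears", 1), ("plums", 4)]
totals = sum_by_key(rows)
print(totals)  # {'apples': 8, 'pears': 3, 'plums': 4}
```

Iterating over `sorted(totals.items())` gives the keys in alphabetical order, similar to the sorted row labels of a Pivot Table.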

Resources