Relabelling large amounts of data in Excel - excel

I have a CSV file in which I want to relabel 433,000+ rows of IDs that look like "e904ab64a642efcd25f4a43cb729701646d4bf7a4ed0bacbae9d85127978606a" with simpler ID codes. Each unique ID has 4-5 rows of data. I really don't want to "find and replace" each of them, because there are more than 2,000 unique IDs. Is there a function in Excel that can help me do that? Otherwise, can you recommend a program I could use?

If the IDs are always on consecutive lines, you can:
1) Store the ID before replacement.
2) Replace it with your simpler ID (and store that too).
3) Go to the next line.
4) Check whether the ID is the same as the one stored from the previous line.
5) If yes, use the same replacement ID as on the previous line.
6) If no, start again from 1) with a new replacement ID.
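
Outside Excel, the steps above could be sketched in Python roughly as follows. This is only an illustration under the same assumption (rows with the same ID are consecutive); the function name, the "ID" prefix and the four-digit numbering are made up for the example.

```python
def relabel_consecutive(ids, prefix="ID"):
    """Relabel IDs, assuming rows with the same ID are consecutive."""
    new_ids = []
    previous = None        # long ID seen on the previous row
    current_label = None   # short label assigned to that long ID
    counter = 0
    for long_id in ids:
        if long_id != previous:
            # A new run starts: create a fresh label and remember both values.
            counter += 1
            current_label = f"{prefix}{counter:04d}"
            previous = long_id
        new_ids.append(current_label)
    return new_ids


rows = ["e904ab", "e904ab", "77c1d0", "77c1d0", "77c1d0", "0b5f2e"]
print(relabel_consecutive(rows))
```

Note that if an ID shows up again later in the file, after a different ID, this approach gives it a second label, which is why the consecutive-rows assumption matters.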

If you are happy doing this manually (since your tags do not currently include VBA), then here is a simple approach:
1) Create a unique list of IDs, for example by creating a 1-column PivotTable.
2) Next to each unique ID, put your simplified ID (however you are creating that: is there an algorithm, or could it just be =ROW()?).
3) Insert a column in the original sheet, adjacent to the ID column.
4) Use a VLOOKUP to find the matching simplified ID (e.g. =VLOOKUP(A1,'New IDs'!$A:$B,2,FALSE)).
5) When it has finished calculating, copy the simplified IDs and Paste Special as Values.
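
If a program other than Excel is an option, the same unique-list-plus-lookup idea can be sketched in Python with a dictionary standing in for the 'New IDs' sheet. This is a rough sketch, not a drop-in tool: it assumes the ID is in the first column and that there is no header row, and the function name and label format are invented for the example.

```python
import csv
import io


def relabel_csv(src, dst, id_column=0, prefix="ID"):
    """Copy CSV rows from src to dst, replacing each long ID with a short one."""
    mapping = {}  # long ID -> short ID, plays the role of the lookup table
    writer = csv.writer(dst, lineterminator="\n")
    for row in csv.reader(src):
        if not row:
            continue  # skip blank lines
        long_id = row[id_column]
        if long_id not in mapping:
            # First time this ID is seen: hand out the next short label.
            mapping[long_id] = f"{prefix}{len(mapping) + 1:04d}"
        row[id_column] = mapping[long_id]
        writer.writerow(row)
    return mapping


# In-memory demo; with real files, pass open(..., newline="") handles instead.
sample = io.StringIO("e904ab,10\n77c1d0,20\ne904ab,30\n")
out = io.StringIO()
mapping = relabel_csv(sample, out)
print(out.getvalue())
```

Unlike the consecutive-rows approach, this does not care about row order, and the returned mapping can be saved so the original IDs can be recovered later.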

Related

Excel Concatenate a string in multiple columns, count and remove the duplicates

The stock process at my company is currently very manual, and it often doesn't get carried out because it is rather boring. It is all Excel-based at the moment; I am slowly moving it over to SQL, which will update the information automatically.
We have come up with a naming system/code for each item, made up from several fields in the Excel document. However, the same codes appear in different columns, and we want to remove these duplicates before we push into SQL (basically, we just want one line per item and a count of how many times it has been used).
It has to be dynamic (I can add an extra tab to the Excel document to do any magic required) and, if possible, not use any macros.
So the data starts like this:
#Counts and then the duplicates are removed to produce this list
I have tried a range of COUNTIFS/VLOOKUP formulas and I can get it roughly working, but it's not dynamic enough and I end up with multiple rows with a Qty of 0.
Hopefully this is enough information
Cheers all
It looks like a very similar question was answered here.
After plugging that formula into a different column, you can use the COUNTIF function in the next column.
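
For the eventual move away from Excel, the "concatenate, de-duplicate and count" step might look something like this in Python. It is only a sketch: the field values, the "-" separator and the function name are hypothetical, since the question does not show the actual data.

```python
from collections import Counter


def count_item_codes(rows, sep="-"):
    """Build one code per row by joining its fields, then count each code."""
    codes = (sep.join(str(field).strip() for field in row) for row in rows)
    return Counter(codes)


# Hypothetical stock lines: each tuple holds the fields that make up the code.
stock = [
    ("CAB", "2M", "BLK"),
    ("CAB", "2M", "BLK"),
    ("PSU", "12V", "WHT"),
]
counts = count_item_codes(stock)
for code, qty in counts.items():
    print(code, qty)
```

Because only codes that actually occur end up in the result, there are no rows with a count of 0.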

MS Excel: is there a simpler way to use an IF statement to check for matching data and output a date?

I have a list of lot IDs and dates in one tab ("Lot IDs"), raw data in two others, and data for presentation in the last tab ("Selected Data"). In the Selected Data tab there is an IF statement that checks whether the lot ID on a given line matches one of the lot IDs in the Lot IDs tab and, if it does, outputs the date associated with the lot.
A snippet of my function, from the Selected Data tab:
=IF(B2=LEFT('Lot IDs'!$C$2,6),'Lot IDs'!$D$2,IF(B2=LEFT('Lot IDs'!$C$3,6),'Lot IDs'!$D$3,IF(B2=LEFT('Lot IDs'!$C$4,6),'Lot IDs'!$D$4,"false/paste pattern here")))
where
B column holds lot ID numbers
'Lot IDs'!$C column holds lot ID numbers
'Lot IDs'!$D column holds dates
This function is getting very long, over 30 repetitions.
Is there a way to generalize this function so I don't have to keep repeating the same pattern?
Use INDEX/MATCH with wildcards:
=IFERROR(INDEX('Lot IDs'!$D:$D,MATCH(B2&"*",'Lot IDs'!$C:$C,0)),"false/paste pattern here")
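
To make the logic of the wildcard match explicit, here is a small Python sketch of the same idea: return the date of the first lot whose full ID begins with the short ID, or a fallback if none does. The lot IDs and dates are invented for illustration, and unlike MATCH this comparison is case-sensitive.

```python
def lookup_date(lot_id, lots, default="false"):
    """Return the date of the first lot whose full ID starts with lot_id."""
    for full_id, lot_date in lots:
        if full_id.startswith(lot_id):
            return lot_date
    return default  # plays the role of the IFERROR fallback


# Hypothetical contents of columns C and D of the 'Lot IDs' tab.
lots = [("130301-A", "2013-03-01"), ("130302-B", "2013-03-02")]
print(lookup_date("130302", lots))
print(lookup_date("999999", lots))
```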

excel vba Delete entire row if cell contains the GREP search

I have a single column of text in Excel that is to be translated into foreign languages. The text is automatically generated from an InDesign file. I would like to clean it up for the translator by removing rows that simply contain a number ("20", "34.5", etc.) or a measurement ("5mm", "3.5 µm", etc.). I've found many posts (see link below) on how to remove a row with a specific string, but none that use search patterns such as those I typically use with GREP searches: "\d+" and "\d.\d µm"
How would I do this? I am on a Mac (macOS), if that helps.
Note that I would need to delete the row if the cell only contains a number or a measurement, not if the number is contained within a phrase, sentence, or paragraph, etc.
https://stackoverflow.com/a/30569969
It may not be what you are looking for, but how about just sorting the column and removing the rows that start with numbers? It is a manual approach, but from what I understand this translation process only happens from time to time. Am I right?
I see two possible issues in your question:
1) How to work with regular expressions in Excel?
2) How to delete rows in a loop?
Let me start with the second question: when you write a for-loop that removes items from a list, you MUST start at the end and work back to the beginning (it's a beginner's trick, but a lot of people trip over it). Deleting a row shifts every later row up by one, so a forward loop skips the row that follows each deletion.
About the first question: this is a very useful post about this subject, it's too large to even give a summary here.
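
Both points can be illustrated outside Excel with a short Python sketch. The pattern below is an assumption about what counts as "only a number or a measurement" (a whole or decimal number, optionally followed by one of a few units); the unit list is hypothetical and would need extending for real data. The loop runs from the last row to the first, as described above.

```python
import re

# Assumed pattern: a number such as 20 or 34.5, optionally followed by a unit.
# The unit list (mm, cm, µm, m, inch mark) is illustrative only.
ONLY_NUMBER_OR_MEASUREMENT = re.compile(r'\d+(?:\.\d+)?\s*(?:mm|cm|µm|m|")?')


def remove_numeric_rows(cells):
    """Return the cells with number-only and measurement-only entries removed."""
    rows = list(cells)
    # Walk backwards so deleting an entry never shifts the ones still to check.
    for i in range(len(rows) - 1, -1, -1):
        # fullmatch requires the WHOLE cell to match, so numbers inside
        # a phrase or sentence are left alone.
        if ONLY_NUMBER_OR_MEASUREMENT.fullmatch(rows[i].strip()):
            del rows[i]
    return rows


cells = ["20", "34.5", "5mm", "3.5 µm", "Use 5mm screws", "Page 20 of 34", "Introduction"]
print(remove_numeric_rows(cells))
```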

Nested list in excel

I'm not even sure how to ask this.
I have a database, where each row is a person. Columns are contact info, phone, etc. One column is 'date visited'. There can be multiple dates visited for each person. I don't want to use a comma or stack them all in one field.
Is there a way to have a 'nested' list (not a drop-down menu - just a list of visited dates for each person), such that one person still only consumes one single row?
Yes.
To accomplish this, give each person an ID that is unique and won't change.
Then, on a separate sheet, store the ID and date.
main sheet (ID, Name, Contact Info, phone, etc.)
second sheet (ID, date visited)
In database theory this is called a 'one-to-many' relationship, and what I'm describing is called 'normalizing your dataset'.
In Excel you can now use formulas to manipulate the data however you need to or can imagine after you split this apart.
As you mentioned in comment, counting all visited dates for a user.
On the main sheet to the right you could use:
=countif(Sheet2!A:A,Sheet1!A1)
This would count all of the IDs in the second sheet that match the current row's ID on your main sheet.
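
The same two-sheet layout can be sketched in Python to show how the one-to-many relationship works. The names, IDs and dates are made up; the dictionary of lists plays the role of the second sheet grouped by ID, and the length of each list corresponds to the COUNTIF result.

```python
from collections import defaultdict

# "Main sheet": one entry per person, keyed by a stable unique ID.
people = {1: "Ann", 2: "Bob", 3: "Cho"}

# "Second sheet": one (ID, date visited) row per visit.
visits = [(1, "2015-01-05"), (2, "2015-01-07"), (1, "2015-02-11")]

# Group the visit dates by person ID.
visits_by_person = defaultdict(list)
for person_id, visited_on in visits:
    visits_by_person[person_id].append(visited_on)

for person_id, name in people.items():
    dates = visits_by_person.get(person_id, [])
    print(name, len(dates), dates)
```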
Notes about using one cell:
Storing all the dates in one cell will eventually max it out, and will make it hard to view/search as it grows, so I highly advise against this approach.
If, however, you insist on keeping the dates in there, you could count the visits by counting the total number of commas + 1, like this: =(LEN(G1) - LEN(SUBSTITUTE(G1,",","")))+1 This formula takes the length of the cell and the length of the cell with the commas removed, and subtracts one from the other to get the number of commas; adding 1 gives the number of dates.
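
The arithmetic behind that formula, written out as a tiny Python sketch (the function name and sample cell content are hypothetical):

```python
def count_visits(cell):
    """Count comma-separated entries: number of commas plus one."""
    commas = len(cell) - len(cell.replace(",", ""))
    return commas + 1


print(count_visits("2015-01-05,2015-02-11,2015-03-02"))  # 3
```

Like the Excel formula, this returns 1 for an empty cell, so empty cells would need to be handled separately.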
Notes about using multiple columns:
This approach has the same idea as the one I suggested, where we are associating a number of dates with the row's identity of a person. However, there are a few key limitations and drawbacks.
The main difference is that when we move the dates onto a second sheet so that they extend vertically, we can manipulate them more easily, and a list of 20 dates for one person becomes much easier to read. Storing the dates vertically in the second sheet also lets us use Excel's built-in filter. Just storing large amounts of data is useless by itself, while storing it in a way that you can view and manipulate easily makes everything much more powerful.

Excel Lookup with multiple queries

I have a question that I may not be thinking about correctly. I have a long Excel file that I pull from somewhere else, with the following columns:
Project_Name1, Employee_Name1, Date_Worked1, Hours_Worked1
In another sheet I have these columns
Project_Name2, Employee_Name2, Begin_Date2, End_Date2, Hours_Worked2
This second sheet is filled with data, and works just fine.
However, it turns out that I have some employee names that I do not know that are also working on the same project. I need to figure out the names of the employees and then sum the number of hours they worked for a given period.
So I need a lookup with three criteria:
Project_Name1 = Project_Name2
Employee_Name1 <> {Array of Employee_Name2}
Begin_Date2 <= Date_Worked1 <= End_Date2
Returning Employee name.
Once I have the employee name, I can do a =SUMIFS() and get the total hours they worked, no problem.
I have tried a number of combinations of INDEX/MATCH functions, using Ctrl-Shift-Enter, and have not been able to figure it out. Any help would be greatly appreciated.
What you're talking about doing is complicated and a little beyond what Excel was designed to do by default. However, there are a few workarounds that you can use to try to get the information you're looking for:
1) Do multiple-criteria VLOOKUPs and SUMIFs by concatenating fields to make a multi-part identifier (e.g. insert a new column with a formula like =A1&B1).
2) Open a new workbook and use Microsoft Query. (I'm not sure whether you can select from more than one sheet, but if you can select from multiple sheets like tables, you should be able to write a semi-complex query to pull the dataset you want.) See http://office.microsoft.com/en-us/excel-help/use-microsoft-query-to-retrieve-external-data-HA010099664.aspx
3) Use the embedded macro feature and write out your business logic in VBA (Visual Basic for Applications). The hotkey is ALT+F11.
One way to do this would be to first create an additional column to the right of entries on the sheet you're trying to pull employee_name from: =ROW()
You could then use an array formula like you were trying to implement to pull the corresponding 'match' row:
{=SUM((project_name1=projectname2)*(employeename1<>employeename2)*(begindate<=date_worked1)*(date_worked1<=end_date2)*(match_column))}
You could then use this returned match_column entry within the index as you described to retrieve the appropriate entries.
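
As a cross-check of the three criteria, here is how the same filter-and-sum could be sketched in Python. It assumes the date criterion means that the date worked falls between the begin and end dates (inclusive); the function name and the sample timesheet rows are invented for the example.

```python
from collections import defaultdict
from datetime import date


def unknown_employee_hours(timesheet, project, known_employees, begin, end):
    """Total hours per employee who is NOT already known on the project."""
    totals = defaultdict(float)
    for row_project, employee, worked_on, hours in timesheet:
        if (row_project == project                  # criterion 1: same project
                and employee not in known_employees  # criterion 2: unknown name
                and begin <= worked_on <= end):      # criterion 3: date in range
            totals[employee] += hours
    return dict(totals)


# Hypothetical rows: (Project_Name1, Employee_Name1, Date_Worked1, Hours_Worked1)
timesheet = [
    ("Alpha", "Ann", date(2014, 3, 3), 8.0),
    ("Alpha", "Raj", date(2014, 3, 4), 6.5),
    ("Alpha", "Raj", date(2014, 3, 20), 4.0),
    ("Beta", "Kim", date(2014, 3, 5), 7.0),
    ("Alpha", "Lee", date(2014, 4, 1), 8.0),
]
result = unknown_employee_hours(
    timesheet, "Alpha", {"Ann"}, date(2014, 3, 1), date(2014, 3, 31)
)
print(result)  # {'Raj': 10.5}
```

This returns every unknown employee at once, together with their summed hours, rather than a single matching row number.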
