Is there a quick way for Excel to identify and remove a duplicated series from a cell like this?

Is there a built-in function, or a simple UDF, that can identify the pattern in the data below and remove the duplicates?
Assume the following is all within a single Excel cell:
80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925,
83925, 83986, 83992, 84315, 80154, 80299, 80299, 82055, 82145, 82205,
82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315
There are two sets of data (each starts with 80154 and ends with 84315). I want to end up with only one set, and I need to do this for 50,000 rows. The final output should be just the first set. Also, sometimes the data repeats three times; again, I just want a single copy of the set.
NOTE: I can't simply remove duplicates, because sometimes a set contains duplicates that I need to keep in the final output. For example, (A,A,B,C,A,A,B,C) needs to be reduced to (A,A,B,C).
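The requirement in the NOTE amounts to finding the shortest block of values that, repeated a whole number of times, rebuilds the cell. The sketch below shows that idea in Python, purely to illustrate the logic (it is not an Excel or VBA solution). It assumes values are separated by commas and that the set repeats an exact whole number of times; the function names are hypothetical.

```python
def smallest_repeating_unit(items):
    """Return the shortest prefix that, repeated a whole number of times, rebuilds items."""
    n = len(items)
    for size in range(1, n + 1):
        # Only block sizes that divide the length evenly can tile the whole list.
        if n % size == 0 and items[:size] * (n // size) == items:
            return items[:size]
    return items


def dedupe_cell(text, sep=","):
    """Split a cell on sep, trim whitespace/newlines, and keep one copy of the repeated set."""
    tokens = [t.strip() for t in text.split(sep) if t.strip()]
    return ", ".join(smallest_repeating_unit(tokens))


print(dedupe_cell("A, A, B, C, A, A, B, C"))  # A, A, B, C
```

Because whole blocks are compared rather than individual values, duplicates inside a set (the two A's) survive, and a cell that never repeats is returned unchanged.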

This formula takes the first 20% of the string, finds where that fragment occurs again, and cuts the string just before that point.
If it does not find a repeat, it returns the whole string.
=IFERROR(LEFT(A1,FIND(LEFT(A1,LEN(A1)/5),A1,2)-3),A1)
Experiment with the 5 until you find a fragment length that gives the correct answer on all your strings. The higher the number, the shorter the fragment it compares.
The -3 strips the ", " separator that sits just before the repeat. If it cuts off too much or too little (for example, leaving the , at the end), adjust the -3 up or down.
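To make the formula's steps easier to follow, here is a Python sketch that mirrors it one function at a time. It is only an illustration of the logic, assuming the sets are separated by ", " (hence the default trim of 3); the function name and parameters are hypothetical.

```python
def cut_at_first_repeat(cell, divisor=5, trim=3):
    """Mirror of =IFERROR(LEFT(A1,FIND(LEFT(A1,LEN(A1)/5),A1,2)-3),A1)."""
    # LEFT(A1, LEN(A1)/5): LEFT truncates a fractional length, so take the integer part.
    probe = cell[: int(len(cell) / divisor)]
    if not probe:
        return cell  # the Excel version ends in an error here, so IFERROR returns A1
    # FIND(probe, A1, 2): search starting at the 2nd character (0-based index 1).
    index = cell.find(probe, 1)
    if index == -1:
        return cell  # no repeat found: IFERROR falls back to A1
    position = index + 1  # convert to Excel's 1-based position
    if position - trim < 0:
        return cell  # LEFT with a negative length errors in Excel
    # LEFT(A1, position - 3): drop the repeat and the ", " in front of it.
    return cell[: position - trim]
```

Like the formula, this relies on the opening fragment not appearing anywhere else before the second copy starts; raising the divisor shortens the fragment and makes an accidental early match more likely.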

Related

How to generate a random number within a range only once?

I want to randomize numbers in a range between x and y. The problem is that Excel generates new numbers every time I make a change in the spreadsheet.
The purpose of this question is to generate realistic-looking ID numbers: I want to show a group of students how to generate a range of ID numbers in Excel for a Mail Merge later on. However, I don't want them to panic, so I want the numbers to be generated only once. For instance, if =RANDBETWEEN(20,45) generates 31 in one cell, I want that cell to keep that value. I also want to show students who want to go a step further how to insert a string before the numbers, for example "Case: " (without quotes) followed by the generated value. Some students even asked how to combine multiple generated numbers, separated by dashes.
The code I use: =RANDBETWEEN(20,45).
As mentioned, I only want the numbers to be generated once. Instead, every time I modify the spreadsheet, the values change.
UPDATE:
I used the following formula: ="CASE: "&RANDBETWEEN(20,45)& "-" & RANDBETWEEN(20,45) , and it produces CASE: 21-38. All I need is to make the value stay the same as I make changes to the Spreadsheet.
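The underlying idea, generating the values once and then keeping them instead of recalculating, is similar to pasting the formula results back as values in Excel. The Python sketch below only illustrates that idea outside Excel: it builds "CASE: 21-38"-style strings, and passing a fixed seed makes the output repeatable. The function name, the seed parameter and the default range are assumptions for illustration.

```python
import random


def make_case_ids(count, low=20, high=45, parts=2, prefix="CASE: ", seed=None):
    """Build IDs like 'CASE: 21-38'; the same seed always yields the same list."""
    rng = random.Random(seed)  # private generator, so the seed fully determines the output
    ids = []
    for _ in range(count):
        # randint is inclusive at both ends, like RANDBETWEEN(low, high)
        numbers = [str(rng.randint(low, high)) for _ in range(parts)]
        ids.append(prefix + "-".join(numbers))
    return ids


print(make_case_ids(3, seed=42))
```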

excel vba Delete entire row if cell contains the GREP search

I have a single column of text in Excel that is to be used for translating into foreign languages. The text is automatically generated from an InDesign file. I would like to clean it up for the translator by removing rows that simply contain a number ("20", "34.5", etc.) or a measurement ("5mm", "3.5 µm", etc.). I've found many posts (see link below) on how to remove a row containing a specific string, but none that use search patterns like the ones I typically use in GREP searches: "\d+" and "\d.\d µm".
How would I do this? I am on a Mac (macOS), if that helps.
Note that I would need to delete the row if the cell only contains a number or a measurement, not if the number is contained within a phrase, sentence, or paragraph, etc.
https://stackoverflow.com/a/30569969
It may not be what you are looking for, but how about just sorting the column and removing the rows that start with numbers? It is a manual approach, but as I understand it this translation process only happens from time to time. Am I right?
I see two possible issues in your question:
How to work with regular expressions in Excel?
How to delete rows in a loop?
Let me start with the second question: when you use a for-loop to remove items from a list, you MUST start at the end and work back to the beginning, because deleting a row shifts every row below it up by one (it's a beginner's trick, but a lot of people trip over it).
About the first question: this is a very useful post about this subject, it's too large to even give a summary here.
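Both points can be seen together in a small Python sketch: a full-match regular expression, so only cells that consist entirely of a number or measurement are removed, and a loop that walks backwards while deleting. This is an illustration of the technique, not VBA; the pattern and the unit list (mm, µm, cm, m) are assumptions based on the examples in the question, and data using the Greek letter mu instead of the micro sign would need the pattern extended.

```python
import re

# Assumed pattern: a number (optionally with decimals), optionally followed by a unit.
NUMBER_OR_MEASUREMENT = re.compile(r"\d+(?:\.\d+)?\s*(?:mm|µm|cm|m)?")


def delete_numeric_rows(rows):
    """Delete rows whose whole text is a number or measurement; keep sentences."""
    # Walk from the last index to the first, so deleting a row never shifts
    # the position of a row that has not been checked yet.
    for i in range(len(rows) - 1, -1, -1):
        if NUMBER_OR_MEASUREMENT.fullmatch(rows[i].strip()):
            del rows[i]
    return rows


print(delete_numeric_rows(["Intro text", "20", "3.5 µm", "Use a 5mm bolt"]))
```

Using fullmatch rather than a search is what keeps "Use a 5mm bolt": the number appears inside a phrase, so the whole cell does not match.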

Is there a way to replace cells of a particular value with multiple cells?

Let's say I need to replace any cell that has the value "outgoing" with multiple cells, such as (0), (1), (0), (0), (2), in Excel. Is there a way to make this happen? I am doing this for a research project. Every item in my data needs to be coded on five different scales. There are 30 or so items that make up almost half of the data. It would be enormously helpful to be able to replace these high-frequency items with the five values at once.
I am not sure I completely understand the result you are looking for but here goes:
How about using Find and Replace to replace all instances of "outgoing" with "(0),(1),(0),(0),(2)", and then using Text to Columns to split that single column into five separate columns, so that each value ends up in its own cell?
You would need to split based on a delimiter (probably ",") and you should do all your replacing before you start splitting. Obviously you should test on some sample data first - Find and Replace is not your friend if you are not certain about your data set.
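The two-step workflow (replace first, then split on the delimiter) can be sketched in Python to show why the order matters. This is only a model of the Excel steps; the whole-cell matching here corresponds to Find and Replace's "Match entire cell contents" option, which avoids rewriting cells that merely contain the word. The function name and the mapping are hypothetical.

```python
def replace_then_split(column, replacements, delimiter=","):
    """Step 1: whole-cell Find and Replace. Step 2: Text to Columns on the delimiter."""
    # Step 1: replace a cell only when its entire content matches a key.
    replaced = [replacements.get(cell, cell) for cell in column]
    # Step 2: split every cell, so coded items spread across five columns.
    return [cell.split(delimiter) for cell in replaced]


codes = {"outgoing": "(0),(1),(0),(0),(2)"}
print(replace_then_split(["outgoing", "shy"], codes))
```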

Concatenate Custom Function

On a daily basis I need to load data into one of our systems. However, Excel deletes the leading zeros in front of the contractor IDs, so I have to add THREE zeros manually. I normally use the CONCATENATE function, but now the IDs are coming in differently, so some IDs only need TWO zeros added.
example:
ID
911111
I use concatenate to make it look like:
000911111
I came up with an IF formula that checks whether the ID starts with a 9: if it does, it concatenates TWO zeros; if not, it adds THREE zeros.
example:
=IF(LEFT(A32,1)="9",CONCATENATE("00",A32),CONCATENATE("000",A32))
Now I want to turn this formula into a custom (user-defined) function so I do not have to write the formula every time I work on the data.
Any suggestions would be really appreciated.
In addition to the formatting responses provided in the comments, you could use the RIGHT function to trim the leading zeros to the appropriate number.
For example, assume A1 holds a string of digits between 0 and 9 characters long. We can create text representing a 9-digit string, with as many leading zeros as necessary, as follows:
=RIGHT(REPT("0",9) & A1,9)
REPT("0",9) tells Excel to repeat the character "0" 9 times. It then tacks on whatever text is in A1. Then it takes only the rightmost 9 characters of the concatenation.
I would generally recommend the formatting options noted in those comments, unless you need the text to be 9 characters long for use in other formulas.
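The RIGHT/REPT trick is a fixed-width pad: prepend more zeros than you could ever need, then keep the rightmost characters. A Python sketch of the same idea follows, for illustration only; the function name is hypothetical. Note that, like the formula, it keeps only the rightmost 9 characters, so a value longer than 9 digits would lose its leading digits. Python's built-in str.zfill pads the same way but never truncates.

```python
def pad_left(value, width=9, fill="0"):
    """Mirror of =RIGHT(REPT("0",9) & A1, 9)."""
    if width <= 0:
        return ""  # RIGHT(text, 0) returns an empty string
    text = str(value)
    # REPT("0", 9) & A1, then RIGHT(..., 9)
    return (fill * width + text)[-width:]


print(pad_left(911111))  # 000911111
```

Because the target length is fixed, the same function gives three zeros to a 6-digit ID and two zeros to a 7-digit one, which removes the need for the IF test on the first digit.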

retrieve part of the info in a cell in EXCEL

I vaguely remember that it is possible to parse the data in a cell and keep only part of the data after setting up certain conditions. But I can't remember what exact commands to use. Any help/suggestion?
For example, A1 contains the following info
0/1:47,45:92:99:1319,0,1320
Is there a way to pick out, say, 0/1 or 1319,0,1320 and remove the rest of the data?
I know I can use Text to Columns with a delimiter and then manually remove the unneeded data, but my spreadsheet has 100 columns X 500,000 rows, with each cell looking similar to the data above, so I am afraid Excel may crash before finishing the work. (I have been trying LEFT, LEN, RIGHT and MID, but none of them seems to work the way I had hoped.)
Any suggestion will be greatly appreciated.
I think what you are looking for is a combination of FIND and MID, but you'll have to work out exactly how you want to split your string:
A1 = 0/1:47,45:92:99:1319,0,1320 //your value
B1 = FIND(":",A1) //position of the first ":" symbol (4 here)
C1 = LEN(A1)-B1 //number of characters after the first ":"
=LEFT(A1,B1-1) //text to the left of the symbol ("0/1")
=MID(A1,B1+1,C1) //text to the right of the symbol (you can replace C1 with a smaller number to extract only one portion)
//You can also nest the formulas to save cells, but usually at the cost of more calculation
This is the concept; you will probably need to do it across multiple cells to split a string as long as yours. For multiple splits, repeat the MID formula on the result of the previous MID formula.
That way, you will get cell result sequence like:
0/1:47,45:92:99:1319,0,1320; 47,45:92:99:1319,0,1320; 92:99:1319,0,1320; 99:1319,0,1320......
From each of those, you can take the left side of the string up to ":" to get each portion.
If you are working with a large table, you probably want to look into VBA. To my knowledge there is no single Excel function that can take one cell and split it into multiple cells.
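The repeated LEFT/MID pattern peels off one field per step. Here is a Python sketch of that loop, only to make the steps concrete; the function name is hypothetical, and in Python the whole loop is equivalent to a single str.split(":") call.

```python
def split_on_colon_stepwise(cell, sep=":"):
    """Peel one field at a time: LEFT up to the separator, MID for the remainder."""
    fields = []
    rest = cell
    while True:
        pos = rest.find(sep)  # like FIND(":", ...), but 0-based and -1 when absent
        if pos == -1:
            fields.append(rest)  # no separator left: this is the last field
            return fields
        fields.append(rest[:pos])  # LEFT(A1, B1-1)
        rest = rest[pos + 1:]  # MID(A1, B1+1, LEN(A1)-B1)


print(split_on_colon_stepwise("0/1:47,45:92:99:1319,0,1320"))
```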
Let me try to help with this; I am not a professional, so you may run into some problems. My solution adds two columns next to the source column, as you can see below, and you can extend the formulas using the same principle.
Column B formula (returns the text before the first ":"):
=LEFT(A2,FIND(":",A2,1)-1)
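Column C formula (returns the text after the last ":"; the inner SUBSTITUTE swaps only the last ":" for a "|" so that FIND can locate it):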
=RIGHT(A2,LEN(A2)-FIND("|",SUBSTITUTE(A2,":","|",LEN(A2)-LEN(SUBSTITUTE(A2,":","")))))
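The Column C formula is the less obvious one: LEN(A2)-LEN(SUBSTITUTE(A2,":","")) counts the colons, and passing that count as SUBSTITUTE's instance number replaces only the last colon with a marker. The Python sketch below mirrors both formulas step by step, for illustration; the helper names are hypothetical, and, as with the formula, it assumes a single-character separator and that the marker "|" never appears in the data.

```python
def substitute_nth(text, old, new, n):
    """Like Excel SUBSTITUTE(text, old, new, n): replace only the n-th occurrence."""
    if n < 1:
        raise ValueError("instance number must be at least 1")
    search_from = 0
    for _ in range(n):
        pos = text.find(old, search_from)
        if pos == -1:
            return text  # fewer than n occurrences: text is unchanged
        search_from = pos + len(old)
    return text[:pos] + new + text[pos + len(old):]


def before_first(cell, sep=":"):
    # =LEFT(A2, FIND(":", A2, 1) - 1); index() raises where Excel shows #VALUE!
    return cell[: cell.index(sep)]


def after_last(cell, sep=":", marker="|"):
    # LEN(A2) - LEN(SUBSTITUTE(A2, ":", "")) counts the separators
    count = len(cell) - len(cell.replace(sep, ""))
    if count == 0:
        raise ValueError("no separator found")  # the Excel formula returns #VALUE! here
    marked = substitute_nth(cell, sep, marker, count)  # mark only the last separator
    pos = marked.index(marker) + 1  # FIND("|", ...), 1-based
    return cell[pos:]  # RIGHT(A2, LEN(A2) - pos)
```

In plain Python the same results come from cell.partition(":")[0] and cell.rpartition(":")[2]; the longer version is only there to show what each Excel function contributes.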
Given your statement about having 100 columns, I imagine you sometimes need to isolate characters in the middle of your string, so LEFT and RIGHT will not always work. Where possible, though, use them.
Assuming your string is in cell F2: 0/1:47,45:92:99:1319,0,1320
=LEFT(F2,3)
This returns 0/1, the first 3 characters of the string counting from the left. RIGHT works similarly:
=RIGHT(F2,4)
This returns 1320, the last 4 characters counting from the right.
You can use a combination of MID and FIND to locate characters or substrings dynamically, based on delimiter characters. Here are a few examples of ways to dynamically isolate values in your string. The key to these examples is the nested FIND formula: the innermost FIND determines where the next FIND starts searching.
1) Return 2 characters after the second : character
In cell F2 I need to isolate the "92":
=MID(F2,FIND(":",F2,FIND(":",F2)+1)+1,2)
The innermost FIND locates the first : in the string (the 4th character). We add +1 to move to the 5th character (past the first :, so the second FIND will not see it), and the second FIND starts looking for : again from that character. This second FIND returns 10, because the second : is the 10th character in the string. MID takes over from there: the +1 after the second FIND moves past that colon, so the formula says "starting at the 11th character, return the next 2 characters". The 2 at the end of the formula (the last argument of MID) sets how many characters are returned.
2) Here I need the 2 characters after the 3rd : in the string, in this case "99":
=MID(F2,FIND(":",F2,FIND(":",F2,FIND(":",F2)+1)+1)+1,2)
You can see we have simply added one more nested Find to the formula in example 1.
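Each extra level of nesting is one more "find the next colon, starting just past the previous one" step, so the pattern generalises to the n-th separator. A Python sketch with Excel-style 1-based helpers follows, only to make the counting explicit; all function names are hypothetical.

```python
def find_excel(find_text, within_text, start_num=1):
    """Excel-style FIND: 1-based result; raises where Excel would show #VALUE!."""
    index = within_text.find(find_text, start_num - 1)
    if index == -1:
        raise ValueError("#VALUE!")
    return index + 1


def mid_excel(text, start_num, num_chars):
    """Excel-style MID with a 1-based start position."""
    return text[start_num - 1 : start_num - 1 + num_chars]


def chars_after_nth(text, sep, n, num_chars):
    """Return num_chars characters after the n-th sep, like the nested FIND examples."""
    pos = 0
    for _ in range(n):
        # Each FIND starts one character past the previous match (the "+1").
        pos = find_excel(sep, text, pos + 1)
    return mid_excel(text, pos + 1, num_chars)


s = "0/1:47,45:92:99:1319,0,1320"
print(chars_after_nth(s, ":", 2, 2))  # 92
print(chars_after_nth(s, ":", 3, 2))  # 99
```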
