Splitting words in different combinations in Excel - excel

I would like to extract only the bold part of the text into a new column, i.e.
Czechowice - Dziedzice AMBRA
Białystok DEF
Komorniki
Bielsko Biała EC
Farmacja Luboń
Gorzów Wlkp.
Grudziądz EC
Kędzierzyn-Koźle EC
Ostrowiec Świętokrzyski EC
Puck T
Przeworsk+Sklep
Białystok + sklep
Kielce (Masłów)
Barlinek + Myślibórz
Lublin TR
Biała Podlaska TR
Puławy II TR
Toruń DLS TR
Kraków SJ TR
I tried to use IF(ISNUMBER(SEARCH("AMBRA";B2));LEFT(B2;LEN(B2)-6)) for all options, but it's very inefficient. Any help is appreciated.

This needs VBA to create a custom function. You can use the Characters property of the Range to return information about individual characters within a cell:
Public Function getBoldText(cellReference As Range) As String
    Dim i As Long
    'Loop through each character in the cell (Characters positions start at 1)
    For i = 1 To cellReference.Characters.Count
        'If the character is bold then...
        If cellReference.Characters(i, 1).Font.Bold Then
            '...add it to the output
            getBoldText = getBoldText & cellReference.Characters(i, 1).Text
        End If
    Next i
End Function
Create a module, paste in this code, and then you can use e.g. =getBoldText(A1) within your worksheet to return only bold text from a cell. This function only works with single cells, and returns #VALUE! if the cell contains anything other than text.
Note I would have used For Each...Next for the loop, but despite appearances Characters isn't actually a collection so you can't iterate over it.

@simoco has made the very pertinent point:
How do you imagine Excel should detect which part of string should be left, if there is not any pattern in your values?
However, the OP has mentioned "I have about 10000 rows", so although Excel may not be able to provide a full solution, it may still be of some help. The OP seems, understandably, to have lost interest (perhaps still working through the 10,000 manually?), but the problem is not unusual, and demonstrating an approach, even one that is only partially successful, may be of use to others.
So I put the OP's list in A1:A19. From observation, most of what is not emboldened starts after the last space, so in B1:
=LEN(A1)-LEN(SUBSTITUTE(A1," ",""))
By comparing the original length of the strings with their length after removing spaces we obtain the number of spaces in the original string.
In C1:
=IF(RIGHT(A1)=".",A1,SUBSTITUTE(A1," ",REPT(" ",LEN(A1)),B1))
Here we find the last space and replace it with as many spaces as the length of the original string (there is a reason for doing so!). Also, noticing that Gorzów Wlkp. ends in a full stop (to be retained) whereas nothing to be removed does, we make a specific exception for strings ending in a full stop.
In D1:
=IF(ISERROR(C1),A1,LEFT(C1,LEN(A1)))
Having inserted a large number of spaces between the text to be kept and not to be kept we now select from the left the number of characters that we started with - so mostly what we want to keep plus a lot of blanks. Where there were no spaces to start with the formula in C1 returns an error, so in those cases we take the whole of the original string instead.
In E1:
=IFERROR(LEFT(D1,FIND("+",D1)),D1)
This attempts to cope with at least some of the data containing plus signs (+), which in the data sample are, with one exception (Barlinek + Myślibórz), to be removed along with any following characters. The above removes those following characters.
In F1:
=TRIM(SUBSTITUTE(E1,"+",""))
This is mainly tidying, because there is not much more that can be 'automated'. The +s are removed, along with the surplus spaces that were inserted earlier.
So although the original data was 'unnormalised', of the sample of 19 only the exception mentioned above and Toruń DLS TR and Kraków SJ TR are not as required. For these last two DLS and SJ are retained where they should not be - whether worth a further processing step may depend on what is in the rest of the 10,000 entries - but it may be better anyway to fall short by omission rather than commission.
Stripping Myślibórz from Barlinek + Myślibórz may be considered more of an issue but it may be possible to review all entries containing a + and substitute say & for + and append a space in cases such as Barlinek + Myślibórz, which would then result in Barlinek & Myślibórz (unless the choice is made to continue with a step that reverses the replacement).
So for the example data, formulae can handle all but about two of the 19 cases. Extrapolated to the 10,000 rows, 8,947 as required might be considered at least a good start, and further 'rules' might be added in other columns to deal with any other patterns observable in ColumnF, whose data is at least close to what is required.
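For readers who want to check the logic outside a worksheet, the column B to F steps above can be sketched in Python. This is only an approximation of the formulas, not a replacement for them: the function name strip_suffix is made up for illustration, and Python's whitespace handling is slightly broader than Excel's TRIM.

```python
def strip_suffix(text: str) -> str:
    """Mirror columns B to F of the formula approach for one cell value."""
    # Column B: count the spaces in the original string.
    spaces = text.count(" ")

    # Columns C and D: keep everything before the last space, unless the
    # string ends in a full stop or has no spaces at all.
    if text.endswith(".") or spaces == 0:
        kept = text
    else:
        kept = text[: text.rfind(" ")]

    # Column E: cut everything after the first plus sign, if there is one.
    plus = kept.find("+")
    if plus != -1:
        kept = kept[: plus + 1]

    # Column F: drop the plus sign and tidy surplus spaces
    # (an approximation of Excel's TRIM).
    return " ".join(kept.replace("+", "").split())


if __name__ == "__main__":
    for sample in ["Białystok DEF", "Gorzów Wlkp.", "Białystok + sklep", "Toruń DLS TR"]:
        print(repr(sample), "->", repr(strip_suffix(sample)))
```

As with the formulas, Barlinek + Myślibórz loses its second town and Toruń DLS TR keeps DLS, so the sketch reproduces the known shortfalls rather than curing them.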

Related

Excel: Concatenating cells & text using IF statements

I'm trying to combine several cells of data. My problem is in placing spaces between data and, more importantly NOT putting a space when there's no data so I don't get double spaces. Here's a sample:
=TRIM(M12)&IF(N12<>M12;"-"&TRIM(N12);"")&" "&TRIM(G12)&" "&TRIM(H12)&IF(LEN(I12>0);" "&TRIM(I12)&" ")&TRIM(J12)
The data is start year (M), end year (N), make (G), model (H), body style (I), driveline (J).
For some the values in start year and end year are the same.
&IF(N12<>M12;"-"&TRIM(N12);"")
This works perfectly. If the end year is the same as the start year it does not add a - or space after.
For many rows there is no value in body style.
&IF(LEN(I12>0);" "&TRIM(I12)&" ")
This will print the body style if it's present but it always adds a double space if there is no value in body style.
When I change that reference to:
&IF(LEN(I12>0);"-"&TRIM(I12)&"+")
both the - and + print regardless of what's in I12
I've tried many variations. None work, some throw errors. Probably obviously, I do not know what I'm doing in Excel but I'm thinking there must be a better way of checking the cell I12? I tried >1 with no luck but I'm not sure what to check besides the length of the data within.
The TRIM function not only removes leading and trailing spaces, but also reduces any internal multiple space sequences to a single space. By wrapping the whole formula in TRIM(..) you can ignore the possibility of creating double spaces.
Regarding
When I change that reference to:
&IF(LEN(I12>0);"-"&TRIM(I12)&"+")
both the - and + print regardless of what's in I12
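This happens because the closing parenthesis is misplaced: LEN(I12>0) takes the length of the TRUE/FALSE result of the comparison, which is never zero, so the IF condition always holds. It may also be that I12 actually contains one or more spaces. Fix both by using LEN(TRIM(I12))>0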
Or better, just go ahead and concatenate I12 and let TRIM clean up the spaces.
Note: I'm assuming the IF(LEN(I12>0);"-"&TRIM(I12)&"+") version was just to test that bit of code, so I haven't dealt with adding - and +.
So, your whole formula can become
=TRIM(M17&IF(M17<>N17;"-"&TRIM(N17);"")&" "&G17&" "&H17&" "&I17&" "&J17)
If you have a version of Excel that supports TEXTJOIN then you can use
=TRIM(M16&IF(M16<>N16;"-"&TRIM(N16);"")&" "&TEXTJOIN(" ";TRUE;G16:J16))
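The "skip blanks, then tidy the spaces" idea behind the TEXTJOIN version can be sketched in Python as follows. The function name build_label and the sample values are hypothetical, chosen only to illustrate the behaviour.

```python
def build_label(start, end, make, model, body="", driveline=""):
    """Join vehicle fields with single spaces, skipping blank ones."""
    start, end = str(start).strip(), str(end).strip()
    # Show a year range only when the end year differs from the start year.
    years = start if end == start else f"{start}-{end}"
    parts = [years, make, model, body, driveline]
    # Like TEXTJOIN with ignore_empty set to TRUE, followed by TRIM:
    # drop blank fields, then collapse repeated spaces inside each field.
    return " ".join(" ".join(str(p).split()) for p in parts if str(p).strip())


if __name__ == "__main__":
    print(build_label(1999, 2004, "Ford", "Focus", "Wagon", "FWD"))
    print(build_label(2001, 2001, "Ford", "Focus", "", "FWD"))
```

A body style that is empty, or that holds only spaces, simply disappears from the result, so no double space can appear.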

Is there a quick way for excel to identify and remove duplicate series from a cell such as this?

Is there a built in function, or a simple UDF that can identify the pattern in the information below and remove the duplicates?
Assume the following is all within a single excel cell:
80154, 80299, 80299, 82055, 82145, 82205, 82520, 82570, 83840, 83925,
83925, 83986, 83992, 84315, 80154, 80299, 80299, 82055, 82145, 82205,
82520, 82570, 83840, 83925, 83925, 83986, 83992, 84315
There are two sets of data (starts with 80154 ends with 84315). I want to end up with only one set, but I want to do it to 50,000 lines. The final output should be just the BOLD text. Also, sometimes the data repeats itself 3 times, again, I just want the unique set of data.
NOTE: I can't just remove duplicates, because sometimes there will be duplicates in the set that I need to capture in the final output. For example, (A,A,B,C,A,A,B,C) needs to be reduced to (A,A,B,C).
This finds where the first 20% of the string is repeated and cuts the string at that point.
If it does not find a duplicate it will return the whole string.
=IFERROR(LEFT(A1,FIND(LEFT(A1,LEN(A1)/5),A1,2)-3),A1)
Play with the 5 until you find the length of string that gives the correct answer on all your strings. The higher the number, the shorter the string it compares.
Also, if it is cutting off too much or not enough (for example, leaving the , at the end), adjust the -3 up or down.
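The same heuristic can be sketched in Python to see how it behaves on a few strings before applying it to 50,000 rows. The name first_cycle and the fraction parameter are illustrative only. One deliberate difference from the formula: instead of a fixed -3, the sketch strips whatever commas and spaces trail the kept part.

```python
def first_cycle(cell: str, fraction: int = 5) -> str:
    """Return the text before the first repeat of the leading chunk."""
    # The leading 1/fraction of the string is the pattern to look for.
    probe = cell[: len(cell) // fraction]
    if not probe:
        return cell
    # Search from the second character so the probe cannot match itself.
    hit = cell.find(probe, 1)
    if hit == -1:
        return cell  # no repeat found: keep the whole string
    # Drop the separator (", ") that precedes the repeated set.
    return cell[:hit].rstrip(", ")


if __name__ == "__main__":
    print(first_cycle("A, A, B, C, A, A, B, C"))
```

As with the formula, a probe that is too short can match inside the first set, so the fraction still needs tuning against real data.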

Excel: Extracting a full number from a cell - bringing it all together :)

First off, thank you to all who have helped get me to this point. I'm so close! On to the scenario, which I apologize in advance is a bit of a work in progress.
I have text in a cell and I need to extract a number. The tricky part is there are various situations to address.
The number may immediately follow a "#" and could vary in length. People on Stack Overflow helped me with coming up with this which works great:
MID(B2,(FIND("#",B2,1)+1),FIND(" ",B2,FIND("#",B2,1)+1)-FIND("#",B2,1))
That was a huge leap forward, but there are also situations where there is no # sign and the cell might have "abc (1205) 645 chan", where I need to extract the 645.
I'm using this, below, in conjunction with an on error statement for when there is no "#"
TRIM(MID(B53,(FIND(" " &{"1","2","3","4","5","6","7","8","9"},B53,1)),FIND(" ",B53,FIND({"1","2","3","4","5","6","7","8","9"},B53,1))-FIND({"1","2","3","4","5","6","7","8","9"},B53)))
So I use the first Mid/Find to avoid the (1205) and find the next " x" where x is a number. The problem is it seems I have trouble when the number I'm searching for has 1 or 3+ numbers in it, but if it has 2 I return the value just fine.
It seems I'm very close but just not there yet.
This formula will return the number that follows either a # or a ) in your string. If that pattern does not exist, it will return a #NUM! error.
=AGGREGATE(14,6,--MID(A1,MIN(FIND({"#",")"},A1&"#)"))+1,{1,2,3,4,5}),1)
Note the array constant as the num_chars argument of the MID function. The maximum number should be at least the largest number of digits (or decimal + digits) plus any spaces between the delimiter and the first digit, that might be expected to be found.
EDIT: If your version of Excel is prior to 2010, and does not have the AGGREGATE function, you may use this array-entered formula instead, so long as the values to be returned will be positive numbers:
=MAX(IFERROR(--MID(A1,MIN(FIND({"#",")"},A1&"#)"))+1,{1,2,3,4,5}),0))
This formula must be entered by holding down ctrl+shift while hitting enter
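To see why taking the largest value over several candidate widths works, here is a rough Python sketch of the same idea. The function name number_after_marker is hypothetical, and the numeric check (digits with an optional decimal part) is narrower than Excel's own text-to-number coercion, so treat it as an approximation.

```python
import re

# Digits with an optional decimal part; an assumption, narrower than Excel.
NUMERIC = re.compile(r"\d+(\.\d+)?")


def number_after_marker(text, max_chars=5):
    """Return the largest number readable just after the first '#' or ')'."""
    # Appending both markers guarantees each search succeeds, as A1&"#)" does.
    padded = text + "#)"
    start = min(padded.find("#"), padded.find(")")) + 1
    values = []
    for width in range(1, max_chars + 1):
        chunk = text[start:start + width].strip()
        if NUMERIC.fullmatch(chunk):
            values.append(float(chunk))
        # Non-numeric chunks are skipped, like the errors AGGREGATE ignores.
    return max(values) if values else None


if __name__ == "__main__":
    print(number_after_marker("abc (1205) 645 chan"))
```

Where the formula returns #NUM! for a string with no # or ), the sketch returns None.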

retrieve part of the info in a cell in EXCEL

I vaguely remember that it is possible to parse the data in a cell and keep only part of the data after setting up certain conditions. But I can't remember what exact commands to use. Any help/suggestion?
For example, A1 contains the following info
0/1:47,45:92:99:1319,0,1320
Is there a way to pick up, say, 0/1 or 1319,0,1320 and remove the rest unchosen data?
I know I can do text-to-column and set the delimiter, followed by manually removing the "un-needed" data, but my EXCEL spreadsheet contains 100 columns X 500000 rows with each cell looking similar to the data above, so I am afraid EXCEL may crash before finishing the work. (have been trying with LEFT, LEN, RIGHT, MID, but none seems to work the way I had hoped)
Any suggestion will be greatly appreciated.
I think what you are looking for is a combination of FIND and MID, but you'll have to work out exactly how you want to split your string:
A1 = 0/1:47,45:92:99:1319,0,1320 //your string
B1 = FIND(":",A1) //location of the first ":" symbol
C1 = LEN(A1)-B1 //number of characters after the first ":"
=LEFT(A1,B1-1) //left of your symbol
=MID(A1,B1+1,C1) //right side of your symbol (you can also replace C1 with a smaller number to extract only one portion)
//You can also nest the statements to save space, but usually at the cost of more processing
That is the concept; you will probably need to do it in multiple cells to split a string as long as yours. For multiple splits you probably want to repeat this step on the result of the previous RIGHT/MID step.
That way, you will get cell result sequence like:
0/1:47,45:92:99:1319,0,1320; 47,45:92:99:1319,0,1320; 92:99:1319,0,1320; 99:1319,0,1320......
From each of those you can retrieve left side of the string up to ":" to get each portion of a string.
If you are working with a large table you probably want to look into VB scripting. To my knowledge there is no single excel command that can take 1 cell and split it into multiple ones.
Let me try to help you with this. I am not a professional, so you may face some problems. First of all, my solution adds 2 columns alongside the source column, as you can see below. However, you can improve the formulas using the same principle.
Column B Formula:
=LEFT(A2,FIND(":",A2,1)-1)
Column C Formula:
=RIGHT(A2,LEN(A2)-FIND("|",SUBSTITUTE(A2,":","|",LEN(A2)-LEN(SUBSTITUTE(A2,":","")))))
Given your statement of having 100 columns, I imagine in some instances you need to isolate characters in the middle of your string, so Left and Right may not always work. However, use them where you can.
Assuming your string is in cell F2: 0/1:47,45:92:99:1319,0,1320
=LEFT(F2,3)
This returns 0/1 which are the first 3 characters in the string counting from the left. Likewise, Right functions similarly:
=RIGHT(F2,4)
This returns 1320, returning the 4 characters starting from the right.
You can use a combination of Mid and Find to dynamically find characters or strings based on defined characters. Here are a few examples of ways to dynamically isolate values in your string. Keep in mind the key to these examples is the nested Find formula, where the innermost Find determines the first character to start at in the string.
1) Return 2 characters after the second : character
In cell F2 I need to isolate the "92":
=MID(F2,FIND(":",F2,FIND(":",F2)+1)+1,2)
The innermost Find locates the first : in the string (4 characters in). We add the +1 to move to the 5th character (moving beyond the first : so the second Find will not see it), and the next Find starts looking for : again from that character. This second Find returns 10, as the second : is the 10th character in the string. The Mid formula takes over here: the final +1 moves the start to the 11th character, and from there Mid returns 2 characters, as dictated by the 2 at the end of the formula (the last argument of Mid).
2) In this case I need to find the 2 characters after the 3rd : in the string. In this case "99":
=MID(F2,FIND(":",F2,FIND(":",F2,FIND(":",F2)+1)+1)+1,2)
You can see we have simply added one more nested Find to the formula in example 1.
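The nested Find pattern generalises to "the text after the n-th delimiter". A small Python sketch of that generalisation follows; the name field_after_nth and its parameters are made up for illustration. A loop takes the place of the nesting, so no extra level is needed for each additional colon.

```python
from typing import Optional


def field_after_nth(text: str, n: int, length: Optional[int] = None,
                    delimiter: str = ":") -> str:
    """Return the text that follows the n-th delimiter."""
    pos = -1
    for _ in range(n):
        # Each search starts one character past the previous hit,
        # which is what the nested FIND(...)+1 calls do.
        pos = text.find(delimiter, pos + 1)
        if pos == -1:
            raise ValueError(f"fewer than {n} delimiters in {text!r}")
    start = pos + 1
    if length is None:
        # No fixed width given: read up to the next delimiter or the end.
        end = text.find(delimiter, start)
        return text[start:] if end == -1 else text[start:end]
    return text[start:start + length]


if __name__ == "__main__":
    sample = "0/1:47,45:92:99:1319,0,1320"
    print(field_after_nth(sample, 2, 2))
    print(field_after_nth(sample, 4))
```

Leaving out the length reads up to the next delimiter, which avoids hard-coding the 2 when the fields vary in width.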

Replacing a section of the data in a cell for thousands of excel data

I have a large spreadsheet with column data like:
ABC:1:I.0
ABC:1:I.1
ABC:1:I.2
ABC:1:I.3
ABC:2:I.0
ABC:2:I.1
ABC:2:I.2
ABC:2:I.3
ABC:3:I.0
ABC:3:I.2
ABC:3:I.3
ABC:4:I.0
ABC:4:I.1
ABC:4:I.2
ABC:4:I.3
ABC:5:I.0
ABC:5:I.1
ABC:5:I.2
ABC:5:I.3
ETC.
I need to replace the above with the following:
ABC:I.Data[1].0
ABC:I.Data[1].1
ABC:I.Data[1].2
ABC:I.Data[1].3
ABC:I.Data[2].0
ABC:I.Data[2].1
ABC:I.Data[2].2
ABC:I.Data[2].3
ABC:I.Data[3].0
ABC:I.Data[3].2
ABC:I.Data[3].3
ABC:I.Data[4].0
ABC:I.Data[4].1
ABC:I.Data[4].2
ABC:I.Data[4].3
ABC:I.Data[5].0
ABC:I.Data[5].1
ABC:I.Data[5].2
ABC:I.Data[5].3
ETC.
Here is a sample of the data, most of the data follows a similar format with the exception of the naming "ABC", which can vary in size, so it might be "ABCD" and also with the exception of the letter "I", it can be "O" as well. Also, some might be missing some values such as ABC:3:I.1 which is missing from the data. I am not too familiar with excel formulas or VBA code. Does anyone know how to do this? I have no preference on which method it has to be done in as I don't mind learning some VBA code if someone provides me with a VBA solution.
I was thinking of using some sort of loop along with some conditional statements.
Thanks!
Please try:
=LEFT(F11,FIND(":",F11))&MID(F11,FIND(":",F11,6)+1,1)&".Data["&MID(F11,FIND(":",F11,2)+1,1)&"]."&RIGHT(F11,1)
copied down to suit, assuming placed in Row11 and your data is in ColumnF starting in Row11.
Curiosities:
When this answer was first posted it attempted to address only the tabulated example input and output. I temporarily deleted that version while addressing the fact that what was ABC in the table might at times be ABCD, and that what was I might at times be O.
OP has posted an answer that I edited to make no visible change but which shows as the deletion of two characters. A copy of the OP’s formula exhibited a syntax error prior to my edit.
OP suggested an edit to my answer but this was rejected by the review process. As it happens, I think the edit suggestion was incorrect.
I have edited my answer again to include these ‘curiosities’ and to match the cell reference used by the OP in his answer.
=LEFT(A1,SEARCH(":",A1)) & MID(A1, SEARCH(".",A1)-1, 2) &
"Data[" & MID(A1,SEARCH(":",A1)+1,1) & "]" & RIGHT(A1,2)
With the help of pnuts I was able to come up with my own solution:
=LEFT(F11,LEN(F11)-5)&MID(F11,LEN(F11)-2,2)&"Data["&MID(F11,LEN(F11)-4,1)&"]"&RIGHT(F11,2)
My solution relies on the fact that the last six characters of the string (:1:I.0 in ABC:1:I.0) always have the same length for all the data I have, hence the LEN(F11)-some number in my formula. The only part of the string that changes in size is the first part, in this case ABC, which can also be ABCDEF, etc.
If you'd like to use formulas rather than VBA, an easy option is to split the data into 4 columns, using the Text To Columns option - first split using the colon as a delimiter, then using a full-stop / period as a delimiter.
Once you have 4 columns of data (one for each block), you can use the Concatenate function to join them and add in the extra characters: =CONCATENATE(A1,":",C1,".","Data[",B1,"].",D1)
This should still work if you have extra / alternative characters (eg ABCD instead of ABC), as long as you have the same delimiters, but obviously you'd need to test to make sure.
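The split-then-rejoin approach can also be sketched in Python, which makes it easy to check against names of any length. The function name to_data_tag and the labels name, index, letter and bit are assumptions made for readability, not terms from the question.

```python
def to_data_tag(tag: str) -> str:
    """Rewrite NAME:index:letter.bit as NAME:letter.Data[index].bit."""
    # First split on the colon, as Text To Columns would.
    name, index, rest = tag.split(":")
    # Then split the last block on the full stop.
    letter, bit = rest.split(".")
    # Rejoin the four blocks in the new order with the extra characters.
    return f"{name}:{letter}.Data[{index}].{bit}"


if __name__ == "__main__":
    for sample in ["ABC:1:I.0", "ABCD:2:O.3"]:
        print(sample, "->", to_data_tag(sample))
```

Because it splits on the delimiters rather than counting characters, the sketch does not depend on the length of the name or the index. A row that lacks the two colons and the full stop (such as a literal ETC.) raises a ValueError instead of producing a wrong result.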

Resources