Excel: Concatenating cells & text using IF statements - excel

I'm trying to combine several cells of data. My problem is in placing spaces between data and, more importantly NOT putting a space when there's no data so I don't get double spaces. Here's a sample:
=TRIM(M12)&IF(N12<>M12;"-"&TRIM(N12);"")&" "&TRIM(G12)&" "&TRIM(H12)&IF(LEN(I12>0);" "&TRIM(I12)&" ")&TRIM(J12)
The data is start year (M), end year (N), make (G), model (H), body style (I), driveline (J).
For some rows the values in start year and end year are the same.
&IF(N12<>M12;"-"&TRIM(N12);"")
This works perfectly. If the end year is the same as the start year it does not add a - or space after.
For many rows there is no value in body style.
&IF(LEN(I12>0);" "&TRIM(I12)&" ")
This will print the body style if it's present but it always adds a double space if there is no value in body style.
When I change that reference to:
&IF(LEN(I12>0);"-"&TRIM(I12)&"+")
both the - and + print regardless of what's in I12
I've tried many variations. None work, some throw errors. Probably obviously, I do not know what I'm doing in Excel but I'm thinking there must be a better way of checking the cell I12? I tried >1 with no luck but I'm not sure what to check besides the length of the data within.

The TRIM function not only removes leading and trailing spaces, but also reduces any internal multiple space sequences to a single space. By wrapping the whole formula in TRIM(..) you can ignore the possibility of creating double spaces.
Regarding
When I change that reference to:
&IF(LEN(I12>0);"-"&TRIM(I12)&"+")
both the - and + print regardless of what's in I12
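This happens because of a misplaced parenthesis: LEN(I12>0) measures the length of the TRUE/FALSE result of I12>0, which is never zero, so the IF always takes its first branch. Move the comparison outside LEN and use LEN(TRIM(I12))>0, which also treats a cell containing only spaces as empty.
Or better, just concatenate I12 and let the outer TRIM clean up the spaces.
Note: I'm assuming the IF(LEN(I12>0);"-"&TRIM(I12)&"+") version was only there to test that part of the formula, so I haven't dealt with adding - and +.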
So, your whole formula can become
=TRIM(M17&IF(M17<>N17;"-"&TRIM(N17);"")&" "&G17&" "&H17&" "&I17&" "&J17)
If you have a version of Excel that supports TEXTJOIN then you can use
=TRIM(M16&IF(M16<>N16;"-"&TRIM(N16);"")&" "&TEXTJOIN(" ";TRUE;G16:J16))
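To see why wrapping everything in one outer TRIM is enough, here is a minimal Python sketch of the same idea. It is only an approximation of Excel's behaviour; excel_trim and build_label are hypothetical names made up for this illustration, not Excel functions.

```python
def excel_trim(text: str) -> str:
    """Approximate Excel's TRIM: drop edge spaces, collapse runs of spaces."""
    return " ".join(part for part in text.split(" ") if part)


def build_label(start, end, make, model, body, driveline) -> str:
    """Hypothetical helper mirroring the worksheet formula above."""
    # Append "-end" only when the end year differs from the start year.
    years = start if end == start else f"{start}-{end}"
    # An empty field leaves a double space behind; the outer trim removes it.
    return excel_trim(" ".join([years, make, model, body, driveline]))


print(build_label("2001", "2003", "Ford", "F150", "", "4WD"))
# 2001-2003 Ford F150 4WD
```

Because the clean-up happens once at the end, none of the individual fields needs its own IF or LEN check.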

Related

How can I substitute multiple occurrences of junk strings in Excel?

In the image, 'muddle' is the string containing junk words and the strings I want to extract. There is a fixed list of junk words - the good strings could be literally anything.
You can see this formula has correctly extracted "moo" and "coo", which are not in the list of junk words. The formula is below.
=LET(junkStart,FILTER(SEARCH(Table1[junkwords],Table2[muddle]),ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle]))),
junkEnd,FILTER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1,ISNUMBER(SEARCH(Table1[junkwords],Table2[muddle])+LEN(Table1[junkwords])-1)),
goodstart,FILTER(junkEnd+1,(junkEnd+1<=LEN(Table2[muddle]))*(ISERROR(XMATCH(junkEnd+1,junkStart)))),
goodend,FILTER(junkStart-1,(junkStart-1>=LEN(1))*(ISERROR(XMATCH(junkStart-1,junkEnd))))+1,
goodchars,goodend-goodstart,
TEXTJOIN("; ",TRUE,MID(Table2[muddle],goodstart,goodchars)))
This works well, but it falls down if a junk word occurs more than once. See below.
The only difference is that 'woo' occurs twice in the second example.
I need a single cell solution. VBA is not an option for me. Using the name manager would be untidy, as would nested formulas.
I've got this far with formulas, which as far as I can tell is the furthest anyone has got with the 'removing multiple words from a cell' problem. I can see the issue - once SEARCH locates the start of a string in a cell, it doesn't go looking for a second occurrence of that string. But I don't know how to find the start of every instance of every string. Can anyone help?
REDUCE is perfect for this:
=REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(m,j,SUBSTITUTE(m,j,"")))
REDUCE starts with the Table2[muddle] value as m, then substitutes "" for the first value of Table1[junkwords] (j). The outcome becomes the new m, in which the second value of j is substituted. That result becomes the new m, and so on.
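If the accumulator idea is unfamiliar, the same fold can be sketched in Python with functools.reduce. This is an analogy only, assuming plain case-sensitive replacement as with SUBSTITUTE; strip_junk and the sample strings are made up for the illustration.

```python
from functools import reduce


def strip_junk(muddle: str, junk_words: list[str]) -> str:
    """Hypothetical helper: fold over the junk list, like REDUCE + SUBSTITUTE."""
    # The accumulator m starts as the full string; each step removes every
    # occurrence of one junk word j and hands the result to the next step.
    return reduce(lambda m, j: m.replace(j, ""), junk_words, muddle)


print(strip_junk("boomoowoocoowoo", ["boo", "woo"]))  # moocoo
```

Note that "woo" appears twice in the sample and both occurrences are removed, which is the case the original SEARCH-based formula could not handle.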
If you want the result comma-separated it becomes more complicated, but you can achieve it with:
=LET(t,SUBSTITUTE(","&REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y,",")))&",",",,",","),
MID(t,2,LEN(t)-2))
This does almost the same as the previous solution, but instead of substituting blanks it substitutes a comma for each junk word, then replaces each double comma ,, with a single one, so that two junk words following each other result in one comma. If the first and/or last part was a junk word, the result would have a leading and/or trailing comma. This is handled by first adding a , to the front and back before replacing the double commas with singles. The result t is then wrapped in MID, which removes the first and last character (both being a ,).
Alternate solution:
=LET(t,REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))),
SUBSTITUTE(TRIM(t)," ",","))
Or in one go if you don't want to use LET:
=SUBSTITUTE(TRIM(REDUCE(Table2[muddle],Table1[junkwords],LAMBDA(x,y,SUBSTITUTE(x,y," "))))," ",",")
This replaces the junk words with a space. Regardless of how many junk words sit between words, or how many leading or trailing spaces result, TRIM reduces it to the words separated by a single space. Substituting a comma for each space then gives your result.
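The space-then-TRIM variant can be sketched the same way in Python. Again this is only an illustration under assumptions: extract_good is a hypothetical name, and as with the worksheet formula, a good string that itself contains a space would be split in two.

```python
from functools import reduce


def extract_good(muddle: str, junk_words: list[str], sep: str = ",") -> str:
    """Hypothetical helper: junk -> space, trim, then space -> separator."""
    # Replace every junk word with a single space.
    spaced = reduce(lambda m, j: m.replace(j, " "), junk_words, muddle)
    # Dropping the empty pieces has the same effect as TRIM: no leading,
    # trailing or repeated spaces survive.
    words = [w for w in spaced.split(" ") if w]
    return sep.join(words)


print(extract_good("boomoowoocoowoo", ["boo", "woo"]))  # moo,coo
```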
There's no single-formula solution if the junkwords list is not fixed.
Instead, you may choose to use the SUBSTITUTE() function in each cell of the "Extracted Strings" column to substitute all occurrences of each junk word in muddle, i.e. substitute "boo" in muddle, then substitute "voo" in the resulting string, then "noo" in that result, and so on. The last cell holds the final result.
One point to note, though: you need to ensure there is no substring / partial-string problem among the junk words, or you need to define the order of processing for the solution to be "complete". Consider the following:
junk words = abc, def, cde
muddle = 1234abcdef5678
If you process the junk words in the above order, you get "12345678".
If you process the junk words in reverse order, you get "1234abf5678".
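The order dependence is easy to check with a short Python sketch that applies the removals one after another (remove_in_order is a hypothetical name used only here):

```python
def remove_in_order(muddle: str, junk_words: list[str]) -> str:
    """Hypothetical helper: remove each junk word in the order given."""
    for word in junk_words:
        muddle = muddle.replace(word, "")
    return muddle


junk = ["abc", "def", "cde"]
print(remove_in_order("1234abcdef5678", junk))        # 12345678
print(remove_in_order("1234abcdef5678", junk[::-1]))  # 1234abf5678
```

In the reverse run, removing "cde" first breaks up both "abc" and "def", so neither can be matched afterwards.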

Excel TEXT function - Number Positioning

I'm calculating the difference between 2 columns of data and calculating the numeric difference and % increase. I wanted to combine these two values in one cell using the text function.
The problem: I have successfully done this in Excel but have a formatting problem. I have separated the numeric and percent difference with the delimiter "|". Sometimes the % difference value is two digits and sometimes it's one digit. I'd like to have a placeholder for the tens digit so all of the delimiters align in the column. Is there any way to do this using the function?
For example, you could solve this problem by adding "000" in the format_text argument for the second text function, but I don't want any leading zeros in my display cell.
Thank you,
You can use ? to add leading or trailing space:
= H70-F70 & Text(H70/F70-1, " | ??%")
You can also use Monospaced font like Courier New for better alignment.
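For comparison, the effect of the ? placeholder (reserve the space, but show no zero) is similar to right-aligning a number in a fixed-width field. A rough Python sketch, assuming whole-number percentages; change_label is a hypothetical name, and Python's rounding of exact halves differs from Excel's:

```python
def change_label(new: int, old: int) -> str:
    """Hypothetical helper: 'difference | percent' with a padded percent."""
    diff = new - old
    pct = round((new / old - 1) * 100)
    # ">2d" right-aligns the integer in a field two characters wide, so a
    # one-digit percentage gets a leading space instead of a leading zero.
    return f"{diff} | {pct:>2d}%"


print(change_label(105, 100))  # 5 |  5%
print(change_label(110, 100))  # 10 | 10%
```

As in the worksheet, the delimiters only line up fully when the difference part has a constant width too and the font is monospaced.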

Concatenate Custom Function

On a daily basis I need to load data to one of our systems. However, Excel deletes the leading zeros in front of the contractor IDs, so I have to add THREE zeros manually. I normally use the CONCATENATE function, but now the IDs are coming in differently, so some IDs only need TWO zeros added.
example:
ID
911111
I use concatenate to make it look like:
000911111
I came up with the IF formula that detects if the ID starts with a number NINE, to concatenate TWO zeros and if not, then to add THREE zeros.
example:
=IF(LEFT(A32,1)="9",CONCATENATE("00",A32),CONCATENATE("000",A32))
Now I want to turn this formula into a custom function so I do not have to type it out every time I work on the data each day.
I would really appreciate any suggestions.
In addition to the formatting responses provided in the comments, you could use the RIGHT function to cut off the leading zeroes to the appropriate amount.
For example, assuming A1 holds a string of numbers, between 0 & 9 digits long. We can create text representing a 9 digit string, with as many leading zeroes as necessary, as follows:
=RIGHT(REPT("0",9) & A1,9)
REPT("0",9) tells Excel to repeat the character "0" 9 times. It then tacks on whatever text is in A1. Then it takes only the rightmost 9 characters of the concatenation.
I generally would recommend the Formatting options noted in those comments, unless you need the text to be 9 characters for other formula purposes.
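The pad-then-cut trick is not specific to Excel. A minimal Python sketch of the same steps, where pad_id is a hypothetical name and the width of 9 is taken from the example above:

```python
def pad_id(raw_id, width: int = 9) -> str:
    """Hypothetical helper mirroring RIGHT(REPT("0",9) & A1, 9)."""
    text = str(raw_id)
    # Prepend `width` zeros, then keep only the rightmost `width` characters.
    return ("0" * width + text)[-width:]


print(pad_id(911111))  # 000911111
```

Because the cut is always made from the right, it does not matter whether an ID needs two or three zeros added; each one ends up the same length.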

retrieve part of the info in a cell in EXCEL

I vaguely remember that it is possible to parse the data in a cell and keep only part of the data after setting up certain conditions. But I can't remember what exact commands to use. Any help/suggestion?
For example, A1 contains the following info
0/1:47,45:92:99:1319,0,1320
Is there a way to pick up, say, 0/1 or 1319,0,1320 and remove the rest unchosen data?
I know I can do text-to-column and set the delimiter, followed by manually removing the "un-needed" data, but my EXCEL spreadsheet contains 100 columns X 500000 rows with each cell looking similar to the data above, so I am afraid EXCEL may crash before finishing the work. (have been trying with LEFT, LEN, RIGHT, MID, but none seems to work the way I had hoped)
Any suggestion will be greatly appreciated.
I think what you are looking for is combination of find and mid, but you'll have to work out exactly how you want to split your string:
A1 = 0/1:47,45:92:99:1319,0,1320 //your string
B1 = FIND(":",A1) //position of the first ":" symbol
C1 = LEN(A1) - B1 //number of characters after that ":"
=LEFT(A1,B1-1) //everything left of the symbol
=MID(A1,B1+1,C1) //everything right of the symbol (replace C1 with a smaller count to extract only one portion)
//You can also nest the formulas to save cells, usually at the cost of more calculation
This is the concept, you will probably need to do it in multiple cells to split a string as long as yours. For multiple splits you probably want to replicate this command to target the result of previous right/mid command.
That way, you will get cell result sequence like:
0/1:47,45:92:99:1319,0,1320; 47,45:92:99:1319,0,1320; 92:99:1319,0,1320; 99:1319,0,1320......
From each of those you can retrieve left side of the string up to ":" to get each portion of a string.
If you are working with a large table you probably want to look into VB scripting. To my knowledge there is no single excel command that can take 1 cell and split it into multiple ones.
Let me try to help with this. I'm not a professional, so you may run into some problems. My solution adds two columns next to the source column; you can build further formulas on the same principle.
Column B Formula:
=LEFT(A2,FIND(":",A2,1)-1)
Column C Formula:
=RIGHT(A2,LEN(A2)-FIND("|",SUBSTITUTE(A2,":","|",LEN(A2)-LEN(SUBSTITUTE(A2,":","")))))
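What the two formulas return can be sketched in Python. Note this swaps in Python's own index and rindex rather than the SUBSTITUTE marker trick, and first_field and last_field are hypothetical names for the illustration:

```python
def first_field(cell: str, sep: str = ":") -> str:
    """Text before the first separator, like the column B formula."""
    return cell[: cell.index(sep)]


def last_field(cell: str, sep: str = ":") -> str:
    """Text after the last separator, like the column C formula."""
    return cell[cell.rindex(sep) + 1 :]


cell = "0/1:47,45:92:99:1319,0,1320"
print(first_field(cell))  # 0/1
print(last_field(cell))   # 1319,0,1320
```

Like FIND returning an error, both helpers raise an exception when the separator is missing, so rows without a ":" would need separate handling.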
Given your statement of having 100 columns, I imagine in some instances you need to isolate characters in the middle of your string, so LEFT and RIGHT may not always work. However, use them where you can.
Assuming your string is in cell F2: 0/1:47,45:92:99:1319,0,1320
=LEFT(F2,3)
This returns 0/1 which are the first 3 characters in the string counting from the left. Likewise, Right functions similarly:
=RIGHT(F2,4)
This returns 1320, returning the 4 characters starting from the right.
You can use a combination of MID and FIND to dynamically find characters or strings based on defined characters. Here are a few examples of ways to dynamically isolate values in your string. Keep in mind the key to these examples is the nested FIND formula, where the innermost FIND locates the first character to start at in the string.
1) Return 2 characters after the second : character
In cell F2 I need to isolate the "92":
=MID(F2,FIND(":",F2,FIND(":",F2)+1)+1,2)
The innermost FIND locates the first : in the string (the 4th character). Adding 1 moves the starting point to the 5th character, past the first :, so the second FIND does not see it and searches for the next : from there. This second FIND returns 10, as the second : is the 10th character in the string. The final +1 moves one character past it, and MID takes over: starting at the 11th character, return 2 characters. The number of characters returned is set by the 2 at the end of the formula (the last argument of MID).
2) In this case I need to find the 2 characters after the 3rd : in the string. In this case "99":
=MID(F2,FIND(":",F2,FIND(":",F2,FIND(":",F2)+1)+1)+1,2)
You can see we have simply added one more nested Find to the formula in example 1.
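The nesting pattern generalises to "find the n-th colon". Here is a small Python sketch using 1-based positions as Excel does; find_nth and mid are hypothetical helpers written for this illustration, not Excel or library functions.

```python
def find_nth(text: str, sep: str, n: int) -> int:
    """1-based position of the n-th `sep`, like n nested FIND calls."""
    pos = 0
    for _ in range(n):
        # index() is 0-based, so adding 1 gives the 1-based position of this
        # match, which is also the 0-based index where the next search starts.
        pos = text.index(sep, pos) + 1
    return pos


def mid(text: str, start: int, length: int) -> str:
    """Excel-style MID with a 1-based start position."""
    return text[start - 1 : start - 1 + length]


cell = "0/1:47,45:92:99:1319,0,1320"
print(find_nth(cell, ":", 2))                    # 10
print(mid(cell, find_nth(cell, ":", 2) + 1, 2))  # 92
print(mid(cell, find_nth(cell, ":", 3) + 1, 2))  # 99
```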

Splitting words in different combinations in Excel

I would like to get only bold part of the text in new column i.e
Czechowice - Dziedzice AMBRA
Białystok DEF
Komorniki
Bielsko Biała EC
Farmacja Luboń
Gorzów Wlkp.
Grudziądz EC
Kędzierzyn-Koźle EC
Ostrowiec Świętokrzyski EC
Puck T
Przeworsk+Sklep
Białystok + sklep
Kielce (Masłów)
Barlinek + Myślibórz
Lublin TR
Biała Podlaska TR
Puławy II TR
Toruń DLS TR
Kraków SJ TR
I tried to use IF(ISNUMBER(SEARCH("AMBRA";B2));LEFT(B2;LEN(B2)-6) for all options but it's very inefficient. Any help is appreciated.
This needs VBA to create a custom function. You can use the Characters property of the Range to return information about individual characters within a cell:
Public Function getBoldText(cellReference As Range) As String
    Dim i As Long
    'Loop through each character in the cell (character positions start at 1)
    For i = 1 To cellReference.Characters.Count
        'If the character is bold then...
        If cellReference.Characters(i, 1).Font.Bold Then
            '...add it to the output
            getBoldText = getBoldText & cellReference.Characters(i, 1).Text
        End If
    Next i
End Function
Create a module, paste in this code, and then you can use e.g. =getBoldText(A1) within your worksheet to return only bold text from a cell. This function only works with single cells, and returns #VALUE! if the cell contains anything other than text.
Note I would have used For Each...Next for the loop, but despite appearances Characters isn't actually a collection so you can't iterate over it.
@simoco has made the very pertinent point:
How do you imagine Excel should detect which part of string should be left, if there is not any pattern in your values?
However, the OP has mentioned having about 10,000 rows, so although Excel may not be able to provide a full solution it may still be of some help. The OP seems, understandably, to have lost interest (perhaps still working through the 10,000 manually?), but the problem is not unusual, and demonstrating an approach, even one that is only partially successful, may be of use to others.
So I put the OP's list in A1:A19. From observation, most of what is not emboldened starts after the last space, so in B1:
=LEN(A1)-LEN(SUBSTITUTE(A1," ",""))
By comparing the original length of the strings with their length after removing spaces we obtain the number of spaces in the original string.
In C1:
=IF(RIGHT(A1)=".",A1,SUBSTITUTE(A1," ",REPT(" ",LEN(A1)),B1))
Here we find the last space and replace it with as many spaces as the length of the original string. (There is a reason for doing so!) Also, noticing that Gorzów Wlkp. ends in a full stop (to be retained) whereas nothing to be removed does, we make a specific exception for strings ending in a full stop.
In D1:
=IF(ISERROR(C1),A1,LEFT(C1,LEN(A1)))
Having inserted a large number of spaces between the text to be kept and not to be kept we now select from the left the number of characters that we started with - so mostly what we want to keep plus a lot of blanks. Where there were no spaces to start with the formula in C1 returns an error, so in those cases we take the whole of the original string instead.
In E1:
=IFERROR(LEFT(D1,FIND("+",D1)),D1)
This attempts to cope with at least some of the data containing plus signs (+). With one exception in the data sample, Barlinek + Myślibórz, the plus sign and any characters following it are to be removed. The formula above removes those following characters.
In F1:
=TRIM(SUBSTITUTE(E1,"+",""))
This is mainly tidying, because there is not much more that can be 'automated'. The + signs are removed, along with the surplus spaces that were inserted earlier.
So although the original data was 'unnormalised', of the sample of 19 only the exception mentioned above and Toruń DLS TR and Kraków SJ TR are not as required. For these last two DLS and SJ are retained where they should not be - whether worth a further processing step may depend on what is in the rest of the 10,000 entries - but it may be better anyway to fall short by omission rather than commission.
Stripping Myślibórz from Barlinek + Myślibórz may be considered more of an issue but it may be possible to review all entries containing a + and substitute say & for + and append a space in cases such as Barlinek + Myślibórz, which would then result in Barlinek & Myślibórz (unless the choice is made to continue with a step that reverses the replacement).
So for the example data, formulae can handle all but about two of the 19 cases. Extrapolated to the 10,000 rows, 8,947 as required might be considered at least a good start, and further 'rules' might be added in other columns to deal with any other observable patterns in the data in column F, which at least is close to what is required.
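To check the column B to F steps against the sample data without a spreadsheet, the chain can be sketched in Python. This is a rough transcription under assumptions, not the worksheet itself: keep_leading_name is a hypothetical name, and None stands in for the error that the column C formula returns when there is no space.

```python
def keep_leading_name(a: str) -> str:
    """Hypothetical helper following columns B to F for one value of column A."""
    b = a.count(" ")                      # B: number of spaces
    if a.endswith("."):                   # C: exception for a trailing full stop
        c = a
    elif b == 0:
        c = None                          # C: no space to replace (error in Excel)
    else:
        head, _, tail = a.rpartition(" ")
        c = head + " " * len(a) + tail    # C: widen the last space
    d = a if c is None else c[: len(a)]   # D: keep the original length from the left
    plus = d.find("+")
    e = d if plus == -1 else d[: plus + 1]  # E: cut after the first "+"
    # F: drop "+" and collapse the surplus spaces, as TRIM does
    return " ".join(p for p in e.replace("+", "").split(" ") if p)


for name in ["Białystok DEF", "Gorzów Wlkp.", "Przeworsk+Sklep", "Toruń DLS TR"]:
    print(keep_leading_name(name))
```

The last sample prints Toruń DLS, which reproduces the shortfall described above where DLS is retained.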

Resources