Related
I have a column containing a string value as shown is the example below :
ZAE/GER-ERT/HEZ/PDC
The idea is to extract the first trigraph (ZAE in this extract) and a second one based a rule.
The rule is, if there is a '-' separating two trigraphs, we don't extract them, we just take the first trigraph after a '/' and without a '-' after it.
We then use a - to separate the two results, here is the aim for the example : ZAE-HEZ
I would like to get this value in a new calculated column.
I've tried to play with the indexes based on the Find() and ExtractRX() functions, but couldn't make it work.
Thanks in advance !
I am not sure this is the simplest way, but it works for your example (assuming the strings are always alphanumeric in chunks of 3).
You can do it via an intermediate column (for sanity, although you could put the [tmp] formula directly into the final column):
[tmp] as
RXReplace(RXReplace([your_column],'\\w{3}-\\w{3}','','g'),'/+','/','g')
This removes any double trigraph like GER-ERT and then removes any leftover double /
Then the final column splits [tmp] by / and concatenates the first and second item
Concatenate(Split([tmp],'/',1),'-',Split([tmp],'/',2))
This is my first StackOverflow question, so apologies if I am unclear.
Currently, my work uses an Excel tracking doc to log project info. The column info is like so:
CELL B1 (Project Number) =IF(B2=""," ",MID(B2,FIND("P2",B2),9))
CELL B2 (Project Name) Client / P2XXXXXXX / Name
Thus, the P2XXXXXXX gets pulled out of B2 and populated into B1.
However, management has recently switched systems, so now, some project numbers have the P2XXXXXXX format and others have a PRJ-XXXXX format.
So we need a formula the produces nothing if the cell is blank and EITHER the P2XXXXXXX number or PRJ-XXXXX number if the cell is not blank.
Is it possible? If any further details are needed, let me know. Thanks in advance!
Well, if the / is always there then this can work:
IF(B2="","",MID(B2,FIND("/",B2,1)+2,9))
assuming the name is always 9 characters.
String Between Two Same Characters
Maybe the next month your company will start using a different first letter or could add more numbers e.g. SPRXXXXXXXXXX. So you could solve this problem by extracting whatever is between those two slashes.
=IF(B2="","",TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)))
Find the first character =FIND("/",B2), but we need the next one:
=FIND("/",B2)+1
Find the second character but search from the postition after the first found:
=FIND("/",B2,FIND("/",B2)+1)
Now get the string between them:
=MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)
(note how the last minus was 'converted' from a plus to a minus (- + + = -)).
Remove the leading and trailing spaces:
=TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1))
Add the condition when the cell is blank:
=IF(B2="","",TRIM(MID(B2,FIND("/",B2)+1,FIND("/",B2,FIND("/",B2)+1)-FIND("/",B2)-1)))
Here's another way using LEFT and RIGHT:
=IF(B2="","",TRIM(LEFT(RIGHT(B2,LEN(B2)-FIND("/",B2)),FIND("/",B2))))
Although you can solve this problem with a combination of slicing, trimming, and complex conditionals, the most expressive and easy to maintain solution is to use regular expressions. Regular expressions have a bit of a learning curve, but there's a great playground website where you can experiment with them, and this page has a pretty good writeup on how regular expressions work in excel.
Specifically, this regular expression addresses the two naming conventions you've highlighted, but it can be updated to support more naming conventions as your company inevitably adds more:
P(RJ-)?((\d){9}|(\d){5})
To break that down from left to right:
P: both patterns start with a "P"
(RJ-)? One pattern follows with "RJ-", but the other doesn't. This is a grouped part of the pattern, and the question mark means that this part of the pattern is optional.
((\d){9}|(\d){5}): by far the nastiest part, but this basically means that there is going to be a sequence of numbers (\d), and there will either be nine of them or five of them. By wrapping the whole thing in parenthesis, they are always the second captured group, no matter the length of the sequence of numbers. This means that you can always extract the project id by looking at the value of the second capture group.
You can also make the expression more generalized by replacing ((\d){9}|(\d){5}) with simply (\d+). That just means "one or more digits." That gives you a much more simplified overall expression of this:
P(RJ-)?(\d+)
Depending on whether or not you care about validating strictly that project ids are 5 OR 9 digits long, that pattern above might be suitable, and it has the benefit of being more flexible. Still, the project ID is in the second captured group.
Abridged Question:
If I have a concatenated string of "|#|#|#|...|#|", how can I apply a multiplier to each of the numbers and update the concatenated text? For example, for |4|12|8|, multiply by a factor of 2 and update the concatenated text to |8|24|16|.
Background
I have three columns of interest. The first column contains a date, the second an amount or factor, and the third column concatenates data into the format "|#|#|...|#|" (e.g., |2|5|, |2|5|12|, |4|12|, etc.). At times, a multiplying factor needs to be applied to the concatenated data, and the individual numbers would need to be updated accordingly.
An example would be—
Date Amt Concatenated Data
01/01/18 2 |2|
01/05/18 5 |2|5|
02/06/18 12 |2|5|12|
03/25/18 -3 |4|12|
03/31/18 8 |4|12|8|
04/01/18 F2 |8|24|16| (factor of 2 applied)
04/15/18 12 |8|24|16|12|
04/01/18 F1/4 |2|6|4|3| (factor of 1/4 applied)
With a formula, how can I apply the factor to the concatenated data, and update the individual numbers?
I'm bound by the following conditions:
Excel 2007, so no TEXTJOIN function
No VBA or UDFs (due to security policies)
Individual numbers are dynamic (i.e., I can't use a static value for the "old_text" parameter of the SUBSTITUTE formula)
Amount of individual numbers within concatenated data is also dynamic (may contain one number, or may contain dozens of different numbers)
I can pull out the individual numbers using an array formula. I can even then multiply those numbers by the factor to produce an array result. However, I can't rebuild the concatenated data, because CONCATENATE doesn't work on an array. I've also tried SUBSTITUTE, but I can't iterate through the "|" separators. I can only substitute a given segment (e.g., change all entries of "|2|" to "|4|"). Nesting SUBSTITUTE or using individual columns won't work, since it could potentially involve dozens of instances.
Just to add some info on the concatenated data:
Amt>0, then value is concatenated to the end of the previous concatenated value
Amt<0, begin reducing individual numbers in concatenated value (CV) until reduction amount reached (e.g., for |2|5|12| and Amt=-3, reduce CV to |4|12|, which is -2 from the first segment and -1 from the second segment)
Amt reduction is limited to the sum of the previous CV's individual numbers (e.g., for |4|12|, the reduction cannot exceed 16)
Amt=F#, indicates a multiplying factor, and the CV's numbers need to be updated
The CV has no max (could have dozens to hundreds of individual numbers, with numbers going from 1 to 100,000+), other than any max applied by Excel itself on string length
HIGH LEVEL
Four parts to this solution
They satisfy pre-requisites (2007 compatibility, no VB, no Office 365 requirement, no custom VB functions, provide for complete 'dynamic' nature of variable length of cells to concatenate)
Caveat: to best knowledge / research, there is no parsimonious single-cell function & therefore an interim step has been proposed)
One more caveat: I imagine the simple 'hack' of wrapping a graph around the delimited data is out of question (see 'Other/Various' below ☺)
PARTS 1-4
Accompanying parts 1-4 below are functions which relate to the following screenshot:
I have also uploaded / amended to meet requirements of Google Sheets (see here)
Parts 1 & 2:
Similar in that they rely upon FilterXML technique to count component / terms, and split cells respectively:
Part 1:
=COUNT(2*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num")))
Note: google sheets doesn't recognise FilterXML, so have amended technique/functions accordingly. For instance, above can be determined using counta on the split cells in Part 2 (easier / much more simpler than proposed approach above, albeit less robust given any cells lying to the right of the split cells will interfere with ordinary functionality of this approach).
Part 2:
It's either a manual approach, a fancy series of 'mid' &/or substitute / left/right functions, or the following FilterXML code which, per various sources (e.g. here) should be compatible with Excel 2007:
=IF(LEFT(C12,1)="F",1*SUBSTITUTE(C12,"F",""),1)*TRANSPOSE(FILTERXML("<AllText><Num>"&SUBSTITUTE(LEFT(MID(D12,2,LEN(D12)-1),LEN(MID(D12,2,LEN(D12)-1))-1),"|","</Num><Num>")&"</Num></AllText>","//Num"))
Commonality with Part 1 (re: FilterXML) can be seen - the only difference is that the count(Part 1) has been replaced with the transformation (multiplicative factor, as given in O.P's Q).
Part 3
Nothing fancy here - a simple concatenation (which is a far cry from a 'recursive' substitution function, I know, but hey - it does the trick and can always be placed in a mirror copy of the original sheet to avoid space issues/cell interaction issues)
=IF(H12="","",IF(G24="","|","")&G24&H12&"|")
Part 4
Thanks to the number of terms derived in Part 1, an offset function can easily determine the final cell pertaining to the concatenated 'build up' of 'transformed' values (per Part 3):
=OFFSET(H31,0,E31-1,1,1)
OTHER / VARIOUS
Various other proposals and 'workarounds' exist; unfortunately, these appear to fall short in one way or another of the pre-requisites set forth, videlicit:
a) Function/formula based
b) No VB
c) Excel 2007
d) Dynamic (variable/unknown number of terms)
Manual: e.g. function = concatenate(transpose(desired range)), and then components of the concatenate function and pressing F9 to convert to calculated values, which are readily applicable in the concatenate function. Disadvantage: time consuming in relation to 'automated solution' (needs to be done for each applicable toy). Advantage: no additional 'spreadsheet real estate' required, quicker/straightforward implementation in first instance.
Variants of the 'build-up' method: e.g. per Part 3, however, this alone does not ensure for an automated approach across an unknown number of terms in the original concatenated list.
Have mentioned in a previous solution (here), but may be case that you are eligible for Office 365 functionality whilst on a previous version of Excel (see Office Insider here)
Other proposals (above/below in this forum mind you) propose textjoin (so not sure if this is a comprehension issue or what ☺)
And yes, as alluded to at outset, you can easily achieve the desired outcome using a simple graph! Just for fun then, sort the data in reverse order, and include the split/delimited values as a bar graphs' "x-values" (which, by defn. for this type of graph, will now appear along the ordinary Cartesian 'y/vertical' axis)...
Zero points for this but thought it was an interesting discovery on my part!
(and if still in doubt, here's what the 'graph' would look like if I didn't kill everything except for the axis labels...):
Numerous references for relevant other items above, including research areas, as follows:
Excel Champs
StackOverflow - alternative applications for FilterXML
JUST ONE MORE THING...
In true Columbo style modus operandi, other ideas/approaches considered:
Application of pivot table?
Constructing matrices: I got a solution with a series of offset functions, but couldn't think of a feasible way to implement given space issues
Converting the split cells into a long digit through summation: e.g. 8 22 16 = 80 000 + 22 00 + 16. Using a substitute function with text (long digit, "General") I was able to successfully introduce the delimiter character ('|') for pairs of adjacent 'tuples' (e.g. I could get '8|2216', '822|16', but then a 'build up' formula where one cell depends upon converted values of the previous and so one was required once more, which landed me back to the proposal I have set out above
fyi - the matrix consideration only solves tuples of 2, for n-dimensional /combination one would need to 'pass' a string of characters over its mirror copy - e.g. {6,10,22} would pass over {6,10,22}, ignoring duplicate values would yield a trapezium as follows:
6
10
22
6
10
22
6
10
22
after the copy has 'passed' over the original (first row), we have the desired combination (22,10,6) (on the 'diagonal' such a matrix). This is akin to how Fourier Transforms work (kind of); but that aside, it was tempting to construct a matrix like this, but couldn't be bothered at this stage.
Will probably turn out to be a far simpler way that someone comes up with (I won't be the only person surprised based upon the various sources I've considered...)
I would like to ask for your help with the formulation of a formula in Excel in order to compare the total number of search results upon using different sets of separator characters.
As I have multiple columns with content, as in the example below, I thought it would be possible to Count the search results in some way and do this for each column separately ( I would actually prefer to treat each column separately).
A
1 L-516-S-221-S-223
2 H-140.STR3
3 ST0 XP 23-9
4 etc.......
Preferably, I would like to use a varying a set of separator characters in order to determine the impact on the number of search results based on this set of separator characters. Logically, with an increasing number of separators more results will be returned (depending on separators included in the cell values of course).
The set of characters that I would like to experiment with is: “-_ .,;: “
Hopefully this makes sense and someone is able to help me out. Thank you.
Kind regards,
P
In your example - on its own will detect all three instances but for an overview you might construct a grid (say B1:H1 of your separators, including a space rather than an empty cell) and ColumnA each column in turn (maybe via links) then a formula in B2 such as:
=--ISNUMBER(FIND(B$1,$A2))
copied across to ColumnH and down to suit.
Alternative formula (for different question):
=IF(LEN($A2)-LEN(SUBSTITUTE($A2,B$1,""))>0,LEN($A2)-LEN(SUBSTITUTE($A2,B$1,""))+1,0)
Assumes, for example, no trailing spaces and separators are always separated. Results are not necessarily cumulative.
I vaguely remember that it is possible to parse the data in a cell and keep only part of the data after setting up certain conditions. But I can't remember what exact commands to use. Any help/suggestion?
For example, A1 contains the following info
0/1:47,45:92:99:1319,0,1320
Is there a way to pick up, say, 0/1 or 1319,0,1320 and remove the rest unchosen data?
I know I can do text-to-column and set the delimiter, followed by manually removing the "un-needed" data, but my EXCEL spreadsheet contains 100 columns X 500000 rows with each cell looking similar to the data above, so I am afraid EXCEL may crash before finishing the work. (have been trying with LEFT, LEN, RIGHT, MID, but none seems to work the way I had hoped)
Any suggestion will be greatly appreciated.
I think what you are looking for is combination of find and mid, but you'll have to work out exactly how you want to split your string:
A1 = 0/1:47,45:92:99:1319,0,1320 //your number
B1 = Find(“:“,A1) //location of first ":" symbol
C1 = LEN(A1) - B1 //character count to copy ( possibly requires +1 or -1 after B1.
=Left(A1,B1) //left of your symbol
=Mid(A1,B1+1,C1) //right size from your symbol (you can also replace C1 with better defined number to extract only 1 portion
//You can also nest the statements to save space, but usually at cost of processing quantity increase
This is the concept, you will probably need to do it in multiple cells to split a string as long as yours. For multiple splits you probably want to replicate this command to target the result of previous right/mid command.
That way, you will get cell result sequence like:
0/1:47,45:92:99:1319,0,1320; 47,45:92:99:1319,0,1320; 92:99:1319,0,1320; 99:1319,0,1320......
From each of those you can retrieve left side of the string up to ":" to get each portion of a string.
If you are working with a large table you probably want to look into VB scripting. To my knowledge there is no single excel command that can take 1 cell and split it into multiple ones.
Let me try to help you about this, I am not a professional so you may face some problems. First of all my solution contains 2 columns to be added to the source column as you can see below. However you can improve formulas with this principle.
Column B Formula:
=LEFT(A2,FIND(":",A2,1)-1)
Column C Formula:
=RIGHT(A2,LEN(A2)-FIND("|",SUBSTITUTE(A2,":","|",LEN(A2)-LEN(SUBSTITUTE(A2,":","")))))
Given you statement of having 100x columns I imagine in some instances you are needing to isolate characters in the middle of your string, thus Left and Right may not always work. However, where possible use them where you can.
Assuming your string is in cell F2: 0/1:47,45:92:99:1319,0,1320
=LEFT(F2,3)
This returns 0/1 which are the first 3 characters in the string counting from the left. Likewise, Right functions similarly:
=RIGHT(F2,4)
This returns 1320, returning the 4 characters starting from the right.
You can use a combination of Mid and Find to dynamically find characters or strings based off of defined characters. Here are a few examples of ways to dynamically isloate values in your string. Keep in mind the key to these examples is the nested Find formula, where the inner most Find is the first character to start at in the string.
1) Return 2 characters after the second : character
In cell F2 I need to isolate the "92":
=MID(F2,FIND(":",F2,FIND(":",F2)+1)+1,2)
The inner most Find locates the first : in the string (4 characters in). We add the +1 to move to the 5th character (moving beyond the first : so the second Find will not see it) and move to the next Find which starts looking for : again from that character. This second Find returns 10, as the second : is the 10th character in the string. The Mid formula takes over here. The formula is saying, Starting at the 10th character return the following 2 characters. Returning two characters is dictated by the 2 at the end of the formula (the last part of the Mid formula).
2) In this case I need to find the 2 characters after the 3rd : in the string. In this case "99":
=MID(F2,FIND(":",F2,FIND(":",F2,FIND(":",F2)+1)+1)+1,2)
You can see we have simply added one more nested Find to the formula in example 1.