The formulas below are very simple, but I am missing some understanding, which is frustrating.
A very simple text extraction formula:
MID(A1, start_num, num_chars)
A simple text-finding formula:
SEARCH(find_text, within_text, start_num)
Combined, these two formulas can find and extract text between two characters in a cell, for instance underscores, parentheses or commas.
So, for example, to extract
text to extract >>> Jimbo Jones
from a cell containing parentheses, an easy formula would be:
Sample text A1 = Incident Report No.1234, user (Jimbo Jones) Status- pending
Formula:
=MID(A1, SEARCH("(", A1)+1, SEARCH(")", A1) - SEARCH("(", A1) -1)
Extracted text = Jimbo Jones
The logic is simple:
1. Identify the cell containing the text
2. Find the start number by searching for the first delimiter character
3. Find the end number of the text being extracted by searching for the second delimiter character
4. Subtracting the start number from the end number gives the number of characters to extract
Without SEARCH, the hard-coded equivalent is:
=MID(A1,32,11) = Jimbo Jones
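The four steps above can be sketched outside Excel as well. The following is a minimal Python sketch, not Excel code: `search` and `mid` are hypothetical helpers that only loosely imitate SEARCH and MID (1-based positions, no wildcard support).

```python
def search(find_text, within_text, start_num=1):
    """1-based, case-insensitive position, loosely modelled on SEARCH (no wildcards)."""
    idx = within_text.lower().find(find_text.lower(), start_num - 1)
    if idx == -1:
        raise ValueError("not found")  # Excel would show #VALUE! here
    return idx + 1


def mid(text, start_num, num_chars):
    """1-based substring, loosely modelled on MID."""
    return text[start_num - 1:start_num - 1 + num_chars]


a1 = "Incident Report No.1234, user (Jimbo Jones) Status- pending"
start = search("(", a1) + 1        # first character after "("
length = search(")", a1) - start   # same as SEARCH(")") - SEARCH("(") - 1
print(start, length, mid(a1, start, length))  # prints: 32 11 Jimbo Jones
```

The printed start and length match the hard-coded 32 and 11 used in the formula without SEARCH.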
But if I want to extract text between commas or other identical characters (such as quotation marks, apostrophes or asterisks), I need to use the following formula (which I found suggested):
=MID(A1, SEARCH(",", A1)+1, SEARCH(",", A1, SEARCH(",", A1) +1) - SEARCH(",",A1) -1)
Sample text A1 = Incident Report No.1234 user, Jimbo Jones, Status- pending
Extracted text = Jimbo Jones
But how do I nominate other boundaries, such as the text between the 3rd and 4th commas?
Sample text A1 = Incident Report, No.1234, user, Jimbo Jones, Status- pending
The reason for my confusion is that, in the formula above, Excel finds the second occurrence of the comma no matter where the commas are in the text, yet the part of the formula that finds it looks identical to the part that finds the first comma. The character count somehow seems to assume that I want the second comma, not the first. How do I instruct the formula to find subsequent occurrences of the comma, such as the 3rd, 4th or 9th?
And what am I not understanding about why the formula finds the 2nd comma?
Cheers!
To explain what you are confused about:
At first sight it looks as if the same formula is used to find the 1st and the 2nd occurrence of the searched symbol. On a second look you may notice the start_num argument, which tells the formula where to start looking. If you pass the location of the first symbol plus 1 (SEARCH(",", A1)+1) as that argument, the formula starts looking for the symbol in this part: ' No.1234, user, Jimbo Jones, Status- pending', and returns 25, the position of the second comma counted from the start of the whole text. You got the position of the 1st occurrence with the first formula and the position of the 2nd occurrence with the formula above. Subtract one from the other to get the length, and that's it.
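To make the role of start_num concrete, here is a minimal Python sketch of the same chaining. `search` is a hypothetical stand-in for SEARCH (1-based, no wildcards), and `nth_position` is an illustrative helper that simply repeats the "start one past the previous hit" step n times.

```python
def search(find_text, within_text, start_num=1):
    """1-based, case-insensitive position, loosely modelled on SEARCH (no wildcards)."""
    idx = within_text.lower().find(find_text.lower(), start_num - 1)
    if idx == -1:
        raise ValueError("not found")  # Excel would show #VALUE! here
    return idx + 1


def nth_position(find_text, within_text, n):
    """Position of the n-th occurrence, found by restarting one past the previous hit."""
    pos = 0
    for _ in range(n):
        pos = search(find_text, within_text, pos + 1)
    return pos


a1 = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
first = search(",", a1)               # 16
second = search(",", a1, first + 1)   # 25: same call, later starting point
third = nth_position(",", a1, 3)      # 31
fourth = nth_position(",", a1, 4)     # 44
print(repr(a1[third:fourth - 1]))     # prints: ' Jimbo Jones'
```

Nesting one more SEARCH inside start_num for every extra occurrence is what the loop does, which is why the pure MID/SEARCH formula grows so quickly for the 3rd or 4th comma.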
Possible solutions:
If you have Office 365, use TEXTAFTER() and TEXTBEFORE(), as others have stated; both accept an instance number as an argument:
=TEXTAFTER(TEXTBEFORE(A1,",",4),",",3)
Result: " Jimbo Jones" (with a leading space).
Then you can use TRIM() to get rid of unwanted spaces at the beginning and end.
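As a rough illustration of what the instance number does, here is a small Python sketch. `text_before` and `text_after` are hypothetical helpers that imitate only the basic (text, delimiter, instance) form of TEXTBEFORE and TEXTAFTER; the real functions have more optional arguments.

```python
def text_before(text, delimiter, instance):
    """Everything before the given occurrence of delimiter."""
    parts = text.split(delimiter)
    if instance > len(parts) - 1:
        raise ValueError("instance not found")  # Excel would show #N/A here
    return delimiter.join(parts[:instance])


def text_after(text, delimiter, instance):
    """Everything after the given occurrence of delimiter."""
    parts = text.split(delimiter)
    if instance > len(parts) - 1:
        raise ValueError("instance not found")
    return delimiter.join(parts[instance:])


a1 = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
raw = text_after(text_before(a1, ",", 4), ",", 3)
print(repr(raw), repr(raw.strip()))  # prints: ' Jimbo Jones' 'Jimbo Jones'
```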
If you use an older version of Office, you can use SUBSTITUTE() as a workaround, as it lets you replace the nth occurrence of a specific symbol in the text.
Choose a symbol that does not appear in your text and change the 3rd and 4th occurrences of your searched symbol to it. Then look for them (in this example we substitute , with #):
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
A little explanation, with the intermediate steps placed in cells B1 to B5:
B1: =SUBSTITUTE(A1,",","#",3)
B2: =SUBSTITUTE(A1,",","#",4)
B3: =SEARCH("#",B1)
B4: =SEARCH("#",B2)
B5: =MID(A1,B3+1,B4-(B3+1))
Full formula:
=MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1))
Trimmed:
=TRIM(MID(A1,SEARCH("#",SUBSTITUTE(A1,",","#",3))+1,SEARCH("#",SUBSTITUTE(A1,",","#",4))-(SEARCH("#",SUBSTITUTE(A1,",","#",3))+1)))
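The SUBSTITUTE workaround can be traced step by step in a short Python sketch. `substitute_nth` is a hypothetical helper imitating SUBSTITUTE with an instance number; the variable names b3 and b4 only echo the intermediate SEARCH results and are not real cell references.

```python
def substitute_nth(text, old, new, n):
    """Replace only the n-th occurrence of old, like SUBSTITUTE with instance_num."""
    pos = -1
    for _ in range(n):
        pos = text.find(old, pos + 1)
        if pos == -1:
            return text  # fewer than n occurrences: text is returned unchanged
    return text[:pos] + new + text[pos + len(old):]


a1 = "Incident Report, No.1234, user, Jimbo Jones, Status- pending"
b3 = substitute_nth(a1, ",", "#", 3).find("#") + 1  # 1-based position of the 3rd comma
b4 = substitute_nth(a1, ",", "#", 4).find("#") + 1  # 1-based position of the 4th comma
# MID(A1, B3+1, B4-(B3+1)) in 0-based slicing terms:
extracted = a1[b3:b4 - 1]
print(b3, b4, repr(extracted.strip()))  # prints: 31 44 'Jimbo Jones'
```

The marker "#" must not occur in the text already, which is the same assumption the formula makes.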
Thanks for the responses, all. I ground through it using the two formulas I asked about (MID and SEARCH) and I have a result.
It's neither pretty nor elegant, but it extracts the data as required. I will benefit from the tips left in response to my question, as simpler options are available.
Requirement: extract the text between the 3rd and 4th commas using MID and SEARCH
Sample text A15
Incident Report (ammended), No.12545424234, user, Jimbo Jones, Status- pending
=MID(A15,(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1)))))+2,(SEARCH(",",A15,(SEARCH(",",A15,SEARCH(",",A15)+1)+(SEARCH(",",A15)))-(SEARCH(",",A15,SEARCH(",",A15)+1)-(SEARCH(",",A15)))+1)-(SEARCH(",",A15,(1+(SEARCH(",",A15,SEARCH(",",A15)+1))))))-2)
Text extracted
Jimbo Jones
Obviously this solution works on other text, but it's also obviously not easy to quickly amend for other text locations.
Anyway, cheers again for the pointers...
Using an MS Excel formula (no VBA/macro), I would like to extract the text between the Nth and Mth occurrence of a character in a string. Example: I have text in cells A2 and A3, and I would like to extract the text located between the 4th space and the 9th space in each of the following strings.
Column A2: Johnny went to the store and bought an apple and some grapes
Column A3: We were not expecting to go home so early but we had no other choice due to rain
Results:
Column A2: store and bought an apple
Column A3: to go home so early
With Microsoft 365 or Excel 2019 you can use TEXTJOIN() in combination with FILTERXML():
Formula in B1:
=TEXTJOIN(" ",,FILTERXML("<t><s>"&SUBSTITUTE(A1," ","</s><s>")&"</s></t>","//s[position()>4][position()<6]"))
The XPath expression first selects all elements from the 5th word onwards and then returns only the first 5 elements of that array. Therefore //s[position()>4][position()<6] can also be written as //s[position()>4 and position()<10].
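The position filter above amounts to taking the 5th to 9th words. A minimal Python sketch of the same selection, where `words_between_spaces` is a hypothetical helper name and single spaces between words are assumed:

```python
def words_between_spaces(text, first_space, last_space):
    """Words located between the given space numbers (4 and 9 -> words 5 to 9)."""
    words = text.split(" ")
    return " ".join(words[first_space:last_space])


a2 = "Johnny went to the store and bought an apple and some grapes"
a3 = "We were not expecting to go home so early but we had no other choice due to rain"
print(words_between_spaces(a2, 4, 9))  # prints: store and bought an apple
print(words_between_spaces(a3, 4, 9))  # prints: to go home so early
```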
What you are looking for is a substring function, which Excel does not provide under that name. I would suggest looking at this tutorial for functions that do the same job, https://www.excel-easy.com/examples/substring.html, and picking the best one for your use case.
I would think the MID function is the best fit for your use case.
I have a document in Google Sheets in which a column holds a name and version together, like NLog.Config.4.3.0, NLog.Config.4.4.9 and so on.
See the image below for other examples.
I need to split this into two columns, name and version, but I'm not familiar enough with regular expressions to extract this info.
I can use Excel and then import the result into the Google doc; it doesn't matter to me how it is done.
You can try something like this:
Suppose you have your string in A1; then in B1 you can enter this:
=LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-2)
and in C1 this:
=RIGHT(A1,LEN(A1)-MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))+1)
MIN(SEARCH(...)) is the position of the first digit, so B1 keeps everything up to, but not including, the dot in front of it, and C1 keeps everything from that digit onwards. You may need to make some adjustments for cells that start with a digit or are empty, as B1 then produces an error; for example, you can wrap it in IFERROR like this:
=IFERROR(LEFT(A1,MIN(SEARCH({0,1,2,3,4,5,6,7,8,9},A1&"0123456789"))-2),A1)
Note: A1&"0123456789" is a 'trick' to keep the search from returning an error, as the search looks for every digit in the array; we just need the position of the first one, thus the MIN(). For a cell with no digits at all that position is LEN(A1)+1, so B1 drops the last character instead of raising an error; such cells need a separate check.
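The padding trick is easier to see outside the formula. Below is a minimal Python sketch; `first_digit_position` and `split_name_version` are hypothetical helper names, and trimming the trailing dot from the name is an assumption about the desired output.

```python
DIGITS = "0123456789"


def first_digit_position(text):
    """1-based position of the first digit; len(text) + 1 if there is none."""
    padded = text + DIGITS  # guarantees every digit is found, like A1&"0123456789"
    return min(padded.index(d) for d in DIGITS) + 1


def split_name_version(text):
    """Split just before the first digit; the trailing dot of the name is dropped."""
    p = first_digit_position(text)
    if p > len(text):  # no digit at all: treat the whole cell as the name
        return text, ""
    return text[:p - 1].rstrip("."), text[p - 1:]


print(first_digit_position("NLog.Config.4.3.0"))  # prints: 13
print(split_name_version("NLog.Config.4.3.0"))    # prints: ('NLog.Config', '4.3.0')
```

Like the formula, this splits at the first digit anywhere in the text, so a name that itself contains a digit would be cut too early.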
Supposing that your raw data were in A2:A, place this in B2:
=ArrayFormula(IFERROR(REGEXEXTRACT(A2:A,"(\D+)\.(.+)"),A2:A))
The regular expression reads: "Capture one or more non-digits, up to but not including a period, as group one, and everything after that period as group two." (In other words, "As soon as you run into a digit after a period, start group two.")
The IFERROR clause means, "If this pattern can't be found, just return the original cell data."
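The same pattern can be tried out in Python to see what each group captures. This is only a sketch: Google Sheets uses the RE2 engine while Python uses its own `re` module, but for this particular pattern the two are assumed to behave the same. `extract` is a hypothetical helper that also mimics the IFERROR fallback.

```python
import re

PATTERN = re.compile(r"(\D+)\.(.+)")


def extract(cell):
    """Return [name, version], or the original cell data if the pattern is not found."""
    match = PATTERN.search(cell)
    return list(match.groups()) if match else [cell]


print(extract("NLog.Config.4.3.0"))  # prints: ['NLog.Config', '4.3.0']
print(extract("NoVersion"))          # prints: ['NoVersion']
```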
Assuming your content is in column A (Google Sheets), try this arrayformula in any cell other than column A:
=arrayformula(iferror(split(REGEXREPLACE($A:$A,"(\.)(\d+.+$)",char(6655)&"$2"),char(6655)),))
There are two regex groups, denoted by ():
(\.) and (\d+.+$).
The first group looks for a dot . (escaped as \.). The second group looks for a digit (0-9) \d, one or more occurrences +, followed by one or more + of any character ., up to the end of the text $.
The replacement is char(6655) (a character that wouldn't usually be found in your dataset) followed by the contents of group two, $2.
Then the split function divides the text into two columns by the char(6655) character.
iferror returns nothing if nothing is split.
The arrayformula works down the sheet.
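The replace-then-split idea can be sketched in Python as follows. `split_cell` is a hypothetical helper; the empty-cell branch reflects the assumption that SPLIT fails on an empty cell and IFERROR then leaves the row blank.

```python
import re

SENTINEL = chr(6655)  # a character assumed not to occur in the data


def split_cell(cell):
    """Swap the dot before the version for a sentinel, then split on the sentinel."""
    if not cell:
        return []  # assumed: SPLIT errors on an empty cell, so IFERROR returns nothing
    marked = re.sub(r"(\.)(\d+.+$)", lambda m: SENTINEL + m.group(2), cell)
    return marked.split(SENTINEL)


print(split_cell("NLog.Config.4.3.0"))  # prints: ['NLog.Config', '4.3.0']
```

Using a sentinel keeps the remaining dots inside the name and the version untouched, because only the first dot that is directly followed by a digit is replaced.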
I currently have a log file from a restaurant which contains product names and other details.
I have to extract only the product name, not the price, which is in the same cell.
Report Format
Product Code | Product Name
00041 Beef Salted Tongue,1000 Yen (excl.tax)
00042 Pork Loin, 980 Yen (excl.tax)
Note that every product name ends with ", [price] (excl.tax)".
So is there any way to remove it from the whole column?
I have used =LEFT(A1,(FIND(",",A1,1)-1))
It does help, but some of the product names contain multiple commas.
For example: Eggs,Pickles,Sauce, Pepper
So if I use the above formula, it only gives me Eggs.
Posting answer as per discussion over the comments:
Use the below link to find the location of last comma in your cell contents.
Excel: last character/string match in a string
To solve the second issue you encountered (pasting the value across filtered rows), refer to the solution below (courtesy of TurgutKapisiz):
This does exist in Excel (despite being hidden or not immediately intuitive).
After you have selected the data you need to copy into the non-contiguous cells that have been filtered:
In Excel 2007 and earlier: Edit -> Go To -> Special -> Visible Cells Only will select the data, then you do a Paste Special Values
In Excel 2010 and above: in the Home tab, Find and Select -> Go to Special -> Visible Cells only will select the data, then you do a Paste Special Values
Enter the following Formula in cell B3
=IF(ISERROR(FIND(" Yen ",B2,1)),B2,LEFT(B2,FIND("#",SUBSTITUTE(B2,",","#",LEN(B2)-LEN(SUBSTITUTE(B2,",",""))),1)-1))
Part 1 of the formula checks for the presence of Yen.
If there is a possibility that a product name contains the text Yen, then you may base this check on RIGHT(B2,1)=")" instead, or on any other unique text that exists only when the price of the item is present.
Part 2 of the formula counts the commas by comparing the length of the text before and after all commas are removed, replaces the last comma with a hash (#), and returns the text to the left of the hash.
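The two parts of the formula can be followed in a short Python sketch. `strip_price` is a hypothetical helper, the second sample row uses a made-up price purely for illustration, and the zero-comma guard is an addition that the formula itself does not have.

```python
def strip_price(cell):
    """Drop everything from the last comma onwards when the cell contains a price."""
    if " Yen " not in cell:                      # part 1: no price, keep the cell as is
        return cell
    comma_count = len(cell) - len(cell.replace(",", ""))  # length difference = commas
    if comma_count == 0:                         # guard added for this sketch only
        return cell
    pos = -1
    for _ in range(comma_count):                 # walk to the last comma
        pos = cell.find(",", pos + 1)
    marked = cell[:pos] + "#" + cell[pos + 1:]   # part 2: last comma becomes "#"
    return marked[:marked.index("#")]            # text to the left of the hash


print(strip_price("00041 Beef Salted Tongue,1000 Yen (excl.tax)"))
# prints: 00041 Beef Salted Tongue
print(strip_price("Eggs,Pickles,Sauce, Pepper, 500 Yen (excl.tax)"))
# prints: Eggs,Pickles,Sauce, Pepper
```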
Do let me know if this is what you wanted to do or if you require any changes.
Regards,
Vijaykumar Shetye,
Panaji, Goa,
India
The formula below extracts the required text, for data in cell B2:
=LEFT(B2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},B2&"0123456789"))-1)
Let's say I have the following value in a cell: "test1_test2_test3_test4_test5". In another cell it could be "test1_test2_test3" or even "test 1_t est2".
What I would like is a 'general' function in which I can specify which part to return, e.g. all characters before the first underscore, between the first and second underscore, etc., and all the characters after the last underscore. And if nothing is found, it should not give back an error but just an empty result.
Thus far I've googled a working formula for each case when a maximum of 2 underscores is present:
For locating and displaying the characters before the first underscore: =LEFT(D32; SEARCH("_";D32;1)-1)
For locating the characters after the first and before the second underscore: =MID(D32;SEARCH("_";D32;1)+1;SEARCH("_";D32;SEARCH("_";D32;1)+1)-(SEARCH("_";D32;1))-1)
For locating the characters after the second underscore (regardless of whether another one follows): =RIGHT(D32;LEN(D32)-SEARCH("_";D32;SEARCH("_";D32;1)+1))
PS: because my native (Excel) language is Dutch, I've done my best to translate my working Excel functions to the English syntax.
With data in A1, in B1 enter:
=TRIM(MID(SUBSTITUTE($A1,"_",REPT(" ",999)),COLUMNS($A:A)*999-998,999))
and copy across:
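Why the 999-space padding works may be easier to see in a small Python sketch. `piece` is a hypothetical helper; n plays the role of COLUMNS($A:A), which grows by one for each column the formula is copied across.

```python
def piece(text, n, delimiter="_", width=999):
    """n-th delimited piece via the padding trick; empty string if there is none."""
    padded = text.replace(delimiter, " " * width)  # SUBSTITUTE(..., REPT(" ", 999))
    start = n * width - (width - 1)                # COLUMNS($A:A)*999-998, 1-based
    window = padded[start - 1:start - 1 + width]   # MID(..., start, 999)
    return " ".join(window.split())                # TRIM: strip and collapse spaces


a1 = "test1_test2_test3_test4_test5"
print([piece(a1, n) for n in range(1, 7)])
# prints: ['test1', 'test2', 'test3', 'test4', 'test5', '']
```

Each 999-character window can contain at most one piece as long as every piece is much shorter than 999 characters, and a window past the end of the text is simply empty, so no error appears.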
I suggest Text to Columns with underscore as the delimiter, then count how many pieces result (COUNTA) and pick accordingly. Use IF to return blank ("") if, say, you want the text after the second underscore and the count is 1.
Please help me find a formula for Excel that takes all the words in a text (for example, text from column A) and lists all the words from the text, without repeats, in column B.
For example,
Column A
Text
Although simplicity is a virtue, theories regarding pedagogy do not work in practice if they are black and white. To say that the best way to teach is only to praise positive actions and to ignore negative ones is like saying that strawberries reduce one’s risk for cancer so people should cut apples out of their diet and only eat strawberries. In both situations, there does not have to be a choice.
Column B - Words from text
Although
simplicity
is
a
virtue,
theories
regarding
pedagogy
do
not
work
in
practice
if
they
are
black
and
white.
To
say
that
the
best
way
to
teach
is
only
to
praise
positive
actions
and
to
ignore
negative
ones
is
like
saying
that
strawberries
reduce
one’s
risk
for
cancer
so
people
should
cut
apples
out
of
their
diet
and
only
eat
strawberries.
In
both
situations,
there
does
not
have
to
be
a
choice.
This is a rather complex thing for a single formula. Here's a method.
Part 1: splitting a text into single words:
A1: your text
A3: =SUBSTITUTE(A1,",","") .... removes commas
A5: =SUBSTITUTE(A3,".","") .... removes full stops (repeat this for any other punctuation you might have)
A8: constant value 0
A9: =FIND(" ",$A$5,A8+1) .... finds the first blank in $A$5 after the position indicated by the cell above .... copy this formula down until you get the first #VALUE error
B9: =MID($A$5,A8+1,A9-A8-1) .... extracts the word between the previous and this blank position .... copy this formula down until you get the first #VALUE error
When you are happy with your split list, copy/paste the list as values and add some headers.
Part 2: finding unique words:
You need to find each unique word exactly once. A method strictly without VBA would consist of the following:
sort the text in column B ascending
enter in C8: =IF(B8=B7,C7+1,1) and copy down to the end of the list ... this creates a running number that starts at 1 and keeps incrementing as long as the word remains the same
autofilter column C for value = 1 ... this will display the first occurrence of each word
copy/paste the filtered list to wherever you want to store it for further processing ... I recommend a sheet different from your raw data
You can restore the original sort order of the result by sorting on the numeric values in column A.
As you can see in the example of the words "in" and "to", this method is case-insensitive. A limitation is a possible false separation between "ones" and "one's" ... this needs to be decided.
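For comparison, the two parts (split into words, then keep each word once) can be sketched in Python. This is not the sort-and-filter procedure described above: it swaps in a set of already-seen words, which keeps the original order without a re-sort. `unique_words` is a hypothetical helper, and stripping only commas and full stops mirrors the two SUBSTITUTE steps.

```python
def unique_words(text, punctuation=",."):
    """Words in first-occurrence order, compared case-insensitively."""
    for ch in punctuation:            # part 1: remove commas and full stops
        text = text.replace(ch, "")
    seen = set()
    result = []
    for word in text.split():         # split on blanks
        key = word.lower()            # "To" and "to" count as the same word
        if key not in seen:           # part 2: keep only the first occurrence
            seen.add(key)
            result.append(word)
    return result


sample = "To say that the best way to teach is only to praise"
print(unique_words(sample))
# prints: ['To', 'say', 'that', 'the', 'best', 'way', 'teach', 'is', 'only', 'praise']
```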
You can try this formula:
=TRIM(MID(SUBSTITUTE($A$1;" ";REPT(" ";LEN($A$1)));1+(ROW(A1)-1)*LEN($A$1);LEN($A$1)))
Assuming the text is in A1, write the formula in B1 and copy it down until you reach the last word.
Depending on your regional settings you may need to replace ";" with ","