How to extract specific text from a sentence in Excel?


I have a database that exports data like this:
How can I get, for instance, the Net Rentable Area together with the values I need?
E.g.:
Net Rentable Area
I tried the TEXTSPLIT function, but I got a spill.
Please let me know what can be done, thanks!
It would also be nice to see it work for something like the Asking Rate, which has a different format.

In cell C2 you can put the following formula:
=1*TEXTSPLIT(TEXTAFTER(A2, B2&" ")," ")
Note: Multiplying by 1 converts the result from text to a number.
Here is the output:
If every token other than the values is a word (that is, nothing else in the text can be read as a number), you can use the following, which does not require specifying the token to find:
=LET(split, 1*TEXTSPLIT(A2," "), FILTER(split, ISNUMBER(split)))
Under this assumption, the corresponding array version is as follows:
=LET(rng, A2:A100, input, FILTER(rng, rng <>""), IFERROR(DROP(REDUCE(0, input,
LAMBDA(acc,text, LET(split, 1*TEXTSPLIT(text," "),
nums, FILTER(split, ISNUMBER(split),""), VSTACK(acc, nums)))),1),"")
)
Note: This uses the trick of building multiple rows with VSTACK inside REDUCE, an idea suggested by @JvdV in this answer. The range starts at A2 on the assumption that A1 holds the column title; if there is no title, you can use A:A instead.
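For readers who want to check the logic outside Excel, here is a minimal Python sketch of the "split on spaces, keep only what parses as a number" idea behind the second formula. The function name numeric_tokens and the sample sentence are made up for illustration; they are not from the question's data.

```python
def numeric_tokens(text):
    """Split on single spaces and keep only the tokens that parse as numbers.

    Mirrors 1*TEXTSPLIT(text, " ") followed by FILTER(..., ISNUMBER(...)).
    """
    values = []
    for token in text.split(" "):
        try:
            values.append(float(token))
        except ValueError:
            # Words (and empty tokens from double spaces) are skipped,
            # like the #VALUE! entries that ISNUMBER filters out.
            continue
    return values


# Hypothetical sample text, not taken from the original export.
sample = "Net Rentable Area 1200 3400"
print(numeric_tokens(sample))  # [1200.0, 3400.0]
```

Python's float() and Excel's text-to-number coercion do not accept exactly the same strings (dates, thousands separators and so on), so treat this only as a sketch of the filtering step.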

Related

Index and Match multiple matches

I need help with the following query. There are two Excel sheets, and I need to find, for each position in column A of Sheet 1, all the matching accounts; the reference data is in Sheet 2.
I am looking for a formula that gives me all the accounts in Sheet 1 that correspond to the position number. The answer is in Sheet 2. Can someone please help?
E.g. 5001 should give me 41150100, 41150101, 41200000
Sheet 1:
Position  Account
5001
5031
5051

Sheet 2:
Account   Position
41150100  5001
41150101  5001
78589545  5051

Assuming no Excel version constraints as per the tags listed in the question, you can try the following (formula 1):
=LET(pos, A2:A4, accnt, B2:B4, REDUCE({"Account","Position"}, pos, LAMBDA(ac,p,
VSTACK(ac,LET(f,TEXTSPLIT(@FILTER(accnt,pos=p),,","), HSTACK(f, IF(f=f, p)))))))
Here is the output:
Notes:
You need to clean up your input first, because the delimiter is sometimes just a comma and sometimes a comma followed by a space.
If the question is about going the other way, as @ScottCraner suggested in the comments, then, taking the output from the previous screenshot as the input, we have (formula 2):
=LET(acc, D2:D8, pos, E2:E8, ux, UNIQUE(pos), out, MAP(ux, LAMBDA(p,
TEXTJOIN(",",,FILTER(acc, pos=p)))), HSTACK(ux, out))
Formula 1 uses the REDUCE/VSTACK pattern; see my answer to the question "how to transform a table in Excel from vertical to horizontal but with different length" for more details on how to use it. In this case, the header row initializes the accumulator.
TEXTSPLIT splits the account information by , into rows. We use implicit intersection (@) to convert the FILTER output (an array with a single element) into a single string that TEXTSPLIT can process; otherwise, it returns the first element only.
The condition IF(f=f, p) generates a constant array holding the position value (p), with the same shape as f. HSTACK builds the output of each iteration in the format we want (first account, then position).
A more verbose formula, but perhaps easier to understand since it doesn't use the VSTACK/REDUCE pattern, is the following:
=LET(pos, A2:A4, accnt, B2:B4, split, TEXTSPLIT(TEXTJOIN(";",,accnt), ",",";"),
mult, MMULT(1-ISNA(split), SEQUENCE(COLUMNS(split),,1,0)),
outP, TOCOL(TEXTSPLIT(TEXTJOIN(";",,REPT(pos&",",mult)),",",";",1),2),
HSTACK(TOCOL(split,2), outP))
The main idea is to use TOCOL. The name split holds the array with the account information. The name mult holds the number of non-empty columns in each row, which tells us how many times each position value must be repeated. We use REPT to repeat each value and TEXTSPLIT to turn the result into an array.
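To make the two directions easier to follow, here is a rough Python sketch of what formula 1 (one row per account) and formula 2 (accounts joined back per position) compute. The names unpivot and regroup are invented for this illustration, and the input rows are an assumption based on the example in the question.

```python
def unpivot(rows):
    """Formula 1 direction: (position, "acc1,acc2,...") -> one (account, position) row per account."""
    out = []
    for position, accounts in rows:
        for account in accounts.split(","):
            # strip() tolerates "comma + space" delimiters, the clean-up
            # that the notes above say the Excel input needs.
            out.append((account.strip(), position))
    return out


def regroup(pairs):
    """Formula 2 direction: join the accounts of each position with commas."""
    grouped = {}
    for account, position in pairs:
        grouped.setdefault(position, []).append(account)
    # dict keeps first-seen order, similar to UNIQUE(pos).
    return [(position, ",".join(accounts)) for position, accounts in grouped.items()]


# Assumed input layout: position in the first column, comma-separated accounts in the second.
rows = [(5001, "41150100,41150101,41200000"), (5051, "78589545")]
print(unpivot(rows))
print(regroup(unpivot(rows)))
```

Running regroup on the output of unpivot gives back the original rows, which is a quick way to convince yourself that the two formulas are inverses of each other on clean input.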

Parse a string into a Table using FILTERXML

This is related to this question. The OP proposed giving a formula inputs that contain a list of connection quantities and speeds like this:
1x1000,2x200,1x50 would mean that there is one 1000k connection, two 200k connections and one 50k connection. I would like to parse this into an array table like this:
1  1000
2  200
1  50
I tried this formula, but it only produces the left hand side of the table:
=LET( case, A5,
a, FILTERXML("<t><s>"&SUBSTITUTE(case,",","</s><s>")&"</s></t>","//s[contains(., 'x')]"),
FILTERXML("<t><s>"&SUBSTITUTE(a,"x","</s><s>")&"</s></t>","//s") )
where case is the input variable and a parses the input into the strings that contain "x" (this ensures that only valid "q x speed" strings are used). I then tried to split this array and... no joy.
From this post by JvdV, I think the answer can be found in the xpath, but I cannot find a solution.

Looks like you want to either spill the entire array or use it in later calculations? Either way, I came up with:
=LET(X,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,",","x"),"x","</s><s>")&"</s></t>","//s"),INDEX(X,SEQUENCE(COUNT(X)/2,2)))
Or, a little more verbose, without LET():
=INDEX(FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1,",","x"),"x","</s><s>")&"</s></t>","//s"),SEQUENCE(LEN(A1)-LEN(SUBSTITUTE(A1,"x","")),2))

One way to get it is something like
=LET(x, FILTERXML("<t><s>"&SUBSTITUTE($A$1, ",", "</s><s>")&"</s></t>", "//s"),
IF(SEQUENCE(1,2)=1, LEFT(x, SEARCH("x",x)-1), RIGHT(x, LEN(x)-SEARCH("x",x))))
Once you break up the string by comma, you can then break up the component strings by "x" with something like
=TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(A7, "x", "</s><s>")&"</s></t>", "//s"))
but I'm not sure if you can combine the two actions in one go to get both width and depth dimensions (i.e. =TRANSPOSE(FILTERXML("<t><s>"&SUBSTITUTE(original_filterxml, "x", "</s><s>")&"</s></t>", "//s")) will not work).
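The two-step idea (split by comma first, then split each piece by "x") can be sketched in Python like this. The function name parse_connections is made up, and skipping items without an "x" is my reading of the question's contains(., 'x') filter.

```python
def parse_connections(spec):
    """Parse "1x1000,2x200,1x50" into rows of (quantity, speed)."""
    table = []
    for item in spec.split(","):
        if "x" not in item:
            # Same intent as the XPath filter contains(., 'x'):
            # ignore anything that is not a "quantity x speed" pair.
            continue
        quantity, speed = item.split("x", 1)
        table.append((int(quantity), int(speed)))
    return table


print(parse_connections("1x1000,2x200,1x50"))  # [(1, 1000), (2, 200), (1, 50)]
```

The result has one row per connection type and two columns, which is the same shape the IF(SEQUENCE(1,2)=1, LEFT(...), RIGHT(...)) formula spills.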

Maybe,
In C1, enter this formula, copy it right to D1, and copy both down:
=TRIM(MID(SUBSTITUTE(SUBSTITUTE(","&$A$1,"x",","),",",REPT(" ",99)),((ROW(A1)*2+COLUMN(A1))-2)*99,99))
Or,
if using the FILTERXML function, try:
=IFERROR(FILTERXML("<a><b>"&SUBSTITUTE(SUBSTITUTE($A$1,"x",","),",","</b><b>")&"</b></a>","//b["&(ROW(A1)*2+COLUMN(A1))-2&"]"),"")

Excel COUNTIF match variations of target: LET solution?

This is a slightly more complicated issue than a simple =COUNTIF(rng,"*"&value&"*"), as found here.
I have a 2D array with cells containing data such as:
abc
def
abc def
ghi
abc,def,ghi
abcdef
ghi; def
..... and several other variations of this. I am trying to count exact matches of "abc", but I want the count to be inclusive of cells containing "abc def" and other like variations, however I can't just use the above simple COUNTIF formula since "abcdef" is not an acceptable match. The target string must stand alone or be separated from other text by an acceptable character in chars.
I think I've got this one 90% done, but the bit I need help with is combining all the possible acceptable variations of a target "name" into a flat range that I can then check my data source against for the COUNTIF. I've tried INDEX(r_1:r_8,idxRow,idxCol) and other familiar solutions that work on the sheet when referencing other ranges, but I'm new to using the =LET function. All of this works well when broken out into separate components on my spreadsheet, but I'm looking for a cleaner solution with =LET. See below for current formula:
=LET(rg, DataTable[[Q14_1]:[Q14_9]],
name, AU38,
chars, {" ",",",";"},
r, 8,
r_1, CONCATENATE(name,chars),
r_2, CONCATENATE(chars,name),
r_3, CONCATENATE(chars,name,chars),
r_4, CONCATENATE(name,chars,"*"),
r_5, CONCATENATE("*",chars,name),
r_6, CONCATENATE(chars,name,chars,"*"),
r_7, CONCATENATE("*",chars,name,chars),
r_8, CONCATENATE("*",chars,name,chars,"*"),
c, COUNTA(chars),
mSeq, SEQUENCE(r*c),
idxRow, 1+MOD(mSeq,r),
idxCol, INT((SEQUENCE(r*c)-1)/r)+1,
X, INDEX(**NeedHelpHere**,idxRow,idxCol),
SUM(COUNTIF(rg,name),COUNTIF(rg,X))
)

Give the formula below a try. If you have more delimiters than space and comma, you need to nest more SUBSTITUTE() functions.
=LET(x,FILTERXML("<t><s>"&SUBSTITUTE(SUBSTITUTE(A1:A7," ","</s><s>"),",","</s><s>")&"</s></t>","//s"),y,FILTER(x,x="abc"),SUM(--(y<>"")))
To learn about FILTERXML() please read this article from JvdV.

I've thought about this again and am posting a solution that fits my needs.
I don't need to index a single column of potential matches to then COUNTIF, I can just COUNTIF multiple times. Additionally, I was not taking into account different combinations of chars, I was only searching for the same chars on either side of the target (e.g. ",abc," when I should have also been looking for ",abc;"). Transposing the chars array on one side is a simple way of fixing this. It also turns out that "*"&target&"*" searches for "*target*" AND "target" (duh!), so I simplified further, removing duplicative possibilities.
My final formula is below, which counts the number of times target (by itself or surrounded by any acceptable chars) is present in a given rng:
=LET(rng, DataTable[[Q14_1]:[Q14_9]],
name, $A6,
chars, {" " , "," , ";"},
r_1, CONCATENATE(name,chars,"*"),
r_2, CONCATENATE("*",chars,name),
r_3, CONCATENATE("*",chars,name,TRANSPOSE(chars),"*"),
SUM(COUNTIF(rng,name),COUNTIF(rng,r_1),COUNTIF(rng,r_2),COUNTIF(rng,r_3))
)
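As a cross-check of what the formula is meant to count, here is a small Python sketch that splits each cell on the acceptable characters and tests for an exact match. The function name count_standalone is hypothetical. Unlike COUNTIF, the comparison here is case-sensitive, and each cell is counted at most once, so results can differ from the formula on cells that contain the target more than once.

```python
import re


def count_standalone(cells, target, chars=(" ", ",", ";")):
    """Count cells where target stands alone or is delimited by one of chars."""
    # Build a character class such as "[\ ,;]" from the acceptable delimiters.
    splitter = "[" + re.escape("".join(chars)) + "]"
    return sum(1 for cell in cells if target in re.split(splitter, cell))


# Sample values from the question.
cells = ["abc", "def", "abc def", "ghi", "abc,def,ghi", "abcdef", "ghi; def"]
print(count_standalone(cells, "abc"))  # 3
```

"abcdef" is not counted because splitting leaves it as a single token that is not equal to "abc".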

Find specific characters and return the next value in the cell using Excel Formula

I am not sure where to begin with the formula, as I have gotten myself confused with everything. I have a cell that contains "PON ", "PON: " or "PON = " followed by the actual PON (example: PON 123467). I want a formula to return 123467 in the cell.
Examples                                        What I want returned
I have PON 123467 for shoes                     123467
I have PON: 234567-AB for food                  234567-AB
I have PON - 569874-Weird for accessories       569874-Weird
I have PON = DOG-564-987 for dog food           DOG-564-987
I am currently using Excel 365

FILTERXML() is a good fit in this case. Try:
=FILTERXML("<t><s>"&SUBSTITUTE(FILTERXML("<t><s>"&SUBSTITUTE(A1," for","</s><s>")&"</s></t>","//s[1]")," ","</s><s>")&"</s></t>","//s[last()]")

Using FILTERXML, and testing for a substring following PON, you can try:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>") & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Note that the FILTERXML solution drops the leading zero from a PON that is purely numeric. Unfortunately, the XPath implementation in that function does not include the string() function.
If dropping the leading zero might be a problem, you can add a character to the node that forces the number to be treated as a string. In the modified formula below, I use the Unicode zero-width space, but there are others you can use. Note that this character counts toward string-length, so be sure to keep the >2 parameter:
=FILTERXML("<t><s>"&SUBSTITUTE(TRIM(A1)," ","</s><s>"&UNICHAR(8203)) & "</s></t>","//s[contains(.,'PON')]/following-sibling::*[string-length(.)>2][1]")
Because of the variability in your data (there are sometimes extraneous space-separated substrings between PON and the value you want), the XPath:
locates the substring PON
returns all subsequent siblings that have a string length of more than two (adjust if necessary)
returns the first sibling that meets that criterion.
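The three XPath steps listed above translate fairly directly into plain code. This Python sketch is only an illustration of that logic; the function name extract_pon and the min_length parameter are assumptions, with min_length=3 standing in for string-length(.)>2.

```python
def extract_pon(text, min_length=3):
    """Return the first token after the one containing "PON" that is long enough."""
    tokens = text.split()
    for i, token in enumerate(tokens):
        if "PON" in token:
            # First following "sibling" with at least min_length characters;
            # short separators such as "-" or "=" are skipped.
            return next((t for t in tokens[i + 1:] if len(t) >= min_length), None)
    return None


# Examples from the question.
print(extract_pon("I have PON 123467 for shoes"))                # 123467
print(extract_pon("I have PON - 569874-Weird for accessories"))  # 569874-Weird
```

Because the result stays a string, a purely numeric PON keeps its leading zero here. As with the XPath version, a PON shorter than three characters would be skipped in favour of the next longer word.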

You might try this formula.
=TRIM(LEFT(MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100),FIND(" ",MID(A2,MIN(FIND({0,1,2,3,4,5,6,7,8,9},A2&"0123456789")),100)&" ")))
It extracts the text between the first number and the first space following that number. The size of that extract is limited to 100 characters.

Google Docs using multiple manual criterion in DSUM

How can I manually enter multiple criteria into a DSUM function?
I can have it check a single criterion with: =DSUM(J3:L55, "Charge", {"Category";"Coffee"})
However, changing that to =DSUM(J3:L55, "Charge", {"Category";"Coffee";"Split";"Yes"}) causes it to use only the "Category";"Coffee" part and ignore the ;"Split";"Yes" section.
What is the syntax for setting multiple criteria in Google Docs? I cannot really make a 2x2 table for each category I have (=DSUM(J3:L55, "Charge", D7:E8)) and instead need to enter the criteria manually.
The DSUM with table criteria is in blue. I am selecting "Category" through "Split?" and want to use both Category and Split as criteria without having to resort to the darker blue table you see there.

Try
=DSUM(J3:L55, "Charge", {{"Category";"Coffee"},{"Split";"Yes"}})

Google updated the DSUM function. The update requires you to specify an array/table for the criteria.
Where column names and criteria were named before, you now reference a table:
{F3:G4}
Where
"F3" and "G3" are the column names to be referenced.
"F4" and "G4" are the Test Values.
"F3" is Category
"G3" is Split?
"F4" is Coffee
"G4" is Yes
An example of the new formula is:
=DSUM(A1:E55, "Charge", {F3:G4})
I think this makes it easier to update the table and reference changes.
You can have multiple criteria; I tested it with "Equipment" and "Yes" by adding another row to my criteria table.
Reference: Google Help: DSUM()
The Old Way:
The correct syntax is to use commas to separate the columns.
{"Category","Split";"Coffee","Yes"}
{ Column1 , Column2 ; Test 1 , Test 2}
so your formula should be
=DSUM(J3:L55, "Charge", {"Category","Split";"Coffee","Yes"})
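To see why the shape of the criteria array matters, here is a simplified Python model of how a DSUM-style criteria table is read: the first row names the columns, conditions in the same row are ANDed, and additional rows are ORed. The dsum function and the sample rows are invented for illustration, and it uses exact matching only, which is a simplification of the real function.

```python
def dsum(database, field, criteria):
    """Sum `field` over records matching a criteria table (header row + condition rows)."""
    header, *records = database
    crit_header, *crit_rows = criteria
    field_idx = header.index(field)
    total = 0
    for record in records:
        row = dict(zip(header, record))
        # A record matches if ANY criteria row matches on ALL of its columns.
        if any(
            all(row[col] == val for col, val in zip(crit_header, crit_row))
            for crit_row in crit_rows
        ):
            total += record[field_idx]
    return total


# Hypothetical data, not the asker's sheet.
database = [
    ["Category", "Split", "Charge"],
    ["Coffee", "Yes", 4],
    ["Coffee", "No", 3],
    ["Lunch", "Yes", 12],
]

# {"Category","Split";"Coffee","Yes"}: two columns, one condition row.
print(dsum(database, "Charge", [["Category", "Split"], ["Coffee", "Yes"]]))  # 4

# {"Category";"Coffee";"Split";"Yes"}: one column, three condition rows.
print(dsum(database, "Charge", [["Category"], ["Coffee"], ["Split"], ["Yes"]]))  # 7
```

In this model, the all-semicolon array becomes a single Category column whose extra rows are just alternative values, which would be consistent with the question's observation that only "Category";"Coffee" seemed to take effect.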

In languages that use , as the decimal separator, you cannot use , for separating values. Use a backslash \ instead. So your formula should then be
=DSUM(J3:L55; "Charge"; {"Category"\"Split";"Coffee"\"Yes"})

The same applies if you are using the French format: you cannot use ,. Your formula should be
=DSUM(J3:L55; "Charge"; {"Category"\"Split";"Coffee"\"Yes"})
The same may apply to other Latin languages too.

When you do d7:e8, it'll iterate on columns first and lines later. So on your example, you would have "Category", "Split?" (1st line) and "Coffee","Yes" (2nd line).
So, instead of {"Category";"Coffee";"Split";"Yes"} you should use {"Category";"Split";"Coffee";"Yes"}.
