How to extract text from a string where multiple entries meet the criteria and return all values - Spotfire

This is an example of the string (it can be longer):
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be: column1 = 600007056, column2 = 600007070, column3 = 600007053.
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the expression below, but it only picks up the first of the 600 numbers, not the others, and there can be any number of them.
Account is the column with the string:
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!

Assuming my guess is correct, the pattern to look for is:
9 digits starting with 6, preceded by an opening parenthesis and three zeros, and followed by a space, 4 digits and a closing parenthesis.
You can grab individual occurrences with:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
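To check what the lookbehind/lookahead pattern captures outside Spotfire, here is a minimal Python sketch. It assumes Python's re module treats this fixed-width lookbehind the same way Spotfire's regex engine does. rx_extract is a hypothetical helper that returns the n-th match, standing in for the occurrence argument of RXExtract; returning None for a missing occurrence is an assumption, not documented Spotfire behaviour.

```python
import re
from typing import Optional

# Same pattern as the Spotfire expression, but with single backslashes
# because this is a Python raw string.
PATTERN = r"(?<=\(000)6\d{8}(?=\s\d{4}\))"

def rx_extract(text: str, pattern: str, occurrence: int) -> Optional[str]:
    """Hypothetical helper: return the n-th (1-based) match, or None if absent."""
    matches = re.findall(pattern, text)
    if 1 <= occurrence <= len(matches):
        return matches[occurrence - 1]
    return None

account = ("1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), "
           "PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), "
           "COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)")

# Ask for one more occurrence than exists to show the "empty" case.
for n in range(1, 5):
    print(f"column{n}:", rx_extract(account, PATTERN, n))
```

Because the lookarounds are not part of the match, only the 9 digits are returned, which is what makes the later length-based count work.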
The tricky bit is knowing how many columns to define, since you say there can be many. One way is to first calculate the maximum number of occurrences:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
This still assumes that each value to extract is 9 digits long. It compares the length of the original [Account] with its length after every match has been replaced by an empty string, and divides the difference by 9.
You can then define up to maxn columns; for rows with fewer occurrences, the extra columns will be empty.
Note that Spotfire always needs two backslashes for escaping (I had to add more in the editor to make it render correctly; I hope I have not missed any).
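The maxn trick works by deleting every match and measuring how much shorter each string gets. The Python sketch below reproduces that arithmetic with re.sub and cross-checks it against a direct match count. The two extra rows are made-up inputs, the 9-digit assumption is the same one the answer makes, and a Python raw string needs only single backslashes.

```python
import re
from typing import List

PATTERN = r"(?<=\(000)6\d{8}(?=\s\d{4}\))"
DIGITS_PER_MATCH = 9  # assumption carried over from the answer

def occurrences_by_length(text: str) -> int:
    """Characters removed by deleting all matches, divided by 9 (like maxn per row)."""
    stripped = re.sub(PATTERN, "", text)  # only the 9 digits are removed
    return (len(text) - len(stripped)) // DIGITS_PER_MATCH

def max_occurrences(rows: List[str]) -> int:
    """Equivalent of wrapping the per-row count in Max(...) over the column."""
    return max((occurrences_by_length(r) for r in rows), default=0)

rows = [
    ("1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), "
     "PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), "
     "COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)"),
    "Hypothetical row with one match (000600001234 0002)",  # made-up input
    "Hypothetical row with no match",                       # made-up input
]

print([occurrences_by_length(r) for r in rows])
print(max_occurrences(rows))
# Cross-check: counting matches directly should agree.
print([len(re.findall(PATTERN, r)) for r in rows])
```

If the extracted values could vary in length, a direct match count is safer than dividing by a fixed width.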

Related

Excel Formula Extract any number greater than x characters from a string

I have a file which contains a list of data. Each cell holds a name, a number and a date. The date is mm/yy, mm-yy, mm-yyyy, etc. (never the day, just month and year).
The number I need is always longer than 5 characters. Is there a way to get just the number from the string?
xx company holding - 96923432 -02-22. (number required 96923432)
yy Company (HOLDINGS) LTD - 131002204 - 02/2023 (number required 131002204)
ab HOLDINGS LIMITED / 115472907 / Feb-23 (number required 115472907)
This formula splits your data on spaces, converts each piece to a number (pieces that cannot be converted become 0 through IFERROR) and returns the largest. If some rows may not contain a number longer than 5 characters, wrap it in an IF() and adjust as needed.
=MAX(IFERROR(NUMBERVALUE(TEXTSPLIT(A2," ")),0))
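The split, convert, take-the-maximum logic can be sketched in Python to see why the long number wins. This is an approximation: the NUMBER regex below is an assumed stand-in for what NUMBERVALUE accepts, not an exact replica of Excel's parsing rules.

```python
import re

# Assumed stand-in for NUMBERVALUE: optional minus sign, digits, optional decimals.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def max_number_in_text(text: str) -> float:
    """Rough analogue of =MAX(IFERROR(NUMBERVALUE(TEXTSPLIT(A2," ")),0))."""
    values = []
    for piece in text.split(" "):        # TEXTSPLIT(A2, " ")
        if NUMBER.fullmatch(piece):
            values.append(float(piece))  # conversion succeeded
        else:
            values.append(0.0)           # IFERROR(..., 0)
    return max(values)                   # MAX(...)

for s in [
    "xx company holding - 96923432 -02-22.",
    "yy Company (HOLDINGS) LTD - 131002204 - 02/2023",
    "ab HOLDINGS LIMITED / 115472907 / Feb-23",
]:
    print(int(max_number_in_text(s)))
```

Pieces such as "-02-22." or "Feb-23" fail to convert and fall back to 0, so the largest remaining value is the account-style number.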
Since your data uses two different delimiters, you can pass both as an array constant so either one is matched. If there are more possible delimiters, add them inside the {} in both TEXTAFTER and TEXTBEFORE. TRIM removes the spaces that surround the number:
=TRIM(TEXTBEFORE(TEXTAFTER(A2,{"-","/"}),{"-","/"}))
To return nothing when the result is 5 characters or fewer:
=LET(numText,TRIM(TEXTBEFORE(TEXTAFTER(A2,{"-","/"}),{"-","/"})),IF(LEN(numText)>5,numText,""))
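The TEXTAFTER/TEXTBEFORE pair simply takes the text between the first and second delimiter, whichever delimiter appears. A minimal Python sketch of that idea, assuming the number always sits between the first two delimiters (company names containing "-" or "/" would break this, in Python and in Excel alike). Both function names are hypothetical, and returning "" where Excel would give #N/A is a simplification.

```python
import re
from typing import Sequence

def between_first_two_delimiters(text: str, delimiters: Sequence[str] = ("-", "/")) -> str:
    """Rough analogue of TRIM(TEXTBEFORE(TEXTAFTER(text,{...}),{...}))."""
    splitter = "|".join(re.escape(d) for d in delimiters)
    parts = re.split(splitter, text, maxsplit=2)
    if len(parts) < 3:
        return ""  # fewer than two delimiters; Excel would return #N/A here
    return parts[1].strip()  # strip() plays the role of TRIM

def number_if_long_enough(text: str, min_len: int = 5) -> str:
    """Keep the result only if it is longer than min_len characters."""
    value = between_first_two_delimiters(text)
    return value if len(value) > min_len else ""

print(number_if_long_enough("xx company holding - 96923432 -02-22."))
print(number_if_long_enough("ab HOLDINGS LIMITED / 115472907 / Feb-23"))
```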

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
Using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]), the values in the valuesToFind column always have the same 3 characters on the left followed by varying characters (e.g. 908-123456 or 908-321654, where 908 is always the same).
How can I tell the function to look up the value based on the first 3 characters only? The expected answer is the sum of the results above, i.e. 500 + 300 = 800.
tbl.Lookup() works by looking for an exact match. This helps keep it fast, but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This gives you a new column that you can use for the columnsToSearch argument. However, tbl.Lookup() also returns just one match: it doesn't know how to combine values when more than one row in the table matches, so I think you also need a step that groups your table by the first 3 characters of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
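The three Schematiq steps (derive a prefix column, group and sum, then exact-match lookup) can be mirrored in plain Python to make the data flow concrete. This is only an illustration of the technique, not Schematiq's implementation. Which code carries 500 and which carries 300 is assumed, and the "123-..." row is invented to show that other prefixes stay separate.

```python
from collections import defaultdict
from typing import Dict, Optional

# Hypothetical input table; the amounts assigned to each code are assumptions.
rows = [
    {"code": "908-123456", "amount": 500},
    {"code": "908-321654", "amount": 300},
    {"code": "123-000001", "amount": 42},
]

# Step 1 (like tbl.CalculateColumn with LEFT(x, 3)): derive startOfCode.
for row in rows:
    row["startOfCode"] = row["code"][:3]

# Step 2 (like tbl.Group, which sums by default): subtotal amount per prefix.
grouped: Dict[str, int] = defaultdict(int)
for row in rows:
    grouped[row["startOfCode"]] += row["amount"]

# Step 3 (like tbl.Lookup): exact-match lookup on the grouped table.
def lookup(table: Dict[str, int], key: str, default: Optional[int] = None) -> Optional[int]:
    return table.get(key, default)

print(lookup(grouped, "908"))
```

Grouping before the lookup is what turns "one match only" into a sum over every row that shares the prefix.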

In Excel, I want to count the number of cells that do not contain a specific character

In Excel, I want to count the number of cells that do not contain a specific character (in this case, a "." / period).
I tried something like countif(A1:A10,"<>.*") but this is wrong and I can't seem to figure it out.
Say I have these data in column A:
D
N
P
.
.
A
N
.
P
.
And the count would be 6
For your example:
=COUNTIF(A1:A10,"<>.")
returns 6. It would be a different story, though, if you also wanted to exclude a value such as P. from the count.
Your data may not be quite what you think it is, however, because including the * should make no difference for your example.
Alternatively, you could count all text cells and subtract the ones that are just a period, leaving the non-period cells:
=COUNTIF(A1:A10,"*")-COUNTIF(A1:A10,"=.")
gives 6.
If your data includes periods alongside other characters in the same cell, and you want a similar count of cells with no period anywhere, then this:
=COUNTA(A1:A10)-COUNTIF(A1:A10,"*.*")
will return 5
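The three counting formulas above differ only in what they treat as "a period cell". A small Python sketch makes the difference visible. The first list is the question's data; the second list is made up to show how a cell like "P." is treated differently, and the function names are hypothetical.

```python
from typing import List

def count_not_exactly_period(cells: List[str]) -> int:
    """Like =COUNTIF(range,"<>."): every cell that is not exactly "."."""
    return sum(1 for c in cells if c != ".")

def count_text_minus_periods(cells: List[str]) -> int:
    """Like =COUNTIF(range,"*")-COUNTIF(range,"=."): non-empty text minus exact "."."""
    return sum(1 for c in cells if c != "") - sum(1 for c in cells if c == ".")

def count_without_any_period(cells: List[str]) -> int:
    """Like =COUNTA(range)-COUNTIF(range,"*.*"): non-empty minus cells containing "."."""
    return sum(1 for c in cells if c != "") - sum(1 for c in cells if "." in c)

question_data = ["D", "N", "P", ".", ".", "A", "N", ".", "P", "."]
mixed_data = ["D", "P.", ".", "A"]  # made-up example with a period inside a value

for data in (question_data, mixed_data):
    print(count_not_exactly_period(data),
          count_text_minus_periods(data),
          count_without_any_period(data))
```

For the question's data all three agree; only the last variant also excludes cells where the period is mixed with other characters.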

Using VLookUp for a partial search

I have two tables in excel.
In table 1, one column contains a list of order numbers in the format XXXX-YYYY, where each X is a digit and each Y is a letter. For example 3485-XTIP
Table 2 also has an order number column, but this time it's in the format XXXX-YYYY (ZZ), where ZZ is the initials of the customer who made the order. Example: 3485-XTIP (KN)
How can I use a VLookUp to search for the order number in Table 2 but only using the XXXX-YYYY part? I tried using TRUE for an approximate search but it still failed for some reason.
This is what I have
=VLOOKUP("I3",'Table2 '!A:B,2,FALSE)
I am open to any alternatives other than VLookup for this situation.
Note that there are hundreds of order numbers and entering the strings manually will take forever.
You can add * as a wildcard at the end of the order number so that VLOOKUP matches the order number followed by any other characters. Also reference I3 without quotes; "I3" in quotes looks up the literal text I3:
=VLOOKUP(I3&"*", 'Table2 '!A:B, 2, 0)
* will match anything after the order number.
Note: 0 and False have the same behaviour here.
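A trailing * in an exact-match VLOOKUP behaves like a case-insensitive "starts with" test that returns the first matching row. The Python sketch below mirrors that behaviour; vlookup_prefix is a hypothetical helper and the Table 2 rows are made up. Like the Excel formula, it would also accept any longer key that merely starts with the order text, which is harmless here because the order numbers have a fixed format.

```python
from typing import Optional, Sequence, Tuple

def vlookup_prefix(order: str, table: Sequence[Tuple[str, str]]) -> Optional[str]:
    """Rough analogue of =VLOOKUP(I3&"*", 'Table2 '!A:B, 2, 0).

    Returns column B of the first row whose column A starts with `order`,
    or None where Excel would show #N/A. Excel text matching ignores case,
    so both sides are case-folded before comparing.
    """
    prefix = order.casefold()
    for key, value in table:
        if key.casefold().startswith(prefix):
            return value
    return None

# Hypothetical Table 2 rows: (order number with initials, column B value).
table2 = [
    ("3485-XTIP (KN)", "value for 3485-XTIP"),
    ("7702-QRST (AB)", "value for 7702-QRST"),
]
print(vlookup_prefix("3485-XTIP", table2))
```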

=IF(ISNUMBER(SEARCH.... the difference between 1 and 11

I am trying to convert a single column of numbers in Excel into multiple columns, depending on the content.
e.g. Table 1 contains 1 column that contains 1 or more numbers between 1 and 11 separated with a comma. Table 2 should contain 11 columns with a 1 or a 0 depending on the numbers found in Table 1.
I am using the following formula at present:
=IF(ISNUMBER(SEARCH("1",A2)),1,0)
The next column contains the following:
=IF(ISNUMBER(SEARCH("2",A2)),1,0)
All the way to 11
=IF(ISNUMBER(SEARCH("11",A2)),1,0)
The problem, however, is that the formula for finding references to 1 also finds references to 11. Is it possible to write a formula that can tell the difference, so that if I have the following in Table 1:
2, 5, 11
It doesn't put a 1 in column 1 of Table 2?
Thanks.
For a list separated by commas only, use:
=IF(ISNUMBER(SEARCH(",1,", ","&A2&",")),1,0)
If the list is separated by ", " (comma plus space), use:
=IF(ISNUMBER(SEARCH(", 1,", ", "&A2&",")),1,0)
A version of LS_dev's answer that copes with any number of spaces (including none) before or after each comma is:
=IF(ISNUMBER(SEARCH(", 1 ,",", "&TRIM(SUBSTITUTE(A2,","," , "))&" ,")),1,0)
The SUBSTITUTE makes sure there's always at least one space before and after each comma and the TRIM replaces multiple spaces with one space, so the result of the TRIM function will have exactly one space before and after each comma.
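The key idea is to wrap both the list and the search term in delimiters so that "1" can only match a whole item. A minimal Python sketch of the same normalise-then-search approach, producing the 11 flag columns for one cell (the function names are hypothetical):

```python
import re
from typing import List

def normalise(list_text: str) -> str:
    """Mimic ", " & TRIM(SUBSTITUTE(A2, ",", " , ")) & " ,"."""
    spaced = list_text.replace(",", " , ")          # SUBSTITUTE: spaces around commas
    trimmed = re.sub(r" +", " ", spaced).strip()    # TRIM: collapse runs, strip ends
    return ", " + trimmed + " ,"

def flags(list_text: str, max_value: int = 11) -> List[int]:
    """One 0/1 flag per number 1..max_value, matching whole items only."""
    wrapped = normalise(list_text)
    return [1 if f", {n} ," in wrapped else 0 for n in range(1, max_value + 1)]

print(flags("2, 5, 11"))
```

Because every item is surrounded by ", " and " ,", searching for ", 1 ," cannot match inside 10 or 11.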
How about using the SUBSTITUTE function to change every "11" to the Roman numeral "XI" before doing your search? This handles 11, although "10" still contains "1" and would still match:
=IF(ISNUMBER(SEARCH("1",SUBSTITUTE(A2, "11", "XI"))),1,0)
If you only want to exclude the "11" case (this is all based on hardcoded values, so there should be a smarter solution):
=IF(ISNUMBER(SEARCH(AND("1",NOT("11")),A2)),1,0)
