T-SQL: Fetch all text before the last occurrence of a dash

I'm stuck trying to fetch all text up to (but not including) the last dash.
I can find a solution for fetching the text to the left of the first dash (e.g. SUBSTRING(#ID, 1, CHARINDEX('-', #ID) -1) ), and even of the second dash, but the problem is that the number of dashes in my list varies wildly.
E.g.:
ID
ABC-DEF-GHI-001
ABC-DEF-2
ABC-DEF-GHI-JKL-00003
ABC-DEF-GH-4
ABC-123-DEF-008
From the above, I would like to fetch all the text to the left of the last dash:
ABC-DEF-GHI
ABC-DEF
ABC-DEF-GHI-JKL
ABC-DEF-GH
ABC-123-DEF
Any pointers appreciated.

One trick we can use here is to find the first occurrence of a dash in the reversed string, which is the last dash in the original string. Subtracting that position from the length of the string gives the number of characters to keep, and LEFT then takes them from the original beginning.
SELECT
col,
LEFT(col, LEN(col) - CHARINDEX('-', REVERSE(col))) AS col_sub
FROM yourTable
WHERE
col LIKE '%-%';
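To check the arithmetic outside SQL Server, here is a small Python sketch that mirrors the expression step by step. The helper charindex is hypothetical, written only to mimic CHARINDEX's 1-based result (0 when nothing is found); the sketch also ignores the fact that T-SQL's LEN does not count trailing spaces.

```python
def charindex(needle: str, haystack: str) -> int:
    """Hypothetical stand-in for T-SQL CHARINDEX: 1-based position, 0 if absent."""
    return haystack.find(needle) + 1


def before_last_dash(col: str) -> str:
    """Mirror LEFT(col, LEN(col) - CHARINDEX('-', REVERSE(col)))."""
    # The first dash in the reversed string is the last dash in the original;
    # its 1-based position equals the dash plus the characters after it.
    cut = charindex("-", col[::-1])
    return col[: len(col) - cut]


if __name__ == "__main__":
    for value in ["ABC-DEF-GHI-001", "ABC-DEF-2", "ABC-DEF-GH-4"]:
        print(value, "->", before_last_dash(value))
```

With no dash, charindex returns 0 and the whole string comes back, which is why the query filters with col LIKE '%-%'. In plain Python, col.rpartition('-')[0] gives the same result for values that contain a dash.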

Related

MS Excel Formula assistance

I have a cell I need to split into 2 cells.
Data Sample (Note: all cells are formatted as TEXT):
"3851v61_18.005_ Have the anchors for all suspended scaffolding system suspension lines and separate vertical lifelines been verified? "
Data Sample 2: Parent_ID
Steps:
Check whether the cell value starts with a number.
Also check whether it contains the special character "_"; it may contain more than one.
Display cell #1: just the ID number, including any underscore(s) within it.
Display cell #2: just the text to the right of the underscore. However, if the original cell starts with alphabetic characters, display the original value, i.e. Parent_Id.
Strip off any stray underscores left hanging.
Expected results:
Cell #1: "3851v61_18.005" (ID number portion of the text)
Cell #2: "Have the anchors for all suspended scaffolding system suspension lines and separate vertical lifelines been verified?"
This is what I have so far (if it does not start with a number, return the value of the cell; otherwise continue with the equation):
=IF(NUMBERVALUE(LEFT(C321,1))>=1,IFERROR(LEFT(C321, FIND("_",C321)-1), C321),FALSE)
=IFERROR(RIGHT(C321,LEN(C321)-FIND("_",C321)), C321)
If there is more than one underscore, cell 1 needs to include all of them in the ID number and strip off the text after the last underscore. Likewise, cell 2 should display only the text after the last underscore.
Thank you for any assistance offered.
I think I understand, but I am not 100% sure.
Try something like the formula below to get the full string (if it does not start with a number) or the string up to the last underscore (if it does start with a number):
=IF(NOT(ISNUMBER(NUMBERVALUE(LEFT($D1,1)))), $D1,
LEFT($D1, FIND("!!!", SUBSTITUTE($D1, "_", "!!!",
LEN($D1)-LEN(SUBSTITUTE($D1, "_", ""))))-1))
Then, in a similar fashion, try something like the formula below to get the full string (if it does not start with a number) or the string to the right of the last underscore (if it does start with a number):
=IF(NOT(ISNUMBER(NUMBERVALUE(LEFT($D1,1)))), $D1,
RIGHT($D1, LEN($D1)-FIND("!!!", SUBSTITUTE($D1, "_", "!!!",
LEN($D1)-LEN(SUBSTITUTE($D1, "_", ""))))))
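The key step in both formulas is that LEN($D1)-LEN(SUBSTITUTE($D1, "_", "")) counts the underscores, and passing that count as SUBSTITUTE's instance number marks only the last underscore with "!!!", which FIND can then locate. A Python sketch of the same logic follows; substitute_nth is a hypothetical helper standing in for SUBSTITUTE with an instance number, so treat this as an illustration rather than Excel's implementation.

```python
from typing import Tuple


def substitute_nth(text: str, old: str, new: str, n: int) -> str:
    """Hypothetical stand-in for SUBSTITUTE(text, old, new, n): replace only the n-th occurrence."""
    idx = -1
    for _ in range(n):
        idx = text.find(old, idx + 1)
        if idx == -1:
            return text  # fewer than n occurrences: nothing is replaced
    return text[:idx] + new + text[idx + len(old):]


def split_id_text(d1: str) -> Tuple[str, str]:
    """Return (cell 1, cell 2) the way the two formulas would."""
    # IF(NOT(ISNUMBER(NUMBERVALUE(LEFT($D1,1)))), $D1, ...)
    if not d1[:1].isdigit():
        return d1, d1
    # LEN($D1)-LEN(SUBSTITUTE($D1,"_","")) counts the underscores
    count = len(d1) - len(d1.replace("_", ""))
    if count == 0:
        # Assumption: Excel would show an error here; the sketch just returns the text
        return d1, d1
    # Mark only the last underscore, then take its 1-based position like FIND does
    pos = substitute_nth(d1, "_", "!!!", count).find("!!!") + 1
    left = d1[: pos - 1]  # LEFT($D1, pos - 1)
    right = d1[pos:]      # RIGHT($D1, LEN($D1) - pos)
    return left, right
```

For the sample cell, the right-hand part keeps the space that follows the last underscore; wrapping the second formula in TRIM would remove it.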

How to extract text from a string where there are multiple entries that meet the criteria, and return all values

This is an example of the string, and it can be longer:
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be: column1 600007056, column2 600007070, column3 600007053.
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the expression below, but it only picks up the first 600... number, not the others, and there can be an undefined number of them.
Account is the column with the string:
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!
Assuming my guess is correct, and the pattern to look for is:
9 digits, starting with 6, preceded by an opening parenthesis and 3 zeros, and followed by a space, 4 digits and a closing parenthesis,
you can grab individual occurrences with:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
The tricky bit is knowing how many columns to define since, as you say, there can be many. One way to find out is to first calculate the maximum number of occurrences like this:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
This still assumes each number to extract is 9 digits long: it compares the length of the original [Account] with its length after every match has been replaced by an empty string, and divides the difference by 9.
You then know you can define up to maxn columns; for rows with fewer occurrences, the extra columns will be empty.
Note that Spotfire always wants two backslashes for escaping (I had to add extra ones in the editor to make it render correctly; I hope I have not missed any).
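For trying the regular expression outside Spotfire, a rough Python equivalent follows. It assumes Python's lookbehind and lookahead behave like Spotfire's for this pattern, and that returning every match with findall corresponds to calling RXExtract with occurrence 1, 2, 3 and so on. The function names are hypothetical.

```python
import re
from typing import List

# Same pattern as the RXExtract/RXReplace expressions, written as a Python raw
# string, so each "\\" from the Spotfire version becomes a single "\".
PATTERN = re.compile(r"(?<=\(000)6\d{8}(?=\s\d{4}\))")


def extract_all(account: str) -> List[str]:
    """Every 9-digit match, in order: one per RXExtract occurrence/column."""
    return PATTERN.findall(account)


def count_matches(account: str) -> int:
    """Length-difference trick behind maxn: remove all matches, divide by 9 digits."""
    return (len(account) - len(PATTERN.sub("", account))) // 9


if __name__ == "__main__":
    row = "Oil Sats -Mt(MA) (000600007056 0001), Gas (000600007070 0003)"
    print(extract_all(row), count_matches(row))
```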

Replace string parts that appear twice (Oracle)

I am trying to work out in Oracle how to isolate/highlight word combinations in a concatenated string like the one below:
Some words##Again words##More of this||####||Some words##Again words##Other
The idea is to find the word combinations that appear exactly twice and replace them with 0, so that I'm left with the ones that appear only once, either on the left side of the ||####|| or on the right side. The result of the query should be something like this:
Highlighted
Some words##Again words##More of this||####||Some words##Again words##**Other**
Replaced
0##0##More of this||####||0##0##Other
To give you some more information about the concatenation: the left side (before the ||####||) is my current customer record, while on the right hand side I have the previous version. By making the replacements I can reveal any differences between customer records.
I have tried to get this done by using:
regexp_replace: this does not fully work. With REGEXP_REPLACE(MY STRING,'((Some words){1,2})|((Again words){1,2})','0',1,0), for some reason the string parts in my first record are never correctly replaced. I'm also hitting the limits of this function due to the number of word combinations I need to match;
nested CASE WHEN: this obviously does not work either, as CASE WHEN (even nested) stops at the first match, whereas I need all conditions to be checked and replaced.
I have thought about using subselects, but as this query uses one of the largest tables in my schema, this will not be usable except on a per customer basis. And it might still not work...
Some more information in order to find a solid, performant solution:
I have 34 possible word combinations to match
I have no idea which ones will be there, ever, except when I run the query obviously
I have no idea in which order they will be in the concatenated string
I hope this is clear. Anyone with some magical ideas?
Thanks in advance
You can use a recursive sub-query factoring clause to replace one duplicated term at each iteration:
WITH replaced ( value, start_char ) AS (
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
1
)
FROM table_name
UNION ALL
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
start_char + 1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
start_char + 1
)
FROM replaced
WHERE start_char > 0
)
SELECT value
FROM replaced
WHERE start_char = 0;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'Some words##Again words##More of this||####||Some words##Again words##Other' FROM DUAL UNION ALL
SELECT '333##123##789##555||####||123##456##789##222##333' FROM DUAL;
Outputs:
| VALUE |
| :------------------------------------ |
| 0##0##More of this||####||0##0##Other |
| 0##0##0##555||####||0##456##0##222##0 |
Explanation:
The regular expression matches:
(##|^) either two # characters or the start of the string ^ (in the first capturing group ());
([^#]+?) one-or-more characters that are not # (in the second capturing group ());
( the start of the 3rd capturing group;
(##[^#]+?)* two # characters followed by one-or-more non-# characters (in the 4th capturing group ()) all repeated zero-or-more * times;
\|\|####\|\| then two | characters, four # characters and two | characters;
([^#]+?##)* then one-or-more non-# characters followed by two # characters (in the 5th capturing group ()) all repeated zero-or-more * times;
) the end of the 3rd capturing group;
\2 a duplicate of the 2nd capturing group; then
(##|$) either two # characters or the end-of-the-string $ (in the 6th capturing group).
This is replaced by:
\10\30\6 which is the contents of the 1st capturing group then a zero (replacing the 2nd capturing group) then the contents of the 3rd capturing group then a second zero (replacing the matched duplicate) then the contents of the 6th capturing group.
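To see what a single REGEXP_REPLACE step does, the same pattern can be tried in Python's re module, which supports the lazy quantifiers and the \2 back-reference used here. This is a sketch for experimentation, not Oracle itself; the variable names are illustrative.

```python
import re

# The Oracle pattern, unchanged apart from being a Python raw string.
PATTERN = re.compile(
    r"(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)"
)
# Oracle reads '\10\30\6' as group 1, "0", group 3, "0", group 6.
# Python would read "\10" as group 10, so the \g<n> form is used instead.
REPLACEMENT = r"\g<1>0\g<3>0\g<6>"

value = "Some words##Again words##More of this||####||Some words##Again words##Other"
m = PATTERN.search(value)
first_pass = value[: m.start()] + m.expand(REPLACEMENT) + value[m.end():]
print(m.group(2))   # Some words
print(first_pass)   # 0##Again words##More of this||####||0##Again words##Other
```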
At each iteration, REGEXP_REPLACE replaces one pair of duplicate terms in the string (if one exists) and REGEXP_INSTR finds the start of that match; these go into value and start_char respectively. At the next iteration, the regular expression starts looking from the character after the start of the previous match, so it gradually moves across the string. When no more duplicate terms can be found, REGEXP_REPLACE performs no replacement, REGEXP_INSTR returns 0, and the iteration terminates.
The final query filters to return only the final level of the iteration (when all the duplicates have been replaced).
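The recursive CTE is essentially a loop, so it can be sketched in Python as one. This assumes Python's re behaves like Oracle's regex engine for this pattern; one possible difference is that Python's ^ only matches at the real start of the string when search() begins at a later position, and it is not certain Oracle treats its position argument the same way. The function name is hypothetical.

```python
import re

PATTERN = re.compile(
    r"(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)"
)
REPLACEMENT = r"\g<1>0\g<3>0\g<6>"  # Oracle's '\10\30\6'


def replace_duplicates(value: str) -> str:
    """Loop version of the recursive CTE: one duplicate pair per iteration."""
    start = 0  # 0-based equivalent of Oracle's 1-based search position
    while True:
        m = PATTERN.search(value, start)
        if m is None:  # REGEXP_INSTR would return 0, so the recursion stops
            return value
        value = value[: m.start()] + m.expand(REPLACEMENT) + value[m.end():]
        # Oracle's start_char is the 1-based match start, and the next search
        # begins at start_char + 1, i.e. one character after the match start
        start = m.start() + 1


if __name__ == "__main__":
    print(replace_duplicates("333##123##789##555||####||123##456##789##222##333"))
```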

To extract a string based on some specific characters

I have values in rows like below:
Https://abc/uvw/xyz
Https://def/klm/qew/asdas
Https://ghi/sdk/asda/as/aa/
Https://jkl/asd/vcx/asdsss/ssss/
Now I want the result to be like below:
Https://abc/uvw/xyz
Https://def/klm/qew
Https://ghi/sdk/asda
Https://jkl/asd/vcx
So how can I get this result by skipping / up to some count, or is there any other way to get this done in Excel? Is there any way to drop the right-hand part once 4 '/' characters have been found in the string?
You could use SUBSTITUTE to replace the nth / (in this case the 5th) with a unique character, use FIND to locate that character, and take everything before it with LEFT. I'll use CHAR(1) as the unique character:
=LEFT(A1,IFERROR(FIND(CHAR(1),SUBSTITUTE(A1,"/",CHAR(1),5))-1,LEN(A1)))
Another option would be to split on / using Text to Columns under the Data tab and join back only the columns you need.
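If a value has no 5th /, SUBSTITUTE leaves it unchanged, FIND fails, and IFERROR falls back to LEN(A1), so the whole value is returned. A Python sketch of both that formula and the Text to Columns split-and-join idea, with hypothetical function names (an illustration of the logic, not Excel itself):

```python
def left_before_nth(text: str, sep: str, n: int) -> str:
    """Mimic LEFT(A1, IFERROR(FIND(CHAR(1), SUBSTITUTE(A1, "/", CHAR(1), n)) - 1, LEN(A1)))."""
    idx = -1
    for _ in range(n):
        idx = text.find(sep, idx + 1)
        if idx == -1:
            # Fewer than n separators: FIND fails and IFERROR returns the whole text
            return text
    return text[:idx]  # everything before the n-th separator


def split_and_join(text: str, sep: str, keep: int) -> str:
    """Text to Columns style: split on sep, then re-join only the first `keep` pieces."""
    return sep.join(text.split(sep)[:keep])


if __name__ == "__main__":
    for url in ["Https://abc/uvw/xyz", "Https://def/klm/qew/asdas"]:
        print(left_before_nth(url, "/", 5), split_and_join(url, "/", 5))
```

"Https:" and the empty piece between the two leading slashes each count as a column, which is why 5 pieces are kept.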

Excel - remove characters after a specific character

I have column B with values:
0015-04D-SEAW
0015-ADLKM-SPOK
0015-D-CURR
0016-01N-BOIL
etc.
How can I remove all characters after the second dash, and the second dash itself as well? It should look like this:
0015-04D
0015-ADLKM
0015-D
0016-01N
Assuming B1 contains 0015-04D-SEAW,
this would do: =IFERROR(MID(B1,1,FIND("-",B1,FIND("-",B1,1)+1)-1),B1)
Result: 0015-04D
One dirty solution would be to convert text to columns delimited by - and then to concatenate the first two columns separated by -
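The formula nests two FIND calls: the inner one locates the first dash, and the outer one starts searching just after it, so it returns the position of the second dash. A Python sketch of that formula and of the text-to-columns alternative, using a hypothetical find helper that mimics Excel's 1-based FIND (raising an error where Excel would return #VALUE!):

```python
def find(needle: str, haystack: str, start: int = 1) -> int:
    """Hypothetical stand-in for Excel FIND: 1-based result, error when absent."""
    idx = haystack.find(needle, start - 1)
    if idx == -1:
        raise ValueError("#VALUE!")
    return idx + 1


def before_second_dash(b1: str) -> str:
    """Mimic =IFERROR(MID(B1,1,FIND("-",B1,FIND("-",B1,1)+1)-1),B1)."""
    try:
        second = find("-", b1, find("-", b1, 1) + 1)
        return b1[: second - 1]  # MID(B1, 1, second - 1)
    except ValueError:
        return b1  # IFERROR fallback: fewer than two dashes


def first_two_columns(b1: str) -> str:
    """Text to Columns on "-", then concatenate columns 1 and 2 with "-"."""
    return "-".join(b1.split("-")[:2])


if __name__ == "__main__":
    for value in ["0015-04D-SEAW", "0015-D-CURR"]:
        print(before_second_dash(value), first_two_columns(value))
```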
