Using REGEXP_SUBSTR to get key-value pair data

I have a column with values like the one below:
User_Id=446^User_Input=L307-60#/25" AP^^
I am trying to get each individual value based on a specified key:
All characters after User_Id= until it encounters ^
All characters after User_Input= until it encounters ^
This is what I have so far:
SELECT LTRIM(REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^'
,'[0-9]+',1,1),'^') User_Id
from dual
How do I get the value for User_Input?
P.S.: User input can contain anything, such as ', ", * or %, including a ^ in the middle of the string (that is, not as a delimiter).
Any help would be greatly appreciated.

This can be easily solved using boring old INSTR to calculate the offsets of the start and end points for the KEY and VALUE strings.
The trick is to use the optional occurrence parameter to identify the correct instance of =. Because the input can contain carets which aren't intended as delimiters, we need to use a negative position to search backwards and find the last ^.
with cte as (
select kv
, instr(kv, '=', 1, 1)+1 as k_st -- first occurrence
, instr(kv, '^', 1) as k_end
, instr(kv, '=', 1, 2)+1 as v_st -- second occurrence
, instr(kv, '^', -1) as v_end -- counting from back
from t23
)
select substr(kv, k_st, k_end - k_st) as user_id
, substr(kv, v_st, v_end - v_st) as user_input
from cte
/
Here is the requisite SQL Fiddle to prove it works. I think it's much easier to understand than any regex equivalent.
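For readers who want to check the offset arithmetic outside the database, here is a minimal Python sketch of the same idea: first "=", first "^", second "=", last "^". The function name split_pairs is made up for illustration, and the sketch assumes the string always has the User_Id=...^User_Input=...^ shape.

```python
def split_pairs(kv: str) -> tuple[str, str]:
    """Mirror the INSTR offsets: first '=', first '^', second '=', last '^'."""
    k_st = kv.index("=") + 1        # start of the User_Id value
    k_end = kv.index("^")           # first caret ends the User_Id value
    v_st = kv.index("=", k_st) + 1  # second '=' starts the User_Input value
    v_end = kv.rindex("^")          # search from the back for the closing caret
    return kv[k_st:k_end], kv[v_st:v_end]


user_id, user_input = split_pairs('User_Id=446^User_Input=L307-60#/25" AP^')
print(user_id)
print(user_input)
```

Because the closing caret is located from the end of the string, a caret inside the user input is kept as part of the value.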

If there is no particular need to use Regex, something like this returns the value.
WITH rslt AS (
SELECT 'User_Id=446^User_Input=L307-60#/25" AP^' val
FROM dual
)
SELECT LTRIM(SUBSTR(val
,INSTR(val, '=', 1, 2) + 1
,INSTR(val, '^', 1, 2) - (INSTR(val, '=', 1, 2) + 1)))
FROM rslt;
Of course, if you can't guarantee that there will not be any carets that are valid text characters, this will possibly return partial results.

Assuming that you will always have 'User_Id=' and 'User_Input=' in your string, I would use a capture group (subexpression) approach to parsing.
Use the starting anchor, ^, and the ending anchor, $. Look for 'User_Id=' and 'User_Input='.
Associate each value you are searching for with a capture group, then pick the group with the last argument of REGEXP_SUBSTR.
SCOTT@dev>
SELECT REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^','^User_Id=(.*\^)User_Input=(.*\^)$',1, 1, NULL, 1) User_Id
FROM dual
SCOTT@dev> /
USER
====
446^
SCOTT@dev>
SELECT REGEXP_SUBSTR('User_Id=446^User_Input=L307-60#/25" AP^','^User_Id=(.*\^)User_Input=(.*\^)$',1, 1, NULL, 2) User_Input
FROM dual
SCOTT@dev> /
USER_INPUT
================
L307-60#/25" AP^
SCOTT@dev>

Got this answer from a friend of mine. It looks simple and works great:
SELECT
regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Id=([^\^]+).*', '\1') User_Id,
regexp_replace('User_Id=446^User_Input=L307-60#/25" AP^^', '.*User_Input=(.*)[\^]$', '\1') User_Input
FROM dual
Posting it here in case any of you find it interesting.
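As a rough cross-check of the capture-group idea used in the regex answers above, here is a small Python sketch. It is a variation, not a translation of either query: the pattern keeps the delimiting carets outside the groups, and the names PATTERN and parse are invented for this example.

```python
import re

# Group 1: everything up to the first caret.
# Group 2: everything up to the final caret, so inner carets are kept.
PATTERN = re.compile(r'^User_Id=([^^]*)\^User_Input=(.*)\^$')


def parse(kv: str):
    """Return (user_id, user_input), or None if the string has another shape."""
    m = PATTERN.match(kv)
    return m.groups() if m else None


print(parse('User_Id=446^User_Input=L307-60#/25" AP^'))
```

The greedy (.*) followed by \^$ means only the very last caret is treated as the terminator.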

Related

Oracle PLSQL : How to remove duplicate data in string

Step 01: I have a column A in table tab_T that contains strings like these:
SELECT A FROM tab_T;
((<123>+<123>+<123>)(*<213>+<213>+<213>+<354>+<354>+<354>+1)(*<985>))(+<654>+<654>+1)
(<599>*<592>*<591>)
(<10945>)
(<736>+<736>+1)
(<216>*<518>)
(<598>*<593>)(*<594>+<594>+<594>+<597>+<595>+<595>+<595>)
...
...
I want to get :
((<123>)(*<213>+<354>+1)(*<985>))(+<654>+1)
(<599>*<591>)
(<10945>)
(<736>)
(<216>*<518>)
(<598>*<593>)(*<594>+<597>+<595>)
...
...
Step 02: Then I will replace '+' with 'AND' and '*' with 'OR', and delete the number '1' from my string.
This is my query (it works well, and I am sharing it in case it helps someone):
SELECT RTRIM(RTRIM(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(A,'+','AND'),'*','OR'),'(OR','OR('),'(AND','AND('),'(1)','')
,'OR1',''),'AND1',''),'1OR',''),'1AND',''),'ANDAND','AND'),'OROR','OR'),'AND'),'OR') AS logic
FROM tab_T
Result :
((<123>AND<123>AND<123>)OR(<213>AND<213>AND<213>AND<354>AND<354>AND<354>)OR(<985>))OR(<654>AND<654>)
(<599>OR<592>OR<591>)
(<10945>)
(<736>AND<736>)
(<216>OR<518>)
(<598>OR<593>)OR(<594>AND<594>AND<594>AND<597>AND<595>AND<595>AND<595>)
...
...
So when I apply step 01 and step 02, I will have this result:
((<123>)OR(<213>AND<354>)OR(<985>))AND(<654>)
(<599>OR<591>)
(<10945>)
(<736>)
(<216>OR<518>)
(<598>OR<593>)OR(<594>AND<597>AND<595>)
...
...
I need help or an idea for step 01, please.
Thanks

This will preserve the plus signs in-between the bracketed numbers:
select A original, regexp_replace(A, '(<\d+>)(\+?\1){1,}', '\1') fixed
from tab_T;
The regex can be read as: Remember a group of one or more digits inside of brackets when followed by a group of one or more of the SAME group of remembered numbers preceded by an optional plus sign. When this group is encountered, replace it with the first remembered group.
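To see the backreference at work outside Oracle, here is a minimal Python sketch of the same pattern. The function name collapse_repeats is made up for illustration, and Python's re module stands in for REGEXP_REPLACE, so small dialect differences are possible.

```python
import re


def collapse_repeats(expr: str) -> str:
    """Collapse runs of the same <number>, optionally joined by '+', into one."""
    # \1 refers back to whatever the first group matched, so only
    # consecutive copies of the same bracketed number are removed.
    return re.sub(r'(<\d+>)(?:\+?\1)+', r'\1', expr)


print(collapse_repeats('(<736>+<736>+1)'))
print(collapse_repeats('(*<213>+<213>+<213>+<354>+<354>+<354>+1)'))
```

Strings with no repeated neighbours, such as (<216>*<518>), pass through unchanged.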
EDIT: For the sake of completeness, here's the whole thing done with successive CTEs that break the replaces into logical groupings. This way it's a complete answer, and I believe it reduces the number of REPLACE() calls. You could do it as a bunch of nested REPLACEs, but I think this is arguably cleaner and easier to understand and maintain down the road.
with tab_T(A) as (
select '((<123>+<123>+<123>)(*<213>+<213>+<213>+<354>+<354>+<354>+1)(*<985>))(+<654>+<654>+1)' from dual union all
select '(<599>*<592>*<591>)' from dual union all
select '(<10945>)' from dual union all
select '(<736>+<736>+1)' from dual union all
select '(<216>*<518>)' from dual union all
select '(<598>*<593>)(*<594>+<594>+<594>+<597>+<595>+<595>+<595>)' from dual
),
-- Remove dups and '+1'
pass_1(original, fixed) as (
select A original, replace(regexp_replace(A, '(<\d+>)(\+?\1){1,}', '\1'), '+1') fixed
from tab_T
),
replace_ors(original, fixed) as (
select original, replace(replace(fixed, '(*', 'OR('), '*', 'OR')
from pass_1
),
replace_ands(original, fixed) as (
select original, replace(replace(fixed, '(+', 'AND('), '+', 'AND')
from replace_ors
)
select original, fixed
from replace_ands
;

I know this is not a full answer to your question, but maybe it can help you:
with t as (select '((<123>+<123>+<123>)(*<213>+<213>+<213>+<354>+<354>+<354>+1)(*<985>))(+<654>+<654>+1)' as exp from dual)
, t1 as ( select distinct regexp_substr(exp, '[^+]+', 1, level) names
from t
connect by level <= length(regexp_replace(exp, '[^*+]'))+1
)
SELECT
RTrim(listagg(t1.names,'+') WITHIN GROUP (order by names desc)) string
from t1

I found it :)
select REGEXP_REPLACE
(A,
'(<[^>]+>)(\+|\*?\1)*',
'\1') as logic
FROM tab_T
Thank you anyway ;)

Find Specific number from list

I have millions of records like these, but I am sharing only a few here.
What I need is to take just 8 characters from each record. Many records contain (.) and some contain (/), so those characters should be removed. Please see the sample output.
Records in the table:
GBR.FCL.AT.245448C.A
GBR.FCL.AT.225405L.A
at286623da
EASA UK/AT/311969F/A
AT/332092H/A
AT238691G/A
Output should be like this
245448CA
225405LA
286623da
311969FA
332092HA

Assuming we can rely on the sample as complete and representative (not always a safe assumption in SO), the desired output is the last eight characters of the string, ignoring . and /.
So the simplest thing that could possibly work would be to strip out the unwanted characters using translate(), then return the last eight characters:
select substr(translate(str, 'a./', 'a'), -8) as extracted_str
from your_table
A slightly more engineered solution would apply a regex to find a string of the format 999999AA:
select regexp_replace(translate(str, 'a./', 'a'),
'^(.*)([[:digit:]]{6}[[:alpha:]]{2})(.*)$', '\2'
) as extracted_str
from your_table
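Both ideas can be tried quickly in Python against the sample records from the question. This is only a sketch: the function names are invented, the characters stripped are "." and "/" as the question describes, and re.search returns the first six-digits-plus-two-letters match, which coincides with the intended one for these samples.

```python
import re


def last_eight(s: str) -> str:
    """Strip '.' and '/', then keep the last eight characters."""
    cleaned = s.replace(".", "").replace("/", "")
    return cleaned[-8:]


def by_pattern(s: str):
    """Strip '.' and '/', then look for six digits followed by two letters."""
    cleaned = s.replace(".", "").replace("/", "")
    m = re.search(r'\d{6}[A-Za-z]{2}', cleaned)
    return m.group(0) if m else None


for rec in ['GBR.FCL.AT.245448C.A', 'at286623da', 'EASA UK/AT/311969F/A']:
    print(rec, last_eight(rec), by_pattern(rec))
```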

Assuming that you need to get 8 characters, excluding / and ., starting after the string AT (regardless of case), and that there is exactly one occurrence of AT (in any case combination) in the input string, this should be what you need:
with input(x) as (
select 'GBR.FCL.AT.245448C.A' from dual union all
select 'GBR.FCL.AT.225405L.A' from dual union all
select 'at286623da' from dual union all
select 'EASA UK/AT/311969F/A' from dual union all
select 'AT/332092H/A' from dual union all
select 'AT238691G/A' from dual
)
select x as yourString,
-- both translate() calls must use the same 'x/.' -> 'x' mapping so that
-- the position found by instr() lines up with the string passed to substr()
substr(translate(x, 'x/.', 'x'), instr(translate(upper(x), 'x/.', 'x'), 'AT')+2, 8) as result
from input
Which gives:
YOURSTRING RESULT
-------------------- --------------------------------
GBR.FCL.AT.245448C.A 245448CA
GBR.FCL.AT.225405L.A 225405LA
at286623da 286623da
EASA UK/AT/311969F/A 311969FA
AT/332092H/A 332092HA
AT238691G/A 238691GA

Getting a string value that is between two characters

I need to get the value that is between !03 and !03.
Example:
JDC!0320151104!03OUT
I should get following string in return: 20151104
NOTE: The string isn't always 22 characters long, but I am only concerned with the value that is between !03 and !03.
This is what I have so far. I couldn't make any progress further than this:
SELECT
SUBSTRING(
RegStatsID,
CHARINDEX('!', RegStatsID) + 3,
CHARINDEX('!', REVERSE(RegStatsID))
)
From TableX

Great that you found a solution!
This might be better:
By replacing the "!03" with XML tags you can easily pick the second "node". Your string will be transformed into <x>JDC</x><x>20151104</x><x>OUT</x>:
DECLARE @test VARCHAR(100)='JDC!0320151104!03OUT';
SELECT CAST('<x>' + REPLACE(@test,'!03','</x><x>') + '</x>' AS XML).value('/x[2]','datetime')
One advantage is that you get the value between the two "!03" markers typed. In this case you get a "real" datetime back without any further casts. If the value there is not a datetime (or date) in all cases, just use nvarchar(max) as the type.
Another advantage: if you need the other values later, for whatever reason, you already have them via .value('/x[1]'...) or .value('/x[3]'...).
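The split-on-delimiter idea behind the XML trick can be sketched in a few lines of Python. The function name between is hypothetical, and the sketch assumes the delimiter appears at least twice; otherwise it returns None.

```python
def between(s: str, delim: str = "!03"):
    """Return the piece between the first and second occurrence of delim."""
    parts = s.split(delim)
    # parts[0] is the text before the first delimiter, parts[1] the text
    # between the first and second, and so on.
    return parts[1] if len(parts) >= 3 else None


print(between("JDC!0320151104!03OUT"))
```

As with the XML version, the other pieces stay available: parts[0] and parts[2] hold the text before and after the two markers.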

I was able to get it right by doing the following:
SELECT
SUBSTRING(
RegStatsID,
CHARINDEX('!', RegStatsID) + 3,
len(RegStatsID) - CHARINDEX('!', RegStatsID ) - 2 - CHARINDEX('!', Reverse(RegStatsID))
)
FROM TableX

SELECT part of string between symbol and space

I need to create a subquery (or view) column with values pulled from part of a long string. Values will appear like this:
"Recruiter: Recruiter Name Date:..."
I need to select the recruiter name after : and end with the space after the recruiter name. I understand that normalizing would be better, but we only have query access not database setup access in this case.
Ideas appreciated!

You can use a regex for this. A regex will let you express that you want to search for the text Recruiter followed by a colon, a space, and a series of characters followed by a space, and that you want it to extract those characters.
The expression might look a bit like this (untested)
Recruiter: (.+) Date:
This would look for 'Recruiter: ' literally, followed by a string of any characters (.) of length 1 or larger (+), which is to be extracted (the brackets), followed by the literal string ' Date:'.
How you use this with SQL depends on your vendor.
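As a quick way to try the expression before porting it to a particular database, here is a Python sketch. The sample text and the function name recruiter_name are made up, and the group is written as the non-greedy (.+?) so that it stops at the first ' Date:' rather than the last one.

```python
import re


def recruiter_name(text: str):
    """Extract the text between 'Recruiter: ' and ' Date:'."""
    m = re.search(r'Recruiter: (.+?) Date:', text)
    return m.group(1) if m else None


# Hypothetical sample value; the real column contents may differ.
print(recruiter_name("Recruiter: Jane Doe Date: 2020-01-01 Notes: none"))
```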

I would create a function that pulls out the value for a given key. You would use it like:
select [dbo].[GetValue]('recruiter',
'aKey: the a value Recruiter: James Bond cKey: the c value')
This returns 'James Bond'
Here is the function:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
create function [dbo].[GetValue](@Key varchar(50), @Line varchar(max))
returns varchar(max)
as
begin
declare @posStart int, @posEnd int
select @posStart=charindex(@Key, @Line) -- the start of the key
if(@posStart = 0)
return '' -- key not found
set @posStart = @posStart + len(@Key) + 1 -- the start of the value
select @line = substring(@line, @posStart, 1000) -- start @Line at the value
select @posEnd=charindex(':', @line) -- find the next key
if(@posEnd > 0)
begin
-- shorten @line to next ":"
select @line = substring(@line, 0, @posEnd)
-- take off everything after the value
select @posEnd = charindex(' ', reverse(@line));
if(@posEnd > 0)
select @line = substring(@line, 0, len(@line) - @posEnd + 1)
end
return rtrim(ltrim(@line))
end
go

SQL - Pulling everything after last forward slash

I'm trying to pull the string to the right of the last forward slash in the string below.
/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis
So basically, I would like to see ServiceAnalysis returned.
I've come across the following SQL, which is close to what I need, but it's not exact.
=MID(K19, FIND("/",K19)+1, LEN(K19))

DECLARE @test NVARCHAR(100)
SET @test = '/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis'
SELECT REVERSE(LEFT(REVERSE(@test), CHARINDEX('/', REVERSE(@test)) -1))
Reverse the String and find first instance of /
Find characters to the left of /
Reverse again to get your desired result
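The three steps above can be traced in a short Python sketch. The function name after_last_slash is invented for illustration, and unlike the T-SQL expression this version returns the whole string when no slash is present.

```python
def after_last_slash(path: str) -> str:
    """Reverse, cut at the first '/', reverse back."""
    rev = path[::-1]          # step 1: reverse the string
    idx = rev.find("/")       # position of the first '/' in the reversed string
    if idx == -1:
        return path           # no slash at all: keep everything
    return rev[:idx][::-1]    # steps 2 and 3: take the left part, reverse again


print(after_last_slash("/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis"))
```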

In SQL, you could do this:
declare @string varchar(100) = '/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis';
select RIGHT(@string,charindex('/',reverse(@string),1)-1)
However, still waiting to see if it's EXCEL you're referencing (since that looks like an EXCEL formula).
If it is Excel, then you can use the Reverse() function from this post and apply it like this:
Here's the formula:
=Reverse(LEFT(Reverse(A1),FIND("/",Reverse(A1),1)-1))

Regular Expressions to the rescue! you can achieve this using the RXReplace() function:
RXReplace([column],"^/.*/(.*)$","$1","")
I'll let you look up the RXReplace() documentation on your own, but just to explain the regex itself:
^/ matches the beginning of the string and the starting /
.*/ matches any characters that come next, followed by a /. Because .* is greedy, this is the final / before the end of the string (the one preceding the bit that we want to extract)
(.*)$ matches any characters that come next, putting them into a "capturing group" (basically a variable), followed by the end of the string
the $1 is a token which refers to the capturing group above (normally this looks like \1 in regex, but Spotfire is a bit different)
pretty much any time you need to deal with extracting bits of strings in Spotfire expressions, RXReplace() is what you want. it's a lot more sustainable than doing a ton of Left()s, Right()s, and Len()s, although the initial effort can be a bit higher.
more regex info at http://www.regular-expressions.info/.
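The same pattern can be tried in Python to see the greedy match at work. This is only an illustration of the regex, not of Spotfire's RXReplace(): the function name last_segment is made up, and Python writes the group reference as \1 instead of $1.

```python
import re


def last_segment(path: str) -> str:
    """Replace the whole string with the text after its final '/'."""
    # The greedy .* consumes as much as possible, so the '/' that follows it
    # is the last one in the string and group 1 is the trailing segment.
    return re.sub(r'^/.*/(.*)$', r'\1', path)


print(last_segment("/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis"))
```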

Another approach uses the PARSENAME() function:
DECLARE @String NVARCHAR(100)
SET @String = '/Applied Analytics/URMFG/Service Analysis/ServiceAnalysis'
SELECT PARSENAME(REPLACE(SUBSTRING(@String, 2, 100), '/', '.'), 1) AS [4th part],
PARSENAME(REPLACE(SUBSTRING(@String, 2, 100), '/', '.'), 2) AS [3rd part],
PARSENAME(REPLACE(SUBSTRING(@String, 2, 100), '/', '.'), 3) AS [2nd part],
PARSENAME(REPLACE(SUBSTRING(@String, 2, 100), '/', '.'), 4) AS [1st part]

The following SQL Statement worked for me in T-SQL 2017:
SELECT RIGHT([Filename], CHARINDEX('\', REVERSE('\' + [Filename])) - 1)
