Oracle extract variable number from string

In a query, I'm looking to extract a specific number from a variable-length string of mixed alphanumeric characters. I need this in order to calculate a range based on that number. I'm using Oracle.
Example:
D-3-J32P232
I need to grab at least the J32, and most likely just the 32, out of that string. This range of numbers can change at any time.
It could range from:
D-3-J1P232
to
D-3-J322P2342
The numbers after the second and third letters can be of any length. Is there any way to do this?

This is simpler and gets both numbers for the range:
select substr( REGEXP_SUBSTR('D-3-J322P2342','[A-Z][0-9]+',1,1),2),
substr( REGEXP_SUBSTR('D-3-J322P2342','[A-Z][0-9]+',1,2),2)
from dual
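To see what the two REGEXP_SUBSTR calls pick out, here is a rough Python equivalent using the standard `re` module. It illustrates the regex logic only and is not Oracle code: `findall` returns every occurrence of the same `[A-Z][0-9]+` pattern, and slicing off the first character plays the role of `substr(..., 2)`. The helper name `range_bounds` is made up for this example.

```python
import re

# Same pattern as the SQL: one uppercase letter followed by one or more digits.
LETTER_THEN_DIGITS = re.compile(r"[A-Z][0-9]+")


def range_bounds(code):
    """Return the numbers after the first two letter+digit runs (e.g. J322, P2342)."""
    matches = LETTER_THEN_DIGITS.findall(code)
    if len(matches) < 2:
        raise ValueError(f"expected two letter+digit groups in {code!r}")
    # Drop the leading letter, as substr(..., 2) does in the SQL.
    return int(matches[0][1:]), int(matches[1][1:])


print(range_bounds("D-3-J322P2342"))  # (322, 2342)
print(range_bounds("D-3-J1P232"))     # (1, 232)
```

The leading `D-3` is skipped because `D` is followed by a dash, not a digit.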

REGEXP_SUBSTR could work (11g version):
SELECT REGEXP_SUBSTR('D-3-J322P2342','([A-Z]+-\d+-[A-Z]+)(\d+)',1,1,'i',2) num
FROM dual;
A test of your sample data:
SQL> SELECT REGEXP_SUBSTR('D-3-J322P2342','([A-Z]+-\d+-[A-Z]+)(\d+)',1,1,'i',2) num
2 FROM dual;
NUM
---
322
SQL>
This accepts a string of letters in either case, followed by a dash, one or more digits, another dash and another string of letters in either case, then your number of interest (returned as subexpression 2).
In 10g it's a bit less straightforward, because REGEXP_SUBSTR did not get its subexpression argument until 11g, so REGEXP_REPLACE is used first to remove the prefix:
SELECT REGEXP_SUBSTR(str,'\d+',1,1) NUM
FROM (SELECT REGEXP_REPLACE('D-3-J322P2342','([A-Z]+-\d+-[A-Z]+)','',1,1,'i') str
FROM dual);
Your sample data:
SQL> SELECT REGEXP_SUBSTR(str,'\d+',1,1) NUM
2 FROM
(SELECT REGEXP_REPLACE('D-3-J322P2342','([A-Z]+-\d+-[A-Z]+)','',1,1,'i') str
3 FROM dual);
NUM
---
322
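Both approaches can be mirrored in Python as a quick way to test the pattern on other sample values. This is a sketch with the standard `re` module, not Oracle: the 11g version corresponds to reading capture group 2, the 10g version to deleting the prefix and then taking the first run of digits, and `re.IGNORECASE` stands in for the `'i'` match parameter. The function names are hypothetical.

```python
import re

# Prefix: letters, dash, digits, dash, letters (same as the Oracle pattern).
PREFIX = r"[A-Z]+-\d+-[A-Z]+"


def number_11g_style(code):
    """Like REGEXP_SUBSTR(..., 1, 1, 'i', 2): return capture group 2."""
    m = re.search(rf"({PREFIX})(\d+)", code, flags=re.IGNORECASE)
    return m.group(2) if m else None


def number_10g_style(code):
    """Like REGEXP_REPLACE to drop the prefix, then REGEXP_SUBSTR(str, '\\d+')."""
    rest = re.sub(PREFIX, "", code, count=1, flags=re.IGNORECASE)
    m = re.search(r"\d+", rest)
    return m.group(0) if m else None


print(number_11g_style("D-3-J322P2342"))  # 322
print(number_10g_style("D-3-J322P2342"))  # 322
```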

REGEXP_SUBSTR would do the job

Related

How to extract text from a string when multiple entries meet the criteria, and return all values

This is an example of the string; it can be longer:
1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats -Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)
The result needs to be: column1 600007056, column2 600007070, column3 600007053.
I am working in Spotfire and creating calculated columns through transformations, as I need the columns to join to other data sets.
I have tried the code below, but it only picks up the first 600... number, not the others, and there can be any number of them.
Account is the column with the string:
Mid([Account],
Find("(000",[Account]) + Len("(000"),
Find("0001)",[Account]) - Find("(000",[Account]) - Len("(000"))
Thank you!
Assuming my guess is correct, the pattern to look for is:
9 digits, starting with 6, preceded by an opening parenthesis and 3 zeros, and followed by a space, 4 digits and a closing parenthesis.
You can grab individual occurrences with:
column1: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',1)
column2: RXExtract([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))',2)
etc.
The tricky bit is knowing how many columns to define, since you say there can be many. One way is to first calculate the maximum number of occurrences like this:
maxn: Max((Len([Account]) - Len(RXReplace([Account],'(?<=\\(000)6\\d{8}(?=\\s\\d{4}\\))','','g'))) / 9)
This still assumes that each value to extract is 9 digits long. It subtracts the length of [Account] with the matches replaced by an empty string from the length of the original [Account], and divides the difference by 9.
You can then define up to maxn columns; for rows with fewer occurrences, the extra columns will be empty.
Note that Spotfire always needs two backslashes for escaping (I had to add more in the editor to make it render correctly; I hope I have not missed any).
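The lookbehind/lookahead pattern and the maxn count can be checked outside Spotfire with a Python sketch using the standard `re` module, which supports the same `(?<=...)` and `(?=...)` syntax. In a Python raw string a single backslash is enough where Spotfire needs two. The sample is the string from the question; `findall` returns all occurrences at once, and the length-difference calculation is included to show it agrees with that count.

```python
import re

# 9 digits starting with 6, preceded by "(000" and followed by a space, 4 digits and ")".
ACCOUNT_ID = re.compile(r"(?<=\(000)6\d{8}(?=\s\d{4}\))")

sample = ("1160752 Meranji Oil Sats -Mt(MA) (000600007056 0001), PE:Toolachee Gas Sats "
          "-Mt(MA) (000600007070 0003)GL: Contract Services (510000), COT: Network (N), "
          "CO: OM-A00009.0723,Oil Sats -Mt(MA) (000600007053 0003)")

# Every occurrence, like RXExtract(..., 1), RXExtract(..., 2), ...
ids = ACCOUNT_ID.findall(sample)

# Same idea as maxn: characters removed by the replacement, divided by 9 digits per match.
count = (len(sample) - len(ACCOUNT_ID.sub("", sample))) // 9

print(ids)    # ['600007056', '600007070', '600007053']
print(count)  # 3
```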

Alphanumeric Validation in Microsoft Excel

1- I'd like to use a validation rule for an input cell where the entry must be 7 or 8 alphanumeric characters long.
2- The string must start with 1 or 2 letters, and they must be uppercase.
3- The string must end with exactly 6 digits.
4- The following entries must be accepted:
FD456789
X256325
Z899666
DQ985421
FD000052
5- I have created a validation formula. It works fine, except that it cannot check that the 2nd character of the string is a letter. I used AP656569 and A5656569 for testing: it should allow only AP656569, but it allows both strings.
Formula: =AND(OR(LEN(A3)=7,LEN(A3)=8),ISNUMBER(VALUE(RIGHT(A3,6))),IF(LEN(A3)=7,NOT(ISNUMBER(VALUE(LEFT(A3,1)))),ISTEXT(MID(A3,2,1))))
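Both strings pass because MID always returns text, so ISTEXT(MID(A3,2,1)) is TRUE even when the second character is a digit. You may try: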
=AND(AND(LEN(A1)>6,LEN(A1)<9,ISNUMBER(RIGHT(A1,6)*1),CODE(A1)>64,CODE(A1)<91),IF(LEN(A1)=8,AND(CODE(MID(A1,2,1))>64,CODE(MID(A1,2,1))<91),1))
=AND( - Let's check two things:
AND( - Check if multiple conditions are TRUE:
LEN(A1)>6 - Check if the string is over 6 chars.
LEN(A1)<9 - Check if the string is under 9 chars.
ISNUMBER(RIGHT(A1,6)*1) - Check if the 6 rightmost characters make up a numeric value.
CODE(A1)>64,CODE(A1)<91 - Check if the leftmost character is in class [A-Z].
IF( - Check the following:
LEN(A1)=8 - Check if the length is actually 8.
AND( - If TRUE then check the following:
CODE(MID(A1,2,1))>64,CODE(MID(A1,2,1))<91 - Check if 2nd char is in class [A-Z].
1 - If the length is not 8, it must be 7, therefore we return 1 (equivalent to TRUE) so as not to affect our parent AND().
You can use this as the formula of a custom validation rule if you want to prevent invalid data, or, as mentioned in the comments, in conditional formatting if you want to allow invalid data but flag it after it has been entered.
Alternatively, if you have Excel 2019 or higher, and you like code-golf you could use:
=AND(ISNUMBER(RIGHT(A1,6)*1),CODE(A1)>64,CODE(A1)<91,SWITCH(LEN(A1),7,1,8,AND(CODE(MID(A1,2,1))>64,CODE(MID(A1,2,1))<91),0))
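To test the rule against the sample entries, the formula's logic can be mirrored in Python. This is a sketch, not Excel: the `CODE(...)` range 65 to 90 becomes a check for `'A'` to `'Z'`, and `ISNUMBER(RIGHT(A1,6)*1)` is approximated by requiring six ASCII digits, which is likely stricter than Excel's text-to-number coercion. The function names are hypothetical; the regex at the end states the same rule in one line for comparison.

```python
import re


def is_upper_ascii(ch):
    # Equivalent of CODE(ch) > 64 and CODE(ch) < 91, i.e. 'A'..'Z'.
    return "A" <= ch <= "Z"


def valid_code(s):
    """Mirror of the AND(...) formula: 1-2 uppercase letters, then 6 digits."""
    if len(s) not in (7, 8):
        return False
    # Strict stand-in for ISNUMBER(RIGHT(A1,6)*1).
    if not all(c in "0123456789" for c in s[-6:]):
        return False
    if not is_upper_ascii(s[0]):
        return False
    # Only 8-character entries need a second letter (the IF(LEN(A1)=8, ...) branch).
    return len(s) == 7 or is_upper_ascii(s[1])


PATTERN = re.compile(r"[A-Z]{1,2}[0-9]{6}")

for entry in ["FD456789", "X256325", "Z899666", "DQ985421", "FD000052", "AP656569", "A5656569"]:
    print(entry, valid_code(entry), bool(PATTERN.fullmatch(entry)))
```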
Your conditions do not exclude a string like A1234567 (1 capital letter, 7 digits). According to your conditions and assuming your string is in cell A1, this formula should work:
=AND(OR(LEN(A1)=7,LEN(A1)=8),OR(IFERROR(LEFT(A1,1)*1,0)=0,AND(IFERROR(LEFT(A1,1)*1,0)=0,IFERROR(LEFT(A1,2)*1,0)=0)),UNICODE(A1)=UNICODE(UPPER(A1)),UNICODE(MID(A1,2,1))=UNICODE(UPPER(MID(A1,2,1))),IFERROR(MID(RIGHT(A1,6),1,1)*1,0),IFERROR(MID(RIGHT(A1,6),2,1)*1,0),IFERROR(MID(RIGHT(A1,6),3,1)*1,0),IFERROR(MID(RIGHT(A1,6),4,1)*1,0),IFERROR(MID(RIGHT(A1,6),5,1)*1,0),IFERROR(MID(RIGHT(A1,6),6,1)*1,0))
It's basically an AND function that contains:
a condition to check the length of the string: OR(LEN(A1)=7,LEN(A1)=8)
a condition to check if the first 2 characters of the string are letters (only the first or both): OR(IFERROR(LEFT(A1,1)*1,0)=0,AND(IFERROR(LEFT(A1,1)*1,0)=0,IFERROR(LEFT(A1,2)*1,0)=0))
a condition to check if the first character is a capital: UNICODE(A1)=UNICODE(UPPER(A1))
a condition to check if the second character is a capital: UNICODE(MID(A1,2,1))=UNICODE(UPPER(MID(A1,2,1)))
a condition for each of the last 6 characters to check if it is numeric (the example shows the first one): IFERROR(MID(RIGHT(A1,6),1,1)*1,0)
EDIT: Improvements
The formula can be improved like this:
=AND(OR(LEN(A1)=7,LEN(A1)=8),OR(IFERROR(LEFT(A1,1)*1,0)=0,AND(IFERROR(LEFT(A1,1)*1,0)=0,IFERROR(LEFT(A1,2)*1,0)=0)),EXACT(LEFT(A1,2),UPPER(LEFT(A1,2))),ISNUMBER(RIGHT(A1,6)*1))
It's still an AND function. These are the changes:
it contains a single condition to check if the first 2 characters are capitals (previously there was one per character, using the UNICODE function): EXACT(LEFT(A1,2),UPPER(LEFT(A1,2))) [CREDIT: JvdV]
it contains a single condition to check if the last 6 characters are numeric (previously there was one per character, using the IFERROR function, which also rejected any 0 digit because AND() treats 0 as FALSE): ISNUMBER(RIGHT(A1,6)*1)
EDIT: correction
To exclude special characters, I've edited the formula:
=AND(OR(LEN(A1)=7,LEN(A1)=8),OR(AND(UNICODE(A1)>64,UNICODE(A1)<91,ISNUMBER(MID(A1,2,1)*1)),AND(UNICODE(A1)>64,UNICODE(A1)<91,UNICODE(MID(A1,2,1))>64,UNICODE(MID(A1,2,1))<91)),EXACT(LEFT(A1,2),UPPER(LEFT(A1,2))),ISNUMBER(RIGHT(A1,6)*1))
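This corrected formula reads the rules differently from the first answer: for an 8-character entry it also accepts a digit in second position, so A1234567 and A5656569 pass. A Python sketch of that reading makes the difference easy to test. The function name is hypothetical, and the six-digit check is again a strict stand-in for `ISNUMBER(RIGHT(A1,6)*1)`.

```python
DIGITS = "0123456789"


def is_upper_ascii(ch):
    # Equivalent of UNICODE(ch) > 64 and UNICODE(ch) < 91, i.e. 'A'..'Z'.
    return "A" <= ch <= "Z"


def valid_code_v3(s):
    """Mirror of the corrected formula."""
    if len(s) not in (7, 8):
        return False
    # First character A-Z; second character either a digit or A-Z.
    first_two_ok = is_upper_ascii(s[0]) and (s[1] in DIGITS or is_upper_ascii(s[1]))
    # EXACT(LEFT(A1,2), UPPER(LEFT(A1,2))): no lowercase letters in the first two.
    upper_ok = s[:2] == s[:2].upper()
    # Strict stand-in for ISNUMBER(RIGHT(A1,6)*1).
    tail_ok = all(c in DIGITS for c in s[-6:])
    return first_two_ok and upper_ok and tail_ok


for entry in ["AP656569", "A5656569", "A1234567", "X256325", "A@123456", "ap656569"]:
    print(entry, valid_code_v3(entry))
```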

Replace string parts that appear twice Oracle

I am trying to work out in Oracle how to isolate/highlight word combinations in a concatenated string like the one below:
Some words##Again words##More of this||####||Some words##Again words##Other
The idea is to find the word combinations that appear exactly twice and replace them with 0, so I'm left with the ones that appear only once, either on the left side of the ||####|| or on the right side. The result of the query should be something like this:
Highlighted
Some words##Again words##More of this||####||Some words##Again words##**Other**
Replaced
0##0##More of this||####||0##0##Other
To give you some more information about the concatenation: the left side (before the ||####||) is my current customer record, while on the right hand side I have the previous version. By making the replacements I can reveal any differences between customer records.
I have tried to get this done by using:
regexp_replace: this does not fully work with REGEXP_REPLACE(MY STRING,'((Some words){1,2})|((Again words){1,2})','0',1,0) because, for some reason, the string parts in my first record are never correctly replaced. I'm also hitting the limits of this function due to the number of word combinations I need to match;
nested CASE WHEN: obviously does not work either, as CASE WHEN (even nested) stops at the first match, but I need all conditions checked and replaced.
I have thought about using subselects, but as this query uses one of the largest tables in my schema, this will not be usable except on a per customer basis. And it might still not work...
Some more information in order to find a solid, performant solution:
I have 34 possible word combinations to match
I have no idea which ones will be there, ever, except when I run the query obviously
I have no idea in which order they will be in the concatenated string
I hope this is clear. Anyone with some magical ideas?
Thanks in advance
You can use a recursive sub-query factoring clause to replace one duplicated term at each iteration:
WITH replaced ( value, start_char ) AS (
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
1
)
FROM table_name
UNION ALL
SELECT REGEXP_REPLACE(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
'\10\30\6',
start_char + 1
),
REGEXP_INSTR(
value,
'(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)',
start_char + 1
)
FROM replaced
WHERE start_char > 0
)
SELECT value
FROM replaced
WHERE start_char = 0;
Which, for the sample data:
CREATE TABLE table_name ( value ) AS
SELECT 'Some words##Again words##More of this||####||Some words##Again words##Other' FROM DUAL UNION ALL
SELECT '333##123##789##555||####||123##456##789##222##333' FROM DUAL;
Outputs:
| VALUE |
| :------------------------------------ |
| 0##0##More of this||####||0##0##Other |
| 0##0##0##555||####||0##456##0##222##0 |
Explanation:
The regular expression matches:
(##|^) either two # characters or the start of the string ^ (in the first capturing group ());
([^#]+?) one-or-more characters that are not # (in the second capturing group ());
( the start of the 3rd capturing group;
(##[^#]+?)* two # characters followed by one-or-more non-# characters (in the 4th capturing group ()) all repeated zero-or-more * times;
\|\|####\|\| then two | characters, four # characters and two | characters;
([^#]+?##)* then one-or-more non-# characters followed by two # characters (in the 5th capturing group ()), all repeated zero-or-more * times;
) the end of the 3rd capturing group;
\2 a duplicate of the 2nd capturing group; then
(##|$) either two # characters or the end-of-the-string $ (in the 6th capturing group).
This is replaced by:
\10\30\6 which is the contents of the 1st capturing group then a zero (replacing the 2nd capturing group) then the contents of the 3rd capturing group then a second zero (replacing the matched duplicate) then the contents of the 6th capturing group.
The query replaces one pair of duplicate terms in the string (if one exists), and REGEXP_INSTR finds the start of that match; these are put into value and start_char, respectively. At the next iteration, the regular expression starts looking from the character after the start of the previous match, so it gradually moves across the string finding matches. When no more duplicate terms can be found, REGEXP_REPLACE performs no replacement, REGEXP_INSTR returns 0, and the iteration terminates.
The final query filters to return only the final level of the iteration (when all the duplicates have been replaced).
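The iteration can be reproduced in Python to try the regular expression on the sample rows before running it on the real table. This is a sketch, not the Oracle query: `Pattern.search(value, pos)` plays the role of the `position` argument, and because Python would read `\10` as a reference to group 10, the replacement is written with explicit `\g<n>` references. As in the recursive query, each pass restarts one character after the start of the previous match. The function name is hypothetical.

```python
import re

# Same pattern as the Oracle query.
DUPLICATE_TERM = re.compile(
    r"(##|^)([^#]+?)((##[^#]+?)*\|\|####\|\|([^#]+?##)*)\2(##|$)"
)


def zero_duplicates(value):
    """Replace every term that appears on both sides of ||####|| with 0."""
    pos = 0
    while True:
        m = DUPLICATE_TERM.search(value, pos)
        if m is None:  # REGEXP_INSTR would return 0: stop iterating
            return value
        # Group 1, a zero, group 3, a zero, group 6 (Oracle's '\10\30\6').
        replacement = m.expand(r"\g<1>0\g<3>0\g<6>")
        value = value[:m.start()] + replacement + value[m.end():]
        pos = m.start() + 1  # like start_char + 1 in the recursive member


print(zero_duplicates("Some words##Again words##More of this||####||Some words##Again words##Other"))
# 0##0##More of this||####||0##0##Other
print(zero_duplicates("333##123##789##555||####||123##456##789##222##333"))
# 0##0##0##555||####||0##456##0##222##0
```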

Is Excel's FALSE like infinity?

Why does =FALSE<10000000000 evaluate as FALSE and =FALSE>10000000000 evaluate as TRUE? I have tried some different numbers and this seems to always be the case.
This is by design. Search help for "Troubleshoot Sort" to see the default sort order.
In an ascending sort, Microsoft Excel uses the following order.
Numbers: Numbers are sorted from the smallest negative number to the largest positive number.
Alphanumeric sort: When you sort alphanumeric text, Excel sorts left to right, character by character. For example, if a cell contains the text "A100," Excel places the cell after a cell that contains the entry "A1" and before a cell that contains the entry "A11."
Text and text that includes numbers are sorted in the following order:
0 1 2 3 4 5 6 7 8 9 (space) ! " # $ % & ( ) * , . / : ; ? @ [ \ ] ^ _ ` { | } ~ + < = > A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Apostrophes (') and hyphens (-) are ignored, with one exception: If two text strings are the same except for a hyphen, the text with the hyphen is sorted last.
Logical values: In logical values, FALSE is placed before TRUE.
Error values: All error values are equal.
Blanks: Blanks are always placed last.
The default sort order matters because that is how Excel was designed to compare different data types. Logical values are always after text and numbers. Error values are always after that. Blanks are always last. When you use comparison operators (<, <=, =, etc.) it uses the same comparison algorithm as the sort (or, more likely, the sort algorithm uses the comparison operator code, which makes them identical).
TRUE<>1 according to the sort order, but --TRUE=1. The formula parser recognizes that you're trying to negate something. If it's a Boolean value, it converts it to 0 or 1. There's nothing 0-ish or 1-ish about the Boolean value; it's just the result of an internal type coercion function. If you type --"SomeString" it does the same thing: it sends the string into the type coercion function, which reports back 'Unable to coerce', and the cell ends up showing #VALUE!.
That's the 'Why it behaves that way' answer. I don't know the 'Why did they design it that way' answer.
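The type-first ordering can be modelled with a small Python comparator. This is a simplified sketch, not Excel's implementation: it covers only numbers, text and logical values, ranks them in that order, and assumes, as in Excel, that text comparison ignores case. Python's `bool` is a subclass of `int`, so booleans have to be detected before numbers. The function names are hypothetical.

```python
# Rank of each type in Excel's comparison order: numbers < text < logical values.
TYPE_RANK = {"number": 0, "text": 1, "logical": 2}


def excel_type(value):
    # bool must be checked before int/float because bool is a subclass of int.
    if isinstance(value, bool):
        return "logical"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "text"
    raise TypeError(f"unsupported value: {value!r}")


def sort_key(value):
    kind = excel_type(value)
    # The type rank decides first; values are only compared within the same type.
    inner = value.lower() if kind == "text" else value
    return (TYPE_RANK[kind], inner)


def excel_less_than(a, b):
    """Rough model of Excel's a < b across types."""
    return sort_key(a) < sort_key(b)


print(excel_less_than(False, 10000000000))  # False, like =FALSE<10000000000
print(excel_less_than(10000000000, False))  # True, like =FALSE>10000000000
```

The same key also reproduces the sort order, e.g. `sorted([True, "abc", 5, False, -1.5], key=sort_key)`.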
Obviously, the Boolean values TRUE/FALSE are a different data type from numbers. Check this (http://msdn.microsoft.com/en-us/library/office/bb687869.aspx) to see that Boolean values are stored as a 2-byte short integer (or whatever a short integer is on a given architecture). However, that only describes how the data is stored in memory; Excel actually has a separate data type for Booleans. Specifically: xltypeNum for numbers, xltypeStr for strings, and xltypeBool for the values discussed here.
Comparisons between values of the same type are clear; what TRUE<1000 does is probably not meaningful or useful.
Ways to overcome this issue:
=ABS(BOOLEAN_VAR), i.e. =ABS(FALSE) --> 0 and =ABS(TRUE) --> 1
or
=INT(BOOLEAN_VAR), i.e. =INT(FALSE) --> 0 and =INT(TRUE) --> 1
or
=BOOLEAN_VAR*1, i.e. =FALSE*1 --> 0 and =TRUE*1 --> 1
or
=--BOOLEAN_VAR, i.e. =--FALSE --> 0 and =--TRUE --> 1
Each of these forces Excel to return a number, either by passing the Boolean to a function or by using it in an arithmetic expression.

Group digits in currency and remove leading zeroes

I want to know how to do the following:
Digit grouping
When I have a money value, for example 3000000 (3 million), I want to print 3.000.000 on the screen (a dot every three digits, counting from the last one).
Removing leading zeroes
When I select a value from a table and print it on the screen, the value is automatically padded with zeroes, e.g. 129 becomes 0000129.
The WRITE statement allows you to specify a currency. Example:
DATA price TYPE p DECIMALS 2.
price = '3000000'.
WRITE: / price CURRENCY 'USD'.
Note that this does not interpret the number itself; it just adds commas and dots at certain positions depending on the currency you specify. So if you have an integer with the value 3000000 and you write it with currency USD, the result will be 30.000,00.
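The effect described above, where the digits stay the same and only the decimal and grouping separators are placed according to the currency's number of decimals, can be illustrated in Python. This is not ABAP: `format_amount` is a hypothetical helper, it assumes USD has 2 decimal places, and it uses `.` for grouping and `,` for decimals, which in SAP depend on the user's settings.

```python
def format_amount(raw_digits, currency_decimals, group_sep=".", decimal_sep=","):
    """Place the currency's decimals into the raw digits, then group thousands."""
    if raw_digits < 0:
        raise ValueError("this sketch only handles non-negative amounts")
    whole, fraction = divmod(raw_digits, 10 ** currency_decimals)
    # Python's "," format spec groups by thousands; swap in the wanted separator.
    grouped = f"{whole:,}".replace(",", group_sep)
    if currency_decimals == 0:
        return grouped
    return f"{grouped}{decimal_sep}{fraction:0{currency_decimals}d}"


print(format_amount(3000000, 2))  # 30.000,00  (integer 3000000 written as USD)
print(format_amount(3000000, 0))  # 3.000.000  (a currency without decimals)
```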
I suggest you read the F1 help information on the WRITE statement, because there are a lot more options besides this one.
--
Removing leading zeroes is done by using a conversion routine.
The CONVERSION_EXIT_ALPHA_INPUT will add leading zeroes and CONVERSION_EXIT_ALPHA_OUTPUT will remove them.
It is possible to add these routines to a Domain in the dictionary, so the conversion will be done automatically. For example the MATNR type:
DATA matnr TYPE matnr.
matnr = '0000129'.
WRITE: / matnr.
This will output 129 because the Domain MATNR has a conversion routine specified.
In the case of a type which does not have this, for example:
DATA value(7) TYPE n.
value = '0000129'.
WRITE: / value.
The output will be 0000129. You can call the CONVERSION_EXIT_ALPHA_OUTPUT routine to achieve the output without leading zeroes:
DATA value(7) TYPE n.
value = '0000129'.
CALL FUNCTION 'CONVERSION_EXIT_ALPHA_OUTPUT'
EXPORTING
input = value
IMPORTING
output = value.
WRITE: / value.
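For a purely numeric value, the output conversion amounts to stripping the leading zeros, which can be sketched in Python. This is not the SAP function module: `alpha_output` is a hypothetical name, it only handles digits-only input (anything else is returned unchanged), and keeping a single `0` for an all-zero value is a choice made for this sketch, not documented ABAP behaviour.

```python
def alpha_output(value):
    """Strip leading zeros from a digits-only string, e.g. '0000129' -> '129'."""
    if not value or any(c not in "0123456789" for c in value):
        # Sketch limitation: non-numeric values are returned unchanged.
        return value
    stripped = value.lstrip("0")
    # Sketch choice: keep one zero instead of returning an empty string.
    return stripped or "0"


print(alpha_output("0000129"))  # 129
```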
Please also note that output conversion for number-like types, triggered by the WRITE statement, is controlled by a setting in the user master data (the decimal notation).
Decimal separator and digit grouping should be configured there.
You can check this in the user master transactions, e.g. SU01 or SU01D.
To remove the zero padding, use the NO-ZERO addition. The thousands separator should not be a problem, because it is the standard way ABAP prints values of type P. Here is some sample code:
REPORT ZZZ.
DATA:
g_n TYPE n LENGTH 10 VALUE '129',
g_p TYPE p LENGTH 12 DECIMALS 2 VALUE '3000000'.
START-OF-SELECTION.
WRITE /: g_n, g_p.
WRITE /: g_n NO-ZERO, g_p.
This produces the following output:
0000000129
3.000.000,00
129
3.000.000,00
For removing leading zeros, you can do the following:
data: lv_n type n length 10 value '129'.
shift lv_n left deleting leading '0'.
