Removing whitespace between numbers using regex in Ab Initio when the combined length (excluding spaces) is 9

For example, if my input is: some text here 345 646 356 some text 235 5343
the output should be: some text here 345646356 some text 235 5343
That is, the spaces between digit groups should be removed only when the groups add up to exactly 9 digits; otherwise the spaces should stay as they are.
I tried the command below, but it removes every space between digits, whether the combined length is less than, equal to, or greater than 9.
Input : my data is 345 245 254 and 454 356 34 and
Logic :
final_value = re_replace( final_value , "((?<=\d) +(?=\d))" ,"");
Output :
my data is 345245254 and 45435634 and
But I need the output to be
my data is 345245254 and 454 356 34 and

This is long but it works, assuming the string does not contain \x00 characters:
out::clean_9_digits(instr)=
begin
  let string(int)[int] strv=string_split_no_empty(instr, " "); // length-prefixed to mark removed elements with "\0"
  let int i=0;
  while (i<length_of(strv))
    if ( not string_is_numeric(strv[i]))
      i = i + 1; // continue
    else
    begin
      let string(int) thisnum = strv[i];
      let int j = i + 1;
      while (j < length_of(strv) && string_is_numeric(strv[j]) )
      begin // concatenate all following numeric elements
        thisnum = string_concat(thisnum,strv[j]);
        j=j+1;
      end;
      if (length_of(thisnum) == 9) // match!
      begin
        strv[i]=thisnum; // replace first element with the 9 digit number
        j = i + 1; // mark remaining numeric elements that were combined
        while (j < length_of(strv) && string_is_numeric(strv[j]) )
        begin
          strv[j]="\0";
          j = j + 1;
        end;
      end;
      i=j+1; // continue at next element following numeric elements
    end;
  // A marked element is never the first one, so it is always preceded by a space;
  // removing " \0" also covers a marked element at the very end of the string.
  out :: string_replace(string_join(strv, " "), " \0", "");
end;
/*Reformat operation*/
out::reformat(in)=
begin
  out.instr :: in.instr;
  out.outstr :: clean_9_digits(in.instr);
end;

Try with this REGEXP (edited):
re_match_replace_all( str=in.final_value,
pattern="(\\D\\s*\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d)\\s*(\\d\\s*\\D)",
replace_str="$1$2$3$4$5$6$7$8$9"
)
(This does not solve the problem in all cases, see comments)
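For comparison, here is a minimal Python sketch of the same rule, not Ab Initio code. It assumes a "number" is a whitespace-delimited token made only of digits and that groups are separated by plain spaces; the function name join_nine_digit_runs is made up for illustration. A callback decides per run of digit groups whether the spaces are dropped, which a single fixed replacement pattern cannot easily do.

```python
import re

def join_nine_digit_runs(text):
    """Drop the spaces inside a run of digit groups only if the run totals 9 digits."""
    def fix(match):
        digits = match.group(0).replace(" ", "")
        # Keep the run untouched unless it is exactly 9 digits long.
        return digits if len(digits) == 9 else match.group(0)

    # A run is a digit-only token followed by any number of space-separated
    # digit-only tokens; the lookarounds keep it aligned to whole tokens.
    return re.sub(r"(?<!\S)\d+(?: +\d+)*(?!\S)", fix, text)

print(join_nine_digit_runs("my data is 345 245 254 and 454 356 34 and"))
```

On the two inputs from the question this leaves the 8-digit runs (235 5343 and 454 356 34) as they are and joins only the 9-digit ones.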

Related

Longest prefix+suffix-combination in set of strings

I have a set of strings (less than 30) of length 1 to ~30. I need to find the subset of at least two strings that share the longest possible prefix- + suffix-combination.
For example, let the set be
Foobar
Facar
Faobaron
Gweron
Fzobar
The prefix/suffix F/ar has a combined length of 3 and is shared by Foobar, Facar and Fzobar; the prefix/suffix F/obar has a combined length of 5 and is shared by Foobar and Fzobar. The searched-for prefix/suffix is F/obar.
Note that this is not to be confused with the longest common prefix/suffix, since only two or more strings from the set need to share the same prefix+suffix. Also note that the sum of the lengths of both the prefix and the suffix is what is to be maximized, so both need to be taken into account. The prefix or suffix may be the empty string.
Does anyone know of an efficient method to implement this?

How about this:
maxLen := -1;
for I := 0 to Len(A) - 1 do
  if Len(A[I]) > maxLen then // (1)
    for J := 0 to Len(A[I]) do
      for K := 0 to Len(A[I]) - J do
        if J+K > maxLen then // (2)
        begin
          prf := LeftStr(A[I], J);
          suf := RightStr(A[I], K);
          found := False;
          for m := 0 to Len(sufList) - 1 do
            if (sufList[m] = suf) and (prfList[m] = prf) then
            begin
              maxLen := J+K;
              Result := prf+'/'+suf;
              found := True;
              // (3)
              n := 0;
              while n < Len(sufList) do
                if Len(sufList[n])+Len(prfList[n]) <= maxLen then
                begin
                  sufList.Delete(n);
                  prfList.Delete(n);
                end
                else
                  Inc(n);
              // (end of 3)
              Break;
            end;
          if not found then
          begin
            sufList.Add(suf);
            prfList.Add(prf);
          end;
        end;
In this example, maxLen holds the combined length of the longest shared prefix/suffix found so far. The most important part is the line marked (2): it skips a large number of unnecessary string comparisons. Section (3) removes every stored prefix/suffix pair that is no longer than the newly found one (which is duplicated).
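For a set this small (under 30 strings of about 30 characters), a plain brute-force version is also workable. The Python sketch below is not a port of the code above: it simply records which strings own each non-overlapping (prefix, suffix) pair and picks the longest pair owned by at least two strings. The function name longest_shared_affixes is hypothetical.

```python
from collections import defaultdict

def longest_shared_affixes(words):
    """Return the (prefix, suffix) pair of maximal combined length shared by
    at least two of the given strings, or None if there are fewer than two."""
    owners = defaultdict(set)  # (prefix, suffix) -> indices of strings having both
    for idx, word in enumerate(words):
        n = len(word)
        for p in range(n + 1):
            for s in range(n - p + 1):  # prefix and suffix may not overlap
                owners[(word[:p], word[n - s:])].add(idx)
    shared = [pair for pair, who in owners.items() if len(who) >= 2]
    return max(shared, key=lambda pair: len(pair[0]) + len(pair[1]), default=None)

print(longest_shared_affixes(["Foobar", "Facar", "Faobaron", "Gweron", "Fzobar"]))
```

For the example set from the question this yields the pair F/obar. The work grows with the number of strings times the square of their length, which is small at these sizes.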

PLSQL - Concatenate two strings and split into 3 strings on condition

In PL/SQL, I want to concatenate two strings taken from 2 columns (address line 1 and address line 2, max 45 characters each) and split the result into 3 strings (address line 1, address line 2, address line 3, max 34 characters each), with the condition that no word is cut in the middle. For example:
If address line 1 contains:
1, abc park, def chowk, ghi marg c-123 street
and address line 2 contains:
city mumbai, pin - 435353
Combined with numbering to show where 34 characters falls:
         1111111111222222222233333         1111111111222222222233333
12345678901234567890123456789012341234567890123456789012345678901234123
1, abc park, def chowk, ghi marg c-123 street city mumbai, pin - 435353
The result should be like this
Add1 (max 34 char):
1, abc park, def chowk, ghi marg
Add2 (max 34 char):
c-123 street city mumbai,
Add3 (max 34 char):
pin - 435353

I had the same problem and wrote this function, which splits a text into fixed-length lines without truncating words.
pi_text : the unwrapped text
pi_max_line : the maximum line length to wrap at
CREATE OR REPLACE FUNCTION wrap_to_paragraph(pi_text VARCHAR2,
                                             pi_max_line PLS_INTEGER,
                                             pi_end_paragraph VARCHAR2 DEFAULT CHR(10)) RETURN VARCHAR2 IS
  TYPE paragraph_tabletype_aat IS TABLE OF VARCHAR2(100) INDEX BY BINARY_INTEGER;
  l_loc_para paragraph_tabletype_aat;
  l_loc_lines INTEGER;
  l_return VARCHAR2(32767);

  PROCEDURE to_paragraph(pi_text_in IN VARCHAR2,
                         pi_line_length IN INTEGER,
                         po_paragraph_out IN OUT paragraph_tabletype_aat,
                         pio_num_lines_out IN OUT INTEGER,
                         pi_word_break_at_in IN VARCHAR2 := ' ') IS
    l_len_text INTEGER := LENGTH(pi_text_in);
    l_line_start_loc INTEGER := 1;
    l_line_end_loc INTEGER := 1;
    l_last_space_loc INTEGER;
    l_curr_line VARCHAR2(100);
    l_replace_string VARCHAR2(100) := NULL;

    PROCEDURE set_replace_string IS
    BEGIN
      l_replace_string := RPAD('#', LENGTH(pi_word_break_at_in), '#');
    END set_replace_string;

    PROCEDURE find_last_delim_loc(pi_line_in IN VARCHAR2,
                                  po_loc_out OUT INTEGER) IS
      l_line VARCHAR2(1000) := pi_line_in;
    BEGIN
      IF pi_word_break_at_in IS NOT NULL
      THEN
        l_line := translate(pi_line_in, pi_word_break_at_in, l_replace_string);
      END IF;
      po_loc_out := INSTR(l_line, '#', -1);
    END find_last_delim_loc;
  BEGIN
    set_replace_string;
    IF l_len_text IS NULL
    THEN
      pio_num_lines_out := 0;
    ELSE
      pio_num_lines_out := 1;
      LOOP
        EXIT WHEN l_line_end_loc > l_len_text;
        l_line_end_loc := LEAST(l_line_end_loc + pi_line_length, l_len_text + 1);
        /* get the next possible line of text */
        l_curr_line := SUBSTRB(pi_text_in || ' ', l_line_start_loc, pi_line_length + 1);
        /* find the last space in this section of the line */
        find_last_delim_loc(l_curr_line, l_last_space_loc);
        /* When NO spaces exist, use the full current line*/
        /* otherwise, cut the line at the space. */
        IF l_last_space_loc > 0
        THEN
          l_line_end_loc := l_line_start_loc + l_last_space_loc;
        END IF;
        IF INSTR(l_curr_line, pi_end_paragraph) > 0
        THEN
          l_line_end_loc := l_line_start_loc + INSTR(l_curr_line, pi_end_paragraph) + 1;
        END IF;
        /* Add this line to the paragraph */
        po_paragraph_out(pio_num_lines_out) := REPLACE(SUBSTRB(pi_text_in,
                                                               l_line_start_loc,
                                                               l_line_end_loc - l_line_start_loc),
                                                       pi_end_paragraph);
        pio_num_lines_out := pio_num_lines_out + 1;
        l_line_start_loc := l_line_end_loc;
      END LOOP;
      pio_num_lines_out := pio_num_lines_out - 1;
    END IF;
  END to_paragraph;
BEGIN
  /* Return original */
  IF (pi_max_line = 0 OR pi_max_line > 99)
  THEN
    RETURN pi_text;
  END IF;
  /* Build each paragraph in record */
  to_paragraph(pi_text, pi_max_line, l_loc_para, l_loc_lines);
  /* Extract Result */
  FOR i IN 1 .. l_loc_lines
  LOOP
    l_return := l_return || l_loc_para(i) || pi_end_paragraph;
  END LOOP;
  RETURN TRIM(CHR(10) FROM l_return);
END wrap_to_paragraph;

REGEXP_SUBSTR() can be used to solve this for you. As I said in my comment to your original post, your text says don't break in a word but your example shows ADD3 breaking after the last comma/space, not just space so you need to define your rule further (Maybe when it's the last section after the last comma only?). Anyway, sticking with what you wrote, the regex gives either the 1st, 2nd or 3rd occurrence of up to 34 characters that are followed by a character that is not a whitespace, followed by a whitespace character or the end of the line.
SQL> with tbl(addr) as (
       select '1, abc park, def chowk, ghi marg c-123 street city mumbai, pin - 435353'
       from dual
     )
     select regexp_substr(addr, '(.{0,34}\S)(\s|$)', 1, 1) add1,
            regexp_substr(addr, '(.{0,34}\S)(\s|$)', 1, 2) add2,
            regexp_substr(addr, '(.{0,34}\S)(\s|$)', 1, 3) add3
     from tbl;

ADD1                              ADD2                             ADD3
--------------------------------- -------------------------------- ------
1, abc park, def chowk, ghi marg  c-123 street city mumbai, pin -  435353

SQL>
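The underlying rule is a greedy word wrap: keep adding whole words to the current line while it stays within the limit. A minimal Python sketch of that rule follows, only to make the logic explicit; the function name wrap_words is hypothetical, and a single word longer than the limit is assumed to get a line of its own rather than being cut.

```python
def wrap_words(text, width=34):
    """Greedily pack whole words into lines of at most `width` characters."""
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width or not current:
            current = candidate          # the word still fits (or starts a line)
        else:
            lines.append(current)        # close the line and start a new one
            current = word
    if current:
        lines.append(current)
    return lines

address = "1, abc park, def chowk, ghi marg c-123 street" + " " + "city mumbai, pin - 435353"
for line in wrap_words(address):
    print(line)
```

On the sample address this gives the same three pieces as the query above. The second line ends with "pin -", so it differs from the Add2/Add3 split shown in the question, which follows a stricter, unstated rule.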

Is there a better way to insert "|" into a binary string representation to get 10|000|001

Is there a better way to insert "|" into a string?
Given the binary string representation of decimal 200 = 11001000,
this function returns the string 11|001|000.
While this function works, it seems very kludgy! Why is it so
hard in Go to do a simple character insertion?
func (i Binary) FString() string {
	a := strconv.FormatUint(i.Get(), 2)
	y := make([]string, len(a), len(a)*2)
	data := []rune(a)
	r := []rune{}
	for i := len(data) - 1; i >= 0; i-- {
		r = append(r, data[i])
	}
	for j := len(a) - 1; j >= 0; j-- {
		y = append(y, string(r[j]))
		if ((j)%3) == 0 && j > 0 {
			y = append(y, "|")
		}
	}
	return strings.Join(y, "")
}

Depends on what you call better. I'd use regular expressions.
In this case, the complexity arises from inserting separators from the right. If we padded the string so that its length was a multiple of 3, we could insert the separator from the left. And we could easily use a regular expression to insert | before every three characters. Then, we can just strip off the leading | + padding.
func (i Binary) FString() string {
	a := strconv.FormatUint(i.Get(), 2)
	pad_req := len(a) % 3
	padding := strings.Repeat("0", (3 - pad_req))
	a = padding + a
	re := regexp.MustCompile("([01]{3})")
	a = re.ReplaceAllString(a, "|$1")
	start := len(padding) + 1
	if len(padding) == 3 {
		// If we padded with "000", we want to remove the `|` before *and* after it
		start = 5
	}
	a = a[start:]
	return a
}
Snippet on the Go Playground
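The pad-then-group idea can be shown without a regular expression as well. The sketch below is Python rather than Go and only illustrates the arithmetic: pad on the left to a multiple of the group size, cut into equal groups from the left, join, then drop the padding again. The name group_from_right and its parameters are made up for this example.

```python
def group_from_right(bits, size=3, sep="|"):
    """Insert `sep` so that groups of `size` characters are counted from the right."""
    pad = (-len(bits)) % size            # 0 when the length is already a multiple
    padded = "0" * pad + bits
    groups = [padded[i:i + size] for i in range(0, len(padded), size)]
    # The padding sits entirely inside the first group, so slicing removes it.
    return sep.join(groups)[pad:]

print(group_from_right(format(200, "b")))
```

Computing the padding as (-len) mod size avoids the special case for a full group of padding, because no padding is added when the length already divides evenly.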

If performance is not critical and you just want a compact version, you may copy the input digits to the output and insert a | symbol whenever a group of 3 has been written to the output.
Groups are counted from right-to-left, so when copying the digits from left-to-right, the first group might be smaller. So the counter of digits inside a group may not necessarily start from 0 in case of the first group, but from len(input)%3.
Here is an example of it:
func Format(s string) string {
	b, count := &bytes.Buffer{}, len(s)%3
	for i, r := range s {
		if i > 0 && count == i%3 {
			b.WriteRune('|')
		}
		b.WriteRune(r)
	}
	return b.String()
}
Testing it:
for i := uint64(0); i < 10; i++ {
	fmt.Println(Format(strconv.FormatUint(i, 2)))
}
fmt.Println(Format(strconv.FormatInt(1234, 2)))
Output (try it on the Go Playground):
0
1
10
11
100
101
110
111
1|000
1|001
10|011|010|010
If you have to do this many times and performance does matter, then check out my answer to the question: How to fmt.Printf an integer with thousands comma
Based on that a fast solution can be:
func Format(s string) string {
	out := make([]byte, len(s)+(len(s)-1)/3)
	for i, j, k := len(s)-1, len(out)-1, 0; ; i, j = i-1, j-1 {
		out[j] = s[i]
		if i == 0 {
			return string(out)
		}
		if k++; k == 3 {
			j, k = j-1, 0
			out[j] = '|'
		}
	}
}
Output is the same of course. Try it on the Go Playground.

This is a partitioning problem. You can use this function:
func partition(s, separator string, pLen int) string {
	if pLen < 1 || len(s) == 0 || len(separator) == 0 {
		return s
	}
	buffer := []rune(s)
	L := len(buffer)
	pCount := L / pLen
	result := []string{}
	index := 0
	for ; index < pCount; index++ {
		_from := L - (index+1)*pLen
		_to := L - index*pLen
		result = append(result, string(buffer[_from:_to]))
	}
	if L%pLen != 0 {
		result = append(result, string(buffer[0:L-index*pLen]))
	}
	for h, t := 0, len(result)-1; h < t; h, t = h+1, t-1 {
		result[t], result[h] = result[h], result[t]
	}
	return strings.Join(result, separator)
}
And s := partition("11001000", "|", 3) will give you 11|001|000.
Here is a little test:
func TestSmokeTest(t *testing.T) {
	input := "11001000"
	s := partition(input, "|", 3)
	if s != "11|001|000" {
		t.Fail()
	}
	s = partition(input, "|", 2)
	if s != "11|00|10|00" {
		t.Fail()
	}
	input = "0111001000"
	s = partition(input, "|", 3)
	if s != "0|111|001|000" {
		t.Fail()
	}
	s = partition(input, "|", 2)
	if s != "01|11|00|10|00" {
		t.Fail()
	}
}

ORACLE PL-SQL How to SPLIT a string and RETURN the list using a Function

How can I split a given string on a given delimiter?
Ex:
INPUT
String => '1,2,3,4,5'
Delimiter => ','
OUTPUT
1
2
3
4
5

What about this? The regular expression allows for null list elements too.
SQL> with tbl(str) as (
       select '1,2,,4,5' from dual
     )
     select regexp_substr(str, '(.*?)(,|$)', 1, level, null, 1) element
     from tbl
     connect by level <= regexp_count(str, ',')+1;

ELEMENT
--------
1
2

4
5

SQL>
See this post for a function that returns a list element: REGEX to select nth value from a list, allowing for nulls

I have found my own way to split the given string using a FUNCTION.
A TYPE should be declared as below:
TYPE tabsplit IS TABLE OF VARCHAR2 (50)
  INDEX BY BINARY_INTEGER;
And the FUNCTION should be written like this:
FUNCTION fn_split (mp_string IN VARCHAR2, mp_delimiter IN VARCHAR2)
  RETURN tabsplit
IS
  ml_point NUMBER (5, 0) := 1;
  ml_sub_str VARCHAR2 (50);
  i NUMBER (5, 0) := 1;
  taboutput tabsplit;
  ml_count NUMBER (5, 0) := 0;
BEGIN
  WHILE i <= LENGTH (mp_string)
  LOOP
    FOR j IN i .. LENGTH (mp_string)
    LOOP
      IF SUBSTR (mp_string, j, 1) = mp_delimiter
      THEN
        ml_sub_str := SUBSTR (mp_string, ml_point, j - ml_point);
        ml_point := j + 1;
        i := ml_point;
        i := i - 1;
        taboutput (ml_count) := ml_sub_str;
        ml_count := ml_count + 1;
        EXIT;
      END IF;
    END LOOP;
    i := i + 1;
  END LOOP;
  ml_sub_str := SUBSTR (mp_string, ml_point, LENGTH (mp_string));
  taboutput (ml_count) := ml_sub_str;
  RETURN taboutput;
END fn_split;
This FUNCTION can be used as below:
DECLARE
  taboutput tabsplit;
BEGIN
  taboutput := fn_split ('1,2,3,4,5', ',');
  FOR i IN 0 .. taboutput.COUNT - 1
  LOOP
    DBMS_OUTPUT.put_line (taboutput (i));
  END LOOP;
END;

SELECT LEVEL AS id, REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, LEVEL) AS data
FROM dual
CONNECT BY REGEXP_SUBSTR('A,B,C,D', '[^,]+', 1, LEVEL) IS NOT NULL;
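One difference between the patterns in these answers is worth spelling out: a pattern such as [^,]+ matches only non-empty runs between delimiters, so empty list elements are silently skipped and later elements shift position. The Python sketch below illustrates that effect in general terms; it is not Oracle code, and the two function names are hypothetical.

```python
import re

def split_keep_empty(text, delimiter=","):
    """Split on the delimiter and keep empty elements in their positions."""
    return text.split(delimiter)

def split_skip_empty(text, delimiter=","):
    """Mimic a '[^,]+' style match: only non-empty runs are returned.
    Assumes a single-character delimiter."""
    return re.findall(f"[^{re.escape(delimiter)}]+", text)

print(split_keep_empty("1,2,,4,5"))   # the third element is an empty string
print(split_skip_empty("1,2,,4,5"))   # the empty element is gone
```

If the position of an element matters (for example "give me the 4th value"), the variant that keeps empty elements is the safer choice.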

SAS simplify the contents of a variable

In SAS, I have a variable V containing the following value:
V=1996199619961996200120012001
I'd like to create these 2 variables:
V1=19962001 (= the different modalities)
V2=42 (= the first modality appears 4 times and the second one appears 2 times)
Any ideas?
Thanks for your help.
Luc

For your first question (if I understand the pattern correctly), you could extract the first four characters and the last four characters:
a = substr(variable, 1,4);
b = substrn(variable,max(1,length(variable)-3),4);
You could then concatenate the two.
c = cats(a,b);
For the second, the COUNT function can be used to count occurrences of a string within a string:
http://support.sas.com/documentation/cdl/en/lefunctionsref/63354/HTML/default/viewer.htm#p02vuhb5ijuirbn1p7azkyianjd8.htm
Hope this helps :)

Make it a bit more general;
%let modeLength = 4;
%let maxOccur = 100; ** in the input **;
%let maxModes = 10; ** in the output **;
Where does a certain occurrence start?;
%macro occurStart(occurNo);
&modeLength.*&occurNo.-%eval(&modeLength.-1)
%mend;
Read the input;
data simplified ;
infile datalines truncover;
input v $%eval(&modeLength.*&maxOccur.).;
Declare output and work variables;
format what $&modeLength..
v1 $%eval(&modeLength.*&maxModes.).
v2 $&maxModes..;
array w {&maxModes.}; ** what **;
array c {&maxModes.}; ** count **;
Discover unique modes and count them;
countW = 0;
do vNo = 1 to length(v)/&modeLength.;
what = substr(v, %occurStart(vNo), &modeLength.);
do wNo = 1 to countW;
if what eq w(wNo) then do;
c(wNo) = c(wNo) + 1;
goto foundIt;
end;
end;
countW = countW + 1;
w(countW) = what;
c(countW) = 1;
foundIt:
end;
Report results in v1 and v2;
do wNo = 1 to countW;
substr(v1, %occurStart(wNo), &modeLength.) = w(wNo);
substr(v2, wNo, 1) = put(c(wNo),1.);
put _N_= v1= v2=;
end;
keep v1 v2;
The data I tested with;
datalines;
1996199619961996200120012001
197019801990
20011996199619961996200120012001
;
run;
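The chunk-and-count logic of the data step can be sketched compactly in Python, assuming fixed-width modalities of 4 characters. This is an illustration of the idea, not a SAS replacement, and the function name summarize_modes is made up.

```python
def summarize_modes(value, mode_length=4):
    """Split `value` into fixed-width chunks; return the distinct chunks in
    order of first appearance (v1) and their counts (v2), both as strings."""
    counts = {}                                   # dicts keep insertion order
    for start in range(0, len(value), mode_length):
        chunk = value[start:start + mode_length]
        counts[chunk] = counts.get(chunk, 0) + 1
    v1 = "".join(counts)
    v2 = "".join(str(n) for n in counts.values())
    return v1, v2

print(summarize_modes("1996199619961996200120012001"))
```

With the sample value exactly as written in the question, 2001 occurs three times, so the counts come out as 43 rather than 42. As in the SAS version, the count string becomes ambiguous once any modality occurs 10 or more times.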
