Replacing all instances of a given character in a string except when it is framed by another specific character - string

I'm looking for a simple, performant and elegant way to replace all instances of a given character within a string, except where it is framed by another specific character. For example:
I want to replace all , characters in the string a,b,c,"d,e,f,g",h,i,j with #, except when they are framed by ". The expected result is: a#b#c#"d,e,f,g"#h#i#j.
Any ideas are welcome.

Here is my suggestion as a PL/pgSQL block that, if relevant, can be reshaped into a function.
Basically, it extracts and stores the "immune" parts of the string (those enclosed in double quotes) and swaps them for numbered placeholders, replaces the commas with hashes, and then puts the "immune" parts back. IMMUNE_PATTERN may need to be amended too, since the placeholder must not collide with text that already occurs in the input.
do language plpgsql
$$
declare
target_text text := 'a,b,c,"d,e,f,g",h,i,"d2,e2,f2,g2",j'::text;
IMMUNE_PATTERN constant text := '__%s__';
immune_parts text[];
immune text;
i integer;
begin
immune_parts := array(select * from regexp_matches(target_text,'"[^"]+"','g'));
for immune, i in select * from unnest(immune_parts) with ordinality loop
target_text := replace(target_text, immune, format(IMMUNE_PATTERN, i));
end loop;
target_text := replace(target_text, ',', '#');
for immune, i in select * from unnest(immune_parts) with ordinality loop
target_text := replace(target_text, format(IMMUNE_PATTERN, i), immune);
end loop;
raise notice '%', target_text;
end;
$$;
The result is that
a,b,c,"d,e,f,g",h,i,"d2,e2,f2,g2",j becomes
a#b#c#"d,e,f,g"#h#i#"d2,e2,f2,g2"#j
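For readers who want to try the idea outside the database, the three steps described above (protect, replace, restore) can be sketched in Python. This is only an illustration: the function name `replace_outside_quotes` is hypothetical, and it assumes, like the PL/pgSQL block, that the placeholder text never occurs in the input and that quotes are balanced.

```python
import re

# Placeholder template, mirroring IMMUNE_PATTERN in the PL/pgSQL block.
IMMUNE_PATTERN = "__{}__"


def replace_outside_quotes(text, old=",", new="#"):
    """Replace `old` with `new` except inside double-quoted parts."""
    # 1. Collect the "immune" parts (non-empty double-quoted runs).
    immune_parts = re.findall(r'"[^"]+"', text)
    # 2. Swap each immune part for a numbered placeholder.
    for i, part in enumerate(immune_parts, start=1):
        text = text.replace(part, IMMUNE_PATTERN.format(i))
    # 3. Replace the target character everywhere that is left.
    text = text.replace(old, new)
    # 4. Put the immune parts back in place of their placeholders.
    for i, part in enumerate(immune_parts, start=1):
        text = text.replace(IMMUNE_PATTERN.format(i), part)
    return text


print(replace_outside_quotes('a,b,c,"d,e,f,g",h,i,"d2,e2,f2,g2",j'))
```

As in the original block, an input that already contains something like __1__ would be corrupted, so the placeholder template should be chosen with the data in mind.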

Related

Passing a commandline parameter containing quotes to installer

I'm trying to pass a custom commandline parameter to an installer created with Inno Setup. The parameter value actually consists of several parameters that will be used to launch the installed program when installation is complete, so the value contains spaces as well as quotes to group together parameters.
For example, when -arg "C:\path with spaces" -moreargs should be used as Parameters in a [Run] section entry, I would like to launch the installer like this:
setup.exe /abc="-arg "C:\path with spaces" -moreargs"
Outputting the parameters that the installer receives in a [Code] section via ParamStr() shows them (of course) split up: /abc=-arg C:\path, with, spaces -moreargs
How do I escape the quotes to retain them?
I tried doubling the inner quotes:
setup.exe /abc="-arg ""C:\path with spaces"" -moreargs"
This correctly keeps the parameter together (/abc=-arg C:\path with spaces -moreargs), however it seems that ParamStr() removes all quotes.
Is there a way to retain quotes within a parameter retrieved with ParamStr() or a param constant {param:abc|DefaultValue}?
The alternatives seem to be either to do my own parameter parsing from GetCmdTail (which contains the original parameter string), or to use, instead of the inner quotes, another character that ParamStr() retains and then replace it with quotes afterwards. But I would prefer not to do that if there is a way to use the built-in functions.
It seems both {param} and ParamStr() strip out double quotes, but as you pointed out (thanks!) the GetCmdTail function returns the original.
So here's a function to get the original parameters with quotes:
function ParamStrWithQuotes(ParamName: String) : string;
var
fullCmd : String;
currentParamName : string;
i : Integer;
startPos : Integer;
endPos : Integer;
begin
fullCmd := GetCmdTail;
startPos := 0; // 0 means the target parameter has not been found yet
// default to end of string, in case the option is the last item
endPos := Length(fullCmd);
for i := 1 to ParamCount do // ParamStr(0) is the setup executable itself
begin
// extract parameter name (eg, "/Option=")
currentParamName := Copy(ParamStr(i), 1, pos('=',ParamStr(i)));
// once found, we want the following item
if (startPos > 0) then
begin
endPos := pos(currentParamName,fullCmd)-2; // -1 to move back to actual end position, -1 for space
break; // exit loop
end;
if (CompareText(currentParamName, '/'+ParamName+'=') = 0) then // case-insensitive compare
begin
// found target item, so save its string position
StartPos := pos(currentParamName,fullCmd)+2+Length(ParamName);
end;
end;
if ((fullCmd[StartPos] = fullCmd[EndPos])
and ((fullCmd[StartPos] = '"') or (fullCmd[StartPos] = ''''))) then
begin
// exclude surrounding quotes
Result := Copy(fullCmd, StartPos+1, EndPos-StartPos-1);
end
else
begin
// return as-is
Result := Copy(fullCmd, StartPos, EndPos-StartPos+1);
end;
end;
You can access this with {code:ParamStrWithQuotes|abc}
When invoking the setup.exe, you do have to escape the quotes, so one of the following works for that:
setup.exe /abc="-arg ""C:\path with spaces"" -moreargs"
or
setup.exe /abc='-arg "C:\path with spaces" -moreargs'

Oracle - String - Punctuation Formatting Function

I have a FUNCTION that replaces multiple (consecutive) Horizontal Spaces within a STRING with a singular Horizontal Space;
e.g.
STR_ORIG = 'Hello      World'
STR_NEW = 'Hello World'
The function is as follows;
CREATE OR REPLACE FUNCTION CP_RDN_PUNCT(
INS VARCHAR2)
RETURN VARCHAR2
AS
OUTSTR VARCHAR2(4000);
STR VARCHAR2(4000);
BEGIN
STR := INS;
WHILE (INSTR(STR,' ',1) > 0 )
LOOP
OUTSTR := OUTSTR || ' ' || SUBSTR(STR,1,INSTR(STR,' ',1) - 1);
STR := TRIM(BOTH ' ' FROM SUBSTR(STR,INSTR(STR,' ',1)));
END LOOP;
OUTSTR := OUTSTR || ' ' || TRIM(STR);
RETURN TRIM(OUTSTR);
END CP_RDN_PUNCT;
However, I would like to expand on this FUNCTION so it is able to correct basic punctuation formatting (commas, full stops and parentheses). BUT, it's important that the FUNCTION continues to remove multiple (consecutive) Horizontal Spaces.
For example;
If STR_ORIG = 'Hello , Marc' the output would become 'Hello, Marc'
If STR_ORIG = 'Hello.Marc' the output would become 'Hello. Marc'
If STR_ORIG = 'Hello(Marc )' the output would become 'Hello (Marc)'
The rules I would like to use are fairly basic:
Comma: no HORIZONTAL SPACE before; one HORIZONTAL SPACE after.
Full Stop: no HORIZONTAL SPACE before; one HORIZONTAL SPACE after.
Open Parenthesis: one HORIZONTAL SPACE before; no HORIZONTAL SPACE after.
Closed Parenthesis: no HORIZONTAL SPACE before; one HORIZONTAL SPACE after*.
*Note: When a Comma or Full Stop is present directly after the Closed Parenthesis, the 'No HORIZONTAL SPACE' rule applies instead of the 'One HORIZONTAL SPACE' rule.
I believe a FUNCTION is the best approach for this issue (I have explored using pure SQL (REG_EXP) but the code starts getting quite messy - primarily due to inconsistencies in the data). Also, if I wanted to add additional rules in the future (e.g. a rule for underscores), I'm assuming a FUNCTION would be easier to maintain. However, as always I am open to suggestions from the professionals.
Many thanks in advance.
One more approach I can think of is to use an associative array to store the patterns and replacements instead of plain SQL, and then apply each transformation to the string in a loop.
CREATE OR REPLACE FUNCTION cp_rdn_punct2 (
inp_pattern VARCHAR2
) RETURN VARCHAR2 AS
v_outstr VARCHAR2(1000) := inp_pattern;
TYPE v_astype IS
TABLE OF VARCHAR2(40) INDEX BY VARCHAR(40);
v_pat v_astype;
v_idx VARCHAR2(40);
BEGIN
v_pat(' *, *' ) := ', ';
v_pat(' *\. *') := '. ';
v_pat(' *\( *') := ' (';
v_pat(' *\) *') := ') ';
v_idx := v_pat.first;
WHILE v_idx IS NOT NULL LOOP
v_outstr := regexp_replace(v_outstr,v_idx,v_pat(v_idx) );
v_idx := v_pat.next(v_idx);
END LOOP;
RETURN v_outstr;
END;
/
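The same rule-table idea can be sketched in Python, here assuming an ordered list rather than an associative array so that the order in which the rules fire is explicit. The names `RULES` and `fix_punctuation` are hypothetical, and the rules cover only the cases listed in the question; treat it as an illustration, not a drop-in replacement for the PL/SQL function.

```python
import re

# (pattern, replacement) pairs, applied top to bottom.
RULES = [
    (r" *\( *", " ("),   # one space before "(", none after
    (r" *\) *", ") "),   # no space before ")", one after
    (r" *, *", ", "),    # no space before ",", one after
    (r" *\. *", ". "),   # no space before ".", one after
    (r" {2,}", " "),     # collapse runs of spaces
]


def fix_punctuation(text):
    """Apply every rule in order, then trim leading/trailing spaces."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text.strip()


for sample in ["Hello , Marc", "Hello.Marc", "Hello(Marc )", "Hello (Marc ) , bye"]:
    print(repr(fix_punctuation(sample)))
```

Because the comma and full-stop rules run after the closing-parenthesis rule, they also remove the space that rule leaves in front of a following "," or ".". Text such as decimal numbers (3.14) would be altered by the full-stop rule, so real data may need extra exceptions.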
You can write the function with REGEXP functions rather than using INSTR and SUBSTR.
Note: this function does not handle several kinds of pattern appearing in the same string, so it won't work if both "," and "." appear. You will need to write the remaining transformation code, exception handling, etc. yourself to cover such scenarios; I have only given the idea of how it can be done. You may have to rewrite it with IF THEN or CASE blocks, since I coded the rules inside a WITH clause.
CREATE OR REPLACE FUNCTION CP_RDN_PUNCT(
inp_pattern VARCHAR2)
RETURN VARCHAR2
AS
outstr VARCHAR2(4000);
BEGIN
with reg ( pattern, regex ,replacement ) AS
(
select ',' , ' *, *', ', ' FROM DUAL UNION ALL
select '.' , ' *\. *', '. ' FROM DUAL UNION ALL
select '(' , ' *\( *', ' (' FROM DUAL
)
SELECT
TRIM(regexp_replace(rep,' *\) *',') ') ) INTO outstr
FROM
(
SELECT
regexp_replace(inp_pattern,regex,replacement) rep
FROM
reg
WHERE
inp_pattern LIKE '%'
|| pattern
|| '%'
);
RETURN outstr;
END;
/

String manipulation in ada

I am getting the path of a directory in a string, like "C:\Users\Me\Desktop\Hello", and I am trying to get the last directory, but without success.
I tried a lot of manipulations on the string, but at the end of the day I was left with nothing. I would be grateful for some help. Thanks!
Here was my first idea :
Get_Line(Line, Len);
while (Line /="") loop
FirstWord:=Index(Line(1..Len),"\")+1;
declare
NewLine :String := (Line(FirstWord .. Len));
begin
Line:=NewLine ;
end;
end loop;
I know it's not working (I can't assign NewLine to Line because their lengths don't match), and now I am stuck.
I’m assuming you want to manipulate directory (and file) names, rather than just any old string?
In which case you should look at the standard library packages Ada.Directories (ARM A.16) and Ada.Directories.Hierarchical_File_Names (ARM A.16.1):
with Ada.Directories;
with Ada.Text_IO; use Ada.Text_IO;
procedure Tal is
Line : constant String := "C:\Users\Me\Desktop\Hello";
begin
Put_Line ("Full_Name: "
& Ada.Directories.Full_Name (Line));
Put_Line ("Simple_Name: "
& Ada.Directories.Simple_Name (Line));
Put_Line ("Containing_Directory: "
& Ada.Directories.Containing_Directory (Line));
Put_Line ("Base_Name: "
& Ada.Directories.Base_Name (Line));
end Tal;
On the other hand, if you’re trying to work out plain string manipulation, you could use something like
with Ada.Strings.Fixed;
with Ada.Text_IO; use Ada.Text_IO;
procedure Tal is
function Get_Last_Word (From : String;
With_Separator : String)
return String is
Separator_Position : constant Natural :=
Ada.Strings.Fixed.Index (Source => From,
Pattern => With_Separator,
Going => Ada.Strings.Backward);
begin
-- This will fail if there are no separators in From
return From (Separator_Position + 1 .. From'Last); --'
end Get_Last_Word;
Line : constant String := "C:\Users\Me\Desktop\Hello";
Last_Name : constant String := Get_Last_Word (Line, "\");
begin
Put_Line (Last_Name);
end Tal;
As you can see, putting the logic in Get_Last_Word allows you to hoist Last_Name out of a declare block. But it will never be possible to overwrite a fixed string with a substring of itself (unless you’re prepared to deal with trailing blanks, that is): it’s much better never to try.

Replace substring with binary strings

I want to perform a substring replace operation on binary strings. There is a function available that does this exact thing for strings of type text (c.f.):
replace(string text, from text, to text)
But unfortunately none for binary strings of type bytea (c.f.).
Now I wonder, do I need to reimplement this operation for binary strings or can I use the corresponding basic string function for this task? Are there edge cases that could break my application:
select replace('\000\015Hello World\000\015Hello World'::bytea::text,
'World',
'Jenny')::bytea
I couldn't find a specific note in the documentation so far. Can someone help me on that?
Following the suggestion by @DanielVérité, I have implemented a plpgsql function that does a string replace on binary strings of type bytea.
In the implementation I only used functions from the binary strings section, so I think it should be safe to use.
Here's my code:
CREATE OR REPLACE FUNCTION
replace_binary(input_str bytea, pattern bytea, replacement bytea)
RETURNS bytea
AS $$
DECLARE
buf bytea;
pos integer;
BEGIN
buf := '';
-- validate input
IF coalesce(length(input_str), 0) = 0 OR coalesce(length(pattern), 0) = 0
THEN
RETURN input_str;
END IF;
replacement := coalesce(replacement, '');
LOOP
-- find position of pattern in input
pos := position(pattern in input_str);
IF pos = 0 THEN
-- not found: append remaining input to buffer and return
buf := buf || substring(input_str from 1);
RETURN buf;
ELSE
-- found: append substring before pattern to buffer
buf := buf || substring(input_str from 1 for pos - 1);
-- append replacement
buf := buf || replacement;
-- go on with substring of input
input_str := substring(input_str from pos + length(pattern));
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;
As for my test cases it works quite well:
with input(buf, pattern, replacement) as (values
('tt'::bytea, 't'::bytea, 'ttt'::bytea),
('test'::bytea, 't'::bytea, 'ttt'::bytea),
('abcdefg'::bytea, 't'::bytea, 'ttt'::bytea),
('\000\015Hello 0orld\000\015Hello 0orld'::bytea, '0'::bytea, '1'::bytea))
select encode(replace_binary(buf, pattern, replacement), 'escape') from input;
outputs as expected:
encode
------------------------------------
tttttt
tttesttt
abcdefg
\000\rHello 1orld\000\rHello 1orld
(4 rows)
The problem with casting to text and back to bytea is that it wouldn't work if the replacement strings involved quoted bytes in strings. Let's see with an example.
(I'm setting bytea_output to escape to better see the text, otherwise it's all hex numbers)
Initial query:
with input(x) as (values (('\000\015Hello World\000\015Hello World'::bytea)))
select replace(x::text, 'World', 'Jenny')::bytea from input;
The result is fine:
replace
----------------------------------------
\000\015Hello Jenny\000\015Hello Jenny
(1 row)
But with a modified version that replaces the character 0 with 1:
with input(x) as (values (('\000\015Hello 0orld\000\015Hello 0orld'::bytea)))
select replace(x::text, '0', '1')::bytea from input;
The result is:
replace
----------------------------------------
IMHello 1orldIMHello 1orld
whereas the desired result would be: \000\015Hello 1orld\000\015Hello 1orld.
This happens because the intermediate text representation \000\015 gets replaced by \111\115, which are the octal escapes for the characters I and M.
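The pitfall can be reproduced outside PostgreSQL. The sketch below assumes a simplified model of the bytea escape text form (printable ASCII kept as-is, backslash doubled, every other byte written as a three-digit octal escape); `to_escape` and `from_escape` are hypothetical helpers written for this illustration, not PostgreSQL functions.

```python
def to_escape(data: bytes) -> str:
    """Render bytes in a simplified bytea 'escape' text form."""
    out = []
    for b in data:
        if b == 0x5C:                 # backslash is doubled
            out.append("\\\\")
        elif 32 <= b <= 126:          # printable ASCII stays as-is
            out.append(chr(b))
        else:                         # everything else becomes \ooo (octal)
            out.append("\\%03o" % b)
    return "".join(out)


def from_escape(text: str) -> bytes:
    """Parse the simplified escape form back into bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            if text[i + 1] == "\\":
                out.append(0x5C)
                i += 2
            else:
                out.append(int(text[i + 1:i + 4], 8))
                i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)


original = b"\x00\rHello 0orld\x00\rHello 0orld"

# Replacing on the text form also rewrites the digits inside \000 and \015.
via_text = from_escape(to_escape(original).replace("0", "1"))

# Replacing on the bytes themselves only touches the real "0" bytes.
direct = original.replace(b"0", b"1")

print(via_text)
print(direct)
```

The byte-level replace leaves the leading \000\015 bytes alone, which is the behaviour the replace_binary function above aims for.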

Oracle - how to replace a UTF substring in decimal notation with the corresponding character

I have a VARCHAR2 field with values like "abc&#193&#158ef" and I need to replace each of the UTF substrings (e.g. &#123) with its corresponding character in the DB encoding (cp-1250).
Any suggestions?
You could use an NVARCHAR2 datatype instead of a VARCHAR2 datatype. Look in the view NLS_DATABASE_PARAMETERS to determine the NVARCHAR2 character set (it will always support Unicode).
So, I answer myself. First create a function:
CREATE OR REPLACE FUNCTION decode_string(in_string VARCHAR2) RETURN VARCHAR2
IS
working_string VARCHAR2(4000) := in_string;
regexp VARCHAR2(20):= '&#[[:digit:]]{3}';
utf_code CHAR(5);
replaced_char CHAR(1);
BEGIN
LOOP
utf_code := regexp_substr(working_string, regexp);
EXIT WHEN utf_code IS NULL;
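-- convert the three digits after '&#' to the character with that code
replaced_char := CHR(TO_NUMBER(SUBSTR(utf_code, -3, 3)));
-- replace the code inside the working string, not inside the code itself
working_string := REPLACE(working_string, utf_code, replaced_char);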
END LOOP;
RETURN working_string;
END;
Then use this function in a classic UPDATE statement:
UPDATE foo
SET strfield = decode_string(strfield)
WHERE strfield LIKE '%&#___%';
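To check the expected result outside the database, the same substitution can be sketched in Python. This assumes the numeric codes are single-byte values in the cp1250 code page, as the question states; the function mirrors the name decode_string only for readability, and codes that are out of range or unassigned in cp1250 are left untouched, which is a choice made for this sketch.

```python
import re


def decode_string(text: str, encoding: str = "cp1250") -> str:
    """Replace each &#NNN (three decimal digits) with its cp1250 character."""

    def repl(match):
        code = int(match.group(1))
        if code > 255:
            # Not a single-byte value: leave the sequence as it is.
            return match.group(0)
        try:
            return bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            # Byte value is unassigned in this code page: leave it as it is.
            return match.group(0)

    return re.sub(r"&#(\d{3})", repl, text)


print(decode_string("abc&#193&#158ef"))
```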

Resources