Replace substring with binary strings - string

I want to perform a substring replace operation on binary strings. There is a function that does exactly this for strings of type text:
replace(string text, from text, to text)
Unfortunately there is none for binary strings of type bytea.
Now I wonder: do I need to reimplement this operation for binary strings, or can I use the corresponding basic string function for this task? Are there edge cases that could break my application? For example:
select replace('\000\015Hello World\000\015Hello World'::bytea::text,
'World',
'Jenny')::bytea
I couldn't find a specific note in the documentation so far. Can someone help me on that?

Following the suggestion by @DanielVérité, I have implemented a plpgsql function that does a string replace with binary strings of type bytea.
In the implementation I only used functions from the binary strings section, so I think it should be safe to use.
Here's my code:
CREATE OR REPLACE FUNCTION
replace_binary(input_str bytea, pattern bytea, replacement bytea)
RETURNS bytea
AS $$
DECLARE
buf bytea;
pos integer;
BEGIN
buf := '';
-- validate input
IF coalesce(length(input_str), 0) = 0 OR coalesce(length(pattern), 0) = 0
THEN
RETURN input_str;
END IF;
replacement := coalesce(replacement, '');
LOOP
-- find position of pattern in input
pos := position(pattern in input_str);
IF pos = 0 THEN
-- not found: append remaining input to buffer and return
buf := buf || substring(input_str from 1);
RETURN buf;
ELSE
-- found: append substring before pattern to buffer
buf := buf || substring(input_str from 1 for pos - 1);
-- append replacement
buf := buf || replacement;
-- go on with substring of input
input_str := substring(input_str from pos + length(pattern));
END IF;
END LOOP;
END;
$$ LANGUAGE plpgsql
IMMUTABLE;
It works well for my test cases:
with input(buf, pattern, replacement) as (values
('tt'::bytea, 't'::bytea, 'ttt'::bytea),
('test'::bytea, 't'::bytea, 'ttt'::bytea),
('abcdefg'::bytea, 't'::bytea, 'ttt'::bytea),
('\000\015Hello 0orld\000\015Hello 0orld'::bytea, '0'::bytea, '1'::bytea))
select encode(replace_binary(buf, pattern, replacement), 'escape') from input;
outputs as expected:
encode
------------------------------------
tttttt
tttesttt
abcdefg
\000\rHello 1orld\000\rHello 1orld
(4 rows)
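For readers who want to see the same loop outside the database, here is a minimal sketch in Go. It is not part of the answer above: the name replaceBinary is hypothetical, and bytes.Index stands in for position(pattern in input_str), returning -1 instead of 0 when nothing is found.

```go
package main

import (
	"bytes"
	"fmt"
)

// replaceBinary is a hypothetical Go port of the replace_binary loop:
// find the pattern, copy what precedes it, append the replacement,
// then continue with the rest of the input.
func replaceBinary(input, pattern, replacement []byte) []byte {
	// Nothing to do for empty input or an empty pattern.
	if len(input) == 0 || len(pattern) == 0 {
		return input
	}
	var buf []byte
	for {
		pos := bytes.Index(input, pattern) // -1 when not found
		if pos < 0 {
			// Not found: append the remaining input and return.
			return append(buf, input...)
		}
		buf = append(buf, input[:pos]...)
		buf = append(buf, replacement...)
		input = input[pos+len(pattern):]
	}
}

func main() {
	for _, in := range []string{"tt", "test", "abcdefg"} {
		fmt.Printf("%q\n", replaceBinary([]byte(in), []byte("t"), []byte("ttt")))
	}
}
```

The three inputs are the first three test cases from the query above and give the same results: tttttt, tttesttt and abcdefg.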

The problem with casting to text and back to bytea is that it wouldn't work if the replacement strings involved quoted bytes in strings. Let's see with an example.
(I'm setting bytea_output to escape to better see the text, otherwise it's all hex numbers)
Initial query:
with input(x) as (values (('\000\015Hello World\000\015Hello World'::bytea)))
select replace(x::text, 'World', 'Jenny')::bytea from input;
The result is fine:
replace
----------------------------------------
\000\015Hello Jenny\000\015Hello Jenny
(1 row)
But with a modified version that replaces the character 0 with 1:
with input(x) as (values (('\000\015Hello 0orld\000\015Hello 0orld'::bytea)))
select replace(x::text, '0', '1')::bytea from input;
The result is:
replace
----------------------------------------
IMHello 1orldIMHello 1orld
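whereas the desired result would be: \000\015Hello 1orld\000\015Hello 1orld.
This happens because, in the intermediate text representation, \000\015 gets replaced by \111\115, and those octal escapes decode to the bytes I and M when the text is cast back to bytea.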

Related

Replacing all the instances of a given character in a string except when it is framed by an other specific character

I'm looking for a simple/performant/elegant way of replacing all the instances of a given character within a string except when it is framed by another specific character. As an example:
I want to replace in the string a,b,c,"d,e,f,g",h,i,j all the , characters by # except when they are framed by ". The expected result is : a#b#c#"d,e,f,g"#h#i#j.
Any ideas are welcome.
Here is my suggestion as a PL/pgSQL block that - if relevant - can be amended/shaped as a function.
Basically it extracts and stores the "immune" parts of the string (those enclosed in double quotes) and swaps them for placeholders, replaces the commas with hashes, and then puts the "immune" parts back. IMMUNE_PATTERN may need to be amended too, so that the placeholders cannot collide with text already in the string.
do language plpgsql
$$
declare
target_text text := 'a,b,c,"d,e,f,g",h,i,"d2,e2,f2,g2",j'::text;
IMMUNE_PATTERN constant text := '__%s__';
immune_parts text[];
immune text;
i integer;
begin
immune_parts := array(select * from regexp_matches(target_text,'"[^"]+"','g'));
for immune, i in select * from unnest(immune_parts) with ordinality loop
target_text := replace(target_text, immune, format(IMMUNE_PATTERN, i));
end loop;
target_text := replace(target_text, ',', '#');
for immune, i in select * from unnest(immune_parts) with ordinality loop
target_text := replace(target_text, format(IMMUNE_PATTERN, i), immune);
end loop;
raise notice '%', target_text;
end;
$$;
The result is that
a,b,c,"d,e,f,g",h,i,"d2,e2,f2,g2",j becomes
a#b#c#"d,e,f,g"#h#i#"d2,e2,f2,g2"#j
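A different technique from the placeholder approach above is a single pass that tracks whether the current position is inside quotes. The Go sketch below illustrates that idea only; the function name replaceOutsideQuotes and its parameters are hypothetical, and it assumes that quotes are never escaped or nested.

```go
package main

import (
	"fmt"
	"strings"
)

// replaceOutsideQuotes replaces every occurrence of old with repl,
// except between a pair of quote runes. An unbalanced opening quote
// protects everything up to the end of the string.
func replaceOutsideQuotes(s string, old, repl, quote rune) string {
	var sb strings.Builder
	inQuotes := false
	for _, r := range s {
		switch {
		case r == quote:
			inQuotes = !inQuotes // entering or leaving a quoted part
			sb.WriteRune(r)
		case r == old && !inQuotes:
			sb.WriteRune(repl)
		default:
			sb.WriteRune(r)
		}
	}
	return sb.String()
}

func main() {
	fmt.Println(replaceOutsideQuotes(`a,b,c,"d,e,f,g",h,i,j`, ',', '#', '"'))
	// a#b#c#"d,e,f,g"#h#i#j
}
```

Because no placeholders are involved, there is nothing that could collide with existing text in the input.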

Ada - How do I split a string in two parts?

If I create a function subprogram that asks you to type a string of a particular length and you type Overflow, it's supposed to print the last half of the string, so in this case it would be flow. On the other hand, if I type an odd number of characters, like Stack, it's supposed to print the last half of the string plus the middle letter, so in this case it would be "ack".
Let me make it clearer (text in bold is user input):
Type a string that's not longer than 7 characters: Candy
The other half of the string is: ndy
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
function Split_String (S : in String) return String is
begin
Mid := 1 + (S'Length / 2);
return S(Mid .. S'Last);
end Split_String;
S : String(1 .. 7);
I : Integer;
begin
Put("Type a string that's no longer than 7 characters: ");
Get_Line(S, I);
Put(Split_String(S));
end Split;
Let me tell you how I've been thinking. I do a Get_Line to see how many characters the string contains. I then pass I to my subprogram to determine whether it's evenly divisible by two. If it is, the remainder is 0, which means that printing the middle character along with the other half of the string is not needed. In all other cases it's not divisible by two, and I have to print the other half of the string plus the middle character. But now I've stumbled upon a big problem in my main program: I don't know how to print the other half of a string. If a string contains 4 characters I can just write Put(S(3 .. 4)); but I don't know a general formula for this. Help is appreciated! :) Have a good day!
You need a more general approach to your problem. Also, try to understand how Get_Line works for you.
For example, if you declare an input string with a large size such as
Input : String (1..1024);
You will have a string large enough to work with any likely input values.
Next, you need a variable to indicate how many characters were actually read by Get_Line.
Length : Natural;
The data returned by Get_Line will then be in the slice of the input string designated as
Input (1 .. Length);
Pass that slice to your function to return the second half of the string.
function last_half(S : string) return string;
last_half(Input(1..Length));
Now all you need is to calculate the last half of the string passed to the function last_half. The function will output a slice of the string passed to it. To find the first index of the last half of the input string you must perform the calculation
mid : Positive := 1 + (S'length / 2);
Then simply return the string S(mid .. S'Last).
It appears that the goal of this exercise is to learn how to use array slices. Concentrate on how slices work for you in the problem and the solution will be very simple.
One possible solution is
with Ada.Text_IO; use Ada.Text_IO;
procedure Main is
Input : String (1 .. 1_024);
Length : Natural;
function last_half (S : in String) return String is
Mid : Positive := 1 + (S'Length / 2);
begin
return S (Mid .. S'Last);
end last_half;
begin
Put ("Enter a string: ");
Get_Line (Input, Length);
Put_Line (Input (1 .. Length) & " : " & last_half (Input (1 .. Length)));
end Main;
Study how the solution uses array slices on the buffer filled by Get_Line, on the parameter for the function last_half, and in its return statement. It is also important to remember that the type String is defined as an unconstrained array of Character. This means that every slice of a string is also a string.
type String is array ( Positive range <> ) of Character;
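The index arithmetic is independent of Ada. As a cross-check, here is a minimal sketch in Go, where slices are 0-based, so the Ada expression 1 + (S'Length / 2) becomes length / 2. The function name lastHalf simply mirrors last_half above and is otherwise hypothetical.

```go
package main

import "fmt"

// lastHalf returns the second half of s; for an odd number of
// characters the middle character is included in the result.
func lastHalf(s string) string {
	r := []rune(s) // work on characters rather than bytes
	return string(r[len(r)/2:])
}

func main() {
	fmt.Println(lastHalf("Overflow")) // flow
	fmt.Println(lastHalf("Stack"))    // ack
	fmt.Println(lastHalf("Candy"))    // ndy
}
```

Integer division rounds down, which is why one formula covers both the even and the odd case without testing the remainder.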
Aside from being an untidy mess, your latest code edit (as of 20:11 GMT on 15 Nov 2021) doesn’t even compile. Please don’t show us code like this! (unless, of course, that’s the problem).
I’d like to strongly suggest this alternate way of inputting strings:
declare
S : constant String := Get_Line;
begin
-- do things with S, which is exactly as long as
-- the input you typed: no undefined characters at
-- the end to confuse the result, no need to worry
-- about overrunning an input buffer
end;
With this change, and obvious syntactic changes, your current code will do what you want.

Replace a character in a string in golang

I am trying to replace a specific position character from an array of strings. Here is what my code looks like:
package main
import (
"fmt"
)
func main() {
str := []string{"test","testing"}
str[0][2] = 'y'
fmt.Println(str)
}
Now, running this gives me the error:
cannot assign to str[0][2]
Any idea how to do this? I have tried using strings.Replace, but AFAIK it will replace all occurrences of the given character, while I want to replace only the character at that specific position. Any help is appreciated. TIA.
Strings in Go are immutable, you can't change their content. To change the value of a string variable, you have to assign a new string value.
An easy way is to first convert the string to a byte or rune slice, do the change and convert back:
s := []byte(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
This will output (try it on the Go Playground):
[teyt testing]
Note: I converted the string to byte slice, because this is what happens when you index a string: it indexes its bytes. A string stores the UTF-8 byte sequence of the text, which may not necessarily map bytes to characters one-to-one.
If you need to replace the character (rune) at index 2 rather than the byte at index 2, use []rune instead:
s := []rune(str[0])
s[2] = 'y'
str[0] = string(s)
fmt.Println(str)
In this example it doesn't matter though, but in general it may.
Also note that strings.Replace() does not (necessarily) replace all occurrences:
func Replace(s, old, new string, n int) string
The parameter n tells how many replacements are to be performed at most; if n is negative, there is no limit. So the following also works (try it on the Go Playground):
str[0] = strings.Replace(str[0], "s", "y", 1)
Yet another solution could be to slice the string up to the replaceable character and again starting from the character after it, and just concatenate the two parts around the new character (try this one on the Go Playground):
str[0] = str[0][:2] + "y" + str[0][3:]
Care must be taken here too: the slice indices are byte indices, not character (rune) indices.
See related question: Immutable string and pointer address
Here's a function that will do that for you. It takes care of converting the string that you want to modify into a []rune, and then back out to string.
If your intention is to replace bytes rather than runes, you can:
copy this function's code and rename it from runeSub to byteSub
change the r rune parameter to b byte
convert with []byte instead of []rune inside the function
Also available on repl.it
package main
import "fmt"
// runeSub - given an array of strings (ss), replace the
// (ri)th rune (character) in the (si)th string
// of (ss), with the rune (r)
//
// ss - the array of strings
// si - the index of the string in ss that you want to modify
// ri - the index of the rune in ss[si] that you want to replace
// r - the rune you want to insert
//
// NOTE: this function has no panic protection from things like
// out-of-bound index values
func runeSub(ss []string, si, ri int, r rune) {
rr := []rune(ss[si])
rr[ri] = r
ss[si] = string(rr)
}
func main() {
ss := []string{"test","testing"}
runeSub(ss, 0, 2, 'y')
fmt.Println(ss)
}

ada split() method

I am trying to write an Ada equivalent to the split() method in Java or C++. I am to take a string and an integer as input and output two separate string values. For example:
split of "hello" and 2 would return:
"The first part is he
and the second part is llo"
The code I have is as follows:
-- split.adb splits an input string about a specified position.
--
-- Input: Astring, a string,
-- Pos, an integer.
-- Precondition: pos is in Astring'Range.
-- Output: The substrings Astring(Astring'First..Pos) and
-- Astring(Pos+1..Astring'Last).
--------------------------------------------------------------
with Ada.Text_IO, Ada.Integer_Text_IO, Ada.Strings.Fixed;
use Ada.Text_IO, Ada.Integer_Text_IO, Ada.Strings.Fixed;
procedure Split is
EMPTY_STRING : String := " ";
Astring, Part1, Part2 : String := EMPTY_STRING;
Pos, Chars_Read : Natural;
------------------------------------------------
-- Split() splits a string in two.
-- Receive: The_String, the string to be split,
-- Position, the split index.
-- PRE: 0 < Position <= The_String.length().
-- (Ada arrays are 1-relative by default)
-- Passback: First_Part - the first substring,
-- Last_Part - the second substring.
------------------------------------------------
function Split(TheString : in String ; Pos : in Integer; Part1 : out String ; Part2 : out String) return String is
begin
Move(TheString(TheString'First .. Pos), Part1);
Move(TheString(Pos .. TheString'Last), Part2);
return Part1, Part2;
end Split;
begin -- Prompt for input
Put("To split a string, enter the string: ");
Get_Line(Astring, Chars_Read);
Put("Enter the split position: ");
Get(Pos);
Split(Astring, Pos, Part1, Part2);
Put("The first part is ");
Put_Line(Part1);
Put(" and the second part is ");
Put_Line(Part2);
end Split;
The main part I am having trouble with is returning the two separate string values and in general the whole split() function. Any pointers or help is appreciated. Thank you
Instead of a function, consider making Split a procedure having two out parameters, as you've shown. Then decide if Pos is the last index of Part1 or the first index of Part2; I've chosen the latter.
procedure Split(
TheString : in String; Pos : in Integer;
Part1 : out String; Part2 : out String) is
begin
Move(TheString(TheString'First .. Pos - 1), Part1);
Move(TheString(Pos .. TheString'Last), Part2);
end Split;
Note that String indexes are Positive:
type String is array(Positive range <>) of Character;
subtype Positive is Integer range 1 .. Integer'Last;
Doing this is so trivial, I'm not sure why you'd bother making a routine for it. Just about any routine you could come up with is going to be much harder to use anyway.
Front_Half : constant String := Original(Original'first..Index);
Back_Half : constant String := Original(Index+1..Original'last);
Done.
Note that static Ada strings are very different than strings in other languages like C or Java. Due to their static nature, they are best built either inline like I've done above, or as return values from functions. Since functions cannot return more than one value, a single unified "split" routine is just plain not a good fit for static Ada string handling. Instead, you should either do what I did above, call the corresponding routines from Ada.Strings.Fixed (Head and Tail), or switch to using Ada.Strings.Unbounded.Unbounded_String instead of String.
The latter is probably the easiest option, if you want to keep your Java mindset about string handling. If you want to really learn Ada though, I'd highly suggest you learn to deal with static fixed Strings the Ada way.
From looking over your code, you really need to read up on the String type in general, because you're bringing in a lot of expectations from other languages about how to work with strings, and those aren't going to hold in Ada. Ada's String type is not one of its more flexible features, in that a String object always has a fixed length. While there are ways of working around the limitations in a situation such as the one you're describing, it would be much easier to simply use Unbounded_Strings.
The input String to your function could remain of type String, which will adjust to the length of the string that you provide to it. The two output Unbounded_Strings then are simply set to the sliced string components after invoking To_Unbounded_String() on each of them.
Given the constraints of your main program, with all strings bounded by the size of EMPTY_STRING, the procedure with out parameters is the correct approach, with the out parameter storage allocated by the caller (on the stack, as it happens).
That is not always the case, so it is worth knowing another way. The problem is how to deal with data whose size is unknown until runtime.
Some languages can only offer runtime allocation on the heap (via "new" or "malloc") and can only access the data via pointers, leaving a variety of messy problems including accesses off the end of the data (buffer overruns) or releasing the storage correctly (memory leaks, accessing freed pointers, etc.).
Ada will allow this method too, but it is usually unnecessary and strongly discouraged. Unbounded_String is a wrapper over this method, while Bounded_String avoids heap allocation where you can accept an upper bound on the string length.
But also, Ada allows variable sized data structures to be created on the stack; the technique just involves creating a new stack frame and declaring new variables where you need to, with "declare". The new variables can be initialised with function calls.
Each function can only return one object, but that object's size can be determined at runtime. So either "Split" can be implemented as 2 functions, returning Part1 or Part2, or it can return a record containing both strings. It would be a record with two size discriminants, so I have chosen the simpler option here. The function results are usually built in place (avoids copying).
The flow in your example would require two nested Declare blocks; if "Pos" could be identified first, they could be collapsed into one...
procedure Split is
function StringBefore( Input : String; Pos : Natural) return String is
begin
return Input(1 .. Pos-1);
end StringBefore;
function StringFrom ...
begin
Put("To split a string, enter the string: ");
declare
AString : String := Get_Line;
Pos : Natural;
begin
Put("Enter the split position: ");
Get(Pos);
declare
Part1 : String := StringBefore(AString, Pos);
Part2 : String := StringFrom(AString, Pos);
begin
Put("The first part is ");
Put_Line(Part1);
Put(" and the second part is ");
Put_Line(Part2);
end; -- Part1 and Part2 are now out of scope
end; -- AString is now out of scope
end Split;
This can obviously be wrapped in a loop, with different size strings each time, with no memory management issues.
Look at the Head and Tail functions in Ada.Strings.Fixed.
function Head (Source : in String; Count : in Natural; Pad : in Character := Space) return String;
function Tail (Source : in String; Count : in Natural; Pad : in Character := Space)
return String;
Here's an approach that just uses slices of the string.
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Strings.Fixed; use Ada.Strings.Fixed;
procedure Main is
str : String := "one,two,three,four,five,six,seven,eight";
pattern : String := ",";
idx, b_idx : Integer;
begin
b_idx := 1;
for i in 1..Ada.Strings.Fixed.Count ( Source => str, Pattern => pattern ) loop
idx := Ada.Strings.Fixed.Index( Source => str(b_idx..str'Last), Pattern => pattern);
Put_Line(str(b_idx..idx-1)); -- process string slice in any way
b_idx := idx + pattern'Length;
end loop;
-- process last string
Put_Line(str(b_idx..str'Last));
end Main;

Ada string comparison

I am new to Ada and currently trying to write a simple program involving an if-else if statement. The code is as follows:
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Integer_Text_IO; use Ada.Integer_Text_IO;
procedure Year_Codes is
Year : String(1..9) := "         ";
CharsRead : Natural;
function YearCode(Name : in String) return Integer is
begin
if(Name = "freshman")then
return 1;
elsif(Name = "sophomore")then
return 2;
elsif(Name = "junior")then
return 3;
elsif(Name = "senior")then
return 4;
else
return 0;
end if;
end YearCode;
begin
Put("Enter your academic year: "); -- Prompt for input
Get_Line(Year, CharsRead); -- Input
Put( YearCode(Year) ); -- Convert and output
New_Line;
end Year_Codes;
I am getting 0 for every answer. Any input on what I am doing wrong?
The "=" operation on strings compares the entire strings. If the user's input is "freshman", the value of Name will be "freshman ", not "freshman". Read the documentation for the Get_Line procedure.
You should probably pass YearCode a slice of the Year string, not the entire string; CharsRead tells you what that slice should be.
Specifically, the call should be:
Put( YearCode(Year(Year'First..CharsRead)) );
Here's a case-insensitive version using attributes:
function YearCode(Name : in String) return Integer is
Type Class is (Freshman, Sophomore, Junior, Senior);
begin
Return 1 + Class'Pos(Class'Value(Name));
exception
When CONSTRAINT_ERROR => Return 0;
end YearCode;
With that extra character in your buffer, it looks to me like you are thinking of strings in C terms. You need to stop that. Of everything in the language, string handling is the most different between Ada and C.
While C strings are null terminated, Ada strings are not. Instead, an Ada string is assumed to be the size of the string array object. It's a simple difference, but it has enormous consequences for how you handle strings.
I go into this a bit in my answer to How to I build a string from other strings in Ada? The basic gist is that in Ada you always try to build perfectly-sized string objects on the fly.
Sadly, Text_IO input is one place that has traditionally made that really hard, due to its string buffer-based input. In that case, you are forced to use an overly large string object as a buffer, and use the returned value as the end of the defined area of the buffer, as Keith showed.
However, if you have a new version of the compiler, you can use the function version of Get_Line to fix that. Simply change your middle two lines to:
Put( YearCode(Get_Line) );
