I'm facing an issue here. Can anyone tell me what is wrong with my code?
This is the check50 result:
:) substitution.c exists
:) substitution.c compiles
:( encrypts "A" as "Z" using ZYXWVUTSRQPONMLKJIHGFEDCBA as key
expected "ciphertext: Z...", not ""
:( encrypts "a" as "z" using ZYXWVUTSRQPONMLKJIHGFEDCBA as key
expected "ciphertext: z...", not ""
:( encrypts "ABC" as "NJQ" using NJQSUYBRXMOPFTHZVAWCGILKED as key
expected "ciphertext: NJ...", not ""
:( encrypts "XyZ" as "KeD" using NJQSUYBRXMOPFTHZVAWCGILKED as key
expected "ciphertext: Ke...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using YUKFRNLBAVMWZTEOGXHCIPJSQD as key
expected "ciphertext: Cb...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using yukfrnlbavmwzteogxhcipjsqd as key
expected "ciphertext: Cb...", not ""
:( encrypts "This is CS50" as "Cbah ah KH50" using YUKFRNLBAVMWZteogxhcipjsqd as key
expected "ciphertext: Cb...", not ""
:( encrypts all alphabetic characters using DWUSXNPQKEGCZFJBTLYROHIAVM as key
expected "ciphertext: Rq...", not ""
:( does not encrypt non-alphabetical characters using DWUSXNPQKEGCZFJBTLYROHIAVM as key
expected "ciphertext: Yq...", not ""
:) handles lack of key
:) handles too many arguments
:) handles invalid key length
:) handles invalid characters in key
:) handles duplicate characters in key
:) handles multiple duplicate characters in key
This is my code:
#include <cs50.h>
#include <stdio.h>
#include <ctype.h>
#include <string.h>
int main(int argc, string argv[])
{
string alphabet= "abcdefghijklmnopqrstuvwxyz";
if(argc != 2)
{
printf("missing/more than 1 command-line argument\n");
return 1;
}
//check if there are 26 characters
int a= strlen(argv[1]);
if(a!=26)
{
printf("key must contain 26 characters\n");
return 1;
}
//Check if characters are all alphabetic
for(int i=0, n=strlen(argv[1]); i<n; i++)
{
if(!isalpha(argv[1][i]))
{
printf("only alphabetic characters allowed\n");
return 1;
}
//check if each letter appear only once
for(int j=1; j<n; j++)
{
if(argv[1][i]==argv[1][j])
{
printf("repeated alphabets not allowed\n");
return 1;
}
}
}
//prompt user for plaintext
string b= get_string("plaintext: \n");
int m=strlen(b);
char ciphertxt[m+1];
//find out the alphabetical position of each character in string b (i.e character c in string b has alphabetical position of 3)
for(int k=0; k<m; k++)
{
for(int p=0, q=strlen(alphabet); p<q; p++)
{
if(b[k]==alphabet[p])
{
ciphertxt[k]= tolower(argv[1][p]);
break;
}
else if(b[k]==(alphabet[p]-32))
{
ciphertxt[k]= toupper(argv[1][p]);
break;
}
else
{
ciphertxt[k]= b[k];
}
}
}
ciphertxt[m]='\0';
//print ciphertext
printf("ciphertext: %s\n", ciphertxt);
return 0;
}
Did you run your code against the test cases check50 shows you? I did; it does not encrypt anything and always prints the "repeated alphabets not allowed" message.
The problem is in the j loop. It will always report the 2nd letter of argv[1] as a duplicate. The inner loop restarts at j = 1 for every value of i, so when i reaches 1 both indices are 1 and if(argv[1][i]==argv[1][j]) compares a character with itself, which is always true.
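To see the self-comparison concretely, here is a small sketch in Python (deliberately not C, so the C exercise stays yours). The function names are made up for illustration. The first function mirrors the posted loops; the second starts j just past i, which is one possible way to compare each pair of positions only once. Like the posted code, both are case-sensitive.

```python
def has_duplicate_as_posted(key):
    """Mirror of the posted loops: j restarts at 1 for every i."""
    n = len(key)
    for i in range(n):
        for j in range(1, n):
            if key[i] == key[j]:  # true as soon as i == j == 1
                return True
    return False


def has_duplicate(key):
    """Compare each pair of positions once: j starts just past i."""
    n = len(key)
    for i in range(n):
        for j in range(i + 1, n):
            if key[i] == key[j]:
                return True
    return False


key = "ZYXWVUTSRQPONMLKJIHGFEDCBA"
print(has_duplicate_as_posted(key))  # True, although no letter repeats
print(has_duplicate(key))            # False
```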
There are several approaches to solving this problem. I will not solve it for you using the C programming language. You must do that yourself. Following is an approach that works very efficiently using the Ada programming language, but is not easily accomplished using the C programming language.
In Ada a string is defined as
type string is array (Positive range <>) of Character;
Thus, a string is an unconstrained array type, meaning instances of an array may be any length. Ada arrays require the programmer to define the range of values for the array index. Index values need not start at 0. Index values may start at any value which is valid for the type declared to be the index type. Index types may be integer types or enumeration types. Ada characters are an enumeration type, which allows the programmer to index an array using characters.
The following example uses many of the features described above.
with Ada.Command_Line; use Ada.Command_Line;
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Characters.Handling; use Ada.Characters.Handling;
procedure substitution is
subtype lower is Character range 'a' .. 'z';
subtype upper is Character range 'A' .. 'Z';
subtype sequence is String (1 .. 26);
alphabet : constant array (lower) of Positive :=
(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26);
function substitute (Char : Character; Key : sequence) return Character is
begin
if Char in lower then
return To_Lower (Key (alphabet (To_Lower (Char))));
elsif Char in upper then
return To_Upper (Key (alphabet (To_Lower (Char))));
else
return Char;
end if;
end substitute;
function is_duplicate (char : Character; Key : sequence) return Boolean is
count : Natural := 0;
begin
for C of Key loop
if C = char then
count := count + 1;
end if;
end loop;
return count > 1;
end is_duplicate;
Key : String (1 .. 26);
Invalid_Argument_Error : exception;
begin
if Argument_Count /= 1 then
Put_Line ("There must be exactly one command line argument.");
raise Invalid_Argument_Error;
end if;
if Argument (1)'Length /= 26 then
Put_Line ("The argument must contain 26 characters.");
raise Invalid_Argument_Error;
else
Key := Argument (1);
end if;
for C of Key loop
if is_duplicate (C, Key) then
Put_Line ("The argument cannot contain duplicate values.");
raise Invalid_Argument_Error;
end if;
end loop;
for C of Key loop
if not (C in lower or else C in upper) then
Put_Line ("The argument must contain only alphabetic characters.");
raise Invalid_Argument_Error;
end if;
end loop;
Put_Line ("Enter plain text:");
declare
input : String := Get_Line;
cipher : String (input'Range);
begin
for I in input'Range loop
cipher (I) := substitute (input (I), Key);
end loop;
Put_Line ("cipher text: " & cipher);
end;
end substitution;
Ada allows the starting procedure for a program to be named whatever the programmer wants to name it. In C the starting function must be named "main". In this example the starting procedure is named "substitution". Ada characters are full eight-bit characters and represent the Latin-1 character set. The first 128 code points of Latin-1 (those that fit in seven bits) are the same as the ASCII character set. Thus, there are some lower-case and upper-case letters in the Latin-1 character set which are not part of the ASCII character set. For this reason the program restricts itself to the ASCII letters by declaring two subtypes of the type Character.
subtype lower is Character range 'a' .. 'z';
subtype upper is Character range 'A' .. 'Z';
The syntax 'a' .. 'z' defines a range of values and includes all the characters starting with 'a' and ending with 'z'.
A subtype of the Ada string type is named sequence and is declared to be a string indexed by the value range 1 .. 26. Thus, each instance of the subtype sequence must contain a 26 character string. Ada does not append a null character to the end of its strings.
The array named alphabet is defined to be a constant array indexed by the subtype lower. Each element of the array is an integer with a minimum value of 1. The array is initialized to the numbers 1 through 26 with 1 indexed by 'a' and 26 indexed by 'z'. This array is used as a look-up table for indexing into the key entered on the program command line.
The function named substitute takes two parameters: Char, which is a Character, and Key, which is a sequence (a 26-character string). It returns the encrypted character value.
The function returns the character in the Key parameter indexed by the number which is indexed by the letter in the parameter Char. The array named alphabet becomes the look-up table for the index value corresponding to the character contained in the parameter Char.
The function named is_duplicate is used to determine if a character occurs more than once in a Key sequence. It simply counts the number of times the character in the Char parameter occurs in the Key sequence. The function returns TRUE if the count is greater than 1 and false if the count is not greater than 1.
After performing the necessary checks on the command-line parameter the program prompts for a string to encrypt and then simply assigns to the string cipher the encrypted character corresponding to each input character.
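For readers who do not know Ada, the same look-up-table idea can be sketched in Python. This is only an illustration under a few assumptions: the names ALPHABET, substitute, is_duplicate and encrypt are chosen to echo the Ada program, a dictionary stands in for the character-indexed array, and positions start at 0 rather than 1.

```python
import string

# Look-up table playing the role of the Ada array "alphabet":
# 'a' -> 0 ... 'z' -> 25 (the Ada version maps 'a' -> 1 ... 'z' -> 26).
ALPHABET = {letter: pos for pos, letter in enumerate(string.ascii_lowercase)}


def substitute(char, key):
    """Return the key letter at char's alphabetical position, keeping char's case."""
    if char in string.ascii_lowercase:
        return key[ALPHABET[char]].lower()
    if char in string.ascii_uppercase:
        return key[ALPHABET[char.lower()]].upper()
    return char  # digits, spaces and punctuation pass through unchanged


def is_duplicate(char, key):
    """True if char occurs more than once in key (case-sensitive, like the Ada code)."""
    return key.count(char) > 1


def encrypt(plaintext, key):
    return "".join(substitute(c, key) for c in plaintext)


print(encrypt("This is CS50", "YUKFRNLBAVMWZTEOGXHCIPJSQD"))  # Cbah ah KH50
```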
with Ada.Text_IO; use Ada.Text_IO;
with Ada.Float_Text_IO; use Ada.Float_Text_IO;
procedure Help is
F: Float;
I: Integer;
S: String(1..6);
begin
Put("Type a string with max 5 characters: ");
Get_Line(S(1..5), I);
Put("You typed the string: ");
Put(S(1..I));
Skip_Line;
New_Line;
Put("Type a string with max 5 characters: ");
Get_Line(S(1..5), I);
Put("You typed the string: ");
Put(S(1..I));
end Help;
When I run this code and, for instance, type "Hey brother", I get "Hey b" as output, and then it continues to the next prompt. All good and clear!
But when I type a string containing fewer than 5 characters, like "Hey", it prints it as expected BUT it then keeps waiting for more input. It's supposed to jump to the next prompt but it doesn't. If I now type "ss", I get "Heyss" and then it continues to the next prompt. So how do I do this? How do I make it so that, even with fewer than 5 characters, it not only prints the string but also continues to the next prompt? I've kind of figured out that I have to use a simple "if" statement, but I have no clue how to apply it, as all of my attempts have failed.
My program should be able to handle all of these inputs and give the following outputs:
Type a string with max 5 characters: Hi
You typed the string: Hi
Type a string with max 5 characters: Hello
You typed the string: Hello
Type a string with max 5 characters: Hey there
You typed the string: Hey t
Type a string with max 5 characters:
You typed the string:
The Get_Line procedure takes two parameters: Item, which is a String, and Last, which is a Natural.
Get_Line will read the input to the end of the line or the end of the string, whichever comes first. The Last parameter is an OUT parameter returning the index value of the last character read into the string.
Try the following approach:
with Ada.Text_IO; use Ada.Text_IO;
procedure help is
Input : String (1..80); -- It need not be only 5 characters
Length : Natural;
Num_Good : Natural := 0;
begin
while Num_Good < 2 loop
Put ("Enter a string of 5 characters: ");
Get_Line (Item => Input, Last => Length);
if Length = 5 then
Put_Line (Input (1..Length));
Num_Good := Num_Good + 1;
else
Put_Line ("Error: Input does not contain exactly 5 characters.");
end if;
end loop;
end help;
Following is a version that uses only the features you describe:
with Ada.Text_IO; use Ada.Text_IO;
procedure help2 is
S : String(1..80);
I : Integer;
J : Integer := 0;
begin
loop
Put("Enter a string containing 5 characters: ");
Get_Line (S, I);
if I = 5 then
J := J + 1;
Put_Line (S(1..I));
end if;
if J = 2 then
exit;
end if;
end loop;
end help2;
The program will read the string input by the user and output the string if it contains exactly 5 characters. Nothing will be output if the string does not contain exactly 5 characters. The loop exits when the user successfully enters two strings containing exactly 5 characters.
The following version accepts a string of up to 5 characters.
with Ada.Text_IO; use Ada.Text_IO;
procedure Up_To_five is
S : String (1..5);
I : Integer;
begin
Put ("Enter a string with a max of 5 characters: ");
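Get_Line (S, I);
-- Get_Line leaves the rest of the line unread only when S was filled
if I = S'Last then
Skip_Line;
end if;
Put ("You typed the string: ");
Put_Line (S(1..I));
New_Line;
Put ("Enter a string with a max of 5 characters: ");
Get_Line (S, I);
if I = S'Last then
Skip_Line;
end if;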
Put ("You typed the string: ");
Put_Line (S(1..I));
end Up_To_five;
Your program says
Put("Type a string with max 5 characters: ");
Get_Line(S(1..5), I);
and you type hi and press RET (the return key). Get_Line returns, having consumed h i RET, setting S (1 .. 2) to hi, which you print out, and I to 2.
Now, your program says
Skip_Line;
which according to ARM A.10.5(9)
... Reads and discards all characters until a line terminator has been read, ...
and so sits there waiting for another RET.
If on the other hand you type 5 or more characters, Get_Line finishes before needing to read the RET, so it’s still there in the input buffer.
So you need to decide whether or not to call Skip_Line.
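The decision can be illustrated with a toy model of the input buffer. The sketch below is Python, not Ada, and get_line, skip_line and read_up_to are simplified stand-ins written for this illustration, not the real Ada.Text_IO routines. It assumes the behaviour described above: Get_Line consumes the line terminator only if it reaches it before the string is full. In the Ada program the corresponding test would be whether I equals 5.

```python
import io


def get_line(stream, max_len):
    """Read at most max_len characters; stop early at a line terminator.

    Returns (text, terminator_consumed).
    """
    chars = []
    while len(chars) < max_len:
        ch = stream.read(1)
        if ch in ("", "\n"):          # end of input or end of line
            return "".join(chars), True
        chars.append(ch)
    return "".join(chars), False      # buffer full, terminator still pending


def skip_line(stream):
    """Discard characters up to and including the next line terminator."""
    while stream.read(1) not in ("", "\n"):
        pass


def read_up_to(stream, max_len=5):
    text, terminator_consumed = get_line(stream, max_len)
    if not terminator_consumed:       # only skip when something is left over
        skip_line(stream)
    return text


typed = io.StringIO("Hi\nHello\nHey there\n\n")
print([read_up_to(typed) for _ in range(4)])
# ['Hi', 'Hello', 'Hey t', '']
```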
I have a specific data format, say 'n' (arbitrary) rows and '4' columns. If 'n' is '10', example data would look like this.
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
Constraints in building this input would be
data should have '4' columns.
data separated by white spaces.
I want to implement a feature to check whether the input file has '4' columns in every row, and built my own based on M.S.B's answer in the post "Reading data file in Fortran with known number of lines but unknown number of entries in each line".
program readtest
use :: iso_fortran_env
implicit none
character(len=512) :: buffer
integer :: i, i_line, n, io, pos, pos_tmp, n_space
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.dat'
open(42, file=trim(filename), status='old', action='read')
print *, '+++++++++++++++++++++++++++++++++++'
print *, '+ Count lines +'
print *, '+++++++++++++++++++++++++++++++++++'
n = 0
i_line = 0
do
pos = 1
pos_tmp = 1
i_line = i_line+1
read(42, '(a)', iostat=io) buffer
! (*1) Count blank spaces.
n_space = 0
do
pos = index(buffer(pos+1:), " ") + pos
if (pos /= 0) then
if (pos > pos_tmp+1) then
n_space = n_space+1
pos_tmp = pos
else
pos_tmp = pos
end if
endif
if (pos == max_len) then
exit
end if
end do
pos_tmp = pos
if (io /= 0) then
exit
end if
print *, '> line : ', i_line, ' n_space : ', n_space
n = n+1
end do
print *, ' >> number of line = ', n
end program
If I run the above program with an input file containing some wrong rows, as follows,
1.01e+00 -2.01e-02 -3.01e-01 4.01e+02
1.02e+00 -2.02e-02 -3.02e-01 4.02e+02
1.03e+00 -2.03e-02 -3.03e-01 4.03e+02
1.04e+00 -2.04e-02 -3.04e-01 4.04e+02
1.05e+00 -2.05e-02 -3.05e-01 4.05e+02
1.06e+00 -2.06e-02 -3.06e-01 4.06e+02
1.07e+00 -2.07e-02 -3.07e-01 4.07e+02
1.0 2.0 3.0
1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00
1.09e+00 -2.09e-02 -3.09e-01 4.09e+02
1.10e+00 -2.10e-02 -3.10e-01 4.10e+02
The output is like this,
+++++++++++++++++++++++++++++++++++
+ Count lines +
+++++++++++++++++++++++++++++++++++
> line : 1 n_space : 4
> line : 2 n_space : 4
> line : 3 n_space : 4
> line : 4 n_space : 4
> line : 5 n_space : 4
> line : 6 n_space : 4
> line : 7 n_space : 4
> line : 8 n_space : 3 (*2)
> line : 9 n_space : 5 (*3)
> line : 10 n_space : 4
> line : 11 n_space : 4
>> number of line = 11
And you can see that the wrong rows are properly detected as I intended (see (*2) and (*3)), and I can write 'if' statements to make some error messages.
But I think my code is 'extremely' ugly since I had to do something like (*1) in the code to count consecutive white spaces as one space. I think there should be a much more elegant way to ensure that each row contains exactly '4' columns, say,
read(*,'4(X, A)') line
(which didn't work)
Also, my program would fail if a line is longer than 'max_len', the length of 'buffer', which is set to '512' in this case. Although '512' should be enough for most practical purposes, I also want my checking subroutine to be robust in this respect.
So, I want to improve my subroutine in at least these aspects
Want it to be more elegant (not as (*1))
Be more general (especially in regards to 'max_len')
Does anyone have experience in building this kind of input-checking subroutine?
Any comments would be highly appreciated.
Thank you for reading the question.
Without knowledge of the exact data format, I think it would be rather difficult to achieve what you want (or at least, I wouldn't know how to do it).
In the most general case, I think your space counting idea is the most robust and correct.
It can be adapted to avoid the maximum string length problem you describe.
In the following code, I go through the data as an unformatted, stream access file.
Basically you read every character and take note of new_lines and spaces.
As you did, you use spaces to count the columns (skipping repeated spaces) and new_line characters to count the rows.
However, here we are not reading the entire line as a string and going through it to find spaces; we read char by char, avoiding the fixed string length problem, and we also end up with a single loop. Hope it helps.
EDIT: now handles white space at the beginning and end of a line, and empty lines
program readtest
use :: iso_fortran_env
implicit none
character :: old_char, new_char
integer :: line, io, cols
logical :: beg_line
integer,parameter :: max_len = 512
character(len=max_len) :: filename
filename = 'data_wrong.txt'
! Output format to be used later
100 format (a, 3x, i0, a, 3x , i0)
open(42, file=trim(filename), status='old', action='read', &
form="unformatted", access="stream")
! set utils
old_char = " "
line = 0
beg_line = .true.
cols = 0
! Start scanning char by char
do
read(42, iostat = io) new_char
! Exit if EOF
if (io < 0) then
exit
end if
! Deal with empty lines
if (beg_line .and. new_char==new_line(new_char)) then
line = line + 1
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
write(*,'(6x, a)') "EMPTYLINE"
! Deal with beginning of line for white spaces
elseif (beg_line) then
beg_line = .false.
! this indicates new columns
elseif (new_char==" " .and. old_char/=" ") then
cols = cols + 1
! End of line: time to print
elseif (new_char==new_line(new_char)) then
if (old_char/=" ") then
cols = cols+1
endif
line = line + 1
! Printing out results
write(*, 100, advance="no") "Line number:", line, &
"; Columns: Number", cols
if (cols == 4) then
write(*,'(6x, a5)') "OK"
else
write(*,'(6x, a5)') "ERROR"
end if
! Restart with a new line (reset counters)
cols = 0
beg_line = .true.
end if
old_char = new_char
end do
end program
This is the output of this program:
Line number: 1; Columns number: 4 OK
Line number: 2; Columns number: 4 OK
Line number: 3; Columns number: 4 OK
Line number: 4; Columns number: 4 OK
Line number: 5; Columns number: 4 OK
Line number: 6; Columns number: 4 OK
Line number: 7; Columns number: 4 OK
Line number: 8; Columns number: 3 ERROR
Line number: 9; Columns number: 5 ERROR
Line number: 10; Columns number: 4 OK
Line number: 11; Columns number: 4 OK
If you knew your data format, you could read each line into a vector of dimension 4 and use the iostat variable to report an error on each line where iostat is an integer greater than 0.
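The character-by-character scan described in this answer can also be sketched in Python, which may make the state handling easier to follow. This is a simplified model, not a translation of the Fortran program: count_columns is a name made up here, the text is passed in as a string rather than read from a stream file, and the sketch counts the start of each field instead of the spaces after it.

```python
def count_columns(text):
    """Scan text one character at a time; return the column count of each line."""
    counts = []
    cols = 0
    in_field = False              # are we currently inside a run of non-blanks?
    for ch in text:
        if ch == "\n":            # end of line: record the count and reset
            counts.append(cols)
            cols = 0
            in_field = False
        elif ch in " \t":         # blanks only end the current field
            in_field = False
        elif not in_field:        # first character of a new field
            cols += 1
            in_field = True
    if cols:                      # last line without a trailing newline
        counts.append(cols)
    return counts


sample = (
    "1.01e+00 -2.01e-02 -3.01e-01 4.01e+02\n"
    "1.0 2.0 3.0\n"
    "  1.08e+00  -2.08e-02 -3.08e-01 4.07e+02 1.00  \n"
    "\n"
)
print(count_columns(sample))  # [4, 3, 5, 0]
```

Because no line is ever held in a fixed-size buffer, the line length does not matter, which is the same property the stream-access version relies on.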
Instead of counting whitespace you can use manipulation of substrings to get what you want. A simple example follows:
program foo
implicit none
character(len=512) str ! Assume str is sufficiently long buffer
integer fd, cnt, m, n
open(newunit=fd, file='test.dat', status='old')
do
cnt = 0
read(fd,'(A)',end=10) str
str = adjustl(str) ! Eliminate possible leading whitespace
do
n = index(str, ' ') ! Find first space
if (n /= 0) then
write(*, '(A)', advance='no') str(1:n)
str = adjustl(str(n+1:))
end if
if (len_trim(str) == 0) exit ! Trailing whitespace
cnt = cnt + 1
end do
! cnt counts the gaps between fields, so a 4-column row gives cnt == 3
if (cnt /= 3) then
write(*,'(A)') ' Error'
else
write(*,*)
end if
end do
10 close(fd)
end program foo
This should read any line of reasonable length (up to the line limit your compiler defaults to, which is generally 2 GB nowadays). You could change it to stream I/O to have no limit, but most Fortran compilers have trouble reading stream I/O from stdin, which is where this example reads from. So if the line looks anything like a list of numbers it should read them, tell you how many it read, and let you know if it had an error reading any value as a number (character strings, strings bigger than the size of a REAL value, ....). All the parts here are explained on the Fortran Wiki, but to keep it short this is a stripped-down version that just puts the pieces together. The oddest behavior it has is that if you enter something like this, with a slash in it,
10 20,,30,40e4 50 / this is a list of numbers
it would treat everything after the slash as a comment and not generate a non-zero status return, while returning five values. For a more detailed explanation of the code, I think the annotated pieces on the Wiki explain how it works. In the search, look for "getvals" and "readline".
So with this program you can read a line and if the return status is zero and the number of values read is four you should be good except for a few dusty corners where the lines would definitely not look like a list of numbers.
module M_getvals
implicit none
private
public getvals, readline
contains
subroutine getvals(line,values,icount,ierr)
character(len=*),intent(in) :: line
real :: values(:)
integer,intent(out) :: icount, ierr
character(len=:),allocatable :: buffer
character(len=len(line)) :: words(size(values))
integer :: ios, i
ierr=0
words=' '
buffer=trim(line)//"/"
read(buffer,*,iostat=ios) words
icount=0
do i=1,size(values)
if(words(i).eq.'') cycle
read(words(i),*,iostat=ios)values(icount+1)
if(ios.eq.0)then
icount=icount+1
else
ierr=ios
write(*,*)'*getvals* WARNING:['//trim(words(i))//'] is not a number'
endif
enddo
end subroutine getvals
subroutine readline(line,ier)
character(len=:),allocatable,intent(out) :: line
integer,intent(out) :: ier
integer,parameter :: buflen=1024
character(len=buflen) :: buffer
integer :: last, isize
line=''
ier=0
INFINITE: do
read(*,iostat=ier,fmt='(a)',advance='no',size=isize) buffer
if(isize.gt.0)line=line//buffer(:isize)
if(is_iostat_eor(ier))then
last=len(line)
if(last.ne.0)then
if(line(last:last).eq.'\\')then
line=line(:last-1)
cycle INFINITE
endif
endif
ier=0
exit INFINITE
elseif(ier.ne.0)then
exit INFINITE
endif
enddo INFINITE
line=trim(line)
end subroutine readline
end module M_getvals
program tryit
use M_getvals, only: getvals, readline
implicit none
character(len=:),allocatable :: line
real,allocatable :: values(:)
integer :: icount, ier, ierr
INFINITE: do
call readline(line,ier)
if(allocated(values))deallocate(values)
allocate(values(len(line)/2+1))
if(ier.ne.0)exit INFINITE
call getvals(line,values,icount,ierr)
write(*,'(*(g0,1x))')'VALUES=',values(:icount),'NUMBER OF VALUES=',icount,'STATUS=',ierr
enddo INFINITE
end program tryit
Honestly, it should work reasonably with just about any line you throw at it.
PS:
If you are always reading four values, using list-directed I/O and checking the iostat= value on READ and checking if you hit EOR would be very simple (just a few lines) but since you said you wanted to read lines of arbitrary length I am assuming four values on a line was just an example and you wanted something very generic.
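The acceptance rule stated above (zero status and exactly four values read) can be sketched in a few lines of Python. This is only an analogy: get_values and row_is_valid are hypothetical names, the split is on whitespace only, and none of the Fortran list-directed rules (commas, repeat counts, the slash terminator) are modelled.

```python
def get_values(line):
    """Split a line on whitespace; return (values, bad_tokens)."""
    values, bad = [], []
    for token in line.split():
        try:
            values.append(float(token))
        except ValueError:
            bad.append(token)      # token does not look like a number
    return values, bad


def row_is_valid(line, expected=4):
    """A row is accepted if every token is numeric and there are exactly `expected`."""
    values, bad = get_values(line)
    return not bad and len(values) == expected


for row in ["1.01e+00 -2.01e-02 -3.01e-01 4.01e+02",
            "1.0 2.0 3.0",
            "1.08e+00 -2.08e-02 -3.08e-01 4.07e+02 1.00"]:
    print(row_is_valid(row), row)
```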
I have a string like "1-3,4,9,11-15" and I'm looking for an easy way to convert it to a comma-separated string of single numbers like "1,2,3,4,9,11,12,13,14,15" with a function in PL/SQL.
Thanks for your help!
I gave it a try and could achieve something like,
CREATE OR REPLACE FUNCTION get_series_of_numbers(p_input_string IN VARCHAR2)
RETURN VARCHAR2 IS
lo_start NUMBER;
lo_end NUMBER;
lo_final_string VARCHAR2(4000);
--a convenient method to get the series for a string like '1-3' or '9-15'
FUNCTION get_series(p_series_string VARCHAR2) RETURN VARCHAR2 IS
lo_string_to_return VARCHAR2(4000);
BEGIN
IF instr(p_series_string
,'-') > 0
THEN
lo_start := to_number(substr(p_series_string
,1
,instr(p_series_string
,'-') - 1));
lo_end := to_number(substr(p_series_string
,instr(p_series_string
,'-') + 1));
--query to generate a series of numbers between a start and end point and then concatenate all with ','
SELECT listagg(actual_numbers
,',') within GROUP(ORDER BY actual_numbers)
INTO lo_string_to_return
FROM (SELECT LEVEL actual_numbers FROM dual WHERE LEVEL >= lo_start CONNECT BY LEVEL <= lo_end);
ELSE
lo_string_to_return := p_series_string;
END IF;
RETURN lo_string_to_return;
END;
BEGIN
--this loop is to get all the elements in the string separated by ',' as column
--so that we can loop over all
FOR i IN (SELECT regexp_substr(str
,'[^,]+'
,1
,rownum) split
FROM (SELECT p_input_string str FROM dual)
CONNECT BY LEVEL <= length(regexp_replace(str
,'[^,]+')) + 1)
LOOP
IF lo_final_string IS NOT NULL
THEN
lo_final_string := lo_final_string || ',' || get_series(i.split);
ELSE
lo_final_string := get_series(i.split);
END IF;
END LOOP;
RETURN lo_final_string;
END get_series_of_numbers;
Some test results:
DECLARE
input_string VARCHAR2(4000) := '1,2,8-3,4';
result_string VARCHAR2(4000);
begin
dbms_output.put_line('Input string is: '||input_string);
result_string := get_series_of_numbers(p_input_string => input_string);
dbms_output.put_line('Output string is: '||result_string);
end;
/*
Input string is: 1-3,4,9,11-15
Output string is: 1,2,3,4,9,11,12,13,14,15
Input string is: 1-3,4-6,9-10,11-15
Output string is: 1,2,3,4,5,6,9,10,11,12,13,14,15
Input string is: 1,2,3,4
Output string is: 1,2,3,4
Input string is: 1,2,3-8,4
Output string is: 1,2,3,4,5,6,7,8,4
--a negative case
Input string is: 1,2,8-3,4
Output string is: 1,2,,4
*/
Hope it gives an idea about the requirement which could be further optimized to handle corner cases or anything which is not yet covered. Cheers!!
Consider this different approach, breaking the steps down using CTEs. See the comments within. One could combine some of these, but keeping the steps separated keeps it simpler. It could and should be made into a procedure or function for re-usability too.
This also handles NULL list elements, and numbers that are out of order will be sorted numerically in the output. Try with data like '1-3,17-20,4,9,,11-15'. Always expect the unexpected!
-- tbl_orig only creates a source for the original data
WITH tbl_orig(orig_str) AS (
SELECT '1-3,4,9,11-15' FROM dual
),
-- tbl_rows then contains that data split into rows on
-- the comma
tbl_rows(str_element) AS (
SELECT REGEXP_SUBSTR(orig_str, '(.*?)(,|$)', 1, LEVEL, NULL, 1)
FROM tbl_orig
CONNECT BY LEVEL <= REGEXP_COUNT(orig_str, ',')+1
),
-- Next look at those rows and if does not contain a hyphen just keep it,
-- else use an inline view to expand the range using listagg and regex's to
-- get the start and end of the range
tbl_expanded(str_expanded) AS (
SELECT
CASE INSTR(str_element, '-', 1)
WHEN 0 THEN str_element
ELSE (SELECT LISTAGG(n, ',') WITHIN GROUP (ORDER BY n)
FROM (SELECT ROWNUM n FROM dual CONNECT BY LEVEL <= REGEXP_SUBSTR(str_element, '\d+-(\d+)', 1, 1, NULL, 1))
WHERE n >= REGEXP_SUBSTR(str_element, '(\d+)-\d+', 1, 1, NULL, 1)
)
END AS str_expanded
FROM tbl_rows
)
-- Lastly put it all back together, but order by the numeric value of the first part of
-- the character strings.
SELECT LISTAGG(str_expanded, ',') WITHIN GROUP (ORDER BY TO_NUMBER(REGEXP_REPLACE(str_expanded, '(\d+).*', '\1')))
as fullrange
FROM tbl_expanded;
FULLRANGE
--------------------------------------------------------------------------------
1,2,3,4,9,11,12,13,14,15
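Both answers follow the same two steps: split the input on commas, then expand every element that contains a hyphen. As a language-neutral illustration, here is a minimal Python sketch of those steps. The function name expand_ranges is made up, the sketch assumes non-negative integers only, and it keeps the input order instead of sorting. Unlike the first PL/SQL function, a reversed range such as 8-3 contributes nothing at all rather than an empty element.

```python
def expand_ranges(spec):
    """Expand a list such as '1-3,4,9,11-15' into single numbers."""
    numbers = []
    for element in spec.split(","):
        element = element.strip()
        if not element:
            continue                      # skip empty elements such as '9,,11'
        if "-" in element:
            start, end = (int(part) for part in element.split("-", 1))
            numbers.extend(range(start, end + 1))  # empty when start > end
        else:
            numbers.append(int(element))
    return ",".join(str(n) for n in numbers)


print(expand_ranges("1-3,4,9,11-15"))  # 1,2,3,4,9,11,12,13,14,15
```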
I have an interesting problem and am wondering if Oracle has a built-in function to do this or whether I need to find a fast way to do it in PL/SQL.
Take 2 strings:
s1 = 'abc def hijk'
s2 = 'abc def iosk'
The function needs to return abc def because the strings are exactly the same up to that point.
Another example:
s1 = 'abc def hijk www'
s2 = 'abc def iosk www'
The function needs to return abc def.
The only way I can think of doing this is to loop through string 1 and compare each character, using substr(), against the corresponding character of string 2.
Just wondering if Oracle's got something built-in. Performance is pretty important.
After re-reading your question, here would be what you really wanted:
with cte1 as (
select 1 id, 'abc def hijk www' str from dual
union all
select 2 id, 'abc def iosk www' str from dual
), num_gen as (
-- a number generator up to the minimum length of the strings
SELECT level num
FROM dual t
CONNECT BY level <= (select min(length(str)) from cte1)
), cte2 as (
-- build substrings of increasing length
select id, num_gen.num, substr(cte1.str, 1, num_gen.num) sub
from cte1
cross join num_gen
), cte3 as (
-- self join to check if the substrings are equal
select x1.num, x1.sub sub1, x2.sub sub2
from cte2 x1
join cte2 x2 on (x1.num = x2.num and x1.id != x2.id)
), cte4 as (
-- select maximum string length
select max(num) max_num
from cte3
where sub1 = sub2
)
-- finally, get the substring with the max length
select cte3.sub1
from cte3
join cte4 on (cte4.max_num = cte3.num)
where rownum = 1
Essentially, this is what you would do in PL/SQL: build substrings of increasing length and stop at the point at which they no longer match.
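Written procedurally, that idea is just a loop that stops at the first mismatch. The sketch below shows it in Python for illustration; common_prefix_linear is a hypothetical name, and the loop compares one character at a time instead of rebuilding ever longer substrings.

```python
def common_prefix_linear(s1, s2):
    """Walk both strings in step and stop at the first differing character."""
    length = 0
    for a, b in zip(s1, s2):
        if a != b:
            break
        length += 1
    return s1[:length]


# repr() makes the trailing space of the result visible
print(repr(common_prefix_linear("abc def hijk www", "abc def iosk www")))
# 'abc def '
```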
I doubt that there is some built-in SQL function, but it can be done in SQL only using regular expressions:
with cte1 as (
select 1 id, 'abc def hijk www' str from dual
union all
select 2 id, 'abc def iosk www' str from dual
), cte2 as (
SELECT distinct id, trim(regexp_substr(str, '[^ ]+', 1, level)) str
FROM cte1 t
CONNECT BY instr(str, ' ', 1, level - 1) > 0
)
select distinct t1.str
from cte2 t1
join cte2 t2 on (t1.str = t2.str and t1.id != t2.id)
I haven't done any performance tests, but my experience tells me this is most likely faster than any pl/sql solution since you are totally avoiding context switches.
You should check the package UTL_MATCH for similar functionality, but to get exactly what you requested you must write your own function.
The binary search for the common substring length provides good performance for long strings.
create or replace function ident_pfx(str1 varchar2, str2 varchar2) return varchar2
as
len_beg PLS_INTEGER;
len_end PLS_INTEGER;
len_mid PLS_INTEGER;
len_result PLS_INTEGER;
begin
if str1 is null or str2 is null then return null; end if;
--
len_result := 0;
len_beg := 0;
len_end := least(length(str1),length(str2));
LOOP
BEGIN
-- use binary search for the common substring length
len_mid := ceil((len_beg + len_end) / 2);
IF (substr(str1,1,len_mid) = substr(str2,1,len_mid))
THEN
len_beg := len_mid; len_result := len_mid;
ELSE
len_end := len_mid;
END IF;
END;
IF (len_end - len_beg) <= 1 THEN
-- check last character
IF (substr(str1,1,len_end) = substr(str2,1,len_end))
THEN
len_result := len_end;
END IF;
EXIT ;
END IF;
END LOOP;
return substr(str1,1,len_result);
end;
/
select ident_pfx('abc def hijk www','abc def iosk www') ident_pfx from dual;
abc def
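The same binary search can be sketched in Python to make the loop invariant explicit. This is an illustration, not a port: common_prefix_bisect is a made-up name, and the upper bound is narrowed to mid - 1 on a mismatch, so no final check of the last character is needed.

```python
def common_prefix_bisect(s1, s2):
    """Binary search for the length of the longest common prefix."""
    lo, hi = 0, min(len(s1), len(s2))
    # Invariant: prefixes of length lo match; no prefix longer than hi matches.
    while lo < hi:
        mid = (lo + hi + 1) // 2      # round up so that mid > lo
        if s1[:mid] == s2[:mid]:
            lo = mid                  # the first mid characters agree
        else:
            hi = mid - 1              # a mismatch lies within the first mid
    return s1[:lo]


print(repr(common_prefix_bisect("abc def hijk www", "abc def iosk www")))
# 'abc def '
```

Note that the common prefix of the two example strings ends with a space, which is easy to miss in the SQL output.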
Another possible solution would be to use XOR.
If you XOR the two strings together, the result should have a NUL byte wherever the two strings match.
XOR is not a native operator, but I am pretty sure there is support for it in one of the libraries.
If "the performance is pretty important", you should avoid the "looping" on substrings.
Here is an alternative using XOR (as proposed by @EvilTeach).
with string_transform as (
select 'abc def hijk www' str1, 'abc def iosk www' str2 from dual
),
str as (
select
str1, str2,
-- add suffix to handle nulls and identical strings
-- calculate XOR
utl_raw.bit_xor(utl_raw.cast_to_raw(str1||'X'),utl_raw.cast_to_raw(str2||'Y')) str1_xor_str2
from string_transform
), str2 as (
select
str1, str2,
str1_xor_str2,
-- replace all non-identical characters (not 00) with 2D = '-'
utl_raw.translate(str1_xor_str2,
utl_raw.translate(str1_xor_str2,'00','01'),
utl_raw.copies('2D',length(str1_xor_str2))) xor1
from str
), str3 as (
select
str1, str2,
-- replace all identical characters (00) with 2B (= '+') and cast back to string
utl_raw.cast_to_varchar2(utl_raw.translate(xor1,'00','2B')) diff
-- diff = ++++++++---+++++ (+ means identical position; - difference)
from str2
)
select str1, str2,
-- remove the appended suffix character
substr(diff,1,length(diff)-1) diff,
-- calculate the length of the identical prefix
instr(diff,'-')-1 same_prf_length
from str3
;
Basically, both strings are first converted to RAW format. XOR sets the identical bytes (characters) to 00. With translate, the identical bytes are converted to '+' and all others to '-'.
The identical prefix length is the position of the first '-' in the string minus one.
Technically, a (different) suffix character is appended to each string to handle NULLs and identical strings.
Note that if the string is longer than 2000, some extra processing must be added due to the limitation of UTL_RAW.CAST_TO_VARCHAR2.
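The XOR idea is independent of UTL_RAW, and a compact Python sketch may help to see it at a glance. The name common_prefix_xor is hypothetical, and the strings are assumed to be encoded as UTF-8 bytes. The length of the run of leading zero bytes in the XOR result is the length of the common prefix in bytes.

```python
def common_prefix_xor(s1, s2):
    """XOR the two byte strings; leading zero bytes mark the common prefix."""
    b1, b2 = s1.encode("utf-8"), s2.encode("utf-8")
    xored = bytes(x ^ y for x, y in zip(b1, b2))   # 0 where the bytes agree
    same = len(xored) - len(xored.lstrip(b"\x00"))
    # errors="ignore" drops a multi-byte character cut in half at the boundary
    return b1[:same].decode("utf-8", errors="ignore")


print(repr(common_prefix_xor("abc def hijk www", "abc def iosk www")))
# 'abc def '
```

Since zip stops at the shorter input, no suffix characters are needed here to deal with identical strings or strings of different length.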