Replace Non-Alphanumeric Characters in Oracle - string

I am trying to convert a string in Oracle into a modified string that is compatible with a specific API.
I would like to leave all alphanumeric characters intact, replace all spaces with the + character, and replace all special characters with % plus their hex code.
For example,
Project 1: Nuts & Bolts
should become
Project+1%3A+Nuts+%26+Bolts
Is there any way to do this using only SQL?

I don't think you can get there with plain SQL without nested replace calls. You can get your sample value with the utl_url.escape() function, but because you have to pass it a second parameter and that is a boolean, you have to do it in an PL/SQL block:
set define off
begin
dbms_output.put_line(replace(utl_url.escape('Project 1: Nuts & Bolts', true),
'%20', '+'));
end;
/
Project+1%3A+Nuts+%26+Bolts
The url_utl.escape function converts the spaces to %20:
Project%201%3A%20Nuts%20%26%20Bolts
... and the single replace call converts those to +.
As Ed Gibbs said, you can make that a function so you can at least call it from plain SQL:
create or replace function my_escape(str in varchar2) return varchar2 is
begin
return replace(utl_url.escape(str, true), '%20', '+');
end;
/
set define off
select my_escape('Project 1: Nuts & Bolts') from dual;
MY_ESCAPE('PROJECT1:NUTS&BOLTS')
--------------------------------
Project+1%3A+Nuts+%26+Bolts
You probably need to check the legal and reserved character lists to see if there's anything else that needs special handling.
(I've used set define off to stop my client treating the ampersand as a substitution variable; your client or application might not need that, e.g. if calling over JDBC).

apex_util.url_encode should work.

Related

how do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on google, making notes in Excel along the way and inserting SAS-code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase superflous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in excel?
I tried using the STRIP-function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or if you are just writing that string to a file just use this PUT statement syntax.
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
& have.R_kom_lan = abswers.kom_lan_answer
;
I solved this by adding quotes in excel using the flash fill function:
https://www.youtube.com/watch?v=nE65QeDoepc

string parts seperated by ; to ASCII written in a new string

Something like that is coming in:
str="Hello;this;is;a;text"
What I do want as result is this:
result="72:101:108:108:111;116:104:105:115;..."
which should be the Text in ASCII.
You could use string matching to get each word separated by ; and then convert, concat:
local str = "Hello;this;is;a;text"
for word in str:gmatch("[^;]+") do
ascii = table.pack(word:byte(1, -1))
local converted = table.concat(ascii, ":")
print(converted)
end
The output of the above code is:
72:101:108:108:111
116:104:105:115
105:115
97
116:101:120:116
I'll leave the rest of work to you. Hint: use table.concat.
Here is another approach, which exploits that fact that gsub accepts a table where it reads replacements:
T={}
for c=0,255 do
T[string.char(c)]=c..":"
end
T[";"]=";"
str="Hello;this;is;a;text"
result=str:gsub(".",T):gsub(":;",";")
print(result)
Another possibility:
function convert(s)
return (s:gsub('.',function (s)
if s == ';' then return s end
return s:byte()..':'
end)
:gsub(':;',';')
:gsub(':$',''))
end
print(convert 'Hello;this;is;a;text')
Finding certain character or string (such as ";") can be done by using string.find - https://www.lua.org/pil/20.1.html
Converting character to its ASCII code can be done by string.byte - https://www.lua.org/pil/20.html
What you need to do is build a new string using two functions mentioned above. If you need more string-based functions please visit official Lua site: https://www.lua.org/pil/contents.html
Okay...I got way further, but I can't find how to return a string made up of two seperate strings like
str=str1&" "&str2

Display the specific part of the string in PostgreSQL 9.3

I have a string to modify as per the requirements.
For example:
The given string is:
str1 varchar = '123,456,789';
I want to show the string as:
'456,789'
Note: The first part (delimited) with comma, I want to remove from string and show the rest of string.
In SQL Server I used STUFF() function.
SELECT STUFF('123,456,789',1,4,'');
Result:
456,789
Question: Is there any string function in PostgreSQL 9.3 version to do the same job?
you can use regular expressions:
select substring('123,456,789' from ',(.*)$');
The comma matches the first comma found in the string. The part inside the brackets (.*) is returned from the function. The symbol $ means the end of the string.
A alternative solution without regular expressions:
select str, substring(str from position(',' in str)+1 for length(str)) from
(select '123,456,789'::text as str) as foo;
You could first turn the string to array and return second and third cell:
select array_to_string((regexp_split_to_array('123,456,789', ','))[2:3], ',')
Or you could use substring-function with regular expressions (pattern matching):
SELECT substring('123,456,789' from '[0-9]+,([0-9]+,[0-9]+)')
[0-9]+ means one or more digits
parentheses tell to return that part from the string
Both solutions work on your specific string.
Your The SQL Server example indicates you just want to remove the first 4 characters, which makes the rest of your question seem misleading because it completely ignores what's in the string. Only the positions matters.
Be that as it may, the simple and cheap way to cut off leading characters is with right():
SELECT right('123,456,789', -4);
SQL Fiddle.

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
RealSociedad
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?
You can use regexprep like you did in your post above to do this for you. Even though your post says to use sscanf and from the comments in your post, you'd like to see this done using regexprep. You would have to do this using two nested regexprep calls, and you can retrieve the team name (i.e. RealSociedad) like so, given that str is in the format that you have provided:
str = 'RealSociedad';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the beginning of the string starts with a a<. After this, the (.*)"> is performing a greedy evaluation. This is saying that we want to find the longest sequence of characters until we reach the characters of ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad
Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This will create a space between my desired string.

Oracle/SQL - Removing undefined chars from string

I currently have an assignemnt where i have to handle data from a lot of countries. My customer have given me a list of acceptable characters, lets call it:
'aber =*'
All other characters should just be changed to '_'.
I know the conversion for my country's specific chars (æøå), easily done with something like
select replace ('Ål', 'Å', 'AA') from dual;
But how would i go about removing all unwanted "noise" without splitting it up in char-by-char comparison?
For example "bear*2 = fear" should become "bear*_ = _ear" as 2 and f are not in the accepted list.
Oracle 10g and up. As one of the approaches, you can use regular expression function regexp_replace():
select regexp_replace('bear*2 = fear', '[^aber =*]', '_') as res
from dual
res
------------------------------
bear*_ = _ear
Find out more about regexp_replace() function.

Resources