Oracle/SQL - Removing undefined chars from string

I currently have an assignment where I have to handle data from a lot of countries. My customer has given me a list of acceptable characters; let's call it:
'aber =*'
All other characters should just be changed to '_'.
I know the conversion for my country's specific chars (æøå), easily done with something like
select replace ('Ål', 'Å', 'AA') from dual;
But how would I go about removing all unwanted "noise" without splitting it up into a char-by-char comparison?
For example "bear*2 = fear" should become "bear*_ = _ear" as 2 and f are not in the accepted list.

This works in Oracle 10g and up. One approach is to use the regular expression function regexp_replace() with a negated character class:
select regexp_replace('bear*2 = fear', '[^aber =*]', '_') as res
from dual
res
------------------------------
bear*_ = _ear
Find out more about regexp_replace() function.
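For comparison, the same negated-character-class idea can be sketched outside the database. This is only an illustration in Python, not part of the Oracle answer; the function name mask_disallowed and its parameters are made up for the example.

```python
import re

def mask_disallowed(text, allowed="aber =*", filler="_"):
    # Build a negated character class from the allowed characters.
    # re.escape keeps characters such as '*' literal inside the class.
    pattern = "[^" + re.escape(allowed) + "]"
    # Every character that is NOT in the allowed list becomes the filler.
    return re.sub(pattern, filler, text)

print(mask_disallowed("bear*2 = fear"))  # prints bear*_ = _ear
```

As in the regexp_replace() call, the whole string is handled in one pass, with no char-by-char comparison in your own code.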

Related

how do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on Google, making notes in Excel along the way and inserting SAS code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase superfluous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in Excel?
I tried using the STRIP-function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or, if you are just writing that string to a file, use this PUT statement syntax:
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
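If the statements end up being generated outside SAS, the same trim-then-quote idea can be sketched in Python. This is a rough illustration only; sas_quote and make_rule are hypothetical helper names, and the doubling of embedded double quotes is meant to mimic what quote() does.

```python
def sas_quote(value):
    # Strip surrounding blanks, double any embedded double quotes,
    # then wrap the result in double quotes.
    return '"' + value.strip().replace('"', '""') + '"'

def make_rule(res_ort, kom_lan, kommun):
    # Assemble one SAS IF statement from three free-text values.
    return ("if R_res_ort_txt=" + sas_quote(res_ort)
            + " and R_kom_lan=" + sas_quote(kom_lan)
            + " then R_kommun=" + sas_quote(kommun) + ";")

print(make_rule(" arild ", " skåne ", " Höganäs "))
```

The point is the same as in the SAS code: the blanks are removed from the data before the quotes are added, so they never end up inside the string literals.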
A saner solution would be to keep the free-text answers as data and apply your matching criteria with a left join instead of generating IF statements.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
and have.R_kom_lan = answers.kom_lan_answer
;
quit;
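To see the lookup-by-join idea in isolation, here is a small sketch using Python's built-in sqlite3 module rather than SAS. The table and column names follow the query above, the sample rows are invented, and the trim() calls are an assumption about where the stray blanks sit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table have (R_res_ort_txt text, R_kom_lan text)")
conn.execute("create table answers "
             "(res_ort_answer text, kom_lan_answer text, R_kommun_answer text)")
# Invented sample data: one row with stray blanks, one row with no answer.
conn.executemany("insert into have values (?, ?)",
                 [(" arild ", " skåne "), ("okänd", "okänd")])
conn.execute("insert into answers values ('arild', 'skåne', 'Höganäs')")

rows = conn.execute("""
    select have.R_res_ort_txt, answers.R_kommun_answer as R_kommun
    from have
    left join answers
      on trim(have.R_res_ort_txt) = answers.res_ort_answer
     and trim(have.R_kom_lan) = answers.kom_lan_answer
    order by have.rowid
""").fetchall()
print(rows)
```

Rows without a matching answer stay in the result with a missing R_kommun, which is what makes the left join a safe replacement for a long if/else chain.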
I solved this by adding the quotes in Excel using the Flash Fill function:
https://www.youtube.com/watch?v=nE65QeDoepc

SQLite - Left-pad zeros in returned Text field

I have a text field in my SQLite database that stores a Time value, but for unrelated reasons I can't change the data type to TIME.
The values are stored in HH:MM format, and I'm having trouble trying to sort results by time because the values below '10:00' are missing a leading zero. I would prefer not to store the data with leading zero for the same unrelated reasons.
I'd like to add something to the Query that would pad the missing character if necessary, causing the results to read '08:30' when collected. I've been searching through the command and function lexicon though and I'm not finding what I need.
Is there a simple way to do this inside a query?
Thanks
I think this would work:
select your_col, case when length(your_col) < 5
then '0' || your_col else your_col end from your_table
Demo using Python
>>> conn.execute('''select c, case when length(c) < 5
then '0' || c else c end from t''').fetchall()
[(u'10:00', u'10:00'), (u'8:00', u'08:00')]
SELECT REPLACE(PRINTF('%5s', your_col), ' ', '0') FROM your_table
The PRINTF call pads the value with spaces until it's 5 characters, and the
REPLACE call replaces those spaces with zeros.
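Since the original problem was sorting, the padded expression can also go into ORDER BY. Below is a minimal sketch with Python's sqlite3 module, assuming a made-up table t with a text column c, as in the demo above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (c text)")
conn.executemany("insert into t values (?)",
                 [("10:00",), ("8:30",), ("9:15",)])

# Same CASE expression as above, reused for both display and sorting.
padded = "case when length(c) < 5 then '0' || c else c end"
rows = conn.execute(
    "select c, " + padded + " from t order by " + padded
).fetchall()
print(rows)
```

The stored values stay unpadded; only the query result and the sort key carry the leading zero.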

Display the specific part of the string in PostgreSQL 9.3

I have a string to modify as per the requirements.
For example:
The given string is:
str1 varchar = '123,456,789';
I want to show the string as:
'456,789'
Note: The first part (delimited) with comma, I want to remove from string and show the rest of string.
In SQL Server I used STUFF() function.
SELECT STUFF('123,456,789',1,4,'');
Result:
456,789
Question: Is there any string function in PostgreSQL 9.3 version to do the same job?
You can use regular expressions:
select substring('123,456,789' from ',(.*)$');
The comma matches the first comma found in the string. The part inside the parentheses, (.*), is what the function returns. The symbol $ means the end of the string.
An alternative solution without regular expressions:
select str, substring(str from position(',' in str)+1 for length(str)) from
(select '123,456,789'::text as str) as foo;
You could first turn the string to array and return second and third cell:
select array_to_string((regexp_split_to_array('123,456,789', ','))[2:3], ',')
Or you could use substring-function with regular expressions (pattern matching):
SELECT substring('123,456,789' from '[0-9]+,([0-9]+,[0-9]+)')
[0-9]+ means one or more digits
parentheses tell to return that part from the string
Both solutions work on your specific string.
Your SQL Server example indicates you just want to remove the first 4 characters, which makes the rest of your question seem misleading because it completely ignores what's in the string. Only the position matters.
Be that as it may, the simple and cheap way to cut off leading characters is with right():
SELECT right('123,456,789', -4);
SQL Fiddle.
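The difference between "everything after the first comma" and "drop the first 4 characters" only shows up when the first field is not three characters long. Here is a small Python sketch of both readings; the function names are invented for the illustration.

```python
def after_first_comma(text):
    # partition splits at the first comma only; the last element is the rest.
    # In this sketch, a string without a comma yields an empty string.
    head, sep, rest = text.partition(",")
    return rest

def drop_first_four(text):
    # Position-based: ignores what the first four characters actually are.
    return text[4:]

print(after_first_comma("123,456,789"), drop_first_four("123,456,789"))
print(after_first_comma("12,345,678"), drop_first_four("12,345,678"))
```

On '123,456,789' both give the same result; on '12,345,678' the position-based version cuts into the second field, so pick the one that matches what your data really looks like.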

Reading from a string using sscanf in Matlab

I'm trying to read a string in a specific format
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>
this is one example of string and what I want to extract is the name of the team.
I've tried something like this,
houseteam = sscanf(str, '%s');
but it does not work, why?
You can use regexprep to do this. Your post asks about sscanf, but judging from the comments you would also like to see it done with regexprep. It takes two nested regexprep calls to retrieve the team name (i.e. RealSociedad), given that str is in the format you have provided:
str = '<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>';
houseteam = regexprep(regexprep(str, '^<a(.*)">', ''), '</a>$', '')
This looks very intimidating, but let's break this up. First, look at this statement:
regexprep(str, '^<a(.*)">', '')
How regexprep works is you specify the string you want to analyze, the pattern you are searching for, then what you want to replace this pattern with. The pattern we are looking for is:
^<a(.*)">
This says you are looking for patterns where the string starts with <a. After this, (.*)"> performs a greedy match: we want the longest sequence of characters up to and including the characters ">. As such, what the regular expression will match is the following string:
<ahref="/teams/spain/real-sociedad-de-futbol/2028/">
We then replace this with a blank string. As such, the output of the first regexprep call will be this:
RealSociedad</a>
We want to get rid of the </a> string, and so we would make another regexprep call where we look for the </a> at the end of the string, then replace this with the blank string yet again. The pattern you are looking for is thus:
</a>$
The dollar sign ($) symbolizes that this pattern should appear at the end of the string. If we find such a pattern, we will replace it with the blank string. Therefore, what we get in the end is:
RealSociedad
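The two stripping steps can be checked outside MATLAB too. Here is a minimal Python sketch using the same two patterns; the input string is pieced together from the matched tag and the intermediate result shown above, so treat it as an assumption about the exact input.

```python
import re

# Assumed input, reconstructed from the pieces shown in this answer.
s = '<ahref="/teams/spain/real-sociedad-de-futbol/2028/">RealSociedad</a>'

# Step 1: remove the opening tag anchored at the start of the string.
without_open = re.sub(r'^<a(.*)">', '', s)
# Step 2: remove the closing tag anchored at the end of the string.
team = re.sub(r'</a>$', '', without_open)

print(without_open)
print(team)
```

Note that the greedy (.*) is only safe here because the string contains a single "> sequence; with several tags on one line it would swallow everything up to the last one.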
Found a solution. So, %s stops when it finds a space.
str = regexprep(str, '<', ' <');
str = regexprep(str, '>', '> ');
houseteam = sscanf(str, '%*s %s %*s');
This puts spaces around my desired string, so sscanf can skip the two tags with %*s and read the name with %s.

C# 4.0 function to check for first four characters in the string

I need to check whether a code name is valid.
So, my string can have values like below:
String test = "C000. ", "C010. ", "C020. ", "C030. ", "CA00. ","C0B0. ","C00C. "
So my function needs to validate the conditions below:
It should start with C
The next 3 characters, before the ., should be numeric
The rest can be anything.
So in above string values, only ["C000.", "C010.", "C020.", "C030."] are valid ones.
EDIT:
Below is the code I tried:
if (nameObject.Title.StartsWith(String.Format("^[C][0-9]{3}$",nameObject.Title)))
I'd suggest a regex, for example (written off the top of my head, may need work):
string s = "C030.";
Regex reg = new Regex("^C[0-9]{3}\\."); // ^ anchors the match at the start of the string
bool isMatch = reg.IsMatch(s);
This regex should do the trick:
Regex.IsMatch(input, @"^C[0-9]{3}\..*")
Check out http://www.techotopia.com/index.php/Working_with_Strings_in_C_Sharp
for a quick tutorial on (among other things) individual access of string elements, so you can test each element for your criteria.
If you think your criteria may change, using regular expressions gives you maximum flexibility (but is more runtime intensive than regular string-element evaluation). In your case, it may be overkill, IMHO.
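To try the rule against the sample values from the question without compiling anything, here is a quick Python sketch of the same check. The names CODE_RE and is_valid_code are made up for the example, and re.match is used because it only matches at the start of the string.

```python
import re

# 'C', exactly three ASCII digits, then a literal dot; anything may follow.
CODE_RE = re.compile(r"C[0-9]{3}\.")

def is_valid_code(title):
    # re.match anchors at the beginning, so "XC000." is rejected.
    return CODE_RE.match(title) is not None

samples = ["C000. ", "C010. ", "C020. ", "C030. ", "CA00. ", "C0B0. ", "C00C. "]
valid = [s for s in samples if is_valid_code(s)]
print(valid)
```

Only the first four samples pass, which matches the expected result in the question.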
