Oracle conditionally adding spaces into data - string

I have a table that was given to me with some 'incorrect' data. The format of the data should be:
"000  00000"
There are TWO spaces. WHERE the spaces are can differ between records, so one record could look like the previous example and another like "00 00 0000". The problem is that some records came in with only a single space (for example "000 00000").
Ideally, I'd like to fix the already-loaded data with an UPDATE statement. If this is easier done outside of Oracle, that's fine; I can reload the data (there's a fair amount of it, almost 400,000 rows).
What would be the easiest way to find the single space and add another as needed, or leave it alone if there are already two spaces?
I am currently working on a query to split the string on the spaces, trim the pieces, then put it all back together with two spaces... it's not working out too well in testing.
Thanks in advance!

Here is a query to find the single-space records; wrap it in a CASE expression as needed.
WITH sample_data AS (
  SELECT '000  00000' value FROM dual UNION ALL
  SELECT '00 00 0000' value FROM dual UNION ALL
  SELECT '000 00000' value FROM dual
)
SELECT * FROM sample_data WHERE REGEXP_COUNT(value, '[[:space:]]') = 1
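Since the question allows fixing the data outside Oracle before reloading, the repair rule can also be sketched in Python. This is only a minimal sketch: it assumes that a value containing exactly one space is a damaged record whose second space was lost at that same position, and the name fix_spacing is hypothetical.

```python
def fix_spacing(value: str) -> str:
    """Double the space in values that contain exactly one space.

    Assumption: a correct value has two spaces, so a value with a single
    space lost one at that position. Every other value is left alone.
    """
    if value.count(" ") == 1:
        return value.replace(" ", "  ")
    return value


samples = ["000  00000", "00 00 0000", "000 00000"]
fixed = [fix_spacing(v) for v in samples]
```

Values that already hold two spaces, adjacent or not, pass through unchanged, so the function is safe to run over the whole extract more than once.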

Related

Extracting text in excel

I have some text which I receive daily that I need to separate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, each into a separate cell. The result needs to be the date, month, company, size, and price. In this case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example, one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another example giving me difficulty is that some rows might have 'COMPANY A' and others 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get
'FEB40-50(' or 'FEB40-FEB5'
when it should be
'FEB40-FEB50'
Thank you to anyone who is able to help.
You might hit the limits of formulas with this scenario, but Power Query can still handle it.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after AT
These can be easily translated into several "Split" scenarios inside Power Query.
Split by the custom delimiter ": " (colon and space) at each occurrence
Remove the first column
Split the new first column by " (" (space and opening bracket) at the leftmost occurrence
Replace ) with nothing in the second column
Split the third column by the delimiter OFFERS
Split the new fourth column by the delimiter AT
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the size column, converting it to a number, and dividing it by 100. Your business processing logic will drive what you want to do. Just take it one step at a time.
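For readers who would rather script the extraction than click through Power Query, the same delimiter logic can be approximated with one regular expression. This is a hedged sketch, not the Power Query solution itself: the names LINE_RE and parse_line and the field names are hypothetical, and the pattern assumes every line follows the shape of the sample (a prefix ending in a colon, a bracketed month, then OFFERS and AT). Unlike the split steps, it also drops the leading "$" from the price.

```python
import re

# Mirrors the split steps: text after the first colon, the bracketed month,
# the company before OFFERS, the size before AT, and the price after AT.
LINE_RE = re.compile(
    r"^[^:]*:\s*(?P<spread>.+?)\s*\((?P<month>[^)]*)\):\s*"
    r"(?P<company>.+?)\s+OFFERS\s+(?P<size>.+?)\s+AT\s+\$?(?P<price>\S+)\s*$"
)


def parse_line(line: str) -> dict:
    """Return the five fields of one line, or raise if the shape differs."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    return match.groupdict()


row = parse_line(
    "COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40"
)
```

Because the fields are located by their surrounding delimiters rather than by character position, variations such as FEB5-FEB40 or COMPANYA do not shift the results.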

LIKE clause in Sybase/SAP ASE trimmed at the end?

The emp table below has no ENAME ending in three spaces. However, the following SQL statement behaves like the clause is trimmed at the end (like a '%' pattern), because it returns all records:
select ENAME from dbo.emp where ENAME like '%   '
I tried many other database platforms (including SQL Server, SQL Anywhere, Oracle, PostgreSQL, MySQL etc.) and have seen this happen only in Sybase/SAP ASE (version 16). Is this a bug or is it "by design"? I found nothing specific about it in the online spec.
I'm looking for a generic fix: some simple transformation of the pattern that returns what any other platform would return, without knowing in advance what data type the field is or what kind of data it holds.
This is caused by the VARCHAR semantics in ASE, which always strip trailing spaces from a value before operating on it. This is applied to the string '%   ' before it is used, since that is a VARCHAR value by definition. This is indeed a peculiarity of ASE.
Now, you could try working around this by using the [ ] wildcard to match a space, but there are some things to be aware of. First, the column being matched (ENAME) must be CHAR, not VARCHAR, otherwise any trailing spaces will have been stripped before they were stored. Assuming the column is CHAR, using a pattern '%[ ][ ][ ]' unfortunately still does not appear to work. I think there may be some trailing-space stripping still happening here.
The best way to work around this is to use an artificial end-of-field delimiter which will not occur in the data, e.g.
ENAME||'~EOF~' like '%   ~EOF~'
This works. But note that the column ENAME must still be CHAR rather than VARCHAR.
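If the pattern is assembled in application code, the delimiter trick can be applied mechanically. The following is a minimal sketch, assuming the marker string never occurs in the data; the helper name sentinel_like is hypothetical, and it only builds the predicate text shown above, it does not talk to ASE.

```python
def sentinel_like(column: str, pattern: str, sentinel: str = "~EOF~") -> str:
    """Build a LIKE predicate that anchors the pattern with an end marker.

    Assumption: the sentinel never occurs in the column data. Single quotes
    in the pattern are doubled so the SQL string literal stays valid.
    """
    escaped = pattern.replace("'", "''")
    return f"{column}||'{sentinel}' like '{escaped}{sentinel}'"


predicate = sentinel_like("ENAME", "%   ")
```

The column name is interpolated as-is, so it should come from trusted code rather than from user input.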
The behavior of LIKE is somewhat documented here.
For VARCHAR columns this will never work, because ASE removes the trailing spaces.
For CHAR it depends on how you insert the data. If you insert 2 characters into a char(10) column, ASE pads them with 8 blank spaces to make 10, so when you query you will get this 2-character entry in the result set because it has more than 3 trailing spaces.
If this is not a problem for you, instead of LIKE you can use charindex(), which takes the trailing spaces into account and won't truncate them the way LIKE does, so you could write something like:
select ENAME from dbo.emp where charindex('   ', ENAME) > 0
Or you can calculate the number of trailing spaces, then check whether your 3 spaces come after that or not, like:
select a from A
where charindex('   ',a) > (len(a) - len(convert (varchar(10) , a)))
Now again, this will return more rows than expected if the data was inserted with varying amounts of padding, but it will work perfectly if you know exactly what to search for.
SELECT ENAME FROM dbo.emp WHERE RIGHT(ENAME, 3) = '   '

Redshift: Truncate VARCHAR value automatically on INSERT or maybe use max length?

When performing an INSERT, Redshift does not allow you to insert a string value that is longer/wider than the target field in the table. Observe:
CREATE TEMPORARY TABLE test (col VARCHAR(5));
-- result: 'Table test created'
INSERT INTO test VALUES('abcdefghijkl');
-- result: '[Amazon](500310) Invalid operation: value too long for type character varying(5);'
One workaround for this is to cast the value:
INSERT INTO test VALUES('abcdefghijkl'::VARCHAR(5));
-- result: 'INSERT INTO test successful, 1 row affected'
The annoying part about this is that now all of my code will have to have these cast statements on every INSERT for each VARCHAR field like this, or the application code will have to truncate the string before trying to construct the query; either way, it means that the column's width specification has to go into the application code, which is annoying.
Is there any better way of doing this with Redshift? It would be great if there was some option to just have the server truncate the string and perform the insert (and maybe raise a warning), the way MySQL does.
One thing I could do is just declare these particular fields as a very large VARCHAR, perhaps even 65535 (the maximum).
create table analytics.testShort (a varchar(3));
create table analytics.testLong (a varchar(4096));
create table analytics.testSuperLong (a varchar(65535));
insert into analytics.testShort values('abc');
insert into analytics.testLong values('abc');
insert into analytics.testSuperLong values('abc');
-- Redshift reports the size for each table is the same, 4 mb
The one disadvantage of this approach I have found is that it will cause bad performance if this column is used in a group by/join/etc:
https://discourse.looker.com/t/troubleshooting-redshift-performance-extensive-guide/326
(search for VARCHAR)
I am wondering, though, whether there is any other harm if you never plan to use this field in GROUP BY, JOIN, and the like.
Some things to note in my scenario: Yes, I really don't care about the extra characters that may be lost with truncation, and no, I don't have a way to enforce the length of the source text. I am capturing messages and URLs from external sources which generally fall into certain range in length of characters, but sometimes there are longer ones. It doesn't matter in our application if they get truncated or not in storage.
The only way to automatically truncate the strings to match the column width is to use the COPY command with the TRUNCATECOLUMNS option:
"Truncates data in columns to the appropriate number of characters so that it fits the column specification. Applies only to columns with a VARCHAR or CHAR data type, and rows 4 MB or less in size."
Otherwise, you will have to take care of the length of your strings using one of these two methods:
Explicitly CAST your values to the VARCHAR you want:
INSERT INTO test VALUES(CAST('abcdefghijkl' AS VARCHAR(5)));
Use the LEFT and RIGHT string functions to truncate your strings:
INSERT INTO test VALUES(LEFT('abcdefghijkl', 5));
Note: CAST should be your first option because it handles multi-byte characters properly. LEFT will truncate based on the number of characters not bytes and if you have a multi-byte character in your string, you might end up exceeding the limit of your column.
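If the truncation ends up in application code instead, the multi-byte caveat applies there too. The sketch below assumes the column width is counted in bytes of UTF-8, as Redshift does for VARCHAR; the helper name truncate_utf8 is hypothetical. It cuts at the byte limit and discards any multi-byte character that the cut would split.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Shorten text so its UTF-8 encoding fits in max_bytes."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The input is valid UTF-8, so only the tail of the slice can be a
    # partial character; errors="ignore" drops those leftover bytes.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


short = truncate_utf8("abcdefghijkl", 5)
```

This still requires the application to know the column width, which is the drawback raised in the question.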

Quick SQL question

Working on PostgreSQL.
I have a table with a column that contains values of the following format:
Set1/Set2/Set3/...
Seti can be a set of values for each i. They are delimited by '/'.
I would like to show distinct entries of the form Set1/Set2; that is, I would like to trim or truncate the rest of the string in those entries.
That is, I want all distinct options for:
Set1/Set2
A regular expression would work great: I want a substring of the pattern: .*/.*/
to be displayed without the rest of it.
I got as far as:
select distinct column_name from table_name
but I have no idea how to do the trimming itself.
Tried looking in w3schools and other sites as well as searching SQL trim / SQL truncate in google but didn't find what I'm looking for.
Thanks in advance.
mu is too short's answer is fine if the lengths of the strings between the forward slashes are always consistent. Otherwise you'll want to use a regex with the substring function.
For example:
=> select substring('Set1/Set2/Set3/' from '^[^/]+/[^/]+');
substring
-----------
Set1/Set2
(1 row)
=> select substring('Set123/Set24/Set3/' from '^[^/]+/[^/]+');
substring
--------------
Set123/Set24
(1 row)
So your query on the table would become:
select distinct substring(column_name from '^[^/]+/[^/]+') from table_name;
The relevant docs are http://www.postgresql.org/docs/8.4/static/functions-string.html
and http://www.postgresql.org/docs/8.4/static/functions-matching.html.
Why do you store multiple values in a single record? The preferred solution would be multiple values in multiple records; then your problem would no longer exist.
Another option would be the usage of an array of values, using the TEXT[] array-datatype instead of TEXT. You can index an array field using the GIN-index.
SUBSTRING() (like mu_is_too_short showed you) can solve the current problem, using an array and the array functions is another option:
SELECT array_to_string(
(string_to_array('Set1/Set2/Set3/', '/'))[1:2], '/' );
This makes it rather flexible, there is no need for a fixed length of the values. The separator in the array functions will do the job. The [1:2] will pick the first 2 slices of the array, using [1:3] would pick slices 1 to 3. This makes it easy to change.
If they really are that regular you could use substring; for example:
=> select substring('Set1/Set2/Set3/' from 1 for 9);
substring
-----------
Set1/Set2
(1 row)
There is also a version of substring that understands POSIX regular expressions if you need a little more flexibility.
The PostgreSQL online documentation is quite good BTW:
http://www.postgresql.org/docs/current/static/index.html
and it even has a usable index and sensible navigation.
If you want to use .*/.* then you'd want something like substring(s from '[^/]+/[^/]+'), for example:
=> select substring('where/is/pancakes/house?' from '[^/]+/[^/]+');
substring
-----------
where/is
(1 row)
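To compare the two approaches from the answers side by side outside the database, here is a small Python sketch. It is illustrative only: the function names are hypothetical, the regex is the same '^[^/]+/[^/]+' pattern, and the split-and-join helper plays the role of string_to_array plus array_to_string.

```python
import re
from typing import Optional

PREFIX_RE = re.compile(r"^[^/]+/[^/]+")


def first_two_regex(value: str) -> Optional[str]:
    """Regex approach: match the first two slash-separated parts."""
    match = PREFIX_RE.match(value)
    return match.group(0) if match else None


def first_two_split(value: str, count: int = 2) -> str:
    """Array approach: split on '/', keep the first parts, join again."""
    return "/".join(value.split("/")[:count])


values = ["Set1/Set2/Set3/", "Set123/Set24/Set3/", "Set1/Set2/Other/"]
distinct_prefixes = sorted({first_two_regex(v) for v in values})
```

The two helpers differ on input with fewer than two parts: the regex version returns None, while the split version returns whatever is there.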

PLSQL to modify VARCHAR2 column data

I am working on an app that involves evaluating modifications made to vehicles, and does some number crunching from figures stored in an Oracle 10g database. Unfortunately, I only have text data in the database, yet I need to work with numbers and not text. I would like to know if anyone could help me understand how to perform string operations on VARCHAR2 column data in an Oracle 10g database with PL/SQL.
For example: I need to take a VARCHAR2 column named TOP_SPEED in a table named CARS, parse the text data in its column to break it up into two new values, and insert these new values into two new NUMBER type columns in the CARS table, TOP_SPEED_KMH and TOP_SPEED_MPH.
The data in the TOP_SPEED column is as such: eg. "153 km/h (94.62 mph)"
I want to save the value of 153.00 into the TOP_SPEED_KMH column, and the 94.62 value into TOP_SPEED_MPH column.
I think what I have to do in a query/script is this:
select the text data in TOP_SPEED into a local text variable
modify the local text variable and save the new values into two number variables
write back the two number variables to the corresponding TOP_SPEED_KMH and TOP_SPEED_MPH columns
Could someone please confirm that I am on the right track? I would also really appreciate any example code if anyone has the time.
Cheers
I think it's a better idea to just have the top_speed_kmh column, and get rid of the mph one. As the number of kilometres in a mile never changes, you can simply multiply by about 0.621371 to convert to miles per hour. So you can do the same update statement as N West suggested without the mph column:
UPDATE CARS SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(TOP_SPEED, 1, INSTR(UPPER(TOP_SPEED), 'KM/H') - 1));
And whenever you need the mph speed, just do
SELECT top_speed_kmh * 0.621371 AS top_speed_mph FROM cars;
For the parsing bit, you would probably use either REGEXP_SUBSTR or INSTR with SUBSTR
Then use TO_NUMBER to convert to number
You can either create a PL/SQL function for each parsing, returning the number value, and run an UPDATE query on the fields, or you could create a PL/SQL procedure with a cursor looping over all the data that is to be updated.
Here are some links for some of the built-ins:
http://psoug.org/reference/substr_instr.html
http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions116.htm
You probably don't even need to do this with PL/SQL at all.
As long as the data in the column is consistent "99.99 km/h (99.99 m/h)" you could do this directly with SQL:
UPDATE CARS
SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(TOP_SPEED, 1, INSTR(UPPER(TOP_SPEED), 'KM/H') - 1)),
TOP_SPEED_MPH = <similar substr/instr combination to pull the 99.99 mph out of code>;
Set-operations are typically much faster than procedural operations.
"I am working on an app that involves evaluating modifications made to vehicles, and does some number crunching from figures stored in an Oracle 10g database. Unfortunately, I only have a text data in the database, yet I need to work with numbers and not text"
Sounds like you should have some number columns to store these parsed out values. Instead of always calling some parsing routine (be it regexp or substr or a custom function), pass through all the data in the table(s) ONCE and populate the new number fields. You should also modify the ETL process to populate the new number fields moving forward.
If you need numbers and can parse them out, do it once (hopefully in a staging area or off hours at least) and then have the numbers you want. Now you're free to do arithmetic and everything else you'd expect from real numbers ;)
with s as
(select '153 km/h (94.62 mph)' ts from dual)
select
ts,
to_number(substr(ts, 1, instr(ts, ' ') -1)) speed_km,
to_number(substr(regexp_substr(ts, '\([0-9.]+'), 2)) speed_mph
from s
Thanks everyone, it was nice to be able to use everyone's input to get the answer below:
UPDATE CARS
SET
CAR_TOP_SPEED_KPH =
to_number(substr(CAR_TOP_SPEED, 1, instr(UPPER(CAR_TOP_SPEED), ' KM/H') -1)),
CAR_TOP_SPEED_MPH =
to_number(substr(regexp_substr(CAR_TOP_SPEED, '\([0-9.]+'), 2));
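To sanity-check the parsing rule before running it over the table, the same extraction can be sketched in Python. This assumes every value looks like "153 km/h (94.62 mph)"; the names SPEED_RE and parse_top_speed are hypothetical. Note that the character class for the mph figure has to admit the decimal point, otherwise only the integer part is captured.

```python
import re

# One number before "km/h", one number inside the brackets before "mph".
SPEED_RE = re.compile(
    r"([0-9]+(?:\.[0-9]+)?)\s*km/h\s*\(([0-9]+(?:\.[0-9]+)?)\s*mph\)",
    re.IGNORECASE,
)


def parse_top_speed(text: str) -> tuple:
    """Return (km/h, mph) as floats, or raise if the text has another shape."""
    match = SPEED_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised top speed: {text!r}")
    return float(match.group(1)), float(match.group(2))


kmh, mph = parse_top_speed("153 km/h (94.62 mph)")
```

Raising on unexpected input makes it easy to list the rows that need manual attention before the UPDATE is run.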
