PLSQL to modify VARCHAR2 column data - string

I am working on an app that involves evaluating modifications made to vehicles, and does some number crunching from figures stored in an Oracle 10g database. Unfortunately, I only have a text data in the database, yet I need to work with numbers and not text. I would like to know if anyone could help me with understanding how to perform string operations on VARCHAR2 column data in an Oracle 10g database with PLSQL:
For example: I need to take a VARCHAR2 column named TOP_SPEED in a table named CARS, parse the text data in its column to break it up into two new values, and insert these new values into two new NUMBER type columns in the CARS table, TOP_SPEED_KMH and TOP_SPEED_MPH.
The data in the TOP_SPEED column is as such: eg. "153 km/h (94.62 mph)"
I want to save the value of 153.00 into the TOP_SPEED_KMH column, and the 94.62 value into TOP_SPEED_MPH column.
I think what I have to do in a query/script is this:
select the text data in TOP_SPEED into a local text variable
modify the local text variable and save the new values into two number variables
write back the two number variables to the corresponding TOP_SPEED_KMH and TOP_SPEED_MPH columns
Could someone please confirm that I am on the right track? I would also really appreciate any example code if anyone has the time.
Cheers

I think it's a better idea to just have the top_speed_kmh column, and get rid of the mph one. As the number of kms in a mile never changes, you can simply multiply by 0.6 to convert to miles. So you can do the same update statement as N West suggested without the mph column:
UPDATE CARS SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(1, (INSTR(UPPER(TOP_SPEED), "KM/H") -1)));
And whenever you need the mph speed, just do
Select top_speed_kmh*0.6 as top_speed_mph from cars;

For the parsing bit, you would probably use either REGEXP_SUBSTR or INSTR with SUBSTR
Then use TO_NUMBER to convert to number
You can either create a PL/SQL function for each parsing, returning the number value, and run an UPDATE query on the fields, or you could create a PL/SQL procedure with a cursor looping over all the data that is to be updated.
Here are som links for some of the built-ins:
http://psoug.org/reference/substr_instr.html
http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions116.htm

You probably don't even need to do this with PL/SQL at all.
As long as the data in the column is consistent "99.99 km/h (99.99 m/h)" you could do this directly with SQL:
UPDATE CARS
SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(1, (INSTR(UPPER(TOP_SPEED), "KM/H") - 1))),
TOP_SPEED_MPH = <similar substr/instr combination to pull the 99.99 mph out of code>;
Set-operations are typically much faster than procedural operations.

I am working on an app that involves
evaluating modifications made to
vehicles, and does some number
crunching from figures stored in an
Oracle 10g database. Unfortunately, I
only have a text data in the database,
yet I need to work with numbers and
not text
Sounds like you should have some number columns to store these parsed out values. Instead of always calling some parsing routine (be it regexp or substr or a custom function), pass through all the data in the table(s) ONCE and populate the new number fields. You should also modify the ETL process to populate the new number fields moving forward.
If you need numbers and can parse them out, do it once (hopefully in a staging area or off hours at least) and then have the numbers you want. Now you're free to do arithmetic and everything else you'd expect from real numbers ;)

with s as
(select '153 km/h (94.62 mph)' ts from dual)
select
ts,
to_number(substr(ts, 1, instr(ts, ' ') -1)) speed_km,
to_number(substr(regexp_substr(ts, '\([0-9]+'), 2)) speed_mph
from s

Thanks everyone, it was nice to be able to use everyone's input to get the answer below:
UPDATE CARS
SET
CAR_TOP_SPEED_KPH =
to_number(substr(CAR_TOP_SPEED, 1, instr(UPPER(CAR_TOP_SPEED), ' KM/H') -1)),
CAR_TOP_SPEED_MPH =
to_number(substr(regexp_substr(CAR_TOP_SPEED, '\([0-9]+'), 2));

Related

Change Data layout in Excel

I have Excel data taken from vendor.
I want to change the format of layout so that it would be easier for me to ingest this data to database such as (Postgre / Mysql)
There are thousands of file in this format as one file contain information of only sales in one date.
Is there a better or fast way to convert this data or is creating VBA the only way?
We could definitely do this in VBA!
On the other hand, why bother when it can easily be accomplished with excel formulas?
My assumptions:
You're dealing with data of variable sizes (number of regions and items)
Cells you show as merged are merged on the actual data.
I set this up to work with a max data range of row 100, but easy to change that.
Step 1)
Identify how many regions and items we have:
=COUNTA($D$4:$K$4) (Regions)
=COUNTA($C$6:$C$100) (Items)
Step 2) List Items, repeating for number of regions.
Use local line numbers rather than worksheet row numbers to make this more portable.
To make it repeat, we're going to divide line# by total regions, roundup, then index items using that. Don't forget error handling to identify when to stop listing items. From now on, all the rest of our formulas will have a If( *item column* <> "", *show stuff* , *don't show stuff* ) . We can gloss over that in future steps.
=IFERROR(INDEX($C$6:$C$100,MATCH(ROUNDUP(M5/$P$1,0),$B$6:$B$100,0)),"")
Step 3) Date is super easy:
Just pull the same date for each: =IF($O5<>"",$C$2,"")
Step 4) We're going to use the Mod() function to create a cyclic action with our regions.
It took me a while to play with the formula... so don't ask me what all the adjustments(translation and scaling) are for, just trust it works. =IF($O5<>"",INDEX($D$4:$K$4,2*(MOD($M5-1,$P$1)+1)-1),"")
Step 5) "Quantity" and "Price" are the same formula, except "Price" has an extra "+1"
Quantity = =IFERROR(INDEX($D$6:$K$100,MATCH($O5,$C$6:$C$100,0),MATCH($P5,$D$4:$K$4,0)),"")
Price = =IFERROR(INDEX($D$6:$K$100,MATCH($O5,$C$6:$C$100,0),MATCH($P5,$D$4:$K$4,0)+1),"")
FINAL PRODUCT:

Converting grouped /nested data table to tabular form

I have been trying to covert tables that I get in the grouped format as in the picture to tabular form to make it pivotable. I have tried to use power query but not sure how to do it. I am not sure where to start and I appreciate your guidance as to where to start or how to make it
if you want to use powerquery, Assuming those are spaces, your best bet is probably to add some custom columns that check the number of leading spaces from column one to determine the level. For example,
= if Text.StartsWith([Column1], " ") then Text.Trim([Column1]) else null
then do something similar but with different leading space counts into separate columns, and then fill down to populate a table
You could alternately try to split the column on, lets say, 3 spaces, depending on what the indentions are. (0/3/6/9)

Extracting text in excel

I have some text which I receive daily that I need to seperate. I have hundreds of lines similar to the extract below:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
I need to extract individual snippets from this text, so for each in a seperate cell, I the result needs to be the date, month, company, size, and price. In the case, the result would be:
FEB50-40
APR
COMPANY A
100
0.40
The issue I'm struggling with is uniformity. For example one line might have FEB50-FEB40, another FEB5-FEB40, or FEB50-FEB4. Another example giving me difficult is that some rows might have 'COMPANY A' and the other 'COMPANYA' (one word instead of two).
Any ideas? I've been trying combinations of the below but I'm not able to have uniform results.
=TRIM(MID(SUBSTITUTE($D7," ",REPT(" ",LEN($D7))), (5)*LEN($D7)+1,LEN($D7)))
=MID($D7,20,21-10)
=TRIM(RIGHT(SUBSTITUTE($D6,"$",REPT("$",2)),4))
Sometimes I get
FEB40-50(' OR 'FEB40-FEB5'
when it should be
'FEB40-FEB50'`
Thank you to who is able to help.
You might get to the limits of formulas with this scenario, but with Power Query you can still work.
As I see it, you want to apply the following logic to extract text from this string:
COMMODITY PRICE DIFFERENTIAL: FEB50-FEB40 (APR): COMPANY A OFFERS 1000KB AT $0.40
text after the first : and before the first (
text between the brackets
text after the word OFFERS and before AT
text after 'AT`
These can be easily translated into several "Split" scenarios inside Power Query.
split by custom delimiter : - that's colon and space - for each ocurrence
remove first column
Split new first column by ( - that's space and bracket - for leftmost
Replace ) with nothing in second column
Split third column by delimiter OFFERS
split new fourth column by delimiter AT
The screenshot shows the input data and the result in the Power Query editor after renaming the columns and before loading the query into the worksheet.
Once you have loaded the query, you can add / remove data in the input table and simply refresh the query to get your results. No formulas, just clicking ribbon commands.
You can take this further by removing the "KB" from the column, convert it to a number, divide it by 100. Your business processing logic will drive what you want to do. Just take it one step at a time.

Separating values that are combined in one string

I would like to solve this either in Excel or in SPSS:
I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)
For SPSS:
First just creating some sample data to demonstrate on:
data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.
Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.
do repeat vr=diag1 to diag9/vl=1 to 9.
compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.
Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.
Assuming that the number of columns is fairly regular, I would suggest using text to columns, and then using COUNTIF on the cells if they are the value wanted. However there is a more robust and reproducible solution that would involve using SQL. If you download the free version of SQL Express here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads
Then you can import your table of data, here's how to do that: How to import an Excel file into SQL Server?
Then you could use the more friendly SQL database to get the answers you want. For example you can use a select statement that would say:
SELECT count(e_cerv_dis_state)
WHERE e_cerv_dis_state = '6'
It would also be possible to use a CASE WHEN statement to add-in the names of the diagnoses.

Taking means of irregular amounts data

I'm not able to take the means for a large dataset given that the amount of attributes is irregular.
I have posted a simplified case for the problem. It explains the problem very well.
An idea that I came up with: Make a filter to condition on a single attribute. However, still, I don't see a way to do this in an efficient way (other then doing it all by hand).
see excel file:
All help is much appreciated.
I'm basically looking for a function/method to achieve taking means of all different attributes conditioned on each person for a large dataset without doing it by hand.
You can use AVERAGEIFS() inside an IF:
=IF(OR(A2<>A1,B2<>B1),AVERAGEIFS(C:C,A:A,A2,B:B,B2),"")
the ifrst part of the if tests whether the row starts a new group either by the person or the attribute changing. Then it uses AVERAGEIFS() to return the correct average of that group. otherwise it returns a blank
What you want to do can be accomplished very simply with a pivot table.
Simply select one of the cells inside the range of data you want to process(See the video for general use of a pivot table https://www.youtube.com/watch?v=iCiayB6GrpQ )
go the insert tab and insert pivot table.
Once you have it, simply check people, attribute, and values. Then drag people and attribute into rows, drag valut into the values window, select the drop down list and change it from sum of value to average and you should be done. https://i.stack.imgur.com/nYEzw.png

Resources