Teradata String Manipulation

I have taken a good look at the Teradata syntax reference, to no avail.
I have some rows with numbers:
ID
Mickey
Laura9
Larry59N
How do I strip the digits from these values?
I understand that SUBSTR(id, 0, index(id, '%FORMAT%')) would work, but I don't know what I could enter in the %FORMAT% area to match only digits.

You can use oTranslate to remove numbers:
BTEQ -- Enter your SQL request or BTEQ command:
Select the_name, oTranslate( the_name, 'a0123456789','a')
from
( SELECT 'Larry59N' the_name FROM ( SELECT 'X' DUMMY ) a
UNION ALL
SELECT 'Laura9' FROM ( SELECT 'X' DUMMY ) b
UNION ALL
SELECT 'Mickey' the_name FROM ( SELECT 'X' DUMMY ) c
) d
;
*** Query completed. 3 rows found. 2 columns returned.
*** Total elapsed time was 1 second.
the_name oTranslate(the_name,'a0123456789','a')
-------- -----------------------------------------------------
Larry59N LarryN
Laura9 Laura
Mickey Mickey
HTH.
Cheers.

Unfortunately, I don't believe there is a function native to Teradata that will accomplish this. I would suggest looking at the UDFs posted on the Teradata Developer Exchange (link). The function eReplaceChar in particular looks like it may help you accomplish what you are trying to do with this data. The UDFs found at that link were published under the Apache 2.0 license, so you should not have any problems using them.
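If you are on Teradata 14.0 or later, the built-in regular-expression functions are another option. A minimal sketch, assuming a hypothetical table my_table holding the id column above:

-- Strip every digit from id (assumes Teradata 14.0+ for REGEXP_REPLACE)
SELECT id,
       REGEXP_REPLACE(id, '[0-9]', '') AS id_without_digits
FROM my_table;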

Related

How to select and drop empty rows in ADX (KQL)

I cannot find any documentation on how to view and drop fully empty rows from an ADX source table.
Using where field == "" doesn't return what we are looking for, and the Microsoft docs don't provide much insight. Does anyone know a way to filter these rows out of the ingestion source table in the first place, or to run a cleaning function that automatically drops them?
Thanks in advance!
Prefer filtering during ingestion over deleting afterward.
Demo
.create table t(Id:int, txt:string, val:real)
.ingest inline into table t <|
1,Hello,2.3
,,
,,
2,World,4.5
3,,null
,,
t
Id   txt     val
1    Hello   2.3
2    World   4.5
3
(plus the three all-empty rows, which render as blanks)
Naive checking
t
| where not
(
isnull(Id)
and isempty(txt) // string is never null
and isnull(val)
)
or
t
| where isnotnull(Id)
or isnotempty(txt)
or isnotnull(val)
Shortcut
pack_all(true) packs all columns into a property bag while ignoring null and empty values, so a fully empty row serializes to "{}":
t
| where tostring(pack_all(true)) != "{}"
Id   txt     val
1    Hello   2.3
2    World   4.5
3

Find the column in Subquery coalesce function

I am using the Coalesce function to return a value from my preferred ranking of columns, but I also want to include the name of the column that the value was derived from.
i.e.
Table:
Apples  Pears  Mangos
        4      5
SQL
;with CTE as
(
    select Coalesce(Apples, Pears, Mangos) as QTY_Fruit
    from Table
)
select QTY_Fruit, <column name> from CTE
Result:
QTY_Fruit  Col Name
4          Pears
I am trying to avoid a CASE statement if possible because there are about 12 fields that I will need to use in my Coalesce. I would love an easy way to pull the column name based on the value in QTY_Fruit. I'm all ears if the answer lies outside the use of subqueries, but I figured this would be a start.
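One way to get both the value and its source column without a CASE statement is to unpivot the columns with CROSS APPLY (VALUES ...). A minimal sketch, assuming SQL Server and a hypothetical table name FruitTable; the pri column encodes the same precedence as Coalesce(Apples, Pears, Mangos):

select ca.QTY_Fruit, ca.ColName
from FruitTable t
cross apply (
    select top (1) v.val, v.col
    from (values (1, t.Apples, 'Apples'),
                 (2, t.Pears,  'Pears'),
                 (3, t.Mangos, 'Mangos')) as v(pri, val, col)
    where v.val is not null   -- first non-null wins, as with COALESCE
    order by v.pri
) as ca(QTY_Fruit, ColName);

Extending the VALUES list to all 12 columns keeps this linear instead of a 12-branch CASE.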

I want to create a computed column based off a substring of another column in SQL

I have a column called TAG_
Data in the TAG_ column could look like the below:
STV-123456
TV-12456
ME-666666
I want to create two computed columns
One that shows the first part of TAG_ before the hyphen
STV
TV
ME
One that shows the second part of TAG_ after the hyphen
123456
12456
666666
This shouldn't be hard but the light bulb is not on yet. Please help.
try this:
SELECT SUBSTRING(TAG_, 1, CHARINDEX('-', TAG_) - 1) AS before,
       SUBSTRING(TAG_, CHARINDEX('-', TAG_) + 1, LEN(TAG_)) AS after
FROM testtable
and the result is each tag split into its prefix and its number.
Hope this helps!
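Since the question asks for computed columns rather than a plain SELECT, here is a minimal sketch for SQL Server; appending '-' inside CHARINDEX is a guard (my addition, not from the answer above) so rows without a hyphen don't raise an error:

-- Two computed columns derived from TAG_ (assumes SQL Server)
ALTER TABLE testtable
ADD tag_prefix AS SUBSTRING(TAG_, 1, CHARINDEX('-', TAG_ + '-') - 1),
    tag_suffix AS SUBSTRING(TAG_, CHARINDEX('-', TAG_ + '-') + 1, LEN(TAG_));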
Example for MySQL; the syntax is likely different for other vendors:
create table t
( tag_ text not null
, fst text generated always as (substr(tag_, 1, locate('-', tag_)-1)) stored
, snd text generated always as (substr(tag_, locate('-', tag_)+1)) stored
);
Fiddle

Correct way to get the last value for a field in Apache Spark or Databricks using SQL (correct behavior of last and last_value)?

What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like they should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts linked below discuss how to get max values, which is not the same as the last value in a list ordered by a different field: the player may have joined the Reds, the A's, the Zebras, and the Yankees, in that order timewise, and I'm looking for the Yankees. Those posts also reach the solution procedurally in Python/R; I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
You can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
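For context: in Spark, the ORDER BY in a subquery is not guaranteed to survive the shuffle that GROUP BY introduces, so last()/last_value() inside an aggregation picks an arbitrary row, which is why the counts changed between runs. As a window function with an explicit ordering and frame, last is deterministic. A sketch under the same my_table assumptions:

select distinct
       patient_id,
       org_patient_id,
       -- deterministic: ordering and frame are explicit
       last(data_lot) over (
           partition by patient_id, org_patient_id
           order by data_lot
           rows between unbounded preceding and unbounded following
       ) as data_lot
from my_table
where org = 'my_org';

-- Alternative with plain aggregation (Spark 2.4+):
-- sort the collected values and take the last element.
select patient_id,
       org_patient_id,
       element_at(sort_array(collect_list(data_lot)), -1) as data_lot
from my_table
where org = 'my_org'
group by patient_id, org_patient_id;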

How to efficiently fetch an earlier row that is unique through two attributes?

Previously, I asked how to fetch a simple previous row through an incremented ID field (thank you, Petr Havlík). In this case I have ID and ACTIVITY, where (ACTIVITY & ID) is the unique value per row.
From an SQL perspective I just do an inner join where ACTIVITY = Joined ACTIVITY and ID = ID - 1 in the joined table and get the row I need.
In other words, I want the previous percentage belonging to the same activity.
So, using the answer in the previous post, I was able to get the result I want on 1,000 rows. However, if I increase the number of rows to 85,000+, this function is dauntingly slow.
=SUMX(FILTER ( Query, (EARLIER ( [ID] ) = [ID] + 1)&&(EARLIER([ACTIVITY])=[ACTIVITY])),[PERCENTAGE])
My end goal is to run this function on up to 7 million rows. If this is possible, how can I optimize it? And if it isn't, could you explain why?
One option could be to try a variation on the approach. Without your dataset I can't test whether it is more efficient, but I've run similar things on 1m+ row datasets without issue:
=
CALCULATE (
SUM ( [PERCENTAGE] ),
FILTER (
Query,
[ID] = EARLIER ( [ID] ) - 1
&& [ACTIVITY] = EARLIER ( [ACTIVITY] )
)
)
Probably not what you want to hear, but doing this with SQL on import is probably your best bet.
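As a sketch of what that on-import SQL could look like, assuming a source table named Query with ID, ACTIVITY, and PERCENTAGE columns, and a database that supports window functions:

SELECT ID,
       ACTIVITY,
       PERCENTAGE,
       -- previous row's percentage within the same activity;
       -- assumes "previous" means the next-lower ID for that activity
       LAG(PERCENTAGE) OVER (PARTITION BY ACTIVITY ORDER BY ID) AS PREV_PERCENTAGE
FROM Query;

Materializing PREV_PERCENTAGE before import removes the need for the row-by-row EARLIER() scan entirely.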
The best answer here would be LOOKUPVALUE, which bypasses the filters you would otherwise need and does a direct lookup of values in the table. This would be much faster.
It would look something like:
=LOOKUPVALUE ( table[PERCENTAGE], table[ID], [ID] - 1 )
Please make sure the ID values are unique, as LOOKUPVALUE can only return a single result; when more than one row is returned, it will error out. You can potentially wrap it in ISERROR:
= IF (
    ISERROR ( LOOKUPVALUE ( table[PERCENTAGE], table[ID], [ID] - 1 ) ),
    BLANK (),
    LOOKUPVALUE ( table[PERCENTAGE], table[ID], [ID] - 1 )
)
JShmay, this is pretty much the same question, and as Jacob has suggested, you can use the logical operators that are normally available in Excel/PowerPivot.
You can really go crazy with this. Should you need something more complex, for example the difference between two points subject to some other condition, I would point you to very similar questions and my answers to them:
Can I compare values in the same column in adjacent rows in PowerPivot?
2nd latest Date - DAX
Hope this helps :)
