Hi, this is my first post and I'm also new to SQL. I am trying to extract text from within a string.
I have a table column that looks like this
site - abc - left
site - def - left
site - ghi - right - inner
site - jkl - right - inner
site - mno
site - pqr
I need a query that returns the text between the first two '-', but as the example shows, some rows have only one '-'.
For example.
abc
def
ghi
jkl
mno
pqr
Any help gratefully accepted.
The code I have been working with only gives me the first column 'site'.
SELECT SUBSTR(site.description,1,instr(site.description,'-',1,1)-1) AS loc
FROM table
Suppose your data resides in a table named test_n with a single column val holding the values from your question. The query is:
select val
, instr(val, '-', 1,1) + 1 START_POS
, instr(val, '-',1,2) END_POS
-- trim() strips the spaces that surround the extracted value
, trim(substr(val, instr(val, '-', 1,1) + 1, decode(instr(val, '-',1,2),0,length(val)+1,instr(val, '-',1,2) ) - instr(val, '-', 1,1)-1 )) result
FROM test_n;
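To check the INSTR/SUBSTR arithmetic outside the database, here is a minimal Python sketch of the same idea. The function name `between_first_two_dashes` is hypothetical, `strip()` stands in for a SQL TRIM, and returning an empty string when there is no '-' at all is an assumption, since the question does not cover that case.

```python
def between_first_two_dashes(val: str) -> str:
    """Return the text between the first and second '-'.

    If there is only one '-', return everything after it.
    """
    first = val.find("-")
    if first == -1:
        return ""  # assumption: no dash means nothing to extract
    second = val.find("-", first + 1)
    # No second dash: read to the end of the string, like the DECODE fallback.
    end = len(val) if second == -1 else second
    return val[first + 1:end].strip()


rows = [
    "site - abc - left",
    "site - ghi - right - inner",
    "site - mno",
]
print([between_first_two_dashes(r) for r in rows])
```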
I have not tested this yet, but I think something like this would work for you:
substr(site.description, instr(site.description,'-',1,1)+1, instr(site.description,'-',1,2)-instr(site.description,'-',1,1)-1)
It should work for rows with two '-'; if it does not, it probably needs only a small change. Let me know the result.
In pyspark, I'm trying to replace multiple text values in a column with the values of the columns whose names appear in the calc column (formula).
To be clear, here is an example:
Input:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |param_1-param_2
|Cell 3 |Cell 4 |param_2/param_1
Output needed:
|param_1|param_2|calc
|-------|-------|--------
|Cell 1 |Cell 2 |Cell 1-Cell 2
|Cell 3 |Cell 4 |Cell 4/Cell 3
In the calc column, the value is a formula. It can be as simple as the ones above, or something like "2*(param_8-param_4)/param_2-(param_3/param_7)".
What I'm looking for is a way to substitute every param_x with the value in the column of the same name.
I've tried a lot of things but nothing works, and most of the time when I use replace or regexp_replace with a column as the replacement value, I get the error "Column is not iterable".
Moreover, the columns param_1, param_2, ..., param_x are generated dynamically, and a calc value can reference some of these columns but not necessarily all of them.
Could you help me with a dynamic solution?
Thank you so much.
Best regards
Update: Turned out I misunderstood the requirement. This would work:
from pyspark.sql import functions as F
for exp in ["regexp_replace(calc, '"+col+"', "+col+")" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
Yet another update: to handle null values, add coalesce:
for exp in ["coalesce(regexp_replace(calc, '"+col+"', "+col+"), calc)" for col in df.schema.names]:
    df = df.withColumn("calc", F.expr(exp))
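As a way to reason about what the loop does to a single row, here is a plain-Python sketch of the substitution (no Spark involved). The function name `substitute_params` is hypothetical. The `\b` word boundaries are an addition that is not in the answer: they keep `param_1` from matching inside `param_10`, which a bare pattern would do.

```python
import re


def substitute_params(calc: str, row: dict) -> str:
    """Replace each column name found in the formula with that row's value."""
    for name, value in row.items():
        if value is None:
            # Mirrors the coalesce() fallback: a null value leaves calc as it was.
            continue
        # A function replacement avoids backslash handling in the value.
        calc = re.sub(rf"\b{re.escape(name)}\b", lambda m, v=str(value): v, calc)
    return calc


print(substitute_params("param_1-param_2", {"param_1": "Cell 1", "param_2": "Cell 2"}))
print(substitute_params("param_2/param_1", {"param_1": "Cell 3", "param_2": "Cell 4"}))
```

If the same boundary behaviour is wanted in Spark, the pattern passed to regexp_replace would need the equivalent anchors; that is left as an assumption to verify against your data.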
------- Keeping the below section for a while just for reference -------
You can't do that directly, as you won't be able to use the column value directly unless you collect it into a Python object (which is obviously not recommended).
This would work with the same:
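from pyspark.sql import functions as F
from pyspark.sql.window import Window

df = spark.createDataFrame([["1","2", "param_1 - param_2"],["3","4", "2*param_1 + param_2"]]).toDF("param_1", "param_2", "calc")
df.show()
# Number the rows so that each formula can be tied to its own row
df = df.withColumn("row_num", F.row_number().over(Window.orderBy(F.lit("dummy"))))
# Collect the formulas to the driver, keyed by row number
as_dict = {row.asDict()["row_num"]:row.asDict()["calc"] for row in df.select("row_num", "calc").collect()}
# Build one CASE branch per row; each formula becomes a SQL expression
expression = f"""CASE {' '.join([f"WHEN row_num ='{k}' THEN ({v})" for k,v in as_dict.items()])} \
ELSE NULL END"""
df.withColumn("Result", F.expr(expression)).show()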
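The string-building step above can be looked at in isolation. The sketch below, with the hypothetical helper `build_case_expression`, shows the shape of the CASE expression produced from a dict of row numbers and formulas. Since it creates one branch per row, it is probably only practical for small DataFrames.

```python
def build_case_expression(formulas: dict) -> str:
    """Build a SQL CASE expression with one WHEN branch per row number."""
    branches = " ".join(
        f"WHEN row_num = '{num}' THEN ({formula})"
        for num, formula in formulas.items()
    )
    return f"CASE {branches} ELSE NULL END"


expr = build_case_expression({1: "param_1 - param_2", 2: "2*param_1 + param_2"})
print(expr)
```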
From a space-delimited string, I would like to remove all words that are 1 to 3 characters long.
For example: this string
LCCPIT A2 LCCMAD B JBPM_JIT CCC
should become
LCCPIT LCCMAD JBPM_JIT
So the words A2, B and CCC are removed (since they are 2, 1 and 3 characters long). Is there a way to do it? I think I could use REGEXP_REPLACE, but I didn't find the correct regular expression to get this result.
Split the string into words and aggregate back only those substrings whose length is greater than 3.
Sample data:
with test (col) as
  (select 'LCCPIT A2 LCCMAD B JBPM_JIT CCC' from dual)
-- query begins here
select listagg(val, ' ') within group (order by lvl) result
from (select regexp_substr(col, '[^ ]+', 1, level) val,
             level lvl
      from test
      connect by level <= regexp_count(col, ' ') + 1
     )
where length(val) > 3;

RESULT
--------------------------------------------------------------------------------
LCCPIT LCCMAD JBPM_JIT
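The split-then-aggregate idea is easy to verify outside the database. Here is a minimal Python sketch of it; the function name `drop_short_words` and the `max_len` parameter are hypothetical.

```python
def drop_short_words(text: str, max_len: int = 3) -> str:
    """Keep only the words longer than max_len characters, in original order."""
    return " ".join(word for word in text.split() if len(word) > max_len)


print(drop_short_words("LCCPIT A2 LCCMAD B JBPM_JIT CCC"))
```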
I prefer a regex replacement trick:
SELECT TRIM(REGEXP_REPLACE(val, '(^|\s+)\w{1,3}(\s+|$)', ' '))
FROM dual;
-- output is 'LCCPIT LCCMAD JBPM_JIT'
The strategy above is to match any word of 1, 2, or 3 word characters, along with any surrounding whitespace, and to replace it with a single space. The outer call to TRIM() is necessary to remove dangling spaces that might arise from the first or last word being removed.
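One thing worth checking with this pattern is input where two short words sit next to each other. The sketch below runs the same pattern through Python's `re` module, which is an assumption about equivalence: Oracle's regex engine may behave differently, so test it there too. In Python, the whitespace consumed by one match is no longer available to start the next match, so the second of two adjacent short words survives.

```python
import re

PATTERN = r"(^|\s+)\w{1,3}(\s+|$)"


def regex_drop_short(text: str) -> str:
    """Apply the replacement pattern once, then trim dangling spaces."""
    return re.sub(PATTERN, " ", text).strip()


sample = regex_drop_short("LCCPIT A2 LCCMAD B JBPM_JIT CCC")
adjacent = regex_drop_short("A2 B LCCPIT")
print(sample)    # the question's example is handled as expected
print(adjacent)  # "B" is left behind because "A2 " already consumed the space
```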
I have a table in SQLite and want a simple way to delete everything to the right of a set phrase/character in the Company_name_ column, in this case everything after "LLC":
Company_name_
Example LLC $42
Example llc,klp
Example LLc jim
becomes
Company_name_
Example LLC
Example llc
Example LLc
Tried Set Charindex and Substr but getting syntax errors.
Thank you
You can do it with string functions SUBSTR() and INSTR().
If you want a SELECT query use a CASE expression with the operator LIKE to check if the column value contains 'LLC' or not:
SELECT CASE
WHEN Company_name_ LIKE '%LLC%'
THEN SUBSTR(
Company_name_,
1,
INSTR(UPPER(Company_name_), 'LLC') + LENGTH('LLC') - 1
)
ELSE Company_name_
END Company_name_
FROM tablename;
If you want to update the table:
UPDATE tablename
SET Company_name_ = SUBSTR(
Company_name_,
1,
INSTR(UPPER(Company_name_), 'LLC') + LENGTH('LLC') - 1
)
WHERE Company_name_ LIKE '%LLC%';
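The position arithmetic in both statements can be sketched in Python to confirm the off-by-one handling. The function name `truncate_after` is hypothetical; `find` returns a 0-based index where INSTR is 1-based, which is why the `- 1` from the SQL does not appear here.

```python
def truncate_after(name: str, marker: str = "LLC") -> str:
    """Cut the string right after the first case-insensitive match of marker."""
    pos = name.upper().find(marker.upper())
    if pos == -1:
        return name  # no marker: leave the value unchanged, like the ELSE branch
    return name[:pos + len(marker)]


for company in ["Example LLC $42", "Example llc,klp", "Example LLc jim"]:
    print(truncate_after(company))
```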
I need to concatenate the columns LastName and FirstName into a new column called EmployeeName. The problem is that some first name fields include a middle initial and some do not. We do not want the initial in the EmployeeName column. How do I remove it from those instances?
I have tried trim functions, left functions, and right functions, and I cannot get it to work quite right. I have also tried concatenating the columns and then cleaning up the result, but that does not work either:
SELECT LEFT(EmployeeName, LEN(EmployeeName) - 2) FROM myTable
but this removes the last characters even for those with no middle initial. I have it as -2 to account for the space between FirstName and Middle Initial
When the EmployeeName field is Smith, John J it removes the space and J correctly
When the EmployeeName field is Smith, John it removes the 'hn'. I don't want that.
Thank you very much
I figured it out. I count the spaces then use a case statement to concatenate the fields
select LastName, FirstName,
(
case
when len(firstname) - len(replace(firstname, ' ', '')) > 0
then concat(LEFT(firstname, LEN(firstname) - 2),', ',LastName)
ELSE concat(FirstName, ', ', Lastname)
END
) as namesCombined
from myTable
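The query above drops the last two characters whenever the first name contains a space. As a variation (not the answer's exact rule), the Python sketch below only drops the last token when it is a single letter, so a two-part first name such as "Mary Ann" stays intact. The function name `employee_name` is hypothetical, and the "Last, First" output order follows the "Smith, John" example in the question.

```python
def employee_name(last: str, first: str) -> str:
    """Combine names as 'Last, First', dropping a trailing middle initial."""
    parts = first.split()
    # Assumption: a final one-letter token (with or without a '.') is an initial.
    if len(parts) > 1 and len(parts[-1].rstrip(".")) == 1:
        parts = parts[:-1]
    return f"{last}, {' '.join(parts)}"


print(employee_name("Smith", "John J"))
print(employee_name("Smith", "John"))
```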
I don't know Excel very well, and I am trying to take something like this (with a lot of entries):
Field ......Value ....... ID
A .......... blabla1 .......1
B ...........blabla2 .......1
C ...........blabla3 .......1
D ...........blabla4 .......1
A ...........blabla5 .......2
B ...........blabla6 .......2
C ...........blabla7 .......2
D ...........blabla8 .......2
and turn into something more readable like this:
ID -----A -------------B ---------------- C ---------------- D
1 ------blabla1 -----blabla2 -------- blabla3 --------blabla4
2 ------blabla5----- blabla6 -------- blabla7-------- blabla8
Does anyone know a good way to do that? Thank you
(sorry about the bad formatting)
The exact delimiter between words is key if the text is not already split into separate cells.
Assuming there are numerous words in place of '.....', with each word separated by a single space (a different delimiter would be required if the blablas represented sentences containing one or more spaces), you could achieve the desired table representation as follows
(several functions in this solution require an Office 365 compatible version of Excel;
the lookup in step 3 does not require Office 365, but without it the IDs and Fields may need to be entered manually, or VBA could be used):
Starting position (after removing blank rows):
Field Value ID
A blabla1 1
B blabla2 1
C blabla3 1
D blabla4 1
A blabla5 2
B blabla6 2
C blabla7 2
D blabla8 2
1) Split cells according to delimiter (skip this step if not relevant)
=TRANSPOSE(FILTERXML("<x><y>"&SUBSTITUTE(F3," ","</y><y>")&"</y></x>","//y"))
(replace the " " inside the substitute function with a different delimiter if required/desired)
2) Obtain unique IDs (rows) and Fields (columns)
=UNIQUE(K4:K11)
=TRANSPOSE(UNIQUE(I4:I11))
3) Index lookup for table content
=INDEX(J4:J11,MATCH(M4#&N3#,K4:K11&I4:I11,0),0)
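Steps 2 and 3 amount to a long-to-wide pivot: collect the unique IDs and Fields, then look up the Value for each (ID, Field) pair. Here is a minimal Python sketch of that logic for checking the expected result. The function name `pivot` and the tuple layout (Field, Value, ID) are assumptions that follow the starting table above.

```python
def pivot(rows):
    """Turn (field, value, id) rows into ({id: {field: value}}, ordered fields)."""
    fields = []  # unique fields in first-seen order, like TRANSPOSE(UNIQUE(...))
    table = {}   # one entry per unique ID, like UNIQUE(...) on the ID column
    for field, value, row_id in rows:
        if field not in fields:
            fields.append(field)
        table.setdefault(row_id, {})[field] = value
    return table, fields


rows = [
    ("A", "blabla1", 1), ("B", "blabla2", 1), ("C", "blabla3", 1), ("D", "blabla4", 1),
    ("A", "blabla5", 2), ("B", "blabla6", 2), ("C", "blabla7", 2), ("D", "blabla8", 2),
]
table, fields = pivot(rows)
print("ID", *fields)
for row_id, values in table.items():
    # A missing (ID, Field) pair prints as an empty string.
    print(row_id, *(values.get(f, "") for f in fields))
```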