Compare blank string and null spark sql - apache-spark

I am writing an SQL query that joins two tables. The problem I am facing is that the column I am joining on is blank ("" or " ") in one table and NULL in the other.
Table A

id | col
1  | ""
2  | " "
3  | SG

Table B

id | col
a  | null
b  | null
c  | SG
source_alleg = spark.sql("""
SELECT A.*,B.COL as COLB FROM TABLEA A LEFT JOIN TABLEB B
ON A.COL = B.COL
""")
For my use case, blank values and NULL are the same. I want to do something like trim(a.col) that normalizes blank values to NULL so the join finds all the matches.
Output:

id | col                  | colb
1  | either null or blank | either null or blank
2  | either null or blank | either null or blank
3  | SG                   | SG

In SQL, NULL keys never match during a join (NULL = NULL does not evaluate to true), so rows with NULL join keys only survive an outer join or full join.
more information: https://www.geeksforgeeks.org/difference-between-left-right-and-full-outer-join/
If you want to convert the NULLs (and blanks) to a string you can just use an if; note that trim('') yields '', not NULL, so a nullif is needed to catch blank values as well:

select
  if(nullif(trim(col1), '') is null, 'yourstring', col1),
  if(nullif(trim(col2), '') is null, 'yourstring', col2)
from T;
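To see the behaviour concretely, here is a small runnable sketch using Python's sqlite3 in place of Spark SQL (the NULL-matching semantics are the same standard SQL; the table contents mirror the question, and the '<empty>' sentinel is an arbitrary choice):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tablea (id INTEGER, col TEXT);
    INSERT INTO tablea VALUES (1, ''), (2, ' '), (3, 'SG');
    CREATE TABLE tableb (id TEXT, col TEXT);
    INSERT INTO tableb VALUES ('a', NULL), ('b', NULL), ('c', 'SG');
""")

# Plain left join: blank ('' / ' ') never equals NULL, so only 'SG' matches.
plain = con.execute("""
    SELECT a.id, b.col FROM tablea a
    LEFT JOIN tableb b ON a.col = b.col
    ORDER BY a.id
""").fetchall()

# Normalize both sides: trim, turn '' into NULL, then substitute a sentinel
# so that blank and NULL compare equal on both sides of the join.
normalized = con.execute("""
    SELECT a.id, b.col
    FROM tablea a
    LEFT JOIN tableb b
      ON COALESCE(NULLIF(TRIM(a.col), ''), '<empty>') =
         COALESCE(NULLIF(TRIM(b.col), ''), '<empty>')
    ORDER BY a.id
""").fetchall()
```

In Spark SQL specifically, the null-safe equality operator <=> is another way to make NULL keys compare equal, though it still will not equate NULL with a non-NULL blank string without the trim/nullif normalization.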

Related

How to search a string in a collection of strings in a Hive column

I have the below two input tables: table 1 and table 2.
Input: table 1 (id and currency are of type string)

ID | Currency
1  | USD,JPY,EUR
2  | NOK,MXN
3  | AUD
4  | AUD,HKD

Input: table 2 (exception_currency is of type string)

exception_currency
AUD
NOK
USD
HKD
Expected output as below. Exception is "YES" if any of the currency values from table 1 is missing from exception_currency in table 2. For example, for id 1 the exception is "YES" because JPY and EUR are not available in table 2.

ID | Currency    | Exception
1  | USD,JPY,EUR | YES
2  | NOK,MXN     | YES
3  | AUD         | NO
4  | AUD,HKD     | NO
I tried the below code but am not getting the expected results:

select
  id,
  currency,
  case when array_contains(split(t1.currency, ','), t2.exception_currency)
       then 'NO' else 'YES' end as exception
from table1 t1
left join table2 t2 on (t1.currency = t2.exception_currency);
First separate the comma-separated values into rows, then compare with table2:

select table1.id, currency,
       max(excep) as exception
from table1
join (select id,
             case when exception_currency is null then 'YES' else 'NO' end as excep
      from (select id, sep_curr
            from table1 lateral view explode(split(currency, ',')) currency as sep_curr) rs
      left join table2 on exception_currency = sep_curr
     ) rs on rs.id = table1.id
group by table1.id, currency

Here lateral view explode generates one row per comma-separated value. Left joining those rows with the exception table marks the currencies that do not exist there, and max() finally returns 'YES' if at least one value is an exception.
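The same explode-and-compare logic can be sketched in plain Python (the variable names are made up; table 2 is modelled as a set):

```python
# Each row of table 1 holds a comma-separated currency list.
table1 = {1: "USD,JPY,EUR", 2: "NOK,MXN", 3: "AUD", 4: "AUD,HKD"}
# Table 2 is the set of known (non-exception) currencies.
exception_currency = {"AUD", "NOK", "USD", "HKD"}

def exception_flag(currencies: str) -> str:
    # "Explode" the list, then flag YES if any value is missing from table 2.
    values = currencies.split(",")
    return "YES" if any(v not in exception_currency for v in values) else "NO"

result = {i: exception_flag(c) for i, c in table1.items()}
```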

How to do compare/subtract records

Table A has 20 records and table B has 19. How do I find which one record is missing in table B? How can I compare/subtract the records of these two tables to find that record? I am running the query in Apache Superset.
The exact answer depends on which column(s) define whether two records are the same. Assuming you wanted to use some primary key column for the comparison, you could try:
SELECT a.*
FROM TableA a
WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.pk = a.pk);
If you wanted to use more than one column to compare records from the two tables, then you would just add logic to the exists clause, e.g. for three columns:
WHERE NOT EXISTS (SELECT 1 FROM TableB b
                  WHERE b.col1 = a.col1
                    AND b.col2 = a.col2
                    AND b.col3 = a.col3)
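A runnable sketch of the NOT EXISTS approach with Python's sqlite3 (the pk column and the 20-vs-19 row counts come from the question; which row is missing is made up for the demo):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (pk INTEGER);
    CREATE TABLE TableB (pk INTEGER);
""")
# 20 rows in A, 19 in B; B is arbitrarily missing pk = 7.
con.executemany("INSERT INTO TableA VALUES (?)", [(i,) for i in range(1, 21)])
con.executemany("INSERT INTO TableB VALUES (?)",
                [(i,) for i in range(1, 21) if i != 7])

# Anti-join: keep the rows of A for which no matching B row exists.
missing = con.execute("""
    SELECT a.* FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.pk = a.pk)
""").fetchall()
```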

sqlite instr function is not working in some cases

In SQLite we have table1 with column column1. There are 4 rows with the following values for column1: (p1, p10, p11, p20).

DROP TABLE IF EXISTS table1;
CREATE TABLE table1(column1 NVARCHAR);
INSERT INTO table1 (column1) values ('p1'),('p10'),('p11'),('p20');

We have to get the position of each value of column1 in the given string:
,p112,p108,p124,p204,p11,p1124,p1,p10,p20,
The query

Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',column1) from table1;

returns the values (2, 7, 2, 17), which is not what we want (it finds p1 inside p112, and so on). The query

Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',','+column1+',') from table1;

returns 9 for all rows; it turned out that 9 is the position of the first "0" character. How can we get the exact positions of column1 in the given string in SQLite?
In SQLite the concatenation operator is || and not + (as in SQL Server), so do this:

Select instr(',p112,p108,p124,p204,p11,p1124,p1,p10,p20,',',' || column1 || ',') from table1;

What your code did was numeric addition, which resulted in 0 because none of the string operands could successfully be converted to a number, so instr() was searching for '0' and always found it at position 9 of the string ',p112,p108,p124,p204,p11,p1124,p1,p10,p20,'.
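This is easy to verify from Python's sqlite3 module (the search string is the one from the question):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    DROP TABLE IF EXISTS table1;
    CREATE TABLE table1(column1 NVARCHAR);
    INSERT INTO table1 (column1) VALUES ('p1'),('p10'),('p11'),('p20');
""")

s = ',p112,p108,p124,p204,p11,p1124,p1,p10,p20,'

# '+' coerces both strings to the number 0, so instr() searches for '0'.
broken = [r[0] for r in con.execute(
    "SELECT instr(?, ',' + column1 + ',') FROM table1", (s,))]

# '||' performs real concatenation, so each value is wrapped in commas
# and matched exactly, giving the true positions.
fixed = [r[0] for r in con.execute(
    "SELECT instr(?, ',' || column1 || ',') FROM table1", (s,))]
```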

DB2: splitting a comma-separated string to use in an IN clause. Update: WITH clause query inside IN clause

I have a table TableA with values in ColumnA as below:
ColumnA
__________________
a,b,c
d,e
I have table TableB with values as:
ColumnB ColumnC
____________________
a 1
b 2
c 3
d 4
e 5
x 9
I want to use the above values in another query:

SELECT columnC FROM TableB where ColumnB in (select ColumnA from TableA)

Obviously the above query won't work, since ColumnA holds whole comma-separated lists rather than individual values. The output should be 1, 2, 3, 4, 5. How to do this without a function, i.e. in a simple query?
Update:
Based on mustoccio's comment below, I made it work using the WITH clause:
with split_data as (
    select ColumnA as split_string, ',' as split from TableA
),
rec (split_string, split, row_num, column_value, pos) as (
    select
        split_string,
        split,
        1,
        varchar(substr(split_string, 1,
                decode(instr(split_string, split, 1), 0, length(split_string),
                       instr(split_string, split, 1) - 1)), 255),
        instr(split_string, split, 1) + length(split)
    from split_data
    union all
    select
        split_string,
        split,
        row_num + 1,
        varchar(substr(split_string, pos,
                decode(instr(split_string, split, pos), 0, length(split_string) - pos + 1,
                       instr(split_string, split, pos) - pos)), 255),
        instr(split_string, split, pos) + length(split)
    from rec
    where row_num < 300000
      and pos > length(split)
)
select column_value as data
from rec
order by row_num
However, when I try to use the above query inside the IN clause of my query:

SELECT columnC FROM TableB where ColumnB in (/* WITH query here */)

I get the error:

Error: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=as;in ( With split_data;JOIN, DRIVER=3.50.152
SQLState: 42601
ErrorCode: -104
Error: DB2 SQL Error: SQLCODE=-727, SQLSTATE=56098, SQLERRMC=2;-104;42601;as|in ( With split_data|JOIN, DRIVER=3.50.152
SQLState: 56098
ErrorCode: -727

Can't we use a WITH clause query inside an IN clause? If not, what is the solution?
An inner join should work here; you can use this one:

select columnC from tableB inner join tableA on tableB.columnB = tableA.columnA;
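One common resolution, hedged here since DB2 versions differ, is to move the WITH clause to the start of the whole statement and reference the CTE from the IN subquery, rather than nesting WITH inside IN (...). A runnable sketch of that shape with Python's sqlite3 (using SQLite's WITH RECURSIVE splitter in place of the DB2 one):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE TableA (ColumnA TEXT);
    INSERT INTO TableA VALUES ('a,b,c'), ('d,e');
    CREATE TABLE TableB (ColumnB TEXT, ColumnC INTEGER);
    INSERT INTO TableB VALUES ('a',1),('b',2),('c',3),('d',4),('e',5),('x',9);
""")

# The CTE sits at the start of the whole statement; the IN subquery then
# just selects from it, instead of trying to open a WITH inside IN (...).
rows = con.execute("""
    WITH RECURSIVE split(val, rest) AS (
        SELECT NULL, ColumnA || ',' FROM TableA
        UNION ALL
        SELECT substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM split WHERE rest <> ''
    )
    SELECT ColumnC FROM TableB
    WHERE ColumnB IN (SELECT val FROM split WHERE val IS NOT NULL)
    ORDER BY ColumnC
""").fetchall()
```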

Filling in NULLS with previous records - Netezza SQL

I am using Netezza SQL on Aginity Workbench and have the following data:
id DATE1 DATE2
1 2013-07-27 NULL
2 NULL NULL
3 NULL 2013-08-02
4 2013-09-10 2013-09-23
5 2013-12-11 NULL
6 NULL 2013-12-19
I need to fill in all the NULL values in DATE1 with preceding values in the DATE1 field that are filled in. With DATE2, I need to do the same, but in reverse order. So my desired output would be the following:
id DATE1 DATE2
1 2013-07-27 2013-08-02
2 2013-07-27 2013-08-02
3 2013-07-27 2013-08-02
4 2013-09-10 2013-09-23
5 2013-12-11 2013-12-19
6 2013-12-11 2013-12-19
I only have read access to the data, so creating tables or views is out of the question.
How about this?
select
    id,
    last_value(date1 ignore nulls) over (
        order by id
        rows between unbounded preceding and current row
    ) as date1,
    first_value(date2 ignore nulls) over (
        order by id
        rows between current row and unbounded following
    ) as date2
from Table1
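The two ignore nulls windows amount to a forward fill on DATE1 and a backward fill on DATE2; here is that logic in plain Python on the sample data:

```python
# Sample rows: (id, DATE1, DATE2), with NULLs as None.
rows = [
    (1, "2013-07-27", None),
    (2, None, None),
    (3, None, "2013-08-02"),
    (4, "2013-09-10", "2013-09-23"),
    (5, "2013-12-11", None),
    (6, None, "2013-12-19"),
]

def forward_fill(values):
    # last_value(... ignore nulls) over preceding rows: carry the most
    # recent non-null value downward.
    last, out = None, []
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out

date1 = forward_fill([r[1] for r in rows])
# first_value(... ignore nulls) over following rows: the same fill, run
# over the reversed sequence.
date2 = forward_fill([r[2] for r in rows][::-1])[::-1]
filled = [(r[0], d1, d2) for r, d1, d2 in zip(rows, date1, date2)]
```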
You can manually calculate this as well, rather than relying on the windowing functions.
with chain as (
select
this.*,
prev.date1 prev_date1,
case when prev.date1 is not null then abs(this.id - prev.id) else null end prev_distance,
next.date2 next_date2,
case when next.date2 is not null then abs(this.id - next.id) else null end next_distance
from
Table1 this
left outer join Table1 prev on this.id >= prev.id
left outer join Table1 next on this.id <= next.id
), min_distance as (
select
id,
min(prev_distance) min_prev_distance,
min(next_distance) min_next_distance
from
chain
group by
id
)
select
chain.id,
chain.prev_date1,
chain.next_date2
from
chain
join min_distance on
min_distance.id = chain.id
and chain.prev_distance = min_distance.min_prev_distance
and chain.next_distance = min_distance.min_next_distance
order by chain.id
If you're unable to calculate the distance between IDs by subtraction, just replace the ordering scheme by a row_number() call.
I think Netezza supports the order by clause for max() and min(). So, you can do:
select max(date1) over (order by date1) as date1,
min(date2) over (order by date2 desc) as date2
. . .
EDIT:
In Netezza, you may be able to do this with last_value() and first_value():
select last_value(date1 ignore nulls) over (order by id rows between unbounded preceding and 1 preceding) as date1,
first_value(date2 ignore nulls) over (order by id rows between 1 following and unbounded following) as date2
Netezza doesn't seem to support IGNORE NULLs on LAG(), but it does on these functions.
I've only tested this in Oracle so hopefully it works in Netezza:
Fiddle:
http://www.sqlfiddle.com/#!4/7533f/1/0
select id,
coalesce(date1, t1_date1, t2_date1) as date1,
coalesce(date2, t3_date2, t4_date2) as date2
from (select t.*,
t1.date1 as t1_date1,
t2.date1 as t2_date1,
t3.date2 as t3_date2,
t4.date2 as t4_date2,
row_number() over(partition by t.id order by t.id) as rn
from tbl t
left join tbl t1
on t1.id < t.id
and t1.date1 is not null
left join tbl t2
on t2.id > t.id
and t2.date1 is not null
left join tbl t3
on t3.id < t.id
and t3.date2 is not null
left join tbl t4
on t4.id > t.id
and t4.date2 is not null
order by t.id) x
where rn = 1
Here's a way to fill in NULL dates with the most recent min/max non-null dates using self-joins. This query should work on most databases
select t1.id, max(t2.date1), min(t3.date2)
from tbl t1
join tbl t2 on t1.id >= t2.id
join tbl t3 on t1.id <= t3.id
group by t1.id
http://www.sqlfiddle.com/#!4/acc997/2
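As a sanity check, this self-join can be run as-is under Python's sqlite3 on the sample data. Note that max()/min() pick the latest earlier and earliest later dates rather than the nearest ones, which happens to coincide here because the dates are in id order:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl (id INTEGER, date1 TEXT, date2 TEXT)")
con.executemany("INSERT INTO tbl VALUES (?,?,?)", [
    (1, "2013-07-27", None),
    (2, None, None),
    (3, None, "2013-08-02"),
    (4, "2013-09-10", "2013-09-23"),
    (5, "2013-12-11", None),
    (6, None, "2013-12-19"),
])

# t2 covers rows at or before t1, t3 covers rows at or after; the
# aggregates skip NULLs, so every id gets a filled pair of dates.
filled = con.execute("""
    SELECT t1.id, max(t2.date1), min(t3.date2)
    FROM tbl t1
    JOIN tbl t2 ON t1.id >= t2.id
    JOIN tbl t3 ON t1.id <= t3.id
    GROUP BY t1.id
    ORDER BY t1.id
""").fetchall()
```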
