How to use LIKE ANY and LIKE ALL in Spark - apache-spark

I was trying to use LIKE ANY and LIKE ALL in these queries in Spark/Dremio SQL:
SELECT * FROM table_name where col_name LIKE ANY (CONCAT('%', 10, '%'), CONCAT('%', 20, '%'), CONCAT('%', 12, '%'))
SELECT * FROM table_name where col_name LIKE ALL (CONCAT('%', 10, '%'), CONCAT('%', 20, '%'), CONCAT('%', 12, '%'))
It shows this error:
Failure parsing the query.
With ANY, I want to select all rows whose value contains 10, 20, or 12. With ALL, I want the rows whose value contains all three.
Please help me fix the query.

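If the parser does not accept LIKE ALL and LIKE ANY, spell them out. LIKE ALL becomes AND-ed conditions:
SELECT * FROM table_name where col_name LIKE '%10%' AND col_name LIKE '%20%' AND col_name LIKE '%12%'
LIKE ANY becomes OR-ed conditions:
SELECT * FROM table_name where col_name LIKE '%10%' OR col_name LIKE '%20%' OR col_name LIKE '%12%'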
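When the list of values is long, the AND/OR expansions can be generated instead of typed by hand. The following is a minimal sketch in Python. The helper name `like_predicate` is hypothetical, and it assumes the column name and values are trusted, since they are interpolated into the SQL text and not bound as parameters.

```python
def like_predicate(column, values, mode):
    """Expand LIKE ANY / LIKE ALL into OR-ed / AND-ed LIKE conditions.

    Hypothetical helper; assumes `column` and `values` are trusted input.
    """
    if mode not in ("any", "all"):
        raise ValueError("mode must be 'any' or 'all'")
    joiner = " OR " if mode == "any" else " AND "
    # Double single quotes so each value stays inside its string literal.
    parts = [
        "{} LIKE '%{}%'".format(column, str(v).replace("'", "''"))
        for v in values
    ]
    return "(" + joiner.join(parts) + ")"


base = "SELECT * FROM table_name WHERE "
any_sql = base + like_predicate("col_name", [10, 20, 12], "any")
all_sql = base + like_predicate("col_name", [10, 20, 12], "all")
print(any_sql)
print(all_sql)
```

The resulting strings can be passed to `spark.sql(...)`. Note that `%` and `_` inside a value still act as LIKE wildcards in this sketch.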

Related

Spark SQL using POSIX

To find non-numeric rows, we can do something like the following in Spark SQL:
spark.sql("select * from tabl where UPC rlike '[^0-9]'").show()
Can the same query also be written as below? I tested it and it does not seem to work. Basically, I am trying to use the POSIX character classes [:alpha:], [:digit:], and [:alnum:].
spark.sql("select * from tabl where UPC rlike '[^[:digit:]]'").show()
spark-sql> select * from ( select '8787687' col1) where rlike (col1,'^[[:digit:]]+$');
Time taken: 0.02 seconds
spark-sql>
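The empty result above is consistent with rlike using Java regular expressions, where `[[:digit:]]` is read as an ordinary set of characters and not as a POSIX class. Java spells these classes `\p{Digit}`, `\p{Alpha}`, `\p{Alnum}`, and so on. The sketch below translates the bracket form into the Java form; the helper name `posix_to_java` and the mapping table are illustrative, not part of Spark.

```python
import re

# POSIX bracket class name -> Java regex \p{...} name (ASCII-only classes).
POSIX_TO_JAVA = {
    "alpha": "Alpha",
    "digit": "Digit",
    "alnum": "Alnum",
    "upper": "Upper",
    "lower": "Lower",
    "space": "Space",
    "punct": "Punct",
    "xdigit": "XDigit",
}


def posix_to_java(pattern):
    """Rewrite [:name:] as \\p{Name}; unknown names are left unchanged."""
    def repl(match):
        java_name = POSIX_TO_JAVA.get(match.group(1))
        if java_name is None:
            return match.group(0)
        return "\\p{" + java_name + "}"
    return re.sub(r"\[:([a-z]+):\]", repl, pattern)


non_numeric = posix_to_java("[^[:digit:]]")
all_digits = posix_to_java("^[[:digit:]]+$")
print(non_numeric)
print(all_digits)
```

With a DataFrame `df` (assumed here), the translated pattern can be passed through the column API, for example `df.filter(df.UPC.rlike(non_numeric))`, which avoids SQL string-literal escaping. Inside a `spark.sql` string the backslash may need to be doubled, depending on the session's string-literal escaping settings.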

Can I use a variable in Python to extract data from PostgreSQL?

I'm writing Python code to query data from my Postgres database. Here is the code:
cur.execute(
"""SELECT id FROM reddit_tesla_title WHERE created_time like '{"2016-08%' """)
In my query, I will increase the month step by step: 2016-09, 2016-10, ..., 2017-01, ..., 2020-11.
I wonder if there is a way to insert a variable, say like:
year = 2016
month = 09
and have the year and month advance on every iteration of a for loop. Since this code is inside the cur.execute SELECT statement, I'm not sure how to do it. Any ideas? Thanks!
Also, if possible, can someone please let me know how to do the same thing for the output CSV file's name?
df.to_csv('201610_tesla_cooments.csv', index=False)
Here is one way. It is a little tricky because you have curly braces in the base part of the query, so I split it into two variables to make it easy to use Python's string formatting:
https://docs.python.org/3.4/library/functions.html#format
Code:
query_base = """SELECT id FROM reddit_tesla_title WHERE created_time like '{"""
query_date_form = "{year}-{month:02}%'"
start_year = 2017
start_month = 11
end_year = 2019
end_month = 5
year = start_year
month = start_month
while True:
    if year == end_year and month > end_month:
        break
    if month > 12:
        year = year + 1
        month = 1
        continue
    query_date = query_date_form.format(year=year, month=month)
    query = query_base + query_date
    print(f"Execute this: {query}")
    month = month + 1
Execution (just replace the print call with your own code):
(venv) [ttucker#zim stackoverflow]$ python sql.py
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2017-11%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2017-12%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-01%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-02%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-03%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-04%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-05%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-06%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-07%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-08%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-09%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-10%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-11%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2018-12%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2019-01%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2019-02%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2019-03%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2019-04%'
Execute this: SELECT id FROM reddit_tesla_title WHERE created_time like '{2019-05%'
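The question also asks about the output CSV file's name, which can be built from the same year and month values. The sketch below is one possible way to do both, not the answer's own method: it uses a small generator (the name `month_range` is hypothetical) and passes the LIKE pattern as a bound parameter. The `%s` placeholder assumes a psycopg2-style driver, and the `{` prefix mirrors the output shown above.

```python
def month_range(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


# The pattern is bound as a parameter, so no braces need escaping here.
SQL = "SELECT id FROM reddit_tesla_title WHERE created_time LIKE %s"

jobs = []
for year, month in month_range(2017, 11, 2019, 5):
    pattern = "{" + f"{year}-{month:02d}%"            # e.g. {2017-11%
    csv_name = f"{year}{month:02d}_tesla_cooments.csv"  # e.g. 201711_...
    jobs.append((pattern, csv_name))
    # Assumed usage with an open cursor and a DataFrame of the results:
    #   cur.execute(SQL, (pattern,))
    #   df.to_csv(csv_name, index=False)

print(jobs[0])
print(len(jobs))
```

Binding the pattern keeps the query text constant and avoids mixing string formatting with the literal curly brace.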

Cognos 11 - filter between query subjects

Given Table A with columns: ColA1, ColA2, ColA3
And a Table B with columns: ColB1
I want to restrict the data that can be returned from Table A based on data in Table B, like:
ColA1 not in ColB1
Ideally, there would be some way to incorporate SQL queries with SELECT statements in the filter.
What you want is
SELECT a.ColA1
, a.ColA2
, a.ColA3
FROM TableA a
LEFT OUTER JOIN TableB b on b.ColB1 = a.ColA1
WHERE b.ColB1 IS NULL
So...
Query1 contains ColA1, ColA2, and ColA3 from TableA.
Query2 contains ColB1 from TableB.
Query3
joins Query1 and Query2 on ColA1 1..1 = 0..1 ColB1
Data Items: ColA1, ColA2, ColA3
Filter: ColB1 IS NULL
NOT EXISTS is probably what you are looking for.
Try something like this:
select * from TableA as T1
where not exists
(select * from TableB as T2
where t1.key1 = t2.key1 and T1.key2 = t2.key2)
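Both suggestions (LEFT OUTER JOIN with an IS NULL filter, and NOT EXISTS) express the same anti-join. The following sketch checks that on a tiny data set using Python's built-in sqlite3 module as a stand-in database; the sample rows are made up, and the NOT EXISTS version is written against ColA1/ColB1 instead of the key1/key2 columns used above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ColA1 TEXT, ColA2 TEXT, ColA3 TEXT);
    CREATE TABLE TableB (ColB1 TEXT);
    INSERT INTO TableA VALUES ('k1', 'x', 'y'), ('k2', 'x', 'y'), ('k3', 'x', 'y');
    INSERT INTO TableB VALUES ('k2');
""")

# Anti-join via outer join: keep rows of TableA with no match in TableB.
left_join_rows = conn.execute("""
    SELECT a.ColA1
    FROM TableA a
    LEFT OUTER JOIN TableB b ON b.ColB1 = a.ColA1
    WHERE b.ColB1 IS NULL
    ORDER BY a.ColA1
""").fetchall()

# Same result expressed with a correlated NOT EXISTS subquery.
not_exists_rows = conn.execute("""
    SELECT a.ColA1
    FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b WHERE b.ColB1 = a.ColA1)
    ORDER BY a.ColA1
""").fetchall()

print(left_join_rows)
print(not_exists_rows)
conn.close()
```

In both queries only the TableA rows whose ColA1 is absent from TableB.ColB1 remain.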

How to use a variable in place of column names when doing select in Pyspark/Sparksql

I have a variable with column names in it, like below:
VAR=Fullname,Address,DOB
I need to pass this variable in my spark sql instead of column names like below:
spark.sql("""
Select
VAR
From mytable
where 1=1
#some additional filters
""")
so it is treated as if I was giving the column names explicitly like below:
spark.sql("""
Select
Fullname
,Address
,DOB
From mytable
where 1=1
#some additional filters
""")
How can this be implemented using PySpark/Spark SQL?
Below is a code sample in PySpark for this:
var = 'Fullname,Address,DOB'
# Append any additional filters after WHERE 1=1, e.g. " AND <condition>"
query = "SELECT {} FROM mytable WHERE 1=1".format(var)
spark.sql(query)
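Because the column string ends up inside the SQL text, it can be worth checking it against the known columns first. The following is a minimal sketch of that idea; the helper name `build_select` and the extra `Phone` column are hypothetical, and the backtick quoting assumes Spark SQL's identifier syntax.

```python
def build_select(columns_csv, table, allowed):
    """Build a SELECT from a comma-separated column string.

    Hypothetical helper: rejects any name not in `allowed` and
    backtick-quotes the rest.
    """
    names = [c.strip() for c in columns_csv.split(",") if c.strip()]
    unknown = [c for c in names if c not in allowed]
    if unknown:
        raise ValueError("unknown columns: {}".format(unknown))
    select_list = ", ".join("`{}`".format(c) for c in names)
    return "SELECT {} FROM {} WHERE 1=1".format(select_list, table)


var = "Fullname,Address,DOB"
query = build_select(var, "mytable", allowed={"Fullname", "Address", "DOB", "Phone"})
print(query)
```

In a live session the allowed set could come from `spark.table("mytable").columns`, and the result is passed to `spark.sql(query)` as before.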

DB2: splitting a comma-separated string to use in an IN clause. Update: WITH clause query inside IN clause

I have a table TableA with values in ColumnA as below:
ColumnA
__________________
a,b,c
d,e
I have table TableB with values as:
ColumnB   ColumnC
_______   _______
a         1
b         2
c         3
d         4
e         5
x         9
I want to use above values in another query:
SELECT columnC FROM TableB where ColumnB in (select ColumnA from TableA)
Obviously, the above query won't work.
The output should be 1, 2, 3, 4, 5.
How can I do this without a function, i.e. in a simple query?
Update:
Based on mustoccio's comment below, I made it work using the WITH clause:
with split_data as (
    select ColumnA as split_string, ',' as split from TableA
),
rec (split_string, split, row_num, column_value, pos) as (
    select
        split_string,
        split,
        1,
        varchar(substr(split_string, 1, decode(instr(split_string, split, 1), 0, length(split_string), instr(split_string, split, 1)-1)), 255),
        instr(split_string, split, 1) + length(split)
    from split_data
    union all
    select
        split_string,
        split,
        row_num+1,
        varchar(substr(split_string, pos, decode(instr(split_string, split, pos), 0, length(split_string)-pos+1, instr(split_string, split, pos)-pos)), 255),
        instr(split_string, split, pos) + length(split)
    from rec
    where row_num < 300000
    and pos > length(split)
)
select
    column_value as data
from rec
order by row_num
However, when I try to use the above query inside the IN clause of my query:
SELECT columnC FROM TableB where ColumnB in (/* WITH query here */)
I get this error:
Error: DB2 SQL Error: SQLCODE=-104, SQLSTATE=42601, SQLERRMC=as;in ( With split_data;JOIN, DRIVER=3.50.152
SQLState: 42601
ErrorCode: -104
Error: DB2 SQL Error: SQLCODE=-727, SQLSTATE=56098, SQLERRMC=2;-104;42601;as|in ( With split_data|JOIN, DRIVER=3.50.152
SQLState: 56098
ErrorCode: -727
Can't we use a WITH clause query inside an IN clause?
If not, what is the solution?
An inner join should work here. You can use this one:
select columnC from tableB inner join tableA on tableB.columnB=tableA.columnA;
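One caveat worth checking: an equality join compares the whole ColumnA string, so 'a' does not equal 'a,b,c'. The sketch below uses Python's built-in sqlite3 module as a stand-in database (not DB2) to compare a plain equality join with a delimiter-aware match that wraps both sides in commas. It is a different technique from the recursive WITH query above. DB2 also provides an instr function, as used in the question, but whether this exact expression runs unchanged on DB2 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (ColumnA TEXT);
    CREATE TABLE TableB (ColumnB TEXT, ColumnC INTEGER);
    INSERT INTO TableA VALUES ('a,b,c'), ('d,e');
    INSERT INTO TableB VALUES ('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('x', 9);
""")

# Plain equality: 'a' is compared with 'a,b,c', so nothing matches.
equality_rows = conn.execute("""
    SELECT b.ColumnC
    FROM TableB b
    INNER JOIN TableA a ON b.ColumnB = a.ColumnA
""").fetchall()

# Delimiter-aware match: look for ',a,' inside ',a,b,c,'.
delimited_rows = conn.execute("""
    SELECT DISTINCT b.ColumnC
    FROM TableB b
    INNER JOIN TableA a
      ON instr(',' || a.ColumnA || ',', ',' || b.ColumnB || ',') > 0
    ORDER BY b.ColumnC
""").fetchall()

print(equality_rows)
print(delimited_rows)
conn.close()
```

Wrapping both sides in the delimiter prevents partial matches, such as 'a' matching a list entry 'ab'.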
