MemSQL GEOGRAPHY_DISTANCE, GEOGRAPHY_CONTAINS, GEOGRAPHY_WITHIN_DISTANCE not working and returning NULL - SingleStore

The queries below do not return a value; they return NULL.
SELECT round(GEOGRAPHY_DISTANCE("GEOGRAPHY_POINT(-97.741890, 30.219940)", "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))"),0) FROM DUAL;
SELECT GEOGRAPHY_WITHIN_DISTANCE("GEOGRAPHY_POINT(96.843820, 32.926290)","POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))",1000) from dual;
Can anyone help me make these MemSQL geospatial functions work?

Expanding on Damien_The_Unbeliever's answer above: the GEOGRAPHY_POINT syntax should not be inside a string. You can either define the point inside a string using WKT syntax, POINT(long lat), with a space rather than a comma between the two coordinates, or call GEOGRAPHY_POINT directly (not inside a string).
So, for your first query, which wasn't working:
memsql> SELECT round(GEOGRAPHY_DISTANCE("GEOGRAPHY_POINT(-97.741890, 30.219940)", "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))"),0) FROM DUAL;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| round(GEOGRAPHY_DISTANCE("GEOGRAPHY_POINT(-97.741890, 30.219940)", "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.086939 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| NULL |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.39 sec)
memsql> show warnings;
+---------+------+----------------------------------------------------+
| Level | Code | Message |
+---------+------+----------------------------------------------------+
| Warning | 1862 | You have an error in your WKT syntax at position 0 |
+---------+------+----------------------------------------------------+
1 row in set (0.00 sec)
You can write it correctly in either of these two ways:
memsql> SELECT round(GEOGRAPHY_DISTANCE("POINT(-97.741890 30.219940)", "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))"),0) FROM DUAL;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| round(GEOGRAPHY_DISTANCE("POINT(-97.741890 30.219940)", "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))" |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 291334 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.31 sec)
memsql> SELECT round(GEOGRAPHY_DISTANCE(GEOGRAPHY_POINT(-97.741890, 30.219940), "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925905123))"),0) FROM DUAL;
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| round(GEOGRAPHY_DISTANCE(GEOGRAPHY_POINT(-97.741890, 30.219940), "POLYGON ((-97.11090087890626 33.08693925905123,-96.52862548828126 33.063924198120645,-96.56158447265626 32.80343616698929,-97.06970214843751 32.778037985363675,-97.11090087890626 33.08693925 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 291334 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.54 sec)
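To make the "error in your WKT syntax at position 0" warning concrete, here is a hypothetical, minimal WKT POINT reader in Python. `parse_wkt_point` is an illustrative name, not a SingleStore function, and real WKT parsing covers far more than points. A WKT string has to start with a geometry keyword such as POINT, so text that begins with GEOGRAPHY_POINT fails right at the start, and the two coordinates inside POINT(...) are separated by whitespace, not a comma. The GEOGRAPHY_WITHIN_DISTANCE query in the question passes its point as a quoted GEOGRAPHY_POINT(...) string too, so the same fix applies there.

```python
import re
from typing import Optional, Tuple

# Hypothetical sketch: accepts only "POINT(<lon> <lat>)", i.e. the keyword
# POINT followed by two decimal numbers separated by whitespace.
_NUMBER = r"[-+]?\d+(?:\.\d+)?"
_POINT_RE = re.compile(
    r"\s*POINT\s*\(\s*(" + _NUMBER + r")\s+(" + _NUMBER + r")\s*\)\s*$"
)


def parse_wkt_point(text: str) -> Optional[Tuple[float, float]]:
    """Return (longitude, latitude), or None if text is not a WKT POINT."""
    match = _POINT_RE.match(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


print(parse_wkt_point("POINT(-97.741890 30.219940)"))             # (-97.74189, 30.21994)
print(parse_wkt_point("GEOGRAPHY_POINT(-97.741890, 30.219940)"))  # None: wrong keyword
print(parse_wkt_point("POINT(-97.741890, 30.219940)"))            # None: comma separator
```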

Related

NULLS LAST in ORDER BY in MemSQL

Can someone help me use NULLS LAST in MemSQL? Other RDBMSs have a NULLS LAST option, but MemSQL does not seem to support it.
SingleStore supports it:
singlestore> create table t(a int);
Query OK, 0 rows affected (0.02 sec)
singlestore> insert t values(1),(2),(null),(4);
singlestore> select a from t order by a;
+------+
| a |
+------+
| NULL |
| 1 |
| 2 |
| 4 |
+------+
4 rows in set (0.03 sec)
singlestore> select a from t order by a NULLS LAST;
+------+
| a |
+------+
| 1 |
| 2 |
| 4 |
| NULL |
+------+
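The two result sets above show that a plain ascending ORDER BY puts NULL first, and NULLS LAST moves it to the end. A small Python sketch of the same ordering rule (the helper name `order_by` is hypothetical, and this only mirrors the result, not how SingleStore sorts): the first element of the sort key decides where NULL goes, and the second orders the non-NULL values.

```python
from typing import List, Optional


def order_by(values: List[Optional[int]], nulls_last: bool = False) -> List[Optional[int]]:
    """Sort ascending like ORDER BY a [NULLS LAST], with None standing for NULL."""

    def key(v: Optional[int]):
        is_null = v is None
        # False sorts before True: NULLs go first unless nulls_last is set.
        rank = is_null if nulls_last else not is_null
        # 0 is a placeholder so None is never compared with an int.
        return (rank, 0 if v is None else v)

    return sorted(values, key=key)


print(order_by([1, 2, None, 4]))                   # [None, 1, 2, 4]
print(order_by([1, 2, None, 4], nulls_last=True))  # [1, 2, 4, None]
```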

Correlated Subquery in Spark SQL

I have the following two tables, and I need to check, using a correlated subquery, whether the values in one exist in the other.
The requirement is: for each record in the orders table, check whether its custid is present in the customer table, and output a field named FLAG with value Y if the custid exists, or N if it doesn't.
orders:
orderid | custid
12345 | XYZ
34566 | XYZ
68790 | MNP
59876 | QRS
15620 | UVW
customer:
id | custid
1 | XYZ
2 | UVW
Expected Output:
orderid | custid | FLAG
12345 | XYZ | Y
34566 | XYZ | Y
68790 | MNP | N
59876 | QRS | N
15620 | UVW | Y
I tried something like the following, but couldn't get it to work:
select
o.orderid,
o.custid,
case when o.custid EXISTS (select 1 from customer c on c.custid = o.custid)
then 'Y'
else 'N'
end as flag
from orders o
Can this be solved with a correlated scalar subquery? If not, what is the best way to implement this requirement?
Please advise.
Note: I am using Spark SQL v2.4.0.
Thanks.
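IN/EXISTS predicate subqueries can only be used in a filter (for example, a WHERE clause) in Spark, so they cannot appear inside a CASE expression in the SELECT list. A LEFT JOIN does the job instead.
The following works on a locally recreated copy of your data: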
select orderid, custid, case when existing_customer is null then 'N' else 'Y' end existing_customer
from (select o.orderid, o.custid, c.custid existing_customer
from orders o
left join customer c
on c.custid = o.custid)
Here's how it works with recreated data:
// toDS below needs the encoders from spark.implicits
// (spark-shell already imports them automatically).
import spark.implicits._
// Read a pipe-delimited text table with a header row into a temp view.
def textToView(csv: String, viewName: String) = {
  spark.read
    .option("ignoreLeadingWhiteSpace", "true")
    .option("ignoreTrailingWhiteSpace", "true")
    .option("delimiter", "|")
    .option("header", "true")
    .csv(spark.sparkContext.parallelize(csv.split("\n")).toDS)
    .createOrReplaceTempView(viewName)
}
textToView("""id | custid
1 | XYZ
2 | UVW""", "customer")
textToView("""orderid | custid
12345 | XYZ
34566 | XYZ
68790 | MNP
59876 | QRS
15620 | UVW""", "orders")
// The LEFT JOIN keeps every order; c.custid is NULL when no customer matches.
spark.sql("""
  select orderid, custid, case when existing_customer is null then 'N' else 'Y' end existing_customer
  from (select o.orderid, o.custid, c.custid existing_customer
        from orders o
        left join customer c
        on c.custid = o.custid)""").show
Which returns:
+-------+------+-----------------+
|orderid|custid|existing_customer|
+-------+------+-----------------+
| 59876| QRS| N|
| 12345| XYZ| Y|
| 34566| XYZ| Y|
| 68790| MNP| N|
| 15620| UVW| Y|
+-------+------+-----------------+
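The flag logic itself is a membership test: an order gets Y when its custid appears anywhere in customer. A plain-Python sketch of that rule on the sample rows from the question (`flag_orders` is a hypothetical helper, not part of Spark):

```python
from typing import List, Tuple


def flag_orders(
    orders: List[Tuple[int, str]], customers: List[Tuple[int, str]]
) -> List[Tuple[int, str, str]]:
    """Return (orderid, custid, FLAG) for every order, FLAG being 'Y' or 'N'."""
    # Set of every custid in customer; duplicates collapse, so each order
    # appears exactly once in the output.
    known = {custid for _id, custid in customers}
    return [(orderid, custid, "Y" if custid in known else "N") for orderid, custid in orders]


orders = [(12345, "XYZ"), (34566, "XYZ"), (68790, "MNP"), (59876, "QRS"), (15620, "UVW")]
customers = [(1, "XYZ"), (2, "UVW")]
for row in flag_orders(orders, customers):
    print(row)
```

One design point carries over to the SQL: if customer could contain the same custid more than once, a plain LEFT JOIN would repeat the matching orders, whereas joining against `select distinct custid from customer` keeps one row per order, like the set above. Also note that Spark does not guarantee row order, which is why the `show` output above is not in the input order.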

Trim additional whitespace between the names in PySpark

How to trim the additional spaces present between the names in PySpark dataframe?
Below is my DataFrame:
+----------------------+----------+
|name |account_id|
+----------------------+----------+
| abc xyz pqr | 1 |
| pqm rst | 2 |
+----------------------+----------+
Output I want
+-------------+----------+
|name |account_id|
+-------------+----------+
| abc xyz pqr | 1 |
| pqm rst | 2 |
+-------------+----------+
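I tried using regexp_replace, but it removes the spaces completely. Is there another way to do this? Thanks a lot!
Using regexp_replace with the pattern '\s+' and a single space as the replacement collapses each run of whitespace into one space and gives the expected output:
from pyspark.sql.functions import col, regexp_replace
df = df.withColumn("name", regexp_replace(col("name"), r'\s+', ' '))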
Output
+-----------+----------+
| name |account_id|
+-----------+----------+
|abc xyz pqr| 1 |
| pqm rst| 2 |
+-----------+----------+
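The output above still shows a leading space before "pqm rst": a leading run of whitespace is collapsed to one space, not removed. A small sketch of the same pattern using Python's re module instead of Spark's Java regex engine (for ordinary spaces and tabs, `\s+` behaves the same in both), with an optional strip step; `collapse_spaces` is a hypothetical helper and the sample strings are illustrative:

```python
import re


def collapse_spaces(name: str, strip: bool = False) -> str:
    # Replace every run of whitespace with a single space, like
    # regexp_replace(col("name"), r'\s+', ' ').
    collapsed = re.sub(r"\s+", " ", name)
    # A leading or trailing run still leaves one space behind unless stripped.
    return collapsed.strip() if strip else collapsed


print(repr(collapse_spaces("abc   xyz  pqr")))             # 'abc xyz pqr'
print(repr(collapse_spaces("  pqm    rst")))               # ' pqm rst'
print(repr(collapse_spaces("  pqm    rst", strip=True)))   # 'pqm rst'
```

In PySpark the equivalent of the strip step is wrapping the expression in `trim` from `pyspark.sql.functions`, e.g. `trim(regexp_replace(col("name"), r'\s+', ' '))`.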

SQL Transpose rows to columns (group by key variable)?

I am trying to transpose rows into columns, grouping by a unique identifier (CASE_ID).
I have a table with this structure:
CASE_ID AMOUNT TYPE
100 10 A
100 50 B
100 75 A
200 33 B
200 10 C
And I am trying to query it to produce this structure...
| CASE_ID | AMOUNT1 | TYPE1 | AMOUNT2 | TYPE2 | AMOUNT3 | TYPE3 |
|---------|---------|-------|---------|-------|---------|--------|
| 100 | 10 | A | 50 | B | 75 | A |
| 200 | 33 | B | 10 | C | (null) | (null) |
(Assume a much larger dataset with a large number of possible values for CASE_ID, TYPE and AMOUNT.)
I tried to use PIVOT, but I don't need an aggregate function (I'm simply trying to restructure the data). Now I'm trying to use row_number() somehow, but I'm not sure how.
I'm basically trying to replicate an SPSS command called CASESTOVARS, but I need to be able to do it in SQL. Thanks.
You can get the result by creating a sequential number with row_number() and then using an aggregate function with a CASE expression:
select case_id,
  max(case when seq = 1 then amount end) amount1,
  max(case when seq = 1 then type end) type1,
  max(case when seq = 2 then amount end) amount2,
  max(case when seq = 2 then type end) type2,
  max(case when seq = 3 then amount end) amount3,
  max(case when seq = 3 then type end) type3
from
(
  select case_id, amount, type,
    -- ordering by case_id inside a case_id partition leaves the order of rows
    -- within each case undefined; order by a real sequence or date column if
    -- the table has one
    row_number() over(partition by case_id
                      order by case_id) seq
  from yourtable
) d
group by case_id;
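The query does two things: row_number() numbers the rows within each case, and the max(case when seq = N ...) columns place row N into the Nth amount/type pair, leaving NULL where a case has fewer rows. A Python sketch of the same reshaping on the sample data from the question (`cases_to_vars` and its `width` parameter are hypothetical; `width` stands in for the fixed three column pairs):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int, str]  # (case_id, amount, type)


def cases_to_vars(rows: List[Row], width: int = 3) -> List[list]:
    """Spread each case's rows across amountN/typeN column pairs.

    Rows beyond `width` are dropped and missing slots are padded with None (NULL).
    """
    by_case: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for case_id, amount, type_ in rows:
        # List position acts as row_number() - 1 and follows input order; in SQL,
        # only the ORDER BY inside OVER() determines that order.
        by_case[case_id].append((amount, type_))

    result = []
    for case_id in sorted(by_case):  # GROUP BY case_id
        entries = by_case[case_id]
        wide: list = [case_id]
        for seq in range(width):
            amount, type_ = entries[seq] if seq < len(entries) else (None, None)
            wide.extend([amount, type_])
        result.append(wide)
    return result


rows = [(100, 10, "A"), (100, 50, "B"), (100, 75, "A"), (200, 33, "B"), (200, 10, "C")]
for wide_row in cases_to_vars(rows):
    print(wide_row)
```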
If your database has the PIVOT operator, you can use row_number() with PIVOT, but I would suggest unpivoting the amount and type columns first. The basic syntax for a limited number of values in SQL Server would be:
select case_id, amount1, type1, amount2, type2, amount3, type3
from
(
  select case_id, col+cast(seq as varchar(10)) as col, value
  from
  (
    select case_id, amount, type,
      row_number() over(partition by case_id
                        order by case_id) seq
    from yourtable
  ) d
  cross apply
  (
    select 'amount', cast(amount as varchar(20)) union all
    select 'type', type
  ) c (col, value)
) src
pivot
(
  max(value)
  for col in (amount1, type1, amount2, type2, amount3, type3)
) piv;
If you have an unknown number of values, you can use dynamic SQL to get the result. The SQL Server syntax would be:
DECLARE @cols AS NVARCHAR(MAX),
    @query AS NVARCHAR(MAX)
select @cols = STUFF((SELECT ',' + QUOTENAME(col+cast(seq as varchar(10)))
                    from
                    (
                      select row_number() over(partition by case_id
                                               order by case_id) seq
                      from yourtable
                    ) d
                    cross apply
                    (
                      select 'amount', 1 union all
                      select 'type', 2
                    ) c (col, so)
                    -- seq is selected and ordered on, so it must be grouped too
                    group by col, so, seq
                    order by seq, so
            FOR XML PATH(''), TYPE
            ).value('.', 'NVARCHAR(MAX)')
        ,1,1,'')
set @query = 'SELECT case_id,' + @cols + '
             from
             (
               select case_id, col+cast(seq as varchar(10)) as col, value
               from
               (
                 select case_id, amount, type,
                   row_number() over(partition by case_id
                                     order by case_id) seq
                 from yourtable
               ) d
               cross apply
               (
                 select ''amount'', cast(amount as varchar(20)) union all
                 select ''type'', type
               ) c (col, value)
             ) x
             pivot
             (
               max(value)
               for col in (' + @cols + ')
             ) p '
execute sp_executesql @query;
Each version gives the result:
| CASE_ID | AMOUNT1 | TYPE1 | AMOUNT2 | TYPE2 | AMOUNT3 | TYPE3 |
|---------|---------|-------|---------|-------|---------|--------|
| 100 | 10 | A | 50 | B | 75 | A |
| 200 | 33 | B | 10 | C | (null) | (null) |
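The only dynamic part of the last query is the column list it builds before running the PIVOT. As a rough sketch of what that string ends up holding, the hypothetical Python function `pivot_column_list` below finds the largest number of rows in any case and emits bracketed names ordered by sequence number, amount before type, which is the order the dynamic query asks for with ORDER BY seq, so. It reproduces the result only, not how SQL Server evaluates STUFF, QUOTENAME or FOR XML PATH.

```python
from collections import Counter
from typing import List, Tuple


def pivot_column_list(rows: List[Tuple[int, int, str]]) -> str:
    """Build '[amount1],[type1],...' up to the largest row count of any case."""
    per_case = Counter(case_id for case_id, _, _ in rows)
    max_seq = max(per_case.values(), default=0)
    cols = []
    for seq in range(1, max_seq + 1):       # ORDER BY seq ...
        for name in ("amount", "type"):     # ... then so (amount = 1, type = 2)
            cols.append("[" + name + str(seq) + "]")  # QUOTENAME-style brackets
    # Joining with commas matches STUFF(..., 1, 1, '') dropping the leading comma.
    return ",".join(cols)


rows = [(100, 10, "A"), (100, 50, "B"), (100, 75, "A"), (200, 33, "B"), (200, 10, "C")]
print(pivot_column_list(rows))
```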

Include Summation Row with Group By Clause

Query:
SELECT aType, SUM(Earnings - Expenses) "Rev"
FROM aTable
GROUP BY aType
ORDER BY aType ASC
Results:
| aType | Rev |
| ----- | ----- |
| A | 20 |
| B | 150 |
| C | 250 |
Question:
Is it possible to display a summary row at the bottom such as below by using Sybase syntax within my initial query, or would it have to be a separate query altogether?
| aType | Rev |
| ----- | ----- |
| A | 20 |
| B | 150 |
| C | 250 |
=================
| All | 420 |
I couldn't get the ROLLUP function from other SQL dialects to work in Sybase, and I'm not sure whether there is another way to do this at all.
Thanks!
Have you tried using UNION ALL, similar to this:
select aType, Rev
from
(
  SELECT aType, SUM(Earnings - Expenses) "Rev", 0 SortOrder
  FROM aTable
  GROUP BY aType
  UNION ALL
  SELECT 'All', SUM(Earnings - Expenses) "Rev", 1 SortOrder
  FROM aTable
) src
ORDER BY SortOrder, aType
This gives the result:
| ATYPE | REV |
---------------
| A | 10 |
| B | 150 |
| C | 250 |
| All | 410 |
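The key part of the UNION ALL answer is the extra SortOrder column: 0 for the grouped rows and 1 for the total, so the total sorts last even though 'All' would otherwise land between 'A' and 'B' alphabetically. A small Python sketch of the same idea, using the per-group Rev values from the question as input (`with_total_row` is a hypothetical helper); adding the group subtotals gives the same number as the second branch's SUM(Earnings - Expenses) over the whole table:

```python
from typing import List, Tuple


def with_total_row(groups: List[Tuple[str, int]], label: str = "All") -> List[Tuple[str, int]]:
    """Append a grand-total row after the per-group rows, like the UNION ALL query."""
    # SortOrder 0 marks the grouped rows, 1 marks the total row.
    rows = [(0, a_type, rev) for a_type, rev in groups]
    rows.append((1, label, sum(rev for _, rev in groups)))
    # ORDER BY SortOrder, aType
    rows.sort(key=lambda r: (r[0], r[1]))
    return [(a_type, rev) for _, a_type, rev in rows]


for a_type, rev in with_total_row([("B", 150), ("A", 20), ("C", 250)]):
    print(a_type, rev)
```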
Maybe you can work it out with the COMPUTE clause in Sybase, like this:
create table #tmp1( name char(9), earning int , expense int)
insert into #tmp1 values("A",30,20)
insert into #tmp1 values("B",50,30)
insert into #tmp1 values("C",60,30)
select name, (earning-expense) resv from #tmp1
group by name
order by name,resv
compute sum(earning-expense)
OR
select name, convert(varchar(15),(earning-expense)) resv from #tmp1
group by name
union all
SELECT "------------------","-----"
union all
select "ALL",convert(varchar(15),sum(earning-expense)) from #tmp1
Thanks,
Gopal
Not all versions of Sybase support ROLLUP. You can do it the old-fashioned way:
with t as
(SELECT aType, SUM(Earnings - Expenses) "Rev"
 FROM aTable
 GROUP BY aType
)
select u.*
from ((select aType, rev from t) union all
      (select NULL, sum(rev) from t)
     ) u
ORDER BY (case when atype is NULL then 1 else 0 end), aType ASC
This is the yucky, brute-force approach. If your version of Sybase doesn't support WITH, you can do:
select t.aType, t.Rev
from ((SELECT aType, SUM(Earnings - Expenses) "Rev"
       FROM aTable
       GROUP BY aType
      ) union all
      (SELECT NULL, SUM(Earnings - Expenses)
       FROM aTable)
     ) t
ORDER BY (case when atype is NULL then 1 else 0 end), aType ASC
This is pretty basic, standard SQL.
