I am new to SQL and I am having difficulty creating a query which includes three geometries.
I have a point layer (building) and I want to count all the building points that lie within a maximum distance of 50 meters from the lines (roads) inside a certain polygon (municipal unit), using some extra criteria on two of the three tables.
Here's the table structure:
Table 1: building (id_building, address_name,color_tagged,point)
Table 2: roads (id_road, line)
Table 3: munic_units (id_munic, munic_name, polygon)
I tried the code below and have made a lot of changes, but it still gives me errors.
I would be glad to hear any suggestions. Thank you.
SELECT address_name, count(*) AS Frequency,munic_name,color_tagged
FROM building,roads,munic_units
WHERE ST_CONTAINS((SELECT polygon FROM munic_units WHERE munic_name=''),(ST_DWithin((SELECT point FROM building WHERE color_tagged=''),line,50)))
GROUP BY address_name,munic_name,color_tagged
ORDER BY Frequency DESC,address_name DESC;
First, I tried a simpler version with two geometries:
SELECT address_name, count(*) AS Frequency,color_tagged
FROM building,roads,loc_munic_units
WHERE ST_Dwithin(point,roads_geom,50) AND color_tagged='YELLOW'
GROUP BY address_name,color_tagged
ORDER BY Frequency DESC,address_name DESC;
and it returns the expected result (screenshot omitted): at this stage the goal was to find all the buildings with color_tagged 'YELLOW' within a distance of 50 meters from roads. The final desired result is to run the previous search only in a specific area, one polygon. The structure of the tables is as shown above.
You are close to the solution. The first query fails because you are not properly joining the tables. The second query likely returns over-estimated counts because you do a cross join with the municipality table.
Starting from the second query, you can add the missing join condition (and write the joins properly):
SELECT b.address_name, count(*) AS Frequency, b.color_tagged
FROM building b                                               -- select from buildings
JOIN roads r ON ST_DWithin(b.point, r.roads_geom, 50)         -- when the building is within 50 m of a road
JOIN loc_munic_units m ON ST_Within(r.roads_geom, m.polygon)  -- and when the road is within a municipality polygon
WHERE b.color_tagged = 'YELLOW' AND m.munic_name = 'abc'      -- restrict to a specific municipality (or more than one) and color
GROUP BY b.address_name, b.color_tagged
ORDER BY Frequency DESC, address_name DESC;
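One caveat: ST_DWithin compares distances in the units of the geometries' SRID, so 50 only means 50 meters if the layers use a projected, metric coordinate system. If the data is instead stored as lon/lat, casting to geography makes the tolerance metric. A minimal sketch, assuming PostGIS and EPSG:4326 (lon/lat) data:

SELECT b.address_name, count(*) AS Frequency, b.color_tagged
FROM building b
JOIN roads r ON ST_DWithin(b.point::geography, r.roads_geom::geography, 50)  -- 50 is now meters on the spheroid
JOIN loc_munic_units m ON ST_Within(r.roads_geom, m.polygon)
WHERE b.color_tagged = 'YELLOW' AND m.munic_name = 'abc'
GROUP BY b.address_name, b.color_tagged
ORDER BY Frequency DESC, address_name DESC;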
What is the correct behavior of the last and last_value functions in Apache Spark/Databricks SQL? The way I'm reading the documentation (here: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/functions.html), it sounds like they should return the last value of whatever is in the expression.
So if I have a select statement that does something like
select
person,
last(team)
from
(select * from person_team order by date_joined)
group by person
I should get the last team a person joined, yes/no?
The actual query I'm running is shown below. It is returning a different number each time I execute the query.
select count(distinct patient_id) from (
select
patient_id,
org_patient_id,
last_value(data_lot) data_lot
from
(select * from my_table order by data_lot)
where 1=1
and org = 'my_org'
group by 1,2
order by 1,2
)
where data_lot in ('2021-01','2021-02')
;
What is the correct way to get the last value for a given field (for either the team example or my specific example)?
--- EDIT -------------------
I'm thinking collect_set might be useful here, but I get the error shown when I try to run this:
select
patient_id,
last_value(collect_set(data_lot)) data_lot
from
covid.demo
group by patient_id
;
Error in SQL statement: AnalysisException: It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.;;
Aggregate [patient_id#89338], [patient_id#89338, last_value(collect_set(data_lot#89342, 0, 0), false) AS data_lot#91848]
+- SubqueryAlias spark_catalog.covid.demo
The posts shown below discuss how to get max values, but that's not the same as the last value in a list ordered by a different field: I want the last team a player joined. The player may have joined the Reds, the A's, the Zebras, and the Yankees, in that order time-wise; I'm looking for the Yankees. Those posts also reach the solution procedurally using Python/R; I'd like to do this in SQL.
Getting last value of group in Spark
Find maximum row per group in Spark DataFrame
--- SECOND EDIT -------------------
I ended up using something like this based upon the accepted answer.
select
row_number() over (order by provided_date, data_lot) as row_num,
demo.*
from demo
You can assign row numbers based on an ordering on data_lot if you want to get its last value:
select count(distinct patient_id) from (
select * from (
select *,
row_number() over (partition by patient_id, org_patient_id, org order by data_lot desc) as rn
from my_table
where org = 'my_org'
)
where rn = 1
)
where data_lot in ('2021-01','2021-02');
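If your runtime is Spark 3.0+ (an assumption; the question links the 2.x docs, so check availability first), the max_by aggregate expresses "the value of one column at the row where another column is greatest" directly, which answers the team example without a window function:

select
  person,
  max_by(team, date_joined) as last_team  -- team on the row with the latest date_joined
from person_team
group by person;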
We use UOM conversions at this client. We stock in Eaches and sell in Cases. The problem we are having with the Pick ticket is that both the quantity to be picked and the UOM being picked are the stocking unit and not the selling unit.
e.g. The customer orders 73 cases (12 ea per case). The pick ticket prints 876 each. This requires the warehouse person to look up each item, determine if there is a selling UOM and ratio, and then manually convert 876 eaches to 73 cases.
Obviously, the pick ticket should print 73 cases, but I cannot find a way to do this. The items are lotted, and an order of 73 cases might have 50 cases of Lot A and 23 cases of Lot B. This is represented in the SOShipLineSplit table. The quantities and UOM in this table are based on stocking units.
Ideally, I could join the INUnit table to both the SOShipLine and SOShipLineSplit tables. See below.
SELECT CASE WHEN ISNULL(U.UnitRate, 0) = 0 THEN S.Qty ELSE S.Qty / U.UnitRate END AS ShipQty,
       CASE WHEN ISNULL(U.UnitRate, 0) = 0 THEN S.UOM ELSE U.FromUnit END AS UOM
FROM SOShipLineSplit S
INNER JOIN SOShipLine SL
  ON S.CompanyID = SL.CompanyID AND S.ShipmentNbr = SL.ShipmentNbr AND S.LineNbr = SL.LineNbr AND S.InventoryID = SL.InventoryID
LEFT OUTER JOIN INUnit U
  ON S.CompanyID = U.CompanyID AND S.InventoryID = U.InventoryID AND S.UOM = U.ToUnit AND SL.UOM = U.FromUnit
WHERE S.ShipmentNbr = '000161' AND S.CompanyID = 4
The problem is that the Acumatica report writer does not support a join condition that references multiple tables:
LEFT OUTER JOIN INUnit U
  ON S.CompanyID = U.CompanyID AND S.InventoryID = U.InventoryID AND S.UOM = U.ToUnit AND SL.UOM = U.FromUnit
I believe I must be missing something. This cannot be the only client using Acumatica who utilizes Selling Units of Measure. Is there another table I could use that would contain the quantities and UOM already converted for this order to Selling Units?
Or another solution?
Thanks in advance.
pat
EDIT:
If the goal is to display accurate quantities before/after conversion, then the INUnit DAC can't be used. It doesn't store historical data: you can change INUnit values after an order has been finalized, so re-using it to compute quantities will not yield accurate results.
For that scenario you would need to use the historical data fields with Base prefixes, like ShippedQuantity/BaseShippedQuantity. If you need to store more historical data, you have to add a custom field to hold these values and update it when a shipment is created or modified.
The main issue appears to be a logical error in the requirement:
The problem is that the INUnit table has to be joined to BOTH the
SOShipLine and the SOShipLineSplit tables.
The INUnit DAC has a single parent, not two, so you need to change your requirement to reflect that constraint.
If the SOShipLine and SOShipLineSplit values differ, you'll never get any record.
If they are identical, there's no need to join on both since they have the same value.
I suggest adding two joins, one for SOShipLine and another for SOShipLineSplit. In the report you can then choose which one to display (the first, the second, or both).
You can also add visibility conditions or an IIF formula condition in the report if you want to handle null values for display purposes.
Use the Child Alias property in the Schema Builder to join the same table twice without name conflicts. In the report formulas (to display a field or in formula conditions), use the Child Alias table name too.
Example:
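In plain SQL, the two relationships would look roughly like the sketch below; it reuses the join keys from the query above, and the aliases US/UL are made up for illustration. In the Report Designer you express the same thing as two relationships with distinct Child Aliases.

FROM SOShipLineSplit S
INNER JOIN SOShipLine SL
  ON S.CompanyID = SL.CompanyID AND S.ShipmentNbr = SL.ShipmentNbr AND S.LineNbr = SL.LineNbr AND S.InventoryID = SL.InventoryID
LEFT OUTER JOIN INUnit US  -- unit conversion for the split line
  ON S.CompanyID = US.CompanyID AND S.InventoryID = US.InventoryID AND S.UOM = US.ToUnit
LEFT OUTER JOIN INUnit UL  -- unit conversion for the shipment line
  ON SL.CompanyID = UL.CompanyID AND SL.InventoryID = UL.InventoryID AND SL.UOM = UL.FromUnit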
In Excel connected to SSAS, I am trying to build a pivot table and add a custom measure calculation using "OLAP Tools" and/or "OLAP Pivot Table Extensions". I am trying to add a calculation that is really simple in my mind, but I cannot get it to work. The calc I need is:
GOAL: A record count of the [Items] dimension records grouped by any of the
[Items] dimension fields.
In particular I am trying to group by [Items].[Items Groups] and [Items].[Item]. Item is the lowest grain, so the count should return the value "1". I have created a couple of calculations that are kind of in the ballpark (see below), but the calcs don't appear to be working as desired.
What I have tried:
Attempt #1 -- [Measures].[Items Count (With net amount values)]
DISTINCTCOUNT( {[Items].[Item].MEMBERS} )
The calc 'Items Count (With net amount values)' appears to be returning a decent count value, but it seems to count an item only if transactional records are found (not sure why). Also, at the lowest grain level the calc returns the value for the parent group, not for the dimension level selected on the rows.
Attempt #2 -- [Measures].[Items Count (All)]
[Items].[Item].[Item].Count
This calc returns the TOTAL item count for the entire dimension
regardless of the dimension level placed on the rows.
Attempt #3 -- [Measures].[Items Count]
COUNT ( { [Items].[Item].MEMBERS}, EXCLUDEEMPTY)
This calc freezes up Excel and I have to quit Excel; no idea why. I have seen this syntax recommended on a few different sites.
Help please? This seems really simple, but I am not very skilled with MDX. In DAX and SSAS Tabular this would be a very simple expression, but I'm struggling to count the rows with MDX in SSAS MD.
The "Outside Purchased Beef" group has 18 items with transactions, but 41 items in total. I do not know how to calculate the "41" value.
(Screenshot: SSAS Excel-CalcMeasure-CountRows.png)
Take a look at the following samples on AdventureWorks.
with member [Measures].[CountTest]
as
count(existing [Product].[Subcategory].members - [Product].[Subcategory].[All])
select
{
[Measures].[Internet Sales Amount],[Measures].[CountTest]
}
on columns,
{
([Product].[Category].[Category]
,[Product].[Subcategory].[Subcategory] -- comment this line for the second result
)
}
on rows
from [Adventure Works]
Now comment the indicated line for the parent view.
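The trick is the existing keyword: it forces [Product].[Subcategory].members to be evaluated in the current cell context, so only the subcategories under the category on the current row are counted, and subtracting the [All] member keeps it out of the count. Without existing, the set spans the whole attribute hierarchy, which is why Attempt #2 returned the total item count on every row.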
I am trying to do a left join on my main table using this code:
select distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
VBen.BENF_COD_SEXE AS Sexe,
VBen.BENF_DAT_NAISS AS DatNaiss,
VBen.BENF_DAT_DECES AS DatDec,
A.date_ch as date_chsld
from PROD.V_FICH_ID_BEN_CM AS VBen
left join (select distinct VAss.BENF_NO_INDIV_BEN_BANLS as benbanls,
vass.BENF_DD_ADMIS_ASSU_MED as date_ch
from Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as vass ) as A
on VBen.BENF_NO_INDIV_BEN_BANLS =A. benbanls
where Vben.BENF_DAT_NAISS>'2016-04-01' or Vben.BENF_DAT_DECES>'2011-04-01'
The problem is that the query result has more rows than the main table, with the same WHERE condition. I don't understand what I am missing.
Thanks for your help
Why is it a problem?
The results simply indicate you have a 1:M (one-to-many) relationship between VBen and Vass (A).
If you don't have a 1:M relationship and it should be 1:1 then...
you're missing join criteria between the tables.
you should be getting a min/max on your date instead of all dates per benbanls
To better understand and answer, we would need to know what VBen and Vass actually represent; but to put it simply, you have multiple VASS (A) records per VBEN.
To illustrate with an example: Think about Order_Header and Order_Line tables...
Order_header contains (order_Number PK)
Order_line contains (Order_Number, Order_Line PK)
An order can have multiple lines; each line could have its own ship date. Several items may have gone out on the same shipment/day, while some that were backordered went out on a different day. In this situation, an order would still have multiple lines even though we select distinct order_number and ship date in a subquery. I would guess your situation is similar.
so 1 in base table * 2 rows in derived/lines table gives us 2 records
1 < 2 which is the situation you have now; and that to me is perfectly fine and expected if it's a 1:M relationship.
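To make the multiplication concrete, here is a hypothetical sketch (table and column names invented for the illustration):

-- Order_Header: one row (order 100)
-- Order_Line:   two rows (order 100, lines 1 and 2)
select h.order_number, l.order_line
from Order_Header h
left join Order_Line l
  on l.order_number = h.order_number;
-- returns two rows for the single header row: 1 * 2 = 2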
Maybe you need to take a min or max on the date instead of a distinct?
If not, you're missing join criteria to make it a 1:1 relationship.
Or maybe your expectation is just flawed.
The below will give you a 1:1 relationship but I'm not sure it's what you're after.
SELECT distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
       VBen.BENF_COD_SEXE AS Sexe,
       VBen.BENF_DAT_NAISS AS DatNaiss,
       VBen.BENF_DAT_DECES AS DatDec,
       A.date_ch as date_chsld
FROM PROD.V_FICH_ID_BEN_CM AS VBen
LEFT JOIN (SELECT VAss.BENF_NO_INDIV_BEN_BANLS as benbanls,
                  MAX(VAss.BENF_DD_ADMIS_ASSU_MED) as date_ch
           FROM Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as VAss
           GROUP BY VAss.BENF_NO_INDIV_BEN_BANLS) as A
  ON VBen.BENF_NO_INDIV_BEN_BANLS = A.benbanls
WHERE (VBen.BENF_DAT_NAISS > '2016-04-01'
   OR VBen.BENF_DAT_DECES > '2011-04-01')
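The GROUP BY with MAX collapses the detail table to one row per benbanls before the join, so each row of the main table can match at most one row of A; that is what restores the 1:1 relationship.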
It is likely that there is more than one counterpart in the detail table for a record in the main table.
I tried your scenario on my DB and got the correct result.
In my DB:
select distinct p.PollId as PollId,
       p.Title AS Title,
       p.InsertDate AS DatDec,
       ps.date_ch as date_chsld
from dbo.Poll AS p
left join (select pSt.PollId as pollId,
                  Max(pSt.InsertDate) as date_ch
           from dbo.PollStore as pSt
           group by pSt.PollId) as ps
  on p.PollId = ps.pollId
Applied to your query, please try this:

select distinct VBen.BENF_NO_INDIV_BEN_BANLS as benbanls,
       VBen.BENF_COD_SEXE AS Sexe,
       VBen.BENF_DAT_NAISS AS DatNaiss,
       VBen.BENF_DAT_DECES AS DatDec,
       A.date_ch as date_chsld
from PROD.V_FICH_ID_BEN_CM AS VBen
left join (select VAss.BENF_NO_INDIV_BEN_BANLS as benbanls,
                  Max(VAss.BENF_DD_ADMIS_ASSU_MED) as date_ch
           from Prod.V_ADMIS_ASSU_MED_PLAN_PRIOR_CM as VAss
           group by VAss.BENF_NO_INDIV_BEN_BANLS) as A
  on VBen.BENF_NO_INDIV_BEN_BANLS = A.benbanls
where VBen.BENF_DAT_NAISS > '2016-04-01' or VBen.BENF_DAT_DECES > '2011-04-01'
Problem Statement:
I have two tables: Data (40 cols) and LookUp (2 cols). I need to use col10 in the Data table together with the LookUp table to extract the relevant value.
However, I cannot make an equi join. I need a join based on like/contains, as values in the lookup table contain only partial content of the value in the Data table, not the complete value. Hence some regex-based matching is required.
Data Size:
Data table: approx. 2.3 billion entries (1 TB of data)
Lookup table: approx. 1.4 million entries (50 MB of data)
Approach 1: Using the database (I am using Google BigQuery). A join based on LIKE takes close to 3 hrs, yet returns no result. I believe a regex-based join leads to a Cartesian join.
Approach 2: Using Apache Beam/Spark. I tried to construct a trie for the lookup table, which is then shared/broadcast to the worker nodes; I use the trie to extract the longest matching prefix. However, with this approach I am getting OOM errors because I am creating too many Strings. I tried increasing memory to 4 GB+ per worker node, but to no avail.
I am open to using other technologies like Apache Spark, Redis, etc.
Please suggest how I can go about handling this problem. This processing needs to be performed on a day-to-day basis, so both time and resources need to be optimized.
However I cannot make equi join
Below is just to give you an idea to explore for addressing your equi-join issue in pure BigQuery.
It is based on an assumption I derived from your comments, and covers the use case where you are looking for the longest match from the very right to the left; matches in the middle are not qualified.
The approach is to reverse both the url (col10) and shortened_url (col2) fields, then SPLIT() them and UNNEST() with positions preserved:
UNNEST(SPLIT(REVERSE(field), '.')) part WITH OFFSET position
With this done, you can now do an equi join, which can potentially address your issue to some extent.
So, you JOIN by parts and positions, then GROUP BY the original url and shortened_url, keeping only those groups HAVING a count of matches equal to the count of parts in shortened_url; finally you GROUP BY url and keep only the entry with the highest number of matching parts.
Hope this can help :o)
This is for BigQuery Standard SQL
#standardSQL
WITH data_table AS (
SELECT 'cn456.abcd.tech.com' url UNION ALL
SELECT 'cn457.abc.tech.com' UNION ALL
SELECT 'cn458.ab.com'
), lookup_table AS (
SELECT 'tech.com' shortened_url, 1 val UNION ALL
SELECT 'abcd.tech.com', 2
), data_table_parts AS (
SELECT url, x, y
FROM data_table, UNNEST(SPLIT(REVERSE(url), '.')) x WITH OFFSET y
), lookup_table_parts AS (
SELECT shortened_url, a, b, val,
ARRAY_LENGTH(SPLIT(REVERSE(shortened_url), '.')) len
FROM lookup_table, UNNEST(SPLIT(REVERSE(shortened_url), '.')) a WITH OFFSET b
)
SELECT url,
ARRAY_AGG(STRUCT(shortened_url, val) ORDER BY weight DESC LIMIT 1)[OFFSET(0)].*
FROM (
SELECT url, shortened_url, COUNT(1) weight, ANY_VALUE(val) val
FROM data_table_parts d
JOIN lookup_table_parts l
ON x = a AND y = b
GROUP BY url, shortened_url
HAVING weight = ANY_VALUE(len)
)
GROUP BY url
with this result:

Row  url                  shortened_url  val
1    cn457.abc.tech.com   tech.com       1
2    cn456.abcd.tech.com  abcd.tech.com  2
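Note how cn456.abcd.tech.com resolves to abcd.tech.com (val 2) rather than tech.com: both lookup entries match as full suffixes, and the outer ARRAY_AGG(... ORDER BY weight DESC LIMIT 1) keeps the one with the most matching parts, i.e. the longest suffix.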