Writing a subquery by filtering dates

I would like to write a query and/or subquery that finds records where the effective_time of Event_Name - Number1 (the queue event) is < 'date' and the effective_time of each remaining event_name is > 'date', and then populate the results based on the WHERE clause.
Each event_name has an effective_time associated with it.
Any ideas?
To restate: I'm interested in cases where the queue event is < 'date' and all other events are > 'date'.
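SELECT
Table1.ID
, SUM(CASE WHEN Table1.event_name = 'queue' THEN 1 ELSE 0 END) AS Name1
, SUM(CASE WHEN Table1.event_name = 'cash_settle' THEN 1 ELSE 0 END) AS Name2
, SUM(CASE WHEN Table1.event_name = 'complete_disbursement' THEN 1 ELSE 0 END) AS Name3
FROM Table1
GROUP BY Table1.ID
-- Column aliases cannot be referenced in WHERE, so the aggregates are filtered in HAVING
HAVING SUM(CASE WHEN Table1.event_name = 'complete_disbursement' THEN 1 ELSE 0 END) = 1
AND SUM(CASE WHEN Table1.event_name = 'cash_settle' THEN 1 ELSE 0 END) = 0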
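One possible way to express the date condition is conditional aggregation per ID: take the latest queue time and the earliest time of every other event, then compare both to the cutoff in HAVING. The sketch below runs against an in-memory SQLite database from Python. The table name events, the sample rows, the cutoff value, and the ISO date strings are all assumptions made for illustration, not details from the question.

```python
import sqlite3

# Hypothetical sample data: (id, event_name, effective_time as ISO date text)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, event_name TEXT, effective_time TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "queue", "2021-01-01"),
        (1, "cash_settle", "2021-02-10"),
        (1, "complete_disbursement", "2021-02-15"),
        (2, "queue", "2021-01-05"),
        (2, "cash_settle", "2021-01-20"),  # before the cutoff, so id 2 is excluded
        (2, "complete_disbursement", "2021-02-20"),
        (3, "queue", "2021-02-05"),  # queue after the cutoff, so id 3 is excluded
        (3, "cash_settle", "2021-02-12"),
    ],
)

CUTOFF = "2021-02-01"  # assumed cutoff date

query = """
    SELECT id
    FROM events
    GROUP BY id
    -- latest queue event must be before the cutoff
    HAVING MAX(CASE WHEN event_name = 'queue' THEN effective_time END) < :cutoff
       -- earliest non-queue event must be after the cutoff
       AND MIN(CASE WHEN event_name <> 'queue' THEN effective_time END) > :cutoff
    ORDER BY id
"""
matching_ids = [row[0] for row in conn.execute(query, {"cutoff": CUTOFF})]
print(matching_ids)
```

An ID with no queue event, or with no other events at all, yields NULL in one of the comparisons and is therefore left out of the result.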

Related

Pyspark How to create columns and fill True/False if rolling datetime record exists

The dataset contains products with a daily record, but sometimes a day is missing, so I want to create extra columns that show whether a record exists for each of the past few days.
I have the conditions below:
Create T-1, T-2 and so on columns and fill them as follows:
Fill T-1 with 1 if the record exists, otherwise 0 (and likewise for T-2 and so on).
Original table:
Item  Cat  DateTime   Value
A     C1   1-1-2021   10
A     C1   2-1-2021   10
A     C1   3-1-2021   10
A     C1   4-1-2021   10
A     C1   5-1-2021   10
A     C1   6-1-2021   10
B     C1   1-1-2021   20
B     C1   4-1-2021   20
Expected result:
Item  Cat  DateTime   Value  T-1  T-2  T-3  T-4  T-5
A     C1   1-1-2021   10     0    0    0    0    0
A     C1   2-1-2021   10     1    0    0    0    0    (T-1 is 1 because we have the 1-1-2021 record)
A     C1   3-1-2021   10     1    1    0    0    0
A     C1   4-1-2021   10     1    1    1    0    0
A     C1   5-1-2021   10     1    1    1    1    0
A     C1   6-1-2021   10     1    1    1    1    1
B     C1   1-1-2021   20     0    0    0    0    0
B     C1   2-1-2021   0      1    0    0    0    0    (the 2-1-2021 record needs to be created with value zero because it is missing from the original dataset; T-1 is 1 because the previous day's record exists in the original dataset)
B     C1   3-1-2021   0      0    1    0    0    0
B     C1   4-1-2021   20     0    0    1    0    0
B     C1   5-1-2021   0      1    0    0    1    0
Let's assume you have the original table data stored in original_data. We can:
1. Create a temporary view named daily_records to query with Spark SQL.
2. Generate the possible dates. This is done by computing the number of days between the min and max dates in the dataset, then generating the possible dates using the table-generating function posexplode together with space.
3. Generate all possible item, date records.
4. Join these records with the actual records to get a complete dataset with values.
5. Use Spark SQL to query the view and create the additional columns using left joins and CASE statements.
# Step 1
original_data.createOrReplaceTempView("daily_records")
# Step 2-4
daily_records = sparkSession.sql("""
WITH date_bounds AS (
    SELECT min(DateTime) as mindate, max(DateTime) as maxdate FROM daily_records
),
possible_dates AS (
    SELECT
        date_add(mindate,index.pos) as DateTime
    FROM
        date_bounds
        lateral view posexplode(split(space(datediff(maxdate,mindate)),"")) index
),
unique_items AS (
    SELECT DISTINCT Item, Cat from daily_records
),
possible__item_dates AS (
    SELECT Item, Cat, DateTime FROM unique_items INNER JOIN possible_dates ON 1=1
),
possible_records AS (
    SELECT
        p.Item,
        p.Cat,
        p.DateTime,
        r.Value
    FROM
        possible__item_dates p
    LEFT JOIN
        daily_records r on p.Item = r.Item and p.DateTime = r.DateTime
)
select * from possible_records
""")
daily_records.createOrReplaceTempView("daily_records")
daily_records.show()
# Step 5 - store results in desired_result
# This is optional, but I have chosen to generate the sql to create this dataframe
periods = 5 # Number of periods to check for
period_columns = ",".join(["""
CASE
WHEN t{0}.Value IS NULL THEN 0
ELSE 1
END as `T-{0}`
""".format(i) for i in range(1,periods+1)])
period_joins = " ".join(["""
LEFT JOIN
daily_records t{0} on datediff(to_date(t.DateTime),to_date(t{0}.DateTime))={0} and t.Item = t{0}.Item
""".format(i) for i in range(1,periods+1)])
period_sql = """
SELECT
t.*
{0}
FROM
daily_records t
{1}
ORDER BY
Item, DateTime
""".format(
"" if len(period_columns)==0 else ",{0}".format(period_columns),
period_joins
)
desired_result= sparkSession.sql(period_sql)
desired_result.show()
Actual SQL generated:
SELECT
t.*,
CASE
WHEN t1.Value IS NULL THEN 0
ELSE 1
END as `T-1`,
CASE
WHEN t2.Value IS NULL THEN 0
ELSE 1
END as `T-2`,
CASE
WHEN t3.Value IS NULL THEN 0
ELSE 1
END as `T-3`,
CASE
WHEN t4.Value IS NULL THEN 0
ELSE 1
END as `T-4`,
CASE
WHEN t5.Value IS NULL THEN 0
ELSE 1
END as `T-5`
FROM
daily_records t
LEFT JOIN
daily_records t1 on datediff(to_date(t.DateTime),to_date(t1.DateTime))=1 and t.Item = t1.Item
LEFT JOIN
daily_records t2 on datediff(to_date(t.DateTime),to_date(t2.DateTime))=2 and t.Item = t2.Item
LEFT JOIN
daily_records t3 on datediff(to_date(t.DateTime),to_date(t3.DateTime))=3 and t.Item = t3.Item
LEFT JOIN
daily_records t4 on datediff(to_date(t.DateTime),to_date(t4.DateTime))=4 and t.Item = t4.Item
LEFT JOIN
daily_records t5 on datediff(to_date(t.DateTime),to_date(t5.DateTime))=5 and t.Item = t5.Item
ORDER BY
Item, DateTime
NB. to_date is optional if DateTime is already formatted as a date field or in the format yyyy-mm-dd
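For readers who want to check the logic without a Spark session, the same idea (fill in the missing item/date rows, then flag whether a record existed n days earlier) can be sketched in plain Python. This is only an illustration under assumptions: the dates in the question are read as day-month-year, missing rows get value 0 as in the expected result, and the names records and add_lookback_flags are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical in-memory copy of the question's table: (item, cat, day, value).
# Assumption: "2-1-2021" in the question means 2 January 2021.
records = [("A", "C1", date(2021, 1, d), 10) for d in range(1, 7)] + [
    ("B", "C1", date(2021, 1, 1), 20),
    ("B", "C1", date(2021, 1, 4), 20),
]

PERIODS = 5  # number of look-back columns (T-1 .. T-5)


def add_lookback_flags(records, periods=PERIODS):
    """Fill missing (item, day) rows with value 0 and append T-1..T-n flags."""
    days = [r[2] for r in records]
    first, last = min(days), max(days)
    # Every calendar day between the earliest and latest date in the data
    all_days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    actual = {(item, day): value for item, _cat, day, value in records}
    cats = {item: cat for item, cat, _day, _value in records}

    result = []
    for item in sorted(cats):
        for day in all_days:
            # Flag is 1 only if an original record exists n days earlier
            flags = [
                1 if (item, day - timedelta(days=n)) in actual else 0
                for n in range(1, periods + 1)
            ]
            result.append((item, cats[item], day, actual.get((item, day), 0), *flags))
    return result


rows = add_lookback_flags(records)
for row in rows:
    print(row)
```

Like the SQL above, this generates rows for every item across the full date range of the dataset, so item B also gets a row for the last date.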

Update two dataframe column based on condition

I have a DataFrame with the columns 'PK', 'Column1', 'Column2'.
I want to update Column1 and Column2 as follows:
If Column1 > Column2, then Column1 = Column1 - Column2 and, at the same time, Column2 = 0.
Similarly, if Column1 < Column2, then Column2 = Column2 - Column1 and, at the same time, Column1 = 0.
I have tried the following, but it does not give the expected result:
df["Column1"] = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
df["Column2"] = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
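The second line of your code tests Column1 after it has already been overwritten. Compute both arrays from the original values first, then use DataFrame.assign to set both columns at once:
import numpy as np
import pandas as pd

df = pd.DataFrame({
'Column1':[4,5,4,5,5,4],
'Column2':[7,8,9,4,2,3],
})
print (df)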
Column1 Column2
0 4 7
1 5 8
2 4 9
3 5 4
4 5 2
5 4 3
a = np.where(df['Column1'] > df['Column2'], df['Column1'] - df['Column2'], 0)
b = np.where(df['Column1'] < df['Column2'], df['Column2'] - df['Column1'], 0)
df = df.assign(Column1 = a, Column2 = b)
print (df)
Column1 Column2
0 0 3
1 0 3
2 0 5
3 1 0
4 3 0
5 1 0
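The key point is that both new columns must be derived from the original values before either one is overwritten. A minimal plain-Python sketch of the same rule, using lists instead of a DataFrame (the helper name net_off is hypothetical), might look like this:

```python
def net_off(col1, col2):
    """Return the updated (col1, col2) lists, both computed from the original values."""
    # The larger side keeps the difference, the smaller side becomes 0
    new1 = [max(a - b, 0) for a, b in zip(col1, col2)]
    new2 = [max(b - a, 0) for a, b in zip(col1, col2)]
    return new1, new2


column1 = [4, 5, 4, 5, 5, 4]
column2 = [7, 8, 9, 4, 2, 3]
new_column1, new_column2 = net_off(column1, column2)
print(new_column1)
print(new_column2)
```

With the sample data from the answer, this gives the same values as the DataFrame printed above. Rows where the two columns are equal end up as 0 in both, which is also what the np.where version produces.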

Sub queries in Apache Hive

I have a table with following structure
col1 col2 col3 col4 category
300 200 100 20 1
200 100 30 300 2
400 100 100 70 1
100 30 200 100 1
Now I am trying to calculate, for col1, what % of total rows have a value <= 100; for col2, what % of total rows have a value <= 50; and so on. I only want to include rows where category is 1,
so the resulting table should look like this:
col1(<=100) col2(<=50)
x% x%
I tried something like this, but I don't know how to write a subquery for it:
SELECT COUNT(*) AS Total, COUNT(value1)* 100 /Total) AS col1(<=100) FROM table1 WHERE Category=1 GROUP BY value1 HAVING value1 <=100
It looks like I need multiple SELECT queries. Please help.
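You can try using a CASE expression as below. No subquery is needed; the thresholds and the category filter follow the question:
SELECT 100 * SUM(CASE WHEN col1 <= 100 THEN 1 ELSE 0 END) / COUNT(col1) AS pct_col1,
100 * SUM(CASE WHEN col2 <= 50 THEN 1 ELSE 0 END) / COUNT(col2) AS pct_col2
FROM table1
WHERE category = 1;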
Thanks
Arani
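To try the CASE approach without a Hive cluster, here is a small sketch that runs the same kind of query on the question's sample rows in an in-memory SQLite database from Python. It is an illustration rather than Hive code: SQLite does integer division on integers, so the sketch multiplies by 100.0 (an adjustment assumed for SQLite only), and it uses COUNT(*) to count all category-1 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER, category INTEGER)"
)
# Sample rows copied from the question
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (300, 200, 100, 20, 1),
        (200, 100, 30, 300, 2),
        (400, 100, 100, 70, 1),
        (100, 30, 200, 100, 1),
    ],
)

pct_col1, pct_col2 = conn.execute(
    """
    SELECT 100.0 * SUM(CASE WHEN col1 <= 100 THEN 1 ELSE 0 END) / COUNT(*) AS pct_col1,
           100.0 * SUM(CASE WHEN col2 <= 50 THEN 1 ELSE 0 END) / COUNT(*) AS pct_col2
    FROM table1
    WHERE category = 1
    """
).fetchone()
print(round(pct_col1, 2), round(pct_col2, 2))
```

In the sample data, one of the three category-1 rows meets each threshold, so both percentages come out at roughly one third.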

Replace suffix/ending on the Product name in query for grouping

In our database system we have some products that have the same name but have different endings.
Examples:
2012 2013
Produkt namn xyz ab 2002 0 2
Produkt namn xyz cd 2004 5 3
Produkt namn xyz ef 2002 2 1
Produkt namn xyz gh 2006 3 0
We would like to group them under one name.
Produkt namn xyz 10 6
We have created a query (a stored procedure) that is the source of a report. I used the REPLACE function to group similar products, but in this case there are too many different variants, so it would be useful to use a wildcard with REPLACE, like this:
REPLACE(product_name, 'Product name xyz%', 'Product name xyz')
Then we could group similar products under one product name.
Unfortunately it does not work that way.
Does anyone have a good idea on how to solve it?
EDIT:
Perhaps I was unclear. We have a lot of different products with different names, and the names have different lengths.
New example list:
2012 2013
units sold units sold
Produkt namn kkkkkkk mmmm 7 9
Produkt namn xyz ab 2002 0 2
Produkt namn xyz cd 2004 5 3
Produkt namn AAAAA MMMM NN 6 8
Produkt namn xyz ef 2002 2 1
Produkt namn xyz gh 2006 3 0
Produkt namn ABC 123 anything 4 9
Find the trailing part of this particular product's name (for example " cd 2004" in "Produkt namn xyz cd 2004") and replace it with nothing, so that the products are grouped under one name for the summary:
Produkt namn xyz 10 6
My code so far:
ALTER PROC [dbo].[uspOnAndOffTrade]
(
@PeriodStart AS INT
, @PeriodEnd AS INT
, @CurPerStart AS INT -- New 140510
, @PrePerEnd AS INT -- New 140510
, @PreYr AS INT = NULL
, @CurYr AS INT = NULL
, @FamiljTyp AS INT = NULL
, @ProdNo AS INT = NULL
, @ProdDescr AS VARCHAR(MAX) = NULL
, @ProducentName AS VARCHAR(MAX) = NULL
, @CustomerName AS VARCHAR(MAX) = NULL
, @Land AS VARCHAR(MAX) = NULL
, @Region AS VARCHAR(MAX) = NULL
)
AS
BEGIN
WITH qAllTradeSumTot
AS (
SELECT pt.InvoDt
, REPLACE(
REPLACE(
REPLACE(p.Descr, 'Product Name xxx 12345', 'Product Name xxx 750 ml'),
'Product Name 2004', 'Product Name'), 'Product Name 2002', 'Product Name') AS 'Product'
, p.Inf AS 'Producent'
, a.Nm AS 'Customer', p.Gr4, a.R9,
p.R8 AS 'Familj', pt.PrTp, p.ProdTp4,
dbo.ProdCat.Descr AS 'Land', ProdCat_1.Descr AS 'Region',
p.R8, pt.TrTp,
CASE WHEN p.Gr4 = 0 THEN CAST(ROUND(SUM(pt.NoInvoAb * 1), 0) AS INT)
ELSE CAST(ROUND(SUM(pt.NoInvoAb * p.Gr4), 0) AS INT) END AS 'SumU', LEFT(pt.InvoDt, 4)
AS 'Yr'
FROM dbo.ProdCat INNER JOIN
dbo.Prod AS p ON dbo.ProdCat.PrCatNo = p.PrCatNo INNER JOIN
dbo.ProdTr AS pt ON p.ProdNo = pt.ProdNo INNER JOIN
dbo.ProdCat AS ProdCat_1 ON p.PrCatNo2 = ProdCat_1.PrCatNo INNER JOIN
dbo.Actor AS a ON pt.CustNo = a.CustNo
WHERE ((a.R9 = '1') OR (a.R9 = '2'))
AND (pt.Price <>0)
AND (pt.InvoDt BETWEEN @PeriodStart AND @PeriodEnd)
AND (@FamiljTyp IS NULL OR p.R8 = @FamiljTyp)
AND (@CustomerName IS NULL OR a.Nm LIKE '%' + @CustomerName + '%')
AND (@ProdNo IS NULL OR p.ProdNo = @ProdNo)
AND (@ProdDescr IS NULL OR p.Descr LIKE '%' + @ProdDescr + '%')
AND (@ProducentName IS NULL OR p.Inf LIKE '%' + @ProducentName + '%')
AND (@Land IS NULL OR ProdCat.Descr LIKE '%' + @Land + '%')
AND (@Region IS NULL OR ProdCat_1.Descr LIKE '%' + @Region + '%')
GROUP BY pt.InvoDt, p.Descr, p.Inf, a.Nm, p.Gr4, a.R9, p.R8, pt.TrTp,
LEFT(pt.InvoDt, 4), p.ProdTp4, dbo.ProdCat.Descr, ProdCat_1.Descr, pt.PrTp
)
SELECT ats.Producent, Product,
SUM(CASE WHEN (InvoDt BETWEEN @PeriodStart AND @PrePerEnd) AND R9 = '1' THEN SumU ELSE 0 END) AS [OffTrade1],
SUM(CASE WHEN (InvoDt BETWEEN @CurPerStart AND @PeriodEnd) AND R9 = '1' THEN SumU ELSE 0 END) AS [OffTrade2],
SUM(CASE WHEN (InvoDt BETWEEN @PeriodStart AND @PrePerEnd) AND R9 = '2' THEN SumU ELSE 0 END) AS [OnTrade1],
SUM(CASE WHEN (InvoDt BETWEEN @CurPerStart AND @PeriodEnd) AND R9 = '2' THEN SumU ELSE 0 END) AS [OnTrade2],
SUM(CASE WHEN (InvoDt BETWEEN @PeriodStart AND @PrePerEnd) THEN ats.SumU ELSE 0 END) AS [Tot1],
SUM(CASE WHEN (InvoDt BETWEEN @CurPerStart AND @PeriodEnd) THEN ats.SumU ELSE 0 END) AS [Tot2]
FROM qAllTradeSumTot AS ats
GROUP BY Producent, Product
HAVING (SUM(CASE WHEN (InvoDt BETWEEN @PeriodStart AND @PrePerEnd) THEN ats.SumU END) >= 0 )
OR (SUM(CASE WHEN (InvoDt BETWEEN @CurPerStart AND @PeriodEnd) THEN ats.SumU END) >= 0)
ORDER BY Product;
END
Hope it is clearer now!
Thanks for all your help!
Kind regards
/martin
Regarding my comment - Have you tried a simple LEFT statement in your grouping? Here's an example, I've used a table variable to provide some test data.
DECLARE @X AS TABLE ( product varchar (50))
INSERT INTO @X (product) VALUES('xx abc 1')
INSERT INTO @X (product) VALUES('xx abc 2')
INSERT INTO @X (product) VALUES('xx abc 3')
INSERT INTO @X (product) VALUES('xx def 1')
INSERT INTO @X (product) VALUES('yy abc 1')
INSERT INTO @X (product) VALUES('yy abc 2')
INSERT INTO @X (product) VALUES('zz abc 1')
SELECT LEFT(product, 6) AS leftbit, COUNT(*)
FROM @X
GROUP BY LEFT(product, 6)
results
leftbit Column1
xx abc 3
xx def 1
yy abc 2
zz abc 1
select 'Produkt namn xyz', sum(units2012), sum(units2013)
from table
where left(name,16) = 'Produkt namn xyz'
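Since the product names have different lengths, a fixed LEFT length only works per product family. Another option, sketched below, is a CASE expression with LIKE that maps every name starting with a known base name onto that base name and leaves all other names untouched. This is a swapped-in technique rather than the approach from the answers above, and it is run here on SQLite from Python purely for illustration. The table name sales and the column names are hypothetical; the rows are the example list from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, units_2012 INTEGER, units_2013 INTEGER)")
# Example rows from the question's edited list
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Produkt namn kkkkkkk mmmm", 7, 9),
        ("Produkt namn xyz ab 2002", 0, 2),
        ("Produkt namn xyz cd 2004", 5, 3),
        ("Produkt namn AAAAA MMMM NN", 6, 8),
        ("Produkt namn xyz ef 2002", 2, 1),
        ("Produkt namn xyz gh 2006", 3, 0),
        ("Produkt namn ABC 123 anything", 4, 9),
    ],
)

# The CASE expression is repeated in GROUP BY so the statement does not
# rely on grouping by a column alias.
query = """
    SELECT CASE WHEN name LIKE 'Produkt namn xyz%' THEN 'Produkt namn xyz' ELSE name END AS product,
           SUM(units_2012),
           SUM(units_2013)
    FROM sales
    GROUP BY CASE WHEN name LIKE 'Produkt namn xyz%' THEN 'Produkt namn xyz' ELSE name END
    ORDER BY product
"""
totals = {product: (u2012, u2013) for product, u2012, u2013 in conn.execute(query)}
for product, units in totals.items():
    print(product, units)
```

For the xyz family this reproduces the 10 and 6 totals from the question. Additional product families would each need their own WHEN branch.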

Output Multiple Aggregates From One Table

I have a single table, 'EMPLOYEE'. I need to count 'emp_no' so that I get multiple columns, each holding an aggregate based on a different restriction. I'm not sure how to write the query to get the output below.
SELECT DEP_NO, COUNT(EMP_NO) Active
FROM EMPLOYEE
WHERE STATUS = 'active'
SELECT DEP_NO, COUNT(EMP_NO) "On Leave"
FROM EMPLOYEE
WHERE STATUS = 'on leave'
dep_no| Active On Leave Female Male
------|------------------------------
1 | 236 10 136 100
2 | 500 26 250 250
3 | 130 2 80 50
4 | 210 7 60 150
One possible answer is to use SUM + CASE
SELECT DEP_NO, SUM(CASE WHEN STATUS = 'active' THEN 1 ELSE 0 END) AS Active,
SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS [On Leave],
SUM(CASE WHEN STATUS = 'female' THEN 1 ELSE 0 END) AS Female,
SUM(CASE WHEN STATUS = 'male' THEN 1 ELSE 0 END) AS Male
FROM EMPLOYEE
GROUP BY DEP_NO
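A small runnable sketch of the SUM + CASE pattern, using an in-memory SQLite database from Python, is shown below. The sample rows are made up, and the sketch assumes a separate GENDER column for the Female and Male counts; that column is a guess, since the question does not show the table definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# GENDER is a hypothetical column; the question does not show the schema
conn.execute("CREATE TABLE EMPLOYEE (EMP_NO INTEGER, DEP_NO INTEGER, STATUS TEXT, GENDER TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        (1, 1, "active", "female"),
        (2, 1, "active", "male"),
        (3, 1, "on leave", "female"),
        (4, 2, "active", "male"),
        (5, 2, "on leave", "male"),
    ],
)

query = """
    SELECT DEP_NO,
           SUM(CASE WHEN STATUS = 'active' THEN 1 ELSE 0 END) AS "Active",
           SUM(CASE WHEN STATUS = 'on leave' THEN 1 ELSE 0 END) AS "On Leave",
           SUM(CASE WHEN GENDER = 'female' THEN 1 ELSE 0 END) AS "Female",
           SUM(CASE WHEN GENDER = 'male' THEN 1 ELSE 0 END) AS "Male"
    FROM EMPLOYEE
    GROUP BY DEP_NO
    ORDER BY DEP_NO
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

Each CASE counts only the rows that satisfy its own condition, so a single pass over the table yields one row per department with all four counts.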
