JPQL: SELECT b, count(ts) FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0

I get the desired result with this query:
SELECT b, count(ts) FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
b1 | 2
b2 | 1
but then I also need the count of ts.tourAppliedList, so I updated the query to
SELECT b, count(ts), count(ta) FROM Branch b JOIN b.tourScheduleList ts JOIN ts.tourAppliedList ta WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
which resulted in
b1 | 3 | 3
b2 | 2 | 2
The result is wrong; I don't know why count(ts) is equal to count(ta).
I tried returning ts and doing the count later, but that returns the whole collection without considering ts.deleted = 0:
SELECT b, ts FROM Branch b JOIN b.tourScheduleList ts WHERE ts.deleted = 0 GROUP BY b.id ORDER BY b.name ASC
then in the view I just call #{item.ts.tourAppliedList.size()}, but it doesn't consider ts.deleted = 0.

The problem is that your expectation is wrong.
This join will give you:
b1 | ts1 | ta1
b1 | ts1 | ta2
b1 | ts2 | ta3
b2 | ts3 | ta4
b2 | ts3 | ta5
Or something along those lines...
What happens when you group and count those rows?
Simple: you have 3 rows for b1 and 2 for b2.
What you need there is count(distinct ts).
Since there can be multiple ta rows for each ts, the distinct counts will then differ.
P.S. JPQL does permit COUNT(DISTINCT ...), so you can write count(distinct ts) and count(distinct ta) in a single query; if you'd rather not, run two queries instead: count ts with a join on ts only, then count ta with joins on both ts and ta.
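The fan-out effect, and why distinct counting fixes it, can be simulated in plain Python (the tuples below are hypothetical rows mirroring the joined output sketched above):

```python
# Joined rows as (branch, schedule, applied) tuples: each ts row is
# repeated once per matching ta, which is what inflates count(ts).
rows = [
    ("b1", "ts1", "ta1"),
    ("b1", "ts1", "ta2"),
    ("b1", "ts2", "ta3"),
    ("b2", "ts3", "ta4"),
    ("b2", "ts3", "ta5"),
]

def counts(rows, distinct):
    """Group by branch; count schedules and applications per branch."""
    grouped = {}
    for b, ts, ta in rows:
        grouped.setdefault(b, ([], []))
        grouped[b][0].append(ts)
        grouped[b][1].append(ta)
    if distinct:  # count(distinct ts), count(distinct ta)
        return {b: (len(set(ts)), len(set(ta))) for b, (ts, ta) in grouped.items()}
    return {b: (len(ts), len(ta)) for b, (ts, ta) in grouped.items()}

print(counts(rows, distinct=False))  # {'b1': (3, 3), 'b2': (2, 2)} - both inflated
print(counts(rows, distinct=True))   # {'b1': (2, 3), 'b2': (1, 2)}
```

With distinct counting, the schedule counts match the expected b1 | 2, b2 | 1 from the first query.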

Identify if an upsert operation inserts or updates the row in YugabyteDB

[Question posted by a user on YugabyteDB Community Slack]
Is there a way to identify whether an upsert operation like the one shown below inserts or updates the row, e.g. with the Java or Golang driver?
UPDATE test set value = 'value1', checkpoint = 'cas1' WHERE key = 'key1' IF checkpoint = '' OR NOT EXISTS;
RETURNS STATUS AS ROW is a YCQL feature. In YSQL, you could use an AFTER INSERT OR UPDATE... EACH ROW trigger to detect the outcome. The challenge, then, would be to surface the result in the session that made the change. You could use a user-defined run-time parameter (set my_app.outcome = 'true') or a temp table.
—regards, bryn#yugabyte.com
You can use RETURNS STATUS AS ROW as documented here: https://docs.yugabyte.com/preview/api/ycql/dml_update/#returns-status-as-row
Example:
cqlsh:sample> CREATE TABLE test(h INT, r INT, v LIST<INT>, PRIMARY KEY(h,r)) WITH transactions={'enabled': true};
cqlsh:sample> INSERT INTO test(h,r,v) VALUES (1,1,[1,2]);
Unapplied update when IF condition is false:
cqlsh:sample> UPDATE test SET v[2] = 4 WHERE h = 1 AND r = 1 IF v[1] = 3 RETURNS STATUS AS ROW;
[applied] | [message] | h | r | v
-----------+-----------+---+---+--------
False | null | 1 | 1 | [1, 2]
Applied update when the IF condition is true:
cqlsh:sample> UPDATE test SET v[0] = 4 WHERE h = 1 AND r = 1 IF v[1] = 2 RETURNS STATUS AS ROW;
[applied] | [message] | h | r | v
-----------+-----------+------+------+------
True | null | null | null | null
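On the driver side, the outcome can then be read from the [applied] column of the returned status row. A minimal Python sketch of that check; the row is assumed to arrive as a mapping keyed like the cqlsh output above, which is an assumption about the driver surface, not a documented API:

```python
def update_was_applied(row):
    """Return True if the conditional UPDATE was applied.

    `row` is assumed to be a mapping of column names to values, shaped
    like the RETURNS STATUS AS ROW output shown above.
    """
    return bool(row["[applied]"])

# The two result rows from the cqlsh session above:
unapplied = {"[applied]": False, "[message]": None, "h": 1, "r": 1, "v": [1, 2]}
applied = {"[applied]": True, "[message]": None, "h": None, "r": None, "v": None}

print(update_was_applied(unapplied))  # False
print(update_was_applied(applied))    # True
```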

Get last number until specific row in excel

I have an Excel spreadsheet with a single column that looks like this:
id
----
1
a
b
c
2
d
e
f
3
g
h
i
1
c
d
e
2
a
d
f
Since the numbers aren't really IDs but group IDs, the desired output structure is:
id | group_id
----
a | 1
b | 1
c | 1
d | 2
e | 2
f | 2
g | 3
h | 3
i | 3
c | 1
d | 1
e | 1
a | 2
d | 2
f | 2
It occurred to me that I could adapt a formula that returns the last non-empty value:
=LOOKUP(2,1/(B:B<>""),B:B)
but I couldn't figure out how to change the condition so it finds the last numeric value instead. Note: the original order is essential.
Does anyone have a suggestion?
You could produce the matching numbers for each letter using a spill formula with XLOOKUP on the row numbers, like this, if you have Excel 365:
=LET(range,A1:A20,
filteredNumbers,FILTER(range,ISNUMBER(range)),
filteredNumberRowNumbers,FILTER(ROW(range),ISNUMBER(range)),
filteredLetterRowNumbers,FILTER(ROW(range),ISTEXT(range)),
XLOOKUP(filteredLetterRowNumbers,filteredNumberRowNumbers,filteredNumbers,,-1))
to get the letters themselves it's just
=FILTER(A1:A20,ISTEXT(A1:A20))
Try to apply SCAN():
Formula in C2:
=FILTER(CHOOSE({1,2},A2:A21,SCAN(0,A2:A21,LAMBDA(a,b,IF(ISNUMBER(b),b,a)))),ISTEXT(A2:A21))
Or, with access to VSTACK() and HSTACK(), to include headers:
=VSTACK({"ID","GROUP_ID"},FILTER(HSTACK(A2:A21,SCAN(0,A2:A21,LAMBDA(a,b,IF(ISNUMBER(b),b,a)))),ISTEXT(A2:A21)))
All the above answers are great. Another option that works for me is:
=LOOKUP(2,1/(ISNUMBER($A$1:A2)),$A$1:A2)
I insert that formula in B2 and fill it down each row; then I filter the rows with letters in the A column.
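All three formulas implement the same running "last number seen" fill. A quick Python sketch of that idea, using the sample data from the question:

```python
# The single column from the question: numbers start a group,
# letters belong to the most recent number.
col = [1, "a", "b", "c", 2, "d", "e", "f", 3, "g", "h", "i",
       1, "c", "d", "e", 2, "a", "d", "f"]

pairs = []
group = None
for v in col:
    if isinstance(v, (int, float)):  # a number starts a new group
        group = v
    else:                            # a letter takes the last number seen
        pairs.append((v, group))

print(pairs[:3])  # [('a', 1), ('b', 1), ('c', 1)]
```

This is exactly what SCAN() does with its accumulator, and what the LOOKUP/XLOOKUP answers do by searching backwards for the last numeric row.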

Pyspark How to create columns and fill True/False if rolling datetime record exists

The dataset contains a daily record per product, but some days are missing, so I want to create extra columns showing whether a record exists in the past few days.
My conditions are:
Create T-1, T-2, and so on columns, filled as below
Fill T-1 with 1 if the record exists, otherwise zero
Original Table :
Item Cat DateTime Value
A C1 1-1-2021 10
A C1 2-1-2021 10
A C1 3-1-2021 10
A C1 4-1-2021 10
A C1 5-1-2021 10
A C1 6-1-2021 10
B C1 1-1-2021 20
B C1 4-1-2021 20
Expected result:
Item Cat DateTime Value T-1 T-2 T-3 T-4 T-5
A C1 1-1-2021 10 0 0 0 0 0
A C1 2-1-2021 10 1 0 0 0 0 (T-1 is 1 as we have 1-1-2021 record)
A C1 3-1-2021 10 1 1 0 0 0
A C1 4-1-2021 10 1 1 1 0 0
A C1 5-1-2021 10 1 1 1 1 0
A C1 6-1-2021 10 1 1 1 1 1
B C1 1-1-2021 20 0 0 0 0 0
B C1 2-1-2021 0 1 0 0 0 0 (2-1-2021 record need to be created with value zero since we miss this from original data-set, plus T-1 is 1 as we have this record from original data-set)
B C1 3-1-2021 0 0 1 0 0 0
B C1 4-1-2021 20 0 0 1 0 0
B C1 5-1-2021 0 1 0 0 1 0
Let's assume you have the original table data stored in original_data. We can:
1. create a temporary view named daily_records to query with Spark SQL
2. generate all possible dates, by finding the number of days between the min and max dates in the dataset and then expanding that span with the table-generating function posexplode over space()
3. generate all possible (item, date) records
4. join these records with the actual ones to get a complete dataset with values
5. use Spark SQL to query the view and create the additional columns using left joins and CASE statements
# Step 1
original_data.createOrReplaceTempView("daily_records")
# Step 2-4
daily_records = sparkSession.sql("""
WITH date_bounds AS (
SELECT min(DateTime) as mindate, max(DateTime) as maxdate FROM daily_records
),
possible_dates AS (
SELECT
date_add(mindate,index.pos) as DateTime
FROM
date_bounds
lateral view posexplode(split(space(datediff(maxdate,mindate)),"")) index
),
unique_items AS (
SELECT DISTINCT Item, Cat from daily_records
),
possible__item_dates AS (
SELECT Item, Cat, DateTime FROM unique_items INNER JOIN possible_dates ON 1=1
),
possible_records AS (
SELECT
p.Item,
p.Cat,
p.DateTime,
r.Value
FROM
possible__item_dates p
LEFT JOIN
daily_records r on p.Item = r.Item and p.DateTime = r.DateTime
)
select * from possible_records
""")
daily_records.createOrReplaceTempView("daily_records")
daily_records.show()
# Step 5 - store results in desired_result
# This is optional, but I have chosen to generate the sql to create this dataframe
periods = 5 # Number of periods to check for
period_columns = ",".join(["""
CASE
WHEN t{0}.Value IS NULL THEN 0
ELSE 1
END as `T-{0}`
""".format(i) for i in range(1,periods+1)])
period_joins = " ".join(["""
LEFT JOIN
daily_records t{0} on datediff(to_date(t.DateTime),to_date(t{0}.DateTime))={0} and t.Item = t{0}.Item
""".format(i) for i in range(1,periods+1)])
period_sql = """
SELECT
t.*
{0}
FROM
daily_records t
{1}
ORDER BY
Item, DateTime
""".format(
"" if len(period_columns)==0 else ",{0}".format(period_columns),
period_joins
)
desired_result= sparkSession.sql(period_sql)
desired_result.show()
Actual SQL generated:
SELECT
t.*,
CASE
WHEN t1.Value IS NULL THEN 0
ELSE 1
END as `T-1`,
CASE
WHEN t2.Value IS NULL THEN 0
ELSE 1
END as `T-2`,
CASE
WHEN t3.Value IS NULL THEN 0
ELSE 1
END as `T-3`,
CASE
WHEN t4.Value IS NULL THEN 0
ELSE 1
END as `T-4`,
CASE
WHEN t5.Value IS NULL THEN 0
ELSE 1
END as `T-5`
FROM
daily_records t
LEFT JOIN
daily_records t1 on datediff(to_date(t.DateTime),to_date(t1.DateTime))=1 and t.Item = t1.Item
LEFT JOIN
daily_records t2 on datediff(to_date(t.DateTime),to_date(t2.DateTime))=2 and t.Item = t2.Item
LEFT JOIN
daily_records t3 on datediff(to_date(t.DateTime),to_date(t3.DateTime))=3 and t.Item = t3.Item
LEFT JOIN
daily_records t4 on datediff(to_date(t.DateTime),to_date(t4.DateTime))=4 and t.Item = t4.Item
LEFT JOIN
daily_records t5 on datediff(to_date(t.DateTime),to_date(t5.DateTime))=5 and t.Item = t5.Item
ORDER BY
Item, DateTime
NB. to_date is optional if DateTime is already formatted as a date field or in the format yyyy-mm-dd
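Stripped of the SQL machinery, the T-n logic is a set-membership test: T-n is 1 exactly when a row for (item, date - n days) exists in the gap-filled dataset. A pure-Python sketch of that rule, using item A's dates from the question:

```python
from datetime import date, timedelta

# (item, date) keys present in the completed (gap-filled) dataset;
# item A has records for 1-1-2021 through 6-1-2021.
records = {("A", date(2021, 1, d)) for d in range(1, 7)}

def t_flags(item, d, periods=5):
    """Return [T-1, ..., T-periods]: 1 if a record exists n days earlier."""
    return [int((item, d - timedelta(days=n)) in records)
            for n in range(1, periods + 1)]

print(t_flags("A", date(2021, 1, 1)))  # [0, 0, 0, 0, 0]
print(t_flags("A", date(2021, 1, 4)))  # [1, 1, 1, 0, 0]
```

The five LEFT JOINs in the generated SQL perform exactly this membership test, one join per lag.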

Pandas: With array of col names in a desired column order, select those that exist, NULL those that don't

I have an array of column names that I want as my output table, in that order, e.g. ["A", "B", "C"].
I have an input table that USUALLY contains all of the columns in the array, but NOT ALWAYS (the raw data is a JSON API response).
I want to select all available columns from the input table, and if a column does not exist, I want it filled with NULL or NA or whatever; it doesn't really matter which.
Let's say my input DataFrame (call it input_table) looks like this:
+-----+--------------+
| A | C |
+-----+--------------+
| 123 | test |
| 456 | another_test |
+-----+--------------+
I want an output dataframe that has columns A, B, C in that order to produce
+-----+------+--------------+
| A | B | C |
+-----+------+--------------+
| 123 | NULL | test |
| 456 | NULL | another_test |
+-----+------+--------------+
I get a KeyError when I do input_table[["A","B","C"]]
I get a NoneType returned when I do input_table.get(["A","B","C"])
I was able to achieve what I want via:
for i in desired_columns_array:
    if i not in input_dataframe:
        output_dataframe[i] = ""
    else:
        output_dataframe[i] = input_dataframe[i]
But I'm wondering if there's something less verbose?
How do I get a desired output schema to match an input array when one or more columns in the input dataframe may not be present?
Transpose and reindex
df = pd.DataFrame([[123,'test'], [456, 'another test']], columns=list('AC'))
l = list('ACB')
df1 = df.T.reindex(l).T[sorted(l)]
A B C
0 123 NaN test
1 456 NaN another test
DataFrame.reindex over the column axis:
cols = ['A', 'B', 'C']
df.reindex(cols, axis='columns')
A B C
0 123 NaN test
1 456 NaN another_test
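Either reindex answer collapses the question's loop into one call. For completeness, reindex also accepts a fill_value if you'd rather have something other than NaN in the missing column (here an empty string, matching the question's loop):

```python
import pandas as pd

# Input table from the question: column B is missing.
input_table = pd.DataFrame({"A": [123, 456], "C": ["test", "another_test"]})
cols = ["A", "B", "C"]

out = input_table.reindex(columns=cols)                         # B -> NaN
out_filled = input_table.reindex(columns=cols, fill_value="")   # B -> ""

print(list(out.columns))  # ['A', 'B', 'C']
```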

Splitting rows that contain a list of postcode prefixes into multiple rows, based on postcode area

I have a table with several columns of data, one of which contains a list of different combined postcode prefixes on the same row.
Here's an example of the table layout:
+------+-----------------------------+
| Col1 | Col2 |
+------+-----------------------------+
| a | AB10; AB11; DD10; DD9 |
| b | S5; SS7; AA1; AA4 |
| c | AB33; AB34; AB36; GG10; GS9 |
+------+-----------------------------+
I'm looking to split the postcode prefixes into multiple rows, based on the area of the postcode, as below:
+------+------------------+
| Col1 | Col2 |
+------+------------------+
| a | AB10; AB11 |
| a | DD10; DD9 |
| b | S5 |
| b | SS7 |
| b | AA1; AA4 |
| c | AB33; AB34; AB36 |
| c | GG10 |
| c | GS9 |
+------+------------------+
I've found a VBA solution that splits, using the semicolon as a delimiter, but not how I need it done.
Sub splitByColB()
    Dim r As Range, i As Long, ar
    Set r = Worksheets("Sheet1").Range("B4").End(xlUp)
    Do While r.Row > 1
        ar = Split(r.Value, ";")
        If UBound(ar) >= 0 Then r.Value = ar(0)
        For i = UBound(ar) To 1 Step -1
            r.EntireRow.Copy
            r.Offset(1).EntireRow.Insert
            r.Offset(1).Value = ar(i)
        Next
        Set r = r.Offset(-1)
    Loop
End Sub
I could import the table in SQLExpress, so an SQL solution would also be welcome.
The SQL solution I put together uses a T-SQL function called DelimitedSplit8K, which works like the VBA Split function you are using.
-- Sample Data
DECLARE @table TABLE (Col1 CHAR(1) UNIQUE, Col2 CHAR(200));
INSERT @table (Col1,Col2) VALUES ('a','AB10; AB11; DD10; DD9'),
('b','S5; SS7; AA1; AA4'),('c','AB33; AB34; AB36; GG10; GS9');

WITH xx(Col1,i,Pre) AS
(
  SELECT t2.Col1, ss.Item+'', f.Pre
  FROM @table AS t2
  CROSS APPLY dbo.DelimitedSplit8K(t2.Col2,';') AS s
  CROSS APPLY (VALUES(RTRIM(LTRIM(s.item)))) AS ss(Item)
  CROSS APPLY (VALUES(SUBSTRING(ss.Item,0,PATINDEX('%[0-9]%',ss.Item)))) AS f(Pre)
)
SELECT xx.col1, col2 = STUFF((SELECT '; '+i
                              FROM xx AS x2
                              WHERE x2.Col1 = xx.Col1 AND x2.Pre = xx.Pre
                              FOR XML PATH('')),1,2,'')
FROM xx
GROUP BY col1, xx.Pre;
Returns:
col1 Col2
---- ----------------------
a AB10; AB11
a DD10; DD9
b AA1; AA4
b S5
b SS7
c AB33; AB34; AB36
c GG10
c GS9
I also put together a solution that works with SQL Server 2017, which is cleaner (in case you upgrade, or for others who are on 2017).
-- Sample Data
DECLARE @table TABLE (Col1 CHAR(1) UNIQUE, Col2 CHAR(200));
INSERT @table (Col1,Col2) VALUES ('a','AB10; AB11; DD10; DD9'),
('b','S5; SS7; AA1; AA4'),('c','AB33; AB34; AB36; GG10; GS9');

SELECT t.Col1, split.item
FROM @table AS t
CROSS APPLY
(
  SELECT STRING_AGG(ss.Item,'; ') WITHIN GROUP (ORDER BY ss.Item)
  FROM @table AS t2
  CROSS APPLY STRING_SPLIT(t2.Col2,';') AS s
  CROSS APPLY (VALUES(TRIM(s.[value]))) AS ss(Item)
  WHERE t.Col1 = t2.col1
  GROUP BY SUBSTRING(ss.Item,0,PATINDEX('%[0-9]%',ss.Item))
) AS split(item);
You could use nested Dictionary objects:
Sub splitByColB()
    Dim r As Range, ar, val1, val2, prefix As String
    Dim obj1 As Object, obj2 As Object
    Set obj1 = CreateObject("Scripting.Dictionary")
    With Worksheets("Sheet1")
        For Each r In .Range("B2:B4")
            Set obj2 = CreateObject("Scripting.Dictionary")
            With obj2
                For Each val2 In Split(Replace(r.Value2, " ", vbNullString), ";")
                    prefix = GetLetters(CStr(val2))
                    .Item(prefix) = .Item(prefix) & val2 & " "
                Next
            End With
            Set obj1.Item(r.Offset(, -1).Value2) = obj2
        Next
        .Range("A2:B4").ClearContents
        For Each val1 In obj1.keys
            .Cells(.Rows.Count, 1).End(xlUp).Offset(1).Resize(obj1(val1).Count).Value = val1
            For Each val2 In obj1(val1).keys
                .Cells(.Rows.Count, 2).End(xlUp).Offset(1).Value = obj1(val1)(val2)
            Next
        Next
    End With
End Sub

Function GetLetters(s As String) As String
    Dim i As Long
    Do While Not IsNumeric(Mid(s, i + 1, 1))
        i = i + 1
    Loop
    GetLetters = Left(s, i)
End Function
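Both the SQL and VBA answers hinge on the same two steps: take the alphabetic prefix of each code (the postcode area), then regroup the codes on each row by that prefix. A Python sketch of that logic, using the sample rows from the question:

```python
import re

# Col1 -> Col2 from the question's table.
rows = {
    "a": "AB10; AB11; DD10; DD9",
    "b": "S5; SS7; AA1; AA4",
    "c": "AB33; AB34; AB36; GG10; GS9",
}

def split_by_area(codes):
    """Group a ';'-separated list of prefixes by their leading letters."""
    groups = {}
    for code in (c.strip() for c in codes.split(";")):
        area = re.match(r"[A-Za-z]+", code).group()  # letters before digits
        groups.setdefault(area, []).append(code)
    return ["; ".join(v) for v in groups.values()]

result = [(k, g) for k, codes in rows.items() for g in split_by_area(codes)]
print(result[:2])  # [('a', 'AB10; AB11'), ('a', 'DD10; DD9')]
```

The regex plays the role of PATINDEX('%[0-9]%', ...) in the SQL and GetLetters in the VBA; the dict plays the role of the GROUP BY / inner Dictionary.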
