SQL query - SUM duration values (hh:mm:ss) from ALN field - string

I have table T1 with IDs:
ID
1
2
I have table T2 with ID and GROUP_TRACKING time, because a record can be in a specific group multiple times. GROUP_TRACKING is of ALN type (a string), and this cannot be changed, but it always contains a duration value in hh:mm:ss format. The hh part always has at least 2 characters, but it can contain more when a record has been in a group for a very long period of time:
ID GROUP GROUP_TRACKING
1 GROUP1 05:55:05
1 GROUP1 10:10:00
1 GROUP2 111:51:00
1 GROUP2 01:01:00
So I need to write a SELECT on table T1 joined to T2 that shows, for each group (GROUP1 and GROUP2), how much time the record spent in that group.
So the final result should be like this:
ID GROUP1 GROUP2
1 16:05:05 112:52:00
2 null null
How can I write this SELECT so that it sums these durations (hours, minutes and seconds)?
Thank you

Here is a solution without the pivot step (which can be looked up in many other answers):
with temp as (
    select id
         , group
         , group_tracking
         , int(substr(group_tracking, 1, locate(':', group_tracking) - 1)) * 3600 as first_part_in_s
         , int(substr(group_tracking, locate(':', group_tracking) + 1, 2)) * 60 as second_part_in_s
         , int(substr(group_tracking, locate(':', group_tracking, locate(':', group_tracking) + 1) + 1, 2)) as third_part_in_s
    from t2
)
select t1.id
     , t.group
     -- hours may exceed 24, so hh:mm:ss is assembled arithmetically rather than through TIME
     , int(sum(first_part_in_s + second_part_in_s + third_part_in_s) / 3600)
       || ':' || right(digits(mod(sum(first_part_in_s + second_part_in_s + third_part_in_s), 3600) / 60), 2)
       || ':' || right(digits(mod(sum(first_part_in_s + second_part_in_s + third_part_in_s), 60)), 2) as duration
from t1
left join temp t
    on t1.id = t.id
group by t1.id, t.group
I completely agree with @Clockwork-Muse that formats matter: storing durations in this inadequate format imposes a lot of additional effort for deconstructing and reconstructing the values.
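The arithmetic behind such a query can be checked outside the database. The following is a minimal Python sketch, not part of the original answer: it converts each hh:mm:ss string to seconds, sums per (ID, group), and formats the total back without capping the hours at 24. The helper names `to_seconds` and `to_duration` are made up for illustration, and the rows are the sample data from the question.

```python
def to_seconds(duration: str) -> int:
    """Convert 'hh:mm:ss' (hh may have more than two digits) to seconds."""
    hours, minutes, seconds = (int(part) for part in duration.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_duration(total_seconds: int) -> str:
    """Format seconds as 'hh:mm:ss' without wrapping the hours at 24."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Sample rows from the question: (ID, GROUP, GROUP_TRACKING)
rows = [
    (1, "GROUP1", "05:55:05"),
    (1, "GROUP1", "10:10:00"),
    (1, "GROUP2", "111:51:00"),
    (1, "GROUP2", "01:01:00"),
]

# Sum the seconds per (ID, GROUP) pair, then format each total.
totals = {}
for row_id, group, tracking in rows:
    totals[(row_id, group)] = totals.get((row_id, group), 0) + to_seconds(tracking)

result = {key: to_duration(value) for key, value in totals.items()}
print(result)  # {(1, 'GROUP1'): '16:05:05', (1, 'GROUP2'): '112:52:00'}
```

The key point is that the hour count comes from integer division of the total seconds, so it is never limited to the range of a time-of-day value.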

Related

Excel PowerPivot Count new and distinct items in a period not counted before

Assuming I have the following data table:
Period | ID
-----------
P1 | 1
P2 | 1
P1 | 2
P2 | 3
P1 | 2
I am interested in the number of unique IDs per period, counting an ID only if it has not already been counted in a previous period, with periods ordered alphabetically. An ID can occur multiple times within a period in the source and shall count as 1 per period (distinct count).
Also the data source is not pre-ordered by period and I have no influence on the sort order.
So the result I would like to get in a Pivot is like:
Period | Number of Unique IDs not already counted
-------------------------------------------------
P1 | 2 # Because the unique IDs in the period are 1 and 2
P2 | 1 # Only counting ID 3, because ID 1 has already been counted in period P1
Please help me with the DAX measure I can use in the Pivot.
This is a measure written in DAX. It should work in a pivot table with the Period selected on the rows:
DistinctID =
VAR PeriodsPerId =
    SELECTCOLUMNS (
        ALL ( T[ID] ),
        "ID", T[ID],
        "Period", CALCULATE ( MIN ( T[Period] ), ALLEXCEPT ( T, T[ID] ) )
    )
RETURN
    COUNTROWS ( FILTER ( PeriodsPerId, [Period] IN VALUES ( T[Period] ) ) )
It works by first preparing a table variable containing the minimum period per ID and then filtering this table for the periods in the current selection.
Of course, if the Period is selected through a dimension table, use the dimension's column in the final VALUES instead.
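The same two-step logic (earliest period per ID, then a count per period) can be sketched in plain Python to check the expected numbers. This is only an illustration on the sample data from the question, not a replacement for the measure; the variable names are made up.

```python
from collections import Counter

# Sample rows from the question: (Period, ID), in arbitrary order
rows = [("P1", 1), ("P2", 1), ("P1", 2), ("P2", 3), ("P1", 2)]

# Step 1: earliest period (alphabetical order) in which each ID appears.
first_period = {}
for period, item_id in rows:
    if item_id not in first_period or period < first_period[item_id]:
        first_period[item_id] = period

# Step 2: count the IDs per first period; repeated IDs were collapsed in step 1.
new_ids_per_period = dict(Counter(first_period.values()))
print(new_ids_per_period)  # {'P1': 2, 'P2': 1}
```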
Here's one way, which requires repositioning your columns and adding a new column. This assumes you don't have duplicates in ID/Period combos. You didn't list any duplicates in your sample, so I'm making this assumption.
In my data, I have ID as column A and Period as column B.
Order your data by Period, ascending. Then in column C, you can use this formula to determine if that ID has been used before.
Cell C2 formula: =IF(VLOOKUP(A2,A:B,2,FALSE) = B2,1,0)
Copy it down and then create your pivot table, summing column C.

How do I create a time frame window for customer purchases from a database where each row is a unique purchase?

I have a table in BigQuery, where each row represents a unique purchase made by a customer. The table has customer ID, what they purchased, when it was purchased and how many times we have seen that customer.
I want to create a table that contains a row for the customer's first purchase and anything they bought within 31 days of this purchase.
I also want a new row for their second purchase and, again, anything they bought within 31 days of that purchase.
I'm trying to see if there are any patterns here, to help drive CRM campaigns.
If this would be easier in Python, I can use that as well.
I tried using something like this:
SELECT t1.ID, t1.ITEM as FIRST_PURCHASE, t2.ITEM as SECOND_PURCHASE, t3.ITEM as THIRD_PURCHASE
FROM `TABLE` as t1
left join `TABLE` as t2
on t1.ID = t2.ID and t1.PURCHASE_NUMBER = t2.PURCHASE_NUMBER+1
left join `TABLE` as t3
on t1.ID = t2.ID and t1.PURCHASE_NUMBER = t3.PURCHASE_NUMBER+2
where (t2.DATE_DIFF <= 31) and (t3.DATE_DIFF <= 31)
But obviously I only get a 31-day window from the first purchase, and no 31-day windows from any later purchases. I also thought I might be able to attempt this in Python using pivot?
I think you can approach it somewhat like this:
with purchases as (
    -- dummy column names
    select customerID, item, date, visits from `project.dataset.table`
),
purchase_ordering as (
    select *, row_number() over (partition by customerID order by date asc) as rn
    from purchases
),
first_two_purchases as (
    select
        * except(rn),
        case when rn = 1 then 'First Purchase' else 'Second Purchase' end as purchase_order
    from purchase_ordering
    where rn <= 2
),
additional_purchase_logic as (
    select
        ftp.purchase_order,
        ftp.customerID,
        ftp.item,
        ftp.date,
        array_agg(struct(p.item, p.date)) as within_31_days_of_purchase
    from first_two_purchases ftp
    -- the join column must match the name used in the CTEs above
    left join purchases p using(customerID)
    where p.date > ftp.date and p.date <= date_add(ftp.date, interval 31 day)
    group by 1,2,3,4
)
select * from additional_purchase_logic
Note: if the second purchase is within 31 days of the first purchase, it will show up as a subsequent purchase in the 'First Purchase' row as well as an independent record in the 'Second Purchase' row.
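Since the question mentions Python as an option, here is a rough sketch of the same windowing idea in plain Python. The function name `purchase_windows`, its parameters and the sample data are all hypothetical. One difference from the query above is deliberate: because the query filters on `p.date` in the WHERE clause, anchor purchases with no follow-ups drop out, whereas this sketch keeps them with an empty list.

```python
from datetime import date, timedelta


def purchase_windows(purchases, window_days=31, max_anchors=2):
    """For each customer's first `max_anchors` purchases, list what was bought
    within `window_days` days after that purchase.

    `purchases` is an iterable of (customer_id, item, purchase_date) tuples.
    """
    by_customer = {}
    for customer_id, item, day in purchases:
        by_customer.setdefault(customer_id, []).append((day, item))

    rows = []
    for customer_id, items in by_customer.items():
        items.sort()  # chronological order per customer
        for rank, (anchor_day, anchor_item) in enumerate(items[:max_anchors], start=1):
            limit = anchor_day + timedelta(days=window_days)
            followers = [(i, d) for d, i in items if anchor_day < d <= limit]
            rows.append((customer_id, rank, anchor_item, anchor_day, followers))
    return rows


# Made-up sample data for one customer
sample = [
    ("c1", "shoes", date(2023, 1, 1)),
    ("c1", "socks", date(2023, 1, 20)),
    ("c1", "hat", date(2023, 3, 1)),
]
windows = purchase_windows(sample)
for row in windows:
    print(row)
```

Raising `max_anchors` would extend the same window logic to the third and later purchases.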

Google BigQuery nested select subquery in cross join

I have the following code:
SELECT ta.application as koekkoek, ta.ipc, ipc_count/ipc_tot as ipc_share, t3.sfields FROM (
select t1.appln_id as application, t1.ipc_subclass_symbol as ipc, count(t2.appln_id) as ipc_count, sum(ipc_count) over (PARTITION BY application) as ipc_tot
FROM temp.tls209_small t1
CROSS JOIN
(SELECT appln_id, FROM temp.tls209_small group by appln_id ) t2
where t1.appln_id = t2.appln_id
GROUP BY application, ipc
) as ta
CROSS JOIN thesis.ifris_ipc_concordance t3
WHERE ta.ipc LIKE t3.ipc+'%'
AND ta.ipc NOT LIKE t3.not_ipc+'%'
AND t3.not_appln_id NOT IN
(SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application)
This gives the following error:
Field 'ta.application' not found.
I have tried numerous notations for the field, but BigQuery doesn't seem to recognize any reference to other tables in the subquery.
The purpose of the code is to assign new technology classifications to records based on a concordance table.
I have two tables:
One large table with application IDs, classifications and some other fields, tls209_small:
And a concordance table with some exception rules, ifris_ipc_concordance:
In the end I need to assign the sfields label to each row in tls209 (300 million rows). The rules are that ipc_class_symbol from the first table should be LIKE ipc+'%' from the second table, but not LIKE not_ipc+'%'.
In addition, the not_appln_id value, if present, should not be associated with the same appln_id in the first table.
So a small example, say this is the input of the query:
appln_id | ipc_class_symbol
1 | A1
1 | A2
1 | A3
1 | C3
sfields | ipc | not_ipc | not_appln_id
X | A | A2 | null
Y | A | null | A3
appln_id 1 should get sfields X twice, because A1 and A3 match ipc = A and do not match not_ipc = A2.
Y should not be assigned at all, because A3 occurs in appln_id 1.
In the results, I also need the share of the ipc_class_symbol for a single application (1 for 328100001, 0.5 for 32100009 etc.)
Without the last condition (AND t3.not_appln_id NOT IN (SELECT ipc_subclass_symbol from temp.tls209_small t5 where t5.appln_id = ta.application) ) the query works fine:
Any suggestions on how to get the subquery to recognize the application id (ta.application), or other ways to introduce the last condition to the query?
I realize my explanation of the problem may not be very straightforward, so if anything is not clear please indicate so, I'll try to clarify the issues.
The query you're performing is doing an anti-join. You can re-write this as an explicit join, but it is a little verbose:
SELECT *
FROM [x.z] as z
LEFT OUTER JOIN EACH [x.y] as y ON y.appln_id = z.application
WHERE y.not_appln_id is NULL
A working solution for the problem was achieved by first generating a table by matching only the ipc_class_symbol from the first table to the ipc column of the second, while also including the not_ipc and not_appln_id columns from the second. In addition, a list of all ipc class labels assigned to each appln_id was added using the GROUP_CONCAT method.
Finally, with help from Pentium10, the resulting table was filtered based on the exception rules, as also discussed in this question.
In the final query, the GROUP BY and JOIN arguments needed EACH modifiers to allow the large tables to be processed:
SELECT application as appln_id, ipc as ipc_class, ipc_share, sfields as ifris_class FROM (
  SELECT * FROM (
    SELECT ta.application as application, ta.ipc as ipc, ipc_count/ipc_tot as ipc_share, t3.sfields as sfields, t3.ipc as yes_ipc, t3.not_ipc as not_ipc, t3.not_appln_id as exclude, t4.classes as other_classes FROM (
      SELECT t1.appln_id as application, t1.ipc_class_symbol as ipc, count(t2.appln_id) as ipc_count, sum(ipc_count) over (PARTITION BY application) as ipc_tot
      FROM thesis.tls209_appln_ipc t1
      FULL OUTER JOIN EACH
        (SELECT appln_id, FROM thesis.tls209_appln_ipc GROUP EACH BY appln_id ) t2
      ON t1.appln_id = t2.appln_id
      GROUP EACH BY application, ipc
    ) AS ta
    LEFT JOIN EACH (
      SELECT appln_id, GROUP_CONCAT(ipc_class_symbol) as classes FROM [thesis.tls209_appln_ipc]
      GROUP EACH BY appln_id) t4
    ON ta.application = t4.appln_id
    CROSS JOIN thesis.ifris_ipc_concordance t3
    WHERE ta.ipc CONTAINS t3.ipc
  ) as tx
  WHERE (not ipc contains not_ipc or not_ipc is null)
  AND (not other_classes contains exclude or exclude is null or other_classes is null)
)
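The three exception rules can be checked on the small example from the question with a short Python sketch. It is only an illustration of the rule logic, not of how BigQuery executes the query: the function name `assign_sfields` and the dictionary layout are assumptions, and prefix matching via `startswith` stands in for the `LIKE ipc+'%'` condition described in the question (the final query uses CONTAINS instead).

```python
def assign_sfields(classes_by_appln, concordance):
    """Return (appln_id, ipc_class_symbol, sfields) for every rule that applies."""
    rows = []
    for appln_id, classes in classes_by_appln.items():
        for symbol in classes:
            for rule in concordance:
                # Rule 1: the symbol must start with the rule's ipc prefix.
                if not symbol.startswith(rule["ipc"]):
                    continue
                # Rule 2: the symbol must not start with not_ipc, if one is given.
                if rule["not_ipc"] and symbol.startswith(rule["not_ipc"]):
                    continue
                # Rule 3: not_appln_id, if given, must not occur anywhere
                # among the classes of the same application.
                if rule["not_appln_id"] and rule["not_appln_id"] in classes:
                    continue
                rows.append((appln_id, symbol, rule["sfields"]))
    return rows


# Example input from the question
classes_by_appln = {1: ["A1", "A2", "A3", "C3"]}
concordance = [
    {"sfields": "X", "ipc": "A", "not_ipc": "A2", "not_appln_id": None},
    {"sfields": "Y", "ipc": "A", "not_ipc": None, "not_appln_id": "A3"},
]

assigned = assign_sfields(classes_by_appln, concordance)
print(assigned)  # [(1, 'A1', 'X'), (1, 'A3', 'X')]
```

The third rule needs the full set of classes per application, which is the role played by the GROUP_CONCAT column in the query.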

Nested Sum( ) query is not working in mysql

I'm having a problem calculating the total bill of a patient. I have three tables named "test", "pharmacy" and "check".
Columns in test are:
patient_ID
testname
rate
Columns in pharmacy are:
patient_ID
medicineDescription
qty
rate
Columns in check are:
patient_ID
doctorID
fees
date
I have a table Bill that will store total amount of a patient.
patient_ID
amount
date
I have used the following query, but it gives an error.
$result = mysqli_query($data, "SELECT patient_ID, (SUM(pharmacy.qty*pharmacy.rate ) + SUM(test.rate) + SUM(check.fees))
AS total FROM pharmacy, test, check WHERE patient_ID= '$pID'" );
The correct query should be as follows; a closing bracket was missing at the end of the subquery (... AS total FROM pharmacy)):
$result = mysqli_query ($data, "SELECT patient_ID,
(SUM(pharmacy.qty*pharmacy.rate ) + SUM(test.rate) + SUM(check.fees)) AS total FROM pharmacy),
test,
check
WHERE patient_ID= '$pID'" );
You have three tables in your from clause, but with no join condition - this means you're pairing each row with all the other rows, which is obviously not what you intended. One way to handle this is to use proper joins:
SELECT p.patient_id, pharmacy_sum + test_sum + fees_sum AS total
FROM (SELECT patient_id, SUM(qty * rate) AS pharmacy_sum
      FROM pharmacy
      WHERE patient_ID= '$pID'
      GROUP BY patient_id) p
JOIN (SELECT patient_id, SUM(rate) AS test_sum
      FROM test
      WHERE patient_ID= '$pID'
      GROUP BY patient_id) t ON p.patient_id = t.patient_id
-- CHECK is a reserved word in MySQL, so the table name needs backticks
JOIN (SELECT patient_id, SUM(fees) AS fees_sum
      FROM `check`
      WHERE patient_ID= '$pID'
      GROUP BY patient_id) c ON p.patient_id = c.patient_id
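The effect of the missing join condition is easy to see on a tiny data set. The sketch below uses Python's built-in sqlite3 module rather than MySQL, with made-up rows, so it only illustrates the row multiplication and the per-table aggregation; the table name is quoted as "check" because it is a keyword there as well, and the patient ID is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pharmacy (patient_ID INTEGER, qty INTEGER, rate REAL);
    CREATE TABLE test (patient_ID INTEGER, testname TEXT, rate REAL);
    CREATE TABLE "check" (patient_ID INTEGER, fees REAL);
    INSERT INTO pharmacy VALUES (1, 2, 10.0), (1, 1, 5.0);
    INSERT INTO test VALUES (1, 'blood', 50.0);
    INSERT INTO "check" VALUES (1, 100.0), (1, 80.0);
""")

# Comma join: 2 x 1 x 2 = 4 combined rows, so every SUM counts rows repeatedly.
inflated = conn.execute(
    'SELECT SUM(p.qty * p.rate) + SUM(t.rate) + SUM(c.fees) '
    'FROM pharmacy p, test t, "check" c '
    'WHERE p.patient_ID = :pid AND t.patient_ID = :pid AND c.patient_ID = :pid',
    {"pid": 1},
).fetchone()[0]

# Aggregate each table first, then join the one-row results.
correct = conn.execute(
    'SELECT pharmacy_sum + test_sum + fees_sum '
    'FROM (SELECT patient_ID, SUM(qty * rate) AS pharmacy_sum FROM pharmacy '
    '      WHERE patient_ID = :pid GROUP BY patient_ID) p '
    'JOIN (SELECT patient_ID, SUM(rate) AS test_sum FROM test '
    '      WHERE patient_ID = :pid GROUP BY patient_ID) t ON p.patient_ID = t.patient_ID '
    'JOIN (SELECT patient_ID, SUM(fees) AS fees_sum FROM "check" '
    '      WHERE patient_ID = :pid GROUP BY patient_ID) c ON p.patient_ID = c.patient_ID',
    {"pid": 1},
).fetchone()[0]

print(inflated, correct)  # 610.0 255.0
```

With these rows the true bill is 25 + 50 + 180 = 255, while the comma join reports 610 because each row is counted once per combination with the other tables.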

How to delete duplicate and original entries in either access or excel (multiple columns)

Is there a way to delete all duplicate rows and the original entry in either Excel or Access?
I need to delete whole rows that match in 3 columns. Here is a visual (the bottom table is what the table should become; in this case the duplicates and the original with the same part number, manufacturer and manufacturer number are deleted):
This seems to work for me in Access:
DELETE FROM parts
WHERE EXISTS
(
    SELECT p2.[PART NUMBER], p2.[MANUFACTURER], p2.[MANUFACTURER NUMBER]
    FROM parts p2
    WHERE parts.[PART NUMBER] = p2.[PART NUMBER]
        AND parts.[MANUFACTURER] = p2.[MANUFACTURER]
        AND parts.[MANUFACTURER NUMBER] = p2.[MANUFACTURER NUMBER]
    GROUP BY p2.[PART NUMBER], p2.[MANUFACTURER], p2.[MANUFACTURER NUMBER]
    HAVING COUNT(*) > 1
)
When I run it on my test data...
PART NUMBER MANUFACTURER QUALITY MANUFACTURER NUMBER
----------- ------------ ------- -------------------
123 GORD 1 750
123 OTHER 3 321
123 OTHER 4 321
...it deletes the two "OTHER" rows but leaves the "GORD" row alone.
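The rule the query applies (drop every row whose three-column key occurs more than once, original included) can be sketched in Python on the same test data. The function name `drop_duplicated_keys` is made up for illustration; this is a check of the logic, not an Access or Excel procedure.

```python
from collections import Counter


def drop_duplicated_keys(rows, key_fields):
    """Keep only the rows whose key (the tuple of key_fields) occurs exactly once."""
    def key(row):
        return tuple(row[field] for field in key_fields)

    counts = Counter(key(row) for row in rows)
    return [row for row in rows if counts[key(row)] == 1]


# Test data from the answer above
parts = [
    {"PART NUMBER": 123, "MANUFACTURER": "GORD", "QUALITY": 1, "MANUFACTURER NUMBER": 750},
    {"PART NUMBER": 123, "MANUFACTURER": "OTHER", "QUALITY": 3, "MANUFACTURER NUMBER": 321},
    {"PART NUMBER": 123, "MANUFACTURER": "OTHER", "QUALITY": 4, "MANUFACTURER NUMBER": 321},
]

remaining = drop_duplicated_keys(
    parts, ["PART NUMBER", "MANUFACTURER", "MANUFACTURER NUMBER"]
)
print(remaining)  # only the GORD row is left
```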
DELETE * FROM MyTable
WHERE PartNumber IN (
    SELECT MyTable.PartNumber
    FROM MyTable
    GROUP BY MyTable.PartNumber
    HAVING (((Sum(1))>1))
);
This should do it for you. It checks all three fields, and deletes the original and all duplicates.
DELETE
parts.*
FROM parts
WHERE (( ((SELECT Count (*)
FROM parts AS P
WHERE ( P.partnum & P.manf & P.manfnum =
parts.partnum & parts.manf & parts.manfnum )
AND ( P.partnum <= parts.partnum ))) > 1 ));
