Retention Rate within Cohort - retention

I want to run a cohort analysis on a user base. We have two tables, "signups" and "sessions", and both users and sessions have a "date" field. I'm looking for a query that yields a table of numbers (with some blanks) showing, for each signup day, the count of users who created an account on that day and who also have a session on the 1st, 3rd, 7th and 14th day afterwards, indicating that they returned on that day.
created_at d1 d3 d7 d14
05/07/2007 12 * * *
04/07/2007 49 21 1 2
03/07/2007 45 30 * 3
02/07/2007 47 41 18 12
...
In this case, 41 users who created an account on 02/07/2007 returned after 3 days (d3).
Can I perform this in a single MySQL query?

Yes you can:
-- One row per signup date; each dN counts distinct users with a session exactly N days after signup.
SELECT signups.date AS created_at,
COUNT(DISTINCT CASE WHEN DATEDIFF(sessions.date, signups.date) = 1 THEN signups.users END) AS d1,
COUNT(DISTINCT CASE WHEN DATEDIFF(sessions.date, signups.date) = 3 THEN signups.users END) AS d3,
COUNT(DISTINCT CASE WHEN DATEDIFF(sessions.date, signups.date) = 7 THEN signups.users END) AS d7,
COUNT(DISTINCT CASE WHEN DATEDIFF(sessions.date, signups.date) = 14 THEN signups.users END) AS d14
FROM signups
LEFT JOIN sessions USING (users)
GROUP BY 1
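To make the counting logic of the query above easier to check by hand, here is a minimal Python sketch of the same idea. The function name `retention_table` and the input shape (pairs of user and date) are assumptions made for illustration, and it assumes each user signs up exactly once. Unlike the example table in the question, it reports 0 rather than a blank for days that have not happened yet.

```python
from collections import defaultdict
from datetime import date


def retention_table(signups, sessions, offsets=(1, 3, 7, 14)):
    """Count distinct returning users per signup date and day offset.

    signups:  iterable of (user, signup_date), one signup per user (assumed)
    sessions: iterable of (user, session_date)
    Returns {signup_date: {offset: number of distinct users}}.
    """
    signup_date = dict(signups)
    # returned[signup_date][offset] -> set of users seen exactly `offset` days later
    returned = defaultdict(lambda: defaultdict(set))
    for user, session_day in sessions:
        if user not in signup_date:
            continue  # session without a matching signup row
        start = signup_date[user]
        delta = (session_day - start).days
        if delta in offsets:
            returned[start][delta].add(user)  # a set keeps the count distinct
    return {
        start: {d: len(returned[start][d]) for d in offsets}
        for start in sorted(set(signup_date.values()))
    }
```

The sets play the role of COUNT(DISTINCT ...): a user with several sessions on the same day is counted once for that offset.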

Related

how to solve given Correlated Subqueries is not supported within case when statement

I have this code where I try to count every distinct user listed within a rolling window of the prior 30 days, for each of the past 40 days.
For example, for 12 Feb (12/02) I need to count all listed users from 13 Jan to 12 Feb (30 days), for 11 Feb I need to count from 12 Jan to 11 Feb, and so on. I need to do this for a lot of other dates. Is there a way to do it in Presto? It does not support correlated subqueries, and when I try the code below, it returns
"Given correlated subquery is not supported"
WITH
get_date as
(
select
distinct date_trunc('day',date(uph.last_login)) as dates
from user_profile_id_history uph
where uph.last_login >= date('2020-02-04')-interval '30' day and uph.last_login <= ( date'2020-02-04')
order by 1 desc
)
select
get_date.dates,
(
case when (dates >= date('2020-02-04')-interval '30' day and dates <= ( date'2020-02-04'))
then
(
select
count(distinct CASE WHEN date_trunc('month',date(up.registration_time)) <= date_trunc('month',uph.last_login) THEN uph.userid END)
FROM
user_profile_id_history uph
LEFT JOIN
user_profile up ON uph.userid = up.userid
where uph.last_login >= dates-interval '30' day and uph.last_login <= dates
) end
) as mauser
from get_date
group by 1
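A common way around this limitation is to pair each report date with the login rows that fall inside its window and then aggregate, instead of running a subquery per row. The Python sketch below only illustrates that windowing logic; it is not Presto code. The function name `rolling_distinct_users` and the (userid, login date) input shape are assumptions, and the registration_time condition from the query is left out for brevity. The window bounds mirror the query's `>= dates - 30 days` and `<= dates`.

```python
from datetime import date, timedelta


def rolling_distinct_users(logins, report_dates, window_days=30):
    """Count distinct users per report date over a trailing window.

    logins:       list of (userid, login_date) pairs (hypothetical shape)
    report_dates: iterable of dates to report on
    Returns {report_date: distinct users with a login in
             [report_date - window_days, report_date]}, both ends inclusive.
    """
    result = {}
    for day in report_dates:
        window_start = day - timedelta(days=window_days)
        # Equivalent to joining this date to every login row inside its window
        users = {user for user, login_day in logins
                 if window_start <= login_day <= day}
        result[day] = len(users)
    return result
```

In SQL terms, the set comprehension corresponds to a join on the date range followed by COUNT(DISTINCT userid) grouped by the report date.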

Excel Index and match: using the index and match or Vlookup to return the second record and all associated column

Customer# Date Qty Cost
12 1/2/2013 3 500
12 1/3/2013 5 200
12 1/4/2013 4 200
13 1/5/2013 1 150
14 1/6/2013 2 110
14 1/7/2013 1 110
15 1/8/2013 1 110
I have a table similar to the one above (with millions of records and 26 columns). I would like to create two tables based on it. The first should show the first order of each customer with its associated columns, and the second should show the second order of each customer (null if they don't have one).
The result I am looking for:
Table one- First order
Customer#, Date , Qty, Cost
12 , 1/2/2013, 3, 500
13 , 1/5/2013, 1, 150
14 , 1/6/2013, 2, 110
15 , 1/8/2013, 1, 110
Table two- second order table
Customer#, Date , Qty , Cost
12 , 1/3/2013, 5, 200
14 , 1/7/2013, 1 , 110
The formula I tried, which did not work:
=INDEX(B:D,MATCH(A3,A:A,0))
I would appreciate if someone shares their ideas how to use the Index and match function in excel to solve this question.
I was able to solve the issue above using Tableau. I just used the Index() function to calculate the rank based on their order date and id and filtered by the rank to get the first and second order table.
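The rank-and-filter approach described above can be sketched outside Tableau as well. The following Python function is an illustration only: the name `nth_orders` and the tuple layout (customer, date, qty, cost) are assumptions. Customers with fewer than n orders are simply omitted, as in the second result table in the question.

```python
from collections import defaultdict
from datetime import date


def nth_orders(rows, n):
    """Return the n-th order (1-based, by date) of each customer.

    rows: iterable of (customer, order_date, qty, cost) tuples (assumed layout)
    Customers with fewer than n orders are left out of the result.
    """
    by_customer = defaultdict(list)
    for row in rows:
        by_customer[row[0]].append(row)
    picked = []
    for customer in sorted(by_customer):
        # Rank each customer's orders by date, then keep rank n
        orders = sorted(by_customer[customer], key=lambda r: r[1])
        if len(orders) >= n:
            picked.append(orders[n - 1])
    return picked
```

Calling it with n=1 and n=2 on the sample rows gives the two tables requested in the question.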

Rounding Up Minutes above 8, 23, 38, 53 to the nearest quarter hour

Here’s a challenge for someone. I’m trying to round session times up to the nearest quarter hour (I report my total client hours for license credentialing)
8 minutes or above: round up to 15
23 minutes or above: round up to 30
38 minutes or above: round up to 45
53 minutes or above: round up to 60
Ex: in the first quarter hour, minutes below 8 will be their exact value: 1=1, 2=2, 3=3.
When 8 is entered, it is automatically rounded up to 15. The same holds true for the rest of the hour, e.g. 16=16, 17=17, 18=18, 19=19, 20=20, 21=21, 22=22, but 23 through 29 are all rounded up to 30.
Ideally, I could enter both hours and minutes in a single column, ex: 1.54
However, I realize that it may be necessary to create a separate column for hours and minutes in order to make this work (i.e., so that my formula is only concerned with rounding up minutes. I can add my hours and minutes together after the minutes are rounded.) Thus:
Column A = Hours (3 hours maximum)
Column B = Minutes
Column C = Minutes Rounded up to nearest ¼ hour
Column D = Col A + Col C
In column B I would like to enter minutes as 1 through 60 (no decimal- i.e., in General, not Time format)
38 minutes in column B would automatically be rounded up to 45 minutes in column C
Does anyone have any ideas? How can I do this using the fewest number of columns?
(A previously posted question, "Round up to nearest quarter", introduces the concept of Math.Ceiling. Is this something I should use? I couldn't wrap my head around the answer.)
With Grateful Thanks,
~ Jay
How's this go?
DECLARE @time DATETIME = '2014-03-19T09:59:00'
SELECT CASE
WHEN DATEPART(mi, @time) BETWEEN 8 AND 15 THEN DATEADD(mi, 15-DATEPART(mi, @time), @time)
WHEN DATEPART(mi, @time) BETWEEN 23 AND 30 THEN DATEADD(mi, 30-DATEPART(mi, @time), @time)
WHEN DATEPART(mi, @time) BETWEEN 38 AND 45 THEN DATEADD(mi, 45-DATEPART(mi, @time), @time)
WHEN DATEPART(mi, @time) BETWEEN 53 AND 59 THEN DATEADD(mi, 60-DATEPART(mi, @time), @time)
ELSE @time
END
Assume "sessions" is your table (CTE below contains 2 sample records), with session start time & end time stored (as noted in comments above, just store these data points, don't store the calculated values). You might be able to do the rounding as below. (not sure if this is what you want, since it either rounds up or down... do you not want to round down?)
;WITH sessions AS (
SELECT CAST('20140317 12:00' AS DATETIME) AS session_start, CAST('20140317 12:38' AS DATETIME) AS session_end
UNION ALL
SELECT CAST('20140317 12:00' AS DATETIME), CAST('20140317 12:37:59' AS DATETIME) AS session_end
)
SELECT *, DATEDIFF(MINUTE, session_start, session_end) AS session_time
, ROUND(DATEDIFF(MINUTE, session_start, session_end)/15.0, 0) * 15.0 AS bill_time
FROM sessions;
EDIT:
Hi Jay, I don't think you mentioned it is an Excel problem! I was assuming SQL. As Stuart suggested in a comment above, it would be helpful if you modified your question to indicate it is for Excel, so that others can possibly get help from this dialog in the future.
With Excel, you can do it with two columns that contain the session start date and time (column A) and session end date and time (column B), plus two formulas:
Column C (Actual Minutes) = ROUND((B1-A1) * 1440,0)
Column D (Billing Minutes) = (FLOOR(C1/15, 1) * 15) + IF(MOD(C1,15) >= 8, 15, MOD(C1,15))
This is what my table looks like:
3/18/2014 12:00 3/18/2014 12:38 38 45
3/18/2014 14:00 3/18/2014 14:37 37 37
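For readers who want to check the rule outside a spreadsheet, the same logic as the Billing Minutes formula can be written as a small Python function. The name `billing_minutes` is made up for this sketch, and it assumes whole, non-negative minutes.

```python
def billing_minutes(minutes):
    """Round minutes up to the next quarter hour once 8 or more minutes
    into a quarter; otherwise keep the exact value.

    Assumes `minutes` is a non-negative whole number.
    """
    quarters, remainder = divmod(minutes, 15)
    if remainder >= 8:
        return (quarters + 1) * 15  # 8..14 past a quarter: bump to the next one
    return minutes  # 0..7 past a quarter: unchanged
```

With this rule, 38 becomes 45 and 37 stays 37, matching the two rows in the table above.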

percent_rank special case to not include the value being evaluated in the range of group to be evaluated

Consider these values:
company_ID 3yr_value
1 10
2 20
3 30
4 40
5 50
I have this statement on my query and my goal is to compute for the percent rank of value 50 in the group
round(((percent_rank() over (partition by bb.company_id order by bb.3yr_value)) * 100))
in excel, this is equivalent to
=percentrank(b1:b5,b5)
BUT what I need is an equivalent to this one: =percentrank(b1:b4,b5). Notice that I don't include B5 in the range that needs to be evaluated. I'm out of options and have already consulted Mr. Google, but I still can't find a solution. I always end up including B5 in my query.
I'm using PostgreSQL.

Using BigQuery to find outliers with standard deviation results combined with WHERE clause

Standard deviation analysis can be a useful way to find outliers. Is there a way to incorporate the result of this query (finding the value of the fourth standard deviation away from the mean)...
SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as high FROM [publicdata:samples.natality];
result = 12.721342001626912
...into another query that produces information about which states and dates have the most babies born heavier than 4 standard deviations from average?
SELECT state, year, month ,COUNT(*) AS outlier_count
FROM [publicdata:samples.natality]
WHERE
(weight_pounds > 12.721342001626912)
AND
(state != '' AND state IS NOT NULL)
GROUP BY state, year, month
ORDER BY outlier_count DESC;
Result:
Row state year month outlier_count
1 MD 1990 12 22
2 NY 1989 10 17
3 CA 1991 9 14
Essentially it would be great to combine this into a single query.
You can abuse JOIN for this (and thus performance will suffer):
SELECT n.state, n.year, n.month ,COUNT(*) AS outlier_count
FROM (
SELECT state, year, month, weight_pounds, 1 as key
FROM [publicdata:samples.natality]) as n
JOIN (
SELECT (AVG(weight_pounds) + STDDEV(weight_pounds) * 4) as giant_baby,
1 as key
FROM [publicdata:samples.natality]) as o
ON n.key = o.key
WHERE
(n.weight_pounds > o.giant_baby)
AND
(n.state != '' AND n.state IS NOT NULL)
GROUP BY n.state, n.year, n.month
ORDER BY outlier_count DESC;
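The two-step logic (compute one global threshold, then filter and count per group) can be sketched in Python to make it easy to verify on a small sample. The name `outlier_counts` and the (group, weight) input shape are assumptions. `statistics.stdev` is the sample standard deviation; whether that matches BigQuery's STDDEV exactly is also an assumption here.

```python
from collections import Counter
from statistics import mean, stdev


def outlier_counts(records, k=4.0):
    """Count records per group whose weight exceeds mean + k * stdev.

    records: list of (group, weight) pairs; weight may be None (assumed shape)
    Returns (threshold, [(group, count), ...]) sorted by count, descending.
    """
    weights = [w for _, w in records if w is not None]
    # Single global threshold, like the one-row subquery in the JOIN above
    threshold = mean(weights) + k * stdev(weights)
    counts = Counter(
        group
        for group, w in records
        # `group` is falsy for '' and None, mirroring the state filter
        if w is not None and w > threshold and group
    )
    return threshold, counts.most_common()
```

Note that a small sample cannot contain a point 4 sample standard deviations above the mean (this needs roughly 18 or more rows), so `k` is left as a parameter for experimenting.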

Resources