Using pandas, how can I find out if my customer made a purchase last month or two months ago? - excel

I'm new to Python and new to pandas. If my project used exact dates I could do this easily, but unfortunately the date format is a little different: as you can see, the 08 after the year 1401 means the eighth month of the year 1401.
I currently know that these 3 customers have bought from me this month. But I want to know whether these 3 customers also bought from me in the previous month or two months ago. If they did, I will give them a discount.
I should also say that the 08 is not fixed; next month it will be 09. I just want to know whether they bought from me 1 month ago or not.
According to the picture, only Sara should get a discount right now.

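You could convert the purchase date to an integer and calculate the number of months from there.
For instance, you have the purchase month 1901/07 and you want to know, in 1901/08, how many months ago the last purchase took place. So you convert both values to integers and subtract them (190108 - 190107 = 1). Note that this plain subtraction only gives the month count when both dates fall in the same year; across a year boundary it does not (190201 - 190112 = 89, not 1).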
import pandas as pd
df = pd.DataFrame({'customer': ['david', 'sara'], 'date': ['1901/03', '1901/07']})
# Manually setting the reference month (190108 for Year 1901 and Month 08)
df['months'] = 190108 - df['date'].replace('/', '', regex=True).astype(int)
# Check if eligible for discount
df['discount'] = df['months'].isin([1, 2])
  customer     date  months  discount
0    david  1901/03       5     False
1     sara  1901/07       1      True
To compare with today's month you could do the following (this assumes the dates in the column use the same calendar as pd.Timestamp, i.e. Gregorian):
df['months'] = int(pd.Timestamp.now().strftime('%Y%m'))\
- df['date'].replace('/', '', regex=True).astype(int)
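The integer subtraction above breaks when the two dates fall in different years. A minimal sketch of one way around that, assuming the dates are always 'YYYY/MM' strings with 12 months per year: convert each date to a running month count (year * 12 + month) before subtracting. The helper name `months_since`, the customer 'amy' and the reference month '1902/01' are hypothetical and only serve the illustration.

```python
import pandas as pd

def months_since(dates: pd.Series, ref: str) -> pd.Series:
    """Months between each 'YYYY/MM' string in `dates` and the reference 'YYYY/MM'."""
    # Split 'YYYY/MM' into two integer columns: 0 = year, 1 = month
    parts = dates.str.split('/', expand=True).astype(int)
    ref_year, ref_month = (int(p) for p in ref.split('/'))
    # Difference of running month counts, safe across year boundaries
    return (ref_year - parts[0]) * 12 + (ref_month - parts[1])

# Hypothetical data: the reference month is in the year after the purchases
df = pd.DataFrame({'customer': ['david', 'sara', 'amy'],
                   'date': ['1901/03', '1901/07', '1901/12']})
df['months'] = months_since(df['date'], '1902/01')
df['discount'] = df['months'].isin([1, 2])
```

With this data only 'amy' (one month before the reference month) is flagged for a discount, which the plain integer subtraction would miss.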

Related

How to find a trend/forecast result 14 days from today

I have looked into the FORECAST and TREND formulas but I cannot figure them out for the life of me.
I want to work out the trend 14 days from now.
I have a set of data:
A1 - A30 with dates
B1 - B30 with daily ticket count for the business.
I would like to make a result in another cell that would predict what the estimated total ticket count would be 14 days from now. I do not need all 14 days, just the 14th day.
If I were to show you what the formula looks like in my head, it would be:
=trend/forecast(B1:B30,14)
or
=Predict(B1:B30)*14
Unfortunately it is not as easy as that. How can I do this?
I think you want to use the Forecast function. The inputs you have do not match the correct format though.
FORECAST( x, known y's, known x's) where...
x = the series (or date) you want to forecast
known y's = historical tickets per day
known x's = historical dates (or series)
The below example allows you to forecast tickets for any date (Forecasted Date) given the historical information (table on left). If your table is not formatted with actual dates, just create a series (first day = 1, second day = 2, etc.) and forecast that way.
Given the historical data, the forecasted tickets for Aug 28th (14 days after last known value) are 16.7
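For readers who want to check the idea outside the spreadsheet, here is a minimal sketch of what a FORECAST-style prediction does: fit a straight line to the known points and evaluate it at the new x. The function name `forecast` and the ticket counts are hypothetical (the data is made perfectly linear so the result is easy to verify), and day numbers 1 to 30 stand in for the dates.

```python
import numpy as np

def forecast(x, known_ys, known_xs):
    """Least-squares straight-line prediction at x, in the spirit of FORECAST(x, known y's, known x's)."""
    # Degree-1 polyfit returns the coefficients as [slope, intercept]
    slope, intercept = np.polyfit(known_xs, known_ys, 1)
    return intercept + slope * x

# Hypothetical history: 30 days numbered 1..30, tickets growing by 0.5 per day
known_xs = np.arange(1, 31)
known_ys = 10 + 0.5 * known_xs

# 14 days after the last known day (30 + 14 = 44)
predicted = forecast(44, known_ys, known_xs)
```

Because the sample data lies exactly on the line 10 + 0.5 * day, the prediction for day 44 comes out at about 32.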

US convention different in spreadsheet functions (Libre, Google Sheets, etc.)

The Excel/Google-Sheets/LibreOffice function DAYS360() returns the number of days between two dates based on a 360-day year. The third parameter defaults to 0, which selects the US-based method. Here are some examples:
A = 30 Apr 2016, B = 29 Feb 2016, DAYS360(A, B) = -61
A = 29 Feb 2016, B = 30 Apr 2016, DAYS360(A, B) = 60
This seems ok according to the rules here
But the Excel/Google-Sheets/LibreOffice function YEARFRAC() returns the number of years, including fractional years, between two dates using a specified day count convention. Here too, 0 (default) selects the US method (US (NASD) 30/360), which I presumed would give the number of days calculated by DAYS360 divided by 360. The values in the sheets are as follows:
A = 30 Apr 2016, B = 29 Feb 2016, YEARFRAC(A, B) = 0.1666666667
A = 29 Feb 2016, B = 30 Apr 2016, YEARFRAC(A, B) = 0.1666666667
The absolute values of the two DAYS360 results differ by one, yet the YEARFRAC value is the same in both cases and corresponds to 60 days under the presumption above. So is the US-based convention mentioned here the same as the one used by DAYS360?
If not, what are the exact rules for this one, or is there some other problem?
NOTE: Tested these values on Google Sheets and Libre Office.
DAYS360 parameter 3:
0 indicates the US method - Under the US method, if start_date is the last day of a month, the day of month of start_date is changed to 30 for the purposes of the calculation. Furthermore if end_date is the last day of a month and the day of the month of start_date is earlier than the 30th, end_date is changed to the first day of the month following end_date, otherwise the day of month of end_date is changed to 30.
1 or any other value indicates the European method - Under the European method, any start_date or end_date that falls on the 31st of a month has its day of month changed to 30.
YEARFRAC parameter 3:
0 indicates US (NASD) 30/360 - This assumes 30 day months and 360 day years as per the National Association of Securities Dealers standard, and performs specific adjustments to entered dates which fall at the end of months.
1 indicates Actual/Actual - This calculates based upon the actual number of days between the specified dates, and the actual number of days in the intervening years. Used for US Treasury Bonds and Bills, but also the most relevant for non-financial use.
2 indicates Actual/360 - This calculates based on the actual number of days between the specified dates, but assumes a 360 day year.
3 indicates Actual/365 - This calculates based on the actual number of days between the specified dates, but assumes a 365 day year.
4 indicates European 30/360 - Similar to 0, this calculates based on a 30 day month and 360 day year, but adjusts end-of-month dates according to European financial conventions.
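To make the asymmetry in the question concrete, here is a sketch of a US-style 30/360 day count that reproduces the two DAYS360 values quoted above (-61 and 60). The exact adjustment rules are an assumption on my part (start dates on the 31st or on the last day of February count as the 30th; an end date on the 31st counts as the 30th only when the adjusted start day is 30), not something confirmed by the documentation quoted here. The second function assumes that YEARFRAC puts the two dates in chronological order before counting, which would explain why it returns the same value in both directions. Both function names are hypothetical.

```python
from datetime import date, timedelta

def days360_us(start: date, end: date) -> int:
    """30/360 day count, US style (assumed rules, see the comments below)."""
    d1, d2 = start.day, end.day
    # Assumed rule: a start date on the last day of February counts as the 30th
    last_day_of_feb = start.month == 2 and (start + timedelta(days=1)).month == 3
    if d1 == 31 or last_day_of_feb:
        d1 = 30
    # Assumed rule: an end date on the 31st counts as the 30th only if the start day is 30
    if d2 == 31 and d1 == 30:
        d2 = 30
    return (end.year - start.year) * 360 + (end.month - start.month) * 30 + (d2 - d1)

def yearfrac_us(a: date, b: date) -> float:
    """Year fraction on a 360-day year, assuming the dates are ordered first."""
    first, last = sorted((a, b))
    return days360_us(first, last) / 360

apr30, feb29 = date(2016, 4, 30), date(2016, 2, 29)
forward = days360_us(feb29, apr30)    # start is the last day of February
backward = days360_us(apr30, feb29)   # end date of Feb 29 is left unadjusted
```

Under these assumptions the forward count is 60 and the backward count is -61, while `yearfrac_us` gives 60/360 for either argument order.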

How do I write an Excel formula to compare year-to-date to prior years & also account for leap years?

I’m trying to compare a measure as of today through the same day and month for the prior 4 years (e.g. through June 6 of 2016, 2015, 2014, etc.).
For each year, I decided to count the number of days since the beginning of the year, and sum my values through that number of days for each year.
To identify whether a date should be included in the year to date comparison, I used the formula where my date is in cell A1:
=IF((A1-DATE(YEAR(A1),1,1)+1)<=(TODAY()-DATE(YEAR(TODAY()),1,1)+1),1,0)
I’m looking for a way around the issue of the extra day added in leap years. In other words, after February 28th, the day count will always be off by one in a leap year, and trying to use February 29th in a non-leap year will return an error.
I’d like to adjust this formula, but I’m open to using a different function & formula if it gets me the right results.
You can check any information about February 29th. If an error occurs, you know it's not a leap year. Catch that error with =IFERROR(;).
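The same probe-for-February-29th trick can be sketched in Python, where building an invalid date raises an error instead of returning one. The second helper is a different technique from the day-count formula in the question: it compares (month, day) pairs directly, so the extra leap day never shifts the cut-off. Both function names are hypothetical.

```python
from datetime import date

def is_leap_year(year: int) -> bool:
    """Probe February 29th; constructing the date fails in a non-leap year."""
    try:
        date(year, 2, 29)
    except ValueError:
        return False
    return True

def in_year_to_date(d: date, today: date) -> bool:
    """True if d falls on or before today's month and day within its own year."""
    # Tuples compare month first, then day, so no day-of-year count is needed
    return (d.month, d.day) <= (today.month, today.day)
```

For example, with today set to June 6, 2016, the date June 6, 2015 is included and June 7, 2015 is not, regardless of which of the two years is a leap year.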
Assuming a table structure like this:
A:Date | B:Value
----------------------
01/01/2016 | 0
01/01/2015 | 1
01/01/2014 | 2
01/01/2013 | 3
01/01/2012 | 4
Formula
For example, to calculate the average over the previous four years (excluding the current one) on January 1st (today is 01/01/2016), where compare is assumed to be a named cell holding the comparison date:
=SUMPRODUCT(
(MONTH(A:A)=MONTH(compare))*
(DAY(A:A)=DAY(compare))*
(YEAR(A:A)>YEAR(compare)-5)*
(YEAR(A:A)<YEAR(compare))*
(B:B)
) / (
SUMPRODUCT(
(MONTH(A:A)=MONTH(compare))*
(DAY(A:A)=DAY(compare))*
(YEAR(A:A)>YEAR(compare)-5)*
(YEAR(A:A)<YEAR(compare))*
1
)
)
Result
For the above example, the result is 2.5
Explanation
To select only those rows representing the same month and day:
(MONTH(A:A)=MONTH(compare))*(DAY(A:A)=DAY(compare))
To select only those values from the previous 4 years (excluding the current):
(YEAR(A:A)>YEAR(compare)-5)*(YEAR(A:A)<YEAR(compare))
The actual values we are interested in:
(B:B)
Divide by 4 for the average over the last four years. This assumes there is no missing data, which might be an issue. To handle that case, the formula above uses a second SUMPRODUCT (with B:B replaced by 1) to count the matching rows and divides by that count. This seems to be rather slow, but it works.
Note
For performance reasons you should not use A:A (a full column) in the formula. Just use the actual range you need, which will likely be much faster.
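The same selection logic can be sketched in pandas, which may be easier to test than the SUMPRODUCT formula. This is only an illustration using the sample table above (dates read as day/month/year or month/day/year makes no difference for January 1st); the column names and the variable `compare` are assumptions.

```python
import pandas as pd

# The sample table from the answer
df = pd.DataFrame({
    "date": pd.to_datetime(["2016-01-01", "2015-01-01", "2014-01-01",
                            "2013-01-01", "2012-01-01"]),
    "value": [0, 1, 2, 3, 4],
})
compare = pd.Timestamp("2016-01-01")

# Same month and day, within the previous four years, excluding the current year
mask = (
    (df["date"].dt.month == compare.month)
    & (df["date"].dt.day == compare.day)
    & (df["date"].dt.year > compare.year - 5)
    & (df["date"].dt.year < compare.year)
)
# mean() divides by the number of matching rows, so missing years are handled
average = df.loc[mask, "value"].mean()
```

For the sample table this selects the values 1, 2, 3 and 4, so the average is 2.5, matching the result above.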

How do I get the Month End Personal Time Off Days used?

Using the Start Date and End Date of PTO (Personal Time Off), I want Days Used to count only the days used up to the end of the prior month, excluding weekends and U.S. holidays in that month. An example of a holiday is Sept 7th 2015 in the United States.
My goals are:
1. Create a Data Item for Month End Personal Time Off Days Used.
2. It should return the number of PTO days used in the prior month only.
3. Exclude weekends in that month. So if the resource takes leave on Friday and Monday, the Saturday and Sunday in between should not be counted.
4. Exclude U.S. holidays. If this is possible that's great, but if it's not possible then I'm okay with numbers 1, 2 and 3.
I have created a Data Item column that captures the PTO days used, but it only works for year to date.
Case when [PTO Info].[PTO Audit].[PTOAuditTypeId] = 31571
and [PTO Info].[PTO Audit].[TimeOffTypeId] = 31566
then [PTO Info].[PTO Audit].[PTODays]
when [PTO Info].[PTO Audit].[PTOAuditTypeId]=31572
and [PTO Info].[PTO Audit].[TimeOffTypeId] = 31566
and [PTO Info].[PTO Audit].[PTODays] < 0
then abs([PTO Info].[PTO Audit].[PTODays] )
else 0 end
I'm not sure if the query below can help.
A calendar table is really going to help you out here. Assuming it has one record per calendar date, you can use this table to flag weekends, holidays, fiscal versus calendar periods, and beginning-of-month and end-of-month dates, all of which help simplify date-based queries.
See this question here for an example on creating a calendar table.
The main point is to create a data set with 1 record per date, with information about each date including Month, Day of Week, Holiday status, etc.
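As an illustration of what such a calendar data set contains, here is a small pandas sketch that builds one row per date. The function name, the column names and the holiday list are assumptions; `day_of_week` is numbered 1 = Sunday through 7 = Saturday, so weekdays are 2 to 6.

```python
import pandas as pd

def build_calendar(start, end, holidays=()):
    """One row per calendar date with a few descriptive columns."""
    cal = pd.DataFrame({"date": pd.date_range(start, end, freq="D")})
    # pandas uses Monday=0 .. Sunday=6; shift to Sunday=1 .. Saturday=7
    cal["day_of_week"] = (cal["date"].dt.dayofweek + 1) % 7 + 1
    # Holidays are supplied by the caller; nothing is looked up automatically
    cal["is_holiday"] = cal["date"].isin(pd.to_datetime(list(holidays)))
    # Period key in YYYYMM form, e.g. '201509'
    cal["calendar_period"] = cal["date"].dt.strftime("%Y%m")
    return cal

# Saturday Sept 5th to Tuesday Sept 8th 2015, with Sept 7th marked as a holiday
cal = build_calendar("2015-09-05", "2015-09-08", holidays=["2015-09-07"])
```

In a real setup this table would usually live in the database so that the joins run there rather than in the reporting layer.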
Without a calendar table, you can use database functions to generate your set of dates on the fly.
Getting the Month number for a date can be done with
extract([Month], <date field goes here> )
If you don't have a calendar table with 1 record per date, you will need to generate the list of dates from nothing, and how you do that varies depending on your source database. In Oracle I use a 'select from all_objects' type query to achieve this.
An example from Ask Tom:
select to_date(:start_date,'dd-mon-yyyy') + rownum -1
from all_objects
where rownum <=
to_date(:end_date,'dd-mon-yyyy')-to_date(:start_date,'dd-mon-yyyy')+1
For Sql Server refer to this stackoverflow question here.
Once you have a data set with your calendar type information, you can join it to your query above:
join mycalendar cal on cal.date >= c.PTOStartDate
and cal.date <= c.PTOEndDate
Also note, _add_days is a Cognos function. When building your source queries, try to use native functions; in Oracle, for example, you can write 'c.PTOStartDate + a.PTODays'. Mixing Cognos functions with native functions will sometimes force parts of your queries to be processed locally on the Cognos server. Generally speaking, the more work that happens on the database, the faster your reports will run.
Once you have joined to the calendar data, you are going to have your records multiplied out so that you have 1 record per date. (You would not want to be doing any summary math on PTODays here, as it will be inflated.)
Now you can add clauses to track your rules.
where cal.Day_Of_Week between 2 and 6
and cal.Is_Holiday = 'N'
Now if you are pulling a specific month, you can add that to the criteria:
and cal.CalendarPeriod = '201508'
Or if you are covering a longer period, but wanting to report a summary per month, you can group by month.
Final query could look something like this:
select c.UserID, cal.CalendarPeriod, count(*) PTO_Days
from dbo.PTOCalendar c
join myCalendar cal on cal.date >= c.PTOStartDate
and cal.date <= c.PTOEndDate
where cal.day_of_week between 2 and 6
and cal.Is_Holiday = 'N'
group by c.UserID, cal.CalendarPeriod
So if the employee with UserID 1234 took a vacation from Thursday June 25th to Friday July 3rd, which spans 9 calendar days but only 7 working days, the result you get here will be:
1234 201506 4
1234 201507 3
You can join these results to your final query above to track days off per month.
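The per-month count produced by the final query can be cross-checked with a short Python sketch. It expands the PTO range into individual dates, drops weekends and any supplied holidays, and counts what is left per YYYYMM period. The function name and the holiday argument are assumptions; no holiday calendar is built in.

```python
from collections import Counter
import pandas as pd

def pto_days_by_month(start, end, holidays=()):
    """Count working days between start and end (inclusive), grouped by YYYYMM."""
    days = pd.date_range(start, end, freq="D")
    holiday_set = {pd.Timestamp(h) for h in holidays}
    # dayofweek: Monday=0 .. Sunday=6, so values below 5 are weekdays
    working = [d for d in days if d.dayofweek < 5 and d not in holiday_set]
    return dict(Counter(d.strftime("%Y%m") for d in working))

# The vacation from the example: Thursday June 25th to Friday July 3rd 2015
result = pto_days_by_month("2015-06-25", "2015-07-03")
```

With no holidays supplied this gives 4 days for 201506 and 3 days for 201507, the same figures as the query output above.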

Excel find date difference between minimum and maximum date in month

I'm trying to fix up a formula I have that's having some issues. It's supposed to track the number of days invoiced in a month, so the high-level idea is to take the maximum date in a month and subtract the minimum date in the month, and on error subtract the 1st day of the month. My current formula has trouble adjusting for invoices that cross months. For example, with 1/25 - 2/3 as the only invoice, January should show 7 days invoiced and February should show 3. If there were another invoice from 2/15 - 2/28, I would want February to show the largest number of invoiced days, 14 in this example.
For reference here's what a table could look like:
A            B          C              D            E      F
start month  end month  invoice begin  invoice end  Month  Max Days invoiced
jan 1        feb 1      1/25/14        2/3/14       1/1    7
feb 1        feb 1      2/15/14        2/28/14      2/1    14
                                                    3/1
etc.........
I tried the formula below but it was erroring out, plus I don't think it will account for gaps in invoices like in my example:
=IF(B2:B100=X1,MAX(D2:D100),) - IF(A2:A100=X2,MIN(C2:C100),A2)
'where column X is a list of months, X1 = 1/1, X2 = 2/1, etc.
No luck with this formula either, keeps erroring out and giving 0 values:
{=DATEDIF(IF(A2:A100=E2,MIN(C2:C100),),IF(B2:B100=E2,MAX(D2:D100),),"d")}
I appreciate your help!
Not sure exactly what you are looking for, but you could probably make use of the EOMONTH() function. Here's an example that returns the number of days from the date in A2 through the end of its month:
=EOMONTH(A2,0)-A2+1
By the way, here is how you would get the start of the current month:
=EOMONTH(TODAY(),-1)+1
Try the following, per your comment (it appears to expect the invoice begin date in A3 and the invoice end date in B3):
"I think this could be useful but I'm not sure it would work if the invoice end was, say, 2/21 or anytime before the EOM"
=IF(B3>=EOMONTH(A3,0),EOMONTH(A3,0)-A3+1,B3-A3+1)
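To check the expected numbers from the question (7 for January, 14 for February), here is a Python sketch of one possible reading of the requirement: split each invoice into the days it covers in each month, then keep the largest count per month. The function name is hypothetical, and treating "Max Days invoiced" as the maximum over single invoices is an assumption based on the example.

```python
from collections import defaultdict
from datetime import date, timedelta

def days_invoiced_by_month(invoices):
    """For each (year, month), the largest number of days covered by any single invoice."""
    best = defaultdict(int)
    for begin, end in invoices:
        # Count the days this invoice covers in each month, both ends inclusive
        per_month = defaultdict(int)
        day = begin
        while day <= end:
            per_month[(day.year, day.month)] += 1
            day += timedelta(days=1)
        for month, count in per_month.items():
            best[month] = max(best[month], count)
    return dict(best)

# The two invoices from the question
invoices = [
    (date(2014, 1, 25), date(2014, 2, 3)),
    (date(2014, 2, 15), date(2014, 2, 28)),
]
result = days_invoiced_by_month(invoices)
```

January gets 7 days from the first invoice; February gets 3 days from the first invoice and 14 from the second, so the maximum of 14 is kept.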
