Group by the dates to weeks - python-3.x

I have timestamps in milliseconds since the epoch, and I need to convert them to dates and group them by week number.
I tried the following procedure:
df.loc[0, 'seconds'] = df['seconds'].iloc[0]
for _, grp in df.groupby(pd.TimeGrouper(key='seconds', freq='7D')):
    print(grp)
df["week"].to_period(freq='w')
For example, if my 'seconds' column contains 1557499095332, I want the 'dates' column to show 10-05-2019 20:08:15 and the 'Week' column to show W19 or 19.
How do I go about this?

Try using the strftime method:
from datetime import datetime as dt
x = 1557499095332
dt.fromtimestamp(x/1000).strftime("%A, %B %d, %Y %I:%M:%S")
dt.fromtimestamp(x/1000).strftime("%W")
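The third line returns 'Friday, May 10, 2019 03:38:15'. The exact time depends on the time zone of the machine running it, because fromtimestamp converts to local time, and %I is the 12-hour clock.
The fourth line returns '18' because %W numbers Monday-based weeks and counts every day before the year's first Monday (1 January 2019 was a Tuesday) as week 0. For the ISO week number the question asks for (19), use dt.fromtimestamp(x/1000).isocalendar()[1] instead.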
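For the pandas side of the question, here is a minimal sketch, assuming the millisecond timestamps live in a 'seconds' column as in the question. pd.to_datetime(..., unit='ms') produces naive UTC times, so the 20:08:15 in the question looks like local time at UTC+05:30; that offset is an assumption here, so substitute your own. Note that pd.TimeGrouper has been removed from current pandas; pd.Grouper(key='dates', freq='W') is the replacement if you want calendar-week bins rather than week numbers.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical sample data; only the first value comes from the question
df = pd.DataFrame({"seconds": [1557499095332, 1557585495332, 1558103895332]})

# unit="ms" reads the integers as milliseconds since the epoch; the result is naive UTC
df["dates"] = pd.to_datetime(df["seconds"], unit="ms")

# Assumption: the question's 20:08:15 is local time at UTC+05:30; change the offset to yours
local_tz = timezone(timedelta(hours=5, minutes=30))
df["local"] = df["dates"].dt.tz_localize("UTC").dt.tz_convert(local_tz)

# ISO-8601 week number: weeks start on Monday and week 1 contains the year's first Thursday
df["week"] = df["local"].dt.isocalendar().week

# One group per week number
for week, grp in df.groupby("week"):
    print(f"W{week}", grp["local"].dt.strftime("%d-%m-%Y %H:%M:%S").tolist())
```

If you prefer the Monday-based %W numbering from the answer above (18 for this date) instead of ISO weeks, use df["local"].dt.strftime("%W").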

Related

Is there a way to convert numerical month/day/year to letter form month/day/year in pandas?

Currently, I have a column in the form month/day/year like 2/11/2020. I am trying to get it into the form of February Eleven 2020.
So far I've tried looking into dt.time to split it into day and month, but it looks like that needs the date in yy-mm-dd form. I was thinking I could split it into three columns and then use .replace on each column with a dictionary.
Does anyone know how to get this method to work or have a better solution?
Here are three format strings I have come across:
import pandas as pd
df = pd.DataFrame({'Date': ['2/11/2020']})
dates = pd.to_datetime(df['Date'], format='%m/%d/%Y')
df['v1'] = dates.dt.strftime('%d %B %Y')     # 11 February 2020 (day Month Year)
df['v2'] = dates.dt.strftime('%A %B %Y')     # Tuesday February 2020 (weekday Month Year)
df['v3'] = dates.dt.strftime('%A %d %B %Y')  # Tuesday 11 February 2020
I don't believe the standard datetime library has functionality to spell out the days of the month, but you can easily create your own dict to handle that.
from datetime import datetime
now = datetime.now()
date_dict = {'19': 'Nineteenth',
'20':'Twentieth',
'21': 'Twenty First'}
print(now.strftime('%B ') + date_dict[now.strftime('%d')] + now.strftime(' %Y'))
'June Twenty First 2020'
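The dictionary above only covers the 19th through the 21st, so any other day raises a KeyError. A sketch of a complete mapping in the same style, assuming "Twenty First"-style ordinals are what you want; the names ORDINALS and spell_date are hypothetical:

```python
from datetime import date

# Ordinal words for days 1-31, in the answer's "Twenty First" style
ORDINALS = [
    "First", "Second", "Third", "Fourth", "Fifth", "Sixth", "Seventh", "Eighth",
    "Ninth", "Tenth", "Eleventh", "Twelfth", "Thirteenth", "Fourteenth", "Fifteenth",
    "Sixteenth", "Seventeenth", "Eighteenth", "Nineteenth", "Twentieth",
    "Twenty First", "Twenty Second", "Twenty Third", "Twenty Fourth", "Twenty Fifth",
    "Twenty Sixth", "Twenty Seventh", "Twenty Eighth", "Twenty Ninth", "Thirtieth",
    "Thirty First",
]

# Keys are zero-padded like strftime('%d') output: '01', '02', ..., '31'
date_dict = {f"{day:02d}": word for day, word in enumerate(ORDINALS, start=1)}


def spell_date(d):
    """Format a date, datetime or pandas Timestamp as e.g. 'June Twenty First 2020'."""
    return d.strftime('%B ') + date_dict[d.strftime('%d')] + d.strftime(' %Y')


print(spell_date(date(2020, 2, 11)))
```

Because pandas Timestamps are datetime subclasses, spell_date can be passed directly to Series.apply in the examples below.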
Here is a larger toy example using apply with a lambda. The extra C column is added because an Index has no apply method (Index.map would also work).
import pandas as pd
dr = pd.date_range(start='6/19/2020', end='6/21/2020')
df = pd.DataFrame({'A' :['6/19/2020', '6/20/2020', '6/21/2020'], 'B':[3,4,5]}, index=dr)
df['C'] = df.index
df['D'] = df['C'].apply(lambda x: x.strftime('%B ') + date_dict[x.strftime('%d')] + x.strftime(' %Y'))
print(df)
                    A  B           C                       D
2020-06-19  6/19/2020  3  2020-06-19    June Nineteenth 2020
2020-06-20  6/20/2020  4  2020-06-20     June Twentieth 2020
2020-06-21  6/21/2020  5  2020-06-21  June Twenty First 2020
If you just want to use an already existing string column (like column A) in m/d/y format, you can use this method.
df['A'] = pd.to_datetime(df['A'], dayfirst=False)
df['D'] = df['A'].apply(lambda x: x.strftime('%B ') + date_dict[x.strftime('%d')] + x.strftime(' %Y'))
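The question itself asks for cardinal numbers ("February Eleven 2020") rather than ordinals. A minimal vectorized sketch under that assumption, mapping the integer day from .dt.day instead of applying a lambda per row; the helper day_word and the Spelled column name are hypothetical:

```python
import pandas as pd

# Cardinal words for 1-19; 20-31 are built from "Twenty"/"Thirty" plus a unit
_SMALL = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine",
          "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen", "Sixteen",
          "Seventeen", "Eighteen", "Nineteen"]
_TENS = {2: "Twenty", 3: "Thirty"}


def day_word(n):
    """Spell a day of the month (1-31) as a cardinal number, e.g. 11 -> 'Eleven'."""
    if n < 20:
        return _SMALL[n]
    tens, unit = divmod(n, 10)
    return _TENS[tens] if unit == 0 else f"{_TENS[tens]} {_SMALL[unit]}"


DAY_WORDS = {n: day_word(n) for n in range(1, 32)}

df = pd.DataFrame({"Date": ["2/11/2020", "6/21/2020", "12/30/2019"]})
dates = pd.to_datetime(df["Date"], format="%m/%d/%Y")
# Month name + spelled-out day + year, computed column-wise
df["Spelled"] = (
    dates.dt.strftime("%B") + " " + dates.dt.day.map(DAY_WORDS) + " " + dates.dt.year.astype(str)
)
print(df)
```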

Pandas to recognize current date, and filter a date column relative to today's date

I'm having a lot of trouble translating the logic below into pandas/Python, so I don't have sample code or a DataFrame to work with yet.
I run a daily report that filters for data from Monday through the day before today. I have a Date column in dt.strftime('%#m/%#d/%Y') format. The scope will never be longer than Monday through Sunday.
1) Recognize what day it is 'today' when the report runs, and find the closest prior Monday. Filter the Date column for Monday through the day before today (dates in dt.strftime('%#m/%#d/%Y') format).
2) Within those filtered rows, check the dates in a second column, Date2. Change any Date2 value earlier than that Monday to the Monday date used for the Date column.
3) If today is a Monday, filter the Date column from the prior Monday through Sunday. Within that filter, apply step 2, and also change any Date2 values that fall on a Saturday or Sunday to the Friday date.
Does this make sense?
Here are the steps:
import pandas as pd
from datetime import datetime

# assumes Date and Date2 already hold datetimes, e.g.
# df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
today = pd.to_datetime(datetime.now().date())
day_of_week = today.dayofweek
last_monday = today - pd.to_timedelta(day_of_week, unit='d')
# if today is Monday, we need to step back another week
if day_of_week == 0:
    last_monday -= pd.to_timedelta(7, unit='d')
# filter for last Monday through yesterday
last_monday_flags = (df.Date >= last_monday) & (df.Date < today)
# filter for Date2 < last Monday
date2_flags = (df.Date2 < last_monday)
# update where both flags are true
flags = last_monday_flags & date2_flags
df.loc[flags, 'Date2'] = last_monday
# if today is Monday
if day_of_week == 0:
    last_sunday = last_monday + pd.to_timedelta(6, unit='d')
    last_sat = last_sunday - pd.to_timedelta(1, unit='d')
    last_week_flags = (df.Date >= last_monday) & (df.Date <= last_sunday)
    last_sat_flags = (df.Date2 == last_sat)
    last_sun_flags = (df.Date2 == last_sunday)
    # I'm just too lazy and not sure how Sat and Sun relate to Fri,
    # but I guess just subtract 1 or 2 days depending on which day
    ...
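A fuller sketch of all three rules, assuming Date and Date2 are parsed with pd.to_datetime(..., format='%m/%d/%Y'), that "the Friday date" means the Friday just before a weekend Date2 value, and that the report should keep only the in-scope rows. today is passed in so the logic can be checked against a fixed date; adjust_report is a hypothetical name:

```python
import pandas as pd


def adjust_report(df, today):
    """Apply the report rules relative to `today` and return only the in-scope rows."""
    df = df.copy()
    today = pd.Timestamp(today).normalize()
    # Monday of the current week; on a Monday, step back to the previous week
    monday = today - pd.Timedelta(days=today.dayofweek)
    if today.dayofweek == 0:
        monday -= pd.Timedelta(days=7)

    # Rule 1: rows whose Date falls from that Monday through yesterday
    in_scope = (df["Date"] >= monday) & (df["Date"] < today)

    # Rule 2: within scope, move Date2 values earlier than Monday up to Monday
    df.loc[in_scope & (df["Date2"] < monday), "Date2"] = monday

    # Rule 3 (Mondays only): move Saturday/Sunday Date2 values back to Friday
    if today.dayofweek == 0:
        days_back = df["Date2"].dt.dayofweek.map({5: 1, 6: 2}).fillna(0)
        weekend = in_scope & (days_back > 0)
        df.loc[weekend, "Date2"] = df.loc[weekend, "Date2"] - pd.to_timedelta(
            days_back[weekend], unit="D"
        )

    return df[in_scope]


sample = pd.DataFrame({
    "Date": ["6/15/2020", "6/17/2020", "6/19/2020", "6/20/2020", "6/22/2020"],
    "Date2": ["6/10/2020", "6/20/2020", "6/21/2020", "6/16/2020", "6/21/2020"],
})
for col in ("Date", "Date2"):
    sample[col] = pd.to_datetime(sample[col], format="%m/%d/%Y")

# 22 June 2020 was a Monday, so the scope is 15-21 June
print(adjust_report(sample, pd.Timestamp("2020-06-22")))
```

In the daily report you would call adjust_report(df, pd.Timestamp.today()).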

How to extract or validate date format from a text using python?

I'm trying to execute this code:
import datefinder
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises, having beaten the trade Consensus EPS estimate in each of the last four quarters. In its last earnings report on May 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the consensus revenue estimate by 4.93%.'
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
The output is:
2009-04-30 00:00:00
2005-09-01 00:00:00
2018-05-08 00:00:00
2019-02-04 00:00:00
The last date comes from the percentage value 4.93%. How can I avoid this?
I can't fix the datefinder issue itself, but since you needed a solution, I put this together. It's a work in progress, so adjust it as needed. Some of the regexes could be consolidated, but I kept them separate for clarity. Hopefully it helps until you find a solution that better fits your needs.
import re

string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises having beaten the trade Consensus EPS estimate in each of the last ' \
                    'four quarters In its last earnings report on March 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the ' \
                    'consensus revenue estimate by 4.93%. The next trading day will occur at 2019-02-15T12:00:00-06:30'


def find_dates(input):
    '''
    Extract date strings from the provided text and print the first match for each format.
    Symbol references:
    YYYY = four-digit year
    MM = two-digit month (01=January, etc.)
    DD = two-digit day of month (01 through 31)
    hh = two digits of hour (00 through 23) (am/pm NOT allowed)
    mm = two digits of minute (00 through 59)
    ss = two digits of second (00 through 59)
    s = one or more digits representing a decimal fraction of a second
    TZD = time zone designator (Z or +hh:mm or -hh:mm)
    :param input: text
    :return: None (matches are printed)
    '''
    # raw strings so that \d, \s, \W reach the regex engine unchanged
    date_formats = [
        # Matches date format MM/DD/YYYY
        r'(\d{2}\/\d{2}\/\d{4})',
        # Matches date format MM-DD-YYYY
        r'(\d{2}-\d{2}-\d{4})',
        # Matches date format YYYY/MM/DD
        r'(\d{4}\/\d{1,2}\/\d{1,2})',
        # Matches ISO 8601 format (YYYY-MM-DD)
        r'(\d{4}-\d{1,2}-\d{1,2})',
        # Matches ISO 8601 format YYYYMMDD
        r'(\d{4}\d{2}\d{2})',
        # Matches full_month_name dd, YYYY or full_month_name dd[suffixes], YYYY
        r'(January|February|March|April|May|June|July|August|September|October|November|December)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches abbreviated_month_name dd, YYYY or abbreviated_month_name dd[suffixes], YYYY
        r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)(\s\d{1,2}\W\s\d{4}|\s\d(st|nd|rd|th)\W\s\d{4})',
        # Matches ISO 8601 format with time and time zone
        # yyyy-mm-ddThh:mm:ss+|-hh:mm (fractional seconds are not matched)
        r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\+|-)\d{2}:\d{2}',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmssZ
        r'\d{8}T\d{6}Z',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmss+|-hhmm
        r'\d{8}T\d{6}(\+|-)\d{4}'
    ]
    for item in date_formats:
        date_format = re.compile(r'\b{}\b'.format(item), re.IGNORECASE | re.MULTILINE)
        find_date = re.search(date_format, input)
        if find_date:
            print(find_date.group(0))


find_dates(string_with_dates)
# outputs
04/30/2009
March 8, 2018
Sept 1st, 2005
2019-02-15T12:00:00-06:30
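Two limitations of the function above: re.search stops at the first match per format, and a string such as 02/30/2019 has the shape of a date without being one. A sketch that collects every match with re.finditer, validates it with datetime.strptime, and returns results in text order; the three-entry FORMATS list is a hypothetical subset you would extend:

```python
import re
from datetime import datetime

# Hypothetical subset of formats: (regex for the shape, strptime format to validate it)
FORMATS = [
    (r"\b\d{2}/\d{2}/\d{4}\b", "%m/%d/%Y"),
    (r"\b\d{4}-\d{2}-\d{2}\b", "%Y-%m-%d"),
    (r"\b(?:January|February|March|April|May|June|July|August|September|October"
     r"|November|December) \d{1,2}, \d{4}\b", "%B %d, %Y"),
]


def find_valid_dates(text):
    """Return (position, matched text, datetime) for every real date found, in text order."""
    found = []
    for pattern, fmt in FORMATS:
        for m in re.finditer(pattern, text):
            try:
                parsed = datetime.strptime(m.group(0), fmt)
            except ValueError:
                continue  # right shape but not a real date, e.g. 02/30/2019
            found.append((m.start(), m.group(0), parsed))
    return sorted(found, key=lambda item: item[0])


sample_text = ("Paid 04/30/2009, not 02/30/2019; on March 8, 2018 it beat "
               "estimates by 4.93%. Next: 2019-02-15.")
for pos, raw, parsed in find_valid_dates(sample_text):
    print(pos, raw, parsed.date())
```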

Invalid Dates Python

I am new to Python. How can I write code that treats any date beyond a certain date as invalid input? For example, if the user enters anything after 12/02/2013, it should produce an error, while every date up to and including that one should work.
As glibdud suggested, use datetime objects.
date = datetime.date(YYYY, MM, DD)
where (YYYY, MM, DD) are integers representing years, months, and days. The condition can then be checked in your script with
inputDate > maxDate
for example:
import datetime
maxDate = datetime.date(2013, 12, 2)
y = int(input('Enter year:'))
m = int(input('Enter numerical month (1-12):'))
d = int(input('Enter numerical day (1-31):'))
inputDate = datetime.date(y, m, d)
if inputDate > maxDate:
    print('Error - date after 02 December 2013')
else:
    print('Success!')
Gives:
Enter year:2018
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Error - date after 02 December 2013
and
Enter year:2000
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Success!
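Asking for three separate integers also means an input such as month 13 crashes with an uncaught ValueError. A sketch that parses a single MM/DD/YYYY string with datetime.strptime and reports both malformed dates and dates past the cutoff; parse_until is a hypothetical helper name:

```python
from datetime import date, datetime

MAX_DATE = date(2013, 12, 2)


def parse_until(text, max_date=MAX_DATE):
    """Parse an MM/DD/YYYY string; raise ValueError if malformed or after max_date."""
    try:
        value = datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        raise ValueError(f"{text!r} is not a valid MM/DD/YYYY date") from None
    if value > max_date:
        raise ValueError(f"{value:%m/%d/%Y} is after {max_date:%m/%d/%Y}")
    return value


print(parse_until("11/30/2013"))
```

In an interactive script, wrap parse_until(input(...)) in a try/except ValueError loop and re-prompt on error.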

Iterate through CSV and match lines with a specific date

I am parsing a CSV file into a list. Each row has a date in column index 3 (row[3]) in mm/dd/yyyy format.
I need to iterate through the file and extract only the rows whose date falls in a specific range.
For example, I want to extract all rows for December 2015 (12/2015). I'm having trouble working out how to match the date. Any nudge in the right direction would be helpful.
Thanks.
Method 1:
Split the column into month, day, and year, convert month and year to integers, then compare them with 12 and 2015.
column3 = "12/31/2015"
month, day, year = column3.split("/")
if int(month) == 12 and int(year) == 2015:
    pass  # do your thing, e.g. keep the row
Method 2:
Parse the date string into a time.struct_time with time.strptime, then compare its tm_mon and tm_year attributes with the month and year you want.
>>> import time
>>> to = time.strptime("12/03/2015", "%m/%d/%Y")
>>> to.tm_mon
12
>>> to.tm_year
2015
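Putting either method into the full loop: a sketch using the standard csv module, assuming the date is in column index 3 and that header or malformed rows should be skipped. rows_in_month is a hypothetical name, and the inline sample stands in for your file:

```python
import csv
import io
from datetime import datetime


def rows_in_month(lines, year, month, date_col=3):
    """Yield CSV rows whose mm/dd/yyyy date column falls in the given year and month."""
    for row in csv.reader(lines):
        try:
            d = datetime.strptime(row[date_col], "%m/%d/%Y")
        except (IndexError, ValueError):
            continue  # skip headers, short rows, or malformed dates
        if (d.year, d.month) == (year, month):
            yield row


sample = io.StringIO(
    "id,name,amount,date\n"
    "1,a,10,12/01/2015\n"
    "2,b,20,11/30/2015\n"
    "3,c,30,12/31/2015\n"
    "4,d,40,12/15/2016\n"
)
matches = list(rows_in_month(sample, 2015, 12))
print(matches)
```

For a real file, pass open('yourfile.csv', newline='') instead of sample; for an arbitrary date range, replace the month check with start <= d <= end.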

Resources