How to extract or validate date format from a text using python? - python-3.x

I'm trying to execute this code:
import datefinder
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises, having beaten the trade Consensus EPS estimate in each of the last four quarters. In its last earnings report on May 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the consensus revenue estimate by 4.93%.'
matches = datefinder.find_dates(string_with_dates)
for match in matches:
    print(match)
The output is:
2009-04-30 00:00:00
2005-09-01 00:00:00
2018-05-08 00:00:00
2019-02-04 00:00:00
The last date appears because of the percentage value 4.93%. How can I avoid this false match?

I cannot fix the issue in the datefinder module itself. You said you needed a solution, so I put this together for you. It's a work in progress, so you can adjust it as needed. Some of the regexes could have been consolidated, but I wanted to break them out for you. Hopefully this answer helps until you find a solution that works better for your needs.
import re
string_with_dates = 'The stock has a 04/30/2009 great record of positive Sept 1st, 2005 earnings surprises having beaten the trade Consensus EPS estimate in each of the last ' \
'four quarters In its last earnings report on March 8, 2018, Triple-S Management reported EPS of $0.6 vs.the trade Consensus of $0.24 while it beat the ' \
'consensus revenue estimate by 4.93%. The next trading day will occur at 2019-02-15T12:00:00-06:30'
def find_dates(text):
    '''
    This function is used to extract date strings from the provided text.
    Symbol references:
    YYYY = four-digit year
    MM = two-digit month (01=January, etc.)
    DD = two-digit day of month (01 through 31)
    hh = two digits of hour (00 through 23) (am/pm NOT allowed)
    mm = two digits of minute (00 through 59)
    ss = two digits of second (00 through 59)
    s = one or more digits representing a decimal fraction of a second
    TZD = time zone designator (Z or +hh:mm or -hh:mm)
    :param text: text to search
    :return: None; the first match for each pattern is printed
    '''
    date_formats = [
        # Matches date format MM/DD/YYYY
        r'(\d{2}\/\d{2}\/\d{4})',
        # Matches date format MM-DD-YYYY
        r'(\d{2}-\d{2}-\d{4})',
        # Matches date format YYYY/MM/DD
        r'(\d{4}\/\d{1,2}\/\d{1,2})',
        # Matches ISO 8601 format (YYYY-MM-DD)
        r'(\d{4}-\d{1,2}-\d{1,2})',
        # Matches ISO 8601 format YYYYMMDD
        r'(\d{4}\d{2}\d{2})',
        # Matches full_month_name dd, YYYY or full_month_name dd[suffixes], YYYY
        r'(January|February|March|April|May|June|July|August|September|October|November|December)(\s\d{1,2}\W\s\d{4}|\s\d{1,2}(st|nd|rd|th)\W\s\d{4})',
        # Matches abbreviated_month_name dd, YYYY or abbreviated_month_name dd[suffixes], YYYY
        r'(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sept|Oct|Nov|Dec)(\s\d{1,2}\W\s\d{4}|\s\d{1,2}(st|nd|rd|th)\W\s\d{4})',
        # Matches ISO 8601 format with time and time zone
        # yyyy-mm-ddThh:mm:ss+|-hh:mm
        r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\+|-)\d{2}:\d{2}',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmssZ
        r'\d{8}T\d{6}Z',
        # Matches ISO 8601 format Datetime with timezone
        # yyyymmddThhmmss+|-hhmm
        r'\d{8}T\d{6}(\+|-)\d{4}'
    ]
    for item in date_formats:
        # Word boundaries keep a pattern from matching inside a longer token
        date_format = re.compile(r'\b{}\b'.format(item), re.IGNORECASE|re.MULTILINE)
        # re.search returns only the first match for each pattern
        find_date = re.search(date_format, text)
        if find_date:
            print(find_date.group(0))
find_dates(string_with_dates)
# outputs
04/30/2009
March 8, 2018
Sept 1st, 2005
2019-02-15T12:00:00-06:30
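The function above reports only the first match per pattern and does not check that a match is a real calendar date. As a minimal sketch of how both could be handled, the snippet below pairs each regex with a `strptime` format and keeps only the candidates that parse. The names `CANDIDATES` and `find_valid_dates` are hypothetical, only three formats are covered, ordinal suffixes such as "1st" and abbreviations such as "Sept" are not handled, and `%B` assumes an English locale.

```python
import re
from datetime import datetime

# (regex, strptime format) pairs; only a small illustrative subset of formats.
CANDIDATES = [
    (r'\b\d{2}/\d{2}/\d{4}\b', '%m/%d/%Y'),
    (r'\b\d{4}-\d{2}-\d{2}\b', '%Y-%m-%d'),
    (r'\b[A-Z][a-z]+ \d{1,2}, \d{4}\b', '%B %d, %Y'),
]


def find_valid_dates(text):
    """Return datetime objects for every candidate that parses as a real date."""
    found = []
    for pattern, fmt in CANDIDATES:
        # finditer yields every occurrence, not just the first one
        for match in re.finditer(pattern, text):
            try:
                found.append(datetime.strptime(match.group(0), fmt))
            except ValueError:
                # Looks like a date but is not one (e.g. 13/45/2009); skip it.
                continue
    return found
```

Because none of the patterns accept a bare decimal number, a value such as 4.93% is never considered a candidate in the first place.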

Related

Python: Converting Standard Time to Military time

Write a program that prompts a user to enter time in a 12-hour format and converts it to a 24-hour format. Converting a 12-hour time to 24-hour time requires adding 12 hours to any time between 1:00PM-11:59PM and subtracting 12 hours from any time between 12:00AM-12:59AM. 24-hour time uses leading zeros for hours so that there are always 4 digits.
This is currently in one of my labs that I am trying to finish up for the semester, and I do not know how to finish it.
from datetime import datetime
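def convert_24hrs():
    txt = input('input the 12-hour format: ')
    # strptime parses the text into a datetime object
    tim = datetime.strptime(f'2021-12-09 {txt}', '%Y-%m-%d %I:%M %p')
    # strftime formats the datetime as zero-padded 24-hour time
    print(datetime.strftime(tim, '%H:%M'))
convert_24hrs()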
# input the 12-hour format: 4:21 am
# 04:21
convert_24hrs()
# input the 12-hour format: 5:36 pm
# 17:36
In case anyone wants the hour or minutes as integers, you can use this approach.
import datetime
YEAR = datetime.date.today().year # the current year
MONTH = datetime.date.today().month # the current month
DATE = datetime.date.today().day # the current day
HOUR = datetime.datetime.now().hour # the current hour
MINUTE = datetime.datetime.now().minute # the current minute
SECONDS = datetime.datetime.now().second #the current second
print(YEAR, MONTH, DATE, HOUR, MINUTE, SECONDS)
So with the above understanding you can get the difference in hours.
Let's reproduce onyambu's example:
from datetime import datetime
def convert_24hrs():
    txt = input('input the 12-hour format: ')
    tim = datetime.strptime(f'2021-12-09 {txt}', '%Y-%m-%d %I:%M %p')
    # tim.hour and tim.minute are integers; :02d keeps the leading zeros
    print(f'{tim.hour:02d}:{tim.minute:02d}')
convert_24hrs()
# input the 12-hour format: 5:36 pm
# 17:36
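The rule stated in the question (add 12 hours for 1:00PM-11:59PM, subtract 12 for 12:00AM-12:59AM) can also be written out by hand. This is only a sketch under the assumption that the input looks like `5:36 pm`; the function name `to_24_hour` is made up for illustration.

```python
def to_24_hour(text):
    """Convert a string such as '5:36 pm' to 'HH:MM' in 24-hour time."""
    clock, period = text.strip().split()
    hour_str, minute_str = clock.split(':')
    hour, minute = int(hour_str), int(minute_str)
    period = period.lower()
    if period not in ('am', 'pm') or not 1 <= hour <= 12 or not 0 <= minute <= 59:
        raise ValueError(f'not a valid 12-hour time: {text!r}')
    if period == 'pm' and hour != 12:
        hour += 12   # 1:00 PM-11:59 PM becomes 13:00-23:59
    elif period == 'am' and hour == 12:
        hour -= 12   # 12:00 AM-12:59 AM becomes 00:00-00:59
    # Zero-pad both fields so the result always has four digits
    return f'{hour:02d}:{minute:02d}'
```

Noon is the one PM hour that must stay unchanged, which is why the first branch excludes 12.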

Group by the dates to weeks

I have timestamps in epoch milliseconds. I need to convert them to dates and group them by week number.
I tried the following procedure:
df.loc[0, 'seconds'] = df['seconds'].iloc[0]
for _, grp in df.groupby(pd.TimeGrouper(key='seconds', freq='7D')):
    print(grp)
df["week"].to_period(freq='w')
For example, if my 'seconds' column is presented like 1557499095332, then I want the 'dates' column to be 10-05-2019 20:08:15 and the 'Week' column to present W19 or 19.
How do I go about this?
Try using the strftime method:
from datetime import datetime as dt
x = 1557499095332
dt.fromtimestamp(x/1000).strftime("%A, %B %d, %Y %I:%M:%S")
dt.fromtimestamp(x/1000).strftime("%W")
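The third line returns 'Friday, May 10, 2019 03:38:15'. The exact time depends on your local time zone, and %I prints the hour on a 12-hour clock.
The fourth line returns '18' rather than 19 because %W counts weeks starting on Monday and treats the days before the year's first Monday as week 0; 1 January 2019 was a Tuesday, so it falls in week 0.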
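The asker expected week 19 for this timestamp, which is the ISO 8601 week number rather than the `%W` value. A minimal standard-library sketch, assuming the timestamps should be interpreted in UTC (the asker's own expected time of 20:08:15 implies a different local zone); `iso_week` and `group_by_week` are illustrative names, not part of any library.

```python
from collections import defaultdict
from datetime import datetime, timezone


def iso_week(ms):
    """Return (datetime, ISO week number) for a millisecond epoch timestamp."""
    moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
    return moment, moment.isocalendar()[1]


def group_by_week(timestamps_ms):
    """Group millisecond timestamps by (ISO year, ISO week)."""
    groups = defaultdict(list)
    for ms in timestamps_ms:
        moment = datetime.fromtimestamp(ms / 1000, tz=timezone.utc)
        iso_year, week, _ = moment.isocalendar()
        # Keying on the year as well keeps week 1 of different years apart
        groups[(iso_year, week)].append(ms)
    return dict(groups)
```

For example, `iso_week(1557499095332)` gives a datetime on 10 May 2019 together with week number 19.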

Pandas to recognize current date, and filter a date column relative to today's date

Having a lot of trouble translating the logic below in pandas/python, so I do not even have sample code or a df to work with :x
I run a daily report, that essentially filters for data from Monday thru the day before what 'Today' is. I have a Date column [ in dt.strftime('%#m/%#d/%Y') format] . It will never be longer than a Monday-Sunday scope.
1) Recognize what day 'today' is when running the report, and find the closest prior Monday. Filter the "Date" column for Monday through the day before today's date [ in dt.strftime('%#m/%#d/%Y') format ]
2) Once the df is filtered for that, take the rows whose dates match the logic above and check the dates in another column, "Date2". If any dates in Date2 are before the Monday date, change all of those earlier dates in 'Date2' to the Monday date in the 'Date' column.
3) If 'Today' is a Monday, then filter the scope from the prior Monday through Sunday in the "Date" column. While this is filtered, do the step above [step 2], but also change any dates in the "Date2" column that fall on a Saturday or Sunday to the Friday date.
Does this make sense?
Here're the steps:
from datetime import datetime
import pandas as pd
today = pd.to_datetime(datetime.now().date())
day_of_week = today.dayofweek
last_monday = today - pd.to_timedelta(day_of_week, unit='d')
# if today is Monday, we need to step back another week
if day_of_week == 0:
    last_monday -= pd.to_timedelta(7, unit='d')
# filter for last Monday
last_monday_flags = (df.Date == last_monday)
# filter for Date2 < last Monday
date2_flags = (df.Date2 < last_monday)
# update where both flags are true
flags = last_monday_flags & date2_flags
df.loc[flags, 'Date2'] = last_monday
# if today is Monday
if day_of_week == 0:
    last_sunday = last_monday + pd.to_timedelta(6, unit='d')
    last_sat = last_sunday - pd.to_timedelta(1, unit='d')
    last_week_flags = (df.Date >= last_monday) & (df.Date <= last_sunday)
    last_sat_flags = (df.Date2 == last_sat)
    last_sun_flags = (df.Date2 == last_sunday)
    # I'm just too lazy and not sure how Sat and Sun relates to Fri
    # but i guess just subtract 1 day or 2 depending on which day
    ...
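The two date rules in the question (find the Monday that opens the report window, and move weekend dates back to Friday) can be sketched with the standard library alone. This is one possible reading of the question, not the answerer's code; `report_monday` and `weekend_to_friday` are hypothetical helper names.

```python
from datetime import date, timedelta


def report_monday(today):
    """Monday that starts the report window; a full week back if today is Monday."""
    monday = today - timedelta(days=today.weekday())
    if today.weekday() == 0:
        monday -= timedelta(days=7)
    return monday


def weekend_to_friday(day):
    """Move a Saturday or Sunday back to the preceding Friday; leave weekdays alone."""
    overshoot = day.weekday() - 4   # Saturday gives 1, Sunday gives 2
    return day - timedelta(days=overshoot) if overshoot > 0 else day
```

In a DataFrame, a helper like `weekend_to_friday` could presumably be applied to the "Date2" column with `Series.map`, assuming that column holds date-like values.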

Invalid Dates Python

I am new to Python. I was just wondering how you can write code that treats any date beyond a certain date as invalid input. For example, if the user inputs anything after 12/02/2013, it will produce an error. Everything on or before that date will work perfectly.
As glibdud suggested, use datetime objects.
date = datetime.date(YYYY, MM, DD)
where (YYYY, MM, DD) are integers representing years, months, and days. The condition can then be checked in your script with
inputDate > maxDate
for example:
import datetime
maxDate = datetime.date(2013, 12, 2)
y = int(input('Enter year:'))
m = int(input('Enter numerical month (1-12):'))
d = int(input('Enter numerical day (1-31):'))
inputDate = datetime.date(y, m, d)
if inputDate > maxDate:
    print('Error - date after 02 December 2013')
else:
    print('Success!')
Gives:
Enter year:2018
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Error - date after 02 December 2013
and
Enter year:2000
Enter numerical month (1-12):1
Enter numerical day (1-31):1
Success!
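If the user types the whole date as one string, `datetime.strptime` can reject malformed or impossible dates before the comparison runs. A minimal sketch, assuming the input is in MM/DD/YYYY form and that 12/02/2013 means 2 December 2013, as in the answer above; `parse_allowed_date` and `MAX_DATE` are illustrative names.

```python
from datetime import date, datetime

MAX_DATE = date(2013, 12, 2)  # assumed cutoff: 2 December 2013


def parse_allowed_date(text, max_date=MAX_DATE):
    """Parse 'MM/DD/YYYY' and reject malformed, impossible, or too-late dates."""
    try:
        parsed = datetime.strptime(text, '%m/%d/%Y').date()
    except ValueError:
        # strptime raises for bad formats and for dates such as 02/30/2010
        raise ValueError(f'{text!r} is not a valid MM/DD/YYYY date') from None
    if parsed > max_date:
        raise ValueError(f'{text!r} is after {max_date.isoformat()}')
    return parsed
```

A caller can wrap this in a loop that catches `ValueError` and asks the user to try again.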

Getting a day date for last week from a given date

I know how to get the last Saturday date using date --date="sat. last week" +"%m%d%Y". However, I am unable to get the last Saturday date for a particular given date.
I am trying to find the date of the last Saturday before a given date. For instance, say the given date is 20140605 (in YYYYMMDD format); it could be any weekday of that week (20140602-20140606). I need the Saturday of the week before, which in my case is 20140531. How can I achieve this?
I just barfed a bit on my keyboard, but this seems to work:
D="20140605"; date --date "$D $[($(date --date "$D" +%u) + 1) % 7] days ago" +"%Y%m%d"
I have a small Python snippet for you:
from datetime import datetime, timedelta
import sys
fmt = "%Y%m%d"
daystring = sys.argv[1]
day = datetime.strptime(daystring, fmt)
for d in range(7):
    # Go back d days, from 0 up to 6.
    testdate = day - timedelta(days=d)
    # If the test date is a Saturday (0 is Monday, 6 is Sunday), output that date.
    if testdate.weekday() == 5:
        print(testdate.strftime(fmt))
Test:
$ python test.py 20140605
20140531
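The loop can also be replaced by modular arithmetic on the weekday number. This is a sketch of the same idea rather than a different algorithm; `last_saturday` is a made-up name, and like the loop it returns the input date itself when that date is already a Saturday.

```python
from datetime import datetime, timedelta


def last_saturday(daystring, fmt='%Y%m%d'):
    """Return the most recent Saturday on or before the given date."""
    day = datetime.strptime(daystring, fmt)
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    days_back = (day.weekday() - 5) % 7
    return (day - timedelta(days=days_back)).strftime(fmt)
```

For the example in the question, `last_saturday('20140605')` returns `'20140531'`.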
