Remove certain dates in list. Python 3.4 - python-3.x

I have a list that contains several days. Each day has several timestamps. I want to build a new list that keeps only the start time and the end time for each date.
I also want to delete the character between the date and the time in each entry; it is always the same letter.
The number of timestamps can vary from date to date.
Since I'm new to Python, simple, easy-to-understand code would be preferred. I've been using regex a lot, so please use it if there is a way to do that here.
The list has been sorted with list.sort(), so it's in the correct order.
The code used to extract the information was the following:
import re
list1 = []
file1 = open("test.txt", "r")
for f in file1:
    list1 += re.findall(r'20\d\d-\d\d-\d\dA\d\d:\d\d', f)
listX = len(list1)
list2 = list1[0:listX - 2]
list2.sort()
Here is what the list looks like:
2015-12-28A09:30
2015-12-28A09:30
2015-12-28A09:35
2015-12-28A09:35
2015-12-28A12:00
2015-12-28A12:00
2015-12-28A12:15
2015-12-28A12:15
2015-12-28A14:30
2015-12-28A14:30
2015-12-28A15:15
2015-12-28A15:15
2015-12-28A16:45
2015-12-28A16:45
2015-12-28A17:00
2015-12-28A17:00
2015-12-28A18:15
2015-12-28A18:15
2015-12-29A08:30
2015-12-29A08:30
2015-12-29A08:35
2015-12-29A08:35
2015-12-29A10:45
2015-12-29A10:45
2015-12-29A11:00
2015-12-29A11:00
2015-12-29A13:15
2015-12-29A13:15
2015-12-29A14:00
2015-12-29A14:00
2015-12-29A15:30
2015-12-29A15:30
2015-12-29A15:45
2015-12-29A15:45
2015-12-29A17:15
2015-12-29A17:15
2015-12-30A08:30
2015-12-30A08:30
2015-12-30A08:35
2015-12-30A08:35
2015-12-30A10:45
2015-12-30A10:45
2015-12-30A11:00
2015-12-30A11:00
2015-12-30A13:00
2015-12-30A13:00
2015-12-30A13:45
2015-12-30A13:45
2015-12-30A15:15
2015-12-30A15:15
2015-12-30A15:30
2015-12-30A15:30
2015-12-30A17:15
2015-12-30A17:15
And this is how I want it to look:
2015-12-28 09:30
2015-12-28 18:15
2015-12-29 08:30
2015-12-29 17:15
2015-12-30 08:30
2015-12-30 17:15

First of all, you should convert all your strings into proper date objects that Python can work with. That way, you have a lot more control over them, for example when changing the formatting later. So let’s parse the entries of list2 using datetime.strptime:
from datetime import datetime
dates = [datetime.strptime(item, '%Y-%m-%dA%H:%M') for item in list2]
This creates a new list, dates, that contains all the entries from list2 as parsed datetime objects.
Now, since you want to get the first and the last date of each day, we somehow have to group your dates by the date component. There are various ways to do that. I’ll be using itertools.groupby for it, with a key function that just looks at the date component of each entry:
from itertools import groupby
for day, times in groupby(dates, lambda x: x.date()):
    first, *mid, last = times
    print(first)
    print(last)
If we run this, we already get your output (without date formatting):
2015-12-28 09:30:00
2015-12-28 18:15:00
2015-12-29 08:30:00
2015-12-29 17:15:00
2015-12-30 08:30:00
2015-12-30 17:15:00
Of course, you can also collect that first and last date in a list first to process the dates later:
filteredDates = []
for day, times in groupby(dates, lambda x: x.date()):
    first, *mid, last = times
    filteredDates.append(first)
    filteredDates.append(last)
And you can also output your dates with a different format using datetime.strftime:
for date in filteredDates:
    print(date.strftime('%Y-%m-%d %H:%M'))
That would give us the following output:
2015-12-28 09:30
2015-12-28 18:15
2015-12-29 08:30
2015-12-29 17:15
2015-12-30 08:30
2015-12-30 17:15
If you don’t want to parse the dates, you can also do this by working directly on the strings. Since they are consistently formatted (the first ten characters are the date, and the strings sort chronologically), you can group them by that prefix:
for day, times in groupby(list2, lambda x: x[:10]):
    first, *mid, last = times
    print(first)
    print(last)
Producing the following output:
2015-12-28A09:30
2015-12-28A18:15
2015-12-29A08:30
2015-12-29A17:15
2015-12-30A08:30
2015-12-30A17:15
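A variant of the string-based approach, as a minimal sketch: the helper name first_and_last is made up for illustration, and the input is assumed to be sorted as in the question. Collecting each group into a list avoids the star-unpacking, which raises a ValueError when a date has only one timestamp; here such a date is simply returned as both start and end. The letter between date and time is replaced with a space along the way.

```python
from itertools import groupby


def first_and_last(stamps, separator="A"):
    """Return the first and last entry per date from sorted timestamp strings."""
    result = []
    # The first ten characters ('YYYY-MM-DD') identify the date.
    for _day, group in groupby(stamps, key=lambda s: s[:10]):
        items = list(group)
        result.append(items[0].replace(separator, " "))
        result.append(items[-1].replace(separator, " "))
    return result


sample = [
    "2015-12-28A09:30",
    "2015-12-28A09:35",
    "2015-12-28A18:15",
    "2015-12-29A08:30",
]
for entry in first_and_last(sample):
    print(entry)
```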

Because your data is ordered, you just need to pull the first and last value from each group. You can use str.replace to swap the single letter for a space, then split each string and compare just the date parts:
def grp(l):
    it = iter(l)
    prev = start = next(it).replace("A", " ")
    for dte in it:
        dte = dte.replace("A", " ")
        # if we have a new date, yield the previous date's start and end
        if dte.split(None, 1)[0] != prev.split(None, 1)[0]:
            yield start
            yield prev
            start = dte
        prev = dte
    # yield the start and end of the final date
    yield start
    yield prev
l=["2015-12-28A09:30", "2015-12-28A09:30", .....................
l[:] = grp(l)
This could certainly also be done as you process the file, without sorting, by using a dict to group:
from re import findall
from collections import defaultdict
with open("dates.txt") as f:
    # "null" sorts after any timestamp that starts with a digit and "" sorts
    # before every string, so the first value seen replaces both placeholders
    od = defaultdict(lambda: {"min": "null", "max": ""})
    for line in f:
        for dte in findall(r'20\d\d-\d\d-\d\dA\d\d:\d\d', line):
            dte, tme = dte.split("A")
            _dte = "{} {}".format(dte, tme)
            if od[dte]["min"] > _dte:
                od[dte]["min"] = _dte
            if od[dte]["max"] < _dte:
                od[dte]["max"] = _dte
print(list(od.values()))
Which will give you the start and end time for each date.
[{'min': '2016-01-03 23:59', 'max': '2016-01-03 23:59'},
{'min': '2015-12-28 00:00', 'max': '2015-12-28 18:15'},
{'min': '2015-12-30 08:30', 'max': '2015-12-30 17:15'},
{'min': '2015-12-29 08:30', 'max': '2015-12-29 17:15'},
{'min': '2015-12-15 08:41', 'max': '2015-12-15 08:41'}]
Note that in this output the start for 2015-12-28 is 00:00, not 09:30.
If your dates are actually as posted, one per line, you don't need a regex either:
from collections import defaultdict
with open("dates.txt") as f:
    od = defaultdict(lambda: {"min": "null", "max": ""})
    for line in f:
        dte, tme = line.rstrip().split("A")
        _dte = "{} {}".format(dte, tme)
        if od[dte]["min"] > _dte:
            od[dte]["min"] = _dte
        if od[dte]["max"] < _dte:
            od[dte]["max"] = _dte
print(list(od.values()))
Which would give you the same output.

Related

Sorting dictionary by keys that are dates

When developing a Telegram bot, I need to sort a dictionary by date in order to output news in chronological order. The problem is that the dates come in two different formats: %d.%m at %H:%M and %d.%m.%Y at %H:%M.
for k,v in sorted(Dnews_dict.items(), key=lambda x: DT.strptime(x[1].get("time"),'%d.%m.%Y at %H:%M')):
    news = f"<b>{v['time']}</b>\n"\
           f"{hlink(v['title'],v['url'])}"
    await message.answer(news)
This code works fine, but only with one date format. As a workaround, I tried adding a condition on the string length (the length of each format is constant).
if len(round_data) == 18:
    for k,v in sorted(Dnews_dict.items(), key=lambda x:DT.strptime(x[1].get("time"),'%d.%m.%Y в %H:%M')):
        news = f"<b>{v['time']}</b>\n"\
               f"{hlink(v['title'],v['url'])}"
        await message.answer(news)
else:
    for k,v in sorted(Dnews_dict.items(), key=lambda x:DT.strptime(x[1].get("time"),'%d.%m at %H:%M')):
        news = f"<b>{v['time']}</b>\n"\
               f"{hlink(v['title'],v['url'])}"
        await message.answer(news)
But that condition doesn’t work. How can this dilemma be resolved?
While I am unfamiliar with Telegram bots, the following is a way to deal with datetime data in mixed formats once you have isolated the specific text containing the date and time.
Given a list of mixed datetime strings in the following formats:
datelist = ['08.02.2022 at 23:53',
'13.07 at 18:13',
'23.11.2022 at 19:55',
'15.02 at 01:06',
'09.07.2022 at 14:57',
'09.08 at 04:06',
'19.04.2022 at 07:19',
'28.10 at 21:56',
'19.10.2022 at 02:18',
'23.04 at 18:15']
from dateutil import parser  # utility for handling all the messy dates you encounter along the way
from dateutil.relativedelta import relativedelta
for dt in datelist:
    print(parser.parse(dt))
Yields:
2022-08-02 23:53:00
2023-01-13 18:13:00
2022-11-23 19:55:00
2023-01-15 01:06:00
2022-09-07 14:57:00
2023-01-09 04:06:00
2022-04-19 07:19:00
2023-01-28 21:56:00
2022-10-19 02:18:00
2023-01-23 18:15:00
If the year 2023 is of concern, you can do the following:
for dt in datelist:
    dtx = parser.parse(dt)
    if dtx.year == 2023:
        dtx = dtx + relativedelta(year=2022)
    print(dtx)
Which produces a result keyed to the year 2022:
2022-08-02 23:53:00
2022-01-13 18:13:00
2022-11-23 19:55:00
2022-01-15 01:06:00
2022-09-07 14:57:00
2022-01-09 04:06:00
2022-04-19 07:19:00
2022-01-28 21:56:00
2022-10-19 02:18:00
2022-01-23 18:15:00
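Note that dateutil's parser reads ambiguous dates month-first unless dayfirst=True is passed, which is why 08.02.2022 appears as 2022-08-02 in the output above. If the two formats are known in advance, a standard-library sketch like the following may be enough. The names parse_mixed and default_year, and the small news dictionary, are assumptions made for illustration; which year to assume for entries without one is your choice.

```python
from datetime import datetime

FULL_FORMAT = "%d.%m.%Y at %H:%M"


def parse_mixed(text, default_year=2022):
    """Parse 'DD.MM.YYYY at HH:MM', or 'DD.MM at HH:MM' using default_year."""
    try:
        return datetime.strptime(text, FULL_FORMAT)
    except ValueError:
        # No year in the string: insert the assumed one and parse again.
        date_part, time_part = text.split(" at ")
        with_year = "{}.{} at {}".format(date_part, default_year, time_part)
        return datetime.strptime(with_year, FULL_FORMAT)


# Hypothetical stand-in for the bot's dictionary of news items.
news = {
    "a": {"time": "23.11.2022 at 19:55"},
    "b": {"time": "13.07 at 18:13"},
    "c": {"time": "08.02.2022 at 23:53"},
}
ordered = sorted(news.items(), key=lambda item: parse_mixed(item[1]["time"]))
print([key for key, _ in ordered])
```

Because the key function returns datetime objects, the same call can replace both branches of the length check in the question.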

Bumping to good business day when generating range

I am using pandas pd.bdate_range() to generate a range of dates given a start and an end, but it does not seem to work as expected.
What I am ultimately after is quarterly dates between a start and end date, but I want the dates to be valid business days.
start = '2015-06-01'
end = '2019-06-01'
dates = pd.bdate_range(start,end,freq='MS')[::3]
Unfortunately, this includes 2018-09-01, which is a Saturday.
Is there a more foolproof way to get an index of only business days, also taking USFederalHolidayCalendar() into account?
You can take your existing DatetimeIndex and move any date that is not a business day forward to the next one like so:
import pandas as pd
from pandas.tseries.offsets import BDay
start = '2015-06-01'
end = '2019-06-01'
dates = pd.bdate_range(start,end,freq='MS')[::3]
new_dates = dates.map(lambda x : x + 0*BDay())
Or you can pass BMS to the freq keyword argument like so:
start = '2015-06-01'
end = '2019-06-01'
dates = pd.bdate_range(start,end, freq='BMS')[::3]
Both give this output
DatetimeIndex(['2015-06-01', '2015-09-01', '2015-12-01', '2016-03-01',
'2016-06-01', '2016-09-01', '2016-12-01', '2017-03-01',
'2017-06-01', '2017-09-01', '2017-12-01', '2018-03-01',
'2018-06-01', '2018-09-03', '2018-12-03', '2019-03-01',
'2019-06-03'],
dtype='datetime64[ns]', freq=None)
I think you can pass the following to get what you desire.
freq='BMS' # Business month start
or
freq='BQS' # Business quarter start
Update:
You could do something like this to take care of holidays that fall on a month or quarter start:
import pandas
from pandas import DatetimeIndex
from pandas.tseries.holiday import USFederalHolidayCalendar
holidays = USFederalHolidayCalendar().holidays(start, end, return_name=False)
month_dates = pandas.bdate_range(start, end, freq='CBMS', holidays=[holiday for holiday in holidays])
print(month_dates)
print(DatetimeIndex([e[1] for e in zip(month_dates.month, month_dates) if e[0] in {1, 4, 7, 10}]))
DatetimeIndex(['2015-01-02', '2015-02-02', '2015-03-02', '2015-04-01',
'2015-05-01', '2015-06-01', '2015-07-01', '2015-08-03',
'2015-09-01', '2015-10-01', '2015-11-02', '2015-12-01',
'2016-01-04', '2016-02-01', '2016-03-01', '2016-04-01',
'2016-05-02', '2016-06-01', '2016-07-01', '2016-08-01',
'2016-09-01', '2016-10-03', '2016-11-01', '2016-12-01',
'2017-01-03', '2017-02-01', '2017-03-01', '2017-04-03',
'2017-05-01', '2017-06-01', '2017-07-03', '2017-08-01',
'2017-09-01', '2017-10-02', '2017-11-01', '2017-12-01',
'2018-01-02', '2018-02-01', '2018-03-01', '2018-04-02',
'2018-05-01', '2018-06-01', '2018-07-02', '2018-08-01',
'2018-09-04', '2018-10-01', '2018-11-01', '2018-12-03',
'2019-01-02', '2019-02-01', '2019-03-01', '2019-04-01',
'2019-05-01'],
dtype='datetime64[ns]', freq='CBMS')
DatetimeIndex(['2015-01-02', '2015-04-01', '2015-07-01', '2015-10-01',
'2016-01-04', '2016-04-01', '2016-07-01', '2016-10-03',
'2017-01-03', '2017-04-03', '2017-07-03', '2017-10-02',
'2018-01-02', '2018-04-02', '2018-07-02', '2018-10-01',
'2019-01-02', '2019-04-01'],
dtype='datetime64[ns]', freq=None)
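To make the roll-forward logic explicit, here is a minimal standard-library sketch rather than the pandas API. The function names and the holidays set are assumptions for illustration: the caller supplies the holiday dates, and start is assumed to be the first day of a month, as in the question.

```python
from datetime import date, timedelta


def roll_forward(day, holidays=frozenset()):
    """Move day forward until it is a weekday that is not in holidays."""
    while day.weekday() >= 5 or day in holidays:
        day += timedelta(days=1)
    return day


def quarterly_business_dates(start, end, holidays=frozenset()):
    """First business day of every third month, from start's month through end."""
    dates = []
    year, month = start.year, start.month
    while date(year, month, 1) <= end:
        dates.append(roll_forward(date(year, month, 1), holidays))
        month += 3
        if month > 12:
            year, month = year + 1, month - 12
    return dates


# Only one holiday is listed here; a real calendar would supply all of them.
holidays = {date(2018, 9, 3)}
quarters = quarterly_business_dates(date(2015, 6, 1), date(2019, 6, 1), holidays)
for day in quarters:
    print(day)
```

With the holiday in place, the Saturday 2018-09-01 moves past both the weekend and the Monday holiday to 2018-09-04.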

Time Duration in python

Given two strings, for example:
start_time="3:00 PM"
Duration="3:10"
The start time is in 12-hour clock format (ending in AM or PM), and the duration
indicates a number of hours and minutes.
Assume that the start times are valid times. The minutes in the duration will
be a whole number less than 60, but the hours can be any whole number.
I need to add the duration to the start time and return the result
(WITHOUT ANY USE OF LIBRARIES).
The result should be in 12-hour clock format (ending in AM or PM), showing the
hours and minutes.
ex:
start_time = "6:30 PM"
Duration = "205:12"
# Returns: 7:42 AM
I tried and finally got the right hours and minutes, but I am unable to produce the correct AM or PM for
the result after the addition.
What I tried:
start_time = "6:30 PM"
Duration = "205:12"
#My answer =7:42
#expected :7:42 AM
Can someone help me with the logic to produce the correct AM or PM after adding the start
time and the duration?
def add_time(a,b):
    a=a.split()
    b=b.split()
    be=int(a[0][:a[0].find(':')])
    af=int(a[0][a[0].find(':')+1:])
    be1 = int(b[0][:b[0].find(':')])
    af1 = int(b[0][b[0].find(':') + 1:])
    # return(((be+be1)//24)+1)
    s=be+(be1)%12
    p=af+af1
    if ((s>12) and (p<60)) :
        return(str(s-12)+":"+str(p))
    elif ((s<12) and (p>60)) :
        f = p-60
        if len(str(f))<=1:
            return(str(s+1)+":"+str('0'+str(f)))
        else:
            return (str(s + 1)+":"+(str(f)))
    elif ((s<12) and (p<60)) :
        return(str(s)+":"+str(p))
    elif ((s>12) and (p>60)):
        f=p-60
        if len(str(f)) <= 1:
            return (str((s -12)+1)+":"+('0' + str(f)))
        else:
            return (str((s -12)+1)+":"+(str(f)))
print(add_time("10:10 PM", "3:30"))
# Returns: 1:40 AM
print(add_time("11:43 PM", "24:20"))
# Returns: 12:03 AM
Your code does not seem to cover all edge cases, e.g. add_time("11:43 PM", "1:20") returns None because the case s==12 is not covered.
Therefore one should put <= instead of < in the respective if conditions. The case where the addition of the minutes leads to hours greater than 12 although the addition of the hours itself did not, is not covered either. So we should check the minutes first and the hours after that instead of simultaneously.
To make the code more readable, we can use f-strings and str.split() with an argument. Forgive me for changing the code quite a bit:
def add_time(a,b):
    start = a.split()
    start_h, start_m = [int(val) for val in start[0].split(':')]
    start_app = start[1]
    dur_h, dur_m = [int(val) for val in b.split(':')]
    start_h %= 12  # treat 12 AM/PM as hour 0 of its half-day
    end_m = start_m+dur_m
    end_h = end_m//60  # carry full hours out of the minutes first
    end_m %= 60
    end_h += start_h+dur_h
    # an odd number of elapsed 12-hour blocks flips AM/PM
    if (end_h//12)%2==0:
        end_app = start_app
    else:
        end_app = 'AM' if start_app=='PM' else 'PM'
    end_h = end_h%12 or 12  # back to the range 1..12 for display
    return f'{end_h}:{end_m:02} {end_app}'
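Another way to get AM/PM right, as a sketch that still uses no libraries: convert the start time to minutes since midnight, add the duration, wrap around at 24 hours, and convert back. The function name add_time_minutes is chosen here for illustration.

```python
def add_time_minutes(start_time, duration):
    """Add an 'H:MM' duration to a 12-hour 'H:MM AM/PM' start time."""
    clock, period = start_time.split()
    hours, minutes = (int(part) for part in clock.split(":"))
    dur_hours, dur_minutes = (int(part) for part in duration.split(":"))

    # On a 24-hour clock, 12 AM is hour 0 and 12 PM is hour 12.
    hours_24 = hours % 12 + (12 if period == "PM" else 0)

    # Total minutes since midnight, wrapped to a single day.
    total = (hours_24 * 60 + minutes + dur_hours * 60 + dur_minutes) % (24 * 60)
    end_hours_24, end_minutes = divmod(total, 60)

    end_period = "AM" if end_hours_24 < 12 else "PM"
    end_hours = end_hours_24 % 12 or 12
    return f"{end_hours}:{end_minutes:02} {end_period}"


print(add_time_minutes("6:30 PM", "205:12"))
print(add_time_minutes("10:10 PM", "3:30"))
print(add_time_minutes("11:43 PM", "24:20"))
```

Working in a single unit means the AM/PM suffix falls out of the final hour value instead of being tracked through every branch.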

Why is time conversion between epoch seconds and string changing time by 1 calendar year?

I am using the time module of Python 3 to convert time between seconds and a formatted string. The functions used to generate the string are localtime and strftime. To generate the time in seconds, I use string slicing followed by mktime. When I call these repeatedly on each other's results, only the year changes: each round trip increments the time by a full year.
Code used is as below:
import sys
import time
def time_string(t):
    #t is second obtained by time.mktime((yr, mn, dy, hr, mn, sec, 0, 0, 0))
    time_struct = time.localtime(t)
    time_string = time.strftime("%Y-%m-%d %H:%M:%S", time_struct)
    return time_string
def string_time(t_string):
    #t_string has format '2020-01-31 08:23:35'
    yr = int(t_string[:4])
    mn = int(t_string[5:7])
    dy = int(t_string[8:10])
    hr = int(t_string[11:13])
    mn = int(t_string[14:16])
    se = int(t_string[17:])
    t=int(time.mktime((yr, mn, dy, hr, mn, se, 0, 0, 0)))
    return t
t = int(time.mktime((2020, 3, 19, 18, 15, 20, 0, 0, 0)))
print (t)
for x in range(5):
    t_st = time_string(t)
    print(t_st)
    t = string_time(t_st)
    print(t)
sys.exit("stopping..")
The results I get from above code execution is as follows:
1584621920
2020-03-19 18:15:20
1616157920
2021-03-19 18:15:20
1647693920
2022-03-19 18:15:20
1679229920
2023-03-19 18:15:20
1710852320
2024-03-19 18:15:20
1742388320
SystemExit: stopping..
What am I doing wrong? Why does this happen?
What is a better way of converting time-string to seconds?
I do not fully understand what you are actually trying to do. However, if you have a time string and want its value in seconds, try parsing it and using datetime.timestamp() instead of slicing the string.
The year increases by one because in your function string_time(t_string) you set the variable mn twice: once at mn = int(t_string[5:7]) (the month) and once at mn = int(t_string[14:16]) (the minute). The second assignment overwrites the first, so the minute value 15 is passed to mktime as the month. Month 15 is normalized to month 3 of the following year, which is why each round trip adds exactly one year.
I found time.strptime, which solves the problem of converting back from a string using the right format codes. The following code eliminates the need for string slicing:
def string_time(t_string):
    #t_string has format '2020-01-31 08:23:35'
    t_struct = time.strptime(t_string,"%Y-%m-%d %H:%M:%S")
    t = int(time.mktime(t_struct))
    return t
Korbinian had already found the error in my code. Is there a reason why I should use the datetime module instead of the time module?
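For comparison, the same round trip with the datetime module might look like the sketch below. The function names are illustrative, and both directions assume the string is in local time, as the time.mktime version does.

```python
from datetime import datetime

FORMAT = "%Y-%m-%d %H:%M:%S"


def string_to_seconds(t_string):
    """Parse a local-time string such as '2020-01-31 08:23:35' into epoch seconds."""
    return int(datetime.strptime(t_string, FORMAT).timestamp())


def seconds_to_string(seconds):
    """Format epoch seconds as a local-time string."""
    return datetime.fromtimestamp(seconds).strftime(FORMAT)


text = "2020-03-19 18:15:20"
for _ in range(5):
    # Each round trip should leave the string unchanged.
    text = seconds_to_string(string_to_seconds(text))
print(text)
```

Since the format string names every field, there is no slicing and no chance of reusing a variable for two fields.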

Simple datetime conversion from integer or string

Is there a simple way to convert a start and end time into a list of evenly spaced times? The input can be a string or an integer, in the form 1000, "1000", or "10:00", on a 24-hour clock. I've managed to accomplish this in a messy-looking way; is there a tighter, more efficient way to create this list? As you'll notice, I created an array first and then called .tolist() to make the time-conversion loop easier. The problem is that an input of 1030 or 1015 needs to be translated into 1050 or 1025 to get the right spacing. Is there a way I could use datetime.timedelta or something similar to build the array cleanly?
start="1000"
end="1600"
total_minutes=(int(end[:2])*60)+int(end[2:])-(int(start[:2])*60)-int(start[2:])
dog=list(range(0,int(total_minutes),25))
walk=dog_df["Walk Length"][dog_df.index[dog_df["Name"]==self.name][0]]
if walk=='half':
    self.dogarr=np.array([(x-25,x,x+25,x+50) for x in dog])
elif walk=='full':
    self.dogarr=np.array([(x-25,x,x+25,x+50,x+75,x+100) for x in dog])
else:
    self.dogarr=np.array([(x,x+25,x+50) for x in dog])
if int(start[2])!=0:
    start=start[:2]+str(int(int(start[2:])*1.667))
self.dogarr+=(int(start))
self.dogarr=self.dogarr.tolist()
z=0
while z<len(self.dogarr):
    for timespot in self.dogarr[z].copy():
        self.dogarr[z][self.dogarr[z].index(timespot)]=time.strftime('%H%M', time.gmtime(self.dogarr[z][self.dogarr[z].index(timespot)]*36))
    z+=1
self.dogarr=np.array(self.dogarr)
array([['1115', '1130', '1145', '1200'],
['1130', '1145', '1200', '1215'],
['1145', '1200', '1215', '1230'],
['1200', '1215', '1230', '1245'],
['1215', '1230', '1245', '1300']], dtype='<U4')
I'm sure you can figure out how to parse times from any number of existing questions. The crux of your question seems to be how to create evenly spaced times within a range. Here's a simple way:
import datetime
start = datetime.datetime(2018,12,20,10) # or use strptime etc.
end = datetime.datetime(2018,12,24,18)
count = 10
interval = (end - start) / count
dt = start
while dt <= end:
    print(dt)
    dt += interval
The output is:
2018-12-20 10:00:00
2018-12-20 20:24:00
2018-12-21 06:48:00
2018-12-21 17:12:00
2018-12-22 03:36:00
2018-12-22 14:00:00
2018-12-23 00:24:00
2018-12-23 10:48:00
2018-12-23 21:12:00
2018-12-24 07:36:00
2018-12-24 18:00:00
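Applied to the inputs from the question (1000, "1000" or "10:00"), a timedelta-based sketch could look like this. The helper names parse_clock and time_slots and the 15-minute default step are assumptions for illustration, not part of the original code.

```python
from datetime import datetime, timedelta


def parse_clock(value):
    """Accept 1000, "1000" or "10:00" and return a datetime on a placeholder date."""
    digits = str(value).replace(":", "").zfill(4)
    return datetime.strptime(digits, "%H%M")


def time_slots(start, end, step_minutes=15):
    """List 'HHMM' strings from start to end inclusive, step_minutes apart."""
    current, stop = parse_clock(start), parse_clock(end)
    step = timedelta(minutes=step_minutes)
    slots = []
    while current <= stop:
        slots.append(current.strftime("%H%M"))
        current += step
    return slots


print(time_slots(1000, "10:45"))
print(time_slots("9:30", 1000, step_minutes=30))
```

Because the arithmetic happens on datetime objects, 1030 stays 10:30 and no conversion to hundredths of an hour is needed.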

Resources