I'm stuck on one piece of Python code.
From an XML file, we're parsing data successfully in the following code, excluding the while loops and associated variables. We need to load a table into SQL with the entire rent schedule, by month, for the life of the lease. Rent is always billed on the first of the month, but the amount escalates at different times and by different amounts depending on the lease. The objective is to return one row per billing month with the date on which each month's rent is to be billed (YYYY-MM-DD). If the lease is for 60 months and there is a rent escalation in the 25th month, we'll need to show 60 rows, with the first amount repeating 24 times for the first two years and the escalated amount repeating 36 times for the remainder. The solution needs to be flexible enough to handle annual increases for some leases, and a few other variable conditions.
Can someone point out where I've gone wrong in my while loop to get the desired results?
import xml.etree.ElementTree as ET
import pyodbc
import dateutil.relativedelta as rd
import dateutil.parser as pr
tree = ET.parse('DealData.xml')
root = tree.getroot()
for deal in root.findall("Deals"):
    for dl in deal.findall("Deal"):
        dealid = dl.get("DealID")
        for dts in dl.findall("DealTerms/DealTerm"):
            dtid = dts.get("ID")
            dstart = pr.parse(dts.find("CommencementDate").text)
            dterm = dts.find("LeaseTerm").text
            darea = dts.find("RentableArea").text
            for brrent in dts.findall("BaseRents/BaseRent"):
                brid = brrent.get("ID")
                begmo = int(brrent.find("BeginIn").text)
                if brrent.find("Duration").text is not None:
                    duration = int(brrent.find("Duration").text)
                else:
                    duration = 0
                brentamt = brrent.find("Rent").text
                brper = brrent.find("Period").text
                perst = dstart + rd.relativedelta(months=begmo-1)
                perend = perst + rd.relativedelta(months=duration-1)
                billmocount = begmo
                while billmocount < duration:
                    monthnum = billmocount
                    billmocount += 1
                billmo = perst
                while billmo < perend:
                    billper = billmo
                    billmo += rd.relativedelta(months=1)
                if dealid == "706880":
                    print(dealid, dtid, brid, begmo, dstart, dterm, darea, brentamt, brper, duration, perst, perend, \
                          monthnum, billper)
The results I'm getting look like this:
706880 4278580 45937180 1 2018-01-01 00:00:00 60 6200 15.0 rsf/year 36 2018-01-01 00:00:00 2020-12-01 00:00:00 35 2020-11-01 00:00:00
706880 4278580 45937181 37 2018-01-01 00:00:00 60 6200 18.0 rsf/year 24 2021-01-01 00:00:00 2022-12-01 00:00:00 35 2022-11-01 00:00:00
The problem that I was running into was simply the indentation of the print statement. By indenting the following lines so that they run inside the while loop instead of after it, I was able to get the expected results:
                    if dealid == "706880":
                        print(dealid, dtid, brid, begmo, dstart, dterm, darea, brentamt, brper, duration, perst, perend, \
                              monthnum, billper)
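For reference, here is a minimal sketch of building one row per billing month directly, using only the standard library. The helper names add_months and rent_schedule are made up for illustration, and the two rent steps (36 months at 15.0 starting in month 1, then 24 months at 18.0 starting in month 37) are taken from the output shown above. It assumes every BaseRent has both a BeginIn and a Duration and that the commencement date falls on the first of a month.

```python
from datetime import date

def add_months(first_of_month, n):
    """Return the first day of the month that is n months after first_of_month."""
    total = first_of_month.year * 12 + (first_of_month.month - 1) + n
    return date(total // 12, total % 12 + 1, 1)

def rent_schedule(commencement, steps):
    """Expand (begin_in, duration, rent) steps into one row per billing month.

    begin_in is the 1-based lease month in which the step starts,
    duration is the number of months the step lasts.
    """
    rows = []
    for begin_in, duration, rent in steps:
        for offset in range(duration):
            month_num = begin_in + offset
            bill_date = add_months(commencement, month_num - 1)
            rows.append((month_num, bill_date.isoformat(), rent))
    return rows

# Hypothetical input mirroring deal 706880 from the question
schedule = rent_schedule(date(2018, 1, 1), [(1, 36, 15.0), (37, 24, 18.0)])
for row in schedule[34:38]:
    print(row)
```

Each tuple in schedule is (lease month number, billing date as YYYY-MM-DD, rent), so it could be passed to an INSERT statement row by row.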
I have a dataframe MyDf like this
MyDate MyData
2020-06-02 4588.0
2020-06-03 4555.5
2020-06-04 4604.3
2020-06-05 4634.1
2020-06-06 4617.8
2020-06-07 4598.9
2020-06-08 4596.1
2020-06-09 4607.0
2020-06-10 4601.6
2020-06-11 4547.4
I want to calculate the weekly rate of growth of MyData column this way:
#'WeeklyRate'='MyData'/pastweek('MyData')
from pandas.tseries.offsets import Week
MyDf['WeeklyRate'] = ((MyDf['MyData'] / MyDf['MyData'].shift(1, freq=Week()).reindex(MyDf['MyDate'].index))
.fillna(value=-1)
.astype(float))
#ToDo:not sure if .astype(float)
But I get the error "Not supported for type RangeIndex" when I run the last line.
What am I doing wrong?
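shift() with a freq argument moves the index rather than the values, so it needs a datetime-like index, and the default RangeIndex does not support that. I think if you set "MyDate" as your index (make sure it holds datetimes, not strings), it will be solved :)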
MyDf = MyDf.set_index("MyDate")
MyDf['WeeklyRate'] = MyDf['MyData'] / MyDf['MyData'].shift(1, freq=Week())
MyDf['WeeklyRate'] = MyDf["WeeklyRate"].fillna(value=-1)
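As a small self-contained check of this approach, the sketch below rebuilds the sample data from the question and computes the ratio against the value seven days earlier. The variable names df and previous_week are only illustrative, and pd.Timedelta(days=7) is used in place of Week() as an equivalent one-week offset.

```python
import pandas as pd

# Sample data from the question, with MyDate as a real DatetimeIndex
df = pd.DataFrame({
    "MyDate": pd.date_range("2020-06-02", periods=10, freq="D"),
    "MyData": [4588.0, 4555.5, 4604.3, 4634.1, 4617.8,
               4598.9, 4596.1, 4607.0, 4601.6, 4547.4],
}).set_index("MyDate")

# Move every observation one week forward so it lines up with "today"
previous_week = df["MyData"].shift(freq=pd.Timedelta(days=7))

# Division aligns on the index; dates with no value a week earlier become -1
df["WeeklyRate"] = (df["MyData"] / previous_week).fillna(-1)
print(df)
```

Only the last three dates have an observation exactly one week earlier, so the first seven rows end up with the -1 placeholder.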
I'm doing an analysis of a gym and want to know if we should get more treadmills or not.
The running history record contains the users name, start time, end time and duration.
My idea is to use a pivot table in Excel: put the start date and time into the rows and the sum of duration into the values, then right-click on the time, click Group, and group by hour, so that the duration is summed for each hour from morning to evening.
If the sum of duration for an hour is close to 120 mins for 2 treadmills, it means they are full.
But if people run across the hour boundary, for example from 1:30pm to 2:30pm, the whole duration is counted toward 1-2pm, so the result is not correct.
Does anyone have a good method?
Thanks
Here is some sample data
name sex device id device type start date start time end date end time duration(mins) distance(km) Calorie
Emmie Aguila Female a001 Treadmill 2020-7-25 9:34:18 2020-7-25 10:20:20 46 4.5 338.6
Dusty Gorham Female a002 Treadmill 2020-7-25 9:13:45 2020-7-25 9:49:02 35 3.1 192.2
Diann Lafreniere Female a001 Treadmill 2020-7-25 9:12:06 2020-7-25 9:33:27 21 2.1 142.6
Rima Hoop Male a001 Treadmill 2020-7-25 7:10:10 2020-7-25 7:30:14 20 2.4 230.8
my pivot result
One of my friends found a solution for Excel.
Convert the start/end times to seconds of the day, for example 0-3600 for 0-1am and 3600-7200 for 1-2am.
Get the overlap length by comparing each session's interval with each hour.
> name sex mobile device-id device-type start date starttime endtime duration(mins) distance(km) calorie Start–sec end-sec 3600 7200 10800 14400 18000 21600 25200 28800 32400 36000 39600 43200 46800 50400 54000 57600 61200 64800 68400 72000 75600 79200 82800 86400
> Emmie Female 1.23412E+11 a001 Treadmill 2020-7-25 9:34:18 2020-07-25 10:20:20 46 4.5 338.6 34458 37220 0 0 0 0 0 0 0 0 1542 1220 0 0 0 0 0 0 0 0 0 0 0 0 0
(I tried many times, but the last few zeros cannot be aligned correctly.)
1. Convert time to seconds
starttime(H3) -> start-sec(M3)
=HOUR(H3)*3600+MINUTE(H3)*60+SECOND(H3)
endtime(I3) -> end-sec(N3)
=HOUR(I3)*3600+MINUTE(I3)*60+SECOND(I3)
2. Compare the duration with each hour, in seconds
=IF(OR(O$1>$N3, $M3>O$2), 0, MIN($N3, O$2)- MAX($M3, O$1))
Apply the formula to the whole sheet, and the data is ready.
Create a pivot sheet, drag the date into the rows and all the hour columns (1-24) into the columns; creating a bar chart then shows the result clearly.
usage for treadmills for each day
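The same overlap calculation can be sketched in Python. The function names to_seconds and overlap_seconds are made up for illustration, and the sketch assumes a session starts and ends on the same day, as in the sample data. It uses the first sample row (9:34:18 to 10:20:20).

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to seconds since midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def overlap_seconds(start_sec, end_sec, hour):
    """Seconds of the session [start_sec, end_sec] that fall inside the given hour (0-23)."""
    hour_start = hour * 3600
    hour_end = hour_start + 3600
    return max(0, min(end_sec, hour_end) - max(start_sec, hour_start))

start = to_seconds("9:34:18")
end = to_seconds("10:20:20")

# One value per hour of the day, like the 24 extra columns in the sheet
per_hour = [overlap_seconds(start, end, hour) for hour in range(24)]
print(per_hour[9], per_hour[10])
```

Summing these per-hour values over all sessions of a day gives the occupied seconds per hour, which can be compared with 3600 seconds per treadmill.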
I have a DataFrame with this column:
Mi_Meteo['Time_Instant'].head():
0 2013/11/14 17:00
1 2013/11/14 18:00
2 2013/11/14 19:00
3 2013/11/14 20:00
4 2013/11/14 21:00
Name: Time_Instant, dtype: object
After some inspection, this is what I found:
Mi_Meteo['Time_Instant'].value_counts():
2013/12/09 02:00 33
2013/12/01 22:00 33
2013/12/11 10:00 33
2013/12/05 09:00 33
.
.
.
.
2013/11/16 02:00 21
2013/11/07 10:00 11
2013/11/17 22:00 11
DateTIme 3
So I stripped it:
Mi_Meteo['Time_Instant'] = Mi_Meteo['Time_Instant'].str.rstrip('DateTIme')  # otherwise I get this error when converting: 'Unknown string format'
And then I tried to convert it:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'])
But I get this error:
String does not contain a date.
Any suggestion would be much appreciated. Thank you all.
A bit late, but why not use this:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
The pandas.to_datetime documentation describes the 'errors' parameter as follows:
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
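A minimal sketch of what errors='coerce' does with a stray header value like the one in the question. The three sample strings and the names raw, parsed and bad_rows are made up for illustration, and the explicit format string is an assumption based on the values shown above.

```python
import pandas as pd

raw = pd.Series(["2013/11/14 17:00", "2013/11/14 18:00", "DateTIme"])

# Values that do not match the format become NaT instead of raising
parsed = pd.to_datetime(raw, format="%Y/%m/%d %H:%M", errors="coerce")

# The original strings that could not be parsed
bad_rows = raw[parsed.isna()]
print(parsed)
print(bad_rows)
```

Looking at bad_rows before dropping anything shows exactly which values were silently turned into NaT.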
I got the same error - it turns out that two of my dates were empty: ' '.
To find the row index of the problematic dates I used the following list comprehension:
badRows = [n for n,x in enumerate(df['DATE'].tolist()) if x.strip() in ['']]
This returned a list, containing the indices of the rows in the 'DATE' column that were causing the problems:
[745672, 745673]
You can then delete these rows in place:
df.drop(df.index[badRows],inplace=True)
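The same idea can be sketched with a boolean mask instead of a list of positions. The column name DATE follows the snippet above; the sample values and the names is_blank, bad_rows and clean are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"DATE": ["2013/11/16 02:00", " ", "2013/11/17 22:00", ""]})

# True where the cell is empty or contains only whitespace
is_blank = df["DATE"].str.strip() == ""

# Index labels of the problematic rows, for inspection
bad_rows = df.index[is_blank].tolist()

# Keep only the rows that contain something, then convert
clean = df[~is_blank].copy()
clean["DATE"] = pd.to_datetime(clean["DATE"], format="%Y/%m/%d %H:%M")
print(bad_rows)
print(clean)
```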
I'm having trouble reproducing your error, so I cannot be sure if this will fix the issue you have. If not then please try to provide a minimum sample of code/data that reproduces your error.
This is what I tried to reproduce your situation:
lzt = ['2013/11/16 02:00 ',
       '2013/11/07 10:00 ',
       '2013/11/17 22:00 ',
       'DateTIme',
       'DateTIme',
       'DateTIme']
ser = pd.Series(lzt)
ser = ser.str.rstrip('DateTIme')
ser = pd.to_datetime(ser)
But as I said, I got no error, so either we have different versions of pandas or there's something else wrong with your data. Using rstrip leaves some blank strings in the data:
0 2013/11/16 02:00
1 2013/11/07 10:00
2 2013/11/17 22:00
3
4
5
which for me gives NaT (not a time) when I run pd.to_datetime on it:
Out[34]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
3 NaT
4 NaT
5 NaT
dtype: datetime64[ns]
I'd say it's better practice to remove the unwanted rows altogether (starting again from the original series) and then convert:
ser = ser[ser != 'DateTIme']
ser = pd.to_datetime(ser)
Out[39]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
dtype: datetime64[ns]
See if that works, otherwise please give enough information to reproduce the error.
There are two possible solutions to this:
Either you can make the error disappear by using coerce in the errors argument of pd.to_datetime(), as follows:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
Or, if you want to know which values cannot be parsed, you can find them by converting one value at a time, as follows. This will work regardless of the type or format of the wrong value:
dates = []
wrong_dates = []
for i in Mi_Meteo['Time_Instant'].unique():
    try:
        date = pd.to_datetime(i)
        dates.append(i)
    except Exception:
        wrong_dates.append(i)
The wrong_dates list will then hold all the unparsable values, and dates all the valid ones.
I've got a pandas Series containing datetime-like strings with 12h format, but without the am/pm abbreviations. It covers an entire month of data :
40 01/01/2017 11:51:00
41 01/01/2017 11:51:05
42 01/01/2017 11:55:05
43 01/01/2017 11:55:10
44 01/01/2017 11:59:30
45 01/01/2017 11:59:35
46 02/01/2017 12:00:05
47 02/01/2017 12:00:10
48 02/01/2017 12:13:20
49 02/01/2017 12:13:25
50 02/01/2017 12:24:50
51 02/01/2017 12:24:55
52 02/01/2017 12:33:30
Name: TS, dtype: object
(318621,) # shape
My goal is to convert it to datetime format so as to obtain the appropriate Unix timestamp values and to do comparisons and arithmetic with other datetime data that is in 24h format. So I already tried this:
pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S') # %I for 12h format
Which outputs:
64 2017-01-02 00:46:50
65 2017-01-02 00:46:55
66 2017-01-02 01:01:00
67 2017-01-02 01:01:05
68 2017-01-02 01:05:00
But the am/pm information is not taken into account. I know that, as a rule, the am/pm first has to be specified in the strings; then one can use datetime.datetime.strptime() or pd.to_datetime() to parse them with the %p directive.
So I wanted to know if there's another way to deal with this issue through the datetime or pandas datetime modules? Or do I have to manually add the 'am/pm' abbreviations before parsing?
You have data in 5-second intervals spanning multiple days. The desired end format is like this (with an AM/PM column that we need to add, because pandas cannot possibly guess it while looking at one value at a time):
31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM
First, we can parse the whole thing without AM/PM info, as you already showed:
ts = pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S')
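We may have a small problem: depending on the parser, 12:00:00 can come out as noon rather than midnight (the output in the question suggests it is already read as 00:xx, in which case the next line changes nothing). Let's normalize that: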
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
Now we have times from 00:00:00 to 11:59:55, twice per day.
Next, note that the transitions are always at 00:00:00. We can easily detect these, as well as the first instance of each date:
twelve = ts.dt.time == datetime.time(0,0,0)
newdate = ts.dt.date.diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
Next, build an offset series, which should be easy to inspect for correctness:
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset = offset.ffill()
And finally:
ts += offset
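As a cross-check, here is a different and simpler technique sketched with the standard library only: walk through the strings in order and switch to PM when the clock goes backwards within the same date. This is not the vectorized approach above. The function name restore_24h is made up, and the sketch assumes the data is sorted chronologically, that each calendar day's first sample is in the morning, and that every morning and afternoon contains at least one sample. It relies on strptime reading a bare 12 with %I as hour 0.

```python
from datetime import datetime, timedelta

def restore_24h(strings, fmt="%d/%m/%Y %I:%M:%S"):
    """Rebuild 24h datetimes from ordered 12h strings that lack AM/PM."""
    result = []
    previous = None   # previous parsed value, before any PM offset
    is_pm = False
    for text in strings:
        current = datetime.strptime(text, fmt)   # a bare 12 is parsed as hour 0
        if previous is None or current.date() != previous.date():
            is_pm = False                        # assumption: a new day starts in the morning
        elif current < previous:
            is_pm = True                         # clock wrapped around, so the afternoon began
        result.append(current + timedelta(hours=12) if is_pm else current)
        previous = current
    return result

sample = [
    "01/01/2017 11:59:55",
    "01/01/2017 12:00:00",
    "01/01/2017 01:00:05",
    "01/01/2017 11:59:55",
    "02/01/2017 12:00:05",
]
for value in restore_24h(sample):
    print(value)
```

If the first records of the series are actually from an afternoon, the first half-day will be labelled wrongly, so that boundary is worth checking by hand.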
I have table as
Id Name Date Time
1 S 1-Dec-2009 9:00
2 N 1-Dec-2009 10:00
1 S 1-Dec-2009 10:30
1 S 1-Dec-2009 11:00
2 N 1-Dec-2009 11:10
I need a query that displays the earliest and the latest time for each Id, like this:
Id Name Date Time
1 S 1-Dec-2009 9:00
1 S 1-Dec-2009 11:00
2 N 1-Dec-2009 10:00
2 N 1-Dec-2009 11:10
My backend database is MS Access and I am using VB6. I need the Max and Min time for each Id.
I would add two [int] columns, say hour and minute, and then use an MS Access query to sort on them. It would be much easier to call that from VB. The query itself would be something like the following:
SELECT * FROM YOURTABLE ORDER BY id, hour, minute;