String does not contain a date - string

I have a DataFrame with this column:
Mi_Meteo['Time_Instant'].head():
0 2013/11/14 17:00
1 2013/11/14 18:00
2 2013/11/14 19:00
3 2013/11/14 20:00
4 2013/11/14 21:00
Name: Time_Instant, dtype: object
After some inspection, this is what I found:
Mi_Meteo['Time_Instant'].value_counts():
2013/12/09 02:00 33
2013/12/01 22:00 33
2013/12/11 10:00 33
2013/12/05 09:00 33
.
.
.
.
2013/11/16 02:00 21
2013/11/07 10:00 11
2013/11/17 22:00 11
DateTIme 3
So I stripped it:
# Otherwise I would get this error when converting: 'Unknown string format'
Mi_Meteo['Time_Instant'] = Mi_Meteo['Time_Instant'].str.rstrip('DateTIme')
And then I tried to convert it:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'])
But I get this error:
String does not contain a date.
Any suggestion would be much appreciated. Thank you all.

A bit late, but why not use this:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
The pandas.to_datetime documentation describes the 'errors' parameter as follows:
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
If 'raise', then invalid parsing will raise an exception.
If 'coerce', then invalid parsing will be set as NaT.
If 'ignore', then invalid parsing will return the input.
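To make the difference between 'raise' and 'coerce' concrete, here is a minimal sketch. The sample series and the explicit format string are assumptions made for illustration; they are not taken from the asker's real data.

```python
import pandas as pd

# Made-up sample: two valid timestamps plus a stray header value.
raw = pd.Series(["2013/11/14 17:00", "2013/11/14 18:00", "DateTIme"])

# Default errors='raise': the unparsable value aborts the whole conversion.
try:
    pd.to_datetime(raw, format="%Y/%m/%d %H:%M")
    raised = False
except ValueError:
    raised = True

# errors='coerce': the unparsable value becomes NaT, the rest is converted.
coerced = pd.to_datetime(raw, format="%Y/%m/%d %H:%M", errors="coerce")

print(raised)
print(coerced)
```

Note that 'coerce' hides bad values rather than reporting them, so it is worth checking how many NaT entries come back.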

I got the same error - it turns out that two of my dates were empty: ' '.
To find the row index of the problematic dates I used the following list comprehension:
badRows = [n for n,x in enumerate(df['DATE'].tolist()) if x.strip() in ['']]
This returned a list of the row positions in the 'DATE' column that were causing the problem:
[745672, 745673]
You can then delete these rows in place:
df.drop(df.index[badRows],inplace=True)
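The same check can probably be written without a Python-level loop. The sketch below uses a boolean mask; the small DataFrame is invented sample data, and only the 'DATE' column name comes from the answer above.

```python
import pandas as pd

# Invented sample: rows 1 and 3 hold blank date strings.
df = pd.DataFrame({"DATE": ["2013/11/14 17:00", " ", "2013/11/14 19:00", ""]})

# True where the value is empty once surrounding whitespace is removed.
blank = df["DATE"].str.strip().eq("")

bad_rows = df.index[blank].tolist()  # index labels of the blank rows
cleaned = df[~blank]                 # copy of the frame without them

print(bad_rows)
print(cleaned)
```

This assumes every value in the column is a string; missing values (NaN) would not be flagged by this mask.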

I'm having trouble reproducing your error, so I cannot be sure whether this will fix your issue. If it does not, please provide a minimal sample of code and data that reproduces the error.
This is what I tried to reproduce your situation:
lzt = ['2013/11/16 02:00 ',
'2013/11/07 10:00 ',
'2013/11/17 22:00 ',
'DateTIme',
'DateTIme',
'DateTIme']
ser = pd.Series(lzt)
ser = ser.str.rstrip('DateTIme')
ser = pd.to_datetime(ser)
But as I said, I got no error, so either we have different versions of pandas or there's something else wrong with your data. Using rstrip leaves some blank strings:
0 2013/11/16 02:00
1 2013/11/07 10:00
2 2013/11/17 22:00
3
4
5
which for me gives NaT (not a time) when I run pd.to_datetime on it:
Out[34]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
3 NaT
4 NaT
5 NaT
dtype: datetime64[ns]
I'd say it's better practice to remove the unwanted rows altogether, before converting:
ser = pd.Series(lzt)
ser = pd.to_datetime(ser[ser != 'DateTIme'])
Out[39]:
0 2013-11-16 02:00:00
1 2013-11-07 10:00:00
2 2013-11-17 22:00:00
dtype: datetime64[ns]
See if that works, otherwise please give enough information to reproduce the error.

There are two possible solutions to this:
Either you can make the error disappear by passing errors='coerce' to pd.to_datetime(), as follows:
Mi_Meteo['Time_Instant'] = pd.to_datetime(Mi_Meteo['Time_Instant'], errors='coerce')
Or, if you want to know which values cannot be parsed, you can find them by converting one value at a time, as follows. This will work regardless of the type or format of the wrong value:
dates = []
wrong_dates = []
for i in Mi_Meteo['Time_Instant'].unique():
    try:
        date = pd.to_datetime(i)
        dates.append(i)
    except Exception:
        wrong_dates.append(i)
The wrong_dates list will hold all the values that failed to parse, and dates will hold all the values that parsed.
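The two ideas can also be combined: coerce first, then use the resulting NaT positions to look up the original offending strings. This is only a sketch on made-up data, and it assumes the column follows a single known format.

```python
import pandas as pd

# Made-up sample with a repeated header value and a blank entry.
col = pd.Series(["2013/11/14 17:00", "DateTIme", " ", "2013/11/14 18:00", "DateTIme"])

# Anything that does not match the assumed format becomes NaT.
parsed = pd.to_datetime(col, format="%Y/%m/%d %H:%M", errors="coerce")

# Original strings at the positions that failed to parse, de-duplicated.
wrong_values = col[parsed.isna()].unique().tolist()

print(wrong_values)
```

Values that were already missing before parsing would show up in this list too, so it reports "not a usable date" rather than strictly "malformed string".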

Related

Formatting printing with spacing Python

I am trying to have all the values indented and at the same level whilst printing the integers and strings specified below. I am trying to have the space equidistant throughout. How would I be able to do that?
print('Consecutive monthly {} results: {}\tIndexes: {} - {}'.format('positive',14,'2019-04-01 00:00:00','2020-06-01 00:00:00'))
print('Consecutive monthly {} results: {}\tIndexes: {} - {}'.format('negative',2,'2018-02-01 00:00:00','2020-06-01 00:00:00'))
Output:
Consecutive monthly positive results: 14        Indexes: 2019-04-01 00:00:00 - 2020-06-01 00:00:00
Consecutive monthly negative results: 2 Indexes: 2018-02-01 00:00:00 - 2018-03-01 00:00:00
Expected Output:
Consecutive monthly positive results: 14        Indexes: 2019-04-01 00:00:00 - 2020-06-01 00:00:00
Consecutive monthly negative results: 2         Indexes: 2018-02-01 00:00:00 - 2018-03-01 00:00:00
You can specify a width and alignment for the integer field like this:
print('Consecutive monthly {} results: {:<2d}\tIndexes: {} - {}'.format('positive',14,'2019-04-01 00:00:00','2020-06-01 00:00:00'))
print('Consecutive monthly {} results: {:<2d}\tIndexes: {} - {}'.format('negative',2,'2018-02-01 00:00:00','2020-06-01 00:00:00'))
The ':<2d' in the format string left-justifies the number ('<') in a field at least 2 characters wide, so both lines are the same length when they reach the tab.
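If the widths are not known in advance, they can be computed from the data. The following is a sketch using f-strings with nested width fields; the rows list and the names kind_width and count_width are illustrative and not part of the original code.

```python
# Values from the question, gathered into a list of rows.
rows = [
    ("positive", 14, "2019-04-01 00:00:00", "2020-06-01 00:00:00"),
    ("negative", 2, "2018-02-01 00:00:00", "2020-06-01 00:00:00"),
]

# Widest label and widest number decide the column widths.
kind_width = max(len(kind) for kind, _, _, _ in rows)
count_width = max(len(str(count)) for _, count, _, _ in rows)

lines = [
    f"Consecutive monthly {kind:<{kind_width}} results: "
    f"{count:<{count_width}d}  Indexes: {start} - {end}"
    for kind, count, start, end in rows
]

print("\n".join(lines))
```

Padding with spaces instead of relying on a tab keeps the columns aligned regardless of the terminal's tab stops.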

Error 'Not supported for type RangeIndex' when calculating weekly grow rate

I have a dataframe MyDf like this:
MyDate MyData
2020-06-02 4588.0
2020-06-03 4555.5
2020-06-04 4604.3
2020-06-05 4634.1
2020-06-06 4617.8
2020-06-07 4598.9
2020-06-08 4596.1
2020-06-09 4607.0
2020-06-10 4601.6
2020-06-11 4547.4
I want to calculate the weekly rate of growth of MyData column this way:
#'WeeklyRate'='MyData'/pastweek('MyData')
from pandas.tseries.offsets import Week
MyDf['WeeklyRate'] = ((MyDf['MyData'] / MyDf['MyData'].shift(1, freq=Week()).reindex(MyDf['MyDate'].index))
                      .fillna(value=-1)
                      .astype(float))
#ToDo:not sure if .astype(float)
But I get the error "Not supported for type RangeIndex" when I run the last line.
What am I doing wrong?
I think setting "MyDate" as your index will solve it, because shift() with a freq argument needs a datetime-like index rather than the default RangeIndex :)
MyDf = MyDf.set_index("MyDate")
MyDf['WeeklyRate'] = MyDf['MyData'] / MyDf['MyData'].shift(1, freq=Week())
MyDf['WeeklyRate'] = MyDf["WeeklyRate"].fillna(value=-1)
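Here is a self-contained sketch of that fix using the sample rows from the question. It assumes MyDate might be stored as strings, so it parses the column first; the intermediate name last_week is introduced only for readability.

```python
import pandas as pd
from pandas.tseries.offsets import Week

# Sample rows taken from the question.
MyDf = pd.DataFrame({
    "MyDate": ["2020-06-02", "2020-06-03", "2020-06-04", "2020-06-05", "2020-06-06",
               "2020-06-07", "2020-06-08", "2020-06-09", "2020-06-10", "2020-06-11"],
    "MyData": [4588.0, 4555.5, 4604.3, 4634.1, 4617.8,
               4598.9, 4596.1, 4607.0, 4601.6, 4547.4],
})

# Assumption: the dates may be strings, so convert them before indexing.
MyDf["MyDate"] = pd.to_datetime(MyDf["MyDate"])
MyDf = MyDf.set_index("MyDate")

# shift(freq=...) moves the index labels forward by one week,
# so the division below pairs each date with the value seven days earlier.
last_week = MyDf["MyData"].shift(1, freq=Week())
MyDf["WeeklyRate"] = (MyDf["MyData"] / last_week).fillna(-1)

print(MyDf)
```

Dates without a value one week earlier get -1, matching the fillna in the original attempt.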

Copy row of data from one pandas dataframe to another

Pandas newbie here. I imported Excel data into pandas, and I want to copy a subset of the data for a specific row (placeholder) from one dataframe (Error_data1) to another dataframe (Error_data2) wherever that 'placeholder' exists.
Here are the first 4 rows of Error_data1 (it has 150 rows):
index student Error1 Error2 Error3 Error4 Error5
0 Henry 2.5647 -0.2145 1.3524 2.0124 6.2013
1 John -0.0124 1.0365 3.2145 4.0211 -5.0124
2 Terry 1.1120 2.2154 -6.2013 1.2032 2.3321
3 Gerald 9.2105 1.0212 3.2548 3.6478 4.1020
Here are the first 5 rows of Error_data2 (it has 358 rows):
index Day Time student Error1 Error2 Error3 Error4 Error5
0 Mon 01:00 Terry
1 Tue 05:15 John
2 Wed 05:25 john
3 Wed 12:15 Gerald
4 Thur 11:00 Henry
Here is the code I tried:
for i in range(len(Error_data1)):
    if Error_data1['Student'][i] == Error_data2['Student'][i]:
        a = Error_data1.iloc[i,1:6]
        Error_data2.iloc[i,4:9] = a
I expect Error_data2 to look like this:
index Day Time student Error1 Error2 Error3 Error4 Error5
0 Mon 01:00 Terry 1.1120 2.2154 -6.2013 1.2032 2.3321
1 Tue 05:15 John -0.0124 1.0365 3.2145 4.0211 -5.0124
2 Wed 05:25 john -0.0124 1.0365 3.2145 4.0211 -5.0124
3 Wed 12:15 Gerald 9.2105 1.0212 3.2548 3.6478 4.1020
4 Thur 11:00 Henry 2.5647 -0.2145 1.3524 2.0124 6.2013
You can try merging the two dataframes on student names.
combined = Error_data1.merge(Error_data2, on='student', how='left').fillna(0)
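As a sketch of how a merge could reproduce the expected table, the example below keeps every row of Error_data2 and looks up the error values by student. It rests on some assumptions: the frames are trimmed to two error columns, Error_data2 is taken to hold only Day, Time and student, and names are matched case-insensitively (so 'john' picks up John's values) through a temporary key column that is not part of the original data.

```python
import pandas as pd

# Trimmed-down versions of the frames from the question.
error_data1 = pd.DataFrame({
    "student": ["Henry", "John", "Terry", "Gerald"],
    "Error1": [2.5647, -0.0124, 1.1120, 9.2105],
    "Error2": [-0.2145, 1.0365, 2.2154, 1.0212],
})
error_data2 = pd.DataFrame({
    "Day": ["Mon", "Tue", "Wed", "Wed", "Thur"],
    "Time": ["01:00", "05:15", "05:25", "12:15", "11:00"],
    "student": ["Terry", "John", "john", "Gerald", "Henry"],
})

# Temporary lower-case key so that 'john' and 'John' match.
left = error_data2.assign(key=error_data2["student"].str.lower())
right = (error_data1.assign(key=error_data1["student"].str.lower())
         .drop(columns="student"))

# how='left' keeps every row of error_data2, in its original order.
combined = left.merge(right, on="key", how="left").drop(columns="key")

print(combined)
```

Students missing from error_data1 would come back with NaN in the error columns, which can be filled afterwards if needed.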

Convert incomplete 12h datetime-like strings into appropriate datetime type

I've got a pandas Series containing datetime-like strings in 12h format, but without the am/pm abbreviations. It covers an entire month of data:
40 01/01/2017 11:51:00
41 01/01/2017 11:51:05
42 01/01/2017 11:55:05
43 01/01/2017 11:55:10
44 01/01/2017 11:59:30
45 01/01/2017 11:59:35
46 02/01/2017 12:00:05
47 02/01/2017 12:00:10
48 02/01/2017 12:13:20
49 02/01/2017 12:13:25
50 02/01/2017 12:24:50
51 02/01/2017 12:24:55
52 02/01/2017 12:33:30
Name: TS, dtype: object
(318621,) # shape
My goal is to convert it to datetime format, so as to obtain the appropriate Unix timestamp values and do comparisons and arithmetic with other datetime data that is in 24h format. So I already tried this:
pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S') # %I for 12h format
Which gives me:
64 2017-01-02 00:46:50
65 2017-01-02 00:46:55
66 2017-01-02 01:01:00
67 2017-01-02 01:01:05
68 2017-01-02 01:05:00
But the am/pm information is not taken into account. I know that, as a rule, am/pm first has to be present in the strings; then one can use dt.dt.strptime() or pd.to_datetime() to parse them with the %p directive.
So I wanted to know if there's another way to deal with this issue through the datetime or pandas datetime modules. Or do I have to manually add the 'am/pm' abbreviations before parsing?
You have data in 5 second intervals throughout multiple days. The desired end format is like this (with AM/PM column we need to add, because Pandas cannot possibly guess, since it looks at one value at a time):
31/12/2016 11:59:55 PM
01/01/2017 12:00:00 AM
01/01/2017 12:00:05 AM
01/01/2017 11:59:55 AM
01/01/2017 12:00:00 PM
01/01/2017 12:59:55 PM
01/01/2017 01:00:00 PM
01/01/2017 01:00:05 PM
01/01/2017 11:59:55 PM
02/01/2017 12:00:00 AM
First, we can parse the whole thing without AM/PM info, as you already showed:
ts = pd.to_datetime(df.TS, format = '%d/%m/%Y %I:%M:%S')
We have a small problem: 12:00:00 is parsed as noon, not midnight. Let's normalize that:
ts[ts.dt.hour == 12] -= pd.Timedelta(12, 'h')
Now we have times from 00:00:00 to 11:59:55, twice per day.
Next, note that the transitions are always at 00:00:00. We can easily detect these, as well as the first instance of each date:
import datetime
twelve = ts.dt.time == datetime.time(0, 0, 0)
newdate = ts.dt.normalize().diff() > pd.Timedelta(0)
midnight = twelve & newdate
noon = twelve & ~newdate
Next, build an offset series, which should be easy to inspect for correctness:
offset = pd.Series(np.nan, ts.index, dtype='timedelta64[ns]')
offset[midnight] = pd.Timedelta(0)
offset[noon] = pd.Timedelta(12, 'h')
offset = offset.ffill()
And finally:
ts += offset
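The approach above relies on rows that fall exactly on 12:00:00. As a variation, here is a sketch that instead looks for the point within each calendar date where the 12h clock steps backwards. It is not the answer's method, and it assumes the series is in chronological order, that each date's data starts in the AM half, and that there are no gaps long enough to hide the noon rollover. The helper name resolve_half_days is made up for this example.

```python
import pandas as pd

def resolve_half_days(raw: pd.Series) -> pd.Series:
    """Hypothetical helper: restore the missing AM/PM half on ordered 12h timestamps."""
    ts = pd.to_datetime(raw, format="%d/%m/%Y %I:%M:%S")
    # Make sure hour 12 sits at the start of its half-day (hour 0).
    ts = ts.where(ts.dt.hour != 12, ts - pd.Timedelta(hours=12))
    day = ts.dt.normalize()
    seconds = (ts - day).dt.total_seconds()
    # Within one calendar date the 12h clock goes backwards once: at noon.
    wrapped = (seconds.groupby(day).diff() < 0).astype(int)
    # Every row from that step onwards belongs to the PM half.
    is_pm = wrapped.groupby(day).cumsum() > 0
    return ts + pd.to_timedelta(is_pm.astype(int) * 12, unit="h")

# Small made-up sample covering a noon and a midnight rollover.
raw = pd.Series([
    "01/01/2017 11:59:55",
    "01/01/2017 12:00:00",
    "01/01/2017 01:00:00",
    "01/01/2017 11:59:55",
    "02/01/2017 12:00:05",
])
resolved = resolve_half_days(raw)
print(resolved)
```

Because the date part is already present in the strings, only the half-day within each date has to be inferred.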

Python individual rows for each month's rent in term

I'm stuck on one piece of Python code.
From an XML file, we're parsing data successfully with the following code, apart from the while loops and their associated variables. We need to load a table into SQL with the entire rent schedule, by month, for the life of the lease. Rent is always billed on the first of the month, but the amount escalates at different times and by different amounts depending on the lease. The objective is to return one row per billing month, with the date on which each month's rent is billed (YYYY-MM-DD). If the lease is for 60 months and there is a rent escalation in the 25th month, we need to show 60 rows, with the first amount repeating 24 times for the first two years and the escalated amount repeating 36 times for the remainder. The approach needs to be flexible enough to handle annual increases for some leases, and a few other variable conditions.
Can someone point out where I've gone wrong in my while loop to get the desired results?
import xml.etree.ElementTree as ET
import pyodbc
import dateutil.relativedelta as rd
import dateutil.parser as pr
tree = ET.parse('DealData.xml')
root = tree.getroot()
for deal in root.findall("Deals"):
    for dl in deal.findall("Deal"):
        dealid = dl.get("DealID")
        for dts in dl.findall("DealTerms/DealTerm"):
            dtid = dts.get("ID")
            dstart = pr.parse(dts.find("CommencementDate").text)
            dterm = dts.find("LeaseTerm").text
            darea = dts.find("RentableArea").text
            for brrent in dts.findall("BaseRents/BaseRent"):
                brid = brrent.get("ID")
                begmo = int(brrent.find("BeginIn").text)
                if brrent.find("Duration").text is not None:
                    duration = int(brrent.find("Duration").text)
                else:
                    duration = 0
                brentamt = brrent.find("Rent").text
                brper = brrent.find("Period").text
                perst = dstart + rd.relativedelta(months=begmo-1)
                perend = perst + rd.relativedelta(months=duration-1)
                billmocount = begmo
                while billmocount < duration:
                    monthnum = billmocount
                    billmocount += 1
                billmo = perst
                while billmo < perend:
                    billper = billmo
                    billmo += rd.relativedelta(months=1)
                if dealid == "706880":
                    print(dealid, dtid, brid, begmo, dstart, dterm, darea, brentamt, brper, duration, perst, perend, \
                          monthnum, billper)
The results I'm getting look like this:
706880 4278580 45937180 1 2018-01-01 00:00:00 60 6200 15.0 rsf/year 36 2018-01-01 00:00:00 2020-12-01 00:00:00 35 2020-11-01 00:00:00
706880 4278580 45937181 37 2018-01-01 00:00:00 60 6200 18.0 rsf/year 24 2021-01-01 00:00:00 2022-12-01 00:00:00 35 2022-11-01 00:00:00
The problem I was running into was simply the indentation of the print statement. By indenting the following lines one level further, I was able to get the expected results:
                    if dealid == "706880":
                        print(dealid, dtid, brid, begmo, dstart, dterm, darea, brentamt, brper, duration, perst, perend, \
                              monthnum, billper)
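For reference, the one-row-per-billing-month schedule can also be sketched without the two while loops. The function below is illustrative only: monthly_schedule and the (begin_in, duration, rent) tuple layout are assumptions standing in for the values parsed from the XML, and the sample figures are the ones shown in the printed results above.

```python
from datetime import date
from dateutil.relativedelta import relativedelta

def monthly_schedule(commencement, base_rents):
    """Hypothetical helper: one (month number, billing date, rent) row per month."""
    rows = []
    for begin_in, duration, rent in base_rents:
        # First billing date of this rent step.
        period_start = commencement + relativedelta(months=begin_in - 1)
        for k in range(duration):
            rows.append((begin_in + k, period_start + relativedelta(months=k), rent))
    return rows

# Two rent steps as in the sample output: 36 months at 15.0, then 24 months at 18.0.
rows = monthly_schedule(date(2018, 1, 1), [(1, 36, 15.0), (37, 24, 18.0)])

for month_number, billing_date, rent in rows[:3]:
    print(month_number, billing_date.isoformat(), rent)
```

Using range(duration) sidesteps the off-by-one risk of comparing against an end date, and each tuple maps directly onto a row for the SQL insert.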
