Get days of the week on a series with DateTime Index

I am sure this is simple but...
I have timestamped data which I convert to a data frame. From there I resample the data I am interested in using the timestamp column. This part works fine, and the resample() function gives me a series which has the datetime objects as its index.
fiveminbins = mydata['measurement'].resample('5min').sum()
giving
Date
2019-04-05 04:55:00+01:00 160
2019-04-05 05:00:00+01:00 0
2019-04-05 05:05:00+01:00 0
2019-04-05 05:10:00+01:00 0
I now want to add days of the week, but for the life of me I can't get methods using either DateTime.dayofweek or dt.dayofweek to work on the index of this series. From the examples I've seen online I should be able to use
fiveminbins.dt.dayofweek
but this returns
AttributeError: Can only use .dt accessor with datetimelike values
I've tried calling it on the index of the series specifically
fiveminbins.index.dt.dayofweek
AttributeError: 'DatetimeIndex' object has no attribute 'dt'
So I tried using DatetimeIndex.dayofweek
fiveminbins.DatetimeIndex.dayofweek
'Series' object has no attribute 'DatetimeIndex'
and then on the index specifically
fiveminbins.index.DatetimeIndex.dayofweek
AttributeError: 'DatetimeIndex' object has no attribute 'DatetimeIndex'
and so now I'm lost doing something I've done several times before, just not on the index of a column...

You could reset the index and convert the resulting column to datetime; the .dt accessor should then work on that column:
fiveminbins = fiveminbins.reset_index()
fiveminbins.Date = pd.to_datetime(fiveminbins.Date)
print(fiveminbins.Date.dt.dayofweek)
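As a possible alternative that avoids resetting the index: a DatetimeIndex exposes the datetime properties directly, without the .dt accessor (which only exists on Series). The sketch below rebuilds a small series shaped like the one in the question; the values, the timezone-naive index, and the column names dayofweek and day_name are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the resampled series from the question
# (timezone left out to keep the sketch minimal)
idx = pd.date_range("2019-04-05 04:55", periods=4, freq="5min")
fiveminbins = pd.Series([160, 0, 0, 0], index=idx, name="measurement")

# Properties live directly on the DatetimeIndex, no .dt needed
weekdays = fiveminbins.index.dayofweek   # Monday=0 ... Sunday=6
names = fiveminbins.index.day_name()

# Attach them next to the measurements
result = fiveminbins.to_frame()
result["dayofweek"] = weekdays
result["day_name"] = names
print(result)
```

The .dt accessor is only needed once the dates live in a column (a Series) rather than in the index.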

Related

Convert all dates from a data frame column python

I have a CSV file with a column holding the date people got vaccinated, stored as a string in 'YYYY-MM-DD' format. My goal is to add X days to each date, with X depending on the vaccine the person received. To add days to a date, I have to convert the string to an ISO date, so I need to loop over each element in that column and convert those dates. I'm fairly new to Python and I'm not sure how to deal with this.
So I read the file and created a data frame with pandas, then tried the approach shown in the image:
[Image: df column content and the attempted for loop]
I don't know why I'm getting this error; I tried different ways to deal with it but can't figure it out.
Thanks
This is because the values are of type 'str', and 'str' does not have a 'fromisoformat' method. I would recommend converting the values to 'datetime' instead of 'str', so that you can do any date calculation you need, such as adding X days to a specific date.
You can convert the values from 'str' to 'datetime' and then add the days as follows:
import pandas as pd
import datetime
df_reduzido['vacina_dataAplicacao'] = pd.to_datetime(df_reduzido['vacina_dataAplicacao'] , format='%Y-%m-%d')
df_reduzido['vacina_dataAplicacao'] = df_reduzido['vacina_dataAplicacao'] + datetime.timedelta(days=3)
print(df_reduzido['vacina_dataAplicacao']) # 3 days added
You can study how to deal with datetime in detail here: https://docs.python.org/3/library/datetime.html
Thanks for your help, Sangkeun. Just want to point out that, for some reason, Python was returning an error for me: "'AttributeError: type object 'datetime.datetime' has no attribute 'datetime'".
I then found a solution by importing
import datetime
from datetime import timedelta, date, datetime
and then using " + timedelta() ", like this:
df_reduzido['vacina_dataAplicacao'] = ( pd.to_datetime(df_reduzido['vacina_dataAplicacao'] , format='%Y-%m-%d', utc=False) + timedelta(days=10) ).dt.date
At the end, I added .dt.date to get rid of the time component that pd.to_datetime() produces. Note that I tried setting utc=False hoping that it would do the job, but nothing changed. Anyway,
I'm grateful for your help.
Problem solved.
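Since the question asks for a different number of days per vaccine, one way to do that without a loop is to map each vaccine to an interval and add the two columns element-wise. This is only a sketch: the vacina_nome column, the proxima_dose column, the dias_por_vacina lookup, and all values are hypothetical, made up for illustration.

```python
import pandas as pd

# Hypothetical data: the date column name mirrors the thread, the rest is invented
df_reduzido = pd.DataFrame({
    "vacina_nome": ["A", "B", "A"],
    "vacina_dataAplicacao": ["2021-03-01", "2021-03-15", "2021-12-30"],
})

# Hypothetical lookup: days to add for each vaccine
dias_por_vacina = {"A": 28, "B": 90}

# Parse the strings once, for the whole column
datas = pd.to_datetime(df_reduzido["vacina_dataAplicacao"], format="%Y-%m-%d")

# Turn the per-row day counts into timedeltas
intervalo = pd.to_timedelta(df_reduzido["vacina_nome"].map(dias_por_vacina), unit="D")

# Element-wise addition; .dt.date drops the time-of-day part
df_reduzido["proxima_dose"] = (datas + intervalo).dt.date
print(df_reduzido)
```

pd.to_timedelta is used here instead of datetime.timedelta so that each row can carry its own interval.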

Python lambda function error when used with brackets

I have a pandas data frame with a 'Datetime' column containing timestamp information. I want to extract the hour from the 'Datetime' column and add it to the hour column of the data frame.
I am confused, since my code works if I write the lambda function without the parentheses
df['Datetime'].apply(lambda x: x.hour)
but, when I try this code instead
df['Datetime'].apply(lambda x: x.hour())
I get the error "TypeError: 'int' object is not callable".
On the other hand, when I use the split function in a lambda expression, it works completely fine with the parentheses:
df['Reasons'] = df['title'].apply(lambda x: x.split(':')[0])
As mentioned by @Dani Mesejo, hour is an attribute of the datetime object, not a method, which is why it works without parentheses. Once you add them, Python tries to call the integer that x.hour returns as if it were a function, and so you get that error. split, by contrast, is a method and has to be called.
You can read more about the datetime object in its documentation.
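The difference can be checked directly with callable(). The sketch below uses a small made-up data frame (the timestamps and titles are hypothetical) and also shows the vectorized .dt and .str accessors as a possible replacement for the apply/lambda versions.

```python
import pandas as pd

# Hypothetical data shaped like the frame in the question
df = pd.DataFrame({
    "Datetime": pd.to_datetime(["2021-01-01 08:30:00", "2021-01-01 17:45:00"]),
    "title": ["EMS: BACK PAIN", "Fire: GAS LEAK"],
})

ts = df["Datetime"].iloc[0]
print(callable(ts.hour))       # hour is a plain int attribute
print(callable("a:b".split))   # split is a method, so it needs ()

# Vectorized equivalents of the two lambdas
df["hour"] = df["Datetime"].dt.hour
df["Reasons"] = df["title"].str.split(":").str[0]
print(df)
```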

Python: how can I get the mode from a month column that i extracted from a datetime column?

I'm new at this! Doing my first Python project. :)
My tasks are:
convert df['Start Time'] from string to datetime
create a month column from df['Start Time']
get the mode of that month.
I used a few different ways to do all 3 of the steps, but trying to get the mode always returns TypeError: tuple indices must be integers or slices, not str. This happens even if I try converting the "tuple" into a list or NumPy array.
Ways I tried to extract month from Start Time:
df['extracted_month'] = pd.DatetimeIndex(df['Start Time']).month
df['extracted_month'] = np.asarray(df['extracted_month'])
df['extracted_month'] = df['Start Time'].dt.month
Ways I've tried to get the mode:
print(df['extracted_month'].mode())
print(df['extracted_month'].mode()[0])
print(stat.mode(df['extracted_month']))
Trying to get the index with df.columns.get_loc("extracted_month") then replacing it in the mode code gives me the SAME error (TypeError: tuple indices must be integers or slices, not str).
I think I should convert df['extracted_month'] into a different... something. What is it?
Note: My extracted_month column is a STRING, but you should still be able to get the mode from a string variable! I'm not changing it, that would be giving up.
Edit: using the following code still results in the same error
extracted_month = pd.Index(df['extracted_month'])
print(extracted_month.value_counts())
The error is likely caused by the way you are creating your dataframe.
If the dataframe is created in another function, and that function returns other things along with the dataframe, but you assign it to the variable df, then df will be a tuple that contains the actual dataframe, and not the dataframe itself.
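A minimal reproduction of that situation, assuming a loader that returns the data frame together with something else. The load_data function and its sample rows are hypothetical; only the column names come from the question.

```python
import pandas as pd

def load_data():
    """Hypothetical loader that returns a DataFrame plus an extra value."""
    frame = pd.DataFrame({
        "Start Time": ["2017-01-05 10:00:00", "2017-06-11 09:30:00", "2017-06-20 18:00:00"]
    })
    return frame, "extra info"

df = load_data()      # df is the whole tuple, not the DataFrame
print(type(df))
try:
    df["Start Time"]
except TypeError as err:
    print(err)        # the same TypeError as in the question

df, _ = load_data()   # unpack so that df is the DataFrame itself
df["Start Time"] = pd.to_datetime(df["Start Time"])
df["extracted_month"] = df["Start Time"].dt.month
most_common_month = df["extracted_month"].mode()[0]
print(most_common_month)
```

Checking type(df) right after the assignment is a quick way to confirm whether this is the cause.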

Convert a pandas Timestamp list

In my variable 'Datelist3' there is a pandas Timestamp list, in the following format:
[Timestamp('2019-12-04 09:00:00+0100', tz='Europe/Rome'), Timestamp('2019-12-04 09:30:00+0100', tz='Europe/Rome'), ....]
I'm having difficulty converting this list to a datetime string list, in this format:
['2019-12-04 09:00:00', '2019-12-04 09:30:00', .....]
I did these tests:
Datelist3.to_datetime # -> error: 'list' object has no attribute 'to_datetime'
Datelist3.dt.to_datetime # -> error: 'list' object has no attribute 'dt'
Datelist3.to_pydatetime() # -> error: 'list' object has no attribute 'to_pydatetime()'
Datelist3.dt.to_pydatetime() # -> error: 'list' object has no attribute 'dt'
I got to the variable 'Datelist3' with the following statement:
Datelist3 = quoteIntraPlot.index.tolist()
If I change this instruction to:
Datelist3 = quoteIntraPlot.index.strftime("%Y-%m-%d %H:%M:%S").tolist()
I get exactly what I want to achieve.
The problem is that it works roughly 6-7 times out of 10, and the other 3-4 times it gives me an error: " 'Index' object has no 'strftime' ". It's very strange. How can I solve this problem?
If your data is well formed, this would work:
time_list = [Timestamp('2019-12-04 09:00:00+0100', tz='Europe/Rome'), Timestamp('2019-12-04 09:30:00+0100', tz='Europe/Rome'), ....]
str_list = [t.strftime("%Y-%m-%d %H:%M:%S") for t in time_list]
However, if you get the same error as before, it means that not all of your index values are timestamps. In that case, you need to clean your data first.
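One defensive pattern, offered as a sketch rather than a diagnosis of the asker's data: check whether the index really is a DatetimeIndex and convert it before formatting. The frame below is hypothetical and deliberately built with a string index to reproduce the plain Index case.

```python
import pandas as pd

# Hypothetical frame whose index holds strings instead of timestamps
quoteIntraPlot = pd.DataFrame(
    {"close": [1.0, 2.0]},
    index=["2019-12-04 09:00:00+01:00", "2019-12-04 09:30:00+01:00"],
)
print(type(quoteIntraPlot.index))  # a plain Index has no .strftime

index = quoteIntraPlot.index
if not isinstance(index, pd.DatetimeIndex):
    # Convert only when needed; raises if a value cannot be parsed
    index = pd.to_datetime(index)

Datelist3 = index.strftime("%Y-%m-%d %H:%M:%S").tolist()
print(Datelist3)
```

If pd.to_datetime raises here, that points to the malformed entries that need cleaning.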

AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'

I am trying to split a dataframe into equal-sized samples and apply a function to calculate a value for each sample. If any sample's value is greater than 0.3, I want to save the filename in the result dataframe.
df=pd.DataFrame({'Value':[-0.016,-0.006,0.003,-0.011,-0.036,-0.031,-0.014,-0.006,-0.01 ,-0.009,0.004,0.001,-0.012,-0.021,-0.008,0.001,-0.011,-0.01,-0.006,0.002,0.004],'Nmae':[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]})
x=pd.DataFrame([x.values.sqrt(np.mean(df2['Value']**2)) for x in np.array_split(df2, (len(df2)/10))])
I am getting this error
AttributeError: 'numpy.ndarray' object has no attribute 'sqrt'
Does anyone have a more effective way to do this task?
This is a working version of your code (sqrt is a NumPy function, np.sqrt, not a method of the array):
res = [np.sqrt(np.mean(x.Value ** 2)) for x in np.array_split(df, len(df) // 10)]
An alternative way of approaching this with pandas would be to define a new column 'Split_variable' and group by it to apply your calculation:
df.groupby('Split_variable')['Value'].apply(lambda x: np.sqrt(np.mean((x**2))))
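The answer does not say how 'Split_variable' is built. One possible construction, assumed here, is to number the rows and integer-divide by the chunk size, so every block of 10 consecutive rows shares a key. The data is the 'Value' column from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Value": [
    -0.016, -0.006, 0.003, -0.011, -0.036, -0.031, -0.014, -0.006, -0.01, -0.009,
    0.004, 0.001, -0.012, -0.021, -0.008, 0.001, -0.011, -0.01, -0.006, 0.002, 0.004,
]})

# Assumed grouping key: rows 0-9 -> 0, rows 10-19 -> 1, row 20 -> 2
df["Split_variable"] = np.arange(len(df)) // 10

# Root mean square of 'Value' within each group
rms = df.groupby("Split_variable")["Value"].apply(lambda x: np.sqrt(np.mean(x ** 2)))
print(rms)
```

Note that this grouping differs from np.array_split: with 21 rows it yields groups of 10, 10 and 1, whereas array_split into two sections yields 11 and 10.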
