Python lambda function error when used with brackets - python-3.x

I have a pandas DataFrame with a 'Datetime' column containing timestamps. I want to extract the hour from the 'Datetime' column and store it in an 'hour' column of the DataFrame.
I am confused because my code works if I write the lambda function without parentheses:
df['Datetime'].apply(lambda x: x.hour)
But when I try this instead:
df['Datetime'].apply(lambda x: x.hour())
I get the error "TypeError: 'int' object is not callable".
On the other hand, when I use the split method in a lambda expression, it works fine with the parentheses:
df['Reasons'] = df['title'].apply(lambda x: x.split(':')[0])

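As @Dani Mesejo mentioned, hour is an attribute of the datetime object, not a method, which is why it works without parentheses. With parentheses, Python first evaluates x.hour (an int) and then tries to call that int, which raises TypeError: 'int' object is not callable. split, on the other hand, is a method, so it has to be called with parentheses.
You can read more about the datetime object in its documentation.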
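A small sketch of the difference, using made-up timestamps (the sample data and the 'hour_dt' column name are assumptions for illustration only). It also shows the vectorized .dt.hour accessor, which should give the same values as the apply call without a Python-level lambda:

```python
import pandas as pd

# Hypothetical sample frame with a datetime64 'Datetime' column
df = pd.DataFrame({"Datetime": pd.to_datetime(["2021-03-01 08:15", "2021-03-01 17:45"])})

# hour is an attribute of each Timestamp, so it is accessed without parentheses
df["hour"] = df["Datetime"].apply(lambda x: x.hour)

# Vectorized alternative: the .dt accessor exposes the same attribute for the whole column
df["hour_dt"] = df["Datetime"].dt.hour

# split is a method, so it must be called with parentheses
first_part = "Reason: some text".split(":")[0]

ts = df["Datetime"].iloc[0]
try:
    ts.hour()  # ts.hour is the int 8, and an int cannot be called
except TypeError as exc:
    error_message = str(exc)

print(df)
print(first_part)     # Reason
print(error_message)  # 'int' object is not callable
```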

Related

Convert all dates from a data frame column python

I have a CSV file with a column containing the date people got vaccinated, stored as a 'YYYY-MM-DD' string. My goal is to add X days to each date, where X depends on the vaccine the person received. To add days to a date, I have to convert the string to an ISO date, so I need to loop over each element in that column and convert it. I'm new to Python and I'm not sure how to handle this.
So I read the file into a data frame with pandas, then tried the following, as shown in the image:
(Image: the DataFrame column contents and the for-loop attempt)
I don't know why I'm getting this error. I tried different ways to deal with it but can't figure it out.
Thanks
This is because the values are of type 'str', and 'str' does not have a 'fromisoformat' method. I would recommend converting the values from 'str' to 'datetime', so that you can do whatever date calculations you need, such as adding X days to a specific date.
You can convert the values from 'str' to 'datetime' and then add the days as follows:
import pandas as pd
import datetime
# Parse the 'YYYY-MM-DD' strings into datetime64 values
df_reduzido['vacina_dataAplicacao'] = pd.to_datetime(df_reduzido['vacina_dataAplicacao'], format='%Y-%m-%d')
# timedelta lives in the datetime module, not on the datetime.datetime class
df_reduzido['vacina_dataAplicacao'] = df_reduzido['vacina_dataAplicacao'] + datetime.timedelta(days=3)
print(df_reduzido['vacina_dataAplicacao'])  # 3 days added
You can study how to deal with datetime in detail here: https://docs.python.org/3/library/datetime.html
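The question asks for X days where X depends on the vaccine, which a single fixed timedelta does not cover. A minimal sketch, assuming a hypothetical 'vacina_nome' column, a hypothetical days-per-vaccine mapping and a hypothetical 'proxima_dose' output column (none of these come from the original question): look up X per row, turn it into a timedelta with pd.to_timedelta, and add it element-wise, with no explicit loop.

```python
import pandas as pd

# Hypothetical sample: 'vacina_nome' and the day counts are placeholders,
# not values from the original question
df_reduzido = pd.DataFrame({
    "vacina_dataAplicacao": ["2021-05-01", "2021-05-10"],
    "vacina_nome": ["A", "B"],
})
days_by_vaccine = {"A": 21, "B": 90}  # hypothetical days to add per vaccine

# Parse the whole column at once; no per-element loop is needed
df_reduzido["vacina_dataAplicacao"] = pd.to_datetime(
    df_reduzido["vacina_dataAplicacao"], format="%Y-%m-%d"
)

# Look up X for each row, convert it to a timedelta, and add it element-wise
offsets = pd.to_timedelta(df_reduzido["vacina_nome"].map(days_by_vaccine), unit="D")
df_reduzido["proxima_dose"] = df_reduzido["vacina_dataAplicacao"] + offsets

print(df_reduzido)
```

A vaccine name missing from the mapping becomes NaN after map, so the corresponding result is NaT rather than an error, which is worth checking for.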
Thanks for your help, Sangkeun. I just want to point out that, for some reason, Python was returning the error "AttributeError: type object 'datetime.datetime' has no attribute 'datetime'".
Then I found a solution by importing:
import datetime
from datetime import timedelta, date, datetime
Then using " + timedelta() ", like this:
df_reduzido['vacina_dataAplicacao'] = ( pd.to_datetime(df_reduzido['vacina_dataAplicacao'] , format='%Y-%m-%d', utc=False) + timedelta(days=10) ).dt.date
At the end, i set ().dt.date in order to rid off the time from pd.to_datetime(). Look that i tryed to set utc=False hoping that this would do the job but nothing happened. Anyway,
i'm grateful for your help.
Problem solved.
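Some context on the time component: utc=False is already the default of pd.to_datetime, so it does not strip anything; a datetime64 value always carries a time, which is midnight for date-only strings. The sketch below (hypothetical sample dates) contrasts .dt.date, which returns Python datetime.date objects in an object-dtype Series, with .dt.normalize(), which keeps a datetime64 dtype and just sets the time to midnight. The object-dtype result no longer supports the .dt accessor, which may matter for later date arithmetic.

```python
from datetime import timedelta

import pandas as pd

s = pd.Series(["2021-05-01", "2021-05-10"])  # hypothetical sample dates
shifted = pd.to_datetime(s, format="%Y-%m-%d") + timedelta(days=10)

# .dt.date gives Python datetime.date objects (object dtype)
as_dates = shifted.dt.date

# .dt.normalize() keeps a datetime64 dtype, with the time set to midnight
as_midnight = shifted.dt.normalize()

print(as_dates.iloc[0])  # 2021-05-11
print(as_midnight.iloc[0])  # 2021-05-11 00:00:00
```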

Python: how can I get the mode from a month column that I extracted from a datetime column?

I'm new at this! Doing my first Python project. :)
My tasks are:
convert df['Start Time'] from string to datetime
create a month column from df['Start Time']
get the mode of that month.
I tried a few different ways to do all 3 steps, but getting the mode always raises TypeError: tuple indices must be integers or slices, not str. This happens even if I try converting the "tuple" into a list or a NumPy array.
Ways I tried to extract month from Start Time:
df['extracted_month'] = pd.DatetimeIndex(df['Start Time']).month
df['extracted_month'] = np.asarray(df['extracted_month'])
df['extracted_month'] = df['Start Time'].dt.month
Ways I've tried to get the mode:
print(df['extracted_month'].mode())
print(df['extracted_month'].mode()[0])
print(stat.mode(df['extracted_month']))
Trying to get the index with df.columns.get_loc("extracted_month") then replacing it in the mode code gives me the SAME error (TypeError: tuple indices must be integers or slices, not str).
I think I should convert df['extracted_month'] into a different... something. What is it?
Note: My extracted_month column is a STRING, but you should still be able to get the mode from a string variable! I'm not changing it, that would be giving up.
Edit: using the following code still results in the same error:
extracted_month = pd.Index(df['extracted_month'])
print(extracted_month.value_counts())
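The error is likely caused by the way you are creating your DataFrame.
If the DataFrame is created in another function, and that function returns other values along with it (for example return df, city), but you assign the whole return value to the variable df, then df is a tuple that contains the actual DataFrame, not the DataFrame itself. Indexing that tuple with a string, as in df['extracted_month'], raises exactly this TypeError, so unpack the returned values instead.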
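A minimal reproduction of that situation, assuming a hypothetical load_data() function and made-up 'Start Time' values: indexing the returned tuple by column name raises the same TypeError, while unpacking the return value gives a real DataFrame on which mode() works, including on a string column.

```python
import pandas as pd

def load_data():
    """Hypothetical loader that returns the DataFrame plus an extra value."""
    df = pd.DataFrame({"Start Time": ["2017-01-01 09:07:57",
                                      "2017-06-23 15:09:32",
                                      "2017-06-10 08:00:00"]})
    return df, "chicago"

result = load_data()
try:
    result["Start Time"]  # result is a tuple, not a DataFrame
except TypeError as exc:
    error_message = str(exc)  # tuple indices must be integers or slices, not str

# Unpack the tuple so that df is the DataFrame itself
df, city = load_data()
df["Start Time"] = pd.to_datetime(df["Start Time"])
df["extracted_month"] = df["Start Time"].dt.month
print(df["extracted_month"].mode()[0])  # 6

# mode() works on string columns too
df["month_name"] = df["Start Time"].dt.month_name()
print(df["month_name"].mode()[0])  # June
```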

Get days of the week on a series with DateTime Index

I am sure this is simple but...
I have timestamped data which I convert to a data frame. From there I resample the data I am interested in using the timestamp column. This part works fine, and resample() gives me a series with the datetime as the index.
fiveminbins = mydata['measurement'].resample('5min').sum()
giving
Date
2019-04-05 04:55:00+01:00 160
2019-04-05 05:00:00+01:00 0
2019-04-05 05:05:00+01:00 0
2019-04-05 05:10:00+01:00 0
I now want to add days of the week, but for the life of me I can't get methods using either DateTime.dayofweek or dt.dayofweek to work on the index of this series. From the examples I've seen online I should be able to use
fiveminbins.dt.dayofweek
but that returns
AttributeError: Can only use .dt accessor with datetimelike values
I've tried calling it on the index of the series specifically
fiveminbins.index.dt.dayofweek
AttributeError: 'DatetimeIndex' object has no attribute 'dt'
So I tried using DatetimeIndex.dayofweek
fiveminbins.DatetimeIndex.dayofweek
'Series' object has no attribute 'DatetimeIndex'
and then on the index specifically
fiveminbins.index.DatetimeIndex.dayofweek
AttributeError: 'DatetimeIndex' object has no attribute 'DatetimeIndex'
and so now I'm lost doing something I've done several times before, just not on the index of a column...
You could reset the index, make sure the resulting Date column is a datetime, and then use the .dt accessor on it:
fiveminbins = fiveminbins.reset_index()
fiveminbins.Date = pd.to_datetime(fiveminbins.Date)
print(fiveminbins.Date.dt.dayofweek)
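Without resetting the index, the weekday can also be read straight off the index: .dt is an accessor for a Series whose values are datetimes (here the values are integer sums, hence the first error), while a DatetimeIndex exposes the same fields directly as properties. A sketch with a hypothetical stand-in for mydata; it uses a timezone-naive index for simplicity, and the same properties exist on a tz-aware index like the one in the question.

```python
import pandas as pd

# Hypothetical stand-in for mydata: a DatetimeIndex named 'Date'
idx = pd.date_range("2019-04-05 04:55", periods=4, freq="5min", name="Date")
mydata = pd.DataFrame({"measurement": [160, 0, 0, 0]}, index=idx)

fiveminbins = mydata["measurement"].resample("5min").sum()

# A DatetimeIndex exposes datetime fields directly, so no .dt is needed
print(fiveminbins.index.dayofweek)

# To keep the weekday next to each bin, convert to a DataFrame and add columns
binned = fiveminbins.to_frame()
binned["dayofweek"] = fiveminbins.index.dayofweek  # Monday=0 ... Sunday=6
binned["day_name"] = fiveminbins.index.day_name()
print(binned)
```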

replacing a special character in a pandas dataframe

I have a dataset that uses '?' instead of 'NaN' for missing values. I could go through each column using replace, but the problem is that I have 22 columns. I am trying to write a loop to do it efficiently, but I am getting it wrong. Here is what I am doing:
for col in adult.columns:
    if adult[col]=='?':
        adult[col]=adult[col].str.replace('?', 'NaN')
The plan is to get 'NaN' values and then fill them with fillna or drop them with dropna. The second problem is that not all the columns are categorical, so the .str accessor is also wrong. How can I easily deal with this situation?
If you're reading the data from a .csv or .xlsx file you can use the na_values parameter:
adult = pd.read_csv('path/to/file.csv', na_values=['?'])
Otherwise, do what @MasonCaiby said and use adult.replace('?', float('nan')), which works on every column at once, so no loop is needed.
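A sketch of both options on a small hypothetical CSV (the column names and rows are made up for illustration). Reading with na_values also lets pandas parse the affected numeric columns as float64, whereas replace after reading leaves those columns as strings, so pd.to_numeric may still be needed on them.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV text standing in for the real file
csv_text = "age,workclass,hours\n39,State-gov,40\n50,?,13\n?,Private,?\n"

# Option 1: treat '?' as missing while reading
adult = pd.read_csv(io.StringIO(csv_text), na_values=["?"])
print(adult.dtypes)  # age and hours are float64 because of the NaNs

# Option 2: replace after reading; applies to every column at once, no loop
raw = pd.read_csv(io.StringIO(csv_text))
cleaned = raw.replace("?", np.nan)

print(adult.isna().sum().sum(), cleaned.isna().sum().sum())  # 3 3
print(adult.dropna())  # only the first row has no missing values
```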

How to use select() transformation in Apache Spark?

I am following the Intro to Spark course on edX. However, I can't understand a few things in the following lab assignment. FYI, I am not looking for a solution.
I don't understand why I am receiving the error
TypeError: 'Column' object is not callable
Here is the code:
from pyspark.sql.functions import regexp_replace, trim, col, lower
def removePunctuation(column):
    """
    Args:
        column (Column): A Column containing a sentence.
    """
    # The following line gives the error. I believe I am selecting all the rows from the dataframe 'column' where the attribute is named 'sentence'
    result = column.select('sentence')
    return result
sentenceDF = sqlContext.createDataFrame([('Hi, you!',),
                                         (' No under_score!',),
                                         (' * Remove punctuation then spaces * ',)], ['sentence'])
sentenceDF.show(truncate=False)
(sentenceDF
 .select(removePunctuation(col('sentence')))
 .show(truncate=False))
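Can you elaborate a little? TIA.
The column parameter is a Column (an expression), not a DataFrame object, and therefore does not have a select method. Because a Column treats unknown attribute names as references to nested fields, column.select evaluates to another Column, and calling it raises TypeError: 'Column' object is not callable. You'll need to use other functions that operate on Columns to solve this problem.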
Hint: Look at the import statement.
