Python: Convert time expressed in seconds to datetime for a series - python-3.x

I have a column of times expressed as seconds since Jan 1, 1990, that I need to convert to a DateTime. I can figure out how to do this for a constant (e.g. add 10 seconds), but not a series or column.
I eventually tried writing a loop to do this one row at a time. (Probably not the right way, and I'm new to python).
This code works for a single row:
from datetime import datetime, timedelta

def addSecs(secs):
    fulldate = datetime(1990, 1, 1)
    fulldate = fulldate + timedelta(seconds=secs)
    return fulldate

b = addSecs(intag112['outTags_1_2'].iloc[1])
print(b)
2018-06-20 01:05:13
Does anyone know an easy way to do this for a whole column in a dataframe?
I tried this:
for i in range(len(intag112)):
    intag112['TransactionTime'].iloc[i] = addSecs(intag112['outTags_1_2'].iloc[i])
but it errored out.

If you want to do something with a column (Series) in a DataFrame you can use the apply method, for example:
import datetime
# New column 'datetime' is created from old 'seconds'
# (using the 1990 epoch from the question; datetime.fromtimestamp would
#  assume the 1970 Unix epoch and give dates 20 years off)
df['datetime'] = df['seconds'].apply(
    lambda x: datetime.datetime(1990, 1, 1) + datetime.timedelta(seconds=x))
Check the documentation for more examples. Overall advice: try to think in terms of vectors (or Series) of values. Most operations in pandas can be done on an entire Series or even the whole DataFrame.
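Since the epoch here is 1 January 1990 rather than the Unix epoch, a fully vectorized alternative is to pass origin to pd.to_datetime instead of applying a function per element. A minimal sketch with hypothetical values (the last one reproduces the timestamp from the question):

```python
import pandas as pd

# Hypothetical seconds-since-1990 values; 898304713 corresponds to the
# question's example output of 2018-06-20 01:05:13
df = pd.DataFrame({'outTags_1_2': [0, 60, 898304713]})

# Interpret the whole column as second offsets from the 1990 epoch
df['TransactionTime'] = pd.to_datetime(df['outTags_1_2'], unit='s',
                                       origin='1990-01-01')
print(df['TransactionTime'].iloc[2])  # 2018-06-20 01:05:13
```

This avoids both the explicit loop and the per-row apply call.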

Related

How to Subtract a column from another column if a condition is met, otherwise subtract from a different column?

I'm working with trading data and Pandas. Given a 4-column OHLC pandas DataFrame that is 100 rows in length, I'm trying to calculate if an "Upper Shadow" exists or not for an individual row and store the result in its own column. To calculate if an "Upper Shadow" exists all you have to do is take the high (H) value of the row and subtract the open (O) value if the close (C) value is less than the open value. Otherwise, you have to subtract the close value.
Right now I'm naively doing this in a for loop where I iterate over each row with an if statement.
for index, row in df.iterrows():
    if row["close"] >= row["open"]:
        df.at[index, "upper_shadow"] = float(row["high"]) - float(row["close"])
    else:
        df.at[index, "upper_shadow"] = float(row["high"]) - float(row["open"])
Is there a better way to do this?
You can use np.maximum to calculate the maximum of close and open in a vectorized way:
import numpy as np
df['upper_shadow'] = df['high'] - np.maximum(df['close'], df['open'])
I think @Psidom's solution is what you are looking for. However, the following piece of code is another way of writing what you already have, using apply with a lambda:
df["upper_shadow"] = df.apply(
    lambda row: float(row["high"]) - float(row["close"])
    if row["close"] >= row["open"]
    else float(row["high"]) - float(row["open"]),
    axis=1)
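For a quick sanity check, the vectorized version can be run on a couple of hypothetical OHLC rows, one where the close is above the open and one where it is below:

```python
import numpy as np
import pandas as pd

# Hypothetical OHLC rows: row 0 closes up, row 1 closes down
df = pd.DataFrame({'open':  [10.0, 12.0],
                   'high':  [11.0, 13.0],
                   'low':   [ 9.5, 11.5],
                   'close': [10.5, 11.0]})

# Vectorized upper shadow: high minus whichever of close/open is larger
df['upper_shadow'] = df['high'] - np.maximum(df['close'], df['open'])
print(df['upper_shadow'].tolist())  # [0.5, 1.0]
```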

Identify and extract OHLC pattern on candlestick chart using plotly or pandas?

I'm using the Ameritrade API and pandas/plotly to chart a simple stock price on the minute scale, I'd like to use some of the properties of the produced chart to identify and extract a specific candlestick pattern.
Here I build my dataframe and plot it as a candlestick:
frame = pd.DataFrame({
    'open': pd.json_normalize(df, 'candles').open,
    'high': pd.json_normalize(df, 'candles').high,
    'low': pd.json_normalize(df, 'candles').low,
    'close': pd.json_normalize(df, 'candles').close,
    'datetime': pd.DatetimeIndex(
        pd.to_datetime(pd.json_normalize(df, 'candles').datetime, unit='ms')
    ).tz_localize('UTC').tz_convert('US/Eastern')})

fig = go.Figure(data=[go.Candlestick(x=frame['datetime'],
                                     open=frame['open'],
                                     high=frame['high'],
                                     low=frame['low'],
                                     close=frame['close'])])
fig.update_layout(xaxis_rangeslider_visible=False)
fig.show()
The pattern I'm searching for is simply the very first set in each day's trading of four consecutive red candles.
A red candle can be defined as:
close < open & close < prev.close
So in this case, I don't have access to prev.close for the very first minute of trading because I don't have pre-market/extended hours data.
I'm wondering if it's even possible to access the plotly figure data, because if so, I could just extract the first set of four consecutive red candles, and their data - but if not, I would just define my pattern and extract it using pandas but haven't gotten that far yet.
Would this be easier to do using plotly or pandas, and what would a simple implementation look like?
Not sure about Candlestick, but in pandas you could try something like this. Note: I assume the data already has one row per business day and is sorted. First, create a column named red that is True where the condition for a red candle described in your question holds:
df['red'] = df['close'].lt(df['open']) & df['close'].lt(df['close'].shift())
Then you want to see if it happens 4 days in a row. Assuming the data is sorted ascending (as usual), the idea is to reverse the dataframe with [::-1], use rolling with a window of 4, sum the red column created above, and check where the sum equals 4.
df['next_4days_red'] = df[::-1].rolling(4)['red'].sum().eq(4)
Then, to get the days that begin a run of four consecutive red trading days, use loc:
df.loc[df['next_4days_red'], 'datetime'].tolist()
Here is a small example with dummy variables:
df = pd.DataFrame({'close': [10, 12, 11, 10, 9, 8, 7, 10, 9, 10],
                   'datetime': pd.bdate_range('2020-04-01', periods=10)})\
       .assign(open=lambda x: x['close'] + 0.5)
df['red'] = df['close'].lt(df['open']) & df['close'].lt(df['close'].shift())
df['next_4days_red'] = df[::-1].rolling(4)['red'].sum().eq(4)
print(df.loc[df['next_4days_red'], 'datetime'].tolist())
[Timestamp('2020-04-03 00:00:00'), Timestamp('2020-04-06 00:00:00')]
Note: it catches two successive dates because there is a run of five consecutive red days, which contains two overlapping runs of four; I'm not sure whether you want both dates in that case.
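Since the question asks for the first such run in each day's trading, one way to extend the approach above is to group the matches by calendar date and keep only the first per day. A sketch, assuming hypothetical minute bars in a datetime column:

```python
import pandas as pd

# Hypothetical minute bars for a single trading day
df = pd.DataFrame({
    'close': [10, 12, 11, 10, 9, 8, 7, 10, 9, 10],
    'datetime': pd.date_range('2020-04-01 09:30', periods=10, freq='min'),
}).assign(open=lambda x: x['close'] + 0.5)

# Red candle: close < open and close < previous close
df['red'] = df['close'].lt(df['open']) & df['close'].lt(df['close'].shift())

# True where this bar starts a run of four consecutive red bars
df['starts_4_red'] = df[::-1].rolling(4)['red'].sum().eq(4)

# Keep only the first matching bar of each calendar day
hits = df[df['starts_4_red']]
first_per_day = hits.groupby(hits['datetime'].dt.date).first()
print(first_per_day['datetime'].tolist())
```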

Send an email based on the date in a CSV column

I am looking to read data from a column in my CSV file.
All of the data in this column are dates. (DD/MM/YYYY).
I want my program to read the Dates column, and if the date is within 3 days of the current date, I want to add variables to all of the values in that row.
Ex.
Date,Name,LaterDate
1/1/19,John Smith, 2/21/19
If I run my program on 2/19/2019, I want an email sent that says "John Smith's case is closing on "2/21/2019".
I understand how to send an email. The parts I get stuck on are:
- Reading the CSV column specifically.
- Checking whether the date is within 3 days.
- Assigning variables to the values in that row.
- Using those variables to send a custom email.
I see a lot of "Use Pandas" but I might need the individual steps broken down.
Thank you.
First things first, read all the values of the CSV file and store them in a variable (old_df). Then save the dates in a Series (dates). Next, create an empty DataFrame with the same columns (new_df). From there, loop over each date in dates together with its index i: turn date into a datetime object using the datetime library, subtract it from the current date to get the number of days between them, and take the absolute value so the amount of days is always positive. If it is within 3 days, add the row at that index in old_df to new_df.
import pandas as pd
from datetime import datetime

old_df = pd.read_csv('example.csv')
dates = old_df['LaterDate']
new_df = pd.DataFrame(columns=['Date', 'Name', 'LaterDate'])

for i, date in enumerate(dates):
    date = datetime.strptime(date, '%m/%d/%y')
    days = (datetime.now() - date).days
    if abs(days) <= 3:
        # DataFrame.append is deprecated; concatenate the single row instead
        new_df = pd.concat([new_df, old_df.loc[[i], :]])

print(new_df)
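The same filtering can also be done without an explicit loop by parsing the whole column at once. A sketch, assuming the question's column layout (the fixed today value is only there to make the example reproducible):

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV contents matching the question's layout
csv_text = "Date,Name,LaterDate\n1/1/19,John Smith,2/21/19\n"
old_df = pd.read_csv(StringIO(csv_text))

# Parse the entire LaterDate column in one call
later = pd.to_datetime(old_df['LaterDate'], format='%m/%d/%y')

# Keep rows whose LaterDate falls within 3 days of "today"
today = pd.Timestamp('2019-02-19')  # fixed so the example is reproducible
new_df = old_df[(later - today).abs() <= pd.Timedelta(days=3)]
print(new_df['Name'].tolist())  # ['John Smith']
```

Each kept row can then be fed into the email step, for example via new_df.itertuples().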

Creating new column using for loop returns NaN value in Python/Pandas

I am using Python/Pandas to manipulate a data frame. I have a column 'month' (values from 1.0 to 12.0). Now I want to create another column 'quarter'. When I write -
for x in data['month']:
    print((x - 1) // 3 + 1)
I get proper output that is quarter number (1,2,3,4 etc).
But I am not being able to assign the output to the new column.
for x in data['month']:
    data['quarter'] = ((x - 1) // 3 + 1)
This creates the quarter column with missing or 'NaN' value -
My question is why I am getting missing value while creating the column ?
Note: I am using python 3.6 and Anaconda 1.7.0. 'data' is the data frame I am using. Initially I had only the date which I converted to month and year using
data['month'] = pd.DatetimeIndex(data['first_approval']).month
Interestingly this month column shows dtype: float64 . I have read somewhere "dtype('float64') is equivalent to None" but I didn't understand that statement clearly. Any suggestion or help will be highly appreciated.
The easiest way to get the quarter from the date would be:
data['quarter'] = pd.DatetimeIndex(data['date']).quarter
the same way you obtained the month information.
The line below sets the entire column to the value computed from the last x in the loop, because assigning a scalar to data['quarter'] overwrites the whole column on every iteration. (If that last month value is NaN, for example from a date that failed to parse, the entire column becomes NaN.)
data['quarter'] = ((x - 1) // 3 + 1)
Try this instead:
df['quarter'] = df['month'].apply(lambda x: (x - 1) // 3 + 1)
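Since the quarter formula is plain arithmetic, it can also run on the whole Series at once, with no loop or apply at all:

```python
import pandas as pd

# Hypothetical month column like the one in the question (float dtype)
df = pd.DataFrame({'month': [1.0, 3.0, 4.0, 7.0, 12.0]})

# Floor-divide the entire Series in one vectorized expression
df['quarter'] = ((df['month'] - 1) // 3 + 1).astype(int)
print(df['quarter'].tolist())  # [1, 1, 2, 3, 4]
```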

How can I calculate values in a Pandas dataframe based on another column in the same dataframe

I am attempting to create a new column of values in a Pandas dataframe that are calculated from another column in the same dataframe:
df['ema_ideal'] = df['Adj Close'].ewm(span=df['ideal_moving_average'], min_periods=0, ignore_na=True).mean()
However, I am receiving the error:
ValueError: The truth of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any(), or a.all().
If I have the span set to 30, or some integer, I do not receive this error. Also, ideal_moving_average is a column of float.
My two questions are:
Why exactly am I receiving the error?
How can I incorporate the column values from ideal_moving_average into the df['ema_ideal'] column (subquestion as I am new to Pandas - is this column a Series within the dataframe?)
Thanks for the help!
EDIT: Example showing Adj Close data, in bad formatting
Date Open High Low Close Adj Close
2017-01-03 225.039993 225.830002 223.880005 225.240005 222.073914
2017-01-04 225.619995 226.750000 225.610001 226.580002 223.395081
2017-01-05 226.270004 226.580002 225.479996 226.399994 223.217606
2017-01-06 226.529999 227.750000 225.899994 227.210007 224.016220
2017-01-09 226.910004 227.070007 226.419998 226.460007 223.276779
2017-01-10 226.479996 227.449997 226.009995 226.460007 223.276779
I think something like this will work for you:
df['ema_ideal'] = df.apply(
    lambda x: df['Adj Close']
        .ewm(span=x['ideal_moving_average'], min_periods=0, ignore_na=True)
        .mean()
        .loc[x.name],  # pick this row's value out of the full EMA series
    axis=1)
Providing axis=1 to DataFrame.apply allows you to access the data row-wise like you need; each row's span produces a full EMA series, and .loc[x.name] selects that row's value from it.
There's absolutely no issue with creating a dataframe column from another column of the same dataframe.
The error you're receiving is something different: it is raised when you try to combine Series with the logical operators and, or, not, etc.
In general, to avoid this error you must compare Series element-wise, using for example & instead of and, or ~ instead of not, or using numpy to do element-wise comparisons.
Here, the issue is that you're trying to use a Series as the span of your EMA, and pandas' ewm function only accepts a scalar span.
You could, for example, calculate the EMA for each possible span, and then regroup the per-row values into a Series that you set as the ema_ideal column of your dataframe.
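That per-span approach can be sketched as follows; the column names follow the question, and the numbers are hypothetical:

```python
import pandas as pd

# Hypothetical prices plus a per-row desired EMA span
df = pd.DataFrame({'Adj Close': [222.07, 223.40, 223.22,
                                 224.02, 223.28, 223.28],
                   'ideal_moving_average': [2, 2, 3, 3, 2, 3]})

# One full EMA series per distinct span appearing in the column
emas = {span: df['Adj Close'].ewm(span=span, min_periods=0,
                                  ignore_na=True).mean()
        for span in df['ideal_moving_average'].unique()}

# For each row, pick the value from the EMA built with that row's span
df['ema_ideal'] = [emas[span].iloc[i]
                   for i, span in enumerate(df['ideal_moving_average'])]
print(df['ema_ideal'].round(2).tolist())
```

Computing each distinct span only once keeps this cheaper than recomputing a full EMA per row inside apply.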
For anyone wondering, the problem was that span could not take multiple values, which was happening when I tried to pass df['ideal_moving_average'] into it. Instead, I used the below code, which seemed to go line by line passing the value for that row into span.
df['30ema'] = df['Adj Close'].ewm(span=df.iloc[-1]['ideal_ma'], min_periods=0, ignore_na=True).mean()
EDIT: I will accept this as correct for now, until someone shows that it doesn't work or can create something better, thanks for the help.
