Very Basic Python 3.6 Plotting Issue - python-3.x

So I have a rather easy question re: some plotting issues. I don't have the greatest level of Python knowledge; it's been a few months since I last looked at it, and I can't find anything that would help me here.
I have the following data frame:
Date Open High Low Close Adj Close Volume
0 11/01/2018 86.360001 87.370003 85.930000 86.930000 86.930000 143660001
1 10/01/2018 87.000000 87.190002 85.980003 86.080002 86.080002 108223002
This isn't all of the data; there's 3000+ rows of it.
QUESTION: I'm trying to plot Adj Close vs. Date. However, because of the index column, which I don't actually want, I end up with a plot of Adj Close vs. the index, which is obviously no use.
I've used:
bp['Adj Close'].plot(label='BP',figsize=(16,8),title='Adjusted Closing Price')
So really it's a case of: where do I put the ['Date'] part in the code, so the index column isn't used?
Many thanks for any help.

You first need to convert the column with to_datetime:
bp['Date'] = pd.to_datetime(bp['Date'])
and then use the x and y parameters of DataFrame.plot:
bp.plot(x='Date', y='Adj Close', label='BP',figsize=(16,8),title='Adjusted Closing Price')
Or set the Date column as the index with set_index and then use Series.plot:
bp.set_index('Date')['Adj Close'].plot(label='BP',figsize=(16,8),title='Adjusted Closing Price')
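Putting it together, a minimal end-to-end sketch (the CSV file name is hypothetical, and dayfirst=True is an assumption based on the day-first-looking sample dates):
import pandas as pd
import matplotlib.pyplot as plt

bp = pd.read_csv('BP.csv')  # hypothetical file name; columns as shown in the question
bp['Date'] = pd.to_datetime(bp['Date'], dayfirst=True)  # assumes 11/01/2018 means 11 Jan 2018
bp.plot(x='Date', y='Adj Close', label='BP', figsize=(16, 8), title='Adjusted Closing Price')
plt.show()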

Related

Pandas: get first datetime-in and last datetime-out in one row

First of all, thanks in advance; there are always answers here, so we learn a lot from the experts. I'm a noob using pandas (it's super handy for what I've tried and achieved so far).
I have this data, handed to me like this (I don't have access to the origin), 20k rows or more sometimes. The 'in' and 'out' columns may have one or more entries per date, so an 'in' may be followed by either an 'out' or another 'in', which leaves me a blank cell; that's the problem (see first image).
I want to filter the first datetime-in into one column and the last datetime-out into another, with the two in one row (see second image); the data comes in a CSV file. I'm currently doing this particular work manually with LibreOffice Calc (yep).
So far I have tried locating and relocating, merging, grouping... nothing works for me, so I feel frustrated. Would you please lend me a hand? Here is a minimal sample of the file.
By the way, English is not my language. Thanks so much!
First:
out_column = df["out"].tolist()
This gives you all the out dates as a list; we will need that later.
in_column = df["in"].tolist()  # "in" is a reserved word in Python, so the list gets its own name
I treat NaT as NaN (null) in this case.
Now we have to find which rows to keep, which we do by going through the in column and keeping only the first row and the rows that follow a NaN:
filtered_df = []
tracker = False
for index, element in enumerate(in_column):
    if index == 0 or tracker is True:
        filtered_df.append(True)
        tracker = False
        continue
    if pd.isna(element):  # covers both NaN and NaT
        tracker = True
    filtered_df.append(False)
Then you filter your df by this Boolean List:
df = df[filtered_df]
Now you fix up your out column by removing the null values:
out_column = [x for x in out_column if pd.notna(x)]
Last but not least you overwrite your old out column with the new one:
df["out"] = out_column

Identify and extract OHLC pattern on candlestick chart using plotly or pandas?

I'm using the Ameritrade API and pandas/plotly to chart a simple stock price on the minute scale. I'd like to use some of the properties of the produced chart to identify and extract a specific candlestick pattern.
Here I build my dataframe and plot it as a candlestick:
frame = pd.DataFrame({'open': pd.json_normalize(df, 'candles').open,
                      'high': pd.json_normalize(df, 'candles').high,
                      'low': pd.json_normalize(df, 'candles').low,
                      'close': pd.json_normalize(df, 'candles').close,
                      'datetime': pd.DatetimeIndex(pd.to_datetime(pd.json_normalize(df, 'candles').datetime, unit='ms')).tz_localize('UTC').tz_convert('US/Eastern')})
fig = go.Figure(data=[go.Candlestick(x=frame['datetime'],
                                     open=frame['open'],
                                     high=frame['high'],
                                     low=frame['low'],
                                     close=frame['close'])])
fig.update_layout(xaxis_rangeslider_visible=False)
fig.show()
The plot: [minute-scale candlestick chart]
The pattern I'm searching for is simply the very first set in each day's trading of four consecutive red candles.
A red candle can be defined as:
close < open & close < prev.close
So in this case, I don't have access to prev.close for the very first minute of trading because I don't have pre-market/extended hours data.
I'm wondering if it's even possible to access the plotly figure data; if so, I could just extract the first set of four consecutive red candles and their data. If not, I would define my pattern and extract it using pandas, but I haven't gotten that far yet.
Would this be easier to do using plotly or pandas, and what would a simple implementation look like?
Not sure about Candlestick, but in pandas you could try something like this. Note: I assume the data already has one row per business day and is sorted. The first step is to create a column named red that is True where the red-candle condition described in your question holds:
df['red'] = df['close'].lt(df['open']) & df['close'].lt(df['close'].shift())
Then you want to see if it happens 4 days in a row. Assuming the data is sorted ascending (the usual case), the idea is to reverse the dataframe with [::-1], use rolling with a window of 4, sum the red column created just above, and check where the sum equals 4.
df['next_4days_red'] = df[::-1].rolling(4)['red'].sum().eq(4)
Then, if you want the days that begin 4 consecutive red trading days, use loc:
df.loc[df['next_4days_red'], 'datetime'].tolist()
Here is a little example with dummy variables:
df = pd.DataFrame({'close': [10, 12, 11, 10, 9, 8, 7, 10, 9, 10],
                   'datetime': pd.bdate_range('2020-04-01', periods=10)})\
       .assign(open=lambda x: x['close'] + 0.5)
df['red'] = df['close'].lt(df['open']) & df['close'].lt(df['close'].shift())
df['next_4days_red'] = df[::-1].rolling(4)['red'].sum().eq(4)
print (df.loc[df['next_4days_red'], 'datetime'].tolist())
[Timestamp('2020-04-03 00:00:00'), Timestamp('2020-04-06 00:00:00')]
Note: it catches two successive dates because there is a 5-day consecutive decrease; not sure whether you want both dates in that case.
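Since the original question asks for the very first such run in each day's trading, a small follow-up sketch (assuming datetime holds intraday timestamps, as in the minute-scale data) could keep only the earliest hit per calendar date:
starts = df.loc[df['next_4days_red'], 'datetime']
first_per_day = starts.groupby(starts.dt.date).first()  # earliest run start per date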

KeyError: (Timestamp('1993-01-29 00:00:00'), 'colName')

I am trying to create a new column on my stock market dataframe that was imported from Yahoo. I am dealing with just one symbol at the moment.
symbol['profit']= [[symbol.loc[ei, 'close1']-symbol.loc[ei, 'close']] if symbol[ei, 'shares']==1 else 0 for ei in symbol.index]
I am expecting to have a new column in the dataframe labeled 'profit', but instead I am getting this as an output:
KeyError: (Timestamp('1993-01-29 00:00:00'), 'shares')
I imported the CSV to a df with parse_dates=True and index_col='Date', setting the 'Date' column as a DatetimeIndex, which has been working. I am not sure how to overcome this roadblock at the moment. Any help would be appreciated!
In your if statement, you forgot the .loc:
symbol['profit']= [symbol.loc[ei, 'close1']-symbol.loc[ei, 'close'] if symbol.loc[ei, 'shares']==1 else 0 for ei in symbol.index]
Also, in pandas we usually avoid for loops as much as we can:
import numpy as np
symbol['profit'] = np.where(symbol.shares == 1, symbol.close1 - symbol.close, 0)
I think it may be related to the fact that Jan 29th, 1993 was a Saturday.
Try shifting the date to the next trading day.

How can I calculate values in a Pandas dataframe based on another column in the same dataframe

I am attempting to create a new column of values in a Pandas dataframe that are calculated from another column in the same dataframe:
df['ema_ideal'] = df['Adj Close'].ewm(span=df['ideal_moving_average'], min_periods=0, ignore_na=True).mean()
However, I am receiving the error:
ValueError: The truth of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any(), or a.all().
If I have the span set to 30, or some other integer, I do not receive this error. Also, ideal_moving_average is a column of floats.
My two questions are:
Why exactly am I receiving the error?
How can I incorporate the column values from ideal_moving_average into the df['ema_ideal'] column? (Subquestion, as I am new to Pandas: is this column a Series within the dataframe?)
Thanks for the help!
EDIT: Example showing Adj Close data, in bad formatting
Date Open High Low Close Adj Close
2017-01-03 225.039993 225.830002 223.880005 225.240005 222.073914
2017-01-04 225.619995 226.750000 225.610001 226.580002 223.395081
2017-01-05 226.270004 226.580002 225.479996 226.399994 223.217606
2017-01-06 226.529999 227.750000 225.899994 227.210007 224.016220
2017-01-09 226.910004 227.070007 226.419998 226.460007 223.276779
2017-01-10 226.479996 227.449997 226.009995 226.460007 223.276779
I think something like this will work for you:
df['ema_ideal'] = df.apply(lambda x: df['Adj Close'].ewm(span=x['ideal_moving_average'], min_periods=0, ignore_na=True).mean().loc[x.name], axis=1)
Providing axis=1 to DataFrame.apply allows you to access the data row-wise like you need; the trailing .loc[x.name] picks out the EWM value at that row (otherwise each call would return a whole Series).
There's absolutely no issue creating a dataframe column from another column of the same dataframe.
The error you're receiving is something else entirely: it is raised when you try to combine Series with the logical keywords and, or, not, etc.
In general, to avoid this error you must operate on Series element-wise, for example using & instead of and, or ~ instead of not, or using numpy for the element-wise comparison.
Here, the issue is that you're trying to use a Series as the span of your ema, and pandas' ewm only accepts a single scalar value as the span.
You could, for example, calculate the ema for each possible period, and then regroup them into a Series that you set as the ema_ideal column of your dataframe.
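A minimal sketch of that per-period idea (assuming ideal_moving_average holds a modest number of distinct values, each >= 1):
import pandas as pd

def ema_ideal(df):
    out = pd.Series(index=df.index, dtype=float)
    # One EWM pass per distinct span; each row takes the value from the
    # pass that matches its own ideal_moving_average.
    for span in df['ideal_moving_average'].unique():
        ema = df['Adj Close'].ewm(span=span, min_periods=0, ignore_na=True).mean()
        mask = df['ideal_moving_average'] == span
        out[mask] = ema[mask]
    return out

df['ema_ideal'] = ema_ideal(df)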
For anyone wondering, the problem was that span could not take multiple values, which was happening when I tried to pass df['ideal_moving_average'] into it. Instead, I used the below code, which seemed to go line by line passing the value for that row into span.
df['30ema'] = df['Adj Close'].ewm(span=df.iloc[-1]['ideal_ma'], min_periods=0, ignore_na=True).mean()
EDIT: I will accept this as correct for now, until someone shows that it doesn't work or comes up with something better. Thanks for the help.

pandas - convert Panel into DataFrame using lookup table for column headings

Is there a neat way to do this, or would I be best off making a loop that creates a new dataframe, looking into the Panel when constructing each column?
I have a 3D array of data that I have put into a Panel, and I want to reorganise it, based on a 2D lookup table covering 2 of the axes, into a DataFrame with labels taken from my lookup table using the nearest value. In a kind of double-VLOOKUP way.
The main thing I am trying to achieve is to be able to quickly locate a time series of data based on the label. If there is a better way, please let me know!
My data is in a Panel that looks like this, with latitude on the items axis and longitude on the minor axis.
data
Out[920]:
<class 'pandas.core.panel.Panel'>
Dimensions: 53 (items) x 29224 (major_axis) x 119 (minor_axis)
Items axis: 42.0 to 68.0
Major_axis axis: 2000-01-01 00:00:00 to 2009-12-31 21:00:00
Minor_axis axis: -28.0 to 31.0
and my lookup table is like this:
label_coords
Out[921]:
lat lon
label
2449 63.250122 -5.250000
2368 62.750122 -5.750000
2369 62.750122 -5.250000
2370 62.750122 -4.750000
I'm kind of at a loss. Quite new to python in general and only really started using pandas yesterday.
Many thanks in advance! Sorry if this is a duplicate, I couldn't find anything that was about the same type of question.
Andy
Figured out a loop-based solution and thought I may as well post it in case someone else has this type of problem.
I changed the way my label coordinates dataframe was being read so that the labels were a column, then used the pivot function:
label_coord = label_coord.pivot('lat','lon','label')
This then produces a dataframe where the labels are the values and lat/lon are the index/columns.
Then I used this loop, where data is the Panel from the question:
data_labelled = pd.DataFrame()
for i in label_coord.columns:    # longitude
    for j in label_coord.index:  # latitude
        lbl = label_coord[i][j]
        data_labelled['%s' % lbl] = data[j][i]  # data[lat][lon] is the time series for that cell
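Once built, locating a time series by label is a plain column lookup (the label value below is just one from the sample lookup table):
ts = data_labelled['2449']  # keys are strings because the loop writes '%s' % lbl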
