Python Pandas apply function not being applied to every row when using variables from a DataFrame

I have this weird Pandas problem: when I use the apply function with values pulled from a data frame, it only gets applied to the first row:
import pandas as pd
# main data frame - to be edited
headerData = [['dataA', 'dataB']]
valuesData = [[10, 20], [10, 20]]
dfData = pd.DataFrame(valuesData, columns = headerData)
dfData.to_csv('MainData.csv', index=False)
readMainDataCSV = pd.read_csv('MainData.csv')
print(readMainDataCSV)
#variable data frame - pull values from this to edit main data frame
headerVariables = [['varA', 'varB']]
valuesVariables = [[2, 10]]
dfVariables = pd.DataFrame(valuesVariables, columns = headerVariables)
dfVariables.to_csv('Variables.csv', index=False)
readVariablesCSV = pd.read_csv('Variables.csv')
readVarA = readVariablesCSV['varA']
readVarB = readVariablesCSV['varB']
def formula(x):
    return (x / readVarA) * readVarB
dfFormulaApplied = readMainDataCSV.apply(lambda x: formula(x))
print('\n', dfFormulaApplied)
Output:
dataA dataB
0 50.0 100.0
1 NaN NaN
But when I just use regular variables (not pulled from a data frame), it functions just fine:
import pandas as pd
# main data frame - to be edited
headerData = [['dataA', 'dataB']]
valuesData = [[10, 20], [20, 40]]
dfData = pd.DataFrame(valuesData, columns = headerData)
dfData.to_csv('MainData.csv', index=False)
readMainDataCSV = pd.read_csv('MainData.csv')
print(readMainDataCSV)
# variables
readVarA = 2
readVarB = 10
def formula(x):
    return (x / readVarA) * readVarB
dfFormulaApplied = readMainDataCSV.apply(lambda x: formula(x))
print('\n', dfFormulaApplied)
Output:
dataA dataB
0 50.0 100.0
1 100.0 200.0
Help please, I'm pulling my hair out.

If you take readVarA and readVarB from the dataframe by selecting a column, each is a pandas Series with its own index, which causes the problem in the calculation: dividing a Series by another Series aligns on the index, and labels that don't match produce NaN.
You can take the first value from the series to get the value like this:
def formula(x):
    return (x / readVarA[0]) * readVarB[0]
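A slightly more defensive variant (a sketch): .iloc[0] takes the first value by position, so it does not depend on the CSV having kept the default integer index (.item() would also work for a single-element Series):
readVarA = readVariablesCSV['varA'].iloc[0]  # positional lookup, independent of index labels
readVarB = readVariablesCSV['varB'].iloc[0]
def formula(x):
    return (x / readVarA) * readVarB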

Related

Python Pandas sum() the difference between two values

In my Python code, using pandas, I have to resample a datetime data series and calculate the diffs between a column's values (the sum of the diffs between values). I wrote this piece of code:
import pandas as pd
import datetime
from .models import Results, VarsResults
start_date = datetime.date(2021, 6, 21)
end_date = datetime.date(2021, 6, 24)
def calc_q(start_d, end_d):
    start_d = start_date
    end_d = end_date
    var_results = VarsResults.objects.filter(
        id_res__read_date__range=(start_d, end_d)
    ).select_related(
        "id_res"
    ).values(
        "id_res__read_date",
        "id_res__unit_id",
        "id_res__device_id",
        "id_res__proj_code",
        "var_val",
    )
    df = pd.DataFrame(list(var_results))
    df['id_res__read_date'] = pd.to_datetime(df['id_res__read_date'])
    df = df.set_index('id_res__read_date')
    df_15 = df.resample('15min').sum()
    return df_15
But I get the sum of the values themselves.
Example:
... | 5
... | 3
... | 1
I get 9.
I would like the sum of the differences between consecutive values, not the sum of the values:
in this case 4: (5-3) = 2, (3-1) = 2, and 2+2 = 4.
Is there a method in pandas, using resample, to manage this kind of calculation?
Many thanks in advance,
Manuel
The sum of all the differences is equal to the difference between the first element and the last one: if you work it out, all the other elements cancel out. In your data, for example, the 3 cancels out:
(5-3) + (3-1)
= 5 - 3 + 3 - 1 # - 3 and + 3 cancel out
= 5 - 1
I don't know how Pandas works, but you can simply do the equivalent of first_value - last_value.
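In pandas terms, that telescoping identity gives a direct answer. A minimal sketch, assuming df is indexed by id_res__read_date as in the question; first() and last() are per-bucket aggregations on the resampler:
# sum of consecutive differences per 15-minute bucket telescopes to first - last
resampled = df.resample('15min')['var_val']
df_15 = resampled.first() - resampled.last()  # empty buckets yield NaN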

Pandas grouping and resampling for a bar plot

I have a dataframe that records concentrations for several different locations in different years, with a high temporal frequency (<1 hour). I am trying to make a bar/multibar plot showing mean concentrations at different locations in different years.
To calculate mean concentration, I have to apply quality control filters to daily and monthly data.
My approach is to first apply filters and resample per year and then do the grouping by location and year.
Also, out of all the locations (in the column titled locations) I have to choose only a few rows. So, I am slicing the original dataframe and creating a new dataframe with selected rows.
I am not able to achieve this using the following code:
date=df['date']
location = df['location']
df.date = pd.to_datetime(df.date)
year=df.date.dt.year
df=df.set_index(date)
df['Year'] = df['date'].map(lambda x: x.year )
#Location name selection/correction in each city:
#Changing all stations:
df['location'] = df['location'].map(lambda x: "M" if x == "mm" else x)
#New dataframe:
df_new = df[(df['location'].isin(['K', 'L', 'M']))]
#Data filtering:
df_new = df_new[df_new['value'] >= 0]
df_new.drop(df_new[df_new['value'] > 400].index, inplace = True)
df_new.drop(df_new[df_new['value'] <2].index, inplace = True)
diurnal = df_new[df_new['value']].resample('12h')
diurnal_mean = diurnal.mean()[diurnal.count() >= 9]
daily_mean=diurnal_mean.resample('d').mean()
df_month=daily_mean.resample('m').mean()
df_yearly=df_month[df_month['value']].resample('y')
#For plotting:
df_grouped = df_new.groupby(['location', 'Year']).agg({'value':'sum'}).reset_index()
sns.barplot(x='location',y='value',hue='Year',data= df_grouped)
This is one of the many errors that cropped up:
"None of [Float64Index([22.73, 64.81, 8.67, 19.98, 33.12, 37.81, 39.87, 42.29, 37.81,\n 36.51,\n ...\n 11.0, 40.0, 23.0, 80.0, 50.0, 60.0, 40.0, 80.0, 80.0,\n 17.0],\n dtype='float64', length=63846)] are in the [columns]"
ERROR:root:Invalid alias: The name clear can't be aliased because it is another magic command.
This is a sample dataframe, showing what I need to plot; value column should ideally represent resampled values, after performing the quality control operations and resampling.
                           Unnamed: 0 location  value
date
2017-10-21 08:45:00+05:30        8335        M  339.3
2017-08-18 17:45:00+05:30        8344        M   45.1
2017-11-08 13:15:00+05:30        8347        L  594.4
2017-10-21 13:15:00+05:30        8659        N  189.9
2017-08-18 15:45:00+05:30        8662        N   46.5
This is how a part of the actual data should look after selecting the chosen locations. I am a new user, so I cannot attach a screenshot of the graph I require. This query is an extension of the query I had posted earlier (Iteration over years to plot different group values as bar plot in pandas), with the additional requirement of plotting resampled data instead of simple value counts.
Any help will be much appreciated.
Fundamentally, your errors come from unclear indexing: you are passing the continuous float values of one column for row-wise selection against an index that is currently a datetime type.
df_new[df_new['value']] # INDEXING DATETIME USING FLOAT VALUES
...
df_month[df_month['value']] # COLUMN value DOES NOT EXIST
Possibly, you meant to select the column value (out of the others) during resampling.
diurnal = df_new['value'].resample('12h')
diurnal_mean = diurnal.mean()[diurnal.count() >= 9]
daily_mean = diurnal_mean.resample('d').mean()
df_month = daily_mean.resample('m').mean() # REMOVE value BEING UNDERLYING SERIES
df_yearly = df_month.resample('y').mean()
However, nowhere above do you retain location for plotting. Hence, instead of resample, use groupby(pd.Grouper(...)):
# AGGREGATE TO KEEP LOCATION AND 12h
diurnal = (df_new.groupby(["location", pd.Grouper(freq='12h')])["value"]
                 .agg(["count", "mean"])
                 .reset_index().set_index(['date'])
          )
# FILTER
diurnal_sub = diurnal[diurnal["count"] >= 9]
# MULTIPLE DATE TIME LEVEL MEANS
daily_mean = diurnal_sub.groupby(["location", pd.Grouper(freq='d')])["mean"].mean()
df_month = diurnal_sub.groupby(["location", pd.Grouper(freq='m')])["mean"].mean()
df_yearly = diurnal_sub.groupby(["location", pd.Grouper(freq='y')])["mean"].mean()
print(df_yearly)
To demonstrate with random, reproducible data:
Data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
np.random.seed(242020)
random_df = pd.DataFrame({'date': (np.random.choice(pd.date_range('2017-01-01', '2019-12-31'), 5000) +
                                   pd.to_timedelta(np.random.randint(60*60, 60*60*24, 5000), unit='s')),
                          'location': np.random.choice(list("KLM"), 5000),
                          'value': np.random.uniform(10, 1000, 5000)
                         })
Aggregation
loc_list = list("KLM")
# NEW DATA FRAME WITH DATA FILTERING
df = (random_df.set_index(random_df['date'])
               .assign(Year = lambda x: x['date'].dt.year,
                       location = lambda x: x['location'].where(x["location"] != "mm", "M"))
               .query('(location == @loc_list) and (value >= 2 and value <= 400)')
     )
# 12h AGGREGATION
diurnal = (df.groupby(["location", pd.Grouper(freq='12h')])["value"]
             .agg(["count", "mean"])
             .reset_index().set_index(['date'])
             .query("count >= 2")
          )
# d, m, y AGGREGATION
daily_mean = diurnal.groupby(["location", pd.Grouper(freq='d')])["mean"].mean()
df_month = diurnal.groupby(["location", pd.Grouper(freq='m')])["mean"].mean()
df_yearly = (diurnal.groupby(["location", pd.Grouper(freq='y')])["mean"].mean()
                    .reset_index()
                    .assign(Year = lambda x: x["date"].dt.year)
            )
print(df_yearly)
# location date mean Year
# 0 K 2017-12-31 188.984592 2017
# 1 K 2018-12-31 199.521702 2018
# 2 K 2019-12-31 216.497268 2019
# 3 L 2017-12-31 214.347873 2017
# 4 L 2018-12-31 199.232711 2018
# 5 L 2019-12-31 177.689221 2019
# 6 M 2017-12-31 222.412711 2017
# 7 M 2018-12-31 241.597977 2018
# 8 M 2019-12-31 215.554228 2019
Plotting
sns.set()
fig, axs = plt.subplots(figsize=(12,5))
sns.barplot(x='location', y='mean', hue='Year', data= df_yearly, ax=axs)
plt.title("Location Value Yearly Aggregation", weight="bold", size=16)
plt.show()
plt.clf()
plt.close()

Apply function to pandas series given varying arguments

Initial question
I want to calculate the Levenshtein distance between multiple strings, one in a series, the other in a list. I tried my hand at map, zip, etc., but I only got the desired result using a for loop and apply. Is there a way to improve style and especially speed?
Here is what I tried; it does what it is supposed to do, but lacks speed given a large series.
import pandas as pd
import stringdist

strings = ['Hello', 'my', 'Friend', 'I', 'am']
s = pd.Series(data=strings, index=strings)
c = ['me', 'mine', 'Friend']
df = pd.DataFrame()
for w in c:
    df[w] = s.apply(lambda x: stringdist.levenshtein(x, w))
## Result: ##
me mine Friend
Hello 4 5 6
my 1 3 6
Friend 5 4 0
I 2 4 6
am 2 4 6
Solution
Thanks to @Dames and @molybdenum42, I can provide the solution I used, directly beneath the question. For more insights, please check their great answers below.
import numpy as np
import pandas as pd
import stringdist
from itertools import product

strings = ['Hello', 'my', 'Friend', 'I', 'am']
s = pd.Series(data=strings, index=strings)
c = ['me', 'mine', 'Friend']
word_combinations = np.array(list(product(s.values, c)))
vectorized_levenshtein = np.vectorize(stringdist.levenshtein)
result = vectorized_levenshtein(word_combinations[:, 0],
                                word_combinations[:, 1])
result = result.reshape((len(s), len(c)))
df = pd.DataFrame(result, columns=c, index=s)
This results in the desired data frame.
Setup:
import stringdist
import pandas as pd
import numpy as np
import itertools
s = pd.Series(data=['Hello', 'my', 'Friend'],
              index=['Hello', 'my', 'Friend'])
c = ['me', 'mine', 'Friend']
Options
option: an easy one-liner
df = pd.DataFrame([s.apply(lambda x: stringdist.levenshtein(x, w)) for w in c])
option: np.fromfunction (thanks to @baccandr)
@np.vectorize
def lavdist(a, b):
    return stringdist.levenshtein(c[a], s[b])

df = pd.DataFrame(np.fromfunction(lavdist, (len(c), len(s)), dtype = int),
                  columns=c, index=s)
option: see @molybdenum42
word_combinations = np.array(list(itertools.product(s.values, c)))
vectorized_levenshtein = np.vectorize(stringdist.levenshtein)
result = vectorized_levenshtein(word_combinations[:,0], word_combinations[:,1])
df = pd.DataFrame([word_combinations[:,0], word_combinations[:,1], result]).T
df = df.set_index([0,1])[2].unstack()
(the best) option: modified option 3
word_combinations = np.array(list(itertools.product(s.values, c)))
vectorized_levenshtein = np.vectorize(stringdist.levenshtein)
result = vectorized_levenshtein(word_combinations[:,0], word_combinations[:,1])
result = result.reshape((len(s), len(c)))
df = pd.DataFrame(result, columns=c, index=s)
Performance testing:
import timeit
from Levenshtein import distance
import pandas as pd
import numpy as np
import itertools
s = pd.Series(data=['Hello', 'my', 'Friend'],
index=['Hello', 'my', 'Friend'])
c = ['me', 'mine', 'Friend']
test_code0 = """
df = pd.DataFrame()
for w in c:
    df[w] = s.apply(lambda x: distance(x, w))
"""
test_code1 = """
df = pd.DataFrame({w:s.apply(lambda x: distance(x, w)) for w in c})
"""
test_code2 = """
@np.vectorize
def lavdist(a, b):
    return distance(c[a], s[b])

df = pd.DataFrame(np.fromfunction(lavdist, (len(c), len(s)), dtype = int),
                  columns=c, index=s)
"""
test_code3 = """
word_combinations = np.array(list(itertools.product(s.values, c)))
vectorized_levenshtein = np.vectorize(distance)
result = vectorized_levenshtein(word_combinations[:,0], word_combinations[:,1])
df = pd.DataFrame([word_combinations[:,1], word_combinations[:,1], result])
df = df.set_index([0,1])[2] #.unstack() produces error
"""
test_code4 = """
word_combinations = np.array(list(itertools.product(s.values, c)))
vectorized_levenshtein = np.vectorize(distance)
result = vectorized_levenshtein(word_combinations[:,0], word_combinations[:,1])
result = result.reshape((len(s), len(c)))
df = pd.DataFrame(result, columns=c, index=s)
"""
test_setup = "from __main__ import distance, s, c, pd, np, itertools"
print("test0", timeit.timeit(test_code0, number = 1000, setup = test_setup))
print("test1", timeit.timeit(test_code1, number = 1000, setup = test_setup))
print("test2", timeit.timeit(test_code2, number = 1000, setup = test_setup))
print("test3", timeit.timeit(test_code3, number = 1000, setup = test_setup))
print("test4", timeit.timeit(test_code4, number = 1000, setup = test_setup))
Results
# results
# test0 1.3671939949999796
# test1 0.5982696900009614
# test2 0.3246431229999871
# test3 2.0100400850005826
# test4 0.23796007100099814
Using itertools, you can at least get all the required combinations. Using a vectorized version of stringdist.levenshtein (made using numpy.vectorize()) you can then get your desired result without looping at all, though I haven't tested the performance of the vectorized levenshtein function.
The code could look something like this:
import stringdist
import numpy as np
import pandas as pd
import itertools
s = pd.Series(["Hello", "my","Friend"])
c = ['me', 'mine', 'Friend']
word_combinations = np.array(list(itertools.product(s.values, c)))
vectorized_levenshtein = np.vectorize(stringdist.levenshtein)
result = vectorized_levenshtein(word_combinations[:,0], word_combinations[:,1])
At this point you have the results in a numpy array, each corresponding to one of all the possible combinations of your two initial arrays. If you want to get it into the shape you have in your example, there's some pandas trickery to be done:
df = pd.DataFrame([word_combinations[:,0], word_combinations[:,1], result]).T
### initially looks like: ###
# 0 1 2
# 0 Hello me 4
# 1 Hello mine 5
# 2 Hello Friend 6
# 3 my me 1
# 4 my mine 3
# 5 my Friend 6
# 6 Friend me 5
# 7 Friend mine 4
# 8 Friend Friend 0
df = df.set_index([0,1])[2].unstack()
### Now looks like: ###
# Friend Hello my
# Friend 0 6 6
# me 5 4 1
# mine 4 5 3
Again, I haven't tested the performance of this method, so I recommend checking that out - it should be faster than iteration though.
EDIT:
User @Dames has a better suggestion for making the result all pretty-like:
result = result.reshape((len(s), len(c)))
df = pd.DataFrame(result, columns=c, index=s)

How to write from loop to dataframe

I'm trying to calculate 33 stock betas and write them to a dataframe.
Unfortunately, I have an error in my code:
cannot concatenate object of type "<class 'float'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
import pandas as pd
import numpy as np
stock1=pd.read_excel(r"C:\Users\Кир\Desktop\Uni\Master\Nasdaq\Financials 11.05\Nasdaq last\clean data\01.xlsx", '1') #read sheet '1' of the excel file
stock2=pd.read_excel(r"C:\Users\Кир\Desktop\Uni\Master\Nasdaq\Financials 11.05\Nasdaq last\clean data\01.xlsx", '2') #read sheet '2' of the excel file
stock2['stockreturn']=np.log(stock2.AdjCloseStock / stock2.AdjCloseStock.shift(1)) #stock ln return
stock2['SP500return']=np.log(stock2.AdjCloseSP500 / stock2.AdjCloseSP500.shift(1)) #SP500 ln return
stock2 = stock2.iloc[1:] #delete first row in dataframe
betas = pd.DataFrame()
for i in range(0, (len(stock2.AdjCloseStock)//52) - 1):
    betas = betas.append(stock2.stockreturn.iloc[i*52:(i+1)*52].cov(stock2.SP500return.iloc[i*52:(i+1)*52])
                         / stock2.SP500return.iloc[i*52:(i+1)*52].cov(stock2.SP500return.iloc[i*52:(i+1)*52]))
My data consists of weekly stock and S&P index returns over 33 years, so the output should have 33 betas.
I tried simplifying your code and creating an example. I think the problem is that your calculation returns a float. You want to make it a pd.Series. DataFrame.append takes:
DataFrame or Series/dict-like object, or list of these
import numpy as np
import pandas as pd

np.random.seed(20)
df = pd.DataFrame(np.random.randn(33*53, 2),
                  columns=['a', 'b'])
betas = pd.DataFrame()
for year in range(len(df['a'])//52 - 1):
    # Take some data
    in_slice = pd.IndexSlice[year*52:(year+1)*52]
    numerator = df['a'].iloc[in_slice].cov(df['b'].iloc[in_slice])
    denominator = df['b'].iloc[in_slice].cov(df['b'].iloc[in_slice])
    # Do some calculations and create a pd.Series from the result
    data = pd.Series(numerator / denominator, name=year)
    # Append to the DataFrame
    betas = betas.append(data)
betas.index.name = 'years'
betas.columns = ['beta']
betas.head():
beta
years
0 0.107669
1 -0.009302
2 -0.063200
3 0.025681
4 -0.000813
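One caveat: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0. A sketch of the same loop on current pandas, collecting the betas in a plain list and building the frame once (note that cov(b, b) is simply var(b)):
rows = []
for year in range(len(df['a'])//52 - 1):
    in_slice = pd.IndexSlice[year*52:(year+1)*52]
    # beta = cov(a, b) / var(b) for this 52-week window
    beta = df['a'].iloc[in_slice].cov(df['b'].iloc[in_slice]) / df['b'].iloc[in_slice].var()
    rows.append(beta)
betas = pd.DataFrame({'beta': rows})
betas.index.name = 'years'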

How to chart multiple columns continuously by iterating through a dataframe with matplotlib

BACKGROUND INFORMATION:
I have a dataframe of x many stocks with y price sets each (currently this is 5 and 2 respectively: one is the closing price, the other a 3 day Simple Moving Average, SMA).
The current output is [2781 rows x 10 columns], with the data ranging from start_date = '2006-01-01' to end_date = '2016-12-31'. The output of print(df) is as follows:
CURRENT OUTPUT:
ANZ Price ANZ 3 day SMA CBA Price CBA 3 day SMA MQG Price MQG 3 day SMA NAB Price NAB 3 day SMA WBC Price WBC 3 day SMA
Date
2006-01-02 23.910000 NaN 42.569401 NaN 66.558502 NaN 30.792999 NaN 22.566401 NaN
2006-01-03 24.040001 NaN 42.619099 NaN 66.086403 NaN 30.935699 NaN 22.705400 NaN
2006-01-04 24.180000 24.043334 42.738400 42.642300 66.587997 66.410967 31.078400 30.935699 22.784901 22.685567
2006-01-05 24.219999 24.146667 42.708599 42.688699 66.558502 66.410967 30.964300 30.992800 22.794800 22.761700
... ... ... ... ... ... ... ... ... ... ...
2016-12-27 87.346667 30.670000 30.706666 32.869999 32.729999 87.346667 30.670000 30.706666 32.869999 32.729999
2016-12-28 87.456667 31.000000 30.773333 32.980000 32.829999 87.456667 31.000000 30.773333 32.980000 32.829999
2016-12-29 87.520002 30.670000 30.780000 32.599998 32.816666 87.520002 30.670000 30.780000 32.599998 32.816666
MY WORKING CODE:
#!/usr/bin/python3
from pandas_datareader import data
import pandas as pd
import itertools as it
import os
import numpy as np
import fix_yahoo_finance as yf
import matplotlib.pyplot as plt
yf.pdr_override()
stock_list = sorted(["ANZ.AX", "WBC.AX", "MQG.AX", "CBA.AX", "NAB.AX"])
number_of_decimal_places = 8
moving_average_period = 3
def get_moving_average(df, stock_name):
    df2 = df.rolling(window=moving_average_period).mean()
    df2.rename(columns={stock_name: stock_name.replace("Price", str(moving_average_period) + " day SMA")}, inplace=True)
    df = pd.concat([df, df2], axis=1, join_axes=[df.index])
    return df
# Function to get the closing price of the individual stocks
# from the stock_list list
def get_closing_price(stock_name, specific_close):
    symbol = stock_name
    start_date = '2006-01-01'
    end_date = '2016-12-31'
    df = data.get_data_yahoo(symbol, start_date, end_date)
    sym = symbol + " "
    print(sym * 10)
    df = df.drop(['Open', 'High', 'Low', 'Adj Close', 'Volume'], axis=1)
    df = df.rename(columns={'Close': specific_close})
    # https://stackoverflow.com/questions/16729483/converting-strings-to-floats-in-a-dataframe
    # df[specific_close] = df[specific_close].astype('float64')
    # print(type(df[specific_close]))
    return df
# Creates a big DataFrame with all the stock's Closing
# Price returns the DataFrame
def get_all_close_prices(directory):
    count = 0
    for stock_name in stock_list:
        specific_close = stock_name.replace(".AX", "") + " Price"
        if not count:
            prev_df = get_closing_price(stock_name, specific_close)
            prev_df = get_moving_average(prev_df, specific_close)
        else:
            new_df = get_closing_price(stock_name, specific_close)
            new_df = get_moving_average(new_df, specific_close)
            # https://stackoverflow.com/questions/11637384/pandas-join-merge-concat-two-dataframes
            prev_df = prev_df.join(new_df)
        count += 1
    # prev_df.to_csv(directory)
    df = pd.DataFrame(prev_df, columns=list(prev_df))
    df = df.apply(pd.to_numeric)
    convert_df_to_csv(df, directory)
    return df

def convert_df_to_csv(df, directory):
    df.to_csv(directory)
def main():
    # FINDS THE CURRENT DIRECTORY AND CREATES THE CSV TO DUMP THE DF
    csv_in_current_directory = os.getcwd() + "/stock_output.csv"
    csv_in_current_directory_dow_distribution = os.getcwd() + "/dow_distribution.csv"
    # FUNCTION THAT GETS ALL THE CLOSING PRICES OF THE STOCKS
    # AND RETURNS IT AS ONE COMPLETE DATAFRAME
    df = get_all_close_prices(csv_in_current_directory)
    print(df)

# Main line of code
if __name__ == "__main__":
    main()
QUESTION:
From this df I want to create x many line graphs (one graph per stock) with y many lines (price and SMAs). How can I do this with matplotlib? Could this be done with a for loop that saves the individual plots as the loop iterates? If so, how?
First, import matplotlib.pyplot as plt.
Then it depends whether you want x many individual plots or one plot with x many subplots:
Individual plots
df.plot(y=[0,1])
df.plot(y=[2,3])
df.plot(y=[4,5])
df.plot(y=[6,7])
df.plot(y=[8,9])
plt.show()
You can also save the individual plots in a loop:
for i in range(0, 9, 2):
    df.plot(y=[i, i+1])
    plt.savefig('{}.png'.format(i))
Subplots
fig, axes = plt.subplots(nrows=2, ncols=3)
df.plot(ax=axes[0,0],y=[0,1])
df.plot(ax=axes[0,1],y=[2,3])
df.plot(ax=axes[0,2],y=[4,5])
df.plot(ax=axes[1,0],y=[6,7])
df.plot(ax=axes[1,1],y=[8,9])
plt.show()
See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html for options to customize your plot(s).
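As a side note, if the column order is not guaranteed, selecting columns by name may be safer than the positional y=[0,1] form. A sketch, assuming the column names shown in the question's sample output and that matplotlib.pyplot is imported as plt:
# plot and save each stock's price against its 3 day SMA, selecting columns by name
for stock in ["ANZ", "CBA", "MQG", "NAB", "WBC"]:
    df.plot(y=["{} Price".format(stock), "{} 3 day SMA".format(stock)])
    plt.savefig("{}.png".format(stock))
    plt.close()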
The best approach is to make a function that depends on the size of your lists x and y. The function could be as follows:
def generate_SMA_graphs(df):
    columnNames = list(df.head(0))
    print("CN:\t", columnNames)
    print(len(columnNames))
    count = 0
    for stock in stock_list:
        stock_iter = count * (len(moving_average_period_list) + 1)
        sma_iter = stock_iter + 1
        for moving_average_period in moving_average_period_list:
            fig = plt.figure()
            df.plot(y=[columnNames[stock_iter], columnNames[sma_iter]])
            plt.xlabel('Time')
            plt.ylabel('Price ($)')
            graph_title = columnNames[stock_iter] + " vs. " + columnNames[sma_iter]
            plt.title(graph_title)
            plt.grid(True)
            plt.savefig(graph_title.replace(" ", "") + ".png")
            print("\t\t\t\tCompleted: ", graph_title)
            plt.close(fig)
            sma_iter += 1
        count += 1
With the code above, irrespective of how long either list is (for x or y, the stock list or the SMA list), the function will generate a graph comparing the original price with every SMA for each stock.
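For reference, a minimal driver for the function above (a sketch: moving_average_period_list is not defined in the question's code, so a single-period list is assumed here to match the Price/SMA column layout):
stock_list = sorted(["ANZ.AX", "WBC.AX", "MQG.AX", "CBA.AX", "NAB.AX"])
moving_average_period_list = [3]  # assumption: one SMA column per stock, as in the sample output
generate_SMA_graphs(df)  # df as produced by get_all_close_prices()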
