Loop through a list of tickers with separate outputs - python-3.x

I have a list of tickers of which I would like to output individual datasets with financial information from pandas datareader.
I have tried to create a simple loop that takes a list of tickers and inputs it into the pandas datareader function.
import pandas as pd
import pandas_datareader as pdr

myTickers = ['AAPL', 'PG']

for ticks in myTickers:
    print(ticks)
    ticks = pdr.DataReader(ticks, 'yahoo', start='2019-01-01', end='2019-01-08')['Adj Close']
The problem here seems to be that the loop only substitutes the myTickers values inside the DataReader call; it never changes the name of the dataframe from "ticks" to e.g. AAPL. As a result each iteration overwrites the previous one, and only the last ticker's data survives.
What do I need to modify in order for this loop to output two different dataframes with the names in the ticker list?

You can save everything in a single DataFrame, then retrieve any column as its own dataframe with a small helper function.
import pandas as pd
import pandas_datareader as pdr

myTickers = ['AAPL', 'PG']

df = pd.DataFrame(columns=myTickers)
for ticks in myTickers:
    df[ticks] = pdr.DataReader(ticks, 'yahoo', start='2019-01-01', end='2019-01-08')['Adj Close']

def ticks(s):
    return df[s].to_frame()
ticks('AAPL')
Output:
AAPL
Date
2019-01-02 156.049484
2019-01-03 140.505798
2019-01-04 146.503891
2019-01-07 146.177811
2019-01-08 148.964386
ticks('PG')
Output:
PG
Date
2019-01-02 89.350105
2019-01-03 88.723633
2019-01-04 90.534523
2019-01-07 90.172348
2019-01-08 90.505157

As you pointed out, the loop variable is overwritten on each iteration, so each result needs to be stored somewhere. You could replace each ticks in myTickers with its corresponding DataFrame, but keeping a reference from ticker to data is more useful. Perhaps the following dictionary comprehension may help.
tickers_df_dict = {
    ticks: pdr.DataReader(ticks, 'yahoo', start='2019-01-01', end='2019-01-08')['Adj Close']
    for ticks in myTickers
}
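Each series can then be looked up by its ticker, for example:
aapl = tickers_df_dict['AAPL']  # Adj Close series for AAPL
print(aapl.head())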
That being said, as far as I'm aware, using the yahoo API will result in the following error. You may need to revise your chosen data source.
ImmediateDeprecationError:
Yahoo Daily has been immediately deprecated due to large breaks in the API without the
introduction of a stable replacement. Pull Requests to re-enable these data
connectors are welcome.
See https://github.com/pydata/pandas-datareader/issues
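If you do hit that error, one option is to swap in a different source through the same DataReader interface. A minimal sketch, assuming your pandas-datareader version ships the Stooq connector:
import pandas_datareader as pdr

# Stooq provides free daily prices; note it has a 'Close' column rather
# than 'Adj Close', and returns rows in reverse chronological order
aapl = pdr.DataReader('AAPL', 'stooq', start='2019-01-01', end='2019-01-08')
print(aapl['Close'])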

Related

sqlalchemy.exc.StatementError: (builtins.TypeError) SQLite Date type only accepts Python date objects as input

I have a pandas data frame that contains Month and Year values in a yyyy-mm format. I am using pd.to_sql to set the data types and send it to a .db file.
I keep getting error:
sqlalchemy.exc.StatementError: (builtins.TypeError) SQLite Date type only accepts Python date objects as input.
Is there a way to set a 'Date' data type for the 'MonthYear' (yyyy-mm) column, or should it be a VARCHAR? I tried changing it to several of pandas's datetime data types, but none of them seem to work.
I don't have any issues with 'full_date'; it assigns properly. The data type for 'full_date' is datetime64[ns] in pandas.
MonthYear    full_date
2015-03      2012-03-11
2015-04      2013-08-19
2010-12      2012-06-29
2012-01      2018-01-01
df.to_sql('MY_TABLE', con=some_connection,
          dtype={'MonthYear': sqlalchemy.types.Date(),
                 'full_date': sqlalchemy.types.Date()})
My opinion is that you shouldn't store the extra column in your database unnecessarily when you can derive it from the 'full_date' column.
One issue you'll run into is that SQLite doesn't have a DATE type, so you need to parse the dates upon extraction with your query. Full example:
import datetime as dt
import numpy as np
import pandas as pd
import sqlite3

# I'm using datetime64[ns] because that's what you say you have
df = pd.DataFrame({'full_date': [np.datetime64('2012-03-11')]})

con = sqlite3.connect(":memory:")
df.to_sql("MY_TABLE", con, index=False)

new_df = pd.read_sql_query("SELECT * FROM MY_TABLE;", con,
                           parse_dates={'full_date': '%Y-%m-%d'})
Result:
In [111]: new_df['YearMonth'] = new_df['full_date'].dt.strftime('%Y-%m')
In [112]: new_df
Out[112]:
full_date YearMonth
0 2012-03-11 2012-03

How do I pull daily pytrends data for multiple keywords and save them to a .csv

I've locked myself out of pytrends trying to solve this. Found some help in an old post
There are a few elements to this. Firstly, I don't fully understand the documentation, e.g. what is the payload? When I run it, it doesn't seem to do anything, so I'm working with a lot of copy-pasted code.
Second, I want to get keyword trend data for the year to date in a .csv
import pandas as pd
from pytrends.exceptions import ResponseError
from pytrends.request import TrendReq
from pytrends import dailydata
import matplotlib.pyplot as plt

data = []
kw_list = ["maxi dresses", "black shorts"]

for kw in kw_list:
    kw_data = dailydata.get_daily_data(kw, 2020, 1, 2020, 4, geo='GB')
    data.append(kw_data)

data.to_csv(r"C:\Users\XXXX XXXXX\Documents\Python Files\PyTrends\trends_py.csv")
I also tried:
df = pytrends.get_historical_interest(kw_list, year_start=2020, month_start=1, day_start=1,
                                      year_end=2020, month_end=4, geo='GB', gprop='', sleep=0)
df = df.reset_index()
df.head(20)
Though for my purposes get_historical_interest is useless because it provides hourly data with lots of 0s. The hourly data also doesn't match trends.
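For what it's worth, here is a minimal sketch of the first approach that actually writes a CSV: data is a list, so it has no to_csv method, and the per-keyword frames need to be combined first. This assumes get_daily_data's final column is named after the keyword, as in recent pytrends versions; the output path is a placeholder.
import pandas as pd
from pytrends import dailydata

kw_list = ["maxi dresses", "black shorts"]
frames = []
for kw in kw_list:
    daily = dailydata.get_daily_data(kw, 2020, 1, 2020, 4, geo='GB')
    frames.append(daily[kw])  # keep only the stitched daily column

# align the keywords side by side on the shared date index, then save
combined = pd.concat(frames, axis=1)
combined.to_csv("trends_py.csv")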

Changing column datatype from Timestamp to datetime64

I have a database I'm reading from Excel as a pandas dataframe, and the dates come in as Timestamp dtype, but I need them to be np.datetime64 so that I can make calculations.
I am aware that pd.to_datetime() and the astype('datetime64[ns]') method do work. However, I am unable to update my dataframe to yield this datatype, for whatever reason, using the code mentioned above.
I have also tried creating an accessory dataframe from the original one, containing just the dates whose type I wish to update, converting it to np.datetime64, and plugging it back into the original dataframe:
dfi = df['dates']
dfi = pd.to_datetime(dfi)
df['dates'] = dfi
But still it doesn't work. I have also tried updating values one by one:
arr_i = df.index
for i in range(len(arr_i)):
    df.at[arr_i[i], 'dates'].to_datetime64()
Edit
The root problem seems to be that the dtype of the column gets updated to np.datetime64, but somehow, when getting single values from within it, they still come back as Timestamp.
Does anyone have a suggestion for a workaround that is fairly fast?
Pandas tries to standardize all forms of datetimes by storing them as NumPy datetime64[ns] values when you assign them to a DataFrame. But when you try to access individual datetime64 values, they are returned as Timestamps.
There is a way to prevent this automatic conversion from happening, however: wrap the list of values in a Series of dtype object:
import numpy as np
import pandas as pd
# create some dates, merely for example
dates = pd.date_range('2000-1-1', periods=10)
# convert the dates to a *list* of datetime64s
arr = list(dates.to_numpy())
# wrap the values you wish to protect in a Series of dtype object.
ser = pd.Series(arr, dtype='object')
# assignment with `df['datetime64s'] = ser` would also work
df = pd.DataFrame({'timestamps': dates,
                   'datetime64s': ser})
df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10 entries, 0 to 9
# Data columns (total 2 columns):
# timestamps 10 non-null datetime64[ns]
# datetime64s 10 non-null object
# dtypes: datetime64[ns](1), object(1)
# memory usage: 240.0+ bytes
print(type(df['timestamps'][0]))
# <class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(type(df['datetime64s'][0]))
# <class 'numpy.datetime64'>
But beware! Although with a little work you can circumvent Pandas' automatic conversion mechanism, it may not be wise to do so. First, converting a NumPy array to a list is usually a sign you are doing something wrong, since it is bad for performance. Second, using object arrays is a bad sign, since operations on object arrays are generally much slower than equivalent operations on arrays of native NumPy dtypes.
You may be looking at an XY problem; it may be more fruitful to find a way to (1) work with Pandas Timestamps instead of trying to force Pandas to return NumPy datetime64s, or (2) work with datetime64 array-likes (e.g. Series or NumPy arrays) instead of handling values individually (which causes the coercion to Timestamps).
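As a quick illustration of option (2), a whole column can be pulled out as a native datetime64 array in one call, with no per-value handling:
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range('2000-1-1', periods=3)})

# .to_numpy() yields a datetime64[ns] array, so vectorized NumPy
# datetime arithmetic works without ever touching Timestamps
arr = df['dates'].to_numpy()
print(arr.dtype)        # datetime64[ns]
print(arr[1] - arr[0])  # a numpy.timedelta64 of one day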

Python - Filtering Pandas Timestamp Index

Given Timestamp indices with many entries per day, how can I get a list containing only the last Timestamp of each day? So in case I have:
import pandas as pd
all = [pd.Timestamp('2016-05-01 10:23:45'),
       pd.Timestamp('2016-05-01 18:56:34'),
       pd.Timestamp('2016-05-01 23:56:37'),
       pd.Timestamp('2016-05-02 03:54:24'),
       pd.Timestamp('2016-05-02 14:32:45'),
       pd.Timestamp('2016-05-02 15:38:55')]
I would like to get:
# End of Day:
EoD = [pd.Timestamp('2016-05-01 23:56:37'),
       pd.Timestamp('2016-05-02 15:38:55')]
Thx in advance!
Try pandas groupby:
all = pd.Series(all)
all.groupby([all.dt.year, all.dt.month, all.dt.day]).max()
You get
2016 5 1 2016-05-01 23:56:37
2 2016-05-02 15:38:55
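To get the plain list the question asks for, you can group by the calendar date instead and convert, e.g.:
# 'all' is the Series built above; grouping by .dt.date keeps a flat index
EoD = all.groupby(all.dt.date).max().tolist()
# [Timestamp('2016-05-01 23:56:37'), Timestamp('2016-05-02 15:38:55')]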
I've created an example dataframe.
import pandas as pd

all = [pd.Timestamp('2016-05-01 10:23:45'),
       pd.Timestamp('2016-05-01 18:56:34'),
       pd.Timestamp('2016-05-01 23:56:37'),
       pd.Timestamp('2016-05-02 03:54:24'),
       pd.Timestamp('2016-05-02 14:32:45'),
       pd.Timestamp('2016-05-02 15:38:55')]

df = pd.DataFrame({'values': 0}, index=all)
Assuming your data frame is structured like the example, and most importantly is sorted by index, the code below should help.
for date in set(df.index.date):
    print(df[df.index.date == date].iloc[-1, :])
For each unique date in your dataframe, this prints the last row of that day's slice, so as long as the index is sorted you get the last record of the day. And hey, it's pythonic. (I believe so, at least.)
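Note that set() doesn't guarantee the days come out in order; as a vectorized variation (not from the original answer), groupby with tail returns the same rows in index order:
# one row per calendar date: the last record of each day
last_per_day = df.groupby(df.index.date).tail(1)
print(last_per_day)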

extract information from single cells from pandas dataframe

I'm looking to pull specific information from the table below to use in other functions, for example extracting the volume on 1/4/16 to see if the volume traded is > 1 million. Any thoughts on how to do this would be greatly appreciated.
import pandas as pd
import pandas.io.data as web # Package and modules for importing data; this code may change depending on pandas version
import datetime
start = datetime.datetime(2016,1,1)
end = datetime.date.today()
apple = web.DataReader("AAPL", "yahoo", start, end)
type(apple)
apple.head()
Results: apple.head() shows daily rows indexed by Date with Open, High, Low, Close, Volume and Adj Close columns.
The datareader will return a df with a DatetimeIndex; you can use partial datetime string matching to pick out the specific row and column using loc:
apple.loc['2016-04-01','Volume']
To test whether this is larger than 1 million, just compare it:
apple.loc['2016-04-01','Volume'] > 1000000
which will return True or False
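For instance, the comparison slots straight into a condition (reusing the apple frame from the question):
# reuse the loc lookup in a simple threshold check
vol = apple.loc['2016-04-01', 'Volume']
if vol > 1000000:
    print('Volume on 2016-04-01 exceeded 1 million shares:', vol)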
