Good morning,
I'm using Python 3.6. I'm trying to name my index (see the last line in the code below) because I plan on joining to another DataFrame. The DataFrame should be multi-indexed. The index is the first two columns ('currency' and 'rtdate') and the data is the 'rate' column:
                    rate
AUD 2010-01-01  0.897274
    2010-02-01  0.896608
    2010-03-01  0.895943
    2010-04-01  0.895277
    2010-05-01  0.894612
This is the code that I'm running:
import pandas as pd
import numpy as np
import datetime as dt
df=pd.read_csv('file.csv',index_col=0)
df.index = pd.to_datetime(df.index)
new_index = pd.date_range(df.index.min(),df.index.max(),freq='MS')
df=df.reindex(new_index)
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
Running this throws an error message:
KeyError: 'currency'
What am I missing?
Thanks for the assistance
I think you need to set the names of the levels of MultiIndex by using rename_axis first and then reset_index for columns from MultiIndex:
So you'd end up with this:
rate = df.interpolate().unstack().rename_axis(('currency','rtdate')).reset_index(name='rate')
instead of this:
df=df.interpolate().unstack()
rate = pd.DataFrame(df)
rate.columns = ['rate']
rate.set_index(['currency','rtdate'],drop=False)
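The KeyError arises because set_index looks for columns named 'currency' and 'rtdate', while after unstack() those values live in unnamed index levels. Below is a minimal sketch of the rename_axis approach, assuming a small made-up input in place of file.csv (the currencies and rates are illustrative only). It uses to_frame to keep the MultiIndex for joining, and reset_index when the levels are wanted as columns.

```python
import pandas as pd

# Hypothetical sparse input standing in for file.csv:
# dates as the index, one column per currency.
df = pd.DataFrame(
    {"AUD": [0.90, 0.88], "EUR": [1.40, 1.30]},
    index=pd.to_datetime(["2010-01-01", "2010-03-01"]),
)

# Fill in the missing month starts and interpolate linearly.
new_index = pd.date_range(df.index.min(), df.index.max(), freq="MS")
stacked = df.reindex(new_index).interpolate().unstack()

# unstack() returns a Series with an unnamed two-level index
# (column label first, then date). Name the levels, then build the frame.
rate = stacked.rename_axis(["currency", "rtdate"]).to_frame("rate")

# If the levels are needed as ordinary columns instead:
flat = rate.reset_index()

print(rate)
```

With the levels named, rate can be joined to another DataFrame whose index uses the same level names.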
I have just started learning Python and I am stuck.
import yfinance as yf
import pandas as pd
import yahoo_fin.stock_info as si
ticker = ['20MICRONS.NS', '21STCENMGM.NS', '3IINFOTECH.NS', '3MINDIA.NS', '3PLAND.NS']
for i in ticker:
    try:
        quote = si.get_quote_table(i)
        df = pd.DataFrame.from_dict(quote.items())
        df = df.append(quote.items(), ignore_index=True)
    except (ValueError, IndexError, TypeError):
        continue
print(df)
For example, the list has more than 4 tickers; on each pass through the loop, the data for that ticker should be appended to the dataframe.
But for some reason the dataframe is not appending these values.
Thanks in advance
You defined df within the loop, which means that a new dataframe is initialised in df at each iteration. Define a new dataframe in df before the loop and append to the df in the loop. Please add the information that you provided in the comments to the question.
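A minimal sketch of that advice follows. It collects one frame per ticker in a list and concatenates once after the loop, which also avoids DataFrame.append (removed in pandas 2.0). The function fetch_quote and the ticker names are hypothetical stand-ins for si.get_quote_table and the real symbols, so the example runs without network access.

```python
import pandas as pd

def fetch_quote(ticker):
    """Hypothetical stand-in for si.get_quote_table; returns a dict of fields."""
    if ticker == "BAD.NS":
        raise ValueError("no data for this ticker")
    return {"Open": 10.0, "Volume": 1000}

tickers = ["AAA.NS", "BAD.NS", "BBB.NS"]  # made-up symbols

frames = []  # created once, before the loop
for t in tickers:
    try:
        quote = fetch_quote(t)
    except (ValueError, IndexError, TypeError):
        continue  # skip tickers that fail to load
    frame = pd.DataFrame(list(quote.items()), columns=["field", "value"])
    frame["ticker"] = t  # remember which ticker each row came from
    frames.append(frame)

# A single concat after the loop keeps the rows from every iteration.
df = pd.concat(frames, ignore_index=True)
print(df)
```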
I have a dataframe in python which is made using the following code:
import pandas as pd
df = pd.read_csv('myfile.txt', sep="\t")
df1 = df.iloc[:, 3:]
Now df1 has 24 columns. I would like to transform the values to log2 and make a new dataframe with the same 24 columns holding the log2 values of the original dataframe. To do so I used numpy.log, as in the following line:
df2 = (numpy.log(df1))
This code does not return what I would like to get. Do you know how to fix it?
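Two things stand out in the snippet: numpy is never imported, and numpy.log computes the natural logarithm, not log base 2. The following is a sketch of one possible fix, assuming the selected columns are numeric; the small frame and its column names are invented in place of myfile.txt.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for myfile.txt: three annotation columns
# followed by numeric sample columns.
df = pd.DataFrame({
    "gene": ["g1", "g2"],
    "chrom": ["chr1", "chr2"],
    "pos": [100, 200],
    "s1": [1.0, 8.0],
    "s2": [2.0, 4.0],
})

df1 = df.iloc[:, 3:]   # keep only the numeric columns
df2 = np.log2(df1)     # element-wise log base 2, returned as a DataFrame

print(df2)
```

Note that np.log2 yields -inf for zeros and NaN for negative values, so real data may need filtering or a pseudocount first.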
I am using a pandas DataFrame with datetime indexing. I know from the xarray documentation that datetime indexing can be done as ds['date.year'], with ds being an xarray DataArray, date the date index, and year the year component of the dates. The xarray documentation points to datetime components, which in turn leads to DatetimeIndex in the pandas documentation. So I thought of doing the same with pandas, as I really like this feature.
However, it is not working for me. Here is what I did so far:
# Import required modules
import pandas as pd
import numpy as np
# Create DataFrame (name: df)
df=pd.DataFrame({'Date': ['2017-04-01','2017-04-01',
                          '2017-04-02','2017-04-02'],
                 'Time': ['06:00:00','18:00:00',
                          '06:00:00','18:00:00'],
                 'Active': [True,False,False,True],
                 'Value': np.random.rand(4)})
# Combine str() information of Date and Time and format to datetime
df['Date']=pd.to_datetime(df['Date'] + ' ' + df['Time'],format = '%Y-%m-%d %H:%M:%S')
# Make the combined data the index
df = df.set_index(df['Date'])
# Erase the rest, as it is not required anymore
df = df.drop(['Time','Date'], axis=1)
# Show me the first day
df['2017-04-01']
Ok, so this shows me only the first entries. So far, so good.
However
df['Date.year']
results in KeyError: 'Date.year'
I would expect an output like
array([2017,2017,2017,2017])
What am I doing wrong?
EDIT:
I have a workaround, which I am able to go on with, but I am still not satisfied, as this doesn't explain my question. I did not use a pandas DataFrame, but an xarray Dataset and now this works:
# Load modules
import pandas as pd
import numpy as np
import xarray as xr
# Prepare time array
Date = ['2017-04-01','2017-04-01', '2017-04-02','2017-04-02']
Time = ['06:00:00','18:00:00', '06:00:00','18:00:00']
time = [Date[i] + ' ' + Time[i] for i in range(len(Date))]
time = pd.to_datetime(time,format = '%Y-%m-%d %H:%M:%S')
# Create Dataset (name: ds)
ds=xr.Dataset({'time': time,
               'Active': [True,False,False,True],
               'Value': np.random.rand(4)})
ds['time.year']
which gives:
<xarray.DataArray 'year' (time: 4)>
array([2017, 2017, 2017, 2017])
Coordinates:
* time (time) datetime64[ns] 2017-04-01T06:00:00 ... 2017-04-02T18:00:00
As for what you're doing wrong, you are
a) trying to access an index as if it were a column (a Series)
b) chaining commands within a string: df['Date'] is a single column, whereas df['Date.year'] is a column literally called 'Date.year'
If your datetime is the index, use .year; if it's a Series (a column), use .dt.year.
df.index.year
# or, if Date is still a column with a proper datetime dtype (your code indicates it is)
df.Date.dt.year
hope that helps bud.
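The two access patterns above can be sketched as follows, rebuilding a frame similar to the one in the question. The values are arbitrary, and reset_index is used here only to show the column form.

```python
import numpy as np
import pandas as pd

stamps = pd.to_datetime([
    "2017-04-01 06:00:00", "2017-04-01 18:00:00",
    "2017-04-02 06:00:00", "2017-04-02 18:00:00",
])
df = pd.DataFrame({"Value": np.arange(4.0)}, index=stamps)
df.index.name = "Date"

# Datetime stored as the index: components are attributes of the index itself.
years_from_index = df.index.year

# Datetime stored as a column (a Series): go through the .dt accessor.
years_from_column = df.reset_index()["Date"].dt.year

# Selecting one day from a DatetimeIndex with a partial date string.
first_day = df.loc["2017-04-01"]

print(list(years_from_index))
```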
I'm following the bokeh tutorial and in the basic plotting section, I can't manage to show a plot. I only get the axis. What am I missing?
Here is the code:
df = pd.DataFrame.from_dict(AAPL)
weekapple = df.loc["2000-03-01":"2000-04-01"]
p = figure(x_axis_type="datetime", title="AAPL", plot_height=350, plot_width=800)
p.xgrid.grid_line_color=None
p.ygrid.grid_line_alpha=0.5
p.xaxis.axis_label = 'Time'
p.yaxis.axis_label = 'Value'
p.line(weekapple.date, weekapple.close)
show(p)
I get this:
[Image: My result]
I'm trying to complete the exercise here (10th Code cell - Exercise with AAPL data) I was able to follow all previous code up to that point correctly.
Thanks in advance!
In case this is still relevant, this is how you should do your selection:
df = pd.DataFrame.from_dict(AAPL)
# Convert date column in df from strings to the proper datetime format
date_format="%Y-%m-%d"
df["date"] = pd.to_datetime(df['date'], format=date_format)
# Use the same conversion for selected dates
weekapple = df[(df.date>=dt.strptime("2000-03-01", date_format)) &
               (df.date<=dt.strptime("2000-04-01", date_format))]
p = figure(x_axis_type="datetime", title="AAPL", plot_height=350, plot_width=800)
p.xgrid.grid_line_color=None
p.ygrid.grid_line_alpha=0.5
p.xaxis.axis_label = 'Time'
p.yaxis.axis_label = 'Value'
p.line(weekapple.date, weekapple.close)
show(p)
To make this work, before this code, I have (in my Jupyter notebook):
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
import bokeh
import pandas as pd
from datetime import datetime as dt
bokeh.sampledata.download()
from bokeh.sampledata.stocks import AAPL
output_notebook()
As described at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.loc.html, .loc is used in operations with the index (or boolean lists); date is not in the index in your dataframe (it is a regular column).
I hope this helps.
Your dataframe sub-view is empty:
In [3]: import pandas as pd
...: from bokeh.sampledata.stocks import AAPL
...: df = pd.DataFrame.from_dict(AAPL)
...: weekapple = df.loc["2000-03-01":"2000-04-01"]
In [4]: weekapple
Out[4]:
Empty DataFrame
Columns: [date, open, high, low, close, volume, adj_close]
Index: []
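As a sketch of two ways to get a non-empty selection, the example below filters with a boolean mask on the date column, and alternatively moves date into the index so that a .loc label slice applies. The frame is a made-up stand-in for the bokeh AAPL sample data, so the values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the AAPL sample data: date strings plus a close price.
dates = pd.date_range("2000-02-25", "2000-04-05", freq="D")
df = pd.DataFrame({
    "date": dates.strftime("%Y-%m-%d"),
    "close": range(len(dates)),
})

# Convert the strings to real datetimes first.
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")

# Option 1: boolean mask on the regular column.
mask = (df["date"] >= "2000-03-01") & (df["date"] <= "2000-04-01")
by_mask = df[mask]

# Option 2: make date the index, then slice by label with .loc
# (both endpoints are included).
by_loc = df.set_index("date").loc["2000-03-01":"2000-04-01"]

print(len(by_mask), len(by_loc))
```

Either result can then be passed to p.line, using by_mask.date or by_loc.index for the x values.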
I wanted to do the tutorial Python for Finance, Part I: Yahoo & Google Finance API, pandas, and matplotlib. As the Google API is not working properly, I used IEX data. When I try to reindex the pandas DataFrame to have consecutive dates, all values that were available before are replaced with NaN. I think the problem has something to do with the two indexes having different types. I am quite new to programming and have no clue how to solve this problem after reading the pandas documentation.
The code looks like this.
Modules used
matplotlib==2.2.3
pandas_datareader==0.6.0
googlefinance.client==1.3.0
pandas==0.23.4
googlefinance==0.7
import pandas as pd
pd.core.common.is_list_like = pd.api.types.is_list_like
import pandas_datareader as pdr
import matplotlib.pyplot as plt
from datetime import datetime
# Define the instruments to download. We would like to see Apple, Microsoft and the S&P500 index.
#tickers = ["AAPL","MSFT","SPY"]
# We would like all available data from 01/01/2017 until 9/1/2018.
start_date = datetime(2017,1,1)
end_date = datetime(2018,9,1)
# Use pandas_datareader's DataReader to load the desired data.
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
#ms_data = pdr.DataReader('SPY','iex',start_date,end_date,pause = 5)
print(aapl_data.tail(9))
#print(ms_data.tail(9))
print(aapl_data.index.dtype)
close = aapl_data
# Getting all weekdays between start_date and end_date
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
print(all_weekdays.dtype)
# How do we align the existing prices in adj_close with our new set of dates?
# All we need to do is reindex close using all_weekdays as the new index
close = close.reindex(all_weekdays)
# Reindexing will insert missing values (NaN) for the dates that were not present
print(close)
Both indexes need to have a datetime dtype.
Use pd.to_datetime to convert the index of aapl_data:
aapl_data = pdr.DataReader("SPY",'iex',start_date,end_date, pause = 5)
aapl_data.index = pd.to_datetime(aapl_data.index) # converts index to datetime dtype
close = aapl_data
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
close = close.reindex(all_weekdays)
print(aapl_data.index.dtype)
# Output:
datetime64[ns]
print(all_weekdays.dtype)
# Output:
datetime64[ns]
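The effect can be reproduced without the IEX download. This is a small sketch, assuming the original index holds date strings (as the dtype printout in the question suggests); the prices are invented.

```python
import pandas as pd

# Hypothetical prices whose index holds date strings rather than datetimes.
prices = pd.DataFrame(
    {"close": [225.0, 226.0, 227.0]},
    index=["2017-01-03", "2017-01-04", "2017-01-06"],
)
all_weekdays = pd.date_range("2017-01-03", "2017-01-06", freq="B")

# String labels never equal Timestamp labels, so nothing aligns.
before = prices.reindex(all_weekdays)

# After converting the index, only the missing weekday is NaN.
fixed = prices.copy()
fixed.index = pd.to_datetime(fixed.index)
after = fixed.reindex(all_weekdays)

print(before)
print(after)
```

Reindexing matches labels by equality, which is why the string index produced an all-NaN frame even though the dates looked identical when printed.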