Multiple fields using Pandas and Quandl - python-3.x

I am using Quandl to download daily NAV prices for a specific set of mutual fund schemes. However, it returns a pandas object rather than the value itself.
import quandl
import pandas as pd
quandl.ApiConfig.api_key = <Quandl Key>
list2 = [102505, 129221, 102142, 103197, 100614, 100474, 102913, 102921]
def get_nav(mf_code):
    df_main=pd.DataFrame()
    code=str(mf_code)
    df_main=quandl.get("AMFI/"+code,start_date='2019-04-05',end_date='2019-04-05')
    return (df_main['Net Asset Value'])
for each in list2:
    mf_code=each
    nav = get_nav(mf_code)
    print (nav)
Output for the above code:
Date
2019-04-05 29.8916
Name: Net Asset Value, dtype: float64
Date
2019-04-05 19.354
Name: Net Asset Value, dtype: float64
whereas I am looking to extract only the values, i.e. 29.8916, 19.354, etc.
Updated code:
def get_nav(mf_code):
    nav1=[]
    df_main=pd.DataFrame()
    code=str(mf_code)
    # try:
    df_main=quandl.get("AMFI/"+code,start_date='2019-04-05',end_date='2019-04-05')
    nav_value=df_main['Net Asset Value']
    if not nav_value.empty:
        nav1=nav_value[0]
        print(nav1)
    # print(df_main.head())
    # except IndexError:
    # nav_value=0
    return (nav1)
#Use merged sheet for work
df_port=pd.read_excel(fp_out)
df_port['Current Price']=df_port['Scheme_Code'].apply(lambda x:get_nav(x))
print(df_port['Current Price'].head())
df_port.to_excel(fp_out2)

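By default, the Quandl time-series API returns a DataFrame indexed by date, even if there is only one row, so selecting a column gives you a Series rather than a scalar.
If you only need the value in the first row, you can use iloc:
if not nav.empty:
    print (nav.iloc[0])
or plain integer indexing (recent pandas versions deprecate this positional fallback on a date-indexed Series, so iloc is the safer choice):
if not nav.empty:
    print (nav[0])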
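To see the difference between the Series and the scalar without calling the API, here is a minimal sketch. The DataFrame is built by hand as a stand-in for what quandl.get returns (an assumption about its shape, based on the output printed in the question), and the value is copied from that output.

```python
import pandas as pd

# Hand-built stand-in for what quandl.get(...) returns: one row, indexed by date.
# The value is copied from the question's printed output; no API call is made.
df_main = pd.DataFrame(
    {"Net Asset Value": [29.8916]},
    index=pd.DatetimeIndex(["2019-04-05"], name="Date"),
)

nav = df_main["Net Asset Value"]   # a Series, which prints with index and dtype

# Positional access returns the bare scalar; guard against an empty result.
nav_value = float(nav.iloc[0]) if not nav.empty else None
print(nav_value)
```

Returning None for an empty result is only one possible choice; the caller could equally use NaN or 0 depending on how the column is used later.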

Related

Replace items like A2 as AA in the dataframe

I have a list of items such as "A2BCO6" and "ABC2O6". I want to replace them as A2BCO6 --> AABCO6 and ABC2O6 --> ABCCO6. There are many more items than shown here.
My dataframe is like:
listAB:
Finctional_Group
0 Ba2NbFeO6
1 Ba2ScIrO6
3 MnPb2WO6
I created two parallel lists and tried to replace the values in the following way:
B = ["Ba2", "Pb2"]
C = ["BaBa", "PbPb"]
for i,j in range(len(B)), range(len(C)):
    listAB["Finctional_Group"]= listAB["Finctional_Group"].str.strip().str.replace(B[i], C[j])
But it does not produce the correct output. The output looks like this:
listAB:
Finctional_Group
0 PbPbNbFeO6
1 PbPbScIrO6
3 MnPb2WO6
Please suggest the necessary correction in the code.
Many thanks in advance.
For simplicity I used the chemparse package, which seems to suit your needs.
First we import the required packages, in this case chemparse and pandas.
import chemparse
import pandas as pd
Then we create a pandas.DataFrame object with your example data.
df = pd.DataFrame(
    columns=["Finctional_Group"], data=["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
)
Our parser function will use chemparse.parse_formula, which returns a dict mapping each element in a molecular formula to its count.
def parse_molecule(molecule: str) -> str:
    # initializing empty string
    molecule_in_string = ""
    # iterating over all key & values in dict
    for key, value in chemparse.parse_formula(molecule).items():
        # repeating each element symbol by its count and appending it to the string
        molecule_in_string += key * int(value)
    return molecule_in_string
molecule_in_string now contains the molecular formula without numbers. We just need to apply this function to every element of our dataframe column. For that we can do
# applymap is deprecated since pandas 2.1; DataFrame.map is its replacement there
df = df.applymap(parse_molecule)
print(df)
which returns:
0 BaBaNbFeOOOOOO
1 BaBaScIrOOOOOO
2 MnPbPbWOOOOOO
dtype: object
Source code for chemparse: https://gitlab.com/gmboyer/chemparse
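Note that the chemparse approach expands every count (O6 becomes OOOOOO), while the question only asks for a trailing 2 to be written out. A minimal sketch of that narrower replacement with a regular expression follows; expand_twos is a hypothetical helper name, and the pattern assumes element symbols are one capital letter optionally followed by one lowercase letter.

```python
import re

# One element symbol (capital letter plus optional lowercase letter)
# followed by a lone "2" that is not part of a longer number.
_DOUBLED = re.compile(r"([A-Z][a-z]?)2(?!\d)")

def expand_twos(formula: str) -> str:
    """Hypothetical helper: write 'X2' as 'XX' and leave other counts alone."""
    return _DOUBLED.sub(r"\1\1", formula)

formulas = ["Ba2NbFeO6", "Ba2ScIrO6", "MnPb2WO6"]
expanded = [expand_twos(f) for f in formulas]
print(expanded)
```

On a dataframe column the same pattern can be passed to Series.str.replace with regex=True, which avoids maintaining one search string per element.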

How to show alternative calendar dates in mplfinance?

TL;DR - The issue
I have an mplfinance plot based on a pandas dataframe in which the indices are in Gregorian calendar format, and I need to have them displayed in Jalali format.
My data and code
My data looks like this:
open high low close
date
2021-03-15 67330.0 69200.0 66870.0 68720.0
2021-03-16 69190.0 71980.0 69000.0 71620.0
2021-03-17 72450.0 73170.0 71700.0 71820.0
2021-03-27 71970.0 73580.0 70000.0 73330.0
2021-03-28 73330.0 73570.0 71300.0 71850.0
... ... ... ... ...
The first column is both the date and the index, which mplfinance requires in order to plot the data correctly.
I can plot it with something like this:
import mplfinance as mpf
mpf.plot(chart_data.tail(7), figratio=(16,9), type="candle", style='yahoo', ylabel='', tight_layout=True, xrotation=90)
Where chart_data is the data above and the rest are pretty much formatting stuff.
What I have now
My chart looks like this:
However, I need the dates to look like this: 1400-01-12. Here's a table of equivalence to further demonstrate my case.
2021-03-15 1399-12-25
2021-03-16 1399-12-26
2021-03-17 1399-12-27
2021-03-27 1400-01-07
2021-03-28 1400-01-08
What I've tried
Setting Jdates as my indices:
chart_data.index = history.jdate
mpf.plot(chart_data_j)
Throws this exception:
TypeError('Expect data.index as DatetimeIndex')
So I tried converting the jdates into datetimes:
chart_data_j.index = pd.to_datetime(history.jdate)
Which threw an out of bounds exception:
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1398-03-18 00:00:00
So I thought maybe changing the timezone/locale would be an option, and I tried changing the timezone, following the official docs:
pd.to_datetime(history.date).tz_localize(tz='US/Eastern')
But I got this exception:
raise TypeError(f"{ax_name} is not a valid DatetimeIndex or PeriodIndex")
And finally I tried using libraries such as PersianTools and pandas_jalali to no avail.
You can get this to work by creating your own custom DateFormatter class, and using mpf.plot() kwarg returnfig=True to gain access to the Axes objects to be able to install your own custom DateFormatter.
I have written a custom DateFormatter (see code below) that is aware of the special way that MPLfinance handles the x-axis when show_nontrading=False (i.e. the default value).
import pandas as pd
import mplfinance as mpf
import jdatetime as jd
import matplotlib.dates as mdates
from matplotlib.ticker import Formatter
class JalaliDateTimeFormatter(Formatter):
    """
    Formatter for JalaliDate in mplfinance.
    Handles both `show_nontrading=False` and `show_nontrading=True`.
    When show_nontrading=False, then the x-axis is indexed by an
    integer representing the row number in the dataframe, thus:
    Formatter for axis that is indexed by integer, where the integers
    represent the index location of the datetime object that should be
    formatted at that location. This formatter is used typically when
    plotting datetime on an axis but the user does NOT want to see gaps
    where days (or times) are missing. To use: plot the data against
    a range of integers equal in length to the array of datetimes that
    you would otherwise plot on that axis. Construct this formatter
    by providing the array of datetimes (as matplotlib floats). When
    the formatter receives an integer in the range, it will look up the
    datetime and format it.
    """
    def __init__(self, dates=None, fmt='%b %d, %H:%M', show_nontrading=False):
        self.dates = dates
        self.len = len(dates) if dates is not None else 0
        self.fmt = fmt
        self.snt = show_nontrading
    def __call__(self, x, pos=0):
        '''
        Return label for time x at position pos
        '''
        if self.snt:
            # x is already a matplotlib date number
            jdate = jd.date.fromgregorian(date=mdates.num2date(x))
            formatted_date = jdate.strftime(self.fmt)
            return formatted_date
        # x is a row position; look up the date stored for that row
        ix = int(round(x,0))
        if ix >= self.len or ix < 0:
            date = None
            formatted_date = ''
        else:
            date = self.dates[ix]
            jdate = jd.date.fromgregorian(date=mdates.num2date(date))
            formatted_date = jdate.strftime(self.fmt)
        return formatted_date
# ---------------------------------------------------
df = pd.read_csv('so_67001540.csv',index_col=0,parse_dates=True)
mpf.plot(df,figratio=(16,9),type="candle",style='yahoo',ylabel='',xrotation=90)
dates = [mdates.date2num(d) for d in df.index]
formatter = JalaliDateTimeFormatter(dates=dates,fmt='%Y-%m-%d')
fig, axlist = mpf.plot(df,figratio=(16,9),
                       type="candle",style='yahoo',
                       ylabel='',xrotation=90,
                       returnfig=True)
axlist[0].xaxis.set_major_formatter(formatter)
mpf.show()
The file 'so_67001540.csv' looks like this:
date,open,high,low,close,alt_date
2021-03-15,67330.0,69200.0,66870.0,68720.0,1399-12-25
2021-03-16,69190.0,71980.0,69000.0,71620.0,1399-12-26
2021-03-17,72450.0,73170.0,71700.0,71820.0,1399-12-27
2021-03-27,71970.0,73580.0,70000.0,73330.0,1400-01-07
2021-03-28,73330.0,73570.0,71300.0,71850.0,1400-01-08
When you run the above script, you should get the following two plots:
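The core of the integer-indexed case is just a lookup from tick position to row. The following sketch isolates that step in plain Python, using the alt_date column of the sample CSV as ready-made labels instead of converting with jdatetime; label_for_tick is a hypothetical helper, not part of mplfinance or matplotlib.

```python
from typing import Sequence

def label_for_tick(x: float, labels: Sequence[str]) -> str:
    """Hypothetical helper: map a tick position to the label of the nearest row."""
    ix = int(round(x))
    # Ticks that fall outside the plotted rows get an empty label.
    if ix < 0 or ix >= len(labels):
        return ""
    return labels[ix]

# The alt_date column from the sample CSV above.
alt_dates = ["1399-12-25", "1399-12-26", "1399-12-27", "1400-01-07", "1400-01-08"]

print(label_for_tick(3.2, alt_dates))        # nearest row is index 3
print(repr(label_for_tick(7.0, alt_dates)))  # out of range
```

Assuming show_nontrading=False, a function like this could be wrapped in matplotlib.ticker.FuncFormatter (which calls it with x and pos) and installed with set_major_formatter in the same way as the class above; that wiring is not exercised here.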
Have you tried making these dates
1399-12-25
1399-12-26
1399-12-27
1400-01-07
1400-01-08
the index of the dataframe (maybe that's what you mean by "swapping the indices"?) and set kwarg datetime_format='%Y-%m-%d' ?
I think that should work.
UPDATE:
It appears to me that the problem is that
mplfinance requires a pandas.DatetimeIndex as the index of your dataframe, and
pandas.DatetimeIndex is made up of pandas.Timestamp objects, and
pandas.Timestamp has limits which preclude dates having years less than 1677:
In [1]: import pandas as pd
In [2]: pd.Timestamp.max
Out[2]: Timestamp('2262-04-11 23:47:16.854775807')
In [3]: pd.Timestamp.min
Out[3]: Timestamp('1677-09-21 00:12:43.145225')
I am going to poke around and see if I can come up with another solution. Internally Matplotlib dates can go to year zero.

How to create a dictionary of dates as keys with value pair as list of three temperatures in python

The function extracts the max, min and avg temperatures for all days in the list. I want to combine the data into a dictionary; i.e. the returned temperatures and values and the dates as keys. Can't seem to get this to work. I may be going about this in the wrong way. End aim is to create a chart with date and the three temperatures for each day. I was anticipating something like: my_dict: {date,[list of 3 temps], date2,[list of 3 temps2]...}
lstdates=['09-27','09-28','09-29','09-30','10-1']
def daily_normals(date):
    """Daily Normals.
    Args:
        date (str): A date string in the format '%m-%d'
    Returns:
        A list of tuples containing the daily normals, tmin, tavg, and tmax
    """
    sel = [func.min(meas.tobs), func.avg(meas.tobs), func.max(meas.tobs)]
    return session.query(*sel).filter(func.strftime("%m-%d", meas.date) == date).all()
lstdaynorm=[]
my_dict ={}
for i in lstdates:
    print(i)
    dn=daily_normals(l)
    lstdaynorm.append(dn)
    my_dict.append(i,dn)
For starters, a dict object has no method called append, so my_dict.append(i,dn) raises an AttributeError. Also, your loop variable is i, but you called daily_normals on l. The query returns a list containing a single tuple (tmin, tavg, tmax), so convert that tuple to a list and assign it to the dict under the date key to achieve what you want:
lstdaynorm=[]
my_dict = {}
for i in lstdates:
    dn=daily_normals(i)
    lstdaynorm.append(dn)
    my_dict[i] = list(dn[0]) # take the (tmin, tavg, tmax) tuple out of the result list and convert it to a list
To put this in a dataframe:
import pandas as pd
df = pd.DataFrame.from_dict(my_dict, orient='index', columns=['tmin', 'tavg', 'tmax'])
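The same mapping can be written as a dict comprehension. The sketch below runs without a database: fake_daily_normals is a hypothetical stand-in that mimics the shape daily_normals returns (a list holding one tuple), and its numbers are placeholders rather than real measurements.

```python
from typing import Dict, List, Tuple

def fake_daily_normals(date: str) -> List[Tuple[float, float, float]]:
    """Hypothetical stand-in for daily_normals: one (tmin, tavg, tmax) tuple in a list.

    The numbers are placeholders, not real measurements.
    """
    placeholder = {"09-27": (60.0, 70.0, 80.0), "09-28": (61.0, 71.0, 81.0)}
    return [placeholder[date]]

dates = ["09-27", "09-28"]

# Date string -> [tmin, tavg, tmax]
normals: Dict[str, List[float]] = {d: list(fake_daily_normals(d)[0]) for d in dates}
print(normals)
```

A dict keyed by date with three-element lists is exactly the shape that DataFrame.from_dict with orient='index' expects, so it feeds straight into the charting step.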

How to slice a pandas.DatetimeIndex?

What is the best way to get dates between, say, '2019-01-08' and '2019-01-16', from the pandas.DatetimeIndex object dti as constructed below? Ideally, some concise syntax like dti['2019-01-08':'2019-01-16']?
import pandas as pd
dti = pd.bdate_range(start='2019-01-01', end='2019-02-15')
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
'2019-01-07', '2019-01-08', '2019-01-09', '2019-01-10',
'2019-01-11', '2019-01-14', '2019-01-15', '2019-01-16',
'2019-01-17', '2019-01-18', '2019-01-21', '2019-01-22',
'2019-01-23', '2019-01-24', '2019-01-25', '2019-01-28',
'2019-01-29', '2019-01-30', '2019-01-31', '2019-02-01',
'2019-02-04', '2019-02-05', '2019-02-06', '2019-02-07',
'2019-02-08', '2019-02-11', '2019-02-12', '2019-02-13',
'2019-02-14', '2019-02-15'],
dtype='datetime64[ns]', freq='B')
You can do it with DatetimeIndex.slice_indexer:
pandas.DatetimeIndex.slice_indexer(start, end, step)
It returns a slice of integer positions for the given date labels, which you can pass back to dti.
Example:
dti[dti.slice_indexer("2019-01-07", "2019-01-17")]
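For the exact range asked about in the question, a short sketch comparing slice_indexer with a boolean mask (the mask is an alternative added here, not something the answer proposes):

```python
import pandas as pd

dti = pd.bdate_range(start="2019-01-01", end="2019-02-15")

# Option 1: translate the date labels into a positional slice, then index with it.
by_slice = dti[dti.slice_indexer("2019-01-08", "2019-01-16")]

# Option 2: boolean mask; this also works when the index is not sorted.
by_mask = dti[(dti >= "2019-01-08") & (dti <= "2019-01-16")]

print(by_slice)
```

Both endpoints are included in either form, so the result covers the business days from 2019-01-08 through 2019-01-16.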
If you read the source code for the DatetimeIndex.__getitem__ method, the individual dates in a DatetimeIndex are stored in a DatetimeArray. To support slicing by date, you need the integer positions of the start and stop dates in that array. I suggest that you file a feature request with the pandas development team.
Meanwhile, you can monkey-patch it in:
from pandas.core.indexes.datetimes import DatetimeIndex
__old_getitem = DatetimeIndex.__getitem__
def __new_getitem(index, key):
    if isinstance(key, slice):
        _key = index.slice_indexer(key.start, key.stop, key.step)
    else:
        _key = key
    return __old_getitem(index, _key)
DatetimeIndex.__getitem__ = __new_getitem
# Now you can slice
dti['2019-01-08':'2019-01-16':4]

Pandas dataframe column float inside string (i.e. "float") to int

I'm trying to clean some data in a pandas df and I want the 'volume' column to go from a float to an int.
EDIT: The main issue was that the dtype of the column I was looking at was actually str, not float. So it first had to be converted to float before being converted to int.
I deleted the two other solutions I was considering, and left the one I used. The top one is the one with the errors, and the bottom one is the solution.
import pandas as pd
import numpy as np
#Call the df
t_df = pd.DataFrame(client.get_info())
#isolate only the 'symbol' column in t_df
tickers = t_df.loc[:, ['symbol']]
def tick_data(tickers):
    for i in tickers:
        tick_df = pd.DataFrame(client.get_ticker())
        tick = tick_df.loc[:, ['symbol', 'volume']]
        tick.iloc[:,['volume']].astype(int)
        if tick['volume'].dtype != np.number:
            print('yes')
        else:
            print('no')
    return tick
Below is the revised code:
import pandas as pd
#Call the df
def ticker():
    t_df = pd.DataFrame(client.get_info())
    #isolate only the 'symbol' column in t_df
    tickers = t_df.loc[:, ['symbol']]
    for i in tickers:
        #pulls out market data for each symbol
        tickers = pd.DataFrame(client.get_ticker())
        #isolates the symbol and volume
        tickers = tickers.loc[:, ['symbol', 'volume']]
        #floats volume
        tickers['volume'] = tickers.loc[:, ['volume']].astype(float)
        #volume to int
        tickers['volume'] = tickers.loc[:, ['volume']].astype(int)
        #keeps only symbols with volume >= 20,000, returns only symbol
        tickers = tickers.loc[tickers['volume'] >= 20000, 'symbol']
    return tickers
You have a few issues here.
In your first example, iloc only accepts integer locations for the rows and columns in the DataFrame, which is generating your error. I.e.
tick.iloc[:,['volume']].astype(int)
doesn't work. If you want label-based indexing, use .loc:
tick.loc[:,['volume']].astype(int)
Alternately, use bracket-based indexing, which allows you to take a whole column directly without using slice syntax (:) on the rows:
tick['volume'].astype(int)
Next, astype(int) returns a new object; it does not modify the column in place. So what you want is
tick['volume'] = tick['volume'].astype(int)
As for checking whether the dtype is numeric, comparing against np.number with == (or !=) does not do what you want, and neither does is, which is only True for np.number itself and not for a subclass like np.int64. Use np.issubdtype or pd.api.types.is_numeric_dtype, i.e.:
if np.issubdtype(tick['volume'].dtype, np.number):
or:
if pd.api.types.is_numeric_dtype(tick['volume'].dtype):
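Putting the pieces together for the case described in the question's edit, where the column holds floats stored as strings. This is a minimal sketch on made-up rows; the symbols and volumes are placeholders and no client call is involved.

```python
import pandas as pd

# Made-up rows: 'volume' arrives as strings that look like floats.
tick = pd.DataFrame({"symbol": ["AAA", "BBB"], "volume": ["25000.0", "150.7"]})
was_numeric = pd.api.types.is_numeric_dtype(tick["volume"])   # strings are not numeric

# String -> float -> int; astype returns a new Series, so assign it back.
tick["volume"] = tick["volume"].astype(float).astype(int)     # the fraction is truncated
is_numeric = pd.api.types.is_numeric_dtype(tick["volume"])

print(tick["volume"].tolist(), was_numeric, is_numeric)
```

Going through float first matters because a string such as "150.7" cannot be parsed directly as an integer, and the final cast truncates rather than rounds.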
