Max and min values in pandas - python-3.x

I have the following data:
High Low Open Close Volume Adj Close
Date
1999-12-31 1472.420044 1458.189941 1464.469971 1469.250000 374050000 1469.250000
2000-01-03 1478.000000 1438.359985 1469.250000 1455.219971 931800000 1455.219971
2000-01-04 1455.219971 1397.430054 1455.219971 1399.420044 1009000000 1399.420044
2000-01-05 1413.270020 1377.680054 1399.420044 1402.109985 1085500000 1402.109985
2000-01-06 1411.900024 1392.099976 1402.109985 1403.449951 1092300000 1403.449951
... ... ... ... ... ... ...
2020-01-06 3246.840088 3214.639893 3217.550049 3246.280029 3674070000 3246.280029
2020-01-07 3244.909912 3232.429932 3241.860107 3237.179932 3420380000 3237.179932
2020-01-08 3267.070068 3236.669922 3238.590088 3253.050049 3720890000 3253.050049
2020-01-09 3275.580078 3263.669922 3266.030029 3274.699951 3638390000 3274.699951
2020-01-10 3282.989990 3268.010010 3281.810059 3273.739990 920449258 3273.739990
5039 rows × 6 columns
Since this is daily data, it was resampled to weekly frequency to find the 52-week high and low:
weekly_high = data.High.groupby(pd.Grouper(freq='M')).tail(52)
weekly_low = data.Low.groupby(pd.Grouper(freq='M')).tail(52)
Here is the problem:
weekly_high.max()
yields: 3282.989990234375
weekly_low.min()
yields: 666.7899780273438
These values are an issue: 3283.0 is the high, so why am I getting decimals? Secondly, the weekly low is 666, which I know for a fact is incorrect. How can I fix this?

Hi, you can try the following code:
data['52weekhigh'] = data.High.rolling(252).max()
data['52weeklow'] = data.Low.rolling(252).min()
This avoids having to resample on a monthly basis and gives you the rolling 52-week high (52 weeks ≈ 252 trading days). Let me know if you need any further clarification.
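As a runnable illustration of the rolling approach (on synthetic prices, since the original CSV isn't available here):

```python
import numpy as np
import pandas as pd

# Synthetic daily price data standing in for the original CSV
idx = pd.date_range("2018-01-01", periods=500, freq="B")
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, len(idx)).cumsum(), index=idx)

data = pd.DataFrame({"High": close + 1.0, "Low": close - 1.0})

# 52 weeks of trading ~= 252 trading days
data["52weekhigh"] = data.High.rolling(252).max()
data["52weeklow"] = data.Low.rolling(252).min()

# Each row's 52-week high is the max of the preceding 252 daily highs
print(data[["52weekhigh", "52weeklow"]].tail(1))
```

The first 251 rows are NaN because a full 252-day window is not yet available there.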

Related

API - Dataframe - getting updated value in the same 1st row instead of new row each time

Problem: (I am getting the updated value in the same first row instead of a new row each time.)
Details: I am trying to get live data from an API. I have been trying for several days, but no code gives proper DataFrame results.
I have a list of around 100-150 stocks to be read from an .xlsx file. I am trying to get live data into a DataFrame and then write it to an .xlsx file (1st stock in the 1st sheet, 2nd stock in the 2nd sheet) in the format below:
Date Time Symbol ltp ltq Volume
2021-10-01 11:00:00 A 103.45 50430 110350470
2021-10-01 11:00:01 A 104.29 99500 110400900
2021-10-01 11:00:02 A 105.14 70570 110500400
2021-10-01 11:00:03 A 105.99 90640 110570970
2021-10-01 11:00:04 A 106.84 65710 110661610
2021-10-01 11:00:05 A 107.69 98780 110727320
2021-10-01 11:00:06 A 108.54 84850 110826100
2021-10-01 11:00:07 A 109.39 77920 110910950
2021-10-01 11:00:08 A 110.24 61990 110988870
2021-10-01 11:00:09 A 111.09 53060 111050860
2021-10-01 11:00:10 A 111.94 74130 111103920
and in 1 Main sheet Dashboard with hyperlink to check each stock:
e.g.
Sr StockSheet Name ltp high open low previous close Today's volume
1 A 001_A
2 B 002_B
3 C 003_C
4 D 004_D
5 E 005_E
6 F 006_F
7 G 007_G
8 H 008_H
9 I 009_I
10 J 010_J
After setting the API credentials and access token, below is the main code:
socket_opened = False

def event_handler_quote_update(tick):
    tick_symbol = tick['instrument'].symbol
    tick_ltp = tick['ltp']
    tick_volume = tick['volume']
    tick_ltq = tick['ltq']
    tick_ltt = datetime.datetime.fromtimestamp(tick['ltt'])
    tick_timestamp = datetime.datetime.fromtimestamp(tick['exchange_time_stamp'])
    d = {'symbol': [tick_symbol], 'ltp': [tick_ltp], 'ltq': [tick_ltq], 'volume': [tick_volume], 'ltt': [tick_ltt], 'timestamp': [tick_timestamp]}
    df = pd.DataFrame(data=d)
    xw.Book('test1.xlsx').sheets['Sheet1'].range('A1').value = df
    print(df)

def open_callback():
    global socket_opened
    socket_opened = True

alice.start_websocket(event_handler_quote_update, open_callback, run_in_background=True)
call = ()
for symbol in fno_list:
    callable(alice.get_instrument_by_symbol('NSE', symbol))
while socket_opened == False:
    pass
alice.subscribe(alice.get_instrument_by_symbol('NSE', 'RELIANCE'), LiveFeedType.MARKET_DATA)
Code Ended.
Help: As I am new to and still learning Python, I am looking forward to any improvements/suggestions/better ways to do this.
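One likely cause of the symptom described: each tick builds a one-row DataFrame and writes it to the same cell range('A1'), so the first row is overwritten on every callback. A minimal sketch of accumulating rows instead; the alice/xlwings objects from the question aren't available here, so the callback is simulated with made-up ticks:

```python
import pandas as pd

rows = []  # accumulate one dict per tick instead of overwriting one row

def handle_tick(tick):
    # Each tick becomes a new row; keys follow the question's tick fields
    rows.append({
        "symbol": tick["symbol"],
        "ltp": tick["ltp"],
        "ltq": tick["ltq"],
        "volume": tick["volume"],
    })

# Simulated ticks standing in for the websocket callback
for i, price in enumerate([103.45, 104.29, 105.14]):
    handle_tick({"symbol": "A", "ltp": price, "ltq": 50000 + i,
                 "volume": 110_000_000 + i})

df = pd.DataFrame(rows)  # one row per tick
print(df)
```

The growing DataFrame (or just its newest rows) can then be written to the sheet, rather than rebuilding a one-row frame per tick.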

How to create a Series of datetime values for the nth calendar day of the month for this year?

I would like to create a [function that returns] a pandas series of datetime values for the Nth calendar day of each month for the current year. An added wrinkle is I would also need it to be the previous business day if it happens to fall on the weekend. Bonus would be to check against known holidays as well.
For example, I'd like the output to look like this for the [business day prior to or equal to the] 14th day of the month
0 2021-01-14
1 2021-02-12
2 2021-03-12
3 2021-04-14
4 2021-05-14
5 2021-06-14
6 2021-07-14
7 2021-08-13
8 2021-09-14
9 2021-10-14
10 2021-11-12
11 2021-12-14
I've tried using pd.date_range() and pd.bdate_range() and did not get the desired results. Example:
pd.date_range("2021-01-14","2021-12-14", periods=12)
>> DatetimeIndex(['2021-01-14 00:00:00',
'2021-02-13 08:43:38.181818182',
'2021-03-15 17:27:16.363636364',
'2021-04-15 02:10:54.545454546',
'2021-05-15 10:54:32.727272728',
'2021-06-14 19:38:10.909090910',
'2021-07-15 04:21:49.090909092',
'2021-08-14 13:05:27.272727272',
'2021-09-13 21:49:05.454545456',
'2021-10-14 06:32:43.636363640',
'2021-11-13 15:16:21.818181820',
'2021-12-14 00:00:00'],
dtype='datetime64[ns]', freq=None)>>
Additionally this requires knowing the first and last month days that would be the start and end. Analogous tests with pd.bdate_range() resulted mostly in errors.
A similar approach to Pandas Date Range Monthly on Specific Day of Month, but subtract a BDay to get the previous business day. Also start at 12/31 of the previous year to get all values for the current year:
def get_date_range(day_of_month, year=pd.Timestamp.now().year):
    return (
        pd.date_range(start=pd.Timestamp(year=year - 1, month=12, day=31),
                      periods=12, freq='MS') +
        pd.Timedelta(days=day_of_month) -
        pd.tseries.offsets.BDay()
    )
Usage for year:
get_date_range(14)
DatetimeIndex(['2021-01-14', '2021-02-12', '2021-03-12', '2021-04-14',
'2021-05-14', '2021-06-14', '2021-07-14', '2021-08-13',
'2021-09-14', '2021-10-14', '2021-11-12', '2021-12-14'],
dtype='datetime64[ns]', freq=None)
Or for another year:
get_date_range(14, 2020)
DatetimeIndex(['2020-01-14', '2020-02-14', '2020-03-13', '2020-04-14',
'2020-05-14', '2020-06-12', '2020-07-14', '2020-08-14',
'2020-09-14', '2020-10-14', '2020-11-13', '2020-12-14'],
dtype='datetime64[ns]', freq=None)
With Holidays (this is non-vectorized so it will raise a PerformanceWarning):
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
def get_date_range(day_of_month, year=pd.Timestamp.now().year):
    return (
        pd.date_range(start=pd.Timestamp(year=year - 1, month=12, day=31),
                      periods=12, freq='MS') +
        pd.Timedelta(days=day_of_month) -
        bday_us
    )
get_date_range(25)
DatetimeIndex(['2021-01-25', '2021-02-25', '2021-03-25', '2021-04-23',
'2021-05-25', '2021-06-25', '2021-07-23', '2021-08-25',
'2021-09-24', '2021-10-25', '2021-11-24', '2021-12-23'],
dtype='datetime64[ns]', freq=None)
You can use the month start and then add a timedelta to get to the day you want. So for your example it would be:
pd.date_range(start=pd.Timestamp("2020-12-14"), periods=12, freq='MS') + pd.Timedelta(days=13)
Output:
DatetimeIndex(['2021-01-14', '2021-02-14', '2021-03-14', '2021-04-14',
'2021-05-14', '2021-06-14', '2021-07-14', '2021-08-14',
'2021-09-14', '2021-10-14', '2021-11-14', '2021-12-14'],
dtype='datetime64[ns]', freq=None)
To move to the previous business day, use (see: Pandas offset DatetimeIndex to next business day if date is not a business day and Most recent previous business day in Python):
(pd.date_range(start=pd.Timestamp("2021-06-04"), periods=12, freq='MS') + pd.Timedelta(days=4)).map(lambda x: x - pd.tseries.offsets.BDay())
output:
DatetimeIndex(['2021-07-02', '2021-08-05', '2021-09-03', '2021-10-04',
'2021-11-04', '2021-12-03', '2022-01-06', '2022-02-04',
'2022-03-04', '2022-04-04', '2022-05-05', '2022-06-03'],
dtype='datetime64[ns]', freq=None)

How to Parse Timezone Datetime objects with Pandas

So, rather simply, I have columns with dates like this: "2000-05-30 17:27:00-05:00"
data open high low close
0 2000-05-30 17:27:00-05:00 0.9302 0.9302 0.9302 0.9302
1 2000-05-30 17:35:00-05:00 0.9304 0.9305 0.9304 0.9305
2 2000-05-30 17:38:00-05:00 0.9304 0.9304 0.9303 0.9303
3 2000-05-30 17:43:00-05:00 0.9301 0.9301 0.9300 0.9300
4 2000-05-30 17:44:00-05:00 0.9298 0.9298 0.9297 0.9297
I have tried the custom parser:
custom_parser = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S-%z")
data = pd.read_csv('eurusd_2.csv', parse_dates=[0], date_parser=custom_parser)
but this doesn't work; I think it is due to the ":" in the timezone "-05:00". Any solutions for this?
Is there a way to specify the timezone format, similar to how one specifies the year/month/day format?
Many thanks in advance,
C
The simple:
df = pd.read_csv('file.csv', parse_dates=['data'], sep='\s\s+')
seems to work. df['data'] is:
0 2000-05-30 17:27:00-05:00
1 2000-05-30 17:35:00-05:00
2 2000-05-30 17:38:00-05:00
3 2000-05-30 17:43:00-05:00
4 2000-05-30 17:44:00-05:00
Name: data, dtype: datetime64[ns, pytz.FixedOffset(-300)]
Maybe your data contains some irregularities.
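For reference, pd.to_datetime parses ISO 8601 offsets such as -05:00 directly, so a custom strptime format is usually unnecessary; a minimal sketch on the values from the question:

```python
import pandas as pd

s = pd.Series([
    "2000-05-30 17:27:00-05:00",
    "2000-05-30 17:35:00-05:00",
    "2000-05-30 17:38:00-05:00",
])

# ISO 8601 offsets (including the colon) are handled natively
parsed = pd.to_datetime(s)
print(parsed.dtype)
```

With a single uniform offset the result is a tz-aware series with a fixed -05:00 offset; for mixed offsets, passing utc=True converts everything to UTC.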

Is it possible to interpolate across a time series with influence from other columns?

I have a dataframe that contains missing data. I'm interested in exploring interpolation as a possible alternative to removing columns with missing data.
Below is a subset of the dataset. 'a_out' is outdoor temperature while 'b_in' etc. are temperatures from rooms in the same house.
a_out b_in c_in d_in e_in f_in
... ... ... ... ... ... ...
03/01/2016 6.51 17.71 15.15 14.04 15.27 16.32
04/01/2016 5.94 17.49 14.34 14.71
05/01/2016 6.74 17.57 14.80 15.18
06/01/2016 5.86 17.49 14.68 18.43 15.57
07/01/2016 5.18 17.18 14.02 14.88
08/01/2016 2.84 16.80 13.15 14.51 14.48
... ... ... ... ... ... ...
Might there be a way to interpolate the missing data, but with some weighting based on intact data in other columns? Perhaps 'cubic' interpolation could do the trick?
Thanks!
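For context, pandas' built-in interpolate fills each column independently along the index, so it cannot draw on the other rooms' temperatures; a minimal sketch of time-based interpolation on made-up values shaped like the table above (for genuine cross-column influence, a model-based imputer such as scikit-learn's IterativeImputer would be needed):

```python
import numpy as np
import pandas as pd

# Hypothetical values in the shape of the table above
idx = pd.to_datetime(["2016-01-03", "2016-01-04", "2016-01-05", "2016-01-06"])
df = pd.DataFrame({
    "a_out": [6.51, 5.94, 6.74, 5.86],
    "b_in": [17.71, np.nan, np.nan, 17.49],
}, index=idx)

# interpolate() fills each column along the time index on its own;
# the other columns do not influence the result
filled = df.interpolate(method="time")
print(filled["b_in"])
```

With evenly spaced daily timestamps, method="time" reduces to linear interpolation between the surrounding known values.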

Cumulative sum of all columns except the date column in Python with cumsum()

I have a stock data set like:
**Date Open High ... Close Adj Close Volume**
0 2014-09-17 465.864014 468.174011 ... 457.334015 457.334015 21056800
1 2014-09-18 456.859985 456.859985 ... 424.440002 424.440002 34483200
2 2014-09-19 424.102997 427.834991 ... 394.795990 394.795990 37919700
3 2014-09-20 394.673004 423.295990 ... 408.903992 408.903992 36863600
4 2014-09-21 408.084991 412.425995 ... 398.821014 398.821014 26580100
I need to cumulatively sum the columns Open, High, Close, Adj Close, and Volume.
I tried df.cumsum(), but it shows a timestamp error.
I think for processing trade data it is best to create a DatetimeIndex:
#if necessary
#df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
And then, if necessary, take the cumulative sum of all columns:
df = df.cumsum()
If you want the cumulative sum only for some columns:
cols = ['Open','High','Close','Adj Close','Volume']
df[cols] = df[cols].cumsum()
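Putting the steps together as a self-contained sketch (with a few made-up rows in the question's shape):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2014-09-17", "2014-09-18", "2014-09-19"]),
    "Open": [465.864014, 456.859985, 424.102997],
    "Volume": [21056800, 34483200, 37919700],
})

df = df.set_index("Date")  # move the datetime column out of cumsum's way

cols = ["Open", "Volume"]
df[cols] = df[cols].cumsum()  # note: df[cols].cumsum(), not df.cumsum()
print(df)
```

With the Date column as the index, cumsum only sees numeric columns, which avoids the timestamp error from the question.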
