Problem: I am getting the updated value in the same first row each time, instead of a new row per tick.
Details: I am trying to get live data from an API. I have been trying for several days, but no code gives me a proper dataframe result.
I have a list of around 100-150 stocks to read from an .xlsx file. I want to collect live data into a dataframe and then write it to an .xlsx file, one stock per sheet (1st stock in the 1st sheet, 2nd stock in the 2nd sheet, and so on), in the format below:
Date Time Symbol ltp ltq Volume
2021-10-01 11:00:00 A 103.45 50430 110350470
2021-10-01 11:00:01 A 104.29 99500 110400900
2021-10-01 11:00:02 A 105.14 70570 110500400
2021-10-01 11:00:03 A 105.99 90640 110570970
2021-10-01 11:00:04 A 106.84 65710 110661610
2021-10-01 11:00:05 A 107.69 98780 110727320
2021-10-01 11:00:06 A 108.54 84850 110826100
2021-10-01 11:00:07 A 109.39 77920 110910950
2021-10-01 11:00:08 A 110.24 61990 110988870
2021-10-01 11:00:09 A 111.09 53060 111050860
2021-10-01 11:00:10 A 111.94 74130 111103920
and one main Dashboard sheet with a hyperlink to check each stock, e.g.:
Sr Stock Sheet Name ltp high open low previous close Today's volume
1 A 001_A
2 B 002_B
3 C 003_C
4 D 004_D
5 E 005_E
6 F 006_F
7 G 007_G
8 H 008_H
9 I 009_I
10 J 010_J
After setting the API credentials and access token, below is the main code:
import datetime
import pandas as pd
import xlwings as xw

socket_opened = False

def event_handler_quote_update(tick):
    tick_symbol = tick['instrument'].symbol
    tick_ltp = tick['ltp']
    tick_volume = tick['volume']
    tick_ltq = tick['ltq']
    tick_ltt = datetime.datetime.fromtimestamp(tick['ltt'])
    tick_timestamp = datetime.datetime.fromtimestamp(tick['exchange_time_stamp'])
    d = {'symbol': [tick_symbol], 'ltp': [tick_ltp], 'ltq': [tick_ltq],
         'volume': [tick_volume], 'ltt': [tick_ltt], 'timestamp': [tick_timestamp]}
    df = pd.DataFrame(data=d)
    # problem line: this always writes to A1, so each tick overwrites the previous one
    xw.Book('test1.xlsx').sheets['Sheet1'].range('A1').value = df
    print(df)

def open_callback():
    global socket_opened
    socket_opened = True

alice.start_websocket(event_handler_quote_update, open_callback, run_in_background=True)

for symbol in fno_list:
    # note: callable() only tests whether the object is callable; it does not subscribe
    callable(alice.get_instrument_by_symbol('NSE', symbol))

while not socket_opened:
    pass

alice.subscribe(alice.get_instrument_by_symbol('NSE', 'RELIANCE'), LiveFeedType.MARKET_DATA)
Code Ended.
Help: As I am new to Python and still learning, I would welcome any improvements, suggestions, or better ways to do this.
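The overwrite happens because the handler always writes to range 'A1'. Below is a minimal sketch of one way to append each tick as a new row instead, assuming xlwings and one sheet per symbol; write_tick and next_row are hypothetical names introduced here, not part of the original code:

import xlwings as xw

wb = xw.Book('test1.xlsx')  # assumes the workbook already exists
next_row = {}               # hypothetical per-sheet counter for the next empty row

def write_tick(sheet_name, row_values):
    # append one tick as a new row instead of overwriting A1
    row = next_row.get(sheet_name, 2)  # row 1 is left for headers
    wb.sheets[sheet_name].range(f'A{row}').value = row_values
    next_row[sheet_name] = row + 1

# inside event_handler_quote_update you could then call, for example:
# write_tick('Sheet1', [tick_timestamp, tick_symbol, tick_ltp, tick_ltq, tick_volume])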
I would like to create a function that returns a pandas series of datetime values for the Nth calendar day of each month of the current year. An added wrinkle is that I would also need it to be the previous business day if it happens to fall on a weekend. A bonus would be to check against known holidays as well.
For example, I'd like the output to look like this for the business day prior to or equal to the 14th day of each month:
0 2021-01-14
1 2021-02-12
2 2021-03-12
3 2021-04-14
4 2021-05-14
5 2021-06-14
6 2021-07-14
7 2021-08-13
8 2021-09-14
9 2021-10-14
10 2021-11-12
11 2021-12-14
I've tried using pd.date_range() and pd.bdate_range() and did not get the desired results. Example:
pd.date_range("2021-01-14","2021-12-14", periods=12)
DatetimeIndex(['2021-01-14 00:00:00',
'2021-02-13 08:43:38.181818182',
'2021-03-15 17:27:16.363636364',
'2021-04-15 02:10:54.545454546',
'2021-05-15 10:54:32.727272728',
'2021-06-14 19:38:10.909090910',
'2021-07-15 04:21:49.090909092',
'2021-08-14 13:05:27.272727272',
'2021-09-13 21:49:05.454545456',
'2021-10-14 06:32:43.636363640',
'2021-11-13 15:16:21.818181820',
'2021-12-14 00:00:00'],
dtype='datetime64[ns]', freq=None)
Additionally, this approach requires knowing the first and last month-days to use as the start and end. Analogous attempts with pd.bdate_range() mostly resulted in errors.
Similar approach to Pandas Date Range Monthly on Specific Day of Month, but subtract a BDay to get the previous business day. Also start at 12/31 of the previous year to get all values for the current year:
def get_date_range(day_of_month, year=pd.Timestamp.now().year):
    return (
        pd.date_range(start=pd.Timestamp(year=year - 1, month=12, day=31),
                      periods=12, freq='MS') +
        pd.Timedelta(days=day_of_month) -
        pd.tseries.offsets.BDay()
    )
Usage for the current year:
get_date_range(14)
DatetimeIndex(['2021-01-14', '2021-02-12', '2021-03-12', '2021-04-14',
'2021-05-14', '2021-06-14', '2021-07-14', '2021-08-13',
'2021-09-14', '2021-10-14', '2021-11-12', '2021-12-14'],
dtype='datetime64[ns]', freq=None)
Or for another year:
get_date_range(14, 2020)
DatetimeIndex(['2020-01-14', '2020-02-14', '2020-03-13', '2020-04-14',
'2020-05-14', '2020-06-12', '2020-07-14', '2020-08-14',
'2020-09-14', '2020-10-14', '2020-11-13', '2020-12-14'],
dtype='datetime64[ns]', freq=None)
With Holidays (this is non-vectorized so it will raise a PerformanceWarning):
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay
bday_us = CustomBusinessDay(calendar=USFederalHolidayCalendar())
def get_date_range(day_of_month, year=pd.Timestamp.now().year):
    return (
        pd.date_range(start=pd.Timestamp(year=year - 1, month=12, day=31),
                      periods=12, freq='MS') +
        pd.Timedelta(days=day_of_month) -
        bday_us
    )
get_date_range(25)
DatetimeIndex(['2021-01-25', '2021-02-25', '2021-03-25', '2021-04-23',
'2021-05-25', '2021-06-25', '2021-07-23', '2021-08-25',
'2021-09-24', '2021-10-25', '2021-11-24', '2021-12-23'],
dtype='datetime64[ns]', freq=None)
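The question asked for a pandas series; if a Series rather than a DatetimeIndex is needed, wrapping the result is enough (a small usage note added here, not from the original answer):

s = pd.Series(get_date_range(14))  # RangeIndex 0..11, dtype datetime64[ns]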
You can use the month's start and then add a timedelta to get it to the day you want. So for your example it would be:
pd.date_range(start=pd.Timestamp("2020-12-14"), periods=12, freq='MS') + pd.Timedelta(days=13)
Output:
DatetimeIndex(['2021-01-14', '2021-02-14', '2021-03-14', '2021-04-14',
'2021-05-14', '2021-06-14', '2021-07-14', '2021-08-14',
'2021-09-14', '2021-10-14', '2021-11-14', '2021-12-14'],
dtype='datetime64[ns]', freq=None)
To move to the previous business day, use (see: Pandas offset DatetimeIndex to next business if date is not a business day and Most recent previous business day in Python):
(pd.date_range(start=pd.Timestamp("2021-06-04"), periods=12, freq='MS') + pd.Timedelta(days=4)).map(lambda x: x - pd.tseries.offsets.BDay())
output:
DatetimeIndex(['2021-07-02', '2021-08-05', '2021-09-03', '2021-10-04',
'2021-11-04', '2021-12-03', '2022-01-06', '2022-02-04',
'2022-03-04', '2022-04-04', '2022-05-05', '2022-06-03'],
dtype='datetime64[ns]', freq=None)
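One caveat worth adding (not from the original answers): subtracting BDay() from a date that is already a business day still moves it back a full business day, which is why the first answer targets day N+1. To get "day N itself if it is a business day, otherwise the previous business day" directly, BDay().rollback is a possible sketch:

import pandas as pd
from pandas.tseries.offsets import BDay

# rollback() returns the date unchanged if it is already a business day,
# otherwise it rolls back to the previous business day
dates = pd.date_range("2021-01-01", periods=12, freq='MS') + pd.Timedelta(days=13)
print(dates.map(BDay().rollback))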
So, rather simply: I have a column with dates like this: "2000-05-30 17:27:00-05:00"
data open high low close
0 2000-05-30 17:27:00-05:00 0.9302 0.9302 0.9302 0.9302
1 2000-05-30 17:35:00-05:00 0.9304 0.9305 0.9304 0.9305
2 2000-05-30 17:38:00-05:00 0.9304 0.9304 0.9303 0.9303
3 2000-05-30 17:43:00-05:00 0.9301 0.9301 0.9300 0.9300
4 2000-05-30 17:44:00-05:00 0.9298 0.9298 0.9297 0.9297
I have tried the custom parser:
custom_parser = lambda x: datetime.strptime(x, "%Y-%m-%d %H:%M:%S-%z")
data = pd.read_csv('eurusd_2.csv', parse_dates=[0], date_parser=custom_parser)
but this doesn't work; I think it is due to the ":" in the timezone offset "-05:00". Any solutions for this?
Is there a way to specify the timezone format, similar to how one specifies the year/month/day format?
Many thanks in advance,
C
The simple:
df = pd.read_csv('file.csv', parse_dates=['data'], sep='\s\s+')
seems to work. df['data'] is:
0 2000-05-30 17:27:00-05:00
1 2000-05-30 17:35:00-05:00
2 2000-05-30 17:38:00-05:00
3 2000-05-30 17:43:00-05:00
4 2000-05-30 17:44:00-05:00
Name: data, dtype: datetime64[ns, pytz.FixedOffset(-300)]
Maybe your data contains some irregularities.
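A side note on the original strptime attempt (my addition): since Python 3.7, %z accepts offsets that contain a colon, so the custom parser also works once the stray "-" before %z is removed:

from datetime import datetime

# %z handles "-05:00" directly on Python 3.7+; the sign is part of the offset,
# so the format string must not contain a literal "-" before %z
ts = datetime.strptime("2000-05-30 17:27:00-05:00", "%Y-%m-%d %H:%M:%S%z")
print(ts.utcoffset())  # -1 day, 19:00:00, i.e. UTC-5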
I have a dataframe that contains missing data. I'm interested in exploring interpolation as a possible alternative to removing columns with missing data.
Below is a subset of the dataset. 'a_out' is outdoor temperature while 'b_in' etc. are temperatures from rooms in the same house.
a_out b_in c_in d_in e_in f_in
... ... ... ... ... ... ...
03/01/2016 6.51 17.71 15.15 14.04 15.27 16.32
04/01/2016 5.94 17.49 14.34 14.71
05/01/2016 6.74 17.57 14.80 15.18
06/01/2016 5.86 17.49 14.68 18.43 15.57
07/01/2016 5.18 17.18 14.02 14.88
08/01/2016 2.84 16.80 13.15 14.51 14.48
... ... ... ... ... ... ...
Might there be a way to interpolate the missing data, but with some weighting based on intact data in other columns? Perhaps 'cubic' interpolation could do the trick?
Thanks!
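Two possible sketches, assuming the frame is named df and has a DatetimeIndex. Plain df.interpolate() works strictly column by column along the index, so it cannot weight the other columns; for imputation that does use the intact columns, something like scikit-learn's IterativeImputer models each column as a function of the rest (a suggestion to explore, not a known-good recipe for this dataset):

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# per-column interpolation along the time index (no cross-column weighting)
df_time = df.interpolate(method='time')

# cross-column imputation: each column with missing values is regressed on the others
imputer = IterativeImputer(random_state=0)
df_filled = pd.DataFrame(imputer.fit_transform(df),
                         columns=df.columns, index=df.index)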
I have a stock data set like:
Date Open High ... Close Adj Close Volume
0 2014-09-17 465.864014 468.174011 ... 457.334015 457.334015 21056800
1 2014-09-18 456.859985 456.859985 ... 424.440002 424.440002 34483200
2 2014-09-19 424.102997 427.834991 ... 394.795990 394.795990 37919700
3 2014-09-20 394.673004 423.295990 ... 408.903992 408.903992 36863600
4 2014-09-21 408.084991 412.425995 ... 398.821014 398.821014 26580100
I need the cumulative sum of the columns Open, High, Close, Adj Close and Volume.
I tried df.cumsum(), but it shows a timestamp error.
I think for processing trade data it is best to create a DatetimeIndex:
#if necessary
#df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')
And then, if necessary, take the cumulative sum of all columns:
df = df.cumsum()
If you want the cumulative sum only for some columns:
cols = ['Open','High','Close','Adj Close','Volume']
df[cols] = df[cols].cumsum()
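Alternatively (a sketch added here, not from the original answer): if the Date column is the only non-numeric one, you can avoid listing the columns by selecting numeric dtypes:

# restrict the cumulative sum to numeric columns only
num = df.select_dtypes('number')
df[num.columns] = num.cumsum()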