I'm learning to use backtrader and I've come across a problem when trying to print out the data feed. It correctly prints the day, open, high, low, close and volume, but the hour and minute portion seems to default to 23:59:59.999989 on every line.
Here is a sample of the data source:
datetime,open,high,low,close,volume,,
11/2/2020 9:30,330.187,330.188,329.947,330.038,4.79,,
11/2/2020 9:31,330.038,330.438,329.538,329.677,5.49,,
11/2/2020 9:32,329.667,330.248,329.577,330.117,5.8,,
11/2/2020 9:33,330.128,330.328,329.847,329.948,5.59,,
11/2/2020 9:34,329.967,330.308,329.647,329.698,6.24,,
and the code I use to add the data to backtrader is:
data = bt.feeds.GenericCSVData(
    dataname='SPY_11_2020_1M.txt',
    name='SPY',
    datetime=0,
    dtformat='%m/%d/%Y %H:%M',
    period=bt.TimeFrame.Ticks,
    compression=1,
    fromdate=params['fromdate'],
    todate=params['todate'],
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    openinterest=-1,
)
cerebro.adddata(data)
My code for the strategy, which is a simple buy-and-hold strategy, is:
import backtrader as bt
from datetime import datetime as dt
class BuyHold(bt.Strategy):
    def __init__(self):
        # self.time = self.datas[0].datetime.datetime(0),
        self.open = self.datas[0].open
        self.high = self.datas[0].high
        self.low = self.datas[0].low
        self.close = self.datas[0].close
        self.volume = self.datas[0].volume

    def next(self):
        print('{0} {1}\t{2}\t{3}\t{4}\t{5}\t{6}'.format(
            self.datas[0].datetime.date(0),
            self.datas[0].datetime.time(0),
            self.open[0],
            self.high[0],
            self.low[0],
            self.close[0],
            self.volume[0]
        ))
        # print('{0}\t{1}\t{2}\t{3}\t{4}\t{5}'.format(
        #     self.time,
        #     self.open[0],
        #     self.high[0],
        #     self.low[0],
        #     self.close[0],
        #     self.volume[0]
        # ))
        if self.position.size == 0:
            size = int(self.broker.getcash() / self.data)
            self.buy(size=size)
The printout I get is:
2020-11-02 23:59:59.999989 330.187 330.188 329.947 330.038 4.79
2020-11-02 23:59:59.999989 330.038 330.438 329.538 329.677 5.49
2020-11-02 23:59:59.999989 329.667 330.248 329.577 330.117 5.8
2020-11-02 23:59:59.999989 330.128 330.328 329.847 329.948 5.59
2020-11-02 23:59:59.999989 329.967 330.308 329.647 329.698 6.24
I also tried it with the commented-out self.time and the commented-out print lines, which gives a similar result in a slightly different format:
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 330.187 330.188 329.947 330.038 4.79
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 330.038 330.438 329.538 329.677 5.49
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 329.667 330.248 329.577 330.117 5.8
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 330.128 330.328 329.847 329.948 5.59
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 329.967 330.308 329.647 329.698 6.24
(datetime.datetime(2020, 11, 2, 23, 59, 59, 999989),) 329.698 330.198 329.568 329.948 6.51
I don't know what I'm missing here.
I struggled with this problem for some days; I use isoformat to read the date and time.
Strategy:
class my_strategy1(bt.Strategy):
    def log(self, txt, dt=None):
        '''Logging function for this strategy'''
        # dt = dt or self.datas[0].datetime.date(0)
        # the following will print the date, e.g. 2021-08-06
        print(self.datas[0].datetime.date(0).isoformat())
        # the following will print the time, e.g. 08:09:01
        print(self.datas[0].datetime.time(0).isoformat())
        print(txt)

    def __init__(self):
        pass

    def next(self):
        self.log('Close, %.2f' % self.datas[0].close[0])
        print("trend is", self.datas[0].lines.trend[0])
Data feed class:
class My_CSVData(bt.feeds.GenericCSVData):
    """
    How to append an extra data column (prepared in Excel) to the CSV.
    """
    lines = ('trend',)
    params = (
        ('trend', -1),
    )

def get_data_via_excel(path):
    datatest = My_CSVData(
        dataname=path,
        timeframe=bt.TimeFrame.Minutes,
        compression=60,
        dtformat='%Y-%m-%d %H:%M:%S',
        tmformat='%H:%M:%S',
        fromdate=datetime(2021, 4, 16),
        todate=datetime(2021, 7, 30),
        datetime=0,
        open=1,
        high=2,
        low=3,
        close=4,
        volume=5,
        openinterest=6,
        trend=7,  # use the real column index, not -1
    )
    return datatest
Data source:
datetime,open,high,low,close,volume,openinterest,trend
2021-04-16 09:59:00,5138,5144,5109,5117,200,0,-2
2021-04-16 11:00:00,5117,5122,5089,5102,200,0,-2
2021-04-16 11:29:00,5103,5118,5096,5105,200,0,-1
2021-04-16 14:00:00,5105,5152,5105,5142,200,0,0
2021-04-16 15:00:00,5141,5142,5111,5116,200,0,1
2021-04-16 21:59:00,5122,5141,5116,5129,200,0,0
2021-04-16 23:00:00,5128,5136,5108,5120,200,0,0
This problem also took me a few hours, and I found the solution on another site.
For minute data, tell cerebro that you are using minute data (timeframe) and how many minutes per bar (compression):
data = bt.feeds.GenericCSVData(
    dataname='SPY_11_2020_1M.txt',
    name='SPY',
    datetime=0,
    dtformat='%m/%d/%Y %H:%M',
    timeframe=bt.TimeFrame.Minutes,  # <-- the added line
    period=bt.TimeFrame.Ticks,
    compression=1,
    fromdate=params['fromdate'],
    todate=params['todate'],
    open=1,
    high=2,
    low=3,
    close=4,
    volume=5,
    openinterest=-1,
)
I am generating dates between 01-01-2010 and 31-01-2010 with a gap of 1 second as follows:
import datetime

dt = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2010, 1, 31, 23, 59, 59)
step = datetime.timedelta(seconds=1)

result = []
while dt < end:
    result.append(dt.strftime('%Y-%m-%d %H:%M:%S'))
    dt += step
The result it gives is perfectly fine, but it takes about 10 seconds to produce. I was wondering if the same can be achieved using pandas, so that it runs a bit quicker.
Use date_range with DatetimeIndex.strftime:
result = pd.date_range(dt, end, freq='S').strftime('%Y-%m-%d %H:%M:%S').tolist()
Or:

result = pd.date_range('2010-01-01',
                       '2010-01-31 23:59:59', freq='S').strftime('%Y-%m-%d %H:%M:%S').tolist()
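A quick sanity check, sketched over a short window, that the vectorized call reproduces the loop's strings (note that date_range includes the end point while the loop stops before it):

```python
import datetime
import pandas as pd

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2010, 1, 1, 0, 0, 5)  # short window to keep the demo fast

# the pure-Python loop from the question
result_loop = []
dt = start
while dt < end:
    result_loop.append(dt.strftime('%Y-%m-%d %H:%M:%S'))
    dt += datetime.timedelta(seconds=1)

# vectorized version; date_range is inclusive of `end`, the loop is not,
# so drop the last element to compare like with like
result_pd = (pd.date_range(start, end, freq='s')
             .strftime('%Y-%m-%d %H:%M:%S')
             .tolist()[:-1])

assert result_loop == result_pd
```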
Or try the following (note that freq='1T' is minute frequency; use '1S' for one-second steps):

l = (pd.DataFrame(columns=['NULL'],
                  index=pd.date_range('2010-01-01T00:00:00Z', '2010-01-31T00:00:00Z',
                                      freq='1T'))
     .index.strftime('%Y-%m-%dT%H:%M:%SZ')
     .tolist())
print(l)
The following coding represents a candlestick chart in bokeh:
from math import pi
import pandas as pd
from bokeh.plotting import figure, show, output_file
from bokeh.sampledata.stocks import MSFT
df = pd.DataFrame(MSFT)[:50]
df["date"] = pd.to_datetime(df["date"])
mids = (df.open + df.close)/2
spans = abs(df.close-df.open)
inc = df.close > df.open
dec = df.open > df.close
w = 12*60*60*1000 # half day in ms
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"
p = figure(x_axis_type="datetime", tools=TOOLS, plot_width=1000, toolbar_location="left")
#p.title = "MSFT Candlestick"
p.xaxis.major_label_orientation = pi/4
p.grid.grid_line_alpha=0.3
p.segment(df.date, df.high, df.date, df.low, color="black")
p.rect(df.date[inc], mids[inc], w, spans[inc], fill_color="#D5E1DD", line_color="black")
p.rect(df.date[dec], mids[dec], w, spans[dec], fill_color="#F2583E", line_color="black")
output_file("candlestick.html", title="candlestick.py example")
show(p) # open a browser
As you can see in the result, the x-axis labels fall on March 1st, March 15th, etc. Is there a way to increase the frequency, so that the next date after March 1st would be, for example, March 5th?
The Bokeh documentation offers several options. In some cases, setting desired_num_ticks like this can help:
p.xaxis[0].ticker.desired_num_ticks = 20
Or you could try for example:
from bokeh.models import DaysTicker
p.xaxis[0].ticker = DaysTicker(days = [1, 5, 10, 15, 20, 25, 30])
I am trying to insert NULL values into a table depending on the date. If the date is between today's date and 3 months back (which would be February), then I want to update the selected columns to NULL.
The Traceback is as following:
Traceback (most recent call last):
File "C:\projects\docs\script.py", line 41, in <module>
if dt < date_sql < dr3:
TypeError: '<' not supported between instances of 'datetime.datetime' and 'pyodbc.Row'
I've been struggling with this for a long time, so I'd really appreciate your guidance, as I have been unable to find a solution.
Python code is:
import pyodbc
from datetime import date, datetime
import dateutil.relativedelta
conn = pyodbc.connect(
    r'DRIVER={SQL Server};'
    r'SERVER=server;'
    r'DATABASE=db;'
)
dt = datetime.today()
dr3 = dt - dateutil.relativedelta.relativedelta(months=3)
print(dr3)
cursor = conn.cursor()
sent_date = cursor.execute("""SELECT TOP 30 sent_date, id
                              FROM Department.Customer""")

def fetch_date():
    for row in sent_date:
        r = row
        print(r)
    return r

date_sql = fetch_date()

if dt < date_sql < dr3:
    try:
        value = None
        cursor.execute("""UPDATE Department.Customer SET name=?, address=?, email=?,
                          phone=?""", (value, value, value, value))
        cursor.commit()
    except pyodbc.Error as ex:
        print(str(ex))
        cursor.rollback()

cursor.close()
Output from print(dr3) is:
2018-02-28 17:19:50.452290
Output from print(r) in fetch_date() function is:
(datetime.datetime(2018, 5, 22, 10, 21, 36), 1)
(datetime.datetime(2018, 5, 22, 10, 21, 36), 2)
(datetime.datetime(2018, 5, 22, 10, 21, 36), 3)
...
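For what it's worth, the traceback points at the cause: date_sql is a whole pyodbc.Row (a (sent_date, id) pair), not a datetime, and the bounds are also reversed, since dr3 (three months ago) should be the lower bound. A minimal stdlib-only sketch of the comparison, with a plain tuple standing in for the Row and timedelta(days=90) approximating relativedelta(months=3):

```python
import datetime

dt = datetime.datetime.today()
dr3 = dt - datetime.timedelta(days=90)  # ~3 months, stdlib stand-in for relativedelta

# stand-in for one pyodbc.Row: (sent_date, id)
row = (datetime.datetime(2018, 5, 22, 10, 21, 36), 1)

sent_date = row[0]                # pull the datetime column out of the row first
is_recent = dr3 < sent_date < dt  # lower bound (3 months ago) comes first
```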
I have a data set with two labels: label 1 = 0 (case) and label 2 = 1 (control). I have already calculated the mean for the two labels. Furthermore, I need to calculate a two-sample t-test (dependent) and a two-sample rank-sum test. My data set looks like:
SRA ID    ERR169499             ERR169500             ERR169501             mean_ctrl  mean_case
Label     1                     0                     1
TaxID     PRJEB3251_ERR169499   PRJEB3251_ERR169500   PRJEB3251_ERR169501
333046    0.05                  0                     0.4
1049      0.03                  0.9                   0
337090    0.01                  0.6                   0.7
I am new to statistics. The code I have so far is:
import numpy as np

label = []
data = {}
file1 = open('final_out_transposed.csv', 'rt')
for r in file1:
    datas = r.split(',')
    if datas[0] == ' Label':
        label.append(r.split(",")[1:])
label = label[0]
label[-1] = label[-1].replace('\n', '')
counter = len(label)

file1.seek(0)  # rewind before the second pass
for row in file1:
    content = row.split(',')
    if content[0] == 'SRA ID' or content[0] == 'TaxID' or content[0] == ' Label':
        pass
    else:
        dt = row.split(',')
        dt[-1] = dt[-1].replace('\n', '')
        data[dt[0]] = dt[1:]

keys = list(data)
sum_file = open('sum.csv', 'w')
for key in keys:
    sum_case = 0
    sum_ctrl = 0
    count_case = 0
    count_ctrl = 0
    mean_case = 0
    mean_ctrl = 0
    print(len(label))
    for i in range(counter):
        print(i)
        if label[i] == '0' or label[i] == 0:
            sum_case = np.float64(sum_case) + np.float64(data[key][i])
            count_case = count_case + 1
            mean_case = sum_case / count_case
        else:
            sum_ctrl = np.float64(sum_ctrl) + np.float64(data[key][i])
            count_ctrl = count_ctrl + 1
            mean_ctrl = sum_ctrl / count_ctrl
Any help will be highly appreciated.
Instead of using open to read your csv file, I would use pandas. That will place it in a DataFrame, which will be easier to work with:
import pandas as pd
data_frame = pd.read_csv('final_out_transposed.csv')
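As a sketch of why the DataFrame helps with the means you computed by hand: with the labels from the question held in a Series (the column layout below is assumed from the sample data, so adjust it to your actual file), the per-TaxID case and control means fall out of a boolean column selection instead of the manual sum/count loop:

```python
import pandas as pd

# Values taken from the sample data in the question; the labels Series maps
# each run to 0 (case) or 1 (control) -- layout assumed, adjust to your file.
abundances = pd.DataFrame(
    {'ERR169499': [0.05, 0.03, 0.01],
     'ERR169500': [0.0, 0.9, 0.6],
     'ERR169501': [0.4, 0.0, 0.7]},
    index=[333046, 1049, 337090])
labels = pd.Series({'ERR169499': 1, 'ERR169500': 0, 'ERR169501': 1})

# boolean column selection replaces the per-key sum/count loop
mean_case = abundances.loc[:, labels == 0].mean(axis=1)
mean_ctrl = abundances.loc[:, labels == 1].mean(axis=1)
```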
For a two-sample dependent t-test you want to use ttest_rel; note that ttest_ind is for independent groups. Since you specifically asked for dependent groups, use ttest_rel.
It's hard to see from your example above where your two columns of sample data are, but imagine I had the following made-up 'case' and 'control' data. I could calculate a dependent two-sample t-test using pandas as shown below:
import pandas as pd
from scipy.stats import ttest_rel
data_frame = pd.DataFrame({
    'case':    [55, 43, 51, 62, 35, 48, 58, 45, 48, 54, 56, 32],
    'control': [48, 38, 53, 58, 36, 42, 55, 40, 49, 50, 58, 25]})
(t_stat, p) = ttest_rel(data_frame['control'], data_frame['case'])
print (t_stat)
print (p)
p is the p-value and t_stat is the t-statistic. You can read more about this in the documentation.
In a similar manner, once you have your sample .csv data in a dataframe, you can perform a rank sum test:
from scipy.stats import ranksums
(t_stat, p) = ranksums(data_frame['control'], data_frame['case'])
See the documentation for ranksums.