Can I backtest a hedging strategy in? - bots

I have a strategy that starts with a future short position, then a spot positions opens afterwards, and its entry point is related to the futures position. Can I open both positions and decide when to close one with conditions applied on the other one?

I don't think a backtest script can test simultaneously more than 1 asset unfortunately
However, we can use other assets for internal calculations but ultimately, the backtest results can only be for 1 asset

Related

Open multiple positions in Backtrader

Does someone know if it's possible to open multiple positions with only a single data feed? I am trying to do a second buy whilst in a position, which doesn't seem to be possible.
Nobody seems to adress this issue. Does anyone have any experience with Backtrader and have any input?
If you are just trying to buy more stock to add to your position, then yes, you should be able to do this and if you cannot recheck your strategy code in next.
If you are trying to track two separate positions of the same data...
One cannot have two separate positions in the same data feed. You may trade additional positions if you like but they will be combined in Backtrader. Even if you use two strategies you will still have one combined broker.
The reason for this is to simulate as near as possible real world conditions. If you have a brokerage account you most likely would have just one postion. (I know there are exceptions)
One solution would be to track your trading manually in a dictionary trades that result from different signals/sub-strategies. It's a bit more tedious to develop but very doable.

Loop in a activity diagram

I am trying to design an activity diagram for an image editing application. Let's say that the application has one adjustment to edit an image. that's brightness. When the user opens the application he can change the brightness again and again. Then finally save it. That's really not a loop. but it's a repetitive process. How can I represent such a process. I have found stack answers for looping through documents and for loops. But didn't found a matching scenario for like this.
Thank you!
Oh, but why do you say it is not a loop? It is.
Sorry, no reasonable drawing tool at hand, so this will be textual
Let's look at a pseudocode:
open app (image as attribute)
while decide to continue to brighten the image do
brighten the image
loop
As you can see you do loop and your condition to loop or finish it depends on the decision to brighten or finish working with the app.
The brightening itself can be more complex (e.g. may have some selection of settings like level or method of brightening, it may even have the ability to break the brightenin or undoing) but it is still the loop.
Out of this solution to represent a loop you can use options 2 and 3 easily.

How to parallelize execution of a custom function formula while keeping the Google Sheet shareable and permissionless?

I have a Google Sheet with a custom function formula that: takes in a matrix and two vectors from a spreadsheet, does some lengthy matrix-vector calculations (>30 sec, so above the quota), before outputting the result as a bunch of rows. It is single-threaded, since that's what Google Apps Script (GAS) natively is, but I want to parallelize the calculation using a multi-threading workaround, so it speeds it up drastically.
Requirements (1-3):
UX: It should run the calculations automatically and reactively as a custom function formula, which implies that the user doesn't have to manually start it by clicking a run button or similar. Like my single-threaded version currently does.
Parallelizable: It should ideally spawn ~30 threads/processes, so that instead of taking >30 seconds as it now does (which makes it time out due to Google's quota limit), it should take ~1 second. (I know GAS is single-threaded, but there are workarounds, referenced below).
Shareability: I should ideally be able to share the Sheet with other people, so they can "Make a copy" of it, and the script will still run the calculations for them:
3.1 Permissionless: Without me having to manually hand out individual permissions to users (permissionless). For instance whenever someone "Makes a copy" and "Execute the app as user accessing the web app". My rudimentary testing suggest that this is possible.
3.2 Non-intrusive: Without users of the spreadsheet having to give intrusive authorizations like "Give this spreadsheet/script/app access to your entire Google Drive or Gmail account?". Users having to give an non-intrusive authorization to a script/webapp can be acceptable, as long as requirement 3.1 is still maintained.
3.3 UX: Without forcing users to view a HTML sidebar in the spreadsheet.
I have already read this excellent related answer by #TheMaster which outlines some potential ways of solving parallelization in Google Apps script in general. Workaround #3 google.script.run and workaround #4 UrlFetchApp.fetchAll (both using a Google Web App) looks most promising. But some details are unknown to me, such as if they can adhere to requirements 1 and 3 with its sub-requirements.
I can conceive of an other potential naïve workaround which would be to split the function up into several custom functions formulas and do the parallelization (by some kind of Map/Reduce) inside the spreadsheet itself (storing intermediary results back into the spreadsheet, and having custom function formulas work on that as reducers). But that's undesired, and probably unfeasible, in my case.
I'm very confident my function is parallelizable using some kind of Map/Reduce process. The function is currently optimized by doing all the calculations in-memory, without touching the spreadsheet in-between steps, before finally outputting the result to the spreadsheet. The details of it is quite intricate and well over 100 lines, so I don't want to overload you with more (and potentially confusing) information which doesn't really affect the general applicability of this case. For the context of this question you may assume that my function is parallelizable (and map-reduce'able), or consider any function you already know that would be. What's interesting is what's generally possible to achieve with parallelizationin Google Apps Script, while also maintaining the highest level of shareability and UX. I'll update this question with more details if needed.
Update 2020-06-19:
To be more clear, I do not rule out Google Web App workarounds entirely, as I haven't got experience with their practical limitations to know for sure if they can solve the problem within the requirements. I have updated the sub-requirements 3.1 and 3.2 to reflect this. I also added sub-req 3.3, to be clearer on the intent. I also removed req 4, since it was largely overlapping with req 1.
I also edited the question and removed the related sub-questions, so it is more focused on the single main HOWTO-question in the title. The requirements in my question should provide a clear objective standard for which answers would be considered best.
I realise the question might entail a search for the Holy Grail of Google Sheet multithreading workarounds, as #TheMaster has pointed out in private. Ideally, Google would provide one or more features to support multithreading, map-reduce, or more permissionless sharing. But until then I would really like to know what is the optimal workaround within the current constraints we have. I would hope this question is relevant to others as well, even considering the tight requirements.
If you publish a web-app with "anyone, even anonymous", execute as "Me", then the custom function can use UrlFetchApp.fetchAllAuthorization not needed to post to that web-app. This will run in parallelproof. This solves all the three requirements.
Caveat here is: If multiple people use the sheet, and the custom function will have to post to the "same" webapp (that you published to execute as you) for processing, Google will limit simultaneous executionsquota limit:30.
To workaround this, You can ask people using your sheet to publish their own web-apps. They'll have to do this once at the beginning and no authorization is needed.
If not, you'll need to host a custom server for the load or something like google-cloud-functions might help
I ended up using the naïve workaround that I mentioned in my post:
I can conceive of an other potential naïve workaround which would be
to split the function up into several custom functions formulas and do
the parallelization (by some kind of Map/Reduce) inside the
spreadsheet itself (storing intermediary results back into the
spreadsheet, and having custom function formulas work on that as
reducers). But that's undesired, and probably unfeasible, in my case.
I initially disregarded it because it involves having an extra sheet tab with calculations which was not ideal. But when I reflected on it after investigating alternative solutions, it actually solves all the stated requirements in the most non-intrusive manner. Since it doesn't require anything extra from users the spreadsheet is shared with. It also stays 'within' Google Sheets as far as possible (no semi- or fully external Web App needed), doing the parallelization by relying on the native parallelization of concurrently executing spreadsheet cells, where results can be chained, and appear to the user like using regular formulas (no extra menu item or run-this-script-buttons necessary).
So I implemented MapReduce in Google Sheets using custom functions each operating on a slice of the interval I wanted to calculate. The reason I was able to do that, in my case, was that the input to my calculation was divisible into intervals that could each be calculated separately, and then joined later.**
Each parallel custom function then takes in one interval, calculates that, and outputs the results back to the sheet (I recommend to output as rows instead of columns, since columns are capped at 18 278 columns max. See this excellent post on Google Spreadsheet limitations.) I did run into the only 40,000 new rows at a time limitation, but was able to perform some reducing on each interval, so that they only output a very limited amount of rows to the spreadsheet. That was the parallelization; the Map part of MapReduce. Then I had a separate custom function which did the Reduce part, namely: dynamically target*** the spreadsheet output area of the separately calculated custom functions, and take in their results, once available, and join them together while further reducing them (to find the best performing results), to return the final result.
The interesting part was that I thought I would hit the only 30 simultaneous execution quota limit of Google Sheets. But I was able to parallelize up to 64 independently and seemingly concurrently executing custom functions. It may be that Google puts these into a queue if they exceed 30 concurrent executions, and only actually process 30 of them at any given time (please comment if you know). But anyhow, the parallelization benefit/speedup was huge, and seemingly nearly infinitely scalable. But with some caveats:
You have to define the number of parallelised custom functions up front, manually. So the parallelization doesn't infinitely auto-scale according to demand****. This is important because of the counter-intuitive result that in some cases using less parallelization actually executes faster. In my case, the result set from a very small interval could be exceedingly large, while if the interval had been larger then a lot of the results would have been ruled out underway in the algorithm in that parallelised custom function (i.e. the Map also did some reduction).
In rare cases (with huge inputs), the Reducer function will output a result before all of the parallel (Map) functions have completed (since some of them seemingly take too long). So you seemingly have a complete result set, but then a few seconds later it will re-update when the last parallel function returns its result. This is not ideal, so to be notified of this I implemented a function to tell me if the result was valid. I put it in the cell above the Reduce function (and colored the text red). B6 is the number of intervals (here 4), and the other cell references go to the cell with the custom function for each interval: =didAnyExecutedIntervalFail($B$6,S13,AB13,AK13,AT13)
function didAnyExecutedIntervalFail(intervalsExecuted, ...intervalOutputs) {
const errorValues = new Set(["#NULL!", "#DIV/0!", "#VALUE!", "#REF!", "#NAME?", "#NUM!", "#N/A","#ERROR!", "#"]);
// We go through only the outputs for intervals which were included in the parallel execution.
for(let i=0; i < intervalsExecuted; i++) {
if (errorValues.has(intervalOutputs[i]))
return "Result below is not valid (due to errors in one or more of the intervals), even though it looks like a proper result!";
}
}
The parallel custom functions are limited by Google quota of max 30 sec execution time for any custom function. So if they take too long to calculate, they still might time out (causing the issue mentioned in the previous point). The way to alleviate this timeout is to parallelise more, dividing into more intervals, so that each parallel custom function runs below 30 second.
The output of it all is limited by Google Sheet limitations. Specifically max 5M cells in a spreadsheet. So you may need to perform some reduction on the size of the results calculated in each parallel custom function, before returning its result to the spreadsheet. So that they each are below 40 000 rows, otherwise you'll receive the dreaded "Results too large" error). Furthermore, depending on the size the result of each parallel custom function, it would also limit how many custom functions you could have at the same time, as they and their result cells take space in the spreadsheet. But if each of them take in total, say 50 cells (including a very small output), then you could still parallelize pretty much (5M / 50 = 100 000 parallel functions) within a single sheet. But you also need some space for whatever you want to do with those results. And the 5M cells limit is for the whole Spreadsheet in total, not just for one of its sheet tabs, apparently.
** For those interested: I basically wanted to calculate all combinations of a sequence of bits (by brute force), so the function was 2^n where n was the number of bits. The initial range of combinations was from 1 to 2^n, so it could be divided into intervals of combinations, for example, if dividing into two intervals, it would be one from 1 to X and then one from X+1 to 2^n.
*** For those interested: I used a separate sheet formula to dynamically determine the range for the output of one of the intervals, based on the presence of rows with content. It was in a separate cell for each interval. For the first interval it was in cell S11 and the formula looked like this:
=ADDRESS(ROW(S13),COLUMN(S13),4)&":"&ADDRESS(COUNTA(S13:S)+ROWS(S1:S12),COLUMN(Z13),4) and it would output S13:Z15 which is the dynamically calculated output range, which only counts those rows with content (using COUNTA(S13:S)), thus avoiding to have a statically determined range. Since with a normal static range the size of the output would have to be known in advance, which it wasn't, or it would possibly either not include all of the output, or a lot of empty rows (and you don't want the Reducer to iterate over a lot of essentially empty data structures). Then I would input that range into the Reduce function by using INDIRECT(S$11). So that's how you get the results, from one of the intervals processed by a parallelized custom function, into the main Reducer function.
**** Though you could make it auto-scale up to some pre-defined amount of parallelised custom functions. You could use some preconfigured thresholds, and divide into, say, 16 intervals in some cases, but in other cases automatically divide into 64 intervals (preconfigured, based on experience). You'd then just stop / short-circuit the custom functions which shouldn't participate, based on if the number of that parallelised custom function exceeds the number of intervals you want to divide into and process. On the first line in the parallelised custom function: if (calcIntervalNr > intervals) return;. Though you would have to set up all the parallel custom functions in advance, which can be tedious (remember you have to account for the output area of each, and are limited by the max cell limit of 5M cells in Google Sheets).

several walkers walking on a grid: How to organize the threads?

My algorithm is processing DEMs. a DEM (Digital Elevation Model) is a representations of ground topography where elevation is known at grid nodes.
My problem can be summarized as follows:
Q is a queue containing nodes to visit.
at start, the boundary of the grid is pushed in Q.
while Q is not empty, do
remove Node N from the top of Q
if N was never visited then do
consider the 8 neighbors of N
among them select the unvisited ones
among them select those with a higher elevation than N's
push these at Q's tail
mark N as visited
done
done
As described, the algorithm will mark as 'visited' every node that can be reached from the border by a continuously ascendant path. It is worth noticing that the order of processing the nodes in the queue is unimportant. Note also that some points may request a tortuous ascendant path to be reached from the border. Think for example to a cone with a furrow spiraling around it. The ridge of the furrow is such a unique and tortuous path capable of reaching the top of the cone without never descending into the furrow.
Anyway, I want to mutithread this algorithm. I am still in the first step of wondering which is the best organization of data and threads in order to have as least pain as possible at debugging the beast when it is written.
My first thought is to divide the grid into tiles and split the Queue in as many tiles as there is in the grid. The tiles are piled in a work-list. A few threads are parsing the work-list and grab any tile where something can be done at the moment.
Working on a specific tile will firstly need that the tile's queue is not empty. I may also need that the neighboring tiles can be locked if the walker's tile has to visit a node at the edge of the tile.
I am thinking that when a walker cannot lock a neighboring tile while it needs to, then it can skip to the next node in the local queue, or even the thread itself can release the tile to the work-list and seek for another tile to work on.
My actual experience of multi-thread programming is good enough to understand that this lovely description is very likely to turn into a nightmare when I will debug it. However I am not experienced enough to evaluate the various possibilities of programming the algorithm and make a good decision, having in mind that I will not be given a month to debug a spaghetti dish.
Thanks for reading :)

Visit all nodes in a graph with least repeat visits

I have a tile based map where several tiles are walls and others are walkable. the walkable tiles make up a graph I would like to use in path planning. My question is are their any good algorithms for finding a path which visits every node in the graph, minimising repeat visits?
For example:
map example http://img220.imageshack.us/img220/3488/mapq.png
If the bottom yellow tile is the starting point, the best path to visit all tiles with least repeats is:
path example http://img222.imageshack.us/img222/7773/mapd.png
There are two repeat visits in this path. A worse path would be to take a left at the first junction, then backtrack over three already visited tiles.
I don't care about the end node but the start node is important.
Thanks.
Edit:
I added pictures to my question but cannot see them when viewing it. here they are:
http://img220.imageshack.us/img220/3488/mapq.png
http://img222.imageshack.us/img222/7773/mapd.png
Additionally, in the graphs I need this for there will never be a situation where min repeats = 0. That is, to step on every tile in the map the player must cross his own path at least once.
Your wording is bad -- it allows a reduction to an NP-complete problem. If you could minimize repeat visits, then could you push them to 0 and then you would have a Hamiltonian Cycle. Which is solvable, but hard.
This sounds like it could be mapped onto the traveling salesman problem ... and so likely ends up being NP complete and no efficient deterministic algorithm is known.
Finding a path is fairly straight forward -- find a (or the minimum) spanning subtree and then do a depth/breadth-first traversal. Finding the optimal route is the really difficult bit.
You could use one of the dynamic optimization techniques to try and converge on a fairly good solution.
Unless there is some attribute of the minimum spanning subtree that could be used to generate the best path ... but I don't remember enough graph theory for that.

Resources