How to set custom lookback in TA-Lib Pattern Recognition for Python - ta-lib

I'm trying to set a custom lookback period of 3 candles for CDLHANGINGMAN. From the documentation of the abstract API I can see that the default lookback period is 11, using
Function('CDLHANGINGMAN').lookback
I want to change it to 3. How can I do that?

It seems you can't change the timeperiod for candle functions. They depend on the default values of the candle settings. However, the underlying C library has a function that changes these defaults: TA_SetCandleSettings(). The Python wrapper of the library exposes that function; it was introduced by this commit in 0.4.14. Here is an example of how to call it from Python.
The timeperiod (lookback) for CDLHANGINGMAN is
max( max( max( TA_CANDLEAVGPERIOD(BodyShort), TA_CANDLEAVGPERIOD(ShadowLong) ),
TA_CANDLEAVGPERIOD(ShadowVeryShort) ),
TA_CANDLEAVGPERIOD(Near)
) + 1;
So you need to make sure that the avgperiod for the candle types BodyShort, ShadowLong, ShadowVeryShort and Near is <= 2. However, there is no function that changes only avgperiod: the setter sets all three of rangetype, avgperiod and factor at once, so this is not easy to do. You can find the list of current default values for all candle types here.
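The lookback formula above can be sketched in plain Python. The averaging periods in DEFAULT_AVG_PERIOD are assumptions based on TA-Lib's default candle settings (they reproduce the reported lookback of 11) and should be checked against the installed version. candle_lookback is a hypothetical helper for illustration, not part of the talib package.

```python
# Assumed default averaging periods for the candle settings used by
# CDLHANGINGMAN; verify them against your TA-Lib version.
DEFAULT_AVG_PERIOD = {
    "BodyShort": 10,
    "ShadowLong": 0,
    "ShadowVeryShort": 10,
    "Near": 5,
}


def candle_lookback(avg_periods):
    """Lookback = largest averaging period among the settings, plus one."""
    return max(avg_periods.values()) + 1


default_lookback = candle_lookback(DEFAULT_AVG_PERIOD)

# Capping every averaging period at 2 brings the lookback down to 3.
capped = {name: min(period, 2) for name, period in DEFAULT_AVG_PERIOD.items()}
capped_lookback = candle_lookback(capped)

print(default_lookback, capped_lookback)
```

This only mirrors the arithmetic. Actually changing the settings still has to go through the wrapper around TA_SetCandleSettings(), which takes rangetype and factor as well.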


Kusto Query Language: set column name of summarize by evaluated expression

Me again, asking another Kusto-related question (I really wish there were a thorough video tutorial on this somewhere).
I have a summarize statement that produces two columns for the y axis and one for the x axis.
Now I want to relabel the columns for the x axis to show a string that I also got from the database and already put into a variable with let.
It basically looks like this:
let android_col = strcat("Android: ", toscalar(customEvents
| where application_Version contains secondLatestVersionAndroid));
let iOS_col = strcat("iOS: ", toscalar(customEvents
| where application_Version contains secondLatestVersionIOS));
... some Kusto magic ...
| summarize
Android = 100 - (round((countif(hasUnhandledErrorAndroid == 1 ) * 100.0 ) / countif(isAndroid == 1), 2)),
iOS = 100 - (round((countif(hasUnhandledErroriOS == 1) * 100.0 ) / countif(isIOS == 1), 2))
by Time
|render timechart with (ytitle="crashfree users in %", xtitle="date", legend=visible )
Now I want the summarize to display not Android and iOS, but the values of android_col and iOS_col.
Is that possible?
Best regards
Maverick
Generally, it's suggested to have predefined column names, otherwise various features don't work. For example, IntelliSense won't know the names of the columns, as they would be determined at run time only. Also, if you create a function that returns a dynamic schema, you won't be able to run this function from other clusters.
However, if you do want to change column names, you definitely have a way to do it by using various plugins. For example, bag_unpack, pivot and others.
As for courses on Kusto, there are actually several excellent courses on Pluralsight (all are free):
How to start with Microsoft Azure Data Explorer
Basic KQL
Azure Data Explorer – Advanced KQL
The usage of "toscalar" in this query looks wrong. It seems to me that you should use the "extend" operator with the same logic to create the additional columns.

Altair's selection and transform_filter via binding_range slider for datetime values doesn't seem to work with equality condition or selector itself

I wanted to bind a range slider with datetime values to filter a chart for data for a particular date only. Using the stocks data, what I want to do is have the x-axis show the companies and y-axis the price of the stocks for a particular day which the user selects via a range slider.
Based on input from this answer and this issue, I have the following code. With an inequality condition in transform_filter it shows something once the slider is moved past one particular value, but the chart is empty for the rest.
What is peculiar is that with an inequality operator it at least shows something, but everything is empty when the condition is ==.
import altair as alt
import pandas as pd
from vega_datasets import data
source = data.stocks()
def timestamp(t):
    return pd.to_datetime(t).timestamp()
slider = alt.binding_range(step=86400, min=timestamp(min(source['date'])), max=timestamp(max(source['date'])))  # 86400 is the difference between consecutive days, in seconds
select_date = alt.selection_single(fields=['date'], bind=slider, init={'date': timestamp(min(source['date']))})
alt.Chart(source).mark_bar().encode(
    x='symbol',
    y='price',
).add_selection(select_date).transform_filter(alt.datum.date == select_date.date)
Since the output is empty, I am inclined to conclude that the transform_filter is causing the issue, but I have been at it for more than 6 hours now and have tried all the permutations and combinations of alt.expr.toDate and other conversions, and I cannot get it to work.
Also tried just transform_filter(select_date.date) and transform_filter(date) along with other things but nothing quite works.
The expected output is that the heights of the bars change (because the data is filtered on date) as the user drags the slider.
Any help would be really appreciated.
There are several issues here:
In Vega-Lite, timestamps are expressed in milliseconds, not seconds.
You are filtering on equality between a numerical timestamp and a string representation of a date.
Even if you parse the date in the filter expression, Python date parsing and JavaScript date parsing behave differently and the results will generally not match. Even within JavaScript, date parsing behavior can vary from browser to browser. All this means that filtering on equality of a Python and a JavaScript timestamp is generally problematic.
The data you are using has monthly timestamps, so the slider step should account for this.
Keeping all that in mind, the best course would probably be to adjust the slider values and filter on matching year and month, rather than trying to achieve equality in the exact timestamp. The result looks like this:
import altair as alt
from vega_datasets import data
import pandas as pd
source = data.stocks()
def timestamp(t):
    # Vega-Lite expects milliseconds since the epoch
    return pd.to_datetime(t).timestamp() * 1000
slider = alt.binding_range(
    step=30 * 24 * 60 * 60 * 1000,  # 30 days in milliseconds
    min=timestamp(min(source['date'])),
    max=timestamp(max(source['date'])))
select_date = alt.selection_single(
    fields=['date'],
    bind=slider,
    init={'date': timestamp(min(source['date']))},
    name='slider')
alt.Chart(source).mark_bar().encode(
    x='symbol',
    y='price',
).add_selection(select_date).transform_filter(
    "(year(datum.date) == year(slider.date[0])) && "
    "(month(datum.date) == month(slider.date[0]))"
)
You can view the result here: vega editor.
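The reasoning behind matching on year and month instead of exact equality can be illustrated outside Altair with a minimal standard-library sketch. The helper names to_epoch_ms and same_year_month are hypothetical, and the sketch works in UTC for simplicity, which is an assumption and may differ from how a browser evaluates the Vega expression.

```python
from datetime import datetime, timezone


def to_epoch_ms(dt):
    """Convert a datetime to milliseconds since the epoch."""
    return int(dt.timestamp() * 1000)


def same_year_month(ms_a, ms_b):
    """True when two millisecond timestamps share the same UTC year and month."""
    a = datetime.fromtimestamp(ms_a / 1000, tz=timezone.utc)
    b = datetime.fromtimestamp(ms_b / 1000, tz=timezone.utc)
    return (a.year, a.month) == (b.year, b.month)


# A monthly data point and a slider value that lands 12 days later.
record = to_epoch_ms(datetime(2005, 3, 1, tzinfo=timezone.utc))
slider_value = record + 12 * 24 * 60 * 60 * 1000

print(record == slider_value)                 # exact equality rarely holds
print(same_year_month(record, slider_value))  # month-level match does
```

Because the slider moves in fixed steps while calendar months vary in length, the slider value almost never coincides exactly with a data timestamp, so the coarser comparison is the more robust one.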

Kaplan Meier Estimator with a second dimension

I succeeded in implementing the Kaplan-Meier estimator in a line chart in Qlik Sense.
To do that, I wrote this expression, which is a direct transcription of the KM estimator:
= if(RowNo() = 1, 1,
(1 - (count({<Analyse_Type = {'Churn'}>}%Key_Contract) /
count({<Analyse_Type = {'Parc'}>}%Key_Contract))) * above(Column(1))
)
Everything works fine, but I'd like to add a second dimension to the chart, and when I do that the recursive Above() seems to get muddled up.
I tried to aggregate the Above() by my second dimension, but it does not work.
Does anyone have an idea how to do that? Or another way to write the Kaplan-Meier estimator without using recursion?
I found a solution to my issue.
I replaced the cumulative product (the recursive Above()) with the mathematically equivalent form exp(rangeSum(log())). I aggregate the rangeSum by my second dimension, ordered by my first dimension (the interval), and everything works fine.
Here is the final expression of the Kaplan-Meier estimator:
exp(aggr(Rangesum(Above(log(fabs(
(1 - (count({<Analyse_Type = {'Churn'}>}%Key_Contract) / count({<Analyse_Type =
{'SurvivalParc'}>}%Key_Contract)))) ),0, Rowno()))
, REGION, (Delivered_Days_5, NUMERIC, ASCENDING)))
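The identity behind this rewrite is that a running product equals exp of a running sum of logs, which removes the need for recursion. A minimal Python sketch of the same idea follows. The function km_survival, the row layout and the sample numbers are hypothetical, and it assumes the per-interval factor is never zero, since log(0) is undefined.

```python
import math
from collections import defaultdict


def km_survival(rows):
    """rows: (region, interval, churned, at_risk) tuples.

    Returns {region: [(interval, survival), ...]} ordered by interval.
    """
    by_region = defaultdict(list)
    for region, interval, churned, at_risk in rows:
        by_region[region].append((interval, churned, at_risk))

    curves = {}
    for region, items in by_region.items():
        log_sum = 0.0
        curve = []
        for interval, churned, at_risk in sorted(items):
            # A running sum of logs replaces the recursive running product.
            log_sum += math.log(abs(1 - churned / at_risk))
            curve.append((interval, math.exp(log_sum)))
        curves[region] = curve
    return curves


rows = [
    ("North", 1, 10, 100), ("North", 2, 9, 90),
    ("South", 1, 5, 50), ("South", 2, 0, 45),
]
curves = km_survival(rows)
for region, curve in curves.items():
    print(region, [(i, round(s, 4)) for i, s in curve])
```

Grouping by region before accumulating plays the role of the aggr() over the second dimension, and sorting by interval corresponds to the ordering clause.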
And here is the visual result:

Taking advantage of the name of a variable in Stata

I'm using Stata and I have a set of variables named cal1, cal2, cal3 and so on up to cal21. For each row of my dataset, a varying number of the cal* variables are non-missing (I built the dataset with reshape wide). I want to generate a new variable that holds the name of the highest-numbered non-missing cal* variable in each row. For example, if row 1 has non-missing values up to cal3, the variable should be cal3; if row 2 has cal1, cal2 and cal6, it should be cal6. Is there a way to do this?
This would be much easier to accomplish with data in long format layout, but it is doable with wide data too with a loop:
gen max_cal = "none"
forvalues v=1/21 {
    replace max_cal = "cal`v'" if !missing(cal`v')
}
This will update the max_cal variable each time there's a higher one not missing.
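For readers more familiar with general-purpose languages, the same overwrite-as-you-go logic can be sketched in Python. This is only an illustration, not Stata code: the function max_cal and the dictionary-per-row layout are assumptions, with a missing value represented by an absent key or None.

```python
def max_cal(row, n_vars=21):
    """Name of the highest-numbered non-missing cal variable, or "none"."""
    result = "none"
    for v in range(1, n_vars + 1):
        name = f"cal{v}"
        # Later (higher-numbered) variables overwrite earlier matches.
        if row.get(name) is not None:
            result = name
    return result


rows = [
    {"cal1": 4, "cal2": 7, "cal3": 1},
    {"cal1": 2, "cal2": 5, "cal6": 9},
    {},
]
labels = [max_cal(r) for r in rows]
print(labels)
```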

icCube - InterpolateRGBColors based on min & max values?

I understand that the InterpolateRGBColors function returns a color according to the position of a value between 0 and 1, so it seems to work only with percentages, not raw numbers.
Is there a way to get the same functionality, but based on the min and max values returned in a set?
What I want is to assign colors to my measure over the range min([Measures].[NbSejours]) to max([Measures].[NbSejours]) (not 0 to 1):
WITH
MEMBER [Measures].[color] AS
InterpolateRGBColors(
[Measures].[NbSejours]
,rgb(176,224,230)
,rgb(135,206,235)
,rgb(0,191,255)
,rgb(100,149,237)
,rgb(0,0,255)
,rgb(0,0,139)
,rgb(25,25,112)
), BACK_COLOR=currentCellValue()
SELECT
{
{[Measures].[NbSejours]}
,[Measures].[color]
} ON COLUMNS
,{
NonEmpty
(
[Etablissement].[Etablissement].[Etablissement].ALLMEMBERS
,[Measures].[NbSejours]
)
} ON ROWS
FROM
(
SELECT
{{[Periode].[Periode].[All-M].&[2013]}} ON 0
FROM [Cube]
)
CELL PROPERTIES
STYLE
,CLASSNAME
,VALUE
,FORMATTED_VALUE;
Is there a way to do that?
InterpolateRGBColors expects a number between 0 and 1 for the interpolation, so we need to scale our measure to get the right colors.
There is an example in our live demo, here.
What we need is to scale [Measures].[NbSejours] to the range 0 to 1. There are two undocumented functions in icCube for this: DistributionFlat and DistributionRank.
A non-efficient version:
WITH
SET [AxisX] AS NonEmpty([Etablissement].[Etablissement].[Etablissement].ALLMEMBERS,[Measures].[NbSejours])
FUNCTION distr(x_) as DistributionFlat( [AxisX], [Measures].[NbSejours], x_ )
MEMBER [Measures].[color] AS
InterpolateRGBColors(
distr([Measures].[NbSejours])
,rgb(176,224,230)
,rgb(135,206,235)
,rgb(0,191,255)
,rgb(100,149,237)
,rgb(0,0,255)
,rgb(0,0,139)
,rgb(25,25,112)
), BACK_COLOR=currentCellValue()
....
Once I have a bit of time I'll write a version using Vectors (here and here) that performs better, since the example above recalculates the values for the set every time.
Hope it helps
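The scaling step can be illustrated independently of icCube. The sketch below uses plain min-max scaling followed by piecewise-linear interpolation over the palette from the question. Both helpers are hypothetical, and the sketch makes no claim about how DistributionFlat or InterpolateRGBColors are implemented internally. The sample measure values are made up.

```python
PALETTE = [
    (176, 224, 230), (135, 206, 235), (0, 191, 255), (100, 149, 237),
    (0, 0, 255), (0, 0, 139), (25, 25, 112),
]


def min_max_scale(value, values):
    """Map value into [0, 1] relative to the min and max of values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # all values equal: no spread to scale over
    return (value - lo) / (hi - lo)


def interpolate_rgb(t, palette):
    """Piecewise-linear interpolation across palette for t in [0, 1]."""
    t = min(max(t, 0.0), 1.0)
    pos = t * (len(palette) - 1)
    i = min(int(pos), len(palette) - 2)
    frac = pos - i
    a, b = palette[i], palette[i + 1]
    return tuple(round(x + (y - x) * frac) for x, y in zip(a, b))


measures = [120, 480, 300]
colors = [interpolate_rgb(min_max_scale(m, measures), PALETTE) for m in measures]
print(colors)
```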
I don't know icCube, so the following might not work, even though I have used standard functions. As @George commented, you can use the standard RANK function to find each member's relative position.
You will need to feed that value into the definition of [Measures].[color]...
WITH
SET [estMembersOrdered] AS
ORDER(
[Etablissement].[Etablissement].[Etablissement].ALLMEMBERS
,[Measures].[NbSejours]
,BDESC
)
MEMBER [Measures].[rnkEtablissement] AS
RANK(
[Etablissement].[Etablissement].CURRENTMEMBER
, [estMembersOrdered]
)
MEMBER [Measures].[color] AS
InterpolateRGBColors(
[Measures].[NbSejours]
,rgb(176,224,230)
,rgb(135,206,235)
,rgb(0,191,255)
,rgb(100,149,237)
,rgb(0,0,255)
,rgb(0,0,139)
,rgb(25,25,112)
), BACK_COLOR=currentCellValue()
SELECT
{
{[Measures].[NbSejours]}
,[Measures].[color]
,[Measures].[rnkEtablissement]
} ON COLUMNS
,{
NonEmpty
(
[Etablissement].[Etablissement].[Etablissement].ALLMEMBERS
,[Measures].[NbSejours]
)
} ON ROWS
FROM
(
SELECT
{{[Periode].[Periode].[All-M].&[2013]}} ON 0
FROM [Cube]
)
CELL PROPERTIES
STYLE
,CLASSNAME
,VALUE
,FORMATTED_VALUE;
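To feed a rank into an interpolation that expects a value between 0 and 1, the rank itself still has to be rescaled. A minimal sketch of one way to do that follows, assuming rank 1 is the largest measure (as with the BDESC ordering above) and that it should map to 1.0. The function rank_to_unit and the sample data are hypothetical, and ties are broken arbitrarily by sort order.

```python
def rank_to_unit(measures):
    """Map each member to a position in [0, 1] based on its descending rank."""
    ordered = sorted(measures, key=measures.get, reverse=True)
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    # Rank 1 (largest measure) maps to 1.0, rank n (smallest) maps to 0.0.
    return {
        member: (n - rank) / (n - 1)
        for rank, member in enumerate(ordered, start=1)
    }


measures = {"A": 120, "B": 480, "C": 300}
positions = rank_to_unit(measures)
print(positions)
```

Unlike min-max scaling, a rank-based position spreads members evenly over the palette regardless of how far apart their values are.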
