The documentation says:
Specifying for Individual Primitives
Options for individual primitives or groups of primitives are set by the primitive_options parameter of DFS. This parameter maps any desired options to specific primitives. In the case of conflicting options, options set at this level will override options set at the entire DFS run level, and the include options will always take priority over their ignore counterparts.
However, I see that this is not true: the ignore option actually takes precedence over its include counterpart.
Below is the set-up I will use to demonstrate the claimed behaviour. It is an entityset with one grandparent (gp), two parents (p1, p2), and one child (c) attached to one parent (p1):
import pandas as pd
import featuretools as ft
from featuretools import variable_types as vt
# # Creating Relational Dataset
# ## Grand Parent
df_gp = pd.DataFrame({'gp_ind': ['a', 'b'],
                      'gp_ncol1': [1, 2], 'gp_ncol2': [3, 4],
                      'gp_ccol1': ['x', 'y'], 'gp_ccol2': ['p', 'q'],
                      'gp_time_col1': pd.to_datetime(['20-01-2020', '20-01-2019'], format="%d-%m-%Y"),
                      'gp_time_ind': pd.to_datetime(['20-01-2021', '20-01-2020'], format="%d-%m-%Y")})
# ## Parent 1
df_p1 = pd.DataFrame({'p1_ind': ['a1', 'a2', 'b1'],
                      'p1_id': ['a', 'a', 'b'],
                      'p1_ncol1': [1, 2, 3], 'p1_ncol2': [3, 4, 5],
                      'p1_ccol1': ['x', 'y', 'z'], 'p1_ccol2': ['p', 'q', 'r'],
                      'p1_id1': ['t', 't', 'u'],
                      'p1_time_col1': pd.to_datetime(['16-01-2020', '11-12-2019', '16-01-2019'], format="%d-%m-%Y"),
                      'p1_time_ind': pd.to_datetime(['15-01-2021', '10-12-2020', '15-01-2020'], format="%d-%m-%Y")})
# ## Parent 2
df_p2 = pd.DataFrame({'p2_ind': ['a1_', 'a2_', 'b1_'],
                      'p2_id': ['a', 'a', 'b'],
                      'p2_ncol1': [1, 2, 3], 'p2_ncol2': [3, 4, 5],
                      'p2_ccol1': ['x', 'y', 'z'], 'p2_ccol2': ['p', 'q', 'r'],
                      'p2_time_col1': pd.to_datetime(['18-01-2020', '13-12-2019', '18-01-2019'], format="%d-%m-%Y"),
                      'p2_time_ind': pd.to_datetime(['17-01-2021', '12-12-2020', '17-01-2020'], format="%d-%m-%Y")})
# ## Child
df_c = pd.DataFrame({'c_ind': ['a1_1', 'a1_2', 'a2_1', 'a2_2', 'a2_3', 'b1_1'],
                     'c_id': ['a1', 'a1', 'a2', 'a2', 'a2', 'b1'],
                     'c_ncol1': [1, 2, 3, 4, 5, 6], 'c_ncol2': [3, 4, 5, 6, 7, 8],
                     'c_ccol1': ['x', 'y', 'z', 'a', 'b', 'c'], 'c_ccol2': ['p', 'q', 'r', 's', 't', 'u'],
                     'c_time_col1': pd.to_datetime(['13-01-2020', '10-12-2019', '8-12-2019', '5-11-2019', '2-10-2019', '13-01-2019'], format="%d-%m-%Y"),
                     'c_time_ind': pd.to_datetime(['10-01-2021', '5-12-2020', '9-12-2020', '6-11-2020', '3-10-2019', '12-01-2020'], format="%d-%m-%Y")})
# # Creating Entityset
es = ft.EntitySet(id='experimentation')
# ## Adding entities
# ### Adding gp
vt_gp = {'gp_ind': vt.Index,
         'gp_ncol1': vt.Numeric,
         'gp_ncol2': vt.Numeric,
         'gp_ccol1': vt.Categorical,
         'gp_ccol2': vt.Categorical,
         'gp_time_col1': vt.Datetime,
         'gp_time_ind': vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='gp', dataframe=df_gp, index='gp_ind',
                         variable_types=vt_gp, time_index='gp_time_ind')
# ### Adding p1
vt_p1 = {'p1_ind': vt.Index,
         'p1_id': vt.Id,
         'p1_id1': vt.Id,
         'p1_ncol1': vt.Numeric,
         'p1_ncol2': vt.Numeric,
         'p1_ccol1': vt.Categorical,
         'p1_ccol2': vt.Categorical,
         'p1_time_col1': vt.Datetime,
         'p1_time_ind': vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='p1', dataframe=df_p1, index='p1_ind',
                         variable_types=vt_p1, time_index='p1_time_ind')
# ### Adding p2
vt_p2 = {'p2_ind': vt.Index,
         'p2_id': vt.Id,
         'p2_ncol1': vt.Numeric,
         'p2_ncol2': vt.Numeric,
         'p2_ccol1': vt.Categorical,
         'p2_ccol2': vt.Categorical,
         'p2_time_col1': vt.Datetime,
         'p2_time_ind': vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='p2', dataframe=df_p2, index='p2_ind',
                         variable_types=vt_p2, time_index='p2_time_ind')
# ### Adding c
vt_c = {'c_ind': vt.Index,
        'c_id': vt.Id,
        'c_ncol1': vt.Numeric,
        'c_ncol2': vt.Numeric,
        'c_ccol1': vt.Categorical,
        'c_ccol2': vt.Categorical,
        'c_time_col1': vt.Datetime,
        'c_time_ind': vt.DatetimeTimeIndex}
es.entity_from_dataframe(entity_id='c', dataframe=df_c, index='c_ind',
                         variable_types=vt_c, time_index='c_time_ind')
# ## Adding Relationships
r_gp_p1 = ft.Relationship(es['gp']['gp_ind'], es['p1']['p1_id'])
r_gp_p2 = ft.Relationship(es['gp']['gp_ind'], es['p2']['p2_id'])
r_p1_c = ft.Relationship(es['p1']['p1_ind'], es['c']['c_id'])
es.add_relationships([r_gp_p1, r_gp_p2, r_p1_c])
# ## Create Cutoff Times
cutoff_times = df_gp.loc[:,['gp_ind','gp_time_ind']].copy(deep=True)
# ## add interesting values
es['p1']['p1_ccol1'].interesting_values = es['p1'].df['p1_ccol1'].unique()[0:1]
es['c']['c_ccol1'].interesting_values = es['c'].df['c_ccol1'].unique()[0:1]
# ## Add last time index
es.add_last_time_indexes()
# ## Plotting entityset
es.plot()
Now I run the following DFS call on this entityset.
I include p1 under both the ignore_entities and the include_entities keys. This way, I give DFS conflicting instructions about whether or not to include the p1 entity in the feature-creation process.
Expected behaviour: include_entities overrides ignore_entities, and features on entity p1 are made.
Behaviour seen: ignore_entities overrides include_entities, and features on p1 are not made.
agg_primitives = ['sum']
where_primitives = ['sum']
primitive_options = {}
primitive_options[('sum',)] = {}
primitive_options[('sum',)]['ignore_entities'] = ['p1']
primitive_options[('sum',)]['include_entities'] = ['p1']
features = ft.dfs(entityset=es, target_entity='gp', cutoff_time=cutoff_times,
                  agg_primitives=agg_primitives, features_only=True, max_depth=2,
                  where_primitives=where_primitives,
                  primitive_options=primitive_options, trans_primitives=[])
features
output:
[<Feature: gp_ncol1>,
<Feature: gp_ncol2>,
<Feature: gp_ccol1>,
<Feature: gp_ccol2>]
No features are made on p1, which goes against what is stated in the documentation.
Am I missing something here, or is the documentation actually wrong as I read it, and should I understand that ignore_entities overrides include_entities?
This was a bug; you can track the proposed fix here: https://github.com/alteryx/featuretools/pull/1518
I am using the blpapi package in R to download FX forward prices. In the bdh() call I want to specify whether to download the forward prices as points or as outright prices. I have tried the following:
conn <- blpConnect()
sdate <- as.Date("1998-12-31")
edate <- Sys.Date()-1
vFWD <- c("EURAUD1M Curncy")
opts.daily <- c("periodicitySelection" = "DAILY",
                "nonTradingDayFillMethod" = "PREVIOUS_VALUE",
                "nonTradingDayFillOption" = "NON_TRADING_WEEKDAYS")
opts.monthly <- c("periodicitySelection" = "MONTHLY",
                  "nonTradingDayFillMethod" = "PREVIOUS_VALUE",
                  "nonTradingDayFillOption" = "NON_TRADING_WEEKDAYS")
opts.fwd <- c("FWD_CURVE_QUOTE_FORMAT" = "OUTRIGHTS")
dfwd <- bdh(securities = vFWD, c("PX_LAST"), start.date = sdate, end.date = edate,
            options = opts.daily, overrides = opts.fwd, con = defaultConnection())
** For Java, the answer is here: In Bloomberg API how do you specify to get FX forwards as a spread rather than absolute values?
Use "OUTRIGHT", not "OUTRIGHTS", as your override option value.
Error in UseMethod("QuinlanAttributes") :
no applicable method for 'QuinlanAttributes' applied to an object of class "logical"
I am getting this error whenever I run my code. I have installed several packages, but the error keeps repeating.
It seems that C50 does not accept BOOLEAN features.
You can simply drop that column or replace the BOOLEAN values with 0/1.
If tdata$Windy is the BOOLEAN feature, replace its values (see the sketch after the script below).
library(C50)
tdata = read.csv('play.csv', header = TRUE, sep = ",")
xdata <- data.frame(tdata$Outlook, tdata$Temperature, tdata$Humidity, tdata$Windy)
ydata <- tdata$Play
treeModel <- C5.0(x = xdata, y = ydata)
summary(treeModel)
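A minimal sketch of that replacement, assuming tdata$Windy holds logical TRUE/FALSE values (run it after read.csv and before building xdata):
tdata$Windy <- as.integer(tdata$Windy)  # converts TRUE/FALSE to 1/0, which C5.0 accepts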
It's an old question, but I had the same issue today and realized it was due to read_sav().
I solved it by applying haven::as_factor to the columns that should be factors.
library(haven)  # read_sav() and as_factor()
library(dplyr)  # mutate() and across()
data <- read_sav("datafile.sav")
data <- mutate(data, across(ends_with("_fct"), haven::as_factor))
I ran this topic-modeling script two months ago SUCCESSFULLY, but it suddenly gives me an error message (in the last three lines).
post <- posterior(TM1, newdata = dtm[-c(1:20),]) #this script gives me an error message.
perplex <- perplexity(TM1, newdata = dtm[-c(1:20),]) #this script does not give me an error message.
Can anybody help me understand what is going on here? Please~~
=====================
library("tm")
library("slam")
library("topicmodels")
library("SnowballC")
corpus <- Corpus(DirSource(directory="/Users/loni/Documents/TextMining/test", encoding="UTF-8"))
dtm <- DocumentTermMatrix(corpus, control=list(stemming=TRUE, stopwords=TRUE, removePunctuation=FALSE))
term_tfidf <- tapply(dtm$v/row_sums(dtm)[dtm$i], dtm$j, mean) * log2(nDocs(dtm)/col_sums(dtm>0))
dim(dtm)
[1] 26 919
dtm <- dtm[, term_tfidf >= .06] # petition corpus
dtm <- dtm[row_sums(dtm) > 0,]
dim(dtm)
[1] 26 499
k<-5
SEED <- 2
TM <- list(VEM = LDA(dtm, k = k, control = list(seed = SEED)))
TM1 <- list(VEM = LDA(dtm[c(1:20),], k = k, control = list(seed = SEED))) #validation
Topic <- topics(TM[["VEM"]],1)
Terms <- terms(TM[["VEM"]], 8)
Terms[, 1:5]
post <- posterior(TM1, newdata = dtm[-c(1:20),])
Error in (function (classes, fdef, mtable) :
unable to find an inherited method for function ‘posterior’ for signature ‘"list", "DocumentTermMatrix"’
It could be because of wrong indexing of the list. Try [[ ]] or [ ] on TM1.
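For example, TM1 was created above as a one-element list (list(VEM = LDA(...))), so posterior needs the fitted model inside it rather than the list itself:
post <- posterior(TM1[["VEM"]], newdata = dtm[-c(1:20),])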
I had the same error today and found that the issue was other loaded packages that conflicted. The easiest fix was to create a new session with a clear workspace and rerun the script.
This answer to a similar question clued me in:
Unable to find an inherited method for function ‘select’ for signature ‘"data.frame"’
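If restarting the session is inconvenient, calling the generic with an explicit namespace may also bypass the masking; a sketch, assuming the topicmodels generic is what is being shadowed and that the fitted model is extracted from the list first:
post <- topicmodels::posterior(TM1[["VEM"]], newdata = dtm[-c(1:20),])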
I ran JAGS with runjags in R and I got a giant list back (named results for this example).
Whenever I access results$density, two lattice plots (one for each parameter) pop up in the default quartz device.
I need to combine these with par(mfrow=c(2, 1)) or with a similar approach, and send them to the pdf device.
Nothing I tried is working. Any ideas?
I've tried dev.print, pdf() with dev.off(), etc. with no luck.
Here's a way to ditch the "V1" panels by manipulating the Trellis structure:
p1 <- results$density$c
p2 <- results$density$m
p1$layout <- c(1,1)
p1$index.cond[[1]] <- 1 # remove second index
p1$condlevels[[1]] <- "c" # remove "V1"
class(p1) <- "trellis" # overwrite class "plotindpages"
p2$layout <- c(1,1)
p2$index.cond[[1]] <- 1 # remove second index
p2$condlevels[[1]] <- "m" # remove "V1"
class(p2) <- "trellis" # overwrite class "plotindpages"
library(grid)
layout <- grid.layout(2, 1, heights=unit(c(1, 1), c("null", "null")))
grid.newpage()
pushViewport(viewport(layout=layout))
pushViewport(viewport(layout.pos.row=1))
print(p1, newpage=FALSE)
popViewport()
pushViewport(viewport(layout.pos.row=2))
print(p2, newpage=FALSE)
popViewport()
popViewport()
plot of c.trellis() result http://img142.imageshack.us/img142/3272/ctrellisa.png
The easiest way to combine the plots is to use the results stored in results$mcmc:
# prepare data, see source code of "run.jags"
thinned.mcmc <- combine.mcmc(list(results$mcmc),
                             collapse.chains=FALSE,
                             return.samples=1000)
print(densityplot(thinned.mcmc[,c(1,2)], layout=c(1,2),
                  ylab="Density", xlab="Value"))
For instance, for the included example from run.jags, check the structure of the list using
sink("results_str.txt")
str(results$density)
sink()
Then you will see components named layout. The layout for the two plots of each variable can be set using
results$density$m$layout <- c(1,2)
print(results$density$m)
The plots for different parameters can be combined using the c.trellis method from the latticeExtra package.
class(results$density$m) <- "trellis" # overwrite class "plotindpages"
class(results$density$c) <- "trellis" # overwrite class "plotindpages"
library("latticeExtra")
update(c(results$density$m, results$density$c), layout=c(2,2))
output of c.trellis http://img88.imageshack.us/img88/6481/ctrellis.png
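Since the original goal was to send the combined figure to the pdf device, the c.trellis result above can be wrapped in one; a minimal sketch (the file name and page size are arbitrary):
pdf("densities.pdf", width = 6, height = 8)  # open the PDF device
print(update(c(results$density$m, results$density$c), layout = c(2, 2)))
dev.off()  # close the device and write the file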
Another approach is to use grid viewports:
library("grid")
results$density$m$layout <- c(2,1)
results$density$c$layout <- c(2,1)
class(results$density$m) <- "trellis"
class(results$density$c) <- "trellis"
layout <- grid.layout(2, 1, heights=unit(c(1, 1), c("null", "null")))
grid.newpage()
pushViewport(viewport(layout=layout))
pushViewport(viewport(layout.pos.row=1))
print(results$density$m, newpage=FALSE)
popViewport()
pushViewport(viewport(layout.pos.row=2))
print(results$density$c, newpage=FALSE)
popViewport()
popViewport()
grid output http://img88.imageshack.us/img88/5967/grida.png