How to extract the hidden layer features in H2ODeepLearningEstimator? - python-3.x

I found H2O has the function h2o.deepfeatures in R to pull the hidden layer features
https://www.rdocumentation.org/packages/h2o/versions/3.20.0.8/topics/h2o.deepfeatures
train_features <- h2o.deepfeatures(model_nn, train, layer=3)
But I couldn't find an equivalent example in Python. Can anyone provide some sample code?

Most Python/R API functions are wrappers around REST calls. See http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/model/model_base.html#ModelBase.deepfeatures
So, to convert an R example to a Python one, the model becomes the object the method is called on, and the remaining arguments shift along by one. I.e. the example from the manual becomes (with dots in variable names changed to underscores):
prostate_hex = ...
prostate_dl = ...
prostate_deepfeatures_layer1 = prostate_dl.deepfeatures(prostate_hex, 1)
prostate_deepfeatures_layer2 = prostate_dl.deepfeatures(prostate_hex, 2)
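For context, a minimal end-to-end sketch of how prostate_hex and prostate_dl might be set up (the data URL, hidden-layer sizes, and column selection here are illustrative assumptions, not from the original example):
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Illustrative dataset; any H2OFrame with a categorical response works the same way
prostate_hex = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/prostate/prostate.csv")
prostate_hex["CAPSULE"] = prostate_hex["CAPSULE"].asfactor()

prostate_dl = H2ODeepLearningEstimator(hidden=[10, 10], epochs=5)
prostate_dl.train(x=list(range(2, 9)), y="CAPSULE", training_frame=prostate_hex)

# deepfeatures() returns an H2OFrame with the activations of the requested hidden layer
prostate_deepfeatures_layer1 = prostate_dl.deepfeatures(prostate_hex, 1)
prostate_deepfeatures_layer2 = prostate_dl.deepfeatures(prostate_hex, 2)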
Sometimes the function name changes slightly (e.g. h2o.importFile() vs. h2o.import_file()), so you may need to hunt for it at http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/index.html

Related

target_transform in torchvision.datasets.ImageFolder seems not to work

I am using PyTorch 1.13 with Python 3.10.
I have a problem where I import pictures from a folder structure using
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform,
is_valid_file=is_valid_file)
With this call, labels are assigned automatically according to which subdirectory an image belongs to.
I wanted to assign different labels and use target_transform for this purpose (e.g. I wanted to use a word from the file name to assign an appropriate label).
I have used
def target_transform(id):
    print(2)
    return id * 2
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform, target_transform=target_transform, is_valid_file=is_valid_file)
Next,
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform, target_transform=lambda id:2*id, is_valid_file=is_valid_file)
or
data = ImageFolder(root='./faces/', loader=img_loader, transform=transform, target_transform=
torchvision.transforms.Lambda(lambda id:2*id), is_valid_file=is_valid_file)
But none of these affect the labels. In addition, in the first example I included the print statement to see whether the function is called, but it is not. I have searched for uses of this function, but the examples I have found do not work and the documentation is scarce in this respect. Any idea what is wrong with the code?
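For reference, ImageFolder applies target_transform inside __getitem__, so it only runs when an item is actually fetched, not when the dataset is constructed. A minimal sketch (assuming ./faces/ contains class subfolders readable by the default loader; the custom loader and is_valid_file are omitted for brevity):
from torchvision import transforms
from torchvision.datasets import ImageFolder

data = ImageFolder(root='./faces/',
                   transform=transforms.ToTensor(),
                   target_transform=lambda label: 2 * label)

img, label = data[0]   # target_transform is called here, during indexing
print(label)           # the class index, doubled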

How to add stop words to TfidfVectorizer?

I am trying to add stop words into my stop_word list, however, the code I am using doesn't seem to be working:
Creating stop words list:
stopwords = nltk.corpus.stopwords.words('english')
CustomListofWordstoExclude = ['rt']
stopwords1 = stopwords.extend(CustomListofWordstoExclude)
Here I am converting the text to a dtm (document term matrix) with tfidf weighting:
vect = TfidfVectorizer(stop_words = 'english', min_df=150, token_pattern=u'\\b[^\\d\\W]+\\b')
dtm = vect.fit_transform(df['tweets'])
dtm.shape
But when I do this, I get this error:
FutureWarning: Pass input=None as keyword args. From version 0.25 passing these as positional arguments will result in an error
warnings.warn("Pass {} as keyword args. From version 0.25 "
What does this mean? Is there an easier way to add stopwords?
I'm unable to reproduce the warning. However, note that a warning such as this does not mean that your code did not run as intended. It means that in future releases of the package it may not work as intended. So if you try the same thing next year with updated packages, it may not work.
With respect to your question about using stop words, there are two changes that need to be made for your code to work as you expect.
list.extend() extends the list in-place, but it doesn't return the list. To see this you can do type(stopwords1) which gives NoneType. To define a new variable and add the custom words list to stopwords in one line, you could just use the built-in + operator functionality for lists:
stopwords = nltk.corpus.stopwords.words('english')
CustomListofWordstoExclude = ['rt']
stopwords1 = stopwords + CustomListofWordstoExclude
To actually use stopwords1 as your new stopwords list when performing the TF-IDF vectorization, you need to pass stop_words=stopwords1:
vect = TfidfVectorizer(stop_words=stopwords1, # Passed stopwords1 here
min_df=150,
token_pattern=u'\\b[^\\d\\W]+\\b')
dtm = vect.fit_transform(df['tweets'])
dtm.shape
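Putting the two changes together, a self-contained sketch (the two-row DataFrame and min_df=1 are illustrative stand-ins for the original data, where min_df=150 makes sense; get_feature_names_out() assumes scikit-learn 1.0+):
import nltk
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download('stopwords', quiet=True)

stopwords = nltk.corpus.stopwords.words('english')
CustomListofWordstoExclude = ['rt']
stopwords1 = stopwords + CustomListofWordstoExclude

df = pd.DataFrame({'tweets': ['rt this is a tweet about python',
                              'another tweet about data science']})

vect = TfidfVectorizer(stop_words=stopwords1, min_df=1,
                       token_pattern=u'\\b[^\\d\\W]+\\b')
dtm = vect.fit_transform(df['tweets'])
print(dtm.shape)                      # (2, number_of_kept_terms)
print(vect.get_feature_names_out())   # neither 'rt' nor English stopwords appear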

How to find the arguments for torch.nn.functional.conv_transpose2d and max_unpool2d?

In a given forward function of a convolutional layer I have:
def forward(self, x):
    c = torch.nn.functional.conv2d(x)
    a, i = torch.maxpool2d(c)
    o = torch.relu(a)
    return a, i
I'm looking to undo this function with torch.nn.functional.max_unpool2d and torch.nn.functional.conv_transpose2d.
So far I have (not entirely sure about this either):
a = torch.relu(o)
c = torch.nn.functional.max_unpool2d(a, i, kernel_size=c.size[1])
x = torch.nn.functional.conv_transpose2d(c,..)
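For reference, a runnable sketch of such a pool/unpool round trip with the functional API (the tensor shapes and the 3x3 weight are arbitrary assumptions; max_pool2d needs return_indices=True to produce the indices that max_unpool2d expects):
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 8, 8)
weight = torch.randn(4, 1, 3, 3)           # conv2d weight: (out_channels, in_channels, kH, kW)

c = F.conv2d(x, weight, padding=1)         # conv2d needs at least an input and a weight
a, i = F.max_pool2d(c, kernel_size=2, return_indices=True)
o = torch.relu(a)

# Reverse path: unpool with the saved indices, then transpose-convolve.
# conv_transpose2d interprets the same weight as (in_channels, out_channels, kH, kW).
u = F.max_unpool2d(a, i, kernel_size=2)
x_rec = F.conv_transpose2d(u, weight, padding=1)
print(x_rec.shape)                         # torch.Size([1, 1, 8, 8])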
My questions:
How come the call to conv2d doesn't require any other arguments than the one given?
How do I get the parameters for conv_transpose2d from only a and i (e. g. from their sizes)? Or is there some other way I'm not seeing?
Is there a default kernel_size conv2d uses?
I found a partial answer to my question:
The kernel size and other parameters are stored on the layer objects inside the model, not on the functional calls, so I can access them via e.g. model.conv1.kernel_size, and print(model) lists the layers together with their parameters.
Arguments for torch.nn.ConvTranspose2d are slightly different from those for torch.nn.functional.conv_transpose2d (Functional API).
Please refer to: https://pytorch.org/docs/stable/nn.functional.html#conv-transpose2d for functional API arguments.
torch.nn.ConvTranspose2d initializes the kernel using U[-sqrt(k), sqrt(k)].
On the other hand, you can use your custom (initialized) kernel in torch.nn.functional.conv_transpose2d.
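A small illustrative comparison of the two APIs (the channel counts, kernel size, and stride here are arbitrary):
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)

# Module API: the layer creates and initializes its own weight (U[-sqrt(k), sqrt(k)]).
deconv = torch.nn.ConvTranspose2d(in_channels=8, out_channels=3,
                                  kernel_size=4, stride=2, padding=1)
y1 = deconv(x)

# Functional API: you pass your own (possibly custom-initialized) weight,
# shaped (in_channels, out_channels, kH, kW).
weight = torch.randn(8, 3, 4, 4)
y2 = F.conv_transpose2d(x, weight, stride=2, padding=1)

print(y1.shape, y2.shape)   # both torch.Size([1, 3, 32, 32])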

decision tree in R - extract data from a specific branch

I am trying to build a classification decision tree using rpart and partykit, and I am wondering: is there any function within those packages (or any package, for that matter) that allows me to create a dataset containing the data from a specific subtree or branch?
I know that I can manually create the subset from the original data set with the decision tree rules, but I am trying to automate certain processes, and finding such a function would help me immensely.
Example:
library(rpart)
library(partykit)
data("Titanic", package = "datasets")
ttnc <- as.data.frame(Titanic)
ttnc <- ttnc[rep(1:nrow(ttnc), ttnc$Freq), 1:4]
names(ttnc)[2] <- "Gender"
rp <- rpart(Survived ~ Gender + Age + Class, data = ttnc)
prp <- as.party(rp)
prp[5]
Let's say that I want to extract the data from subtree #5; is there any function within those packages that allows me to do that?
Thank you!
In addition to the solution posted by @JakobGepp, you can use the data_party() function provided by partykit:
data_party(prp, id = 5)
Essentially, this does the same thing internally that Jakob did explicitly by hand.
I don't know if you meant this by using the DT rules, but you could use the predict() function of the partykit package to predict the node / branches and then split the data according to your subtree.
ttnc$Node <- predict(prp, newdata = ttnc, type = "node")
subtree <- subset(ttnc, Node == 5)

Why do I get different values every time I run hmmlearn.hmm.GaussianHMM.fit()?

I have the following program:
import numpy as np
import pandas as pd
from hmmlearn.hmm import GaussianHMM

n = 6
data=pd.read_csv('11.csv',index_col='datetime')
volume = data['TotalVolumeTraded']
close = data['ClosingPx']
logDel = np.log(np.array(data['HighPx'])) - np.log(np.array(data['LowPx']))
logRet_1 = np.array(np.diff(np.log(close)))
logRet_5 = np.log(np.array(close[5:])) - np.log(np.array(close[:-5]))
logVol_5 = np.log(np.array(volume[5:])) - np.log(np.array(volume[:-5]))
logDel = logDel[5:]
logRet_1 = logRet_1[4:]
close = close[5:]
Date = pd.to_datetime(data.index[5:])
A = np.column_stack([logDel,logRet_5,logVol_5])
model = GaussianHMM(n_components= n, covariance_type="full", n_iter=2000).fit([A])
hidden_states = model.predict(A)
The first time I run the code, I get one set of values for hidden_states; when I run it a second time, I get a different set. Why are the two hidden_states results different?
I am not completely sure what happens here, but here are two possible explanations for the results you're seeing.
The model does not maintain any ordering over state labels. So state labelled as 1 in one run could end up being 4 in another run. This is known as label switching problem in latent variable models.
GaussianHMM initializes emission parameters via k-means which might converge to different values depending on the data. The initial parameters are passed to the EM-algorithm which is also prone to local maxima. Therefore different runs could result in different parameter estimates and (as a result) slightly different predictions.
Try to control the randomness by setting the seed and the random_state when you define your model. Moreover, you could initialize startprob_ and transmat_ yourself and see how the model behaves.
That way you might have a better explanation about the cause of this behavior.
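For example, a minimal sketch of pinning the randomness (the random_state value is arbitrary; A and n are the feature matrix and state count built above):
from hmmlearn.hmm import GaussianHMM

model = GaussianHMM(n_components=n, covariance_type="full",
                    n_iter=2000, random_state=42)
# fit() on a single 2-D array is the current hmmlearn API
# (older releases accepted a list of arrays, as in fit([A]))
model.fit(A)
hidden_states = model.predict(A)   # reproducible across runs, up to label ordering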
