CNTK Sequence 2 Sequence Tutorial: placeholder_variable initialization

I am new to CNTK and was following the CNTK seq2seq tutorial.
Inside the LSTM_layer function, there is the following code:
dh = placeholder_variable(shape=(output_dim), dynamic_axes=input.dynamic_axes)
dc = placeholder_variable(shape=(output_dim), dynamic_axes=input.dynamic_axes)
LSTM_cell = LSTM(output_dim)
f_x_h_c = LSTM_cell(input, (dh, dc))
h_c = f_x_h_c.outputs
Now, in LSTM_cell(input, (dh, dc)), what are the values of dh and dc?
I don't see them being initialized anywhere when the LSTM_layer function is called.

If you look a few lines further down, you will find that the placeholders are replaced. At model-creation time you may not have all the values a function needs, but you do know the shape of the data it will operate on, so you create placeholders (containers) for those variables. Before the function is executed, the placeholders are replaced with variables that hold the values to be computed:
replacements = { dh: h.output, dc: c.output }
f_x_h_c.replace_placeholders(replacements)
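The idea of "build first, wire up later" can be illustrated without CNTK. The following is only a toy sketch in plain Python: the Placeholder class, the replace_placeholders helper, and the update rule are all hypothetical names invented for illustration and are not the CNTK API. It shows a cell being defined while its previous hidden state is still unknown, and the placeholder being bound to the previous step's output afterwards.

```python
class Placeholder:
    """Toy stand-in for a graph input whose source is supplied later."""

    def __init__(self, name):
        self.name = name
        self.source = None  # filled in by replace_placeholders()

    def get(self, state):
        if self.source is None:
            raise RuntimeError("placeholder %r was never replaced" % self.name)
        return self.source(state)


def replace_placeholders(replacements):
    """Bind each placeholder to the callable that produces its value."""
    for placeholder, source in replacements.items():
        placeholder.source = source


def run_recurrence(inputs, initial_h=0.0):
    # 1. Build the "cell" while the previous hidden state is still unknown.
    dh = Placeholder("dh")

    def cell(x, state):
        # Hypothetical update rule, chosen only to keep the arithmetic simple.
        return 0.5 * dh.get(state) + x

    # 2. Close the loop: dh now reads the output of the previous step.
    replace_placeholders({dh: lambda state: state["prev_h"]})

    # 3. Evaluate step by step, feeding each output back in.
    state = {"prev_h": initial_h}
    outputs = []
    for x in inputs:
        h = cell(x, state)
        outputs.append(h)
        state["prev_h"] = h
    return outputs
```

In this sketch, calling dh.get() before step 2 raises an error, which mirrors why the placeholder needs no value at the moment the cell is defined: it only needs one by the time the function is evaluated.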

How to add stop words to TfidfVectorizer?

I am trying to add stop words into my stop_word list, however, the code I am using doesn't seem to be working:
Creating stop words list:
stopwords = nltk.corpus.stopwords.words('english')
CustomListofWordstoExclude = ['rt']
stopwords1 = stopwords.extend(CustomListofWordstoExclude)
Here I am converting the text to a dtm (document term matrix) with tfidf weighting:
vect = TfidfVectorizer(stop_words = 'english', min_df=150, token_pattern=u'\\b[^\\d\\W]+\\b')
dtm = vect.fit_transform(df['tweets'])
dtm.shape
But when I do this, I get this error:
FutureWarning: Pass input=None as keyword args. From version 0.25 passing these as positional arguments will result in an error
warnings.warn("Pass {} as keyword args. From version 0.25 "
What does this mean? Is there an easier way to add stopwords?
I'm unable to reproduce the warning. However, note that a warning such as this does not mean that your code did not run as intended. It means that in future releases of the package it may not work as intended. So if you try the same thing next year with updated packages, it may not work.
With respect to your question about using stop words, there are two changes that need to be made for your code to work as you expect.
list.extend() extends the list in-place, but it doesn't return the list. To see this you can do type(stopwords1) which gives NoneType. To define a new variable and add the custom words list to stopwords in one line, you could just use the built-in + operator functionality for lists:
stopwords = nltk.corpus.stopwords.words('english')
CustomListofWordstoExclude = ['rt']
stopwords1 = stopwords + CustomListofWordstoExclude
To actually use stopwords1 as your new stopwords list when performing the TF-IDF vectorization, you need to pass stop_words=stopwords1:
vect = TfidfVectorizer(stop_words=stopwords1,  # Passed stopwords1 here
                       min_df=150,
                       token_pattern=u'\\b[^\\d\\W]+\\b')
dtm = vect.fit_transform(df['tweets'])
dtm.shape
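The difference between extend() and + can be checked in isolation. This is a minimal sketch that uses a short hard-coded list as a stand-in for nltk.corpus.stopwords.words('english'), so that it runs without NLTK; the variable names are illustrative.

```python
# Stand-in for nltk.corpus.stopwords.words('english'), shortened for the example.
base_stopwords = ["the", "a", "and"]
custom_words = ["rt"]

# extend() mutates the list it is called on and returns None,
# so assigning its result to a name gives you None, not a list.
result_of_extend = list(base_stopwords).extend(custom_words)

# The + operator builds and returns a new list, leaving both inputs unchanged.
combined = base_stopwords + custom_words

print(result_of_extend)  # None
print(combined)          # ['the', 'a', 'and', 'rt']
```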

Python: Why didn't my module instantiate an object via an exec() statement?

I created my own version of GridSearchCV from the sklearn.model_selection library. My version iterates through the parameters one by one instead of searching all possible combinations. For example, for an SVR model with three parameters defined as follows:
{
'gamma' : np.arange(0.0, 1.0, 0.1),
'C': np.arange(1, 10, 1),
'epsilon': np.arange(0.0, 1.0, 0.1)
}
The algorithm first finds the single best gamma coefficient (out of ten). It then moves on to the C parameter, using the chosen value of gamma. After nine iterations (np.arange(1, 10, 1) yields nine values) it moves on to epsilon and picks the optimal epsilon value for the chosen [gamma, C] pair. This gives 29 candidates to check in total (10 + 9 + 10) instead of 900 (10*9*10).
I'd like to import my opt_grid_search object into my projects, like below:
from own_udf_functions import show_description, opt_grid_search
The code of the function begins with a dynamic statement that creates the object to be optimized:
exec_string = 'opt_object = ' + object_name + '(' + def_params + ')'
which returns for example:
opt_object = SVR(kernel = 'rbf')
However, when I try to use the code in another script as below:
opt_grid_search(object_name, params_set, X_train, y_train, X_test, y_test,
cross_val = 2, def_params = def_params)
the following error appears:
File "C:\Users\Marek\Desktop\Python\Github\Kernele\Kaggle Competitions\own-udf-functions\own_udf_functions.py", line 40, in opt_grid_search
opt_object.fit(X_train,y_train)
NameError: name 'opt_object' is not defined
It seems that opt_grid_search function didn't execute the following line of code:
opt_object = SVR(kernel = 'rbf')
and the object named opt_object wasn't actually created.
I think it has to do with classes, but I would like help understanding what actually happened here. I think this is crucial knowledge that would help me write more 'pythonic' code instead of redefining all of the functions in every single script.
Secondly, please let me know whether such an optimization makes sense, or whether the grid search needs to go through all possible combinations.
I tried to keep this description as short as possible, however if you would like to see / need it for the reference, my code is accessible below:
https://github.com/markoo26/own-udf-functions
The issue here is the exec() function and the namespace/scope in which it operates. In Python 3, an assignment executed by exec() inside a function does not create a local variable in that function: the compiler never sees an assignment to opt_object in the function body, so the later opt_object.fit(...) is looked up as a global name and fails with NameError. The easiest workaround is to use eval() instead, which explicitly returns the object. You end up with something like:
exec_string = object_name + '(' + def_params + ')'
opt_object = eval(exec_string)
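The behaviour can be reproduced without scikit-learn. The sketch below uses a hypothetical DummyModel class as a stand-in for SVR, and the function names are made up for illustration. It shows the failing pattern, two working variants (exec() with an explicit namespace dictionary, and eval()), and a lookup table that avoids building code strings at all.

```python
class DummyModel:
    """Hypothetical stand-in for an estimator class such as SVR."""

    def __init__(self, kernel="linear"):
        self.kernel = kernel


def broken_build(object_name, def_params):
    # Mirrors the pattern from the question: the assignment happens inside exec().
    exec("opt_object = " + object_name + "(" + def_params + ")")
    return opt_object  # NameError: exec() did not create a local variable


def build_with_exec(object_name, def_params):
    # Give exec() an explicit dictionary and read the result back out of it.
    namespace = {"DummyModel": DummyModel}
    exec("opt_object = " + object_name + "(" + def_params + ")", namespace)
    return namespace["opt_object"]


def build_with_eval(object_name, def_params):
    # eval() evaluates an expression and returns its value directly.
    return eval(object_name + "(" + def_params + ")", {"DummyModel": DummyModel})


# Alternative without exec/eval: map names to classes and pass keyword arguments.
REGISTRY = {"DummyModel": DummyModel}


def build_from_registry(object_name, **params):
    return REGISTRY[object_name](**params)
```

The registry variant is often the safer design, because it never executes strings as code; it does assume the default parameters are available as a dictionary rather than as a string.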

How to extract the hidden layer features in H2ODeepLearningEstimator?

I found H2O has the function h2o.deepfeatures in R to pull the hidden layer features
https://www.rdocumentation.org/packages/h2o/versions/3.20.0.8/topics/h2o.deepfeatures
train_features <- h2o.deepfeatures(model_nn, train, layer=3)
But I couldn't find any example in Python. Can anyone provide some sample code?
Most Python/R API functions are wrappers around REST calls. See http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/_modules/h2o/model/model_base.html#ModelBase.deepfeatures
So, to convert an R example to a Python one, make the model the object the method is called on (the model argument of the R function becomes self), and shift all the other arguments along. I.e. the example from the manual becomes (with dots in variable names changed to underscores):
prostate_hex = ...
prostate_dl = ...
prostate_deepfeatures_layer1 = prostate_dl.deepfeatures(prostate_hex, 1)
prostate_deepfeatures_layer2 = prostate_dl.deepfeatures(prostate_hex, 2)
Sometimes the function name will change slightly (e.g. h2o.importFile() vs. h2o.import_file()), so you need to hunt for it at http://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/index.html

Abaqus Python script -- Reading 'TENSOR_3D_FULL' data from *.odb file

What I want: strain values LE11, LE22, LE12 at nodal points
My script is:
#!/usr/local/bin/python
# coding: latin-1
# making the ODB commands available to the script
from odbAccess import*
import sys
import csv
odbPath = "my *.odb path"
odb = openOdb(path=odbPath)
assembly = odb.rootAssembly
# count the number of frames
NumofFrames = 0
for v in odb.steps["Step-1"].frames:
    NumofFrames = NumofFrames + 1
# create a variable that refers to the reference (undeformed) frame
refFrame = odb.steps["Step-1"].frames[0]
# create a variable that refers to the node set ‘Region Of Interest (ROI)’
ROINodeSet = odb.rootAssembly.nodeSets["ROI"]
# create a variable that refers to the reference coordinate ‘REFCOORD’
refCoordinates = refFrame.fieldOutputs["COORD"]
# create a variable that refers to the coordinates of the node
# set in the test frame of the step
ROIrefCoords = refCoordinates.getSubset(region=ROINodeSet,position= NODAL)
# count the number of nodes
NumofNodes = 0
for v in ROIrefCoords.values:
    NumofNodes = NumofNodes + 1
# looping over all the frames in the step
for i1 in range(NumofFrames):
    # create a variable that refers to the current frame
    currFrame = odb.steps["Step-1"].frames[i1+1]
    # looping over all the frames in the step
    for i1 in range(NumofFrames):
        # create a variable that refers to the strain 'LE'
        Str = currFrame.fieldOutputs["LE"]
        ROIStr = Str.getSubset(region=ROINodeSet, position=NODAL)
        # initialize list
        list = [[]]
        # loop over all the nodes in each frame
        for i2 in range(NumofNodes):
            strain = ROIStr.values[i2]
            list.insert(i2, [str(strain.dataDouble[0]) + ";" + str(strain.dataDouble[1]) +
                             ";" + str(strain.dataDouble[3])])
# write the list in a new *.csv file (code not included for brevity)
odb.close()
The error I get is:
strain = ROIStr.values [i2]
IndexError: Sequence index out of range
Additional info:
Details for ROIStr:
ROIStr.name
'LE'
ROIStr.type
TENSOR_3D_FULL
ROIStr.description
'Logarithmic strain components'
ROIStr.componentLabels
('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')
ROIStr.getattribute
'getattribute of openOdb(r'path to .odb').steps['Step-1'].frames[1].fieldOutputs['LE'].getSubset(position=INTEGRATION_POINT, region=openOdb(r'path to.odb').rootAssembly.nodeSets['ROI'])'
When I use the same code for VECTOR objects, like 'U' for nodal displacement or 'COORD' for nodal coordinates, everything works without a problem.
The error happens in the first loop. So, it is not the case where it cycles several loops before the error happens.
Question: Does anyone know what is causing the error in the above code?
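Here is the reason you get an IndexError. Strains are calculated at the integration points, not at the nodes, so the subset requested with position=NODAL appears to contain no values and ROIStr.values[i2] has nothing to index. According to the Abaqus Scripting Reference Guide, the position argument is: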
A SymbolicConstant specifying the position of the output in the element. Possible values are:
NODAL, specifying the values calculated at the nodes.
INTEGRATION_POINT, specifying the values calculated at the integration points.
ELEMENT_NODAL, specifying the values obtained by extrapolating results calculated at the integration points.
CENTROID, specifying the value at the centroid obtained by extrapolating results calculated at the integration points.
To use your code, therefore, you should request the strain results with position=ELEMENT_NODAL:
ROIStr = Str.getSubset(region=ROINodeSet, position=ELEMENT_NODAL)
With
ROIStr.values[0].data
You will then get an array containing the 6 independent components of your tensor.
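One thing to keep in mind with ELEMENT_NODAL is that a node shared by several elements shows up once per element, so the values usually need to be combined per node. The sketch below is a plain-Python illustration that runs outside Abaqus: the FieldValue namedtuple is a hypothetical stand-in for the objects in ROIStr.values (it only mimics their nodeLabel and data attributes), and simple averaging is one possible choice, not necessarily what Abaqus/CAE does when it displays nodal strains.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the entries of ROIStr.values.
FieldValue = namedtuple("FieldValue", ["nodeLabel", "data"])

# Component order as reported by ROIStr.componentLabels in the question.
COMPONENT_LABELS = ('LE11', 'LE22', 'LE33', 'LE12', 'LE13', 'LE23')


def nodal_average(values, wanted=('LE11', 'LE22', 'LE12'),
                  labels=COMPONENT_LABELS):
    """Average the requested tensor components over all entries of each node."""
    # Look the positions up by name instead of hard-coding 0, 1 and 3.
    indices = [labels.index(name) for name in wanted]
    sums = defaultdict(lambda: [0.0] * len(indices))
    counts = defaultdict(int)
    for value in values:
        for k, idx in enumerate(indices):
            sums[value.nodeLabel][k] += value.data[idx]
        counts[value.nodeLabel] += 1
    return dict(
        (node, tuple(total / counts[node] for total in sums[node]))
        for node in sums
    )
```

Inside an Abaqus script, the same function could in principle be called with ROIStr.values directly, since it only reads nodeLabel and data from each entry.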
Alternative Solution
For reading time series of results for a node set, you can use the function xyPlot.xyDataListFromField(). I noticed that this function is much faster than using odbread, and the code is also shorter. The only drawback is that it requires an Abaqus license (in contrast to odbread, which works with abaqus python, which only needs an installed version of Abaqus and does not need a network license).
For your application, you should do something like:
from abaqus import *
from abaqusConstants import *
from abaqusExceptions import *
import visualization
import xyPlot
import displayGroupOdbToolset as dgo
results = session.openOdb(your_file + '.odb')
# without this, you won't be able to extract the results
session.viewports['Viewport: 1'].setValues(displayedObject=results)
xyList = xyPlot.xyDataListFromField(
    odb=results,
    outputPosition=NODAL,
    variable=(('LE', INTEGRATION_POINT, (
        (COMPONENT, 'LE11'), (COMPONENT, 'LE22'),
        (COMPONENT, 'LE33'), (COMPONENT, 'LE12'), )), ),
    nodeSets=('ROI', ))
(Of course you have to add LE13 etc.)
You will get a list of xyData
type(xyList[0])
<type 'xyData'>
containing the desired data for each node and each output. Its size will therefore be
len(xyList)
number_of_nodes*number_of_requested_outputs
where the first number_of_nodes elements of the list are LE11 at each node, then LE22, and so on.
You can then transform this into a NumPy array (assuming numpy has been imported as np):
LE11_1 = np.array(xyList[0])
would be LE11 at the first node, with dimensions:
LE11_1.shape
(NumberTimeFrames, 2)
That is, for each time step you have time and output variable.
NumPy arrays are also very easy to write on text files (check out numpy.savetxt).
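Because the list is flat, it can help to regroup it by component before converting anything to arrays. The following is a small sketch that relies only on the ordering described above (all nodes for the first component, then all nodes for the second, and so on); group_by_component is a hypothetical helper name, and the function works on any list, so it can be tried outside Abaqus.

```python
def group_by_component(xy_list, component_names, number_of_nodes):
    """Split a flat, component-major list into one sub-list per component."""
    expected = number_of_nodes * len(component_names)
    if len(xy_list) != expected:
        raise ValueError(
            "expected %d entries, got %d" % (expected, len(xy_list)))
    grouped = {}
    for i, name in enumerate(component_names):
        start = i * number_of_nodes
        grouped[name] = xy_list[start:start + number_of_nodes]
    return grouped
```

With the request shown earlier, grouped = group_by_component(xyList, ['LE11', 'LE22', 'LE33', 'LE12'], number_of_nodes) would then give grouped['LE12'][0] as the LE12 history of the first node, assuming the ordering holds for your model.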

Why do I get different values every time I run the function hmmlearn.hmm.GaussianHMM.fit()?

I have a program.
n = 6
data=pd.read_csv('11.csv',index_col='datetime')
volume = data['TotalVolumeTraded']
close = data['ClosingPx']
logDel = np.log(np.array(data['HighPx'])) - np.log(np.array(data['LowPx']))
logRet_1 = np.array(np.diff(np.log(close)))
logRet_5 = np.log(np.array(close[5:])) - np.log(np.array(close[:-5]))
logVol_5 = np.log(np.array(volume[5:])) - np.log(np.array(volume[:-5]))
logDel = logDel[5:]
logRet_1 = logRet_1[4:]
close = close[5:]
Date = pd.to_datetime(data.index[5:])
A = np.column_stack([logDel,logRet_5,logVol_5])
model = GaussianHMM(n_components= n, covariance_type="full", n_iter=2000).fit([A])
hidden_states = model.predict(A)
I ran the code twice and got a different value of "hidden_states" each time (the two outputs are not reproduced here).
Why are the two values of "hidden_states" different?
I am not completely sure what happens here, but here are two possible explanations for the results you're seeing.
The model does not maintain any ordering over state labels, so the state labelled 1 in one run could end up being labelled 4 in another run. This is known as the label switching problem in latent variable models.
GaussianHMM initializes the emission parameters via k-means, which can converge to different values depending on its random starting points. The initial parameters are then passed to the EM algorithm, which is itself prone to local maxima. Different runs can therefore produce different parameter estimates and, as a result, slightly different predictions.
Try to control the randomness by setting the seed and the random_state when you define your model. You could also initialize startprob_ and transmat_ yourself and see how the model behaves.
That way you might get a better idea of what causes this behavior.
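To tell the two explanations apart, one option is to rename the states of each run in a canonical way before comparing them. The sketch below is plain Python and does not use hmmlearn; relabel_by_first_appearance is a hypothetical helper that numbers states in the order they first occur. If two runs agree after relabelling, the difference was only label switching; if they still differ, the fits themselves converged to different solutions.

```python
def relabel_by_first_appearance(states):
    """Rename state labels to 0, 1, 2, ... in order of first occurrence."""
    mapping = {}
    relabelled = []
    for state in states:
        if state not in mapping:
            mapping[state] = len(mapping)
        relabelled.append(mapping[state])
    return relabelled


# Two hypothetical runs that describe the same segmentation with different labels.
run_a = [1, 1, 4, 4, 0, 1]
run_b = [3, 3, 2, 2, 5, 3]

same_up_to_labels = (
    relabel_by_first_appearance(run_a) == relabel_by_first_appearance(run_b)
)
print(same_up_to_labels)  # True
```

In practice this would be applied to the hidden_states arrays of two runs, for example relabel_by_first_appearance(list(hidden_states)).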
