I want to randomly sample from a VARMA model, but I can't find a function in statsmodels for this. I studied the ARMA example and can replicate it successfully for a single variable.
# for the ARMA
import numpy as np
import statsmodels.api as sm
arparams = np.array([.9, -.7])
maparams = np.array([.5, .8])
ar = np.r_[1, -arparams]  # add the zero-lag and negate the AR coefficients
ma = np.r_[1, maparams]   # add the zero-lag
obs = 10000
sigma = 1
# draw a random sample from the ARMA(2, 2) process
y = sm.tsa.arma_generate_sample(ar, ma, obs, scale=sigma)
# for the VARMA
import numpy as np
from statsmodels.tsa.statespace.varmax import VARMAX
# generate a 2-D correlated normal series
mean = [0,0]
cov = [[1,0.9],[0.9,1]]
data = np.random.multivariate_normal(mean,cov,100)
# fit the data into a VARMA model
model = VARMAX(data, order=(1,1)).fit()
# I can't seem to find a way to randomly sample from the VARMA model
Results objects from fitting a VARMAX model have a simulate method which can be used to generate a random sample. For example:
mod = VARMAX(data, order=(1,1))
res = mod.fit()
# to generate a time series of length 100 following the VARMAX process described by `res`:
sample = res.simulate(100)
This is true of any state space model, including SARIMAX, UnobservedComponents, VARMAX, and DynamicFactor.
(Also, the model class has a simulate method. The main difference is that since model objects don't have associated parameter values, you need to pass a particular parameter vector in that case).
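For example, a minimal sketch reusing the model and fitted results from above:
# simulate from the model object by passing an explicit parameter vector,
# here simply the fitted parameters
sample = mod.simulate(res.params, 100)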
Related
I'm updating a PyTorch network from legacy code to the current API, following documentation such as that here.
I used to have:
import torch
from torchtext import data
from torchtext import datasets
# setting the seed so our random output is actually deterministic
SEED = 1234
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
# defining our input fields (text) and labels.
# We use the spaCy tokenizer because it provides strong support for tokenization in languages other than English
TEXT = data.Field(tokenize = 'spacy', include_lengths = True)
LABEL = data.LabelField(dtype = torch.float)
train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
import random
train_data, valid_data = train_data.split(random_state = random.seed(SEED))
example = next(iter(test_data))
example.text
MAX_VOCAB_SIZE = 25_000
TEXT.build_vocab(train_data,
                 max_size = MAX_VOCAB_SIZE,
                 vectors = "glove.6B.100d",
                 unk_init = torch.Tensor.normal_) # how to initialize unseen words not in glove
LABEL.build_vocab(train_data)
Now in the new code I am struggling to add the validation set. All goes well until here:
from torchtext.datasets import IMDB
train_data, test_data = IMDB(split=('train', 'test'))
I can print the outputs; while they look different (problems later on?), they have all the info. I can print the data fine with next(train_data).
Then after I do:
test_size = int(len(train_data) / 2)
train_data, valid_data = torch.utils.data.random_split(train_data, [test_size, test_size])
It tells me:
next(train_data)
TypeError: 'Subset' object is not an iterator
This makes me think I am not applying random_split correctly. How do I correctly create the validation set for this dataset without causing issues down the line?
Try next(iter(train_data)). It seems you have to create an iterator over the dataset explicitly, and use a DataLoader when efficiency is required.
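For example, a minimal sketch (assuming train_data is the Subset returned by random_split above):
from torch.utils.data import DataLoader
# a Subset is a dataset, not an iterator, so wrap it explicitly
example = next(iter(train_data))
# for training loops, a DataLoader gives batched, shuffled access
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
for labels, lines in train_loader:
    break  # each iteration yields a batch of labels and raw text lines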
I created a program that predicts digits from a dataset. When it predicts, there should be two cases: if the prediction is right, the data should be added to the dataset automatically; otherwise the program should take the right answer from the user and insert that into the dataset.
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
from sklearn.tree import DecisionTreeClassifier
data = pd.read_csv("train.csv").values
clf = DecisionTreeClassifier()
# first 21000 rows for training: column 0 is the label, the rest are pixel values
xtrain = data[0:21000, 1:]
train_label = data[0:21000, 0]
clf.fit(xtrain, train_label)
# remaining rows for testing
xtest = data[21000:, 1:]
actual_label = data[21000:, 0]
# show one test digit alongside the model's prediction for it
d = xtest[9]
d.shape = (28, 28)
pt.imshow(d, cmap='gray')
print(clf.predict([xtest[9]]))
pt.show()
I'm not sure I'm following your question, but if you want to distinguish between good and wrong predictions and handle them differently, you need to do that explicitly.
predictions = clf.predict(xtest)
correct_mask = predictions == actual_label
good_predictions = xtest[correct_mask]   # rows predicted correctly
bad_predictions = xtest[~correct_mask]   # rows predicted incorrectly
So good_predictions contains all the rows in xtest that were predicted correctly.
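A rough sketch of the grow-the-dataset step on top of that, assuming data is the array loaded above (the input() prompt and the append policy are illustrative, not from the original code):
# rows predicted correctly are appended to the dataset automatically
good_rows = np.column_stack([actual_label[correct_mask], xtest[correct_mask]])
data = np.vstack([data, good_rows])
# for wrong predictions, ask the user for the right digit, then append
for features, predicted in zip(xtest[~correct_mask], predictions[~correct_mask]):
    correct = int(input(f"Model predicted {predicted}; enter the correct digit: "))
    data = np.vstack([data, np.concatenate(([correct], features))])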
I have two different data sets, one for training my classifier and the other for testing it. Both are text files with two columns separated by a ",". The first column (numbers) is the independent variable and the second column (group) is the dependent variable.
Training data set
(just a few lines as an example; there are no empty lines between rows):
EMI3776438,1
EMI3776438,1
EMI3669492,1
EMI3752004,1
Testing data set
(As you can see, I have picked data from the training set to be sure that the score can't possibly be zero.)
EMI3776438,1
Code in Python 3.6:
# all the import statements have been ignored to keep the code short
# loading the training data set
training_file_path = r'C:\Users\yyy\Desktop\my files\python\Machine learning\Carepack\modified_columns.txt'
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
training_file_data = pandas.read_table(training_file_path,
                                       header=None,
                                       names=['numbers', 'group'],
                                       sep=',')
training_file_data = training_file_data.apply(le.fit_transform)
features = ['numbers']
x = training_file_data[features]
y = training_file_data["group"]
from sklearn.model_selection import train_test_split
training_x, testing_x, training_y, testing_y = train_test_split(x, y,
                                                                random_state=0,
                                                                test_size=0.1)
from sklearn.naive_bayes import GaussianNB
gnb = GaussianNB()
gnb.fit(training_x, training_y)
# loading the testing data
testing_final_path = r"C:\Users\yyy\Desktop\my files\python\Machine learning\Carepack\testing_final.txt"
testing_sample_data = pandas.read_table(testing_final_path,
                                        sep=',',
                                        header=None,
                                        names=['numbers', 'group'])
testing_sample_data = testing_sample_data.apply(le.fit_transform)
category = ["numbers"]
testing_sample_data_x = testing_sample_data[category]
# finding the score of the test data
print(gnb.score(testing_sample_data_x, testing_sample_data["group"]))
First, the data samples above don't show how many classes there are; you need to describe your data in more detail.
Secondly, you are calling le.fit_transform again on the test data, which discards the string-to-number mappings learned from the training samples. The LabelEncoder le starts encoding the test data again from scratch, so the resulting codes will not match the ones used for the training data. The input to GaussianNB is therefore inconsistent, and hence the results are incorrect.
Change that to:
testing_sample_data = testing_sample_data.apply(le.transform)
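To see why refitting matters, a tiny illustration with made-up values:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit_transform(['EMI3776438', 'EMI3669492'])  # -> [1, 0] (classes are sorted alphabetically)
le.fit_transform(['EMI3776438'])                # refit -> [0]: the same string now gets a different code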
UPDATE:
I'm sorry, I overlooked the fact that you have two columns in your data. LabelEncoder only works on a single column of data. To make it work on multiple pandas columns at once, look at the answers to the following question:
Label encoding across multiple columns in scikit-learn
If you are using the latest version of scikit-learn (0.20) or can update to it, then you don't need any such hacks and can use the OrdinalEncoder directly:
from sklearn.preprocessing import OrdinalEncoder
enc = OrdinalEncoder()
training_file_data = enc.fit_transform(training_file_data)
And during testing:
testing_sample_data = enc.transform(testing_sample_data)
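Put together, a minimal self-contained sketch of the fit-on-train, transform-on-test pattern (the toy values are illustrative):
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
train = pd.DataFrame({'numbers': ['EMI3776438', 'EMI3669492'], 'group': ['1', '1']})
test = pd.DataFrame({'numbers': ['EMI3776438'], 'group': ['1']})
enc = OrdinalEncoder()
train_encoded = enc.fit_transform(train)  # learn one mapping per column
test_encoded = enc.transform(test)        # reuse the same mappings on the test data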
I want to initialize a 4×11 matrix using the Glorot uniform initializer in Keras, using the following code:
import keras
keras.initializers.glorot_uniform((4,11))
I get this output:
<keras.initializers.VarianceScaling at 0x7f9666fc48d0>
How can I visualize the output? I tried c[1] and got the error 'VarianceScaling' object does not support indexing.
glorot_uniform() creates an initializer object, and this object is later called with a shape. So you need:
# from keras.initializers import *   # (standalone Keras / TF 1.x)
from tensorflow.keras.initializers import *
from tensorflow.keras import backend as K
unif = glorot_uniform()              # this returns an initializer: a 'function(shape)'
mat_as_tensor = unif((4, 11))        # this returns a tensor - use this in keras models if needed
mat_as_numpy = K.eval(mat_as_tensor) # this returns a numpy array (don't use in models)
print(mat_as_numpy)
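With TensorFlow 2.x eager execution, a simpler route is to call the initializer and convert the result directly (a sketch):
import tensorflow as tf
init = tf.keras.initializers.GlorotUniform()
mat = init(shape=(4, 11))  # an eager tensor
print(mat.numpy())         # view it as a numpy array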
I would like to export a decision tree using sklearn.
First I trained a decision tree classifier:
self._selected_classifier = tree.DecisionTreeClassifier()
self._selected_classifier.fit(train_dataframe, train_class)
self._column_names = list(train_dataframe.columns.values)
After that I used the following method in order to export the decision tree:
def _create_graph_visualization(self):
    decision_tree_classifier = self._selected_classifier
    from sklearn.externals.six import StringIO
    dot_data = StringIO()
    tree.export_graphviz(decision_tree_classifier,
                         out_file=dot_data,
                         feature_names=self._column_names)
    import pydotplus
    graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
    graph.write_pdf("decision_tree_output.pdf")
After many errors about missing executables, the program now finishes successfully.
The file is created, but it is empty.
What am I doing wrong?
Here is an example with output which works for me, using pydotplus:
from sklearn import tree
from io import StringIO
import pydotplus
# Define training and target set for the classifier
train = [[1, 2, 3], [2, 5, 1], [2, 1, 7]]
target = [10, 20, 30]
# Initialize classifier with a fixed random seed of value 0
# (allows reproducible results)
dectree = tree.DecisionTreeClassifier(random_state=0)
dectree.fit(train, target)
# Test classifier with another, unseen feature vector
# (predict expects a 2-D array, i.e. a list of samples)
test = [[2, 2, 3]]
predicted = dectree.predict(test)
dotfile = StringIO()
tree.export_graphviz(dectree, out_file=dotfile)
graph = pydotplus.graph_from_dot_data(dotfile.getvalue())
graph.write_png("dtree.png")
graph.write_pdf("dtree.pdf")